Topic: Genetic Selection… the next rabbit hole

Full disclosure: I got Claude to write this up, because I fancied documenting my failures about as much as you'd expect. The journey's mine though — promise.

I wanted to share where I've gotten to and ask the people who've gone further than me for some pointers.

Chapter 1 — The textbook pipeline. Built the full walk-forward: In-Sample to generate, Out-of-Sample to filter, a true held-out Forward as the final judge. Multi-asset, multi-timeframe, multiple brokers' data, Monte-Carlo, all of it. The plumbing's solid — I've stress-tested it.

Chapter 2 — Rich pools don't translate. I can churn out big, healthy-looking pools: nice IS curves that survive OOS and pass the usual robustness gates. But none of that richness carries into Forward. Every metric I'd rank on (expectancy, R²/linearity, stability; I think I've tried them all) shows basically zero correlation with how the strategy actually performs in Forward.
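For anyone who wants to run the same check on their own pool, here is a minimal sketch of one way to measure "correlation with forward": a Spearman rank correlation between a selection metric and the forward result, one value pair per strategy. The function names and the sample numbers are illustrative assumptions, not output from my pipeline.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no ranking information
    return cov / (var_x * var_y) ** 0.5


# Placeholder numbers only (hypothetical): one entry per strategy in the pool.
oos_expectancy = [0.8, 1.4, 0.3, 1.1, 0.6]
forward_profit = [-20.0, 15.0, 5.0, -8.0, 30.0]
rho = spearman(oos_expectancy, forward_profit)
```

A value of `rho` near zero across a large pool is what "the metric doesn't transfer" looks like in numbers. Rank correlation is used here rather than Pearson so that a few outlier strategies don't dominate the result.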

Chapter 3 — ML didn't save it. So I did the obvious next thing: build features from the IS/OOS walk-forward behaviour, label by forward outcome, train a model to pick survivors. Same wall. The IS side (IS/OOS degradation, etc.) just doesn't carry enough signal about the future. The model can re-rank a useless ranking, but it can't invent signal that isn't there.

Chapter 4 — A genetic-selection wrapper around gen.js. Latest move: I wrapped gen.js in my own GA — breeding strategies generation-to-generation with my own fitness functions, instead of leaning on the built-in search. Lots more control, and some encouraging signs… but I keep bumping into the same questions about what to actually select on, which is why I'm here.
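To make "my own GA with my own fitness functions" concrete, here is a minimal sketch of a generational loop of that general shape. It does not call gen.js and is not my actual wrapper: the genome encoding (a list of floats in [0, 1]), the parameter names, and the operators chosen (tournament selection, one-point crossover, Gaussian mutation, elitism) are all assumptions for illustration. In a real wrapper, `fitness` would score whatever the generator and backtester return for a genome.

```python
import random


def evolve(fitness, genome_len, pop_size=20, generations=30,
           elite=2, mutation_rate=0.1, seed=0):
    """Evolve a population of float genomes and return the fittest one.

    `fitness` maps a genome (list of floats) to a number; higher is better.
    Requires genome_len >= 2 because of the one-point crossover.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]

    def tournament(scored, k=3):
        # Pick k random candidates and return the genome with the best score.
        return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

    for _ in range(generations):
        scored = sorted(((fitness(g), g) for g in pop),
                        key=lambda pair: pair[0], reverse=True)
        # Elitism: carry the top genomes over unchanged.
        next_pop = [list(g) for _, g in scored[:elite]]
        while len(next_pop) < pop_size:
            a, b = tournament(scored), tournament(scored)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutate each gene with probability mutation_rate, clamped to [0, 1].
            child = [
                min(1.0, max(0.0, x + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else x
                for x in child
            ]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


def toy_fitness(genome):
    """Stand-in fitness (hypothetical): peaks when every gene equals 0.5."""
    return -sum((x - 0.5) ** 2 for x in genome)


best = evolve(toy_fitness, genome_len=4)
```

The loop itself is the easy part. Whatever `fitness` rewards is what the population converges on, so a fitness built only from IS/OOS metrics inherits the transfer problem from Chapter 2, which is why the questions below are about what to select on.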

The questions:

1. Any IS/OOS metric that actually correlates with forward, not just looks good in backtest?
2. Shorter / more recent in-sample windows — do they transfer better for you than long ones?
3. Is the real edge in constraining what you generate (skeletons, regime typing, indicator grammar) rather than filtering after?

Happy to share more on the methodology. Where am I being naive?

Cheers.
Ben

Re: Genetic Selection… the next rabbit hole

> 1. Any IS/OOS metric that actually correlates with forward, not just looks good in backtest?

Both IS and OOS are calculated by the Backtester engine on historical data. They may correlate with future live trading if the strategy is not curve-fitted (over-optimized).
I have zero expectations for a strategy's real trading results until I see them.

> 2. Shorter / more recent in-sample windows — do they transfer better for you than long ones?
I cannot say. Any curve-fit strategy has perfect IS and OOS stats. That's why I don't look at IS or OOS at all. Instead, I retest the generated strategies on new data to assess their performance.

> 3. Is the real edge in constraining what you generate (skeletons, regime typing, indicator grammar) rather than filtering after?
My personal style is to leave the Generator working without constraints. Then I retest the strategies on new data a month or two later. I manually evaluate the trading rules of the ones that performed well. I only trade strategies with "meaningful" trading rules (which is fully subjective, of course).

My rule is: "no expectations." If a strategy works, it is okay. If it does not work, I get the next one.