1 (edited by Vincenzo 2025-10-11 20:45:01)

Topic: Follow-up: Effect of Filter Combinations and Post-Incubation Results

In the first part of this experiment (https://forexsb.com/forum/post/82865/#p82865),
I tested how different filter combinations applied to the same EA collection (233 EAs) affected Walk-Forward Analysis (WFA) success.

Results showed that filters alone could dramatically shift outcomes — from below 20 % to full 100 % WFA success.

To go further, I selected one setup — Combination A — with:

- Profit Factor > 1.1

- Winning Trades > 60 %

- SQN > 2

EAs meeting these rules were incubated on demo accounts and compared with older incubators created before SQN was used.
All results were manually collected last week from EAs running on demo accounts across different symbols, timeframes, and brokers.
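
For anyone who wants to reproduce the selection step, here is a minimal sketch of how the Combination A filter could be applied to an exported statistics table. The CSV file and column names are placeholders invented for the example, not EA Studio's actual export format.

    # Minimal sketch of the Combination A filter (PF > 1.1, Win% > 60, SQN > 2).
    # The CSV file and column names are hypothetical placeholders.
    import csv

    def passes_combination_a(row):
        return (float(row["profit_factor"]) > 1.1
                and float(row["win_rate"]) > 60.0
                and float(row["sqn"]) > 2.0)

    with open("ea_collection_stats.csv", newline="") as f:
        candidates = [row["ea_name"] for row in csv.DictReader(f)
                      if passes_combination_a(row)]

    print(f"{len(candidates)} EAs selected for demo incubation")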

Results, expressed as average win rate (profitable EAs / total EAs):

- Without SQN filter: 33 % (sample: 348 EAs)
- With SQN filter (Combination A): 55 % (sample: 267 EAs)

A total of 615 EAs were benchmarked.
Some SQN-filtered incubators reached over 70 % profitable EAs after incubation, while pre-SQN groups stayed around 25–40 %.

(Chart: EA Win Rate % After Incubation – SQN Filter vs No SQN)

Interpretation

The data suggest structural robustness over time.
Many EAs in this benchmark have been running for 300+ days, still active, trading, and trackable across multiple brokers.

Key points:

- Filter design strongly impacts long-term EA survival.
- SQN combined with PF and Win % clearly improves success probability.

Cross-broker, multi-timeframe behavior shows the effect isn’t dataset-specific.

Next Step

A practical approach might be:

- In the meantime, several EAs have been moved from demo to live accounts and will be evaluated soon.

- Run 10 uncorrelated SQN-filtered EAs in one live portfolio. After ~50 trades per EA, switch off the weaker ones and add new SQN candidates to keep improving the portfolio's overall win rate (see the sketch after this list).

- I’ve also been copying only the trades from profitable SQN EAs on demo to live MT4, which helps reduce the usual performance drop when moving EAs from demo to real accounts.
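
As an illustration of the rotation rule in the second point above, here is a minimal sketch of the "switch off the weaker ones after ~50 trades" step. The EaStats structure, slot count, and thresholds are assumptions for the example, not a tested portfolio manager.

    # Sketch of the portfolio rotation rule: after ~50 trades per EA, drop the
    # weaker performers and refill the freed slots with new SQN-filtered candidates.
    # Data structure and thresholds are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class EaStats:
        name: str
        trades: int
        profit_factor: float

    def rotate_portfolio(live_eas, sqn_candidates, slots=10,
                         min_trades=50, min_pf=1.1):
        # Keep EAs that are still collecting trades or that stay above the PF bar.
        keep = [ea for ea in live_eas
                if ea.trades < min_trades or ea.profit_factor >= min_pf]
        # Refill the freed slots with fresh SQN-filtered candidates.
        refill = sqn_candidates[:max(0, slots - len(keep))]
        return keep + refill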

Does an even better filter combination exist — one that can outperform this setup?

Let’s go back to testing and studying.

And above all, these results prove that the first part of the experiment was not just a dataset-driven bias, but a replicable effect visible across brokers, timeframes, and extended incubation periods.

Open to constructive observations...

Post's attachment: 2025-10-09 ABS WinRate.png

2 (edited by Blaiserboy 2025-10-09 10:17:44)

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

I think you will find MT5 data includes varying spreads, whereas MT4 uses a constant spread, if spreads are material in your experimenting.

Also, I think SQN requires 100 trades to be effective (statistical significance).
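
For reference, SQN is normally computed as sqrt(N) * mean(R) / stdev(R) over the per-trade R-multiples, with N commonly capped at 100, which is why roughly 100 trades is the figure usually quoted. A minimal sketch:

    # System Quality Number: sqrt(N) * mean(R) / stdev(R),
    # with N capped at 100 in the common convention.
    from statistics import mean, stdev

    def sqn(r_multiples, cap=100):
        if len(r_multiples) < 2 or stdev(r_multiples) == 0:
            return 0.0
        n = min(len(r_multiples), cap)
        return (n ** 0.5) * mean(r_multiples) / stdev(r_multiples)

    print(sqn([0.8, -1.0, 2.1, 0.5, -0.4, 1.3]))  # tiny sample -> unreliable score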

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

That’s a very good hint, thanks.
I honestly hadn’t considered the difference in spread behavior between MT4 and MT5.
I’ve been gradually switching to MT5, but around 90 % of what I run is still on MT4.

Spread impact is already on my list for the next test matrix, but as you know, these things are resource- and time-intensive.
I prefer an incremental approach — too many variables at once can make interpreting the results much harder.

Have you already done any MT4 vs MT5 spread benchmark yourself?

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

Sad to say, I have not done any great amount of testing.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

Perhaps when you have arrived at the final testing technique you will share it. I notice that you have a very detailed method and use a lot of samples.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

Of course, when data and results can support the discussion here, I will be back.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

Vincenzo,
I read both parts of your experiment carefully, and I want to break something down here, not to debate, but to bring full clarity.

Because what you describe is not robustness testing.
It’s statistical bias presented as validation.

Let’s go step by step.

1. Same dataset = no generalization

Both parts of your experiment use the same 2016–2023 dataset from the same EA pool.
That means the “validation” is performed on the same structural regime the strategies were built in.

So when you say “filters alone shifted outcomes from 20% to 100% WFA success,”
you’re not proving robustness — you’re proving how sensitive results are to filter bias inside one dataset.

No part of this setup tests true out-of-sample behavior.
Real robustness requires testing across independent data eras:
for example, build on 2008–2015, validate on 2016–2023, and test on 2024–2025.

Without that separation, your 100% WFA success is just alignment within one era’s bias.
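
A minimal sketch of that era separation, assuming the history sits in a pandas DataFrame with a datetime column named "time" (both assumptions for the example):

    # Sketch of build / validation / test era separation by date.
    # Assumes a pandas DataFrame with a datetime column named "time".
    import pandas as pd

    def split_eras(df: pd.DataFrame):
        build    = df[(df["time"] >= "2008-01-01") & (df["time"] <= "2015-12-31")]
        validate = df[(df["time"] >= "2016-01-01") & (df["time"] <= "2023-12-31")]
        test     = df[df["time"] >= "2024-01-01"]
        return build, validate, test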

2. 91 combinations = 91 curve-fits

Running 91 different filter combinations on the same EA pool guarantees false positives.
Even random data would produce several “perfect” combinations.

That's called data-mining bias: you're not measuring robustness, you're measuring how well a specific filter configuration fits your validation subset.

If you change the data window or rebuild from scratch, those “100% setups” will collapse
because they were statistically tuned to that particular run.
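
To illustrate the point with a toy simulation (random numbers only, nothing to do with the actual EAs): strategies that are kept because they look good on the screening window score 100 % there by construction, and fall back to roughly chance on independent data.

    # Toy data-mining bias demo: 233 random "EAs" with zero edge. The filter is
    # tuned on the screening window, looks perfect there, and collapses out of sample.
    import random

    random.seed(42)
    N_EAS, N_TRADES = 233, 200
    eas = [[random.gauss(0, 1) for _ in range(N_TRADES)] for _ in range(N_EAS)]
    screen = [ea[:100] for ea in eas]   # window used to pick the "winners"
    unseen = [ea[100:] for ea in eas]   # independent window

    # "Filter": keep only EAs that were profitable in the screening window.
    picked = [i for i in range(N_EAS) if sum(screen[i]) > 0]

    def success(window, idx):
        return sum(1 for i in idx if sum(window[i]) > 0) / len(idx)

    print(f"screening-window success: {success(screen, picked):.0%}")  # 100%
    print(f"unseen-window success:    {success(unseen, picked):.0%}")  # ~50%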

3. Correlated samples = illusion of variety

Your 233 EAs were all generated under the same conditions: same symbol, same timeframe, same acceptance criteria.
That means they share the same market DNA.

They’re not 233 independent strategies; they’re 233 variations of the same structure.

So the differences in "success rate" between filters aren't structural; they're intra-cluster noise.
It’s like testing 233 twins and calling them diverse.
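
One practical way to check that concern is to measure the average pairwise correlation of the EAs' daily returns; here is a minimal pandas sketch, where the CSV layout (one column of returns per EA) is an assumption for the example:

    # Sketch: how correlated are the EAs in a collection really?
    # Assumes a CSV with one column of daily returns per EA (hypothetical layout).
    import pandas as pd

    returns = pd.read_csv("ea_daily_returns.csv", index_col=0, parse_dates=True)
    corr = returns.corr()

    # Average pairwise correlation, excluding the diagonal.
    n = len(corr)
    avg_corr = (corr.values.sum() - n) / (n * (n - 1))
    print(f"average pairwise correlation: {avg_corr:.2f}")
    # Values close to 1 mean the collection behaves like variations of one strategy.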

4. Demo ≠ validation

Incubating EAs on demo accounts and calling that “proof of robustness” is another trap.
Demo feeds use different liquidity, spreads, and execution models than live servers.
They’re cleaner, smoother, and far more forgiving.

So a 55% “win rate” on demo doesn’t validate anything.
It only confirms that your systems survive under simulated conditions, not real execution pressure.

5. Post-hoc thresholds = confirmation bias

You defined your “Combination A” filters after observing earlier outcomes
(PF > 1.1, Win% > 60, SQN > 2).

That's not forward design; that's backward justification.
You shaped the filter rules around what already worked in your first test,
which guarantees inflated results in your second.

True robustness means predefining your filter before running any test.

6. Survivorship bias

You mention “many EAs have been running for 300+ days, still active and profitable.”
But that's only because the losing ones were dropped. And it is still a demo account, not a live account, which is very important.

That's survivorship bias: measuring only the survivors automatically skews the average upward.
It’s like claiming an army is strong because the dead soldiers aren’t counted.

7. Cross-broker ≠ cross-regime

You mention multi-broker and multi-timeframe data as proof the effect isn’t dataset-specific.
But cross-broker doesn’t mean cross-market.

If all brokers feed the same price source and timeframe, you’re still inside one market regime.
To prove generalization, you need uncorrelated market conditions, not just different servers.

8. The reality

What your chart actually shows is how filter selection manipulates perceived success,
not how systems generalize through time.

You demonstrated that with the same dataset and same EAs,
you can make results look 20% or 100% depending on how you slice them.

That’s not robustness.
That’s bias exposure.

9. The real robust process

To truly prove robustness, you would need to:

- Separate build / validation / test eras (e.g., 2008–2015 → 2016–2023 → 2024–2025)

- Generate independent EA sets per era

- Run Monte Carlo randomization on data, spreads, and execution (see the sketch below)

- Validate across different brokers and feeds

- Forward-test on demo and live with predeclared filters

Only when a setup survives across all those layers can it be called robust.
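
For the Monte Carlo randomization step, here is a minimal sketch of re-pricing a recorded trade list under randomized spread and slippage; the cost ranges are illustrative assumptions, not measured broker data.

    # Monte Carlo sketch: replay a recorded trade list (profits in pips) under
    # randomized spread/slippage costs and report a pessimistic percentile.
    # Cost ranges are illustrative assumptions, not measured broker data.
    import random

    def monte_carlo_costs(trade_profits_pips, runs=1000,
                          spread_range=(0.5, 2.0), slippage_range=(0.0, 1.0)):
        totals = []
        for _ in range(runs):
            total = 0.0
            for pips in trade_profits_pips:
                cost = random.uniform(*spread_range) + random.uniform(*slippage_range)
                total += pips - cost
            totals.append(total)
        totals.sort()
        return totals[len(totals) // 20]  # 5th percentile outcome

    print(monte_carlo_costs([6.0, -4.0, 9.5, -3.0, 5.0] * 20))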

So in short, your experiment is an interesting demonstration of how filters can distort perception,
but it doesn’t prove structural robustness.

It proves the opposite:
that results can look “replicable” while still being the product of methodological bias.

Robustness survives independent data, not repeated filters.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

One more point: I've personally seen plenty of EAs survive demo and break live within weeks, even on the same broker. Demo ≠ execution reality (liquidity, spread widening, slippage, queueing). If we're talking robustness, that means live survival, not demo speculation. Define it clearly: 6–12 months live, 50–100 real trades per EA, stable risk, no re-optimization mid-stream, same broker feed. Until then, any win rate on demo is methodology noise, not edge.
I’d expect only a small fraction of your demo-winners to pass that live bar.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

Additional Notes on the Experiment Context

Just to clarify a few points for readers:

- The datasets, brokers, and starting periods were different — this was not same-data validation.

- The goal wasn’t to prove universal robustness, but to see how filter design affects selection success rate across independent demo incubations.

- All EAs had already passed Monte Carlo validation, so the test measured the marginal impact of filters, not basic robustness.

- Demo incubation is one gate in a multi-step pipeline (Generation → Monte Carlo → Incubation → Live).

- Combination A was predefined and applied to both past and new collections, as part of an iterative workflow, not backward fitting.

- The win-rate checkpoint included all EAs (winners and losers), avoiding survivorship bias.

- Cross-broker and multi-timeframe results confirm the effect isn’t dataset-specific.

In short, this experiment isolates how selection criteria influence long-term survival probability in demo validation. The final live journey has just started.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

I temporarily banned Jurgen2100 for trolling.

Re: Follow-up: Effect of Filter Combinations and Post-Incubation Results

The guys have to be allowed to express themselves without harassment; they are contributing, and there may be something really useful in this thread if they are allowed to continue.

I, for one, am quite interested in this thread.