8-Phase Automated Validation Pipeline
Hi everyone,
I've been using EA Studio and Express Generator for years now, and after a long journey of trial and error — testing different robustness approaches, losing money to overfitted strategies, and slowly figuring out what actually works — I've finally built a fully automated validation pipeline that I'm genuinely happy with.
I wanted to share it with the community because this forum and Mr. Popov's continuous work on Express Generator have been instrumental in getting me here. The speed, the scriptability, the constant updates — Express Generator is truly a lovely system, and I owe a big thank you to Mr. Popov for making it all possible. Every new version brings something useful, and the tool just keeps getting better.
So here's my complete workflow. I hope it helps some of you avoid the mistakes I made along the way.
---
The Problem: Overfitting is the Silent Killer
We all know the story. You generate a beautiful strategy with a perfect equity curve, put it on a demo or live account, and within weeks, it's underwater. The strategy was curve-fitted to historical data and had no real edge.
I spent a long time trying simple IS/OOS splits and basic Monte Carlo tests, but strategies kept failing in live trading. The turning point was realizing that one robustness test is not enough. You need multiple independent tests that challenge the strategy from completely different angles. If a strategy survives all of them, the probability that it's genuinely capturing a market pattern goes up dramatically.
---
The 8-Phase Pipeline Overview
Here's the full pipeline. Each phase tests a different dimension of robustness:
+---+---------------------------------+------------+----------+-----------------------------------------------+
| # | Test                            | Data Range | Server   | Purpose                                       |
+---+---------------------------------+------------+----------+-----------------------------------------------+
| 1 | Generate In-Sample              | 0%–50%     | Premium  | Strategy discovery                            |
| 2 | Out-of-Sample Validation        | 50%–80%    | Premium  | Overfit filter                                |
| 3 | Monte Carlo — Execution Stress  | 0%–80%     | Premium  | Survives real-world execution?                |
| 4 | Monte Carlo — Param Sensitivity | 0%–80%     | Premium  | Overfit to specific indicator values?         |
| 5 | Multi-Timeframe Validation      | 0%–100%    | Premium  | Captures a real pattern, not timeframe noise? |
| 6 | Multi-Instrument Validation     | 0%–100%    | Premium  | Works on related markets?                     |
| 7 | Cross-Broker Validation         | 0%–80%     | Eightcap | Real pattern or data artifact?                |
| 8 | Forward Test                    | 80%–100%   | Premium  | Performs on completely unseen data?           |
+---+---------------------------------+------------+----------+-----------------------------------------------+
The key insight: Phases 1–4 and 7 use at most 80% of the data, reserving 20% that the strategy has NEVER seen for the final forward test. Phases 5 and 6 use 100% of the data because they test entirely different timeframes/instruments: the strategy has never been optimized on those markets.
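To make the windows concrete, here's a minimal Python sketch of the same table as a data structure. The dict layout and names are mine, purely for clarity; they're not anything Express Generator understands:

```python
# The data window each phase sees, expressed as (start %, end %).
PHASES = {
    1: ("Generate In-Sample",               0,  50),
    2: ("Out-of-Sample Validation",        50,  80),
    3: ("Monte Carlo - Execution Stress",   0,  80),
    4: ("Monte Carlo - Param Sensitivity",  0,  80),
    5: ("Multi-Timeframe Validation",       0, 100),
    6: ("Multi-Instrument Validation",      0, 100),
    7: ("Cross-Broker Validation",          0,  80),
    8: ("Forward Test",                    80, 100),
}

for num, (name, start, end) in PHASES.items():
    print(f"Phase {num}: {name} ({start}%-{end}%)")
```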
---
Phase 1: Generate In-Sample (0%–50%)
Generation uses only the first half of the available data. This is deliberate — it gives us three more independent data segments to validate against.
I typically run this for 24 hours per instrument using `max_working_minutes = 1440`. The more strategies generated, the better the chances of finding genuinely robust ones.
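A tiny sketch of how my script writes out the Phase 1 settings before launching the generator. Only these three keys are taken from my actual setup; the file name is arbitrary:

```python
# Phase 1: generation on the first half of the data only.
phase1 = {
    "data_start_percent": 0,
    "data_end_percent": 50,
    "max_working_minutes": 1440,  # 24 hours of generation per instrument
}

with open("phase1.ini", "w") as f:
    for key, value in phase1.items():
        f.write(f"{key} = {value}\n")
```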
---
Phase 2: Out-of-Sample Validation (50%–80%)
Feed the Phase 1 collection into Express Generator with `data_start_percent = 50` and `data_end_percent = 80`. No generation, pure validation.
This 30% data window is completely unseen during generation. Strategies that were curve-fitted to the first 50% will typically fail here. This is your first major filter.
Settings are slightly relaxed compared to Phase 1 because the data window is smaller.
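One way to think about "slightly relaxed": I scale count-based criteria roughly in proportion to the window size. A sketch of that heuristic (my own rule of thumb, not a tool feature):

```python
# Scale a minimum-trade-count requirement to a smaller data window.
def scale_min_trades(min_trades: int, from_window: float, to_window: float) -> int:
    return max(1, round(min_trades * to_window / from_window))

# Phase 1 sees 50% of the data, Phase 2 sees 30%:
print(scale_min_trades(300, 0.50, 0.30))  # -> 180
```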
---
Phase 3: Monte Carlo — Execution Stress (0%–80%)
This is where I learned an important lesson. I used to combine all Monte Carlo tests into one phase, and it was eliminating 99.8% of strategies. Separating execution stress from parameter sensitivity into distinct phases was a game-changer — it gives you much better diagnostics about WHY a strategy failed.
Phase 3 tests execution robustness only — what happens when real-world conditions degrade?
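To illustrate the idea with a toy model (this is NOT Express Generator's internal Monte Carlo, just the concept): re-run the trade list many times with random extra execution costs and measure how often the strategy stays profitable.

```python
import random

def execution_survival(trade_profits, runs=1000, max_cost=0.8):
    """Share of Monte Carlo runs that stay profitable when every trade
    pays a random extra execution cost (slippage, spread widening).
    max_cost is an assumed worst-case per-trade cost in account currency."""
    passed = 0
    for _ in range(runs):
        stressed = sum(p - random.uniform(0, max_cost) for p in trade_profits)
        if stressed > 0:
            passed += 1
    return passed / runs

trades = [12.5, -8.0, 20.1, -5.5, 9.7] * 100   # toy trade list
print(f"Survival rate: {execution_survival(trades):.0%}")
```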
---
Phase 4: Monte Carlo — Parameter Sensitivity (0%–80%)
Now test the opposite dimension: are the indicator parameters fragile?
Criteria are more lenient here than in Phase 3.
Why more lenient? Because parameter changes can legitimately shift a strategy's behavior more than execution noise does. A strategy using RSI(14) that still works at RSI(12) or RSI(16) is robust. It doesn't need to be equally profitable — just not broken.
Important lesson: I initially used 5% parameter variation, and it was too small to be meaningful. Use 15–20% for real stress testing. If a strategy breaks with 15% parameter variation, it's dangerously overfit.
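As a toy illustration of what 15–20% variation means in practice (again, the mechanics here are my own sketch; Express Generator handles this internally):

```python
import random

def vary_params(params: dict, variation: float = 0.15) -> dict:
    """Jitter each integer indicator parameter by up to +/-variation."""
    return {name: max(1, round(value * (1 + random.uniform(-variation, variation))))
            for name, value in params.items()}

base = {"rsi_period": 14, "atr_period": 20}
for _ in range(3):
    print(vary_params(base))   # e.g. {'rsi_period': 12, 'atr_period': 22}
```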
---
Phase 5: Multi-Timeframe Validation (0%–100%)
This is where it gets interesting. A strategy generated on H1 should show some viability on adjacent timeframes (M30 and H4). If it completely falls apart, it's likely exploiting timeframe-specific noise rather than a real market pattern.
The workflow automatically maps adjacent timeframes:
- M5 → M1, M15
- M15 → M5, M30
- M30 → M15, H1
- H1 → M30, H4
- H4 → H1, D1
- D1 → H4
Criteria are very lenient — we're not expecting profitability, just "not catastrophic":
- Must pass at least 1 of 2 adjacent timeframes (50% pass rate)
The workflow automatically fetches data for adjacent timeframes before testing. This uses 100% of data since it's a completely different timeframe — there's no "seen/unseen" concern.
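For reference, the same map as a plain dict, the way my script keeps it:

```python
# Adjacent-timeframe map used before Phase 5 (mirrors the list above).
ADJACENT = {
    "M5":  ["M1", "M15"],
    "M15": ["M5", "M30"],
    "M30": ["M15", "H1"],
    "H1":  ["M30", "H4"],
    "H4":  ["H1", "D1"],
    "D1":  ["H4"],
}

for tf in ADJACENT["H1"]:
    print(f"fetch data and validate on {tf}")   # -> M30, H4
```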
---
Phase 6: Multi-Instrument Validation (0%–100%)
Same idea but across related currency pairs. A EURUSD strategy should show some viability on structurally related pairs that share market dynamics.
The workflow has a hardcoded mapping of related instruments:
- EURUSD → GBPUSD, USDCHF, EURGBP, EURJPY, EURCAD
- USDJPY → EURJPY, GBPJPY, AUDJPY, CHFJPY, CADJPY
- GBPUSD → EURUSD, EURGBP, GBPJPY, GBPAUD, GBPCAD
- AUDUSD → NZDUSD, AUDJPY, AUDCAD, AUDCHF, AUDNZD
Pass requirement is percentage-based: `min_pass = max(1, floor(count / 3))`, roughly 33%:
- 3 related instruments → minimum 1 must pass
- 6 related instruments → minimum 2 must pass
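That rule in code, exactly as the formula above states:

```python
import math

def min_pass(related_count: int) -> int:
    """Minimum number of related instruments that must pass (~33%)."""
    return max(1, math.floor(related_count / 3))

for n in (3, 5, 6):
    print(f"{n} related instruments -> minimum {min_pass(n)} must pass")
```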
Tagging system: The results get tracked as tags like MI4of5 (passed 4 out of 5 instruments). These tags flow through to the final output filename, so at the end you can see at a glance:
- P8_EURUSD_H1_TF2of2_MI5of5.json = fully validated, strong
- P8_EURUSD_H1_TF1of2_MI2of6.json = passed minimums, borderline
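The filename assembly is as straightforward as it looks. A sketch (the function name is mine):

```python
def output_name(symbol: str, timeframe: str,
                tf_passed: int, tf_total: int,
                mi_passed: int, mi_total: int) -> str:
    """Build the tagged final output filename."""
    return (f"P8_{symbol}_{timeframe}"
            f"_TF{tf_passed}of{tf_total}_MI{mi_passed}of{mi_total}.json")

print(output_name("EURUSD", "H1", 2, 2, 5, 5))
# -> P8_EURUSD_H1_TF2of2_MI5of5.json
```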
---
Phase 7: Cross-Broker Validation (0%–80%)
Test the strategy against a different broker's data feed. I use Eightcap as the cross-broker (ECN with tight spreads, which makes it a slightly tougher test).
This answers: is the strategy capturing a real market pattern, or is it an artifact of Premium's specific price feed?
---
Phase 8: Forward Test (80%–100%)
The final boss. The last 20% of data that the strategy has never seen in any form. No generation, no optimization, no Monte Carlo — just pure validation on unseen data.
This is the closest simulation to what will happen when you go live. If a strategy survives all 7 previous phases and still performs well on completely unseen forward data, you can have much higher confidence it has a genuine edge.
Settings are moderate (the 20% data window is smaller, so absolute metrics will be lower).
---
Real Results: USDJPY H1 Through All 8 Phases
Here's an actual strategy that made it through the entire pipeline:
+---------------+-----------------------+------------------------+-----------------------+
| Metric | Phase 3 (80% Premium) | Phase 7 (80% Eightcap) | Phase 8 (20% Forward) |
+---------------+-----------------------+------------------------+-----------------------+
| Profit | $3,939 | $4,528 | $3,763 |
| Profit/day | $0.84 | $0.96 | $3.22 |
| Profit Factor | 1.43 | 1.13 | 1.33 |
| R-squared | 86.21 | 37.61 | 72.22 |
| Return/DD | 4.87 | 2.41 | 3.66 |
| Max Drawdown | 6.6% | 18.5% | 8.31% |
| Stagnation | 19.3% | 42.6% | 16.15% |
| Trades | 517 | 907 | 346 |
+---------------+-----------------------+------------------------+-----------------------+
The strategy actually improved on forward data — profit per day jumped from $0.84 to $3.22 and drawdown stayed low at 8.3%. It also passed all adjacent timeframes (M30, H4) and all 5 related JPY instruments (EURJPY, GBPJPY, AUDJPY, CADJPY, CHFJPY).
Final output: P8_USDJPY_H1_TF2of2_MI5of5.json
---
How It All Runs: Automation
The entire pipeline is a single Windows batch script.
The script:
1. Fetches data for the main instrument
2. Runs all 8 phases sequentially
3. Automatically fetches data for adjacent timeframes before Phase 5
4. Automatically fetches data for related instruments before Phase 6
5. Prints statistics after each phase
6. Tags the final output with TF and MI results
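In pseudocode terms, the script boils down to the loop below, expressed in Python for readability. The executable name, INI file names, and fetch step are placeholders for whatever your own setup uses:

```python
import subprocess

PHASE_CONFIGS = [f"phase{i}.ini" for i in range(1, 9)]
ADJACENT = {"H1": ["M30", "H4"]}                     # see the Phase 5 mapping
RELATED = {"EURUSD": ["GBPUSD", "USDCHF", "EURGBP", "EURJPY", "EURCAD"]}

def fetch_data(symbol: str, timeframe: str) -> None:
    print(f"fetching {symbol} {timeframe} ...")      # placeholder for the real fetch step

symbol, timeframe = "EURUSD", "H1"
fetch_data(symbol, timeframe)                        # 1. main instrument
for i, config in enumerate(PHASE_CONFIGS, start=1):
    if i == 5:                                       # 3. adjacent timeframes first
        for tf in ADJACENT[timeframe]:
            fetch_data(symbol, tf)
    if i == 6:                                       # 4. related instruments first
        for related in RELATED[symbol]:
            fetch_data(related, timeframe)
    subprocess.run(["express-generator", config], check=True)   # 2. run the phase
    # 5./6. per-phase statistics print here, and the final output gets tagged
```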
---
Lessons Learned the Hard Way
1. Separate your Monte Carlo tests. Combined MC (execution + parameters) was killing 99.8% of strategies. Separated, I get much better survival rates AND know exactly why a strategy failed.
2. Don't be too strict too early. I spent months with overly tight criteria, getting zero strategies through. Better to have moderate filters across many phases than one impossibly strict filter.
3. 5% parameter variation is useless. Use 15–20% to actually stress test. If your strategy breaks at 15% variation, it WILL break in live trading.
4. `valid_tests_percent = 80` is too strict for initial discovery. Start with 65–75% and only tighten later if needed.
5. Reserve 20% of data for forward testing. Never touch it during any other phase. This is your reality check.
6. Multi-instrument testing reveals a lot. A strategy that works on 5+ related pairs is much more likely to capture a real pattern than one that only works on a single instrument.
---
Final Thoughts
This pipeline isn't magic — no system can guarantee profitable strategies. But it dramatically reduces the probability of deploying an overfitted strategy. The philosophy is simple: test from every angle, and only trust what survives everything.
A huge thank you again to Mr. Popov for Express Generator. The speed, the command-line interface, the INI configuration system, and the data percentage splits — all of these features made this kind of automated pipeline possible. And the continuous updates keep making it better. This community and this tool have been invaluable.
Let's build robust strategies together!
Cheers,
Hani