Topic: OOS vs. no OOS
The articles I've read about backtesting all basically say that you must have an out-of-sample (OOS) period to validate your strategies (which, of course, are generated over the in-sample (IS) period). Most also recommend a second OOS period, which I'm not convinced is needed.
I see that some members of the forum disagree with this approach and instead have chosen not to include an OOS period.
One argument against OOS that I have read is that stronger EAs can be created over a shorter backtesting period, and that EA Studio and FSB give us the tools to create new EAs constantly. These EAs, the theory goes, are good only for a limited time: we are simply exploiting the market's inefficiencies, and other traders and their EAs will quickly become wise to those exploits and even them out. We can then easily replace them with the constant flow of newly created EAs that are more current with the markets.
I'd like to blow some holes in these theories, or at least get some good discussion going about them.
I'll admit--I know little about trading and market tendencies. My background is in math and statistics and I see some obvious flaws with the above logic.
Though I state the information below as fact, I assure you that I understand it's just my opinion at this point and an educated guess; I have an open mind if someone can come up with solid ideas to the contrary.
I'd like to start out by asking if there is anyone successfully using this method of no OOS and frequent replacement of strategies. By "successfully", I mean using a live account for a period of several months. Some will answer by saying that they are just working out the kinks of exactly how to eliminate poor-performing EAs before live deployment, or how to weed out the bad ones before they lose too much, or when to eliminate previously winning strategies that have not done as well over the last few trades.
But by doing this, you are essentially once again "curve-fitting" based upon past experience. You can never totally escape the curve-fitting beast.
When you use an OOS period, there is NO need to run the strategies on a demo account. The demo account will prove nothing beyond what your OOS period showed you. The demo account runs very efficiently--without slippage or other major problems--just like backtesting. Have you ever tried deploying the same portfolio of strategies on a live server and a demo server at the same time over a period of one month? You might be stunned at the difference. The same portfolio can skyrocket on the demo account while falling on the live one.
Instead, you should deploy your EAs on a CENT account. Yes, the spreads are higher, but you are risking very little. Treat anything you deposit in your CENT account as a total loss, though. Strategies that are effective on CENT accounts can be transferred to larger accounts. Once a trader is very proficient and has sufficient capital ("sufficient" == much more than you think!), it will probably be more efficient to deploy strategies directly into a larger account with lower spreads.
But don't kid yourself: very few of us know what the Hell we're doing--and you are probably not one of them! I've made most or all of my living playing Internet and live poker over the past 20 years, and one reason that I have made so much money is that bad players think they are good. They lose over and over and over again and believe it's just bad luck. And suddenly they have a WINNING month! And they PROVE to themselves that they are GREAT players. And then the bad luck starts all over again...Darn the luck!
I do tend to agree with the argument that you are "wasting" a certain time period if your OOS period comes after the IS. I believe it could be correct to reverse IS and OOS, putting the OOS period first. That way, the OOS data is still adjacent to the IS data, allowing good validation, and the optimized data is up to date. On the other hand, it's possible that it's correct to leave the OOS last and instead optimize over only the OOS data.
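Both layouts can be expressed as a single chronological split. This is a minimal sketch, assuming the price history is a chronologically ordered list of bars; `split_is_oos` and its parameters are hypothetical names for illustration, not an FSB or EA Studio setting.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_is_oos(bars: Sequence[T], oos_fraction: float = 0.2,
                 oos_first: bool = False) -> Tuple[List[T], List[T]]:
    """Split chronologically ordered bars into (in_sample, out_of_sample).

    oos_first=False: conventional layout, the OOS slice is the most recent data.
    oos_first=True:  reversed layout, the OOS slice is the oldest data, so the
                     data used for optimization (IS) is the most recent.
    Either way the two slices meet at a single boundary, i.e. they are adjacent.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    bars = list(bars)
    n_oos = round(len(bars) * oos_fraction)
    if oos_first:
        return bars[n_oos:], bars[:n_oos]
    cut = len(bars) - n_oos
    return bars[:cut], bars[cut:]


# Usage: ten bars numbered oldest (0) to newest (9), 20% OOS.
in_sample, out_sample = split_is_oos(range(10))
rev_in, rev_out = split_is_oos(range(10), oos_first=True)
```

With ten bars, the conventional layout optimizes on bars 0 to 7 and validates on 8 and 9; the reversed layout validates on 0 and 1 and optimizes on the most recent bars, 2 to 9.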
But why is an OOS period necessary?
Shouldn't it be enough to run IS and deploy the EA live until its effectiveness wears out? This seems logical at first. After all, this generated strategy ran so well over the last X months that it surely won't just quit working suddenly. Here is where curve-fitting comes into play. I've seen it suggested that more data increases the likelihood of curve fitting, when in fact exactly the opposite is true. That is why strategies generated over a shorter time period have MUCH higher metrics. The fewer trades there are, the more likely it is that one or two or a small handful of anomalous trades will adversely influence the strategy's optimized parameters.
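A toy calculation shows how much weight a single anomalous trade carries in a short test compared with a long one. This is a hypothetical sketch: the pip values are made up, and every "normal" trade is assumed identical so that only the effect of one outlier is visible.

```python
from typing import Tuple


def outlier_effect(n_trades: int, typical: float = 2.0,
                   outlier: float = 400.0) -> Tuple[float, float]:
    """Return (average trade, share of net profit from the outlier) for a
    hypothetical history of n_trades - 1 identical 'typical' trades plus one
    anomalous 'outlier' trade. All values are in pips and purely illustrative."""
    trades = [typical] * (n_trades - 1) + [outlier]
    net = sum(trades)
    return net / n_trades, outlier / net


short_avg, short_share = outlier_effect(20)    # short test period
long_avg, long_share = outlier_effect(500)     # long test period
print(f"20 trades:  avg {short_avg:.2f} pips, {short_share:.0%} of profit from one trade")
print(f"500 trades: avg {long_avg:.2f} pips, {long_share:.0%} of profit from one trade")
```

Over 20 trades, the single spike lifts the average trade to 21.9 pips and supplies about 91% of the net profit; over 500 trades, the average is about 2.8 pips and the spike supplies about 29%. An optimizer scoring the short history is mostly rewarding that one trade.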
The more trades a system makes and the longer the time period tested, the better the model will be. Now, this may not be entirely true: at some point, additional length will add little benefit (diminishing returns). I'm not sure where that point is. My guess is that it is much longer than most people want to believe.
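The diminishing returns follow from the usual square-root law for an average. This sketch assumes independent trades (real trades are not perfectly independent) and uses a hypothetical spread of 10 pips per trade.

```python
import math


def standard_error(sd_per_trade: float, n_trades: int) -> float:
    """Standard error of the average trade, assuming independent trades."""
    return sd_per_trade / math.sqrt(n_trades)


# Hypothetical: trades scatter with a standard deviation of 10 pips.
for n in (50, 200, 800, 3200):
    print(f"{n:>5} trades: average trade known to about +/- {standard_error(10.0, n):.2f} pips")
```

Each fourfold increase in trades only halves the uncertainty of the average trade. Precision keeps improving with more data, but slowly, which fits the guess that the useful length is longer than most people expect.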
Referencing poker once again...A primary reason why there are so many bad players is that they take a wrong action once and by DUMB luck they win. They believe they played brilliantly, when in fact they won mainly due to another factor. (Opponent had a weaker hand and folded, their "miracle card" showed up, etc.) In the future, they are much more inclined to take this same wrong action. If they lose the next ten times they do it, they selectively remember the one time they won with that action. Over the months, they build up a whole collection of horrible plays that cost them money. And what's worse...the good players will recognize these tendencies and exploit them further.
The reason that metrics are higher when testing shorter time periods is exactly that a better-looking system can be fitted to a small amount of data than to a large amount.
But wait, doesn't the market change, so that if we use only the most recent data we can take advantage of current trends? Eeeerrr...maybe, yes. I do not know enough to say for sure one way or the other. But here is the problem: using smaller time periods, as mentioned above, results in higher metrics. And you will find MANY, MANY more strategies if you generate on the 30M timeframe over 2 years of data as opposed to 15 years. The trouble is that you can find patterns in any data if you look hard enough. And FSB/EA Studio looks VERY hard; it sorts through SO many strategies that many of those it does find are going to be false strategies. By dumb luck, FSB will find patterns where there really are none...or at least none significant enough to keep the shown profit from turning into a loss.
The more data you generate strategies over, the lower the metrics will be and the fewer strategies will be generated (for the same Acceptance Criteria)...BUT the more you can depend on these strategies.
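The effect of searching hard through noise can be simulated with strategies that have no edge at all. This sketch is a stand-in for a generator, not how FSB or EA Studio work internally: each "strategy" is just random trades with zero mean, the single acceptance criterion (a minimum profit factor of 1.3) is a hypothetical value, and the trade counts standing in for short and long histories are assumptions.

```python
import random
from typing import List


def profit_factor(trades: List[float]) -> float:
    """Gross profit divided by gross loss."""
    gross_win = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_win / gross_loss if gross_loss > 0 else float("inf")


def count_false_strategies(n_strategies: int, n_trades: int,
                           min_pf: float = 1.3, seed: int = 1) -> int:
    """Count zero-edge (pure noise) strategies that still pass a minimum
    profit-factor acceptance criterion."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_strategies):
        # Every trade is random noise with mean 0: there is no real pattern.
        trades = [rng.gauss(0.0, 1.0) for _ in range(n_trades)]
        if profit_factor(trades) >= min_pf:
            passed += 1
    return passed


short_history = count_false_strategies(2000, 50)    # e.g. a couple of years of trades
long_history = count_false_strategies(2000, 400)    # e.g. well over a decade
print(short_history, "vs", long_history, "worthless strategies accepted")
```

Every accepted strategy here is worthless by construction, yet far more of them clear the same criterion on the short history. Longer data does not make the search smarter; it just makes luck harder to mistake for skill.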
So then, you may ask, why not generate over a larger amount of data, but still use a strategy of quickly weeding out EAs as they do not perform well?
This is getting closer to the right approach. It just needs some refinement. How long is "long enough" to test a strategy on a live server? Even a relatively solid strategy can go through some poor periods of stagnation. One thing I think we can all agree on: no strategy can EVER be expected to perform better over a period of time in live trading than it did in backtesting. It can only get worse.
Is one month "long enough" for a 30M timeframe strategy? No way. Any strategy can easily exhibit poor performance for that period of time. Of course, if it's losing money hand over fist, that's different. There must be a minimum acceptable measure somewhere. But one month is surely not enough of a sample size to produce any meaningful result, whether winning or losing. Again, FSB searches through so many possibilities that it will still come up with many erroneous strategies, even with longer time period generation.
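A rough estimate shows why a month settles so little. This sketch uses a normal approximation with independent trades; the edge (10% of one trade's standard deviation) and the trade counts (about 20 per month) are hypothetical numbers chosen for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_losing_period(edge_per_trade: float, sd_per_trade: float,
                       n_trades: int) -> float:
    """Approximate chance that a strategy with the given true edge per trade
    still finishes a period of n_trades with a net loss."""
    z = edge_per_trade * math.sqrt(n_trades) / sd_per_trade
    return normal_cdf(-z)


p_month = prob_losing_period(0.1, 1.0, 20)    # about 20 trades in a month
p_year = prob_losing_period(0.1, 1.0, 240)    # about 240 trades in a year
p_no_edge = prob_losing_period(0.0, 1.0, 20)  # a strategy with no edge at all
```

Under these assumptions, a genuinely profitable strategy loses in roughly one month out of three, while a strategy with no edge wins half its months. A single month barely separates the two; a year of trades does much better.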
I propose the following system, the logistics of which might be a nightmare:
Start every EA out on a CENT account. Trade only the minimum possible position on this account. When an EA reaches a certain performance level, promote it to a main account, trading 0.01 lots. As the EA continues to prove its performance, slowly add 0.01 lot increments. (This requires a reasonably large account.) If its performance declines, adjust lot sizes accordingly, or demote it back to the CENT account. Note that an EA wouldn't automatically go back to the CENT account after poor performance over a given recent time period; that decision would depend upon the entire lifespan performance of the EA.
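The promotion and demotion rules could be reviewed periodically for each EA with something like the following. This is a minimal sketch: every threshold (minimum lifetime trades, profit factor levels, lot cap) is a hypothetical placeholder, and profit factor is only one possible performance measure. As proposed above, each decision looks at the EA's whole lifetime record rather than its recent trades.

```python
from typing import List, Tuple

# All thresholds below are hypothetical placeholders, not recommendations.
MIN_TRADES = 100   # no decision until the EA has this many lifetime trades
PROMOTE_PF = 1.2   # lifetime profit factor needed to leave the CENT account
ADD_LOT_PF = 1.3   # lifetime profit factor needed to add another 0.01 lot
DEMOTE_PF = 1.0    # below this, the EA goes back to the CENT account
LOT_STEP = 0.01    # also used as the minimum position on either account
MAX_LOTS = 0.10    # cap that would depend on the size of the main account


def profit_factor(trades: List[float]) -> float:
    wins = sum(t for t in trades if t > 0)
    losses = -sum(t for t in trades if t < 0)
    return wins / losses if losses > 0 else float("inf")


def review(stage: str, lots: float, lifetime_trades: List[float]) -> Tuple[str, float]:
    """Return the new (stage, lot size) for one EA; stage is 'cent' or 'main'."""
    if len(lifetime_trades) < MIN_TRADES:
        return stage, lots                     # sample too small to judge
    pf = profit_factor(lifetime_trades)
    if stage == "cent":
        if pf >= PROMOTE_PF:
            return "main", LOT_STEP            # promote at the smallest size
        return "cent", lots
    if pf < DEMOTE_PF:
        return "cent", LOT_STEP                # demote to the minimum CENT position
    if pf >= ADD_LOT_PF:
        return "main", round(min(lots + LOT_STEP, MAX_LOTS), 2)
    return "main", round(max(lots - LOT_STEP, LOT_STEP), 2)  # weakening: scale down
```

For example, an EA with 120 lifetime trades and a profit factor of 1.5 would be promoted from the CENT account at 0.01 lots, and on later reviews would add 0.01 lots at a time up to the cap.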
Absent a sizeable main account, you could still promote strategies to the main account and use only 0.01 lot sizes.
Oh yes, back to the main topic! OOS is necessary to determine whether the optimized solution was overly curve-fit. Without OOS, you will need to rely on a demo account for your validation of a successful strategy.
Here's an experiment for those of you who are still not convinced about using an OOS period: Set OOS to 20% (representing the short period of usefulness) and generate 100 strategies that you would deem viable to place on a live server. Do this first without viewing the OOS portion (use Acceptance Criteria only over the IS). Then view each OOS period and count how many of those strategies actually ended in profit...and approximately how much profit/loss you incurred. I will bet that 30% or less of those strategies show a profit in the OOS.
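If the trade lists are exported, the tally for this experiment is simple bookkeeping. This sketch assumes each strategy's trades are available as (close time, profit/loss) pairs; the function name, data layout, and the `accept` rule in the usage example are hypothetical stand-ins for whatever Acceptance Criteria you actually use.

```python
from typing import Callable, Dict, List, Tuple

Trade = Tuple[float, float]  # (close time, profit/loss)


def oos_experiment(strategies: Dict[str, List[Trade]], cutoff: float,
                   accept: Callable[[List[float]], bool]) -> Tuple[int, int, float]:
    """Accept strategies using IS trades only, then tally their OOS results.

    Returns (accepted, accepted_and_profitable_in_oos, total_oos_pnl).
    """
    accepted = profitable = 0
    oos_total = 0.0
    for trades in strategies.values():
        is_pnl = [p for t, p in trades if t < cutoff]
        oos_pnl = [p for t, p in trades if t >= cutoff]
        if not accept(is_pnl):  # the acceptance criteria never see OOS data
            continue
        accepted += 1
        oos_sum = sum(oos_pnl)
        oos_total += oos_sum
        if oos_sum > 0:
            profitable += 1
    return accepted, profitable, oos_total


# Usage with toy data: times 0..9, the last 20% (times 8 and 9) is OOS.
toy = {
    "A": [(t, 1.0) for t in range(10)],                            # holds up in OOS
    "B": [(t, 1.0) for t in range(8)] + [(8, -3.0), (9, -1.0)],    # fails in OOS
    "C": [(t, -1.0) for t in range(10)],                           # rejected on IS
}
result = oos_experiment(toy, cutoff=8, accept=lambda pnl: sum(pnl) > 0)
```

In the toy data, two strategies pass the IS filter, only one of them is profitable in the OOS, and together they lose 2.0 over the OOS period.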
As a final note, and I know this statement will get a lot of flak (because it will eliminate so many more strategies, something most people don't find appealing...but get over it, more profit is your ultimate concern!): Run your Monte Carlo tests over only OOS data!
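Restricting Monte Carlo to OOS data simply means feeding it the OOS trade list only. This sketch uses bootstrap resampling of trades (sampling with replacement), which is just one kind of Monte Carlo test and not necessarily what FSB or EA Studio do internally; the run count, seed, and 95th-percentile drawdown summary are hypothetical choices.

```python
import random
from typing import List, Tuple


def max_drawdown(trades: List[float]) -> float:
    """Largest peak-to-trough drop of the cumulative profit curve."""
    peak = equity = worst = 0.0
    for pnl in trades:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def monte_carlo_oos(oos_trades: List[float], runs: int = 1000,
                    seed: int = 7) -> Tuple[float, float]:
    """Resample the OOS trades with replacement and return
    (fraction of runs ending in profit, 95th-percentile max drawdown)."""
    if not oos_trades or runs < 1:
        raise ValueError("need at least one OOS trade and one run")
    rng = random.Random(seed)
    n = len(oos_trades)
    profitable = 0
    drawdowns = []
    for _ in range(runs):
        sample = [rng.choice(oos_trades) for _ in range(n)]
        if sum(sample) > 0:
            profitable += 1
        drawdowns.append(max_drawdown(sample))
    drawdowns.sort()
    return profitable / runs, drawdowns[int(0.95 * (runs - 1))]
```

Because the OOS trade list is much shorter than the full history, the resampled results scatter more widely, so fewer strategies pass; that is the extra elimination the paragraph above argues for.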
Let me know if you have any specific ideas with regard to these issues!
ThanX!
John