Topic: OOS vs. no OOS

From articles that I've read about backtesting, they all basically say that you must have an OOS period to validate your strategies (which of course are generated over IS). (Most also recommend a 2nd OOS period, which I'm not convinced is needed)

I see that some members of the forum disagree with this approach and instead have chosen not to include an OOS period.

One argument against OOS that I have read was that they believe that stronger EAs can be created over a shorter backtesting period and that EA Studio and FSB give us the tools to be able to constantly create new EAs. These EAs, the theory goes, are good only for a limited time, as we are simply exploiting the market's inefficiencies and other traders and their EAs will quickly become wise to those exploits and even the market out. And we can easily replace them with the constant flow of newly-created EAs that are more current with the markets.

I'd like to blow some holes in these theories, or at least get some good discussion going about them.

I'll admit--I know little about trading and market tendencies. My background is in math and statistics and I see some obvious flaws with the above logic.

Though I state the information below as fact, I assure you that I understand it's just my opinion at this point and an educated guess; I have an open mind if someone can come up with solid ideas to the contrary.

I'd like to start out by asking if there is anyone successfully using this method of no OOS and frequent replacement of strategies. By "successfully", I mean using a live account for a period of several months. Some will answer by saying that they are just working out the kinks of exactly how to eliminate poor-performing EAs before live deployment, or how to weed out the bad ones before they lose too much. Or when to eliminate previously winning strategies that have not been doing as well the last few trades.

But by doing this, you are essentially once again "curve-fitting" based upon past experience. You can never totally escape the curve-fitting beast.

When you use an OOS period, there is NO need to run the strategies on a demo account. The demo account will prove nothing beyond what your OOS period showed you. The demo account runs very efficiently--without slippage or other major problems--just like backtesting. Have you ever tried deploying the same portfolio of strategies on a live server and a demo server at the same time over a period of one month? You might be stunned at the difference. One account can skyrocket on the demo and fall on the live.

Instead, you should deploy your EAs on a CENT account. Yes, the spreads are higher, but you are risking very little. Plan on anything deposited in your CENT account as a total loss, though. Strategies that are effective on CENT accounts can be transferred to larger accounts. Once a trader is very proficient and has sufficient capital ("sufficient" == much more than you think!), then it will probably be more efficient to deploy strategies directly into a larger account with lower spread.

But don't kid yourself: very few of us know what the Hell we're doing--and you are probably not one of them! I've made most or all of my living playing Internet and live poker over the past 20 years, and one reason that I have made so much money is that bad players think they are good. They lose over and over and over again and believe it's just bad luck. And suddenly they have a WINNING month! And they PROVE to themselves that they are GREAT players. And then the bad luck starts all over again...Darn the luck!

I do tend to agree with the argument that you are "wasting" a certain time period if your OOS period occurs after the IS. I believe it could be correct to reverse IS and OOS. In that way, the OOS data is adjacent to the IS data to allow for good validation, and the optimized data is up-to-date. On the other hand, it's possible that it's correct to leave the OOS last and instead optimize over only the OOS data.

But why is an OOS period necessary?

Shouldn't it be enough to run IS and deploy the EA live until its effectiveness wears out? This seems logical at first. After all, this generated strategy ran so well over the last X months that it surely won't just quit working suddenly. Here is where curve-fitting comes into play. I've seen it suggested that more data increases the likelihood of curve fitting, when in fact exactly the opposite is true. That is why strategies generated over a shorter time period have MUCH higher metrics. The fewer trades there are, the more likely it is that one or two or a small handful of anomalous trades will adversely influence the strategy's optimized parameters.

The more trades that a system makes, the longer the time period tested, the better the model will be. Now, this may not be entirely true. There will be some point where the additional length of the time period will have little added benefit. (diminishing returns) I'm not sure at what point that is. My guess is that it is much longer than most people want to believe.

Referencing poker once again...A primary reason why there are so many bad players is that they take a wrong action once and by DUMB luck they win. They believe they played brilliantly, when in fact they won mainly due to another factor. (Opponent had a weaker hand and folded, their "miracle card" showed up, etc.) In the future, they are much more inclined to take this same wrong action. If they lose the next ten times they do it, they selectively remember the time they won with that action. Over the months, they have a whole collection of horrible plays that cost them money. And what's worse...the good players will recognize these tendencies and exploit them further.

The reason that metrics are higher while testing shorter time periods is exactly that a better system can be fitted to a smaller amount of data than a large amount.

But wait, doesn't the market change, so if we use only the most recent data we can take advantage of current trends? Eeeerrr...maybe, yes. I do not know enough to say for sure one way or the other. But here is the problem: Using smaller time periods, as mentioned above, results in higher metrics. And you will find MANY, MANY more strategies if you generate a 30M timeframe over 2 years as opposed to 15 years. But the problem is that you can find patterns in any data if you look hard enough. And FSB/EA Studio looks VERY hard; it sorts through SO many strategies that many of those that it does find are going to be false strategies. By dumb luck, FSB will find patterns where there are really none...or at least not significant enough so that the shown profit becomes a loss.
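The "look hard enough and you will find patterns" effect can be illustrated with a toy simulation. This is only a sketch under stated assumptions: every "strategy" here is pure noise (per-trade profits drawn from a zero-mean normal distribution), and `best_of_n_noise` is a hypothetical helper, not anything FSB or EA Studio does internally. It keeps the strategy with the best in-sample average and then looks at the same strategy on fresh trades.

```python
import random
import statistics


def best_of_n_noise(n_strategies, n_trades, seed=0):
    """Search n_strategies zero-edge 'strategies' and keep the best in-sample one.

    Returns (best in-sample mean profit per trade, the winner's mean on new trades).
    """
    rng = random.Random(seed)
    best_is_mean = max(
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        for _ in range(n_strategies)
    )
    # Every strategy is pure noise, so the winner's future trades are just
    # fresh draws from the same zero-mean distribution.
    oos_mean = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
    return best_is_mean, oos_mean


if __name__ == "__main__":
    for n_trades in (50, 500):
        for n_strategies in (10, 1000):
            best_is, oos = best_of_n_noise(n_strategies, n_trades)
            print(f"{n_strategies:>5} searched, {n_trades:>3} trades each: "
                  f"best IS mean {best_is:+.3f}, same strategy OOS {oos:+.3f}")
```

In runs of this kind the best in-sample figure tends to grow as more candidates are searched and to shrink as each candidate gets more trades, while the out-of-sample figure stays near zero. Real strategies are not pure noise, so this shows the direction of the bias, not its size.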

The more data you generate strategies over, the lower the metrics will be and the fewer strategies will be generated (for the same Acceptance Criteria)...BUT the more you can depend on these strategies.

So then, you may ask, why not generate over a larger amount of data, but still use a strategy of quickly weeding out EAs as they do not perform well?

This is getting closer to the right approach. It just needs some refinement. How long is "long enough" to test a strategy on a live server? Even a relatively solid strategy can go through some poor periods of stagnation. One thing I think we can all agree on: no strategy can EVER be expected to perform better over a period of time in live trading than it did in backtesting. It can only get worse.

Is one month "long enough" for a 30M timeframe strategy? No way. Any strategy can easily exhibit poor performance for that period of time. Of course, if it's losing money hand-over-fist, that's different. There must be a least acceptable measure somewhere. But one month is surely not enough of a sample size to produce any meaningful result--whether it be winning or losing. Again, FSB searches through so many possibilities that it will still come up with many erroneous strategies, even with longer time period generation.

I propose the following system, the logistics of which might be a nightmare:

Start every EA out on a CENT account. Trade only the minimum possible position on this account. When an EA reaches a certain performance level, promote it to a main account, trading 0.01 lots. As the EA continues to prove its performance, slowly increase its size in 0.01-lot steps (this will take a certain size of account). If its performance declines, adjust lot sizes accordingly, or demote it back to the CENT account. Note that an EA wouldn't automatically go back to the CENT account after poor performance over a given recent time period; it would depend upon the entire lifespan performance of the EA.

Absent a sizeable main account, you could still promote strategies to the main account and use only 0.01 lot sizes.

Oh yes, back to the main topic! OOS is necessary to determine whether the optimized solution was overly curve-fit. Without OOS, you will need to rely on a demo account for your validation of a successful strategy.

Here's an experiment for those of you who are still not convinced about using an OOS period: Set OOS to 20% (representing the short period of usefulness) and generate 100 strategies that you would deem viable to place on a live server. Do this first without viewing the OOS portion (use Acceptance Criteria only over the IS). Then, you can view each OOS period and count how many of those strategies actually ended in profit...and approximately how much profit/loss you incurred. I will bet that 30% or less of those strategies show a profit in the OOS.
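If the per-trade results of each generated strategy can be exported, the counting step of this experiment could be sketched as below. The assumptions are loud ones: `oos_survival` is a hypothetical helper, each strategy is just a chronological list of trade profits, the IS/OOS split is made by trade count (not by bars or dates), and the only acceptance criterion is a minimum IS net profit.

```python
def oos_survival(strategies, oos_fraction=0.2, min_is_profit=0.0):
    """Accept strategies on In Sample profit only, then score them Out of Sample.

    strategies: list of lists of per-trade profits, oldest trade first.
    Returns (number accepted, share of accepted that are profitable OOS,
             total OOS profit of the accepted strategies).
    """
    accepted = 0
    profitable_oos = 0
    oos_total = 0.0
    for trades in strategies:
        oos_len = round(len(trades) * oos_fraction)
        split = len(trades) - oos_len
        is_part, oos_part = trades[:split], trades[split:]
        if sum(is_part) <= min_is_profit:
            continue  # rejected on IS alone; its OOS is never consulted
        accepted += 1
        oos_profit = sum(oos_part)
        oos_total += oos_profit
        if oos_profit > 0:
            profitable_oos += 1
    share = profitable_oos / accepted if accepted else 0.0
    return accepted, share, oos_total


# Tiny made-up example: three strategies of ten trades each.
example = [
    [1, 1, 1, 1, 1, 1, 1, 1, -3, -1],   # good IS, losing OOS
    [1] * 10,                           # good IS, good OOS
    [-1] * 8 + [5, 5],                  # bad IS, rejected before OOS is viewed
]
print(oos_survival(example))
```

The point of the design is that the OOS slice plays no part in acceptance, so the share it reports is an honest count of how often IS-selected strategies held up.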

As a final note, and I know this statement will get a lot of flak (because it will eliminate so many more strategies, something most people don't find appealing...but get over it, more profit is your ultimate concern!): Run your Monte Carlo tests over only OOS data!

Let me know if you have any specific ideas with regard to these issues!

ThanX!
John

Re: OOS vs. no OOS

This is great -- and I disagree with a lot of what you say, but that's the point, right?

I'm an empiricist at heart and a strong advocate of the KISS principle.  I like the approach of throwing 100 EAs at the wall and then going with the ones that stick.  I no longer have much interest in which ingredients go into each EA or how it was tested -- if it trades well then it's a "winner" in my book.  Based on some wise advice from one of the more experienced traders on this forum, I only use live micro accounts for testing -- no more demo accounts for me.

Suppose my Data Horizon includes 3000 bars of data.  When I employ 33% OOS that means the first 2000 bars are used to optimize the settings -- i.e. the OLDEST data is used to optimize my strategy (which makes no sense to me).  Furthermore, since forex data is cyclic in nature, then who's to say which 2000 bars (67%) is the best to use.  Maybe I should use the middle 67% or the latest 67% (which makes the most sense).  My point is that though OOS is used with the intent of selecting for "robust" strategies, it still can not predict the future (because of the nature of forex data).
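To make the bar arithmetic concrete, here is a small sketch of the split, including the alternative of putting the OOS slice on the oldest data so that optimization uses the most recent bars. It assumes index 0 is the oldest bar and that the OOS share is a whole-number percentage; `split_bars` and `oos_position` are hypothetical names, not FSB or EA Studio settings. Note that 33% of 3000 bars is 990, so the In Sample part is 2010 bars, which the post rounds to 2000.

```python
def split_bars(n_bars, oos_percent, oos_position="end"):
    """Return (in_sample, out_of_sample) as ranges of bar indices.

    oos_position="end":   optimize on the oldest bars, validate on the newest.
    oos_position="start": validate on the oldest bars, optimize on the newest.
    """
    oos_len = n_bars * oos_percent // 100
    is_len = n_bars - oos_len
    if oos_position == "end":
        return range(0, is_len), range(is_len, n_bars)
    if oos_position == "start":
        return range(oos_len, n_bars), range(0, oos_len)
    raise ValueError("oos_position must be 'end' or 'start'")


for position in ("end", "start"):
    in_sample, out_of_sample = split_bars(3000, 33, position)
    print(position, "IS bars:", in_sample, "OOS bars:", out_of_sample)
```

Either placement keeps the two slices adjacent and non-overlapping; which one is preferable is exactly what this thread is debating.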

Here's an example -- suppose I give you two strategies whose balance charts and statistics are very similar -- one used 30% OOS during back testing and the other used In Sample.  When I drop those into a live account can you predict which one will perform better?  Probably not -- no one could. 

I have nothing against OOS other than I can't convince myself there is a compelling reason to use it and because it causes me to lose out on many strategies that would have probably traded just fine if I had instead chosen In Sample optimization.  My experience is that strategies with good statistics have about a 40-60% chance of performing well in my live micro account regardless of whether I use OOS, Monte Carlo or Multi-Market.

Just my two cents...

Re: OOS vs. no OOS

What time period do you generate over and how long do you run these strategies on a live account before you deem them acceptable or not?

Since, as you say, Forex data is cyclical in nature, then isn't this an argument for generation over more data and adding multi-timeframe analysis? This is one reason why I believe that more data is better; if you use insufficient data, then you may have just analyzed an entire chunk of data that is trending the opposite way to what the market is doing now. At any rate, the trend is likely to reverse in the near future, so a strategy of discarding many EAs that have stopped working will be self-fulfilling.

How long have you been working with this approach and what is your general monetary result?

Don't get me wrong, I also love the idea of running 100s of EAs. In fact, I believe it's far superior to spending time "perfecting" 30 or 40 of them.

I just think that a little extra effort and selectivity "should" produce much better overall results. Similar to the "80/20 rule". And EA Studio/FSB can make so many fantastic strategies that it will be impossible for me to test them all out anyway...so why not choose the ones that have a better chance? (Yes, I'm running on the assumption that your last statement about OOS and MC is incorrect. As for Multi-Market, I don't really even understand why it's used. I mean, I understand the argument for using it, I just don't believe it to be of much benefit. But again, I'm a novice trader and have my own crazy ideas!)

Another question: How long do you expect your EAs to survive with this method? I'm concerned that there wouldn't be enough time to get them out of the Micro account and onto a real one before they turned.

I'm certainly still open to your method, and given enough computer resources I may try it out.

ThanX for the discussion!
John

P.S. You said, "Here's an example -- suppose I give you two strategies whose balance charts and statistics are very similar -- one used 30% OOS during back testing and the other used In Sample.  When I drop those into a live account can you predict which one will perform better?  Probably not -- no one could."

I completely agree with that. With the same argument, you could take two strategies generated in the same manner...one with SQN of 2.2 and another with SQN of 1.4 (substitute any other metric if you wish). While these metrics are significantly different, no one can predict for certain which system will actually perform better. That's the nature of historical data. We can only make our best guess. And given the choice (assuming this is the only information given), I would choose the higher SQN.

Re: OOS vs. no OOS

I'll try to answer some questions -- but if I skip some or my answers are rather general, then it probably is intentional.  I'd prefer not to get into specifics because (a) we all have to find our own way, and (b) I don't want to be blamed if you lose money because you took my answer too literally.

Regarding the examples we both posed -- I'd say they were different.  I proposed two strategies that were statistically indistinguishable.  You proposed two strategies that were distinguishable.  I would always go with the one with the better stats.  That doesn't mean it will always win, it just means that in the big picture the probability of winning is better when using strategies with better stats.  Like rolling dice, the probability of rolling a 6 or 7 is higher than 8 or 9.  So, if I had to bet I would go with 6 or 7.  But sometimes the next throw is an 8 or 9.  That's just the way probability works.
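The dice comparison can be checked by enumeration. This sketch assumes the usual reading of the analogy, the sum of two fair six-sided dice; `prob_sum_in` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product


def prob_sum_in(targets):
    """Exact probability that the sum of two fair six-sided dice is in `targets`."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b in targets)
    return Fraction(hits, len(outcomes))


print("P(6 or 7) =", prob_sum_in({6, 7}))
print("P(8 or 9) =", prob_sum_in({8, 9}))
```

Under that assumption the first bet wins 11 times in 36 and the second 9 times in 36, so the edge is real but small, and the less likely outcome still shows up often.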

Regarding forex data -- yes, it's cyclical.  But I don't think we have a clue as to how long a cycle lasts.  More data is better -- but only to a certain point.  Everything is a trade-off.  Everything is a double-edged sword.  You only need enough data to generate statistically significant statistics.  Another example -- suppose I have a strategy that I back tested for 10 years and it gives a 70% win rate -- is it a good strategy?  Some people would say 'yes', but I would never use it.  That 70% over 10 years is an average -- that is, there could be periods of 90% and there could be periods of 30%.  And if we are currently in one of those 30% periods or cycles then I'm toast.  Those cycles could last for days, weeks, months, I don't know.  I much prefer to rely on statistics using the most recent data -- that's what makes sense to me.
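The "70% is only an average" point can be shown with a rolling win rate. The trade history below is invented purely for illustration (200 trades at 90% followed by 100 trades at 30%), and `rolling_win_rates` is a hypothetical helper; real regimes will not be this clean.

```python
def rolling_win_rates(wins, window):
    """Win rate over every run of `window` consecutive trades (1 = win, 0 = loss)."""
    return [sum(wins[i:i + window]) / window
            for i in range(len(wins) - window + 1)]


# Hypothetical history: 200 trades won 90% of the time, then 100 trades at 30%.
good_regime = ([1] * 9 + [0]) * 20
bad_regime = ([1] * 3 + [0] * 7) * 10
history = good_regime + bad_regime

overall = sum(history) / len(history)
rates = rolling_win_rates(history, window=50)
print(f"overall {overall:.0%}, best 50-trade window {max(rates):.0%}, "
      f"worst {min(rates):.0%}, most recent {rates[-1]:.0%}")
```

The overall figure for this made-up history is 70%, yet the most recent 50 trades win only 30% of the time, which is the situation described above. The same calculation on a real backtest shows how far the long-run average hides recent behaviour.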

I enjoy reading this forum and I learn a lot.  In general there seem to be two types of traders -- and Popov's software accommodates both kinds.  There are people who are into the details of each strategy and indicator and they tend to wrestle with the software in order to make it conform to their preferences.  And then there are others who trust the results and simply go with what it produces.  I prefer the latter approach.  I've learned to trust the software and I'll take what it gives me.  I set up certain Acceptance Criteria and let it run for a few days.  When that's done I'll use the filters and further select those strategies with the best statistics that I like.  I'll end up with maybe 60-70 strategies that I will then add to a micro account.  I'll let that trade for a while and then prune it using MT4 Tracker.

Up until now I've only been using micro accounts -- I have several.  And I'm happy with the results.  The micro accounts allow me to continue testing 100's of EAs without risking a lot of money.

Re: OOS vs. no OOS

sleytus wrote:

......Here's an example -- suppose I give you two strategies whose balance charts and statistics are very similar -- one used 30% OOS during back testing and the other used In Sample.  When I drop those into a live account can you predict which one will perform better?  Probably not -- no one could. .....................

You are completely right, no one could. But from my feeling and experience I would trust the OOS type more ;-)

Re: OOS vs. no OOS

@sleytus: Thank you for your thoughtful answers. I am going to purchase EA Studio (in addition to FSB) and try the mass EA approach. I'm still going to run them through OOS and MC, but not be quite as selective as I will with the methods I develop using FSB.

And you would never make me lose money. Only I can make me lose money. :) (Well, unless you robbed me, I guess)

Re: OOS vs. no OOS

With OOS you can get lucky, especially if the same period is used over and over. This check is not a holy grail, and neither is the more advanced Walk Forward Analysis. With lots of tries you just get lucky. I did not use the automatic OOS in FSB. First I generated strategies without an OOS check, with some data held back using the time horizon tool. Then I took the few best strategies that I liked and checked them once against the unseen data. If they passed, I took all the data and optimized again from the default variable values. Actually, now I am thinking of using the generator more as a strategy idea generator, and reoptimizing from the default indicator values. Somehow I believe it would produce less curve-fitted systems, but I could be very wrong. I also used lots of data and some rules, like at least 50 trades for each variable.

What is very interesting is that about 50% of the strategies in my portfolio made a profit this way, just like with Sleytus's way of short periods and no OOS check. Maybe it is no big deal how you do it? Do you still get a roughly random 50% of working strategies? If that is true, portfolio management becomes very important. Cut losers quick and let winners run :) I have set each strategy's stop point to 1.5 times its max backtested drawdown, and only 3 strategies out of 30 have reached that point in 7 months. So this way of getting rid of bad strategies is very slow... May I ask what criteria Sleytus uses to stop strategies?

Probably the only way to find what works is to experiment with different ideas and choose the one with better real-time trading results.

Re: OOS vs. no OOS

This discussion is helping me clarify (in my own mind) my thoughts about OOS. My basic stance hasn't changed, but it's easier for me to understand what I'm thinking--and I do agree to an extent with both Irmantas and sleytus. Perhaps there's even a chance that I will move much closer to their side of thinking.

*Irmantas said: "With OOS you can get lucky, especially if the same period is used over and over. This check is not a holy grail, and neither is the more advanced Walk Forward Analysis. With lots of tries you just get lucky."

Yes! That is very true. FSB and EA Studio examine so many potential strategies that not only will they "discover" many patterns in the In Sample data that are not REAL patterns, but a portion of those fake patterns will also (by DUMB luck) continue into the OOS period. I compare this to a very bad poker player who wins a lot of money over a certain period of time--or a craps or blackjack player who thinks he's figured out a winning system--he will inevitably lose that money (and more) back. No matter how much historical data is tested upon, there are (at least) two reasons that great-looking systems we've developed may perform poorly in the future: 1. We have found fake patterns; 2. The market tendencies changed enough that our system has recently become invalid (and depending on how long of a period you tested upon, it likely declined gradually.) -- and of course these two factors combine and magnify the problem.

*Irmantas said: "I did not use the automatic OOS in FSB. First I generated strategies without an OOS check, with some data held back using the time horizon tool. Then I took the few best strategies that I liked and checked them once against the unseen data. If they passed, I took all the data and optimized again from the default variable values."

I COMPLETELY agree with this approach. In fact, the way FSB is currently set up, this is the superior approach. The problem is that there is currently no other way to validate the OOS data separately from the In Sample (correct me if I'm wrong!). Miroslav said that he's working on a feature where you could do just that and set Acceptance Criteria for IS and OOS so that it is validated separately. You can, of course, visually inspect the equity curves, but that is a lesser solution. Your method of running the strategies through separately accomplishes the same thing; it just takes a bit more effort and results in needing to prune the Collection more frequently.

And optimizing across the entire range of data is also the right approach.

*The observation made by Irmantas is one of my several concerns about the method of deploying a mass amount of EAs with limited backtesting. If half of your strategies are making profit, that's a good start--IF you have a very effective way of eliminating poor performers.

"Cut losers quick and let winners run." This sounds good in theory, but even great strategies will go through many periods of drawdown and are very likely to start off as losers. Very few of my generated EAs have better than a 50% win rate. Here's an exercise: take what you believe to be a decent-enough EA to deploy in this manner and examine its trading history over your test period. Find all the locations of its drawdowns and gauge about how much it lost each time before resuming its upward climb. What percentage of the time was it lower than its peak? And by how much each time? These are routine drawdowns that could very well occur as soon as you deploy the EA.

And keep in mind...your strategy WILL perform worse than your backtest. It will almost never perform better, and if it does that is ALWAYS dumb luck and will turn at any time.

So, back to the question: how do you cut out losers at the earliest opportunity without cutting too many of the winning strategies? The drawdown limit would certainly need to be reduced drastically. Maybe to half of maximum observed in the backtest--depending on the length of your backtest data.
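The drawdown exercise described above (how often a strategy sits below its peak, and how deep each dip goes) can be sketched from a list of per-trade profits. Assumptions: `drawdown_report` is a hypothetical helper, time is measured in trades and not calendar days, and a drawdown episode ends only when the balance gets back to or above its previous peak.

```python
def drawdown_report(profits, start_balance=0.0):
    """Summarize drawdowns of the equity curve built from per-trade profits."""
    balance = start_balance
    peak = start_balance
    trades_below_peak = 0
    max_drawdown = 0.0
    episodes = []      # deepest point of each separate drawdown
    current = 0.0      # depth of the drawdown in progress, 0 if at a peak
    for profit in profits:
        balance += profit
        if balance >= peak:
            if current > 0:
                episodes.append(current)   # previous drawdown has recovered
                current = 0.0
            peak = balance
        else:
            trades_below_peak += 1
            current = max(current, peak - balance)
            max_drawdown = max(max_drawdown, current)
    if current > 0:
        episodes.append(current)           # still in drawdown at the end
    share = trades_below_peak / len(profits) if profits else 0.0
    return {"max_drawdown": max_drawdown,
            "share_below_peak": share,
            "episodes": episodes}


# Made-up trade list, purely for illustration.
print(drawdown_report([10, -4, -3, 8, 5, -6, 2]))
```

A live stop level can then be expressed as a multiple of `max_drawdown`, for example the 1.5 times used earlier in the thread or the half suggested here; the list of episodes shows how many routine dips such a level would have cut short in the backtest itself.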

But the fact is that you will have many strategies that were never well-performing in the first place (that DUMB luck thing from backtesting). And these will definitely cost you however much you decide that your "stop loss" on strategies is. So if 50% of your strategies are losers, then your "winners" must make up for their losses--PLUS give you a nice profit to boot.

The next question is...when do you cut off your Winners? When they are back to break-even? When they lose back half their profits? When they have lost money during the last month? You don't want to give too much of their profits back, or you won't have any overall profit to counteract the losers. And again, even solid strategies will experience drawdowns.

Maybe enough of the strategies that you have generated will show larger profits over a longer period so as to make up for a number of losing strategies. However, if this is the case, most of those strategies probably would have remained viable if you tested them over a longer period of time. AND many of your poorer performers would have been eliminated with a larger backtesting period. Yes, it would certainly take you longer to generate these strategies, but the overall effect is to increase your Return On Investment (ROI).

The greater your ROI, the less Variance you will experience. Variance is the Demon of traders, just as it is of poker players. This is at the heart of all money management when trading.

Let me pose a question: Would you rather use a set of EAs with a potential ROI of 50% per year or a set with a potential ROI of 10% per year?

If you immediately said, "What a dumb question. Not enough information!" then good for you. If not, then you need to do research into Variance. Obviously, if you are risking too much of your account (assuming you cannot easily replace it with outside income), it doesn't matter what ROI you "might" achieve. You have too great of a risk of going bust. (So many otherwise good poker players don't follow this rule, even some of the great players like Phil Ivey, who's gone completely broke several times...but he can count on a quick $100,000 stake from his friends to get back in action. Can you?)

Almost all traders are aware of drawdown, of course. And I would bet that nearly every person who's been studying trading for a month or longer can recite how important it is to control your drawdown. But how many REALLY understand how much Variance can really affect them? Aside from already-successful traders, I would guess that number to be 1 in 10 -- at most.

What is the point of this aside? The "100s of untested EAs" method will result in a lower ROI than a much smaller number of EAs generated over a longer time period. In my hypothetical example, yes...the 100s of EAs may have even made a larger profit, but at the expense of a much lower ROI.

This lower ROI causes much more Variance. And more Variance is BAD. More Variance becomes exponentially worse, much like compounding interest. And less ROI also causes Variance to increase exponentially. (That's not just a double whammy...those two exponential situations exponentialize each other as well!)

OK, sorry, I got a little carried away, as usual.

Despite my ranting and railings against the "100s of EAs" method, I still hold out some hope that it will work in some form.

One danger I think almost all of us have is that we WANT it to work. I mean, wouldn't it be GREAT and EXCITING to throw these 100s of EAs we can generate non-stop onto a server and rake in the profits! And these profits will just keep on building and building practically on auto-pilot until we have so much money we don't know what to do with it all! (Come on, we've all thought this through!)

I'm just trying to bring people back down to Earth a little bit.

Anyway, I do like the way both of you (sleytus and Irmantas) are thinking. I think much along the same lines as Irmantas, but sleytus gives me new ideas.

Re: OOS vs. no OOS

Irmantas wrote:

May I ask what criteria Sleytus uses to stop strategies?

When I tell you then you'll be disappointed -- nothing magical and mostly trial and error.  The strategies I add to a live micro account usually have a Win Percent of at least 80% and at least 100 back testing trades.  If a strategy then loses its first 2 or 3 trades in a live account then I'll remove it.  I know that isn't very long and the strategy might be a "late bloomer" -- but the back testing stats indicated it should win 80% and the fact it loses the first 2 or 3 then suggests to me there is some disconnect.  So, though you might consider that a rather harsh penalty -- two strikes and you're out -- since I'm losing real money and since I have so many more strategies whose initial trading pattern *is* consistent with the back testing strategies -- then I'm comfortable excluding the early losers.  Since we have so many strategies to work with, then it makes sense to me to err on the side of being strict (rather than lenient).  I don't feel I need to be kind to every strategy and allow it additional opportunities to lose my money...
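The reasoning behind the early cut can be put in numbers. This is a sketch that assumes trades are independent and that the win rate is constant, which real strategies may well violate; `prob_first_k_all_losses` is a hypothetical helper, and the 50% case is included only as a contrast with a strategy whose backtested 80% was not real.

```python
def prob_first_k_all_losses(win_rate, k):
    """Chance that the first k trades all lose, for independent trades."""
    return (1.0 - win_rate) ** k


for win_rate in (0.8, 0.5):
    for k in (2, 3):
        p = prob_first_k_all_losses(win_rate, k)
        print(f"true win rate {win_rate:.0%}: "
              f"P(first {k} trades all lose) = {p:.3f}, "
              f"about {70 * p:.1f} of 70 such strategies")
```

Under these assumptions a strategy that really wins 80% of the time opens with two straight losses only about 4% of the time, so the rule discards few genuine 80% strategies while removing weaker ones much more often.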

And then I'll use MT4 Tracker once a week -- usually after closing on Friday -- to clear out the poor performers and further refine the portfolio in preparation for Monday's open.  And I'll often add a new Portfolio EA (with 60 or 70 new strategies) to the accounts on Saturday or Sunday -- which allows me to repeat the process.  So, it's this continuous process of adding and pruning...

I also assume this is the new paradigm that Popov had in mind when he created his software applications.  The only thing missing was a way to make it easy to prune -- which motivated me to develop MT4 Tracker.  I'm also working on a new indicator that has an interesting feature I'll call "adaptive pruning" -- where it is constantly doing the bookkeeping and automatically excluding the poor performers.  I'll have more to say in a month or so.

10 (edited by qattack 2017-08-11 05:56:09)

Re: OOS vs. no OOS

sleytus, thank you for your more detailed explanation of your system. Actually, I wasn't disappointed at all about your selection criteria. It sounds like a very logical start, and you may sway me towards your thinking yet!

I have a few questions generated by your above discussion:
1. How many strategies can you find with an 80% win rate? This seems quite high. It's easier, of course, since you are using less data...and an 80% win rate would indicate an especially good recent run, unlikely to be kept up. (I'm saying this as a positive, not a negative, as even though it's an unusually good run, it's more likely that these strategies will produce higher win rates on average in the future.)
2. What time frames do you generally use?
3. How many bars do you generate over? Is it the same number for each time frame?
4. With your pruning & adding strategies, does the number of strategies deployed each week generally grow?
5. Do you just add your best 60 to 70 strategies, or do you add all strategies above a certain threshold?
6. Do you "Search Best Win/loss ratio"?

Here's an idea: What if instead of deleting an EA at an arbitrary two or three losses, you delete it when it exceeds its "Max consecutive losses"? Or maybe that number divided by two and rounded down? Or maybe don't include strategies with Max consecutive losses more than a certain number. The nature of the market will cause certain strategies to lose (or win) more trades in a row than their Win% would indicate.
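The streak-based rule could be sketched as follows. Both function names are hypothetical, a break-even trade is treated as ending a losing streak (an assumption), and the backtest figure is passed in as a plain number however it was obtained.

```python
def longest_losing_streak(profits):
    """Largest number of consecutive losing trades in a list of profits."""
    longest = current = 0
    for profit in profits:
        if profit < 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0   # a win or break-even trade resets the streak
    return longest


def should_prune(live_profits, backtest_max_losses, halve=False):
    """True once the live losing streak exceeds the backtest-derived limit.

    halve=True uses the stricter variant: the backtest figure divided by
    two and rounded down.
    """
    limit = backtest_max_losses // 2 if halve else backtest_max_losses
    return longest_losing_streak(live_profits) > limit


backtest = [5, -1, -1, -1, 4, -2, -2, 3]     # made-up backtest trades
live = [-1, -1, 2]                           # made-up live trades
limit = longest_losing_streak(backtest)
print(limit, should_prune(live, limit), should_prune(live, limit, halve=True))
```

Compared with a fixed two-or-three-loss cut, this ties the tolerance to what each strategy showed in its own backtest, so a strategy with naturally long losing runs is not removed for behaving as tested.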

And if I didn't say so already: I haven't used MT4 Tracker yet, but it looks amazing, even if it was a paid software. It's virtually a necessity when trading 100s of EAs. I've been searching for something like it. And adaptive pruning sounds out of this world!

11 (edited by qattack 2017-08-12 02:38:05)

Re: OOS vs. no OOS

OK, I purchased both FSB and EA Studio today.

I am only experimenting at this point and over the next couple weeks will formulate both short-term and long-term plans. Short-term involves more EA Studio, while long-term will focus on FSB.

I set a single instance of EA Studio running about three hours ago, generating to optimize SQN. I am using 14 years of data and 50% OOS. 16 strategies have passed MC tests, of which I've visually eliminated 12 for bad OOS (or not enough trades in one case: I forgot to increase #/trades for the longer time period). One of the remaining four strategies is iffy, but OOS for the other three look quite good. All make between one and two-and-a-half trades per week, which is fine.

Yes, generating over such a long time period reduces the speed at which EA Studio can find strategies. But I'm currently running only one instance. I assume I can run one per unused thread efficiently. (??) So if I open eight instances, that might generate eight strategies an hour for me to examine more closely. If I deem half of those as "good", that will still leave me with 96 strategies per day.

I will set some other criteria such as Max Consecutive Losses. At any rate, I can quickly fill up a large server with strategies generated in this manner, even with my longer generation and OOS periods.

If such a simple generation of strategies would work, it would be as close to a "Holy Grail" as I could get. I'm not getting my hopes up at all, but it looks at least possible at this point that amassing 100s of EAs in this manner very quickly could work.

But on the other hand, some of the good-looking OOS results are inevitably due to Dumb Luck.