Hi there!
Here's my contribution to this topic.
I think that, in the end, everything comes down to market conditions (market regimes).
One could try to partition the market into different kinds of regimes according to some variables or conditions. Imagine how different a trending market is from a ranging market, or a volatile market from a non-volatile one.
One possible classification could be based on trend and volatility. That would give us three trend states (Strong Trend, Weak Trend, Range), each of which could be Volatile or Non-Volatile: 6 types of market regime in total. This classification is only an example, but it helps to illustrate the idea.
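To make the 3 x 2 classification above a bit more concrete, here is a minimal Python sketch. It is only one possible way to do it, not a recommended method: I'm assuming a Kaufman-style efficiency ratio as the trend measure and the standard deviation of bar returns as the volatility measure, and the thresholds (`strong`, `weak`, `vol_limit`) are made-up values you would have to tune for your symbol and timeframe.

```python
from statistics import pstdev

def efficiency_ratio(closes):
    """Kaufman-style efficiency ratio: |net move| / total path length (0..1)."""
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return abs(closes[-1] - closes[0]) / path if path else 0.0

def classify_regime(closes, strong=0.6, weak=0.3, vol_limit=0.005):
    """Return (trend_state, volatility_state) for a list of close prices.

    All three thresholds are illustrative assumptions, not calibrated values.
    """
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    er = efficiency_ratio(closes)
    if er >= strong:
        trend = "Strong Trend"
    elif er >= weak:
        trend = "Weak Trend"
    else:
        trend = "Range"
    # Volatility: population standard deviation of simple bar-to-bar returns.
    returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    vol = "Volatile" if pstdev(returns) > vol_limit else "Non-Volatile"
    return trend, vol

# Synthetic examples: a smooth uptrend and a wide zigzag.
up = [1.0 + 0.001 * i for i in range(50)]
zig = [1.00 if i % 2 == 0 else 1.02 for i in range(50)]
print(classify_regime(up))   # ('Strong Trend', 'Non-Volatile')
print(classify_regime(zig))  # ('Range', 'Volatile')
```

The two measures are deliberately independent, so any of the 6 combinations can occur. Any other trend or volatility indicator could be swapped in without changing the idea.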
The idea is that those market conditions are cyclical: right now the market is in one condition (for example, heavily trending and highly volatile), but a couple of months from now it may well be in a different regime (for example, ranging with low volatility).
If we train an EA to perform well in a specific market regime, it will likely perform well for as long as that regime lasts. But when market conditions change, the EA will most likely fail. Sorry if this is too obvious for most of you.
So imagine we use a training period of 15,000 bars. Those bars represent several years (about 10) if they are H4 bars, but only a couple of months if they are M5 bars. If we only use the last couple of months of M5 data, our EAs will likely be fitted only to the market conditions we have now. For example, if the market has been ranging (and not trending) during the last couple of months, our EAs will likely start failing as soon as the market starts trending.
I guess the idea behind using more bars is to avoid fitting our EAs only to current market conditions, because nobody knows when those conditions will change or what will come next. So when people use longer data (more bars), what they are trying to do is cover other market regimes when building their EAs and, as a result, get more reliable EAs that will survive market changes. However, training an EA over many different market conditions could mean hundreds of thousands of M5 bars, and that means many hours to build M5 EAs.
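To put rough numbers on the two points above (how much calendar time a given number of bars covers, and how many bars a multi-year window needs), here is a small Python sketch. It assumes a market that trades 24 hours a day, 5 days a week, with about 260 trading days per year; those constants are my assumptions, and your broker's data may differ slightly.

```python
# Assumption: 24h x 5-day market (typical for forex), no holidays or gaps.
BARS_PER_DAY = {"M5": 24 * 12, "H4": 24 // 4}   # 288 and 6 bars per trading day
TRADING_DAYS_PER_YEAR = 260                      # assumption: 5 days x 52 weeks

def bars_to_years(n_bars, timeframe):
    """Approximate number of years covered by n_bars of the given timeframe."""
    return n_bars / BARS_PER_DAY[timeframe] / TRADING_DAYS_PER_YEAR

def years_to_bars(years, timeframe):
    """Approximate number of bars needed to cover the given number of years."""
    return round(years * TRADING_DAYS_PER_YEAR * BARS_PER_DAY[timeframe])

print(round(bars_to_years(15000, "H4"), 1))       # 9.6  (years)
print(round(bars_to_years(15000, "M5") * 12, 1))  # 2.4  (months)
print(years_to_bars(10, "M5"))                    # 748800 (bars)
```

So the same 15,000 bars that span roughly a decade on H4 cover under three months on M5, and covering that same decade on M5 takes about three quarters of a million bars.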
In the end, in my opinion, the question of how much data is enough doesn't have a single answer. Is 4000 bars enough? Well, what kind of bars are those? Are they H4 bars? 4000 H4 bars represent a few years of data. In those years there might have been very different market conditions, and for that reason training your EAs over that time might be enough to build EAs capable of performing well in different market situations. In contrast, 4000 M5 bars is not even a month, and training your EAs on less than a month of data might not be enough. Market conditions could be very different next week, and in that case your EA will likely fail and not survive the new conditions.
So, I guess, overall, using longer data may be better if you want to build more reliable EAs that are better able to adapt to other market regimes. On the other hand, using shorter data will likely give you EAs adapted only to the specific conditions we have now, and for that reason you'll have to be quick to remove them when you notice that an EA has started failing or that market conditions have changed.
That is why I've written a couple of posts here about trying to identify market conditions beforehand and trading only the EAs that were built for the specific market conditions we have now. That's not an easy task, but I think it can be done, for example by using higher-TF indicators.
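As a rough illustration of the "trade only the EAs built for the current conditions" idea, here is a minimal Python sketch. The `EA` record, the EA names and the regime labels are all hypothetical, and the hard part (actually detecting the current regime, for example from higher-TF indicators) is left as an input here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EA:
    """Hypothetical record: an EA and the regime it was trained on."""
    name: str
    trained_regime: tuple  # (trend_state, volatility_state)

def select_eas(eas, current_regime):
    """Keep only the EAs whose training regime matches the current one."""
    return [ea for ea in eas if ea.trained_regime == current_regime]

# Hypothetical portfolio; names and labels are placeholders.
portfolio = [
    EA("ea_range_lowvol", ("Range", "Non-Volatile")),
    EA("ea_trend_highvol", ("Strong Trend", "Volatile")),
]

# Assume the regime detector currently reports a strong, volatile trend.
active = select_eas(portfolio, ("Strong Trend", "Volatile"))
print([ea.name for ea in active])  # ['ea_trend_highvol']
```

If no EA matches the current regime, the list is empty, which in practice would mean staying out of the market until conditions change.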
In my view, the recent market situation for EURUSD wasn't so much an "odd period" or "strange behaviour". It was simply a Strong Downtrend, Highly Volatile period, similar to what happened from April 2014 to the end of that year. It is not that conditions have been hard recently (in my humble opinion); it is more likely that our EAs weren't trained well for those conditions. If I had trained my EAs (only) on that old period (from April 2014), I think they would quite likely have been profitable on EURUSD in April-May 2018 (because both market conditions were similar).
I get your idea, Hannahis, about these conditions being difficult for trading: well, they were difficult for being Long in such a strong Sell market. But I also get the idea of training the EA in these conditions (thanks for the idea). A buying EA trained in such a "difficult" market regime will likely survive when conditions are not that "hard". On the other hand, a selling EA optimized during this period could struggle in other periods when the selling strength is not so pronounced.
Sorry for such a long post, and please continue contributing to this topic. Please forgive me if most of this post was too obvious for you, and please correct me if you think I'm wrong.
Warmest regards,
RJ