Topic: Proof that too much historical data is harmful...

I recently posted about the importance of optimizing using only recent data.  Other than DoCZero it didn't generate much interest.  Here's the link:
https://forexsb.com/forum/topic/7458/optimize-using-recent-data/

I am deeply indebted to Popov and this forum.  His software totally changed my forex experience and I've learned a ton from a handful of people on this forum.  So I've decided to try one last time, and then I will give it a rest.  I chose a more controversial title hoping more people would be curious.

Below I present 3 charts that demonstrate why using too much historical data actually leads to poor results.  If you are inclined, you can readily reproduce this observation for yourself -- it might take 2 minutes, perhaps less.

I have a few hundred strategies and chose one at random -- there is nothing special about it.  It was designed for EURUSD/H1.

IMAGE #1
https://s8.postimg.cc/x2ga21ns1/optimized-using-8years-old-data.png

IMAGE #2
https://s8.postimg.cc/rfjx41ij5/optimized-using-old-data.png

IMAGE #3
https://s8.postimg.cc/yirsjoitt/optimized-using-recent-data.png

IMAGE #1 -- shows the strategy optimized against 50,000 bars -- i.e. approximately 8 years of historical data.  From the stats you can see it's not so bad, and not so good.  It has a WinRatio of 0.54, an SQN of 3.67 and a ProfitFactor of 1.32.  In the lower right-hand corner I've drawn a red square around the price chart for 2018.  You can see that same price pattern in IMAGE #2 and IMAGE #3.

IMAGE #2 -- the exact SAME strategy and settings as IMAGE #1.  However, this chart only shows the most recent 4000 bars (approximately 8 months).  Unless you have a time machine and could travel back in time, it would be foolish to add this strategy to a Real account in 2018.  But this is what many people do and then wonder why the strategy performs poorly compared to what they expected from IMAGE #1.

IMAGE #3 -- the same strategy, but it has been re-optimized using the most recent 4000 bars (approximately 8 months).  The strategies in IMAGE #2 and IMAGE #3 are the same -- they differ in that IMAGE #2 uses settings calculated over 8 years, whereas IMAGE #3 uses settings calculated over the most recent 8 months.

Let's compare:

    IMAGE #2:  WinRatio: 0.48   SQN: 0.70   Profit Factor: 1.22
    IMAGE #3:  WinRatio: 0.77   SQN: 5.64   Profit Factor: 3.39

Those of you who would still trade the strategy from IMAGE #2 in a Real account in 2018 please raise your hand.  I wish you luck -- you will need a lot of it if you plan to succeed in trading forex.   
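For anyone unfamiliar with the three statistics being compared, here is a rough sketch of how they are commonly computed from a list of closed-trade results.  These are the usual textbook definitions -- FSB-Pro's exact implementation may differ in detail -- and the trade list below is made up purely for illustration.

```python
# Minimal sketch of the three statistics compared above, computed from a list of
# per-trade net profits. Common textbook definitions; the exact implementation
# inside FSB-Pro / EA Studio may differ in detail.
import math

def win_ratio(trades):
    return sum(1 for t in trades if t > 0) / len(trades)

def profit_factor(trades):
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")

def sqn(trades):
    # Van Tharp's System Quality Number: sqrt(N) * mean / stdev of trade results.
    n = len(trades)
    mean = sum(trades) / n
    var = sum((t - mean) ** 2 for t in trades) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)

trades = [120, -80, 45, -60, 200, -90, 75, 30, -40, 110]   # made-up trade results
print(win_ratio(trades), profit_factor(trades), sqn(trades))
```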

Here's an analogy -- suppose I have a glass of lemonade and pour it into a lake.  Does the lake now taste like lemonade?  Of course not -- the lemonade got diluted by the larger body of water.  This is similar to what happens when you mix a little bit of new data with a lot of old data -- the new data gets diluted.  Since we trade in present time, it makes no sense to dilute the recent data.  The recent data is your friend.

I imagine many people think that by using more data their strategy becomes "smarter" -- i.e. learns more patterns.  This is definitely not true.  Our strategies do not learn.  This is not AI -- there is no database backing our strategies.  The optimization step performs a one-time computation using the input data you provide.  The settings that it computes stick with the strategy throughout its lifetime (until you "refresh" or re-optimize).
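To make the "one-time computation" point concrete, here is a rough sketch of what the optimization step boils down to: a search over indicator settings on whatever bars you supply, after which the winning settings are frozen into the strategy.  The moving-average rule, the scoring metric and the synthetic price series are placeholders, not FSB-Pro's actual optimizer.

```python
# Rough sketch of what "optimization" amounts to: a one-time search over indicator
# settings on whatever bars you supply. Nothing here learns or updates afterwards.
# The MA-crossover rule and the fake price series are placeholders, not FSB-Pro internals.
import random

random.seed(1)
closes = [1.10 + 0.0005 * random.gauss(0, 1) + 0.00001 * i for i in range(2000)]  # fake closes

def net_profit(closes, fast, slow):
    """Toy MA-crossover backtest: long when fast MA > slow MA, flat otherwise."""
    profit, position, entry = 0.0, 0, 0.0
    for i in range(slow, len(closes)):
        fast_ma = sum(closes[i - fast:i]) / fast
        slow_ma = sum(closes[i - slow:i]) / slow
        if position == 0 and fast_ma > slow_ma:
            position, entry = 1, closes[i]
        elif position == 1 and fast_ma < slow_ma:
            profit += closes[i] - entry
            position = 0
    return profit

# The "optimization": try each combination once and keep the best. The winning settings
# then stick with the strategy until you re-run this step on newer data.
best = max(((f, s) for f in range(5, 30, 5) for s in range(40, 120, 20)),
           key=lambda fs: net_profit(closes, *fs))
print("settings kept for the strategy's lifetime:", best)
```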

I'm sure a number of people have nice looking Balance / Equity curves that span several years, or even longer.  But they are misleading -- your computer screen doesn't have enough resolution to see all the peaks and valleys.  What you see on your screen is from 30,000 feet high.  If you were to zoom-in you would see many peaks and valleys, and some of those valleys could last weeks or months.  Why would anyone voluntarily choose to trade a strategy that loses for weeks and months?

Okay -- I've made my peace.

Re: Proof that too much historical data is harmful...

Hi Sleytus,
I guess there would be a lot more attention if you posted an image with a rising PL curve big_smile ... I know the pain of ignored topics, but I guess the forum is quieter due to the summer holiday season. Also, I am very interested in reading all of your and other users' posts, but most of the time I have nothing valuable to add to the topic.

I would be very interested to see a couple of months of unseen (not included in the optimization) data and compare which did the better job in the future. Can you please do it after one month or two, if the most recent data is used? Because I believe the first one (longer optimization period) should perform better. In my experience I have found that I get better results with more data (a 2-year data portfolio's PL never rose, but a longer 4-6 years with an unseen-data filter check made some money). Also a few authors agree about this -  https://www.amazon.com/Trading-Systems- … s=tomasini and  https://systemtradersuccess.com/fooled- … tion-bias/

However I am open to all possibilities about being wrong and I can change my beliefs quickly. If you are profitable with a small amount of data it raises a very good question - WHY? Especially since my few experiences show differently. Maybe the answer is not in the amount of data used, but somewhere else? Maybe you have some very good/quick "throw out" method in portfolio management? Another thing could be HIGH trade numbers - I noticed that you use a good number of trades for your systems, like 400, but my average is more like 200 trades on much bigger data periods. What other secrets do you hold? smile

It is possible to think of a bigger experiment to test whether short or long optimization is better, with a seen/unseen test. Let's do it with, say, 20 systems and a couple of different data horizons, and draw some conclusions about this. Maybe some volunteers? smile

3 (edited by ats118765 2018-08-17 01:58:20)

Re: Proof that too much historical data is harmful...

Hi Steve

Thanks for the post.

I thought I would add my two bobs worth....but I am only a newbie to this forum.

This post is more a 'food for thought' and is based on a few underlying assumptions that members can take with or without a pinch of salt. :-)

I tend to view markets through a prism of 'markets being very efficient' with only occasional alpha being made available through sporadic and generally unpredictable inefficiencies. Under this assumption I tend to throw my ability to predict future performance out with the bath water.

I do not pay much attention to OOS testing as in my opinion this is an outdated form of testing from back in the days when alternative methods of testing for robustness were not available.  I prefer to invest nearly all my time in testing strategies across as broad a range of market conditions as I can find and in the rigorous Monte-Carlo testing of strategy parameters to identify the more robust strategies that can weather the storms of market uncertainty and varying market condition.
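A rough sketch of the parameter Monte-Carlo idea: re-run the backtest many times with randomly perturbed settings and look at the spread of results rather than the single optimized combination.  The backtest function and the perturbation ranges below are placeholders standing in for a real backtest.

```python
# Rough sketch of Monte-Carlo testing of strategy parameters: jitter the settings many
# times and judge the strategy by the distribution of outcomes, not the optimized one.
# backtest() and the jitter ranges are illustrative placeholders.
import random
import statistics

def backtest(fast, slow):
    """Placeholder: return some performance metric (e.g. net profit) for these settings."""
    local = random.Random(fast * 1000 + slow)   # deterministic stand-in for a real backtest
    return local.gauss(100 * (slow - fast) / slow, 80)

random.seed(0)
base_fast, base_slow = 10, 60                   # the optimized settings under review
results = []
for _ in range(500):
    fast = base_fast + random.randint(-3, 3)    # perturb each parameter around its value
    slow = base_slow + random.randint(-10, 10)
    if fast < slow:
        results.append(backtest(fast, slow))

results.sort()
print("median result:  ", statistics.median(results))
print("5th percentile: ", results[int(0.05 * len(results))])
# A strategy whose median and lower percentile still look acceptable is more likely to be
# robust than one whose optimized result collapses under small parameter changes.
```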

The only OOS testing I do is left for my native MT4 broker platform and over a relatively short period to ensure that there are no execution errors or technical issues with the data mined EA that I have generated.

The intent of my search is to simply find those strategies that in unfavourable conditions, do not detrimentally affect the overall portfolio in terms of any undue bias that the singular return stream of the strategy will impose on the collection.

What this means is that all my decisions at the individual strategy level are based on three requirements:
a) That during extended unfavourable conditions, the strategy (return stream) will be limited in its drawdown impact on the overall portfolio;
b) That the individual return stream has positive expectancy overall (this is more a statement that the strategy has a good chance of having a slight edge); and
c) That the individual strategy offers correlation benefits to the overall portfolio and assists in reducing overall portfolio volatility.

As far as the overall profitability of the strategy itself, I tend to pay little regard to this feature. That is where I would differ in opinion from your take that 'Too much historical data is harmful'.

There is good data and there is bad data. Data that is not relevant to your native broker platform is meaningless to me and is what I refer to as 'bad data'. What I mean by this is that available alpha is so light in an efficient market but there is so much variability in overall result from slight changes to data such as differences in bar open, variations in spread/swap, quality of historical data....etc etc etc...that the differences can be material to the overall result on the return stream (equity curve) of the strategy. Good data is therefore platform specific data history and as much as you can garner...but once again there will be significant material variations in the result if you are using constant assumptions in your results such as constant spread, swap, slippage etc.

The reason for my preference for as much data history as I can get.... is less about data sample size (although this is generally a useful proxy for robustness)...... but more about ensuring that the strategy has been exposed to as many different market conditions as possible and of differing duration.

If the strategy still has positive expectancy overall (no matter how slight), but will assist in preserving capital, then you can keep this strategy 'turned on' all the time and be available for those unpredictable times when market conditions become favorable and you make hay while the sun shines.

The reason I diversify my portfolio is two fold:
1. Diversification of 'uncorrelated return streams is the major method used to make your portfolio robust and protect your finite capital from impacts of adverse volatility (namely drawdowns); and
2. Diversification of 'uncorrelated return streams' each with slight positive expectancy ensures that there is overall positive momentum in portfolio growth over time at most points in the time-series.

If you select your strategies from those that are performing well over recent times, there is a strong chance that you will be selecting from 'positively correlated' options. This is great when times are good....but exceptionally bad during unfavourable market conditions. Trading correlated return streams in your portfolio exacerbates portfolio volatility, as each return stream amplifies positive growth and negative growth since they are all 'in phase'. By adopting this technique you will have to be right on the pulse with knowing when to 'turn on' or 'turn off' those strategies....which in my opinion is a predictive technique and always lagging in nature. There have been a lot of articles written on this topic of 'market timing' but very little success in this area. My preference is for strategies that are turned on all the time. To do this....you need an appreciation of what the 'acceptable limits' of volatility are for your strategy so you are not tempted to interfere when entering drawdowns...as they are actually a necessity in the big picture. Benchmarks tend to be the way to go, using your long term strategy metrics as the guidelines to assess strategy health during live trading periods.

Also there is a pretty well known 'mean reverting' phenomenon in fund management circles when selecting which funds to allocate your money to. When you have a selection of well established, highly diversified, long term funds to allocate your money towards, you actually select those funds that are recovering from a drawdown as opposed to those funds that are reaching new equity highs. This is counter to the idea that you should select the current best performing funds from the options available. The reason for this is that these funds have already passed the robustness challenge in that they are highly diversified AND they have stood the test of time over an array of market conditions. Once that challenge is achieved, the optimal times to re-balance your portfolio allocations are dependent on whether funds are recovering from a drawdown or reaching new high-water marks.

Anyway....here is the guide I use when looking at strategies generated by EA studio at the Monte Carlo phase of review.

I tend to view the most likely scenario from the array of simulations generated by Monte Carlo testing as the median solution.  This to me is a more probable result of return stream that would be generated by a particular class of strategy given the high sensitivity of the strategy to minor perturbations in market condition etc etc etc..

Your assumption about the nature of these markets will also influence this decision. For example I believe that the majority portion of any equity curve (which is a derivative signature of market condition) is attributed to market randomness. Under this viewpoint, a single 'best' equity curve is a misnomer and more a result of the way random results compile under optimisation to give you an optimal equity curve. My opinion is that there is only 'weak alpha' available in the markets, which relates to the level of 'signals' in the noise and the degree of autocorrelation in the time series. The median choice of return stream from the alternative generations is the most 'probable' outcome of the series and gives a more realistic interpretation of the degree of 'signal in the noise'. The reason for the 'weak alpha' statement is that over long term horizons, the best fund managers in the world can only achieve a certain threshold performance that caps what is a pipe dream and what is realistic. A long term CAGR of 15%-20% with commensurate drawdowns of 30-50% is about the best you can get over an extended performance period from the best of the best with heavy duty research teams. I just cannot find audited verifiable results that prove otherwise.

Anyway....back to the Monte Carlo array.......I then calculate an expected return from this median solution to ensure it still has slight positive expectancy and recognise that the optimised return stream is a very unlikely scenario going forward.  What this does is two things:
1) It reduces future expectations to more realistic live trading conditions; and
2) It provides a broad idea of the likely volatility of that return stream and its degree of 'edge' that you can then use for portfolio construction using correlation as your guide.

I then superimpose the expected return stream profile over other strategies in my potential collection to ensure they are either anti-correlated or non-correlated.

This is a critical step in the portfolio construction process as your portfolio composite must be balanced with different return stream signatures to ensure there is no undue bias that will contribute to exacerbated drawdowns.

Through this method of portfolio construction the aim is to generate your linear upward sloping equity curve by inclusion of individual strategies that have different strengths and weaknesses but in composite mutually work together to create the ultimate blended solution that reduces overall volatility and improves the risk-weighted return.
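A rough sketch of that correlation screen, using synthetic return streams in place of real strategy results (the 0.3 cut-off is just an illustrative threshold):

```python
# Rough sketch of the correlation screen: compare a candidate strategy's return stream
# against the portfolio's combined return stream and only admit candidates that are
# non-correlated or anti-correlated. The return series here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(7)
portfolio_returns = rng.normal(0.02, 1.0, 500)                        # existing blended stream
candidates = {
    "candidate_A": portfolio_returns * 0.8 + rng.normal(0, 0.5, 500),   # strongly correlated
    "candidate_B": rng.normal(0.02, 1.0, 500),                          # roughly uncorrelated
    "candidate_C": -0.5 * portfolio_returns + rng.normal(0, 0.8, 500),  # anti-correlated
}

for name, returns in candidates.items():
    corr = np.corrcoef(portfolio_returns, returns)[0, 1]
    verdict = "reject (amplifies drawdowns)" if corr > 0.3 else "worth blending in"
    print(f"{name}: correlation to portfolio = {corr:+.2f} -> {verdict}")
```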

Once this is achieved you can then go to step two and accelerate these results using position sizing methods that give far greater bang for buck than what could otherwise be achieved by a single strategy solution.

https://s22.postimg.cc/cwkbndl31/Monte_Carlo_and_Correlation.png


.....anyway...some food for thought.

Cheers

Rich

Diversification and risk-weighted returns is what this game is about

4 (edited by sleytus 2018-08-17 07:19:23)

Re: Proof that too much historical data is harmful...

Hello Irmantas and Rich,

Thanks for posting and keeping the topic alive. There is lots here to think about and I need to re-read your posts a couple more times.  To keep the conversation going, I did have some initial comments and questions.

Irmantas wrote:

I would be very interested to see a couple of months of unseen (not included in the optimization) data

I think you are referring to OOS -- is that correct?  I've experimented with OOS and hybrid data sets in the past, but lost interest.  One problem with IS/OOS is the OOS data is the most recent and that is the one you should be using for optimization.  If the IS/OOS data segments were reversed then that would make more sense.  There is another problem with OOS -- and that is how you interpret the results.  As an example, when applying OOS there are two possible results:
(a) OOS performs poorly.  In this case you throw the strategy away.  Which, I feel, is a shame.
(b) OOS performs well.  You now assume the strategy is robust because it can "see" into the future and perform well on unseen data.  But this may not necessarily be the case.  An alternative explanation is the OOS region is simply a continuation of the IS region -- in which case you would expect it to perform as well.  Given the two possibilities, I would claim the second explanation is more likely to be the case.  Also, if the unseen data were truly unique then you would sometimes expect your strategy to do better in the OOS region than the IS, but that is rarely the case.  Also, the OOS data is the most recent data -- this is the data you should be using to optimize your strategy.

Let me explain a different way...  We all know that indicator settings are important, right?  Minor changes to indicator settings can make a big difference -- even determining whether a strategy performs well or poorly.  When you run a Monte Carlo test then this becomes clear.  Using input data, the optimization algorithm computes the best settings for the indicators.  Since indicator settings are so critically important, why would you ever choose to use old data to compute these settings?  Since we trade in the present, the newest data should be used for this purpose.

"Recent data" is a relative term -- I intentionally did not define what I meant because it depends on a number of things and also your  trading style.  I can only share my experience and then, perhaps, you can better compare to your approach.  I now exclusively trade EURUSD/H1.  4000 bars works out to about 8 months of data.  Using FSB-Pro, I typically end up with between 250-400 executed orders.  If the number of executed orders is < 250 then I don't trust the results as being statiscally significant.  And if the number is > 400 then I'll usually add another indicator to bring it down (which has the added benefit that it also improves the statistics).  If you trade D1 then you very well need 10 to 15 years of data to have enough executed orders for your results to be statistically significant.  So, yes, you may need 4-6 years of data to achieve good results.

My main points were (a) recent data is your friend, and (b) we should only use just enough data for our results to be statistically significant -- since, otherwise, old data begins to dilute the most recent data.


ats118765 wrote:

I thought I would add my two bobs worth....but I am only a newbie to this forum.

Wow -- this is great information and worth way more than two bobs.  You may be a newbie to the forum, but you ain't a newbie to trading.  I need more time to digest all you wrote, but I still had some comments and questions.

I like that you mentioned Monte Carlo -- which I also use, but more as a "sanity check".

I like that you don't put much value on OOS -- I agree, but for different reasons (which I mentioned above).

I've totally bought into the concept of toggling strategies ON/OFF.  Since I'm a developer, I add a little bit of code that implements simple policies based on how a strategy is currently performing.  Freebird recently mentioned he sees strategies trade "in a groove episodically" -- which I think is 100% true.  So, toggling ON/OFF not only seems like a natural way of dealing with this behavior, but also a necessary one (unless you are willing to tolerate extended losing stretches).
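As a rough, hypothetical sketch of what such a policy can look like -- the thresholds below are made up for illustration and are not the actual rules I run:

```python
# Generic sketch of a simple ON/OFF policy of the kind described: suspend a strategy
# after a run of recent losses or a rolling drawdown, re-enable it once results recover.
# The thresholds and the policy itself are illustrative only.

def should_trade(recent_trades, max_consecutive_losses=4, max_drawdown=300.0):
    """recent_trades: list of closed-trade profits, most recent last."""
    # Rule 1: too many consecutive losses -> switch OFF.
    streak = 0
    for t in reversed(recent_trades):
        if t < 0:
            streak += 1
        else:
            break
    if streak >= max_consecutive_losses:
        return False
    # Rule 2: rolling drawdown from the recent equity peak -> switch OFF.
    equity, peak, drawdown = 0.0, 0.0, 0.0
    for t in recent_trades:
        equity += t
        peak = max(peak, equity)
        drawdown = max(drawdown, peak - equity)
    return drawdown < max_drawdown

print(should_trade([50, -30, 80, -40, -60, -70, -20]))   # False: four losses in a row
```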

If you are more of the mind of creating strategies once and letting them run and "weather the storms of market uncertainty" then, yes, I understand why you would want to use more historical data.  However, consider this -- indicators are very simple algebraic formulas and there are only a handful of settings.  The optimization algorithm attempts to find the best value for a particular metric using trial and error.  Also, the order of the indicators in the slots affects the computation.  So, forcing more and more data (especially old data) into the optimization function often leads to less than desirable results.  For example, in my particular case (EURUSD/H1) I've taken the same strategy and optimized using 2000, 4000, 6000, ... 32,000 bars and compared the resulting statistics.  2000 was bad, 4000 was good, 6000 slightly better, and then it went downhill from there.  And that makes sense to me.  These are simple indicator formulas and the optimization algorithm is using trial and error to find the best overall result -- forcing more data onto the algorithm becomes self-defeating after a certain point.
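Here is a skeleton of that experiment for anyone who wants to repeat it.  The two helper functions are hypothetical placeholders for loading bars and re-running the optimizer -- they are not FSB-Pro API calls and would need to be filled in with your own tooling.

```python
# Skeleton of the window-length experiment: re-optimize the same strategy over
# progressively longer windows of the most recent bars and compare a chosen statistic.
# load_recent_bars() and optimize_and_score() are hypothetical placeholders.

def load_recent_bars(symbol, timeframe, count):
    """Placeholder: return the most recent `count` bars for symbol/timeframe."""
    raise NotImplementedError

def optimize_and_score(bars):
    """Placeholder: re-optimize the strategy's indicator settings on `bars`, return e.g. SQN."""
    raise NotImplementedError

if __name__ == "__main__":
    for count in range(2000, 34000, 2000):
        bars = load_recent_bars("EURUSD", "H1", count)
        score = optimize_and_score(bars)
        print(f"{count:>6} bars -> score {score:.2f}")
    # In the runs described above, the score peaked around 4000-6000 bars and
    # deteriorated as older data was added.
```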

You use the word 'alpha' a few times -- I'm not familiar with that.  Could you briefly explain?

There is lots of food for thought here -- again, thank you.  I need to re-read your post in order for more of it to sink in.  A lot of what you describe is new to me.

5 (edited by ats118765 2018-08-17 10:21:13)

Re: Proof that too much historical data is harmful...

Hi Guys and particularly Steve

Thanks for making the effort to keep interesting threads like this alive. I find such great tips when sharing info from this kind of discourse. One-way diatribes get quite frustrating to me. :-)

Steve.....I see and understand the points you raise regarding optimisation. I am not a big fan of it myself as it exacerbates the propensity for curve fit solutions.

If I use optimisation in my data mining (which I do currently, but in a very weak form, to generate solutions that pass validation criteria)...then I have to ensure I spend extra effort in the Monte Carlo phase ensuring that symptoms of curve fitting are not present in my final choice of selected robust strategy.

Where I think we may be differing in opinion is in our faith in the validity of the equity curve as a method to project a future state. I have very little confidence in the historic trajectory being the guide to a future outcome, given market complexity and the inability in complex systems to predict what will happen tomorrow, let alone what will happen in a year's time. I have had to flip my mind, given the continued disappointment from forcing my expectations onto the market condition......to rather accepting what the market gives to me.......and as a result I have reverted my entire emphasis to thinking in terms of what we can control......namely risk. If the market is non-Gaussian.....in other words if anomalies are more prevalent than what a normal distribution would imply....then the profits will simply be an outcome of being present in the market at the unknown time when the systems you deploy can capitalise on that anomaly. By being present at all times (which means you need to preserve your capital base) you will be in a position to capitalise on alpha when it presents itself. You will be too far behind the eight ball if you attempt to predict an arising anomaly. Not only will you be wrong most of the time, as valid identification of an anomaly is always a hindsight measure...but the delay in identifying them will mean you will most likely miss the boat in an efficient marketplace.

Like a hard stop, deciding to 'turn off' your strategy if it is entering a drawdown can unnecessarily introduce drawdowns into your portfolio....... as it is difficult to determine if the impact of 'turning off' is from a truly deteriorating strategy....or simply a natural and temporary drawdown phase that is exhibited by all strategies, as no single strategy can navigate all market conditions. When you 'turn off' a strategy due to apparent under-performance this inevitably occurs when the strategy is drawn down. This forces the realisation of a loss at that point...where in fact, if you had a balanced portfolio, there would be other strategies performing during this market condition that effectively hedge the adverse impact of that strategy.

My ultimate resignation to accepting that the market is very efficient has forced me into the mindset of creating the unsinkable ship through risk management approaches and then sailing that ship into an uncertain market context. Given the emphasis applied to risk management it is likely to remain afloat and, provided it has some positive momentum, will (like the subtle impact of compound interest over time)...lead to wealth generation over time, provided you are prepared to ride the volatility and let your strategies breathe. Too much tinkering by subjective human biases often overrides this ability.

Robustness testing is all about the techniques you deploy to reduce the volatility of returns while capitalizing on any weak 'bias' that may exist in the data. It is less about the equity curve trajectory and more about the nature of volatility in that signature. That gives clues about the methods we can apply to 'plug' the weaknesses in the equity curve. The best I can assume from a linear projected equity curve is that its volatility over time is less than that of a more volatile equity curve in navigating past market conditions....and that the positive slope is a potential sign that the strategy has a definitive edge. The quantum of that slope is not an indicator I use to project a future outcome.

It has taken a long time for me coming from the 'school of hard knocks' to come to an opinion regarding to the level of noise in market data. I have always been an ardent critic of the Efficient Markets Hypothesis (EMH) but have finally adopted a resigned stance that in any efficient market there is only a small amount of exploitable opportunity. This is the bread and butter for speculators seeking arbitrage opportunities and 'generally' the arbitrage opportunity is very short lived.

When referring to 'alpha' I am referring to the ability of an active participant such as a trader who through their actions have the ability to outperform the market. The market being 'beta' with any outperformance of that benchmark being alpha. The market (like a portfolio) is representative of the summated entry and exit decisions of the entire population of participants that interact with it. We as traders are seeking to obtain arbitrage opportunities in this overall zero sum game (with the added burden of frictional costs that go to the intermediaries) by outperforming the average participant over the long term and stealing a greater portion of the 'market pie' than our competitors.

What I mean by 'noise' in the market are the price movements in market data that have no enduring potential to influence a future price outcome. For example, at the finest level of granularity we have the tick, which is a price exchange at the finest resolution arising from the interplay between a small number of participants, each with their own price objectives and strategies, who impart that information into the market through their entry and exit exchanges. That price movement at the micro scale may or may not have a bearing on the future price trajectory, and if it doesn't....then it is classified as noise and can effectively be termed 'random' in that its presence does not have a bearing on future price action. It is not termed 'random' to reflect the definition of randomness applied in physics, but in less formal trading talk it is termed random in that it has no bearing on the future trajectory.

When you have two random price signatures superimposed on each other, by definition....there is no correlation between the series. As a result you tend to get destructive interference of the signatures when placed on top of each other. However when you have auto-correlated signals buried in the price series, the superposition of different 'correlated' time series on top of each other tends to amplify the ups and downs to turn what originally had a lot of noise present in all the series, to a blended series with less noise and more signal.

What this means is that at the smaller timeframes, there is lots of noise in the signals....whereas as you step out into the longer timeframes, the signal/noise ratio increases with more signal and less noise. This ultimately has a bearing on the markets where we look for fundamental reasons for price movement in long term data....as the signal bias present in a time series....only reveals itself over a large sample size.

The same effect is achieved by flipping a biased coin. Over a small number of flips the results may or may not reflect the long term statistical bias. It is only through the law of large numbers that we can ultimately detect the bias.
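A rough sketch of the coin-flip point (the 0.52 bias is arbitrary, standing in for a weak edge):

```python
# Rough sketch of the biased-coin point: a small sample tells you little about the bias,
# but the estimate converges as the sample grows (law of large numbers).
import random

random.seed(3)
true_p_heads = 0.52                          # a slight bias, analogous to a weak edge

for n in (10, 100, 1_000, 10_000, 100_000):
    flips = sum(random.random() < true_p_heads for _ in range(n))
    print(f"{n:>7} flips: observed heads ratio = {flips / n:.3f}")
# With 10 or 100 flips the 0.52 bias is indistinguishable from noise; only with a large
# sample does the weak signal reveal itself.
```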

Because of my resigned respect for the EMH, I have the following central assumptions built into the way I tackle them:
1. Markets are competitive and an edge is 'hard to find';
2. It is difficult to achieve excess returns after accounting for risk;
3. A trader with an edge is a rarity;
4. An edge is ephemeral and usually a fleeting condition;
5. The EMH is right most of the time but not all the time...and it is the 'not all the time' where speculators can flourish if they know their stuff. The aim of an efficient market is just that...to be an efficient method of transacting at fair value..hence arbitrage opportunities in an efficient market are a rarity (and treated as anomalies) as opposed to a norm;
6. Weak signals in market data get amplified by the law of large numbers, but there are valid statistical methods such as the Dickey-Fuller test that can, within confidence levels, determine the degree to which auto-correlation is present in a time-series;
7. You have no hope of discriminating by eye between a random price series and a price series with non-random signals present. The best techniques for determining this are statistical.

I will stop there rather than continue with the Tolstoy Epic :-)....but hopefully it gives you an idea of what shapes my philosophies applied to trading....and gives a basis for some of the statements I might make. There is no right or wrong....just different approaches to tackle this moving feast we call a market. We would probably all agree that this is a very tough game indeed. :-)

It's great to chat with you.

Cheers

Rich


6 (edited by hannahis 2018-08-17 12:41:28)

Re: Proof that too much historical data is harmful...

Hi Rich,

Excellent post, as usual, packed with gems.

I just want to highlight one aspect that may get muddled up in the process of discussion.

Longer Historical Data vs Shorter Historical Data

a) we need to evaluate the "quality of the EA" produced by longer data vs shorter data.

Most would argue that longer data produces a more robust EA (which I agree with too), but the issue is not just the span of the duration; the assumption behind longer data is that we can expose/train our EA on more varied market conditions so that it remains profitable/robust.

Rich's trading style is to search for robust EAs so that he doesn't need to switch the EA on/off; it can weather the market storms.

The reason Steve thinks longer data is "bad/harmful" is probably because Steve's trading style is to find the best/optimal EA for the current market situation, and he doesn't need the EA to be robust enough to weather all sorts of markets because he uses the Sidekick software to determine when to turn it on/off.


Survival of the fittest vs Profit maximisation

b) So the question is, does longer data produce a better and more robust EA?

Are you looking for a robust EA that can trade and survive in all sorts of markets (without you having to turn it on/off)? - Survival, and making the portfolio "unsinkable", as Rich would say.

or

Are you looking for an optimal EA that gets the most profit out of the current market, while you need to constantly re-optimise it over and over again to keep it abreast?  Profit maximisation is the aim (more so than survival).

So in conclusion, it's not a matter of whether to use longer or shorter historical data.  The question is, what's your trading style?  Do you want robust EAs so that you can be hands-off, or do you want max profit with a constant need for re-optimisation?  Both trading styles have their own merits and limitations.  The choice is always yours....hmmm, I sound like Q.

7 (edited by ats118765 2018-08-17 13:19:27)

Re: Proof that too much historical data is harmful...

hannahis wrote:

So in conclusion, it's not a matter of whether to use longer or shorter historical data.  The question is, what's your trading style?  Do you want robust EAs so that you can be hands-off, or do you want max profit with a constant need for re-optimisation?  Both trading styles have their own merits and limitations.  The choice is always yours....hmmm, I sound like Q.


Spot on, Hannah. Excellent summary. Different courses for different horses.

In the trading world context matters, as conditions are non-stationary and trading styles need to be adaptive. What works yesterday doesn't necessarily work today......but in a diversified portfolio, embedded in that solution (or in the strategy response) needs to be the ability to address a plethora of different market conditions that can change on a pinhead with no warning.

The bottom line is that what we are applying here is the scientific method, and the approaches we all use are simply different models. We can only gain access to partial information and as a result our best models will only ever be approximations....some better than others at different times....but the ultimate arbiter is the performance results produced and, as markets adapt, so should our models. What we need to avoid is assuming a particular solution is an enduring solution, and we need to be on a continuous journey of improving our models. This may come from small scale refinements to the occasional total rewrite.....but we can take a cue from science in this regard, as it is a never ending process of making better models that stand up to the current boundary of empiricism.

Cheers guys


8 (edited by sleytus 2018-08-18 06:41:06)

Re: Proof that too much historical data is harmful...

Thanks for keeping the thread alive -- lots of interesting points and concepts, many of which are new to me.

To keep the discussion going, I'll clarify a couple of points and then I also had a question.

ats118765 wrote:

I see and understand the points you raise regarding optimization. I am not a big fan of it myself as it exacerbates the propensity for curve fit solutions.

Yep -- the problem of overly curve-fitted strategies -- I've definitely fallen into that trap.  But I would claim there is a difference between optimization and over curve-fitting.  Optimization is a good thing, whether you use it in a weak or strong form, and is necessary if you employ strategies that use simple indicators like the ones in FSB-Pro or EA Studio.  Almost all indicators require one or a few settings -- and these values are absolutely critical.  I don't know of a metric or recipe that can be applied to determine whether indicator settings are overly curve-fitted.  However, after many thousands of trades in a Real account you start to get a feel for things.  A different thread topic -- "There is no substitute for Real trading...".


hannahis wrote:

The question is, what's your trading style?

Yes, there are different trading styles.  We all sit at different desks located at different latitude and longitude coordinates around the world -- which is just tooooo cool -- so, of course, our styles and preferences are bound to vary.  However, I would still claim that data is king regardless of your approach.  There is an expression -- "garbage in, garbage out..." -- it refers to input data used for computations or measurements.  There are many fields that collect data and then analyze it -- and I can't think of a single one where someone would intentionally dilute or obscure new data in favor of old data.  Whether you use GPS on your cell phone to find your next location, or are deciding whether to buy stock in a company, or are hooked-up to some monitoring device for medical reasons, or whatever.  Can anyone name 3 fields where data analysis is performed and old data is allowed to contaminate new data?


And, now, to my question.  Let's say I offer you the choice between two strategies:
1. Strategy "Trades Well Now" (TWN) -- this strategy currently trades in a Real account.  You don't know how it will trade in the future -- but, for now, it seems to be doing fine.  Also, you don't know whether or not this strategy would have passed all your tests for "robustness".
2. Strategy "Robust" (ROB) -- this strategy has passed all your testing and critera, but has NEVER seen a live account.

Which strategy -- TWN or ROB -- would you choose and why?  You can probably guess which one I would choose (sorry -- I've intentionally worded things to make that obvious and, perhaps annoying).  But I would like to better understand, when it comes down to making a simple choice, why you would choose one over the other.  This is how I learn -- make a choice and then let others shoot it down.  And then I'll refine or change my approach to adopt new insights.  Over the past 1.5 years I can't keep track of how many times I've changed my trading style.


Actually -- I lied.  A second question...
There are lots of manual traders -- and many are successful.  Furthermore, they may or may not even use indicators.  They look at the current price chart and their brain makes a calculation.  If you would claim that our EAs require lots of historical data, then how is it that a manual trader can be successful?


Thanks...

9 (edited by ats118765 2018-08-18 07:30:48)

Re: Proof that too much historical data is harmful...

sleytus wrote:

And, now, to my question.  Let's say I offer you the choice between two strategies:
1. Strategy "Trades Well Now" (TWN) -- this strategy currently trades in a Real account.  You don't know how it will trade in the future -- but, for now, it seems to be doing fine.  Also, you don't know whether or not this strategy would have passed all your tests for "robustness".
2. Strategy "Robust" (ROB) -- this strategy has passed all your testing and critera, but has NEVER seen a live account.

Which strategy -- TWN or ROB -- would you choose and why?

Steve....just to throw a spanner in the works I would choose the second, with the following caveat....that even though it has not seen a live account, we assume that its execution is in accordance with the backtest result. That backtest result, derived from data mining methods, needs to be treated to ensure it is 'within realistic' bounds of probability (eg. the median result of a comprehensive Monte Carlo array or 'block bootstrapping' test).
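A rough sketch of that block-bootstrap treatment (the trade returns and block size below are illustrative placeholders, not the output of any particular backtest):

```python
# Rough sketch of a block bootstrap on a backtest's per-trade returns: resample
# contiguous blocks (preserving short-range dependence), rebuild many alternative
# equity outcomes, and judge the strategy by the median rather than the single
# optimized curve. The returns and block size are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
trade_returns = rng.normal(5.0, 60.0, 300)        # stand-in for the backtest's trade results
block_size, n_sims = 10, 1000

final_equities = []
for _ in range(n_sims):
    blocks = []
    while sum(len(b) for b in blocks) < len(trade_returns):
        start = rng.integers(0, len(trade_returns) - block_size)
        blocks.append(trade_returns[start:start + block_size])
    resampled = np.concatenate(blocks)[:len(trade_returns)]
    final_equities.append(resampled.sum())

final_equities = np.sort(final_equities)
print("median final equity:", np.median(final_equities))
print("5th percentile:     ", final_equities[int(0.05 * n_sims)])
```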

The reason I would always select this option is due to survivorship bias.

Let's assume you are back in 2000 and have funds ready to allocate towards various strategies....in this case fund managers. The choice is to select either:
1. The Fund manager with the best annual return for the past 12 months; or
2. The Fund manager that has had a track record of 20 years plus and offers a steady risk weighted return that is far lower than the annual returns generated by the option 1 guys.

This is effectively your question simply reshaped for the following example. In this following example think of an individual fund as a single strategy.

I have actually looked at both options and there is an approach that effectively guarantees you will outperform the market and be alive in 20 years' time just by using long term metrics of risk-weighted return. The former approach, which simply looks at 'best profit performance over a short timeframe', is very deceptive, as you tend to jump onto these opportunities too late: you first needed to identify them, which means that most of the meat has already been eaten and your data mining competitors are also greedily eyeing off the opportunity or have already jumped on board. Data mining is a pretty new game in town....but even today the competition in this space is significant, and enough to ensure that any opportunity arising from recent outperformance is jumped on quickly and any arbitrage available quickly exploited.

It is a tough game identifying whether a strategy is broken or whether it is entering a drawdown...and that is why alpha is more persistent in an approach that deliberately seeks more volatile strategies that others avoid but can be treated at the portfolio level to reduce overall risk volatility.

Here is an article https://www.raftradingsolutions.com/let … nds-story/ that showcases what you could do in 2000, without any hindsight bias, to ensure you achieve this over the long run. If you had simply selected the best profit performers, many of them would not be around today; furthermore the need to rotate to different funds each time you rebalance your portfolio would introduce considerable drawdowns into your performance.

The tail risk that exists in any strategy or portfolio relates to its ability to manage the downside....not the upside.

Regarding the 'simplicity' of indicators generated by EA Studio....ideally simplicity is what you are after for robust strategies. The more parameters that exist in a strategy (or the more complex they are), the more the degrees of freedom allowed for by your strategy are reduced. What this means is that your strategy then dictates terms to the market (imposed constraints), when in fact the more robust solutions with fewer parameters impose fewer dictates on the market and rather respond to the market condition.

For example think of a simple hard stop placement on a unidirectional 'long only' strategy. If the stop is very wide then the overall signature of the equity curve produced by the strategy is more representative of the market condition. For example when the market is bullish, the strategy will be profitable and the equity curve will rise. When the market is bearish, the equity curve will fall.

If the stop gets placed too close to the entry, without sufficient room to breathe, the equity curve shape is dictated by the strategy (a linear descending line from continuously being stopped out....which bears no resemblance to the market condition). In other words the equity curve is derived from the strategy as opposed to the market condition. I also believe data is king....but market data, not system derived data. The strategy needs to float on the market condition (respond to the market data) as opposed to dictating terms on how the market should behave to generate the profitability (eg. force a profit outcome by design).
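A rough sketch of that stop-width effect on a purely synthetic price path (prices, spread and stop distances are all made up; the point is only the shape of the outcome, not the numbers):

```python
# Rough sketch of the stop-width point: on the same synthetic path, a long-only rule
# with a very tight stop produces a result dominated by repeated stop-outs and
# re-entry costs, while a wide stop roughly tracks the market itself.
import random

random.seed(2)
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] + random.gauss(0.01, 1.0))    # gently drifting random walk

def long_only(prices, stop_distance, spread=0.1):
    equity, entry, stop_outs = 0.0, prices[0] + spread, 0
    for p in prices[1:]:
        if p <= entry - stop_distance:       # stop hit: realise the loss...
            equity += p - entry
            entry = p + spread               # ...and re-enter, paying the spread again
            stop_outs += 1
    return equity + (prices[-1] - entry), stop_outs

print("wide stop  (final equity, stop-outs):", long_only(prices, stop_distance=50.0))
print("tight stop (final equity, stop-outs):", long_only(prices, stop_distance=0.3))
```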

Andreas Clenow...a very successful fund manager....and many others in the FM space discuss at length the need for the individual strategies deployed in a portfolio to be very simple...such as simple moving average crossovers, Donchian breaks etc.....but that these strategies must be adjusted for the characteristic volatility present in different markets. Complexity occurs at the portfolio 'global level' and not the individual strategy level.

Unfortunately at the moment EA Studio spits out fixed lot outputs for a single strategy, but over time, if EA Studio is further developed with inclusions such as volatility adjusted position sizing etc., we will be able to get more heavily into the portfolio management side of things where the real 'free lunch' resides. It is already a fantastic piece of software in its current form....but the future potential is teasing me :-)

It is essential that statistical or other methods are used to determine what is potentially random versus what is potentially a signal. Programs such as EA Studio will identify candidates through validation that meet your specified criteria. You must however supplement this with robust techniques to assess whether or not the result is simply curve fit to the data, or data mined on a signal that may or may not exist.

For example here is a sample equity curve that appears valid as a profitable strategy. Unfortunately the result is simply random and has no future forecasting potential.

https://s22.postimg.cc/ooncpx3x9/Sample_Instrument_Random.png

Without rigorous statistical testing you could easily fall into the trap of assuming that this strategy will work into the future.

Here is another equity curve that is not random. There is auto-correlation in the data that creates a bias to the data.

https://s22.postimg.cc/9tyrbfwtp/Sample_Instrument_Drift.png

Can you visually spot the difference? Would your performance results pick this up? The signal bias in this data series projects into the future but the random element of the series also creates significant variance against the future projection. It is exceedingly hard to confidently identify strategies that have the potential to survive into the future. Robustness testing is an essential prerequisite to apply...but in itself is also no guarantee.
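A rough sketch of putting a statistical test on that question instead of the eyeball, using two synthetic series (one pure random walk, one with a small persistent bias).  Note that the ADF (Dickey-Fuller) test checks for a unit root, so the per-step bias is also tested directly with a plain t-test on the increments.

```python
# Rough sketch: generate a pure random walk and a walk with a small persistent bias,
# then test rather than eyeball. The ADF test checks for a unit root; drift alone does
# not remove a unit root, so a t-test on the per-step increments is shown as well.
# Both series are synthetic stand-ins, not the data behind the charts.
import numpy as np
from scipy import stats
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
n = 2000
random_walk = np.cumsum(rng.normal(0.0, 1.0, n))    # no bias: pure noise
biased_walk = np.cumsum(rng.normal(0.1, 1.0, n))    # small positive bias per step

for name, series in (("random", random_walk), ("biased", biased_walk)):
    adf_p = adfuller(series, regression="c")[1]                 # p-value of the ADF test
    drift_p = stats.ttest_1samp(np.diff(series), 0.0).pvalue    # is the mean step != 0?
    print(f"{name:>6}: ADF p = {adf_p:.3f}   mean-increment t-test p = {drift_p:.5f}")
# Neither eyeballing nor a rising equity line settles the question; confidence levels do.
```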

These are the sort of traps we might fall into by placing an undue emphasis on recent profitability. At least with long historical data series the probability of a curve fit result is significantly less than with a result derived from a shorter timeframe.


Re: Proof that too much historical data is harmful...

I just wanted to jump in and highlight an observation which may be missed by some readers (Please note I just flew in from China so my brain isn't so sharp and this thread requires a few more readings).

We are really talking about the optimization period here.

I think it would be easy for someone to mix up the data horizon over which you build the system (which is a different topic). I imagine that you have a length of system-building data, then a specific period for optimization.  Personally I put stock in building the system over a longer data horizon - as I don't want the system to perform over just one market regime.

So let's say you optimize over 4000 bars.  What would you want the historical data length to be when you build the system (10x - 40,000 bars? or something else)?

P.S. I ask this to all - as I am sure everyone has different methods, and it's nice to get different perspectives.  smile

PT

11 (edited by ats118765 2018-08-18 08:22:13)

Re: Proof that too much historical data is harmful...

DoCZero wrote:

What would you want the historical data length to be when you build the system (10x - 40,000 bars? or something else)?
PT

Hi Doc. I hope you had a great trip :-)

For me..I use what is available from the platform I trade off.....and the more the merrier. I also tend to use multi-market tests where possible subject to limitations of fixed lot sizing.

It is all just data to me Doc...so different markets, in relation to data mining, simply represent different conditions. If I test across 6 different markets for, say, the last 10 years, then in my opinion it represents 60 years of different market data.

But...you probably know by now where I am coming from and all of us have different approaches we take. Certainly I can say my portfolios are robust....but the returns they actually deliver are nothing like the stunning equity curves that are generated from EA studio.....but I tend to favour robustness and sustainability over shorter term objectives.


Re: Proof that too much historical data is harmful...

1. Why do EAs require 20 years of historical data,  whereas manual traders can trade profitably using only the last 3 days from a price chart?

2. Why is "currently profitable" knocked as a "short-term objective"?  I mean, since we can't predict the future, a strategy that is "currently profitable" could also be profitable well into the future. 

3. Since "robust and sustainable" refer to how a strategy trades in a Real account, then a strategy that currently trades profitably is, by definition, more robust and sustainable than a strategy that exists only on paper.

I don't know about you guys, but "currently profitable" sounds pretty good to me.  Many of us have created strategies that look great on paper, only to see them fail in a Real account.  Backtest results do not guarantee performance in a Demo account, and Demo account performance does not guarantee performance in a Real account.  The only true measure of a strategy is how it trades in a Real account.  Everything else is just playing in a sandbox.

13 (edited by ats118765 2018-08-19 03:01:07)

Re: Proof that too much historical data is harmful...

sleytus wrote:

Actually -- I lied.  A second question...
There are lots of manual traders -- and many are successful.  Furthermore, they may or may not even use indicators.  They look at the current price chart and their brain makes a calculation.  If you would claim that our EA's require lots of historical data, then how is it that a manual trader can be successful?

Sorry Steve....I missed the second query.

Discretionary traders certainly can capitalise on persistent market conditions where price action becomes more predictable, operating around a temporarily stable equilibrium point....... and can significantly outperform systematic traders in the short term......but across a longer term array of different market conditions that are defined by non-stationary equilibria, very few discretionary traders survive.

Jarratt Davis had a small hiatus in the sun....yet I cannot find evidence of an extended run. Peter Brandt is another well-known technical and discretionary trader who appears to have had a successful record.....however these guys are the exception rather than the rule.

The key to survival across different market regimes is adaptability. Discretionary traders tend to be specialists who excel in a particular trading style that responds to a particular market condition......however the downside to their specialty is that they rarely have an arsenal of alternative approaches that can be applied to different market conditions successfully.

This is where systematic diversified portfolios that are already equipped with strategies configured for alternate market conditions come into their own. They are less proficient at 'making hay when the sun shines' in a single stable market condition, and therefore under-perform the discretionary specialists during these times....but over an extended time horizon across a number of different market conditions.......like the tortoise and the hare.....they outlast and outlive their 'flash' discretionary cousins.

I spend a lot of my time on Barclays and IASG reviewing verified performance data to report on fund manager performance https://www.raftradingsolutions.com/pre … june-2018/ and have yet to find more than a very few discretionary traders that have stood the test of time maintaining performance metrics over 10 years plus.

The most successful breed of fund managers are the CTAs, which are a subset of the hedge fund segment but depend almost entirely on the repeated systematic application of diversified momentum/trend following strategies across their portfolios. The likes of Dunn Capital, Winton, Dreiss Research, Transtrend, Fort Capital etc. etc. etc. have stunning long term metrics.

It is worthwhile spending some time examining their monthly performance results, as frequently you see extended periods of losses. As you have mentioned previously, trading on the right hand side of the chart is a different kettle of fish to the linear rising equity curves you see from these guys. Patience and resilience are required, and long losing streaks must be endured, to enjoy the long term benefits of what these guys deliver.


14 (edited by ats118765 2018-08-19 04:45:50)

Re: Proof that too much historical data is harmful...

sleytus wrote:

1. Why do EAs require 20 years of historical data,  whereas manual traders can trade profitably using only the last 3 days from a price chart?

The sample size is less important than the different array of market conditions encountered. For example in the High Frequency Trading environment, the sample size is huge (within a single market condition)....and strategies deployed in the very short term environment are very unlikely to still perform if conditions change (some exceptions here like order 'front running' bots). We use sample size more as a proxy to ensure we have a better chance of being exposed to alternating market conditions.

Certain classes of strategy (eg. divergent strategies) such as trend following and momentum methods 'rely' on non-stationary conditions and make their profits from market disruptions. Their performance results are a consequence of their style. They have a low Pwin% but relatively high R:R ratios....but also have a divergent risk feature, being their positive skew, that makes them robust. This characteristic signature creates the volatility in their equity curves. This doesn't mean they are 'worse strategies' or have 'higher risk exposure'. In fact they are far more long lasting than their convergent cousins, and there are a swathe of white papers to demonstrate the efficacy and long lasting nature of these approaches.

The times this class of strategy shines are during volatile, less stable market conditions. At other times they churn to simply keep their heads above water while waiting for 'crisis alpha' moments. Statistically these strategies are deployed to catch 'fat tailed' events on either side of the probability distribution.

Have a look at the following chart, which demonstrates the 'churn' that is required to generate long term solid performance metrics for a style of divergent strategy (eg. a momentum breakout trader). To achieve these results (which appear easy in hindsight), you have to endure a painful, long-winded process of disappointment. Approximately 90% of the trades taken were simply to keep the head above water...but the anomaly (10% of trades) was what led to the overall strong performance metrics of this solution. The reason that alpha (arbitrage opportunities) persists for trend following and momentum is that the game is very hard and requires intense discipline and patience. You simply stay diversified and follow price again and again and again with the same recipe....and let the market decide when it gives you a windfall.

https://s22.postimg.cc/ch6st3bu5/Divergent_-_Long_Term_Performance.png
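A rough numerical sketch of the two profiles being contrasted here and below -- a divergent book (low win rate, large winners, positive skew) versus a convergent book (high win rate, small winners, occasional large loss, negative skew).  The trade distributions are purely illustrative.

```python
# Rough numerical sketch of the two profiles: a divergent book (low win rate, large
# winners, positive skew) versus a convergent book (high win rate, small winners,
# occasional large loss, negative skew). The trade distributions are purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

# Divergent / trend-following style: ~30% winners averaging +3R, ~70% losers at -1R.
divergent = np.where(rng.random(n) < 0.30, rng.normal(3.0, 1.0, n), -1.0)

# Convergent / mean-reversion style: ~85% winners at +0.5R, ~15% losers averaging -2R.
convergent = np.where(rng.random(n) < 0.85, 0.5, rng.normal(-2.0, 1.0, n))

for name, trades in (("divergent", divergent), ("convergent", convergent)):
    print(f"{name:>10}: win% = {np.mean(trades > 0):.2f}  "
          f"expectancy = {trades.mean():+.2f}R  skew = {stats.skew(trades):+.2f}")
# Both books can show positive expectancy, but the negative skew of the convergent book
# is the intrinsic risk referred to above: the equity curve looks smooth until it doesn't.
```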

The flip-side to this argument, however, is that mean reverting convergent systems may have a hiatus in the sun over a short term interval.....but are very unlikely to last more than a few years' duration. They rely on the leptokurtic peak of the profit distribution, as opposed to the 'fat tails', to make their bread and butter. Under a normal distribution, no strategy wins. Under a non-Gaussian distribution you get two broad styles of trading strategy that can harvest arbitrage (the tails and the peak of the distribution).

These convergent styles of trading strategy work on the principle of reversion back to an equilibrium, whereas divergent styles are forward looking and simply assume that conditions will change. Convergent styles are backward looking and rely on prediction, namely that price has taken an excursion away from a stationary equilibrium and will in future revert back to that 'known point'. The assumption typically used by this type of strategy is that the market becomes overvalued or undervalued and will revert. This is great while it lasts, but totally disruptive when it doesn't.

For example, the chart below represents the Trend Following Index, which comprises an equal-weighted index of 54 diversified, fully systematic trend-following fund managers. You will note that market conditions post-GFC have changed: we get extended periods of mean reversion (in which these funds under-perform) interspersed with short, sharp periods of strong volatility and momentum (aka trends). Is trend following dead? Well, viewed from this height you can see it isn't. The overall line of best fit generates a CAGR of 9.3%. In hindsight, the mean-reverting periods can be attributed to central bank intervention, whereas the volatile market disruptions can be attributed to events such as Brexit, the GFC, oil shocks etc. The key to this diagram is how different strategies fare under different market conditions.

https://s22.postimg.cc/bp44u87ql/TF_Chart.png
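
For readers wondering where a 'line of best fit' CAGR comes from, here is a minimal sketch: regress the log of the equity index against time and convert the slope into an annual growth rate. The series below is a synthetic stand-in with an assumed ~9% drift, not the actual Trend Following Index data.

```python
# Synthetic stand-in for a long-running equity index, plus the log-linear
# "line of best fit" used to read off a CAGR.
import numpy as np

rng = np.random.default_rng(0)
years = np.linspace(0, 18, 18 * 12 + 1)                                  # monthly points
equity = 100 * np.exp(0.09 * years + rng.normal(0, 0.10, years.size))   # drift + noise

slope, _ = np.polyfit(years, np.log(equity), 1)   # fit log(equity) ~ slope * years + c
cagr = np.exp(slope) - 1
print(f"fitted CAGR: {cagr:.1%}")                 # roughly 9% for this stand-in series
```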

You can see from this aerial view that a single market condition, such as a mean-reverting market context, can last for extended periods of up to 10 years or so. However, when conditions do change you get a massive disruption to participants, and our mean-reverting cousins become extinct :-)

If you only focus on the short-term time horizon to derive your strategies, chances are you will skew your portfolio towards 'convergent styles' that bear far more intrinsic risk. They typically have a high Pwin% and lower R:R but, most importantly, negative skew, which is a sure sign that they bear intrinsic risk far greater than what is revealed by their 'closed position' equity curves. If you had access to their floating equity curves, you would see this intrinsic risk in action.

Other strategic styles that have negative skew are Martingale and grid-trading variants. The symptom of this style of strategy is that its equity curve looks glorious for a period of time....until it doesn't....and the account blow-up comes without warning. You would think you could turn off these strategies in time to save yourself from an account blow-up, but rarely is this the case, as a degree of prediction is required to know when to turn them off.
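
The negative-skew warning sign is easy to measure from a trade list. Here is a minimal sketch with made-up numbers (not real account data): many small winners and a rare catastrophic loser give a very high win rate but a strongly negative skew, which is exactly the signature described above.

```python
# Illustrative negatively skewed trade stream (assumed numbers, not real data):
# a steady drip of small wins with a rare outsized loss.
import numpy as np

rng = np.random.default_rng(7)
n_trades = 2000

blowup = rng.random(n_trades) < 0.02
returns_r = np.where(blowup,
                     -rng.uniform(15, 30, n_trades),   # rare catastrophic losers
                     rng.uniform(0.1, 0.6, n_trades))  # many small winners

win_rate = np.mean(returns_r > 0)
skew = np.mean((returns_r - returns_r.mean()) ** 3) / returns_r.std() ** 3

print(f"win rate: {win_rate:.2f}")   # very high Pwin% -- the "glorious" phase
print(f"skew:     {skew:+.2f}")      # strongly negative -- the hidden risk
```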

sleytus wrote:

2. Why is "currently profitable" knocked as a "short-term objective"?  I mean, since we can't predict the future, a strategy that is "currently profitable" could also be profitable well into the future.

It is not knocked mate....and it is a valid and noble objective, provided you take the big picture of market uncertainty and the impact of fat-tailed events into account. The more you diversify your strategies, the more you tend to come into contact with the fat tails. You can capitalise on this by catching 'white swans' using divergent tactics....but if you are a convergent trader who is diversified, those 'white swans' turn 'black' very quickly.

Diversification is not just market specific; it also relates to the timeframes chosen and the different strategies deployed. In fact, system diversification is the best way to 'become diversified'. So if you trade a single market and timeframe but do so with a variety of different strategies, you are still diversified and will come into the realm where fat tails really matter, as they are much more frequent than a normal distribution would imply. :-)

sleytus wrote:

3. Since "robust and sustainable" refer to how a strategy trades in a Real account, then a strategy that currently trades profitably is, by definition, more robust and sustainable than a strategy that exists only on paper.

Definitely Steve. But it helps to think in terms of 'uncertainty' about the future and your ability to address risk, namely the likelihood and consequence of unfavourable market conditions, if you want a long livelihood in this game. Engineers like to think in terms of certainty, where there is a reason why things happen, so they build the perfect bridge....but the market, being a complex system, is not predictive but rather adaptive in nature, with emergent relationships driving cause and effect....and the market always has the last laugh.

sleytus wrote:

I don't know about you guys, but "currently profitable" sounds pretty good to me.  Many of us have created strategies that look great on paper, only to see them fail in a Real account.  Backtest results do not guarantee performance in a Demo account, and Demo account performance does not guarantee performance in a Real account.  The only true measure of a strategy is how it trades in a Real account.  Everything else is just playing in a sandbox.

Playing in the sandbox would probably be regarded by the research houses in quant funds as a bit of a slanted statement. There is much we can learn from the research community working in the industry that may assist us on our path. Programmers have to learn their craft from theory before they apply it, the same as fund managers. Unfortunately such rigour is seldom the case in the retail community.

I would just say that optimised backtests derived from EA Studio, unless stringently treated, will never produce similar results in the live environment. This is not to say the software is bad....but rather that you need to be aware of its current limitations.

Cheers guys

I am taking a break for a while as I have mouthed off a bit too much lately. :-)

Thanks for the conversations all.

Rich

Diversification and risk-weighted returns is what this game is about

15 (edited by sleytus 2018-08-19 05:09:15)

Re: Proof that too much historical data is harmful...

Rich -- thanks for hanging in there with me.  Clearly you are a professional in the field and your knowledge and expertise eclipse mine by orders of magnitude.

As a research biochemist in a previous life and a developer of wireless diagnostic tools in this one, I've looked at more than my share of data.  And through it all I am the biggest fan of the KISS principle.  I approach forex sort of like a mechanic -- always keeping KISS in mind -- and I think Popov's software gives us the best tools for this type of approach.

The principles and techniques you describe and refer to are way over my head.  That doesn't mean they aren't correct, only that I have no means to comment on them.  Also, even if I did understand them, I have zero clue how I would implement and apply them.  Furthermore, at the risk of sounding like a dimwit, I will step out onto the plank and venture to say I don't think they apply here.  Forex is a different type of beast than other types of investments (which could be a topic for a different thread).  The primary reason I can't buy into what you describe is that I think the stormy periods would cause draw-downs that would eventually wipe out my account before it had a chance to turn around.  Big investors, institutions, banks, etc. can ride out the storms, as you say, but my smallish accounts could not.  Yes -- I know about risk and money management -- and do take care to ensure I live to trade another day.  We trade in the present -- there's no past or future -- which is why current profitability is really all that matters.  And then tomorrow becomes the current and, hopefully, the strategy again exhibits current profitability -- at least that's the plan.

Adapting to current market conditions -- absolutely -- that's why I claim refreshing strategies with recent data is very important.  I mentioned the mechanic analogy -- just like changing your oil every 3000 miles or 3 months (whichever comes first) keeps your car humming, it will do the same for your forex strategies.

Rich -- I thank *you* for the conversation.  One of the best I've had in a long time.

Re: Proof that too much historical data is harmful...

Cheers Steve

All good mate and thanks for the chance to contribute :-)

Rich

Diversification and risk-weighted returns is what this game is about

17 (edited by hannahis 2018-08-19 09:44:08)

Re: Proof that too much historical data is harmful...

Steve wrote - Forex is a different type of beast than other types of investments (which could be a topic for a different thread).  The primary reason I can't buy into what you describe is that I think the stormy periods would cause draw-downs that would eventually wipe out my account before it had a chance to turn around.

Yes, Steve, without properly weighted risk management one may not survive the long drawdown, and the account can easily be wiped out.

I believe that is why Rich emphasized the importance of robust EAs and properly weighted risk management: so that we learn the art of putting together a group of EAs that can keep the account breaking even (head floating above water) during the prolonged ranging periods (90% of the time) in order to catch the king hit of the 10% windfall.  Without proper risk management one can blow up an account very easily through over-trading; likewise, with proper risk management one may master the art of surviving for the long haul, not just achieving short-term success.

Lastly Rich, thanks so much for your delightful contribution and for stimulating our brains.  Could you kindly write another, separate post about your journey, the path and learning process you took to get where you are now?  I believe a lot of us who aspire to become fund managers one day would learn from you and understand the stages of professional growth and the milestones we need to reach to be successful.

Could you open another post, something like a personal blog, to share whatever you deem helpful for aspiring traders?  Could you share the pitfalls to avoid and the road to success that seems so elusive to many of us?

Finally, the pointers I take away from this discussion are:

1) Robust EAs - the importance of building more robust EAs for long-term sustainability (using longer historical data so our EAs are trained across varied market conditions).  Hence, MC and Multi Market are good tools to use for creating robust EAs.

2) Divergent EAs - though they may have terrible short-term equity curves, they are by far more sustainable in the long run compared to convergent (mean-reversion) EAs.

3) Robust portfolio - for long-term survival it is important to blend different types of EAs (diversification) to keep a well-balanced composite of EAs and construct a good, robust portfolio; together with weighted risk management, this can then be used to help maximise returns/profit.

4) So this is what I would do...

a) In the area of EA development - build more robust EAs

b) In the area of portfolio management - use diversification of EAs with uncorrelated strategies/markets to keep a good blend of EAs to tide me through the storm.  I'm also wondering: in addition to a robust portfolio, can I find a sweet spot (by observing my EAs' behaviour) to turn my EAs on and off (while keeping them running in demo to collect performance data) based on certain metrics?  So instead of resigning myself to the fact that a particular EA spends 90% of the time just trying to stay afloat (while, in a blended portfolio of uncorrelated EAs/markets, some EAs are losing and others winning), I'm thinking of switching them off so that I don't have to endure the painful experience of bearing with the long drawn-out DD.

c) In the area of risk management - with properly weighted risk management, make my portfolio "unsinkable" so that I can survive over the long haul while many retail traders come and go.


Thanks once again to both Steve and Rich for such a wonderful discussion; I certainly look forward to more.

18 (edited by Irmantas 2018-08-19 13:58:42)

Re: Proof that too much historical data is harmful...

Hi again,
Very interesting discussion when we have real professionals here :) Lots of valuable information.  I will try to contribute to the topic with my humble experience/opinion.

I just did a small experiment about "less data is better".  Before showing the results I first want to clarify my views on using OS (out-of-sample) checks.  There are two ways to use them.  The first is traditional: if the OS check is passed, you confirm that the strategy is valid and let it trade live.  I do not use it this way any more, because I ran an experiment (similar to the one below) which showed that an OS check has no value for future profitability.  I also read somewhere (I think it was in a paper about "system parameter permutation") that passing an OS check is effectively the same thing as just using a little more data, the data you excluded for the OS check in the first place.

The second way to use OS, which is what I do, is for simulation purposes: to find out what really matters in automated system creation and trading.  If I have an idea or question to test, I don't create a bunch of EAs and let them run on demo for 3/6/12 months to see whether it works; it could take decades to find out what really works that way.  Instead, I exclude the desired data period (6-12 months), create EAs while paying close attention to (and measuring) whatever my idea is about, and after building some sample size (20-100 systems) I look at the OS results and check whether there is any correlation between them and my idea.  Does it improve things, or is it just pure randomness?  If no correlation is measured, I conclude that my idea is not important in system creation; if there is one, I can incorporate it into my future system-creation workflow (of course always creating systems with fresh data :) ).
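
As an aside, here is a minimal sketch of the shape of that test (my reading of it, not Irmantas' actual code). The system generation and the out-of-sample backtest are replaced by random placeholders; the point is only the final step of checking whether an in-sample property correlates with out-of-sample profit.

```python
# Shape of the out-of-sample correlation test. Real system generation and the
# OOS backtest are replaced by random placeholders here; only the final
# correlation step is the point.
import numpy as np

rng = np.random.default_rng(1)
n_systems = 50

# Placeholders: in a real run, the first array would be the in-sample property
# under test (e.g. trades per month) and the second the profit of each system
# on the withheld 6-12 months of data.
in_sample_property = rng.uniform(2, 60, n_systems)
oos_profit = rng.normal(0, 100, n_systems)

corr = np.corrcoef(in_sample_property, oos_profit)[0, 1]
print(f"correlation with OOS profit: {corr:+.2f}")
# Near zero: the idea adds nothing to system creation.
# Consistently non-zero across repeats: worth adding to the workflow.
```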

This way I have found a couple of things, and I can share a few examples:
1. Optimized strategies always perform better than non-optimized ones (I ran maybe 3 experiments on this); also, the FSB optimizer is reliable compared to other methods, for example choosing variables from "stable" regions.
2. Over-trading strategies perform very badly in the unseen future, so quality over quantity is important.
3. Systems must be re-optimized, or thrown out and replaced with new ones, over time, because at some point a seemingly winning portfolio turns around.  (However, I still believed that more data is better, on the grounds that systems get to see more different market regimes.)

So let's get back to the experiment results.

I generated a 20-system portfolio on 5 different instruments.  Then I optimized 2 versions of each EA: one on the full 4 years of data, and the other on 8 months, like Sleytus used in his example.  After that I checked what they would have done over the most recent, unseen 2.5 months of market data.  0.01 lot was used on the H1 timeframe.

https://s22.postimg.cc/n0ycsnhn1/excel.png

https://s22.postimg.cc/ju3t9dcq5/4year_data_portfolio.png

https://s22.postimg.cc/eiowoo631/8months_data_portfolio.png

My expectation was that both the long and the short optimization periods would produce negative portfolio balances over the last unseen 2.5 months, because quickly generated random strategies with lots of variables and no robustness testing at all are usually not a good thing... However, both portfolios came out in profit, which I put down to a lucky 2.5 months.  I also expected the longer optimization period to outperform the shorter one; that was the main point of the experiment.  BUT I got very similar, almost identical results for both.  With more data they could still diverge and a real winner could pop up, so the experiment needs to be reproduced with more systems and with different OS months to get a big enough sample size to draw conclusions.  Still, my beliefs have already been shaken, and maybe I will continue this experiment; who knows what can be found, and it is quite possible that less data will turn out to be better, especially if I start to walk forward over a couple of OS windows.

Also, combined with the help of Steve's Sidekick software you can get a very powerful and satisfying trading methodology :) I think Sidekick should be a lot more powerful with less data than with more data, because it is easier to get great-looking PL curves, so you know very quickly when your system is not performing as expected.  With more data you always have longer and deeper drawdowns, so you cannot turn systems off quickly and objectively the way you can with less data.  This alone can be an edge.  I hope what I am trying to explain here makes sense.
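
To make the "turn it off quickly" idea concrete, here is one possible rule sketched out (my own illustration, not Sidekick's actual logic): disable a system once its live drawdown exceeds some multiple of the worst drawdown seen in its optimization window. With a short, recent optimization window that envelope is tighter, so misbehaviour is flagged sooner. The function names and the 1.5x tolerance are assumptions.

```python
# One possible "turn it off" rule (assumption, not Sidekick's logic): disable a
# system when its live drawdown exceeds a multiple of the worst drawdown seen
# during its optimization window.
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough drop of an equity curve."""
    peaks = np.maximum.accumulate(equity)
    return float(np.max(peaks - equity))

def should_disable(live_equity, backtest_equity, tolerance=1.5):
    """True once live drawdown exceeds tolerance x the backtest's worst drawdown."""
    return max_drawdown(live_equity) > tolerance * max_drawdown(backtest_equity)

# Toy usage with made-up equity curves:
backtest = np.cumsum(np.random.default_rng(3).normal(0.5, 2.0, 800))   # healthy history
live = np.cumsum(np.random.default_rng(4).normal(-0.5, 2.0, 120))      # deteriorating live run
print(should_disable(live, backtest))
```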

It makes me think that maybe what Sleytus has long been trying to tell us is this: if you generate a strategy on a long period and then optimize it on short/recent data, you can take the best from two different views/perspectives.  By generating on a long period you know the system's logic is robust over the long term, and by optimizing on only recent data you tune the strategy's variables to the most recent market conditions.  Thank you Steve for trying hard to raise these questions and challenge our beliefs.

Have a nice day,
Irmantas

Re: Proof that too much historical data is harmful...

Just one quick note - some other software I use lets you place the OOS time period at the front of the sample.  So you build your system on the most recent data and validate on the older data.  Many ways to skin a cat ;)

Re: Proof that too much historical data is harmful...

Hmmmmmmm

This is most interesting. A lot of great information in these posts.

I have a few strategies that require a lot of data because they contain a few indicators and will fail in the short term unless carefully processed with walk-forward.

I have another few strategies that are based on patterns that do not require much data.

My good friend, Steve, expects me to disagree with him on the need for data..... hahaha  I am of mixed opinion on this; I suggest that we each have different views as to what we need, and those views can vary as time passes.

Because this topic is so important, I am changing my folders so that I can conduct some research into different optimization periods, i.e. marking the folders to show the length of the original optimization and then using the handy tools that Steve has developed to measure results as time passes.

On some of my strategies I have been doing walk-forward with other software that demands a lot of data for what it calls 'statistical significance', and that has influenced me one way.  The results so far have been that I get lower win ratios on a few, especially those with more than one or two indicators.

It seems that the lower the number of variables, the shorter the time period required.

For myself, I have to do some detailed research to satisfy myself.

My 'secret' goal is to push EA Studio until I can net 3000 pips per day....

Re: Proof that too much historical data is harmful...

I agree with sleytus about the need to optimize using recent data.

But what about generating new strategies?
Is it more valuable to use most recent data for that as well, or is it better to generate on older data so that the most recent data can serve as a kind of backtest?

Re: Proof that too much historical data is harmful...

Lagoons wrote:

Is it more valuable to use most recent data for that as well...?

Good question.  Of course, I can't say for sure, but I can provide some anecdotal observations.  I continue to use strategies that were generated over a year ago (with 2-year-old data).  After refreshing (i.e. re-optimizing) them using more recent data, their statistics are as good as before.

So, to answer your question, I suspect it doesn't make much difference what data is used when generating strategies as long as there are around 100 - 200 trades.  After all, these are just simple algebraic formulas whose constants (i.e. indicator settings) are adjusted to suit the current data.  You can continue to use "old" strategies as long as you re-adjust their indicator settings using current data.
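
To make the "constants adjusted to suit the current data" point concrete, here is a minimal sketch of what a refresh amounts to. The strategy logic (a toy SMA rule, not an EA Studio strategy) stays fixed; only its numeric setting is re-fitted by brute-force search over the most recent bars. The rule, the 4000-bar window and the parameter grid are all illustrative assumptions, not the software's actual optimizer.

```python
# Toy illustration of a "refresh": the rule stays fixed (long when price is
# above its SMA), only the SMA period is re-fitted on the most recent bars.
import numpy as np

def sma(prices, period):
    return np.convolve(prices, np.ones(period) / period, mode="valid")

def backtest_sma_rule(prices, period):
    """Total points captured by 'long when price > SMA(period), else flat'."""
    ma = sma(prices, period)
    px = prices[period - 1:]
    position = (px > ma).astype(float)[:-1]   # today's signal earns tomorrow's bar
    return float(np.sum(position * np.diff(px)))

def refresh(prices, recent_bars=4000, periods=range(10, 201, 10)):
    """Re-fit the single constant on recent data only, as the thread suggests."""
    recent = prices[-recent_bars:]
    return max(periods, key=lambda p: backtest_sma_rule(recent, p))

# Toy usage on a synthetic price series:
prices = np.cumsum(np.random.default_rng(5).normal(0, 1, 30_000)) + 1_000
print("re-fitted SMA period:", refresh(prices))
```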

Re: Proof that too much historical data is harmful...

Recently dannnnn_the_man started a new thread:
https://forexsb.com/forum/topic/7542/significance-of-timeframe-adjustment/

He showed a couple of charts where his strategy had been back tested against two different data horizons and was wondering why the charts looked so different.

The answer is obvious -- check out the above link.  His chart comparison also shows why using too much old data when optimizing (that is, computing indicator settings) results in indicator settings that are poorly matched to the current data horizon.

When you look at your own charts it's so obvious.  I'm surprised so few people take notice -- perhaps it's because we only see what we want to see.  I don't know...

24 (edited by Lagoons 2018-10-22 21:41:19)

Re: Proof that too much historical data is harmful...

Thanks sleytus,
you are really helpful.

I think many people see a nicely shaped equity curve over a huge data horizon but don't realise how coarse the "resolution" (I hope you can understand what I mean) really is.

So you don't see the drawdowns and (maybe) the lack of performance on a shorter period of time.

If you could scale into it, the picture might be much different.

25 (edited by sleytus 2018-10-23 12:55:29)

Re: Proof that too much historical data is harmful...

Lagoons wrote:

I hope you can understand what I mean

Lagoons -- thank you.  Finally someone else gets it -- at least no one else has bothered to acknowledge this is an important issue.

And I know *exactly* what you mean.  From 30 miles high I've seen what appear to be beautiful balance curves that span several years.  But then you see the Win Ratio is something like 0.51 -- which is lousy.  And you are correct -- from a distance you don't see the draw-downs.  If people would zoom in, then they would see many peaks and troughs.

Sometimes the troughs last for weeks or months, but you can't see them because your monitor's screen resolution is 1280x1024 pixels (or worse) and MT4 is attempting to chart thousands of data points.  Since there are more data points than available screen pixels, the charting software compresses them in order to fit them on the chart.
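
The arithmetic behind that compression is worth spelling out.  A minimal sketch (the bar count is an illustrative assumption, not a figure from this thread):

```python
# Bars vs. pixels: when a chart holds more bars than your monitor has columns,
# each pixel column must summarize many bars.
bars_in_chart = 20_000       # a few years of H1 bars (assumed figure)
chart_width_px = 1280        # the monitor width mentioned above

bars_per_pixel = bars_in_chart / chart_width_px
print(f"~{bars_per_pixel:.0f} H1 bars per pixel column")   # ~16 bars per column
# At ~16 bars per column, a two-week trough (~240 H1 bars) collapses into
# roughly 15 pixel columns -- easy to miss at a glance.
```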