1 (edited by Lagoons 2018-06-27 14:50:14)

Topic: How to avoid curve fitting? Performance drops live significantly

Hello,

so I'm getting familiar with FSB.

I've created multiple strategies and I was wondering how reliable they would be.

So I generated strategies based on multiple years of data, but I used no data after the beginning of 2017,
because I'd like to know how they would behave in "live" testing.

Attached is one quick example. I know it's not a perfect one, but the results are quite similar across different strategies.

That is how the generated strategy looked originally.


https://s8.postimg.cc/dstfdd51t/Strategy.png

Re: How to avoid curve fitting? Performance drops live significantly

After I removed the "end date limit",
it looks like this.


https://s8.postimg.cc/4xsl2wdpd/Strategy_Live.png
You can clearly identify the drop when it "goes live".

I've experienced this phenomenon with every strategy.

Monte Carlo looked good before, so I wasn't expecting such a drop in performance.

So how can I avoid this?
Wouldn't it be better to build such a function into the generator itself, so that it automatically checks whether the "live
performance" fits?

Because I'm not sure the Out of Sample function does the trick.

Re: How to avoid curve fitting? Performance drops live significantly

You are doing it right. To make it easier, generate a full collection of 100 strategies, then recalculate the collection on the new data.

Re: How to avoid curve fitting? Performance drops live significantly

I think there are differing opinions about OOS.  Originally I thought it made sense -- but no longer, especially if the OOS data is contiguous with the IS data.  Here's why:

1. If the OOS shows good results it most likely is because it represents a continuation of the same market conditions in the IS data.  So, you would expect the performance to be similar in both the IS and OOS regions.

2. If the OOS shows poor results it most likely is because it represents different market conditions than the IS data.  Since the strategy was trained / optimized with IS data then it should not be expected to perform well against data patterns it has never seen.

In my opinion OOS penalizes you in two ways:
1. If the OOS shows good results then it is misleading because it makes you think your strategy is more robust than it really is (since the OOS region is just a continuation of the IS region).
2. If the OOS shows bad results then you discard the strategy -- but the strategy could perform fine if it had been trained using recent data.

Solution:
1. If you have any strategies you wish to discard because they perform well in IS but not OOS, then please send them to me.
2. Train your strategies with recent broker data.  There is no guarantee, but the odds are in your favor that near-term market conditions will be similar to last month's market conditions.
3. If you must use OOS, then choose an OOS region that is far removed from the IS region and not one that is contiguous.  However, if you do this you will be greatly disappointed -- I've tried.  If you choose IS and OOS regions that are separated by a couple of years then no strategies will survive.  Remember -- these are just simple, algebraic formulas and you only have a handful of settings (i.e. constants) that can be tweaked.  Yet, there are zillions of different data patterns.  I don't think it is fair to expect a simple algebraic formula to be able to detect that many patterns.
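To make point 3 concrete, here is a minimal sketch in Python of the two kinds of IS/OOS layout (the dates are invented for illustration only):

```python
from datetime import date

# Common (contiguous) setup: the OOS region starts right where IS ends.
is_contig  = (date(2017, 1, 1), date(2017, 12, 31))
oos_contig = (date(2018, 1, 1), date(2018, 6, 30))

# Far-removed setup: the OOS region ends well over a year before IS begins,
# so it almost certainly reflects different market conditions.
oos_far = (date(2016, 1, 1), date(2016, 3, 31))
is_far  = (date(2018, 1, 1), date(2018, 6, 30))

def gap_days(earlier, later):
    """Days between the end of one region and the start of the other."""
    return (later[0] - earlier[1]).days

print(gap_days(is_contig, oos_contig))  # 1 day -> contiguous
print(gap_days(oos_far, is_far))        # ~21 months removed
```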

Re: How to avoid curve fitting? Performance drops live significantly

This is what I do.

I usually increase the spread and commission before starting the generator as a stress test, for example 17 pips of spread and $3 commission.

I also use the H1 TF only, because this is the largest TF that is not affected by GMT difference. Any TF larger than H1 is vulnerable to GMT difference, and any TF smaller than H1 is sensitive to spread and slippage. So H1 is the most ideal TF for me.

Stagnation plays the most important role in my portfolio; the second one is SQN. The others are the count of trades and drawdown.

do or do not there is no try

6 (edited by hannahis 2018-06-28 19:43:53)

Re: How to avoid curve fitting? Performance drops live significantly

Hi Lagoons,

I think what you posted isn't the issue of curve fitting per se.

There are many possible reasons why strategies failed in live.

1. The most obvious would be a different trading platform (broker issues): the data you use may not be the "same" quality as the data traded.  Some brokers (Market Makers) manipulate their data/spread etc. and hence, given such a scenario, the odds are already stacked against you.

2. Like what Sleytus mentioned, your EA wasn't trained on the same kind of data the current market is now presenting.  Hence, it is an issue of the robustness of the strategy, rather than an issue of curve fitting.  A robust EA has ideally been "trained" in various market conditions to be an all-rounder.  Hence, some people believe you need longer historical data (assuming the data covers all the various market conditions extensively) for your EA to be robust and an all-rounder.  However, it's a double-edged sword: to get that kind of robustness, your EA may also end up an "average" EA that rounds off all the extremes.

3. Sleytus has an ingenious way: instead of hoping to train his EAs to be all-rounders, he has a diverse set of EAs that covers all types of market situations, and whichever EA "clicks" well with the current market is then activated to trade - Side Kick Software (if you want to read more about it).

4. I've done a small experiment, using 3mths, 1yr and 2yrs of data to optimise my EA and see which data duration yields better results.  My preliminary finding (not conclusive yet) is that 3mths of data seems to show better results, followed by 2yrs and 1yr.

Why is that so?

a) The 3mth period selected (Mar to May) is a relatively "difficult" trading period and hence, using this data to "train" the EA, the EA may therefore be more robust to handle "difficult" trading conditions.

b) The 2yrs data period, being longer, may contain more "varied" market conditions and hence performed better than 1yr (which contains less varied market conditions).  With more "varied" market conditions, the EA is more robust than with 1yr of data; however, the longer period has "diluted" the intensive "training" the EA would get in the 3mths of "difficult" trading conditions, and hence the 3mth-period EA did better in overall performance.

Thus the issue isn't just about curve fitting but about how robustly your EAs are trained, so that ultimately, when you trade live, they are robust enough to handle varied market situations.  If an EA is not trained to handle varied market conditions, it will still fail, and that has nothing to do with whether it is curve fitted or not.

So do choose your data period wisely (check whether it contains sufficiently tough trading conditions), and do take note: it is not necessarily true that the longer the data, the better.  It's not the duration of the data but rather the "quality" of the data - both "clean" data and "tough trading" period data - that puts your EA through strenuous testing/training.

Last but not least, kindly don't just throw away your EAs because they don't pass the OOS (you can consider giving them to both Sleytus and me).  Do at least put them into demo testing first and observe for yourself how your EAs behave; you will then become a better trader and know what new rules to add to improve your EAs further.

EAs that passed the backtest may fail in live trading, and similarly, EAs that failed the backtest may pass in demo/live trading.  Imagine: you could have been throwing away potentially good EAs just because you think they failed at the backtesting stage.

With Sleytus's Portfolio Maker, one can test 100 EAs easily in an instant, so why the need to "prune" your EAs so early without giving them due opportunity to be tested out in demo accounts?  I normally test thousands of EAs and I let my demo results determine my selection process.

1st level of accuracy = live testing

2nd level of accuracy = demo testing

3rd level of accuracy = backtesting

With the Portfolio Maker, you don't have to settle for 3rd best, you can go for demo testing to verify your EA's results and once you have shortlisted the good ones, do your tick backtesting (if you still need more confirmation).

Re: How to avoid curve fitting? Performance drops live significantly

Lagoons wrote:

Cause I'm not sure if the Out of sample function does the trick

There is a very simple way to check whether OOS works.

1. Start with a strategy that has "survived" OOS that is contiguous with the IS data.

2. Retest that strategy using an unrelated OOS -- i.e. a couple of years removed from the IS data.

If the strategy does NOT survive (2) it means the first OOS really wasn't OOS (i.e. the OOS was simply a continuation of IS).

If the strategy DOES survive (2) it means it likely is a robust strategy.
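As a sketch, the two-step check could look like this in Python -- `backtest` here is a hypothetical stand-in for whatever tester you use (it takes a strategy and a (start, end) period and returns net profit), and the dates are only examples:

```python
CONTIGUOUS_OOS = ("2018-04-01", "2018-06-30")  # immediately after the IS data
REMOVED_OOS    = ("2016-10-01", "2016-12-31")  # a couple of years removed

def oos_verdict(strategy, backtest, min_profit=0.0):
    # Step 1: the strategy must first have survived the contiguous OOS.
    if backtest(strategy, CONTIGUOUS_OOS) <= min_profit:
        return "failed contiguous OOS"
    # Step 2: retest on an unrelated, far-removed OOS region.
    if backtest(strategy, REMOVED_OOS) <= min_profit:
        return "first OOS was likely just a continuation of IS"
    return "likely robust"
```

A strategy that survives both regions is the "likely robust" case; one that passes only the contiguous region illustrates the continuation-of-IS problem.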

Re: How to avoid curve fitting? Performance drops live significantly

sleytus wrote:
Lagoons wrote:

Cause I'm not sure if the Out of sample function does the trick

There is a very simple way to check whether OOS works.

1. Start with a strategy that has "survived" OOS that is contiguous with the IS data.

2. Retest that strategy using an unrelated OOS -- i.e. a couple of years removed from the IS data.

If the strategy does NOT survive (2) it means the first OOS really wasn't OOS (i.e. the OOS was simply a continuation of IS).

If the strategy DOES survive (2) it means it likely is a robust strategy.


Hi Sleytus,

Regarding your quote: "2. Retest that strategy using an unrelated OOS -- i.e. a couple of years removed from the IS data."  You mean the period of data immediately before the IS, right?

Re: How to avoid curve fitting? Performance drops live significantly

Hello electronics,

It's easy to test, but I probably should have been clearer.  When the topic is OOS we neglect to discuss whether to put the OOS data before or after the IS data.  It can make a big difference.

Here are a couple of examples:
1.  IS Data:     October 2017 through March 2018 (6 months)
     OOS Data:  April 2018 through June 2018 (3 months)

2.  OOS Data:  October 2017 through December 2017 (3 months)
     IS Data:     January 2018 through June 2018 (6 months)

Both examples use CONTIGUOUS data: 6 months IS and 3 months OOS.  Are they the same?  Which would you choose?  I would choose (2), no question.  The reason is that its IS portion includes the most recent data.  However, most implementations of OOS use (1) -- which means the IS portion (which is used to optimize the strategy) uses data that is old.  Do you know what I mean?

The proper way to confirm whether or not OOS helps to create more robust strategies is to choose an OOS that is unrelated to the IS region.  For example:
3.  OOS Data:  October 2016 through December 2016 (3 months)
     IS Data:     January 2018 through June 2018 (6 months)
In case (3) you can see the OOS data is about 1 year removed from the IS data -- i.e. it is NOT contiguous -- and you are almost guaranteed the OOS and IS represent different market conditions (I didn't check this for sure, but you get the idea).

If your strategy were to survive (3) then it is probably robust.  However, it comes at a price...
a. None of your strategies will survive -- it's just asking too much from a simple algebraic formula.
b. Even if a strategy did survive, the OOS portion from 2 years ago will degrade the statistics for a strategy that is intended to be run today.  Do you really want to run a strategy that was trained with old data in your real account?  My feeling is that to get the best performance from your strategy TODAY, it should be trained / optimized using recent data from your broker.

To be honest, I didn't always think that way.  Not long ago I had hoped I could train / optimize a strategy once and if it were a good one I could use it forever.  And I tried various things, including "hybrid" data sources where I would stitch together data from multiple pairs and use that to generate / train strategies.  The price you pay for using recent data is you have to repeatedly spend time "refreshing" your strategies with the latest data -- and if you are lazy like me then that is a pain.  But I now feel that really is the only way to go.

I apologize for the long response -- I didn't intend for it to be this long but after I began writing it kept growing...

10 (edited by hannahis 2018-06-29 11:09:03)

Re: How to avoid curve fitting? Performance drops live significantly

Hi Sleytus,

Interesting discussion.

1. Conventionally, most use "continuous"/contiguous data as the OOS.

2. Since OOS data is used after generating the system, users tend to "naturally" think of using the remaining last portion of the data as OOS (i.e. 20% OOS = the last 20% of the data).

3. You suggest using non-"continuous"/contiguous data, such as using the beginning of the data as the OOS.

4. If I follow your thread of thinking correctly... (I'm not suggesting that you agree with my thoughts below, but branching out from your idea of using non-contiguous data, we can use different OOS data from different periods far removed from the current data used in IS.)

We can also use 100% of our current data to "train" our EAs, and save data from other periods as "multi markets" to further test these EAs as OOS data.

Eg.

a) use IS data = latest 3mths or 6mth (or whatever length user deem fit)

b) Then save different period data such as past 3mth under another data source

c) Save another different data period as another data source, etc. - as many different data periods as one needs, with each period saved under a different data source.

d) In selecting these different data periods, one can zoom in to identify different market conditions, such as ranging, trending, high-volatility and low-volatility markets, and save them as different multi markets.

e) Run the EA under the Multi Markets function and select those data sources you have saved, knowing which data source is for which market conditions.

PS: I'm not very well versed in this area and hence my suggestions may be off the mark; further, such methods have already been suggested by others, so I'm just recapping ideas I've learnt from others (giving them their due credit).
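Steps (a) to (e) could be sketched roughly like this (all names and dates below are hypothetical, and `backtest` again stands in for your own tester returning net profit):

```python
# Label data periods by market condition; each entry could be exported
# as a separate data source and used with the multi-market test.
market_periods = {
    "recent_3mths":    ("2018-04-01", "2018-06-30"),  # the IS / training data
    "trending_up":     ("2017-11-01", "2018-01-31"),
    "ranging":         ("2017-05-01", "2017-08-31"),
    "high_volatility": ("2016-06-01", "2016-08-31"),
}

def multi_market_report(strategy, backtest):
    # Net profit of one EA over each labelled market condition.
    return {name: backtest(strategy, period)
            for name, period in market_periods.items()}
```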

Re: How to avoid curve fitting? Performance drops live significantly

Hello Hannah,

Yes -- what you describe is along the lines of the point I was making.  However, the main difference is I am advocating to not bother with OOS because I don't think any strategy would survive an OOS data horizon that is far removed from the IS data that was used to generate / train / optimize the strategy in the first place.

Remember the algebraic equation for a straight line?  y = mx + b, where 'm' and 'b' are constants.  If you have two data points do you think you can compute a good 'm' and 'b' that will fit the line?  Probably, yes.  What if I gave you 1000 data points?  10000 data points?  The more data points you use the poorer will be the fit (especially when the data is unpredictable).  There is no way a straight line can approximate 10000 nearly random points.  The algebraic formula for a straight line is similar to the algebraic formula for a strategy in that there are only a few constants that can be adjusted to fit the data.
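The straight-line analogy is easy to check numerically. A toy sketch in pure Python (nothing to do with trading data): fit y = m*x + b by least squares to n unpredictable points and look at the leftover error.

```python
import random

random.seed(0)

def line_fit_rms(n):
    """Fit y = m*x + b to n random points; return the RMS residual."""
    xs = [i / max(n - 1, 1) for i in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]  # no real line underneath
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = cov / var_x if var_x else 0.0
    b = mean_y - m * mean_x
    resid = [(y - (m * x + b)) ** 2 for x, y in zip(xs, ys)]
    return (sum(resid) / n) ** 0.5

print(line_fit_rms(2))     # ~0: two points are always fitted exactly
print(line_fit_rms(1000))  # ~1: the line cannot absorb the noise
```

Two points give a perfect fit; a thousand unpredictable points leave a large residual no matter which 'm' and 'b' you choose - which is the point about a few tweakable constants versus zillions of data patterns.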

So, what I am suggesting is why bother with OOS.  Instead, focus on using the most recent broker data to train / optimize your strategies.  You do NOT have to regenerate them -- just retrain / reoptimize.

12 (edited by rjectweb 2018-06-30 07:49:34)

Re: How to avoid curve fitting? Performance drops live significantly

Hi Sleytus,

I have a couple of questions about your reoptimization method.

1. How many bars are you using as in-sample? Are you using an in-sample defined as a number of bars, or as a specific period that you manually select based on certain criteria (like the market regime you visually identify in it)? And what kind of window do you use as in-sample? I mean, is it a rolling window or an expanding window?

2. What fitness function do you look for in your optimizations? Is it net balance, best return to DD, best profit factor, Sharpe, etc.?

You told us you were performing the optimizations each week on the weekends, is that correct?

If you are using 1 week live and then reoptimizing, I understand that what you are doing is similar to leaving 1 week as the out-of-sample in a traditional training-testing process. And if you then reoptimize after one week, what I understand you are doing is like a walk forward with OOS periods of 1 week. Am I right?

I'm pointing this out because I'm interested in the parameters you are using for your reoptimizations (in-sample window and fitness function), but also because I feel not every EA can work as you describe. I think it is not always enough to build an EA optimized up to the very last day, planning to reoptimize it again and again: the EA must be robust enough to benefit from reoptimizations, because otherwise what we have is a very over-optimized EA unable to work a single day in live conditions.
So, my point in the end is: OK, your method may very well make sense, but you need a special kind of EA for that (robust enough), and it can be tested with a cross-validation process (time series = walk-forward optimization).

I think you are right that recent market conditions are the most important in trading, but that leads to the problem we are describing in these posts: if we use that period as OOS, we leave out of our training the best period of data we have to train our EAs. But if we don't use it as OOS, we are not sure our EAs are robust enough to trade live. And I think that is the reason why algorithmic traders use walk-forward analysis and walk-forward optimization. Am I right here?

I understand the benefits of WFA/WFO and of reoptimizing frequently (on a timely basis), but I need to learn more about how to do it, because in this case (if you use WFO) you must be very consistent in what you are doing, and also know how to choose the training period (rolling window/specific window) and your fitness function (know in advance what you should search for).

Thanks for your contributions. Warmest regards.
RJ

13 (edited by sleytus 2018-06-30 09:18:07)

Re: How to avoid curve fitting? Performance drops live significantly

Hi RJ,

1. I use 4000 bars.  Since I exclusively trade EURUSD/H1 that works out to about 8 months.  I do believe the number of bars is more important than a time period, however I don't think there is anything magical about 4000.  In my hands it seems that is good enough.  Too few and optimization may lead to over curve-fitting, and too many leads to poorer stats and a waste of CPU time.
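For reference, the 4000-bar figure converts to months roughly like this (assuming ~24 H1 bars per trading day and 5 trading days per week in forex):

```python
bars = 4000
bars_per_week = 24 * 5           # ~120 H1 bars in a forex trading week
weeks = bars / bars_per_week     # ~33.3 weeks
months = weeks * 7 / 30.4        # ~7.7 months, i.e. "about 8 months"
print(round(weeks, 1), round(months, 1))  # -> 33.3 7.7
```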

2. I used to use SQN, Sharpe, PF -- but I think that leads to over curve-fitting.  Now I'm using Net Balance.

3. If I have time I'll try to do it each weekend, but sometimes I don't feel like it and I'll let a few weeks pass.  I don't think a few weeks makes that much difference -- it all depends on market conditions.  If market conditions are changing then you are hurting yourself by not using the latest data.

4.

what I understand you are doing is like a walk forward with OOS periods of 1 week. Am I right?

No.  I don't think in terms of OOS.  Remember -- my strategies go straight to live accounts.  If I don't refresh for a week or two or longer it's because it takes time and I didn't feel like it.  Also, if my strategies are currently performing well then I don't feel as compelled to refresh them.

5. I'm not talking about optimizing to the very last day.  I'm doing the same thing as everyone else -- except rather than using 6 months IS and 2 months OOS data, I'm simply using 8 months of the latest IS.  That's all. 

6. When most people use OOS or WF their IS data is OLD.  Using old data to train / optimize your strategy just doesn't make sense to me.  And the proof is right in front of you.  When you do a WF test and see the strategy performing well over the IS region and poorly over the OS region that proves that strategies do better over regions they were trained against.  Which makes perfect sense.  Furthermore, when you do encounter a strategy that performs well over the OS region there are two possible explanations:
a. you have found the holy grail
b. the OS region is simply a continuation of the IS region and, so, it is expected the performance would be the same.  Do you ever see a strategy perform *better* in the OS region?  Very rarely -- if ever.  If a strategy were truly robust there is no reason why you wouldn't expect it to often perform better in the OS region -- but you never see that.  Hence, they are only as robust as what they exhibit in the IS region.
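As a rough sketch of the workflow I'm describing (the function names here are made up -- in practice these steps are your data export and the FSB optimizer):

```python
BARS = 4000  # most recent EURUSD/H1 bars, roughly 8 months

def refresh_portfolio(strategies, load_latest_bars, reoptimize):
    # Re-train the SAME strategies on the newest broker data only;
    # the rules stay fixed, just the numeric settings get retuned.
    data = load_latest_bars(BARS)
    return [reoptimize(strategy, data) for strategy in strategies]
```

No regeneration, no OOS split -- just a periodic retune of existing strategies against the latest data.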

Thanks for hearing me out...

Regards,
sleytus

Re: How to avoid curve fitting? Performance drops live significantly

https://s8.postimg.cc/u4aj6be75/trendng-ranging-02.png

This is a EURUSD/H1 price chart from FSB Pro.  I think I showed a similar chart in a different thread.  The data horizon is 4000 bars and covers the period 2017-11-03 through 2018-06-29.  In other words, approximately the last (most recent) 8 months.  It nicely captures 3 market conditions -- trend up, ranging, and trend down.

When you train / optimize a strategy I think it is important to pay attention to the underlying price chart.  Your strategy may not perform well under all market conditions -- and that's okay.  I mean, if there are market conditions where the strategy doesn't trade or its performance remains neutral, that's fine.  As long as the strategy doesn't decline it should be fine.  It will perform well under market conditions that include data patterns that it recognizes, and it should stay roughly neutral under market conditions that include patterns it doesn't recognize.  Since Popov's software allows us to generate portfolios that include many strategies, there is no harm if there are times when certain strategies rarely trade.

When people use IS and OOS they aren't paying attention to the underlying market conditions of the IS and OOS regions.  So, if you blindly use IS and OOS and don't pay attention to the finer details I think it offers little or no benefit.

15 (edited by Lagoons 2018-07-02 22:07:39)

Re: How to avoid curve fitting? Performance drops live significantly

sleytus wrote:

2. Train your strategies with recent broker data.  There is no guarantee, but the odds are in your favor that near-term market conditions will be similar to last month's market conditions.

Okay, I'll do that.

I've done it the other way because I thought it would make sense to train a strategy on slightly older data to create a sort of "live testing" environment.


And I'll look at the MT4 Sidekick - the concept behind it sounds amazing and could be very helpful.

hannahis wrote:

4. I've done a small experiment, using 3mths, 1yr and 2yrs of data to optimise my EA and see which data duration yields better results.  My preliminary finding (not conclusive yet) is that 3mths of data seems to show better results, followed by 2yrs and 1yr.

Why is that so?

a) The 3mth period selected (Mar to May) is a relatively "difficult" trading period and hence, using this data to "train" the EA, the EA may therefore be more robust to handle "difficult" trading conditions.

b) The 2yrs data period, being longer, may contain more "varied" market conditions and hence performed better than 1yr (which contains less varied market conditions).  With more "varied" market conditions, the EA is more robust than with 1yr of data; however, the longer period has "diluted" the intensive "training" the EA would get in the 3mths of "difficult" trading conditions, and hence the 3mth-period EA did better in overall performance.

Last but not least, kindly don't just throw away your EAs because they don't pass the OOS (you can consider giving them to both Sleytus and me).  Do at least put them into demo testing first and observe for yourself how your EAs behave; you will then become a better trader and know what new rules to add to improve your EAs further.

With Sleytus's Portfolio Maker, one can test 100 EAs easily in an instant, so why the need to "prune" your EAs so early without giving them due opportunity to be tested out in demo accounts?  I normally test thousands of EAs and I let my demo results determine my selection process.

1st level of accuracy = live testing

2nd level of accuracy = demo testing

3rd level of accuracy = backtesting

With the Portfolio Maker, you don't have to settle for 3rd best, you can go for demo testing to verify your EA's results and once you have shortlisted the good ones, do your tick backtesting (if you still need more confirmation).

Hello hannahis,

I also thought that if I train an EA on a longer time period it would be more robust.
But surprisingly the results weren't better at all.

I will follow your advice and leave the EAs on demo for longer. I'm still quite new to FSB, so I'll have to work out what suits me and what doesn't.

Before I forget: thanks for the input so far, guys. It is really helpful, because this is much more complex than it looks at first sight.