Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

Another observation - Sleytus, you are assuming that those strats, which survive, retain their edge, correct? My modelling shows that this is not the case, there are still too many bad apples which go bad after initial pruning, thus rendering the overall profitability to breakeven minus trading costs in the best case scenario...

Although I'm an advocate of pruning I do *not* assume that strategies that survive an initial pruning will continue to retain their edge.  I'm thinking that with the paradigm shift to "Managing Portfolios" (from creating EAs) that pruning now becomes part of the process -- sort of like how you have to mow your lawn from time-to-time.  How often to prune -- I don't know.  For now, I'll usually spend a little time during the weekend doing some pruning -- sometimes I'm pruning a virgin Portfolio Expert that has never been pruned before, and sometimes I'm pruning an older one that had previously been pruned.

There's an old American Indian saying -- "Have faith in G-d, but row away from the rocks...".  In other words, I would like for strategies to retain their edge, but I am prepared for those times when they don't.  For now I continue to generate new collections and even though they are polluted with poor performers, overall they perform better than break-even.  And when I add pruning on top of that, then they really do well -- until the time they don't.

Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

I woke up this morning and realized that the optimizer has an option - Add the optimized strategies to the Collection

Unless I was using this feature incorrectly -- I would recommend to be *very* careful when using it.  It has the potential of essentially replacing your collection of 100 strategies with 100 variations of *one* strategy.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Aha! I understand now. I wasn't looking in the Validator...only the Reactor. So that avoids the problem of filling the Collection with multiple Optimizations of the same EA.

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:
footon wrote:

I woke up this morning and realized that the optimizer has an option - Add the optimized strategies to the Collection

Unless I was using this feature incorrectly -- I would recommend to be *very* careful when using it.  It has the potential of essentially replacing your collection of 100 strategies with 100 variations of *one* strategy.

This is to be used with a single EA and an empty Collection. You can then compare the Optimizations.

30 (edited by qattack 2017-08-21 02:18:53)

Re: Project To Determine Most Effective Acceptance Criteria Settings

I've restarted my generation from scratch due to the new and valuable OOS validation.

I will post preliminary results within 24 hours. Some valuable insights are arising from the OOS data.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Here are my results from my initial runs incorporating OOS.

My working hypothesis is that through the use of OOS data, I can more quickly and efficiently test whether the strategies that I am generating (over the In Sample data) will continue to perform well in the future (OOS, on the data it was not optimized for). The OOS data should simulate placing the EAs on a demo account.

This allows me to change various settings (including, but not limited to, Acceptance Criteria, Monte Carlo variables, Optimizer settings, SL and TP, data sample size, etc.) and determine relatively quickly what effect those changes have in the OOS period. In fact, I can reach a "large enough" sample size to be meaningful, something that is almost impossible deploying strategies on test servers.

I know there will be dissenters, but it is my belief that using the newfound strategy generation settings in this manner will lead to generating EAs that are much more dependable. Please note that it is entirely possible that, once the "optimal" settings are found, the OOS period may be removed completely for live strategy generation. This method is simply using OOS data to determine those settings that make this possible.

This process has been extremely revealing so far. Some things I completely expected; there are a few that surprised me. It will take me a while to fully interpret and understand the data.  As always, I am open to changing my mind about anything. Thanks again to Steve/sleytus for getting me interested in this approach, which is more mass-generation-oriented than the one I was using before. I was quite against generation over short periods initially.

This test was run on slightly different settings than I originally intended, due to the new OOS validation. Here are those settings:

1. Historical Data:
   * Symbol: EURUSD
   * Period: M30
   (Data Horizon: 22500 bars; 15750 IS, 6750 OOS)
   * In Sample: From 10-23-2015 to 01-26-2017
   * OOS: From 01-27-2017 to 08-11-2017
2. Strategy Properties:
   (Account size: 100,000)
   * Entry Lots: 1
   * SL: Always/10-100 pips
   * TP: May use/10-100 pips
3. Generator settings:
   * Search best: System Quality Number
   * OOS: 30%
4. Optimization:
   * 5 steps
   * SQN
   * 30% OOS
5. All data validation: YES
6. MC Validation:
   * 100 tests
   * Validated tests: 95%
   (Settings: defaults PLUS "Randomize indicator parameters" [10/10/20 steps])
7. NO market validation

***Acceptance Criteria:
   > Complete Backtest:
      * Max Amb. bars: 10
      * Min Net Profit: 10 (no effect)
      * Min Trades: 100 (no effect)
   > IS part:
      * Min Trades: 100
      * Max DD%: 25
      * Max Stag%: 35
      * Min PF: 1.1
      * Min R/DD: 1
   > OOS part:
      * Min Trades: 25
      * Max DD%: 25
      * Max Stag%: 50
      * Min PF: 1.1
      * Min R/DD: 0.5
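
For readers who want to apply the same thresholds outside the software, here is a minimal sketch of the IS and OOS Acceptance Criteria above as a filter. The class and field names (`PartStats`, `Criteria`, `passes`) are hypothetical and are not part of EA Studio; only the threshold values come from the list above.

```python
from dataclasses import dataclass


@dataclass
class PartStats:
    """Backtest statistics for one part (IS or OOS) of the data horizon."""
    trades: int
    max_dd_pct: float     # maximum drawdown, in percent
    max_stag_pct: float   # maximum stagnation, in percent
    profit_factor: float
    return_dd: float      # return / drawdown ratio


@dataclass
class Criteria:
    """Threshold set for one part of the backtest (hypothetical helper)."""
    min_trades: int
    max_dd_pct: float
    max_stag_pct: float
    min_pf: float
    min_return_dd: float

    def accepts(self, s: PartStats) -> bool:
        # A strategy must satisfy every threshold to be accepted.
        return (s.trades >= self.min_trades
                and s.max_dd_pct <= self.max_dd_pct
                and s.max_stag_pct <= self.max_stag_pct
                and s.profit_factor >= self.min_pf
                and s.return_dd >= self.min_return_dd)


# Threshold values copied from the settings listed in this post.
IS_CRITERIA = Criteria(min_trades=100, max_dd_pct=25, max_stag_pct=35,
                       min_pf=1.1, min_return_dd=1.0)
OOS_CRITERIA = Criteria(min_trades=25, max_dd_pct=25, max_stag_pct=50,
                        min_pf=1.1, min_return_dd=0.5)


def passes(is_stats: PartStats, oos_stats: PartStats) -> bool:
    """True only if both the IS part and the OOS part meet their criteria."""
    return IS_CRITERIA.accepts(is_stats) and OOS_CRITERIA.accepts(oos_stats)
```

Keeping the two threshold sets separate makes it easy to tighten one part at a time, which matches the idea of narrowing the AC progressively.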
__________________________________
Philosophy of Setting Acceptance Criteria

When setting Acceptance Criteria Values, I entered values far worse than we would want to use in real trading systems (with the exception of number of trades IS = 100; it's necessary to generate over a sufficient sample of trades). This is so I don't accidentally filter out any strategies by trying to be too accurate, while still eliminating a portion of them.

I will progressively narrow the AC, but only so far as to remove only a very rare potentially good strategy.
__________________________________

Note: I ran more Calculations on IS/OOS data. Because significantly fewer strategies are validated with IS/OOS, the variance is much higher, so I needed a larger sample size. But I was initially very surprised when I noticed the IS/OOS run had calculated MORE THAN TEN TIMES the number of strategies on average. First, I thought that my settings must be incorrect somewhere. This was not the case. Then, I thought perhaps the new backtesting engine had a bug in it.
___________________________________
And finally, the results:

IS only:
Generated Strategies: 180374
Number Passed Validation (Generation step): 4861
Percent Passed Validation (Generation step): 2.69%
Number Passed Validation (MC step): 760
Percent Passed Validation (MC step): 15.63%
Percent Passed Validation (All steps): 0.4213%


IS/OOS:
Generated Strategies: 4511276
Number Passed Validation (Generation step): 2337
Percent Passed Validation (Generation step): 0.05%
Number Passed Validation (MC step): 842
Percent Passed Validation (MC step): 36.03%
Percent Passed Validation (All steps): 0.0187%
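
The percentages above can be reproduced from the raw counts with a few lines of arithmetic. This is only a sketch; the function name `funnel` and the dictionary keys are made up for illustration.

```python
def funnel(generated: int, passed_generation: int, passed_mc: int) -> dict:
    """Pass rates, in percent, for each validation step and overall."""
    return {
        # share of generated strategies that passed the Generator step
        "generation_pct": 100 * passed_generation / generated,
        # share of those survivors that also passed Monte Carlo validation
        "mc_pct": 100 * passed_mc / passed_generation,
        # share of all generated strategies that passed every step
        "overall_pct": 100 * passed_mc / generated,
    }


# Counts taken from the two result blocks above.
IS_ONLY = funnel(generated=180374, passed_generation=4861, passed_mc=760)
IS_OOS = funnel(generated=4511276, passed_generation=2337, passed_mc=842)

for name, stats in (("IS only", IS_ONLY), ("IS/OOS", IS_OOS)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Dividing the two generation-step rates shows that the IS/OOS run accepted strategies roughly fifty times less often at that step than the IS-only run did.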
_____________________________________

For this initial experiment, I used a generation run without the OOS period as the control group. The In Sample length was identical to that of the generation run with OOS.

When complete, I compared the statistics of each generation run using the number (percentage) of strategies that Passed Validation in each of the Generator step and the Monte Carlo step.

The difference between the Percentage of "Passed Validation" in the Generator step represents the relative number of strategies generated via In-Sample-only generation that were not viable when trading out of the optimized period (i.e. they were curve-fit) for the following 6.5-month time period.

Notice that for the IS/OOS, nearly all the strategies generated did not go on to be profitable over this 6.5 month period. This is far from the final word, but the reader should at least consider that relying on IS results alone may not be viable. You might say that 6.5 months is too long and you expect most strategies to fail within that time and that your pruning strategy will (eventually) solve the problem. Perhaps.

This test can be repeated using only 10% OOS (~2 months) and I bet the results wouldn't be substantially better; yes, you will have a higher proportion of winning strategies on average. But I contend that that is due mostly to random fluctuation and small sample size. Run your own experiment: 10% OOS over the same period as I have done (except change the OOS period...you will need only 1575 bars OOS). Reset Acceptance Criteria (but change #/trades OOS to 9, which is proportional to my AC #/trades) and quickly generate 100 random strategies. Don't spend time with MC testing. Examine those 100 random strategies and see just how many show a profit after two months OOS. I think you'll be amazed at how many actually do show a profit.
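
The small-sample point can be illustrated without any trading software at all. The sketch below is a toy model, not a backtest: it assumes each "strategy" has zero edge, that every trade wins or loses one unit with equal probability, and that there are no trading costs. The function name `fraction_profitable` is hypothetical.

```python
import random


def fraction_profitable(n_strategies: int, n_trades: int, seed: int = 0) -> float:
    """Share of zero-edge 'strategies' that end a short OOS window in profit."""
    rng = random.Random(seed)
    winners = 0
    for _ in range(n_strategies):
        # Each trade is a fair coin flip: +1 or -1 unit, so expectancy is zero.
        pnl = sum(rng.choice((1, -1)) for _ in range(n_trades))
        if pnl > 0:
            winners += 1
    return winners / n_strategies


# With only 9 OOS trades, about half of these random strategies "show a profit".
print(fraction_profitable(n_strategies=10_000, n_trades=9))
```

Under these assumptions, roughly every second strategy looks profitable after nine trades purely by chance, which is why a short OOS window with few trades says little about real edge.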

For this test, I used a mandatory time for testing for profitability. That length of time may be changed and the process repeated. You will, of course, find more strategies that are "profitable" in the shorter terms. But you will also need to base your results on a MUCH reduced number of trades (or accept that the number of strategies generated will go down relative to that same length of time).

Basing your results on fewer trades leads to greater and greater problems with small sample size and reliability of results. Dr. Van Tharp, in his formulation of System Quality Number (my favorite metric), details that you must make at least 40 trades before that statistic becomes truly accurate. I set "Number of Trades" to only 25 in the OOS period to capture more strategies. However, I'm sacrificing some confidence in the results.
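
For reference, System Quality Number is commonly written as the square root of the number of trades times the mean trade result divided by the standard deviation of the trade results. The sketch below uses that common form with the sample standard deviation; whether EA Studio computes it in exactly this way (for example, whether it caps the trade count) is an assumption I have not verified.

```python
import math
import statistics


def sqn(trade_results: list[float]) -> float:
    """System Quality Number: sqrt(N) * mean(result) / stdev(result)."""
    n = len(trade_results)
    if n < 2:
        raise ValueError("need at least two trades to estimate a deviation")
    deviation = statistics.stdev(trade_results)  # sample standard deviation
    if deviation == 0:
        raise ValueError("all trades are identical; SQN is undefined")
    return math.sqrt(n) * statistics.mean(trade_results) / deviation


# The same win/loss pattern scores higher simply because there are more trades.
pattern = [2.0, -1.0]
print(round(sqn(pattern * 5), 2))    # 10 trades
print(round(sqn(pattern * 20), 2))   # 40 trades
```

Because the trade count sits under a square root, the value for the same underlying behaviour keeps moving as trades accumulate, which is one way to see why a reading based on 25 trades deserves less confidence.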

But I quickly realized the reason the IS/OOS run calculated so many more strategies: I am using Monte Carlo simulations that conduct 100 tests upon each strategy that passed the Acceptance Criteria. Because the IS/OOS was more picky in selecting its passed strategies, it had much more time to actually generate strategies rather than spending it on Monte Carlo validation.

Strategies passed Monte Carlo validation at well over twice the rate with IS/OOS. Monte Carlo validation in the IS/OOS was over both IS and OOS, so this certainly contributed to some degree to the higher validation rate. But consider, too, that the overall quality of a strategy's results is degraded by the use of OOS results. This is partially compensated by the fact that I did filter OOS, so they are the highest stats of any OOS runs.

One observation is that with the IS-exclusive generation, since there are many more strategies being generated than can be held by the Collection, it will in the end contain a much better-than-average representation of strategies than the above stats would otherwise indicate.
___________________________________
What do these results tell us and how can they be of use in further testing?

This initial run was not meant to prove anything so much as to provide a baseline for further testing.

The most interesting result to me was that using IS/OOS, my CPU can use its time generating MANY more strategies, rather than spend it on validating Monte Carlo tests.

One thing disturbing to me is that it's hard to immediately tell the difference between stats of the best strategies generated with IS vs. those IS/OOS. I thought the line may be more discernible. (I can hear "I told you so" already!) This seems to be a very strong argument that the actual Acceptance Criteria may have only a very small effect on proper workflow.

I haven't explored the stats very much yet, though, and I expect the real revelations to come with the further testing I have planned. Perhaps when certain AC are combined, they will generate a more predictable result.

Something else that occurred to me about this: it seems to say that the stats we have to work with cannot be heavily relied upon to select strategies. That is not to say, for example, that there is no difference between an R/DD of 6 and an R/DD of 2. Of course, lacking more information, the generated strategy with an R/DD of 6 has a better chance. But what I'm saying is that the divide between these two values may be very small (by itself), especially if it is taken from an exclusively IS/Optimized strategy.

This would certainly be in line with the "observations" that there are no discernible performance differences between the "top ten" strategies of a Collection vs. the "second ten".

So if we cannot depend on the actual stats to make much of a difference, what else do we have? Currently, there is Monte Carlo testing and OOS. Monte Carlo settings are possibly the least-understood metric yet most powerful that traders have. From my time using StrategyQuant, I've watched the "gurus" propose all sorts of nonsense about how to validate your strategies, and much of it was focused on Monte Carlo testing.

The big problem is that we are given this tool, and we know it is somehow useful, but most of us don't have the math background to understand how to use it effectively. So someone that wants to sell his courses and/or software comes up with a process that seems logical to him (based on intuition, "observation", astrology, or whatever...) and this process becomes a "gold standard", unquestioned by people that follow it blindly. Because, after all, that's how EVERYONE does it now, so it MUST be the right way.

Trial and error, intuition, and observation are very unlikely to get us moving in the right direction. They can actually set you in the direct opposite direction for an entire lifetime (I'm not exaggerating!).

We need to come up with a way to scientifically determine the best method of MC analysis. A calculation of an exact or even near-exact method is far beyond my math skills, but I'll be happy with being in the ballpark. Right now, I think we're just flailing blindly about.
___________________________________
Finally, here is an additional quick test I did. The following sample size is super small, so it doesn't necessarily mean anything at all. But this small sample was rather discouraging.

I took the Collections resulting from the above runs and fed them all through 6750 bars of OOS data that immediately PREceded the In Sample period.

Keep in mind that the IS/OOS strategies were already filtered for good OOS performance in another period adjacent to the In Sample.

The Acceptance Criteria used were identical to the OOS AC above.

Total In Sample strategies: 300
% that Passed Acceptance Criteria: 21.7%

Total OOS strategies: 581
% that Passed AC: 19.6%

As I said, this is a very small sample, but if this trend continues then performance in OOS data is not consistent.
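
One way to check how much weight the gap between 21.7% and 19.6% can bear is a two-proportion z-test. This is a rough sketch under stated assumptions: the pass counts (65 of 300 and 114 of 581) are reconstructed from the rounded percentages above, and the normal approximation is assumed to be adequate at these sample sizes. The function name is hypothetical.

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference of two pass rates."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled pass rate under the hypothesis that both groups behave the same.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reconstructed from 21.7% of 300 and 19.6% of 581 (an assumption).
z, p = two_proportion_z(65, 300, 114, 581)
print(round(z, 2), round(p, 2))
```

With these counts the difference is well within what chance alone would produce, so on this sample the two groups cannot be told apart.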

As long as the OOS period is adjacent to the IS, I don't think it should matter whether or not it is before or after in this case of monitoring performance.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Wow, this is very impressive. I really appreciate your efforts.

My 'secret' goal is to push EA Studio until I can net 3000 pips per day....

33 (edited by sleytus 2017-08-22 03:49:34)

Re: Project To Determine Most Effective Acceptance Criteria Settings

Thanks for writing-up all the results -- lots of good information.

I'm still trying to get my head around what you are trying to do -- so, I'm not sure what the "take home" lessons are.

There are many ways to generate good strategies -- I suspect the problem domain of how best to create forex strategies may not necessarily lend itself to following a strict recipe.  As you've discovered, there are a variety of criteria and parameters that need to be taken into account -- and each of those has a spectrum of possible settings.  In total, probably billions of combinations.  I'm thinking that a problem such as this should be approached programmatically -- though I'd still be dubious of the final results unless they were to also take into account live trading.  And that's because there is a real disconnect between live trading versus back testing and demo accounts.  We'd all be millionaires many times over if live accounts mirrored demo accounts and back testing results.

Perhaps a topic for a different thread -- Why is it that back testing often is not an accurate predictor of live trading performance?

Re: Project To Determine Most Effective Acceptance Criteria Settings

I understand that a successful backtest is not indicative of a successful trading strategy.

What IS true is that an UNsuccessful backtest IS indicative of an UNsuccessful trading strategy.

Our major tool is backtesting, for lack of anything else. Our goal with backtesting is basically to eliminate as many unsuccessful strategies as possible, leaving us with fewer bad strategies to bring down our good ones.

I agree that this is more a problem to approach programmatically, but that is a project beyond my programming skills. Well, I could certainly do it, but it would probably take me a couple years.

You say, "There are many ways to generate good strategies." I guess I probably agree with this statement, BUT it's not EASY to generate good strategies consistently enough to make any profit. In fact, I'd wager after the above tests that none of us knows any of these "many ways" and it is MUCH harder than any of us realize. We cannot just start cranking out a bunch of strategies that look good over the In Sample data and expect them to perform well in live trading.

Also, if we cannot find a way to filter out a number of bad strategies using OOS data as a guide, we won't be able to do so in live trading.

We need to develop a systematic approach based upon something other than trial and error.  As you say, there are billions upon billions of combinations. We're kidding ourselves if we think we can work by intuition and observation.

I also think that we may need to incorporate FSB to further refine our strategies in some manner.

For my next test, I'm conducting a huge-scale test similar to the very last test I mentioned at the end of my last post.

35 (edited by sleytus 2017-08-22 09:28:37)

Re: Project To Determine Most Effective Acceptance Criteria Settings

qattack -- if it were just you and me I wouldn't comment any further -- I admire that you have the confidence to take the road less traveled and are willing to expend the time and energy to carve out your own path and share the results.  Since this is a forum then I think it's okay to express some skepticism -- just to make clear to anyone who is reading that others have had success with Popov's software and it doesn't necessarily have to be this complex or hard.  We are both motivated and want to understand how all this works -- I think where we differ is which forex battles we choose to fight.   It is important for everyone to become familiar with Popov's software -- and it is clear that within a short time you really have.

On the positive side -- we agree that back testing does not guarantee success.  But then you follow that statement by saying an unsuccessful back test is indicative of an unsuccessful strategy.  And that is *way* not true.  Have you ever taken Popov's strategies and run them through MT4's back tester?  Have you ever taken a successful strategy and slightly modified the Data Horizon?  Also, what I still see missing from your posts is any mention of live accounts -- and to me that's a red flag.

It also sounds like one of your goals is to find a way to filter-out bad strategies before they get added to a live account.  And we've touched on this before -- I don't think you can filter-out bad strategies ahead of time.  In the brave, new world of EA Portfolios that include 100's of strategies -- bad apples are one of the prices we will need to pay.  And that's where portfolio management comes in -- something that is foreign to most of us.

In the end you will decide on one or more ways you like to generate strategies and others will have their way(s).  And I'll make another 25-cent bet -- that 100-member collection that you eventually place in a live account trades no better or worse than the 100-member collection that Popov places in a live account.  I'll send instructions for where to wire transfer the funds...

Re: Project To Determine Most Effective Acceptance Criteria Settings

Another point about back testing -- sorry, I can't help myself...

Several have reported that within a 100-member collection it is not necessarily true that the top-ten strategies are the most successful and the bottom-ten are the worst.  That's a clue that back testing can only take you so far.  I absolutely believe that back testing provides an "edge" -- but I think it's a probabilistic edge and is not necessarily quantifiable.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Good work, John! It is very much in line with what I have gone / am going through and your results are a good reference in the sense that I've got pretty similar ones. It seems we also share conclusions and vision about how to go ahead. Sleytus mentions live accounts and demo testing but if we have such a good backtesting facility, actually facilities if we include Pro, then I don't see the point in demoing things if we can run through many more strats with multiple settings and more data. We can really make things rough and do significant stress-tests. After we have the methodology to churn out collections with positive expectancy with high probability, only then it's reasonable to tackle the problems of live trading. And let's be honest - a lot of it depends on the broker. But this is my personal opinion, no offence to different ways of doing things.

Certainly, portfolio management cannot be left aside, but if there is a huge difference between collections in profitability, and currently my research shows this, then it takes trading to a point where break-even is the best outlook... Not trying to bring anyone down, but sort of a pessimistic realism :) We must be able to pick more good than bad apples into a collection.

Back to the topic - John, you single out MC as a main validation tool, but I'm having trouble seeing its usefulness if the results it gives are inconsistent. As you said before there's huge variability if one runs MC through multiple times. The building block of any scientific work or method is that one must be able to replicate the results. With MC it is not possible and therefore I've let it sit aside. I've written bootstrapping and MC algos for White's reality check and in those instances results were confidently replicable.
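
On the replicability point: a resampling test gives the same answer on every run if the random number generator is seeded. Here is a minimal sketch of a seeded bootstrap over a list of trade results. It is not EA Studio's Monte Carlo and not White's reality check, just an illustration of the seeding idea; the function name and the trade values are hypothetical.

```python
import random


def bootstrap_mean_interval(trades, n_resamples=1000, seed=None):
    """Approximate 90% interval for the mean trade result via resampling."""
    rng = random.Random(seed)  # a fixed seed makes every run identical
    means = []
    for _ in range(n_resamples):
        # Draw a new trade list of the same length, with replacement.
        sample = rng.choices(trades, k=len(trades))
        means.append(sum(sample) / len(sample))
    means.sort()
    low = means[int(0.05 * n_resamples)]
    high = means[int(0.95 * n_resamples) - 1]
    return low, high


trades = [12.0, -8.0, 5.0, -3.0, 20.0, -15.0, 7.0, 1.0]  # made-up results
print(bootstrap_mean_interval(trades, seed=42))
print(bootstrap_mean_interval(trades, seed=42))  # same interval again
```

Without a seed, the interval still wobbles from run to run; raising `n_resamples` is the usual way to shrink that wobble.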

I looked into anti-optimization. From what I'm seeing it is not worth it to discard best optimization runs. What is there is different level of adaptability between strats, one group is quite static and fits over a specific length of data and if used on another period (OOS) it fails to "adapt" and turns fiercely south. The other group seems to wrap itself better around new piece of data. It intrigues me and I will plow ahead in that direction.

38 (edited by sleytus 2017-08-22 22:03:11)

Re: Project To Determine Most Effective Acceptance Criteria Settings

Perhaps I'm missing something -- it wouldn't be the first time...

If I have a strategy with a SQN of 8 and a win ratio of 0.92, does it make a difference how it was generated?   

Or, another way of asking -- if I have two strategies with very similar statistics but one was generated in a casual way and the other using a highly refined procedure -- would your expectations be the same or different for the two?

I'll typically let EA Studio run for a week or so -- perhaps longer.  I end up with 100 strategies.  To my eye they all look great -- yes, the ones at the beginning of the list have better statistics and nicer looking balance charts, but the ones at the end also look fine.  I think we agree that position in the collection is no guarantee of success -- then how does one go about deciding which apples to toss if they all look good on the surface?

That's what I don't get.  If an apple looks bad then, sure, it is easy to exclude it.  But what about all the good looking apples that are destined to perform poorly -- how do you identify those?

Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

Certainly, portfolio management cannot be left aside, but if there is a huge difference between collections in profitability, and currently my research shows this, then it takes trading to a point where break-even is the best outlook... Not trying to bring anyone down, but sort of a pessimistic realism :) We must be able to pick more good than bad apples into a collection.

This is something I wanted to say, but just didn't want to. I'm still hopeful (but pessimistic as well) that we can find a way to effectively filter out the worst strategies.

footon wrote:

Back to the topic - John, you single out MC as a main validation tool, but I'm having trouble seeing its usefulness if the results it gives are inconsistent. As you said before there's huge variability if one runs MC through multiple times. The building block of any scientific work or method is that one must be able to replicate the results. With MC it is not possible and therefore I've let it sit aside. I've written bootstrapping and MC algos for White's reality check and in those instances results were confidently replicable.

I note MC as a main validation tool for a few reasons. Number one, it is one of the only validation tools we have. Number two, I have extensive experience with MC testing as it relates to poker...separately, both with bankroll management and with studying artificial intelligence with regards to finding the best poker strategy in a game with unknown variables. And three, I have little practical experience applying MC to trading systems, so I don't know its proper uses or potential. As I've previously noted, I've seen some gurus using it when it is obvious to me that they don't know the specific mathematical reasons behind why they are using it. When it comes to trading, I can spot misapplications of MC tests, but I cannot tell you what the correct ways to use them are.

Sleytus mentioned White's reality check and I briefly looked it up, but didn't take the time to understand...and I certainly don't understand it for now, lol.

footon wrote:

I looked into anti-optimization. From what I'm seeing it is not worth it to discard best optimization runs. What is there is different level of adaptability between strats, one group is quite static and fits over a specific length of data and if used on another period (OOS) it fails to "adapt" and turns fiercely south. The other group seems to wrap itself better around new piece of data. It intrigues me and I will plow ahead in that direction.

I'm not quite sure what you're saying here, but I get the general gist of it. I believe what you are referring to is another problem, in that validation methods (or settings) may very much depend SPECIFICALLY on which indicators are being used. I've thought of this before, and actually it applies to several different areas of strategy-making.

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

If I have a strategy with a SQN of 8 and a win ratio of 0.92, does it make a difference how it was generated?   

Or, another way of asking -- if I have two strategies with very similar statistics but one was generated in a casual way and the other using a highly refined procedure -- would your expectations be the same or different for the two?

When you ask the question in that way, then the answer is no, it doesn't matter how it was generated. (This is probably not entirely true in the case of SQN, but it's just a technicality)

But if you include Acceptance Criteria and the various Validation methods we have at our disposal (including OOS testing)--and we know how to use them effectively--then *theoretically* we "should" have fewer curve-fitted strategies in our overall collection.

So instead of a Collection of 100 strategies and 20 of them being "valid" strategies, we may end up with a Collection of 60 strategies and 17 "valid" ones.
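
The trade-off in that example is easy to put into numbers. The figures (20 valid out of 100, then 17 valid out of 60) are the hypothetical ones from the sentence above, not measured results.

```python
def valid_share(valid: int, total: int) -> float:
    """Fraction of a Collection that consists of 'valid' strategies."""
    return valid / total


before = valid_share(20, 100)   # unfiltered Collection
after = valid_share(17, 60)     # filtered Collection

lost_valid = 20 - 17                        # good strategies thrown away
removed_bad = (100 - 20) - (60 - 17)        # bad strategies filtered out

print(f"valid share: {before:.1%} -> {after:.1%}")
print(f"cost: {lost_valid} valid lost, benefit: {removed_bad} bad removed")
```

In this made-up case the filter gives up 3 valid strategies to remove 37 bad ones, lifting the valid share from 20% to about 28%.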

The goal, as I envision it, is to filter out the bad strategies that are generated before the user even has to deal with them.

Actually, generation might possibly have an effect on the "curve-fittedness" of the strategies generated. As an extreme example, let's say you generate an M30 strategy over 100 years of data. Now, if you find a winning strategy with that amount of data, I guarantee that it's not curve-fit. There's just too much data to force to conform with a simple set of rules. However, if you generate over M30 for 2 weeks, then you can bet that curve-fit strategies will fill up your 3TB hard drive in no time. (A major problem, as noted by Hannah, is that algorithmic trading just became prevalent around 2014, so the market is substantially different than it was ten years ago.)

And that's what we want to prevent: pushing curve-fit strategies into our Collection. If we can find a way to filter out a certain portion of curve-fitted strategies before we even need to deal with them, that will be a huge help. Every such eliminated strategy is essentially pure profit.

sleytus wrote:

I'll typically let EA Studio run for a week or so -- perhaps longer.  I end up with 100 strategies.  To my eye they all look great -- yes, the ones at the beginning of the list have better statistics and nicer looking balance charts, but the ones at the end also look fine.  I think we agree that position in the collection is no guarantee of success -- then how does one go about deciding which apples to toss if they all look good on the surface?

That's what I don't get.  If an apple looks bad then, sure, it is easy to exclude it.  But what about all the good looking apples that are destined to perform poorly -- how do you identify those?

And again, that is what we need to filter out with validation methods before it's pushed to the collection.

As a very simple example, take two different strategies, each with an SQN of 8 and a win rate of 92%. If we run these strategies through MC tests varying the system parameters, the tests may show that one of the strategies is particularly vulnerable to these changes. That system is much more likely to be too highly curve-fit than the other, so we can discard it. Yes, I know, both of these strategies are highly curve-fit. But perhaps the other one is so basically sound that it will still produce an SQN of 2.4 in extensive live trading. That second system was not curve-fit to such a degree, even though it was still highly curve-fit. There was just enough underlying actual potential that there was still plenty of profit remaining after the curve-fitting was removed.
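
The parameter-variation idea can be sketched in a few lines. Everything here is a toy under stated assumptions: `backtest` stands for any function that maps a parameter set to net profit, the two example "strategies" (`fragile` and `sturdy`) are invented to show the contrast, and the 10% perturbation range and 100 tests are illustrative choices rather than EA Studio's actual procedure.

```python
import random


def robustness(backtest, params, rel_change=0.10, n_tests=100, seed=0):
    """Share of randomly perturbed parameter sets that remain profitable."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_tests):
        # Nudge every parameter up or down by at most rel_change (e.g. 10%).
        perturbed = {name: value * (1 + rng.uniform(-rel_change, rel_change))
                     for name, value in params.items()}
        if backtest(perturbed) > 0:
            passed += 1
    return passed / n_tests


def fragile(p):
    """Toy strategy: profitable only in a narrow window around period 14."""
    return 100.0 if abs(p["period"] - 14) < 0.2 else -50.0


def sturdy(p):
    """Toy strategy: profit degrades gently as the period moves away from 14."""
    return 100.0 - 2.0 * abs(p["period"] - 14)


print(robustness(fragile, {"period": 14}))  # low share survives
print(robustness(sturdy, {"period": 14}))   # every variant survives
```

Both toy strategies look identical at the optimized setting, yet the perturbation score separates them, which is the sense in which such a test can flag the more heavily curve-fit one.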

Here is another example. Let's say you have a parameter, "enter the market when it rains". Other indicators were randomly generated to fill out a beautiful winning strategy with an SQN of 8 and win rate of 92%.

Of course, you will look at that strategy and remove it, since you know better. But suppose you DIDN'T know better. Do you think there could possibly be a combination of Acceptance Criteria, Monte Carlo tests, Optimization, OOS tests, etc. that could filter that strategy before it is pushed to your Collection? I would bet you 25 cents that it wouldn't stand up to a reasonable OOS test!

You might counter saying that there's little chance that the software would ever come up with a strategy based upon the rule "enter the market when it rains" that has an SQN of 8 and win rate of 92%. (!)

But I would disagree with you. Every time it curve-fits a strategy with some random indicators, that is what the software is doing.

Lastly, I'm not trying to discourage anyone or crush any dreams (particularly my own!). I'm just reporting what I find and throwing my thoughts on the matters down on "paper." While I have some strong opinions on the various issues, I am far from the last word. I'm very likely the newb to this whole trading game. One problem I have is that I tend to intertwine my strong opinions with those that I'm less sure about. If I didn't do that, I'd always be placing conditions on my statements (and my posts would be twice as long--and you wouldn't want that!). I'm very much open to changing my mind, as that's what this whole journey into shorter data horizon generation arose from in the first place.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Thanks for taking the time to answer my questions -- it helps me to better understand where you are coming from.  And it also raises some additional questions -- I'm still probing...

1. You mention ending up with more "valid" strategies in the final collection.  I don't know what the definition of "valid" means.  And you don't have to answer this -- it's just that I don't understand how one can test for something that isn't defined.

2. You brought up curve-fitting -- which I hadn't seriously considered before.  When comparing two strategies with similar statistics it could be that one is more curve-fitted than the other and then, yes, it would seem desirable to use the less curve-fitted one.  But how does one test for curve-fittedness?  Applying Monte Carlo tests?

3. Another comparison question -- there is no right or wrong answer.  One strategy has been back tested for 10 years and another for 1 year.  And we can assume the one back-tested for 10 years is less curve-fitted.   Which one would you choose?   A little more information -- the more input data (i.e. longer the back-testing) then the less curve-fitted the strategy will be.  But that also means the poorer will be the statistics.  I brought this up in a different thread -- when fitting a polynomial equation to data points, the more data points used then the poorer will be the fit.   So, again, which would you choose -- a less curve-fitted strategy that trades okay for 1 year or a more curve-fitted strategy that trades more profitably -- but then stumbles after 6 months? 

Or would you claim you can have both -- less curve-fitting plus equal or better statistics?  Do you understand what I'm getting at?
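The polynomial remark in question 3 can be tried out with a small Python sketch. It is only an illustration under stated assumptions: the "true" relationship (y = 2x + 1), the noise level, and the helper names are all invented here, and a straight line is used as the simplest polynomial.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rms_error(line, xs, ys):
    """Root-mean-square distance between the line and the points."""
    slope, intercept = line
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

rng = random.Random(7)

def noisy_points(n):
    # Assumed underlying relationship y = 2x + 1 plus Gaussian noise
    xs = [10 * i / (n - 1) for i in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 3) for x in xs]
    return xs, ys

unseen_x, unseen_y = noisy_points(2000)  # data the fit never saw

def average_errors(n_points, trials=50):
    """Average error on the fitted points and on the unseen points."""
    fit_total = unseen_total = 0.0
    for _ in range(trials):
        xs, ys = noisy_points(n_points)
        line = fit_line(xs, ys)
        fit_total += rms_error(line, xs, ys)
        unseen_total += rms_error(line, unseen_x, unseen_y)
    return fit_total / trials, unseen_total / trials

few_fit, few_unseen = average_errors(2)
many_fit, many_unseen = average_errors(200)
print(f"  2 points: fitted error {few_fit:.2f}, unseen error {few_unseen:.2f}")
print(f"200 points: fitted error {many_fit:.2f}, unseen error {many_unseen:.2f}")
```

With two points the line fits its own data perfectly, yet on average it does worse on unseen data than the line fitted to 200 points, whose "statistics" look poorer on paper.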

42 (edited by hannahis 2017-08-23 07:47:55)

Re: Project To Determine Most Effective Acceptance Criteria Settings

qattack wrote:

I understand that a successful backtest is not indicative of a successful trading strategy.

What IS true is that an UNsuccessful backtest IS indicative of an UNsuccessful trading strategy.


This is NOT true. 

Often FSB's backtesting result may indicate that my EA is unprofitable but I still go ahead because I trust my theory and put it in live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading.  I have lots of experience to testify to it.  So it's not just an opinion, it's experience.



There are generally two types of users

1.  Use statistical results/acceptance criteria to search for a successful EA

2.  Use sound trading theory to build a successful EA that will yield the kind of statistical results you are looking for.

The outcome...

1.  Statistical search will yield billions or unlimited outcomes, and the process of filtering these is varied and does not necessarily yield any good results at the end of the day.

2.  Using sound theory to build, I can get 10 to 50 profitable EAs in 1 round (a couple of hours to complete 1 round).

So to me, using theory to build an EA is not like a shot in the dark and presents much less uncertainty than a statistical search.

Nevertheless, some of us are more inclined to one than the other, depending on our educational training and perspective.  If I were a mathematician, I would be prone to statistical search as my 1st preference.  Unfortunately (and fortunately), statistics isn't my forte, so I choose the latter.

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

1. You mention ending up with more "valid" strategies in the final collection.  I don't know what the definition of "valid" means.

"Valid" simply means strategies that "WILL" (more accurate is "are highly likely to") earn profit in the future based upon the fact that their underlying indicator combinations are sound.

So if my fantastic EA above with the entry rule "enter the market if it rains" continues to make soaring profits for the next five years, this is still NOT a valid strategy. (Well, it actually may be, but that's another story...) So, no matter how much we filter and prune, we are still going to get "lucky" with some strategies that make money in the future, but not based on the merits of their indicators.

sleytus wrote:

2. You brought up curve-fitting -- which I hadn't seriously considered before.  When comparing two strategies with similar statistics it could be that one is more curve-fitted than the other and then, yes, it would seem desirable to use the less curve-fitted one.  But how does one test for curve-fittedness?  Applying Monte Carlo tests?

And that is the question we need to answer. Monte Carlo testing is certainly one way to test for curve-fittedness, but knowing exactly how is a different matter entirely. And as I've mentioned elsewhere, it may be that each different combination of indicators needs its own MC test parameters (which would entirely defeat the purpose of such a test).

OOS testing is currently the major way to test for curve-fittedness. That's because a curve-fitted strategy will only perform well in the data over which it is generated (not entirely true, as in our above example of the rain strategy).

One reason, among others, that a strategy becomes curve-fitted is that the software will locate a very few periods where the largest moves in the market occur and tailor a strategy almost exclusively around those moves. So if you are testing over a two year period and there are only seven of these moves, there are basically only seven trades that matter, even if your trade count is 200. That's very oversimplified, but you can get the idea.
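To make the "only seven trades matter" idea concrete, here is a small Python sketch that measures how much of a backtest's net profit comes from its few best trades. The trade list is invented purely for illustration (193 small trades plus 7 large winners), and `top_k_share` is a hypothetical helper, not something EA Studio reports.

```python
def top_k_share(trade_profits, k):
    """Share of total net profit contributed by the k most profitable trades."""
    total = sum(trade_profits)
    if total <= 0:
        raise ValueError("net profit must be positive for the share to be meaningful")
    best = sorted(trade_profits, reverse=True)[:k]
    return sum(best) / total

# Hypothetical backtest: 193 small trades that roughly cancel out,
# plus 7 trades that each caught one of the big market moves.
small_trades = [10 if i % 2 == 0 else -9 for i in range(193)]
big_moves = [500] * 7
trades = small_trades + big_moves

share = top_k_share(trades, 7)
print(f"{len(trades)} trades, top 7 supply {share:.0%} of net profit")
```

A share close to 100% would suggest that the trade count overstates how much evidence the backtest really contains.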

There must be other innovative ways that we can filter for valid strategies.  I'm 100% certain of that. But I don't know that we have the tools to do so. I think they will mostly employ OOS in some fashion. I also think we have the capability as a group to develop those tools, but know this: it won't be easy. It would be an entirely revolutionary concept.

sleytus wrote:

3. Another comparison question -- there is no right or wrong answer.  One strategy has been back tested for 10 years and another for 1 year.  And we can assume the one back-tested for 10 years is less curve-fitted.   Which one would you choose?   A little more information -- the more input data (i.e. longer the back-testing) then the less curve-fitted the strategy will be.  But that also means the poorer will be the statistics.  I brought this up in a different thread -- when fitting a polynomial equation to data points, the more data points used then the poorer will be the fit.   So, again, which would you choose -- a less curve-fitted strategy that trades okay for 1 year or a more curve-fitted strategy that trades more profitably -- but then stumbles after 6 months?

You have a fundamental flaw in your thinking about this entire subject. That sounds harsh. I'm not trying to be mean, just trying to help you realize how you are thinking about it wrongly, and you need to realize that you keep missing an important piece of logic in the generation process.

If you offered me these two strategies and guaranteed that each would perform as stated, then...I don't care. In fact, I wouldn't bother with either of them.

Let me rephrase your question for you a bit. I know it's picky, but it pertains to the flaw in your thinking.

Instead ask the question, "Assume two methods of generation. The first backtests for 10 years and produces strategies that are less curve-fitted that perform OK for one year; the second backtests for 2 years and produces strategies that are more curve-fitted, stumble after six months, but perform significantly better."

Asked in this way, I'd bet most people would jump at option #2. I would probably also be inclined to pick option #2, but not so fast. We don't have quite enough information. What if the Standard Deviation (and thus the risk) of strategies from option #2 ends up being extremely high and our account is in jeopardy? What if the second method only produces 10% of the number of strategies as the first?

OK, I didn't do a very good job of coming up with additional information needed (but there is a reason for that).

"Wait," you say, "method #2 is in reality always going to generate MORE strategies than method #1; MANY more, in fact."

Aha! That is very true! And this is also a huge problem. Method #2 is going to generate SO many more strategies that a MUCH larger proportion of them will be "invalid" and curve-fitted to such a degree that they are useless. So now we come back to our problem: How do we prune enough of these poor strategies so that they don't wipe out the profit we obtain with our valid strategies?

You might say there is a way to do it. And I really believe that there MUST be. The only problem is that we don't know HOW to do it, and it will *literally* take us the rest of our lives (or longer) to arrive at that way by trial and error.  The sample size compared to the Variance is just way too small.

Going back to what I know about...poker. Namely, Texas Holdem, by far the most popular variant. You wouldn't believe how sophisticated the strategy is among the top players today. (!!!) No, I'm not talking about all those chumps you see in tournaments on TV that they make into movie stars (although a select few players really DO know what they're doing!). No, I'm talking about a multitude of players that honed their games on the Internet, often playing six to twelve tables simultaneously (amounting to easily twenty times as many hands per hour as live play).

Holdem poker strategy, between its introduction into Vegas in 1967 and the present day, has been increasing exponentially in complexity and effectiveness. It's similar to technology. Until the mid-2000s, all instruction was accomplished with books. These strategies were very stagnant and did not keep up with the quickly-developing strategies since 2003. Internet video learning sites became the Gold Standard. Pros were able to get their new ideas and discoveries to the public very quickly, and information exchange between players increased a million-fold (not exaggerating).

Each successful strategy was built upon the combination of all other previously-successful strategies. Early strategies were based upon only one or a handful of players' experiences.

You wouldn't believe how sophisticated today's top players are. Even players at my level routinely use game theory to calculate the best actions to take in particular situations. This is done with hundreds (or thousands) of hands away from the table, and with the aid of software, as the calculations take far too long to accomplish at the table. Then, when a situation arises at the table, we draw from our memory of situations we've analyzed and try to relate it as closely as we can to some of these situations to take appropriate action.

A program that can beat good poker players in a "ring" game (that is, a cash game with multiple players) is still a LONG ways off. I do believe it will eventually happen, but for now it is out of reach. Computer processors just aren't fast enough yet. It's kind of like chess...at one time, people said that a chess program would never be able to beat a human opponent. Now, even cheap programs can beat Grandmasters, especially at speed chess.

Trading is a LOT more complex than a game of poker. There are SO many more variables. It's relatively easy to put a poker player on a range of hands or at least the probability that he holds a range of hands. And worst-case scenario, there are only 169 different starting hands he can have anyway. (how they combine with the board cards becomes very complicated sometimes, however.)
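For anyone wondering where the 169 comes from, it can be counted directly. The short Python sketch below enumerates starting hands by rank and suitedness; the hand labels (such as "AKs" for suited ace-king) follow common poker shorthand, and the variable names are just for illustration.

```python
from itertools import combinations
from math import comb

RANKS = "AKQJT98765432"

pairs = [r + r for r in RANKS]                                  # AA, KK, ...
suited = [a + b + "s" for a, b in combinations(RANKS, 2)]       # AKs, AQs, ...
offsuit = [a + b + "o" for a, b in combinations(RANKS, 2)]      # AKo, AQo, ...
starting_hands = pairs + suited + offsuit

# Counting exact cards instead: each pair can be dealt 6 ways,
# each suited hand 4 ways, and each offsuit hand 12 ways.
exact_combos = len(pairs) * 6 + len(suited) * 4 + len(offsuit) * 12

print(len(pairs), len(suited), len(offsuit), len(starting_hands))
print(exact_combos, comb(52, 2))
```

That gives 13 pairs, 78 suited and 78 offsuit hands, for 169 distinct starting hands, which cover all 1326 possible two-card deals.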

In trading, we only think about a limited subset of variables, but in reality there are many more, and that is what produces all the "noise" we experience. Without this noise, we would all be raking in the big bucks.

My point with this expose (novella?) is that it will take time and mathematical analysis to develop sufficient workflows to effectively weed out the invalid strategies.

44 (edited by qattack 2017-08-23 17:19:15)

Re: Project To Determine Most Effective Acceptance Criteria Settings

hannahis wrote:
qattack wrote:

I understand that a successful backtest is not indicative of a successful trading strategy.

What IS true is that an UNsuccessful backtest IS indicative of an UNsuccessful trading strategy.


This is NOT true. 

Often FSB's backtesting result may indicate that my EA is unprofitable but I still go ahead because I trust my theory and put it in live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading.  I have lots of experience to testify to it.  So it's not just an opinion, it's experience.

I'm skeptical of this statement. Don't get me wrong, your simple statement almost immediately made me change my mind on my statement. So what I'm saying is that I'm skeptical of BOTH statements, but much more so mine than yours now.  Does that make sense?

This is one statement that I made as "fact" that I intuitively believed, but also realized may not be 100% correct. So now, instead of being 95% sure, I am 20% sure. As I said before, if I put a qualification on all my opinions, my posts would be twice as long.

I'll probe you for more information on this later, because it's one of my core assumptions.

hannahis wrote:

There are generally two types of users

1.  Use statistical results/acceptance criteria to search for a successful EA

2.  Use sound trading theory to build a successful EA that will yield the kind of statistical results you are looking for.

The outcome...

1.  Statistical search will yield billions or unlimited outcomes, and the process of filtering these is varied and does not necessarily yield any good results at the end of the day.

2.  Using sound theory to build, I can get 10 to 50 profitable EAs in 1 round (a couple of hours to complete 1 round).

So to me, using theory to build an EA is not like a shot in the dark and presents much less uncertainty than a statistical search.

Nevertheless, some of us are more inclined to one than the other, depending on our educational training and perspective.  If I were a mathematician, I would be prone to statistical search as my 1st preference.  Unfortunately (and fortunately), statistics isn't my forte, so I choose the latter.

I COMPLETELY agree with everything you said, except for the last statement. Though I am tackling EA Studio's possibilities first, it is not my first preference. I only bought EA Studio for the possibility that we may find an effective workflow that will allow the premise to work.

As a "mathematician", I completely realize that FSB is the more powerful of the two tools. In fact, I'm 100% confident that I can (and will) develop a proven trading plan, exactly as you have, through the use of logic.

However, it is also logical that I first make basic experiments with EA Studio, because if it DOES work (...with only basic ideas), then it will be MUCH easier to generate an "infinite" number of profitable EAs in this way. Even though the success probability is much less, there is a possibility for greater returns (in efficiency, potential). It's like working for a living versus buying Powerball tickets hoping to hit the 700 million $$$ jackpot. (Though I've never bought a Powerball ticket, and I'm not comparing the odds of winning Powerball to the chance that our EA Studio methods will work!)

Does that make sense?

45 (edited by qattack 2017-08-23 17:33:42)

Re: Project To Determine Most Effective Acceptance Criteria Settings

hannahis wrote:

Often FSB's backtesting result may indicate that my EA is unprofitable but I still go ahead because I trust my theory and put it in live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading.  I have lots of experience to testify to it.  So it's not just an opinion, it's experience.

I thought of what I wanted to ask about this.

After you place your EAs on a demo server for a period of three months, and you subsequently go back and analyze those same EAs over only the period that you had them on the demo server (but obviously were not In Sample during your development), do those backtests show results that differ significantly from how they performed in your demo account? Enough to turn a slightly losing-looking strategy into a strategy that you would be confident trading (for a moderate amount of real money)?

Furthermore, if this is the case, can you think of any possibility to capture these strategies and push them to Collection without adding too many other strategies that would end up not being valid? Because if this is the case, it makes the situation even worse when trying to filter. Our choice is to allow more invalid strategies through or fewer valid strategies, both of which are bad for the proportion of good strategies in our Collection.

It is not so much the overall number of good strategies in the Collection that matters, but more so the PROPORTION of good strategies to bad.
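A quick worked example of why the proportion matters more than the count, sketched in Python. The figures are invented assumptions for illustration only: a good strategy is taken to earn 100 on average and a bad one to lose 60, and both function names are hypothetical.

```python
def expected_profit_per_strategy(share_good, avg_gain, avg_loss):
    """Average profit per strategy when `share_good` of the Collection is valid."""
    return share_good * avg_gain - (1 - share_good) * avg_loss

def break_even_share(avg_gain, avg_loss):
    """Proportion of good strategies at which the Collection breaks even."""
    return avg_loss / (avg_gain + avg_loss)

# Collection A: 100 strategies, 30 of them good.
# Collection B: only 10 strategies, but 8 of them good.
collection_a = expected_profit_per_strategy(30 / 100, avg_gain=100, avg_loss=60)
collection_b = expected_profit_per_strategy(8 / 10, avg_gain=100, avg_loss=60)
needed = break_even_share(avg_gain=100, avg_loss=60)

print(f"A: {collection_a:+.1f} per strategy, B: {collection_b:+.1f} per strategy")
print(f"break-even proportion of good strategies: {needed:.1%}")
```

Under these assumed numbers, Collection A holds nearly four times as many good strategies as Collection B and still loses money, because it sits below the break-even proportion.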

Re: Project To Determine Most Effective Acceptance Criteria Settings

qattack wrote:

...Let me rephrase your question for you a bit...

Thanks for answering my questions so completely.  You've given me a lot of food for thought.


hannahis wrote:

There are generally two types of users

I think you've captured the big picture well.  For now I would place myself in the first group (statistics) -- but that is only because I haven't been doing this long enough to have accumulated much in the way of sound theory. 

Popov's software has enabled me to trade successfully at an early stage.  This makes forex trading very interesting to me and also motivates me to learn more -- and I've learned more from this forum than any book or Google.  Over time I hope to be able to better merge the two. 

Thanks to both of you for great advice and information...

Re: Project To Determine Most Effective Acceptance Criteria Settings

Hi, I'd like to contribute my small experience:
I always use the maximum OOS range and simple Acceptance Criteria (SQN min 2, max DD 25-30%) without optimizing.
I look through the results (max 100) "manually", checking the overall performance chart.
As I am more of an "optical" type of person, I keep only those strategies that do not have too rough variations in performance and, most important of all, also show at least an acceptable performance over the previously unknown (newest) data.
This way I look at it as if I had been moved into the past (the beginning of the OOS data) to create the strategies, and when checking the results I am looking into the future (until today).
Doing so, I often see that good results in the initial range are completely weak in the overall range, and strategies not in the top field give quite a nice result.
I never had the impression that there is a system in it; it seems to me that it just depends on the variation of the input data, and the best results (in my opinion, those with small baseline deviation) are surely not the initially top-rated ones but are distributed over the 100 results.
I then use the condensed 3-5 leftover strategies on demo and live accounts.
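For readers who want to check such Acceptance Criteria outside the software, here is a minimal Python sketch. It assumes the usual textbook definition of SQN (square root of the trade count times mean profit divided by the standard deviation of profits) and measures drawdown on the closed-trade balance; EA Studio's exact calculations may differ, and the trade list, starting balance and function names are made up for illustration.

```python
import math
import statistics

def sqn(trade_profits):
    """System Quality Number: sqrt(N) * mean / sample standard deviation."""
    n = len(trade_profits)
    return math.sqrt(n) * statistics.mean(trade_profits) / statistics.stdev(trade_profits)

def max_drawdown_pct(trade_profits, start_balance):
    """Largest peak-to-valley drop of the balance, in percent of the peak."""
    balance = peak = start_balance
    worst = 0.0
    for profit in trade_profits:
        balance += profit
        peak = max(peak, balance)
        worst = max(worst, (peak - balance) / peak * 100)
    return worst

def passes(trade_profits, start_balance=10_000, min_sqn=2.0, max_dd=25.0):
    """Apply the two criteria from the post: SQN min 2 and max DD 25%."""
    return (sqn(trade_profits) >= min_sqn
            and max_drawdown_pct(trade_profits, start_balance) <= max_dd)

# Hypothetical list of closed-trade profits in account currency
trades = [120, -80, 150, -60, 90, 110, -70, 130] * 5

print(f"SQN {sqn(trades):.2f}, max DD {max_drawdown_pct(trades, 10_000):.1f}%")
print("accepted" if passes(trades) else "rejected")
```

Note that neither number says anything about how the strategy behaves on the newest, unseen data, which is why the visual check described above still adds information.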

Re: Project To Determine Most Effective Acceptance Criteria Settings

bru1 wrote:

Hi, I'd like to contribute my small experience:
I always use the maximum OOS range and simple Acceptance Criteria (SQN min 2, max DD 25-30%) without optimizing.
I look through the results (max 100) "manually", checking the overall performance chart.
As I am more of an "optical" type of person, I keep only those strategies that do not have too rough variations in performance and, most important of all, also show at least an acceptable performance over the previously unknown (newest) data.
This way I look at it as if I had been moved into the past (the beginning of the OOS data) to create the strategies, and when checking the results I am looking into the future (until today).
Doing so, I often see that good results in the initial range are completely weak in the overall range, and strategies not in the top field give quite a nice result.
I never had the impression that there is a system in it; it seems to me that it just depends on the variation of the input data, and the best results (in my opinion, those with small baseline deviation) are surely not the initially top-rated ones but are distributed over the 100 results.
I then use the condensed 3-5 leftover strategies on demo and live accounts.

Hey Bru1

What has your experience been in live trading? How long do the strategies work well before they change?

Re: Project To Determine Most Effective Acceptance Criteria Settings

Currently I have about 30 EAs that have been running on demo for 8 months and 9 EAs on a live account for 4 months.
But some have been selected with other criteria, and I am trying to compare the performances. It is too short a time to give a judgement, but I do not have the impression that they get worse over time, as often predicted or reported here.
To me they just seem to have good and bad periods, each one with its individual variation. So I am still creating individual EAs on different symbols and time frames, expecting to have more winners than losers every week, which has worked so far, but I would not be surprised if it suddenly changes.
I also had 2 different-symbol Portfolio Experts with 9+5 EAs running, but although they were the best of a collection, after approximately 3 months only 2 EAs were left; all the others performed too badly from the beginning.
Sorry for getting too far away from the AC-setting topic.

50 (edited by trader1234 2017-08-30 16:28:57)

Re: Project To Determine Most Effective Acceptance Criteria Settings

I have long believed that 50% OOS data is useful and always compare the IS and OOS curves to check that they match, both in characteristics and gradient.  Whilst it surely filters out bad "curve fitted" strategies, I have recently thought that it does also cause many strategies that look good to underperform under testing.

The reason I believe is that by looking at the IS and OOS curves, I am just eventually ending up with optimising the OOS data by exclusion - I end up with exactly the same result as if I'd used all the data for optimisation, without the benefit of having double the data available, which might help out - by including many more different trading conditions, and reducing the chances of curve fitting.  I believe the longest data horizon available (200,000) in Studio is the best guard against curve fitting (along with only having a couple of entry and exit criteria).  It also simplifies the generation process as the computer can do what I have been spending my time doing.  I know that most of the curves will be nice and linear, and I know that most (80%) of tested strategies will fail testing, but they do already!  I will be no better off than before, and with less work.

Have I got it wrong regarding OOS data?  I have used OOS data to visually check strategies for a few years, and not taken the time with walk forward testing as such, which might have more merit. But I do believe the OOS data is not really OOS once you have manually filtered out say 10000 strategies - it is not really out of sample any more - it is only OOS for the first 100 runs for instance.  Please suggest if I am misguided in this assumption - I will then carry on with my current method for strategy generation.
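The worry that OOS data stops being out of sample after repeated filtering can be illustrated with a small simulation. This Python sketch is a toy model under loud assumptions: every "strategy" is pure noise with no edge at all, each period is 50 random trades, and the 2000 candidates and 5% cut-off are arbitrary numbers, not anything taken from EA Studio.

```python
import random
from statistics import mean

rng = random.Random(42)

def period_result(n_trades=50):
    """Average profit per trade over one period for a strategy with no edge."""
    return mean(rng.gauss(0, 1) for _ in range(n_trades))

# Each random strategy gets an OOS period (used for filtering)
# and a later "fresh" period that nobody looked at while filtering.
strategies = [{"oos": period_result(), "fresh": period_result()}
              for _ in range(2000)]

# Manual filtering: keep only the 5% with the best-looking OOS result
survivors = sorted(strategies, key=lambda s: s["oos"], reverse=True)[:100]

oos_avg = mean(s["oos"] for s in survivors)
fresh_avg = mean(s["fresh"] for s in survivors)
print(f"survivors, average profit per trade in OOS:   {oos_avg:+.3f}")
print(f"survivors, average profit per trade in fresh: {fresh_avg:+.3f}")
```

The survivors look clearly profitable on the OOS period simply because they were chosen for it, while on the fresh period they fall back to roughly zero. In this toy setup, at least, selecting on OOS results turns the OOS data into part of the optimisation.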