1 (edited by qattack 2017-08-19 02:00:44)

Topic: Project To Determine Most Effective Acceptance Criteria Settings

Over the next few days, I'm going to make a rough estimate of the effectiveness of the various EA Studio Acceptance Criteria.

I hope to determine the relative importance of each element and logically compose a more powerful set of Acceptance Criteria.

For this model, I am adopting Sleytus' approach to EA Studio strategy generation and adding an Out Of Sample element. (Kudos to Sleytus for talking me into trying this approach, leading to this large-scale test! Also, Hannah told me that data from 2014 onward is generally more useful for backtesting, as that's when algorithmic trading took off.) I am hopeful that the way in which I use the Out Of Sample period will greatly improve strategy selection and filtering.

This will be done specifically for the following settings:

Currency Pair: EURUSD
Time Frame: M30

SL: 10-100 (Must Use)
TP: 10-100 (May Use)

In Sample Start Date: 10-22-2015
In Sample End Date: 01-04-2017
Total In Sample Bars: 15,000

Out Of Sample Start Date: 01-05-2017
Out Of Sample End Date: 08-11-2017
Total Out Of Sample Bars: 7,500

My generation method will be System Quality Number, as I believe this to be the best metric available. SQN maximizes the Number of Trades and average Profit-vs.-Risk of trades, and minimizes Standard Deviation. The latter is very important to create a more balanced and predictable Equity Curve and has the effect of lowering drawdown.
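For reference, Van Tharp's System Quality Number is commonly computed as the square root of the number of trades times the mean per-trade result divided by its standard deviation. EA Studio's internal formula may differ in detail, so treat this as an illustrative sketch only:

```python
import math

def sqn(trade_profits):
    """System Quality Number: sqrt(N) * mean(profit) / stdev(profit).

    trade_profits: per-trade results, ideally expressed in R-multiples
    (profit divided by the risk taken on that trade).
    """
    n = len(trade_profits)
    mean = sum(trade_profits) / n
    variance = sum((p - mean) ** 2 for p in trade_profits) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(variance)

# Two hypothetical 10-trade samples with the same total profit (10R):
steady = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 1.0, 1.0]
erratic = [8.0, -1.0, -1.0, 3.0, -1.0, -1.0, 4.0, -1.0, -1.0, 1.0]
# The steadier sequence scores far higher despite identical net profit,
# which is exactly the "balanced Equity Curve" effect described above.
```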

1. I will first generate exclusively over the In Sample period using only very basic Acceptance Criteria, those which first appear when you hit "Reset Acceptance Criteria". Max Ambiguous Bars will remain constant through every subsequent generation.

  * Optimization will be used with +/-5 steps (Search: SQN; In Sample).

  * Monte Carlo tests will number 100
   > requiring 90% "Validated Tests"
   > using the default settings--with the exception that "Randomize Indicator Parameters" (10/10/20) will be used.

2. My first run will be six hours in length over eight instances. I will then tally up the following numbers for future comparison:
  * Percent "Generator/Passed Validation"
  * Percent "Monte Carlo/Passed Validation"

  * The Collection will be sorted on SQN.

3. Next, I will run each Collection through the Validator over the Out Of Sample period.
  * I will visually inspect the resulting Equity Curves and discard any I deem "unacceptable." This is highly subjective (denoted by use of quote marks below!), but I will attempt to eliminate those that exhibit any of the following characteristics:
   > "Large" drawdown at any point
   > "Excessive" Stagnation (including some that looks like stagnation, but is not technically stagnation)
   > "Uneven" Equity Curve
   > "Any curve that I just don't get along with"
  * Additionally, I will record Out Of Sample stats of the top two or three strategies from each instance for future comparison.

4. I will record the percent of strategies I eliminate in Step #3 for future comparison.

5. Lastly, I will rerun the remaining strategies and take note of the lowest value of each stat. This can possibly be used to indicate the strength of the remaining strategies, as well as serve as a guide for what the minimum effective Acceptance Criteria values might be.
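Step 5's "lowest value of each stat" idea can be sketched as follows. The stat names in the example are hypothetical placeholders, not EA Studio field names:

```python
def criteria_floors(surviving_stats):
    """Given per-strategy stat dicts for the strategies that survived
    visual inspection, return the worst (minimum) value of each stat.
    These minimums hint at the loosest Acceptance Criteria that would
    still have admitted every survivor."""
    floors = {}
    for stats in surviving_stats:
        for name, value in stats.items():
            floors[name] = min(value, floors.get(name, value))
    return floors

# Hypothetical stats for three surviving strategies:
survivors = [
    {"SQN": 3.1, "ProfitFactor": 1.45, "WinRate": 0.52},
    {"SQN": 2.6, "ProfitFactor": 1.30, "WinRate": 0.61},
    {"SQN": 4.0, "ProfitFactor": 1.62, "WinRate": 0.48},
]
# -> {"SQN": 2.6, "ProfitFactor": 1.30, "WinRate": 0.48}
```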

***************************
After the initial process above, I will add a single Acceptance Criterion at a time to the original Criteria and rerun the generation. These generations will not be nearly as long. I will run eight instances for each Criterion, with each instance set to a different value of the new Criterion.

For each Acceptance Criterion, I will record the same data as in the initial generation.

I will then analyze each Acceptance Criterion value, comparing its results to results from the baseline generation.

Using these comparisons, I will be able to rank the Acceptance Criteria into an order of relative effectiveness.

I will then start combining Acceptance Criteria to determine how they interact with each other.

My goal is to determine the values at which the weaker strategies are eliminated, disallowing them from entering the Collection, but ensure the Criteria's combined effect does not start removing too many potentially viable strategies (and certainly no top strategies). I imagine that I will end up with most values being somewhat low.

I expect Balance Line Deviation and Maximum Consecutive to possibly surprise me with their effectiveness.
****************************
When I'm done, I will post my basic results.

Please feel free to add your input as to what might make this process better. But hurry, I hope to have it all wrapped up in three days!

2 (edited by qattack 2017-08-19 03:50:42)

Re: Project To Determine Most Effective Acceptance Criteria Settings

Two addendums:

1. Monte Carlo testing parameters changed to 95% "Validated Tests" @ (12/12/20), to reduce the large number of strategies successfully Validated.

2. I will visually inspect In Sample Equity Curves just as I do OOS Equity Curves (before moving on to Step 3).

Re: Project To Determine Most Effective Acceptance Criteria Settings

I was reading your post and it makes me think about things.

You mention my approach includes OOS -- but, actually, in a different thread I was making an argument why I do *not* use OOS.   Remember -- when the software performs its optimization what it is really doing is optimizing indicator settings using the input data specified by your Data Horizon.  Since I'm trading in 2017 then I want the indicator settings optimized using 2017 data and not data from 2011 or 2013 or whatever.  To be sure, there is no guarantee that a 2017-optimized EA will perform better than a 2015-optimized EA -- and no doubt there will be periods in the future when a 2015-optimized EA will perform better -- but if I'm going to go with the odds, then I'll go with 2017.

As for methodically optimizing Acceptance Criteria -- that seems like a noble endeavor, but I wonder what it gains in the end.  Unless you can correlate your results with performance in a live account, then it doesn't really count for much.  I mean, most of us have at one time created EAs with amazing statistics and demo account performance -- only to watch it stumble when placed in a live account.  I'm not sure -- but I think it has to do with the nature and pattern of forex data -- it really is true that past data patterns do not predict future data patterns.  So, even EAs with the greatest stats aren't guaranteed to be winners -- though they probably have a better chance than EAs with poor stats.

Several months ago when I first started with Popov's software one of the more experienced traders on the forum suggested I learn to take what the software gives me -- and a light bulb went on in my head.  Popov has not only made it easy for us, he has also introduced a paradigm shift -- we can now trade *portfolios* rather than EAs.  There is no longer a need to labor at creating that one holy grail EA.  Instead we can now create a nice collection of 100 EAs with good stats, add them to a live, micro account and monitor their initial trades.  You will probably find that approximately 50% perform well and 50% don't.  And then you prune the poor-performers -- leaving behind a portfolio that trades profitably.  It really is that simple.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Sorry, I was not clear with that statement...I meant that I am adding an OOS period to your method.

The purpose of the OOS while conducting this experiment is to provide a measure of what values of stats to look for in the In Sample period.

When I actually generate EAs, I will be optimizing over the latest data as my final step. The OOS period is only seven months and will essentially replace a demo server testing period. I'm not sure whether I will optimize over the entire IS plus OOS period, or half of IS plus the entire OOS.

The purpose of attempting to find the "most powerful" set of Acceptance Criteria is to ensure that fewer under-performing strategies make it into the final portfolios. Our original profit margin is probably quite thin. Inevitably and no matter what we do, we will end up with a number of poor performing EAs in our live portfolios. Every one of those EAs that we can prevent from getting into the portfolio in the first place is pure profit. If a strategy performs poorly in OOS, it will perform poorly in live trading. (The reverse is not true, of course.) I'm not trying to make the strategies more amazing; instead, I'm trying to discard those that will fail anyway.

That said, it is somewhat likely that we really don't need much in the way of Acceptance Criteria. We can just generate over SQN, sort Collections by SQN, and grab the top two or three strategies from every Collection after a sufficient period of generation.

Re: Project To Determine Most Effective Acceptance Criteria Settings

qattack -- I apologize.  In re-reading my previous post in the light of day, it came across harsher than it should have.  I applaud you for making the effort to do the additional testing.  I think it was because I've been there and done that myself that I was suggesting a different direction.  There is no right or wrong way to experiment with Popov's software -- I'll be curious to learn your results -- and I will probably incorporate them into my approach and have to admit the error of my ways.  As for 'my method' -- I'm not comfortable taking credit for anything.  I'm simply following guidance I've gleaned from others' posts on this forum.  There are a number of very experienced and wise traders who support this forum -- I am not one of them.  I'm here as someone who wants to learn and can't help himself from playing devil's advocate.

Your last sentence about going easy with the Acceptance Criteria and then generating and sorting over SQN sounds more in line with what I have found works well and is easy to do.  The only thing I would add is you may be pleasantly surprised that there will be more than two or three good strategies in that collection of 100.

The Magic Numbers assigned to strategies in the Portfolio Expert created by EA Studio are sequential -- from 0 through 99.  So, strategy 100000 is the first strategy and 100099 is the 100th strategy (assuming '100' is the Magic Number's base).  You would think that most of the winners come from strategies with the smaller Magic Numbers and the poor performers would be further down the list (since I sorted using SQN).  But that's not my experience -- winners and losers can appear anywhere within the list of strategies.  I wonder if others have also seen that or have different results.  EA Studio is still very new, so it may take awhile to accumulate observations from different users.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Hello John and Steve,

Very interesting discussion!

I've been thinking for some time about how to automate a validation process similar to the one described above. The Validator tool and the OOS Criteria (under development) are components of the "grand design".

Perhaps we can hold a brainstorming conference to share ideas.

Trade Safe!

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

You would think that most of the winners come from strategies with the smaller Magic Numbers and the poor performers would be further down the list (since I sorted using SQN).  But that's not my experience -- winners and losers can appear anywhere within the list of strategies.  I wonder if others have also seen that or have different results.

My observation is similar, the best in backtesting is never the best in OOS or real trading. Furthermore, it is quite often that the first 10 strats in the collection never make it to the green side... The biggest problem in my experience.

But I second Sleytus and will be waiting for the results!

Re: Project To Determine Most Effective Acceptance Criteria Settings

My observation is similar, the best in backtesting is never the best in OOS or real trading.

In one of our talks with Rimantas and Justin from Autotrading Academy, they gave an idea for a tool for something like anti-optimization. In a few words: when we run the Optimizer, it searches for the best strategy according to the Acceptance Criteria and the Fitness goal. However, the guys asked to modify it so as not to choose the "best" one, but one from the middle of the optimized set. It is something like skipping the first ten strategies in the collection.
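The selection rule described here might be sketched as follows -- sorting by SQN but deliberately skipping the top of the list. The `skip` and `take` values are illustrative choices, not EA Studio settings:

```python
def pick_from_middle(collection, skip=10, take=3):
    """'Anti-optimization' selection: rank strategies by SQN but
    deliberately skip the very top of the list, taking candidates
    from just below it."""
    ranked = sorted(collection, key=lambda s: s["SQN"], reverse=True)
    return ranked[skip:skip + take]

# A hypothetical 100-strategy collection with SQNs from 9.0 down to ~3.0:
collection = [{"id": i, "SQN": 9.0 - i * 0.06} for i in range(100)]
chosen = pick_from_middle(collection)
# Strategies ranked 11th-13th are chosen instead of the top 10.
```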

Re: Project To Determine Most Effective Acceptance Criteria Settings

Popov wrote:

My observation is similar, the best in backtesting is never the best in OOS or real trading.

In one of our talks with Rimantas and Justin from Autotrading Academy, they gave an idea for a tool for something like anti-optimization. In a few words: when we run the Optimizer, it searches for the best strategy according to the Acceptance Criteria and the Fitness goal. However, the guys asked to modify it so as not to choose the "best" one, but one from the middle of the optimized set. It is something like skipping the first ten strategies in the collection.

I've studied strat distribution a bit... Yes, page number 2 is better than the first; nevertheless, to my observations it's all about ifs and buts. Pardon me, statisticians, if I use exact terms too loosely, but winning strat distributions still look more or less random.

It seems to me that skipping the first strats in a collection is not really the same as skipping the best optimization versions -- mind you, every strat in the collection is optimized to a T.

Re: Project To Determine Most Effective Acceptance Criteria Settings

So, that's interesting...

Do you have any thoughts as to why strategies with the best statistics seem to perform poorly?  Are we sorting on the wrong metric?  What else might the highest ranked strategies have in common that cause them to lose?

I wonder whether there is some threshold where if the stats are too good then it must be because the strategy is highly curve-fitted and destined to lose when exposed to new data.  I mean, it seems like good statistics are desirable -- but perhaps if they are too good then it means the strategy is over-optimized for specific data patterns and anything new thrown at it will cause it to fail.  I don't know -- but it's an interesting and unexpected observation.

11 (edited by qattack 2017-08-19 23:04:14)

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

qattack -- I apologize.  In re-reading my previous post in the light of day it came across harsher than it should have.

No offense taken; I usually don't judge the tone of Internet posts. It's very easy to type something in a manner that can be misinterpreted, and different people also have different tones and different command of the language (speaking in general terms). Your posts are always very thoughtful and I welcome them.

I'm the one who often tends to put "too much thought" into things...and I realize that. But I'm not going to change; I'm 49 years old. And I'm always curious about the math behind things (and whether it's quantifiable).

Going into this project, I did realize that it could mostly turn out uneventful, but I think I can't help but find at least a couple of helpful discoveries. And every little bit helps, as there is not really a clear-cut way to proceed at the moment.

And maybe you haven't come up with many of the original ideas on your own, but there really are not too many truly new ideas; you've pieced together a string of related ideas and re-shared them in ways that make more sense.

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

So, that's interesting...

1. Do you have any thoughts as to why strategies with the best statistics seem to perform poorly? 
2. Are we sorting on the wrong metric? 
3. What else might the highest ranked strategies have in common that cause them to lose?

My 2 cents...

1-3: I've researched sorting metrics, no exploitable patterns yet. Statistically speaking there certainly are patterns, but the sample size for it needs to be huge, thousands and tens of thousands of strats me thinks, trading span maybe at least 5 years? But I'm not satisfied with that perspective. Currently my thinking is that obsession with stats and metrics is useless, I've been researching development methodology instead (sorting metrics, generation metrics, optimization metrics, validation methods etc.). I'm re-drawn to better validation methods as well, in theory that makes the most sense. I plan to re-visit White's reality check (WRC), my previous trials were ineffective, mostly because of my little experience and too little sample size. At the same time my belief grows by day that WRC will still fail by validating strats with sharpest equity curves. So determining statistical significance of a backtest might not fly...

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

  You would think that most of the winners come from strategies with the smaller Magic Numbers and the poor performers would be further down the list (since I sorted using SQN).  But that's not my experience -- winners and losers can appear anywhere within the list of strategies.

I haven't yet put too much thought into this type of observation, but there are a number of reasons this can happen.

I believe that most reasons depend to some extent on small sample size, either directly or indirectly.

My current guess is that the primary reason this occurs is that when a strategy is put through the Optimizer, it finds the "best" values for each parameter. Those "best" values are based upon...a small sample size. Many of the values that it decides upon are actually anomalies. If you took one step away from those values in either direction, the strategy's SQN would drop by varying degrees. The more values that are optimized, the greater the effect. It's probably exponential to some degree.

The market has many random fluctuations. Our various strategies' indicators (alone and in combination) will run into varying degrees of this randomness. Look at this hypothetical (and yes, simple) range of profit vs. parameter values. Pretend you are optimizing over only this parameter.

Parameter value/Profit$
20   100
21   130
22   120
23   210
24   240
25   370
26   140
27   150
28   210
29   200
30   250
31   300
32   330
33   340
34   290
35   340
36   330
37   310
38   290
39   250
40   220

Looking through this list, which parameter value would you prefer? The Optimizer would choose (I assume) a value of 25, the highest in the list. However, the best value may be 34...not even one of the top values. What's more, if parameter value 34 had a profit of 400 and you were stepping by 5 (...landing on 0's and 5's...centered around 30, for example), the Optimizer would miss it completely.

I'm making some assumptions in the above example. I'm not 100% certain this is how the optimizer works. I also don't know enough about the math of market fluctuation and trends to know if these numbers are logical...but intuitively I think they could very easily fall like this. It comes back to small sample size...a relatively small number of possible trades affecting each Parameter point. And if one of those trades is larger than the rest, this trade would abnormally affect the average.

Again, I may be wrong about how the Optimizer works. It would be helpful if the Optimizer looked at all closely surrounding values and was affected by them to some specified degree (this degree would be affected differently by different indicators, so that's probably not feasible). Or perhaps just a simple smoothing of adjacent values.
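To illustrate the "smoothing of adjacent values" idea -- this is a sketch of the alternative suggested above, not how EA Studio's Optimizer actually works -- here is a centered moving average applied to the hypothetical table:

```python
def robust_parameter(params, profits, window=5):
    """Pick the parameter whose *neighborhood* performs best, rather
    than the single highest point, by smoothing the profit curve with
    a centered moving average. Only full windows are considered."""
    half = window // 2
    best_param, best_avg = None, float("-inf")
    for i in range(half, len(profits) - half):
        avg = sum(profits[i - half:i + half + 1]) / window
        if avg > best_avg:
            best_param, best_avg = params[i], avg
    return best_param

# The hypothetical table from the post above (parameter values 20-40):
params = list(range(20, 41))
profits = [100, 130, 120, 210, 240, 370, 140, 150, 210, 200,
           250, 300, 330, 340, 290, 340, 330, 310, 290, 250, 220]

# The raw maximum sits at parameter 25 (an isolated spike of 370),
# but the smoothed search settles on 34, inside the stable plateau.
```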

My overall point is that a number of seemingly small differences in a strategy can result in a huge difference in SQN.

SQN also takes into account the Number of Trades and Standard Deviation. It's great that it does this, but both of these stats can throw kinks into our plans.

As number of trades increases, SQN will grow disproportionately from over-optimization, greatly increasing our error.

If I'm not mistaken, Win Rate of a strategy indirectly depends upon Standard Deviation. Let's say, for example, you found a "winning" strategy with a Win Rate of 18%. Assuming the strategy performs live exactly as in tests (no, of course it doesn't, but just assume...), your short-term results with this strategy will be all over the place compared to a strategy with Win Rate of 75%.
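The link between Win Rate and per-trade variability can be checked analytically for a simple two-outcome model. The numbers below are illustrative only, chosen so both strategies share the same 0.08R expectancy per trade:

```python
import math

def per_trade_stdev(win_rate, avg_win, avg_loss):
    """Standard deviation of a single trade's R-multiple for a simple
    two-outcome model: +avg_win with probability win_rate, -avg_loss
    otherwise."""
    mean = win_rate * avg_win - (1 - win_rate) * avg_loss
    second_moment = win_rate * avg_win ** 2 + (1 - win_rate) * avg_loss ** 2
    return math.sqrt(second_moment - mean ** 2)

# Two models with an identical 0.08R expectancy per trade:
low = per_trade_stdev(0.18, 5.0, 1.0)    # 18% win rate, big winners
high = per_trade_stdev(0.75, 0.44, 1.0)  # 75% win rate, small winners
# The 18% strategy's per-trade deviation is several times larger, so
# its short-term results swing far more for the same expectancy.
```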

(As a side note, when I was using StrategyQuant to generate over SQN, I found TONS of strategies with less than 20% Win Rate. And VERY few over 42% or so. I'm lovin' the higher win rates found by EA Studio.)

Be very careful, though, of "observational bias." This is a primary reason I've made most of my living for the last 20 years playing poker. Bad players constantly make horrible plays and end up winning. They can get very lucky in the short-term and take down a few pots--even make a lot of money quickly--but they are guaranteed to repeat those same mistakes and lose it all back and then some.

I played for years with a lady who had won a $60,000 holdem pot in Vegas back in the sixties with a "63 suited" (both cards spades). It was her "lucky hand." Every time she was dealt that hand, she raised and played it come Hell or high water. I even saw her make a straight flush with it. But I guarantee she was a lifetime loser with that hand.

That's an extreme example, yet other poker players make similarly bad plays (though not as glaringly obvious) based upon "observational bias"--relying on a relatively small number of observations to formulate strategy in a game with millions of possibilities.

The same applies to trading. There is one major difference in this respect between trading and poker. In poker, you can come up with a pretty mathematical estimation of any action (given enough time). But with trading, there are so many factors and so much noise, that estimations become much less accurate. But if anything, that only increases the need for mathematical analysis, because observational bias will be inherently greater.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Popov wrote:

In one of our talks with Rimantas and Justin from Autotrading Academy they gave an idea for a tool for something like anti-optimization. With two words, when we run the Optimizer, it searches for the best strategy according to the Acceptance Criteria and the Fitness goal. However the guys asked to modify it in order not to choose the "best" one, but one from the middle of the optimized set. It is something as to skip the first ten strategies from the collection.

I think this is similar to what I was thinking.

I don't think there is a need to skip any strategies from the beginning of the collection. As I tried to detail above, the "top" strategies of a Collection are skewed exponentially by small changes in data points. The Optimizer amplifies these errors, but they exist prior to running it through the Optimizer.

The type of Optimizer proposed here will actually DECREASE the SQN of many (most? all?) of the top strategies in the Collection. (And that is a great thing!) In fact, it might decrease the SQNs of nearly all strategies contained in the Collection (depending on how long it's been generating).

I do like the term "anti-optimization."

This would be an extremely valuable tool.

Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

My 2 cents...

1-3: I've researched sorting metrics, no exploitable patterns yet. Statistically speaking there certainly are patterns, but the sample size for it needs to be huge, thousands and tens of thousands of strats me thinks, trading span maybe at least 5 years? But I'm not satisfied with that perspective. Currently my thinking is that obsession with stats and metrics is useless, I've been researching development methodology instead (sorting metrics, generation metrics, optimization metrics, validation methods etc.). I'm re-drawn to better validation methods as well, in theory that makes the most sense. I plan to re-visit White's reality check (WRC), my previous trials were ineffective, mostly because of my little experience and too little sample size. At the same time my belief grows by day that WRC will still fail by validating strats with sharpest equity curves. So determining statistical significance of a backtest might not fly...

Yes, small sample size. There is likely never a big enough sample size when it comes to trading. That's why everyone on the Internet can pose as a trading guru. smile

What I hope to gain from my Acceptance Criteria analysis is probably much less than most people are thinking. I will be happy to glean any useful information I can from it. I am attempting to define the greatest Acceptance Criteria values I can use without starting to over-filter. My guess is that I will end up with very low values set for most criteria.

And I do agree with you that Generation Metrics, Optimization Metrics, and Validation Methods are much more important than Acceptance Criteria. I'm just tackling the easiest one first. smile

Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

...Currently my thinking is that obsession with stats and metrics is useless...

I like it.  And I think you've encapsulated in that one sentence what my experience has been the past few months trading *live* accounts.  Sorry for the long post -- but I'd like to elaborate a bit more...

I think footon is making a point by sort of exaggerating (I'm assuming, pardon me if I'm wrong) -- it's not that stats and metrics are a waste of time, it's just that beyond a certain point there is little additional benefit.  And I have a theory as to why this is so -- at least as applied to the Portfolio Expert created by EA Studio.

Suppose I create a Portfolio Expert that is a collection of 100 strategies, and let's suppose I sort using SQN.  And suppose after letting it run several days the SQNs range from a high of '9' down to '3'.  And let's suppose I add that entire collection to a live, micro account.  I'll bet 25 cents that approximately 50% will trade in the green zone (as footon likes to call it) and 50% will trade in the red zone.  (And I'll leave it to qattack to perform the actual experiment in his live account to determine if I am right or wrong).  If you think about it, 50 strategies trading in the green zone is AMAZING.  Can you imagine creating 50 profitable strategies without using Popov's software -- it would take forever.  If I have 50 winners I really don't care about the 50 losers -- since I'll just prune them.  Furthermore, I'll bet another 25 cents that those 50 winners are distributed throughout the collection.  I don't (yet) buy into the notion that the 10 best strategies really trade the worst.  I do have some strategies that come from the top of the list.  If anyone plans to throw away their top 10 strategies I will gladly accept them.

What I think is going on is this -- any strategy with a SQN of 3 or greater is probably fine and has the same chance of succeeding.  That is, if I take 100 strategies with a SQN of 9 and 100 strategies with a SQN of 3 and place all 200 in a live account, then in the end approximately the same number of each will trade in the green zone (again, another experiment for qattack).  The reason for this is beyond our control -- unless you have a crystal ball or time machine.  It's due to the nature of forex data patterns and how our indicators determine what is a signal and what isn't a signal.  A strategy with a SQN of 3 or 4 is a fine strategy -- good enough to be a winner during certain Data Horizons.  The same can be said of a strategy with a SQN of 9.  But I doubt a strategy with a SQN of 9 will really trade 3x better than a strategy with a SQN of 3.

To summarize:
1. No such thing as a holy grail EA -- no matter how good the stats, any strategy has a probability of winning or losing.  And that probability fluctuates according to the Data Horizon.  This is why within a collection of 100 strategies there will always be some winners and losers.
2. Once the stats exceed some threshold -- e.g. a SQN of '3' or '4' -- they are probably good enough to be potential winners (or losers).  This is the main point -- trading success is *not* linearly proportional to the stats.  Probably more like exponential decay -- that is, trading success probably does increase as stats improve, but then reaches a point where things level off.
3. Strategies whose statistics exceed some threshold (e.g. SQN of '3' or '4') may not really trade much better.
4. The good news is there really isn't much more we need to do -- since this may all be due to the nature of forex data patterns and how indicators are triggered.  Yes -- it's a shame to have 50 losers in the 100-member collection.  But don't lose track of the 50 winners.


As a side note -- I'm working on an indicator that is called "Adaptive Pruning".  Since it is an indicator then it is constantly running and monitors all trading transactions.  The settings for this indicator allow me to specify that I don't want to trade any Magic Number whose Win Ratio is, say, less than 0.75.  The indicator manages a simple file that is a list of Magic Numbers that don't meet that requirement.  I then add a few lines of code to Popov's *.mq4 file that checks this list before opening a new trade, recompile and launch Popov's Portfolio Expert EA.  This is sort of a variation of MT4 Tracker, but it performs simple pruning automatically without any additional intervention.  And it has the benefit of ignoring the poor performers sooner -- especially if I use MT4 Tracker only once every couple of weeks.  I'm still testing it and will have more to say in about a month.
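The actual Adaptive Pruning indicator described here is written for MT4/MQL4; purely to illustrate the bookkeeping involved, here is a Python sketch. All names and thresholds are assumptions, not the indicator's real interface:

```python
def update_blocklist(trade_history, min_win_ratio=0.75, min_trades=5):
    """Rebuild the list of Magic Numbers that should no longer trade.

    trade_history: list of (magic_number, profit) tuples for closed
    trades. A Magic Number is blocked once it has at least `min_trades`
    closed trades and its win ratio falls below `min_win_ratio`.
    """
    wins, totals = {}, {}
    for magic, profit in trade_history:
        totals[magic] = totals.get(magic, 0) + 1
        if profit > 0:
            wins[magic] = wins.get(magic, 0) + 1
    return sorted(
        magic for magic, n in totals.items()
        if n >= min_trades and wins.get(magic, 0) / n < min_win_ratio
    )

def may_open_trade(magic, blocklist):
    """The check an EA would perform before opening a new position."""
    return magic not in blocklist

# Hypothetical closed-trade history for two Magic Numbers:
history = [(100000, 1.0)] * 4 + [(100000, -1.0)] \
        + [(100001, 1.0)] * 2 + [(100001, -1.0)] * 3
blocked = update_blocklist(history)  # 100001 wins only 40% -> blocked
```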

Re: Project To Determine Most Effective Acceptance Criteria Settings

qattack -- just to be clear, you understand in the previous post when I mentioned leaving the live testing to you, I was joking.  It is no fun to lose money and I am not prepared to perform the live tests myself.  I was just making a point and attempting to inject a little bit of humor that I hoped made it clear I was just blowing smoke and *not* ready to risk my own money on my own theories.

18 (edited by qattack 2017-08-20 08:50:44)

Re: Project To Determine Most Effective Acceptance Criteria Settings

Steve, I don't want to lose 50 cents, so I won't take your bets. I think you may very well be close to being correct with your first bet. And 50% would certainly be amazing. (However, this would take tens of thousands of EAs to "prove".) But the pruning strategy could present a problem. What to take out, what to leave in, and when?

Of course, the pruning strategy is essential no matter what method you are using--and I agree that it is probably by far the most important element. So we are in agreement there.

I believe, though, that adding an OOS analysis can probably give you a major head start on that job of pruning, without sacrificing the quality of generated EAs. If this is the case, then the extra work will be well worth it. After all, you can only run a finite number of EAs. Even if the group is improved a small amount, it's a big win.

Actually, I'd take your second bet; you make an argument against it in point #2. All of your points are completely true, though. But again, I'd argue against point #4, as there are further things we can do to improve our results. No, there is no Holy Grail of Trading, but there are plenty of bad strategies that we can toss out.

Re: Project To Determine Most Effective Acceptance Criteria Settings

It's early to report any findings yet, and this is jumping the gun, but I am running eight instances against the "Maximum Balance Deviation" Acceptance Criterion. The range runs from 6 to 13 in steps of 1.

Strategies that "Passed Validation" (in the Generator section) decline as expected with smaller MBD values. However, what is surprising is that at values of 9 and lower, the "Passed Validation" rate (in Monte Carlo) DECLINES. This could just be due to small sample size, but the four larger values are in line with the baseline (with only minimum Acceptance Criteria). The four smaller values are at 7.49% to 10.93%, compared to 12.73% for the baseline. I expected all MC validation rates to rise.

This doesn't make sense to me right now; I'm a bit too tired to think about it. Maybe a "too smooth" Equity Curve is a good flag of an over-optimized strategy? I'm going to leave this generation running overnight to get a more accurate picture.
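For what it's worth, here is a rough sketch of the kind of smoothness metric involved. EA Studio's exact "Maximum Balance Deviation" formula isn't documented in this thread, so the definition below (maximum distance of the balance curve from the straight line joining its endpoints) is only an assumption for illustration:

```python
def max_balance_deviation(balance):
    """Assumed definition (illustration only): the largest absolute
    distance between the balance curve and the straight line joining
    its first and last points."""
    n = len(balance)
    start, end = balance[0], balance[-1]
    worst = 0.0
    for i, b in enumerate(balance):
        line = start + (end - start) * i / (n - 1)   # point on the straight line
        worst = max(worst, abs(b - line))
    return worst

smooth = [10000 + 10 * i for i in range(100)]                             # perfectly linear
jagged = [10000 + 10 * i + (150 if i % 7 == 0 else 0) for i in range(100)]
print(max_balance_deviation(smooth))   # 0.0 -- a "too smooth" curve
print(max_balance_deviation(jagged))   # noticeably larger
```

If the real criterion works anything like this, a lower cutoff keeps only the smoothest curves, which may be exactly the kind of regularity that over-optimization produces.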

20 (edited by qattack 2017-08-20 09:54:19)

Re: Project To Determine Most Effective Acceptance Criteria Settings

EDIT to the above post:

I can currently think of two additional reasons why this may be occurring:

1. A disproportionate number of strategies with higher-than-average Profit may be filtered out. This would leave the remaining strategies (on average) more vulnerable to MC tests.

2. Maybe the remaining strategies have a lower average Trade Count and perhaps strategies with lower Trade Count (on average) are more vulnerable to MC Tests.
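Both hypotheses are easy to check once the Generator's results are exported. Here is a toy sketch of hypothesis #2, with invented records (the field names and numbers are placeholders, not EA Studio output):

```python
# Toy diagnostic: does Monte Carlo pass rate differ between low- and
# high-trade-count strategies? All records below are invented examples.
strategies = [
    {"trades": 200, "profit": 1500.0, "mc_passed": True},
    {"trades": 120, "profit":  900.0, "mc_passed": True},
    {"trades":  60, "profit":  250.0, "mc_passed": False},
    {"trades":  45, "profit":  300.0, "mc_passed": False},
]

def pass_rate(group):
    # Fraction of the group that passed Monte Carlo validation.
    return sum(s["mc_passed"] for s in group) / len(group) if group else 0.0

median_trades = sorted(s["trades"] for s in strategies)[len(strategies) // 2]
low  = [s for s in strategies if s["trades"] <  median_trades]
high = [s for s in strategies if s["trades"] >= median_trades]
print(f"MC pass rate, low trade count:  {pass_rate(low):.0%}")
print(f"MC pass rate, high trade count: {pass_rate(high):.0%}")
```

The same split by Profit instead of trade count would test hypothesis #1.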

Re: Project To Determine Most Effective Acceptance Criteria Settings

sleytus wrote:

I think footon is making a point by sort of exaggerating (I'm assuming, pardon me if I'm wrong) -- it's not that stats and metrics are a waste of time, it's just that beyond a certain point there is little additional benefit.

Yes, that is the point I was trying to convey. We are dealing with so many strats that it is impossible to approach a collection with the same mindset and methods we use on individual strats. With the current level of automation, let's let the machine do the work; it is only about having the "right" way of letting it do its thing, but first we have to find that way.

sleytus wrote:

I'll bet 25 cents that approximately 50% will trade in the green zone (as footon likes to call it) and 50% will trade in the red zone.  /---/  If you think about it, 50 strategies trading in the green zone is AMAZING.

I've sampled things quite a bit, and I've achieved 80% greens at best; the worst result was around 30%. I can't give you any reason for the difference other than the data itself. It may well be that we have to choose carefully the data horizon window upon which generation and optimization take place. I've mainly been using fixed ratios - 75% train, 25% test, and so on - but apparently this is not the right way to do it.
My line of thinking: there is a method to produce a collection of strats which, when combined, have a positive expectancy. I define this expectancy as having at least 65% of strats in the green zone. The problem is that I haven't been able to be consistent with the methods I've tested; a drop of 50% in profitability is frankly a disaster :)

Another observation - Sleytus, you are assuming that those strats which survive retain their edge, correct? My modelling shows that this is not the case: there are still too many bad apples which go bad after the initial pruning, reducing the overall profitability to breakeven minus trading costs in the best-case scenario...

---

I woke up this morning and realized that the Optimizer has an option - "Add the optimized strategies to the Collection".
It means that every optimization cycle that beats the previous one gets pushed into the Collection. Therefore I can get a glimpse of this "anti-optimization" idea by testing a handful of strats (an equal number of good and bad strats) and evaluating their effectiveness.

Lastly, John, forgive me for going off-topic; if you want, I can move my ramblings to a different thread.

Re: Project To Determine Most Effective Acceptance Criteria Settings

Hi,
I played around a bit with optimization criteria on a few strategies. With this experiment I wanted to answer a question similar to your thread's: what is the best optimization criterion? I was doing walk-forward analysis, which means I optimized on some data, then looked at a period of unseen data, and repeated that 3-5 times. I can't remember exactly which criteria I tried, but the winner was SQN :) It produced the best results on the unseen data chunks for the few strategies I tested, so I concluded that I will use it every time. Of course, the sample size was small, and so on. I am very interested in your experiment's results. I also believe that for strategy generation there is probably a way to set a few of these criteria to get better strategies. So keep researching! :)
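For readers unfamiliar with the procedure described here, the walk-forward loop can be sketched like this (a minimal illustration; the fold count and 75/25 split are arbitrary choices, not EA Studio settings):

```python
def walk_forward(returns, n_folds=4, train_frac=0.75):
    """Split a series of per-period returns into rolling train/test windows.
    Yields (train, test) slices; a real run would optimize on `train`
    (e.g. maximizing SQN) and then evaluate on the unseen `test` slice."""
    fold = len(returns) // n_folds
    for k in range(n_folds - 1):
        window = returns[k * fold : (k + 2) * fold]   # rolling two-fold window
        split = int(len(window) * train_frac)
        yield window[:split], window[split:]

data = list(range(40))  # stand-in for 40 periods of returns
for train, test in walk_forward(data):
    print(len(train), len(test))
```

A criterion that wins across most of the unseen test slices, rather than on one lucky split, is what the "repeated it 3-5 times" step is buying.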

I also made a small experiment with different optimization methods. I could not trust the FSB optimizer at first, so I researched other methods - what Google offers and what some books advocate. However, across 8 trading strategies (seen/unseen), I found that the FSB auto optimizer is one of the top performers, so now I am fairly confident in it and have stopped worrying too much. The same caveat about small sample size applies here, but the experiment was so exhausting to do by hand that I dropped it. Especially time-consuming was number 5 on the list, which required drawing variable-stability graphs in Excel and taking average values only from the stable region... very glad that this one was not the winner :)


https://s30.postimg.org/qs5mvokcd/Screenshot_1.png

Re: Project To Determine Most Effective Acceptance Criteria Settings

footon wrote:

sleytus wrote:

I'll bet 25 cents that approximately 50% will trade in the green zone (as footon likes to call it) and 50% will trade in the red zone.  /---/  If you think about it, 50 strategies trading in the green zone is AMAZING.

I've sampled things quite a bit, and I've achieved 80% greens at best; the worst result was around 30%. I can't give you any reason for the difference other than the data itself. It may well be that we have to choose carefully the data horizon window upon which generation and optimization take place. I've mainly been using fixed ratios - 75% train, 25% test, and so on - but apparently this is not the right way to do it.
My line of thinking: there is a method to produce a collection of strats which, when combined, have a positive expectancy. I define this expectancy as having at least 65% of strats in the green zone. The problem is that I haven't been able to be consistent with the methods I've tested; a drop of 50% in profitability is frankly a disaster :)

Small sample size... how large is "large enough"? I fear it's in the tens of thousands of strategies. For comparison, run a Monte Carlo test with 20 runs and examine the Confidence Table, taking note of every Confidence Level. Then repeat the exercise with 100 runs and 1,000 runs and compare the results. You will find that they vary wildly.

The various strategies we generate are MUCH more volatile relative to one another than this. That, combined with not having the luxury of 1,000 test runs, will cause each group of 100 EAs to have drastically different performance levels. Only by finding the "best" selection methods can we reduce this problem significantly (as detailed below).
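The 20-vs-1,000-run effect is easy to reproduce. Here is a self-contained sketch with invented trade numbers, showing how much a low-percentile Monte Carlo estimate wobbles at different run counts (this is a generic bootstrap, not EA Studio's exact Monte Carlo procedure):

```python
import random
import statistics

random.seed(1)
# Hypothetical per-trade profits/losses for one strategy (invented numbers).
trades = [random.gauss(5, 50) for _ in range(100)]

def mc_low_estimate(trades, n_runs):
    """Monte Carlo estimate of a 'worst plausible' outcome: resample the
    trades with replacement n_runs times and take the 5th-percentile total."""
    totals = sorted(sum(random.choices(trades, k=len(trades))) for _ in range(n_runs))
    return totals[int(0.05 * n_runs)]

# Repeat each estimate 50 times to see how much it wobbles between repeats.
spread_20   = statistics.stdev(mc_low_estimate(trades, 20)   for _ in range(50))
spread_1000 = statistics.stdev(mc_low_estimate(trades, 1000) for _ in range(50))
print(spread_20, spread_1000)  # the 20-run estimate is far less stable
```

The same instability applies to judging a group of 100 live EAs: it is one small sample from a very noisy population.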

footon wrote:

Another observation - Sleytus, you are assuming that those strats which survive retain their edge, correct? My modelling shows that this is not the case: there are still too many bad apples which go bad after the initial pruning, reducing the overall profitability to breakeven minus trading costs in the best-case scenario...

This is exactly what I'm trying to say in many other posts... it just takes me 490 words to say it instead of 49. How long is "long enough" before you decree that these strategies are "profitable"? The "profitability test" runs over the entire lifetime of the strategies' deployment. A group of 100 EAs may perform impressively during the first week, with 60% winning strategies. In fact, you may be able to retain that 60% figure through pruning for a month or longer, yet over the lifetime of that group it may still very well be an overall loser.

footon wrote:

I woke up this morning and realized that the Optimizer has an option - "Add the optimized strategies to the Collection".
It means that every optimization cycle that beats the previous one gets pushed into the Collection. Therefore I can get a glimpse of this "anti-optimization" idea by testing a handful of strats (an equal number of good and bad strats) and evaluating their effectiveness.

I cannot locate this option; I even opened a fresh browser window to search for it.

What would the effect of this option be? Am I correct in saying you wouldn't want to use it during normal generation, since the Collection would quickly fill up with the same strategy Optimized in different ways? So it's just for observing how the Optimizer works?

footon wrote:

Lastly, John, forgive me for going off-topic, if you want I can move my ramblings to a different thread.

Your comments relate well enough to the subject at hand. It's more that the discussion no longer relates to the Topic title. Maybe we can have Popov change the title to something that better reflects our mission.

ThanX!
John

Re: Project To Determine Most Effective Acceptance Criteria Settings

Irmantas wrote:

I was doing walk-forward analysis...

I very much wish there were an automated WFA. At the very least, this would help with many facets of testing.

Irmantas wrote:

...but the winner was SQN :) It produced the best results on the unseen data chunks for the few strategies I tested, so I concluded that I will use it every time.

I'm a convert from StrategyQuant, which I had used nonstop from May until I looked into FSB/EA Studio. After quite a bit of research and thinking about generation methods, I settled on using SQN "every time" almost immediately. While it has some flaws, it is by far the best generation metric available. I honestly believe we could improve upon this metric for our purposes; I haven't put any thought into that yet, but it's an interesting idea.
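For reference, the usual Van Tharp definition of SQN is sqrt(N) times the mean per-trade profit divided by its standard deviation (whether EA Studio implements exactly this variant, I can't confirm):

```python
import statistics
from math import sqrt

def sqn(trade_profits):
    """System Quality Number (Van Tharp): sqrt(N) * mean / stdev of
    per-trade profits. It rewards many trades, high average profit,
    and low variability -- hence the smoother equity curves."""
    n = len(trade_profits)
    return sqrt(n) * statistics.mean(trade_profits) / statistics.stdev(trade_profits)

# Two invented trade sequences with the same total profit (84):
steady  = [10, 12, 9, 11, 10, 12, 9, 11]
erratic = [80, -60, 70, -50, 60, -40, 50, -26]
print(sqn(steady))   # high: consistent trades
print(sqn(erratic))  # low: same profit, wild swings
```

This is also why SQN-based generation tends to suppress drawdown as a side effect: large per-trade variability is penalized directly.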

Irmantas wrote:

I also made a small experiment with different optimization methods (and) ...researched other methods

I think that Optimization has been done the same traditional way for some time now, and nobody's really thought about how it can be improved. But it's very obvious that it can be!

Re: Project To Determine Most Effective Acceptance Criteria Settings

John, I'm attaching the screenshot of Optimizer settings.

This addition mode only works manually: open a strat in the editor, move on to the Optimizer, and hit Start; it will push each better-than-previous strat into the Collection. Essentially, you get numerous variants of one strat at a time (if the addition box is checked).

Post's attachments

bal.JPG, 58.91 kb
