The reason, I believe, is that by looking at the IS and OOS curves I eventually end up optimising the OOS data by exclusion - I end up with exactly the same result as if I had used all the data for optimisation, without the benefit of having double the data available, which might help by including many more different trading conditions and reducing the chances of curve fitting. I believe the longest data horizon available (200,000) in Studio is the best guard against curve fitting (along with having only a couple of entry and exit criteria). It also simplifies the generation process, as the computer can do what I have been spending my time doing. I know that most of the curves will be nice and linear, and I know that most (80%) of tested strategies will fail testing, but they do already! I will be no better off than before, but with less work.

Have I got it wrong as regards OOS data? I have used OOS data to visually check strategies for a few years, and have not taken the time for walk-forward testing as such, which might have more merit. But I do believe the OOS data is not really OOS once you have manually filtered out, say, 10,000 strategies - it is not really out of sample any more; it is only OOS for the first 100 runs, for instance. Please suggest whether I am misguided in this assumption - I will then carry on with my current method for strategy generation.

But some have been selected with other criteria, and I try to compare their performances. The period is too short to give a judgement, but I do not have the impression that they get worse over time, as is often predicted or reported here.

To me they just seem to have good and bad periods, each one with its individual variation. So I am still creating individual EAs on different symbols and time frames, expecting to have more winners than losers every week, which has worked so far, but I would not be surprised if it suddenly changed.

I also had two portfolio experts on different symbols with 9+5 EAs running, but although they were the best of a collection, after approximately 3 months only 2 EAs were left; all the others performed too badly from the beginning.

Sorry for getting too far away from the AC-setting topic.

Hi, I'd like to contribute my small experience:

I always use the maximum OOS range and simple Acceptance Criteria (SQN min 2, max drawdown 25-30%) without optimizing.
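A filter like this can be written down directly. A minimal sketch, assuming SQN here means Van Tharp's System Quality Number (sqrt(N) x mean / standard deviation of trade results) and that we have each strategy's trade list and equity curve; the function names and thresholds are illustrative, not EA Studio's actual API:

```python
import math

def sqn(trade_profits):
    """Van Tharp's System Quality Number: sqrt(N) * mean / st.dev of trade results."""
    n = len(trade_profits)
    mean = sum(trade_profits) / n
    var = sum((p - mean) ** 2 for p in trade_profits) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)

def max_drawdown_pct(equity_curve):
    """Largest peak-to-trough decline, as a percentage of the peak."""
    peak, worst = equity_curve[0], 0.0
    for x in equity_curve:
        peak = max(peak, x)
        worst = max(worst, (peak - x) / peak * 100)
    return worst

def passes_acceptance(trades, equity, min_sqn=2.0, max_dd=30.0):
    """True only if the strategy clears both acceptance thresholds."""
    return sqn(trades) >= min_sqn and max_drawdown_pct(equity) <= max_dd
```

A strategy then reaches the manual chart review only if both numbers clear the stated thresholds; everything else is discarded automatically.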

I look through the results (max 100) "manually", checking the overall performance chart.

As I am more of an "optical" type of person, I keep only those strategies that do not have too-rough variations in performance and, most important of all, also show at least acceptable performance in the range of the previously unknown (newest) data.

This way I look at it as if I had been moved into the past (the beginning of the OOS data) to create the strategies, and when checking the results I am looking into the future (until today).

Doing so, I often see cases where good results in the initial range are completely weak over the overall range, while strategies not in the top field give quite nice results.

I never had the impression that there is a system to it; it seems to me that it just depends on the variation of the input data, and the best results (in my opinion, those with small baseline deviation) are certainly not the initially top-rated ones but are distributed over the 100 results.

The condensed 3-5 leftover strategies I then use on demo and live accounts.

Hey Bru1

What has your experience in live trading been? How long do the strategies work well before they change?

Bru1 wrote:

I always use the maximum OOS range and simple Acceptance Criteria (SQN min 2, max drawdown 25-30%) without optimizing.

I look through the results (max 100) "manually", checking the overall performance chart.

As I am more of an "optical" type of person, I keep only those strategies that do not have too-rough variations in performance and, most important of all, also show at least acceptable performance in the range of the previously unknown (newest) data.

This way I look at it as if I had been moved into the past (the beginning of the OOS data) to create the strategies, and when checking the results I am looking into the future (until today).

Doing so, I often see cases where good results in the initial range are completely weak over the overall range, while strategies not in the top field give quite nice results.

I never had the impression that there is a system to it; it seems to me that it just depends on the variation of the input data, and the best results (in my opinion, those with small baseline deviation) are certainly not the initially top-rated ones but are distributed over the 100 results.

The condensed 3-5 leftover strategies I then use on demo and live accounts.

...Let me rephrase your question for you a bit...

Thanks for answering my questions so completely. You've given me a lot of food for thought.

hannahis wrote:

There are generally two types of users

I think you've captured the big picture well. For now I would place myself in the first group (statistics) -- but that is only because I haven't been doing this long enough to have accumulated much in the way of sound theory.

Popov's software has enabled me to trade successfully at an early stage. This makes forex trading very interesting to me and also motivates me to learn more -- and I've learned more from this forum than any book or Google. Over time I hope to be able to better merge the two.

Thanks to both of you for great advice and information...

hannahis wrote:

Often FSB's backtesting result may indicate that my EA is unprofitable, but I still go ahead because I trust my theory; I put it on live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading. I have lots of experience to testify to it. So it's not just an opinion, it's experience.

I thought of what I wanted to ask about this.

After you place your EAs on a demo server for a period of three months, and you subsequently go back and backtest those same EAs over only the period they were on the demo server (data that obviously was not in-sample during your development), do those backtests show results that differ significantly from how they performed in your demo account? Enough to turn a slightly losing-looking strategy into a strategy that you would be confident trading (for a moderate amount of real money)?

Furthermore, if this is the case, can you think of any way to capture these strategies and push them to the Collection without adding too many other strategies that would end up not being valid? Because if so, it makes the situation even worse when trying to filter: our choice is to allow more invalid strategies through or fewer valid strategies, both of which are bad for the proportion of good strategies in our Collection.

It is not so much the overall number of good strategies in the Collection that matters as the PROPORTION of good strategies to bad.

qattack wrote:

I understand that a successful backtest is not indicative of a successful trading strategy.

What IS true is that an UNsuccessful backtest IS indicative of an UNsuccessful trading strategy.

This is NOT true.

Often FSB's backtesting result may indicate that my EA is unprofitable, but I still go ahead because I trust my theory; I put it on live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading. I have lots of experience to testify to it. So it's not just an opinion, it's experience.

I'm skeptical of this statement. Don't get me wrong - your simple statement almost immediately made me change my mind about my own. So what I'm saying is that I'm skeptical of BOTH statements, but much more so of mine than yours now. Does that make sense?

This is one statement that I made as "fact" that I intuitively believed, but also realized may not be 100% correct. So now, instead of being 95% sure, I am 20% sure. As I said before, if I put a qualification on all my opinions, my posts would be twice as long.

I'll probe you for more information on this later, because it's one of my core assumptions.

hannahis wrote:

There are generally two types of users

1. Use statistical results/acceptance criteria to search for successful EAs.

2. Use sound trading theory to build successful EAs that will yield the kind of statistical results you are looking for.

The outcome...

1. A statistical search will yield billions of (or unlimited) outcomes, and the processes for filtering them are varied and do not necessarily yield any good results at the end of the day.

2. Using sound theory to build, I can get 10 to 50 profitable EAs in one round (a couple of hours to complete one round).

So to me, using theory to build EAs is not like a shot in the dark, and it presents much less uncertainty than a statistical search.

Nevertheless, some of us are more inclined to one than the other, depending on our educational training and perspective. If I were a mathematician, I would take the statistical search as my first preference. Unfortunately (and fortunately), statistics isn't my forte, so I choose the latter.

I COMPLETELY agree with everything you said except for the last statement. Though I am tackling EA Studio's possibilities first, it is not my first preference. I only bought EA Studio for the possibility that we may find an effective workflow that will allow the premise to work.

As a "mathematician", I completely realize that FSB is the more powerful of the two tools. In fact, I'm 100% confident that I can (and will) develop a proven trading plan, exactly as you have, through the use of logic.

However, it is also logical that I first make basic experiments with EA Studio, as if it DOES work (...with only basic ideas), then it will be MUCH easier to generate an "infinite" number of profitable EAs in this way. Even though the probability of success is much lower, there is a possibility of greater returns (in efficiency and potential). It's like working for a living versus buying Powerball tickets hoping to hit the 700-million-dollar jackpot. (Though I've never bought a Powerball ticket, and I'm not comparing the odds of winning the Powerball to the chance that our EA Studio methods will work!)

Does that make sense?

sleytus wrote:

1. You mention ending up with more "valid" strategies in the final collection. I don't know what the definition of "valid" means.

"Valid" simply means strategies that "WILL" (more accurate is "are highly likely to") earn profit in the future based upon the fact that their underlying indicator combinations are sound.

So if my fantastic EA above, with the entry rule "enter the market if it rains", continues making soaring profits for the next five years, it is still NOT a valid strategy (well, it actually may be, but that's another story...). So, no matter how much we filter and prune, we are still going to get "lucky" with some strategies that make money in the future, but not based on the merits of their indicators.

sleytus wrote:

2. You brought up curve-fitting -- which I hadn't seriously considered before. When comparing two strategies with similar statistics it could be that one is more curve-fitted than the other and then, yes, it would seem desirable to use the less curve-fitted one. But how does one test for curve-fittedness? Applying Monte Carlo tests?

And that is the question we need to answer. Monte Carlo testing is certainly one way to test for curve-fittedness, but knowing exactly how is a different matter entirely. And as I've mentioned elsewhere, it may be that each different combination of indicators needs its own MC test parameters (which would entirely defeat the purpose of such a test).

OOS testing is currently the major way to test for curve-fittedness. That's because a curve-fitted strategy will only perform well in the data over which it is generated (not entirely true, as in our above example of the rain strategy).

One reason, among others, that a strategy becomes curve-fitted is that the software locates the very few periods where the largest moves in the market occur and tailors a strategy almost exclusively around those moves. So if you are testing over a two-year period and there are only seven of these moves, there are basically only seven trades that matter, even if your trade count is 200. That's very oversimplified, but you get the idea.
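That concentration is easy to measure on any backtest report. A hypothetical sketch - the trade numbers below are invented purely to illustrate the seven-moves effect:

```python
def profit_concentration(trade_profits, k=7):
    """Fraction of total net profit contributed by the k largest trades."""
    total = sum(trade_profits)
    top = sum(sorted(trade_profits, reverse=True)[:k])
    return top / total

# Invented example: 200 trades, but 7 big moves carry most of the profit.
trades = [1.0] * 193 + [50.0] * 7
print(round(profit_concentration(trades), 2))  # 0.64 - seven trades, ~2/3 of the profit
```

A strategy whose profit is almost entirely carried by a handful of trades is a candidate for the curve-fit bin, even when its headline statistics look great.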

There must be other innovative ways that we can filter for valid strategies. I'm 100% certain of that. But I don't know that we have the tools to do so. I think they will mostly employ OOS in some fashion. I also think we have the capability as a group to develop those tools, but know this: it won't be easy. It would be an entirely revolutionary concept.

sleytus wrote:

3. Another comparison question -- there is no right or wrong answer. One strategy has been back tested for 10 years and another for 1 year. And we can assume the one back-tested for 10 years is less curve-fitted. Which one would you choose? A little more information -- the more input data (i.e. longer the back-testing) then the less curve-fitted the strategy will be. But that also means the poorer will be the statistics. I brought this up in a different thread -- when fitting a polynomial equation to data points, the more data points used then the poorer will be the fit. So, again, which would you choose -- a less curve-fitted strategy that trades okay for 1 year or a more curve-fitted strategy that trades more profitably -- but then stumbles after 6 months?

You have a fundamental flaw in your thinking about this entire subject. That sounds harsh - I'm not trying to be mean, just trying to help you realize how you are thinking about it wrongly, and that you keep missing an important part of the logic of the generation process.

If you offered me these two strategies and guaranteed that each would perform as stated, then...I don't care. In fact, I wouldn't bother with either of them.

Let me rephrase your question for you a bit. I know it's picky, but it pertains to the flaw in your thinking.

Instead ask the question, "Assume two methods of generation. The first backtests for 10 years and produces strategies that are less curve-fitted that perform OK for one year; the second backtests for 2 years and produces strategies that are more curve-fitted, stumble after six months, but perform significantly better."

Asked in this way, I'd bet most people would jump at option #2. I would probably also be inclined to pick option #2, but not so fast. We don't have quite enough information. What if the Standard Deviation (and thus the risk) of strategies from option #2 ends up being extremely high and our account is in jeopardy? What if the second method only produces 10% of the number of strategies as the first?

OK, I didn't do a very good job of coming up with additional information needed (but there is a reason for that).

"Wait," you say, "method #2 is in reality always going to generate MORE strategies than method #1; MANY more, in fact."

Aha! That is very true! And this is also a huge problem. Method #2 is going to generate SO many more strategies that a MUCH larger proportion of them will be "invalid" and curve-fitted to such a degree that they are useless. So now we come back to our problem: how do we prune enough of these poor strategies so that they don't wipe out the profit we obtain with our valid strategies?

You might say there is a way to do it. And I really believe that there MUST be. The only problem is that we don't know HOW to do it, and it will *literally* take us the rest of our lives (or longer) to arrive at that way by trial and error. The sample size compared to the Variance is just way too small.

Going back to what I know about...poker. Namely, Texas Holdem, by far the most popular variant. You wouldn't believe how sophisticated the strategy is among the top players today. (!!!) No, I'm not talking about all those chumps you see in tournaments on TV that they make into movie stars (although a select few players really DO know what they're doing!). No, I'm talking about a multitude of players that honed their games on the Internet, often playing six to twelve tables simultaneously (amounting to easily twenty times as many hands per hour as live play).

Holdem poker strategy, between its introduction to Vegas in 1967 and the present day, has been increasing exponentially in complexity and effectiveness. It's similar to technology. Until the mid-2000s, all instruction was accomplished with books. Those strategies were very stagnant and did not keep up with the quickly developing play after 2003; Internet video learning sites became the gold standard. Pros were able to get their new ideas and discoveries to the public very quickly, and information exchange between players increased a million-fold (not exaggerating).

Each successful strategy was built upon the combination of all previously successful strategies. Early strategies were based upon only one or a handful of players' experiences.

You wouldn't believe how sophisticated today's top players are. Even players at my level routinely use "gaming theory" to calculate the best actions to take in particular situations. This is done with hundreds (or thousands) of hands away from the table, and with the aid of software, as the calculations take far too long to accomplish at the table. Then, when a situation arises at the table, we draw from our memory of situations we've analyzed and try to relate it as closely as we can to some of these situations to take appropriate action.

A program that can beat good poker players in a "ring" game (that is, a cash game with multiple players) is still a LONG ways off. I do believe it will eventually happen, but for now it is out of reach. Computer processors just aren't fast enough yet. It's kind of like chess...at one time, people said that a chess program would never be able to beat a human opponent. Now, even cheap programs can beat Grandmasters, especially at speed chess.

Trading is a LOT more complex than a game of poker. There are SO many more variables. It's relatively easy to put a poker player on a range of hands or at least the probability that he holds a range of hands. And worst-case scenario, there are only 169 different starting hands he can have anyway. (how they combine with the board cards becomes very complicated sometimes, however.)

In trading, we only think about a limited subset of variables, but in reality there are many more, and that is what produces all the "noise" we experience. Without this noise, we would all be raking in the big bucks.

My point with this exposé (novella?) is that it will take time and mathematical analysis to develop workflows sufficient to effectively weed out the invalid strategies.

qattack wrote:

I understand that a successful backtest is not indicative of a successful trading strategy.

What IS true is that an UNsuccessful backtest IS indicative of an UNsuccessful trading strategy.

This is NOT true.

Often FSB's backtesting result may indicate that my EA is unprofitable, but I still go ahead because I trust my theory; I put it on live or demo testing and I get profitable results.

So it is not true that if your EA yields an unprofitable backtest, it will be unprofitable in real trading. I have lots of experience to testify to it. So it's not just an opinion, it's experience.

**There are generally two types of users**

1. Use statistical results/acceptance criteria to search for successful EAs.

2. Use sound trading theory to build successful EAs that will yield the kind of statistical results you are looking for.

The outcome...

1. A statistical search will yield billions of (or unlimited) outcomes, and the processes for filtering them are varied and do not necessarily yield any good results at the end of the day.

2. Using sound theory to build, I can get 10 to 50 profitable EAs in one round (a couple of hours to complete one round).

So to me, using theory to build EAs is not like a shot in the dark, and it presents much less uncertainty than a statistical search.

Nevertheless, some of us are more inclined to one than the other, depending on our educational training and perspective. If I were a mathematician, I would take the statistical search as my first preference. Unfortunately (and fortunately), statistics isn't my forte, so I choose the latter.

1. You mention ending up with more "valid" strategies in the final collection. I don't know what the definition of "valid" means. And you don't have to answer this -- it's just that I don't understand how one can test for something that isn't defined.

2. You brought up curve-fitting -- which I hadn't seriously considered before. When comparing two strategies with similar statistics it could be that one is more curve-fitted than the other and then, yes, it would seem desirable to use the less curve-fitted one. But how does one test for curve-fittedness? Applying Monte Carlo tests?

3. Another comparison question -- there is no right or wrong answer. One strategy has been back tested for 10 years and another for 1 year. And we can assume the one back-tested for 10 years is less curve-fitted. Which one would you choose? A little more information -- the more input data (i.e. longer the back-testing) then the less curve-fitted the strategy will be. But that also means the poorer will be the statistics. I brought this up in a different thread -- when fitting a polynomial equation to data points, the more data points used then the poorer will be the fit. So, again, which would you choose -- a less curve-fitted strategy that trades okay for 1 year or a more curve-fitted strategy that trades more profitably -- but then stumbles after 6 months?

Or would you claim you can have both -- less curve-fitting plus equal or better statistics? Do you understand what I'm getting at?

sleytus wrote:

If I have a strategy with an SQN of 8 and a win ratio of 0.92, does it make a difference how it was generated?

Or, another way of asking -- if I have two strategies with very similar statistics but one was generated in a casual way and the other using a highly refined procedure -- would your expectations be the same or different for the two?

When you ask the question in that way, then the answer is no, it doesn't matter how it was generated. (This is probably not entirely true in the case of SQN, but it's just a technicality)

But if you include Acceptance Criteria and the various Validation methods we have at our disposal (including OOS testing)--and we know how to use them effectively--then *theoretically* we "should" have fewer curve-fitted strategies in our overall collection.

So instead of a Collection of 100 strategies with 20 of them being "valid", we may end up with a Collection of 60 strategies and 17 "valid" ones - a smaller collection, but a better proportion (28% versus 20%).

The goal, as I envision it, is to filter out the bad strategies that are generated before the user even has to deal with them.

Actually, generation might possibly have an effect on the "curve-fittedness" of the strategies generated. As an extreme example, let's say you generate an M30 strategy over 100 years of data. If you find a winning strategy with that amount of data, I guarantee it's not curve-fit - there's just too much data to force into conformity with a simple set of rules. However, if you generate on M30 over 2 weeks of data, you can bet that curve-fit strategies will fill up your 3TB hard drive in no time. (A major problem, as noted by Hannah, is that algorithmic trading only became prevalent around 2014, so the market is substantially different than it was ten years ago.)

And that's what we want to prevent: pushing curve-fit strategies into our Collection. If we can find a way to filter out a certain portion of curve-fitted strategies before we even need to deal with them, that will be a huge help. Every such eliminated strategy is essentially pure profit.

sleytus wrote:

I'll typically let EA Studio run for a week or so -- perhaps longer. I end up with 100 strategies. To my eye they all look great -- yes, the ones at the beginning of the list have better statistics and nicer looking balance charts, but the ones at the end also look fine. I think we agree that position in the collection is no guarantee of success -- then how does one go about deciding which apples to toss if they all look good on the surface?

That's what I don't get. If an apple looks bad then, sure, it is easy to exclude it. But what about all the good looking apples that are destined to perform poorly -- how do you identify those?

And again, that is what we need to filter out with validation methods before it's pushed to the collection.

As a very simple example, take two different strategies, each with an SQN of 8 and a win rate of 92%. If we run these strategies through MC tests that vary the system parameters, the test may show that one of them is particularly vulnerable to these changes. That system is much more likely to be too highly curve-fit than the other, so we can discard it. Yes, I know, both of these strategies are highly curve-fit. But perhaps the other one is so basically sound that it will still produce an SQN of 2.4 in extensive live trading. The second system was not curve-fit to such a degree - even though it was still highly curve-fit, there was enough underlying potential that plenty of profit remained after the curve-fitting was removed.
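A sketch of that kind of parameter-perturbation MC test. Everything here is hypothetical - `score_fn` stands in for whatever backtest scoring you use (SQN, net profit, etc.), and the 10% jitter is an arbitrary choice, not any product's built-in setting:

```python
import random

def mc_parameter_robustness(score_fn, base_params, runs=200, jitter=0.10, seed=1):
    """Re-score a strategy with every parameter perturbed by up to +/-jitter.
    A heavily curve-fitted strategy typically shows a sharp drop in mean score
    and a wide spread; a robust one degrades gracefully."""
    rng = random.Random(seed)  # fixed seed keeps the test replicable
    scores = []
    for _ in range(runs):
        perturbed = {k: v * (1.0 + rng.uniform(-jitter, jitter))
                     for k, v in base_params.items()}
        scores.append(score_fn(perturbed))
    mean = sum(scores) / runs
    std = (sum((s - mean) ** 2 for s in scores) / (runs - 1)) ** 0.5
    return mean, std
```

Comparing the mean and spread of the perturbed scores across two strategies with identical headline statistics is exactly the discard decision described above.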

Here is another example. Let's say you have a parameter, "enter the market when it rains". Other indicators were randomly generated to fill out a beautiful winning strategy with an SQN of 8 and win rate of 92%.

Of course, you will look at that strategy and remove it, since you know better. But suppose you DIDN'T know better. Do you think there could possibly be a combination of Acceptance Criteria, Monte Carlo tests, Optimization, OOS tests, etc. that could filter that strategy before it is pushed to your Collection? I would bet you 25 cents that it wouldn't stand up to a reasonable OOS test!

You might counter saying that there's little chance that the software would ever come up with a strategy based upon the rule "enter the market when it rains" that has an SQN of 8 and win rate of 92%. (!)

But I would disagree with you. Every time it curve-fits a strategy with some random indicators, that is what the software is doing.

Lastly, I'm not trying to discourage anyone or crush any dreams (particularly my own!). I'm just reporting what I find and throwing my thoughts on these matters down on "paper." While I have some strong opinions on the various issues, I am far from the last word; I'm very likely still the newbie in this whole trading game. One problem I have is that I tend to intertwine my strong opinions with those I'm less sure about. If I didn't do that, I'd always be placing conditions on my statements (and my posts would be twice as long - and you wouldn't want that!). I'm very much open to changing my mind, as that's what this whole journey into shorter data horizons arose from in the first place.

footon wrote:

Certainly, portfolio management cannot be left aside, but if there is a huge difference in profitability between collections - and currently my research shows this - then it takes trading to a point where break-even is the best outlook... Not trying to bring anyone down, just a sort of pessimistic realism. We must be able to pick more good than bad apples into a collection.

This is something I wanted to say, but just didn't want to. I'm still hopeful (but pessimistic as well) that we can find a way to effectively filter out the worst strategies.

footon wrote:

Back to the topic - John, you single out MC as a main validation tool, but I'm having trouble seeing its usefulness if the results it gives are inconsistent. As you said before there's huge variability if one runs MC through multiple times. The building block of any scientific work or method is that one must be able to replicate the results. With MC it is not possible and therefore I've let it sit aside. I've written bootstrapping and MC algos for White's reality check and in those instances results were confidently replicable.

I note MC as a main validation tool for a few reasons. Number one, it is one of the only validation tools we have. Number two, I have extensive experience with MC testing as it relates to poker - separately, both with bankroll management and with studying artificial intelligence with regard to finding the best poker strategy in a game with unknown variables. And three, I have little practical experience applying MC to trading systems, so I don't know its proper uses or potential. As I've previously noted, I've seen some gurus using it when it is obvious to me that they don't know the specific mathematical reasons behind why they are using it. When it comes to trading, I can spot misapplications of MC tests, but I cannot tell you what the correct ways to use them are.

Footon mentioned White's reality check and I briefly looked it up, but didn't take the time to understand it...and I certainly don't understand it for now, lol.
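For what it's worth, the replicability footon asks for usually comes from seeding the resampler. This is not White's reality check itself, just a minimal seeded-bootstrap sketch (all names are my own) showing that a fixed seed makes a resampling test give the same answer on every run:

```python
import random

def bootstrap_mean_pvalue(trade_profits, resamples=5000, seed=7):
    """Seeded bootstrap: the fraction of resampled mean trade results that are
    <= 0. With a fixed seed the number is exactly replicable run to run,
    unlike an unseeded Monte Carlo pass."""
    rng = random.Random(seed)
    n = len(trade_profits)
    below = 0
    for _ in range(resamples):
        mean = sum(rng.choice(trade_profits) for _ in range(n)) / n
        if mean <= 0:
            below += 1
    return below / resamples
```

Running it twice on the same trade list returns an identical p-value, which is the reproducibility property footon found missing from ordinary MC passes.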

footon wrote:

I looked into anti-optimization. From what I'm seeing it is not worth it to discard best optimization runs. What is there is different level of adaptability between strats, one group is quite static and fits over a specific length of data and if used on another period (OOS) it fails to "adapt" and turns fiercely south. The other group seems to wrap itself better around new piece of data. It intrigues me and I will plow ahead in that direction.

I'm not quite sure what you're saying here, but I get the general gist of it. I believe you are referring to another problem: that validation methods (or settings) may vary depending SPECIFICALLY on what indicators are being used. I've thought of this before, and it actually applies to several different areas of strategy-making.

If I have a strategy with an SQN of 8 and a win ratio of 0.92, does it make a difference how it was generated?

Or, another way of asking -- if I have two strategies with very similar statistics but one was generated in a casual way and the other using a highly refined procedure -- would your expectations be the same or different for the two?

I'll typically let EA Studio run for a week or so -- perhaps longer. I end up with 100 strategies. To my eye they all look great -- yes, the ones at the beginning of the list have better statistics and nicer looking balance charts, but the ones at the end also look fine. I think we agree that position in the collection is no guarantee of success -- then how does one go about deciding which apples to toss if they all look good on the surface?

That's what I don't get. If an apple looks bad then, sure, it is easy to exclude it. But what about all the good looking apples that are destined to perform poorly -- how do you identify those?

Certainly, portfolio management cannot be left aside, but if there is a huge difference in profitability between collections - and currently my research shows this - then it takes trading to a point where break-even is the best outlook... Not trying to bring anyone down, just a sort of pessimistic realism. We must be able to pick more good than bad apples into a collection.

Back to the topic - John, you single out MC as a main validation tool, but I'm having trouble seeing its usefulness if the results it gives are inconsistent. As you said before there's huge variability if one runs MC through multiple times. The building block of any scientific work or method is that one must be able to replicate the results. With MC it is not possible and therefore I've let it sit aside. I've written bootstrapping and MC algos for White's reality check and in those instances results were confidently replicable.

I looked into anti-optimization. From what I'm seeing it is not worth it to discard best optimization runs. What is there is different level of adaptability between strats, one group is quite static and fits over a specific length of data and if used on another period (OOS) it fails to "adapt" and turns fiercely south. The other group seems to wrap itself better around new piece of data. It intrigues me and I will plow ahead in that direction.

Several have reported that within a 100-member collection it is not necessarily true that the top-ten strategies are the most successful and the bottom-ten are the worst. That's a clue that backtesting can only take you so far. I absolutely believe that backtesting provides an "edge" -- but I think it's a probabilistic edge and is not necessarily quantifiable.
