Topic: Using Your Broker's Data -- Yes or No?

The forum seems a bit quiet these days, so I thought I would stir things up by raising a controversial question.  I've often seen it recommended we should use data from our particular broker for generating and optimizing strategies.  My take on this is it probably doesn't make much difference what the data source is.  And here's why...

(1) The data source is a *.csv or *.json file of a few thousand OHLC values -- essentially an array of numbers.  A strategy is an algebraic formula.  If I give you 5 different *.csv files from 5 different brokers -- i.e. 5 different arrays of numbers --  can your algebraic formula really distinguish between the different brokers?  And, if so, what exactly differs between the 5 arrays of numbers?

(2) I have accounts with 5 different brokers.  I've found that winning strategies win with each of the brokers and losing strategies lose with each of the brokers.  I have yet to find a strategy that performs poorly with one broker and great with another -- or vice versa.

(3) When I "refresh" or re-optimize an existing strategy with a data source from a different broker (but using the exact same data horizon), the back test statistics come out fairly close.  Not exact, but similar.  Again, I have yet to come across a strategy that has great back test statistics with one data source and poor statistics with another (again -- same data horizon).

(4) When I "refresh" or re-optimize an existing strategy using the *same* data source but a different data horizon, then the back test statistics can differ significantly.  In fact, the delta in statistics using different data horizons is bigger than when using different data sources.

My point is this -- though the array of numbers from different brokers will differ somewhat, so will the array of numbers from different data horizons (but same broker).  Inherent in these arrays of numbers are certain patterns.  I'm making the claim that within the same data horizon, the patterns probably do not differ very much from broker to broker.  However, when comparing different data horizons, then patterns are more likely to differ.  This is probably why strategies perform better or worse at different times -- it depends on how well they have been trained to recognize the current data pattern.
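One rough way to put a number on "what exactly differs" between two brokers' files is to align the bars by timestamp and measure the average absolute gap between the closes. The snippet below is only a minimal sketch in Python, not something FSB provides; the function name `mean_abs_close_gap`, the dictionary layout and the sample prices are all made up for illustration.

```python
from statistics import mean
from typing import Dict


def mean_abs_close_gap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Average absolute close-price difference over timestamps present in both series."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two series share no timestamps")
    return mean(abs(a[t] - b[t]) for t in shared)


# Hypothetical H1 closes (timestamp -> close) from two brokers, same data horizon.
broker_a = {"2018-03-15 10:00": 1.2340, "2018-03-15 11:00": 1.2355, "2018-03-15 12:00": 1.2349}
broker_b = {"2018-03-15 10:00": 1.2341, "2018-03-15 11:00": 1.2353, "2018-03-15 12:00": 1.2349}

gap = mean_abs_close_gap(broker_a, broker_b)
print(f"mean close gap: {gap / 0.0001:.2f} pips")  # assumes a pip size of 0.0001
```

Running the same function on two different horizons from one broker does not make sense bar-by-bar, so for claim (4) the comparison would have to be made on back test statistics rather than on raw prices.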

One last point -- many or most of us acknowledge there is a "disconnect" between back testing results and live trading.  I suspect this is caused by live trading using tick data, compared with back testing using bar data.  Tick and bar data are two completely different beasts.  Considering the disconnect introduced by using tick data, then why would we be concerned about the relatively minor differences in *.csv data from different brokers when generating / training strategies?  Whatever differences may exist (in broker *.csv data) will eventually be overwhelmed by the differences between ticks and bars.

I'm curious whether any of this makes sense...

2 (edited by Irmantas 2018-03-15 21:22:49)

Re: Using Your Broker's Data -- Yes or No?

Hi,
Interesting points here. I'll try to add something from my side. For me, one of the main filters in strategy testing now is to check whether a strategy behaves similarly on different brokers' data (2-3 brokers' data sets over the same data horizon) and to confirm that the results do not change drastically. If one data sample shows a steep PL curve and on another it gets drastically worse, I think the strategy has been curve-fitted too much and has picked up on something much smaller than a "major repeatable market structure", and it will probably miss on live unseen data too. To me that makes sense: why should a great strategy fail on a different broker's data over the same time horizon?

many or most of us acknowledge there is a "disconnect" between back testing results and live trading.  I suspect this is caused by live trading using tick data, compared with back testing using bar data

I think that is not the case. Every strategy will have some shortfall in live trading (unless you created a trend-following strategy and are lucky enough that the unseen live data trends even more strongly than in back testing :) ). The real question is how big that shortfall will be and whether your strategy will still be profitable. That disconnect is really called "data mining bias" or "curve fitting / over-fitting". There are some tests that promise to fight it: Monte Carlo, Multi Market, Walk Forward analysis, System Parameter Permutation ... I am still experimenting with these methods and searching for an answer, but this is not an easy question that can be answered quickly ... For now I have found that Monte Carlo (variable shuffling and data randomization) has some positive effect. Walk Forward could be another, but it is a hard test to pass, and I have no conclusion about its effectiveness with a big portfolio over a longer period of real-time trading (I stopped trading those systems during a drawdown, probably too quickly). For now I am still working hard on a System Parameter Permutation / Parameter Sweep test variation; it is very promising but unbelievably slow... I still want to get more data before drawing conclusions.
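To make the Monte Carlo idea a little more concrete, here is a minimal Python sketch of one kind of randomization: each run randomly skips some trades and charges a random slippage cost on the rest, then the spread of net results is inspected. This is an illustration under stated assumptions, not FSB's implementation; the function name, the `skip_prob` and `max_slippage` parameters and the trade list are all hypothetical.

```python
import random
from typing import List


def monte_carlo_net_profits(trades: List[float], runs: int = 1000,
                            skip_prob: float = 0.1, max_slippage: float = 1.0,
                            seed: int = 42) -> List[float]:
    """Net profit per run after randomly skipping trades and adding random slippage."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    results = []
    for _ in range(runs):
        total = 0.0
        for pnl in trades:
            if rng.random() < skip_prob:
                continue  # this trade is assumed to have been missed
            total += pnl - rng.uniform(0.0, max_slippage)  # slippage always costs here
        results.append(total)
    return results


# Made-up per-trade results in pips.
trades = [12.0, -8.0, 15.0, -5.0, 9.0, -7.0, 20.0, -10.0]
results = monte_carlo_net_profits(trades)
profitable_share = sum(r > 0 for r in results) / len(results)
print(f"profitable runs: {profitable_share:.1%}, worst run: {min(results):.1f} pips")
```

A strategy whose result collapses under small perturbations like these is a candidate for the over-fitting described above.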

With FSB we get a pretty accurate backtest, whether with bar-opening or tick entries. As Mr. Popov or Footon would say, "accurate interpolation method" :) The exception is when the spread widens in the real market; FSB does not take that into account, and I am sure it causes some shortfall in the final PL results. I recommend checking that your great, unbelievably steep PL strategies are not trading the midnight spread too much, or big breakout events like NFP news releases too frequently :)

Re: Using Your Broker's Data -- Yes or No?

Irmantas -- thanks for sharing your perspective.  I enjoy learning about how others are approaching their forex trading -- it helps me a lot.

I had a few more comments to add...

Comparing results against different brokers -- yes, I agree.  If a strategy's back test results differ drastically from broker to broker then that is not a good sign.  However, though that may happen from time-to-time, I haven't seen it too often.

I used to not be a fan of Monte Carlo, but now I am and I do appreciate how it can help.  And Popov's implementation in FSB is excellent and easy to use.  However, I'm not so sure about Multi-Market or OOS (especially when data chunks are contiguous).

I think my primary motivation for bringing up this topic is I'm sort of lazy.  I use several brokers and I really don't want to create different portfolio EAs that differ only in which broker data was used for optimization / training.  So, perhaps I'm looking for an excuse to simplify things and just use one data source.

With regards to tick versus bar data -- I don't understand what "accurate interpolation" means.  I mean, I know what interpolation is, but unless FSB can predict the future there is no way it can predict the next tick.  Here's an example -- let's suppose I trade H1 and let's suppose there are 2 ticks per second.  That's 7200 ticks / hour (i.e. each bar really represents 7200 ticks) -- and that's conservative since there are often more than 2 ticks per second.  Also, since several seconds can pass between the time you request to open or close a trade and the time it is completed, many more ticks could have occurred.  There is no way FSB can predict those 7200 ticks, and it only takes 1 bad tick to convert your winning strategy into a losing one.

I always use StopLoss and TakeProfit and I think these are particularly sensitive to ticks.  I'm not so worried about TakeProfit, but a single tick (i.e. one inaccurate tick out of 7200) could trigger a StopLoss and then that trade immediately becomes a loser.  And those 7200 ticks represent only a single bar.  What if my trade spans multiple bars -- that's tens of thousands of ticks.  What's the possibility of one bad tick out of 50000 ticks?  Do you know what I mean?
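The arithmetic behind this worry can be sketched in a few lines of Python. The per-tick bad-tick probability `p_bad` below is purely hypothetical (nobody in this thread has measured it), and the formula assumes ticks are independent, which real ticks are not.

```python
def ticks_per_bar(ticks_per_second: float, bar_seconds: int) -> float:
    """Number of ticks in one bar at a constant tick rate."""
    return ticks_per_second * bar_seconds


def prob_at_least_one_bad_tick(n_ticks: int, p_bad: float) -> float:
    """P(at least one bad tick in n_ticks), assuming independent ticks."""
    return 1.0 - (1.0 - p_bad) ** n_ticks


h1_ticks = ticks_per_bar(2, 3600)  # 2 ticks/second over a one-hour bar
p = prob_at_least_one_bad_tick(50_000, 1e-6)  # hypothetical one-in-a-million bad tick
print(f"ticks per H1 bar: {h1_ticks:.0f}")
print(f"chance of a bad tick in 50000 ticks: {p:.2%}")
```

The point of the sketch is only that the chance grows with the number of ticks a trade spans; whether such a tick would actually change a back test is a separate question.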

Hannahis keeps emphasizing M1 strategies -- and though that is too short for me she does have a point that since M1 bars are composed of fewer ticks then they are more likely to mirror the real tick data.  But the trade-off is M1 bar data is more "noisy" and less accurate than longer time frames.

4 (edited by Irmantas 2018-03-16 11:32:13)

Re: Using Your Broker's Data -- Yes or No?

It does not matter how many ticks there were in the bar, or in what order they arrived; what matters is that the price reached a specific point in that bar and could trigger your strategy's entry/exit price, which was calculated from the last bar's value. The exception is the scenario where it is not clear which price was touched first within the same bar when TP, SL or an entry/exit point is involved, but FSB warns you about that with ambiguous bars, and you should take it seriously: the backtest is probably not trustworthy on those marked bars. Mr. Popov or Mr. Footon could probably explain this better.

As I said, there can be cases where FSB's backtested entries/exits do not match real market trading exactly: 1. Spread 2. Slippage 3. A mistake or difference in a specific indicator's code between FSB and MQL. If the latter is the case, the question needs to be raised and then we wait for a fix :)
You always can, and actually it is mandatory to, check a couple of real-time entries against updated data from the same broker in FSB and see whether they match. What I found is that 95% of the time they match perfectly. So the backtest is accurate, but that does not mean live trading will produce the same steep PL curves as the backtest did, for the reason I mentioned earlier: "data mining bias". A method that completely deals with this (backtested steep profits exactly matching real-time market profits) would probably be worth trillions of dollars. But there are some methods that promise to reduce its impact.

A point about using an OOS check. I have read somewhere that it makes little difference whether you use OOS or all the data for strategy creation. Using more data compensates for dropping the out-of-sample check when passing a strategy, and the end result is pretty much the same. I did an experiment on this using two OOS periods and found that the first OOS is not productive: it does not warn you about the potential results on the second OOS; it was random. For now I use OOS only for researching the effectiveness of a method, to skip the long demo-trading part. I totally exclude the last year from strategy creation and decision making. After a portfolio or strategy has been built to answer some research question, I check whether the method would have been profitable on that unseen last-year data chunk.
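The "exclude the last year" step can be sketched as a simple split on the bar timestamps. This is a generic Python illustration, not an FSB feature; `split_last_year`, the `holdout_days` parameter and the synthetic daily bars are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Bar = Tuple[datetime, float]  # (bar time, close price)


def split_last_year(bars: List[Bar], holdout_days: int = 365) -> Tuple[List[Bar], List[Bar]]:
    """Split time-sorted bars into (in-sample, held-out last `holdout_days`)."""
    if not bars:
        return [], []
    cutoff = bars[-1][0] - timedelta(days=holdout_days)
    in_sample = [b for b in bars if b[0] <= cutoff]
    out_of_sample = [b for b in bars if b[0] > cutoff]
    return in_sample, out_of_sample


# Synthetic daily bars covering three years (prices are made up).
start = datetime(2015, 1, 1)
bars = [(start + timedelta(days=i), 1.0 + i * 0.0001) for i in range(1096)]

in_sample, out_of_sample = split_last_year(bars)
print(len(in_sample), "bars for strategy creation,", len(out_of_sample), "bars held back")
```

Strategies would be generated and selected on `in_sample` only, with `out_of_sample` touched once at the end to judge the method rather than any individual strategy.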

Re: Using Your Broker's Data -- Yes or No?

A clarification -- I wasn't claiming the number of ticks mattered, simply that a single bad tick (e.g. one out of 50000) could trigger a StopLoss.  Back testing only deals with bars and, so, the thousands of ticks that compose a bar are "hidden" and not taken into account.  In other words, when using bar data you wouldn't necessarily see the effect of a bad tick -- but you certainly would when trading in a live account.  And that's what I think accounts for much of the "disconnect" between back testing and live trading -- at least for longer time frames.  Just to be clear -- I have not studied this in detail, so this is only my impression.

You bring up good points about Spread, Slippage and the difference between C# and mql4.  When trading longer time frames (e.g. > H1) are spread and slippage really that important?  I mean, they may decrease your profit but would they really convert a winning strategy into a losing one?  I've never looked at it that closely and I'm curious to learn more details.  FSB uses indicators written in C# and those are used during back testing.  When strategies are exported to MT4 then that code is converted to mql4.  I think the conversion is probably straightforward most of the time but you are right -- there could be subtle differences that cause a strategy to trade differently in a live account.  Unfortunately, I don't think there is any way of verifying code conversion from C# to mql4 other than examining the source code line-by-line.  I suspect it is rarely a problem, but it is something to be aware of.

I have not been comparing live bars with the same broker data in FSB, so I'm glad to learn you have found they match close to 95% of the time.  That is good to know and gives me more confidence.

Regarding OOS -- yes, I would tend to agree.  I used to experiment with it more frequently but not so much anymore.  If the two chunks of data are contiguous then from a statistical standpoint it has little benefit.  Only if the data chunks come from unrelated data horizons are they helpful for determining "robustness".  For now, I mostly rely on the Monte Carlo test and back testing using data from different brokers (same data horizon).

Re: Using Your Broker's Data -- Yes or No?

The midnight spread problem is specific to the broker and the strategy. Below is an example from a strategy I recently generated on the H1 timeframe, which picked up on taking almost all of its entries at 00:00. The indicators somehow latched onto that gap/spread action, and the profits are a total illusion. Discovering this after some time in a real-money account can be pretty expensive. I also avoid H4 and D1 bar-opening strategies for the same reason, unless you are sure that you're not paying these insane spreads on every entry. In the strategy below I added a time filter that does not allow trading at midnight, and all the good results are gone.

https://s18.postimg.org/57c654iut/midnight_spread.png

For other types of strategies, which do not trade this fake midnight price action, the spread/slippage effect is probably much lower. But it can be present in some cases. For example, if your strategy trades huge breakout spikes, those price spikes may be caused by important news like FOMC or NFP (spreads get a lot bigger at those times). If you check your entries against an economic calendar and see that many of your winners fall on news times, you can suspect that the backtested profits may not be accurate.

Hidden bad ticks ... I think you are referring to events that happen within a single bar, when the trade outcome depends on lower-timeframe price action: which area of the bar was touched first -> TP or SL. The image below shows what I mean. In these cases FSB goes to a lower timeframe and checks the price action there to see what happened inside that bar, and if there is still no proof of which price was touched first, you get an ambiguous bar warning. So nothing here is hidden or bad... all ticks are within the bounds of the bar's high and low. Someone please correct me if you see that I am talking nonsense here :)

https://s18.postimg.org/lnkmopmut/example.png

Have a good weekend.

Re: Using Your Broker's Data -- Yes or No?

Irmantas, you are correct. I'll only add that the Pessimistic algorithm (default) will hit the SL. This is done in order to avoid showing overestimated results. On the other hand, the Optimistic algorithm will hit the TP, which will be too good to be true every time. Anyway, just to be sure, when you see Ambiguous bars, it is a good idea to pass your strategy through the Method Comparator.
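The idea described in the last few posts (drop to a lower timeframe to see which level was touched first, and fall back on a pessimistic or optimistic assumption when that is still unclear) can be sketched as follows. This is a simplified Python illustration for a long position only; it is not FSB's actual interpolation code, and the `Bar` type, the `first_hit_long` function and the prices are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    high: float
    low: float


def first_hit_long(sub_bars: List[Bar], stop_loss: float, take_profit: float,
                   pessimistic: bool = True) -> Optional[str]:
    """Walk the lower-timeframe bars inside one parent bar for a long position.

    Returns "SL" or "TP" for the level touched first, or None if neither is touched.
    """
    for bar in sub_bars:
        hit_sl = bar.low <= stop_loss
        hit_tp = bar.high >= take_profit
        if hit_sl and hit_tp:
            # Still ambiguous at this resolution: apply the chosen assumption.
            return "SL" if pessimistic else "TP"
        if hit_sl:
            return "SL"
        if hit_tp:
            return "TP"
    return None


# One H1 bar that touched both levels, split into four made-up M15 bars.
m15 = [Bar(1.2010, 1.1995), Bar(1.2032, 1.2005), Bar(1.2020, 1.1978), Bar(1.2000, 1.1985)]
print(first_hit_long(m15, stop_loss=1.1980, take_profit=1.2030))

# A bar with no finer data available stays ambiguous.
coarse = [Bar(1.2035, 1.1975)]
print(first_hit_long(coarse, 1.1980, 1.2030, pessimistic=True),
      first_hit_long(coarse, 1.1980, 1.2030, pessimistic=False))
```

In the first call the M15 bars settle the question on their own; in the second the outcome depends entirely on the pessimistic or optimistic choice, which is the situation an ambiguous-bar warning flags.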

8 (edited by GD 2018-03-18 17:37:49)

Re: Using Your Broker's Data -- Yes or No?

1. I think there is an indicator for maximum spread.
2. In my experience, good strategies win with different brokers over the same horizon.
3. With FSB Pro I use M15.
4. I recommend trying add and reverse.
5. Sleytus can use the FSB Pro Journal in his MT4Tracker to create statistics, except for a detailed statement.
6. How to create a portfolio could be an interesting discussion.
7. An indicator that calculates equity in FSB Pro could be interesting in combination with other indicators.

Just some information...

Re: Using Your Broker's Data -- Yes or No?

Irmantas -- thank you so much.  Your explanation of "Hidden bad ticks" was excellent.  I think I finally understand.

With regards to the midnight price action -- you refer to a time filter.  Are you using the "Entry Time" indicator?

I just now started experimenting with the "Entry Time" indicator and finding that it frequently *improves* the statistics of many of my strategies -- e.g. the Win / Loss ratio goes up 5 - 10%.  I'm using an Entry Time of [4,22].  Why do you think it improves the stats?  From your example, when you apply a time filter it removes the midnight price action and your stats go down.  Why do you think mine go up?

And now I think you've answered my original question about brokers.  If a strategy includes an indicator such as "Entry Time" then you really must optimize using different broker data for each account.  As a US client I have to use foreign brokers -- so I'm using brokers around the world -- EU, South Pacific, etc.  And their time zones are very different.  So, a strategy that was optimized using data from a broker in EU may not perform as well with a broker located in the South Pacific.

GD -- thanks for your recommendation about the indicator that checks for maximum spread.  I will give that a try.  With regards to MT4Tracker and FSB Pro Journal -- FSB Pro already provides stats for each strategy, so I'm not sure what more MT4Tracker could do.  Also, MT4Tracker was intended to be used for portfolios, and FSB Pro journal shows the results for only one strategy.

Re: Using Your Broker's Data -- Yes or No?

Hi,
Thanks, GD, for mentioning the spread filter. Yes, it helps prevent entries at times of increased spread, but only in real-time trading. And it is a good idea not to forget to add this indicator to each strategy :D It seems I am not doing that right now... I wanted to mention the times when FSB's backtest is not so accurate and raise awareness among other users. In my example I wanted to isolate the few trading hours around the increased midnight spread and show that the profits were not real.

Sleytus, I am a heavy user of the time filter. Not every part of the day is the same in terms of forex trading activity. The Asian session is a lot calmer and rangier, and huge price movements are rare during it; it seems you filtered out those hours with your 22-04 filter (to be exact it depends on the timezone, but for GMT+2 it seems the Asian ranging period is cut off). It means that calmer price action is not so profitable for your strategies. So why not skip some losses? During the London and US sessions trading is a lot more active. It is very logical to filter out the calmer / losing market hours and keep the more active trading hours that suit your strategy. With some strategies you may find that the opposite is true and the calmer Asian session is more suitable. You can also isolate only one or a few hours in a day, like the London open or the US session open, and you will find certain strategies that suit that market volatility. In short, a "time filter" opens up possibilities that you will not find without it. If you are too lazy to re-optimize strategies, and have some reasoning for this, you can always shift the time filter by a few hours manually to compensate for the timezone.
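Shifting an entry window by a few hours for a broker in another time zone is easy to get wrong when the shifted window wraps past midnight. The sketch below shows one way to handle that in Python. It assumes a half-open window `[start, end)` measured in whole server hours, which may not match how the "Entry Time" indicator treats its bounds; the function names and the 5-hour offset are made up.

```python
from typing import Tuple


def in_entry_window(hour: int, start: int, end: int) -> bool:
    """True if `hour` lies in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def shift_window(start: int, end: int, offset_hours: int) -> Tuple[int, int]:
    """Move a window by the difference between two brokers' server times."""
    return (start + offset_hours) % 24, (end + offset_hours) % 24


# The [4, 22] window from this thread, moved to a hypothetical broker whose
# server clock runs 5 hours ahead of the one used for optimization.
start, end = shift_window(4, 22, 5)
print(start, end)
print([h for h in range(24) if in_entry_window(h, start, end)])
```

The shifted window still covers 18 hours, but it now starts at 9 and ends at 3 on the new broker's clock, so a filter that only checks `start <= hour < end` would silently block every entry.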

Re: Using Your Broker's Data -- Yes or No?

Irmantas wrote:

Hidden bad ticks

This was an excellent explanation and I now understand how lower time frames can be used to determine whether Max or Min occurs first.  However, I think there still is a problem with bad ticks -- and it is because several seconds and many ticks occur between the time your strategy receives a signal and a transaction is actually completed with a bank.

Let's suppose a bad tick occurs that is a Maximum and creates a TakeProfit signal.  But the actual price used by the bank will likely be much different because dozens of ticks can occur between the signal and completing the transaction.  So, whereas your strategy thinks it just won, it actually loses.

When the price action is relatively stable, then it probably doesn't make much difference if the price that caused the signal is a little bit different than the price used by the bank to close the transaction.

However, I'm not sure that FSB-generated strategies really protect us from bad ticks -- which contribute to the disconnect between back testing and live trading.  I won't claim that bad ticks account for all the disconnect, but I suspect they do contribute. 

I don't think this falls under the category of slippage -- because slippage is a naturally occurring difference between the price used to generate a signal and the price used to close a transaction.  Since slippage is not taken into account during back testing then it, too, can contribute to the disconnect.  In this thread I was primarily referring to bad ticks.  So, I guess what I'm suggesting is that since slippage and bad ticks contribute to the disconnect, and since they may have a bigger (negative) effect than using different broker data (assuming your strategies do not include a time filter), then maybe it is not so important to worry about using data from a particular broker because other factors have a bigger effect.

Re: Using Your Broker's Data -- Yes or No?

Hi,

and it is because several seconds and many ticks occur between the time your strategy receives a signal and a transaction is actually completed with a bank.

In my experience this time is about ~200ms, so not very many ticks. If some do arrive, you get slipped, and with a good-execution ECN broker I sometimes get positive slippage, so in the end it pretty much evens out. If you are seeing longer execution times, like a couple of seconds, and bigger slippage, you definitely need to change to a better broker, because yours is stealing from you... I have used some nasty brokers, which sometimes added artificial lag and slipped as much as 10 pips in some cases. You need to avoid these at all costs. You can insert some mql code into your EAs to output the execution time and slippage after each order. I was doing that in the old mql days... and found some surprises with some brokers: lag and slippage were the norm for them. I can recommend an ECN broker that I think is good, but I am not sure what they say about USA users. If interested, please PM me.

Mr. Popov, it would be very nice to get Execution time, Slippage and Spread output in the log after each entry/exit is executed in new FSB versions. It would be tremendously useful for catching NASTY brokers.

Re: Using Your Broker's Data -- Yes or No?

Mr. Popov, it would be very nice to get Execution time, Slippage and Spread output in the log after each entry/exit executed

Very reasonable request.  I'm adding it to my ToDo.  Thank you!

Re: Using Your Broker's Data -- Yes or No?

An interesting idea, and it got me thinking.

I can create an indicator (for free) that will chart two values:
(1) the latency (i.e. delay) between the time a close signal is received and the time the position is actually closed
(2) the delta (i.e. difference) in price at the time a close signal is received and the price used by the broker (or bank) when the transaction is completed

You would run this indicator in both a Demo and Live account.  You would expect the latency and delta price charts to be very similar but, if not, then it may warrant further investigation.
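The two values could be computed along these lines. This is a Python sketch of the calculation only, not the indicator itself (which would have to be written in mql4 to run in MT4); the `Fill` record, its field names and the sample order are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fill:
    signal_time: datetime   # when the strategy produced the signal
    fill_time: datetime     # when the broker reported the order as completed
    signal_price: float     # price that triggered the signal
    fill_price: float       # price actually used for the transaction
    is_buy: bool            # direction of the order that was sent


def latency_ms(fill: Fill) -> float:
    """Delay between signal and completed transaction, in milliseconds."""
    return (fill.fill_time - fill.signal_time).total_seconds() * 1000.0


def adverse_slippage_pips(fill: Fill, pip_size: float = 0.0001) -> float:
    """Price delta in pips; above zero means a worse fill, below zero a better one."""
    delta = fill.fill_price - fill.signal_price
    if not fill.is_buy:
        delta = -delta  # for a sell, a lower fill price is the worse outcome
    return delta / pip_size


# Made-up example: a buy order filled 200 ms after the signal, 2 pips higher.
order = Fill(datetime(2018, 3, 19, 10, 0, 0), datetime(2018, 3, 19, 10, 0, 0, 200000),
             1.2000, 1.2002, True)
print(f"latency: {latency_ms(order):.0f} ms, slippage: {adverse_slippage_pips(order):.1f} pips")
```

Logging these two numbers for every order on both the Demo and the Live account gives the pair of charts described above.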

Re: Using Your Broker's Data -- Yes or No?

I think you need to download the data from a single broker's database. The time difference might affect the results of the strategy if you download the data from different broker sources. By the way, best of luck with your future trades. Thank you!

16 (edited by Lagoons 2018-11-16 19:30:27)

Re: Using Your Broker's Data -- Yes or No?

I'm not so sure what's best.

My guess is it would be best if your broker has high-quality data that you can use for strategy creation and optimization.

Unfortunately, only very few brokers have high-quality data available.

My current broker is IC Markets. Their execution is great, no complaints about that.
But the history data they offer is bad.
I get just 2 months of good 1-minute data and that's it, and since I know of no way to build a live history database, I'm always stuck with the last 2 months of data.

I generate my strategies with Dukascopy data (via Tickstory), but I cannot really compare them against the IC Markets data.
The strategies look similar over the last two months, but it's not really satisfying.
A backtest would not work either, since I'd have to use Dukascopy for that as well because not enough bars are loaded from my broker to make it work.

And I'm afraid there is another problem: many strategies need a certain number of bars already loaded into MT4 to work.
But the quality of my broker's past data is a joke compared to Dukascopy.

So I'm wondering: does it make sense to just open an account with Dukascopy?
I mean, it takes so much time and effort to create the strategies, and then the (live) data is the problem?

What are your thoughts about it?

Re: Using Your Broker's Data -- Yes or No?

Though people throw the term around, I really don't know what "data quality" means.  Also, I don't know how it actually affects back testing statistics or how one would even begin to quantify its effect.

However, I do know the following:

1. Back testing with MT4 demo and real accounts -- from the same broker -- often yields different results.

2. Back testing with MT4 demo accounts from different brokers often yields different results.

3. DocZero demonstrated (in a different thread) that using the same data, MT4 and FSB-Pro do yield nearly identical results.

Like it or not, the data source makes a difference.  So -- if I'm going to trade a strategy with broker XYZ then I'll calibrate the strategy using data from my XYZ *real* account.  If I'm going to trade a strategy with broker QRS then I'll calibrate the strategy using data from that broker.

I keep separate folders for different brokers -- the folders contain the same strategies, but calibrated against data from that broker.

It's definitely redundant and sort of a pain -- but, you know, there's a very fine line between success and failure and it would be a shame to allow a strategy to trade poorly simply because it was calibrated using the wrong data source.

18 (edited by Lagoons 2018-11-16 21:47:31)

Re: Using Your Broker's Data -- Yes or No?

sleytus wrote:

Though people throw the term around, I really don't know what "data quality" means.  Also, I don't know how it actually affects back testing statistics or how one would even begin to quantify its effect.

OK, I'd better call it data range.
The problem is I have just 2 months of 1 Minute live data.

But I guess I have to deal with it in some way.

Re: Using Your Broker's Data -- Yes or No?

The problem is I have just 2 months of 1 Minute live data.

Ahhh -- yep, I know what you mean.  I have the same problem with the one US broker I use.  So, in that case, I use data from a different broker.  Please don't tell anyone I just contradicted my previous post.

Re: Using Your Broker's Data -- Yes or No?

;-) I'll keep my mouth shut

Re: Using Your Broker's Data -- Yes or No?

The FSB data has been carefully selected from a couple of different brokers, and it is worth comparing results with it because it is simpler to use than updating from the broker or Dukas.