<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title type="html"><![CDATA[Forex Software — Academic Paper: Evaluation of Trading Strategies]]></title>
	<link rel="self" href="https://forexsb.com/forum/feed/atom/topic/5640/" />
	<updated>2015-11-08T13:17:23Z</updated>
	<generator>PunBB</generator>
	<id>https://forexsb.com/forum/topic/5640/academic-paper-evaluation-of-trading-strategies/</id>
		<entry>
			<title type="html"><![CDATA[Re: Academic Paper: Evaluation of Trading Strategies]]></title>
			<link rel="alternate" href="https://forexsb.com/forum/post/32221/#p32221" />
			<content type="html"><![CDATA[<p>I once wondered what would have happened if Moses had never delivered God&#039;s rules to men.<br />After that, I understood what trading is.</p>]]></content>
			<author>
				<name><![CDATA[GD]]></name>
				<uri>https://forexsb.com/forum/user/8542/</uri>
			</author>
			<updated>2015-11-08T13:17:23Z</updated>
			<id>https://forexsb.com/forum/post/32221/#p32221</id>
		</entry>
		<entry>
			<title type="html"><![CDATA[Re: Academic Paper: Evaluation of Trading Strategies]]></title>
			<link rel="alternate" href="https://forexsb.com/forum/post/32216/#p32216" />
			<content type="html"><![CDATA[<p>Also this:</p><div class="quotebox"><blockquote><p>A group of researchers have taken issue with the misuse of quantitative techniques, claiming there is a lack of stringent mathematical diligence in testing investment strategies. The paper, &quot;Pseudo-mathematics and financial charlatanism&quot;, goes so far as to equate the effects of backtest overfitting on out-of-sample performance to fraud, though with the caveat that such actions could be inadvertent.</p><p>Irrespective of its economic and financial viability, a strategy in its first stages is unproven. To help determine real-world performance, it undergoes backtesting: multiple simulations of the strategy are run against historical financial data to derive a host of performance metrics. In the absence of scrupulous due diligence, the metrics may not indicate an optimal strategy at all, but may instead be biased by statistical flukes or false positives.</p><p>When a strategy or its parameters are tweaked multiple times against the same initial data set, or the in sample set, the problem of overfitting, or confusing noise with signal, arises. David Bailey, a research fellow at the Department of Computer Science in the University of California and one of the four authors of the paper, explained that &quot;when a statistical test is applied multiple times, the probability of obtaining false positives increases as a function of the number of trials&quot;.</p></blockquote></div><p><a href="http://www.automatedtrader.net/articles/case-study/152931/fiddling-with-figures">http://www.automatedtrader.net/articles … th-figures</a></p>]]></content>
			<author>
				<name><![CDATA[vinoc]]></name>
				<uri>https://forexsb.com/forum/user/8764/</uri>
			</author>
			<updated>2015-11-08T09:19:20Z</updated>
			<id>https://forexsb.com/forum/post/32216/#p32216</id>
		</entry>
		<entry>
			<title type="html"><![CDATA[Academic Paper: Evaluation of Trading Strategies]]></title>
			<link rel="alternate" href="https://forexsb.com/forum/post/32214/#p32214" />
			<content type="html"><![CDATA[<p>Has anyone here come across the research paper entitled &quot;Evaluation of Trading Strategies&quot; (Campbell R. Harvey and Yan Liu, Aug 2014)? Has this been looked into and considered by the creators of Forex Strategy Builder?</p><p><a href="https://faculty.fuqua.duke.edu/~charvey/Research/Published_Papers/P116_Evaluating_trading_strategies.pdf">https://faculty.fuqua.duke.edu/~charvey … tegies.pdf</a></p><p>I think the paper has some important implications when analysing the viability of a strategy. Here are some snippets from it:</p><div class="quotebox"><blockquote><p>TWO VIEWS OF MULTIPLE TESTING</p><p>There are two main approaches to the multiple-testing problem in statistics. They are known as the family-wise error rate (FWER) and the false discovery rate (FDR). The distinction between the two is very intuitive.</p><p>In the family-wise error rate, it is unacceptable to make a single false discovery. This is a very severe rule but completely appropriate for certain situations. With the FWER, one false discovery is unacceptable in 100 tests and equally as unacceptable in 1,000,000 tests. In contrast, the false discovery rate views unacceptable in terms of a proportion. For example, if one false discovery were unacceptable for 100 tests, then 10 are unacceptable for 1,000 tests. The FDR is much less severe than the FWER.</p><p>Which is the more appropriate method? It depends on the application. For instance, the Mars One foundation is planning a one-way manned trip to Mars in 2024 and has plans for many additional landings. It is unacceptable to have any critical part fail during the mission. A critical failure is an example of a false discovery (we thought the part was good but it was not, just as we thought the investment manager was good but she was not).</p><p>The best-known FWER test is called the Bonferroni test.
It is also the simplest test to implement. Suppose we start with a two-sigma rule for a single (independent) test. This would imply a t-ratio of 2.0. The interpretation is that the chance of the single false discovery is only 5% (remember a single false discovery is unacceptable). Equivalently, we can say that we have 95% confidence that we are not making a false discovery.</p><p>Now consider increasing the number of tests to 10. The Bonferroni method adjusts for the multiple tests. Given the chance that one test could randomly show up as significant, the Bonferroni requires the confidence level to increase. Instead of 5%, you take the 5% and divide by the number of tests, that is, 5%/10 = 0.5%. Again equivalently, you need to be 99.5% confident with 10 tests that you are not making a single false discovery. In terms of the t-statistic, the Bonferroni requires a statistic of at least 2.8 for 10 tests. For 1,000 tests, the statistic must exceed 4.1.</p></blockquote></div><p>......</p><div class="quotebox"><blockquote><p>There are usually more discoveries with BHY. The reason is that BHY allows for an expected proportion of false discoveries, which is less demanding than the absolute occurrence of false discoveries under the FWER approaches. We believe the BHY approach is the most appropriate for evaluating trading strategies.</p></blockquote></div><p>......</p><div class="quotebox"><blockquote><p>FALSE DISCOVERIES AND MISSED DISCOVERIES</p><p>So far, we have discussed false discoveries, which are trading strategies that appear to be profitable, but they are not. Multiple testing adjusts the hurdle for significance because some tests will appear significant by chance.
The downside of doing this is that some truly significant strategies might be overlooked because they did not pass the more stringent hurdle.</p><p>This is the classic tension between Type I errors and Type II errors. The Type I error is the false discovery (investing in an unprofitable trading strategy). The Type II error is missing a truly profitable trading strategy. Inevitably there is a tradeoff between these two errors. In addition, in a multiple testing setting it is not obvious how to jointly optimize these two types of errors.</p><p>Our view is the following. Making the mistake of using the single test criteria for multiple tests induces a very large number of false discoveries (large amount of Type I error). When we increase the hurdle, we greatly reduce the Type I error at minimal cost to the Type II (missing discoveries).</p></blockquote></div><p>......</p><div class="quotebox"><blockquote><p>In our case, the original p-value is 0.004 and when multiplied by 200 the adjusted p-value is 0.80 and the corresponding t-statistic is 0.25. This high p-value is significantly greater than the threshold, 0.05. Our method asks how large the Sharpe ratio should be in order to generate a t-statistic of 0.25. The answer is 0.08. Therefore, knowing that 200 tests have been tried and under Bonferroni’s test, we successfully declare the candidate strategy with the original Sharpe ratio of 0.92 as insignificant: the Sharpe ratio that adjusts for multiple tests is only 0.08. The corresponding haircut is large, 91% (= (0.92 – 0.08)/0.92).</p><p>Turning to the other two approaches, the Holm test makes the same adjustment as Bonferroni since the t-statistic for the candidate strategy is the smallest among the 200 strategies.
Not surprisingly, BHY also strongly rejects the candidate strategy.</p><p>The fact that each of the multiple-testing methods rejects the candidate strategy is a good outcome because we know all of these 200 strategies are just random numbers. A proper test also depends on the correlation among test statistics, as we discussed previously. This is not an issue in the 200 strategies because we did not impose any correlation structure on the random variables. Harvey and Liu [2014b] explicitly take the correlation among tests into account and provide multiple-testing-adjusted Sharpe ratios using a variety of methods.</p></blockquote></div><p>......</p><div class="quotebox"><blockquote><p>Recent research by López de Prado and his coauthors pursues the out-of-sample route and develops a concept called the probability of backtest overfitting (PBO) to gauge the extent of backtest overfitting (see Bailey et al. [2013a, b] and López de Prado [2013]). In particular, the PBO measures how likely it is for a superior strategy that is fit IS to underperform in the OOS period. It succinctly captures the degree of backtest overfitting from a probabilistic perspective and should be useful in a variety of situations.</p><p>To see the differences between the IS and OOS approach, we again take the 200 strategy returns in Exhibit 2 as an example. One way to do OOS testing is to divide the entire sample in half and evaluate the performances of these 200 strategies based on the first half of the sample (IS), that is, the first five years. The evaluation is then put into further scrutiny based on the second half of the sample (OOS). The idea is that strategies that appear to be significant for the in-sample period but are actually not true will likely perform poorly for the out-of-sample period.
Our IS sample approach, on the other hand, uses all ten years’ information and makes the decision at the end of the sample. Using the method developed by López de Prado and his coauthors, we can calculate PBO to be 0.45. Therefore, there is a high chance (that is, a probability of 0.45) for the IS best performer to have a below-median performance in the OOS. This is consistent with our result that based on the entire sample, the best performer is insignificant if we take multiple testing into account. However, unlike the PBO approach that evaluates a particular strategy selection procedure, our method determines a haircut Sharpe ratio for each of the strategies.</p><p>In principle, we believe there are merits in both the PBO as well as the multiple-testing approaches. A successful merger of these approaches could potentially yield more powerful tools to help asset managers successfully evaluate trading strategies.</p></blockquote></div><p>......</p><div class="quotebox"><blockquote><p>The multiple-testing problem greatly confounds the identification of truly profitable trading strategies and the same problems apply to a variety of sciences. Indeed, there is an influential paper in medicine by Ioannidis [2005] called “Why Most Published Research Findings Are False.” Harvey et al. [2014] look at 315 different financial factors and conclude that most are likely false after you apply the insights from multiple testing.</p><p>In medicine, the first researcher to publish a new finding is subject to what they call the winner’s curse. Given the multiple tests, subsequent papers are likely to find a lesser effect or no effect (which would mean the research paper would have to be retracted).
Similar effects are evident in finance where Schwert [2003] and McLean and Pontiff [2014] find that the impact of famous finance anomalies is greatly diminished out-of-sample, or never existed in the first place.</p><p>So where does this leave us? First, there is no reason to think that there is any difference between physical sciences and finance. Most of the empirical research in finance, whether published in academic journals or put into production as an active trading strategy by an investment manager, is likely false. Second, this implies that half the financial products (promising outperformance) that companies are selling to clients are false.</p><p>To be clear, we are not accusing asset managers of knowingly selling false products. We are pointing out that the statistical tools being employed to evaluate these trading strategies are inappropriate. This critique also applies to much of the academic empirical literature in finance, including many papers by one of the authors of this article (Harvey).</p><p>It is also clear that investment managers want to promote products that are most likely to outperform in the future. That is, there is a strong incentive to get the testing right. No one wants to disappoint a client and no one wants to lose a bonus, or a job. Employing the statistical tools of multiple testing in the evaluation of trading strategies reduces the number of false discoveries.</p></blockquote></div><p>What are your thoughts on this?</p>]]></content>
			<author>
				<name><![CDATA[vinoc]]></name>
				<uri>https://forexsb.com/forum/user/8764/</uri>
			</author>
			<updated>2015-11-08T09:12:26Z</updated>
			<id>https://forexsb.com/forum/post/32214/#p32214</id>
		</entry>
</feed>
