Quantopian’s Academic Paper About In vs. Out-of-Sample Performance of Trading Algorithms

A noteworthy academic paper from the team behind Quantopian:

Authors: Wiecki, Campbell, Lent, Stauth

Title: All that Glitters Is Not Gold: Comparing Backtest and Out-of-Sample Performance on a Large Cohort of Trading Algorithms

Link: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2745220


When automated trading strategies are developed and evaluated using backtests on historical pricing data, there exists a tendency to overfit to the past. Using a unique dataset of 888 algorithmic trading strategies developed and backtested on the Quantopian platform with at least 6 months of out-of-sample performance, we study the prevalence and impact of backtest overfitting. Specifically, we find that commonly reported backtest evaluation metrics like the Sharpe ratio offer little value in predicting out of sample performance (R² < 0.025). In contrast, higher order moments, like volatility and maximum drawdown, as well as portfolio construction features, like hedging, show significant predictive value of relevance to quantitative finance practitioners. Moreover, in line with prior theoretical considerations, we find empirical evidence of overfitting – the more backtesting a quant has done for a strategy, the larger the discrepancy between backtest and out-of-sample performance. Finally, we show that by training non-linear machine learning classifiers on a variety of features that describe backtest behavior, out-of-sample performance can be predicted at a much higher accuracy (R² = 0.17) on hold-out data compared to using linear, univariate features. A portfolio constructed on predictions on hold-out data performed significantly better out-of-sample than one constructed from algorithms with the highest backtest Sharpe ratios.
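To make the headline comparison in the abstract concrete, here is a minimal sketch of how one might compute an annualized Sharpe ratio and the R² of a univariate fit of out-of-sample Sharpe on backtest Sharpe. This is not the paper's code: the function names, the zero risk-free rate, the 252-day annualization, and the sample numbers are all assumptions made for illustration.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns.

    Assumes a zero risk-free rate and 252 trading days per year
    (both are illustrative assumptions, not taken from the paper).
    Raises ZeroDivisionError if the returns have zero variance.
    """
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation
    return mean / std * math.sqrt(periods_per_year)


def r_squared(x, y):
    """R² of a univariate least-squares fit of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical in-sample and out-of-sample Sharpe ratios for five strategies.
is_sharpes = [2.1, 1.4, 0.9, 1.8, 0.3]
oos_sharpes = [0.2, 0.9, -0.1, 0.5, 0.4]
print(f"R² of OOS Sharpe on IS Sharpe: {r_squared(is_sharpes, oos_sharpes):.3f}")
```

An R² close to zero in such a fit means that ranking strategies by backtest Sharpe says very little about their out-of-sample ranking, which is the situation the abstract reports.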

Notable quotations from the academic research paper:

"For the first time, to the best of our knowledge, we present empirical data that can be used to validate theoretical and anecdotal claims about the ubiquity of backtest overfitting and its impact on algorithm selection. This was possible by having access to a unique data set of 888 trading algorithms developed and tested by quants on the Quantopian platform. Analysis revealed several results relevant to the quantitative finance community at large – practitioners and academics alike.

Most strikingly, we find very weak correlations between IS and OOS performance in most common finance metrics including Sharpe ratio, information ratio, alpha. This result provides strong empirical support for the simulations carried out by Bailey et al. [2014]. More specifically, it supports the assumptions underlying their simulations without compensatory market forces to be present which would induce a negative correlation between IS and OOS Sharpe ratio. It is also interesting to compare different performance metrics in their predictability of OOS performance. Highest predictability was achieved by using the Sharpe ratio computed over the last IS year. This feature was also picked up by the random forest classifier as the most predictive feature.

Additionally, we find significant evidence that the more backtests a user ran, the bigger the difference between IS and OOS performance – a direct indication of the detrimental effect of backtest overfitting. This observed relationship is also consistent with Bailey et al.'s [2014] prediction that increased backtesting of multiple strategy variations (parameter tuning) would increase overfitting. Thus, our results further support the notion that backtest overfitting is common and wide-spread. The observed significant positive relationship between amount of backtesting and Sharpe shortfall (IS Sharpe – OOS Sharpe) provides support for a Sharpe ratio penalized by the amount of backtesting (e.g. the "deflated Sharpe ratio" by Bailey & Lopez de Prado [2014]). An attempt to calibrate such a backtesting penalty based on observed data is a promising direction for future research.

Together, these sobering results suggest that a reported Sharpe ratio (or related measure) based on backtest results alone can not be expected to prevail in future market environments with any reasonable confidence.

While the results described above are relevant by themselves, overall, predictability of OOS performance was low (R² < 0.25) suggesting that it is simply not possible to forecast profitability of a trading strategy based on its backtest data. However, we show that machine learning together with careful feature engineering can predict OOS performance far better than any of the individual measures alone. Using these predictions to construct a portfolio of strategies resulted in competitive cumulative OOS returns with a Sharpe ratio of 1.2 that is better than most portfolios constructed by randomly selecting strategies. While it is difficult to extract an intuition about how the Random Forest is deriving predictions, we have provided some indication of which features it deems important. It is interesting to note that among the most important features are those that quantify higher-order moments including skew and tail-behavior of returns (tail-ratio and kurtosis). Together, these results suggest that predictive information can indeed be extracted from a backtest, just not in a linear and univariate way. It is important to note that we cannot yet claim that this specific selection mechanism will work well on future data as the machine learning algorithm might learn to predict which strategy type worked well over the specific OOS time-period most of our algorithms were tested on (for a more detailed discussion of this point, see the limitations section). However, if these results are reproducible on an independent data set or the strategies identified continue to outperform the broad cohort over a much longer time frame, it should be of high relevance to quantitative finance professionals who now have a more accurate and automatic tool to evaluate the merit of a trading algorithm. As such, we believe our work highlights the potential of a data scientific approach to quantitative portfolio construction as an alternative to discretionary capital allocation."
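Two quantities from the quoted discussion are easy to state in code: the Sharpe shortfall (IS Sharpe minus OOS Sharpe) and the higher-order moments of returns that the authors list among the most important features. The sketch below is illustrative only. The function names are hypothetical, the moment estimators are the plain population versions (the paper may use different estimators), and the sample returns are made up.

```python
import statistics


def sharpe_shortfall(is_sharpe, oos_sharpe):
    """Sharpe shortfall as defined in the quotation: IS Sharpe minus OOS Sharpe."""
    return is_sharpe - oos_sharpe


def _central_moment(returns, order):
    """Population central moment of the given order."""
    mean = statistics.fmean(returns)
    return statistics.fmean((r - mean) ** order for r in returns)


def skewness(returns):
    """Population skewness: third central moment over variance to the power 1.5."""
    return _central_moment(returns, 3) / _central_moment(returns, 2) ** 1.5


def excess_kurtosis(returns):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    return _central_moment(returns, 4) / _central_moment(returns, 2) ** 2 - 3.0


# Made-up daily returns with one large loss, to show a negative skew.
sample_returns = [0.004, 0.002, -0.001, 0.003, -0.030, 0.005, 0.001]
features = {
    "skew": skewness(sample_returns),
    "excess_kurtosis": excess_kurtosis(sample_returns),
}
print(features)
print("shortfall:", sharpe_shortfall(is_sharpe=1.5, oos_sharpe=0.4))
```

Features like these would be computed per strategy from its backtest returns and then passed, together with others, to a non-linear model; the quotation names a Random Forest but does not give its configuration, so none is assumed here.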

