Backtesting Done Right
Quantitative Investments
In the first part of this mini-series, we saw how backtesting can be both essential and deceptive: a valuable tool to compress decades of market history into hours of computation, but also vulnerable to biases such as look-ahead errors, survivorship distortions, and data snooping. The lesson was clear—careless backtesting can make weak strategies look strong.
In this second part, we turn to the constructive side of the story. What does it take to design backtests that genuinely inform investment decisions? We outline best practices that help separate meaningful signal from misleading noise, and discuss how to take the step from historical simulation to disciplined live implementation.
Best Practices for Reliable Backtesting
Backtesting is not about chasing the highest in‑sample Sharpe ratio. It is only meaningful if an investment strategy remains economically and statistically sound given realistic data limitations and trading frictions. The following four guidelines help to ensure that historical simulations form a sound basis for evaluating systematic investment strategies:
- Garbage in, garbage out: Make sure that data quality and data integrity are as good as possible. This means that sample data is properly time‑stamped and survivorship‑free, such that any delistings, bankruptcies or corporate actions are preserved. Any fundamental firm characteristics or macro data should be properly lagged to the date of first publication and should exclude any information that was unobservable at the time of trading (Arnott et al., 2019).
- Realistic trading frictions: A historical simulation should be based on realistic assumptions about trading costs (e.g. bid‑ask spreads and market impact costs), capacity limits, and management fees. Ignoring these costs is one of the main reasons why the actual performance of an investment strategy can fall short of the backtested results.
- Out‑of‑sample testing: Start with a clean separation of the training and testing datasets, i.e. develop the investment strategy on the training data and evaluate it on an untouched test sample. Ideally, the training window should slide forward period by period, so that every simulated trade in the backtest mirrors the decision-making process of the live version of the investment strategy as closely as possible (historical paper trading). Be mindful that personal memories of past market developments can influence the development of an investment strategy and thereby quietly introduce a subtle form of look‑ahead bias. Make sure that the backtesting exercise does not materially depend on your knowledge of the past. As Arnott et al. (2019, p. 70) remind us, “no true out-of-sample data exist; the only true out of sample is the live trading experience”.
- Robustness checks: Repeat the backtest with alternative portfolio sizes, investment universes, weighting schemes, rebalancing frequencies and sub-sample periods. A robust, well-performing investment strategy should persist across most of these variations, whereas an over‑fitted strategy is unlikely to do so (Bailey & López de Prado, 2014).
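The out-of-sample and trading-friction guidelines can be combined in a walk-forward loop. The following is only a minimal sketch under stated assumptions: the trend rule, the function names (`trend_signal`, `in_sample_pnl`, `walk_forward`) and the flat `cost_per_switch` charge are hypothetical illustrations, not a method taken from the cited papers. At each step the parameter is selected on the trailing training window only and then applied to the next, unseen period, net of a simple cost.

```python
from statistics import mean


def trend_signal(history, lookback):
    """Toy rule (assumption): hold the asset (1) if the trailing mean
    return over `lookback` periods is positive, otherwise stay flat (0)."""
    return 1 if mean(history[-lookback:]) > 0 else 0


def in_sample_pnl(window, lookback):
    """Gross P&L of the toy rule, computed inside the training window only."""
    return sum(
        trend_signal(window[:i], lookback) * window[i]
        for i in range(lookback, len(window))
    )


def walk_forward(returns, train_len, lookbacks, cost_per_switch=0.001):
    """Return the net out-of-sample return for each period after the first
    `train_len` periods. Decisions for period t use returns[:t] only."""
    net_returns = []
    prev_position = 0
    for t in range(train_len, len(returns)):
        window = returns[t - train_len:t]  # data observable before period t
        # Pick the lookback that worked best in the training window.
        best = max(lookbacks, key=lambda lb: in_sample_pnl(window, lb))
        position = trend_signal(window, best)
        # Flat cost charged whenever the position changes (simplification).
        cost = cost_per_switch * abs(position - prev_position)
        net_returns.append(position * returns[t] - cost)
        prev_position = position
    return net_returns
```

Because the decision for period t never reads `returns[t]` or anything later, changing a future return cannot alter earlier simulated trades, which is the property a look-ahead-free backtest should have. A realistic cost model would replace the flat charge with spread and market-impact estimates.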
By following these practices, one can better recognize whether the apparent advantage of a strategy is likely due to a genuine investment edge or to the idiosyncrasies of a particular data set.
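One way to organise the robustness checks is to rerun the same backtest over a grid of design choices and look at how many variants still perform. The sketch below assumes a user-supplied function `run_backtest(**params)` that returns a list of periodic returns; that function, the parameter names in the grid, and the use of an annualised Sharpe ratio as the summary statistic are all illustrative assumptions.

```python
from itertools import product
from statistics import mean, stdev


def sharpe(returns, periods_per_year=12):
    """Annualised Sharpe ratio of periodic excess returns (0.0 if no variation)."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * periods_per_year ** 0.5


def robustness_grid(run_backtest, grid):
    """Run `run_backtest(**params)` for every combination of values in `grid`
    and return a dict mapping each combination (as a tuple) to its Sharpe ratio."""
    names = list(grid)
    results = {}
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results[combo] = sharpe(run_backtest(**params))
    return results


def share_positive(results):
    """Fraction of tested variants with a positive Sharpe ratio."""
    return sum(s > 0 for s in results.values()) / len(results)
```

A grid such as `{"n_holdings": [20, 50, 100], "rebalance_months": [1, 3, 12]}` (hypothetical names) would produce nine variants. A strategy whose result survives in most of them is more credible than one that only works for a single combination. Note that every variant tried counts as an additional trial when judging statistical significance.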
Best Practices to go from Backtesting to Live Trading
A well‑designed backtest is a necessary but not sufficient condition for committing real capital to an investment strategy. The following steps are often advisable when moving from the backtest of an investment strategy to its live version:
- Paper trading in real time. Execute the investment strategy in parallel with the market, but without risking real money. This allows you to check whether the real-time data feed, the timing of the signals, and the trading characteristics (e.g. turnover) match the expectations from backtesting.
- Gradual capital deployment. Introduce live trades on a modest scale and confirm that the actual implementation costs are in line with those assumed in the backtest and observed in real-time paper trading. If this is the case, gradually scale up the real-money allocation to the strategy.
- Continuous monitoring of the live strategy. Track the performance, turnover and implementation costs of the investment strategy against the expectations from the backtest and against real-time paper trading. Maintain the investment strategy even through phases of temporary underperformance: if it has a genuine edge, it should pay off over the medium to long term.
This step-by-step approach keeps the live implementation aligned with the expectations from the backtest and supports disciplined execution of the investment strategy over time.
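The cost check behind gradual capital deployment can be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: the function names, the `tolerance` of 1.5 times the assumed cost and the doubling `step` are hypothetical choices that would need to be calibrated to the strategy at hand.

```python
from statistics import mean


def implementation_shortfall_bps(decision_price, fill_price, side):
    """Cost of a fill relative to the price at decision time, in basis points.
    `side` is +1 for a buy and -1 for a sell; positive values are costs."""
    return side * (fill_price - decision_price) / decision_price * 1e4


def next_allocation(current, target, realized_cost_bps, assumed_cost_bps,
                    tolerance=1.5, step=2.0):
    """Scale capital up only while realized costs stay close to the backtest
    assumption; otherwise hold the current size and investigate."""
    if mean(realized_cost_bps) > tolerance * assumed_cost_bps:
        return current
    return min(target, current * step)
```

For example, with an assumed cost of 5 basis points per trade, observed costs averaging 5 basis points would allow the allocation to double (capped at the target), while observed costs averaging 11 basis points would keep it unchanged.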
Conclusion
In Part I, we saw how even a toy example with five stocks could produce wildly misleading results when biases or shortcuts crept in. To avoid such pitfalls in real-world applications, backtesting must be approached with rigor and discipline. The goal is to provide a realistic picture of the long-term performance of an investment strategy. To achieve this, the backtest of an investment strategy needs to:
- Rely on clean, time-stamped, and survivorship-free sample data. This ensures that tomorrow’s information never affects yesterday’s investment decisions.
- Take into account market frictions (e.g. bid-ask spreads) and implementation costs (e.g. management fees). Ignoring these costs is a major reason why the actual performance of an investment strategy may lag behind the backtest results.
- Refrain from parameter hunting. Robust investment strategies work across a plateau of parameter settings rather than only on a knife-edge.
Adherence to these principles enables backtesting to replay history as faithfully as possible and largely without illusion. Even so, the most rigorous historical simulation is no guarantee of tomorrow’s success: markets will always find new ways to surprise. What careful backtesting does offer is the well-grounded confidence that the expected performance of an investment strategy is based on evidence and not just the ghostly afterglow of an over-fitted past.
References
Arnott, R., Harvey, C. R., & Markowitz, H. (2019). A backtesting protocol in the era of machine learning. Journal of Financial Data Science, 1(1), 64–74.
Bailey, D., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting and non-normality. Journal of Portfolio Management, 40(5), 94–107.
Important Information: The content is created by a company within the Vontobel Group (“Vontobel”) for institutional clients and is intended for informational and educational purposes only. Views expressed herein are those of the authors and may or may not be shared across Vontobel. Content should not be deemed or relied upon for investment, accounting, legal or tax advice.
Results and/or use of backtesting are hypothetical in nature and use historical data without live trading results. Past performance is no guarantee of future performance, and changes in market conditions, liquidity, or execution costs could significantly impact live results. Backtesting carries the risk of overfitting, where a strategy is too closely aligned with past data and fails to adapt to new market conditions. This information is for educational purposes and does not constitute investment advice.
Vontobel makes no express or implied representations about the accuracy or completeness of this information, and the reader assumes any risks associated with relying on this information for any purpose. Vontobel neither endorses nor is endorsed by any mentioned sources.