Today’s abundance of both data and computing power means one is able to quickly conduct backtests of strategies on many markets over long periods of time. However, without an appropriate understanding of the data, it is all too easy to discover fool’s gold. Here we discuss such an example in the context of historical equity prices.
Historical price data generally comes in two flavours: quotes and trades. Quotes show the prices at which participants are willing to buy and sell an asset. Together they form the order book, in which the most competitive price levels define the best bid and best ask prices. The difference between these two prices is called the bid-ask spread.
Trade prices simply refer to prices at which actual trades took place. Figure 1 displays an example of best-bid and best-ask data, with actual trades that occurred marked by the red dots. The red line defines the “last trade” time series.
Prices must inevitably be restricted to some level of precision, which is set by the exchange. The resulting minimum possible price movement is referred to as the tick size. The bid-ask spread can never be less than one tick.
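As a small illustration of discretisation onto a tick grid, the sketch below rounds a price down to the nearest tick (the same lower-bound rounding used in the simulation later in this article) and measures a spread in ticks. The helper names `to_tick` and `spread_in_ticks` are hypothetical, and `Decimal` is used only to avoid floating-point surprises such as `0.29 / 0.01` evaluating to slightly less than 29.

```python
from decimal import Decimal, ROUND_FLOOR


def to_tick(price, tick):
    """Round a price down to the tick grid (hypothetical helper)."""
    p, t = Decimal(str(price)), Decimal(str(tick))
    # Floor the number of whole ticks, then convert back to a price.
    return (p / t).to_integral_value(rounding=ROUND_FLOOR) * t


def spread_in_ticks(bid, ask, tick):
    """Bid-ask spread measured in ticks; at least 1 on a normal, uncrossed book."""
    return int((Decimal(str(ask)) - Decimal(str(bid))) / Decimal(str(tick)))


print(to_tick(10.07, 0.125))                # 10.000 with 1/8-dollar ticks
print(to_tick(10.07, 0.01))                 # 10.07 with one-cent ticks
print(spread_in_ticks(10.0, 10.125, 0.125)) # 1: the tightest possible spread
```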
Historical trade prices are much easier to obtain than quote prices: in the case of US stock markets and large futures markets, they have been recorded for decades. Quote prices, on the other hand, tend to be available only for more recent history. This is, in part, a by-product of exchanges moving from “open outcry” to electronic execution.
For this reason, it is common to use trade prices to backtest trading ideas. When you do this, you have no knowledge of whether each trade happened at the bid or at the ask. Imagine considering only the trade prices in Figure 1: the price would seemingly bounce between two levels. This well-known phenomenon is called the bid-ask bounce. It arises because participants who want to trade immediately cross the spread, so buyer-initiated trades print at the ask and seller-initiated trades print at the bid.
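A toy simulation makes the bounce visible. It assumes, hypothetically, that the quotes never move and that each trade is equally likely to be a buyer at the ask or a seller at the bid. Under those assumptions, successive trade-price changes have a lag-1 autocorrelation of about -0.5, even though the underlying value never changes.

```python
import numpy as np


def bounce_trades(n_trades=100_000, bid=99.875, ask=100.0, seed=0):
    # Hypothetical: fixed quotes; each trade is a buyer at the ask or a seller at the bid, 50/50.
    rng = np.random.default_rng(seed)
    return np.where(rng.random(n_trades) < 0.5, ask, bid)


def lag1_autocorr(x):
    # Sample lag-1 autocorrelation of a 1-D series.
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))


changes = np.diff(bounce_trades())
print(lag1_autocorr(changes))  # close to -0.5
```

The -0.5 follows from each change being half the spread times the difference of two independent random signs: consecutive changes share one sign term with opposite signs.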
A computer backtest of a naïve short-term mean-reversal strategy on these prices would produce positive returns, but returns that could never actually be realised, because in practice it is not possible to buy at the bid and sell at the ask.
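The sketch below, under hypothetical parameters (a random-walk mid-price with a constant $1/8 spread, and trades equally likely at the bid or the ask), compares a one-trade mean-reversal rule marked at last-trade prices with the same rule when it has to buy at the ask and sell at the bid. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np


def naive_vs_realistic(n=100_000, spread=0.125, mid_vol=0.01, seed=1):
    rng = np.random.default_rng(seed)
    # Hypothetical quotes: random-walk mid-price with a constant spread around it.
    mid = 100.0 + np.cumsum(rng.normal(0.0, mid_vol, n))
    bid, ask = mid - spread / 2, mid + spread / 2
    # Each trade is a buyer lifting the ask or a seller hitting the bid, 50/50.
    trade = np.where(rng.random(n) < 0.5, ask, bid)

    d_trade = np.diff(trade)
    # Mean-reversal rule: after a downtick hold +1 unit for one trade, after an uptick hold -1.
    pos = -np.sign(d_trade[:-1])  # position held from trade t to t+1, for t = 1..n-2

    # Naive P&L: enter and exit at the last trade price.
    naive = pos * d_trade[1:]
    # Realistic P&L: longs pay the ask and exit at the bid; shorts do the reverse.
    long_pnl = bid[2:] - ask[1:-1]
    short_pnl = bid[1:-1] - ask[2:]
    realistic = np.where(pos > 0, long_pnl, np.where(pos < 0, short_pnl, 0.0))
    return float(naive.mean()), float(realistic.mean())


naive, realistic = naive_vs_realistic()
print(naive, realistic)  # roughly +0.03 and -0.125 per trade
```

With a mid-price that moves little relative to the spread, the naive rule earns about a quarter of the spread per trade purely from the bounce, while paying the full spread on every round trip turns it into a steady loss.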
We have focused on intraday prices so far, but more generally the bid-ask bounce effect is significant on timescales over which the price volatility is comparable to the spread. In the past, this has also had implications for backtests on daily price data.
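One way to make “comparable to the spread” precise is Roll’s (1984) model of the bounce, assumed here as an illustration rather than taken from this article. The observed trade price is a mid-price $m_t$ that follows a random walk with variance $\sigma^2$ per unit time, plus or minus half the spread $s$, with the side $q_t$ drawn independently for each trade. Sampling prices every $T$ units of time gives:

```latex
\begin{aligned}
\tilde{p}_t &= m_t + \tfrac{s}{2}\, q_t, \qquad q_t \in \{-1, +1\} \text{ i.i.d.}, \qquad
  m_t - m_{t-T} \sim \mathcal{N}(0, \sigma^2 T) \\
\Delta \tilde{p}_t &= (m_t - m_{t-T}) + \tfrac{s}{2}\,(q_t - q_{t-T}) \\
\operatorname{Var}(\Delta \tilde{p}_t) &= \sigma^2 T + \tfrac{s^2}{2}, \qquad
\operatorname{Cov}(\Delta \tilde{p}_t, \Delta \tilde{p}_{t-T}) = -\tfrac{s^2}{4} \\
\rho_1(T) &= \frac{-s^2/4}{\sigma^2 T + s^2/2}
\end{aligned}
```

As $T \to 0$ this recovers the autocorrelation of $-1/2$ of a pure bounce. It stays material while $\sigma^2 T$ is of the same order as $s^2$, and only fades like $1/T$ beyond that, so wide spreads can leave a trace even in daily data.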
Several US exchanges began moving to decimalised pricing in the late 1990s, abandoning a 200-year tradition of trading in tick sizes of 1/8 of a dollar. With ticks, and consequently spreads, this large, even daily trade prices can contain a spurious bid-ask bounce signal.
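To put the old tick size in perspective, the arithmetic below expresses a one-tick spread as a fraction of the share price. The price levels are hypothetical examples: a one-tick spread on a $20 stock is 62.5 basis points with 1/8-dollar ticks, against 5 basis points with one-cent ticks.

```python
def one_tick_spread_bps(price, tick):
    """A one-tick bid-ask spread as a fraction of the price, in basis points."""
    return tick / price * 10_000


# Hypothetical price levels, comparing 1/8-dollar ticks with one-cent ticks.
for price in (10.0, 20.0, 50.0):
    eighths = one_tick_spread_bps(price, 1 / 8)
    cents = one_tick_spread_bps(price, 0.01)
    print(f"${price:.0f}: {eighths:.1f} bps with 1/8 ticks, {cents:.1f} bps with 1-cent ticks")
```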
In the following example, we look at a backtest of a short-term mean-reversal strategy trading S&P 500 stocks since 1963, using last-trade closing prices.
On each date, we calculate a weekly momentum signal and construct equal-weighted high- and low-momentum portfolios. Our strategy goes long the low-momentum portfolio and short the high-momentum portfolio. Figure 2 plots the cumulative returns of this naïve backtest, normalised by their volatility.
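The article does not publish its code, so the following is only a hypothetical sketch of a strategy of this shape. It assumes that “weekly momentum” is the trailing 5-day return, that the high- and low-momentum portfolios are the top and bottom deciles, that positions are formed at one close and held to the next, and that there are 252 trading days per year. Like the naïve backtest, it assumes trades happen at the last closing price.

```python
import numpy as np


def reversal_pnl(prices, lookback=5, quantile=0.1):
    """Daily P&L of long-low / short-high momentum; prices has shape (days, stocks)."""
    rets = prices[1:] / prices[:-1] - 1.0                # daily simple returns
    mom = prices[lookback:] / prices[:-lookback] - 1.0   # trailing 'weekly' return
    k = max(1, int(quantile * prices.shape[1]))          # stocks per leg (assumed deciles)
    order = np.argsort(mom[:-1], axis=1)                 # rank stocks by momentum each day
    next_ret = rets[lookback:]                           # return over the following day
    low = np.take_along_axis(next_ret, order[:, :k], axis=1).mean(axis=1)
    high = np.take_along_axis(next_ret, order[:, -k:], axis=1).mean(axis=1)
    return low - high                                    # equal-weighted long low, short high


def annualised_sharpe(pnl, periods_per_year=252):
    return np.sqrt(periods_per_year) * pnl.mean() / pnl.std(ddof=1)


def vol_normalised_cumulative(pnl):
    # Cumulative P&L in units of daily P&L volatility, as plotted in a Figure 2-style chart.
    return np.cumsum(pnl / pnl.std(ddof=1))


# Usage on made-up data: geometric random-walk prices for 100 stocks over 300 days.
rng = np.random.default_rng(0)
prices = 20.0 * np.cumprod(1.0 + rng.normal(0.0, 0.02, (300, 100)), axis=0)
pnl = reversal_pnl(prices)
print(annualised_sharpe(pnl), vol_normalised_cumulative(pnl)[-1])
```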
The backtest looks exceptional on the early data, with a five-year annualised Sharpe ratio of about 4 in the mid-1990s. However, as decimalisation is gradually introduced, the results start to look less flattering.
To determine whether this result can be wholly attributed to the tick size, we create a Monte Carlo simulation that trades a mean-reversal signal on a portfolio of 500 simulated stocks. The returns of these fake stocks were drawn at random from a normal distribution with a mean of zero, a standard deviation of 3% and a pairwise correlation between stocks of 8%.
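The article does not say how the correlated returns were drawn. One common construction, assumed here, is a one-factor model in which every stock loads equally on a single common shock $Z_t$. It produces exactly the stated volatility $\sigma$ and pairwise correlation $\rho$:

```latex
\begin{aligned}
r_{i,t} &= \sigma \left( \sqrt{\rho}\, Z_t + \sqrt{1-\rho}\, \varepsilon_{i,t} \right),
  \qquad Z_t,\ \varepsilon_{i,t} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1) \\
\operatorname{Var}(r_{i,t}) &= \sigma^2 (\rho + 1 - \rho) = \sigma^2,
  \qquad \operatorname{Cov}(r_{i,t}, r_{j,t}) = \sigma^2 \rho \quad (i \neq j) \\
\operatorname{Corr}(r_{i,t}, r_{j,t}) &= \rho, \qquad \text{with } \sigma = 0.03,\ \rho = 0.08
\end{aligned}
```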
Prices were generated from these returns and then rounded down to the nearest tick. We applied the same signal that was used in the backtest on real data, ran it over five years’ worth of data, and repeated this 10 times for each tick size. Figure 3 shows the results.
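A compact sketch of an experiment of this kind follows. The return distribution (3% volatility, 8% pairwise correlation via a one-factor model), 500 stocks, five years and 10 repetitions follow the text. Everything else is a hypothetical assumption: a $10 starting price, daily steps annualised with 252, a 5-day lookback, decile portfolios, and a floor of one tick so that no price rounds to zero. The same continuous paths are reused for every tick size, so only the discretisation differs.

```python
import numpy as np


def simulate_prices(n_days, n_stocks, rng, sd=0.03, corr=0.08, start_price=10.0):
    # One-factor model: pairwise correlation `corr`, per-period volatility `sd`.
    common = rng.standard_normal((n_days, 1))
    idio = rng.standard_normal((n_days, n_stocks))
    rets = sd * (np.sqrt(corr) * common + np.sqrt(1.0 - corr) * idio)
    return start_price * np.cumprod(1.0 + rets, axis=0)  # start_price is an assumption


def discretise(prices, tick):
    # Round down to the tick grid, never below one tick (assumption: avoids zero prices).
    return np.maximum(np.floor(prices / tick) * tick, tick)


def reversal_sharpe(prices, lookback=5, quantile=0.1, periods_per_year=252):
    rets = prices[1:] / prices[:-1] - 1.0
    mom = prices[lookback:] / prices[:-lookback] - 1.0
    k = max(1, int(quantile * prices.shape[1]))
    order = np.argsort(mom[:-1], axis=1)
    next_ret = rets[lookback:]
    pnl = (np.take_along_axis(next_ret, order[:, :k], axis=1).mean(axis=1)
           - np.take_along_axis(next_ret, order[:, -k:], axis=1).mean(axis=1))
    return np.sqrt(periods_per_year) * pnl.mean() / pnl.std(ddof=1)


def tick_experiment(ticks, n_reps=10, n_days=5 * 252, n_stocks=500, seed=0):
    rng = np.random.default_rng(seed)
    sharpes = {tick: [] for tick in ticks}
    for _ in range(n_reps):
        prices = simulate_prices(n_days, n_stocks, rng)
        for tick in ticks:  # same continuous paths, different tick sizes
            sharpes[tick].append(reversal_sharpe(discretise(prices, tick)))
    return {tick: float(np.mean(s)) for tick, s in sharpes.items()}


print(tick_experiment([0.01, 1 / 16, 1 / 8]))
```

Under these settings one would expect the mean Sharpe ratio to rise with the tick size, in the spirit of Figure 3. The exact values depend heavily on the assumed starting price and will not match the article’s.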
With decimal prices, the backtest behaves as expected: since the data are completely random by design, the strategy does not look profitable.
As we discretise the fake prices with larger and larger tick sizes, the performance of the backtest improves. With a tick size of $1/8, we were able to generate a backtest with an annualised Sharpe ratio of around 4, suggesting that most of the performance in the real-world example can be attributed to the discretisation of prices.
Why does the discretisation of prices produce a successful backtest? Recall that we created prices using random returns with a mean of zero. The continuous price series are unpredictable by design: they are random walks, so no strategy based on past prices can be successful.
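A short derivation, under simplifying assumptions of our own, shows how rounding alone creates predictability. Take a random walk in price $p_t = p_{t-1} + \varepsilon_t$ with i.i.d. increments of variance $\sigma^2$, and observe it rounded down to a tick of size $\delta$, so $\tilde{p}_t = p_t - e_t$ with rounding error $e_t \in [0, \delta)$. When $\sigma \gg \delta$, $e_t$ is approximately uniform with variance $\delta^2/12$, and approximately independent over time and of the increments:

```latex
\begin{aligned}
\tilde{p}_t &= \delta \left\lfloor p_t / \delta \right\rfloor = p_t - e_t,
  \qquad e_t \in [0, \delta), \quad \operatorname{Var}(e_t) \approx \tfrac{\delta^2}{12} \\
\Delta \tilde{p}_t &= \varepsilon_t - e_t + e_{t-1} \\
\operatorname{Cov}(\Delta \tilde{p}_t, \Delta \tilde{p}_{t-1}) &\approx -\operatorname{Var}(e_{t-1}) = -\tfrac{\delta^2}{12},
  \qquad \operatorname{Var}(\Delta \tilde{p}_t) \approx \sigma^2 + \tfrac{\delta^2}{6} \\
\rho_1 &\approx -\frac{\delta^2 / 12}{\sigma^2 + \delta^2 / 6}
\end{aligned}
```

The same rounding error enters one change with a minus sign and the next with a plus sign, so the observed series mean-reverts even though $p_t$ does not. The effect shrinks quickly as the tick becomes small relative to the volatility.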
However, the discrete prices exhibit a mean-reverting behaviour that makes them predictable. This predictability cannot be exploited, because when our strategy picks a low-momentum stock to buy, we have naively assumed that we can trade at the last price. In reality, we would have to trade at the prevailing ask price, eroding all our apparent profits.
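A quick simulation with hypothetical parameters checks this mean reversion directly. It rounds an arithmetic random walk in dollars down to a tick grid and measures the lag-1 autocorrelation of the observed price changes. It then compares the result with $-(\delta^2/12)/(\sigma^2 + \delta^2/6)$, the value obtained if rounding errors are treated as independent and uniform over one tick. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def discretised_autocorr(tick, sigma=0.3, start=50.0, n_paths=2000, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Arithmetic random walks in dollars: unpredictable by construction.
    prices = start + np.cumsum(rng.normal(0.0, sigma, (n_paths, n_steps)), axis=1)
    observed = np.floor(prices / tick) * tick  # round down to the tick grid
    d = np.diff(observed, axis=1)
    x = d - d.mean()
    # Pooled lag-1 autocorrelation of observed price changes across all paths.
    return float((x[:, 1:] * x[:, :-1]).sum() / (x * x).sum())


def predicted_autocorr(tick, sigma):
    var_e = tick ** 2 / 12.0  # variance of a uniform rounding error on [0, tick)
    return -var_e / (sigma ** 2 + 2.0 * var_e)


print(discretised_autocorr(1 / 8), predicted_autocorr(1 / 8, 0.3))  # both roughly -0.014
print(discretised_autocorr(0.01))  # close to zero
```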
In backtesting the hypothetical performance of trading strategies, one runs the risk of making apparent discoveries that a proper understanding of the data will reveal to be false. If a result looks too good to be true, it probably is. In our experience, it often indicates a lack of understanding of the data, unrealistic assumptions in the backtest, or a data-quality issue.
This article contains simulated or hypothetical performance results that have certain inherent limitations. Unlike the results shown in an actual performance record, these results do not represent actual trading. Also, because these trades have not actually been executed, these results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity and cannot completely account for the impact of financial risk in actual trading. There are numerous other factors related to the markets in general or to the implementation of any specific trading program which cannot be fully accounted for in the preparation of hypothetical performance results and all of which can adversely affect actual trading results. Simulated or hypothetical trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any investment will or is likely to achieve profits or losses similar to those being shown.
This article contains information sourced from S&P Dow Jones Indices LLC, its affiliates and third party licensors (“S&P”). S&P® is a registered trademark of Standard & Poor’s Financial Services LLC and Dow Jones® is a registered trademark of Dow Jones Trademark Holdings LLC. S&P make no representation, warranty or condition, express or implied, as to the ability of the index to accurately represent the asset class or market sector that it purports to represent and S&P shall have no liability for any errors, omissions or interruptions of any index or data. S&P does not sponsor, endorse or promote any Product mentioned in this material.