When investment managers launch a new systematic fund, they often show hypothetical performance data in the marketing literature. Here, we find evidence that this data can be significantly over-optimistic compared to subsequent realised performance. We therefore conclude that hypothetical performance data should be approached with caution, and not deemed comparable to a real track record.
Hypothetical performance comes in various guises: backtests, simulations, or pre-inception performance data. Various rules exist regarding its use, and when published it tends to be followed by disclaimers highlighting the fact that the results are not the product of actual trading. Even so, hypothetical performance is hard to ignore entirely when it contains relevant information for evaluating an investment strategy.
The task of navigating hypothetical track records is not restricted to the potential clients of investment managers; it also falls to the “quants” who are tasked with designing investment systems. Researching strategies, or building a portfolio, will usually involve backtesting a system on historical data to evaluate and estimate its performance. There are many reasons why such results can be misleading:
- Practicalities: The simulation may not take into account unforeseen costs or restrictions. These could include market opening hours; the liquidity available; the precise details of how a portfolio is margined; and data feeds not available at the time or containing errors which have since been corrected in historical databases.
- Bias: A number of different rules or many different parameters may have been tested during the research process. Selecting the rules with the best historical performance could lead to selection bias or overfitting.
- Changing markets: There is no guarantee that markets will continue to behave as they have in the past, and markets can change in response to the actual system being tested.
- Costs: Transaction costs are very hard to model, and even gaining realistic estimates of historical brokerage fees, taxes, and funding costs can be tricky.
- Sample error: A limited amount of past data yields only an estimate of the true underlying population statistics of a set of returns, so a hypothetical track record may contain more than its fair share of luck.
- Black swans: Events that have never been seen before in past data may occur in the future.
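The selection effect described under "Bias" above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of any real research process: the number of candidate rules (50), the five-year monthly history, and the 3% monthly volatility are all hypothetical choices, and every candidate has zero true edge by construction.

```python
import random
import statistics

def best_of_n_backtests(n_strategies=50, n_months=60, monthly_vol=0.03, seed=0):
    """Simulate zero-edge strategies and keep the one with the best backtest.

    Returns (backtest mean, subsequent mean) monthly return of the chosen rule.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    best_in, best_out = None, None
    for _ in range(n_strategies):
        # Both periods are pure noise around zero: no rule has any real skill.
        in_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        out_sample = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        mean_in = statistics.fmean(in_sample)
        if best_in is None or mean_in > best_in:
            best_in, best_out = mean_in, statistics.fmean(out_sample)
    return best_in, best_out

# Repeat the whole research process many times and average the outcome.
results = [best_of_n_backtests(seed=s) for s in range(200)]
avg_in = statistics.fmean(r[0] for r in results)
avg_out = statistics.fmean(r[1] for r in results)
print(f"selected backtest: {12 * avg_in:.1%} p.a. (simple annualisation)")
print(f"same rule afterwards: {12 * avg_out:.1%} p.a.")
```

Because the best backtest is chosen after the fact, its historical return is biased upwards even though nothing in the simulation has predictive power; the subsequent return of the same rule averages out close to zero.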
A previous study analysed performance data within the context of the ETF industry, and found clear evidence that, on average, hypothetical performance of new indices (which the ETFs were designed to track) was optimistic and misleading.
The study found that 370 indices had an average annualised outperformance of 10.3% pre-inception relative to the MSCI US Broad Market Index, but would go on to underperform by 0.9% per annum after launch.
Here, we focus on CTA hypothetical track records, and we find evidence that these also display unrealistic levels of outperformance relative to a common CTA benchmark. We start by outlining our dataset and methodology, then present our results, and conclude with a discussion.
Data and methodology: funds
We collect hypothetical track records and live performance from marketing literature and reports for over 40 funds. We only use funds that have at least six months of both pre-inception and live performance data, up to and including June 2013, and that claim to follow a trend-following strategy on futures markets for a significant part of their portfolio. We verify this claim by insisting that they have a correlation of at least 50% to the Barclay BTOP50 Index. This reduces our dataset to 18 funds.
For each of these, we use data from five years before inception up to two years after, as availability dictates. Most of our data is for relatively new funds: only half have more than two years of live data. In Figure 1, we align the fund returns at their inception dates, and plot the cumulative returns of the individual track records, as well as the average. The inception date marks the point when the track records switch from hypothetical to actual returns. Here we can already see a propensity for the pre-inception data to outperform the live data.
Figure 1: Cumulative returns of 18 CTA funds, pre- and post-inception
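The alignment behind Figure 1 can be sketched roughly as follows. The function names, the dictionary representation, and the toy numbers are hypothetical, and the sketch assumes monthly returns that are cumulated by simple summation; the source does not say whether returns were summed or compounded.

```python
def align_at_inception(monthly_returns, inception_index,
                       months_before=60, months_after=24):
    """Map each return to an offset in months, where 0 is the first live month.

    Keeps at most five years before and two years after inception,
    as availability dictates.
    """
    aligned = {}
    for i, r in enumerate(monthly_returns):
        offset = i - inception_index
        if -months_before <= offset < months_after:
            aligned[offset] = r
    return aligned

def average_cumulative(aligned_funds):
    """Average the funds available at each offset, then cumulate by summation."""
    offsets = sorted({k for fund in aligned_funds for k in fund})
    total, curve = 0.0, {}
    for k in offsets:
        values = [fund[k] for fund in aligned_funds if k in fund]
        total += sum(values) / len(values)
        curve[k] = total
    return curve

# Toy data (made up): fund A goes live in its 4th month, fund B in its 2nd.
fund_a = align_at_inception([0.02, 0.01, 0.03, -0.01, 0.00], inception_index=3)
fund_b = align_at_inception([0.01, 0.02, 0.00, 0.01], inception_index=1)
curve = average_cumulative([fund_a, fund_b])
```

Negative offsets in `curve` correspond to the hypothetical period and non-negative offsets to live trading, so the kink at offset 0 is what the figure makes visible.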
Data and methodology: benchmark
Although the CTA industry is aiming for absolute returns, it is important to compare the track records to a benchmark to account for changes in performance over time that are not due to any of the issues already discussed in the introduction.
We compare the track records to the Barclay BTOP50 Index. This widely used industry benchmark is an equally weighted portfolio of the largest CTAs, which between them account for at least 50% of the industry assets, with 20 constituents in 2013. We estimate the average volatility of the constituents to be 11.5% annualised, based on data from the Barclay Hedge database for the past 10 years. Note that the volatility of the index will be lower than this, due to the diversification effect of adding the funds' returns together. When comparing the performance of a single fund to this index, it is the average volatility of the constituents which we should match, hence we have risk-adjusted the individual track records to have annualised volatilities of 11.5%.
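A minimal sketch of this risk adjustment is shown below. It assumes monthly returns, annualisation by the square root of 12, and a single scaling factor estimated over the whole track record; the source does not specify these details, and the function name and sample data are hypothetical.

```python
import math
import statistics

TARGET_VOL = 0.115  # average constituent volatility quoted in the text

def scale_to_target_vol(monthly_returns, target_vol=TARGET_VOL):
    """Rescale a return series so its annualised volatility equals target_vol."""
    realised = statistics.stdev(monthly_returns) * math.sqrt(12)
    factor = target_vol / realised
    return [r * factor for r in monthly_returns]

# Made-up monthly returns for illustration only.
returns = [0.04, -0.02, 0.01, 0.03, -0.05, 0.02]
scaled = scale_to_target_vol(returns)
scaled_vol = statistics.stdev(scaled) * math.sqrt(12)
```

Scaling by a constant changes the mean return in the same proportion as the volatility, so ratios of return to risk are unaffected; only the level of risk is made comparable across funds.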
The performance of the benchmark is shown in Figure 2. The average return for the Barclay BTOP50 Index for the 20-year period July 1993 to June 2013 is 5.9%, annualised. As Figure 2 shows, the performance of this benchmark has decreased over time, with an average annualised return of 0.2% over the past five years, which is consistent with other studies into the performance of trend-following strategies over time.
Figure 2: Cumulative monthly performance of CTA benchmark
Results
Figure 1 shows an average annualised return of 16.9% pre-inception, but only 3.8% in the live period. But, as Figure 2 shows, trend-following industry performance has not been consistent over time. To remove this effect from our data, we subtract benchmark returns from fund returns and average the outperformance in Figure 3. We find that the optimistic outperformance of 11.5% falls to 1.5% in live trading, and that only 12 out of 18 funds continue to outperform. We assess the significance of this result in two ways. First, we consider whether the pre-inception and live results are consistent with each other. That is, what are the chances of a 10% performance difference between a five-year and a two-year period, if the true underlying average return is the same in both parts of the data? We use Welch's t-test to evaluate this probability, and we find that the difference of 10% has a t-statistic of 5.3, giving a p-value of less than one in a million.
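For readers unfamiliar with Welch's t-test, the calculation can be sketched as below. This is an illustration, not the exact procedure used in the study: the source does not say which observations enter each sample or how the p-value was computed, so the helper names, the toy samples, and the use of a one-sided normal approximation for the tail probability are all assumptions.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two samples may have different
    variances and different lengths.
    """
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error of mean A
    vb = statistics.variance(sample_b) / nb  # squared standard error of mean B
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof

def one_sided_p_normal(t):
    """Upper-tail probability under a normal approximation (assumes large dof)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# Toy samples (made up) to show the mechanics.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
p_for_reported_t = one_sided_p_normal(5.3)
```

Under this normal approximation, a statistic of 5.3 corresponds to a tail probability below one in a million. With few degrees of freedom the Student t distribution should be used instead, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.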
Second, we use a Monte Carlo simulation to help us assess the significance of our result. We generate random data with similar properties to our 18 funds, but with no expected outperformance. For each fund we take the benchmark for the corresponding time period, and generate random normal returns with an expected correlation of 50% to the benchmark, the same expected return and the same expected annualised volatility. We then subtract the benchmark from these random returns and average over the resulting 18 outperformance series. We repeat this process 1,000 times and plot the simulated results against our actual result in Figure 3.
We find that the pre-inception data is inconsistent with zero outperformance, with none of our 1,000 simulations experiencing such a high level of outperformance. In contrast, the live data appears perfectly consistent with these simulations. Therefore we conclude that the pre-inception data is not consistent with the live data.
Figure 3: Cumulative outperformance of CTA funds relative to the benchmark
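The Monte Carlo procedure described above can be sketched as follows. Several details are assumptions rather than facts from the study: the benchmark windows here are random stand-ins because the real BTOP50 data is not reproduced in this article, each window is 60 months long, "the same expected return" is read as the benchmark's sample mean, and the synthetic funds are given the 11.5% annualised volatility used for the risk adjustment.

```python
import math
import random
import statistics

def simulate_fund(benchmark, rng, rho=0.5, annual_vol=0.115):
    """Synthetic monthly returns with expected correlation rho to the benchmark,
    the benchmark's mean return, and the target annualised volatility."""
    mu = statistics.fmean(benchmark)
    sd = statistics.pstdev(benchmark)
    monthly_vol = annual_vol / math.sqrt(12)
    fund = []
    for b in benchmark:
        z = (b - mu) / sd            # standardised benchmark move
        noise = rng.gauss(0.0, 1.0)  # independent fund-specific shock
        fund.append(mu + monthly_vol * (rho * z + math.sqrt(1 - rho ** 2) * noise))
    return fund

def mean_outperformance(benchmarks, rng):
    """Average annualised outperformance over one synthetic set of funds."""
    per_fund = []
    for bench in benchmarks:
        fund = simulate_fund(bench, rng)
        diffs = [f - b for f, b in zip(fund, bench)]
        per_fund.append(12 * statistics.fmean(diffs))
    return statistics.fmean(per_fund)

rng = random.Random(1)
# Stand-in benchmark windows (hypothetical): 18 funds, 60 months each.
benchmarks = [[rng.gauss(0.004, 0.02) for _ in range(60)] for _ in range(18)]
sims = [mean_outperformance(benchmarks, rng) for _ in range(1000)]
# Share of simulations at or above the 11.5% pre-inception figure from the text.
share_above = sum(s >= 0.115 for s in sims) / len(sims)
```

By construction the simulated funds have no expected outperformance, so the spread of `sims` shows how much average outperformance luck alone could plausibly produce across 18 funds over a window of this length.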
We have seen that the hypothetical data appears to have different properties to the actual results. We perform several further checks to ensure the robustness of this result. We extend the series, where available, to use 10 years of pre-inception and 5 years of live data. We find that the results remain as significant; the average annual benchmark outperformance is 10.7% pre-inception over 10 years, and 4.1% live over 5 years, with a t-statistic of 4.7 for the difference (p-value less than 1 in 100,000).
We also check the robustness of our result by selecting only 10 funds at random (from the 18, without replacement) and repeating the analysis comparing their performance to the BTOP50 using five years of pre-inception and two years of live data. Performing 100 such random selections, we find the average pre-inception performance is always better than the live performance, and the difference has t-statistics in the range of 2.4 to 5.4, averaging around 3.8, which corresponds to a p-value of less than 1 in 10,000.
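The random-subset check can be sketched in the same spirit. The data below is synthetic and the function names are hypothetical; pooling the monthly outperformance of the selected funds into one pre-inception sample and one live sample is also an assumption, since the source does not describe how the subsets were aggregated.

```python
import math
import random
import statistics

def welch_t_stat(a, b):
    """Welch's t-statistic for the difference in means of a and b."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)

def subset_t_stats(pre_by_fund, live_by_fund, k=10, n_draws=100, seed=0):
    """t-statistics of pre-inception minus live outperformance
    for random k-fund subsets drawn without replacement."""
    rng = random.Random(seed)
    fund_ids = list(pre_by_fund)
    stats = []
    for _ in range(n_draws):
        chosen = rng.sample(fund_ids, k)
        pre = [x for f in chosen for x in pre_by_fund[f]]
        live = [x for f in chosen for x in live_by_fund[f]]
        stats.append(welch_t_stat(pre, live))
    return stats

# Synthetic monthly outperformance (made up): 18 funds, 60 months pre, 24 live.
data_rng = random.Random(42)
pre_by_fund = {i: [data_rng.gauss(0.012, 0.03) for _ in range(60)] for i in range(18)}
live_by_fund = {i: [data_rng.gauss(0.0, 0.03) for _ in range(24)] for i in range(18)}
t_stats = subset_t_stats(pre_by_fund, live_by_fund)
print(min(t_stats), statistics.fmean(t_stats), max(t_stats))
```

If the result were driven by one or two funds, the t-statistics would vary widely with the subset chosen; a consistently positive range suggests the effect is spread across the sample.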
Discussion
Bias can seep into hypothetical track records in many ways. Some of its sources are obvious, whilst others are more subtle. The CTA track records in this study show there is significant evidence of optimism in the hypothetical results, whilst actual performance is very much in line with average industry returns.
We have not provided details of the funds for which we were able to obtain hypothetical data, because the purpose of this study is not to suggest that specific industry participants have set out to mislead investors. Rather, we believe this is evidence of an endemic problem, an opinion that is backed up by the Vanguard study into exchange-traded funds.
There is value in being able to produce a hypothetical track record to understand how a strategy might have performed historically. However, its interpretation should be treated with caution, especially if the assumptions behind it are unclear. Without a full understanding of how a retrospective set of investment decisions was taken, it is safe to assume that any such results will contain a large dose of optimism. After all, it is unlikely anyone would advertise a fund that appears to underperform the industry average in the past, but half of funds will do just that in the future.
NFA Manual, Interpretive Notices, 9025, www.nfa.futures.org/rulebook/rules.aspx
J. M. Dickson, S. Padmawar, S. Hammer, Joined at the hip: ETF and index development, Vanguard research, 2012.
Winton Research, Historical Performance of Trend Following, 2013.
Winton Research, Blinded by Optimism, 2013.
This article contains simulated or hypothetical performance results that have certain inherent limitations. Unlike the results shown in an actual performance record, these results do not represent actual trading. Also, because these trades have not actually been executed, these results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity and cannot completely account for the impact of financial risk in actual trading. There are numerous other factors related to the markets in general or to the implementation of any specific trading program which cannot be fully accounted for in the preparation of hypothetical performance results and all of which can adversely affect actual trading results. Simulated or hypothetical trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any investment will or is likely to achieve profits or losses similar to those being shown.