Much has been made of the poor performance of quantitative equity strategies over the past five years or so. With a corresponding increase in assets allocated to such strategies, it would seem that overcrowding has squeezed profit margins as markets become more efficient. We caution against drawing conclusions from short time series and emphasise the importance of acknowledging the uncertainty in performance estimates.
In 1993, Fama and French published “Common risk factors in the returns on stocks and bonds”, in which they argued that value stocks earned a risk premium. They showed that an investor can profit from being long high-value stocks and short low-value (or “growth”) stocks.
These ideas appeared to be vindicated a decade or so later, when such strategies performed well after the dot-com bubble. This led to a surge in assets following value-based strategies, and those exploiting similar effects such as size, momentum and quality. These “smart beta” products aim to improve on the performance of a market-capitalisation-weighted index.
Just as quickly as an idea comes into fashion, it can fall out of fashion, especially in the face of a few years of relatively poor performance. In Figure 1, we show the rolling five-year return of the Fama-French Value factor, after risk-adjusting the returns to target an annualised volatility of 10% [1, 2].
Figure 1: Rolling five-year return of the Fama-French Value factor
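The risk adjustment described above can be sketched as follows. This is a minimal illustration rather than the exact procedure behind Figure 1: the function names are hypothetical, and it assumes daily returns, 252 trading days per year, a zero-mean variance estimate, a span-style mapping from "100-day" to a decay rate, and a one-day lag so that each day's scaling uses only earlier data.

```python
import math

def ewma_volatility(returns, span=100, periods_per_year=252):
    """Backwards-looking, exponentially weighted annualised volatility.

    Element t uses only returns up to and including t.
    """
    alpha = 2.0 / (span + 1.0)  # assumed mapping from a 100-day span to a decay rate
    var = None
    vols = []
    for r in returns:
        # Zero-mean assumption: the squared return is the variance observation.
        var = r * r if var is None else (1.0 - alpha) * var + alpha * r * r
        vols.append(math.sqrt(var * periods_per_year))
    return vols

def risk_adjust(returns, target_vol=0.10, span=100, periods_per_year=252):
    """Scale each return by target_vol / (previous day's volatility estimate)."""
    vols = ewma_volatility(returns, span, periods_per_year)
    scaled = []
    for t in range(1, len(returns)):
        prior_vol = vols[t - 1]  # lagged estimate, so there is no look-ahead
        leverage = target_vol / prior_vol if prior_vol > 0 else 0.0
        scaled.append(leverage * returns[t])
    return scaled
```

The first return is dropped because no prior volatility estimate exists for it. Scaling in this way makes periods of high and low market volatility comparable before any five-year returns are computed.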
At first sight, Figure 1 indicates a severe decline in Value factor performance, but there are two questions one should really ask.
1) Are the results significant?
What do we mean by significant? There is a subjective notion of significance, such as whether something has had a real and meaningful effect. But we are talking about significance in a statistical sense: could it be that the performance over this time period is actually consistent with a relatively stable strategy, and the observed decline is no more than a freak fluctuation? Put another way, might the result we see just be down to bad luck?
To investigate this, it is simpler to work with the results from non-overlapping, independent periods rather than a rolling series. We look at five-year performance, with no shared data between the points. We also compute the sample error that we expect for a five-year return. This is the variation you would expect in a result purely from chance, and not something that would recur if history were repeated.
We plot this in Figure 2, on top of the previous rolling performance data, and extend the chart to cover the almost 15-year period from January 2000 to May 2014.
Figure 2: Discrete five-year performance and sample errors versus rolling five-year performance
The red dots mark the results of independent five-year periods, marked in time at the mid-point of the relevant period (the rolling estimates are always backwards looking and so they mark the same result at the end of the five-year period). The horizontal bar spans the period used, and the vertical bar reflects the sample error for these results.
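The discrete five-year returns and their sample errors can be sketched as below. This is an illustrative calculation under stated assumptions, not necessarily the one used for Figure 2: the function name is hypothetical, the inputs are assumed to be monthly returns that are independent within each block, and annualisation is simple (non-compounded).

```python
import math

def discrete_period_stats(monthly_returns, months_per_period=60):
    """Annualised mean return and its sample error for non-overlapping blocks."""
    results = []
    n_periods = len(monthly_returns) // months_per_period  # incomplete tail is dropped
    for k in range(n_periods):
        block = monthly_returns[k * months_per_period:(k + 1) * months_per_period]
        n = len(block)
        mean = sum(block) / n
        var = sum((r - mean) ** 2 for r in block) / (n - 1)  # unbiased sample variance
        annual_return = 12.0 * mean          # simple annualisation
        annual_vol = math.sqrt(12.0 * var)
        years = n / 12.0
        # Standard error of the annualised mean: volatility / sqrt(years of data).
        sample_error = annual_vol / math.sqrt(years)
        results.append((annual_return, sample_error))
    return results
```

Under these assumptions, a strategy run at 10% annualised volatility has a five-year sample error of roughly 10% / sqrt(5), or about 4.5% per annum, which gives a sense of how large the vertical bars should be.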
Using the numbers associated with the red points we can now answer the question of whether the decline is significant. In Table 1, we show the difference between the 2nd and 1st points, and then again between the 3rd and 2nd points.
Table 1: Returns and standard deviation of returns for the three discrete periods
We find that the difference between the 1st and 2nd points is significant at the 92% confidence level. That is, if the true underlying performance were the same in both periods, the chance of getting a result at least as extreme as this would be only 8%. The subsequent decline between the 2nd and 3rd points is less significant.
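One way to carry out such a comparison is a simple z-test on the difference between two independent estimates. The sketch below is an assumption about the form of the test, not a statement of the exact method used for Table 1: it treats the two sample errors as independent and normally distributed, and it reports a two-sided probability. The function name is hypothetical.

```python
import math

def difference_significance(r1, se1, r2, se2):
    """Two-sided p-value for the difference between two independent estimates."""
    # Independent errors add in quadrature.
    z = (r2 - r1) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Probability of a standard normal deviate at least as extreme as |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

The confidence level quoted in the text corresponds to one minus this probability, so a p-value of 0.08 would be reported as significance at the 92% level.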
2) What happens over a longer time period?
Next we need to assess what happens to the result if we use more data. It is easy to zoom in on the part of a chart that backs up one's prejudices and ignore anything that contradicts them. Data and statistics should be objective, but this is rarely the case in practice.
The Fama-French factor data is available from July 1926, and this allows us to extend Figure 2 to a much longer time frame of 90 years in Figure 3.
Figure 3: Five-year performance and sample errors for 18 discrete periods since 1926
Looking back beyond 2000, we see a different story. Rather than 2005 to 2014 being a significantly poor period, it would appear that recent performance is consistent with an average overall performance of around 5% per annum. Further, no overall decline is observed across the 90-year period, so the longer history does not back up the extrapolation one might make from the last three five-year periods.
If anything were to be marked as anomalous in recent history, it would not be the last two points but the early 2000s, when performance was significantly good compared with the 15 years before and the 10 years after. This would suggest not that recent performance is anomalously low, but that the early 2000s performance was anomalously high.
Further tests of stability
The results in Figure 3 look fairly unstable, ranging between -4.6% and 16.6% over the 90-year history. Some scatter would be expected, even for a stable process. In Figure 4, we consider how many points we would expect at different levels from pure chance, assuming they all came from the same underlying distribution.
Figure 4: Expected distribution of performance versus observed distribution of performance
The data show more small and large returns than would be expected from chance alone. This implies that the performance of the strategy is not consistent over time.
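A comparison of this kind can be sketched by binning the discrete five-year results and computing how many would be expected in each bin if all of them were drawn from one normal distribution. The code below is an illustration under assumptions that are not taken from the article: the common distribution is centred on the overall mean, every point shares the same sample error, and the bin edges are chosen by the user. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_vs_observed(returns, sample_error, edges):
    """Observed and expected counts per bin under one common normal distribution."""
    n = len(returns)
    mean = sum(returns) / n  # assumed centre of the common distribution
    bounds = [-math.inf] + list(edges) + [math.inf]
    table = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for r in returns if lo <= r < hi)
        prob = normal_cdf((hi - mean) / sample_error) - normal_cdf((lo - mean) / sample_error)
        table.append((lo, hi, observed, n * prob))
    return table
```

Each row holds the bin bounds, the observed count and the expected count. An excess of observed results in the outermost bins relative to the expected counts is the pattern that points to performance varying over time.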
We have demonstrated how important it is to use all the data you have to draw conclusions about the performance of a system or portfolio. In the case of traditional value strategies, it would appear that recent “poor” performance is consistent with the long-term history of this strategy, and only appears poor when compared to the abnormally high performance seen in the early 2000s.
[1] Factor data from Kenneth French’s Data Library: mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
[2] We use a backwards-looking, rolling and exponentially weighted 100-day volatility estimate.
This article contains simulated or hypothetical performance results that have certain inherent limitations. Unlike the results shown in an actual performance record, these results do not represent actual trading. Also, because these trades have not actually been executed, these results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity and cannot completely account for the impact of financial risk in actual trading. There are numerous other factors related to the markets in general or to the implementation of any specific trading program which cannot be fully accounted for in the preparation of hypothetical performance results and all of which can adversely affect actual trading results. Simulated or hypothetical trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any investment will or is likely to achieve profits or losses similar to those being shown.