Statistically significant: where the results of an experiment are unlikely to have occurred by chance alone.
The Reproducibility Crisis
Increased computing power has prompted soul-searching about the way statistical methods are applied. This is not an issue limited to quantitative investment management: in science more generally, there is a live debate about how researchers determine whether an experiment’s result is “statistically significant”. Significance testing and p-values – devised for empirical investigation at a time when the potential for testing huge numbers of hypotheses did not exist – are facing increasing scrutiny as researchers fail to replicate huge swathes of published research.
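The multiple-testing problem described above can be illustrated with a short simulation. This is a sketch, not anything from the paper itself: the number of hypotheses and the significance threshold are illustrative assumptions. When many hypotheses with no true effect are each tested at the conventional 5% level, a sizeable number will appear “significant” purely by chance.

```python
import random

random.seed(0)

N_TESTS = 1000   # illustrative: number of independent hypotheses tested
ALPHA = 0.05     # conventional significance threshold

# Under the null hypothesis (no real effect), p-values are
# uniformly distributed on [0, 1].
p_values = [random.random() for _ in range(N_TESTS)]

false_positives = sum(p < ALPHA for p in p_values)
print(f"{false_positives} of {N_TESTS} null hypotheses appear 'significant'")
# With alpha = 0.05, roughly 5% of pure-noise tests clear the bar.
```

At this threshold one should expect on the order of fifty spurious “discoveries” from a thousand tests of pure noise, which is why a significance level designed for a single pre-registered hypothesis breaks down when hypotheses are generated en masse.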
There are related questions for quantitative investment managers, as evidenced by a slew of investment products that fail to live up to the performance promised by their simulated returns. Running simulations of investment systems is now computationally easier than it has ever been, but with greater power comes greater responsibility. The relative ease of running backtests has put an even greater onus on the proficiency and integrity of the research process required to make correct inferences.
Our research process needs to be robust in order to prevent us from falling into a set of common traps. Overfitting, selection bias, publication bias, survivorship bias, p-hacking: there are many terms associated with the deleterious effect that emerges – almost unavoidably – when a quantitative researcher is free to simulate large numbers of investment systems outside of a rigorous testing framework. These pitfalls have been discussed at length in a past Winton research paper.
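The selection-bias effect described above can be sketched in a few lines. This is an illustrative simulation, not a description of any actual research process: the strategy count, Gaussian daily returns, and zero true edge are all assumptions. Picking the best backtest from many pure-noise candidates reliably produces an impressive-looking Sharpe ratio, even though no candidate has any skill.

```python
import random
import statistics

random.seed(1)

N_STRATEGIES = 500   # illustrative: candidate strategies backtested
N_DAYS = 252         # one year of daily returns

def sharpe(returns):
    """Annualised Sharpe ratio of a daily return series (risk-free rate 0)."""
    mu = statistics.mean(returns)
    sd = statistics.stdev(returns)
    return (mu / sd) * (252 ** 0.5)

# Every candidate strategy is pure noise: zero mean, 1% daily volatility.
backtests = [[random.gauss(0.0, 0.01) for _ in range(N_DAYS)]
             for _ in range(N_STRATEGIES)]

best = max(sharpe(r) for r in backtests)
print(f"Best in-sample Sharpe among {N_STRATEGIES} noise strategies: {best:.2f}")
# Selecting the maximum of many noise backtests manufactures apparent skill.
```

Because the annualised Sharpe ratio of a skill-free strategy over one year is roughly standard normal, the maximum across hundreds of candidates will typically land well above 1 – a headline figure that would evaporate out of sample.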