A recent paper based on analysis by Research Affiliates (RA) shows that when factors are tested on data from outside the original sample period, about half of them are “not robust,” according to an article co-authored by RA partner Juhani Linnainmaa, Ph.D.
“Our motivation was the belief that some of the many factors identified have been data mined,” Linnainmaa said in an interview, explaining that researchers analyzing large databases collectively have an incentive to find a factor that works: “They are independently doing these massive amounts of data mining, so it could be that many ‘factors’ that have been ‘found’ are just not real.”
Linnainmaa explains that his study used “out-of-sample” research data (that is, data from outside the original research period) to test for statistical significance. “The main point of the paper,” he says, “is that even with massive amounts of alpha in the sample, whether we look forward or backward in time outside the original study, we find more than a 50% decline in the performance of a factor … And the fact that it happens both after and before the sample gives us confidence this is probably happening because of data mining and not because, for example, the factor is discovered and then traded on more and more, causing the anomaly to go away.”
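To make the out-of-sample idea concrete, here is a minimal Python sketch. It is not the paper's method or data: the returns are synthetic noise, and the function names (`decay`, `pick_best_in_sample`, `simulate_factor`), the 200 candidate factors, and the window boundaries are all hypothetical choices made for illustration. It "mines" the best-looking factor inside a sample window, then measures how much of its average return disappears in the periods before and after that window.

```python
import random
import statistics


def mean_return(returns):
    """Average monthly return of a factor over a window."""
    return statistics.fmean(returns)


def decay(in_sample, out_of_sample):
    """Fractional drop in average return outside the sample window.

    0.5 means the out-of-sample average is half the in-sample average.
    """
    base = mean_return(in_sample)
    if base == 0:
        raise ValueError("in-sample mean is zero; decay is undefined")
    return 1.0 - mean_return(out_of_sample) / base


def pick_best_in_sample(candidates, start, end):
    """Index of the candidate with the highest mean return in [start, end)."""
    return max(
        range(len(candidates)),
        key=lambda i: mean_return(candidates[i][start:end]),
    )


def simulate_factor(true_premium, n_months, vol, rng):
    """Synthetic monthly returns: a true premium plus Gaussian noise."""
    return [rng.gauss(true_premium, vol) for _ in range(n_months)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n_months, start, end = 360, 120, 240  # hypothetical sample window

    # Assumption: every candidate has zero true premium, so any in-sample
    # "alpha" is the product of searching, not a real effect.
    candidates = [simulate_factor(0.0, n_months, 0.03, rng) for _ in range(200)]

    best = pick_best_in_sample(candidates, start, end)
    series = candidates[best]
    in_sample = series[start:end]

    print("in-sample mean:   ", mean_return(in_sample))
    print("pre-sample decay: ", decay(in_sample, series[:start]))
    print("post-sample decay:", decay(in_sample, series[end:]))
```

Because the candidates here are pure noise, the selected factor's return should largely vanish on both sides of the window. That is the pattern Linnainmaa describes: decay that appears before the sample as well as after it cannot be explained by investors trading the factor away, since nobody knew about it yet.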
The paper concludes that there will be “significant decay” in returns on newly discovered factors, Linnainmaa explained, adding, “When you invest in a newly discovered anomaly, you want to be quite cautious, because it might turn out to be one of the ones that doesn’t hold up in the future.”