Beware the Back Test

By Jack Forehand, CFA, CFP® (@practicalquant)

Active investment managers have a pretty dismal long-term record. Depending on which study you look at, somewhere between 80% and 90% of managers underperform their benchmarks over the long term, net of fees.

But what if I told you there was a way to turn those results upside down? What if there was a world where managers beat their benchmark 90% of the time? What if there was a world where managers knew how to position their portfolio in advance for any market environment and were able to adjust to anything the market throws at them?

I want to be the first to tell you that this world does exist. This nirvana where investment managers beat their benchmarks, are able to limit losses or avoid bear markets entirely, and effortlessly switch between value and growth when the time is right is a reality.

This world is the world of back testing.

Now you may have sensed a hint (or a lot) of sarcasm in the things I said above, and that is for good reason. Back testing is one of the most misused tools in investment management. With the rise of computerized investing and index ETFs that pursue more active investment strategies, we are seeing more and more back tested results accompanying investment products. In fact, much of the evidence that supports things like value and momentum, which are used in countless investment products, is based on historical testing of the factors that impact stock prices.

As an investor, however, it’s very important to understand what back tested results mean and what insights (if any) they can provide into the investment strategy they represent.

I don’t want to give the impression that all back testing is bad. Back testing is a very important tool that all of us who use quantitative strategies can utilize to help develop and optimize investment strategies. The problem is that much of the back testing that is presented publicly is flawed in one way or another. And even the best back testing can’t perfectly simulate what it is like to run an actual portfolio.

After many years of running back tests and analyzing the results, I have developed a checklist of some of the most important things to look for in evaluating them. Some of these things are technical in nature and are often not disclosed in the end result you see, but all of them are very important in determining how much trust the output of the tests deserves.

Here are a few key things to look for.

1 – Does the test account for human emotions?

This is obviously a trick question because back tests can’t account for the human element of investing. This human element comes into play in two ways. First, the manager of a strategy is likely to deviate from it at some point in the real world. Even if it is a quantitative strategy, the temptation to alter it when it isn’t working will be strong. Second, the investor in it is also likely to panic and make poor decisions somewhere along the way. These are both likely to cause real world results to be significantly worse than the back test. It is difficult to quantify how much worse, but the more the strategy loses during the down periods and the more it deviates from its benchmark, the bigger the gap is likely to be.

2 – Does the strategy have economic reasoning behind it?

When you test something, it is very important that the input used for the test relates to the output in some intuitive way. In other words, if you test a strategy that relates valuations to stock prices, that makes sense. If you test phases of the lunar cycle against stock prices, that does not. We have talked to some machine learning experts on our podcast who would disagree with this rule. They sometimes find that the signals that make the least sense tend to persist longer because they aren’t as widely used by other investors. But when it comes to long-term, fundamentals-based strategies, I think this rule still applies.

3 – Does the test cover a very long period of time?

Cycles in the market can last a lot longer than you think. Investing styles can go in and out of favor for long periods of time. It is important that any investment strategy shows that it can perform well across multiple market cycles to ensure its performance isn’t just the result of a trend that won’t persist long-term. If you were to run back tests on the decade ending in 2020, you would likely conclude that value investing is a waste of time and growth investing produces significant outperformance. Looking at 50 years of data (and the past couple of years), however, would yield the opposite conclusion.

Even periods of 20+ years can be problematic. For instance, consider the belief many have that bonds provide a hedge for stocks during market declines. If you look at the past 30 years, you would find strong evidence to support that statement. The major exception to it, though, is that both stocks and bonds can struggle during periods of high inflation. A test that didn’t include the 1970s (and the current period of inflation) would completely miss that.
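One way to check whether a result depends on a single market regime is to break the test period into consecutive windows and compare the strategy to its benchmark in each one. The sketch below is only an illustration of that idea, not a description of any particular testing platform. The function names and the yearly returns are made up for the example.

```python
from math import prod

def annualized_return(annual_returns):
    """Geometric average of a list of yearly returns (0.10 means +10%)."""
    growth = prod(1.0 + r for r in annual_returns)
    return growth ** (1.0 / len(annual_returns)) - 1.0

def excess_by_window(strategy, benchmark, window):
    """Annualized strategy-minus-benchmark return for each consecutive window."""
    results = []
    for start in range(0, len(strategy) - window + 1, window):
        s = annualized_return(strategy[start:start + window])
        b = annualized_return(benchmark[start:start + window])
        results.append(s - b)
    return results

# Made-up yearly returns, for illustration only (not real market data).
strategy = [0.12, 0.08, 0.15, 0.10, 0.02, -0.05, 0.04, 0.01]
benchmark = [0.07, 0.05, 0.09, 0.06, 0.08, 0.03, 0.10, 0.06]

windows = excess_by_window(strategy, benchmark, window=4)
for i, excess in enumerate(windows, start=1):
    print(f"Window {i}: annualized excess return {excess:+.2%}")
```

In this made-up series the strategy beats the benchmark in the first window and trails it in the second. A test that covered only one of the two windows would point to the opposite conclusion from a test that covered only the other.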

4 – Does the test show periods where the investment strategy struggles?

If something seems too good to be true, it almost always is. Every investment strategy will go through periods where it doesn’t work. If you see a back test that does not include these periods, you should run, not walk, away from the investment strategy it is being used to advertise. In addition to making sure they exist, periods where a strategy struggles can also teach you a lot about it. For example, if a strategy struggles during inflationary environments and you expect high inflation going forward, you may want to avoid it even if it has a strong back test.
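Two simple numbers can help surface the periods where a strategy struggled: the largest peak-to-trough loss, and the longest stretch of trailing the benchmark. The following is a minimal sketch under the assumption that you have a list of periodic returns for the strategy and for its benchmark. The function names are hypothetical.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline in cumulative value, as a positive fraction."""
    value = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def longest_losing_streak(strategy, benchmark):
    """Longest run of consecutive periods in which the strategy trailed its benchmark."""
    longest = 0
    current = 0
    for s, b in zip(strategy, benchmark):
        current = current + 1 if s < b else 0
        longest = max(longest, current)
    return longest
```

If a back test reports a maximum drawdown near zero, or no meaningful stretch of underperformance, that is a reason to ask more questions rather than a reason for comfort.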

5 – Does it assume knowledge that wasn’t available until after the fact?

This is a common problem in back testing. It is easy to know what happened after the fact. It is easy to know now that value investing didn’t work from 2007 on. It was impossible to know that then. If, armed with what I know now, I go back and look for indicators that might have told me to avoid value starting in 2007, I could easily find them. But it wouldn’t have been so obvious at the time. Any good back test should only include what could have been known at the time and not information that only became obvious after the fact. This rule can be easy to break without even knowing it, since all of us know what happened in the past and we can’t avoid that. For example, if I am testing a bond strategy and developing rules for it, it will be very difficult for me to keep the knowledge that interest rates fell for 40 years starting in 1980 out of my thought process.
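The mechanical version of this problem is easy to show in code. In the sketch below, a signal computed from a period's data is either applied to that same period (which uses information that wasn't available yet) or applied to the following period. The rule, the function name, and the returns are all made up for illustration. The subtler version, where hindsight shapes the rules themselves, can't be fixed with a lag.

```python
def backtest(signals, returns, lag=1):
    """Compound return earned while invested, using the signal from `lag` periods earlier.

    signals[t] is True if the rule says "be invested", computed from data
    through the end of period t. returns[t] is the return during period t.
    lag=1 trades only on information available before the period starts.
    lag=0 uses the same period's data, which is look-ahead bias.
    """
    value = 1.0
    for t in range(lag, len(returns)):
        if signals[t - lag]:
            value *= 1.0 + returns[t]
    return value - 1.0

# Made-up returns, for illustration only (not real market data).
returns = [0.04, -0.06, 0.05, -0.03, 0.02]
# Hypothetical rule: be invested when the period's return was positive.
signals = [r > 0 for r in returns]

with_lookahead = backtest(signals, returns, lag=0)
without_lookahead = backtest(signals, returns, lag=1)
print(f"With look-ahead:    {with_lookahead:+.2%}")
print(f"Without look-ahead: {without_lookahead:+.2%}")
```

With these numbers, the version that peeks at the same period's return captures every gain and avoids every loss, while the properly lagged version loses money. The rule is identical in both cases. Only the timing of the information differs.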

This is by no means an all-inclusive list and there are many other more technical things (like testing a strategy out of sample and avoiding survivorship bias) that are key to successful back testing. But the most important point here is that all back tests should be viewed with a degree of skepticism. When used properly, back tests can be a useful tool for evaluating the pros and cons of an investment strategy. But more often than not they are instead used as a marketing tool and don’t accurately depict how a strategy will perform in the real world.

Like everything else in investing, a common-sense approach can help to avoid relying on test results that are clearly flawed. If something seems too good to be true, it likely is. There is perhaps no area of investing where that applies more than back testing.


Jack Forehand is Co-Founder and President at Validea Capital. He is also a partner at Validea.com and co-authored “The Guru Investor: How to Beat the Market Using History’s Best Investment Strategies”. Jack holds the Chartered Financial Analyst designation from the CFA Institute. Follow him on Twitter at @practicalquant.