Posted by Aaron Wheeler
Statistics can be a very powerful tool for SEOs, but it can be hard to extract the right information from the numbers. People understandably make a lot of mistakes in the process of turning raw data into actionable conclusions. If you know what mistakes can be made along the way, you're less likely to make them yourself - so listen up! Will Critchlow, co-founder and chief strategist at Distilled, has a few tips on how to use statistics in a valid and reliable way. Use a lot of statistics yourself? Let us know what you've learned in the comments below!
Video Transcription
Howdy, Whiteboard fans. Different look this week. Will Critchlow from Distilled. I am going to be talking to you about some ways you can avoid some common statistical errors.
I am a huge fan of the power of statistics. I studied it. I have forgotten most of the technical details, but I use it in my work. We use it a lot at Distilled. But it is so easy to make mistakes that are really easy to avoid. Most of that comes from the fact that humans aren't really very good at dealing with numbers in general, and statistics and probability in particular.
The example I like to use to illustrate that is this: suppose we have a very rare disease that we are testing for. So, we have a population of people, and some tiny proportion of them have this rare disease. There is just this tiny, tiny sliver at the top. Maybe that is 1 in 10,000 people, something like that. We have a test that is 99% accurate at diagnosing whether people have or don't have this disease. That sounds very accurate, right? It's only wrong 1 time in 100. But let's see what happens.
We run this test, and out of the main body of the population (I am going to exaggerate slightly), most people are correctly diagnosed as not having the disease. But 1% of them, this bit here, are incorrectly diagnosed as having the disease. That's the 99% correct but 1% incorrect. Then we have the tiny sliver at the top, which is a very small number of people, and again the test is 99% correct, so a small fraction of them are incorrectly told they don't have it. Then, if we just look at this bit in here, zoom in on there, what we see is that of all the people who are diagnosed as having this disease, more of them don't have it than do. Counterintuitive, right? That comes from the fact that, yes, our test is 99% accurate, but that still means it is wrong 1 in 100 times, and only 1 in 10,000 people actually have this disease. So, if you are diagnosed as having it, it is actually more likely that there is an error in the diagnosis than that you have this very rare disease. But we get this wrong. Intuitively, almost everyone is likely to get this kind of question wrong. It's just one example of many.
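To make that concrete, here is a minimal sketch in Python using the numbers from the example (1-in-10,000 prevalence, a 99%-accurate test) applied to an assumed population of one million people. It simply counts the expected true and false positives.

```python
# Base-rate sketch: 1-in-10,000 prevalence, 99%-accurate test,
# applied to a hypothetical population of 1,000,000 people.
population = 1_000_000
prevalence = 1 / 10_000      # 1 in 10,000 people actually have the disease
accuracy = 0.99              # the test is right 99% of the time

sick = population * prevalence              # 100 people
healthy = population - sick                 # 999,900 people

true_positives = sick * accuracy            # 99 sick people correctly flagged
false_positives = healthy * (1 - accuracy)  # 9,999 healthy people wrongly flagged

p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(actually sick | positive test) = {p_sick_given_positive:.3f}")  # roughly 0.01
```

So only about 1% of the people who test positive actually have the disease, which is exactly the counterintuitive result described above.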
Here are some things that may not be immediately obvious, but that you should bear in mind if you are working with statistics.
Number one, independence is very, very important. If I toss a coin 100 times and get 100 heads, and those were genuinely independent flips of a fair coin, then there is something very, very odd going on. But if that is a coin that has two heads on it - in other words, the flips aren't really independent evidence, because getting a head is guaranteed every single time - then the same 100 heads is a completely different result. So, make sure that whatever it is you are testing, if you are expecting to do analysis over the whole set of trials, the results are actually independent. The most common way this falls down is when you are dealing with people, humans.
If you want reproducible results and you accidentally include the same person multiple times, their answers to a questionnaire, for example, will be skewed the second time around if they have already seen the site previously; the same goes for user testing and those kinds of things. Be very careful to set up your trials, whatever it is you are testing, for independence. Don't worry about this excessively, but realize that it is a potential problem. One of the things we test a lot is display copy on PPC ads. Here you can't really control who sees them, so just realize there is not a pure analysis going on there, because many of the same people come back to a site regularly and have therefore seen the ad day after day. So there is a skew, a lack of independence.
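Going back to the coin example above, here is a quick sketch (assumed numbers) of why the same observed run of 100 heads means completely different things depending on the setup of the trial.

```python
# 100 heads in a row: astonishing from independent flips of a fair coin,
# completely unremarkable from a two-headed coin.
fair_coin = 0.5 ** 100        # probability of 100 heads from a fair coin
two_headed = 1.0 ** 100       # probability of 100 heads from a two-headed coin

print(f"Fair, independent flips: {fair_coin:.2e}")   # about 7.9e-31
print(f"Two-headed coin:         {two_headed:.0f}")  # 1, i.e. certain
```

The analysis you run on the results is only as good as the assumptions behind how the trials were set up.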
On a similar note, all kinds of repetition can be problematic, which is unfortunate because repetition is at the heart of any kind of statistical testing. You need to do things multiple times to see how they pan out. What I am talking about here, in particular, are the confidence levels you have often seen quoted. You have seen situations where somebody says we're 95% sure that advert one is better than advert two, or that this copy converts better than that copy, or that putting the checkout button here converts better than putting it there. That 95% number is coming from a statistical test. Assuming the trials are independent, it is essentially saying that the chance of getting this extreme a difference in results by chance, if the two things were actually identical, is less than 5%. In other words, fewer than 1 in 20 times would this situation arise by chance.
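As an illustration of where that 95% figure comes from, here is a minimal sketch of a two-proportion z-test comparing two ad variants. The click and impression counts are made up for the example; the test computes the probability of seeing a gap this big by chance if the two ads were actually identical.

```python
# Two-proportion z-test sketch: are the two ads' click-through rates
# really different, or could a gap this big arise by chance?
from statistics import NormalDist

clicks_a, impressions_a = 130, 2_000   # hypothetical ad A
clicks_b, impressions_b = 95, 2_000    # hypothetical ad B

p_a = clicks_a / impressions_a
p_b = clicks_b / impressions_b
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)

# Standard error under the null hypothesis that both ads convert identically.
se = (p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b)) ** 0.5
z = (p_a - p_b) / se

# Two-sided p-value: the chance of a gap at least this extreme if the ads are identical.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"CTR A = {p_a:.3%}, CTR B = {p_b:.3%}, z = {z:.2f}, p = {p_value:.3f}")
```

If that p-value comes out below 0.05, that is what "we're 95% sure" is shorthand for - subject to the independence caveats above.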
Now, the problem is that we tend to run lots of these tests, in parallel or sequentially; it doesn't really matter which. So, imagine you are doing conversion rate optimization testing and you tweak 20 things one after another. Each time you test against this model. First you change the button from red to green. Then you change the copy that is on the button. Then you change the copy that is near the button. Then you change some other thing. You just keep going down the tree. Each time it comes back saying, no, that made no difference, or only a statistically insignificant difference. No, that made no difference. No, that made no difference. You get that 15 times, say. On the 16th time, you get a result that says, yes, that made a difference; we are 95% sure that made a difference. But think about what that is actually saying. It is saying that the chance of this having happened randomly, when the two things you are testing are actually identical, is 1 in 20. We might well expect something that happens 1 time in 20 to have come up by the 16th attempt. There is nothing unusual about that. So actually, our test is flawed. All we've shown is that we waited long enough for some random occurrence to take place, which was bound to happen at some point. So, you have to be much more careful if you are doing those kinds of trials.
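To see how quickly that goes wrong, a quick back-of-the-envelope sketch: if every variation you try actually does nothing, and each test has a 5% false-positive rate, the chance of at least one "significant" result keeps climbing with the number of tests you run.

```python
# Probability of at least one false positive when running several
# independent tests at a 5% significance level, even if nothing works.
alpha = 0.05
for n_tests in (1, 5, 10, 16, 20):
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:2d} tests -> {p_at_least_one:.0%} chance of a spurious 'win'")
```

By the 16th tweak there is already better than a 50/50 chance that at least one of them has looked like a winner purely by luck.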
One thing that works very well, and guards against a lot of these problems, is to be very, very careful at exactly this point. If you run these trials sequentially and you get a result like that, don't go and tell your boss right then. Okay? I've made this mistake with a client, rather than a boss. Don't get excited immediately, because all you may be seeing is what I was just talking about: run these trials often enough and occasionally you are going to find one that looks odd just through chance. Stop. Rerun that trial. If it comes up again as statistically significant, then you can be happy. Now you can go and whoop and holler, ring the bell, jump and shout, and tell your boss or your clients. Until that point, you shouldn't, because what we very often see is a situation where you get this likelihood of A being better than B, say, and we're 95% sure. You go and tell your boss. By the time you get back to your desk, it has dropped a little bit. You're like, "Oh, um, I'll show you in a second." By the time he comes back over, it has dropped a little bit more. And by the time it has been running for another day or two, it has actually dropped below 50% and you're not sure of anything anymore. That's what you need to be very careful of. So, rerun those things.
Kind of similar: don't train on your sample data. Suppose you're trying to model search ranking factors, for example. You're going to take a whole bunch of things that might influence rankings, see which ones do, and then try to predict some other rankings. If you take 100 rankings, train a model on those rankings, and then try to predict those same rankings, you might do really well, because if you have enough variables in your model, it will actually predict them perfectly - it has effectively just remembered those rankings. You need to not do that. I have actually made this mistake with a little thing that was trying to model the stock market. I was, like, yes, I am going to be rich. But, in fact, all it could do was predict stock market movements it had already seen, which it turns out isn't quite as useful and doesn't make you rich. So, don't train on your sample data. Train on a set of data over here, and then, much like I was saying with the previous example, test it on a completely independent set of data. If that works, then you're going to be rich.
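Here is a minimal sketch of that failure mode, using made-up data: when a model has more candidate "ranking factors" than training examples, it can reproduce the rankings it was trained on essentially perfectly, yet it predicts an independent set of rankings no better than guessing. That is exactly why you hold out a separate test set.

```python
# Overfitting sketch: with more candidate "ranking factors" than pages,
# a linear model can memorize the training rankings exactly
# while learning nothing that generalizes.
import numpy as np

rng = np.random.default_rng(42)

n_pages, n_factors = 100, 150                     # more factors than training examples
X_train = rng.normal(size=(n_pages, n_factors))   # made-up "factor" values
y_train = rng.normal(size=n_pages)                # made-up rankings (pure noise)

# Fit a linear model by least squares. With n_factors > n_pages it can
# reproduce the training rankings exactly.
coeffs, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

X_test = rng.normal(size=(n_pages, n_factors))    # an independent set of pages
y_test = rng.normal(size=n_pages)

train_err = np.mean((X_train @ coeffs - y_train) ** 2)
test_err = np.mean((X_test @ coeffs - y_test) ** 2)
print(f"Training error: {train_err:.6f}")   # essentially zero: it memorized the sample
print(f"Held-out error: {test_err:.3f}")    # far larger: it learned nothing real
```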
Finally, don't blindly go hunting for statistical patterns. In much the same way that, when you run a test over and over again, the 1-in-20 or 1-in-100 chance eventually comes in, if you just go looking for patterns anywhere in anything, you're definitely going to find them. Human brains are really good at pattern recognition, and computers are even better. If you start asking, does the average number of letters in the words I use affect my rankings, and you come up with a thousand variables like that - all completely arbitrary, with no particular reason to think they would help your rankings - and you test enough of them, you will find some that look like they do, and you'll probably be wrong. You'll probably be misleading yourself. You'll probably look like an idiot in front of your boss. That's what this is all about, how not to look like an idiot.
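The same point as a sketch: generate a thousand completely arbitrary "variables" and a set of rankings that are pure noise, test every one of them, and a handful will still look significantly correlated, just as the 1-in-20 arithmetic predicts. (The sample sizes and the approximate p-value formula are assumptions for illustration.)

```python
# Data-dredging sketch: test 1,000 arbitrary variables against rankings
# that are pure noise; roughly 5% of them look "significant" anyway.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
n_pages, n_variables = 200, 1_000

rankings = rng.normal(size=n_pages)                   # pure noise, no real signal
variables = rng.normal(size=(n_variables, n_pages))   # arbitrary made-up factors

false_alarms = 0
for v in variables:
    r = np.corrcoef(v, rankings)[0, 1]
    # Approximate two-sided p-value for a correlation over n_pages samples
    # (Fisher transform approximation, valid for small r).
    z = r * np.sqrt(n_pages - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < 0.05:
        false_alarms += 1

print(f"'Significant' correlations found: {false_alarms} of {n_variables}")  # around 50
```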
I'm Will Critchlow. It's been a pleasure talking to you on Whiteboard Friday. I'm sure we'll be back to the usual scheduled programming next week. See you later.
Video transcription by Speechpad.com