The Null Is Always False (Except When It Is True) (2014)


The 20% Statistician: The Null Is Always False (Except When It Is True)

The 20% Statistician

A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Thursday, June 12, 2014


An often-heard criticism of null-hypothesis significance testing is that the null is always false. The idea is that the average difference between two samples will never be exactly zero (there will practically always be a tiny difference, even if it is only 0.001). Furthermore, if the sample size is large enough, tiny differences can be statistically significant. Both of these statements are correct, but they do not mean the null is never true.

The null-hypothesis states that the difference between the means in the two populations is exactly zero. However, the means of the samples drawn from these two populations vary from sample to sample (and the less data you have, the greater the variance). When the null is true, the difference between the two sample means gets closer and closer to zero as the sample size approaches infinity. This is a core assumption in Frequentist approaches to statistics. It’s therefore not important that the observed difference in your sample isn’t exactly zero, as long as the difference in the population is zero.

Some researchers, such as Cohen (1990), have expressed doubt that the difference in the population is ever exactly zero. As Cohen says:

The null hypothesis, taken literally (and that's the only way you can take it in formal hypothesis testing), is always false in the real world. It can only be true in the bowels of a computer processor running a Monte Carlo study (and even then a stray electron may make it false). If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null is always false, what’s the big deal about rejecting it? (p. 1308).

One ‘big deal’ about rejecting it is that to detect a tiny difference (e.g., a Cohen’s d of 0.001) you need a sample size of at least 31 million participants to have a decent chance of observing a statistically significant difference in a t-test. With such sample sizes, almost all statistics we use (e.g., checks for normality) break down and start to return meaningless results.
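As a rough check on that figure, here is a minimal sketch of the usual normal-approximation sample size formula for a two-group comparison. It assumes that a ‘decent chance’ means 80% power and that the test is two-sided with alpha = .05; neither is stated in the post, and the function name n_per_group is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized mean difference (Cohen's d).
    The alpha and power defaults are assumptions, not values from the post.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


n = n_per_group(0.001)
print(f"per group: {n:,}  total: {2 * n:,}")
```

Under these assumptions the result is roughly 15.7 million participants per group, which is in line with the total of at least 31 million mentioned above.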

Another ‘big deal’ is that we don’t know whether the observed difference will remain equally large as the sample size increases (as should happen when it is an accurately measured true effect) or whether it will become smaller and smaller, without ever becoming statistically significant, as more measurements are added (as should happen when there is actually no effect). Hagen (1997) explains this latter situation in his article ‘In Praise of the Null Hypothesis Statistical Test’ to prevent people from mistakenly assuming that every observed difference will become significant if you simply add participants. He writes:

‘Thus, although it may appear that larger and larger Ns are chasing smaller and smaller differences, when the null is true, the variance of the test statistic, which is doing the chasing, is a function of the variance of the differences it is chasing. Thus, the "chaser" never gets any closer to the "chasee."’
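Hagen’s point can be illustrated with a small simulation. The sketch below is not taken from Hagen’s article. It assumes two normal populations with the same mean and a known standard deviation of 1, and it takes a shortcut by drawing each group’s sample mean directly from its sampling distribution instead of generating every observation. The names simulate and results are invented for the example.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(2014)  # fixed seed so the run is repeatable


def simulate(n, sims=4000, sigma=1.0):
    """Spread of the mean difference and of the z statistic when the null is true."""
    se_mean = sigma / sqrt(n)        # SD of a single group's sample mean
    se_diff = sigma * sqrt(2.0 / n)  # SD of the difference between two means
    # Shortcut (assumption): sample means are drawn directly as N(0, sigma^2 / n).
    diffs = [rng.gauss(0.0, se_mean) - rng.gauss(0.0, se_mean) for _ in range(sims)]
    z_stats = [d / se_diff for d in diffs]  # the "chaser": difference / its SE
    return stdev(diffs), stdev(z_stats)


results = {n: simulate(n) for n in (100, 10_000, 1_000_000)}
for n, (sd_diff, sd_z) in results.items():
    print(f"n = {n:>9,}  SD of mean difference = {sd_diff:.5f}  SD of z = {sd_z:.3f}")
```

The observed differences shrink as n grows, but the test statistic keeps roughly the same spread (a standard deviation near 1), so adding participants does not push it towards significance when there is no effect.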

What’s a ‘real’ effect?

The more important question is whether it is true that there are always real differences in the real world, and what the ‘real world’ is. Let’s consider the population of people in the real world. While you read this sentence, some individuals in this population have died, and some were born. For most questions in psychology, the population is surprisingly similar to an eternally running Monte Carlo simulation. Even if you could measure all people in the world in a millisecond, and the test-retest correlation were perfect, the answer you would get now would be different from the answer you would get in an hour. Frequentists (the people who use NHST) are not specifically interested in the exact value now, or in one hour, or next Thursday, but in the average value in the ‘long’ run. The value in the real world today might never be zero, but it’s never anything, because it’s continuously changing. If we want to make generalizable statements about the world, I think the fact that the null-hypothesis is never precisely true at any specific moment is not a problem. I’ll ignore more complex questions for now, such as how we can establish whether effects vary over time.

When perfect randomization to conditions is possible, and the null-hypothesis is true, every p-value is equally likely. There is a great blog post by Jim Grange that uses simulations in R to explain that p-values are uniformly distributed if the null is true. Take the script from his blog, and change the sample size (e.g., to 100000 in each group), or change the variances, and as long as the means of the two groups remain identical, p-values will be...
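For readers without R at hand, the same idea can be sketched in Python. This is not Jim Grange’s script. It is a stand-in that assumes normal populations with an arbitrary mean of 100 and standard deviation of 15, and it uses a normal approximation to the Welch t-test (reasonable at 100 participants per group) so that only the standard library is needed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

rng = random.Random(42)  # fixed seed so the run is repeatable
STD_NORMAL = NormalDist()


def p_value_under_null(n, mean=100.0, sd=15.0):
    """Two-sided p-value for two groups drawn from the same population."""
    a = [rng.gauss(mean, sd) for _ in range(n)]
    b = [rng.gauss(mean, sd) for _ in range(n)]
    se = sqrt(variance(a) / n + variance(b) / n)  # Welch-style standard error
    z = (fmean(a) - fmean(b)) / se
    # Normal approximation to the t distribution (an assumption for large n).
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


p_values = [p_value_under_null(n=100) for _ in range(2000)]

# Count p-values in ten equal-width bins: [0, 0.1), [0.1, 0.2), ...
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1

share_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print("counts per bin:", bins)
print("share with p < .05:", share_significant)
```

Each bin should hold roughly a tenth of the p-values, and about 5% of them should fall below .05, whatever sample size or common standard deviation you choose.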

