The Statistical Significance of Differences in Statistical Significance

April 12, 2012

Continuing my new practice of linking to and attempting to summarise statistics papers, here is a short piece by Andrew Gelman and Hal Stern:

Gelman, Andrew and Stern, Hal, ‘The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant’, The American Statistician, November 2006, Vol. 60, No. 4, pp. 328-331

If I understand things aright, Gelman and Stern make the following point: the emphasis on statistical significance in the reporting of results in the social sciences can lead to a misleadingly firm line being drawn between statistically significant and non-significant results, and that line itself misrepresents the statistical significance of the differences between results.

For example: if we are testing the same hypothesis against two different samples, and find a statistically significant result for one but not for the other, this may lead us to draw a strong distinction between our two samples. One yields statistical significance and the other does not – what difference could be clearer? Nevertheless, this does not itself indicate any statistically significant difference between our samples. If one test yields statistical significance at p = 0.0499, and another test does not, at p = 0.0501, we have probably not discovered a dramatic difference between them. The actual difference between our samples is presumably tiny – yet because the two p values happen to fall on either side of our chosen significance level, this difference can easily be reified, while equally large, or larger, differences between other samples are ignored.
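To put a rough number on that intuition, here is a minimal sketch (mine, not from the paper) that converts the two p values into z-scores. It assumes two-sided tests against a normal reference distribution, and the function name `z_from_two_sided_p` is made up for illustration.

```python
from statistics import NormalDist

def z_from_two_sided_p(p):
    """Return the |z| whose two-sided normal tail probability equals p."""
    return NormalDist().inv_cdf(1 - p / 2)

# Assumption: both tests are two-sided and use a normal reference distribution.
z_significant = z_from_two_sided_p(0.0499)      # just under the 5% threshold
z_not_significant = z_from_two_sided_p(0.0501)  # just over the 5% threshold

print(z_significant, z_not_significant, z_significant - z_not_significant)
```

On these assumptions the two results sit on either side of the familiar 1.96 cut-off, yet they differ by less than a hundredth of a standard error.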

This is intuitive enough – but the same point can apply even when the differences in p value are very substantial. Gelman and Stern write:

Consider two independent studies with effect estimates and standard errors of 25 ± 10 and 10 ± 10. The first study is statistically significant at the 1% level, and the second is not at all statistically significant, being only one standard error away from 0. Thus, it would be tempting to conclude that there is a large difference between the two studies. In fact, however, the difference is not even close to being statistically significant: the estimated difference is 15, with a standard error of … 14.

Additional problems arise when comparing estimates with different levels of information. Suppose in our example that there is a third independent study with much larger sample size that yields an effect estimate of 2.5 with standard error of 1.0. This third study attains the same significance level as the first study, yet the difference between the two is itself also significant. Both find a positive effect but with much different magnitudes. Does the third study replicate the first study? If we restrict attention only to judgments of significance we might say yes, but if we think about the effect being estimated we would say no, as noted by Utts (1991). In fact, the third study finds an effect size much closer to that of the second study, but now because of the sample size it attains significance.
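The arithmetic in the quoted passage can be checked with a short sketch. It assumes, as the quotation does, that the studies are independent (so the variances of the estimates add), and further that a normal approximation with two-sided p values is appropriate. The helper names are mine, not Gelman and Stern's.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(estimate, se):
    """Two-sided p value for an estimate under a normal approximation."""
    z = abs(estimate) / se
    return 2 * (1 - NormalDist().cdf(z))

def difference(est_a, se_a, est_b, se_b):
    """Difference of two independent estimates and its standard error."""
    return est_a - est_b, sqrt(se_a ** 2 + se_b ** 2)

# (estimate, standard error) for the three studies in the quotation
study_1 = (25, 10)
study_2 = (10, 10)
study_3 = (2.5, 1.0)

diff_12, se_12 = difference(*study_1, *study_2)
diff_13, se_13 = difference(*study_1, *study_3)

print("study 1 vs 2:", diff_12, se_12, two_sided_p(diff_12, se_12))
print("study 1 vs 3:", diff_13, se_13, two_sided_p(diff_13, se_13))
```

Under these assumptions the difference between the first and second studies has a z-score of about 1.06, well short of any conventional threshold, while the difference between the first and third has a z-score of about 2.24, which does clear the 5% level.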

In a blog post that references this paper, Gelman writes:

I’m thinking more and more that we have to get rid of statistical significance, 95% intervals, and all the rest, and just come to a more fundamental acceptance of uncertainty.

I don’t yet know what Gelman means by this latter clause, or what alternative approaches he endorses.

5 Responses to “The Statistical Significance of Differences in Statistical Significance”

  1. ktismatics Says:

    I couldn’t get the links to work but anyhow, here’s another one. Say you’ve collected data on 10 variables and you’re trying to figure out if they correlate with each other. You do a series of pairwise contrasts for the 20 variables — a with b, a with c, etc. — and find that 10 of these contrasts yield correlations that are significant at the p<.05 level — not bad. However… 20 variables yield (20 x 19)/2 = 190 pairwise contrasts. You've set the significance level at .05, meaning that you're interested in correlations only if there is less than a 5% chance of their having occurred through random variation in the sample. How many of the 190 pairwise contrasts are likely to show random pairwise correlations with each other in your sample? Per Boole's Inequality the answer is np, which in this case is .05 x 190 = 9.5. So your 10 statistically significant correlations are just about what you'd expect from random variation in the sample. There are recognized corrections to compensate for Boole's inequality; it would be an interesting analysis to see how many studies reporting significant findings in some field of study actually perform the adjustment.

  2. ktismatics Says:

    I meant to say that you’ve collected data on 20 variables.

  3. duncan Says:

    Yes, exactly – the dreaded “I did 1000 t tests and nearly 50 of them were significant (p<0.05)! I'm going to publish those results (and barely even mention the others)!”

    One of the many things I don't know but should, is how the various post-hoc tests one can run after a statistically significant ANOVA result (for example) actually work – the mechanism by which (and thus extent to which…) the different tests aim to compensate for the type I error problem. Without that knowledge I can't myself judge whether or to what extent any or which tests might be appropriate to use in any given circumstance (one relies instead on others' rules of thumb…) There is obviously a large literature on these issues (which are, after all, pretty basic) (and which extends beyond post-hoc tests) – I just haven't read it. Maybe this is something I could look at in future posts.

    Not quite the same issue, but the same family of problem: have you seen this xkcd strip?

  4. ktismatics Says:

    I don’t much care for green jellybeans anyhow.
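The multiple-comparisons arithmetic from the comments above can be sketched as follows. The sketch assumes every null hypothesis is true (no real correlations at all). In that case the expected number of false positives is the number of tests times the significance level, which follows from linearity of expectation and does not require the tests to be independent. Boole's inequality, strictly speaking, gives an upper bound on the chance of at least one false positive, and the Bonferroni correction responds by dividing the threshold by the number of tests. The variable names are mine.

```python
from math import comb

alpha = 0.05      # per-test significance level
n_variables = 20

# Number of distinct pairs among 20 variables: (20 x 19) / 2
n_pairs = comb(n_variables, 2)

# Expected count of "significant" results if every null hypothesis is true
expected_false_positives = n_pairs * alpha

# Boole's inequality: P(at least one false positive) <= n_pairs * alpha,
# capped at 1 because it is a bound on a probability
boole_bound = min(1.0, n_pairs * alpha)

# Bonferroni-corrected per-test threshold
bonferroni_alpha = alpha / n_pairs

# The "1000 t tests" case from the thread, under the same assumption
expected_in_1000_tests = 1000 * alpha

print(n_pairs, expected_false_positives, boole_bound, bonferroni_alpha)
print(expected_in_1000_tests)
```

With 190 tests the Boole bound is uninformative (it caps at 1), which is one way of seeing why an uncorrected 5% threshold offers little protection here.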
