August 16, 2012
By the Australian Bureau of Statistics’ measure, life expectancy at birth for Australians is 78.5 years for men and 82.4 years for women (figures for 2005-2007).
These aren’t precisely the same as the World Bank’s figures for the same period (which are slightly higher) – but they’re pretty close. (The World Bank figures for 2006, which I’m assuming is the best comparison year for this data, are 78.7 for men, 83.5 for women, and 81 overall.)
These figures are outstanding: they put Australia above the average life expectancy for high income countries (76.2 for men, 81.9 for women, and 79 overall). Australia is in the top ten countries in the world for life expectancy.
But these numbers don’t capture demographic differentials within Australia. Notably, Indigenous Australians have markedly lower life expectancies than those for the country overall.
In 2006-7, the life expectancy at birth for Indigenous Australians was 67.2 for men, and 72.9 for women. (Weighted by the relative size of the male and female populations, I make that an overall life expectancy of 70.06.)
If Indigenous Australians were their own country, the life expectancy of that country would fall between those of Guatemala and Morocco.
Now looking just at the Indigenous Australian population of the Northern Territory, life expectancy for men is 61.5; for women 69.2. Again weighted by male and female population, I get an average life expectancy for this population overall of 65.41.
That figure is less than the average life expectancy for the world. If the Northern Territory’s Indigenous Australian population were its own country, that country would have a life expectancy just higher than Bhutan, and a bit lower than the Solomon Islands.
This doesn’t seem to be a defensible state of affairs.
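A note on the weighted averages above: the post does not state the male and female population shares used as weights, so the sketch below backs them out from the quoted figures rather than taking them from any official source. The function names are illustrative only. Both implied male shares come out just under one half.

```python
def overall_life_expectancy(male_le, female_le, male_share):
    """Population-weighted mean of male and female life expectancy."""
    return male_share * male_le + (1.0 - male_share) * female_le


def implied_male_share(male_le, female_le, overall_le):
    """Solve overall = s * male + (1 - s) * female for the male share s."""
    return (female_le - overall_le) / (female_le - male_le)


# Figures quoted in the post. The shares are inferred from them,
# not taken from the underlying population data.
national_share = implied_male_share(67.2, 72.9, 70.06)
nt_share = implied_male_share(61.5, 69.2, 65.41)

print(f"implied male share, all Indigenous Australians: {national_share:.3f}")
print(f"implied male share, Northern Territory:         {nt_share:.3f}")

# Round trip: plugging the implied share back in recovers the quoted figure.
print(f"{overall_life_expectancy(67.2, 72.9, national_share):.2f}")
```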
May 23, 2012
Éoin Clarke’s blog has a post on the uneven geographical distribution of NHS cuts. He writes:
The wealthiest, and dare I say it Toriest, parts of England have actually experienced no job losses. The South East of England has actually grown its NHS workforce since the May General Election, while the North West of England alone has experienced more than 6,500 job losses.
His post includes a chart. Clarke’s chart shows absolute figures – I thought I’d make my own version of it, showing percentage change. This doesn’t make any real difference to the story, but here it is anyway. Note that these figures are Hospital and Community Health Service staff, excluding primary care staff – lots of NHS employment isn’t captured.
[Chart: percentage change in Hospital and Community Health Service staff, by English region.]
Continuing my new practice of linking to and attempting to summarise statistics papers, here is a short piece by Andrew Gelman and Hal Stern:
Gelman, Andrew and Stern, Hal, ‘The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant’, The American Statistician, November 2006, Vol. 60, No. 4, pp. 328-331
If I understand things aright, Gelman and Stern make the following point: the emphasis on statistical significance in the reporting of social science results can lead to a misleadingly firm line being drawn between statistically significant and non-significant results – a line which itself misrepresents the statistical significance of the differences between results.
For example: if we are testing the same hypothesis against two different samples, and find a statistically significant result for one but not for another, this may lead us to draw a strong distinction between our two samples. One yields statistical significance and another does not – what difference could be clearer? Nevertheless, this does not itself indicate any statistically significant difference between our samples. If one test yields statistical significance at p = 0.0499, and another test does not yield statistical significance, at p = 0.0501, we have probably not discovered a dramatic difference between them. The actual difference between our samples is presumably tiny – yet because the difference in p value happens to bridge our choice of significance level, this difference can easily be reified, when equally large, or larger, differences between other samples are ignored.
This is intuitive enough – but the same point can apply even when the differences in p value are very substantial. Gelman and Stern write:
Consider two independent studies with effect estimates and standard errors of 25 ± 10 and 10 ± 10. The first study is statistically significant at the 1% level, and the second is not at all statistically significant, being only one standard error away from 0. Thus, it would be tempting to conclude that there is a large difference between the two studies. In fact, however, the difference is not even close to being statistically significant: the estimated difference is 15, with a standard error of … 14.
Additional problems arise when comparing estimates with different levels of information. Suppose in our example that there is a third independent study with much larger sample size that yields an effect estimate of 2.5 with standard error of 1.0. This third study attains the same significance level as the first study, yet the difference between the two is itself also significant. Both find a positive effect but with much different magnitudes. Does the third study replicate the first study? If we restrict attention only to judgments of significance we might say yes, but if we think about the effect being estimated we would say no, as noted by Utts (1991). In fact, the third study finds an effect size much closer to that of the second study, but now because of the sample size it attains significance.
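The arithmetic in the quoted example can be checked with a short sketch. It assumes the estimates are approximately normal and the studies independent, so the standard error of a difference is the square root of the sum of the squared standard errors. The two-sided p-values are a choice made for this sketch, not something the quoted passage specifies, and the function names are illustrative.

```python
import math


def z_and_p(estimate, std_error):
    """z-score and two-sided normal p-value for an estimate against zero."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def difference(est_a, se_a, est_b, se_b):
    """Estimate and standard error of (a - b) for two independent studies."""
    return est_a - est_b, math.sqrt(se_a ** 2 + se_b ** 2)


# (effect estimate, standard error) as given in the quoted passage
studies = {"first": (25.0, 10.0), "second": (10.0, 10.0), "third": (2.5, 1.0)}

for name, (est, se) in studies.items():
    z, p = z_and_p(est, se)
    print(f"{name:>6}: z = {z:.2f}, p = {p:.3f}")

for a, b in [("first", "second"), ("first", "third")]:
    diff, se = difference(*studies[a], *studies[b])
    z, p = z_and_p(diff, se)
    print(f"{a} - {b}: diff = {diff:.1f}, se = {se:.1f}, z = {z:.2f}, p = {p:.3f}")
```

The first and third studies share the same z-score, yet the difference between them has a larger z-score than the difference between the first and second, which is the point of the passage.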
In a blog post that references this paper, Gelman writes:
I’m thinking more and more that we have to get rid of statistical significance, 95% intervals, and all the rest, and just come to a more fundamental acceptance of uncertainty.
I don’t yet know what Gelman means by this latter clause, or what alternative approaches he endorses.
April 9, 2012
There was a piece in the Guardian recently with the headline “Religious people are more likely to be leftwing, says thinktank Demos”:
new research suggests… people with faith are far more likely to take left-of-centre positions on a range of issues… The report found that 55% of people with faith placed themselves on the left of politics, compared with 40% who placed themselves on the right.
The figures given here are unhelpful. The relevant comparison is of course not the percentage of people with faith who identify as left, versus the percentage of people with faith who identify as right – but, rather, the political positions of those with faith compared to the political positions of those without.
So – let’s look at the report – specifically figure 7. “The social and political views of people who belong to religious organisations and those who do not, in western European countries and the UK”
The cluster of bars C indicates that 55% of people in the UK who belong to a religious organisation (not “people with faith” as the article says, but so it goes) place themselves on the left in politics. So far so good. What about people in the UK who do not belong to a religious organisation – what percentage of this group places themselves on the left? Well, the chart is a bit hard to read, but we can go to Appendix B and look at table 17a to find that it’s 62%. [Chi-square p = 0.0125]
I.e. those who belong to religious organisations in the UK are on average considerably less likely to identify as left of centre than those who don’t. The headline is precisely wrong – it should read “Religious people are more likely to be rightwing…”
There’s plenty else wrong with the report and its coverage, but that’ll do. Fucking Demos.
[ PDF of the report here: http://www.demos.co.uk/files/Faithful_citizens_-_web.pdf?1333839181 ]
April 9, 2012
Another good piece on common misuses of statistics (full details at the bottom of the post) – this one demonstrating (among other things) that listening to different types of music will change your age:
Using the same method as in Study 1, we asked 20 University of Pennsylvania undergraduates to listen to either “When I’m Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensibly unrelated task, they indicated their birth date (mm/dd/yyyy) and their father’s age. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040
The gag here, of course, is that if you have enough data, and you analyse it in enough different ways, you’ll be able to find a statistically significant result almost anywhere. The authors of the paper reproduce this same passage later, with some additional phrases added to give a fuller account of the data collection and analysis process (where the fuller account revises a figure, as in “20 34” participants, the second number supersedes the first):
Using the same method as in Study 1, we asked 20 34 University of Pennsylvania undergraduates to listen only to either “When I’m Sixty-Four” by The Beatles or “Kalimba” or “Hot Potato” by the Wiggles. We conducted our analyses after every session of approximately 10 participants; we did not decide in advance when to terminate data collection. Then, in an ostensibly unrelated task, they indicated only their birth date (mm/dd/yyyy) and how old they felt, how much they would enjoy eating at a diner, the square root of 100, their agreement with “computers are complicated machines,” their father’s age, their mother’s age, whether they would take advantage of an early-bird special, their political orientation, which of four Canadian quarterbacks they believed won an award, how often they refer to the past as “the good old days,” and their gender. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040. Without controlling for father’s age, the age difference was smaller and did not reach significance (Ms = 20.3 and 21.2, respectively), F(1, 18) = 1.01, p = .33.
The authors dub this sort of problem “researcher degrees of freedom”. It is a form of data mining.
In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?
It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance,” and to then report only what “worked.” The problem, of course, is that the likelihood of at least one (of many) analyses producing a falsely positive finding at the 5% level is necessarily greater than 5%.
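The closing claim of that passage can be made concrete under one simplifying assumption: that the analyses are independent tests of true null hypotheses. In practice, analyses of the same data are correlated, so this is an illustration rather than an exact figure for any real study. The function name is illustrative.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """P(at least one p < alpha) across n_tests independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 2, 5, 10, 20):
    print(f"{n:>2} analyses: {chance_of_false_positive(n):.3f}")
```

With a single analysis the rate is the nominal 5%. It rises with every additional analysis the researcher is free to try.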
The authors propose a set of guidelines for researchers to follow that will limit “researcher degrees of freedom” –
1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.
2. Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data collection justification.
3. Authors must list all variables collected in a study.
4. Authors must report all experimental conditions, including failed manipulations.
5. If observations are eliminated, authors must also report what the statistical results are if those observations are included.
6. If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate.
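Guideline 1 targets the practice described in the longer passage above: analysing after each batch of participants without a stopping rule fixed in advance. The simulation below is a rough sketch of why that matters, not a reproduction of anything in the paper. The batch size, maximum sample, number of simulations and the known-variance z-test are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_sims=2000, batch=10, max_per_group=50, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, rate with one final test).

    Both groups are drawn from the same N(0, 1) distribution, so every
    rejection is a false positive. Variance is treated as known (z-test).
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        a, b = [], []
        rejected_at_some_look = False
        p = 1.0
        while len(a) < max_per_group:
            a.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            b.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            n = len(a)
            z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
            p = two_sided_p(z)
            # A researcher who peeks would stop and report at the first
            # significant look, so "any look significant" counts as a hit.
            if p < alpha:
                rejected_at_some_look = True
        peeking_hits += rejected_at_some_look
        fixed_hits += (p < alpha)  # p from the final, full-sample look only
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = simulate()
print(f"false-positive rate, testing after every batch: {peeking_rate:.3f}")
print(f"false-positive rate, one test at the end:       {fixed_rate:.3f}")
```

Because the final look is one of the looks, the peeking rate can never be lower than the fixed-sample rate on the same simulated data.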
These solutions are oriented towards psychology, and many of them relate to the data collection/creation process and its reporting. I don’t know how one might effectively limit “researcher degrees of freedom” in a discipline like economics, where often the data is already public, and the “researcher degrees of freedom” can lie in analytic choices alone.
Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant” Psychological Science, XX(X) 1–8, 2011 [pdf!]
April 9, 2012
Browsing around on Google Scholar I ran across this accessible paper by Berk – which seems excellent, to my eyes – on the use and misuse of regression analysis. It’s focused on the use of the technique in criminology, but its claims apply more broadly.
Berk distinguishes between three different levels of regression analysis:
Level I: descriptive – simply identifying patterns in the data. No broader inferential or causal claims made. This can always be justified.
Level II: inferential – estimating parameters of a population, hypothesis testing, use of confidence intervals, etc. This can be justified if the data has been generated by probability sampling. If the data has not been generated by probability sampling, level II analysis is “difficult to justify” (485).
[Berk gives several types of justification that could be offered in this scenario: 1) Treating the data as the population (i.e. falling back to descriptive statistics); 2) Making the case that the data can be treated as if it were a probability sample (“rarely credible in practice”); 3) Treating the data as a random sample from an imaginary ‘superpopulation’ (“even more difficult to justify than inferences from the as-if strategy”); 4) Making use of a model of how the data was generated (risky, because the model might be wrong).]
Level III: causal – estimating causal relationships between variables in the population. “Perhaps too simply put, before the data are analyzed, one must have a causal model that is nearly right” (481). But: “It is very difficult to find empirical research demonstrably based on nearly right models” (482).
Berk concludes that: “With rare exceptions, regression analyses of observational data are best undertaken at Level I. With proper sampling, a Level II analysis can be helpful.” Level III is very difficult to justify. Unfortunately: “The daunting part is getting the analysis past criminology gatekeepers. Reviewers and journal editors typically equate proper statistical practice with Level III.” (486)