Monday, January 9, 2012

Oh, Those Statistical Nits!

Recently, I read a Paul Krugman NY Times column on income inequality, which referenced a NY Times article on income inequality, which referenced a Pew Research Center study on income mobility over generations. The NY Times article stated flatly that the Pew study found that 81% of Americans have more income than their parents. I read the first part of the study carefully, and it did indeed state that most sons of parents in the study, whether white or African-American, bottom or middle or top third of parental income, earned more than their parents did. And then I read even more carefully, and realized that the data absolutely did not support a conclusion that today’s American sons earn more than their parents did.

What went wrong? Well, the Pew analysis took a longitudinal study of families whose sons were between 0 and 18 years of age in 1967-1971, and compared the family income of the parents in 1967-1971 to the family income of the sons in 1995-2002 (omitting a couple of years). There were three basic problems with the analysis. First, government data shows that for the bottom third of family incomes, family income grew from 1967 to 1979, decreased slightly until 1994, grew again until 2001, and decreased to below the 1979 level by 2010. For the middle third, there was a similar trend, except that family incomes are now about the same as or slightly below the 1979 level. For the upper third (and especially for the upper 1%), family income has grown consistently and, by 2010, stood substantially above its 1967-1971 level. So the choice of the two “snapshot” time periods maximized the growth of income between parents and sons in all three income strata. Thus, it is likely that had the periods been, say, 1980-1984 vs. 2006-2010, far fewer sons would have shown a higher family income than their parents.

The second problem with the study was the interval between the two “snapshots”. We know from demographic data that people were having kids, on average, earlier in the 1950s and 1960s, so it is reasonable to suppose that those sons who were 0-18 in 1967-1971 typically had parents who were 21-49, and most frequently about 35, while the sons in turn during 1995-2002 would be 24-55, and most frequently around 40. The reason this matters is that government data shows that families’ earnings trend steadily upwards from age 20 onwards, and reach their peak between ages 45 and 55. In other words, the time periods chosen exaggerated the income earned by the sons by pushing more of them into a peak earnings period.

The third problem with the study’s statistical approach is that it took “family income” as equivalent to “personal income.” Back in 1967-1971, across all three strata, less than one-third of women worked. According to the latest Census data, perhaps 80% as many women as men work, and that was pretty much true in the 1995-2002 period. Those working women, in turn, have been earning perhaps 80% as much as men. So, especially in the bottom and middle thirds, women contributed less than 10% of average “family income” in the 1967-1971 period, and about 40% of “family income” in the 1995-2002 period. If we really want to compare apples to apples, that is, fathers to sons, it is clear that an increase in income from one generation to the next is far less frequent. Now, the study notes that the “family income” is converted to personal income by being “family-size adjusted in all analyses”; but all this does is exaggerate things even further, because family sizes were slightly smaller in 1995-2002 (and now) than in 1967-1971.
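For anyone who wants to check the “about 40%” figure, here is a minimal back-of-the-envelope sketch in Python. It assumes the two rough ratios quoted above (about 0.8 working women per working man, each earning about 0.8 of what a man earns) and treats family income as simply men's plus women's earnings. The function name `womens_share` and that simplification are mine, not Pew's or the Census Bureau's.

```python
def womens_share(workers_ratio: float, pay_ratio: float) -> float:
    """Rough share of combined earnings contributed by women.

    workers_ratio: working women per working man (assumed, from the post)
    pay_ratio: a working woman's earnings relative to a man's (assumed, from the post)
    """
    # Women's total earnings for every dollar earned by men.
    relative_earnings = workers_ratio * pay_ratio
    # Women's earnings as a fraction of men's plus women's earnings.
    return relative_earnings / (1.0 + relative_earnings)


# 1995-2002: roughly 80% as many women working, earning roughly 80% as much.
share_1995_2002 = womens_share(workers_ratio=0.8, pay_ratio=0.8)
print(f"1995-2002: women contribute about {share_1995_2002:.0%} of combined earnings")
```

With those inputs the share comes out at roughly 39%, in line with the “about 40%” above. The same function would give the 1967-1971 share, but that needs a pay ratio for the earlier period, which I have not quoted here, so I leave it out rather than guess.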

One caveat: I was unable to access the Appendix to the study, which explained Pew’s methodology in greater detail. It is always possible that they dealt with these problems to some extent by further statistical tweaks. However, I view that as pretty unlikely, since these considerations are so important that they should have been noted in some way in the main paper.

Now, it is important to keep in mind that I am not a statistics “rocket scientist.” All it took me to figure this one out was a little ongoing digging on the topic of income inequality, and a careful lay-person reading of the methodology section of the Pew paper. The problem here is not that Pew was “lying with statistics”, because the facts were right there in the front of their study report. The real problem is that the so-called journalist of the NY Times apparently didn’t even bother to read that section carefully, much less do a little additional research which would have called the Times’ “81% of Americans” into further question.

So, as that fellow in the insurance commercials would say, what have we learned here? Well, first of all, statistical nits matter. I suspect that when the dust settles, we will find that less than half of all American males are presently earning as much, in real terms, as their fathers did (and the women aren’t a slam dunk either, since much of the surge in their employment and wages happened by the late 1980s). Even if that isn’t true, there’s no way the figure is anywhere near 81%. You need to consider statistical nits like the ones I have cited to convert a statistical study into a realistic picture of what’s going on in the real world.

Second, and equally important, you can’t trust any old reporter to do it for you, no matter how prestigious the name of their institution. You at least have to make a stab at the statistical nits yourself – or you’ll wind up believing what just isn’t so. Thank heavens for the Web, so that we can begin to check those statistical nits. Thank heavens for the Web, which gives us pointers to data that put those statistical nits in context. If we fail to do so, then by all means blame the knucklehead at the NY Times – but also blame ourselves.
