I was listening today to an IBM presentation of their latest Business Intelligence offering, including new capabilities based on their SPSS statistical package, and my mind wandered, as it often does, to a way I have seen such statistical capabilities misused in our daily lives, to the point where the misuse becomes a danger to us. And no, I am not talking about “confirmation bias” (look it up?).
Here’s the way, in my naïve understanding, statistics presently works for us. We have a worldview. Scientists using statistics beaver away and discover possible changes in that worldview (A new drug helps cancer? Tobacco may be harmful to you?), and then test them against a null hypothesis, until it is extremely likely that our worldview should be modified. At that point they announce their results, they get beaten up while others check their work (if necessary), and then as rational people we change our worldview accordingly.
No, again, my problem is not with how rational we are. Rather, I am concerned with what happens between the time when a new hypothesis shows promise and the time when it is pronounced “extremely likely” – statistics-speak for “time to change.” [For those who care, think of “extremely likely” as a one-tailed or two-tailed test in which the null hypothesis is rejected at the 95% to 99% confidence level, and in which power and data-mining bias are just fine.]
So what’s wrong with that? We’re risk-averse, aren’t we? Isn’t it the height of rationality to wait for certainty before going off in the wrong direction, and doing worse instead of better?
Not really. Let me reprise a real-world example that I recently saw.
Polar Bears and Statistics
A scientist in Alaska (Dr. Monnett) had been observing polar bears, sampling one-tenth of his area each year, for 20-odd years (I have stripped this down to its statistical core). One year, for the first time, he observed 4 polar bears in their migration swimming rather than walking across the ice. The next week, again for the first time, he observed 3 polar bears in that area, dead. Let me also add, to streamline this argument, that this was the first time he had observed open water next to the shore during migration season.
Here is what that scientist did, as a good statistician: he wrote a paper noting the new hypothesis (open water is beginning to occur, and it’s killing polar bears), and indicating that, based on his sample size, an initial alternative hypothesis was 40 polar bears per year swimming instead of ice-walking in that region, causing 30 additional polar bear deaths per year. He then requested help from other scientists in carrying out similar surveys in other regions, while he would continue his yearly sample in his region. Duly, they did so, and the evidence grew stronger over the last five years. As far as I know, it has not yet been officially recognized as an extremely likely hypothesis, but it seems reasonable to guess that, if this has not already been done, the hypothesis will be recognized as an “extremely likely scientific fact” in the next five years.
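For those who want to see the arithmetic behind the 40 and the 30, here is a minimal Python sketch. It assumes only what is stated above: a one-tenth sample, 4 swimmers, and 3 deaths. The function name `extrapolate` and the assumption that the sampled tenth is representative of the whole region are mine, not taken from the paper.

```python
# Scale a count seen in a partial survey up to the whole region.
# Assumption (mine): the sampled tenth is representative of the region.
AREAS_PER_SAMPLE = 10  # one-tenth of the area is surveyed each year (from the text)


def extrapolate(observed_count, areas_per_sample=AREAS_PER_SAMPLE):
    """Estimate the region-wide count from the count seen in the sampled area."""
    return observed_count * areas_per_sample


swimming_per_year = extrapolate(4)  # 4 bears seen swimming in the sample
deaths_per_year = extrapolate(3)    # 3 bears found dead in the sample

print(f"Estimated bears swimming per year: {swimming_per_year}")
print(f"Estimated additional deaths per year: {deaths_per_year}")
```

This is nothing more than multiplying by ten, which is the point: the initial estimate comes straight from the sampling fraction, before any further data arrive.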
Now, let’s look more closely at the likely statistical distribution of the data. The first thing to realize is, there can’t be fewer than zero polar bear deaths. In other words, sorry, this is not a normal distribution. It’s not a matter of a cluster of data points around zero and a cluster greater than zero; if you get 20 years of zeros and then a year of 3 or 4, especially given the circumstances, the alternative hypothesis is in fact immediately more likely than your conservative statistician is telling you, even before more data arrive.
Now look at the statistical distribution of polar bears swimming in the new situation. Because you only have a year’s worth of data, that distribution is pretty flat. If you add that, on average, there are 10 polar bears migrating in the area surveyed, then the distribution runs from zero to 94 polar bears swimming in that year, and zero to 100 in any year, in a pretty flat fashion. But statistics also tells us that 40 is the most likely number – even if it has 2% likelihood – and that it is also the median of likely outcomes as of now: you are just as likely to see more polar bears swimming in the region as fewer. In other words, it’s pretty darn likely that some polar bears are going to be swimming from now on (you can take as given that there’s going to continue to be open water), and, if so, the new null hypothesis is 40 per year.
So the only question is, when in the last five years should we have changed our worldview? And the answer is, not after five years, when conservative statistics says the new null hypothesis is “extremely likely.” Rather, depending on the importance of the statistics to us, we should change after the first year’s additional data, at the latest, which is the point at which the new null hypothesis becomes much more likely than the old. [Again, for those who care, this is the point at which the likely zero-swims lambda probability distribution is twice as unlikely as a semi-normal distribution somewhere around 40 polar bears.]
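To make “much more likely than the old” concrete, here is a rough Python sketch that compares how well two yearly rates explain a count of 4 swimmers in the sampled area. Everything beyond the 20 years of zeros and the count of 4 is my assumption: the Poisson model, the use of the “rule of three” (an approximate 95% upper bound of 3/n on a rate after n observations of zero) for the old rate, and setting the new rate equal to the single observed count, which is the choice most favorable to the new hypothesis.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


YEARS_OF_ZEROS = 20    # from the text: 20-odd years with no swimmers seen
OBSERVED_SWIMMERS = 4  # from the text: swimmers seen in the sampled area

# Assumption: "rule of three" upper bound on the old yearly rate.
old_rate = 3 / YEARS_OF_ZEROS
# Assumption: the new yearly rate equals the one count observed so far.
new_rate = float(OBSERVED_SWIMMERS)

likelihood_old = poisson_pmf(OBSERVED_SWIMMERS, old_rate)
likelihood_new = poisson_pmf(OBSERVED_SWIMMERS, new_rate)
likelihood_ratio = likelihood_new / likelihood_old

print(f"P(4 swimmers | old rate {old_rate:.2f}) = {likelihood_old:.3g}")
print(f"P(4 swimmers | new rate {new_rate:.2f}) = {likelihood_new:.3g}")
print(f"Likelihood ratio, new vs. old: {likelihood_ratio:.3g}")
```

Under these assumptions the ratio comes out well above a thousand, far past a factor of two. A more careful analysis would choose the rates differently, so treat this as an illustration of the direction of the effect, not a measurement of it.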
Because now we have to ask, what are we using this data for? If it’s a nice-to-have cancer cure, sure, by all means, let’s wait. If it’s a matter of just how fast climate change is happening …
Of Course Disastrous Climate Change is Still Unlikely. Isn’t It?
I picked that last example above on purpose, as a kind of shock therapy. You see, it appears to me that there is a directly comparable “super-case”: how we – or at least, almost all of the people I see quoted – view the likelihood of various scenarios for upcoming climate change.
Let’s start with the 2007 IPCC report upon which most analysis is based. Those sane, rational folks recognize that this IPCC report lays out all the scientific facts – those aspects of climate change that scientists have confirmed are “extremely likely” – and fits them all together into a model of how climate change is happening and will happen. In that model, we can by reasonable efforts limit global climate change to 2 degrees C total, reached by the end of the century, which is a disastrous amount, but, given the time frame over which this occurs, not civilization-threatening.
That, it turns out, as in the polar bear case, is just about the absolute minimum. And, as in the polar bear case, what the scientists knew in the years before 2007, and which to a fair extent they have confirmed since then, is that it is neither the most likely case nor the median-likelihood case under any scenario except those involving a really drastic reduction in fossil-fuel use over the next 8-18 years. It is just that the real-world data has still not quite reached the point where scientists can stamp a worse outcome as an “extremely likely” new hypothesis. But even in 2007, a worse outcome was quite a bit more likely than the 2007 “scientific fact” model; it is now far, far more likely.
So, as in the polar bear case, here we have the most likely, median-likelihood case, and it is much worse than the minimum case, and what does everyone talk about? The minimum case – which means that it is all too easy to assume that we will make a reasonable effort (or the “free market” has begun taking care of everything) that will hold climate change to 2 degrees C or below.
Just to cite a few examples of this: political leaders are talking about conferences to set targets for carbon emissions reductions of 20% by 2020-30, the number that should hold climate change to 2 degrees C (bad targets, but valid ones if we assume the global economy doesn’t grow from now on). And Prof. Krugman, whom nobody regards as an optimist, is pointing approvingly to the work of an economist expert in the relationship of climate change to economics, who argues that although climate change of 2 degrees C is really unlikely, we should do something about it because of the horrible consequences if it did occur.
These are not irrational people (or, at least, their thinking here seems to be pretty rational). It is just that scientists, concerned about the scientific process, have told them that the IPCC report is scientific fact and the stuff beyond it is not, and we, who should be operating based on “most likely” and “median likelihood”, are instead operating on the basis of a stale “null hypothesis” and “alternative hypothesis.”
Moreover, we should not expect scientists to change their tune. It is their job to move from scientific fact to scientific fact. It should be our job to listen to their “opinions” as well, and weave them together into an understanding of just what the most likely, median-likelihood model is right now. And, because in this case the most likely, median-likelihood case is pretty darn frightening, we must prioritize; we must see that this is not like the polar bear or the cancer cure: The answer to this matters a lot, right now.
A Quick, Boring Aside on Climate Change Statistics, Right?
Not that anyone cares, but what is that most likely, median-likelihood climate change model, and what are the resulting scenarios over the next 90 years or so? Well, first let’s look at where real-world data is diverging from the “minimum” model. Arctic ice decrease is supposed to be linear, and the alternative hypothesis has been that it’s exponential. In the first case, we reach less than 5% ice at minimum somewhere around 2100; in the second, somewhere around 2030. Real-world volume data? Somewhere between 2016 and 2020. Oh, and the same data shows the Arctic all but free of ice year-round by somewhere between 2035 and 2045.
Similar thing with Greenland ice. Looks like it’s doubling its rate of ice loss every decade, but null hypothesis is, it will be linear from now on. Antarctic ice? Null hypothesis is still zero ice loss; initial surveys suggest linear ice loss; you think maybe we’ll start to figure out exponential ice loss in the next ten years? Natural-source methane release: can’t rule out zero natural-source increase, the likelihood is that a fairly big increase in the last 10 years is due mainly to natural causes, too early to tell about exponential, never mind the Russian surveys. At least, with atmospheric carbon measurements, somebody has noted a semi-linear increase in the annual increase over the last thirty years, and another scientist has come up with a reason that the rate of temp change caused by that rising increase will top out at 1 degree C per decade in 2050 – and he’s not operating on the basis of the minimum model, thank heavens.
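To see why the choice between “linear from now on” and “doubling every decade” matters so much, here is a small Python sketch comparing cumulative loss under the two assumptions. The loss rate is set to 1 in arbitrary units per year, so the output is only a ratio between the two cases; no real ice-loss figures are used, and the function names are mine. It is an arithmetic illustration, not a physical model (real ice loss is bounded by how much ice there is).

```python
import math


def cumulative_constant(rate, years):
    """Total loss when the yearly loss rate stays fixed."""
    return rate * years


def cumulative_doubling(rate, years, doubling_time=10.0):
    """Total loss when the yearly loss rate doubles every `doubling_time` years.

    This is the integral of rate * 2**(t / doubling_time) from 0 to `years`.
    """
    k = math.log(2) / doubling_time
    return rate * (math.exp(k * years) - 1.0) / k


for horizon in (10, 30, 50, 90):
    steady = cumulative_constant(1.0, horizon)
    doubling = cumulative_doubling(1.0, horizon)
    print(f"{horizon:>2} years: constant {steady:8.1f}, "
          f"doubling {doubling:8.1f}, ratio {doubling / steady:6.1f}")
```

After three decades the doubling case has already lost roughly three times as much as the constant-rate case, and the gap widens quickly after that, which is why picking the constant-rate null hypothesis is not a neutral choice.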
So what does that mean? Well, the closest any scientist has come to a most likely, median likelihood model is the MIT 2011 study. “Business as usual” – a reasonable effort – yields 6 degrees C (11 degrees F) temp rise by end of century and 10-15 feet of sea rise. Of course, the results of that are probably not much more than the results of 2 degrees C temp rise, right?
And, remember, this model is still more scientifically “conservative” than the most likely one. Now we’re talking more than 6 degrees C by 2100, perhaps 25 feet of sea rise (mostly between 2050 and 2100) by 2100. And still, there’s that methane kicker to possibly add – which is “unlikely” now because we still aren’t sure how it will play out; it’s happening faster than we thought, but the absolute upper limit on its effects is coming down as we learn more.
Back to Statistics, Wrapping Up
I would dearly like for scientists to make a reasonable effort, as part of communicating a result, to tell us the implications of most-likely, median-likelihood models based on that result. I can understand why they won’t, given how the legal system and the press beat them up for imagined “sins” in that direction. Which is why I would simply ask the reader to make the effort, in cases that just might be very important to the reader or to us all.
Because, you see, in the long run, scientists are all dead, and so are we. If we wait for scientific fact to be established, which can sometimes take a lifetime, no one can criticize us for it. But, in cases where we should have acted long before we did, because, in fact, our old worldview was dangerously wrong and the evidence told us well before the final scientist’s official extremely-likely stamp, we ought at least to feel some pangs of guilt. Rationality isn’t enough. Understanding and compensating for the limitations of our null-hypothesis statistics hopefully is.
Monday, February 27, 2012
Monday, January 9, 2012
Oh, Those Statistical Nits!
Recently, I read a Paul Krugman NY Times column on income inequality, which referenced a NY Times article on income inequality, which referenced a Pew Research Center study on income mobility over generations. The NY Times article stated flatly that the Pew study found that 81% of Americans have more income than their parents. I read the first part of the study carefully, and it did indeed state that most sons of parents in the study, whether white or African-American, bottom or middle or top third of parental income, earned more than their parents did. And then I read even more carefully, and realized that the data absolutely did not support a conclusion that today’s American sons earn more than their parents did.
What went wrong? Well, the study took a longitudinal survey of families whose sons were between 0 and 18 years of age in 1967-1971, and compared the family income of the parents in 1967-1971 to the family income of the sons in 1995-2002 (omitting a couple of years). There were three basic problems with the analysis. First, government data shows that for the bottom third of family incomes, family income grew from 1967 to 1979, decreased slightly until 1994, grew again until 2001, and decreased to below the 1979 level by 2010. For the middle third, there was a similar trend, except that family incomes are now about the same as or slightly below the 1979 level. For the upper third (and especially for the upper 1%), family income has grown consistently and, by 2010, substantially over 1967-1971. So the choice of the two “snapshot” time periods maximized the growth of income between parents and sons in all three income strata. Thus, it is likely that had the periods been, say, 1980-1984 vs. 2006-2010, far fewer sons would have increased their family income compared to their parents.
The second problem with the study was the interval between the two “snapshots”. We know from demographic data that people were having kids, on average, earlier in the 1950s and 1960s, and so it is reasonable to suppose that those sons who were 0-18 in 1967-1971 typically had parents who were 21-49, and most frequently about 35, while the sons in turn during 1995-2002 would be 24-55, and most frequently around 40. The reason this matters is that government data shows that families’ earnings trend steadily upwards from age 20 onwards, and reach their peak from ages 45-55. In other words, the time periods chosen exaggerated the income earned by the sons by pushing more of them into a peak earnings period.
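The sons’ age range can be checked with a few lines of Python. This sketch uses only the dates given above and approximates age as a simple difference of calendar years, which is my simplification. The helper name `age_range` is also mine.

```python
def age_range(birth_years, observation_years):
    """Youngest and oldest possible ages, using year differences as ages."""
    ages = [obs - born for born in birth_years for obs in observation_years]
    return min(ages), max(ages)


# Sons aged 0-18 at some point during 1967-1971 were born between
# 1949 (18 years old in 1967) and 1971 (newborn in 1971).
son_birth_years = range(1967 - 18, 1971 + 1)

# The sons' family income was measured during 1995-2002.
son_observation_years = range(1995, 2002 + 1)

youngest, oldest = age_range(son_birth_years, son_observation_years)
print(f"Sons' ages during 1995-2002: {youngest} to {oldest}")
```

With year-difference ages this gives 24 to 53, close to the 24-55 range quoted above, so the point stands: a large share of the sons were measured in or near their peak earning years.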
The third problem with the study’s statistical approach is that it took “family income” as equivalent to “personal income.” Back in 1967-1971, across all three strata, less than one-third of women worked. According to the latest Census data, perhaps 80% as many women as men work, and that was pretty much true in the 1995-2002 period. Working women, in turn, have been earning perhaps 80% as much as men. So, especially in the bottom and middle thirds, women contributed less than 10% of average “family income” in the 1967-1971 period, and about 40% of “family income” in the 1995-2002 period. If we are really comparing apples to apples, that is, fathers to sons, it is clear that any upward trend in income is far less frequent. Now, the study notes that the “family income” is converted to personal income by being “family-size adjusted in all analyses”; but all this does is exaggerate things even further, because family sizes were slightly smaller in 1995-2002 (and now) than in 1967-1971.
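Here is a back-of-the-envelope Python check of those shares. The model is my simplification: men’s total earnings are set to 1, and women’s total earnings are the number of working women relative to men times their pay relative to men. The function names are mine, and the 1967-1971 pay ratio is not given above, so the second function only works out what ratio the “less than 10%” figure would imply.

```python
def womens_share(participation_ratio, earnings_ratio):
    """Women's share of combined earnings, with men's earnings set to 1."""
    womens_earnings = participation_ratio * earnings_ratio
    return womens_earnings / (1.0 + womens_earnings)


def earnings_ratio_for_share(share, participation_ratio):
    """Pay ratio (women vs. men) needed to produce a given share."""
    return share / ((1.0 - share) * participation_ratio)


# 1995-2002: about 80% as many women as men work, earning about 80% as much.
later_share = womens_share(0.8, 0.8)

# 1967-1971: at most one-third as many women as men work (a simplification of
# "less than one-third of women worked"); what pay ratio gives a 10% share?
implied_ratio = earnings_ratio_for_share(0.10, 1 / 3)

print(f"Women's share of family income, 1995-2002: {later_share:.1%}")
print(f"Pay ratio implied by a 10% share in 1967-1971: {implied_ratio:.2f}")
```

The later-period share comes out at about 39%, in line with the “about 40%” above. For the earlier period, a share below 10% would require working women to earn roughly a third as much as men or less under this model; since that ratio is not stated here, take it as a consistency check, not as a finding.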
One caveat: I was unable to access the Appendix to the study, which explained Pew’s methodology in greater detail. It is always possible that they dealt with these problems to some extent by further statistical tweaks. However, I view that as pretty unlikely, since these considerations are so important that they should have been noted in some way in the main paper.
Now, it is important to keep in mind that I am not a statistics “rocket scientist.” All it took me to figure this one out was a little ongoing digging on the topic of income inequality, and a careful lay-person reading of the methodology section of the Pew paper. The problem here is not that Pew was “lying with statistics”, because the facts were right there in the front of their study report. The real problem is that the so-called journalist of the NY Times apparently didn’t even bother to read that section carefully, much less do a little additional research which would have called the Times’ “81% of Americans” into further question.
So, as that fellow in the insurance commercials would say, what have we learned here? Well, first of all, statistical nits matter. I suspect that when the dust settles, we will find that fewer than half of all American males are presently earning as much, in real terms, as their fathers did (and the women aren’t a slam dunk either, since much of the surge in their employment and wages happened by the late 1980s). Even if that isn’t true, there’s no way the figure is anywhere near 81%. You need to consider statistical nits like the ones I have cited to convert a statistical study into a realistic picture of what’s going on in the real world.
Second, and equally important, you can’t trust any old reporter to do it for you, no matter how prestigious the name of their institution. You at least have to make a stab at the statistical nits yourself – or you’ll wind up believing what just isn’t so. Thank heavens for the Web, so that we can begin to check those statistical nits. Thank heavens for the Web, which gives us pointers to data that put those statistical nits in context. If we fail to do so, then by all means blame the knucklehead at the NY Times – but also blame ourselves.