How to Lie With Statistics—Sometimes Without Even Trying
Some time back, there were news stories reporting on studies of several communities in which smoking bans were followed by reductions in heart attacks. There are now reports of a much larger study, done at the NBER, that finds no such effect. How can one explain the discrepancy?
The simple answer is that in some communities heart attack deaths went up after smoking bans, in some they went down, and in some they remained more or less unchanged. Hence a study of a single community could find a substantial reduction even if there was no reduction on average across all communities.
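A minimal simulation makes the point concrete. The sketch below uses invented numbers purely for illustration: it assumes the ban has no effect at all, so each community's before and after counts are drawn from the same distribution. Chance variation alone guarantees that some communities show large apparent reductions.

```python
import random
import statistics

# Illustrative assumption: a smoking ban has NO real effect, but yearly
# heart-attack counts still fluctuate. Some communities will then show a
# sizable "reduction" after the ban purely by chance.

random.seed(1)
N_COMMUNITIES = 100
BASE_RATE = 200  # assumed average heart attacks per year per community

def yearly_count():
    # Poisson-like noise via a normal approximation (variance = mean)
    return random.gauss(BASE_RATE, BASE_RATE ** 0.5)

changes = []
for _ in range(N_COMMUNITIES):
    before = yearly_count()
    after = yearly_count()  # same distribution: no true effect
    changes.append((after - before) / before * 100)

print(f"average change across communities: {statistics.mean(changes):+.1f}%")
print(f"largest apparent reduction: {min(changes):+.1f}%")
print(f"largest apparent increase: {max(changes):+.1f}%")
```

The average change comes out near zero, but the single luckiest community shows a reduction of roughly twenty percent. A study that looked only at that community would report a dramatic effect that does not exist.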
How did the particular communities reported in the early stories get selected? There are two obvious possibilities.
The first is that the studies were done by people trying to produce evidence of the dangers of second hand smoke. To do so, they studied one community after another until they found one where the evidence fit their theory, then reported only on that one. If that is what happened, the people responsible were deliberately dishonest; no research results they publish in the future should be taken seriously.
There is, however, an alternative explanation that gives exactly the same result with no villainy required. Every year lots of studies of different things get done. Only some of them make it to publication, and only a tiny minority of those make it into the newspapers. A study finding no effect from smoking bans is much less likely to be publishable than one that finds an effect. A study finding the opposite of the expected result is more likely to be dismissed as an anomaly due to statistical variation or experimental error than one confirming the expected result. And, among published studies, one that provides evidence for something that lots of people want to believe is more likely to make it into the newspapers than one that doesn't.
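The same sort of simulation can illustrate the filtering argument. Assume, again purely for illustration, a thousand studies of a nonexistent effect, with a crude filter that a study is "publishable" only if it finds a reduction of at least two standard errors. The published studies then report a substantial average reduction even though the true effect is zero, with no individual researcher doing anything dishonest.

```python
import random
import statistics

# Illustrative assumptions: 1000 studies of a true effect of zero, each
# estimate subject to random noise; only "significant" reductions survive
# the publication filter.

random.seed(2)
TRUE_EFFECT = 0.0   # assume smoking bans really do nothing
NOISE_SD = 5.0      # assumed percentage-point noise in each estimate

all_studies = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(1000)]

# Crude filter: "publishable" means a reduction of at least two standard
# errors, roughly a one-sided p < 0.05 in the expected direction.
published = [e for e in all_studies if e <= -2 * NOISE_SD]

print(f"all {len(all_studies)} studies, mean effect: "
      f"{statistics.mean(all_studies):+.1f} points")
print(f"{len(published)} published studies, mean effect: "
      f"{statistics.mean(published):+.1f} points")
```

Only a couple of dozen studies pass the filter, and those report an average reduction of more than ten percentage points, even though the full set of studies averages out to nothing. Add the newspapers' own filter on top of the journals' filter, and the studies the public hears about are a still more biased sample.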