The Abuse of Statistics
Over the years I have accumulated a number of examples. Here are some of them.
A good deal of the rhetoric in support of proposals for raising taxes on higher incomes is designed to make it sound as though rich people pay federal taxes at a lower rate than everyone else. That, as one can easily check by looking at the published figures from the Congressional Budget Office, is not only false but wildly false. Most people in the bottom half of the income distribution pay no federal income tax at all, although they do pay payroll taxes, and some of the cost of other taxes is passed on to them in higher prices or lower wages. On the CBO calculations, the ratio of total federal tax paid to income rises pretty much monotonically with income.
A less extreme claim, which got a good deal of press some years ago, was that a quarter of the households with an income of at least a million dollars a year pay taxes at a lower rate than the ten percent of those with incomes under $100,000 who pay at the highest rates. I have not seen any detailed explanation of how those numbers were calculated, but presumably they are based on income and tax for a single year.
Income and tax liability vary for each individual from year to year. If you take a large capital loss one year, part of it carries over to reduce your taxes but not your income in the next year. If you have a large capital gain in one year, your taxes go up for that year but your average tax rate goes down, since capital gains are taxed at a lower rate than ordinary income.
Some of the 25% of high-income taxpayers who pay at the lowest rate may be people who regularly pay lower taxes than most; some are taxpayers who merely happen to be paying a lower rate than average this year. Some of the 10% of middle-income taxpayers paying at the highest rate may be people who regularly pay more than most; some are people who happen to be paying a higher rate this year than in most years. The widely reported factoid thus overstates, by how much I have no way of knowing, the spread of both distributions: both the number of middle-income taxpayers who, year after year, are taxed at a higher rate than the bottom 25% of high-income taxpayers and the number of high-income taxpayers who, year after year, are taxed at a lower rate than the top 10% of middle-income taxpayers.
If the logic is not clear, consider betting on the races. Each day a significant fraction of the bettors, say a quarter, make money. A few of them make money because they are much better than most at guessing which horse will win, most because that was a day that they happened to be lucky.
If you looked only at the day's results, you would conclude that the top quarter make money at the races. If you looked at the year's results, you would come up with a much smaller number. Likewise, if you looked at the tax rates paid by any group of taxpayers over a period of years, you would find fewer paying a rate that was unusually high or unusually low than if you looked at them for a single year.
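A toy simulation makes the gap between the two views concrete. The model below is my own invention for illustration: each bettor's daily result is a draw from a distribution with a house edge, tuned so that roughly a quarter of bettors come out ahead on any single day.

```python
import random

random.seed(42)  # arbitrary seed so the illustration is reproducible

bettors, days = 10_000, 250

# Hypothetical model of a losing game: each day's result has mean -1 unit
# and spread 1.5 units, so about a quarter of daily results are positive.
results = [[random.gauss(-1, 1.5) for _ in range(days)] for _ in range(bettors)]

one_day_winners = sum(r[0] > 0 for r in results) / bettors
year_winners = sum(sum(r) > 0 for r in results) / bettors

print(f"profitable on a single day: {one_day_winners:.0%}")  # about 25%
print(f"profitable over the year:   {year_winners:.0%}")     # essentially 0%
```

Looking at one day, about a quarter of the bettors are "winners"; looking at the whole year, almost none are, because a single lucky day says little about skill.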
The same problem means that the Gini index, the standard measure of a society’s income inequality, fails to distinguish between lifetime inequality and year-to-year inequality. A society where everyone’s income alternated between $50,000 one year and $100,000 the next would have the same Gini index, be by that measure exactly as unequal, as a society where half the people had an income of $50,000 a year and half of $100,000.
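A minimal calculation, using the mean-absolute-difference form of the Gini coefficient, shows this directly (the two-group population sizes are my choice for illustration):

```python
def gini(incomes):
    """Gini coefficient: mean absolute difference between all income pairs,
    divided by twice the mean income."""
    n = len(incomes)
    mean = sum(incomes) / n
    mad = sum(abs(a - b) for a in incomes for b in incomes) / (n * n)
    return mad / (2 * mean)

# A single year's snapshot of either society: half earn $50,000, half $100,000.
snapshot = [50_000] * 50 + [100_000] * 50
print(round(gini(snapshot), 4))  # 0.1667 for both societies

# Two-year incomes in the alternating society: everyone earned $150,000.
lifetime = [150_000] * 100
print(gini(lifetime))  # 0.0: perfect lifetime equality
```

The annual snapshot gives the same index either way; measured over lifetimes, the alternating society is perfectly equal.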
For readers interested in a more general account of how to lie with statistics, I have a book to recommend.
The Effect of Helmet Laws
Clicking on the link, one discovers that:
MELBOURNE, Fla. (AP) – Motorcycle fatalities involving riders without helmets have soared in the nearly six years since Gov. Jeb Bush repealed the state’s helmet law, a newspaper reported Sunday.
A Florida Today analysis of federal motorcycle crash statistics found “unhelmeted” deaths in Florida rose from 22 in 1998 and 1999, the years before the helmet law repeal, to 250 in 2004.
Total motorcycle deaths in the state have increased 67 percent, from 259 in 2000 to 432 in 2004, according to National Highway Traffic Safety Administration statistics.
Records, though, also show motorcycle registrations have increased 87 percent in Florida since Bush signed the helmet law repeal on July 1, 2000.
Deaths went up 67%, registrations went up 87%, so deaths per registered motorcycle have been going down. That "unhelmeted deaths" went up steeply sounds convincing — until you realize that one result of not wearing a helmet is that an accident that would have killed you even with a helmet now counts as an "unhelmeted" instead of a "helmeted" death. I do not know what else changed over the period; it would be interesting to see comparable statistics from states that did not change their laws. But the evidence presented in the article, taken by itself, implies precisely the opposite of what the top-level headline suggests.
"Poll: Clinton gets high 'no' vote for 2008"
That was the headline of a story on a CNN web page. Someone who actually read the story would discover that 47% of those polled said they would definitely vote against Hillary Clinton, 47% against Kerry, 48% against Gore, and 63% against Jeb Bush. It is true that McCain scored 34% and Giuliani 30%, but that puts Hillary in the middle of the unpopularity ratings and not, as the headline implies, at the top.
She did, however, have one distinction—the highest positive rating. 22% of those polled said they would definitely vote for her. The other candidates had ratings ranging from 19% (Giuliani) down to 9% (Jeb Bush).
The Cost of Healthy Eating
According to one article,
"A new analysis shows healthy eating can really run up a grocery bill, making it tough for Americans on tight budgets to meet nutritional guidelines. The study estimates that getting the average American to the recommended target of just one nutrient, potassium, would cost an additional $380 each year."
Anyone who believes that should Google for "potassium supplement" — priced at $9 for 120 potassium iodide tabs of 32.5 mg each from one source, $6.87 for 100 caplets of 99 mg of potassium gluconate from another, and about ten cents a pill — with calcium and magnesium thrown in for free — from a third.
The article pretends to be about what healthy eating costs. It is actually about what people who eat healthily spend. Higher income correlates with better education, so people who spend more also, on average, spend better, nutritionally speaking. That is no evidence that good nutrition costs more and, as a comparison between the price of spareribs and the price of pork and beans or fruit salad would demonstrate, it often does not. I expect that the same analysis could be used to show that people who spend more on rent eat better too.
It is possible, although not likely, that an author could be sufficiently clueless to make the argument and believe it. But not this author. Reading the article, it was pretty obvious what axe was being ground. And I have difficulty believing in an author who thinks that if only the prices of apricots and raisins were sufficiently subsidized, people who currently prefer Happy Meals would switch to fruit salads instead.
When I commented on the article on my blog, more than ten years ago, I included a link. It no longer links to that article. I tried to find it with the Wayback Machine, one of my favorite bits of online magic, but it seems to have fallen through the cracks, having been on a page that was updated between scans of the web by the Internet Archive. If a reader can find it — my blog post was on August 5, 2011 — let me know.
Truth, Falsity, and Jobless Numbers
Back during the Obama administration I came across a web page with the following claim:
Given the difficulty of estimating economic data it is common practice for government agencies to announce a preliminary number subject to later revision. Under the law of averages, estimates should balance out between being higher or lower than later revisions. Amazingly, though, the Obama Department of Labor’s preliminary estimates of new jobless claims have been lower than later revisions in 56 of the last 57 weeks.
The claim was followed by what purported to be a quote from Labor Secretary Hilda Solis.
“We feel it is better to err on the side of optimism,” she said. “The preliminary estimate is widely reported. The subsequent revisions are rarely noticed. By adding a bit of sheen to the preliminary estimate we feel we are helping to boost morale. We believe that good morale is an important building block for positive change.”

“Making the economy look better will make people feel better,” Solis went on. “If people feel better they are more likely to support the policies of the Administration, which we feel is crucial if we are to be given the opportunity to continue on the path laid out by the President for another four years.”
My immediate reaction was suspicion — the quote sounded too much like what a critic of the President would imagine his labor secretary saying and quite unlike what an administration official would actually say. I put a comment on the web page to that effect, adding that I didn't have an opinion on whether the initial fact was true.
Further investigation found the quote only on pages hostile to the administration and no source other than another such page; a commenter pointed out that the "quote" from the labor secretary originated as part of a longer piece, obviously intended as satire. On the other hand, the fact, initial underestimates for 56 of the past 57 weeks, is from the Wall Street Journal and so presumably true.
I conclude that the labor department has indeed been deliberately misrepresenting the evidence — erring in the same direction 56 times out of 57 is not something that has any significant probability of happening by chance. The obvious explanation is the one given in the purported quote. But I am quite confident that the labor secretary did not actually say what the quote asserted she said. At least not in public.
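The arithmetic behind that confidence is straightforward. If each week's preliminary estimate were equally likely to err high or low, the number of low errors would follow a binomial distribution, and a run of 56 or more out of 57 in either direction would be astronomically unlikely:

```python
from math import comb

n, k = 57, 56

# Probability that k or more of n independent fair "coin flips" land low:
one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
# Allow the run to have gone in either direction:
two_sided = 2 * one_sided

print(f"{two_sided:.1e}")  # about 8e-16
```

A probability on the order of one in a quadrillion is, for practical purposes, zero.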
How to Lie With Statistics—Sometimes Without Even Trying
Some time back, there were news stories reporting on studies of several communities that showed smoking bans to be followed by reductions in heart attacks. There are now reports of a much larger study done at the NBER which finds no such effect. How can one explain the discrepancy?
The simple answer is that in some communities heart attack deaths went up after smoking bans, in some they went down, in some they remained more or less unchanged. Hence a study of a single community could find a substantial reduction even if it was not true on average over all communities.
How did the particular communities reported in the early stories get selected? There are two obvious possibilities.
The first is that the studies were done by people trying to produce evidence of the dangers of second-hand smoke. To do so, they studied one community after another until they found one where the evidence fit their theory, then reported only on that one. If that is what happened, the people responsible were deliberately dishonest; no research results they publish in the future should be taken seriously.
There is, however, an alternative explanation that gives exactly the same result with no villainy required. Every year lots of studies of different things get done. Only some of them make it to publication and only a tiny minority of those make it into the newspapers. A study finding no effect from smoking bans is much less likely to be publishable than one that finds an effect. A study finding the opposite of the expected result is more likely to be dismissed as an anomaly due to statistical variation or experimental error than one confirming the expected result — possibly by a researcher reluctant to get labeled as an apologist for smoking. And, among published studies, one that provides evidence for something that lots of people want to believe is more likely to make it into the newspapers than one that doesn't.
Arguably this is a problem that has been made worse by technological progress. In the old days, back when my father was doing his original research, a multiple regression required hours of work with an elaborate desktop calculator; there was one on the desk of his home office. Nowadays it requires a few minutes with the sort of computer everyone has — I am typing this blog post on a machine that could do it, sitting at a picnic table at Porcfest.
The easier it is to do a regression, the more get done, and similarly with other forms of data processing. The more regressions are done, the greater the chance that one will support your theory, even if the theory isn’t true. A confidence level of .05 is often misinterpreted as meaning that there is only one chance in twenty that the positive result is due to chance. What it actually means is that if the theory being tested were false in the particular way defined by the null hypothesis — if, for a simple example, the coin you were flipping had equal chances of coming up heads or tails — the chance that the evidence for your theory would come out as strong as it did is less than .05.
Run your coin flipping experiment forty times with a fair coin and odds are you will get the theory that it is biased confirmed at the .05 level at least once.
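A sketch of that claim (the 100-flips-per-experiment design is my choice, and an exact two-sided test stands in for whatever test a real study would use):

```python
import random
from math import comb

def two_sided_p(heads, flips):
    """Exact two-sided p-value under a fair-coin null: the probability of a
    head count at least as far from flips/2 as the one observed."""
    dev = abs(heads - flips / 2)
    return sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - flips / 2) >= dev) / 2 ** flips

flips, experiments = 100, 40

# Per-experiment false-positive rate of the test (a bit under .05 because
# the binomial distribution is discrete):
alpha = sum(comb(flips, k) for k in range(flips + 1)
            if two_sided_p(k, flips) < 0.05) / 2 ** flips
print(f"chance one fair-coin experiment 'confirms' bias: {alpha:.3f}")
print(f"chance at least one of forty does: {1 - (1 - alpha) ** experiments:.2f}")

# The same thing by simulation:
random.seed(1)  # arbitrary seed, for reproducibility
hits = sum(
    two_sided_p(sum(random.random() < 0.5 for _ in range(flips)), flips) < 0.05
    for _ in range(experiments)
)
print(f"spurious confirmations in this run: {hits}")
```

With forty tries, the chance of at least one spurious confirmation is roughly three in four, even though each individual test is held to the .05 standard.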
As one commenter on my blog put it:
Computers have created many more monkeys typing on many more typewriters.
The effect is amplified by the fact that most people using statistical factoids neither know nor care how reliable the information embedded in them is, provided it supports the position they want supported.
I first came across the second hand smoke issue when my university proposed to ban smoking anywhere on the campus and supported the ban with the claim that second-hand smoke kills more than fifty thousand people a year. That turned out to be a misstatement of a claim in a 2005 report from the California EPA. In that report 50,000 was the midpoint of a range of possible values. In the justification for the proposed ban, it was converted to a lower bound.
Reading the 2005 report, I was unable to find out where the number came from. It also appears in a Surgeon General's Report, but reading that, it is reasonably clear that it was simply repeating the CA EPA figure, not offering an independent estimate. My best guess is that it was based on the sort of study described above.
When I raised the question by email with a supporter of the ban it became clear that he neither knew nor cared. A university, a Jesuit university, was willing to proclaim a purported scientific fact without any effort to determine whether it was true. The motive, I suspect, was a paternalistic desire to discourage smoking, which does indeed have statistically demonstrable negative effects, by making it less convenient.
My one Jesuit friend and colleague is unfortunately no longer alive and I am retired from SCU so not likely to find another, but I should look for an opportunity to ask a member of the Order what their view is of telling lies for good purposes.