Statistical Arguments

And Climate

May 09, 2023

Two Problems With the 1% Claim

News stories claimed that a 2014 paper by Lovejoy proved that the probability that the warming of the past century is entirely due to natural causes is less than one percent. I find the conclusion plausible enough but, so far as I can tell, there is no way that it can be derived in the way Lovejoy is said to have derived it.

The first problem, the fault of the reporters not of Lovejoy himself, is the misinterpretation of what a significance result from classical statistics means. If you analyze a body of data and reject the null hypothesis at the .01 level, that means that, if the null hypothesis were true, the probability of getting evidence against it at least as strong as what you actually got would be less than .01: the probability of the evidence conditional on the null hypothesis. That does not imply that the probability that the null hypothesis is true, given that the evidence against it is that strong, is less than .01: the probability of the null hypothesis conditional on the evidence. The two sound similar but are in fact entirely different.

My standard example is to imagine that you pull a coin out of your pocket, toss it without inspecting it, and get heads twice. The null hypothesis is that it is a fair coin, the alternative hypothesis that it is a double-headed coin. The chance of getting two heads if it is a fair coin is only .25. It does not follow that, after getting two heads, you should conclude that the probability is .75 that the coin is double-headed.
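To see why, you have to bring in how likely a double-headed coin was before you tossed it, which is exactly what the significance test leaves out. A minimal Bayes' rule sketch, assuming a purely illustrative prior of one double-headed coin in 10,000 (that number is not from the text):

```python
def posterior_double_headed(prior_double: float, n_heads: int) -> float:
    """P(double-headed | n heads in n tosses), by Bayes' rule over two hypotheses."""
    like_fair = 0.5 ** n_heads      # P(evidence | fair coin): the .25 in the text for n = 2
    like_double = 1.0               # a double-headed coin always shows heads
    joint_double = prior_double * like_double
    joint_fair = (1.0 - prior_double) * like_fair
    return joint_double / (joint_double + joint_fair)


# Hypothetical prior: one coin in 10,000 in circulation is double-headed.
PRIOR = 1e-4
print(f"P(two heads | fair coin)     = {0.5 ** 2:.2f}")
print(f"P(double-headed | two heads) = {posterior_double_headed(PRIOR, 2):.5f}")
```

Under that assumed prior, two heads raise the probability of a double-headed coin to only about 0.0004. The .25 is the probability of the evidence given the null; the probability of the null given the evidence depends on the prior as well.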

The second problem is that, so far as I can tell, there is no way Lovejoy could have calculated the probability that natural processes would produce 20th century warming from the data he was using, which consisted of a reconstruction of world temperature from 1500 to the present. The paper is sufficiently complicated that I may be misinterpreting it, but I think his procedure went essentially as follows:

Assume that changes in global temperature prior to 1880 were due to random natural causes. Use the data from 1500 to 1875 to estimate the probability distribution of natural variation in global temperature. Given that distribution, calculate the probability that natural variation would produce as much warming from 1880 to 2008 as occurred. That probability is less than .01. Hence reject at the .01 level the null hypothesis that warming from 1880 on was entirely due to natural causes.
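The steps above can be sketched in a few lines of Python. This is a deliberately simplified version under stated assumptions: it fits a plain Gaussian to 128-year temperature changes, which is not Lovejoy's actual method (his paper brackets the tails with a modified Gaussian); the input is synthetic white noise standing in for the 1500-1875 reconstruction; and the one-degree observed change is a round illustrative figure.

```python
import math
import random
import statistics


def window_changes(temps, window):
    """Change in temperature across every span of `window` years in the series."""
    return [temps[i + window] - temps[i] for i in range(len(temps) - window)]


def gaussian_exceedance(changes, observed):
    """P(change >= observed), treating `changes` as draws from a fitted Gaussian."""
    mu = statistics.mean(changes)
    sigma = statistics.stdev(changes)
    return 0.5 * math.erfc((observed - mu) / (sigma * math.sqrt(2)))


# Hypothetical stand-in for the 1500-1875 reconstruction: white noise, sd 0.2 C.
rng = random.Random(42)
pre_1880 = [rng.gauss(0.0, 0.2) for _ in range(376)]

# 1880-2008 is 128 years; assume roughly one degree C of observed warming.
p_value = gaussian_exceedance(window_changes(pre_1880, 128), observed=1.0)
print(f"P(natural change >= 1.0 C over 128 years) = {p_value:.1e}")
print("reject natural-only at .01" if p_value < 0.01 else "cannot reject at .01")
```

Whatever the fitted shape says about its far tail is what the final probability reports; nothing in the 376 input values constrains events too rare to appear among them.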

The problem with this procedure is that data from 1500 on can only give information on random natural processes whose annual probability is high enough that their effect can be observed and their probability calculated within that time span. Suppose there is some natural process capable of causing a global temperature rise of one degree C in a century whose annual probability is less than .001. The odds are greater than even that it will not occur even once in Lovejoy's data, hence he has no way of estimating the probability that such a process exists. The existence of such a process would provide an explanation of 20th century warming that does not involve human action, so he cannot estimate, from his data, how likely it is that natural processes would have produced observed warming, which is what he is claiming to do. 20th century warming would, in that case, be what Taleb refers to as a Black Swan event. If one swan in a thousand is black, the observer looks at five hundred swans, finds all of them white, and concludes, incorrectly, that the probability of a black swan is zero.[1]
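The "greater than even" odds follow from treating years as independent trials with a fixed annual probability, a simplifying assumption. A short Python check, using roughly 375 years for 1500-1875:

```python
def prob_never_seen(annual_prob: float, years: int) -> float:
    """Chance that an event with a fixed annual probability never occurs in `years`
    independent years."""
    return (1.0 - annual_prob) ** years


# Roughly 375 years of data (1500-1875).
for annual_prob in (0.01, 0.001):
    print(f"annual prob {annual_prob}: never seen in 375 years "
          f"with probability {prob_never_seen(annual_prob, 375):.3f}")

# Taleb's swans: one in a thousand is black, and the observer checks 500.
print(f"all 500 swans white: {prob_never_seen(0.001, 500):.3f}")
```

A process with annual probability .01 almost certainly shows up in the record (it is missed only about 2% of the time), while one with annual probability .001 is missed about 69% of the time, and the swan watcher sees no black swan about 61% of the time.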

How does Lovejoy solve that problem? If I correctly read the paper, the answer is:

Stated succinctly, our statistical hypothesis on the natural variability is that its extreme probabilities ... are bracketed by a modified Gaussian...

In other words, he is assuming a shape for the probability distribution of natural events that affect global climate. Given that assumed shape, he can use data on the part of the distribution he does observe to deduce the part he does not observe. But he has no way of testing the hypothesis, since it is in part a hypothesis about a part of the curve for which he has no data.
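How much the assumed shape matters can be shown with two distributions that have the same mean and standard deviation, so that data from the body of the distribution would not easily distinguish them. The Laplace distribution here is only an illustrative heavier-tailed alternative, chosen because its tail has a simple closed form; it is not the modified Gaussian Lovejoy used.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(X > k * sigma) for a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def laplace_tail(k: float) -> float:
    """P(X > k * sigma) for a zero-mean Laplace distribution with the same sigma.
    Its scale b satisfies sigma = b * sqrt(2), and P(X > x) = 0.5 * exp(-x / b)."""
    return 0.5 * math.exp(-k * math.sqrt(2))


for k in (1.0, 2.0, 3.0, 4.5):
    g, lap = gaussian_tail(k), laplace_tail(k)
    print(f"{k:>3} sd: Gaussian {g:.2e}  Laplace {lap:.2e}  ratio {lap / g:.1f}")
```

Near the center the two are similar (at one standard deviation the Laplace tail is actually the smaller), but at four and a half standard deviations the Laplace tail is roughly 250 times larger. Data that sit mostly in the body cannot say which of those answers is right.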

If I am correctly reading the paper—readers of this post are welcome to correct me if they think I am not—that means that Lovejoy has not only not proved what reporters think he has, he has not proved what he thinks he has either. A correct description of his result would be that the probability that natural processes would produce observed warming, conditional on his assumption about the shape of the probability distribution for natural processes that affect global temperature, is less than .01.

One obvious question is whether this problem matters, whether, on the basis of data other than what went into Lovejoy's paper, one can rule out the possibility of natural events capable of causing rapid warming that occur too infrequently for their probability to be deduced from the past five hundred years of data. I think the answer is that we cannot. The figure below is temperature data deduced from a Greenland ice core. It shows periods of rapid warming, some much more rapid than what we observed in the 20th century, occurring at intervals of several thousand years. During one of them, "The temperature increased by more than 10°C within 40 years." The temperature shown is local not global—we do not have the sort of paleoclimate reconstructions that would be needed to spot similar episodes on a global scale. But the fact that there are natural sources of very rapid local warming with annual frequency below .001 is an argument against ruling out the possibility that such sources exist for global warming as well.

Lovejoy responded on my blog.

AGW, Considered as a Black Swan[2]

In order to calculate the probability that what happened could happen as a result of natural causes of temperature change, Lovejoy needed a probability distribution showing what the probability was of a natural cause producing any given temperature change. He could estimate that distribution by looking at changes over the period from 1500 to 1880 on the (plausible) assumption that humans had little effect on global temperature over that period. But that data could not tell him the probability distribution for events rare enough to be unlikely to show up in his data, for instance some cause of warming that occurred with an annual probability of only .001.

His solution to that problem was to assume a probability distribution, more precisely a range of possible distributions, fit it to the data he had, and deduce from it the probability of the rare large events that might have provided a natural cause for 20th century warming. That makes sense if those events are a result of the same processes as the more frequent events, just less likely versions of them, as flipping a coin and getting eight heads in a row is a result of the same processes that give you four, five, or six heads in a row. But it makes no sense if there are rare large events produced by some entirely different process, one whose probability the observed events tell us nothing about: if, for instance, you got four heads in a row by random chance but forty heads in a row because someone had slipped you a two-headed coin. The forty heads, or the hypothetical rare cause of large warming, would be a black swan, an event sufficiently rare that it had not been observed and so was left out of the calculation.
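The gap between extrapolating a tail and missing a separate process can be put in numbers. This sketch compares the chance of n heads in a row under a fair-coin-only model with a mixture in which, with a hypothetical probability of one in a million (an assumption, not from the text), the coin is two-headed:

```python
def p_run_fair(n: int) -> float:
    """Probability that a fair coin gives n heads in n flips."""
    return 0.5 ** n


def p_run_mixture(n: int, p_trick: float) -> float:
    """Same probability when, with probability p_trick, the coin is two-headed."""
    return p_trick * 1.0 + (1.0 - p_trick) * p_run_fair(n)


P_TRICK = 1e-6  # hypothetical chance that someone slipped you a two-headed coin
for n in (4, 8, 40):
    print(f"{n:>2} heads: fair-only model {p_run_fair(n):.3e}, "
          f"with trick coin {p_run_mixture(n, P_TRICK):.3e}")
```

For four or eight heads the two models agree to several digits, so runs of that length tell you nothing about `P_TRICK`. For forty heads the fair-only extrapolation gives about 1e-12, while the mixture gives about 1e-6: the rare separate process dominates the tail.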

It occurred to me, after considering a response by Lovejoy, that not only was such a black swan event possible in the context of climate, one had occurred. AGW itself is a black swan, a cause of rapid warming whose probability cannot be deduced by looking at the distribution of climate change from the period 1500 to 1880.

If the point is not clear, imagine that Lovejoy wrote his article in 1880. Since rapid warming due to human activity had not yet occurred there would be no reason for him to distinguish between causes of warming and natural causes of warming. He would interpret the results of his calculations as showing that the probability of warming by a degree C over the next 128 years was less than .01. He would be assuming away the possibility of a cause of substantial warming independent of the causes of the past warming in his data, one whose probability could not be predicted from their probability distribution.

That cause being, of course, greenhouse gases produced by human action.

Why Unlikely Events Are Not Unlikely

A Facebook post starts:

Is it really a coincidence that so many unprecedented weather events are happening this year

with a link to a news story about a "once in 50 years" rain in Japan. It is an argument I frequently see made, explicitly or implicitly. Lots of unlikely things are happening and there must be a reason. When the subject is climate change the unlikely things are mostly about climate.

It looks convincing until you think about it. The world is large. There are lots of different places in it where, if an unusual weather event happens, it is likely to show up in the news. There are at least four categories of unusual weather events that could happen (unusually hot, unusually cold, unusually large amount of rain, unusually small amount of rain) and probably a few others I haven't thought of. A year contains four seasons and twelve months, and a record in any of them is newsworthy; a recent news story, for example, claimed that this August was the hottest August in the tropics on record.

For a very rough estimate of how many chances there are each year for an unlikely event to happen and make the news, I calculate:

(100 countries prominent enough + 100 cities prominent enough + 10 geographic regions (tropics, poles, North America, ...) + 50 U.S. states = 260)
× (12 months + 4 seasons = 16)
× (4 kinds of events that would qualify)
= 16,640 opportunities each year for an unlikely weather event to occur and be reported.

So we would expect more than 300 once-in-50-years events to happen each year and about sixteen once-in-a-thousand-years events.
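The same back-of-the-envelope count in Python. The inputs are the rough judgment calls from the text, not data:

```python
# Rough counts from the text; each is a judgment call, not data.
places = 100 + 100 + 10 + 50   # countries + cities + regions + U.S. states = 260
periods = 12 + 4               # months + seasons = 16
kinds = 4                      # unusually hot, cold, wet, dry
opportunities = places * periods * kinds

print(f"opportunities per year: {opportunities:,}")
for return_period in (50, 1000):
    # Expected count = sum of the per-opportunity probabilities (linearity of expectation).
    print(f"expected once-in-{return_period}-years events: {opportunities / return_period:.1f}")
```

Because expected counts add regardless of dependence, correlation among the opportunities (one heat wave setting records in several neighboring states at once) leaves these averages unchanged, though it makes the count in any particular year more variable.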

My guess is that those numbers are too low: the story about floods in Japan does not make it clear whether the one-in-fifty-years record is for the whole country or only one region. But they at least show why we should expect lots of unlikely things to happen each year.

If you flip a coin ten times and get ten heads, you should be surprised. If you flip sixteen thousand coins ten times each, you can expect to get ten heads about sixteen times—and should not be surprised when you do.
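A quick simulation of the coin version, using Python's standard `random` module with a fixed seed so that the run is repeatable:

```python
import random


def count_all_heads(n_coins: int, flips: int, seed: int = 0) -> int:
    """Flip each of n_coins coins `flips` times; count coins that came up heads every time."""
    rng = random.Random(seed)
    return sum(
        all(rng.random() < 0.5 for _ in range(flips))  # stops at the first tail
        for _ in range(n_coins)
    )


expected = 16_000 / 2 ** 10   # each coin has a 1-in-1024 chance of ten heads
observed = count_all_heads(16_000, 10)
print(f"expected {expected}, simulated {observed}")
```

The exact expectation is 15.625; the simulated count varies with the seed but stays in that neighborhood.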

When I put this argument on my blog, one commenter responded:

Baseball in particular seems to provide many illustrations of this argument. You won't watch long before a commentator explains how the batter is about to break the record for most consecutive Tuesday afternoon doubles among National League teams for a right-handed batter against a left-handed pitcher born outside of the United States after the Paris Peace Accords.

Another, disagreeing with me, wrote:

People have done stats on this. Extreme stuff is happening more often than would be expected in the no-warming scenario.

The first question for that claim is not whether it is true but what it means. Warming results in more extreme high temperatures and fewer extreme low temperatures.[3] To add them up you need a common definition, and “extreme” is not a well-defined category. Greenhouse gas warming is greater in cold times and places than in hot ones, so if “extreme” is defined as, say, more than five degrees C hotter than the average over the past x years (for extreme highs) and more than five degrees C colder than that average (for extreme lows), the total number of extreme temperatures has probably gone down; but there are other possible definitions.[4]
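A toy illustration of why the definition matters, under loudly hypothetical assumptions: daily temperatures are Gaussian, everything is measured in units of the old standard deviation, and "extreme" means beyond plus or minus two units from the old average (these numbers are not IPCC or station data). Warming every day by the same amount raises the combined count of hot and cold extremes; warming cold days more than hot days, modeled here as a smaller spread, can lower the combined count even while hot extremes still become more common.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def extremes(mean: float, sd: float, threshold: float = 2.0):
    """Fractions of days above +threshold (hot) and below -threshold (cold) for a
    Gaussian climate; thresholds stay fixed relative to the old average of 0."""
    hot = upper_tail((threshold - mean) / sd)
    cold = upper_tail((threshold + mean) / sd)  # P(X < -t) = P(Z > (t + mean) / sd)
    return hot, cold


# Hypothetical climates, in units of the old standard deviation.
scenarios = {
    "baseline": (0.0, 1.0),
    "uniform warming": (0.3, 1.0),
    "cold days warm more": (0.3, 0.9),  # T' = 0.3 + 0.9 T: -2 -> -1.5, +2 -> +2.1
}
for name, (mean, sd) in scenarios.items():
    hot, cold = extremes(mean, sd)
    print(f"{name:20} hot {hot:.4f}  cold {cold:.4f}  total {hot + cold:.4f}")
```

In both warming scenarios hot extremes rise and cold extremes fall; whether the total rises or falls depends on how the warming is distributed and on where the thresholds are drawn.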

Expanding the claim to cover all climate-related “extreme stuff” would raise similar problems. The IPCC projections imply that strong tropical cyclones will keep about their current frequency but get a little stronger, while weak tropical cyclones become less common. If we count any tropical cyclone as an extreme event, that is a decrease in extreme events; if we count only unusually strong tropical cyclones, it is an increase.

How to Lie With Statistics Updated

How to Lie With Statistics is an old and good book on how not to be fooled by statistical tricks. I just came across a trick that, so far as I can remember (I read the book a long time ago), was not included.

Someone commenting on a Facebook global warming post put up this graph. Two comments later he wrote "Notice a trend?"

The trick, of course, is that the years are arranged in order of how hot they were. 2014 is at the right end not because it is the most recent year but because it was the hottest. 2012 is at the left end because it was the coolest of the years shown. Arranging them that way guarantees the appearance of a rising trend, whether temperatures are actually going up or down.
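The guarantee is easy to check: any list of values sorted by value can only rise or stay flat from left to right, whatever the real trend over time. A minimal sketch with made-up anomalies that fall over time (hypothetical numbers, not NASA data):

```python
# Hypothetical anomalies (degrees C) that *fall* over time, for illustration only.
years = list(range(2001, 2011))
anomalies = [0.70, 0.66, 0.68, 0.60, 0.62, 0.55, 0.57, 0.50, 0.52, 0.45]


def ols_slope(ys):
    """Least-squares slope of ys against their position 0, 1, 2, ... on the x-axis."""
    n = len(ys)
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# The trick: order the bars by temperature instead of by year.
by_heat = sorted(zip(years, anomalies), key=lambda pair: pair[1])
sorted_anomalies = [a for _, a in by_heat]

print("slope in year order:        ", round(ols_slope(anomalies), 4))
print("slope in temperature order: ", round(ols_slope(sorted_anomalies), 4))
print("x-axis labels after sorting:", [y for y, _ in by_heat])
```

The series declines in year order, yet the sorted chart shows a clean upward "trend"; the only hint is that the year labels along the x-axis are out of order.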

The claim on the graph that those were “the ten hottest years globally” is false, at least if we accept the webbed NASA data, which show several years hotter than 2012.

[1] I am somewhat oversimplifying Taleb’s use of “black swan.”

[2] Strictly speaking, the industrial revolution, since humans were having some effect on climate earlier, just much more slowly.

[3] “It is virtually certain that hot extremes (including heatwaves) have become more frequent and more intense across most land regions since the 1950s, while cold extremes (including cold waves) have become less frequent and less severe” (IPCC A.3.1)

[4] “An event is considered extreme if the average temperature exceeds the threshold for a 1-in-5-yr recurrence.” Peterson et al., “Monitoring And Understanding Changes In Heat Waves, Cold Waves, Floods, And Droughts In The United States.”

Discussion about this post

Herbert Jacobi

Jun 7, 2023

I read an article a long time ago, can't remember where. Supposedly some of "the hottest" examples were created by looking back at previous "hottest" years, recalculating them, and deciding they weren't really as "hot" as they were reported to be. The recalculations always showed them to be cooler, not warmer. Since they were now cooler than they were originally thought to be, the new "hottest" years, which conveniently fell within the AGW period, now supported the arguments for AGW. If, and I stress if, this is true, it seems to knock all of the statistical (mystical?) analysis into a cocked hat.

Further, it seems that most of the US weather stations are not in compliance with the Weather Bureau's own standards (96%?) and the data is "adjusted" using algorithms. I have no idea about stations outside of the US, though it was reported a long time ago that when the Soviet Union collapsed they shut down a number of the stations in Siberia, so data from that region was suspect.

As I said, I have no idea whether any of this is true, and unfortunately there doesn't seem to be much interest in finding out. Maybe the first part of lying with statistics is lying about the initial data, or the recalculated data, used in the statistical analysis.


STEPHEN A BLOCH

May 18, 2023

IIUC, what Lovejoy has actually shown is that, at the 5% significance level, SOMETHING happened to the climate in the last 150 years that didn't happen in the preceding 375; it remains to be seen what.

So for simplicity, let's take that as a fact: _something_ happened to the climate in the last 150 years that didn't happen in the preceding 375. What can we say about the possibilities? We can rule out natural cycles with a period of less than about 300 years, or more generally "black swan" events with an annual probability more than about 1/300, because we would have seen them in the 1500-1875 period; we can't rule out natural "black swan" events with a lower probability than that. In fact, let's take "annual probability < 1/300" as our definition of "black swan".

Now let's pretend we hadn't done any climatic observations yet, and were interested in such hypothetical natural black-swan climate-warming events (henceforth NBSCE). _A priori_, the likelihood of such an event happening in the past 150 years is at most 1 - (1 - 1/300)^150 ≈ 0.4 (possibly much less, depending on how black-swan events are distributed, which we don't know). Meanwhile, the likelihood of human activity that could plausibly affect the climate happening in the past 150 years is 1: we have extensive records of it. But we don't know for certain that it _did_ warm the climate significantly. I don't know how one would estimate that likelihood _a priori_; perhaps 2/3, based on what we know about the greenhouse effect?
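The 0.4 ceiling can be checked directly, assuming independent years and a constant annual probability. A rarer black swan, at 1/500 per year, is included for comparison:

```python
def chance_at_least_once(annual_prob: float, years: int) -> float:
    """P(at least one occurrence in `years` years), assuming independent years
    and a constant annual probability."""
    return 1.0 - (1.0 - annual_prob) ** years


print(f"rate 1/300 over 150 years: {chance_at_least_once(1 / 300, 150):.3f}")
print(f"rate 1/500 over 150 years: {chance_at_least_once(1 / 500, 150):.3f}")
```

The 1/300 case gives about 0.394, which rounds to the 0.4 used in the comment; the 1/500 case gives about 0.26.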

So these are our priors: a probability less than 0.4 of a natural black-swan event significantly warming the climate, and a (presumably independent) probability of perhaps 2/3 that the human activity we know about has significantly warmed the climate.

P[both] = .267

P[AGW only] = .4

P[NBSCE only] = .133

P[neither] = .2

If (with Lovejoy) we add the observation that the Earth's climate _has_ warmed significantly more in the past 150 years than in the preceding 375, we can with high confidence rule out the scenario in which nothing new happened to warm the climate in the past 150 years, which is the .2 in the list of scenarios above. So we divide each of the remaining probabilities by 0.8, producing

P[both | warming] = 1/3

P[AGW only | warming] = 1/2

P[NBSCE only | warming] = 1/6

We can't reject (at, say, 5% significance) the hypothesis that the warming has been entirely due to natural causes, but if you had to put money on it, you would bet on AGW-only over NBSCE-only at three to one, and you would bet on AGW happening over AGW _not_ happening at five to one.

There are a lot of guesses in the above. Maybe the _a priori_ probability of human activity warming the climate is only 1/2 or 1/3, not 2/3, which would make the case for AGW weaker. Maybe the plausible NBSCEs have a probability of 1/500 per year, not their ceiling of 1/300, which would make the case for AGW stronger. All in all, I think the statistical case for significant AGW is pretty strong, though not a "slam-dunk".
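The commenter's arithmetic, including the sensitivity questions in the last paragraph, can be checked with a short Python sketch. It follows the comment's own assumptions: the natural and human causes are independent, and observed warming rules out only the "neither" scenario. The headline case uses the rounded 0.4; the sensitivity loop uses the unrounded 1 - (1 - rate)^150.

```python
def chance_at_least_once(annual_prob, years=150):
    """P(at least one occurrence in `years` independent years)."""
    return 1.0 - (1.0 - annual_prob) ** years


def scenario_table(p_nbsce, p_agw):
    """Joint prior probabilities of the four scenarios, treating the natural and the
    human cause as independent (the commenter's assumption)."""
    return {
        "both": p_nbsce * p_agw,
        "AGW only": (1 - p_nbsce) * p_agw,
        "NBSCE only": p_nbsce * (1 - p_agw),
        "neither": (1 - p_nbsce) * (1 - p_agw),
    }


def condition_on_warming(table):
    """Rule out 'neither' (nothing new happened) and renormalize the rest."""
    kept = {k: v for k, v in table.items() if k != "neither"}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


# Headline case from the comment: P(NBSCE) = 0.4, P(AGW) = 2/3.
post = condition_on_warming(scenario_table(0.4, 2 / 3))
for name, p in post.items():
    print(f"P[{name} | warming] = {p:.3f}")
print(f"AGW only vs NBSCE only: {post['AGW only'] / post['NBSCE only']:.1f} to 1")
print(f"any AGW vs no AGW: {(post['both'] + post['AGW only']) / post['NBSCE only']:.1f} to 1")

# Sensitivity: weaker prior for AGW, or rarer natural black swans.
for p_agw in (2 / 3, 1 / 2, 1 / 3):
    for rate in (1 / 300, 1 / 500):
        p_nat = condition_on_warming(
            scenario_table(chance_at_least_once(rate), p_agw))["NBSCE only"]
        print(f"P(AGW)={p_agw:.2f}, rate=1/{round(1 / rate)}: "
              f"P[NBSCE only | warming] = {p_nat:.3f}")
```

The headline case reproduces 1/3, 1/2, and 1/6, and the three-to-one and five-to-one odds. Lowering the prior on AGW raises the posterior for a purely natural cause, while making natural black swans rarer lowers it, matching the directions described in the comment.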