The graph above comes from this paper by Ross McKitrick. Iain Murray cited it when he claimed that
“The fact that the ten hottest years happened since 1991 may well be an artifact of the collapse in the number of weather monitoring stations contributing to the global temperature calculations following the fall of communism (see graph)”
McKitrick was recently in the news for publishing a controversial paper that claimed, on the basis of an “audit”, that the commonly accepted reconstruction of temperatures over the past 1000 years was incorrect, so I thought it would be interesting to “audit” McKitrick’s graph.
I should first caution readers that I am not an expert in this area—I’m a computer scientist, not a climatologist. In other words, I’m no better qualified to comment on this than McKitrick. McKitrick writes:
“The main problem in the debate over what the Global Temperature is doing is that there is no such thing as a Global Temperature. Temperature is a continuous field, not a scalar, and there is no physics to guide reducing this field to a scalar, by averaging or any other method. Consequently the common practice of climate measurement is an ad hoc approximation of a non-existent quantity.”
This is untrue. Average temperature has a real, physical meaning. For example, if I have one kg of water at 20 degrees and another at 30 degrees, then their average temperature is 25 degrees. This is the temperature I would get if I mixed the water.
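The mixing example can be written out as a short calculation. This is only a sketch: the function name `mixed_temperature` is mine, and it assumes constant specific heat and no heat lost to the surroundings, so the final temperature is just the mass-weighted mean.

```python
def mixed_temperature(masses_kg, temps_c):
    """Mass-weighted mean temperature of water samples mixed together.

    Assumes constant specific heat and no heat exchange with the
    surroundings (simplifying assumptions, not stated in the post).
    """
    total_mass = sum(masses_kg)
    return sum(m * t for m, t in zip(masses_kg, temps_c)) / total_mass


# One kg at 20 degrees mixed with one kg at 30 degrees.
print(mixed_temperature([1.0, 1.0], [20.0, 30.0]))  # 25.0
```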

McKitrick then reproduces this graph (figure 2) (from GISS), describing it as “NASA’s version of this simulacrum”. He claims that a decrease in the number of weather stations is “problematic”, writing:
“In the early 1990s, the collapse of the Soviet Union and the budget cuts in many OECD economies led to a sudden sharp drop in the number of active weather stations.”
However, the graph he reproduces that shows the drop gives a different reason:
“The reasons why the number of stations in GHCN drop off in recent years are because some of GHCN’s source datasets are retroactive data compilations (e.g. World Weather Records) and other data were created or exchanged years ago.”
I looked at the GHCN data. The number of weather stations in the former Soviet Union did drop from about 270 to 100, but the total number fell from 5000 to 2700, so the decrease there (about 170 of the roughly 2300 stations lost) was only a small factor in the overall decrease.
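To make “a small factor” concrete, here is the arithmetic on the approximate station counts quoted above. The variable names are mine, and the counts are the rounded figures from this post rather than exact values.

```python
# Approximate station counts quoted in the post (rounded figures).
soviet_before, soviet_after = 270, 100
total_before, total_after = 5000, 2700

soviet_drop = soviet_before - soviet_after  # stations lost in the former Soviet Union
total_drop = total_before - total_after     # stations lost overall

share = soviet_drop / total_drop
print(f"{soviet_drop} of {total_drop} lost stations: {share:.1%}")
# 170 of 2300 lost stations: 7.4%
```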
McKitrick next refers to his figure at the top of this post:
“Figure 3 shows the total number of stations in the GHCN and the raw (arithmetic) average of temperatures for those stations. Notice that at the same time as the number of stations takes a dive (around 1990) the average temperature (red bars) jumps. This is due, at least in part, to the disproportionate loss of stations in remote and rural locations, as opposed to places like airports and urban areas where it gets warmer over time because of the build-up of the urban environment.”

I downloaded the raw GHCN temperature data from here, and tried to reproduce McKitrick’s graph by plotting the number of stations and the average temperature of all stations for each year. If you want to check my work, the program I wrote to do the calculations can be downloaded here. The graph above is reasonably similar to McKitrick’s graph. The biggest difference is that the right-hand vertical scale in McKitrick’s graph is clearly incorrect: the number of stations peaked at 6,000, not 14,000 as his figure 3 indicates. (He actually has the correct number in his figure 2, which was copied from another paper.) Just taking the average of all the station temperatures is a rather poor way to estimate the global average temperature, since regions with a large number of stations will count for far too much in the global average. However, even this crude way of computing the average shows significant warming in the 90s. McKitrick’s graph is also rather misleading, since the GISS graph above is not calculated this way: the stations are weighted so that regions get the correct weighting.
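The sketch below illustrates the two calculations discussed above. It is not the program linked in this post, and it is not the GISS method. The record layout `(station_id, year, annual_mean_temp)` is a hypothetical simplification, not the real GHCN file format, and `region_weighted_mean` simply gives each region equal weight as a stand-in for proper area weighting.

```python
from collections import defaultdict


def yearly_counts_and_raw_means(records):
    """Return {year: (station_count, raw_mean_temp)}.

    records: iterable of (station_id, year, annual_mean_temp) tuples
    (a hypothetical layout, not the actual GHCN format).
    """
    temps_by_year = defaultdict(list)
    for _station_id, year, temp in records:
        temps_by_year[year].append(temp)
    return {
        year: (len(temps), sum(temps) / len(temps))
        for year, temps in sorted(temps_by_year.items())
    }


def region_weighted_mean(temps_by_region):
    """Average within each region first, then average the regions equally.

    A crude stand-in for area weighting: a region with many stations
    no longer dominates the result.
    """
    region_means = [sum(t) / len(t) for t in temps_by_region.values() if t]
    return sum(region_means) / len(region_means)


# Toy numbers (not GHCN data): four stations in a dense, cool region
# and one station in a sparse, warm region.
records = [
    ("A", 1990, 10.0), ("B", 1990, 10.0), ("C", 1990, 10.0),
    ("D", 1990, 10.0), ("E", 1990, 20.0),
]
print(yearly_counts_and_raw_means(records))  # {1990: (5, 12.0)}
print(region_weighted_mean({"dense": [10.0] * 4, "sparse": [20.0]}))  # 15.0
```

In the toy data the raw average (12.0) is pulled toward the densely covered region, while the region-weighted figure (15.0) treats both regions equally.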

To test McKitrick’s claim that the warming in the 90s might have been caused by the decline in the number of stations, all I had to do was consider only the stations that have measurements for every year from 1980 to 2000. The average temperature of those stations is shown as the green line in the graph above, while the average of all stations is in red. The blue line is the average temperature shown in the GISS graph. Note that all three lines show significant warming in the 90s. Whether you analyse the data in a crude way or a sophisticated way you still see warming. It is true that after correcting for the change in the number of stations, the warming is less, but it actually agrees better with the average temperature shown in the GISS graph. If you look at Hansen et al’s paper that describes how the GISS graph was constructed, you will find that of course they noticed and accounted for the change in the number of stations:
“Sampling studies discussed below indicate that the decline in number of stations is unimportant in regions of dense coverage, although the estimated global temperature change can be affected by a few hundredths of a degree.”
McKitrick does not acknowledge this or cite this paper.
The outcome of my analysis was just as I expected—if correcting for the change in the number of stations had removed the warming trend, Murray and McKitrick would already have told us about it.
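The continuous-station test described above can be sketched as follows. This is an illustration under assumptions, not the program linked in this post: records are hypothetical `(station_id, year, annual_mean_temp)` tuples held in a list, and the function names are mine.

```python
from collections import defaultdict


def yearly_means(records):
    """Raw average of all stations reporting in each year."""
    temps = defaultdict(list)
    for _station, year, temp in records:
        temps[year].append(temp)
    return {year: sum(v) / len(v) for year, v in sorted(temps.items())}


def continuous_only(records, first_year=1980, last_year=2000):
    """Keep only records from stations that report in every year of the span.

    records must be a list (it is scanned twice).
    """
    needed = set(range(first_year, last_year + 1))
    years_seen = defaultdict(set)
    for station, year, _temp in records:
        years_seen[station].add(year)
    complete = {s for s, years in years_seen.items() if needed <= years}
    return [r for r in records if r[0] in complete and r[1] in needed]


# Toy numbers (not GHCN data): a cold station stops reporting after 1980.
toy = [
    ("warm", 1980, 15.0), ("warm", 1981, 15.0), ("warm", 1982, 15.0),
    ("cold", 1980, 5.0),
]
print(yearly_means(toy))
# {1980: 10.0, 1981: 15.0, 1982: 15.0}
print(yearly_means(continuous_only(toy, 1980, 1982)))
# {1980: 15.0, 1981: 15.0, 1982: 15.0}
```

The toy data only shows the mechanism: dropping a cold station makes the raw average jump, and restricting to continuously reporting stations removes that jump. With the real data, as described above, the continuous subset still shows warming.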
In an email, McKitrick claimed that there were two problems with my test:
“First, there was a change post-1990 in the quality of data in stations still operating, as well as the number of stations. Especially in the former Soviet countries after 1990, the rate of missing monthly records rose dramatically. So you need a subset of stations operating continuously and with reasonably continuous quality control.”
However, the Soviet stations are only a small percentage of the total, so they don’t make much difference. And of course, if you look at Hansen et al you find that they have extensive checks on the data quality.
McKitrick continued:
“Second, if in this subset you observe an upward trend comparable to the conventional global average, in order to prove that this validates the global average you have to argue that the subset is a randomly chosen, representative sample of the whole Earth. Of course if this were true the temperature people would only use the continuously-available subset for their data products. It isn’t, which is why they don’t. It would leave you with a sample biased towards US and European cities, so it is not representative of the world as a whole. The large loss in the number of stations operating (50% in a few years) was not random in a geophysical sense, it was triggered by economic events, in which stations were closed in part if they were relatively costly to operate or if the country experienced a sudden loss of public sector resources. One can conjecture what the effect of that discontinuity was, but to test the conjecture, at some point you have to guess at what the unavailable data would have said if they were available. Because of that, I cannot see how one can devise a formal test of the representativeness of the subsample.”
Now this is just wrong. You don’t need a random sample to estimate the temperature across the Earth’s surface. Temperatures tend to be quite similar at places that are close to each other, so you just need stations spaced over the Earth’s surface to have a representative sample. That means you can estimate what the temperature would have been at the missing stations, and you can test how representative the sample is. In fact, Hansen et al wrote:
“Sampling studies discussed below indicate that the decline in number of stations is unimportant in regions of dense coverage, although the estimated global temperature change can be affected by a few hundredths of a degree.”
McKitrick, however, did not cite this paper.
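To illustrate the idea that a missing station can be estimated from nearby ones, here is a minimal sketch. It is not Hansen et al’s procedure. The function names, the linear distance weighting, and the `max_km` cutoff of 1000 km are all assumptions chosen for illustration.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def estimate_from_neighbours(target, stations, max_km=1000.0):
    """Estimate the temperature at target = (lat, lon) from nearby stations.

    stations: list of (lat, lon, temp). Each station's weight falls
    linearly from 1 at zero distance to 0 at max_km (an illustrative
    choice). Returns None if no station is within max_km.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, temp in stations:
        d = great_circle_km(target[0], target[1], lat, lon)
        w = max(0.0, 1.0 - d / max_km)
        weighted_sum += w * temp
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


# Toy numbers: two neighbours one degree either side of the target,
# plus one station far too distant to count.
neighbours = [(0.0, -1.0, 10.0), (0.0, 1.0, 20.0), (0.0, 90.0, 100.0)]
estimate = estimate_from_neighbours((0.0, 0.0), neighbours)
print(round(estimate, 6))  # 15.0
```

The same function gives a simple representativeness check: withhold a station that does have data, estimate it from its neighbours, and compare the estimate with what was actually measured.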
McKitrick concludes:
“None of this means that those researchers with access to the raw data can’t propose and implement such tests as you propose (I wish they would).”
Gee, McKitrick implies that researchers hadn’t done such tests, when, as we have already seen, they had done such tests. When I challenged him on this, he contradicted himself:
“I do not claim that adjustments are not being made, only that there is no formal test of their adequacy.”
Presumably he talks of “formal” tests so he doesn’t have to count the tests that have actually been done. (Our entire email exchange is here.)