Calculating global temperature anomaly

From Watts Up With That:

By Nick Stokes

There is much criticism here of the estimates of global surface temperature anomaly provided by the majors (GISS, NOAA and HADCRUT). I try to answer these criticisms specifically, but I also point out that the source data is readily available and that it is not too difficult to do your own calculation. I do this myself every month, and have done so for about eight years. My latest, for October, is here (it got warmer).

Last time CharlesTM was kind enough to suggest that I submit a post, I described how Australian data made its way, visible at all stages, from the 30-minute readings (reported with about a 5-minute delay) to the collection point as a CLIMAT form, from which it goes unchanged into GHCN unadjusted (qcu). You can see the world's CLIMAT forms here. Countries vary in how they report the intermediate steps, but almost all the data comes from automatic weather stations (AWS) and is reported soon after recording. So GHCN unadjusted, one of the two data sources I use, can be verified. The other, ERSST v5, is harder to trace, but much of its provenance is documented.

My calculation is based on GHCN unadjusted. That isn’t because I think the adjustments are unjustified, but rather because I find adjustment makes little difference, and I think it is useful to show that.

I'll describe the methods and results, but first I should address the much-argued question of why to use anomalies.


Anomalies are formed by subtracting an expected value (for example, the station's long-term average for that month) from each station's readings, before any spatial averaging. That order is essential. The calculation of a global average is inevitably an exercise in sampling, as is virtually any study of a continuum in science: you can only measure at a finite number of places. Reliable sampling is closely tied to homogeneity. You don't have to worry about sampling accuracy in coin tosses; they are homogeneous. But if you want to sample voting intentions in a population of men and women, country and city folk and so on, it is inhomogeneous, and you have to be careful that the sample reflects its composition.

Global temperature is very inhomogeneous: Arctic, tropics, mountains and so on. To average it directly you would have to make sure of getting the right proportions of each, and you don't have much control over the sampling process. Fortunately, anomalies are much more homogeneous. If it is warmer than usual, it tends to be warmer than usual high and low.

I'll illustrate with a crude calculation. Suppose we want the average land temperature for April 1988, and we get it by simply averaging GHCN V3 stations, with no area weighting. The crudeness doesn't matter for the example; the contrast between temperature and anomaly would be similar with better methods.

I'll repeat this calculation for 1000 different subsamples, for both temperature and anomaly. 4759 GHCN stations reported that month. To form each subsample, I draw 4759 random numbers between 0 and 1 and keep the stations whose number is > 0.5 (about half of them). For the anomalies, I subtract from each station its average April temperature over 1951-1980.

For temperature, the mean of the 1000 sample means is 12.53°C, and the standard deviation of those means is 0.13°C. These numbers vary slightly with the random draws.

Doing the same with the anomalies gives a mean of 0.33°C (a warm month) and a standard deviation of 0.019°C. The standard deviation for temperature was about seven times greater. I'll illustrate this with a histogram, in which I have subtracted the means of both temperature and anomaly so they can be superimposed:

The big contributor to the uncertainty of the average temperature is the sampling error of the climatologies (normals), i.e. how often we happened to choose a surplus of normally hot or cold places. It is large because normals vary by tens of degrees between stations. But we already know the normals, and don't need that reinforced. The uncertainty in the anomaly relates directly to what we want to know: was it a hotter or cooler month than usual, and by how much?
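The subsampling experiment can be mimicked with made-up data. This is a minimal Python sketch, not the GHCN calculation: the station normals (spread uniformly over -20 to 30°C), the common anomaly of 0.33°C and the 1°C local scatter are all assumptions chosen for illustration. It draws 1000 random half-subsamples exactly as described (keep a station if its random number is > 0.5) and compares the spread of the subsample means for temperature and for anomaly.

```python
import numpy as np

def subsample_spread(n_stations=4759, n_trials=1000, seed=0):
    """Standard deviation of subsample means, for temperature and for anomaly.

    Synthetic data (assumption, not GHCN): each station has a normal
    spread over tens of degrees, plus this month's departure from it.
    """
    rng = np.random.default_rng(seed)
    normals = rng.uniform(-20.0, 30.0, n_stations)      # station April normals, °C
    departure = 0.33 + rng.normal(0.0, 1.0, n_stations)  # this month's departure, °C
    temps = normals + departure                          # what the stations report
    anoms = temps - normals                              # subtract each station's own normal

    temp_means, anom_means = [], []
    for _ in range(n_trials):
        keep = rng.random(n_stations) > 0.5              # keep roughly half the stations
        temp_means.append(temps[keep].mean())
        anom_means.append(anoms[keep].mean())
    return np.std(temp_means), np.std(anom_means)

sd_temp, sd_anom = subsample_spread()
print(f"sd of mean temperature: {sd_temp:.3f} °C")
print(f"sd of mean anomaly:     {sd_anom:.3f} °C")
```

The exact ratio depends on the invented spreads, but the pattern is the same: the spread of the normals dominates the temperature uncertainty and is absent from the anomaly.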

You get this big reduction in uncertainty for any reasonable method of anomaly calculation. It matters little what base period you use, or even whether you use one at all. But there is a further issue of possible bias when stations report over different periods (see below).


Once the anomalies are calculated, they have to be spatially averaged. This is a classic problem of numerical integration, usually solved by forming some approximating function and integrating that. Grid methods form a function that is constant on each cell, equal to the average of the stations in the cell. The integral is then the sum, over cells, of each cell's area times that value. But then there is the problem of cells without data. HADCRUT, for example, just leaves them out, which sounds like a conservative thing to do, but it isn't good. Leaving a cell out has the effect of assigning it the (area-weighted) average of all the cells that do have data, and sometimes that is clearly wrong, as when the empty cell is surrounded by cells in a quite different range. This was the basis of the improvement by Cowtan and Way, who filled empty cells with estimates derived from kriging. In fact, any method that produces an estimate consistent with nearby values has to be better than using the global average.
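Here is a minimal sketch of the point about empty cells, using a toy 3 x 4 grid (all values hypothetical). Cell weights are proportional to cos(latitude), as on a regular lat/lon grid. It checks that omitting an empty cell gives exactly the same answer as filling it with the average of the filled cells, and compares that with a crude neighbour-average infill. The neighbour rule is an assumption for illustration only; it is not kriging and not the Cowtan and Way method.

```python
import numpy as np

def grid_mean_omit(grid, lat_centres):
    """Area-weighted mean over cells with data; empty cells (NaN) are simply left out."""
    # Cell area on a regular lat/lon grid is proportional to cos(latitude)
    w = np.broadcast_to(np.cos(np.radians(lat_centres))[:, None], grid.shape)
    has = ~np.isnan(grid)
    return np.sum(w[has] * grid[has]) / np.sum(w[has])

def grid_mean_infilled(grid, lat_centres):
    """Fill each empty cell with the mean of its filled neighbours (E, W, N, S),
    then take the area-weighted mean. A crude stand-in for kriging."""
    filled = grid.copy()
    nlat, nlon = grid.shape
    for i, j in zip(*np.where(np.isnan(grid))):
        nbrs = [grid[i, (j - 1) % nlon], grid[i, (j + 1) % nlon]]  # longitude wraps
        if i > 0:
            nbrs.append(grid[i - 1, j])
        if i < nlat - 1:
            nbrs.append(grid[i + 1, j])
        nbrs = [v for v in nbrs if not np.isnan(v)]
        if nbrs:
            filled[i, j] = np.mean(nbrs)
    return grid_mean_omit(filled, lat_centres)

# Toy grid: a warm polar band with one empty cell, cooler bands elsewhere
lat = np.array([60.0, 0.0, -60.0])
grid = np.array([[3.0, 3.0, np.nan, 3.0],
                 [0.2, 0.2, 0.2, 0.2],
                 [0.1, 0.1, 0.1, 0.1]])
omit = grid_mean_omit(grid, lat)
# Leaving the cell out gives the same answer as filling it with `omit` itself
same = grid_mean_omit(np.where(np.isnan(grid), omit, grid), lat)
print(omit, same, grid_mean_infilled(grid, lat))
```

Because the empty cell sits in the warm band, the neighbour-based infill raises the average, whereas omission implicitly gives that cell the much cooler overall mean.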

There are other, better ways. In finite elements, a standard approach would be to create a mesh with nodes at the stations and use shape functions (probably piecewise linear) to interpolate between them. That is my preferred method; Clive Best, who has written articles at WUWT, is another enthusiast. Another method I use is a kind of Fourier analysis, fitting spherical harmonics. These, and my own variant of an infilled grid, all give results in close agreement with each other. Simple gridding is not as close, although overall it often tracks NOAA and HADCRUT quite closely.
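The spherical harmonics idea can be shown in miniature. This is a minimal sketch, assuming a fit only up to degree 2 using the unnormalised Cartesian forms of the real harmonics (a real analysis would use many more terms and some care with weighting). The key property is that every harmonic except the constant averages to zero over the sphere, so the global mean of the fitted field is just the constant coefficient. The station positions and anomaly field are invented: stations clustered in the north, with a field that is warmer there.

```python
import numpy as np

def unit_vectors(lat_deg, lon_deg):
    """Cartesian coordinates of points on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)

def harmonic_basis(x, y, z):
    """Real spherical harmonics up to degree 2 (unnormalised Cartesian forms)."""
    return np.column_stack([
        np.ones_like(x),                 # degree 0
        x, y, z,                         # degree 1
        x * y, x * z, y * z,             # degree 2
        x**2 - y**2, 3 * z**2 - 1,
    ])

def sh_global_mean(lat_deg, lon_deg, values):
    """Least-squares fit of the harmonics; return the fitted field's global mean."""
    x, y, z = unit_vectors(lat_deg, lon_deg)
    coeffs, *_ = np.linalg.lstsq(harmonic_basis(x, y, z), values, rcond=None)
    # All non-constant basis functions integrate to zero over the sphere
    return coeffs[0]

# Hypothetical stations, all between the equator and 80N
rng = np.random.default_rng(1)
lat = rng.uniform(0.0, 80.0, 300)
lon = rng.uniform(-180.0, 180.0, 300)
_, _, z = unit_vectors(lat, lon)
anom = 0.5 + 0.8 * z                     # true global mean is 0.5
print(f"simple station mean: {anom.mean():.3f}")
print(f"harmonic fit mean:   {sh_global_mean(lat, lon, anom):.3f}")
```

The simple station average is pulled well above 0.5 by the northern clustering, while the fit recovers the global mean, because the basis can represent this particular field exactly.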

Unbiased anomaly formation.

I described the benefits of anomalies in terms of reduced sampling error, which just about any method of forming them will deliver. But care is needed to avoid biasing the trend. Just subtracting the average over each station's own reporting period is not good enough, as I showed here. I used the actual reporting history of each GHCN station, but imagined that every station returned the same, steadily rising temperature (1°C/century). Since the series are identical, simply averaging the absolute temperatures would give exactly the right answer. But if you use anomalies, you get a lower trend, about 0.52°C/century. The reason is that stations reporting mainly in later years have a higher period average subtracted, so part of the global trend is absorbed into the station averages instead of appearing in the anomalies. It is this kind of bias that leads the majors to use a fixed base period, such as 1951-1980 (GISS). That fixes the problem, but then there is the problem of stations with too little data in that period. There are ways around that, but it is pesky, and HADCRUT just excludes such stations, which is a loss.
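The same kind of experiment can be sketched with synthetic histories. This is a minimal Python sketch under stated assumptions: the reporting periods are random (start 1900-1989, length 10-79 years), not the real GHCN inventory, and every station reports the identical series rising at 1°C/century. Anomalies are formed by subtracting each station's mean over its own reporting period, then averaged over the stations reporting in each year.

```python
import numpy as np

YEARS = np.arange(1900, 2001)
TRUE_TREND = 0.01  # °C per year, i.e. 1°C/century, identical at every station

def synthetic_histories(n_stations=2000, seed=0):
    """Hypothetical reporting periods: reports[s, k] is True if station s
    reported in YEARS[k]."""
    rng = np.random.default_rng(seed)
    start = rng.integers(1900, 1990, n_stations)
    length = rng.integers(10, 80, n_stations)
    end = np.minimum(start + length, 2000)
    return (YEARS >= start[:, None]) & (YEARS <= end[:, None])

def ols_slope(x, y):
    """Ordinary least-squares trend of y against x."""
    return np.polyfit(x, y, 1)[0]

reports = synthetic_histories()
# The same rising series at every station, NaN where a station did not report
temps = np.where(reports, TRUE_TREND * (YEARS - 1900), np.nan)

# Naive anomalies: subtract each station's mean over its own reporting period
anoms = temps - np.nanmean(temps, axis=1, keepdims=True)
global_anom = np.nanmean(anoms, axis=0)

print(f"true trend:         {100 * TRUE_TREND:.2f} °C/century")
print(f"naive anomaly mean: {100 * ols_slope(YEARS, global_anom):.2f} °C/century")
```

How much trend is lost depends on the record lengths; with the short synthetic records here the loss is larger than the 0.52°C/century case, but the direction of the bias is the same.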

I showed the proper remedy with that example. Calculate the (incorrect) global average, subtract it from the station data, recompute the station averages and the anomalies, then add the global average back; the result has a smaller error. That is because the basic cause of the error is the global trend bleeding into the station averages, and hence into the anomalies, and removing it first reduces that effect. If you iterate, then within six or so steps the global anomaly is back close to the exactly correct value. That is a roundabout way of solving an artificial problem, but it works for the real one too.
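The iteration can be written as two alternating averages. This is a minimal sketch on synthetic data of the same hypothetical kind (random reporting periods, one identical 1°C/century series everywhere), not the TempLS code. Starting from a global estimate of zero, the first pass reproduces the naive, biased anomaly average; later passes subtract the current global estimate before computing station averages.

```python
import numpy as np

YEARS = np.arange(1900, 2001)
TRUE_TREND = 0.01  # °C per year (1°C/century) at every station

def synthetic_temps(n_stations=2000, seed=0):
    """Hypothetical station histories: NaN where a station did not report."""
    rng = np.random.default_rng(seed)
    start = rng.integers(1900, 1990, n_stations)
    end = np.minimum(start + rng.integers(10, 80, n_stations), 2000)
    reports = (YEARS >= start[:, None]) & (YEARS <= end[:, None])
    return np.where(reports, TRUE_TREND * (YEARS - 1900), np.nan)

def iterate_global_anomaly(temps, n_iter):
    """Alternate between station averages and the global series.

    Step 1: station average L_s = mean over station s's years of (T - G)
    Step 2: G(y) = mean over stations reporting in year y of (T - L_s)
    """
    G = np.zeros(temps.shape[1])
    for _ in range(n_iter):
        offsets = np.nanmean(temps - G, axis=1, keepdims=True)
        G = np.nanmean(temps - offsets, axis=0)
    return G

temps = synthetic_temps()
for n in (1, 6, 200):
    slope = np.polyfit(YEARS, iterate_global_anomaly(temps, n), 1)[0]
    print(f"{n:3d} iterations: {100 * slope:.3f} °C/century")
```

How quickly it converges depends on how much the station records overlap; with these short synthetic records it may take more than six steps, so the sketch also runs a long iteration for comparison.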

This iteration is equivalent to least-squares fitting, which was discussed eight years ago by Tamino and followed up by RomanM. They proposed it just for single grid cells, but it seemed to me the way to do the whole global average, as I described here. It can be seen as fitting the statistical model
T(S,m,y) = G(y) + L(S,m) + ε(S,m,y)
where T is the temperature and S, m, y denote station, month and year. G is the global anomaly, L the station offsets (one for each station and month), and ε the random remainder, i.e. the residuals. Later I allowed G to vary by month as well. This scheme was later used by BEST.
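The model can also be fitted directly. This is a minimal Python sketch, not TempLS (which is written in R): it drops the month index m (think of a single calendar month, or annual values), builds a dense design matrix with one column per year for G and one per station for L, and solves with `numpy.linalg.lstsq`. A constant can move freely between G and L, so the solution is pinned by making G average zero. The station data are hypothetical, with very different climates, patchy records and random noise.

```python
import numpy as np

def fit_global_and_offsets(temps):
    """Least-squares fit of T(s, y) = G(y) + L(s) + eps.

    temps is a stations x years array with NaN where a station did not report.
    lstsq returns the minimum-norm solution; we then shift it so G has mean zero.
    """
    n_st, n_yr = temps.shape
    s_idx, y_idx = np.nonzero(~np.isnan(temps))      # one row per actual report
    rows = np.arange(len(s_idx))
    A = np.zeros((len(s_idx), n_yr + n_st))
    A[rows, y_idx] = 1.0                             # picks out G(y)
    A[rows, n_yr + s_idx] = 1.0                      # picks out L(s)
    coef, *_ = np.linalg.lstsq(A, temps[s_idx, y_idx], rcond=None)
    G, L = coef[:n_yr], coef[n_yr:]
    shift = G.mean()
    return G - shift, L + shift

# Hypothetical test data: 300 stations with patchy records over 1950-2000
rng = np.random.default_rng(2)
years = np.arange(1950, 2001)
n_st = 300
start = rng.integers(1950, 1981, n_st)
end = np.minimum(start + rng.integers(10, 40, n_st), 2000)
start[0], end[0] = 1950, 2000                        # one long record keeps every year covered
reports = (years >= start[:, None]) & (years <= end[:, None])
G_true = 0.01 * (years - 1950)                       # 1°C/century global signal
L_true = rng.uniform(-10.0, 25.0, n_st)              # very different station climates
temps = G_true + L_true[:, None] + rng.normal(0.0, 0.3, (n_st, len(years)))
temps[~reports] = np.nan

G, L = fit_global_and_offsets(temps)
print(f"fitted trend: {100 * np.polyfit(years, G, 1)[0]:.2f} °C/century")
```

A dense matrix is fine at this size; for thousands of stations and monthly data a sparse or iterative solver (such as the alternating scheme above) would be the practical choice.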


So those are the ingredients of the program TempLS (details summarized here), which I have run almost every night since then, whenever GHCN Monthly comes out with an update. I typically post the previous month's results around the 10th of the month (October 2018 is here; it was warm). But I keep a running report here, starting about the 3rd, when the ERSST results come in. When GISS comes out, usually about the 17th, I post a comparison. For this I make a map using a spherical harmonics fit, with the same levels and colors as GISS. Here is the map for October:

The comparison with GISS for September is here. I also keep a more detailed updated Google Earth-style map of monthly anomalies here.

Clive Best now does a similar analysis regularly, using CRUTEM3 and HADSST3 instead of my GHCN and ERSST v5. We get very similar results. The following plot shows TempLS along with other measures over the last four years, set to a common anomaly base of 1981-2010. You can see that the satellite measures tend to be outliers (UAH below, and RSS above, though less so). The surface measures, including TempLS, are pretty close. You can check other measures and time intervals here.
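Putting series on a common anomaly base such as 1981-2010 just means shifting each one so that its mean over those years is zero; series published on different bases then differ only by what remains. A minimal sketch with a hypothetical monthly series tagged by calendar year. It assumes a single offset per series; some comparisons instead subtract a separate base mean for each calendar month.

```python
import numpy as np

def rebase(years, anomalies, base=(1981, 2010)):
    """Shift an anomaly series so that its mean over the base years is zero."""
    years = np.asarray(years)
    anomalies = np.asarray(anomalies, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    return anomalies - anomalies[in_base].mean()

# Monthly values tagged by calendar year, 1979-2018 (hypothetical series)
years = np.repeat(np.arange(1979, 2019), 12)
rng = np.random.default_rng(3)
series = 0.015 * (years - 1979) + rng.normal(0.0, 0.1, years.size)
# The same data published on two other bases differs only by a constant offset
published_a = series - 0.2
published_b = series + 0.1
print(np.allclose(rebase(years, published_a), rebase(years, published_b)))
```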

The R code for TempLS is set out and described in detail in three posts ending here. There is an overview here. You can get links to past monthly reports from the index here; the lilac button "TempLS Monthly" brings them up, and the next button shows the GISS comparisons that follow.
