A new analysis of Global Land Daily temperature data

From Clive Best:

This post describes my attempt to reproduce global temperatures from scratch. By 'scratch' I mean using all the original raw temperature measurements from the NCDC daily weather archive, without adjustments.


The largest accessible archive of raw temperature measurements is the NCDC Daily Archive. It consists of 3 billion measurements from 106,000 weather stations, starting in 1763. I have used all of this data to calculate global temperature anomalies without any corrections and without discarding any data except where flagged as duplicates.
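The daily files in this archive use the GHCN-Daily fixed-width record layout. As a rough sketch of the "keep everything except flagged data" rule, a parser might retain only readings whose quality flag is blank. The function name and exact filtering below are my own illustration, not the author's code:

```python
# Sketch of parsing one line of a GHCN-Daily .dly file.
# Assumed layout (per the archive's readme): station id (cols 1-11),
# year (12-15), month (16-17), element (18-21), then 31 day-slots of
# value(5) + mflag(1) + qflag(1) + sflag(1). Values are tenths of a
# degree C; -9999 means missing.

def parse_dly_line(line):
    """Return (station, year, month, element, [(day, temp_C), ...])."""
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    readings = []
    for day in range(31):
        off = 21 + day * 8
        value = int(line[off:off + 5])
        qflag = line[off + 6]
        if value != -9999 and qflag == " ":   # keep only unflagged readings
            readings.append((day + 1, value / 10.0))
    return station, year, month, element, readings
```

A real pipeline would stream all station files through a parser like this before any gridding takes place.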

The method I use is based on an icosahedral grid, which has the advantage that every cell covers an equal area on the earth's surface. The connections to each grid point form hexagons like those on a football. I am using a 2562-node grid; for details see: Icosahedral Binning.
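The binning step itself can be sketched as follows, assuming the 2562 node positions have already been computed as unit vectors (as described in the Icosahedral Binning post): a station belongs to the node at the smallest great-circle distance, which on unit vectors is simply the largest dot product.

```python
import numpy as np

def latlon_to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def nearest_node(lat_deg, lon_deg, nodes_xyz):
    """Index of the closest grid node to a station.
    nodes_xyz is an (N, 3) array of unit vectors; maximising the dot
    product is equivalent to minimising great-circle distance."""
    v = latlon_to_xyz(lat_deg, lon_deg)
    return int(np.argmax(nodes_xyz @ v))
```

With a (2562, 3) node array this assigns each of the 106,000 stations to exactly one equal-area cell in a single vectorised pass.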

https://i2.wp.com/clivebest.com/blog/wp-content/uploads/2018/04/example.png

Example distribution for December 1980. These are the average anomalies formed from all stations within each grid cell. All cells are of equal area across the earth's surface.

https://i2.wp.com/clivebest.com/blog/wp-content/uploads/2017/12/Icos.png

First I calculate the grid location numbers for all 106,000 stations. Stations which share the same grid location are assumed to follow the same climate. I can then calculate the normal monthly temperatures for each grid point as the average over all member stations for that month, covering the 30-year period from 1961-1990, which is the same normalisation period as that used by HADCRUT4.
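In pandas-style pseudocode the normals step might look like this. The column names are illustrative assumptions; the author's actual implementation may differ:

```python
import pandas as pd

def monthly_normals(records):
    """Per-cell, per-month normals over the 1961-1990 baseline.
    records: DataFrame with columns cell, year, month, temp, where temp
    is a station's monthly mean temperature. Returns a Series indexed
    by (cell, month), matching the HADCRUT4 normalisation period."""
    base = records[(records.year >= 1961) & (records.year <= 1990)]
    return base.groupby(["cell", "month"])["temp"].mean()
```

Averaging over every member station in the cell, rather than normalising each station separately, is what later allows stations with no 1961-1990 coverage to be used.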

The advantage of this normalisation method is that afterwards I can use it as a reference to derive temperature anomalies over the grid cell rather than for each station individually. This means I can use every recorded station temperature covering any time period, because early stations ending before 1960 and newer ones starting after 1990 can still be included through their contribution to the average temperature in each cell. All 3 billion temperature measurements can therefore be processed. However, unlike all other studies, I am using no adjustments or any homogenisation. So these results are based on the raw temperatures as originally recorded, and they are illuminating.
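The anomaly and averaging steps can then be sketched as below. Again the column names are my own illustration, under the assumption stated in the text that equal-area cells let a plain average over cells stand in for an area-weighted one:

```python
import pandas as pd

def global_anomaly(records, normals):
    """Mean anomaly over all populated cells for one month.
    records: DataFrame with columns cell, month, temp (station values);
    normals: Series indexed by (cell, month) from the 1961-1990 baseline.
    Because every icosahedral cell covers the same area, the unweighted
    mean over cells needs no extra weighting."""
    df = records.merge(normals.rename("normal").reset_index(),
                       on=["cell", "month"])
    df["anom"] = df["temp"] - df["normal"]
    per_cell = df.groupby("cell")["anom"].mean()  # stations -> cell
    return per_cell.mean()                        # cells -> globe
```

A station active only in, say, 1840-1860 still contributes here: its readings are differenced against the cell normal built from other stations in the same cell.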


Clearly, before 1950 these temperatures are much higher than in any other index, including Berkeley's, which also uses data back to 1750. The reasons are as follows.

  1. There are only 2 or 3 stations recording temperatures back to 1750, and these are all in central Europe; some CET stations are also missing before about 1830. The number of stations and the area covered grow gradually, while the corresponding temperature anomalies fall, until around 1830, when a few US & Australian stations begin to appear.
  2. The spike from 1875 to 1895 is a sudden influx of US stations. This triples the spatial coverage and so dominates the global average. Exactly why the spike appears and then disappears 15 years later is unclear to me. However, pre-industrial temperatures depend critically on any adjustments made to US stations. My results show that the raw data disagree strongly with CRUTEM4, GISS and NCDC itself. Interestingly, though, Berkeley sees a hint of the same trends before 1850.


Berkeley, however, use a completely different method, and their data is shown after adjustments and homogenisation have been applied. After 1950 the agreement with CRUTEM4 is rather good.

https://i1.wp.com/clivebest.com/blog/wp-content/uploads/2018/04/Global-anoms-detail.png

Detailed comparison to CRUTEM4. After 1950 the agreement is good.

Adjustments and homogenisation make only small differences to the result after 1960. However, these have always slightly increased net annual land warming. Note also that the raw data implies higher average temperatures for the early 20th century.


The raw data apparently show much higher temperatures before 1950 than other datasets do. Is this due to the normalisation method? Perhaps it is. If there is just one station within a grid cell, as is the case before 1850, then its anomaly relative to the average of the many stations present during the normalisation period may be biased. However, I wanted to use all the temperature data, even from stations without coverage in the normalisation period. In general, though, I believe the raw data show higher mean temperatures than the 'corrected' data.

I was surprised to discover just how important the US stations are in setting the pre-industrial temperature baseline, as evidenced by the large spike around 1880. This is because the US surface area is much larger than that of northern Europe, the only other region with significant coverage. Consequently USHCN corrections, which have been discussed many times before, are critical to determining how much the earth has warmed since the 19th century.

Finally, here is an animation of all the monthly distributions from 1868 onwards. The couple of stripes appearing around 1919 are cells which span the dateline, which I later corrected!

Processing this data takes around 30 hours of iMac computer time, but writing and debugging the algorithm took far longer!


