GLO2ABS

The IPCC TGICA defined a set of criteria that have been applied to identify GCM experiments whose results could be deposited at the IPCC DDC, experiments which could therefore form the basis for impacts assessments undertaken from 1998 onwards. These criteria included:

  • An IS92a-type forcing scenario
  • Historically-forced integrations
  • Integrations without/with aerosol forcing and up to 2100 for greenhouse gas only
  • Integrations with results available now and with data lodged in the public domain
  • Documented models
  • Models that have participated in AMIP/CMIP

These criteria led to an initial selection of experiments from seven modelling centres, with the possibility of others to be added in subsequent months as they qualify for inclusion.

  • The UK Hadley Centre for Climate Prediction and Research (HadCM2)
  • The German Climate Research Centre (ECHAM4)
  • The Canadian Centre for Climate Modelling and Analysis (CGCM1)
  • The US Geophysical Fluid Dynamics Laboratory (GFDL-R15)
  • The Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO-Mk2)
  • The National Centre for Atmospheric Research (NCAR-DOE)
  • The Japanese Centre for Climate System Research (CCSR)

 

takes time.. time I don't have! Though I'm pleased to see that the second FSM is helpfully
chipping in to pair things up when possible.

Getting seriously fed up with the state of the Australian data. So many new stations have been
introduced, so many false references.. so many changes that aren't documented. Every time a
cloud forms I'm presented with a bewildering selection of similar-sounding sites, some with
references, some with WMO codes, and some with both. And if I look up the station metadata with
one of the local references, chances are the WMO code will be wrong (another station will have
it) and the lat/lon will be wrong too. I've been at it for well over an hour, and I've reached
the 294th station in the tmin database. Out of over 14,000. Now even accepting that it will get
easier (as clouds can only be formed of what's ahead of you), it is still very daunting. I go
on leave for 10 days after tomorrow, and if I leave it running it isn't likely to be there when
I return! As to whether my 'action dump' will work (to save repetition).. who knows?

Yay! Two-and-a-half hours into the exercise and I'm in Argentina!

Pfft.. and back to Australia almost immediately :-(   .. and then Chile. Getting there.

Unfortunately, after around 160 minutes of uninterrupted decision making, my screen has started
to black out for half a second at a time. More video cable problems - but why now?!! The count is
up to 1007 though.

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as
Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO
and one with, usually overlapping and with the same station name and very similar coordinates. I
know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh!
There truly is no end in sight. Look at this:

 

I'll have to go home soon, leaving it running and hoping none of the systems die overnight :-(((

.. it survived, thank $deity. And a long run of duplicate stations, each requiring multiple
decisions concerning spatial info, exact names, and data precedence for overlaps. If for any reason
this has to be re-run, it can certainly be speeded up! Some large clouds, too - this one started
with 59 members from each database:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    7
  11. 7101965  4362  -7940   78 TORONTO ISLAND                     1905 1959    -999       0
  14. 7163427  4363  -7940   77 TORONTO ISLAND A     CANADA        1957 1994    -999       0
  23. 7101987  4380  -7955  194 TORONTO MET RES STN                1965 1988    -999       0
  24. 7163434  4380  -7955  194 TORONTO MET RES STN  CANADA        1965 1988    -999       0
  36.       0  4388  -7944  233 RICHMOND HILL                      1959 2003    -999       0
  39. 7163408  4388  -7945  233 RICHMOND HILL        CANADA        1959 1990    -999       0
  40. 7163409  4387  -7943  218 RICHMOND HILL WPCP                 1960 1981    -999       0
TMax stations:    8
  70. 7101965  4362  -7940   78 TORONTO ISLAND                     1905 1959    -999       0
  71. 7126500  4363  -7940   77 TORONTO ISLAND A                   1957 1994    -999       0
  73. 7163427  4363  -7940   77 TORONTO ISLAND A     CANADA        1957 1990    -999       0
  82. 7101987  4380  -7955  194 TORONTO MET RES STN                1965 1988    -999       0
  83. 7163434  4380  -7955  194 TORONTO MET RES STN  CANADA        1965 1988    -999       0
  95.       0  4388  -7944  233 RICHMOND HILL                      1959 2003    -999       0
  98. 7163408  4388  -7945  233 RICHMOND HILL        CANADA        1959 1990    -999       0
  99. 7163409  4387  -7943  218 RICHMOND HILL WPCP                 1960 1981    -999       0

There were even larger clouds later.

One thing that's unsettling is that many of the assigned WMO codes for Canadian stations do
not return any hits with a web search. Usually the country's met office, or at least the
Weather Underground, show up - but for these stations, nothing at all. Makes me wonder if
these are long-discontinued, or were even invented somewhere other than Canada! Examples:

7162040 brockville
7163231 brockville
7163229 brockville
7187742 forestburg
7100165 forestburg

Here's a heartwarming example of a cloud which self-paired completely (debug lines included):

<BEGIN QUOTE>

DBG: pot.auto i,j:    6   6
DBG: i,ncs2m,cs2m(1-5):    6   1           6    8578    8582    8596       0
DBG: paired:    6   6  WATCH LAKE NORTH    

Attempting to pair stations:
From TMin:      7103660  5147 -12112 1069 WATCH LAKE NORTH                   1987 1996    -999 -999.00
From TMax:      7103660  5147 -12112 1069 WATCH LAKE NORTH                   1987 1996   -999  -999.00
DBG: AUTOPAIRED:    6   6
<END QUOTE>

Now arguably, the MILE HOUSE ABEL stations should have rolled into one of the other MILE HOUSE ones with
a WMO code.. but the lat/lon/alt aren't close enough. Which is as intended.

*

Well, it *kind of* worked. Though the resultant files aren't exactly what I'd expected:

-rw-------   1 f098     cru      12715138 Jul 25 15:25 act.0707241721.dat
-rw-------   1 f098     cru        435839 Jul 25 15:25 log.0707241721.dat
-rw-------   1 f098     cru       4126850 Jul 25 15:25 mat.0707241721.dat
-rw-------   1 f098     cru       6221390 Jul 25 15:25 tmn.0707021605.dtb.lost
-rw-------   1 f098     cru       2962918 Jul 25 15:25 tmn.0707241721.dat
-rw-------   1 f098     cru             0 Jul 25 15:25 tmx.0702091313.dtb.lost
-rw-------   1 f098     cru       2962918 Jul 25 15:25 tmx.0707241721.dat

act.0707241721.dat: hopefully-complete record of all activities

log.0707241721.dat: hopefully-useful log of odd happenings (and mergeinfo() trails)

mat.0707241721.dat: hopefully-complete list of all merges and pairings

tmn.0707021605.dtb.lost: too-small collection of unpaired stations

tmn.0707241721.dat: too-small output database

tmx.0702091313.dtb.lost: MUCH too-small collection of unpaired stations!!!

tmx.0707241721.dat: too-small (but hey, the same size as the twin) output database

ANALYSIS

Well, LOL, the reason the output databases are so small is that every station looks like this:

9999810  -748  10932  114 SEMPOR               INDONESIA     1971 2000    -999 -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971  229  225  225  229  229-9999  223  221  222  225  224-9999

Yes - just one line of data. The write loops went from start year to start year. Ho hum :-/

Not as easy to fix as you might think, seeing as the data may well be the result of a merge and
so can't just be pasted in from the source database.
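
The one-line-per-station symptom is what a write loop bounded by the start year at both ends would produce. A minimal Python sketch of the intended behaviour follows. The real code is Fortran and isn't shown here, so `format_station_years` and its arguments are hypothetical names, and the layout (4-character year followed by twelve 5-character fields) is inferred from the sample record above.

```python
MISSING = -9999  # missing-value code as seen in the sample record

def format_station_years(start_year, end_year, data):
    """Return one text line per year from start_year to end_year inclusive.

    data maps year -> list of 12 monthly integers; years absent from
    data are written as all-missing. The field widths are an assumption
    based on the sample record, not taken from the Fortran source.
    """
    lines = []
    # The faulty loop effectively covered start_year only;
    # the upper bound has to be the station's end year.
    for year in range(start_year, end_year + 1):
        months = data.get(year, [MISSING] * 12)
        lines.append(f"{year:4d}" + "".join(f"{v:5d}" for v in months))
    return lines

sample = {1971: [229, 225, 225, 229, 229, MISSING,
                 223, 221, 222, 225, 224, MISSING]}
for line in format_station_years(1971, 1973, sample):
    print(line)
```

With merged stations the `data` mapping would have to come from the merged series rather than from either source database, which is the complication noted above.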

As for the 'unbalanced' 'lost' files: well for a start, the same error as above (just one line of data),
then on top of that, both sets written to the same file. What time did I write that bit, 3am?!! Ecch.

33. So, as expected.. I'm gonna have to write in clauses to make use of the log, act and mat files. I so do
not want to do this.. but not as much as I don't want to do a day's interacting again!!

Got it to work.. sort of. Turns out I had included enough information in the ACT file, and so was able to
write auminmaxresync.for. A few teething troubles, but two new databases ('tm[n|x].0707301343.dtb')
created with 13654 stations in each. And yes - the headers are identical :-)

[edit: see below - the 'final' databases are tm*.0708071548.dtb]

Here are the header counts, demonstrating that something's still not quite right..

Original:
   14355 tmn.0707021605.dtb.heads

New:
   13654 tmn.0707301343.dtb.heads

Lost/merged:
   14318 tmn.0707021605.dtb.lost.heads (should be 14355-13654-37 = 664?)
      37 tmn.0707021605.dtb.merg.heads (seems low)

Original:   
   14315 tmx.0702091313.dtb.heads

New:
   13654 tmx.0707301343.dtb.heads

Lost/merged:
   14269 tmx.0702091313.dtb.lost.heads (should be 14315-13654-46 = 615?)
      46 tmx.0702091313.dtb.merg.heads (seems low)

In fact, looking at the original ACT file that we used:

crua6[/cru/cruts/version_3_0/db/dtr] grep 'usermerg' act.0707241721.dat | wc -l
       258
crua6[/cru/cruts/version_3_0/db/dtr] grep 'automerg' act.0707241721.dat | wc -l
       889

..so will have to look at how the db1/2xref arrays are prepped and set in the program. Nonetheless the
construction of the new databases looks pretty good. There's a minor problem where the external reference
field is sometimes -999.00 and sometimes 0. Not sure which is best, probably 0, as the field will usually
be used for reference numbers/characters rather than real data values. Used an inline perl command to fix.

..after some rudimentary corrections:

uealogin1[/cru/cruts/version_3_0/db/dtr] wc -l *.heads
   14355 tmn.0707021605.dtb.heads
     122 tmn.0707021605.dtb.lost.heads
     579 tmn.0707021605.dtb.merg.heads
   13654 tmn.0708062250.dtb.heads
   14315 tmx.0702091313.dtb.heads
      93 tmx.0702091313.dtb.lost.heads
     570 tmx.0702091313.dtb.merg.heads
   13654 tmx.0708062250.dtb.heads

Almost perfect! But unfortunately, there is a slight discrepancy, and they have a habit of being tips of
icebergs. If you add up the header/station counts of the new tmin database, merg and lost files, you get
13654 + 579 + 122 = 14355, the original station count. If you try the same check for tmax, however, you get
13654 + 570 + 93 = 14317, two more than the original count! I suspected a couple of stations were being
counted twice, so using 'comm' I looked for identical headers. Unfortunately there weren't any!! So I have
invented two stations, hmm. Got the program to investigate, and found two stations in the cross-reference
array which had cross refs *and* merge flags:

ERROR: db2xref(  126) =      127  -14010 :
  126> 9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293
14010> 9596900 -4170  14710  150 CRESSY RESEARCH STAT AUSTRALIA     1971 2006    -999   91306

and

ERROR: db2xref(13948) =      227    -226 :
13948> 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2006    -999       0
  226>       0 -3570  14560  110 FINLEY (CSIRO)       AUSTRALIA     2000 2001    -999       0

So in the first case, LOW HEAD has been merged with another station (#14010) AND paired with #127.
Similarly, NARRANDERA AIRPORT has been merged with #226 and paired with #227. However, these apparent
merges are false! As we see in the first case, 14010 is not LOW HEAD. Similarly for the second case.
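
A check for this kind of inconsistency could look something like the Python sketch below. The actual layout of the db2xref array isn't shown, so the convention used here is an assumption read off the error output: a positive first value is a pairing reference, a negative second value is a merge flag, and zero means unset. Only entries 126 and 13948 come from the log; the other two are made up for illustration.

```python
def find_pair_and_merge_conflicts(xref):
    """Return indices that carry both a pairing reference and a merge flag.

    xref maps station index -> (pair_ref, merge_ref). Assumed convention:
    pair_ref > 0 means 'paired with that index', merge_ref < 0 means
    'merged into that index', 0 means unset.
    """
    return sorted(i for i, (pair_ref, merge_ref) in xref.items()
                  if pair_ref > 0 and merge_ref < 0)

db2xref = {
    126: (127, -14010),    # from the ERROR output above
    13948: (227, -226),    # from the ERROR output above
    125: (128, 0),         # hypothetical: paired only
    300: (0, -299),        # hypothetical: merged only
}
print(find_pair_and_merge_conflicts(db2xref))
```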

Looking in the relevant match file from the process (mat.0707241721.dat) we find:

AUTO MERGE FROM CHAIN:
TMax Stn 1:       0 -4110  14680    3 LOW HEAD                 AUSTRALIA 2000 2006   -999  -999.00
TMax Stn 2:       0 -4105  14678    4 LOW HEAD             AUSTRALIA     2000 2004   -999  -999.00
New Header:       0 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999       0
Note: Stn 1 data overwrote Stn 2 data

MANUAL PAIRING FROM CHAIN:
TMin:       9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293
TMax:             0 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999       0
New Header: 9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293

and

AUTO MERGE FROM CHAIN:
TMax Stn 1:       0 -3470  14650  145 NARRANDERA AIRPORT       AUSTRALIA 2000 2006   -999  -999.00
TMax Stn 2: 9570600 -3471  14651  145 NARRANDERA AIRPORT   AUSTRALIA     1972 1980   -999  -999.00
New Header: 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1972 2006    -999       0
Note: Stn 2 data overwrote Stn 1 data

MANUAL PAIRING FROM CHAIN:
TMin:       9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2003    -999       0
TMax:       9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1972 2006    -999       0
New Header: 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2006    -999       0

Found the problem - mistyping of an assignment.. and so:

crua6[/cru/cruts/version_3_0/db/dtr] wc -l *.heads

     14355 tmn.0707021605.dtb.heads
       122 tmn.0707021605.dtb.lost.heads
       579 tmn.0707021605.dtb.merg.heads
     13654 tmn.0708071548.dtb.heads

     14315 tmx.0702091313.dtb.heads
        93 tmx.0702091313.dtb.lost.heads
       568 tmx.0702091313.dtb.merg.heads
     13654 tmx.0708071548.dtb.heads

Phew! Well the headers are identical for the two new databases:

crua6[/cru/cruts/version_3_0/db/dtr] cmp tmn.0708071548.dtb.heads  tmx.0708071548.dtb.heads |wc -l
         0
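
The bookkeeping check used above (new + merged + lost headers should equal the original header count) is simple enough to script. A small Python sketch using the counts quoted in the listings; the function name `surplus` is just illustrative:

```python
def surplus(original, new, merged, lost):
    """Headers accounted for minus headers in the original database.

    Zero means every original station ended up in exactly one of the
    new, merged or lost files.
    """
    return (new + merged + lost) - original

counts = {
    "tmn":            dict(original=14355, new=13654, merged=579, lost=122),
    "tmx before fix": dict(original=14315, new=13654, merged=570, lost=93),
    "tmx after fix":  dict(original=14315, new=13654, merged=568, lost=93),
}
for name, c in counts.items():
    print(f"{name}: {surplus(**c):+d}")
```

A non-zero result only says that something is off, not where; finding the two offending stations still needed the cross-reference dump.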

34. So to the real test - converting to DTR! Wrote tmnx2dtr.for, which does exactly that. It reported
233 instances where tmin > tmax (all set to missing values) and a handful where tmin == tmax (no prob).
Looking at the 233 illogicals, most of the stations look as though considerable work is needed on them.
This highlights the fact that all I've done is to synchronise the tmin and tmax databases with each
other, and with the Australian stations - there is still a lot of data cleansing to perform at some
stage! But not right now :-)
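
For reference, the rule described above (DTR = tmax - tmin, with tmin > tmax treated as illogical and set to missing) can be sketched in Python as below. This is not tmnx2dtr.for itself; the function name, the -9999 missing code and the sample values are assumptions for illustration.

```python
MISSING = -9999  # assumed missing-value code

def to_dtr(tmin, tmax, missing=MISSING):
    """Return (dtr_values, n_illogical) for two equal-length series.

    A value is set to missing if either input is missing, or if
    tmin > tmax (counted as illogical). tmin == tmax gives a DTR of 0.
    """
    if len(tmin) != len(tmax):
        raise ValueError("tmin and tmax must have the same length")
    dtr, illogical = [], 0
    for lo, hi in zip(tmin, tmax):
        if lo == missing or hi == missing:
            dtr.append(missing)
        elif lo > hi:
            illogical += 1
            dtr.append(missing)
        else:
            dtr.append(hi - lo)
    return dtr, illogical

# Hypothetical monthly values in tenths of a degree.
values, bad = to_dtr([50, 120, MISSING, 80], [130, 110, 200, 80])
print(values, bad)
```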

* How very useful! No idea what any of that means, although it's heartwarming to see that it's
nothing like the results of the 2.10 rerun, where 1991 looked like this:

1991 vap (x,s2,<<,>>):  0.000493031  0.000742087   -0.0595093      1.86497

Now, of course, it looks like this:

1991 vap (x,s2,<<,>>):  5.93288e-05  8.18618e-07   -0.0776650    0.0261283

From this I can deduce.. err.. umm..

Anyway now I need to use whatever VAP station data we have. And here I'm a little flaky (again):
the vap database hasn't been updated; is it going to be? Asked Dave L and he supplied summaries
he'd produced of CLIMAT bulletins from 2000-2006. Slightly odd format but very useful all the
same.

And now, a brief interlude. As we've reached the stage of thinking about secondary variables, I
wondered about the CLIMAT updates, as one of the outstanding work items is to write routines to
convert CLIMAT and MCDW bulletins to CRU format (so that mergedb.for can read them). So I look at
a CLIMAT bulletin, and what's the first thing I notice? It's that there is absolutely no station
identification information apart from the WMO code. None. No lat/lon, no name, no country. Which
means that all the bells and whistles I built into mergedb (though they were needed for the db
merging of course) are surplus to requirements. The data must simply be added to whichever station
has the same number at the start, and there's no way to check it's right. I don't appear to have a
copy of a MCDW bulletin yet, only a PDF.. I wonder if that's the same? Anyway, back to the main job.

As I was examining the vap database, I noticed there was a 'wet' database. Could I not use that to
assist with rd0 generation? Well.. it's not documented, but then, none of the process is, so I might
as well bluff my way into it! Units seem to vary:

CLIMAT bulletins have day counts:

 SURFACE LAND 'CLIMAT' DATA FOR  2006/10.   MISSING DATA=-32768
  MET OFFICE, HADLEY CENTRE CROWN COPYRIGHT
WMO BLK WMO STN   STNLP    MSLP    TEMP   VAP P DAYS RN  RAIN R   QUINT SUN HRS   SUN %   MIN_T   MAX_T
     01     001   10152   10164      5     52      9       63        2   -32768  -32768     -12      20

Dave L's CLIMAT update has days x 10:

 100100 7093  -867    9JAN MAYEN(NOR-NAVY) NORWAY       20002006        -7777777
 2000  150  120  180   60  150   20   30  130  120  150   70   70

The existing 'wet' database (wet.0311061611.dtb) has days x 100:

   10010  7093   -866    9 JAN MAYEN(NOR NAVY)  NORWAY        1990 2003   -999     -999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1990-9999-9999-9999-9999  400  600  600 1800 1500 1100  800 1800

The published climatology has days x 100 as well:

Tyndall Centre grim file created on 13.01.2004 at 15:22 by Dr. Tim Mitchell
.wet = wet day frequency (days)
0.5deg lan clim:1961-90 MarkNew but adj so that wet=<pre
[Long=-180.00, 180.00] [Lati= -90.00,  90.00] [Grid X,Y= 720, 360]
[Boxes=   67420] [Years=1975-1975] [Multi=    0.0100] [Missing=-999]
Grid-ref=   1, 148
 1760 1580 1790 1270  890  510  470  290  430  400  590 1160
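
To compare the four sources above they need to be brought to a common unit. A minimal Python sketch, assuming the scalings are exactly as noted (days, days x 10, days x 100, days x 100); the source labels are made-up names, and the missing code for Dave L's summaries is a guess since none appears in the sample:

```python
# (scale factor, missing code) per source; labels are hypothetical.
SOURCES = {
    "climat":         (1,   -32768),  # raw CLIMAT bulletin, whole days
    "climat_summary": (10,  -9999),   # Dave L's update; missing code assumed
    "wet_dtb":        (100, -9999),   # wet.0311061611.dtb
    "grim_clim":      (100, -999),    # published climatology, Multi=0.0100
}

def to_days(value, source):
    """Convert a stored rain-day value to days, or None if missing."""
    scale, missing = SOURCES[source]
    if value == missing:
        return None
    return value / scale

print(to_days(9, "climat"))            # DAYS RN from the 2006/10 bulletin
print(to_days(150, "climat_summary"))  # Jan 2000, Jan Mayen
print(to_days(400, "wet_dtb"))         # May 1990, Jan Mayen
print(to_days(1760, "grim_clim"))      # first value of grid box 1,148
```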

 

Well, information is always useful. And I probably did know this once.. long ago. All official WMO codes
are five digits, countrycountrystationstationstation. However, we use seven-digit codes, because when no
official code is available we improvise with two extra digits. Now I can't see why we didn't leave the rest
at five digits, that would have been clear. I also can't see why, if we had to make them all seven digits,
we extended the 'legitimate' five-digit codes by multiplying by 100, instead of adding two numerically-
meaningless zeros at the most significant (left) end. But, that's what happened, and like everything else
that's the way it's staying.

So - incoming stations with WMO codes can only match stations with codes ending '00'. Put another way, for
comparison purposes any 7-digit codes ending '00' should be truncated to five digits.
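
As a sketch of that rule in Python (not the actual mergedb code; the function names are hypothetical, and it only covers the seven-digit convention described here, not any other code lengths that may lurk in older databases):

```python
def comparison_code(db_code):
    """Reduce a database station code to its official WMO number.

    Codes ending in '00' are treated as an official WMO code multiplied
    by 100; anything else is treated as an improvised code with no
    official equivalent, and None is returned.
    """
    if db_code > 0 and db_code % 100 == 0:
        return db_code // 100
    return None

def matches_incoming(db_code, incoming_wmo):
    """True if an incoming WMO code can match this database code."""
    official = comparison_code(db_code)
    return official is not None and official == incoming_wmo

print(comparison_code(2211300))          # MURMANSK-style code ending '00'
print(comparison_code(7163427))          # improvised code, no match possible
print(matches_incoming(2211300, 22113))
```

Leading zeros are lost in the integer form (01001 becomes 1001), so both sides of the comparison need to be handled as integers.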

 

Also got the locations of the original CLIMAT and MCDW bulletins.

CLIMAT are here:
http://hadobs.metoffice.com/crutem3/data/station_updates/

MCDW are here:
ftp://ftp1.ncdc.noaa.gov/pub/data/mcdw
http://www1.ncdc.noaa.gov/pub/data/mcdw/

Downloaded all CLIMAT and MCDW bulletins (CLIMAT 01/2003 to 07/2007; MCDW 01/2003 to 06/2007 (with a
mysterious extra called 'ssm0302.Apr211542' - which turns out to be identical to ssm0302.fin)).

Wrote mcdw2cru.for and climat2cru.for, just guess what they do, go on..

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru 

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm0301.fin
Enter the latest MCDW file (or <ret> for single files): ssm0706.fin

All Files Processed
tmp.0709071541.dtb: 2407 stations written
vap.0709071541.dtb: 2398 stations written
pre.0709071541.dtb: 2407 stations written
sun.0709071541.dtb: 1693 stations written

Thanks for playing! Byeee!
<END QUOTE>

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru

CLIMAT2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest CLIMAT file: climat_data_200301.txt
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt

All Files Processed
tmp.0709071547.dtb: 2881 stations written
vap.0709071547.dtb: 2870 stations written
pre.0709071547.dtb: 2878 stations written
sun.0709071547.dtb: 2020 stations written
tmn.0709071547.dtb: 2800 stations written
tmx.0709071547.dtb: 2800 stations written

Thanks for playing! Byeee!

vap.0709101706.dtb: 2870 stations written
rdy.0709101706.dtb: 2876 stations written
pre.0709101706.dtb: 2878 stations written
sun.0709101706.dtb: 2020 stations written
tmn.0709101706.dtb: 2800 stations written
tmx.0709101706.dtb: 2800 stations written

Thanks for playing! Byeee!
<END QUOTE>

Again, existing outputs are unchanged and the new rdy file looks OK (though see bracketed note above for MCDW).

So.. to the incorporation of these updates into the secondary databases. Oh, my.

Beginning with Rain Days, known variously as rd0, rdy, pdy.. this allowed me to modify newmergedb.for to cope
with various 'freedoms' enjoyed by the existing databases (such as six-digit WMO codes). And then, when run,
an unexpected side-effect of my flash correlation display thingy: it shows up existing problems with the data!

Here is the first 'issue' encountered by newmergedb, taken from the top and with my comments in <anglebrackets>:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
Should the incoming 'update' header info and data take precedence over the existing database?
Or even vice-versa? This will significantly reduce user decisions later, but is a big step!

Enter 'U' to give Updates precedence, 'M' to give Masters precedence, 'X' for equality: U
Please enter the Master Database name: wet.0311061611.dtb
Please enter the Update Database name: rdy.0709111032.dtb

Reading in both databases..
Master database stations:     4988
Update database stations:     2407

Looking for WMO code matches..

***** OPERATOR ADJUDICATION REQUIRED *****

In attempting to pair two stations, possible data incompatibilities have been found.

MASTER:  221130  6896   3305   51 MURMANSK             EX USSR       1936 2003   -999     -999
UPDATE: 2211300  6858   3303   51 MURMANSK             RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter 'C' for more information):
> -0.60 is minimum correlation coeff.
>  0.65 is maximum correlation coeff.
> -0.01 is mean correlation coeff.

Enter 'Y' to allow, 'N' to deny, or an information code letter: C

<OKAY - SO I'VE REQUESTED A DISPLAY OF THE LAGGED CORRELATIONS>

Master Data: Correlation with Update first year aligned to this year -v
 1936  900  600 1000  800 1000  900 1300 1700 2100 1800  900 1000    0.27
 1937  300 1400 1300  800 1400 1800  500 1200 1600 1000 1100 1500    0.15
 1938  900 1000 1500 1800 1200 1500 1200 1700  500  700 1600  700   -0.13
 1939 1500 1300 1100 1400 1200 1200 1000 1300 1800 1600 1100 1300    0.24
 1940 1000 1500 1000 1200 1100 1700 2600 1500 1500 1400 1700 1100    0.15
 1941 1800 1200 1000 1200  900 1100  900 1200 1900 1500 1000 1400    0.48
 1942  900  900 1700  900 1600 1000  600 1100 1400 1300  700  700    0.51
 1943  800 1000 1000 1300  900  800 1500 1600 1400 1500 1300 1200    0.44
 1944 1000  400  900  800 1200  600  900 2000  900 1100 1000  900    0.32
 1945  500  400  700  700  800 1800  900 1100 1200 1100 1300  700    0.19
 1946 1200 1200  100  700  900 1200  400  900  800 1900 1300 1400    0.16
 1947  900 1300 1300 1100 1600 1000  800 1400 1400 1700 2100 1900    0.09
 1948 1100 1400 1400 1200 1300 1800 1200 1700 1500 2200 2100 1900    0.10
 1949 1100 1100  500 1500 1600 1100 1500 1200 2200 2500  900 1600    0.04
 1950 1300  800 1000 1100 1700 1200 1500  800 1100 1300 1500 1400   -0.04
 1951 1100  600 1400 1400 1500 1600 2100 1300 1500 1700 2000 1700   -0.13
 1952 2100  800 1100 1800 1300 1200 2400 2200 1600 1000 1000 2300   -0.23
 1953 2100 1400 2100 1500  900  300 1300 1700 1500  800 1200  800   -0.24
 1954 2100  600 1300 1000 1300 1700 1600 2000 1800 1300 1400 1200   -0.40
 1955 2200 1300  900 1000 1600 2000 1100 1400 1000 2100 2300 1600   -0.20
 1956 1300 1100 1300  400 1600 1300  900 1500 2000 1300 2000 1400   -0.30
 1957 1700 1600 1100 1100 1900 1900 1400 1600 1400 1700 2300 2600   -0.27
 1958 1300 2200 1900  700 1500 1200 2100 1000 1900 1700 1600 1000   -0.21
 1959 2500 1800 1300  900  900 1600 1600 1500 2200 1700 1000  900   -0.33
 1960 1800 1700 1500  400 1300 1500  400 1000 1300 1500 1000 1400   -0.21
 1961 2100 1800 2200 1500  800 1400 1600 1100 1900 1200 1200 2100   -0.59
 1962 2100 1100 1000 1500 1300 1100 1300 1700 1200 2000 1600 2300   -0.37
 1963 2100 2100 2000 1000  700 2000 1400 1800 1400 1600 2000 2400   -0.56
 1964 2400 1100 1000 1700 1100 1400 1400 1400 2000 1200 2100 1800   -0.42
 1965 1400 2100 1300 1000 1700 1700 1400 2400 1300 2100 1900 2100   -0.41
 1966 1600 1600 2000 2000 1700 1200 2000 2500 2500 2700 1600  600   -0.34
 1967 2200 1700 1600 1200 1000 1400 1600 1300 1700 1500 1200 2100   -0.21
 1968 1600 1800 1800 1800 1500 1800 1400 2100 1000 2000 2100 2000   -0.28
 1969 1100  300 1900 1200 1000 1300 1500 1200 1200 2000 1700  800   -0.25
 1970 1900 1400 1200  900  600 1200 1500  700 2300 1700 1700 2100   -0.23
 1971 2000 1300 1600 1600 1200 1100 1400 1800 2000 1600 1700 1500   -0.39
 1972 1300 1200 1300 1200 1700  800 1400 1800 1900 2000 1700 1600   -0.26
 1973 1800 1100 1700  900 1200 1500  500 1800 1200 2000 2100 2100   -0.36
 1974 1100 2400  700 1600 1300 1300 1800 2000 1900 1200 1400 2400   -0.29
 1975 1500 2200 1400 1700 2500 2200 2300 1600 1700 2300 1800 2600   -0.47
 1976 1900  800 1100 1500 1000  900 1300 1800 2200 1600 1400 1600   -0.33
 1977 1800 1400 2200 1200 1600 1900 1300 1500 1500 1900 1500 2000   -0.40
 1978 1500 1800 1400 2100  700 1000 1100 1900 1700 2300 1500 2200   -0.24
 1979 1700 1700 1700 1200 1500 1800  900 1200 1800 1600 1500 2300   -0.39
 1980 1900 1300 1300 1000 1400  900  700 1100 1300 1600 2200 1700   -0.36
 1981 2600  500 1900 2000  800 1900 1500 2000 1400 1500 1800 1600   -0.46
 1982 2200 1800 1100 1600 1500 2200 1800 1400 1700 1700 1900 1400   -0.60
 1983 2400 1900 1700 1200  800 1500 1200 2000 1400 2100 2000 2500   -0.23
 1984 1900  800 1500 2000 1100 1600 2000 1700 1100 1400 1000 1200        
 1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999        
 1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999        
 1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999        
 1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999        
 1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999    0.65
 1990-9999-9999-9999-9999-9999  500 1300  900  700  900 1300  700    0.62
 1991-9999  900  500  300  700 1000 1500  700 1700 1000 1300 1300    0.54
 1992  800 1000  600  500  700  900-9999 1300-9999  700  900 1200    0.60
 1993  600  900  400  500  900 1500 1000  800  800 1000  400 1000    0.55
 1994 1300 1000  300  600  700 1000  900  600 1200    0 1400  600    0.43
 1995  900  900  600  700  700  900 1100 1300  600 1800 1300  500    0.61
 1996  500 1100  400  700  700 1200 1200 1100 1100  900 1000 1400    0.54
 1997 1200  800 1300  600  600  100  500 1100  900-9999 1000  900    0.61
 1998 1200 1300  800 1100 1100 1100  800  600 1200 1100  600 1200    0.52
 1999  600  400  600 1000  700  700 1800 1400  700 1600  800 1200    0.62
 2000 1100  600 1500 1700  900 1500  800  800 1000 1000  600  600    0.40
 2001  600  500  700  700  600  500 1200 1200  700 1300  900 1000    0.63
 2002 1000  800 1300  200  900 1100 1400 1200 1400 1800 1100  700        
 2003 1100-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999        
Update Data:
 2003 1100  700  700  500 1000  400  700 1100 1200 2100  800 1900
 2004  900  700  600  600 1300 1200 1000 1200 1400  900 1000 1000
 2005 1000  400  800 1100  900  600 1200 1000 1600 1000 1300 1200
 2006  700  500 1300  400  600 1200 1600  700 1000-9999  600 1500
 2007 1400  400  400 1300 1200 1200-9999-9999-9999-9999-9999-9999

<DO YOU SEE? THERE'S THAT OH-SO FAMILIAR BLOCK OF MISSING CODES IN THE LATE 80S,
 THEN THE DATA PICKS UP AGAIN. BUT LOOK AT THE CORRELATIONS ON THE RIGHT, ALL
 GOOD AFTER THE BREAK, DECIDEDLY DODGY BEFORE IT. THESE ARE TWO DIFFERENT
 STATIONS, AREN'T THEY? AAAARRRGGGHHHHHHH!!!!!>

 

MASTER:  221130  6896   3305   51 MURMANSK             EX USSR       1936 2003   -999     -999
UPDATE: 2211300  6858   3303   51 MURMANSK             RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter 'C' for more information):
> -0.60 is minimum correlation coeff.
>  0.65 is maximum correlation coeff.
> -0.01 is mean correlation coeff.

Enter 'Y' to allow, 'N' to deny, or an information code letter: 
<END QUOTE>
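
The display above slides the update series along the master series one year at a time and reports a correlation at each alignment. The exact method used by newmergedb.for isn't shown, so the Python below is only a sketch of the general idea: a plain Pearson correlation over months where both series have data, left blank (None) when too few months overlap. The `min_overlap` threshold and all names are assumptions, and the demo data are synthetic.

```python
import math

MISSING = -9999

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(master, master_start, update, min_overlap=24):
    """Correlate update against master with update's first year aligned
    to each master year in turn.

    master and update are flat lists of monthly values (12 per year).
    Returns {year: r}, with None where fewer than min_overlap months
    have data in both series.
    """
    result = {}
    for k in range(len(master) // 12):
        segment = master[k * 12 : k * 12 + len(update)]
        pairs = [(m, u) for m, u in zip(segment, update)
                 if m != MISSING and u != MISSING]
        if len(pairs) < min_overlap:
            result[master_start + k] = None
        else:
            result[master_start + k] = pearson([p[0] for p in pairs],
                                               [p[1] for p in pairs])
    return result

# Synthetic demo: the update is a copy of master years 2 and 3.
demo_master = [(i * 7) % 13 for i in range(48)]   # 4 years, starting 1990
demo_update = demo_master[12:36]
demo = lagged_correlations(demo_master, 1990, demo_update)
for year, r in demo.items():
    print(year, "      " if r is None else f"{r:6.2f}")
```

A run of good correlations that stops abruptly at a gap, as in the MURMANSK display, is the pattern that suggests two different series were spliced together.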

So.. should I really go to town (again) and allow the Master database to be 'fixed' by this
program? Quite honestly I don't have time - but it just shows the state our data holdings
have drifted into. Who added those two series together? When? Why? Untraceable, except
anecdotally.

It's the same story for many other Russian stations, unfortunately - meaning that (probably)
there was a full Russian update that did no data integrity checking at all. I just hope it's
restricted to Russia!!

There are, of course, metadata issues too. Take:

<BEGIN QUOTE>
MASTER:  206740  7353   8040   47 DIKSON ISLAND        EX USSR       1936 2003   -999     -999
UPDATE: 2067400  7330   8024   47 OSTROV DIKSON        RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter 'C' for more information):
> -0.70 is minimum correlation coeff.
>  0.81 is maximum correlation coeff.
> -0.01 is mean correlation coeff.
<END QUOTE>
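
A rough way to put a number on the positional difference between two headers like these is a great-circle distance. The Python sketch below assumes the header latitude and longitude are in hundredths of a degree (an assumption, though it fits the other headers quoted in this log) and uses a spherical Earth of radius 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def header_distance_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two positions in hundredths of a degree."""
    p1, l1, p2, l2 = (math.radians(v / 100.0)
                      for v in (lat1, lon1, lat2, lon2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# DIKSON ISLAND (master) vs OSTROV DIKSON (update)
print(round(header_distance_km(7353, 8040, 7330, 8024), 1))
```

At high latitudes a given longitude difference covers much less ground than the same latitude difference, so it is worth computing the full distance rather than eyeballing one coordinate.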

This is pretty obviously the same station (well OK.. apart from the duff early period, but I've
got used to that now). But look at the longitude! That's probably 20km! Luckily I selected
'Update wins' and so the metadata aren't compared. This is still going to take ages, because although
I can match WMO codes (or should be able to), I must check that the data correlate adequately - and
for all these stations there will be questions. I don't think it would be a good idea to take the
usual approach of coding to avoid the situation, because (a) it will be non-trivial to code for, and
(b) not all of the situations are the same. But I am beginning to wish I could just blindly merge
based on WMO code.. the trouble is that then I'm continuing the approach that created these broken
databases. Look at this one:

<BEGIN QUOTE>
***** OPERATOR ADJUDICATION REQUIRED *****

In attempting to pair two stations, possible data incompatibilities have been found.

MASTER:  239330  6096   6906   40 HANTY MANSIJSK       EX USSR       1936 1984   -999     -999
UPDATE: 2393300  6101   6902   46 HANTY-MANSIJSK       RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter 'C' for more information):
> -0.42 is minimum correlation coeff.
>  0.39 is maximum correlation coeff.
> -0.02 is mean correlation coeff.

Enter 'Y' to allow, 'N' to deny, or an information code letter: C
Master Data: Correlation with Update first year aligned to this year -v
 1936 1400  800 1700  900 1200  800  700  800 1800-9999-9999-9999    0.33
 1937 1400  800  500 1700 1500  800 1200 1000 1700 1300  700 1200    0.32
 1938 1000 1700 1200 1100 1100  800  800 1300 1400 1900 1800 1300    0.04
 1939 1100 1700 1600 1800 1500  800 1500 1900 1700 1800 1300 1300    0.09
 1940 1300  700  900  900 1800 1200  900 1300 1200 2200 1900 1800    0.08
 1941 1400 1100 1800 1000 1400 1900 1400  700 1300 1200 1900 2000    0.02
 1942 1700  900 1600  900 1200 1500 1300 1500 1200 1900 1500 1500   -0.06
 1943 1400 1300 1300  800 1400 1600 1300 1500 1900 2000  700 1900   -0.17
 1944 1900 1500 2000 1100 1200 1300 1500 1700 1800 1200 1500 1900   -0.32
 1945 1300 1000 1400 2100 2000 1100 1700  700 1600 1800 2300 1700   -0.42
 1946 2300 1900 1500 1100 1100 2000 1800 1000 1200 2100 2000 1800   -0.35
 1947 1900 1400 1600 1000 2100 1900 2100 1000 1200 2000 2100 1500   -0.35
 1948 1700 1500 1800  800 1300 1800 1700 1300 1800 2200 2000 2100   -0.15
 1949 2300 2100 1000  700 1600 1400 1200  800 2100 2000 1100 1400   -0.07
 1950 2100 2300 1000 1100 1500 1600 1600 2300 1900 1200 1100 1500    0.00
 1951 1600 1000 1500  800 1500 1400 1200  600 1800 1800 1400 2400   -0.07
 1952 1600  400 1100 1300 1100 1400  800 2000 1500 2300 1300 1600   -0.04
 1953 2000 1200 1500  500 1300 1500 1100 1200 2300 2200 1600 2100   -0.02
 1954 1700 1800  700  700 1000 1300 1200 1600 2000 1800 1800  600    0.01
 1955 2400 1400 1000 1100 1700 1200 1000 1300 1500 1300 2300 1600   -0.08
 1956 1300  800 1000 1100 1000 1000 1400 1800 1900 1900 2600 2000   -0.29
 1957 1900 1200 1700 1000 1100 1100 1100  700  800 2300 1900 2200   -0.18
 1958 1300 1600 1500  400 1500 1100 1300 1400 1900 2400 2000 1600   -0.28
 1959 1700 1600  700 1300 1700 1100 1100 1600 2000 2100 1900 1600   -0.04
 1960 1800 1600-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999    0.24
 1961-9999-9999-9999-9999-9999-9999-9999 1600 1600 1700 1900 1600    0.33
 1962 1700  800 1200  600  400 1100  900 2000 1100 1900 1700 1500    0.25
 1963 1200 1300 1700  700 1100 1600  900 1000 1100 1400 1800 2000   -0.04
 1964 1900  500 1300 1300 1200 1200 1100 1100 1700 1500 2000 1800    0.13
 1965 1200 1400  700  900 1200 1100 1300 1400 1800 2500 1000 1700    0.23
 1966 1800 1600 2100 1300 1500 2100  900 1800 1500 2400 1900  800    0.11
 1967 1600 1200 1100  600  800 1100 1100  700 1300 1200 1300 1900    0.39
 1968 1600 1400 1600 1200  900 1300 1400 1000 1700 1300 1400 1200    0.24
 1969  900 1000 1100 1500 1700 1700 1000 1800 1200 1400 1900 1300    0.04
 1970 1500 1200 1600 1400  700 1600  700 1600 1000 1500 1900 1600   -0.02
 1971 1700  400 1100 1700 1300 1700  700 2000  900 2100 2000 1900   -0.11
 1972 1200 1500 1400  800 1700 1300 1700 2000 2100 1700 2500 1900   -0.08
 1973 1200 1100 1100  700  800 1300 2100 1000 2400 1900 1800 2300   -0.11
 1974  700 1200 1800 1800 1400 1200 1000 1300 1100 1600 1900  700   -0.14
 1975 2200 1800 1400 1300 1500 1500 1400 1500 1400 2300 1900 2100   -0.15
 1976 2000 1500  600  700 1100 1600 1300 1100 1500 1800 1600 1200   -0.11
 1977 1900 1700 1800 1400 1000 1100 1000 1300 1500 1800 1700 2100   -0.15
 1978 1600 1000  800 1400 1400  800 1600 1600 2300 2200 2200 1800    0.03
 1979 1600 1600 1600  900  900 1900 1200 1700 1200 2100 1600 2000    0.00
 1980 1600 1200  500  800 1500 1100  800 1700 1200  600 2200 2200   -0.05
 1981 2000 1000 1700 1300 1500 1100  800  400 1500  800 1500 1900    0.06
 1982 2400 1800 1100 1200 1200 1100 1000 1700 1200 2100 1800 2000    0.03
 1983 2500 2100 1800 1300 1400 1200 1200 1300 1300 1900 2300 1900    0.10
 1984 1200  700  500 1300  900  800 1100 1000 1700 1600 1600 1300        
Update Data:
 2003 1500  900  600  400  900 1200  500  700 1100  600  700 1500
 2004  700  600  700  400  600 1100  500  900  900 1400 1500  600
 2005  700  400  800 1400  300  900  800  800  900  500 1200  600
 2006  800  700  900 1000  800  500 1000  500 1300 1100  700 1600
 2007 1100 1100  900  700 1300 1500-9999-9999-9999-9999-9999-9999
<END QUOTE>

Here, the expected 1990-2003 period is MISSING - so the correlations aren't so hot! Yet
the WMO codes and station names/locations are identical (or close). What the hell is
supposed to happen here? Oh yeah - there is no 'supposed', I can make it up. So I have :-)
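
For what it's worth, the per-year figures in the 'C' listing read like a sliding correlation: the update block is laid against the master starting at each master year in turn, and the two are correlated month by month. Here is a minimal Python sketch of that idea. It is not the newmergedb code; the function names, the three-pair minimum and the way partial overlaps are handled are all assumptions made for illustration.

```python
import math

MISSING = -9999  # missing-value code used in the listings


def pearson(xs, ys):
    """Pearson correlation, or None if it cannot be computed."""
    n = len(xs)
    if n < 3:  # assumed minimum number of valid pairs
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def sliding_correlations(master, update):
    """master, update: dict of year -> list of 12 monthly values.

    For each master year, align the first update year to it and
    correlate whatever months are present in both series.
    """
    update_years = sorted(update)
    results = {}
    for start in sorted(master):
        xs, ys = [], []
        for uy in update_years:
            my = start + (uy - update_years[0])
            if my not in master:
                continue  # update block runs off the end of the master
            for m, u in zip(master[my], update[uy]):
                if m != MISSING and u != MISSING:
                    xs.append(m)
                    ys.append(u)
        results[start] = pearson(xs, ys)
    return results
```

In the Hanty-Mansijsk listing no alignment gets above 0.39, which fits two series that share no years at all.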

If an update station matches a 'master' station by WMO code, but the data is unpalatably
inconsistent, the operator is given three choices:

<BEGIN QUOTE>
You have failed a match despite the WMO codes matching.
This must be resolved!! Please choose one:

1. Match them after all.
2. Leave the existing station alone, and discard the update.
3. Give existing station a false code, and make the update the new WMO station.

Enter 1,2 or 3: 
<END QUOTE>

You can't imagine what this has cost me - to actually allow the operator to assign false
WMO codes!! But what else is there in such situations? Especially when dealing with a 'Master'
database of dubious provenance (which, er, they all are and always will be).

 

False codes will be obtained by multiplying the legitimate code (5 digits) by 100, then adding
1 at a time until a number is found with no matches in the database. THIS IS NOT PERFECT but as
there is no central repository for WMO codes - especially made-up ones - we'll have to chance
duplicating one that's present in one of the other databases. In any case, anyone comparing WMO
codes between databases - something I've studiously avoided doing except for tmin/tmax where I
had to - will be treating the false codes with suspicion anyway. Hopefully.
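
The false-code rule above is simple enough to sketch in a few lines of Python. This is an illustration, not the actual routine: `make_false_code` and `used_codes` are made-up names, and the cap of 99 tries is my own assumption, added so the search can never run into the next station's 7-digit code.

```python
def make_false_code(wmo5, used_codes, max_tries=99):
    """Return a made-up 7-digit code for a station with 5-digit code wmo5.

    Multiplies the legitimate code by 100, then adds 1 at a time until
    the candidate is not already present in used_codes.
    """
    base = wmo5 * 100
    for step in range(1, max_tries + 1):
        candidate = base + step
        if candidate not in used_codes:
            return candidate
    # Assumption: give up rather than spill into the next WMO code's range.
    raise ValueError(f"no free false code for WMO {wmo5:05d}")
```

For example, with `used_codes = {2393300, 2393301}` the call `make_false_code(23933, used_codes)` returns `2393302`. The check only covers the database handed to it, so clashes with made-up codes in other databases are still possible.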

Of course, option 3 cannot be offered for CLIMAT bulletins, there being no metadata with which
to form a new station.

This still meant an awful lot of encounters with naughty Master stations, which really I suspect
nobody else gives a hoot about. So with a somewhat cynical shrug, I added the nuclear option -
to match every WMO possible, and turn the rest into new stations (er, CLIMAT excepted). In other
words, what CRU usually do. It will allow bad databases to pass unnoticed, and good databases to
become bad, but I really don't think people care enough to fix 'em, and it's the main reason the
project is nearly a year late.
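
A rough Python sketch of the nuclear option follows, to make the three outcomes explicit (matched, added as new, rejected). The data layout is hypothetical: each station is a dict with a `code`, a `data` dict keyed by year and an optional `meta` entry. Letting the update overwrite overlapping years is also an assumption, not something the program is known to do in blind mode.

```python
def blind_merge(master, update):
    """Merge update stations into master on WMO code alone.

    master: dict of code -> {"meta": ..., "data": {year: values}}
    update: list of {"code": ..., "data": {...}, "meta": ... or None}
    Returns three lists of codes: matched, added, rejected.
    """
    matched, added, rejected = [], [], []
    for stn in update:
        code = stn["code"]
        if code in master:
            # No data or metadata checks at all: the update simply wins.
            master[code]["data"].update(stn["data"])
            matched.append(code)
        elif stn.get("meta"):
            # Unmatched, but there is enough metadata to form a new station.
            master[code] = {"meta": stn["meta"], "data": dict(stn["data"])}
            added.append(code)
        else:
            # Unmatched and metadata-free (the CLIMAT case): nothing to build.
            rejected.append(code)
    return matched, added, rejected
```

The three lists should always add up to the number of update stations, which is a cheap check to run on the summary the program prints.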

And there are STILL WMO code problems!!! Let's try again with the issue. Let's look at the first
station in most of the databases, JAN MAYEN. Here it is in various recent databases:

dtr.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
pre.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
sun.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
tmn.0702091139.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmn.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmp.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
tmx.0702091313.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmx.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
vap.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0

As we can see, even I'm cocking it up! Though recoverably. DTR, TMN and TMX need to be written as (i7.7).
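
For anyone not fluent in Fortran edit descriptors: `i7` right-justifies the integer in seven columns and pads with blanks, while `i7.7` forces at least seven digits and so pads with zeros. The Python format specs `7d` and `07d` behave the same way for non-negative codes, which makes the difference easy to see. The function names here are just for illustration.

```python
def write_i7(code):
    """Blank-padded, like Fortran (i7): the leading zero is lost."""
    return format(code, "7d")


def write_i7_7(code):
    """Zero-padded, like Fortran (i7.7): the leading zero survives."""
    return format(code, "07d")


print(repr(write_i7(100100)))    # ' 100100'
print(repr(write_i7_7(100100)))  # '0100100'
```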

Anyway, here it is in the problem database:

wet.0311061611.dtb:  10010  7093   -866    9 JAN MAYEN(NOR NAVY)  NORWAY        1990 2003   -999     -999

You see? The leading zero's been lost (presumably through writing as i7) and then a zero has been added at
the trailing end. So it's a 5-digit WMO code BUT NOT THE RIGHT ONE. Aaaarrrgghhhhhh!!!!!!

I think this can only be fixed in one of two ways:

1. By hand.

2. By automatic comparison with other (more reliable) databases.

As usual - I'm going with 2. Hold onto your hats.

Actually, a brief interlude to churn out the tmin & tmax primaries, which got sort-of
forgotten after dtr was done:

<BEGIN ABRIDGED QUOTES (separated by '#####')>
   > ***** AnomDTB: converts .dtb to anom .txt for gridding *****
   > Enter the suffix of the variable required:
.tmn
   > Select the .cts or .dtb file to load:
tmn.0708071548.dtb
   > Specify the start,end of the normals period: 
1961,1990
   > Specify the missing percentage permitted: 
25
   > Data required for a normal:           23
   > Specify the no. of stdevs at which to reject data: 
3
   > Select outputs (1=.cts,2=.ann,3=.txt,4=.stn): 
3
   > Check for duplicate stns after anomalising? (0=no,>0=km range)
0
   > Select the generic .txt file to save (yy.mm=auto):
tmn.txt
   > Select the first,last years AD to save: 
1901,2006
   > Operating...
   > NORMALS            MEAN percent      STDEV percent
   >         .dtb    3814210    65.5
   >         .cts     210801     3.6    4025011    69.2
   > PROCESS        DECISION percent %of-chk
   > no lat/lon          650     0.0     0.0
   > no normal       1793923    30.8    30.8
   > out-of-range        976     0.0     0.0
   > accepted        4024035    69.1
   > Dumping years 1901-2006 to .txt files...
#####
IDL> quick_interp_tdm2,1901,2006,'tmnglo/tmn.',750,gs=0.5,pts_prefix='tmntxt/tmn.',dumpglo='dumpglo'
#####
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: gunzip clim.6190.lan.tmn
FILE NOT FOUND - PLEASE TRY AGAIN: clim.6190.lan.tmn
Enter a name for the gridded climatology file: clim.6190.lan.tmn.grid
Enter the path and stem of the .glo files: tmnglo/tmn.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: tmnabs
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
tmn.01.1901.glo
(etc)
tmn.12.2006.glo
#####
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.
Enter a gridfile with YYYY for year and MM for month: tmnabs/tmn.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12
Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmn.dat  
Writing cru_ts_3_00.1901.1910.tmn.dat
(etc)
#####
   > ***** AnomDTB: converts .dtb to anom .txt for gridding *****
   > Enter the suffix of the variable required:
.tmx
   > Select the .cts or .dtb file to load:
tmx.0708071548.dtb
   > Specify the start,end of the normals period: 
1961,1990
   > Specify the missing percentage permitted: 
25
   > Data required for a normal:           23
   > Specify the no. of stdevs at which to reject data: 
3
   > Select outputs (1=.cts,2=.ann,3=.txt,4=.stn): 
3
   > Check for duplicate stns after anomalising? (0=no,>0=km range)
0
   > Select the generic .txt file to save (yy.mm=auto):
tmx.txt
   > Select the first,last years AD to save: 
1901,2006
   > Operating...
   > NORMALS            MEAN percent      STDEV percent
   >         .dtb    3795470    65.4
   >         .cts     205607     3.5    4001077    68.9
   > PROCESS        DECISION percent %of-chk
   > no lat/lon          652     0.0     0.0
   > no normal       1805313    31.1    31.1
   > out-of-range        471     0.0     0.0
   > accepted        4000606    68.9
   > Dumping years 1901-2006 to .txt files...
#####
IDL> quick_interp_tdm2,1901,2006,'tmxglo/tmx.',750,gs=0.5,pts_prefix='tmxtxt/tmx.',dumpglo='dumpglo'
#####
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.tmx
Enter a name for the gridded climatology file: clim.6190.lan.tmx.grid
Enter the path and stem of the .glo files: tmxglo/tmx.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: tmxabs
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
tmx.01.1901.glo
(etc)
tmx.12.2006.glo
#####
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.
Enter a gridfile with YYYY for year and MM for month: tmxabs/tmx.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12
Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmx.dat
Writing cru_ts_3_00.1901.1910.tmx.dat
(etc)
<END ABRIDGED QUOTES>

This took longer than hoped.. running out of disk space again. This is why Tim didn't save more of
the intermediate products - which would have made my detective work easier. The ridiculous process
he adopted - and which we have dutifully followed - creates hundreds of intermediate files at every
stage, none of which are automatically zipped/unzipped. Crazy. I've filled a 100gb disk!

 

So, anyway, back on Earth I wrote wmocmp.for, a program to - you guessed it - compare WMO codes from
a given set of databases.  Results were, ah.. 'interesting':

<BEGIN QUOTE>
REPORT:

Database Title                Exact Match  Close Match  Vague Match  Awful Match  Codes Added      WMO = 0
../db/pre/pre.0612181221.dtb          n/a          n/a          n/a          n/a        14397         1540
../db/dtr/tmn.0708071548.dtb         1865         3389           57           77         5747         2519
../db/tmp/tmp.0705101334.dtb            0            4           28          106         4927            0
<END QUOTE>

So the largest database, precip, contained 14397 stations with usable WMO codes (and 1540 without).
The TMin database (and TMax and DTR, which were tested then excluded as they matched TMin 100%) only agreed
perfectly with precip for 1865 stations, nearby 3389, believable 57, worrying 77. TMean fared worse, with NO
exact matches (WMO misformatting again) and over 100 worrying ones.

The big story is the need to fix the tmean WMO codes. For instance:

  10010   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

is illegal, and needs to become one of:
  01001   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
0001001   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
0100100   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

I favour the first as it's technically accurate. Alternatively we seem to have widely adopted the third, which
at least has the virtue of being consistent. Of course it's the only one that will match the precip:

 100100  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

..which itself should be either:

0100100  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

or:

  01001  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

Aaaaarrrggghhhh!!!!
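
The conversions being argued over can be sketched in Python, on the assumption that each database stores the 5-digit WMO number multiplied by a known factor (10 for the tmean example, 100 for the precip one). `normalise_wmo` and its `scale` argument are illustrative names, and the scale has to be supplied per database because the raw number alone is ambiguous.

```python
def normalise_wmo(raw_code, scale):
    """Turn a stored code into (5-digit form, 7-digit form) strings.

    raw_code: the integer as read from the database header
    scale:    factor by which that database multiplied the WMO number
    """
    wmo5, remainder = divmod(int(raw_code), scale)
    if remainder != 0 or not 0 < wmo5 <= 99999:
        # Not a plain scaled WMO number (e.g. a made-up code): leave it alone.
        raise ValueError(f"cannot normalise {raw_code} with scale {scale}")
    return f"{wmo5:05d}", f"{wmo5 * 100:07d}"


print(normalise_wmo(10010, 10))    # ('01001', '0100100')
print(normalise_wmo(100100, 100))  # ('01001', '0100100')
```

Both Jan Mayen variants come out as the same pair, so either output form would do as the matching key.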

And the reason this is so important is that the incoming updates will rely PRIMARILY on matching the WMO codes!
In fact CLIMAT bulletins carry no other identification, of course. Clearly I am going to need a reference set
of 'genuine WMO codes'.. and wouldn't you know it, I've found four!

Location                                                N. Stations      Notes
http://weather.noaa.gov/data/nsd_bbsss.txt              11548            Full country names, ';' delim
http://www.htw-dresden.de/~kleist/wx_stations_ct.html   13000+           *10, leading zeros kept, fmt probs
From Dave Lister                                        13080            *10 and leading zeros lost, country codes
From Philip Brohan                                      11894            2+3, No countries

The strategy is to use Dave Lister's list, grabbing country names from the Dresden list. Wrote
getcountrycodes.for and extracted an imperfect but useful-as-a-reference list. Hopefully in the main the country
will not need fixing or referring to!!

Wrote 'fixwmos.for' - probably not for the first time, but it's the first prog of that name in my repository so I'll
have to hope for the best. After an unreasonable amount of teething troubles (due to my forgetting that the tmp
database stores lats & lons in degs*100 not degs*10, and also to the presence of a '-99999' as the lon for GUATEMALA
in the reference set) I managed to sort-of fix the tmp database:

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

 

Now of course, we can't add any of the CLIMAT bulletin stations as 'new' stations
because we don't have any metadata! so.. is it worth using the lookup table? Because
although I'm thrilled at the high match rate (87%!), it does seem worse when you
realise that you lost the rest..

* see below, CLIMAT metadata fixed! *

At this stage I knocked up rrstats.for and the visualisation companion tool, cmprr.m. A simple process
to show station counts against time for each 10-degree latitude band (with 20-degree bands at the
North and South extremities). A bit basic and needs more work - but good for a quick & dirty check.
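
The banding is easy to show in Python. This is a sketch of the idea, not rrstats.for itself: the band edges (70 to 90 at each pole, 10-degree steps in between) follow the description above, but which band an edge value falls into, and counting a station for every year between its first and last header years, are simplifying assumptions.

```python
import math
from collections import Counter


def lat_band(lat):
    """Return (south_edge, north_edge) of the band containing lat."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if lat >= 70:
        return (70, 90)    # 20-degree band at the northern extremity
    if lat < -70:
        return (-90, -70)  # 20-degree band at the southern extremity
    south = math.floor(lat / 10) * 10
    return (south, south + 10)


def counts_by_band_and_year(stations):
    """stations: iterable of (lat_degrees, first_year, last_year)."""
    counts = Counter()
    for lat, first, last in stations:
        band = lat_band(lat)
        for year in range(first, last + 1):
            counts[(band, year)] += 1  # ignores gaps inside the record
    return counts
```

A proper count would look at which years actually hold data rather than trusting the header span.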

Wrote dllist2headers.for to convert the 'Dave Lister' WMO list to CRU header format - the main difficulty
being the accurate conversion of the two-character 'country codes' - especially since many are actually
state codes for the US! Ended up with wmo.0710151633.dat as our reference WMO set.

Incorporated the reference WMO set into climat2cru.for. Successfully reprocessed the CLIMAT bulletins
into databases with at least SOME metadata:

pre.0710151817.dtb
rdy.0710151817.dtb
sun.0710151817.dtb
tmn.0710151817.dtb
tmp.0710151817.dtb
tmx.0710151817.dtb
vap.0710151817.dtb

In fact, it was far more successful than I expected - only 11 stations out of 2878 without metadata!

Re-ran newmergedb:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update - CLIMAT, MCDW, Australian - do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter 'B' for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0710151817.dtb

Reading in both databases..
Master database stations:     5836
Update database stations:     2876

Looking for WMO code matches..
   71 reject(s) from update process 0710161148

Writing wet.0710161148.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: wet.0710161148.dtb

Update database stations:         2876
 > Matched with Master stations:  2498
                 (automatically:  2498)
                   (by operator:     0)
 > Added as new Master stations:   307
 > Rejected:                        71
   Rejects file:                 rdy.0710151817.dtb.rejected
 Note: IEEE floating-point exception flags raised: 
    Inexact;  Invalid Operation; 
 See the Numerical Computation Guide, ieee_flags(3M) 
uealogin1[/cru/cruts/version_3_0/db/rd0] 
<END QUOTE>

307 stations rescued! and they'll be there in future of course, for metadata-free CLIMAT bulletins
to match with.

So where were we.. Rain Days. Family tree:

wet.0311061611.dtb
        +
rdy.0709111032.dtb  (MCDW composite)
        +
rdy.0710151817.dtb  (CLIMAT composite with metadata added)
        V
        V
wet.0710161148.dtb

Now it gets tough. The current model for a secondary is that it is derived from one or more primaries,
plus their normals, plus the normals for the secondary.

The IDL secondary generators do not allow 'genuine' secondary data to be incorporated. This would have
been ideal, as the gradual increase in observations would have gradually taken precedence over the
primary-derived synthetics.

The current stats for the wet database were derived from the new proglet, dtbstats.for:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./dtbstat

DTBSTAT: Database Stats Report

Please enter the (18ch.) database name: wet.0710161148.dtb

Report for: wet.0710161148.dtb

Stations in Northern Hemisphere:     5365
Stations in Southern Hemisphere:      778
                          Total:     6143

Maximum Timespan in Northern Hemisphere: 1840 to 2007
Maximum Timespan in Southern Hemisphere: 1943 to 2007
                        Global Timespan: 1840 to 2007

crua6[/cru/cruts/version_3_0/secondaries/rd0] 
<END QUOTE>
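
A report like this needs little more than the station headers. The Python sketch below is not dtbstat; the column positions are read off the header lines quoted in this log (7-column code, 6-column lat, 7-column lon, 5-column altitude, 20-character name, 13-character country, then the two years), and `coord_scale` is a parameter because the databases do not all store coordinates at the same scale. Putting the equator in the Northern Hemisphere is an arbitrary choice.

```python
def parse_header(line, coord_scale=100):
    """Split one fixed-width station header into named fields."""
    return {
        "code": int(line[0:7]),
        "lat": int(line[7:13]) / coord_scale,
        "lon": int(line[13:20]) / coord_scale,
        "alt": int(line[20:25]),
        "name": line[26:46].strip(),
        "country": line[47:60].strip(),
        "first": int(line[61:65]),
        "last": int(line[66:70]),
    }


def hemisphere_stats(headers):
    """Station counts and maximum timespans per hemisphere."""
    def span(group):
        if not group:
            return None
        return (min(h["first"] for h in group), max(h["last"] for h in group))

    north = [h for h in headers if h["lat"] >= 0]
    south = [h for h in headers if h["lat"] < 0]
    return {
        "north": len(north),
        "south": len(south),
        "total": len(headers),
        "north_span": span(north),
        "south_span": span(south),
        "global_span": span(headers),
    }
```

Headers with missing coordinates (-999 and friends) would need screening before this is used in anger.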

So, without further ado, I treated RD0 as a Primary and derived gridded output from the database:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./anomdtb

   > ***** AnomDTB: converts .dtb to anom .txt for gridding *****

   > Enter the suffix of the variable required:
.rd0
   > Select the .cts or .dtb file to load:
wet.0710161148.dtb   
   > Specify the start,end of the normals period: 
1961,1990
   > Specify the missing percentage permitted: 
25
   > Data required for a normal:           23
   > Specify the no. of stdevs at which to reject data: 
3
   > Select outputs (1=.cts,2=.ann,3=.txt,4=.stn): 
3
   > Check for duplicate stns after anomalising? (0=no,>0=km range)
0
   > Select the generic .txt file to save (yy.mm=auto):
rd0.txt
   > Select the first,last years AD to save: 
1901,2007
   > Operating...

   > NORMALS            MEAN percent      STDEV percent
   >         .dtb          0     0.0
   >         .cts     731118    45.4     730956    45.4
   > PROCESS        DECISION percent %of-chk
   > no lat/lon            0     0.0     0.0
   > no normal        878015    54.6    54.6
   > out-of-range         56     0.0     0.0
   > accepted         731062    45.4
   > Dumping years 1901-2007 to .txt files...

crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>

Not particularly good - the bulk of the data being recent, less than half had valid normals (anomdtb
calculates normals on the fly, on a per-month basis). However, this isn't so much of a problem as the
plan is to screen it for valid station contributions anyway.
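
The "Data required for a normal: 23" line falls out of the prompts: 30 years in 1961-1990, 25% allowed missing, so 22.5 values, which comes to 23 whole years. A minimal Python sketch of per-month normals and anomalies on that basis is below. It is not anomdtb: the names are invented, rounding up is an assumption (rounding to nearest gives the same 23 here), and the standard-deviation rejection step is left out.

```python
import math

MISSING = -9999


def values_needed(first, last, max_missing_pct):
    """Number of valid years a month needs before it gets a normal."""
    n_years = last - first + 1
    return math.ceil(n_years * (100 - max_missing_pct) / 100)


def monthly_normals(series, first=1961, last=1990, max_missing_pct=25):
    """series: dict of year -> 12 monthly values. Returns 12 normals,
    each the mean over the normals period or None if too few values."""
    needed = values_needed(first, last, max_missing_pct)
    normals = []
    for month in range(12):
        vals = [series[y][month] for y in range(first, last + 1)
                if y in series and series[y][month] != MISSING]
        normals.append(sum(vals) / len(vals) if len(vals) >= needed else None)
    return normals


def anomalies(series, normals):
    """Subtract the monthly normal; anything without a normal stays missing."""
    return {
        year: [MISSING if v == MISSING or n is None else v - n
               for v, n in zip(vals, normals)]
        for year, vals in series.items()
    }
```

A station whose record is mostly recent fails the 23-year test for every month and so contributes nothing, which is the "no normal" bucket in the report above.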

<BEGIN QUOTE>
IDL> quick_interp_tdm2,1901,2007,'rd0glo/rd0.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rd0txt/rd0.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
    1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
    1902
(etc)
    2007
no stations found in: rd0txt/rd0.2007.08.txt
no stations found in: rd0txt/rd0.2007.09.txt
no stations found in: rd0txt/rd0.2007.10.txt
no stations found in: rd0txt/rd0.2007.11.txt
no stations found in: rd0txt/rd0.2007.12.txt
IDL> 
<END QUOTE>

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim.6190.lan.wet.grid2
Enter the path and stem of the .glo files: rd0glo/rd0.
Enter the starting year: 1901
Enter the ending year:   2007
Enter the path (if any) for the output files: rd0abs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A         ! this was a guess! We'll see how the results look
Right, erm.. off I jolly well go!
rd0.01.1901.glo
(etc)
<END QUOTE>

Then.. wait a minute! I checked back, and sure enough, quick_interp_tdm.pro DOES allow both synthetic and 'real' data
to be included in the gridding. From the program description:

<BEGIN QUOTE>
; TDM: the dummy grid points default to zero, but if the synth_prefix files are present in call,
;  the synthetic data from these grids are read in and used instead
<END QUOTE>

And so.. (after some confusion, and renaming so that anomdtb selects percentage anomalies)..

IDL> quick_interp_tdm2,1901,2006,'rd0pcglo/rd0pc',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn',pts_prefix='rd0pctxt/rd0pc.'  

The trouble is, we won't be able to produce reliable station count files this way. Or can we use the same strategy,
producing station counts from the wet database route, and filling in 'gaps' with the precip station counts? Err.

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim.grid
Enter the path and stem of the .glo files: rd0pcglo/rd0pc.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: rd0pcgloabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
rd0pc.01.1901.glo
(etc)
<END QUOTE>
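
For reference, here is what the A/P choice would conventionally mean, sketched in Python. I am assuming the usual definitions: an additive anomaly is value minus normal, and a percentage anomaly is 100 * (value - normal) / normal. Whether glo2abs and anomdtb actually agree on these is exactly the sort of thing that needs checking. Function names are illustrative.

```python
def to_percentage_anomaly(value, normal):
    """Conventional percentage anomaly; undefined where the normal is zero."""
    if normal == 0:
        return None
    return 100 * (value - normal) / normal


def to_absolute(anomaly, normal, mode):
    """Invert an anomaly back to an absolute value.

    mode 'A': additive, absolute = normal + anomaly
    mode 'P': percentage, absolute = normal * (1 + anomaly / 100)
    """
    if mode == "A":
        return normal + anomaly
    if mode == "P":
        return normal * (1 + anomaly / 100)
    raise ValueError(f"mode must be 'A' or 'P', not {mode!r}")
```

Under these definitions a percentage anomaly can never sensibly go below -100, and a zero in the climatology forces the absolute value to zero whatever the anomaly says. Both make quick sanity checks on the grids.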

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./mergegrids 
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: rd0pcgloabs/rd0pc.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.rd0.dat
Writing cru_ts_3_00.1901.1910.rd0.dat
Writing cru_ts_3_00.1911.1920.rd0.dat
Writing cru_ts_3_00.1921.1930.rd0.dat
Writing cru_ts_3_00.1931.1940.rd0.dat
Writing cru_ts_3_00.1941.1950.rd0.dat
Writing cru_ts_3_00.1951.1960.rd0.dat
Writing cru_ts_3_00.1961.1970.rd0.dat
Writing cru_ts_3_00.1971.1980.rd0.dat
Writing cru_ts_3_00.1981.1990.rd0.dat
Writing cru_ts_3_00.1991.2000.rd0.dat
Writing cru_ts_3_00.2001.2006.rd0.dat
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
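
The SSSS/EEEE substitution is simple to reproduce. A small Python sketch, assuming the blocks are plain 10-year chunks counted from the start year with a short final block; `decade_files` is a made-up name and this is not the mergegrids source.

```python
def decade_files(template, start_year, end_year):
    """Expand an output template into one filename per 10-year block."""
    names = []
    block_start = start_year
    while block_start <= end_year:
        block_end = min(block_start + 9, end_year)  # last block may be short
        names.append(template
                     .replace("SSSS", f"{block_start:04d}")
                     .replace("EEEE", f"{block_end:04d}"))
        block_start = block_end + 1
    return names


for name in decade_files("cru_ts_3_00.SSSS.EEEE.rd0.dat", 1901, 2006):
    print(name)
```

For 1901 to 2006 this gives the same eleven names as the transcript, from `cru_ts_3_00.1901.1910.rd0.dat` to `cru_ts_3_00.2001.2006.rd0.dat`.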

All according to plan.. except the values themselves!

For January, 2001:

Minimum      =      0
Maximum      =  32630
Vals >31000  =      1

For the whole of 2001:

Minimum      =      0
Maximum      =  56763
Vals >31000  =      5

Not good. We're out by a factor of at least 10, though the extremes are few enough to just cap at DiM (days in month). So where has
this factor come from?

Well here's the January 2001 climatology:

Minimum      =      0
Maximum      =   3050
Vals >3100   =      0

That all seems fine for a percentage normals set. Not entirely sure about 0 though.

So let's look at the January 2001 gridded anomalies file:

Minimum      =    -48.046
Maximum      =      0.0129

This leads to a show-stopper, I'm afraid. It looks as though the calculation I'm using for percentage anomalies is,
not to put too fine a point on it, cobblers.
