Another point of view on our 30-year data
2011-11-27 17:32:51.000 – Ryan Knapp, Weather Observer/Meteorologist
A nice distraction today from climate data work.
In yesterday’s comment, Roger mentioned how much joy he was getting out of “playing” with our climatological data. But you know what they say: one man’s joy is another man’s sorrow. While Roger has been enjoying the task of querying our database, running several lines of code to pull the data, importing it into Excel and calculating the results, I have been working through not only our data but also the data provided by the National Climatic Data Center (NCDC). To the outside world, our 30-year averages should be pretty straightforward: add up a series of numbers, then divide by the number of values available to get the average. But reading through NCDC’s product description page, I’ve found it is not quite that simple.
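For reference, the “straightforward” approach described above is just a plain arithmetic mean. Here is a minimal sketch of it, assuming missing values are simply skipped so the divisor is the number of values actually available. The function name `simple_normal` and the sample numbers are hypothetical illustrations, not our data and not NCDC’s method.

```python
def simple_normal(values):
    """Plain arithmetic mean: add up the values, divide by how many there are.

    Missing values (None) are skipped, so the divisor is the number of
    values actually available (an assumption for this sketch).
    """
    available = [v for v in values if v is not None]
    if not available:
        raise ValueError("no data available to average")
    return sum(available) / len(available)


# Hypothetical yearly values for one month (made-up numbers, not observatory data).
example = [10.0, 12.0, None, 14.0]
print(simple_normal(example))  # 12.0
```

As the rest of this post explains, NCDC’s normals involve a lot more than this one step.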
First, I have to grab several huge text files covering between 5,300 and 9,300 stations (depending on the normals represented), with thousands (if not millions) of lines of data, and then import this data into a spreadsheet. The data is then filtered, cleaned up and (I thought) simply averaged. But delving deeper into the product descriptions, I am finding that their method of computation is much more involved. It requires running the data through their algorithms to weed out simultaneous zeros, duplicates, impossible or improbable values, streaks, gaps, outliers and various other inconsistencies. Then the data is fed into equations and concepts I haven’t used since earning my degree, such “fun” math concepts as “pairwise algorithms”, “time-series differences”, “change points”, “x-order polynomial fits”, “standard deviations”, “mean”, “median”, “aperiodic oscillations” and “spline-fits”. While I know how to use most of these, I don’t have any starting points or definitive descriptions of exactly what was used. NCDC will provide the public with descriptions of how they calculated the data, but they are very protective of the equations they use to do it. So we are left building the equations from the ground up, or looking at the data and asking ourselves, “How did they…?”
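To give a feel for what a few of those screening steps involve, here is a minimal sketch of simple checks for duplicates, impossible values, gaps, streaks and outliers on a daily series. This is not NCDC’s algorithm (as noted above, their exact equations aren’t published) and it leaves out simultaneous zeros and the pairwise homogenization entirely. The function name `qc_flags`, the bounds `lo`/`hi`, the streak length and the z-score cutoff are all hypothetical choices for illustration only.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def qc_flags(records, lo=-50.0, hi=40.0, streak_len=5, z_max=3.0):
    """Flag suspect daily values with a few simple, hypothetical checks.

    records: list of (date, value) tuples sorted by date.
    lo/hi, streak_len and z_max are illustrative thresholds, not NCDC's.
    Returns a dict mapping flag name -> list of dates.
    """
    flags = {"duplicate": [], "impossible": [], "gap": [], "streak": [], "outlier": []}

    # Duplicates (same date reported twice) and impossible values (outside bounds).
    seen = set()
    for d, v in records:
        if d in seen:
            flags["duplicate"].append(d)
        seen.add(d)
        if not lo <= v <= hi:
            flags["impossible"].append(d)

    # Gaps: every calendar day missing between consecutive records.
    for (d1, _), (d2, _) in zip(records, records[1:]):
        day = d1 + timedelta(days=1)
        while day < d2:
            flags["gap"].append(day)
            day += timedelta(days=1)

    # Streaks: the same value repeated streak_len or more times in a row.
    run_start = 0
    for i in range(1, len(records) + 1):
        if i == len(records) or records[i][1] != records[run_start][1]:
            if i - run_start >= streak_len:
                flags["streak"].extend(d for d, _ in records[run_start:i])
            run_start = i

    # Outliers: plausible values more than z_max standard deviations from the mean.
    plausible = [v for _, v in records if lo <= v <= hi]
    if len(plausible) >= 2:
        mu, sd = mean(plausible), stdev(plausible)
        if sd > 0:
            flags["outlier"].extend(
                d for d, v in records if lo <= v <= hi and abs(v - mu) / sd > z_max
            )
    return flags


# Hypothetical series: 20 days alternating 0 and 1, with one spike on day 11.
start = date(2011, 1, 1)
series = [(start + timedelta(days=i), float(i % 2)) for i in range(20)]
series[10] = (series[10][0], 30.0)
print(qc_flags(series)["outlier"])  # [datetime.date(2011, 1, 11)]
```

A real normals calculation would then go on to adjust and smooth whatever survives these checks, which is where the less well-documented steps come in.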
While I have been in contact with NCDC to get answers to several of my questions, getting a simple answer is anything but simple. My first email was forwarded to six different people and departments before I got an answer, and even then we had to send a second email to “poke the bear” again to get a response. It reminded me of a video I saw (while working for the government, mind you) of government employees getting stuck on an escalator. The escalator stops, and instead of simply walking up it like stairs, they begin shouting and asking for help with what to do. In my case, instead of someone just answering my question directly so we could all get on with life, we got funneled through several people for help with what to do. And the answer we got wasn’t the clear-cut response we wanted or needed; it answered our questions only partially and pushed us even further into the “How did they…?” series of questions. So all in all, it has been a very slow-going, frustrating and depressing process so far. But like Roger said, we are working toward completing the data, and we will hopefully get all our ducks in a row and post everything by January 1, 2012. So please be patient; we are working to get you the data as soon as possible.
Ryan Knapp, Weather Observer/Meteorologist