On Aug 23, 2006, at 3:47 PM, Jonathan Callahan wrote:
> <rant>
> This topic has me immediately climbing on my soapbox to talk about
> data management in general. The following opinions are therefore my
> own and not necessarily shared by others in the LAS group.
The same applies to my comments.
> [...]
> In the best of all possible worlds, data managers would take the
> data that is created by data providers and, where necessary,
> reformat it so as to provide optimal performance for data users.
> After all, the work of reformatting only has to be done once, but the
> work of opening 10K separate snapshot files has to be done every
> single time a user makes a time-series request.
I concur; however, there are a number of issues involved in
transposing data from a many-fields-one-time format to a one-field-
many-times format. One concern for us, as data providers, is that many
of our analysis packages require multiple fields for each time sample
processed, and rewriting all of those codes is not a trivial exercise.
The bigger concern is archival storage cost. Basically, what we end
up with is 2X the data volume: the original data and the transposed
data. Considering that, as a data manager, I have to keep track of
literally hundreds of terabytes of data, and that we are charged for
each and every byte, it's generally just not practical at this time
for us to double our data storage charges.
As usual, there's nothing technically difficult about creating long
time-series files from single-time, multi-field files; however, there
are policy and other issues that make the "best of all possible
worlds" a difficult one to attain.
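
For concreteness, the transposition itself might look something like
the minimal sketch below, assuming the snapshots are netCDF files with
a "time" record dimension and using Python with xarray; the snapshot
file pattern, the dimension name, and the output file names are
hypothetical, not anyone's actual workflow.

    # Minimal sketch: turn single-time, many-field snapshot files into
    # one-field, many-times (time-series) files. The "snapshot_*.nc"
    # pattern, the "time" dimension, and the output names are assumptions.
    import glob
    import xarray as xr

    snapshot_files = sorted(glob.glob("snapshot_*.nc"))

    # Open every snapshot and concatenate them along the time axis.
    ds = xr.open_mfdataset(snapshot_files, combine="nested", concat_dim="time")

    # Write one long time-series file per field (the transposed layout).
    for name in ds.data_vars:
        ds[name].to_dataset(name=name).to_netcdf(f"{name}_timeseries.nc")

Each output file holds one field across all times, which is the layout
a time-series request wants; keeping both layouts alongside each other
is what roughly doubles the stored volume.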
Gary Strand
strandwg@ucar.edu
http://www.cgd.ucar.edu/ccr/strandwg