On Aug 23, 2006, at 3:47 PM, Jonathan Callahan wrote:
> <rant>
> This topic has me immediately climbing on my soapbox to talk about
> data management in general. The following opinions are therefore my
> own and not necessarily shared by others in the LAS group.
The same applies to my comments.
> [...]
> In the best of all possible worlds, data managers would take the
> data that is created by data providers and, where necessary,
> reformat it so as to provide optimal performance for data users.
> After all, the work of reformatting only has to be done once, but the
> work of opening 10K separate snapshot files has to be done every
> single time a user makes a time-series request.
I concur; however, there are a number of issues involved in
transposing data from a many-fields-one-time format to a one-field-
many-times format. One concern for us, as data providers, is that many
of our analysis packages require multiple fields for each time sample
processed, and rewriting all of those codes is not a trivial exercise.
The bigger concern is archival storage cost. Basically, what we end
up with is 2X the data volume: the original data and the transposed
data. Considering that, as a data manager, I have to keep track of
literally hundreds of terabytes of data, and that we are charged for
each and every byte, it's generally just not practical at this time
for us to double our data storage charges.
As usual, there's nothing technically difficult about creating long
time-series files from single-time, multi-field files; however, there
are policy and other issues that make the "best of all possible
worlds" a difficult one to attain.
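
For concreteness, the transposition itself might look something like
the minimal sketch below, assuming the snapshots are netCDF files with
a "time" record dimension and using Python with xarray; the snapshot
file pattern, the dimension name, and the output file names are
hypothetical, not anyone's actual workflow.

    # Minimal sketch: turn single-time, many-field snapshot files into
    # one-field, many-times (time-series) files. The "snapshot_*.nc"
    # pattern, the "time" dimension, and the output names are assumptions.
    import glob
    import xarray as xr

    snapshot_files = sorted(glob.glob("snapshot_*.nc"))

    # Open every snapshot and concatenate them along the time axis.
    ds = xr.open_mfdataset(snapshot_files, combine="nested", concat_dim="time")

    # Write one long time-series file per field (the transposed layout).
    for name in ds.data_vars:
        ds[name].to_dataset(name=name).to_netcdf(f"{name}_timeseries.nc")

Each output file holds one field across all times, which is the layout
a time-series request wants; keeping both layouts alongside each other
is what roughly doubles the stored volume.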
Gary Strand
strandwg@ucar.edu
http://www.cgd.ucar.edu/ccr/strandwg