Data Quality Control
GTMBA data undergo extensive quality control analysis to ensure that they meet stringent accuracy standards. This page summarizes the various procedures for real-time ATLAS data, delayed mode ATLAS data, and acoustic Doppler current profiler (ADCP) data. Quality indices which apply to these data are also described.
Real-time ATLAS data quality control is routinely performed on a daily, weekly, and monthly basis.
Daily quality control
The first step of daily quality analysis involves automatic flagging of data that fall outside of broad error specifications. Next, remaining data are checked against a narrower range of error specifications, and those that fall outside this range generate an error alert message. However, questionable data are not automatically removed. Rather, for each error alert, the suspect data are checked for validity by experienced data analysts.
In addition to the error checking program, daily comparisons are made between ATLAS data that are processed at PMEL and ATLAS data that are transmitted via the GTS. Any discrepancies between the data sets are immediately investigated and corrected.
Data quality control procedures are summarized in the table below:
Measurement | Preliminary gross automated error checking | Daily parameters that will generate error alerts | Additional daily checks |
Wind direction | Compass or vane zero; compass or vane constant; direction varies more than 90° from previous day. | Visual inspection of 5-day running mean wind vectors vs climatology | |
Wind velocity | Speed changes more than 5 m s-1 from previous day | | |
Relative humidity (RH) | RH set to missing if > 99.9% | Daily RH outside 65-99%; hourly RH outside 50-100% within past two weeks; changes >20% from previous day | |
Air temperature (AT) | AT set to missing if > 33.0°C or < -9.0°C. | Daily AT changes > 5°C from previous day; daily AT - SST > 1.4°C; daily AT outside 6-32°C; hourly AT outside 15-33°C within past two weeks | |
Sea surface temperature (SST) | SST set to missing if > 33.0°C or < -9.0°C. | SST changes > 5°C from previous day; SST - T at 20 m or 25 m > 0.2°C; hourly SST outside 20-30°C within past two weeks | Visual inspection of 5-day running mean plot of SST vs wind vectors |
Subsurface temperature (T) | T set to missing if > 99.99°C or < -9.0°C. | T changes > 5°C from previous day; vertical gradient between adjacent sensors checked. T conforms to the climatological values for the current quarter (+/- 3 s.d. of 90-day mean) | Visual inspection of T profiles |
Rainfall | Rate > 10 mm hr-1 | Sensor output full scale; daily rainfall rate outside -0.1 to 10 mm hr-1; daily rain rate > 1.0 mm hr-1 for < 5% time raining; daily rain rate < 0.1 mm hr-1 for > 25% time raining | |
Shortwave radiation (SWR) | Set to missing if > 1400 W m-2. If any SWR value (mean, standard deviation, maximum) reads 0, all are set to missing for that day. | Sensor output zero or full scale; daily radiation outside 50-325 W m-2; max radiation exceeds 1350 W m-2 | Visual inspection and comparison with time series plots from neighboring sites. |
Longwave radiation (LWR) | Daily radiation outside 350-500 W m-2; daily radiation changes > 50 W m-2 from previous day; case thermistor and dome thermistor vary by more than 1°C | ||
Barometric pressure | BP changes > 5 mb from previous day; daily BP outside 990-1018 mb; hourly BP outside 990-1018 mb in past two weeks | Visual inspection and comparison with time series plots from neighboring sites. | |
Salinity | Computed only for conductivity in range 30.0-70.0 mS cm-1 and T > 0.0°C | Salinity changes by > 0.5 psu; salinity outside 31.0-36.5 psu; density inversions computed from daily averaged salinities and temperatures > 0.05 kg m-3; salinity conforms to the climatological values for the current quarter (+/- 3 s.d. of 90-day mean) | |
Current direction | If daily average is based on < 50% of daily samples, the output is set to undersampled flag (no data reported) | Direction varies more than 90° from previous day; no change in direction | |
Current velocity | If daily average is based on < 50% of daily samples, the output is set to undersampled flag (no data reported) | Speed change greater than 50 cm s-1 from previous day; no change in speed from previous day | |
Position | Data from moorings which have drifted more than 1 degree of latitude or 5 degrees of longitude are excluded from the database. | Buoy position changes from deployment position by > 6 nautical miles | |
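The two-stage daily check can be illustrated with a minimal sketch, here for air temperature only and using the limits listed in the table above. The function name, the return format, and the alert strings are hypothetical and are not taken from the operational software.

```python
import math

# Limits for air temperature (AT), copied from the table above.
GROSS_MIN, GROSS_MAX = -9.0, 33.0      # outside: value set to missing
ALERT_MIN, ALERT_MAX = 6.0, 32.0       # daily mean outside: error alert
MAX_DAILY_CHANGE = 5.0                 # change from previous day: error alert


def check_daily_air_temperature(today, previous_day=None):
    """Return (value, alerts) for one daily mean air temperature in deg C.

    A value failing the gross check is replaced by NaN (missing).
    A value failing a narrower check is kept, and a message is added to
    `alerts` so that an analyst can review it.
    """
    if today > GROSS_MAX or today < GROSS_MIN:
        return math.nan, []
    alerts = []
    if not ALERT_MIN <= today <= ALERT_MAX:
        alerts.append("daily AT outside 6-32 C")
    if previous_day is not None and abs(today - previous_day) > MAX_DAILY_CHANGE:
        alerts.append("daily AT changed > 5 C from previous day")
    return today, alerts
```

Note that an alert does not remove the value: as described above, suspect data are passed to an analyst rather than deleted automatically.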
Weekly real-time quality control
Every week, the National Centers for Environmental Prediction (NCEP) compiles statistics of GTMBA data transmitted via the GTS and compares these statistics to numerical weather prediction Medium Range Forecast (MRF) model output. Weekly mean and RMS differences of daily averaged TAO and NCEP 10 m winds are computed. Daily averaged NCEP winds in these computations are based on four 6-hourly forecasts at 00z, 06z, 12z, and 18z. Weekly means and standard deviations for GTMBA air temperatures and sea surface temperatures are also computed. Based on these statistics, NCEP reports the number of suspect observations for wind, air temperature, and sea surface temperature according to the criteria listed in the table below.
Weekly means of most variables are also compiled at PMEL and compared to COADS climatology. Conditions which generate error alerts are listed below. Anomalies are investigated by trained personnel and flagged only if there is a high probability that the data are bad.
Measurement | PMEL checks | NCEP checks |
Wind direction | Direction differs from climatology by > 30° | |
Wind vector components (U/V) | Mean and standard deviation of MRF output and TAO winds; RMS difference of MRF and TAO winds | |
Wind speed | Weekly average vs climatology | |
Relative humidity (RH) | Weekly average < 40% | |
Air temperature (AT) | Mean and standard deviation TAO AT; AT < 15.0 or > 35.0 | |
Sea surface temperature (SST) | Weekly average different from climatology by > 2°C | Mean and standard deviation TAO SST; SST < 15.0 or > 35.0 |
Subsurface temperature (T) | 20°C isotherm differs from climatology by > 25m | |
Rainfall | Mean daily rainfall rate and standard deviation; number of points since deployment where % time raining is > 30%; number of points where rain rate > 4 mm hr-1 | |
Shortwave radiation | Mean daily radiation and standard deviation; number of points since deployment where maximum daily radiation > 1350 W m-2; number of points where average daily radiation > 650 W m-2; number of points where average radiation < 50 W m-2 | |
Barometric pressure | | |
Salinity | | |
Position | | |
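The weekly wind comparison described above (mean and RMS differences of daily averaged buoy and model winds) can be sketched as follows. This is an illustration only: the function names are hypothetical, the sign convention (buoy minus model) is an assumption, and skipping days with missing values is an assumed way of handling gaps.

```python
import numpy as np


def weekly_wind_statistics(buoy_daily, model_daily):
    """Mean difference and RMS difference of paired daily mean winds.

    Both inputs are sequences of daily mean values (m/s) for one wind
    component over one week. Days where either value is NaN are skipped.
    """
    buoy = np.asarray(buoy_daily, dtype=float)
    model = np.asarray(model_daily, dtype=float)
    valid = ~(np.isnan(buoy) | np.isnan(model))
    if not valid.any():
        return np.nan, np.nan
    diff = buoy[valid] - model[valid]
    return diff.mean(), np.sqrt(np.mean(diff ** 2))


def daily_mean_from_forecasts(values_00z_06z_12z_18z):
    """Daily mean model wind from the four 6-hourly values."""
    return float(np.mean(values_00z_06z_12z_18z))
```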
Monthly real-time quality control
Daily averaged data are plotted by site for the most recent 12 months, and continuity between deployments is checked. Plots of daily mean data are also compared to COADS climatology.
General
Raw data recovered from the internal memory are first processed using computer programs that apply pre-deployment calibrations and generate time series in engineering units. These programs also search for missing data and perform gross error checks for data that fall outside physically realistic ranges. A computer log of potential data problems is automatically generated as a result of these procedures.
Next, time series plots, spectral plots, and histograms are generated for all data. Plots of differences between adjacent subsurface temperature measurements are also generated. Statistics, including the mean, median, standard deviation, variance, minimum and maximum are calculated for each time series.
Individual time series and statistical summaries are examined by trained analysts. Data that have passed gross error checks but which are unusual relative to neighboring data in the time series, and/or which are statistical outliers, are examined on a case-by-case basis. Mooring deployment and recovery logs are searched for corroborating information such as problems with battery failures, vandalism, damaged sensors, or incorrect clocks. Consistency with other variables is also checked. Data points that are ultimately judged to be erroneous are then flagged.
For some variables, additional postprocessing after recovery is required to ensure maximum quality. These variable-specific procedures are described below.
Rain Rate
Rainfall data are collected using an RM Young rain gauge, and recorded internally at a 1-min sample rate. The RM Young rain gauge consists of a 500 ml catchment cylinder which, when full, empties automatically via a siphon tube. Data from a 3-min period centered near siphon events are ignored. Occasional random spikes, which typically occur during periods of rapid rain accumulation, or immediately preceding or following siphon events, are eliminated manually.
Rain rates computed from first differences of 1-min accumulations are often noisy because of the sensitivity of rate calculations to noise in accumulations over short time scales. To reduce this noise, 1-min accumulations are filtered with a 16-point Hanning filter, and rates are computed at 10-min intervals. Residual noise in the filtered time series may include occasional spurious negative rain rates, but these rarely exceed a few mm hr-1. Serra et al. (2001) [1] estimate the overall accuracy of 10-min data to be 0.3 mm hr-1 on average.
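A minimal sketch of the filter-then-difference step is shown below. It assumes that siphon events and spikes have already been removed, so that the input is a continuous accumulation series. The exact filter weights used operationally are not given here; the sketch assumes the 16 non-zero points of a Hanning window as computed by NumPy, and the function name is hypothetical.

```python
import numpy as np


def rain_rate_10min(accum_mm, n_filter=16, step=10):
    """Rain rate (mm/hr) at 10-min intervals from 1-min accumulations (mm).

    The accumulation series is smoothed with a normalized Hanning window
    and then differenced over `step` minutes. Siphon events and spikes are
    assumed to have been removed already.
    """
    accum = np.asarray(accum_mm, dtype=float)
    # np.hanning(n + 2) has zero end points; dropping them leaves n
    # non-zero weights. This choice of weights is an assumption.
    weights = np.hanning(n_filter + 2)[1:-1]
    weights /= weights.sum()
    smoothed = np.convolve(accum, weights, mode="valid")
    # Difference of smoothed accumulation over `step` minutes, in mm/hr.
    increments = smoothed[step::step] - smoothed[:-step:step]
    return increments * (60.0 / step)
```

Because the window is symmetric and normalized, a steady accumulation passes through unchanged: 0.1 mm per minute yields 6 mm hr-1 at every output point.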
Subsurface Pressure (and other measurements)
The majority of ATLAS moorings are taut-line moorings. Therefore, vertical excursions of the mooring line are small in most situations, and subsurface instruments do not deviate far from their nominal measurement depths. Vertical excursions of the mooring line are detected by pressure sensors typically placed at depths of 300 m and 500 m, where the largest line variations typically occur (McCarty et al. (1997) [2]). Large, short-duration, upward spikes in subsurface pressure data are occasionally observed. These spikes usually indicate either purposeful or accidental interaction between fishermen and the moorings. Each spike, and its effects on the subsurface data, is individually evaluated. Data from all subsurface sensors are flagged when pressure excursions exceed the range expected for normal variability.
Salinity
Salinity values are calculated from measured conductivity and temperature data using the method of Fofonoff and Millard (1983) [3]. Surface salinity records are plotted and examined for periods of spiky data caused by response time differences between conductivity and temperature sensors. The identified spiky periods are flagged. Conductivity values from all depths are adjusted for sensor calibration drift by linearly interpolating over time between values calculated from the pre-deployment calibration coefficients and those derived from the post-deployment calibration coefficients.
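The drift adjustment can be sketched as a time-weighted blend of the two calibrated series. In this sketch the weight runs from 0 at the first sample to 1 at the last sample of the deployment; pinning the weights to the sample times rather than to the calibration dates is an assumption, and the function name is hypothetical.

```python
import numpy as np


def drift_corrected(values_pre, values_post, times):
    """Blend pre- and post-calibration values linearly over a deployment.

    values_pre  : series computed with pre-deployment coefficients
    values_post : same samples computed with post-deployment coefficients
    times       : sample times (any numeric unit), increasing

    The weight on the post-deployment values grows from 0 at the first
    sample to 1 at the last sample.
    """
    pre = np.asarray(values_pre, dtype=float)
    post = np.asarray(values_post, dtype=float)
    t = np.asarray(times, dtype=float)
    weight = (t - t[0]) / (t[-1] - t[0])
    return (1.0 - weight) * pre + weight * post
```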
A thirteen-point Hanning filter is applied to the high-resolution (ten-minute interval) conductivity and temperature data. A filtered value is calculated at any point for which seven of the thirteen input points are available. The missing points are handled by dropping their weights from the calculation, rather than by adjusting the length of the filter. Salinity values are recalculated from the filtered data and subsampled to hourly intervals.
The drift-corrected salinities are checked for continuity across deployments. In addition, for those deployments which had multiple depths instrumented with conductivity sensors, the records are compared to one another and checked for unusual density inversions indicating uncorrected drift of one or more instruments. If uncorrected drift is found, an attempt is made to identify the sensor at fault and adjust its data based on differences with data from adjacent depths during unstratified conditions. The procedures used to identify and adjust problematic data are similar to those described in Freitag et al. (1999) [4] and used to correct Seacat salinity data.
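The gap handling can be sketched as follows. Several details are assumptions rather than documented behavior: the window is taken as the 13 non-zero points of a NumPy Hanning window, "seven of the thirteen" is read as at least seven, dropped weights are compensated by renormalizing the remaining weights, and samples beyond either end of the series are treated as missing. The function name is hypothetical.

```python
import numpy as np


def hanning_filter_with_gaps(series, n_points=13, min_valid=7):
    """Smooth a series containing NaN gaps with a centered Hanning window."""
    x = np.asarray(series, dtype=float)
    half = n_points // 2
    # n_points non-zero Hanning weights (assumed form of the window).
    weights = np.hanning(n_points + 2)[1:-1]
    # Pad with NaN so that points beyond either end count as missing.
    padded = np.concatenate([np.full(half, np.nan), x, np.full(half, np.nan)])
    out = np.full(x.shape, np.nan)
    for i in range(x.size):
        window = padded[i:i + n_points]
        valid = ~np.isnan(window)
        if valid.sum() >= min_valid:
            # Missing points drop out; remaining weights are renormalized.
            out[i] = np.sum(weights[valid] * window[valid]) / weights[valid].sum()
    return out
```

The window length stays fixed at thirteen points throughout; only the weights of missing samples are removed.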
Delayed mode daily salinity and density values are calculated by taking the mean of the available hourly values for the day. If there are fewer than 12 hourly values available, a daily mean value is not computed.
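The daily averaging rule amounts to the short sketch below, in which missing hourly values are represented by NaN (an assumption about the data representation) and the function name is hypothetical.

```python
import numpy as np


def daily_mean(hourly_values, min_count=12):
    """Mean of the available hourly values, or NaN if fewer than min_count."""
    hourly = np.asarray(hourly_values, dtype=float)
    available = hourly[~np.isnan(hourly)]
    if available.size < min_count:
        return np.nan
    return float(available.mean())
```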
[1] Serra, Y.L., P. A'Hearn, H.P. Freitag, and M.J. McPhaden, 2001: ATLAS self-siphoning rain gauge error estimates. J. Atmos. Ocean. Tech., in press.
[2] McCarty, M.E., L.J. Mangum, and M.J. McPhaden, 1997: Temperature errors in TAO data induced by mooring motion. NOAA Tech. Memo. ERL PMEL-108, Pacific Marine Environmental Laboratory, Seattle, WA, 68 pp.
[3] Fofonoff, P., and R.C. Millard Jr., 1983: Algorithms for computation of fundamental properties of seawater. Unesco Tech. Pap. Mar. Sci., 44, 53 pp.
[4] Freitag, H.P., M.E. McCarty, C. Nosse, R. Lukas, M.J. McPhaden, and M.F. Cronin, 1999: COARE Seacat data: Calibrations and quality control procedures. NOAA Tech. Memo. ERL PMEL-115, 89 pp.
Subsurface moored Acoustic Doppler Current Profiler (ADCP) data
Velocity profiles are obtained from upward looking Acoustic Doppler Current Profilers (ADCPs) deployed on subsurface moorings at nominal depths of 250 m to 300 m below the sea surface. The narrowband RD Instruments ADCPs have a 20 degree transducer orientation and are set to collect data with 8.68 m nominal bin and pulse lengths. The instruments collect data at a 3 second sample rate and form averages over 15 minutes beginning at the top of the hour.
Velocity data are processed and quality controlled at PMEL after the mooring is recovered and the data retrieved from the instrument's memory. The ADCP velocity measurements assume a constant sound speed of 1536 m s-1 at the transducer. In situ hourly temperature and average salinity measurements are used to adjust the velocities for sound speed variations. The nominal ADCP bin widths, which assume a constant sound speed with depth of 1475.1 m s-1, are adjusted using historical hydrographic sound speed profiles.
The actual depth of the ADCP transducer head is variable in time, as the mooring reacts to variations in ocean currents beneath the instrument. Therefore, velocity profiles need to be adjusted for head depth. The transducer head depth is computed using two independent methods. In the first, the hourly target strength for each beam and each depth bin is computed from the echo intensities. The sea surface appears as a maximum target strength for most (>80%) hourly profiles. A polynomial is fit to the target strengths of the three bins closest to the surface. The position of the maximum target strength with respect to the ADCP transducer is then used as the depth of the instrument for each hourly profile. The second method estimates the head depth from pressure time series recorded by duplicate pressure sensors mounted near the ADCP transducer. Estimates of head depth from the maximum target strength and the pressure sensors typically agree to within +/- 2 m, less than half of the ADCP bin width. The computed transducer head depth and the bin widths (nominal bin widths which have been adjusted for sound velocity) are used to compute the bin depths for the hourly ADCP velocity data.
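The first method can be illustrated with a three-point fit. The degree of the polynomial is not stated above; this sketch assumes a parabola through the three bins and takes the location of its vertex as the range to the surface. The function name is hypothetical.

```python
import numpy as np


def surface_range_from_peak(bin_ranges_m, target_strength):
    """Range (m) from the transducer to the target-strength maximum.

    A parabola is fit through three bins bracketing the surface echo, and
    the location of its vertex is returned. Returns NaN if the fit has no
    maximum (parabola not concave down).
    """
    r = np.asarray(bin_ranges_m, dtype=float)
    s = np.asarray(target_strength, dtype=float)
    # Center the ranges before fitting to keep the fit well conditioned.
    r0 = r.mean()
    a, b, _ = np.polyfit(r - r0, s, 2)
    if a >= 0:
        return np.nan
    return r0 - b / (2.0 * a)
```

Fitting a curve rather than picking the strongest bin lets the estimate fall between bin centers, which matters when the bins are several meters wide.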
Near surface velocity measurements may be in error due to strong reflections from the surface that overcome the sidelobe suppression of the transducer. Hourly data are flagged as bad if the bin depth (the center of the velocity bin) is closer to the surface than
D*(1-cos(theta)) + bin width
where D is the transducer depth, theta is the angle of the transducer beam relative to vertical, and the bin width has been adjusted for sound velocity. Velocities from the remaining depth bins are then interpolated to standard depths at 5 meter intervals. Velocity time series at the shallowest five standard depths are plotted to visually verify that no contamination from surface reflections appears in the data.
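As a worked illustration of the criterion above, the sketch below evaluates D*(1-cos(theta)) + bin width and flags bins shallower than that limit. The default beam angle and bin width are the nominal values quoted earlier (20 degrees, 8.68 m); in practice the bin width would be the sound-speed-adjusted value. The function names are hypothetical.

```python
import math


def surface_contamination_limit(transducer_depth_m, beam_angle_deg, bin_width_m):
    """Shallowest acceptable bin-center depth (m): D*(1 - cos(theta)) + bin width."""
    theta = math.radians(beam_angle_deg)
    return transducer_depth_m * (1.0 - math.cos(theta)) + bin_width_m


def flag_near_surface_bins(bin_depths_m, transducer_depth_m,
                           beam_angle_deg=20.0, bin_width_m=8.68):
    """Return a list of booleans, True where a bin is flagged as bad."""
    limit = surface_contamination_limit(transducer_depth_m, beam_angle_deg, bin_width_m)
    return [depth < limit for depth in bin_depths_m]
```

For a transducer at 280 m, the limit is about 280 x 0.0603 + 8.68 = 25.6 m, so bins centered at 10 m and 25 m would be flagged while bins at 30 m and deeper would be kept.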
The ADCP velocities are also compared with coincident point velocity measurements when available on nearby surface moorings. ADCP and point velocity measurements generally agree to within 5 cm s-1, and no velocity adjustments to the ADCPs have yet been made based on these comparisons. ADCP directions are also checked against available point velocity measurements. Average direction differences greater than 5 degrees are evaluated and adjustments made to the ADCP time series if necessary.
ADCP data are carefully reviewed when no velocity data from other nearby instruments are available for comparison. For equatorial sites, contour plots of zonal and meridional velocities are checked to ensure that no obvious aliasing of zonal flow appears in the meridional velocities, which could indicate the existence of compass error. Direction comparisons are also made with the preceding and following ADCP deployments at the same location. A depth range with minimal direction variance is selected. The average direction for these depths is computed for four time periods: the first and last two weeks of the deployment, the last two weeks of the preceding deployment, and the first two weeks of the following deployment. The average direction difference is calculated for both consecutive two-week pairs and used to adjust the deployment directions if necessary.
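Averaging and differencing directions requires care near the 0/360 degree wrap. The averaging method used operationally is not stated above; the sketch below assumes a circular (unit-vector) mean and a signed difference wrapped to within 180 degrees. Both function names are hypothetical.

```python
import math


def mean_direction(directions_deg):
    """Circular mean of directions in degrees, in the range 0-360."""
    s = sum(math.sin(math.radians(d)) for d in directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def direction_difference(a_deg, b_deg):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0
```

With this convention, directions of 5 and 355 degrees differ by 10 degrees rather than 350, so a comparison between consecutive two-week periods is not distorted by the wrap.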
Quality indices and sensor drift
Instrumentation recovered in working condition is returned to PMEL for post-deployment calibration before being reused on future deployments. After post-deployment calibrations are made, the resultant coefficients are compared to the pre-deployment coefficients. A set of output values is computed by applying the calibration equation with pre-deployment coefficients to a set of input values. Input values are chosen so that the output values range over normal environmental conditions. A second set of output values is generated by applying the calibration equation with post-deployment coefficients to the same set of input values. Sensor drift is calculated by subtracting the first set of output values from the second set of output values. The sensors are then assigned quality indices based on drift using the following criteria:
0 - Datum Missing.
1 - Highest Quality. Pre/post-deployment calibrations agree to within sensor specifications. In most cases, only pre-deployment calibrations have been applied.
2 - Default Quality. Default value for sensors presently deployed and for sensors which were either not recovered, not calibratable when recovered, or for which pre-deployment calibrations have been determined to be invalid. In most cases, only pre-deployment calibrations have been applied.
3 - Adjusted Data. Pre/post calibrations differ, or original data do not agree with other data sources (e.g., other in situ data or climatology), or original data are noisy. Data have been adjusted in an attempt to reduce the error.
4 - Lower Quality. Pre/post calibrations differ, or data do not agree with other data sources (e.g., other in situ data or climatology), or data are noisy. Data could not be confidently adjusted to correct for error.
5 - Sensor or Tube Failed.
When a recovered sensor meets the criteria for nominal drift, the quality index is changed from the default value of "2" to "1" for highest quality data. When it does not meet the criteria for sensor drift, the index becomes "4". If an adjustment based on post-deployment calibrations or other information is later made, the index may then be set to "3" or "1". When damage or loss of an instrument due to vandalism, harsh environmental conditions, electronics failures, or loss of a mooring prevents post-deployment calibration, a default quality of "2" is assigned to the data.
Nominal drift criteria:
Measurement | Drift criteria |
Air temperature | 0.4°C |
Relative humidity | 4% |
Wind velocity | 0.6 m s-1 or 6% |
Temperature | 0.02°C |
Salinity | 0.04 PSU |
Rainfall | 0.6 mm hr-1 |
Shortwave radiation | 2% |
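The drift calculation and the resulting choice between index 1 and index 4 can be sketched as follows. The form of the calibration equation differs by sensor and is not given here; this sketch assumes a polynomial purely for illustration. Comparing the largest absolute drift over the input range against the criterion is also an assumption, and the function names are hypothetical.

```python
import numpy as np


def sensor_drift(pre_coeffs, post_coeffs, inputs):
    """Post-minus-pre calibrated output for the same raw inputs.

    The calibration equation is assumed here to be a polynomial with
    coefficients ordered from highest power to lowest.
    """
    raw = np.asarray(inputs, dtype=float)
    return np.polyval(post_coeffs, raw) - np.polyval(pre_coeffs, raw)


def quality_index(drift, criterion):
    """1 (highest quality) if all drift is within the criterion, else 4."""
    return 1 if np.max(np.abs(drift)) <= criterion else 4
```

For example, a subsurface temperature sensor whose two calibrations differ by 0.01°C over the input range is within the 0.02°C criterion and would receive index 1, while a difference of 0.05°C would give index 4.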