ABSTRACT
Traditionally, the representativeness and sufficiency of data in environmental monitoring efforts are judged against an external standard, such as a pre-determined statistically-based survey design intended to achieve specified data quality objectives. However, given the nature of oil spill responses, where the primary focus is usually on finding the oil and documenting exposure related to the release, samples collected from oil spill studies rarely follow a statistically-based, pre-determined sampling design. Using water chemistry data from the Deepwater Horizon oil spill, we have developed statistical, observational, and forensic approaches to evaluate the representativeness and sufficiency of field-collected chemistry data to characterize exposure in the water column. Three complementary approaches were selected for evaluating water chemistry data. The first “Zone” approach evaluates properties of the data within defined spatial-temporal exposure zones. The second “Probability” approach examines the data independent of predetermined zones using three-dimensional (3D; i.e., latitude, longitude, depth) modeling methods (interpolation, contouring) to assess whether the field-collected water chemistry data alone provide sufficient information to model chemical exposure in the water column. The final “Design” approach compares the field-collected sample data to a theoretical sampling design that could have been developed at the beginning of the incident. Integral to all of these approaches is a pre-analysis screening that considers the original objective of each sample collection and the method of sample collection. Review of the chemical forensics of samples can provide further refinement. In this way, samples that were collected as part of a targeted effort based on visual or sensor observations can be considered in light of the context in which they were collected.
The results of this analysis can be used to inform future oil spill sample collections to provide sufficient and representative samples that meet the immediate needs of the response as well as longer-term damage assessment determinations.
INTRODUCTION
Characterizing the nature and extent of oil exposure to natural resources is essential to assessments of potential impacts following an oil spill (Boehm and Page 2007). Water samples collected during the oil spill response provide one foundation of this characterization for aquatic resources. They describe the chemical composition and concentration of oil constituents at the time and location where the samples were collected and resources may have been exposed. Both the quantity of independent observations (i.e., samples) and the quality of the data are keys to a successful assessment, but because of the typically brief duration of oil spills, the number of water samples collected is often limited. When a detailed characterization of the aquatic exposure field is necessary, three-dimensional fate and transport models are often used to estimate the spatial and temporal distribution of oil components in the water column (e.g., French-McCay 2004; Reed et al. 1995). These models, however, rely on robust water chemistry samples to calibrate and validate exposure predictions.
The Deepwater Horizon (DWH) oil spill produced the largest number of water chemistry samples ever collected during and after an oil spill, with more than 11,000 water samples collected during calendar year 2010 (Boehm et al. 2016). This large volume of data provided the opportunity to develop approaches to evaluate the sufficiency and representativeness of field-collected water chemistry samples to characterize exposure. Sufficiency considers whether the field-collected data are of required quantity in space and time to describe, with a reasonable degree of scientific certainty, the chemical exposure field(s) in the water column. Representativeness considers whether the data adequately represent the chemical exposure fields in time and space with reasonable bounds on uncertainty.
Traditionally, the representativeness and sufficiency of data in environmental monitoring efforts are judged against an external standard, such as a pre-determined statistically-based survey design guided by specific data quality objectives. In the case of an active oil spill, where there is neither the time nor the knowledge to construct a sampling design that meets every objective, a posteriori analysis of the statistical properties of the data themselves can inform evaluation of the sufficiency and representativeness of the data. Here we present statistical, observational, and forensic approaches to evaluate the sufficiency and representativeness of field-collected chemistry data. These approaches include an assessment that evaluates the properties of the data within defined spatial-temporal exposure zones (referred to as the “zone” approach); a modeling assessment that examines the probability of exceeding a specified hydrocarbon concentration using three-dimensional (3D; i.e., latitude, longitude, depth) modeling that incorporates interpolation, contouring, and resampling (referred to as the “probability” approach); and a data assessment that compares the field-collected sample data to a theoretical sampling design (referred to as the “design” approach).
METHODS
Data
The approaches presented here were developed using offshore water samples collected during the DWH response and natural resource damage assessment activities from May 5 through August 3, 2010 (the “spill period”) (Boehm et al., 2016) and analyzed for petroleum-related chemical constituents. Though the well was capped and oil ceased to flow into the ocean after July 15, August 3 was the last day with actionable oil reported on the sea surface (OSAT, 2010). These data are publicly available from the Gulf of Mexico Research Initiative website created by BP (BP Gulf Science Data, 2016). Over 5,350 water samples analyzed for polycyclic aromatic hydrocarbons (PAH), excluding field replicate and other quality assurance samples, were collected during the spill period from 1,048 unique locations (latitude and longitude). Sampling occurred on 83 of the 91 days between May 5 and August 3. We limited our analyses to these samples with measured concentrations of PAH collected during the spill period to focus on peak exposure concentrations that would be relevant to environmental impact assessment. Total PAH (TPAH) was calculated as the sum of the 42 PAH parameters most frequently measured in this dataset.
The question of representativeness implicitly entails an assessment of any bias in the data collection. The water chemistry data underlying these analyses represent samples collected by multiple studies with multiple objectives and, as a result, without an overarching, singular, pre-determined survey design. Thus, part of considering whether the data accurately represent the range and frequency of chemical exposures in the water column requires an assessment of how the data may be influenced by the various sampling strategies employed. Based on a review of the sampling plans and the data collected, sampling programs were designed based on four primary strategies: (1) systematic sampling that identified sampling locations and depths before any observations in the field; (2) adaptive depth sampling that utilized field observations of dissolved oxygen (DO) and/or fluorescence to sample subsurface anomalies in the water column; (3) adaptive location sampling that targeted locations for sampling based on field or other observations where oil was more likely to be present (frequently referred to as “plume chasing”); and (4) field testing of dispersant use that resulted in samples collected in, under, and nearby oil slicks. Systematic samples were collected at locations and depths determined before gaining knowledge in the field as described in the work plan for each study. The consequences of these different sampling designs on the data collected are explicitly considered as part of the zone approach.
Statistical Analyses
Zone Approach
The zone approach evaluates properties of the data within selected defined spatial-temporal exposure zones. Focal points of these analyses include the overall distribution of chemical concentrations as well as characterization of the mean concentration, variability, and frequency of exceeding a specified concentration of TPAH (defined for purposes of this analysis as 0.3 μg/L).
To develop methods using water chemistry data collected in response to the DWH oil spill, we defined two exposure zones: a primary exposure zone (PEZ) within 18.5 km (10 nmi) of the wellhead, and an area outside the PEZ that included offshore samples more than 18.5 km from the wellhead but excluded nearshore samples located within 5.6 km (3 nmi) of shore. The depths identified for analysis included surface (0–10 m), mid-depth (10–200 m), and deep waters (1,000–1,200 m). These depth intervals reflected the depth horizons where oiling was greatest (surface and deep waters) and the depth horizons with the greatest biological abundance and, therefore, potential for exposure (surface and mid-depth).
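As an illustration only, the zone and depth definitions above can be expressed as a simple lookup. The function names and the handling of interval boundaries (a sample at exactly 10 m is assigned to mid-depth) are assumptions made for this sketch, and the distances to the wellhead and to shore are assumed to have been computed beforehand.

```python
PEZ_RADIUS_KM = 18.5   # 10 nmi, as stated in the text
NEARSHORE_KM = 5.6     # 3 nmi, as stated in the text

# Depth intervals from the text as (label, upper bound m, lower bound m).
# Treating them as half-open [min, max) is an assumption of this sketch.
DEPTH_BINS = [
    ("surface", 0.0, 10.0),
    ("mid-depth", 10.0, 200.0),
    ("deep", 1000.0, 1200.0),
]


def classify_zone(dist_wellhead_km, dist_shore_km):
    """Return 'PEZ', 'outside PEZ', or None for excluded nearshore samples."""
    if dist_wellhead_km <= PEZ_RADIUS_KM:
        return "PEZ"
    if dist_shore_km <= NEARSHORE_KM:
        return None  # nearshore samples are excluded from the analysis
    return "outside PEZ"


def classify_depth(depth_m):
    """Return the depth-interval label, or None if outside the analyzed intervals."""
    for label, shallow, deep in DEPTH_BINS:
        if shallow <= depth_m < deep:
            return label
    return None


# Hypothetical sample: 12 km from the wellhead, 70 km from shore, 150 m deep
print(classify_zone(12.0, 70.0), classify_depth(150.0))
```

Samples from depths outside the three listed intervals (for example, 500 m) return None and would simply not enter the zone statistics in this sketch.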
Three separate analyses constitute the zone approach. The first analysis compares concentrations to test whether multiple sampling surveys with varying objectives and methods characterize similar concentrations within the same exposure zone using analysis of variance (ANOVA) with post-hoc multiple comparisons of the mean concentration from each study against the overall mean across all studies (Sokal and Rohlf, 1981). The second analysis to evaluate representativeness in the zone approach compares the frequency distribution of measured concentrations for all samples to the distribution for only samples collected from systematic sampling surveys to assess whether the concentration distribution for all samples derived from both systematic and adaptive sampling is similar to the distribution for systematically-collected samples only. The final analysis of the zone approach considers whether collecting more water samples would have substantially reduced uncertainty in the mean concentration of TPAH. Statistical resampling, or bootstrapping, methods were used to investigate the stability of estimates (i.e., mean and variance) compared to the number of samples and address the sufficiency of the data within the selected exposure zones. Individual random selections from the entire dataset (i.e., sampling with replacement) produced hypothetical groups of samples ranging in size from 1 to 500 (Davison and Hinkley, 1997). The mean and confidence interval derived from these groups of samples were then plotted to show how the exposure statistics (mean and confidence interval) change with increasing numbers of samples.
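The bootstrap analysis of sample size described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name, the percentile-based confidence interval, the number of bootstrap replicates, and the synthetic log-normal input values are all hypothetical.

```python
import random


def bootstrap_mean_ci(values, sizes, n_boot=1000, level=0.95, seed=0):
    """For each group size n, draw n values with replacement n_boot times.

    Returns a list of (n, mean of bootstrap means, lower bound, upper bound),
    where the bounds are percentile limits of the bootstrap means.
    """
    rng = random.Random(seed)
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - alpha) * n_boot))
    summary = []
    for n in sizes:
        means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
        summary.append((n, sum(means) / n_boot, means[lo_idx], means[hi_idx]))
    return summary


# Synthetic, log-normally distributed values for illustration only;
# these are NOT measurements from the DWH dataset.
demo_rng = random.Random(1)
demo_tpah = [demo_rng.lognormvariate(-1.0, 1.0) for _ in range(300)]

for n, mean, lo, hi in bootstrap_mean_ci(demo_tpah, sizes=[5, 20, 100, 500]):
    print(f"n={n:3d}  mean={mean:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Plotting the interval bounds against n gives the kind of curve summarized in the text: where the interval has become narrow and stable, additional samples would not be expected to reduce uncertainty in the mean by much.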
Probability Approach
The probability approach relies on a 3D model using spatial interpolation to explore the sufficiency of water chemistry data to estimate exposure above or below a specific concentration at a particular location. The approach uses a mathematical function (variogram) to describe the spatial relationships between measured TPAH concentrations in water samples. A probabilistic interpolation technique known as sequential Gaussian simulation is used to populate a 3D grid that overlays the area around the release site. The distribution of chemical concentrations at each grid node reflects the magnitude and variation of concentrations at that particular location but is not independent of other nearby locations. Because of this spatial dependence, the grid node concentrations within a single simulation are dependent on one another, but repeated model runs result in independent simulations of possible water chemistry concentrations over the complete 3D grid. The proportion of possible water chemistry concentrations above or below a selected concentration at each grid node provides a measure of the sufficiency of the field sample data to estimate exposure relative to a specified concentration. We assumed that when at least 70% of the concentration simulations at a given node resulted in estimates in the same direction relative to the specified TPAH concentration (i.e., consistently above or below), the available data at that node were “sufficient” to characterize the concentration of TPAH at that location in the water column as above or below the specified concentration. We were then able to connect the grid nodes that had sufficient information to define the grid cells, and thus 3D regions of the water column, where the measured data were sufficient to estimate exposure relative to the specified concentration with a reasonable level of scientific confidence.
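The node-level sufficiency rule can be illustrated with a short sketch. It assumes the simulated concentrations have already been produced by some geostatistical simulation and covers only the final classification step; the function name, the data layout (one list of node concentrations per simulation), and the toy values are hypothetical.

```python
def node_sufficiency(simulated, threshold, confidence=0.70):
    """Classify each grid node as 'above', 'below', or 'insufficient'.

    simulated  : list of simulations, each a list of concentrations per node
    threshold  : specified TPAH concentration (same units as the simulations)
    confidence : fraction of simulations that must agree in direction
    """
    n_sims = len(simulated)
    n_nodes = len(simulated[0])
    labels = []
    for j in range(n_nodes):
        frac_above = sum(1 for sim in simulated if sim[j] > threshold) / n_sims
        if frac_above >= confidence:
            labels.append("above")
        elif 1.0 - frac_above >= confidence:
            labels.append("below")
        else:
            labels.append("insufficient")
    return labels


# Toy input: 10 simulations of 3 nodes. Node 0 exceeds 0.3 in 8 of 10
# simulations, node 1 in 5 of 10, and node 2 in none.
toy = [[0.5 if i < 8 else 0.1, 0.5 if i < 5 else 0.1, 0.1] for i in range(10)]

print(node_sufficiency(toy, threshold=0.3))                   # ['above', 'insufficient', 'below']
print(node_sufficiency(toy, threshold=0.3, confidence=0.90))  # ['insufficient', 'insufficient', 'below']
```

Raising the agreement fraction from 0.70 to 0.90 reclassifies node 0 as insufficient, which shows in miniature how the estimated sufficient volume depends on the chosen confidence level.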
The 3D model for the DWH data was constructed for a 37 × 37 km square area centered on the Macondo wellhead extending to 1,600 m depth (total volume 2,195 km³). The shape (square grid) and extent (37 km) were selected to optimize the model run by balancing the complexity of the analysis and the computer-intensive nature of 3D simulation. For this analysis, variogram functions describing the relationship between concentration and distance were fit to the water chemistry samples by month for May, June, and July–August 3, 2010, using log-transformed concentrations. A separate variogram was also fit for the combined data collected during the May 5 through August 3 spill period. Sufficiency within the water column was quantified using the 3D model with specified concentrations of TPAH ranging from 0.1 μg/L to 1.0 μg/L in 0.1 μg/L increments.
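Before a variogram function can be fit, an empirical semivariogram is usually computed from the sample pairs. The sketch below uses the classical (Matheron) estimator on log-transformed concentrations. The text does not state which estimator or variogram model was used, so this choice is an assumption, as are the function name, the isotropic treatment of horizontal and vertical distance, and the toy data.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, conc, bin_width, n_bins):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    coords : list of (x, y, z) positions in consistent units (e.g., km)
    conc   : positive concentrations; they are log-transformed here, so
             zeros or non-detects must be handled before calling
    Returns (bin midpoint, gamma or None if the bin is empty, pair count).
    """
    logz = [math.log(c) for c in conc]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])  # straight-line 3D separation
        b = int(h // bin_width)
        if b < n_bins:
            sums[b] += (logz[i] - logz[j]) ** 2
            counts[b] += 1
    return [
        ((b + 0.5) * bin_width,
         sums[b] / (2 * counts[b]) if counts[b] else None,
         counts[b])
        for b in range(n_bins)
    ]


# Toy data: three collinear points whose log-concentrations are 0, 1, and 3
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_conc = [1.0, math.exp(1.0), math.exp(3.0)]
for mid, gamma, n_pairs in empirical_semivariogram(toy_coords, toy_conc, 1.5, 2):
    print(mid, gamma, n_pairs)
```

In practice, vertical and horizontal separations in the ocean are rarely comparable, so an anisotropic distance or separate directional variograms would likely be needed; the sketch ignores this for brevity.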
Design Approach
This approach compares the water sampling data to a theoretical stratified sampling design that considers the statistical objective of representativeness. The theoretical stratified sampling design used here was based on two geographic variables and one physical variable: distance from Macondo wellhead, angle or direction from wellhead, and water depth. The design was also constrained by the information available at the time the DWH oil spill began and does not benefit from information gained during the initial sampling period. Thus, assuming dominant ocean currents would carry oil in specific directions, the analysis area was partitioned, or stratified, by four directions (southeast, southwest, northeast, and northwest), eight distance rings around the wellhead, and five depth intervals.
Two case studies were considered as examples. Case 1 included locations up to 650 km from the wellhead and Case 2 represented a much smaller area up to 50 km from the wellhead. For each case, allocation of samples in the stratified design was limited to approximately the same number of total samples that were actually collected to allow direct comparison of exposure metrics (mean, variance). As a hypothetical assumption intended to focus samples in the area most oiled, the variability of concentrations within 10 km of the wellhead (within the first two geographic strata) was set to five times that of the other strata. This resulted in strata nearest the wellhead being allocated more samples than the proportional water volume of these strata. The final sampling designs include 160 strata for Case 1 (32 geographic strata by 5 depth strata) and 120 strata for Case 2 (24 geographic strata by 5 depth strata).
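One standard way to allocate samples so that more variable strata receive more than their volume-proportional share is Neyman allocation, in which each stratum's share is proportional to its size multiplied by its assumed standard deviation. The text does not name the allocation formula that was used, so its use here is an assumption, and the function name, stratum volumes, and sample total are hypothetical.

```python
from itertools import product


def neyman_allocation(volumes, sds, total_samples):
    """Allocate samples in proportion to (stratum volume x assumed SD).

    Rounding means the allocated total may differ slightly from total_samples.
    """
    weights = [v * s for v, s in zip(volumes, sds)]
    total_weight = sum(weights)
    return [round(total_samples * w / total_weight) for w in weights]


# Case 1 stratification as described: 4 directions x 8 rings x 5 depths
directions = ["southeast", "southwest", "northeast", "northwest"]
case1_strata = list(product(directions, range(8), range(5)))
print(len(case1_strata))  # 160

# Toy example with three strata. The first is assigned five times the
# variability of the others, mirroring the hypothetical assumption above.
toy_volumes = [1.0, 1.0, 2.0]
toy_sds = [5.0, 1.0, 1.0]
print(neyman_allocation(toy_volumes, toy_sds, 80))  # [50, 10, 20]
```

Allocation proportional to volume alone would give 20, 20, and 40 samples in the toy example, so the high-variability stratum receives well over its volume-based share, consistent with the behavior described for the strata nearest the wellhead.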
Once the number of samples per stratum was allocated, a bootstrap resampling analysis was used to simulate concentration estimates for the designed sampling effort. This analysis generated a large number of simulated groups of samples (of the size determined by the allocated design) by random selections from the measured concentrations. The resulting simulated data for each stratum were compared with the field-collected samples for that stratum in terms of the number of samples, mean concentration, and frequency of exceeding a TPAH concentration of 0.3 μg/L. Representativeness of the field sampling was evaluated by comparing the number of samples allocated by the sampling design against the actual number of field samples collected.
RESULTS AND DISCUSSION
Zone Approach
Results of analyses included in the zone approach are reported in “Statistical Report Cards” (SRCs). The figure at the top of the SRCs compares concentrations between the sampling studies and indicates the number of samples collected by each study on the right side of the figure. Significant differences between studies appear as a 95% confidence interval that lies entirely to the left or right of the vertical line at zero (zero indicating no difference from the grand mean). If no significant differences were identified, then boxplots characterize the distribution of concentrations for each study. The more similar the distributions (closer together or overlapping) are for any two studies, the more similar the concentrations are, and the less likely that differences in sampling plan objectives affected the representativeness of the data.
The middle figure of the SRCs summarizes the bootstrap analysis of sample size. If the number of field-collected samples falls within the range of sample sizes for which the bootstrap confidence intervals are narrow and relatively stable, then the field-collected data are considered sufficient to provide reliable estimates of exposure.
The graph at the bottom of the SRCs provides the frequency distribution of TPAH for all samples (both systematic and adaptive sampling) in comparison to the distribution for only systematic samples. A lower concentration distribution for the systematic samples alone indicates the full dataset oversampled high concentration areas, as suggested by the adaptive sampling objectives.
Our analysis of the water chemistry data associated with the DWH oil spill indicated no consistent bias created by the adaptive sampling objectives. Systematic surveys collected samples with concentrations both higher and lower than adaptive surveys. The distribution of all sample concentrations was not consistently higher than the distribution for only the systematic survey samples. Comparisons of the field data to bootstrapped confidence interval distributions indicated that time periods with fewer samples (e.g., May) had wider confidence intervals, and therefore greater uncertainty in the estimated mean TPAH concentration, than time periods with more samples. Field data were considered sufficient if a larger number of samples would not have substantially reduced uncertainty in the mean concentration. Figure 1 shows results from the primary exposure zone in June 2010 for water depths between 10 and 200 m. In contrast, Figure 2 shows results for the surface interval outside the PEZ in June 2010 with lower concentrations measured from the systematic sampling studies. When the dataset (May 5–August 3, 2010) is examined in aggregate, there are significant differences in the mean TPAH concentrations between studies; however, there is no systematic bias resulting from adaptive sampling compared to samples collected from fixed stations.
Probability Approach
Sufficiency of water chemistry samples was evaluated based on the ability to quantify concentrations with reasonable scientific confidence relative to a specified TPAH concentration. Using this approach, the volumes of water with sufficient and insufficient information to estimate concentrations relative to incremental specified TPAH values between 0.1 and 1.0 μg/L are tabulated in Table 1. The results demonstrate that 3D simulations of spatial dependency can be used to characterize exposure fields relative to specific values, which provides a greater understanding of concentrations within the water column throughout the modeled area surrounding the wellhead. This approach also highlights areas with more variable concentrations, areas with fewer samples, and areas with TPAH concentrations close to the specified value.
Reasonable scientific confidence was interpreted as having 70% of the estimated simulated concentrations at a grid node above (or below) the specified threshold. Sufficiency of the field data depends on the specified concentration as well as the assumed threshold for scientific confidence, such that a higher threshold (e.g., >90% of the estimated simulated concentrations at a grid node) would result in a lower estimate of sufficiency, while a lower threshold for scientific confidence would result in a higher estimate of sufficiency.
Application of the results of the 3D interpolation must consider the specific metric used to evaluate sufficiency in light of any intended application of the data. Although there may be insufficient information to quantify concentrations relative to a low specified concentration, the same dataset may contain sufficient information with respect to a higher specified value. Similarly, within a smaller region of interest the dataset may contain sufficient information to quantify concentrations relative to a specified concentration of interest. Therefore, results of 3D modeling of sufficiency must be considered in light of the spatial scale upon which the 3D model was built, the relevance of any particular specified value considered, the threshold for scientific confidence, and the application of water chemistry data.
Design Approach
Field sampling was considered sufficient if the confidence intervals for the mean TPAH concentration and for the frequency of exceeding the specified TPAH concentration, as estimated from the field-collected samples, were of similar width to (or narrower than) those obtained from the number of samples allocated by the hypothetical sampling design. Table 2 provides a comparison of the water chemistry data from the DWH oil spill collected in the southwest direction from the wellhead and the concentration estimates derived from the two sampling designs developed for Case 1 and Case 2 using the hypothetical sample allocation.
Based on our analysis of the DWH water chemistry data, the field-collected sampling, which included adaptive sampling, resulted in more samples collected in areas with more variable exposure conditions. The increased number of samples near the wellhead produced a more confident estimate of the mean with smaller variability than estimates derived from the number of samples allocated by the hypothetical design. Likewise, although fewer samples were collected distant from the wellhead than allocated by the hypothetical design, the less variable sample concentrations in these areas resulted in comparable mean and variance estimates for the field-collected samples compared to estimates from simulations of the designed number of samples. This result is particularly important given that, in general, the area near the source of an oil spill is frequently over-represented by data collection, and the areas distant from the source are typically under-represented despite the ultimate need to evaluate exposure conditions throughout the larger area.
CONCLUSIONS
Three different and complementary approaches were developed to consider the sufficiency and representativeness of water chemistry data following an oil spill. The zone approach relied on an understanding of oil fate and transport, supported by descriptive analyses of the measured water chemistry, to evaluate the representativeness and sufficiency of the water chemistry data collected within user-selected geospatial exposure zones. The probability approach created a 3D simulation of water chemistry concentrations, based on a model of spatial dependence derived from the field-collected samples, to quantify the volume of water with sufficient information to determine, with reasonable scientific confidence, whether concentrations were above or below a specified TPAH concentration. Lastly, the theoretical sampling design approach addressed the question of data representativeness and sufficiency by comparing an actual sampling effort with a hypothetical sampling strategy designed to achieve representative samples across a defined area by appropriately allocating samples into spatially-defined strata. Each analysis provides a different perspective on the representativeness and sufficiency of a water chemistry dataset and informs the application of these data to exposure assessment.