Quantifying foraging resources available to waterfowl in different habitat types is important for estimating energetic carrying capacity. To accomplish this, most studies collect soil-core samples from the marsh substrate, sieve and sort food items, and extrapolate energy values to wetland or landscape scales. This is a costly and time-intensive process; furthermore, extrapolation methods yield energy estimates with large variances relative to the mean. From both research and management perspectives, it is important to understand sources of this variation and estimate the number of soil cores needed to reduce the variance to desired levels. Using 2,341 cores collected from freshwater and salt marsh habitats at four sites along the Atlantic Coast, we examined sampling variation and biological variation among sites and habitats. When we removed extreme outliers in the data caused by large animal food items found in a small core sample, estimates of energy density decreased by an order of magnitude for most habitats. After removing outliers, we found inconsistent geographical variation among habitat types that was especially pronounced in freshwater and no evidence for within-season temporal depletion of food resources for any site or habitat. We used a Monte Carlo simulation approach to estimate the optimal number of cores (minimizing both cost and estimated variance) sampled in each habitat type. Across most contexts, a reduction in the coefficient of variation reached diminishing returns near 40 core samples. We recommend that researchers explicitly address outliers in the data and managers acknowledge the imprecision that can arise from including or excluding outliers when estimating energy density at landscape scales. Our results suggest that collecting 40–50 cores per habitat type was sufficient to reduce the variance to acceptable levels while minimizing overall sampling costs.
Estuarine marsh systems provide critical food resources for migrating and wintering waterfowl along the Atlantic Coast. Quantifying food availability in coastal marsh habitats is important for building bioenergetics models such as those used to estimate the carrying capacity of overwintering habitat for waterfowl (Reinecke et al. 1989; Petrie et al. 2011; Williams et al. 2014). Managers have used a variety of methods to estimate waterfowl food abundance. For seeds, these include vegetation clips (Bowyer et al. 2005), seed yield models (Gray et al. 1999), surface sampling (Reinecke and Hartke 2005), and soil core sampling (Swanson 1983; Olmstead et al. 2013); for animal foods, they include dip netting (Voigts 1976), water column sampling (Anderson and Smith 2000), and core sampling (Sherfy et al. 2000). Core sampling is the most common of these methods and the most easily adapted to any habitat and food type.
Soil core sampling involves pushing a cylindrical coring device into the substrate, extracting the core, and then painstakingly washing, sieving, sorting, classifying, drying, and weighing the core material over the course of many months in the laboratory (Swanson 1983). Various methods for reducing the time and cost of core sampling have been validated and are gaining traction in the management community (Kross et al. 2008; Hagy et al. 2011; Stafford et al. 2011; Livolsi et al. 2014); nevertheless, core sampling remains a time- and cost-intensive method for quantifying waterfowl food availability. There are several additional drawbacks to using core sampling to estimate the energy value of waterfowl foraging habitats, the most important of which is that small amounts of food in relatively small cores are extrapolated to a landscape scale, which leads to extreme variability in between-core estimates. For example, given a true metabolizable energy value of 3.34 kcal/g for rice (Reinecke et al. 1989), finding a single 0.025-g rice grain in a 5.1-cm-diameter core increases the landscape-level estimate of kcal/ha by >400,000 kcal. Such potential variability brings up two practical issues of concern to waterfowl managers: 1) how to treat outliers in the data, given that sporadic, high-energy cores may capture potentially important variation on the landscape; and 2) how many cores are required to accurately quantify the among-core variation in landscape-level energy estimates.
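The rice-grain arithmetic above can be reproduced with a short calculation. The following is a minimal sketch in Python (the analyses in this study were run in R); the function name `kcal_per_ha` and its parameters are illustrative assumptions, and the only inputs are the grain mass, energy value, and core diameters stated in the text.

```python
import math

def kcal_per_ha(kcal_in_core: float, core_diameter_cm: float) -> float:
    """Scale the energy found in one core to a per-hectare value."""
    radius_m = core_diameter_cm / 100 / 2
    core_area_m2 = math.pi * radius_m ** 2
    cores_per_ha = 10_000 / core_area_m2  # 1 ha = 10,000 m^2
    return kcal_in_core * cores_per_ha

# One 0.025-g rice grain at 3.34 kcal/g (values quoted in the text).
grain_kcal = 0.025 * 3.34

small_core = kcal_per_ha(grain_kcal, 5.1)   # 5.1-cm corer
large_core = kcal_per_ha(grain_kcal, 10.2)  # 10.2-cm corer

print(f"5.1-cm core:  {small_core:,.0f} kcal/ha")
print(f"10.2-cm core: {large_core:,.0f} kcal/ha")
```

Because the extrapolation factor scales with the inverse of core area, the same grain found in a 10.2-cm core contributes one quarter as much to the per-hectare estimate as it does in a 5.1-cm core.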
We conducted a meta-analysis of 2,341 core samples collected from four past studies in salt marsh and freshwater habitats along the Atlantic Coast (Plattner et al. 2010; Cramer et al. 2012; and two unpublished datasets) to understand large-scale geographical variation in food energy potentially available to ducks and determine the number of core samples needed for precise estimates of energy density. Additionally, because we encountered outliers in the data, we further describe a broadly applicable statistical technique for keeping moderate outliers while removing extreme outliers.
Core samples were collected from four past research sites along the Atlantic Coast (New York, Virginia, and two sites in New Jersey) within habitat types that varied in salinity, tidal regime, and management. At three sites, sampling regimes were designed to assess food depletion over the winter by taking cores in September, January, and March (NY [Plattner et al. 2010], NJ [hereafter, NJ1; Cramer et al. 2012], and VA [unpublished]), whereas the other NJ research site (hereafter, NJ2; unpublished) sampled only midwinter food density in January. Samples were collected in freshwater impoundments and unmanaged salt marsh, and we further separated salt marsh into four categories: high marsh (irregularly flooded by tides), low marsh (regularly flooded), mudflat, and subtidal habitats. Cores were collected with a polyvinyl chloride (PVC) corer measuring either 5.1 (NJ1 and NJ2) or 10.2 cm in diameter (NY and VA), pushed into the substrate to 12.7 cm in depth; Behney et al. (2014) showed that the size of the core sample has only a weak effect on the precision of energy estimates. After collection, samples were fixed with 10% buffered formalin and dye and stored for later processing. Core samples were later washed with water to remove the formalin and passed through a series of sieves to separate large and small material; seeds and animal material were removed, dried, and weighed to the nearest 0.0001 g.
We estimated the amount of energy (in kcal) for seeds and animal foods found in each core using published true metabolizable energy values (Cramer 2009; Cramer et al. 2012). We summed seed and animal energy to obtain an estimate of total energy per unit area within each core, and extrapolated to obtain an estimate of kcal/ha for each habitat type, for each temporal period, for each study. We averaged across temporal periods (note: NJ2 had only one temporal period) to obtain site- and habitat-specific estimates of kcal/ha, and finally, we averaged across all sites to get a grand estimate of the energy value of the five habitat types. We report means and standard errors unless otherwise noted.
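The averaging hierarchy described above (cores within a period, periods within a site, sites within a habitat) can be sketched as follows. This is an illustrative Python example, not the code used for the analyses: the records in `CORES` are invented, the function names are hypothetical, and we assume each level is averaged without weighting by the number of cores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, one per core:
# (site, habitat, period, seed kcal/ha, animal kcal/ha).
# The numbers are invented and are not from the data sets analyzed here.
CORES = [
    ("NY", "mudflat", "early", 1000.0, 500.0),
    ("NY", "mudflat", "early", 2000.0, 500.0),
    ("NY", "mudflat", "late", 800.0, 200.0),
    ("NJ2", "mudflat", "mid", 2500.0, 500.0),
    ("NJ2", "mudflat", "mid", 3000.0, 1000.0),
]

def _group_means(pairs):
    """Average the values that share a key; pairs yields (key, value)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

def habitat_energy(cores):
    # 1) Sum seed and animal energy per core, then average within each
    #    site x habitat x period cell.
    cell = _group_means(((s, h, p), seed + animal) for s, h, p, seed, animal in cores)
    # 2) Average the period means within each site x habitat.
    site = _group_means(((s, h), m) for (s, h, _p), m in cell.items())
    # 3) Average the site means within each habitat (assumed unweighted).
    return _group_means((h, m) for (_s, h), m in site.items())

grand = habitat_energy(CORES)
pooled = mean(seed + animal for *_, seed, animal in CORES)
print(grand["mudflat"], pooled)
```

Under this assumption a site with a single sampling period, such as NJ2, carries the same weight as a site sampled three times, so the hierarchical mean generally differs from the mean of all cores pooled together.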
After calculating these values, we noted that our estimates seemed unusually high for all habitat types (e.g., 1,024,739 kcal/ha for mudflat), and standard errors were often 20–30% of the mean or more (e.g., subtidal: x̄ = 129,603 ± 53,519 kcal/ha). Upon closer examination, our data contained some extreme outliers: for example, one core sample from NY estimated the value of mudflat habitat to be nearly 92,000,000 kcal/ha (equivalent to 68 Twinkies®/m²). Where possible, we tried to diagnose these outliers by examining the raw data, and we determined that many arose because a single large food item, such as a mussel or fish, was found in the core. In many cases, these food items were too large to be eaten by dabbling ducks, or simply not a potential food resource (Cramer et al. 2012), and should therefore be discarded for a priori reasons. Unfortunately, these disparate data sets often did not consistently contain information on individual food items (rather, only estimates of kcal/core), so we were not able to systematically remove biological outliers. Therefore, we opted to remove outliers statistically. We only removed core samples that were outliers in animal food density (likely caused by a single large food item), and kept outliers in seed food density, which potentially captures important, biologically relevant variation in seed density on the landscape.
Core sample data are strongly right-skewed (Straub et al. 2012), so standard outlier-removal methods based on a normal distribution are not appropriate. We used an outlier-identification technique that is a generalization of the Stahel–Donoho measure of outlyingness that is not subject to assumptions of symmetrically or normally distributed data (Hubert and Van der Veeken 2008). We identified outlying core samples using the adjbox() function in the package “robustbase” (Rousseeuw et al. 2014) in R version 3.0.1 (R Core Team 2013), removed these cores from the data set, and recalculated our energy estimates.
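To illustrate how a skewness-adjusted rule differs from a standard boxplot rule, the sketch below computes adjusted-boxplot fences in plain Python. It is not the adjbox() implementation: the function names and data are hypothetical, the medcouple is computed naively and skips pairs tied at the median (a simplification that matters for core data with many zeros), quartiles come from `statistics.quantiles` rather than hinges, and the fence constants (1.5, -4, 3) are the values commonly given for the adjusted boxplot, which we assume match the package defaults.

```python
import math
from statistics import median, quantiles

def medcouple(values):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Simplification: pairs in which both points equal the median are skipped.
    """
    med = median(values)
    lower = [x for x in values if x <= med]
    upper = [x for x in values if x >= med]
    kernel = [
        ((hi - med) - (med - lo)) / (hi - lo)
        for lo in lower
        for hi in upper
        if hi != lo
    ]
    return median(kernel) if kernel else 0.0

def adjusted_fences(values, coef=1.5):
    """Skewness-adjusted fences; points outside them are flagged as outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:  # right-skewed: stretch the upper fence, shrink the lower
        return q1 - coef * math.exp(-4 * mc) * iqr, q3 + coef * math.exp(3 * mc) * iqr
    return q1 - coef * math.exp(-3 * mc) * iqr, q3 + coef * math.exp(4 * mc) * iqr

# Hypothetical animal-food energy per core (kcal); invented for illustration.
animal_kcal = [0.15, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 2.5, 45.0]

low_fence, high_fence = adjusted_fences(animal_kcal)
flagged = [x for x in animal_kcal if x < low_fence or x > high_fence]

# Standard (unadjusted) upper fence, for comparison.
q1, _, q3 = quantiles(animal_kcal, n=4, method="inclusive")
tukey_high = q3 + 1.5 * (q3 - q1)

print("adjusted upper fence:", round(high_fence, 2), "flagged:", flagged)
print("standard upper fence:", round(tukey_high, 2))
```

In this invented example the adjusted rule flags only the extreme value and keeps the moderately high core, whereas the standard fence would also flag the moderate one. For real data, an established implementation of the medcouple is preferable to this naive version.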
In a second analysis, we estimated how many core samples were necessary to increase the precision of our variance estimates to a threshold level. Commensurate with previous core sampling studies (Stafford et al. 2006), we used the coefficient of variation (“CV”; the standard error divided by the mean) as our measure of variance. Decisions regarding how many cores to collect are made on a site-by-site, habitat-specific basis, so we conducted our analysis at this scale. We used a random-draw Monte Carlo simulation approach: for each habitat within a site (pooled across temporal periods, where applicable), we randomly chose N points (where N ranged from 2 to the total number of cores collected) and calculated the estimated kcal/ha and associated CV. We then repeated this procedure 10,000 times for each N, and calculated the average CV for each level of N. When plotted, these formed an asymptotic curve, where increasing levels of N (more core samples) reduced the estimate of the CV with diminishing returns.
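The random-draw procedure can be sketched as below. This is an illustrative Python version, not the R code in Text S1: the function names are hypothetical, the core values are synthetic draws from a lognormal distribution chosen only to mimic right-skewed data, we assume cores are drawn without replacement, and we use 1,000 repetitions rather than 10,000 to keep the example fast.

```python
import math
import random
from statistics import mean, stdev

def cv(sample):
    """Coefficient of variation as defined above: standard error / mean."""
    m = mean(sample)
    if m == 0:
        return math.nan  # CV is undefined when the mean is zero
    return stdev(sample) / math.sqrt(len(sample)) / m

def mean_cv_curve(values, sizes, reps=1000, seed=1):
    """Average CV over `reps` random draws of n cores, for each n in `sizes`."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        # Assumption: draws are made without replacement.
        cvs = [cv(rng.sample(values, n)) for _ in range(reps)]
        cvs = [c for c in cvs if not math.isnan(c)]
        curve[n] = mean(cvs) if cvs else math.nan
    return curve

# Synthetic stand-in for one site x habitat: 200 right-skewed kcal/ha values.
data_rng = random.Random(42)
cores = [data_rng.lognormvariate(10, 1) for _ in range(200)]

curve = mean_cv_curve(cores, sizes=[5, 10, 20, 40, 80])
for n, value in curve.items():
    print(f"N = {n:3d}  mean CV = {value:.3f}")
```

Plotting the mean CV against N for a full range of sample sizes gives the kind of asymptotic curve described above.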
There are potentially several ways to identify the point of diminishing returns in core sampling. Odum and Kuenzler (1955) arbitrarily defined the start of the asymptote as the point where adding another datum results in a <1% change in the overall estimate. However, core sampling data are not smooth enough for this to reliably apply (i.e., the 1% cutoff point happens sporadically because of random noise in the data), and calculating running averages to overcome this issue can be biased because the researcher must arbitrarily define the length of the averaging window. Another method to calculate the point of diminishing returns is to identify the point of inflection of a fitted polynomial or other function. Unfortunately, we found that the most appropriate model to fit to the data varied dramatically among sites and habitats; fitting the same degree polynomial to different data sets did a poor job of consistently estimating the point of diminishing returns. Visual approximation is likely the best method for identifying the point of diminishing returns in highly variable data (Bond et al. 2001). Here, we visually estimated the point of diminishing returns (rounded to the nearest five cores) for each habitat at each site. We also calculated the projected CV for taking different numbers of core samples in different habitats (averaged across sites) to provide a “power table” for researchers conducting coastal core-sampling studies in the future. In the Supplementary Material, we have provided annotated R code (Text S1) and an example data file (Table S1) for running Monte Carlo simulations, calculating CVs, and plotting asymptotic curves.
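To show why a fixed percent-change rule is fragile with noisy curves, the sketch below applies a <1% rule to two hypothetical CV curves. It is an illustration in Python, not part of the supplementary R code: the function name is an assumption, the smooth curve is an idealized 1/sqrt(N) decline, and the "noisy" curve adds a deterministic ±2% wiggle as a stand-in for simulation noise.

```python
import math

def first_small_change(curve, threshold=0.01):
    """Return the first N whose estimate differs from the previous N's
    estimate by less than `threshold` (as a proportion), or None."""
    ns = sorted(curve)
    for prev, cur in zip(ns, ns[1:]):
        if abs(curve[cur] - curve[prev]) / abs(curve[prev]) < threshold:
            return cur
    return None

sizes = range(2, 201)

# Idealized curve: CV shrinks in proportion to 1 / sqrt(N).
smooth = {n: 1 / math.sqrt(n) for n in sizes}

# Same curve with an alternating +/-2% perturbation (hypothetical noise).
noisy = {n: (1 + 0.02 * (-1) ** n) / math.sqrt(n) for n in sizes}

print("smooth curve, rule fires at N =", first_small_change(smooth))
print("noisy curve,  rule fires at N =", first_small_change(noisy))
```

On the smooth curve the rule fires at N = 51, whereas the small wiggle triggers it at N = 12, well before the curve has flattened. This is the behavior that led us to prefer visual estimation for these data.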
Our first analysis considered all 2,341 cores; we estimated the average energy value for each habitat type, and values ranged from 129,603 kcal/ha for subtidal habitat up to 1,024,739 kcal/ha for mudflat (Table 1). Standard errors were large, especially for estimates of animal food density, where standard errors were 30–58% of the mean (Table 1). We next removed animal food outliers in each habitat type and recalculated energy estimates. Removing 129 outliers reduced our estimated energy values by an order of magnitude for all habitat types dominated by animal foods, and standard errors for total food density fell to 10–15% of the mean (Table 1). The full data set, with and without outliers, can be found in the Supplementary Material, Table S2.
Examination of the revised data set revealed some interesting geographical and temporal patterns. First, there were large among-site differences in seed energy (Figure 1a) in freshwater habitats that were substantial enough to drive overall differences in total energy (Figure 1c). Seed energy sources were similar among the four salt marsh habitats and among studies; mudflats and low marsh habitats were especially similar within sites (Figure 1a). Animal foods were notably absent in freshwater habitats across all sites, and NJ tended to have more animal foods across habitats than other sites (Figure 1b).
We expected to find seasonal depletion of food resources, particularly seeds, in coastal marsh habitats, and especially in impounded freshwater wetlands. Surprisingly, we found no evidence for depletion of food resources within habitat types over the wintering period; even in sites where foods appeared to decline across the season (e.g., NY Freshwater), standard errors overlapped substantially (Figure 2; Table S2).
We used a random-draw Monte Carlo approach to estimate the average number of cores that need to be taken before the variance begins to asymptote. We plotted the average CVs from our simulations and visually estimated the point of diminishing returns. In most simulations, the point of diminishing returns was obvious; however, for sites and habitats with a large number of data points (Figure 3a), we restricted our visual analysis to 60 cores to estimate the point of diminishing returns (Figure 3b) because, for most core sampling studies, collecting several hundred cores in a single habitat type is not logistically feasible.
We ran these Monte Carlo simulations for each study and in each habitat type, totaling 19 different simulations. For each simulation, we calculated the estimated CV for collecting 10–60 cores and averaged them across studies for each habitat type (Table 2). We also averaged our visually estimated point of diminishing returns across studies for each habitat type and rounded this average to the nearest five cores (Table 2). Across all sites and habitats, we estimated that 40 cores are needed to reduce the CV to a point of diminishing returns. The full results from the Monte Carlo simulation can be found in the Supplementary Material, Table S3.
Though not the original focus of this study, perhaps the most important part of our analysis was identifying the effect of removing outliers from the core sampling data. When we removed 129 animal food outliers (approx. 5% of the cores) from the data, our estimates of energy potentially available for foraging waterfowl dropped by an order of magnitude for most habitats. This statistical removal of outliers was our attempt to censor “biological” outliers: large pieces of animal biomass in core samples that may be avoided or simply too large for dabbling ducks to consume. Eliminating biological outliers based on a priori information about waterfowl diets is preferable to a post hoc statistical approach, though for many waterfowl species, empirical data are lacking. Cramer et al. (2012) were able to address this issue by removing animal matter too large for American black ducks Anas rubripes to consume; nevertheless, their energy values remain much closer to our uncensored estimates than to our outlier-removed estimates. Similarly, Plattner et al. (2010) and DiBona (2007) reported energy values much closer to our uncensored data.
This situation is difficult to reconcile: on one hand, animal matter can be an important source of energy for foraging waterfowl, especially black ducks (Costanzo and Malecki 1989), and so animal-rich cores should be included in landscape-level energy calculations. On the other hand, including nonfood items in models of carrying capacity can overestimate energy density by nearly 30% (Hagy and Kaminski 2012). Our point is that a small handful of outlying core samples can increase mean landscape-level energy estimates by an order of magnitude, resulting in potentially dramatic overestimates of how many waterfowl a given area can support. For this reason, Straub et al. (2012) advocate using the median instead of the mean as a point estimate of food energy available to ducks. Here and elsewhere (Livolsi et al., in review), we advocate for carrying capacity models that explicitly include measures of variance around point estimates (whether they are means or medians), and so again we return to the problem of outlier identification and removal; by including outliers, the large variances in estimates of food density limit biological interpretation of these models.
We recommend that: 1) identification of outliers should be done biologically when possible; 2) statistical removal of outliers may present a viable option when the biology is unknown; 3) the presence and impact of outliers should be highlighted in core sampling publications; and 4) habitat managers should note the variation (and its underlying source) around the point estimates of energy values when making decisions. We acknowledge the importance of sporadically distributed rich sources of animal foods, but argue that core sampling is not the best way to estimate the abundance of these sources; other sampling regimes may yield better (more accurate with less variance) estimates of the density of animal foods. For example, Turner and Trexler (1997) found that benthic corers did a poor job of sampling aquatic invertebrates, and recommended funnel traps, sweep nets, and stovepipes for reliable sampling. Tapp (2013) used both core sampling and sweep nets to sample invertebrates, and found significant differences in biomass estimates and phylogenetic richness between the two sampling methods. We recommend continued evaluation of how sampling regime affects estimates of animal biomass, and urge caution in relying on benthic cores to estimate animal foods.
After we statistically removed the outliers, we compared geographical differences and temporal trends in the food energy data. Most notably, we found striking differences in seed abundances in freshwater wetlands. At the New Jersey sites, the freshwater wetlands that were sampled are actively managed to promote the growth of moist-soil seeds, similar to other nontidal systems (Naylor 2002; Olmstead et al. 2013), which explains the rich density of seeds in those habitats. All salt marsh habitat types contained similar amounts of seed energy across sites, but there was more variation in animal foods. We found only weak evidence to support the idea that more northerly sites (NY) provide fewer food resources to dabbling ducks than southerly sites (VA). Finally, we noted that mudflat and low marsh (marsh that is regularly flooded with the tides) contained similar amounts of seed and animal foods within each site. Therefore, we recommend that these two habitats be combined in future studies to reduce sampling effort.
We also examined temporal trends in energy density from samples collected in the early, middle, and late parts of the season. We found no evidence for depletion in any of the four salt marsh habitats, presumably because the tidal nature of the system constantly refreshes the food supply. However, we were surprised to find no evidence for depletion, especially of seed resources, in impounded freshwater systems. Some impoundments were stream-fed throughout the season, and so perhaps provided a continual influx of seeds; however, this influx would likely taper off as all potential foods reach the impoundments, and factors such as decomposition and nonwaterfowl consumption should also lead to depletion of food resources. More research conducted at a finer temporal resolution is needed to resolve this paradox. In the meantime, we recommend that waterfowl bioenergetic carrying capacity models such as TRUEMET (Central Valley Joint Venture 2006) and SWAMP (Miller et al. 2014) consider accounting for continual renewal of food resources, especially in tidal habitats.
Hagy et al. (2014) suggest that waterfowl food densities are more variable at small spatial and temporal scales, but more predictable at larger scales—that is, the energy densities of particular habitats are consistent between sites and years. We found some support for this in coastal marshes along the Atlantic Coast. Salt marsh habitat types were relatively consistent across sites (and years, because not all sites were sampled in the same year), especially for seed resources, which is the focus of most carrying-capacity modeling. On the other hand, seed densities provided by freshwater habitats were more unpredictable, likely driven by active management practices (e.g., drawdown timing) and passive influences (salt-water intrusion, weather) that can dramatically influence seed yield (Olson 2011).
Our initial goal in undertaking this meta-analysis was to determine the optimal number of core samples that balances biological precision with the inherently costly and tedious nature of core sampling. Currently, sample size for core sampling studies is largely determined by financial and temporal constraints (Greer et al. 2009; Evans-Peters et al. 2012). Here we provide a quantitative estimate of how many samples are needed to reduce the variance to the desired level for important waterfowl management sites and habitats along the Atlantic Coast. Identifying the point of diminishing returns in an unbiased, purely mathematical way proved challenging for these highly variable core-sampling data; therefore, we estimated this point by examining the asymptotic graph and picking a reasonable tradeoff between the CV and the cost of collecting additional cores. In general, we found that 40–50 samples were enough to reach a point of diminishing returns across all habitat types (if outliers are removed). It is likewise important to note that this number may be an overestimate for more homogeneous habitat types, such as rice fields (Stafford et al. 2006), and we advocate repeating our simulation analysis on core sampling data from those habitats. Finally, although our results indicate that collecting 40 cores was sufficient to reduce the variance to acceptable levels, the true point of diminishing returns for some habitats was only reached by collecting hundreds of cores. Such patterns are only revealed in studies that collect these large numbers of cores, but we speculate that this phenomenon may be common in many habitat types. Regardless, the time and expense associated with reaching the true point of diminishing returns are likely prohibitive for most researchers.
In conclusion, our meta-analysis of core sampling data taken along the Atlantic Coast has highlighted the importance of understanding and appropriately dealing with outliers in the sampling data and suggested improvements to core sampling methods and habitat management decisions. With the outliers statistically removed, we showed more geographical variation in freshwater habitats than in salt marsh habitat, and no evidence for temporal depletion of food resources. Finally, we provided a useful analysis tool that will help researchers and managers determine appropriate soil-core sample sizes. With these new techniques and revised estimates of habitat energy values, waterfowl managers will be better able to establish and deliver habitat goals to support waterfowl during the nonbreeding period.
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.
Table S1. Example data used for conducting the Monte Carlo simulation analyses. These data are energy estimates derived from benthic core samples taken in 2004–2008 from coastal marsh habitats along the Atlantic Coast, USA. This file is designed to be used in conjunction with Supplementary Material Text S1.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S1 (128 KB XLSX).
Table S2. Data used for the meta-analysis of coastal marsh core samples collected between 2004 and 2008 from the Atlantic Coast, USA. This Excel file consists of multiple data sheets that contain raw data, data with statistical outliers removed, and summary tables of both.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S2 (2682 KB XLSX).
Table S3. Results from the Monte Carlo simulations that estimate the coefficient of variation associated with collecting specific numbers of coastal marsh core samples from different coastal marsh habitat types. Data are from benthic core samples collected from coastal marsh habitats along the Atlantic Coast, USA from 2004 to 2008.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S3 (24 KB XLSX).
Text S1. Text document containing R simulation code designed to provide researchers with a framework for conducting their own Monte Carlo simulations on core sampling data. R code was developed for analysis of benthic core samples collected from coastal marsh habitats along the Atlantic Coast, USA, from 2004 to 2008, but can be adapted for other sites, habitats, and years.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S4 (2 KB TXT).
Reference S1. Central Valley Joint Venture. 2006. Central Valley Joint Venture implementation plan—conserving bird habitat. Sacramento, California: U.S. Fish and Wildlife Service.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S5 (16.4 MB PDF).
Reference S2. Petrie MJ, Brasher MG, Soulliere GJ, Tirpak JM, Pool DB, Reker RR. 2011. Guidelines for establishing Joint Venture waterfowl population abundance objectives. North American Waterfowl Management Plan Science Support Team Technical Report No. 2011‐1.
Found at DOI: http://dx.doi.org/10.3996/072014-JFWM-051.S6 (315 KB PDF).
We thank D. Plattner, D. Cramer, B. Lewis, and M. Goldstein for collecting the core samples used in these analyses. We also thank David Haukos, Heath Hagy, and an anonymous reviewer for providing helpful feedback during the review process. Funding and logistical support was provided by the University of Delaware, the Black Duck and Atlantic Coast Joint Ventures, and Ducks Unlimited.
Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Citation: Ringelman KM, Williams CK, Coluccy JM. 2015. Assessing uncertainty in coastal marsh core sampling for waterfowl foods. Journal of Fish and Wildlife Management 6(1):238–246; e1944-687X. doi: 10.3996/072014-JFWM-051
The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.