Abstract
The abundance of individuals in a population is a fundamental metric in basic and applied ecology, but sampling protocols yielding precise and unbiased estimates of abundance are often cost-prohibitive. Proxies of abundance are therefore common, but require calibration and validation. There are many ways to calibrate a proxy, and it is not obvious which will perform best. We used data from eight populations of Chinook salmon Oncorhynchus tshawytscha on the Oregon coast where multiple proxies of abundance were obtained contemporaneously with independent mark–recapture estimates. We combined multiple proxy values associated with a single level of abundance into a composite index and then calibrated the composite index to mark–recapture estimates using several different techniques. We tested our calibration methods using leave-one-out cross-validation and simulation. Our cross-validation analysis did not definitively identify a single best calibration technique for all populations, but we could identify consistently inferior approaches. The simulations suggested that incorporating the uncertainty associated with mark–recapture estimates into the calibration technique reduced precision and introduced bias when mark–recapture estimate uncertainty increased with point estimate values. Cross-validation techniques can be used to test multiple methods of calibrating multiple proxies to an estimate of abundance. Critical uncertainties with the application of calibrated proxies still exist, and cost-benefit analysis should be performed to help identify optimal monitoring designs.
Introduction
Population size is difficult to estimate because animal density varies spatiotemporally, and because many animals are difficult to detect when present (Thompson 1992; Lancia et al. 1994; Yoccoz et al. 2001). Spatiotemporal variation in abundance is troublesome for monitoring programs that employ random sampling designs because the sampling frame may include locations that are prohibitively expensive to access, or the logistics of sampling cannot meet assumptions of population closure. Furthermore, random sampling alone does not overcome the animal detection problem. Mark–recapture techniques explicitly address animal detection, but typically require multiple field visits to mark and then recapture individuals, and many individuals may need to be observed to fit models that relax unrealistic assumptions (Williams et al. 2002). Given the cost of obtaining precise and unbiased estimates of population size with known statistical properties, many monitoring programs instead collect proxy information. Proxies include observations of the focal species that are derived from nonrandom “convenience” sampling designs, as well as indirect or “index” observations of the focal species.
Convenience sampling has historically been the most common approach to sampling wildlife populations (Morrison et al. 2008:142). Rosenstock et al. (2002) determined that just 14% of 224 ornithological papers published between 1989 and 1998 employed random sampling designs. Instead of randomizing survey-site selection, sites are often selected because they are conveniently located near roads. For example, birds are sampled along roads in the North American Breeding Bird Survey (Peterjohn 1994; Bart et al. 1995). Roadside observations are also used in the North American Amphibian Monitoring Program (Weir and Mossman 2005), in spotlight surveys of white-tailed deer Odocoileus virginianus (Collier et al. 2007), and in camera trapping of more elusive mammals (Sollmann et al. 2013). Catch records from commercial fisheries are a form of convenience sampling when used as information on population abundance. Fisheries catch data published by the Food and Agriculture Organization of the United Nations, for example, have fueled debate about the use of convenience data in assessing global fisheries (Pauly et al. 2013).
Index sampling is the indirect observation of animals (Witmer 2005). Commonly observed indices of animal abundance include counts of tracks, burrows, scat, rubs, fur, and camera-trap photographs. Similar to convenience samples, there is an unknown relationship between the observation and the target population size. The inferential challenge in using any proxy is that a functional relationship between the proxy and true abundance must be assumed (Gibbs 2000; Pollock et al. 2002; Keeping and Pelletier 2014). The merits of this assumption underlie recent disagreement on dingo Canis dingo management (Hayward and Marlow 2014; Nimmo et al. 2015), as well as broader concern about inference in applied ecology (Hayward et al. 2015; Stephens et al. 2015). Indeed, discussion about the use and abuse of proxies is an enduring theme in ecological literature (Williams 1978; Edwards 1998; Anderson 2001; Johnson 2008; Gopalaswamy et al. 2015).
Given the ubiquity of proxy data, we focus here on methods of calibrating and validating proxies. If proxy information is obtained concurrently with unbiased estimates of the target population size, then a conversion from the former to the latter may yield a factor by which to infer population size when only proxies are available (Williams et al. 2002; Skalski et al. 2005; Gopalaswamy et al. 2015). Here, we address two issues that arise when relating multiple proxies to independent estimates of the target population size: 1) how to combine multiple proxies into a composite index (CI), and 2) how to statistically relate the CI to estimates of target population size.
Combining multiple proxy observations into a CI can be a useful modeling technique. For example, suppose a proxy metric of abundance is recorded at five different sites nested within the population when an unbiased mark–recapture estimate of the total target population size is made. Now repeat this at six different levels of the target population size. There are now just six “replicate” observations of total population size that could be regressed on five potential proxy metrics. This model would be overfit and confounded by multicollinearity in the proxy metrics (Graham 2003). Averaging the proxy metrics together is an intuitive solution, but if the proxy metrics correspond to different areas, then there are several ways to compute an average. Furthermore, if a given proxy does not covary with population size, then including it in an average-based CI may reduce the precision of inference about population size by including unrelated variation. Instead, proxy metrics could be standardized to zero mean and unit variance and then added together in a stepwise fashion until maximal correlation to the dependent variable (independent estimates of population size) is achieved (see Schoolmaster et al. 2012).
Several options exist when statistically associating a given CI to mark–recapture estimates of abundance. Regression, regression through the origin, and correction factors using a mean of ratios or a ratio of means are all technically feasible. Additionally, the estimated uncertainty of each mark–recapture estimate could be used as a weight during the fitting procedure (Skalski 1996). Here, we used cross-validation to test the predictive performance of the numerous combinations of methods for creating a CI and techniques to relate CI to population size.
Methods
Data
In the mid-1950s, the Oregon Department of Fish and Wildlife began monitoring spawning Chinook salmon Oncorhynchus tshawytscha at nonrandomly selected sites. Sites were intentionally selected because they reliably contained spawning Chinook salmon across years, and because the sites were easy to access. Surveyors walked each stream segment and visually enumerated all living and dead fish within the boundaries of the site. Sites averaged 1.4 km in length, although length varied among sites. Not all Chinook salmon that spawn in a given year will be present on the spawning grounds at the same time. Spawning Chinook salmon arrive on the spawning grounds throughout approximately a 2-mo period (Bottom et al. 2005), and may persist for ≥1 wk before dying (Parken et al. 2003). Surveyors therefore made repeated visits to sites throughout the spawning period. The maximum daily count of live and dead fish from these repeated visits to a site is known as the “peak count,” which is the only observation from a site-year considered hereafter. We chose to focus on peak counts instead of other summaries such as mean, sum, or area-under-the-curve (English et al. 1992; Holt and Cox 2008) because survey frequency varied greatly across years and populations. However, surveys were intentionally timed to capture the maximum number of living and dead fish based on subjective knowledge of the effects of environmental cues (e.g., tides, stream flow) on run timing. Although it is possible that more robust metrics than peak count could have been obtained from more frequent surveys of a given site throughout the spawning season, this was not done because the additional sampling was cost-prohibitive. An example of the peak count data from a single population is given in Figure 1.
The observed time series of peak (maximum) daily counts of Chinook salmon Oncorhynchus tshawytscha at eight survey sites (thick white lines) within a single population (Siuslaw drainage basin) on the coast of Oregon, USA. The location of the basin in relation to the coast is shown in black silhouette on the left. Thick grey lines denote stream reaches thought to contain spawning habitat. Peak counts from multiple surveys (conducted 1985–2010) on a given year were combined into a composite index and calibrated to mark–recapture estimates of total population abundance.
The Oregon Department of Fish and Wildlife also estimated total population abundance using mark–recapture studies. Adult Chinook salmon were captured while returning to coastal rivers at or near the head of tide using either tangle nets or traps and weirs. Fish were measured for length and age, given a mutilation mark (an operculum punch whose location varied over time) and then released. The initial capture and marking events occurred over a period of several weeks. Recapture of marked and unmarked fish occurred during extensive spatial sampling of carcasses on the spawning grounds throughout the season. All carcasses were inspected for the mutilation mark, and length and age were once again measured. The length and age data collected from both the marking and recapture periods were analyzed for violations of assumptions about equal catchability in both samples and complete mixing of tagged and untagged fish. These assumptions are difficult to meet in some salmon mark–recapture sampling designs (Schwarz and Taylor 1998). Abundance estimates were generated with two-sample closed-population Lincoln–Petersen estimators (Chapman 1951; Williams et al. 2002) or a stratified estimator (Chapman and Junge 1956; Schwarz and Taylor 1998). Where applicable, independent estimates of harvest based on random sampling of anglers upstream of the marking site were subtracted from the mark–recapture estimate to yield a final estimate of the total population size of adult fish on the spawning grounds. Uncertainty in the harvest can be computed, and so the uncertainty in the postharvest abundance on the spawning grounds can also be computed by leveraging the additive property of variances of two independent variables.
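The two-sample Chapman (1951) form of the Lincoln–Petersen estimator mentioned above can be sketched as follows. This is a minimal Python illustration (the study's analyses used MATLAB and R); the function name and input values are ours, and the variance expression is the standard Seber approximation for the Chapman estimator, not a formula reported in this paper:

```python
def chapman_estimate(n1, n2, m2):
    """Chapman (1951) two-sample closed-population abundance estimate.

    n1: fish marked and released at or near the head of tide
    n2: carcasses inspected on the spawning grounds
    m2: marked carcasses recovered
    Returns the abundance estimate and its (Seber) approximate variance.
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)
           / ((m2 + 1) ** 2 * (m2 + 2)))
    return n_hat, var

# Illustrative values only: 400 fish marked, 600 carcasses, 30 marked
n_hat, var = chapman_estimate(400, 600, 30)
```

Where an upstream harvest estimate applies, its point estimate would be subtracted from n_hat and, because the two estimates are independent, its variance added to var, as described above.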
The number of sites and mark–recapture estimates per population varied across populations (Table 1; Data A1, Archived Material). Some sites were not surveyed during years when mark–recapture estimates were generated (Table 1). These missing values would confound the techniques we evaluate subsequently, so we imputed missing values by model-averaging predictions from generalized linear models with Poisson error and log link function (McCullagh and Nelder 1983). For each missing value, we constructed a univariate regression of counts from the site with the missing value on counts from another site. Separate, individual regressions were constructed using all available sites within the population. The length of the time series for these regressions was generally from 1986 to 2013, but the Salmon River time series terminated in 2002 and the Elk River time series was from 1970 to 1980. We imputed the missing value with a model-averaged estimate:

ĉ = Σi wi ĉi,

where ĉi is the estimated count from the regression with site i and the sum is over the m site regressions. The weights wi are

wi = exp(−BICi/2) / Σj exp(−BICj/2),

where m is the total number of regressions performed and BIC is the Bayesian Information Criterion (Schwarz 1978) of a given regression, which is computed as

BIC = −2 Σy [cy ln(λy) − λy − ln(cy!)] + 2 ln(n),

where cy is the count on year y at a given site with a missing value. The 2 in the second term is the number of estimated parameters, which includes an intercept and slope, n is the sample size, or total number of years of counts, and λ is a parameter estimated as a linear function (intercept and slope) in the regression using maximum likelihood. The MATLAB code for computing this model-average imputation of missing counts is given in Code S1, Supplemental Material.
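A minimal Python sketch of this model-averaged imputation follows (the study's actual implementation is the MATLAB code in Code S1). The function names are ours, and the sketch assumes every predictor site was surveyed in the year being imputed:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def poisson_nll(beta, x, c):
    """Negative log-likelihood of a Poisson regression with log link."""
    eta = beta[0] + beta[1] * x  # linear predictor: log of expected count
    return -np.sum(c * eta - np.exp(eta) - gammaln(c + 1))

def impute_missing(counts, target, year):
    """BIC-weighted, model-averaged imputation of one missing peak count.

    counts: dict mapping site name -> array of counts (np.nan = missing)
    target: site with the missing value; year: index of the missing year
    """
    c_t = counts[target]
    preds, bics = [], []
    for site, c_i in counts.items():
        if site == target:
            continue
        ok = ~np.isnan(c_t) & ~np.isnan(c_i)  # years both sites were surveyed
        res = minimize(poisson_nll,
                       x0=[np.log(c_t[ok].mean() + 1.0), 0.0],
                       args=(c_i[ok], c_t[ok]), method="Nelder-Mead")
        bics.append(2.0 * res.fun + 2.0 * np.log(ok.sum()))  # 2 parameters
        preds.append(np.exp(res.x[0] + res.x[1] * c_i[year]))
    w = np.exp(-0.5 * (np.asarray(bics) - min(bics)))  # BIC model weights
    w /= w.sum()
    return float(np.sum(w * np.asarray(preds)))
```

Subtracting the minimum BIC before exponentiating does not change the weights but avoids numerical underflow.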
Number of years of data for mark–recapture abundance estimates conducted between 1985 and 2010 for eight Chinook salmon Oncorhynchus tshawytscha populations on the Oregon coast. An index of abundance was recorded at several survey sites within each population and year. Occasionally an index survey site was not surveyed concurrently with mark–recapture work, so missing values were imputed.

Constructing a composite index
The small number of mark–recapture estimates per population limits the number of estimable parameters in a model predicting population size from survey-site observations. Thus, a univariate synthesis of survey sites is desirable. Sites have different lengths over which fish are enumerated; therefore, a CI can be computed as a mean of ratios (MoR),

CIMoR = (1/I) Σi (Ci/Li),

or a ratio of means (RoM),

CIRoM = Σi Ci / Σi Li,

where Ci is the peak count at site i, Li is the length of the site, and I is the number of sites within the population. The ratio of means may not appear to involve means at all; note, however, that dividing both the numerator and denominator by I produces means, after which I cancels out. The CIMoR and CIRoM measure different things: CIMoR is the count density of an average index site, whereas CIRoM gives the count density per unit length of stream. The differences between the mean of ratios and the ratio of means have been studied theoretically (Cochran 1977; Rao 2002) and in applications (Larivière and Gingras 2011).
Higher count density surveys tend to have higher variation across years. Therefore, the variation in a time series of an average taken across surveys will be driven predominantly by the high count density surveys. If all the sites constitute a relatively large fraction of the habitat available to the spawning population, then it is reasonable to allow higher count density surveys to contribute more to the index than lower count density surveys. However, if the survey sites constitute a relatively small fraction of the available habitat, then variation in a low count density survey could contain as much information about variation in total population size as a high count density survey, and so a geometric mean may be more appropriate:

CIGeo = [Πi (Ci/Li)]^(1/I).
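The three composite indices defined above can be computed in a few lines. This Python sketch is illustrative only (the function name and example numbers are ours):

```python
import numpy as np

def composite_indices(counts, lengths):
    """Three composite indices from peak counts C_i over site lengths L_i."""
    c = np.asarray(counts, dtype=float)
    L = np.asarray(lengths, dtype=float)
    dens = c / L                            # per-site count density
    ci_mor = dens.mean()                    # mean of ratios: average site's density
    ci_rom = c.sum() / L.sum()              # ratio of means: density per unit length
    ci_geo = np.exp(np.mean(np.log(dens)))  # geometric mean of densities
    return ci_mor, ci_rom, ci_geo

# Example: three sites with different lengths
mor, rom, geo = composite_indices([30, 120, 10], [1.0, 2.0, 0.5])
```

Note that CIGeo is undefined when any peak count is zero, a limitation that resurfaces in the Results.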
Schoolmaster et al. (2012) describe a technique to construct a CI (hereafter, “CISM”) from multiple, unrelated metrics. Counts from different sites are not disparate metrics with different units, but the Schoolmaster et al. (2012) approach can nonetheless be applied to peak count data generated from multiple sites within a population. The CISM is constructed by standardizing all observations (within a survey) to zero mean and unit variance. Standardized values are then sequentially added together and a correlation to mark–recapture estimates is recorded. Schoolmaster et al. (2012) show that the addition of a metric (survey site observation) improves performance of the composite index depending on a nonlinear function of the signal response, the noise of the metric, and the correlation of the metric errors. The authors then describe a stepwise procedure for constructing the composite index that involves sequentially adding metrics (survey site observations) together until correlation of the composite index with the response no longer improves. We suspected that the stepwise algorithm would lead to overfitting (Whittingham et al. 2006), so we forced the algorithm to stop adding sites to CISM at iteration k, when the following condition was met:

ρk − ρk−1 < δ,

where ρ is the correlation of CISM to the mark–recapture estimates of abundance on all years except the cross-validation year. We evaluated values of δ from −0.2 to 0.2 in increments of 0.05. The original Schoolmaster et al. (2012) algorithm is obtained when δ = 0. When δ = 0 the algorithm stops adding sites as soon as the addition of a site would cause the correlation of CISM to the estimates of abundance to decline. Negative values of δ make it easier for the algorithm to keep adding sites into CISM. In contrast, positive values of δ cause the algorithm to stop adding sites sooner. We used the data withheld during the cross-validation step to assess whether predictive performance was affected by δ. We did this by recording the mean cross-validation error,

MCVE = (1/Y) Σy |N̂y − Ñy|,

where N̂y is the mark–recapture estimate of total abundance in year y withheld during the cross-validation iteration, and Ñy is the predicted abundance of the cross-validation year using a simple linear regression of all other N̂y on CISM. The R code used to implement the construction of CISM and measure MCVE is given in Code S2, Supplemental Material.
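The stepwise construction of CISM with the stopping rule can be sketched in Python as follows (the study's implementation is the R code in Code S2; the function name and the greedy site-ordering details here are ours):

```python
import numpy as np

def stepwise_ci(site_counts, n_hat, delta=0.0):
    """Greedy Schoolmaster-style composite index with a stopping rule.

    site_counts: (years x sites) array of peak counts; n_hat: array of
    mark-recapture abundance estimates. Sites are z-standardized, then
    added one at a time in the order that maximizes correlation with
    n_hat; the loop stops when the gain in correlation falls below delta
    (delta = 0 stops as soon as correlation would decline).
    """
    z = (site_counts - site_counts.mean(axis=0)) / site_counts.std(axis=0)
    ci = np.zeros(len(n_hat))
    chosen, remaining = [], list(range(z.shape[1]))
    rho_prev = -np.inf
    while remaining:
        # correlation achieved by adding each candidate site
        rhos = [np.corrcoef(ci + z[:, j], n_hat)[0, 1] for j in remaining]
        best = int(np.argmax(rhos))
        if chosen and rhos[best] - rho_prev < delta:
            break  # stopping rule: improvement in correlation < delta
        rho_prev = rhos[best]
        ci = ci + z[:, remaining[best]]
        chosen.append(remaining.pop(best))
    return ci, chosen
```

Because the standardized counts are zero-centered, the resulting index can be negative, which is why CISM is only calibrated with intercept regression below.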
Relating composite index to abundance
Once a CI is constructed from multiple surveys, a technique to relate CI to mark–recapture estimates of total population size must be chosen. Several options exist. Ordinary regression estimates an intercept, whereas regression through the origin and two correction factors (CF) do not:

N̂y = β0 + β1 CIy + εy   (regression),

N̂y = β1 CIy + εy   (regression through the origin),

CFMoR = (1/Y) Σy (N̂y/CIy),

CFRoM = Σy N̂y / Σy CIy,

where N̂y is the mark–recapture estimate of total abundance in year y, Y is the number of years that mark–recapture and convenience sampling were performed concurrently, and CI is one of the four composite metrics mentioned above (CIMoR, CIRoM, CIGeo, and CISM).
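The four calibration techniques can be sketched in Python as follows (illustrative only, not the study's MATLAB code; the names are ours). Each fitted technique is returned as a function that converts an index value into a predicted abundance:

```python
import numpy as np

def calibrations(ci, n_hat):
    """Fit four ways of relating a composite index to abundance estimates."""
    ci, n_hat = np.asarray(ci, float), np.asarray(n_hat, float)
    # 1) ordinary least-squares regression with an intercept
    b1, b0 = np.polyfit(ci, n_hat, 1)
    # 2) regression through the origin (least-squares slope, no intercept)
    b_rto = np.sum(ci * n_hat) / np.sum(ci ** 2)
    # 3) mean-of-ratios correction factor
    cf_mor = np.mean(n_hat / ci)
    # 4) ratio-of-means correction factor
    cf_rom = np.sum(n_hat) / np.sum(ci)
    return {
        "regression": lambda x: b0 + b1 * x,
        "rto": lambda x: b_rto * x,
        "cf_mor": lambda x: cf_mor * x,
        "cf_rom": lambda x: cf_rom * x,
    }
```

When abundance is exactly proportional to the index, all four techniques agree; they diverge as scatter and an intercept enter the data.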
We used leave-one-out cross-validation (Efron and Gong 1983) to test the predictive performance of the various combinations of CI and calibration techniques (Figure 2). Each year's N̂y and CI were sequentially withheld from model fitting and used to estimate the error of the prediction:

εy = N̂y − Ñy,

where Ñy is the predicted abundance on year y from a fitting technique using all the data except year y. Although the total number of years per population is relatively small (Table 1), we constructed box-and-whisker plots of the εy to facilitate comparison. The MATLAB code used to implement the cross-validation analysis is given in Code S3, Supplemental Material.
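The leave-one-out procedure can be sketched for any calibration technique as follows (illustrative Python, not Code S3; the fitting function passed in is an assumption of this sketch):

```python
import numpy as np

def loocv_errors(ci, n_hat, fit):
    """Leave-one-out cross-validation errors for one calibration technique.

    fit(ci, n_hat) must return a function mapping an index value to a
    predicted abundance. Returns the error for each withheld year.
    """
    ci, n_hat = np.asarray(ci, float), np.asarray(n_hat, float)
    errors = []
    for y in range(len(n_hat)):
        keep = np.arange(len(n_hat)) != y      # drop year y
        predict = fit(ci[keep], n_hat[keep])   # refit without year y
        errors.append(n_hat[y] - predict(ci[y]))
    return np.asarray(errors)

# Example with the ratio-of-means correction factor as the technique
rom_fit = lambda c, n: (lambda x: (n.sum() / c.sum()) * x)
eps = loocv_errors([1, 2, 3, 4], [250, 500, 750, 1000], rom_fit)
```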
Methods to combine proxy data across multiple survey sites (conducted 1985–2010) in the Siuslaw Drainage Basin on the coast of Oregon within a single population of Chinook salmon Oncorhynchus tshawytscha and then calibrate to mark–recapture estimates of total population size. Peak (maximum) daily counts, C, from I sites made over different stream lengths, L, were combined into a composite index (CI) in four ways. These indices were then calibrated to estimates of total population size, N̂, made on Y years. The CI derived from the Schoolmaster et al. (2012) approach (CISM) involves standardizing counts so that they are zero-centered (second summation). This creates negative values and so CISM were only modeled with regression techniques. The first summation in CISM is made over a number of sites determined by an algorithm that maximizes correlation to total population size estimates. All calibration models could incorporate the known uncertainty in N̂ as a weight (dashed arrows), but we evaluate the effects of using weights separately from the methods implied by solid arrows. The abbreviations MoR, RoM, Geo, and SE correspond to mean of ratios, ratio of means, geomean, and standard error, respectively.
Weights
The standard errors of mark–recapture estimates of abundance were known in this study, and the inverse of the variance could be used as weights to combine estimates (Adams et al. 1997). Thus, relatively imprecise mark–recapture estimates have less influence than more precise estimates when relating mark–recapture estimates to the CI. However, we were concerned that using the known precision of the mark–recapture estimates of abundance would be problematic, particularly because mark–recapture estimates and their standard errors should covary (Figure 3a). We therefore examined this issue separately from the cross-validation used to test the performance of the various combinations of CI and calibration techniques. For illustrative purposes, we fit weighted and unweighted regressions to a population where uncertainty in the mark–recapture estimate increased with the point estimate (Figure 3b).
The standard error of mark–recapture estimates across eight populations (symbols) tends to increase with the mark–recapture abundance point estimate (a). When mark–recapture uncertainty strongly covaries with the point estimate (b; points high on the y-axis have greater standard errors), then the inferred relationship between total abundance and mean site density strongly depends on whether or not regression models are fitted using the inverse variance as a weight. Error bars in (b) are 1 standard error. In a simulation study, the magnitude of uncertainty in a dependent variable was unrelated to the dependent variable (c) or a linear function of the dependent variable (d). In both scenarios, the true correction factor (250, vertical dashed line) is the ratio between mark–recapture estimates of abundance and the peak count observed at one survey site. In both scenarios, using the uncertainty in the dependent variable as a regression weight decreases the precision of the estimate (greater variance in dark line relative to light line in both c and d). Using uncertainty as a regression weight introduces bias in the estimate when uncertainty in the dependent variable covaries with the independent variable (d).
We also conducted a simple simulation of the use of weights. We drew realizations from an independent variable, x, uniformly distributed on the interval [6, 60]. This is the range of peak counts observed in one survey within the Siletz population during mark–recapture years. We then drew realizations from a response variable, y, by multiplying x by a random factor. Random factors were realizations from a normal distribution with mean = 250 and standard deviation = 50. These parameter values for the normal distribution were chosen so that the resulting values of y approximately match the mean and standard deviation of the mark–recapture estimates of abundance in the Siletz while also minimizing the probability of randomly generating negative values for the factor (Pr[factor < 0] = 2.9e−07).
In the simulation we focus on using uncertainty in the estimate of y when estimating the (variable) relationship between x and y. We considered two ways that uncertainty in the estimates of y arises. First, each realization of y was given a standard error drawn from a lognormal distribution with mean = 6.9435 and standard deviation = 1.1249. These values are the maximum likelihood estimates of a lognormal distribution fitted to the standard errors of mark–recapture estimates in the Siletz. Note that this procedure will not create a correlation between y and its standard error. The second procedure for associating a standard error with y causes the two variables to positively covary. Under the second procedure, the standard error was a log-linear function of the response variable, y; specifically, the standard error = exp(5.3367 + 0.0001993 × y). These specific values were obtained by regressing the log of the standard error of Siletz mark–recapture estimates on the mark–recapture point estimates. Within each simulation run, we generated nine values of x, y, and se(y), and then computed CFMoR.
We also computed a weighted correction factor,

CFW = Σy (wy N̂y/CIy) / Σy wy,

where the weight wy is the inverse of the variance of the mark–recapture estimate, wy = 1/se(N̂y)². We repeated the entire process of generating data and estimating the correction factor 5,000 times. We used a Gaussian kernel smoother to convert these 5,000 estimates of the correction factor into a probability density. The MATLAB code used to implement this simulation study is given in Code S4, Supplemental Material.
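The simulation can be sketched in Python as follows (the study's implementation is the MATLAB code in Code S4; the constants are those reported above, while the function name and random seed are ours):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

def simulate_cf(n_reps=5000, covarying_se=False):
    """Simulate unweighted vs. inverse-variance-weighted correction factors.

    The true correction factor is 250. Returns arrays of unweighted and
    weighted estimates, one pair per simulated set of nine years.
    """
    unweighted, weighted = [], []
    for _ in range(n_reps):
        x = rng.uniform(6, 60, size=9)             # peak counts at one site
        y = x * rng.normal(250, 50, size=9)        # "mark-recapture" estimates
        if covarying_se:
            se = np.exp(5.3367 + 0.0001993 * y)    # SE grows with y
        else:
            se = rng.lognormal(6.9435, 1.1249, 9)  # SE unrelated to y
        w = 1.0 / se ** 2                          # inverse-variance weights
        unweighted.append(np.mean(y / x))          # CF_MoR
        weighted.append(np.sum(w * y / x) / np.sum(w))
    return np.asarray(unweighted), np.asarray(weighted)
```

With the covarying standard errors, the weighted estimates should concentrate below 250, consistent with the bias illustrated in Figure 3d, while the unweighted estimates remain centered near the true value.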
Results
Stopping rule for CISM
There was not a clear across-population pattern indicating an optimal stopping-rule criterion (δ) for the Schoolmaster et al. (2012) algorithm. The mean cross-validated errors did not attain a minimum around the same values of δ (Figure 4). Instead, some populations achieved a minimum prediction error at negative values of δ, others had minima at positive values, and some were relatively invariant to δ. We therefore chose δ = 0 when we constructed CISM for the subsequent cross-validation tests.
The mean cross-validated prediction error for eight populations (lines) is plotted against the stopping rule criterion, δ, imposed on the Schoolmaster et al. (2012) algorithm for creating a composite index out of multiple sites. The original Schoolmaster et al. (2012) algorithm is obtained when δ = 0.
Cross-validation tests
There was no clear across-population pattern indicating an optimal combination of composite index and calibration technique. For example, regression had the lowest median prediction error in the Nehalem and Elk populations, but performed poorly compared with regression through the origin and the two correction factors in the Nestucca and Coquille populations (Figure 5). Within a given population and calibration technique, the median prediction error from CIGeo was worse than both CIRoM and CIMoR in 22 of 32 instances. The CISM was only used in regression with an intercept because it is an index that can be negative. Comparing column “b” with subpanel “I” (Figure 5) indicates that the median predictive performance of CISM was worse than CIRoM and CIMoR in four populations and approximately equal in three others, but CISM performed well in the Salmon River population. Simply using the mean of mark–recapture estimates over time to predict the withheld estimate (Figure 5, column “a”) worked well in the Elk population, but superior alternatives existed for all other populations.
Shown are absolute values of cross-validation prediction errors of abundance (y-axis) using multiple methods of constructing a predictor composite index (CI) and multiple methods of calibrating the CI to mark–recapture abundance (x-axis). Each panel corresponds to a unique population. Boxplots display median, 25th and 75th percentiles, whiskers extend to most extreme observations not considered outliers (beyond the 2.7σ envelope), and outliers are plotted individually. Column “a” uses the mean of the abundance estimates not withheld during cross-validation to predict the withheld abundance estimate. Column “b” is obtained by regressing mark–recapture abundance estimates on CISM where δ = 0. Columns labeled “c,” “d,” and “e” use CIMoR, CIRoM, and CIGeo, respectively, as inputs. Subpanels with Roman numerals I–IV correspond to the calibration methods of regression, regression through the origin, mean of ratios (MoR) correction factor, and ratio of means (RoM) correction factor, respectively. A site where zero fish were counted makes it impossible to use the RoM correction factor with CIGeo. Boxplots for the Coquille population in IIe and IIIe are above the scale plotted here because of zero counts.
Weights
Most populations displayed a positive relationship between the mark–recapture point estimate of abundance and the standard error of the estimate (Figure 3a). The Siletz population displayed a particularly strong positive relationship between point estimates and uncertainty; thus, unweighted regression and inverse-variance-weighted regression produced very different relationships between mean site density and estimated population size (Figure 3b). The simulation of the effects of using weights to estimate a correction factor, where weights are the inverse of the variance, revealed that estimates were unbiased but relatively imprecise when the weights were not related to the observations (Figure 3c). When the weights were correlated with the observations, the resulting estimates of the correction factor were biased low (Figure 3d).
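The mechanism behind this bias can be illustrated with a small simulation. The sketch below is a minimal Python illustration, not the supplemental R code (Code S4); the lognormal distribution, the true correction factor, and the assumed variance structure are all arbitrary choices made only to mimic the pattern in Figure 3a, in which uncertainty grows with the point estimate:

```python
import math
import random

random.seed(42)
TRUE_CF = 5.0           # hypothetical correction factor (median per-year ratio)
N_YEARS, N_REPS = 10, 2000

unweighted, weighted = [], []
for _ in range(N_REPS):
    # lognormal ratios stay positive and are centered near TRUE_CF
    ratios = [TRUE_CF * math.exp(random.gauss(0.0, 0.2)) for _ in range(N_YEARS)]
    # variance is assumed to grow with the point estimate, so the
    # inverse-variance weights are smallest for the largest ratios
    weights = [1.0 / r ** 2 for r in ratios]
    unweighted.append(sum(ratios) / N_YEARS)
    weighted.append(sum(w * r for w, r in zip(weights, ratios)) / sum(weights))

mean_unweighted = sum(unweighted) / N_REPS  # centered near the true value
mean_weighted = sum(weighted) / N_REPS      # pulled below the true value
```

Because the weights here are a deterministic function of the observations themselves, large ratios are systematically down-weighted, and the weighted mean of ratios settles below the unweighted mean, reproducing the low bias seen in Figure 3d.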
Discussion
Differences among the predictive performances of the various calibration techniques were sometimes small. This result is not surprising given the small differences between some of the composite indices used as input and between some of the calibration techniques. For example, CIRoM and CIMoR are identical if all the constituent survey-site lengths are identical. The South Coos population had the smallest differences in the predictive performance of CIRoM and CIMoR, and it also had the smallest coefficient of variation of survey-site lengths. The Elk population had the greatest coefficient of variation in survey-site lengths, but large differences in the predictive performance of CIRoM and CIMoR were not observed there, so variation in survey length does not guarantee differences in the performance of these two composite indices. For Chinook salmon on the Oregon coast, we have little confidence in CIGeo and CISM, though we note that the latter was effective in the population with the greatest amount of data.
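The equivalence under equal site lengths is easy to verify arithmetically. As a simplification of the indices used in this study, assume each site contributes a count c_i over a surveyed length l_i; the mean-of-ratios and ratio-of-means indices then coincide exactly when all lengths are equal and diverge otherwise:

```python
def ci_mor(counts, lengths):
    """Mean of per-site ratios (densities): average of c_i / l_i."""
    return sum(c / l for c, l in zip(counts, lengths)) / len(counts)

def ci_rom(counts, lengths):
    """Ratio of means: total count divided by total surveyed length."""
    return sum(counts) / sum(lengths)

counts = [10, 20, 30]
equal_lengths = [2.0, 2.0, 2.0]    # identical site lengths: indices agree
unequal_lengths = [1.0, 1.0, 3.0]  # varying site lengths: indices diverge
```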
The biggest difference among the calibration techniques we explored is that standard regression has an intercept, whereas the other three techniques do not. This distinction matters. Models without an intercept assume that total abundance goes to zero when the index goes to zero; this assumption becomes more realistic as the proportion of the total habitat surveyed increases. Including an intercept can produce unrealistic results if the intercept is negative. However, negative intercepts can be avoided by log transforming the dependent variable (mark–recapture abundance estimates) and the independent variable (proxy). A positive intercept implies that animals are present in the population even if none were observed in the surveys. Although this may correctly represent imperfect detection and habitat use at low population size, it is clearly false if the population has actually become extirpated. However, inference about what occurs near the origin in a plot of total abundance vs. sample abundance is an exercise in extrapolation beyond the data. The linear models used here are suitable for the range of data on hand, but the relationship between survey-site abundance and population size is likely not always linear over a larger range of population sizes if animals are territorial or have nonlinear isodars (Falcy 2015). We also note that our simulation results indicate that using the uncertainty of the estimate of population size as a weight adds imprecision to correction factor estimates. If uncertainty in the population size is correlated with the point estimate, which was common in our mark–recapture estimates, then the correction factor estimate is biased.
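The four calibration techniques can be written compactly. The sketch below is in Python (the analyses in this study used R and MATLAB) and assumes the standard textbook form of each technique: given paired composite-index values x and mark–recapture estimates y, each fit returns a prediction function, and only ordinary regression can yield a nonzero intercept:

```python
def fit_regression(x, y):
    """Ordinary least squares with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return lambda x_new: a + b * x_new

def fit_origin(x, y):
    """Regression through the origin: slope = sum(x*y) / sum(x^2)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    return lambda x_new: b * x_new

def fit_mor(x, y):
    """Mean-of-ratios correction factor: average of y_i / x_i."""
    cf = sum(yi / xi for xi, yi in zip(x, y)) / len(x)
    return lambda x_new: cf * x_new

def fit_rom(x, y):
    """Ratio-of-means correction factor: sum(y) / sum(x)."""
    cf = sum(y) / sum(x)
    return lambda x_new: cf * x_new
```

Note that the three intercept-free techniques force predicted abundance to zero when the index is zero, which is exactly the structural assumption discussed above.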
Our cross-validation prediction error is a biased estimate of the uncertainty associated with calibrated indices, even when inference is made without extrapolating beyond the data. The cross-validation procedure withheld one observation from the fitting procedure when we characterized uncertainty (Figure 5), but the best point estimate is made when no data are withheld; uncertainty will therefore be somewhat smaller when the best estimate is generated. However, even when the magnitude of the proxy data used for inference is within the range used for model fitting, there is additional uncertainty if inference is needed for a time outside the range of the input time series. A trend in the relationship between total abundance and site abundance can occur as habitats change. Furthermore, there may be a trend in observer efficiency (detection probability) as surveyor knowledge is gained or lost through time. Thus, we emphasize that accurate estimates of uncertainty will not be available for most applications of a calibrated proxy.
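The leave-one-out procedure itself is straightforward to express. This simplified Python illustration (not the supplemental MATLAB code of Code S3) withholds each year in turn, refits a calibrator to the remaining years, and records the absolute prediction error for the withheld year; a ratio-of-means correction factor serves as the example calibrator:

```python
def loo_abs_errors(x, y, fit):
    """Leave-one-out cross-validation: withhold each (x_i, y_i) pair,
    fit on the rest, and return |prediction - withheld estimate|."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(x_train, y_train)
        errors.append(abs(predict(x[i]) - y[i]))
    return errors

def fit_rom(x, y):
    """Ratio-of-means correction factor, used here as the example calibrator."""
    cf = sum(y) / sum(x)
    return lambda x_new: cf * x_new
```

When the index and the abundance estimates are exactly proportional, every leave-one-out error is zero; departures from proportionality show up directly in the error distribution, which is what the boxplots in Figure 5 summarize.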
Our cross-validation analysis revealed that in seven of eight populations there was at least one method of calibrating a proxy that predicted total abundance better than simply using mean total abundance. However, this does not address the question of whether calibration techniques are cost-effective methods of estimating total population size. Morrison et al. (2008) allege that the costs to validate convenience sampling designs exceed the cost of applying random sampling designs. Here, estimating the total abundance of spawning Chinook salmon is more than a spatial sampling problem; a spawning “run” also has a seasonal time component, and the probability of detecting individuals even when present is <1. For this system, peak count is a metric of total abundance that is much easier to obtain than a good mark–recapture estimate.
We propose a general framework for assessing the cost-effectiveness of calibrated proxies of abundance. The question of whether to implement a monitoring program that is cheap and imprecise or expensive and precise depends on the consequences of management decisions based on the monitoring information. The costs in a monitoring cost–benefit analysis should include not only the expense of collecting data, but also the more intangible expense of making the wrong management decision because of an imprecise or inaccurate assessment of reality. Let the probability of making a management mistake decline exponentially with increasing expenditure on sampling. The probability of a mistake can be multiplied by the management cost of the mistaken decision, m, to get the expected cost of a management mistake for a given level of expenditure on sampling. The total cost is the amount spent on sampling, s, plus the expected cost of a management mistake:

total cost = s + 0.5 · m · e^(−rs),
where r is a constant and 0.5 is the probability of making a mistake if no money is spent on sampling. We plotted this function (Figure 6A), as well as the function obtained by setting the derivative of the total cost function to zero (Figure 6B), to illustrate that the total cost is minimized over a wide range of expenditure on monitoring. Indeed, the optimal expenditure on monitoring can be zero dollars if the cost of a management mistake is sufficiently small and the cost of obtaining precise estimates is high.
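Under this cost model the optimum has a closed form: setting the derivative 1 − 0.5·m·r·e^(−rs) to zero gives s* = ln(0.5·m·r)/r, with s* = 0 whenever 0.5·m·r ≤ 1. The sketch below evaluates this; the particular values of m and r are illustrative assumptions, not estimates from the study:

```python
import math

def total_cost(s, m, r):
    """Sampling expenditure s plus the expected cost of a management
    mistake: probability 0.5*exp(-r*s) times mistake cost m."""
    return s + 0.5 * m * math.exp(-r * s)

def optimal_expenditure(m, r):
    """Closed-form minimizer of total_cost; zero when even the first
    dollar of monitoring costs more than it is expected to save."""
    return max(0.0, math.log(0.5 * m * r) / r)

m, r = 1000.0, 0.01                      # hypothetical mistake cost and decay rate
s_star = optimal_expenditure(m, r)       # interior optimum
s_zero = optimal_expenditure(2.0, 0.01)  # cheap mistake: spend nothing
```

The s_zero case is the situation described above, where the optimal expenditure on monitoring is zero dollars because the cost of a mistaken decision is small relative to the cost of sampling.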
Each of the methods used to collect, combine, and calibrate population monitoring data, here from a single population of Chinook salmon Oncorhynchus tshawytscha surveyed 1985–2010 in the Siuslaw Drainage Basin on the coast of Oregon, will entail different costs and benefits. A cost–benefit analysis can be used to determine the optimal expenditure on monitoring. The total cost is the amount spent on monitoring, s, plus the cost of a management mistake, m, which declines at an exponential rate, r, with increasing expenditure on monitoring. The objective should be to minimize the total cost (A, y-axis). A function for the optimal expenditure on monitoring is obtained by taking the derivative of the total cost function and setting it equal to zero. This optimal monitoring expenditure function can be evaluated over all possible values of m and r (B). Note that the minima of the curves in (A) are also found in (B) where the surface touches the left and right sides of the box.
We hope that doubts about the utility of proxy information, and doubts about accurate but cost-prohibitive designs, can be reframed in a cost–benefit analysis. Wildlife managers and survey design researchers might form more effective dialogues if there were a focus on all the costs associated with monitoring wildlife. The costs of an incorrect decision may be difficult to quantify (Stem et al. 2005), but an examination of the risks of each management decision is possible and necessary. Just as a clear statement of monitoring objectives is a prerequisite for a successful monitoring design (Witmer 2005; Nichols and Williams 2006), the explicit acknowledgement of monitoring costs is a prerequisite for efficient decision making.
In our opinion, calibrating and validating a proxy of abundance is a simple task relative to designing a new monitoring program. A new monitoring program should follow from careful contemplation of monitoring goals (Nichols and Williams 2006), the value of resolving management uncertainties (Maxwell et al. 2015), and fundamental principles of sampling (Albert et al. 2010). We do not claim to have found an easy solution to a difficult problem. Rather, proxy information is ubiquitous, and technological advances such as the quantification of environmental DNA (Goldberg et al. 2015) will only increase the amount of proxy information available to managers. The calibration and validation of proxies is a prerequisite for responsible ecological applications (Jennelle et al. 2002), and the techniques described here provide empirical measures with which to judge the usefulness of proxy information. We hope that such empirical assessments play a greater role in the application and discussion of proxy information.
Supplemental Material
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.
Code S1. MATLAB code used to impute missing values. The function receives a two-dimensional array, with time on rows and observational units (sites) on columns. Missing values are imputed by separately regressing each column on the column with missing values. Predicted values from each regression are weighted by Bayesian Information Criterion. Rows are automatically removed if the predictor variable contains missing information. Predicted values are never used to predict other missing values.
Found at DOI: http://dx.doi.org/10.3996/092015-JFWM-090.S1 (DOC 27 KB).
Code S2. R code to implement the procedure described in Schoolmaster et al. (2012). This algorithm constructs a composite index in stepwise manner until the correlation of the index to the response variable (mark–recapture estimates through time) no longer improves by an amount specified by “delta.” A cross-validation of the technique is also implemented.
Found at DOI: http://dx.doi.org/10.3996/092015-JFWM-090.S2 (DOC 27 KB).
Code S3. MATLAB code used to compute and plot the cross-validation error reported in Figure 5.
Found at DOI: http://dx.doi.org/10.3996/092015-JFWM-090.S3 (DOC 27 KB).
Code S4. R code used to conduct the simulation study of the effects of incorporating uncertainty in the response variable on the bias and precision of a mean of ratios correction factor.
Found at DOI: http://dx.doi.org/10.3996/092015-JFWM-090.S4 (DOC 27 KB).
Archived Material
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any archived material. Queries should be directed to the corresponding author for the article.
To cite this archived material, please cite both the journal article (formatting found in the Abstract section of this article) and the following recommended format for the archived material.
Falcy MR, McCormick JL, Miller SA. 2016. Data from: Proxies in practice: calibration and validation of multiple indices of animal abundance, Journal of Fish and Wildlife Management 7(1):117-128. Archived in Dryad Digital Repository.
Data A1. Microsoft Excel workbook with data used in the evaluation of methods to combine proxy data across multiple survey sites within a single population of Chinook salmon Oncorhynchus tshawytscha and then calibrate to mark–recapture estimates of total population size. Each sheet contains a given population's mark–recapture point estimate and standard error through time. The peak counts recorded at various sites (s1, s2, …) within each population are also given. Fractional peak counts were imputed from a longer time series of counts that extends beyond the years in which mark–recapture studies were performed.
Found at http://dx.doi.org/10.5061/dryad.nk513.
Acknowledgments
Tim Dalton and Brian Riggers contributed many useful ideas during discussions leading up to this work. The Journal of Fish and Wildlife Management found peer reviewers who engaged with this material with far greater clarity and detail than reviewers from other journals. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
References
Author notes
Citation: Falcy MR, McCormick JL, Miller SA. 2016. Proxies in practice: calibration and validation of multiple indices of animal abundance. Journal of Fish and Wildlife Management 7(1):117-128; e1944-687X. doi: 10.3996/092015-JFWM-090
The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.