## Abstract

Wildlife biologists monitor the status and trends of American woodcock *Scolopax minor* populations in the eastern and central United States and Canada via a singing-ground survey, conducted just after sunset along roadsides in spring. Annual analyses of the survey produce estimates of trend and annual indexes of abundance for 25 states and provinces, management regions, and survey-wide. In recent years, researchers have used a log-linear hierarchical model that defines year effects as random effects in the context of a slope parameter (the S model) to model population change. Recently, researchers have proposed alternative models suitable for analysis of singing-ground survey data. Analysis of a similar roadside survey, the North American Breeding Bird Survey, has indicated that alternative models are preferable for almost all species analyzed in the Breeding Bird Survey. Here, we use leave-one-out cross-validation to compare model fit for the present singing-ground survey model to fits of three alternative models, including a model that describes population change as the difference in expected counts between successive years (the D model) and two models that include *t*-distributed extra-Poisson overdispersion effects (H models) as opposed to normally distributed extra-Poisson overdispersion. Leave-one-out cross-validation results indicate that the Bayesian predictive information criterion favored the D model, but a pairwise *t*-test indicated that the D model was not significantly better-fitting to singing-ground survey data than the S model. The H models are not preferable to the alternatives with normally distributed overdispersion. All models provided generally similar estimates of trend and annual indexes suggesting that, within this model set, choice of model will not lead to alternative conclusions regarding population change. However, as in Breeding Bird Survey analyses, we note a tendency for S model results to provide slightly more extreme estimates of trend relative to D models. We recommend use of the D model for future singing-ground survey analyses.

## Introduction

In the 1930s Mendall and Aldous (1943) pioneered roadside surveys for American woodcock *Scolopax minor* in Maine. During the 1950s and 1960s, researchers expanded surveys and modified survey design in various ways to improve spatial sampling design and standardize counting methods (e.g., Kozicky et al 1954). The singing-ground survey (SGS) currently used by the U.S. Fish and Wildlife Service to assess population status and inform harvest strategies was initiated in its current form in 1968 (Seamans and Rau 2019). Although researchers occasionally criticize the SGS due to the low encounter rates of woodcock in southern states and limited samples from the boreal forests in Ontario and Quebec, the SGS provides information of much higher quality than the much more geographically extensive North American Breeding Bird Survey (BBS, Link et al. 2020) for the species, as observers record few woodcock during the morning samples collected using the BBS protocol (Sauer et al. 2008). However, the BBS and SGS share similar sample designs, and analyses of these roadside surveys are challenging due to the need to control variation in factors that influence counts, such as observer ability, and limited data over the large geographic range of the survey. Researchers analyze the SGS using methods developed for BBS analyses. Link and Sauer (2002) and Sauer et al. (2008) described the statistical model currently used for analysis of SGS data.

Recently, Link et al. (2020) expanded the model set for BBS analyses to include four alternative models. Researchers currently use one of these models for analysis of the SGS. The current SGS model employs a log-linear hierarchical model that defines year effects as random effects having a linear regression on year (i.e., varying randomly around a straight line across years, the S model). The Link et al. (2020) model set also includes a model that describes population change in terms of conditionally independent differences between years (the D model) and two models that use the S and D population change structure but include *t*-distributed extra-Poisson overdispersion effects as heavy-tailed alternatives to normally distributed extra-Poisson overdispersion. These models are denoted as SH and DH. Using leave-one-out cross-validation (LOOCV), Link et al. (2020) compared model fit for the S model to fit of three alternative models for 548 species of North American breeding birds based on BBS data. Their comprehensive analysis of BBS data indicated that the S model was the least-preferred model of the set; model S was preferred for only 4 of the 548 species they considered in their BBS analyses. The BBS does include American woodcock, and model selection based on the BBS data for American woodcock suggested that model DH is strongly favored over model S, and that model S was the least likely model to fit the BBS American woodcock data. Link et al. (2020) also noted that S models tended to produce estimates of change that are more extreme than those produced by D models, due to the presence of the slope parameter in the model.

In this paper, we apply the four models (S, D, SH, and DH) to the SGS data. We compare estimated trends and annual indexes based on these models to determine whether the model results present conflicting views of population change, and we use LOOCV to compare fit among the four models.

## Study Site

Wildlife biologists conduct the SGS in 25 states and provinces in the eastern and central United States and Canada, and define strata for analyses in terms of states and provinces, and estimate population change for Eastern and Central management regions (Seamans and Rau 2019, Figure 1). Sample sizes vary greatly among states, and researchers added Manitoba to the survey in 1992. Sauer et al. (2008) provides additional details on regions surveyed by the SGS.

## Methods

### The American woodcock singing-ground survey

The SGS is composed of approximately 1,600 roadside survey routes located within randomly selected 10-min degree blocks. Routes consist of 10 counting stops along a 5.4-km-long survey route that observers survey shortly after sunset. Observers conduct one survey on a route each year, seasonally timed to coincide with local peak of woodcock breeding activity after migrants have moved through; the observer records the number of woodcock heard vocalizing (“peenting”) during a 2-min count period at each stop. Sauer et al. (2008) and Seamans and Rau (2019) provide additional details on the survey design, implementation, and data management. Managers and researchers vet and analyze data yearly for presentation in status reports and as input into harvest strategy analyses (Seamans and Rau 2019).

### Hierarchical models for singing-ground survey analysis

*Y*) to be overdispersed Poisson random variables, and model the log of the expected count as a function of explanatory variables. Link et al. (2020) heuristically denote the general modeling framework in Equation 1: where Y

_{i}*|λ*

_{i}*∼ Poisson (λ*

_{i}*), and Γ*

_{i}*, Ω*

_{i}*, and ε*

_{i}*represent year, observer, and overdispersion effects associated with observation*

_{i}*i*. The model set consists of four models that have alternative forms for year effects (S or D models) and the distribution of overdispersion effects (normal distributions or “heavy-tailed”

*t*distributions).

For all models, observer effects are composed of ω, a mean zero normal random effect associated with observer *o*(*i*) and a “start-up” effect η associated with indicator variable *f*(*i*) to index first year of counting on a route for an observer (i.e., Ω* _{i}* = ω

_{o}_{(}

_{i}_{)}+ η

*f*(

*i*)).

Year effects in both models, Γ* _{i}* = γ

_{s}_{(}

_{i}_{)}

_{,y}_{(}

_{i}_{)}are parameters describing relative abundance as a function of stratum

*s*(

*i*) and year

*y*(

*i*). For S models, year effects are conditionally independent and normally distributed with means (expected values, denoted E(·)) defined by a stratum and slope parameter, E(γ

*) = S*

_{s,y}*+ β*

_{s}*(*

_{s}*y*−

*y*

_{0}), and precision . The

*y*

_{0}is a baseline year. One can think of the S model year effects as yearly deviations from a line defined by the β (slope) parameter and an intercept. For the D models, year effects γ

*are normally distributed with mean γ*

_{s,y}*and precision (i.e., the differences in year effects are conditionally independent). Similarly to S models, where the stratum effect S*

_{s,y−1}*serves as an intercept (baseline abundance), in D models a base year is fixed and we set the year effect in the base year to S*

_{s}*(i.e., E(γ*

_{s}_{s,y0}) = S

*). Stratum effects are normally distributed with mean μ*

_{s}*and precision τ*

^{s}*. Extra-Poisson overdispersion effects ε*

^{s}*take two possible forms: 1) they can be assumed to be independent and normally distributed with precision τ*

_{i}^{ε}, or 2) we can specify independent and

*t*-distributed ε

*that have scale parameter τ*

_{i}^{ε}and degrees-of-freedom parameter

*ν*. For the BBS, we have found that extreme counts occur within the data and that the

*t*-distribution generally fits better than the normal distribution (Link et al. 2020). We denote models that use the

*t*-distribution for overdispersion with an H, due to their heavy tails.

The JAGS code (Plummer 2003) for our four candidate models (S, SH, D, DH), along with specification of all priors is included in Link et al. (2020; Data S1, *Supplemental Material*). Briefly, we assign flat normal priors (mean zero with standard deviation = 1,000) and vague gamma priors (shape parameter = rate parameter = 0.001) to mean parameters. We use a gamma distribution with mean 20 and variance 200 as an objective prior for *ν* (Juárez and Steel 2010).

We describe population change using derived parameters (i.e., parameters that are functions of parameters in the estimation models). We define annual indexes of abundance for a stratum as a function of year effects, *n _{s,y}=* exp( γ

*+ 0.5+ 0.5), where σ*

_{s,y}_{ε}and σ

_{ω}are the standard deviations of the indicated random effects. In the SGS analysis, states and province areas are the strata; we present results for these strata as well as for Eastern and Central management regions (Seamans and Rau 2019) and survey-wide. For multistratum regions, we compute annual indexes as area-weighted averages of state or provincial indexes. We define trend as the geometric means of annual indexes, and estimate it as the ratio of annual indexes for the last and first years in the interval taken to the power 1/(

*y*−

_{l}*y*), where

_{f}*y*and

_{f}*y*are the first and last years. The current harvest strategy for woodcock uses 3-y moving averages of annual indexes for Eastern and Central management regions as a measure of population status in setting harvest levels (U.S. Fish and Wildlife Service 2010).

_{l}### Model selection

We used LOOCV for model selection. Link and Sauer (2016a) and Link et al. (2020) described the motivation for the approach and its application in the context of BBS analyses. In LOOCV, we fit the model to a data set from which we have omitted a single observation. We refer to the Bayesian estimate of a distribution function (i.e., averaged over the posterior distribution of unknown parameters) as the predictive distribution function. The value of the predictive distribution function evaluated at omitted data value *i* is termed the conditional predictive ordinate (CPO_{i}). We define the Bayesian predictive information criterion (BPIC) as the sum of the logs of CPO_{i} over all observations. For model selection, we compute BPIC for each of the four models and the largest BPIC value indicates the model with best fit. Link and Sauer (2016a) describe paired *t*-tests for evaluating significance of differences in BPICs among models, in which they use pairwise differences of CPOs between models as the observations.

### Model fitting

We used program JAGS called from program R for model fitting and summary of posterior distributions (Plummer 2003). We fit models with 3 chains, a thinning of 5, and a burn-in of 1,000 iterations, and used 4,000 iterations to generate replicates for computation of posterior distributions of parameters. We present medians of posterior distributions as parameter estimates, and express uncertainty as 95% credible intervals (CI) with lower boundaries as 2.5th percentiles and upper boundaries of 97.5% of the posterior distributions as computed from the 12,000 replicates (4,000 replicates from each chain). For all parameters except moving averages we present 95% CIs; for the moving averages we present 70% CIs (15th and 85th percentiles of the posterior distributions) to meet management needs (USFWS 2010). Link et al. (2020) noted that LOOCV is extremely time-consuming for BBS data sets, and they reduced the computational time by randomly selecting a balanced (by year) sample of observation for computing BPICs. They selected two observation/year/species. The SGS data set is large, (40,409 counts in the data from 1968 to 2019; Data S1, *Supplemental Material*), and we also made a random, yearly balanced selection of 20 observations per year for the LOOCV analysis, computing BPIC based on these 1,040 observations for the four models. These observations represent 2.7% of the total observations. Protocols for the cross-validation analyses followed those used in Link et al. (2020).

## Results

### Comparative analysis of annual indexes and trends for four models

When fit to SGS data, the four models produced very similar estimates of trend (Tables 1–3). Overall for the survey, estimates of change from 1968 to 2019 varied from −0.91%/y (model D) to −1.00%/y (SH). Note that no data exist for Manitoba for the early years of the survey, and the 1968–2019 trend analyses do not include Manitoba. Although none of the estimates from any state, province, or management region differed among models (as indicated by overlap in the 95% CIs), comparison of differences in trend estimates by state and provinces from S and D models indicated that S models produced slightly more negative trend estimates (average difference: −0.22%/y; 95% confidence interval: −0.35, −0.09). This mean difference includes results from states and provinces with limited data and imprecise results; trends from all regions combined indicate a smaller difference (−0.08%/y). For 10-y trend estimates, the average differences in S − D model estimates among states and provinces were negative (−0.55%/y; 95% confidence interval: −1.08, −0.02). Trends from a few states and provinces with few routes appear to drive this result; composite trends for the management regions and all regions combined were slightly larger under the S model results for 10-y trends. We could not discern any consistent differences in 2-y trend estimates (average difference in S − D estimates among states and provinces = 0.97%/y; 95% confidence interval: −1.15, 3.09).

We compared half-widths of 95% CIs of trends among the 25 states and provinces for the four models. For trends computed over the 1968–2019 interval, model D had larger 95% CIs than model S, but the mean difference is small (the half-width is only 0.15 larger for D models; Table 4). Generally, D models had slightly larger 95% CIs than did S models, but no consistent differences existed between D and DH or between S and SH (Table 4). For 2009–2019 trends, D models had larger 95% CIs than S models, but H model results also tended to have smaller 95% CIs than non-H models (i.e., SH results had smaller 95% CIs than S results, DH results had smaller 95% CIs than D results). Although all 95% CIs are large for 2-y trends, S models tend to have larger 95% CIs than D models among states and provinces and DH is more precise than D (Table 4).

Annual indexes provide a great deal of insight into relationships among models (Figure 2). Within many states and provinces (e.g., Indiana, Maine, Michigan, New Jersey, Nova Scotia, New York, Ohio, Ontario, Prince Edward Island, Virginia, Vermont, Wisconsin, and West Virginia), annual indexes are very similar for all models. For the other states and provinces, annual indexes based on models S and SH provide virtually identical patterns of population change over time. Results from models D and DH are likewise very similar. As a general rule, however, both sets of trajectories, even if varying over several years, varied within the 95% CI bands, suggesting that we can attribute any apparent differences in pattern of change over time to chance. A few differences might be meaningful. In Minnesota, trajectories were similar except in the first 3 y. The larger populations indicated by the S models in those early years are observable in the regional (Central Management Region) results. In Quebec, trajectories were markedly different in the early years, when researchers conducted very few surveys.

Results from regional moving averages are consistent with patterns in annual indexes (Figure 3). Moving averages smooth the annual indexes, and the 70% CIs used for this analysis make the width of the CIs much smaller. For both regions, the four models provided very similar estimates and 70% CIs. Eastern Management Region estimates for 2017–2019 ranged from 2.29 (model SH) to 2.38 (D) and the width of the CIs varied from 0.20 (SH) to 0.22 (DH). Central Management Region estimates for 2017–2019 ranged from 2.56 (model SH) to 2.60 (D) and the width of the CIs varied from 0.156 (SH) to 0.164 (D).

### Model selection results

We computed BPICs for each model based on 1,040 randomly selected observations. Model D had the largest BPIC (−2,665), followed by model S (−2,670), model DH (−2,701) and model SH (−2,702). Paired *t*-test results comparing fit of models D and S indicated no significant difference (*t* = 1.07, *P* > 0.28). Models SH and DH also had similar BPICs (*t* = 0.18, *P* < 0.86), but all comparisons of H models with non-H models were significant (all *P* < 0.01).

## Discussion

American woodcock are difficult to survey and observers encounter them at low frequencies on survey routes in many of the states and provinces included in the SGS. The slope-year effect parameterization of the S model relies on the random effects to model deviations from a consistent long-term trend. Researchers historically favored this parameterization for data sets such as the SGS because data quality (in terms of number of routes and consistency of coverage among routes) varies greatly among states and provinces; the S model effectively defaults to a slope-trend model for data sets with very limited data. However, Link et al. (2020) convincingly demonstrated that models D and DH better fit most BBS species data sets than models S and SH did. They also showed that there was a tendency for S models to produce slightly more extreme trend estimates than D models for the BBS data. They attributed this result to limited data in the early years of the BBS. When this occurs, the S model has limited data for estimation of the year effects, the yearly effects in that case tend to be shrunk toward the slope component of the S model, and the resulting annual indexes are primarily governed by a slope parameter that may not be an adequate representation of change in those years. Year effects in D models are shrunk toward adjacent year's year effects, and do not use a long-term trend to predict yearly estimates when data are limited. Our comparative analyses and LOOCV model selection results provide useful insights into how the alternative models behave when applied to SGS data and allow us to make recommendations for appropriate analyses to be used in future SGS analyses.

Our LOOCV analysis found that model D had the largest BPIC, suggesting that D is to be preferred, but the paired *t*-test results comparing fit at individual observations did not indicate a significantly better fit of the D model relative to the S model. Our expectations based on BBS model selection was that D models, and particularly model DH, were likely to best fit SGS data. In the BBS analysis, model D best fit 53 species while model DH best fit 458 species (Link et al. 2020). The BBS surveys American woodcock, although observers only rarely encounter the species on early-morning stops along BBS routes that temporally coincide with woodcock activity. The BBS has average counts of 0.02 observations/route across their range. Model selection of American woodcock from BBS data indicated a clear superiority of model DH, and the trend estimated using this model over 611 BBS routes during 1966–2019 indicated a trend of −0.73%/y (95% CI: −1.52, 0.07) for the species.

Analysis of SGS data indicates that the S model does appear to provide slightly more negative estimates of trend than does the D model when applied to SGS data. The size of these effects is small (−0.08%/y at the survey-wide level), and as with BBS results the pattern is likely due to limited samples in a few states and provinces in the early years of the survey. Inspection of annual indexes only indicates meaningful differences in annual indexes (as indicated by comparisons of CIs of indexes) for Minnesota and Quebec in the early years of the SGS. These results, along with the observations of more extreme trends associated with S model results in BBS analyses, are an argument in favor of use of the D model for SGS analyses. Although D model results tended to be less precise than S model results, the differences in CI width are small, indicating that use of the D model would not greatly limit our ability to detect meaningful population changes in the SGS.

The SH and DH models fit less well than expected based on the experience of BBS analyses. We expect H models to be superior for species in which extreme observations occasionally occur, leading to heavy-tailed overdispersion. The SGS is primarily based on observers hearing displaying male woodcock, and observers tend to display considerable differences in ability to hear woodcock. However, the models account for this aspect of counting by incorporating observer and startup effects; the SGS protocols, in which observers are generally drawn from the population of professional wildlife biologists and new observers conduct paired counts with old observers when observers change on a route, may lead to better training and more consistent counting by observers during their tenure of counting on routes. BBS observers are drawn from the larger population of citizen scientists, and often have limited tenures on survey routes (Link and Sauer 2016b), and this large pool of observers may lead to greater variability both within and among observers.

Implementing the D model for the SGS will lead to no significant differences in trend results presently used in woodcock status reports, and we can compute the moving average population status metric used in woodcock harvest management as derived statistics from the D model year effects in exactly the same way as presently implemented for the S model. Moving averages of regional population change are virtually identical in recent years in the Central Management Region and very similar in the Eastern Management Region, indicating that use of the D model for estimation of population indexes will have little consequence for management decisions based on SGS analyses.

Wildlife managers can also use derived statistics develop predictions based on long-term population trajectories as was required in 2020 when the SGS data collection was disrupted due to concerns about deploying staff to conduct surveys during the coronavirus pandemic (Seamans and Rau 2020). In this situation, managers were interested in prediction of expected population change in woodcock in 2020, given the long-term declining trend in woodcock populations. The S model readily produced these predictions, although data from 2020 did not inform these results. Analysts added an additional year node to the analysis and treated it as missing data. The Bayesian analysis permitted estimation of the “year effect” for 2020 (in this case, a prediction based on the slope parameter), and analysts produced posterior distributions of predicted annual indexes and trends. Long-term trend information such as that embedded in the slope parameter of the S model would not have informed the D model, if analysts employed it for such a prediction. However, analysts could produce a similar prediction by defining a derived slope statistic from the year effects estimated in the D model and predicting the change in population status between 2019 and 2020.

## Supplemental Material

Please note: The *Journal of Fish and Wildlife Management* is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.

**Data S1.** Singing-ground survey data collected by observers along roadside survey routes in 25 states and provinces for years 1968–2019. We have formatted data for use with JAGS programs provided as supplemental materials in Link et al. (2020). Variables are *N* of counts (ncounts), *N* of route/observers (nobservers), *N* of years (nyears), *N* of strata (nstrata), areas of strata (areaweight), counts (count, vector of size *N*), year index for counts (year, vector of length *N* with values indexing year of count), route/observer index for counts (obser, vector of length *N*), index of strata for counts (strat, vector of length *N*), and index for first year of counting (firstyr, vector of length *N*).

Found at DOI: https://doi.org/10.3996/JFWM-20-079.S1 (797 KB TXT).

**Reference S1.** Seamans ME, Rau RD. 2019. American woodcock population status, 2019. Laurel, Maryland: U.S. Fish and Wildlife Service.

Found at DOI: https://doi.org/10.3996/JFWM-20-079.S2 (2.38 MB PDF).

**Reference S2.** Seamans ME, Rau RD. 2020. American woodcock population status, 2020. Laurel, Maryland: U.S. Fish and Wildlife Service.

Found at DOI: https://doi.org/10.3996/JFWM-20-079.S3 (1.61 MB PDF).

## Acknowledgments

The singing-ground survey is a cooperative effort coordinated by state, provincial, Canadian Wildlife Service, Birds Canada, and U.S. Fish and Wildlife Service biologists. We thank the many biologists who organize and conduct the survey. We also thank Guthrie Zimmerman, Tom Cooper, David E. Anderson, and the Associate Editor for helpful comments that improved the manuscript.

Any use of trade, product, website, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government.

## References

The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.

## Author notes

Citation: Sauer JR, Link WA, Seamans ME, Rau RD. 2021. American woodcock singing-ground survey: comparison of four models for trend in population size. *Journal of Fish and Wildlife Management* 12(1):83–97; e1944-687X. https://doi.org/10.3996/JFWM-20-079