Conservation planning depends on reliable information regarding the geographic distribution of species. However, our knowledge of species' distributions is often incomplete, especially when species are cryptic, difficult to survey, or rare. The use of species distribution models has increased in recent years and proven a valuable tool to evaluate habitat suitability for species. However, practitioners have yet to fully adopt the potential of species distribution models to inform conservation efforts for information-limited species. Here, we describe a species distribution modeling approach for at-risk species that could better inform U.S. Fish and Wildlife Service's species status assessments and help facilitate conservation decisions. We applied four modeling techniques (generalized additive, maximum entropy, generalized boosted, and weighted ensemble) to occurrence data for four at-risk species proposed for listing under the U.S. Endangered Species Act (Papaipema eryngii, Macbridea caroliniana, Scutellaria ocmulgee, and Balduina atropurpurea) in the Southeastern United States. The use of ensemble models reduced uncertainty caused by differences among modeling techniques, with a consequent improvement of predictive accuracy of fitted models. Incorporating an ensemble modeling approach into species status assessments and similar frameworks is likely to benefit survey efforts, inform recovery activities, and provide more robust status assessments for at-risk species. We emphasize that co-producing species distribution models in close collaboration with species experts has the potential to provide better calibration data and model refinements, which could ultimately improve reliance and use of model outputs.
Species status assessments and their role on listing decisions
Information on the geographic distribution of many at-risk species is often limited (Lomba et al. 2010). Yet, conservation decisions that may affect species persistence are regularly made despite lacking reliable knowledge on status (McDonald-Madden et al. 2008; Martin et al. 2017). Thus, efforts to increase the quality of information, including models to help target surveys, inform recovery planning, or assess the overlap of species distribution and threats, can increase the odds of making better conservation and management decisions for at-risk species (e.g., Rutrough et al. 2019).
Globally, the International Union for Conservation of Nature's Red List of threatened species (1963; https://www.iucnredlist.org/) has served as the main reference for species conservation status. In the United States, the U.S. Endangered Species Act (ESA 1973, as amended), aims to protect and recover imperiled species (including subspecies or any distinct population segment of any species) and the ecosystems upon which they depend. Since its passage, the ESA prevented the extinction of >90% of the species under its protection (Greenwald et al. 2019). However, mega-petitions that simultaneously propose hundreds of species for ESA listing as threatened or endangered have been a pervasive challenge given limited resources available to evaluate population status and support field surveys for each petitioned species (Wilde 2014; USFWS 2021). The U.S. Fish and Wildlife Service (USFWS) recently adopted a new standardized framework for conducting species status assessments (SSA) that intends to improve efficiency and rigor in ESA listing and management decisions. The aim of the SSA framework is to provide defensible, transparent, and transferable analytical approaches based on species' life-history requirements, current species condition, and projections for the future of the species under a variety of climate, urbanization, and management scenarios (Smith et al. 2018).
An essential component of an SSA is a comprehensive understanding of the ecology of the species considered for ESA listing, including known population locations and current distribution, abundance, and potential for local persistence (McGowan et al. 2017). Obtaining these data can be difficult for species whose distributions and habitat relationships are poorly understood. Therefore, managers rely on limited empirical data (e.g., as few as 20–30 occurrence records), existing literature, and expert knowledge to inform the status of a species. This often limits an individual SSA to basic descriptions of environmental conditions at locations where the species is known to be present, overlooking the conditions of areas where the species may persist undetected or where the species was historically present and may yet recolonize (Rutrough et al. 2019).
An ensemble species distribution modeling approach to inform species status assessments
A better understanding of at-risk species can be facilitated by using species distribution models that can complement existing species information (e.g., Rooper et al. 2016). Species distribution models (SDMs) have proliferated in multiple aspects of conservation, including identifying biodiversity hotspots, suitable habitat for vulnerable species, and managing invasive species (see reviews by Franklin 2013; Urbina-Cardona et al. 2019). A standard SDM describes empirical correlations between species occurrences and geospatially explicit environmental variables presumed to affect the species (Franklin 2009). This model can then be projected over different areas to measure their relative habitat suitability for the species. Many algorithms, from regression to machine learning, have been proposed to create those correlations, with multiple reviews highlighting the strengths and weaknesses of individual models (e.g., Elith et al. 2006; Jones-Farrand et al. 2011). This variety of options, however, can be confusing to conservationists working with rare or elusive species, where limited sample size, lack of structured monitoring to include both presence and absence, and limited understanding of species–habitat associations can compound model uncertainties (McCune 2016). An alternative to single SDM algorithms is the simultaneous use of multiple algorithms to generate an ensemble SDM (Araújo and New 2007). This ensemble model approach can be advantageous because it considers the variability in the predictions made by individual modeling techniques to produce a combined SDM that may improve such predictions (Meller et al. 2014).
The SDM literature has increased consistently in the past decades (Brotons 2014), but these models have not been fully adopted by practitioners developing SSAs. Therefore, we see an opportunity for SDM development that could contribute to multiple aspects of the status assessment process. These contributions include 1) help managers identify the relative likelihood of occurrence across the species' geographic range to guide strategic investment of survey efforts and increase the likelihood of detection (Aizpurua et al. 2015); 2) formalize our understanding of species associations with habitat conditions (current status) based on the relationships of species occurrences to the gradient of environmental predictor variables (Guisan and Thuiller 2005); 3) inform how species are likely to respond to changing conditions (future status; Elith and Leathwick 2009); 4) inform adaptive management decisions to promote species viability over time (Cassini 2011); 5) identify sites for species reintroductions (D'Elia et al. 2015); and 6) point to gaps in the species range that could signal geographic breaks in the species distribution, which might be important in terms of conserving cryptic genetic diversity. Our objective was to generate a framework to produce ensemble SDMs to inform SSAs for at-risk species. As a case study, we applied four SDM approaches: generalized additive modeling (GAM), generalized boosting modeling (GBM), maximum entropy (MaxEnt) and weighted ensemble to occurrence data for four at-risk species petitioned for ESA listing in the Southeastern United States to support SSA development. We discuss the advantages of using this method to strengthen SSAs for these and other species submitted for listing decisions.
Materials and Methods
In collaboration with the USFWS Southeastern Region and state agency partners, we identified four at-risk species petitioned for ESA listing and undergoing SSA development. The selected species were 1) Balduina atropurpurea Harper (purple-disk honeycombhead), a plant species associated with wet pine Pinus spp. flatwoods and savannas (Chafin 2000); 2) Macbridea caroliniana (Walter) S.F. Blake (Carolina birds-in-a-nest), a plant species associated with swamp forests (Sorrie 2011); 3) Papaipema eryngii Bird (rattlesnake-master borer moth), an insect associated with mesic and wet–mesic prairies (USFS 2003); and 4) Scutellaria ocmulgee Small (ocmulgee skullcap), a plant associated with moist hardwood forests on stream terraces (Chafin et al. 2016). All species nomenclature follow the current Integrated Taxonomic Information System database (ITIS 2021).
Species distribution model approach
We used an ensemble SDM approach to predict potential suitable habitat for each species based on the environmental conditions associated with known populations (Figure 1). We used two sets of data: (a) georeferenced occurrence records for each species; and (b) environmental predictor variables presumed to be associated with species presence in the form of digital raster surfaces. We processed all data using ArcMap 10.5, and conducted statistical analyses using Program R 3.5.2 (R Core Team 2017).
We collaborated with species experts to delineate a modeling extent for each species based on either their approximate native range (B. atropurpurea—216,746 km2 and M. caroliniana—368,254 km2) or within administrative units (P. eryngii—422,406 km2 and S. ocmulgee—199,517 km2; Figure 2). We identified relevant environmental variables (e.g., climatic, terrain, vegetation) based on available literature and expert knowledge. We obtained these geospatial data from different sources, including WorldClim (Hijmans et al. 2005) for climatic variables; Landsat satellite imagery collections on Google Earth Engine (Gorelick et al. 2017) for digital elevation models, vegetation, and surface reflectance variables; Landfire (U.S. Department of Agriculture 2019a) for fire frequency; national hydrography data set (U.S. Geological Survey 2018) for hydrology data; and SSURGO (Soil Survey Geographic Database; U.S. Department of Agriculture 2019b) for soil variables (see complete set of variables in Table 1). All environmental variables had a 30-m resolution with exception of WorldClim climatic variables (1 km), which were resampled to have a consistent 30-m resolution using a bilinear approach (the average value of the surrounding four pixels) and projected to a geographic coordinate system. We ran Pearson's correlations to select noncorrelated environmental variables. If two or more variables were highly correlated to each other (|r| > 0.7; Dormann et al. 2013) we retained those that better correspond to expected species ecology based on expert knowledge.
With the help of species experts and data providers, we compiled presence records for each of the four species from state Natural Heritage Programs and USFWS field offices (B. atropurpurea: n = 92, M. caroliniana: n = 200, P. eryngii: n = 66, and S. ocmulgee: n = 37). We removed records reported prior to 1990, labeled as locally extinct in the databases, or reported with an accuracy >50 m. We then applied a spatial filter on the remaining records by randomly selecting one occurrence for each 500-m distance to reduce spatial autocorrelation. Choosing one observation per pixel has been used to reduce spatial dependence of points (e.g., Syfert et al. 2013). Given that most of our predictor variables had 30-m resolution and some had 1-km resolution (i.e., Worldclim variables), we decided to use 500 m as the compromise distance to apply this spatial filter. We also used this distance to retain more of the few occurrences available to develop SDMs (Gaul et al. 2020). The final number of presence records per species was B. atropurpurea; n = 67, M. caroliniana: n = 115, P. eryngii: n = 54, and S. ocmulgee: n = 22 (Figure 2). We also modelled habitat suitability for the rattlesnake master plant, Eryngium yuccifolium Michx, because it is an obligate host for P. eryngii. In this case, we created a habitat suitability ensemble model for rattlesnake master in a similar fashion to the other species and used the resulting model's predictions as input to model P. eryngii (i.e., a hierarchical model; Gelman and Hill 2007).
When empirical data on species absences are not available, SDMs can be calibrated using a set of randomly generated geographic points where the species is assumed to not be present, or where the species may be present but was not observed (i.e., pseudoabsences). To generate pseudoabsences, we defined both a 2-km and a 50-km buffer around a presence record. We then defined a pseudoabsence sampling area as the polygon created by removing the 2-km buffer from the 50-km buffer (i.e., a torus centered on the recorded location). Omitting the 2-km center accommodates the common practice for Heritage Programs to consider such area as an approximate area for a unique population. For each species, we generated 500 pseudoabsence points across all pseudoabsence sampling areas for both model training and model evaluation.
We generated SDMs using three widely used modeling techniques: generalized additive model (GAM; Hastie and Tibshirani 1986); generalized boosted model (GBM; Friedman et al. 2000); and maximum entropy (MaxEnt; Phillips et al. 2006). Generalized boosted model and MaxEnt are machine-learning methods while GAM is an additive model approach. These models are well-documented in the literature (Elith et al. 2006; Phillips and Dudík 2008) and therefore not described here. We used different R libraries to run each model, including dismo for MaxEnt (Hijmans et al. 2017), mgcv for GAM (Wood 2018), and gbm for GBM (Greenwell et al. 2018). We ran each modeling algorithm with the same set of environmental predictor variables, presence and pseudoabsence data for each species to capture the influence of predictor variables in the models (Table S1, Supplemental Material). We projected each of the three calibrated models across the species' study areas to generate three separate predictions of habitat suitability.
Given the small sample size for each species, we used a leave-one-out approach to cross-validate the models that contrasted our set of presence and pseudoabsence records with a set of model-predicted presences and absences. We built a model using all presence and pseudoabsence records minus one observation (i.e., a training data set of size n − 1). We then used the omitted observation's environmental data to solve the estimated model to obtain a predicted value (i.e., a test value). We iterated this process to generate a list of predicted values of size n. We then compared these test values with the observed values to evaluate model performance using the area under the precision-recall curve (AUC-PR). Although the area under the receiver operating characteristic curve (AUC-ROC) is the most common metric, AUC-PR works well with small sample sizes, is robust to broad geographic extents, and is not sensitive to the use of pseudoabsences (Sofaer et al. 2019a). As a robustness check, we also calculated the AUC-ROC and correlation coefficients for each model. The range of each metric varied from 0 (poor model performance) to 1 (good model performance).
Our SDM outputs consisted of three habitat suitability index (HSI) maps for each species with pixel values ranging from 0 (low suitability) to 1 (high suitability). We generated a model ensemble based on weights to minimize uncertainties associated with each modeling approach (Marmion et al. 2009). The weight w of a particular model was calculated by dividing its AUC-PR score by the sum of the three AUC-PR scores (one for GAM, GBM, and MaxEnt). That is, we assigned models with greater AUC-PR a larger w in the final ensemble. The final ensemble model prediction HSIens for a given landscape pixel j was HSIens,j = wiHSIi,j where wi is the weight of the i-th model algorithm (GAM, GBM, or MaxEnt) and HSIi,j is the predicted suitability for the j-th pixel from the i-th algorithm. The final ensemble model for each species thus consisted of a habitat suitability map with pixel values ranging from 0 (low suitability) to 1 (high suitability) weighted by the relative explanatory power of each modeling algorithm. We calculated the performance metrics (AUC-PR, AUC-ROC, and correlation) for the model ensemble in a similar fashion to individual models. In the resulting ensemble maps, we considered suitable areas all pixels that surpassed the HSI threshold that maximized the sum of sensitivity (proportion of presences correctly predicted) and specificity (proportion of absences correctly predicted; Liu et al. 2005).
Individual models predicted different proportions of suitable habitat (Figure 2b), with GBM and MaxEnt being the most conservative with 8.71 and 10.91% of the study area deemed as suitable (once we considered the maximum sensitivity and specificity HSI thresholds of 0.25 and 0.5, respectively), whereas GAM was the most inclusive with 14.05% of the study area identified as suitable (HSI > 0.25) (Figure S1, Supplemental Material). Our ensemble habitat suitability map, however, indicated that 10% of the study area could be suitable for the species, with only 0.52% in the highest suitability bin (HSI > 0.8; Table 2). The explanatory environmental variables that consistently exhibited greater weights in the models were maximum winter temperature and fire frequency, whereas elevation and mean summer temperature had lower weights (Table 1). Among the three SDM approaches, GBM had the highest performance while GAM had the lowest (AUC-PR = 0.59 and 0.41, respectively). The ensemble model, however, had an AUC-PR = 0.79. Other model performance metrics also showed similar patterns (Table 3). We found that areas deemed as suitable in the three models generally overlapped (59.47% agreement in at least two models, and 32.97% agreement for the three models). The most suitable areas were found in the central and southern portions of the study area (Figure 2b; Figure S1, Supplemental Material).
We found that 9.15% of the study area could be suitable for the species according to the ensemble model (HSI > 0.31), while only 0.63% appeared in the highest suitability bin (HSI > 0.8; Table 2). Individual models estimated 11.58 (GAM, HSI > 0.22), 10.44 (GBM, HSI > 0.23) and 14.83% (MaxEnt, HSI > 0.28) of the study area as suitable for the species. The most important variable across the three individual models was distance to floodplain (Table 1). Generalized boosted model and MaxEnt had higher model performance (AUC-PR = 0.60 and 0.55, respectively), and the ensemble led to an AUC-PR = 0.90 (Table 3). We detected that 24.79% of suitable area was in agreement for three models, and 49.88% in agreement for at least two models. Most of the resulting suitable area in the ensemble model was located in the coastal plain, particularly in areas close to the floodplains (Figure 2b; Figure S1, Supplemental Material).
For this species, individual models estimated 6.85 (GAM, HSI > 0.23), 7.4 (GBM, HSI > 0.25) and 14.97% (MaxEnt, HSI > 0.36) of the study area as suitable. The ensemble model, however, estimated 7.22% of the study area as suitable, whereas areas in the highest suitability bin (HSI > 0.8) comprised only 0.38% (Table 2). Environmental predictor variables with greater importance for the models included the enhanced vegetation index, habitat suitability for the obligate host plant E. yuccifolium, and soil organic matter. Variables that had lower importance in the models were percent silt in soils and tree cover (Table 1). The GBM modeling approach yielded the best performance, while GAM performed the poorest (AUC-PR = 0.37 and 0.29, respectively). The ensemble model indicated an AUC-PR = 0.63 (Table 3). The tree models had a 14.76% agreement in the area deemed as suitable on their perspective models, while that proportion increased to 38.24% when considering the overlap of at least two models. Most of the areas with high suitability for the species were located toward the central-west and central-south portions of the study area, while other portions of suitable habitat were scattered in the remaining extent (Figure 2b; Figure S1, Supplemental Material).
The ensemble model for S. ocmulgee shows that approximately 3.19% of the study area was suitable for the species, while only 0.1% had high suitability (Figure S1, Supplemental Material;Table 2). The suitable area estimated by individual models was 2.68% for GAM (HSI > 0.25), 3.16% for GBM (HSI > 0.12) and 3.06% for MaxEnt (HSI > 0.53). The most consistently important environmental predictor variables in the models for this species were temperature, percent sand, and solar radiation, while elevation appeared to be the least important (Table 1). The GBM approach exhibited the greatest model performance and GAM the lowest (AUC-PR = 0.32 and 0.20, respectively), while the model ensemble had an AUC-PR = 0.67 (Table 3). For this species, the agreement of predicted suitable areas was only 18.18% between the three models but the percent increased to 45.87% agreement for two models. The model ensemble showed that areas with higher habitat suitability for the species were located in the central part of the study area (Figure 2b; Figure S1, Supplemental Material).
In our case studies, we identified potential suitable areas that can be candidates for surveys. Prioritizing locations within areas of greater suitability scores may increase survey efficiency by targeting smaller search areas with the potential for success (Table 2). In addition, during the modeling process we identified the species' associations with environmental predictor variable gradients that could help describe each species' ecological (realized) niche (Figure S2, Supplemental Material). For instance, we confirmed that one of the most relevant predictor variables for B. atropurpurea was fire frequency (Table 1). Fire is a determinant for the species because it requires open areas with periodic fire events (Lincicome 1998), and our model captured this trend. Models for P. eryngii agreed that habitat suitability for the host species E. yuccifolium was a good predictor for the presence of the species. We also noted that the species seemed to prefer areas with intermediate proportions of canopy cover (e.g., 40%) because a complete cover would impede the growth of the host species. Lastly, precipitation was the most important variable in S. ocmulgee models, matching the known association of the species with moist forest (Chafin et al. 2016). We found different suitable habitat for the species scattered in the central part of the study region, particularly near rivers, corresponding with the species' natural history (Chafin et al. 2016). S. ocmulgee was the most challenging species to work with given the small number of records available, which affected model performance. We therefore caution managers to treat predictions with caution. Nevertheless, our model approximated habitat suitability for the species based on available data, and our maps can be used to inform additional species surveys, which would then aid future model refinement.
Potential of ensemble species distribution models in species status assessments
Under the current SSA approach, USFWS personnel have to identify the “best available” data and complete the assessment within a short time frame (6–9 mo). Given the workload generated from mega-petitions this may represent a limiting factor to the successful completion of the SSA process. Whereas SDM development is not required for the SSA process, we believe distribution models can inform multiple aspects of an SSA. First, our results indicate SDMs can provide reliable, transparent, and defensible information on the potential spatial extent of a species (Villero et al. 2017). As such, our study produced reliable solutions within a geographic context to inform conservation efforts and benefit the SSA process through the ability to compare among different modeling approaches. Our ensemble models identified geographical areas of agreement of individual models (Figure S1, Supplemental Material), accounted for individual model performance (by using weights), and consistently ranked higher in model accuracy (Table 3). Therefore, the use of ensemble SDMs may be an improved approach to guide SSA assessments of current status. Moreover, ensemble SDMs provide a standardized, transferable modeling approach that may enhance consistency in the status assessment while making distribution products comparable across spaces and species (Araújo et al. 2019).
Our results suggest that areas identified as highly suitable in the ensemble SDM would be recognized as priority sites for conservation efforts. For instance, surveys aiming to detect additional populations can improve status assessments. Targeting surveys in areas with high HSI may improve the detection of at-risk species (Meller et al. 2014; McCune 2016) and consequently, improve the reliability of the information used for listing decisions. Furthermore, stratifying conservation efforts (e.g., surveys, recovery, and translocation planning) based on suitability gradients would indicate where to invest greater effort (i.e., predicted high-suitability locations). In this context, SDM outputs used as continuous (0–1) instead of categorical data (suitable vs. nonsuitable) would best avoid loss of suitability information (Guillera-Arroita et al. 2015).
By relating environmental predictor variables with occurrence data, managers can gain additional quantitative insight into how species respond to environmental gradients (e.g., Figure S2, Supplemental Material). This is particularly relevant for many rare and at-risk species targeted by SSAs, whose known habitat requirements are mostly limited to descriptive habitat associations. Further, SDM development could also function in a predictive context (future status) by incorporating climate and management scenarios, enhancing the conservation value of SDMs (Wiens et al. 2009; Porfirio et al. 2014; Lyon et al. 2019).
Adopting the use of SDMs in the SSA process would require investments in terms of technical capacity and training. Nonetheless, once the initial investment is complete, work with new species would require less time and resources, something particularly important in the context of mega-petitions. For instance, one major and time-consuming component in developing an SDM is the initial effort to compile relevant environmental predictor variables for a species of interest. Our study indicated that many environmental variables used for one species were also useful for other species (Table 1). This is because our case study species were located in the same region (i.e., Southeastern United States) and likely responded to similar environmental factors. Regardless, substantial effort to identify common data layers at regional extents prior to model fitting can simplify the modeling process for multiple species within the same region. Similarly, once the code for generating individual and ensemble models is built for one species, it will take substantially less time to adapt it for other species. Given the transferability of coding frameworks, we recommend future efforts also focus on development of on-the-fly ensemble model predictions in a web-enabled, user-guided geoplatform.
Perceived species distribution model challenges precluding their development
Although SDM studies have become common in the past decades, they are yet to be fully adopted in conservation planning (Tulloch et al. 2016). In practice, conservation organizations may still consider modeling as challenging for multiple reasons, including that SDM development is technically difficult, not reliable, or expensive (Joppa et al. 2013; Tulloch et al. 2016). In this study we illustrated how generating ensemble SDMs has the potential to inform several aspects of SSAs and that most perceived challenges can be surmounted. Generating an ensemble SDM requires knowledge of digital data manipulation and programming; however, there are existing tools and libraries available in the SDM field to facilitate both data manipulation and processing (Ahmed et al. 2015). For instance, the code we developed to generate an ensemble SDM (Text S1, Supplemental Material) can be adapted for specific needs of other species. There might also be reluctance by practitioners to develop SDMs based on perceived lack of reliability. In this regard, our study confirmed that different model algorithms produced different outputs (Qiao et al. 2015). Importantly, this is one of the strengths of ensemble models. The ensemble model considered individual model agreement and weighted each individual model by its relative model performance (Breiner et al. 2015), thereby increasing objectivity and improving support for the decision-making process.
Another perceived shortcoming of SDMs is that resolution of model inputs might not reflect species' ecological requirements, but recent developments in the remote sensing community provide environmental variables across large geographic extents at finer spatial and temporal resolutions (Leitão and Santos 2019). Lastly, practitioners might question the reliability of models built with a limited number of locations. Indeed, despite considerable advancements in tools and techniques, the efficacy of SDMs lies in the quality of data used to build the models, making presence data a major limiting factor (Phillips et al. 2009, Gaul et al. 2020). Conversely, SDMs can produce spatial predictions that may help field biologists target new survey areas (Fois et al. 2018). If an SDM is developed well in advance to the SSA, additional species locations from such model-guided surveys could be used to validate or revise the SDM to increase its reliability for use in the SSA.
Similar to other conservation methods, there is no single SDM approach that fits the needs and conditions of all practitioners developing SSAs. Whereas our approach was appropriate for our needs, and we hope that SDM developers use it as a guide for their own work in SSAs, this methodology still has room for improvement depending on particular needs and level of information available. For instance, we used an arbitrary distance to restrict the drawing of pseudoabsences given the lack of empirical information on proper biological distances. An improved model for the species could use the dispersal capacity of the species as the distance used to restrict such calibration area. Another avenue of improvement could be calibrating the models using stratified presence–absence samples based on the complexity of environmental conditions in the landscape (Hirzel and Guisan 2002). As such, more (or less) presence and pseudoabsences could be drawn depending on the heterogeneity of environmental conditions. Further, our filtering decision for presence data (i.e., randomly select one presence at a distance equal of the largest pixel size of environmental data) was chosen to retain as many records as possible given our relatively small sample sizes. This approach did not take into account other possible biases in the sample, such as greater survey efforts in areas with easy access. Enhanced analyses can make use of tools to reduce that potential bias to minimize overfitting and increase performance (Boria et al. 2014). In addition, although we aimed to have a balanced sample size for presence and pseudoabsences across individual SDM algorithms and equal data for model validation, MaxEnt is known to perform well with larger numbers of background points. Thus, a MaxEnt only model would be better tuned by including a larger size of background points (Phillips and Dudík 2008). Finally, given that we did not know the range for several species, we extrapolated the model to the political boundaries containing presences (e.g., Figure S1, Supplemental Material). Improved extrapolations can use a multivariate environmental suitability surface (MESS) as a measure of the similarity between new environments and those in the calibration sample (Elith et al. 2010).
Despite the vast number of ways to create SDMs, SSA developers should evaluate their model needs before selecting a particular approach. This is important, because SDMs can become very complex and require significant investment in familiarity with SDM tuning options. For instance, a model aimed to have high sensitivity (even if it generates overprediction) can be preferred for identifying additional sites for surveys, while a model aimed to have high specificity (choosing areas where the species is highly likely to be present) would be preferred when designing reserves (Barbet-Massin et al. 2012). In our work we aimed to have a balance of model sensitivity and specificity so that model outputs could be relevant for a wide range of uses in SSAs (e.g., improving surveys, finding translocation sites, finding species and habitat relationships, and determining status), especially for species with limited data. Further, our models provide a foundation for building our knowledge and understanding of rare and vulnerable species. Ultimately, the model choice and calibration settings can be adapted depending of the decision context (Villero et al. 2017).
Simple but effective models that address model study objectives, consider the attributes and limitations of the data used, and provide a framework for how these interact with the underlying biological processes, are useful tools for conservation decision-making (Merow et al. 2014; Sofaer et al. 2019b). As such, we aimed to create an approach to develop ensemble models with relatively simple calibration settings for individual algorithms that can guide novice SDM developers and be useful for at-risk species. Others have argued that single, high-tuned models can yield results as good as or even better than ensemble models (Hao et al. 2020). However, in the decision context of SSAs to support ESA listing decisions, legal and workload constraints often demand that models be developed rapidly with the data in hand. We believe that use of an ensemble SDM produces a more transparent and defensible decision because the ensemble model encapsulates the current state of knowledge of the species with clear assumptions and balances the biases inherent in individual modeling approaches. In any case, we recommend identifying the strengths and limitations of the SDM in a transparent way. For instance, Sofaer et al. (2019b) developed a rubric for communicating the credibility of SDMs for decision-making (e.g., see this table for our case studies in Table S2, Supplemental Material). In addition, it is useful for decision-makers to recognize the variation of individual or ensemble models. For instance, we provided maps showing areas with higher predicted likelihood of suitability across models (e.g., Figure S1, Supplemental Material). Finally, it is important to consider that even if a model has a good discriminatory accuracy (e.g., high AUC-ROC scores for correctly predicted occurrences during model testing), this does not guarantee that models reflect a good functional accuracy (e.g., a model's ability to make continuous estimates of relative habitat suitability). Making sure that models are simple and reflect known biology for the species can close the gap between these two types of accuracy and improve the usability of the SDM outputs (Warren et al. 2020).
Concluding remarks on developing effective ensemble species distribution models
Developing an effective SDM requires close collaboration between multiple model participants, and preferably those involved in the decision process, which can also improve use (Sofaer et al. 2019b). In our project, these participants included conservation agencies and organizations responsible for the species SSA, a modeling team, data providers, and experts with local knowledge of the species' biological needs (Figure 3). Although not all SSAs require generation of novel SDMs, the structure used here provides a useful model for effective co-production of SDMs, and science in general. The decision-making organization facilitates coordination of participants to obtain the necessary resources and sustain appropriate communication among other participants. Modelers provide the technical expertise to compile, harmonize, and process data and create model outputs. Species experts are essential to inform model development, because they can help model developers select relevant environmental variables that reflect the species' natural history and also help locate additional calibration data. Later on, species experts can also help other participants assess iterative model outputs. Finally, the decision-making organization coordinates and verifies that model outputs are archived and delivered to practitioners.
Mechanisms to promote effective collaboration will require funding for initial and subsequent model development, clear communication about the existence and performance of the model, and perennial free access to modeling code and outputs. In the spirit of a co-production approach, a close and iterative collaboration among participants will help ensure that the outputs of SDM lead to actionable outcomes (Guisan et al. 2013; Sofaer et al. 2019b). Finally, this collaborative approach should be initiated such that it results in the timely development of SDMs (i.e., in time to inform the decision). Specifically for SSAs, modelers should, at a minimum, be involved in the early planning stages to help weigh the appropriateness of modeling and help determine the most appropriate approach within the decision context (e.g., budget, timeline, capacity). The earlier in the process that models can be generated the better because it allows more time for review and revision. This will ensure that the best possible model is readily available to support SSA development and ultimately policy and management decisions for the target species.
In summary, SDMs have the potential to improve the listing decision process but these models have not been fully adopted as standard practice when developing SSAs. We suggest that using ensemble SDMs can benefit the SSAs or other similar processes by providing objective and defensible information about species and habitat relationships, as well as a spatial characterization of the species suitable habitat. This information can then be used to optimize species surveys, relocate or reintroduce individuals, support strategic planning for current and future climate and management threats, and with external partners for habitat conservation planning. Policies for conservation of at-risk species in other regions may also benefit from this approach. However, these benefits require that a specific ensemble SDM adequately addresses stakeholder needs in a timely manner via model co-production, and that stakeholders commit to updating the model over time to reflect what they have learned through the model's application.
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.
Table S1. Microsoft excel file containing the environmental predictor values for each presence and pseudoabsence points used in 2020 to create habitat suitability maps. Each of the tabs corresponds to each of the four at-risk species in the Southeastern United States for which we created a habitat suitability map in support of species status assessments.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S1 (175 KB XLSX).
Table S2. Description of model quality for each at-risk species for which we developed habitat suitability maps in the Southeastern United States in support of species status assessments in 2020. The model categories and scores are based on Sofaer et al. (2019b) guidelines to report species distribution models for decision-making.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S2 (26 KB DOCX).
Figure S1. Large output figures for each modeling technique used to create habitat suitability maps for four at-risk species in the Southeastern United States in support of species status assessments in 2020. For each species and modeling technique, we show (1) a set of maps of predicted habitat suitability index and (2) a set of maps with binary suitable and nonsuitable areas defined by their model-specific maximum sum of sensitivity and specificity threshold. We also include a compound map showing how many times individual models identified a specific pixel as suitable.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S3 (7.04 MB PDF).
Figure S2. MaxEnt response curves for the environmental predictor variables used to model habitat suitability in 2020. We present these curves for each of the four at-risk species in the Southeastern United States for which we created habitat suitability map in support of species status assessments.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S4 (697 KB PDF).
Text S1. Example of code used to generate individual (generalized additive modeling [GAM], generalized boosting modeling [GBM], maximum entropy [MaxEnt]) and Ensemble models. The code was used in 2020 to generate habitat suitability maps for four at-risk species in the Southeastern United States in support of species status assessments. The code is written in Program R programming language.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S5 (39 KB DOCX).
Reference S1.[USFS] U.S. Forest Service. 2003. Conservation assessment for Eryngium root borer (Papaipema eryngii). Milwaukee, Wisconsin: Eastern Region of the Forest Service—Threatened and Endangered Species Program.
Found at DOI: https://doi.org/10.3996/JFMW-20-072.S6 (133 KB PDF).
Reference S2. Lincicome DA. 1998. The rare perennial Balduina atropurpurea (Asteraceae) at Fort Stewart, Georgia. Champaign, Illinois: US Army Corps of Engineers. USACERL Technical Report 98/75.
Found at DOI: https://doi.org/10.3996/JFWM-20-072.S7 (7.22 MB PDF).
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any archived material. Queries should be directed to the corresponding author for the article.
To cite this archived material, please cite both the journal article (formatting found in the Abstract section of this article) and the following recommended format for the archived material.
All the high-resolution ensemble suitability maps created for four at-risk species in the Southeastern United States in support of species status assessments have been archived and can be found at ScienceBase: https://www.sciencebase.gov/catalog/item/5f5117f082ce4c3d12385940
We want to thank our collaborators at the USFWS (M. Harris, K. Lundh, M. Lombardi, A. Punsalan, M. Elmore, D. Caldwell), State Fish & Wildlife Agencies (A. Fowler, T. Patrick), NatureServe (H. Hamilton, R. Smith), species experts (J. Wiker, J. Bess, K. Bradley, Cecelia N. Dailey), and all the other partners that provided guidance in this project. Two anonymous reviewers and the Associate Editor provided comments that improved an earlier version of this manuscript. Two of the authors are U.S. Government Employees. This work was prepared as part of their official duties. Funding for this project was available thanks to cooperative agreement # F17AC00899 with the U.S. Fish and Wildlife Service. This project was also supported by McIntire-Stennis Funds # 1019539 and the Forest and Wildlife Research Center at Mississippi State University.
Any use of trade, product, website, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government.
The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.
Citation: Ramirez-Reyes C, Nazeri M, Street G, Jones-Farrand DT, Vilella FJ, Evans KO. 2021. Embracing ensemble species distribution models to inform at-risk species status assessments. Journal of Fish and Wildlife Management 12(1):98–111; e1944-687X. https://doi.org/10.3996/JFWM-20-072