Sea turtle strandings due to cold-stunning are seen when turtles are exposed to ocean temperatures that acutely and persistently drop below approximately 12°C. In North Carolina, this syndrome affects imperiled loggerhead Caretta caretta, green Chelonia mydas, and Kemp's ridley Lepidochelys kempii sea turtle species. Based on oceanic and meteorological patterns of cold-stunning in sea turtles, we hypothesized that we could predict the daily size of cold-stunning events in North Carolina using random forest models. We used cold-stunning data from the North Carolina Sea Turtle Stranding and Salvage Network from 2010 to 2015 and oceanic and meteorological data from the National Data Buoy Center from 2009 to 2015 to create a random forest model that explained 99% of the variance. We explored additional models using the 10 and 20 most important variables or only oceanic and meteorological variables. These models explained similar percentages of variance. The variables most frequently found to be important were related to air temperature, atmospheric pressure, wind direction, and wind speed. Surprisingly, variables associated with water temperature, which is critical from a biological perspective, were not among the most important variables identified. We also included variables for the mean change in these metrics daily from 4 d before the day of stranding. These variables were among the most important in several of our models, especially the change in mean air temperature from 4 d before stranding to the day of stranding. The importance of specific variables from our random forest models can be used to guide the selection of future model predictors to estimate daily size of cold-stunning events. We plan to apply the results of this study to a predictive model that can serve as a warning system and to a downscaled climate projection to determine the potential impact of climate change on cold-stunning event size in the future.
Each year, the U.S. Sea Turtle Stranding and Salvage Network (STSSN), formally established in 1980, documents hundreds to thousands of sea turtle strandings due to cold-stunning along the Atlantic coast of the United States (Southeast Fisheries Science Center 2019). In North Carolina, three sea turtle species are affected and are listed under the U.S. Endangered Species Act (ESA 1973, as amended): the threatened green Chelonia mydas (U.S. Fish and Wildlife Service 1978) and loggerhead Caretta caretta (U.S. Fish and Wildlife Service 1978) sea turtles, as well as the endangered Kemp's ridley Lepidochelys kempii (U.S. Fish and Wildlife Service 1970) sea turtle. Sea turtle strandings due to cold-stunning are seen when ocean temperatures acutely and persistently drop below approximately 12°C (Burke and Standora 1991; Morreale et al. 1992; Bentivegna et al. 2002; Still et al. 2005; Roberts et al. 2014; Innis and Staggs 2017).
Sea turtles affected by cold-stunning cease eating, are lethargic, and become positively buoyant because of gas in the gastrointestinal tract (Schwartz 1978). Turtles become stranded on beaches where they may be exposed to even colder temperatures and can die without human intervention. During cold-stunning season, STSSN volunteers and cooperators monitor beaches for stranded animals, which are reported, presented for rehabilitation, and, in the majority of cases, survive to be released once they recover and ocean temperatures have improved (Avens et al. 2012a). The number of turtles stranding due to cold-stunning is increasing (Still et al. 2002, 2005; Avens et al. 2012b; Christiansen et al. 2016; Griffin et al. 2019), and extensive preparation, person-hours, and financial resources are necessary each North Carolina winter. Those actively involved in responding to cold-stunning events are familiar with general oceanic and meteorological patterns that result in strandings (Innis and Staggs 2017), but identification of specific variables and timelines could improve prediction and enhance response efficacy.
In addition to decreased temperatures, wind direction and speed have been identified as factors contributing to cold-stunning strandings (Witherington and Ehrhart 1989; Burke and Standora 1991; Innis and Staggs 2017). In a study in Cape Cod Bay, Massachusetts using classification tree models, Still et al. (2005) found that juvenile Kemp's ridley and loggerhead sea turtles strand due to cold-stunning during slightly different oceanic and climatic conditions, with the smaller Kemp's ridleys stranding earlier in the season at slightly greater sea surface temperatures than loggerheads. A study from the northwest Atlantic predicted that the number of Kemp's ridley turtles stranding annually will continue to increase as late summer to late autumn sea surface temperature increases result in changes to the species' distribution (Griffin et al. 2019), while a study from the western Gulf of Mexico also showed increased numbers of green turtles stranding over time (Shaver et al. 2017). These findings, along with the institutional memory of the state wildlife agency and those involved with the stranding network, suggest that cold-stunning event occurrence and size may be predictable.
The random forest (RF) model (Breiman 2001) is an ensemble model that is an extension of decision tree models. Decision trees make predictions by examining best candidate splits, making additional splits until no improvements can be made and the tree is complete (Williams 2017). The RF model uses bootstrapping to repeatedly sample the data set and create multiple trees using a subset of the available variables (Williams 2017). Each bootstrap leaves approximately 37% of the data out of bag (i.e., unused records from the original data set), which can be used to determine the accuracy of the RF and can eliminate the need for training and validation data sets (Breiman 1996; Williams 2017). An RF model is useful in identifying the variables that are the most important to the performance of the model. The importance ranking of the variables can then be used to guide predictor selection for other model types.
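The figure of approximately 37% follows from sampling with replacement: a given record is missed by all n draws with probability (1 − 1/n)^n, which approaches e^−1 ≈ 0.368 as n grows. The following is a minimal Python sketch of that arithmetic, not part of the original analysis; the function name, seed, and number of bootstraps are illustrative assumptions, and n = 827 is borrowed from the number of stranding records only as an example size.

```python
import math
import random

def oob_fraction(n_records, n_bootstraps=200, seed=0):
    """Average share of records left out of a bootstrap sample of size n_records."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bootstraps):
        # Draw n_records indices with replacement; the set keeps the distinct ones.
        in_bag = {rng.randrange(n_records) for _ in range(n_records)}
        total += 1 - len(in_bag) / n_records
    return total / n_bootstraps

n = 827                      # example size only (assumption)
simulated = oob_fraction(n)  # empirical out-of-bag share
exact = (1 - 1 / n) ** n     # probability a record is never drawn
limit = math.exp(-1)         # large-n limit, about 0.368

print(f"simulated={simulated:.3f} exact={exact:.3f} limit={limit:.3f}")
```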
Based on oceanic and meteorological patterns of sea turtle cold-stunning events, we hypothesized that we could predict the daily size of cold-stunning events in North Carolina using RF models. Using stranding records and ocean buoy data, our study identified the most important variables predictive of cold-stunning daily event size in North Carolina. The predictability of these events suggests the potential for development of a warning system to enhance recovery efforts as well as the potential of insights into the impact of future climate change.
The North Carolina Wildlife Resources Commission collected data for sea turtle strandings due to cold-stunning in North Carolina from 2010 to 2015 (n = 827 turtles) as part of the efforts of the STSSN. These data included individual stranding identification number, stranding date, species, sex, life stage, stranding site (inshore or offshore), county, beach name, latitude, longitude, and morphometric measurements (straight- or curved-line carapace measurements and mass), as well as additional information not utilized by this study (Data S1, Supplemental Material). A full suite of morphometric measurements was not always available for each individual. Stranding locations are shown in Figure 1.
We obtained oceanic and meteorological data for 19 buoys in North Carolina (Data S2, Supplemental Material) from the National Data Buoy Center (NDBC; National Oceanic and Atmospheric Administration 2019) for 2009 to 2015 (Data S3, Supplemental Material). Data included measurement date, time, atmospheric pressure, air temperature, water temperature, wind direction, and wind speed, as well as additional information not used in this study. Buoy locations are shown in Figure 1.
All data preparation and analyses were performed using R version 3.3.3 (R Core Team 2017) and RStudio version 1.0.143 (RStudio Team 2016). Sea turtle morphometric data are summarized by species in Table 1. A limitation of the RF model is its inability to handle incomplete data sets. While most sea turtles had both straight- and curved-line carapace measurements, some had either straight- or curved-line measurements, and rare individuals had neither. Additionally, body mass was not available for 432 individuals. For an RF model to function, any individual with a missing measurement would need to be excluded. To address this, we constructed five groups for each morphometric variable by species and assigned each individual to a group. The first four groups were bounded by the minimum value, first quartile, median, third quartile, and maximum value (e.g., group two contained values falling between the first quartile and median). The fifth group contained individuals with missing measurements. Conversion formulae are available to estimate straight- or curved-line carapace length measurements from notch to tip when only one is available (Avens and Goshe 2007; Goshe et al. 2010), but we elected to rely on the measured data and do not anticipate that this affected our results, as most turtles had both measurements. Our data set included 189 loggerhead, 461 green, and 177 Kemp's ridley sea turtles. One hundred fifty-four were classified as female, 59 as male, and 614 were of unknown sex. Nearly all turtles were in the juvenile life stage (n = 815 juveniles, n = 6 adults, n = 6 not available). The most frequent stranding day of the year (mode) was the 38th day, fifth day, and ninth day for loggerhead, green, and Kemp's ridley turtles, respectively. Most turtles were found stranded inshore (n = 667) vs. offshore (n = 160) and were found in five separate counties on 11 beaches.
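The five-group scheme for morphometric variables can be illustrated with a short Python sketch. This is not the code in Data S5; the function name and the example carapace lengths are hypothetical. Two details are assumptions because the text does not state them: values that fall exactly on a quartile boundary go to the lower group, and quartiles are computed with the "inclusive" method of Python's statistics module.

```python
import statistics

def quartile_groups(values):
    """Assign each measurement to group 1-4 by quartile; missing (None) goes to group 5."""
    observed = [v for v in values if v is not None]
    # Three cut points: first quartile, median, third quartile.
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    groups = []
    for v in values:
        if v is None:
            groups.append(5)   # missing measurement
        elif v <= q1:
            groups.append(1)   # minimum to first quartile
        elif v <= median:
            groups.append(2)   # first quartile to median
        elif v <= q3:
            groups.append(3)   # median to third quartile
        else:
            groups.append(4)   # third quartile to maximum
    return groups

# Hypothetical straight-line carapace lengths (cm) for one species
scl = [24.0, 26.5, None, 28.0, 30.0, 31.5, 33.0, None, 35.5, 40.0]
groups = quartile_groups(scl)
print(groups)
```

Treating "missing" as its own category lets every individual stay in the data set, at the cost of turning a continuous measurement into an ordinal one.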
We set the response (dependent) variable for the RF model to the daily size of cold-stunning events. During our study period there were 178 unique days of cold-stunning events with a median and mean size of 10 and 22 turtles/d, respectively (range 1–89 turtles/d).
We bulk decompressed the NDBC data, concatenated the separate buoy files into a single file, and imported the data into R for further processing (Data S4, Supplemental Material). The raw data were incomplete and some buoys were sporadically missing information. Invalid data were labeled as an impossible value for the metric, typically some variation of “99.0,” and as part of the data postprocessing steps, those values were marked as not available in R. To retrieve the weather conditions at each turtle stranding location, we used the geosphere version 1.5-5 package to select the buoy with the geographically nearest valid data set. We considered a data set to be valid if at least 95% of a day's measurements were available (not all buoys were completely functional on a daily basis). If a valid data set was not available, we selected the next closest buoy with a valid data set.
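The buoy selection logic can be sketched as follows in Python. This is an illustration under stated assumptions, not the R code in Data S5: the haversine great-circle distance stands in for whichever geosphere distance function was used, the buoy identifiers, coordinates, and readings are hypothetical, the missing-value code is passed in per variable (999.0 is assumed here for air temperature), and a day is assumed to contain 24 expected measurements.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def clean(readings, sentinel):
    """Replace the placeholder code for invalid data with None."""
    return [None if r == sentinel else r for r in readings]

def is_valid_day(readings, sentinel, expected, threshold=0.95):
    """A day is valid if at least `threshold` of the expected readings are available."""
    available = sum(r is not None for r in clean(readings, sentinel))
    return available / expected >= threshold

def nearest_valid_buoy(lat, lon, buoys, sentinel, expected=24):
    """Return the id of the closest buoy with a valid day, or None if there is none."""
    ranked = sorted(buoys, key=lambda b: haversine_km(lat, lon, b["lat"], b["lon"]))
    for buoy in ranked:
        if is_valid_day(buoy["readings"], sentinel, expected):
            return buoy["id"]
    return None

# Hypothetical buoys: "A" is closer but has 5 of 24 readings flagged as invalid.
buoys = [
    {"id": "A", "lat": 34.7, "lon": -76.6, "readings": [8.0] * 19 + [999.0] * 5},
    {"id": "B", "lat": 34.2, "lon": -77.8, "readings": [9.5] * 24},
]
chosen = nearest_valid_buoy(34.7, -76.7, buoys, sentinel=999.0)
print(chosen)
```

Ranking all buoys by distance first and then testing validity in order gives the "next closest buoy" fallback without any special-case code.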
Strandings due to cold-stunning imply a drop in temperature. The lag at which the most turtles had experienced a decrease in mean water or air temperature relative to the day of stranding was 3 d (n = 689) and 4 d (n = 666) before stranding, respectively. Therefore, all oceanic and meteorological variables were calculated to a maximum of 4-d lag before the day of stranding. We calculated minimum, mean, maximum, and mean change compared with the day of stranding for all variables except wind direction, where only mean was calculated (Data S4, Supplemental Material). Oceanic and meteorological variable names and descriptions are in Table S1 (Supplemental Material).
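A minimal Python sketch of how such lagged variables could be derived for one metric is shown below. It is not the code in Data S5: the function name, the variable naming scheme, the dates, and the temperatures are hypothetical, and the sign convention for the change (day of stranding minus the earlier day, so that cooling is negative) is an assumption.

```python
from datetime import date, timedelta
from statistics import mean

def lag_features(daily_readings, stranding_day, max_lag=4):
    """Minimum, mean, and maximum per day for lags 0..max_lag, plus the
    change in the daily mean between each earlier day and the stranding day."""
    features = {}
    day0_mean = mean(daily_readings[stranding_day])
    for lag in range(max_lag + 1):
        day = stranding_day - timedelta(days=lag)
        readings = daily_readings[day]
        features[f"min_lag{lag}"] = min(readings)
        features[f"mean_lag{lag}"] = mean(readings)
        features[f"max_lag{lag}"] = max(readings)
        if lag > 0:
            # Assumed sign convention: negative values indicate cooling.
            features[f"delta_mean_lag{lag}"] = day0_mean - mean(readings)
    return features

# Hypothetical air temperatures (degrees C) for a stranding day and the 4 d before it
day0 = date(2015, 1, 10)
air_temp = {
    day0 - timedelta(days=4): [12.0, 14.0],
    day0 - timedelta(days=3): [10.0, 12.0],
    day0 - timedelta(days=2): [8.0, 10.0],
    day0 - timedelta(days=1): [5.0, 7.0],
    day0: [1.0, 3.0],
}
features = lag_features(air_temp, day0)
print(features["delta_mean_lag4"])
```

Restricting the output to keys with a lag of at least 1 yields only the variables that would be available before the day of stranding.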
All RF models were calculated using the randomForest package version 4.6-12 (Data S5, Supplemental Material). We first tuned each model for the optimal number of variables tried at each split (mtry) with 500 trees starting at two variables at each split. The optimal mtry minimizes the out-of-bag error. We ran each RF model with its optimal mtry and 500 trees. We examined a plot of error vs. the number of trees for each model. We recorded the percent variance explained and importance of specific variables using mean decrease accuracy (%IncMSE) for each model.
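For a regression forest, the percent variance explained can be read as a pseudo R-squared computed from out-of-bag predictions. The Python sketch below assumes the definition 100 × (1 − MSE_OOB / Var(y)) with the population variance in the denominator, which is our understanding of what the randomForest package reports; the function name and the numbers are hypothetical and are not results from this study.

```python
from statistics import pvariance

def percent_variance_explained(observed, oob_predicted):
    """100 * (1 - MSE / Var(y)), with MSE taken over out-of-bag predictions."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, oob_predicted)) / n
    return 100 * (1 - mse / pvariance(observed))

# Hypothetical daily event sizes and out-of-bag predictions
observed = [1, 10, 22, 89]
oob_predicted = [2, 9, 24, 87]
pve = percent_variance_explained(observed, oob_predicted)
print(round(pve, 2))
```

Under this definition, a model that always predicts the mean of the response scores 0%, and a model whose out-of-bag error exceeds the variance of the response scores below 0%.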
We ran five RF models with our data set. We ran the first model with the full suite of 95 turtle and oceanic and meteorological variables. We ran second and third models to determine the effectiveness of models limited to only the most important variables and used the top 10 and 20 variables identified from the full model, respectively. We ran the fourth model using only the 81 oceanic and meteorological variables to determine model efficacy without turtle data input. We ran the fifth model to examine model efficacy using only the 52 oceanic and meteorological variables available before the day of stranding.
The complete R code is available in Data S5 (Supplemental Material). We have reported summary statistics for the oceanic and meteorological variables associated with sea turtle strandings due to cold-stunning in Table S1 (Supplemental Material). The full RF model with mtry set to four had 99.07% variance explained. A plot of error vs. the number of trees shows that the default number of 500 trees was sufficiently stable (Figure 2). Figure 3 shows the variables by decreasing importance. The exact values are available in Table S2 (Supplemental Material).
The top 20 variables from the full model are shown in order of decreasing importance in Figure 3 and listed, with descriptions, in Table 2 (see Table S2, Supplemental Material, for numerical values). The top 10 model with mtry set to two had 99.1% variance explained, whereas the top 20 model with mtry set to two had 99.25% variance explained. Figure 4 shows the variables in order of decreasing importance. The model with only oceanic and meteorological variables and an mtry of three had 99.16% variance explained, whereas the model using only oceanic and meteorological variables before the day of stranding and mtry set to four had 99.03% variance explained. Figure 5 shows the variables in order of decreasing importance.
All five of the RF models fit to the data explained greater than 99% of the variance. This supports our hypothesis that daily size of cold-stunning events in North Carolina is predictable and not a result of random chance. The most important variables from the full model and the success of the models using only oceanic and meteorological variables suggest that daily size of cold-stunning events in North Carolina is not dependent on sea turtle variables, such as species or size. Lastly, the model ignoring oceanic and meteorological variables from the day of stranding was also successful, supporting the goal of a system that could warn responders at least 1 d before a cold-stunning event.
The variables we found to be important most frequently were those related to air temperature, atmospheric pressure, wind direction, and wind speed. Cold fronts with winter storms decrease air temperature and atmospheric pressure and can produce strong winds, while wind direction can directly cause buoyant cold-stunned turtles to become stranded. Still et al. (2005) found average air temperature to be of moderate importance in their classification tree models, but sea surface temperature was the most dominant of the variables included. In our RF models, variables associated with water temperature were not among the most important variables identified for predicting cold-stunning event size. This unexpected result may be due to our models using daily size of cold-stunning events as the response variable, whereas Still et al. (2005) modeled presence/absence. Most of the stranding locations were inshore and water temperature data from inshore buoys were geographically closest in all but 18 cases, which suggests that potentially warmer offshore water temperatures are unlikely to have masked the importance of water temperature in our models. Assigning all turtles with an offshore stranding location to only offshore buoys would have considerably increased the distances between turtles and buoys and was less likely to accurately represent the local conditions vs. using closer, but inshore, buoys. Changes in water temperature may be a good predictor of whether cold-stunning will occur, but because winter water temperature varies less than air temperature, while also having a lower limit of 0°C, it may not be suitable as a predictor for the size of a daily cold-stunning event. Other studies and models have identified wind direction and speed as factors contributing to cold-stunning strandings (Witherington and Ehrhart 1989; Burke and Standora 1991; Still et al. 2005) and it has been recommended that search teams use wind direction to determine areas likely to have stranded turtles, such as windward beaches (Innis and Staggs 2017).
In addition to variables using the minimum, mean, and maximum values for air temperature, water temperature, atmospheric pressure, and wind speed, we included variables for the mean change in these metrics daily from 4 d before the day of stranding. Cold-stunning can be classified as acute or chronic. Acute cases are more frequent in the southern United States, are caused by atypical winter weather events, usually last less than 2 wk, and although they may affect large numbers of turtles in shallow waters, individuals are generally healthier and recover more quickly than chronic cases (Innis and Staggs 2017). We hypothesized that including a delta variable may better capture acute cold-stunning events. These variables were among the most important in several of our models, especially the change in mean air temperature from 4 d before stranding to the day of stranding, and have the potential to be useful for predicting daily size of cold-stunning events on the basis of forecast future air temperatures. Interestingly, despite the definition of cold-stunning denoting a drop in temperature, approximately 17–19% of individuals stranded after water temperature had increased in the 3 d before stranding or air temperature had increased in the 4 d before stranding.
A disadvantage of RF models is that they are considered a “gray box” in comparison with more traditional statistical tests. For example, the output of RF models does not produce P values or estimates, such as confidence intervals. However, RF models can account for nonlinear associations, do not have the issues that classification and regression trees have with collinearity, and are resistant to overfitting (Breiman 2001; Williams 2017). The importance ranking of variables from our RF models can be used to guide the selection of future model predictors to estimate daily size of cold-stunning events.
Prediction of cold-stunning events currently relies upon individual and institutional experience and weather forecasts. This study presents a multivariate analysis of sea turtle variables along with local oceanic and meteorological variables through the use of RF models. The relative importance of each variable in predicting the daily size of cold-stunning events was presented and can be used for variable selection in other model types. Additionally, a successful model using only variables available 1 to 4 d before a cold-stunning event shows promise for developing a warning system for responders. Using forecast sea surface temperature projections, Griffin et al. (2019) predict a dramatic increase in the annual number of Kemp's ridleys cold-stunning in the northwest Atlantic as a result of climate change. Our next steps are to apply the results of this study to a downscaled climate projection to determine the potential impact of climate change on cold-stunning event size in the future.
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.
Data S1. North Carolina loggerhead Caretta caretta, green Chelonia mydas, and Kemp's ridley Lepidochelys kempii sea turtle stranding data collected by the North Carolina Wildlife Resources Commission for cold-stunning from 2010 to 2015.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S1 (124 KB XLSX).
Data S2. National Data Buoy Center North Carolina buoy locations from 2009 to 2015.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S2 (1 KB CSV).
Data S3. Bash shell script to fetch the list of North Carolina National Data Buoy Center buoys and then download historical data from 2009 to 2015.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S3 (1 KB SH).
Data S4. Bash shell script that takes the data downloaded via Data S3 and generates a single file to simplify import into R.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S4 (1 KB SH).
Data S5. R code (commented) for variable calculation and random forest models utilizing data from Data S1 and S4 (Supplemental Material).
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S5 (23 KB R).
Table S1. Summary statistics of oceanic and meteorological variables for loggerhead Caretta caretta, green Chelonia mydas, and Kemp's ridley Lepidochelys kempii sea turtle strandings due to cold-stunning in North Carolina from 2010 to 2015. We chose a 4-d cutoff for variables based on the time period when most turtles experienced a drop in the mean air and water temperature compared to the day of stranding.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S6 (46 KB PDF).
Table S2. Variables, mean decrease accuracy (%IncMSE), and mean decrease Gini (IncNodePurity) from the output of the RF model of loggerhead Caretta caretta, green Chelonia mydas, and Kemp's ridley Lepidochelys kempii sea turtle strandings in North Carolina due to cold-stunning from 2010 to 2015 using all of the sea turtle and oceanic and meteorological variables listed in order of decreasing importance.
Found at DOI: https://doi.org/10.3996/052019-JFWM-043.S7 (39 KB PDF).
We thank the volunteers and cooperators of the North Carolina STSSN and Dr. Hongyu Ru for her statistical advice. This research was supported by a grant from the Bernice Barbour Foundation, Inc. and a Global Change fellowship (J.N.N.) from the North Carolina State University and U.S. Geological Survey Southeast Climate Adaptation Science Center. We thank the Associate Editor and journal reviewers for their feedback on this manuscript.
Any use of trade, product, website, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Citation: Niemuth JN, Ransom CC, Finn SA, Godfrey MH, Nelson SAC, Stoskopf MK. 2020. Using random forest algorithm to model cold-stunning events in sea turtles in North Carolina. Journal of Fish and Wildlife Management 11(2):531–541; e1944-687X. https://doi.org/10.3996/052019-JFWM-043
The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.