For many rare or endangered anurans, monitoring is achieved via auditory cues alone. Human-performed audio surveys are inherently biased and may fail to detect animals that are present. Automated audio recognition tools offer an observer-free alternative for monitoring. Few commercially available platforms for developing these tools exist, and little research has investigated whether they are effective at detecting rare vocalization events. We generated a recognizer for detecting the vocalization of the endangered Houston toad Anaxyrus houstonensis using SongScope© bioacoustics software. We developed our recognizer using a large sample of training data that included only the highest quality recorded audio (i.e., low noise, no interfering vocalizations), divided into small, manageable batches. To track recognizer performance, we generated an independent set of test data by randomly sampling a large population of audio known to contain Houston toad vocalizations. We analyzed training data and test data recursively, using a criterion of zero tolerance for false-negative detections, and incorporated a new batch of training data into the recognizer at each step. Once we had included all training data, we manually verified recognizer performance against one full month (March 2014) of audio taken from a known breeding locality. The recognizer successfully identified 100% of all training data and 97.2% of all test data. However, there is a trade-off between reducing false-negative and increasing false-positive detections, which limited the usefulness of some features of SongScope. Methods of automated detection represent a means by which we may test the efficacy of the manual monitoring techniques currently in use. The ability to search any collection of audio recordings for Houston toad vocalizations has the potential to challenge the paradigms presently placed on monitoring for this species of conservation concern.
Long-term monitoring of anuran populations is required to understand population dynamics and the causes of growth or decline, and to enable informed conservation assessments and stewardship (Pechmann et al. 1991). Most male anurans vocalize to attract females for breeding, and population monitoring programs often use manual, human-performed auditory call surveys (MCSs) to assess breeding site occupancy (Bridges and Dorcas 2000; Crouch and Paton 2002; Schmidt 2003; Pierce and Gutzweiller 2004; Jackson et al. 2006; USFWS 2007). These surveys can also be used to estimate anuran abundance by indexing the number of individuals heard (Zimmerman 1994; Weir and Mossman 2005). However, these indices do not correlate strongly with true abundance, and no consensus approach exists at this time (Corn et al. 2011; Pierce and Hall 2013). MCSs are subject to a multitude of confounding factors, such as anthropogenic noise disturbance (Bee and Swanson 2007) or temporal bias (Cook et al. 2011). These challenges are further compounded when monitoring rare or elusive species (Crouch and Paton 2002; Williams et al. 2013).
Automated methods of audio monitoring offer alternatives to traditional MCSs (Digby et al. 2013). Automated recording devices (ARDs) can be less expensive than MCSs, can monitor inhospitable and remote sites, avoid bias due to observer disturbance, and avoid temporal bias through rigorous sampling regimes. Recording devices can provide reliable data more rapidly than MCSs (Dorcas et al. 2009). Common applications of ARDs include testing or improving MCS methods for application over a larger spatial scale (Dorcas et al. 2009; Williams et al. 2013), as well as determining the environmental cues for anuran chorusing (Bridges and Dorcas 2000; Oseen and Wassersug 2002; Acevedo and Villanueva-Rivera 2006; Digby et al. 2013; Willacy et al. 2015). Early ARD users were burdened by the necessity of manually listening to (O'Neal 2014) or spectrographically reviewing field recordings. Advancements in the emerging science of bioacoustics have provided researchers with many techniques for automated audio detection, alleviating this burden. Many commercial and open-source pattern recognition platforms are available, such as RAVEN (Charif et al. 2010), the R packages "seewave" and "monitoR" (Sueur et al. 2008; Katz et al. 2016), the automated detection toolbox written in C# (Towsey et al. 2012; Digby et al. 2013), and SongScope© (Wildlife Acoustics 2011a). See Obrist et al. (2010) for a more detailed account of bioacoustics software. These programs rely on a variety of complex mathematics to achieve recognition of focal audio (e.g., hidden Markov models, mel-frequency cepstral coefficients, neural networks, fuzzy clustering, and decision trees). However, studies describing novel methods for automated detection do not always specify what type of data will be generated (i.e., abundance, presence/absence) and seldom seek to answer an ecological question regarding the focal species.
The efficacy of automated audio detection has been criticized for commonly featuring excessive false-positive detections (Barclay 1999; Swiston and Mennill 2009). However, trade-offs exist between false-positive (type I error) and false-negative (type II error) detections (Waddle et al. 2009). Researcher subjectivity, as well as the quality and amount of training data, can affect the magnitude of these trade-offs. These limitations are exacerbated among studies focusing on rare or elusive species that might vocalize infrequently (Swiston and Mennill 2009; Goh 2011; Digby et al. 2013). For studies that aim to detect a rare animal's vocalization, false-negative detections are more problematic than false-positive detections: overlooking the call of a rare animal as a type II error has a greater consequence than simply having to filter through a larger number of type I errors to find the calls of interest. Alternatively, this trade-off may not affect methods for measuring biodiversity, where identifying the maximum number of species vocalizing, but not necessarily all species that may be present, is prioritized (Hsu et al. 2005; Aide et al. 2013; Bedoya et al. 2014; Noda 2016).
The Houston toad Anaxyrus houstonensis (Sanders 1953; Frost et al. 2006) is a rare species of anuran endemic to southeastern central Texas, and is listed as endangered at state, federal, and international levels (Gottschalk 1970; Honegger 1970; U.S. Endangered Species Act [ESA 1973, as amended]; Hammerson and Canseco-Márquez 2004). Ongoing habitat loss and fragmentation throughout its range are major drivers of population declines (Brown 1971; Potter et al. 1984). Houston toads have undergone extirpations in at least three Texas counties, restricting their range to only 10 remaining counties (Potter et al. 1984). Robust populations are documented in only Bastrop and Robertson counties, Texas. However, Houston toad populations in Bastrop County are declining, likely due to an increase in population stressors (e.g., drought, wildfires, development; Gaston et al. 2010; Duarte et al. 2014).
In September 2011, the Bastrop County Complex Fire burned > 13,700 ha of habitat, including 96% of Bastrop State Park, then believed to be the species' last stronghold (Price 2003). Because of the Houston toad's endangered status and local extirpations, numerous groups are interested in conducting consistent, long-term monitoring for this species. Researchers, environmental consultants, and federal agencies rely primarily on MCSs to determine breeding site presence of Houston toads (USFWS 2007), which is common among endangered anurans of North America (USFWS 1999, 2005, 2006). Continuous monitoring across its remaining range is essential for tracking population trends, and determining appropriate local management strategies (e.g., forest restoration, population supplementation). However, because of time, personnel, and financial constraints, thorough long-term, range-wide monitoring has been impossible to achieve.
This study illustrates how to develop a robust and reliable tool for recognizing the vocalizations of rare and endangered anurans within SongScope (version 4.1.3A) bioacoustics software (Wildlife Acoustics, Maynard, Massachusetts), using the Houston toad as a model species. Additionally, we describe how to validate the effectiveness of a recognition tool through detailed manual review. Because of the endangered status of the Houston toad, we prioritized minimization of missed calls (i.e., type II errors, false-negative detections).
We deployed ARDs (SongMeter models SM2, SM2+, and SM3) at 35 potential breeding locations from 3 January to 12 July 2014 in two counties of east central Texas (n = 11 in Bastrop County and n = 24 in Robertson County). We secured ARDs to fixed structures < 10 m from pond, drainage, or water-body edge. We programmed each to record the first 10 min of each hour from 1800 hours to 0500 hours the following morning, yielding twelve 10-min segments (120 min) of audio per device per survey night. To reduce file size, we selected the proprietary WAC format and reduced the sample rate to 16 kHz. This lowered the maximum frequency recorded to 8 kHz, which is appropriate for detecting most North American anuran vocalizations (Narins et al. 2004). Under these settings, ARDs required battery changes approximately every 40 d; during these visits, SD cards containing field recordings were swapped for blank replacements. We also carried out MCSs following the guidelines provided by the U.S. Fish and Wildlife Service (USFWS) at a subset of the sites monitored with ARDs (USFWS 2007). Surveys occurred on nights that met or exceeded the environmental conditions prescribed by the USFWS, to ensure that no chorusing events would be missed. During MCSs, we monitored each site once per night, for 5 min, without the explicit goal of overlapping with audio recorded by ARDs.
We used SongScope to spectrographically review audio. We cross-referenced dates and locations of MCS detections to find audio files containing high-quality vocalizations for recognizer training data (n = 7 files; Audio S1–S7; Figure 1). Ideal vocalizations are visible within a spectrogram, audible on playback, and do not overlap with other animal vocalizations. We ensured that these files encompassed multiple sites across both counties surveyed. We annotated between 13 and 61 Houston toad vocalizations from each file. Annotating vocalizations in SongScope is a "click and drag" highlighting process used to define the bounds of a vocalization within the viewable spectrogram (that is, the moments in time when a vocalization starts and ends, and the frequencies it occupies) for the purpose of incorporating the sound into a recognizer (Figure 2).
To track recognizer performance as it was built, we arranged randomly selected data into a simulation of one full survey night of detections (i.e., 12 files; 120 min), referred to hereafter as "test data" (Figure 1). To accomplish this, we generated a population of 105 files containing Houston toad calls, independent of the 7 training data files, using the same cross-referencing technique described above. We randomly selected 12 files from this population using Program R (R Core Team 2014). We obtained the number of Houston toad calls within these test data (n = 186) by visually inspecting their spectrograms.
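The random draw of test files can be sketched as follows. This is an illustrative Python analogue of the selection we performed in Program R; the file names and seed are hypothetical stand-ins for the actual recordings.

```python
import random

# Hypothetical file names standing in for the population of 105 recordings
# known to contain Houston toad calls (independent of the 7 training files).
population = [f"recording_{i:03d}.wac" for i in range(1, 106)]

rng = random.Random(2014)                 # fixed seed so the draw is repeatable
test_data = rng.sample(population, k=12)  # one simulated survey night (120 min)

assert len(test_data) == 12
assert len(set(test_data)) == 12          # sampled without replacement
assert all(f in population for f in test_data)
```

Sampling without replacement guarantees that no file is double-counted when the test-night detections are later tallied.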
SongScope offers two proprietary filters for removing unwanted results: quality and score. Quality represents a statistical distribution of parameters within the training data used to build a recognizer, and ranges from 0.00 to 99.99, with higher values indicating greater confidence (Wildlife Acoustics 2011b). Score can range from 0 to 100 and measures the statistical fit of a vocalization to the model estimated by the recognizer (Wildlife Acoustics 2011b). For this experiment, each recognizer scanned the test data with filters disabled to determine the lower threshold for true detections (the lowest true positive). This ensured zero tolerance for false-negative detections.
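The zero-tolerance rule amounts to setting each filter no higher than the lowest value observed among true detections in a filter-disabled scan. A minimal sketch, with hypothetical score values (SongScope's internal scoring is proprietary, so these numbers are for illustration only):

```python
def lowest_true_positive(detections):
    """Return the minimum filter value (e.g., score) observed among
    true-positive detections; setting the filter at this value removes
    only detections scoring below every confirmed true positive."""
    true_scores = [value for value, is_true in detections if is_true]
    if not true_scores:
        raise ValueError("no true-positive detections to anchor the filter")
    return min(true_scores)

# Hypothetical (score, is_true_positive) pairs from a filter-disabled scan,
# with true/false status assigned by manual review.
scan = [(72.4, True), (55.1, False), (61.8, True), (48.9, False)]
threshold = lowest_true_positive(scan)
kept = [d for d in scan if d[0] >= threshold]

assert threshold == 61.8
assert sum(is_true for _, is_true in kept) == 2  # every true positive survives
assert (48.9, False) not in kept                 # low-scoring false positives drop out
```

Any detection below the threshold is, by construction, a false positive, so the filtered output contains zero false negatives.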
We incorporated the first training data file's annotations and adjusted parameters so that the recognizer fully encompassed all annotations. Because batches of training data were divided by file for this study, we recursively scanned the file(s) from which the training data were gathered (Figure 1). This ensured that recognizers could, at minimum, identify the calls from which they were built. We adjusted parameters until the recognizer could accurately identify all training data incorporated. We manually reviewed the results from these self tests to verify detection. For each self test we counted positive detections, the number of true vocalizations each positive detection represented (accounting for overlap), the total number of detections made with filters fixed at zero, and the total number of detections made with filters adjusted to the value of the lowest true-positive detection (eliminating false positives only). Once we constructed, parameterized, and self-tested the recognizer, we scanned the test data with filters fixed at zero (Table 1). Again, we manually reviewed the results of these scans to confirm detections and ensure that no false negatives occurred. We repeated this for each set of annotations incorporated into the recognizer (Figure 1). Once we had incorporated all files of training data, we removed unwanted annotations with potentially negative effects (e.g., short bursts, weak signals). Overall, we performed eight iterations of this process.
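Counting the true vocalizations that each positive detection represents, while accounting for overlap, can be illustrated with simple interval arithmetic. The intervals below are hypothetical examples, not values taken from our recordings:

```python
def calls_covered(detection, calls):
    """Count the annotated calls that a single detection window overlaps.
    All intervals are (start_s, end_s) tuples within one audio file."""
    d0, d1 = detection
    return sum(1 for c0, c1 in calls if c0 < d1 and d0 < c1)

# Hypothetical annotated calls and recognizer detections within one file.
calls = [(2.0, 8.5), (9.0, 15.0), (20.0, 27.0)]
detections = [(1.5, 15.5), (19.8, 27.2)]  # the first spans two back-to-back calls

covered = [calls_covered(d, calls) for d in detections]
assert covered == [2, 1]
assert sum(covered) == 3  # 2 positive detections account for all 3 vocalizations
```

This is why a raw detection count can understate the number of vocalizations present: a single long detection may span several consecutive calls.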
To quantify the failure rate of the final recognizer (step no. 8; Tables 1 and 2), we manually estimated the number of calls in each recording taken in 2014 from a single location (n = 1,945). We analyzed these recordings with filters set to lowest true-positive values (Table 1; Figure 1). The location chosen represented the highest probability of Houston toad chorusing on the basis of >10 y of MCS data (M.R.J. Forstner, personal observation). To estimate recognizer failure, we quantified true-positive and false-negative detections. We reanalyzed any files containing false-negative detections with filters disabled to investigate the source of error.
The 35 ARDs recorded between 1,465 and 2,272 audio files each, totaling 657,350 min. Data loss occurred at several ARDs because of inconsistent battery life, which is expected when managing a large collection of rechargeable cells under demanding environmental circumstances. Of the 35 ARDs deployed, 11 yielded detections of the Houston toad.
During recognizer self tests (Table 2), 68% of the detections were true positives and 32% were false positives, with filters adjusted to the value of the lowest true-positive detection. False-positive detections decreased by 27.8% when filters were applied in this way. With the incorporation of each new training data batch, the filter thresholds decreased (i.e., excluded fewer false positives), with the exception of the final step, in which we removed imperfect annotations.
Parameters estimated by SongScope that underwent the greatest change throughout this study included cross- and total training, model states, state usage, and mean duration (Table 1). Training percentages dropped as variation in annotations increased; however, these percentages ranged from 70.97 to 83.3. Mean duration (range 5.92–10.45 s) seemingly limited the lower threshold of the quality filter considerably. Thus, the eighth iteration of recognizer development aimed to increase the mean duration by removing short vocalizations (Table 1).
Figure 3 shows that quality and duration of detection are approximately normally distributed; that is, the closer detections are to the mean duration estimated by the recognizer, the higher their quality. We do not believe this phenomenon to be limited to effects of duration alone; rather, duration is the only parameter with a detectable effect on the efficacy of the recognizer built to identify the call of the Houston toad. Given that the Houston toad's call is narrow-banded with a nearly constant frequency (Tipton et al. 2012), call length varies more than its other features. Because quality assesses all the parameters within a recognizer, it is fair to consider the outliers that deviate from the normal distribution presented in Figure 3 as vocalizations that vary in a metric other than duration.
We noted that false-negative detections occurred when a vocalization was already underway as a recording began. In this instance, the portion of sound that was recorded went unidentified by the software (Figure 4). We believe this to be the only circumstance, under a zero-tolerance approach, in which a false-negative detection may occur. There were false-negative detections in two of the seven training files and in one of the files used to create the test data. We found only one missed call per file, and each occurred at the origin of the file (Figure 4).
From the subset of data taken from a single location, we processed a total of 1,945 files. Manual review estimated 393 vocalizations in 53 files, ranging from 1 to 30 calls per file and averaging 7.42. The recognizer made 437 true-positive detections in 51 audio files; among those 51 recordings, detections per file ranged from 1 to 26, averaging 8.57. There were 11 incidents of false negatives. Six were a consequence of vocalizations taking place at the origin of a recording (Figure 4). The other five involved faint or weak calls, whose assigned scores fell below the threshold determined during recognizer development. Lowering the score filter to include these five vocalizations increased the number of false-positive detections to 224% of its previous value (from n1 = 1,399 to n2 = 3,133). Taking all vocalizations into account, the recognizer correctly identified 97.2% of the true vocalizations present within the validation audio. False-positive detections consisted primarily of detections shorter than 1 s, triggered by the sound of wind, rain, automobile traffic, birds, and other anurans, namely Hyla versicolor and Pseudacris crucifer.
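The trade-off reported above follows from simple percent-change arithmetic on the two false-positive counts:

```python
# False-positive counts at the two filter settings (from the validation month).
fp_strict, fp_relaxed = 1_399, 3_133

ratio = fp_relaxed / fp_strict
assert round(100 * ratio) == 224        # relaxed count is ~224% of the strict count
assert round(100 * (ratio - 1)) == 124  # equivalently, a ~124% increase

# Recovering five faint true calls costs 1,734 additional false positives,
# i.e., roughly 347 extra false positives per true call gained.
extra_fp_per_call = (fp_relaxed - fp_strict) / 5
assert round(extra_fp_per_call) == 347
```

Framing the cost per recovered call makes the asymmetry of the trade-off explicit: the filter setting chosen during development buys a large reduction in review effort for a very small number of missed faint calls.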
The number of vocalizations present in the test data was underestimated in steps 4, 6, 7, and 8 (Tables 1 and 2) because a single detection could account for more than one vocalization (Figure 5). The number of vocalizations was overestimated in steps 1, 2, 3, and 5, in which more than one detection was made per vocalization (Figure 5). At the final step of development, self tests showed that the finalized recognizer overestimated the number of vocalizations present by 2.7% (n = 191), a small deviation from the actual number present (n = 186).
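The 2.7% figure follows directly from the two counts:

```python
# Final self-test: detections reported vs. vocalizations actually present.
detected, actual = 191, 186

overestimate_pct = 100 * (detected - actual) / actual
assert round(overestimate_pct, 1) == 2.7  # 5 extra detections over 186 true calls
```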
Objectivity and attention to detail are important in determining the efficacy of a recognizer. Comparable studies report failures among automated detection procedures that could potentially be eliminated by increasing the rigor of training and validation, resulting in more carefully assembled tools (Waddle et al. 2009; Eldridge 2011). Our approach to developing and optimizing this recognizer followed a strict criterion of zero tolerance for false-negative detections. Although this may not be necessary for recognizers for all species, for the Houston toad this method was effective. By improving our recognizer iteratively, we reduced false-positive detections without a concurrent increase in false negatives, apart from those described in Figure 4 (Table 1). Self tests on training data yielded 32% false-positive detections, whereas recognizer validation showed a dramatic increase in false positives. This increase is likely caused by audio containing greater amounts of noise and fewer ideal Houston toad vocalizations.
In total, preparing and optimizing this recognizer required a comprehensive time investment of approximately 24 h, a shorter build time, start to finish, than in comparable studies (Waddle et al. 2009; Eldridge 2011). Assembling training data by cross-referencing MCSs decreased the overall time investment required for our study. Another advantage of our approach is the small, yet effective, amount of test data. The time required to process 120 min of audio was < 2 min; processing times depend on a multitude of factors (e.g., computer processing power) and may vary among researchers. We were able to use fewer files because each file often contained multiple vocalizations, owing to the Houston toad's explosive breeding strategy (Price 2003).
Manual methods of audio review required approximately 32 h to complete, or approximately 1 min per audio file. For files that contained no Houston toad vocalizations, review was fast and simple; for files that did contain vocalizations, quantification and interpretation required greater effort. Automated methods of detection required < 6 h to complete, plus an additional hour to quantify and interpret results. In other words, 7 h of analytical effort (1 h of it active) were required to interpret approximately 80 h of audio from a single location. For comparison, the expected MCS effort is between 30 and 60 min per site per season (USFWS 2007). These estimates do not include the time or cost invested in the logistics of using ARDs, which are highly variable and difficult to estimate (i.e., deployment, periodic battery and memory card changes, transfer of data from removable memory, and file organization).
Errors inherent to traditional MCSs include observer bias, temporal variation, ease of access, right of entry, hazardous roadways, and observer effects (Crouch and Paton 2002; Bee et al. 2007; Cook et al. 2011; Corn et al. 2011; Pierce and Hall 2013). Many, but not all, of these errors can be corrected by implementing ARDs. Although ARDs have their own suite of potential errors (i.e., data loss, battery life, theft, physical right of entry), the advantages they offer may outweigh these shortcomings. Although ARDs are becoming more commonly implemented, we are unaware of any ongoing long-term monitoring program for anurans that uses automated detection in practice. This represents a growing body of anuran chorusing data, but limited implementation of the tools available for analyzing those data. Given recent advancements, software now offers simple, user-friendly foundations for the complete development of robust and reliable automated audio pattern recognition tools. Improved methods of creating these tools may enable researchers to better interpret and apply them, as indicated by this research involving the endangered Houston toad. Furthermore, as ARDs and automated detection become more commonplace, stricter management actions are likely to follow. This should lead to increased interaction, and ideally cooperation, with landowners, minimizing the ARD-associated errors outlined above.
More data can be provided by ARDs than by MCSs, and when coupled with automated detection tools, those data can be processed quickly and consistently. However, at this time, ARDs and methods of automated detection are not a panacea. During our study, ARDs were unable to detect vocalizations emanating from adjacent ponds; they are thus less sensitive to chorusing at great distances than human surveyors. This drawback makes ARDs less suitable than MCSs for informing regulatory agencies under the current survey guidelines (USFWS 2007). To meet the requirements as they presently exist, an ARD would need to be placed at every pond within and adjacent to a proposed project area, which in most cases is not feasible. Thus, combining ARDs and MCSs may be the best solution. Ultimately, our recognizer performed such that no survey night contained even a single uncharacteristic call that went overlooked. Thus, any errors resulting from insufficient signal-to-noise ratio had no bearing on the recognizer's ability to provide the site-level presence/absence data that are critical to informing state and federal agencies of the current, and potentially underestimated, occurrence of the rare and endangered Houston toad.
Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.
Reference S1. Potter FEJ, Brown LE, McClure WL, Scott NJ, Thomas RA. 1984. Recovery plan for the Houston toad (Bufo houstonensis). U.S. Fish & Wildlife Service.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S1; also available at http://www.amphibians.org/wp-content/uploads/2013/07/Huston-Toad-Recovery-Plan.pdf (12,103 KB PDF).
Reference S2. Price AH. 2003. The Houston toad in Bastrop State Park 1990–2002: a narrative. Open-file report 03-0401, Texas Parks & Wildlife Department.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S2 (933 KB PDF).
Reference S3. [USFWS] U.S. Fish and Wildlife Service. 1999. Survey Protocol for the Arroyo toad. Carlsbad & Ventura, California: U.S. Fish and Wildlife Service.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S3; also available at https://www.fws.gov/pacific/ecoservices/endangered/recovery/documents/AroyoToad.1999.protocol.pdf (25 KB PDF).
Reference S4. [USFWS] U.S. Fish and Wildlife Service. 2005. Revised guidance on site assessments and field surveys for the California red-legged frog. Sacramento, California: U.S. Fish and Wildlife Service.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S4; also available at https://www.fws.gov/arcata/es/amphibians/crlf/documents/20050801_CRLF_survey-guidelines.pdf (144 KB PDF).
Reference S5. [USFWS] U.S. Fish and Wildlife Service. 2006. Chiricahua leopard frog (Rana chiricahua) draft recovery plan with appendices. Appendix E:E-1–E-15.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S5; also available at https://www.fws.gov/southwest/es/Documents/R2ES/DRAFT_Recovery_Plan_for_the_Chiricahua_Leopard_Frog_with_Appendices.pdf (6,427 KB PDF).
Reference S6. [USFWS] U.S. Fish and Wildlife Service. 2007. Section 10(a)(1)(A). Scientific permit requirements for conducting Houston toad presence/absence surveys. Austin, Texas: U.S. Fish and Wildlife Service.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S6; also available at https://www.fws.gov/southwest/es/Documents/R2ES/Houston_toad_survey_requirements.pdf (29 KB PDF).
Audio S1. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S7 (21,646 KB PDF)
Audio S2. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S8 (24,077 KB PDF)
Audio S3. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S9 (19,956 KB PDF)
Audio S4. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S10 (19,407 KB PDF)
Audio S5. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S11 (23,897 KB PDF)
Audio S6. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S12 (20,892 KB PDF)
Audio S7. Training data used within SongScope© bioacoustics software to build a recognizer for the call of the endangered Houston toad Anaxyrus houstonensis.
Found at DOI: http://dx.doi.org/10.3996/052017-JFWM-047.S13 (23,369 KB PDF)
We are grateful for the support and assistance of Magellan Midstream Partners L.P. and the staff at Zephyr Environmental Corporation for their assistance with sites in Robertson County. We thank Jeff Farrar for his daily assistance in coordinating all of the respective teams, as well as the Boy Scouts of America, Blue Bonnet Energy, the Musgrave Family, and the Texas Department of Transportation, each for right of entry to additional sites. We had exceptional field personnel in the efforts of all those who surveyed for Houston toads throughout the 2014 breeding season, especially Jay Dixon, D.J. Stout, Jim Bell, Tim Clarke, Mike Horvath, and Jennifer Knowles. Finally, we thank Paul Crump, three anonymous reviewers, and the Associate Editor, who each provided comments that improved an earlier version of this manuscript. All work that was conducted to complete this study was performed under scientific permit TE-039544-1 issued to M.R.J.F. by the USFWS.
Any use of trade, product, website, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Citation: MacLaren R, McCracken SF, Forstner MRJ. 2018. Development and validation of automated detection tools for vocalizations of rare and endangered anurans. Journal of Fish and Wildlife Management 9(1):144–154; e1944-687X. doi:10.3996/052017-JFWM-047
The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.