Camera traps provide a low-cost approach to collect data and monitor wildlife across large scales but hand-labeling images at a rate that outpaces accumulation is difficult. Deep learning, a subdiscipline of machine learning and computer science, can address the issue of automatically classifying camera-trap images with a high degree of accuracy. This technique, however, may be less accessible to ecologists or small-scale conservation projects, and has serious limitations. In this study, we trained a simple deep learning model using a dataset of 120,000 images to identify the presence of nilgai Boselaphus tragocamelus, a regionally specific nonnative game animal, in camera-trap images with an overall accuracy of 97%. We trained a second model to identify 20 groups of animals and one group of images without any animals present, labeled as “none,” with an accuracy of 89%. Lastly, we tested the multigroup model on images collected of similar species, but in the southwestern United States, resulting in significantly lower precision and recall for each group. This study highlights the potential of deep learning for automating camera-trap image processing workflows, provides a brief overview of image-based deep learning, and discusses the often-understated limitations and methodological considerations in the context of wildlife conservation and species monitoring.

Camera traps, wireless cameras placed on trees or posts and activated via motion sensors, are important tools for wildlife studies. Wildlife biologists have used them to estimate population densities (Howe et al. 2017), create species lists and inventories in dense tropical environments (Srbek-Araujo and Chiarello 2005; Lading 2006), understand population size and distributions (O'Connell et al. 2010), and identify new species (Rovero and Rathbun 2006). Their relatively low cost and ease of use make them scalable across large geographic regions. A common problem, however, is the rapid accumulation of images that outpaces the ability of users to manually sort and label them (Swanson et al. 2016). To address this issue, researchers have identified deep learning, a subfield of machine learning, as a powerful technique to automate the process of classifying, or grouping, images by species (Gomez et al. 2016; Norouzzadeh et al. 2018; Willi et al. 2019). Applications of deep learning for camera-trap classification have often relied on extremely large collections of images like Snapshot Serengeti (∼7 million images) or the North American Camera Trap dataset (3.3 million images) for training (Swanson et al. 2015; Tabak et al. 2019; Schneider et al. 2020). Transfer learning, a deep learning technique that starts with pretrained models as a base for future learning, can reduce this need for very large training sets. Both Schneider et al. (2020) and Shahinfar et al. (2020) found that they needed only 1,000 images per class to achieve an accuracy of 97 and 98%, respectively, for eight classes. Despite growing popularity, applications of transfer learning for rapid camera-trap classification may still be beyond the expertise of many ecologists and conservation practitioners.

Our aim was to present an application of deep learning–based camera-trap analysis using a small dataset of 120,000 images. We trained a model using transfer learning, evaluated its accuracy, and demonstrated its limitations when applied to images outside the model's training context. We leveraged a nature-specific model by Cui et al. (2018) as a base to further train a south Texas–specific animal classifier. More specifically, we drew from a local database of camera-trap images to train 1) a binary classifier that discriminates between images with and without a single species, nilgai Boselaphus tragocamelus, an exotic bovid with expanding populations in south Texas, and 2) a multigroup classifier for 20 animal groups and one “none” group. Lastly, we tested the model and its ability to generalize to images with similar classes but in different settings using the CalTech camera-trap dataset collected in the southwestern United States (Beery et al. 2018). Resources and further details about training and implementation are available at the authors' GitHub repository (Data S1–S4, Text S1 and S2, Supplemental Material).

We collected image data from motion-sensitive cameras placed in areas of known wildlife activity in Cameron County in the lower Rio Grande Valley of Texas from 2018 to 2019. This county is along the international border and characterized by a mosaic of shrubby plants, mesquite, and semiarid vegetation. Ranchers introduced free-ranging nilgai, native to the Indian subcontinent, in the 1930s (Leslie 2008). Although there appears to be no competition with other native species, nilgai inhabit areas that support species of conservation concern such as northern populations of ocelot Leopardus pardalis and perhaps the Gulf Coast jaguarundi Puma yagouaroundi cacomitli (Schmidly 2004; Leslie 2016). Furthermore, recent studies reveal that nilgai are optimal hosts for the southern cattle-fever tick Rhipicephalus microplus and have complicated current efforts to eradicate this exotic pest of wildlife and livestock (Lohmeyer et al. 2018). As such, monitoring nilgai behavior, population, and distribution has important implications for both wildlife management and agriculture in the region (Foley et al. 2017; Goolsby et al. 2019).

Image data and preprocessing

We randomly drew images for each group from a local database that is part of a multiyear field research project aimed at treating cattle fever tick-infested nilgai at fence crossings. Research technicians with advanced experience in recognizing animals of interest hand-labeled images using the open-access Colorado Parks and Wildlife Photo Warehouse, a custom Microsoft Office Access application designed specifically to store, manage, label, and analyze wildlife camera-trap data (Ivan and Newkirk 2016). We created three types of datasets necessary for training deep neural networks: 1) a large training set (∼85% of total images) for model learning, 2) a smaller validation set (∼5% of total images) for frequent testing and adjustment of model settings, and 3) a test set (∼10% of total images) to evaluate the final trained model. We created separate training, validation, and test sets for each classifier.
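The three-way split described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name `split_dataset`, the fixed random seed, and the placeholder file names are assumptions made for the example, and only the approximate 85/5/10 proportions come from the text.

```python
import random

def split_dataset(paths, train_frac=0.85, val_frac=0.05, seed=0):
    """Shuffle image paths and split them into training, validation, and test lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly 10% of the images
    return train, val, test

# Hypothetical file names standing in for one group's labeled images.
paths = [f"nilgai_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # 850 50 100
```

Each image lands in exactly one of the three lists, so the test set stays unseen during training and tuning.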

Balancing training set

A balanced training set contains an even distribution of images across each group. The original raw image set of >2.5 million images was highly imbalanced with 84% (∼2 million images) having no wildlife, which we labeled as “none.” The top seven most common groups include feral pigs Sus scrofa, falsely triggered camera events, human activity, birds, nilgai, deer Odocoileus virginianus, and cattle. Camera-trap datasets are often imbalanced because of wind, grass, or other nontarget objects that create false capture events. Training on the complete dataset would be problematic because models can favor groups with more examples while ignoring those with only a few (Norouzzadeh et al. 2018). The model would overfit in such a way that a single group (“none”) could be predicted for every instance and still result in a high overall accuracy. To correct the imbalance, we oversampled, that is, sampled with replacement, so each group had roughly the same number of images (He and Garcia 2009). For example, if the “dog” group only had 50 unique images, we copied each until the total number of images matched that of the most frequently occurring group. While this oversampling technique balances the dataset, it has drawbacks. Because it repeats images in rare groups, the model sees little real variation in those groups and may generalize poorly to new examples of them. This might be an issue for conservation projects focusing on rare species that are important to monitor but rarely occur. For this study, however, the most important group, “nilgai,” was one of the most frequently occurring. Still, to reduce the number of copies for oversampling, we lowered our total image set size from 2.5 million to 120,000 by taking slightly more than the next most frequent group (“human”). Additionally, a dataset of 120,000 images instead of 2.5 million lowered training time from weeks to days. We further altered data by combining or eliminating groups.
We combined four groups—“feral cat,” “ocelot,” “bobcat” Lynx rufus, and “exotics, other”—to create the “cat” group and eliminated “unknown” and “squirrel.” These groups either lacked sufficient examples or were mislabeled (e.g., an image of a bobcat was labeled as ocelot). Each capture event included three images taken in rapid successive order. Individual images, not capture events, were classified by research technicians, and contributed to the total dataset size and class count.
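The oversampling step described above can be illustrated with a short Python sketch. It assumes the labeled data are available as (path, label) pairs; the function name `oversample` and the example file names are hypothetical, and the study's own implementation may differ.

```python
import random
from collections import Counter

def oversample(labeled, seed=0):
    """Balance groups by sampling rare-group images with replacement."""
    rng = random.Random(seed)
    by_group = {}
    for path, label in labeled:
        by_group.setdefault(label, []).append(path)
    target = max(len(items) for items in by_group.values())  # size of largest group
    balanced = []
    for label, items in by_group.items():
        # Draw extra copies at random until the group reaches the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((path, label) for path in items + extra)
    rng.shuffle(balanced)
    return balanced

# Hypothetical example: 50 unique "dog" images against 400 "nilgai" images.
labeled = [(f"dog_{i}.jpg", "dog") for i in range(50)]
labeled += [(f"nilgai_{i}.jpg", "nilgai") for i in range(400)]
balanced = oversample(labeled)
counts = Counter(label for _, label in balanced)
print(counts["dog"], counts["nilgai"])  # 400 400
```

The "dog" group still contains only 50 unique images after balancing, which is the drawback noted above: the copies add weight but no new visual information.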

We applied four types of data augmentation, a technique commonly used to strengthen model predictions by slightly altering images. We rotated, shifted, sheared, and flipped images both horizontally and vertically. We performed augmentation for each training cycle and applied different augmentations randomly to each image. Preprocessing also included rescaling pixel values to between 0 and 1 and resizing the image from 2,048 × 1,152 to 299 × 299 pixels, standard procedures done to reduce the computational expense of training. The seven most common groups included feral pigs, a “none” group, human activity, birds, nilgai, white-tailed deer, and cattle (Figure 1). Data preprocessing is an important step for reducing computational demands and increasing model robustness.
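To make the preprocessing and augmentation steps concrete, the following sketch applies pixel rescaling, flips, and a small shift to a tiny grayscale image stored as nested lists. It is a simplified stand-in, not the pipeline used in the study: rotation, shearing, and resizing are omitted for brevity, and all function names are hypothetical. In practice a deep learning library's image utilities would perform these operations on full-size images.

```python
import random

def rescale(image):
    """Map 8-bit pixel values (0 to 255) to floats between 0 and 1."""
    return [[px / 255.0 for px in row] for row in image]

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror the image top to bottom."""
    return image[::-1]

def shift_right(image, pixels, fill=0.0):
    """Shift columns to the right, padding the exposed edge with a fill value."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

def augment(image, rng):
    """Apply a random combination of flips and a small shift to one image."""
    out = image
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    return shift_right(out, rng.randint(0, 1))

# A hypothetical 2 x 3 grayscale image.
raw = [[0, 128, 255], [255, 128, 0]]
rng = random.Random(0)
augmented = augment(rescale(raw), rng)
print(len(augmented), len(augmented[0]))  # 2 3
```

Because the random choices are redrawn on every call, the same source image can yield a different variant in each training cycle.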

Figure 1.

Examples of cropped and resized camera trap images collected in the lower Rio Grande Valley of Texas in 2018 and 2019 and used for training a deep learning model that can automatically classify new images of wildlife. The top seven most common animal groups in the image dataset include (A) feral pigs labeled as “pigs,” (B) falsely triggered capture events without animals as “none,” (C) signs of human activity as “human,” (D) “bird,” (E) “nilgai” Boselaphus tragocamelus, (F) white-tailed deer Odocoileus virginianus as “deer,” and (G) “cattle.”

Deep learning

A subfield of machine learning, deep learning aims to extract information from big data by learning from successive layers of increasingly meaningful representations called features (Chollet 2018). A neural network, a type of deep learning model, is made up of many layers that are trained on labeled data and extract features hierarchically. Information from previous layers informs following layers and is stored in the form of weights to make predictions on new unlabeled data. The neural network uses predicted and actual values to calculate an error score that is propagated back through the network to adjust weight values. Learning occurs iteratively by updating weights in a way that reduces the error score. The model trains early layers to react strongly to simple features like edges, lines, and sharp color gradients, while the final layer of a neural network infers probabilities of input features belonging to a class like “nilgai” or “deer.” The model distills features hierarchically from complex input images to a single prediction value (Figure 2; Toda and Okura 2019).
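The iterative cycle of predicting, scoring the error, and adjusting weights can be shown with a deliberately small example. The sketch below trains a single-layer classifier (logistic regression) by gradient descent on four made-up two-feature samples. It is not a deep network and not the model used in this study; it only illustrates how an error score drives weight updates.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit weights by repeatedly predicting, scoring the error, and updating."""
    w = [0.0] * len(samples[0])
    b = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)  # error score
            err = p - y  # gradient of the error with respect to the raw score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        losses.append(total / len(samples))
    return w, b, losses

# Made-up data: label 1 when the first feature dominates, 0 otherwise.
samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = [1, 1, 0, 0]
w, b, losses = train(samples, labels)
predictions = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x in samples]
print(predictions, losses[-1] < losses[0])
```

A deep network repeats the same idea across many layers, passing the error backward so that every layer's weights are adjusted.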

Figure 2.

Inside a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019. The trained model identifies important image patterns, or features, associated with each class to make predictions. The model distills image data to its representative features; filtering layers extract meaningful characteristics (highlighted in yellow), a flattening layer transforms a three-dimensional array of feature values into two dimensions, and the final connected layer produces predicted model probabilities by class ending with an output label, “nilgai” Boselaphus tragocamelus. Parentheses indicate the dimensions of image data (width, length, channel).

Training a neural network from scratch often requires large amounts of data. However, transfer learning, an approach useful for training on small datasets, applies the stored knowledge of a model pretrained on large generic data as a base for similar but more specific problems. It transfers knowledge in the form of saved files that contain weights, complete or partial model architectures, and settings. Researchers can easily download model parameters from open-source libraries and read them into a new training instance. Feature extraction, the first step in using pretrained models, involves replacing the final layer of a neural network and training only that layer on a new problem-specific dataset. The second step, often called fine-tuning, trains all layers including the newly added final layer. It adjusts network weights, making the model task-specific. Feature extraction must occur first because an untrained final layer would otherwise cause overly large weight adjustments in the pretrained layers that could negatively affect inference or model prediction. Our model was pretrained by Cui et al. (2018), who used the iNaturalist 2017 dataset of 579,184 nature-specific objects including insects, mammals, and amphibians (Ueda 2017; Van Horn et al. 2018). We then trained on a smaller but domain-specific dataset of south Texas wildlife (Figure 3).
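The two training stages described above can be sketched with a toy two-layer network in plain Python. This is an illustration under stated assumptions, not the InceptionV3 workflow used in the study: the "pretrained" base weights, the data, the learning rates, and all function names are invented for the example. Stage 1 trains only the new final layer while the base stays frozen; stage 2 unfreezes the base and trains everything with a smaller learning rate.

```python
import math

def forward(base, head, x):
    """Return hidden features and the predicted probability for one input."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in base]
    z = sum(w * h for w, h in zip(head["w"], hidden)) + head["b"]
    return hidden, 1.0 / (1.0 + math.exp(-z))

def mean_loss(base, head, data):
    """Average cross-entropy loss over (input, label) pairs."""
    total = 0.0
    for x, y in data:
        _, p = forward(base, head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(base, head, data, epochs, lr, train_base):
    """Gradient descent; the base layer changes only when train_base is True."""
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(base, head, x)
            err = p - y
            if train_base:
                for j, row in enumerate(base):
                    grad = err * head["w"][j] * (1.0 - hidden[j] ** 2)
                    for i in range(len(row)):
                        row[i] -= lr * grad * x[i]
            for j in range(len(head["w"])):
                head["w"][j] -= lr * err * hidden[j]
            head["b"] -= lr * err

# Hypothetical "pretrained" base weights and a new, untrained final layer.
base = [[1.0, -1.0], [0.5, 0.5]]
head = {"w": [0.0, 0.0], "b": 0.0}
data = [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.1, 0.9], 0), ([0.2, 0.7], 0)]

base_before = [row[:] for row in base]
loss_start = mean_loss(base, head, data)

# Stage 1, feature extraction: train only the final layer.
train(base, head, data, epochs=50, lr=0.5, train_base=False)
loss_stage1 = mean_loss(base, head, data)
base_after_stage1 = [row[:] for row in base]

# Stage 2, fine-tuning: train all layers with a smaller learning rate.
train(base, head, data, epochs=50, lr=0.05, train_base=True)
loss_stage2 = mean_loss(base, head, data)
print(base_after_stage1 == base_before, base != base_before)  # True True
```

Training the final layer first means that, by the time the base is unfrozen, the errors passed back to it are already small, which keeps the pretrained weights from being disrupted.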

Figure 3.

We performed transfer learning by updating a model pretrained on a larger iNaturalist dataset using a small but regionally specific camera-trap dataset collected in the lower Rio Grande Valley in Texas in 2018 and 2019 to automatically classify new, unlabeled images (Ueda 2017). Transfer learning applies the learned features of large datasets to a more specific task.

Training and evaluation

We customized the InceptionV3 model, defined by its sequence and type of layers, to our unique number of groups (Szegedy et al. 2016). After each training cycle, we used the validation set to monitor performance and adjust model settings. In total, the model updated ∼21 million weight parameters until it stopped improving on the validation set; training took roughly 24 hours for both the multigroup and binary classifiers while using a single graphics processing unit. Once adjustments and training were complete, we evaluated each model by reporting prediction results on the test set (the number of true positives, true negatives, false positives, and false negatives) for each classifier. We calculated five common accuracy metrics: overall accuracy, precision, recall, the harmonic mean of precision and recall known as the F1 score, and the Matthews correlation coefficient, an adjusted form of the φ coefficient (Table 1; Guilford 1954). We used a second test set, collected from the southwestern United States and known as the CalTech dataset, to further evaluate model robustness (Beery et al. 2018).

Table 1.

Five metrics used to evaluate the accuracy of a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019. The five metrics include overall accuracy, precision, recall, harmonic mean using precision and recall known as the F1 score, and the Matthews correlation coefficient. The table also provides descriptions and equations. We gathered true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) from prediction results.
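The five metrics can be computed directly from the four prediction counts using their standard definitions. The sketch below is an illustration only: the function name `evaluate` is hypothetical, and the counts in the example are invented for demonstration and are not the study's confusion matrix.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, F1 score, and MCC from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Invented counts for a binary classifier tested on 200 images.
metrics = evaluate(tp=98, tn=96, fp=4, fn=2)
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 0.97, 'precision': 0.96, 'recall': 0.98, 'f1': 0.97, 'mcc': 0.94}
```

Accuracy, precision, recall, and F1 range from 0 to 1, whereas the Matthews correlation coefficient ranges from −1 to 1 and uses all four counts, which makes it less sensitive to imbalanced groups.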


The trained binary classifier achieved an overall accuracy of 0.97, F1 score of 0.97, and Matthews correlation coefficient of 0.94, indicating the classifier was able to generalize on new images from the same area and accurately predict the presence of a nilgai. During training, we found an ∼15% increase in validation accuracy from the first to second stages. Recall (0.98) was slightly larger than precision (0.96), which is favorable for this task. The occasional instance of deer or cattle classified as a nilgai is preferable because research technicians will likely review and “catch” these images. A lost and uncounted nilgai image, however, is more detrimental to overall project goals. For multigroup problems, the average of the Matthews correlation coefficient is a more appropriate evaluation metric because it pools the performance over all samples and groups. Our multigroup classifier achieved an average Matthews correlation coefficient of 0.89. Group-wise test results and evaluation metrics show that two of the most highly correlated classes, “skunk” Mephitis mephitis and “tortoise” Gopherus berlandieri, were the most imbalanced with each having <22 images (Table 2). The three most common groups in our dataset (“nilgai,” “deer,” and “none”) were strongly correlated. The multigroup classifier was successful in classifying 21 groups (Figure 4). For the second evaluation using the CalTech dataset, we adjusted classes to complement those of the south Texas dataset. We removed dissimilar classes (“bat,” “lizard,” “badger”), combined similar classes (“car” and “human”), and renamed classes when appropriate (“bobcat” to “cat”). The average Matthews correlation coefficient for the CalTech dataset was 0.22; further inspection of the other four metrics by class also indicated very poor performance (Table 3).
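For a multigroup classifier, a group-wise Matthews correlation coefficient can be obtained by treating each group in turn as the positive class and all other groups as negative, then averaging the results. The sketch below assumes this one-versus-rest averaging; the text does not specify exactly how the average was computed, and the labels and function names here are invented for illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four prediction counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def groupwise_mcc(y_true, y_pred):
    """Score each group one-versus-rest and return the scores and their mean."""
    scores = {}
    for group in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == group and p == group for t, p in pairs)
        fp = sum(t != group and p == group for t, p in pairs)
        fn = sum(t == group and p != group for t, p in pairs)
        tn = sum(t != group and p != group for t, p in pairs)
        scores[group] = mcc(tp, tn, fp, fn)
    return scores, sum(scores.values()) / len(scores)

# Invented labels: one nilgai image is misclassified as deer.
y_true = ["nilgai", "nilgai", "deer", "deer", "none", "none"]
y_pred = ["nilgai", "deer", "deer", "deer", "none", "none"]
scores, average = groupwise_mcc(y_true, y_pred)
print({g: round(s, 2) for g, s in scores.items()}, round(average, 2))
# {'deer': 0.71, 'nilgai': 0.63, 'none': 1.0} 0.78
```

A single misclassified image lowers the score of both the group it belongs to and the group it was assigned to, which is why rare groups with few test images can swing sharply.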

Table 2.

Evaluation results of a deep learning model trained and tested on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019. Results of predictions made on new images not included in training were compared with their true labels to tally true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) and to calculate overall accuracy; precision; recall; the harmonic mean of precision and recall known as the F1 score (F1); and the Matthews correlation coefficient (MCC). The precision, recall, accuracy, and F1 score are ratios from 0 to 1 while MCC is between −1 and 1.

Figure 4.

A random sample of 16 model predictions illustrates the performance of a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in 2018 and 2019. The trained model was designed to classify images into 20 animal groups and one empty “none” group. We drew sample test images from the original dataset but did not include them for training. Titles signify classifier predictions for each image. In this sample, a single incorrectly labeled image, middle-right, predicted as “pig” was in fact an image of nilgai Boselaphus tragocamelus as shown by the white arrow.

Table 3.

Evaluation results for a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019 but tested on the CalTech camera-trap dataset (Beery et al. 2018). The CalTech dataset was collected in the southwestern United States in 2018 and contains similar animal groups but includes conditions and backgrounds that are absent in the original Texas training set. Results of predictions made on images not included in training were compared with their true labels to tally true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) and to calculate overall accuracy; precision; recall; the harmonic mean of precision and recall known as the F1 score (F1); and the Matthews correlation coefficient (MCC). The precision, recall, accuracy, and F1 score are ratios from 0 to 1 while MCC is between −1 and 1.


Our aim was to test if we could use a small number of hand-labeled camera-trap images to train a deep learning model to automatically detect wildlife, including a specific species. We also explored the limits of our model by testing it on a dataset that we did not use in training and that had similar species but a different context. Class imbalance played a major role in skewing the performance of the model on rare classes where test images were similar to training images. For example, a tortoise's slow movement was enough to trigger the camera sensor multiple times, which resulted in many nearly identical images. Because rare groups contained even fewer images in the test set, it was difficult to evaluate their accuracy. Addressing the class imbalance issue is an important factor for improving results. Applying a technique like emphasis sampling can increase prediction accuracy by duplicating, or emphasizing, only images that have been misclassified instead of oversampling all rare groups (Norouzzadeh et al. 2018). This approach is more dynamic because it balances data as needed by responding to prediction results. Alternatively, researchers can combine multiple data sources to add images to rare classes from other camera-trap datasets (Swanson et al. 2015; LILA BC 2019). However, this approach risks introducing too many dissimilar environmental settings, images, and class types. Second, evaluating a second dataset allowed us to illustrate the model's lack of location invariance, that is, its inability to generalize to new images with conditions not represented in the training set (Beery et al. 2018). The strength of the model to make accurate predictions under a diverse set of conditions depends on how well the training data represent those conditions.
Lastly, researchers adopting a trained model into an automatic camera-trap classification workflow should closely monitor it by inspecting important and rare groups for anomalies or regularly testing it on a subset of new images. As our study shows, new camera angles, species, or locations pose challenges to accurate classifications. Transfer learning has the potential to save time and resources typically required to hand-label camera-trap images. A simple trained classifier making predictions on 3,000 raw images saves roughly 12 personnel hours. Applications of deep learning, while traditionally left to experts in computer vision, have become less complicated with the emergence of publicly available datasets and open-source software. Likewise, we include our code, trained model, instructions, and a set of sample images that we hope improve the transfer of knowledge from academia to the field.
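The emphasis sampling idea mentioned in the discussion, duplicating only the images a model currently misclassifies, can be sketched as follows. This is a simplified reading of the technique as described here, not the procedure of Norouzzadeh et al. (2018); the function name, the `predict` callable, and the file names are hypothetical.

```python
def emphasize_misclassified(training, predict, copies=1):
    """Return the training list plus extra copies of misclassified images."""
    missed = [(path, label) for path, label in training if predict(path) != label]
    return list(training) + missed * copies

# Hypothetical labels and a stand-in for a trained model's predictions.
training = [("a.jpg", "skunk"), ("b.jpg", "nilgai"), ("c.jpg", "tortoise")]
fake_predictions = {"a.jpg": "skunk", "b.jpg": "nilgai", "c.jpg": "none"}
emphasized = emphasize_misclassified(training, fake_predictions.get)
print(len(emphasized))  # 4
```

Only the misclassified tortoise image is duplicated, so the extra weight goes where the model is currently failing rather than to every rare group.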

Please note: The Journal of Fish and Wildlife Management is not responsible for the content or functionality of any supplemental material. Queries should be directed to the corresponding author for the article.

Data S1. A set of two IPython notebooks to automatically classify and evaluate sample images using a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019. We designed the model to classify images as wildlife or as being empty (a false camera trigger event). Notebooks use additional supplemental data such as input weight files, a sample repository of images, and true image labels to evaluate predictions. The notebooks generate a new set of folders for each class, copy input images, and place them in folders based on predicted group. The notebooks generate figures of the distribution of predictions across animal groups. We applied a csv file containing true image labels to generate an evaluation report.

Available: https://doi.org/10.3996/JFWM-20-076.S1 (5.89 KB ZIP) and https://github.com/mkutu/Nilgai/tree/master/notebooks (15.57 MB IPYNB)

Data S2. A sample of 222 new images from the camera-trap dataset collected in the lower Rio Grande Valley in Texas in 2018 and 2019. With this sample, along with true label information (also provided in the supplemental material), users can test the deep learning model to automatically classify images as wildlife or as being empty (a false camera trigger event).

Available: https://doi.org/10.3996/JFWM-20-076.S2 (29.2 MB ZIP) and https://github.com/mkutu/Nilgai/tree/master/images/images (28.5 MB JPG)

Data S3. The csv file contains true image label information for evaluating the accuracy of a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019.

Available: https://doi.org/10.3996/JFWM-20-076.S3 (7.36 KB CSV) and https://github.com/mkutu/Nilgai/blob/master/notebooks/image_labels.csv

Data S4. A set of two .h5 files that contain the stored weights and model settings created by training a deep learning model on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019.

Available: https://doi.org/10.3996/JFWM-20-076.S4 (239 MB ZIP) and https://github.com/mkutu/Nilgai/tree/master/model

Text S1. A “README.md” text file with instructions for creating a virtual environment needed for running a deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019. A virtual environment allows users to install dependencies, small pieces of software in the form of source code, that are required to run Python programs without making major changes to the users' systems. Instructions outline the procedures for setting up environments for both Windows and Mac OSX operating systems. Notes on troubleshooting are also included.

Available: https://doi.org/10.3996/JFWM-20-076.S5 (3.33 KB TXT) and https://github.com/mkutu/Nilgai/blob/master/README.md

Text S2. A “requirements.txt” text file used to install the required Python dependencies, small pieces of software in the form of source code, inside the virtual environment. Dependencies are required to run the deep learning model trained on camera-trap images collected in the lower Rio Grande Valley in Texas in 2018 and 2019.

Available: https://doi.org/10.3996/JFWM-20-076.S6 (1 KB TXT) and https://github.com/mkutu/Nilgai/blob/master/requirements.txt

Game camera images and initial processing were supported through appropriated research project 3094-32000-042-00-D, Integrated Pest Management of Cattle Fever Ticks. This article reports results of research only and mention of a proprietary product does not constitute an endorsement or recommendation by the U.S. Department of Agriculture for its use. U.S. Department of Agriculture is an equal opportunity provider and employer. Special thanks to Amelia Berle for data management, and research technicians who spent countless hours labeling images. Additional thanks to Dr. Rupesh Kariyat and Dr. Christofferson for providing access to computing equipment. We would also like to thank the journal reviewers and Associate Editor for their commitment to open access, which ensures applied conservation science remains accessible to all. Matthew Kutugata was supported by U.S. Department of Agriculture National Institute of Food and Agriculture Grant 2016-38422-25543.

Any use of trade, product, website, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Beery S, Horn GV, Perona P. 2018. Recognition in terra incognita. Pages 456–473 in Proceedings of the Computer Vision–ECCV 2018, 15th European Conference, Munich, Germany.

Chollet F. 2018. Deep learning with Python. Shelter Island, New York: Manning Publications Co.

Cui Y, Song Y, Sun C, Howard A, Belongie S. 2018. Large scale fine-grained categorization and domain-specific transfer learning. In Conference on Computer Vision and Pattern Recognition.

Foley AM, Goolsby JA, Ortega-S. A, Ortega-S. JA, Pérez de León A, Singh NK, Schwartz A, Ellis D, Hewitt DG, Campbell TA. 2017. Movement patterns of nilgai antelope in South Texas: implications for cattle fever tick management. Preventive Veterinary Medicine 146:166–172.

Gomez A, Diez G, Salazar A, Diaz A. 2016. Animal identification in low quality camera-trap images using very deep convolutional neural networks and confidence thresholds. Pages 747–756 in Bebis G, Boyle R, Parvin B, Koracin D, Porikli F, Skaff S, Entezari A, Min J, Iwai D, Sadagic A, Scheidegger C, Isenberg T, editors. International symposium on visual computing. Cham, Switzerland: Springer International Publishing.

Goolsby J, Cantu D, Vasquez A, Racelis A. 2019. Development of a remotely activated field sprayer and evaluation of temperature and aeration on the longevity of Steinernema riobrave entomopathogenic nematodes for treatment of cattle fever tick-infested nilgai. Subtropical Agriculture and Environments 70:1–5.

Guilford JP. 1954. Psychometric methods. New York: McGraw-Hill.

He H, Garcia EA. 2009. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21:1263–1284.

Howe EJ, Buckland ST, Després-Einspenner ML, Kühl HS. 2017. Distance sampling with camera traps. Methods in Ecology and Evolution 8:1558–1565.

Ivan JS, Newkirk ES. 2016. CPW photo warehouse: a custom database to facilitate archiving, identifying, summarizing and managing photo data collected from camera traps. Methods in Ecology and Evolution 7:499–504.

Lading E. 2006. Camera trapping and conservation in Lambir Hills National Park, Sarawak. The Raffles Bulletin of Zoology 54:469–475.

[LILA BC] Labeled Information Library of Alexandria: Biology and Conservation. 2019. Available: http://lila.science/datasets (July 2021)

Leslie DM Jr. 2008. Boselaphus tragocamelus (Artiodactyla: Bovidae). Mammalian Species 813:1–16.

Leslie DM Jr. 2016. An international borderland of concern: conservation of biodiversity in the lower Rio Grande Valley. Reston, Virginia: U.S. Geological Survey. Scientific Investigations Report 2016-5078.

Lohmeyer KH, May MA, Thomas DB, Pérez de León AA. 2018. Implication of nilgai antelope (Artiodactyla: Bovidae) in reinfestations of Rhipicephalus (Boophilus) microplus (Acari: Ixodidae) in South Texas: a review and update. Journal of Medical Entomology 55:515–522.

Norouzzadeh MS, Nguyen A, Kosmala M, Swanson A, Palmer MS, Packer C, Clune J. 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences of the United States of America 115:E5716–E5725.

O'Connell AF, Nichols JD, Karanth KU. 2010. Camera traps in animal ecology: methods and analyses. New York: Springer Science + Business Media.

Rovero F, Rathbun GB. 2006. A potentially new giant sengi (elephant-shrew) from the Udzungwa Mountains, Tanzania. Journal of East African Natural History 95(2):111–115.

Schmidly DJ, Bradley RD. 2004. The mammals of Texas. Austin: University of Texas Press.

Srbek-Araujo AC, Chiarello AG. 2005. Is camera-trapping an efficient method for surveying mammals in neotropical forests? A case study in south-eastern Brazil. Journal of Tropical Ecology 21:121–125.

Swanson A, Kosmala M, Lintott C, Packer C. 2016. A generalized approach for producing, quantifying, and validating citizen science data from wildlife images. Conservation Biology 30:520–531.

Swanson A, Kosmala M, Lintott C, Simpson R, Smith A, Packer C. 2015. Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data 2:1–14.

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. 2016. Rethinking the inception architecture for computer vision. Las Vegas, Nevada: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Tabak MA, Norouzzadeh MS, Wolfson DW, Sweeney SJ, Vercauteren KC, Snow NP, Halseth JM, Salvo PAD, Lewis JS, White MD, Teton B, Beasley JC, Schlichting PE, Boughton RK, Wight B, Newkirk ES, Ivan JS, Odell EA, Brook RK, Lukacs PM, Moeller AK, Mandeville EG, Clune J, Miller RS. 2019. Machine learning to classify animal species in camera trap images: applications in ecology. Methods in Ecology and Evolution 10:585–590.

Toda Y, Okura F. 2019. How convolutional neural networks diagnose plant disease. Plant Phenomics 2019:9237136.

Ueda K. 2017. iNaturalist research-grade observations. Online database: occurrence dataset.

Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A, Adam H, Perona P, Belongie S. 2018. The iNaturalist species classification and detection dataset. Pages 8769–8778 in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Willi M, Pitman RT, Cardoso AW, Locke C, Swanson A, Boyer A, Veldthuis M, Fortson L. 2019. Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution 10:80–91.

The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the U.S. Fish and Wildlife Service.

Author notes

Citation: Kutugata M, Baumgardt J, Goolsby JA, and Racelis AE. 2021. Automatic camera-trap classification using wildlife-specific deep learning in nilgai management. Journal of Fish and Wildlife Management 12(2):412–421; e1944-687X. https://doi.org/10.3996/JFWM-20-076
