Can you figure out what species this is? Computers can
SOURCE: Willi, M., Pitman, R. T., Cardoso, A. W., Locke, C., Swanson, A., Boyer, A., Veldthuis, M. & Fortson, L. (2019). Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution, 10(1), 80-91, https://doi.org/10.1111/2041-210X.13099.
Cover photo: Arunachal macaque in a camera trap image from the Eaglenest Wildlife Sanctuary. Nandini Velho, Wikimedia Commons.
Camera traps and conservation
Protecting wild animals requires far more data than scientists could collect alone, so researchers often enlist volunteer “citizen scientists” in the process. Citizen scientists are people who may not have formal science education but volunteer their time to help with scientific research. One of the easiest ways citizen scientists can contribute is by looking at camera trap photos (that is, photos taken automatically by motion-sensing cameras in the wild) and identifying any animals in them. Using these identifications, scientists can answer important questions about the location, population size, and behavior of many different types of animals. This approach has already produced data that can aid in the conservation of lions, sea lions, wild cats, sea birds, giraffes, and more.
However, with more and more large-scale projects relying on citizen scientists to identify animals, it is taking an increasingly long time to process all of the photos from any individual study. Marco Willi from the University of Minnesota and his colleagues thought there might be a way to speed things up: get computers to identify most of the easy animals, and leave humans to figure out the extra-hard ones.
Teaching the computer zoology
Just like humans, computers learn best by example: creating an algorithm usually involves giving the computer a training dataset and teaching it to classify examples whose answers we already know. In this case, that meant taking images from four different camera trap studies and letting the computer learn which features of an image led to the photo being classified as a given species.
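The idea of learning from labeled examples can be sketched with a deliberately tiny classifier. Nothing below comes from the paper itself (which uses deep neural networks trained on pixels); the feature values, species labels, and the k-nearest-neighbor approach are all stand-ins chosen only to illustrate the principle.

```python
def knn_predict(training_features, training_labels, query, k=3):
    """Sketch of supervised learning: memorize labeled examples, then
    classify a new photo's features by majority vote among the k
    closest training examples."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in zip(training_features, training_labels)
    )
    votes = [label for _, label in distances[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical training set: (body size, stripe score) per photo.
features = [[5.0, 0.9], [5.2, 0.8], [1.0, 0.1], [1.2, 0.0]]
labels = ["zebra", "zebra", "monkey", "monkey"]
print(knn_predict(features, labels, [5.1, 0.7]))  # → zebra
```

A real camera trap model learns the features themselves from raw pixels; the toy above skips that step and hands the classifier pre-made numbers.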
Once they had a few algorithms up and running, the critical next step was to evaluate how the models performed. This seems like a straightforward task, but there are actually a lot of different things to consider when evaluating the algorithms. First, it is important to know where an algorithm goes wrong when it misclassifies photos. Is it getting all the giraffes right but none of the monkeys? Is it able to tell when there is no animal in the photo? Some animals were much more common than others in the set of training photos, so the algorithm had more experience classifying those animals than others. The researchers needed to correct for this because, in the end, the algorithm should be able to correctly identify all of the species present in the photos.
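Why one overall accuracy number is not enough can be shown with a short sketch. The species names and counts below are invented for illustration; the point is that breaking accuracy down per class exposes failures on rare animals that a single average would hide.

```python
from collections import defaultdict

def per_species_accuracy(true_labels, predicted_labels):
    """Accuracy broken down by species, so a model that aces the common
    animals but misses the rare ones cannot hide behind one number."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {species: correct[species] / total[species] for species in total}

# A skewed toy dataset: many giraffes, few monkeys (made-up labels).
truth = ["giraffe"] * 8 + ["monkey"] * 2
guess = ["giraffe"] * 8 + ["giraffe", "monkey"]
print(per_species_accuracy(truth, guess))
# Overall accuracy is 90%, but monkeys are only 50% correct.
```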
Blank photos also matter: between 21.7% and 83.9% of the photos from a given camera trap study did not include any animals. These were likely triggered by plants moving in the wind or other subtle movements in the frame that set off the motion-activated camera.
To deal with both of these potential pitfalls, the authors split the task into two separate challenges. First, one algorithm determined whether or not there was anything in the photo; then, for the photos that did contain an animal, a second algorithm determined which species it was.
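The two-stage split described above can be sketched as a simple routing function. The `detect_animal` and `identify_species` callables here are hypothetical stand-ins for the two trained models; only the routing logic reflects the idea in the text.

```python
def classify_photo(photo, detect_animal, identify_species):
    """Two-stage pipeline sketch: first decide whether anything is in
    the frame at all, and only name a species for non-blank photos."""
    if not detect_animal(photo):
        return "blank"
    return identify_species(photo)

# Hypothetical stand-in models for illustration.
detect = lambda photo: photo.get("has_animal", False)
identify = lambda photo: photo["species"]

print(classify_photo({"has_animal": False}, detect, identify))
print(classify_photo({"has_animal": True, "species": "giraffe"}, detect, identify))
```

Splitting the task this way lets each model specialize: the blank detector only has to answer a yes/no question, and the species model never wastes capacity on empty frames.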
Can learning from one system be applied to other studies?
One issue with creating these algorithms is that you need a lot of data from a given part of the world to accurately identify species there (just as you may need to live somewhere for a long time before you know its plants and animals). To address this, the authors tried to create an algorithm that could learn from one dataset (for example, animals in Africa) and then adapt what it already knows about identifying animals to learn the animals in another part of the world. In this case, there are two training datasets: one from a previous camera trap study that someone else ran, and a much smaller dataset specific to the location and cameras used in the new study. This approach is called transfer learning, and the researchers wanted to test how helpful the previous study is, and how much data they need from the new site.
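The core of transfer learning, keeping what was learned on a big prior dataset and refitting only a small final piece on the new site's data, can be sketched in miniature. Everything here is a toy assumption: the "feature extractor" is a fixed hand-written function standing in for a pretrained deep network, and the species, images, and numbers are invented.

```python
def fit_centroids(features, labels):
    """Per-class mean feature vectors: a toy stand-in for retraining
    only the final classification layer of a network."""
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def nearest(centroids, vector):
    """Pick the class whose centroid is closest to the feature vector."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], vector)))

# Hypothetical feature extractor "learned" on a large prior dataset;
# in the paper this role is played by a deep network.
def extract(image):
    return [sum(image) / len(image), max(image) - min(image)]

# Transfer learning: keep `extract` fixed, fit only the lightweight
# classifier head on the few examples from the new site.
new_site_images = [[1, 2, 3], [2, 2, 2], [8, 9, 7], [9, 9, 9]]
new_site_labels = ["deer", "deer", "bear", "bear"]
head = fit_centroids([extract(im) for im in new_site_images], new_site_labels)
print(nearest(head, extract([8, 8, 9])))  # → bear
```

Because the expensive part (the extractor) is reused rather than relearned, only a handful of new-site examples are needed to fit the head, which is exactly the saving the researchers were after.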
Once they had a functioning workflow, they needed to answer a few really important questions: how much time does this really save citizen scientists? How many of the photos can the algorithm confidently predict? How can researchers be sure they catch any mistakes the computer makes?
In the end, the authors found really exciting results. The algorithms they built correctly identified whether or not photos contained animals between 91.2% and 98.0% of the time, and nine times out of ten they were also able to correctly determine the specific species of animal. Using the transfer-learning strategy, where the algorithm gets information from other parts of the world in addition to photos from the study it is actually working on, they were able to raise the overall accuracy. This was especially true when the model was trained on only a small fraction of the data: learning from just 12.5% of the data, the algorithm got three-fourths of the images correct on its own, but over 85% correct with transfer learning. This was true even when the algorithm learned from images in Africa and then had to identify images from Wisconsin (USA)!
These levels are still not quite as high as when citizen scientists identify the animals, as citizen scientists tend to get over 95% of the classifications correct. Still, they are not too far off. If researchers don’t count instances where the algorithm has low confidence in its prediction (and instead give these harder photos to citizen scientists), they can raise the accuracy of the predictions all the way to the same level as humans. Most exciting of all, when this workflow is fully implemented, it decreases the total amount of human work required for any given study by over 40%. That means that even without dramatically increasing the amount of work that citizen scientists are doing, researchers will be able to complete more intensive studies and get results back even faster, giving a better sense of where animals are and what they are doing than we have ever had before.
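The confidence hand-off described above amounts to a simple routing rule. This sketch assumes each prediction comes with a confidence score; the 0.9 threshold, photo IDs, and labels below are illustrative choices, not values from the paper.

```python
def route_predictions(predictions, threshold=0.9):
    """Keep only confident machine predictions; send the rest to
    citizen scientists. `predictions` maps photo IDs to a
    (label, confidence) pair."""
    automated, for_humans = {}, []
    for photo_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            automated[photo_id] = label
        else:
            for_humans.append(photo_id)
    return automated, for_humans

preds = {"img1": ("zebra", 0.98), "img2": ("blank", 0.95), "img3": ("monkey", 0.60)}
automated, for_humans = route_predictions(preds)
print(automated)    # confident predictions are accepted automatically
print(for_humans)   # low-confidence photos go to volunteers: ['img3']
```

Raising the threshold trades automation for accuracy: fewer photos are handled by the machine, but the ones it keeps are the ones it is most likely to get right.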
Camera trap studies are used around the world to track the distribution and interactions of a wide variety of animals, helping answer important ecological questions and aid in animal conservation. The computer-based methods analyzed in this paper have the potential to dramatically reduce the amount of human effort required to analyze these photos, which increases the speed at which scientists can see the results of their studies and respond to new information. However, it is important to note that none of this would be possible without the help of hard-working citizen scientist volunteers! Citizen scientists are critical to the development and success of many types of research, and citizen science offers a great opportunity for people to learn about scientific research. If you want to get involved (and have fun looking at cool nature pictures!) check out zooniverse.org for a variety of online citizen science projects, including over 20 different camera trap studies that could always use more help!