Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology.

S P Kyathanahally, T Hardeman, M Reyes, E Merz, T Bulas, P Brun, F Pomati, M Baity-Jesi
Author Information
  1. S P Kyathanahally: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland. sreenath.kyathanahally@eawag.ch.
  2. T Hardeman: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  3. M Reyes: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  4. E Merz: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  5. T Bulas: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  6. P Brun: WSL, Zürcherstrasse 111, 8903, Birmensdorf, Switzerland.
  7. F Pomati: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
  8. M Baity-Jesi: Eawag, Überlandstrasse 133, 8600, Dübendorf, Switzerland. marco.baityjesi@eawag.ch.

Abstract

Monitoring biodiversity is paramount for managing and protecting natural resources. Collecting images of organisms over large temporal or spatial scales is a promising practice for monitoring the biodiversity of natural ecosystems, as it provides large amounts of data with minimal interference with the environment. Deep learning models are currently used to automate the classification of organisms into taxonomic units. However, imprecision in these classifiers introduces measurement noise that is difficult to control and can significantly hinder the analysis and interpretation of the data. We overcome this limitation through ensembles of Data-efficient image Transformers (DeiTs), which are not only easy to train and implement but also significantly outperform the previous state of the art (SOTA). We validate our results on ten ecological imaging datasets of diverse origin, ranging from plankton to birds. On all datasets we achieve a new SOTA, reducing the error with respect to the previous SOTA by 29.35% to 100.00%, and often coming very close to perfect classification. Ensembles of DeiTs perform better not because of superior single-model performance, but because independent models produce predictions with smaller overlaps and lower top-1 probabilities. This increases the benefit of ensembling, especially when the individual learners are combined through geometric averages. While we test our approach only on biodiversity image datasets, it is generic and can be applied to any kind of image.
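
The combination rule highlighted above, a geometric average of the individual learners' class probabilities, is straightforward to implement. Below is a minimal sketch in Python using the timm library, which ships pretrained DeiTs; the specific model variant, the ensemble size of three, and the function name are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch: geometric-average ensembling of DeiT classifiers.
# The model variant and ensemble size of 3 are illustrative assumptions.
import torch
import timm

# Independently trained DeiTs; in practice each member would be
# fine-tuned separately on the ecological dataset at hand.
models = [
    timm.create_model("deit_base_distilled_patch16_224", pretrained=True).eval()
    for _ in range(3)
]

@torch.no_grad()
def geometric_ensemble(images: torch.Tensor) -> torch.Tensor:
    """Predict classes by geometrically averaging per-model probabilities.

    The geometric mean of probabilities is the exponential of the
    arithmetic mean of log-probabilities, so we average in log space;
    since exp is monotone, the argmax is unchanged and the computation
    is numerically more stable.
    """
    log_probs = torch.stack(
        [torch.log_softmax(m(images), dim=-1) for m in models]
    )
    mean_log_prob = log_probs.mean(dim=0)  # geometric average, in log space
    return mean_log_prob.argmax(dim=-1)    # top class per image

# Usage: a batch of 224x224 RGB images, normalized as the models expect.
batch = torch.randn(4, 3, 224, 224)
predictions = geometric_ensemble(batch)
```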

Grants

  1. 5221.00492.999.01/Eawag Discretionary Fund
  2. Q392-1149/Bundesamt für Umwelt
  3. 182124/Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

MeSH Terms

Animals
Ecosystem
Biodiversity
Birds
Plankton
Diagnostic Imaging
