Catalogue Search | MBRL
354 result(s) for "Vocalization, Animal - classification"
Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires
by Gentner, Timothy Q.; Sainburg, Tim; Thielk, Marvin
in Acoustics, Algorithms, Animal communication
2020
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
Journal Article
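For illustration, a minimal sketch of the latent-projection idea described in the entry above, assuming librosa and umap-learn are installed; `wav_paths` is a hypothetical list of single-call recordings, and all parameter values are assumptions rather than the authors' settings:

```python
import numpy as np
import librosa
import umap

wav_paths = ["call_001.wav", "call_002.wav"]  # hypothetical recordings

def call_spectrogram(path, sr=22050, n_mels=32, frames=64):
    """Load one call and return a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S = librosa.power_to_db(S, ref=np.max)
    # Pad or crop the time axis so every call has the same shape.
    if S.shape[1] < frames:
        S = np.pad(S, ((0, 0), (0, frames - S.shape[1])))
    return S[:, :frames]

specs = np.stack([call_spectrogram(p) for p in wav_paths])
X = specs.reshape(len(specs), -1)                       # one flat vector per call
embedding = umap.UMAP(n_components=2).fit_transform(X)  # 2-D latent space
```

Each call becomes a point in the two-dimensional embedding, so repertoire structure can be inspected visually or quantified with standard clustering tools.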
Elephant Sound Classification Using Deep Learning Optimization
by
Dewmini, Hiruni
,
Perera, Charith
,
Meedeniya, Dulani
in
Accuracy
,
Acoustic properties
,
Algorithms
2025
Elephant sound identification is crucial in wildlife conservation and ecological research. Identifying elephant vocalizations provides insights into their behavior, social dynamics, and emotional expressions, supporting elephant conservation. This study addresses elephant sound classification utilizing raw audio processing. Our focus lies on exploring lightweight models suitable for deployment on resource-constrained edge devices, including MobileNet, YAMNET, and RawNet, alongside introducing a novel model termed ElephantCallerNet. Notably, our investigation reveals that the proposed ElephantCallerNet achieves an impressive accuracy of 89% in classifying raw audio directly, without converting it to spectrograms. Leveraging Bayesian optimization techniques, we fine-tuned crucial parameters such as learning rate, dropout, and kernel size, thereby enhancing the model's performance. Moreover, we scrutinized the efficacy of spectrogram-based training, a prevalent approach in animal sound classification. Through comparative analysis, raw audio processing outperforms spectrogram-based methods. In contrast to other models in the literature that primarily focus on a single caller type or on binary classification that identifies whether a sound is an elephant voice or not, our solution is designed to classify three distinct caller types, namely roar, rumble, and trumpet.
Journal Article
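The abstract does not specify ElephantCallerNet's architecture, so the following is only an illustrative raw-waveform classifier in PyTorch; the layer sizes, the 16 kHz one-second input, and the three-class head (roar, rumble, trumpet) are assumptions. The Bayesian tuning of learning rate, dropout, and kernel size described above would sit on top of such a model.

```python
import torch
import torch.nn as nn

class RawAudioNet(nn.Module):
    """Toy 1-D CNN that classifies raw waveforms, no spectrogram step."""
    def __init__(self, n_classes=3):  # roar, rumble, trumpet
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # global average pooling over time
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, 1, samples)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)

model = RawAudioNet()
logits = model(torch.randn(8, 1, 16000))  # 8 one-second 16 kHz clips -> (8, 3)
```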
Explainable classification of goat vocalizations using convolutional neural networks
by Ntalampiras, Stavros; Pesando Gamacchio, Gabriele
in Acoustic properties, Agricultural research, Agriculture
2025
Efficient precision livestock farming relies on timely access to data and information that accurately describe both the animals and their surrounding environment. This paper advances the classification of goat vocalizations by leveraging a publicly available dataset recorded at diverse farms breeding different species. We developed a Convolutional Neural Network (CNN) architecture tailored for classifying goat vocalizations, yielding an average classification rate of 95.8% in discriminating various goat emotional states. To this end, we suitably augmented the existing dataset using pitch shifting and time stretching techniques, boosting the robustness of the trained model. After demonstrating the superiority of the designed architecture over contrasting approaches, we provide insights into the underlying mechanisms governing the proposed CNN through an extensive interpretation study. More specifically, we conducted an explainability analysis to identify the time-frequency content within goat vocalizations that significantly impacts the classification process. Such XAI-driven validation not only provides transparency in the decision-making process of the CNN model but also sheds light on the acoustic features crucial for distinguishing the considered classes. Finally, the proposed solution encompasses an interactive scheme able to provide valuable information to animal scientists regarding the analysis performed by the model, highlighting the distinctive components of the considered goat vocalizations. Our findings underline the effectiveness of data augmentation techniques in bolstering classification accuracy and highlight the significance of leveraging XAI methodologies for validating and interpreting complex machine learning models applied to animal vocalizations.
Journal Article
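A minimal sketch of the pitch-shifting and time-stretching augmentation step mentioned above, using librosa; the two-semitone shift and the 0.9 stretch rate are arbitrary assumptions, not the paper's settings:

```python
import librosa

def augment(y, sr):
    """Return pitch-shifted and time-stretched copies of one call."""
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # up 2 semitones
    stretched = librosa.effects.time_stretch(y, rate=0.9)       # 10% slower
    return [shifted, stretched]

y, sr = librosa.load(librosa.example("trumpet"))  # stand-in signal
augmented = augment(y, sr)                        # extra training examples
```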
Impact of transfer learning methods and dataset characteristics on generalization in birdsong classification
2025
Animal sounds can be recognised automatically by machine learning, and this has an important role to play in biodiversity monitoring. Yet despite increasingly impressive capabilities, bioacoustic species classifiers still exhibit imbalanced performance across species and habitats, especially in complex soundscapes. In this study, we explore the effectiveness of transfer learning in large-scale bird sound classification across various conditions, including single- and multi-label scenarios, and across different model architectures such as CNNs and Transformers. Our experiments demonstrate that both finetuning and knowledge distillation yield strong performance, with cross-distillation proving particularly effective in improving in-domain performance on Xeno-canto data. However, when generalizing to soundscapes, shallow finetuning exhibits superior performance compared to knowledge distillation, highlighting its robustness and constrained nature. Our study further investigates how to use multi-species labels, in cases where these are present but incomplete. We advocate for more comprehensive labeling practices within the animal sound community, including annotating background species and providing temporal details, to enhance the training of robust bird sound classifiers. These findings provide insights into the optimal reuse of pretrained models for advancing automatic bioacoustic recognition.
Journal Article
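As a rough sketch of the "shallow finetuning" strategy discussed above: freeze a pretrained backbone and retrain only the classification head. A torchvision ResNet stands in here for whatever pretrained audio model is actually used, and the 500-class head is an assumption:

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)  # stand-in pretrained backbone
for param in model.parameters():
    param.requires_grad = False                     # freeze pretrained weights
model.fc = nn.Linear(model.fc.in_features, 500)     # new head: 500 bird classes
# During training, only model.fc receives gradient updates.
```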
Bird population declines and species turnover are changing the acoustic properties of spring soundscapes
2021
Natural sounds, and bird song in particular, play a key role in building and maintaining our connection with nature, but widespread declines in bird populations mean that the acoustic properties of natural soundscapes may be changing. Using data-driven reconstructions of soundscapes in lieu of historical recordings, here we quantify changes in soundscape characteristics at more than 200,000 sites across North America and Europe. We integrate citizen science bird monitoring data with recordings of individual species to reveal a pervasive loss of acoustic diversity and intensity of soundscapes across both continents over the past 25 years, driven by changes in species richness and abundance. These results suggest that one of the fundamental pathways through which humans engage with nature is in chronic decline, with potentially widespread implications for human health and well-being.
Journal Article
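The abstract does not list the soundscape indices used, but a common choice in this literature is a Shannon-style acoustic diversity over frequency bands; the sketch below is illustrative only:

```python
import numpy as np

def acoustic_diversity(spectrogram, n_bands=10):
    """Shannon entropy of energy spread across frequency bands."""
    bands = np.array_split(spectrogram, n_bands, axis=0)
    energy = np.array([b.sum() for b in bands])
    p = energy / energy.sum()
    return -np.sum(p * np.log(p + 1e-12))  # higher = more even spread

S = np.abs(np.random.rand(128, 500))       # stand-in (freq, time) spectrogram
print(acoustic_diversity(S))
```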
SeqFusionNet: A hybrid model for sequence-aware and globally integrated acoustic representation
2025
Animals communicate information primarily via their calls, and directly using their vocalizations proves essential for executing species conservation and tracking biodiversity. Conventional visual approaches are frequently limited by distance and surroundings, while call-based monitoring concentrates solely on the animals themselves, proving more effective and straightforward than visual techniques. This paper introduces an animal sound classification model named SeqFusionNet, integrating the sequential encoding of a Transformer with the global perception of an MLP to achieve robust global feature extraction. The research involved compiling and organizing four common acoustic datasets (pig, bird, urbansound, and marine mammal), with extensive experiments exploring the applicability of vocal features across species and the model's recognition capabilities. Experimental results validate SeqFusionNet's efficacy in classifying animal calls: it identifies four pig call types at 95.00% accuracy, nine and six bird categories at 94.52% and 95.24% respectively, and fifteen and eleven marine mammal types at 96.43% and 97.50% accuracy, while attaining 94.39% accuracy on ten urban sound categories. Comparative analysis shows our method surpasses existing approaches. Beyond matching reference models on UrbanSound8K, SeqFusionNet demonstrates strong robustness and generalization across species. This research offers an expandable, efficient framework for automated bioacoustic monitoring, supporting wildlife preservation, ecological studies, and environmental sound analysis applications.
Journal Article
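The exact SeqFusionNet architecture is not given in the abstract; the PyTorch sketch below only illustrates the general idea of fusing a Transformer branch (sequential encoding) with an MLP branch (global perception), with all dimensions assumed:

```python
import torch
import torch.nn as nn

class HybridAudioNet(nn.Module):
    """Toy fusion of a Transformer branch and a global MLP branch."""
    def __init__(self, n_mels=64, d_model=128, n_classes=10):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlp = nn.Sequential(nn.Linear(n_mels, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, x):                             # x: (batch, frames, n_mels)
        seq = self.encoder(self.proj(x)).mean(dim=1)  # sequential branch
        glob = self.mlp(x.mean(dim=1))                # global branch
        return self.head(torch.cat([seq, glob], dim=-1))

logits = HybridAudioNet()(torch.randn(4, 100, 64))    # -> (4, 10)
```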
A multi-stage ensemble framework for classifying pig vocalizations under noisy animal farm environments
2025
Pig vocalizations are important indicators of health and emotional state, offering significant potential for advancing precision livestock farming. Accurate recognition of these vocalization patterns requires robust noise filtering and effective feature extraction. However, existing studies often focus on isolated patterns such as coughs, limiting their practical applicability in real-world settings. This study introduces the Pig Vocalization Multi-stage Classification (PVMC) model, a comprehensive framework designed to detect and classify a wide range of pig vocalizations under diverse farm conditions for assessing health and emotional stress. PVMC adopts a multi-stage approach that integrates cough and scream detection with emotional state classification, providing a holistic analysis of pig vocalizations. The proposed system features: (1) improved robustness across varying vocalization durations and noise levels, (2) customized model architectures optimized for each stage of the pipeline, and (3) an ensemble learning strategy combining Wav2Vec2 and AST (Audio Spectrogram Transformer) to enhance performance and computational efficiency. PVMC achieved a signal-to-noise ratio (SNR) improvement of up to 4.9 dB, 95.80% accuracy in vocalization segmentation, 98.88% accuracy in key vocalization classification, and 92.15% accuracy in emotional state detection. Notably, the ensemble method significantly improved overall precision, recall, and F1-score. These results demonstrate the PVMC model’s robustness and practical utility as a deployable solution for real-time pig vocalization monitoring, contributing to intelligent, welfare-oriented livestock management systems.
Journal Article
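The fusion rule of the Wav2Vec2 + AST ensemble is not specified in the abstract; a common, minimal choice is to average the two models' class probabilities, as sketched below (equal weights are an assumption, and the random logits are stand-ins for real model outputs):

```python
import torch

def ensemble_predict(wav2vec2_logits, ast_logits):
    """Average class probabilities from the two models, then argmax."""
    probs = (torch.softmax(wav2vec2_logits, dim=-1)
             + torch.softmax(ast_logits, dim=-1)) / 2
    return probs.argmax(dim=-1)

# Toy example: 2 clips, 4 vocalization classes.
pred = ensemble_predict(torch.randn(2, 4), torch.randn(2, 4))
```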
Towards a new taxonomy of primate vocal production learning
2020
The extent to which vocal learning can be found in nonhuman primates is key to reconstructing the evolution of speech. Regarding the adjustment of vocal output in relation to auditory experience (vocal production learning in the narrow sense), effects on the ontogenetic trajectory of vocal development as well as adjustment to group-specific call features have been found. Yet, a comparison of the vocalizations of different primate genera revealed striking similarities in the structure of calls and repertoires in different species of the same genus, indicating that the structure of nonhuman primate vocalizations is highly conserved. Thus, modifications in relation to experience only appear to be possible within relatively tight species-specific constraints. By contrast, comprehension learning may be extremely rapid and open-ended. In conjunction, these findings corroborate the idea of an ancestral independence of vocal production and auditory comprehension learning. To overcome the futile debate about whether or not vocal production learning can be found in nonhuman primates, we suggest putting the focus on the different mechanisms that may mediate the adjustment of vocal output in response to experience; these mechanisms may include auditory facilitation and learning from success.
This article is part of the theme issue ‘What can animal communication teach us about human language?’
Journal Article
Multiclass CNN Approach for Automatic Classification of Dolphin Vocalizations
by De Marco, Rocco; Li Veli, Daniel; Lucchetti, Alessandro
in Acoustics, Algorithms, Animal vocalization
2025
Monitoring dolphins in the open sea is essential for understanding their behavior and the impact of human activities on marine ecosystems. Passive Acoustic Monitoring (PAM) is a non-invasive technique for tracking dolphins, providing continuous data. This study presents a novel approach for classifying dolphin vocalizations from PAM acoustic recordings using a convolutional neural network (CNN). Four types of common bottlenose dolphin (Tursiops truncatus) vocalizations were identified from underwater recordings: whistles, echolocation clicks, burst pulse sounds, and feeding buzzes. To enhance classification performance, edge-detection filters were applied to spectrograms with the aim of removing unwanted noise components. A dataset of nearly 10,000 spectrograms was used to train and test the CNN through a 10-fold cross-validation procedure. The results showed that the CNN achieved an average accuracy of 95.2% and an F1-score of 87.8%. The class-specific results showed high accuracy for whistles (97.9%), followed by echolocation clicks (94.5%), feeding buzzes (94.0%), and burst pulse sounds (92.3%). The highest F1-score was obtained for whistles, exceeding 95%, while the other three vocalization types maintained an F1-score above 80%. This method provides a promising step toward improving the passive acoustic monitoring of dolphins, contributing to both species conservation and the mitigation of conflicts with fisheries.
Journal Article
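The abstract mentions edge-detection filters on spectrograms without naming the filter; a Sobel gradient magnitude, as sketched below with SciPy, is one standard choice and is an assumption here:

```python
import numpy as np
from scipy import ndimage

def edge_enhance(spectrogram):
    """Return the gradient magnitude of a (freq, time) spectrogram."""
    gx = ndimage.sobel(spectrogram, axis=0)  # edges along frequency
    gt = ndimage.sobel(spectrogram, axis=1)  # edges along time
    return np.hypot(gx, gt)

S = np.random.rand(128, 256)                 # stand-in spectrogram
edges = edge_enhance(S)                      # edge-enhanced input to the CNN
```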
Comparison of methods for rhythm analysis of complex animals’ acoustic signals
2020
Analyzing the rhythm of animals' acoustic signals is of interest to a growing number of researchers: evolutionary biologists want to disentangle how these structures evolved and what patterns can be found, while ecologists and conservation biologists aim to discriminate cryptic species on the basis of parameters of acoustic signals such as temporal structures. Temporal structures are also relevant for research on vocal production learning, part of which is for the animal to learn a temporal structure. These structures, in other words these rhythms, are the topic of this paper. How can they be investigated in a meaningful, comparable, and universal way? Several approaches exist. Here we applied five methods to compare their suitability and interpretability for different questions and datasets, and tested how well they support reproducibility of results and bypass biases. Three datasets that differ greatly in recording situation, length, and context were analyzed: two social vocalizations of Neotropical bats (multisyllabic, medium-length isolation calls of Saccopteryx bilineata, and monosyllabic, very short isolation calls of Carollia perspicillata) and click trains of sperm whales, Physeter macrocephalus. The techniques compared included Fourier analysis with a newly developed goodness-of-fit value, a generate-and-test approach in which the data were overlaid with varying artificial beats, and the analysis of inter-onset intervals with calculation of a normalized Pairwise Variability Index (nPVI). We discuss the advantages and disadvantages of these methods and also offer suggestions on how best to visualize rhythm analysis results. Furthermore, we developed a decision tree that will enable researchers to select a suitable and comparable method on the basis of their data.
Journal Article
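Of the methods listed above, the normalized Pairwise Variability Index is simple enough to state directly: for inter-onset intervals d_1, ..., d_m, nPVI = 100/(m-1) * sum over adjacent pairs of |d_k - d_{k+1}| / ((d_k + d_{k+1})/2). A small sketch with made-up onset times:

```python
import numpy as np

def npvi(onsets):
    """Normalized Pairwise Variability Index of inter-onset intervals."""
    d = np.diff(np.asarray(onsets, dtype=float))  # inter-onset intervals
    pairs = np.abs(d[:-1] - d[1:]) / ((d[:-1] + d[1:]) / 2)
    return 100 * pairs.mean()

# Perfectly regular onsets give nPVI = 0; irregular trains score higher.
print(npvi([0.0, 0.5, 1.0, 1.5, 2.0]))  # 0.0
print(npvi([0.0, 0.2, 0.9, 1.1, 2.0]))  # roughly 116.5
```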