Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
348 result(s) for "Sound Spectrography - methods"
USVSEG: A robust method for segmentation of ultrasonic vocalizations in rodents
2020
Rodents' ultrasonic vocalizations (USVs) provide useful information for assessing their social behaviors. Despite previous efforts in classifying subcategories of time-frequency patterns of USV syllables to study their functional relevance, methods for detecting vocal elements from continuously recorded data have remained sub-optimal. Here, we propose a novel procedure for detecting USV segments in continuous sound data containing background noise recorded during the observation of social behavior. The proposed procedure utilizes a stable version of the sound spectrogram and additional signal processing for better separation of vocal signals by reducing the variation of the background noise. Our procedure also provides precise time tracking of spectral peaks within each syllable. We demonstrated that this procedure can be applied to a variety of USVs obtained from several rodent species. Performance tests showed this method had greater accuracy in detecting USV syllables than conventional detection methods.
Journal Article
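To make the detection idea concrete, here is a minimal Python sketch of spectrogram-threshold segmentation. It is not the USVSEG algorithm: the per-bin median noise-floor estimate, the 30–120 kHz band, the threshold, and the synthetic input are all illustrative assumptions.

```python
# Minimal sketch: threshold a spectrogram to find candidate USV segments.
# NOT the USVSEG algorithm; band limits, threshold, and noise-floor
# estimate below are illustrative assumptions.
import numpy as np
from scipy.signal import spectrogram

def detect_segments(x, fs, band=(30e3, 120e3), thresh_db=10.0, min_len_s=0.005):
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    band_mask = (f >= band[0]) & (f <= band[1])
    S_db = 10 * np.log10(S[band_mask] + 1e-12)
    # Per-frequency-bin median as a crude noise-floor estimate.
    floor = np.median(S_db, axis=1, keepdims=True)
    active = (S_db - floor).max(axis=0) > thresh_db  # frames above the floor
    # Collapse consecutive active frames into (start, end) segments.
    segs, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if t[i - 1] - t[start] >= min_len_s:
                segs.append((t[start], t[i - 1]))
            start = None
    if start is not None:
        segs.append((t[start], t[-1]))
    # Coarse analogue of the paper's spectral-peak tracking: the peak
    # frequency within the band for every frame.
    peak_track = f[band_mask][S_db.argmax(axis=0)]
    return segs, t, peak_track

# Assumes an ultrasonic sampling rate (here 250 kHz) so the band exists.
fs = 250_000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise as stand-in audio
segs, t, peaks = detect_segments(x, fs)
```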
Inharmonic speech reveals the role of harmonicity in the cocktail party problem
2018
The “cocktail party problem” requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.
Harmonicity is associated with a single sound source and may be a useful cue with which to segregate the speech of multiple talkers. Here the authors introduce a method for perturbing the constituent frequencies of speech and show that violating harmonicity degrades intelligibility of speech mixtures.
Journal Article
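A minimal sketch of the core manipulation described above: a harmonic complex tone whose partials sit at integer multiples of f0, versus an "inharmonic" version with jittered partials. The jitter range is an assumed parameter; the paper's analysis/synthesis tools operate on real speech and are far more elaborate.

```python
# Harmonic vs. inharmonic complex tone. The jitter parameter is an
# illustrative assumption, not the paper's perturbation scheme.
import numpy as np

def complex_tone(f0=200.0, n_partials=10, dur=0.5, fs=16000,
                 jitter=0.0, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    for n in range(1, n_partials + 1):
        # Harmonic partial at n*f0; jitter shifts it by up to +/- jitter*f0.
        fn = n * f0 + jitter * f0 * rng.uniform(-1, 1)
        x += np.sin(2 * np.pi * fn * t)
    return x / n_partials

harmonic   = complex_tone(jitter=0.0)   # partials at exact multiples of f0
inharmonic = complex_tone(jitter=0.3)   # harmonicity violated
```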
ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning
2019
Large bioacoustic archives of wild animals are an important source to identify reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis – particularly important for species with advanced social systems and complex vocalizations. In this study deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit ORCA-SPOT was tested on a large-scale bioacoustic repository – the Orchive – comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years) took approximately 8 days. It achieved a time-based precision or positive-predictive-value (PPV) of 93.2% and an area-under-the-curve (AUC) of 0.9523. This approach enables an automated annotation procedure of large bioacoustics databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
Journal Article
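For orientation only, a toy PyTorch CNN that labels fixed-size spectrogram patches as "orca" or "noise". The real ORCA-SPOT network, input shapes, and training regime differ; everything below is an assumption for illustration.

```python
# Toy binary detector over spectrogram patches; NOT the ORCA-SPOT model.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # two classes: orca / noise

    def forward(self, spec):          # spec: (batch, 1, freq_bins, frames)
        z = self.features(spec).flatten(1)
        return self.head(z)

model = TinyDetector()
logits = model(torch.randn(4, 1, 128, 256))  # 4 random "spectrograms"
probs = logits.softmax(dim=1)                # P(noise), P(orca) per patch
```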
Assessment of Laying Hens’ Thermal Comfort Using Sound Technology
by Carpentier, Lenn; Teng, Guanghui; Du, Xiaodong
in Algorithms; Animal Husbandry - methods; animal vocalisation
2020
Heat stress is one of the most important environmental stressors facing poultry production and welfare worldwide. The detrimental effects of heat stress on poultry range from reduced growth and egg production to impaired health. Animal vocalisations are associated with different animal responses and can be used as useful indicators of the state of animal welfare. It is already known that specific chicken vocalisations such as alarm, squawk, and gakel calls are correlated with stressful events, and therefore, could be used as stress indicators in poultry monitoring systems. In this study, we focused on developing a hen vocalisation detection method based on machine learning to assess their thermal comfort condition. For extraction of the vocalisations, nine source-filter theory related temporal and spectral features were chosen, and a support vector machine (SVM) based classifier was developed. As a result, the optimal SVM model achieved a sensitivity of 95.1 ± 4.3% and a precision of 97.6 ± 1.9%. Based on the developed algorithm, the study illustrated that a significant correlation existed between specific vocalisations (alarm and squawk calls) and a thermal comfort index (the temperature-humidity index, THI) (alarm-THI, R = −0.414, P = 0.01; squawk-THI, R = 0.594, P = 0.01). This work represents the first step towards the further development of technology to monitor flock vocalisations with the intent of providing producers an additional tool to help them actively manage the welfare of their flock.
Journal Article
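A hedged sketch of the classification stage: an RBF-kernel SVM over a few hand-crafted temporal and spectral features. The three stand-in features below (RMS energy, zero-crossing rate, spectral centroid) and the random training data are assumptions, not the authors' nine source-filter features.

```python
# SVM over simple temporal/spectral features; the feature set and data
# are illustrative stand-ins, not the paper's.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def features(x, fs):
    rms = np.sqrt(np.mean(x ** 2))                       # overall energy
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2       # zero-crossing rate
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return np.array([rms, zcr, centroid])

# X: one feature vector per sound clip; y: 1 = vocalisation, 0 = other.
rng = np.random.default_rng(0)
X = np.stack([features(rng.standard_normal(8000), 8000) for _ in range(40)])
y = rng.integers(0, 2, size=40)  # placeholder labels for illustration

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```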
Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance Systems
by Chung, Yongwha; Oh, Seunggeun; Lee, Jonguk
in Accuracy; Animals; Auscultation - instrumentation
2013
Automatic detection of pig wasting diseases is an important issue in the management of group-housed pigs. Further, respiratory diseases are one of the main causes of mortality among pigs and loss of productivity in intensive pig farming. In this study, we propose an efficient data mining solution for the detection and recognition of pig wasting diseases using sound data in audio surveillance systems. In this method, we extract the Mel Frequency Cepstrum Coefficients (MFCC) from sound data with an automatic pig sound acquisition process, and use a hierarchical two-level structure: the Support Vector Data Description (SVDD) and the Sparse Representation Classifier (SRC) as an early anomaly detector and a respiratory disease classifier, respectively. Our experimental results show that this new method can be used to detect pig wasting diseases both economically (even a cheap microphone can be used) and accurately (94% detection and 91% classification accuracy), either as a standalone solution or to complement known methods to obtain a more accurate solution.
Journal Article
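A rough sketch of the hierarchical two-level structure: an anomaly detector screens clips, and only flagged clips reach a disease classifier. scikit-learn's OneClassSVM stands in for SVDD and a plain SVC stands in for the sparse representation classifier (SRC); both substitutions, and the random placeholder data, are assumptions.

```python
# Two-stage screening sketch; OneClassSVM and SVC are stand-ins for the
# paper's SVDD and SRC, and the data are random placeholders.
import numpy as np
import librosa
from sklearn.svm import OneClassSVM, SVC

def mfcc_vector(x, sr):
    m = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13)  # (13, frames)
    return m.mean(axis=1)                            # clip-level summary

sr = 16000
rng = np.random.default_rng(0)
normal = np.stack([mfcc_vector(rng.standard_normal(sr), sr) for _ in range(30)])

detector = OneClassSVM(nu=0.1).fit(normal)               # stage 1: normal sounds only
classifier = SVC().fit(normal, rng.integers(0, 3, 30))   # stage 2: placeholder labels

clip = mfcc_vector(rng.standard_normal(sr), sr).reshape(1, -1)
if detector.predict(clip)[0] == -1:                      # -1 = flagged as anomalous
    print("stage 2 label:", classifier.predict(clip)[0])
```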
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces
by Savariaux, Christophe; Yvert, Blaise; Bocquelet, Florent
in Acoustics; Aphasia; Biofeedback, Psychology - instrumentation
2016
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
Journal Article
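A minimal sketch of the articulatory-to-acoustic mapping: a small feed-forward network from EMA sensor coordinates to vocoder parameters. The layer sizes and the 12-input/25-output dimensions are invented for illustration and are not the paper's configuration.

```python
# Feed-forward articulatory-to-acoustic mapper; all dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

N_ART, N_ACOUSTIC = 12, 25  # assumed: e.g. 6 sensors x/y in, vocoder params out

mapper = nn.Sequential(
    nn.Linear(N_ART, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, N_ACOUSTIC),
)

ema_frame = torch.randn(1, N_ART)    # one frame of articulator positions
acoustic = mapper(ema_frame)         # parameters a vocoder would render as audio
```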
Revisiting the syntactic abilities of non-human animals: natural vocalizations and artificial grammar learning
by ten Cate, Carel; Okanoya, Kazuo
in Acoustic Stimulation - methods; Animal vocalization; Animals
2012
The domain of syntax is seen as the core of the language faculty and as the most critical difference between animal vocalizations and language. We review evidence from spontaneously produced vocalizations as well as from perceptual experiments using artificial grammars to analyse animal syntactic abilities, i.e. abilities to produce and perceive patterns following abstract rules. Animal vocalizations consist of vocal units (elements) that are combined in a species-specific way to create higher order strings that in turn can be produced in different patterns. While these patterns differ between species, they have in common that they are no more complex than a probabilistic finite-state grammar. Experiments on the perception of artificial grammars confirm that animals can generalize and categorize vocal strings based on phonetic features. They also demonstrate that animals can learn about the co-occurrence of elements or learn simple ‘rules’ like attending to reduplications of units. However, these experiments do not provide strong evidence for an ability to detect abstract rules or rules beyond finite-state grammars. Nevertheless, considering the rather limited number of experiments and the difficulty of designing experiments that unequivocally demonstrate more complex rule learning, the question of what animals are able to do remains open.
Journal Article
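As a worked example of the complexity ceiling the review describes, here is a probabilistic finite-state grammar, i.e. a Markov chain over vocal elements; the states and transition probabilities are invented for illustration.

```python
# A probabilistic finite-state grammar as a Markov chain; elements and
# probabilities are invented, not from any studied species.
import numpy as np

elements = ["A", "B", "C", "end"]
# transitions[i][j] = P(next element = j | current element = i)
transitions = np.array([
    [0.1, 0.6, 0.2, 0.1],   # after A
    [0.0, 0.3, 0.5, 0.2],   # after B
    [0.4, 0.0, 0.3, 0.3],   # after C
    [0.0, 0.0, 0.0, 1.0],   # "end" absorbs
])

rng = np.random.default_rng(1)
seq, state = [], 0          # start from element A
while elements[state] != "end":
    seq.append(elements[state])
    state = rng.choice(len(elements), p=transitions[state])
print("".join(seq))         # e.g. "ABBC": a string this grammar can produce
```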
Gelada vocal sequences follow Menzerath’s linguistic law
by Bergman, Thore J.; Ferrer-i-Cancho, Ramon; Gustison, Morgan L.
in Algorithms; Animal communication; Animals
2016
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath’s law is a linguistic law that states that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath’s law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath’s law reflects compression—the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
Journal Article
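A toy illustration of the core measurement: the correlation between sequence size (number of calls) and mean call duration, where a negative coefficient is the Menzerath pattern. The durations below are fabricated, not gelada data.

```python
# Spearman correlation between sequence size and mean call duration;
# all durations are fabricated toy data.
import numpy as np
from scipy.stats import spearmanr

# Each inner list holds call durations (s) for one vocal sequence.
sequences = [
    [0.42],                          # short sequences: longer calls...
    [0.35, 0.38],
    [0.30, 0.28, 0.33],
    [0.25, 0.27, 0.22, 0.24],        # ...longer sequences: shorter calls
    [0.21, 0.19, 0.23, 0.20, 0.18],
]

sizes = [len(s) for s in sequences]
mean_durs = [np.mean(s) for s in sequences]
rho, p = spearmanr(sizes, mean_durs)
print(f"rho = {rho:.2f}")            # negative rho is the Menzerath pattern
```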
Exploring Spectrogram-Based Audio Classification for Parkinson’s Disease: A Study on Speech Classification and Qualitative Reliability Verification
2024
Patients with Parkinson’s disease suffer from voice impairment. In this study, we introduce models to classify normal and Parkinson’s patients using their speech. We used an AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and a CNN-based PSLA (pretraining, sampling, labeling, and aggregation), a high-performance model in the existing speech classification field, for the study. This study compares and analyzes the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and the AUC was also higher, with 94.16% for AST and 97.43% for PSLA. Furthermore, we qualitatively evaluated the ability of the models to capture the acoustic features of Parkinson’s through various CAM (class activation map)-based XAI (eXplainable AI) models such as GradCAM and EigenCAM. Based on PSLA, we found that the model focuses well on the muffled frequency band of Parkinson’s speech, and the heatmap analysis of false positives and false negatives shows that the speech features are also visually represented when the model actually makes incorrect predictions. The contribution of this paper is that we not only found a suitable model for diagnosing Parkinson’s through speech using two different types of models but also validated the predictions of the model in practice.
Journal Article
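A rough sketch of the AST idea only: patchify a spectrogram, embed the patches, and classify with a transformer encoder. This toy model is a stand-in assumption, not the published AST or PSLA architecture.

```python
# Toy spectrogram transformer; patch size, width, and depth are
# illustrative assumptions, not the published AST configuration.
import torch
import torch.nn as nn

class ToySpectrogramTransformer(nn.Module):
    def __init__(self, patch=16, dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # patchify
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)   # normal vs. Parkinson's

    def forward(self, spec):                    # spec: (B, 1, freq, time)
        z = self.embed(spec).flatten(2).transpose(1, 2)  # (B, patches, dim)
        z = self.encoder(z)
        return self.head(z.mean(dim=1))         # mean-pool patch tokens

logits = ToySpectrogramTransformer()(torch.randn(2, 1, 128, 256))
```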
Accelerated construction of stress relief music datasets using CNN and the Mel-scaled spectrogram
by Choi, Suvin; Park, Jong-Ik; Hong, Cheol-Ho
in Adult; Artificial neural networks; Classification
2024
Listening to music is a crucial tool for relieving stress and promoting relaxation. However, the limited options available for stress-relief music do not cater to individual preferences, compromising its effectiveness. Traditional methods of curating stress-relief music rely heavily on measuring biological responses, which is time-consuming, expensive, and requires specialized measurement devices. In this paper, we introduce a deep learning approach to this problem that uses convolutional neural networks, providing a more efficient and economical method for generating large datasets of stress-relief music. These datasets are composed of Mel-scaled spectrograms that include essential sound elements (such as frequency, amplitude, and waveform) that can be directly extracted from the music. The trained model demonstrated a test accuracy of 98.7%, and a clinical study indicated that the model-selected music was as effective as researcher-verified music in terms of stress-relieving capacity. This paper underlines the transformative potential of deep learning in addressing the challenge of limited music options for stress relief. More importantly, the proposed method has profound implications for music therapy because it enables a more personalized approach to stress-relief music selection, offering the potential for enhanced emotional well-being.
Journal Article
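A sketch of the dataset-building step: converting an audio clip into a Mel-scaled spectrogram ready for CNN input. librosa defaults are used; the synthetic test tone, the n_mels choice, and the dB conversion are illustrative assumptions.

```python
# Audio clip -> Mel-scaled spectrogram -> CNN-shaped array. The input
# tone is a synthetic stand-in for a music clip.
import numpy as np
import librosa

sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)   # 1 s test tone
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)        # log-compressed, CNN-ready
x = mel_db[np.newaxis, np.newaxis]                   # (batch, channel, mels, frames)
```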