Catalogue Search | MBRL

Multi-Sound-Source Localization Using Machine Learning for Small Autonomous Unmanned Vehicles with a Self-Rotating Bi-Microphone Array

by Gala, Deepak; Sun, Liang; Lindsay, Nathan in Algorithms, Arrays, Artificial Intelligence

2021

While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capabilities have not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform three-dimensional (3D) multi-sound-source localization (MSSL) using only the inter-channel time difference (ICTD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on two machine learning techniques, namely the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Random Sample Consensus (RANSAC) algorithms, respectively, whose performance was tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their 3D orientations in a reverberant environment.

Journal Article

Sound Source Localization Using Deep Learning Models

by Ogata, Tetsuya; Nakadai, Kazuhiro; Yalta, Nelson in Artificial neural networks, Deep learning, Environment models

2017

[Figure: Using a deep learning model, the robot locates the sound source from a multi-channel audio stream input.] This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block-level accuracy of linear models in nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.

Journal Article

Joint Learning of Audio–Visual Saliency Prediction and Sound Source Localization on Multi-face Videos

by Qiao, Minglang; Hu, Weiming; Xu, Mai in Annotations, Audio data, Audio signals

2024

Visual and audio events occur simultaneously, and both attract attention. However, most existing saliency prediction works ignore the influence of audio and consider only the visual modality. In this paper, we propose a multi-task learning method for audio–visual saliency prediction and sound source localization on multi-face video by leveraging visual, audio and face information. Specifically, we first introduce a large-scale database of multi-face video in visual-audio condition, containing eye-tracking data and sound source annotations. Using this database, we find that sound influences human attention, and conversely attention offers a cue to determine the sound source in multi-face video. Guided by these findings, an audio–visual multi-task network (AVM-Net) is introduced to predict saliency and locate the sound source. AVM-Net consists of three branches corresponding to visual, audio and face modalities. The visual branch has a two-stream architecture to capture spatial and temporal information. The audio and face branches encode audio signals and faces, respectively. Finally, a spatio-temporal multi-modal graph is constructed to model the interaction among multiple faces. With joint optimization of these branches, the intrinsic correlation between the tasks of saliency prediction and sound source localization is utilized, and each task boosts the performance of the other. Experiments show that the proposed method outperforms 12 state-of-the-art saliency prediction methods, and achieves competitive results in sound source localization.

Journal Article

Off-Screen Sound Separation Based on Audio-visual Pre-training Using Binaural Audio

by Togo, Ren; Ogawa, Takahiro; Yoshida, Masaki in Analysis, audio-visual systems, binaural audio

2023

This study proposes a novel off-screen sound separation method based on audio-visual pre-training. In the field of audio-visual analysis, researchers have leveraged visual information for audio manipulation tasks, such as sound source separation. Although such audio manipulation tasks are based on correspondences between audio and video, these correspondences are not always established. Specifically, sounds coming from outside a screen have no audio-visual correspondences and thus interfere with conventional audio-visual learning. The proposed method separates such off-screen sounds based on their arrival directions using binaural audio, which provides a three-dimensional sensation. Furthermore, we propose a new pre-training method that can consider the off-screen space and use the obtained representation to improve off-screen sound separation. Consequently, the proposed method can separate off-screen sounds irrespective of the direction from which they arrive. We conducted our evaluation using generated video data to circumvent the difficulty of collecting ground truth for off-screen sounds. We confirmed the effectiveness of our methods through off-screen sound detection and separation tasks.

Journal Article

A Combined Method for Localizing Two Overlapping Acoustic Sources Based on Deep Learning

by Agafonov, Evgeny; Shahoud, Ghiath; Lyapin, Alexander in Acoustics, Datasets, Deep learning

2025

Deep learning approaches for multi-source sound localization face significant challenges, particularly the need for extensive training datasets encompassing diverse spatial configurations to achieve robust generalization. This requirement leads to substantial computational demands, which are further exacerbated when localizing overlapping sources in complex acoustic environments with reverberation and noise. In this paper, a new methodology is proposed for simultaneous localization of two overlapping sound sources in the time–frequency domain in a closed, reverberant environment with a spatial resolution of 10° using a small-sized microphone array. The proposed methodology is based on the integration of the sound source separation method with a single-source sound localization model. A hybrid model was proposed to separate the sound source signals received by each microphone in the array. The model was built using a bidirectional long short-term memory (BLSTM) network and trained on a dataset using the ideal binary mask (IBM) as the training target. The modeling results show that the proposed localization methodology is efficient in determining the directions for two overlapping sources simultaneously, with an average localization accuracy of 86.1% for the test dataset containing short-term signals of 500 ms duration with different signal-to-signal ratio values.

Journal Article

Auditory Feature Driven Model Predictive Control for Sound Source Approaching

by Zhang, Chi; Wang, Zhiqing; Guo, Yuxin in Control, Engineering, Localization

2024

Sound source approaching is a typical task for robots with auditory sensing. Many existing methods are based on sound source localization (SSL), and utilize the explicit location as the control input. To reduce the localization computation cost and improve the robustness against noise and reverberation, we propose a novel auditory feature driven model predictive control (AFD-MPC) method, which directly uses the auditory feature as the control input. First, a new convolution-ternarization based interaural time difference (CT-ITD) estimation method is proposed, which is more robust to noise and reverberation by eliminating signal spikes and irrelevant components. Second, a new system model is derived and established, which directly links the robot motions and the interaural time difference (ITD) feature. Third, AFD-MPC is realized based on the proposed CT-ITD feature estimation and system model. The states at multiple future time steps are predicted based on the system model, and a control objective function considering both target approaching and motion smoothness is designed. By involving the multi-step future states in the control objective function, the control outcome yields a smoother motion trajectory and is more robust to instantaneous interference. A series of experiments, such as static and dynamic sound source approaching, are conducted on a mobile robot equipped with a small-sized 6-microphone array to validate the effectiveness of our methods.

Journal Article

Numerical research on the focusing characteristics of steady state sound field in finite space by combining finite element and phase conjugation methods

by Wang, Yanlin; Song, Hebin; Liu, Song in Absorbers (materials), Absorptivity, Acoustic absorption

2024

Sound source identification in an infinite free sound field has been studied in relative depth, but there is comparatively little research on sound source identification and localization in finite spaces, especially those with sound-absorbing and insulating materials. The identification and reconstruction of the steady-state acoustic field radiated by a single-frequency source in a finite space are studied numerically with the finite element method (FEM), using discrete array elements based on the phase conjugation method. Two forms of phase conjugation array, the planar array and the linear array, are studied for sound source localization. In addition, the influence of sound-absorbing material on the wall on the focusing properties is discussed. The numerical results show the following. According to the FEM results, the phase conjugation method can fully identify and locate the acoustic source in the finite space with either the planar or the linear array. The linear array can achieve subwavelength focusing with fewer elements. The best reconstruction results are obtained when the distance between the array and the sound source is 0.5 λ and 2 λ. The smaller the absorption coefficient, the better the multi-path phase compensation principle is satisfied and the better the reconstruction of the sound source.

Journal Article

Detection of Signal of Fire Source for Coal Spontaneous Combustion Applied with Acoustic Wave

by Ren, Shuaijing; Zhang, Yanni; Ma, Teng in Acoustic measurement, Acoustic waves, Acoustics

2023

Coal spontaneous combustion can cause a series of problems in terms of wasted resources, casualties, and environmental pollution. Accurate detection of the fire source in loose coal is the key to preventing coal spontaneous combustion. Acoustic temperature measurement has significant advantages of strong stability, high accuracy, and wide measurement range, which can make up for the shortcomings of traditional fire source detection methods. Detecting the optimum source signal for loose coal temperature measurement is the basis and prerequisite for the realization of acoustic temperature measurement. Therefore, the anti-interference characteristics of three typical sound source signals were tested and analyzed through a self-designed sound wave propagation characteristic test system. The cross-correlations among maximum length sequence signal, pulse signal, and linear sweep signal were compared and analyzed. Compared to the other two signals, the main peak of the cross-correlation coefficient of the linear sweep signal was more prominent and its pseudo-peaks interfered less with its main peak. This signal had strong anti-interference ability, and it can be used as a basic acoustic source signal for temperature measurement of loose coal. To further screen out the optimal frequency band and length of the linear sweep signal, four bituminous coals were selected as the propagation medium. The main peak value and the difference between the main peak and the maximum pseudo-peak of the cross-correlation coefficient were proposed as the evaluation indicators. The optimum signal frequency bands of long-flame coal, non-caking coal, coking coal, and lean coal were 400–900, 400–900, 500–1200, and 400–900 Hz, and the optimum signal length of four coals was 0.1 s. The study results can provide theoretical support for the selection of acoustic temperature measurement signals for loose coal.

Journal Article

Evaluating the Role of Unit Cell Multiplicity in the Acoustic Response of Phononic Crystals Using Laser-Plasma Sound Sources

by Dimitriou, Vasilis; Tatarakis, Michael; Papadogiannis, Nektarios in Acoustic insulation, Acoustic properties, Acoustics

2025

Acoustic metamaterials and phononic crystals are progressively consolidating as an important technology that is expected to significantly impact the science and industry of acoustics in the coming years. In this work, the impact of unit cell multiplicity on the spectral features of the acoustic response of phononic crystals is systematically studied using the recently demonstrated laser-plasma sound source characterization method. Specifically, by exploiting the advantages of this method, the impact of the number of repeated unit cells on the depth of the phononic band gaps and the passband spectral features across the entire audible range is demonstrated. These experimental findings are supported by specially developed computational simulations accounting for the precise structural characteristics of the studied phononic crystals and are analysed to provide a phenomenological understanding of the underlying physical mechanism. It is shown that by increasing the unit cell multiplicity, the bandgaps deepen and the number of resonant peaks in the crystal transmission zones increases. The resonant mode shapes are computationally investigated and interpreted in terms of spherical harmonics. This study highlights the tunability and design flexibility of acoustic components using phononic crystals, opening new paths towards applications in the fields of sound control and noise insulation.

Journal Article

SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization

by Touhafi, Abdellah; Segers, Laurent; Steenhaut, Kris in beamforming, Design, Environmental policy

2014

Sound source localization is a well-researched subject with applications ranging from localizing sniper fire in urban battlefields to cataloging wildlife in rural areas. One critical application is the localization of noise pollution sources in urban environments, due to an increasing body of evidence linking noise pollution to adverse effects on human health. Current noise mapping techniques often fail to accurately identify noise pollution sources, because they rely on the interpolation of a limited number of scattered sound sensors. Aiming to produce accurate noise pollution maps, we developed the SoundCompass, a low-cost sound sensor capable of measuring local noise levels and sound field directionality. Our first prototype is composed of a sensor array of 52 microelectromechanical systems (MEMS) microphones, an inertial measurement unit and a low-power field-programmable gate array (FPGA). This article presents the SoundCompass’s hardware and firmware design together with a data fusion technique that exploits the sensing capabilities of the SoundCompass in a wireless sensor network to localize noise pollution sources. Live tests produced a sound source localization accuracy of a few centimeters in a 25 m² anechoic chamber, while simulation results accurately located up to five broadband sound sources in a 10,000 m² open field.

Journal Article
