Catalogue Search | MBRL
Explore the vast range of titles available.
270 result(s) for "Auditory perception Computer simulation."
Human and machine hearing : extracting meaning from sound
\"If we understood more about how humans hear, we could make machines hear better, in the sense of being able to analyze sound and extract useful and meaningful information from it. Or so I claim. I have been working for decades, but more intensely in recent years, to add some substance to this claim, and to help engineers and scientists understand how the pieces fit together, so they can help move the art forward. There is still plenty to be done, and this book is my attempt to help focus the effort in this field into productive directions; to help new practitioners see enough of the evolution of ideas that they can skip to where new developments and experiments are needed, or to techniques that can already solve their sound understanding problems. The book-writing process has been tremendous fun, with support from family, friends, and colleagues. They do, however, have a tendency to ask two annoying questions: \"Is the book done yet?\" and \"Who is your audience?\" The first eventually answers itself, but I need to say a few words about the second. I find that interest in sound and hearing comes from people of many different disciplines, with complementary backgrounds and sometimes incompatible terminology and concepts. I want all of these people as my audience, as I want to teach a synthesis of their various viewpoints into a more comprehensive framework that includes everything needed to work on machine hearing problems. That is, electrical engineers, computer scientists, physicists, physiologists, audiologists, musicians, psychologists, and others are all part of my audience. Students, teachers, researchers, product managers, developers, and hackers are, too\"-- Provided by publisher.
Tactile stimulations reduce or promote the segregation of auditory streams: psychophysics and modeling
by Rankin, James; Słowiński, Piotr; Darki, Farzaneh
in Acoustic Stimulation, Activity patterns, Adult
2025
Auditory stream segregation plays a crucial role in understanding the auditory scene. This study investigates the role of tactile stimulation in auditory stream segregation through psychophysics experiments and a computational model of audio-tactile interactions. We examine how tactile pulses, synchronized with one group of tones (high- or low-frequency tones) in a sequence of interleaved high- and low-frequency tones (ABA-triplets), influence the likelihood of perceiving integrated or segregated auditory streams. Our findings reveal that tactile pulses synchronized with a single tone sequence (either the A-tone or B-tone sequence) enhance perceptual segregation, while pulses synchronized with both tone sequences promote integration. Based on these findings, we developed a dynamical model that captures interactions between auditory and tactile neural circuits, including recurrent excitation, mutual inhibition, adaptation, and noise. The proposed model shows excellent agreement with the experiments, and its predictions are validated through further psychophysics experiments. In the model, we assume that selective tactile stimulation dynamically modulates the tonotopic organization within the auditory cortex. This modulation facilitates segregation by reinforcing specific tonotopic responses through single-tone synchronization while smoothing neural activity patterns with dual-tone alignment to promote integration. The model offers a robust computational framework for exploring cross-modal effects on stream segregation and predicts neural behavior under varying tactile conditions. Our findings imply that cross-modal synchronization, with carefully timed tactile cues, could improve auditory perception, with potential applications in auditory assistive technologies aimed at enhancing speech recognition in noisy settings.
Journal Article
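The model described above combines recurrent excitation, mutual inhibition, adaptation, and noise. A minimal sketch of that class of two-population competition model, where a tactile bias to one stream shifts perceptual dominance; all parameter values, the sigmoid nonlinearity, and the bias magnitudes are illustrative assumptions, not values from the paper:

```python
# Toy two-population competition model: mutual inhibition + adaptation + noise.
# Parameters below are assumed for illustration only.
import numpy as np

def simulate(T=60.0, dt=1e-3, tactile_bias=(0.0, 0.0), seed=0):
    rng = np.random.default_rng(seed)
    u = np.zeros(2)                     # activities of the two percept populations
    a = np.zeros(2)                     # slow adaptation variables
    beta, tau_u, tau_a, noise = 1.2, 0.01, 2.0, 0.05   # assumed constants
    f = lambda x: 1.0 / (1.0 + np.exp(-10.0 * (x - 0.2)))  # firing-rate nonlinearity
    dominance, steps = 0, int(T / dt)
    for _ in range(steps):
        inp = 0.6 + np.asarray(tactile_bias)     # tone drive plus tactile boost
        drive = inp - beta * u[::-1] - a         # excitation minus cross-inhibition, adaptation
        u += dt / tau_u * (-u + f(drive)) + np.sqrt(dt) * noise * rng.standard_normal(2)
        a += dt / tau_a * (u - a)                # adaptation slowly tracks activity
        dominance += u[0] > u[1]
    return dominance / steps                     # fraction of time population 0 dominates

print(simulate())                         # symmetric drive: roughly 0.5 over long runs
print(simulate(tactile_bias=(0.2, 0.0)))  # pulses synced to one stream bias its dominance
```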
Pavlovian conditioning–induced hallucinations result from overweighting of perceptual priors
by Powers, A. R.; Mathys, C.; Corlett, P. R.
in Acoustic Stimulation, Adult, Auditory perception
2017
Some people hear voices that others do not, but only some of those people seek treatment. Using a Pavlovian learning task, we induced conditioned hallucinations in four groups of people who differed orthogonally in their voice-hearing and treatment-seeking statuses. People who hear voices were significantly more susceptible to the effect. Using functional neuroimaging and computational modeling of perception, we identified processes that differentiated voice-hearers from non–voice-hearers and treatment-seekers from non–treatment-seekers and characterized a brain circuit that mediated the conditioned hallucinations. These data demonstrate the profound and sometimes pathological impact of top-down cognitive processes on perception and may represent an objective means to discern people with a need for treatment from those without.
Journal Article
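The abstract attributes conditioned hallucinations to overweighted perceptual priors. A minimal sketch of that idea as precision-weighted fusion of a prior expectation with sensory evidence; the numbers are illustrative and stand in for the study's fitted hierarchical learning model:

```python
# Perception as precision-weighted averaging of prior and evidence.
# All precisions and means below are assumed for illustration.
def posterior_percept(prior_mean, prior_precision, evidence, evidence_precision):
    """Precision-weighted average of prior expectation and sensory evidence."""
    w = prior_precision / (prior_precision + evidence_precision)
    return w * prior_mean + (1 - w) * evidence

# After Pavlovian pairing, the visual cue induces a prior that a tone is
# present (prior_mean = 1). With no tone actually played (evidence = 0):
balanced   = posterior_percept(1.0, prior_precision=1.0, evidence=0.0, evidence_precision=1.0)
overweight = posterior_percept(1.0, prior_precision=4.0, evidence=0.0, evidence_precision=1.0)
print(balanced, overweight)   # 0.5 vs 0.8: the overweighted prior pushes the
                              # percept toward "tone heard" despite silence
```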
Recruitment of the motor system during music listening: An ALE meta-analysis of fMRI data
by Cobb, Patrice R.; Balasubramaniam, Ramesh; Gordon, Chelsea L.
in Analysis, Auditory perception, Biology and Life Sciences
2018
Several neuroimaging studies have shown that listening to music activates brain regions that reside in the motor system, even when there is no overt movement. However, many of these studies report the activation of varying motor system areas that include the primary motor cortex, supplementary motor area, dorsal and ventral premotor areas, and parietal regions. In order to examine what specific roles are played by various motor regions during music perception, we used activation likelihood estimation (ALE) to conduct a meta-analysis of the neuroimaging literature on passive music listening. After an extensive search of the literature, 42 studies were analyzed, comprising 386 unique subjects who contributed a total of 694 activation foci. As suspected, auditory activations were found in the bilateral superior temporal gyrus, transverse temporal gyrus, insula, pyramis, bilateral precentral gyrus, and bilateral medial frontal gyrus. We also saw widespread activation of motor networks, including the left and right lateral premotor cortex, right primary motor cortex, and the left cerebellum. These results suggest a central role of the motor system in music and rhythm perception. We discuss these findings in the context of the Action Simulation for Auditory Prediction (ASAP) model and other predictive coding accounts of brain function.
Journal Article
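ALE, the method named above, blurs each study's reported foci into a modeled-activation map and combines the maps with a probabilistic union. A minimal sketch; the grid size, smoothing width, peak probability of 0.5, and toy coordinates are all illustrative assumptions (real ALE works in MNI space with sample-size-dependent kernels):

```python
# Toy activation likelihood estimation over a small voxel grid.
import numpy as np
from scipy.ndimage import gaussian_filter

grid = (20, 20, 20)                      # toy voxel grid (assumed)
studies = [[(5, 5, 5), (6, 5, 5)],       # study 1 foci (hypothetical coordinates)
           [(5, 6, 5)],                  # study 2, converging on the same region
           [(14, 14, 14)]]               # study 3, an outlier focus

prod_not_active = np.ones(grid)
for foci in studies:
    ma = np.zeros(grid)
    for f in foci:
        impulse = np.zeros(grid)
        impulse[f] = 1.0
        ma = np.maximum(ma, gaussian_filter(impulse, sigma=1.5))  # modeled activation
    ma *= 0.5 / ma.max()                 # scale so a focus peaks at p = 0.5 (assumed)
    prod_not_active *= 1.0 - ma
ale = 1.0 - prod_not_active              # voxel-wise union across studies
print(np.unravel_index(ale.argmax(), grid))  # ALE peaks where studies converge
```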
Biases in Visual, Auditory, and Audiovisual Perception of Space
by Wozny, David R.; Odegaard, Brian; Shams, Ladan
in Accuracy, Acoustic Stimulation - methods, Adolescent
2015
Localization of objects and events in the environment is critical for survival, as many perceptual and motor tasks rely on estimation of spatial location. Therefore, it seems reasonable to assume that spatial localizations should generally be accurate. Curiously, some previous studies have reported biases in visual and auditory localizations, but these studies have used small sample sizes and the results have been mixed. Therefore, it is not clear (1) if the reported biases in localization responses are real (or due to outliers, sampling bias, or other factors), and (2) whether these putative biases reflect a bias in sensory representations of space or a priori expectations (which may be due to the experimental setup, instructions, or distribution of stimuli). Here, to address these questions, a dataset of unprecedented size (obtained from 384 observers) was analyzed to examine presence, direction, and magnitude of sensory biases, and quantitative computational modeling was used to probe the underlying mechanism(s) driving these effects. Data revealed that, on average, observers were biased towards the center when localizing visual stimuli, and biased towards the periphery when localizing auditory stimuli. Moreover, quantitative analysis using a Bayesian Causal Inference framework suggests that while pre-existing spatial biases for central locations exert some influence, biases in the sensory representations of both visual and auditory space are necessary to fully explain the behavioral data. How are these opposing visual and auditory biases reconciled in conditions in which both auditory and visual stimuli are produced by a single event? Potentially, the bias in one modality could dominate, or the biases could interact/cancel out. The data revealed that when integration occurred in these conditions, the visual bias dominated, but the magnitude of this bias was reduced compared to unisensory conditions. Therefore, multisensory integration improves not only the precision of perceptual estimates but also their accuracy.
Journal Article
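The Bayesian Causal Inference framework referenced above infers whether sight and sound share a common cause and averages the one-cause and two-cause location estimates by that posterior. A minimal sketch of the standard formulation; the sensory noise levels, central prior, and common-cause prior are illustrative assumptions, not the values fitted to the 384-observer dataset:

```python
# Bayesian causal inference for audiovisual localization (toy parameters).
import numpy as np

def bci_visual_estimate(x_v, x_a, sig_v=1.0, sig_a=3.0, sig_p=10.0,
                        mu_p=0.0, p_common=0.5):
    # Likelihood of the two measurements if they share one cause (C = 1)
    v1 = sig_v**2 * sig_a**2 + sig_v**2 * sig_p**2 + sig_a**2 * sig_p**2
    L1 = np.exp(-((x_v - x_a)**2 * sig_p**2 + (x_v - mu_p)**2 * sig_a**2
                  + (x_a - mu_p)**2 * sig_v**2) / (2 * v1)) / (2 * np.pi * np.sqrt(v1))
    # ... and if they come from two independent causes (C = 2)
    L2 = (np.exp(-(x_v - mu_p)**2 / (2 * (sig_v**2 + sig_p**2)))
          / np.sqrt(2 * np.pi * (sig_v**2 + sig_p**2))
          * np.exp(-(x_a - mu_p)**2 / (2 * (sig_a**2 + sig_p**2)))
          / np.sqrt(2 * np.pi * (sig_a**2 + sig_p**2)))
    pc1 = L1 * p_common / (L1 * p_common + L2 * (1 - p_common))
    # Optimal estimates under each causal structure
    s_fused = ((x_v / sig_v**2 + x_a / sig_a**2 + mu_p / sig_p**2)
               / (1 / sig_v**2 + 1 / sig_a**2 + 1 / sig_p**2))
    s_vis = (x_v / sig_v**2 + mu_p / sig_p**2) / (1 / sig_v**2 + 1 / sig_p**2)
    return pc1 * s_fused + (1 - pc1) * s_vis   # model-averaged visual estimate

print(bci_visual_estimate(x_v=5.0, x_a=7.0))   # small conflict: pulled toward the sound
print(bci_visual_estimate(x_v=5.0, x_a=30.0))  # large conflict: almost no pull
```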
The sound of trustworthiness: Acoustic-based modulation of perceived voice personality
by Bibi Boehme; Pascal Belin; Philip McAleer
in [SDV.NEU.NB]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]/Neurobiology, [SDV.NEU.SC]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]/Cognitive Sciences, Acoustic equipment
2017
When we hear a new voice we automatically form a "first impression" of the voice owner's personality; a single word is sufficient to yield ratings highly consistent across listeners. Past studies have shown correlations between personality ratings and acoustical parameters of voice, suggesting a potential acoustical basis for voice personality impressions, but its nature and extent remain unclear. Here we used data-driven voice computational modelling to investigate the link between acoustics and perceived trustworthiness in the single word "hello". Two prototypical voice stimuli were generated based on the acoustical features of voices rated low or high in perceived trustworthiness, respectively, as well as a continuum of stimuli inter- and extrapolated between these two prototypes. Five hundred listeners provided trustworthiness ratings on the stimuli via an online interface. We observed an extremely tight relationship between trustworthiness ratings and position along the trustworthiness continuum (r = 0.99). Not only were trustworthiness ratings higher for the high- than the low-prototypes, but the difference could be modulated quasi-linearly by reducing or exaggerating the acoustical difference between the prototypes, resulting in a strong caricaturing effect. The f0 trajectory, or intonation, appeared to be a parameter of particular relevance: hellos rated high in trustworthiness were characterized by a high starting f0, then a marked decrease at mid-utterance, to finish on a strong rise. These results demonstrate a strong acoustical basis for voice personality impressions, opening the door to multiple potential applications.
Journal Article
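The continuum described above is built by inter- and extrapolating between two acoustic prototypes. A minimal sketch of that morphing scheme over a feature vector; the feature names and values are hypothetical (the study morphed actual "hello" recordings in acoustic space):

```python
# Linear inter-/extrapolation between two acoustic prototype feature vectors.
import numpy as np

low_trust  = np.array([110.0, 95.0, 0.40])   # e.g. start f0 (Hz), mid f0 (Hz), duration (s) - assumed
high_trust = np.array([135.0, 100.0, 0.35])

# alpha = 0 gives the low prototype, alpha = 1 the high prototype; values
# outside [0, 1] extrapolate beyond the prototypes (caricatures).
alphas = np.linspace(-0.5, 1.5, 9)
for a in alphas:
    morph = low_trust + a * (high_trust - low_trust)
    print(f"alpha={a:+.2f}  features={np.round(morph, 2)}")
```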
A computational study to model the effect of electrode-to-auditory nerve fiber distance on spectral resolution in cochlear implant
by Woo, Jihwan; Yang, Hyejin; Choi, Inyong
in Acoustic Stimulation - methods, Auditory discrimination, Auditory nerve
2020
Spectral ripple discrimination (SRD) has been widely used to evaluate the spectral resolution of cochlear implant (CI) recipients based on its strong correlation with speech perception performance. However, despite its usefulness for predicting speech perception outcomes, SRD performance exhibits large across-subject variability even among subjects implanted with the same CIs and sound processors. Potential factors underlying this variability include current spread, nerve survival, and CI mapping. Previous studies have found that spectral resolution decreases with increasing distance of the stimulation electrode from the auditory nerve fibers (ANFs), attributable to increasing current spread. However, it remains unclear whether the spread of excitation is the only cause of this observation, or whether other factors such as temporal interaction also contribute to it. In this study, we used a computational model to investigate channel interaction upon non-simultaneous stimulation with respect to the electrode-ANF distance, and evaluated the SRD performance for five electrode-ANF distances. The SRD performance was determined based on the similarity between two neurograms in response to standard and inverted stimuli and used to evaluate the spectral resolution in the computational model. The spread of excitation was observed to increase with increasing electrode-ANF distance, consistent with previous findings. Additionally, the preceding pulses delivered from neighboring channels induced a channel interaction that either inhibited or facilitated the neural responses to subsequent pulses depending on the electrode-ANF distance. The SRD performance was also found to decrease with increasing electrode-ANF distance. The findings of this study suggest that variation of the neural responses (inhibition or facilitation) with the electrode-ANF distance in CI users may cause spectral smearing, and hence poor spectral resolution. A computational model such as the one used in this study is a useful tool for understanding the neural factors related to CI outcomes in ways that cannot be accomplished by behavioral studies alone.
Journal Article
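The study scores SRD by the similarity between neurograms evoked by standard and inverted ripples. A minimal sketch of that metric, with toy response profiles standing in for the model's ANF neurograms; the fiber count, ripple period, and spread values are illustrative assumptions:

```python
# Similarity between standard- and inverted-ripple response profiles as a
# toy SRD metric: wider current spread smears the two patterns together.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(1)
fibers = np.arange(128)
baseline = rng.random(128)                    # fiber-specific excitability (not smeared)
ripple = np.sin(2 * np.pi * fibers / 16)      # place code of the spectral ripple

for spread in (1.0, 4.0, 16.0):               # current spread grows with electrode-ANF distance
    standard = baseline + gaussian_filter1d(ripple, spread)
    inverted = baseline + gaussian_filter1d(-ripple, spread)
    r = np.corrcoef(standard, inverted)[0, 1]
    print(f"spread={spread:5.1f}  similarity={r:+.3f}")
# Similarity rises toward +1 as spread grows: the smeared responses become
# nearly identical, mirroring the loss of SRD with electrode-ANF distance.
```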
Long-term modification of cortical synapses improves sensory perception
by Zaika, Natalya; Yuan, Kexin; Bernstein, Hannah
in 631/378/2620, Acoustic Stimulation, Anesthetics - pharmacology
2013
By pairing acoustic stimuli with electrical stimulation of the nucleus basalis neuromodulatory system in rats, the authors induce long-lasting synaptic modifications in the auditory cortex that conserve excitation across auditory receptive fields. These modifications also improved auditory detection and behavioral performance in tone perception.
Synapses and receptive fields of the cerebral cortex are plastic. However, changes to specific inputs must be coordinated within neural networks to ensure that excitability and feature selectivity are appropriately configured for perception of the sensory environment. We induced long-lasting enhancements and decrements to excitatory synaptic strength in rat primary auditory cortex by pairing acoustic stimuli with activation of the nucleus basalis neuromodulatory system. Here we report that these synaptic modifications were approximately balanced across individual receptive fields, conserving mean excitation while reducing overall response variability. Decreased response variability should increase detection and recognition of near-threshold or previously imperceptible stimuli. We confirmed both of these hypotheses in behaving animals. Thus, modification of cortical inputs leads to wide-scale synaptic changes, which are related to improved sensory perception and enhanced behavioral performance.
Journal Article
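The key claim above is that enhancements and decrements are approximately balanced across a receptive field, conserving mean excitation while reducing response variability. A minimal sketch of one way such a balanced modification can be expressed; the field size and shrink factor are illustrative assumptions, not measured values:

```python
# Mean-conserving, variability-reducing modification of a toy receptive field.
import numpy as np

rng = np.random.default_rng(2)
rf = rng.gamma(2.0, 0.5, size=20)        # toy synaptic weights across a receptive field

# Below-mean inputs are potentiated and above-mean inputs depressed by
# balanced amounts, so the mean is conserved while the variance shrinks.
rf_paired = rf.mean() + 0.6 * (rf - rf.mean())   # 0.6 shrink factor (assumed)

print(f"mean   before/after: {rf.mean():.3f} / {rf_paired.mean():.3f}")  # conserved
print(f"stddev before/after: {rf.std():.3f} / {rf_paired.std():.3f}")    # reduced
```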
A Comparison of Regularization Methods in Forward and Backward Models for Auditory Attention Decoding
by Slaney, Malcolm; de Cheveigné, Alain; Wong, Daniel D. E.
in Acoustics, Attention, attention decoding
2018
The decoding of selective auditory attention from noninvasive electroencephalogram (EEG) data is of interest in brain-computer interface and auditory perception research. The current state-of-the-art approaches for decoding the attentional selection of listeners are based on linear mappings between features of sound streams and EEG responses (forward model), or vice versa (backward model). It has been shown that when the envelope of attended speech and EEG responses are used to derive such mapping functions, the model estimates can be used to discriminate between attended and unattended talkers. However, the predictive/reconstructive performance of the models depends on how the model parameters are estimated. A number of model estimation methods have been published, along with a variety of datasets. It is currently unclear whether any of these methods perform better than others, as they have not yet been compared side by side on a single standardized dataset in a controlled fashion. Here, we present a comparative study of the ability of different estimation methods to classify attended speakers from multi-channel EEG data. The performance of the model estimation methods is evaluated using different performance metrics on a set of labeled EEG data from 18 subjects listening to mixtures of two speech streams. We find that when forward models predict the EEG from the attended audio, regularized models do not improve regression or classification accuracies. When backward models decode the attended speech from the EEG, regularization provides higher regression and classification accuracies.
Journal Article
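A backward model of the kind compared above maps time-lagged EEG to the attended speech envelope, typically by ridge regression. A minimal sketch on synthetic data; the channel count, lag range, and ridge parameter are illustrative assumptions standing in for the 18-subject dataset:

```python
# Ridge-regularized backward (stimulus-reconstruction) model on toy data.
import numpy as np

rng = np.random.default_rng(3)
T, C, L = 2000, 16, 10                        # samples, channels, lags (assumed)
envelope = rng.standard_normal(T)             # toy attended-speech envelope
mixing = rng.standard_normal(C)
eeg = np.outer(envelope, mixing) + 2.0 * rng.standard_normal((T, C))  # toy EEG

# Lagged design matrix: each row holds every channel at lags 0..L-1.
X = np.concatenate([np.roll(eeg, k, axis=0) for k in range(L)], axis=1)[L:]
y = envelope[L:]

lam = 1e2                                      # ridge parameter (assumed)
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
r = np.corrcoef(X @ w, y)[0, 1]
print(f"reconstruction accuracy r = {r:.3f}")
# In attention decoding, the talker whose envelope reconstructs best from
# the EEG is classified as the attended one.
```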
Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields
by Shamma, Shihab; Mukherjee, Shoutik; Babadi, Behtash
in Acoustic Stimulation, Action Potentials - physiology, Animals
2025
Characterizing neuronal responses to natural stimuli remains a central goal in sensory neuroscience. In auditory cortical neurons, the stimulus selectivity of elicited spiking activity is summarized by a spectrotemporal receptive field (STRF) that relates neuronal responses to the stimulus spectrogram. Though effective in characterizing primary auditory cortical responses, STRFs of non-primary auditory neurons can be quite intricate, reflecting their mixed selectivity. The complexity of non-primary STRFs hence impedes understanding how acoustic stimulus representations are transformed along the auditory pathway. Here, we focus on the relationship between ferret primary auditory cortex (A1) and a secondary region, the dorsal posterior ectosylvian gyrus (PEG). We propose estimating receptive fields in PEG with respect to a well-established high-dimensional computational model of primary-cortical stimulus representations. These “cortical receptive fields” (CortRF) are estimated greedily to identify the salient primary-cortical features modulating spiking responses, which are in turn related to corresponding spectrotemporal features. Hence, they provide biologically plausible hierarchical decompositions of STRFs in PEG. Such CortRF analysis was applied to PEG neuronal responses to speech and temporally orthogonal ripple combination (TORC) stimuli and, for comparison, to A1 neuronal responses. CortRFs of PEG neurons captured their selectivity to more complex spectrotemporal features than those of A1 neurons; moreover, CortRF models were more predictive of PEG (but not A1) responses to speech. Our results thus suggest that secondary-cortical stimulus representations can be computed as sparse combinations of primary-cortical features that facilitate encoding natural stimuli. Indeed, adding the primary-cortical representation accounts for PEG single-unit responses to natural sounds better than bypassing it and taking the auditory spectrogram as input. These results confirm in explicit detail the presumed hierarchical organization of the auditory cortex.
Journal Article
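The greedy sparse estimation mentioned above can be sketched as matching-pursuit-style selection of the model features whose responses best explain the residual spike rate. A minimal sketch; the random feature dictionary is a hypothetical stand-in for the high-dimensional primary-cortical model, and the sizes and coefficients are illustrative:

```python
# Greedy (matching-pursuit-style) sparse regression of spike rate onto a
# dictionary of model feature time courses.
import numpy as np

rng = np.random.default_rng(4)
T, F = 1000, 200
features = rng.standard_normal((T, F))        # time courses of F model features (toy)
truth = np.zeros(F)
truth[[3, 50, 120]] = [1.5, -1.0, 0.8]        # sparse ground-truth loading (assumed)
spikes = features @ truth + 0.5 * rng.standard_normal(T)

residual, selected, coefs = spikes.copy(), [], {}
for _ in range(3):                            # pick three features greedily
    scores = features.T @ residual / T        # correlation with the residual
    j = int(np.argmax(np.abs(scores)))
    beta = scores[j] / np.mean(features[:, j] ** 2)
    selected.append(j)
    coefs[j] = coefs.get(j, 0.0) + beta
    residual -= beta * features[:, j]         # explain away the selected feature
print(selected, {j: round(c, 2) for j, c in coefs.items()})  # recovers 3, 50, 120
```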