Search Results
Filters:
  • Discipline
  • Is Peer Reviewed
  • Reading Level
  • Content Type
  • Year (From - To)
  • More Filters: Item Type, Is Full-Text Available, Subject, Publisher, Source, Donor, Language, Place of Publication, Contributors, Location
301 results for "Auditory perception Mathematical models."
Human and machine hearing : extracting meaning from sound
\"If we understood more about how humans hear, we could make machines hear better, in the sense of being able to analyze sound and extract useful and meaningful information from it. Or so I claim. I have been working for decades, but more intensely in recent years, to add some substance to this claim, and to help engineers and scientists understand how the pieces fit together, so they can help move the art forward. There is still plenty to be done, and this book is my attempt to help focus the effort in this field into productive directions; to help new practitioners see enough of the evolution of ideas that they can skip to where new developments and experiments are needed, or to techniques that can already solve their sound understanding problems. The book-writing process has been tremendous fun, with support from family, friends, and colleagues. They do, however, have a tendency to ask two annoying questions: \"Is the book done yet?\" and \"Who is your audience?\" The first eventually answers itself, but I need to say a few words about the second. I find that interest in sound and hearing comes from people of many different disciplines, with complementary backgrounds and sometimes incompatible terminology and concepts. I want all of these people as my audience, as I want to teach a synthesis of their various viewpoints into a more comprehensive framework that includes everything needed to work on machine hearing problems. That is, electrical engineers, computer scientists, physicists, physiologists, audiologists, musicians, psychologists, and others are all part of my audience. Students, teachers, researchers, product managers, developers, and hackers are, too\"-- Provided by publisher.
Dissecting neural computations in the human auditory pathway using deep neural networks for speech
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex. Using direct intracranial recordings and modern speech AI models, Li and colleagues show representational and computational similarities between deep neural networks for self-supervised speech learning and the human auditory pathway.
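A minimal sketch of the kind of layer-wise encoding analysis described above: ridge-regress each DNN layer's activations onto recorded neural responses and compare held-out prediction correlations across layers. The array names, sizes, and random stand-in data are hypothetical, not the authors' code or data.

# Layer-wise linear encoding analysis (hypothetical data stand-ins).
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: predict neural responses Y from features X."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def encoding_score(X_train, Y_train, X_test, Y_test, lam=1.0):
    """Mean held-out Pearson correlation between predicted and actual responses."""
    W = ridge_fit(X_train, Y_train, lam)
    Y_hat = X_test @ W
    r = [np.corrcoef(Y_hat[:, i], Y_test[:, i])[0, 1] for i in range(Y_test.shape[1])]
    return float(np.nanmean(r))

rng = np.random.default_rng(0)
T, n_units = 2000, 50                       # time points, recording channels
Y = rng.standard_normal((T, n_units))       # stand-in for measured neural activity
layers = {f"layer_{k}": rng.standard_normal((T, 256)) for k in range(4)}  # stand-in DNN features

split = int(0.8 * T)
for name, X in layers.items():
    score = encoding_score(X[:split], Y[:split], X[split:], Y[split:])
    print(f"{name}: held-out encoding correlation = {score:.3f}")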
Tactile stimulations reduce or promote the segregation of auditory streams: psychophysics and modeling
Auditory stream segregation plays a crucial role in understanding the auditory scene. This study investigates the role of tactile stimulation in auditory stream segregation through psychophysics experiments and a computational model of audio-tactile interactions. We examine how tactile pulses, synchronized with one group of tones (high- or low-frequency tones) in a sequence of interleaved high- and low-frequency tones (ABA- triplets), influence the likelihood of perceiving integrated or segregated auditory streams. Our findings reveal that tactile pulses synchronized with a single tone sequence (either the A-tone or B-tone sequence) enhance perceptual segregation, while pulses synchronized with both tone sequences promote integration. Based on these findings, we developed a dynamical model that captures interactions between auditory and tactile neural circuits, including recurrent excitation, mutual inhibition, adaptation, and noise. The proposed model shows excellent agreement with the experiment. Model predictions are validated through psychophysics experiments. In the model, we assume that selective tactile stimulation dynamically modulates the tonotopic organization within the auditory cortex. This modulation facilitates segregation by reinforcing specific tonotopic responses through single-tone synchronization while smoothing neural activity patterns with dual-tone alignment to promote integration. The model offers a robust computational framework for exploring cross-modal effects on stream segregation and predicts neural behavior under varying tactile conditions. Our findings imply that cross-modal synchronization, with carefully timed tactile cues, could improve auditory perception with potential applications in auditory assistive technologies aimed at enhancing speech recognition in noisy settings.
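A minimal sketch of a generic rivalry-style competition circuit with the ingredients listed in the abstract (recurrent excitation, mutual inhibition, adaptation, noise). The equations, parameters, and the tactile_bias term are illustrative assumptions, not the authors' model.

# Two-unit competition between an "integrated" and a "segregated" percept.
import numpy as np

def simulate(tactile_bias=0.0, T=60.0, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    tau, tau_a = 0.01, 2.0                 # firing-rate and adaptation time constants (s)
    alpha, beta, gamma, sigma = 0.5, 3.0, 2.5, 0.2
    f = lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0) / 0.2))   # sigmoid gain function
    u = np.array([0.5, 0.5])               # activities: [integrated, segregated]
    a = np.zeros(2)                        # adaptation variables
    I = np.array([1.2, 1.2 + tactile_bias])                # external drive to each unit
    seg_steps, n = 0, int(T / dt)
    for _ in range(n):
        drive = I + alpha * u - beta * u[::-1] - gamma * a + sigma * rng.standard_normal(2)
        u += dt * (-u + f(drive)) / tau    # recurrent excitation + mutual inhibition
        a += dt * (-a + u) / tau_a         # slow adaptation of the dominant unit
        seg_steps += u[1] > u[0]
    return seg_steps / n                   # proportion of time "segregated" dominates

for bias in (0.0, 0.2, 0.4):               # larger bias ~ tactile pulses locked to one tone stream
    print(f"tactile_bias={bias:.1f} -> time segregated = {simulate(bias):.2f}")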
Modelling auditory attention
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information—a phenomenon referred to as the ‘cocktail party problem’. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by ‘bottom-up’ sensory-driven factors, as well as ‘top-down’ task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape; with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listen to announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue ‘Auditory and visual scene analysis’.
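A crude, hypothetical illustration of the 'bottom-up' ingredient discussed above: a saliency trace computed as positive spectral flux, so sudden loud events stand out from the background. This is not any specific published attention model; the scene, parameters, and thresholds are assumptions.

# Bottom-up saliency proxy: positive spectral flux of a simulated acoustic scene.
import numpy as np

fs, dur = 16000, 2.0
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(4)
# Hypothetical scene: low-level background noise plus a brief loud tone burst at t = 1 s.
x = 0.05 * rng.standard_normal(t.size)
burst = (t > 1.0) & (t < 1.1)
x[burst] += np.sin(2 * np.pi * 1000 * t[burst])

# Magnitude spectrogram via short-time FFT.
win, hop = 512, 256
frames = np.stack([np.abs(np.fft.rfft(x[i:i + win] * np.hanning(win)))
                   for i in range(0, x.size - win, hop)])

flux = np.maximum(np.diff(frames, axis=0), 0).sum(axis=1)   # summed energy increase per frame
peak_frame = int(np.argmax(flux))
print(f"most salient moment ~ {peak_frame * hop / fs:.2f} s")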
Sensory reliability takes priority over the central tendency effect in temporal and spatial estimation
Perception is influenced by contextual factors that help resolve sensory uncertainty. A well-known phenomenon, the central tendency effect, describes how perceptual estimates gravitate toward the mean of a distribution of stimuli, particularly when sensory input is unreliable. However, in multisensory contexts, it remains unclear whether this effect follows a generalized priority across modalities or might be influenced by task-relevant sensory dominance. We studied spatial and temporal estimation in the auditory and visual modalities, testing whether perceptual estimates are driven by a supra-modal prior or by modality reliability specific to the task, and applied Bayesian modeling to explain the results. Participants first performed baseline sessions using only one modality and then a third session in which the modalities were interleaved. In the interleaved session, we found that the changes in auditory and visual estimates were not towards a supra-modal (generalized) prior, but estimates related to the dominant modality (vision for space, audition for time) were stable, while estimates of the other sensory modality (audition for space, vision for time) were pulled towards the dominant modality’s prior. Bayesian modeling also confirmed that the best-fitting models were those in which priors were modality-specific rather than supra-modal. These results highlight that perceptual estimation favors sensory reliability over a general tendency to regress toward the mean, providing insights into how the brain integrates contextual information across modalities.
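A minimal sketch of the textbook Gaussian Bayesian observer that produces a central tendency effect: the posterior-mean estimate is a reliability-weighted average of the noisy measurement and the prior (stimulus-distribution) mean, so noisier sensory input is pulled more strongly toward the mean. The duration-estimation framing and all numbers are illustrative assumptions, not the authors' fitted model.

# Gaussian Bayesian observer: regression toward the mean grows with sensory noise.
import numpy as np

def posterior_mean(measurement, sigma_sensory, prior_mean, sigma_prior):
    w = sigma_prior**2 / (sigma_prior**2 + sigma_sensory**2)   # weight on the measurement
    return w * measurement + (1.0 - w) * prior_mean

rng = np.random.default_rng(1)
stimuli = rng.uniform(400, 1600, size=5000)        # e.g., interval durations in ms
prior_mean, sigma_prior = 1000.0, np.std(stimuli)  # observer's prior matches the stimulus range

for sigma_sensory in (50.0, 150.0, 400.0):         # reliable -> unreliable modality
    m = stimuli + rng.normal(0, sigma_sensory, stimuli.size)
    est = posterior_mean(m, sigma_sensory, prior_mean, sigma_prior)
    slope = np.polyfit(stimuli, est, 1)[0]         # slope < 1 indicates regression to the mean
    print(f"sigma_sensory={sigma_sensory:>5.0f} ms -> estimate-vs-stimulus slope = {slope:.2f}")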
Biases in Visual, Auditory, and Audiovisual Perception of Space
Localization of objects and events in the environment is critical for survival, as many perceptual and motor tasks rely on estimation of spatial location. Therefore, it seems reasonable to assume that spatial localizations should generally be accurate. Curiously, some previous studies have reported biases in visual and auditory localizations, but these studies have used small sample sizes and the results have been mixed. Therefore, it is not clear (1) if the reported biases in localization responses are real (or due to outliers, sampling bias, or other factors), and (2) whether these putative biases reflect a bias in sensory representations of space or a priori expectations (which may be due to the experimental setup, instructions, or distribution of stimuli). Here, to address these questions, a dataset of unprecedented size (obtained from 384 observers) was analyzed to examine presence, direction, and magnitude of sensory biases, and quantitative computational modeling was used to probe the underlying mechanism(s) driving these effects. Data revealed that, on average, observers were biased towards the center when localizing visual stimuli, and biased towards the periphery when localizing auditory stimuli. Moreover, quantitative analysis using a Bayesian Causal Inference framework suggests that while pre-existing spatial biases for central locations exert some influence, biases in the sensory representations of both visual and auditory space are necessary to fully explain the behavioral data. How are these opposing visual and auditory biases reconciled in conditions in which both auditory and visual stimuli are produced by a single event? Potentially, the bias in one modality could dominate, or the biases could interact/cancel out. The data revealed that when integration occurred in these conditions, the visual bias dominated, but the magnitude of this bias was reduced compared to unisensory conditions. Therefore, multisensory integration not only improves the precision of perceptual estimates, but also the accuracy.
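A minimal sketch of a standard Bayesian causal inference observer for audiovisual localization, in the spirit of the framework named in the abstract (Körding et al., 2007): the observer weighs whether the two cues share a common cause and model-averages the corresponding location estimates. Parameters are placeholders, not values fitted in this study.

# Bayesian causal inference for one audiovisual trial.
import numpy as np

def bci_estimates(x_v, x_a, sig_v, sig_a, sig_p, mu_p=0.0, p_common=0.5):
    var_v, var_a, var_p = sig_v**2, sig_a**2, sig_p**2

    # Likelihood of the measurements under a common cause (C = 1)
    var1 = var_v * var_a + var_v * var_p + var_a * var_p
    like1 = np.exp(-0.5 * ((x_v - x_a) ** 2 * var_p +
                           (x_v - mu_p) ** 2 * var_a +
                           (x_a - mu_p) ** 2 * var_v) / var1) / (2 * np.pi * np.sqrt(var1))

    # Likelihood under independent causes (C = 2)
    var2v, var2a = var_v + var_p, var_a + var_p
    like2 = (np.exp(-0.5 * ((x_v - mu_p) ** 2 / var2v + (x_a - mu_p) ** 2 / var2a))
             / (2 * np.pi * np.sqrt(var2v * var2a)))

    # Posterior probability of a common cause
    post_c1 = like1 * p_common / (like1 * p_common + like2 * (1 - p_common))

    # Location estimates under each causal structure (reliability-weighted means)
    s_c1 = (x_v / var_v + x_a / var_a + mu_p / var_p) / (1 / var_v + 1 / var_a + 1 / var_p)
    s_v_c2 = (x_v / var_v + mu_p / var_p) / (1 / var_v + 1 / var_p)
    s_a_c2 = (x_a / var_a + mu_p / var_p) / (1 / var_a + 1 / var_p)

    # Model averaging across causal structures
    s_v = post_c1 * s_c1 + (1 - post_c1) * s_v_c2
    s_a = post_c1 * s_c1 + (1 - post_c1) * s_a_c2
    return s_v, s_a, post_c1

# Example: cues 10 degrees apart, vision more reliable than audition, central prior at 0.
print(bci_estimates(x_v=5.0, x_a=-5.0, sig_v=2.0, sig_a=8.0, sig_p=20.0))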
Top-Down Inference in the Auditory System: Potential Roles for Corticofugal Projections
It has become widely accepted that humans use contextual information to infer the meaning of ambiguous acoustic signals. In speech, for example, high-level semantic, syntactic, or lexical information shapes our understanding of a phoneme buried in noise. Most current theories to explain this phenomenon rely on hierarchical predictive coding models involving a set of Bayesian priors emanating from high-level brain regions (e.g., prefrontal cortex) that are used to influence processing at lower levels of the cortical sensory hierarchy (e.g., auditory cortex). As such, virtually all proposed models to explain top-down facilitation are focused on intracortical connections, and consequently, subcortical nuclei have scarcely been discussed in this context. However, subcortical auditory nuclei receive massive, heterogeneous, and cascading descending projections at every level of the sensory hierarchy, and activation of these systems has been shown to improve speech recognition. It is not yet clear whether or how top-down modulation to resolve ambiguous sounds calls upon these corticofugal projections. Here, we review the literature on top-down modulation in the auditory system, primarily focused on humans and cortical imaging/recording methods, and attempt to relate these findings to a growing animal literature, which has primarily been focused on corticofugal projections. We argue that corticofugal pathways contain the requisite circuitry to implement predictive coding mechanisms to facilitate perception of complex sounds and that top-down modulation at early (i.e., subcortical) stages of processing complements modulation at later (i.e., cortical) stages of processing. Finally, we suggest experimental approaches for future studies on this topic.
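A generic, textbook-style two-level predictive coding sketch of the idea that descending predictions disambiguate noisy input: a higher level supplies a contextual prior, the lower level computes precision-weighted prediction errors, and the settled estimate leans toward whichever signal is more reliable. This is an assumption-laden toy, not a model proposed in the review, and it is agnostic about whether such computations run over intracortical or corticofugal pathways.

# Two-level predictive coding on a noisy phoneme-like feature (0 = category A, 1 = category B).
import numpy as np

def settle(sensory_input, prior, prec_sensory, prec_prior, steps=200, lr=0.05):
    mu = prior                                   # current estimate at the lower level
    for _ in range(steps):
        err_bottom_up = sensory_input - mu       # residual the input leaves unexplained
        err_top_down = prior - mu                # deviation from the contextual prediction
        mu += lr * (prec_sensory * err_bottom_up + prec_prior * err_top_down)
    return mu

ambiguous = 0.5                                  # feature halfway between the two categories
print("reliable input :", round(settle(ambiguous, prior=1.0, prec_sensory=4.0, prec_prior=0.5), 2))
print("noisy input    :", round(settle(ambiguous, prior=1.0, prec_sensory=0.5, prec_prior=4.0), 2))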
Perceptual warping exposes categorical representations for speech in human brainstem responses
Highlights:
  • Measured brainstem FFRs during online speech categorization.
  • Speech-FFRs were enhanced in active vs. passive listening.
  • FFR speech representations were warped according to listeners’ phonetic label.
  • Subcortical activity carries a perceptual code and is actively modulated in a top-down manner during speech perception.
The brain transforms continuous acoustic events into discrete category representations to downsample the speech signal for our perceptual-cognitive systems. Such phonetic categories are highly malleable, and their percepts can change depending on surrounding stimulus context. Previous work suggests this acoustic-phonetic mapping and the perceptual warping of speech emerge in the brain no earlier than auditory cortex. Here, we examined whether these auditory-category phenomena inherent to speech perception occur even earlier in the human brain, at the level of the auditory brainstem. We recorded speech-evoked frequency following responses (FFRs) during a task designed to induce more/less warping of listeners’ perceptual categories depending on stimulus presentation order of a speech continuum (random, forward, backward directions). We used a novel clustered stimulus paradigm to rapidly record the high trial counts needed for FFRs concurrent with active behavioral tasks. We found serial stimulus order caused perceptual shifts (hysteresis) near listeners’ category boundary, confirming identical speech tokens are perceived differentially depending on stimulus context. Critically, we further show neural FFRs during active (but not passive) listening are enhanced for prototypical vs. category-ambiguous tokens and are biased in the direction of listeners’ phonetic label even for acoustically identical speech stimuli. These findings were not observed in the stimulus acoustics nor in model FFR responses generated via a computational model of cochlear and auditory nerve transduction, confirming a central origin to the effects. Our data reveal FFRs carry category-level information and suggest top-down processing actively shapes the neural encoding and categorization of speech at subcortical levels. These findings suggest the acoustic-phonetic mapping and perceptual warping in speech perception occur surprisingly early along the auditory neuroaxis, which might aid understanding by reducing ambiguity inherent to the speech signal.
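A hypothetical sketch of the basic FFR measurement implied above: average many trials and read out spectral amplitude at the stimulus fundamental (F0), which survives averaging because it is phase-locked while background noise is not. The simulated data and all parameters are placeholders, not the study's recordings or analysis pipeline.

# Trial-averaged FFR strength at the stimulus F0 (simulated data).
import numpy as np

fs, f0, n_trials, dur = 10000.0, 100.0, 1000, 0.2   # sampling rate (Hz), F0 (Hz), trials, seconds
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(2)

# Simulated single-trial responses: weak phase-locked component at F0 buried in noise.
trials = 0.05 * np.sin(2 * np.pi * f0 * t) + rng.standard_normal((n_trials, t.size))

avg = trials.mean(axis=0)                    # trial averaging suppresses non-phase-locked noise
spec = np.abs(np.fft.rfft(avg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
f0_amp = spec[np.argmin(np.abs(freqs - f0))]
noise_floor = spec[(freqs > 200) & (freqs < 1000)].mean()
print(f"F0 amplitude = {f0_amp:.4f}, noise floor = {noise_floor:.4f}")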
Laminar specificity of the auditory perceptual awareness negativity: A biophysical modeling study
How perception of sensory stimuli emerges from brain activity is a fundamental question of neuroscience. To date, two disparate lines of research have examined this question. On one hand, human neuroimaging studies have helped us understand the large-scale brain dynamics of perception. On the other hand, work in animal models (mice, typically) has led to fundamental insight into the micro-scale neural circuits underlying perception. However, translating such fundamental insight from animal models to humans has been challenging. Here, using biophysical modeling, we show that the auditory awareness negativity (AAN), an evoked response associated with perception of target sounds in noise, can be accounted for by synaptic input to the supragranular layers of auditory cortex (AC) that is present when target sounds are heard but absent when they are missed. This additional input likely arises from cortico-cortical feedback and/or non-lemniscal thalamic projections and targets the apical dendrites of layer-5 (L5) pyramidal neurons. In turn, this leads to increased local field potential activity, increased spiking activity in L5 pyramidal neurons, and the AAN. The results are consistent with current cellular models of conscious processing and help bridge the gap between the macro and micro levels of perception-related brain activity.
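A toy single-neuron abstraction of the mechanism described above: extra "apical" drive on heard trials increases both spiking and the integrated synaptic current, a crude stand-in for a larger evoked response. This is not the laminar biophysical model used in the study; all parameters and the leaky integrate-and-fire simplification are assumptions.

# Leaky integrate-and-fire neuron with and without extra feedback drive.
import numpy as np

def simulate(extra_apical=0.0, T=1.0, dt=1e-4, seed=3):
    rng = np.random.default_rng(seed)
    tau, v_th, v_reset = 0.02, 1.0, 0.0        # membrane time constant (s), threshold, reset
    v, spikes, charge = 0.0, 0, 0.0
    for _ in range(int(T / dt)):
        I = 0.9 + extra_apical + 2.0 * rng.standard_normal()   # baseline + feedback + noise
        charge += I * dt                       # crude proxy for summed synaptic input (LFP-like)
        v += dt * (-v + I) / tau
        if v >= v_th:
            v, spikes = v_reset, spikes + 1
    return spikes, charge

for label, extra in [("missed trial (no extra input)", 0.0), ("heard trial (extra apical drive)", 0.3)]:
    spikes, q = simulate(extra)
    print(f"{label}: {spikes} spikes, integrated input = {q:.2f}")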
A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex
Convolutional neural networks (CNNs) can provide powerful and flexible models of neural sensory processing. However, the utility of CNNs in studying the auditory system has been limited by their requirement for large datasets and the complex response properties of single auditory neurons. To address these limitations, we developed a population encoding model: a CNN that simultaneously predicts activity of several hundred neurons recorded during presentation of a large set of natural sounds. This approach defines a shared spectro-temporal space and pools statistical power across neurons. Population models of varying architecture performed consistently and substantially better than traditional linear-nonlinear models on data from primary and non-primary auditory cortex. Moreover, population models were highly generalizable. The output layer of a model pre-trained on one population of neurons could be fit to data from novel single units, achieving performance equivalent to that of neurons in the original fit data. This ability to generalize suggests that population encoding models capture a complete representational space across neurons in an auditory cortical field.
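A minimal sketch of the general idea of a population encoding CNN: shared convolutional spectro-temporal features feed per-neuron readouts, so one model predicts all recorded units at once. The architecture, layer sizes, and variable names are hypothetical, not the authors' model; the final 1x1 convolution plays the role of the per-neuron output layer that could be refit to novel units.

# Shared-feature CNN mapping a spectrogram to simultaneous population firing rates (PyTorch).
import torch
import torch.nn as nn

n_freq, n_time, n_neurons = 64, 150, 300      # spectrogram bins, time bins, recorded units

model = nn.Sequential(
    # Treat the spectrogram as a 1-channel image; shared spectro-temporal filters.
    nn.Conv2d(1, 16, kernel_size=(7, 9), padding=(3, 4)), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=(5, 7), padding=(2, 3)), nn.ReLU(),
    nn.AdaptiveAvgPool2d((8, n_time)),        # collapse frequency detail, keep time resolution
    nn.Flatten(start_dim=1, end_dim=2),       # -> (batch, 32*8, n_time) shared feature space
    nn.Conv1d(32 * 8, n_neurons, kernel_size=1),  # per-neuron readout of the shared space
    nn.Softplus(),                            # non-negative firing rates
)

spectrogram = torch.randn(4, 1, n_freq, n_time)    # a batch of 4 stimuli
rates = model(spectrogram)                          # predicted rates: (4, n_neurons, n_time)
print(rates.shape)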