Catalogue Search | MBRL

Long-term memory representations for audio-visual scenes

by Jaggy, Oliver; Huff, Markus; Papenmeier, Frank in Behavioral Science and Psychology, Cognition, Cognitive Psychology

2023

In this study, we investigated the nature of long-term memory representations for naturalistic audio-visual scenes. Whereas previous research has shown that audio-visual scenes are recognized more accurately than their unimodal counterparts, it remains unclear whether this benefit stems from audio-visually integrated long-term memory representations or a summation of independent retrieval cues. We tested two predictions for audio-visually integrated memory representations. First, we used a modeling approach to test whether recognition performance for audio-visual scenes is more accurate than would be expected from independent retrieval cues. This analysis shows that audio-visual integration is not necessary to explain the benefit of audio-visual scenes relative to purely auditory or purely visual scenes. Second, we report a series of experiments investigating the occurrence of study-test congruency effects for unimodal and audio-visual scenes. Most importantly, visually encoded information was immune to additional auditory information presented during testing, whereas auditory encoded information was susceptible to additional visual information presented during testing. This renders a true integration of visual and auditory information in long-term memory representations unlikely. In sum, our results instead provide evidence for visual dominance in long-term memory. Whereas associative auditory information is capable of enhancing memory performance, the long-term memory representations appear to be primarily visual.

Journal Article

Effortless facial expression recognition without motor simulation

by Carneiro Pereira, Sarah; Vannuscorps, Gilles in 631/378/1457, 631/378/2613, 631/477/2811

2025

Efficient recognition of facial expressions is often assumed to require covert and unconscious imitation of the observed facial movements, a “motor simulation”. At odds with this hypothesis, some individuals with congenital facial paralysis can recognize facial expressions with typical speed and accuracy. However, efficiency also implies minimal cognitive effort. Motor simulation may reduce the need for effortful inferential processing. To explore this, we asked 16 individuals born with congenital facial paralysis and typically developed participants to categorize emotional sounds (e.g. anger vs. fear) while viewing congruent or incongruent facial expressions. The facial expressions were irrelevant to the task, but facial identity was important for a subsequent memory test. If motor simulation is necessary for effortless facial expression recognition, individuals born with congenital facial paralysis should not be influenced by the facial expressions. If motor simulation contributes to effortless facial expression recognition without being necessary, they could be less influenced than the typically developed participants. However, the results did not support either of these predictions. Effortless facial expression recognition can be achieved without motor simulation.

Journal Article

Cross-modal congruency modulates evidence accumulation, not decision thresholds

by Kampa, Björn; Brožová, Natálie; Kayser, Christoph in audio-visual integration, cognitive modeling, Cross-modal

2025

Audiovisual cross-modal correspondences (CMCs) refer to the brain's inherent ability to subconsciously connect auditory and visual information. These correspondences reveal essential aspects of multisensory perception and influence behavioral performance, enhancing reaction times and accuracy. However, the impact of different types of CMCs (arising from statistical co-occurrences or shaped by semantic associations) on information processing and decision-making remains underexplored. This study utilizes the Implicit Association Test, where unisensory stimuli are sequentially presented and linked via CMCs within an experimental block by the specific response instructions (either congruent or incongruent). Behavioral data are integrated with EEG measurements through neurally informed drift-diffusion modeling to examine how neural activity across both auditory and visual trials is modulated by CMCs. Our findings reveal distinct neural components that differentiate between congruent and incongruent stimuli regardless of modality, offering new insights into the role of congruency in shaping multisensory perceptual decision-making. Two key neural stages were identified: an Early component enhancing sensory encoding in congruent trials and a Late component affecting evidence accumulation, particularly in incongruent trials. These results suggest that cross-modal congruency primarily influences the processing and accumulation of sensory information rather than altering decision thresholds.

Journal Article

Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect

by Mitra, Suvobrata; Stacey, Jemaine E.; Stacey, Paula C. in Auditory Perception, Auditory Stimuli, Behavioral Science and Psychology

2020

Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables vary according to the quality of information available.

Journal Article

Emotional multisensory integration in 5- to 6-year-old children occurs early rather than late: ERP evidence

by Liu, Qian; Deng, Huan; Zhu, Jingting in Alexithymia, Autism, Behavioral Science and Psychology

2024

Multisensory integration of emotion is crucial for social interactions. However, existing research on emotional multisensory integration has primarily focused on adults and infants, with less attention paid to children. To explore the cognitive and neural mechanisms underlying emotional multisensory integration in children, we recorded event-related potentials from 42 children aged 5 to 6 years when they identified emotional stimuli. The emotional stimuli, including pictures, sounds, or both presented simultaneously, were randomly presented. Children were required to recognize the emotions of the stimuli. Results showed that (1) children identified audiovisual emotional stimuli faster and more accurately than visual stimuli; (2) multimodal stimuli elicited a larger N1 component than visual stimuli did, which may indicate the early development of emotional multisensory integration in children; (3) the audiovisual P2 component did not exhibit an inhibitory effect compared with visual stimuli, indicating information redundancy resulting from incomplete cognitive development in children. In conclusion, these results suggest that emotional multisensory integration in 5- to 6-year-old children occurs early rather than late.

Journal Article

McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The Oldenburg Audio Visual Speech Stimuli (OLAVS)

by Debener, Stefan; Stropahl, Maren; Schellhardt, Sebastian in Adult, Aged, Behavioral Science and Psychology

2017

The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.

Journal Article

Integrative interaction of emotional speech in audio-visual modality

by Wei, Jianguo; Li, Na; Fan, Lingzhong in Brain, Brain mapping, Cortex (parietal)

2022

Emotional cues are expressed in many ways in our daily life, and the emotional information we receive is often represented by multiple modalities. Successful social interactions require a combination of multisensory cues to accurately determine the emotion of others. The integration mechanism of multimodal emotional information has been widely investigated. Different brain activity measurement methods were used to determine the location of brain regions involved in the audio-visual integration of emotional information, mainly in the bilateral superior temporal regions. However, the methods adopted in these studies are relatively simple, and the materials of the study rarely contain speech information. The integration mechanism of emotional speech in the human brain still needs further examination. In this paper, a functional magnetic resonance imaging (fMRI) study was conducted using an event-related design to explore the audio-visual integration mechanism of emotional speech in the human brain by using dynamic facial expressions and emotional speech to express emotions of different valences. Representational similarity analysis (RSA) based on regions of interest (ROIs), whole brain searchlight analysis, modality conjunction analysis and supra-additive analysis were used to analyze and verify the role of relevant brain regions. Meanwhile, a weighted RSA method was used to evaluate the contribution of each candidate model to the best-fitting model of the ROIs. The results showed that only the left insula was detected by all methods, suggesting that the left insula played an important role in the audio-visual integration of emotional speech. Whole brain searchlight analysis, modality conjunction analysis and supra-additive analysis together revealed that the bilateral middle temporal gyrus (MTG), right inferior parietal lobule (IPL) and bilateral precuneus might be involved in the audio-visual integration of emotional speech from other aspects.

Journal Article

Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level

by Leppänen, Paavo H. T.; Kolozsvári, Orsolya B.; Xu, Weiyong in audio-visual integration, audio-visual stimuli, Familiarity

2019

During speech perception, listeners rely on multimodal input and make use of both auditory and visual information. When presented with speech, for example syllables, the differences in brain responses to distinct stimuli are not, however, caused merely by the acoustic or visual features of the stimuli. The congruency of the auditory and visual information and the familiarity of a syllable, that is, whether it appears in the listener's native language or not, also modulate brain responses. We investigated how the congruency and familiarity of the presented stimuli affect brain responses to audio-visual (AV) speech in 12 adult Finnish native speakers and 12 adult Chinese native speakers. They watched videos of a Chinese speaker pronouncing syllables (/pa/, /pha/, /ta/, /tha/, /fa/) during a magnetoencephalography (MEG) measurement where only /pa/ and /ta/ were part of Finnish phonology while all the stimuli were part of Chinese phonology. The stimuli were presented in audio-visual (congruent or incongruent), audio only, or visual only conditions. The brain responses were examined in five time-windows: 75-125, 150-200, 200-300, 300-400, and 400-600 ms. We found significant differences for the congruency comparison in the fourth time-window (300-400 ms) in both sensor and source level analysis. Larger responses were observed for the incongruent stimuli than for the congruent stimuli. For the familiarity comparisons no significant differences were found. The results are in line with earlier studies reporting on the modulation of brain responses for audio-visual congruency around 250-500 ms. This suggests a much stronger process for the general detection of a mismatch between predictions based on lip movements and the auditory signal than for the top-down modulation of brain responses based on phonological information.

Journal Article

Mental Effort When Playing, Listening, and Imagining Music in One Pianist’s Eyes and Brain

by Hagen, Thomas; Godøy, Rolf Inge; Endestad, Tor in audiation, audio-visual integration, Brain mapping

2020

We investigated volitional imagery and ‘musical effort’ with a professional pianist and non-professional pianists and non-musicians by use of pupillometry and fMRI. We confirmed that musical imagery has strong commonality with music listening in both experts and naïve individuals. The combined approach of psychophysiology and neuroimaging revealed the cognitive work during musical activities like playing, listening, and simply imagining the music. Our objective measure of effort, via pupil size, showed that pupil diameters were largest when ‘playing’ (regardless of whether there was sound produced or not) compared to conditions with no movement (i.e., ‘listening’ and ‘imagery’). We found positive correlations between pupil diameters of the professional pianist during different conditions with the same piano piece (i.e., normal playing, silenced playing, listening, imagining), which might indicate similar degrees of load on cognitive resources as well as an intimate link between the motor imagery of sound-producing body motions and gestures. Neuroimaging with fMRI provided evidence for a relationship between noradrenergic activity and mental workload or attentional intensity within the domain of music cognition. We found effort-related activity in the superior part of the locus coeruleus and, similarly to the pupil, listening and imagery engaged the LC-NE system less than the motor condition. Pianists attended more intensively to the most difficult piece than non-musicians, since they showed larger pupils than non-musicians only for the most difficult piece. Non-musicians seemed to be the group most engaged by listening, suggesting that the amount of attention allocated for the same task may follow a hierarchy of expertise demanding less attentional effort in experts or performers than in novices. We found only weak evidence for a commonality between subjective effort ratings and the objective effort gauged with pupil diameter during listening. We suggest that psychophysiological methods like pupillometry could index mental effort in a manner that is not ‘observable’ in awareness or via introspection.

Journal Article

Synaesthetic Interactions between Sounds and Colour Afterimages: Revisiting Werner and Zietz’s Approach

by Cattaruzza, Serena; Prenassi, Marco; Parovel, Giulia in Sound

2022

We ran a pilot experiment to explore, using a new psychophysical method, the hypothesis proposed by Zietz and Werner in the ’30s, that a sound presented simultaneously with an afterimage can change its phenomenal appearance in non-synaesthetes. The method we adopted is able to directly collect and visualise the apparent changes in intensity of the afterimages, by recording observers’ interactions with a physical feedback mechanism (the paths that the observers generated by moving a cursor), without referring to verbal descriptions. These first findings support some of the most meaningful observations reported by Werner (1934) and Zietz (1931), according to which the colours of the afterimages ‘disintegrate’ at the hearing of a low sound and ‘concentrate’ for a high sound. This relationship is particularly evident with the Yellow stimulus, where the perceived colour intensity of its afterimage seems to have a faster negative change with a low-pitched tone, and an increase in intensity and duration when perceived simultaneously with a soprano sound. These data are also consistent with the crossmodal correspondences between both pitch and loudness in audition and lightness and brightness in vision reported in the literature.

Journal Article
