Are Multimodal Foundation Models All That Is Needed for Emofake Detection?
by Nayak, Sanjib Kumar; Orchid Chetia Phukan; Arun Balaji Buduru; Nayak, Ananda Chandra; Behera, Swarup Ranjan; Mohd Mujtaba Akhtar; Girish; Pailla Balakrishna Reddy
in Emotion recognition / Representations / Speech recognition
2025
Paper
Overview
In this work, we investigate multimodal foundation models (MFMs) for EmoFake detection (EFD) and hypothesize that they will outperform audio foundation models (AFMs). Owing to their cross-modal pre-training, MFMs learn emotional patterns from multiple modalities, whereas AFMs rely only on audio. As such, MFMs can better recognize unnatural emotional shifts and inconsistencies in manipulated audio, making them more effective at distinguishing real from fake emotional expressions. To validate our hypothesis, we conduct a comprehensive comparative analysis of state-of-the-art (SOTA) MFMs (e.g., LanguageBind) alongside AFMs (e.g., WavLM). Our experiments confirm that MFMs surpass AFMs for EFD. Beyond the performance of individual foundation models (FMs), we explore FM fusion, motivated by findings in related research areas such as synthetic speech detection and speech emotion recognition. To this end, we propose SCAR, a novel framework for effective fusion. SCAR introduces a nested cross-attention mechanism in which representations from the FMs interact sequentially at two stages to refine information exchange. Additionally, a self-attention refinement module further enhances the feature representations by reinforcing important cross-FM cues while suppressing noise. Through SCAR with synergistic fusion of MFMs, we achieve SOTA performance, surpassing standalone FMs, conventional fusion approaches, and previous works on EFD.
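The fusion idea in the overview (two sequential cross-attention stages between foundation-model representations, followed by self-attention refinement) can be illustrated with a small dependency-free sketch. This is only one possible reading of the abstract, not the authors' SCAR implementation: the function name `scar_like_fusion`, the residual connections, the absence of learned projection weights, the token concatenation, and the mean pooling are all assumptions made for illustration, and both inputs are assumed to have already been projected to the same feature dimension.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned weights (assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise sum of two equally shaped token matrices (residual)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scar_like_fusion(feats_a, feats_b):
    """Hypothetical two-stage cross-attention fusion with self-attention refinement.

    feats_a, feats_b: lists of token vectors from two foundation models,
    assumed to share the same feature dimension.
    """
    # Stage 1: each model's tokens query the other model's tokens.
    a1 = add(feats_a, attention(feats_a, feats_b, feats_b))
    b1 = add(feats_b, attention(feats_b, feats_a, feats_a))
    # Stage 2: the stage-1 outputs attend to each other again.
    a2 = add(a1, attention(a1, b1, b1))
    b2 = add(b1, attention(b1, a1, a1))
    # Refinement: self-attention over the concatenated token sequence.
    fused = a2 + b2
    refined = add(fused, attention(fused, fused, fused))
    # Mean-pool tokens into one utterance-level vector (assumption).
    dim = len(refined[0])
    return [sum(tok[j] for tok in refined) / len(refined) for j in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    # Random stand-ins for features from two foundation models.
    feats_a = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    feats_b = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    pooled = scar_like_fusion(feats_a, feats_b)
    print(len(pooled))  # one value per feature dimension
```

In a trained system the pooled vector would feed a real-versus-fake classifier, and each attention call would carry its own learned query, key, and value projections; those parts are omitted here to keep the data flow visible.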
Publisher
Cornell University Library, arXiv.org