MutualFormer: Multi-modal Representation Learning via Cross-Diffusion Attention

by Jiang, Bo; Wang, Xixi; Luo, Bin; Wang, Xiao; Tang, Jin

in Affinity / Cognitive tasks / Diffusion / Modal data / Representation learning / Transformers

2024

Journal Article

Overview

Aggregating multi-modal data to obtain reliable data representations is attracting increasing attention. Recent studies demonstrate that Transformer models usually work well for multi-modal tasks. Existing Transformers generally adopt either the cross-attention (CA) mechanism or simple concatenation to achieve information interaction among different modalities, and both approaches generally ignore the modality gap. In this work, we rethink the Transformer and extend it to MutualFormer for multi-modal data representation. Instead of the CA used in the Transformer, MutualFormer employs our new cross-diffusion attention (CDA) to exchange information among different modalities. Compared with CA, the proposed CDA has three main advantages. First, the cross-affinities in CDA are defined on the individual modal affinities (token metrics), which naturally alleviates the modality/domain gap that exists in the traditional CA definition based on token features. Second, CDA provides a general scheme that can either be used for multi-modal representation or serve as post-optimization for existing CA models. Third, CDA is implemented efficiently. We successfully apply MutualFormer to several multi-modal learning tasks. Extensive experiments demonstrate the effectiveness of the proposed MutualFormer.
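The idea of building cross-modal interaction from per-modality affinities, rather than from direct comparison of token features across modalities, can be illustrated with a small NumPy sketch. This is not the authors' formulation: the abstract does not state the CDA update rule, so the mixing rule in `cross_diffuse`, the parameter `alpha`, and all function and weight names below are assumptions chosen for illustration. The sketch also assumes both modalities have the same number of aligned tokens.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def self_affinity(tokens, w_q, w_k):
    """Token-to-token affinity computed inside a single modality."""
    q, k = tokens @ w_q, tokens @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def cross_diffuse(a_x, a_y, alpha=0.5, steps=1):
    """Hypothetical diffusion rule (an assumption, not taken from the paper):
    each affinity matrix is mixed with itself propagated through the other
    modality's affinity. Only affinities interact; features never do."""
    for _ in range(steps):
        a_x, a_y = ((1 - alpha) * a_x + alpha * a_y @ a_x,
                    (1 - alpha) * a_y + alpha * a_x @ a_y)
    return a_x, a_y

# Toy data: two modalities with n aligned tokens of dimension d.
rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
y = rng.normal(size=(n, d))
w = {name: rng.normal(size=(d, d))
     for name in ("xq", "xk", "xv", "yq", "yk", "yv")}

a_x = self_affinity(x, w["xq"], w["xk"])
a_y = self_affinity(y, w["yq"], w["yk"])
d_x, d_y = cross_diffuse(a_x, a_y)

# Each modality aggregates its own values with the diffused affinity.
out_x = d_x @ (x @ w["xv"])
out_y = d_y @ (y @ w["yv"])
```

Because the two affinity matrices are both n-by-n token metrics, they can be combined without comparing features from different modalities, which is the property the abstract credits with alleviating the modality gap. In this sketch the diffused matrices stay row-stochastic, since products and convex combinations of row-stochastic matrices are row-stochastic.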

Publisher

Springer Nature B.V.

Subject

Representation learning / Transformers