Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model
by Jang, Dong-Won; Park, Hyung-Min; Park, Rae-Hong; Kim, Jae-Bin; Lee, Yong-Hyeok
in Acoustics / attention / audio–visual recognition / cross-modality alignment / Deep learning / dual cross-modality attention / hybrid CTC/attention / Lipreading / Machine translating / Methods / Neural networks / Noise / Queries / Speech / transformer / Voice recognition
2020
Journal Article
Overview
Since the attention mechanism was introduced in neural machine translation, attention has either been combined with long short-term memory (LSTM) or has replaced the LSTM in a transformer model to overcome the sequence-to-sequence (seq2seq) problems of the LSTM. In contrast to neural machine translation, audio–visual speech recognition (AVSR) may achieve improved performance by learning the correlation between the audio and visual modalities. Because the audio carries richer information than the video of the lips, it is hard to train attentions in AVSR with balanced modalities. To raise the role of the visual modality to the level of the audio modality by fully exploiting the input information when learning attentions, we propose a dual cross-modality (DCM) attention scheme that uses both an audio context vector obtained with a video query and a video context vector obtained with an audio query. Furthermore, we introduce a connectionist temporal classification (CTC) loss in combination with our attention-based model to enforce the monotonic alignments required in AVSR. Recognition experiments on the LRS2-BBC and LRS3-TED datasets showed that the proposed model, with the DCM attention scheme and the hybrid CTC/attention architecture, achieved a relative improvement in word error rate (WER) of at least 7.3% on average compared to competing methods based on the transformer model.
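The dual cross-modality idea described in the overview, where each modality queries the other, might be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a single head, and audio and video features that already share the same dimension. The function names and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention in which each query frame attends over
    every frame of the other modality. Keys and values are the same vectors
    here (assumption: no learned projection matrices)."""
    dim = len(queries[0])
    contexts = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the value frames.
        contexts.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return contexts


def dual_cross_modality(audio, video):
    """Return (audio_context, video_context).

    audio_context: built from audio frames, queried by video (one per video frame).
    video_context: built from video frames, queried by audio (one per audio frame).
    """
    audio_context = cross_attention(video, audio)
    video_context = cross_attention(audio, video)
    return audio_context, video_context


# Toy features (hypothetical values): 4 audio frames and 2 video frames, dimension 3.
audio = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.3, 0.3, 0.3], [0.9, 0.5, 0.1]]
video = [[0.2, 0.0, 0.1], [0.5, 0.5, 0.5]]
audio_context, video_context = dual_cross_modality(audio, video)
```

In this sketch the two context sequences have different lengths, because each follows the frame rate of the querying modality. How the two contexts are fused downstream, and how the CTC loss is weighted against the attention loss, is not specified in the overview and is left out here.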
Publisher
MDPI AG