GVA: guided visual attention approach for automatic image caption generation
by Hossen, Md. Bipul; Abdussalam, Amr; Ye, Zhongfu; Hossain, Md. Imran
in Artificial neural networks / Computer Communication Networks / Computer Graphics / Computer Science / Cryptology / Data Storage Representation / Datasets / Encoders-Decoders / Feature extraction / Language / Modules / Multimedia Information Systems / Neural networks / Operating Systems / Semantics / Special Issue Paper
2024
Journal Article
Overview
Automated image caption generation with attention mechanisms focuses on visual features of an image, including objects, attributes, actions, and scenes, in order to understand it and produce more detailed captions; the topic has attracted considerable attention in the multimedia field. However, deciding which aspects of an image to highlight for better captioning remains a challenge. Most advanced captioning models use only one attention module to assign attention weights to visual vectors, which may not be enough to create an informative caption. To tackle this issue, we propose a Guided Visual Attention (GVA) approach that incorporates an additional attention mechanism to re-adjust the attention weights on the visual feature vectors and feeds the resulting context vector to the language LSTM. Using the first-level attention module as guidance for the GVA module and re-weighting the attention weights significantly enhances caption quality. Recently, deep neural networks have allowed encoder-decoder architectures to make use of visual attention mechanisms; here, Faster R-CNN is used to extract features in the encoder and a visual attention-based LSTM is applied in the decoder. Extensive experiments were conducted on both the MS-COCO and Flickr30k benchmark datasets. Compared with state-of-the-art methods, our approach achieved an average improvement of 2.4% on BLEU@1 and 13.24% on CIDEr for the MS-COCO dataset, as well as 4.6% on BLEU@1 and 12.48% on CIDEr for the Flickr30k dataset, based on cross-entropy optimization. These results demonstrate the clear superiority of our proposed approach over existing methods on standard evaluation metrics. The implementation code is available at https://github.com/mdbipu/GVA.
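The two-level attention idea described in the overview can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the abstract does not give the scoring function or how the first attention level guides the second, so the dot-product scoring, the additive guide vector, and all function names below are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors, i.e. an attention context vector."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def guided_attention(features, hidden):
    """Two-level attention over region features (hypothetical formulation).

    features: one vector per image region (e.g. from an object detector).
    hidden:   the decoder state used as the attention query.
    """
    # Level 1: ordinary attention; score each region against the query.
    first_weights = softmax([dot(f, hidden) for f in features])
    first_context = weighted_sum(first_weights, features)

    # Level 2 (assumed form): the first-level context guides a second
    # scoring pass, which re-adjusts the weights on the same features.
    guide = [h + c for h, c in zip(hidden, first_context)]
    second_weights = softmax([dot(f, guide) for f in features])

    # The re-weighted context is what would be passed to the language LSTM.
    context = weighted_sum(second_weights, features)
    return first_weights, second_weights, context


# Toy input: three 2-dimensional region features and a 2-dimensional query.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 0.0]
first_weights, second_weights, context = guided_attention(features, hidden)
print(first_weights, second_weights, context)
```

In this toy input the second pass shifts more weight onto the third region, which shows the re-weighting effect in miniature; it says nothing about the accuracy figures reported in the overview.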
Publisher
Springer Berlin Heidelberg, Springer Nature B.V.