Efficient Transformer-Based Compressed Video Modeling via Informative Patch Selection
by Tomoyuki Suzuki, Yoshimitsu Aoki
in
Accuracy
/ action recognition
/ Analysis
/ Benchmarking
/ Chemical technology
/ compressed video
/ Computational linguistics
/ Data Compression
/ Efficiency
/ Electric Power Supplies
/ Electric transformers
/ Language processing
/ Methods
/ Motion
/ Natural language interfaces
/ Neural networks
/ Performance evaluation
/ Policy
/ Spacetime
/ TP1-1185
/ transformer
/ Video compression
/ video recognition
/ video recognition; action recognition; transformer; compressed video
2022
Journal Article
Overview
Recently, Transformer-based video recognition models have achieved state-of-the-art results on major video recognition benchmarks. However, their high inference cost significantly limits research speed and practical use. In video compression, methods that treat small motions and residuals as less informative and assign them short code lengths (e.g., MPEG4) have successfully reduced the redundancy of videos. Inspired by this idea, we propose Informative Patch Selection (IPS), which efficiently reduces the inference cost by excluding redundant patches from the input of the Transformer-based video model. The redundancy of each patch is calculated from the motions and residuals obtained while decoding a compressed video. The proposed method is simple and effective in that it dynamically reduces the inference cost depending on the input, without any policy model or additional loss term. Extensive experiments on action recognition demonstrate that our method significantly improves the trade-off between the accuracy and inference cost of the Transformer-based video model. Although the method does not require any policy model or additional loss term, its performance approaches that of existing methods that do require them.
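The patch selection idea described above can be illustrated with a minimal sketch. The abstract does not give the exact redundancy formula, so everything here is an assumption for illustration: the function names `patch_scores` and `select_patches`, the score (mean motion-vector magnitude plus `alpha` times mean absolute residual per patch), the 16-pixel patch size, and the fixed threshold are hypothetical and are not taken from the paper.

```python
import numpy as np

def patch_scores(motion, residual, patch=16, alpha=1.0):
    """Score each patch by how much motion and residual it carries.

    motion:   (H, W, 2) per-pixel motion vectors from the decoder (assumed layout)
    residual: (H, W, C) per-pixel residuals from the decoder (assumed layout)
    Returns an array of shape (H // patch, W // patch).
    """
    h, w = motion.shape[:2]
    gh, gw = h // patch, w // patch
    # Crop to a whole number of patches, then reduce to one magnitude per pixel.
    mot_mag = np.linalg.norm(motion[:gh * patch, :gw * patch], axis=-1)
    res_mag = np.abs(residual[:gh * patch, :gw * patch]).mean(axis=-1)

    def pool(x):
        # Average each (patch x patch) block down to a single value.
        return x.reshape(gh, patch, gw, patch).mean(axis=(1, 3))

    # Hypothetical score: a weighted sum of the two pooled magnitudes.
    return pool(mot_mag) + alpha * pool(res_mag)

def select_patches(scores, threshold):
    """Return flat indices of patches whose score exceeds the threshold."""
    return np.flatnonzero(scores.ravel() > threshold)

# Usage: a 32x32 frame (2x2 patches) where only the top-right patch moves.
motion = np.zeros((32, 32, 2))
motion[0:16, 16:32] = 1.0
residual = np.zeros((32, 32, 3))
kept = select_patches(patch_scores(motion, residual), threshold=0.5)
```

Because the number of kept patches depends on the scores rather than on a fixed budget, a static frame yields few or no tokens and a busy frame yields many, which is one way to read the abstract's claim that the cost adapts to the input without a policy model.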