Asset Details
VTP: volumetric transformer for multi-view multi-person 3D pose estimation
Journal Article, 2023
by Huang, Ouhan; Jia, Gangyong; Chen, Yuxing; Gu, Renshu
Subjects: Cameras / Design / Error correction / Methods / Performance enhancement / Pose estimation / Position errors / Representations / Transformers
Overview
This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints across all camera views and directly learns the spatial relationships in 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flattened into sequential embeddings and fed into a transformer. A residual structure is designed to further improve performance. In addition, sparse Sinkhorn attention is employed to reduce the memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. The output of the transformer is again concatenated with the 3D convolutional features through a residual design. The proposed VTP framework integrates the high performance of the transformer with volumetric representations and can serve as a good alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be made available.
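The abstract describes the pipeline in enough detail to sketch its shape in code. Below is a minimal PyTorch-style sketch of that pipeline: 3D convolutions over an aggregated voxel volume, flattening into sequential embeddings, a transformer encoder, and a residual concatenation of transformer and convolutional features. All shapes, channel counts, module names, and the use of dense attention (rather than the paper's sparse Sinkhorn attention) are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

# Hypothetical sketch of the VTP pipeline from the abstract; shapes and
# modules are assumptions for illustration, not the paper's actual code.
class VTPSketch(nn.Module):
    def __init__(self, in_ch=32, embed_dim=128, num_joints=15, depth=2):
        super().__init__()
        # 3D convolutions over voxel features aggregated from all views
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_ch, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm3d(embed_dim),
            nn.ReLU(inplace=True),
        )
        # Plain dense transformer encoder; the paper instead uses sparse
        # Sinkhorn attention to cut the memory cost of long voxel sequences.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Prediction head over the concatenated conv + transformer features
        self.head = nn.Conv3d(2 * embed_dim, num_joints, kernel_size=1)

    def forward(self, volume):
        # volume: (B, C, X, Y, Z) voxel features, assumed to be already
        # aggregated from the 2D keypoint heatmaps of all camera views
        feat = self.conv3d(volume)                 # (B, E, X, Y, Z)
        b, e, x, y, z = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, X*Y*Z, E)
        tokens = self.encoder(tokens)              # (B, X*Y*Z, E)
        attn = tokens.transpose(1, 2).reshape(b, e, x, y, z)
        # Residual design: concatenate transformer output with conv features
        fused = torch.cat([feat, attn], dim=1)     # (B, 2E, X, Y, Z)
        return self.head(fused)                    # per-joint 3D heatmaps

# Toy usage on an 8x8x8 volume (real voxel grids are much larger, which is
# why quadratic dense attention becomes the memory bottleneck the paper
# addresses with Sinkhorn attention):
heatmaps = VTPSketch()(torch.randn(1, 32, 8, 8, 8))
print(heatmaps.shape)  # torch.Size([1, 15, 8, 8, 8])

Under these assumptions, the token sequence grows cubically with voxel resolution, so swapping the dense encoder for a sparse Sinkhorn attention module is what makes the quadratic attention memory tractable for fine voxel grids.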
Publisher
Springer Nature B.V