Asset Details
MbrlCatalogueTitleDetail
Do you wish to reserve the book?
Graph-Attention Fusion with VAE Cross-Modal Mapping and Reinforcement-Learning Visualization for Real-Time AR
by
Cheng, Cheng
2025
Hey, we have placed the reservation for you!
By the way, why not check out events that you can attend while you pick your title.
You are currently in the queue to collect this book. You will be notified once it is your turn to collect the book.
Oops! Something went wrong.
Looks like we were not able to place the reservation. Kindly try again later.
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
Do you wish to request the book?
Graph-Attention Fusion with VAE Cross-Modal Mapping and Reinforcement-Learning Visualization for Real-Time AR
by
Cheng, Cheng
2025
Please be aware that the book you have requested cannot be checked out. If you would like to checkout this book, you can reserve another copy
We have requested the book for you!
Your request is successful and it will be processed during the Library working hours. Please check the status of your request in My Requests.
Oops! Something went wrong.
Looks like we were not able to place your request. Kindly try again later.
Graph-Attention Fusion with VAE Cross-Modal Mapping and Reinforcement-Learning Visualization for Real-Time AR
Journal Article
Graph-Attention Fusion with VAE Cross-Modal Mapping and Reinforcement-Learning Visualization for Real-Time AR
2025
Request Book From Autostore
and Choose the Collection Method
Overview
In AR scenarios, the intelligent generation and visualization of multimodal perception information face challenges such as feature heterogeneity, insufficient semantic alignment, and unstable real-time performance. To address these issues, this study proposes a feature modeling method that integrates an Attention-GCN for multimodal fusion, a variational autoencoder (VAE) with geometric/temporal constraints for cross-modal mapping, and a reinforcement learning (PPO) driven optimization mechanism to form a \"perception–generation–presentation–feedback\" closed-loop system. Experiments are conducted on a self-built multimodal dataset of 28,000 sequences, with results evaluated on a held-out test set to ensure reliability. Baseline comparisons include a unimodal CNN and a heuristic fusion model under the same computational conditions. Results demonstrate that the proposed framework achieves an average delay of 1.42 ± 0.08 s, frame rate of 57 ± 1.5 fps, semantic alignment rate of 92.4% ± 1.1, and interaction interruption rate of 3.5% ± 0.4, outperforming baselines in efficiency, semantic consistency, and rendering stability. These findings highlight the framework’s feasibility for real-time multimodal interaction in AR scenarios and its scalability across mid-range devices.
MBRLCatalogueRelatedBooks
Related Items
Related Items
We currently cannot retrieve any items related to this title. Kindly check back at a later time.
This website uses cookies to ensure you get the best experience on our website.