
Journal Article

Graph-Attention Fusion with VAE Cross-Modal Mapping and Reinforcement-Learning Visualization for Real-Time AR

Cheng, Cheng

2025

Overview

In augmented reality (AR) scenarios, the intelligent generation and visualization of multimodal perception information face challenges such as feature heterogeneity, insufficient semantic alignment, and unstable real-time performance. To address these issues, this study proposes a feature modeling method that integrates three components: an attention-based graph convolutional network (Attention-GCN) for multimodal fusion, a variational autoencoder (VAE) with geometric and temporal constraints for cross-modal mapping, and an optimization mechanism driven by reinforcement learning (proximal policy optimization, PPO). Together they form a closed-loop "perception–generation–presentation–feedback" system. Experiments are conducted on a self-built multimodal dataset of 28,000 sequences, with results evaluated on a held-out test set to ensure reliability. The baselines, a unimodal CNN and a heuristic fusion model, are run under the same computational conditions. The proposed framework achieves an average delay of 1.42 ± 0.08 s, a frame rate of 57 ± 1.5 fps, a semantic alignment rate of 92.4 ± 1.1%, and an interaction interruption rate of 3.5 ± 0.4%, outperforming the baselines in efficiency, semantic consistency, and rendering stability. These findings highlight the framework's feasibility for real-time multimodal interaction in AR scenarios and its scalability across mid-range devices.
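
The abstract does not specify the internals of the Attention-GCN fusion step. A minimal sketch follows. It assumes a standard single-head graph-attention (GAT-style) layer in which each modality embedding is a graph node that attends over its neighbours. The modality names ("vision", "audio", "pose"), the function `graph_attention_fuse`, and the randomly initialised weights `w` and `a` are hypothetical illustrations, not the authors' implementation. Each node computes scores e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalises them with a softmax over its neighbours, and outputs the attention-weighted sum of the projected neighbour features.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention_fuse(
    features: Dict[str, Vector],
    edges: Dict[str, List[str]],
    w: Matrix,
    a: Vector,
) -> Tuple[Dict[str, Vector], Dict[str, Dict[str, float]]]:
    """Hypothetical single-head GAT-style layer over modality nodes.

    Returns the fused node features and the attention weights alpha[i][j].
    """
    # Shared linear projection W h for every modality node.
    projected = {name: matvec(w, h) for name, h in features.items()}
    out_dim = len(w)
    fused: Dict[str, Vector] = {}
    attention: Dict[str, Dict[str, float]] = {}
    for i, neighbours in edges.items():
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); "+" concatenates the lists.
        scores = [
            leaky_relu(sum(ak * zk for ak, zk in zip(a, projected[i] + projected[j])))
            for j in neighbours
        ]
        alphas = softmax(scores)  # normalise over node i's neighbourhood
        attention[i] = dict(zip(neighbours, alphas))
        # h_i' = sum_j alpha_ij * W h_j
        fused[i] = [
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, neighbours))
            for k in range(out_dim)
        ]
    return fused, attention


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    # Hypothetical per-modality embeddings (stand-ins for encoder outputs).
    features = {
        name: [random.uniform(-1, 1) for _ in range(dim)]
        for name in ("vision", "audio", "pose")
    }
    # Fully connected modality graph, including self-loops.
    edges = {name: list(features) for name in features}
    w = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    a = [random.gauss(0, 0.5) for _ in range(2 * dim)]
    fused, attention = graph_attention_fuse(features, edges, w, a)
    for name, weights in attention.items():
        print(name, {k: round(v, 3) for k, v in weights.items()})
```

The attention weights make the fusion interpretable: for each modality, they show how much every other modality contributes to its fused representation. This is one plausible reading of "Attention-GCN" fusion. A production system would more likely use a learned, multi-head layer from a graph library.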
