Catalogue Search | MBRL

Generating Multi-View Action Data from a Monocular Camera Video by Fusing Human Mesh Recovery and 3D Scene Reconstruction

by Kim, Hyunsu , Son, Yunsik in 3D scene reconstruction , Cameras , Computer vision

2025

Multi-view data, captured from various perspectives, is crucial for training view-invariant human action recognition models, yet its acquisition is hindered by spatio-temporal constraints and high costs. This study aims to develop the Pose Scene EveryWhere (PSEW) framework, which automatically generates temporally consistent, multi-view 3D human action data from a single monocular video. The proposed framework first predicts 3D human parameters from each video frame using a deep learning-based Human Mesh Recovery (HMR) model. Subsequently, it applies tracking, linear interpolation, and Kalman filtering to refine temporal consistency and produce naturalistic motion. The refined human meshes are then reconstructed into a virtual 3D scene by estimating a stable floor plane for alignment, and finally, novel-view videos are rendered using user-defined virtual cameras. As a result, the framework successfully generated multi-view data with realistic, jitter-free motion from a single video input. To assess fidelity to the original motion, we used Root Mean Square Error (RMSE) and Mean Per Joint Position Error (MPJPE) as metrics, achieving low average errors in both 2D (RMSE: 0.172; MPJPE: 0.202) and 3D (RMSE: 0.145; MPJPE: 0.206) space. PSEW provides an efficient, scalable, and low-cost solution that overcomes the limitations of traditional data collection methods, offering a remedy for the scarcity of training data for action recognition models.
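The abstract above reports fidelity as RMSE and MPJPE. As a minimal sketch of how such joint-error metrics are commonly computed (the record does not give the authors' exact definitions, so the function names, the per-coordinate RMSE convention, and the toy data below are all assumptions for illustration):

```python
import math

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average Euclidean distance
    between each predicted joint and its ground-truth joint."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

def rmse(pred, gt):
    """Root Mean Square Error taken over every joint coordinate
    (one common convention; assumed here, not taken from the paper)."""
    sq = [(a - b) ** 2 for p, g in zip(pred, gt) for a, b in zip(p, g)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical toy pose with two 3D joints; the first is off by (3, 4, 0).
gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
pred = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]

print(mpjpe(pred, gt))  # distances are 5 and 0, so the mean is 2.5
print(rmse(pred, gt))   # sqrt(25 / 6)
```

The same functions work for 2D joints, since `math.dist` accepts points of any matching dimension.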

Journal Article

A multi‐view learning approach with diffusion model to synthesize FDG PET from MRI T1WI for diagnosis of Alzheimer's disease

by Xiao, Weizhong , Dening, Tom , Huang, Yueqin in Alzheimer Disease - diagnosis , Alzheimer Disease - diagnostic imaging , Alzheimer's disease

2025

INTRODUCTION: This study presents a novel multi‐view learning approach for machine learning (ML)–based Alzheimer's disease (AD) diagnosis. METHODS: A diffusion model is proposed to synthesize the fluorodeoxyglucose positron emission tomography (FDG PET) view from the magnetic resonance imaging T1 weighted imaging (MRI T1WI) view, incorporating two synthesis strategies: one‐way synthesis and two‐way synthesis. To assess the utility of the synthesized views, we use multilayer perceptron (MLP)–based classifiers with various combinations of the views. RESULTS: The two‐way synthesis achieves state‐of‐the‐art performance with a structural similarity index measure (SSIM) of 0.9380 and a peak‐signal‐to‐noise ratio (PSNR) of 26.47. The one‐way synthesis achieves an SSIM of 0.9282 and a PSNR of 23.83. Both synthesized FDG PET views have shown their effectiveness in improving diagnostic accuracy. DISCUSSION: This work supports the notion that ML‐based cross‐domain data synthesis can be a useful approach to improve AD diagnosis by providing additional synthesized disease‐related views for multi‐view learning. HIGHLIGHTS: We propose a diffusion model with two strategies to synthesize fluorodeoxyglucose positron emission tomography (FDG PET) from magnetic resonance imaging T1 weighted imaging (MRI T1WI). We introduce multi‐view learning with MRI T1WI and synthesized FDG PET for Alzheimer's disease (AD) diagnosis. We provide a comprehensive experimental comparison for the synthesized FDG PET view. The feasibility of the synthesized FDG PET view in AD diagnosis is validated with various experiments. We demonstrate the ability of synthesized FDG PET to enhance the performance of machine learning–based AD diagnosis.

Journal Article

SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis

by Xu, Kuo , Cao, Yang-Jie , Li, Zhen-Qiang in Artificial Intelligence , Computer Science , Computers

2024

Traditional neural radiance fields for rendering novel views require intensive input images and pre-scene optimization, which limits their practical applications. We propose a generalization method to infer scenes from input images and perform high-quality rendering without pre-scene optimization named SG-NeRF (Sparse-Input Generalized Neural Radiance Fields). Firstly, we construct an improved multi-view stereo structure based on the convolutional attention and multi-level fusion mechanism to obtain the geometric features and appearance features of the scene from the sparse input images, and then these features are aggregated by multi-head attention as the input of the neural radiance fields. This strategy of utilizing neural radiance fields to decode scene features instead of mapping positions and orientations enables our method to perform cross-scene training as well as inference, thus enabling neural radiance fields to generalize for novel view synthesis on unseen scenes. We tested the generalization ability on DTU dataset, and our PSNR (peak signal-to-noise ratio) improved by 3.14 compared with the baseline method under the same input conditions. In addition, if the scene has dense input views available, the average PSNR can be improved by 1.04 through further refinement training in a short time, and a higher quality rendering effect can be obtained.
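PSNR is the quality metric quoted in the abstract above. A minimal sketch of the standard definition follows, assuming 8-bit images flattened to lists of pixel values; the function name and the sample pixels are hypothetical and are not taken from the paper:

```python
import math

def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flattened images: 10 * log10(max_value^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical two-pixel example: each pixel is off by 10, so MSE = 100.
print(psnr([100, 100], [110, 90]))  # 10 * log10(65025 / 100)
```

Because the scale is logarithmic, a fixed gain in PSNR corresponds to a fixed ratio of mean squared errors rather than a fixed difference.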

Journal Article

3D Video watermarking for MVD based view-synthesis and RST attack

by Rana, Shuvendu in Computer Communication Networks , Computer Science , Data Structures and Information Theory

2024

Security in terms of copyright protection is the most challenging task in digital media distribution. To protect digital rights in 3D media, a watermarking scheme is proposed for the Multi-view Video plus Depth (MVD) representation that withstands view synthesis and RST attacks. Singular Value Decomposition (SVD) is carried out on the left and right video sequences to find view-invariant coefficients for watermark insertion. Motion-compensated Discrete Cosine Transform (DCT) based Temporal Filtering (MCDCT-TF) is used in the temporal direction to make the scheme robust against video compression attacks. The 2D Discrete Wavelet Transform (2D-DWT) is applied to the temporally filtered low-pass frames as a pre-processing step to make the SVD coefficients more correlated between the 3D views, so that robustness against RST and view synthesis can be achieved with minimum visual degradation. A set of experiments is carried out with different 3D video sequences to justify the robustness of the proposed scheme against the RST attack.

Journal Article

A Dual-UNet Diffusion Framework for Personalized Panoramic Generation

by Xiang, Shiming , Huo, Leigang , Shen, Jing in Artificial intelligence , Cameras , Comparative analysis

2026

While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.

Journal Article

Multi-View Synthesis of Sparse Projection of Absorption Spectra Based on Joint GRU and U-Net

by Huang, Xiaodong , Pei, Pan , Shi, Yanhui in Accuracy , Algorithms , Combustion

2024

Tunable diode laser absorption spectroscopy (TDLAS) technology, combined with chromatographic imaging algorithms, is commonly used for two-dimensional temperature and concentration measurements in combustion fields. However, obtaining critical temperature information from limited detection data is a challenging task in practical engineering applications due to the difficulty of deploying sufficient detection equipment and the lack of sufficient data to invert temperature and other distributions in the combustion field. Therefore, we propose a sparse projection multi-view synthesis model based on U-Net that incorporates the sequence learning properties of gated recurrent unit (GRU) and the generalization ability of residual networks, called GMResUNet. The datasets used for training all contain projection data with different degrees of sparsity. This study shows that the synthesized full projection data had an average relative error of 0.35%, a PSNR of 40.726, and a SSIM of 0.997 at a projection angle of 4. At projection angles of 2, 8, and 16, the average relative errors of the synthesized full projection data were 0.96%, 0.19%, and 0.18%, respectively. The temperature field reconstruction was performed separately for sparse and synthetic projections, showing that the application of the model can significantly improve the reconstruction accuracy of the temperature field of high-energy combustion.

Journal Article

A Synthesizing Semantic Characteristics Lung Nodules Classification Method Based on 3D Convolutional Neural Network

by Dong, Yanan , Wang, Meng , Gao, Bin in Artificial neural networks , attention mechanism , Attention task

2023

Early detection is crucial for the survival and recovery of lung cancer patients. Computer-aided diagnosis (CAD) systems can assist in the early diagnosis of lung cancer by providing decision support. While deep learning methods are increasingly being applied to tasks such as CAD, these models lack interpretability. In this paper, we propose a convolutional neural network model that combines semantic characteristics (SCCNN) to predict whether a given pulmonary nodule is malignant. The model synthesizes the advantages of multi-view, multi-task and attention modules in order to fully simulate the actual diagnostic process of radiologists. The 3D (three-dimensional) multi-view samples of lung nodules are extracted by a spatial sampling method. Meanwhile, semantic characteristics commonly used in radiology reports are used as an auxiliary task and serve to explain how the model reaches its decision. The introduction of the attention module in the feature fusion stage improves the classification of lung nodules as benign or malignant. Our experimental results using the LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) dataset show that this study achieves 95.45% accuracy and 97.26% ROC (Receiver Operating Characteristic) curve area. The results show that the proposed method not only realizes the classification of benign and malignant nodules compared to standard 3D CNN approaches but can also be used to intuitively explain how the model makes predictions, which can assist clinical diagnosis.

Journal Article

Analysis of maximum tolerant depth distortion in view synthesis

by Hou, Chunping , Qi, Sumin , Wang, Laihua in Algorithms , Cameras , Distortion

2018

In view synthesis, pixels in an original view are warped into a virtual view with depth-image-based rendering (DIBR). During DIBR, distortions in the depth map may lead to geometric errors in the synthesized view, which degrade its quality. Therefore, efficiently preserving the fidelity of depth information is extremely important. In this paper, we explore and develop a maximum tolerable depth distortion (MTDD) model to examine the allowable depth distortion which will not introduce any texture distortion for a rendered virtual view and accordingly develop. Experimental results show that a virtual view can be synthesized without introducing any geometric changes if depth distortions follow the MTDD specified thresholds.

Journal Article

A study of depth/texture bit-rate allocation in multi-view video plus depth compression

by Riou, Paul , Pressigout, Muriel , Morin, Luce in Coding , Data compression , Experiments

2013

Multi-view video plus depth (MVD) data offer a reliable representation of three-dimensional (3D) scenes for 3D video applications. The data volume is huge, and its compression is currently an important research challenge. Because MVD data consist of texture and depth video sequences, the question of how bit rate should be allocated between these two types of data often arises. This paper questions the required ratio between texture and depth when encoding MVD data. In particular, the paper investigates the elements affecting the best bit-rate ratio between depth and color: total bit-rate budget, input data features, encoding strategy, and assessed view.

Journal Article

Colour volumetric compression for realistic view synthesis applications

by Canagarajah, Nishan C. , Anantrasirichai, Nantheera , Redmill, David W. in Algorithms , Bandwidths , Coding

2011

Colour volumetric data, which is constructed from a set of multi-view images, is capable of providing a realistic immersive experience. However, it is not widely applicable due to its manifold increase in bandwidth. This paper presents a novel framework to achieve scalable volumetric compression. Based on wavelet transformation, a data rearrangement algorithm is proposed to compact the volumetric data, leading to high transformation efficiency. The colour data is rearranged using the characteristics of the human visual system. A pre-processing scheme for adaptive resolution is also proposed in this paper. The low resolution overcomes the limitation of data transmission at low bitrates, whilst the fine resolution improves the quality of the synthesised images. Results show significant improvement in compression performance over traditional 3D coding. Finally, the effect of using residual coding is investigated in order to show a trade-off between compression and view synthesis performance.

Journal Article
