Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
16 result(s) for "multi-view image synthesis"
Sort by:
Two-View Mammogram Synthesis from Single-View Data Using Generative Adversarial Networks
While two-view mammography taking both mediolateral-oblique (MLO) and cranio-caudual (CC) views is the current standard method of examination in breast cancer screening, single-view mammography is still being performed in some countries on women of specific ages. The rate of cancer detection is lower with single-view mammography than for two-view mammography, due to the lack of available image information. The goal of this work is to improve single-view mammography’s ability to detect breast cancer by providing two-view mammograms from single projections. The synthesis of novel-view images from single-view data has recently been achieved using generative adversarial networks (GANs). Here, we apply complete representation GAN (CR-GAN), a novel-view image synthesis model, aiming to produce CC-view mammograms from MLO views. Additionally, we incorporate two adaptations—the progressive growing (PG) technique and feature matching loss—into CR-GAN. Our results show that use of the PG technique reduces the training time, while the synthesized image quality is improved when using feature matching loss, compared with the method using only CR-GAN. Using the proposed method with the two adaptations, CC views similar to real views are successfully synthesized for some cases, but not all cases; in particular, image synthesis is rarely successful when calcifications are present. Even though the image resolution and quality are still far from clinically acceptable levels, our findings establish a foundation for further improvements in clinical applications. As the first report applying novel-view synthesis in medical imaging, this work contributes by offering a methodology for two-view mammogram synthesis.
Face aging with pixel-level alignment GAN
Face aging is of great significance in cross-time identity verification problem. However, there is still a huge gap between the synthesized face image and the real face in terms of quality and consistency due to identity ambiguity and image distortion caused by existing face aging methods. To meet this challenge, we propose a face aging framework named as Pixel-level Alignment GAN, PAGAN, to synthesize faces of different age groups. Face images are featured by age, identity, and fine-grained pixel-value to ensure the quality, which is a typical multi-task learning problem. The proposed face aging framework with PAGAN is a combination of age estimation, identity preservation, and image de-noising. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods not only in the accuracy of age classification but also in the image quality. With the proposed PAGAN, the face recognition accuracy with synthesized images has increased 0.21% and the image quality rating has increased around 5%, which proves the effectiveness and validity of proposed method.
Remote Sensing Neural Radiance Fields for Multi-View Satellite Photogrammetry
Neural radiance fields (NeRFs) combining machine learning with differentiable rendering have arisen as one of the most promising approaches for novel view synthesis and depth estimates. However, NeRFs only applies to close-range static imagery and it takes several hours to train the model. The satellites are hundreds of kilometers from the earth. Satellite multi-view images are usually captured over several years, and the scene of images is dynamic in the wild. Therefore, multi-view satellite photogrammetry is far beyond the capabilities of NeRFs. In this paper, we present a new method for multi-view satellite photogrammetry of Earth observation called remote sensing neural radiance fields (RS-NeRFs). It aims to generate novel view images and accurate elevation predictions quickly. For each scene, we train an RS-NeRF using high-resolution optical images without labels or geometric priors and apply image reconstruction losses for self-supervised learning. Multi-date images exhibit significant changes in appearance, mainly due to cars and varying shadows, which brings challenges to satellite photogrammetry. Robustness to these changes is achieved by the input of solar ray direction and the vehicle removal method. NeRFs make it intolerable by requiring a very long time to train an easy scene. In order to significantly reduce the training time of RS-NeRFs, we build a tiny network with HashEncoder and adopted a new sampling technique with our custom CUDA kernels. Compared with previous work, our method performs better on novel view synthesis and elevation estimates, taking several minutes.
SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis
Traditional neural radiance fields for rendering novel views require intensive input images and pre-scene optimization, which limits their practical applications. We propose a generalization method to infer scenes from input images and perform high-quality rendering without pre-scene optimization named SG-NeRF (Sparse-Input Generalized Neural Radiance Fields). Firstly, we construct an improved multi-view stereo structure based on the convolutional attention and multi-level fusion mechanism to obtain the geometric features and appearance features of the scene from the sparse input images, and then these features are aggregated by multi-head attention as the input of the neural radiance fields. This strategy of utilizing neural radiance fields to decode scene features instead of mapping positions and orientations enables our method to perform cross-scene training as well as inference, thus enabling neural radiance fields to generalize for novel view synthesis on unseen scenes. We tested the generalization ability on DTU dataset, and our PSNR (peak signal-to-noise ratio) improved by 3.14 compared with the baseline method under the same input conditions. In addition, if the scene has dense input views available, the average PSNR can be improved by 1.04 through further refinement training in a short time, and a higher quality rendering effect can be obtained.
Robust Local Light Field Synthesis via Occlusion-aware Sampling and Deep Visual Feature Fusion
Novel view synthesis has attracted tremendous research attention recently for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) based on arbitrary large baseline RGB references is a challenging problem that lacks efficient solutions with existing novel view synthesis techniques. In this work, we aim at truthfully rendering local immersive novel views/LF images based on large baseline LF captures and a single RGB image in the target view. To fully explore the precious information from source LF captures, we propose a novel occlusion-aware source sampler (OSS) module which efficiently transfers the pixels of source views to the target view’s frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only helps to provide information for occluded regions from varying observation angles, but also proves to be able to effectively enhance the visual rendering quality. Experimental results show that our proposed method is able to render high-quality LF images/novel views with sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
MM-NeRF: Large-Scale Scene Representation with Multi-Resolution Hash Grid and Multi-View Priors Features
Reconstructing large-scale scenes using Neural Radiance Fields (NeRFs) is a research hotspot in 3D computer vision. Existing MLP (multi-layer perception)-based methods often suffer from issues of underfitting and a lack of fine details in rendering large-scale scenes. Popular solutions are to divide the scene into small areas for separate modeling or to increase the layer scale of the MLP network. However, the subsequent problem is that the training cost increases. Moreover, reconstructing large scenes, unlike object-scale reconstruction, involves a geometrically considerable increase in the quantity of view data if the prior information of the scene is not effectively utilized. In this paper, we propose an innovative method named MM-NeRF, which integrates efficient hybrid features into the NeRF framework to enhance the reconstruction of large-scale scenes. We propose employing a dual-branch feature capture structure, comprising a multi-resolution 3D hash grid feature branch and a multi-view 2D prior feature branch. The 3D hash grid feature models geometric details, while the 2D prior feature supplements local texture information. Our experimental results show that such integration is sufficient to render realistic novel views with fine details, forming a more accurate geometric representation. Compared with representative methods in the field, our method significantly improves the PSNR (Peak Signal-to-Noise Ratio) by approximately 5%. This remarkable progress underscores the outstanding contribution of our method in the field of large-scene radiance field reconstruction.
A Dual-UNet Diffusion Framework for Personalized Panoramic Generation
While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.
GAN Based Three-Stage-Training Algorithm for Multi-view Facial Expression Recognition
With the rapid development of deep learning, the performance on facial expression recognition (FER) has been greatly improved. However, multi-view FER, where the facial information is difficult to be fully acquired due to pose deflections, remains challenging. This paper presents a generative adversarial network (GAN) based three-stage-training algorithm for multi-view FER. Specifically, the frontal-view faces are firstly adopted to train a convolutional neural network (CNN) in an efficient way. Then, the pre-trained CNN is embedded in a well-designed GAN such that the faces with different deflections can be synthesized as the frontal-view ones containing expression features. The original and synthesized faces are finally fused and employed to retrain the CNN classifier such that the multi-view FER problem is efficiently resolved. Experimental results demonstrate that the three-stage-training algorithm can achieve better performance than some existing methods.
Enhancing Structure-from-Motion: Re-aligning images with 3D Gaussian splatting
Abstract This paper presents a novel algorithm that enhances Structure-from-Motion (SfM) pipelines by integrating efficient novel view synthesis (NVS) using 3D Gaussian Splatting (3DGS). Traditional SfM methods often fail in challenging scenarios with repetitive structures, outliers or misaligned images, leading to ghost and doppelganger artifacts. To address this, we propose an NVS-guided evaluation and correction framework that leverages 3DGS-rendered views to detect and rectify misaligned images, improving reconstruction accuracy. Our method not only enhances SfM outputs but also benefits downstream tasks such as dense reconstruction [multi-view stereo (MVS)], NVS, and visual localization. Experiments on public datasets and our real-world indoor dataset show that state-of-the-art baselines fail in several ambiguous scenes where our approach succeeds. For example, on the Cambridge Landmarks dataset (Shop Facade), our method reduces localization error from 5.7 to 5.4 cm and improves SSIM from 0.735 to 0.787. These results confirm the robustness and broad applicability of our framework, demonstrating that while 3DGS is an effective tool, the core contribution lies in the proposed SfM re-alignment strategy. Graphical Abstract Graphical Abstract Schematic illustration of the SfM re-alignment strategy using 3D Gaussian Splatting.
A Synthesizing Semantic Characteristics Lung Nodules Classification Method Based on 3D Convolutional Neural Network
Early detection is crucial for the survival and recovery of lung cancer patients. Computer-aided diagnosis system can assist in the early diagnosis of lung cancer by providing decision support. While deep learning methods are increasingly being applied to tasks such as CAD (Computer-aided diagnosis system), these models lack interpretability. In this paper, we propose a convolutional neural network model that combines semantic characteristics (SCCNN) to predict whether a given pulmonary nodule is malignant. The model synthesizes the advantages of multi-view, multi-task and attention modules in order to fully simulate the actual diagnostic process of radiologists. The 3D (three dimensional) multi-view samples of lung nodules are extracted by spatial sampling method. Meanwhile, semantic characteristics commonly used in radiology reports are used as an auxiliary task and serve to explain how the model interprets. The introduction of the attention module in the feature fusion stage improves the classification of lung nodules as benign or malignant. Our experimental results using the LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) show that this study achieves 95.45% accuracy and 97.26% ROC (Receiver Operating Characteristic) curve area. The results show that the method we proposed not only realize the classification of benign and malignant compared to standard 3D CNN approaches but can also be used to intuitively explain how the model makes predictions, which can assist clinical diagnosis.