523 result(s) for "multi-view images"
PDE-Based 3D Surface Reconstruction from Multi-View 2D Images
Partial differential equation (PDE) based surfaces offer several advantages over other types of 3D representation. For instance, fewer variables are required to represent the same 3D shape; positional, tangential, and even curvature continuity between PDE surface patches can be naturally maintained when certain conditions are satisfied; and their physics-based nature is preserved. Although some works have applied implicit PDEs to 3D surface reconstruction from images, little work exploits explicit PDE solutions for this task, even though they are more efficient and accurate. In this paper, we propose a new method that applies the explicit solutions of a fourth-order partial differential equation to surface reconstruction from multi-view images. The method has two stages: point cloud data are extracted from multi-view images in the first stage, followed by PDE-based surface reconstruction from the obtained point cloud data. Our computational experiments show that the reconstructed PDE surfaces exhibit good quality and recover the ground truth with high accuracy. A comparison between solutions of different complexity to the fourth-order PDE is also made to demonstrate the power and flexibility of the proposed explicit PDE for surface reconstruction from images.
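For reference, a commonly used fourth-order PDE in PDE-based surface modelling is the Bloor-Wilson form below; this is only a generic example of the class of equation involved, and the paper's exact equation, parameter, and boundary conditions may differ.

```latex
% Bloor-Wilson-type fourth-order surface PDE (illustrative, not
% necessarily the exact equation used in the paper):
\[
  \left(\frac{\partial^{2}}{\partial u^{2}}
        + a^{2}\,\frac{\partial^{2}}{\partial v^{2}}\right)^{2}
  \mathbf{X}(u,v) = \mathbf{0}
\]
% X(u,v) = (x, y, z) is the parametric surface patch, (u, v) are the
% patch parameters, and a is a smoothing parameter; boundary conditions
% on X and its derivatives at the patch edges fix position and tangents.
```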
Two-View Mammogram Synthesis from Single-View Data Using Generative Adversarial Networks
While two-view mammography, which captures both mediolateral-oblique (MLO) and cranio-caudal (CC) views, is the current standard examination in breast cancer screening, single-view mammography is still performed in some countries on women of specific ages. The cancer detection rate is lower with single-view mammography than with two-view mammography, owing to the lack of available image information. The goal of this work is to improve single-view mammography's ability to detect breast cancer by providing two-view mammograms from single projections. The synthesis of novel-view images from single-view data has recently been achieved using generative adversarial networks (GANs). Here, we apply complete representation GAN (CR-GAN), a novel-view image synthesis model, to produce CC-view mammograms from MLO views. Additionally, we incorporate two adaptations, the progressive growing (PG) technique and a feature matching loss, into CR-GAN. Our results show that the PG technique reduces training time, while the feature matching loss improves the quality of the synthesized images compared with CR-GAN alone. Using the proposed method with the two adaptations, CC views similar to real views are successfully synthesized for some cases, but not all; in particular, synthesis rarely succeeds when calcifications are present. Although the image resolution and quality are still far from clinically acceptable levels, our findings establish a foundation for further improvements toward clinical application. As the first report applying novel-view synthesis in medical imaging, this work contributes a methodology for two-view mammogram synthesis.
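The feature matching loss mentioned above is a standard GAN training term; a minimal sketch is given below, assuming lists of intermediate discriminator feature maps for real and synthesized images (names and shapes are illustrative, not the authors' implementation).

```python
# Generic GAN feature-matching loss: L1 distance between discriminator
# feature maps of real and synthesized images, averaged over layers.
import torch
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats):
    """real_feats / fake_feats: lists of per-layer discriminator feature
    maps for the real and generated images (same shapes, layer by layer)."""
    loss = torch.zeros(())
    for fr, ff in zip(real_feats, fake_feats):
        # Match real-image statistics at each discriminator depth.
        loss = loss + F.l1_loss(ff, fr.detach())
    return loss / len(real_feats)

# Stand-in feature maps: batch of 2, three discriminator layers.
real = [torch.randn(2, c, 32, 32) for c in (64, 128, 256)]
fake = [torch.randn(2, c, 32, 32) for c in (64, 128, 256)]
print(feature_matching_loss(real, fake))
```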
Reliable Estimation of Deterioration Levels via Late Fusion Using Multi-View Distress Images for Practical Inspection
This paper presents reliable estimation of deterioration levels via late fusion of multi-view distress images for practical inspection. The proposed method simultaneously solves the following two problems that arise in practical inspection. First, since infrastructure maintenance requires a high level of safety and reliability, this paper proposes a neural network that generates an attention map from distress images and text data acquired during inspection, so that deterioration level estimation with high interpretability can be realized. Second, since multiple images from different views are taken of a single distress during actual inspection, the final result must be estimated from all of these images; the proposed method therefore integrates the estimation results obtained from the multi-view images via late fusion and derives an appropriate result that considers all the images. To the best of our knowledge, no method has been proposed that solves these problems simultaneously, and this is the main contribution of this paper. We confirm the effectiveness of the proposed method through experiments using data acquired during actual inspection.
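A minimal sketch of the late-fusion idea described above, assuming each view's network produces a probability distribution over deterioration levels; the actual fusion rule and any view weighting in the paper may differ.

```python
# Late fusion: average per-view class probabilities for the same distress
# and take the argmax as the final deterioration level.
import numpy as np

def late_fusion(per_view_probs):
    """per_view_probs: (n_views, n_levels) array of per-view softmax
    outputs for one distress; returns (final level, fused scores)."""
    fused = per_view_probs.mean(axis=0)
    return int(np.argmax(fused)), fused

# Three views of one distress, four deterioration levels.
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.3, 0.5, 0.1]])
level, fused = late_fusion(probs)
print(level, fused)
```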
IFE-CMT: Instance-Aware Fine-Grained Feature Enhancement Cross Modal Transformer for 3D Object Detection
In recent years, multi-modal 3D object detection algorithms have developed rapidly. However, current algorithms primarily focus on designing overall fusion strategies for multi-modal features and neglect finer-grained representations, which degrades the detection accuracy of small objects. To address this issue, this paper proposes the Instance-aware Fine-grained feature Enhancement Cross Modal Transformer (IFE-CMT) model. We design an Instance feature Enhancement Module (IE-Module) that accurately extracts object features from multi-modal data and uses them to enhance the overall features, while avoiding view transformations and keeping computational overhead low. Additionally, we design a new point cloud branch network that effectively expands the network's receptive field, enhancing the model's semantic expression capabilities while preserving the texture details of objects. Experimental results on the nuScenes dataset demonstrate that, compared to the CMT model, the proposed IFE-CMT model improves mAP and NDS by 2.1% and 0.8% on the validation set and by 1.9% and 0.7% on the test set, respectively. Notably, for small object categories such as bicycles and motorcycles, mAP improves by 6.6% and 3.7%, respectively, significantly enhancing small-object detection accuracy.
Highlight Removal of Multi-View Facial Images
Highlight removal is a fundamental and challenging task that has been an active research field for decades. Although several methods have recently been developed for facial images, they are typically designed for a single image. This paper presents a lightweight optimization method for removing specular highlight reflections from multi-view facial images. This is achieved by taking full advantage of Lambertian consistency: the diffuse component does not vary with viewing angle, while the specular component does. We impose non-negativity constraints on light and shading in all directions, rather than only the normal directions present on the face, to obtain physically reliable properties. Highlight removal is further facilitated by estimating the illumination chromaticity via orthogonal subspace projection. An important practical feature of the proposed method is that it does not require face reflectance priors. A dataset with ground truth for highlight removal of multi-view facial images is captured to quantitatively evaluate the performance of our method. We demonstrate the robustness and accuracy of our method through comparisons to existing specular highlight removal methods and through improvements in applications such as reconstruction.
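To illustrate the Lambertian-consistency cue the method builds on, the toy sketch below takes a per-pixel minimum over pixel-aligned views as a crude highlight-suppressed estimate; this is a classic baseline used only to show the cue, not the optimization method proposed in the paper.

```python
# Diffuse colour is (roughly) view-independent while highlights move,
# so the per-pixel minimum across aligned views suppresses speculars.
import numpy as np

def min_across_views(aligned_views):
    """aligned_views: (n_views, H, W, 3) pixel-aligned images in [0, 1];
    returns a rough diffuse estimate of shape (H, W, 3)."""
    return aligned_views.min(axis=0)

views = np.random.rand(4, 8, 8, 3)            # stand-in aligned face crops
diffuse_estimate = min_across_views(views)
specular_residual = views - diffuse_estimate  # per-view highlight layers
print(diffuse_estimate.shape, specular_residual.shape)
```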
Building element recognition with MTL-AINet considering view perspectives
The reconstruction and analysis of building models are crucial for the construction of smart cities. A refined building model can provide reliable data support for the data analysis and intelligent management of smart cities. The colors, textures, and geometric forms of building elements, such as building outlines, doors, windows, roof skylights, roof ridges, and advertisements, are diverse; it is therefore challenging to accurately identify the various details of buildings. This article proposes the Multi-Task Learning AINet (MTL-AINet) method, which considers features such as color, texture, direction, and roll angle for building element recognition. AINet is used as the base network; semantic projection maps of color and texture as well as direction and roll angle are used for multi-task learning, and the complex building facade is divided into patches of similar semantics. The multi-semantic features are then combined using hierarchical clustering over a region adjacency graph and a nearest-neighbor graph to achieve accurate recognition of building elements. The experimental results show that the proposed method attains higher accuracy on detailed building edges and can accurately extract detailed elements.
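As a rough illustration of the final grouping step, the sketch below merges adjacent superpixel patches whose multi-task feature vectors are similar, using a thresholded region adjacency graph; the feature vectors, threshold, and graph are hypothetical, and the paper additionally uses a nearest-neighbour graph and hierarchical clustering.

```python
# Merge adjacent regions with similar features via a thresholded
# region adjacency graph; connected components become building elements.
import numpy as np
import networkx as nx

def merge_similar_regions(features, adjacency, threshold=0.5):
    """features: {region_id: feature vector}; adjacency: list of
    (region_a, region_b) pairs for touching superpixels."""
    g = nx.Graph()
    g.add_nodes_from(features)
    for a, b in adjacency:
        if np.linalg.norm(features[a] - features[b]) < threshold:
            g.add_edge(a, b)  # keep only edges between similar regions
    return [sorted(c) for c in nx.connected_components(g)]

feats = {0: np.array([0.10, 0.20]), 1: np.array([0.15, 0.22]),
         2: np.array([0.90, 0.80]), 3: np.array([0.88, 0.79])}
adjacency = [(0, 1), (1, 2), (2, 3)]
print(merge_similar_regions(feats, adjacency))  # [[0, 1], [2, 3]]
```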
Accurate Robot Arm Attitude Estimation Based on Multi-View Images and Super-Resolution Keypoint Detection Networks
Robot arm monitoring is often required in intelligent industrial scenarios. A two-stage method for robot arm attitude estimation based on multi-view images is proposed. In the first stage, a super-resolution keypoint detection network (SRKDNet) is proposed. The SRKDNet incorporates a subpixel convolution module in the backbone neural network, which outputs high-resolution heatmaps for keypoint detection without significantly increasing computational resource consumption. Efficient virtual and real sampling and SRKDNet training methods are put forward: the SRKDNet is trained on generated virtual data and fine-tuned on real sample data, which reduces the time and manpower needed to collect data in real scenarios and achieves better generalization to real data. A coarse-to-fine dual-SRKDNet detection mechanism is proposed and verified, in which full-view and close-up SRKDNets first detect the keypoints and then refine the results. The keypoint detection accuracy, PCK@0.15, for the real robot arm reaches 96.07%. In the second stage, an equation system involving the camera imaging model, the robot arm kinematic model, and keypoints with different confidence values is established to solve for the unknown rotation angles of the joints. The proposed confidence-based keypoint screening scheme makes full use of the information redundancy of multi-view images to ensure attitude estimation accuracy. Experiments on a real UR10 robot arm under three views demonstrate an average joint-angle estimation error of 0.53 degrees, which is superior to the comparison methods.
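The PCK@0.15 figure quoted above follows the standard "percentage of correct keypoints" definition; a minimal sketch is below, assuming the threshold is 0.15 of a normalizing size such as the bounding-box diagonal (the paper's normalization convention may differ).

```python
# PCK@alpha: fraction of predicted keypoints within alpha * (normalizing
# size) of their ground-truth positions.
import numpy as np

def pck(pred, gt, norm_size, alpha=0.15):
    """pred, gt: (n_keypoints, 2) pixel coordinates."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dists <= alpha * norm_size))

pred = np.array([[100.0, 120.0], [290.0, 230.0], [310.0, 305.0]])
gt   = np.array([[102.0, 118.0], [200.0, 230.0], [308.0, 300.0]])
print(pck(pred, gt, norm_size=400.0))  # 2 of 3 keypoints within 0.15 * 400 px
```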
High-Dynamic-Range Image Generation and Coding for Multi-exposure Multi-view Images
High-dynamic-range (HDR) images offer better visual quality, much closer to reality, by allowing a wider range of luminance. Because devices that directly capture or display scenes in HDR format are rare, HDR images are usually generated from several low-dynamic-range (LDR) images with various exposure settings and then displayed on conventional displays after tone mapping. This paper proposes HDR generation for multi-view images, motivated by the need to provide the user with an expanded visual experience, not only in terms of a wider field of view (FOV) but also a greater dynamic range. The proposed technique generates HDR multi-view images efficiently: only N LDR images are needed to render HDR images in N views. Furthermore, to efficiently transmit multi-view images with various exposures, two coding architectures are proposed. The experimental results show that the proposed schemes achieve 39.5% bitrate savings and render HDR images with greatly improved quality compared to a conventional multi-view coding scheme.
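For context, the sketch below shows a standard way of merging differently exposed LDR images into one HDR radiance estimate with a triangle weighting (Debevec-Malik style); the exposure times and the linear-response assumption are illustrative, and the paper's multi-view generation and coding pipeline is considerably more involved.

```python
# Merge multi-exposure LDR images into an HDR radiance map using a
# per-pixel weighted average that trusts well-exposed (mid-tone) pixels.
import numpy as np

def merge_hdr(ldr_images, exposure_times):
    """ldr_images: list of (H, W) arrays in [0, 1]; exposure_times: list
    of exposure durations for those images."""
    radiance_sum = np.zeros_like(ldr_images[0])
    weight_sum = np.zeros_like(ldr_images[0])
    for img, t in zip(ldr_images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # triangle weight: peak at 0.5
        radiance_sum += w * img / t          # assume linear camera response
        weight_sum += w
    return radiance_sum / np.maximum(weight_sum, 1e-6)

times = [0.25, 1.0, 4.0]
ldr = [np.clip(np.random.rand(4, 4) * t, 0.0, 1.0) for t in times]
print(merge_hdr(ldr, times))
```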
Key Technologies of Seam Fusion for Multi-view Image Texture Mapping Based on 3D Point Cloud Data
With the rapid development of computer technology and measurement technology, three-dimensional point cloud data, as an important form of data in computer graphics, is widely used in reverse engineering, surveying, robotics, virtual reality, stereoscopic 3D imaging, indoor scene reconstruction, and many other fields. This paper studies the key technologies of seam fusion for multi-view image texture mapping based on 3D point cloud data and proposes a joint coding and compression scheme for multi-view image textures to replace the previous independent coding scheme that applies MVC-standard compression to each multi-view image texture. Experimental studies show that multi-view texture-depth joint coding achieves varying degrees of performance improvement over the other two current 3D MVD data coding schemes. The gain of JMVDC is especially pronounced for the Ballet and Dancer sequences, which have better depth video quality; compared with the KS_IBP structure, the gain reaches as high as 1.34 dB at the same bit rate.
Structure tensor-based Gaussian kernel edge-adaptive depth map refinement with triangular point view in images
Image reconstruction is the process of restoring image resolution. In 3D image reconstruction, objects seen from different viewpoints are processed with the triangular point view (TPV) method to estimate the object's geometric structure for the 3D model. This work proposes a depth refinement methodology that preserves the geometric structure of objects, using the structure tensor method with a Gaussian filter to transform a series of 2D input images into a 3D model. Depth map errors are computed by comparing the masked area/patch with the distribution of the original image's greyscale levels using an error-pixel-based patch extraction algorithm; errors in the depth estimation can seriously deteriorate the quality of the 3D effect. The depth maps are iteratively refined based on the number of histogram bins to improve the accuracy of the initial depth maps reconstructed from rigid objects. Existing datasets, such as the DTU and Middlebury datasets, were used to build the model of the object scene structure. The results demonstrate that the proposed patch analysis outperforms existing state-of-the-art depth refinement methods in terms of accuracy.
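The structure tensor named in the title is a standard local-orientation descriptor; a minimal sketch of its Gaussian-smoothed form is below, with an illustrative gradient operator and sigma (the paper's edge-adaptive refinement builds further steps on top of this).

```python
# Gaussian-smoothed 2D structure tensor: outer products of image
# gradients, smoothed with a Gaussian window, used to detect edges
# and their orientation.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(image, sigma=1.5):
    """image: 2D float array; returns smoothed components (Jxx, Jxy, Jyy)."""
    ix = sobel(image, axis=1)   # horizontal gradient
    iy = sobel(image, axis=0)   # vertical gradient
    jxx = gaussian_filter(ix * ix, sigma)
    jxy = gaussian_filter(ix * iy, sigma)
    jyy = gaussian_filter(iy * iy, sigma)
    return jxx, jxy, jyy

img = np.random.rand(16, 16)
jxx, jxy, jyy = structure_tensor(img)
print(jxx.shape, jxy.shape, jyy.shape)
```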