149 results for "Weinmann, Michael"
A Classification-Segmentation Framework for the Detection of Individual Trees in Dense MMS Point Cloud Data Acquired in Urban Areas
In this paper, we present a novel framework for detecting individual trees in densely sampled 3D point cloud data acquired in urban areas. Given a 3D point cloud, the objective is to assign point-wise labels that are both class-aware and instance-aware, a task that is known as instance-level segmentation. To achieve this, our framework addresses two successive steps. The first step of our framework is given by the use of geometric features for a binary point-wise semantic classification with the objective of assigning semantic class labels to irregularly distributed 3D points, whereby the labels are defined as “tree points” and “other points”. The second step of our framework is given by a semantic segmentation with the objective of separating individual trees within the “tree points”. This is achieved by applying an efficient adaptation of the mean shift algorithm and a subsequent segment-based shape analysis relying on semantic rules to only retain plausible tree segments. We demonstrate the performance of our framework on a publicly available benchmark dataset, which has been acquired with a mobile mapping system in the city of Delft in the Netherlands. This dataset contains 10.13 M labeled 3D points among which 17.6 % are labeled as “tree points”. The derived results clearly reveal a semantic classification of high accuracy (up to 90.77 %) and an instance-level segmentation of high plausibility, while the simplicity, applicability and efficiency of the involved methods even allow applying the complete framework on a standard laptop computer with a reasonable processing time (less than 2.5 h).
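As an illustration of the segmentation step described above, here is a minimal sketch of mean-shift-style clustering of the "tree points" followed by a rule-based plausibility check. The use of scikit-learn's MeanShift on 2D-projected points, as well as the bandwidth and thresholds, are illustrative assumptions, not the authors' efficient adaptation.

```python
# Minimal sketch: separating individual trees among the "tree points" via mean shift.
import numpy as np
from sklearn.cluster import MeanShift

def segment_trees(tree_points: np.ndarray, bandwidth: float = 2.0) -> np.ndarray:
    """Assign an instance label to each 3D point classified as "tree"."""
    xy = tree_points[:, :2]                       # project to the ground plane
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    return ms.fit_predict(xy)                     # one cluster per putative tree

def plausible(segment: np.ndarray, min_points: int = 50, max_radius: float = 8.0) -> bool:
    """Segment-based shape check (hypothetical thresholds): keep segments whose
    extent and point count are consistent with a tree crown."""
    center = segment[:, :2].mean(axis=0)
    radii = np.linalg.norm(segment[:, :2] - center, axis=1)
    return len(segment) >= min_points and radii.max() <= max_radius
```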
Geospatial Computer Vision Based on Multi-Modal Data—How Valuable Is Shape Information for the Extraction of Semantic Information?
In this paper, we investigate the value of different modalities and their combination for the analysis of geospatial data of low spatial resolution. For this purpose, we present a framework that allows for the enrichment of geospatial data with additional semantics based on given color information, hyperspectral information, and shape information. While the different types of information are used to define a variety of features, classification based on these features is performed using a random forest classifier. To draw conclusions about the relevance of different modalities and their combination for scene analysis, we present and discuss results which have been achieved with our framework on the MUUFL Gulfport Hyperspectral and LiDAR Airborne Data Set.
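The fusion scheme suggests a straightforward implementation: concatenate per-sample features from each modality and train a random forest, whose feature importances hint at modality relevance. The sketch below assumes pre-computed feature arrays; the names and shapes are illustrative, not the paper's exact feature definitions.

```python
# Minimal sketch: multi-modal feature fusion with a random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify(color_feats, hyperspectral_feats, shape_feats, train_mask, train_labels):
    # Stack per-sample features from all modalities into one design matrix.
    X = np.concatenate([color_feats, hyperspectral_feats, shape_feats], axis=1)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X[train_mask], train_labels)
    # Feature importances give a rough view of per-modality relevance.
    return clf.predict(X), clf.feature_importances_
```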
Constraint-Based Optimized Human Skeleton Extraction from Single-Depth Camera
As a cutting-edge research topic in computer vision and graphics for decades, human skeleton extraction from a single depth camera remains challenging due to possible occlusions of different body parts, huge appearance variations, and sensor noise. In this paper, we propose to incorporate human skeleton length conservation and symmetry priors as well as temporal constraints to enhance the consistency and continuity of the estimated skeleton of a moving human body. Given an initial estimation of the skeleton joint positions provided per frame by the Kinect SDK or Nuitrack SDK, which does not follow the aforementioned priors and can be prone to errors, our framework improves the accuracy of these pose estimates based on the length and symmetry constraints. In addition, our method is device-independent and can be integrated into skeleton extraction SDKs for refinement, allowing the detection of outliers within the initial joint location estimates and the prediction of new joint location estimates following the temporal observations. The experimental results demonstrate the effectiveness and robustness of our approach in several cases.
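To make the length-conservation prior concrete, here is a minimal sketch of one iterative constraint-projection step that pulls each joint pair toward its reference bone length. The bone topology, reference lengths, and the symmetric averaging update are illustrative assumptions, not the authors' optimizer.

```python
# Minimal sketch: projecting noisy joint estimates onto bone-length constraints.
import numpy as np

def enforce_bone_lengths(joints, bones, ref_lengths, iterations=10):
    """joints: (J, 3) estimated positions; bones: list of (i, j) index pairs."""
    joints = joints.copy()
    for _ in range(iterations):
        for (i, j), length in zip(bones, ref_lengths):
            d = joints[j] - joints[i]
            dist = np.linalg.norm(d)
            if dist < 1e-8:
                continue
            # Split the length correction equally between both joints.
            corr = 0.5 * (dist - length) * d / dist
            joints[i] += corr
            joints[j] -= corr
    return joints
```

A symmetry prior can be enforced analogously by averaging the lengths of corresponding left/right bones before projection.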
GEOMETRIC FEATURES AND THEIR RELEVANCE FOR 3D POINT CLOUD CLASSIFICATION
In this paper, we focus on the automatic interpretation of 3D point cloud data in terms of associating a class label to each 3D point. While much effort has recently been spent on this research topic, little attention has been paid to the influencing factors that affect the quality of the derived classification results. For this reason, we investigate fundamental influencing factors making geometric features more or less relevant with respect to the classification task. We present a framework which consists of five components addressing point sampling, neighborhood recovery, feature extraction, classification and feature relevance assessment. To analyze the impact of the main influencing factors, which are represented by the given point sampling and the selected neighborhood type, we present the results derived with different configurations of our framework for a commonly used benchmark dataset, for which both a reference labeling with respect to three structural classes (linear structures, planar structures and volumetric structures) and a reference labeling with respect to five semantic classes (Wire, Pole/Trunk, Façade, Ground and Vegetation) are available.
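The three structural classes map directly onto the standard covariance-based geometric features. As a minimal sketch, the snippet below computes linearity, planarity and sphericity from the eigenvalues of each local 3D covariance matrix; the k-nearest-neighbor recovery via SciPy and the value of k are illustrative choices.

```python
# Minimal sketch: eigenvalue-based geometric features from local neighborhoods.
import numpy as np
from scipy.spatial import cKDTree

def geometric_features(points: np.ndarray, k: int = 20) -> np.ndarray:
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    feats = np.empty((len(points), 3))
    for n, neighbors in enumerate(idx):
        P = points[neighbors] - points[neighbors].mean(axis=0)
        # Eigenvalues of the local covariance, sorted so that l1 >= l2 >= l3.
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(P.T @ P / k))[::-1]
        s = l1 + 1e-12
        feats[n] = [(l1 - l2) / s, (l2 - l3) / s, l3 / s]  # linearity, planarity, sphericity
    return feats
```

High linearity indicates wire- or trunk-like structures, high planarity facades and ground, and high sphericity volumetric structures such as vegetation.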
Density-based Geometric Convergence of NeRFs at Training Time: Insights from Spatio-temporal Discretization
Whereas emerging learning-based scene representations are predominantly evaluated based on image quality metrics such as PSNR, SSIM or LPIPS, only a few investigations focus on the geometric accuracy of the underlying model. In contrast to demonstrating the geometric deviations only for the fully optimized scene model, our work investigates the geometric convergence behavior during the optimization. For this purpose, we analyze the geometric convergence of discretized density fields by deriving point cloud representations at different training steps during the optimization of the scene representation and comparing them based on established point cloud metrics. This yields insights into which scene parts are already well represented within the scene representation at a certain time during the optimization. By demonstrating that certain regions reach convergence earlier than other regions in the scene, we motivate future work on locally-guided optimization approaches that shift the computational burden to regions that still need to converge while leaving converged regions unchanged, which might further reduce training time and improve the achieved quality.
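One established point cloud metric suited to this kind of per-checkpoint comparison is the Chamfer distance; a minimal sketch follows. The extraction of a point cloud from the discretized density field is not shown, and the hypothetical `extract_points` helper in the comment is an assumption for illustration.

```python
# Minimal sketch: Chamfer distance between a per-checkpoint cloud and a reference.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d_ab, _ = cKDTree(b).query(a)   # nearest neighbor in b for each point of a
    d_ba, _ = cKDTree(a).query(b)   # and vice versa
    return d_ab.mean() + d_ba.mean()

# Tracking chamfer_distance(extract_points(step), reference) over training steps
# reveals which scene parts stabilize early in the optimization.
```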
The Potential of Neural Radiance Fields and 3D Gaussian Splatting for 3D Reconstruction from Aerial Imagery
In this paper, we investigate the potential of advanced Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting for 3D scene reconstruction from aerial imagery obtained via sensor platforms with an almost nadir-looking camera. Such a setting for image acquisition is convenient for capturing large-scale urban scenes, yet it poses particular challenges: the imagery exhibits large overlap, very short baselines, similar viewing directions and an almost constant but large distance to the scene, and it therefore differs from the usual object-centric scene capture. We apply a traditional approach for image-based 3D reconstruction (COLMAP), a modern NeRF-based approach (Nerfacto) and a representative of the recently introduced 3D Gaussian Splatting approaches (Splatfacto), where the latter two are provided in the Nerfstudio framework. We analyze results achieved on the recently released UseGeo dataset both quantitatively and qualitatively. The achieved results reveal that the traditional COLMAP approach still outperforms the Nerfacto and Splatfacto approaches for various scene characteristics, such as less-textured areas, areas with high vegetation, shadowed areas and areas observed from only very few views.
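A pipeline of this shape can be driven from Python via the Nerfstudio command-line tools, as in the sketch below. The exact flags, directory names, and preprocessing options are assumptions; consult the Nerfstudio documentation and the paper for the actual configuration used.

```python
# Minimal sketch: COLMAP-based preprocessing, then Nerfacto and Splatfacto training.
import subprocess

data_dir, processed_dir = "usegeo_images", "usegeo_processed"

# Recover camera poses via COLMAP (also the classical reconstruction baseline).
subprocess.run(["ns-process-data", "images", "--data", data_dir,
                "--output-dir", processed_dir], check=True)

# Train the NeRF-based and 3D-Gaussian-Splatting-based representations.
for method in ["nerfacto", "splatfacto"]:
    subprocess.run(["ns-train", method, "--data", processed_dir], check=True)
```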
SEMANTIC SEGMENTATION OF AERIAL IMAGERY VIA MULTI-SCALE SHUFFLING CONVOLUTIONAL NEURAL NETWORKS WITH DEEP SUPERVISION
In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. We take advantage of the SCNN's shuffling operator to effectively upsample feature maps and then fuse multi-scale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into its intermediate layers. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, the use of a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not provide the best numerical results, but yields smoother and improved detections for several objects.
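The shuffling operator is the sub-pixel rearrangement available in PyTorch as `PixelShuffle`; a minimal sketch of an upsampling block built on it follows. Channel counts and the 2x factor are illustrative assumptions, not the DSCNN's exact configuration.

```python
# Minimal sketch: sub-pixel ("shuffling") upsampling as used by the SCNN family.
import torch
import torch.nn as nn

class ShufflingUpsample(nn.Module):
    """Predict r*r*C channels, then rearrange them into an r-times larger map."""
    def __init__(self, in_channels: int, out_channels: int, r: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 32, 32)
print(ShufflingUpsample(64, 6)(x).shape)  # torch.Size([1, 6, 64, 64])
```

Deep supervision, as in the DSCNN, attaches auxiliary losses to such intermediate outputs in addition to the final prediction.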
Incomplete Gamma Kernels: Generalizing Locally Optimal Projection Operators
We present incomplete gamma kernels, a generalization of Locally Optimal Projection (LOP) operators. In particular, we reveal the relation of the classical localized \(L_1\) estimator, used in the LOP operator for point cloud denoising, to the common Mean Shift framework via a novel kernel. Furthermore, we generalize this result to a whole family of kernels that are built upon the incomplete gamma function and each represents a localized \(L_p\) estimator. By deriving various properties of the kernel family concerning distributional, Mean Shift induced, and other aspects such as strict positive definiteness, we obtain a deeper understanding of the operator's projection behavior. From these theoretical insights, we illustrate several applications ranging from an improved Weighted LOP (WLOP) density weighting scheme and a more accurate Continuous LOP (CLOP) kernel approximation to the definition of a novel set of robust loss functions. These incomplete gamma losses include the Gaussian and LOP loss as special cases and can be applied to various tasks including normal filtering. Furthermore, we show that the novel kernels can be included as priors into neural networks. We demonstrate the effects of each application in a range of quantitative and qualitative experiments that highlight the benefits induced by our modifications.
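For intuition, the sketch below builds a kernel profile from the regularized upper incomplete gamma function (SciPy's `gammaincc`) and plugs it into a mean-shift-style weighted average. The specific parameterization, shape parameter, and bandwidth are illustrative assumptions, not the paper's exact kernel definition.

```python
# Minimal sketch: a kernel profile built on the upper incomplete gamma function.
import numpy as np
from scipy.special import gammaincc  # regularized upper incomplete gamma Q(a, x)

def incomplete_gamma_kernel(r: np.ndarray, a: float, h: float = 1.0) -> np.ndarray:
    """Kernel value as a function of distance r, bandwidth h, shape a.
    Equals 1 at r = 0 and decays monotonically with distance."""
    return gammaincc(a, (r / h) ** 2)

def mean_shift_step(query: np.ndarray, points: np.ndarray, a: float = 1.5, h: float = 1.0):
    """One mean-shift-style update of `query` toward the kernel-weighted mean."""
    w = incomplete_gamma_kernel(np.linalg.norm(points - query, axis=1), a, h)
    return (w[:, None] * points).sum(axis=0) / w.sum()
```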
Teaching the Incompressible Navier-Stokes Equations to Fast Neural Surrogate Models in 3D
Physically plausible fluid simulations play an important role in modern computer graphics and engineering. However, in order to achieve real-time performance, computational speed needs to be traded off against physical accuracy. Surrogate fluid models based on neural networks have the potential to achieve both fast fluid simulations and high physical accuracy. However, these approaches rely on massive amounts of training data, require complex pipelines for training and inference, or do not generalize to new fluid domains. In this work, we present significant extensions to a recently proposed deep learning framework, which addresses the aforementioned challenges in 2D. We go from 2D to 3D and propose an efficient architecture to cope with the high demands of 3D grids in terms of memory and computational complexity. Furthermore, we condition the neural fluid model on additional information about the fluid's viscosity and density, which allows simulating laminar as well as turbulent flows based on the same surrogate model. Our method allows fluid models to be trained without requiring fluid simulation data beforehand. Inference is fast and simple, as the fluid model directly maps a fluid state and boundary conditions at time t to the subsequent fluid state at t+dt. We obtain real-time fluid simulations on a 128x64x64 grid that include various fluid phenomena such as the Magnus effect or Kármán vortex streets, and we generalize to domain geometries not considered during training. Our method shows strong improvements in terms of accuracy, speed and generalization capabilities over current 3D NN-based fluid models.
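The state-to-state mapping lends itself to a residual formulation, sketched below with a deliberately tiny 3D CNN. The architecture, channel counts, and the way viscosity and density are broadcast over the grid are illustrative stand-ins, not the paper's model.

```python
# Minimal sketch: a surrogate that maps (state, boundary conditions) at t to t+dt,
# conditioned on scalar viscosity and density.
import torch
import torch.nn as nn

class FluidStep(nn.Module):
    def __init__(self, state_ch: int = 3, bc_ch: int = 1, hidden: int = 32):
        super().__init__()
        # +2 input channels broadcast scalar viscosity and density over the grid.
        self.net = nn.Sequential(
            nn.Conv3d(state_ch + bc_ch + 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, state_ch, 3, padding=1),
        )

    def forward(self, state, bc, mu, rho):
        b, _, d, h, w = state.shape
        params = torch.stack([mu, rho], dim=1).view(b, 2, 1, 1, 1).expand(b, 2, d, h, w)
        return state + self.net(torch.cat([state, bc, params], dim=1))  # residual update

step = FluidStep()
state = torch.randn(1, 3, 16, 16, 16)  # e.g. a velocity field on a small grid
next_state = step(state, torch.zeros(1, 1, 16, 16, 16),
                  torch.tensor([0.01]), torch.tensor([1.0]))
```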
Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments
Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities at practical bandwidth requirements, based only on light-weight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation: a voxel-based representation for the static contents, which not only stores the reconstructed surface geometry but also contains information about the object semantics as well as their accumulated dynamic movement over time, combined with a point-cloud-based representation for dynamic scene parts, where the separation from static parts is achieved based on semantic and instance information extracted from the input frames. Both static and dynamic content are streamed independently yet simultaneously; potentially moving but currently static scene entities are seamlessly integrated into the static model until they become dynamic again, and static and dynamic data are fused at the remote client. With these components, our system achieves VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality, performance, and ablation studies regarding the involved design choices.
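The hybrid representation can be summarized as two parallel containers, sketched below: a sparse voxel map for static geometry annotated with semantics and a motion history, and a per-frame point buffer for dynamic content. All class names, fields, and the 5 cm voxel size are illustrative assumptions, not the system's actual data structures.

```python
# Minimal sketch: hybrid voxel (static) + point cloud (dynamic) scene container.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class StaticVoxel:
    tsdf: float = 1.0          # fused surface distance
    semantic_id: int = -1      # object semantics
    motion_score: float = 0.0  # accumulated dynamic movement over time

@dataclass
class HybridScene:
    voxel_size: float = 0.05                            # 5 cm voxels (assumed)
    static_voxels: dict = field(default_factory=dict)   # (i, j, k) -> StaticVoxel
    dynamic_points: np.ndarray = None                   # current frame's dynamic points

    def integrate_frame(self, points: np.ndarray, is_dynamic: np.ndarray):
        # Dynamic points (from semantic/instance labels) are streamed separately;
        # static points are fused into the persistent voxel model.
        self.dynamic_points = points[is_dynamic]
        for p in points[~is_dynamic]:
            key = tuple((p / self.voxel_size).astype(int))
            self.static_voxels.setdefault(key, StaticVoxel())
```

Keeping the two containers independent is what allows static and dynamic content to be streamed simultaneously and fused only at the remote client.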