Catalogue Search | MBRL
Explore the vast range of titles available.
476 result(s) for "3D object detection"
Improving the Efficiency of 3D Monocular Object Detection and Tracking for Road and Railway Smart Mobility
by Mauri, Antoine; Kounouho, Messmer; Evain, Alexandre
in 3D bounding boxes estimation; Algorithms; Artificial Intelligence
2023
Three-dimensional (3D) real-time object detection and tracking is an important task for autonomous vehicles and road and railway smart mobility, allowing them to analyze their environment for navigation and obstacle avoidance. In this paper, we improve the efficiency of 3D monocular object detection by using dataset combination and knowledge distillation, and by creating a lightweight model. First, we combine real and synthetic datasets to increase the diversity and richness of the training data. Then, we use knowledge distillation to transfer knowledge from a large, pre-trained model to a smaller, lightweight model. Finally, we create a lightweight model by selecting combinations of width, depth, and resolution to reach a target complexity and computation time. Our experiments showed that each method improves either the accuracy or the efficiency of our model with no significant drawbacks. Using all these approaches is especially useful for resource-constrained environments, such as self-driving cars and railway systems.
Journal Article
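The knowledge-distillation step described in the abstract above can be sketched as a loss that blends hard-label cross-entropy with a temperature-softened KL term. This is a generic distillation sketch, not the authors' implementation; the temperature and weighting values are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled, numerically stable softmax over the class axis.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard term: cross-entropy against the ground-truth labels.
    p_student = softmax(student_logits)
    hard = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + 1e-12))
    # Soft term: KL divergence from the teacher's softened distribution,
    # scaled by T^2 to keep its magnitude comparable to the hard term.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                          axis=1)) * T * T
    return alpha * hard + (1.0 - alpha) * soft
```

When the student matches the teacher exactly, the soft term vanishes, leaving only the ground-truth supervision.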
SECOND: Sparsely Embedded Convolutional Detection
by Mao, Yuxing; Li, Bo; Yan, Yan
in 3D object detection; autonomous driving; convolutional neural networks
2018
LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
Journal Article
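The angle loss regression mentioned in the SECOND abstract is commonly described as a sine-encoded residual under a smooth-L1 penalty: regressing sin(θ_p − θ_t) makes the loss periodic and insensitive to π flips, with a separate direction classifier (omitted here) recovering the heading. A minimal sketch:

```python
import math

def smooth_l1(x):
    # Standard smooth-L1 (Huber) penalty with threshold 1.
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def angle_loss(pred_yaw, gt_yaw):
    # Sine-encoded residual: zero for exact matches and (nearly) zero for
    # boxes flipped by pi, whose extents are identical.
    return smooth_l1(math.sin(pred_yaw - gt_yaw))
```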
3D Object Detection for Autonomous Driving: A Comprehensive Survey
2023
In recent years, autonomous driving has received increasing attention for its potential to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving pipelines, the perception system is an indispensable component, aiming to accurately estimate the status of surrounding environments and provide reliable observations for prediction and planning. 3D object detection, which aims to predict the locations, sizes, and categories of the 3D objects near an autonomous vehicle, is an important part of a perception system. This paper reviews the advances in 3D object detection for autonomous driving. First, we introduce the background of 3D object detection and discuss the challenges in this task. Second, we conduct a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches. We also provide an in-depth analysis of the potential and challenges in each category of methods. Additionally, we systematically investigate the applications of 3D object detection in driving systems. Finally, we conduct a performance analysis of the 3D object detection approaches, summarize the research trends over the years, and outline future directions for this area.
Journal Article
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
by Wang, Zhe; Guo, Chaoxu; Deng, Jiajun
in Artificial neural networks; Neural networks; Object recognition
2023
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts the 3D detection performance by deeply integrating the feature learning of both point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3× faster than PV-RCNN, while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly-competitive Waymo Open Dataset with 10 FPS inference speed on the detection range of 150m×150m.
Journal Article
Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds
2020
Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that uses raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder, followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses feature maps from multiple scales via a Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method extracts features from point data effectively and outperforms several baselines on the challenging KITTI-3D benchmark, obtaining good performance in both speed and accuracy in real-world scenarios.
Journal Article
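The voxel-based pipelines in the entries above all start by binning raw LiDAR points into a regular grid. A minimal sketch of that assignment step (real detectors additionally group points per voxel and cap how many are kept; this shows only the index computation):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Map each 3D point in an (N, 3) array to its integer voxel index.

    Floor division gives consistent indices for negative coordinates too,
    which matters because LiDAR frames are centered on the sensor.
    """
    return np.floor(points / voxel_size).astype(np.int64)
```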
GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation
2023
The inherent ambiguity in ground-truth annotations of 3D bounding boxes, caused by occlusions, missing signals, or manual annotation errors, can confuse deep 3D object detectors during training, thus deteriorating detection accuracy. However, existing methods overlook such issues to some extent and treat the labels as deterministic. In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of objects. Then, we propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables. The label uncertainty generated by GLENet is a plug-and-play module and can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and supervise the learning of the localization uncertainty. Besides, we propose an uncertainty-aware quality estimator architecture in probabilistic detectors to guide the training of the IoU-branch with predicted localization uncertainty. We incorporate the proposed methods into various popular base 3D detectors and demonstrate significant and consistent performance gains on both KITTI and Waymo benchmark datasets. In particular, the proposed GLENet-VR outperforms all published LiDAR-based approaches by a large margin and achieves the top rank among single-modal methods on the challenging KITTI test set. The source code and pre-trained models are publicly available at https://github.com/Eaphan/GLENet.
Journal Article
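GLENet's quality estimator supervises an IoU branch; for a concrete picture of the quantity involved, here is the overlap computation for axis-aligned 3D boxes. This is a simplification for illustration only: actual 3D detectors evaluate rotated-box IoU, which is considerably more involved.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    # Boxes are given as (xmin, ymin, zmin, xmax, ymax, zmax).
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))  # zero if disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)
```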
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey
2023
The past decade has witnessed the rapid development of autonomous driving systems. However, it remains a daunting task to achieve full autonomy, especially when it comes to understanding the ever-changing, complex driving scenes. To alleviate the difficulty of perception, self-driving vehicles are usually equipped with a suite of sensors (e.g., cameras, LiDARs), hoping to capture the scenes with overlapping perspectives to minimize blind spots. Fusing these data streams and exploiting their complementary properties is thus rapidly becoming the current trend. Nonetheless, combining data captured by different sensors with drastically different ranging/imaging mechanisms is not a trivial task; many factors need to be considered and optimized. If not careful, data from one sensor may act as noise to data from another sensor, and fusing them may even yield poorer results. Thus far, there have been no in-depth guidelines for designing multi-modal fusion based 3D perception algorithms. To fill the void and motivate further investigation, this survey conducts a thorough study of tens of recent deep learning based multi-modal 3D detection networks (with a special emphasis on LiDAR-camera fusion), focusing on their fusion stage (i.e., when to fuse), fusion inputs (i.e., what to fuse), and fusion granularity (i.e., how to fuse). These important design choices play a critical role in determining the performance of the fusion algorithm. In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms. Next, we discuss existing datasets that can be used for evaluating multi-modal 3D object detection algorithms. Then we present a review of multi-modal fusion based 3D detection networks, taking a close look at their fusion stage, fusion input, and fusion granularity, and how these design choices evolve with time and technology.
After the review, we discuss open challenges as well as possible solutions. We hope that this survey can help researchers become familiar with the field and embark on investigations in the area of multi-modal 3D object detection.
Journal Article
DART3D: Depth‐Aware Robust Adversarial Training for Monocular 3D Object Detection
2025
Monocular 3D object detection plays a pivotal role in the field of autonomous driving and numerous deep learning‐based methods have made significant breakthroughs in this area. Despite the advancements in detection accuracy and efficiency, these models tend to fail when faced with adversarial attacks, rendering them ineffective. Therefore, bolstering the adversarial robustness of 3D detection models has become a critical issue. To mitigate this issue, we propose a depth‐aware robust adversarial training method for monocular 3D object detection, dubbed DART3D. Specifically, we first design an adversarial attack that iteratively degrades the 2D and 3D perception capabilities of 3D object detection models (iterative deterioration of perception), serving as the foundation for our subsequent defense mechanism. In response to this attack, we propose an uncertainty‐based residual learning method for adversarial training. Our adversarial training leverages inherent uncertainty to boost robustness against attacks while incorporating depth‐aware information enhances resistance to perturbations in both 2D and 3D domains. We conducted extensive experiments on the KITTI 3D dataset, showing that DART3D outperforms direct adversarial training in 3D object detection for the car category, with improvements of 4.415%, 4.112% and 3.195% in easy, moderate and hard settings, respectively.
Journal Article
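The iterative attacks that DART3D defends against generally follow the projected-gradient-descent pattern: repeated signed-gradient steps, each projected back into an L∞ ball around the clean input. A generic sketch over a caller-supplied gradient function (not the paper's specific attack; step sizes are illustrative):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.1, alpha=0.02, steps=5):
    """L-infinity PGD: signed-gradient ascent with per-step projection."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)                         # loss gradient w.r.t. input
        x_adv = x_adv + alpha * np.sign(g)         # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
    return x_adv
```

The projection guarantees the perturbation never exceeds eps in any coordinate, regardless of the number of steps taken.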
A Survey on Deep Learning Based Segmentation, Detection and Classification for 3D Point Clouds
by Avots, Egils; Anbarjafari, Gholamreza; Karabulut, Dogus
in 3D object classification; 3D object detection; 3D object recognition
2023
2023
The computer vision, graphics, and machine learning research communities have devoted significant attention to 3D object recognition (segmentation, detection, and classification). Deep learning approaches have recently emerged as the preferred method for 3D segmentation problems, owing to their outstanding performance in 2D computer vision. Consequently, many innovative approaches have been proposed and validated on multiple benchmark datasets. This study offers an in-depth assessment of the latest developments in deep learning-based 3D object recognition. We discuss the most well-known 3D object recognition models, along with evaluations of their distinctive qualities.
Journal Article
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks
2024
Recent years have witnessed significant advancements in deep learning-based 3D object detection, leading to its widespread adoption in numerous applications. As 3D object detectors become increasingly crucial for security-critical tasks, it is imperative to understand their robustness against adversarial attacks. This paper presents the first comprehensive evaluation and analysis of the robustness of LiDAR-based 3D detectors under adversarial attacks. Specifically, we extend three distinct adversarial attacks to the 3D object detection task, benchmarking the robustness of state-of-the-art LiDAR-based 3D object detectors against attacks on the KITTI and Waymo datasets. We further analyze the relationship between robustness and detector properties. Additionally, we explore the transferability of cross-model, cross-task, and cross-data attacks. Thorough experiments on defensive strategies for 3D detectors are conducted, demonstrating that simple transformations like flipping provide little help in improving robustness when the applied transformation strategy is exposed to attackers. Finally, we propose balanced adversarial focal training, based on conventional adversarial training, to strike a balance between accuracy and robustness. Our findings will facilitate investigations into understanding and defending against adversarial attacks on LiDAR-based 3D object detectors, thus advancing the field. The source code is publicly available at https://github.com/Eaphan/Robust3DOD.
Journal Article