Catalogue Search | MBRL

Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving

by Chang, Yu-Bang , Tsai, Chieh , Lin, Chang-Hong in Accuracy , autonomous driving , Computer vision

2021

As the techniques of autonomous driving become increasingly valued and universal, real-time semantic segmentation has become very popular and challenging in the field of deep learning and computer vision in recent years. However, in order to apply the deep learning model to edge devices accompanying sensors on vehicles, we need to design a structure that has the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain a faster inference time, while others aimed to find the best accuracy under the condition of real time. Nevertheless, the accuracies of previous real-time semantic segmentation methods still have a large gap compared to general semantic segmentation methods. As a result, we propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieved a 78.6% mIoU with a speed of 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
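For readers unfamiliar with the mIoU figure quoted in this abstract, the following is a minimal sketch of how mean intersection over union is commonly computed from per-pixel class labels. The function name `mean_iou` and the flat-list input format are assumptions made for illustration; this is not the authors' evaluation code, and benchmark tooling typically accumulates counts over a whole test set and handles ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over the classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels
    (hypothetical flat format, one label per pixel).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the small example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.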

Journal Article

Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation

by Mei, Zhen , Lin, Chen , Ye, Peng in Computer vision , Discretization , Optimization

2022

Semantic segmentation is a popular research topic in computer vision, and many efforts have been made on it with impressive results. In this paper, we intend to search an optimal network structure that can run in real-time for this problem. Towards this goal, we jointly search the depth, channel, dilation rate and feature spatial resolution, which results in a search space consisting of about 2.78×10^324 possible choices. To handle such a large search space, we leverage differential architecture search methods. However, the architecture parameters searched using existing differential methods need to be discretized, which causes the discretization gap between the architecture parameters found by the differential methods and their discretized version as the final solution for the architecture search. Hence, we relieve the problem of discretization gap from the innovative perspective of solution space regularization. Specifically, a novel Solution Space Regularization (SSR) loss is first proposed to effectively encourage the supernet to converge to its discrete one. Then, a new Hierarchical and Progressive Solution Space Shrinking method is presented to further achieve high efficiency of searching. In addition, we theoretically show that the optimization of SSR loss is equivalent to the L0-norm regularization, which accounts for the improved search-evaluation gap. Comprehensive experiments show that the proposed search scheme can efficiently find an optimal network structure that yields an extremely fast speed (175 FPS) of segmentation with a small model size (1 M) while maintaining comparable accuracy.

Journal Article

Enhancing Human-Robot Collaboration: A Sim2Real Domain Adaptation Algorithm for Point Cloud Segmentation in Industrial Environments

by Mohammadi Amin, Fatemeh , Caldwell, Darwin G. , van de Venn, Hans Wernher in Adaptation , Algorithms , Artificial Intelligence

2025

The robust interpretation of 3D environments is crucial for human-robot collaboration (HRC) applications, where safety and operational efficiency are paramount. Semantic segmentation plays a key role in this context by enabling a precise and detailed understanding of the environment. Considering the intense data hunger for real-world industrial annotated data essential for effective semantic segmentation, this paper introduces a pioneering approach in the Sim2Real domain adaptation for semantic segmentation of 3D point cloud data, specifically tailored for HRC. Our focus is on developing a network that robustly transitions from simulated environments to real-world applications, thereby enhancing its practical utility and impact on a safe HRC. In this work, we propose a dual-stream network architecture (FUSION) combining Dynamic Graph Convolutional Neural Networks (DGCNN) and Convolutional Neural Networks (CNN) augmented with residual layers as a Sim2Real domain adaptation algorithm for an industrial environment. The proposed model was evaluated on real-world HRC setups and simulation industrial point clouds, where it improved on state-of-the-art performance, achieving a segmentation accuracy of 97.76% and superior robustness compared to existing methods. The simulation dataset and source code will be made publicly available at: https://github.com/Fatemeh-MA/Fusion .

Journal Article

NoctuDroneNet: Real-Time Semantic Segmentation of Nighttime UAV Imagery in Complex Environments

by Qu, Ruokun , Tan, Jintao , Liu, Yelu in Accuracy , Adaptation , Architecture

2025

Nighttime semantic segmentation represents a challenging frontier in computer vision, made particularly difficult by severe low-light conditions, pronounced noise, and complex illumination patterns. These challenges intensify when dealing with Unmanned Aerial Vehicle (UAV) imagery, where varying camera angles and altitudes compound the difficulty. In this paper, we introduce NoctuDroneNet (Nocturnal UAV Drone Network, hereinafter referred to as NoctuDroneNet), a real-time segmentation model tailored specifically for nighttime UAV scenarios. Our approach integrates convolution-based global reasoning with training-only semantic alignment modules to effectively handle diverse and extreme nighttime conditions. We construct a new dataset, NUI-Night, focusing on low-illumination UAV scenes to rigorously evaluate performance under conditions rarely represented in standard benchmarks. Beyond NUI-Night, we assess NoctuDroneNet on the Varied Drone Dataset (VDD), a normal-illumination UAV dataset, demonstrating the model’s robustness and adaptability to varying flight domains despite the lack of large-scale low-light UAV benchmarks. Furthermore, evaluations on the Night-City dataset confirm its scalability and applicability to complex nighttime urban environments. NoctuDroneNet achieves state-of-the-art performance on NUI-Night, surpassing strong real-time baselines in both segmentation accuracy and speed. Qualitative analyses highlight its resilience to under-/over-exposure and small-object detection, underscoring its potential for real-world applications like UAV emergency landings under minimal illumination.

Journal Article

Aerial Hybrid Adjustment of LiDAR Point Clouds, Frame Images, and Linear Pushbroom Images

by Jonassen, Vetle O. , Gjevestad, Jon Glenn Omholt , Kjørsvik, Narve S. in Accuracy , Aerial surveys , Bundle adjustment

2024

In airborne surveying, light detection and ranging (LiDAR) strip adjustment and image bundle adjustment are customarily performed as separate processes. The bundle adjustment is usually conducted from frame images, while using linear pushbroom (LP) images in the bundle adjustment has been historically challenging due to the limited number of observations available to estimate the exterior image orientations. However, data from these three sensors conceptually provide information to estimate the same trajectory corrections, which is favorable for solving the problems of image depth estimation or the planimetric correction of LiDAR point clouds. Thus, our purpose with the presented study is to jointly estimate corrections to the trajectory and interior sensor states in a scalable hybrid adjustment between 3D LiDAR point clouds, 2D frame images, and 1D LP images. Trajectory preprocessing is performed before the low-frequency corrections are estimated for certain time steps in the following adjustment using cubic spline interpolation. Furthermore, the voxelization of the LiDAR data is used to robustly and efficiently form LiDAR observations and hybrid observations between the image tie-points and the LiDAR point cloud to be used in the adjustment. The method is successfully demonstrated with an experiment, showing the joint adjustment of data from the three different sensors using the same trajectory correction model with spline interpolation of the trajectory corrections. The results show that the choice of the trajectory segmentation time step is not critical. Furthermore, photogrammetric sub-pixel planimetric accuracy is achieved, and height accuracy on the order of mm is achieved for the LiDAR point cloud. This is the first time these three types of sensors with fundamentally different acquisition techniques have been integrated. The suggested methodology presents a joint adjustment of all sensor observations and lays the foundation for including additional sensors for kinematic mapping in the future.
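The abstract mentions voxelization of the LiDAR data without giving details. As a rough illustration of the general idea only, the sketch below groups 3D points into a regular grid and reduces each occupied cell to its centroid. The function name `voxel_centroids`, the centroid reduction, and the cubic cell shape are all assumptions; the paper's actual scheme for forming observations may differ.

```python
import math
from collections import defaultdict


def voxel_centroids(points, voxel_size):
    """Map each occupied voxel index (i, j, k) to the centroid of its points.

    points is an iterable of (x, y, z) tuples; voxel_size is the edge length
    of a cubic cell (hypothetical parameter, same units as the coordinates).
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    # Average each coordinate over the points that fell in the same cell.
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in buckets.items()
    }


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
    for index, centroid in sorted(voxel_centroids(cloud, 1.0).items()):
        print(index, centroid)
```

Reducing each cell to one representative point bounds the number of observations by the number of occupied cells rather than the number of raw returns, which is one plausible reason voxelization helps efficiency.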

Journal Article

A Multi-Supervised Network for Real-Time and Accurate Semantic Segmentation in Underwater Scenes

by Xu, Mingze , Huang, Zhigang , Ding, Jun in Ablation , Accuracy , Alignment

2026

Real-time semantic segmentation is a core perception capability for underwater robots and autonomous underwater vehicles (AUVs), yet it remains challenging because underwater imagery often exhibits low contrast, blurred boundaries, and strong appearance degradation under strict onboard computation budgets. This paper proposes MSNet, a multi-supervised two-pathway network that decouples feature learning into a semantic branch for context modeling and a detail branch for preserving high-resolution spatial information. MSNet introduces three complementary supervisory signals: (i) low-frequency semantic supervision derived from smoothed labels to encourage body semantics, (ii) high-frequency detail supervision derived from edge-enhanced labels to improve boundary localization, and (iii) category representation supervision implemented by a Category Representation Enhancement Module (CREM) to strengthen class discrimination at the deepest stage. To prevent auxiliary supervision from amplifying cross-resolution misalignment during fusion, we embed a Bilateral Flow-based Alignment Module (BFAM) into multi-stage feature fusion. Experiments on the SUIM benchmark show that MSNet achieves 79.83% mIoU and 86.57% F-score at 55 FPS with 6.2 M parameters on an RTX 3060 GPU, outperforming mainstream encoder–decoder and two-pathway algorithms. Compared with SFNet and BiSeNet V3, MSNet improves mIoU by 1.52% and 1.89%, and runs 9 FPS faster than SFNet. Ablation studies verify the effectiveness and complementarity of the proposed supervision and alignment strategies, indicating MSNet offers a practical accuracy–speed trade-off for marine engineering applications.

Journal Article

Real-Time Segmentation of Unstructured Environments by Combining Domain Generalization and Attention Mechanisms

by Zhao, Wenfeng , Zhong, Minyue , Lin, Nuanchen in Accuracy , Agriculture , Algorithms

2023

This paper presents a focused investigation into real-time segmentation in unstructured environments, a crucial aspect for enabling autonomous navigation in off-road robots. To address this challenge, an improved variant of the DDRNet23-slim model is proposed, which includes a lightweight network architecture and reclassifies ten different categories, including drivable roads, trees, high vegetation, obstacles, and buildings, based on the RUGD dataset. The model’s design includes the integration of the semantic-aware normalization and semantic-aware whitening (SAN–SAW) module into the main network to improve generalization ability beyond the visible domain. The model’s segmentation accuracy is improved through the fusion of channel attention and spatial attention mechanisms in the low-resolution branch to enhance its ability to capture fine details in complex scenes. Additionally, to tackle the issue of category imbalance in unstructured scene datasets, a rare class sampling strategy (RCS) is employed to mitigate the negative impact of low segmentation accuracy for rare classes on the overall performance of the model. Experimental results demonstrate that the improved model achieves a significant 14% increase in mIoU in the invisible domain, indicating its strong generalization ability. With a parameter count of only 5.79M, the model achieves mAcc of 85.21% and mIoU of 77.75%. The model has been successfully deployed on a Jetson Xavier NX ROS robot and tested in both real and simulated orchard environments. Speed optimization using TensorRT increased the segmentation speed to 30.17 FPS. The proposed model strikes a desirable balance between inference speed and accuracy and has good domain migration ability, making it applicable in various domains such as forestry rescue and intelligent agricultural orchard harvesting.

Journal Article

EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation

by Ke, Yan , Hu, Xuegang in Accuracy , Computer Graphics , Computer Science

2024

In recent years, the performance of real-time semantic segmentation has increasingly become a research focus for real-time applications such as autonomous driving. Although large deep models have excellent segmentation results, their inference speed is slow and the models are complex, which makes them difficult to deploy in practice. To address these problems, a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation (EMFANet) is proposed in this paper, which employs the encoder–decoder framework with efficient channel attention mechanism. In EMFANet, the effective symmetric attention residual unit (SARU) is presented to rapidly obtain large amounts of multi-scale contextual information. The lightweight multi-scale information aggregation unit (MIAU) is presented for efficient fusion of multi-scale features. Experimental results on the Cityscapes test set show that EMFANet can obtain 72.1% mean intersection over union (mIoU) and 143 FPS with only 1.03 M parameters. It also has competitive segmentation capability on the low-resolution Camvid test set with a fast inference speed of 357 FPS. EMFANet achieves an outstanding performance balance between segmentation accuracy, inference speed and model size.

Journal Article

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

by Ding, Hao , Wang, Xing , Zhang, Zekai in Artificial neural networks , Benchmarks , Computer applications

2024

Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance.

Journal Article

Mechanical Fault Diagnosis of High Voltage Circuit Breakers with Unknown Fault Type Using Hybrid Classifier Based on LMD and Time Segmentation Energy Entropy

by Huang, Nantian , Cai, Guowei , Xu, Dianguo in Classifiers , Energy use , Entropy

2016

In order to improve the identification accuracy of the high voltage circuit breakers’ (HVCBs) mechanical fault types without training samples, a novel mechanical fault diagnosis method of HVCBs using a hybrid classifier constructed with Support Vector Data Description (SVDD) and fuzzy c-means (FCM) clustering method based on Local Mean Decomposition (LMD) and time segmentation energy entropy (TSEE) is proposed. Firstly, LMD is used to decompose nonlinear and non-stationary vibration signals of HVCBs into a series of product functions (PFs). Secondly, TSEE is chosen as feature vectors with the superiority of energy entropy and characteristics of time-delay faults of HVCBs. Then, SVDD trained with normal samples is applied to judge mechanical faults of HVCBs. If the mechanical fault is confirmed, the new fault sample and all known fault samples are clustered by FCM with the cluster number of known fault types. Finally, another SVDD trained by the specific fault samples is used to judge whether the fault sample belongs to an unknown type or not. The results of experiments carried out on a real SF6 HVCB validate that the proposed fault-detection method is effective for the known faults with training samples and unknown faults without training samples.
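The abstract names time segmentation energy entropy (TSEE) as the feature but does not define it. A minimal sketch of one plausible reading follows: split a signal (for example a single product function) into equal-length time segments, normalize the segment energies into a distribution, and take its Shannon entropy. The function name `time_segment_energy_entropy`, the equal-length segmentation, and the natural logarithm are assumptions, not the authors' definition.

```python
import math


def time_segment_energy_entropy(signal, num_segments):
    """Shannon entropy of the energy distribution across time segments.

    signal is a sequence of samples; it is cut into num_segments
    equal-length pieces, and trailing samples that do not fill a whole
    segment are dropped (an assumption of this sketch).
    """
    seg_len = len(signal) // num_segments
    if seg_len == 0:
        raise ValueError("signal is shorter than the number of segments")
    # Energy of each segment: sum of squared samples.
    energies = [
        sum(x * x for x in signal[i * seg_len:(i + 1) * seg_len])
        for i in range(num_segments)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0  # an all-zero signal carries no energy to distribute
    # Normalize energies so they sum to 1, then apply -sum(p * ln p).
    probs = [e / total for e in energies]
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(time_segment_energy_entropy([1.0, 1.0, 1.0, 1.0], 2))
    print(time_segment_energy_entropy([1.0, 1.0, 0.0, 0.0], 2))
```

Under this reading, energy spread evenly over the segments gives the maximum entropy (ln 2 for two segments), while energy concentrated in a single segment gives zero, so a shift in when the vibration energy arrives changes the feature value.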

Journal Article
