Catalogue Search | MBRL
14 result(s) for "visual sensor information processing module"
Technology and application of intelligent driving based on visual perception
2017
The camera is one of the key sensors for perceiving the environment in intelligent driving. It supports lane detection and tracking, obstacle detection, traffic sign detection, identification and discrimination, and visual simultaneous localisation and mapping. Different intelligent driving hardware test platforms use different visual sensor models, numbers of sensors, and installation locations, as well as different visual sensor information processing modules, so many intelligent driving software modules and interfaces also differ. In this study, a software architecture for autonomous vehicles based on the driving brain is used to adapt to different types of visual sensors. The target segment is extracted by an image segmentation algorithm, and the region of interest is then segmented. Based on the computed input features, an obstacle search is carried out in the second segmentation region, and the accessible road area is output. Because the driving information remains complete, adding or removing one or more visual sensors, or changing a visual sensor's model or installation location, no longer directly affects intelligent driving decisions. This adapts the multi-vision-sensor setup to the requirements of different intelligent driving hardware test platforms.
Journal Article
Deep Learning for Visual SLAM: The State-of-the-Art and Future Trends
Visual Simultaneous Localization and Mapping (VSLAM) has been an active research topic since the 1990s, first based on traditional computer vision and recognition techniques and later on deep learning models. Although VSLAM implementations are far from perfect and complete, recent research in deep learning has yielded promising results for applications such as autonomous driving and navigation, service robots, virtual and augmented reality, and pose estimation. The pipeline of traditional VSLAM methods based on classical image processing algorithms consists of six main steps: initialization (data acquisition), feature extraction, feature matching, pose estimation, map construction, and loop closure. Since 2017, deep learning has shifted this approach from implementing individual steps to implementing the system as a whole. Three approaches, with varying degrees of integration of deep learning into traditional VSLAM systems, are currently being developed: (1) adding auxiliary modules based on deep learning, (2) replacing the original modules of traditional VSLAM with deep learning modules, and (3) replacing the traditional VSLAM system with end-to-end deep neural networks. The first approach is the most elaborate and includes multiple algorithms. The other two are in the early stages of development due to complex requirements and criteria. The available datasets with multi-modal data are also reviewed. The challenges, advantages, and disadvantages discussed underlie future VSLAM trends and guide subsequent directions of research.
Journal Article
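The feature-matching step of the classical VSLAM pipeline described above can be illustrated with a minimal sketch. It assumes descriptors are plain float vectors, uses brute-force nearest-neighbour search, and applies Lowe's ratio test with a hypothetical threshold of 0.75; none of these choices is taken from the article itself.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2_distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_match(
    query: List[Descriptor], train: List[Descriptor], ratio: float = 0.75
) -> List[Tuple[int, int]]:
    """Brute-force matching with Lowe's ratio test.

    Returns (query_index, train_index) pairs whose nearest neighbour is
    clearly closer than the second-nearest one.
    """
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs at least two candidates
    for qi, q in enumerate(query):
        # Distances to every train descriptor, sorted nearest first.
        dists = sorted((l2_distance(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        # Reject ambiguous matches (e.g. repetitive texture).
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


# Usage: two distinct matches, then an ambiguous query that is rejected.
print(ratio_test_match([[0.0, 0.0], [5.0, 5.0]], [[0.1, 0.0], [5.0, 5.1], [10.0, 10.0]]))  # [(0, 0), (1, 1)]
print(ratio_test_match([[5.0, 0.0]], [[4.0, 0.0], [6.0, 0.0]]))  # []
```

In a real VSLAM front end, the surviving matches would then feed pose estimation, typically with an outlier-rejection step such as RANSAC.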
Background-Aware Cross-Attention Multiscale Fusion for Multispectral Object Detection
2024
Limited by the imaging capabilities of sensors, methods based on a single modality struggle to cope with faults and dynamic perturbations in detection. Multispectral object detection, which achieves better detection accuracy by fusing visual information from different modalities, has attracted widespread attention. However, most existing methods adopt simple fusion mechanisms that fail to exploit the complementary information between modalities and lack the guidance of a priori knowledge. To address these issues, we propose a novel background-aware cross-attention multiscale fusion network (BA-CAMF Net) to achieve adaptive fusion of visible and infrared images. First, a background-aware module is designed to calculate light and contrast to guide the fusion. Then, a cross-attention multiscale fusion module is proposed to enhance inter-modality complementary features and intra-modality intrinsic features. Finally, multiscale feature maps from different modalities are fused according to background-aware weights. Experimental results on LLVIP, FLIR, and VEDAI indicate that the proposed BA-CAMF Net achieves higher detection accuracy than current state-of-the-art multispectral detectors.
Journal Article
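As a rough illustration of background-aware weighting, the sketch below computes a light estimate (mean intensity) and an RMS contrast estimate from a flattened visible-band grayscale image in [0, 1] and uses them to weight visible and infrared features. The weighting formula, the `contrast_scale` parameter, and the function names are hypothetical assumptions; the abstract does not state how BA-CAMF Net turns light and contrast into weights.

```python
import math
from typing import List, Tuple


def background_weights(gray: List[float], contrast_scale: float = 0.25) -> Tuple[float, float]:
    """Hypothetical visible/infrared weights from light and contrast.

    gray: flattened visible-band grayscale image with values in [0, 1].
    Returns (w_visible, w_infrared), which sum to 1.
    """
    if not gray:
        raise ValueError("empty image")
    n = len(gray)
    light = sum(gray) / n  # mean intensity as a light estimate
    rms = math.sqrt(sum((p - light) ** 2 for p in gray) / n)  # RMS contrast
    contrast = min(1.0, rms / contrast_scale)  # normalise to [0, 1]
    # Assumption: trust the visible band only when the scene is both lit and textured.
    w_vis = light * contrast
    return w_vis, 1.0 - w_vis


def fuse(vis_feat: List[float], ir_feat: List[float], w_vis: float, w_ir: float) -> List[float]:
    """Element-wise weighted sum of two equal-length feature vectors."""
    return [w_vis * v + w_ir * i for v, i in zip(vis_feat, ir_feat)]


# Usage: a dark, flat scene falls back entirely on infrared features.
w_v, w_i = background_weights([0.0, 0.0, 0.0, 0.0])
print(fuse([1.0, 2.0], [3.0, 4.0], w_v, w_i))  # [3.0, 4.0]
```

The multiplicative rule is only one plausible design: it drives the visible weight toward zero at night or in fog, which matches the intuition behind combining visible and infrared imagery.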
Walking direction recognition based on deep learning with inertial sensors and pressure insoles
2025
A large population of visually impaired individuals faces complex daily challenges, particularly in perceiving walking direction. This paper therefore proposes a novel deep learning method based on wearable sensors to address the problem of walking direction recognition. An information mining and fusion module, a multi-feature position information mining attention module, and a multi-feature content information mining attention module are proposed to extract comprehensive information from walking data. To overcome the limited information gathered by a single type of sensor, this paper combines inertial sensors and pressure insoles for walking direction recognition. Experimental results demonstrate that the proposed method achieves higher recognition accuracy than existing methods, highlighting its superiority and effectiveness.
Journal Article
Flare Removal Model Based on Sparse-UFormer Networks
2024
When a camera lens faces a strong light source directly, image flare commonly occurs, significantly reducing the clarity and texture of the photo and interfering with image processing tasks that rely on visual sensors, such as image segmentation and feature extraction. A novel flare removal network, the Sparse-UFormer neural network, has been developed. The network integrates two core components into the UFormer architecture, the mixed-scale feed-forward network (MSFN) and top-k sparse attention (TKSA), to create the sparse-transformer module. The MSFN module captures rich multi-scale information, enabling flare interference in images to be addressed more effectively. The TKSA module, designed with a sparsity strategy, focuses on key features within the image, thereby significantly improving the precision and efficiency of flare removal. Furthermore, in addition to the conventional flare, background, and reconstruction losses, a structural similarity index loss is incorporated into the loss function to preserve image details and structure while removing the flare, since minimal loss of image information is a fundamental premise of effective image restoration. The proposed method achieves state-of-the-art performance on the Flare7K++ test dataset and in challenging real-world scenarios, demonstrating its effectiveness in removing flare artefacts from images.
Journal Article
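The general idea behind top-k sparse attention can be sketched as scaled dot-product attention in which, for each query, only the k highest scores are kept and the rest are masked with negative infinity before the softmax. This is a generic token-wise sketch in plain Python; how TKSA arranges queries, keys, and values inside Sparse-UFormer is not specified in the abstract, and all names here are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sparse_attention(q: Matrix, k: Matrix, v: Matrix, top_k: int) -> Matrix:
    """Scaled dot-product attention keeping only the top_k scores per query."""
    if not 1 <= top_k <= len(k):
        raise ValueError("top_k must be between 1 and the number of keys")
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Indices of the top_k largest scores for this query.
        keep = set(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k])
        # Mask all other scores so softmax assigns them zero weight.
        masked = [s if j in keep else -math.inf for j, s in enumerate(scores)]
        weights = softmax(masked)
        # Weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Usage: with top_k=1 the query attends only to its best-matching key.
print(top_k_sparse_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]], top_k=1))  # [[10.0, 0.0]]
```

Masking before the softmax (rather than zeroing weights afterwards) keeps the retained weights normalised to sum to one.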
3D car-detection based on a Mobile Deep Sensor Fusion Model and real-scene applications
2020
Unmanned vehicles need comprehensive perception of the surrounding environment while driving, so the perception of automotive information is significant. In automotive perception, stereo vision plays a vital role in car detection: it can calculate the length, width, and height of a car, giving a more specific description of it. However, with existing technology, accurate detection in complex environments cannot be achieved with a single sensor, so research on multi-sensor fusion is particularly important. Building on recent advances in deep learning for vision, this paper proposes and applies a mobile deep-learning-based sensor fusion method, the Mobile Deep Sensor Fusion Model (MDSFM). The paper proceeds as follows. First, a data processing step projects 3D data onto 2D data to form a dataset suited to the model, which makes training more efficient. In the LiDAR module, a revised SqueezeNet structure is used to lighten the model and reduce its parameters. In the camera module, an improved R-CNN detection module with a Mobile Spatial Attention Module (MSAM) is used. In the fusion stage, a dual-view deep fusion structure is used. Images selected from the KITTI dataset are then used to validate the model, which performs fairly well compared with other recognized methods. Finally, a ROS program is implemented on the experimental car, where the model performs well. The results show that MDSFM significantly improves detection performance on "easy" cars, increases the quality of the detected data, improves the generalization ability of the car-detection model, improves contextual relevance while preserving background information, and remains stable in driverless environments. Its application in realistic scenarios shows that the model has good practical value.
Journal Article
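The projection of 3D data onto 2D mentioned in the MDSFM abstract is commonly done with a pinhole camera model. The minimal sketch below assumes the LiDAR points have already been transformed into the camera frame (x right, y down, z forward) and uses hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; the paper's actual projection (for example, front view versus bird's-eye view, and its LiDAR-to-camera calibration) is not described in the abstract.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_points(
    points: List[Point3D], fx: float, fy: float, cx: float, cy: float
) -> List[Optional[Pixel]]:
    """Pinhole projection of camera-frame 3D points to pixel coordinates.

    Points at or behind the image plane (z <= 0) cannot be projected and
    are returned as None so that indices stay aligned with the input.
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points:
        if z <= 0:
            pixels.append(None)  # behind the camera
            continue
        u = fx * x / z + cx  # horizontal pixel coordinate
        v = fy * y / z + cy  # vertical pixel coordinate
        pixels.append((u, v))
    return pixels


# Usage with hypothetical intrinsics: one visible point, one behind the camera.
print(project_points([(1.0, 2.0, 2.0), (0.0, 0.0, -1.0)], 100.0, 100.0, 50.0, 40.0))  # [(100.0, 140.0), None]
```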
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
2024
Depth images captured by low-cost three-dimensional (3D) cameras have low spatial density, so depth completion is required to improve 3D imaging quality. Image-guided depth completion aims to predict dense depth images from extremely sparse depth measurements captured by depth sensors, guided by aligned Red–Green–Blue (RGB) images. Recent approaches have achieved remarkable improvements, but their performance degrades severely when the input sparse depth is corrupted. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network that incorporates measurement uncertainty, with the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme that iteratively applies spatially varying filters to the sparse depth, conditioned on its certainty measure, to exclude depth corruption from the diffusion. Second, we integrate the NSVD module into the network design to enable end-to-end training of filter kernels and depth reliability, which further improves the preservation of structural detail via the guidance of RGB semantic features. Third, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches, with enhanced performance and noise robustness for depth completion in real-use scenarios.
Journal Article
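Certainty-conditioned filtering of sparse depth can be illustrated with classic normalized convolution, where each output is a kernel-weighted average of the inputs scaled by their certainty, so zero-confidence (missing or corrupted) samples contribute nothing. This 1D sketch with a fixed kernel is a swapped-in stand-in, not NSVD itself: NSVD learns spatially variant kernels and depth reliability end to end, which this code does not attempt.

```python
from typing import List, Tuple


def normalized_convolution(
    depth: List[float], conf: List[float], kernel: List[float], iterations: int = 1, eps: float = 1e-8
) -> Tuple[List[float], List[float]]:
    """Iterated 1D normalized convolution with a fixed, odd-length kernel.

    Returns (filled_depth, propagated_confidence).
    """
    r = len(kernel) // 2
    n = len(depth)
    d, c = list(depth), list(conf)
    for _ in range(iterations):
        new_d, new_c = [], []
        for i in range(n):
            num = den = wsum = 0.0
            for t, w in enumerate(kernel):
                j = i + t - r
                if 0 <= j < n:
                    num += w * c[j] * d[j]  # certainty-weighted depth
                    den += w * c[j]  # total certainty under the kernel
                    wsum += w  # kernel mass inside the signal
            new_d.append(num / den if den > eps else 0.0)
            # Propagated confidence: fraction of kernel mass that was certain.
            new_c.append(den / wsum if wsum > 0 else 0.0)
        d, c = new_d, new_c
    return d, c


# Usage: the corrupted middle sample (confidence 0) is ignored and filled in.
print(normalized_convolution([2.0, 99.0, 4.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0])[0])  # [2.0, 3.0, 4.0]
```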
UPGAN: An Unsupervised Generative Adversarial Network Based on U-Shaped Structure for Pansharpening
by Feng, Yuting; Chu, Xing; Wang, Qianqian
in Computational linguistics, Computer vision, Decomposition
2024
Pansharpening is the fusion of panchromatic and multispectral images to obtain images with both high spatial and high spectral resolution, which have a wide range of applications. At present, deep learning methods can fit the nonlinear features of images and achieve excellent image quality; however, images generated with supervised learning approaches lack real-world applicability. Therefore, in this study, we propose an unsupervised pansharpening method based on a generative adversarial network. Considering the fine tubular structures in remote sensing images, a dense connection attention module based on dynamic snake convolution is designed to recover spatial detail. In the image fusion stage, features are fused in groups through a cross-scale attention fusion module. Moreover, skip connections are implemented at different scales to integrate significant information, improving both the objective index values and the visual appearance. The loss function contains four constraints, allowing the model to be trained effectively without reference images. The experimental results demonstrate that the proposed method outperforms other widely accepted state-of-the-art methods on the QuickBird and WorldView2 datasets.
Journal Article
Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection
2023
Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supporting information in correlation frames boosts model performance on VOD tasks. In this paper, we not only improve the method of finding supporting information in correlation frames but also improve the quality of the features extracted from them, further strengthening the fusion of correlation frames so that the model achieves better performance. The feature refinement module (FRM) in our model refines features through a key–value encoding dictionary based on the even-order Taylor series, and the refined features guide feature fusion at different stages. In the correlation frame fusion stage, a generative MLP is applied in the feature aggregation module (DFAM) to fuse the refined features extracted from the correlation frames. Experiments demonstrate the effectiveness of the proposed approach: our YOLOX-based model achieves 83.3% AP50 on the ImageNet VID dataset.
Journal Article
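The AP50 figure quoted in the video object detection abstract counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that IoU test for axis-aligned boxes given as (x1, y1, x2, y2); a full AP50 computation also ranks detections by confidence, matches each ground-truth box at most once, and integrates the precision-recall curve, all of which are omitted here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive_at_50(pred: Box, gt: Box) -> bool:
    """AP50 criterion: IoU with the matched ground truth must be >= 0.5."""
    return iou(pred, gt) >= 0.5


# Usage: a box covering half of the ground truth sits exactly on the threshold.
print(is_true_positive_at_50((0.0, 0.0, 2.0, 1.0), (0.0, 0.0, 2.0, 2.0)))  # True
```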
Electronic Guidance Cane for Users Having Partial Vision Loss Disability
by Sarfraz, Muhammad Shahzad; Javeed, Muhammad Awais; Khan, Asad
in Blindness, Cameras, Cellular communication
2021
Vision is undoubtedly one of the most important and precious gifts to humans; however, a fraction of people are visually impaired and cannot see properly. These visually impaired people face many challenges in their lives, such as performing routine activities like shopping and walking. They also need to travel to known and unknown places for various necessities and hence require an attendant. Affording an attendant is often neither easy nor inexpensive, especially since almost 2.5% of the population of Pakistan is visually impaired. Some aids exist, for example devices with a navigation system and speech output; however, these are less accurate, costly, or heavy, and none has shown perfect results in both indoor and outdoor activities. The problems become even more severe when the person is also partially deaf. In this paper, we present a proof of concept of an embedded prototype that not only navigates but also detects hurdles along the way and gives alerts, using speech output and/or vibration for the partially deaf. The embedded system includes a cane, a microcontroller, a Global System for Mobile Communication (GSM) module, a Global Positioning System (GPS) module, an Arduino, a speech output module with a speaker, a Light-Dependent Resistor (LDR), and ultrasonic sensors for hurdle detection with voice and vibrational feedback. Using the developed system, visually impaired people can reach their destination safely and independently.
Journal Article
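Ultrasonic hurdle detection typically converts the echo round-trip time into distance as time × speed of sound ÷ 2, since the pulse travels to the obstacle and back. The Python sketch below shows that arithmetic and a simple alert decision; the 343 m/s speed of sound (air at about 20 °C), the 1 m threshold, and the function names are assumptions, and the actual prototype runs on an Arduino rather than Python.

```python
from typing import Set

SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def echo_to_distance_m(echo_us: float) -> float:
    """Convert an echo round-trip time in microseconds to a one-way distance in metres."""
    return echo_us * 1e-6 * SPEED_OF_SOUND_M_PER_S / 2.0


def alert_modes(distance_m: float, threshold_m: float = 1.0, partially_deaf: bool = False) -> Set[str]:
    """Hypothetical alert policy: speech for any close hurdle, plus vibration for partially deaf users."""
    if distance_m >= threshold_m:
        return set()  # nothing close enough to warn about
    return {"speech", "vibration"} if partially_deaf else {"speech"}


# Usage: a 2000 microsecond echo corresponds to about 0.343 m, which triggers an alert.
d = echo_to_distance_m(2000.0)
print(round(d, 3), sorted(alert_modes(d, partially_deaf=True)))  # 0.343 ['speech', 'vibration']
```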