Catalogue Search | MBRL
14 result(s) for "RGB-D detector"
Passion fruit detection and counting based on multiple scale faster R-CNN using RGB-D images
by Xue Yueju; Liu, Haofeng; Chan, Zheng
in Artificial neural networks; Convolution; Feature extraction
2020
Accurate and reliable fruit detection in orchards is one of the most crucial tasks for supporting higher-level agricultural tasks such as yield mapping and robotic harvesting. However, detecting and counting small fruit is very challenging under variable lighting conditions, low resolution and heavy occlusion by neighboring fruits or foliage. To robustly detect small fruits, an improved method is proposed based on a multiple scale faster region-based convolutional neural network (MS-FRCNN) approach using the color and depth images acquired with an RGB-D camera. The architecture of MS-FRCNN is improved to detect lower-level features by incorporating feature maps from shallower convolution layers for region of interest (ROI) pooling. The detection framework consists of three phases. First, multiple scale feature extractors are used to extract low- and high-level features from RGB and depth images respectively. Then, an RGB detector and a depth detector are trained separately using MS-FRCNN. Finally, late fusion methods are explored for combining the RGB and depth detectors. The detection framework was demonstrated and evaluated on two datasets that include passion fruit images under variable illumination conditions and occlusion. Compared with the faster R-CNN detector on RGB-D images, the recall, precision and F1-score of the MS-FRCNN method increased from 0.922 to 0.962, 0.850 to 0.931 and 0.885 to 0.946, respectively. Furthermore, the MS-FRCNN method effectively improves small passion fruit detection, achieving an F1 score of 0.909. It is concluded that the detector based on MS-FRCNN can be applied practically in the actual orchard environment.
Journal Article
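The F1 scores quoted in the passion fruit abstract above can be checked against its precision and recall figures. The following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as quoted in the abstract.
baseline = f1_score(precision=0.850, recall=0.922)  # faster R-CNN on RGB-D images
improved = f1_score(precision=0.931, recall=0.962)  # MS-FRCNN

print(round(baseline, 3), round(improved, 3))  # prints: 0.885 0.946
```

Both values agree with the reported F1 scores of 0.885 and 0.946.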
Citrus Tree Canopy Segmentation of Orchard Spraying Robot Based on RGB-D Image and the Improved DeepLabv3+
2023
The accurate and rapid acquisition of fruit tree canopy parameters is fundamental for achieving precision operations in orchard robotics, including accurate spraying and precise fertilization. In response to the issue of inaccurate citrus tree canopy segmentation in complex orchard backgrounds, this paper proposes an improved DeepLabv3+ model for fruit tree canopy segmentation, facilitating canopy parameter calculation. The model takes the RGB-D (Red, Green, Blue, Depth) image segmented canopy foreground as input, introducing Dilated Spatial Convolution in Atrous Spatial Pyramid Pooling to reduce computational load and integrating Convolutional Block Attention Module and Coordinate Attention for enhanced edge feature extraction. MobileNetV3-Small is utilized as the backbone network, making the model suitable for embedded platforms. A citrus tree canopy image dataset was collected from two orchards in distinct regions. Data from Orchard A was divided into training, validation, and test set A, while data from Orchard B was designated as test set B, collectively employed for model training and testing. The model achieves a detection speed of 32.69 FPS on Jetson Xavier NX, which is six times faster than the traditional DeepLabv3+. On test set A, the mIoU is 95.62%, and on test set B, the mIoU is 92.29%, showing a 1.12% improvement over the traditional DeepLabv3+. These results demonstrate the outstanding performance of the improved DeepLabv3+ model in segmenting fruit tree canopies under different conditions, thus enabling precise spraying by orchard spraying robots.
Journal Article
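The canopy segmentation abstract above reports its accuracy as mIoU (mean intersection over union). As a rough illustration of how that metric is usually computed, here is a minimal sketch over flattened label lists; the function name `mean_iou` and the toy labels are assumptions for illustration, not the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 0 = background, 1 = canopy (hypothetical labels).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, num_classes=2))  # prints: 0.5
```

Classes absent from both prediction and ground truth are skipped here; other conventions exist, so reported mIoU values are only comparable under the same convention.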
Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation
by Malik, Jitendra; Arbeláez, Pablo; Girshick, Ross
in Algorithms; Analysis; Artificial Intelligence
2015
In this paper, we address the problems of contour detection, bottom-up grouping, object detection and semantic segmentation on RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset (Silberman et al., ECCV, 2012). We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of Arbelaez et al. (TPAMI, 2011) by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We train RGB-D object detectors by analyzing and computing histogram of oriented gradients on the depth image and using them with deformable part models (Felzenszwalb et al., TPAMI, 2010). We observe that this simple strategy for training object detectors significantly outperforms more complicated models in the literature. We then turn to the problem of semantic segmentation for which we propose an approach that classifies superpixels into the dominant object categories in the NYUD2 dataset. We design generic and class-specific features to encode the appearance and geometry of objects. We also show that additional features computed from RGB-D object detectors and scene classifiers further improve semantic segmentation accuracy. In all of these tasks, we report significant improvements over the state-of-the-art.
Journal Article
A Comprehensive Survey of Visual SLAM Algorithms
2022
Simultaneous localization and mapping (SLAM) techniques are widely researched, since they allow the simultaneous creation of a map and the sensors’ pose estimation in an unknown environment. Visual-based SLAM techniques play a significant role in this field, as they are based on a low-cost and small sensor system, which gives them an advantage over other sensor-based SLAM techniques. The literature presents different approaches and methods to implement visual-based SLAM systems. Among this variety of publications, a beginner in this domain may have difficulty identifying and analyzing the main algorithms and selecting the most appropriate one according to his or her project constraints. Therefore, we present the three main visual-based SLAM approaches (visual-only, visual-inertial, and RGB-D SLAM), providing a review of the main algorithms of each approach through diagrams and flowcharts, and highlighting the main advantages and disadvantages of each technique. Furthermore, we propose six criteria that ease the SLAM algorithm’s analysis and consider both the software and hardware levels. In addition, we present some major issues and future directions in the visual-SLAM field, and provide a general overview of some of the existing benchmark datasets. This work aims to be the first step for those initiating a SLAM project to have a good perspective of SLAM techniques’ main elements and characteristics.
Journal Article
SY-SLAM: Real-Time Dynamic Indoor RGB-D SLAM with SuperPoint Detection and Asynchronous YOLOv8s-Based Keypoint Suppression
by Wei, Shuangfeng; Zhou, Shan; Zhi, Shaoshuai
in Ablation; Accuracy; bounding-box-based keypoint suppression
2026
Traditional visual SLAM pipelines are typically designed under the static-world assumption and often degrade severely in indoor environments with frequent human motion. To improve trajectory accuracy and front-end stability in such scenarios while maintaining real-time throughput, we present SY-SLAM, an RGB-D SLAM system for dynamic indoor environments with frequent human motion. (S stands for SuperPoint, which is used as a detector-only learned keypoint front-end, and Y stands for YOLO, which provides asynchronous person-aware keypoint suppression based on detected human bounding boxes.) We integrate a TensorRT-deployed detector-only SuperPoint module to improve keypoint repeatability and robustness while retaining ORB binary descriptors for efficient matching and place recognition within the ORB-SLAM3 framework. To avoid feature starvation while preserving keypoint quality, we further introduce an adaptive SuperPoint keypoint selection strategy that applies stricter filtering when keypoints are abundant and relaxes the selection constraints when they are scarce. In parallel, an asynchronous YOLOv8s TensorRT thread performs person detection with temporal bounding-box memory, and keypoints inside detected person regions are removed before ORB descriptor computation and matching to reduce dynamic-feature contamination in the front end. We evaluate SY-SLAM on five dynamic TUM RGB-D fr3 sequences using ATE and RPE metrics. Compared with ORB-SLAM3, SY-SLAM reduces ATE RMSE by 93.45% across four dynamic walking sequences. On the widely reported fr3/w/x sequence, SY-SLAM achieves competitive accuracy with recent dynamic SLAM methods while maintaining real-time performance. The system runs in real time at 46.8 Hz (21.36 ms per frame) on an Intel i9-13900H CPU with an NVIDIA RTX 4070 Laptop GPU.
Journal Article
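The SY-SLAM abstract above describes removing keypoints that fall inside detected person bounding boxes before descriptor computation. The core filtering step can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `suppress_keypoints`, the `(x_min, y_min, x_max, y_max)` box layout, and the sample coordinates are hypothetical, and the asynchronous detection thread and temporal bounding-box memory mentioned in the abstract are not modelled.

```python
def suppress_keypoints(keypoints, boxes):
    """Return the keypoints that lie outside every bounding box.

    keypoints: list of (x, y) pixel coordinates.
    boxes: list of (x_min, y_min, x_max, y_max) boxes, edges inclusive.
    """
    def inside(point, box):
        x, y = point
        x_min, y_min, x_max, y_max = box
        return x_min <= x <= x_max and y_min <= y <= y_max

    return [pt for pt in keypoints if not any(inside(pt, b) for b in boxes)]


# Hypothetical keypoints and one hypothetical person box.
keypoints = [(10, 10), (50, 60), (120, 40), (55, 200)]
person_boxes = [(40, 50, 100, 220)]
print(suppress_keypoints(keypoints, person_boxes))  # prints: [(10, 10), (120, 40)]
```

Only the surviving keypoints would then be passed on to descriptor computation and matching.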
Tableware Tidying-Up Robot System for Self-Service Restaurant–Detection and Manipulation of Leftover Food and Tableware
2022
In this study, an automated tableware tidying-up robot system was developed to tidy up tableware in a self-service restaurant with a large amount of tableware. This study focused on sorting and collecting tableware placed on trays detected by an RGB-D camera. Leftover food was also treated with this robot system. The RGB-D camera efficiently detected the position and height of the tableware and whether there was leftover food or not by image processing. A parallel arm and robot hand mechanism was designed to realize the advantages of a low cost and high processing speed. Two types of rotation mechanisms were designed to realize the function of throwing away leftover food. The effectiveness of the camera detection system was verified through the experiments of tableware and leftover food detection. The effectiveness of the prototype robot and the rotation assist mechanism was verified through the experiments of grasping tableware, throwing away leftover food by two types of rotating mechanisms, collecting multiple tableware, and the sorting of overlapping tableware with multiple robots.
Journal Article
Analysis of Depth Cameras for Proximal Sensing of Grapes
2022
This work investigates the performance of five depth cameras in relation to their potential for grape yield estimation. The technologies used by these cameras include structured light (Kinect V1), active infrared stereoscopy (RealSense D415), time of flight (Kinect V2 and Kinect Azure), and LiDAR (Intel L515). To evaluate their suitability for grape yield estimation, a range of factors were investigated including their performance in and out of direct sunlight, their ability to accurately measure the shape of the grapes, and their potential to facilitate counting and sizing of individual berries. The depth cameras’ performance was benchmarked using high-resolution photogrammetry scans. All the cameras except the Kinect V1 were able to operate in direct sunlight. Indoors, the RealSense D415 camera provided the most accurate depth scans of grape bunches, with a 2 mm average depth error relative to photogrammetric scans. However, its performance was reduced in direct sunlight. The time of flight and LiDAR cameras provided depth scans of grapes that had about an 8 mm depth bias. Furthermore, the individual berries manifested in the scans as pointed shape distortions. This led to an underestimation of berry sizes when applying the RANSAC sphere fitting but may help with the detection of individual berries with more advanced algorithms. Applying an opaque coating to the surface of the grapes reduced the observed distance bias and shape distortion. This indicated that these are likely caused by the cameras’ transmitted light experiencing diffused scattering within the grapes. More work is needed to investigate if this distortion can be used for enhanced measurement of grape properties such as ripeness and berry size.
Journal Article
Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection
2023
Detecting salient objects in complicated scenarios is a challenging problem. Besides semantic features from the RGB image, spatial information from the depth image also provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values. However, they ignore the appearance contrast and structure knowledge indicated by relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection, to take full advantage of both absolute and relative depth information and, further, to enforce the in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed to hierarchically integrate absolute depth values and RGB features, allowing interaction between the appearance and structural information in the encoding stage. A relative depth-induced module (RDIM) is designed to capture detailed saliency cues by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even in complex scenes. The proposed DIN is a lightweight network, and the model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks show that our method outperforms most existing RGB-D salient object detection models.
Journal Article
Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework Using Visual and Depth Cues
by Martins, Renato; Nascimento, Erickson R.; Campos, Mario F. M.
in Artificial Intelligence; Computer Science; Computer Vision and Pattern Recognition
2020
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images. We propose a complete framework to create an enhanced map representation of the environment with object-level information to be used in several applications such as human-robot interaction, assistive robotics, visual navigation, or manipulation tasks. Our formulation leverages a CNN-based object detector (YOLO) with a 3D model-based segmentation technique to perform instance semantic segmentation, and to localize, identify, and track different classes of objects in the scene. The tracking and positioning of semantic classes is done with a dictionary of Kalman filters in order to combine sensor measurements over time and thereby provide more accurate maps. The formulation is designed to identify and to disregard dynamic objects in order to obtain a medium-term invariant map representation. The proposed method was evaluated with collected and publicly available RGB-D data sequences acquired in different indoor scenes. Experimental results show the potential of the technique to produce augmented semantic maps containing several objects (notably doors). We also provide to the community a dataset composed of annotated object classes (doors, fire extinguishers, benches, water fountains) and their positioning, as well as the source code as ROS packages.
Journal Article
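The semantic mapping abstract above mentions tracking objects with a dictionary of Kalman filters that combines measurements over time. A heavily simplified sketch of that idea is shown below, assuming one scalar, constant-position filter per object identifier. The class `ScalarKalman`, the function `track`, the noise values, and the object IDs are all hypothetical; the authors' released ROS packages are the reference for their actual state model.

```python
class ScalarKalman:
    """1D Kalman filter with a constant-position model (illustrative only)."""

    def __init__(self, x0, p0=1.0, q=1e-3, r=0.1):
        self.x = x0  # state estimate
        self.p = p0  # estimate variance
        self.q = q   # process noise variance (assumed value)
        self.r = r   # measurement noise variance (assumed value)

    def update(self, z):
        self.p += self.q                  # predict: uncertainty grows
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)     # correct with measurement z
        self.p *= 1.0 - gain
        return self.x


filters = {}  # one filter per tracked object, keyed by object ID


def track(object_id, measurement):
    """Create a filter on first sight of an object, otherwise update it."""
    if object_id not in filters:
        filters[object_id] = ScalarKalman(measurement)
        return measurement
    return filters[object_id].update(measurement)


# Hypothetical noisy range measurements (metres) of one door.
for z in (2.0, 2.2, 1.9, 2.1):
    estimate = track("door_1", z)
print(estimate)
```

Keying the filters by object lets each tracked instance keep its own uncertainty, so a newly seen object does not disturb the estimates of objects already in the map.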
CEHD: A Unified Framework for Detection and Height Estimation of Fresh Corn Ears in Field Conditions
2025
Real-time detection of fresh corn ear height can provide a basis for dynamic adjustment of harvester header parameters, reducing mechanical damage and improving harvest quality. This study proposes a corn ear height detection model (CEHD). A YOLO-HAMDF network is developed for ear recognition, in which the core modules—TBDA, GLSA, and AQE—respectively suppress background interference, enhance contextual perception, and optimize bounding-box scoring. Depth information is incorporated to filter non-target regions and improve system robustness. In addition, a DI-DeepSORT module is designed for ear tracking, where DBC-Net and IDA-Kalman, respectively, enhance the discriminability of ReID features and enable independent-dimension adaptive noise modeling with smoothed positional updates. Experimental results demonstrate that the proposed CEHD model achieves a mean absolute error (MAE) of only 3.21 ± 0.05 cm under field conditions, indicating strong stability and practical applicability. In summary, this study presents a stable and reliable corn ear height detection system, achieves real-time monitoring of ear height, and provides data support for the dynamic adjustment of header parameters in fresh corn harvesters.
Journal Article