Catalogue Search | MBRL
Explore the vast range of titles available.
3,321 result(s) for "object localization"
Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review
2021
This paper presents a comprehensive survey of vision-based robotic grasping. We identify three key tasks in vision-based robotic grasping: object localization, object pose estimation and grasp estimation. In detail, the object localization task covers object localization without classification, object detection and object instance segmentation; it provides the regions of the target object in the input data. The object pose estimation task mainly refers to estimating the 6D object pose and includes correspondence-based, template-based and voting-based methods, which enables the generation of grasp poses for known objects. The grasp estimation task includes 2D planar grasp methods and 6DoF grasp methods, where the former are constrained to grasping from one direction. Different combinations of these three tasks can accomplish robotic grasping. Many object pose estimation methods do not require a separate localization step and instead perform object localization and pose estimation jointly; likewise, many grasp estimation methods skip both localization and pose estimation and conduct grasp estimation in an end-to-end manner. Both traditional methods and recent deep learning-based methods operating on RGB-D inputs are reviewed in detail in this survey. Related datasets and comparisons between state-of-the-art methods are summarized as well. In addition, challenges in vision-based robotic grasping and future directions for addressing them are pointed out.
Journal Article
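A minimal sketch of the three-stage pipeline the survey above describes, to make the task decomposition concrete. Every function here (locate_object, estimate_6d_pose, plan_grasp) is a hypothetical placeholder for the families of methods the survey reviews, not code from any surveyed system.

import numpy as np
from dataclasses import dataclass

@dataclass
class Grasp:
    pose: np.ndarray   # 4x4 gripper pose in the camera frame
    width: float       # gripper opening width in metres
    score: float       # estimated grasp quality

def locate_object(rgbd: np.ndarray) -> np.ndarray:
    # Stage 1, object localization: return a binary mask of the target.
    # Placeholder: threshold depth to separate a tabletop object.
    depth = rgbd[..., 3]
    return (depth < np.median(depth)).astype(np.uint8)

def estimate_6d_pose(rgbd: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Stage 2, 6D pose estimation: correspondence-, template- or voting-based
    # methods go here. Placeholder: identity rotation at the centroid depth.
    pose = np.eye(4)
    ys, xs = np.nonzero(mask)
    pose[2, 3] = float(rgbd[ys, xs, 3].mean())
    return pose

def plan_grasp(object_pose: np.ndarray) -> Grasp:
    # Stage 3, grasp estimation: a 6DoF planner would sample and rank
    # candidates; placeholder returns one grasp at the estimated pose.
    return Grasp(pose=object_pose, width=0.05, score=1.0)

rgbd = np.random.rand(480, 640, 4).astype(np.float32)  # stand-in RGB-D frame
grasp = plan_grasp(estimate_6d_pose(rgbd, locate_object(rgbd)))
print(f"grasp at z={grasp.pose[2, 3]:.3f} m, score={grasp.score}")

As the abstract notes, end-to-end methods collapse these stages; the modular wiring above is just the fully decomposed variant.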
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged that generates a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and guide the learning of the generator, this paper presents two surprising experimental observations about the object localization learning process. For a trained network, as the foreground mask expands: (1) the cross-entropy converges to zero while the foreground mask covers only part of the object region; (2) the activation value continuously increases until the foreground mask expands to the object boundary. Therefore, to achieve more effective localization, we argue for using the activation value to learn more of the object region. In this paper, we propose a background activation suppression (BAS) method. Specifically, an activation map constraint module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.
Journal Article
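A minimal PyTorch sketch of the background-suppression idea described above: instead of scoring the foreground map with cross-entropy, penalize whatever classifier activation survives outside the predicted foreground, plus an area term so the mask cannot trivially cover the whole image. This is a simplified reading of BAS for illustration; the authors' actual implementation is at the GitHub link above.

import torch

def bas_style_loss(activation_map: torch.Tensor,
                   fg_mask: torch.Tensor,
                   area_weight: float = 1.0) -> torch.Tensor:
    # activation_map: (B, H, W) class activation values from the classifier.
    # fg_mask:        (B, H, W) generator output in [0, 1], 1 = foreground.
    bg_activation = (activation_map * (1.0 - fg_mask)).mean()  # suppress this
    area = fg_mask.mean()                                      # area constraint
    return bg_activation + area_weight * area

# Smoke test with random tensors standing in for a real CAM and mask.
cam = torch.rand(2, 14, 14)
mask = torch.rand(2, 14, 14, requires_grad=True)
loss = bas_style_loss(cam, mask)
loss.backward()
print(float(loss))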
AERO: AI-Enabled Remote Sensing Observation with Onboard Edge Computing in UAVs
by Alhabashi, Yasser; Koubaa, Anis; Ghouti, Lahouari
in Accuracy; Artificial intelligence; Automation
2023
Unmanned aerial vehicles (UAVs) equipped with computer vision capabilities have been widely utilized in several remote sensing applications, such as precision agriculture, environmental monitoring, and surveillance. However, the commercial usage of these UAVs in such applications is mostly performed manually, with humans responsible for data observation or offline processing after data collection, owing to the lack of onboard edge AI. Other approaches rely on offloading AI computation to the cloud, where inference is conducted on video streams; this can be unscalable and infeasible due to the limited connectivity and high latency of remote cloud servers. To overcome these issues, this paper presents a new approach to using edge computing in drones to enable the processing of extensive AI tasks onboard UAVs for remote sensing. We propose a cloud–edge hybrid system architecture in which the edge is responsible for processing AI tasks and the cloud is responsible for data storage, manipulation, and visualization. We designed AERO, a UAV brain system with onboard AI capability using GPU-enabled edge devices. AERO is a novel multi-stage deep learning module that combines object detection (YOLOv4 and YOLOv7) and tracking (DeepSort) with TensorRT accelerators to capture objects of interest with high accuracy and transmit data to the cloud in real time without redundancy. AERO processes the detected objects over multiple consecutive frames to maximize detection accuracy. The experiments show a reduced false positive rate (0.7%), a low percentage of tracking identity switches (1.6%), and an average inference speed of 15.5 FPS on a Jetson Xavier AGX edge device.
Journal Article
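The "without redundancy" claim above comes from chaining a detector with a tracker so each physical object is uploaded once, not once per frame. The sketch below shows only that deduplication logic; detect() and track() are simplified hypothetical stand-ins for the YOLOv4/YOLOv7 and DeepSort components, not AERO's code.

def detect(frame):
    # Stand-in detector: returns (x, y, w, h, label) boxes for one frame.
    return [(10.0, 20.0, 30.0, 40.0, "car")]

def track(detections):
    # Naive stand-in tracker: derives a stable id from the coarse box
    # position, so a near-stationary object keeps its id across frames.
    return [(hash((round(x / 50), round(y / 50), label)), (x, y, w, h, label))
            for (x, y, w, h, label) in detections]

def process_stream(frames, upload):
    uploaded = set()
    for frame in frames:
        for track_id, (x, y, w, h, label) in track(detect(frame)):
            if track_id not in uploaded:       # suppress redundant uploads
                uploaded.add(track_id)
                upload({"id": track_id, "box": (x, y, w, h), "label": label})

# Ten identical frames produce a single upload instead of ten.
process_stream(range(10), upload=print)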
Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
by Siméoni, Oriane; Puy, Gilles; Gidaris, Spyros
in Annotations; Artificial Intelligence; Computer Imaging
2025
The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images and videos without knowing in advance which objects populate the dataset is an exciting prospect. But how can objects be found without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation, in the era of self-supervised ViTs.
Journal Article
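Many of the surveyed methods share one mechanism: pairwise similarities between self-supervised ViT patch features pick a seed patch (background patches tend to correlate with each other, foreground ones less so), and the object mask grows from that seed. The sketch below illustrates that seed-and-expand idea on random features, in the spirit of LOST-style methods rather than any single paper's code.

import numpy as np

def seed_and_expand(patch_feats: np.ndarray, grid: int, tau: float = 0.2):
    # patch_feats: (N, D) L2-normalized ViT patch features, N = grid * grid.
    sim = patch_feats @ patch_feats.T                  # cosine similarities
    # Seed = the patch with the fewest positively correlated neighbours.
    seed = int(np.argmin((sim > 0).sum(axis=1)))
    mask = sim[seed] > tau                             # expand from the seed
    return mask.reshape(grid, grid)                    # boolean object mask

# Demo on random unit-norm features standing in for DINO patch tokens.
rng = np.random.default_rng(0)
feats = rng.normal(size=(14 * 14, 384))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(seed_and_expand(feats, grid=14).sum(), "patches in the object mask")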
Multi-template matching: a versatile tool for object-localization in microscopy images
2020
Background
The localization of objects of interest is a key initial step in most image analysis workflows. For biomedical image data, classical image-segmentation methods like thresholding or edge detection are typically used. While those methods perform well for labelled objects, they reach their limits when samples are poorly contrasted with the background or when only parts of larger structures should be detected. Furthermore, the development of such pipelines requires substantial engineering of analysis workflows and often results in case-specific solutions. We therefore propose a new, straightforward and generic approach to object localization by template matching that utilizes multiple template images to improve detection capacity.
Results
We provide a new implementation of template matching that offers higher detection capacity than the single-template approach by enabling the use of multiple template images. To provide an easy-to-use method for the automatic localization of objects of interest in microscopy images, we implemented multi-template matching as a Fiji plugin, a KNIME workflow and a Python package. We demonstrate its application to the localization of entire, partial and multiple biological objects in zebrafish and medaka high-content screening datasets. The Fiji plugin can be installed by activating the Multi-Template-Matching and IJ-OpenCV update sites. The KNIME workflow is available on nodepit and KNIME Hub. Source code and documentation are available on GitHub (https://github.com/multi-template-matching).
Conclusion
Multi-template matching is a simple yet powerful object-localization algorithm that requires no data pre-processing or annotation. Our implementation can be used out of the box by non-expert users on any type of 2D image. It is compatible with a large variety of applications including, for instance, analysis of large-scale datasets originating from automated microscopy, detection and tracking of objects in time-lapse assays, or as a general image-analysis step in any custom processing pipeline. Using different templates corresponding to distinct object categories, the tool can also be used to classify the detected regions.
Journal Article
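The core operation behind the tool above is ordinary normalized cross-correlation template matching, repeated over several templates and followed by non-maximum suppression so overlapping hits collapse to one detection. A minimal OpenCV sketch of that idea (not the plugin's own API; see the linked GitHub page for that):

import cv2
import numpy as np

def multi_template_match(image, templates, score_thr=0.8, iou_thr=0.3):
    # Pool all above-threshold hits from every template, then keep the
    # best-scoring non-overlapping boxes (greedy NMS).
    hits = []
    for templ in templates:
        h, w = templ.shape[:2]
        scores = cv2.matchTemplate(image, templ, cv2.TM_CCOEFF_NORMED)
        for y, x in zip(*np.where(scores >= score_thr)):
            hits.append((float(scores[y, x]), x, y, w, h))
    hits.sort(reverse=True)
    kept = []
    for s, x, y, w, h in hits:
        if all(_iou((x, y, w, h), k[1:]) < iou_thr for k in kept):
            kept.append((s, x, y, w, h))
    return kept

def _iou(a, b):
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

# Demo: one template finds both bright squares in a toy image.
img = np.zeros((200, 200), np.uint8)
img[20:60, 20:60] = 255
img[120:160, 100:140] = 255
template = np.zeros((50, 50), np.uint8)
template[5:45, 5:45] = 255
print(multi_template_match(img, [template]))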
Energy-Efficient Object Detection and Tracking Framework for Wireless Sensor Network
2023
Object detection and tracking is one of the key applications of wireless sensor networks (WSNs). The key issues associated with this application are network lifetime and object detection and localization accuracy. To ensure high quality of service, there must be a trade-off between energy efficiency and detection accuracy, which is challenging in a resource-constrained WSN. Most researchers have extended the application lifetime while achieving target detection accuracy at the cost of high node density; they considered neither the system cost nor the object localization accuracy. Some researchers focused on object detection accuracy while achieving energy efficiency by limiting detection to a predefined target trajectory, and others focused only on node clustering and node scheduling for energy efficiency. In this study, we propose a mobile object detection and tracking framework named the Energy Efficient Object Detection and Tracking Framework (EEODTF) for heterogeneous WSNs, which minimizes energy consumption during tracking without affecting object detection and localization accuracy. It achieves energy efficiency via node optimization, mobile node trajectory optimization, node clustering, data reporting optimization and detection optimization. We compared the performance of the EEODTF with the Energy Efficient Tracking and Localization of Object (EETLO) model and the Particle-Swarm-Optimization-based Energy Efficient Target Tracking Model (PSOEETTM), and found the EEODTF to be more energy efficient than both.
Journal Article
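A common way to realize the energy/detection trade-off described above is duty-cycling: wake only the sensors whose sensing disc can plausibly contain the object's predicted position for the next epoch, and let the rest sleep. The toy sketch below shows that scheduling rule only; the EEODTF's actual joint optimization is far richer, and all names and radii here are invented for illustration.

import math

def nodes_to_wake(nodes, predicted_pos, sensing_range, margin):
    # Wake a node only if the predicted object position, padded by an
    # uncertainty margin, lies within its sensing range.
    px, py = predicted_pos
    budget = sensing_range + margin
    return [nid for nid, (x, y) in nodes.items()
            if math.hypot(x - px, y - py) <= budget]

# Toy deployment: five nodes, object predicted near (10, 10).
nodes = {1: (4, 4), 2: (12, 9), 3: (50, 50), 4: (8, 15), 5: (90, 10)}
print(nodes_to_wake(nodes, predicted_pos=(10, 10), sensing_range=10, margin=2))
# -> [1, 2, 4]; the distant nodes 3 and 5 stay asleep, saving energy.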
Analytical Formalism for Data Representation and Object Detection with 2D LiDAR: Application in Mobile Robotics
by Fagundes, Leonardo A.; Caldeira, Alexandre G.; Quemelli, Matheus B.
in Algorithms; LASER scanner; Lasers
2024
In mobile robotics, LASER scanners have a wide spectrum of indoor and outdoor applications, in both structured and unstructured environments, due to their accuracy and precision. Most works that use this sensor adopt their own data representation and case-specific modeling strategies, and no common formalism exists. To address this issue, this manuscript presents an analytical approach to the identification and localization of objects using 2D LiDARs. Our main contribution lies in formally defining LASER sensor measurements and their representation, the identification of objects, their main properties, and their location in a scene. We validate our proposal with experiments in generic semi-structured environments common in autonomous navigation and demonstrate its feasibility for multiple-object detection and identification, strictly following its analytical representation. Finally, our proposal further encourages and facilitates the design, modeling, and implementation of other applications that use LASER scanners as distance sensors.
Journal Article
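One concrete instance of such a formalism: each LiDAR return is an (angle, range) pair, converted to Cartesian coordinates, and consecutive returns are grouped into one object whenever the jump between them stays below a distance threshold (breakpoint segmentation). The sketch below illustrates that representation under those assumptions; it is not the paper's exact formal definitions.

import math

def scan_to_points(ranges, angle_min, angle_step):
    # Convert a 2D LiDAR scan (ranges in metres) to Cartesian (x, y) points.
    return [(r * math.cos(angle_min + i * angle_step),
             r * math.sin(angle_min + i * angle_step))
            for i, r in enumerate(ranges)]

def segment_objects(points, jump_thr=0.3):
    # Breakpoint segmentation: start a new object whenever two consecutive
    # returns are further apart than jump_thr.
    objects, current = [], [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p, q) > jump_thr:
            objects.append(current)
            current = []
        current.append(q)
    objects.append(current)
    return objects

# Toy scan: two surfaces at different ranges yield two segments.
ranges = [1.0] * 10 + [3.0] * 10
pts = scan_to_points(ranges, angle_min=0.0, angle_step=math.radians(1))
print([len(seg) for seg in segment_objects(pts)])   # -> [10, 10]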
ImageNet Auto-Annotation with Segmentation Propagation
by Ferrari, Vittorio; Guillaumin, Matthieu; Küttel, Daniel
in Accuracy; Annotations; Artificial Intelligence
2014
ImageNet is a large-scale hierarchical database of object classes with millions of images. We propose to automatically populate it with pixelwise object-background segmentations by leveraging existing manual annotations in the form of class labels and bounding boxes. The key idea is to recursively exploit images segmented so far to guide the segmentation of new images. At each stage this propagation process expands into the images which are easiest to segment at that point in time, e.g. by moving to the classes semantically most related to those segmented so far. The propagation of segmentation occurs both (a) at the image level, by transferring existing segmentations to estimate the probability of a pixel being foreground, and (b) at the class level, by jointly segmenting images of the same class and by importing the appearance models of classes that are already segmented. Through experiments on 577 classes and 500k images we show that our technique (i) annotates a wide range of classes with accurate segmentations; (ii) effectively exploits the hierarchical structure of ImageNet; (iii) scales efficiently, especially when implemented on superpixels; (iv) outperforms a baseline GrabCut (Rother et al. 2004) initialized on the image center, as well as segmentation transfer from a fixed source pool run independently on each target image (Kuettel and Ferrari 2012). Moreover, our method also delivers state-of-the-art results on the recent iCoseg dataset for co-segmentation.
Journal Article
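The GrabCut baseline mentioned in point (iv) above is straightforward to reproduce with OpenCV: initialize the foreground rectangle at the image center and let cv2.grabCut iterate. A minimal sketch of that baseline follows (the propagation method itself, which transfers segmentations across related classes, is considerably more involved):

import cv2
import numpy as np

def center_grabcut(img_bgr, iters=5, shrink=0.25):
    # GrabCut initialized on a centered rectangle, as in the baseline.
    h, w = img_bgr.shape[:2]
    rect = (int(w * shrink), int(h * shrink),
            int(w * (1 - 2 * shrink)), int(h * (1 - 2 * shrink)))
    mask = np.zeros((h, w), np.uint8)
    bgd = np.zeros((1, 65), np.float64)   # GMM state buffers OpenCV requires
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # Keep pixels labelled definite or probable foreground.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)

# Demo on a synthetic image: a coloured blob on a dark background.
img = np.zeros((120, 160, 3), np.uint8)
cv2.circle(img, (80, 60), 30, (200, 180, 160), -1)
print(center_grabcut(img).sum(), "foreground pixels")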
Improving the Efficiency of 3D Monocular Object Detection and Tracking for Road and Railway Smart Mobility
by Mauri, Antoine; Kounouho, Messmer; Evain, Alexandre
in 3D bounding boxes estimation; Algorithms; Artificial Intelligence
2023
Three-dimensional (3D) real-time object detection and tracking is an important task for autonomous vehicles and road and railway smart mobility, allowing them to analyze their environment for navigation and obstacle avoidance. In this paper, we improve the efficiency of 3D monocular object detection by using dataset combination and knowledge distillation, and by creating a lightweight model. Firstly, we combine real and synthetic datasets to increase the diversity and richness of the training data. Then, we use knowledge distillation to transfer knowledge from a large, pre-trained model to a smaller, lightweight model. Finally, we create a lightweight model by selecting combinations of width, depth and resolution to reach a target complexity and computation time. Our experiments show that each method improves either the accuracy or the efficiency of our model with no significant drawbacks. Using all of these approaches is especially useful in resource-constrained environments such as self-driving cars and railway systems.
Journal Article
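Of the three techniques listed above, knowledge distillation is the most self-contained to illustrate: the lightweight student is trained to match the large teacher's softened logits in addition to the ground-truth labels. A generic PyTorch sketch of that loss follows; the temperature and weighting are conventional defaults, not values from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T=4.0, alpha=0.7):
    # Hinton-style KD: KL divergence between temperature-softened
    # distributions (scaled by T^2), blended with hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# Smoke test: random logits for a batch of 8 samples and 10 classes.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
distillation_loss(s, t, y).backward()
print(s.grad.shape)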
A Detection Method of Bolts on Axlebox Cover Based on Cascade Deep Convolutional Neural Network
2023
This paper addresses the loosening detection problem of bolts on axlebox covers via object localization and saliency detection with a cascade deep convolutional neural network. Firstly, an SSD network based on ResNet50 and a CBAM module, with improved bolt image features, is proposed for locating bolts on axlebox covers. Then, the A2-PFN is proposed to exploit the slender shape of the marker lines and extract more accurate marker-line regions of the bolts. Finally, a rectangular approximation method is proposed to regularize the marker-line regions, calculate the angle of each marker line and record all angle values in an angle table, whose criteria determine whether a bolt with a marker line is in danger of loosening. Meanwhile, our improved algorithm is compared with the original algorithm in the object localization stage. The results show that our proposed method yields a significant improvement in both detection accuracy and detection speed, with mAP (IoU = 0.75) reaching 0.77 and frame rate reaching 16.6 FPS. In the saliency detection stage, after qualitative and quantitative comparison, our method significantly outperforms other state-of-the-art methods, with MAE reaching 0.092, F-measure reaching 0.948 and AUC reaching 0.943. Ultimately, according to the angle table, out of 676 bolt samples, 60 bolts are loose, 69 bolts are at risk of loosening, and 547 bolts are tight.
Journal Article
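The rectangular approximation step above has a standard OpenCV realization: fit a minimum-area rotated rectangle to the extracted marker-line region, read the long edge's orientation, and compare it with the angle recorded when the bolt was tight. A minimal sketch of that reading; the thresholds are illustrative, not the paper's angle-table criteria.

import cv2
import numpy as np

def marker_line_angle(region_mask):
    # Rectangular approximation: fit a minimum-area rotated rectangle to the
    # marker-line pixels and return the long edge's angle in [0, 180) degrees.
    pts = cv2.findNonZero(region_mask)
    box = cv2.boxPoints(cv2.minAreaRect(pts))      # 4 corners of the rectangle
    e1, e2 = box[1] - box[0], box[2] - box[1]
    v = e1 if np.linalg.norm(e1) >= np.linalg.norm(e2) else e2
    return float(np.degrees(np.arctan2(v[1], v[0]))) % 180.0

def loosening_status(angle, reference, warn_deg=5.0, alarm_deg=10.0):
    # Compare against the reference angle of a known-tight bolt.
    delta = abs(angle - reference)
    delta = min(delta, 180.0 - delta)              # angles wrap at 180 degrees
    if delta >= alarm_deg:
        return "loose"
    return "at risk" if delta >= warn_deg else "tight"

# Demo: a synthetic marker line drawn at 135 degrees (image coordinates).
mask = np.zeros((100, 100), np.uint8)
cv2.line(mask, (10, 90), (90, 10), 255, 3)
a = marker_line_angle(mask)
print(round(a, 1), loosening_status(a, reference=135.0))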