12 result(s) for "2D visual detection"
Research on a Fusion Technique of YOLOv8-URE-Based 2D Vision and Point Cloud for Robotic Grasping in Stacked Scenarios
In industrial robotic grasping tasks, traditional 3D point cloud registration and pose estimation methods often struggle with low efficiency and limited accuracy in stacked and cluttered environments. To address these challenges, this paper proposes a grasp pose estimation algorithm that integrates 2D object detection based on YOLOv8-URE with 3D point cloud registration. In the detection stage, the method enhances object feature perception and localization by optimizing the receptive field structure and introducing attention mechanisms. It also employs an efficient multi-scale feature fusion strategy to improve bounding box regression accuracy. During point cloud processing, target centers predicted by the detector guide rapid segmentation, followed by robust registration techniques to estimate precise object poses. Experimental results demonstrate that YOLOv8-URE improves detection accuracy by 9.21% compared to YOLOv8n, reduces registration time by 60.5%, and significantly increases grasp success rates, proving its reliability and effectiveness in industrial scenarios.
Object detection and recognition using contour based edge detection and fast R-CNN
Object detection is a computer vision technique whose primary aim is to detect objects. The objects can be detected in any image or video feed. Nowadays, object detection is extensively applied in video surveillance systems, human tracking, and self-driving cars. This paper presents a novel object detection approach that uses only wireframe-based features. The wireframe of the image is identified using Cellular logical array processing. This technique can determine the visual and geometric features of the image. This paper focuses on a deep neural network framework to detect the target object in the image. Fast R-CNN is used for the detection of objects. Detection is fast because only the wireframe of the image is extracted first and then fed into the Fast R-CNN model for detection and classification. The performance of the proposed methodology is evaluated on PASCAL VOC, an example-based synthesis dataset, and a real-time dataset. The proposed methodology achieves a mean average precision (mAP) of 89.4%, 91.33%, and 88.1% on the PASCAL VOC, example-based, and real-time datasets, respectively. The experimental analysis demonstrates that the proposed detection method achieves better results than other state-of-the-art methods. The approach is also helpful for detecting 2D and 3D objects.
Mimicking evasive behavior in wavelength‐dependent reconfigurable phototransistors with ultralow power consumption
Retinal‐inspired synaptic phototransistors, which integrate light signal detection, preprocessing, and memory functions, show promising applications in artificial vision sensors. In recent years, heterojunctions have been constructed in phototransistors to realize positive photoconductance (PPC) and negative photoconductance (NPC) modulations, thereby achieving visible and infrared wavelength discrimination and various visual functions. However, relatively little attention has been paid to wavelength‐dependent switching and reconfigurability between the two states (PPC and NPC), limiting further applications in complex simulations of biological visual functions. Here, a mixed organic–inorganic heterojunction synaptic phototransistor was constructed by integrating CsPbBr3 nanoplates (NPLs) with strong blue‐light absorption and poly(3‐hexylthiophene‐2,5‐diyl) (P3HT) with strong red‐light absorption. Compared with the three‐dimensional (3D) structure CsPbBr3 nanocubes (NCs), the two‐dimensional (2D) CsPbBr3 NPLs exhibited more efficient charge transfer with P3HT. Based on the individual optical absorption properties in the organic–inorganic heterojunction, the device exhibited wavelength‐selective and reconfigurable behavior between PPC and NPC. A low power consumption of 0.053 fJ per synaptic event was achieved, which is comparable to that of a biological synapse. Finally, Drosophila's evasive behavior toward food under red and blue light was successfully demonstrated. This work demonstrates the future potential of synaptic phototransistors for visuomorphic computing. A synaptic phototransistor composed of two‐dimensional CsPbBr3 nanoplates (NPLs) and poly(3‐hexylthiophene‐2,5‐diyl) (P3HT) was constructed. Benefitting from the complementary light absorption properties of CsPbBr3 NPLs and P3HT, wavelength‐dependent reconfigurable switching between positive photoconductance and negative photoconductance was realized. This work provides new opportunities for phototransistors to simulate more biological visual functions and shows potential in in‐sensor computing.
Hadamard Error-Correcting Codes and Their Application in Digital Watermarking
In communication technologies such as digital watermarking, wireless sensor networks (WSNs), and visible light communication (VLC), error-correcting codes are crucial. The Enhanced Hadamard Error-Correcting Code (EHC), which is based on 2D Hadamard Basis Images, is a novel error correction technique that is presented in this study. This technique is used to evaluate the effectiveness of the video watermarking scheme. Even with highly sophisticated embedding techniques, watermarks usually fail to resist such comprehensive attacks because of the extraordinarily high compression rate of approximately 1:200 that is frequently employed in video dissemination. It can only be used in conjunction with a sufficient error-correcting coding method. This study compares the efficacy of the well-known Reed–Solomon Code with this novel technique, the Enhanced Hadamard Error-Correcting Code (EHC), in maintaining watermarks in embedded videos. The main idea behind this newly created multidimensional Enhanced Hadamard Error-Correcting Code is to use a 1D Hadamard decoding approach on the 2D base pictures after they have been transformed into a collection of one-dimensional rows. Following that, the image is rebuilt, allowing for a more effective 2D decoding procedure. Using this technique, it is possible to exceed the theoretical error-correcting capacity threshold of ⌊(d_min − 1)/2⌋ bits, where d_min is the minimum Hamming distance of the code. It may be possible to achieve better results by converting the 2D EHC into a 3D format. The new Enhanced Hadamard Code is used in a video watermarking coding scheme to show its viability and efficacy. The original video is broken down using a multi-level interframe wavelet transform during the video watermarking embedding process. Low-pass filtering is applied to the video stream in order to extract a certain frequency range. The watermark is subsequently incorporated using this filtered section. Either the Reed–Solomon Correcting Code or the Enhanced Hadamard Code is used to protect the watermarks. The experimental results show that EHC far outperforms the RS Code and is very resilient against severe MPEG compression.
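For readers unfamiliar with the 1D building block named in this abstract, the following is a minimal sketch of a classic Hadamard code with correlation decoding, together with the standard error-correcting capacity bound. It is not the paper's Enhanced Hadamard Code; the function names (`sylvester_hadamard`, `decode`, `correctable_errors`) and the choice of a 16-symbol code are assumptions made for illustration only.

```python
def sylvester_hadamard(k):
    """Build the 2^k x 2^k Sylvester Hadamard matrix with entries +1/-1."""
    h = [[1]]
    for _ in range(k):
        # Each step doubles the size: [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def encode(index, h):
    """Encode a message index as the corresponding Hadamard row."""
    return list(h[index])


def decode(received, h):
    """Return the index of the row with the highest correlation to `received`."""
    scores = [sum(a * b for a, b in zip(received, row)) for row in h]
    return max(range(len(h)), key=lambda i: scores[i])


def correctable_errors(d_min):
    """Classical guaranteed capacity: floor((d_min - 1) / 2) symbol errors."""
    return (d_min - 1) // 2


# Illustrative use: length-16 code, so distinct rows differ in 8 positions.
H = sylvester_hadamard(4)
d_min = len(H) // 2
codeword = encode(5, H)
corrupted = list(codeword)
for pos in (0, 3, 7):  # flip as many symbols as the bound guarantees
    corrupted[pos] = -corrupted[pos]
recovered = decode(corrupted, H)
```

Any two distinct rows of a Hadamard matrix of order n differ in exactly n/2 positions, which is why `d_min` is taken as `len(H) // 2` here. The abstract's claim is that the 2D scheme can exceed the capacity this bound gives for the 1D case.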
Fabric Defect Detection Based on Illumination Correction and Visual Salient Features
To address the influence of uneven illumination on fabric feature extraction and the limitations of traditional frequency-based visual saliency algorithms, we propose a fabric defect detection method that combines illumination correction with visual salient features: (1) construct a multi-scale side window box (MS-BOX) filter to extract the illumination component of the image, then use the constructed two-dimensional gamma correction function to correct the illumination of the image globally, and finally enhance the contrast of the image locally; (2) use the L0 gradient minimization method to remove the background texture of fabric images and highlight the defects; (3) represent the fabric image as a quaternion image, where each pixel in the image is represented by a quaternion consisting of color, intensity and edge characteristics. The two-dimensional fractional Fourier transform (2D-FRFT) is used to obtain the saliency map of the quaternion image. Experiments show that our method has a higher overall recall rate for defect detection of star-patterned, box-patterned, and dot-patterned fabrics, and that its overall recall-precision performance is better than that of other existing methods.
Point-wise saliency detection on 3D point clouds via covariance descriptors
In the human visual system, visual saliency perception is a rapid pre-attention processing mechanism, which can benefit myriad visual tasks, such as segmentation, localization, and detection. While most research is devoted to saliency detection on 2D images and 3D meshes, little work has been performed for efficient saliency detection on 3D point clouds. In this paper, we present a novel point clouds saliency detection method by employing principal component analysis (PCA) in a sigma-set feature space. In this method, we first construct local shape descriptors based on covariance matrices for saliency detection, considering that covariance matrices can naturally model nonlinear correlations of different low-level compact and rotational-invariant features. Secondly, we transform these covariance matrices to vector descriptors in Euclidean vector space by applying the sigma-point technique, which keeps the inner statistics of regions of 3D point clouds. Based on our informative descriptors, PCA is employed in the descriptor space for identifying saliency patterns in a point cloud. Our method shows its advantages of being structure-sensitive, capturing geometry information and computationally efficient. Experimental results demonstrate that our method achieves good performance without using any topological information.
A novel approach for salient object detection using double-density dual-tree complex wavelet transform in conjunction with superpixel segmentation
Salient object detection in wavelet domain has recently begun to attract researchers’ effort due to its desired ability to provide multi-scale analysis of an image simultaneously in both frequency and spatial domains. The proposed algorithm exploits the inherent multi-scale structure of the double-density dual-tree complex-oriented wavelet transform (DDDTCWT) to decompose each input image into four approximate sub-band images and 32 high-pass detailed sub-band images at each scale. These 32 detailed high-pass sub-bands at each scale are adequate to represent singularities of any geometric object with high precision and to mimic zooming-in and zooming-out process of human vision system. In the proposed model, we first compute a rough segmented saliency map (RSSM) by fusing multi-scale edge-to-texture features generated from DDDTCWT with segmentation results obtained from bipartite graph partitioning-based segmentation approach. Then, each pixel in RSSM is categorized into either background region or salient region based on a threshold. Finally, the pixels of the two regions are considered as samples to be drawn from a multivariate kernel function whose parameters are estimated using expectation maximization algorithm, to generate a saliency map. The performance of the proposed model is evaluated in terms of precision, recall, F-measure, area under the ROC curve and computation time using six publicly available image datasets. Extensive experimental results on six benchmark datasets demonstrate that the proposed model outperformed the existing 29 state-of-the-art methods in terms of F-measure on all five datasets, recall on four datasets and area under ROC curve on two datasets. In terms of mean recall value, mean F-measure value and mean AUC value on all six datasets, the proposed method outperforms all state-of-the-art methods. The proposed method also takes comparatively less computation time in comparison with many existing spatial domain methods.
Fast Component-Based QR Code Detection in Arbitrarily Acquired Images
Quick Response (QR) codes are a type of 2D barcode that is becoming very popular, with several application possibilities. Since they can encode alphanumeric characters, a rich set of information can be made available through encoded URL addresses. In particular, QR codes could be used to help visually impaired and blind people access web-based voice information systems and services, and to help autonomous robots acquire context-relevant information. However, in order to be decoded, QR codes need to be properly framed, something that robots and visually impaired or blind people will not be able to do easily without guidance. Therefore, any application that aims to assist robots or visually impaired people must be able to detect QR codes and guide the user to frame the code properly. A fast component-based two-stage approach for detecting QR codes in arbitrarily acquired images is proposed in this work. In the first stage, regular components present at three corners of the code are detected, and in the second stage geometrical restrictions among detected components are verified to confirm the presence of a code. Experimental results show a high detection rate, above 90%, at a speed compatible with real-time applications.
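The second stage described in this abstract, checking geometrical restrictions among detected corner components, can be illustrated with a small sketch. The abstract does not state which restrictions are used, so the rule below is an assumption: the three candidate centres should form a roughly right-angled isosceles triangle, as the three corner markers of a square code do when viewed head-on. The function name `looks_like_qr_corners` and the tolerance `tol` are hypothetical.

```python
import math
from itertools import combinations


def looks_like_qr_corners(p1, p2, p3, tol=0.15):
    """Check whether three candidate centres could be the corners of a QR code.

    Assumed rule (not taken from the paper): the two shortest pairwise
    distances are nearly equal, and the longest one is close to the
    hypotenuse of the right triangle that those two legs would form.
    """
    leg_a, leg_b, longest = sorted(
        math.dist(a, b) for a, b in combinations((p1, p2, p3), 2)
    )
    if leg_a == 0:
        return False  # two candidates coincide, so this is not a valid triple
    legs_equal = abs(leg_a - leg_b) <= tol * leg_b
    expected_hypotenuse = math.hypot(leg_a, leg_b)
    right_angle = abs(longest - expected_hypotenuse) <= tol * expected_hypotenuse
    return legs_equal and right_angle


# Illustrative use with made-up pixel coordinates.
square_like = looks_like_qr_corners((0, 0), (10, 0), (0, 10))
collinear = looks_like_qr_corners((0, 0), (10, 0), (20, 0))
```

A relative tolerance is used so that the same check works regardless of how large the code appears in the image. Strong perspective distortion would violate the assumed rule and would need a looser or different test.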
A novel method for 2D-to-3D video conversion based on boundary information
This paper proposes a novel method for 2D-to-3D video conversion that uses boundary information to generate the depth map automatically. First, we use a Gaussian model to detect foreground objects and then separate the foreground from the background. Second, we employ a superpixel algorithm to find the edge information. According to the superpixels, we assign the corresponding hierarchical depth values to the initial depth map. Based on the result of the depth value assignment, we detect edges using Sobel edge detection with two thresholds to strengthen the edge information. To identify the boundary pixels, we use a thinning algorithm to refine the edge detection result. Following these results, we assign the depth value of the foreground to refine the depth map. We use four kinds of scanning paths over the entire image to create a more accurate depth map, which gives the final depth map. Finally, we utilize depth image-based rendering (DIBR) to synthesize left and right view images. After combining the depth map and the original 2D video, a vivid 3D video is produced.
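The Sobel step mentioned in this abstract can be sketched as follows. The abstract does not say how the two thresholds are applied, so the labelling below (strong edge at or above `high`, weak edge between `low` and `high`) is an assumption, and the function names are hypothetical. The image is a plain list of rows of grey values, and border pixels are left unlabelled.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Return the gradient magnitude for each interior pixel (borders stay 0)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            magnitude[r][c] = math.hypot(gx, gy)
    return magnitude


def label_edges(image, low, high):
    """Label pixels: 0 = no edge, 1 = weak edge, 2 = strong edge (assumed scheme)."""
    return [
        [2 if m >= high else 1 if m >= low else 0 for m in row]
        for row in sobel_magnitude(image)
    ]


# Illustrative use: a 5x5 image with a vertical step from 0 to 100.
step_image = [[0, 0, 100, 100, 100] for _ in range(5)]
labels = label_edges(step_image, low=100, high=300)
```

Pixels on either side of the step get a large horizontal gradient and are labelled strong, while the flat region to the right gets a gradient of zero.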
Using a Deep Learning Model on Images to Obtain a 2D Laser People Detector for a Mobile Robot
Recent improvements in deep learning techniques applied to images allow the detection of people with a high success rate. However, other types of sensors, such as laser rangefinders, are still useful due to their wide field of vision and their ability to operate in different environments and lighting conditions. In this work we use a deep learning method to detect people in images taken by a mobile robot. The masks of the people in the images are used to automatically label a set of samples formed from 2D laser range data, which allows us to detect the legs of people present in the scene. The samples are geometric characteristics of the clusters built from the laser data. Machine learning algorithms are used to learn a classifier that is capable of detecting people from 2D laser range data alone. Our people detector is compared to a state-of-the-art classifier. Our proposal achieves a higher F1 score on the test set using an unbalanced dataset. To improve accuracy, the final classifier has been generated from a balanced training set. This final classifier has also been evaluated using a test set, on which we obtained very high accuracy values in each class. The contribution of this work is twofold. On the one hand, our proposal labels the samples automatically, so the dataset can be collected under real operating conditions. On the other hand, the robot can detect people in a wider field of view than with a camera alone, which can help build more robust behaviors.