3,155 results for "Convolutional Neural Networks (CNNs)"
A deep learning based fusion of RGB camera information and magnetic localization information for endoscopic capsule robots
A reliable, real-time localization functionality is crucial for actively controlled capsule endoscopy robots, which are an emerging, minimally invasive diagnostic and therapeutic technology for the gastrointestinal (GI) tract. In this study, we extend the success of deep learning approaches from various research fields to the problem of sensor fusion for endoscopic capsule robots. We propose a multi-sensor fusion based localization approach that combines endoscopic camera information with magnetic sensor based localization information. Results on a real pig stomach dataset show that our method achieves sub-millimeter precision for both translational and rotational movements.
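As a hedged illustration of the fusion idea (not the paper's exact architecture), the sketch below encodes the RGB frame with a small CNN, concatenates the features with the magnetic localization vector, and regresses a 6-DoF pose. Layer sizes and the magnetic vector dimension are assumptions.

```python
# Hypothetical late-fusion sketch: CNN features from the endoscopic RGB frame
# are concatenated with the magnetic localization vector, and an MLP regresses
# a 6-DoF pose (3 translation + 3 rotation components).
import torch
import torch.nn as nn

class FusionLocalizer(nn.Module):
    def __init__(self, mag_dim=5):  # mag_dim: assumed magnetic sensor vector size
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + mag_dim, 64), nn.ReLU(),
            nn.Linear(64, 6),  # [tx, ty, tz, roll, pitch, yaw]
        )

    def forward(self, rgb, mag):
        feats = self.cnn(rgb)  # image features
        return self.head(torch.cat([feats, mag], dim=1))

pose = FusionLocalizer()(torch.randn(1, 3, 64, 64), torch.randn(1, 5))
print(pose.shape)  # torch.Size([1, 6])
```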
Designing Unmanned Aerial Survey Monitoring Program to Assess Floating Litter Contamination
Monitoring marine contamination by floating litter can be particularly challenging since debris is continuously moved over a large spatial extent by currents, waves, and winds. Assessments of floating litter contamination have mostly relied on opportunistic surveys from vessels, modeling and, more recently, remote sensing with spectral analysis. This study explores how a low-cost commercial unmanned aircraft system equipped with a high-resolution RGB camera can be used as an alternative to conduct floating litter surveys in coastal waters or from vessels. The study compares different processing and analytical strategies and discusses operational constraints. Collected UAS images were analyzed using three different approaches: (i) manual counting (MC), using visual inspection and image annotation with object counts as a baseline; (ii) pixel-based detection, an automated color analysis process to assess overall contamination; and (iii) machine learning (ML), automated object detection and identification using state-of-the-art convolutional neural networks (CNNs). Our findings illustrate that MC remains the most precise method for classifying different floating objects. ML performance in correctly identifying different classes of floating litter is still heterogeneous; however, it demonstrates promising results in detecting floating items, which can be leveraged to scale up monitoring efforts and used in automated analysis of large sets of imagery to assess relative floating litter contamination.
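The abstract does not specify the pixel-based detection step; as one plausible reading, the sketch below flags pixels whose color deviates strongly from the dominant water color and reports a contaminated-pixel fraction. The z-score rule and threshold are illustrative assumptions, not the study's method.

```python
# Illustrative pixel-based detector (approach (ii) above, thresholds assumed):
# flag pixels whose RGB values deviate strongly from the dominant water colour
# and report the contaminated-pixel fraction.
import numpy as np

def litter_pixel_fraction(rgb, z=3.0):
    """rgb: HxWx3 float array in [0,1]; z: deviation threshold (assumed)."""
    flat = rgb.reshape(-1, 3)
    mu, sigma = flat.mean(axis=0), flat.std(axis=0) + 1e-8
    dev = np.abs((flat - mu) / sigma).max(axis=1)  # per-pixel max z-score
    return float((dev > z).mean())                 # fraction of outlier pixels

img = np.random.rand(480, 640, 3) * 0.1            # mostly uniform "water"
img[100:110, 200:220] = 1.0                        # bright synthetic "litter"
print(f"contaminated fraction: {litter_pixel_fraction(img):.4%}")
```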
CellNet: A Lightweight Model towards Accurate LOC-Based High-Speed Cell Detection
Label-free cell separation and sorting in a microfluidic system, an essential technique for modern cancer diagnosis, has made high-throughput single-cell analysis a reality. However, designing an efficient cell detection model is challenging. Traditional cell detection methods are subject to occlusion boundaries and weak textures, resulting in poor performance. Modern detection models based on convolutional neural networks (CNNs) have achieved promising results at the cost of a large number of both parameters and floating point operations (FLOPs). In this work, we present a lightweight yet powerful cell detection model named CellNet, which includes two efficient modules, CellConv blocks and the h-swish nonlinearity function. CellConv is proposed as an effective feature extractor and a substitute for computationally expensive convolutional layers, whereas the h-swish function is introduced to increase the nonlinearity of the compact model. To boost the prediction and localization ability of the detection model, we re-designed the model’s multi-task loss function. In comparison with other efficient object detection methods, our approach achieved a state-of-the-art 98.70% mean average precision (mAP) on our custom sea urchin embryos dataset with only 0.08 M parameters and 0.10 B FLOPs, reducing the size of the model by 39.5× and the computational cost by 4.6×. We deployed CellNet on different platforms to verify its efficiency. The inference speed on a graphics processing unit (GPU) was 500.0 fps, compared with 87.7 fps on a CPU. Additionally, CellNet is 769.5 times smaller and 420 fps faster than YOLOv3. Extensive experimental results demonstrate that CellNet achieves an excellent efficiency/accuracy trade-off on resource-constrained platforms.
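The h-swish nonlinearity mentioned above has a standard closed form, h_swish(x) = x · ReLU6(x + 3) / 6, which replaces the sigmoid gate in swish with a cheap piecewise-linear one; a minimal sketch:

```python
# Standard h-swish definition (as popularized by MobileNetV3):
# h_swish(x) = x * ReLU6(x + 3) / 6
import torch
import torch.nn.functional as F

def h_swish(x: torch.Tensor) -> torch.Tensor:
    return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4, 4, 9)
print(h_swish(x))  # ~0 for x <= -3, approaches x for x >= 3
```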
Automatic Ceiling Damage Detection in Large-Span Structures Based on Computer Vision and Deep Learning
To alleviate the workload of prevailing expert-based onsite inspection, a vision-based method using state-of-the-art deep learning architectures is proposed to automatically detect ceiling damage in large-span structures. The dataset consists of 914 images collected by the Kawaguchi Lab since 1995, with over 7000 learnable damage instances in the ceilings, categorized into four typical damage forms (peelings, cracks, distortions, and fall-offs). Twelve detection models are established, trained, and compared through variable hyperparameter analysis. The best performing model reaches a mean average precision (mAP) of 75.28%, which is considerably high for object detection. A comparative study indicates that the model is generally robust to the challenges in ceiling damage detection, including partial occlusion by visual obstructions, extremely varied aspect ratios, small object detection, and multi-object detection. Another comparative study of F1 score performance, which combines precision and recall into a single metric, shows that the model outperforms the CNN (convolutional neural network) model using the Saliency-MAP method in our previous research to a remarkable extent. In the case of a large area ratio of non-ceiling regions, the F1 scores of these two models are 0.83 and 0.28, respectively. The findings of this study push automatic ceiling damage detection in large-span structures one step further.
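For reference, the F1 score used in this comparison is the harmonic mean of precision and recall, F1 = 2PR / (P + R); a quick check (the precision/recall pair below is illustrative, not reported in the abstract):

```python
# F1 is the harmonic mean of precision (P) and recall (R): F1 = 2*P*R / (P + R)
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# e.g. precision = recall = 0.83 gives F1 = 0.83 (illustrative values)
print(round(f1(0.83, 0.83), 2))  # 0.83
```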
A CNN-Based Method of Vehicle Detection from Aerial Images Using Hard Example Mining
Recently, deep learning techniques have come to play a practical role in vehicle detection. While much effort has been spent on applying deep learning to vehicle detection, the effective use of training data has not been thoroughly studied, although it has great potential for improving training results, especially when the training data are sparse. In this paper, we propose using hard example mining (HEM) in the training process of a convolutional neural network (CNN) for vehicle detection in aerial images. We apply HEM to stochastic gradient descent (SGD) to choose the most informative training data by calculating the loss values in each batch and employing the examples with the largest losses. We picked 100 out of both 500 and 1000 examples for training in one iteration, and we tested different ratios of positive to negative examples in the training data to evaluate how the balance of positive and negative examples affects performance. In all cases, our method outperformed plain SGD. The experimental results for images from New York showed improved performance over a CNN trained with plain SGD, with the F1 score of our method being 0.02 higher.
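A minimal sketch of hard example mining inside one SGD step, as the abstract describes it (score a candidate pool with per-example losses, then update only on the k largest); the model, pool sizes, and data here are placeholders:

```python
# Hard example mining with SGD: keep only the k highest-loss examples
# from a candidate pool for the gradient update.
import torch
import torch.nn as nn

model = nn.Linear(128, 2)                        # stand-in vehicle/background classifier
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss(reduction="none")  # per-example losses

pool_x = torch.randn(500, 128)                   # candidate pool (500 examples)
pool_y = torch.randint(0, 2, (500,))
k = 100                                          # keep the 100 hardest, as in the paper

with torch.no_grad():
    losses = loss_fn(model(pool_x), pool_y)
hard_idx = torch.topk(losses, k).indices         # indices of the largest losses

opt.zero_grad()
loss_fn(model(pool_x[hard_idx]), pool_y[hard_idx]).mean().backward()
opt.step()
```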
Empirical study of 3D-HPE on HOI4D egocentric vision dataset based on deep learning
3D hand pose estimation (3D-HPE) is one of the tasks performed on data obtained from an egocentric vision camera (EVC), alongside hand detection, segmentation, and gesture recognition, with applications in fields such as HCI, HRI, VR, AR, healthcare, and support for visually impaired people. In these applications, hand point cloud data obtained from egocentric vision is very challenging because the hand is obscured by the gaze direction and other objects. Our paper performs a comparative study on 3D right-hand pose estimation (3D-R-HPE) on the HOI4D dataset, which was collected and annotated using four cameras. This is a very challenging dataset and was published at CVPR 2022. We use CNNs (P2PR PointNet, Hand PointNet, V2V-PoseNet, and HandFoldingNet - HFNet) to fine-tune the 3D-HPE model based on the point cloud data (PCD) of the hand. The resulting errors of 3D-HPE are as follows: P2PR PointNet (average error (Erra) of 32.71 mm), Hand PointNet (Erra of 35.12 mm), V2V-PoseNet (Erra of 26.32 mm), and HFNet (Erra of 20.49 mm). HFNet is the most recent of these CNNs (from 2021) and gives the best results. This estimation error is small, so the approach can be applied to automatically detect, estimate, and recognize hand pose from egocentric vision data. HFNet is also the fastest, with an average processing speed of 5.4 fps on a GPU. Detailed quantitative and qualitative results are presented that are beneficial to various applications such as human-computer interaction, virtual and augmented reality, and healthcare, particularly in challenging scenarios involving occlusions and complex datasets.
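The Erra values above read as mean per-joint 3D position errors; a sketch of that metric under assumed shapes (21 hand joints, millimetre units, synthetic data):

```python
# Mean per-joint position error: average Euclidean distance between
# predicted and ground-truth 3D joint positions.
import numpy as np

def mean_joint_error_mm(pred, gt):
    """pred, gt: (N, J, 3) arrays of 3D joint positions in millimetres."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

pred = np.random.rand(10, 21, 3) * 100           # 21 hand joints, synthetic poses
gt = pred + np.random.randn(10, 21, 3) * 5
print(f"Erra = {mean_joint_error_mm(pred, gt):.2f} mm")
```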
BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network
Sign language recognition is one of the most challenging applications in machine learning and human-computer interaction. Many researchers have developed classification models for different sign languages such as English, Arabic, Japanese, and Bengali; however, no significant research has been done on generalization performance across different datasets. Most research work has achieved satisfactory performance with a small dataset, and these models may fail to replicate the same performance when evaluated on different and larger datasets. In this context, this paper proposes a novel method for recognizing Bengali sign language (BSL) alphabets that overcomes the issue of generalization. The proposed method has been evaluated on three benchmark datasets: ‘38 BdSL’, ‘KU-BdSL’, and ‘Ishara-Lipi’. Three steps are followed to achieve the goal: segmentation, augmentation, and convolutional neural network (CNN) based classification. Firstly, a concatenated segmentation approach combining YCbCr, HSV and the watershed algorithm was designed to accurately identify gesture signs. Secondly, seven image augmentation techniques were selected to increase the training data size without changing the semantic meaning. Finally, the CNN-based model called BenSignNet was applied to extract features and perform classification. The model achieved accuracies of 94.00%, 99.60%, and 99.60% on the 38 BdSL, KU-BdSL, and Ishara-Lipi datasets, respectively. Experimental findings confirmed that our proposed method achieves a higher recognition rate than conventional ones and accomplishes a generalization property across all datasets in the BSL domain.
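As a hedged sketch of the concatenated segmentation step, one plausible implementation intersects skin-color masks computed in YCbCr and HSV space and refines the result with the watershed transform; the threshold ranges below are common skin-tone heuristics, not the paper's values.

```python
# Concatenated colour-space segmentation sketch: intersect YCbCr and HSV skin
# masks, then refine the region with OpenCV's marker-based watershed.
import cv2
import numpy as np

def segment_hand(bgr):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    m1 = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))  # YCbCr skin mask
    m2 = cv2.inRange(hsv, (0, 30, 60), (25, 255, 255))      # HSV skin mask
    mask = cv2.bitwise_and(m1, m2)                           # concatenated cue

    # Watershed refinement: derive sure foreground/background from the mask.
    sure_bg = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=3)
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1                                    # background label = 1
    markers[cv2.subtract(sure_bg, sure_fg) == 255] = 0       # unknown region = 0
    return cv2.watershed(bgr, markers)

seg = segment_hand(np.random.randint(0, 255, (128, 128, 3), np.uint8))
```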
Enhancing Plant Disease Detection: Incorporating Advanced CNN Architectures for Better Accuracy and Interpretability
Convolutional Neural Networks (CNNs) have proven effective in automated plant disease diagnosis, significantly contributing to crop health monitoring. However, their limited interpretability hinders practical deployment in real-world agricultural settings. To address this, we explore advanced CNN architectures, namely ResNet-50 and EfficientNet, augmented with attention mechanisms. These models enhance accuracy by optimizing depth, width, and resolution, while attention layers improve transparency by focusing on disease-relevant regions. Experiments using the PlantVillage dataset show that basic CNNs achieve 46.69% accuracy, while ResNet-50 and EfficientNet attain 63.79% and 98.27%, respectively. On a 39-class extended dataset, our proposed EfficientNet-B0 with attention (EfficientNetB0-Attn), integrating an attention module at layer 262, achieves 99.39% accuracy. This approach significantly enhances interpretability without compromising performance. The attention module generates weights via backpropagation, allowing the model to emphasize disease-relevant image regions, thereby enhancing both accuracy and interpretability.
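The abstract does not detail the attention module's internals; the sketch below shows one common way such a block re-weights feature channels so the backbone can emphasize disease-relevant regions. The squeeze-and-excitation-style design and the feature-map size are assumptions, not the paper's specification.

```python
# Channel-attention sketch for an EfficientNet-B0-style feature map: global
# pooling produces per-channel weights that re-scale the feature channels.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global pool -> channel weights
        return x * w[:, :, None, None]           # emphasize relevant channels

feat = torch.randn(2, 1280, 7, 7)                # assumed B0 final feature map
print(ChannelAttention(1280)(feat).shape)        # torch.Size([2, 1280, 7, 7])
```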
Forecasting Vertical Profiles of Ocean Currents from Surface Characteristics: A Multivariate Multi-Head Convolutional Neural Network–Long Short-Term Memory Approach
While the study of ocean dynamics usually involves modeling deep ocean variables, monitoring and accurate forecasting of nearshore environments are also critical. However, sensor observations often contain artifacts like long stretches of missing data and noise, typically after an extreme event or accidental damage to the sensors. Such data artifacts, if not handled diligently prior to modeling, can significantly impact the reliability of any further predictive analysis. Therefore, we present a framework that integrates data reconstruction of key sea state variables and multi-step-ahead forecasting of current speed from the reconstructed time series for 19 depth levels simultaneously. Using multivariate chained regressions, the reconstruction algorithm rigorously tests an ensemble of tree-based models (fed only with surface characteristics) to impute gaps in the vertical profiles of the sea state variables down to 20 m deep. Subsequently, a deep encoder–decoder model, comprising multi-head convolutional networks, extracts high-level features from each depth level’s multivariate (reconstructed) input and feeds them to a deep long short-term memory network for 24 h ahead forecasts of current speed profiles. In this work, we utilized Viking buoy data and demonstrated that, with limited training data, we could explain an overall 80% of the variation in the current speed profiles across the forecast period and the depth levels.
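One plausible reading of the multi-head CNN–LSTM design, sketched below: a small 1-D convolutional head per depth level extracts features from that level's multivariate history, and the concatenated features feed an LSTM that emits a 24-step current-speed forecast for all 19 levels. All sizes (history length, feature count, hidden width) are illustrative assumptions.

```python
# Multi-head CNN -> LSTM encoder-decoder sketch for per-depth current-speed
# forecasting; one conv head per depth level, shared LSTM over time.
import torch
import torch.nn as nn

class MultiHeadCNNLSTM(nn.Module):
    def __init__(self, levels=19, feats=4, horizon=24):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv1d(feats, 8, kernel_size=3, padding=1) for _ in range(levels)
        )
        self.lstm = nn.LSTM(input_size=8 * levels, hidden_size=64, batch_first=True)
        self.out = nn.Linear(64, levels)          # per-step speed at each depth
        self.horizon = horizon

    def forward(self, x):                          # x: (N, levels, feats, history)
        h = torch.cat([head(x[:, i]) for i, head in enumerate(self.heads)], dim=1)
        seq, _ = self.lstm(h.transpose(1, 2))      # (N, history, 64)
        last = seq[:, -1:]                         # encoder summary
        return self.out(last.repeat(1, self.horizon, 1))  # (N, 24, levels)

y = MultiHeadCNNLSTM()(torch.randn(2, 19, 4, 72))  # 72 h of assumed history
print(y.shape)  # torch.Size([2, 24, 19])
```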
Real Time Multipurpose Smart Waste Classification Model for Efficient Recycling in Smart Cities Using Multilayer Convolutional Neural Network and Perceptron
Urbanization has been a major concern for both developed and developing countries in recent years. People move with their families to urban areas for the sake of better education and a modern lifestyle. Due to rapid urbanization, cities face huge challenges, one of which is waste management, as the volume of waste is directly proportional to the population of the city. Municipalities and city administrations use traditional waste classification techniques, which are manual, very slow, inefficient and costly. Therefore, automatic waste classification and management is essential for urbanizing cities to enable better recycling of waste. Better recycling gives the opportunity to reduce the amount of waste sent to landfills by reducing the need to collect new raw material. In this paper, the idea of a real-time smart waste classification model is presented that uses a hybrid approach to classify waste into various classes. Two machine learning models, a multilayer perceptron and a multilayer convolutional neural network (ML-CNN), are implemented. The multilayer perceptron is used to provide binary classification, i.e., metal or non-metal waste, and the CNN identifies the class of non-metal waste. A camera is placed in front of the waste conveyor belt, which takes a picture of the waste and classifies it. Upon successful classification, an automatic hand hammer is used to push the waste into the assigned labeled bucket. Experiments were carried out in a real-time environment with image segmentation. The training, testing, and validation accuracy of the proposed model was 0.99 under different training batches with different input features.
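The two-stage routing described above can be sketched as follows (a reading of the abstract, not the authors' code): an MLP first decides metal vs. non-metal from sensor features, and only non-metal items are passed to the CNN for fine-grained classification. All dimensions, class labels, and the sensor feature vector are assumptions.

```python
# Hybrid MLP + CNN routing sketch: binary metal/non-metal gate, then a CNN
# for the non-metal subclass.
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # metal gate
cnn = nn.Sequential(                                                 # non-metal classes
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 5),  # e.g. paper/plastic/glass/organic/other (assumed)
)

def classify(sensor_vec, image):
    if mlp(sensor_vec).argmax(dim=1).item() == 0:  # class 0 = metal (assumed)
        return "metal"
    return f"non-metal class {cnn(image).argmax(dim=1).item()}"

print(classify(torch.randn(1, 64), torch.randn(1, 3, 64, 64)))
```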