2,127 result(s) for "Attention map"
Oil Spill Identification based on Dual Attention UNet Model Using Synthetic Aperture Radar Images
Oil spills cause tremendous damage to marine and coastal environments and ecosystems. Previous deep learning-based studies have treated oil spill detection as a semantic segmentation problem. However, further improvement is still required to handle the noisy nature of Synthetic Aperture Radar (SAR) imagery, which limits segmentation performance. In this study, a new deep learning model based on the Dual Attention Model (DAM) is developed to automatically detect oil spills in a water body. We enhanced a conventional UNet segmentation network by integrating a DAM to selectively highlight the relevant and discriminative global and local characteristics of oil spills in SAR imagery. DAM is composed of a Channel Attention Map and a Position Attention Map, which are stacked in the decoder network of UNet. The proposed DAM-UNet is compared with four baselines, namely a fully convolutional network, PSPNet, LinkNet, and the traditional UNet, and empirically outperforms all four. Moreover, the EG-Oil Spill dataset includes a large set of SAR images with 3000 image pairs. The overall accuracy of the proposed method reaches 94.2%, an increase of 3.2% over the traditional UNet. The study opens new development ideas for integrating attention modules into other deep learning tasks, including machine translation, image-based analysis, action recognition, and speech recognition.
Deterioration Level Estimation Based on Convolutional Neural Network Using Confidence-Aware Attention Mechanism for Infrastructure Inspection
This paper presents deterioration level estimation based on convolutional neural networks using a confidence-aware attention mechanism for infrastructure inspection. Spatial attention mechanisms try to highlight the important regions in feature maps for estimation by using an attention map. The attention mechanism using an effective attention map can improve feature maps. However, the conventional attention mechanisms have a problem as they fail to highlight important regions for estimation when an ineffective attention map is mistakenly used. To solve the above problem, this paper introduces the confidence-aware attention mechanism that reduces the effect of ineffective attention maps by considering the confidence corresponding to the attention map. The confidence is calculated from the entropy of the estimated class probabilities when generating the attention map. Because the proposed method can effectively utilize the attention map by considering the confidence, it can focus more on the important regions in the final estimation. This is the most significant contribution of this paper. The experimental results using images from actual infrastructure inspections confirm the performance improvement of the proposed method in estimating the deterioration level.
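The abstract above says the confidence is calculated from the entropy of the estimated class probabilities, but it gives no formula. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes confidence is one minus the Shannon entropy normalized by its maximum, and that the attention map reweights features as `1 + confidence * attention`. The function names `entropy_confidence` and `apply_attention` are hypothetical.

```python
import math


def entropy_confidence(probs):
    """Return a confidence in [0, 1]: 1 minus the normalized Shannon entropy.

    Assumption: the abstract only states that confidence comes from the
    entropy of the class probabilities; this normalization is illustrative.
    """
    k = len(probs)
    if k < 2:
        return 1.0  # a single class carries no uncertainty
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)  # log(k) is the maximum entropy


def apply_attention(features, attention, confidence):
    """Scale each feature by 1 + confidence * attention.

    Hypothetical weighting rule: with confidence 0 the attention map is
    ignored and the features pass through unchanged.
    """
    return [
        [f * (1.0 + confidence * a) for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(features, attention)
    ]


if __name__ == "__main__":
    uncertain = entropy_confidence([0.25, 0.25, 0.25, 0.25])
    certain = entropy_confidence([0.97, 0.01, 0.01, 0.01])
    print(uncertain, certain)
    print(apply_attention([[2.0, 4.0]], [[0.5, 0.0]], certain))
```

Under this assumed rule, a uniform class distribution yields zero confidence, so an attention map produced from an uncertain prediction has no effect on the features.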
TA-DNN—two stage attention-based deep neural network for single image rain removal
Computer vision algorithms are easily degraded by images or videos captured outdoors in bad weather conditions such as rain, fog, snow, and haze. A two-stage deep neural network based on attention learning is proposed for single image rain removal. Because the implicit connection among rain lines within an image is stronger than that between a rain line and the background image, attention learning makes it easier to learn the rain lines in a rainy image. The proposed Two-stage Attention-based Deep Neural Network (TA-DNN) for single image rain removal essentially consists of modules such as Inception, Sequential Dual Attention Block (SDAB), and Multi-Scale Feature Aggregation Module (MSFAM) for feature extraction, rain line detection, and transfiguration, respectively. The experimental results show that the proposed method outperforms state-of-the-art methods both qualitatively and quantitatively.
Image super-resolution reconstruction based on feature map attention mechanism
Existing image super-resolution reconstruction methods treat the low-frequency and high-frequency components of feature maps equally. To address this issue, the paper proposes an image super-resolution reconstruction method that applies an attention mechanism to feature maps to facilitate reconstruction from original low-resolution images to multi-scale super-resolution images. The proposed model consists of a feature extraction block, an information extraction block, and a reconstruction module. First, the feature extraction block extracts useful features from low-resolution images, with multiple information extraction blocks being combined with the feature map attention mechanism and passed between feature channels. Second, the interdependence between feature channels is used to adaptively adjust the channel characteristics to restore more details. Finally, the reconstruction module produces high-resolution images at different scales. The experimental results demonstrate that the proposed method improves both the visual quality of images and the results on the Set5, Set14, Urban100, and Manga109 datasets. The results also indicate that the proposed method is structurally similar to the image reconstruction methods. Furthermore, the Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM) evaluation metrics improve to a certain degree, indicating that the feature map attention mechanism is effective in image super-resolution reconstruction applications.
Brain tumor detection and classification in MRI using hybrid ViT and GRU model with explainable AI in Southern Bangladesh
Brain tumor, a leading cause of uncontrolled cell growth in the central nervous system, presents substantial challenges in medical diagnosis and treatment. Early and accurate detection is essential for effective intervention. This study aims to enhance the detection and classification of brain tumors in Magnetic Resonance Imaging (MRI) scans using an innovative framework combining Vision Transformer (ViT) and Gated Recurrent Unit (GRU) models. We utilized primary MRI data from Bangabandhu Sheikh Mujib Medical College Hospital (BSMMCH) in Faridpur, Bangladesh. Our hybrid ViT-GRU model extracts essential features via ViT and identifies relationships between these features using GRU, addressing class imbalance and outperforming existing diagnostic methods. We extensively processed the dataset, then trained the model using various optimizers (SGD, Adam, AdamW) and evaluated it through rigorous 10-fold cross-validation. Additionally, we incorporated Explainable Artificial Intelligence (XAI) techniques (Attention Map, SHAP, and LIME) to enhance the interpretability of the model’s predictions. For the primary dataset BrTMHD-2023, the ViT-GRU model achieved precision, recall, and F1-score metrics of 97%. The highest accuracies obtained with SGD, Adam, and AdamW optimizers were 81.66%, 96.56%, and 98.97%, respectively. Our model outperformed existing Transfer Learning models by 1.26%, as validated through comparative analysis and cross-validation. The proposed model also performs well on another Brain Tumor Kaggle dataset, reaching 96.08% accuracy and outperforming existing research on the same dataset. The proposed ViT-GRU framework significantly improves the detection and classification of brain tumors in MRI scans. The integration of XAI techniques enhances the model’s transparency and reliability, fostering trust among clinicians and facilitating clinical application. Future work will expand the dataset and apply findings to real-time diagnostic devices, advancing the field.
Deep Discriminative Representation Learning with Attention Map for Scene Classification
In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of feature representations extracted from CNNs is still limited in distinguishing more complex remote sensing images. Therefore, we propose a unified feature fusion framework based on an attention mechanism, called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). First, by applying the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, attention maps associated with the predicted results are generated so that the CNNs focus on the most salient parts of the image. Second, a spatial feature transformer (SFT) is designed to extract discriminative features from attention maps. Then, an innovative two-channel CNN architecture is proposed that fuses the features extracted from attention maps and the RGB (red green blue) stream. A new objective function that considers both center loss and cross-entropy loss is optimized to decrease the influence of inter-class dispersion and within-class variance. To show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, the visualization of features extracted by the proposed DDRL-AM method shows that the discriminative ability of the features has increased.
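The abstract above describes an objective that combines center loss with cross-entropy loss. The following is a minimal sketch for a single sample, assuming the common weighted-sum form (cross-entropy plus a weight times half the squared distance between the feature and its class center). The `weight` parameter, its default value, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, feature, label, centers, weight=0.01):
    """Cross-entropy plus weighted center loss (weight is an assumed hyperparameter)."""
    return cross_entropy(logits, label) + weight * center_loss(feature, centers[label])


if __name__ == "__main__":
    centers = [[0.0, 0.0], [5.0, 5.0]]  # hypothetical per-class feature centers
    print(joint_loss([2.0, 0.5], [0.3, -0.2], 0, centers))
```

In this form the cross-entropy term separates classes while the center term pulls each feature toward the center of its own class, which is the usual motivation for pairing the two.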
Interpretable and Reliable Oral Cancer Classifier with Attention Mechanism and Expert Knowledge Embedding via Attention Map
Convolutional neural networks have demonstrated excellent performance in oral cancer detection and classification. However, the end-to-end learning strategy makes CNNs hard to interpret, and it can be challenging to fully understand the decision-making procedure. Reliability is also a significant challenge for CNN-based approaches. In this study, we proposed a neural network called the attention branch network (ABN), which combines visual explanation and attention mechanisms to improve recognition performance and interpret the decision-making simultaneously. We also embedded expert knowledge into the network by having human experts manually edit the attention maps for the attention mechanism. Our experiments have shown that ABN performs better than the original baseline network. By introducing Squeeze-and-Excitation (SE) blocks to the network, the cross-validation accuracy increased further. Furthermore, we observed that some previously misclassified cases were correctly recognized after the attention maps were manually edited. The cross-validation accuracy increased from 0.846 to 0.875 with the ABN (Resnet18 as baseline), 0.877 with SE-ABN, and 0.903 after embedding expert knowledge. The proposed method provides an accurate, interpretable, and reliable oral cancer computer-aided diagnosis system through visual explanation, attention mechanisms, and expert knowledge embedding.
Visual Attention Consistency for Human Attribute Recognition
The recognition of a human attribute is usually determined by certain regions of the input image, e.g., certain parts of the human body, and such attribute-region relevance plays an important role in human attribute recognition. In deep networks, this attribute-region relevance can be derived as an interpretive attention map, where highlighted areas indicate the most relevant regions that contribute to the final recognition. Based on the assumption that more plausible attention maps indicate better networks, we propose a new approach for human attribute recognition that explores and enforces two kinds of attention consistency in network learning. The first kind enforces the equivariance of the attention map when the input image undergoes certain spatial transforms, such as scaling, rotation, and flipping. The second kind is enforced between the attention maps derived from two different networks when both are trained to recognize the same attribute from the same image. We formulate these two kinds of consistency as new loss functions and combine them with the traditional classification loss for network training. Experiments on three datasets of human attribute recognition verify the effectiveness of the proposed method, which achieves new state-of-the-art performance.
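The equivariance consistency described above can be illustrated for the flipping case: the attention map of a flipped image should equal the flipped attention map of the original image. The following is a minimal sketch, assuming attention maps are 2D lists of floats and that the penalty is a mean squared error. The abstract does not specify the loss form, so both the error measure and the function names are assumptions.

```python
def hflip(grid):
    """Horizontally flip a 2D grid given as a list of rows."""
    return [list(reversed(row)) for row in grid]


def flip_consistency_loss(attn_original, attn_of_flipped):
    """Mean squared error between the flipped attention map of the original
    image and the attention map computed on the flipped image.

    Assumption: MSE is used for illustration; the source does not state
    which distance the consistency loss uses.
    """
    expected = hflip(attn_original)
    diffs = [
        (e - a) ** 2
        for e_row, a_row in zip(expected, attn_of_flipped)
        for e, a in zip(e_row, a_row)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    attn = [[0.1, 0.9], [0.2, 0.8]]  # hypothetical attention map for an image
    # A perfectly equivariant network would produce hflip(attn) for the flipped image.
    print(flip_consistency_loss(attn, hflip(attn)))
    # A network that ignores the flip is penalized.
    print(flip_consistency_loss(attn, attn))
```

During training, a term like this would be added to the classification loss, so the network is rewarded for attending to the same body regions regardless of how the input is flipped.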
WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection
Breathtaking advances in face forgery techniques produce visually untraceable deepfake videos, and the potential malicious abuse of these techniques has sparked great concern. Existing deepfake detectors primarily focus on specific forgery patterns, using global features extracted by CNN backbones for forgery detection. Due to inadequate exploration of content and texture features, they often overfit to method-specific forged regions and thus generalize poorly to increasingly realistic forgeries. In this paper, we propose a Wavelet-guided Texture-Content HiErarchical Relation (WATCHER) Learning framework to delve deeper into the relation-aware texture-content features. Specifically, we propose a Wavelet-guided AutoEncoder scheme to retrieve the general visual representation, which is aware of high-frequency details for understanding forgeries. To further excavate fine-grained counterfeit clues, a Texture-Content Attention Maps Learning module is presented to enrich the contextual information of content and texture features via multi-level attention maps in a hierarchical learning protocol. Finally, we propose a Progressive Multi-domain Feature Interaction module to perform semantic reasoning on relationship-enhanced texture-content forgery features. Extensive experiments on popular benchmark datasets substantiate the superiority of our WATCHER model, which consistently outperforms state-of-the-art methods by a significant margin.
ID-insensitive deepfake detection model based on multi-attention mechanism
Deepfake technology has enabled the widespread distribution of manipulated facial content online, raising serious societal concerns. In recent years, deepfake detection has emerged as a critical research focus. However, existing methods frequently overlook the connection between local details and overall image features, while also failing to address the problem of implicit identity leakage. Consequently, their performance is suboptimal, particularly in cross-dataset evaluations. To address these issues, the proposed multi-attention deepfake detection model consists of the following three parts: (1) Texture Feature Enhancement: We employ CondenseNet to enhance texture features efficiently, preserving subtle details and ensuring feature integrity; (2) Multi-Scale Artifact Detection: We introduce an artifact detection module that identifies potentially manipulated regions, enabling localized detection and minimizing the impact of identity information; (3) Multi-Attention Mechanism: By generating multiple attention maps, our model prioritizes different regions of the input image, fusing both texture and local features to improve classification performance. Our method is evaluated on the FaceForensics++ and DFDC benchmarks for facial manipulation detection. Additionally, we assess its cross-dataset performance on Celeb-DF-v2, achieving state-of-the-art results.