Catalogue Search | MBRL
386 result(s) for "Grad-CAM"
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
by Das Abhishek; Cogswell, Michael; Vedantam Ramakrishna
in Artificial neural networks, Computer vision, Decisions
2020
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention based models learn to localize discriminative regions of input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al. in Computer vision and pattern recognition, 2017) to provide textual explanations for model decisions.
Finally, we design and conduct human studies to measure if Grad-CAM explanations help users establish appropriate trust in predictions from deep networks and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV (Agrawal et al., in: Mobile cloud visual media computing, pp 265–290. Springer, 2015) (http://gradcam.cloudcv.org) and a video at http://youtu.be/COjUB9Izk6E.
Journal Article
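The localization map described in the Grad-CAM abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' released code. It assumes the final convolutional layer's activations and the gradients of the target class score with respect to them have already been extracted as arrays of shape (K, H, W). The function name `grad_cam` and the toy inputs are hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Return an (H, W) Grad-CAM map scaled to [0, 1].

    activations, gradients: arrays of shape (K, H, W), assumed to be taken
    from the final convolutional layer for one image and one target class.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: global-average-pool the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    # Normalize for display; leave an all-zero map untouched.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input (hypothetical values): channel 0 supports the class, channel 1 opposes it.
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 1.0  # channel 0 fires at the centre
acts[1, 0, 0] = 1.0  # channel 1 fires at the top-left corner
grads = np.stack([np.full((3, 3), 0.5), np.full((3, 3), -0.5)])
heatmap = grad_cam(acts, grads)
print(heatmap)
```

In practice the coarse map is upsampled to the input image size before being overlaid; that step is omitted here.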
An Explainable Deep Learning Model to Prediction Dental Caries Using Panoramic Radiograph Images
by Zeynep Ozpolat, Ozal Yildirim, U. Rajendra Acharya
in Accuracy; caries; dental health; explainable deep models; deep learning; Grad-CAM
2023
Dental caries is the most frequent dental health issue in the general population. Dental caries can result in extreme pain or infections, lowering people’s quality of life. Applying machine learning models to automatically identify dental caries can lead to earlier treatment. However, physicians frequently find the model results unsatisfactory due to a lack of explainability. Our study attempts to address this issue with an explainable deep learning model for detecting dental caries. We tested three prominent pre-trained models, EfficientNet-B0, DenseNet-121, and ResNet-50, to determine which is best for the caries detection task. These models take panoramic images as the input, producing a caries–non-caries classification result and a heat map, which visualizes areas of interest on the tooth. The model performance was evaluated using whole panoramic images of 562 subjects. All three models produced remarkably similar results. However, the ResNet-50 model exhibited a slightly better performance when compared to EfficientNet-B0 and DenseNet-121. This model obtained an accuracy of 92.00%, a sensitivity of 87.33%, and an F1-score of 91.61%. Visual inspection showed us that the heat maps were also located in the areas with caries. The proposed explainable deep learning model diagnosed dental caries with high accuracy and reliability. The heat maps help to explain the classification results by indicating a region of suspected caries on the teeth. Dentists could use these heat maps to validate the classification results and reduce misclassification.
Journal Article
Accelerating Urban Flood Inundation Simulation Under Spatio‐Temporally Varying Rainstorms Using ConvLSTM Deep Learning Model
by Liao, Yaoxing; Lai, Chengguang; Wang, Zhaoli
in Accuracy, Artificial neural networks, Climate change
2025
Urban floods induced by rainstorms can lead to severe losses of lives and property, making rapid flood prediction essential for effective disaster prevention and mitigation. However, traditional deep learning (DL) models often overlook the spatial heterogeneity of rainstorms and lack interpretability. Here, we propose an end‐to‐end rapid prediction method for urban flood inundation incorporating spatiotemporally varying rainstorms using a Convolutional Long Short‐Term Memory Network (ConvLSTM) DL model. We compare the performance of the proposed method with that of a 3D Convolutional Neural Network (3D CNN) model and introduce the spatial visualization technique Grad‐CAM to interpret the rainstorms' contributions to flood predictions. Results demonstrate that: (a) Compared to the physics‐based model, the proposed ConvLSTM model achieves satisfactory accuracy in predicting flood inundation evolution under spatio‐temporally varying rainstorms, with an average Pearson correlation coefficient (PCC) of 0.958 and a mean absolute error (MAE) of 0.021 m, successfully capturing the locations of observed inundation points under actual rainstorm conditions. (b) The ConvLSTM model can rapidly predict the urban rainstorm inundation process in just 2 s for a study area of 74 km², which is 170 times more efficient than a physics‐based model. (c) The interpretability of the ConvLSTM model for urban flood prediction can be enhanced through Grad‐CAM, revealing that the model naturally focuses on local or upstream rainfall concentration areas most responsible for inundation, aligning well with hydrological understanding. Overall, the ConvLSTM model serves as a powerful surrogate for rapid urban flood simulation, providing an important reference for real‐time flood early warning and mitigation.
Journal Article
Enhancing agriculture through real-time grape leaf disease classification via an edge device with a lightweight CNN architecture and Grad-CAM
by Ahsan, Mominul; Haider, Julfikar; Goni, Md. Omaer Faruq
in 631/114/2397, 631/1647/48, Agriculture - methods
2024
Crop diseases can significantly affect various aspects of crop cultivation, including crop yield, quality, production costs, and crop loss. The utilization of modern technologies such as image analysis via machine learning techniques enables early and precise detection of crop diseases, hence empowering farmers to effectively manage and avoid the occurrence of crop diseases. The proposed methodology involves the use of a modified MobileNetV3Large model deployed on an edge device for real-time monitoring of grape leaf disease while reducing computational memory demands and ensuring satisfactory classification performance. To enhance the applicability of MobileNetV3Large, custom layers consisting of two dense layers, each followed by a dropout layer, were added; these helped mitigate overfitting and ensured that the model remains efficient. Comparisons with other models showed that the proposed model outperformed them, with average train and test accuracies of 99.66% and 99.42%, respectively, and a precision, recall, and F1 score of approximately 99.42%. The model was deployed on an edge device (Nvidia Jetson Nano) using a custom developed GUI app and predicted from both saved and real-time data with high confidence values. Grad-CAM visualization was used to identify and represent image areas that affect the convolutional neural network (CNN) classification decision-making process with high accuracy. This research contributes to the development of plant disease classification technologies for edge devices, which have the potential to enhance the ability of autonomous farming for farmers, agronomists, and researchers to monitor and mitigate plant diseases efficiently and effectively, with a positive impact on global food security.
Journal Article
A novel Skin lesion prediction and classification technique: ViT‐GradCAM
by Jayachandran, Jagannathan; Srinivasan, Gayathri; Shafiq, Muhammad
in Algorithms, Databases, Factual, Deep Learning
2024
Background: Skin cancer is one of the highly occurring diseases in human life. Early detection and treatment are the prime and necessary points to reduce the malignancy of infections. Deep learning techniques are supplementary tools to assist clinical experts in detecting and localizing skin lesions. Vision transformers (ViT) based on image segmentation classification using multiple classes provide fairly accurate detection and are gaining more popularity due to legitimate multiclass prediction capabilities. Materials and methods: In this research, we propose a new ViT Gradient‐Weighted Class Activation Mapping (GradCAM) based architecture named ViT‐GradCAM for detecting and classifying skin lesions by spreading ratio on the lesion's surface area. The proposed system is trained and validated using a HAM 10000 dataset by studying seven skin lesions. The database comprises 10 015 dermatoscopic images of varied sizes. The data preprocessing and data augmentation techniques are applied to overcome the class imbalance issues and improve the model's performance. Result: The proposed algorithm is based on ViT models that classify the dermatoscopic images into seven classes with an accuracy of 97.28%, precision of 98.51, recall of 95.2%, and an F1 score of 94.6, respectively. The proposed ViT‐GradCAM obtains better and more accurate detection and classification than other state‐of‐the‐art deep learning‐based skin lesion detection models. The architecture of ViT‐GradCAM is extensively visualized to highlight the actual pixels in essential regions associated with skin‐specific pathologies. Conclusion: This research proposes an alternate solution to overcome the challenges of detecting and classifying skin lesions using ViTs and GradCAM, which play a significant role in detecting and classifying skin lesions accurately rather than relying solely on deep learning models.
Journal Article
A Comprehensive Review of Explainable Artificial Intelligence (XAI) in Computer Vision
by Cai, Lingfeng; Li, Yule; Cheng, Zhihan
in Algorithms, Artificial Intelligence, Comparative analysis
2025
Explainable Artificial Intelligence (XAI) is increasingly important in computer vision, aiming to connect complex model outputs with human understanding. This review provides a focused comparative analysis of representative XAI methods in four main categories, attribution-based, activation-based, perturbation-based, and transformer-based approaches, selected from a broader literature landscape. Attribution-based methods like Grad-CAM highlight key input regions using gradients and feature activation. Activation-based methods analyze the responses of internal neurons or feature maps to identify which parts of the input activate specific layers or units, helping to reveal hierarchical feature representations. Perturbation-based techniques, such as RISE, assess feature importance through input modifications without accessing internal model details. Transformer-based methods, which use self-attention, offer global interpretability by tracing information flow across layers. We evaluate these methods using metrics such as faithfulness, localization accuracy, efficiency, and overlap with medical annotations. We also propose a hierarchical taxonomy to classify these methods, reflecting the diversity of XAI techniques. Results show that RISE has the highest faithfulness but is computationally expensive, limiting its use in real-time scenarios. Transformer-based methods perform well in medical imaging, with high IoU scores, though interpreting attention maps requires care. These findings emphasize the need for context-aware evaluation and hybrid XAI methods balancing interpretability and efficiency. The review ends by discussing ethical and practical challenges, stressing the need for standard benchmarks and domain-specific tuning.
Journal Article
An Explainable Centralized Deep Learning Model for Gastrointestinal Polyp Segmentation Using the Kvasir-SEG Dataset
2026
Gastrointestinal polyps are well-known precursors to colorectal cancer (CRC), making their accurate detection and segmentation during colonoscopy essential for early diagnosis and cancer prevention. Deep learning–based segmentation models trained on publicly available datasets such as Kvasir-SEG have demonstrated promising performance; however, two key challenges remain: limited robustness across diverse polyp morphologies and endoscopic imaging conditions, and the lack of interpretable decision-making mechanisms that support clinical trust and validation. Many existing centralized segmentation approaches are primarily optimized using overlap-based metrics such as the Dice coefficient and intersection over union (IoU), without adequately analyzing challenging cases such as small, flat, or low-contrast polyps or providing insight into the visual cues influencing model predictions. This study presents an explainable centralized deep learning segmentation model for gastrointestinal polyp segmentation using the Kvasir-SEG dataset. The approach integrates a ResUNet++-Lite encoder–decoder segmentation model with Grad-CAM and masked Grad-CAM visualizations to analyze the spatial regions influencing segmentation predictions. The study focuses on establishing a reproducible and interpretable experimental model that combines systematic preprocessing, data augmentation, centralized training, and explainability analysis. Experimental evaluation on an 80:20 train–test split of the Kvasir-SEG dataset, where data augmentation was applied after splitting, demonstrates stable training behavior and competitive segmentation performance, achieving a pixel accuracy of 0.964, a Dice coefficient of 0.858, and an IoU of 0.791 on the held-out test set. Qualitative explainability results further indicate that the model consistently focuses on anatomically relevant polyp regions. 
Overall, the study illustrates how segmentation performance and explainable AI techniques can be integrated to support the development of clinically interpretable AI-assisted colonoscopy systems.
Journal Article
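The overlap metrics named in the abstract above, the Dice coefficient and intersection over union (IoU), can be computed for a pair of binary masks as sketched here. This is an illustrative formulation, not the study's evaluation code. The function name `dice_and_iou`, the toy masks, and the convention that two empty masks score 1.0 are assumptions.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / total if total > 0 else 1.0
    iou = intersection / union if union > 0 else 1.0
    return float(dice), float(iou)

# Toy masks (hypothetical): 3 predicted pixels, 3 true pixels, 2 in common.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks, Dice is never lower than IoU, which is consistent with the ordering of the two scores reported in the abstract.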
COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare
by Tiwari, Prayag; Shome, Debaditya; Zhang, Yazhou
in Accuracy, Bacterial pneumonia, Classification
2021
In the recent pandemic, accurate and rapid testing of patients remained a critical task in the diagnosis and control of COVID-19 disease spread in the healthcare industry. Because of the sudden increase in cases, most countries have faced scarcity and a low rate of testing. Chest X-rays have been shown in the literature to be a potential source of testing for COVID-19 patients, but manually checking X-ray reports is time-consuming and error-prone. Considering these limitations and the advancements in data science, we proposed a Vision Transformer-based deep learning pipeline for COVID-19 detection from chest X-ray-based imaging. Due to the lack of large data sets, we collected data from three open-source data sets of chest X-ray images and aggregated them to form a 30 K image data set, which is the largest publicly available collection of chest X-ray images in this domain to our knowledge. Our proposed transformer model effectively differentiates COVID-19 from normal chest X-rays with an accuracy of 98% along with an AUC score of 99% in the binary classification task. It distinguishes COVID-19, normal, and pneumonia patients’ X-rays with an accuracy of 92% and AUC score of 98% in the multi-class classification task. For evaluation on our data set, we fine-tuned some of the widely used models in literature, namely, EfficientNetB0, InceptionV3, Resnet50, MobileNetV3, Xception, and DenseNet-121, as baselines. Our proposed transformer model outperformed them in terms of all metrics. In addition, a Grad-CAM based visualization is created which makes our approach interpretable by radiologists and can be used to monitor the progression of the disease in the affected lungs, assisting healthcare.
Journal Article
Explainable attention based breast tumor segmentation using a combination of UNet, ResNet, DenseNet, and EfficientNet models
2025
This study utilizes the Breast Ultrasound Image (BUSI) dataset to present a deep learning technique for breast tumor segmentation based on a modified UNet architecture. To improve segmentation accuracy, the model integrates attention mechanisms, such as the Convolutional Block Attention Module (CBAM) and Non-Local Attention, with advanced encoder architectures, including ResNet, DenseNet, and EfficientNet. These attention mechanisms enable the model to focus more effectively on relevant tumor areas, resulting in significant performance improvements. Models incorporating attention mechanisms outperformed those without, as reflected in superior evaluation metrics. The effects of Dice Loss and Binary Cross-Entropy (BCE) Loss on the model’s performance were also analyzed. Dice Loss maximized the overlap between predicted and actual segmentation masks, leading to more precise boundary delineation, while BCE Loss achieved higher recall, improving the detection of tumor areas. Grad-CAM visualizations further demonstrated that attention-based models enhanced interpretability by accurately highlighting tumor areas. The findings indicate that combining advanced encoder architectures, attention mechanisms, and the UNet framework can yield more reliable and accurate breast tumor segmentation. Future research will explore the use of multi-modal imaging, real-time deployment for clinical applications, and more advanced attention mechanisms to further improve segmentation performance.
Journal Article
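The two training objectives compared in the abstract above can be contrasted with a short sketch. This shows one common formulation of a soft Dice loss and of binary cross-entropy over predicted foreground probabilities. The study's exact definitions are not given in the abstract, so the function names, the smoothing constant `eps`, and the toy values are assumptions.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-7):
    """1 - soft Dice overlap between predicted probabilities and a binary mask."""
    probs = np.asarray(probs, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = (probs * target).sum()
    # eps (assumed smoothing term) avoids division by zero on empty masks.
    return float(1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps))

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    probs = np.clip(np.asarray(probs, dtype=float).ravel(), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float).ravel()
    return float(-(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)).mean())

# Toy case (hypothetical): one tumor pixel out of four, maximally uncertain predictions.
target = np.array([1.0, 0.0, 0.0, 0.0])
probs = np.array([0.5, 0.5, 0.5, 0.5])
dice_value = soft_dice_loss(probs, target)
bce_value = bce_loss(probs, target)
print(f"Dice loss = {dice_value:.4f}, BCE loss = {bce_value:.4f}")
```

Dice loss is computed from region-level overlap, while BCE averages a per-pixel penalty, so the two can rank the same prediction differently when the foreground is small.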
Enhancing brain tumor detection in MRI images through explainable AI using Grad-CAM with Resnet 50
2024
This study addresses the critical challenge of detecting brain tumors using MRI images, a pivotal task in medical diagnostics that demands high accuracy and interpretability. While deep learning has shown remarkable success in medical image analysis, there remains a substantial need for models that are not only accurate but also interpretable to healthcare professionals. The existing methodologies, predominantly deep learning-based, often act as black boxes, providing little insight into their decision-making process. This research introduces an integrated approach using ResNet50, a deep learning model, combined with Gradient-weighted Class Activation Mapping (Grad-CAM) to offer a transparent and explainable framework for brain tumor detection. We employed a dataset of MRI images, enhanced through data augmentation, to train and validate our model. The results demonstrate a significant improvement in model performance, with a testing accuracy of 98.52% and precision-recall metrics exceeding 98%, showcasing the model’s effectiveness in distinguishing tumor presence. The application of Grad-CAM provides insightful visual explanations, illustrating the model’s focus areas in making predictions. This fusion of high accuracy and explainability holds profound implications for medical diagnostics, offering a pathway towards more reliable and interpretable brain tumor detection tools.
Journal Article