Catalogue Search | MBRL

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

by Das, Abhishek , Cogswell, Michael , Vedantam, Ramakrishna in Artificial neural networks , Computer vision , Decisions

2020

We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention based models learn to localize discriminative regions of the input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al. in Computer vision and pattern recognition, 2017) to provide textual explanations for model decisions.
Finally, we design and conduct human studies to measure if Grad-CAM explanations help users establish appropriate trust in predictions from deep networks and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV (Agrawal et al., in: Mobile cloud visual media computing, pp 265–290. Springer, 2015) (http://gradcam.cloudcv.org) and a video at http://youtu.be/COjUB9Izk6E.
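The core computation described in this abstract can be illustrated with a minimal sketch. It assumes the feature maps of the final convolutional layer and the gradients of the target score with respect to those maps have already been extracted from a network. The function name `grad_cam` and the nested-list data layout are illustrative choices, not taken from the authors' code. The sketch follows the commonly described formulation: average each channel's gradients to get a channel weight, sum the weighted feature maps, and keep only the positive part.

```python
def grad_cam(activations, gradients):
    """Coarse localization map from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    Both are assumed to come from the final convolutional layer of some
    network; obtaining them is outside the scope of this sketch.
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # One weight per channel: the spatial average of that channel's gradients.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Weighted sum of the feature maps across channels.
    cam = [[0.0] * width for _ in range(height)]
    for alpha, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * feature_map[i][j]

    # ReLU: keep only regions with a positive influence on the target score.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy input (hypothetical values): two 2 x 2 feature maps.
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
```

In practice the resulting map is usually upsampled to the input image size and overlaid on the image as a heat map; that step is omitted here.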

Journal Article

An Explainable Deep Learning Model to Prediction Dental Caries Using Panoramic Radiograph Images

by Zeynep Ozpolat , Ozal Yildirim , U. Rajendra Acharya in Accuracy , caries , caries; dental health; explainable deep models; deep learning; Grad-CAM

2023

Dental caries is the most frequent dental health issue in the general population. Dental caries can result in extreme pain or infections, lowering people’s quality of life. Applying machine learning models to automatically identify dental caries can lead to earlier treatment. However, physicians frequently find the model results unsatisfactory due to a lack of explainability. Our study attempts to address this issue with an explainable deep learning model for detecting dental caries. We tested three prominent pre-trained models, EfficientNet-B0, DenseNet-121, and ResNet-50, to determine which is best for the caries detection task. These models take panoramic images as the input, producing a caries–non-caries classification result and a heat map, which visualizes areas of interest on the tooth. The model performance was evaluated using whole panoramic images of 562 subjects. All three models produced remarkably similar results. However, the ResNet-50 model exhibited a slightly better performance when compared to EfficientNet-B0 and DenseNet-121. This model obtained an accuracy of 92.00%, a sensitivity of 87.33%, and an F1-score of 91.61%. Visual inspection showed us that the heat maps were also located in the areas with caries. The proposed explainable deep learning model diagnosed dental caries with high accuracy and reliability. The heat maps help to explain the classification results by indicating a region of suspected caries on the teeth. Dentists could use these heat maps to validate the classification results and reduce misclassification.
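For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, precision, and F1-score are conventionally derived from binary confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives. Assumes at least one actual positive and at least
    one predicted positive, so no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, fp=2, tn=8, fn=2)
```

Because F1 is the harmonic mean of precision and sensitivity, a model can show high accuracy while one of the two lags behind, which is why abstracts such as this one report several metrics together.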

Journal Article

Accelerating Urban Flood Inundation Simulation Under Spatio‐Temporally Varying Rainstorms Using ConvLSTM Deep Learning Model

by Liao, Yaoxing , Lai, Chengguang , Wang, Zhaoli in Artificial neural networks , Correlation coefficient , Correlation coefficients

2025

Urban floods induced by rainstorms can lead to severe losses of lives and property, making rapid flood prediction essential for effective disaster prevention and mitigation. However, traditional deep learning (DL) models often overlook the spatial heterogeneity of rainstorms and lack interpretability. Here, we propose an end‐to‐end rapid prediction method for urban flood inundation incorporating spatiotemporally varying rainstorms using a Convolutional Long Short‐Term Memory Network (ConvLSTM) DL model. We compare the performance of the proposed method with that of a 3D Convolutional Neural Network (3D CNN) model and introduce the spatial visualization technique Grad‐CAM to interpret the rainstorms' contributions to flood predictions. Results demonstrate that: (a) Compared to the physics‐based model, the proposed ConvLSTM model achieves satisfactory accuracy in predicting flood inundation evolution under spatio‐temporally varying rainstorms, with an average Pearson correlation coefficient (PCC) of 0.958 and a mean absolute error (MAE) of 0.021 m, successfully capturing the locations of observed inundation points under actual rainstorm conditions. (b) The ConvLSTM model can rapidly predict the urban rainstorm inundation process in just 2 s for a study area of 74 km², which is 170 times more efficient than a physics‐based model. (c) The interpretability of the ConvLSTM model for urban flood prediction can be enhanced through Grad‐CAM, revealing that the model naturally focuses on local or upstream rainfall concentration areas most responsible for inundation, aligning well with hydrological understanding. Overall, the ConvLSTM model serves as a powerful surrogate for rapid urban flood simulation, providing an important reference for real‐time flood early warning and mitigation.
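The two accuracy measures quoted in this abstract, the Pearson correlation coefficient (PCC) and the mean absolute error (MAE), can be sketched as follows. The function names and the sample depth values are hypothetical; the study's own evaluation procedure (for example, how values are aggregated over grid cells and time steps) is not described here and is not reproduced.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    spread_r = sum((r - mean_r) ** 2 for r in reference)
    return covariance / math.sqrt(spread_p * spread_r)


def mean_absolute_error(predicted, reference):
    """Average absolute difference, in the same unit as the inputs."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical water depths in metres, for illustration only.
surrogate_depths = [1.0, 2.0, 3.0]
physics_depths = [2.0, 4.0, 6.0]
pcc = pearson_correlation(surrogate_depths, physics_depths)
mae = mean_absolute_error(surrogate_depths, physics_depths)
```

The toy values illustrate why the two measures are reported together: PCC captures whether the spatial pattern matches, while MAE captures how far the magnitudes are off, so a prediction can correlate perfectly and still carry a large absolute error.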

Journal Article

A novel Skin lesion prediction and classification technique: ViT‐GradCAM

by Jayachandran, Jagannathan , Srinivasan, Gayathri , Shafiq, Muhammad in Algorithms , Databases, Factual , Deep Learning

2024

Background: Skin cancer is one of the most frequently occurring diseases in humans. Early detection and treatment are essential to reduce the malignancy of infections. Deep learning techniques are supplementary tools to assist clinical experts in detecting and localizing skin lesions. Vision transformers (ViT) based on image segmentation classification using multiple classes provide fairly accurate detection and are gaining popularity due to their multiclass prediction capabilities.
Materials and methods: In this research, we propose a new ViT Gradient‐Weighted Class Activation Mapping (GradCAM) based architecture named ViT‐GradCAM for detecting and classifying skin lesions by spreading ratio on the lesion's surface area. The proposed system is trained and validated using the HAM10000 dataset by studying seven skin lesions. The database comprises 10 015 dermatoscopic images of varied sizes. Data preprocessing and data augmentation techniques are applied to overcome class imbalance issues and improve the model's performance.
Result: The proposed algorithm is based on ViT models that classify the dermatoscopic images into seven classes with an accuracy of 97.28%, precision of 98.51%, recall of 95.2%, and an F1 score of 94.6%. The proposed ViT‐GradCAM obtains better and more accurate detection and classification than other state‐of‐the‐art deep learning‐based skin lesion detection models. The architecture of ViT‐GradCAM is extensively visualized to highlight the actual pixels in essential regions associated with skin‐specific pathologies.
Conclusion: This research proposes an alternate solution to overcome the challenges of detecting and classifying skin lesions using ViTs and GradCAM, which play a significant role in detecting and classifying skin lesions accurately rather than relying solely on deep learning models.

Journal Article

COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare

by Tiwari, Prayag , Shome, Debaditya , Zhang, Yazhou in Accuracy , Classification , Coronaviruses

2021

In the recent pandemic, accurate and rapid testing of patients remained a critical task in the diagnosis and control of COVID-19 disease spread in the healthcare industry. Because of the sudden increase in cases, most countries have faced scarcity and a low rate of testing. Chest X-rays have been shown in the literature to be a potential source of testing for COVID-19 patients, but manually checking X-ray reports is time-consuming and error-prone. Considering these limitations and the advancements in data science, we proposed a Vision Transformer-based deep learning pipeline for COVID-19 detection from chest X-ray-based imaging. Due to the lack of large data sets, we collected data from three open-source data sets of chest X-ray images and aggregated them to form a 30 K image data set, which is the largest publicly available collection of chest X-ray images in this domain to our knowledge. Our proposed transformer model effectively differentiates COVID-19 from normal chest X-rays with an accuracy of 98% along with an AUC score of 99% in the binary classification task. It distinguishes COVID-19, normal, and pneumonia patients’ X-rays with an accuracy of 92% and AUC score of 98% in the multi-class classification task. For evaluation on our data set, we fine-tuned some of the widely used models in literature, namely, EfficientNetB0, InceptionV3, Resnet50, MobileNetV3, Xception, and DenseNet-121, as baselines. Our proposed transformer model outperformed them in terms of all metrics. In addition, a Grad-CAM based visualization is created which makes our approach interpretable by radiologists and can be used to monitor the progression of the disease in the affected lungs, assisting healthcare.

Journal Article

Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals

by Oh, Shu Lih , Acharya, U Rajendra , Ng, E.Y.K. in Accuracy , Algorithms , Artificial intelligence

2022

Myocardial infarction (MI) accounts for a high number of deaths globally. In acute MI, accurate electrocardiography (ECG) is important for timely diagnosis and intervention in the emergency setting. Machine learning is increasingly being explored for automated computer-aided ECG diagnosis of cardiovascular diseases. In this study, we have developed DenseNet and CNN models for the classification of healthy subjects and patients with ten classes of MI based on the location of myocardial involvement. ECG signals from the Physikalisch-Technische Bundesanstalt database were pre-processed, and the ECG beats were extracted using an R peak detection algorithm. The beats were then fed to the two models separately. While both models attained high classification accuracies (more than 95%), DenseNet is the preferred model for the classification task due to its low computational complexity and higher classification accuracy than the CNN model due to feature reusability. An enhanced class activation mapping (CAM) technique called Grad-CAM was subsequently applied to the outputs of both models to enable visualization of the specific ECG leads and portions of ECG waves that were most influential for the predictive decisions made by the models for the 11 classes. It was observed that Lead V4 was the most activated lead in both the DenseNet and CNN models. Furthermore, this study has also established the different leads and parts of the signal that get activated for each class. This is the first study to report features that influenced the classification decisions of deep models for multiclass classification of MI and healthy ECGs. 
Hence this study is crucial and contributes significantly to the medical field as with some level of visible explainability of the inner workings of the models, the developed DenseNet and CNN models may garner needed clinical acceptance and have the potential to be implemented for ECG triage of MI diagnosis in hospitals and remote out-of-hospital settings.
• This is the first study to explain the inner workings of the DenseNet and CNN models developed for MI detection.
• DenseNet is a better model than CNN, for rapid classification of MI.
• Model is developed with ten-fold cross-validation. Hence, it is robust and accurate.
• Obtained high accuracy of 98.9% for the classification of ten MI classes with DenseNet model.

Journal Article

Explanatory classification of CXR images into COVID-19, Pneumonia and Tuberculosis using deep learning and XAI

by Siku, Birat , Shahi, Tej Bahadur , Bhandari, Mohan in Accuracy , Artificial Intelligence , Bacterial diseases

2022

Chest X-ray (CXR) images are considered useful to monitor and investigate a variety of pulmonary disorders such as COVID-19, Pneumonia, and Tuberculosis (TB). With recent technological advancements, such diseases may now be recognized more precisely using computer-assisted diagnostics. Without compromising the classification accuracy and better feature extraction, a deep learning (DL) model to predict four different categories is proposed in this study. The proposed model is validated with publicly available datasets of 7132 chest X-ray (CXR) images. Furthermore, results are interpreted and explained using Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanation (LIME), and SHapley Additive exPlanation (SHAP) for better understandability. Initially, convolution features are extracted to collect high-level object-based information. Next, Shapley values from SHAP, predictability results from LIME, and heatmaps from Grad-CAM are used to explore the black-box approach of the DL model, achieving average test accuracy of 94.31 ± 1.01% and validation accuracy of 94.54 ± 1.33% for 10-fold cross validation. Finally, in order to validate the model and qualify medical risk, medical sensations of classification are taken to consolidate the explanations generated from the eXplainable Artificial Intelligence (XAI) framework. The results suggest that XAI and DL models give clinicians/medical professionals persuasive and coherent conclusions related to the detection and categorization of COVID-19, Pneumonia, and TB.
• A light weight CNN to detect infection on CXR images.
• Explanatory Classification of CXR Images into COVID-19, Pneumonia and Tuberculosis.
• Exploring the Black box approach of CNN using XAI.
• Performance comparison with other state-of-the-art methods.

Journal Article

Correction: An explainable hybrid deep learning framework for precise skin lesion segmentation and multi-class classification

by Darem, Abdulbasit A. , Abdullah, Monir , Fiaz, Muhammad in classification , Deep learning , explainable AI

2025

[This corrects the article DOI: 10.3389/fmed.2025.1681542.].

Journal Article

Explainable Transformer-Based Deep Learning Model for the Detection of Malaria Parasites from Blood Cell Images

by Anower, Md. Shamim , Islam, Md. Robiul , Ahsan, Mominul in Accuracy , Artificial intelligence , Blood

2022

Malaria is a life-threatening disease transmitted through the bites of female Anopheles mosquitoes. Various Plasmodium parasites spread in the victim’s blood cells and put their life in a critical situation. If not treated at an early stage, malaria can even cause death. Microscopy is a familiar process for diagnosing malaria: the victim’s blood samples are collected, and the parasites and red blood cells are counted. However, the microscopy process is time-consuming and can produce an erroneous result in some cases. With the recent success of machine learning and deep learning in medical diagnosis, it is quite possible to minimize diagnosis costs and improve overall detection accuracy compared with the traditional microscopy method. This paper proposes a multiheaded attention-based transformer model to diagnose the malaria parasite from blood cell images. To demonstrate the effectiveness of the proposed model, the gradient-weighted class activation map (Grad-CAM) technique was implemented to identify, by generating a heatmap image, which parts of an image the proposed model paid more attention to than the remaining parts. The proposed model achieved a testing accuracy, precision, recall, f1-score, and AUC score of 96.41%, 96.99%, 95.88%, 96.44%, and 99.11%, respectively, for the original malaria parasite dataset and 99.25%, 99.08%, 99.42%, 99.25%, and 99.99%, respectively, for the modified dataset. Various hyperparameters were also fine-tuned to obtain optimum results, which were also compared with state-of-the-art (SOTA) methods for malaria parasite detection, and the proposed method outperformed the existing methods.

Journal Article

Enhancement of detection accuracy for preventing iris presentation attack

by Shyam, Gopal Krishna , Alam, Sumbul , Venkatesh, Priyanka

2024

A system that recognizes the iris is susceptible to presentation attacks (PAs), in which a malicious party shows artifacts such as printed eyeballs, patterned contact lenses, or cosmetics to obscure their personal identity or manipulate someone else’s identity. In this study, we propose dual-channel DenseNet presentation attack detection (DC-DenseNetPAD), a dependable and effective iris PA detector based on the DenseNet convolutional neural network architecture. It displays generalizability across PA datasets, sensors, and artifacts. The efficiency of the proposed iris PA detection technique has been supported by tests performed on popular, openly accessible datasets (LivDet-2017 and LivDet-2015). The proposed technique outperforms state-of-the-art techniques with a true detection rate of 99.16% on LivDet-2017 and 98.40% on LivDet-2015. It is an improvement over the existing techniques using the LivDet-2017 dataset. We employ Grad-CAM as well as t-SNE plots to visualize intermediate feature distributions and fixation heatmaps in order to demonstrate how well DC-DenseNetPAD performs.

Journal Article
