Catalogue Search | MBRL

A Comprehensive Review of Explainable Artificial Intelligence (XAI) in Computer Vision

by Cai, Lingfeng , Li, Yule , Cheng, Zhihan in Algorithms , Artificial Intelligence , Comparative analysis

2025

Explainable Artificial Intelligence (XAI) is increasingly important in computer vision, aiming to connect complex model outputs with human understanding. This review provides a focused comparative analysis of representative XAI methods in four main categories, attribution-based, activation-based, perturbation-based, and transformer-based approaches, selected from a broader literature landscape. Attribution-based methods like Grad-CAM highlight key input regions using gradients and feature activation. Activation-based methods analyze the responses of internal neurons or feature maps to identify which parts of the input activate specific layers or units, helping to reveal hierarchical feature representations. Perturbation-based techniques, such as RISE, assess feature importance through input modifications without accessing internal model details. Transformer-based methods, which use self-attention, offer global interpretability by tracing information flow across layers. We evaluate these methods using metrics such as faithfulness, localization accuracy, efficiency, and overlap with medical annotations. We also propose a hierarchical taxonomy to classify these methods, reflecting the diversity of XAI techniques. Results show that RISE has the highest faithfulness but is computationally expensive, limiting its use in real-time scenarios. Transformer-based methods perform well in medical imaging, with high IoU scores, though interpreting attention maps requires care. These findings emphasize the need for context-aware evaluation and hybrid XAI methods balancing interpretability and efficiency. The review ends by discussing ethical and practical challenges, stressing the need for standard benchmarks and domain-specific tuning.
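As an illustration of the perturbation-based idea described in this abstract, the following is a minimal sketch in the spirit of RISE, not the authors' or the original RISE implementation. It scores randomly masked copies of an input with a black-box model and averages the masks weighted by those scores. The names `rise_saliency`, `toy_model`, and `region_mean`, the grid-cell masking without smoothing or random shifts, and all parameter values are assumptions made for this example.

```python
import random


def rise_saliency(model, image, n_masks=2000, cell=4, p_keep=0.5, seed=0):
    """Simplified RISE-style saliency for a 2-D image (list of rows).

    `model` is treated as a black box that maps an image to a scalar score.
    Masks are coarse binary grids; the smoothing and random shifts of the
    published method are omitted here (assumption, for brevity).
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    gh, gw = -(-h // cell), -(-w // cell)  # ceiling division
    saliency = [[0.0] * w for _ in range(h)]
    for _ in range(n_masks):
        # Each grid cell is kept with probability p_keep.
        coarse = [[1.0 if rng.random() < p_keep else 0.0 for _ in range(gw)]
                  for _ in range(gh)]
        masked = [[image[y][x] * coarse[y // cell][x // cell] for x in range(w)]
                  for y in range(h)]
        score = model(masked)
        # Accumulate the mask weighted by the score it produced.
        for y in range(h):
            for x in range(w):
                saliency[y][x] += score * coarse[y // cell][x // cell]
    norm = n_masks * p_keep
    return [[v / norm for v in row] for row in saliency]


def toy_model(img):
    # Hypothetical black box: responds only to the top-left 8x8 patch.
    patch = [img[y][x] for y in range(8) for x in range(8)]
    return sum(patch) / len(patch)


def region_mean(sal, ys, xs):
    vals = [sal[y][x] for y in ys for x in xs]
    return sum(vals) / len(vals)


image = [[1.0] * 16 for _ in range(16)]
sal = rise_saliency(toy_model, image)
inside = region_mean(sal, range(8), range(8))
outside = region_mean(sal, range(8, 16), range(8, 16))
print(inside > outside)  # the patch the toy model relies on scores higher
```

The model is called once per mask, which is why approaches of this kind tend to be costly, consistent with the efficiency concern the abstract raises.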

Journal Article

A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision

by Iqbal, Umair , Davies, Tim , Perez, Pascal in Algorithms , Artificial intelligence , Cameras

2024

Computer Vision (CV) has become increasingly important for Single-Board Computers (SBCs) due to their widespread deployment in addressing real-world problems. Specifically, in the context of smart cities, there is an emerging trend of developing end-to-end video analytics solutions designed to address urban challenges such as traffic management, disaster response, and waste management. However, deploying CV solutions on SBCs presents several pressing challenges (e.g., limited computation power, inefficient energy management, and real-time processing needs) that hinder their use at scale. Graphics Processing Units (GPUs) and software-level developments have recently emerged to address these challenges and improve the performance of SBCs; however, this is still an active area of research. There is a gap in the literature for a comprehensive review of such recent and rapidly evolving advancements on both software and hardware fronts. This review provides a detailed overview of existing GPU-accelerated edge-computing SBCs and software advancements, including algorithm optimization techniques, packages, development frameworks, and hardware-deployment-specific packages. It also provides a subjective comparative analysis based on critical factors to show applied Artificial Intelligence (AI) researchers the existing state of the art and help them select the best-suited combinations for their specific use case. Finally, the paper discusses potential limitations of existing SBCs and highlights future research directions in this domain.

Journal Article

Neural Architecture Search Survey: A Computer Vision Perspective

by Jeon, Kwang-Woo , Kang, Jeon-Seong , Chung, Hyun-Joon in artificial intelligence (AI) , automated machine learning (Auto-ML) , Automation

2023

In recent years, deep learning (DL) has been widely studied across the globe, especially with respect to training methods and network structures, and has proved highly effective in a wide range of tasks and applications, including image, speech, and text recognition. One important aspect of this advancement is the ongoing effort to design and improve neural architectures. However, designing such architectures requires the combined knowledge and know-how of experts from each relevant discipline and a series of trial-and-error steps. In this light, automated neural architecture search (NAS) methods are increasingly at the center of attention; this paper summarizes the basic concepts of NAS and provides an overview of recent studies on its applications. It is worth noting that most previous surveys on NAS have focused on hardware or search strategies. To the best of the authors' knowledge, this study is the first to look at NAS from a computer vision perspective. Computer vision areas were categorized by task, and recent trends found in each NAS study were analyzed in detail.

Journal Article

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

by Qi, Fanchao , Jiang, Xin , Zhang, Zhengyan in Artificial intelligence , Classification , Computer vision

2023

The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.

Journal Article

EfficientLiteDet: a real-time pedestrian and vehicle detection algorithm

by Hashmi, Mohammad Farukh , Keskar, Avinash G. , Murthy, Chintakindi Balaram in Algorithms , Communications Engineering , Computer Science

2022

Since safety is the top priority in both unmanned and driver-assistance driving systems, object detection algorithms need to detect captured objects efficiently and accurately in real time. Directly applying existing models to real-time pedestrian and vehicle detection in scenes captured from high-speed moving vehicles has two problems. First, the target scale varies drastically because the vehicle speed changes greatly. Second, captured images contain both tiny targets and high-density targets, which causes occlusion between targets. To solve these two issues, an efficient lightweight real-time detection algorithm, referred to as EfficientLiteDet, is proposed. Based on Tiny-YOLOv4, one more prediction head is introduced in the proposed model to detect multi-scale targets effectively. To detect tiny and occluded dense targets, Transformer Prediction Heads (TPH) are used instead of the original anchor detection heads. To explore the potential of the self-attention mechanism in TPH, the proposed model integrates a “convolutional block attention model” to locate crucial attention regions in scenarios with dense targets. To further improve detection performance, various data augmentation strategies such as mosaic, mix-up, multi-scale, and random horizontal flip were applied during model training. Extensive experiments conducted on five challenging pedestrian and vehicle datasets show that the EfficientLiteDet model has better performance in real-time scenarios. On the Pascal VOC-2007, Highway, and Udacity datasets, the proposed model achieves mean average precision (mAP) of 87.3%, 80.1%, and 77.8%, respectively, exceeding the state-of-the-art Tiny-YOLOv4 algorithm by +2.4%, +1.8%, and +2.4%, respectively.

Journal Article

Image captioning improved visual question answering

by Sharma, Himanshu , Jalal, Anand Singh in 1174: Futuristic Trends and Innovations in Multimedia Systems Using Big Data , Algorithms , Computer Communication Networks

2022

Both Visual Question Answering (VQA) and image captioning are problems that involve the Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are used to represent visual content, while NLP algorithms are used to represent sentences. In recent years, VQA and image captioning have been tackled independently, although they require similar types of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image-captioning-based VQA model that transfers the knowledge learnt from the image captioning task to the VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from the captioning model with the attention-based visual features. The experimental results demonstrate an improvement in answer generation accuracy of 3.45% on VQA 1.0, 3.33% on VQA 2.0, and 1.73% on VQA-CP v2 over state-of-the-art VQA models.

Journal Article

An Automated Feature-Based Image Registration Strategy for Tool Condition Monitoring in CNC Machine Applications

by Hurtado Carreon, Andres , Lazar, Eden , Veldhuis, Stephen C. in Algorithms , Automation , computer vision (CV)

2024

The implementation of Machine Vision (MV) systems for Tool Condition Monitoring (TCM) plays a critical role in reducing the total cost of operation in manufacturing while expediting tool wear testing in research settings. However, conventional MV-TCM edge detection strategies process each image independently to infer edge positions, rendering them susceptible to inaccuracies when tool edges are compromised by material adhesion or chipping, resulting in imprecise wear measurements. In this study, an MV system is developed alongside an automated, feature-based image registration strategy to spatially align tool wear images, enabling a more consistent and accurate detection of tool edge position. The MV system was shown to be robust to the machining environment, versatile across both turning and milling machining centers and capable of reducing tool wear image capturing time up to 85% in reference to standard approaches. A comparison of feature detector-descriptor algorithms found SIFT, KAZE, and ORB to be the most suitable for MV-TCM registration, with KAZE presenting the highest accuracy and ORB being the most computationally efficient. The automated registration algorithm was shown to be efficient, performing registrations in 1.3 s on average and effective across a wide range of tool geometries and coating variations. The proposed tool reference line detection strategy, based on spatially aligned tool wear images, outperformed standard methods, resulting in average tool wear measurement errors of 2.5% and 4.5% in the turning and milling tests, respectively. Such a system allows machine tool operators to more efficiently capture cutting tool images while ensuring more reliable tool wear measurements.

Journal Article

Artificial intelligence innovations in neurosurgical oncology: a narrative review

by Pease, Matthew , Abumoussa, Andrew , Sexton, Daniel P. in Artificial Intelligence , Brain Neoplasms - pathology , Brain Neoplasms - surgery

2024

Purpose: Artificial Intelligence (AI) has become increasingly integrated clinically within neurosurgical oncology. This report reviews the cutting-edge technologies impacting tumor treatment and outcomes. Methods: A rigorous literature search was performed with the aid of a research librarian to identify key articles referencing AI and related topics (machine learning (ML), computer vision (CV), augmented reality (AR), virtual reality (VR), etc.) for neurosurgical care of brain or spinal tumors. Results: Treatment of central nervous system (CNS) tumors is being improved through advances across AI, such as ML, CV, and AR/VR. AI-aided diagnostic and prognostication tools can influence pre-operative patient experience, while automated tumor segmentation and total resection predictions aid surgical planning. Novel intra-operative tools can rapidly provide histopathologic tumor classification to streamline treatment strategies. Post-operative video analysis, paired with rich surgical simulations, can enhance training feedback and regimens. Conclusion: While limited generalizability, bias, and patient data security are current concerns, the advent of federated learning, along with growing data consortiums, provides an avenue for increasingly safe, powerful, and effective AI platforms in the future.

Journal Article

Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques and Embedded Platforms—A Comprehensive Review

by Hashmi, Mohammad Farukh , Geem, Zong Woo , Bokde, Neeraj Dhanraj in Accuracy , Artificial neural networks , Classification

2020

In recent years there has been remarkable progress in one computer vision application area: object detection. One of the most challenging and fundamental problems in object detection is locating a specific object among the multiple objects present in a scene. Before the introduction of convolutional neural networks, traditional detection methods were used to detect objects. From 2012 onward, deep learning-based techniques were used for feature extraction, and that led to remarkable breakthroughs in this area. This paper presents a detailed survey of recent advancements and achievements in object detection using various deep learning techniques. Several topics are covered, such as Viola–Jones (VJ), histogram of oriented gradient (HOG), one-shot and two-shot detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art object detectors. Detailed discussions of some important applications in object detection, including pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded systems, are presented. Finally, we conclude by identifying promising future directions.

Journal Article

A novel brain-controlled wheelchair combined with computer vision and augmented reality

by Yu, Yang , Liu, Yadong , Tang, Jingsheng in Algorithms , And Electroencephalogram (EEG) , Artificial intelligence

2022

Background: Brain-controlled wheelchairs (BCWs) are important applications of brain–computer interfaces (BCIs). Currently, most BCWs are semiautomatic. When users want to reach a target of interest in their immediate environment, this semiautomatic interaction strategy is slow. Methods: To this end, we combined computer vision (CV) and augmented reality (AR) with a BCW and proposed the CVAR-BCW: a BCW with a novel automatic interaction strategy. The proposed CVAR-BCW uses a translucent head-mounted display (HMD) as the user interface, uses CV to automatically detect environments, and shows the detected targets through AR technology. Once a user has chosen a target, the CVAR-BCW can automatically navigate to it. For a few scenarios, the semiautomatic strategy might be useful. We integrated a semiautomatic interaction framework into the CVAR-BCW. The user can switch between the automatic and semiautomatic strategies. Results: We recruited 20 non-disabled subjects for this study and used the accuracy, information transfer rate (ITR), and average time required for the CVAR-BCW to reach each designated target as performance metrics. The experimental results showed that our CVAR-BCW performed well in indoor environments: the average accuracies across all subjects were 83.6% (automatic) and 84.1% (semiautomatic), the average ITRs were 8.2 bits/min (automatic) and 8.3 bits/min (semiautomatic), the average times required to reach a target were 42.4 s (automatic) and 93.4 s (semiautomatic), and the average workloads and degrees of fatigue for the two strategies were both approximately 20. Conclusions: Our CVAR-BCW provides a user-centric interaction approach and a good framework for integrating more advanced artificial intelligence technologies, which may be useful in the field of disability assistance.

Journal Article
