Catalogue Search | MBRL

Malware Detection Issues, Challenges, and Future Directions: A Survey

by Aboaoja, Faitouri A. , Zainal, Anazida , Al-rimy, Bander Ali Saleh in Automation , Behavior , Classification

2022

The evolution of malicious software, together with the rising use of digital services, has increased the probability of data corruption, information theft, and other cybercrimes carried out through malware attacks. Therefore, malicious software must be detected before it impacts a large number of computers. Recently, researchers have proposed many malware detection solutions. However, many challenges limit the ability of these solutions to detect several types of malware effectively, especially zero-day attacks, because of obfuscation and evasion techniques and the diversity of malicious behavior caused by the rapid rate at which new malware and malware variants are produced every day. Several review papers have explored the issues and challenges of malware detection from various viewpoints. However, an in-depth review article that associates each analysis and detection approach with the data type is lacking. Such an association is imperative for the research community because it helps to determine the suitable mitigation approach. In addition, current survey articles stop at a generic taxonomy of detection approaches. Moreover, some review papers present feature extraction methods as static, dynamic, or hybrid based on the analysis approach used and neglect the taxonomy of feature representation methods, which is considered essential in developing a malware detection model. This survey bridges the gap by providing a comprehensive state-of-the-art review of malware detection model research. It introduces a feature representation taxonomy in addition to a deeper taxonomy of malware analysis and detection approaches, and links each approach with the most commonly used data types. Feature extraction methods are introduced according to the techniques used instead of the analysis approach. The survey ends with a discussion of the challenges and future research directions.

Journal Article

Mobile malware detection method using improved GhostNetV2 with image enhancement technique

by Cui, MengTian , Du, Yao , Chen, Xi in Accuracy , Adversarial samples , Algorithms

2025

In recent years, image-based feature extraction and deep learning classification methods have been widely used in the field of malware detection, helping to improve the efficiency of automatic malicious feature extraction and enhancing the overall performance of detection models. However, recent studies reveal that adversarial sample generation techniques pose significant challenges to malware detection models, as their effectiveness significantly declines when identifying adversarial samples. To address this problem, we propose a malware detection method based on an improved GhostNetV2 model, which simultaneously enhances detection performance for both normal malware and adversarial samples. First, Android classes.dex files are converted into RGB images, and image enhancement is performed using the Local Histogram Equalization technique. Subsequently, the Gabor method is employed to transform three-channel images into single-channel images, ensuring consistent detection accuracy for malicious code while reducing training and inference time. Second, we make three improvements to GhostNetV2 to more effectively identify malicious code, including introducing channel shuffling in the Ghost module, replacing the squeeze and excitation mechanism with a more efficient channel attention mechanism, and optimizing the activation function. Finally, extensive experiments are conducted to evaluate the proposed method. Results demonstrate that our model achieves superior performance compared to 20 state-of-the-art deep learning models, attaining detection accuracies of 97.7% for normal malware and 92.0% for adversarial samples.
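
The first step of the abstract above, turning a classes.dex file into an RGB image, can be illustrated with a minimal sketch. The abstract does not give the exact byte-to-pixel mapping, so the layout here (three consecutive bytes per pixel, zero padding of the final row, a caller-chosen width) is an assumption, and `bytes_to_rgb_rows` is a hypothetical helper name, not the authors' code. The histogram equalization and Gabor steps are not shown.

```python
import math


def bytes_to_rgb_rows(data: bytes, width: int) -> list[list[tuple[int, int, int]]]:
    """Map a byte string to rows of (R, G, B) pixels, three bytes per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    bytes_per_row = 3 * width
    # Zero-pad so the data fills a whole number of rows (assumed padding rule).
    height = max(1, math.ceil(len(data) / bytes_per_row))
    padded = data.ljust(height * bytes_per_row, b"\x00")
    rows = []
    for r in range(height):
        row_bytes = padded[r * bytes_per_row:(r + 1) * bytes_per_row]
        # Each group of three consecutive bytes becomes one RGB pixel.
        rows.append([tuple(row_bytes[i:i + 3]) for i in range(0, bytes_per_row, 3)])
    return rows


# Stand-in for the contents of a classes.dex file (hypothetical sample bytes).
sample = bytes(range(10))
image = bytes_to_rgb_rows(sample, width=2)
```

In practice the bytes would come from reading the DEX file in binary mode, and the width would be fixed so that every sample yields an image the network can accept.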

Journal Article

Intelligent phishing detection scheme using deep learning algorithms

by Hossain, M. A. , Lwin, Khin T. , Adebowale, Moruf Akin in Algorithms , Artificial neural networks , Averages

2023

Purpose: Phishing attacks have evolved in recent years due to high-tech-enabled economic growth worldwide. The rise in all types of fraud loss in 2019 has been attributed to the increase in deception scams and impersonation, as well as to sophisticated online attacks such as phishing. The global impact of phishing attacks will continue to intensify, and thus, a more efficient phishing detection method is required to protect online user activities. To address this need, this study focussed on the design and development of a deep learning-based phishing detection solution that leveraged the uniform resource locator (URL) and website content such as images, text and frames.

Design/methodology/approach: Deep learning techniques are efficient for natural language and image classification. In this study, the convolutional neural network (CNN) and the long short-term memory (LSTM) algorithm were used to build a hybrid classification model named the intelligent phishing detection system (IPDS). To build the proposed model, the CNN and LSTM classifiers were trained by using 1 million URLs and over 10,000 images. Then, the sensitivity of the proposed model was determined by considering various factors such as the type of feature, number of misclassifications and split issues.

Findings: An extensive experimental analysis was conducted to evaluate and compare the effectiveness of the IPDS in detecting phishing web pages and phishing attacks when applied to large data sets. The results showed that the model achieved an accuracy rate of 93.28% and an average detection time of 25 s.

Originality/value: This research used a hybrid deep learning approach that combines the CNN and LSTM methods. The combination was used to handle a large data set and to achieve higher classifier prediction performance. Combining the two methods thus leads to a better result with less training time for the LSTM and CNN architecture, while using image, frame and text features as hybrid inputs for detection. To the best of the authors' knowledge, the hybrid features and the IPDS classifier for phishing detection are the novelty of this study.

Journal Article

A multi-label visualisation approach for malware behaviour analysis

by Taha, Kamal , Yoo, Paul D. , Yeun, Chan Yeob in 639/705/117 , 639/705/258 , Artificial intelligence

2025

Modern malware evolves continuously, posing persistent challenges to cybersecurity. Conventional classification approaches typically group malware by its primary objective, emphasising dominant behaviours while overlooking the complex and overlapping strategies common in real-world attacks. Here we present DECODE (DEep Classification Of Dynamic Exploits), a proportional multi-label, context-aware framework that combines object detection, explainable artificial intelligence (XAI), and agent-based large language models (LLMs) to deliver interpretable and comprehensive malware analysis. DECODE introduces the first object detection dataset specifically for malware classification, generated through an automated annotation pipeline that removes the need for manual labelling and remains effective even for visually indistinguishable malware features. To improve attribution reliability, we extend Gradient-weighted Class Activation Mapping (Grad-CAM) with a Bayesian formulation, enabling uncertainty-aware visualisation of discriminative regions linked to multiple categories. The regions identified through object detection are subsequently mapped to their corresponding API call sequences and interpreted via a multi-agent reasoning module, which incorporates critique-and-verification loops to reduce hallucinations and bias. Experimental evaluation shows multi-label and binary classification accuracies of 0.8513 and 0.9380, respectively, outperforming conventional deep learning baselines. By combining visual localisation, proportional multi-label scoring, and human-readable behavioural narratives, DECODE enables malware to be classified not only by intended impact but also by fine-grained structural and behavioural traits, offering a richer understanding of complex threats.

Journal Article

Android Malware Detection Using TCN with Bytecode Image

by Zhang, Wenhui , Ding, Chao , Lu, Bei in Accuracy , Artificial neural networks , Cellular telephones

2021

With the rapid increase in the amount of Android malware, image-based analysis has become an effective way to defend against malware that uses symmetric encryption and obfuscation. At present, the existing Android malware bytecode image detection method, based on a convolutional neural network (CNN), relies on a single DEX file feature and requires a large amount of computation. To solve these problems, we combine the visual features of the XML file with the data section of the DEX file for the first time, and propose a new Android malware detection model, based on a temporal convolutional network (TCN). First, four gray-scale image datasets with four different combinations of texture features are created by combining XML files and DEX files. Then the image size is unified and input to the designed neural network with three different convolution methods for experimental validation. The experimental results show that adding XML files is beneficial for Android malware detection. The detection accuracy of the TCN model is 95.44%, precision is 95.45%, recall rate is 95.45%, and F1-Score is 95.44%. Compared with other methods based on the traditional CNN model or lightweight MobileNetV2 model, the method proposed in this paper, based on the TCN model, can effectively utilize bytecode image sequence features, improve the accuracy of detecting Android malware and reduce its computation.

Journal Article

An intrusion detection model to detect zero-day attacks in unseen data using machine learning

by Ku, Chin Soon , Alizadehsani, Roohallah , Dai, Zhen in Accuracy , Algorithms , Analysis

2024

In an era marked by pervasive digital connectivity, cybersecurity concerns have escalated. The rapid evolution of technology has led to a spectrum of cyber threats, including sophisticated zero-day attacks. This research addresses the challenge of existing intrusion detection systems in identifying zero-day attacks using the CIC-MalMem-2022 dataset and autoencoders for anomaly detection. The trained autoencoder is integrated with XGBoost and Random Forest, resulting in the models XGBoost-AE and Random Forest-AE. The study demonstrates that incorporating an anomaly detector into traditional models significantly enhances performance. The Random Forest-AE model achieved 100% accuracy, precision, recall, F1 score, and Matthews Correlation Coefficient (MCC), outperforming the methods proposed by Balasubramanian et al., Khan, Mezina et al., Smith et al., and Dener et al. When tested on unseen data, the Random Forest-AE model achieved an accuracy of 99.9892%, precision of 100%, recall of 99.9803%, F1 score of 99.9901%, and MCC of 99.8313%. This research highlights the effectiveness of the proposed model in maintaining high accuracy even with previously unseen data.
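
The metrics named in the abstract above (accuracy, precision, recall, F1 score, and MCC) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the paper. MCC is conventionally a value in the range -1 to 1, so the percentage in the abstract presumably corresponds to that value multiplied by 100.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews Correlation Coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts chosen only for illustration.
scores = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside F1 on imbalanced malware datasets.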

Journal Article

Discriminative Regions and Adversarial Sensitivity in CNN-Based Malware Image Classification

by Roy, Anish , Di Troia, Fabio in Accuracy , Artificial neural networks , Bias

2025

The escalating prevalence of malware poses a significant threat to digital infrastructure, demanding robust yet efficient detection methods. In this study, we evaluate multiple Convolutional Neural Network (CNN) architectures, including basic CNN, LeNet, AlexNet, GoogLeNet, and DenseNet, on a dataset of 11,000 malware images spanning 452 families. Our experiments demonstrate that CNN models can achieve reliable classification performance across both multiclass and binary tasks. However, we also uncover a critical weakness: even minimal image perturbations, such as modifying fewer than 1% of the total image pixels, drastically degrade accuracy and reveal CNNs’ fragility in adversarial settings. A key contribution of this work is spatial analysis of malware images, revealing that discriminative features concentrate disproportionately in the bottom-left quadrant. This spatial bias likely reflects semantic structure, as malware payload information often resides near the end of binary files when rasterized. Notably, models trained on this region outperform those trained on other sections, underscoring the importance of spatial awareness in malware classification. Taken together, our results reveal that CNN-based malware classifiers are simultaneously effective and vulnerable: they learn strong representations but are sensitive to both subtle perturbations and positional bias. These findings highlight the need for future detection systems that integrate robustness to noise with resilience against spatial distortions to ensure reliability in real-world adversarial environments.
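
To make the scale of the perturbation described above concrete, the following sketch alters a chosen fraction of pixels in a grayscale image stored as nested lists. It is only an illustration under stated assumptions: the abstract does not say how the modified pixels were selected or what values they received, so uniform random selection and a random non-zero offset are assumptions, and `perturb_pixels` is a hypothetical name.

```python
import random


def perturb_pixels(image: list[list[int]], fraction: float, seed: int = 0) -> list[list[int]]:
    """Return a copy of a grayscale image with a fraction of its pixels altered."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    n_changes = int(height * width * fraction)
    perturbed = [row[:] for row in image]  # leave the input image untouched
    # Pick distinct pixel positions uniformly at random (assumed strategy).
    for index in rng.sample(range(height * width), n_changes):
        r, c = divmod(index, width)
        # A non-zero offset modulo 256 guarantees the new value differs.
        perturbed[r][c] = (perturbed[r][c] + rng.randrange(1, 256)) % 256
    return perturbed


# Hypothetical 100x100 all-black image; alter 0.5% of its pixels.
original = [[0] * 100 for _ in range(100)]
noisy = perturb_pixels(original, fraction=0.005)
changed = sum(a != b for ra, rb in zip(original, noisy) for a, b in zip(ra, rb))
```

On a 100 by 100 image this changes 50 of 10,000 pixels, which is visually negligible yet falls in the range the abstract reports as damaging to accuracy.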

Journal Article

Enhancing security in IoMT using federated TinyGAN for lightweight and accurate malware detection

by R, Bright Gee Varghese , Shankar, M. Gobi , Daniel, Esther in 639/166 , 639/705 , Accuracy

2026

The internet of medical things (IoMT) ecosystem is highly vulnerable to malware attacks due to the vast number of connected devices and their continuous collection, transmission, and processing of sensitive data. Inadequate device management often makes each device a potential entry point, enabling malware to spread rapidly across networks with minimal detection. Given the resource constraints, privacy concerns, and distributed nature of IoT devices, there is a pressing need for lightweight and adaptive intrusion detection models. This paper proposes a federated learning (FL) based framework enhanced with TinyGAN, where the generator produces synthetic data to improve malware detection. The federated approach enables continuous, decentralized learning, allowing the model to adapt to emerging threats without requiring centralized retraining, thereby preserving privacy and reducing computational overhead. Experimental evaluations demonstrate significant improvements in both detection accuracy and efficiency compared to conventional centralized techniques. After 20 training rounds, the proposed model achieved a precision of 99.30%, a recall of 100%, and an F1-score of 99.52%. These results highlight the scalability, privacy-preserving nature, and effectiveness of the framework, offering a practical advancement in securing IoT environments against malware attacks. An experimental analysis of the IoT-23 dataset reveals that FL with TinyGAN consistently outperforms traditional models, such as MLP and FNN/LSTM, in terms of accuracy, convergence rate, and resource consumption, thereby establishing its effectiveness for practical IoT malware detection.
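
The decentralized training described above depends on a server combining model updates from many devices without seeing their raw data. The abstract does not name the aggregation rule, so the sketch below assumes the widely used federated averaging scheme, in which each client's parameters are weighted by its local sample count. The function name and the numbers are hypothetical, and the TinyGAN generator is not modelled.

```python
def federated_average(client_weights: list[list[float]],
                      client_sizes: list[int]) -> list[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must hold data")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share one model shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * size / total
    return averaged


# Two hypothetical devices holding 100 and 300 local samples.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
global_weights = federated_average(clients, sizes)
```

One training round would send `global_weights` back to the devices, let each train locally, and then repeat the aggregation; only parameters cross the network, never the sensitive records themselves.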

Journal Article

MalBERTv2: Code Aware BERT-Based Model for Malware Identification

by Akhloufi, Moulay A. , Rahali, Abir in Algorithms , Anti-virus software , Artificial intelligence

2023

To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-malware software, as well as firewalls, require frequent updates and proactive implementation. However, processing the vast amounts of dataset examples can be overwhelming when relying solely on traditional methods. In cybersecurity workflows, recent advances in natural language processing (NLP) models can aid in proactively detecting various threats. In this paper, we present a novel approach for representing the relevance and significance of the Malware/Goodware (MG) datasets, through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets, with a focus on the source code of the apps by extracting the top-ranked files that present the most relevant information. These files are then passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier using bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. The performance of our model is evaluated on different datasets, achieving a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques.

Journal Article

A Hybrid CNN–BiLSTM Framework Optimized with Bayesian Search for Robust Android Malware Detection

by Mutambik, Ibrahim in Accuracy , Android malware detection , Artificial neural networks

2025

With the rapid proliferation of Android smartphones, mobile malware threats have escalated significantly, underscoring the need for more accurate and adaptive detection solutions. This work proposes an innovative deep learning hybrid model that combines Convolutional Neural Networks (CNNs) with Bidirectional Long Short-Term Memory (BiLSTM) networks for learning both local features and sequential behavior in Android applications. To improve the relevance and clarity of the input data, Mutual Information is applied for feature selection, while Bayesian Optimization is adopted to efficiently optimize the model’s parameters. The designed system is tested on standard Android malware datasets and achieves an impressive detection accuracy of 99.3%, clearly outperforming classical approaches such as Support Vector Machines (SVMs), Random Forest, CNN, and Naive Bayes. Moreover, it delivers strong outcomes across critical evaluation metrics like F1-score and ROC-AUC. These findings confirm the framework’s high efficiency, adaptability, and practical applicability, making it a compelling solution for Android malware detection in today’s evolving threat landscape.
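
Mutual Information, which the abstract above uses for feature selection, measures how much knowing a feature's value reduces uncertainty about the class label. The sketch below computes it in bits for discrete features using the standard definition. The feature names and values are hypothetical and do not come from the paper's datasets.

```python
import math
from collections import Counter


def mutual_information(feature: list[int], labels: list[int]) -> float:
    """Mutual information, in bits, between a discrete feature and the labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Sum of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
        mi += p_xy * math.log2(p_xy / ((p_feature[x] / n) * (p_label[y] / n)))
    return mi


# Hypothetical binary features for four apps; label 1 means malware.
labels = [1, 1, 0, 0]
features = {
    "requests_sms": [1, 1, 0, 0],       # perfectly aligned with the label
    "requests_internet": [1, 0, 1, 0],  # independent of the label
}
ranking = sorted(features, key=lambda name: mutual_information(features[name], labels),
                 reverse=True)
```

Feature selection then keeps the top-ranked features and discards those whose score is close to zero, since they carry little information about the label.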

Journal Article
