Catalogue Search | MBRL

Malware Detection Issues, Challenges, and Future Directions: A Survey

by Aboaoja, Faitouri A. , Zainal, Anazida , Al-rimy, Bander Ali Saleh in Automation , Behavior , Classification

2022

The evolution of malicious software, together with the rising use of digital services, has increased the probability of data corruption, information theft, and other cybercrimes carried out through malware attacks. Therefore, malicious software must be detected before it impacts a large number of computers. Recently, many malware detection solutions have been proposed by researchers. However, many challenges limit the ability of these solutions to effectively detect several types of malware, especially zero-day attacks, due to obfuscation and evasion techniques as well as the diversity of malicious behavior caused by the rapid rate at which new malware and malware variants are produced every day. Several review papers have explored the issues and challenges of malware detection from various viewpoints. However, there is a lack of a deep review article that associates each analysis and detection approach with the data type. Such an association is imperative for the research community as it helps to determine the suitable mitigation approach. In addition, the current survey articles stopped at a generic detection approach taxonomy. Moreover, some review papers presented the feature extraction methods as static, dynamic, and hybrid based on the utilized analysis approach and neglected the feature representation methods taxonomy, which is considered essential in developing the malware detection model. This survey bridges the gap by providing a comprehensive state-of-the-art review of malware detection model research. This survey introduces a feature representation taxonomy in addition to the deeper taxonomy of malware analysis and detection approaches and links each approach with the most commonly used data types. The feature extraction method is introduced according to the techniques used instead of the analysis approach. The survey ends with a discussion of the challenges and future research directions.

Journal Article

MalFe—Malware Feature Engineering Generation Platform

by Venter, Hein , Singh, Avinash , Ikuesan, Richard Adeyemi in Accuracy , Algorithms , Artificial intelligence

2023

The growing sophistication of malware has resulted in diverse challenges, especially among security researchers who are expected to develop mechanisms to thwart these malicious attacks. While security researchers have turned to machine learning to combat this surge in malware attacks and enhance detection and prevention methods, they often encounter limitations when it comes to sourcing malware binaries. This limitation places the burden on malware researchers to create context-specific datasets and detection mechanisms, a time-consuming and intricate process that involves a series of experiments. The lack of accessible analysis reports and a centralized platform for sharing and verifying findings has resulted in many research outputs that can neither be replicated nor validated. To address this critical gap, a malware analysis data curation platform was developed. This platform offers malware researchers a highly customizable feature generation process drawing from analysis data reports, particularly those generated in sandbox-based environments such as Cuckoo Sandbox. To evaluate the effectiveness of the platform, a replication of existing studies was conducted in the form of case studies. These studies revealed that the developed platform offers an effective approach that can aid malware detection research. Moreover, a real-world scenario involving over 3000 ransomware and benign samples for ransomware detection based on PE entropy was explored. This yielded an impressive accuracy score of 98.8% and an AUC of 0.97 when employing the decision tree algorithm, with a low latency of 1.51 ms. These results emphasize the necessity of the proposed platform while demonstrating its capacity to construct a comprehensive detection mechanism. By fostering community-driven interactive databanks, this platform enables the creation of datasets as well as the sharing of reports, both of which can substantially reduce experimentation time and enhance research repeatability.

Journal Article
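The MalFe abstract above mentions ransomware detection based on PE entropy. As a rough illustration only, the sketch below computes the Shannon entropy of a byte string in bits per byte. The function name `shannon_entropy` is hypothetical, and the abstract does not say which parts of a PE file the platform measures or how the values feed the decision tree, so this shows the general measure rather than the authors' feature pipeline.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    Hypothetical helper for illustration; not taken from the MalFe platform.
    """
    if not data:
        return 0.0
    counts = Counter(data)  # frequency of each byte value
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Repetitive data has low entropy; uniformly distributed bytes approach 8 bits.
    print(shannon_entropy(b"aaaaaaaa"))
    print(shannon_entropy(bytes(range(256))))
```

Encrypted or packed content tends to score close to 8 bits per byte, which is one reason entropy is a common signal in ransomware research.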

Empirical Analysis of Learning-based Malware Detection Methods using Image Visualization

by Sheneamer, Abdullah , Alhazmi, Essa , Henrydoss, James in Accuracy , Algorithms , Artificial neural networks

2022

Malware, short for malicious software, is an emerging cyber threat. Various researchers have proposed ways to build advanced malware detectors that can mitigate threat actors and enable effective cybersecurity decisions. Recent research implements malware detectors based on visualized images of malware executable files. In this framework, a malware binary is converted into an image, and by extracting image features and applying machine learning methods, the malware is identified based on image similarity. In this research work, we implement the image visualization-based malware detection method and conduct an empirical analysis of various learners to select a candidate learning classifier that can provide better prediction performance. We evaluate our framework using the following malware datasets: Search And RetrieVAl of Malware (SARVAM), Xue-dataset, and Canadian Institutes for Cyber Security (CIC) datasets. Our experiments include the following learning algorithms: Linear Regression, Random Forest, K-Nearest Neighbor (KNN), Classification and Decision Tree (CART), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and deep learning-based Convolutional Neural Network (CNN). This image-visualization-based method proves to be effective in terms of prediction accuracy. Our initial study finds that a Convolutional Neural Network (CNN) algorithm provides relatively better performance when used against SARVAM and various malware datasets. The CNN model achieved a high F1-score and accuracy in the binary classification task, reaching 95.70% and 99.50%, respectively. In the multi-class classification task, the model achieved 95.96% and 99.30% (F1-score and accuracy) for detecting malware types. We find that the KNN model outperforms other traditional classifiers.

Journal Article
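The abstract above describes converting a malware binary into an image before extracting features. A minimal sketch of one common way to do this follows: each byte becomes one grayscale pixel (0 to 255) and the byte stream is wrapped into rows of a fixed width. The function name, the default width of 256, and the zero padding of the last row are assumptions for illustration; the abstract does not state the image dimensions or padding the authors used.

```python
def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape a byte string into rows of `width` grayscale pixel values.

    Hypothetical helper: the last row is padded with zeros (black pixels)
    when the data length is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Iterating over bytes yields integers in the range 0..255.
        rows.append(list(chunk) + [0] * (width - len(chunk)))
    return rows


if __name__ == "__main__":
    image = bytes_to_grayscale_rows(bytes([1, 2, 3, 4, 5]), width=2)
    print(image)
```

The resulting nested list can be handed to an imaging or array library to save the picture or feed a classifier; that step is left out here to keep the sketch dependency-free.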

A novel permission-based Android malware detection system using feature selection based on linear regression

by Kural, Oğuz Emre , Akleylek, Sedat , Şahin, Durmuş Özkan in Artificial Intelligence , Computational Biology/Bioinformatics , Computational Science and Engineering

2023

With the developments in mobile and wireless technology, mobile devices have become an important part of our lives. Android is the leading operating system in market share, and it is also the platform most targeted by attackers. Although many solutions have been proposed in the literature for the detection of Android malware, there is still a need for attribute selection methods to be used in Android malware detection systems. In this study, a machine learning-based malware detection system is proposed to distinguish Android malware from benign applications. At the feature selection stage of the proposed system, unnecessary features are removed using a linear regression-based feature selection approach. In this way, the dimension of the feature vector is reduced, the training time is decreased, and the classification model can be used in real-time malware detection systems. The results of the study show that the highest F-measure, 0.961, is obtained by using at least 27 features.

Journal Article

A Hybrid Analysis-Based Approach to Android Malware Family Classification

by Zhang, Wenhui , Ding, Chao , Lu, Bei in android malware , Chi-square test , Classification

2021

With the popularity of Android, malware detection and family classification have also become a research focus. Many excellent methods have been proposed by previous authors, but static and dynamic analyses inevitably require complex processes. A hybrid analysis method for detecting Android malware and classifying malware families is presented in this paper, and is partially optimized for multiple-feature data. For static analysis, we use permissions and intent as static features and use three feature selection methods to form a subset of three candidate features. Compared with various models, including k-nearest neighbors and random forest, random forest is the best, with a detection rate of 95.04%, while the chi-square test is the best feature selection method. After using feature selection to explore the critical static features contained in this dataset, we analyzed a subset of important features to gain more insight into the malware. In a dynamic analysis based on network traffic, unlike those that focus on a one-way flow of traffic and work on HTTP protocols and transport layer protocols, we focused on sessions and retained protocol layers. The Res7LSTM model is then used to further classify the malicious and partially benign samples detected in the static detection. The experimental results show that our approach can not only work with fewer static features and guarantee sufficient accuracy, but also improve the detection rate of Android malware family classification from 71.48% in previous work to 99% when cutting the traffic in terms of the sessions and protocols of all layers.

Journal Article

An Effective Memory Analysis for Malware Detection and Classification

by Omar, Khairuddin , Akram Zainol Ariffin, Khairul , Sihwail, Rami in Classifiers , Command and control , Communication

2021

The study of malware behaviors has received tremendous attention from researchers in recent years for the purpose of reducing malware risks. Most of the experiments are performed using either static analysis or behavior analysis. However, recent studies have shown that both analyses are vulnerable to modern malware files that use several techniques to avoid analysis and detection. Therefore, extracted features could be meaningless and a distraction for malware analysts. In contrast, volatile memory can expose useful information about malware behaviors and characteristics. In addition, memory analysis is capable of detecting unconventional malware, such as in-memory and fileless malware. However, memory features have not been fully utilized yet. Therefore, this work aims to present a new malware detection and classification approach that extracts memory-based features from memory images using memory forensic techniques. The extracted features can expose the malware’s real behaviors, such as interacting with the operating system, DLL and process injection, communicating with a command and control site, and requesting higher privileges to perform specific tasks. We also applied feature engineering and converted the features to binary vectors before training and testing the classifiers. The experiments show that the proposed approach has a high classification accuracy rate of 98.5% and a false positive rate as low as 1.24% using the SVM classifier. The efficiency of the approach has been evaluated by comparing it with other related works. Also, a new memory-based dataset consisting of 2502 malware files and 966 benign samples forming 8898 features and belonging to six memory types has been created and published online for research purposes.

Journal Article

A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection

by Kouliaridis, Vasileios , Kambourakis, Georgios in Accuracy , android malware , Axes (reference lines)

2021

Year after year, mobile malware attacks grow in both sophistication and diffusion. As the open source Android platform continues to dominate the market, malware writers consider it as their preferred target. Almost strictly, state-of-the-art mobile malware detection solutions in the literature capitalize on machine learning to detect pieces of malware. Nevertheless, our findings clearly indicate that the majority of existing works utilize different metrics and models and employ diverse datasets and classification features stemming from disparate analysis techniques, i.e., static, dynamic, or hybrid. This complicates the cross-comparison of the various proposed detection schemes and may also raise doubts about the derived results. To address this problem, spanning a period of the last seven years, this work attempts to schematize the so far ML-powered malware detection approaches and techniques by organizing them under four axes, namely, the age of the selected dataset, the analysis type used, the employed ML techniques, and the chosen performance metrics. Moreover, based on these axes, we introduce a converging scheme which can guide future Android malware detection techniques and provide a solid baseline to machine learning practices in this field.

Journal Article

Enhancing malware detection with feature selection and scaling techniques using machine learning models

by Saleh, Mohammad Abu , Biswas, Barna , Akter, Jahanara in 639/166 , 639/705/117 , Cybersecurity

2025

The increasing prevalence of malware presents a critical challenge to cybersecurity, emphasizing the need for robust detection methods. This study uses a binary tabular classification dataset to evaluate the impact of feature selection, feature scaling, and machine learning (ML) models on malware detection. The methodology involves experimenting with three feature scaling techniques (no scaling, normalization, and min-max scaling), three feature selection methods (no selection, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA)), and twelve ML models, including traditional algorithms and ensemble methods. A publicly available dataset with 11,598 samples and 139 features is utilized, and model performance is assessed using metrics such as accuracy, precision, recall, F1-score, and AUC-ROC. Results reveal that the Light Gradient Boosting Machine (LGBM) achieves the highest accuracy of 97.16% when PCA and either min-max scaling or normalization are applied. Additionally, ensemble models consistently outperform traditional ML models, demonstrating their effectiveness in enhancing malware detection. These findings offer valuable insights into optimizing preprocessing and model selection strategies for developing reliable and efficient malware detection systems.

Journal Article
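The abstract above lists min-max scaling among the preprocessing options it compares. As a small illustration, the sketch below rescales one feature column to the range [0, 1] using (x - min) / (max - min). The function name and the choice to map a constant column to zeros are assumptions made here; the abstract does not describe the authors' implementation.

```python
def min_max_scale(column: list[float]) -> list[float]:
    """Rescale a feature column to [0, 1] with (x - min) / (max - min).

    Hypothetical helper: a constant column has no spread, so it is mapped
    to all zeros instead of dividing by zero.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))
    print(min_max_scale([5, 5, 5]))
```

In practice the minimum and maximum would be taken from the training split only and then reused on the test split, so that no information leaks from the evaluation data.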

Dynamic Extraction of Initial Behavior for Evasive Malware Detection

by Aboaoja, Faitouri A. , Zainal, Anazida , Rassam, Murad A. in Accuracy , Algorithms , Behavior

2023

Recently, malware has become more abundant and complex as the Internet has become more widely used in daily services. Achieving satisfactory accuracy in malware detection is a challenging task since malicious software exhibits non-relevant features when it changes its behavior as a result of its awareness of the analysis environment. However, the existing solutions extract features from the entire data collected from the malware during run time. Accordingly, the actual malicious behaviors are hidden during the training, leading to a model trained using unrepresentative features. To this end, this study presents a feature extraction scheme based on the proposed dynamic initial evasion behaviors determination (DIEBD) technique to improve the performance of evasive malware detection. To effectively represent evasion behaviors, the collected behaviors are tracked by examining the entropy distributions of APIs-gram features using the box-whisker plot algorithm. A feature set suggested by the DIEBD-based feature extraction scheme is used to train machine learning algorithms to evaluate the proposed scheme. Our experiments’ outcomes on a dataset of benign and evasive malware samples show that the proposed scheme achieved an accuracy of 0.967, false positive rate of 0.040, and F1 of 0.975.

Journal Article
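The abstract above refers to examining entropy distributions with the box-whisker plot algorithm. The sketch below shows the standard box-whisker rule in isolation: values lying more than 1.5 times the interquartile range beyond the first or third quartile are flagged. The function name, the 1.5 multiplier, and the sample values are assumptions for illustration; the abstract does not specify how DIEBD applies the rule to API n-gram entropies.

```python
import statistics


def box_whisker_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values outside the box-whisker fences Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper; needs at least two values because
    statistics.quantiles cannot compute quartiles from fewer.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Made-up entropy-like values with one clear outlier at the end.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(box_whisker_outliers(sample))
```

Quartile definitions differ slightly between tools, so values near a fence may be classified differently depending on the quantile method chosen.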

Malware Detection Using Deep Learning and Correlation-Based Feature Selection

by Alyasseri, Zaid Abdi Alkareem , Sani, Nor Samsiah , Mohammed, Husam Jasim in Analysis , Artificial intelligence , Cyberterrorism

2023

Malware is one of the most frequent cyberattacks, with its prevalence growing daily across the network. Malware traffic is always asymmetrical compared to benign traffic, which is always symmetrical. Fortunately, there are many artificial intelligence techniques that can be used to detect malware and distinguish it from normal activities. However, the problem of dealing with large and high-dimensional data has not been addressed enough. In this paper, a high-performance malware detection system using deep learning and feature selection methodologies is introduced. Two different malware datasets are used to detect malware and differentiate it from benign activities. The datasets are preprocessed, and then correlation-based feature selection is applied to produce different feature-selected datasets. The dense and LSTM-based deep learning models are then trained using these different versions of feature-selected datasets. The trained models are then evaluated using many performance metrics (accuracy, precision, recall, and F1-score). The results indicate that some feature-selected scenarios preserve almost the same original dataset performance. The different nature of the used datasets shows different levels of performance changes. For the first dataset, the feature reduction ratios range from 18.18% to 42.42%, with performance degradation of 0.07% to 5.84%, respectively. The second dataset reduction rate is between 81.77% and 93.5%, with performance degradation of 3.79% and 9.44%, respectively.

Journal Article
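The abstract above applies correlation-based feature selection to shrink high-dimensional datasets. One simple variant is sketched below: a feature is dropped when its absolute Pearson correlation with an already kept feature reaches a threshold. The function names, the 0.9 threshold, and the greedy keep-first order are assumptions made here; the abstract does not say which correlation criterion or cutoff the authors used.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant_features(columns: dict[str, list[float]],
                            threshold: float = 0.9) -> list[str]:
    """Greedily keep features whose |correlation| with every kept feature is below threshold."""
    kept: list[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data: "b" is an exact multiple of "a", so it is redundant.
    features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
    print(drop_redundant_features(features))
```

Because the loop is greedy, the result depends on column order; ranking features by relevance to the label before filtering is a common refinement.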
