Catalogue Search | MBRL

MalHAPGNN: An Enhanced Call Graph-Based Malware Detection Framework Using Hierarchical Attention Pooling Graph Neural Network

by Hu, Jingjing , Xue, Jingfeng , Guo, Wenjie in Accuracy , Artificial intelligence , Classification

2025

While deep learning techniques have been extensively employed in malware detection, there is a notable challenge in effectively embedding malware features. Current neural network methods primarily capture superficial characteristics, lacking in-depth semantic exploration of functions and failing to preserve structural information at the file level. Motivated by the aforementioned challenges, this paper introduces MalHAPGNN, a novel framework for malware detection that leverages a hierarchical attention pooling graph neural network based on enhanced call graphs. Firstly, to ensure semantic richness, a Bidirectional Encoder Representations from Transformers-based (BERT) attribute-enhanced function embedding method is proposed for the extraction of node attributes in the function call graph. Subsequently, this work designs a hierarchical graph neural network that integrates attention mechanisms and pooling operations, complemented by function node sampling and structural learning strategies. This framework delivers a comprehensive profile of malicious code across semantic, syntactic, and structural dimensions. Extensive experiments conducted on the Kaggle and VirusShare datasets have demonstrated that the proposed framework outperforms other graph neural network (GNN)-based malware detection methods.
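As a rough illustration of the attention-plus-pooling idea mentioned in this abstract, the sketch below collapses a set of function-node embeddings into one graph-level vector using softmax attention weights. It is a generic attention readout, not the MalHAPGNN architecture: the `query` vector stands in for a learned parameter, and the function name is hypothetical.

```python
import math


def attention_readout(node_embeddings, query):
    """Pool node embeddings into a single vector with softmax attention.

    node_embeddings: list of equal-length float lists, one per graph node.
    query: float list of the same length (assumed to be a learned parameter).
    Returns (weights, pooled_vector).
    """
    # Score each node by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in node_embeddings]

    # Numerically stable softmax over the node scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of node embeddings gives the graph-level vector.
    dim = len(query)
    pooled = [
        sum(w * h[d] for w, h in zip(weights, node_embeddings))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w, g = attention_readout(nodes, query=[0.5, 0.5])
    print(w, g)
```

Nodes whose embeddings align with the query receive larger weights, so the pooled vector emphasises them; a hierarchical model would repeat a step like this at several levels of a coarsened graph.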

Journal Article

Deep learning for effective Android malware detection using API call graph embeddings

by Acarman, Tankut , Pektaş, Abdurrahman in Accuracy , Algorithms , Application programming interface

2020

High penetration of Android applications along with their malicious variants requires efficient and effective malware detection methods to build mobile platform security. An API call sequence derived from the API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. But identifying similarities in graphs and applying graph matching algorithms for classification is slow and complicated to adapt to a new domain, and the results may be inaccurate. In this study, the authors use the API call graph as a graph representation of all possible execution paths that malware can follow at runtime. The API call graphs are embedded into a low-dimensional numeric vector feature set, which is fed to the deep neural network. Then, similarity detection for each binary function is trained and tested effectively. This study also focuses on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters to find the best combination of hyper-parameters and reach the highest statistical metric value. Experimental results show that the presented malware classification reaches 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
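The four metrics quoted in this abstract are all derived from the same confusion-matrix counts. The sketch below shows the standard definitions, assuming a binary malware/benign setting; the counts in the usage example are made up for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    tp: malware correctly flagged      fp: benign wrongly flagged
    tn: benign correctly passed        fn: malware missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```

Because the F-measure is a harmonic mean, it sits between precision and recall and is pulled toward the lower of the two, which is consistent with the reported F-measure falling between the reported recall and precision.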

Journal Article

Embedding vector generation based on function call graph for effective malware detection and classification

by Wu, Xiao-Wang , Jia, Peng , Wang, Yan in Accuracy , Artificial Intelligence , Classification

2022

The surge of malware poses a huge threat to cyberspace security. The existing malware analysis methods based on machine learning mainly rely on feature engineering. These methods need to extract many handcrafted features from the malware to improve accuracy, which increases the complexity of malware analysis. To solve this problem, this paper proposes GEMAL, a new malware analysis method based on the function call graph (FCG) and a graph embedding network. The FCG contains the structural information of the binary file and has been used in various malware analysis studies. Inspired by natural language processing tasks, we treat instructions as words and functions as sentences, so that we can automatically extract semantic features using natural language processing methods. We use an attention mechanism-based graph embedding network to combine structural features and semantic features to generate embedding vectors of malware for automatic and efficient malware analysis. We use two datasets to test the efficiency of GEMAL. One is a self-created dataset named WUFCG, which contains 70,188 real-world samples. The other is the public dataset of the Microsoft Malware Classification Challenge, which contains 10,868 samples. Experimental results show that GEMAL can detect real-world malware with 99.16% accuracy and classify malware families with the best accuracy of 99.81%.

Journal Article

NMal-Droid: network-based android malware detection system using transfer learning and CNN-BiGRU ensemble

by Srivastava, Gautam , Zhao, Yue , Lin, Jerry Chun-Wei in Algorithms , Artificial neural networks , Classification

2024

Currently, malware activities pose a substantial risk to the security of Android applications. These threats can steal important information and cause chaos in the economy, social structure, and financial sector. Malicious network traffic targets Android applications due to their constant connectivity. This study develops the NMal-Droid approach for network-based Android malware detection and classification. First, we designed a packet parser algorithm that filters the combination of HTTP traces and TCP flows from PCAP (packet capture) files. Second, a fine-tuned embedding approach is developed that uses a word2vec pre-trained model to analyze feature embeddings in three different ways, i.e., random, static, and dynamic. It is used to learn and extract feature matrices with related meanings. Third, a Convolutional Neural Network (CNN) is used to extract effective features from the embedded information. Fourth, a Bi-directional Gated Recurrent Unit (Bi-GRU) neural network is designed to compute gradients in both the time-forward and time-reversed directions. Finally, a multi-head ensemble of CNN-BiGRU is developed for accurate malware classification and detection. The proposed approach is evaluated on five different activation functions with 100 filters and a range of 1–5 kernel sizes for in-depth investigation. An explainable AI-based experiment is conducted to interpret and validate the proposed approach. The proposed method is tested using two large Android malware datasets, CIC-AAGM2017 and CICMalDroid 2020, which comprise a total of 10.2k malware and 3.2k benign samples. The proposed approach is shown to outperform state-of-the-art methods.

Journal Article

BejaGNN: behavior-based Java malware detection via graph neural network

by Yang, Li , Ma, Jianfeng , Feng, Pengbin in Algorithms , Anti-virus software , Archives & records

2023

As a popular platform-independent language, Java is widely used in enterprise applications. In the past few years, language vulnerabilities exploited by Java malware have become increasingly prevalent, posing threats across multiple platforms. Security researchers continuously propose various approaches for fighting Java malware. The low code path coverage and poor execution efficiency of dynamic analysis limit the large-scale application of dynamic Java malware detection methods. Therefore, researchers turn to extracting abundant static features to implement efficient malware detection. In this paper, we explore the direction of capturing malware semantic information by using graph learning algorithms and present BejaGNN (Behavior-based Java malware detection via Graph Neural Network), a novel behavior-based Java malware detection method using static analysis, word embedding techniques, and a graph neural network. Specifically, BejaGNN leverages static analysis techniques to extract ICFGs (Inter-procedural Control Flow Graphs) from Java program files and then prunes these ICFGs to remove noisy instructions. Then, word embedding techniques are adopted to learn semantic representations for Java bytecode instructions. Finally, BejaGNN builds a graph neural network classifier to determine the maliciousness of Java programs. Experimental results on a public Java bytecode benchmark demonstrate that BejaGNN achieves a high F1 score of 98.8% and is superior to existing Java malware detection approaches, which verifies the promise of graph neural networks in Java malware detection.

Journal Article

Semantic lossless encoded image representation for malware classification

by Chakrabarti, Prasun , Yu, Yaoxiang , Aziz, Kamran in 639/705/117 , 639/705/258 , Artificial intelligence

2025

Combining artificial intelligence with static analysis is an effective method for classifying malicious code. Due to the development of anti-analysis techniques, malicious code commonly employs obfuscation methods like packing, which result in garbled assembly code and the loss of original semantics. Consequently, existing pre-trained code language models are rendered ineffective in such scenarios. Current research addresses this issue by converting malicious bytecode into grayscale images and extracting visual features for classification. However, this process truncates the original sequence, compromising its coherence and structure. Furthermore, the image dimensions undergo compression and cropping based on the model’s input requirements, leading to the loss of intricate details. Our solution is a lossless encoding method for the visual structure of code, enabling unrestricted processing of malicious code images of any size. We convert bytecode files into semantically lossless images with proportional width. Then, we use image interleaving encoding to address semantic truncation issues caused by traditional image preprocessing methods. This method also prevents the loss of original code information due to image cropping or compression. For feature extraction, our goal is to combine the lossless encoding results with both local receptive field features and global contextual features. For local features, we achieve uniform embedding of variably sized input samples into equally sized feature maps using a multi-scale feature extraction module. For global contextual features, we reframe the feature maps along the row dimension, treating them as long-text sequences embedded in a matrix. We segment the feature maps into multiple row patch blocks and modify the Transformer’s input components to cache and merge the hidden states of each block. Comparative experiments on various malware datasets demonstrate the effectiveness of our method, consistently achieving outstanding performance across classification metrics.

Journal Article

An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL

by Huang, Mingqing , Jiang, Qingshan , Aljofey, Ali in Algorithms , Artificial neural networks , Blacklisting

2020

Phishing is the easiest way to commit cybercrime, with the aim of enticing people to give accurate information such as account IDs, bank details, and passwords. This type of cyberattack is usually triggered by emails, instant messages, or phone calls. The existing anti-phishing techniques are mainly based on source code features, which require scraping the content of web pages, and on third-party services, which slow down the classification of phishing URLs. Although machine learning techniques have lately been used to detect phishing, they require substantial manual feature engineering and are not adept at detecting emerging phishing offenses. Due to the recent rapid development of deep learning techniques, many deep learning-based methods have also been introduced to enhance classification performance. In this paper, a fast deep learning-based solution model, which uses a character-level convolutional neural network (CNN) for phishing detection based on the URL of the website, is proposed. The proposed model does not require the retrieval of target website content or the use of any third-party services. It captures information and sequential patterns of URL strings without requiring prior knowledge about phishing, and then uses the sequential pattern features for fast classification of the actual URL. For evaluation, comparisons are provided between different traditional machine learning models and deep learning models using various feature sets such as hand-crafted, character embedding, character-level TF-IDF, and character-level count vector features. According to the experiments, the proposed model achieved an accuracy of 95.02% on our dataset and accuracies of 98.58%, 95.46%, and 95.22% on benchmark datasets, outperforming the existing phishing URL models.
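A character-level model of the kind described here needs each URL turned into a fixed-length sequence of integer ids before it can be passed to an embedding layer and a CNN. The sketch below shows one common way to do that. The alphabet, the reserved ids for padding and unknown characters, and the default `max_len` are all assumptions for illustration, not details taken from the paper.

```python
import string

# Assumed alphabet: lowercase letters, digits and ASCII punctuation.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
PAD, UNK = 0, 1  # reserved ids (assumption): padding and out-of-alphabet
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=200):
    """Map a URL string to a fixed-length list of integer character ids."""
    # Lowercase, truncate to max_len, and look up each character.
    ids = [CHAR_TO_INDEX.get(ch, UNK) for ch in url.lower()[:max_len]]
    # Right-pad with PAD so every encoded URL has the same length.
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_url("http://example.com/login", max_len=32))
```

No page content or third-party lookup is involved: the only input is the URL string itself, which matches the abstract's point about classification speed.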

Journal Article

Embedding and Siamese deep neural network-based malware detection in Internet of Things

by Srinivasulu, Asadi , Lakshmi, T. Sree , Govindarajan, M. in Accuracy , Artificial intelligence , Artificial neural networks

2025

Purpose: A proper understanding of malware characteristics is necessary to protect massive data generated because of the advances in Internet of Things (IoT), big data and the cloud. Because of the encryption techniques used by the attackers, network security experts struggle to develop an efficient malware detection technique. Though few machine learning-based techniques are used by researchers for malware detection, large amounts of data must be processed and detection accuracy needs to be improved for efficient malware detection. Deep learning-based methods have gained significant momentum in recent years for the accurate detection of malware. The purpose of this paper is to create an efficient malware detection system for the IoT using Siamese deep neural networks. Design/methodology/approach: In this work, a novel Siamese deep neural network system with an embedding vector is proposed. Siamese systems have generated significant interest because of their capacity to pick up a significant portion of the input. The proposed method is efficient in malware detection in the IoT because it learns from a few records to improve forecasts. The goal is to determine the evolution of malware similarity in emerging domains of technology. Findings: The cloud platform is used to perform experiments on the Malimg data set. ResNet50 was pretrained as a component of the subsystem that established embedding. Each system reviews a set of input documents to determine whether they belong to the same family. The results of the experiments show that the proposed method outperforms existing techniques in terms of accuracy and efficiency. Originality/value: The proposed work generates an embedding for each input. Each system examined a collection of data files to determine whether they belonged to the same family. Cosine proximity is also used to estimate the vector similarity in a high-dimensional area.
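The cosine measure mentioned at the end of this abstract compares two embedding vectors by the angle between them rather than by their magnitude. A minimal sketch follows; the `same_family` helper and its 0.8 threshold are hypothetical and only illustrate how a Siamese pair of embeddings might be turned into a same-family decision.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def same_family(embedding_a, embedding_b, threshold=0.8):
    """Hypothetical decision rule: similar enough embeddings share a family."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold


if __name__ == "__main__":
    a = [0.2, 0.9, 0.1]
    b = [0.25, 0.85, 0.05]
    print(cosine_similarity(a, b), same_family(a, b))
```

In practice the threshold would be tuned on validation pairs, and the embeddings would come from the two weight-sharing branches of the network.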

Journal Article

Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection

by Magableh, Aws A. , AlSobeh, Anas M.R. , Shatnawi, Amani in Algorithms , Artificial intelligence , Comparative analysis

2024

As social media platforms continue their exponential growth, so do the threats targeting their security. Detecting disguised spam messages poses an immense challenge owing to the constant evolution of tactics. This research investigates advanced artificial intelligence techniques to significantly enhance multiplatform spam classification on Twitter and YouTube. The deep neural networks we use are state-of-the-art. They are recurrent neural network architectures with long short-term memory (LSTM) cells that are powered by both static and contextualized word embeddings. Extensive comparative experiments precede rigorous hyperparameter tuning on the datasets. Results reveal a profound impact of tailored, platform-specific AI techniques in combating sophisticated and perpetually evolving threats. The key innovation lies in tailoring deep learning (DL) architectures to leverage both intrinsic platform contexts and extrinsic contextual embeddings for strengthened generalization. The results include consistent accuracy improvements of more than 10–15% in multisource datasets, unlocking actionable guidelines on optimal components of neural models, and embedding strategies for cross-platform defense systems. Contextualized embeddings like BERT and ELMo consistently outperform their noncontextualized counterparts. The standalone ELMo model with logistic regression emerges as the top performer, attaining exceptional accuracy scores of 90% on Twitter and 94% on YouTube data. This signifies the immense potential of contextualized language representations in capturing subtle semantic signals vital for identifying disguised spam. As emerging adversarial attacks exploit human vulnerabilities, advancing defense strategies through enhanced neural language understanding is imperative. We recommend that social media companies and academic researchers build on contextualized language models to strengthen social media security. This research approach demonstrates the immense potential of personalized, platform-specific DL techniques to combat the continuously evolving threats to social media security.

Journal Article

A Crypto-Steganography Approach for Hiding Ransomware within HEVC Streams in Android IoT Devices

by Almomani, Iman , El-Shafai, Walid , Alkhayer, Aala in Algorithms , Blockchain , Cryptography

2022

Steganography is a vital security approach that hides any secret content within ordinary data, such as multimedia. This hiding aims to achieve the confidentiality of the IoT secret data, whether it is benign or malicious (e.g., ransomware) and for defensive or offensive purposes. This paper introduces a hybrid crypto-steganography approach for ransomware hiding within high-resolution video frames. This proposed approach is based on hybridizing an AES (advanced encryption standard) algorithm and an LSB (least significant bit) steganography process. Initially, AES encrypts the secret Android ransomware data, and then LSB embeds it based on random selection criteria for the cover video pixels. This research examined broad objective and subjective quality assessment metrics to evaluate the performance of the proposed hybrid approach. We used different sizes of ransomware samples and different resolutions of HEVC (high-efficiency video coding) frames to conduct simulation experiments and comparison studies. The assessment results prove the superior efficiency of the introduced hybrid crypto-steganography approach compared to other existing steganography approaches in terms of (1) achieving the integrity of the secret ransomware data, (2) ensuring higher imperceptibility of stego video frames, (3) introducing a multi-level security approach using the AES encryption in addition to the LSB steganography, (4) performing random embedding based on RPS (random pixel selection) for concealing secret ransomware bits, (5) succeeding in fully extracting the ransomware data at the receiver side, (6) obtaining strong subjective and objective qualities for all tested evaluation metrics, (7) embedding different sizes of secret data at the same time within the video frame, and finally (8) passing the security scanning tests of 70 antivirus engines without detecting the existence of the embedded ransomware.

Journal Article
