Catalogue Search | MBRL

An Improved Software Source Code Vulnerability Detection Method: Combination of Multi-Feature Screening and Integrated Sampling Model

by Han, Daoqi , He, Xin , Fu, Xueliang in abstract syntax tree (AST) , Accuracy , Bi-LSTM

2025

Vulnerability detection in software source code is crucial in ensuring software security. Existing models face challenges with dataset class imbalance and long training times. To address these issues, this paper introduces a multi-feature screening and integrated sampling model (MFISM) to enhance vulnerability detection efficiency and accuracy. The key innovations include (i) utilizing abstract syntax tree (AST) representation of source code to extract potential vulnerability-related features through multiple feature screening techniques; (ii) conducting analysis of variance (ANOVA) and evaluating feature selection techniques to identify representative and discriminative features; (iii) addressing class imbalance by applying an integrated over-sampling strategy to create synthetic samples from vulnerable code to expand the minority class sample size; (iv) employing outlier detection technology to filter out abnormal synthetic samples, ensuring high-quality synthesized samples. The model employs a bidirectional long short-term memory network (Bi-LSTM) to accurately identify vulnerabilities in the source code. Experimental results demonstrate that MFISM improves the F1 score performance by approximately 10% compared to existing DeepBalance methods and reduces the training time to 2–3 h. These results confirm the effectiveness and superiority of MFISM in source code vulnerability detection tasks.

Journal Article
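
A minimal sketch of the ANOVA-based screening step in the MFISM entry above. It assumes the AST-derived features have already been encoded as numeric vectors; that extraction is not shown. Each feature is ranked by its one-way ANOVA F statistic (between-class variance over within-class variance), and the top `top_k` are kept. The helper names are hypothetical, and this is a generic F-test ranking rather than the authors' exact multi-feature screening.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    # Between-group sum of squares: how far each class mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def screen_features(rows, labels, top_k):
    """Return the column indices of the top_k features ranked by F score."""
    n_features = len(rows[0])
    scores = [anova_f_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]
```

A feature whose class means differ strongly relative to its within-class spread scores high. Such a feature is kept, while features that look the same in vulnerable and non-vulnerable samples are dropped.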

Study on Data Preprocessing for Machine Learning Based on Semiconductor Manufacturing Processes

by Park, Ha-Je , Han, Young-Shin , Nam, Choon-Sung in Accuracy , Bias , Classification

2024

Various data types generated in the semiconductor manufacturing process can be used to increase product yield and reduce manufacturing costs. However, the data generated during the process are collected from various sensors, resulting in diverse units and an imbalanced dataset biased towards the majority class. This study evaluated analysis and preprocessing methods for predicting good and defective products using machine learning to increase yield and reduce costs in semiconductor manufacturing processes. To this end, the SECOM dataset was used, and preprocessing steps such as missing-value handling, dimensionality reduction, resampling to address class imbalance, and scaling were performed. Finally, six machine learning models were evaluated and compared using the geometric mean (GM) and other metrics to assess combinations of preprocessing methods on imbalanced data. Unlike previous studies, this research proposes methods to reduce the number of features used in machine learning in order to shorten training and prediction times. Furthermore, this study prevents data leakage during preprocessing by separating the training and test datasets before analysis and preprocessing. The results showed that applying oversampling methods, except KM SMOTE, achieved more balanced classification across classes. The combination of SVM, ADASYN, and MaxAbs scaling performed best, with an accuracy of 85.14% and a GM of 72.95%, outperforming all other combinations.

Journal Article
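
A small sketch of the leakage-avoiding order of operations described in the SECOM entry above. The data are split first, scaling statistics are fitted on the training portion only, and predictions are scored with the G-mean, taken here as the geometric mean of sensitivity and specificity. It is plain Python. The helper names are hypothetical, and the class only mimics the idea behind scikit-learn's `MaxAbsScaler`; it is not that API. Resampling such as ADASYN is omitted; it would likewise be applied to the training split only.

```python
import math
import random


def split_first(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle once and split BEFORE computing any statistics, so nothing leaks from test to train."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, rows), pick(test_idx, rows), pick(train_idx, labels), pick(test_idx, labels)


class TrainOnlyMaxAbsScaler:
    """Divide each column by its largest absolute value, learned from training rows only."""

    def fit(self, rows):
        # `or 1.0` guards against an all-zero column (avoids division by zero)
        self.max_abs = [max(abs(r[j]) for r in rows) or 1.0 for j in range(len(rows[0]))]
        return self

    def transform(self, rows):
        return [[v / m for v, m in zip(r, self.max_abs)] for r in rows]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on the positive class) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

Typical use is `scaler = TrainOnlyMaxAbsScaler().fit(X_train)`, after which both splits are transformed with that same fitted scaler. Test values may then fall outside [-1, 1], which is expected and harmless.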

Sex differences in autism spectrum disorder using class imbalance adjusted functional connectivity

by Mun, Jongmin , Namgung, Jong Young , Park, Bo-yong in Adolescent , Adult , Association analysis

2024

• Investigated interaction effects between sex and autism condition.
• Sex ratio was matched using the Gaussian mixture model-based oversampling.
• Low-dimensional principal functional gradients were generated.
• Sex-related effects were linked to gene enrichment in cortex, thalamus, and striatum.
• Gradients showed varying associations with symptom severity across sexes.

Autism spectrum disorder (ASD) is an atypical neurodevelopmental condition with a diagnostic ratio largely differing between male and female participants. Due to the sex imbalance in participants with ASD, we lack an understanding of the differences in connectome organization of the brain between male and female participants with ASD. In this study, we matched the sex ratio using a Gaussian mixture model-based oversampling technique and investigated the differences in functional connectivity between male and female participants with ASD using low-dimensional principal gradients. Between-group comparisons of the gradient values revealed significant interaction effects of sex in the sensorimotor, attention, and default mode networks. The sex-related differences in the gradients were highly associated with higher-order cognitive control processes. Transcriptomic association analysis provided potential biological underpinnings, specifying gene enrichment in the cortex, thalamus, and striatum during development. Finally, the principal gradients were differentially associated with symptom severity of ASD between sexes, highlighting significant effects in female participants with ASD. Our work proposed an oversampling method to mitigate sex imbalance in ASD and observed significant sex-related differences in functional connectome organization. The findings may advance our knowledge about the sex heterogeneity in large-scale brain networks in ASD.

Journal Article
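
A rough sketch of the sampling half of Gaussian mixture model (GMM) oversampling, as used in the ASD entry above to match the sex ratio. It assumes the mixture weights, means, and per-feature standard deviations have already been fitted to the under-represented group, for example by expectation-maximization (not shown). It uses diagonal covariances for brevity. The function names are hypothetical, and this is not the authors' code.

```python
import random


def sample_gmm(weights, means, stds, n_samples, seed=0):
    """Draw vectors from a Gaussian mixture with diagonal covariances.

    weights: mixing proportions, one per component
    means, stds: per-component lists of per-feature means and standard deviations
    """
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n_samples):
        # 1) choose a component with probability proportional to its weight
        c = rng.choices(components, weights=weights, k=1)[0]
        # 2) draw every feature independently from that component's Gaussian
        samples.append([rng.gauss(mu, sd) for mu, sd in zip(means[c], stds[c])])
    return samples


def oversample_to_match(minority, target_size, weights, means, stds, seed=0):
    """Append synthetic GMM draws until the minority group reaches target_size."""
    needed = max(0, target_size - len(minority))
    return list(minority) + sample_gmm(weights, means, stds, needed, seed)
```

Compared with interpolation methods such as SMOTE, sampling from a fitted mixture can place synthetic points anywhere the estimated density is high. The trade-off is that the result depends on how well the mixture was fitted.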

CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques

by Zuo, Yun , Zeng, Xiangxiang , Liu, Xiangrong in Aging , Algorithms , Alzheimer's disease

2021

Background: Carbonylation is a non-enzymatic, irreversible protein post-translational modification in which the side chains of amino acid residues are attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiology of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumors. Current experimental approaches for identifying carbonylation sites are expensive, time-consuming, and limited in their protein-processing capacity. Computational prediction of carbonylated residue locations in protein post-translational modifications enhances the functional characterization of proteins.
Results: In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. A resampling method combining K-means similarity-based undersampling with the synthetic minority oversampling technique (SMOTE-KSU) was incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest, with support vector machine (SVM) sub-classifiers, divides three types of feature spaces into several subsets. Under tenfold cross-validation, CarSite-II achieved Matthews correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, false positive rates of 0.2628/0.1084/0.1383/0.1313, and false negative rates of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites, respectively. On the independent test dataset, CarSite-II yielded MCC values of 0.6358/0.2910/0.4629/0.3685, false positive rates of 0.0165/0.0203/0.0188/0.0094, and false negative rates of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II performs markedly better than all currently available prediction tools.
Conclusion: CarSite-II achieved better performance than five currently available programs, demonstrating the usefulness of the SMOTE-KSU resampling approach and the integration algorithm. For the convenience of experimental scientists, a CarSite-II web tool is available at http://47.100.136.41:8081/

Journal Article
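
A minimal sketch of the SMOTE half of SMOTE-KSU from the CarSite-II entry above. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The K-means similarity-based undersampling of the majority class and the Rotation Forest/SVM classifier are not shown. The function name and the default `k=5` are illustrative assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: new points lie on segments joining a minority
    sample to one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest neighbours of `base` by Euclidean distance (math.dist, Python 3.8+)
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): where on the segment the new point falls
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Because every new point is an interpolation between two real minority samples, SMOTE stays inside the region the minority class already occupies. Pairing it with majority undersampling, as SMOTE-KSU does, reduces how many synthetic points are needed.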

Deep ensemble optimal classifier-based software defect prediction for early detection and quality assurance

by Janapati, Krishna Chaithanya , Sai Kiran, Mungalched , Vinay, Thota in Accuracy , Algorithms , Architecture

2025

Software Defect Prediction (SDP) is critical in identifying fault-prone modules during the software development life cycle, enhancing software quality, and reducing maintenance costs. However, existing SDP models face significant challenges, including high false positive rates and severe class imbalance between defect-prone and non-defect modules, which hinder practical model training and accurate defect detection. To overcome these challenges, this study proposes a robust and efficient SDP architecture that integrates several advanced techniques to enhance predictive accuracy and model reliability. The methodology begins with comprehensive data preprocessing, incorporating normalization to standardize feature scales and outlier detection to eliminate noise and improve data quality. To address the inherent class imbalance in the dataset, the minority oversampling by synthetic data (MOSD) technique is employed to generate synthetic samples for the minority class, thus achieving a more balanced distribution. Following this, an Adaptive sequential K-best (ASKB) feature selection strategy is implemented to identify and retain the most relevant features, effectively reducing dimensionality while preserving critical information. Finally, the weighted random forest (WRF) classifier is utilized, which assigns appropriate weights to training instances, enabling improved classification performance, especially for minority class instances. The proposed methodology achieved an accuracy of 99.119%, a precision of 99.431%, a recall of 99.122%, and an F1-score of 99.333%.

Journal Article
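
The abstract above does not say how the weighted random forest assigns its weights. One common choice, assumed here, is the "balanced" heuristic, the same formula scikit-learn uses for `class_weight="balanced"`, which gives each class equal total weight. The sketch below only computes those weights. The helper names are hypothetical, and the MOSD and ASKB steps are not shown.

```python
from collections import Counter


def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * count_c), so every class contributes equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def instance_weights(labels):
    """Per-sample weights a weighted random forest could use when growing its trees."""
    w = balanced_class_weights(labels)
    return [w[y] for y in labels]
```

With 8 non-defective and 2 defective modules, each defective module gets weight 2.5 and each non-defective one 0.625. Both classes therefore sum to 5.0, so misclassifying the minority class costs as much in total as misclassifying the majority.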

A Review on Micro-Watts All-Digital Frequency Synthesizers

by Lim, Xian Yang , Navaneethan, Venkadasamy , Teo, Boon Chiat Terence in CMOS , Communication , Complementary metal oxide semiconductors

2025

This paper reviews recent developments in highly integrated all-digital frequency synthesizers suitable for deployment in low-power internet-of-things (IoT) applications. The review takes low power consumption as a key criterion for exploring all-digital frequency synthesizers implemented in CMOS fabrication technology. Mainstream CMOS technology offers high-density, comprehensive, and robust signal-processing capability. All-digital phase-locked loops are well suited to harness that capability, which makes alignment with CMOS all but inevitable. The review covers various divider-less low-power frequency synthesizers, including all-digital phase-locked loops (ADPLLs), all-digital frequency-locked loops (ADFLLs), and hybrid PLLs. It also discusses the latest architectural developments that enable low-power ADPLL implementations, such as DTC-assisted TDCs, embedded TDCs, and various levels of hybridization in ADPLLs.

Journal Article
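
A toy behavioural model of the frequency-locking idea behind the ADFLLs mentioned in the review above. It is not a specific architecture from the review and ignores phase locking, TDC quantization, DCO noise, and power. In each reference period, the number of DCO cycles (idealized here as a fractional count) is compared with the target ratio, and the error is accumulated into the DCO tuning word. All names and parameter values are hypothetical.

```python
def simulate_adfll(f_ref, n_target, f_free, k_dco, gain, steps):
    """Toy all-digital frequency-locked loop (behavioural, not circuit-level).

    Each reference period the loop compares the DCO cycle count with n_target
    and integrates the error into the DCO tuning word. Returns the DCO
    frequency (Hz) observed in each reference period.
    """
    code = 0.0
    freqs = []
    for _ in range(steps):
        f_dco = f_free + k_dco * code   # DCO output for the current tuning word (Hz)
        count = f_dco / f_ref           # idealized fractional cycle count per reference period
        error = n_target - count        # frequency error expressed in cycles
        code += gain * error            # accumulator acts as an integral loop filter
        freqs.append(f_dco)
    return freqs
```

In this model the frequency error shrinks by a factor of (1 - gain × k_dco / f_ref) every reference period, so the loop is stable for 0 < gain × k_dco / f_ref < 2. With the test parameters (1 MHz reference, 2.3 GHz free-running DCO, 1 MHz per LSB, gain 0.5), that factor is 0.5, and the error halves each period on the way to 2.4 GHz.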

Rockburst Intensity Classification Prediction Based on Multi-Model Ensemble Learning Algorithms

by Ma, Haoji , Wang, Jiachuang , Yan, Xianhang in Algorithms , Analysis , Classification

2023

Rockburst is a common and serious hazard in underground engineering, and scientific prediction of rockburst disasters can reduce the risks they cause. At present, developing an accurate and reliable rockburst risk prediction model remains a great challenge because it is difficult to integrate fusion algorithms so that they complement each other’s strengths. To predict rockburst risk effectively, this paper first collected 410 sets of valid rockburst data as the original dataset and processed these cases with the SMOTE oversampling method. Second, four ensemble algorithms and eight base algorithms were selected, and their hyperparameters were optimized using a random search over a hyperparameter grid combined with five-fold cross-validation, improving their classification performance. Third, stacking ensembles were built with reference to four combination strategies, drawing on the principles of the various machine learning algorithms and the characteristics of the rockburst cases. Next, voting ensembles were constructed under multiple combination schemes, using a weighted fusion of accuracy, F1 score, recall, precision, and cv-mean as the weight values, to obtain the optimal model for rockburst risk prediction. Finally, from the 35 stacking ensembles and 18 voting ensembles generated, the optimal model under each fusion strategy was selected, and traditional ensemble models were analyzed for different sample combinations of the models. The results showed that the stacking and voting ensembles mostly outperformed ordinary machine learning models, and that selecting appropriate fusion strategies can effectively improve the rockburst prediction performance of ensemble learning algorithms.

Journal Article
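
A sketch of metric-weighted voting in the spirit of the rockburst entry above. The abstract says the voting weights come from a weighted fusion of accuracy, F1 score, recall, precision, and cv-mean, but it does not give the fusion rule, so a plain average is assumed here. Model names and class labels in the test are hypothetical.

```python
from collections import defaultdict

METRICS = ("accuracy", "f1", "recall", "precision", "cv_mean")


def fused_weight(scores):
    """Collapse one model's validation metrics into a voting weight (assumed: plain average)."""
    return sum(scores[m] for m in METRICS) / len(METRICS)


def weighted_vote(predictions, weights):
    """predictions: {model: predicted class}; weights: {model: weight}. Returns the winning class."""
    tally = defaultdict(float)
    for model, label in predictions.items():
        tally[label] += weights[model]
    return max(tally, key=tally.get)
```

Under these weights, one strong model (weight 0.9) can outvote two weak models that agree with each other (0.4 + 0.4 = 0.8). An unweighted majority vote would have sided with the weak pair.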

Design and analysis of digital data recovery circuits using oversampling

by Lin, C.-H. , Li, Z.-H. , Jou, S.-J. in Applied sciences , Architecture , Circuit properties

2007

A performance evaluation and circuit architecture for all-digital data recovery using an oversampling method are proposed. The architecture is highly regular and hence well suited to a standard-cell implementation flow. Because the architecture is feedforward, the required bit rate can be achieved through proper pipelining. These properties make the proposed architecture well suited as soft silicon intellectual property. The BER under the combined effects of key design parameters of the oversampling technique, such as data jitter, clock jitter, and oversampling ratio, is analyzed. Data recovery circuits meeting different specifications can thus be designed by choosing the design parameters accordingly. A module generator that estimates the design parameters automatically is also established. The design implementation shows that the proposed all-digital data recovery circuit achieves 3.07 Gbit/s (post-layout) in a 0.25 μm, 2.5 V CMOS standard-cell design and occupies a 380 × 390 μm² chip area.

Journal Article
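
A toy model of oversampling-based data recovery that illustrates the general idea behind the entry above rather than the paper's feedforward circuit. Each bit is sampled `ratio` times, the phase at which transitions cluster is located, and every bit is then read at the phase farthest from that edge. It ignores jitter tracking, pipelining, and the module generator. All names are hypothetical.

```python
def recover_bits(samples, ratio=3):
    """Toy oversampling data recovery.

    samples: one value per oversampling clock, `ratio` samples per bit.
    1) histogram the phase (index mod ratio) at which transitions occur;
    2) pick the phase half a bit period away from the most common edge phase;
    3) read one sample per bit at that phase.
    """
    edge_hist = [0] * ratio
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_hist[i % ratio] += 1   # first sample after a transition lands on phase i % ratio
    edge_phase = max(range(ratio), key=lambda p: edge_hist[p])
    best_phase = (edge_phase + ratio // 2) % ratio
    return [samples[i] for i in range(best_phase, len(samples), ratio)]
```

Shifting the sample stream by one oversampling clock moves the detected edge phase, and the chosen sampling phase follows it. The same bits are recovered either way.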

OpenSerDes: An Open Source Process-Portable All-Digital Serial Link

by Chatterjee, Baibhab , Gaurav, Kumar K , Sen, Shreyas in Application specific integrated circuits , Circuit design , CMOS

2021

In the last decade, the growing influence of open source software has created a need to lower the abstraction levels in hardware design. Open source hardware significantly reduces development time, increases the probability of first-pass success, and enables developers to optimize software solutions based on hardware features, thereby reducing design costs. The recent introduction of the open source Process Development Kit (OpenPDK) by SkyWater Technology in June 2020 has removed barriers to Application-Specific Integrated Circuit (ASIC) design, which is otherwise considered expensive and not easily accessible. The OpenPDK is the first concrete step towards open source circuit blocks that can be imported, reused, and modified in ASIC design. As process technologies scale down for better performance, the need for entirely digital designs has increased considerably. Such designs can be synthesized in any standard Automatic Place-and-Route (APR) tool and are therefore easier to map onto a new process technology. This work presents the first open source all-digital Serializer/Deserializer (SerDes) for multi-GHz serial links, designed in the SkyWater OpenPDK 130 nm process node. To keep the design fully synthesizable, the SerDes uses CMOS inverter-based drivers at the Tx, while the Rx front end comprises a resistive-feedback inverter as the sensing element, followed by sampling elements. A fully digital oversampling CDR at the Rx recovers the Tx clock for proper decoding of the data bits. The physical design flow uses OpenLANE, an end-to-end tool for generating GDS from RTL. Virtuoso was used to extract parasitics for post-layout simulations, which show the SerDes operating at 2 Gbps over a channel with 34 dB loss while consuming 438 mW. The GDS and netlist files of the SerDes have been uploaded to a GitHub repository for public access.

Paper

VLSI Implementation of Novel Class of High Speed Pipelined Digital Signal Processing Filter for Wireless Receivers

by Islam, Md Shabiul , Yazan Samir Algnabi , Teymourzadeh, Rozita in Analog to digital converters , Communications systems , Digital signal processing

2018

The need for high-performance transceivers with a high signal-to-noise ratio (SNR) has driven communication systems to adopt oversampling. The oversampling modulator with decimation has been regarded as the most economical option in communication systems. It has been proven to increase SNR and is used in many high-performance systems, such as the analog-to-digital converters (ADCs) of wireless transceivers. This work presents the design of a novel class of decimation filter, a subcomponent of the oversampling technique, and its VLSI implementation. The main unit of the decimation stage, the Cascaded Integrator-Comb (CIC) filter, is designed and realized together with the associated half-band filters and droop correction. Verilog HDL code describing the proposed advanced CIC filter was developed in the Xilinx ISE environment, and a Virtex-II FPGA board was used to implement and test the design on real hardware. An ASIC implementation was then carried out, yielding power and area measurements from the on-chip core layout. The design addresses the trade-offs between high speed and low power consumption, and between silicon area and high resolution, to meet the requirements of wireless communication systems. The synthesis report shows a maximum clock frequency of 332 MHz and an active core area of 0.308 × 0.308 mm². The VLSI implementation of the proposed filter architecture can therefore help solve problems that affect communication capability in DSP applications.

Paper
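
A behavioural sketch of the CIC decimation filter described in the entry above, written in Python rather than the paper's Verilog and omitting the half-band and droop-correction stages. N integrators run at the oversampled input rate, the stream is downsampled by R, and N comb stages with differential delay M run at the output rate. The function name and the parameter values in the test are illustrative assumptions.

```python
def cic_decimate(x, R, N, M=1):
    """Behavioural N-stage CIC decimator (integer arithmetic).

    N integrators run at the input (oversampled) rate, the stream is
    downsampled by R, then N comb stages y[n] = v[n] - v[n - M] run at the
    output rate. DC gain is (R * M) ** N, so hardware registers need about
    N * log2(R * M) extra bits. Python ints never overflow, whereas hardware
    CICs rely on two's-complement wraparound in the integrators.
    """
    integrators = [0] * N
    decimated = []
    for n, sample in enumerate(x):
        acc = sample
        for s in range(N):          # cascade: each integrator feeds the next
            integrators[s] += acc
            acc = integrators[s]
        if n % R == R - 1:          # keep one of every R integrator outputs
            decimated.append(acc)
    y = decimated
    for _ in range(N):              # comb stages with differential delay M
        y = [y[n] - (y[n - M] if n >= M else 0) for n in range(len(y))]
    return y
```

For a constant input of 1 with R = 4, N = 3, and M = 1, the output settles to (4 × 1)^3 = 64 after two transient samples (20 and 60). The filter needs no multipliers, only adders, subtractors, and registers, which is why CIC filters are attractive for the high-rate first decimation stage.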
