Catalogue Search | MBRL
5,984 result(s) for "source code analysis"
Question–Answer Methodology for Vulnerable Source Code Review via Prototype-Based Model-Agnostic Meta-Learning
by Perez-Meana, Hector; Corona-Fraga, Pablo; Hernandez-Suarez, Aldo
in Analysis, Automation, C plus plus
2025
In cybersecurity, identifying and addressing vulnerabilities in source code is essential for maintaining secure IT environments. Traditional static and dynamic analysis techniques, although widely used, often exhibit high false-positive rates, elevated costs, and limited interpretability. Machine Learning (ML)-based approaches aim to overcome these limitations but encounter challenges related to scalability and adaptability due to their reliance on large labeled datasets and their limited alignment with the requirements of secure development teams. These factors hinder their ability to adapt to rapidly evolving software environments. This study proposes an approach that integrates Prototype-Based Model-Agnostic Meta-Learning (Proto-MAML) with a Question-Answer (QA) framework that leverages the Bidirectional Encoder Representations from Transformers (BERT) model. By employing Few-Shot Learning (FSL), Proto-MAML identifies and mitigates vulnerabilities with minimal data requirements, aligning with the principles of the Secure Development Lifecycle (SDLC) and Development, Security, and Operations (DevSecOps). The QA framework allows developers to query vulnerabilities and receive precise, actionable insights, enhancing its applicability in dynamic environments that require frequent updates and real-time analysis. The model outputs are interpretable, promoting greater transparency in code review processes and enabling efficient resolution of emerging vulnerabilities. Proto-MAML demonstrates strong performance across multiple programming languages, achieving an average precision of 98.49%, recall of 98.54%, F1-score of 98.78%, and exact match rate of 98.78% in PHP, Java, C, and C++.
Journal Article
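The Proto-MAML entry above relies on prototype-based few-shot classification. As a rough illustration of that one ingredient only, the Python sketch below averages support-set embeddings into one prototype per class and labels a query by its nearest prototype (squared Euclidean distance, as in prototypical networks). It assumes the embeddings are already computed; the BERT encoder, the MAML adaptation loop, and the QA framing from the paper are not reproduced, and the function names, labels, and 2-D vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def class_prototypes(support: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the support embeddings of each class into one prototype vector."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for embedding, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
        sums[label] = [s + x for s, x in zip(sums[label], embedding)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the label of the prototype closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-D embeddings for a 2-way, 2-shot episode.
support = [
    ([0.9, 0.1], "vulnerable"),
    ([1.1, -0.1], "vulnerable"),
    ([0.0, 1.0], "safe"),
    ([0.2, 0.8], "safe"),
]
prototypes = class_prototypes(support)
print(classify([1.0, 0.2], prototypes))  # prints: vulnerable
```

In a meta-learning setup, the encoder producing these embeddings would be what gets adapted across episodes; the nearest-prototype rule itself has no trainable parameters.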
Studying the characteristics of AIOps projects on GitHub
by Khomh, Foutse; Li, Heng; Aghili, Roozbeh
in Anomalies, Artificial intelligence, Machine learning
2023
Artificial Intelligence for IT Operations (AIOps) leverages AI approaches to handle the massive amount of data generated during the operations of software systems. Prior works have proposed various AIOps solutions to support different tasks in system operations and maintenance, such as anomaly detection. In this study, we conduct an in-depth analysis of open-source AIOps projects to understand the characteristics of AIOps in practice. We first carefully identify a set of AIOps projects from GitHub and analyze their repository metrics (e.g., the programming languages used). Then, we qualitatively examine the projects to understand their input data, analysis techniques, and goals. Finally, we assess the quality of these projects using different quality metrics, such as the number of bugs. To provide context, we also sample two sets of baseline projects from GitHub: a random sample of machine learning projects and a random sample of general-purpose projects. By comparing different metrics between our identified AIOps projects and these baselines, we derive meaningful insights. Our results reveal a recent and growing interest in AIOps solutions. However, the quality metrics indicate that AIOps projects suffer from more issues than our baseline projects. We also pinpoint the most common issues in AIOps approaches and discuss potential solutions to address these challenges. Our findings offer valuable guidance to researchers and practitioners, enabling them to comprehend the current state of AIOps practices and shedding light on different ways of improving AIOps’ weaker aspects. To the best of our knowledge, this work marks the first attempt to characterize open-source AIOps projects.
Journal Article
Boosting source code learning with text-oriented data augmentation: an empirical study
by Papadakis, Mike; Le Traon, Yves; Dong, Zeming
in Artificial neural networks, Classification, Cloning
2025
Recent studies have demonstrated remarkable advancements in source code learning, which applies deep neural networks (DNNs) to tackle various software engineering tasks. As in other DNN-based domains, source code learning requires large amounts of high-quality training data for these applications to succeed. Data augmentation, a technique used to produce additional training data, is widely adopted in other domains (e.g., computer vision). However, the existing practice of data augmentation in source code learning is limited to simple syntax-preserving methods, such as code refactoring. In this paper, considering that source code can also be represented as text data, we take an early step to investigate the effectiveness of data augmentation methods originally designed for natural language texts in the context of source code learning. To this end, we focus on code classification tasks and conduct a comprehensive empirical study across four critical code problems and four DNN architectures to assess the effectiveness of 25 data augmentation methods. Our results reveal specific data augmentation methods that yield more accurate and robust models for source code learning. Additionally, we discover that the data augmentation methods remain beneficial even when they slightly break source code syntax.
Journal Article
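The data-augmentation entry above does not list its 25 methods in the abstract. Purely to illustrate what a text-oriented augmentation looks like when applied to code, the sketch below implements two generic token-level operations, random swap and random deletion, over a whitespace-tokenised snippet. Whether these exact operations are among the methods the authors evaluated is an assumption; the tokenisation, snippet, and function names are hypothetical. Both operations can break syntax, which the abstract reports can still be beneficial.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the tokens at two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


# Naive whitespace tokenisation of a hypothetical C-like snippet.
tokens = "int total = a + b ; return total ;".split()
rng = random.Random(42)  # fixed seed so the augmentation is reproducible
print(" ".join(random_swap(tokens, 2, rng)))
print(" ".join(random_deletion(tokens, 0.2, rng)))
```

In a code classification setting, each augmented token sequence would keep the label of the snippet it was derived from.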
Empirical evidence on the link between object-oriented measures and external quality attributes: a systematic literature review
2015
There is a plethora of studies investigating object-oriented measures and their link with external quality attributes, but the usefulness of the measures may differ across empirical studies. This study aims to aggregate and identify useful object-oriented measures that have undergone such empirical evaluation, specifically those obtainable from the source code of object-oriented systems. By conducting a systematic literature review, 99 primary studies were identified and traced to four external quality attributes: reliability, maintainability, effectiveness and functionality. A vote-counting approach was used to investigate the link between object-oriented measures and the attributes, and to assess the consistency of the relation reported across empirical studies. Most of the studies investigate links between object-oriented measures and proxies for reliability attributes, followed by proxies for maintainability. The least investigated attributes were effectiveness and functionality. Measures from the C&K measurement suite were the most popular across studies. Vote-counting results suggest that complexity, cohesion, size and coupling measures have a better link with reliability and maintainability than inheritance measures. However, inheritance measures should not be overlooked during quality assessment initiatives; their link with reliability and maintainability could be context dependent. There were too few studies traced to effectiveness and functionality attributes; thus a meaningful vote-counting analysis could not be conducted for these attributes. Hence, there is a need for diversification of the quality attributes investigated in empirical studies. This would help with identifying useful measures during quality assessment initiatives, and not just for reliability and maintainability aspects.
Journal Article
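The systematic review above aggregates primary studies by vote counting. A minimal sketch of that idea, assuming each study contributes one outcome per measure/attribute pair (a significant positive association with the attribute proxy, a significant negative one, or none) and using a hypothetical majority rule for calling a direction consistent. The records are invented for illustration and are not the review's data; WMC and DIT are CK metric names used here only as labels.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# One record per primary study: (measure, attribute, outcome), where outcome is
# "positive", "negative" (significant association in that direction) or "none".
Record = Tuple[str, str, str]


def vote_count(records: Iterable[Record]) -> Dict[Tuple[str, str], Counter]:
    """Tally outcomes for every (measure, attribute) pair."""
    tallies: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for measure, attribute, outcome in records:
        tallies[(measure, attribute)][outcome] += 1
    return dict(tallies)


def summarise(votes: Counter, threshold: float = 0.5) -> str:
    """Hypothetical decision rule: report a direction only if more than
    `threshold` of the studies support it, otherwise 'inconclusive'."""
    total = sum(votes.values())
    for outcome in ("positive", "negative"):
        if total and votes[outcome] / total > threshold:
            return outcome
    return "inconclusive"


# Invented example records, not data from the review.
records = [
    ("WMC", "reliability", "positive"),
    ("WMC", "reliability", "positive"),
    ("WMC", "reliability", "none"),
    ("DIT", "reliability", "positive"),
    ("DIT", "reliability", "none"),
    ("DIT", "reliability", "none"),
]
for pair, votes in vote_count(records).items():
    print(pair, dict(votes), summarise(votes))
```

Vote counting ignores effect sizes and sample sizes, which is one reason a review like this one can only speak to the consistency of reported links rather than their strength.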
Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey
by Lashkari, Arash Habibi; Vombatkere, Nikhill; He, Xie
in author profiling, Authorship, authorship attribution
2024
Over the past few decades, researchers have paid significant attention to the authorship attribution field, as it plays an important role in software forensics analysis, plagiarism detection, security attack detection, the protection of trade secrets, and cases involving patent claims, copyright infringement, or software theft. This work helps new researchers understand the state-of-the-art work on authorship attribution methods, identify and examine emerging methods for authorship attribution, and discuss their key concepts, associated challenges, and potential future work that could help newcomers to this field. This paper comprehensively surveys authorship attribution methods and their key classifications, the feature types used, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss potential future research directions for the authorship attribution field based on the insights and lessons learned from this survey.
Journal Article
Comparing four approaches for technical debt identification
2014
Software systems accumulate technical debt (TD) when short-term goals in software development are traded for long-term goals (e.g., a quick-and-dirty implementation to reach a release date versus a well-refactored implementation that supports the long-term health of the project). Some forms of TD accumulate over time in the form of source code that is difficult to work with and exhibits a variety of anomalies. A number of source code analysis techniques and tools have been proposed to potentially identify the code-level debt accumulated in a system. What has not yet been studied is whether using multiple tools to detect TD can lead to benefits, that is, whether different tools flag the same or different source code components. Further, the symptoms of TD “interest” associated with the debt these techniques identify have not been investigated. To address this latter question, we also investigated whether TD, as identified by the source code analysis techniques, correlates with interest payments in the form of increased defect- and change-proneness. Our aim was to compare the results of different TD identification approaches, to understand their commonalities and differences, and to evaluate their relationship to indicators of future TD “interest.” We selected four different TD identification techniques (code smells, automatic static analysis issues, grime buildup, and Modularity violations) and applied them to 13 versions of the Apache Hadoop open source software project. We collected and aggregated statistical measures to investigate whether the different techniques identified TD indicators in the same or different classes and whether those classes in turn exhibited high interest (in the form of a large number of defects and higher change-proneness). The outputs of the four approaches have very little overlap and therefore point to different problems in the source code. Dispersed Coupling and Modularity violations were co-located in classes with higher defect-proneness.
We also observed a strong relationship between Modularity violations and change-proneness. Our main contribution is an initial overview of the TD landscape, showing that different TD techniques are loosely coupled and therefore indicate problems in different locations of the source code. Moreover, our proxy interest indicators (change- and defect-proneness) correlate with only a small subset of TD indicators.
Journal Article
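The TD entry above reports that the four identification techniques have very little overlap in the classes they flag. The abstract does not name its overlap statistic, so the sketch below assumes pairwise Jaccard similarity between the sets of flagged classes; the technique keys and the class names A to E are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(flagged: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity of flagged classes for every pair of techniques."""
    return {
        (x, y): jaccard(flagged[x], flagged[y])
        for x, y in combinations(sorted(flagged), 2)
    }


# Hypothetical sets of classes flagged by each technique.
flagged = {
    "code_smells": {"A", "B", "C"},
    "static_analysis": {"C", "D"},
    "grime": {"E"},
    "modularity_violations": {"B", "E"},
}
for pair, score in pairwise_overlap(flagged).items():
    print(pair, round(score, 2))
```

A value near 0 for most pairs would correspond to the "very little overlap" finding; relating flagged classes to defect- and change-proneness would need a separate statistical test.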
Studying logging practice in test code
by Zhang, Haonan; Lamothe, Maxime; Tang, Yiming
in Computer engineering, Empirical analysis, Open source software
2022
Logging is widely used in modern software development to record run-time information for software systems and plays a significant role in software testing. Although logging has attracted much research attention, little attention has been paid to the practice of test logging (i.e., the logging involved in test files). To fill this knowledge gap, we conduct an empirical study to explore and disclose the practice of test logging. This study examines 21 open-source subjects with ∼70K logging statements, of which ∼48K are production logging statements and ∼22K are test logging statements. We organize our study by answering four research questions, and as a result, (1) we report five findings that reveal the differences between test and production logging statements, (2) we disclose four findings regarding the differences between the maintenance efforts of test and production logging statements, (3) we identify four reasons why developers use test logging, and (4) we uncover the relationship between test logging and production logging. To the best of our knowledge, this is the first study that quantitatively and qualitatively analyzes the logging practices in test and production code, providing developers and researchers with insight into this topic.
Journal Article
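The test-logging entry above splits ∼70K logging statements into production and test logging. The abstract does not describe how statements were located or classified, so the following is only a hypothetical sketch: a regular expression for Java-style logger calls and a path heuristic (a `/test/` directory or a `*Test.java` filename) for deciding whether a file is test code. Both the pattern and the heuristic are assumptions and would miss or misclassify cases in real projects.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern for Java-style logger calls such as LOG.info(...) or logger.warn(...).
LOG_CALL = re.compile(
    r"\b(?:log|logger|LOG|LOGGER)\.(?:trace|debug|info|warn|error|fatal)\s*\("
)


def is_test_file(path: str) -> bool:
    """Heuristic: files under a /test/ directory or named *Test.java count as test code."""
    return "/test/" in path or path.endswith("Test.java")


def count_logging(files: Dict[str, str]) -> Tuple[int, int]:
    """Return (production, test) counts of logging calls over a path -> source mapping."""
    production = test = 0
    for path, source in files.items():
        n = len(LOG_CALL.findall(source))
        if is_test_file(path):
            test += n
        else:
            production += n
    return production, test


files = {
    "src/main/java/Server.java": 'LOG.info("start"); if (failed) { LOG.error("stop"); }',
    "src/test/java/ServerTest.java": 'logger.debug("fixture ready");',
}
print(count_logging(files))  # prints (2, 1)
```

A regex is a deliberately crude stand-in here; a study of this kind would more plausibly parse the code so that commented-out calls and custom logging wrappers are handled correctly.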
Analysing the Analysers: An Investigation of Source Code Analysis Tools
by Bhutani, Vikram; Buckley, Jim; Toosi, Farshad Ghassemi
in Artificial intelligence, code quality, object oriented paradigm
2024
The primary expectation from a software system revolves around its functionality. However, as the software development process advances, equal emphasis is placed on the quality of the software system for non-functional attributes like maintainability and performance. Tools are available to aid in this endeavour, assessing the quality of a software system from multiple perspectives.

This study aims to perform a comprehensive analysis of a particular set of source code analysis tools by examining diverse perspectives found in the literature and documentation. Given the vast array of programming languages available today, selecting appropriate source code analysis tools presents a significant challenge. Therefore, this analysis aims to provide general insights to aid in selecting an analysis tool tailored to specific requirements.

Seven prominent static analysis tools, namely SonarQube, Coverity, CodeSonar, Snyk Code, ESLint, Klocwork, and PMD, were chosen based on their prevalence in the literature and recognition in the software development community. To systematically categorise and organise their distinctive features and capabilities, a taxonomy was developed. This taxonomy covers crucial dimensions, including input support, technology employed, extensibility, user experience, rules, configurability, and supported languages.

The comparative analysis highlights the distinctive strengths of each tool. SonarQube stands out as a comprehensive solution with a hybrid approach supporting static and dynamic code evaluations, accommodating multiple languages and integrating with popular Integrated Development Environments (IDEs). Coverity excels in identifying security vulnerabilities and defects, making it an excellent choice for security-focused development. CodeSonar prioritises code security and safety, offering a robust analysis. Snyk Code and ESLint, focusing on JavaScript, emphasise code quality and standards adherence.
Klocwork is exceptional in defect detection and security analysis for C, C++, and Java. Lastly, PMD specialises in Java, emphasising code style and best practices.
Journal Article
An Empirical Study of Automated Machine Learning Python Libraries Using Source Code Analysis
by Lynch, Conor; O’Leary, Christian; Toosi, Farshad Ghassemi
in Artificial intelligence, AutoML, Coding standards
2026
The growth of Automated Machine Learning (AutoML) has expanded access to machine learning workflows by enabling the automation of tasks and reducing the technical barrier to entry. However, the reliability and maintainability of these libraries depend on the quality of their underlying source code. This study presents a novel, systematic analysis of 16 Python AutoML libraries utilising SonarQube, an industry-standard source code analysis (SCA) platform, and the Python analysis tools Bandit, Coverage.py, Prospector, Pylint, Radon, and Ruff. The AutoML libraries are evaluated using software quality metrics, which collectively reflect overall code complexity, maintainability, security, and adherence to Python coding standards.
Strong agreement was observed between SonarQube-based rankings and rankings derived from Python-based tools. Based on median SCA rankings, the libraries were ordered (highest to lowest estimated code quality) as follows: Hyperopt-sklearn, AutoKeras, GAMA, MLBox, FEDOT, TPOT, MLJAR, LightAutoML, Auto-sklearn, PyCaret, FLAML, Auto-PyTorch, Ludwig, EvalML, AutoTS, and AutoGluon.
An additional exploratory Spearman rank correlation analysis examined the relationship between SCA metrics and forecasting performance measures from a prior electricity price prediction benchmark (sample size of 7). Several SCA metrics exhibit strong monotonic relationships with forecasting error measures; e.g., SonarQube […] and […] correlate positively with mean absolute error (ρ = 0.86), while […] (ρ = −0.89) and […] (ρ = −0.86) correlate negatively with library execution time. Due to the limited sample size, these findings are descriptive and non-parametric. The results suggest that code quality scores may relate to lower-bound predictive performance and computational efficiency, warranting further validation.
Journal Article
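The AutoML entry above reports Spearman rank correlations (e.g., ρ = 0.86 and ρ = −0.89) between SCA metrics and benchmark measures. For reference, Spearman's ρ can be computed as the Pearson correlation of the two variables' ranks, with tied values given the mean of their rank positions; the plain-Python sketch below does exactly that. The function names are illustrative, and the input values are made up and do not come from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for seven libraries: a quality score and an error measure.
quality = [3.1, 4.5, 2.2, 5.0, 3.9, 4.1, 2.8]
error = [0.9, 1.4, 0.7, 1.6, 1.1, 1.5, 0.8]
print(round(spearman(quality, error), 2))
```

With only seven observations, a single swapped pair of ranks shifts ρ noticeably, which is consistent with the abstract's caution that these correlations are descriptive.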