Catalogue Search | MBRL

Studying the characteristics of AIOps projects on GitHub

by Khomh, Foutse , Li, Heng , Aghili, Roozbeh in Anomalies , Artificial intelligence , Machine learning

2023

Artificial Intelligence for IT Operations (AIOps) leverages AI approaches to handle the massive amount of data generated during the operations of software systems. Prior works have proposed various AIOps solutions to support different tasks in system operations and maintenance, such as anomaly detection. In this study, we conduct an in-depth analysis of open-source AIOps projects to understand the characteristics of AIOps in practice. We first carefully identify a set of AIOps projects from GitHub and analyze their repository metrics (e.g., the used programming languages). Then, we qualitatively examine the projects to understand their input data, analysis techniques, and goals. Finally, we assess the quality of these projects using different quality metrics, such as the number of bugs. To provide context, we also sample two sets of baseline projects from GitHub: a random sample of machine learning projects and a random sample of general-purposed projects. By comparing different metrics between our identified AIOps projects and these baselines, we derive meaningful insights. Our results reveal a recent and growing interest in AIOps solutions. However, the quality metrics indicate that AIOps projects suffer from more issues than our baseline projects. We also pinpoint the most common issues in AIOps approaches and discuss potential solutions to address these challenges. Our findings offer valuable guidance to researchers and practitioners, enabling them to comprehend the current state of AIOps practices and shed light on different ways of improving AIOps’ weaker aspects. To the best of our knowledge, this work marks the first attempt to characterize open-source AIOps projects.

Journal Article

Share this book

Add to My Shelf

Empirical evidence on the link between object-oriented measures and external quality attributes: a systematic literature review

by Börstler, Jürgen , Wohlin, Claes , Šmite, Darja in Compilers , Computer Science , Interpreters

2015

There is a plethora of studies investigating object-oriented measures and their link with external quality attributes, but usefulness of the measures may differ across empirical studies. This study aims to aggregate and identify useful object-oriented measures, specifically those obtainable from the source code of object-oriented systems that have gone through such empirical evaluation. By conducting a systematic literature review, 99 primary studies were identified and traced to four external quality attributes: reliability, maintainability, effectiveness and functionality. A vote-counting approach was used to investigate the link between object-oriented measures and the attributes, and to also assess the consistency of the relation reported across empirical studies. Most of the studies investigate links between object-oriented measures and proxies for reliability attributes, followed by proxies for maintainability. The least investigated attributes were: effectiveness and functionality. Measures from the C&K measurement suite were the most popular across studies. Vote-counting results suggest that complexity, cohesion, size and coupling measures have a better link with reliability and maintainability than inheritance measures. However, inheritance measures should not be overlooked during quality assessment initiatives; their link with reliability and maintainability could be context dependent. There were too few studies traced to effectiveness and functionality attributes; thus a meaningful vote-counting analysis could not be conducted for these attributes. Thus, there is a need for diversification of quality attributes investigated in empirical studies. This would help with identifying useful measures during quality assessment initiatives, and not just for reliability and maintainability aspects.

Journal Article

Share this book

Add to My Shelf

Question–Answer Methodology for Vulnerable Source Code Review via Prototype-Based Model-Agnostic Meta-Learning

by Perez-Meana, Hector , Corona-Fraga, Pablo , Hernandez-Suarez, Aldo in Analysis , Automation , C plus plus

2025

In cybersecurity, identifying and addressing vulnerabilities in source code is essential for maintaining secure IT environments. Traditional static and dynamic analysis techniques, although widely used, often exhibit high false-positive rates, elevated costs, and limited interpretability. Machine Learning (ML)-based approaches aim to overcome these limitations but encounter challenges related to scalability and adaptability due to their reliance on large labeled datasets and their limited alignment with the requirements of secure development teams. These factors hinder their ability to adapt to rapidly evolving software environments. This study proposes an approach that integrates Prototype-Based Model-Agnostic Meta-Learning(Proto-MAML) with a Question-Answer (QA) framework that leverages the Bidirectional Encoder Representations from Transformers (BERT) model. By employing Few-Shot Learning (FSL), Proto-MAML identifies and mitigates vulnerabilities with minimal data requirements, aligning with the principles of the Secure Development Lifecycle (SDLC) and Development, Security, and Operations (DevSecOps). The QA framework allows developers to query vulnerabilities and receive precise, actionable insights, enhancing its applicability in dynamic environments that require frequent updates and real-time analysis. The model outputs are interpretable, promoting greater transparency in code review processes and enabling efficient resolution of emerging vulnerabilities. Proto-MAML demonstrates strong performance across multiple programming languages, achieving an average precision of 98.49%, recall of 98.54%, F1-score of 98.78%, and exact match rate of 98.78% in PHP, Java, C, and C++.

Journal Article

Share this book

Add to My Shelf

Comparing four approaches for technical debt identification

by Vetro’, Antonio , Shull, Forrest , Izurieta, Clemente in Analysis , Automation , Compilers

2014

Software systems accumulate technical debt (TD) when short-term goals in software development are traded for long-term goals (e.g., quick-and-dirty implementation to reach a release date versus a well-refactored implementation that supports the long-term health of the project). Some forms of TD accumulate over time in the form of source code that is difficult to work with and exhibits a variety of anomalies. A number of source code analysis techniques and tools have been proposed to potentially identify the code-level debt accumulated in a system. What has not yet been studied is if using multiple tools to detect TD can lead to benefits, that is, if different tools will flag the same or different source code components. Further, these techniques also lack investigation into the symptoms of TD “interest” that they lead to. To address this latter question, we also investigated whether TD, as identified by the source code analysis techniques, correlates with interest payments in the form of increased defect- and change-proneness. Comparing the results of different TD identification approaches to understand their commonalities and differences and to evaluate their relationship to indicators of future TD “interest.” We selected four different TD identification techniques (code smells, automatic static analysis issues, grime buildup, and Modularity violations) and applied them to 13 versions of the Apache Hadoop open source software project. We collected and aggregated statistical measures to investigate whether the different techniques identified TD indicators in the same or different classes and whether those classes in turn exhibited high interest (in the form of a large number of defects and higher change-proneness). The outputs of the four approaches have very little overlap and are therefore pointing to different problems in the source code. Dispersed Coupling and Modularity violations were co-located in classes with higher defect-proneness. We also observed a strong relationship between Modularity violations and change-proneness. Our main contribution is an initial overview of the TD landscape, showing that different TD techniques are loosely coupled and therefore indicate problems in different locations of the source code. Moreover, our proxy interest indicators (change- and defect-proneness) correlate with only a small subset of TD indicators.

Journal Article

Share this book

Add to My Shelf

Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey

by Lashkari, Arash Habibi , Vombatkere, Nikhill , He, Xie in author profiling , Authorship , authorship attribution

2024

Over the past few decades, researchers have put their effort and paid significant attention to the authorship attribution field, as it plays an important role in software forensics analysis, plagiarism detection, security attack detection, and protection of trade secrets, patent claims, copyright infringement, or cases of software theft. It helps new researchers understand the state-of-the-art works on authorship attribution methods, identify and examine the emerging methods for authorship attribution, and discuss their key concepts, associated challenges, and potential future work that could help newcomers in this field. This paper comprehensively surveys authorship attribution methods and their key classifications, used feature types, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss the potential future research directions of the authorship attribution field based on the insights and lessons learned from this survey work.

Journal Article

Share this book

Add to My Shelf

Studying logging practice in test code

by Zhang, Haonan , Lamothe, Maxime , Tang, Yiming in Computer engineering , Empirical analysis , Open source software

2022

Logging is widely used in modern software development to record run-time information for software systems and plays a significant role in software testing. Although the research area of logging has attracted much attention, little attention is paid to the practice of test logging (i.e., the logging involved in test files). To fill this knowledge gap, we conduct this empirical study to explore and disclose the practice of test logging. This study examines 21 open-source subjects with ∼70K logging statements, of which ∼48K are production logging statements and ∼22K are test logging statements. We organize our study by answering four research questions, and as a result, (1) we have yielded five findings to reveal the differences between test and production logging statements, (2) we have disclosed four findings regarding the differences between the maintenance efforts of test and production logging statements, (3) we have identified four reasons why developers use test log, and (4) we have uncovered the relationship between test logging and production logging. To the best of our knowledge, this is the first study that quantitatively and qualitatively analyzes the logging practices in test and production code, providing developers and researchers with insight into this topic.

Journal Article

Share this book

Add to My Shelf

Harnessing deep learning algorithms to predict software refactoring

by Al Qasem, Osama , Akour, Mohammed , Alenezi, Mamdouh in Algorithms , Bioinformatics , Computer vision

2020

All experimental results show how refactoring has a direct influence on improving software quality. [...]predicting refactoring promptly should be investigating, and building an accurate model becomes mandatory. Software developers still face a real challenge to pick the right time and software code for refactoring purposes as the operation needs time and budget [11]. [...]developers should be sure about which piece of code should be evolved before starting the process of refactoring to adopt the new requirements. Different methodologies are designed and built to help developers in the refactoring process such as code smells detection strategies [12], logic meta-programming [13], invariant mining [14] and search-based [15, 16]. [...]machine learning is harnessed in the area of prediction and shows noticeable performance in terms of prediction in various fields as computer vision, defect prediction, natural language processing, code comprehension, bioinformatics, speech recognition, and finance [17-24 ]. [...]is to control the information that flows out of memory known as reset gate unlike long short-term memory (LSTM), GRU hasn't had separate memory cell, instead of that gating unit that controls the flow of information inside the unit [36].

Journal Article

Share this book

Add to My Shelf

Analysing the Analysers: An Investigation of Source Code Analysis Tools

by Bhutani, Vikram , Buckley, Jim , Toosi, Farshad Ghassemi in Artificial intelligence , code quality , object oriented paradigm

2024

NOABSTRACTThe primary expectation from a software system revolves around its functionality. However, as the software development process advances, equal emphasis is placed on the quality of the software system for non-functional attributes like maintainability and performance. Tools are available to aid in this endeavour, assessing the quality of a software system from multiple perspectives.This study aims to perform a comprehensive analysis of a particular set of source code analytical tools by examining diverse perspectives found in the literature and documentations. Given the vast array of programming languages available today, selecting appropriate source-code analytical tools presents a significant challenge. Therefore, this analysis aims to provide general insights to aid in selecting a more suitable analytical tool tailored to specific requirements.Seven prominent static analysis tools, namely SonarQube, Coverty, CodeSonar, Snyk Code, ESLint, Klocwork, and PMD, were chosen based on their prevalence in the literature and recognition in the software development community. To systematically categorise and organise their distinctive features and capabilities, a taxonomy was developed. This taxonomy covers crucial dimensions, including input support, technology employed, extensibility, user experience, rules, configurability, and supported languages.The comparative analysis highlights the distinctive strengths of each tool. SonarQube stands out as a comprehensive solution with a hybrid approach supporting static and dynamic code evaluations, accommodating multiple languages and integrating with popular Integrated Development Environments (IDEs). Coverity excels in identifying security vulnerabilities and defects, making it an excellent choice for security -focused development. CodeSonar prioritises code security and safety, offering a robust analysis. Snyk Code and ESLint, focusing on JavaScript, emphasise code quality and standards adherence. Klocwork is exceptional in defect detection and security analysis for C, C++, and Java. Lastly, PMD specialises in Java, emphasising code style and best practices.

Journal Article

Share this book

Add to My Shelf

Source Code Features and their Dependencies: An Aggregative Statistical Analysis on Open-Source Java Software Systems

by Toosi, Farshad Ghassemi in Java , Object oriented principle , source code analysis

2023

Source code constitutes the static and human-readable component of a software system. It comprises an array of artifacts and features that collectively execute a specific set of tasks. Coding behaviours and patterns are formulated through the orchestrated utilization of distinct features in a specified sequence, fostering inter-dependencies among these features. This study seeks to explore into the presence of specific coding behaviours and patterns within Java, which could potentially unveil the extent to which developers endeavour to leverage the facilities and services that exist in the programming language aggregatively. In pursuit of investigating behaviours and patterns, 436 open-source Java projects are selected, each having more than 150 Java files (Classes and Interfaces), in a semi-randomized manner. For every project, 39 features have been chosen, and the frequency of each individual feature has been independently assessed. By employing linear regression, the interrelationships among all features across the complete array of projects are scrutinized. This analysis intends to uncover the manifestation of distinct coding behaviours and patterns. Based on the selected features, preliminary findings suggest a notable collective incorporation of diverse coding behaviours among programmers, encompassing Encapsulation and Polymorphism. The findings also point to a distinct preference for using a specific commenting mechanism, JavaDoc, and the potential existence of Code-Clone and dead code. Overall, the results indicate a clear tendency among programmers to strongly adhere to the fundamental principles of Object -Oriented programming. However, certain less obvious attributes of object-oriented languages appear to receive relatively less attention from programmers.

Journal Article

Share this book

Add to My Shelf

Analysis of the change in bugginess and adaptiveness of python software systems

by Yousuf, Mir Mohammad , Rashid, Mamoon in Coding , Coding standards , Computer Communication Networks

2022

The useful constructs in terms of dynamic features of programming languages bring the developers convenience and flexibility, but can also lead to the risks in the software maintenance. Evaluating whether the use of such features affects the maintenance can be significant for both practitioners as well as researchers, yet a very little amount of work has been done to investigate it in python software systems. Researchers believe that the occurrence of the bugs (pre-release or post-release) affects the maintenance activities. Thus, software maintenance is frequently a challenging and hectic process for both engineers as well as IT consultancy firms. There are few studies concerning the different aspects of the source code quality of python software systems such as comprehensiveness, bugginess, and adaptiveness. In this paper, the authors analyzed the python software systems to find the effect of the bugs on the overall project and correlate the effects with the maintenance cost. This study also analyzed the relation between coding practices using the coding standards of the programming language with that of the bugs. In order to provide more experimental evidence about the impact of the source code quality and bugginess on the software development and maintenance, the authors provided an experimental study on the 8 versions of Trac 0.12, 16 versions of Trac 1.0, 6 versions of Trac 1.1, and 3 versions of Trac 1.2. The results are more indicative of the bugs affecting the quality of the product and thus increasing the maintenance cost. Moreover, the results also show that with the uplifting of the software patch, the quality of the code is improved, thus reducing the maintenance cost.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter