Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
5
result(s) for
"Petrulio Fernando"
Sort by:
On Refining the SZZ Algorithm with Bug Discussion Data
by
Bacchelli, Alberto
,
Rani, Pooja
,
Petrulio, Fernando
in
Algorithms
,
Compilers
,
Computer Science
2024
Context
Researchers testing hypotheses related to factors leading to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques to determine the introduction of defects revolve around variants of the
SZZ
algorithm. This algorithm leverages information on the lines modified during a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits.
Objectives
Despite several improvements and variants,
SZZ
struggles with accuracy, especially in cases of unrelated modifications or that touch files not involved in the introduction of the bug in the version control systems (aka
tangled commit
and
ghost commits
).
Methods
Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues by identifying the related and external files and thus improve the efficacy of the
SZZ
algorithm.
Results
To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits inserted bugs. Thus, we prepared the dataset,
RoTEB
, comprised of 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for
SZZ
. After we found evidence that the mentioned files are relevant, we augment
SZZ
with this information, using different strategies, and evaluate the resulting approach against multiple
SZZ
variations.
Conclusion
We define a taxonomy outlining the rationale behind developers’ references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the
SZZ
algorithm’s efficacy. Then, we verify that integrating these file references augments the precision of
SZZ
in pinpointing bug-introducing commits. Yet, it does not markedly influence recall. These results deepen our comprehension of the usefulness of bug discussions for
SZZ
. Future work can leverage our dataset and explore other techniques to further address the problem of tangled commits and ghost commits. Data & material:
https://zenodo.org/records/11484723
.
Journal Article
The evolution of the code during review: an investigation on review changes
by
Bacchelli, Alberto
,
Fregnan, Enrico
,
Petrulio, Fernando
in
Investigations
,
Open source software
,
Software engineering
2022
Code review is a software engineering practice in which reviewers manually inspect the code written by a fellow developer and propose any change that is deemed necessary or useful. The main goal of code review is to improve the quality of the code under review. Despite the widespread use of code review, only a few studies focused on the investigation of its outcomes, for example, investigating the code changes that happen to the code under review. The goal of this paper is to expand our knowledge on the outcome of code review while re-evaluating results from previous work. To this aim, we analyze changes that happened during the review process, which we define as review changes. Considering three popular open-source software projects, we investigate the types of review changes (based on existing taxonomies) and what triggers them; also, we study which code factors in a code review are most related to the number of review changes. Our results show that the majority of changes relate to evolvability concerns, with a strong prevalence of documentation and structure changes at type-level. Furthermore, differently from past work, we found that the majority of review changes are not triggered by reviewers’ comments. Finally, we find that the number of review changes in a code review is related to the size of the initial patch as well as the new lines of code that it adds. However, other factors, such as lines deleted or the author of the review patchset, do not always show an empirically supported relationship with the number of changes.
Journal Article
What happens in my code reviews? An investigation on automatically classifying review changes
by
Petrulio, Fernando
,
Bacchelli, Alberto
,
Di Geronimo, Linda
in
Classification
,
Machine learning
,
Software engineering
2022
Code reviewing is a widespread practice used by software engineers to maintain high code quality. To date, the knowledge on the effect of code review on source code is still limited. Some studies have addressed this problem by classifying the types of changes that take place during the review process (a.k.a. review changes), as this strategy can, for example, pinpoint the immediate effect of reviews on code. Nevertheless, this classification (1) is not scalable, as it was conducted manually, and (2) was not assessed in terms of how meaningful the provided information is for practitioners. This paper aims at addressing these limitations: First, we investigate to what extent a machine learning-based technique can automatically classify review changes. Then, we evaluate the relevance of information on review change types and its potential usefulness, by conducting (1) semi-structured interviews with 12 developers and (2) a qualitative study with 17 developers, who are asked to assess reports on the review changes of their project. Key results of the study show that not only it is possible to automatically classify code review changes, but this information is also perceived by practitioners as valuable to improve the code review process. Data and materials: https://doi.org/10.5281/zenodo.5592254
Journal Article
The indolent lambdification of Java
by
Bacchelli, Alberto
,
Sawant, Anand Ashok
,
Petrulio Fernando
in
Compilers
,
Consumers
,
Investigations
2021
As Java 8 introduced functional interfaces and lambda expressions to the Java programming language, the JDK API was changed to introduce support for lambda expressions, thus allowing consumers to define lambda functions when using Java’s collections. While the JDK API allows for a functional paradigm, for API consumers to be able to completely embrace Java’s new functional features, third-party APIs must also support lambda expressions. To understand the current state of the Java ecosystem, we investigate (i) the extent to which third-party Java APIs have changed their interfaces, (ii) why or why not they introduce functional interface support and (iii) in the case the API has changed its interface how it does so. We also investigate the consumers’ perspective, particularly their ease in using lambda expressions in Java with APIs. We perform our investigation by manually analyzing the top 50 popular Java APIs, conducting in-person and email interviews with 23 API producers, and surveying 110 developers. We find that only a minority of the top 50 APIs support functional interfaces, the rest does not support them, predominantly in the interest of backward compatibility. Java 7 support is still greatly desirable due to enterprise projects not migrating to newer versions of Java. This suggests that the Java ecosystem is stagnant and that the introduction of new language features will not be enough to save it from the advent of new languages such as Kotlin (JVM based) and Rust (non-JVM based).
Journal Article
SZZ in the time of Pull Requests
2022
In the multi-commit development model, programmers complete tasks (e.g., implementing a feature) by organizing their work in several commits and packaging them into a commit-set. Analyzing data from developers using this model can be useful to tackle challenging developers' needs, such as knowing which features introduce a bug as well as assessing the risk of integrating certain features in a release. However, to do so one first needs to identify fix-inducing commit-sets. For such an identification, the SZZ algorithm is the most natural candidate, but its performance has not been evaluated in the multi-commit context yet. In this study, we conduct an in-depth investigation on the reliability and performance of SZZ in the multi-commit model. To obtain a reliable ground truth, we consider an already existing SZZ dataset and adapt it to the multi-commit context. Moreover, we devise a second dataset that is more extensive and directly created by developers as well as Quality Assurance (QA) engineers of Mozilla. Based on these datasets, we (1) test the performance of B-SZZ and its non-language-specific SZZ variations in the context of the multi-commit model, (2) investigate the reasons behind their specific behavior, and (3) analyze the impact of non-relevant commits in a commit-set and automatically detect them before using SZZ.