Catalogue Search | MBRL

A Theoretical and Empirical Study of Search-Based Testing: Local, Global, and Hybrid Search

by Harman, M., McMinn, P. in Algorithms, and search, Artificial intelligence

2010

Search-based optimization techniques have been applied to structural software test data generation since 1992, with a recent upsurge in interest and activity within this area. However, despite the large number of recent studies on the applicability of different search-based optimization approaches, there has been very little theoretical analysis of the types of testing problem for which these techniques are well suited. There are also few empirical studies that present results for larger programs. This paper presents a theoretical exploration of the most widely studied approach, the global search technique embodied by Genetic Algorithms. It also presents results from a large empirical study that compares the behavior of both global and local search-based optimization on real-world programs. The results of this study reveal that there exist test data generation problems that suit each algorithm, thereby suggesting that a hybrid global-local search (a Memetic Algorithm) may be appropriate. The paper presents a Memetic Algorithm along with further empirical results studying its performance.

Journal Article

Mutation-Driven Generation of Unit Tests and Oracles

by Zeller, A., Fraser, G. in Analysis, assertions, Automation

2012

To assess the quality of test suites, mutation analysis seeds artificial defects (mutations) into programs; a nondetected mutation indicates a weakness in the test suite. We present an automated approach to generate unit tests that detect these mutations for object-oriented classes. This has two advantages: First, the resulting test suite is optimized toward finding defects modeled by mutation operators rather than covering code. Second, the state change caused by mutations induces oracles that precisely detect the mutants. Evaluated on 10 open source libraries, our μtest prototype generates test suites that find significantly more seeded defects than the original manually written test suites.

Journal Article
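
To make the notion of mutation analysis in this abstract concrete, here is a minimal sketch that scores two tiny test suites against hand-written mutants. It is not the μtest prototype: the function under test, the three mutants, and all names are hypothetical, chosen only to show how an undetected mutant exposes a weak suite.

```python
def original(a, b):
    """Function under test: returns the larger of two numbers."""
    return a if a > b else b


# Hand-written mutants of `original` (hypothetical mutation operators).
mutants = {
    "replace > with <": lambda a, b: a if a < b else b,
    "replace > with >=": lambda a, b: a if a >= b else b,  # equivalent mutant
    "swap returned values": lambda a, b: b if a > b else a,
}


def is_killed(mutant, tests):
    """A mutant is killed if at least one test observes a wrong result."""
    return any(mutant(*args) != expected for args, expected in tests)


def mutation_score(mutants, tests):
    """Fraction of mutants detected by the test suite."""
    return sum(is_killed(m, tests) for m in mutants.values()) / len(mutants)


# Each test is (inputs, expected output of the original function).
weak_tests = [((3, 3), 3)]
strong_tests = weak_tests + [((1, 2), 2)]

print(mutation_score(mutants, weak_tests))
print(mutation_score(mutants, strong_tests))
```

The `>=` mutant behaves exactly like the original on every input, so no test can kill it; such equivalent mutants cap the achievable score below 1.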

Evaluating the impact of flaky simulators on testing autonomous driving systems

by Nejati, Shiva, Amini, Mohammad Hossein, Naseri, Shervin in Algorithms, Classifiers, Compilers

2024

Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82% and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score performance, respectively. We conclude with a discussion on the scope, implications and limitations of our study. We provide our complete replication package in a GitHub repository (Github paper 2023).

Journal Article
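
The abstract mentions a non-ML baseline that needs at least two executions per test. One plausible reading, sketched below, is a rerun check that flags a test as flaky when repeated runs disagree. This is an assumption for illustration, not the paper's baseline; `is_flaky`, `stable_test`, and `unstable_test` are hypothetical names, and the alternating outcome merely stands in for simulator nondeterminism.

```python
from itertools import cycle


def is_flaky(run_test, reruns=2):
    """Flag a test as flaky if repeated executions disagree on the verdict."""
    if reruns < 2:
        raise ValueError("detecting disagreement needs at least two runs")
    verdicts = {run_test() for _ in range(reruns)}
    return len(verdicts) > 1


def stable_test():
    """Stand-in for a deterministic simulation run that always passes."""
    return True


_outcomes = cycle([True, False])


def unstable_test():
    """Stand-in for a nondeterministic run: alternates pass and fail."""
    return next(_outcomes)


print(is_flaky(stable_test))
print(is_flaky(unstable_test))
```

A rerun check of this kind can miss tests that fail only rarely, which is one reason a single-run classifier is attractive when each simulation is expensive.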

Achieving scalable mutation-based generation of whole test suites

by Fraser, Gordon, Arcuri, Andrea in Compilers, Computer Science, Interpreters

2015

Without complete formal specification, automatically generated software tests need to be manually checked in order to detect faults. This makes it desirable to produce the strongest possible test set while keeping the number of tests as small as possible. As commonly applied coverage criteria like branch coverage are potentially weak, mutation testing has been proposed as a stronger criterion. However, mutation-based test generation is hampered because usually there are simply too many mutants, and too many of these are either trivially killed or equivalent. On such mutants, any effort spent on test generation would by definition be wasted. To overcome this problem, our search-based EvoSuite test generation tool integrates two novel optimizations: First, we avoid redundant test executions on mutants by monitoring state infection conditions, and second, we use whole test suite generation to optimize test suites towards killing the highest number of mutants, rather than selecting individual mutants. These optimizations allowed us to apply EvoSuite to a random sample of 100 open source projects, consisting of a total of 8,963 classes and more than two million lines of code, leading to a total of 1,380,302 mutants. The experiment demonstrates that our approach scales well, making mutation testing a viable test criterion for automated test case generation tools, and allowing us to analyze the relationship of branch coverage and mutation testing in detail.

Journal Article

1600 faults in 100 projects: automatically finding faults while achieving high coverage with EvoSuite

by Fraser, Gordon, Arcuri, Andrea in Compilers, Computer Science, Interpreters

2015

Automated unit test generation techniques traditionally follow one of two goals: Either they try to find violations of automated oracles (e.g., assertions, contracts, undeclared exceptions), or they aim to produce representative test suites (e.g., satisfying branch coverage) such that a developer can manually add test oracles. Search-based testing (SBST) has delivered promising results when it comes to achieving coverage, yet the use in conjunction with automated oracles has hardly been explored, and is generally hampered as SBST does not scale well when there are too many testing targets. In this paper we present a search-based approach to handle both objectives at the same time, implemented in the EvoSuite tool. An empirical study applying EvoSuite on 100 randomly selected open source software projects (the SF100 corpus) reveals that SBST has the unique advantage of being well suited to perform both traditional goals at the same time—efficiently triggering faults, while producing representative test sets for any chosen coverage criterion. In our study, EvoSuite detected twice as many failures in terms of undeclared exceptions as a traditional random testing approach, witnessing thousands of real faults in the 100 open source projects. Two out of every five classes with undeclared exceptions have actual faults, but these are buried within many failures that are caused by implicit preconditions. This “noise” can be interpreted as either a call for further research in improving automated oracles—or to make tools like EvoSuite an integral part of software development to enforce clean program interfaces.

Journal Article

TM-fuzzer: fuzzing autonomous driving systems through traffic management

by Chen, Fansong, Wang, Gaosheng, Xi, Laile in Artificial Intelligence, Automation, Autonomous vehicles

2024

Simulation testing of Autonomous Driving Systems (ADS) is crucial for ensuring the safety of autonomous vehicles. Currently, the scenarios found by ADS simulation testing tools are unlikely to expose ADS issues and are highly similar to one another. In this paper, we propose TM-fuzzer, a novel approach for searching ADS test scenarios, which utilizes real-time traffic management and diversity analysis to search for security-critical and unique scenarios within the infinite scenario space. TM-fuzzer dynamically manages traffic flow by manipulating non-player characters near the autonomous vehicle throughout the simulation process to enhance the efficiency of test scenarios. Additionally, TM-fuzzer utilizes clustering analysis on vehicle trajectory graphs within scenarios to increase the diversity of test scenarios. Compared to the baseline, TM-fuzzer identified 29 unique violated scenarios more than four times faster and enhanced the incidence of ADS-caused violations by 26.26%. Experiments suggest that TM-fuzzer demonstrates improved efficiency and accuracy.

Journal Article

Generating Test Cases for Decentralized Swarming Unmanned Aerial Vehicles

by Marson, David, Pretschner, Alexander in Artificial Intelligence, Cargo, Control

2026

Decentralized unmanned aerial vehicle (UAV) swarms can accomplish demanding tasks through cooperation due to their inherent robustness and scalability. However, existing testing methods typically involve randomly inducing a failure, an approach that may not degrade system performance as much as a targeted failure would. This leaves a critical gap in understanding the true robustness of a swarm. For example, one UAV in a group of UAVs transporting cargo could lose power, and the overall swarm performance may degrade, e.g., transportation time may become longer than expected. However, the failure of some UAVs may cause greater performance degradation than the failure of others. In this paper, we propose a systematic methodology for generating test cases that challenge a UAV swarm’s robustness as much as possible within computational constraints. The methodology incorporates techniques from scenario-based testing, including search-based test case generation, and is repeated for a range of functionally relevant swarm sizes. We demonstrate the methodology with a case study that is based on the concept of a UAV cargo delivery mission. We used a genetic algorithm to configure the failure behavior that is injected into the test, specifically which UAV to fail and when the failure occurs, with the goal of maximizing degradation in system performance (measured as an increase in flight time). Our results show that a genetic algorithm outperforms random search when the swarm size is less than or equal to the sum of a swarm controller parameter and a test parameter: the number of neighbors that an individual UAV considers to be in its “neighborhood,” and the number of UAVs that are configured to fail during the test, respectively. For swarm sizes larger than this threshold, the fitness landscape of the search becomes much more constrained, and a genetic algorithm does not provide a substantial benefit compared to random search. To support broader adoption, we also propose a generalized testing process for decentralized swarms that accounts for system functionality, robustness, and scalability. Overall, this work provides both a theoretical framework and empirical results to guide the process of generating challenging test cases for decentralized UAV swarms.

Journal Article

Automated test generation for Scratch programs

by Schweikl, Sebastian, Fraser, Gordon, Feldmeier, Patric in Algorithms, Automation, Computers

2023

The importance of programming education has led to dedicated educational programming environments, where users visually arrange block-based programming constructs that typically control graphical, interactive game-like programs. The Scratch programming environment is particularly popular, with more than 90 million registered users at the time of this writing. While the block-based nature of Scratch helps learners by preventing syntactical mistakes, there nevertheless remains a need to provide feedback and support in order to implement desired functionality. To support individual learning and classroom settings, this feedback and support should ideally be provided in an automated fashion, which requires tests to enable dynamic program analysis. In prior work we introduced Whisker, a framework that enables automated testing of Scratch programs. However, creating these automated tests for Scratch programs is challenging. In this paper, we therefore investigate how to automatically generate Whisker tests. Generating tests for Scratch raises important challenges: First, game-like programs are typically randomised, leading to flaky tests. Second, Scratch programs usually consist of animations and interactions with long delays, inhibiting the application of classical test generation approaches. Thus, the new application domain raises the question of which test generation technique is best suited to produce high coverage tests capable of detecting faulty behaviour. We investigate these questions using an extension of the Whisker test framework for automated test generation. Evaluation on common programming exercises, a random sample of 1000 Scratch user programs, and the 1000 most popular Scratch programs demonstrates that our approach enables Whisker to reliably accelerate test executions, and even though many Scratch programs are small and easy to cover, there are many unique challenges for which advanced search-based test generation using many-objective algorithms is needed in order to achieve high coverage.

Journal Article

Search-based fairness testing for regression-based machine learning systems

by Tantithamthavorn, Chakkrit, Kuhn, Lisa, Walker, Katie in Context, Evaluation, Health care

2022

Context: Machine learning (ML) software systems are permeating many aspects of our life, such as healthcare, transportation, banking, and recruitment. These systems are trained with data that is often biased, resulting in biased behaviour. To address this issue, fairness testing approaches have been proposed to test ML systems for fairness, which predominantly focus on assessing classification-based ML systems. These methods are not applicable to regression-based systems; for example, they do not quantify the magnitude of the disparity in predicted outcomes, which we identify as important in the context of regression-based ML systems. Method: We conduct this study as design science research. We identify the problem instance in the context of emergency department (ED) wait-time prediction. In this paper, we develop an effective and efficient fairness testing approach to evaluate the fairness of regression-based ML systems. We propose fairness degree, which is a new fairness measure for regression-based ML systems, and a novel search-based fairness testing (SBFT) approach for testing regression-based machine learning systems. We apply the proposed solutions to ED wait-time prediction software. Results: We experimentally evaluate the effectiveness and efficiency of the proposed approach with ML systems trained on real observational data from the healthcare domain. We demonstrate that SBFT significantly outperforms existing fairness testing approaches, with up to 111% and 190% increase in effectiveness and efficiency of SBFT compared to the best performing existing approaches. Conclusion: These findings indicate that our novel fairness measure and the new approach for fairness testing of regression-based ML systems can identify the degree of fairness in predictions, which can help software teams to make data-informed decisions about whether such software systems are ready to deploy. The scientific knowledge gained from our work can be phrased as a technological rule: to measure the fairness of regression-based ML systems in the context of emergency department wait-time prediction, use fairness degree and search-based techniques to approximate it.

Journal Article

Improved novel bat algorithm for test case prioritization and minimization

by Sangwan, Om Prakash, Bajaj, Anu, Abraham, Ajith in Application of Soft Computing, Artificial Intelligence, Computational Intelligence

2022

Regression testing is essential for continuous integration and continuous development. It is needed to ensure that modifications have not introduced any errors or faults, thereby maintaining the quality and reliability of the software. Testers usually avoid exhaustive retesting because it requires a great deal of effort and time. Test case prioritization and minimization address the issue by scheduling the critical test cases and removing redundant ones. Optimization techniques help by improving the efficiency of these techniques while utilizing limited resources. This paper proposes an enhanced discrete novel bat algorithm for test case prioritization. The algorithm is modified in two ways. First, we propose a fix-up mechanism for the discrete combinatorial problem, which conducts the perturbation in the population using the asexual reproduction algorithm. Second, the novel bat algorithm is improved, where the bats hunt in different habitats with quantum behavior using Gaussian distribution and search in the limited habitat with Doppler effect. In addition, we have embedded the test case minimization procedure in the algorithm for redundancy reduction. The experimental results are empirically analyzed using different testing criteria, i.e., fault and statement coverage, on three subject programs from the software infrastructure repository. Consequently, test selection percentage, coverage loss, fault detection loss, and cost reduction percentages are deduced for the test case minimization at program and version levels. Empirical results and statistical comparisons with the random search, bat algorithm, novel bat algorithm, birds swarm algorithm, whale optimization algorithm, and genetic algorithm show that the proposed algorithm outperforms them.

Journal Article
