Catalogue Search | MBRL

Size matters? Or not: A/B testing with limited sample in automotive embedded software

By Liu, Yuchu; David Issa Mattos; Helena Holmström Olsson in Automobile industry, Automotive engineering, Case studies

2021

A/B testing is gaining attention in the automotive sector as a promising tool to measure causal effects from software changes. Unlike web-facing businesses, where A/B testing is well established, the automotive domain often has only a limited number of eligible users who can participate in online experiments. To address this shortcoming, we present a method for designing balanced control and treatment groups so that sound conclusions can be drawn from experiments with considerably small sample sizes. While the Balance Match Weighted method has been used in other domains such as medicine, this is the first paper to apply and evaluate it in the context of software development. Furthermore, we describe the Balance Match Weighted method in detail and we conduct a case study together with an automotive manufacturer to apply the group design method in a fleet of vehicles. Finally, we present our case study in the automotive software engineering domain, as well as a discussion on the benefits and limitations of the A/B group design method.

Paper

Bayesian propensity score matching in automotive embedded software engineering

By Liu, Yuchu; David Issa Mattos; Helena Holmström Olsson in Automotive engineering, Bayesian analysis, Domains

2021

Randomised field experiments, such as A/B testing, have long been the gold standard for evaluating the value that new software brings to customers. However, running randomised field experiments is not always desired, possible or even ethical in the development of automotive embedded software. In the face of such restrictions, we propose the use of the Bayesian propensity score matching technique for causal inference of observational studies in the automotive domain. In this paper, we present a method based on the Bayesian propensity score matching framework, applied in the unique setting of automotive software engineering. This method is used to generate balanced control and treatment groups from an observational online evaluation and estimate causal treatment effects from the software changes, even with limited samples in the treatment group. We exemplify the method with a proof-of-concept in the automotive domain. In the example, we have a larger control fleet (\(N_c=1100\)) of cars using the current software and a small treatment fleet (\(N_t=38\)), in which we introduce a new software variant. We demonstrate a scenario in which shipping new software to all users is restricted and, as a result, a fully randomised experiment could not be conducted. Therefore, we utilised the Bayesian propensity score matching method with 14 observed covariates as inputs. The results show more balanced groups, suitable for estimating causal treatment effects from the collected observational data. We describe the method in detail and share our configuration. Furthermore, we discuss how such a method can be used for online evaluation of new software utilising small groups of samples.

Paper
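
The matching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' Bayesian method: it swaps in a plain (non-Bayesian) logistic propensity model fitted by gradient ascent, followed by greedy 1:1 nearest-neighbour matching. The data are synthetic, only two covariates are used instead of the 14 mentioned above, and every function name is hypothetical. Only the group sizes (1100 control, 38 treated) are taken from the abstract.

```python
import math
import random
import statistics


def propensity(w, x):
    """Estimated P(treated | x) from a logistic model; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, steps=200, lr=0.5):
    """Fit the propensity model by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, ti in zip(X, t):
            err = ti - propensity(w, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def match_nearest(ps_treated, ps_control):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    free = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        j = min(free, key=lambda k: abs(ps_control[k] - p))
        free.remove(j)
        pairs.append((i, j))
    return pairs


def smd(a, b):
    """Standardised mean difference, a common covariate balance diagnostic."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def run_demo(seed=0, n_control=1100, n_treated=38):
    """Synthetic fleet: treated cars are shifted on covariate 0 (an assumption)."""
    rng = random.Random(seed)
    control = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_control)]
    treated = [[rng.gauss(0.8, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_treated)]
    w = fit_logistic(control + treated, [0] * n_control + [1] * n_treated)
    pairs = match_nearest([propensity(w, x) for x in treated],
                          [propensity(w, x) for x in control])
    before = smd([x[0] for x in treated], [x[0] for x in control])
    after = smd([x[0] for x in treated], [control[j][0] for _, j in pairs])
    return before, after, pairs


if __name__ == "__main__":
    before, after, _ = run_demo()
    print(f"SMD on covariate 0 before matching: {before:.3f}, after: {after:.3f}")
```

Comparing the standardised mean difference before and after matching is one common way to check whether the matched control group is better balanced than the full control fleet. A Bayesian variant would place priors on the model weights and propagate the uncertainty in the propensity scores.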

On Experimentation in Software-Intensive Systems

By Mattos, David Issa in Artificial intelligence, Computer science, Customer feedback

2021

Context: Delivering software that has value to customers is a primary concern of every software company. Prevalent in web-facing companies, controlled experiments are used to validate and deliver value in incremental deployments. At the same time that web-facing companies are aiming to automate and reduce the cost of each experiment iteration, embedded systems companies are starting to adopt experimentation practices and leverage their activities on the automation developments made in the online domain. Objective: This thesis has two main objectives. The first objective is to analyze how software companies can run and optimize their systems through automated experiments. This objective is investigated from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process. The second objective is to analyze how non-web-facing companies can adopt experimentation as part of their development process to validate and deliver value to their customers continuously. This objective is investigated from the perspectives of the software development process and focuses on the experimentation aspects that are distinct from web-facing companies. Method: To achieve these objectives, we conducted research in close collaboration with industry and used a combination of different empirical research methods: case studies, literature reviews, simulations, and empirical evaluations. Results: This thesis provides six main results. First, it proposes an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, it proposes a new experimentation process to capture the details of a trustworthy experimentation process that can be used as the basis for an automated experimentation process. Third, it identifies the restrictions and pitfalls of different multi-armed bandit algorithms for automating experiments in industry. This thesis also proposes a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls. Fourth, it proposes statistical models to analyze optimization algorithms that can be used in automated experimentation. Fifth, it identifies the key challenges faced by embedded systems companies when adopting controlled experimentation, and it proposes a set of strategies to address these challenges. Sixth, it identifies experimentation techniques and proposes a new continuous experimentation model for mission-critical and business-to-business. Conclusion: The results presented in this thesis indicate that the trustworthiness in the experimentation process and the selection of algorithms still need to be addressed before automated experimentation can be used at scale in industry. The embedded systems industry faces challenges in adopting experimentation as part of its development process. In part, this is due to the low number of users and devices that can be used in experiments and the diversity of the required experimental designs for each new situation. This limitation increases both the complexity of the experimentation process and the number of techniques used to address this constraint.

Dissertation

Towards Automated Experiments in Software Intensive Systems

By Mattos, David Issa in Artificial intelligence, Computer Engineering

2018

Context: Delivering software that has value to customers is a primary concern of every software company. One of the techniques to continuously validate and deliver value in online software systems is the use of controlled experiments. The time cost of each experiment iteration, the increasing growth in the development organization to run experiments and the need for a more automated and systematic approach are leading companies to look for different techniques to automate the experimentation process. Objective: The overall objective of this thesis is to analyze how to automate different types of experiments and how companies can support and optimize their systems through automated experiments. This thesis explores the topic of automated online experiments from the perspectives of the software architecture, the algorithms for the experiment execution and the experimentation process, and focuses on two main application domains: the online and the embedded systems domain. Method: To achieve the objective, we conducted this research in close collaboration with industry using a combination of different empirical research methods: case studies, literature reviews, simulations and empirical evaluations. Results and conclusions: This thesis provides five main results. First, we propose an architecture framework for automated experimentation that can be used with different types of experimental designs in both embedded systems and web-facing systems. Second, we identify the key challenges faced by embedded systems companies when adopting controlled experimentation and we propose a set of strategies to address these challenges. Third, we develop a new algorithm for online experiments. Fourth, we identify restrictions and pitfalls of different algorithms for automating experiments in industry and we propose a set of guidelines to help practitioners select a technique that minimizes the occurrence of these pitfalls. Fifth, we propose a new experimentation process to capture the details of a trustworthy experimentation process that can be used as a basis for an automated experimentation process. Future work: In future work, we plan to investigate how embedded systems can incorporate experiments in their development process without compromising existing real-time and safety requirements. We also plan to analyze the impact and costs of automating the different parts of the experimentation process.

Dissertation

On the Use of Causal Graphical Models for Designing Experiments in the Automotive Domain

By David Issa Mattos; Liu, Yuchu in Domains, Experiments, Randomization

2022

Randomized field experiments are the gold standard for evaluating the impact of software changes on customers. In the online domain, randomization has been the main tool to ensure exchangeability. However, due to the different deployment conditions and the high dependence on the surrounding environment, designing experiments for automotive software needs to consider a higher number of restricted variables to ensure conditional exchangeability. In this paper, we show how at Volvo Cars we utilize causal graphical models to design experiments and explicitly communicate the assumptions of experiments. These graphical models are used to further assess the experiment validity, compute direct and indirect causal effects, and reason about the transportability of the causal conclusions.

Paper

Bayesian Paired-Comparison with the bpcs Package

By Érika Martins Silva Ramos; David Issa Mattos in Assessments, Bayesian analysis, Data analysis

2021

This article introduces the bpcs R package (Bayesian Paired Comparison in Stan) and the statistical models implemented in the package. This package aims to facilitate the use of Bayesian models for paired comparison data in behavioral research. Bayesian analysis of paired comparison data allows parameter estimation even in conditions where the maximum likelihood estimate does not exist, allows easy extension of paired comparison models, provides straightforward interpretation of the results with credible intervals, has better control of type I error, gives more robust evidence towards the null hypothesis, allows propagation of uncertainties, includes prior information, and performs well when handling models with many parameters and latent variables. The bpcs package provides a consistent interface for R users and several functions to evaluate the posterior distribution of all parameters, to estimate the posterior distribution of any contest between items, and to obtain the posterior distribution of the ranks. Three reanalyses of recent studies that used the frequentist Bradley-Terry model are presented. These reanalyses are conducted with the Bayesian models of the bpcs package, and all the code used to fit the models and to generate the figures and tables is available in the online appendix.

Paper
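
For readers unfamiliar with the Bradley-Terry model named in this abstract, the standard form of its likelihood can be sketched in a few lines. This is a minimal Python illustration of the textbook model with log-strength parameters, not the bpcs R package or its API; the function names are hypothetical.

```python
import math


def bt_win_probability(strength_i, strength_j):
    """P(item i beats item j) when each item has a log-strength parameter."""
    return 1.0 / (1.0 + math.exp(-(strength_i - strength_j)))


def bt_log_likelihood(strengths, contests):
    """Log-likelihood of observed contests.

    strengths: list of log-strengths, one per item.
    contests: iterable of (winner_index, loser_index) pairs.
    """
    return sum(
        math.log(bt_win_probability(strengths[winner], strengths[loser]))
        for winner, loser in contests
    )
```

A Bayesian treatment such as the one the abstract describes would add priors on the strengths and sample their posterior, rather than maximising this likelihood directly.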

Statistical Models for the Analysis of Optimization Algorithms with Benchmark Functions

By David Issa Mattos; Helena Holmström Olsson; Bosch, Jan in Bayesian analysis, Benchmarks, Data analysis

2021

Frequentist statistical methods, such as hypothesis testing, are standard practice in papers that provide benchmark comparisons. Unfortunately, these methods have often been misused, e.g., without testing the assumptions of the statistical tests or without controlling for family-wise errors in multiple group comparisons, among several other problems. Bayesian Data Analysis (BDA) addresses many of the previously mentioned shortcomings, but it is not widely used in the analysis of empirical data in the evolutionary computing community. This paper provides three main contributions. First, we motivate the need for utilizing Bayesian data analysis and provide an overview of this topic. Second, we discuss the practical aspects of BDA to ensure that our models are valid and the results transparent. Finally, we provide five statistical models that can be used to answer multiple research questions. The online appendix provides a step-by-step guide on how to perform the analysis of the models discussed in this paper, including the code for the statistical models, the data transformations and the discussed tables and figures.

Paper

On the Assessment of Benchmark Suites for Algorithm Comparison

By David Issa Mattos; Helena Holmström Olsson; Bosch, Jan in Adaptive algorithms, Benchmarks, Boolean algebra

2021

Benchmark suites, i.e., collections of benchmark functions, are widely used in the comparison of black-box optimization algorithms. Over the years, research has identified many desired qualities for benchmark suites, such as diverse topology, different difficulties, scalability, and representativeness of real-world problems, among others. However, while the topology characteristics have been subjected to previous studies, there is no study that has statistically evaluated the difficulty level of benchmark functions, how well they discriminate optimization algorithms and how suitable a benchmark suite is for algorithm comparison. In this paper, we propose the use of an item response theory (IRT) model, the Bayesian two-parameter logistic model for multiple attempts, to statistically evaluate these aspects with respect to the empirical success rate of algorithms. With this model, we can assess the difficulty level of each benchmark, how well they discriminate different algorithms, the ability score of an algorithm, and how much information the benchmark suite adds in the estimation of the ability scores. We demonstrate the use of this model in two well-known benchmark suites, the Black-Box Optimization Benchmark (BBOB) for continuous optimization and the Pseudo Boolean Optimization (PBO) for discrete optimization. We found that most benchmark functions of the BBOB suite have high difficulty levels (compared to the optimization algorithms) and low discrimination. For the PBO, most functions have good discrimination parameters but are often considered too easy. We discuss potential uses of IRT in benchmarking, including its use to improve the design of benchmark suites, to measure multiple aspects of the algorithms, and to design adaptive suites.

Paper
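
The two-parameter logistic (2PL) model mentioned in this abstract can be sketched as follows. This is a minimal illustration of the textbook 2PL response function, with a binomial likelihood assumed for the "multiple attempts" part; it is not the authors' Bayesian implementation, and the function and parameter names are hypothetical.

```python
import math


def success_probability(discrimination, difficulty, ability):
    """2PL response function: chance that an algorithm solves a benchmark."""
    z = discrimination * (ability - difficulty)
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(successes, attempts, discrimination, difficulty, ability):
    """Binomial log-likelihood of `successes` out of `attempts` independent runs."""
    p = success_probability(discrimination, difficulty, ability)
    return (
        math.log(math.comb(attempts, successes))
        + successes * math.log(p)
        + (attempts - successes) * math.log(1.0 - p)
    )


def item_information(discrimination, difficulty, ability):
    """Fisher information one attempt on this benchmark gives about the ability."""
    p = success_probability(discrimination, difficulty, ability)
    return discrimination ** 2 * p * (1.0 - p)
```

Under this form, a benchmark is most informative for algorithms whose ability is close to its difficulty, and a low discrimination value flattens the curve so that strong and weak algorithms succeed at similar rates.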

Bayesian causal inference in automotive software engineering and online evaluation

By David Issa Mattos; Helena Holmström Olsson; Lantz, Jonn in Automobiles, Automotive engineering, Bayesian analysis

2022

Randomised field experiments, such as A/B testing, have long been the gold standard for evaluating software changes. In the automotive domain, running randomised field experiments is not always desired, possible, or even ethical. In the face of such limitations, we develop a framework, BOAT (Bayesian causal modelling for ObservAtional Testing), that utilises observational studies in combination with Bayesian causal inference in order to understand real-world impacts from complex automotive software updates and help software development organisations arrive at causal conclusions. In this study, we present three causal inference models in the Bayesian framework and their corresponding cases to address three commonly experienced challenges of software evaluation in the automotive domain. We develop the BOAT framework with our industry collaborator, and demonstrate the potential of causal inference by conducting empirical studies on a large fleet of vehicles. Moreover, we relate the causal assumption theories to their implications in practice, aiming to provide a comprehensive guide on how to apply the causal models in automotive software engineering. We apply Bayesian propensity score matching for producing balanced control and treatment groups when we do not have access to the entire user base, Bayesian regression discontinuity design for identifying covariate-dependent treatment assignments and the local treatment effect, and Bayesian difference-in-differences for causal inference of treatment effects over time while implicitly controlling for unobserved confounding factors. Each of the demonstrative cases has its grounds in practice, and is a scenario experienced when randomisation is not feasible. With the BOAT framework, we enable online software evaluation in the automotive domain without the need for a fully randomised experiment.

Paper
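
Of the three models listed in this abstract, difference-in-differences is the simplest to show with arithmetic. The sketch below computes the classical (non-Bayesian) point estimate from group means, which is an assumption made for illustration and not the BOAT implementation; the function name and the numbers in the usage note are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group.

    Assumes both groups would have followed parallel trends without treatment,
    so the control group's change stands in for the treated group's baseline drift.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

For example, if a metric rises from a mean of 11 to 15 in vehicles that received an update and from 10 to 11 in vehicles that did not, the estimate is (15 - 11) - (11 - 10) = 3. Time-invariant differences between the two fleets cancel out in the subtraction, which is what the abstract means by implicitly controlling for unobserved confounding.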

Engineering for a Science-Centric Experimentation Platform

By Gerostathopoulos, Ilias; David Issa Mattos; Mao, Tobias in Experimentation, Inference, Scientists

2019

Netflix is an internet entertainment service that routinely employs experimentation to guide strategy around product innovations. As Netflix grew, it had the opportunity to explore increasingly specialized improvements to its service, which generated demand for deeper analyses supported by richer metrics and powered by more diverse statistical methodologies. To facilitate this, and more fully harness the skill sets of both engineering and data science, Netflix engineers created a science-centric experimentation platform that leverages the expertise of data scientists from a wide range of backgrounds by allowing them to make direct code contributions in the languages used by scientists (Python and R). Moreover, the same code that runs in production is able to be run locally, making it straightforward to explore and graduate both metrics and causal inference methodologies directly into production services. In this paper, we utilize a case-study research method to provide two main contributions. Firstly, we report on the architecture of this platform, with a special emphasis on its novel aspects: how it supports science-centric end-to-end workflows without compromising engineering requirements. Secondly, we describe its approach to causal inference, which leverages the potential outcomes conceptual framework to provide a unified abstraction layer for arbitrary statistical models and methodologies.

Paper
