Catalogue Search | MBRL

CENTRAL LIMIT THEOREMS FOR EMPIRICAL TRANSPORTATION COST IN GENERAL DIMENSION

by del Barrio, Eustasio; Loubes, Jean-Michel

2019

We consider the problem of optimal transportation with quadratic cost between an empirical measure and a general target probability on ℝ^d, with d ≥ 1. We provide new results on the uniqueness and stability of the associated optimal transportation potentials, namely, the minimizers in the dual formulation of the optimal transportation problem. As a consequence, we show that a central limit theorem (CLT) holds for the empirical transportation cost under mild moment and smoothness requirements. The limiting distributions are Gaussian and admit a simple description in terms of the optimal transportation potentials.

Journal Article
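For readers new to the quantity studied above, here is a minimal sketch of an empirical quadratic transportation cost. It rests on two simplifying assumptions that are not the paper's setting: the dimension is d = 1, and the target is represented by a second sample of the same size. On the real line the optimal coupling for the quadratic cost pairs order statistics, so the cost reduces to a mean of squared differences of sorted values. The function name `empirical_w2_squared` is illustrative.

```python
def empirical_w2_squared(x, y):
    """Squared 2-Wasserstein (quadratic transport) cost between two empirical
    measures on the real line, each putting mass 1/n on its n sample points.

    Assumption of this sketch: both samples have the same size n. In d = 1 the
    optimal coupling is monotone, so the k-th smallest x is sent to the k-th
    smallest y.
    """
    xs, ys = sorted(x), sorted(y)
    if not xs or len(xs) != len(ys):
        raise ValueError("expected two non-empty samples of equal size")
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    import random

    random.seed(0)
    n = 2000
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # A large sample from the target N(1, 1) stands in for the target measure.
    target = [random.gauss(1.0, 1.0) for _ in range(n)]
    print(empirical_w2_squared(sample, target))
```

The population value in this example is W2²(N(0,1), N(1,1)) = 1, which is the squared difference of the means plus the squared difference of the standard deviations. The printed value should therefore be close to 1. The CLT in the abstract concerns the Gaussian fluctuations of such empirical costs as n grows, in general dimension and against a general target.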

Tackling Algorithmic Bias in Neural-Network Classifiers using Wasserstein-2 Regularization

by Vincenot, Quentin; Risser, Laurent; Sanz, Alberto González in Accuracy, Algorithms, Applications of Mathematics

2022

The increasingly common use of neural network classifiers in industrial and social applications of image analysis has enabled impressive progress in recent years. Such methods are, however, sensitive to algorithmic bias, i.e., to an under- or over-representation of positive predictions, or to higher prediction errors, in specific subgroups of images. In this paper, we introduce a new method to temper algorithmic bias in neural-network-based classifiers. Our method is agnostic to the neural network architecture and scales well to massive training sets of images. It only augments the loss function with a Wasserstein-2-based regularization term, for which we back-propagate the impact of specific output predictions using a new model based on the Gâteaux derivatives of the prediction distribution. This model is algorithmically tractable and makes it possible to use our regularized loss with standard stochastic gradient-descent strategies. Its good behavior is assessed on the reference Adult census, MNIST, and CelebA datasets.

Journal Article
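As a rough illustration of a Wasserstein-2 penalty between the prediction distributions of two subgroups, here is a sketch for one-dimensional scores such as predicted probabilities of the positive class. It is not the authors' Gâteaux-derivative model. It assumes two subgroups of equal size within a mini-batch and uses the rank-matching coupling. Under that coupling, the derivative of the penalty with respect to each score has a closed form wherever scores are not tied. The function name and the weight `lam` in the usage note are hypothetical.

```python
def w2_penalty_and_grad(scores_a, scores_b):
    """Squared W2 distance between the 1-D score distributions of two
    equal-size subgroups, and its gradient with respect to every score.

    Returns (penalty, grad_a, grad_b). The gradient is valid where no two
    scores in the same group are tied (the sorted order is then locally fixed).
    """
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("this sketch assumes two non-empty groups of equal size")
    # Indices of each group's scores in increasing order of score.
    order_a = sorted(range(n), key=lambda i: scores_a[i])
    order_b = sorted(range(n), key=lambda i: scores_b[i])
    penalty = 0.0
    grad_a = [0.0] * n
    grad_b = [0.0] * n
    # Pair the k-th smallest score of group a with the k-th smallest of group b.
    for ia, ib in zip(order_a, order_b):
        diff = scores_a[ia] - scores_b[ib]
        penalty += diff * diff / n
        grad_a[ia] = 2.0 * diff / n
        grad_b[ib] = -2.0 * diff / n
    return penalty, grad_a, grad_b
```

In a training loop, one would add `lam * penalty` to the task loss. One would also add `lam * grad_a[i]` (or `lam * grad_b[i]`) to the gradient reaching the corresponding output. In frameworks that differentiate through sorting, taking the mean squared difference of the sorted scores yields the same gradient without writing it by hand.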

optimalFlow: optimal transport approach to flow cytometry gating and population matching

by del Barrio, Eustasio; Mayo-Íscar, Agustín; Inouzhe, Hristo in Algorithms, Bioinformatics, Biomedical and Life Sciences

2020

Background: Data obtained from flow cytometry show pronounced variability for both biological and technical reasons. Biological variability is a well-known phenomenon arising from measurements on different individuals with different characteristics, such as illness, age, and sex. The use of different measurement settings, variations in experimental conditions, and different types of flow cytometers are some of the technical causes of variability. This mixture of sources of variability makes it difficult to use supervised machine learning to identify cell populations. The present work combines several strategies to facilitate the task of supervised gating.

Results: We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning restricted to the new groups performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state-of-the-art techniques on the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package at https://bioconductor.org/packages/optimalFlow.

Conclusions: optimalFlowTemplates + optimalFlowClassification addresses the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovation is the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.

Journal Article

The real seroprevalence of SARS-CoV-2 in France and its consequences for virus dynamics

by Miedougé, Marcel; Izopet, Jacques; Soulat, Jean-Marc in 631/114/2415, 631/326/596/4130, 692/308/174

2021

The SARS-CoV-2 virus has spread worldwide since December 2019, killing more than 2.9 million people. We adapted a statistical model from the SIR epidemiological models to predict the spread of SARS-CoV-2 in France. Our model is based on several parameters and assumed a seroprevalence of 4.2% in Occitania after the first lockdown. Recent serological testing to measure the effective seroprevalence of SARS-CoV-2 in the population of Occitania found a seroprevalence of around 2.4%. This requires revising the parameters of our model and leads to the conclusion that virus transmission is lower than expected. This may be due to infectivity varying with the patient's symptoms, or to a constraint imposed by the uneven geographical distribution of the population.

Journal Article

How Optimal Transport Can Tackle Gender Biases in Multi-Class Neural Network Classifiers for Job Recommendations

by Jourdan, Fanny; Asher, Nicholas; Risser, Laurent in algorithmic bias, Algorithms, Artificial intelligence

2023

Automatic recommendation systems based on deep neural networks have become extremely popular during the last decade. Some of these systems can, however, be used in applications that the European Commission ranks as High Risk in the AI Act, for instance online job candidate recommendations. When used in the European Union, commercial AI systems in such applications will be required to have proper statistical properties with regard to the potential discrimination they could engender. This motivated our contribution. We present a novel optimal transport strategy to mitigate undesirable algorithmic biases in multi-class neural network classification. Our strategy is model agnostic and can be used on any multi-class classification neural network model. To anticipate the certification of recommendation systems using textual data, we applied it to the Bios dataset, for which the learning task consists of predicting the occupation of female and male individuals based on their LinkedIn biography. The results showed that our approach can reduce undesired algorithmic biases in this context to lower levels than a standard strategy.

Journal Article

Detecting and Processing Unsuspected Sensitive Variables for Robust Machine Learning

by Hervier, Lucas; Risser, Laurent; Picard, Agustin Martin in Algorithms, Bias, bias mitigation

2023

The problem of algorithmic bias in machine learning has recently gained a lot of attention due to its potentially strong impact on our societies. In much the same manner, algorithmic biases can alter industrial and safety-critical machine learning applications, where high-dimensional inputs are used. This issue has, however, been mostly left out of the spotlight in the machine learning literature. In societal applications, a set of potentially sensitive variables, such as gender or race, can be defined by common sense or by regulations to draw attention to potential risks. In industrial and safety-critical applications, by contrast, the sensitive variables are often unsuspected. In addition, these unsuspected sensitive variables may be indirectly represented as a latent feature of the input data. For instance, the predictions of an image classifier may be altered by reconstruction artefacts in a small subset of the training images. This raises serious and well-founded concerns about the commercial deployment of AI-based solutions, especially in a context where new regulations address bias issues in AI. The purpose of our paper is, therefore, first to give a broad overview of recent advances in robust machine learning. We then propose a new procedure to detect and treat such unknown biases. As far as we know, no equivalent procedure has been proposed in the literature so far. The procedure is also generic enough to be used in a wide variety of industrial contexts. Its relevance is demonstrated on a set of satellite images used to train a classifier. In this illustration, our technique detects that a subset of the training images has reconstruction faults, leading to systematic prediction errors that would have gone unsuspected using conventional cross-validation techniques.

Journal Article

Human liver microbiota modeling strategy at the early onset of fibrosis

by Gamboa, Fabrice; Fernández-Real, Jose-Manuel; Arnoriaga-Rodriguez, Maria in Analysis, Bacteria, Bacteriology

2023

Background: Gut microbiota is involved in the development of liver diseases such as fibrosis. We and others have found that selected sets of gut bacterial DNA and bacteria translocate to tissues, notably the liver, to establish a non-infectious tissue microbiota composed of microbial DNA and a low frequency of live bacteria. However, the precise set of bacterial DNA, and thereby the corresponding taxa, associated with the early stages of fibrosis remains to be identified. Furthermore, to overcome the impact of different group sizes and patient origins, we adapted innovative statistical approaches. To study the early stages of the disease, liver samples with low liver fibrosis scores (F0, F1, F2) were collected from Romania (n = 36), Austria (n = 10), Italy (n = 19), and Spain (n = 17). The 16S rRNA gene was sequenced. We accounted for frequency, sparsity, and unbalanced sample sizes between cohorts to identify taxonomic profiles and statistical differences.

Results: Multivariate analyses, including adapted spectral clustering with L1-penalty fair-discriminant strategies, and predicted metagenomics were used to identify that 50% of liver taxa associated with early-stage fibrosis were Enterobacteriaceae, Pseudomonadaceae, Xanthobacteriaceae and Burkholderiaceae. The Flavobacteriaceae and Xanthobacteriaceae discriminated between F0 and F1. Predicted metagenomics analysis identified that preQ0 biosynthesis and the potential pathways involving glucopyranose and glycogen degradation were negatively associated with liver fibrosis F1-F2 vs F0.

Conclusions: Without demonstrating causality, our results suggest, first, a role of bacterial translocation to the liver in the progression of fibrosis, notably at the earliest stages. Second, our statistical approach can identify microbial signatures and overcome issues regarding sample size differences, the impact of environment, and sets of analyses.

Trial registration: TirguMECCH ROLIVER Prospective Cohort for the Identification of Liver Microbiota, registration 4065/2014. Registered 01 01 2014.

Journal Article

Existence and consistency of Wasserstein barycenters

by Le Gouic, Thibaut; Loubes, Jean-Michel in Center of gravity, Consistency, Economics

2017

Based on the Fréchet mean, we define a notion of barycenter corresponding to a usual notion of statistical mean. We prove the existence of Wasserstein barycenters of random probabilities defined on a geodesic space (E, d). We also prove the consistency of this barycenter in a general setting, which includes taking barycenters of empirical versions of the probability measures or of a growing set of probability measures.

Journal Article
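To make the notion of Wasserstein barycenter concrete, here is a minimal sketch in the simplest possible setting. This setting is an assumption of the sketch and is far narrower than the geodesic spaces (E, d) treated in the paper: finitely many empirical measures on the real line, each with the same number n of equally weighted points. In that case, the quantile function of the 2-Wasserstein barycenter is the weighted average of the input quantile functions. Its atoms are therefore weighted averages of the sorted samples. The name `barycenter_1d` is illustrative.

```python
def barycenter_1d(samples, weights=None):
    """Atoms of the 2-Wasserstein barycenter of empirical measures on the real line.

    samples: list of K lists, each holding n points (each point has mass 1/n).
    weights: optional non-negative weights, one per measure (default: uniform).
    Returns the n atoms (each with mass 1/n) of the barycenter, in increasing order.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples[0])
    if n == 0 or any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes non-empty samples of equal size")
    if weights is None:
        weights = [1.0] * len(samples)
    if len(weights) != len(samples) or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative, one per sample, not all zero")
    total = sum(weights)
    sorted_samples = [sorted(s) for s in samples]
    # The j-th atom is the weighted mean of the j-th order statistics.
    return [
        sum(w * s[j] for w, s in zip(weights, sorted_samples)) / total
        for j in range(n)
    ]
```

This barycenter averages the sorted values rather than the raw values in their original order, which distinguishes it from a pointwise mean of the samples. The consistency result above covers, among other cases, barycenters computed from such empirical versions of the measures.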

Influence of SARS-CoV-2 Variant B.1.1.7, Vaccination, and Public Health Measures on the Spread of SARS-CoV-2

by Nicot, Florence; Ranger, Noémie; Dimeglio, Chloé in Coronaviruses, COVID-19, COVID-19 - epidemiology

2021

The spread of SARS-CoV-2, and the resulting disease COVID-19, had killed over 2.6 million people as of 18 March 2021. We used a modified susceptible, infected, recovered (SIR) epidemiological model to predict how the spread of the virus in regions of France will vary depending on the proportions of variants and on the public health strategies adopted, including anti-COVID-19 vaccination. The proportion of SARS-CoV-2 variant B.1.1.7, which was not detected in early January, rose to 60% of the SARS-CoV-2 circulating in the Toulouse urban area by the beginning of February 2021, but there was no increase in positive nucleic acid tests. Our prediction model indicates that maintaining public health measures and accelerating vaccination are efficient strategies for the sustained control of SARS-CoV-2.

Journal Article
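The abstract above relies on a modified SIR model. The authors' modifications (variant proportions, vaccination) are not described here, so what follows is only a sketch of the classical SIR equations, dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, integrated with a forward Euler scheme. The optional `vacc_rate` term, which moves susceptible individuals directly to the removed compartment, is a hypothetical addition. All parameter values are illustrative.

```python
def simulate_sir(n, i0, beta, gamma, days, steps_per_day=10, vacc_rate=0.0):
    """Forward-Euler integration of the classical SIR model.

    n: population size; i0: initial number of infected individuals;
    beta: transmission rate per day; gamma: recovery rate per day;
    vacc_rate: hypothetical daily fraction of susceptibles vaccinated (S -> R).
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            new_vaccinations = vacc_rate * s * dt
            # Each flow leaves one compartment and enters another, so S + I + R stays n.
            s -= new_infections + new_vaccinations
            i += new_infections - new_recoveries
            r += new_recoveries + new_vaccinations
        history.append((s, i, r))
    return history
```

In this plain model, the basic reproduction number is β/γ, and the number of infected individuals grows while βS/N > γ. Lowering β (as public health measures aim to do) or raising `vacc_rate` lowers the peak of I. This is the kind of scenario comparison the abstract's prediction model makes with its own fitted parameters.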

Fairness seen as global sensitivity analysis

by Gamboa, Fabrice; Boissin, Thibaut; Bénesse, Clément in Algorithms, Artificial Intelligence, Computer Science

2024

Ensuring that a predictor is not biased against a sensitive feature is the goal of fair learning. Meanwhile, Global Sensitivity Analysis (GSA) is used in numerous contexts to monitor the influence of any feature on an output variable. We merge these two domains, Global Sensitivity Analysis and Fairness, by showing how fairness can be defined using a special framework based on Global Sensitivity Analysis and how various usual indicators are shared by the two fields. We also present new Global Sensitivity Analysis indices, as well as rates of convergence, that are useful as fairness proxies.

Journal Article
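As an illustration of a global sensitivity index used as a fairness proxy, here is a sketch of the classical first-order Sobol index of a discrete sensitive feature S on a model output Y, Var(E[Y | S]) / Var(Y). The index is estimated by plug-in from paired samples. The new indices and convergence rates introduced in the paper are not reproduced, and the function name is illustrative. For a binary prediction Y, E[Y | S] = P(Y = 1 | S). A population index of 0 therefore means the positive rate does not depend on S, which is statistical parity.

```python
from collections import defaultdict


def first_order_sobol(sensitive, outputs):
    """Plug-in estimate of Var(E[Y | S]) / Var(Y) for a discrete feature S.

    sensitive: values of the sensitive feature S (any hashable labels).
    outputs: matching model outputs Y (numbers).
    Returns a value in [0, 1] (up to rounding); 0 when all group means coincide.
    """
    n = len(outputs)
    if n == 0 or n != len(sensitive):
        raise ValueError("expected non-empty sequences of equal length")
    mean_y = sum(outputs) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs) / n
    if var_y == 0:
        return 0.0  # constant output: no variance to attribute to S
    groups = defaultdict(list)
    for s, y in zip(sensitive, outputs):
        groups[s].append(y)
    # Variance of the group means, each group weighted by its relative frequency.
    var_cond_mean = sum(
        len(ys) / n * (sum(ys) / len(ys) - mean_y) ** 2 for ys in groups.values()
    )
    return var_cond_mean / var_y
```

An index of 1 means the output is entirely determined by the sensitive feature. Intermediate values quantify how much of the output's variance is explained by group membership.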
