Catalogue Search | MBRL

A rank-based marker selection method for high throughput scRNA-seq data

by Gilbert, Anna C. , Vargo, Alexander H. S. in Algorithms , Animals , Base Sequence

2020

Background: High throughput microfluidic protocols in single cell RNA sequencing (scRNA-seq) collect mRNA counts from up to one million individual cells in a single experiment; this enables high resolution studies of rare cell types and cell development pathways. Determining small sets of genetic markers that can identify specific cell populations is thus one of the major objectives of computational analysis of mRNA counts data. Many tools have been developed for marker selection on single cell data; most of them, however, are based on complex statistical models and handle the multi-class case in an ad-hoc manner. Results: We introduce RankCorr, a fast method with strong mathematical underpinnings that performs multi-class marker selection in an informed manner. RankCorr proceeds by ranking the mRNA counts data before linearly separating the ranked data using a small number of genes. The step of ranking is intuitively natural for scRNA-seq data and provides a non-parametric method for analyzing count data. In addition, we present several performance measures for evaluating the quality of a set of markers when there is no known ground truth. Using these metrics, we compare the performance of RankCorr to a variety of other marker selection methods on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. Conclusions: According to the metrics introduced in this work, RankCorr is consistently one of the best-performing marker selection methods on scRNA-seq data. Most methods show similar overall performance, however; thus, the speed of the algorithm is the most important consideration for large data sets (and comparing the markers selected by several methods can be fruitful). RankCorr is fast enough to easily handle the largest data sets and, as such, it is a useful tool to add into computational pipelines when dealing with high throughput scRNA-seq data.
RankCorr software is available for download at https://github.com/ahsv/RankCorr with extensive documentation.
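
The rank-then-select idea described in this abstract can be illustrated with a minimal sketch. This is not the RankCorr algorithm or its API: the function names (`average_ranks`, `pearson`, `score_genes`) are hypothetical, and scoring each gene by the correlation of its ranked counts with a one-vs-rest cluster indicator is a simplifying assumption standing in for the linear separation step the authors describe.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def score_genes(counts, labels, target):
    """Score each gene for one cluster (hypothetical helper).

    counts: dict mapping gene name -> list of counts, one per cell.
    labels: list of cluster labels, one per cell.
    target: the cluster to separate from all others.
    """
    indicator = [1.0 if lab == target else 0.0 for lab in labels]
    return {g: pearson(average_ranks(col), indicator) for g, col in counts.items()}
```

Because only ranks enter the score, many zero counts simply become one large tie, which is one reason ranking suits sparse count data. Genes with the largest scores for a cluster would be the candidate markers under this simplified scheme.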

Journal Article

Oversimplifying quantum factoring

by Vargo, Alexander , Smolin, John A. , Smith, Graeme in 639/705/117 , 639/766/483/481 , Algorithmics. Computability. Computer arithmetics

2013

Building a device capable of factoring large numbers is a major goal of quantum computing; an algorithm for quantum factoring (Shor's algorithm) exists, and a simple coin-tossing exercise is used to illustrate the dangers of oversimplification when implementing this algorithm experimentally. A cautionary approach to quantum computing: Building a device capable of factoring larger numbers is a major goal of quantum computing. Some small-scale demonstrations of an algorithm for quantum factoring (known as Shor's algorithm) exist, but these have used simplifications dependent on knowing the factors in advance. John Smolin et al. use a simple coin-tossing exercise to illustrate the dangers of oversimplification, and suggest a more stringent test for experimental demonstrations of Shor's algorithm. Shor's quantum factoring algorithm exponentially outperforms known classical methods. Previous experimental implementations have used simplifications dependent on knowing the factors in advance. However, as we show here, all composite numbers admit simplification of the algorithm to a circuit equivalent to flipping coins. The difficulty of a particular experiment therefore depends on the level of simplification chosen, not the size of the number factored. Valid implementations should not make use of the answer sought.
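
For context, the classical reduction underlying Shor's algorithm can be sketched in a few lines. This is a hedged illustration, not code from the article: the function names are hypothetical, and the order-finding step is done here by brute force, whereas a quantum computer would perform that step efficiently. Note that nothing in the sketch uses the factors in advance, which is the property the authors ask of valid implementations.

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    This is the step that Shor's algorithm speeds up on a quantum computer.
    """
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def factors_from_order(a, n):
    """Try to split n using the order of a modulo n.

    Returns a pair of nontrivial factors, or None when this choice of a
    fails (odd order, or a**(r/2) congruent to -1 modulo n).
    """
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None
    # y*y == 1 (mod n) with y != 1 and y != -1, so both gcds are nontrivial.
    return gcd(y - 1, n), gcd(y + 1, n)
```

For example, `factors_from_order(7, 15)` finds order 4 and returns the pair `(3, 5)`, while `factors_from_order(14, 15)` returns `None` and a different base must be tried.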

Journal Article

Applications of Machine Learning: From Single Cell Biology to Algorithmic Fairness

by Vargo, Alexander H.S in Artificial intelligence , Bioinformatics , Cellular biology

2020

It is common practice to obtain answers to complex questions by analyzing large amounts of data. Formal modeling and careful mathematical definitions are essential to extracting relevant answers from data, and establishing a mathematical framework requires deliberate interdisciplinary collaboration between the specialists who provide the questions and the mathematicians who translate them. This dissertation details the results of two of these interdisciplinary collaborations: one in single cell RNA sequencing, and the other in fairness. High throughput microfluidic protocols in single cell RNA sequencing (scRNA-seq) collect integer valued mRNA counts from many individual cells in a single experiment; this enables high resolution studies of rare cell types and cell development pathways. ScRNA-seq data are sparse: often 90% of the collected reads are zeros. Specialized methods are required to obtain solutions to biological questions from these sparse, integer-valued data. Determining genetic markers that can identify specific cell populations is one of the major objectives of the analysis of mRNA count data. We introduce RANKCORR, a fast method with robust mathematical underpinnings that performs multi-class marker selection. RANKCORR proceeds by ranking the mRNA count data before linearly separating the ranked data using a small number of genes. Ranking scRNA-seq count data provides a reasonable non-parametric method for analyzing these data; we further include an analysis of the statistical properties of this rank transformation. We compare the performance of RANKCORR to a variety of other marker selection methods. These experiments show that RANKCORR is consistently one of the top-performing marker selection methods on scRNA-seq data, though other methods show similar overall performance. This suggests that the speed of the algorithm is the most important consideration for large data sets. 
RANKCORR is efficient and able to handle the largest data sets; as such, it is a useful tool for dealing with high throughput scRNA-seq data. The second collaboration combines state-of-the-art machine learning methods with formal definitions of fairness. Machine learning methods have a tendency to preserve or exacerbate biases that exist in data; consequently, the algorithms that influence our daily lives often display biases against certain protected groups. It is both objectionable and often illegal to allow daily decisions (e.g., mortgage approvals, job advertisements) to disadvantage protected groups; a growing body of literature in the field of algorithmic fairness aims to mitigate these issues. We contribute two methods towards this goal. We first introduce a preprocessing method designed to debias the training data. Specifically, the method attempts to remove any variation in the original data that comes from protected group status. This is accomplished by leveraging knowledge of groups that we expect to receive similar outcomes from a fair algorithm. We further present a method for training a classifier (from potentially biased data) that is both accurate and fair using the gradient boosting framework. Gradient boosting is a powerful method for constructing predictive models that can be superior to neural networks on tabular data; the development of a fair gradient boosting method is thus desirable for the adoption of fair methods. Moreover, the method that we present is designed to construct predictors that are fair at an individual level; that is, two comparable individuals will be assigned similar results. This is different from most of the existing fair algorithms, which ensure fairness at a statistical level.

Dissertation

Comparison of marker selection methods for high throughput scRNA-seq data

by Gilbert, Anna C , Vargo, Alexander Hs in Genomics , Performance evaluation

2019

Here, we evaluate the performance of a variety of marker selection methods on scRNA-seq UMI counts data. We test on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. In addition, we propose several performance measures for evaluating the quality of a set of markers when there is no known ground truth. According to these metrics, most existing marker selection methods show similar performance on experimental scRNA-seq data; thus, the speed of the algorithm is the most important consideration for large data sets. With this in mind, we introduce RankCorr, a fast marker selection method with strong mathematical underpinnings that takes a step towards sensible multi-class marker selection. Footnote: https://github.com/ahsv/marker-selection-code

Paper

Simulation of rare events in quantum error correction

by Vargo, Alexander , Bravyi, Sergey in Alpha decay , Computer simulation , Decay rate

2013

We consider the problem of calculating the logical error probability for a stabilizer quantum code subject to random Pauli errors. To access the regime of large code distances where logical errors are extremely unlikely we adopt the splitting method widely used in Monte Carlo simulations of rare events and Bennett's acceptance ratio method for estimating the free energy difference between two canonical ensembles. To illustrate the power of these methods in the context of error correction, we calculate the logical error probability \(P_L\) for the 2D surface code on a square lattice with a pair of holes for all code distances \(d\le 20\) and all error rates \(p\) below the fault-tolerance threshold. Our numerical results confirm the expected exponential decay \(P_L\sim \exp{[-\alpha(p)d]}\) and provide a simple fitting formula for the decay rate \(\alpha(p)\). Both noiseless and noisy syndrome readout circuits are considered.
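
As a rough illustration of how a decay rate could be read off from logical error probabilities measured at several code distances, the sketch below fits a straight line to log P_L against d by ordinary least squares. It assumes a pure exponential model P_L = C exp(-alpha d) at a fixed error rate; the function name `fit_decay_rate` is hypothetical, and this is not the fitting formula derived in the paper.

```python
from math import log


def fit_decay_rate(distances, error_probs):
    """Least-squares fit of log P_L = c - alpha * d.

    distances: code distances d (at least two distinct values).
    error_probs: logical error probabilities P_L > 0, one per distance.
    Returns (alpha, c), where exp(c) is the fitted prefactor.
    """
    ys = [log(p) for p in error_probs]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(distances, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_d
```

Working in log space keeps very small probabilities numerically manageable, which matters in the rare-event regime the abstract targets.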

Paper

Individually Fair Gradient Boosting

by Yurochkin, Mikhail , Sun, Yuekai , Vargo, Alexander in Algorithms , Decision trees , Machine learning

2021

We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.

Paper

Efficient Algorithms for Maximum Likelihood Decoding in the Surface Code

by Vargo, Alexander , Suchara, Martin , Bravyi, Sergey in Algorithms , Computer simulation , Error correction

2014

We describe two implementations of the optimal error correction algorithm known as the maximum likelihood decoder (MLD) for the 2D surface code with a noiseless syndrome extraction. First, we show how to implement MLD exactly in time \(O(n^2)\), where \(n\) is the number of code qubits. Our implementation uses a reduction from MLD to simulation of matchgate quantum circuits. This reduction however requires a special noise model with independent bit-flip and phase-flip errors. Secondly, we show how to implement MLD approximately for more general noise models using matrix product states (MPS). Our implementation has running time \(O(n\chi^3)\) where \(\chi\) is a parameter that controls the approximation precision. The key step of our algorithm, borrowed from the DMRG method, is a subroutine for contracting a tensor network on the two-dimensional grid. The subroutine uses MPS with a bond dimension \(\chi\) to approximate the sequence of tensors arising in the course of contraction. We benchmark the MPS-based decoder against the standard minimum weight matching decoder observing a significant reduction of the logical error probability for \(\chi\ge 4\).

Paper

Debiasing representations by removing unwanted variation due to protected attributes

by Sun, Yuekai , Vargo, Alexander , Bower, Amanda in Representations

2018

We propose a regression-based approach to removing implicit biases in representations. On tasks where the protected attribute is observed, the method is statistically more efficient than known approaches. Further, we show that this approach leads to debiased representations that satisfy a first order approximation of conditional parity. Finally, we demonstrate the efficacy of the proposed approach by reducing racial bias in recidivism risk scores.
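
The general idea of regression-based debiasing can be sketched as linear residualization: regress each feature on the protected attribute and keep only the part the attribute does not explain. The sketch below is an assumption-laden illustration, not the estimator from the paper: it handles a single numeric protected attribute, uses ordinary least squares per feature, and the function name `residualize` is hypothetical.

```python
def residualize(features, attribute):
    """Remove the linear dependence of each feature column on one attribute.

    features: list of rows, each a list of floats (one row per individual).
    attribute: list of numbers, one per individual (e.g. a 0/1 group flag).
    Returns new rows whose columns have zero sample covariance with attribute.
    """
    n = len(attribute)
    mean_a = sum(attribute) / n
    saa = sum((a - mean_a) ** 2 for a in attribute)
    out = [list(row) for row in features]
    for j in range(len(features[0])):
        col = [row[j] for row in features]
        mean_c = sum(col) / n
        # OLS slope of column j on the attribute (0 if the attribute is constant).
        beta = (
            sum((a - mean_a) * (c - mean_c) for a, c in zip(attribute, col)) / saa
            if saa > 0
            else 0.0
        )
        for i in range(n):
            out[i][j] = col[i] - beta * (attribute[i] - mean_a)
    return out
```

Subtracting only the centered attribute term preserves each feature's overall mean, so the adjusted representation stays on the original scale. Removing linear dependence does not guarantee that nonlinear traces of the attribute are gone.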

Paper

Approximation of real error channels by Clifford channels and Pauli measurements

by Gutiérrez, Mauricio , Svec, Lukas , Vargo, Alexander in Channels , Computer simulation , Damping

2012

The Gottesman-Knill theorem allows for the efficient simulation of stabilizer-based quantum error-correction circuits. Errors in these circuits are commonly modeled as depolarizing channels by using Monte Carlo methods to insert Pauli gates randomly throughout the circuit. Although convenient, these channels are poor approximations of common, realistic channels like amplitude damping. Here we analyze a larger set of efficiently simulable error channels by allowing the random insertion of any one-qubit gate or measurement that can be efficiently simulated within the stabilizer formalism. Our new error channels are shown to be a viable method for accurately approximating real error channels.

Paper

Sexually antagonistic selection promotes genetic divergence between males and females in an ant

by Blumenfeld, Alexander J. , Eyer, Pierre-André , Vargo, Edward L. in Adaptation , Alleles , Animals

2019

Genetic diversity acts as a reservoir for potential adaptations, yet selection tends to reduce this diversity over generations. However, sexually antagonistic selection (SAS) may promote diversity by selecting different alleles in each sex. SAS arises when an allele is beneficial to one sex but harmful to the other. Usually, the evolution of sex chromosomes allows each sex to independently reach different optima, thereby circumventing the constraint of a shared autosomal genome. Because the X chromosome is found twice as often in females as in males, it represents a hot spot for SAS, offering a refuge for recessive male-beneficial but female-costly alleles. Hymenopteran species do not have sex chromosomes; females are diploid and males are haploid, with sex usually determined by heterozygosity at the complementary sex-determining locus. For this reason, their entire genomes display an X-linked pattern, as every chromosome is found twice as often in females as in males, which theoretically predisposes them to SAS in large parts of their genome. Here we report an instance of sexual divergence in the Hymenoptera, a sexually reproducing group that lacks sex chromosomes. In the invasive ant Nylanderia fulva, a postzygotic SAS leads daughters to preferentially carry alleles from their mothers and sons to preferentially carry alleles from their grandfathers for a substantial region (∼3%) of the genome. This mechanism results in nearly all females being heterozygous at these regions and maintains diversity throughout the population, which may mitigate the effects of a genetic bottleneck following introduction to an exotic area and enhance the invasion success of this ant.

Journal Article
