Catalogue Search | MBRL
Explore the vast range of titles available.
7 result(s) for "Paskov, Hristo"
Multitask learning improves prediction of cancer drug sensitivity
2016
Precision oncology seeks to predict the best therapeutic option for individual patients based on the molecular characteristics of their tumors. To assess the preclinical feasibility of drug sensitivity prediction, several studies have measured drug responses for cytotoxic and targeted therapies across large collections of genomically and transcriptomically characterized cancer cell lines and trained predictive models using standard methods like elastic net regression. Here we use existing drug response data sets to demonstrate that multitask learning across drugs strongly improves the accuracy and interpretability of drug prediction models. Our method uses trace norm regularization with a highly efficient ADMM (alternating direction method of multipliers) optimization algorithm that readily scales to large data sets. We anticipate that our approach will enhance efforts to exploit growing drug response compendia in order to advance personalized therapy.
Journal Article
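The abstract's core recipe, trace-norm (nuclear-norm) regularization across tasks, can be sketched in a few lines. The snippet below is not the authors' ADMM solver; it is a minimal proximal-gradient version of the same objective, with all function names and parameters our own.

```python
import numpy as np

def svd_soft_threshold(W, tau):
    # Proximal operator of the trace (nuclear) norm:
    # soft-threshold the singular values of W by tau.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_trace_norm(X, Y, lam=1.0, n_iters=500):
    # Proximal gradient descent on 0.5*||Y - XW||_F^2 + lam*||W||_*
    # X: n samples x p features; Y: one column per task (e.g. per drug).
    p, k = X.shape[1], Y.shape[1]
    W = np.zeros((p, k))
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L with L = ||X||_2^2
    for _ in range(n_iters):
        grad = X.T @ (X @ W - Y)
        W = svd_soft_threshold(W - step * grad, step * lam)
    return W
```

The trace-norm penalty couples the tasks by encouraging a low-rank coefficient matrix, which is what lets information shared across drugs improve each drug's model; ADMM (as in the paper) reaches the same fixed point faster on large problems.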
Exploiting Social Network Structure for Person-to-Person Sentiment Analysis
by West, Robert; Leskovec, Jure; Paskov, Hristo S.
in Attachment; Corpus linguistics; Data mining
2014
Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion. Such evaluations can be analyzed separately using signed social networks and textual sentiment analysis, but this misses the rich interactions between language and social context. To capture such interactions, we develop a model that predicts individual A's opinion of individual B by synthesizing information from the signed social network in which A and B are embedded with sentiment analysis of the evaluative texts relating A to B. We prove that this problem is NP-hard but can be relaxed to an efficiently solvable hinge-loss Markov random field, and we show that this implementation outperforms text-only and network-only versions in two very different datasets involving community-level decision-making: the Wikipedia Requests for Adminship corpus and the Convote U.S. Congressional speech corpus.
Journal Article
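The hinge-loss construction can be illustrated with a toy version: hinge potentials penalize an edge sign that disagrees with its text sentiment, and penalize unbalanced triads in the signed network. A brute-force minimizer stands in for the paper's efficient convex relaxation; all names, weights, and the tiny graph are illustrative, not the authors' model.

```python
import itertools

def energy(signs, text_scores, triads, w_text=1.0, w_triad=1.0):
    # signs: dict edge -> candidate sign (+1 or -1)
    # text_scores: dict edge -> sentiment score in [-1, 1]
    # Hinge penalty when a sign disagrees with the text sentiment,
    # and when a triad is unbalanced (product of its signs < 0).
    e = 0.0
    for edge, s in signs.items():
        e += w_text * max(0.0, 1.0 - s * text_scores[edge])
    for (a, b, c) in triads:
        e += w_triad * max(0.0, -signs[a] * signs[b] * signs[c])
    return e

def best_assignment(edges, text_scores, triads):
    # Exhaustive search over sign assignments (toy scale only).
    best, best_e = None, float("inf")
    for combo in itertools.product([-1, 1], repeat=len(edges)):
        signs = dict(zip(edges, combo))
        e = energy(signs, text_scores, triads)
        if e < best_e:
            best, best_e = signs, e
    return best
```

On a triangle where two edges carry clearly positive text and the third is weakly negative, the balance potential overrides the weak text signal and flips the third edge positive, which is exactly the text/network interaction the paper exploits.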
Learning with N-Grams: From Massive Scales to Compressed Representations
by Paskov, Hristo Spassimirov
in Artificial intelligence; Computer science; Industrial engineering
2017
Machine learning has established itself as an important driver of industrial progress and scientific discovery. The quest to expand its usage to address ever deeper questions and harder problems places particular emphasis on building sophisticated and statistically rigorous models that can handle the deluge of information being generated. The stakes are higher than ever; the success of global, billion dollar initiatives that can fundamentally change the landscape of human health rests on the existence of machine learning tools that can extract intricate relationships at unprecedented scales. In turn, machine learning paradigms are constantly evolving to address these needs, and some of the greatest advances have come from integrating combinatorial ideas with classical statistical ideas, such as the ability to perform principled feature selection using the Lasso. The underlying perspective of this thesis is that machine learning must rely on the algorithms and data structures that classically form the underpinnings of theoretical computer science in order to fully harness the potential of these combinatorial ideas. To this end, we contribute two advances to machine learning based on N-gram features, a feature representation for strings that has stood the test of time and continues to provide state-of-the-art results in natural language processing and genomics. The first addresses the computational and statistical issues of learning with long, and possibly all, N-grams in a document corpus. Our main result leverages suffix trees to provide a quadratic memory and processing time improvement over current machine learning systems by virtue of a fast matrix-vector multiplication routine whose computational requirements are at worst linear in the length of the underlying document corpus. 
As the majority of machine learning algorithms rely on and are bottlenecked by matrix-vector multiplication to learn, our routine can speed up almost any learning system by simply replacing its multiplication routine with ours. The practical savings are substantial, including an efficiency gain of four orders of magnitude for DNA sequence data, and open a new realm of possibilities for N-gram models. This routine also has large statistical implications; suffix trees perform a quadratic dimensionality reduction that substantially increases the robustness of machine learning systems when the appropriate level of data representation granularity is unknown. Finally, we provide an efficient persistent data storage system based on our algorithms that screens N-gram features according to a multitude of statistical criteria and produces data structures optimized for multiplication. Our second contribution looks to classical ideas from compression to devise a new form of combinatorial Deep Learning for text termed Dracula. Dracula is based on a generalization of the compression criterion underlying dictionary-based compressors like Lempel-Ziv 78. It learns a dictionary of N-grams that efficiently compresses a text corpus, and then recursively compresses its own dictionary for additional space savings. In doing so, it selects N-grams that are useful features for learning and induces a graph-based regularizer that orders the N-grams into low and high frequency components. Importantly, solving Dracula can be expressed as a binary linear program that may be further relaxed to a linear program, allowing a plurality of tools from optimization and computer science to be used to analyze its properties. Computationally, Dracula is NP-Complete, but it exhibits substantial problem structure that allows approximate algorithms to scale to large datasets.
Statistically, we show how Dracula can learn a multitude of representations to accommodate an underlying storage cost model and identify parameters that control the behavior of its solutions in meaningful ways. We also demonstrate that Dracula is amenable to fine-tuning by proving that its solutions evolve in a predictable way as the storage cost model varies. We demonstrate the utility of Dracula's features using experiments over a variety of problem domains including natural language processing and bioinformatics.
Dissertation
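The thesis's first contribution concerns learning with all N-grams of a corpus. A naive baseline makes the point: enumerating every substring of a document is quadratic in its length, and the feature matrix built from those counts is what learning algorithms repeatedly multiply against. The sketch below shows that quadratic baseline (our own illustrative code, not the thesis's suffix-tree routine, which performs the same matrix-vector product in time linear in corpus length).

```python
from collections import Counter

def all_ngram_counts(doc):
    # Counts of every substring (all N-grams, N = 1..len(doc)).
    # Naively quadratic in document length; a suffix tree represents
    # the same counts in linear space.
    counts = Counter()
    for i in range(len(doc)):
        for j in range(i + 1, len(doc) + 1):
            counts[doc[i:j]] += 1
    return counts

def matvec(docs, weights):
    # One entry per document of X @ w, where X[d, g] is the count of
    # N-gram g in document d and w assigns a weight to each N-gram.
    return [sum(c * weights.get(g, 0.0) for g, c in all_ngram_counts(d).items())
            for d in docs]
```

Since most learners touch the data only through products like `matvec`, replacing this routine with a linear-time one speeds up the whole training loop, which is the thesis's main computational claim.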
Learning High Order Feature Interactions with Fine Control Kernels
2020
We provide a methodology for learning sparse statistical models that use as features all possible multiplicative interactions among an underlying atomic set of features. While the resulting optimization problems are exponentially sized, our methodology leads to algorithms that can often solve these problems exactly or provide approximate solutions based on combining highly correlated features. We also introduce an algorithmic paradigm, the Fine Control Kernel framework, so named because it is based on Fenchel Duality and is reminiscent of kernel methods. Its theory is tailored to large sparse learning problems, and it leads to efficient feature screening rules for interactions. These rules are inspired by the Apriori algorithm for market basket analysis (which also falls under the purview of Fine Control Kernels) and can be applied to a plurality of learning problems including the Lasso and sparse matrix estimation. Experiments on biomedical datasets demonstrate the efficacy of our methodology in deriving algorithms that efficiently produce interaction models which achieve state-of-the-art accuracy and are interpretable.
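The Apriori idea the abstract invokes can be shown in miniature: grow candidate interactions level by level and keep only those whose support clears a threshold, relying on downward closure (a k-way interaction can be frequent only if every (k-1)-way sub-interaction is). This is the classic market-basket algorithm in our own words, not the paper's Fine Control Kernel screening rules.

```python
def apriori_interactions(rows, min_support):
    # rows: list of sets of active atomic features per sample.
    # Returns all interactions (frozensets of features) whose support,
    # the fraction of rows containing every member, is >= min_support.
    n = len(rows)

    def support(s):
        return sum(s <= r for r in rows) / n

    items = {f for r in rows for f in r}
    level = {frozenset([f]) for f in items if support(frozenset([f])) >= min_support}
    frequent = set(level)
    while level:
        # Join step: merge same-size frequent sets differing in one element.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune step (downward closure), then test support.
        level = {c for c in candidates
                 if all(c - {f} in frequent for f in c)
                 and support(c) >= min_support}
        frequent |= level
    return frequent
```

Screening of this kind is what keeps an exponentially sized interaction space tractable: most candidate interactions are eliminated without ever being scored by the learner.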
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
2019
There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
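The Cr5 recipe (ridge classification plus a low-rank factorization obtained via the SVD) can be sketched as follows. This is a simplified illustration: it truncates the full ridge solution with an SVD, whereas the paper imposes the rank constraint during training; all names are ours.

```python
import numpy as np

def reduced_rank_ridge(X, Y, lam=1.0, rank=2):
    # Ridge regression W = (X'X + lam I)^{-1} X'Y, then the closest
    # rank-r factorization W ~ A @ B via the SVD (a simplified stand-in
    # for the paper's rank-constrained training).
    p = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # p x r: bag-of-words -> shared embedding
    B = Vt[:rank]                # r x k: embedding -> concept scores
    return A, B
```

The factor `A` is the payoff: `x @ A` maps a language-specific bag-of-words vector into the shared r-dimensional space, so documents from different languages (each with its own `A`) become directly comparable.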
Data Representation and Compression Using Linear-Programming Approximations
by Mitchell, John C.; Hastie, Trevor J.; Paskov, Hristo S.
in Data compression; Dictionaries; Linear programming
2016
We propose 'Dracula', a new framework for unsupervised feature selection from sequential data such as text. Dracula learns a dictionary of n-grams that efficiently compresses a given corpus and recursively compresses its own dictionary; in effect, Dracula is a 'deep' extension of Compressive Feature Learning. It requires solving a binary linear program that may be relaxed to a linear program. Both problems exhibit considerable structure, their solution paths are well behaved, and we identify parameters which control the depth and diversity of the dictionary. We also discuss how to derive features from the compressed documents and show that while certain unregularized linear models are invariant to the structure of the compressed dictionary, this structure may be used to regularize learning. Experiments are presented that demonstrate the efficacy of Dracula's features.
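The compression criterion Dracula generalizes is the one behind LZ78-style dictionary coders, which is easy to show concretely. The sketch below is plain greedy LZ78 parsing, not Dracula itself: Dracula replaces this greedy scan with an optimized dictionary chosen by a (relaxed) binary linear program.

```python
def lz78_dictionary(text):
    # LZ78 parse: scan left to right, repeatedly extending the current
    # phrase by one character until it is no longer in the dictionary,
    # then add that extended phrase as a new dictionary entry.
    dictionary, phrase = [], ""
    for ch in text:
        phrase += ch
        if phrase not in dictionary:
            dictionary.append(phrase)
            phrase = ""
    return dictionary
```

The dictionary entries double as n-gram features, which is the link the abstract draws: phrases that compress the corpus well tend to be phrases worth learning with.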
Exploiting Social Network Structure for Person-to-Person Sentiment Analysis
by Leskovec, Jure; West, Robert; Paskov, Hristo S.
in Data mining; Decision making; Markov processes
2014
Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion. Such evaluations can be analyzed separately using signed social networks and textual sentiment analysis, but this misses the rich interactions between language and social context. To capture such interactions, we develop a model that predicts individual A's opinion of individual B by synthesizing information from the signed social network in which A and B are embedded with sentiment analysis of the evaluative texts relating A to B. We prove that this problem is NP-hard but can be relaxed to an efficiently solvable hinge-loss Markov random field, and we show that this implementation outperforms text-only and network-only versions in two very different datasets involving community-level decision-making: the Wikipedia Requests for Adminship corpus and the Convote U.S. Congressional speech corpus.