Catalogue Search | MBRL
Explore the vast range of titles available.
239 result(s) for "MANNING, Christopher D"
Human Language Understanding & Reasoning
2022
The last decade has yielded dramatic and quite surprising breakthroughs in natural language processing through the use of simple artificial neural network computations, replicated on a very large scale and trained over exceedingly large amounts of data. The resulting pretrained language models, such as BERT and GPT-3, have provided a powerful universal language understanding and generation base, which can easily be adapted to many understanding, writing, and reasoning tasks. These models show the first inklings of a more general form of artificial intelligence, which may lead to powerful foundation models in domains of sensory experience beyond just language.
Journal Article
Advances in natural language processing
by Hirschberg, Julia; Manning, Christopher D.
in artificial intelligence; Cognition & reasoning; Computation
2015
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.
Journal Article
Emergent linguistic structure in artificial neural networks trained by self-supervision
by Hewitt, John; Khandelwal, Urvashi; Levy, Omer
in Artificial neural networks; COLLOQUIUM PAPERS; Computer Sciences
2020
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
Journal Article
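The abstract above reports that a linear transformation of the learned embeddings captures parse-tree distances. A minimal sketch of that idea, assuming hypothetical inputs (a matrix `H` of contextual word embeddings and a learned probe matrix `B`, both random here for illustration): the probe predicts the tree distance between words i and j as the squared distance between their projected vectors.

```python
import numpy as np

def probe_distances(H, B):
    """Squared distances between word vectors after a linear probe B.

    H: (n_words, dim) contextual embeddings for one sentence.
    B: (rank, dim) probe matrix; in the paper's setting B is trained so
       these distances match gold parse-tree distances.
    Returns an (n_words, n_words) matrix of predicted distances.
    """
    P = H @ B.T                           # project embeddings into probe space
    diff = P[:, None, :] - P[None, :, :]  # pairwise differences
    return (diff ** 2).sum(-1)

# Toy check with random inputs: the matrix is symmetric with a zero
# diagonal, two properties any tree-distance approximation must share.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))
B = rng.normal(size=(8, 16))
D = probe_distances(H, B)
```

With a trained `B`, a minimum spanning tree over `D` gives the approximate reconstruction of sentence tree structure the abstract describes.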
CoQA: A Conversational Question Answering Challenge
by Chen, Danqi; Reddy, Siva; Manning, Christopher D.
in Answers; Comprehension; Computational linguistics
2019
Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating that there is ample room for improvement. We present CoQA as a challenge to the community.
Journal Article
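The F1 scores quoted in the abstract above are computed over free-form answers. A sketch of the word-overlap F1 commonly used for such question-answering evaluation (a simplification; the official scorer also normalizes punctuation and averages over multiple gold answers):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Word-overlap F1 between a predicted answer and a gold answer."""
    pred_toks, gold_toks = prediction.lower().split(), gold.lower().split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in the garden", "the garden"))  # 0.8
```

A system score like the 65.4% above is this value averaged over all questions; the human ceiling (88.8%) is computed the same way against the remaining annotators' answers.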
ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation
by Wu, Zhengxuan; Potts, Christopher; Manning, Christopher D.
in Benchmarks; Computational linguistics; Equivalence
2023
Compositional generalization benchmarks for semantic parsing seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.
Journal Article
Cross-lingual Projected Expectation Regularization for Weakly Supervised Learning
2021
We consider a multilingual weakly supervised learning scenario where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide the learning in other languages. Past approaches project labels across bitext and use them as features or gold labels for training. We propose a new method that projects model expectations rather than labels, which facilitates transfer of model uncertainty across language boundaries. We encode expectations as constraints and train a discriminative CRF model using Generalized Expectation Criteria (Mann and McCallum, 2010). Evaluated on standard Chinese-English and German-English NER datasets, our method demonstrates F1 scores of 64% and 60% when no labeled data is used. Attaining the same accuracy with supervised CRFs requires 12k and 1.5k labeled sentences. Furthermore, when combined with labeled examples, our method yields significant improvements over state-of-the-art supervised methods, achieving the best reported numbers to date on the Chinese OntoNotes and German CoNLL-03 datasets.
Journal Article
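The core move in the abstract above is projecting model expectations (label distributions) rather than hard labels across a bitext. A minimal sketch under assumed inputs (source-side label posteriors and a soft word alignment, both invented here for illustration):

```python
import numpy as np

def project_expectations(source_probs, alignment):
    """Project per-word label distributions across a bitext alignment.

    source_probs: (n_src, n_labels) model posteriors on the source side.
    alignment:    (n_tgt, n_src) soft alignment weights (rows sum to 1).
    Returns (n_tgt, n_labels) projected expectations for the target side;
    unlike projected hard labels, these preserve the source model's
    uncertainty.
    """
    return alignment @ source_probs

src = np.array([[0.9, 0.1],     # source word 1: probably an entity
                [0.2, 0.8]])    # source word 2: probably not
align = np.array([[1.0, 0.0],   # target word 1 aligns to source word 1
                  [0.5, 0.5]])  # target word 2 aligns to both
print(project_expectations(src, align))
```

In the paper's setting, such projected expectations would then act as constraints when training the target-language CRF, rather than as gold labels.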
Grounded Compositional Semantics for Finding and Describing Images with Sentences
by Ng, Andrew Y.; Le, Quoc V.; Karpathy, Andrej
in Compositionality; Dependency; Image classification
2021
Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent neural networks, kernelized CCA and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa. They also give more similar representations to sentences that describe the same image.
Journal Article
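A toy sketch of the recursive idea behind dependency-tree composition, not the paper's actual parameterization (the DT-RNN uses position- and arity-dependent weights; here a single shared child matrix stands in): each node's vector combines its own word embedding with the composed vectors of its dependents, so the root (typically the main verb) dominates the sentence representation.

```python
import numpy as np

def compose(tree, embeddings, W_word, W_child):
    """Toy dependency-tree composition.

    tree: {head_index: [child_indices]} over word positions.
    embeddings: (n_words, dim) word vectors.
    Returns the composed vector at the tree's root.
    """
    def node_vec(i):
        h = W_word @ embeddings[i]
        for c in tree.get(i, []):       # fold in each dependent's vector
            h = h + W_child @ node_vec(c)
        return np.tanh(h)
    # The root is the one word that is nobody's dependent.
    root = next(i for i in range(len(embeddings))
                if all(i not in kids for kids in tree.values()))
    return node_vec(root)

rng = np.random.default_rng(1)
E = rng.normal(size=(3, 4))             # e.g. "dogs chase cats"
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
v = compose({1: [0, 2]}, E, W1, W2)     # "chase" heads "dogs" and "cats"
```

Because composition starts from the head verb rather than from constituent boundaries, word-order variants with the same dependency structure map to the same vector, which is the abstraction property the abstract highlights.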
Combining joint models for biomedical event extraction
by Surdeanu, Mihai; Riedel, Sebastian; McCallum, Andrew
in Algorithms; Bioinformatics; Biomedical and Life Sciences
2012
Background
We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the best scoring event structure. Our primary focus is on stacking, where the predictions from the Stanford system are used as features in the UMass system. For comparison, we look at simpler model combination techniques such as intersection and union, which require only the outputs from each system and combine them directly.
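The two simpler baselines mentioned above operate directly on the systems' outputs. A minimal sketch, treating each predicted event as a hashable tuple (the event representations here are invented for illustration; stacking itself instead feeds one system's predictions in as features of the other's learner):

```python
def combine(events_a, events_b, mode):
    """Combine event predictions from two extraction systems.

    events_a, events_b: sets of (trigger, type, argument) tuples.
    'union' keeps any event either system proposes (higher recall);
    'intersection' keeps only events both agree on (higher precision).
    """
    if mode == "union":
        return events_a | events_b
    if mode == "intersection":
        return events_a & events_b
    raise ValueError(f"unknown mode: {mode}")

a = {("binds", "Binding", "p53"), ("expresses", "Expression", "IL-2")}
b = {("binds", "Binding", "p53")}
print(len(combine(a, b, "union")), len(combine(a, b, "intersection")))  # 2 1
```

The Results section below reports that neither direct combination helps significantly, while stacking does, precisely because stacking lets the learner weigh the other system's evidence instead of applying it as a hard filter.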
Results
First, we find that stacking substantially improves performance while intersection and union provide no significant benefits. Second, we investigate the graph properties of event structures and their impact on the combination of our systems. Finally, we trace the origins of events proposed by the stacked model to determine the role each system plays in different components of the output. We learn that, while stacking can propose novel event structures not seen in either base model, these events have extremely low precision. Removing these novel events improves our already state-of-the-art F1 to 56.6% on the test set of Genia (Task 1). Overall, the combined system formed via stacking ("FAUST") performed well in the BioNLP 2011 shared task. The FAUST system obtained 1st place in three out of four tasks: 1st place in Genia Task 1 (56.0% F1) and Task 2 (53.9%), 2nd place in the Epigenetics and Post-translational Modifications track (35.0%), and 1st place in the Infectious Diseases track (55.6%).
Conclusion
We present a state-of-the-art event extraction system that relies on the strengths of structured prediction and model combination through stacking. Akin to results on other tasks, stacking outperforms intersection and union and leads to very strong results. The utility of model combination hinges on complementary views of the data, and we show that our sub-systems capture different graph properties of event structures. Finally, by removing low precision novel events, we show that performance from stacking can be further improved.
Journal Article
Measuring machine translation quality as semantic equivalence: A metric based on entailment features
by Padó, Sebastian; Jurafsky, Dan; Galley, Michel
in Artificial Intelligence; Computational Linguistics; Computer Applications
2009
Current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations. We believe the main problem to be their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that assesses the quality of MT output through its semantic equivalence to the reference translation, based on a rich set of match and mismatch features motivated by textual entailment. We first evaluate this metric in an evaluation setting against a combination metric of four state-of-the-art scores. Our metric predicts human judgments better than the combination metric. Combining the entailment and traditional features yields further improvements. Then, we demonstrate that the entailment metric can also be used as learning criterion in minimum error rate training (MERT) to improve parameter estimation in MT system training. A manual evaluation of the resulting translations indicates that the new model obtains a significant improvement in translation quality.
Journal Article
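The metric described above scores a hypothesis by its entailment relationship with the reference, via match and mismatch features fit to human judgments. A sketch of the final scoring step only, with hypothetical feature names and weights (the paper's actual feature set and learned weights are not given here):

```python
def entailment_metric(features, weights, bias=0.0):
    """Score an MT hypothesis from entailment match/mismatch features.

    features, weights: dicts keyed by feature name. The metric is a
    linear model whose weights would be fit to human adequacy judgments;
    match features get positive weight, mismatch features negative.
    """
    return bias + sum(weights.get(k, 0.0) * v for k, v in features.items())

# Hypothetical features for one hypothesis/reference pair.
feats = {"aligned_content_words": 0.8,   # high lexical-semantic overlap
         "unaligned_content_words": 0.1, # little unmatched material
         "polarity_mismatch": 0.0}       # no negation flip
w = {"aligned_content_words": 2.0,
     "unaligned_content_words": -1.5,
     "polarity_mismatch": -3.0}
print(entailment_metric(feats, w))  # 1.45
```

The mismatch features are what let such a metric separate a fluent paraphrase from a fluent sentence that reverses the reference's meaning, which surface-overlap metrics score identically.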
Benchmarking the Current Employment Statistics national estimates
2017
The Current Employment Statistics (CES) survey is a large monthly survey of approximately 147,000 businesses and government agencies that represent about 634,000 individual worksites. It is used to produce detailed industry estimates of employment, hours, and earnings for the nation, states, and metropolitan areas. The CES program benchmarks its all-employee series annually to reanchor sample-based employment estimates to full population counts. This process improves the accuracy of the CES all-employee series by replacing estimates with full population counts that are not subject to the sampling or modeling errors inherent in the CES monthly estimates. These population counts are derived from administrative records and are much less timely than the sample-based estimates. However, they provide a near census of establishment employment. The authors describe the procedures currently used to benchmark the national CES all-employee estimates.
Journal Article
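The benchmarking process described above re-anchors sample-based estimates to a population count. A deliberately simplified sketch of re-anchoring a monthly series, assuming the revision is distributed linearly back across the series (a common simplification; the CES program's actual procedures are more involved):

```python
def reanchor(estimates, benchmark):
    """Re-anchor a monthly employment series to a benchmark count.

    estimates: sample-based levels in thousands; the last entry is the
    benchmark month. The revision at the benchmark month is spread
    linearly ("wedged") back across the series, leaving the first
    month untouched and forcing the last to equal the benchmark.
    """
    revision = benchmark - estimates[-1]
    n = len(estimates) - 1
    return [e + revision * i / n for i, e in enumerate(estimates)]

series = [1000.0, 1010.0, 1020.0, 1030.0]
print(reanchor(series, 1034.0))  # ends exactly at the benchmark, 1034.0
```

The point of the exercise is the one the abstract makes: the administrative population counts are less timely but not subject to sampling error, so anchoring to them corrects accumulated drift in the sample-based estimates.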