Catalogue Search | MBRL
12 result(s) for "Chiarcos, Christian"
Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax
2018
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential but lacks modern means of computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations, and object metadata. The project's main goal is to build a pipeline for the machine translation and annotation of Sumerian Ur III administrative texts. The resulting rich, structured data is then to be made accessible as (Linguistic) Linked Open Data (LLOD), which should open it up to a larger research community. Our contribution is twofold. In terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Journal Article
Salience
by Chiarcos, Christian; Claus, Berry; Grabski, Michael
in Computational Linguistics, Computerlinguistik, Discourse Analysis
2011
The volume addresses the role of salience in discourse and provides broad coverage of various perspectives on, and functions of, discourse salience. The multidisciplinary approaches adopted in the volume differ in their underlying theoretical proposals and research foci. The topics range from (i) entity-based salience to (ii) the discourse-structural salience of utterances to (iii) extra-linguistic factors of salience in discourse. Accordingly, the volume is organized into three sections.
Part I focuses on discourse referents and the choice of referring expressions. The contributions cover issues such as salience and demonstrativity in Russian, discourse salience and grammatical voice in the West Siberian language Eastern Khanty, the combined contribution of syntactic and semantic prominence, and a computational framework for salience metrics. The contributions to Part II are concerned with linguistic structures at or above the clause level. The salience of discourse segments is addressed with respect to the translation of discourse relations and the position of verb arguments in Old High German. Part III extends the scope beyond purely linguistic phenomena and deals with the role of extra-linguistic salience in discourse processing. It discusses visual salience in a situated-dialog context, salience marking by hypertextual links, and extra-linguistic salience derived from a mental representation of the described situation.
The notion of salience is of relevance to discourse studies in theoretical linguistics, computational linguistics, as well as psycholinguistics.
By all these lovely tokens... Merging conflicting tokenizations
2012
Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice. It also poses a nontrivial theoretical problem for the integration of linguistic annotations and for their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format and discusses the consequences from a corpus-linguistic perspective.
Journal Article
Information structure in African languages: corpora and tools
2011
In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 "Information Structure". These include deeply annotated data collections of 25 sub-Saharan languages, which are described together with their annotation scheme, as well as the tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. By applying ANNIS to several African data collections, we illustrate its suitability for language documentation, distributed access, and the creation of data archives.
Journal Article
A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations
2017
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word-order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet it achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to focus selectively on the relevant parts of an input sequence.
Designing Annotation Schemes: From Model to Representation
by Chiarcos, Christian; Ide, Nancy; Stede, Manfred
in Computational linguistics, HUMANITIES, Standard representation formats
2017
The physical formats used to represent linguistic data and its annotations have evolved over the past four decades, accommodating different needs and perspectives as well as incorporating general advances in data representation. This chapter provides an overview of representation formats. It surveys the relevant issues in representing different data types, together with current state-of-the-art solutions, to give readers sufficient information to choose a representation format or formats.
Book Chapter