Catalogue Search | MBRL

Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax

by Chiarcos, Christian; Khait, Ilya; Steuer, Julius in Annotations, Cuneiform, Languages

2018

This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open it to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.

Journal Article

Salience

by Chiarcos, Christian; Claus, Berry; Grabski, Michael in Computational Linguistics, Computerlinguistik, Discourse Analysis

2011

The volume addresses the role of salience in discourse and provides broad coverage of various perspectives on and functions of discourse salience. The multidisciplinary approaches adopted in the volume differ with regard to the underlying theoretical proposals and foci of research. The topics range from (i) entity-based salience to (ii) discourse-structural salience of utterances to (iii) extra-linguistic factors of salience in discourse. Accordingly, the volume is organized into three sections. Part I focuses on discourse referents and the choice of referring expressions. The contributions cover issues such as salience and demonstrativity in Russian, discourse salience and grammatical voice in the West Siberian language Eastern Khanty, the joined information of syntactic and semantic prominence, and a computational framework of salience metrics. The contributions to Part II are concerned with linguistic structures at or above the clause level. The salience of discourse segments is addressed with respect to the translation of discourse relations and position of verb arguments in Old High German. Part III extends the scope beyond purely linguistic phenomena and deals with the role of extra-linguistic salience in discourse processing. Visual salience in a situated-dialog context, salience marking by hypertextual links, and extra-linguistic salience derived from a mental representation of the described situation are all discussed here. The notion of salience is of relevance to discourse studies in theoretical linguistics, computational linguistics, and psycholinguistics.

eBook

By all these lovely tokens... Merging conflicting tokenizations

by Chiarcos, Christian; Ritz, Julia; Stede, Manfred in Algorithms, Annotations, Architecture

2012

Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a nontrivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.

Journal Article

By all these lovely tokens... Merging conflicting tokenizations : Linguistic Annotations

by CHIARCOS, Christian; STEDE, Manfred; RITZ, Julia in Applied linguistics, Computational linguistics, Linguistics

2012

Journal Article

Information structure in African languages: corpora and tools

by Chiarcos, Christian; Grubic, Mira; Ritz, Julia in Access, African Languages, Annotations

2011

In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 "Information Structure". These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for language documentation, distributed access, and the creation of data archives.

Journal Article

Information structure in African languages: corpora and tools

by FIEDLER, Ines; GRUBIC, Mira; CHIARCOS, Christian in Applied linguistics, Computational linguistics, Linguistics

2011

Journal Article

The Mental Salience Framework: Context-adequate generation of referring expressions

by Chiarcos, Christian

2011

Book Chapter

A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations

by Chiarcos, Christian; Schenk, Niko; Rönnqvist, Samuel

2017

We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.

Paper

Designing Annotation Schemes: From Model to Representation

by Chiarcos, Christian; Ide, Nancy; Stede, Manfred in Computational linguistics, HUMANITIES, Standard representation formats

2017

The physical formats used to represent linguistic data and its annotations have evolved over the past four decades, accommodating different needs and perspectives as well as incorporating advances in data representation generally. This chapter provides an overview of representation formats with the aim of surveying the relevant issues for representing different data types together with current state-of-the-art solutions, in order to provide sufficient information to guide others in the choice of a representation format or formats.

Book Chapter

Introduction: Salience in linguistics and beyond

by Chiarcos, Christian; Claus, Berry; Grabski, Michael in Computational linguistics, Discourse analysis, Psycholinguistics

2011

Book Chapter
