Catalogue Search | MBRL

by Gospodinov, Georgi, 1968- author in Bulgarian fiction 21st century , Bulgarian literature 21st century , Bulgarian language Texts

2021

Book

Text Analytics in Bulgarian: An Overview and Future Directions

by Hristova, Gloria in Bulgarian text data , language resources development , natural language processing

2021

Text analytics is becoming an integral part of modern business and economic research and analysis. However, the extent to which its application is possible and accessible varies for different languages. The main goal of this paper is to outline fundamental research on text analytics applied to data in Bulgarian. A review of key research articles in two main directions is provided: development of language resources for Bulgarian and experimenting with Bulgarian text data in practical applications. By summarizing the results of a large literature review, we draw conclusions about the degree of development of the field, the availability of language resources for the Bulgarian language and the extent to which text analytics has been applied to practical problems. Future directions for research are outlined. To the best of the author’s knowledge, this is the first study providing a comprehensive overview of progress in the field of text analytics in Bulgarian.

Journal Article

Vsichki sme stranit͡si

by Ruseva, Peti͡a author in Bulgarian fiction 21st century , Bulgarian literature 21st century , Bulgarian language Texts

2023

Book

MULTEXT-East: morphosyntactic resources for Central and Eastern European languages

by Erjavec, Tomaž in Annotations , Artificial intelligence , Brief Report

2012

The paper presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the morphosyntactic specifications, morphosyntactic lexica, and a parallel corpus, the novel "1984" by George Orwell, which is sentence aligned and contains hand-validated morphosyntactic descriptions and lemmas. The resources are uniformly encoded in XML, using the Text Encoding Initiative Guidelines, TEI P5, and cover 16 languages, mainly from Central and Eastern Europe: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. This dataset, unique in terms of languages covered and the wealth of encoding, is extensively documented, and freely available for research purposes. The paper overviews the MULTEXT-East resources by type and language and gives some conclusions and directions for further work.

Journal Article

Vtora bŭlgarska dŭrzhava

by Vagalinski, Li͡udmil Ferdinandov author , Khalembakov, Ogni͡an author , Zlatareva, Galina author in Children's stories, Bulgarian 21st century , Bulgarian language Texts , Bulgaria History

2022

Book

Pŭrva bŭlgarska dŭrzhava

by Khalembakov, Ogni͡an author , Vagalinski, Li͡udmil Ferdinandov author , Zlatareva, Galina author in Children's stories, Bulgarian 21st century , Bulgarian language Texts , Bulgaria History

2022

Book

Post-OCR text correction for Bulgarian historical documents

by Dimitrov, Dimitar , Beshirov, Angel , Nakov, Preslav in 19th century , Acknowledgment , Bulgarian language

2025

The digitization of historical documents is crucial for preserving the cultural heritage of society. An essential step in this process is converting scanned images to text using Optical Character Recognition (OCR), which can enable further search, information extraction, etc. Unfortunately, this is a challenging problem as standard OCR tools are not tailored to deal with historical orthography or challenging layouts. Thus, it is standard to apply an additional text correction step on the OCR output when dealing with such documents. In this work, we focus on Bulgarian, and we create the first benchmark dataset for evaluating OCR text correction for historical Bulgarian documents written in the first standardized Bulgarian orthography: the Drinov orthography from the 19th century. We further develop a method for automatically generating synthetic data in this orthography, as well as in the subsequent Ivanchev orthography, by leveraging vast amounts of contemporary Bulgarian literary texts. We then use state-of-the-art LLMs and an encoder-decoder framework, which we augment with diagonal attention loss and copy and coverage mechanisms, to improve the post-OCR text correction. The proposed method reduces the errors introduced during the recognition. It improves the quality of the documents by 25%, which is an increase of 16% compared to the state-of-the-art on the ICDAR 2019 Bulgarian dataset. We release our data and code at https://github.com/angelbeshirov/post-ocr-text-correction.

Journal Article

Praistorii͡a. Traki Slavi͡ani Bulgari

by Khalembakov, Ogni͡an author , Pavlov, Plamen consultant in Children's stories, Bulgarian 21st century , Bulgarian language Texts , Bulgaria History

2021

Book

Vernacularization of Bulgarian literacy in the seventeenth century: new perspectives

by Mladenova, Olga M. in 16th century , 17th century , Adzhar

2018

The homily On the Ten Commandments, attributed to Damaskēnos Stouditēs but in fact authored by his teacher Theophanēs Eleavoulkos, was included in manuscripts of different types and read in the Bulgarian lands for over three hundred years starting in the late sixteenth century. This article reports the results of the textual analysis of 26 copies of the text in Greek, Church Slavonic, vernacular Bulgarian, and Romanian, and shows what light the textual study of a vernacular text in conjunction with its accessible counterparts can shed on the processes of vernacularization that took place in the Bulgarian lands in the seventeenth century. The article concludes that in this period there existed a circle of anonymous conservative men of letters who adhered to the opinion that Church Slavonic in a minimally modernized form should continue to be the written language of Bulgarians. It suggests that they were associated with the Adzhar school of calligraphy and illumination (1630s-1760s). The views endorsed by these men of letters were in contrast with those of their more radical contemporaries who championed a Bulgarian written language based on the vernacular. These two circles of men of letters shared many creative characteristics, probably due to their schooling together.

Journal Article
