41 result(s) for "Bulgarian language Texts"
Text Analytics in Bulgarian: An Overview and Future Directions
Text analytics is becoming an integral part of modern business and economic research and analysis. However, the extent to which its application is possible and accessible varies across languages. The main goal of this paper is to outline fundamental research on text analytics applied to data in Bulgarian. A review of key research articles is provided in two main directions: the development of language resources for Bulgarian, and experiments with Bulgarian text data in practical applications. By summarizing the results of a large literature review, we draw conclusions about the degree of development of the field, the availability of language resources for the Bulgarian language, and the extent to which text analytics has been applied to practical problems. Future directions for research are outlined. To the best of the author’s knowledge, this is the first study providing a comprehensive overview of progress in the field of text analytics in Bulgarian.
MULTEXT-East: morphosyntactic resources for Central and Eastern European languages
The paper presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the morphosyntactic specifications, morphosyntactic lexica, and a parallel corpus, the novel "1984" by George Orwell, which is sentence aligned and contains hand-validated morphosyntactic descriptions and lemmas. The resources are uniformly encoded in XML, using the Text Encoding Initiative Guidelines, TEI P5, and cover 16 languages, mainly from Central and Eastern Europe: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. This dataset, unique in terms of languages covered and the wealth of encoding, is extensively documented, and freely available for research purposes. The paper overviews the MULTEXT-East resources by type and language and gives some conclusions and directions for further work.
Post-OCR text correction for Bulgarian historical documents
The digitization of historical documents is crucial for preserving the cultural heritage of society. An essential step in this process is converting scanned images to text using Optical Character Recognition (OCR), which can enable further search, information extraction, etc. Unfortunately, this is a challenging problem, as standard OCR tools are not tailored to deal with historical orthography or challenging layouts. Thus, it is standard to apply an additional text correction step to the OCR output when dealing with such documents. In this work, we focus on Bulgarian, and we create the first benchmark dataset for evaluating OCR text correction for historical Bulgarian documents written in the first standardized Bulgarian orthography: the Drinov orthography from the 19th century. We further develop a method for automatically generating synthetic data in this orthography, as well as in the subsequent Ivanchev orthography, by leveraging vast amounts of contemporary Bulgarian literary texts. We then use state-of-the-art LLMs and an encoder-decoder framework, which we augment with diagonal attention loss and copy and coverage mechanisms, to improve the post-OCR text correction. The proposed method reduces the errors introduced during recognition. It improves the quality of the documents by 25%, which is an increase of 16% compared to the state of the art on the ICDAR 2019 Bulgarian dataset. We release our data and code at https://github.com/angelbeshirov/post-ocr-text-correction.
Vernacularization of Bulgarian literacy in the seventeenth century: new perspectives
The homily On the Ten Commandments, attributed to Damaskēnos Stouditēs but in fact authored by his teacher Theophanēs Eleavoulkos, was included in manuscripts of different types and read in the Bulgarian lands for over three hundred years starting in the late sixteenth century. This article reports the results of the textual analysis of 26 copies of the text in Greek, Church Slavonic, vernacular Bulgarian, and Romanian, and shows what light the textual study of a vernacular text in conjunction with its accessible counterparts can shed on the processes of vernacularization that took place in the Bulgarian lands in the seventeenth century. The article concludes that in this period there existed a circle of anonymous conservative men of letters who adhered to the opinion that Church Slavonic in a minimally modernized form should continue to be the written language of Bulgarians. It suggests that they were associated with the Adzhar school of calligraphy and illumination (1630s-1760s). The views endorsed by these men of letters were in contrast with those of their more radical contemporaries who championed a Bulgarian written language based on the vernacular. These two circles of men of letters shared many creative characteristics, probably due to their schooling together.