Catalogue Search | MBRL

Živý bič

by Urban, Milo author in Slovak fiction 21st century , Slovenian literature 21st century , Slovak language Texts

2014

Book


MULTEXT-East: morphosyntactic resources for Central and Eastern European languages

by Erjavec, Tomaž in Annotations , Artificial intelligence , Brief Report

2012

The paper presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the morphosyntactic specifications, morphosyntactic lexica, and a parallel corpus, the novel "1984" by George Orwell, which is sentence aligned and contains hand-validated morphosyntactic descriptions and lemmas. The resources are uniformly encoded in XML, using the Text Encoding Initiative Guidelines, TEI P5, and cover 16 languages, mainly from Central and Eastern Europe: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. This dataset, unique in terms of languages covered and the wealth of encoding, is extensively documented, and freely available for research purposes. The paper overviews the MULTEXT-East resources by type and language and gives some conclusions and directions for further work.

Journal Article


Malý princ

by Saint-Exupéry, Antoine de, 1900-1944 author , Zannoni, Laura illustrator , Baláž, Martin translator in Voyages and travels Juvenile fiction , Friendship Juvenile fiction , Princes Juvenile fiction

2000

An aviator whose plane is forced down in the Sahara Desert encounters a little man from a small planet who describes his adventures in the universe seeking the secret of what is really important in life.

Book


Learning and Retaining Specialized Vocabulary From Textbook Reading: Comparison of Learning Outcomes Through L1 and L2

by Gablasova, Dana in Academic learning , Bilingual Education , Delayed

2014

This study investigated the acquisition of specialized vocabulary from L1 and L2 textbook reading by 64 Slovak high school students who were intermediate or advanced users of English. The students were divided into two groups: one group read the academic texts in their L1, the other group in their L2. In a posttest and a delayed posttest, they were asked to orally recall the meanings of 12 technical words that appeared in the texts. The word meanings recalled by the students immediately after reading and 1 week later were examined in terms of their breadth and depth. Results showed that although the L2-instructed students acquired the meanings of the specialized vocabulary items to a considerable degree, they still differed significantly from their L1-instructed counterparts in several respects: they could recall fewer word meanings after the reading; they acquired the words to a lesser depth; and after a week, their knowledge of the words had faded more rapidly than that of the L1-instructed participants. The significance of the findings for L2 vocabulary acquisition and bilingual education is discussed.

Journal Article


Machine Learning and Lexicon Approach to Texts Processing in the Detection of Degrees of Toxicity in Online Discussions

by Adamišín, Kamil , Mach, Marián , Machová, Kristína in Accuracy , Algorithms , Computational linguistics

2022

This article focuses on the problem of detecting toxicity in online discussions. Toxicity is currently a serious problem because people are largely influenced by opinions on social networks. We offer a solution based on classification models that use machine learning methods to classify short texts on social networks into multiple degrees of toxicity. The classification models used both classic methods of machine learning, such as naïve Bayes and SVM (support vector machine), as well as ensemble methods, such as bagging and RF (random forest). The models were created using text data, which we extracted from social networks in the Slovak language. The labelling of our dataset of short texts into multiple classes (the degrees of toxicity) was provided automatically by our method based on the lexicon approach to text processing. This lexicon method required creating a dictionary of toxic words in the Slovak language, which is another contribution of the work. Finally, an application was created based on the learned machine learning models, which can be used to detect the degree of toxicity of new social network comments as well as for experimentation with various machine learning methods. We achieved the best results using an SVM, with an average accuracy of 0.89 and F1 of 0.79. This model also outperformed the ensemble learning by the RF and bagging methods; however, the ensemble learning methods achieved better results than the naïve Bayes method.

Journal Article


Evaluation of English–Slovak Neural and Statistical Machine Translation

by Munkova, Dasa , Benko, Ľubomír , Munk, Michal in Accuracy , automatic evaluation , Datasets

2021

This study compares phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems for the English and Slovak language pair, using automatic metrics for translation quality evaluation. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results of better quality. An experiment was performed using residuals to compare the scores of automatic metrics of accuracy (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality, regardless of the system used. There were statistically significant differences between the SMT and NMT in favor of the NMT based on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of whether it was a system trained on general texts, such as Google Translate, or on specific ones, such as the European Commission’s (EC’s) tool, which was trained on a specific domain.

Journal Article


Haluzárna, bafačky a homokonec: Neologismy na webu Československé filmové databáze a relevance neologismů (nejen) pro popis slovotvorby

by Sláma, Jakub in Language and Literature Studies , neologism , Philology

2023

Studies of Czech neologisms usually focus on neologisms found in standard texts or in a collaborative online dictionary including mostly expressive words, invented words and the like. The paper attempts to bridge the gap by focusing on less formal and standard, yet fully authentic texts, i.e., user reviews in the Czech-Slovak film database. Based on 2,006 novel lexical units found on the website, it is shown that while neologisms are mostly found among lexical words, some are also found among grammatical, supposedly closed-class words. It is illustrated that derivation by suffixes and prefixoids as well as compounding are the most productive processes, yielding some surprisingly productive types of new words. Finally, the paper focuses on three prominent patterns found among the neologisms (X-árna, V-ačka, and homo-N) and points out their relevance, thus illustrating that studies of neologisms do not need to be trivially descriptive and classificatory but can point towards more general issues, both theoretical and methodological.

Journal Article


Use of Computer and Corpus Tools in the Research of a 19th Century German-Language Manuscript Book of Notes and Extracts

by Braxatoris, Martin , Braxatorisová, Anita in 19th century , Computerized corpora , Computers

2023

The study explores the possibilities of using computer and corpus tools in the interpretation of texts of the genre of book of notes and extracts; these are documents consisting of extracts and modified excerpts from contemporary press and literature, records of the author’s own thoughts, etc. Samuel Ferjenčík’s manuscript is a German-language document by a Slovak author intended for private use; cited or adapted passages are usually given without any reference to the source. The paper introduces the problems of automatic identification of the source base, which relate to the application of OCR and content similarity detection tools. It discusses the results of text matching, which revealed several manipulations of source texts, especially substitutions, indicating attitudes and priority problems in the author’s thought-world. It further interprets the results of the use of the Sketch Engine corpus manager tools by which the frequency of occurrence of key terms and their collocability were investigated, paying special attention to substituted words. The paper is an example of the application of computer and corpus-linguistics methods to the interpretation of literary texts, which is represented by a number of current studies in the field of digital humanities. The proposed approaches are applicable to research on other books of notes and extracts, topical in the context of research trends related to egodocuments, as well as to textual research on monuments of other genres.

Journal Article


Syntetická poézia v kontexte slovenského nekonvenčného písania a postliterárnej situácie

by Šrank, Jaroslav in Literary criticism , Natural language processing , Neural networks

2022

The study examines the poetry collection Liza Gennart: Výsledky vzniku (2020) from a poetological and axiological perspective. Výsledky vzniku is part of a project created by the contemporary Slovak poet and theorist of electronic literature Zuzana Husárová and the sound artist and programmer Ľubomír Panák with the use of a neural network. Following the established term synthetic text, this paper treats the poems generated by the neural network as synthetic poetry. The opening sections of the study outline the contexts in which the project by Z. Husárová and Ľ. Panák is situated from a global perspective (economic shifts, the position of literature in the world, shifts in the humanities) and in relation to the technological sphere, which forms a significant component of the project under study. The literary-historical contextualization addresses generative writing in Slovak literature, authorial teams, and virtual authorial signatures. Based on textual analysis, the fourth section of the study proposes introducing, for this realization of synthetic poetry, the terms poetics of defect, poetics of incoherence, and poetics of reduction.

Journal Article


Identification of Spontaneous Spoken Texts in Slovak

by Krammer, Peter , Mojžiš, Ján , Kvassay, Marcel in Accuracy , Classification , Dictionaries

2019

We propose a text classification method for the purpose of creating a language model for automatic recognition of spontaneous speech. Transcripts from our departmental speech database served as spontaneous spoken texts. Using supervised machine learning methods, we created multiple classification models (including neural networks) that were able to distinguish these transcripts from written texts with high accuracy. We subsequently verified the accuracy of our trained models on a database of texts containing direct speech extracted from newspaper articles.

Journal Article
