Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Series TitleSeries Title
-
Reading LevelReading Level
-
YearFrom:-To:
-
More FiltersMore FiltersContent TypeItem TypeIs Full-Text AvailableSubjectCountry Of PublicationPublisherSourceTarget AudienceDonorLanguagePlace of PublicationContributorsLocation
Done
Filters
Reset
44
result(s) for
"Basque language Texts"
Sort by:
Standard Basque : a progressive grammar
by
Coene, Armand de
,
Rijk, Rudolf P. G. de
in
Basque language
,
Basque language -- Grammar
,
Basque language -- Textbooks for foreign speakers -- English
2008,2007
The first modern pedagogically oriented reference to the grammar of standard Basque (Euskara Batua), in two volumes: Part 1 presents detailed grammar lessons, Part 2 glosses and supplementary materials.
Assessment of the E3C corpus for the recognition of disorders in clinical texts
by
Lavelli, Alberto
,
Toti, Daniele
,
Zanoli, Roberto
in
Abdominal pain
,
Acknowledgment
,
Annotations
2024
Disorder named entity recognition (DNER) is a fundamental task of biomedical natural language processing, which has attracted plenty of attention. This task consists in extracting named entities of disorders such as diseases, symptoms, and pathological functions from unstructured text. The European Clinical Case Corpus (E3C) is a freely available multilingual corpus (English, French, Italian, Spanish, and Basque) of semantically annotated clinical case texts. The entities of type disorder in the clinical cases are annotated at both mention and concept level. At mention -level, the annotation identifies the entity text spans, for example, abdominal pain. At concept level, the entity text spans are associated with their concept identifiers in Unified Medical Language System, for example, C0000737. This corpus can be exploited as a benchmark for training and assessing information extraction systems. Within the context of the present work, multiple experiments have been conducted in order to test the appropriateness of the mention-level annotation of the E3C corpus for training DNER models. In these experiments, traditional machine learning models like conditional random fields and more recent multilingual pre-trained models based on deep learning were compared with standard baselines. With regard to the multilingual pre-trained models, they were fine-tuned (i) on each language of the corpus to test per-language performance, (ii) on all languages to test multilingual learning, and (iii) on all languages except the target language to test cross-lingual transfer learning. Results show the appropriateness of the E3C corpus for training a system capable of mining disorder entities from clinical case texts. Researchers can use these results as the baselines for this corpus to compare their own models. The implemented models have been made available through the European Language Grid platform for quick and easy access.
Journal Article
The corpus of Basque simplified texts (CBST)
by
Gonzalez-Dios, Itziar
,
Aranzabe, María Jesús
,
Díaz de Ilarraza, Arantza
in
Annotations
,
Basque language
,
Basque people
2018
In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences of science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
Journal Article
Exploring Automatic Readability Assessment for Science Documents within a Multilingual Educational Context
by
Aranberri, Nora
,
Aldabe, Itziar
,
Uçar, Suna-Şeyma
in
Academic achievement
,
Accuracy
,
Active Learning
2024
Current student-centred, multilingual, active teaching methodologies require that teachers have continuous access to texts that are adequate in terms of topic and language competence. However, the task of finding appropriate materials is arduous and time consuming for teachers. To build on automatic readability assessment research that could help to assist teachers, we explore the performance of natural language processing approaches when dealing with educational science documents for secondary education. Currently, readability assessment is mainly explored in English. In this work we extend our research to Basque and Spanish together with English by compiling context-specific corpora and then testing the performance of feature-based machine-learning and deep learning models. Based on the evaluation of our results, we find that our models do not generalize well although deep learning models obtain better accuracy and F1 in all configurations. Further research in this area is still necessary to determine reliable characteristics of training corpora and model parameters to ensure generalizability.
Journal Article
Measuring language distance for historical texts in Basque
by
Estarrona, Ainara
,
Padilla-Moyano, Manuel
,
Etxeberria, Izaskun
in
Basque language
,
Basque people
,
Corpus linguistics
2023
Measuring distance between languages, dialects and language varieties, both synchronically and diachronically, is a topic of growing interest in NLP. Based on our Syntactically Annotated Historical COrpus in BAsque (SAHCOBA) and previous work in perplexity-based language distance proposed by Gamallo, Pichel and Alegria (2017, 2020), we have compared historical corpora with current texts in the standard variety and calculated the language distances between them. As the standard Basque is based on the central dialects, the starting hypothesis is that the oldest texts and the dialects on the extremes will be the most distant. The results obtained have largely confirmed the thesis of traditional dialectology: peripheral dialects show a strong idiosyncrasy and are more distant from the rest.
Journal Article