75 results for "Latin language Data processing."
Methods in Latin Computational Linguistics
In Methods in Latin Computational Linguistics, Barbara McGillivray presents some of the methodological foundations of Latin Computational Linguistics through three corpus case studies covering morpho-syntactic and lexical-semantic aspects of Latin verb valency and quantitative diachronic explorations of Latin prefixed verbs.
Evaluating transfer learning approach for detecting Arabic anti-refugee/migrant speech on social media
Purpose: The present study was designed to investigate eight research questions related to the analysis and detection of dialectal Arabic hate speech targeting African refugees and illegal migrants on the Algerian YouTube space. Design/methodology/approach: The transfer learning approach, which currently represents the state of the art in natural language processing tasks, was exploited to classify and detect hate speech in Algerian dialectal Arabic. In addition, a descriptive analysis was conducted to answer the analytical research questions, which aim at measuring and evaluating the presence of anti-refugee/migrant discourse on the YouTube social platform. Findings: Data analysis revealed a gradual, modest increase in the number of anti-refugee/migrant hateful comments on YouTube since 2014, a sharp rise in 2017 and a sharp decline in later years until 2021. Furthermore, our findings from classifying hate content using multilingual and monolingual pre-trained language transformers demonstrate a good performance of the AraBERT monolingual transformer in comparison with the monodialectal transformer DziriBERT and the cross-lingual transformers mBERT and XLM-R. Originality/value: Automatic hate speech detection in languages other than English is a challenging task that the literature has tried to address with various machine learning approaches. Although the recent approach of cross-lingual transfer learning offers a promising solution, tackling this problem in the context of the Arabic language, particularly dialectal Arabic, makes it even more challenging. Our results cast new light on the actual ability of the transfer learning approach to deal with low-resource languages that differ widely from high-resource languages as well as from other Latin-based low-resource languages.
Development of basic reading skills in Latin: a corpus-based tool for computer-assisted fluency training
The present paper evaluates the processes of reading acquisition in Latin from the component-skills approach and discusses how advances in reading in modern foreign languages could be adapted to the specific needs of Latin as a historical language. Compared to the holistic and socially embedded approaches to modern foreign language acquisition, the grammar-translation method traditionally used in schools shows considerable weaknesses in the development of basic reading skills in Latin. Therefore, we address the possible advantages of corpus-based teaching strategies and present Machina Callida, a psycholinguistically informed e-tutor suitable for supporting Latin vocabulary acquisition and reading comprehension at beginner and intermediate levels. Using digital corpora of original Latin texts, the application semi-automatically generates contextualized vocabulary exercises tailored to the needs of different groups of learners. Through its integration with the research data repository Zenodo, Machina Callida supports online collaboration in the creation and distribution of open educational resources through crowdsourcing.
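The corpus-based exercise generation described above can be sketched as a cloze (fill-in-the-blank) builder. This is a toy illustration, not the Machina Callida implementation; the function name, the distractor list, and the example sentence are invented for the sketch:

```python
import random

def make_cloze(sentence, target, distractors, seed=0):
    """Turn a corpus sentence into a vocabulary exercise by gapping
    the target word and shuffling the answer options."""
    words = sentence.split()
    gapped = " ".join("_____" if w.strip(".,") == target else w for w in words)
    options = [target] + list(distractors)
    random.Random(seed).shuffle(options)
    return {"prompt": gapped, "options": options, "answer": target}

exercise = make_cloze(
    "Gallia est omnis divisa in partes tres",
    target="partes",
    distractors=["portas", "artes"],
)
print(exercise["prompt"])  # Gallia est omnis divisa in _____ tres
```

A real system would additionally lemmatize the corpus and pick distractors tailored to the learner's level, as the abstract describes.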
Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers
Text classification is a natural language processing (NLP) task relevant to many commercial applications, like e-commerce and customer service. Naturally, classifying such excerpts accurately often represents a challenge, due to intrinsic language aspects like irony and nuance. To accomplish this task, one must provide a robust numerical representation for documents, a process known as embedding. Embedding is a key NLP field nowadays, having seen significant advances in the last decade, especially after the introduction of the word-to-vector concept and the popularization of Deep Learning models for solving NLP tasks, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based Language Models (TLMs). Despite the impressive achievements in this field, the literature coverage regarding generating embeddings for Brazilian Portuguese texts is scarce, especially when considering commercial user reviews. Therefore, this work aims to provide a comprehensive experimental study of embedding approaches targeting a binary sentiment classification of user reviews in Brazilian Portuguese. The study spans models from classical (Bag-of-Words) to state-of-the-art (Transformer-based) NLP approaches. The methods are evaluated on five open-source databases with pre-defined data partitions, made available in an open digital repository to encourage reproducibility. The fine-tuned TLMs achieved the best results in all cases, followed by the feature-based TLM, LSTM, and CNN, with alternating ranks depending on the database under analysis.
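The classical Bag-of-Words baseline that this study starts from can be sketched in a few lines. This is a minimal illustration only; real pipelines add proper tokenization, TF-IDF weighting, and stop-word handling, and the example reviews are invented:

```python
from collections import Counter

def build_vocab(docs):
    """Collect every token across the corpus into a fixed index."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def bow_vector(doc, vocab):
    """Map a document to a count vector over the vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

reviews = ["produto excelente recomendo", "produto ruim nao recomendo"]
vocab = build_vocab(reviews)
vec = bow_vector(reviews[0], vocab)
print(vec)
```

Transformer-based embeddings replace these sparse count vectors with dense, context-sensitive representations, which is what drives the performance gap the study reports.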
Design and implementation of adolescent health Latin dance teaching system under artificial intelligence technology
Since various dance teaching systems have attracted much attention with the development of Artificial Intelligence (AI) technology, this paper improves the recognition performance of Latin dance teaching systems by optimizing the action recognition model. Firstly, object detection and action recognition technology under current AI technology is analyzed, and two-stage and one-stage object detection algorithms are evaluated. Secondly, the technologies and functions of the adolescent health Latin dance teaching system are described, including image acquisition, feature extraction, object detection, and action recognition. Finally, the action recognition algorithm is optimized based on object detection, and the rationality and feasibility of the proposed algorithm are verified by experiments. The experimental results show that the optimization algorithm can find the optimal feature subset after five iterations on the UCF101 dataset, but needs seven iterations on the HMDB51 (Human Motion Database 51) dataset. Meanwhile, when using a support vector machine classifier, the optimization algorithm achieves the highest action recognition accuracy. The regression-based, Multinomial Naive Bayes and Gaussian Naive Bayes classifiers have lower prediction delay, as low as 0.01 s. Therefore, this paper provides a useful reference for the design and implementation of adolescent health Latin dance teaching systems.
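The iterative feature-subset search mentioned above can be sketched as greedy forward selection. This is a generic illustration, not the paper's algorithm: the scoring function below is a stand-in (a real system would cross-validate a classifier such as the SVM on each candidate subset), and the feature names are invented:

```python
def forward_select(features, score, max_iters=10):
    """Greedy forward search: each iteration adds the single feature
    that most improves the score, stopping when no candidate helps."""
    selected, best = [], float("-inf")
    for _ in range(max_iters):
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        top_score, top_feat = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break  # converged: no feature improves the score
        selected.append(top_feat)
        best = top_score
    return selected, best

# Stand-in scorer: pretend "pose" and "flow" together maximize accuracy,
# with a small penalty per extra feature.
def toy_score(subset):
    return len(set(subset) & {"pose", "flow"}) - 0.1 * len(subset)

subset, best = forward_select(["pose", "flow", "color", "edge"], toy_score)
print(subset)
```

The iteration counts the paper reports (five on UCF101, seven on HMDB51) correspond to how many such rounds the search needs before converging on each dataset.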
Transliterating Latin to Amharic scripts using user-defined rules and character mappings
As social media platforms become increasingly accessible, individuals’ usage of new forms of textual communication (posts, comments, chats, etc.) on social media using local language scripts such as Amharic has increased tremendously. However, many users prefer to post comments in Latin script instead of the local one, because character input is more convenient with Latin keyboards. In existing Latin to Amharic transliteration systems, the lack of support for double consonants and double vowels causes transliteration errors. Further, as multiple character mapping conventions coexist across existing systems, social media texts are subject to a wide variety of user adoptions during script production. Current systems fail to address these gaps and adoptions. In this work, we present the RBLatAm (Rule-Based Latin to Amharic) transliteration system, a generic rule-based system that converts Amharic words which have been written in Latin script back into their native Amharic script. The system is based on mapping rules engineered from three existing transliteration systems (Microsoft, Google, SERA), additional rules for double consonants, and conventions adopted on social media by speakers of Amharic. When tested on transliterated Amharic words of non-named entities and named entities of persons, the system achieves an accuracy of 75.8% and 84.6%, respectively. The system also correctly transliterates words reported as errors in previous studies. It substantially improves the basis for text-mining research on Amharic texts by making it possible to process such texts even when they were originally produced in Latin script.
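The core mechanism of such a rule-based transliterator can be sketched as a greedy longest-match pass over the Latin input. The rule table below is a toy example, not the RBLatAm mapping; it only shows why multi-letter rules must win over single-letter ones:

```python
def transliterate(text, rules):
    """Greedy longest-match transliteration: at each position, consume
    the longest Latin substring that has a mapping rule."""
    keys = sorted(rules, key=len, reverse=True)  # longest rules first
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(rules[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # no rule matched: pass through
            i += 1
    return "".join(out)

# Toy rule table (illustrative only).
rules = {"se": "ሰ", "la": "ላ", "m": "ም"}
print(transliterate("selam", rules))  # ሰላም
```

Handling double consonants and double vowels, as the paper does, amounts to adding longer rules (e.g. for "mm" or "aa") that this longest-match ordering then prefers automatically.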
Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus
Background: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems and focus on fast capturing of data to detect threat signals through channels other than traditional surveillance. In this study, we develop Natural Language Processing tools that can be used within EBS systems. In particular, we focus on information extraction techniques that enable digital surveillance to monitor Internet data and social media. Results: We created an annotated Spanish corpus from ProMED-mail health reports regarding disease outbreaks in Latin America. The corpus has been used to train algorithms for two information extraction tasks: named entity recognition and relation extraction. The algorithms, based on deep learning and rules, have been applied to recognize diseases, hosts, and geographical locations where a disease is occurring, among other entities and relations. In addition, an in-depth analysis of micro-average F1 metrics shows the suitability of our approaches for both tasks. Conclusions: The annotated corpus and algorithms presented could leverage the development of automated tools for extracting information from news and health reports written in Spanish. Moreover, this framework could be useful within EBS systems to support the early detection of Latin American disease outbreaks.
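The rule-based side of the entity and relation extraction can be sketched with gazetteer lookup plus a same-sentence co-occurrence rule. This is a deliberately tiny illustration: the gazetteers and the example report are invented, and the paper's actual models are learned from the annotated corpus rather than hand-coded:

```python
import re

# Toy gazetteers (illustrative; real entity lists are far larger).
DISEASES = {"dengue", "zika", "chikungunya"}
LOCATIONS = {"brasil", "colombia", "peru"}

def extract_relations(report):
    """Tag disease and location mentions, then pair each disease with
    every location in the same sentence as a candidate relation."""
    relations = []
    for sentence in re.split(r"[.!?]", report):
        tokens = {t.lower() for t in re.findall(r"\w+", sentence)}
        for disease in tokens & DISEASES:
            for loc in tokens & LOCATIONS:
                relations.append((disease, "occurs_in", loc))
    return relations

rels = extract_relations("Brote de dengue reportado en Brasil. Zika en Peru.")
print(rels)
```

A deep-learning NER/RE system replaces the dictionaries with contextual models, but the output shape — (entity, relation, entity) triples — is the same.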
Developing a big data analytics platform using Apache Hadoop Ecosystem for delivering big data services in libraries
Purpose: Although the challenges associated with big data are increasing, the question of the most suitable big data analytics (BDA) platform in libraries remains significant. The purpose of this study is to propose a solution to this problem. Design/methodology/approach: The study identifies relevant literature and provides a review of big data adoption in libraries. It also presents a step-by-step guide for the development of a BDA platform using the Apache Hadoop Ecosystem. To test the system, an analysis of library big data was performed using Apache Pig, a tool from the Apache Hadoop Ecosystem. This establishes the effectiveness of the Apache Hadoop Ecosystem as a powerful BDA solution in libraries. Findings: It can be inferred from the literature that libraries and librarians have not taken the possibility of big data services in libraries very seriously. The literature also suggests that no significant effort has been made to establish any BDA architecture in libraries. This study establishes the Apache Hadoop Ecosystem as a possible solution for delivering BDA services in libraries. Research limitations/implications: The present work suggests adapting the idea of providing various big data services in a library by developing a BDA platform, for instance, assisting researchers in understanding big data, cleaning and curating big data through skilled and experienced data managers, and providing the infrastructural support to store, process, manage, analyze and visualize big data. Practical implications: The study concludes that Apache Hadoop's Hadoop Distributed File System (HDFS) and MapReduce components significantly reduce the complexities of big data storage and processing, respectively, and that Apache Pig, using the Pig Latin scripting language, is very efficient in processing big data and responds to queries quickly. Originality/value: According to the study, significantly fewer efforts have been made to analyze big data from libraries. Furthermore, acceptance of the Apache Hadoop Ecosystem as a solution to big data problems in libraries is not widely discussed in the literature, although Apache Hadoop is regarded as one of the best frameworks for big data handling.
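The MapReduce division of labour that the study relies on can be sketched as a word count over catalogue records. This single-process toy only shows the two phases; Hadoop distributes the same map and reduce steps across a cluster, and Pig Latin queries compile down to such jobs. The example records are invented:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (word, 1) pair for each token in a catalogue record."""
    return [(word.lower(), 1) for word in record.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

records = ["latin grammar", "latin corpus", "corpus tools"]
counts = reduce_phase(chain.from_iterable(map_phase(r) for r in records))
print(counts["latin"])  # 2
```

HDFS handles where the records physically live; MapReduce (or Pig compiling to it) handles running map and reduce near the data, which is the complexity reduction the study highlights.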
Text-mining forma mentis networks reconstruct public perception of the STEM gender gap in social media
Mindset reconstruction maps how individuals structure and perceive knowledge, a map unfolded here by investigating language and its cognitive reflection in the human mind, i.e., the mental lexicon. Textual forma mentis networks (TFMNs) are glass boxes introduced for extracting and understanding the structure of mindsets (in Latin, forma mentis) from textual data. Combining network science, psycholinguistics and Big Data, TFMNs successfully identified relevant concepts in benchmark texts, without supervision. Once validated, TFMNs were applied to the case study of distorted mindsets about the gender gap in science. Focusing on social media, this work analysed 10,000 tweets, mostly representing individuals’ opinions at the beginning of posts. “Gender” and “gap” elicited a mostly positive, trustful and joyous perception, with semantic associates that celebrated successful female scientists, related the gender gap to wage differences, and hoped for a future resolution. The perception of “woman” highlighted jargon of sexual harassment and stereotype threat (a form of implicit cognitive bias) about women in science “sacrificing personal skills for success”. The semantic frame of “man” highlighted awareness of the myth of male superiority in science. No anger was detected around “person”, suggesting that tweets became less tense around genderless terms. No stereotypical perception of “scientist” was identified online, in contrast to real-world surveys. This analysis thus identified that Twitter discourse, mostly starting conversations, promoted a largely stereotype-free, positive/trustful perception of gender disparity aimed at closing the gap. Hence, future monitoring against discriminating language should focus on other parts of conversations, such as users’ replies. TFMNs enable new ways of monitoring collective online mindsets, offering data-informed ground for policy making.
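The network-construction step behind such an analysis can be sketched as linking words that occur near each other. This is a simplified stand-in for a textual forma mentis network, which also uses syntactic dependencies and valence/emotion labels rather than plain co-occurrence; the example tweets are invented:

```python
from collections import defaultdict

def cooccurrence_network(texts, window=2):
    """Build a weighted word network: link every pair of words that
    appear within `window` tokens of each other."""
    edges = defaultdict(int)
    for text in texts:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != tok:
                    edges[tuple(sorted((tok, other)))] += 1
    return dict(edges)

net = cooccurrence_network(["gender gap in science", "close the gender gap"])
print(net[("gap", "gender")])  # 2
```

In a full TFMN, a word's "semantic frame" is its network neighbourhood, and the emotions attached to those neighbours give the perception profile the paper reports for terms like "gender" and "woman".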