Catalogue Search | MBRL

Practical corpus linguistics : an introduction to corpus-based language analysis

by Weisser, Martin in Citation of electronic information resources , Computational linguistics -- Methodology , Computer network resources -- Evaluation

2016, 2015

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.

* Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed
* Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux
* Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
* Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding/analyzing corpus data

eBook

The Potential of Automatic Word Comparison for Historical Linguistics

by Gray, Russell D. , List, Johann-Mattis , Greenhill, Simon J. in Algorithms , Automatic , Automation

2017

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.

Journal Article

EsPal: One-stop shopping for Spanish word properties

by Duchon, Andrew , Martí, Antonia , Perea, Manuel in Behavioral Science and Psychology , Cognitive Psychology , Corpus Linguistics

2013

This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, biphones, and bisyllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users either to upload a set of words to receive their properties or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal/ .

Journal Article

Systematic Literature Review of Ecological Discourse Analysis From 2014 to 2023

by Ang, Lay Hoon , Halim, Hazlina Abdul , Chu, Aixuan in Analysis , Anthropocene , Applied Linguistics

2024

One of the crucial foci of contemporary linguistics is upon the complex human-nature relationship within the Anthropocene. An essential methodology of eco-linguistics is Ecological Discourse Analysis (EDA), a field which combines linguistics and ecology. The philosophical underpinning of EDA is rooted in an ecological thematic framework, emphasising the significance of biodiversity and sustainability in the natural world. Despite gaining attention over the past decade, EDA still lacks a comprehensive explanation to adequately and accurately explain the fundamental assertion of its philosophical framework: it gives undue prominence to language biodiversity and sustainability. This systematic literature review, conducted using the PRISMA 2020 paradigm, covers studies from 2014 to 2023. It examines 38 works on EDA across several genres. Further clues illustrate the application of EDA to various genres, often informed by theoretical frameworks associated with systemic functional linguistics, cognitive linguistics, and corpus linguistics. EDA research concentrates on ecological discourse and advocating for protection, exposing the lack of nomenclature, analytical framework, application domain, theoretical framework, and objectives. Furthermore, EDA faces challenges in effectively addressing ecological issues within discourse construction because it does not have a sound theoretical paradigm or enough systematicity. Furthermore, the texts analysed using EDA predominantly focus on ecological discourse, with only a few studies incorporating non-ecological literary texts. This underscores the necessity of expanding the scope of ecological linguistic research.

Journal Article

Wordbank: an open repository for developmental vocabulary data

by Frank, Michael C. , Marchman, Virginia A. , Yurovsky, Daniel in Ability , Child , Child development

2017

The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.

Journal Article

Assessing the Role of Socio-Demographic Triggers on Kolmogorov-Based Complexity in Spoken English Varieties

by Ehret, Katharina in Chaos theory , Community research , Complexity

2025

This paper assesses the role of socio-demographic triggers on Kolmogorov-based complexity in spoken English varieties. It thus contributes to the ongoing debate on contact and complexity in the sociolinguistic typological research community. Currently, evidence on whether socio-demographic triggers influence the morphosyntactic complexity of languages is controversial and inconclusive. Particularly controversial is the influence of the proportion of non-native speakers and the number of native speakers, which are both common proxies for language contact. In order to illuminate the issue from an English-varieties perspective, I use regression analysis to test several socio-demographic triggers in a corpus database of spoken English varieties. Language complexity here is operationalised in terms of Kolmogorov-based morphological and syntactic complexity. The results only partially support the idea that socio-demographic triggers influence morphosyntactic complexity in English varieties, i.e., speaker-related triggers turn out to be negative but non-significant. Yet, net migration rate shows a positive significant effect on morphological complexity which needs to be seen in the global context of English as a commodity and unequal access to English. I thus argue that socioeconomic triggers are better predictors for complexity than demographic speaker numbers. In sum, the paper opens up new horizons for research on language complexity.

Journal Article

A computational literature review of football performance analysis through probabilistic topic modeling

in Analysis , Classification , Coherence

2022

This research aims to illustrate the potential use of concepts, techniques, and mining process tools to improve the systematic review process. Thus, a review was performed on two online databases (Scopus and ISI Web of Science) from 2012 to 2019. A total of 9649 studies were identified, which were analyzed using probabilistic topic modeling procedures within a machine learning approach. The Latent Dirichlet Allocation method, chosen for modeling, required the following stages: 1) data cleansing, and 2) data modeling into topics for coherence and perplexity analysis. All research was conducted according to the standards of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses in a fully computerized way. The computational literature review is an integral part of a broader literature review process. The results presented met three criteria: (1) literature review for a research area, (2) analysis and classification of journals, and (3) analysis and classification of academic and individual research teams. The contribution of the article is to demonstrate how the publication network is formed in this particular field of research, and how the content of abstracts can be automatically analyzed to provide a set of research topics for quick understanding and application in future projects.

Journal Article

Neural machine translation in foreign language teaching and learning: a systematic review

by Klimova, Blanka , Sanchez-Stockhammer, Christina , Lehr, Caroline in Artificial Intelligence , Best practice , Best Practices

2023

Nowadays, hardly anyone working in the field of foreign language teaching and learning can imagine life without machine translation (MT) tools. Thanks to the rapid development of artificial intelligence, MT now most widely assumes a new form, the so-called Neural Machine Translation (NMT), which offers the potential for a wide application in foreign language learning (FLL). Therefore, the purpose of this review study is to explore different approaches to the efficient implementation of NMT into FLL and provide specific pedagogical implications for best practices. The PRISMA methodology for systematic reviews and meta-analyses was strictly followed. The search was conducted in two well-established databases, specifically Scopus and Web of Science, to generate sufficient data from research articles for further analysis. The findings of this systematic review indicate that NMT is an efficient tool for developing both productive (speaking and writing) and receptive (reading and listening) language skills, including mediation skills, which are relevant for translation. Moreover, the results show that NMT tools are especially suitable for advanced learners of L2, whose higher proficiency level enables them to critically reflect on the output of NMT texts more than beginners or lower-intermediate learners. Thus, the findings of this review study reveal that NMT has valuable implications for L2 pedagogy since it can serve as a very powerful online reference tool for FLL provided that teachers introduce students to its benefits but also limitations by implementing various teaching approaches.

Journal Article

AI-Assisted Corpus Linguistics: Integrating NLP Models Into Corpus Analysis

by Al-Qarni, Abdullah Saad in Accuracy , Algorithms , Analysis

2026

Integrating natural language processing (NLP) and artificial intelligence (AI) models into corpus linguistics has opened new avenues for linguistic analysis, yet their suitability for rigorous academic research remains debated due to issues like opacity and interpretability. This systematic review explores how NLP models transform traditional corpus linguistics methodologies, focusing on their applications, benefits, and challenges. Employing a PRISMA-guided approach, the study reviewed peer-reviewed literature from 2013 to 2025 across databases like Scopus and ACL Anthology, using keywords such as “AI in corpus linguistics” and “NLP corpus analysis”. Inclusion criteria targeted studies applying NLP models (e.g., BERT, GPT) to linguistic tasks, resulting in 12 selected studies after screening 922 records. A quality assessment using the CASP checklist ensured robustness, followed by thematic synthesis of findings. Results highlight that NLP models enhance corpus analysis by automating tasks like keyword extraction and pragmatic annotation, while offering scalability and semantic depth. Applications span discourse analysis, diachronic studies, and sociolinguistic variation, supported by tools like CorpusChat and Hugging Face Transformers. However, challenges include model biases, lack of transparency, and domain mismatch. The study concludes that AI-driven NLP models significantly advance corpus linguistics but require addressing ethical, privacy, and reproducibility concerns to ensure academic rigor. Future research should focus on developing domain-specific models and enhancing interpretability to fully harness AI’s potential in linguistic studies.

Journal Article

What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature

by Lo, Chung Kwan in Academic Achievement , Artificial intelligence , Brain

2023

An artificial intelligence-based chatbot, ChatGPT, was launched in November 2022 and is capable of generating cohesive and informative human-like responses to user input. This rapid review of the literature aims to enrich our understanding of ChatGPT’s capabilities across subject domains, how it can be used in education, and potential issues raised by researchers during the first three months of its release (i.e., December 2022 to February 2023). A search of the relevant databases and Google Scholar yielded 50 articles for content analysis (i.e., open coding, axial coding, and selective coding). The findings of this review suggest that ChatGPT’s performance varied across subject domains, ranging from outstanding (e.g., economics) and satisfactory (e.g., programming) to unsatisfactory (e.g., mathematics). Although ChatGPT has the potential to serve as an assistant for instructors (e.g., to generate course materials and provide suggestions) and a virtual tutor for students (e.g., to answer questions and facilitate collaboration), there were challenges associated with its use (e.g., generating incorrect or fake information and bypassing plagiarism detectors). Immediate action should be taken to update the assessment methods and institutional policies in schools and universities. Instructor training and student education are also essential to respond to the impact of ChatGPT on the educational environment.

Journal Article
