Catalogue Search | MBRL
2,421 result(s) for "Standard Dialects"
Persistent features - Corpus-based evidence for reallocation processes in German
2021
This study aims to trace a reallocation process of a grammatical feature along the dialect-standard axis using corpus-linguistic methods, more precisely an integrative application of quantitative and qualitative approaches. The phenomenon under investigation is articles without the definiteness marker d- in German, usually ascribed to the Bavarian dialect area. Analyses show, however, that this apparently dialectal feature diffuses to other communication settings closer to intended standard language use. This process is accompanied by a refunctionalisation of reduced article forms, indicating the relevance of language-internal relations for the reallocation of grammatical features. The methodological approach should be easily applicable to other variants and, as many European languages show a diaglossic repertoire, relevant to other languages as well.
Journal Article
A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units
by Baniata, Laith H.; Ampomah, Isaac K. E.; Park, Seyoung
in Arabic dialects, Arabic language, Colloquial language
2021
Languages that allow free word order, such as Arabic dialects, pose significant difficulties for neural machine translation (NMT) because they contain many rare words, which NMT systems translate poorly. Because NMT systems operate with a fixed-size vocabulary, out-of-vocabulary words are represented by unknown-word (UNK) tokens. Using the Word-Piece Model, rare words are instead encoded entirely as sequences of subword pieces. This research paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution builds on the recently introduced Transformer model. Using subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the encoder's multi-head attention sublayers by capturing the overall dependencies between the words of the input sentence in the Arabic vernacular. Experiments cover five translation tasks: Levantine Arabic (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic (MAG) to MSA, Gulf to MSA, Nile to MSA, and Iraqi Arabic (IRQ) to MSA. Extensive experiments confirm that the proposed model adequately addresses the unknown-word problem and improves translation quality from Arabic vernaculars to MSA.
Journal Article
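The Baniata et al. entry above relies on the Word-Piece Model to encode rare words as sequences of subword pieces. The following is a minimal sketch, assuming the greedy longest-match-first segmentation used by BERT-style WordPiece tokenizers; the paper's own implementation is not described in the abstract. It segments a single word against a given vocabulary. The function name `wordpiece_tokenize`, the `##` continuation prefix, and the toy vocabulary are illustrative assumptions. Learning the vocabulary itself is out of scope.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into subword pieces by greedy longest-match-first search.

    Pieces that continue a word (i.e. do not start it) are looked up with the
    `prefix` marker, following the BERT-style WordPiece convention (assumed).
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No vocabulary piece fits here: the whole word becomes UNK.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    # Hypothetical toy vocabulary shared by the source and target sides.
    toy_vocab = {"un", "##known", "known", "##s"}
    print(wordpiece_tokenize("unknowns", toy_vocab))  # ['un', '##known', '##s']
    print(wordpiece_tokenize("xyz", toy_vocab))       # ['[UNK]']
```

With this procedure, a word becomes UNK only when no segmentation into vocabulary pieces exists. This is why a shared subword vocabulary reduces the number of unknown-word tokens.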
The elevation of Sepedi from a dialect to an official standard language: Cultural and economic power and political influence matter
by Rakgogo, Tebogo J.; Zungu, Evangeline B.
in African languages, Collaboration, College students
2022
This study explored the role played by economic, cultural, and political power and influence when a particular dialect was elevated to the status of an official standard language. This qualitative study employed text analysis, drawing on journal articles, dissertations, theses, academic books, and minutes of the Parliamentary Joint Constitutional Review for data collection and analysis. To supplement this method, 267 research participants were also invited to take part in the study. They comprised undergraduate and postgraduate students and lecturers from five selected South African universities, as well as members of the language authorities. Self-administered survey questionnaires and face-to-face interviews were chosen as qualitative data-collection methods. From a dialectal point of view, the study indicated that all official standard languages were once dialects. These dialects were considered superior and elevated to the status of official languages because of socio-economic power and political influence. The article further found that status language planning in the South African context is quite political in nature, no less than it is linguistic. Against this background, the researchers claim that there is no official standard language that was not once a dialect.
Journal Article
Phonological awareness in Arabic: the role of phonological distance, phonological-unit size, and SES
by Saiegh-Haddad, Elinor; Schiff, Rachel; Shahbari-Kassem, Abeer
in Arabic language, Bilingualism, Children
2020
The study tested phonological awareness in a cross-sectional sample of 200 Arabic-speaking 2nd, 4th, 6th, 8th, and 10th graders from low and mid-high socio-economic status (SES). Participants were native speakers of a local dialect of Palestinian Arabic spoken in the north of Israel. Twelve phonological awareness tasks were administered. Six used stimuli with an identical form in Standard Arabic and in the spoken dialect (hereafter, SpA words; e.g., /sɑʒɑd/ ‘knelt’). The other six used Standard Arabic words whose form differs from the one used in the dialect (hereafter, StA words; e.g., /ʔɑχɑð/ ‘took’). For each set of words, three tasks (blending, segmentation, deletion) tested syllable awareness and three additional tasks tested phoneme awareness. Repeated-measures ANOVAs showed cross-sectional growth in syllable and phoneme awareness across grades, as well as significant differences between children from low and mid-high SES. The results also showed a consistent effect of phonological distance on phonological awareness across all tasks and in both SES groups, with higher awareness of SpA words than of StA words. At the same time, the impact of phonological distance was more prominent in three contrasts: in children from low SES than in those from mid-high SES, in phoneme awareness than in syllable awareness, and in segmentation and deletion tasks than in blending tasks. The results underscore the roles of item-based properties (phonological distance and phonological-unit size) and of a participant-based characteristic (SES) in phonological awareness in Arabic diglossia.
Journal Article
MIND your language(s): Recognizing Minority, Indigenous, Non-standard(ized), and Dialect variety usage in “monolinguals”
2023
While psychology research in general has been criticized for oversampling from WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations, psycholinguistics has a problem of conducting a large amount of research on a relatively small number of languages. Yet even within WEIRD environments, the experiences of speakers of Minority, Indigenous, Non-standard(ized), and Dialect (MIND) varieties are not always captured alongside their use of a more prestigious standard language. This position piece provides a case study of one such variety: Scots, a Germanic variety spoken in Scotland that is often considered “bad English.” Its speakers nevertheless display cognitive characteristics of bilingualism, despite often regarding themselves as monolingual because of sociolinguistic factors. Such factors include social prestige and language ideology, as well as linguistic distance. The paper introduces a new acronym encouraging researchers to MIND their language. It asks them to develop more inclusive ways of capturing the linguistic experiences of MIND speakers and to move away from binary distinctions between “bilingual” and “monolingual.” It also asks them to recognize that not all varieties are afforded the status of language, and that many multilinguals consider themselves monolingual.
Journal Article
Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation
by Bayomi, Hanaa; Abdou, Sherif Mahdy; Wassif, Khaled Tawfik
in 639/705, 639/705/1046, 639/705/117
2024
Machine translation for low-resource languages poses significant challenges, primarily because of the limited availability of data. In recent years, unsupervised learning has emerged as a promising approach to this issue, aiming to learn translations between languages without relying on parallel data. A wide range of methods has been proposed in the literature to address this complex problem. This paper presents an in-depth investigation of semi-supervised neural machine translation for Arabic dialects, specifically translating Egyptian Arabic into Modern Standard Arabic. The study employs two distinct datasets: a parallel dataset containing aligned sentences in both varieties, and a monolingual dataset in which the source dialect is not directly aligned with the target language. Three translation systems are explored. The first is an attention-based sequence-to-sequence model that exploits the vocabulary shared between Egyptian Arabic and Modern Standard Arabic to learn word embeddings. The second is an unsupervised Transformer model that relies solely on monolingual data, without any parallel data. The third is first trained on the parallel dataset in a supervised phase and then incorporates the monolingual data during training.
Journal Article
Language and family entrepreneurship: empirical research based on micro survey data of CFPS in China
2025
Language serves as a vital link between individuals. In a multi-ethnic country like China, significant differences exist between Standard Mandarin and various regional dialects, and these differences can influence family entrepreneurs’ access to entrepreneurial resources and information. This paper examines the theoretical and empirical impact of Standard Mandarin and dialects on family entrepreneurship choice and performance, using microdata from the China Family Panel Studies (CFPS). The results show that families who predominantly use Standard Mandarin are more likely to start a business. Cognitive competence and social networks are two important channels through which language influences this likelihood. The effect of Standard Mandarin on urban families’ entrepreneurship choice is more significant, whereas the effect on rural families is less pronounced. In both urban and rural areas, speaking a dialect helps entrepreneurial families integrate into the local community, leading to better entrepreneurial performance. Further analysis reveals that in most of China’s ten dialect regions, Standard Mandarin has a significant positive impact on family entrepreneurship choice but a significant negative impact on family entrepreneurial performance. Among high-income, young, and eastern- and central-region families, those using Standard Mandarin are more likely to start businesses, while dialects have a significant positive impact on family entrepreneurial performance in most sub-samples.
Journal Article
Creation of annotated country-level dialectal Arabic resources: An unsupervised approach
2022
The widespread use of multiple spoken Arabic dialects on social networking sites has stimulated growing interest in Natural Language Processing (NLP) for dialectal Arabic (DA). Arabic dialects represent true linguistic diversity and differ from Modern Standard Arabic (MSA). Indeed, their complexity and variety mean that a single NLP system cannot adequately serve all of them. Compared with MSA, the available datasets for the various dialects are generally limited in size, genre, and scope. In this article, we present a novel approach that automatically develops an annotated country-level dialectal Arabic corpus and builds word lists covering 15 Arabic dialects. The algorithm uses an iterative procedure with two main components: automatic creation of dialectal word lists and automatic creation of an annotated Arabic dialect identification corpus. To our knowledge, our study is the first of its kind to examine and analyse the poor performance of the MSA part-of-speech tagger on dialectal Arabic content and to exploit that weakness to extract dialectal words. The pointwise mutual information (PMI) association measure and the geographical frequency of word occurrence online are used to classify dialectal words. The annotated dialectal Arabic corpus (Twt15DA), built using our algorithm, was collected from Twitter and consists of 311,785 tweets containing 3,858,459 words in total. We randomly selected a sample of 75 tweets per country, 1,125 tweets in total, and native speakers carried out a manual dialect identification task on them. The results show an average inter-annotator agreement score of 64%, which reflects satisfactory agreement given the overlapping features of the 15 Arabic dialects.
Journal Article
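The Twt15DA entry above uses pointwise mutual information (PMI) to classify dialectal words. The following is a minimal sketch, assuming PMI is computed between a word and a country label from token counts in a country-labelled corpus, as PMI(w, c) = log2(P(w, c) / (P(w) P(c))). Several parts are illustrative assumptions: the `(country, tokens)` input format, the function names `pmi_scores` and `most_associated_country`, and the toy tokens. The geographical-frequency signal the authors also use is not modelled here.

```python
import math
from collections import Counter


def pmi_scores(labelled_docs):
    """Compute PMI(w, c) = log2(P(w, c) / (P(w) * P(c))) from token counts.

    `labelled_docs` is a list of (country, tokens) pairs (hypothetical format).
    Probabilities are maximum-likelihood estimates over all tokens.
    """
    joint = Counter()
    word_totals = Counter()
    country_totals = Counter()
    total = 0
    for country, tokens in labelled_docs:
        for word in tokens:
            joint[(word, country)] += 1
            word_totals[word] += 1
            country_totals[country] += 1
            total += 1
    # P(w,c) / (P(w) P(c)) simplifies to n(w,c) * N / (n(w) * n(c)).
    return {
        (word, country): math.log2(n * total / (word_totals[word] * country_totals[country]))
        for (word, country), n in joint.items()
    }


def most_associated_country(word, scores):
    """Return the country with the highest PMI for `word`, or None if unseen."""
    candidates = {c: s for (w, c), s in scores.items() if w == word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Toy, hypothetical tweets: "ana" occurs in both, so it is not dialect-specific.
    docs = [("EG", ["izzayak", "ana"]), ("MA", ["labas", "ana"])]
    scores = pmi_scores(docs)
    print(scores[("izzayak", "EG")])  # 1.0
    print(scores[("ana", "EG")])      # 0.0
    print(most_associated_country("labas", scores))  # MA
```

Word-country pairs that never co-occur are simply absent from the dictionary rather than scored as negative infinity. A word whose PMI is near zero for every country, like the toy "ana", is a poor dialect marker.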
Arabic Automatic Speech Recognition: A Systematic Literature Review
by Dhouib, Amira; Khribi, Mohamed Koutheair; El Ghoul, Oussama
in Acknowledgment, Arabic language, Arabic language processing
2022
Automatic Speech Recognition (ASR), also known as Speech-To-Text (STT) or computer speech recognition, has recently been an active field of research. This study charts the field through a Systematic Literature Review (SLR) of proposed ASR studies, especially those for the Arabic language. The purpose is to highlight research trends in Arabic ASR and to point researchers to the most significant studies published over the ten years from 2011 to 2021. The SLR addresses seven research questions concerning:
- the toolkits used to develop and evaluate Arabic ASR
- the variety of Arabic supported
- the feature extraction and classification techniques used
- the type of speech recognition
- the performance of Arabic ASR
- the gaps facing researchers
- directions for future research

Across five databases, 38 studies met the defined inclusion criteria. The results identified several open-source toolkits supporting Arabic speech recognition, the most prominent being KALDI, followed by HTK and CMU Sphinx. Of the retained studies, 89.47% cover Modern Standard Arabic, whereas 26.32% are dedicated to various Arabic dialects, so some studies cover both. MFCC and HMM were the most widely used feature extraction and classification techniques, respectively: 63% of the papers were based on MFCC and 21% on HMM. The review also shows that the performance of Arabic ASR systems depends mainly on the availability of resources, the techniques used for acoustic modeling, and the datasets used.
Journal Article
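The Dhouib et al. review above reports its coverage figures as percentages of the 38 retained studies. This short check assumes that both percentages are shares of those same 38 studies. It recovers the underlying counts and uses inclusion-exclusion to show why the two figures can sum to more than 100%. The helper names `as_percent` and `count_for_percent` are illustrative only.

```python
def as_percent(count: int, total: int) -> float:
    """Share of `total` as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


def count_for_percent(pct: float, total: int) -> int:
    """Recover the unique integer count whose rounded share of `total` is `pct`."""
    matches = [k for k in range(total + 1) if as_percent(k, total) == pct]
    if len(matches) != 1:
        raise ValueError(f"{pct}% does not identify a unique count out of {total}")
    return matches[0]


if __name__ == "__main__":
    total = 38                                   # retained studies
    msa = count_for_percent(89.47, total)        # studies covering MSA
    dialects = count_for_percent(26.32, total)   # studies covering dialects
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B| >= |A| + |B| - total.
    min_both = max(0, msa + dialects - total)
    print(msa, dialects, min_both)  # 34 10 6
```

Under that assumption, the two figures correspond to 34 and 10 studies, so at least 6 of the retained studies must cover both Modern Standard Arabic and a dialect.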
Recognizing two dialects in one written form: A Stroop study
by van Heuven, Vincent J.; Schiller, Niels O.; Wu, Junru
in Asian cultural groups, Asian History, Bilingualism
2024
This study examines the influence of dialectal experience on logographic visual word recognition. Two groups of Chinese monolectals and three groups of Chinese bi-dialectals performed Stroop color naming in Standard Chinese (SC), and two of the bi-dialectal groups also did so in their regional dialects. The participant groups differed in dialectal experience. The ink-character relation was manipulated separately in semantics, segments, and tones as congruent, competing, or different, yielding ten Stroop conditions for comparison. All groups showed Stroop interference in the segmental-competition conditions, as well as evidence of semantic activation by the characters. Bi-dialectal experience, even if only receptive, could benefit conflict resolution in the Stroop task. Chinese characters can automatically activate words in both dialects. Nevertheless, comparing naming in SC with naming in the bi-dialectals’ regional dialects revealed a regional-dialect disadvantage. This suggests that the activation is biased by literacy and by lexico-specific inter-dialectal relations.
Journal Article