Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
330 result(s) for "N-Gram language models"
Hate speech detection and racial bias mitigation in social media based on BERT model
by Crespi, Noël; Farahbakhsh, Reza; Mozafari, Marzieh
in Abuse; African American English; African Americans
2020
Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have raised many concerns recently. Although the problem of biased datasets in abusive language detection has been addressed more frequently, biases arising from trained classifiers have not yet been a matter of concern. In this paper, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers), and evaluate the proposed model on two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. Next, we introduce a bias alleviation mechanism to mitigate the effect of bias in the training set during the fine-tuning of our pre-trained BERT-based model for hate speech detection. Toward that end, we use an existing regularization method to re-weight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model on the re-weighted samples. To evaluate our bias alleviation mechanism, we employ a cross-domain approach in which we use the trained classifiers on the aforementioned datasets to predict the labels of two new datasets from Twitter, the AAE-aligned and White-aligned groups, which contain tweets written in African-American English (AAE) and Standard American English (SAE), respectively. The results show the existence of systematic racial bias in trained classifiers: they tend to assign tweets written in AAE from the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE from the White-aligned group. However, the racial bias in our classifiers is reduced significantly after our bias alleviation mechanism is incorporated. This work could constitute the first step towards debiasing hate speech and abusive language detection systems.
Journal Article
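The reweighting step described in the abstract above can be sketched in miniature. This is a hypothetical illustration, not the paper's method: the paper applies an existing regularization method during BERT fine-tuning, whereas here "association" is just the absolute difference of per-class relative n-gram frequencies, and all function names are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sample_weights(texts, labels, n=2, top_k=50):
    """Downweight samples containing the n-grams most skewed toward one label.

    Simplified stand-in for bias-aware reweighting: score each n-gram by the
    absolute difference of its relative frequency in the two classes, take the
    top_k most skewed, and shrink the weight of samples containing them.
    """
    pos, neg = Counter(), Counter()
    for text, y in zip(texts, labels):
        grams = set(ngrams(text.split(), n))       # presence, not raw counts
        (pos if y == 1 else neg).update(grams)
    n_pos = max(1, sum(1 for y in labels if y == 1))
    n_neg = max(1, sum(1 for y in labels if y == 0))
    skew = {g: abs(pos[g] / n_pos - neg[g] / n_neg)
            for g in set(pos) | set(neg)}
    biased = set(sorted(skew, key=skew.get, reverse=True)[:top_k])
    weights = []
    for text in texts:
        hits = len(set(ngrams(text.split(), n)) & biased)
        weights.append(1.0 / (1.0 + hits))         # more biased n-grams -> lower weight
    return weights
```

The resulting per-sample weights would then be passed to the loss function during fine-tuning.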
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
2020
We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs—that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate specific phenomena in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and aggregate human agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models reliably identify morphological contrasts related to agreement, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands.
Journal Article
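The BLiMP evaluation protocol (an LM "passes" a minimal pair when it assigns higher probability to the acceptable sentence) can be illustrated with a toy add-one-smoothed bigram LM. A minimal sketch, not the paper's code; class and function names are hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Tiny add-one-smoothed bigram language model trained on a list of sentences."""
    def __init__(self, corpus):
        self.uni, self.bi = Counter(), Counter()
        for sent in corpus:
            toks = ["<s>"] + sent.split() + ["</s>"]
            self.uni.update(toks)
            self.bi.update(zip(toks, toks[1:]))
        self.vocab = len(self.uni)

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((self.bi[(a, b)] + 1) / (self.uni[a] + self.vocab))
                   for a, b in zip(toks, toks[1:]))

def accuracy(lm, pairs):
    """Fraction of (acceptable, unacceptable) pairs where the LM prefers the acceptable one."""
    return sum(lm.logprob(good) > lm.logprob(bad) for good, bad in pairs) / len(pairs)
```

BLiMP applies exactly this forced-choice comparison, just with much stronger models (LSTMs, GPT-2, Transformer-XL) over 67,000 pairs.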
Comparison of text preprocessing methods
2023
Text preprocessing is not only an essential step to prepare the corpus for modeling but also a key area that directly affects the natural language processing (NLP) application results. For instance, precise tokenization increases the accuracy of part-of-speech (POS) tagging, and retaining multiword expressions improves reasoning and machine translation. The text corpus needs to be appropriately preprocessed before it is ready to serve as the input to computer models. The preprocessing requirements depend on both the nature of the corpus and the NLP application itself, that is, what researchers would like to achieve from analyzing the data. Conventional text preprocessing practices generally suffice, but there exist situations where the text preprocessing needs to be customized for better analysis results. Hence, we discuss the pros and cons of several common text preprocessing methods: removing formatting, tokenization, text normalization, handling punctuation, removing stopwords, stemming and lemmatization, n-gramming, and identifying multiword expressions. Then, we provide examples of text datasets which require special preprocessing and how previous researchers handled the challenge. We expect this article to be a starting guideline on how to select and fine-tune text preprocessing methods.
Journal Article
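The preprocessing steps surveyed above (normalization, punctuation handling, tokenization, stopword removal, n-gramming) chain together naturally. A minimal sketch under the article's framing; the toy stopword list and function name are assumptions, and real projects tune each step to the corpus and downstream task.

```python
import re
import string

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}  # toy list

def preprocess(text, n=2, remove_stopwords=True):
    """Normalize, strip punctuation, tokenize, optionally drop stopwords, then n-gram."""
    text = text.lower()                                          # normalization
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.findall(r"\S+", text)                            # whitespace tokenization
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens, grams
```

As the article notes, each step is a trade-off: stopword removal here destroys the bigram "on the", which a multiword-expression-aware pipeline might want to keep.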
How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN
by Linzen, Tal; McCoy, R. Thomas; Celikyilmaz, Asli
in Analogy (Language change); Automatic text generation; Copying
2023
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure—e.g., individual dependencies—text generated with a standard sampling scheme is substantially less novel than our baseline of human-generated text from each model’s test set. For larger-scale structure—e.g., overall sentence structure—model-generated text is as novel or even more novel than the human-generated baseline, but models still sometimes copy substantially, in some cases duplicating passages over 1,000 words long from the training set. We also perform extensive manual analysis, finding evidence that GPT-2 uses both compositional and analogical generalization mechanisms and showing that GPT-2’s novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).
Journal Article
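The core of RAVEN's sequential-structure analysis is asking, for each n-gram of generated text, whether it ever appears in the training data. A simplified sketch (the helper name is hypothetical, and the real suite compares against a human baseline and scales to full training corpora):

```python
def novelty(generated, training, n):
    """Fraction of n-grams in the generated token list that never occur
    in the training token list; higher means more novel."""
    gen = [tuple(generated[i:i + n]) for i in range(len(generated) - n + 1)]
    train = {tuple(training[i:i + n]) for i in range(len(training) - n + 1)}
    if not gen:
        return 0.0
    return sum(g not in train for g in gen) / len(gen)
```

Running this for increasing n reproduces the paper's qualitative pattern: small-n novelty is low (local structure is heavily reused), while large-n novelty approaches 1.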
The Potential of Chatbots for Emotional Support and Promoting Mental Well-Being in Different Cultures: Mixed Methods Study
by Choi, Junghoi; Jung, Chani; Cha, Meeyoung
in Activities of daily living; Analysis; Artificial intelligence
2023
Artificial intelligence chatbot research has focused on technical advances in natural language processing and validating the effectiveness of human-machine conversations in specific settings. However, real-world chat data remain proprietary and unexplored despite their growing popularity, and new analyses of chatbot uses and their effects on mitigating negative moods are urgently needed. In this study, we investigated whether and how artificial intelligence chatbots facilitate the expression of user emotions, specifically sadness and depression. We also examined cultural differences in the expression of depressive moods among users in Western and Eastern countries. This study used SimSimi, a global open-domain social chatbot, to analyze 152,783 conversation utterances containing the terms "depress" and "sad" in 3 Western countries (Canada, the United Kingdom, and the United States) and 5 Eastern countries (Indonesia, India, Malaysia, the Philippines, and Thailand). Study 1 reports new findings on the cultural differences in how people talk about depression and sadness to chatbots based on Linguistic Inquiry and Word Count and n-gram analyses. In study 2, we classified chat conversations into predefined topics using semisupervised classification techniques to better understand the types of depressive moods prevalent in chats. We then identified the distinguishing features of chat-based depressive discourse data and the disparity between Eastern and Western users. Our data revealed intriguing cultural differences. Chatbot users in Eastern countries indicated stronger emotions about depression than users in Western countries (positive: P<.001; negative: P=.01); for example, Eastern users used more words associated with sadness (P=.01). However, Western users were more likely to share vulnerable topics such as mental health (P<.001), and this group also had a greater tendency to discuss sensitive topics such as swear words (P<.001) and death (P<.001).
In addition, when talking to chatbots, people expressed their depressive moods differently than on other platforms. Users were more open to expressing emotional vulnerability related to depressive or sad moods to chatbots (74,045/148,590, 49.83%) than on social media (149/1978, 7.53%). Chatbot conversations tended not to broach topics that require social support from others, such as seeking advice on daily life difficulties, unlike on social media. However, chatbot users acted in anticipation of conversational agents that exhibit active listening skills and foster a safe space where they can openly share emotional states such as sadness or depression. The findings highlight the potential of chatbot-assisted mental health support, emphasizing the importance of continued technical and policy-wise efforts to improve chatbot interactions for those in need of emotional assistance. Our data indicate the possibility of chatbots providing helpful information about depressive moods, especially for users who have difficulty communicating emotions to other humans.
Journal Article
Research-paper recommender systems: a literature survey
2016
In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation approaches applied content-based filtering (55 %). Collaborative filtering was applied by only 18 % of the reviewed approaches, and graph-based recommendations by 16 %. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the most frequently applied weighting scheme. In addition to simple terms, n-grams, topics, and citations were utilized to model users’ information needs. Our review revealed some shortcomings of the current research. First, it remains unclear which recommendation concepts and approaches are the most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it performed worse. We identified three potential reasons for the ambiguity of the results. (A) Several evaluations had limitations. They were based on strongly pruned datasets, few participants in user studies, or did not use appropriate baselines. (B) Some authors provided little information about their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendation approaches, which might lead to variations in the results. (C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches. Hence, finding the most promising approaches is a challenge.
As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction. In addition, most approaches (81 %) neglected the user-modeling process and did not infer information automatically but let users provide keywords, text snippets, or a single paper as input. Information on runtime was provided for 10 % of the approaches. Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73 % of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups. We concluded that several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
Journal Article
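TF-IDF, which the survey found to be the most frequently applied weighting scheme in content-based approaches, is small enough to show in full. A minimal sketch (the function name and document representation are assumptions, not the survey's code):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a corpus given as lists of terms.

    Returns one {term: weight} dict per document: term frequency within
    the document times the log of inverse document frequency across the
    corpus, so terms appearing in every document weigh zero.
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))                  # document frequency: once per doc
    n = len(docs)
    return [{t: tf[t] / len(doc) * math.log(n / df[t]) for t in tf}
            for doc in docs
            for tf in [Counter(doc)]]
```

In a paper recommender, these per-document weight vectors would be compared (e.g., by cosine similarity) between a user profile and candidate papers.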
Lexical Sophistication as a Multidimensional Phenomenon: Relations to Second Language Lexical Proficiency, Development, and Writing Quality
by KIM, MINKYUNG; CROSSLEY, SCOTT A.; KYLE, KRISTOPHER
in Competence; Components; corpus linguistics
2018
This study conceptualizes lexical sophistication as a multidimensional phenomenon by reducing numerous lexical features of lexical sophistication into 12 aggregated components (i.e., dimensions) via a principal component analysis approach. These components were then used to predict second language (L2) writing proficiency levels, holistic lexical proficiency scores, and longitudinal lexical growth. The results from regression analyses indicated that 5 lexical components (i.e., bigram and trigram strength of directional association, content word properties, bigram mutual information, bigram and trigram proportions, and word specificity) explained 16.1% and 31.0% of the variance of L2 writing proficiency and lexical proficiency, respectively. Two additional components (i.e., word acquisition properties and content word frequency) explained an additional 8.5% of the variance of L2 writing proficiency. Six lexical components (i.e., bigram and trigram proportions, word acquisition properties, content word frequency, bigram frequency and range, content word properties, and function word frequency and range) showed significant developmental trends in L2 beginning learners over a year-long period. These findings provide information about the multidimensional nature of lexical sophistication by expanding its scope beyond frequency and toward other primary dimensions that include various lexical and phrasal features such as concreteness, orthographic density, hypernymy, and n-gram frequency and association strength.
Journal Article
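Bigram mutual information, one of the association-strength measures underlying the study's lexical components, has a compact definition: how much more often a word pair co-occurs than its parts' independent frequencies predict. A sketch computed over a single token stream (the study itself draws these norms from large reference corpora; the function name is hypothetical):

```python
import math
from collections import Counter

def bigram_mi(tokens):
    """Pointwise mutual information for each adjacent bigram:
    log of observed bigram probability over the product of the
    two unigram probabilities. Positive values mean the pair
    co-occurs more than chance predicts."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), max(1, len(tokens) - 1)
    return {(a, b): math.log((c / n_bi) /
                             ((uni[a] / n_uni) * (uni[b] / n_uni)))
            for (a, b), c in bi.items()}
```

In the L2 writing context, higher average MI of a learner's bigrams signals more conventionalized, collocation-like phrasing.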
Metaverse through the prism of power and addiction: what will happen when the virtual world becomes more attractive than reality?
2022
New technologies are emerging at a fast pace without being properly analyzed in terms of their social impact or adequately regulated by societies. One of the biggest potentially disruptive technologies for the future is the metaverse, or the new Internet, which is being developed by leading tech companies. The idea is to create a virtual reality universe that would allow people to meet, socialize, work, play, entertain, and create.
Methods coming from future studies are used to analyze expectations and narrative building around the metaverse. Additionally, it is examined how the metaverse could shape the future relations of power and levels of media addiction in society.
Hype and disappointment dynamics created after the video presentation of Meta’s CEO Mark Zuckerberg have been found to affect the present, especially in terms of certainty and designability. This idea is supported by a variety of data, including search engine n-grams, trends in the diffusion of NFT technology, indications of investment interest, stock value statistics, and so on. It has been found that discourse in the mentioned presentation of the metaverse contains elements of optimism, epochalism, and inventibility, which corresponds to the concept of future essentialism.
On the other hand, power relations in society, inquired through the prism of classical theorists, indicate that current trends in the concentration of power among Big Tech could expand even more if the metaverse becomes mainstream. Technology deployed by the metaverse may create an attractive environment that would mimic direct reality and further stimulate media addiction in society.
It is proposed that future inquiries examine how virtual reality affects the psychology of individuals and groups, their creative capacity, and imagination. Also, virtual identity as a human right and recommender systems as a public good need to be considered in future theoretical and empirical endeavors.
Journal Article
Detecting AI-generated essays: the ChatGPT challenge
2023
Purpose: With the advent of ChatGPT, a sophisticated generative artificial intelligence (AI) tool, maintaining academic integrity in all educational settings has recently become a challenge for educators. This paper discusses a method and necessary strategies to confront this challenge.
Design/methodology/approach: In this study, a language model was defined to achieve high accuracy in distinguishing ChatGPT-generated essays from human-written essays, with a particular focus on “not falsely” classifying genuinely human-written essays as AI-generated (Negative).
Findings: Via a support vector machine (SVM) algorithm, 100% accuracy was recorded for identifying human-generated essays. The author discussed the key use of Recall and the F2 score for measuring classification performance and the importance of eliminating False Negatives, making sure that no actual human-generated essays are incorrectly classified as AI-generated. The results of the proposed model's classification algorithms were compared to those of AI-generated text detection software developed by OpenAI, GPTZero and Copyleaks.
Practical implications: AI-generated essays submitted by students can be detected by teachers and educational designers using the proposed language model and machine learning (ML) classifier at high accuracy. Human (student)-generated essays can and must be correctly identified with 100% accuracy even if the overall classification accuracy performance is slightly reduced.
Originality/value: This is the first and only study that used an n-gram bag-of-words (BOWs) discrepancy language model as input for a classifier to make such a prediction and compared the classification results with other AI-generated text detection software in an empirical way.
Journal Article
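The front half of the paper's pipeline, extracting n-gram bag-of-words features from an essay, can be sketched with the standard library. Note a deliberate swap: the paper trains an SVM on these features, while this dependency-free sketch substitutes a nearest-centroid classifier over the same representation; every name here is hypothetical.

```python
from collections import Counter

def bow(text, n=1):
    """Lowercased word n-gram bag-of-words feature counter."""
    toks = text.lower().split()
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))

def train_centroids(texts, labels, n=1):
    """One summed feature vector (centroid) per class label."""
    cents = {}
    for text, y in zip(texts, labels):
        cents.setdefault(y, Counter()).update(bow(text, n))
    return cents

def predict(cents, text, n=1):
    """Assign the label whose centroid overlaps the text's features most."""
    feats = bow(text, n)
    return max(cents, key=lambda y: sum(feats[k] * cents[y][k] for k in feats))
```

With a real dataset one would replace the centroid step with an SVM and, as the paper stresses, tune the decision threshold to eliminate false negatives on human-written essays.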
Evaluating LLMs’ grammatical error correction performance in learner Chinese
2024
Large language models (LLMs) have recently exhibited significant capabilities in various English NLP tasks. However, their performance in Chinese grammatical error correction (CGEC) remains unexplored. This study evaluates the abilities of state-of-the-art LLMs in correcting learner Chinese errors from a corpus-linguistic perspective. The performance of LLMs is assessed using the standard MaxMatch evaluation metric. Keyword and key n-gram analyses are conducted to quantitatively explore linguistic features that differentiate LLM outputs from those of human annotators. LLMs’ performance in syntactic and semantic dimensions is further qualitatively analyzed based on these keyword and key n-gram probes. Results show that LLMs achieve relatively high performance on test datasets with multiple annotators and low performance on those with a single annotator. LLMs tend to overcorrect wrong sentences, even under the explicit prompt of the “minimal edit” strategy, by using more linguistic devices to generate fluent and grammatical sentences. Furthermore, they struggle with under-correction and hallucination in reasoning-dependent situations. These findings highlight the strengths and limitations of LLMs in CGEC, suggesting that future efforts should focus on refining overcorrection tendencies and improving the handling of complex semantic contexts.
Journal Article
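A key n-gram analysis of the kind described above boils down to ranking n-grams overused in one corpus (e.g., LLM corrections) relative to another (e.g., human annotations). A minimal stand-in using a smoothed relative-frequency ratio rather than the keyness statistics such studies typically use; the function name and smoothing constant are assumptions.

```python
from collections import Counter

def key_ngrams(corpus_a, corpus_b, n=2, top=5):
    """Rank n-grams overused in corpus_a relative to corpus_b.

    Each corpus is a list of sentences. Score = smoothed relative
    frequency in corpus_a divided by smoothed relative frequency in
    corpus_b; the highest-ratio n-grams are corpus_a's 'key' n-grams.
    """
    def counts(corpus):
        c = Counter()
        for sent in corpus:
            toks = sent.split()
            c.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        return c
    ca, cb = counts(corpus_a), counts(corpus_b)
    na, nb = max(1, sum(ca.values())), max(1, sum(cb.values()))
    ratio = {g: ((ca[g] + 0.5) / na) / ((cb[g] + 0.5) / nb) for g in ca}
    return sorted(ratio, key=ratio.get, reverse=True)[:top]
```

Applied to LLM versus annotator corrections, the top-ranked n-grams surface the extra "linguistic devices" behind the overcorrection tendency the study reports.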