Catalogue Search | MBRL

Automatic detection of cyberbullying in social media text

by Van Hee, Cynthia , Desmet, Bart , Lefever, Els in Annotations , Artificial intelligence , Bullying

2018

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

Journal Article

SENTiVENT

by Hoste, Véronique , Jacobs, Gilles in Annotations , Companies , Computational Linguistics

2022

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE) so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F₁-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.

Journal Article

Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text?

by Hoste, Véronique , Jacobs, Gilles , Van Hee, Cynthia in Aggressiveness , Annotations , Automation

2022

Successful prevention of cyberbullying depends on the adequate detection of harmful messages. Given the impossibility of human moderation on the Social Web, intelligent systems are required to identify clues of cyberbullying automatically. Much work on cyberbullying detection focuses on detecting abusive language without analyzing the severity of the event or the participants involved. Automatic analysis of participant roles in cyberbullying traces enables targeted bullying prevention strategies. In this paper, we aim to automatically detect different participant roles involved in textual cyberbullying traces, including bullies, victims, and bystanders. We describe the construction of two cyberbullying corpora (one Dutch and one English) that were both manually annotated with bullying types and participant roles, and we perform a series of multiclass classification experiments to determine the feasibility of text-based cyberbullying participant role detection. The representative datasets present a data imbalance problem for which we investigate feature filtering and data resampling as skew mitigation techniques. We investigate the performance of feature-engineered single and ensemble classifier setups as well as transformer-based pretrained language models (PLMs). Cross-validation experiments revealed promising results for the detection of cyberbullying roles using PLM fine-tuning techniques, with the best classifier for English (RoBERTa) yielding a macro-averaged F₁-score of 55.84%, and the best one for Dutch (RobBERT) yielding an F₁-score of 56.73%. Experiment replication data and source code are available at https://osf.io/nb2r3.

Journal Article

Fine-Grained Implicit Sentiment in Financial News: Uncovering Hidden Bulls and Bears

by Hoste, Véronique , Jacobs, Gilles in Annotations , Attitudes , Classification

2021

The field of sentiment analysis is currently dominated by the detection of attitudes in lexically explicit texts such as user reviews and social media posts. In objective text genres such as economic news, indirect expressions of sentiment are common. Here, a positive or negative attitude toward an entity must be inferred from connotational or real-world knowledge. To capture all expressions of subjectivity, a need exists for fine-grained resources and approaches for implicit sentiment analysis. We present the SENTiVENT corpus of English business news that contains token-level annotations for target spans, polar spans, and implicit polarity (positive, negative, or neutral investor sentiment, respectively). We both directly annotate polar expressions and induce them from existing schema-based event annotations to obtain event-implied implicit sentiment tuples. This results in a large dataset of 12,400 sentiment–target tuples in 288 fully annotated articles. We validate the created resource with an inter-annotator agreement study and a series of coarse- to fine-grained supervised deep-representation-learning experiments. Agreement scores show that our annotations are of substantial quality. The coarse-grained experiments involve classifying the positive, negative, and neutral polarity of known polar expressions and, in clause-based experiments, the detection of positive, negative, neutral, and no-polarity clauses. The gold coarse-grained experiments obtain decent performance (76% accuracy and 63% macro-F1) and clause-based detection shows decreased performance (65% accuracy and 57% macro-F1) with the confusion of neutral and no-polarity. The coarse-grained results demonstrate the feasibility of implicit polarity classification as operationalized in our dataset. In the fine-grained experiments, we apply the grid tagging scheme unified model for triplet extraction, which obtains state-of-the-art performance on explicit sentiment in user reviews. We observe a drop in performance on our implicit sentiment corpus compared to the explicit benchmark (22% vs. 76% F1). We find that the current models for explicit sentiment are not directly portable to our implicit task: the larger lexical variety within implicit opinion expressions causes lexical data scarcity. We identify common errors and discuss several recommendations for implicit fine-grained sentiment analysis. Data and source code are available.

Journal Article

Current limitations in cyberbullying detection

by Van Hee, Cynthia , Verhoeven, Ben , Lefever, Els in Affordability , Bullying , Classifiers

2021

The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

Journal Article

Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

by Verhoeven, Ben , Lefever, Els , Desmet, Bart in Bullying , Classifiers , Computer simulation

2019

The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

Paper

Automatic Detection of Cyberbullying in Social Media Text

by Desmet, Bart , Lefever, Els , Verhoeven, Ben in Annotations , Bullying , Cyberbullying

2018

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a training corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most to this particular task. Experiments on a holdout test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1-score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems based on keywords and word unigrams.

Paper

Mirativity and rhetorical structure

by Kimps, Ditte , Davidse, Kristin , Gentens, Caroline

2016

Book Chapter

Dysregulation of kynurenine metabolism is related to proinflammatory cytokines, attention, and prefrontal cortex volume in schizophrenia

by Weickert, Cynthia Shannon , Lim, Chai K , Weickert, Thomas W in Astrocytes , Brain , Cognition

2020

The kynurenine pathway (KP) of tryptophan (TRP) catabolism links immune system activation with neurotransmitter signaling. The KP metabolite kynurenic acid (KYNA) is increased in the brains of people with schizophrenia. We tested the extent to which: (1) brain KP enzyme mRNAs, (2) brain KP metabolites, and (3) plasma KP metabolites differed on the basis of elevated cytokines in schizophrenia vs. control groups and the extent to which plasma KP metabolites were associated with cognition and brain volume in patients displaying elevated peripheral cytokines. KP enzyme mRNAs and metabolites were assayed in two independent postmortem brain samples from a total of 71 patients with schizophrenia and 72 controls. Plasma KP metabolites, cognition, and brain volumes were measured in an independent cohort of 96 patients with schizophrenia and 81 healthy controls. Groups were stratified based on elevated vs. normal proinflammatory cytokine mRNA levels. In the prefrontal cortex (PFC), kynurenine (KYN)/TRP ratio, KYNA levels, and mRNA for enzymes, tryptophan dioxygenase (TDO) and kynurenine aminotransferases (KATI/II), were significantly increased in the high cytokine schizophrenia subgroup. KAT mRNAs significantly correlated with mRNA for glial fibrillary acidic protein in patients. In plasma, the high cytokine schizophrenia subgroup displayed an elevated KYN/TRP ratio, which correlated inversely with attention and dorsolateral prefrontal cortex (DLPFC) volume. This study provides further evidence for the role of inflammation in a subgroup of patients with schizophrenia and suggests a molecular mechanism through which inflammation could lead to schizophrenia. Proinflammatory cytokines may elicit conversion of TRP to KYN in the periphery and increase the N-methyl-d-aspartate receptor antagonist KYNA via increased KAT mRNA and possibly more enzyme synthesis activity in brain astrocytes, leading to DLPFC volume loss, and attention impairment in schizophrenia.

Journal Article

Effects of stress associated with academic examination on the kynurenine pathway profile in healthy students

by Myint, Kyaimon , Hoe, See Ziau , Lam, Sau Kuen in Adaptation , Astrocytes , Bioinformatics

2021

The effects of stress on the neuroendocrine, central nervous and immune systems are extremely complex. The kynurenine pathway (KP) of tryptophan metabolism is recognised as a cross-link between the neuroendocrine and immune systems. However, the effects of acute stress from everyday life on KP activation have not yet been studied. This study aims to investigate changes in the levels of the KP neuroactive metabolites and cytokines in response to stress triggered by academic examinations. Ninety-two healthy first-year medical students benevolently participated in the study. Parameters were measured pre-examination, which is considered to be a high-stress period, and post-examination, as a low-stress period. Stress induced by academic examinations significantly increased the perceived stress scores (p < 0.001), serum cortisol levels (p < 0.001) and brain-derived neurotrophic factor (BDNF) levels (p < 0.01). It decreased IL-10 levels (p < 0.05) but had no effect on IL-6 and TNF-alpha levels. Only the KP neuroactive metabolite, 3-hydroxykynurenine (3-HK), significantly increased (p < 0.01) in the post-examination period. In addition, the stress scores positively correlated with the levels of cortisol (r² = 0.297, p < 0.01) at post-examination. Acute stress triggered by academic examinations increases cortisol and BDNF production and suppresses the anti-inflammatory cytokine, IL-10, but does not significantly increase the levels of other pro-inflammatory cytokines, tryptophan, kynurenine and downstream KP metabolites. The concomitant increased levels of BDNF under the duress of acute examination stress appear to limit the levels of pro-inflammatory markers, which may attenuate the action of cortisol and the neuroinflammatory branch of the KP.

Journal Article
