Detection of Cyberbullying Patterns in Low Resource Colloquial Roman Urdu Microtext using Natural Language Processing, Machine Learning, and Ensemble Techniques

by Alshahrani, Hani; Memon, Mohsin Ali; Dewani, Amirita; Bhatti, Sania; Sulaiman, Adel; Alghamdi, Abdullah; Shaikh, Asadullah; Hamdi, Mohammed

in Artificial intelligence / Automation / Big Data / Classification / Communication / Computational linguistics / Content analysis / Coronaviruses / COVID-19 / Cyberbullying / cyberbullying detection / Data analysis / Data mining / ensemble learning / Hate speech / Language / Language processing / low resource Roman Urdu language / Machine learning / Medical research / Methods / Multilingualism / Natural language interfaces / Natural language processing / Pandemics / Performance evaluation / Social media / Social networks / Urdu language

2023

Journal Article

Overview

Social media platforms have become a foundation for people across the globe to voice their opinions and ideas. Because these platforms preserve anonymity and freedom of expression, users can humiliate individuals and groups while disregarding social etiquette, which multiplies and diversifies incidents of cyberbullying and cyber hate speech. The problem has recently drawn the attention of researchers and scholars worldwide, yet current practices for sifting online content and countering the spread of hatred do not go far enough. Contributing factors include the recent prevalence of regional languages on social media and the dearth of language resources and flexible detection approaches, specifically for low-resource languages. Most existing studies target traditional resource-rich languages, leaving a large gap for recently embraced resource-poor languages. One such language, now adopted worldwide and most typically by South Asian users for textual communication on social networks, is Roman Urdu. It is derived from Urdu and written left to right in Roman script. Its inflections, derivations, lexical variations, and morphological richness pose numerous computational challenges for natural language preprocessing tasks.

To address this problem, this research proposes a cyberbullying detection approach for textual data in Roman Urdu, based on advanced preprocessing methods, voting-based ensemble techniques, and machine learning algorithms. The study extracted a large number of features, including statistical features, word n-grams, combined n-grams, and a bag-of-words (BoW) model with TF-IDF weighting, in different experimental settings using GridSearchCV and cross-validation. The detection approach is designed to handle users' textual input by accounting for the colloquial, non-standard, user-specific writing styles found on social media.

The experimental results show that SVM with embedded hybrid n-gram features produced the highest average accuracy, around 83%. Among the voting-based ensemble techniques, XGBoost achieved the best accuracy, 79%. Both implicit and explicit Roman Urdu instances were evaluated, and severity was categorized based on prediction probabilities. Time complexity was also analyzed in terms of execution time, indicating that LR, across different parameters and feature combinations, is the fastest algorithm. The results are promising with respect to standard assessment metrics and indicate the feasibility of the proposed approach for cyberbullying detection in Roman Urdu.
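The abstract names combined word n-grams with TF-IDF weighting, voting-based ensembles, and severity categorization from prediction probabilities, but this record gives no implementation details. The following is a minimal standard-library Python sketch of those three ideas, not the authors' method. The function names, the smoothed IDF formula, the averaging rule, the severity thresholds, and the toy Roman Urdu strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max (a 'combined' n-gram set)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a {ngram: weight} dict using TF-IDF weighting.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1. The paper's exact
    weighting scheme is not stated in the abstract.
    """
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, k in c.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0
            vec[gram] = (k / total) * idf
        vectors.append(vec)
    return vectors


def soft_vote(probabilities):
    """Average the positive-class probabilities from several classifiers."""
    return sum(probabilities) / len(probabilities)


def severity(probability, low=0.5, high=0.8):
    """Bucket a bullying probability; both thresholds are hypothetical."""
    if probability >= high:
        return "high"
    if probability >= low:
        return "medium"
    return "none"


# Toy, neutral Roman Urdu strings used only to exercise the functions.
docs = ["tum bohat achay ho", "tum bohat buray ho"]
vectors = tfidf_vectors(docs)

# Hypothetical per-classifier probabilities for a single message.
combined = soft_vote([0.9, 0.6, 0.75])
label = severity(combined)
```

In practice the feature dictionaries would be passed to trained classifiers (the abstract mentions SVM, LR, and XGBoost), and the n-gram ranges and classifier parameters would be tuned with a grid search under cross-validation.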

Publisher

MDPI AG

Subject

Artificial intelligence / Computational linguistics / Content analysis