Catalogue Search | MBRL

Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey

by Lashkari, Arash Habibi , Vombatkere, Nikhill , He, Xie in author profiling , Authorship , authorship attribution

2024

Over the past few decades, researchers have put their effort and paid significant attention to the authorship attribution field, as it plays an important role in software forensics analysis, plagiarism detection, security attack detection, and protection of trade secrets, patent claims, copyright infringement, or cases of software theft. It helps new researchers understand the state-of-the-art works on authorship attribution methods, identify and examine the emerging methods for authorship attribution, and discuss their key concepts, associated challenges, and potential future work that could help newcomers in this field. This paper comprehensively surveys authorship attribution methods and their key classifications, used feature types, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss the potential future research directions of the authorship attribution field based on the insights and lessons learned from this survey work.

Journal Article

Assessment of LSTM, ARABERT and Prompt-Based Learning for Gender Author Profiling in Modern Standard Arabic Language

by Khoudja, Asmaa Mansour , Belkredim, Fatma Zohra , Loukam, Mourad in Accuracy , Arabic language , Deep learning

2024

Author Profiling aims to extract persons’ characteristics (gender, age…) from their writings. This emerging field of NLP poses great challenges for all languages in general and, in particular, for the Modern Standard Arabic Language. This paper presents an assessment study of three state-of-the-art approaches used for gender author profiling, namely, LSTM, ARABERT, and Prompt-Based learning. Using a rich dataset created for this task, our research investigates the effectiveness of these methods in gender identification. Our findings indicate that the ARABERT method obtained the highest scores in terms of accuracy, ranging from 84.6% to 92.4%, and Prompt-Based learning performed competitively compared to ARABERT, with accuracy increasing from 84% to 92.3%. However, while LSTM also showed progress across all batches, it still consistently performed worse than the other two models and reached an accuracy of only 78.5%.

Journal Article

Fake News Spreaders Detection: Sometimes Attention Is Not All You Need

by Siino, Marco , La Cascia, Marco , Di Nuovo, Elisa in Algorithms , Artificial intelligence , Chi-square test

2022

Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection. First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques, such as chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN, MultiCNN, Bayes, SVM). The CNN tested outperforms all the models tested and, to the best of our knowledge, any existing approach on the same dataset. Fourth, to better understand this result, we conduct a post-hoc analysis as an attempt to investigate the behaviour of the presented best performing black-box model. This study highlights the importance of choosing a suitable classifier given the specific task. To make an educated decision, we propose the use of corpus linguistics techniques. Our results suggest that large pre-trained deep models like Transformers are not necessarily the first choice when addressing a text classification task as the one presented in this article. All the code developed to run our tests is publicly available on GitHub.

Journal Article

Politically-oriented information inference from text

by da Silva, Samuel Caetano , Paraboni, Ivandre in Computational linguistics , Inference , Language processing

2023

The inference of politically-oriented information from text data is a popular research topic in Natural Language Processing (NLP) at both text- and author-level. In recent years, studies of this kind have been implemented with the aid of text representations ranging from simple count-based models (e.g., bag-of-words) to sequence-based models built from transformers (e.g., BERT). Despite considerable success, however, we may still ask whether results may be improved further by combining these models with additional text representations. To shed light on this issue, the present work describes a series of experiments to compare a number of strategies for political bias and ideology inference from text data using sequence-based BERT models, syntax- and semantics-driven features, and examines which of these representations (or their combinations) improve overall model accuracy. Results suggest that one particular strategy - namely, the combination of BERT language models with syntactic dependencies - significantly outperforms well-known count- and sequence-based text classifiers alike. In particular, the combined model has been found to improve accuracy across all tasks under consideration, outperforming the SemEval hyperpartisan news detection top-performing system by up to 6%, and outperforming the use of BERT alone by up to 21%, making a potentially strong case for the use of heterogeneous text representations in the present tasks.

Journal Article

A transformer fine-tuning strategy for text dialect identification

by Alourani, Abdullah , Shuja, Junaid , Humayun, Mohammad Ali in Accuracy , Arabic language , Artificial Intelligence

2023

Online medical consultation can significantly improve the efficiency of primary health care. Recently, many online medical question–answer services have been developed that connect the patients with relevant medical consultants based on their questions. Considering the linguistic variety in their question, social background identification of patients can improve the referral system by selecting a medical consultant with a similar social origin for efficient communication. This paper has proposed a novel fine-tuning strategy for the pre-trained transformers to identify the social origin of text authors. When fused with the existing adapter model, the proposed methods achieve an overall accuracy of 53.96% for the Arabic dialect identification task on the Nuanced Arabic Dialect Identification (NADI) dataset. The overall accuracy is 0.54% higher than the previous best for the same dataset, which establishes the utility of custom fine-tuning strategies for pre-trained transformer models.

Journal Article

Author profiling from Romanized Urdu text using transfer learning models

by Khan, Sajid Ullah , khan, Muhammad Sohail , Ali, Abid in Accuracy , Artificial Intelligence , Classification

2025

This research concentrates on author profiling using transfer learning models for classifying age and gender. The investigation encompassed a diverse set of transfer learning techniques, including Roberta, BERT, ALBERT, Distil BERT, Distil Roberta, ELECTRA, and XLNet. Through meticulous evaluation using metrics such as the Matthews Correlation Coefficient, Accuracy, Precision, Recall, and F1 Score, the study examined the efficacy of these models. The curated dataset was divided for gender and age tasks, resulting in robust gender prediction with the XLNet model and age prediction with the BERT model. Notably, the XLNet model achieved the highest MCC (0.7946), Accuracy (0.8957), Precision (0.8992), Recall (0.8957), and F1 Score (0.8958) values in gender classification, while the BERT model excelled in age prediction with an MCC of (0.7338), Accuracy of (0.8220), Precision of (0.8324), Recall of (0.8220), and F1 Score of (0.8243). Visualized outcomes provide valuable insights into the model’s performance nuances, paving the way for their practical implementation. This research offers novel contributions to author profiling tasks, bridging the gap between theory and real-world applications.

Journal Article

Studying scientific migration in Scopus

by Moed, Henk F , Aisati, M’hamed , Plume, Andrew in Authorship , Bibliometrics , Co authorship

2013

An exploration is presented of Scopus as a data source for the study of international scientific migration or mobility for five study countries: Germany, Italy, the Netherlands, UK and USA. It is argued that Scopus author-affiliation linking and author profiling are valuable, crucial tools in the study of this phenomenon. It was found that the UK has the largest degree of outward international migration, followed by The Netherlands, and the USA the lowest. Language similarity between countries is a more important factor in international migration than it is in international co-authorship. During 1999–2010 the Netherlands showed a positive “migration balance” with the UK and a negative one with Germany, suggesting that in the Netherlands there were more Ph.D. students from Germany than there were from the UK, or that for Dutch post docs stage periods in the UK were more attractive than those in Germany. Comparison of bibliometric indicators with OECD statistics provided evidence that differences exist in the way the various study countries measured their number of researchers. The authors conclude that a bibliometric study of scientific migration using Scopus is feasible and provides significant outcomes. They make suggestions for further research.

Journal Article

A survey of machine learning-based author profiling from texts analysis in social networks

by Fkih, Fethi , Ouni, Sarra , Omri, Mohamed Nazih in Computer Communication Networks , Computer Science , Data Structures and Information Theory

2023

Recently, online social networks, such as Twitter, Facebook, LinkedIn, etc., have grown exponentially with a large amount of information. These social networks have huge volumes of data, especially in textual form, which are unstructured and anonymous. This type of data usually leads to cybercrimes like cyberbullying, cyberterrorism, etc. and their analysis has nowadays become a serious challenge. From this perspective and to remedy this topical issue, various techniques have been proposed in the literature. Among the proposed solutions, author profiling represents the newest and most adopted technique by most researchers to discover hidden textual information. The objective of this technique is to identify the demographic or psychological aspects (age, sex, personality, mother tongue, etc.) of an author by examining the text that he has published. In recent years, this area of research has attracted many researchers who seek solutions for potential applications in various fields like marketing, computer forensics, security, etc. Within the scope of this article, we describe the author profiling task. Then, we present a brief thematic taxonomy and an illustration of some profiling solutions from the literature. In particular, different machine and deep learning techniques are detailed and discussed. This work also provides an overview of the main approaches, which we have studied in the literature, highlights the weak points and the strong points of each of these approaches. At the end of this study, a discussion of some research questions is presented and some future directions to circumvent the weaknesses detected in the approaches studied are presented in order to motivate academics and practitioners, who are interested in this problem that we assume essential, to advance solutions for profiling perpetrators on social networks.

Journal Article

Multidimensional Author Profiling for Social Business Intelligence

by Aramburu, María José , Berlanga, Rafael , Lanza-Cruz, Indira in Business intelligence , Classifiers , Competitive intelligence

2024

This paper presents a novel author profiling method specially aimed at classifying social network users into the multidimensional perspectives for social business intelligence (SBI) applications. In this scenario, being the user profiles defined on demand for each particular SBI application, we cannot assume the existence of labelled datasets for training purposes. Thus, we propose an unsupervised method to obtain the required labelled datasets for training the profile classifiers. Contrary to other author profiling approaches in the literature, we only make use of the users’ descriptions, which are usually part of the metadata posts. We exhaustively evaluated the proposed method under four different tasks for multidimensional author profiling along with state-of-the-art text classifiers. We achieved performances around 88% and 98% of F1 score for a gold standard and a silver standard datasets respectively. Additionally, we compare our results to other supervised approaches previously proposed for two of our tasks, getting very close performances despite using an unsupervised method. To the best of our knowledge, this is the first method designed to label user profiles in an unsupervised way for training profile classifiers with a similar performance to fully supervised ones.

Journal Article

Prediction of Author’s Profile basing on Fine-Tuning BERT model

by Bsir, Bassem , Khoufi, Nabil , Zrigui, Mounir in Accuracy , Artificial neural networks , Datasets

2024

The task of author profiling consists in inferring the demographic features of social network users by studying their published content or the interactions between them. In the literature, many research works were conducted to enhance the accuracy of the techniques used in this process. In fact, the existing methods can be divided into two types: simple linear models and complex deep neural network models. Among them, the transformer-based model exhibited the highest efficiency in NLP analysis in several languages (English, German, French, Turkish, Arabic, etc.). Despite their good performance, these approaches do not cover author profiling analysis and, thus, should be further enhanced. So, we propose in this paper a new deep learning strategy by training a customized transformer model to learn the optimal features of our dataset. In this direction, we fine-tune the model by using the transfer learning approach to improve the results with random initialization. We have achieved about 79% accuracy by modifying the model to apply the retraining process using the PAN 2018 authorship dataset.

Journal Article
