Catalogue Search | MBRL
68 result(s) for "author profiling"
Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey
by Lashkari, Arash Habibi; Vombatkere, Nikhill; He, Xie
in author profiling, Authorship, authorship attribution
2024
Over the past few decades, researchers have paid significant attention to the field of authorship attribution, as it plays an important role in software forensics, plagiarism detection, security attack detection, protection of trade secrets, and cases involving patent claims, copyright infringement, or software theft. This survey aims to help newcomers understand state-of-the-art authorship attribution methods, identify and examine emerging methods, and grasp their key concepts, associated challenges, and potential future work. The paper comprehensively surveys authorship attribution methods and their key classifications, the feature types used, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss potential future research directions for authorship attribution based on the insights and lessons learned from this survey.
Journal Article
Assessment of LSTM, ARABERT and Prompt-Based Learning for Gender Author Profiling in Modern Standard Arabic Language
by Khoudja, Asmaa Mansour; Belkredim, Fatma Zohra; Loukam, Mourad
in Accuracy, Arabic language, Deep learning
2024
Author profiling aims to extract a person's characteristics (gender, age…) from their writing. This emerging field of NLP poses great challenges for all languages, and in particular for Modern Standard Arabic. This paper presents an assessment of three state-of-the-art approaches to gender author profiling: LSTM, ARABERT, and Prompt-Based learning. Using a rich dataset created for this task, we investigate the effectiveness of these methods for gender identification. ARABERT obtained the highest accuracy, ranging from 84.6% to 92.4%, and Prompt-Based learning performed competitively, with accuracy increasing from 84% to 92.3%. Although LSTM also improved across all batches, it consistently performed worse than the other two models, reaching an accuracy of only 78.5%.
Journal Article
Fake News Spreaders Detection: Sometimes Attention Is Not All You Need
by Siino, Marco; La Cascia, Marco; Di Nuovo, Elisa
in Algorithms, Artificial intelligence, Chi-square test
2022
Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, for the task of detecting Fake News Spreaders (i.e., users who share Fake News). First, we explore the reference multilingual dataset for this task using corpus linguistics techniques such as the chi-square test, keywords, and Word Sketch. Second, we perform experiments on several Natural Language Processing models. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN, MultiCNN, Bayes, SVM). The CNN we tested outperforms all the other models tested and, to the best of our knowledge, any existing approach on the same dataset. Fourth, to better understand this result, we conduct a post-hoc analysis to investigate the behaviour of this best-performing black-box model. This study highlights the importance of choosing a classifier suited to the specific task, and we propose corpus linguistics techniques as a way to make an informed choice. Our results suggest that large pre-trained deep models like Transformers are not necessarily the first choice for a text classification task such as the one presented in this article. All the code developed to run our tests is publicly available on GitHub.
Journal Article
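The chi-square test named in the abstract above is a standard corpus linguistics way to score how strongly a word is associated with one corpus (e.g., texts by fake news spreaders) versus a reference corpus. The following is a minimal sketch under stated assumptions, not the authors' code: it computes Pearson's chi-square (no Yates correction) on a 2×2 table of word counts versus other tokens, and the function name `chi_square_keyness` and the example counts are hypothetical.

```python
def chi_square_keyness(a: int, b: int, total_target: int, total_ref: int) -> float:
    """Pearson chi-square (no Yates correction) for a 2x2 keyword table.

    Hypothetical helper for illustration only.
    a: occurrences of the word in the target corpus
    b: occurrences of the word in the reference corpus
    total_target, total_ref: total tokens in each corpus
    """
    c = total_target - a  # all other tokens in the target corpus
    d = total_ref - b     # all other tokens in the reference corpus
    n = a + b + c + d
    # Closed form for a 2x2 table: N * (ad - bc)^2 / product of the four marginals
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the word never (or always) occurs
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts: the word appears 20 times in 100 target tokens
# and 5 times in 100 reference tokens.
print(round(chi_square_keyness(20, 5, 100, 100), 4))
```

With one degree of freedom, a score above about 3.84 corresponds to p < 0.05, so the hypothetical word above (score ≈ 10.29) would count as a keyword of the target corpus.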
Politically-oriented information inference from text
by da Silva, Samuel Caetano; Paraboni, Ivandre
in Computational linguistics, Inference, Language processing
2023
The inference of politically-oriented information from text data is a popular research topic in Natural Language Processing (NLP), at both the text and the author level. In recent years, such studies have used text representations ranging from simple count-based models (e.g., bag-of-words) to sequence-based models built from transformers (e.g., BERT). Despite considerable success, we may still ask whether results can be improved further by combining these models with additional text representations. To shed light on this issue, the present work describes a series of experiments comparing strategies for inferring political bias and ideology from text, using sequence-based BERT models and syntax- and semantics-driven features, and examines which of these representations (or their combinations) improve overall model accuracy. Results suggest that one particular strategy, namely the combination of BERT language models with syntactic dependencies, significantly outperforms well-known count- and sequence-based text classifiers alike. In particular, the combined model improves accuracy across all tasks under consideration, outperforming the top-performing system of the SemEval hyperpartisan news detection task by up to 6% and BERT alone by up to 21%, making a potentially strong case for heterogeneous text representations in these tasks.
Journal Article
A transformer fine-tuning strategy for text dialect identification
by Alourani, Abdullah; Shuja, Junaid; Humayun, Mohammad Ali
in Accuracy, Arabic language, Artificial Intelligence
2023
Online medical consultation can significantly improve the efficiency of primary health care. Recently, many online medical question–answer services have been developed that connect patients with relevant medical consultants based on their questions. Given the linguistic variety in these questions, identifying a patient's social background can improve the referral system by selecting a medical consultant with a similar social origin for more efficient communication. This paper proposes a novel fine-tuning strategy for pre-trained transformers to identify the social origin of text authors. When fused with the existing adapter model, the proposed methods achieve an overall accuracy of 53.96% for the Arabic dialect identification task on the Nuanced Arabic Dialect Identification (NADI) dataset. This is 0.54% higher than the previous best on the same dataset, demonstrating the utility of custom fine-tuning strategies for pre-trained transformer models.
Journal Article
Author profiling from Romanized Urdu text using transfer learning models
by Khan, Sajid Ullah; Khan, Muhammad Sohail; Ali, Abid
in Accuracy, Artificial Intelligence, Classification
2025
This research focuses on author profiling using transfer learning models to classify age and gender. The investigation covered a diverse set of transfer learning models, including RoBERTa, BERT, ALBERT, DistilBERT, DistilRoBERTa, ELECTRA, and XLNet. Through careful evaluation using the Matthews Correlation Coefficient (MCC), Accuracy, Precision, Recall, and F1 Score, the study examined the efficacy of these models. The curated dataset was divided into gender and age tasks, with XLNet giving robust gender predictions and BERT giving robust age predictions. Notably, XLNet achieved the highest MCC (0.7946), Accuracy (0.8957), Precision (0.8992), Recall (0.8957), and F1 Score (0.8958) in gender classification, while BERT excelled in age prediction with an MCC of 0.7338, Accuracy of 0.8220, Precision of 0.8324, Recall of 0.8220, and F1 Score of 0.8243. Visualized outcomes provide insight into the models' performance nuances, paving the way for their practical implementation. This research offers novel contributions to author profiling tasks, bridging the gap between theory and real-world applications.
Journal Article
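All five evaluation metrics named in the abstract above (MCC, Accuracy, Precision, Recall, F1 Score) can be derived from a confusion matrix. The following minimal sketch covers only the binary case (e.g., a two-class gender task) and assumes counts of true/false positives and negatives; the function name `binary_metrics` and the example counts are hypothetical, and multi-class tasks such as age prediction would need per-class or averaged variants, which the abstract does not specify.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC for one positive class.

    Hypothetical helper for illustration; zero denominators yield 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical confusion matrix for 100 test authors
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name}: {value:.4f}")
```

MCC is often reported alongside accuracy because it uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.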
Studying scientific migration in Scopus
2013
An exploration is presented of Scopus as a data source for the study of international scientific migration or mobility in five study countries: Germany, Italy, the Netherlands, the UK, and the USA. It is argued that Scopus author-affiliation linking and author profiling are valuable, crucial tools in the study of this phenomenon. The UK was found to have the largest degree of outward international migration, followed by the Netherlands, with the USA having the lowest. Language similarity between countries is a more important factor in international migration than in international co-authorship. During 1999–2010 the Netherlands showed a positive “migration balance” with the UK and a negative one with Germany, suggesting that the Netherlands had more Ph.D. students from Germany than from the UK, or that for Dutch postdocs, research stays in the UK were more attractive than those in Germany. Comparison of bibliometric indicators with OECD statistics provided evidence that the study countries differ in how they measure their number of researchers. The authors conclude that a bibliometric study of scientific migration using Scopus is feasible and yields significant outcomes. They make suggestions for further research.
Journal Article
A survey of machine learning-based author profiling from texts analysis in social networks
by Fkih, Fethi; Ouni, Sarra; Omri, Mohamed Nazih
in Computer Communication Networks, Computer Science, Data Structures and Information Theory
2023
Recently, online social networks such as Twitter, Facebook, and LinkedIn have grown exponentially and now hold a large amount of information. These networks contain huge volumes of data, especially in textual form, which are unstructured and anonymous. This type of data usually leads to cybercrimes such as cyberbullying and cyberterrorism, and its analysis has become a serious challenge. To address this issue, various techniques have been proposed in the literature. Among them, author profiling is the newest technique and the one most widely adopted by researchers for discovering hidden textual information. Its objective is to identify the demographic or psychological aspects (age, sex, personality, mother tongue, etc.) of an author by examining the texts they have published. In recent years, this area has attracted many researchers seeking solutions for potential applications in fields such as marketing, computer forensics, and security. In this article, we describe the author profiling task, then present a brief thematic taxonomy and illustrate some profiling solutions from the literature. In particular, different machine learning and deep learning techniques are detailed and discussed. This work also provides an overview of the main approaches we studied in the literature and highlights the strengths and weaknesses of each. Finally, we discuss some research questions and present future directions for overcoming the weaknesses identified, in order to motivate academics and practitioners interested in this problem, which we consider essential, to advance solutions for profiling perpetrators on social networks.
Journal Article
Multidimensional Author Profiling for Social Business Intelligence
by Aramburu, María José; Berlanga, Rafael; Lanza-Cruz, Indira
in Business intelligence, Classifiers, Competitive intelligence
2024
This paper presents a novel author profiling method specifically aimed at classifying social network users according to the multidimensional perspectives of social business intelligence (SBI) applications. In this scenario, since user profiles are defined on demand for each particular SBI application, we cannot assume the existence of labelled datasets for training. Thus, we propose an unsupervised method to obtain the labelled datasets required for training the profile classifiers. Unlike other author profiling approaches in the literature, we only use the users’ descriptions, which are usually part of the post metadata. We exhaustively evaluated the proposed method on four different multidimensional author profiling tasks, together with state-of-the-art text classifiers. We achieved F1 scores of around 88% and 98% on a gold-standard and a silver-standard dataset, respectively. Additionally, we compared our results with supervised approaches previously proposed for two of our tasks and obtained very close performance despite using an unsupervised method. To the best of our knowledge, this is the first method designed to label user profiles in an unsupervised way for training profile classifiers that perform similarly to fully supervised ones.
Journal Article
Prediction of Author’s Profile basing on Fine-Tuning BERT model
by Bsir, Bassem; Khoufi, Nabil; Zrigui, Mounir
in Accuracy, Artificial neural networks, Datasets
2024
The task of author profiling consists of inferring the demographic features of social network users by studying their published content or the interactions between them. In the literature, many works have sought to improve the accuracy of the techniques used in this process. The existing methods can be divided into two types: simple linear models and complex deep neural network models. Among them, transformer-based models have shown the highest efficiency in NLP analysis in several languages (English, German, French, Turkish, Arabic, etc.). Despite their good performance, these approaches do not cover author profiling analysis and thus should be further enhanced. We therefore propose a new deep learning strategy that trains a customized transformer model to learn the optimal features of our dataset. To this end, we fine-tune the model using a transfer learning approach to improve on the results obtained with random initialization. We achieved about 79% accuracy by modifying the model to apply the retraining process on the PAN 2018 authorship dataset.
Journal Article