Catalogue Search | MBRL
365 result(s) for "Author identification"
An ensemble deep learning model for author identification through multiple features
2025
Authorship identification is one of the challenges in natural language processing. The proposed research improves the accuracy and stability of authorship identification by creating a new deep learning framework that combines features of various types in a self-attentive weighted ensemble. Our approach substantially enhances generalization by combining a wide range of writing-style representations, such as statistical features, TF-IDF vectors, and Word2Vec embeddings. The different sets of features are fed through separate Convolutional Neural Networks (CNNs) so that specific stylistic features can be extracted. More importantly, a self-attention mechanism combines the results of these specialized CNNs so that the model can dynamically learn the significance of each feature type. The summed representation is then passed into a weighted SoftMax classifier with the aim of optimizing performance by taking advantage of the strengths of the individual branches of the neural network. The suggested model was extensively tested on two datasets: Dataset A, which included four authors, and Dataset B, which included thirty authors. Our method outperformed the baseline state-of-the-art methods by at least 3.09% and 4.45% on Dataset A and Dataset B, with accuracies of 80.29% and 78.44%, respectively. This self-attention-augmented multi-feature ensemble approach is effective, with significant gains over the state of the art in accuracy and robustness for author identification.
Journal Article
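The attention-weighted fusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the attention mechanism, so the code assumes a generic scaled dot-product score against a hypothetical learned `query` vector; `attention_fuse` and its inputs are illustrative names, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_outputs, query):
    """Fuse equal-length branch vectors using softmax attention weights.

    branch_outputs: one vector per feature branch (e.g. statistical,
        TF-IDF, Word2Vec), all of the same dimension (assumption).
    query: hypothetical learned vector that scores each branch.
    """
    dim = len(query)
    # Scaled dot-product score for each branch.
    scores = [
        sum(b * q for b, q in zip(branch, query)) / math.sqrt(dim)
        for branch in branch_outputs
    ]
    weights = softmax(scores)
    # Weighted sum of the branch vectors, one coordinate at a time.
    fused = [
        sum(w * branch[i] for w, branch in zip(weights, branch_outputs))
        for i in range(dim)
    ]
    return weights, fused
```

In a trained model the query and the branch vectors would be learned jointly; here they are plain lists so the weighting arithmetic is easy to inspect.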
The New Paradigm of Deepfake Detection at the Text Level
by Stancu, Adrian; Rosca, Cosmina-Mihaela; Iovanovici, Emilian Marian
in AI-generated text, Analysis, Artificial intelligence
2025
The world is currently facing the issue of text authenticity in different areas. The implications of generated text can raise concerns about manipulation. When a photo of a celebrity is posted alongside an impactful message, it can generate outrage, hatred, or other manipulative beliefs. Numerous artificial intelligence tools use different techniques to determine whether a text is artificial intelligence-generated or authentic. However, these tools fail to accurately determine cases in which a text is written by a person who uses patterns specific to artificial intelligence tools. For these reasons, this article presents a new approach to the issue of deepfake texts. The authors propose methods to determine whether a text is associated with a specific person by using specific written patterns. Each person has their own written style, which can be identified in the average number of words, the average length of the words, the ratios of unique words, and the sentiments expressed in the sentences. These features are used to develop a custom-made written-style machine learning model named the custom deepfake text model. The model’s results show an accuracy of 99%, a precision of 97.83%, and a recall of 90%. A second model, the anomaly deepfake text model, determines whether the text is associated with a specific author. For this model, an attempt was made to determine anomalies at the level of textual characteristics that are assumed to be associated with particular patterns of a certain author. The results show an accuracy of 88.9%, a precision of 100%, and a recall of 89.9%. The findings outline the possibility of using the model to determine if a text is associated with a certain author. The paper positions itself as a starting point for identifying deepfakes at the text level.
Journal Article
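Three of the written-style features named in this abstract (average number of words, average word length, ratio of unique words) can be computed with a few lines of standard-library Python. This is a sketch under stated assumptions: sentences are split on `.`, `!`, and `?`, the average number of words is taken per sentence, and the sentiment feature is omitted because the abstract does not say how it is scored. `style_features` is a hypothetical helper name.

```python
import re


def style_features(text):
    """Return simple per-text stylometric features as a dict."""
    # Naive sentence split; a real pipeline would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {
            "avg_words_per_sentence": 0.0,
            "avg_word_length": 0.0,
            "unique_word_ratio": 0.0,
        }
    return {
        "avg_words_per_sentence": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "unique_word_ratio": len(set(words)) / len(words),
    }
```

Feature dictionaries like this one could then be fed to any classifier or anomaly detector; which model the authors used is not stated beyond the names given in the abstract.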
Comparative Analysis of Using Different Text Features, Models, and Methods in Text Author Recognition
2024
The authors used various methods and models in the computer system for text author recognition to recognize the authorship of texts via the example of Azerbaijani writers. They compared the effectiveness of using different text features and proposed feature selection procedures. The authors conducted computer experiments on the works of several famous Azerbaijani writers in the Azerbaijani language and analyzed the results obtained.
Journal Article
A Comparative Study of Machine Learning Methods and Text Features for Text Authorship Recognition in the Example of Azerbaijani Language Texts
by Azimov, Rustam; Providas, Efthimios
in Artificial neural networks, author identification, Authorship
2024
This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.
Journal Article
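Two of the text features listed in this abstract, n-grams and word length, can be extracted as follows. The abstract does not say whether the n-grams are character-level or word-level, so the sketch assumes character n-grams; `char_ngrams` and `word_length_profile` are illustrative names.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (character level is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_length_profile(text):
    """Relative frequency of each word length, using whitespace tokenization."""
    lengths = [len(word) for word in text.split()]
    if not lengths:
        return {}
    counts = Counter(lengths)
    return {length: count / len(lengths) for length, count in sorted(counts.items())}
```

Both functions operate on Unicode strings, so they apply to Azerbaijani text without modification, although whitespace tokenization leaves punctuation attached to words.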
Robust stylometric analysis and author attribution based on tones and rimes
2020
In this article, we propose an innovative and robust approach to stylometric analysis without annotation and leveraging lexical and sub-lexical information. In particular, we propose to leverage the phonological information of tones and rimes in Mandarin Chinese automatically extracted from unannotated texts. The texts from different authors were represented by tones, tone motifs, and word length motifs as well as rimes and rime motifs. Support vector machines and random forests were used to establish the text classification model for authorship attribution. From the results of the experiments, we conclude that the combination of bigrams of rimes, word-final rimes, and segment-final rimes can discriminate the texts from different authors effectively when using random forests to establish the classification model. This robust approach can in principle be applied to other languages with established phonological inventory of onset and rimes.
Journal Article
Deep Content and Deep Sentiment Analysis
by Kučera, Ondřej; Faltýnek, Dan; Benešová, Martina
in Algorithms, Artificial intelligence, Attribution theory
2025
The objective of the article is twofold: first, to employ the knowledge of the recurrence of low-frequency words in authorial texts; and second, to prevent the misuse of this knowledge. Contrary to the prevailing authorship attribution theory and practice (Evert et al. 2017, Juola 2008), our research has revealed that the personal linguistic profile is not primarily composed of frequent words with grammatical functions. Instead, we have identified that a distinct set of full-meaning words defines an individual’s linguistic profile (Faltýnek 2020, Faltýnek – Matlach 2021). An examination of these meanings reveals an individual’s unconscious language habits and, consequently, their personality settings. Such personal profiling is referred to as “deep content” and “deep sentiment analysis”. The innovation in question has the potential to facilitate a novel form of linguistic personalization in digital communication, one that has not been previously observed or utilized. The main aim of this article is to describe the algorithm to conduct single-person linguistic deep content and deep sentiment profiling and personalization. We will describe technical steps to provide such a form of digital communication processing and to facilitate the adjustment of a text targeted at an individual, described as a
(Patent No.: US11797753B2, Faltýnek et al. 2023). This algorithm can be used to (a) produce a personal linguistic profile (analogically to psychometrics instruments such as NEO-FFI Big Five, Minnesota Multiphasic Personality Inventory (MMPI)), (b) target digital communication to an individual by “translating” a text to their language (i.e. linguistic habits) and stimulate desired feelings to a predetermined content. The algorithm is, however, also designed (c) to be used to avoid procedures (a) and (b) using any kind of digital communication platform by an individual. This algorithm is implemented in the software Cloakspeech (Faltýnek – Benešová – Kučera 2025), which provides personalization of AI-generated texts: AI speaks like a particular person.
Journal Article
Identifying Similar Users Between Dark Web and Surface Web Using BERTopic and Authorship Attribution
by Park, SungJin; Kim, Dong-Wook; Han, Myung-Mook
in Algorithms, Authorship, Computational linguistics
2025
The dark web is a part of the deep web that ensures anonymity to users, thus facilitating various malicious activities, such as the sales of drugs, firearms, and personal information or the dissemination of malware and cyberattack tools. These activities extend beyond the dark web and have negative effects on the surface web, which is commonly accessed by internet users. Recent studies on the dark web are limited to the detection and classification of specific malicious activities; that is, they cannot trace or identify the authors of dark web content or the source of given information. Therefore, we herein propose a method for identifying similar authors between the surface and dark webs using BERTopic and authorship attribution. We applied BERTopic to the surface and dark webs to extract previously unidentified topics and measured the similarity between the topics to detect similar topics between the two webs. In addition, we applied authorship attribution to the contents written by the authors of similar topics to extract the unique author characteristics. The similarity between the authors was measured to identify authors with similar characteristics. Thus, we identified authors who had written contents on similar topics on both the surface and dark webs as well as authors who are simultaneously active on both webs.
Journal Article
A Content Analysis of Indian Research Data Repositories Prospects and Possibilities
2019
The study aims to trace the development of Indian research data repositories (RDRs) and explore their content with the view of identifying prospects and possibilities. Further, it analyses the distribution of data repositories on the basis of content coverage, types of content, author identification system followed, software and the application programming interface used, subject-wise number of repositories, etc. The study is based on data repositories listed on the registry of data repositories accessible at http://www.re3data.org. The dataset was exported in Microsoft Excel format for analysis. A simple percentage method was followed in data analyses and results are presented through Tables and Figures. The study found a total of 2829 data repositories in existence worldwide. Further, it was seen that 1526 (53.9 %) are open and 924 (32.4 %) are restricted data repositories. Also, there are embargoed data repositories numbering 225 (8.0 %) and closed ones numbering 154 (5.4 %). There are 2829 RDRs covering 72 countries in the world. The study found that out of a total of 45 Indian RDRs, only 30 (67 %) are open, followed by 12 (27 %) that are restricted and 3 (6 %) that are closed. The majority of Indian RDRs (20) were developed in the year 2014. The study found that the majority of Indian RDRs (17) are ‘disciplinary’. Further, the study also revealed that statistical data formats are available in a maximum of 31 (68.9 %) Indian RDRs. It was also seen that the majority of Indian RDRs (28) have datasets relating to ‘Life Sciences’. It was identified that only 20% of data repositories have been using metadata standards in metadata; the remaining 80% do not use any standards in metadata entry. This study covered only the research data repositories in India registered on the registry of data repositories. RDRs not listed in the registry of data repositories are left out.
Journal Article
Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents
by Sk, Md Obaidullah; Mukherjee, Himadri; Sen, Shibaprasad
in Accuracy, Algorithms, Artificial neural networks
2022
Author identification is an important aspect of literary analysis, studied in natural language processing (NLP). It helps identify the most probable author of articles, news texts or social media comments and tweets, for example. It can be applied to other domains such as criminal and civil cases, cybersecurity, forensics, identification of plagiarizers, and many more. An automated system in this context can thus be very beneficial for society. In this paper, we propose a convolutional neural network (CNN)-based author identification system from literary articles. This system uses visual features along with a five-layer convolutional neural network for the identification of authors. The prime motivation behind this approach was the feasibility of identifying distinct writing styles through a visualization of the writing patterns. Experiments were performed on 1200 articles from 50 authors, achieving a maximum accuracy of 93.58%. Furthermore, to see how the system performed on different volumes of data, the experiments were performed on partitions of the dataset. The system outperformed standard handcrafted feature-based techniques as well as established works on publicly available datasets.
Journal Article
Authorship Attribution Using Sequential Part-of-Speech Pattern Mining
2023
Given an anonymous text, automatically attributing a name from a group of known writers is called "Authorship Attribution" (AA). It is a classification problem, and feature extraction techniques are initially applied, followed by the training of a model using a collection of texts whose authors are known. Numerous features, such as lexical, semantic, structural, n-grams, etc., can be used to identify the stylistic characteristics of writers. The authors of this research propose a novel approach to this problem by using sequential pattern mining on part-of-speech (PoS) tags. This paper introduces and discusses the concept of a Part-of-Speech Skip-Gram (PoSSG) that is different from a traditional n-gram. A sequential pattern mining algorithm is applied to obtain PoSSG patterns, which are then used for authorship attribution tasks. Experimental studies on two different datasets, novels extracted from Project Gutenberg and Stamatatos06 Author Identification: C10-Attribution, confirm that this approach of mining PoSSG patterns facilitates author identification.
Journal Article
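To make the difference between a skip-gram and a traditional n-gram concrete, the sketch below enumerates skip-grams over a sequence of PoS tags. It uses the generic k-skip-n-gram definition (n tags in order, with at most k tags skipped in total); the paper's exact PoSSG definition and its sequential pattern mining algorithm are not given in the abstract and are not reproduced here. The input is assumed to be already tagged, and `skip_grams` is a hypothetical helper name.

```python
from itertools import combinations


def skip_grams(tags, n=2, k=1):
    """Return every ordered n-tag subsequence that skips at most k tags.

    With k=0 this reduces to ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tags)):
        # The gram may span at most n + k consecutive positions.
        window_end = min(start + n + k, len(tags))
        # Choose the remaining n - 1 positions after the start position.
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append((tags[start],) + tuple(tags[i] for i in rest))
    return grams
```

For the tag sequence `["DT", "NN", "VB", "DT"]`, calling `skip_grams(tags, n=2, k=1)` yields the three ordinary bigrams plus `("DT", "VB")` and `("NN", "DT")`, which a plain bigram extractor would miss.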