Catalogue Search | MBRL
Explore the vast range of titles available.
24,292 result(s) for "Text Data"
Handbook of research on opinion mining and text analytics on literary works and social media
"This book uses artificial intelligence and big data analytics to conduct opinion mining and text analytics on literary works and social media, focusing on theories, method, applications and approaches of data analytic techniques that can be used to extract and analyze data from literary books and social media, in a meaningful pattern"-- Provided by publisher.
Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts
2022
With the rapid proliferation of social networking sites (SNS), automatic topic extraction from various text messages posted on SNS is becoming an important source of information for understanding current social trends or needs. Latent Dirichlet Allocation (LDA), a probabilistic generative model, is one of the popular topic models in the area of Natural Language Processing (NLP) and has been widely used in information retrieval, topic extraction, and document analysis. Unlike long texts from formal documents, messages on SNS are generally short. Traditional topic models such as LDA or pLSA (probabilistic latent semantic analysis) suffer performance degradation for short-text analysis due to a lack of word co-occurrence information in each short text. To cope with this problem, various techniques are evolving for interpretable topic modeling for short texts; pretrained word embedding with an external corpus combined with topic models is one of them. Due to recent developments of deep neural networks (DNN) and deep generative models, neural-topic models (NTM) are emerging to achieve flexibility and high performance in topic modeling. However, there are very few research works on neural-topic models with pretrained word embedding for generating high-quality topics from short texts. In this work, in addition to pretrained word embedding, a fine-tuning stage with an original corpus is proposed for training neural-topic models in order to generate semantically coherent, corpus-specific topics. An extensive study with eight neural-topic models has been completed to check the effectiveness of additional fine-tuning and pretrained word embedding in generating interpretable topics by simulation experiments with several benchmark datasets. The extracted topics are evaluated by different metrics of topic coherence and topic diversity. We have also studied the performance of the models in classification and clustering tasks. Our study concludes that though auxiliary word embedding with a large external corpus improves the topic coherency of short texts, an additional fine-tuning stage is needed for generating more corpus-specific topics from short-text data.
Journal Article
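The abstract above says the extracted topics are scored for topic diversity but does not state the formula. As a rough illustration only, the sketch below assumes one common definition: the share of unique words among the top-k words of all topics. The function name `topic_diversity` and the `top_k` parameter are hypothetical and are not taken from the article.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each ordered from most to least
    probable. This is an assumed definition, not the article's own metric.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Two toy topics that share one word out of six top words.
toy_topics = [["phone", "battery", "screen"], ["phone", "price", "store"]]
score = topic_diversity(toy_topics, top_k=3)
```

A score near 1 means the topics rarely repeat each other's top words; a score near 0 means they are largely redundant.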
Sentimental classification analysis of polarity multi-view textual data using data mining techniques
by Ali, Mohanad Faeq; Talib, Mohammed Saad; Alkhazraji, Adel Abdul-Jabbar
in Algorithms, Classification, Clustering
2020
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exist in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital to get helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying the polarity information documents into two different categories of useful information. A proposed framework with integrated data mining algorithms is discussed in this paper; it is achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying the sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-reviews dataset on given topics.
Journal Article
Subjective well-being and social media
"Subjective Well-Being and Social Media shows how, by exploiting the unprecedented amount of information provided by the social networking sites, it is possible to build new composite indicators of subjective well-being. These new social media indicators are complementary to official statistics and surveys, whose data are collected at very low temporary and geographical resolution. The book also explains in full details how to solve the problem of selection bias coming from social media data. Mixing textual analysis, machine learning and time series analysis, the book also shows how to extract both the structural and the temporary components of subjective well-being. Cross-country analysis confirms that well-being is a complex phenomenon that is governed by macroeconomic and health factors, ageing, temporary shocks and cultural and psychological aspects. As an example, the last part of the book focuses on the impact of the prolonged stress due to the COVID-19 pandemic on subjective well-being in both Japan and Italy. Through a data science approach, the results show that a consistent and persistent drop occurred throughout 2020 in the overall level of well-being in both countries. The methodology presented in this book: enables social scientists and policy makers to know what people think about the quality of their own life, minimizing the bias induced by the interaction between the researcher and the observed individuals; being language-free, it allows for comparing the well-being perceived in different linguistic and socio-cultural contexts, disentangling differences due to objective events and life conditions from dissimilarities related to social norms or language specificities; provides a solution to the problem of selection bias in social media data through a systematic approach based on time-space small area estimation models. The book comes also with replication R scripts and data. Stefano M. Iacus is full professor of Statistics at the University of Milan, on leave at the Joint Research Centre of the European Commission. Former R-core member (1999-2017) and R Foundation Member. Giuseppe Porro is full professor of Economic Policy at the University of Insubria. An earlier version of this project was awarded the Italian Institute of Statistics-Google prize for "official statistics and big data""-- Provided by publisher.
Text+ – Concept and Benefits for Empirical Researchers
by Trippel, Thorsten; Hinrichs, Erhard
in Distributed research data infrastructure Text, German national research infrastructure NFDI, Language- and text-based research data
2024
In this contribution, we report on ongoing efforts in the German national research infrastructure consortium Text+ to make research data and services for text- and language-oriented disciplines FAIR, that is findable, accessible, interoperable, and reusable, as well as compliant with the CARE principles for language resources.
Journal Article
Functional Applications of Text Analytics Systems
2020, 2021
Text analytics consist of the statistics about a text element, which includes the word count, the word histogram, and the word frequency histogram. Most text documents of value are related to other (sometimes many other) documents, and so analytics describing the relative frequency of terms in a document compared to its peers are important for defining key words (tagging, labeling, indexing), search-responsive terms (query terms), and compressed versions of the documents (key words, summary, etc.). This clearly written text explains the functional applications of search, translation, optimization, and learning with regard to text analytics. Generation of analytics is aided by a hybrid, ensemble, or other combinatorial approach in which two or more effective analytic processes are used simultaneously, and their outputs combined to form a better “consensus”. Additional value to the preservation of the information is provided through these methods. Also, since they encompass capabilities of two or more knowledge-generating systems, they can create a “superset” of access points to the data generated. The book also describes the role of functional approaches in the testing and configuration of these systems.
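The statistics named in this description (word count, word histogram, and the relative frequency of a term compared to peer documents) can be sketched in a few lines. This is a minimal illustration, not the book's method: the tokenizer, the add-one smoothing, and the names `word_histogram` and `relative_frequency` are all assumptions made for the example.

```python
import re
from collections import Counter


def word_histogram(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def relative_frequency(term, doc, peers):
    """Ratio of the term's share of words in `doc` to its share in `peers`.

    Values above 1 suggest the term is more typical of `doc` than of its
    peer documents, which makes it a candidate key word.
    """
    doc_hist = word_histogram(doc)
    peer_hist = Counter()
    for peer in peers:
        peer_hist.update(word_histogram(peer))
    doc_share = doc_hist[term] / max(sum(doc_hist.values()), 1)
    # Assumed add-one smoothing so a term absent from the peers
    # does not cause a division by zero.
    peer_share = (peer_hist[term] + 1) / (sum(peer_hist.values()) + 1)
    return doc_share / peer_share


hist = word_histogram("The cat and the hat")
word_count = sum(hist.values())
ratio = relative_frequency("cat", "cat cat dog", ["dog dog dog", "dog bird"])
```

Here `word_count` is the total number of tokens, `hist` is the word histogram, and `ratio` compares one document against its peers for a single term.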
Data augmentation in natural language processing: a novel text generation approach for long and short text classifiers
by Dallmeyer, Jörg; Bayer, Markus; Buchhold, Björn
in Artificial Intelligence, Classification, Classifiers
2023
In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts. We achieved promising improvements when evaluating short as well as long text tasks with the enhancement by our text generation method. Especially with regard to small data analytics, additive accuracy gains of up to 15.53% and 3.56% are achieved within a constructed low data regime, compared to the no augmentation baseline and another data augmentation technique. As the current track of these constructed regimes is not universally applicable, we also show major improvements in several real world low data tasks (up to +4.84 F1-score). Since we are evaluating the method from many perspectives (in total 11 datasets), we also observe situations where the method might not be suitable. We discuss implications and patterns for the successful application of our approach on different types of datasets.
Journal Article
Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study
by Vydiswaran, VG Vinod; Guetterman, Timothy C; Basu, Tanmay
in Algorithms, Analysis, Augmentation
2018
Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure.
The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.
We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis.
The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer time than that beginning with NLP for both drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.
NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.
Journal Article
Text Data Augmentation for Deep Learning
by Furht, Borko; Shorten, Connor; Khoshgoftaar, Taghi M.
in Algorithms, Artificial intelligence, Augmentation
2021
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
Journal Article
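The survey record above refers to augmentation frameworks for text data without naming them in this listing. As a hedged illustration of what a text augmentation step can look like, the sketch below uses two simple rule-based operations, random token deletion and random token swap. They are chosen here for brevity and are assumptions, not techniques attributed to the survey; the function names and parameters are hypothetical.

```python
import random


def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token with probability p, always keeping at least one."""
    rng = rng or random.Random()
    if not tokens:
        return []
    kept = [tok for tok in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps=1, rng=None):
    """Swap two randomly chosen positions, n_swaps times."""
    rng = rng or random.Random()
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


# A seeded generator makes the augmented copies reproducible.
rng = random.Random(0)
sentence = "text data augmentation creates extra training examples".split()
augmented = [random_swap(random_deletion(sentence, 0.2, rng), 1, rng) for _ in range(3)]
```

Each element of `augmented` is a perturbed copy of the sentence that could be added to a training set alongside the original label.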