Search Results

343 results for "TF-IDF"
Hate Speech and Offensive Content: Harnessing Machine Learning for Reliable Analysis and Detection
The escalating prevalence of hate speech on social media necessitates effective detection mechanisms to foster a safe and inclusive online community. This research paper aims to enhance hate speech detection accuracy by evaluating the performance of diverse machine learning algorithms: Random Forest (RF), Logistic Regression (LR), and K-Nearest Neighbors (KNN). A diverse dataset comprising text samples from various online platforms, encompassing a wide spectrum of hate speech instances, was meticulously collected. The data underwent careful preprocessing involving tokenization, stemming, and stop-word removal to enhance data quality. Additionally, feature extraction techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) and word embeddings were employed to effectively represent the textual content. The dataset was divided into training and testing sets, and the selected machine learning algorithms were trained on the former. Hyperparameters were fine-tuned using cross-validation techniques to optimize performance. Evaluation metrics, including accuracy, precision, recall, and F1-score, were employed to assess the models' effectiveness. The experimental findings revealed promising outcomes for hate speech detection across all three algorithms. Notably, Count Vectorizer features demonstrated excellent performance, with Random Forest achieving an accuracy of 0.942 for binary hate speech analysis, followed by LR and KNN, and Logistic Regression achieving an accuracy of 0.897 for multi-class hate speech analysis.
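
A minimal scikit-learn sketch of the evaluation setup this abstract describes (TF-IDF features, the three classifiers, cross-validated hyperparameter tuning). The tiny placeholder corpus and the parameter grids are illustrative assumptions, not the authors' data or tuned settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

texts = [
    "you people are disgusting", "what a hateful rant", "get out of here, trash",
    "nobody wants your kind around",                      # stand-ins for hate speech
    "have a lovely day", "great game last night", "thanks for the kind words",
    "see you at the meetup",                              # stand-ins for benign text
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]                         # 1 = hate speech

models = {
    "RF": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [100, 300]}),
    "LR": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0]}),
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [1, 3]}),
}
for name, (clf, grid) in models.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer(stop_words="english")), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=2, scoring="f1")  # cross-validated tuning
    search.fit(texts, labels)
    print(name, round(search.best_score_, 3), search.best_params_)
```
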
Research paper classification systems based on TF-IDF and LDA schemes
With the increasing advance of computer and information technologies, numerous research papers have been published online as well as offline, and as new research fields are continually created, users have trouble finding and categorizing the research papers that interest them. To overcome these limitations, this paper proposes a research paper classification system that can cluster research papers into meaningful classes in which papers are very likely to have similar subjects. The proposed system extracts representative keywords from the abstract of each paper and extracts topics with the Latent Dirichlet Allocation (LDA) scheme. Then, the K-means clustering algorithm is applied to group the papers into sets with similar subjects, based on the term frequency-inverse document frequency (TF-IDF) values of each paper.
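
A compact scikit-learn sketch of that pipeline: LDA supplies the topics (it operates on raw term counts) and K-means groups papers by their TF-IDF vectors. The six stand-in abstracts and the topic/cluster counts are assumptions, not the paper's configuration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

abstracts = [
    "deep learning for image recognition",
    "convolutional networks and vision transformers",
    "sql query optimization in distributed databases",
    "transaction processing and index structures",
    "wireless sensor network routing protocols",
    "energy efficient communication in sensor networks",
]

# Topic extraction with LDA (LDA works on raw term counts, not TF-IDF).
counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# Group papers with similar subjects via K-means on their TF-IDF vectors.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(tfidf)
print(km.labels_)   # one cluster id per paper
```
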
Automating fake news detection system using multi-level voting model
The issue of online fake news has attained increasing prominence in the diffusion and shaping of news stories online. Misleading or unreliable information in the form of videos, posts, articles, and URLs is extensively disseminated through popular social media platforms such as Facebook and Twitter. As a result, editors and journalists need new tools that can help them speed up the verification process for content originating from social media. Motivated by the need for automated detection of fake news, the goal is to find out which classification model identifies phony features most accurately using three feature extraction techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Count Vectorizer (CV), and Hashing Vectorizer (HV). This paper also proposes a novel multi-level voting ensemble model. The proposed system has been tested on three datasets using twelve classifiers, which are combined based on their false prediction ratios. It has been observed that Passive Aggressive, Logistic Regression, and Linear Support Vector Classifier (LinearSVC) individually perform best with the TF-IDF, CV, and HV feature extraction approaches, respectively, based on their performance metrics, whereas the proposed model outperforms the Passive Aggressive model by 0.8%, the Logistic Regression model by 1.3%, and the LinearSVC model by 0.4% using TF-IDF, CV, and HV, respectively. The proposed system can also be used to predict fake textual content on online social media websites.
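
A rough scikit-learn sketch of the core idea: pair each best individual classifier with the feature extractor it performed best with, then combine them by voting. The paper's false-prediction-ratio weighting is simplified here to plain hard voting, and the headlines and labels are invented, so this illustrates the ensemble's shape rather than the authors' exact model.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import (
    CountVectorizer, HashingVectorizer, TfidfVectorizer,
)
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

headlines = [
    "scientists confirm miracle cure found in kitchen spice",
    "celebrity secretly replaced by body double, insiders say",
    "city council approves new budget for road repairs",
    "local library extends weekend opening hours",
]
labels = [1, 1, 0, 0]   # 1 = fake, 0 = real (toy labels)

# Each classifier is paired with the extractor it performed best with.
voter = VotingClassifier(
    estimators=[
        ("pa_tfidf", make_pipeline(TfidfVectorizer(), PassiveAggressiveClassifier(random_state=0))),
        ("lr_cv", make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))),
        ("svc_hv", make_pipeline(HashingVectorizer(n_features=2**10), LinearSVC())),
    ],
    voting="hard",   # hard voting: PassiveAggressive and LinearSVC lack predict_proba
)
voter.fit(headlines, labels)
print(voter.predict(["shocking truth about vaccines revealed"]))
```
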
EagleEye: A Worldwide Disease-Related Topic Extraction System Using a Deep Learning Based Ranking Algorithm and Internet-Sourced Data
Due to the prevalence of globalization and the surge in human travel, diseases are spreading more rapidly than ever, and the risks of sporadic contamination are higher than before. Disease warnings continue to rely on censored data, but these warning systems have failed to cope with the speed of disease proliferation. Given the risks involved, there have been many studies on disease outbreak surveillance systems, but existing systems have limitations in monitoring disease-related topics and in internationalization. With the advent of online news, social media, and search engines, social and web data contain rich unexplored information that can be leveraged to provide accurate, timely reports of disease activities and risks. In this study, we develop an infectious disease surveillance system for extracting information related to emerging diseases from a variety of Internet-sourced data. We also propose an effective deep learning-based data filtering and ranking algorithm. The system provides nation-specific disease outbreak information, disease-related topic rankings, and the number of reports per district and disease through various visualization techniques such as maps, graphs, charts, correlation coefficients, and word clouds. Our system provides an automated web-based service, is free for all users, and is live in operation.
Text documents clustering using data mining techniques
Increasing progress in numerous research fields and information technologies has led to an increase in the publication of research papers, so researchers spend a lot of time finding papers that are close to their field of specialization. Consequently, in this paper we propose a document classification approach that can cluster the text documents of research papers into meaningful categories covering similar scientific fields. The approach is based on the essential focus and scope of the target categories, where each category includes many topics. Accordingly, we extract word tokens from the topics that relate to each specific category separately. The frequency of word tokens in a document determines the document's weight, which is calculated using the term frequency-inverse document frequency (TF-IDF) statistic. The proposed approach uses the title, abstract, and keywords of each paper, in addition to the category topics, to perform the classification. Documents are then classified and clustered into the primary categories based on the highest cosine similarity between the category weights and the document weights.
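
The assignment step reduces to comparing TF-IDF vectors with cosine similarity. A minimal sketch under obvious assumptions follows: two invented categories described by their topic keywords, and toy documents standing in for title + abstract + keywords, both vectorized with a shared TF-IDF model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each category is represented by the word tokens of its topics.
categories = {
    "machine learning": "classification clustering neural network training model",
    "databases": "query index transaction storage sql schema",
}
docs = [
    "a neural network approach to document classification",
    "optimizing sql query plans with adaptive indexes",
]

vec = TfidfVectorizer()
vec.fit(list(categories.values()) + docs)              # shared vocabulary for both sides
cat_vecs = vec.transform(list(categories.values()))
doc_vecs = vec.transform(docs)

sims = cosine_similarity(doc_vecs, cat_vecs)           # rows: documents, cols: categories
for doc, idx in zip(docs, sims.argmax(axis=1)):
    print(f"{doc!r} -> {list(categories)[idx]}")       # highest-similarity category wins
```
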
A Study of Output Vocabulary Knowledge in the English Writing Process
Vocabulary acquisition is pivotal in enhancing English writing proficiency, and effective integration of output vocabulary into written English is essential for improving students' compositional skills. This study proposes a methodology for extracting vocabulary from English textual materials and subsequently applying it to student writing. To ensure the integrity and accuracy of the text materials, the research employs a Long Short-Term Memory (LSTM) algorithm to perform a comprehensive spelling check on the English writing corpus prior to vocabulary extraction. Further, the paper adopts a high-frequency word list and Term Frequency-Inverse Document Frequency (TF-IDF) techniques to identify and evaluate the significance of vocabulary within the texts. Key vocabulary that significantly impacts word importance classification is preliminarily identified using the Graph Convolutional Network-K Nearest Neighbor (GCKN) algorithm. These pivotal words, termed 'key nodes,' form the basis for constructing a network within the English texts. Utilizing the message-passing mechanism, information from associated nodes is aggregated at the central node, facilitating the acquisition of output vocabulary. The study's findings indicate that students, after learning and applying the acquired vocabulary, demonstrate considerable improvements in their English writing capabilities: they exhibit a broader and more sophisticated use of vocabulary, leading to marked enhancements in their writing performance and overall English proficiency.
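
One plausible reading of the high-frequency-list plus TF-IDF step is ranking each text's terms by TF-IDF weight; a small sketch follows, with a stand-in corpus. The LSTM spell-check and GCKN ranking stages are beyond its scope.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

essays = [
    "the festival celebrates music and community spirit every summer",
    "renewable energy reduces emissions and protects the environment",
    "online learning platforms expand access to education worldwide",
]

vec = TfidfVectorizer(stop_words="english")
weights = vec.fit_transform(essays)
terms = np.array(vec.get_feature_names_out())

TOP_K = 3
for i, row in enumerate(weights.toarray()):        # one row of weights per essay
    order = row.argsort()[::-1][:TOP_K]            # indices of the heaviest terms
    print(f"essay {i}:", [t for t, w in zip(terms[order], row[order]) if w > 0])
```
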
Attention-Based CNN and Bi-LSTM Model Based on TF-IDF and GloVe Word Embedding for Sentiment Analysis
Sentiment analysis (SA) detects people's opinions in text using natural language processing (NLP) techniques. Recent research has shown that deep learning models, i.e., Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformer-based models, provide promising results for recognizing sentiment. Nonetheless, although CNN has the advantage of extracting high-level features through convolutional and max-pooling layers, it cannot efficiently learn sequential correlations. Bidirectional RNNs use two RNN directions to improve the extraction of long-term dependencies but cannot extract local features in parallel, and Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) require substantial computational resources to fine-tune and face overfitting problems on small datasets. This paper proposes a novel attention-based model that combines CNNs with LSTM (named ACL-SA). First, it applies a preprocessor to enhance data quality and employs term frequency-inverse document frequency (TF-IDF) feature weighting and pre-trained GloVe word embeddings to extract meaningful information from textual data. In addition, it uses CNN max-pooling to extract contextual features and reduce feature dimensionality. Moreover, it uses an integrated bidirectional LSTM to capture long-term dependencies, and it applies an attention mechanism at the CNN's output layer to emphasize each word's attention level. To avoid overfitting, GaussianNoise and GaussianDropout are adopted as regularization. The model's robustness is evaluated on four standard English datasets, i.e., Sentiment140, US-airline, Sentiment140-MV, and SA4A, with various performance metrics, and its efficiency is compared with existing baseline models and approaches. The experimental results show that the proposed method significantly outperforms the state-of-the-art models.
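
An illustrative Keras sketch of an architecture matching the abstract's description: embedding, GaussianNoise, Conv1D with max-pooling, bidirectional LSTM, attention, and GaussianDropout. The layer sizes, the self-attention placement, and the output head are assumptions; the TF-IDF weighting and GloVe initialization steps are omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, EMB_DIM = 20000, 100, 100     # assumed sizes

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)  # would be seeded with GloVe weights
x = layers.GaussianNoise(0.1)(x)                   # regularization, per the abstract
x = layers.Conv1D(128, 5, activation="relu")(x)    # local contextual features
x = layers.MaxPooling1D(pool_size=2)(x)            # reduce feature dimensionality
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)  # long-term dependencies
x = layers.Attention()([x, x])                     # emphasize informative positions
x = layers.GlobalAveragePooling1D()(x)
x = layers.GaussianDropout(0.3)(x)                 # regularization, per the abstract
outputs = layers.Dense(1, activation="sigmoid")(x) # binary sentiment score

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```
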
Recent trends of green human resource management: Text mining and network analysis
Issues of the environmental crisis are being addressed by researchers, governments, and organizations alike. Green human resource management (GHRM) is one such field receiving substantial research focus, since it is targeted at greening firms and making them eco-friendly. This research reviews 317 articles on GHRM published in the Scopus database from 2008 to 2021. The study applies text mining, latent semantic analysis (LSA), and network analysis to explore trends in GHRM research and to establish the relationship between the quantitative and qualitative GHRM literature. The study was carried out using the KNIME and VOSviewer tools. As a result, the research identifies five recent research trends in GHRM using K-means clustering. Future researchers can build on these identified trends to address environmental issues, make the environment eco-friendly, and motivate firms to implement GHRM in their practices.
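
The study ran its pipeline in KNIME and VOSviewer; a compact scikit-learn analogue of the LSA plus K-means trend-extraction step is sketched below, with invented abstracts and deliberately small dimension and cluster counts (the study itself reports five trend clusters).

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "green recruitment and eco friendly hiring practices",
    "environmental training programs for employees",
    "green performance appraisal and reward systems",
    "sustainable leadership and organizational culture",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # latent semantic space
trends = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsa)
print(trends)   # one trend label per article
```
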
Dilemmas and Breakthroughs in the Legal Regulation of Artificial Intelligence Based on Deep Learning Models
In this paper, we use big data analysis techniques combined with the TF-IDF algorithm to weight the frequency vectors of frequently occurring words in text and to reduce document length, obtaining keywords without destroying the original text's feature information. Text feature similarity is combined with a Bayesian algorithm for label classification to facilitate data querying and indexing. The results show that the system's running time stays around 14 s, recall and accuracy average close to 75% and 72%, respectively, and the number of keywords reaches 5971 with an F1 value of 0.9, which demonstrates the effectiveness of the artificial intelligence legal regulation system based on big data analysis.
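
A minimal sketch of the TF-IDF weighting plus Bayesian label-classification step described above, using scikit-learn; the legal-text snippets and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "liability rules for harm caused by autonomous vehicles",
    "negligence standards for self driving systems",
    "data protection obligations for model training corpora",
    "consent requirements for personal data processing",
]
labels = ["liability", "liability", "privacy", "privacy"]

# TF-IDF keyword weighting feeds a Bayesian classifier for label assignment.
clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["who bears liability when an autonomous system errs"]))
```
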