332 result(s) for "Word2vec"
The Geometry of Culture
We argue word embedding models are a useful tool for the study of culture, using a historical analysis of shared understandings of social class as an empirical case. Word embeddings represent semantic relations between words as relationships between vectors in a high-dimensional space, specifying a relational model of meaning consistent with contemporary theories of culture. Dimensions induced by word differences (rich–poor) in these spaces correspond to dimensions of cultural meaning, and the projection of words onto these dimensions reflects widely shared associations, which we validate with surveys. Analyzing text from millions of books published over 100 years, we show that the markers of class continuously shifted amidst the economic transformations of the twentieth century, yet the basic cultural dimensions of class remained remarkably stable. The notable exception is education, which became tightly linked to affluence independent of its association with cultivated taste.
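The projection step this abstract describes can be sketched with toy vectors: a "class" dimension is the difference between anchor-word vectors (rich minus poor), and a word's position on it is the cosine similarity between its vector and that dimension. The 3-d vectors below are made up for illustration, not trained embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def project(word_vec, pole_a, pole_b):
    # Position of a word on the dimension induced by pole_a - pole_b
    # (e.g. rich - poor): positive values lean toward pole_a.
    dimension = [a - b for a, b in zip(pole_a, pole_b)]
    return cosine(word_vec, dimension)

# Toy "embeddings" for illustration only.
rich = [0.9, 0.1, 0.2]
poor = [0.1, 0.9, 0.2]
yacht = [0.8, 0.2, 0.1]

print(project(yacht, rich, poor))  # positive: "yacht" leans toward "rich"
```

With real embeddings the poles would be averaged over several antonym pairs, as the paper's survey validation suggests.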
LDA-CBOW-Based Mining Model for Risky Driving Behavior in Traffic Accidents
Traffic accident data collected by traffic management departments are recorded in unstructured text form and contain a large number of descriptions of risky driving behavior. However, these texts are short and dense with professional vocabulary, so many text mining techniques cannot analyze them effectively. This paper proposes an improved LDA algorithm based on CBOW, the LDA-CBOW model, for studying traffic accident text data that record illegal behaviors. The model better extracts the topics of traffic accident data and filters the keywords under the corresponding topics, providing a better way to study the dependence between traffic data and illegal behaviors. Experiments show that, compared with other models, this model extracts the related topics of traffic accident data with higher efficiency and better robustness.
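The CBOW component rests on the idea that words appearing in the same context window are related, which is what helps on short accident reports. A minimal sketch of that context notion (the example report text is invented, not from the paper's data):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    # Count how often word pairs appear within `window` positions of
    # each other: the same context notion CBOW builds its training on.
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((word, tokens[j])))
            counts[pair] += 1
    return counts

# Hypothetical short accident report, tokenized naively.
report = "driver ran red light causing rear end collision".split()
print(cooccurrence_counts(report, window=2).most_common(3))
```

LDA-CBOW then uses such context statistics, rather than raw word counts alone, when assigning topic keywords.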
LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things
Log anomaly detection is an efficient method to manage modern large-scale Internet of Things (IoT) systems. More and more works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize them, but the computational cost of training it is high. Moreover, anomalies in logs depend not only on individual log messages but also on the log message sequence; the word vectors from word2vec therefore cannot be used directly and must be transformed into vectors of log events and then into vectors of log sequences. To reduce computational cost and avoid multiple transformations, this paper proposes an offline feature extraction model, named LogEvent2vec, which takes log events as the input of word2vec to extract the relevance between log events and vectorize them directly. LogEvent2vec can work with any coordinate transformation method and anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by barycenter (bary) or tf-idf weighting, and three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) are trained to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy compared with word2vec. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
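The "bary" transformation the abstract mentions is, in essence, the barycenter (component-wise mean) of the event vectors in a sequence. A minimal sketch, with made-up 3-d event vectors standing in for trained LogEvent2vec output:

```python
def bary(event_vectors):
    # Barycenter of a log sequence: the component-wise mean of its
    # event vectors, yielding one fixed-length sequence vector that a
    # downstream classifier (e.g. Random Forest) can consume.
    dim = len(event_vectors[0])
    n = len(event_vectors)
    return [sum(vec[i] for vec in event_vectors) / n for i in range(dim)]

# Hypothetical vectors for three log events in one sequence.
sequence = [[1.0, 0.0, 2.0],
            [3.0, 0.0, 0.0],
            [2.0, 3.0, 1.0]]
print(bary(sequence))  # [2.0, 1.0, 1.0]
```

The tf-idf variant would instead weight each event vector by the event's inverse frequency across sequences before summing.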
A Doctor Recommendation Based on Graph Computing and LDA Topic Model
Doctor recommendation technology can help patients filter out a large number of irrelevant doctors and quickly and accurately find doctors who meet their actual needs, giving patients access to helpful personalized online healthcare services. To address the problems with existing recommendation methods, this paper proposes a hybrid doctor recommendation model for online healthcare platforms, which uses the word2vec model, the latent Dirichlet allocation (LDA) topic model, and other methods to find the doctors who best suit a patient's needs from the information in consultations between doctors and patients. The model then treats these doctors as nodes to construct a doctor tag co-occurrence network and recommends the most important doctors in the network via an eigenvector centrality calculation on the graph. This method identifies the important nodes in the entire effective doctor network, supporting the recommendation from a new graph computing perspective. An experiment conducted on the Chinese healthcare website Chunyuyisheng.com shows that the proposed method achieves good recommendation performance.
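Eigenvector centrality, the ranking step named above, can be computed by power iteration on the network's adjacency matrix: repeatedly multiplying a score vector by the matrix converges to the dominant eigenvector, whose entries rank node importance. A sketch on a tiny hypothetical co-occurrence graph (not the paper's data):

```python
def eigenvector_centrality(adj, iterations=100):
    # Power iteration on an adjacency matrix (list of lists): the
    # dominant eigenvector's entries rank node importance.
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(abs(v) for v in y) or 1.0
        x = [v / norm for v in y]
    return x

# Hypothetical doctor tag co-occurrence graph of four doctors:
# doctor 1 co-occurs with all three others, so it should rank highest.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 1, 0, 0]]
scores = eigenvector_centrality(adj)
print(scores.index(max(scores)))  # 1
```

In the paper's setting the matrix could also be weighted by co-occurrence counts; the iteration is unchanged.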
A detailed review on word embedding techniques with emphasis on word2vec
Text data has been growing drastically because of digitalization. The Internet is flooded with millions of documents every day, making manual text processing relatively complex, neither scalable nor effective. Many machine learning algorithms cannot interpret raw text in its original format, as they need numbers as inputs to accomplish any task (say, classification or regression). A better way to represent text is needed so that computers can understand and process it efficiently and effectively; word embedding is one such technique. Word embedding, the encoding of words as vectors, has recently received much interest as a feature learning technique for natural language processing. This review presents a better way of understanding and working with word embeddings. Many researchers who are not experts in text processing techniques would not know where to start their exploration due to a lack of comprehensive material. This review provides an overview of several word embedding strategies and the entire working procedure of word2vec, from both theoretical and mathematical perspectives, giving researchers enough detail to get to work on their research quickly. Research results for standard word embedding techniques are also included to show how word embeddings have improved from early work to the most recent findings.
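The basic payoff of encoding words as vectors is that similarity becomes a geometric query: nearest neighbors under cosine similarity are semantically related words. A minimal sketch over a hand-made embedding table (the vectors are illustrative, not word2vec output):

```python
import math

def nearest(word, embeddings, k=2):
    # Rank the other words by cosine similarity to `word` in a toy
    # embedding table; real vectors would come from a trained model.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    target = embeddings[word]
    scored = [(cos(target, vec), w) for w, vec in embeddings.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# Hand-made 2-d vectors, for illustration only.
emb = {"king": [0.9, 0.8], "queen": [0.85, 0.9], "apple": [0.1, -0.9]}
print(nearest("king", emb, k=1))  # ['queen']
```

The same query over trained word2vec vectors is what powers the analogy and similarity results the review surveys.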
Continuous-bag-of-words and Skip-gram for word vector training and text classification
Natural language processing is one of the most challenging areas in the study of artificial intelligence and is widely used in real-life applications. One of the basic questions is how to calculate the probability of a particular text sequence appearing in a certain context. Word2Vec is a powerful tool that addresses this question through its ability to transform words into word vectors and to train efficiently on large datasets and corpora. Among its models, Continuous-Bag-Of-Words and Skip-gram are of great significance and are widely known. Several extended techniques related to these models have also been proposed to simultaneously decrease the required training time and increase training accuracy. Even though a number of papers now describe these fundamental concepts, their quality varies greatly. To better understand the models and their extensions, and how well they behave in real tasks, this paper compares different combinations of the models and techniques on their performance in processing large input data and their prediction accuracy in the task of text classification. This comparison provides more detail and understanding of the models for subsequent research in this field.
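The difference between the two models lies in what predicts what: Skip-gram trains on (center, context) pairs where the center word predicts each nearby word, while CBOW reverses the roles and predicts the center from the averaged context. The pair-generation step common to both can be sketched as:

```python
def skipgram_pairs(tokens, window=2):
    # Generate (center, context) training pairs as Skip-gram does:
    # each word predicts the words within `window` positions of it.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

The extensions the abstract alludes to (negative sampling, hierarchical softmax) change how each pair updates the vectors, not how the pairs are generated.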
Gene2vec: distributed representation of genes based on co-expression
Background: Existing functional descriptions of genes are categorical, discrete, and mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding. Results: In a purely data-driven fashion, we trained a 200-dimension vector representation of all human genes, using gene co-expression patterns in 984 data sets from the GEO databases. These vectors capture the functional relatedness of genes in terms of recovering known pathways: the average inner product (similarity) of genes within a pathway is 1.52X greater than that of random genes. Using t-SNE, we produced a gene co-expression map that shows local concentrations of tissue-specific genes. We also illustrated the usefulness of the embedded gene vectors, laden with rich information on gene co-expression patterns, in tasks such as gene-gene interaction prediction. Conclusions: We proposed a machine learning method that utilizes transcriptome-wide gene co-expression to generate a distributed representation of genes. We further demonstrated the utility of this representation by predicting gene-gene interactions based solely on gene names. The distributed representation of genes could be useful for further bioinformatics applications.
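The pathway-recovery check described above compares the average pairwise inner product within a gene set against that of random genes. A minimal sketch with toy 2-d vectors (real Gene2vec vectors are 200-dimensional):

```python
def avg_inner_product(vectors):
    # Mean pairwise dot product within a set of gene vectors: higher
    # values indicate the vectors point in similar directions.
    total, count = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += sum(a * b for a, b in zip(vectors[i], vectors[j]))
            count += 1
    return total / count

# Toy vectors: the "pathway" genes point in similar directions,
# the randomly chosen genes do not.
pathway = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
random_genes = [[1.0, 0.1], [-0.2, 1.0], [0.1, -0.9]]
print(avg_inner_product(pathway), avg_inner_product(random_genes))
```

In the paper this contrast, a 1.52X higher within-pathway average, is what validates that co-expression training captured functional relatedness.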
The impact of COVID-19 on hotel customer satisfaction: evidence from Beijing and Shanghai in China
Purpose: The purpose of this study is to help hotels provide better service to customers during the COVID-19 era. Specifically, this study focuses on understanding the changes in hotel customer satisfaction during the epidemic and formulating effective marketing strategies to satisfy and attract guests. Design/methodology/approach: As the first country hit by the COVID-19 virus, China has seen its hotel industry profoundly affected, and customer satisfaction and needs have also changed. Taking 105,635 hotel reviews from Tripadvisor.com for Beijing and Shanghai as samples, this study explores the changes in consumer satisfaction using text-mining methods. Findings: The results suggest that there are significant differences in overall ratings, spatial distribution, and ratings by traveller type before and after the epidemic. Generally, customers have a higher "tolerance" after COVID-19: they are more inclined to give higher ratings and pay more attention to hotel prevention and control measures that reduce health risks. Research limitations/implications: This paper demonstrates the changes in customer satisfaction before and after COVID-19 at the theoretical level, reveals the changes in customer attention through the topic model, and provides a basis for guiding hotel managers to reduce the impact of the COVID-19 crisis. Practical implications: The empirical findings provide useful insights for tourism management and for improving hotel service quality during the COVID-19 epidemic era. Originality/value: This research explores hotel customer satisfaction before and after COVID-19 by using text mining to analyse Mandarin online reviews. The results suggest that the hotel industry should continuously adjust its products and services based on the effective information obtained from customer reviews, so as to reactivate and revitalize the industry in the epidemic era.
“FabNER”: information extraction from manufacturing process science domain literature using named entity recognition
The number of manufacturing science articles published in scientific journals and on the broader web has increased exponentially every year since the 1990s. For a novice engineer or an experienced researcher to assimilate all of this knowledge requires significant synthesis of the existing knowledge space contained within published material in order to answer basic and complex queries. Algorithmic approaches through machine learning, and specifically Natural Language Processing (NLP), are lacking for domain-specific areas such as manufacturing. One of the significant challenges in analyzing the manufacturing vocabulary is the lack of a named entity recognition model that lets algorithms classify the manufacturing corpus of words under various manufacturing semantic categories. This work presents a supervised machine learning approach that categorizes unstructured text from more than 500,000 manufacturing science abstracts and labels it under various manufacturing topic categories. A neural network model using a bidirectional long short-term memory network plus a conditional random field (BiLSTM + CRF) is trained to extract information from manufacturing science abstracts. Our classifier achieves an overall F1-score of 88%, close to state-of-the-art performance. Two use cases demonstrate the value of the developed NER model as a Technical Language Processing (TLP) workflow on manufacturing science documents. The long-term goal is to extract knowledge about the connections and relationships between key manufacturing concepts/entities, available within millions of manufacturing documents, into a structured labeled-property graph that allows programmatic query and retrieval.
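The reported 88% F1-score is the standard NER evaluation: precision and recall over exact-match entity spans, combined harmonically. A minimal sketch; the spans and the labels PROCESS/MATERIAL/MACHINE are hypothetical stand-ins, not FabNER's actual tag set:

```python
def entity_f1(true_entities, pred_entities):
    # F1 over exact-match entity spans, the usual evaluation for
    # NER models such as BiLSTM + CRF taggers.
    tp = len(set(true_entities) & set(pred_entities))
    precision = tp / len(pred_entities) if pred_entities else 0.0
    recall = tp / len(true_entities) if true_entities else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical (start, end, label) spans from one tagged abstract.
gold = [(0, 2, "PROCESS"), (5, 6, "MATERIAL"), (9, 11, "MACHINE")]
pred = [(0, 2, "PROCESS"), (5, 6, "MATERIAL"), (7, 8, "MACHINE")]
print(round(entity_f1(gold, pred), 2))  # 0.67
```

A span counts as correct only if both its boundaries and its label match, which makes entity-level F1 stricter than per-token accuracy.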