Document Vector Representation with Enhanced Features Based on Doc2VecC
Subjects: Algorithms / Classification / Deletion / Documents / Effectiveness / Efficiency / Methods / Natural language processing / Neural networks / Representations / Semantics / Statistical methods / Words (language)
2024
Journal Article
Overview
The main purpose of document vectorization is to represent the words of a document as a series of vectors that express the document's semantics. Whether in Chinese or English, words are the most basic units of text processing. The effectiveness of natural language processing tasks is highly correlated with the document vector representation method. Document vectorization methods include statistical methods and neural network-based methods. However, many document vectorization methods are generic: they do not distinguish between long and short texts, or between English and Chinese usage scenarios, which leads to unsatisfactory document classification results. To address the document feature loss caused by the random deletion method in the Doc2VecC model, this paper develops a PV-IDF model with enhanced features and proposes inverse document frequency as an important indicator for the candidate word deletion strategy. This speeds up model training and improves the effectiveness of document classification. The experimental data show that the PV-IDF model with enhanced features performs better for both long and short documents, as well as for English and Chinese documents, and has important advantages in algorithm execution efficiency and error rate, particularly for short documents. The proposed method outperforms the Doc2VecC model on each of the five classification evaluation indicators; its average error rate for short document classification is 41% lower than that of the Doc2VecC model and 45.2% lower than that of the PV-DM model. Whereas the Doc2VecC model is highly efficient only on small-scale datasets, the PV-IDF model trains efficiently on datasets of varying scale, outperforming the comparison approach. As a result, the proposed method can provide high-quality vector representations for documents of varying length and enhance the effectiveness of related operations.
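The difference between uniform random deletion and an IDF-informed deletion strategy can be illustrated with a small sketch. This is not the paper's PV-IDF algorithm: the abstract does not give the exact deletion rule, so the linear mapping from IDF to drop probability in `idf_guided_deletion`, the smoothed IDF formula, and all function names here are assumptions chosen for illustration. The sketch also assumes every word in a document appears in the IDF table.

```python
import math
import random
from collections import Counter


def inverse_document_frequency(docs):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def random_deletion(doc, q, rng):
    """Baseline: drop every word independently with the same probability q."""
    return [w for w in doc if rng.random() >= q]


def idf_guided_deletion(doc, idf, q, rng):
    """Hypothetical rule: drop probability falls linearly from q (lowest IDF)
    to 0 (highest IDF), so rare, informative words survive more often."""
    lo, hi = min(idf.values()), max(idf.values())
    span = hi - lo
    kept = []
    for w in doc:
        drop_prob = q * (hi - idf[w]) / span if span > 0 else q
        if rng.random() >= drop_prob:
            kept.append(w)
    return kept


docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = inverse_document_frequency(docs)
rng = random.Random(0)
corrupted = idf_guided_deletion(docs[1], idf, q=1.0, rng=rng)
```

With `q=1.0` in this toy corpus, the word "the" (present in every document, lowest IDF) is always deleted and "dog" (present in one document, highest IDF) is always kept, whereas uniform random deletion at the same setting would remove every word regardless of how informative it is.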
Publisher
Springer Nature B.V.