Out-domain Chinese new word detection with statistics-based character embedding
by Zhu, Jia; Yang, Min; Yiu, S M; Liang, Yuzhi
in Artificial intelligence / Asian languages / Chinese languages / Corpus linguistics / Data quality / English language / Experiments / Japanese language / Languages / Machine learning / Methods / Neural networks / Personality / Quality / Segmentation / Short term memory / Social networks / Software / Speech / Statistics / Word boundaries
2019
Journal Article
Overview
Unlike English and other Western languages, many Asian languages such as Chinese and Japanese do not delimit words with spaces. Word segmentation and new word detection are therefore key steps in processing these languages. Chinese word segmentation can be treated as a tagging problem similar to part-of-speech (POS) tagging: a corpus is segmented by assigning each character a label that indicates its position in a word (e.g., “B” for the beginning of a word and “E” for the end). Chinese word segmentation appears to be well studied, and machine learning models such as conditional random fields (CRF) and bi-directional long short-term memory (LSTM) networks have shown outstanding performance on this task. However, segmentation accuracy drops significantly when the same approaches are applied to out-domain cases, in which high-quality in-domain training data are not available. One example of an out-domain application is new word detection in Chinese microblogs, for which high-quality corpora are scarce. In this paper, we focus on out-domain Chinese new word detection. We first design a new method, Edge Likelihood (EL), for Chinese word boundary detection. We then propose a domain-independent Chinese new word detector (DICND), in which each Chinese character is represented as a low-dimensional vector whose values are segmentation-related features of the character.
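The character-labeling view of segmentation described in the overview can be illustrated with a small sketch. It assumes the common four-tag BMES scheme ("B" beginning, "M" middle, "E" end, "S" single-character word); the overview itself names only "B" and "E", so the other two tags, the function names, and the sample sentence are illustrative assumptions and not taken from the paper. The sketch covers only the conversion between tags and words, not the EL method or the DICND detector.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one position tag per character.

    Assumed BMES scheme (only "B" and "E" are named in the overview).
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, any middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from per-character tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts, so flush any unfinished word first.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # This character closes the current word.
            words.append(current)
            current = ""
    if current:
        # Tolerate a tag sequence that ends without "E" or "S".
        words.append(current)
    return words


# Hypothetical example sentence: "我 / 爱 / 北京" ("I love Beijing").
segmented = ["我", "爱", "北京"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

In this framing, a tagger such as a CRF or a bi-directional LSTM predicts the tag sequence, and a decoding step like `tags_to_words` recovers the word boundaries.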