451 results for "Chinese language Data processing."
The Chinese computer : a global history of the information age
\"Exploration of the largely unknown history of Chinese-language computing systems, accessible to an audience unfamiliar with the Chinese language or the technical workings of personal computers\"-- Provided by publisher.
HowNet and the computation of meaning
It is widely acknowledged that natural language processing, as an indispensable means for information technology, requires the strong support of world knowledge as well as linguistic knowledge. This book is a theoretical exploration into the extra-linguistic knowledge needed for natural language processing and a panoramic description of HowNet as a case study.
Automatic noun phrase extraction from full Chinese text
In this thesis, a new statistics-based partial parser, CNPext, for the extraction of maximal-length noun phrases in Chinese is presented. Given Chinese running text as input, the CNPext system performs (1) noun phrase boundary determination and (2) ambiguity resolution for relative clause and prepositional phrase modifiers. The noun phrase extraction module consists of two stages: it first finds all boundary candidates, and then pairs the opening and ending candidates to form the final noun phrases. Our system is superior to other noun phrase extraction systems in that it can resolve structural ambiguities, a problem faced by many natural language processing systems; others simply fail because they cannot handle the ambiguities introduced by relative clause and prepositional phrase modifiers. However, our experiments showed that purely statistics-based approaches with part-of-speech tags are not adequate for this purpose; semantic information at a higher level is needed. Our proposed algorithm uses the semantic class relation between a verb-noun (or preposition-noun) pair, derived from the standard Chinese thesaurus, to determine which phrase structure is more semantically acceptable. Our work is the first comprehensive attempt at automatic Chinese noun phrase extraction. It not only proposes an effective way to extract noun phrases automatically from large running texts but also gives an impetus to work in similar areas, e.g., verb phrase extraction. Exploring effective methods for a complete noun phrase extraction system for Chinese is a challenging exercise. We hope this project provides some insight, if not complete solutions, and enables the development of advanced, practical Chinese information processing systems to handle the ever-growing volume of information.
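The two-stage idea the abstract describes (propose boundary candidates, then pair openers with closers into maximal-length spans) can be sketched in a few lines. The POS tag set and the pairing rule below are illustrative assumptions, not CNPext's actual statistical model:

```python
def maximal_nps(tags):
    """Maximal contiguous runs of NP-eligible POS tags that end on a noun.

    Toy stand-in for the candidate-finding + pairing stages: a run of
    NP-eligible tags opens a span, and the span is closed at the last
    noun before a non-NP tag.
    """
    NP_TAGS = {"DET", "ADJ", "NOUN"}   # assumed NP-eligible tags
    spans, start = [], None
    for i, tag in enumerate(tags + ["EOS"]):   # sentinel flushes last run
        if tag in NP_TAGS:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            while end >= start and tags[end] != "NOUN":
                end -= 1                       # trim so the span ends on a noun
            if end >= start:
                spans.append((start, end))
            start = None
    return spans

tags = ["DET", "ADJ", "NOUN", "VERB", "NOUN", "NOUN"]
print(maximal_nps(tags))  # [(0, 2), (4, 5)]
```

Note that this toy version cannot resolve the relative-clause and prepositional-phrase attachment ambiguities the thesis targets; that is exactly where its thesaurus-derived semantic class relations come in.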
On the fractal patterns of language structures
Natural Language Processing (NLP) uses Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. It has several applications, from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify similar structural linguistic patterns across several different languages. We apply the word2vec algorithm, which creates a vector representation of words in a multidimensional space that preserves the meaning relationships between them. From a large corpus we built this vector representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. Then, we calculated the fractal dimensions of the structure that represents each language. The structures are multifractals with two distinct dimensions, which we use, together with the token-to-dictionary size ratio of each language, to represent the languages in a three-dimensional space. Finally, analyzing the distances among languages in this space, we conclude that closeness there tends to correlate with distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
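The fractal-dimension step can be illustrated with a standard correlation-dimension (Grassberger-Procaccia) estimate. The 100-dimensional word2vec cloud is replaced here by a synthetic filled 2-D cloud (so the expected dimension is about 2); the radius range and sample size are illustrative choices, not the paper's settings:

```python
import numpy as np

def correlation_dimension(points, radii):
    """Grassberger-Procaccia estimate: slope of log C(r) vs log r,
    where C(r) is the fraction of point pairs closer than r."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dists = d[np.triu_indices(n, k=1)]          # unique pairwise distances
    counts = np.array([(dists < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
cloud = rng.uniform(size=(800, 2))              # filled 2-D square
radii = np.geomspace(0.05, 0.2, 6)
print(correlation_dimension(cloud, radii))      # slope near 2 for a filled plane
```

For a genuinely fractal set (or, per the paper, a multifractal embedding cloud), the log-log slope departs from the ambient dimension, which is the quantity being compared across languages.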
Using Interactive Virtual Reality Tools in an Advanced Chinese Language Class: a Case Study
This case study explored college students’ use of interactive virtual reality tools (Google Cardboard and Expeditions) for learning Chinese as a foreign language. Specifically, the purpose of the study was to probe into students’ perceived benefits and challenges of using VR tools for Chinese language and culture learning. Twelve students were paired and role-played as virtual tour guides for six locations throughout a semester. Every two weeks, each dyad studied a particular Chinese tourist attraction or location and presented orally in Chinese as virtual tour guides by using the VR tools. Data collection included class observations of all presentations by each dyad, 24 reflections (two per participant, after the first and fifth presentations), and individual follow-up interviews. The study indicated that the real-life view VR tools offered an authentic context for Chinese language learning, sparked interest in the virtually presented locales, and encouraged students to further explore the target culture.
Prosody Dominates Over Semantics in Emotion Word Processing: Evidence From Cross-Channel and Cross-Modal Stroop Effects
Purpose: Emotional speech communication involves multisensory integration of linguistic (e.g., semantic content) and paralinguistic (e.g., prosody and facial expressions) messages. Previous studies on linguistic versus paralinguistic salience effects in emotional speech processing have produced inconsistent findings. In this study, we investigated the relative perceptual saliency of emotion cues in cross-channel auditory alone task (i.e., semantics-prosody Stroop task) and cross-modal audiovisual task (i.e., semantics-prosody-face Stroop task). Method: Thirty normal Chinese adults participated in two Stroop experiments with spoken emotion adjectives in Mandarin Chinese. Experiment 1 manipulated auditory pairing of emotional prosody (happy or sad) and lexical semantic content in congruent and incongruent conditions. Experiment 2 extended the protocol to cross-modal integration by introducing visual facial expression during auditory stimulus presentation. Participants were asked to judge emotional information for each test trial according to the instruction of selective attention. Results: Accuracy and reaction time data indicated that, despite an increase in cognitive demand and task complexity in Experiment 2, prosody was consistently more salient than semantic content for emotion word processing and did not take precedence over facial expression. While congruent stimuli enhanced performance in both experiments, the facilitatory effect was smaller in Experiment 2. Conclusion: Together, the results demonstrate the salient role of paralinguistic prosodic cues in emotion word processing and congruence facilitation effect in multisensory integration. Our study contributes tonal language data on how linguistic and paralinguistic messages converge in multisensory speech processing and lays a foundation for further exploring the brain mechanisms of cross-channel/modal emotion integration with potential clinical applications.
Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study
The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text. The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner. We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models' outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text. The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. 
We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts. The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI.
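The study's point that an LLM-based deidentifier "can be controlled and enhanced through natural language prompts" can be made concrete with a prompt-construction sketch. The instruction wording, PHI tag set, and output format below are assumptions for illustration; the study's actual prompts are not reproduced here, and no model call is made:

```python
# Hypothetical PHI categories for the sketch; the study uses its own tag set.
PHI_TAGS = ["NAME", "DATE", "ID", "HOSPITAL", "LOCATION"]

def build_deid_prompt(text):
    """Assemble a zero-shot deidentification prompt for a code-mixed note."""
    tags = ", ".join(PHI_TAGS)
    return (
        "You are a clinical de-identification assistant.\n"
        f"Replace every protected health information span ({tags}) "
        "in the note below with its tag in square brackets. The note may "
        "mix Chinese and English; tag PHI in either language. Return only "
        "the rewritten note.\n\n"
        f"Note:\n{text}"
    )

print(build_deid_prompt("病人 Wang Xiaoming 於 2023-01-05 入院。"))
```

As the abstract cautions, prompts like this must be crafted carefully to avoid unwanted output, and sending real notes to an external model raises the data-security concerns the authors flag.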
ECBTNet: English-Foreign Chinese intelligent translation via multi-subspace attention and hyperbolic tangent LSTM
The translation and sharing of languages around the world has become a necessary precondition for the movement of people. Teaching Chinese as a foreign language (TCFL) serves the international function of spreading national culture, so how to translate Chinese as a foreign language into English has become an important task. Machine translation has moved beyond the realm of theory to practical use as a result of advances in computing, and deep learning, a prominent and relatively young subfield of machine learning, has shown promising results in a variety of fields. This paper aims to develop a TCFL-oriented English-Chinese neural machine translation model. First, it proposes a hyperbolic tangent long short-term memory network (HTLSTM), which integrates future and historical information to extract richer contextual semantic information. Second, it proposes a multi-subspace attention mechanism (MSATT) that integrates multiple attention calculation functions. Third, it combines HTLSTM with MSATT to construct an English-Chinese bilingual neural translation model called ECBTNet. The multi-subspace attention maps the hidden state of the HTLSTM to multiple subspaces and applies different attention calculation functions in different subspaces when computing attention scores, extracting omnidirectional contextual features and yielding accurate attention results. Finally, a systematic experiment is carried out, and the experimental data verify the feasibility of applying ECBTNet to English-Chinese translation in TCFL.
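The multi-subspace attention idea (project the hidden states into several subspaces and use a different score function in each, then concatenate the per-subspace contexts) can be sketched in numpy. The dimensions, the two score functions, and the random projections are assumptions for illustration; ECBTNet's actual projections and score functions are learned:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def subspace_attention(H, query, n_sub=2, seed=0):
    """H: (T, d) hidden states, query: (d,). Returns concatenated contexts."""
    rng = np.random.default_rng(seed)
    T, d = H.shape
    ds = d // n_sub
    score_fns = [
        lambda h, q: h @ q,                        # plain dot product
        lambda h, q: (h @ q) / np.sqrt(len(q)),    # scaled dot product
    ]
    contexts = []
    for k in range(n_sub):
        W = rng.standard_normal((d, ds)) / np.sqrt(d)   # subspace projection
        Hk, qk = H @ W, query @ W
        scores = np.array([score_fns[k % 2](h, qk) for h in Hk])
        a = softmax(scores)                        # attention weights in subspace k
        contexts.append(a @ Hk)                    # (ds,) weighted context
    return np.concatenate(contexts)

H = np.random.default_rng(1).standard_normal((5, 8))    # 5 steps, d=8
ctx = subspace_attention(H, query=H[-1])
print(ctx.shape)  # (8,)
```

The concatenated vector plays the role the paper assigns to the combined subspace contexts: each subspace contributes features scored by its own attention function before they are merged.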