15 result(s) for "Chinese language Data processing History."
The Chinese computer : a global history of the information age
"Exploration of the largely unknown history of Chinese-language computing systems, accessible to an audience unfamiliar with the Chinese language or the technical workings of personal computers"-- Provided by publisher.
On the fractal patterns of language structures
Natural Language Processing (NLP) makes use of Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. It has several applications, from sentiment analysis and text summary to automatic language translation. In this work, we use NLP to figure out similar structural linguistic patterns among several different languages. We apply the word2vec algorithm that creates a vector representation for the words in a multidimensional space that maintains the meaning relationship between the words. From a large corpus we built this vectorial representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. Then, we calculated the fractal dimensions of the structure that represents each language. The structures are multi-fractals with two different dimensions that we use, in addition to the token-dictionary size rate of the languages, to represent the languages in a three-dimensional space. Finally, analyzing the distance among languages in this space, we conclude that the closeness there is tendentially related to the distance in the Phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
High-resolution climate reconstruction from historical Chinese weather records using optimized natural language processing
Reconstructing high-resolution climate data from historical documents is hindered by subjectivity and a lack of standardization. This study develops and validates a novel framework to overcome these challenges. In this paper, a historical weather classification lexicon is constructed by optimizing natural language processing (NLP) techniques. Leveraging semantic clustering and dynamic expansion, this lexicon effectively captures the linguistic diversity associated with weather events across different regions and intensity levels. Building on this lexicon, we propose a multi-dimensional index system to quantify historical weather grades. This system includes indicators such as weather intensity, agricultural impact, economic impact, social impact, and population casualties. For each indicator, scientific and objective weights are assigned using the entropy method combined with expert judgment. To validate the effectiveness of our approach, we extracted low-temperature weather records from historical documents of Guangdong and Hebei provinces in China. The results show that the overall trend of low-temperature weather in these two provinces is consistent with existing research on climate change during the Qing Dynasty. Moreover, the provincial trend maps reveal not only synchronous change patterns but also significant regional differences. A Random Forest model was employed to validate our index, achieving a classification accuracy of 94.0%, with Area Under the Curve (AUC) scores exceeding 0.98 for low-grade events. This data-driven methodology offers a replicable and scalable tool for converting qualitative historical narratives into high-resolution quantitative climate data, thereby enhancing our understanding of past climate variability and its societal impacts.
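The abstract above mentions weighting indicators with the entropy method. As a rough illustration only, the following is a minimal sketch of the standard entropy weight calculation, not the authors' implementation: the function name `entropy_weights` and the sample scores are hypothetical, and the expert-judgment adjustment described in the abstract is not modelled. It assumes at least two records, non-negative scores, and at least one indicator that varies across records.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column of a non-negative indicator matrix.

    Rows are records, columns are indicators. Indicators whose values are
    spread more unevenly across records receive larger weights.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    k = 1.0 / math.log(n_rows)  # scales each entropy into [0, 1]
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Share of each record within indicator j; 0 * log(0) is taken as 0.
        shares = [value / total for value in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical scores for three records on three indicators
# (illustrative numbers only, not data from the study).
sample = [
    [1.0, 2.0, 5.0],
    [1.0, 4.0, 0.5],
    [1.0, 6.0, 0.5],
]
print(entropy_weights(sample))
```

In this sketch an indicator that takes the same value for every record carries no information and ends up with a weight of approximately zero, while the most unevenly distributed indicator receives the largest weight.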
Tracing the geopolitical influences on the morphological and functional transformation in Guangdong merchant ships: Knowledge mining from the Ming and Qing maritime archives
Although the institutional history of ancient Chinese maritime trade has been extensively documented, the functional evolution of maritime vessels and their underlying drivers remains underexplored. Recent studies have moved beyond political explanations to explore the interplay of economic and technological dynamics. Using KH Coder for text mining, this study applies word frequency analysis and co-occurrence network modeling to investigate the geopolitical factors shaping the morphological evolution of Guangdong merchant ships in the Ming and Qing dynasties. A visual-comparative analysis further assesses the functional attributes of three representative ship types. Findings reveal that economic and military imperatives were the primary determinants of ship design, with political and geographic factors exerting secondary but supportive influence. For instance, increased piracy threats in the South China Sea prompted structural reinforcements for defensive purposes, while policy shifts under the Canton System encouraged hull designs optimized for high-capacity, long-distance trade. Guangdong’s maritime development was shaped largely by its strategic location and shipbuilding technologies. Ming-era vessels, constructed from teak and cedar, featured brightly painted, flat-bottomed hulls with elevated, streamlined prows. Qing-era ships employed lightweight alloys, muted color schemes, and reinforced double-planked hulls to enhance seaworthiness, while bow structures evolved into sharper and more angular forms. As Guangdong’s maritime trade transitioned from coastal routes to long-distance transoceanic networks—particularly with Europe—its ship design shifted progressively from broad and bulky to agile and eventually more durable configurations. 
These morphological transformations reflected not only external pressures, such as maritime security concerns and trade expansion, but also internal drivers, including institutional reforms and policy realignments that significantly influenced vessel design. This study contributes to the technical dimension of maritime historiography by emphasizing the merchant ship as an analytical nexus of institutional logic, technological systems, and geopolitical conditions. It offers both theoretical insight and methodological innovation for understanding the mechanisms behind ship design evolution and the spatial organization of premodern Chinese maritime networks.
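The word frequency analysis and co-occurrence network modeling named in the abstract above rest on simple counting. The sketch below shows that counting step under stated assumptions: it is not KH Coder's interface, the function name `cooccurrence` and the sample tokens are hypothetical, and the input is assumed to be already tokenized (Chinese archival text would first need word segmentation).

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count term frequencies and within-document term pairs.

    `documents` is a list of token lists. Returns (term_freq, pair_freq),
    where pair_freq keys are alphabetically ordered 2-tuples of terms.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for tokens in documents:
        term_freq.update(tokens)
        # Each unordered pair of distinct terms counts once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_freq[pair] += 1
    return term_freq, pair_freq


# Hypothetical pre-tokenized passages (illustrative only).
docs = [
    ["ship", "hull", "trade"],
    ["ship", "trade"],
    ["hull", "piracy"],
]
terms, pairs = cooccurrence(docs)
print(terms.most_common(2))
print(pairs.most_common(1))
```

The pair counts can serve as edge weights in a co-occurrence network, with the terms as nodes; thresholding low counts is a common way to keep such a graph readable.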
What do the differences and commonalities in doctoral dissertation acknowledgments across disciplines reveal?
Acknowledgments in academic dissertations occupy a unique role within scholarly communication. Prior research has investigated acknowledgments through lenses such as funding attribution, genre analysis, and linguistic features. This study examines acknowledgments in doctoral dissertations from Chinese universities, organized by broad disciplinary categories. Utilizing BERTopic modeling, the research identifies topic keywords embedded within dissertation acknowledgments. Furthermore, computational linguistics techniques are employed to quantitatively evaluate the content and stylistic attributes of these acknowledgments, complemented by hierarchical clustering analysis to explore cross-disciplinary similarities. The topic modeling results indicate that acknowledgments by Chinese doctoral students frequently convey emotional reflections and exhibit distinct disciplinary traits. Additionally, hierarchical clustering shows that disciplines with similar characteristics exhibit greater similarity in the content and writing style of their acknowledgments, indicating that academic training influences researchers’ writing to some degree. This study seeks to catalyze further scholarly inquiry into this domain, advocating for expanded investigations from perspectives including psychology, neuroscience, and cross-cultural studies.
Construction of Cultural Heritage Knowledge Graph Based on Graph Attention Neural Network
To address the challenges posed by the vast and complex knowledge information in cultural heritage design, such as low knowledge retrieval efficiency and limited visualization, this study proposes a method for knowledge extraction and knowledge graph construction based on graph attention neural networks (GAT). Using Tang Dynasty gold and silver artifacts as samples, we establish a joint knowledge extraction model based on GAT. The model employs the BERT pretraining model to encode collected textual knowledge data, conducts sentence dependency analysis, and utilizes GAT to allocate weights among entities, thereby enhancing the identification of target entities and their relationships. Comparative experiments on public datasets demonstrate that this model significantly outperforms baseline models in extraction effectiveness. Finally, the proposed method is applied to the construction of a knowledge graph for Tang Dynasty gold and silver artifacts. Taking the Gilded Musician Pattern Silver Cup as an example, this method provides designers with a visualized and interconnected knowledge collection structure.
Chinese Named Entity Recognition Method in History and Culture Field Based on BERT
With the rapid development of the Internet, people have undergone tremendous changes in the way they obtain information. In recent years, the knowledge graph has become a popular tool for the public to acquire knowledge. For knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text data. However, the traditional named entity recognition method has certain defects, and it is easy to ignore the association between entities. To extract entities from a large amount of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model combining Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model is used to encode a single character to obtain a vector representation corresponding to each character. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer is applied to semantically encode the input text. Finally, the label with the highest probability is output through the Conditional Random Field (CRF) layer to obtain each character’s category. This model uses the Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model to replace the static word vectors trained in the traditional way. In comparison, the BERT pre-trained language model can dynamically generate semantic vectors according to the context of words, which improves the representation ability of word vectors. The experimental results show that the model proposed in this paper achieves excellent results in the task of named entity recognition in the field of historical culture. Compared with existing named entity recognition methods, the precision rate, recall rate, and F1 value are significantly improved.
A Chinese ancient book digital humanities research platform to support digital humanities research
Purpose With the rapid development of digital humanities, some digital humanities platforms have been successfully developed to support digital humanities research for humanists. However, most of them have still not provided a friendly digital reading environment and practicable social network analysis tool to support humanists on interpreting texts and exploring characters’ social network relationships. Moreover, the advancement of digitization technologies for the retrieval and use of Chinese ancient books is arising an unprecedented challenge and opportunity. For these reasons, this paper aims to present a Chinese ancient books digital humanities research platform (CABDHRP) to support historical China studies. In addition to providing digital archives, digital reading, basic search and advanced search functions for Chinese ancient books, this platform still provides two novel functions that can more effectively support digital humanities research, including an automatic text annotation system (ATAS) for interpreting texts and a character social network relationship map tool (CSNRMT) for exploring characters’ social network relationships. Design/methodology/approach This study adopted DSpace, an open-source institutional repository system, to serve as a digital archives system for archiving scanned images, metadata, and full texts to develop the CABDHRP for supporting digital humanities (DH) research. Moreover, the ATAS developed in the CABDHRP used the Node.js framework to implement the system’s front- and back-end services, as well as application programming interfaces (APIs) provided by different databases, such as China Biographical Database (CBDB) and TGAZ, used to retrieve the useful linked data (LD) sources for interpreting ancient texts. Also, Neo4j which is an open-source graph database management system was used to implement the CSNRMT of the CABDHRP. 
Finally, JavaScript and jQuery were applied to develop a monitoring program embedded in the CABDHRP to record the use processes from humanists based on xAPI (experience API). To understand the research participants’ perception when interpreting the historical texts and characters’ social network relationships with the support of ATAS and CSNRMT, semi-structured interviews with 21 research participants were conducted. Findings An ATAS embedded in the reading interface of CABDHRP can collect resources from different databases through LD for automatically annotating ancient texts to support digital humanities research. It allows the humanists to refer to resources from diverse databases when interpreting ancient texts, as well as provides a friendly text annotation reader for humanists to interpret ancient text through reading. Additionally, the CSNRMT provided by the CABDHRP can semi-automatically identify characters’ names based on Chinese word segmentation technology and humanists’ support to confirm and analyze characters’ social network relationships from Chinese ancient books based on visualizing characters’ social networks as a knowledge graph. The CABDHRP not only can stimulate humanists to explore new viewpoints in a humanistic research, but also can promote the public to emerge the learning interest and awareness of Chinese ancient books. Originality/value This study proposed a novel CABDHRP that provides the advanced features, including the automatic word segmentation of Chinese text, automatic Chinese text annotation, semi-automatic character social network analysis and user behavior analysis, that are different from other existed digital humanities platforms. Currently, there is no this kind of digital humanities platform developed for humanists to support digital humanities research.
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records
Objective Pituitary adenomas are the most common type of pituitary disorders, which usually occur in young adults and often affect the patient’s physical development, labor capacity and fertility. Clinical free texts noted in electronic medical records (EMRs) of pituitary adenomas patients contain abundant diagnosis and treatment information. However, this information has not been well utilized because of the challenge to extract information from unstructured clinical texts. This study aims to enable machines to intelligently process clinical information, and automatically extract clinical named entity for pituitary adenomas from Chinese EMRs. Methods The clinical corpus used in this study was from one pituitary adenomas neurosurgery treatment center of a 3A hospital in China. Four types of fine-grained texts of clinical records were selected, which included notes from present illness, past medical history, case characteristics and family history of 500 pituitary adenoma inpatients. The dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF) were used to extract clinical entities from a Chinese EMRs corpus. A comprehensive dictionary was constructed based on open source vocabularies and a domain dictionary for pituitary adenomas to conduct the dictionary-based matching method. We selected features such as part of speech, radical, document type, and the position of characters to train the CRF-based model. Random character embeddings and the character embeddings pretrained by BERT were used respectively as the input features for the BiLSTM-CRF model and the BERT-BiLSTM-CRF model. Both strict metric and relaxed metric were used to evaluate the performance of these methods. 
Results Experimental results demonstrated that the deep learning and other machine learning methods were able to automatically extract clinical named entities, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas from Chinese EMRs. With regard to overall performance, BERT-BiLSTM-CRF has the highest strict F1 value of 91.27% and the highest relaxed F1 value of 95.57% respectively. Additional evaluations showed that BERT-BiLSTM-CRF performed best in almost all entity recognition except surgery and disease course. BiLSTM-CRF performed best in disease course entity recognition, and performed as well as the CRF model for part of speech, radical and document type features, with both strict and relaxed F1 value reaching 96.48%. The CRF model with part of speech, radical and document type features performed best in surgery entity recognition with relaxed F1 value of 95.29%. Conclusions In this study, we conducted four entity recognition methods for pituitary adenomas based on Chinese EMRs. It demonstrates that the deep learning methods can effectively extract various types of clinical entities with satisfying performance. This study contributed to the clinical named entity extraction from Chinese neurosurgical EMRs. The findings could also assist in information extraction in other Chinese medical texts.
Text Analysis and Visualization Research on the Hetu Dangse During the Qing Dynasty of China
In traditional historical research, interpreting historical documents subjectively and manually causes problems such as one-sided understanding, selective analysis, and one-way knowledge connection. In this study, we aim to use machine learning to automatically analyze and explore historical documents from a text analysis and visualization perspective. This technology solves the problem of large-scale historical data analysis that is difficult for humans to read and intuitively understand. In this study, we use the historical documents of the Qing Dynasty Hetu Dangse, preserved in the Archives of Liaoning Province, as data analysis samples. China’s Hetu Dangse is the largest Qing Dynasty thematic archive with Manchu and Chinese characters in the world. Through word frequency analysis, correlation analysis, co-word clustering, word2vec model, and SVM (Support Vector Machines) algorithms, we visualize historical documents, reveal the relationships between functions of the government departments in the Shengjing area of the Qing Dynasty, achieve the automatic classification of historical archives, improve the efficient use of historical materials as well as build connections between historical knowledge. Through this, archivists can be guided practically in historical materials’ management and compilation.