Catalogue Search | MBRL

Track changes : a literary history of word processing

by Kirschenbaum, Matthew G., author in Word processing -- History; Writing -- Technological innovations; Creation (Literary, artistic, etc.) -- Technological innovations

\"The story of writing in the digital age is every bit as messy as the ink-stained rags that littered the floor of Gutenberg's print shop or the hot molten lead of the linotype machine. During the period of the pivotal growth and widespread adoption of word processing as a writing technology, some authors embraced it as a marvel while others decried it as the death of literature. The product of years of archival research and numerous interviews conducted by the author, Track Changes is the first literary history of word processing. Matthew Kirschenbaum examines how the interests and ideals of creative authorship came to coexist with the computer revolution. Who were the first adopters? What kind of anxieties did they share? Was word processing perceived as just a better typewriter or something more? How did it change our understanding of writing? Track Changes balances the stories of individual writers with a consideration of how the seemingly ineffable act of writing is always grounded in particular instruments and media, from quills to keyboards. Along the way, we discover the candidates for the first novel written on a word processor, explore the surprisingly varied reasons why writers of both popular and serious literature adopted the technology, trace the spread of new metaphors and ideas from word processing in fiction and poetry, and consider the fate of literary scholarship and memory in an era when the final remnants of authorship may consist of folders on a hard drive or documents in the cloud.\"--Provided by publisher.

Book

Historical representations of social groups across 200 years of word embeddings from Google Books

by Caliskan, Aylin; Charlesworth, Tessa E. S.; Banaji, Mahzarin R. in Female; History, 19th Century; History, 20th Century

2022

Using word embeddings from 850 billion words in English-language Google Books, we provide an extensive analysis of historical change and stability in social group representations (stereotypes) across a long timeframe (from 1800 to 1999), for a large number of social group targets (Black, White, Asian, Irish, Hispanic, Native American, Man, Woman, Old, Young, Fat, Thin, Rich, Poor), and their emergent, bottom-up associations with 14,000 words and a subset of 600 traits. The results provide a nuanced picture of change and persistence in stereotypes across 200 years. Change was observed in the top-associated words and traits: Whether analyzing the top 10 or 50 associates, at least 50% of top associates changed across successive decades. Despite this changing content of top-associated words, the average valence (positivity/negativity) of these top stereotypes was generally persistent. Ultimately, through advances in the availability of historical word embeddings, this study offers a comprehensive characterization of both change and persistence in social group representations as revealed through books of the English-speaking world from 1800 to 1999.

Journal Article

Evolution of research topics in LIS between 1996 and 2019: an analysis based on latent Dirichlet allocation topic model

by Han, Xiaoyao in Allocation; Applications programs; Bibliographic coupling

2020

This study investigated the evolution of library and information science (LIS) by analyzing research topics in LIS journal articles. The analysis is divided into five periods covering the years 1996–2019. Latent Dirichlet allocation modeling was used to identify underlying topics based on 14,035 documents. An improved data-selection method was devised in order to generate a dynamic journal list that included influential journals for each period. Results indicate that (a) library science has become less prevalent over time, as there are no top topic clusters relevant to library issues since the period 2000–2005; (b) bibliometrics, especially citation analysis, is highly stable across periods, as reflected by the stable subclusters and consistent keywords; and (c) information retrieval has consistently been the dominant domain with interests gradually shifting to model-based text processing. Information seeking and behavior is also a stable field that tends to be dispersed among various topics rather than presented as its own subject. Information systems and organizational activities have been continuously discussed and have developed a closer relationship with e-commerce. Topics that occurred only once have undergone a change of technological context from the networks and Internet to social media and mobile applications.

Journal Article

The Revised Hierarchical Model: A critical review and assessment

by Kroll, Judith F.; Van Hell, Janet G.; Tokowicz, Natasha in Acknowledgment; Bilingualism; Cognition & reasoning

2010

Brysbaert and Duyck (this issue) suggest that it is time to abandon the Revised Hierarchical Model (Kroll and Stewart, 1994) in favor of connectionist models such as BIA+ (Dijkstra and Van Heuven, 2002) that more accurately account for the recent evidence on non-selective access in bilingual word recognition. In this brief response, we first review the history of the Revised Hierarchical Model (RHM), consider the set of issues that it was proposed to address and then evaluate the evidence that supports and fails to support the initial claims of the model. Although fifteen years of new research findings require a number of revisions to the RHM, we argue that the central issues to which the model was addressed, the way in which new lexical forms are mapped to meaning and the consequence of language learning history for lexical processing, cannot be accounted for solely within models of word recognition.

Journal Article

On the fractal patterns of language structures

by Bernardes, Américo Tristão; Mello, Heliana; Ribeiro, Leonardo Costa in Algorithms; Analysis; Arabic language

2023

Natural Language Processing (NLP) makes use of Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. It has several applications, from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify similar structural linguistic patterns among several different languages. We apply the word2vec algorithm, which creates a vector representation for the words in a multidimensional space that maintains the meaning relationship between the words. From a large corpus we built this vectorial representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. Then, we calculated the fractal dimensions of the structure that represents each language. The structures are multi-fractals with two different dimensions that we use, in addition to the token-dictionary size rate of the languages, to represent the languages in a three-dimensional space. Finally, analyzing the distance among languages in this space, we conclude that the closeness there tends to be related to the distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.

Journal Article

Normalized dataset for Sanskrit word segmentation and morphological parsing

by Kulkarni, Amba; Krishnan, Sriram; Huet, Gérard in Ancient languages; Computational Linguistics; Computer Science

2025

Sanskrit processing has seen a surge in the use of data-driven approaches over the past decade. Various tasks such as segmentation, morphological parsing, and dependency analysis have been tackled through the development of state-of-the-art models despite working with relatively limited datasets compared to other languages. However, a significant challenge lies in the availability of annotated datasets that are lexically, morphologically, syntactically, and semantically tagged. While syntactic and semantic tags are preferable for later stages of processing such as sentential parsing and disambiguation, lexical and morphological tags are crucial for low-level tasks of word segmentation and morphological parsing. The Digital Corpus of Sanskrit (DCS) is one notable effort that hosts over 650,000 lexically and morphologically tagged sentences from around 250 texts but also comes with its limitations at different levels of a sentence like chunk, segment, stem and morphological analysis. To overcome these limitations, we look at alternatives such as Sanskrit Heritage Segmenter (SH) and Saṃsādhanī tools, that provide information complementing DCS’ data. This work focuses on enriching the DCS dataset by incorporating analyses from SH, thereby creating a dataset that is rich in lexical and morphological information. Furthermore, this work also discusses the impact of such datasets on the performances of existing segmenters, specifically the Sanskrit Heritage Segmenter.

Journal Article

A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records

by Dong, Xishuang; Li, Xiangfang; Qian, Lijun in Algorithms; Bioinformatics; Biomedical and Life Sciences

2018

Background: An Electronic Medical Record (EMR) comprises patients’ medical information gathered by medical staff to provide better health care. Named Entity Recognition (NER) is a sub-field of information extraction aimed at identifying specific entity terms such as disease, test, symptom, genes etc. NER can help healthcare providers and medical specialists extract useful information automatically and avoid unnecessary and unrelated information in EMR. However, the limited availability of EMR resources poses a great challenge for mining entity terms. Therefore, a multitask bi-directional RNN model is proposed here as a potential data augmentation solution to enhance NER performance with limited data. Methods: A multitask bi-directional RNN model is proposed for extracting entity terms from Chinese EMR. The proposed model can be divided into a shared layer and a task-specific layer. First, the vector representation of each word is obtained as a concatenation of word embedding and character embedding. Then a bi-directional RNN is used to extract context information from the sentence. After that, all these layers are shared by two different task layers, namely the parts-of-speech tagging task layer and the named entity recognition task layer. These two task layers are trained alternately so that the knowledge learned from the named entity recognition task can be enhanced by the knowledge gained from the parts-of-speech tagging task. Results: The performance of our proposed model has been evaluated in terms of micro average F-score, macro average F-score and accuracy. It is observed that the proposed model outperforms the baseline model in all cases. For instance, experimental results on the discharge summaries show that the micro average F-score and the macro average F-score are improved by 2.41 percentage points and 4.16 percentage points, respectively, and the overall accuracy is improved by 5.66 percentage points.
Conclusions: In this paper, a novel multitask bi-directional RNN model is proposed for improving the performance of named entity recognition in EMR. Evaluation results using real datasets demonstrate the effectiveness of the proposed model.

Journal Article

Towards a historical dictionary for Arabic language

by Hadrich Belguith, Lamia; Laatar, Rim; Aloulou, Chafik in Arabic language; Artificial Intelligence; Bibliographic literature

2022

A historical dictionary is a language dictionary which studies the evolution of the construction of words and their meanings through the chronological stages the language has undergone. However, despite its richness, Arabic does not yet have a historical dictionary which helps to monitor its semantic development throughout history and to understand its knowledge and scientific heritage properly. The creation of such a dictionary should pass through several stages and requires a lot of effort. The most important step consists in determining a particular word sense in a given context, which basically identifies the relevant meaning of a word according to the historical period in which it appeared. In this work, we suggest a dictionary of meanings that allows us to capture the semantic evolution of each Arabic word by focusing on the date of its first appearance and the way its meanings have evolved. Our method aims to provide relevant information about the oldest date of use of a given word, as well as its meanings, users, and sources. One of the major contributions of this paper is to determine the meaning of Arabic words according to where they appeared, based on precise documents and contexts.

Journal Article

Recognizing two dialects in one written form: A Stroop study

by van Heuven, Vincent J.; Schiller, Niels O.; Wu, Junru in Asian cultural groups; Asian History; Bilingualism

2024

This study aims to examine the influence of dialectal experience on logographic visual word recognition. Two groups of Chinese monolectals and three groups of Chinese bi-dialectals performed Stroop color-naming in Standard Chinese (SC), and two of the bi-dialectal groups also in their regional dialects. The participant groups differed in dialectal experiences. The ink-character relation was manipulated in semantics, segments, and tones separately, as congruent, competing, or different, yielding ten Stroop conditions for comparison. All the groups showed Stroop interference for the conditions of segmental competition, as well as evidence for semantic activation by the characters. Bi-dialectal experience, even receptive, could benefit conflict resolution in the Stroop task. Chinese characters can automatically activate words in both dialects. Comparing naming in Standard Chinese and naming in the bi-dialectals’ regional dialects, still, a regional-dialect disadvantage suggests that the activation is biased with literacy and lexico-specific inter-dialectal relations.

Journal Article

Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis

by Gu, Dongxiao; Li, Min; Yang, Xuejie in Activities of daily living; Age groups; Allergens

2024

Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information. This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR? This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR. Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. 
We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body's immune system and lead to the development of allergies. Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR.

Journal Article
