Catalogue Search | MBRL

Revisiting name recognition and candidate support: Experimental tests of the mere exposure hypothesis

by Green, Donald P. , Panagopoulos, Costas , Moniz, Philip in Acknowledgment , Candidates , Congressional elections

2025

Often lacking adequate information to guide their votes, voters may be susceptible to subtle psychological influences, including name recognition. For decades, scholars have found that voters are more likely to cast ballots for candidates whose names they recognize. These arguments imply that exposure to little-known candidates’ names increases electoral support. But research has seldom demonstrated a causal effect consistent with this “mere exposure” hypothesis, particularly under real-world conditions. We conduct three sets of experiments exposing subjects to the names of challengers in a range of electoral contexts across the United States. Results yield little support for the hypothesis that exposure increases electoral support. As name recognition may be insufficient without party labels, we also conduct experiments providing the candidates’ party affiliations, again finding little evidence of an effect. These findings cast doubt on the hypothesis that candidates, particularly challengers, who merely make their names known will thereby win more votes.

Journal Article

A spatially-aware algorithm for location extraction from structured documents

by Sharma, Praval , Joshi, Deepti , Soh, Leen-Kiat in Algorithms , Conditional random fields , Documents

2023

Place names facilitate locating and distinguishing the geographic spaces where human activities and natural phenomena occur. Extracting place names at multiple spatial resolutions from text benefits several tasks, such as identifying the locations of events, enriching gazetteers, and discovering connections between events and places. Most modern place name extraction approaches generalize linguistic rules and lexical features into universal rules and ignore patterns inherent in place names within particular geographic contexts. As a result, they lack the spatial awareness needed to identify place names effectively across different geographic contexts, especially lesser-known place names. In this research, we develop a novel Spatially-Aware Location Extraction (SALE) algorithm for place name extraction from structured documents that uses a hybrid approach comprising knowledge-driven and data-driven methods. We build a custom named entity recognition (NER) system based on the conditional random field (CRF) and train/fine-tune it using spatial features extracted from a dataset based on a given geographic region. SALE uses multiple pathways, including the spatially tuned NER, to improve the efficacy of place name extraction. Experimental results on a large geographic region show that our algorithm outperforms well-known state-of-the-art place name recognizers.
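
As a rough illustration of the token-level input a CRF-based NER system consumes, the sketch below builds one feature dictionary per token. It is not SALE's feature set: the feature names, the suffix and neighbour features, and the `in_region_gazetteer` flag (a hypothetical stand-in for the spatial features the abstract mentions) are all assumptions.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Feature dict for tokens[i]; the feature names are illustrative, not SALE's."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:].lower(),
        # Hypothetical spatial feature: is the token a known place name in the target region?
        "in_region_gazetteer": word.lower() in gazetteer,
    }
    # Context features from neighbouring tokens, with sentence-boundary markers.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str], gazetteer: Set[str]) -> List[Dict[str, object]]:
    """One feature dict per token: the per-sentence shape CRF wrappers typically expect."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]
```

Lists of such dictionaries, one list per sentence, are the input format used by wrappers such as sklearn-crfsuite; the training step itself is omitted here.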

Journal Article

City name recognition for Indian postal automation: Exploring script dependent and independent approach

by Sen, Shibaprasad , Roy, Kaushik , Obaidullah, Sk Md in Accuracy , Automation , Computer Communication Networks

2024

Postal documents are often used for official communication, online shopping, and similar purposes. Deliveries are sometimes delayed because addresses are written in multiple scripts, which creates a need for postal sorting facilities. Recognizing the destination city name plays a major part in automatic sorting, and the task becomes more challenging when documents are handwritten. To build an autonomous system for this problem, a deep learning-based system is proposed to recognize handwritten city names written in six major scripts, namely Tamil, Roman, Devanagari, Bangla, Gurumukhi, and Arabic. Experiments were performed using both script-dependent (bi-stage) and script-independent approaches. In the bi-stage framework, we obtained an average accuracy of 97.58% along with a back-end script recognition rate of 99.07%, while the script-independent approach achieved an accuracy of 97.03% on a dataset consisting of 807 classes.

Journal Article

CHTopoNER model-based method for recognizing Chinese place names from social media information

by Qiu, Yue , Liu, Xingui , Jiang, Zhipeng in Artificial neural networks , Computational linguistics , Computer Appl. in Social and Behavioral Sciences

2024

Chinese toponym recognition is a crucial part of named entity recognition and has significant implications for improving geographic information systems. Given the real-time nature of social media and the rich geographic data it contains, identifying Chinese toponyms in their various forms, including compound and informal toponyms, in social media content is important for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms, combined with the linguistic irregularities of social media, pose significant challenges for accurately locating toponym boundaries and resolving ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, resulting in loss of semantic information. We therefore used the Chinese-roberta-wwm-ext pre-trained language model to encode input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with the character-level semantic information. A two-channel neural network layer comprising a bidirectional long short-term memory network and an inception-dilated convolutional neural network was used to extract global and local features from text. Additionally, a conditional random field was applied to enforce label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results for extracting various types of toponyms and spatial location terms.
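
The SoftLexicon scheme the abstract builds on gives each character four sets of matched lexicon words: words it Begins, words it sits in the Middle of, words it Ends, and Single-character words. The sketch below shows only that set construction for the basic scheme; the authors' "improved" statistical weighting is not described in the abstract and is not reproduced, and the toy lexicon and `max_len` parameter are assumptions.

```python
from typing import Dict, List, Set


def softlexicon_sets(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """For each character, collect the lexicon words it Begins, is in the Middle of, Ends, or forms Singly."""
    sets: List[Dict[str, Set[str]]] = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at position i, up to max_len characters long.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                sets[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
    return sets
```

On the classic ambiguous string 南京市长江大桥 with a small toy lexicon, the character 江 ends 长江, sits inside 长江大桥, and is a single-character word, so all three readings stay available to the downstream network. In SoftLexicon proper, each set is then pooled into a weighted word-embedding vector and concatenated with the character representation.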

Journal Article

Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries

by Liu, Shengyu , Chen, Qingcai , Wang, Xiaolong in biomedical texts , Construction , Dictionaries

2015

Semantic features are very important for machine learning-based drug name recognition (DNR) systems. The semantic features used in most DNR systems are based on drug dictionaries manually constructed by experts. Building large-scale drug dictionaries is time-consuming, and adding new drugs to existing dictionaries immediately after they are developed is also a challenge. In recent years, word embeddings, which contain rich latent semantic information about words, have been widely used to improve the performance of various natural language processing tasks; however, they have not been used in DNR systems. Compared with semantic features based on drug dictionaries, word embeddings have the advantage that they are learned without supervision. In this paper, we investigate the effect of semantic features based on word embeddings on DNR and compare them with semantic features based on three drug dictionaries. We propose a conditional random fields (CRF)-based system for DNR. The skip-gram model, an unsupervised algorithm, is used to induce word embeddings from about 17.3 gigabytes (GB) of unlabeled biomedical text collected from MEDLINE (National Library of Medicine, Bethesda, MD, USA). The system is evaluated on the drug-drug interaction extraction (DDIExtraction) 2013 corpus. Experimental results show that word embeddings significantly improve the performance of the DNR system and are competitive with semantic features based on drug dictionaries. Adding word embeddings to the baseline system improves the F-score by 2.92 percentage points, comparable to the improvements from semantic features based on drug dictionaries. Furthermore, word embeddings are complementary to the semantic features based on drug dictionaries. When both word embeddings and dictionary-based semantic features are added, the system achieves its best performance, with an F-score of 78.37%, outperforming the best system of the DDIExtraction 2013 challenge by 6.87 percentage points.
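
Assuming the reported F-score is the usual balanced F1, the sketch below shows how it is computed and makes the percentage-point arithmetic explicit: 78.37% minus 6.87 points implies that the best DDIExtraction 2013 system scored roughly 71.50% on the same evaluation. The abstract does not report precision or recall, so any values passed to `f_score` are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "Outperforms ... by 6.87 percentage points" is a difference between two F-scores,
# so the implied F-score (in percent) of the DDIExtraction 2013 best system is:
implied_best_2013 = round(78.37 - 6.87, 2)
```

Because F1 is a harmonic mean, it stays close to the weaker of the two components, which is why gains in recall from word embeddings can move it noticeably.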

Journal Article

A character social network relationship map tool to facilitate digital humanities research

by Chen, Yung-Ting , Chang, Chung , Chen, Chih-Ming in Acceptance , Annotations , Attitudes

2023

Purpose: Digital humanities aims to use revolutionary digital methods to carry out enhanced forms of humanities research more effectively and efficiently. This study develops a character social network relationship map tool (CSNRMT) that semi-automatically assists digital humanists, through human-computer interaction, in exploring character social network relationships in ancient Chinese texts more efficiently and accurately to obtain useful research findings.
Design/methodology/approach: Using a counterbalanced design, semi-structured in-depth interviews, and lag sequential analysis, a total of 21 research subjects participated in an experiment examining the system effectiveness and technology acceptance of the ancient book digital humanities research platform, with and without the CSNRMT, for interpreting characters and character social network relationships.
Findings: The experimental results reveal that the experimental group with CSNRMT support showed higher system effectiveness in interpreting characters and character social network relationships than the control group without the CSNRMT, although the difference was not statistically significant. Encouragingly, the experimental group with CSNRMT support showed remarkably higher technology acceptance than the control group. Furthermore, lag sequential analysis of use behaviors reveals that the CSNRMT could assist digital humanists in interpreting character social network relationships. The interviews revealed positive opinions on the integration of the system interface, smoothness of operation, and the external search function.
Research limitations/implications: Currently, the effectiveness of the CSNRMT for exploring character social network relationships in texts is significantly affected by the accuracy of recognizing character names and character social network relationships in ancient Chinese texts. The CSNRMT will be more practical when the information it offers about character names and character social network relationships is more accurate and broader.
Practical implications: This study develops an ancient book digital humanities research platform with an emerging CSNRMT that provides an easy-to-use, real-time interaction interface to semi-automatically support digital humanists whose research requires exploring character social network relationships.
Originality/value: At present, real-time social network analysis tools that provide a friendly interaction interface and effectively assist digital humanists in character social network analysis are still lacking. This study therefore presents the CSNRMT, which semi-automatically identifies character names in ancient Chinese texts and provides an easy-to-use, real-time interaction interface, so that digital humanists can more efficiently and accurately establish character social network relationships from the analyzed texts, explore complicated character social networks, and arrive at useful research findings.
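
One common way to derive a first-pass character network once names are known is to link two names whenever they appear in the same passage, weighting each link by the number of such passages. The sketch below assumes this co-occurrence approach, naive substring matching, and a hand-supplied name list; the abstract does not say how CSNRMT derives relationships, so this should not be read as its method.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence_edges(passages: Iterable[str], names: Set[str]) -> Counter:
    """Weighted edges: how many passages mention each pair of known character names."""
    edges: Counter = Counter()
    for passage in passages:
        # Naive substring matching; a real tool would rely on a name recognizer.
        present = sorted(name for name in names if name in passage)
        # Each unordered pair is stored once, as a sorted tuple.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges
```

With toy passages such as 劉備與關羽、張飛結義 and 曹操見劉備, the first passage yields three edges and the second one more; a substring matcher like this will also pick up false hits from short or overlapping names, which is exactly the accuracy limitation the abstract notes.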

Journal Article

Solr-Plant: efficient extraction of plant names from text

by Restrepo, Maria Isabel , Sharma, Vivekanand , Sarkar, Indra Neil in Algorithms , Application programming interface , Batch processing

2019

Background: The retrieval of plant-related information is a challenging task due to variations in species name mentions as well as spelling or typographical errors across data sources. Scalable solutions are needed for identifying plant name mentions in text and resolving them to accepted taxonomic names. Results: An Apache Solr-based fuzzy matching system enhanced with the Smith-Waterman alignment algorithm (“Solr-Plant”) was developed for mapping and resolving names against a plant name and synonym thesaurus. Evaluation of Solr-Plant suggests promising results in terms of both accuracy and processing efficiency on misspelled species names from two benchmark datasets: (1) SALVIAS and (2) National Center for Biotechnology Information (NCBI) Taxonomy. Additional evaluation using the S800 text corpus also reflects high precision and recall. The latest version of the source code is available at https://github.com/bcbi/SolrPlantAPI . A REST-compliant web interface and service for Solr-Plant is hosted at http://bcbi.brown.edu/solrplant . Conclusion: Automated techniques are needed for efficient and accurate identification of knowledge linked with biological scientific names. Solr-Plant complements the current state of the art in both efficiency and accuracy for identifying names restricted to the species level. The approach can be extended to identify broader groups of organisms at different taxonomic levels. The results reflect the potential utility of Solr-Plant as a data mining tool for extracting and correcting plant species names.
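
The Smith-Waterman step can be pictured as re-ranking fuzzy-search candidates by local alignment score. The sketch below covers only that alignment and re-ranking; the Solr retrieval stage is not reproduced, and the scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`, linear gap penalty) and the `best_match` helper are assumptions, not Solr-Plant's settings.

```python
from typing import List


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score between a and b, using a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP matrix; row 0 and column 0 stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap    # a[i-1] aligned against a gap
            left = h[i][j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 is what makes the alignment local rather than global.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


def best_match(query: str, candidates: List[str]) -> str:
    """Hypothetical helper: pick the candidate with the highest case-insensitive alignment score."""
    return max(candidates, key=lambda name: smith_waterman(query.lower(), name.lower()))
```

For a misspelling such as "Quercus albba", the candidate "Quercus alba" aligns along almost its full length with a single gap, so it outscores "Quercus rubra", which shares only the genus.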

Journal Article

Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM

by Li, Chi , Ma, Jing , Luo, Yongcong in Algorithms , Commodities , Electronic commerce

2020

Commodity information must be matched to an HSCode so that goods can clear customs quickly for export, so identifying entity names in the commodity titles of e-commerce platforms quickly and accurately is particularly important. To address this problem, an approach based on TWs-LSTM is proposed to identify commodity entity names. In this paper, we apply the TF-IDF algorithm to the commodity text corpus to obtain a weight matrix for commodity words. Meanwhile, we use the Word2Vec model to represent the semantic meanings of the words extracted from the bag of words. The weight vector of each commodity title and every word vector in the title are then combined into a new one-dimensional vector. These one-dimensional vectors, which we call the TWs model, represent the commodity titles. Finally, we feed the TWs vectors into an LSTM for commodity entity name recognition. In the experimental stage, we divide the commodity entity name data into a training set and a testing set and compare the TWs-LSTM model with other text processing models. With the TWs-LSTM model, the F1-score reached 64.58% on the commodity title corpus of the Tmall platform, which is state-of-the-art compared with the baseline models and previous studies.
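
The TF-IDF weighting step can be sketched as follows. The weighting formula is the common (count / length) × log(N / document frequency) variant, and the way `title_vector` combines weights with word vectors (scale each word vector by its weight, then concatenate into one 1-D vector) is only one plausible reading of "combined into a new one-dimensional vector"; the toy embeddings stand in for Word2Vec output and are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Per-title TF-IDF weights: (count / title length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:  # titles are assumed to be non-empty
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def title_vector(title: List[str], weights: Dict[str, float],
                 embeddings: Dict[str, List[float]]) -> List[float]:
    """Assumed combination: scale each word vector by its TF-IDF weight, then concatenate."""
    vec: List[float] = []
    for word in title:
        w = weights.get(word, 0.0)
        vec.extend(w * x for x in embeddings[word])
    return vec
```

On three toy titles such as "red cotton shirt", "blue cotton dress", and "red silk dress", a word unique to one title ("shirt") receives a higher weight than a word shared by two ("cotton"), and a word occurring in every title would receive weight zero.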

Journal Article

LSTM-CRF for Drug-Named Entity Recognition

by Sun, Chengjie , Liu, Bingquan , Lin, Lei

2017

Drug-Named Entity Recognition (DNER) for biomedical literature is a fundamental facilitator of information extraction. For this reason, the DDIExtraction2011 (DDI2011) and DDIExtraction2013 (DDI2013) challenges each included a task aimed at recognizing drug names. State-of-the-art DNER approaches rely heavily on hand-engineered features and domain-specific knowledge, which are difficult to collect and define. We therefore propose an approach that automatically explores word- and character-level features: a recurrent neural network using bidirectional long short-term memory (LSTM) with conditional random field decoding (LSTM-CRF). Two kinds of word representations are used in this work: word embeddings trained on a large amount of text, and character-based representations, which capture orthographic features of words. Experimental results on the DDI2011 and DDI2013 datasets show the effectiveness of the proposed LSTM-CRF method, which outperforms the best system in the DDI2013 challenge.
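
The "CRF decoding" step in an LSTM-CRF selects the tag sequence that maximizes the sum of per-token emission scores (from the BiLSTM) and tag-to-tag transition scores. A minimal Viterbi sketch, assuming both score tables are already available as plain lists; the O/B/I tag set and the numbers in the test below are made up for illustration.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][k]: score of tag k at position t (e.g. a BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag to precede tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With all-zero transitions the decoder simply follows the emissions; giving the O→I transition a large negative score (a constraint a trained CRF can learn) makes it prefer a valid B tag even where the emissions locally favor I.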

Journal Article

BiLSTM-CRF for geological named entity recognition from the geoscience literature

by Li, Wenjia , Tao, Liufeng , Wu, Liang in Conditional random fields , Earth science , Geology

2019

Many detailed geoscience reports lie unused, offering both challenges and opportunities for information extraction. In geoscience research, geological named entity recognition (GNER) is an important task in geoscience information extraction. Regarding numerical geoscience data, research on information extraction remains limited. Most conventional NER approaches depend heavily on feature engineering, and such sentence-level methods suffer from the tagging inconsistency problem. Based on these observations, this paper proposes a neural network approach, namely attention-based bidirectional long short-term memory with a conditional random field layer (Att-BiLSTM-CRF), for named entity recognition to extract entities describing geoscience information from geoscience reports. This approach leverages global information learned by an attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. Experiments on the constructed dataset show that our method achieves performance comparable to that of other state-of-the-art systems. Additionally, our method achieved an average F1 score of 91.47% in the NER extraction task.
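
Entity-level scores such as the F1 reported here are commonly computed over whole entity spans rather than individual tags. The sketch below converts a BIO tag sequence into (type, start, end) spans, assuming a BIO scheme and hypothetical geological entity types such as `ROCK` and `STRAT`; the abstract does not state which tagging scheme the authors used.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (entity_type, start, end_exclusive) spans.

    A stray I- tag that does not continue an entity of the same type is treated
    as the start of a new entity (a lenient convention; strict schemes differ).
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity simply extends
        if etype is not None:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans
```

Exact-match precision and recall then follow from intersecting the sets of gold and predicted spans.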

Journal Article
