Catalogue Search | MBRL
519 result(s) for "AUTOMATIC INDEXING"
Automatic indexing of scientific articles on Library and Information Science with SISA, KEA and MAUI
by Ortuño, Pedro Díaz; Gil-Leiva, Isidoro; Corrêa, Renato Fernandes
in Access to information, Algorithms, Automatic
2022
This article evaluates the SISA (Automatic Indexing System), KEA (Keyphrase Extraction Algorithm) and MAUI (Multi-Purpose Automatic Topic Indexing) automatic indexing systems to find out how they perform in relation to human indexing. SISA's algorithm is based on rules about the position of terms in the different structural components of the document, while the algorithms for KEA and MAUI are based on machine learning and the statistical features of terms. For evaluation purposes, a document collection of 230 scientific articles from the Revista Española de Documentación Científica published by the Consejo Superior de Investigaciones Científicas (CSIC) was used, of which 30 were used for training tasks and were not part of the evaluation test set. The articles were written in Spanish and indexed by human indexers using a controlled vocabulary in the InDICES database, also belonging to the CSIC. The human indexing of these documents constitutes the baseline or golden indexing, against which the output of the automatic indexing systems is evaluated by comparing term sets using the evaluation metrics of precision, recall, F-measure and consistency. The results show that the SISA system performs best, followed by KEA and MAUI.
Journal Article
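The term-set comparison described in the abstract above can be illustrated with a minimal sketch. The abstract names the metrics but not their formulas, so the definitions below are assumptions: precision and recall are computed on the overlap between the automatic and human term sets, F-measure is their harmonic mean, and consistency is assumed to be Hooper's measure (shared terms divided by all distinct terms). The function name `evaluate_indexing` and the sample terms are hypothetical.

```python
def evaluate_indexing(auto_terms, human_terms):
    """Compare automatically assigned terms against human (gold) terms.

    Assumed formulas (not taken from the article):
      precision   = |common| / |auto|
      recall      = |common| / |human|
      F-measure   = harmonic mean of precision and recall
      consistency = |common| / |auto OR human|  (Hooper's measure)
    """
    auto, human = set(auto_terms), set(human_terms)
    common = auto & human
    union = auto | human

    precision = len(common) / len(auto) if auto else 0.0
    recall = len(common) / len(human) if human else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    consistency = len(common) / len(union) if union else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "consistency": consistency,
    }


# Hypothetical term sets for a single document.
scores = evaluate_indexing(
    ["automatic indexing", "machine learning", "libraries", "evaluation"],
    ["automatic indexing", "evaluation", "controlled vocabularies"],
)
print(scores)
```

Per-document scores like these would typically be averaged over the test collection; whether the article uses macro- or micro-averaging is not stated in the abstract.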
Research on the Automatic Subject-Indexing Method of Academic Papers Based on Climate Change Domain Ontology
2023
It is important to classify academic papers in a fine-grained manner to uncover deeper implicit themes and semantics in papers for better semantic retrieval, paper recommendation, research trend prediction, topic analysis, and a series of other functions. Based on the ontology of the climate change domain, this study used an unsupervised approach to combine two methods, syntactic structure and semantic modeling, to build a framework of subject-indexing techniques for academic papers in the climate change domain. The framework automatically indexes a set of conceptual terms as research topics from the domain ontology by inputting the titles, abstracts and keywords of the papers using natural language processing techniques such as syntactic dependencies, text similarity calculation, pre-trained language models, semantic similarity calculation, and weighting factors such as word frequency statistics and graph path calculation. Finally, we evaluated the proposed method using the gold standard of manually annotated articles and demonstrated significant improvements over the other five alternative methods in terms of precision, recall and F1-score. Overall, the method proposed in this study is able to identify the research topics of academic papers more accurately, and also provides useful references for the application of domain ontologies and unsupervised data annotation.
Journal Article
Sometimes the apple does fall far from the tree: a case study on automatic indexing precision errors in PubMed
by Wilson, Paije
in Abstract and Indexing, Abstracting and Indexing - methods, Abstracting and Indexing - standards
2025
Objective: This case study identifies the presence and prevalence of precision indexing errors in a subset of automatically indexed MEDLINE records in PubMed (specifically, all MEDLINE records automatically indexed with the MeSH term Malus, the genus name for apple trees). In short, how well does automatic indexing compare [figurative] apples to [literal] apples? Methods: 1,705 MEDLINE records automatically indexed with the MeSH term Malus underwent title/abstract and full text screening to determine whether they were correctly indexed (i.e., the records were about Malus, meaning they discussed the literal fruit or tree) or incorrectly indexed (i.e., they were not about Malus, meaning they did not discuss the literal fruit or tree). The context and type of indexing error were documented for each erroneously indexed record. Results: 135 (7.9%) records were incorrectly indexed with the MeSH term Malus. The most common indexing error was due to the word "apple" being used in similes, metaphors, and idioms (80, or 59.2%), with the next most common error being due to "apple" being present in a name or term (50, or 37%). Additional indexing errors were attributed to the use of "apple" in acronyms, and, in one case, a reference to Sir Isaac Newton. Conclusion: As indicated by this study's findings, automatic indexing can commit errors when indexing records that have words with non-literal or alternative meanings in their titles or abstracts. Librarians should be mindful of the existence of automatic indexing errors, and instruct authors on how best to ameliorate the effects of them within their own manuscripts.
Journal Article
Filtering failure: the impact of automated indexing in Medline on retrieval of human studies for knowledge synthesis
by Askin, Nicole; Epp, Carla; Ostapyk, Tyler
in Abstract and Indexing, Abstracting and Indexing - methods, Abstracting and Indexing - standards
2025
Objective: Use of the search filter ‘exp animals/ not humans.sh’ is a well-established method in evidence synthesis to exclude non-human studies. However, the shift to automated indexing of Medline records has raised concerns about the use of subject-heading-based search techniques. We sought to determine how often this string inappropriately excludes human studies among automated as compared to manually indexed records in Ovid Medline. Methods: We searched Ovid Medline for studies published in 2021 and 2022 using the Cochrane Highly Sensitive Search Strategy for randomized trials. We identified all results excluded by the non-human-studies filter. Records were divided into sets based on indexing method: automated, curated, or manual. Each set was screened to identify human studies. Results: Human studies were incorrectly excluded in all three conditions, but automated indexing inappropriately excluded human studies at nearly double the rate of manual indexing. Looking specifically at human clinical randomized controlled trials (RCTs), the rate of inappropriate exclusion of automated-indexing records was seven times that of manually indexed records. Conclusions: Given our findings, searchers are advised to carefully review the effect of the ‘exp animals/ not humans.sh’ search filter on their search results, pending improvements to the automated indexing process.
Journal Article
The expansion of Google Scholar versus Web of Science: a longitudinal study
by Dodou, Dimitra; de Winter, Joost C. F; Zadpoor, Amir A
in Chemistry, Citation analysis, Citations
2014
Web of Science (WoS) and Google Scholar (GS) are prominent citation services with distinct indexing mechanisms. Comprehensive knowledge about the growth patterns of these two citation services is lacking. We analyzed the development of citation counts in WoS and GS for two classic articles and 56 articles from diverse research fields, making a distinction between retroactive growth (i.e., the relative difference between citation counts up to mid-2005 measured in mid-2005 and citation counts up to mid-2005 measured in April 2013) and actual growth (i.e., the relative difference between citation counts up to mid-2005 measured in April 2013 and citation counts up to April 2013 measured in April 2013). One of the classic articles was used for a citation-by-citation analysis. Results showed that GS has substantially grown in a retroactive manner (median of 170 % across articles), especially for articles that initially had low citation counts in GS as compared to WoS. Retroactive growth of WoS was small, with a median of 2 % across articles. Actual growth percentages were moderately higher for GS than for WoS (medians of 54 vs. 41 %). The citation-by-citation analysis showed that the percentage of citations being unique in WoS was lower for more recent citations (6.8 % for citations from 1995 and later vs. 41 % for citations from before 1995), whereas the opposite was noted for GS (57 vs. 33 %). It is concluded that, since its inception, GS has shown substantial expansion, and that the majority of recent works indexed in WoS are now also retrievable via GS. A discussion is provided on quantity versus quality of citations, threats for WoS, weaknesses of GS, and implications for literature research and research evaluation.
Journal Article
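The two growth measures defined in the abstract above can be sketched as follows. The abstract says only "relative difference", so the formula used here, (later - earlier) / earlier expressed as a percentage, is an assumption, and the citation counts are invented for illustration rather than taken from the study. The function names are hypothetical.

```python
def relative_growth(earlier, later):
    """Percentage change from `earlier` to `later` (assumed definition)."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return 100.0 * (later - earlier) / earlier


def growth_profile(upto_2005_seen_2005, upto_2005_seen_2013, upto_2013_seen_2013):
    """Return (retroactive, actual) growth in percent for one article.

    retroactive: same cut-off date (mid-2005), counted at two moments,
                 so it reflects back-filling of the index
    actual:      same counting moment (April 2013), two cut-off dates,
                 so it reflects newly published citations
    """
    retroactive = relative_growth(upto_2005_seen_2005, upto_2005_seen_2013)
    actual = relative_growth(upto_2005_seen_2013, upto_2013_seen_2013)
    return retroactive, actual


# Invented counts for a single article in one citation service.
retro, actual = growth_profile(40, 100, 150)
print(f"retroactive growth: {retro:.1f} %, actual growth: {actual:.1f} %")
```

Separating the two measures this way keeps index back-filling from being mistaken for genuine growth in citations.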
Aplicação da folksonomia assistida na construção de corpus de referência em Ciência da Informação
by Correa, Renato Fernandes; Silva, Bruno Felipe de Melo
in Application, Assisted Folksonomy, Automatic Indexing
2020
This paper proposes and discusses the application of assisted folksonomy to the construction of a reference corpus of scientific articles in Information Science. The hypothesis is that this application can ensure higher quality in the indexing of scientific articles and a better evaluation of automatic indexing systems by means of the compiled corpus. The research was restricted to the corpus of 60 articles written in Portuguese selected by Souza (2005). The collaborative platform for assisted social indexing of the corpus was set up using the collection management software Tainacan. The research stages comprised configuring and preparing the collection in Tainacan, having groups of users carry out the assisted social indexing, and analysing the results of the indexing process. The assisted folksonomy was analysed by comparing the contents of the Assuntos (Subjects) and tags metadata fields of the articles. As indicators of indexing quality, the averages obtained were 28% for the consistency coefficient, 32% for precision, 68% for recall, and 41% for F-measure. These averages represent good levels of consistency and recall and satisfactory levels of precision and F-measure, suggesting that assisted folksonomy is useful for improving the indexing of the reference corpus.
Journal Article
Deep neural model with self-training for scientific keyphrase extraction
by Liao, Han; Zhu, Xun; Lyu, Chen
in Annotations, Artificial intelligence, Artificial neural networks
2020
Scientific information extraction is a crucial step for understanding scientific publications. In this paper, we focus on scientific keyphrase extraction, which aims to identify keyphrases from scientific articles and classify them into predefined categories. We present a neural network based approach for this task, which employs a bidirectional long short-term memory (LSTM) network to represent the sentences in the article. On top of the bidirectional LSTM layer in our neural model, a conditional random field (CRF) is used to predict the label sequence for the whole sentence. Considering the cost of annotated data for supervised learning methods, we introduce a self-training method into our neural model to leverage the unlabeled articles. Experimental results on the ScienceIE corpus and ACL keyphrase corpus show that our neural model achieves promising performance without any hand-designed features or external knowledge resources. Furthermore, it efficiently incorporates the unlabeled data and achieves competitive performance compared with previous state-of-the-art systems.
Journal Article
Keyword Extraction: A Modern Perspective
by Nomoto, Tadashi
in Computer Imaging, Computer Science, Computer Systems Organization and Communication Networks
2023
The goal of keyword extraction is to extract from a text the words or phrases that indicate what it is about. In this work, we look at keyword extraction from a number of different perspectives: Statistics, Automatic Term Indexing, Information Retrieval (IR), Natural Language Processing (NLP), and the emerging Neural paradigm. The 1990s saw some early attempts to tackle the issue primarily based on text statistics [13, 17]. Meanwhile, in IR, efforts were largely led by DARPA’s Topic Detection and Tracking (TDT) project [2]. In this contribution, we discuss how past innovations paved the way for more recent developments, such as LDA, PageRank, and Neural Networks. We walk through the history of keyword extraction over the last 50 years, noting differences and similarities among methods that emerged during the time. We conduct a large meta-analysis of the past literature using datasets from news media, science, and medicine to business and bureaucracy, to draw a general picture of what a successful approach would look like.
Journal Article
MeSH indexing based on automatically generated summaries
by Jimeno-Yepes, Antonio J; Aronson, Alan R; Díaz, Alberto
in Abstracting and Indexing - methods, Algorithms, Analysis
2013
Background
MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results.
Results
We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision.
Conclusions
Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.
Journal Article
Information filtering based on corrected redundancy-eliminating mass diffusion
2017
Methods used in information filtering and recommendation often rely on quantifying the similarity between objects or users. The similarity metrics used often suffer from similarity redundancies arising from correlations between objects' attributes. Based on an unweighted undirected object-user bipartite network, we propose a Corrected Redundancy-Eliminating similarity index (CRE) which is based on a spreading process on the network. Extensive experiments on three benchmark data sets (MovieLens, Netflix and Amazon) show that when used in recommendation, the CRE yields significant improvements in terms of recommendation accuracy and diversity. A detailed analysis is presented to unveil the origins of the observed differences between the CRE and mainstream similarity indices.
Journal Article