Catalogue Search | MBRL

Mean Normalized Retrieval Order (MNRO): a new content-based image retrieval performance measure

by Chatzichristofis, Savvas A. , Boutalis, Yiannis S. , Iakovidou, Chryssanthi in Analysis , Applied sciences , Archives & records

2014

The results of a content-based image retrieval system can be evaluated by several performance measures, each one employing different evaluation criteria. Many of the methods used in the field of information retrieval have been adopted for use in image retrieval systems. This paper reviews the most widely used performance measures for retrieval evaluation with particular emphasis on the assumptions made during their design. More specifically, it focuses on the design principles of the commonly used Mean Average Precision (MAP) and Average Normalized Modified Retrieval Rank (ANMRR), pinpointing their limitations. It also proposes a new performance measure for image retrieval systems, the Mean Normalized Retrieval Order (MNRO), whose effectiveness is demonstrated through a wide range of experiments. Initial experiments were conducted on artificially produced query trials and evaluations. Experiments on a large database demonstrate the ability of MNRO to take into account the generality of the queries during the retrieval procedure. Furthermore, the results of a case study show that the proposed performance measure is closer to human evaluations than MAP and ANMRR. Lastly, in order to encourage researchers and practitioners to use the proposed performance measure, we present the experimental results produced by a large number of state-of-the-art descriptors applied to three well-known benchmarking databases.

Journal Article

Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study

by Spadaro, Anthony , Perrone, Jeanmarie , Lakamana, Sahithi in Answers , Attitudes , Augmentation

2025

The increasing use of social media to share lived and living experiences of substance use presents a unique opportunity to obtain information on side effects, use patterns, and opinions on novel psychoactive substances. However, due to the large volume of data, obtaining useful insights through natural language processing technologies such as large language models is challenging. This paper aims to develop a retrieval-augmented generation (RAG) architecture for medical question answering pertaining to clinicians' queries on emerging issues associated with health-related topics, using user-generated medical information on social media. We proposed a two-layer RAG framework for query-focused answer generation and evaluated a proof of concept for the framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. Our modular framework generates individual summaries followed by an aggregated summary to answer medical queries from large amounts of user-generated social media data in an efficient manner. We compared the performance of a quantized large language model (Nous-Hermes-2-7B-DPO), deployable in low-resource settings, with GPT-4. For this proof-of-concept study, we used user-generated data from Reddit to answer clinicians' questions on the use of xylazine and ketamine. Our framework achieves comparable median scores in terms of relevance, length, hallucination, coverage, and coherence when evaluated using GPT-4 and Nous-Hermes-2-7B-DPO, evaluated for 20 queries with 76 samples. There was no statistically significant difference between GPT-4 and Nous-Hermes-2-7B-DPO for coverage (Mann-Whitney U=733.0; n1=37; n2=39; P=.89 two-tailed), coherence (U=670.0; n1=37; n2=39; P=.49 two-tailed), relevance (U=662.0; n1=37; n2=39; P=.15 two-tailed), length (U=672.0; n1=37; n2=39; P=.55 two-tailed), and hallucination (U=859.0; n1=37; n2=39; P=.01 two-tailed). A statistically significant difference was noted for the Coleman-Liau Index (U=307.5; n1=20; n2=16; P<.001 two-tailed). Our RAG framework can effectively answer medical questions about targeted topics and can be deployed in resource-constrained settings.

Journal Article

Leaf disease image retrieval with object detection and deep metric learning

by Wang, Yi , Peng, Yingshu in Algorithms , Classification , Computer vision

2022

Rapid identification of plant diseases is essential for effective mitigation and control of their influence on plants. For automatic plant disease identification, classification of plant leaf images based on deep learning algorithms is currently the most accurate and popular method. Existing methods rely on the collection of large amounts of image annotation data and cannot flexibly adjust recognition categories, whereas we develop a new image retrieval system for automated detection, localization, and identification of individual leaf disease in an open setting, namely, where newly added disease types can be identified without retraining. In this paper, we first optimize the YOLOv5 algorithm, enhancing its ability to recognize small objects, which helps to extract leaf objects more accurately; second, we integrate classification recognition with metric learning, jointly learning image categorization and similarity measurement, thus capitalizing on the predictive ability of available image classification models; and finally, we construct an efficient and nimble image retrieval system to quickly determine the leaf disease type. We present detailed experimental results on three publicly available leaf disease datasets and demonstrate the effectiveness of our system. This work lays the groundwork for promoting disease surveillance of plants applicable to intelligent agriculture and to crop research such as nutrition diagnosis, health status surveillance, and more.

Journal Article

Retrieval-augmented Chinese text-to-SQL generation for conversational bibliographic search

by Zhu, Mark Xuefang , Li, Guo , Wang, Zhenyu in Ablation , Accuracy , Alignment

2025

To overcome the limitations of current bibliographic search systems, such as low semantic precision and inadequate handling of complex queries, this study introduces a novel conversational search framework for the Chinese bibliographic domain. Our approach makes several contributions. We first developed BibSQL, the first Chinese Text-to-SQL dataset for bibliographic metadata. Using this dataset, we built a two-stage conversational system that combines semantic retrieval of relevant question-SQL pairs with in-context SQL generation by large language models (LLMs). To enhance retrieval, we designed SoftSimMatch, a supervised similarity learning model that improves semantic alignment. We further refined SQL generation using a Program-of-Thoughts (PoT) prompting strategy, which guides the LLM to produce more accurate output by first creating Python pseudocode. Experimental results demonstrate the framework’s effectiveness. Retrieval-augmented generation (RAG) significantly boosts performance, achieving up to 96.6% execution accuracy. Our SoftSimMatch-enhanced RAG approach surpasses zero-shot prompting and random example selection in both semantic alignment and SQL accuracy. Ablation studies confirm that the PoT strategy and self-correction mechanism are particularly beneficial under low-resource conditions, increasing one model’s exact matching accuracy from 74.8% to 82.9%. While acknowledging limitations such as potential logic errors in complex queries and reliance on domain-specific knowledge, the proposed framework shows strong generalizability and practical applicability. By uniquely integrating semantic similarity learning, RAG, and PoT prompting, this work establishes a scalable foundation for future intelligent bibliographic retrieval systems and domain-specific Text-to-SQL applications.

Journal Article

Enhancing image retrieval through optimal barcode representation

by Makrehchi, Masoud , Khosrowshahli, Rasa , Kheiri, Farnaz in 631/67 , 692/308 , Bar codes

2025

Binary data encoding has proven to be a versatile tool for optimizing data processing and memory efficiency in various machine learning applications. This includes deep barcoding, which generates barcodes from deep learning feature extraction for image retrieval of similar cases among millions of indexed images. Despite recent advances in barcode generation methods, converting high-dimensional feature vectors (e.g., deep features) to compact and discriminative binary barcodes remains an urgent and unresolved problem. Difference-based binarization of features is one of the most efficient binarization methods, transforming continuous feature vectors into binary sequences and capturing trend information. However, the performance of this method is highly dependent on the ordering of the input features, leading to a significant combinatorial challenge. This research addresses this problem by optimizing feature sequences based on retrieval performance metrics. Our approach identifies optimal feature orderings, leading to substantial improvements in retrieval effectiveness compared to arbitrary or default orderings. We assess the performance of the proposed approach in various medical and non-medical image retrieval tasks. This evaluation includes medical images from The Cancer Genome Atlas (TCGA), a comprehensive publicly available dataset, as well as a COVID-19 chest X-ray dataset. In addition, we evaluate the proposed approach on non-medical benchmark image datasets, such as CIFAR-10, CIFAR-100, and Fashion-MNIST. Our findings demonstrate the importance of optimizing binary barcode representation to significantly enhance accuracy for fast image retrieval across a wide range of applications, highlighting the applicability and potential of barcodes in various domains.
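The order dependence described in this abstract can be illustrated with a minimal sketch. It assumes one common form of difference-based binarization, in which each bit records whether the next feature is larger than the current one; the paper's exact encoding and its ordering optimizer are not given in this record, so the function names and the sample permutation below are hypothetical.

```python
from typing import List, Sequence


def difference_barcode(features: Sequence[float]) -> List[int]:
    """Assumed encoding: bit i is 1 when features[i + 1] > features[i], else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(features, features[1:])]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


# Illustrative feature vector (made-up values, not data from the paper).
features = [0.2, 0.9, 0.4, 0.7]

# Barcode under the default feature order.
code_default = difference_barcode(features)

# Hypothetical permutation of the same features; an optimizer would search
# over such orderings using a retrieval metric as the objective.
order = [2, 0, 3, 1]
code_reordered = difference_barcode([features[i] for i in order])

print(code_default, code_reordered, hamming(code_default, code_reordered))
```

The same four values yield different barcodes under the two orderings, which is why the choice of feature order can change which images appear close in Hamming space.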

Journal Article

Three methods for fair ranking of multiple protected items

by Melucci, Massimo in 639/705/117 , 639/705/258 , 639/705/531

2025

Three approaches to fair ranking in retrieval systems are compared in this paper: mPFR, which is based on the theory of preferences and eigensystems; cRR, which is a simple "round robin" method; and mMLP, which is based on linear programming. In order to increase fairness without sacrificing retrieval effectiveness, the techniques post-process the rankings that a retrieval system sends back to users. The findings demonstrate that when it comes to protecting elements, mPFR and cRR accomplish the same level of effectiveness and fairness. Despite being computationally more costly than the latter, the former’s mathematical architecture enables the ranking of reordering techniques at various levels of complexity, while mMLP might not be practical for datasets that are too big. Therefore, the choice between these methods often hinges on the specific use case and dataset size, where trade-offs between computational efficiency and desired fairness come into play. Future research could explore optimizing these techniques further to enhance their applicability across diverse scenarios, ensuring that both fairness and effectiveness are maintained.

Journal Article

NLP-based personal learning assistant for school education

by Mathew, Ann Neethu , V., Rohini , Paulose, Joy in Artificial intelligence , Chatbots , Cloud computing

2021

Computer-based knowledge and computation systems are becoming major sources of leverage for multiple industry segments. Hence, educational systems and learning processes across the world are on the cusp of a major digital transformation. This paper seeks to explore the concept of an artificial intelligence and natural language processing (NLP) based intelligent tutoring system (ITS) in the context of computer education in primary and secondary schools. One of the components of an ITS is a learning assistant, which can enable students to seek assistance as and when they need it, wherever they are. As part of this research, a pilot prototype chatbot was developed to serve as a learning assistant for the subject Scratch (Scratch is a graphical utility used to teach school children the concepts of programming). Using an open source natural language understanding (NLU) or NLP library and a Slack-based UI, student queries were input to the chatbot, which returned the sought explanation as the answer. Through a two-stage testing process, the chatbot’s NLP extraction and information retrieval performance were evaluated. The testing results showed that the ontology modelling for such a learning assistant was done relatively accurately, and that the assistant has the potential to be pursued as a cloud-based solution in future.

Journal Article

A fast retrieval method for multilevel redundant data in grid resource business middle office based on improved decision tree algorithm

by Sun, Wei , Liu, Hui , Zou, Zhiwei in 639/166 , 639/705 , Accuracy

2025

The current power grid business handles massive data operations where data retrieval frequently encounters redundancy issues. Conventional decision tree-based methods struggle to achieve accurate data acquisition when facing redundant interference. To address this challenge, this study proposes a multi-level redundant data retrieval method using an improved decision tree algorithm for grid resource business center platforms. The methodology first establishes a multi-level data decision tree using grid resource business middle-platform data, then applies a decision tree pruning algorithm based on Akaike information criterion. The ant colony algorithm optimizes the pruning parameters of the decision tree model, and after obtaining optimal pruning parameters, processes the grid resource business middle-platform data decision tree to generate an improved version. Subsequently, the multi-level redundant data retrieval method based on the improved decision tree implements fast retrieval of hierarchical redundant data in grid resource business through designed repetitive data processing flows and multi-level redundant data discrimination mechanisms. The experimental results demonstrate that the improved decision tree algorithm improves multi-level redundant data retrieval accuracy by 14%. The optimized decision tree model for middle-platform data achieves more comprehensive representation of grid resource service data hierarchies and enables effective retrieval of multi-level redundant data including both image and text categories from the middle-platform data. The maximum F1-score reaches 0.99 with retrieval time of only 4.5 s, which is 1.5 s below the predefined threshold, confirming excellent practical performance.

Journal Article

A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval

by Pan, Min , Liu, Yu , Huang, Ellen Anne in 639/705/117 , 639/705/258 , Humanities and Social Sciences

2024

Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking with a pre-trained model. However, the semantic information captured by pre-trained models from sentences or passages is usually only applied to document ranking, with limited use in query expansion. In fact, the semantic information within pseudo-relevant documents plays a critical role in selecting appropriate query expansion terms. Therefore, this paper proposes a novel approach that leverages pre-trained models to extract multi-dimensional semantic information from pseudo-relevant documents, offering more possibilities for query expansion. First, traditional sparse retrieval methods are used in the initial retrieval stage to ensure efficiency, and term-level weights are calculated based on statistical information. Then, the pre-trained model encodes both the query and the sentences and passages from the documents, extracting sentence-level and passage-level semantic similarities to the query. Finally, these semantic weights are combined with the term-level weights to generate an improved query for the second retrieval round. We conducted experiments on five TREC datasets and a medical dataset, showing improvements in official metrics such as MAP and P@10. The results demonstrate the effectiveness of utilizing multi-dimensional semantic information from pseudo-relevant documents to optimize query expansion. This study offers new insights into how the semantic information of pseudo-relevant documents can be effectively harnessed to enhance retrieval performance.

Journal Article

A hybrid evolutionary algorithm based automatic query expansion for enhancing document retrieval system

by Pamula, Rajendra , Chauhan, D. S. , Sharma, Dilip Kumar in Artificial Intelligence , Computational Intelligence , Datasets

2024

Searching for relevant documents in a large dataset has become a major challenge. Automatic query expansion is one of the techniques that addresses this problem by refining the query. A new query expansion approach using cuckoo search and an accelerated particle swarm optimization technique is proposed in this paper. The proposed approach mainly focuses on finding the most relevant expanded query rather than suitable expansion terms. Fuzzy logic is also employed, which improves the performance of accelerated particle swarm optimization by controlling various parameters. We have compared the proposed approach with other existing and recently developed automatic query expansion approaches on various evaluation measures such as average recall, average precision, Mean Average Precision, F-measure and the precision-recall graph. We have evaluated the performance of all approaches on three datasets: CISI, CACM and TREC-3. The results obtained for all three datasets show that the proposed approach achieves better results than other automatic query expansion approaches.

Journal Article
