Catalogue Search | MBRL
Explore the vast range of titles available.
2,658 result(s) for "Retrieval augmented generation"
A Comprehensive Survey of Retrieval-Augmented Large Language Models for Decision Making in Agriculture: Unsolved Problems and Research Opportunities
by Vizniuk, Artem; Siwocha, Agnieszka; Smoląg, Jacek
in Agricultural equipment; Agriculture; Decision support systems
2025
The breakthrough in developing large language models (LLMs) over the past few years has led to their widespread implementation in various areas of industry, business, and agriculture. The aim of this article is to critically analyse and generalise the known results and research directions on approaches to the development and utilisation of LLMs, with a particular focus on their functional characteristics when integrated into decision support systems (DSSs) for agricultural monitoring. The subject of the research is approaches to the development and integration of LLMs into DSSs for agrotechnical monitoring. The main scientific and applied results of the article are as follows: the world experience of using LLMs to improve agricultural processes has been analysed; a critical analysis of the functional characteristics of LLMs has been carried out, and the areas of application of their architectures have been identified; the necessity of focusing on retrieval-augmented generation (RAG) as an approach to solving one of the main limitations of LLMs, which is the limited knowledge base of training data, has been established; the characteristics and prospects of using LLMs for DSSs in agriculture have been analysed to highlight trustworthiness, explainability and bias reduction as priority areas of research; the potential socio-economic effect from the implementation of LLMs and RAG in the agricultural sector is substantiated.
Journal Article
Challenges and Solutions in Applying Large Language Models to Guideline-Based Management Planning and Automated Medical Coding in Health Care: Algorithm Development and Validation
by Jewell, Paul; Sarvari, Peter; Taylor, Rosie
in AI Applications in Biomedical Engineering; Applications programs; Clinical Engineering
2025
Diagnostic errors and administrative burdens, including medical coding, remain major challenges in health care. Large language models (LLMs) have the potential to alleviate these problems, but their adoption has been limited by concerns regarding reliability, transparency, and clinical safety.
This study introduces and evaluates 2 LLM-based frameworks, implemented within the Rhazes Clinician platform, designed to address these challenges: generation-assisted retrieval-augmented generation (GARAG) for automated evidence-based treatment planning and generation-assisted vector search (GAVS) for automated medical coding.
GARAG was evaluated on 21 clinical test cases created by medically qualified authors. Each case was executed 3 times independently, and outputs were assessed using 4 criteria: correctness of references, absence of duplication, adherence to formatting, and clinical appropriateness of the generated management plan. GAVS was evaluated on 958 randomly selected admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV database, in which billed International Classification of Diseases, Tenth Revision (ICD-10) codes served as the ground truth. Two approaches were compared: a direct GPT-4.1 baseline prompted to predict ICD-10 codes without constraints and GAVS, in which GPT-4.1 generated diagnostic entities that were each mapped onto the top 10 matching ICD-10 codes through vector search.
Across the 63 outputs, 62 (98.4%) satisfied all evaluation criteria, with the only exception being a minor ordering inconsistency in one repetition of case 14. For GAVS, the 958 admissions contained 8576 assigned ICD-10 subcategory codes (1610 unique). The vanilla LLM produced 131,329 candidate codes, whereas GAVS produced 136,920. At the subcategory level, the vanilla LLM achieved 17.95% average recall (15.86% weighted), while GAVS achieved 20.63% (18.62% weighted), a statistically significant improvement (P<.001). At the category level, performance converged (32.60% vs 32.58% average weighted recall; P=.99).
GARAG demonstrated a workflow that grounds management plans in diagnosis-specific, peer-reviewed guideline evidence, preserving fine-grained clinical detail during retrieval. GAVS significantly improved fine-grained diagnostic coding recall compared with a direct LLM baseline. Together, these frameworks illustrate how LLM-based methods can enhance clinical decision support and medical coding. Both were subsequently integrated into Rhazes Clinician, a clinician-facing web application that orchestrates LLM agents to call specialized tools, providing a single interface for physician use. Further independent validation and large-scale studies are required to confirm generalizability and assess their impact on patient outcomes.
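The GAVS step described above, in which the LLM's diagnostic entities are each mapped onto their nearest ICD-10 codes by vector search, can be sketched as follows. This is a minimal illustration with hypothetical toy embedding vectors and a three-code index; the paper's actual embedding model, code set, and top-10 cut-off over the full ICD-10 vocabulary are not reproduced here.

```python
import math

# Toy "embeddings" for a few ICD-10 subcategory codes (hypothetical vectors;
# a real system would embed code descriptions with a trained encoder).
CODE_VECTORS = {
    "J18.9": [0.9, 0.1, 0.0],   # Pneumonia, unspecified organism
    "I50.9": [0.1, 0.9, 0.1],   # Heart failure, unspecified
    "E11.9": [0.0, 0.2, 0.9],   # Type 2 diabetes without complications
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_codes(entity_vector, k=10):
    """Map one LLM-generated diagnostic entity onto its k nearest ICD-10 codes."""
    ranked = sorted(CODE_VECTORS.items(),
                    key=lambda kv: cosine(entity_vector, kv[1]),
                    reverse=True)
    return [code for code, _ in ranked[:k]]

# An entity embedding close to the pneumonia vector ranks J18.9 first.
print(top_k_codes([0.85, 0.15, 0.05], k=2))
```

Constraining the LLM to codes surfaced by this search, rather than letting it emit codes freely, is what distinguishes GAVS from the direct GPT-4.1 baseline in the study.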
Journal Article
Integrating Graph Retrieval-Augmented Generation into Prescriptive Recommender Systems
by Weller, Julian; Schenck, Wolfram; Migenda, Nico
in advanced data analytics; Algorithms; Artificial intelligence
2025
Making time-critical decisions with serious consequences is a daily aspect of work environments. To support the process of finding optimal actions, data-driven approaches are increasingly being used. The most advanced form of data-driven analytics is prescriptive analytics, which prescribes actionable recommendations for users. However, the produced recommendations rely on complex models and optimization techniques that are difficult to understand or justify to non-expert users. Currently, there is a lack of platforms that offer easy integration of domain-specific prescriptive analytics workflows into production environments. In particular, there is no centralized environment and standardized approach for implementing such prescriptive workflows. To address these challenges, large language models (LLMs) can be leveraged to improve interpretability by translating complex recommendations into clear, context-specific explanations, enabling non-experts to grasp the rationale behind the suggested actions. Nevertheless, we acknowledge the inherent black-box nature of LLMs, which may introduce limitations in transparency. To mitigate these limitations and to provide interpretable recommendations based on real user knowledge, a knowledge graph is integrated. In this paper, we present and validate a prescriptive analytics platform that integrates ontology-based graph retrieval-augmented generation (GraphRAG) to enhance decision making by delivering actionable and context-aware recommendations. For this purpose, a knowledge graph is created through a fully automated workflow based on an ontology, which serves as the backbone of the prescriptive platform. Data sources for the knowledge graph are standardized and classified according to the ontology by employing a zero-shot classifier. For user-friendly presentation, we critically examine the usability of GraphRAG in prescriptive analytics platforms. 
We validate our prescriptive platform in a customer clinic with industry experts in our IoT-Factory, a dedicated research environment.
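The graph-retrieval step at the core of GraphRAG can be sketched as below: given a user's entity of interest, the system walks the knowledge graph to collect nearby triples, which are then serialized into the LLM's prompt context. The entities and relations here are hypothetical manufacturing examples; the paper's graph is built automatically from an ontology with a zero-shot classifier, which is not reproduced.

```python
# Hypothetical knowledge-graph fragment: entity -> [(relation, object), ...]
KNOWLEDGE_GRAPH = {
    "spindle_motor": [("part_of", "milling_station"),
                      ("monitored_by", "vibration_sensor")],
    "vibration_sensor": [("reports", "rms_vibration")],
}

def retrieve_subgraph(entity, depth=2):
    """Collect (subject, relation, object) triples within `depth` hops of an entity."""
    triples, frontier, seen = [], [entity], set()
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            for relation, obj in KNOWLEDGE_GRAPH.get(node, []):
                triples.append((node, relation, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

def to_context(triples):
    """Serialize triples into plain text for inclusion in the LLM prompt."""
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in triples)

print(to_context(retrieve_subgraph("spindle_motor")))
```

Grounding the generated explanation in these retrieved triples is what lets the platform justify a recommendation from real user knowledge rather than from the LLM alone.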
Journal Article
Enhancing Health Information Retrieval with RAG by prioritizing topical relevance and factual accuracy
2025
The exponential surge in online health information, coupled with its increasing use by non-experts, highlights the pressing need for advanced Health Information Retrieval (HIR) models that consider not only topical relevance but also the factual accuracy of the retrieved information, given the potential risks associated with health misinformation. To this aim, this paper introduces a solution driven by Retrieval-Augmented Generation (RAG), which leverages the capabilities of generative Large Language Models (LLMs) to enhance the retrieval of health-related documents grounded in scientific evidence. In particular, we propose a three-stage model: in the first stage, the user’s query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature. In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText). In the last stage, the documents to be retrieved are evaluated and ranked both from the point of view of topical relevance and factual accuracy by means of their comparison with GenText, either through stance detection or semantic similarity. In addition to calculating factual accuracy, GenText can offer a layer of explainability for it, aiding users in understanding the reasoning behind the retrieval. Experimental evaluation of our model on benchmark datasets and against baseline models demonstrates its effectiveness in enhancing the retrieval of both topically relevant and factually accurate health information, thus presenting a significant step forward in the health misinformation mitigation problem.
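The third stage described above, ranking candidate documents by their similarity to the LLM-generated GenText, can be sketched as follows. Jaccard token overlap stands in for the real semantic-similarity measure (an assumption for brevity), and the stance-detection variant the paper also supports is not shown.

```python
# Minimal sketch: rank documents against a GenText passage by token overlap.
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def rank_by_gentext(gentext, documents):
    """Order documents by similarity to GenText, most similar first."""
    return sorted(documents, key=lambda d: jaccard(gentext, d), reverse=True)

# Hypothetical example: GenText summarizes the scientific evidence, so a
# document agreeing with it ranks above a misinformation document.
gentext = "vitamin c does not prevent the common cold in the general population"
docs = [
    "vitamin c supplements cure colds overnight",
    "trials show vitamin c does not prevent the common cold",
]
print(rank_by_gentext(gentext, docs)[0])
```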
Journal Article
Maximizing RAG efficiency: A comparative analysis of RAG methods
2025
This paper addresses the optimization of retrieval-augmented generation (RAG) processes by exploring various methodologies, including advanced RAG methods. The research, driven by the need to enhance RAG processes as highlighted by recent studies, involved a grid-search optimization of 23,625 iterations. We evaluated multiple RAG methods across different vectorstores, embedding models, and large language models, using cross-domain datasets and contextual compression filters. The findings emphasize the importance of balancing context quality with similarity-based ranking methods, as well as understanding tradeoffs between similarity scores, token usage, runtime, and hardware utilization. Additionally, contextual compression filters were found to be crucial for efficient hardware utilization and reduced token consumption, despite the evident impacts on similarity scores, which may be acceptable depending on specific use cases and RAG methods.
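A grid search over RAG configurations of the kind described above can be sketched with `itertools.product`. The option lists and scoring function here are hypothetical stand-ins; the paper's grid covered 23,625 iterations over real vectorstores, embedding models, LLMs, and contextual compression filters.

```python
import itertools

# Hypothetical configuration grid (the paper's actual option sets are larger).
GRID = {
    "vectorstore": ["faiss", "chroma"],
    "embedding_model": ["small", "large"],
    "compression_filter": [False, True],
}

def evaluate(config):
    """Stand-in metric; a real run would measure similarity scores,
    token usage, runtime, and hardware utilization per configuration."""
    score = 0.5
    if config["embedding_model"] == "large":
        score += 0.2
    if config["compression_filter"]:
        score += 0.1
    return score

def grid_search(grid):
    keys = list(grid)
    return max(
        (dict(zip(keys, values)) for values in itertools.product(*grid.values())),
        key=evaluate,
    )

print(grid_search(GRID))
```

In practice the single scalar score would be replaced by the multi-objective tradeoff the paper emphasizes, e.g. accepting a small similarity-score drop for large token and runtime savings from compression filters.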
Journal Article
From Conversation to Standardized Terminology: An LLM‐RAG Approach for Automated Health Problem Identification in Home Healthcare
2025
Background With ambient listening systems increasingly adopted in healthcare, analyzing clinician‐patient conversations has become essential. The Omaha System is a standardized terminology for documenting patient care, classifying health problems into four domains across 42 problems and 377 signs/symptoms. Manually identifying and mapping these problems is time‐consuming and labor‐intensive. This study aims to automate health problem identification from clinician‐patient conversations using large language models (LLMs) with retrieval‐augmented generation (RAG). Methods Using the Omaha System framework, we analyzed 5118 utterances from 22 clinician‐patient encounters in home healthcare. RAG‐enhanced LLMs detected health problems and mapped them to Omaha System terminology. We evaluated different model configurations, including embedding models, context window sizes, parameter settings (top k, top p), and prompting strategies (zero‐shot, few‐shot, and chain‐of‐thought). Three LLMs—Llama 3.1‐8B‐Instruct, GPT‐4o‐mini, and GPT‐o3‐mini—were compared using precision, recall, and F1‐score against expert annotations. Results The optimal configuration used a 1‐utterance context window, top k = 15, top p = 0.6, and few‐shot learning with chain‐of‐thought prompting. GPT‐4o‐mini achieved the highest F1‐score (0.90) for both problem and sign/symptom identification, followed by GPT‐o3‐mini (0.83/0.82), while Llama 3.1‐8B‐Instruct performed worst (0.73/0.72). Conclusions Using the Omaha System, LLMs with RAG effectively automate health problem identification in clinical conversations. This approach can enhance documentation completeness, reduce documentation burden, and potentially improve patient outcomes through more comprehensive problem identification, translating into tangible improvements in clinical efficiency and care delivery. 
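The 1-utterance context window in the optimal configuration above can be sketched as follows: each target utterance is shown to the LLM together with its immediate neighbours. The transcript and prompt wording are hypothetical; the paper's actual few-shot examples and chain-of-thought instructions are not reproduced.

```python
def window(utterances, index, size=1):
    """Return the target utterance with `size` neighbours on each side."""
    lo, hi = max(0, index - size), min(len(utterances), index + size + 1)
    return utterances[lo:hi]

def build_prompt(utterances, index):
    """Assemble a context-windowed prompt for Omaha System mapping."""
    context = "\n".join(window(utterances, index))
    return (
        "Map the TARGET utterance to an Omaha System problem.\n"
        "Think step by step, then answer with the problem name.\n\n"
        f"Context:\n{context}\n\nTARGET: {utterances[index]}"
    )

transcript = [
    "How have you been sleeping?",
    "I get maybe three hours a night.",
    "And the new medication?",
]
print(build_prompt(transcript, 1))
```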
Clinical Relevance Automating health problem identification from clinical conversations can improve documentation accuracy, reduce burden, and ensure alignment with standardized frameworks like the Omaha System, enhancing care quality and continuity in home healthcare.
Journal Article
Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging
2025
Purpose
In radiology, large language models (LLMs), including ChatGPT, have recently gained attention, and their utility is being rapidly evaluated. However, concerns have emerged regarding their reliability in clinical applications due to limitations such as hallucinations and insufficient referencing. To address these issues, we focus on the latest technology, retrieval-augmented generation (RAG), which enables LLMs to reference reliable external knowledge (REK). Specifically, this study examines the utility and reliability of a recently released RAG-equipped LLM (RAG-LLM), NotebookLM, for staging lung cancer.
Materials and methods
We summarized the current lung cancer staging guideline in Japan and provided this as REK to NotebookLM. We then tasked NotebookLM with staging 100 fictional lung cancer cases based on CT findings and evaluated its accuracy. For comparison, we performed the same task using a gold-standard LLM, GPT-4 Omni (GPT-4o), both with and without the REK. For GPT-4o, the REK was provided directly within the prompt rather than through RAG.
Results
NotebookLM achieved 86% diagnostic accuracy in the lung cancer staging experiment, outperforming GPT-4o, which recorded 39% accuracy with the REK and 25% without it. Moreover, NotebookLM demonstrated 95% accuracy in searching reference locations within the REK.
Conclusion
NotebookLM, a RAG-LLM, successfully performed lung cancer staging by utilizing the REK, demonstrating superior performance compared to GPT-4o (without RAG). Additionally, it provided highly accurate reference locations within the REK, allowing radiologists to efficiently evaluate the reliability of NotebookLM’s responses and detect possible hallucinations. Overall, this study highlights the potential of NotebookLM, a RAG-LLM, in image diagnosis.
Journal Article
SKiM-GPT: combining biomedical literature-based discovery with large language model hypothesis evaluation
2025
Background
Generating and testing hypotheses is a critical aspect of biomedical science. Typically, researchers generate hypotheses by carefully analyzing available information and making logical connections, which are then tested. The accelerating growth of biomedical literature makes it increasingly difficult to keep pace with connections between biological entities emerging across biomedical research. Recently developed automated means of generating hypotheses can generate many more hypotheses than can be easily tested. One such approach involves literature‑based discovery (LBD) systems such as Serial KinderMiner (SKiM), which surfaces putative A‑B‑C links derived from term co‑occurrence. However, LBD systems leave three critical gaps: (i) they find statistical associations, not biological relationships; (ii) they can produce false‑positive leads; and (iii) they do not assess agreement with a hypothesis in question. As a result, LBD search results often require costly manual curation to be of practical utility to the researcher. Large language models (LLMs) have the potential to automate much of this curation step, but standalone LLMs are hampered by hallucinations, lack of transparency in information sources, and the inability to reference data not included in the training corpus.
Results
We introduce SKiM-GPT, a retrieval-augmented generation (RAG) system that combines SKiM’s co-occurrence search and retrieval with frontier LLMs to evaluate user-defined hypotheses. For every chosen A-B-C SKiM hit, SKiM-GPT retrieves appropriate PubMed abstract texts, filters out irrelevant abstracts with a fine-tuned relevance model, and prompts an LLM to evaluate the user’s hypothesis, given the relevant abstracts. Importantly, the SKiM-GPT system is transparent and human-verifiable: it displays the retrieved abstracts, the hypothesis score, and a justification for the score grounded in the texts and written in natural language. On a benchmark consisting of 14 disease-gene-drug hypotheses, SKiM-GPT achieves strong ordinal agreement with four expert biologists (Cohen’s κ = 0.84), demonstrating its ability to replicate expert judgment.
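The curation pipeline described in the Results, retrieving abstracts, filtering them with a relevance model, then having an LLM judge the hypothesis against the survivors, can be sketched as below. The relevance filter and "LLM" verdict here are keyword stand-ins, not the fine-tuned relevance model or frontier LLM used in SKiM-GPT.

```python
# Sketch of SKiM-GPT's filter-then-evaluate step with toy components.
def is_relevant(abstract, terms):
    """Stand-in relevance model: require all hypothesis terms to appear."""
    text = abstract.lower()
    return all(term.lower() in text for term in terms)

def evaluate_hypothesis(hypothesis, abstracts, terms):
    relevant = [a for a in abstracts if is_relevant(a, terms)]
    # Stand-in "LLM" verdict: score is the fraction of retrieved abstracts
    # that passed the relevance filter. A real scorer reads the texts.
    score = len(relevant) / len(abstracts) if abstracts else 0.0
    return {"hypothesis": hypothesis, "evidence": relevant, "score": score}

abstracts = [
    "Metformin modulates AMPK signalling in glioma cell lines.",
    "Crop rotation improves soil nitrogen retention.",
]
result = evaluate_hypothesis(
    "metformin -> AMPK -> glioma", abstracts, ["metformin", "glioma"]
)
print(result["score"])
```

Returning the surviving abstracts alongside the score mirrors the system's transparency goal: a reader can check the evidence behind each verdict.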
Conclusions
SKiM-GPT is open-source (https://github.com/stewart-lab/skimgpt) and available through a web interface (https://skim.morgridge.org), enabling both wet-lab and computational researchers to systematically and efficiently evaluate biomedical hypotheses at scale.
Journal Article
From text to DSM: evaluating the impact of writing style and entity naming on LLM-based retrieval of asymmetrical indirect design dependencies
2026
The design structure matrix (DSM) is an established method for modelling design dependencies, but manually putting one together can be resource-intensive. The Auto-DSM workflow integrates a large language model (LLM) with retrieval-augmented generation (RAG) to extract system dependencies from input data, which are then used to automatically generate a corresponding DSM. This paper reports on an evaluation study that uses the Auto-DSM workflow as a basis to evaluate the retrieval of asymmetrical direct and indirect system dependencies from text data. Five LLMs, namely GPT-4o, GPT-4, Llama 3, DeepSeek-R1, and TinyLlama, were used in this work. Auto-DSM with GPT-4 produced a complete DSM with an accuracy of 0.981 (SD = 0.025, N = 600) when plain dependency descriptions were used and reached a full accuracy of 1.000 (SD = 0.000, N = 5) when the same dependencies were presented in the form of patent claims. It was revealed that the way system entities are named in input data can affect accuracy, and that the reporting of path distance between entities is influenced by the writing style and format of the data. The findings of this work can be used to support the development of automated DSM generation, enabling more advanced DSM techniques to be built on.
Journal Article
Optimizing Legal Text Summarization Through Dynamic Retrieval-Augmented Generation and Domain-Specific Adaptation
2025
Legal text summarization presents distinct challenges due to the intricate and domain-specific nature of legal language. This paper introduces a novel framework integrating dynamic Retrieval-Augmented Generation (RAG) with domain-specific adaptation to enhance the accuracy and contextual relevance of legal document summaries. The proposed Dynamic Legal RAG system achieves a vital form of symmetry between information retrieval and content generation, ensuring that retrieved legal knowledge is both comprehensive and precise. Using the BM25 retriever with top-3 chunk selection, the system optimizes relevance and efficiency, minimizing redundancy while maximizing legally pertinent content. A key design feature is the compression ratio constraint (0.05 to 0.5), maintaining structural symmetry between the original judgment and its summary by balancing representation and information density. Extensive evaluations establish BM25 as the most effective retriever, striking an optimal balance between precision and recall. A comparative analysis of transformer-based (Decoder-only) models—DeepSeek-7B, LLaMA 2-7B, and LLaMA 3.1-8B—demonstrates that LLaMA 3.1-8B, enriched with Legal Named Entity Recognition (NER) and the Dynamic RAG system, achieves superior performance with a BERTScore of 0.89. This study lays a strong foundation for future research in hybrid retrieval models, adaptive chunking strategies, and legal-specific evaluation metrics, with practical implications for case law analysis and automated legal drafting.
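The BM25 retrieval with top-3 chunk selection described above can be sketched with a minimal scorer. The chunks are hypothetical, and the k1/b values are common defaults (an assumption; the paper does not state its parameters).

```python
import math
from collections import Counter

def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score each chunk against the query with the Okapi BM25 formula."""
    tokenized = [c.lower().split() for c in chunks]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(chunks)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

def top_chunks(query, chunks, k=3):
    """Select the top-k chunks for the generation stage."""
    scores = bm25_scores(query, chunks)
    order = sorted(range(len(chunks)), key=scores.__getitem__, reverse=True)
    return [chunks[i] for i in order[:k]]

chunks = [
    "the court held that the appellant breached the contract",
    "sentencing guidelines for repeat offences were revised",
    "damages were awarded for breach of contract by the court",
    "the weather on the day of the hearing was unremarkable",
]
print(top_chunks("breach of contract damages", chunks))
```

The three selected chunks would then be passed to the summarization model, with the compression ratio constraint applied to the generated summary's length relative to the original judgment.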
Journal Article