1,069 results for "retrieval-augmented generation"
A Comprehensive Survey of Retrieval-Augmented Large Language Models for Decision Making in Agriculture: Unsolved Problems and Research Opportunities
The breakthrough in developing large language models (LLMs) over the past few years has led to their widespread implementation in various areas of industry, business, and agriculture. The aim of this article is to critically analyse and generalise the known results and research directions on approaches to the development and utilisation of LLMs, with a particular focus on their functional characteristics when integrated into decision support systems (DSSs) for agricultural monitoring. The subject of the research is approaches to the development and integration of LLMs into DSSs for agrotechnical monitoring. The main scientific and applied results of the article are as follows: the world experience of using LLMs to improve agricultural processes is analysed; the functional characteristics of LLMs are critically examined and the areas of application of their architectures are identified; the need to focus on retrieval-augmented generation (RAG) as an approach to overcoming one of the main limitations of LLMs, namely the limited knowledge base of their training data, is established; the characteristics and prospects of using LLMs in agricultural DSSs are analysed, highlighting trustworthiness, explainability, and bias reduction as priority research areas; and the potential socio-economic effect of implementing LLMs and RAG in the agricultural sector is substantiated.
Challenges and Solutions in Applying Large Language Models to Guideline-Based Management Planning and Automated Medical Coding in Health Care: Algorithm Development and Validation
Diagnostic errors and administrative burdens, including medical coding, remain major challenges in health care. Large language models (LLMs) have the potential to alleviate these problems, but their adoption has been limited by concerns regarding reliability, transparency, and clinical safety. This study introduces and evaluates 2 LLM-based frameworks, implemented within the Rhazes Clinician platform, designed to address these challenges: generation-assisted retrieval-augmented generation (GARAG) for automated evidence-based treatment planning and generation-assisted vector search (GAVS) for automated medical coding. GARAG was evaluated on 21 clinical test cases created by medically qualified authors. Each case was executed 3 times independently, and outputs were assessed using 4 criteria: correctness of references, absence of duplication, adherence to formatting, and clinical appropriateness of the generated management plan. GAVS was evaluated on 958 randomly selected admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV database, in which billed International Classification of Diseases, Tenth Revision (ICD-10) codes served as the ground truth. Two approaches were compared: a direct GPT-4.1 baseline prompted to predict ICD-10 codes without constraints and GAVS, in which GPT-4.1 generated diagnostic entities that were each mapped onto the top 10 matching ICD-10 codes through vector search. Across the 63 outputs, 62 (98.4%) satisfied all evaluation criteria, with the only exception being a minor ordering inconsistency in one repetition of case 14. For GAVS, the 958 admissions contained 8576 assigned ICD-10 subcategory codes (1610 unique). The vanilla LLM produced 131,329 candidate codes, whereas GAVS produced 136,920. At the subcategory level, the vanilla LLM achieved 17.95% average recall (15.86% weighted), while GAVS achieved 20.63% (18.62% weighted), a statistically significant improvement (P<.001). At the category level, performance converged (32.60% vs 32.58% average weighted recall; P=.99). GARAG demonstrated a workflow that grounds management plans in diagnosis-specific, peer-reviewed guideline evidence, preserving fine-grained clinical detail during retrieval. GAVS significantly improved fine-grained diagnostic coding recall compared with a direct LLM baseline. Together, these frameworks illustrate how LLM-based methods can enhance clinical decision support and medical coding. Both were subsequently integrated into Rhazes Clinician, a clinician-facing web application that orchestrates LLM agents to call specialized tools, providing a single interface for physician use. Further independent validation and large-scale studies are required to confirm generalizability and assess their impact on patient outcomes.
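As a rough illustration of the GAVS mapping step this abstract describes (LLM-generated diagnostic entities matched to their top-k ICD-10 codes by vector search, then scored by recall against billed codes), here is a minimal Python sketch. The embed() function and the tiny ICD10_INDEX are hypothetical placeholders, not the paper's embedding model or code list.

```python
from math import sqrt

def embed(text: str) -> list[float]:
    # Placeholder embedding: a 26-dim letter-frequency vector standing in
    # for a real sentence-embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mini ICD-10 index: code -> description (not the real code list).
ICD10_INDEX = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J18.9": "Pneumonia, unspecified organism",
}

def map_entities_to_codes(entities: list[str], k: int = 10) -> set[str]:
    # For each generated diagnostic entity, keep the k most similar codes.
    candidates: set[str] = set()
    for entity in entities:
        e_vec = embed(entity)
        ranked = sorted(ICD10_INDEX.items(),
                        key=lambda kv: cosine(e_vec, embed(kv[1])),
                        reverse=True)
        candidates.update(code for code, _ in ranked[:k])
    return candidates

def recall(predicted: set[str], billed: set[str]) -> float:
    return len(predicted & billed) / len(billed) if billed else 0.0

# Toy usage: entities an LLM might generate for one admission.
entities = ["type 2 diabetes", "community-acquired pneumonia"]
billed = {"E11.9", "J18.9"}
predicted = map_entities_to_codes(entities, k=2)
print(f"subcategory recall = {recall(predicted, billed):.2f}")
```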
Integrating Graph Retrieval-Augmented Generation into Prescriptive Recommender Systems
Making time-critical decisions with serious consequences is a daily aspect of work environments. To support the process of finding optimal actions, data-driven approaches are increasingly being used. The most advanced form of data-driven analytics is prescriptive analytics, which prescribes actionable recommendations for users. However, the produced recommendations rely on complex models and optimization techniques that are difficult to understand or justify to non-expert users. Currently, there is a lack of platforms that offer easy integration of domain-specific prescriptive analytics workflows into production environments. In particular, there is no centralized environment and standardized approach for implementing such prescriptive workflows. To address these challenges, large language models (LLMs) can be leveraged to improve interpretability by translating complex recommendations into clear, context-specific explanations, enabling non-experts to grasp the rationale behind the suggested actions. Nevertheless, we acknowledge the inherent black-box nature of LLMs, which may introduce limitations in transparency. To mitigate these limitations and to provide interpretable recommendations based on real user knowledge, a knowledge graph is integrated. In this paper, we present and validate a prescriptive analytics platform that integrates ontology-based graph retrieval-augmented generation (GraphRAG) to enhance decision making by delivering actionable and context-aware recommendations. For this purpose, a knowledge graph is created through a fully automated workflow based on an ontology, which serves as the backbone of the prescriptive platform. Data sources for the knowledge graph are standardized and classified according to the ontology by employing a zero-shot classifier. For user-friendly presentation, we critically examine the usability of GraphRAG in prescriptive analytics platforms. We validate our prescriptive platform in a customer clinic with industry experts in our IoT-Factory, a dedicated research environment.
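As a rough sketch of the graph-retrieval idea behind a GraphRAG setup like the one this abstract describes, the snippet below stores a few facts as triples, pulls the triples that mention entities from a query, and verbalises them as context for a recommendation prompt. The graph content and the keyword-matching rule are toy assumptions, not the platform's ontology or retriever.

```python
# Toy knowledge graph: (subject, predicate, object) triples.
TRIPLES = [
    ("Machine_A", "has_status", "overheating"),
    ("Machine_A", "located_in", "Assembly_Cell_2"),
    ("overheating", "recommended_action", "reduce_spindle_speed"),
]

def retrieve_subgraph(query: str) -> list[tuple[str, str, str]]:
    # Keep any triple whose subject or object mentions a query term.
    terms = query.lower().replace("?", "").split()
    return [t for t in TRIPLES
            if any(term in (t[0] + " " + t[2]).lower() for term in terms)]

def verbalise(selected: list[tuple[str, str, str]]) -> str:
    # Turn triples into short sentences the LLM can cite in its explanation.
    return "\n".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in selected)

context = verbalise(retrieve_subgraph("Why is Machine_A overheating?"))
print(context)  # graph facts handed to the LLM alongside the recommendation
```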
From Conversation to Standardized Terminology: An LLM-RAG Approach for Automated Health Problem Identification in Home Healthcare
Background: With ambient listening systems increasingly adopted in healthcare, analyzing clinician-patient conversations has become essential. The Omaha System is a standardized terminology for documenting patient care that classifies 42 health problems into four domains, with 377 associated signs/symptoms. Manually identifying and mapping these problems is time-consuming and labor-intensive. This study aims to automate health problem identification from clinician-patient conversations using large language models (LLMs) with retrieval-augmented generation (RAG).
Methods: Using the Omaha System framework, we analyzed 5118 utterances from 22 clinician-patient encounters in home healthcare. RAG-enhanced LLMs detected health problems and mapped them to Omaha System terminology. We evaluated different model configurations, including embedding models, context window sizes, parameter settings (top k, top p), and prompting strategies (zero-shot, few-shot, and chain-of-thought). Three LLMs (Llama 3.1-8B-Instruct, GPT-4o-mini, and GPT-o3-mini) were compared against expert annotations using precision, recall, and F1-score.
Results: The optimal configuration used a 1-utterance context window, top k = 15, top p = 0.6, and few-shot learning with chain-of-thought prompting. GPT-4o-mini achieved the highest F1-score (0.90) for both problem and sign/symptom identification, followed by GPT-o3-mini (0.83/0.82), while Llama 3.1-8B-Instruct performed worst (0.73/0.72).
Conclusions: Using the Omaha System, LLMs with RAG can effectively automate health problem identification in clinical conversations. This approach can enhance documentation completeness, reduce documentation burden, and potentially improve patient outcomes through more comprehensive problem identification, translating into tangible improvements in clinical efficiency and care delivery.
Clinical Relevance: Automating health problem identification from clinical conversations can improve documentation accuracy, reduce burden, and ensure alignment with standardized frameworks such as the Omaha System, enhancing care quality and continuity in home healthcare.
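A minimal sketch of the kind of retrieval-and-prompting configuration the Results describe (1-utterance context window, top k = 15, top p = 0.6, few-shot chain-of-thought). The RagConfig class, the example content, and the prompt wording are illustrative assumptions; the study's actual retriever and prompts are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class RagConfig:
    context_window: int = 1   # utterances of surrounding context
    top_k: int = 15           # retrieved Omaha System terms per query
    top_p: float = 0.6        # nucleus-sampling cutoff for generation

# Hypothetical few-shot example with a short chain-of-thought rationale.
FEW_SHOT_EXAMPLE = (
    "Utterance: 'My ankles have been swelling up again.'\n"
    "Reasoning: swelling of the extremities maps to the Circulation problem.\n"
    "Answer: Circulation - edema\n"
)

def build_prompt(utterances: list[str], idx: int,
                 retrieved_terms: list[str], cfg: RagConfig) -> str:
    # Keep only the target utterance plus its small context window.
    lo = max(0, idx - cfg.context_window)
    hi = min(len(utterances), idx + cfg.context_window + 1)
    context = "\n".join(utterances[lo:hi])
    terms = "\n".join(f"- {t}" for t in retrieved_terms[:cfg.top_k])
    return (
        "Identify the Omaha System problem and sign/symptom.\n\n"
        f"Example:\n{FEW_SHOT_EXAMPLE}\n"
        f"Candidate terms:\n{terms}\n\n"
        f"Conversation excerpt:\n{context}\n"
        "Think step by step, then give the final answer."
    )

# Toy usage with a hypothetical retriever output.
utterances = ["How are you sleeping?", "I wake up three times a night.", "I see."]
candidate_terms = ["Sleep and rest patterns - awakens frequently",
                   "Mental health - anxious"]
print(build_prompt(utterances, 1, candidate_terms, RagConfig()))
```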
SKiM-GPT: combining biomedical literature-based discovery with large language model hypothesis evaluation
Background: Generating and testing hypotheses is a critical aspect of biomedical science. Typically, researchers generate hypotheses by carefully analyzing available information and making logical connections, which are then tested. The accelerating growth of biomedical literature makes it increasingly difficult to keep pace with connections between biological entities emerging across biomedical research. Recently developed automated means of generating hypotheses can generate many more hypotheses than can be easily tested. One such approach involves literature-based discovery (LBD) systems such as Serial KinderMiner (SKiM), which surfaces putative A-B-C links derived from term co-occurrence. However, LBD systems leave three critical gaps: (i) they find statistical associations, not biological relationships; (ii) they can produce false-positive leads; and (iii) they do not assess agreement with a hypothesis in question. As a result, LBD search results often require costly manual curation to be of practical utility to the researcher. Large language models (LLMs) have the potential to automate much of this curation step, but standalone LLMs are hampered by hallucinations, lack of transparency in information sources, and the inability to reference data not included in the training corpus.
Results: We introduce SKiM-GPT, a retrieval-augmented generation (RAG) system that combines SKiM’s co-occurrence search and retrieval with frontier LLMs to evaluate user-defined hypotheses. For every chosen A-B-C SKiM hit, SKiM-GPT retrieves appropriate PubMed abstract texts, filters out irrelevant abstracts with a fine-tuned relevance model, and prompts an LLM to evaluate the user’s hypothesis given the relevant abstracts. Importantly, the SKiM-GPT system is transparent and human-verifiable: it displays the retrieved abstracts, the hypothesis score, and a justification for the score grounded in the texts and written in natural language. On a benchmark consisting of 14 disease-gene-drug hypotheses, SKiM-GPT achieves strong ordinal agreement with four expert biologists (Cohen’s κ = 0.84), demonstrating its ability to replicate expert judgment.
Conclusions: SKiM-GPT is open-source (https://github.com/stewart-lab/skimgpt) and available through a web interface (https://skim.morgridge.org), enabling both wet-lab and computational researchers to systematically and efficiently evaluate biomedical hypotheses at scale.
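To make the retrieve-filter-evaluate loop this abstract describes more concrete, here is a minimal Python sketch under stated assumptions: fetch_pubmed_abstracts(), is_relevant(), and score_hypothesis() are hypothetical stand-ins for the PubMed retrieval, the fine-tuned relevance model, and the LLM call; they are not the SKiM-GPT API.

```python
from dataclasses import dataclass

@dataclass
class HypothesisVerdict:
    score: float            # e.g. 0 (refuted) .. 1 (strongly supported)
    justification: str      # natural-language rationale grounded in the abstracts
    abstracts: list[str]    # evidence shown to the user for verification

def fetch_pubmed_abstracts(a: str, b: str, c: str) -> list[str]:
    # Placeholder: would query PubMed for abstracts mentioning the A-B-C terms.
    return [f"Toy abstract linking {a}, {b} and {c}."]

def is_relevant(abstract: str, hypothesis: str) -> bool:
    # Placeholder for the fine-tuned relevance classifier.
    return True

def score_hypothesis(hypothesis: str, abstracts: list[str]) -> tuple[float, str]:
    # Placeholder for the LLM call that reads the abstracts and scores
    # the hypothesis with a justification.
    return 0.8, "The retrieved abstracts report a consistent association."

def evaluate_abc_hit(a: str, b: str, c: str, hypothesis: str) -> HypothesisVerdict:
    abstracts = [t for t in fetch_pubmed_abstracts(a, b, c)
                 if is_relevant(t, hypothesis)]
    score, why = score_hypothesis(hypothesis, abstracts)
    return HypothesisVerdict(score, why, abstracts)

# Toy usage with an invented disease-gene-drug hypothesis.
verdict = evaluate_abc_hit("Parkinson disease", "LRRK2", "rapamycin",
                           "Rapamycin modulates LRRK2-linked Parkinson disease.")
print(verdict.score, verdict.justification)
```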
Web Application for Retrieval-Augmented Generation: Implementation and Testing
The purpose of this paper is to explore the implementation of retrieval-augmented generation (RAG) technology with open-source large language models (LLMs). A dedicated web-based application, PaSSER, was developed, integrating RAG with Mistral:7b, Llama2:7b, and Orca2:7b models. A variety of software tools were used in the application’s development. PaSSER employs a set of evaluation metrics, including METEOR, ROUGE, BLEU, perplexity, cosine similarity, Pearson correlation, and F1 score, to assess LLMs’ performance, particularly within the smart agriculture domain. The paper presents the results and analyses of two tests. One test assessed the performance of LLMs across different hardware configurations, while the other determined which model delivered the most accurate and contextually relevant responses within RAG. The paper also discusses integrating blockchain with LLMs to manage and store assessment results. The tests revealed that GPUs are essential for fast text generation, even for 7b models. Orca2:7b was the fastest on the Mac M1, and Mistral:7b had superior performance on the dataset of 446 question–answer pairs. The discussion covers technical and hardware considerations affecting LLM performance. The conclusion outlines future developments in leveraging other LLMs, fine-tuning approaches, and further integration with blockchain and IPFS.
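As a small illustration of two of the lighter-weight metrics PaSSER is said to use, the sketch below computes token-level F1 and bag-of-words cosine similarity between a reference and a generated answer; METEOR, ROUGE, BLEU, and perplexity need dedicated libraries and are omitted. The example strings are invented.

```python
from collections import Counter
from math import sqrt

def token_f1(reference: str, candidate: str) -> float:
    # Overlap-based F1 over whitespace tokens.
    ref, cand = reference.lower().split(), candidate.lower().split()
    common = sum((Counter(ref) & Counter(cand)).values())
    if not common:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def bow_cosine(reference: str, candidate: str) -> float:
    # Cosine similarity between bag-of-words count vectors.
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    dot = sum(ref[t] * cand[t] for t in ref)
    norm = (sqrt(sum(v * v for v in ref.values()))
            * sqrt(sum(v * v for v in cand.values())))
    return dot / norm if norm else 0.0

ref = "Drip irrigation reduces water use in smart agriculture."
gen = "Smart agriculture uses drip irrigation to reduce water use."
print(f"F1 = {token_f1(ref, gen):.2f}, cosine = {bow_cosine(ref, gen):.2f}")
```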
Systematic Analysis of Retrieval-Augmented Generation-Based LLMs for Medical Chatbot Applications
Artificial Intelligence (AI) has the potential to revolutionise the medical and healthcare sectors. AI and related technologies, such as medical AI assistants, chatbots, and robots, could significantly address some supply-and-demand challenges in the healthcare system. This paper focuses on tailoring LLMs to medical data using a Retrieval-Augmented Generation (RAG) database and evaluates their performance in a computationally resource-constrained environment. Existing studies primarily focus on fine-tuning LLMs on medical data, but this paper combines RAG and fine-tuned models and compares them against base models using RAG or fine-tuning alone. Open-source LLMs (Flan-T5-Large, LLaMA-2-7B, and Mistral-7B) are fine-tuned using the medical datasets Meadow-MedQA and MedMCQA. Experiments are reported for response generation and multiple-choice question answering. The latter uses two distinct methodologies: Type A, standard question answering via direct choice selection; and Type B, language generation with probability-based confidence scores for the available choices. Results in the medical domain reveal that fine-tuning and RAG are crucial for improved performance, and that methodology Type A outperforms Type B.
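A minimal sketch contrasting the two multiple-choice methodologies mentioned above: Type A asks the model for a letter directly, while Type B scores each available choice and picks the most confident one. choose_letter() and score_choice() are hypothetical placeholders for the underlying model calls, not the paper's implementation.

```python
def choose_letter(question: str, choices: dict[str, str]) -> str:
    # Placeholder for a direct "answer with A/B/C/D" generation call (Type A).
    return "B"

def score_choice(question: str, choice: str) -> float:
    # Placeholder for a per-choice confidence score, e.g. the mean token
    # log-probability of the choice text given the question (Type B).
    return float(len(choice)) / 100.0

def answer_type_a(question: str, choices: dict[str, str]) -> str:
    return choose_letter(question, choices)

def answer_type_b(question: str, choices: dict[str, str]) -> str:
    # Pick the choice whose confidence score is highest.
    return max(choices, key=lambda k: score_choice(question, choices[k]))

question = "Which vitamin deficiency causes scurvy?"
choices = {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"}
print(answer_type_a(question, choices), answer_type_b(question, choices))
```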
CRP-RAG: A Retrieval-Augmented Generation Framework for Supporting Complex Logical Reasoning and Knowledge Planning
The Retrieval-Augmented Generation (RAG) framework enhances Large Language Models (LLMs) by retrieving relevant knowledge to broaden their knowledge boundaries and mitigate factual hallucinations stemming from knowledge gaps. However, the RAG framework faces challenges in effective knowledge retrieval and utilization: invalid or misused knowledge interferes with LLM generation, reducing reasoning efficiency and answer quality. Existing RAG methods address these issues by decomposing and expanding queries, introducing special knowledge structures, and using reasoning process evaluation and feedback. However, their linear reasoning structures limit complex thought transformations and reasoning over intricate queries. Additionally, knowledge retrieval and utilization are decoupled from reasoning and answer generation, hindering effective knowledge support during answer generation. To address these limitations, we propose the CRP-RAG framework, which employs reasoning graphs to model complex query reasoning processes more comprehensively and accurately. CRP-RAG guides knowledge retrieval, aggregation, and evaluation through reasoning graphs, dynamically adjusting the reasoning path based on evaluation results and selecting knowledge-sufficient paths for answer generation. CRP-RAG outperforms the best LLM and RAG baselines by 2.46 in open-domain QA, 7.43 in multi-hop reasoning, and 4.2 in factual verification. Experiments also show the superior factual consistency and robustness of CRP-RAG over existing RAG methods. Extensive analyses confirm its accurate and fact-faithful reasoning and answer generation for complex queries.
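As a loose illustration of the path-selection idea in this abstract, the sketch below treats a reasoning graph as a set of candidate paths, gives each step a knowledge-sufficiency score, and uses only a path whose every step is sufficiently supported for answer generation. The graph, scores, and threshold are invented; CRP-RAG's actual graph construction and evaluation are not reproduced here.

```python
# Toy reasoning graph: each path is a list of (reasoning step, sufficiency score),
# where the score stands in for an evaluation of the knowledge retrieved for it.
REASONING_GRAPH = {
    "path_1": [("Who directed the film?", 0.9),
               ("Which year was the director born?", 0.4)],
    "path_2": [("What birth year does the retrieved biography state?", 0.8),
               ("Does the passage state it explicitly?", 0.7)],
}

SUFFICIENCY_THRESHOLD = 0.6

def select_path(graph: dict[str, list[tuple[str, float]]]) -> str | None:
    # Use the first path whose every step has enough retrieved support.
    for name, steps in graph.items():
        if all(score >= SUFFICIENCY_THRESHOLD for _, score in steps):
            return name
    return None

chosen = select_path(REASONING_GRAPH)
print(chosen)  # path_2: every step is sufficiently supported by retrieval
```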
ADR: Attention Head Detection and Reweighting Enhance RAG Performance in a Positional-Encoding-Free Paradigm
Retrieval-augmented generation (RAG) has established a new search paradigm, in which large language models integrate external resources to compensate for their inherent knowledge limitations. However, limited context awareness reduces the performance of large language models in RAG tasks. Existing solutions incur additional time and memory overhead and depend on specific positional encodings. In this paper, we propose Attention Head Detection and Reweighting (ADR), a lightweight and general framework. Specifically, we employ a recognition task to identify RAG-suppressing heads that limit the model’s context awareness. We then reweight their outputs with learned coefficients to mitigate the influence of these RAG-suppressing heads. After training, the weights are fixed during inference, introducing no additional time overhead and remaining agnostic to the choice of positional embedding. Experiments on PetroAI further demonstrate that ADR enhances the context awareness of fine-tuned models.
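A minimal numpy sketch of the reweighting step the abstract describes: each attention head's output is scaled by a learned per-head coefficient before the heads are concatenated, so heads flagged as RAG-suppressing can be down-weighted with no extra inference cost. The coefficients and the flagged heads below are made-up values; the detection procedure itself is not shown.

```python
import numpy as np

num_heads, head_dim = 4, 8
head_outputs = np.random.randn(num_heads, head_dim)   # per-head outputs for one token

# Hypothetical learned coefficients: heads 1 and 3 were flagged as
# RAG-suppressing and are scaled down; the others are left untouched.
head_weights = np.array([1.0, 0.3, 1.0, 0.5])

reweighted = head_outputs * head_weights[:, None]      # scale each head's output
concatenated = reweighted.reshape(-1)                  # concat before the output projection
print(concatenated.shape)                              # (num_heads * head_dim,)
```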
Analysis of Large Language Models for Company Annual Reports Based on Retrieval-Augmented Generation
Large language models (LLMs) like ChatGPT-4 and Gemini 1.0 demonstrate significant text generation capabilities but often struggle with outdated knowledge, domain specificity, and hallucinations. Retrieval-Augmented Generation (RAG) offers a promising solution by integrating external knowledge sources to produce more accurate and informed responses. This research investigates RAG’s effectiveness in enhancing LLM performance for financial report analysis. We examine how RAG and the specific prompt design improve the provision of qualitative and quantitative financial information in terms of accuracy, relevance, and verifiability. Employing a design science research approach, we compare ChatGPT-4 responses before and after RAG integration, using annual reports from ten selected technology companies. Our findings demonstrate that RAG improves the relevance and verifiability of LLM outputs (by 0.66 and 0.71, respectively, on a scale from 1 to 5), while also reducing irrelevant or incorrect answers. Prompt specificity is shown to critically impact response quality. This study indicates RAG’s potential to mitigate LLM biases and inaccuracies, offering a practical solution for generating reliable and contextually rich financial insights.
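As a rough sketch of the RAG workflow this abstract evaluates, the snippet below chunks annual-report text, retrieves the chunks most relevant to a question, and builds a prompt that restricts the answer to those excerpts. The keyword-overlap scoring and the report text are stand-in assumptions for a real embedding-based retriever and real filings.

```python
def chunk(text: str, size: int = 40) -> list[str]:
    # Split the report into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Crude relevance score: number of shared lowercase terms.
    q_terms = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q_terms & set(c.lower().split())),
                  reverse=True)[:k]

def build_prompt(question: str, excerpts: list[str]) -> str:
    context = "\n---\n".join(excerpts)
    return (f"Answer using only the report excerpts below; say so if the "
            f"answer is not present.\n\n{context}\n\nQuestion: {question}")

report = ("Revenue grew 12 percent year over year, driven by cloud services. "
          "Research and development expenses increased to 2.1 billion dollars.")
prompt = build_prompt("How much did revenue grow?",
                      retrieve("revenue growth", chunk(report)))
print(prompt)
```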