Catalogue Search | MBRL

Chef Dalle: Transforming Cooking with Multi-Model Multimodal AI

by Li, J. Jenny , Morreale, Patricia , Hannon, Brendan in Accessibility , Artificial intelligence , Cookery

2024

In an era where dietary habits significantly impact health, technological interventions can offer personalized and accessible food choices. This paper introduces Chef Dalle, a recipe recommendation system that leverages multi-model and multimodal human-computer interaction (HCI) techniques to provide personalized cooking guidance. The application integrates voice-to-text conversion via Whisper and ingredient image recognition through GPT-Vision. It employs an advanced recipe filtering system that utilizes user-provided ingredients to fetch recipes, which are then evaluated through multi-model AI through integrations of OpenAI, Google Gemini, Claude, and/or Anthropic APIs to deliver highly personalized recommendations. These methods enable users to interact with the system using voice, text, or images, accommodating various dietary restrictions and preferences. Furthermore, the utilization of DALL-E 3 for generating recipe images enhances user engagement. User feedback mechanisms allow for the refinement of future recommendations, demonstrating the system’s adaptability. Chef Dalle showcases potential applications ranging from home kitchens to grocery stores and restaurant menu customization, addressing accessibility and promoting healthier eating habits. This paper underscores the significance of multimodal HCI in enhancing culinary experiences, setting a precedent for future developments in the field.

Journal Article

Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook

by Sheikh, Javaid , Renault, Max-Antoine , Damseh, Rafat in Application , Artificial intelligence , Chatbots

2024

In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.

Journal Article

Google Gemini as a next generation AI educational tool: a review of emerging educational technology

by Imran, Muhammad , Almusharraf, Norah in Artificial Intelligence , Audio data , Chatbots

2024

This emerging technology report discusses Google Gemini as a multimodal generative AI tool and presents its revolutionary potential for future educational technology. It introduces Gemini and its features, including versatility in processing data from text, image, audio, and video inputs and generating diverse content types. This study discusses recent empirical studies, technology in practice, and the relationship between Gemini technology and the educational landscape. This report further explores Gemini’s relevance for future educational endeavors and practical applications in emerging technologies. Also, it discusses the significant challenges and ethical considerations that must be addressed to ensure its responsible and effective integration into the educational landscape.

Journal Article

Bridging the gap between AI and human emotion: a multimodal recognition system

by Teja, Jakkula Sai Surya , Lakshmi Prasanna, J. , Neeraja, Ganta

2024

This study introduces a novel system that integrates voice and facial recognition technologies to enhance human-computer interaction by accurately interpreting and responding to user emotions. Unlike conventional approaches that analyze either voice or facial expressions in isolation, this system combines both modalities, offering a more comprehensive understanding of emotional states. By evaluating facial expressions, vocal tones, and contextual conversation history, the system generates personalized, context-aware responses, fostering more natural and empathetic AI interactions. This advancement significantly improves user engagement and satisfaction, paving the way for emotionally intelligent AI applications across diverse fields.

Journal Article

Correction: A multimodal deep learning architecture for predicting interstitial glucose for effective type 2 diabetes management

by Dafoulas, George E. , Pecchia, Leandro , Fotiadis, Dimitrios in Correction , Deep learning , Humanities and Social Sciences

2025

Journal Article

Machine learning for cognitive behavioral analysis: datasets, methods, paradigms, and research directions

by Jain, N. K. , Sinha, Pratyush , Tasgaonkar, Vaibhav in Affective computing , Artificial Intelligence , Behavior

2023

Human behaviour reflects cognitive abilities. Human cognition is fundamentally linked to the different experiences or characteristics of consciousness/emotions, such as joy, grief, anger, etc., which assists in effective communication with others. Detection and differentiation between thoughts, feelings, and behaviours are paramount in learning to control our emotions and respond more effectively in stressful circumstances. The ability to perceive, analyse, process, interpret, remember, and retrieve information while making judgments to respond correctly is referred to as Cognitive Behavior. After making a significant mark in emotion analysis, deception detection is one of the key areas to connect human behaviour, mainly in the forensic domain. Detection of lies, deception, malicious intent, abnormal behaviour, emotions, stress, etc., has a significant role in advanced stages of behavioral science. Artificial Intelligence and Machine Learning (AI/ML) have helped a great deal in pattern recognition, data extraction and analysis, and interpretation. The goal of using AI and ML in behavioral sciences is to infer human behaviour, mainly for mental health or forensic investigations. The presented work provides an extensive review of the research on cognitive behaviour analysis. A parametric study is presented based on different physical characteristics, emotional behaviours, data collection sensing mechanisms, unimodal and multimodal datasets, modelling AI/ML methods, challenges, and future research directions.

Journal Article

Advanced Multimodal AI for Resilient Healthcare: Enhancing Early Risk Assessment in Critical Care

by Li, Chengcheng , Wu, Shih-Wei , Zhang, Yao-Yu

2026

This study develops an advanced multimodal AI framework to strengthen early risk assessment in critical care and support resilient healthcare delivery. Utilizing the MIMIC-III database, this research extracted structured variables and clinical notes from 26,829 adult patients. A text mining approach based on the BERTopic model was employed to generate topic embeddings from unstructured notes, which were subsequently integrated with 16 quantitative variables. Six machine learning models (AdaBoost, Gradient Boosting, Support Vector Classification (SVC), Bagging, Logistic Regression, and MLP Classifier) were trained to predict short-term and long-term mortality outcomes. Model performance was evaluated through AUROC, accuracy, recall, precision, and F1-score metrics. The results demonstrate that integrating topic embeddings with structured data significantly improved short-term risk prediction. The SVC model, in particular, achieved an AUROC of 0.9137 for predicting 2-day mortality. Critical predictors identified included the Glasgow Coma Scale, White Blood Cell Count, and text-derived topics related to cardiovascular and neurological conditions. The study is based on a single-center dataset, limiting generalizability. Additionally, only a subset of textual data sources was analyzed, and improvements in long-term risk prediction were relatively modest. These findings demonstrate how multimodal AI can significantly improve early risk assessment and enhance resilience in critical care decision-making. This research pioneers the integration of BERTopic-based text mining with machine learning models for clinical risk prediction, highlighting the value of multimodal data fusion in improving predictive accuracy and enriching medical informatics.

Journal Article

DermaGPT a federated multimodal framework with a meta learned trust function for interpretable dermatology diagnostics

by Amiri, Mohammad Hussein , Hashjin, Nastaran Mehrabi , Najafabadi, Maryam Khanian in 631/114 , 631/67 , 639/705

2026

Advances in generative and federated artificial intelligence enable privacy-aware diagnostic systems that integrate multimodal reasoning and explainability. This work introduces DermaGPT, a federated multimodal framework for dermatology decision support that emphasizes trustworthy use under heterogeneous, privacy-sensitive data. The system combines a PaLI-Gemma 2 vision–language backbone, fine-tuned with low-rank adaptation, with a retrieval-augmented large language model that generates clinically coherent and patient-friendly explanations. To improve robustness and calibration across sites, a meta-learned trust function (MLTF) dynamically re-weights client updates based on uncertainty, calibration, and domain-shift indicators. Evaluated on four institutional datasets and an external cohort of 4,452 biopsy-confirmed clinical and dermoscopic images, DermaGPT achieved 90.2% diagnostic accuracy across 11 lesion types and 93.3% accuracy in malignancy prediction, with well-calibrated outputs under federated training. Expert dermatologists rated its explanations as clear and clinically relevant; these ratings were obtained on class-level canonical exemplars rather than per-image reports. In our deployment threat model, images are processed locally by the vision module; when a third-party LLM is used, only text (a short diagnostic summary and the user question) is transmitted, which may still be considered sensitive health data. Taken together, these results indicate that a trust-aware, federated multimodal design can deliver interpretable, efficient, and privacy-aware dermatology decision support that is intended to augment rather than replace clinician judgment.

Journal Article

Multimodal AI in Biomedicine: Pioneering the Future of Biomaterials, Diagnostics, and Personalized Healthcare

by Jung, Jae Hak , Parvin, Nargish , Joo, Sang Woo in AlphaFold , Artificial intelligence , Biocompatibility

2025

Multimodal artificial intelligence (AI) is driving a paradigm shift in modern biomedicine by seamlessly integrating heterogeneous data sources such as medical imaging, genomic information, and electronic health records. This review explores the transformative impact of multimodal AI across three pivotal areas: biomaterials science, medical diagnostics, and personalized medicine. In the realm of biomaterials, AI facilitates the design of patient-specific solutions tailored for tissue engineering, drug delivery, and regenerative therapies. Advanced tools like AlphaFold have significantly improved protein structure prediction, enabling the creation of biomaterials with enhanced biological compatibility. In diagnostics, AI systems synthesize multimodal inputs (combining imaging, molecular markers, and clinical data) to improve diagnostic precision and support early disease detection. For precision medicine, AI integrates data from wearable technologies, continuous monitoring systems, and individualized health profiles to inform targeted therapeutic strategies. Despite its promise, the integration of AI into clinical practice presents challenges such as ensuring data security, meeting regulatory standards, and promoting algorithmic transparency. Addressing ethical issues including bias and equitable access remains critical. Nonetheless, the convergence of AI and biotechnology continues to shape a future where healthcare is more predictive, personalized, and responsive.

Journal Article

Integrating genetics, age and imaging to predict treatment outcomes in neovascular age-related macular degeneration: a proof-of-concept study

by Wagner, Siegfried K. , Moghul, Ismail , Balaskas, Konstantinos in 692/308 , 692/53 , 692/699

2026

To evaluate the feasibility of integrating genetic, imaging, and demographic data for predictive modelling of treatment outcomes in neovascular age-related macular degeneration (nAMD). Proof-of-concept retrospective cohort study with prospective DNA collection. Patients with unilateral nAMD receiving anti-vascular endothelial growth factor (anti-VEGF) therapy on a treat-and-extend regimen at a single tertiary centre were recruited. Polygenic risk scores (PRS) for AMD were derived from genotyping data (NIHR Bioresource). Optical coherence tomography (OCT) biomarkers, namely intraretinal fluid (IRF), subretinal fluid (SRF), pigment epithelial detachment (PED), and subretinal hyperreflective material (SHRM), were automatically quantified using a deep learning segmentation model. Predictors of treatment outcomes included PRS, age at first injection, and OCT feature volumes at baseline. XGBoost was used for binary outcomes and linear regression for continuous outcomes, employing five-fold cross-validation. The outcomes were (1) macular dryness (no IRF/SRF) at 24 months, (2) average treatment interval in year 2, and (3) age at first injection. 106 participants were included. The multimodal model integrating age, imaging, and PRS predicted macular dryness at 24 months with AUC = 0.903, outperforming imaging alone (AUC = 0.701). PRS was associated with younger age at first injection (β = –4.69, 95% CI [–8.93, –0.44], P = 0.031) but not with treatment burden (β = –6.39, P = 0.13). Integrating PRS with OCT-derived imaging biomarkers and patient age is technically feasible and improves predictive performance of modelling for anatomical treatment outcomes in nAMD. PRS reflects genetic susceptibility to nAMD and contextualizes the predictive value of imaging biomarkers for treatment response.

Journal Article
