Search Results

942 results for "multimodal data fusion"
Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis
For the last decade, it has been shown that neuroimaging can be a potential tool for the diagnosis of Alzheimer's Disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), and that fusion of different modalities can provide complementary information to enhance diagnostic accuracy. Here, we focus on the problems of both feature representation and fusion of multimodal information from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). To the best of our knowledge, previous methods in the literature mostly used hand-crafted features such as cortical thickness and gray matter densities from MRI, or voxel intensities from PET, and then combined these multimodal features by simply concatenating them into a long vector or transforming them into a higher-dimensional kernel space. In this paper, we propose a novel method for a high-level latent and shared feature representation from neuroimaging modalities via deep learning. Specifically, we use a Deep Boltzmann Machine (DBM; not to be confused with Deformation Based Morphometry), a deep network with a restricted Boltzmann machine as a building block, to find a latent hierarchical feature representation from a 3D patch, and then devise a systematic method for a joint feature representation from the paired patches of MRI and PET with a multimodal DBM. To validate the effectiveness of the proposed method, we performed experiments on the ADNI dataset and compared it with state-of-the-art methods. In three binary classification problems of AD vs. healthy Normal Control (NC), MCI vs. NC, and MCI converter vs. MCI non-converter, we obtained maximal accuracies of 95.35%, 85.67%, and 74.58%, respectively, outperforming the competing methods. By visual inspection of the trained model, we observed that the proposed method could hierarchically discover the complex latent patterns inherent in both MRI and PET. Highlights: • A novel method for a high-level latent feature representation from neuroimaging data • A systematic method for joint feature representation of multimodal neuroimaging data • Hierarchical patch-level information fusion via an ensemble classifier • Maximal diagnostic accuracies of 93.52% (AD vs. NC), 85.19% (MCI vs. NC), and 74.58% (MCI converter vs. MCI non-converter)
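The joint-representation idea above can be sketched in a few lines: a patch from each modality is encoded separately and the two codes are merged in a shared layer before classification. The minimal PyTorch sketch below uses plain feedforward encoders in place of the paper's multimodal DBM; the 11x11x11 patch size, layer widths, and two-class output are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of paired-patch joint feature learning (illustrative only;
# plain feedforward encoders stand in for the paper's multimodal DBM).
import torch
import torch.nn as nn

class JointPatchNet(nn.Module):
    def __init__(self, patch_voxels=11 * 11 * 11, hidden=256, joint=128, n_classes=2):
        super().__init__()
        # Modality-specific encoders for flattened 3D patches.
        self.mri_enc = nn.Sequential(nn.Linear(patch_voxels, hidden), nn.ReLU())
        self.pet_enc = nn.Sequential(nn.Linear(patch_voxels, hidden), nn.ReLU())
        # Shared layer that fuses the paired MRI/PET patch features.
        self.joint = nn.Sequential(nn.Linear(2 * hidden, joint), nn.ReLU())
        self.classifier = nn.Linear(joint, n_classes)

    def forward(self, mri_patch, pet_patch):
        h = torch.cat([self.mri_enc(mri_patch), self.pet_enc(pet_patch)], dim=1)
        return self.classifier(self.joint(h))

# Example: a batch of 4 paired 11x11x11 patches from each modality.
model = JointPatchNet()
mri = torch.randn(4, 11 * 11 * 11)
pet = torch.randn(4, 11 * 11 * 11)
logits = model(mri, pet)   # shape (4, 2): per-pair class scores, e.g. AD vs. NC
```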
MDFNet: application of multimodal fusion method based on skin image and clinical data to skin cancer classification
Purpose: Skin cancer is one of the ten most common cancer types in the world, and early diagnosis and treatment can effectively reduce patient mortality, so developing an intelligent diagnosis system for skin cancer is of great significance. According to our survey, most current intelligent skin cancer diagnosis systems use only skin image data, and multimodal cross-fusion analysis that combines image data with patient clinical data remains limited. Therefore, to further explore the complementary relationship between image data and patient clinical data, we propose the multimodal data fusion diagnosis network (MDFNet), a framework for skin cancer classification based on a data fusion strategy. Methods: MDFNet establishes an effective mapping among heterogeneous data features and fuses clinical skin images with patient clinical data, addressing the feature paucity and limited feature richness of approaches that use single-modality data alone. Results: The experimental results show that the proposed skin cancer diagnosis model achieves an accuracy of 80.42%, an improvement of about 9% over a model using only medical images, confirming the fusion advantage of MDFNet. Conclusions: MDFNet can serve as an effective auxiliary diagnostic tool for skin cancer, helping physicians improve clinical decision-making and the efficiency of clinical diagnosis; its data fusion method also exploits the advantages of information convergence and offers a useful reference for the intelligent diagnosis of many other clinical diseases.
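As a rough illustration of fusing a skin image with tabular clinical data, the sketch below extracts features with a small CNN and an MLP and concatenates them before a shared classification head. It is not the authors' MDFNet; the layer sizes, the number of clinical variables, and the seven-class output are assumptions made for the example.

```python
# Illustrative image + clinical-data fusion classifier (not the authors' MDFNet;
# layer sizes, 10 clinical features, and 7 output classes are assumptions).
import torch
import torch.nn as nn

class ImageClinicalFusion(nn.Module):
    def __init__(self, n_clinical=10, n_classes=7):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                              # -> (batch, 32)
        )
        self.clinical_branch = nn.Sequential(nn.Linear(n_clinical, 32), nn.ReLU())
        self.head = nn.Linear(32 + 32, n_classes)      # fused feature vector -> classes

    def forward(self, image, clinical):
        fused = torch.cat([self.image_branch(image), self.clinical_branch(clinical)], dim=1)
        return self.head(fused)

model = ImageClinicalFusion()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 10))   # (2, 7)
```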
LGCDF: Label-Guided Contrastive Disentanglement Fusion of Sensitive Attribute-Free Representations for Fair Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) has emerged as a prominent research frontier, enabling a comprehensive understanding of complex human emotions through the synergistic integration of heterogeneous multimodal signals. However, most existing approaches rely on idealized signal distribution assumptions, overlooking the detrimental impact of demographic bias on representation fairness and fusion robustness. This paper proposes a Label-Guided Contrastive Decoupling Fusion (LGCDF) framework that enhances model robustness to demographic bias by learning and fusing multimodal representations invariant to Sensitive Attributes (SAs). Specifically, the proposed LGCDF framework employs gender-sensitive attribute information as modality-level constraints to achieve language-centric cross-modal sentiment alignment, which is accomplished by computing contrastive losses between text–audio and text–visual feature pairs. Moreover, it introduces an SA-guided contrastive decoupling mechanism that decomposes multimodal representations into SA-related and SA-independent components. The SA-independent components are subsequently fused through a cross-modal attention fusion strategy, thereby facilitating fair sentiment representation and enabling efficient and robust multimodal information fusion. Extensive experimental results demonstrate that the proposed LGCDF framework achieves superior performance in fair representation learning and cross-modal information fusion while maintaining strong robustness under varying gender distribution biases.
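The text–audio and text–visual alignment described above is typically realized with an InfoNCE-style contrastive loss. The sketch below shows one common formulation under that assumption; it does not reproduce LGCDF's exact losses or its SA-guided decoupling terms.

```python
# InfoNCE-style contrastive alignment between paired text and audio/visual features
# (a generic formulation; not the LGCDF paper's exact loss).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_feat, other_feat, temperature=0.07):
    """text_feat, other_feat: (batch, dim) features from paired samples."""
    text_feat = F.normalize(text_feat, dim=1)
    other_feat = F.normalize(other_feat, dim=1)
    logits = text_feat @ other_feat.t() / temperature   # pairwise similarities
    targets = torch.arange(text_feat.size(0))            # matching pairs on the diagonal
    # Symmetric cross-entropy: text->other and other->text directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

text = torch.randn(8, 128)
audio = torch.randn(8, 128)
video = torch.randn(8, 128)
loss = contrastive_alignment_loss(text, audio) + contrastive_alignment_loss(text, video)
```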
Artificial intelligence and multimodal data fusion for smart healthcare: topic modeling and bibliometrics
Advancements in artificial intelligence (AI) have driven extensive research into developing diverse multimodal data analysis approaches for smart healthcare, yet large-scale, quantitative analyses of the literature in this field remain scarce. This study performed a bibliometric and topic modeling examination of 683 articles from 2002 to 2022, focusing on research topics and trends, journals, countries/regions, institutions, authors, and scientific collaborations. Results showed that, firstly, the number of articles grew from 1 in 2002 to 220 in 2022, with the majority published in interdisciplinary journals that link healthcare and medical research with information technology and AI. Secondly, the significant rise in the number of research articles can be attributed to the increasing contribution of scholars from non-English-speaking countries/regions and the noteworthy contributions of authors in the USA and India. Thirdly, researchers show high interest in diverse research issues, especially cross-modality magnetic resonance imaging (MRI) for brain tumor analysis, cancer prognosis through multi-dimensional data analysis, and AI-assisted diagnostics and personalization in healthcare, with each topic experiencing a significant increase in research interest. There is an emerging trend towards issues such as applying generative adversarial networks and contrastive learning for multimodal medical image fusion and synthesis, and utilizing the combined spatiotemporal resolution of functional MRI and electroencephalogram in a data-centric manner. This study is valuable in enhancing researchers' and practitioners' understanding of the present focal points and upcoming trajectories in AI-powered smart healthcare based on multimodal data analysis.
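For readers who want to reproduce the general workflow, a topic-modeling pass over article abstracts can be as simple as the scikit-learn sketch below (LDA over a bag-of-words matrix). The toy corpus, the choice of LDA, and the parameter values are illustrative only; the study's actual corpus and modeling pipeline are not reproduced here.

```python
# Minimal topic-modeling sketch (LDA over abstracts); corpus and parameters are toy values.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "cross-modality MRI fusion for brain tumor segmentation",
    "multimodal data fusion for cancer prognosis prediction",
    "AI-assisted diagnostics combining imaging and electronic health records",
    # ... in practice, the full set of article abstracts would go here
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)            # per-document topic mixtures

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")               # top terms per discovered topic
```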
Multimodal Sentiment-Based Popularity Evaluation of Tourist Attractions Using Text, Image, and Geospatial Data Fusion
This study proposes a novel tourist attraction popularity evaluation model that integrates multimodal sentiment analysis. The model combines a Transformer-based BERT network for textual sentiment classification, an improved ResNet convolutional network with a support vector machine (SVM) for image-based sentiment analysis, and Gaussian kernel-based spatial modeling for geographic heat estimation. Data collected from Zhangjiajie National Forest Park includes 300,000 user reviews, 50,000 images, and tourist origin data from 34 domestic provinces and 50 international countries. The proposed model achieved a text sentiment classification accuracy of up to 88%, an image sentiment F1-score of 0.82 in peak seasons, and a Pearson correlation coefficient of 0.92 between predicted heat values and actual tourist traffic. These results demonstrate strong predictive accuracy, cross-modal integration effectiveness, and robustness to noisy data, offering practical insights for attraction managers in real-time decision-making.
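The geographic heat component can be illustrated with a plain Gaussian-kernel density sum over visit coordinates, as in the sketch below. The bandwidth, the grid around Zhangjiajie, and the synthetic visit points are assumptions for the example, not values from the study.

```python
# Illustrative Gaussian-kernel heat estimate over geographic points; bandwidth,
# grid extent, and the synthetic visit coordinates are assumptions for the sketch.
import numpy as np

def gaussian_heat(grid_points, observations, bandwidth=0.01):
    """grid_points: (G, 2) lon/lat cells to score; observations: (N, 2) visit locations."""
    diff = grid_points[:, None, :] - observations[None, :, :]       # (G, N, 2)
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * bandwidth ** 2)).sum(axis=1)      # (G,)

visits = np.random.default_rng(0).normal([110.43, 29.32], 0.02, size=(500, 2))
grid = np.stack(np.meshgrid(np.linspace(110.38, 110.48, 50),
                            np.linspace(29.27, 29.37, 50)), axis=-1).reshape(-1, 2)
heat = gaussian_heat(grid, visits)   # higher values = denser visitor activity
```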
Design of a 3D High-Definition Map Visualizer for Pose Estimation and Autonomous Navigation in Dynamic Environments
A high-definition (HD) map development framework providing real-time visualization of multimodal perception data for state estimation, motion planning, and decision-making in autonomous navigation is presented and experimentally validated. The proposed framework integrates synchronized visual and LiDAR data and generates consistent frame transformations to construct accurate and interpretable HD maps suitable for navigation in dynamic environments. In addition, the framework enables flexible customization of essential map elements, including road features and static landmarks, facilitating efficient map generation and visualization. Building upon the developed HD map visualizer, a semantic-aware visual odometry (VO)-based pose estimation module is designed and verified through extensive evaluations and under perceptually degraded conditions. To ensure the reliability of synchronized multimodal data used by downstream perception and pose estimation modules, a sensor health monitoring system is also developed and validated in urban canyon scenarios with intermittent or unavailable global navigation satellite system (GNSS) measurements. Experimental results demonstrate that the proposed HD map visualizer and associated perception modules are transferable for autonomous navigation and can be effectively employed as benchmarking tools for state estimation and motion planning algorithms in autonomous driving.
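Consistent frame transformations of the kind such a framework maintains can be illustrated by chaining homogeneous transforms, for example mapping a LiDAR point through the vehicle frame into the map frame. The sketch below uses made-up mounting and pose values and is not the authors' implementation.

```python
# Chained rigid-body frame transformations (LiDAR -> vehicle -> map); all values invented.
import numpy as np

def make_transform(yaw_rad, translation):
    """Homogeneous 4x4 transform: rotation about z plus a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T[:3, 3] = translation
    return T

T_vehicle_lidar = make_transform(0.0, [1.2, 0.0, 1.8])         # LiDAR mounted on the roof
T_map_vehicle = make_transform(np.pi / 6, [105.0, 42.0, 0.0])  # current vehicle pose in map

point_lidar = np.array([10.0, -2.0, 0.5, 1.0])                 # homogeneous LiDAR point
point_map = T_map_vehicle @ T_vehicle_lidar @ point_lidar      # same point in map frame
print(point_map[:3])
```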
Multimodal machine learning for deception detection using behavioral and physiological data
Deception detection is crucial in domains like national security, privacy, the judiciary, and courtroom trials. Differentiating truth from lies is inherently challenging due to many complex and diverse behavioural, physiological, and cognitive factors. Traditional lie detector tests (polygraphs) have been widely used but remain controversial due to scientific, ethical, and practical concerns. With advancements in machine learning, deception detection can be automated. However, existing secondary datasets are limited: they are small, unimodal, and predominantly based on non-Indian populations. To address these gaps, we present CogniModal-D, a primary real-world multimodal dataset for deception detection, specifically targeting the Indian population. It spans seven modalities, namely electroencephalography (EEG), electrocardiography (ECG), electrooculography (EOG), eye gaze, galvanic skin response (GSR), audio, and video, collected from over 100 subjects. The data was gathered through tasks focused on social relationships and controlled mock crime interrogations. Our multimodal AI-based score-level fusion approach integrates diverse verbal and nonverbal cues, significantly improving deception detection accuracy compared to unimodal methods. Performance improvements of up to 15% were observed in mock crime and best friend scenarios with multimodal fusion. Notably, behavioural modalities (audio, video, gaze, GSR) proved more robust than neurophysiological ones (EEG, ECG, EOG). The study demonstrates that multimodal features offer superior discriminatory power in deception detection. These insights highlight the pivotal role of integrating multiple modalities in developing robust, scalable, and advanced deception detection systems in the future.
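Score-level fusion itself is straightforward: each unimodal classifier outputs a deception probability and the scores are combined, for example by a weighted average as in the sketch below. The modality weights and scores shown are invented for illustration and are not taken from the CogniModal-D experiments.

```python
# Minimal score-level fusion sketch: per-modality classifier scores are combined by a
# weighted average; the modalities, scores, and weights here are invented for illustration.
modality_scores = {            # probability of "deceptive" from each unimodal classifier
    "audio": 0.72,
    "video": 0.65,
    "gaze":  0.58,
    "gsr":   0.61,
    "eeg":   0.49,
}
weights = {"audio": 0.3, "video": 0.3, "gaze": 0.15, "gsr": 0.15, "eeg": 0.1}

fused = sum(weights[m] * score for m, score in modality_scores.items())
prediction = "deceptive" if fused >= 0.5 else "truthful"
print(f"fused score ~ {fused:.2f} -> {prediction}")   # ~0.64 -> deceptive
```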
A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data
Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data) has shown great potential in building extraction. Recent studies have examined the role that deep learning (DL) could play in both multimodal data fusion and urban object extraction. However, DL-based multimodal fusion networks may encounter the following limitations: (1) the individual modal and cross-modal features, which we consider both useful and important for final prediction, cannot be sufficiently learned and utilized and (2) the multimodal features are fused by a simple summation or concatenation, which appears ambiguous in selecting cross-modal complementary information. In this paper, we address these two limitations by proposing a hybrid attention-aware fusion network (HAFNet) for building extraction. It consists of RGB-specific, digital surface model (DSM)-specific, and cross-modal streams to sufficiently learn and utilize both individual modal and cross-modal features. Furthermore, an attention-aware multimodal fusion block (Att-MFBlock) was introduced to overcome the fusion problem by adaptively selecting and combining complementary features from each modality. Extensive experiments conducted on two publicly available datasets demonstrated the effectiveness of the proposed HAFNet for building extraction.
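One way to picture an attention-aware fusion block is a channel-attention gate that re-weights concatenated RGB and DSM feature maps before a 1x1 fusion convolution, as in the PyTorch sketch below. This is only in the spirit of the Att-MFBlock; the actual block design in HAFNet may differ.

```python
# Channel-attention gating over concatenated RGB/DSM feature maps (illustrative,
# squeeze-and-excitation style; not the exact Att-MFBlock from the paper).
import torch
import torch.nn as nn

class AttentionFusionBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, 2 * channels // reduction), nn.ReLU(),
            nn.Linear(2 * channels // reduction, 2 * channels), nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, dsm_feat):
        x = torch.cat([rgb_feat, dsm_feat], dim=1)       # (B, 2C, H, W)
        w = self.gate(x).unsqueeze(-1).unsqueeze(-1)     # per-channel weights in [0, 1]
        return self.fuse(x * w)                          # (B, C, H, W) fused features

block = AttentionFusionBlock(channels=64)
rgb = torch.randn(1, 64, 32, 32)
dsm = torch.randn(1, 64, 32, 32)
fused = block(rgb, dsm)   # (1, 64, 32, 32)
```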
Advancing healthcare through multimodal data fusion: a comprehensive review of techniques and applications
With the increasing availability of diverse healthcare data sources, such as medical images and electronic health records, there is a growing need to effectively integrate and fuse this multimodal data for comprehensive analysis and decision-making. However, despite its potential, multimodal data fusion in healthcare remains limited. This review paper provides an overview of existing literature on multimodal data fusion in healthcare, covering 69 relevant works published between 2018 and 2024. It focuses on methodologies that integrate different data types to enhance medical analysis, including techniques for integrating medical images with structured and unstructured data, combining multiple image modalities, and other features. Additionally, the paper reviews various approaches to multimodal data fusion, such as early, intermediate, and late fusion methods, and examines the challenges and limitations associated with these techniques. The potential benefits and applications of multimodal data fusion in various diseases are highlighted, illustrating specific strategies employed in healthcare artificial intelligence (AI) model development. This research synthesizes existing information to facilitate progress in using multimodal data for improved medical diagnosis and treatment planning.
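The early/intermediate/late fusion taxonomy mentioned in the review can be made concrete with a small example: early fusion concatenates features before a single model, late fusion averages per-modality predictions, and intermediate fusion merges learned representations inside one model. The sketch below uses synthetic data and logistic regression purely for illustration.

```python
# Schematic contrast of early vs. late fusion for two modalities (e.g., an image
# embedding and structured EHR features); data and models are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
img_feat = rng.normal(size=(100, 32))      # image-derived features
ehr_feat = rng.normal(size=(100, 8))       # structured record features
y = rng.integers(0, 2, size=100)

# Early fusion: concatenate features, train one model.
early = LogisticRegression(max_iter=1000).fit(np.hstack([img_feat, ehr_feat]), y)

# Late fusion: train one model per modality, then average predicted probabilities.
m_img = LogisticRegression(max_iter=1000).fit(img_feat, y)
m_ehr = LogisticRegression(max_iter=1000).fit(ehr_feat, y)
late_prob = 0.5 * (m_img.predict_proba(img_feat)[:, 1] + m_ehr.predict_proba(ehr_feat)[:, 1])

# Intermediate fusion would instead merge learned representations inside one model
# (e.g., concatenating hidden layers of two neural encoders) before the final head.
```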