Catalogue Search | MBRL
Explore the vast range of titles available.
781 result(s) for "Multimodal data fusion"
Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis
2014
For the last decade, neuroimaging has been shown to be a potential tool for the diagnosis of Alzheimer's Disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), and fusion of different modalities can further provide complementary information to enhance diagnostic accuracy. Here, we focus on the problems of both feature representation and fusion of multimodal information from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). To the best of our knowledge, previous methods in the literature mostly used hand-crafted features such as cortical thickness or gray matter densities from MRI, or voxel intensities from PET, and then combined these multimodal features by simply concatenating them into a long vector or transforming them into a higher-dimensional kernel space. In this paper, we propose a novel method for a high-level latent and shared feature representation from neuroimaging modalities via deep learning. Specifically, we use a Deep Boltzmann Machine (DBM, not to be confused with Deformation Based Morphometry), a deep network with a restricted Boltzmann machine as its building block, to find a latent hierarchical feature representation from a 3D patch, and then devise a systematic method for a joint feature representation from the paired patches of MRI and PET with a multimodal DBM. To validate the effectiveness of the proposed method, we performed experiments on the ADNI dataset and compared against state-of-the-art methods. In three binary classification problems of AD vs. healthy Normal Control (NC), MCI vs. NC, and MCI converter vs. MCI non-converter, we obtained maximal accuracies of 95.35%, 85.67%, and 74.58%, respectively, outperforming the competing methods. By visual inspection of the trained model, we observed that the proposed method could hierarchically discover the complex latent patterns inherent in both MRI and PET.
• A novel method for a high-level latent feature representation from neuroimaging data
• A systematic method for joint feature representation of multimodal neuroimaging data
• Hierarchical patch-level information fusion via an ensemble classifier
• Maximal diagnostic accuracies of 93.52% (AD vs. NC), 85.19% (MCI vs. NC), and 74.58% (MCI converter vs. MCI non-converter)
Journal Article
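The abstract above names the restricted Boltzmann machine (RBM) as the building block of its multimodal Deep Boltzmann Machine. As a minimal, hedged illustration of that building block only (not the paper's full patch-based multimodal DBM), the NumPy sketch below trains a Bernoulli RBM with one step of contrastive divergence; the patch size, hidden-unit count, and learning rate are illustrative assumptions.

```python
import numpy as np

class RBM:
    """Bernoulli restricted Boltzmann machine, the building block of a DBM."""

    def __init__(self, n_visible, n_hidden, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        """One contrastive-divergence (CD-1) update on a batch of binary patches."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Positive minus negative phase statistics.
        self.W   += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Toy usage: learn latent features for flattened binary 3D patches of one modality.
patches = (np.random.default_rng(1).random((256, 11 * 11 * 11)) > 0.5).astype(float)
rbm = RBM(n_visible=patches.shape[1], n_hidden=128)
for _ in range(10):
    rbm.cd1_step(patches)
features = rbm.hidden_probs(patches)   # latent representation per patch
```

In the paper, such blocks are stacked into modality-specific pathways for MRI and PET patches and joined by a shared top layer; the sketch stops at the single-layer stage.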
MDFNet: application of multimodal fusion method based on skin image and clinical data to skin cancer classification
by Zhou, Panyun; Chen, Chen; Li, Min
in Cancer Research, Clinical Decision-Making, Decision making
2023
Purpose
Skin cancer is one of the ten most common cancer types in the world, and early diagnosis and treatment can effectively reduce patient mortality, so it is of great significance to develop an intelligent diagnosis system for skin cancer. According to existing surveys, most intelligent skin cancer diagnosis systems currently use only skin image data, while multimodal cross-fusion analysis that combines image data with patient clinical data remains limited. To further explore the complementary relationship between image data and patient clinical data, we propose the multimodal data fusion diagnosis network (MDFNet), a skin cancer framework based on a data fusion strategy.
Methods
MDFNet establishes an effective mapping among heterogeneous data features, effectively fusing clinical skin images with patient clinical data, and addresses the feature paucity and insufficient feature richness that arise when only single-modality data are used.
Results
The experimental results show that our proposed smart skin cancer diagnosis model achieves an accuracy of 80.42%, an improvement of about 9% over the model using only medical images, effectively confirming the fusion advantage offered by MDFNet.
Conclusions
This indicates that MDFNet can serve as an effective auxiliary diagnostic tool for skin cancer, helping physicians improve clinical decision-making and the efficiency of clinical diagnosis. Moreover, its data fusion method fully exploits the advantage of information convergence and offers a useful reference for the intelligent diagnosis of many other clinical diseases.
Journal Article
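The abstract above does not detail MDFNet's architecture beyond the fusion of skin images with clinical data, so the following is only a generic two-branch fusion sketch in PyTorch: a small CNN image branch, an MLP clinical branch, concatenation, then a classifier. The feature sizes and class count are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class TwoBranchFusionNet(nn.Module):
    """Toy two-branch network: CNN features from a skin image fused with an MLP
    embedding of tabular clinical data, then classified jointly."""

    def __init__(self, n_clinical_features=8, n_classes=7):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B, 32)
        )
        self.clinical_branch = nn.Sequential(
            nn.Linear(n_clinical_features, 32), nn.ReLU(),  # -> (B, 32)
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 + 32, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, image, clinical):
        fused = torch.cat([self.image_branch(image),
                           self.clinical_branch(clinical)], dim=1)
        return self.classifier(fused)

# Toy usage with random tensors standing in for a dermoscopy image and a clinical record.
model = TwoBranchFusionNet()
logits = model(torch.randn(4, 3, 128, 128), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 7])
```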
Artificial intelligence and multimodal data fusion for smart healthcare: topic modeling and bibliometrics
by Tao, Xiaohui; Chen, Xieling; Leng, Mingming
in Analysis, Artificial Intelligence, Bibliometrics
2024
Advancements in artificial intelligence (AI) have driven extensive research into developing diverse multimodal data analysis approaches for smart healthcare, yet large-scale, quantitative analyses of the literature in this field are scarce. This study performed a bibliometric and topic modeling examination of 683 articles from 2002 to 2022, focusing on research topics and trends, journals, countries/regions, institutions, authors, and scientific collaborations. Results showed that, first, the number of articles grew from 1 in 2002 to 220 in 2022, with the majority published in interdisciplinary journals that link healthcare and medical research with information technology and AI. Second, the significant rise in the quantity of research articles can be attributed to the increasing contribution of scholars from non-English speaking countries/regions and the noteworthy contributions of authors in the USA and India. Third, researchers show high interest in diverse research issues, especially cross-modality magnetic resonance imaging (MRI) for brain tumor analysis, cancer prognosis through multi-dimensional data analysis, and AI-assisted diagnostics and personalization in healthcare, with each topic experiencing a significant increase in research interest. There is an emerging trend towards issues such as applying generative adversarial networks and contrastive learning for multimodal medical image fusion and synthesis, and utilizing the combined spatiotemporal resolution of functional MRI and electroencephalography in a data-centric manner. This study is valuable in enhancing researchers' and practitioners' understanding of the present focal points and upcoming trajectories in AI-powered smart healthcare based on multimodal data analysis.
Journal Article
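As a hedged illustration of the topic-modeling side of such a bibliometric study (the paper's actual model and preprocessing are not specified in the abstract), the sketch below fits scikit-learn's LDA to a toy corpus of four abstract-like strings and prints the top terms per topic; the corpus, topic count, and settings are illustrative only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for article abstracts; a real study would use the full corpus.
abstracts = [
    "cross-modality MRI fusion for brain tumor segmentation and analysis",
    "multimodal data fusion for cancer prognosis with clinical and omics data",
    "AI-assisted diagnostics and personalization in smart healthcare systems",
    "functional MRI and EEG fusion with combined spatiotemporal resolution",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```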
A Reexamination of the Substructure Inside the Castillo at Chichen Itza, Yucatan, Mexico
by Stanton, Travis W.; Osorio León, José Francisco Javier; Meacham, Samuel S.
in Archaeology, Archives & records, Cultural heritage
2025
The Castillo (also known as the Temple of Kukulkan) is one of the most iconic structures in Mesoamerica. This temple-pyramid towers over the main plaza of the civic-ceremonial city of Chichen Itza, which once dominated the political and economic landscape of the northern Maya lowlands. Reported here are the preliminary results of a multimodal and multiresolution scanning campaign, and the fusion of its 3D data outputs, intended to more accurately record the physical attributes of the earlier temple-pyramid inside the Castillo, known as the Castillo-sub, and to examine the spatial and architectonic relationships between the two structures. A focus of our scanning campaign was the upper façades of the sub-temple and the Chacmool and jaguar throne sculptures inside the sub-temple itself. Structured-light scans of the upper façades now serve as the definitive representation of this portion of the Castillo-sub.
Journal Article
A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data
by Bai, Xuyu; Zhang, Peng; Li, Erzhu
in artificial intelligence, attention mechanism, automation
2020
Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data) has shown great potential in building extraction. Recent studies have examined the role that deep learning (DL) could play in both multimodal data fusion and urban object extraction. However, DL-based multimodal fusion networks may encounter the following limitations: (1) the individual modal and cross-modal features, which we consider both useful and important for final prediction, cannot be sufficiently learned and utilized, and (2) the multimodal features are fused by a simple summation or concatenation, which is ambiguous when selecting cross-modal complementary information. In this paper, we address these two limitations by proposing a hybrid attention-aware fusion network (HAFNet) for building extraction. It consists of RGB-specific, digital surface model (DSM)-specific, and cross-modal streams to sufficiently learn and utilize both individual modal and cross-modal features. Furthermore, an attention-aware multimodal fusion block (Att-MFBlock) was introduced to overcome the fusion problem by adaptively selecting and combining complementary features from each modality. Extensive experiments conducted on two publicly available datasets demonstrated the effectiveness of the proposed HAFNet for building extraction.
Journal Article
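The abstract describes an attention-aware fusion block (Att-MFBlock) that adaptively weights complementary RGB and DSM features rather than simply summing or concatenating them. The PyTorch sketch below shows one plausible channel-attention fusion of two feature maps under assumed shapes; it is not the paper's exact block.

```python
import torch
import torch.nn as nn

class AttentionFusionBlock(nn.Module):
    """Fuse two modality feature maps with channel-attention weights derived
    from their concatenation, instead of a plain sum or concat."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, 2 * channels), nn.ReLU(),
            nn.Linear(2 * channels, 2 * channels), nn.Sigmoid(),
        )

    def forward(self, rgb_feat, dsm_feat):
        b, c, _, _ = rgb_feat.shape
        weights = self.gate(torch.cat([rgb_feat, dsm_feat], dim=1))  # (B, 2C)
        w_rgb, w_dsm = weights[:, :c], weights[:, c:]
        # Re-weight each modality channel-wise, then sum the complementary parts.
        return (rgb_feat * w_rgb.view(b, c, 1, 1)
                + dsm_feat * w_dsm.view(b, c, 1, 1))

# Toy usage on feature maps from an RGB stream and a DSM (height) stream.
block = AttentionFusionBlock(channels=64)
fused = block(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```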
Bimodal variational autoencoder for audiovisual speech recognition
by ElDeeb, Hesham E.; Sayed, Hadeer M.; Taie, Shereen A.
in Accuracy, Artificial Intelligence, Artificial neural networks
2023
Multimodal fusion is the idea of combining information from multiple modalities in a joint representation. The goal of multimodal fusion is to improve the accuracy of results from classification or regression tasks. This paper proposes a Bimodal Variational Autoencoder (BiVAE) model for audiovisual feature fusion. Relying on audiovisual signals in a speech recognition task increases recognition accuracy, especially when the audio signal is corrupted. The BiVAE model is trained and validated on the CUAVE dataset. The fused audiovisual features were evaluated with three classifiers: Long Short-Term Memory, Deep Neural Network, and Support Vector Machine. The experiments evaluate the fused features both when the two modalities are available and when only one modality is available (i.e., cross-modality). The experimental results demonstrate the superiority of the proposed BiVAE model for audiovisual feature fusion over state-of-the-art models by an average accuracy difference of ≃3.28% and 13.28% for clean and noisy signals, respectively. Additionally, BiVAE outperforms the state-of-the-art models in the cross-modality case by an accuracy difference of ≃2.79% when only the audio signal is available and 1.88% when only the video signal is available. Furthermore, SVM achieves the best recognition accuracy among the classifiers.
Journal Article
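As a minimal sketch of the bimodal-VAE idea in the entry above (two modality encoders feeding a shared Gaussian latent whose mean can serve as the fused feature), the PyTorch code below uses small fully connected encoders and decoders with assumed feature dimensions; the published BiVAE architecture and training details may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalVAE(nn.Module):
    """Audio and video encoders feed a shared Gaussian latent; two decoders
    reconstruct both modalities from that joint code."""

    def __init__(self, audio_dim=40, video_dim=60, latent_dim=16):
        super().__init__()
        self.enc_audio = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.enc_video = nn.Sequential(nn.Linear(video_dim, 64), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec_audio = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                       nn.Linear(128, audio_dim))
        self.dec_video = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                       nn.Linear(128, video_dim))

    def forward(self, audio, video):
        h = torch.cat([self.enc_audio(audio), self.enc_video(video)], dim=1)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec_audio(z), self.dec_video(z), mu, logvar

def vae_loss(audio, video, audio_hat, video_hat, mu, logvar):
    # Reconstruction of both modalities plus a KL term on the shared latent.
    recon = F.mse_loss(audio_hat, audio) + F.mse_loss(video_hat, video)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Toy usage: the latent mean mu would serve as the fused audiovisual feature.
model = BimodalVAE()
a, v = torch.randn(8, 40), torch.randn(8, 60)
a_hat, v_hat, mu, logvar = model(a, v)
print(vae_loss(a, v, a_hat, v_hat, mu, logvar).item())
```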
SegGen: An Unreal Engine 5 Pipeline for Generating Multimodal Semantic Segmentation Datasets
2025
Synthetic data has become an increasingly important tool for semantic segmentation, where collecting large-scale annotated datasets is often costly and impractical. Prior work has leveraged computer graphics and game engines to generate training data, but many pipelines remain limited to single modalities and constrained environments or require substantial manual setup. To address these limitations, we present a fully automated pipeline built within Unreal Engine 5 (UE5) that procedurally generates diverse, labeled environments and collects multimodal visual data for semantic segmentation tasks. Our system integrates UE5’s biome-based procedural generation framework with a spline-following drone actor capable of capturing both RGB and depth imagery, alongside pixel-perfect semantic segmentation labels. As a proof of concept, we generated a dataset consisting of 1169 samples across two visual modalities and seven semantic classes. The pipeline supports scalable expansion and rapid environment variation, enabling high-throughput synthetic data generation with minimal human intervention. To validate our approach, we trained benchmark computer vision segmentation models on the synthetic dataset and demonstrated their ability to learn meaningful semantic representations. This work highlights the potential of game-engine-based data generation to accelerate research in multimodal perception and provide reproducible, scalable benchmarks for future segmentation models.
Journal Article
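A pipeline like the one above would emit paired RGB, depth, and label images per sample. The PyTorch Dataset sketch below shows one way such triples might be loaded for training; the directory layout, file naming, and dtypes are assumptions for illustration, not the paper's released format.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class RGBDSegmentationDataset(Dataset):
    """Pairs <id>_rgb.png, <id>_depth.png, and <id>_mask.png files by sample id."""

    def __init__(self, root):
        self.root = Path(root)
        self.ids = sorted(p.stem.removesuffix("_rgb")
                          for p in self.root.glob("*_rgb.png"))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        sid = self.ids[i]
        rgb = np.array(Image.open(self.root / f"{sid}_rgb.png").convert("RGB"),
                       dtype=np.float32)
        depth = np.array(Image.open(self.root / f"{sid}_depth.png"), dtype=np.float32)
        mask = np.array(Image.open(self.root / f"{sid}_mask.png"), dtype=np.int64)
        # Return CHW image in [0, 1], a single-channel depth map, and integer class labels.
        return (torch.from_numpy(rgb).permute(2, 0, 1) / 255.0,
                torch.from_numpy(depth).unsqueeze(0),
                torch.from_numpy(mask))
```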
An Improved Multimodal Framework-Based Fault Classification Method for Distribution Systems Using LSTM Fusion and Cross-Attention
2025
Accurate and rapid diagnosis of fault causes is crucial for ensuring the stability and safety of power distribution systems, which are frequently subjected to a variety of fault-inducing events. This study proposes a novel multimodal data fusion approach that effectively integrates external environmental information with internal electrical signals associated with faults. Initially, the TabTransformer and embedding techniques are employed to construct a unified representation of categorical fault information across multiple dimensions. Subsequently, an LSTM-based fusion module is introduced to aggregate continuous signals from multiple dimensions. Furthermore, a cross-attention module is designed to integrate both continuous and categorical fault information, thereby enhancing the model’s capability to capture complex relationships among data from diverse sources. Additionally, to address challenges such as a limited data scale, class imbalance, and potential mislabeling, this study introduces a loss function that combines soft label loss with focal loss. Experimental results demonstrate that the proposed multimodal data fusion algorithm significantly outperforms existing methods in terms of fault identification accuracy, thereby highlighting its potential for rapid and precise fault classification in real-world power grids.
Journal Article
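The abstract names a loss that combines soft-label loss with focal loss to cope with limited data, class imbalance, and possible mislabeling. The sketch below shows one common way to combine label-smoothed cross-entropy with multi-class focal loss in PyTorch; the paper's exact formulation and weighting are not given in the abstract.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss on hard labels: down-weights easy examples."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

def soft_label_loss(logits, targets, n_classes, smoothing=0.1):
    """Cross-entropy against label-smoothed (soft) targets, tolerant of mislabeling."""
    log_p = F.log_softmax(logits, dim=1)
    soft = torch.full_like(log_p, smoothing / (n_classes - 1))
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    return -(soft * log_p).sum(dim=1).mean()

def combined_loss(logits, targets, n_classes, alpha=0.5):
    """Weighted sum of the two terms; alpha balances focal vs. soft-label loss."""
    return (alpha * focal_loss(logits, targets)
            + (1.0 - alpha) * soft_label_loss(logits, targets, n_classes))

# Toy usage on random logits for a hypothetical 6-class fault-cause problem.
logits = torch.randn(16, 6)
targets = torch.randint(0, 6, (16,))
print(combined_loss(logits, targets, n_classes=6).item())
```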