Catalogue Search | MBRL

Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches

by Lim, Mei Kuan; Khoo, Lin Sze; McNaney, Roisin in Algorithms, Artificial intelligence, Automation

2024

As mental health (MH) disorders become increasingly prevalent, their multifaceted symptoms and comorbidities with other conditions introduce complexity to diagnosis, posing a risk of underdiagnosis. While machine learning (ML) has been explored to mitigate these challenges, we hypothesized that multiple data modalities support more comprehensive detection and that non-intrusive collection approaches better capture natural behaviors. To understand the current trends, we systematically reviewed 184 studies to assess feature extraction, feature fusion, and ML methodologies applied to detect MH disorders from passively sensed multimodal data, including audio and video recordings, social media, smartphones, and wearable devices. Our findings revealed varying correlations of modality-specific features in individualized contexts, potentially influenced by demographics and personalities. We also observed the growing adoption of neural network architectures for model-level fusion and as ML algorithms, which have demonstrated promising efficacy in handling high-dimensional features while modeling within and cross-modality relationships. This work provides future researchers with a clear taxonomy of methodological approaches to multimodal detection of MH disorders to inspire future methodological advancements. The comprehensive analysis also guides and supports future researchers in making informed decisions to select an optimal data source that aligns with specific use cases based on the MH disorder of interest.

Journal Article

Region-based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation

by Nan, Yang; Del Ser, Javier; Yang, Guang in Artificial Intelligence, Brain, Brain cancer

2023

Despite recent advances in the accuracy of brain tumor segmentation, the results still suffer from low reliability and robustness. Uncertainty estimation is an efficient solution to this problem, as it provides a measure of confidence in the segmentation results. The current uncertainty estimation methods based on quantile regression, Bayesian neural network, ensemble, and Monte Carlo dropout are limited by their high computational cost and inconsistency. In order to overcome these challenges, Evidential Deep Learning (EDL) was developed in recent work but primarily for natural image classification and showed inferior segmentation results. In this paper, we proposed a region-based EDL segmentation framework that can generate reliable uncertainty maps and accurate segmentation results, which is robust to noise and image corruption. We used the Theory of Evidence to interpret the output of a neural network as evidence values gathered from input features. Following Subjective Logic, evidence was parameterized as a Dirichlet distribution, and predicted probabilities were treated as subjective opinions. To evaluate the performance of our model on segmentation and uncertainty estimation, we conducted quantitative and qualitative experiments on the BraTS 2020 dataset. The results demonstrated the top performance of the proposed method in quantifying segmentation uncertainty and robustly segmenting tumors. Furthermore, our proposed new framework maintained the advantages of low computational cost and easy implementation and showed the potential for clinical application.
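The step of parameterizing evidence as a Dirichlet distribution can be sketched as follows. This is a minimal illustration assuming the standard Subjective Logic formulation (Dirichlet parameters equal to evidence plus one, uncertainty equal to the number of classes divided by the Dirichlet strength), which the abstract does not spell out. The function name and the sample evidence values are hypothetical, and the authors' region-based framework is not reproduced here.

```python
import numpy as np

def dirichlet_opinion(evidence):
    """Turn non-negative per-class evidence into a Subjective Logic opinion.

    Assumed formulation (not taken from the abstract):
    alpha = evidence + 1, S = sum(alpha), belief = evidence / S,
    uncertainty = K / S, expected probability = alpha / S.
    """
    evidence = np.asarray(evidence, dtype=float)
    if np.any(evidence < 0):
        raise ValueError("evidence must be non-negative")
    num_classes = evidence.shape[-1]
    alpha = evidence + 1.0                          # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)    # Dirichlet strength S
    belief = evidence / strength                    # belief mass per class
    uncertainty = (num_classes / strength).squeeze(-1)  # mass left unassigned
    prob = alpha / strength                         # expected class probability
    return belief, uncertainty, prob

# Hypothetical evidence for one voxel with three classes.
belief, uncertainty, prob = dirichlet_opinion([18.0, 1.0, 1.0])
print(belief, uncertainty, prob)
```

With no evidence at all the uncertainty is 1 and the expected probabilities are uniform; as evidence accumulates for one class, uncertainty shrinks and belief concentrates on that class.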

Journal Article

Multimodal Sarcasm Detection via Hybrid Classifier with Optimistic Logic

by Bavkar, Dnyaneshwar Madhukar; Khairnar, Vaishali; Kashyap, Ramgopal in Algorithms, Audio data, Bi-GRU

2022

This work aims to provide a novel multimodal sarcasm detection model that includes four stages: pre-processing, feature extraction, feature-level fusion, and classification. The pre-processing stage uses multimodal data that includes text, video, and audio. Here, text is pre-processed using tokenization and stemming, video is pre-processed during the face detection phase, and audio is pre-processed using a filtering technique. During the feature extraction stage, text features such as TF-IDF, improved bag of visual words, n-grams, and emojis are extracted, as are video features obtained using improved SLBT and the constrained local model (CLM). Similarly, audio features such as MFCC, chroma, spectral features, and jitter are extracted. Then, the extracted features are transferred to the feature-level fusion stage, wherein an improved multilevel canonical correlation analysis (CCA) fusion technique is performed. The classification is performed using a hybrid classifier (HC) that combines a bidirectional gated recurrent unit (Bi-GRU) and an LSTM. The outcomes of the Bi-GRU and the LSTM are averaged to obtain an effective output. To make the detection results more accurate, the weight of the LSTM is optimally tuned by the proposed opposition learning-based aquila optimization (OLAO) model. The MUStARD dataset is a multimodal video corpus used for automated sarcasm discovery studies. Finally, the effectiveness of the proposed approach is demonstrated using various metrics.
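The averaging of the two classifier outcomes can be sketched as follows. This is a minimal illustration that assumes each classifier emits a sarcasm probability per sample and that the average is an unweighted mean. The abstract does not say how the average is computed, and the function name, the 0.5 threshold, and the sample scores are all hypothetical. The OLAO tuning step is not modelled.

```python
def average_scores(bigru_probs, lstm_probs, threshold=0.5):
    """Fuse two per-sample probability lists by an unweighted mean.

    Returns the fused probabilities and the 0/1 labels obtained by
    thresholding them (1 = sarcastic). The threshold is an assumption.
    """
    if len(bigru_probs) != len(lstm_probs):
        raise ValueError("both classifiers must score the same samples")
    fused = [(a + b) / 2.0 for a, b in zip(bigru_probs, lstm_probs)]
    labels = [int(p >= threshold) for p in fused]
    return fused, labels

# Hypothetical scores for two utterances.
fused, labels = average_scores([0.9, 0.2], [0.7, 0.4])
print(fused, labels)
```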

Journal Article

BF2SkNet: best deep learning features fusion-assisted framework for multiclass skin lesion classification

by Alhaisoni, Majed; Akram, Tallha; Althubiti, Sara A. in Accuracy, Algorithms, Artificial Intelligence

2023

Convolutional neural networks have shown considerable success in medical imaging with explainable AI for cancer detection and recognition. However, a large number of irrelevant features increases the computational time and decreases the accuracy. This work proposes a deep learning and fuzzy entropy slime mould algorithm-based architecture for multiclass skin lesion classification. In the first step, we employed data augmentation to increase the training data and used it to train two fine-tuned deep learning models, Inception-ResNetV2 and NasNet Mobile. Then, we used transfer learning on the augmented datasets to train both models and obtained two feature vectors from the newly fine-tuned models. Later, we applied a fuzzy entropy slime mould algorithm to both vectors to obtain optimal features, which are finally fused using the Serial-Threshold fusion technique and classified using several machine learning classifiers. Finally, the explainable AI technique Grad-CAM was adopted to visualize the lesion region. The experiments were conducted on two datasets, HAM10000 and ISIC 2018, and achieved 97.1% and 90.2% accuracy, respectively, better than the other techniques.

Journal Article

Advancements in the Intelligent Detection of Driver Fatigue and Distraction: A Comprehensive Review

by Ma, Yuan; Li, Zhenfeng; Fu, Shichen in Accidents, Artificial intelligence, Cameras

2024

Detecting the factors affecting drivers’ safe driving and taking early warning measures can effectively reduce the probability of automobile safety accidents and improve vehicle driving safety. Considering the two factors of driver fatigue and distraction state, their influences on driver behavior are elaborated from both experimental data and an accident library analysis. Starting from three modes and six types, intelligent detection methods for driver fatigue and distraction detection from the past five years are reviewed in detail. Considering its wide range of applications, the research on machine vision detection based on facial features in the past five years is analyzed, and the methods are carefully classified and compared according to their innovation points. Further, three safety warning and response schemes are proposed in light of the development of autonomous driving and intelligent cockpit technology. Finally, the paper summarizes the current state of research in the field, presents five conclusions, and discusses future trends.

Journal Article

An underwater dual-modal denoising detection network for illumination fluctuation and dense occlusion scenes

by Shuang Wu, Zhongfeng Zhang, Chao Zhang in deep learning, feature enhancement, multimodal feature fusion

2026

Background: Underwater object detection is a critical enabling technology for intelligent ocean exploration. However, light scattering and absorption in underwater environments cause significant RGB image degradation and texture loss, leading to weak foreground feature representation and limited detection accuracy in densely occluded scenes. Methods: To address these challenges, this paper proposes the Underwater Dual-modal Detection Network (UW-DualDet), which incorporates depth information to compensate for RGB degradation and enhance detection performance. Two plug-and-play modules are designed: the Underwater Denoise Feature Fusion (UDFF) module suppresses dual-modal noise and adaptively exploits the complementarity between RGB texture and depth geometry to mitigate single-modality failure under extreme illumination, while the Underwater Feature Enhancement Module (UFEM) improves feature representation and multiscale semantic perception through multi-scale depthwise separable convolutions and a multi-dimensional weighting mechanism. Furthermore, a Gaussian-filter-enhanced YOLO26 detection head (D26Head) is introduced to reduce missed and false detections in densely occluded scenarios by balancing positive sample coverage and localization accuracy through a dual-branch optimization strategy, while also suppressing feature noise. Results: UW-DualDet achieves a mean Average Precision (mAP) of 86.7%, outperforming all compared state-of-the-art methods. It demonstrates superior robustness across three representative scenarios: 88.2% AP₅₀ in clear water, 79.6% AP₅₀ in turbid environments, and 75.3% AP₅₀ in low-light deep water. The model maintains an inference speed of 116 FPS, meeting real-time operational requirements. Discussion: The proposed method effectively addresses the core challenges of underwater object detection by leveraging multimodal complementarity and task-specific optimization. It provides robust technical support for underwater intelligent operations and ecological monitoring. Future work will focus on improving depth estimation accuracy and extending the model to more extreme underwater environments.

Journal Article

Synthesis and luminescence monitoring of iridium(III) complex-functionalized gold nanoparticles and their application for determination of gold(III) ions

by Niu, Dou; Kong, Lingtan; Liu, Jianhua in Alkynes, Analytical Chemistry, Carbon dioxide

2023

A new method is presented for the one-step synthesis and real-time monitoring of iridium(III) complex-functionalized AuNPs from the precursor gold(III) chloride (AuCl₃). The functionalized AuNPs, with an average size of 8–20 nm, were obtained by the reduction of Au³⁺ ions by the alkyne group of iridium(III) complexes, which was accompanied by the anchoring of iridium(III) complexes on the surface of the nanoparticles. Meanwhile, the luminescence of the iridium(III) complexes was effectively quenched due to distance-dependent fluorescence quenching by AuNPs, thereby enabling luminescence monitoring of the formation process of the functionalized AuNPs and obtaining scattering information and spectral information in real time. Moreover, this method was applied to the determination of Au³⁺ ions in buffer with a limit of detection of 0.38 μM at 700 nm in luminescence mode, while the detection limit for absorbance was 10.04 μM. Importantly, the multimodal detection strategy alleviates interference from other metal ions. Furthermore, the iridium(III) alkyne complexes were capable of imaging mitochondrial Au³⁺ ions in living cells. Taken together, this work opens a new avenue for the convenient synthesis and monitoring of the formation of functionalized AuNPs, and also provides a tool for selective determination of Au³⁺ ions in solution and in cellulo.

Journal Article

Can Separation Enhance Fusion? An Efficient Framework for Target Detection in Multimodal Remote Sensing Imagery

by Liu, Rui; Feng, Jie; Wang, Lei in Ablation, Accuracy, Algorithms

2025

Target detection in remote sensing images has garnered significant attention due to its wide range of applications. Many traditional methods primarily rely on unimodal data, which often struggle to address the complexities of remote sensing environments. Furthermore, small-target detection remains a critical challenge in remote sensing image analysis, as small targets occupy only a few pixels, making feature extraction difficult and prone to errors. To address these challenges, this paper revisits the existing multimodal fusion methodologies and proposes a novel framework of separation before fusion (SBF). Leveraging this framework, we present Sep-Fusion, an efficient target detection approach tailored for multimodal remote sensing aerial imagery. Within the modality separation module (MSM), the method separates the three RGB channels of visible light images into independent modalities aligned with infrared image channels. Each channel undergoes independent feature extraction through the unimodal block (UB) to effectively capture modality-specific features. The extracted features are then fused using the feature attention fusion (FAF) module, which integrates channel attention and spatial attention mechanisms to enhance multimodal feature interaction. To improve the detection of small targets, an image regeneration module is exploited during the training stage. It incorporates the super-resolution strategy with attention mechanisms to further optimize high-resolution feature representations for subsequent positioning and detection. Sep-Fusion is currently developed on the YOLO series, making it a potential real-time detector. Its lightweight architecture enables the model to achieve high computational efficiency while maintaining the desired detection accuracy. Experimental results on the multimodal VEDAI dataset show that Sep-Fusion achieves 77.9% mAP50, surpassing many state-of-the-art models. Ablation experiments further illustrate the respective contribution of modality separation and attention fusion. The adaptation of our multimodal method to unimodal target detection is also verified on NWPU VHR-10 and DIOR datasets, which proves Sep-Fusion to be a suitable alternative to current detectors in various remote sensing scenarios.
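The modality separation step can be sketched as follows. This is a minimal illustration assuming a height x width x 3 visible-light array in RGB channel order and a spatially aligned single-channel infrared array. The function name, the dictionary keys, and the sample arrays are hypothetical, and the unimodal blocks and attention fusion of Sep-Fusion are not reproduced here.

```python
import numpy as np

def separate_modalities(rgb, infrared):
    """Split an RGB image into three single-channel modalities plus infrared.

    rgb: array of shape (H, W, 3), assumed to be in R, G, B channel order.
    infrared: array of shape (H, W), assumed to be aligned with rgb.
    Returns a dict of four (H, W) arrays, one per modality.
    """
    rgb = np.asarray(rgb)
    infrared = np.asarray(infrared)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if infrared.shape != rgb.shape[:2]:
        raise ValueError("infrared must have shape (H, W) matching rgb")
    return {
        "red": rgb[:, :, 0],
        "green": rgb[:, :, 1],
        "blue": rgb[:, :, 2],
        "infrared": infrared,
    }

# Hypothetical 4 x 5 image pair filled with placeholder values.
rgb = np.arange(4 * 5 * 3).reshape(4, 5, 3)
infrared = np.zeros((4, 5))
modalities = separate_modalities(rgb, infrared)
print({name: channel.shape for name, channel in modalities.items()})
```

Each of the four arrays could then be passed to its own feature extractor before fusion, which is the ordering the separation-before-fusion idea describes.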

Journal Article

A Novel framework of Adaptive fuzzy-GLCM Segmentation and Fuzzy with Capsules Network (F-CapsNet) Classification

by Xu, Jinghong; Ali, Rizwan; Manikandan, A. in Accuracy, Artificial Intelligence, Artificial neural networks

2023

This paper offers a new framework for skin disease image recognition using deep learning techniques and local descriptor encoding approaches. To detect melanoma early, skin lesions must be accurately classified. In this research, an automatic image preprocessing approach is proposed for the removal of noise artefacts in photographs, including thin and thick hair objects, surgical ink markings, dark halo effects, and black frames. Due to hazy contrasts and distortions at the border margins, segmenting the images is quite challenging. This research therefore suggests a partitioning technique based on a fuzzy gray-level co-occurrence matrix (GLCM) that is both effective and adaptive. An alternative to convolutional neural networks (CNN) is proposed: the capsule-based network. An object's existence and the relationship between its functions are represented by a group of neurons (in logical units) that make up a vector called a capsule. While convolutional neural networks use max-pooling layers between subsequent layers, capsule networks repeatedly utilise a dynamic routing technique to define capsule coupling. In other words, the routing-by-agreement approach offers learning between capsule layers. To assess the efficacy of the F-CapsNet technique, three widely used datasets (the ISIC 2017 Challenge, the 2019 Challenge, and the PH2 datasets) are employed. The suggested technique has an average accuracy of 99.16% for the ISBI 2017 test dataset and 99.45% accuracy for the ISBI 2019 test dataset. Additionally, the PH2 test dataset shows that the suggested approach has an average accuracy of 98.42%.

Journal Article

A Multi-Modal Approach for Robust Oriented Ship Detection: Dataset and Methodology

by Li, Shengyang; You, Jianing; Lv, Yixuan in Accuracy, Algorithms, Benchmarks

2026

Maritime ship detection is a critical task for security and traffic management. To advance research in this area, we constructed a new high-resolution, spatially aligned optical-SAR dataset, named MOS-Ship. Building on this, we propose MOS-DETR, a novel query-based framework. This model incorporates an innovative multi-modal Swin Transformer backbone to extract unified feature pyramids from both RGB and SAR images. This design allows the model to jointly exploit optical textures and SAR scattering signatures for precise, oriented bounding box prediction. We also introduce an adaptive probabilistic fusion mechanism. This post-processing module dynamically integrates the detection results generated by our model from the optical and SAR inputs, synergistically combining their complementary strengths. Experiments validate that MOS-DETR achieves highly competitive accuracy and significantly outperforms unimodal baselines, demonstrating superior robustness across diverse conditions. This work provides a robust framework and methodology for advancing multimodal maritime surveillance.

Journal Article
