Catalogue Search | MBRL

Recent advancements in multimodal human–robot interaction

by Sandoval, Juan , Chen, Jiahao , Qi, Wen in Computer Science , human–robot interaction , multi-modal feedback

2023

Robotics has advanced significantly over the years, and human–robot interaction (HRI) now plays an important role in delivering the best user experience, cutting down on laborious tasks, and raising public acceptance of robots. New HRI approaches are necessary to promote the evolution of robots, and a more natural and flexible manner of interaction is clearly the most crucial. As a newly emerging approach to HRI, multimodal HRI lets individuals communicate with a robot through various modalities, including voice, image, text, eye movement, and touch, as well as bio-signals such as EEG and ECG. It is a broad field closely related to cognitive science, ergonomics, multimedia technology, and virtual reality, with numerous applications springing up each year. However, little research has been done to summarize the current development and future trends of HRI. To this end, this paper systematically reviews the state of the art of multimodal HRI in terms of its applications by summarizing the latest research articles relevant to this field. The research development in terms of input and output signals is also covered.

Journal Article

A randomized phase II feasibility trial of a multimodal intervention for the management of cachexia in lung and pancreatic cancer

by Kaasa, Stein , Balstad, Trude Rakel , Fearon, Kenneth in Aged , Anti‐inflammatory , Authorship

2017

Background: Cancer cachexia is a syndrome of weight loss (including muscle and fat), anorexia, and decreased physical function. It has been suggested that the optimal treatment for cachexia should be a multimodal intervention. The primary aim of this study was to examine the feasibility and safety of a multimodal intervention (n‐3 polyunsaturated fatty acid nutritional supplements, exercise, and anti‐inflammatory medication: celecoxib) for cancer cachexia in patients with incurable lung or pancreatic cancer undergoing chemotherapy. Methods: Patients receiving two cycles of standard chemotherapy were randomized to either the multimodal cachexia intervention or standard care. Primary outcome measures were feasibility assessed by recruitment, attrition, and compliance with intervention (>50% of components in >50% of patients). Key secondary outcomes were change in weight, muscle mass, physical activity, safety, and survival. Results: Three hundred and ninety‐nine patients were screened, resulting in 46 patients recruited (11.5%). Twenty‐five patients were randomized to the treatment and 21 as controls. Forty‐one completed the study (attrition rate 11%). Compliance with the individual components of the intervention was 76% for celecoxib, 60% for exercise, and 48% for nutritional supplements. As expected from the sample size, there was no statistically significant effect on physical activity or muscle mass. There were no intervention‐related Serious Adverse Events and survival was similar between the groups. Conclusions: A multimodal cachexia intervention is feasible and safe in patients with incurable lung or pancreatic cancer; however, compliance with nutritional supplements was suboptimal. A phase III study is now underway to assess fully the effect of the intervention.

Journal Article

Multi-modal remote perception learning for object sensory data

by Algarni, Asaad , Al Mudawi, Naif , Alazeb, Abdulwahab in multi-modal , Neuroscience , objects recognition

2024

When interpreting visual input, intelligent systems make use of contextual scene learning, which significantly improves both resilience and context awareness. The need to manage enormous amounts of data is a driving force behind the growing interest in computational frameworks, particularly in the context of autonomous cars. This study introduces a novel approach known as Deep Fused Networks (DFN), which improves contextual scene comprehension by merging multi-object detection and semantic analysis. To enhance accuracy and comprehension in complex situations, DFN combines deep learning and fusion techniques. With a minimum gain of 6.4% in accuracy on the SUN-RGB-D dataset and 3.6% on the NYU-Dv2 dataset, the findings demonstrate considerable improvements in object detection and semantic analysis compared with currently used methods.

Journal Article

Enhancing Nowcasting With Multi‐Resolution Inputs Using Deep Learning: Exploring Model Decision Mechanisms

by Feng, Jie , Chen, Lei , Cao, Yuan in Decision making , Deep learning , interpretability

2025

Nowcasting methods based on deep learning typically rely solely on radar data. However, effectively leveraging multi‐source data with diverse spatio‐temporal resolutions remains a significant challenge in the field. To address this challenge, we propose and validate a novel deep learning model for nowcasting, termed Nowcastformer. This model utilizes radar data and upper‐air atmospheric variables, and has been pretrained on satellite data from non‐target regions. Quantitative statistical assessments demonstrate that both the integration of multi‐source data and the implementation of pre‐training strategies enhance the model's performance. Additionally, we conduct a comprehensive analysis of predictor importance, revealing a trend where atmospheric variables become increasingly important as the forecast horizon increases. To illustrate the model's interpretability, we employ the integrated gradients method, which highlights critical areas in representative cases and provides insights into the model's decision‐making process. Plain Language Summary: As a sophisticated monitoring tool, weather radar occupies a pivotal position in convective nowcasting. While numerous contemporary deep learning approaches concentrate on refining network architectures using radar reflectivity as the sole input, the impact of atmospheric physical information on nowcasting remains underexplored. To incorporate the contextual backdrop of atmospheric states in nowcasting, we devise a comprehensive deep learning framework that integrates atmospheric variables across multiple levels. To enhance generalization, we employ a transfer learning strategy to extract generalized spatio‐temporal features. Rather than emphasizing a specific network design, we underscore the advantages of harnessing multi‐source data and the decision mechanism of the model. By fusing atmospheric variables and radar reflectivity, and adopting a pre‐training and fine‐tuning approach, we achieve more reliable and resilient nowcasting. Overall, our successful implementation of transfer learning within this multi‐modal model offers promising insights for advancing the field of nowcasting. Key Points: (1) We present a nowcasting model that is scalable and flexible, enabling it to incorporate heterogeneous inputs. (2) With multi‐source data and pre‐training, performance enhancement is achieved in terms of both general statistics and representative events. (3) The interpretability method reveals how the model generates predictions in a physically meaningful manner.

Journal Article

A defensive attention mechanism to detect deepfake content across multiple modalities

by Menon, Varun G. , Asha, S. , Vinod, P. in Algorithms , Computer Communication Networks , Computer Graphics

2024

Recently, the realistic nature of multi-modal deepfake content has attracted much attention from researchers. They have employed many handcrafted features, learned features, and deep learning techniques to achieve promising performance in recognizing facial deepfakes. However, attackers continue to create deepfakes that outperform their earlier works by focusing on changes in many modalities, making deepfake identification under multiple modalities difficult. To exploit the merits of attention-based network architecture, we propose a novel cross-modal attention architecture on a bi-directional recurrent convolutional network to capture fake content in audio and video. For effective deepfake detection, the system records the spatial–temporal deformations of audio–video sequences and investigates the correlation between these modalities. We propose a self-attenuated VGG16 deep model for extracting visual features for facial fake recognition. In addition, the system incorporates a recurrent neural network with self-attention to extract false audio elements effectively. The cross-modal attention mechanism effectively learns the divergence between the two modalities. We also include multi-modal fake examples to create a well-balanced bespoke dataset that addresses the drawbacks of small and unbalanced training samples. We test the effectiveness of our proposed multi-modal deepfake detection strategy in comparison to state-of-the-art methods on a variety of existing datasets.

Journal Article

MSfuseNet: a multi-stage information fusion network for multi-modal skin lesion diagnosis

by Xiao, Yukun , Yu, Long , Kang, Xiaojing in Algorithms , Artificial intelligence , Classification

2025

Utilizing deep learning to process multi-modal information, including clinical images, dermatoscopy images, and patient metadata, for multi-modal skin lesion diagnosis (MSLD) aligns with modern medical dermatological diagnostic methods. A crucial task in achieving MSLD is to fully leverage multi-modal information, which previous works often failed to accomplish. To address this challenge, in this paper, we propose a novel network, MSfuseNet, for multi-modal skin disease classification. Our method primarily consists of four modules: (1) First, we employ a Coordinate-Spatial Attention Fusion Module to align the two types of images in both coordinate and spatial dimensions, combining this module at different stages with intermediate fusion strategies to reduce information loss in the model. (2) Then, we utilize a MIX Module to facilitate the transformation of shallow local features into deep global features within the model, thereby enhancing the model’s modeling capabilities and robustness. (3) We adopt a Double-Modality Cross-Attention Fusion Module, employing cross-attention mechanisms for global modeling of image features. (4) Finally, we employ a Triple-Modality Fusion Module to aggregate textual features and image features, achieving a full integration of multi-modal information. We have validated the effectiveness of our approach on a public dataset named Derm7pt and a dataset we collected named XJU-MMSD. Compared to state-of-the-art methods, our method achieved the highest average accuracy of 77.73% on the Derm7pt dataset.

Journal Article

Multimodal cyberbullying detection using capsule network with dynamic routing and deep convolutional neural network

by Sachdeva, Nitin , Kumar, Akshi in Artificial neural networks , Bullying , Cyberbullying

2022

Cyberbullying is the use of information technology networks by individuals to humiliate, tease, embarrass, taunt, defame and disparage a target without any face-to-face contact. With the upsurge of social networking sites such as Facebook, Instagram, YouTube and Twitter, social media has become the 'virtual playground' used by bullies. It is critical to implement models and systems for automatic detection and resolution of bullying content available online, as the ramifications can lead to a societal epidemic. This paper presents a deep neural model for cyberbullying detection in three different modalities of social data, namely textual, visual and info-graphic (text embedded along with an image). The all-in-one architecture, CapsNet–ConvNet, consists of a capsule network (CapsNet) deep neural network with dynamic routing for predicting the textual bullying content and a convolutional neural network (ConvNet) for predicting the visual bullying content. The info-graphic content is discretized by separating text from the image using Google Lens in the Google Photos app. A perceptron-based decision-level late fusion strategy for multimodal learning is used to dynamically combine the predictions of the discrete modalities and output the final category as bullying or non-bullying. Experimental evaluation is done on a mix-modal dataset which contains 10,000 comments and posts scraped from YouTube, Instagram and Twitter. The proposed model achieves a superlative performance with an AUC–ROC of 0.98.

Journal Article

Advanced mass spectrometric and spectroscopic methods coupled with machine learning for in vitro diagnosis

by Wan, Jingjing , Chen, Xiaonan , Shu, Weikang in Accuracy , Algorithms , Artificial intelligence

2023

In vitro diagnosis (IVD) is a vital component of medical testing that examines biological samples of tissues or bio‐fluids. Recently, mass spectrometry and spectroscopy have been increasingly employed in the field of IVD, due to their high accuracy, facile sample preparation, and rapid detection. Notably, the large datasets generated by these two technologies provide a wealth of information but require complex and time‐consuming processing. Machine learning (ML), an important branch of artificial intelligence (AI), has emerged as a promising solution for the decoding of big data. ML imitates the human brain to process data, significantly improving accuracy and efficiency compared with traditional processing methods. In this review, we first introduce the commonly used ML algorithms and the advanced mass spectrometry and spectroscopy techniques in the field of IVD. The ML algorithms are summarized in four categories according to different learning tasks. Then, the combinations of ML with mass spectrometry, spectroscopy, and multi‐modal analysis for IVD are presented, and the roles of ML in these combinations are elucidated by some representative examples. This review aims to provide a systematic and comprehensive summary of the literature on ML‐assisted mass spectrometry or spectroscopy. We believe that it will help researchers select suitable ML algorithms for supplementing existing detection techniques or develop the potential of coupling more detection techniques with ML, thus promoting the development of mass spectrometry and spectroscopy in IVD. Recently, mass spectrometric and spectroscopic methods coupled with machine learning have been increasingly employed in the field of in vitro diagnoses, such as pathogen identification, cancer diagnosis, and cell classification. In this review, the authors focus on the combinations of machine learning with mass spectrometry, spectroscopy, and multi‐modal analysis for in vitro diagnoses, and they highlight the roles of machine learning in these combinations through some representative examples. The authors furthermore discuss the challenges and perspectives of mass spectrometry, spectroscopy, multi‐modal analysis, and machine learning.

Journal Article

Multi-modal Domain Adaptation Method Based on Parameter Fusion and Two-Step Alignment

by Gong, Lishuang , Guo, Xin , Yao, Yuan in Alignment , Artificial Intelligence , Classification

2024

Due to the well-known domain shift problem, directly deploying a trained multi-modal classifier to a new environment usually leads to poor performance. Existing multi-modal domain adaptation methods not only lack fine-grained information about the cross-modal data distribution, but also neglect cross-modal correlation. Therefore, this paper proposes a multi-modal domain adaptation method based on parameter fusion and two-step alignment (PFTS) to solve these problems. The consistency of network parameters is used to enhance the correlation among modalities, and a higher-order moment measurement is introduced to improve the alignment of data distribution at the fine-grained level. In addition, each modality is weighted to achieve focused transfer. Comprehensive experiments on multi-modal datasets with different domain adaptation settings have been conducted. The results show that the precision of PFTS is 5.38% higher than that of state-of-the-art multi-modal domain adaptation methods.

Journal Article

Multi-Modal 3D Object Detection in Autonomous Driving: A Survey

by Wang, Yingjie , Mao, Qiuyu , Zhang, Yanyong in Algorithms , Autonomous cars , Autonomy

2023

The past decade has witnessed the rapid development of autonomous driving systems. However, it remains a daunting task to achieve full autonomy, especially when it comes to understanding the ever-changing, complex driving scenes. To alleviate the difficulty of perception, self-driving vehicles are usually equipped with a suite of sensors (e.g., cameras, LiDARs), hoping to capture the scenes with overlapping perspectives to minimize blind spots. Fusing these data streams and exploiting their complementary properties is thus rapidly becoming the current trend. Nonetheless, combining data that are captured by different sensors with drastically different ranging/imaging mechanisms is not a trivial task; instead, many factors need to be considered and optimized. Without care, data from one sensor may act as noise to data from another sensor, so that fusing them yields even poorer results. Thus far, there have been no in-depth guidelines for designing multi-modal fusion-based 3D perception algorithms. To fill the void and motivate further investigation, this survey conducts a thorough study of tens of recent deep learning-based multi-modal 3D detection networks (with a special emphasis on LiDAR-camera fusion), focusing on their fusion stage (i.e., when to fuse), fusion inputs (i.e., what to fuse), and fusion granularity (i.e., how to fuse). These important design choices play a critical role in determining the performance of the fusion algorithm. In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms. Next, we discuss existing datasets that can be used for evaluating multi-modal 3D object detection algorithms. Then we present a review of multi-modal fusion-based 3D detection networks, taking a close look at their fusion stage, fusion input and fusion granularity, and how these design choices evolve with time and technology. After the review, we discuss open challenges as well as possible solutions. We hope that this survey can help researchers to get familiar with the field and embark on investigations in the area of multi-modal 3D object detection.

Journal Article
