440 result(s) for "Multimodal user interfaces (Computers)"
Multimodal Sentiment Analysis Representations Learning via Contrastive Learning with Condense Attention Fusion
Multimodal sentiment analysis has gained popularity as a research field for its ability to predict users’ emotional tendencies more comprehensively. The data fusion module is a critical component of multimodal sentiment analysis, as it allows for integrating information from multiple modalities. However, it is challenging to combine modalities and remove redundant information effectively. In our research, we address these challenges by proposing a multimodal sentiment analysis model based on supervised contrastive learning, which leads to more effective data representation and richer multimodal features. Specifically, we introduce the MLFC module, which utilizes a convolutional neural network (CNN) and Transformer to solve the redundancy problem of each modal feature and reduce irrelevant information. Moreover, our model employs supervised contrastive learning to enhance its ability to learn standard sentiment features from data. We evaluate our model on three widely-used datasets, namely MVSA-single, MVSA-multiple, and HFM, demonstrating that our model outperforms the state-of-the-art model. Finally, we conduct ablation experiments to validate the efficacy of our proposed method.
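The abstract above names supervised contrastive learning without giving the loss. As a rough illustration only, the sketch below implements the commonly used supervised contrastive formulation, in which each anchor is pulled toward the other samples that share its label and pushed away from the rest. The function name `supcon_loss`, the temperature default, and the toy inputs are assumptions for this example and are not taken from the paper; embeddings are assumed to be non-zero vectors.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    Hypothetical helper for illustration; not the paper's implementation.
    embeddings: list of non-zero float vectors; labels: list of class ids.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for z in embeddings:
        norm = math.sqrt(sum(x * x for x in z))
        normed.append([x / norm for x in z])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(normed[i], normed[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With embeddings `[[1, 0], [1, 0], [0, 1], [0, 1]]`, the labeling `[0, 0, 1, 1]` gives a much smaller loss than `[0, 1, 0, 1]`, because in the first case same-label samples already point in the same direction.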
Convolutional Neural Network Approach Based on Multimodal Biometric System with Fusion of Face and Finger Vein Features
In today’s information age, accurately verifying a person’s identity and protecting information security has become a pressing concern across all walks of life. At present, biometric identification is among the more convenient and secure solutions for identity verification, but a single biometric cannot support increasingly complex and diversified authentication scenarios. Using multimodal biometric technology can improve the accuracy and safety of identification. This paper proposes a biometric method based on feature-layer fusion of finger vein and face features, using a convolutional neural network (CNN). A self-attention mechanism is used to obtain the weights of the two biometrics and, combined with the ResNet residual structure, the self-attention weight feature is concatenated along the channel dimension with the bimodal fusion feature. To demonstrate the efficiency of bimodal feature-layer fusion, AlexNet and VGG-19 network models were selected in the experiments to extract finger vein and face image features as inputs to the feature fusion module. The experiments show that the recognition accuracy of both models exceeds 98.4%, demonstrating the high efficiency of the bimodal feature fusion.
A Multimodal Large Language Model Framework for Intelligent Perception and Decision-Making in Smart Manufacturing
In modern manufacturing, making accurate and timely decisions requires the ability to effectively handle multiple types of data. This paper presents a multimodal system designed specifically for smart manufacturing applications. The system combines various data sources including images, sensor data, and production records, using advanced multimodal large language models. This approach addresses common limitations of traditional single-modal methods, such as isolated data analysis and poor integration between different data types. Key contributions include a unified method for representing different data types, dynamic semantic tokenization for better data processing, strong alignment strategies across modalities, and a practical two-stage training method involving initial large-scale pretraining and later fine-tuning for specific tasks. Additionally, a novel Transformer-based model is introduced for generating both images and text, significantly improving real-time decision-making capabilities. Experiments on relevant industrial datasets show that this method consistently performs better than current state-of-the-art approaches in tasks like image–text retrieval and visual question answering. The results demonstrate the effectiveness and versatility of the proposed methods, offering important insights and practical solutions to enhance intelligent manufacturing, predictive maintenance, and anomaly detection, thus supporting the development of more efficient and reliable industrial systems.
Isabl Platform, a digital biobank for processing multimodal patient data
Background: The widespread adoption of high throughput technologies has democratized data generation. However, data processing in accordance with best practices remains challenging and the data capital often becomes siloed. This presents an opportunity to consolidate data assets into digital biobanks: ecosystems of readily accessible, structured, and annotated datasets that can be dynamically queried and analysed. Results: We present Isabl, a customizable plug-and-play platform for the processing of multimodal patient-centric data. Isabl's architecture consists of a relational database (Isabl DB), a command line client (Isabl CLI), a RESTful API (Isabl API) and a frontend web application (Isabl Web). Isabl supports automated deployment of user-validated pipelines across the entire data capital. A full audit trail is maintained to secure data provenance and governance and to ensure reproducibility of findings. Conclusions: As a digital biobank, Isabl supports continuous data utilization and automated meta analyses at scale, and serves as a catalyst for research innovation, new discoveries, and clinical translation.
Redesigning Multimodal Interaction: Adaptive Signal Processing and Cross-Modal Interaction for Hands-Free Computer Interaction
Hands-free computer interaction is a key topic in assistive technology, with camera-based and voice-based systems being the most common methods. Recent camera-based solutions leverage facial expressions or head movements to simulate mouse clicks or key presses, while voice-based systems enable control via speech commands, wake-word detection, and vocal gestures. However, existing systems often suffer from limitations in responsiveness and accuracy, especially under real-world conditions. In this paper, we present 3-Modal Human-Computer Interaction (3M-HCI), a novel interaction system that dynamically integrates facial, vocal, and eye-based inputs through a new signal processing pipeline and a cross-modal coordination mechanism. This approach not only enhances recognition accuracy but also reduces interaction latency. Experimental results demonstrate that 3M-HCI outperforms several recent hands-free interaction solutions in both speed and precision, highlighting its potential as a robust assistive interface.
Design and Implementation of Attention Depression Detection Model Based on Multimodal Analysis
Depression is becoming a social problem as the number of sufferers steadily increases. In this regard, this paper proposes a multimodal analysis-based attention depression detection model that simultaneously uses voice and text data obtained from users. The proposed model consists of a Bidirectional Encoder Representations from Transformers-Convolutional Neural Network (BERT-CNN) for natural language analysis, a CNN-Bidirectional Long Short-Term Memory (CNN-BiLSTM) for voice signal processing, and multimodal analysis and fusion models for depression detection. The experiments in this paper are conducted using the DAIC-WOZ dataset, a set of clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety and post-traumatic stress. In preprocessing, the voice data were cut to 4 seconds in length and the number of mel filters was set to 128. For text data, we used the subject's text from the interview and derived the embedding vector using a transformers tokenizer. Based on each data set, the BERT-CNN and CNN-BiLSTM proposed in this paper were applied and combined to classify depression. Through experiments, accuracy and loss were compared between multimodal data and single-modality data, and it was confirmed that the previously low accuracy was improved.
Structured matching models in multimodal information fusion: An optimized Kuhn-Munkres algorithm
In modern multimodal interaction design, integrating information from diverse modalities (such as speech, vision, and text) presents a significant challenge. These modalities differ in structure, timing, and data volume, often leading to mismatches, low computational efficiency, and suboptimal user experiences during the integration process. This study aims to enhance both the efficiency and accuracy of multimodal information fusion. To achieve this, two publicly available datasets, Carnegie Mellon University Multimodal Opinion Sentiment Intensity (CMU-MOSI) and Interactive Emotional Dyadic Motion Capture (IEMOCAP), are employed to collect speech, visual, and textual data relevant to multimodal interaction scenarios. The data undergo preprocessing steps including noise reduction, feature extraction (e.g., Mel Frequency Cepstral Coefficients and keypoint detection), and temporal alignment. An improved Kuhn-Munkres algorithm is then proposed, extending the traditional bipartite graph matching model to support weighted multimodal matching. The algorithm dynamically adjusts weight coefficients based on the importance scores of each modality, while also incorporating a cross-modal correlation matrix as a constraint to improve the robustness of the matching process. The enhanced algorithm’s performance is validated through information matching efficiency tests and user interaction satisfaction surveys. Experimental results show that it improves multimodal information matching accuracy by 28.2% over the baseline method. Integration efficiency increases by 18.7%, and computational complexity is significantly reduced, with average computation time decreased by 15.4%. User satisfaction also improves, with a 19.5% increase in experience ratings. Ablation studies further confirm the critical contribution of both the dynamic weighting mechanism and the correlation matrix constraint to the overall performance. This study introduces a novel optimization strategy for multimodal information integration, offering substantial theoretical value and broad applicability in intelligent interaction design and human-computer collaboration. These advancements contribute meaningfully to the development of next-generation multimodal interaction systems.
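For readers unfamiliar with the underlying technique, the sketch below shows the classical Kuhn-Munkres (Hungarian) algorithm applied to a score matrix built as a weighted sum of per-modality similarity matrices. It is not the paper's improved algorithm: the weighted-sum combination, the function names `combine_scores`, `hungarian_min_cost`, and `max_weight_matching`, and the toy numbers are all assumptions for illustration, and the cross-modal correlation constraint mentioned in the abstract is not modeled.

```python
def combine_scores(per_modality, weights):
    """Weighted sum of per-modality similarity matrices (assumed scheme)."""
    rows, cols = len(per_modality[0]), len(per_modality[0][0])
    return [
        [sum(w * sim[i][j] for w, sim in zip(weights, per_modality))
         for j in range(cols)]
        for i in range(rows)
    ]


def hungarian_min_cost(cost):
    """Classical O(n^2 * m) Kuhn-Munkres; requires rows <= columns.

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip matched edges along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def max_weight_matching(scores):
    """Maximize total score by minimizing the negated scores."""
    return hungarian_min_cost([[-s for s in row] for row in scores])
```

For example, `max_weight_matching([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` pairs row 0 with column 0, row 1 with column 2, and row 2 with column 1, for a total score of 11. Raising a modality's weight in `combine_scores` shifts the combined scores toward that modality's similarities before matching.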
Assessing User Experience with Piezoresistive Force Sensors: Interpreting Button Press Impulse and Duration
As robotic systems become increasingly integrated into daily life, the need for user experience (UX) assessment methods that are both privacy-conscious and suitable for embedded hardware platforms has grown. Traditional UX evaluations relying on vision, audio, or lengthy questionnaires are often intrusive, computationally demanding, or impractical for low-power devices. In this study, we introduce a novel sensor-based method for assessing UX through direct physical interaction. We designed a robot lamp with a force-sensing button interface and conducted a user study involving controlled robot errors. Participants interacted with the lamp during a reading task and rated their UX on a 7-point Likert scale. We then correlated the force and time data from button presses with user experience ratings and demographic information. Our results demonstrate the potential of bodily interaction metrics as a viable alternative for UX assessment in human-robot interaction, enabling real-time, embedded, and privacy-aware evaluation of user satisfaction in robotic systems.
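The title refers to button press impulse and duration, but the abstract does not say how they are computed. One plausible reading, sketched below, treats impulse as the time integral of force over the press and duration as the time the force stays above a threshold. The function name `press_metrics`, the uniform sampling interval `dt`, and the threshold rule are assumptions for this example, not details from the study.

```python
def press_metrics(forces, dt, threshold=0.0):
    """Return (impulse, duration) for one button press.

    Hypothetical helper: forces are samples in newtons taken every dt seconds.
    The press spans the first to the last sample above the threshold.
    """
    active = [i for i, f in enumerate(forces) if f > threshold]
    if not active:
        return 0.0, 0.0  # no press detected
    start, end = active[0], active[-1]
    duration = (end - start) * dt
    # Trapezoidal rule: average adjacent samples and multiply by the interval.
    impulse = sum(
        (forces[i] + forces[i + 1]) / 2 * dt for i in range(start, end)
    )
    return impulse, duration
```

For samples `[0, 1, 2, 1, 0]` taken every 0.5 s, this gives an impulse of 1.5 N·s over a duration of 1.0 s.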
Haptic interface with multimodal tactile sensing and feedback for human–robot interaction
Novel sensing and actuation technologies have notably advanced haptic interfaces, paving the way for more immersive user experiences. We introduce a haptic system that transcends traditional pressure-based interfaces by delivering more comprehensive tactile sensations. This system provides an interactive combination of a robotic hand and haptic glove to operate devices within the wireless communication range. Each component is equipped with independent sensors and actuators, enabling real-time mirroring of user’s hand movements and the effective transmission of tactile information. Remarkably, the proposed system has a multimodal feedback mechanism based on both vibration motors and Peltier elements. This mechanism ensures a varied tactile experience encompassing pressure and temperature sensations. The accuracy of tactile feedback is meticulously calibrated according to experimental data, thereby enhancing the reliability of the system and user experience. The Peltier element for temperature feedback allows users to safely experience temperatures similar to those detected by the robotic hand. Potential applications of this system are wide ranging and include operations in hazardous environments and medical interventions. By providing realistic tactile sensations, our haptic system aims to improve both the performance and safety of workers in such critical sectors, thereby highlighting the great potential of advanced haptic technologies.
Integrating Textual Queries with AI-Based Object Detection: A Compositional Prompt-Guided Approach
While object detection and recognition have been extensively adopted by many applications in decision-making, new algorithms and methodologies have emerged to enhance the automatic identification of target objects. In particular, the rise of deep learning and language models has opened many possibilities in this area, although challenges in contextual query analysis and human interactions persist. This article presents a novel neuro-symbolic object detection framework that aligns object proposals with textual prompts using a deep learning module while enabling logical reasoning through a symbolic module. By integrating deep learning with symbolic reasoning, object detection and scene understanding are considerably enhanced, enabling complex, query-driven interactions. Using a synthetic 3D image dataset, the results demonstrate that this framework effectively generalizes to complex queries, combining simple attribute-based descriptions without explicit training on compound prompts. We present the numerical results and comprehensive discussions, highlighting the potential of our approach for emerging smart applications.