84 result(s) for "multi-task fusion"
A Multi-Task Fusion Strategy-Based Decision-Making and Planning Method for Autonomous Driving Vehicles
Autonomous driving technology based on deep reinforcement learning (DRL) is regarded as one of the most cutting-edge research fields worldwide. The agent learns to make independent decisions by interacting with the environment and learning driving strategies from the environment's feedback. This technology has been widely used in end-to-end driving tasks. However, the field faces several challenges. First, developing real vehicles is expensive, time-consuming, and risky. To expedite the testing, verification, and iteration of end-to-end deep reinforcement learning algorithms, a joint simulation development and validation platform was designed and implemented in this study based on VTD–CarSim and the TensorFlow deep learning framework, and the research was conducted on this platform. Second, sparse reward signals can cause problems (e.g., a low-sample learning rate). The agent must be capable of navigating in an unfamiliar environment and driving safely under a wide variety of weather and lighting conditions. To address the agent's poor generalization to unknown scenarios, a deep deterministic policy gradient (DDPG) decision-making and planning method based on a multi-task fusion strategy was proposed in this study. The main task (DRL-based decision-making and planning) and the auxiliary task (image semantic segmentation) were cross-fused, and the auxiliary task shared part of its network with the main task to reduce the risk of overfitting and improve generalization. The experimental results indicate, first, that the joint simulation development and validation platform built in this study is highly versatile: users can easily substitute any default module with customized algorithms and verify how well new functions improve overall performance alongside the platform's other default modules. Second, the deep reinforcement learning strategy based on multi-task fusion proposed in this study was competitive. It outperformed other DRL algorithms in certain tasks, which improved the generalization ability of the vehicle decision-making and planning algorithm.
Enhancing Driver Monitoring Systems Based on Novel Multi-Task Fusion Algorithm
Distracted driving continues to be a major contributor to road accidents, motivating growing research interest in advanced driver monitoring systems for enhanced safety. This paper seeks to improve the overall performance and effectiveness of such systems by highlighting the importance of recognizing the driver’s activity. It introduces a novel methodology for assessing driver attention from multi-perspective videos that capture the driver's full body, hands, and face, focusing on three driver tasks: distracted actions, gaze direction, and hands-on-wheel monitoring. The experimental evaluation was conducted in two phases: first, assessing distracted activities, gaze direction, and hands-on-wheel status using a CNN-based model and videos from three cameras placed inside the vehicle; second, evaluating the multi-task fusion algorithm through the aggregated danger score, introduced in this paper, which represents the driver’s attentiveness as computed by the multi-task data fusion algorithm. The proposed methodology was built and evaluated using the DMD dataset; additionally, model robustness was tested on the AUC_V2 and SAMDD driver distraction datasets. The proposed algorithm effectively combines multi-task information from different perspectives and evaluates the attention level of the driver.
MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation
The accurate recognition of emotions in conversations helps understand the speaker’s intentions and facilitates various analyses in artificial intelligence, especially in human–computer interaction systems. However, most previous methods have limited ability to track the different emotional states of each speaker in a dialogue. To alleviate this problem, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC), for emotion recognition in conversation. MMATERIC combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. We also adopt multiple fusion strategies at different stages, mainly model fusion and decision-stage fusion, to improve the model’s accuracy. In addition, our multimodal framework allows features to interact across modalities and allows potential adaptation flows from one modality to another. Our experimental results on two benchmark datasets show that our proposed method is effective and outperforms the state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods we adopt at each stage.
Advances and challenges in infrared-visible image fusion: a comprehensive review of techniques and applications
Infrared–visible image fusion (IVIF) integrates complementary thermal and photometric cues for surveillance, remote sensing, and autonomous perception. Existing surveys, while comprehensive, provide limited guidance for design-to-deployment and seldom relate fusion quality to task outcomes or device constraints. This work provides a unified perspective that organizes IVIF methods along an interface-attention-alignment coordinate system covering classical spatial/transform pipelines and contemporary deep paradigms (generative, discriminative, multi-task, hybrid/Transformer, dynamic). Building on literature through 2025, we synthesize fidelity-robustness-efficiency trade-offs and introduce a comparison-to-deployment protocol that couples fusion metrics with task accuracy (AP/mIoU), latency, memory footprint, and condition-performance characterization (misregistration, noise, illumination/weather). We consolidate Transformer/hybrid coverage with practical recipes and focused guidance on temporal consistency, robustness auditing, and physics-grounded interpretability. Compared with previous reviews, our survey concurrently addresses four under-covered dimensions (video temporal consistency, robustness auditing, task-aware evaluation, and deployment reporting) and distills a practical checklist linking architectural choices to operating conditions and hardware budgets, enabling reproducible, task-relevant IVIF practice.
Multi-Task Fusion Deep Learning Model for Short-Term Intersection Operation Performance Forecasting
Bottlenecks at urban road intersections have become an important cause of traffic delay and a constraint on traffic efficiency. It is essential to predict the operating performance of intersections in real time and to formulate corresponding strategies to alleviate intersection delay. However, because intersection traffic conditions are complex, it is difficult to capture their spatio-temporal features with traditional data and prediction methods. The development of big data technology and deep learning models offers a good opportunity to address this challenge. Therefore, this paper proposes a multi-task fusion deep learning (MFDL) model based on massive floating car data to effectively predict the passing time and speed at intersections over different estimation time granularities. Moreover, a grid model and the fuzzy C-means (FCM) clustering method are developed to identify the intersection area and derive a set of key spatio-temporal traffic parameters from floating car data. To validate the effectiveness of the proposed model, floating car data from ten intersections in Beijing with a sampling interval of 3 s are used for training and testing. The experimental results show that the MFDL model efficiently captures the spatio-temporal and topological features of the traffic state. Compared with traditional prediction methods, the proposed model has the best prediction performance. The interplay between the two targeted prediction variables can significantly improve prediction accuracy and efficiency. This method thus predicts intersection operation performance in real time and can provide valuable insights for traffic managers seeking to improve an intersection’s operating efficiency.
MTFFNet: a Multi-task Feature Fusion Framework for Chinese Painting Classification
Different artists have their unique painting styles, which can hardly be recognized by ordinary people without professional knowledge. How to intelligently analyze such artistic styles via underlying features remains a challenging research problem. In this paper, we propose a novel multi-task feature fusion architecture (MTFFNet) for cognitive classification of traditional Chinese paintings. Specifically, by taking full advantage of the pre-trained DenseNet as backbone, MTFFNet benefits from the fusion of two different types of feature information: semantic and brush stroke features. These features are learned from the RGB images and an auxiliary gray-level co-occurrence matrix (GLCM) in an end-to-end manner, to enhance the discriminative power of the features for the first time. Through extensive experiments, our results demonstrate that our proposed model MTFFNet achieves significantly better classification performance than many state-of-the-art approaches. In this paper, an end-to-end multi-task feature fusion method for Chinese painting classification is proposed. We introduce a new model named MTFFNet, composed of two branches, in which one branch is top-level RGB feature learning and the other branch is low-level brush stroke feature learning. The semantic feature learning branch takes the original image of a traditional Chinese painting as input, extracting the color and semantic information of the image, while the brush feature learning branch takes the GLCM feature map as input, extracting the texture and edge information of the image. A multi-kernel learning SVM (support vector machine) is selected as the final classifier. Evaluated by experiments, this method improves the accuracy of Chinese painting classification and enhances the generalization ability. By adopting the end-to-end multi-task feature fusion method, MTFFNet can extract more semantic features and texture information from the image. When compared with state-of-the-art classification methods for Chinese painting, the proposed method achieves much higher accuracy on our proposed datasets, without lowering speed or efficiency. The proposed method provides an effective solution for cognitive classification of Chinese ink painting, where the accuracy and efficiency of the approach have been fully validated.
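The gray-level co-occurrence matrix (GLCM) that feeds the brush stroke branch in the abstract above can be illustrated with a minimal sketch. The function name `glcm`, the horizontal one-pixel offset, and the tiny 3x3 image are assumptions made for illustration only; the abstract does not state which offsets, gray-level quantization, or normalization MTFFNet uses.

```python
from collections import Counter


def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dy, dx).

    `image` is a list of rows of integer gray levels in [0, levels).
    Returns a levels x levels matrix of raw (unnormalized) counts.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    return [[counts[(i, j)] for j in range(levels)] for i in range(levels)]


# Hypothetical 3x3 image with three gray levels (not data from the paper).
tiny = [
    [0, 0, 1],
    [1, 2, 2],
    [2, 2, 0],
]
matrix = glcm(tiny, levels=3)
for row in matrix:
    print(row)
```

Each entry `matrix[i][j]` counts the pixel pairs in which level `i` is immediately followed to the right by level `j`; texture statistics such as contrast or homogeneity are typically derived from a normalized version of this matrix.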
Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
Depression is a severe psychological condition that affects millions of people worldwide. As depression has received more attention in recent years, it has become imperative to develop automatic methods for detecting depression. Although numerous machine learning methods have been proposed for estimating the levels of depression via audio, visual, and audiovisual emotion sensing, several challenges remain. For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or features effectively. In addition, incorporating other information or tasks to improve estimation accuracy remains a challenge. In this study, we propose a multi-modal adaptive fusion transformer network for estimating the levels of depression. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling. Thus, the proposed transformer-based network is used to extract long-term temporal context information from uni-modal audio and visual data. This is the first transformer-based approach for depression detection. We also propose an adaptive fusion method for fusing useful multi-modal features. Furthermore, inspired by current multi-task learning work, we incorporate an auxiliary task (depression classification) to enhance the main task of depression level regression (estimation). The effectiveness of the proposed method has been validated on a public dataset (AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of the PHQ-8 scores. Experimental results indicate that the proposed method achieves better performance than current state-of-the-art methods. Our proposed method achieves a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019 which is 6.2% higher than the accuracy (CCC = 0.696) of the state-of-the-art method.
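The concordance correlation coefficient (CCC) reported in the abstract above is commonly defined as 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below assumes that standard definition with population (divide-by-n) statistics; the function name `concordance_cc` and the sample values are hypothetical and are not PHQ-8 scores from the paper.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient using population statistics."""
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # The squared mean difference penalizes systematic bias, which plain
    # Pearson correlation would ignore.
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical values: predictions perfectly correlated but shifted by +1.
truth = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
print(concordance_cc(truth, truth))    # perfect agreement
print(concordance_cc(truth, shifted))  # lower, because of the constant offset
```

Unlike Pearson correlation, which would be 1 for both calls, CCC drops for the shifted predictions because it measures agreement with the identity line rather than linear association alone.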
An Improved Boundary-Aware U-Net for Ore Image Semantic Segmentation
Particle size is the most important index of the crushing quality of ores, and the accuracy of particle size statistics directly affects the subsequent operation of mines. Accurate ore image segmentation is an important prerequisite for reliable particle size statistics. However, given the diversity of the size and shape of ores, the influence of dust and light, the complex texture and shadows on the ore surface, and especially the adhesion between ores, it is difficult to segment ore images accurately, and under-segmentation can be a serious problem. Constructing a large, labeled dataset for complex and unclear conveyor belt ore images is also difficult. In response to these challenges, we propose a novel multi-task learning network based on U-Net for ore image segmentation. To address the limited availability of training data and to improve the feature extraction ability of the model, an improved encoder based on ResNet18 is proposed. Unlike the original U-Net, our model's decoder includes a boundary subnetwork for boundary detection and a mask subnetwork for mask segmentation, and information from the two subnetworks is fused in a boundary mask fusion block (BMFB). The experimental results showed that the pixel accuracy, Intersection over Union (IOU) for the ore mask (IOU_M), IOU for the ore boundary (IOU_B), and error of the average statistical ore particle size (ASE) rate of our proposed model on the testing dataset were 92.07%, 86.95%, 52.32%, and 20.38%, respectively. Compared to the benchmark U-Net, the improvements were 0.65%, 1.01%, 5.78%, and 12.11% (down), respectively.
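Intersection over Union, the metric behind IOU_M and IOU_B in the abstract above, is the number of pixels labeled positive in both the prediction and the ground truth divided by the number labeled positive in either. The sketch below is a minimal illustration on binary masks; the function name `iou`, the convention of returning 1.0 for two empty masks, and the sample masks are assumptions, not details from the paper.

```python
def iou(pred, target):
    """Intersection over Union for two binary masks given as lists of rows."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks (1 = ore pixel, 0 = background).
predicted = [
    [1, 1, 0],
    [0, 1, 0],
]
ground_truth = [
    [1, 0, 0],
    [0, 1, 1],
]
score = iou(predicted, ground_truth)
print(score)
```

The same function applies to boundary masks; boundary IoU tends to be much lower than mask IoU because thin boundaries leave little room for pixel-level misalignment, which is consistent with the gap between the two figures reported above.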
UAV Multisensory Data Fusion and Multi-Task Deep Learning for High-Throughput Maize Phenotyping
Recent advances in unmanned aerial vehicles (UAV), mini and mobile sensors, and GeoAI (a blend of geospatial and artificial intelligence (AI) research) are the main highlights among agricultural innovations to improve crop productivity and thus secure vulnerable food systems. This study investigated the versatility of UAV-borne multisensory data fusion within a framework of multi-task deep learning for high-throughput phenotyping in maize. Data were collected by UAVs equipped with a set of miniaturized sensors, including hyperspectral, thermal, and LiDAR, over an experimental corn field in Urbana, IL, USA during the growing season. A full suite of eight phenotypes was measured in situ at the end of the season as ground truth data: dry stalk biomass, cob biomass, dry grain yield, harvest index, grain nitrogen utilization efficiency (Grain NutE), grain nitrogen content, total plant nitrogen content, and grain density. After a series of radiometric calibrations and geo-corrections, the aerial data were analyzed using three primary approaches. First, an extended normalized difference spectral index (NDSI) served as a simple arithmetic combination of different data modalities to explore the degree of correlation with maize phenotypes. The extended NDSI analysis revealed that the NIR spectra (750–1000 nm) alone were strongly related to all eight maize traits. Second, a fusion of vegetation indices, structural indices, and a thermal index, selectively handcrafted from each data modality, was fed to classical machine learning regressors, Support Vector Machine (SVM) and Random Forest (RF). The prediction performance varied from phenotype to phenotype, ranging from R² = 0.34 for grain density up to R² = 0.85 for both grain nitrogen content and total plant nitrogen content. Further, a fusion of hyperspectral and LiDAR data overcame the limitations of a single data modality, especially addressing the vegetation saturation effect occurring in optical remote sensing. Third, a multi-task deep convolutional neural network (CNN) was customized to take a raw imagery data fusion of hyperspectral, thermal, and LiDAR for predicting multiple maize traits at a time. The multi-task deep learning performed comparably to, if not better in some traits than, the mono-task deep learning and machine learning regressors. Data augmentation used for the deep learning models boosted the prediction accuracy, which helps to alleviate the intrinsic limitation of a small sample size and unbalanced sample classes in remote sensing research. Theoretical and practical implications for plant breeders and crop growers are also made explicit in the discussion.
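A normalized difference spectral index in its classic two-band form is (a − b) / (a + b). The sketch below shows only that classic form; the abstract above refers to an extended version that combines different data modalities, whose exact definition is not given here, so the function name `ndsi`, the zero-denominator convention, and the sample reflectance values are assumptions for illustration.

```python
def ndsi(band_a, band_b):
    """Classic normalized difference index of two band values.

    Returns 0.0 when both bands are zero (an assumed convention to
    avoid division by zero).
    """
    denominator = band_a + band_b
    if denominator == 0:
        return 0.0
    return (band_a - band_b) / denominator


# Hypothetical reflectance values, not measurements from the study.
nir_reflectance = 0.5
red_reflectance = 0.1
index_value = ndsi(nir_reflectance, red_reflectance)
print(round(index_value, 4))
```

For non-negative inputs the index is bounded in [−1, 1], which makes values comparable across band pairs when searching for the pair most correlated with a given trait.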
An EEG-Based Person Authentication System with Open-Set Capability Combining Eye Blinking Signals
The electroencephalogram (EEG) signal represents a subject’s specific brain activity patterns and is considered an ideal biometric given its superior resistance to forgery. However, the accuracy and stability of current EEG-based person authentication systems are still unsatisfactory in practical application. In this paper, a multi-task EEG-based person authentication system combining eye blinking is proposed, which achieves high precision and robustness. First, we design a novel EEG-based biometric evoked paradigm using self- or non-self-face rapid serial visual presentation (RSVP). The designed paradigm obtains a distinct and stable biometric trait from EEG at a lower time cost. Second, event-related potential (ERP) features and morphological features are extracted from EEG signals and eye blinking signals, respectively. Third, a convolutional neural network and a back propagation neural network are designed separately to produce score estimates from the EEG features and eye blinking features. Finally, a score fusion technique based on the least squares method is proposed to obtain the final estimation score. The performance of the multi-task authentication system improves significantly compared to the system using EEG only, with average accuracy increasing from 92.4% to 97.6%. Moreover, open-set authentication tests for additional imposters and permanence tests for users are conducted to simulate practical scenarios, which have never been employed in previous EEG-based person authentication systems. A mean false acceptance rate (FAR) of 3.90% and a mean false rejection rate (FRR) of 3.87% are achieved in open-set authentication tests and permanence tests, respectively, which illustrates the open-set authentication and permanence capability of our system.
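Score fusion by least squares, as mentioned in the abstract above, can be sketched as fitting two weights so that a weighted sum of the EEG score and the eye blinking score best matches a target in the squared-error sense. The abstract does not give the paper's exact formulation, so the bias-free linear model, the function name `fit_fusion_weights`, and the sample scores and targets below are all assumptions.

```python
def fit_fusion_weights(score_pairs, targets):
    """Least-squares weights (w1, w2) minimizing sum((w1*s1 + w2*s2 - y)^2).

    Solves the 2x2 normal equations directly; raises ValueError when the
    two score columns are linearly dependent.
    """
    a11 = sum(s1 * s1 for s1, _ in score_pairs)
    a12 = sum(s1 * s2 for s1, s2 in score_pairs)
    a22 = sum(s2 * s2 for _, s2 in score_pairs)
    b1 = sum(s1 * y for (s1, _), y in zip(score_pairs, targets))
    b2 = sum(s2 * y for (_, s2), y in zip(score_pairs, targets))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("score columns are linearly dependent")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def fuse(weights, s_eeg, s_blink):
    """Apply fitted weights to one pair of classifier scores."""
    return weights[0] * s_eeg + weights[1] * s_blink


# Hypothetical (EEG score, blink score) pairs and target fused scores.
pairs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [0.7, 0.3, 1.0]
weights = fit_fusion_weights(pairs, targets)
print(weights)
```

In practice the targets would be ground-truth labels (for example 1 for the genuine user and 0 for an impostor), and the fused score would be thresholded to accept or reject a claim.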