Catalogue Search | MBRL
Explore the vast range of titles available.
1,036 result(s) for "multi-task network"
Equivalent processing of facial expression and identity by macaque visual system and task-optimized neural network
2023
• Build a large macaque monkey face dataset containing 16,604 real monkey face images.
• Develop a DNN to simultaneously classify monkey facial expression and identity.
• Perform monkey fMRI experiment to estimate neural responses to monkey face stimuli.
• Compare neural responses between DNN layers and monkey face-selective ROIs.
• Found representational correspondence between DNN and monkey face processing system.
Both the primate visual system and artificial deep neural network (DNN) models show an extraordinary ability to simultaneously classify facial expression and identity. However, the neural computations underlying the two systems are unclear. Here, we developed a multi-task DNN model that optimally classified both monkey facial expressions and identities. By comparing the fMRI neural representations of the macaque visual cortex with the best-performing DNN model, we found that both systems: (1) share initial stages for processing low-level face features which segregate into separate branches at later stages for processing facial expression and identity respectively, and (2) gain more specificity for the processing of either facial expression or identity as one progresses along each branch towards higher stages. Correspondence analysis between the DNN and monkey visual areas revealed that the amygdala and anterior fundus face patch (AF) matched well with later layers of the DNN's facial expression branch, while the anterior medial face patch (AM) matched well with later layers of the DNN's facial identity branch. Our results highlight the anatomical and functional similarities between macaque visual system and DNN model, suggesting a common mechanism between the two systems.
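For readers skimming these results, the shared-trunk-then-branch layout described in this abstract is a common multi-task pattern: early layers are shared, later layers split into task-specific branches. The PyTorch sketch below only illustrates that general idea under assumed layer sizes and class counts; it is not the authors' published model.

```python
# Hedged sketch of a shared trunk splitting into expression and identity branches.
# Layer sizes, class counts, and the backbone choice are illustrative assumptions.
import torch
import torch.nn as nn

class TwoBranchFaceNet(nn.Module):
    def __init__(self, num_expressions=5, num_identities=100):
        super().__init__()
        # Shared early stages: generic low-level face features.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Later stages segregate into task-specific branches.
        def branch(num_classes):
            return nn.Sequential(
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, num_classes),
            )
        self.expression_branch = branch(num_expressions)
        self.identity_branch = branch(num_identities)

    def forward(self, x):
        shared = self.trunk(x)
        return self.expression_branch(shared), self.identity_branch(shared)

model = TwoBranchFaceNet()
expr_logits, id_logits = model(torch.randn(2, 3, 112, 112))
print(expr_logits.shape, id_logits.shape)  # torch.Size([2, 5]) torch.Size([2, 100])
```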
Journal Article
Multi-Task Environmental Perception Methods for Autonomous Driving
2024
In autonomous driving, environmental perception technology often encounters challenges such as false positives, missed detections, and low accuracy, particularly in detecting small objects and complex scenarios. Existing algorithms frequently suffer from issues like feature redundancy, insufficient contextual interaction, and inadequate information fusion, making it difficult to perform multi-task detection and segmentation efficiently. To address these challenges, this paper proposes an end-to-end multi-task environmental perception model named YOLO-Mg, designed to simultaneously perform traffic object detection, lane line detection, and drivable area segmentation. First, a multi-stage gated aggregation network (MogaNet) is employed during the feature extraction process to enhance contextual interaction by improving diversity in the channel dimension, thereby compensating for the limitations of feed-forward neural networks in contextual understanding. Second, to further improve the model’s accuracy in detecting objects of various scales, a restructured weighted bidirectional feature pyramid network (BiFPN) is introduced, optimizing cross-level information fusion and enabling the model to handle object detection at different scales more accurately. Finally, the model is equipped with one detection head and two segmentation heads to achieve efficient multi-task environmental perception, ensuring the simultaneous execution of multiple tasks. The experimental results on the BDD100K dataset demonstrate that the model achieves a mean average precision (mAP50) of 81.4% in object detection, an Intersection over Union (IoU) of 28.9% in lane detection, and a mean Intersection over Union (mIoU) of 92.6% in drivable area segmentation. The tests conducted in real-world scenarios show that the model performs effectively, significantly enhancing environmental perception in autonomous driving and laying a solid foundation for safer and more reliable autonomous driving systems.
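The one-detection-head / two-segmentation-head layout described above can be sketched generically as a shared backbone feeding three task heads. The snippet below is a hedged toy example with assumed channel sizes; it does not reproduce MogaNet, the BiFPN neck, or YOLO-Mg's actual heads.

```python
# Illustrative sketch only: shared backbone, one detection head, two segmentation heads.
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    def __init__(self, num_det_outputs=85, num_lane_classes=2, num_area_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.detect_head = nn.Conv2d(64, num_det_outputs, 1)    # per-cell box/class predictions
        self.lane_head = nn.Conv2d(64, num_lane_classes, 1)     # lane-line segmentation logits
        self.area_head = nn.Conv2d(64, num_area_classes, 1)     # drivable-area segmentation logits

    def forward(self, x):
        feats = self.backbone(x)
        return {
            "detection": self.detect_head(feats),
            "lane": self.lane_head(feats),
            "drivable": self.area_head(feats),
        }

model = MultiTaskPerception()
out = model(torch.randn(1, 3, 256, 256))
print({k: v.shape for k, v in out.items()})
```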
Journal Article
HMT-Net: A Multi-Task Learning Based Framework for Enhanced Convolutional Code Recognition
by Zhang, Lingbo; Zhang, Yijia; Xu, Lu
in Accuracy, channel coding identification, Classification
2026
Due to the critical role of channel coding, convolutional code recognition has attracted growing interest, particularly in non-cooperative communication scenarios such as spectrum surveillance. Deep learning-based approaches have emerged as promising techniques, offering improved classification performance. However, most existing works focus on single-parameter recognition and ignore the inherent correlations between code parameters. To address this, we propose a novel framework named Hybrid Multi-Task Network (HMT-Net), which adopts multi-task learning to simultaneously identify both the code rate and constraint length of convolutional codes. HMT-Net combines dilated convolutions with attention mechanisms and integrates a Transformer backbone to extract robust multi-scale sequence features. It also leverages a Channel-Wise Transformer to capture both local and global information efficiently. Meanwhile, we enhance the dataset by incorporating a comprehensive sequence dataset and further improve the recognition performance by extracting the statistical features of the sequences. Experimental results demonstrate that HMT-Net outperforms single-task models by an average of 2.89% in recognition accuracy. Furthermore, HMT-Net achieves even more remarkable gains of 4.57% in code rate recognition and 4.31% in constraint length recognition compared to other notable multi-task frameworks such as MAR-Net. These findings underscore the potential of HMT-Net as a robust solution for intelligent signal analysis, offering significant practical value for efficient spectrum management in next-generation communication systems.
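The core multi-task idea here, one encoder with two classification heads trained under a summed loss for code rate and constraint length, can be sketched as follows. The plain 1-D CNN encoder, class counts, and equal loss weighting are illustrative assumptions, not HMT-Net's dilated-attention/Transformer design.

```python
# Hedged sketch: joint code-rate / constraint-length classification over a shared encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadCodeClassifier(nn.Module):
    def __init__(self, num_rates=4, num_constraint_lengths=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, 7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.rate_head = nn.Linear(64, num_rates)
        self.length_head = nn.Linear(64, num_constraint_lengths)

    def forward(self, bits):                 # bits: (batch, 1, sequence_length)
        z = self.encoder(bits)
        return self.rate_head(z), self.length_head(z)

model = DualHeadCodeClassifier()
bits = torch.randint(0, 2, (8, 1, 512)).float()
rate_logits, len_logits = model(bits)
# Multi-task objective: sum of the two cross-entropy losses.
rate_target = torch.randint(0, 4, (8,))
len_target = torch.randint(0, 6, (8,))
loss = F.cross_entropy(rate_logits, rate_target) + F.cross_entropy(len_logits, len_target)
loss.backward()
```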
Journal Article
Multitask semantic change detection guided by spatiotemporal semantic interaction
2025
Semantic Change Detection (SCD) aims to accurately identify the change areas and their categories in dual-time images, which is more complex and challenging than traditional binary change detection tasks. Accurately capturing the change information of land cover types is crucial for remote sensing image analysis and subsequent decision-making applications. However, existing SCD methods often neglect the spatial details and temporal dependencies of dual-time images, leading to problems such as change category imbalance and limited detection accuracy, especially in capturing small target changes. To address this issue, this study proposes a network that guides multitask semantic change detection through spatiotemporal semantic interaction (STGNet). STGNet enhances the ability to capture spatial details by introducing a Detail-Aware Path (DAP) and designs a Bidirectional Guidance Module for Spatial Detail and Semantic Information for adaptive feature selection, improving feature extraction capabilities in complex scenes. Furthermore, to resolve the inconsistency between semantic information and change areas, this paper designs a Cross-Temporal Refinement Interaction Module (CTIM), which enables cross-time scale feature fusion and interaction, constraining the consistency of detection results and improving the recognition accuracy of unchanged areas. To further enhance detection performance, a dynamic depthwise separable convolution is designed in the CTIM module, which can adaptively adjust convolution kernels to more precisely capture change features in different regions of the image. Experimental results on three SCD datasets show that the proposed method outperforms other existing methods in various evaluation metrics. In particular, on the Landsat-SCD dataset, the F1 score (F1_scd) reaches 91.64%, and the separation Kappa coefficient improves by 17.68%. These experimental results fully demonstrate the significant advantages of STGNet in improving semantic change detection accuracy, robustness, and generalization capability.
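As context for the abstract above, semantic change detection is typically built on a weight-shared encoder applied to both dates, a change mask predicted from the feature difference, and per-date semantic maps. The sketch below shows only that generic baseline with assumed sizes; STGNet's DAP, CTIM, and dynamic depthwise convolutions are not reproduced.

```python
# Hedged sketch of a generic Siamese semantic-change-detection baseline.
import torch
import torch.nn as nn

class SiameseSCD(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.change_head = nn.Conv2d(64, 1, 1)               # binary change logits
        self.semantic_head = nn.Conv2d(64, num_classes, 1)   # land-cover logits per date

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)
        change = self.change_head(torch.abs(f1 - f2))
        return change, self.semantic_head(f1), self.semantic_head(f2)

model = SiameseSCD()
change, sem1, sem2 = model(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
print(change.shape, sem1.shape, sem2.shape)
```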
Journal Article
YOLOMH: you only look once for multi-task driving perception with high efficiency
by Bowen, Sun; Jianxi, Miao; Fang, Liu
in Accuracy, Communications Engineering, Computer Science
2024
Aiming at the requirements of high accuracy, lightweight design and real-time performance for a panoptic driving perception system, this paper proposes an efficient multi-task network (YOLOMH). The network uses a shared encoder and three independent decoding heads to simultaneously complete the three major panoptic driving perception tasks of traffic object detection, road drivable area segmentation and road lane segmentation. These gains stem from the innovative design of the YOLOMH network structure: first, we design an appropriate information input structure based on the different information requirements of the different tasks; second, we propose a Hybrid Deep Atrous Spatial Pyramid Pooling module to efficiently perform the feature fusion work of the neck network; and finally, effective approaches such as an anchor-free detection head and Depthwise Separable Convolution are introduced into the network, making it more efficient while remaining lightweight. Experimental results show that our model achieves competitive results in both accuracy and speed on the challenging BDD100K dataset, especially in terms of inference speed: the model's inference speed on an NVIDIA TESLA V100 reaches 107 Frames Per Second (FPS), far exceeding the 49 FPS of the YOLOP network under the same experimental settings. This well meets the requirements of autonomous vehicles for high system accuracy and low latency.
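One of the efficiency tools cited above, the depthwise separable convolution, factors a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. A minimal sketch with arbitrary channel sizes:

```python
# Generic depthwise separable convolution block; channel sizes are example values.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(64, 128)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```

For the 64-to-128-channel, 3x3 case above, a standard convolution needs 64 x 128 x 9 = 73,728 weights, while the separable version needs roughly 64 x 9 + 64 x 128 = 8,768, which is where the lightweight gain comes from.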
Journal Article
SEG-ESRGAN: A Multi-Task Network for Super-Resolution and Semantic Segmentation of Remote Sensing Images
2022
The production of highly accurate land cover maps is one of the primary challenges in remote sensing, which depends on the spatial resolution of the input images. Sometimes, high-resolution imagery is not available or is too expensive to cover large areas or to perform multitemporal analysis. In this context, we propose a multi-task network to take advantage of the freely available Sentinel-2 imagery to produce a super-resolution image, with a scaling factor of 5, and the corresponding high-resolution land cover map. Our proposal, named SEG-ESRGAN, consists of two branches: the super-resolution branch, that produces Sentinel-2 multispectral images at 2 m resolution, and an encoder–decoder architecture for the semantic segmentation branch, that generates the enhanced land cover map. From the super-resolution branch, several skip connections are retrieved and concatenated with features from the different stages of the encoder part of the segmentation branch, promoting the flow of meaningful information to boost the accuracy in the segmentation task. Our model is trained with a multi-loss approach using a novel dataset to train and test the super-resolution stage, which is developed from Sentinel-2 and WorldView-2 image pairs. In addition, we generated a dataset with ground-truth labels for the segmentation task. To assess the super-resolution improvement, the PSNR, SSIM, ERGAS, and SAM metrics were considered, while to measure the classification performance, we used the IoU, confusion matrix and the F1-score. Experimental results demonstrate that the SEG-ESRGAN model outperforms different full segmentation and dual network models (U-Net, DeepLabV3+, HRNet and Dual_DeepLab), allowing the generation of high-resolution land cover maps in challenging scenarios using Sentinel-2 10 m bands.
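The "multi-loss approach" mentioned above amounts to optimizing one weighted objective that covers both branches. The sketch below shows that idea with an L1 reconstruction term and a cross-entropy segmentation term; the weights and the omission of adversarial and perceptual terms are simplifying assumptions, not SEG-ESRGAN's exact recipe.

```python
# Hedged sketch of a joint super-resolution + segmentation objective.
import torch
import torch.nn.functional as F

def multi_task_loss(sr_pred, hr_target, seg_logits, seg_labels, w_sr=1.0, w_seg=1.0):
    sr_loss = F.l1_loss(sr_pred, hr_target)              # pixel reconstruction term
    seg_loss = F.cross_entropy(seg_logits, seg_labels)   # land-cover classification term
    return w_sr * sr_loss + w_seg * seg_loss

# Dummy tensors standing in for branch outputs and ground truth.
sr_pred   = torch.randn(2, 4, 100, 100, requires_grad=True)   # super-resolved bands
hr_target = torch.randn(2, 4, 100, 100)
seg_logits = torch.randn(2, 8, 100, 100, requires_grad=True)  # 8 example land-cover classes
seg_labels = torch.randint(0, 8, (2, 100, 100))
multi_task_loss(sr_pred, hr_target, seg_logits, seg_labels).backward()
```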
Journal Article
CGMNet: Semantic Change Detection via a Change-Aware Guided Multi-Task Network
2024
Change detection (CD) is the main task in the remote sensing field. Binary change detection (BCD), which only focuses on the region of change, cannot meet current needs. Semantic change detection (SCD) is pivotal for identifying regions of change in sequential remote sensing imagery, focusing on discerning “from-to” transitions in land cover. The emphasis on features within these regions of change is critical for SCD efficacy. Traditional methodologies, however, often overlook this aspect. In order to address this gap, we introduce a change-aware guided multi-task network (CGMNet). This innovative network integrates a change-aware mask branch, leveraging prior knowledge of regions of change to enhance land cover classification in dual temporal remote sensing images. This strategic focus allows for the more accurate identification of altered regions. Furthermore, to navigate the complexities of remote sensing environments, we develop a global and local attention mechanism (GLAM). This mechanism adeptly captures both overarching and fine-grained spatial details, facilitating more nuanced analysis. Our rigorous testing on two public datasets using state-of-the-art methods yielded impressive results. CGMNet achieved Overall Score metrics of 58.77% on the Landsat-SCD dataset and 37.06% on the SECOND dataset. These outcomes not only demonstrate the exceptional performance of the method but also signify its superiority over other comparative algorithms.
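The change-aware guidance described above can be read as a gating operation: a predicted change mask re-weights the classification features so changed regions receive more emphasis. The following is a generic, hedged formulation of that idea, not CGMNet's specific branch design.

```python
# Generic change-mask gating of classification features; tensor sizes are arbitrary.
import torch

def change_guided_features(features, change_logits):
    # features:      (batch, channels, H, W) classification features
    # change_logits: (batch, 1, H, W) logits from a change-mask branch
    gate = torch.sigmoid(change_logits)      # soft change probability in [0, 1]
    return features * (1.0 + gate)           # boost features inside likely-changed regions

feats = torch.randn(2, 64, 64, 64)
mask_logits = torch.randn(2, 1, 64, 64)
print(change_guided_features(feats, mask_logits).shape)  # torch.Size([2, 64, 64, 64])
```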
Journal Article
An Efficient End-to-End Multitask Network Architecture for Defect Inspection
2022
Recently, computer vision-based methods have been successfully applied in many industrial fields. Nevertheless, automated detection of steel surface defects remains a challenge due to the complexity of surface defects. To solve this problem, many models have been proposed, but these models are not good enough to detect all defects. After analyzing the previous research, we believe that a single-task network cannot fully meet actual detection needs owing to its own characteristics. To address this problem, an end-to-end multi-task network has been proposed. It consists of one encoder and two decoders. The encoder is used for feature extraction, and the two decoders are used for object detection and semantic segmentation, respectively. To deal with the challenge of changing defect scales, we propose the Depthwise Separable Atrous Spatial Pyramid Pooling module, which obtains dense multi-scale features at a very low computational cost. After that, Residually Connected Depthwise Separable Atrous Convolutional Blocks are used to extract spatial information at low computational cost for better segmentation prediction. Furthermore, we investigate the impact of training strategies on network performance: performance can be optimized by training the segmentation task first and using the deep supervision training method. In this way, the advantages of object detection and semantic segmentation are effectively combined. Our model achieves an mIoU of 79.37% and an mAP@0.5 of 78.38% on the NEU dataset. Comparative experiments demonstrate that this method has clear advantages over other models. Meanwhile, the detection speed reaches 85.6 FPS on a single GPU, which is acceptable for practical detection.
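A Depthwise Separable Atrous Spatial Pyramid Pooling block of the kind described above runs parallel dilated depthwise convolutions at several rates, each followed by a pointwise convolution, and concatenates the multi-scale outputs. The dilation rates and channel widths below are assumptions rather than the paper's configuration.

```python
# Hedged sketch of a depthwise-separable ASPP block with assumed rates and widths.
import torch
import torch.nn as nn

class DSASPP(nn.Module):
    def __init__(self, channels=64, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.ReLU(),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

block = DSASPP()
print(block(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```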
Journal Article
Human mobility prediction with causal and spatial-constrained multi-task network
by Jin, Yaohui; Xu, Shengyuan; Xu, Yanyan
in Causality, Complexity, Computer Appl. in Social and Behavioral Sciences
2024
Modeling human mobility helps to understand how people access resources and physically contact each other in cities, and thus contributes to various applications such as urban planning, epidemic control, and location-based advertisement. Next location prediction is one decisive task in individual human mobility modeling and is usually viewed as sequence modeling, solved with Markov or RNN-based methods. However, existing models pay little attention to the logic of individual travel decisions and the reproducibility of the collective behavior of the population. To this end, we propose a Causal and Spatial-constrained Long and Short-term Learner (CSLSL) for next location prediction. CSLSL utilizes a causal structure based on multi-task learning to explicitly model the “when → what → where”, a.k.a. “time → activity → location”, decision logic. We next propose a spatial-constrained loss function as an auxiliary task, to ensure the consistency between the predicted and actual spatial distribution of travelers’ destinations. Moreover, CSLSL adopts modules named Long and Short-term Capturer (LSC) to learn the transition regularities across different time spans. Extensive experiments on three real-world datasets show promising performance improvements of CSLSL over baselines and confirm the effectiveness of introducing the causality and consistency constraints. The implementation is available at https://github.com/urbanmobility/CSLSL.
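The spatial-constrained auxiliary task described above can be approximated, in spirit, by adding a coordinate-distance term to the categorical next-location loss. The weighting and exact distance term below are assumptions; see the linked repository for the authors' actual formulation.

```python
# Hedged sketch: categorical next-location loss plus a spatial-distance auxiliary term.
import torch
import torch.nn.functional as F

def next_location_loss(location_logits, location_target, pred_coords, true_coords, w_spatial=0.5):
    # location_logits: (batch, num_locations)   categorical prediction over candidate places
    # pred_coords / true_coords: (batch, 2)     predicted vs. actual (x, y) destination
    cls_loss = F.cross_entropy(location_logits, location_target)
    spatial_loss = torch.linalg.norm(pred_coords - true_coords, dim=1).mean()
    return cls_loss + w_spatial * spatial_loss

logits = torch.randn(16, 500, requires_grad=True)
target = torch.randint(0, 500, (16,))
pred_xy = torch.randn(16, 2, requires_grad=True)
true_xy = torch.randn(16, 2)
next_location_loss(logits, target, pred_xy, true_xy).backward()
```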
Journal Article
Robust 6-DoF Pose Estimation under Hybrid Constraints
2022
To address the insufficient accuracy and stability of two-stage, heatmap-based algorithms on occluded object pose estimation, a new robust 6-DoF pose estimation algorithm under hybrid constraints is proposed in this paper. First, a new loss function suited to heatmap regression is formulated to improve the quality of the predicted heatmaps and increase keypoint accuracy in complex scenes. Second, the heatmap regression network is expanded and a translation regression branch is added to further constrain the pose. Finally, a robust pose optimization module is used to fuse the heatmap and translation estimates and improve the pose estimation accuracy. The proposed algorithm achieves ADD(-S) accuracy rates of 93.5% and 46.2% on the LINEMOD and Occlusion LINEMOD datasets, respectively, outperforming other state-of-the-art algorithms. Compared with conventional two-stage heatmap-based pose estimation algorithms, the mean estimation error is greatly reduced and the stability of pose estimation is improved. The proposed algorithm runs at up to 22 FPS, making it both performant and efficient.
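For context on the heatmap stage discussed above, keypoint coordinates are commonly read out of predicted heatmaps with an argmax or a differentiable soft-argmax. The utility below sketches the soft-argmax variant; it is a generic helper, not the paper's fusion module.

```python
# Generic soft-argmax readout of 2-D keypoint coordinates from heatmaps.
import torch

def soft_argmax_2d(heatmaps):
    # heatmaps: (batch, num_keypoints, H, W) -> (batch, num_keypoints, 2) as (x, y) pixel coords
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.reshape(b, k, -1), dim=-1).reshape(b, k, h, w)
    ys = torch.arange(h, dtype=probs.dtype).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=probs.dtype).view(1, 1, 1, w)
    y = (probs * ys).sum(dim=(-2, -1))
    x = (probs * xs).sum(dim=(-2, -1))
    return torch.stack([x, y], dim=-1)

coords = soft_argmax_2d(torch.randn(1, 8, 64, 64))
print(coords.shape)  # torch.Size([1, 8, 2])
```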
Journal Article