30 result(s) for "Feature Disentanglement"
DeCon-Net: decoupled hierarchical contrast for soccer object detection
Soccer video analysis has significant application value in sports broadcasting, tactical research, and athlete training, with accurate object detection serving as the key foundation for automated analysis. Soccer object detection methods typically improve performance through enhanced feature representation and optimized network architectures, but they assume that models can automatically learn discriminative features of targets. Through experiments, we reveal the “feature collapse” phenomenon in soccer detection, where features of players from the same team are excessively clustered in high-dimensional space, and soccer ball features degenerate to near background noise. Furthermore, existing methods lack progressive feature evolution mechanisms, resulting in insufficient discriminative capability when handling dense scenes. To address these issues, we propose DeCon-Net, which contains a Decoupled Feature Learning Module (DFLM) and a Hierarchical Contrastive Constraint Module (HCCM). Specifically, DFLM designs dual-stream encoders to extract appearance features and identity features separately, forcing the identity stream to learn truly discriminative representations through mutual exclusivity constraints. HCCM adopts dynamic threshold contrastive learning, adaptively adjusting learning intensity based on feature distances between sample pairs, achieving progressive optimization from coarse to fine granularity. Experimental results demonstrate that DeCon-Net achieves significant performance improvements on the SportsMOT and SoccerNet-Tracking datasets, particularly showing substantial gains in ball detection.
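The two constraints named in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the squared-cosine form of the mutual-exclusivity penalty and the exponential margin schedule are assumptions.

```python
import numpy as np

def orthogonality_loss(appearance, identity):
    """Mutual-exclusivity constraint: penalize correlation between the
    appearance stream and the identity stream via squared cosine similarity.
    Both inputs are (batch, dim) feature matrices."""
    a = appearance / np.linalg.norm(appearance, axis=1, keepdims=True)
    b = identity / np.linalg.norm(identity, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1) ** 2))

def dynamic_threshold_contrastive(f1, f2, same_identity, base_margin=1.0):
    """Contrastive loss whose margin adapts to the current pair distance,
    so hard (close) negative pairs receive a stronger push while distant
    negatives are left alone."""
    d = np.linalg.norm(f1 - f2, axis=1)
    margin = base_margin * (1.0 + np.exp(-d))  # larger margin for close pairs
    pos = same_identity * d ** 2
    neg = (1.0 - same_identity) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))
```

A close negative pair falls inside the adaptive margin and incurs a penalty, while a distant negative pair contributes nothing, which is one way to read "adaptively adjusting learning intensity based on feature distances".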
Towards prognostic generalization: a domain conditional invariance and specificity disentanglement network for remaining useful life prediction
Remaining useful life (RUL) prediction is an essential task in ensuring reliability in intelligent manufacturing. Recent advances in deep learning-based data-driven methods have shown promising results. However, one significant challenge is that distribution shift across machine individuals often results in a performance decline. Domain adaptation approaches appear effective in tackling this issue, but they often require sufficient unlabeled target data, which is rarely available in practical prognostic scenarios. In this paper, we discuss the significance of prognostic generalization for RUL prediction and propose a domain generalization-based scheme. A domain conditional invariance and specificity disentanglement network (DCISD) is proposed to learn domain conditional-invariant and domain-specific information simultaneously in a unified network. Domain conditional-invariant features are extracted through conditional domain adversarial learning, with samples conditioned by multiple RUL fuzzy sets. Domain-specific features correlated with individual degradation patterns are disentangled to promote sufficiency of degradation information. Moreover, a degradation dynamics-based augmentation method is proposed to mitigate domain imbalance following the degradation dynamics in the latent space. Two bearing run-to-failure datasets are used to evaluate the proposed method. Comparative and ablation studies validate the method's effectiveness and superiority.
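The abstract conditions samples on "multiple RUL fuzzy sets" but does not specify the membership functions; a common choice, sketched below as an assumption, is triangular memberships over a normalized RUL axis, which then weight the conditional adversarial alignment.

```python
import numpy as np

def rul_fuzzy_memberships(rul, centers, width):
    """Triangular membership of a normalized RUL value (0 = failed,
    1 = healthy) in each fuzzy set, renormalized to sum to one.
    `centers` are the fuzzy-set centers, `width` the triangle half-base;
    both are illustrative hyperparameters, not the paper's."""
    m = np.maximum(1.0 - np.abs(rul - centers) / width, 0.0)
    return m / m.sum()
```

A sample halfway through its life belongs fully to the middle fuzzy set, while a sample between centers is shared between two sets, giving the adversarial learner a soft condition instead of a hard RUL bin.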
Learning Hierarchically Consistent Disentanglement with Multi-Channel Augmentation for Public Security-Oriented Sketch Person Re-Identification
Sketch re-identification (Re-ID) aims to retrieve pedestrian photographs in the gallery dataset by a query sketch image drawn by professionals, which is crucial for criminal investigations and missing person searches in the field of public security. The main challenge of this task lies in bridging the significant modality gap between sketches and photos while extracting discriminative modality-invariant features. However, information asymmetry between sketches and RGB photographs, particularly the differences in color information, severely interferes with cross-modal matching processes. To address this challenge, we propose a novel network architecture that integrates multi-channel augmentation with hierarchically consistent disentanglement learning. Specifically, a multi-channel augmentation module is developed to mitigate the interference of color bias in cross-modal matching. Furthermore, a modality-disentangled prototype (MDP) module is introduced to decompose pedestrian representations at the feature level into modality-invariant structural prototypes and modality-specific appearance prototypes. Additionally, a cross-layer decoupling consistency constraint is designed to ensure the semantic coherence of disentangled prototypes across different network layers and to improve the stability of the whole decoupling process. Extensive experimental results on two public datasets demonstrate the superiority of our proposed approach over state-of-the-art methods.
Unbalanced power anomaly detection model based on improved transformer and countermeasure encoder
Current intelligent grid anomaly detection faces challenges such as low minority-class recognition due to imbalanced data, high computational complexity in long-sequence processing, and model bias from scarce anomaly samples. To address these, we propose a hybrid architecture combining an enhanced Transformer with an Adversarial Autoencoder (AAE). We introduce a Locality-Sensitive Hashing (LSH) attention mechanism using Focal Loss with Temperature (FLT) to cluster similar features. A dynamic weighting module, implemented via a Spatial-Temporal Feature Disentanglement Network (STFDN), adaptively adjusts gradients by category. Our approach reduces memory usage for node sequences from 18.7GB to 8.9GB (52.4% less) via Spectral Normalization. Under Wasserstein distance constraints, the model achieves an FID score of 28.4, a 10.4% improvement. An innovative dynamic temperature scaling strategy elevates the AUPRC to 0.837 on the SGSC dataset. Tests on the UK-DALE dataset show an F1-score of 89.3% with 183ms inference latency, meeting edge deployment requirements. This research offers a promising new generation of automated detection tools for grid operation and maintenance.
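The "Focal Loss with Temperature (FLT)" named in this abstract can be sketched as ordinary focal loss applied to temperature-scaled logits. This is an assumed formulation for illustration only; the hyperparameter values are not the paper's.

```python
import numpy as np

def focal_loss_with_temperature(logits, labels, gamma=2.0, temperature=0.5):
    """Temperature sharpens the softmax, and the (1 - p_t)^gamma factor
    down-weights easy majority-class samples so scarce anomalies dominate
    the gradient. `logits` is (batch, classes); returns the mean loss."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pt = p[np.arange(len(labels)), labels]        # probability of true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + 1e-12)))
```

Setting `gamma=0` recovers plain cross-entropy; increasing it shrinks the contribution of confidently classified (majority-class) samples, which is how the loss counteracts class imbalance.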
A machine learning study highlighting the challenges of fidgety movement recognition using vision and inertial sensors
Past medical research has shown that infantile movement and early neurological development are closely linked. Fidgety Movements, reflex-like movements occurring in healthy infants under 20 weeks of age, have proven especially important, as past studies have highlighted that their absence is strongly correlated with the future development of neurological disorders such as Cerebral Palsy. To enable timely intervention, the General Movement Assessment was proposed as a medical screening procedure carried out by clinical personnel specifically trained to recognize Fidgety Movements. Because of its high cost in time and resources, several initiatives to automate the General Movement Assessment using machine learning techniques have been proposed in the literature; however, none has managed to emerge as state-of-the-art so far. To investigate this problem, we conducted a study using deep learning approaches to learn disentangled feature representations for the recognition of Fidgety Movements using RGB-D video and Inertial Measurement Unit data acquired from 95 infants (average age: weeks). Our results show that while it is possible to learn features that characterize movement independently of subject information, obtaining feature representations that consistently generalize to subjects unseen during training remains challenging. More specifically, we observe that both the vision- and sensor-based modalities have specific challenges to be addressed for the recognition of Fidgety Movements. We discuss them and provide recommendations to help researchers interested in investigating this problem in the future.
Contrastive Feature Disentanglement via Physical Priors for Underwater Image Enhancement
Underwater image enhancement (UIE) serves as a fundamental preprocessing step in ocean remote sensing applications, encompassing marine life detection, archaeological surveying, and subsea resource exploration. However, UIE encounters substantial technical challenges due to the intricate physics of underwater light propagation and the inherent homogeneity of aquatic environments. Images captured underwater are significantly degraded through wavelength-dependent absorption and scattering processes, resulting in color distortion, contrast degradation, and illumination irregularities. To address these challenges, we propose a contrastive feature disentanglement network (CFD-Net) that systematically addresses underwater image degradation. Our framework employs a multi-stream decomposition architecture with three specialized decoders to disentangle the latent feature space into components associated with degradation and those representing high-quality features. We incorporate hierarchical contrastive learning mechanisms to establish clear relationships between standard and degraded feature spaces, emphasizing intra-layer similarity and inter-layer exclusivity. Through the synergistic utilization of internal feature consistency and cross-component distinctiveness, our framework achieves robust feature extraction without explicit supervision. Compared to existing methods, our approach achieves a 12% higher UIQM score on the EUVP dataset and outperforms other state-of-the-art techniques on various evaluation metrics such as UCIQE, MUSIQ, and NIQE, both quantitatively and qualitatively.
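The "intra-layer similarity and inter-layer exclusivity" objective can be illustrated with a single-negative InfoNCE-style term per layer: pull two high-quality features together, push the degradation feature away. This is a minimal sketch under that reading, not CFD-Net's actual loss.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def layer_contrastive_loss(clean_a, clean_b, degraded):
    """Per layer: treat a pair of clean features as the positive
    (intra-layer similarity) and the degradation feature as the negative
    (inter-layer exclusivity), in an InfoNCE form with one negative."""
    pos = np.exp(cosine(clean_a, clean_b))
    neg = np.exp(cosine(clean_a, degraded))
    return float(-np.log(pos / (pos + neg)))
```

Summing this term over decoder layers gives a hierarchical version of the constraint: each layer's clean component is kept coherent internally and distinct from its degradation component.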
Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition
Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations—highlighting subtle facial muscle activations—with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)3 and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization.
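Among the losses in the multi-task objective, the cycle constraint is the least standard; the sketch below illustrates it with a toy encoder/decoder pair (concatenation and splitting, an assumption made purely so the example runs): swap motions between two subjects, re-encode, swap back, and require the originals to be recovered.

```python
import numpy as np

def decode(identity, motion):
    """Toy decoder: concatenate the two factors into one 'face' vector."""
    return np.concatenate([identity, motion])

def encode(x):
    """Toy encoder matching the decoder: split the vector into halves."""
    h = len(x) // 2
    return x[:h], x[h:]

def cycle_loss(id_a, mot_a, id_b, mot_b):
    """Cycle constraint: decode subject A with B's motion (and vice versa),
    re-encode the swapped outputs, swap the motions back, and penalize any
    difference from the original reconstructions."""
    swapped_a = decode(id_a, mot_b)
    swapped_b = decode(id_b, mot_a)
    ia, mb = encode(swapped_a)
    ib, ma = encode(swapped_b)
    back_a = decode(ia, ma)                     # motion swapped back
    back_b = decode(ib, mb)
    ref_a = decode(id_a, mot_a)
    ref_b = decode(id_b, mot_b)
    return float(np.sum((back_a - ref_a) ** 2) + np.sum((back_b - ref_b) ** 2))
```

With a real learned encoder and decoder this loss is only approximately zero, and minimizing it is what forces the identity and motion factors to be genuinely separable.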
Arbitrary Font Generation by Encoder Learning of Disentangled Features
Making a new font requires graphical designs for all base characters, and this design process consumes a great deal of time and human resources. Especially for languages with a large number of consonant-vowel combinations, designing all such combinations independently is a heavy burden. Automatic font generation methods have been proposed to reduce this labor-intensive design problem. Most are GAN-based approaches, and they are limited to generating the fonts they were trained on. Some previous methods used two encoders, one for content and the other for style, but their disentanglement of content and style is not sufficiently effective for generating arbitrary fonts. Arbitrary font generation is a challenging task because learning text content and font design separately from given font images is very difficult, since each font image carries both text content and font style. In this paper, we propose a new automatic font generation method to solve this disentanglement problem. First, we use two stacked inputs: images with the same text but different font styles as the content input, and images with the same font style but different texts as the style input. Second, we propose new consistency losses that force any combination of encoded features of the stacked inputs to have the same values. In our experiments, we show that our method can extract consistent features of text content and font style by separating the content and style encoders, and that this works well for generating unseen font designs from a small number of human-designed reference font images. The font designs generated with our method showed better quality, both qualitatively and quantitatively, than those of previous methods for Korean, Chinese, and English characters, e.g., a 17.84 lower FID on unseen fonts.
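The consistency losses can be sketched as follows: since every image in the content stack shares the same text, their encoded content features should coincide, and likewise for the style stack. Penalizing each stack's variance around its mean is one simple realization (an assumption; the paper may use a different form).

```python
import numpy as np

def stack_consistency_loss(content_feats, style_feats):
    """Force encoded features within each stacked input to agree.
    `content_feats`: (k, d) encodings of k images with the same text;
    `style_feats`: (k, d) encodings of k images with the same font style.
    Returns the summed per-stack variance around the stack mean."""
    def stack_var(f):
        return float(np.mean((f - f.mean(axis=0)) ** 2))
    return stack_var(content_feats) + stack_var(style_feats)
```

The loss is zero exactly when each stack's encodings are identical, i.e., when the content encoder ignores style and the style encoder ignores content.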
Feature Disentanglement Based on Dual-Mask-Guided Slot Attention for SAR ATR Across Backgrounds
Due to the limited number of SAR samples in the dataset, current networks for SAR automatic target recognition (SAR ATR) are prone to overfitting the environmental information, which diminishes their generalization ability under cross-background conditions. However, acquiring sufficient measured data to cover the entire environmental space remains a significant challenge. This paper proposes a novel feature disentanglement network, named FDSANet. The network is designed to decouple and distinguish the features of the target from the background before classification, thereby improving its adaptability to background changes. Specifically, the network consists of two sub-networks. The first is an autoencoder sub-network based on dual-mask-guided slot attention. This sub-network utilizes a target mask to guide the encoder to distinguish between target and background features. It then outputs these features as independent representations, achieving feature disentanglement. The second is a classification sub-network. It includes an encoder and a classifier, which work together to perform classification based on the extracted target features. This network enhances the causal relationship between the target and the classification result, while mitigating the background's interference with classification. Moreover, the network, trained under a fixed background, demonstrates strong adaptability when applied to a new background. Experiments conducted on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, as well as the OpenSARShip dataset, demonstrate the superior performance of FDSANet.
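The mask-guided separation of target from background can be illustrated with masked average pooling: the target mask and its complement each pool the feature map into an independent representation. This toy stand-in replaces the paper's slot-attention mechanism for the sake of a runnable example.

```python
import numpy as np

def mask_guided_split(feature_map, target_mask):
    """Split a (C, H, W) feature map into target and background vectors by
    masked average pooling with the target mask and its complement."""
    m = target_mask[None, ...].astype(float)                 # (1, H, W)
    tgt = (feature_map * m).sum(axis=(1, 2)) / (m.sum() + 1e-8)
    bkg = (feature_map * (1.0 - m)).sum(axis=(1, 2)) / ((1.0 - m).sum() + 1e-8)
    return tgt, bkg
```

Feeding only the target representation to the downstream classifier is what breaks the spurious dependence on background clutter that the abstract describes.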
LIMFA: label-irrelevant multi-domain feature alignment-based fake news detection for unseen domain
Fake news in social networks causes disastrous effects in the real world, yet effectively detecting newly emerged fake news remains difficult. This problem is particularly pronounced when the testing samples (target domain) are derived from different topics, events, platforms, or time periods than the training dataset (source domains). Though efforts have focused on learning domain-invariant features (DIF) across multiple source domains to transfer universal knowledge from the source to the target domain, they ignore the complexity that arises as the number of source domains increases, resulting in unreliable DIF. In this paper, we first point out two challenges faced when learning DIF for fake news detection: (1) high intra-domain correlations, since the similarity of news samples within the same domain but different categories can be higher than that of samples in different domains but the same category, and (2) complex inter-domain correlations, stemming from the fact that news samples in different domains are semantically related. To tackle these challenges, we propose two modules, center-aware feature alignment and likelihood gain-based feature disentanglement, to enhance alignment across multiple domains while keeping the two categories separated and disentangling domain-specific features in an adversarial supervision manner. Combining these modules, we construct a label-irrelevant multi-domain feature alignment (LIMFA) framework. Our experiments show that LIMFA can be deployed with various base models and outperforms state-of-the-art baselines in 4 cross-domain scenarios. Our source code will be made available upon acceptance of this manuscript.
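A center-aware alignment term can be sketched by pulling each per-domain class center toward the global center of that class, which aligns domains without mixing the real/fake categories. The exact form is an assumption for illustration; LIMFA's module may differ in detail.

```python
import numpy as np

def center_alignment_loss(feats, domains, labels):
    """Mean squared distance between each (domain, class) center and the
    global center of that class. `feats` is (n, d); `domains` and `labels`
    are integer arrays of length n."""
    total, n = 0.0, 0
    for c in np.unique(labels):
        global_center = feats[labels == c].mean(axis=0)
        for d in np.unique(domains):
            sel = (labels == c) & (domains == d)
            if sel.any():
                center = feats[sel].mean(axis=0)
                total += float(np.sum((center - global_center) ** 2))
                n += 1
    return total / n
```

Because centers are computed per class, the loss never pushes real and fake samples toward each other, addressing the intra-domain correlation problem the abstract raises.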