30 result(s) for "Feature Disentanglement"
DeCon-Net: decoupled hierarchical contrast for soccer object detection
Soccer video analysis has significant application value in sports broadcasting, tactical research, and athlete training, with accurate object detection serving as the key foundation for automated analysis. Soccer object detection methods typically improve performance through enhanced feature representation and optimized network architectures, but they assume that models can automatically learn discriminative features of targets. Through experiments, we reveal the “feature collapse” phenomenon in soccer detection, where features of players from the same team are excessively clustered in high-dimensional space, and soccer ball features degenerate to near background noise. Furthermore, existing methods lack progressive feature evolution mechanisms, resulting in insufficient discriminative capability when handling dense scenes. To address these issues, we propose DeCon-Net, which contains a Decoupled Feature Learning Module (DFLM) and a Hierarchical Contrastive Constraint Module (HCCM). Specifically, DFLM designs dual-stream encoders to extract appearance features and identity features separately, forcing the identity stream to learn truly discriminative representations through mutual exclusivity constraints. HCCM adopts dynamic threshold contrastive learning, adaptively adjusting learning intensity based on feature distances between sample pairs, achieving progressive optimization from coarse to fine granularity. Experimental results demonstrate that DeCon-Net achieves significant performance improvements on the SportsMOT and SoccerNet-Tracking datasets, particularly showing substantial gains in ball detection.
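The two constraints named in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the squared-cosine form of the mutual-exclusivity penalty and the exponential margin schedule are assumptions.

```python
import numpy as np

def orthogonality_loss(appearance, identity):
    """Mutual-exclusivity constraint: penalize correlation between the
    appearance stream and the identity stream via squared cosine similarity.
    Both inputs are (batch, dim) feature matrices."""
    a = appearance / np.linalg.norm(appearance, axis=1, keepdims=True)
    b = identity / np.linalg.norm(identity, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1) ** 2))

def dynamic_threshold_contrastive(f1, f2, same_identity, base_margin=1.0):
    """Contrastive loss whose margin adapts to the current pair distance,
    so hard (close) negative pairs receive a stronger push while distant
    negatives are left alone."""
    d = np.linalg.norm(f1 - f2, axis=1)
    margin = base_margin * (1.0 + np.exp(-d))  # larger margin for close pairs
    pos = same_identity * d ** 2
    neg = (1.0 - same_identity) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))
```

A close negative pair falls inside the adaptive margin and incurs a penalty, while a distant negative pair contributes nothing, which is one way to read "adaptively adjusting learning intensity based on feature distances".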
Towards prognostic generalization: a domain conditional invariance and specificity disentanglement network for remaining useful life prediction
Remaining useful life (RUL) prediction is an essential task in ensuring reliability in intelligent manufacturing. Recent advances in deep learning-based data-driven methods have shown promising results. However, one significant challenge is that distribution shift across machine individuals often results in a performance decline. Domain adaptation approaches appear effective in tackling this issue, but they often require sufficient unlabeled target data, which is rarely available in practical prognostic scenarios. In this paper, we discuss the significance of prognostic generalization for RUL prediction and propose a domain generalization-based scheme. A domain conditional invariance and specificity disentanglement network (DCISD) is proposed to learn domain conditional-invariant and domain-specific information simultaneously in a unified network. Domain conditional-invariant features are extracted through conditional domain adversarial learning, with samples conditioned by multiple RUL fuzzy sets. Domain-specific features correlated with individual degradation patterns are disentangled to promote sufficiency of degradation information. Moreover, a degradation dynamics-based augmentation method is proposed to mitigate domain imbalance following the degradation dynamics in the latent space. Two bearing run-to-failure datasets are used to evaluate the proposed method. Comparative and ablation studies validate the method's effectiveness and superiority.
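The abstract conditions samples on "multiple RUL fuzzy sets" but does not specify the membership functions; a common choice, sketched below as an assumption, is triangular memberships over a normalized RUL axis, which then weight the conditional adversarial alignment.

```python
import numpy as np

def rul_fuzzy_memberships(rul, centers, width):
    """Triangular membership of a normalized RUL value (0 = failed,
    1 = healthy) in each fuzzy set, renormalized to sum to one.
    `centers` are the fuzzy-set centers, `width` the triangle half-base;
    both are illustrative hyperparameters, not the paper's."""
    m = np.maximum(1.0 - np.abs(rul - centers) / width, 0.0)
    return m / m.sum()
```

A sample halfway through its life belongs fully to the middle fuzzy set, while a sample between centers is shared between two sets, giving the adversarial learner a soft condition instead of a hard RUL bin.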
Learning Hierarchically Consistent Disentanglement with Multi-Channel Augmentation for Public Security-Oriented Sketch Person Re-Identification
Sketch re-identification (Re-ID) aims to retrieve pedestrian photographs in the gallery dataset by a query sketch image drawn by professionals, which is crucial for criminal investigations and missing person searches in the field of public security. The main challenge of this task lies in bridging the significant modality gap between sketches and photos while extracting discriminative modality-invariant features. However, information asymmetry between sketches and RGB photographs, particularly the differences in color information, severely interferes with cross-modal matching processes. To address this challenge, we propose a novel network architecture that integrates multi-channel augmentation with hierarchically consistent disentanglement learning. Specifically, a multi-channel augmentation module is developed to mitigate the interference of color bias in cross-modal matching. Furthermore, a modality-disentangled prototype (MDP) module is introduced to decompose pedestrian representations at the feature level into modality-invariant structural prototypes and modality-specific appearance prototypes. Additionally, a cross-layer decoupling consistency constraint is designed to ensure the semantic coherence of disentangled prototypes across different network layers and to improve the stability of the whole decoupling process. Extensive experimental results on two public datasets demonstrate the superiority of our proposed approach over state-of-the-art methods.
Unbalanced power anomaly detection model based on improved transformer and countermeasure encoder
Current intelligent grid anomaly detection faces challenges such as low minority-class recognition due to imbalanced data, high computational complexity in long-sequence processing, and model bias from scarce anomaly samples. To address these, we propose a hybrid architecture combining an enhanced Transformer with an Adversarial Autoencoder (AAE). We introduce a Locality-Sensitive Hashing (LSH) attention mechanism using Focal Loss with Temperature (FLT) to cluster similar features. A dynamic weighting module, implemented via a Spatial-Temporal Feature Disentanglement Network (STFDN), adaptively adjusts gradients by category. Our approach reduces memory usage for node sequences from 18.7GB to 8.9GB (52.4% less) via Spectral Normalization. Under Wasserstein distance constraints, the model achieves an FID score of 28.4, a 10.4% improvement. An innovative dynamic temperature scaling strategy elevates the AUPRC to 0.837 on the SGSC dataset. Tests on the UK-DALE dataset show an F1-score of 89.3% with 183ms inference latency, meeting edge deployment requirements. This research offers a promising new generation of automated detection tools for grid operation and maintenance.
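The "Focal Loss with Temperature (FLT)" named in this abstract can be sketched as ordinary focal loss applied to temperature-scaled logits. This is an assumed formulation for illustration only; the hyperparameter values are not the paper's.

```python
import numpy as np

def focal_loss_with_temperature(logits, labels, gamma=2.0, temperature=0.5):
    """Temperature sharpens the softmax, and the (1 - p_t)^gamma factor
    down-weights easy majority-class samples so scarce anomalies dominate
    the gradient. `logits` is (batch, classes); returns the mean loss."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pt = p[np.arange(len(labels)), labels]        # probability of true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + 1e-12)))
```

Setting `gamma=0` recovers plain cross-entropy; increasing it shrinks the contribution of confidently classified (majority-class) samples, which is how the loss counteracts class imbalance.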
A machine learning study highlighting the challenges of fidgety movement recognition using vision and inertial sensors
Past medical research has shown that infantile movement and early neurological development are closely linked. Fidgety Movements, reflex-like movements occurring in healthy infants under 20 weeks of age, have proven especially important, as past studies have highlighted that their absence is strongly correlated with the future development of neurological disorders such as Cerebral Palsy. To enable timely intervention, the General Movement Assessment was proposed as a medical screening procedure carried out by clinical personnel specifically trained to recognize Fidgety Movements. Because of its high cost in time and resources, several initiatives to automate the General Movement Assessment using machine learning techniques have been proposed in the literature; however, none has managed to emerge as state-of-the-art so far. To investigate this problem, we conducted a study using deep learning approaches to learn disentangled feature representations for the recognition of Fidgety Movements using RGB-D video and Inertial Measurement Unit data acquired from 95 infants (average age: weeks). Our results show that while it is possible to learn features that characterize movement independently of subject information, obtaining feature representations that consistently generalize to subjects unseen during training remains challenging. More specifically, we observe that both the vision- and sensor-based modalities have specific challenges to be addressed for the recognition of Fidgety Movements. We discuss them and provide recommendations to help researchers interested in investigating this problem in the future.
Contrastive Feature Disentanglement via Physical Priors for Underwater Image Enhancement
Underwater image enhancement (UIE) serves as a fundamental preprocessing step in ocean remote sensing applications, encompassing marine life detection, archaeological surveying, and subsea resource exploration. However, UIE encounters substantial technical challenges due to the intricate physics of underwater light propagation and the inherent homogeneity of aquatic environments. Images captured underwater are significantly degraded through wavelength-dependent absorption and scattering processes, resulting in color distortion, contrast degradation, and illumination irregularities. To address these challenges, we propose a contrastive feature disentanglement network (CFD-Net) that systematically addresses underwater image degradation. Our framework employs a multi-stream decomposition architecture with three specialized decoders to disentangle the latent feature space into components associated with degradation and those representing high-quality features. We incorporate hierarchical contrastive learning mechanisms to establish clear relationships between standard and degraded feature spaces, emphasizing intra-layer similarity and inter-layer exclusivity. Through the synergistic utilization of internal feature consistency and cross-component distinctiveness, our framework achieves robust feature extraction without explicit supervision. Compared to existing methods, our approach achieves a 12% higher UIQM score on the EUVP dataset and outperforms other state-of-the-art techniques on various evaluation metrics such as UCIQE, MUSIQ, and NIQE, both quantitatively and qualitatively.
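The "intra-layer similarity and inter-layer exclusivity" objective can be illustrated with a single-negative InfoNCE-style term per layer: pull two high-quality features together, push the degradation feature away. This is a minimal sketch under that reading, not CFD-Net's actual loss.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def layer_contrastive_loss(clean_a, clean_b, degraded):
    """Per layer: treat a pair of clean features as the positive
    (intra-layer similarity) and the degradation feature as the negative
    (inter-layer exclusivity), in an InfoNCE form with one negative."""
    pos = np.exp(cosine(clean_a, clean_b))
    neg = np.exp(cosine(clean_a, degraded))
    return float(-np.log(pos / (pos + neg)))
```

Summing this term over decoder layers gives a hierarchical version of the constraint: each layer's clean component is kept coherent internally and distinct from its degradation component.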
Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition
Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations—highlighting subtle facial muscle activations—with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)3 and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization.
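Among the losses in the multi-task objective, the cycle constraint is the least standard; the sketch below illustrates it with a toy encoder/decoder pair (concatenation and splitting, an assumption made purely so the example runs): swap motions between two subjects, re-encode, swap back, and require the originals to be recovered.

```python
import numpy as np

def decode(identity, motion):
    """Toy decoder: concatenate the two factors into one 'face' vector."""
    return np.concatenate([identity, motion])

def encode(x):
    """Toy encoder matching the decoder: split the vector into halves."""
    h = len(x) // 2
    return x[:h], x[h:]

def cycle_loss(id_a, mot_a, id_b, mot_b):
    """Cycle constraint: decode subject A with B's motion (and vice versa),
    re-encode the swapped outputs, swap the motions back, and penalize any
    difference from the original reconstructions."""
    swapped_a = decode(id_a, mot_b)
    swapped_b = decode(id_b, mot_a)
    ia, mb = encode(swapped_a)
    ib, ma = encode(swapped_b)
    back_a = decode(ia, ma)                     # motion swapped back
    back_b = decode(ib, mb)
    ref_a = decode(id_a, mot_a)
    ref_b = decode(id_b, mot_b)
    return float(np.sum((back_a - ref_a) ** 2) + np.sum((back_b - ref_b) ** 2))
```

With a real learned encoder and decoder this loss is only approximately zero, and minimizing it is what forces the identity and motion factors to be genuinely separable.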
Arbitrary Font Generation by Encoder Learning of Disentangled Features
Making a new font requires graphical designs for all base characters, and this design process consumes a great deal of time and human resources. Especially for languages with a large number of consonant-vowel combinations, designing all such combinations independently is a heavy burden. Automatic font generation methods have been proposed to reduce this labor-intensive design problem. Most are GAN-based approaches, and they are limited to generating the fonts they were trained on. Some previous methods used two encoders, one for content and the other for style, but their disentanglement of content and style is not sufficiently effective for generating arbitrary fonts. Arbitrary font generation is a challenging task because learning text content and font design separately from given font images is very difficult, since each font image carries both text content and font style. In this paper, we propose a new automatic font generation method to solve this disentanglement problem. First, we use two stacked inputs: images with the same text but different font styles as the content input, and images with the same font style but different texts as the style input. Second, we propose new consistency losses that force any combination of encoded features of the stacked inputs to have the same values. In our experiments, we show that our method can extract consistent features of text content and font style by separating the content and style encoders, and that this works well for generating unseen font designs from a small number of human-designed reference font images. The font designs generated with our method showed better quality, both qualitatively and quantitatively, than those of previous methods for Korean, Chinese, and English characters, e.g., a 17.84 lower FID on unseen fonts.
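The consistency losses can be sketched as follows: since every image in the content stack shares the same text, their encoded content features should coincide, and likewise for the style stack. Penalizing each stack's variance around its mean is one simple realization (an assumption; the paper may use a different form).

```python
import numpy as np

def stack_consistency_loss(content_feats, style_feats):
    """Force encoded features within each stacked input to agree.
    `content_feats`: (k, d) encodings of k images with the same text;
    `style_feats`: (k, d) encodings of k images with the same font style.
    Returns the summed per-stack variance around the stack mean."""
    def stack_var(f):
        return float(np.mean((f - f.mean(axis=0)) ** 2))
    return stack_var(content_feats) + stack_var(style_feats)
```

The loss is zero exactly when each stack's encodings are identical, i.e., when the content encoder ignores style and the style encoder ignores content.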
Feature Disentanglement Based on Dual-Mask-Guided Slot Attention for SAR ATR Across Backgrounds
Due to the limited number of SAR samples in the dataset, current networks for SAR automatic target recognition (SAR ATR) are prone to overfitting the environmental information, which diminishes their generalization ability under cross-background conditions. However, acquiring sufficient measured data to cover the entire environmental space remains a significant challenge. This paper proposes a novel feature disentanglement network, named FDSANet. The network is designed to decouple and distinguish the features of the target from the background before classification, thereby improving its adaptability to background changes. Specifically, the network consists of two sub-networks. The first is an autoencoder sub-network based on dual-mask-guided slot attention. This sub-network utilizes a target mask to guide the encoder to distinguish between target and background features. It then outputs these features as independent representations, achieving feature disentanglement. The second is a classification sub-network. It includes an encoder and a classifier, which work together to perform classification based on the extracted target features. This network enhances the causal relationship between the target and the classification result, while mitigating the background's interference with classification. Moreover, the network, trained under a fixed background, demonstrates strong adaptability when applied to a new background. Experiments conducted on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, as well as the OpenSARShip dataset, demonstrate the superior performance of FDSANet.
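The mask-guided separation of target from background can be illustrated with masked average pooling: the target mask and its complement each pool the feature map into an independent representation. This toy stand-in replaces the paper's slot-attention mechanism for the sake of a runnable example.

```python
import numpy as np

def mask_guided_split(feature_map, target_mask):
    """Split a (C, H, W) feature map into target and background vectors by
    masked average pooling with the target mask and its complement."""
    m = target_mask[None, ...].astype(float)                 # (1, H, W)
    tgt = (feature_map * m).sum(axis=(1, 2)) / (m.sum() + 1e-8)
    bkg = (feature_map * (1.0 - m)).sum(axis=(1, 2)) / ((1.0 - m).sum() + 1e-8)
    return tgt, bkg
```

Feeding only the target representation to the downstream classifier is what breaks the spurious dependence on background clutter that the abstract describes.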
LIMFA: label-irrelevant multi-domain feature alignment-based fake news detection for unseen domain
Fake news in social networks causes disastrous effects in the real world, yet effectively detecting newly emerged fake news remains difficult. This problem is particularly pronounced when the testing samples (target domain) are derived from different topics, events, platforms, or time periods than the training dataset (source domains). Though efforts have focused on learning domain-invariant features (DIF) across multiple source domains to transfer universal knowledge from the source to the target domain, they ignore the complexity that arises as the number of source domains increases, resulting in unreliable DIF. In this paper, we first point out two challenges faced when learning DIF for fake news detection: (1) high intra-domain correlations, since the similarity of news samples within the same domain but different categories can be higher than that of samples in different domains but the same category, and (2) complex inter-domain correlations, stemming from the fact that news samples in different domains are semantically related. To tackle these challenges, we propose two modules, center-aware feature alignment and likelihood gain-based feature disentanglement, to enhance alignment across multiple domains while keeping the two categories separated and disentangling domain-specific features in an adversarial supervision manner. Combining these modules, we construct a label-irrelevant multi-domain feature alignment (LIMFA) framework. Our experiments show that LIMFA can be deployed with various base models and outperforms state-of-the-art baselines in 4 cross-domain scenarios. Our source code will be made available upon acceptance of this manuscript.
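A center-aware alignment term can be sketched by pulling each per-domain class center toward the global center of that class, which aligns domains without mixing the real/fake categories. The exact form is an assumption for illustration; LIMFA's module may differ in detail.

```python
import numpy as np

def center_alignment_loss(feats, domains, labels):
    """Mean squared distance between each (domain, class) center and the
    global center of that class. `feats` is (n, d); `domains` and `labels`
    are integer arrays of length n."""
    total, n = 0.0, 0
    for c in np.unique(labels):
        global_center = feats[labels == c].mean(axis=0)
        for d in np.unique(domains):
            sel = (labels == c) & (domains == d)
            if sel.any():
                center = feats[sel].mean(axis=0)
                total += float(np.sum((center - global_center) ** 2))
                n += 1
    return total / n
```

Because centers are computed per class, the loss never pushes real and fake samples toward each other, addressing the intra-domain correlation problem the abstract raises.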