1,003 result(s) for "unsupervised representation learning"
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification
With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress with many advantages. Most existing solutions learn single-level skeleton features from body joints with the assumption of equal skeleton importance, while they typically lack the ability to exploit more informative skeleton features from various levels such as limb level with more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. Firstly, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features from the levels of body joints, components, and limbs. Then a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features (“prototypes”) from different-level skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism to adaptively infer the informative importance of each skeleton, so as to focus on harder skeletons to learn more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and RGB-based scenarios with estimated skeletons.
Autoencoding slow representations for semi-supervised data-efficient regression
The slowness principle is a concept inspired by the visual cortex of the brain. It postulates that the underlying generative factors of a quickly varying sensory signal change on a different, slower time scale. By applying this principle to state-of-the-art unsupervised representation learning methods, one can learn a latent embedding that makes supervised downstream regression tasks more data-efficient. In this paper, we compare different approaches to unsupervised slow representation learning, such as Lp-norm-based slowness regularization and the SlowVAE, and propose a new term based on Brownian motion used in our method, the S-VAE. We empirically evaluate these slowness regularization terms with respect to their downstream task performance and data efficiency in state estimation and behavioral cloning tasks. We find that slow representations show great performance improvements in settings where only sparse labeled training data is available. Furthermore, we present a theoretical and empirical comparison of the discussed slowness regularization terms. Finally, we discuss how the Fréchet Inception Distance (FID), commonly used to determine the generative capabilities of GANs, can predict the performance of trained models in supervised downstream tasks.
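As a rough illustration of the Lp-norm-based slowness regularization named in the abstract above, the sketch below penalizes large changes between consecutive latent vectors. The abstract does not give the exact formulation, so the function name `lp_slowness_penalty`, the averaging over time steps, and the default `p` are assumptions made here for illustration only.

```python
from typing import Sequence


def lp_slowness_penalty(latents: Sequence[Sequence[float]], p: float = 2.0) -> float:
    """Hypothetical slowness term: mean over time of sum_i |z[t+1][i] - z[t][i]|**p.

    `latents` is a time-ordered sequence of latent vectors of equal length.
    This is a generic sketch, not the formulation used in the paper.
    """
    if len(latents) < 2:
        return 0.0  # no consecutive pair, so nothing to penalize
    total = 0.0
    for prev, curr in zip(latents, latents[1:]):
        # p-th power of the Lp distance between neighbouring time steps
        total += sum(abs(c - q) ** p for q, c in zip(prev, curr))
    return total / (len(latents) - 1)


# A latent trajectory that never moves has zero penalty;
# a fast-moving one is penalized more.
static = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
moving = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(lp_slowness_penalty(static), lp_slowness_penalty(moving))
```

In a training loop, a term like this would typically be added to the reconstruction objective with a weighting coefficient, which is likewise not specified in the abstract.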
Segment-Based Unsupervised Learning Method in Sensor-Based Human Activity Recognition
Sensor-based human activity recognition (HAR) is a task to recognize human activities, and HAR has an important role in analyzing human behavior such as in the healthcare field. HAR is typically implemented using traditional machine learning methods. In contrast to traditional machine learning methods, deep learning models can be trained end-to-end with automatic feature extraction from raw sensor data. Therefore, deep learning models can adapt to various situations. However, deep learning models require substantial amounts of training data, and annotating activity labels to construct a training dataset is cost-intensive due to the need for human labor. In this study, we focused on the continuity of activities and propose a segment-based unsupervised deep learning method for HAR using accelerometer sensor data. We define segment data as sensor data measured at one time, and this includes only a single activity. To collect the segment data, we propose a measurement method where the users only need to annotate the starting, changing, and ending points of their activity rather than the activity label. We developed a new segment-based SimCLR, which uses pairs of segment data, and propose a method that combines segment-based SimCLR with SDFD. We investigated the effectiveness of feature representations obtained by training the linear layer with fixed weights obtained by unsupervised learning methods. As a result, we demonstrated that the proposed combined method acquires generalized feature representations. The results of transfer learning on different datasets suggest that the proposed method is robust to the sampling frequency of the sensor data, although it requires more training data than other methods.
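The abstract above builds on SimCLR, whose contrastive objective is commonly written as the NT-Xent loss. The sketch below computes that loss for a single anchor, assuming the positive is a second window cut from the same annotated segment; this pairing rule, the function names, and the temperature value are assumptions for illustration, and the SDFD component is not covered.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_pair_loss(
    embeddings: Sequence[Sequence[float]],
    anchor: int,
    positive: int,
    temperature: float = 0.5,
) -> float:
    """NT-Xent loss for one (anchor, positive) pair within a batch.

    Every other embedding in the batch acts as a negative. Here the positive
    is assumed to come from the same segment as the anchor (hypothetical rule).
    """
    # Temperature-scaled similarities from the anchor to every other sample
    sims = {
        k: cosine(embeddings[anchor], e) / temperature
        for k, e in enumerate(embeddings)
        if k != anchor
    }
    log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
    return -(sims[positive] - log_denominator)


# Indices 0 and 1 stand for windows from one segment, 2 and 3 from another.
batch = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
same_segment = nt_xent_pair_loss(batch, anchor=0, positive=1)
other_segment = nt_xent_pair_loss(batch, anchor=0, positive=2)
print(same_segment < other_segment)
```

The loss is lower when the designated positive is actually close to the anchor, which is the pressure that pulls same-segment windows together in embedding space.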
Sensor-Based Fault Diagnosis and Prognosis of Neurophysiological States: A Transformer Autoencoder Approach to EEG Monitoring
This study presents a sensor-based condition monitoring framework for the diagnosis and prognosis of neurophysiological states using electroencephalographic (EEG) signals. Leveraging a comparative deep learning architecture, we evaluate a baseline Variational Autoencoder against a Transformer-based Autoencoder to model latent representations of EEG dynamics across three therapeutic phases: pre-intervention, during intervention, and post-intervention. The proposed methodology aligns with sensor-based fault diagnosis principles by treating deviations from stable neurophysiological states as diagnostic indicators and temporal phase transitions as markers of therapeutic stage progression. Using a dataset of 94 EEG sessions from six subjects with diverse neurological conditions, we demonstrate that the Transformer Autoencoder, through its self-attention mechanism, captures cross-band spectral relationships more effectively than the VAE, resulting in denser within-phase clusters and improved separation between therapeutic stages. Quantitative evaluation reveals small but statistically significant effects between pre- and during-intervention phases (partial η² = 0.0388) and pre- and post-intervention phases (partial η² = 0.0470), predominantly driven by delta, theta, beta, and gamma rhythms. These findings illustrate how sensor-based latent state monitoring can provide interpretable, data-driven insights for condition assessment and phase transition assessment between sessions in complex dynamic systems, with potential applicability beyond clinical domains to industrial condition monitoring and fault diagnosis tasks. The framework confirms that it offers qualitative indicators, rather than predictive clinical outputs.
Grid Jigsaw Representation with CLIP: a new perspective on image clustering
Unsupervised representation learning for image clustering is essential in computer vision. Although the advancement of visual models has improved image clustering with efficient visual representations, challenges still remain. Firstly, existing features often lack the ability to represent the internal structure of images, hindering the accurate clustering of visually similar images. Secondly, finer-grained semantic labels are often missing, limiting the ability to capture nuanced differences and similarities between images. In this paper, we propose a new perspective on image clustering, the pretrain-based Grid Jigsaw Representation (pGJR). Inspired by human jigsaw puzzle processing, we modify the traditional jigsaw learning to gain a more sequential and incremental understanding of image structure. We also leverage the pretrained CLIP to extract the prior features which can benefit from the enhanced cross-modal representation for richer and more nuanced semantic information and label level differentiation. Our experiments demonstrate that using the pretrained model as a feature extractor can accelerate the convergence of clustering. We append the GJR module to pGJR and observe significant improvements on common-use benchmark datasets. The experimental results highlight the effectiveness of our approach in the clustering task, as evidenced by improvements in the ACC, NMI, and ARI metrics, as well as the super-fast convergence speed.
Unsupervised Representation Learning for Proteochemometric Modeling
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
View sequence prediction GAN: unsupervised representation learning for 3D shapes by decomposing view content and viewpoint variance
Unsupervised representation learning for 3D shapes has become a critical problem for large-scale 3D shape management. Recent model-based methods for this task require additional information for training, while popular view-based methods often overlook viewpoint variance in view prediction, leading to uninformative 3D features that limit their practical applications. To address these issues, we propose an unsupervised 3D shape representation learning method called View Sequence Prediction GAN (VSP-GAN), which decomposes view content and viewpoint variance. VSP-GAN takes several adjacent views of a 3D shape as input and outputs the subsequent views. The key idea is to split the multi-view sequence into two available perceptible parts, view content and viewpoint variance, and independently encode them with separate encoders. With the information, we design a decoder implemented by the mirrored architecture of the content encoder to predict the view sequence by multi-steps. Besides, to improve the quality of the reconstructed views, we propose a novel hierarchical view prediction loss to enhance view realism, semantic consistency, and details retainment. We evaluate the proposed VSP-GAN on two popular 3D CAD datasets, ModelNet10 and ModelNet40, for 3D shape classification and retrieval. The experimental results demonstrate that our VSP-GAN can learn more discriminative features than the state-of-the-art methods.
A Taxonomy and Theoretical Analysis of Collapse Phenomena in Unsupervised Representation Learning
Unsupervised representation learning has emerged as a promising paradigm in machine learning, owing to its capacity to extract semantically meaningful features from unlabeled data. Despite recent progress, however, such methods remain vulnerable to collapse phenomena, wherein the expressiveness and diversity of learned representations are severely degraded. This phenomenon poses significant challenges to both model performance and generalizability. This paper presents a systematic investigation into two distinct forms of collapse: complete collapse and dimensional collapse. Complete collapse typically arises in non-contrastive frameworks, where all learned representations converge to trivial constants, thereby rendering the learned feature space non-informative. While contrastive learning has been introduced as a principled remedy, recent empirical findings indicate that it fails to prevent collapse entirely. In particular, contrastive methods are still susceptible to dimensional collapse, where representations are confined to a narrow subspace, thus restricting both the information content and effective dimensionality. To address these concerns, we conduct a comprehensive literature analysis encompassing theoretical definitions, underlying causes, and mitigation strategies for each collapse type. We further categorize recent approaches to collapse prevention, including feature decorrelation techniques, eigenvalue distribution regularization, and batch-level statistical constraints, and assess their effectiveness through a comparative framework. This work aims to establish a unified conceptual foundation for understanding collapse in unsupervised learning and to guide the design of more robust representation learning algorithms.
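The two collapse types described in the abstract above can be illustrated with a crude diagnostic based on per-dimension standard deviation across a batch of embeddings. This is only an axis-aligned proxy: a collapsed subspace that is rotated relative to the coordinate axes would need an eigenvalue analysis of the covariance matrix instead. The function names and the threshold `eps` are assumptions introduced here, not taken from the paper.

```python
import statistics
from typing import Dict, List, Sequence, Union


def per_dimension_std(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Population standard deviation of each embedding dimension over the batch."""
    return [statistics.pstdev(column) for column in zip(*embeddings)]


def collapse_report(
    embeddings: Sequence[Sequence[float]], eps: float = 1e-6
) -> Dict[str, Union[int, str]]:
    """Hypothetical diagnostic: count dimensions whose spread exceeds `eps`.

    - no active dimension   -> all embeddings are (nearly) one constant vector
    - some inactive         -> axis-aligned sign of dimensional collapse
    - all active            -> no collapse detected by this proxy
    """
    stds = per_dimension_std(embeddings)
    active = sum(1 for s in stds if s > eps)
    if active == 0:
        status = "complete"
    elif active < len(stds):
        status = "dimensional"
    else:
        status = "none detected"
    return {"active_dims": active, "total_dims": len(stds), "status": status}


print(collapse_report([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]))  # constant outputs
print(collapse_report([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]))  # one dead dimension
```

A check like this is cheap enough to log during training, but a passing result does not rule out collapse along non-axis directions.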
PointStaClu: A Deep Point Cloud Clustering Method Based on Stable Cluster Discrimination
Potential inconsistencies between the goals of unsupervised representation learning and clustering within multi-stage deep clustering can diminish the effectiveness of these techniques. However, because the goal of unsupervised representation learning is inherently flexible and can be tailored to clustering, we introduce PointStaClu, a novel single-stage point cloud clustering method. This method employs stable cluster discrimination (StaClu) to tackle the inherent instability present in single-stage deep clustering training. It achieves this by constraining the gradient descent updates for negative instances within the cross-entropy loss function, and by updating the cluster centers using the same loss function. Furthermore, we integrate entropy constraints to regulate the distribution entropy of the dataset, thereby enhancing the cluster allocation. Our framework simplifies the process, employing a single loss function and an encoder for deep point cloud clustering. Extensive experiments on the ModelNet40 and ShapeNet datasets demonstrate that PointStaClu significantly narrows the performance gap between unsupervised point cloud clustering and supervised point cloud classification, presenting a novel approach to point cloud classification tasks.
Semi-Supervised Cross-Subject Emotion Recognition Based on Stacked Denoising Autoencoder Architecture Using a Fusion of Multi-Modal Physiological Signals
In recent decades, emotion recognition has received considerable attention. As more enthusiasm has shifted to the physiological pattern, a wide range of elaborate physiological emotion data features come up and are combined with various classifying models to detect one’s emotional states. To circumvent the labor of artificially designing features, we propose to acquire affective and robust representations automatically through the Stacked Denoising Autoencoder (SDA) architecture with unsupervised pre-training, followed by supervised fine-tuning. In this paper, we compare the performances of different features and models through three binary classification tasks based on the Valence-Arousal-Dominance (VAD) affection model. Decision fusion and feature fusion of electroencephalogram (EEG) and peripheral signals are performed on hand-engineered features; data-level fusion is performed on deep-learning methods. It turns out that the fusion data perform better than the two modalities. To take advantage of deep-learning algorithms, we augment the original data and feed it directly into our training model. We use two deep architectures and another generative stacked semi-supervised architecture as references for comparison to test the method’s practical effects. The results reveal that our scheme slightly outperforms the other three deep feature extractors and surpasses the state-of-the-art of hand-engineered features.