MBRL Search Results

Filters:
  • Discipline
  • Is Peer Reviewed
  • Reading Level
  • Content Type
  • Year (From – To)
  • More Filters: Item Type, Is Full-Text Available, Subject, Publisher, Source, Donor, Language, Place of Publication, Contributors, Location
36,164 results for "Representation learning"
Context Autoencoder for Self-supervised Representation Learning
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. Pretraining consists of two tasks: masked representation prediction, which predicts the representations of the masked patches, and masked patch reconstruction, which reconstructs the masked patches. The network is an encoder–regressor–decoder architecture: the encoder takes the visible patches as input; the regressor predicts the representations of the masked patches, which are expected to align with the representations computed by the encoder, using the representations of the visible patches and the positions of the visible and masked patches; and the decoder reconstructs the masked patches from the predicted representations. The CAE design encourages separating the learning of the encoder (the representation) from the completion of the pretraining tasks, masked representation prediction and masked patch reconstruction, and making predictions in the encoded representation space empirically benefits representation learning. We demonstrate the effectiveness of CAE through superior transfer performance on downstream tasks: semantic segmentation, object detection and instance segmentation, and classification. The code will be available at https://github.com/Atten4Vis/CAE.
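
A minimal, runnable sketch of an encoder–regressor–decoder masked-image-modeling step in the spirit of the abstract above. The module sizes, patch handling, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCAE(nn.Module):
    """Encoder sees only visible patches; the regressor predicts representations
    for masked positions; the decoder reconstructs masked patch pixels."""
    def __init__(self, dim=192, patch_pixels=16 * 16 * 3):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.regressor = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.decoder = nn.Linear(dim, patch_pixels)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, vis_patches, vis_pos, mask_pos):
        z_vis = self.encoder(vis_patches + vis_pos)                  # (B, Nv, dim)
        queries = self.mask_token + mask_pos                         # (B, Nm, dim)
        z_all = self.regressor(torch.cat([z_vis, queries], dim=1))
        z_pred = z_all[:, z_vis.size(1):]                            # predicted masked representations
        return z_pred, self.decoder(z_pred)

# One illustrative pretraining step on random stand-in tensors.
model = TinyCAE()
B, Nv, Nm, dim = 2, 147, 49, 192
vis, vis_pos, mask_pos = torch.randn(B, Nv, dim), torch.randn(B, Nv, dim), torch.randn(B, Nm, dim)
masked_pixels = torch.randn(B, Nm, 16 * 16 * 3)
z_pred, pix_pred = model(vis, vis_pos, mask_pos)
with torch.no_grad():
    # In CAE the alignment target comes from the encoder applied to the masked
    # patches; random embeddings stand in for them in this sketch.
    z_target = model.encoder(torch.randn(B, Nm, dim) + mask_pos)
loss = F.mse_loss(z_pred, z_target) + F.mse_loss(pix_pred, masked_pixels)
loss.backward()
```
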
Tensorized Multi-view Subspace Representation Learning
Self-representation-based subspace learning has shown its effectiveness in many applications. In this paper, we extend traditional subspace representation learning by simultaneously taking advantage of multiple views and a prior constraint. Accordingly, we establish a novel algorithm termed Tensorized Multi-view Subspace Representation Learning. To exploit the different views, the subspace representation matrices of the individual views are regarded as a low-rank tensor, which effectively models the high-order correlations of multi-view data. To incorporate prior information, a constraint matrix is devised to guide the subspace representation learning within a unified framework. The subspace representation tensor, equipped with a low-rank constraint, elegantly models the complementary information among different views, reduces the redundancy of the subspace representations, and thereby improves the accuracy of subsequent tasks. We formulate the model as a tensor nuclear norm minimization problem with ℓ2,1-norm and linear equality constraints, and solve it efficiently with an Augmented Lagrangian Alternating Direction Minimization method. Extensive experimental results on diverse multi-view datasets demonstrate the effectiveness of our algorithm.
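
A small illustrative sketch (not the paper's solver) of one tensor singular-value thresholding (t-SVT) step, the kind of proximal operator an alternating-direction scheme applies to the stacked per-view self-representation tensor when minimizing a tensor nuclear norm. The tensor sizes and threshold are made-up assumptions.

```python
import numpy as np

def t_svt(Z, tau):
    """Soft-threshold the singular values of each frontal slice of Z in the
    Fourier domain along the third (view) mode."""
    Zf = np.fft.fft(Z, axis=2)
    out = np.zeros_like(Zf)
    for k in range(Z.shape[2]):
        U, s, Vh = np.linalg.svd(Zf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # shrink singular values
        out[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(out, axis=2))

# Stack per-view self-representation matrices Z_v (n x n) into an n x n x V tensor.
n, V = 50, 3
views = [np.random.randn(n, n) for _ in range(V)]
Z = np.stack(views, axis=2)
Z_lowrank = t_svt(Z, tau=0.5)   # low-rank update used inside the alternating scheme
```
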
A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network
Background: Drug-target interaction prediction is of great significance for narrowing down the scope of candidate medications, and is thus a vital step in drug discovery. Because of the particularity of biochemical experiments, the development of new drugs is not only costly but also time-consuming. Therefore, computational prediction of drug-target interactions has become an essential part of the drug discovery process, aiming to greatly reduce experimental cost and time. Results: We propose DTI-CNN, a learning-based method that combines feature representation learning and a deep neural network to predict drug-target interactions. We first extract the relevant features of drugs and proteins from heterogeneous networks using the Jaccard similarity coefficient and a random walk with restart model. Then, we adopt a denoising autoencoder model to reduce the dimensionality and identify the essential features. Third, based on the features obtained in the previous step, we construct a convolutional neural network model to predict the interactions between drugs and proteins. The evaluation results show that the average AUROC and AUPR scores of DTI-CNN were 0.9416 and 0.9499, respectively, outperforming three existing state-of-the-art methods. Conclusions: All experimental results show that DTI-CNN performs better than the three existing methods and that the proposed method is appropriately designed.
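
A minimal random-walk-with-restart (RWR) sketch of the kind of feature extraction the abstract describes for drug/protein nodes in a heterogeneous similarity network. The network, restart probability, and convergence tolerance here are illustrative assumptions, not the DTI-CNN settings.

```python
import numpy as np

def rwr(adj, seed, restart=0.5, tol=1e-8, max_iter=1000):
    """Stationary visiting probabilities of a random walk restarting at `seed`."""
    # Column-normalize the adjacency matrix into a transition matrix.
    W = adj / np.maximum(adj.sum(axis=0, keepdims=True), 1e-12)
    e = np.zeros(adj.shape[0]); e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * e
        if np.linalg.norm(p_next - p, 1) < tol:
            break
        p = p_next
    return p

# Each node's RWR profile can serve as its raw feature vector before a
# denoising autoencoder compresses it, as in the pipeline above.
adj = np.random.rand(100, 100); adj = (adj + adj.T) / 2
features = np.vstack([rwr(adj, i) for i in range(adj.shape[0])])
```
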
Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are quite subtle and local, it is highly challenging to distinguish them from each other, even for humans. The localization of distinctions is therefore essential for fine-grained visual categorization, and it raises two pivotal problems: (1) Which regions are discriminative and representative enough to distinguish a subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? It is still difficult to address these two problems adaptively and intelligently. Existing mainstream methods rely on artificial priors and experimental validation to discover which and how many regions to gaze at, which severely restricts their usability and scalability. To address these two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representations. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers the discriminative regions at multiple granularities in a hierarchical manner (the “which” problem) and determines the number of discriminative regions in an automatic and adaptive manner (the “how many” problem). (2) Multi-scale representation learning helps to localize regions at different scales and to encode images at different scales, boosting fine-grained visual categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward, which allows the deep reinforcement learning to localize the distinctions in a weakly supervised or even unsupervised manner. (4) Unsupervised discriminative localization is further explored to avoid heavy annotation labor and to greatly strengthen the usability and scalability of our M2DRL approach. Compared with state-of-the-art methods on two widely used fine-grained visual categorization datasets, our M2DRL approach achieves the best categorization accuracy.
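
A hypothetical sketch of a reward that jointly considers attention and category information, in the spirit of the semantic reward the abstract describes; the weighting, inputs, and scoring are assumptions, not the M2DRL formulation.

```python
import torch
import torch.nn.functional as F

def semantic_reward(region_logits, label, attention_mass, alpha=0.5):
    """Reward a selected region if it (a) keeps the classifier confident in the
    true class and (b) covers a high-attention area of the image. All weights
    and inputs are illustrative assumptions."""
    class_conf = F.softmax(region_logits, dim=-1)[torch.arange(label.size(0)), label]
    return alpha * class_conf + (1 - alpha) * attention_mass

# Example: 4 candidate regions for one image, ground-truth class 2.
logits = torch.randn(4, 200)                        # classifier scores per region
labels = torch.full((4,), 2, dtype=torch.long)      # same ground-truth class for each region
att = torch.rand(4)                                 # fraction of attention inside each region
print(semantic_reward(logits, labels, att))         # higher = better region to keep
```
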
Autoencoding slow representations for semi-supervised data-efficient regression
The slowness principle is a concept inspired by the visual cortex of the brain. It postulates that the underlying generative factors of a quickly varying sensory signal change on a different, slower time scale. By applying this principle to state-of-the-art unsupervised representation learning methods, one can learn a latent embedding that makes supervised downstream regression tasks more data-efficient. In this paper, we compare different approaches to unsupervised slow representation learning, such as Lp-norm-based slowness regularization and the SlowVAE, and propose a new term based on Brownian motion that is used in our method, the S-VAE. We empirically evaluate these slowness regularization terms with respect to their downstream task performance and data efficiency in state estimation and behavioral cloning tasks. We find that slow representations yield large performance improvements in settings where only sparse labeled training data are available. Furthermore, we present a theoretical and empirical comparison of the discussed slowness regularization terms. Finally, we discuss how the Fréchet Inception Distance (FID), commonly used to assess the generative capabilities of GANs, can predict the performance of trained models on supervised downstream tasks.
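
A minimal sketch of an Lp-norm slowness penalty added to an autoencoder objective: consecutive frames are encoded and the latent difference is penalized so the embedding varies slowly over time. The architecture, p, and the weight gamma are assumptions, and the KL term of a full VAE is omitted for brevity; this is not the S-VAE's exact term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))

def slow_ae_loss(x_t, x_t1, p=2, gamma=1.0):
    z_t, z_t1 = encoder(x_t), encoder(x_t1)
    recon = F.mse_loss(decoder(z_t), x_t) + F.mse_loss(decoder(z_t1), x_t1)
    slowness = torch.norm(z_t - z_t1, p=p, dim=-1).mean()   # penalize fast latent changes
    return recon + gamma * slowness

# Two consecutive observations from a sequence (random stand-ins here).
x_t, x_t1 = torch.randn(16, 64), torch.randn(16, 64)
slow_ae_loss(x_t, x_t1).backward()
```
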
Deep multimodal representation learning for generalizable person re-identification
Person re-identification plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, supervised or semi-supervised learning paradigms, which benefit from large-scale datasets and strong computing performance, have achieved competitive performance on specific target domains. However, when Re-ID models are directly deployed in a new domain without target samples, they typically suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Representation Learning network that exploits rich semantic knowledge to assist representation learning during pre-training. Importantly, a multimodal representation learning strategy is introduced to translate the features of different modalities into a common space, which can significantly boost the generalization capability of the Re-ID model. In the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method significantly outperforms previous domain generalization and meta-learning methods by a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMRL .
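
A sketch of the general idea of translating features from different modalities into a common space with an alignment loss; the projection heads, feature sizes, and contrastive-style loss are illustrative assumptions rather than the DMRL design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

visual_proj = nn.Linear(2048, 512)   # image-feature head
text_proj = nn.Linear(768, 512)      # semantic/text-feature head

def common_space_loss(vis_feat, txt_feat, temperature=0.07):
    v = F.normalize(visual_proj(vis_feat), dim=-1)
    t = F.normalize(text_proj(txt_feat), dim=-1)
    logits = v @ t.T / temperature                 # cross-modal similarities
    targets = torch.arange(v.size(0))              # matched pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

vis = torch.randn(8, 2048)   # e.g. CNN features of 8 person images
txt = torch.randn(8, 768)    # e.g. attribute/description embeddings of the same people
common_space_loss(vis, txt).backward()
```
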
View-Invariant Skeleton Action Representation Learning via Motion Retargeting
Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios in which videos and skeleton data are recorded in laboratory settings. When dealing with estimated skeleton data in real-world videos, such methods perform poorly due to the large variations across subjects and camera viewpoints. To address this issue, we introduce ViA, a novel View-Invariant Autoencoder for self-supervised skeleton action representation learning. ViA leverages motion retargeting between different human performers as a pretext task in order to disentangle latent action-specific ‘Motion’ features on top of the visual representation of a 2D or 3D skeleton sequence. Such ‘Motion’ features are invariant to skeleton geometry and camera view, and allow ViA to facilitate both cross-subject and cross-view action classification tasks. We conduct a study focusing on transfer learning for skeleton-based action recognition with self-supervised pre-training on real-world data (e.g., Posetics). Our results show that the skeleton representations learned by ViA are generic enough to improve upon state-of-the-art action classification accuracy, not only on 3D laboratory datasets such as NTU-RGB+D 60 and NTU-RGB+D 120, but also on real-world datasets where only 2D data can be accurately estimated, e.g., Toyota Smarthome, UAV-Human and Penn Action. Code and models will be publicly available at https://walker-a11y.github.io/ViA-project.
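
A hedged sketch of a motion-retargeting pretext task: an autoencoder splits each skeleton sequence's latent code into a 'motion' part and a 'character' (skeleton-geometry) part, and reconstructing one performer's sequence with another performer's character code encourages the motion code to be subject- and view-invariant. The sizes, the latent split, and the simple MLP encoder/decoder are assumptions, not the ViA architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

SEQ = 30 * 17 * 2                     # 30 frames of 17 2D joints, flattened
encoder = nn.Sequential(nn.Linear(SEQ, 256), nn.ReLU(), nn.Linear(256, 128))
decoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, SEQ))

def retarget_loss(seq_a, seq_b):
    """seq_a and seq_b are assumed to show the same action performed by two
    different people (paired performances)."""
    za, zb = encoder(seq_a), encoder(seq_b)
    motion_a, char_a = za[:, :64], za[:, 64:]                 # split latent code
    motion_b, char_b = zb[:, :64], zb[:, 64:]
    recon_a = decoder(torch.cat([motion_a, char_a], dim=-1))
    swapped = decoder(torch.cat([motion_a, char_b], dim=-1))  # A's motion on B's skeleton
    return F.mse_loss(recon_a, seq_a) + F.mse_loss(swapped, seq_b)

seq_a, seq_b = torch.randn(4, SEQ), torch.randn(4, SEQ)       # random stand-ins
retarget_loss(seq_a, seq_b).backward()
```
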
Learning Dynamic Batch-Graph Representation for Deep Representation Learning
Recently, batch-based image data representation has been demonstrated to be effective for context-enhanced image representation. The core issues for this task are capturing the dependencies among the image samples within each mini-batch and conducting message communication among different samples. Existing approaches mainly adopt self-attention or local self-attention models (on the patch dimension) for this task, which fail to fully exploit the intrinsic relationships among samples within a mini-batch and are also sensitive to noise and outliers. To address this issue, in this paper we propose a flexible Dynamic Batch-Graph Representation (DyBGR) model to automatically explore the intrinsic relationships among samples for contextual sample representation. Specifically, DyBGR first represents the mini-batch with a graph (termed the batch-graph) in which nodes represent image samples and edges encode the dependencies among images. This graph is dynamically learned under similarity, sparseness, and semantic-correlation constraints. Building on this graph, DyBGR exchanges sample (node) information over the batch-graph to update each node's representation. Note that batch-graph learning and information propagation are jointly optimized to boost their respective performance. Furthermore, in practice, the DyBGR model can be implemented as a simple plug-and-play block (named the DyBGR block), which can potentially be integrated into any mini-batch-based deep representation learning scheme. Extensive experiments on deep metric learning tasks demonstrate the effectiveness of DyBGR. We will release the code at https://github.com/SissiW/DyBGR .
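
A rough sketch of context-enhanced batch representations: build a sparse similarity graph over the samples in a mini-batch and propagate features along it. The top-k sparsification and single fixed propagation step are illustrative choices, not the learned DyBGR block.

```python
import torch
import torch.nn.functional as F

def batch_graph_update(feats, k=4, alpha=0.5):
    """feats: (B, D) mini-batch features; returns context-enhanced features."""
    normed = F.normalize(feats, dim=-1)
    sim = normed @ normed.T                                    # (B, B) cosine similarity
    topk = sim.topk(k=k + 1, dim=-1).indices                   # keep k neighbours (+ self)
    adj = torch.zeros_like(sim).scatter_(1, topk, 1.0) * sim   # sparse weighted batch-graph
    adj = adj / adj.sum(dim=-1, keepdim=True)                  # row-normalize
    return alpha * feats + (1 - alpha) * adj @ feats           # one message-passing step

feats = torch.randn(32, 128)               # backbone features for one mini-batch
enhanced = batch_graph_update(feats)       # same shape, with batch context mixed in
```
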
Self-Supervised Representation Learning for Remote Sensing Image Change Detection Based on Temporal Prediction
Traditional change detection (CD) methods operate directly on the image domain or on hand-crafted features, which makes them less robust to inconsistencies (e.g., in brightness and noise distribution) between bitemporal satellite images. Recently, deep learning techniques have shown compelling performance in robust feature learning. However, generating accurate semantic supervision that reveals real change information in satellite images remains challenging, especially through manual annotation. To solve this problem, we propose a novel self-supervised representation learning method based on temporal prediction for remote sensing image CD. The main idea of our algorithm is to transform the two satellite images into more consistent feature representations through a self-supervised mechanism, without semantic supervision or any additional computation. Based on the transformed feature representations, a better difference image (DI) can be obtained, which reduces the error that the DI propagates to the final detection result. In the self-supervised mechanism, the network is asked to distinguish sample patches drawn from the two temporal images, a task we call temporal prediction. By designing the temporal-prediction network to imitate the discriminator of generative adversarial networks, distribution-aware feature representations are captured automatically and a result with powerful robustness can be acquired. Experimental results on real remote sensing data sets show the effectiveness and superiority of our method, improving the detection precision by 0.94–35.49%.
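
A sketch of a temporal-prediction pretext task: a small discriminator-style head is trained to tell which acquisition time a patch's features come from, on top of a feature extractor shared by both dates. The patch size, network, and loss are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # shared across both dates
time_head = nn.Linear(16, 2)                                          # predicts t1 vs. t2

def temporal_prediction_loss(patches_t1, patches_t2):
    f1, f2 = feature_net(patches_t1), feature_net(patches_t2)
    logits = time_head(torch.cat([f1, f2], dim=0))
    labels = torch.cat([torch.zeros(f1.size(0)), torch.ones(f2.size(0))]).long()
    return F.cross_entropy(logits, labels)

p1, p2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)   # patches from the two dates
temporal_prediction_loss(p1, p2).backward()

# After pretraining, a difference image could be formed from the transformed
# features, e.g. a per-pixel or per-patch distance between the two dates'
# feature maps (a simplification of the pipeline described above).
```
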