Catalogue Search | MBRL

Dual insurance for generalized zero-shot learning

by Fang, Xiaozhao , Kang, Peipei , Li, Chuang in Artificial Intelligence , Classification , Clustering

2025

Traditional zero-shot learning aims to use a trained model to accurately classify samples from unseen classes, whereas the more difficult task of generalized zero-shot learning requires the trained model to classify samples from both seen and unseen classes. Because only seen-class samples are available during training, generalized zero-shot learning poses a great challenge for classification. Generative models are one effective way to address this problem. However, the samples they generate are often of poor quality. In addition, the generated samples contain semantic redundancies that are not conducive to classification. To solve these problems, we propose the dual insurance model (DI-GAN) for generalized zero-shot learning, which includes a feature generation module and a semantic separation module. These guarantee the high quality of the generated features and good classification performance, respectively. Specifically, the first insurance is based on a generative adversarial network whose generator is constrained by a clustering method so that the generated samples stay close to the real samples. The second insurance is based on a variational autoencoder and includes semantic separation, an instance network, and a classification network. Semantic separation is designed to extract the semantically related parts that benefit classification, while the instance network, acting on the semantically related parts, is used to ensure classification performance. Extensive experiments on four benchmark datasets show the competitiveness of the proposed DI-GAN.

Journal Article

Zero-shot learning via visual-semantic aligned autoencoder

by Jin, Cong , Huang, Jinjie , Wei, Tianshu in Bias , Classification , Deep learning

2023

Zero-shot learning recognizes unseen samples via a model learned from seen-class samples and semantic features. Because the training set lacks information about unseen-class samples, some researchers have proposed generating unseen-class samples with generative models. However, the generative model is first trained on the training-set samples and only then used to generate unseen-class samples, so the features of the generated unseen-class samples tend to be biased toward the seen classes and may deviate considerably from the real unseen-class samples. To tackle this problem, we use an autoencoder to generate the unseen-class samples and combine the semantic features of the unseen classes with the proposed new sample features to construct the loss function. The proposed method is validated on three datasets and shows good results.

Journal Article

Classifier and Exemplar Synthesis for Zero-Shot Learning

by Wei-Lun, Chao , Gong Boqing , Soravit, Changpinyo in Benchmarks , Classifiers , Empirical analysis

2020

Zero-shot learning (ZSL) enables solving a task without the need to see its examples. In this paper, we propose two ZSL frameworks that learn to synthesize parameters for novel unseen classes. First, we propose to cast the problem of ZSL as learning manifold embeddings from graphs composed of object classes, leading to a flexible approach that synthesizes “classifiers” for the unseen classes. Then, we define an auxiliary task of synthesizing “exemplars” for the unseen classes to be used as an automatic denoising mechanism for any existing ZSL approaches or as an effective ZSL model by itself. On five visual recognition benchmark datasets, we demonstrate the superior performances of our proposed frameworks in various scenarios of both conventional and generalized ZSL. Finally, we provide valuable insights through a series of empirical analyses, among which are a comparison of semantic representations on the full ImageNet benchmark as well as a comparison of metrics used in generalized ZSL. Our code and data are publicly available at https://github.com/pujols/Zero-shot-learning-journal.

Journal Article

Semantic Contrastive Embedding for Generalized Zero-Shot Learning

by Han, Zongyan , Chen, Shuo , Fu, Zhenyong in Datasets , Embedding , Object recognition

2022

Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes when only the labeled examples from seen classes are provided. Recent feature generation methods learn a generative model that can synthesize the missing visual features of unseen classes to mitigate the data-imbalance problem in GZSL. However, the original visual feature space is suboptimal for GZSL recognition since it lacks semantic information, which is vital for recognizing the unseen classes. To tackle this issue, we propose to integrate the feature generation model with an embedding model. Our GZSL framework maps both the real and the synthetic samples produced by the generation model into an embedding space, where we perform the final GZSL classification. Specifically, we propose a semantic contrastive embedding (SCE) for our GZSL framework. Our SCE consists of attribute-level contrastive embedding and class-level contrastive embedding. They aim to obtain the transferable and discriminative information, respectively, in the embedding space. We evaluate our GZSL method with semantic contrastive embedding, named SCE-GZSL, on four benchmark datasets. The results show that our SCE-GZSL method can achieve the state-of-the-art or the second-best on these datasets.

Journal Article

Semantics-Guided Intra-Category Knowledge Transfer for Generalized Zero-Shot Learning

by Wang, Yu-Chiang Frank , Lee, Yuan-Hao , Lin, Chia-Ching in Datasets , Deep learning , Hallucinations

2023

Zero-shot learning (ZSL) requires one to associate visual and semantic information observed from data of seen classes, so that test data of unseen classes can be recognized based on the described semantic representation. Hallucination-based ZSL approaches, which aim to synthesize visual data from the given semantic inputs, may suffer from mode collapse and bias because they cannot adequately model the desirable visual features for unseen categories. In this paper, we present a generative model, Cross-Modal Consistency GAN (CMC-GAN), which performs semantics-guided intra-category knowledge transfer across image categories, so that data hallucination for unseen classes can be achieved with proper semantics and sufficient visual diversity. In our experiments, we perform standard and generalized ZSL on four benchmark datasets, confirming the effectiveness of our approach over that of state-of-the-art ZSL methods.

Journal Article

Generalized zero-shot emotion recognition from body gestures

by Wu, Jinting , Zhang, Yujia , Zhao, Xiaoguang in Categories , Coders , Emotion recognition

2022

In human-human interaction, body language is one of the most important forms of emotional expression. However, each emotion category contains abundant emotional body gestures, and the basic emotions used in most studies can hardly describe complex and diverse emotional states. It is costly to collect sufficient samples of all emotional expressions, and new emotions or new body gestures that are not included in the training set may appear during testing. To address the above problems, we design a novel mechanism that treats each emotion category as a collection of multiple body gesture categories to make better use of gesture information for emotion recognition. A Generalized Zero-Shot Learning (GZSL) framework is introduced to recognize both seen and unseen body gesture categories with the help of semantic information, and emotion predictions are further provided based on the relationship between gestures and emotions. This framework consists of two branches. The first branch is a Hierarchical Prototype Network (HPN), which learns the prototypes of body gestures and uses them to calculate the emotion attentive prototypes. This branch aims to obtain predictions on samples of the seen gesture categories. The second branch is a Semantic Auto-Encoder (SAE), which utilizes semantic representations to predict samples of unseen gesture categories. Thresholds are further trained to determine which branch result will be used during testing, and the emotion labels are finally obtained from these results. Comprehensive experiments are conducted on an emotion recognition dataset that contains skeleton data of multiple body gestures, and the performance of our framework is superior to both the traditional emotion classifier and state-of-the-art zero-shot learning methods.

Journal Article

Zero-Shot Image Classification Based on a Learnable Deep Metric

by Shi, Caijuan , Tu, Dongjing , Shi, Ze in Algorithms , Classification , common space embedding

2021

Supervised models based on deep learning have achieved great success in image classification when trained with a large number of labeled samples. However, in practice many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces the dependence of image classification models on labeled training samples. Nevertheless, learning the similarity between visual features and semantic features with a predefined fixed metric (e.g., Euclidean distance) has limitations, and the mapping process suffers from a semantic gap. To address these problems, a new zero-shot image classification method based on an end-to-end learnable deep metric is proposed in this paper. First, common space embedding is adopted to map the visual features and semantic features into a common space. Second, an end-to-end learnable deep metric, namely a relation network, is used to learn the similarity between visual features and semantic features. Finally, the unseen images are classified according to the similarity score. Extensive experiments are carried out on four datasets and the results indicate the effectiveness of the proposed method.
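
The three steps in this abstract (common-space embedding, a learned relation score, classification by highest score) can be illustrated with a small forward pass. This is only a sketch under stated assumptions: the layer sizes, the random untrained weights, and every name below (`relation_score`, `classify`, `random_layer`, the toy class names) are hypothetical and are not taken from the paper, which does not give these details here.

```python
import math
import random


def linear(x, weights, bias):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights; a real model would learn these end to end."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def relation_score(visual, semantic, params):
    """Embed both inputs in a common space, then score the pair with a small MLP."""
    v = relu(linear(visual, *params["visual_embed"]))
    s = relu(linear(semantic, *params["semantic_embed"]))
    hidden = relu(linear(v + s, *params["relation_hidden"]))  # v + s concatenates the lists
    return sigmoid(linear(hidden, *params["relation_out"])[0])


def classify(visual, class_semantics, params):
    """Pick the class whose semantic vector gets the highest learned similarity."""
    scores = {name: relation_score(visual, sem, params) for name, sem in class_semantics.items()}
    return max(scores, key=scores.get), scores


rng = random.Random(0)
params = {
    "visual_embed": random_layer(rng, 4, 6),    # 6-d visual feature -> 4-d common space
    "semantic_embed": random_layer(rng, 4, 3),  # 3-d attribute vector -> 4-d common space
    "relation_hidden": random_layer(rng, 5, 8),
    "relation_out": random_layer(rng, 1, 5),
}
class_semantics = {"zebra": [1.0, 0.0, 1.0], "whale": [0.0, 1.0, 0.0]}
image_feature = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]
predicted, scores = classify(image_feature, class_semantics, params)
```

With untrained weights the prediction itself is meaningless; the point is that the similarity function is a parameterized network rather than a fixed distance, so it can be trained together with the embeddings.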

Journal Article

Dual-level contrastive learning network for generalized zero-shot learning

by Wu, Jigang , Liu, Jigang , Guan, Jiaqi in Artificial Intelligence , Classification , Computer Graphics

2022

Generalized zero-shot learning (GZSL) aims to utilize semantic information to recognize the seen and unseen samples, where unseen classes are unavailable during training. Though recent advances have been made by incorporating contrastive learning into GZSL, existing approaches still suffer from two limitations: (1) without considering fine-grained cluster structures, these models cannot guarantee the discriminability and semantic awareness of synthetic features; (2) classifiers tend to overfit the seen classes, as they only concentrate on the seen domain. To address these challenges, we propose a Dual-level Contrastive Learning Network (DCLN), in which intra-domain and cross-domain contrastive learning are seamlessly integrated into a unified learning model. Specifically, the former performs center-prototype contrasting to fully explore the discriminative structure knowledge, while the latter is proposed to effectively alleviate the overfitting problem by utilizing the semantic relationships between the seen and unseen domain. Finally, the experimental results on four benchmark datasets demonstrate the superiority of our DCLN over the state-of-the-art methods.
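
As a rough illustration of contrasting a feature against class centers, the sketch below uses a generic InfoNCE-style loss over cosine similarities to per-class prototypes. It is an assumption about the general technique, not the DCLN formulation; the function names, the temperature value, and the toy prototypes are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over similarities: pull toward the own-class prototype, push from the rest."""
    logits = {c: cosine(feature, p) / temperature for c, p in prototypes.items()}
    peak = max(logits.values())
    # Numerically stable log-sum-exp over all class prototypes.
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return log_denominator - logits[label]


prototypes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
feature = [1.0, 0.0]
loss_correct = prototype_contrastive_loss(feature, "a", prototypes)
loss_wrong = prototype_contrastive_loss(feature, "b", prototypes)
```

A feature that sits on its own class prototype yields a loss near zero, while the same feature labeled with the other class yields a large loss, which is the behavior a prototype-based contrastive objective relies on.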

Journal Article

Vision transformer-based generalized zero-shot learning with data criticizing

by Zhou, Quan , Zhang, Zhenqi , Liang, Yucuan in Artificial Intelligence , Bias , Classification

2025

Generalized Zero-Shot Learning (GZSL) aims to enable accurate testing and recognition of unseen classes by utilizing training data from seen classes and leveraging attribute knowledge. However, GZSL faces a challenge wherein the model, trained solely on seen class data, tends to be biased towards recognizing visual features of seen classes, resulting in poorer recognition performance for unseen classes. To address this issue, we propose an approach called Vision Transformer-Based Generalized Zero-Shot Learning with Data Criticizing (ViT-DaCr). In order to obtain improved visual features, we thoroughly examine features extracted by Vision Transformer (ViT) with a new design. Additionally, we recognize that not all training data align with our model during the training process, leading the model to exhibit a bias towards recognizing visual features of seen classes and directly impacting visual feature recognition. Therefore, we propose a data critic mechanism that utilizes Adjusted Boxplot to filter out such data automatically during the training process. Extensive experiments demonstrate the advanced performance of our model on three challenging and popular datasets.

Journal Article

Learning semantic consistency for audio-visual zero-shot learning

by Chen, Yuling , Ruan, Xiaoli , Zhang, Wei in Artificial Intelligence , Audio data , Computer Science

2025

Audio-visual zero-shot learning requires an understanding of the relationship between audio and visual information to determine unseen classes. Despite many efforts and significant progress in the field, many existing methods tend to focus on learning strong representations, neglecting the semantic consistency between audio and video as well as the inherent hierarchical structure of the data. To address these issues, we propose Learning Semantic Consistency for Audio-Visual Zero-shot Learning. Specifically, we employ an attention mechanism to enhance cross-modal information interactions, aiming to capture the semantic consistency between audio and visual data. Meanwhile, we introduce a hyperbolic space to model the hierarchical structure of the data itself. Moreover, the proposed approach includes a novel loss function that considers the relationships between input modalities, reducing the distance between features of different modalities. To evaluate the proposed method, we test it on three benchmark datasets , , and . Extensive experimental results show that the proposed method achieves state-of-the-art performance on all three datasets. For example, on the dataset, the harmonic mean is improved by 5.7%. Code and data are available at https://github.com/ybyangjing/LSC-AVZSL.
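
The harmonic mean mentioned in this abstract is the usual summary metric in generalized zero-shot evaluation: it combines performance on seen and unseen classes so that a model cannot score well by favoring one side. A minimal sketch, assuming the standard definition H = 2·S·U / (S + U); the function name and the sample numbers are illustrative and are not results from the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy (both in [0, 1])."""
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0  # both accuracies are zero, so the combined score is zero
    return 2 * seen_acc * unseen_acc / total


# Illustrative values only: both pairs have the same arithmetic mean (0.60).
balanced = harmonic_mean(0.60, 0.60)
skewed = harmonic_mean(0.90, 0.30)
```

Here the balanced model keeps a harmonic mean of 0.60, while the model skewed toward seen classes drops to 0.45, which is why the metric is preferred over a plain average in this setting.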

Journal Article
