Catalogue Search | MBRL
Explore the vast range of titles available.
105 result(s) for "Tzimiropoulos, Georgios"
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models
2024
Soft prompt learning has emerged as a promising direction for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering large accuracy degradation when tested on unseen classes from the same domain. In addition, all prior methods operate exclusively under the assumption that both vision and language data are present. To this end, we make the following 5 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability that the learned prompts are correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we also propose grouped LASP, where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) Moreover, we identify a visual-language misalignment introduced by prompt learning and LASP, and, more importantly, propose a re-calibration mechanism to address it. (4) Importantly, we show that LASP is inherently amenable to including, during training, virtual classes, i.e., class names for which no visual samples are available, further increasing the robustness of the learned prompts. Expanding for the first time the setting to language-only adaptation, (5) we present a novel zero-shot variant of LASP where no visual samples at all are available for the downstream task. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Finally, (c) we show that our zero-shot variant improves upon CLIP without requiring any extra data. Code will be made available.
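The text-to-text loss in contribution (1) can be illustrated with a minimal sketch (function and variable names are hypothetical, and this is not the authors' implementation): each learned soft prompt's text embedding is scored against the fixed hand-crafted prompt embeddings with cosine similarity, and a cross-entropy term pushes each learned prompt to be classified as its own class.

```python
import numpy as np

def lasp_text_to_text_loss(learned_emb, handcrafted_emb, labels, temperature=0.07):
    """Sketch of a LASP-style text-to-text cross-entropy loss.

    learned_emb:     (C, D) text-encoder embeddings of the learned soft prompts,
                     one per class.
    handcrafted_emb: (C, D) text-encoder embeddings of fixed hand-crafted
                     prompts (e.g. "a photo of a {class}").
    labels:          (C,) ground-truth class index of each learned prompt.
    """
    # Cosine-similarity logits between learned and hand-crafted embeddings.
    learned = learned_emb / np.linalg.norm(learned_emb, axis=1, keepdims=True)
    fixed = handcrafted_emb / np.linalg.norm(handcrafted_emb, axis=1, keepdims=True)
    logits = learned @ fixed.T / temperature  # (C, C)
    # Numerically stable log-softmax over the hand-crafted "classes".
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Each learned prompt should be classified as its own class.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

In a real setting the two embedding matrices would come from the (frozen) text encoder of a CLIP-like model; the temperature value above is only a placeholder.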
Journal Article
Detecting Dairy Cow Behavior Using Vision Technology
by
Tzimiropoulos, Georgios
,
Bell, Matt J.
,
Down, Peter M.
in
agriculture
,
Animal welfare
,
Automation
2021
The aim of this study was to investigate using existing image recognition techniques to predict the behavior of dairy cows. A total of 46 individual dairy cows were monitored continuously under 24 h video surveillance prior to calving. The video was annotated for the behaviors of standing, lying, walking, shuffling, eating, drinking and contractions for each cow from 10 h prior to calving. A total of 19,191 behavior records were obtained, and a non-local neural network was trained and validated on video clips of each behavior. This study showed that the trained non-local network correctly classified the seven behaviors 80% or more of the time on the validation dataset. In particular, the detection of birth contractions was correctly predicted 83% of the time, which in itself can serve as an early-warning calving alert, as all cows start contractions several hours prior to giving birth. This approach to behavior recognition using video cameras can assist livestock management.
Journal Article
Fast Algorithms for Fitting Active Appearance Models to Unconstrained Images
by
Tzimiropoulos, Georgios
,
Pantic, Maja
in
Algorithms
,
Artificial Intelligence
,
Computer Imaging
2017
Fitting algorithms for Active Appearance Models (AAMs) are usually considered to be robust but slow, or fast but less able to generalize well to unseen variations. In this paper, we look into AAM fitting algorithms and make the following orthogonal contributions: We present a simple “project-out” optimization framework that unifies and revises the most well-known optimization problems and solutions in AAMs. Based on this framework, we describe robust simultaneous AAM fitting algorithms whose complexity is not prohibitive for current systems. We then go one step further and propose a new approximate project-out AAM fitting algorithm, which we coin Extended Project-Out Inverse Compositional (E-POIC). In contrast to current algorithms, E-POIC is both efficient and robust. Next, we describe a part-based AAM employing a translational motion model, which results in superior fitting and convergence properties. We also show that the proposed AAMs, when trained “in-the-wild” using SIFT descriptors, perform surprisingly well even for the case of unseen unconstrained images. Via a number of experiments on unconstrained human and animal face databases, we show that our combined contributions largely bridge the gap between exact and current approximate methods for AAM fitting and perform comparably with state-of-the-art face alignment systems.
Journal Article
Estimation of continuous valence and arousal levels from faces in naturalistic conditions
by
Kossaifi, Jean
,
Bulat, Adrian
,
Tzimiropoulos, Georgios
in
Alignment
2021
Facial affect analysis aims to create new types of human–computer interactions by enabling computers to better understand a person’s emotional state in order to provide ad hoc help and interactions. Since discrete emotional classes (such as anger, happiness, sadness and so on) are not representative of the full spectrum of emotions displayed by humans on a daily basis, psychologists typically rely on dimensional measures, namely valence (how positive the emotional display is) and arousal (how calming or exciting the emotional display looks). However, while estimating these values from a face is natural for humans, it is extremely difficult for computer-based systems, and automatic estimation of valence and arousal in naturalistic conditions is an open problem. Additionally, the subjectivity of these measures makes it hard to obtain good quality data. Here we introduce a novel deep neural network architecture to analyse facial affect in naturalistic conditions with a high level of accuracy. The proposed network integrates face alignment and jointly estimates both categorical and continuous emotions in a single pass, making it suitable for real-time applications. We test our method on three challenging datasets collected in naturalistic conditions and show that our approach outperforms all previous methods. We also discuss caveats regarding the use of this tool, and ethical aspects that must be considered in its application.
The annotation of the visual signs of emotions can be important for psychological studies and even human–computer interactions. Instead of only ascribing discrete emotions, Toisoul and colleagues use a single neural network that predicts emotional labels on a spectrum of valence and arousal without separate face-alignment steps.
Journal Article
Changes in Sheep Behavior before Lambing
by
Tzimiropoulos, Georgios
,
Bell, Matt J.
,
Waters, Beatrice E.
in
agriculture
,
Animal behavior
,
behavior
2021
The aim of this study was to assess the duration and frequency of behavioral observations of pregnant ewes as they approached lambing. An understanding of behavioral changes before birth may provide opportunities for enhanced visual monitoring at this critical stage in the animal’s life. Behavioral observations for 17 ewes in late pregnancy were recorded during two separate time periods, which were 4 to 6 weeks before lambing and before giving birth. It was normal farm procedure for the sheep to come indoors for 6 weeks of close monitoring before lambing. The behaviors of standing, lying, walking, shuffling and contraction behaviors were recorded for each animal during both time periods. Over both time periods, the ewes spent a large proportion of their time either lying (0.40) or standing (0.42), with a higher frequency of standing (0.40) and shuffling (0.28) bouts than other behaviors. In the time period before giving birth, the frequency of lying and contraction bouts increased and the standing and walking bouts decreased, with a higher frequency of walking bouts in ewes that had an assisted lambing. The monitoring of behavioral patterns, such as lying and contractions, could be used as an alert to the progress of parturition.
Journal Article
Changes in Dairy Cow Behavior with and without Assistance at Calving
2021
The aim of this study was to characterize calving behavior of dairy cows and to compare the duration and frequency of behaviors for assisted and unassisted dairy cows at calving. Behavioral data from nine hours prior to calving were collected for 35 Holstein-Friesian dairy cows. Cows were continuously monitored under 24 h video surveillance. The behaviors of standing, lying, walking, shuffling, eating, drinking and contractions were recorded for each cow until birth. A generalized linear mixed model was used to assess differences in the duration and frequency of behaviors prior to calving for assisted and unassisted cows. The nine hours prior to calving were assessed in three-hour time periods. The study found that the cows spent a large proportion of their time either lying (0.49) or standing (0.35), with a higher frequency of standing (0.36) and shuffling (0.26) bouts than other behaviors during the study. There were no differences in behavior between assisted and unassisted cows. During the three hours prior to calving, the duration and bouts of lying, including contractions, were higher than during other time periods. While changes in behavior failed to identify an association with calving assistance, the monitoring of behavioral patterns could be used as an alert to the progress of parturition.
Journal Article
Knowledge Distillation Meets Open-Set Semi-supervised Learning
by
Martinez, Brais
,
Bulat, Adrian
,
Yang, Jing
in
Analysis
,
Artificial Intelligence
,
Combinatorial analysis
2025
Existing knowledge distillation methods mostly focus on distillation of the teacher’s prediction and intermediate activations. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel semantic representational distillation (SRD) method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student. The key idea is that we leverage the teacher’s classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions. This is accomplished by introducing a notion of cross-network logit, computed by passing the student’s representation into the teacher’s classifier. Further, considering the set of seen classes as a basis for the semantic space in a combinatorial perspective, we scale SRD to unseen classes, enabling effective exploitation of largely available, arbitrary unlabeled training data. At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL). Extensive experiments show that our SRD significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks, as well as the less studied yet practically crucial task of binary network distillation. Under the more realistic open-set SSL settings we introduce, we reveal that knowledge distillation is generally more effective than existing out-of-distribution sample detection, and our proposed SRD is superior to both previous distillation and SSL competitors. The source code is available at https://github.com/jingyang2017/SRD_ossl.
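The "cross-network logit" idea, passing the student's representation through the frozen teacher's classifier so that both networks are judged in the same semantic space, can be sketched as follows. All names are hypothetical, and the softened-KL objective below is a common distillation choice standing in for the paper's exact loss:

```python
import numpy as np

def cross_network_logits(features, teacher_W, teacher_b):
    """Logits from passing (student or teacher) features through the teacher's
    linear classifier with weights teacher_W (C, D) and bias teacher_b (C,)."""
    return features @ teacher_W.T + teacher_b

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def srd_loss(student_feat, teacher_feat, teacher_W, teacher_b, tau=4.0):
    """KL divergence between the teacher's and student's softened class
    distributions, both computed through the teacher's classifier."""
    zs = cross_network_logits(student_feat, teacher_W, teacher_b) / tau
    zt = cross_network_logits(teacher_feat, teacher_W, teacher_b) / tau
    pt = np.exp(log_softmax(zt))
    # KL(teacher || student), averaged over the batch.
    return (pt * (log_softmax(zt) - log_softmax(zs))).sum(axis=1).mean()
```

The crucial detail the abstract describes is that the student's features are scored by the *teacher's* classifier, so the distillation signal carries the teacher's semantic structure over all feature dimensions.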
Journal Article
Euler Principal Component Analysis
by
Tzimiropoulos, Georgios
,
Liwicki, Stephan
,
Zafeiriou, Stefanos
in
Algorithms
,
Analysis
,
Artificial Intelligence
2013
Principal Component Analysis (PCA) is perhaps the most prominent learning tool for dimensionality reduction in pattern recognition and computer vision. However, the ℓ2-norm employed by standard PCA is not robust to outliers. In this paper, we propose a kernel PCA method for fast and robust PCA, which we call Euler-PCA (e-PCA). In particular, our algorithm utilizes a robust dissimilarity measure based on the Euler representation of complex numbers. We show that Euler-PCA retains PCA’s desirable properties while suppressing outliers. Moreover, we formulate Euler-PCA in an incremental learning framework which allows for efficient computation. In our experiments we apply Euler-PCA to three different computer vision applications for which our method performs comparably with other state-of-the-art approaches.
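The Euler representation maps each normalized pixel intensity x ∈ [0, 1] onto the complex unit circle, z = (1/√n)·e^{iαπx}, and PCA is then performed on the complex-valued data. A minimal sketch (hypothetical names, not the authors' code, and the value of α is only a placeholder):

```python
import numpy as np

def euler_representation(X, alpha=1.9):
    """Map data in [0, 1] onto the complex unit circle:
    z = (1 / sqrt(n)) * exp(i * alpha * pi * x), with n features per sample."""
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(X.shape[1])

def euler_pca(X, n_components=2, alpha=1.9):
    """PCA in the complex Euler-mapped space; large intensity outliers can only
    move a point around the unit circle, which bounds their influence."""
    Z = euler_representation(X, alpha)
    Zc = Z - Z.mean(axis=0, keepdims=True)
    # Eigen-decomposition of the complex (Hermitian) covariance matrix.
    C = Zc.conj().T @ Zc / len(Zc)
    vals, vecs = np.linalg.eigh(C)               # ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    return Z @ vecs[:, top]                      # complex principal components
```

This batch version only illustrates the robust mapping; the incremental formulation mentioned in the abstract replaces the eigen-decomposition with an efficient online update.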
Journal Article
CLIPCleaner: Cleaning Noisy Labels with CLIP
2024
Learning with Noisy Labels (LNL) poses a significant challenge for the Machine Learning community. Some of the most widely used approaches, which select as clean those samples for which the model itself (the in-training model) has high confidence (e.g., ‘small loss’), can suffer from the so-called ‘self-confirmation’ bias. This bias arises because the in-training model is at least partially trained on the noisy labels. Furthermore, in the classification case, an additional challenge arises because some of the label noise is between classes that are visually very similar (‘hard noise’). This paper addresses these challenges by proposing a method (CLIPCleaner) that leverages CLIP, a powerful Vision-Language (VL) model, for constructing a zero-shot classifier for efficient, offline, clean sample selection. This has the advantage that the sample selection is decoupled from the in-training model and that the sample selection is aware of the semantic and visual similarities between the classes due to the way that CLIP is trained. We provide theoretical justifications and empirical evidence to demonstrate the advantages of CLIP for LNL compared to conventional pre-trained models. Compared to current methods that combine iterative sample selection with various techniques, CLIPCleaner offers a simple, single-step approach that achieves competitive or superior performance on benchmark datasets. To the best of our knowledge, this is the first time a VL model has been used for sample selection to address the problem of Learning with Noisy Labels (LNL), highlighting its potential in the domain.
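The kind of zero-shot, offline selection described here can be sketched as below. The names and the threshold rule are hypothetical, and real usage would compute the embeddings with an actual CLIP model rather than the random stand-ins in the test:

```python
import numpy as np

def select_clean(image_emb, class_text_emb, noisy_labels,
                 threshold=0.5, temperature=0.01):
    """Zero-shot clean-sample selection in the spirit of CLIPCleaner (sketch).

    image_emb:      (N, D) image embeddings, assumed precomputed by a VL model.
    class_text_emb: (C, D) text embeddings of the class prompts.
    noisy_labels:   (N,) possibly-noisy integer labels.
    Returns a boolean mask keeping samples whose zero-shot probability of the
    given label exceeds `threshold`.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (N, C) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    p_label = probs[np.arange(len(noisy_labels)), noisy_labels]
    return p_label > threshold
```

Because the selector never sees the in-training model, it cannot inherit the self-confirmation bias the abstract describes, and the text prompts carry the inter-class semantic similarity that helps with ‘hard noise’.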