Catalogue Search | MBRL
Explore the vast range of titles available.
68 result(s) for "Schramowski, Patrick"
Large pre-trained language models contain human-like biases of what is right and wrong to do
by Kersting, Kristian; Rothkopf, Constantin A.; Turan, Cigdem
in 4007/4009; 639/705/117; Artificial intelligence
2022
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended the state of the art for many natural language processing tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerated and biased behaviour. While this is well established, we show here that recent LMs also contain human-like biases of what is right and wrong to do, reflecting existing ethical and moral norms of society. We show that these norms can be captured geometrically by a ‘moral direction’ which can be computed, for example, by a PCA, in the embedding space. The computed ‘moral direction’ can rate the normativity (or non-normativity) of arbitrary phrases without explicitly training the LM for this task, reflecting social norms well. We demonstrate that computing the ‘moral direction’ can provide a path for attenuating or even preventing toxic degeneration in LMs, showcasing this capability on the RealToxicityPrompts testbed.
Large language models identify patterns in the relations between words and capture these relations in an embedding space. Schramowski and colleagues show that a direction in this space can be identified that separates ‘right’ and ‘wrong’ actions as judged by human survey participants.
Journal Article
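As a rough illustration of the geometric idea in the abstract above, the following sketch computes a "moral direction" as the first principal component of sentence embeddings of normative and non-normative phrases and scores new phrases by projection onto it. The encoder name and the example phrases are illustrative assumptions; the paper uses its own prompt templates and sentence encoder.

```python
# Minimal sketch: estimate a "moral direction" in a sentence-embedding space via PCA
# and use it to score new phrases. Model name and example phrases are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer  # assumed available

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works in principle

# Toy phrases with clear normative polarity (not the paper's templates)
phrases = [
    "You should help people in need.",
    "You should be honest.",
    "You should greet your friends.",
    "You should kill people.",
    "You should steal money.",
    "You should lie to your parents.",
]
emb = model.encode(phrases)                 # (n_phrases, dim)

# The first principal component of these embeddings serves as the moral direction.
# Note: the sign of a PCA component is arbitrary; flip moral_dir if a known
# normative phrase scores negative.
pca = PCA(n_components=1)
pca.fit(emb)
moral_dir = pca.components_[0]              # (dim,)

def moral_score(sentence: str) -> float:
    """Project a phrase onto the moral direction; the sign indicates normativity."""
    v = model.encode([sentence])[0]
    return float(np.dot(v - pca.mean_, moral_dir))

print(moral_score("You should smile at strangers."))
print(moral_score("You should harm animals."))
```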
Making deep neural networks right for the right scientific reasons by interacting with their explanations
by Kersting, Kristian; Stammer, Wolfgang; Shao, Xiaoting
in 631/114/1305; 631/449; Active learning
2020
Deep neural networks have demonstrated excellent performance in many real-world applications. Unfortunately, they may show Clever Hans-like behaviour (making use of confounding factors within datasets) to achieve high performance. In this work we introduce the novel learning setting of explanatory interactive learning and illustrate its benefits on a plant phenotyping research task. Explanatory interactive learning adds the scientist to the training loop, where they interactively revise the original model by providing feedback on its explanations. Our experimental results demonstrate that explanatory interactive learning can help to avoid Clever Hans moments in machine learning and encourages (or discourages, if appropriate) trust in the underlying model.
Deep learning approaches can show excellent performance but still have limited practical use if they learn to predict based on confounding factors in a dataset, for instance text labels in the corner of images. By using an explanatory interactive learning approach, with a human expert in the loop during training, it becomes possible to avoid predictions based on confounding factors.
Journal Article
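The record above centres on explanatory interactive learning, where a human corrects the model by critiquing its explanations. A minimal sketch of one way such feedback can enter training is a loss term that penalizes input-gradient explanations inside regions the user marked as confounders. The function below assumes a PyTorch image classifier and a user-provided mask; it is an illustrative "right for the right reasons"-style penalty, not the paper's exact formulation.

```python
# Sketch: penalize explanations (input gradients) on user-marked confounding regions,
# in the spirit of explanatory interactive learning. Everything here is illustrative.
import torch
import torch.nn.functional as F

def xil_loss(model, x, y, forbidden_mask, lambda_expl=10.0):
    """
    x: (B, C, H, W) input batch, y: (B,) integer labels,
    forbidden_mask: (B, 1, H, W) with 1 where the user says "do not use this region".
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradients of the true-class log-probabilities serve as a simple explanation.
    log_probs = F.log_softmax(logits, dim=1)
    selected = log_probs.gather(1, y.unsqueeze(1)).sum()
    (grads,) = torch.autograd.grad(selected, x, create_graph=True)

    # Penalize explanation mass that falls inside the forbidden regions.
    expl_penalty = (forbidden_mask * grads.pow(2)).sum() / x.shape[0]
    return ce + lambda_expl * expl_penalty
```

With `create_graph=True` the penalty itself is differentiable, so the optimizer can push the model's explanation away from the confounding regions during normal training.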
Extending Hyperspectral Imaging for Plant Phenotyping to the UV-Range
by Kersting, Kristian; Kuska, Matheus Thomas; Brugger, Anna
in Abiotic stress; Amino acids; Apoptosis
2019
Previous plant phenotyping studies have focused on the visible (VIS, 400–700 nm), near-infrared (NIR, 700–1000 nm) and short-wave infrared (SWIR, 1000–2500 nm) range. The ultraviolet range (UV, 200–380 nm) has not yet been used in plant phenotyping even though a number of plant molecules like flavones and phenol feature absorption maxima in this range. In this study an imaging UV line scanner in the range of 250–430 nm is introduced to investigate crop plants for plant phenotyping. Observing plants in the UV-range can provide information about important changes of plant substances. To record reliable and reproducible time series results, measurement conditions were defined that exclude phototoxic effects of UV-illumination in the plant tissue. The measurement quality of the UV-camera has been assessed by comparing it to a non-imaging UV-spectrometer by measuring six different plant-based substances. Given the findings of these preliminary studies, an experiment has been defined and performed monitoring the stress response of barley leaves to salt stress. The aim was to visualize the effects of abiotic stress within the UV-range to provide new insights into the stress response of plants. Our study demonstrated the first use of a hyperspectral sensor in the UV-range for stress detection in plant phenotyping.
Journal Article
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
2023
Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amount of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters in the textual description, common models reflect cultural biases in their generated images. We analyze this behavior both qualitatively and quantitatively and identify a model’s text encoder as the root cause of the phenomenon. Such behavior can be interpreted as a model feature, offering users a simple way to customize the image generation and reflect their own cultural background. Yet, malicious users or service providers may also try to intentionally bias the image generation. One goal might be to create racist stereotypes by replacing Latin characters with similarly-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.
Journal Article
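To make the homoglyph manipulation described above concrete, here is a minimal sketch that swaps Latin characters in a prompt for Cyrillic look-alikes and compares the CLIP text embeddings of the clean and manipulated prompts. The tiny homoglyph table, the model name and the prompt are illustrative assumptions rather than the paper's setup.

```python
# Sketch: replace Latin characters in a prompt with visually identical non-Latin
# homoglyphs and compare the resulting CLIP text embeddings.
import torch
from transformers import CLIPModel, CLIPProcessor

HOMOGLYPHS = {"a": "\u0430", "o": "\u043e", "e": "\u0435"}  # Cyrillic а, о, е

def inject_homoglyphs(text: str, char: str) -> str:
    """Swap every occurrence of one Latin character for its homoglyph."""
    return text.replace(char, HOMOGLYPHS[char])

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a photo of an actress"
variants = [prompt] + [inject_homoglyphs(prompt, c) for c in HOMOGLYPHS]

inputs = processor(text=variants, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)

# Cosine similarity of each manipulated prompt to the clean prompt; large drops
# indicate the text encoder treats the homoglyph prompt very differently.
for text, sim in zip(variants[1:], emb[1:] @ emb[0]):
    print(f"{sim.item():.3f}  {text!r}")
```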
Does CLIP Know My Face?
2024
With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introduce a novel method to assess privacy for multi-modal models, specifically vision-language models like CLIP. The proposed Identity Inference Attack (IDIA) reveals whether an individual was included in the training data by querying the model with images of the same person. Letting the model choose from a wide variety of possible text labels, the model reveals whether it recognizes the person and, therefore, was used for training. Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws. This article appears in the AI & Society track.
Journal Article
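A minimal sketch of the identity-inference idea described above: query CLIP with several images of one person against a pool of candidate name prompts and count how often the true name wins. File paths, names, the prompt template and the decision rule are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of an identity-inference-style query against CLIP: the model picks a name
# for each image of a person; consistently correct picks suggest the person appeared
# in the training data. Paths, names, prompts and the decision rule are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_names = ["Jane Doe", "John Smith", "Alex Example"]    # large pool in practice
prompts = [f"a photo of {name}" for name in candidate_names]
image_paths = ["person_1.jpg", "person_2.jpg", "person_3.jpg"]  # images of one person
true_name = "Jane Doe"

hits = 0
for path in image_paths:
    image = Image.open(path)
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, n_names)
    predicted = candidate_names[int(logits.argmax(dim=-1))]
    hits += predicted == true_name

# If the true name is chosen far more often than chance, the attack flags membership.
print(f"correct identifications: {hits}/{len(image_paths)}")
```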
A typology for exploring the mitigation of shortcut behaviour
by Kersting, Kristian; Stammer, Wolfgang; Friedrich, Felix
in 639/705; 639/705/117; Active learning
2023
As machine learning models become larger, and are increasingly trained on large and uncurated datasets in weakly supervised mode, it becomes important to establish mechanisms for inspecting, interacting with and revising models. These are necessary to mitigate shortcut learning effects and to guarantee that the model’s learned knowledge is aligned with human knowledge. Recently, several explanatory interactive machine learning methods have been developed for this purpose, but each has different motivations and methodological details. In this work, we provide a unification of various explanatory interactive machine learning methods into a single typology by establishing a common set of basic modules. We discuss benchmarks and other measures for evaluating the overall abilities of explanatory interactive machine learning methods. With this extensive toolbox, we systematically and quantitatively compare several explanatory interactive machine learning methods. In our evaluations, all methods are shown to improve machine learning models in terms of accuracy and explainability. However, we found remarkable differences in individual benchmark tasks, which reveal valuable application-relevant aspects for the integration of these benchmarks in the development of future methods.
Explanatory interactive machine learning methods have been developed to facilitate the learning process between the machine and the user. Friedrich et al. provide a unification of various explanatory interactive machine learning methods into a single typology, and present benchmarks for evaluating such methods.
Journal Article
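The typology above frames explanatory interactive machine learning methods as combinations of a few recurring modules. The sketch below only makes that shared loop explicit (select examples, explain predictions, obtain user feedback, revise the model); the callables are placeholders for illustration, not APIs or module names taken from the paper.

```python
# Sketch of a generic explanatory interactive learning loop built from exchangeable
# modules. All callables are placeholders; concrete methods differ in how each
# module is realized.
from typing import Any, Callable, Iterable

def xil_loop(model: Any,
             pool: Iterable,
             select: Callable,   # choose examples worth showing to the user
             explain: Callable,  # produce an explanation for the model's prediction
             obtain: Callable,   # collect the user's feedback on that explanation
             revise: Callable,   # update the model using prediction + feedback
             rounds: int = 10) -> Any:
    for _ in range(rounds):
        batch = select(model, pool)
        for example in batch:
            explanation = explain(model, example)
            feedback = obtain(example, explanation)   # e.g. "ignore this region"
            model = revise(model, example, feedback)
    return model
```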
Explanatory Interactive Machine Learning
by Rohde, Gernot; Kersting, Kristian; Stammer, Wolfgang
in Artificial intelligence; Explainable artificial intelligence; Machine learning
2023
The most promising standard machine learning methods can deliver highly accurate classification results, often outperforming standard white-box methods. However, it is hardly possible for humans to fully understand the rationale behind the black-box results, and thus, these powerful methods hamper the creation of new knowledge on the part of humans and the broader acceptance of this technology. Explainable Artificial Intelligence attempts to overcome this problem by making the results more interpretable, while Interactive Machine Learning integrates humans into the process of insight discovery. The paper builds on recent successes in combining these two cutting-edge technologies and proposes how Explanatory Interactive Machine Learning (XIL) is embedded in a generalizable Action Design Research (ADR) process – called XIL-ADR. This approach can be used to analyze data, inspect models, and iteratively improve them. The paper shows the application of this process using the diagnosis of viral pneumonia, e.g., Covid-19, as an illustrative example. By these means, the paper also illustrates how XIL-ADR can help identify shortcomings of standard machine learning projects, gain new insights on the part of the human user, and thereby can help to unlock the full potential of AI-based systems for organizations and research.
Journal Article
Machine Learning Assisted Pattern Matching: Insight into Oxide Electronic Device Performance by Phase Determination in 4D-STEM Datasets
by Zintler, Alexander; Eilhardt, Robert; Kersting, Kristian
in Four-dimensional Scanning Transmission Electron Microscopy (4D-STEM): New Experiments and Data Analyses for Determining Materials Functionality and Biological Structures; Learning algorithms; Machine learning
2020
Journal Article
Does CLIP Know My Face?
by Kersting, Kristian; Brack, Manuel; Hintersdorf, Dominik
in Data points; Deep learning; Inference
2024
With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introduce a novel method to assess privacy for multi-modal models, specifically vision-language models like CLIP. The proposed Identity Inference Attack (IDIA) reveals whether an individual was included in the training data by querying the model with images of the same person. Letting the model choose from a wide variety of possible text labels, the model reveals whether it recognizes the person and, therefore, was used for training. Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws.
Inferring Offensiveness In Images From Natural Language Supervision
by Kersting, Kristian; Schramowski, Patrick
in Computer vision; Datasets; Natural language processing
2021
Probing or fine-tuning (large-scale) pre-trained models results in state-of-the-art performance for many NLP tasks and, more recently, even for computer vision tasks when combined with image data. Unfortunately, these approaches also entail severe risks. In particular, large image datasets automatically scraped from the web may contain derogatory terms as categories and offensive images, and may also underrepresent specific classes. Consequently, there is an urgent need to carefully document datasets and curate their content. Unfortunately, this process is tedious and error-prone. We show that pre-trained transformers themselves provide a methodology for the automated curation of large-scale vision datasets. Based on human-annotated examples and the implicit knowledge of a CLIP based model, we demonstrate that one can select relevant prompts for rating the offensiveness of an image. In addition to e.g. privacy violation and pornographic content previously identified in ImageNet, we demonstrate that our approach identifies further inappropriate and potentially offensive content.
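One simple way to realize the prompt-based rating sketched in the abstract above is zero-shot classification of an image against a small set of curated text prompts with CLIP. The prompts, decision threshold, model name and file path below are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch: rate an image's offensiveness with CLIP zero-shot prompts, in the spirit of
# the approach above. Prompts, threshold, model name and path are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "an offensive or disturbing image",
    "a harmless everyday image",
]

image = Image.open("candidate.jpg")  # image to be screened (placeholder path)
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)  # (1, 2)

offensive_prob = float(probs[0, 0])
print(f"flag for manual review: {offensive_prob > 0.5} (p={offensive_prob:.2f})")
```

In practice such scores would only pre-sort candidates for human curation; the threshold and prompt wording would need calibration on annotated examples.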