Asset Details
Large pre-trained language models contain human-like biases of what is right and wrong to do
by Kersting, Kristian; Rothkopf, Constantin A.; Turan, Cigdem; Andersen, Nico; Schramowski, Patrick
in 4007/4009 / 639/705/117 / Artificial intelligence / Bias / Computation / Degeneration / Embedding / Engineering / Ethical standards / Ethics / Knowledge / Language / Morality / Natural language processing / Norms / Values
2022
Journal Article
Overview
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended the state of the art for many natural language processing tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerated and biased behaviour. While this is well established, we show here that recent LMs also contain human-like biases of what is right and wrong to do, reflecting existing ethical and moral norms of society. We show that these norms can be captured geometrically by a ‘moral direction’ which can be computed, for example, by a PCA, in the embedding space. The computed ‘moral direction’ can rate the normativity (or non-normativity) of arbitrary phrases without explicitly training the LM for this task, reflecting social norms well. We demonstrate that computing the ‘moral direction’ can provide a path for attenuating or even preventing toxic degeneration in LMs, showcasing this capability on the RealToxicityPrompts testbed.
Large language models identify patterns in the relations between words and capture their relations in an embedding space. Schramowski and colleagues show that a direction in this space can be identified that separates ‘right’ and ‘wrong’ actions as judged by human survey participants.
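The idea of a 'moral direction' can be illustrated with a small sketch: embed contrasting normative and non-normative prompts with a sentence encoder, take the first principal component of those embeddings as the moral axis, and score new phrases by projecting them onto it. This is a minimal illustration under stated assumptions, not the authors' released code; the encoder name (all-MiniLM-L6-v2), the tiny prompt lists, and the helper normativity_score are illustrative choices, whereas the paper uses a Sentence-BERT style encoder and a much larger, carefully constructed template set.

```python
# Minimal sketch (not the authors' implementation): estimate a "moral direction"
# by running PCA over sentence embeddings of normative vs. non-normative prompts,
# then rate new phrases by projecting onto that axis.
import numpy as np
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

# Assumption: any sentence encoder works here; this model name is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Tiny contrasting prompt lists (the paper uses a larger template set).
positive = ["You should help people.", "You should be honest.", "You should smile to strangers."]
negative = ["You should kill people.", "You should steal money.", "You should lie to others."]

emb = model.encode(positive + negative)      # shape: (n_phrases, embedding_dim)
pca = PCA(n_components=1).fit(emb)           # first principal component ~ moral axis
direction = pca.components_[0]

# Orient the axis so that normative phrases receive positive scores.
if emb[: len(positive)].dot(direction).mean() < emb[len(positive):].dot(direction).mean():
    direction = -direction

def normativity_score(phrase: str) -> float:
    """Project a phrase onto the moral direction; higher means 'more normative'."""
    v = model.encode([phrase])[0]
    return float((v - pca.mean_).dot(direction))

print(normativity_score("You should thank your colleagues."))  # expected: positive
print(normativity_score("You should harm animals."))           # expected: negative
```

In the paper this projection is then used not only to rate phrases but also to steer generation away from non-normative completions, which is how the authors attenuate toxic degeneration on RealToxicityPrompts.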