The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
by
Ren, Richard
, Barrass, Isabelle
, Yin, Xuwang
, Trevino, Eduardo
, Menghini, Cristina
, Geralnik, Matias
, Yang, Mick
, Kenstler, Brad
, Agarwal, Arunim
, Gatti, Alice
, Mazeika, Mantas
, Vacareanu, Robert
, Lee, Dean
, Khoja, Adam
, Hendrycks, Dan
, Yue, Summer
in
Accuracy
/ Benchmarks
/ Honesty
/ Large language models
2026
Paper
Overview
As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. However, some benchmarks claiming to measure honesty in fact simply measure accuracy (the correctness of a model's beliefs) in disguise. Moreover, no benchmarks currently exist for directly measuring whether language models lie. In this work, we introduce a large-scale human-collected dataset for directly measuring lying, allowing us to disentangle accuracy from honesty. Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, most frontier LLMs obtain high scores on truthfulness benchmarks yet exhibit a substantial propensity to lie under pressure, resulting in low honesty scores on our benchmark. We find that simple methods, such as representation engineering interventions, can improve honesty. These results underscore the growing need for robust evaluations and effective interventions to ensure LLMs remain trustworthy.
Publisher
Cornell University Library, arXiv.org