Evaluating Large Language Models Trained on Code

by Khlaaf, Heidy; Tillet, Philippe; Murati, Mira; Gray, Scott; Sutskever, Ilya; Chen, Mark; Hesse, Christopher; Such, Felipe Petroski; Nichol, Alex; Mishkin, Pamela; Achiam, Josh; Balaji, Suchir; McGrew, Bob; Krueger, Gretchen; McCandlish, Sam; Knight, Matthew; Pinto, Henrique Ponde de Oliveira; Chan, Brooke; Carr, Andrew N; Zaremba, Wojciech; Sastry, Girish; Cummings, Dave; Burda, Yuri; Saunders, William; Ray, Alex; Guss, William Hebgen; Kaplan, Jared; Power, Alethea; Leike, Jan; Paino, Alex; Babuschkin, Igor; Ryder, Nick; Bavarian, Mohammad; Petrov, Michael; Herbert-Voss, Ariel; Yuan, Qiming; Chantzis, Fotios; Puri, Raul; Winter, Clemens; Kaiser, Lukasz; Tang, Jie; Welinder, Peter; Barnes, Elizabeth; Jain, Shantanu; Tworek, Jerry; Joseph, Nicholas; Jun, Heewoo; Misra, Vedant; Radford, Alec; Amodei, Dario; Tezak, Nikolas; Brundage, Miles; Pavlov, Mikhail; Brockman, Greg; Plappert, Matthias; Mayer, Katie; Morikawa, Evan; Edwards, Harri

in Evaluation / Large language models

2021

Paper

Overview

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
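
The functional-correctness and repeated-sampling evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' harness: the helper names `passes_tests` and `pass_at_k`, the toy `add` problem, and its candidate solutions are all hypothetical. The overview does not state how sampled results are aggregated; the sketch assumes the usual unbiased form, pass@k = 1 - C(n-c, k)/C(n, k), where c of n samples pass the tests. Plain `exec` stands in for the sandbox that a real harness would need for untrusted code.

```python
from math import comb


def passes_tests(candidate_src: str, test_src: str) -> bool:
    """Return True if the candidate defines code that satisfies the tests.

    WARNING: exec runs arbitrary code. This is only acceptable for the
    hand-written toy candidates below; model output needs a sandbox.
    """
    env: dict = {}
    try:
        exec(candidate_src, env)  # define the candidate function
        exec(test_src, env)       # run assertions against it
    except Exception:
        return False
    return True


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without replacement
    from n samples of which c are correct, passes the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical samples for a docstring like "Return the sum of a and b."
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return b + a",
    "def add(a, b):\n    return a * b",
]
tests = "assert add(2, 3) == 5\nassert add(0, 0) == 0"

num_correct = sum(passes_tests(src, tests) for src in candidates)
for k in (1, 2, 3):
    print(f"pass@{k} = {pass_at_k(len(candidates), num_correct, k):.3f}")
```

With two of the four toy candidates passing, the estimate rises from 0.5 at k = 1 toward 1.0 as k grows, which mirrors the qualitative point that drawing more samples per problem raises the fraction of problems solved.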

Publisher

Cornell University Library, arXiv.org

Subject

Evaluation / Large language models