Catalogue Search | MBRL
6 result(s) for "Power, Alethea"
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
2022
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
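The small algorithmic datasets the abstract describes can be sketched as exhaustive tables of a binary operation modulo a prime, split into train and validation halves. The function name, modulus, split fraction, and seed below are illustrative assumptions, not taken from the paper's code.

```python
import random

def make_modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """Enumerate every equation a + b = c (mod p) and split it into
    a training set and a held-out validation set."""
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(examples)
    cut = int(train_frac * len(examples))
    return examples[:cut], examples[cut:]

train, val = make_modular_addition_dataset()
print(len(train), len(val))  # 4704 4705 for p = 97, train_frac = 0.5
```

Because the full operation table is finite, a model can memorize the training half exactly; generalization is then measured by accuracy on the held-out equations, which is where the delayed "grokking" transition is observed.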
Tada(da)! I Found it!
2004
"Tada(da)! I Found it!" is a collection of experimental poetry and fiction. One of the central themes throughout the work is "found material." While the poetry cannot be classified as "Found Poems" in the strictest sense of the term, it does draw on various textual sources (car advertisements, the Internet, both canonical and non-canonical poetry, etc.) for inspiration. In my Statement of Poetics I link my work to both Dada and contemporary Canadian experimental writers such as bpNichol, Christian Bök and Darren Wershler-Henry. Reader accessibility is a big concern for me and I see myself as writing for a broad audience as opposed to a specifically academic one. As such I have included a process appendix which, I feel, gives both the recreational reader and the academic a way into the text.
Dissertation
GPT-4 Technical Report
2024
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
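The "predictable scaling" claim in the abstract is typically realized by fitting a simple power law, loss(C) = a · C^(−b), to small-model training runs and extrapolating to a much larger compute budget. The sketch below uses synthetic data points; the report does not publish its fitting code, so everything here is an illustrative assumption.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) - b*log(C); returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope

# Synthetic small-model runs that follow loss = 10 * C**-0.1 exactly:
compute = [1e18, 1e19, 1e20]
loss = [10 * c ** -0.1 for c in compute]
a, b = fit_power_law(compute, loss)
predicted = a * (1e23) ** -b  # extrapolate 1,000x beyond the largest run
print(round(a, 3), round(b, 3))  # recovers a ≈ 10, b ≈ 0.1
```

The point of such a fit is that a smooth trend measured on cheap runs constrains what the expensive run will do, which is what the abstract means by optimization methods that "behave predictably across a wide range of scales."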
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
by Waites, Chris; Kiritchenko, Svetlana; Brown, Adam R
in Human bias; Human performance; Linguistics
2023
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Evaluating Large Language Models Trained on Code
2021
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
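The repeated-sampling results quoted above are conventionally summarized with a pass@k metric: the probability that at least one of k generated samples passes the unit tests. A numerically stable, unbiased estimator computes 1 − C(n−c, k)/C(n, k) as a running product; the sketch below is based on that standard metric, with names chosen for illustration.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples drawn, c of which passed."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= (i - k) / i  # accumulates C(n-c, k) / C(n, k)
    return 1.0 - prob_all_fail

print(pass_at_k(n=100, c=30, k=1))  # ≈ 0.3, the raw single-sample pass rate
```

With k = 1 the estimator reduces to the plain pass fraction c/n, which is why drawing many samples (large n) and reporting pass@100 can score far above pass@1 on hard prompts.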
Changes in eating, physical activity and related behaviors in a primary care-based weight loss intervention
2013
OBJECTIVE:
To examine changes in eating behaviors and physical activity, as well as predictors of weight loss success, in obese adults who participated in a 2-year behavioral weight loss intervention conducted in a primary care setting.
DESIGN:
A longitudinal, randomized controlled, multisite trial.
SUBJECTS:
Three hundred ninety obese (body mass index, 30–50 kg m⁻²) adults, ⩾21 years, in the Philadelphia region.
METHODS:
Participants were assigned to one of three interventions: (1) Usual Care (quarterly primary care provider (PCP) visits that included education on diet and exercise); (2) Brief Lifestyle Counseling (quarterly PCP visits plus monthly lifestyle counseling (LC) sessions about behavioral weight control); or (3) Enhanced Brief LC (the previous intervention with a choice of meal replacements or weight loss medication).
RESULTS:
At month 24, participants in both Brief LC and Enhanced Brief LC reported significantly greater improvements in mean (±s.e.) dietary restraint than those in Usual Care (4.4±0.5, 4.8±0.5 and 2.8±0.5, respectively; both P-values ⩽0.016). The percentage of calories from fat, along with fruit and vegetable consumption, did not differ significantly among the three groups. At month 24, both the Brief LC and Enhanced Brief LC groups reported significantly greater increases than Usual Care in energy expenditure (kcal per week) from moderately vigorous activity (+593.4±175.9, +415.4±179.6 and −70.4±185.5 kcal per week, respectively; both P-values ⩽0.037). The strongest predictor of weight loss at month 6 (partial R² = 33.4%, P < 0.0001) and at month 24 (partial R² = 19.3%, P < 0.001) was food records completed during the first 6 months. Participants who achieved a 5% weight loss at month 6 had 4.7 times greater odds of maintaining a ⩾5% weight loss at month 24.
CONCLUSIONS:
A behavioral weight loss intervention delivered in a primary care setting can result in significant weight loss, with corresponding improvements in eating restraint and energy expenditure. Moreover, completion of food records, along with weight loss at month 6, is a strong predictor of long-term weight loss.
Journal Article