Catalogue Search | MBRL

A mathematical theory of semantic development in deep neural networks

by Saxe, Andrew M. , Ganguli, Surya , McClelland, James L. in Applied Mathematics , Artificial neural networks , Biological Sciences

2019

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: What are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.

Journal Article
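
The "exact solutions" in the abstract above concern deep linear networks. The sketch below assumes the standard setting for that kind of result: one hidden layer, whitened inputs, small balanced initial weights aligned with the singular vectors of the input-output correlation matrix, and full-batch gradient descent in the continuous-time limit. Under those assumptions, each input-output mode with singular value s has a strength a(t) that obeys tau * da/dt = 2a(s - a) and rises along a sigmoid. The function names and the parameters tau and a0 are illustrative choices, not taken from the paper.

```python
import math

def mode_strength(t, s, a0, tau=1.0):
    """Closed-form strength a(t) of one input-output mode with singular value s,
    solving tau * da/dt = 2 * a * (s - a) from a small start a(0) = a0 (0 < a0 < s)."""
    e = math.exp(2.0 * s * t / tau)
    return s * e / (e - 1.0 + s / a0)

def half_time(s, a0, tau=1.0):
    """Time at which the mode reaches s/2: setting a(t) = s/2 gives
    exp(2*s*t/tau) = s/a0 - 1."""
    return tau / (2.0 * s) * math.log(s / a0 - 1.0)

# Two modes from the same small initialisation: the stronger one (s = 3)
# is learned in an earlier, sharper transition than the weaker one (s = 1).
for s in (3.0, 1.0):
    print(f"s={s}: half-learned at t={half_time(s, a0=1e-3):.2f}")
```

Because the half-learning time scales roughly as 1/s, modes are acquired in order of strength, one after another. This is the mechanism behind the stage-like "rapid developmental transitions" the abstract describes.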

Abrupt and spontaneous strategy switches emerge in simple regularised neural networks

by Muhle-Karbe, Paul S. , Summerfield, Christopher , Touzo, Léo in Adjustment (Psychology) , Adult , Artificial neural networks

2024

Humans sometimes have an insight that leads to a sudden and drastic performance improvement on the task they are working on. Sudden strategy adaptations are often linked to insights, considered to be a unique aspect of human cognition tied to complex processes such as creativity or meta-cognitive reasoning. Here, we take a learning perspective and ask whether insight-like behaviour can occur in simple artificial neural networks, even when the models only learn to form input-output associations through gradual gradient descent. We compared learning dynamics in humans and regularised neural networks in a perceptual decision task that included a hidden regularity to solve the task more efficiently. Our results show that only some humans discover this regularity, and that behaviour is marked by a sudden and abrupt strategy switch that reflects an aha-moment. Notably, we find that simple neural networks with a gradual learning rule and a constant learning rate closely mimicked behavioural characteristics of human insight-like switches, exhibiting delay of insight, suddenness and selective occurrence in only some networks. Analyses of network architectures and learning dynamics revealed that insight-like behaviour crucially depended on a regularised gating mechanism and noise added to gradient updates, which allowed the networks to accumulate “silent knowledge” that is initially suppressed by regularised gating. This suggests that insight-like behaviour can arise from gradual learning in simple neural networks, where it reflects the combined influences of noise, gating and regularisation. These results have potential implications for more complex systems, such as the brain, and guide the way for future insight research.

Journal Article

Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals

by Summerfield, Christopher , Nagy, David G. , Saxe, Andrew in Animals , Architecture , Artificial neural networks

2023

Humans can learn several tasks in succession with minimal mutual interference but perform more poorly when trained on multiple tasks at once. The opposite is true for standard deep neural networks. Here, we propose novel computational constraints for artificial neural networks, inspired by earlier work on gating in the primate prefrontal cortex, that capture the cost of interleaved training and allow the network to learn two tasks in sequence without forgetting. We augment standard stochastic gradient descent with two algorithmic motifs, so-called “sluggish” task units and a Hebbian training step that strengthens connections between task units and hidden units that encode task-relevant information. We found that the “sluggish” units introduce a switch-cost during training, which biases representations under interleaved training towards a joint representation that ignores the contextual cue, while the Hebbian step promotes the formation of a gating scheme from task units to the hidden layer that produces orthogonal representations which are perfectly guarded against interference. Validating the model on previously published human behavioural data revealed that it matches performance of participants who had been trained on blocked or interleaved curricula, and that these performance differences were driven by misestimation of the true category boundary.

Journal Article
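
The two motifs in the abstract above can be sketched roughly as follows, under stated assumptions. The "sluggish" task units are modelled here as an exponential moving average of the one-hot task cue, with the decay rate alpha as an assumed parameter. The Hebbian step is modelled as a plain outer-product update between hidden and task activity. Both functions are hypothetical illustrations, not the authors' implementation. In particular, the paper's Hebbian step may normalise weights or be applied at a different point in training.

```python
import numpy as np

def sluggish_task_signal(task_ids, n_tasks, alpha):
    """Exponentially decaying ("sluggish") task signal: on each trial the task
    units move a fraction alpha toward the one-hot cue of the current task."""
    signal = np.zeros(n_tasks)
    history = []
    for task in task_ids:
        cue = np.zeros(n_tasks)
        cue[task] = 1.0
        signal = (1.0 - alpha) * signal + alpha * cue
        history.append(signal.copy())
    return np.array(history)

def hebbian_step(w_task_hidden, task_signal, hidden, lr):
    """Hypothetical plain Hebbian update: strengthen each task->hidden weight in
    proportion to the co-activation of that task unit and hidden unit.
    w_task_hidden has shape (n_hidden, n_tasks)."""
    return w_task_hidden + lr * np.outer(hidden, task_signal)

blocked = [0] * 50 + [1] * 50      # one task at a time
interleaved = [0, 1] * 50          # tasks alternate every trial
sig_blocked = sluggish_task_signal(blocked, n_tasks=2, alpha=0.1)
sig_interleaved = sluggish_task_signal(interleaved, n_tasks=2, alpha=0.1)
# Blocked: by the end of the first block the cue is nearly one-hot.
# Interleaved: the cue settles near a mixture of both tasks
# (about 0.53 vs 0.47 for alpha = 0.1).
print(sig_blocked[49], sig_interleaved[-1])
```

The blurred cue under interleaving illustrates the switch cost the abstract attributes to sluggish units. The network receives a weak contextual signal, which pushes it toward a joint representation that ignores the cue.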

Strategically managing learning during perceptual decision making

by Juliana Y Rhee , Travis Chapman , Javier Masís in Accuracy , Animals , Behavior

2023

Making optimal decisions in the face of noise requires balancing short-term speed and accuracy. But a theory of optimality should account for the fact that short-term speed can influence long-term accuracy through learning. Here, we demonstrate that long-term learning is an important dynamical dimension of the speed-accuracy trade-off. We study learning trajectories in rats and formally characterize these dynamics in a theory expressed as both a recurrent neural network and an analytical extension of the drift-diffusion model that learns over time. The model reveals that choosing suboptimal response times to learn faster sacrifices immediate reward, but can lead to greater total reward. We empirically verify predictions of the theory, including a relationship between stimulus exposure and learning speed, and a modulation of reaction time by future learning prospects. We find that rats’ strategies approximately maximize total reward over the full learning epoch, suggesting cognitive control over the learning process.

Journal Article
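
The short-term speed-accuracy trade-off that the abstract above builds on can be illustrated with the standard closed-form expressions for the pure drift-diffusion model. These assume a symmetric threshold at ±threshold, a start point at 0, a drift rate "drift", and a noise level "noise". This is a minimal sketch only. The paper's learning extension, in which performance improves with training, is not reproduced here: drift is held fixed. The time cost t0 (non-decision time plus inter-trial interval) is an assumed parameter.

```python
import math

def ddm_error_rate(drift, threshold, noise=1.0):
    """Probability of hitting the wrong threshold in the pure DDM."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise**2))

def ddm_decision_time(drift, threshold, noise=1.0):
    """Mean decision time in the pure DDM."""
    return (threshold / drift) * math.tanh(drift * threshold / noise**2)

def reward_rate(drift, threshold, t0, noise=1.0):
    """Correct responses per unit time; t0 = non-decision time + inter-trial interval."""
    er = ddm_error_rate(drift, threshold, noise)
    dt = ddm_decision_time(drift, threshold, noise)
    return (1.0 - er) / (dt + t0)

# Raising the threshold trades speed for accuracy; reward rate peaks in between.
thresholds = [i / 100 for i in range(1, 301)]
best = max(thresholds, key=lambda z: reward_rate(1.0, z, t0=2.0))
print(f"reward-rate-optimal threshold: {best:.2f}")
```

In this fixed-drift picture, the reward-maximising threshold balances errors against time. The abstract's point is that once learning is added, responding more slowly than this instantaneous optimum can pay off in total reward.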

Does the Use of Intraoperative Neuromonitoring during Thyroid and Parathyroid Surgery Reduce the Incidence of Recurrent Laryngeal Nerve Injuries? A Systematic Review and Meta-Analysis

by Saxe, Andrew , Idris, Mohamed , Gemechu, Jickssa in Analysis , Electrodes , Injuries

2024

Injury to the recurrent laryngeal nerve (RLN) can be a devastating complication of thyroid and parathyroid surgery. Intraoperative neuromonitoring (IONM) has been proposed as a method to reduce the number of RLN injuries but the data are inconsistent. We performed a meta-analysis to critically assess the data. After applying inclusion and exclusion criteria, 60 studies, including five randomized trials and eight non-randomized prospective trials, were included. A meta-analysis of all studies demonstrated an odds ratio (OR) of 0.66 (95% CI [0.56, 0.79], p < 0.00001) favoring IONM compared to the visual identification of the RLN in limiting permanent RLN injuries. A meta-analysis of studies employing contemporaneous controls and routine postoperative laryngoscopy to diagnose RLN injuries (considered to be the most reliable design) demonstrated an OR of 0.69 (95% CI [0.56, 0.84], p = 0.0003), favoring IONM. Strong consideration should be given to employing IONM when performing thyroid and parathyroid surgery.

Journal Article
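
As a worked consistency check on the statistics reported above, the standard error of the log odds ratio and a two-sided p-value can be recovered from each OR and its 95% CI. This assumes the intervals were computed on the log-odds scale with a normal approximation. That is common practice in meta-analysis, but the abstract does not state it.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile

def p_from_or_ci(odds_ratio, lower, upper):
    """Recover SE of log(OR), z and two-sided p from an OR and its 95% CI,
    assuming a normal approximation on the log-odds scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

for label, (or_, lo, hi) in {
    "all studies": (0.66, 0.56, 0.79),
    "contemporaneous controls + laryngoscopy": (0.69, 0.56, 0.84),
}.items():
    se, z, p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.2g}")
```

Both recovered p-values agree with those reported (below 0.00001, and about 0.0003), up to the rounding of the published intervals.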

If deep learning is the answer, what is the question?

by Nelli, Stephanie , Saxe, Andrew , Summerfield, Christopher in Artificial intelligence , Brain research , Cognition

2021

Neuroscience research is undergoing a minor revolution. Recent advances in machine learning and artificial intelligence research have opened up new ways of thinking about neural computation. Many researchers are excited by the possibility that deep neural networks may offer theories of perception, cognition and action for biological brains. This approach has the potential to radically reshape how we understand neural systems, because the computations performed by deep networks are learned from experience, and not endowed by the researcher. If so, how can neuroscientists use deep networks to model and understand biological brains? What is the outlook for neuroscientists who seek to characterize computations or neural codes, or who wish to understand perception, attention, memory and executive functions? In this Perspective, our goal is to offer a road map for systems neuroscience research in the age of deep learning. We discuss the conceptual and methodological challenges of comparing behaviour, learning dynamics and neural representations in artificial and biological systems, and we highlight new research questions that have emerged for neuroscience as a direct consequence of recent advances in machine learning.

Deep neural networks may offer theories of perception, cognition and action for biological brains. Here, Saxe, Nelli and Summerfield offer a road map of how neuroscientists can use deep networks to model and understand biological brains.

Journal Article

Organizing memories for generalization in complementary learning systems

by Advani, Madhu , Saxe, Andrew , Spruston, Nelson in 631/378/116/1925 , 631/378/1595/2167 , 631/378/1595/2638

2023

Memorization and generalization are complementary cognitive processes that jointly promote adaptive behavior. For example, animals should memorize safe routes to specific water sources and generalize from these memories to discover environmental features that predict new ones. These functions depend on systems consolidation mechanisms that construct neocortical memory traces from hippocampal precursors, but why systems consolidation only applies to a subset of hippocampal memories is unclear. Here we introduce a new neural network formalization of systems consolidation that reveals an overlooked tension: unregulated neocortical memory transfer can cause overfitting and harm generalization in an unpredictable world. We resolve this tension by postulating that memories only consolidate when it aids generalization. This framework accounts for partial hippocampal–cortical memory transfer and provides a normative principle for reconceptualizing numerous observations in the field. Generalization-optimized systems consolidation thus provides new insight into how adaptive behavior benefits from complementary learning systems specialized for memorization and generalization.

The authors derive a neural network theory of systems consolidation to assess why some memories consolidate more than others. They propose that brains regulate consolidation to optimize generalization, so only predictable memory components consolidate.

Journal Article

A deep learning framework for neuroscience

by Roelfsema, Pieter , Kriegeskorte, Nikolaus , Schapiro, Anna C in Artificial intelligence , Artificial neural networks , Brain

2019

Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures. With the growing success of deep learning, which utilizes brain-inspired architectures, these three designed components have increasingly become central to how we model, engineer and optimize complex artificial learning systems. Here we argue that a greater focus on these components would also benefit systems neuroscience. We give examples of how this optimization-based framework can drive theoretical and experimental progress in neuroscience. We contend that this principled perspective on systems neuroscience will help to generate more rapid progress.

Journal Article

Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals

by Christopher Summerfield , Timo Flesch , David G. Nagy

2023

Humans can learn several tasks in succession with minimal mutual interference but perform more poorly when trained on multiple tasks at once. The opposite is true for standard deep neural networks. Here, we propose novel computational constraints for artificial neural networks, inspired by earlier work on gating in the primate prefrontal cortex, that capture the cost of interleaved training and allow the network to learn two tasks in sequence without forgetting. We augment standard stochastic gradient descent with two algorithmic motifs, so-called “sluggish” task units and a Hebbian training step that strengthens connections between task units and hidden units that encode task-relevant information. We found that the “sluggish” units introduce a switch-cost during training, which biases representations under interleaved training towards a joint representation that ignores the contextual cue, while the Hebbian step promotes the formation of a gating scheme from task units to the hidden layer that produces orthogonal representations which are perfectly guarded against interference. Validating the model on previously published human behavioural data revealed that it matches performance of participants who had been trained on blocked or interleaved curricula, and that these performance differences were driven by misestimation of the true category boundary.

Author summary

Humans can learn multiple tasks over their lifetime with minimal forgetting. In contrast, machine learning architectures based on artificial neural networks fail to learn multiple tasks in sequence and require data of all tasks to be present at once. Previous reports suggest that the opposite is true for humans: We learn better when trained on one task at a time. Here, we sought to identify the basis set of algorithmic motifs required to mimic human-like continual learning. We propose a novel training method inspired by insights into the function of prefrontal cortex of the human brain. The method consists of task neurons that carry information over successive trials, and an update step that links those to other neurons that encode task-relevant information. Together, these two innovations allow us to model human continual task performance. Analysing how the network represented task information revealed striking similarities between our network and recent reports on task representations in the prefrontal cortex of the mammalian brain. Taken together, our approach describes an effort to bridge insights from machine learning and neuroscience to advance our understanding of the algorithmic basis of continual learning.

Journal Article

Probing transfer learning with a model of synthetic correlated datasets

by Gerace, Federica , Zdeborová, Lenka , Saxe, Andrew in correlated dataset , data modelling , Datasets

2022

Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two data-sets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.

Journal Article
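
The idea of parametric control over the correlation between source and target data sets can be illustrated with a deliberately simplified stand-in for the model described above. Here there are two linear "teacher" vectors whose overlap is set exactly by a parameter rho, and each teacher labels Gaussian inputs for binary classification. This is not the paper's construction, which is a richer solvable model of synthetic data. The function names, dimensions and sample sizes are hypothetical.

```python
import numpy as np

def correlated_teachers(dim, rho, rng):
    """Two unit-norm teacher vectors whose cosine similarity is exactly rho."""
    w_source = rng.standard_normal(dim)
    w_source /= np.linalg.norm(w_source)
    # Random unit direction orthogonal to the source teacher (one Gram-Schmidt step)
    v = rng.standard_normal(dim)
    v -= (v @ w_source) * w_source
    v /= np.linalg.norm(v)
    w_target = rho * w_source + np.sqrt(1.0 - rho**2) * v
    return w_source, w_target

def sample_task(w_teacher, n, rng):
    """Binary labels (+1/-1) from a linear teacher on standard Gaussian inputs."""
    x = rng.standard_normal((n, w_teacher.size))
    y = np.sign(x @ w_teacher)
    return x, y

rng = np.random.default_rng(0)
w_s, w_t = correlated_teachers(dim=100, rho=0.8, rng=rng)
x_src, y_src = sample_task(w_s, n=5000, rng=rng)   # data-abundant source task
x_tgt, y_tgt = sample_task(w_t, n=50, rng=rng)     # data-scarce target task

# Fraction of inputs labelled identically by both teachers; for Gaussian
# inputs its expected value is 1 - arccos(rho) / pi.
agree = np.mean(np.sign(x_src @ w_s) == np.sign(x_src @ w_t))
print(f"label agreement: {agree:.3f}")
```

Sweeping rho from 0 to 1 moves the two tasks from unrelated to identical. This is the kind of knob that lets one ask when features learned on the source task help generalization on the target.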
