Catalogue Search | MBRL

To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review

by Shwartz Ziv, Ravid , LeCun, Yann in Algorithms , Artificial neural networks , Cognitive tasks

2024

Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory has shaped deep neural networks, particularly the information bottleneck principle. This principle optimizes the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. This framework includes multiple encoders and decoders, suggesting that all existing work on self-supervised learning can be seen as specific instances. We aim to unify these approaches to understand their underlying principles better and address the main challenge: many works present different frameworks with differing theories that may seem contradictory. By weaving existing research into a cohesive narrative, we delve into contemporary self-supervised methodologies, spotlight potential research areas, and highlight inherent challenges. Moreover, we discuss how to estimate information-theoretic quantities and their associated empirical problems. Overall, this paper provides a comprehensive review of the intersection of information theory, self-supervised learning, and deep neural networks, aiming for a better understanding through our proposed unified approach.
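For readers new to the principle named in the abstract, the information bottleneck objective is usually stated as the Lagrangian below. The symbols are conventional notation and do not come from this catalogue record: X is the input, Y the target, Z the learned representation, I(·;·) mutual information, and β a positive trade-off weight.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term rewards compressing the input, and the second rewards keeping the information about the target, which is the trade-off the abstract describes.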

Journal Article

A hierarchical loss and its problems when classifying non-hierarchically

by Tygert, Mark , Wu, Cinna , LeCun, Yann in Artificial neural networks , Biology and Life Sciences , Classification

2019

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier's classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier.
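The ultrametric idea in the abstract can be illustrated with a small sketch. It is not the paper's hierarchical loss. It only shows a tree whose leaves all sit at the same depth, with the distance between two leaves taken as the height of their lowest common ancestor. The toy hierarchy `PARENT` and the function names are hypothetical and chosen for illustration.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
# Every leaf sits at depth 2, so all leaves are equidistant from the root.
PARENT = {
    "sheepdog": "dog",
    "poodle": "dog",
    "dog": "root",
    "skyscraper": "building",
    "building": "root",
}
LEAVES = ["sheepdog", "poodle", "skyscraper"]


def path_to_root(node):
    """Return the list [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def ultrametric_distance(a, b):
    """Height (in edges above leaf a) of the lowest common ancestor of a and b."""
    ancestors_b = set(path_to_root(b))
    for height, node in enumerate(path_to_root(a)):
        if node in ancestors_b:
            return height
    raise ValueError("nodes do not share a root")


if __name__ == "__main__":
    print(ultrametric_distance("sheepdog", "poodle"))
    print(ultrametric_distance("sheepdog", "skyscraper"))
```

In this toy tree the sheepdog is closer to the poodle than to the skyscraper, and the distances satisfy the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)) that defines an ultrametric.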

Journal Article

Catalyzing next-generation Artificial Intelligence through NeuroAI

by Ölveczky, Bence , Zador, Anthony , Lillicrap, Timothy in 631/378 , 639/705/117 , Animal models

2023

Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities – inherited from over 500 million years of evolution – that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI. One of the ambitions of computational neuroscience is that we will continue to make improvements in the field of artificial intelligence that will be informed by advances in our understanding of how the brains of various species evolved to process information. To that end, here the authors propose an expanded version of the Turing test that involves embodied sensorimotor interactions with the world as a new framework for accelerating progress in artificial intelligence.

Journal Article

Feature learning and deep architectures: new directions for music informatics

by Bello, Juan P. , Humphrey, Eric J. , LeCun, Yann in Algorithms , Analysis , Architecture

2013

As we look to advance the state of the art in content-based music informatics, there is a general sense that progress is decelerating throughout the field. On closer inspection, performance trajectories across several applications reveal that this is indeed the case, raising some difficult questions for the discipline: why are we slowing down, and what can we do about it? Here, we strive to address both of these concerns. First, we critically review the standard approach to music signal analysis and identify three specific deficiencies to current methods: hand-crafted feature design is sub-optimal and unsustainable, the power of shallow architectures is fundamentally limited, and short-time analysis cannot encode musically meaningful structure. Acknowledging breakthroughs in other perceptual AI domains, we offer that deep learning holds the potential to overcome each of these obstacles. Through conceptual arguments for feature learning and deeper processing architectures, we demonstrate how deep processing models are more powerful extensions of current methods, and why now is the time for this paradigm shift. Finally, we conclude with a discussion of current challenges and the potential impact to further motivate an exploration of this promising research area.

Journal Article

Deep learning

by Bengio, Yoshua , Hinton, Geoffrey , LeCun, Yann in 639/705 , 639/705/117 , Algorithms

2015

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

Journal Article

The Power and Limits of Deep Learning

by LeCun, Yann in Artificial intelligence , Computational linguistics , Language processing

2018

Journal Article

The Power and Limits of Deep Learning

by LeCun, Yann in FEATURES

2018

Journal Article

Universal halting times in optimization and machine learning

by Trogdon, Thomas , Sagun, Levent , LeCun, Yann in Research article

2018

We present empirical evidence that the halting times for a class of optimization algorithms are universal. The algorithms we consider come from quadratic optimization, spin glasses and machine learning. A universality theorem is given in the case of the quadratic gradient descent flow. More precisely, given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time of the algorithm follow a distribution that, after centering and scaling, appears invariant under changes in the distribution on the landscape — universality is present.
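The centering and scaling step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental setup: it uses plain gradient descent on a small random quadratic, a gradient-norm stopping rule, and two entry distributions (Gaussian and ±1) picked for convenience. All function names, sizes, and tolerances are hypothetical.

```python
import random
import statistics


def random_quadratic(n, m, rng, sampler):
    """Build A = X X^T / m (positive semidefinite) and a Gaussian vector b."""
    X = [[sampler(rng) for _ in range(m)] for _ in range(n)]
    A = [[sum(X[i][k] * X[j][k] for k in range(m)) / m for j in range(n)]
         for i in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return A, b


def halting_time(A, b, tol=1e-3, max_iters=100_000):
    """Count gradient-descent steps on 0.5 x^T A x - b^T x until ||A x - b|| < tol."""
    n = len(b)
    # trace(A) bounds the largest eigenvalue of a PSD matrix, so this step is stable.
    step = 1.0 / sum(A[i][i] for i in range(n))
    x = [0.0] * n
    for t in range(max_iters):
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        if sum(g * g for g in grad) ** 0.5 < tol:
            return t
        x = [x[i] - step * grad[i] for i in range(n)]
    return max_iters


def normalized_halting_times(sampler, trials=200, n=4, m=40, seed=0):
    """Sample halting times over random landscapes, then center and scale them."""
    rng = random.Random(seed)
    times = [halting_time(*random_quadratic(n, m, rng, sampler))
             for _ in range(trials)]
    mean = statistics.fmean(times)
    std = statistics.pstdev(times)
    return [(t - mean) / std for t in times]


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def bernoulli(rng):
    return rng.choice((-1.0, 1.0))


if __name__ == "__main__":
    z_gauss = normalized_halting_times(gaussian)
    z_bern = normalized_halting_times(bernoulli)
    # Comparing histograms of z_gauss and z_bern is one way to probe universality.
    print(sorted(z_gauss)[:5], sorted(z_bern)[:5])
```

The sketch only produces the normalized samples. Whether the two histograms coincide is the empirical question the paper studies, and this toy setup makes no claim about the answer.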

Journal Article

Guest Editorial: Deep Learning

by Ranzato, Marc’Aurelio , Hinton, Geoffrey , LeCun, Yann in Artificial Intelligence , Computer Imaging , Computer Science

2015

Issue Title: Special Issue: Deep Learning

Journal Article

Unscented Kalman Filter for Brain-Machine Interfaces

by Li, Zheng , Lebedev, Mikhail A. , Nicolelis, Miguel A. L. in Accuracy , Algorithms , Animal experimentation

2009

Brain machine interfaces (BMIs) are devices that convert neural signals into commands to directly control artificial actuators, such as limb prostheses. Previous real-time methods applied to decoding behavioral commands from the activity of populations of neurons have generally relied upon linear models of neural tuning and were limited in the way they used the abundant statistical information contained in the movement profiles of motor tasks. Here, we propose an n-th order unscented Kalman filter which implements two key features: (1) use of a non-linear (quadratic) model of neural tuning which describes neural activity significantly better than commonly-used linear tuning models, and (2) augmentation of the movement state variables with a history of n-1 recent states, which improves prediction of the desired command even before incorporating neural activity information and allows the tuning model to capture relationships between neural activity and movement at multiple time offsets simultaneously. This new filter was tested in BMI experiments in which rhesus monkeys used their cortical activity, recorded through chronically implanted multielectrode arrays, to directly control computer cursors. The 10th order unscented Kalman filter outperformed the standard Kalman filter and the Wiener filter in both off-line reconstruction of movement trajectories and real-time, closed-loop BMI operation.
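Feature (2) in the abstract, augmenting the movement state with a history of recent states, can be sketched as a fixed-length buffer that is flattened into one vector. This shows only the bookkeeping, under the assumption that the augmented state holds the n most recent states (the current one plus n-1 earlier ones), newest first. It does not implement the unscented Kalman filter or the quadratic tuning model, and the function names are hypothetical.

```python
from collections import deque


def make_history(order, initial_state):
    """Buffer of the `order` most recent states, newest first, pre-filled with one state."""
    return deque([tuple(initial_state)] * order, maxlen=order)


def push_state(history, new_state):
    """Insert the newest state at the front; maxlen discards the oldest at the back."""
    history.appendleft(tuple(new_state))


def augmented_vector(history):
    """Flatten the buffered states into a single augmented state vector."""
    return [value for state in history for value in state]


if __name__ == "__main__":
    # Hypothetical 2-D cursor positions and a 3rd-order history.
    history = make_history(order=3, initial_state=(0.0, 0.0))
    for position in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]:
        push_state(history, position)
    print(augmented_vector(history))
```

A filter operating on this augmented vector can relate neural activity to movement at several time offsets at once, which is the motivation the abstract gives for the history.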

Journal Article
