Catalogue Search | MBRL

Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence

by Khosla, Aditya , Torralba, Antonio , Cichy, Radoslaw Martin in 631/378/2613/2616 , 631/378/2649/1723 , Adult

2016

The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

Journal Article

Share this book

Add to My Shelf

SUN Database: Exploring a Large Collection of Scene Categories

by Ehinger, Krista A. , Torralba, Antonio , Xiao, Jianxiong in Accuracy , Analysis , Artificial Intelligence

2016

Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data with both scene and object labels available, we perform in-depth analysis of co-occurrence statistics and the contextual relationship. To better understand this large scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and “scene detection,” in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image “typicality” and machine recognition accuracy.

Journal Article

Share this book

Add to My Shelf

Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response

by Mullin, Caitlin , Lahner, Benjamin , Mohsenzadeh, Yalda in Biology and Life Sciences , Brain research , Functional magnetic resonance imaging

2024

Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of High Memorable images, compared to Low Memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. Image memorability magnitude results are represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.

Journal Article

Share this book

Add to My Shelf

Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks

by Martin Cichy, Radoslaw , Khosla, Aditya , Pantazis, Dimitrios in Adult , Brain , Brain Mapping - methods

2017

Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250ms indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model, and resolved emerging neural scene size representation. Together our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain.

Journal Article

Share this book

Add to My Shelf

Modeling short visual events through the BOLD moments video fMRI dataset and metadata

by Lahner, Benjamin , Gifford, Alessandro Thomas , Pan, Bowen in 59/36 , 59/57 , 631/378/116/2395

2024

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos’ extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception. There is a need for extensive neuroimaging datasets to facilitate the study of dynamic human visual perception. Here, the authors present a repository of whole-brain fMRI responses to over 1000 short naturalistic video clips across ten human subjects.

Journal Article

Share this book

Add to My Shelf

feedforward architecture accounts for rapid categorization

by Poggio, Tomaso , Serre, Thomas , Oliva, Aude in Adult , Animal cognition , Animal memory

2007

Primates are remarkably good at recognizing objects. The level of performance of their visual system and its robustness to image degradations still surpasses the best computer vision systems despite decades of engineering effort. In particular, the high accuracy of primates in ultra rapid object categorization and rapid serial visual presentation tasks is remarkable. Given the number of processing stages involved and typical neural latencies, such rapid visual processing is likely to be mostly feedforward. Here we show that a specific implementation of a class of feedforward theories of object recognition (that extend the Hubel and Wiesel simple-to-complex cell hierarchy and account for many anatomical and physiological constraints) can predict the level and the pattern of performance achieved by humans on a rapid masked animal vs. non-animal categorization task.

Journal Article

Share this book

Add to My Shelf

The perceptual neural trace of memorable unseen scenes

by Mullin, Caitlin , Pantazis, Dimitrios , Mohsenzadeh, Yalda in 631/378/2613 , 631/378/2649/1723 , Adolescent

2019

Some scenes are more memorable than others: they cement in minds with consistencies across observers and time scales. While memory mechanisms are traditionally associated with the end stages of perception, recent behavioral studies suggest that the features driving these memorability effects are extracted early on, and in an automatic fashion. This raises the question: is the neural signal of memorability detectable during early perceptual encoding phases of visual processing? Using the high temporal resolution of magnetoencephalography (MEG), during a rapid serial visual presentation (RSVP) task, we traced the neural temporal signature of memorability across the brain. We found an early and prolonged memorability related signal under a challenging ultra-rapid viewing condition, across a network of regions in both dorsal and ventral streams. This enhanced encoding could be the key to successful storage and recognition.

Journal Article

Share this book

Add to My Shelf

Visual long-term memory has a massive storage capacity for object details

by Konkle, Talia , Brady, Timothy F , Alvarez, George A in Adult , Biological Sciences , cognition

2008

One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.

Journal Article

Share this book

Add to My Shelf

The Representation of Simple Ensemble Visual Features outside the Focus of Attention

by Alvarez, George A. , Oliva, Aude in Adolescent , Adult , Attention

2008

The representation of visual information inside the focus of attention is more precise than the representation of information outside the focus of attention. We found that the visual system can compensate for the cost of withdrawing attention by pooling noisy local features and computing summary statistics. The location of an individual object is a local feature, whereas the center of mass of several objects (centroid) is a summary feature representing the mean object location. Three experiments showed that withdrawing attention degraded the representation of individual positions more than the representation of the centroid. It appears that information outside the focus of attention can be represented at an abstract level that lacks local detail, but nevertheless carries a precise statistical summary of the scene. The term ensemble features refers to a broad class of statistical summary features that we propose collectively make up the representation of information outside the focus of attention.

Journal Article

Share this book

Add to My Shelf

Emergence of Visual Center-Periphery Spatial Organization in Deep Convolutional Neural Networks

by Mullin, Caitlin , Lahner, Benjamin , Mohsenzadeh, Yalda in 59/36 , 631/378/116/1925 , 631/378/2649/1723

2020

Research at the intersection of computer vision and neuroscience has revealed hierarchical correspondence between layers of deep convolutional neural networks (DCNNs) and cascade of regions along human ventral visual cortex. Recently, studies have uncovered emergence of human interpretable concepts within DCNNs layers trained to identify visual objects and scenes. Here, we asked whether an artificial neural network (with convolutional structure) trained for visual categorization would demonstrate spatial correspondences with human brain regions showing central/peripheral biases. Using representational similarity analysis, we compared activations of convolutional layers of a DCNN trained for object and scene categorization with neural representations in human brain visual regions. Results reveal a brain-like topographical organization in the layers of the DCNN, such that activations of layer-units with central-bias were associated with brain regions with foveal tendencies (e.g. fusiform gyrus), and activations of layer-units with selectivity for image backgrounds were associated with cortical regions showing peripheral preference (e.g. parahippocampal cortex). The emergence of a categorical topographical correspondence between DCNNs and brain regions suggests these models are a good approximation of the perceptual representation generated by biological neural networks.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter