Catalogue Search | MBRL
53 result(s) for "Konkle, Talia"
A self-supervised domain-general learning framework for human ventral stream representation
2022
Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to represent individual images, rather than categories, such that views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find that category information implicitly emerges in the local similarity structure of this feature space. Further, these models learn hierarchical features which capture the structure of brain responses across the human ventral visual stream, on par with category-supervised models. These results provide computational support for a domain-general framework guiding the formation of visual representation, where the proximate goal is not explicitly about category information, but is instead to learn unique, compressed descriptions of the visual world.
It is unknown whether object category representations can be formed purely through domain-general learning of natural image structure. Here the authors show that human visual brain responses to objects are well captured by self-supervised deep neural network models trained without labels, supporting a domain-general account.
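As a rough illustration of the learning objective described above, the sketch below uses toy data and a simplified loss (not the authors' published model): an instance-level contrastive objective in which views of the same image are embedded nearby and pushed apart from other recently encountered images.

```python
# A toy sketch (not the authors' published model): an instance-level
# contrastive objective in which augmented views of the same image are
# embedded nearby in a normalized feature space, and pushed apart from
# embeddings of other recently encountered images held in a memory bank.
import numpy as np

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def instance_contrastive_loss(anchor, positives, memory_bank, tau=0.07):
    """anchor: (d,) embedding of one view of an image.
    positives: (k, d) embeddings of other views of the same image.
    memory_bank: (m, d) embeddings of recently seen, different images."""
    anchor = l2_normalize(anchor)
    positives = l2_normalize(positives)
    negatives = l2_normalize(memory_bank)
    pos_sim = positives @ anchor / tau      # similarity to same-image views
    neg_sim = negatives @ anchor / tau      # similarity to recent other images
    log_denominator = np.log(np.exp(np.concatenate([pos_sim, neg_sim])).sum())
    # maximize the probability assigned to the same-image views
    return -(pos_sim - log_denominator).mean()

rng = np.random.default_rng(0)
loss = instance_contrastive_loss(rng.normal(size=128),
                                 rng.normal(size=(4, 128)),
                                 rng.normal(size=(256, 128)))
print(f"toy contrastive loss: {loss:.3f}")
```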
Journal Article
Sociality and interaction envelope organize visual action representations
2020
Humans observe a wide range of actions in their surroundings. How is the visual cortex organized to process this diverse input? Using functional neuroimaging, we measured brain responses while participants viewed short videos of everyday actions, then probed the structure in these responses using voxel-wise encoding modeling. Responses are well fit by feature spaces that capture the body parts involved in an action and the action’s targets (i.e. whether the action was directed at an object, another person, the actor, and space). Clustering analyses reveal five large-scale networks that summarize the voxel tuning: one related to social aspects of an action, and four related to the scale of the interaction envelope, ranging from fine-scale manipulations directed at objects, to large-scale whole-body movements directed at distant locations. We propose that these networks reveal the major representational joints in how actions are processed by visual regions of the brain.
How is action perception organized in the brain? Here, the authors report evidence for five networks tuned to actions’ social content and the scale of their effect on the world and propose that sociality and interaction envelope are organizing dimensions of visual action representation.
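For readers unfamiliar with voxel-wise encoding modeling, the sketch below shows the general recipe on synthetic data (the feature dimensions and counts are placeholders, not the study's actual body-part and action-target feature spaces): fit per-voxel feature weights by regression, then cluster the voxel tuning profiles to summarize large-scale networks.

```python
# Minimal sketch of voxel-wise encoding plus clustering of voxel tuning,
# on synthetic data (dimensions are placeholders, not the study's features).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_videos, n_features, n_voxels = 120, 10, 500
X = rng.normal(size=(n_videos, n_features))        # feature space for each video
B_true = rng.normal(size=(n_features, n_voxels))   # hidden voxel tuning
Y = X @ B_true + rng.normal(size=(n_videos, n_voxels))  # simulated brain responses

# Fit one encoding model per voxel (Ridge handles all voxels at once).
model = Ridge(alpha=1.0).fit(X, Y)
tuning = model.coef_        # (n_voxels, n_features): each voxel's feature weights

# Cluster voxel tuning profiles to summarize large-scale networks.
networks = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(tuning)
print("voxels per cluster:", np.bincount(networks))
```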
Journal Article
Reliability-based voxel selection
2020
While functional magnetic resonance imaging (fMRI) studies typically measure responses across the whole brain, not all regions are likely to be informative for a given study. Which voxels should be considered? Here we propose a method for voxel selection based on the reliability of the data. This method isolates voxels that respond consistently across imaging runs while maximizing the reliability of multi-voxel patterns across the selected voxels. We estimate that it is suitable for designs with at least 15 conditions. In two example datasets, we found that this proposed method defines a smaller set of voxels than another common method, activity-based voxel selection. Broadly, this method eliminates the need to define regions or statistical thresholds a priori and puts the focus on data reliability as the first step in analyzing fMRI data.
•When predicting and mapping voxel responses, which cortex should be considered?
•We introduce a method to isolate cortex that responds reliably across fMRI runs.
•This method is suitable for condition-rich designs with at least 15 conditions.
•Notably, it puts the focus on reliability as the first stage of fMRI data analysis.
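A simplified sketch of the idea, on synthetic data (the published method differs in details): compute each voxel's split-half reliability across runs, then pick the reliability cutoff that maximizes the reliability of multi-voxel patterns across the selected voxels.

```python
# Simplified sketch of reliability-based voxel selection on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n_conditions, n_voxels = 20, 1000
signal = rng.normal(size=(n_conditions, n_voxels)) * rng.uniform(0, 2, n_voxels)
odd  = signal + rng.normal(size=signal.shape)   # condition x voxel, odd runs
even = signal + rng.normal(size=signal.shape)   # condition x voxel, even runs

def corr_cols(a, b):
    # column-wise (per-voxel) correlation across conditions
    a = a - a.mean(0); b = b - b.mean(0)
    return (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-12)

voxel_reliability = corr_cols(odd, even)        # split-half reliability per voxel

def pattern_reliability(mask):
    # mean correlation of each condition's multi-voxel pattern across splits
    a, b = odd[:, mask], even[:, mask]
    a = a - a.mean(1, keepdims=True); b = b - b.mean(1, keepdims=True)
    r = (a * b).sum(1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)
    return r.mean()

thresholds = np.linspace(-0.2, 0.8, 51)
scores = [pattern_reliability(voxel_reliability > t) for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print(f"chosen reliability cutoff: {best:.2f}, "
      f"{(voxel_reliability > best).sum()} voxels kept")
```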
Journal Article
A large-scale examination of inductive biases shaping high-level visual representation in brains and machines
by Conwell, Colin; Konkle, Talia; Kay, Kendrick N.
2024
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity – a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity, despite clear variation in their underlying representations – suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
Through controlled model-to-brain comparisons across a large-scale survey of deep neural networks, the authors show the data models are trained on matters far more for downstream brain prediction than design factors such as architecture and training task.
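One building block of such model-to-brain comparisons is representational similarity analysis; the sketch below shows a single such comparison on toy activations and voxel responses (the study ran tens of thousands of these, plus billions of encoding regressions).

```python
# Minimal sketch of one model-to-brain comparison via representational
# similarity analysis, on toy data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_images = 50
model_features = rng.normal(size=(n_images, 512))   # one model layer's activations
brain_patterns = rng.normal(size=(n_images, 300))    # voxel responses to same images

model_rdm = pdist(model_features, metric="correlation")  # pairwise dissimilarities
brain_rdm = pdist(brain_patterns, metric="correlation")

rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"model-to-brain RDM correlation (toy data): {rho:.3f}")
```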
Journal Article
The contribution of object size, manipulability, and stability on neural responses to inanimate objects
by Konkle, Talia; Caramazza, Alfonso; Magri, Caterina
in Brain mapping, Experiments, Factor analysis
2021
•Examined the relationship between real-world size and motor-relevant properties in the structure of responses to inanimate objects.
•Large-scale topography was more robust for the contrast that followed the natural covariance (small motor-relevant vs. large non-motor-relevant) than for the contrast that went against it.
•Factor analysis revealed that manipulability and stability were, respectively, better explanatory predictors of responses in small- and large-object regions.
In human occipitotemporal cortex, brain responses to depicted inanimate objects have a large-scale organization by real-world object size. Critically, the size of objects in the world is systematically related to behaviorally relevant properties: small objects are often grasped and manipulated (e.g., forks), while large objects tend to be less motor-relevant (e.g., tables), though this relationship does not always hold (e.g., picture frames and wheelbarrows). To determine how these two dimensions interact, we measured brain activity with functional magnetic resonance imaging while participants viewed a stimulus set of small and large objects with either low or high motor-relevance. The results revealed that the size organization was evident for objects with both low and high motor-relevance; further, a motor-relevance map was also evident across both large and small objects. Targeted contrasts revealed that typical combinations (small motor-relevant vs. large non-motor-relevant) yielded more robust topographies than the atypical covariance contrast (small non-motor-relevant vs. large motor-relevant). In subsequent exploratory analyses, a factor analysis revealed that the construct of motor-relevance was better explained by two underlying factors: one more related to manipulability, and the other to whether an object moves or is stable. The factor related to manipulability better explained responses in lateral small-object preferring regions, while the factor related to object stability (lack of movement) better explained responses in ventromedial large-object preferring regions. Taken together, these results reveal that the structure of neural responses to objects of different sizes further reflects the behavior-relevant properties of manipulability and stability, and contribute to a deeper understanding of some of the factors that shape the large-scale organization of object representation in high-level visual cortex.
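A minimal sketch of the exploratory factor-analysis step, on invented ratings (the specific rating names are placeholders, not the study's questionnaire): several behavioral ratings per object are decomposed into two latent factors, roughly corresponding to manipulability and stability.

```python
# Minimal sketch of the factor-analysis step, on invented rating data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
n_objects = 72
manipulability = rng.normal(size=n_objects)
stability = rng.normal(size=n_objects)
# several behavioral ratings per object, each loading mostly on one latent factor
ratings = np.column_stack([
    manipulability + 0.3 * rng.normal(size=n_objects),  # e.g., "graspable"
    manipulability + 0.3 * rng.normal(size=n_objects),  # e.g., "usable with one hand"
    stability + 0.3 * rng.normal(size=n_objects),        # e.g., "rarely moves"
    stability + 0.3 * rng.normal(size=n_objects),        # e.g., "fixed in place"
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
print("factor loadings (ratings x factors):")
print(np.round(fa.components_.T, 2))
```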
Journal Article
General object-based features account for letter perception
by Deza, Arturo; Konkle, Talia; Hamblin, Chris
in Artificial neural networks, Behavior, Biology and Life Sciences
2022
After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible specialized letter features and general object-based features, respectively. We found that the general object-based features more robustly correlated with the perceptual similarity of letters. We then operationalized additional forms of experience-dependent letter specialization by altering object-trained networks with varied forms of letter training; however, none of these forms of letter specialization improved the match to human behavior. Thus, our findings reveal that it is not necessary to appeal to specialized letter representations to account for perceptual similarity of letters. Instead, we argue that it is more likely that the perception of letters depends on domain-general visual features.
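A minimal sketch of how such a model-to-behavior comparison can be set up, using random placeholder features and similarities rather than the study's actual networks and behavioral data:

```python
# Toy sketch: correlate letter-pair similarity in a network's feature space
# with behaviorally measured letter similarity (all data here is random).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n_letters = 26
object_net_features = rng.normal(size=(n_letters, 256))  # object-trained features
letter_net_features = rng.normal(size=(n_letters, 256))  # letter-trained features
behavioral_similarity = rng.normal(size=n_letters * (n_letters - 1) // 2)

for name, feats in [("object-trained", object_net_features),
                    ("letter-trained", letter_net_features)]:
    model_dissim = pdist(feats, metric="correlation")
    rho, _ = spearmanr(-model_dissim, behavioral_similarity)  # negate: similarity
    print(f"{name}: correlation with perceptual similarity = {rho:.3f}")
```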
Journal Article
Visual long-term memory has a massive storage capacity for object details
2008
One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.
Journal Article
Immersive scene representation in human visual cortex with ultra-wide-angle neuroimaging
2024
While human vision spans 220°, traditional functional MRI setups display images only up to the central 10-15°. Thus, it remains unknown how the brain represents a scene perceived across the full visual field. Here, we introduce a method for ultra-wide-angle display and probe signatures of immersive scene representation. An unobstructed view of 175° is achieved by bouncing the projected image off angled mirrors onto a custom-built curved screen. To avoid perceptual distortion, scenes are created with a wide field of view from custom virtual environments. We find that immersive scene representation drives medial cortex with far-peripheral preferences, but shows minimal modulation in classic scene regions. Further, scene- and face-selective regions maintain their content preferences even with extreme far-periphery stimulation, highlighting that not all far-peripheral information is automatically integrated into the computations of scene regions. This work provides clarifying evidence on content vs. peripheral preferences in scene representation and opens new avenues to research immersive vision.
How scenes are represented in the brain across the full visual field is unknown. Here, the authors develop a novel method to present wide-angle scenes in an fMRI scanner, finding classic scene regions' clear preference for image content over peripheral stimulation.
Journal Article
Ramp-shaped neural tuning supports graded population-level representation of the object-to-scene continuum
by Park, Jeongho; Josephs, Emilie; Konkle, Talia
in Brain architecture
2022
We can easily perceive the spatial scale depicted in a picture, regardless of whether it is a small space (e.g., a close-up view of a chair) or a much larger space (e.g., an entire classroom). How does the human visual system encode this continuous dimension? Here, we investigated the underlying neural coding of depicted spatial scale by examining the voxel tuning and topographic organization of brain responses. We created naturalistic yet carefully controlled stimuli by constructing virtual indoor environments, and rendered a series of snapshots to smoothly sample between a close-up view of the central object and a far-scale view of the full environment (the object-to-scene continuum). Human brain responses were measured to each position using functional magnetic resonance imaging. We did not find evidence for a smooth topographic mapping of the object-to-scene continuum on the cortex. Instead, we observed large swaths of cortex with opposing ramp-shaped profiles, with highest responses to one end of the object-to-scene continuum or the other, and a small region showing weak tuning to intermediate-scale views. However, when we considered the population code of the entire ventral occipito-temporal cortex, we found a smooth and linear representation of the object-to-scene continuum. Together, our results suggest that depicted spatial-scale information is encoded parametrically in large-scale population codes across the entire ventral occipito-temporal cortex.
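A toy sketch of the core observation, on simulated data: individual voxels show opposing ramp-shaped tuning along the continuum, yet a simple population-level readout tracks the object-to-scene dimension smoothly.

```python
# Toy sketch: opposing ramp-shaped voxel tuning yields a graded population
# code for the object-to-scene continuum (all data simulated).
import numpy as np

rng = np.random.default_rng(6)
positions = np.linspace(0, 1, 12)            # object (0) -> full-scene (1) views
n_voxels = 400
slopes = rng.choice([-1.0, 1.0], size=n_voxels) * rng.uniform(0.5, 1.5, n_voxels)
# each voxel responds with a ramp along the continuum, plus noise
responses = positions[:, None] * slopes[None, :] + rng.normal(scale=0.3,
                                                              size=(12, n_voxels))

# a simple population readout: difference between opposing ramp populations
pop_signal = responses[:, slopes > 0].mean(1) - responses[:, slopes < 0].mean(1)
corr = np.corrcoef(pop_signal, positions)[0, 1]
print(f"population signal vs. continuum position: r = {corr:.3f}")
```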
Journal Article
A feedforward mechanism for human-like contour integration
by Konkle, Talia; Doshi, Fenil R.; Alvarez, George A.
in Artificial neural networks, Attention, Biology and Life Sciences
2025
Deep neural network models provide a powerful experimental platform for exploring core mechanisms underlying human visual perception, such as perceptual grouping and contour integration, the process of linking local edge elements to arrive at a unified perceptual representation of a complete contour. Here, we demonstrate that feedforward convolutional neural networks (CNNs) fine-tuned on contour detection show this human-like capacity, but without relying on mechanisms proposed in prior work, such as lateral connections, recurrence, or top-down feedback. We identified two key properties needed for ImageNet-pretrained, feedforward models to yield human-like contour integration: first, a progressively increasing receptive field structure served as a critical architectural motif to support this capacity; and second, fine-tuning on contour detection biased toward gradual curves (~20 degrees) resulted in human-like sensitivity to curvature. We further demonstrate that fine-tuning ImageNet-pretrained models uncovers other hidden human-like capacities in feedforward networks, including uncrowding (reduced interference from distractors as the number of distractors increases), which is considered a signature of human perceptual grouping. Thus, taken together, these results provide a computational existence proof that purely feedforward hierarchical computations are capable of implementing the gestalt "good continuation" principle and the perceptual organization needed for human-like contour integration and uncrowding. More broadly, these results raise the possibility that in human vision, later stages of processing play a more prominent role in perceptual organization than implied by theories focused on recurrence and early lateral connections.
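To make the "progressively increasing receptive field" motif concrete, the sketch below computes how the receptive field of a feedforward CNN grows with depth, using a hypothetical layer configuration (not any specific model from the paper).

```python
# Toy sketch: receptive field growth across the layers of a feedforward CNN,
# for a hypothetical stack of convolutions (not a model from the paper).
def receptive_field(layers):
    """layers: list of (kernel_size, stride) per convolution/pooling layer."""
    rf, jump = 1, 1
    for k, s in layers:
        rf = rf + (k - 1) * jump   # receptive field grows by (k-1) * cumulative stride
        jump = jump * s            # cumulative stride ("jump") between output units
    return rf

# hypothetical 3x3 convolutions with occasional stride-2 downsampling
config = [(3, 1), (3, 1), (3, 2), (3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
for depth in range(1, len(config) + 1):
    print(f"after layer {depth}: receptive field = {receptive_field(config[:depth])} px")
```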
Journal Article