1,037 results for "Scene perception"
The Two-Body Inversion Effect
How does one perceive groups of people? It is known that functionally interacting objects (e.g., a glass and a pitcher tilted as if pouring water into it) are perceptually grouped. Here, we showed that processing of multiple human bodies is also influenced by their relative positioning. In a series of categorization experiments, bodies facing each other (seemingly interacting) were recognized more accurately than bodies facing away from each other (noninteracting). Moreover, recognition of facing body dyads (but not nonfacing body dyads) was strongly impaired when those stimuli were inverted, similar to what has been found for individual bodies. This inversion effect demonstrates sensitivity of the visual system to facing body dyads in their common upright configuration and might imply recruitment of configural processing (i.e., processing of the overall body configuration without prior part-by-part analysis). These findings suggest that facing dyads are represented as one structured unit, which may be the intermediate level of representation between multiple-object (body) perception and representation of social actions.
ST‐SIGMA: Spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most existing methods address only one of these two challenges with a single model. To tackle this limitation, this paper proposes spatio‐temporal semantics and interaction graph aggregation for multi‐agent perception and trajectory forecasting (ST‐SIGMA), an efficient end‐to‐end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST‐SIGMA adopts a trident encoder–decoder architecture to learn scene semantics and agent interaction information on bird's‐eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST‐SIGMA further exploits a spatio‐temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that ST‐SIGMA achieves significant improvements over state‐of‐the‐art (SOTA) methods on both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA methods in model generalisation and robustness, making it more feasible for deployment in real‐world AD scenarios.
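The trident encoder-decoder idea described above can be pictured with a short sketch. The PyTorch code below is a minimal illustration, not the authors' implementation: a hypothetical scene-semantic encoder and a hypothetical agent-interaction encoder each produce features that are fused into one feature space and passed to a small decoder, mirroring the SSE / graph-encoder / fusion / decoder flow the abstract describes. All module names, layer sizes and input shapes are assumptions.

```python
# Minimal sketch (not the authors' code): a trident-style encoder-decoder in PyTorch.
# A scene-semantic encoder and an agent-interaction encoder produce features that are
# fused into one feature space and decoded into a downstream prediction map.
# All layer sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn

class SceneSemanticEncoder(nn.Module):          # stand-in for the SSE on BEV maps
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, bev):                      # bev: (B, 3, H, W)
        return self.net(bev)                     # (B, feat_ch, H/4, W/4)

class InteractionEncoder(nn.Module):             # stand-in for the spatio-temporal graph encoder
    def __init__(self, agent_dim=8, feat_ch=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(agent_dim, feat_ch), nn.ReLU(),
                                 nn.Linear(feat_ch, feat_ch))
    def forward(self, agents):                   # agents: (B, N, agent_dim)
        return self.mlp(agents).mean(dim=1)      # (B, feat_ch) pooled interaction feature

class STSigmaSketch(nn.Module):
    def __init__(self, feat_ch=64, out_ch=2):
        super().__init__()
        self.sse = SceneSemanticEncoder(feat_ch=feat_ch)
        self.gie = InteractionEncoder(feat_ch=feat_ch)
        self.decoder = nn.Conv2d(2 * feat_ch, out_ch, 1)   # fused features -> prediction map
    def forward(self, bev, agents):
        scene = self.sse(bev)                                        # (B, C, h, w)
        inter = self.gie(agents)[:, :, None, None].expand_as(scene)  # broadcast to the map
        fused = torch.cat([scene, inter], dim=1)                     # simple fusion by concatenation
        return self.decoder(fused)

model = STSigmaSketch()
out = model(torch.randn(1, 3, 128, 128), torch.randn(1, 5, 8))
print(out.shape)  # torch.Size([1, 2, 32, 32])
```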
Generation of stable heading representations in diverse visual scenes
Many animals rely on an internal heading representation when navigating in varied environments [1–10]. How this representation is linked to the sensory cues that define different surroundings is unclear. In the fly brain, heading is represented by 'compass' neurons that innervate a ring-shaped structure known as the ellipsoid body [3, 11, 12]. Each compass neuron receives inputs from 'ring' neurons that are selective for particular visual features [13–16]; this combination provides an ideal substrate for the extraction of directional information from a visual scene. Here we combine two-photon calcium imaging and optogenetics in tethered flying flies with circuit modelling, and show how the correlated activity of compass and visual neurons drives plasticity [17–22], which flexibly transforms two-dimensional visual cues into a stable heading representation. We also describe how this plasticity enables the fly to convert a partial heading representation, established from orienting within part of a novel setting, into a complete heading representation. Our results provide mechanistic insight into the memory-related computations that are essential for flexible navigation in varied surroundings. Two-photon calcium imaging and optogenetic experiments in tethered flying flies, combined with modelling, demonstrate how the correlation of compass and visual neurons underpins plasticity that enables the transformation of visual cues into stable heading representations.
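The plasticity mechanism summarized in this abstract (correlated activity of visual 'ring' inputs and 'compass' neurons mapping an arbitrary scene onto a stable heading) can be illustrated with a toy model. The NumPy sketch below is an assumption-laden caricature, not the paper's circuit model: a Hebbian-like rule with decay strengthens visual-to-compass weights wherever the two populations are co-active, so that after "exploration" the visual input alone recalls the matching heading bump. Population sizes, the learning rule and the toy visual code are invented for illustration.

```python
# Toy sketch (assumptions, not the published circuit model): Hebbian-like plasticity
# between visual "ring" inputs and "compass" neurons. Weights onto whichever compass
# neurons are active when a given visual feature is visible are strengthened, so an
# arbitrary scene becomes mapped onto a stable heading representation.
import numpy as np

n_compass, n_visual = 16, 32
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(n_compass, n_visual))  # visual -> compass weights
lr = 0.1

def compass_bump(heading):
    """Idealized activity bump centred on the current heading (0..n_compass-1)."""
    idx = np.arange(n_compass)
    d = np.minimum(np.abs(idx - heading), n_compass - np.abs(idx - heading))
    return np.exp(-(d ** 2) / 8.0)

def visual_input(heading):
    """Toy retinotopic input: scene features shift with heading."""
    v = np.zeros(n_visual)
    v[(2 * int(heading)) % n_visual] = 1.0
    return v

# "Explore" the environment: correlated compass/visual activity drives plasticity.
for heading in rng.integers(0, n_compass, size=500):
    c, v = compass_bump(heading), visual_input(heading)
    W += lr * np.outer(c, v) - lr * c[:, None] * W      # Hebbian term with decay

# After learning, the visual input alone recalls the matching heading bump.
recalled = W @ visual_input(5)
print("true heading 5, decoded:", int(np.argmax(recalled)))
```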
Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
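The variance-partitioning logic used here (unique versus shared contributions of several feature models to a target similarity structure) can be sketched briefly. In the toy example below, each model and the target are represented by the vectorized lower triangle of a representational dissimilarity matrix, and a model's unique contribution is the drop in R² when it is removed from the full regression. The data are random placeholders, and the three model names are only stand-ins for the functional, DNN and object-label models mentioned in the abstract.

```python
# Minimal sketch (illustrative, not the study's pipeline): variance partitioning of a
# target representational dissimilarity matrix (RDM) across three feature-model RDMs.
# The unique contribution of each model is the drop in R^2 when it is left out of the
# full regression. All RDMs here are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_items = 30
n_pairs = n_items * (n_items - 1) // 2

def rdm_vector():
    return rng.random(n_pairs)                 # stand-in for an RDM's lower triangle

models = {"functional": rdm_vector(), "dnn": rdm_vector(), "objects": rdm_vector()}
target = 0.6 * models["functional"] + 0.3 * models["dnn"] + rng.normal(0, 0.1, n_pairs)

def r_squared(predictors, y):
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

full_r2 = r_squared(list(models.values()), target)
for name in models:
    rest = [v for k, v in models.items() if k != name]
    unique = full_r2 - r_squared(rest, target)
    print(f"unique variance of {name} model: {unique:.3f}")
```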
Differential Electrophysiological Signatures of Semantic and Syntactic Scene Processing
In sentence processing, semantic and syntactic violations elicit differential brain responses observable in event-related potentials: An N400 signals semantic violations, whereas a P600 marks inconsistent syntactic structure. Does the brain register similar distinctions in scene perception? To address this question, we presented participants with semantic inconsistencies, in which an object was incongruent with a scene's meaning, and syntactic inconsistencies, in which an object violated structural rules. We found a clear dissociation between semantic and syntactic processing: Semantic inconsistencies produced negative deflections in the N300-N400 time window, whereas mild syntactic inconsistencies elicited a late positivity resembling the P600 found for syntactic inconsistencies in sentence processing. Extreme syntactic violations, such as a hovering beer bottle defying gravity, were associated with earlier perceptual processing difficulties reflected in the N300 response, but failed to produce a P600 effect. We therefore conclude that different neural populations are active during semantic and syntactic processing of scenes, and that syntactically impossible object placements are processed in a categorically different manner than are syntactically resolvable object misplacements.
J. J. Gibson’s “Ground Theory of Space Perception”
J. J. Gibson's ground theory of space perception is contrasted with Descartes’ theory, which reduces all of space perception to the perception of distance and angular direction, relative to an abstract viewpoint. Instead, Gibson posits an embodied perceiver, grounded by gravity, in a stable layout of realistically textured, extended surfaces and more delimited objects supported by these surfaces. Gibson's concept of optical contact ties together this spatial layout, locating each surface relative to the others and specifying the position of each object by its location relative to its surface of support. His concept of surface texture—augmented by perspective structures such as the horizon—specifies the scale of objects and extents within this layout. And his concept of geographical slant provides surfaces with environment-centered orientations that remain stable as the perceiver moves around. Contact-specified locations on extended environmental surfaces may be the unattended primitives of the visual world, rather than egocentric or allocentric distances. The perception of such distances may best be understood using Gibson's concept of affordances. Distances may be perceived only as needed, bound through affordances to the particular actions that require them.
Do street-level scene perceptions affect housing prices in Chinese megacities? An analysis using open access datasets and deep learning
Many studies have explored the relationship between housing prices and environmental characteristics using the hedonic price model (HPM). However, few studies have deeply examined the impact of scene perception near residential units on housing prices. This article used house purchasing records from FANG.com and open access geolocation data (including massive street view pictures, point of interest (POI) data and road network data) and proposed a framework named "open-access-dataset-based hedonic price modeling (OADB-HPM)" for comprehensive analysis in Beijing and Shanghai, China. A state-of-the-art deep learning framework and massive Baidu street view panoramas were employed to visualize and quantify three major scene perception characteristics (greenery, sky and building view indexes, abbreviated GVI, SVI and BVI, respectively) at the street level. Then, the newly introduced scene perception characteristics were combined with other traditional characteristics in the HPM to calculate marginal prices, and the results for Beijing and Shanghai were explored and compared. The empirical results showed that the greenery and sky perceptual elements at the property level can significantly increase the housing price in Beijing (RMB 39,377 and 6011, respectively) and Shanghai (RMB 21,689 and 2763, respectively), indicating an objectively higher willingness by buyers to pay for houses that provide the ability to perceive natural elements in the surrounding environment. This study developed quantification tools to help decision makers and planners understand and analyze the interaction between residents and urban scene components.
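Two steps implied by this framework can be illustrated with a short, hedged sketch: deriving a greenery view index (GVI) as the share of vegetation-labelled pixels in a semantic-segmentation mask of a street view image, and estimating its marginal price with an ordinary-least-squares hedonic regression. The segmentation class id, variable names and synthetic data below are assumptions for illustration, not the OADB-HPM implementation.

```python
# Illustrative sketch of two steps implied by the abstract (not the authors' code):
# (1) a greenery view index (GVI) as the fraction of vegetation-labelled pixels in a
#     semantic-segmentation mask of a street view image, and
# (2) an OLS hedonic model in which the GVI coefficient is its marginal price.
# The class id, variable names, and synthetic data are assumptions for illustration.
import numpy as np

VEGETATION_ID = 8                               # hypothetical segmentation class id

def greenery_view_index(seg_mask: np.ndarray) -> float:
    """Fraction of pixels labelled as vegetation in a segmentation mask."""
    return float((seg_mask == VEGETATION_ID).mean())

# Synthetic hedonic data: price = f(floor area, distance to metro, GVI) + noise.
rng = np.random.default_rng(42)
n = 500
area = rng.uniform(40, 160, n)                  # m^2
metro_dist = rng.uniform(0.1, 5.0, n)           # km
gvi = rng.uniform(0.0, 0.5, n)                  # greenery view index, 0..1
price = 30_000 * area - 5_000 * metro_dist + 40_000 * gvi + rng.normal(0, 20_000, n)

# Ordinary least squares with an intercept; beta[3] is the GVI marginal price.
X = np.column_stack([np.ones(n), area, metro_dist, gvi])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"estimated marginal price of GVI (per unit index): {beta[3]:.0f}")
```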
Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks
Human scene recognition is a rapid multistep process evolving over time from single scene image to spatial layout processing. We used multivariate pattern analyses on magnetoencephalography (MEG) data to unravel the time course of this cortical process. Following an early signal for lower-level visual analysis of single scenes at ~100 ms, we found a marker of real-world scene size, i.e. spatial layout processing, at ~250 ms, indexing neural representations robust to changes in unrelated scene properties and viewing conditions. For a quantitative model of how scene size representations may arise in the brain, we compared MEG data to a deep neural network model trained on scene classification. Representations of scene size emerged intrinsically in the model and resolved the emerging neural scene size representation. Together, our data provide a first description of an electrophysiological signal for layout processing in humans, and suggest that deep neural networks are a promising framework to investigate how spatial layout representations emerge in the human brain.
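The time-resolved multivariate pattern analysis mentioned in this abstract can be sketched as follows: a linear classifier is trained at every time point on sensor-level patterns to decode a binary scene property (here, small vs. large layout), yielding a decoding time course. The scikit-learn example below runs on simulated data; the array shapes and the injected ~250 ms effect are assumptions, not the study's data or pipeline.

```python
# Minimal sketch of time-resolved multivariate pattern analysis (not the study's code):
# at each time point a linear classifier is trained on MEG sensor patterns to decode a
# binary scene property (e.g., small vs. large spatial layout), yielding a decoding
# time course. Data are simulated; shapes and the ~250 ms effect are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 80, 50, 60
times_ms = np.linspace(-100, 500, n_times)       # simulated peristimulus time axis
y = np.repeat([0, 1], n_trials // 2)             # small vs. large scene size

# Simulated sensor data with a size-related signal injected around ~250 ms.
X = rng.normal(size=(n_trials, n_sensors, n_times))
effect_times = np.flatnonzero((times_ms > 200) & (times_ms < 350))
X[np.ix_(np.flatnonzero(y == 1), np.arange(10), effect_times)] += 0.8

# Decode the scene property separately at every time point.
accuracy = np.array([
    cross_val_score(LinearSVC(dual=False), X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
print(f"peak decoding {accuracy.max():.2f} at ~{times_ms[accuracy.argmax()]:.0f} ms")
```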
Behavioral and neural markers of visual configural processing in social scene perception
Highlights:
• The visual processing of minimal social (two-body) scenes is investigated.
• Behavioral and neural effects show signatures of visual configural processing.
• Individual performance in body-dyad perception is reliable and stable over time.
• It predicts the individual's social sensitivity.
• Body-dyad perception exposes basic processes that may contribute to social cognition.
Research on face perception has revealed highly specialized visual mechanisms such as configural processing, and provided markers of interindividual differences (including disease risks and alterations) in visuo-perceptual abilities that traffic in social cognition. Is face perception unique in degree or kind of mechanisms, and in its relevance for social cognition? Combining functional MRI and behavioral methods, we address the processing of an uncharted class of socially relevant stimuli: minimal social scenes involving configurations of two bodies spatially close and face-to-face as if interacting (hereafter, facing dyads). We report category-specific activity for facing (vs. non-facing) dyads in visual cortex. That activity shows face-like signatures of configural processing, i.e., a stronger response to facing (vs. non-facing) dyads and greater susceptibility to stimulus inversion for facing (vs. non-facing) dyads, and is predicted by performance-based measures of configural processing in visual perception of body dyads. Moreover, we observe that individual performance in body-dyad perception is reliable, stable over time and correlated with individual social sensitivity, coarsely captured by the Autism-Spectrum Quotient. Further analyses clarify the relationship between single-body and body-dyad perception. We propose that facing dyads are processed through highly specialized mechanisms (and brain areas), analogously to other biologically and socially relevant stimuli such as faces. Like face perception, facing-dyad perception can reveal basic (visual) processes that lay the foundations for understanding others, their relationships and interactions.
The role of meaning in attentional guidance during free viewing of real-world scenes
In real-world vision, humans prioritize the most relevant visual information at the expense of other information via attentional selection. The current study sought to understand the role of semantic features and image features on attentional selection during free viewing of real-world scenes. We compared the ability of meaning maps generated from ratings of isolated, context-free image patches and saliency maps generated from the Graph-Based Visual Saliency model to predict the spatial distribution of attention in scenes as measured by eye movements. Additionally, we introduce new contextualized meaning maps in which scene patches were rated based upon how informative or recognizable they were in the context of the scene from which they derived. We found that both context-free and contextualized meaning explained significantly more of the overall variance in the spatial distribution of attention than image salience. Furthermore, meaning explained early attention to a significantly greater extent than image salience, contrary to predictions of the 'saliency first' hypothesis. Finally, both context-free and contextualized meaning predicted attention equivalently. These results support theories in which meaning plays a dominant role in attentional guidance during free viewing of real-world scenes.
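The map-comparison logic behind these analyses can be sketched with a small example: correlate a meaning map and a saliency map with a fixation-density map, then estimate each map's unique contribution with a squared semipartial correlation. The maps below are random stand-ins, and the specific correlation-based measure is an illustrative choice rather than necessarily the exact statistics used in the study.

```python
# Illustrative sketch (not the study's pipeline): compare how well a meaning map and a
# saliency map predict a fixation-density map for one scene, using squared correlations
# and the unique (semipartial) contribution of each map. All maps are random stand-ins.
import numpy as np

rng = np.random.default_rng(7)
h, w = 48, 64
meaning = rng.random((h, w))
saliency = 0.5 * meaning + 0.5 * rng.random((h, w))       # maps correlate, as in real scenes
fixation_density = 0.7 * meaning + 0.3 * rng.random((h, w))

def r2(a, b):
    """Squared linear correlation between two maps."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1] ** 2

def unique_r2(target, predictor, covariate):
    """Squared semipartial correlation: variance in `target` explained by the part of
    `predictor` that is not shared with `covariate`."""
    t, p, c = target.ravel(), predictor.ravel(), covariate.ravel()
    slope = np.dot(p - p.mean(), c - c.mean()) / np.dot(c - c.mean(), c - c.mean())
    p_resid = p - p.mean() - slope * (c - c.mean())
    return np.corrcoef(t, p_resid)[0, 1] ** 2

print("R^2 meaning :", round(r2(fixation_density, meaning), 3))
print("R^2 saliency:", round(r2(fixation_density, saliency), 3))
print("unique meaning :", round(unique_r2(fixation_density, meaning, saliency), 3))
print("unique saliency:", round(unique_r2(fixation_density, saliency, meaning), 3))
```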