Search Results

357 results for "scene construction"
A neural-level model of spatial memory and imagery
We present a model of how neural representations of egocentric spatial experiences in parietal cortex interface with viewpoint-independent representations in medial temporal areas, via retrosplenial cortex, to enable many key aspects of spatial cognition. This account shows how previously reported neural responses (place, head-direction and grid cells, allocentric boundary- and object-vector cells, gain-field neurons) can map onto higher cognitive function in a modular way, and predicts new cell types (egocentric and head-direction-modulated boundary- and object-vector cells). The model predicts how these neural populations should interact across multiple brain regions to support spatial memory, scene construction, novelty-detection, ‘trace cells’, and mental navigation. Simulated behavior and firing rate maps are compared to experimental data, for example showing how object-vector cells allow items to be remembered within a contextual representation based on environmental boundaries, and how grid cells could update the viewpoint in imagery during planning and short-cutting by driving sequential place cell activity.
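The egocentric–allocentric machinery summarized above includes boundary-vector cells, whose firing depends on the distance and allocentric direction of environmental boundaries. As a rough illustration (not the authors' model code), the sketch below assumes Hartley-style Gaussian tuning over a preferred distance and direction in a unit square box, with illustrative tuning widths:

```python
import numpy as np

def bvc_rate_map(d_pref, phi_pref, box=1.0, n=60, sigma_d=0.08, sigma_ang=0.2):
    """Toy boundary-vector-cell rate map: Gaussian tuning to the distance and
    allocentric direction of boundary segments, summed over the four walls."""
    t = np.linspace(0.0, box, 200)
    walls = np.concatenate([
        np.stack([t, np.zeros_like(t)], axis=1),      # south wall
        np.stack([t, np.full_like(t, box)], axis=1),  # north wall
        np.stack([np.zeros_like(t), t], axis=1),      # west wall
        np.stack([np.full_like(t, box), t], axis=1),  # east wall
    ])
    xs = np.linspace(0.02, box - 0.02, n)
    rate = np.zeros((n, n))
    for i, y in enumerate(xs):
        for j, x in enumerate(xs):
            dx, dy = walls[:, 0] - x, walls[:, 1] - y
            d = np.hypot(dx, dy)                      # distance to each boundary sample
            dang = np.angle(np.exp(1j * (np.arctan2(dy, dx) - phi_pref)))  # wrapped bearing error
            rate[i, j] = np.sum(np.exp(-(d - d_pref) ** 2 / (2 * sigma_d ** 2))
                                * np.exp(-dang ** 2 / (2 * sigma_ang ** 2)))
    return rate / rate.max()

rm = bvc_rate_map(d_pref=0.25, phi_pref=np.pi / 2)  # cell tuned to a wall ~0.25 units to the "north"
```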
Unrestricted eye movements strengthen effective connectivity from hippocampal to oculomotor regions during scene construction
Highlights:
• The role of eye movements in mentally constructing scene imagery was investigated.
• Restricting eye movements reduced the vividness of constructed scene imagery.
• Making eye movements strengthened connectivity from memory to oculomotor regions.

Scene construction is a key component of memory recall, navigation, and future imagining, and relies on the medial temporal lobes (MTL). A parallel body of work suggests that eye movements may enable the imagination and construction of scenes, even in the absence of external visual input. There are vast structural and functional connections between regions of the MTL and those of the oculomotor system. However, the directionality of connections between the MTL and oculomotor control regions, and how it relates to scene construction, has not been studied directly in human neuroimaging. In the current study, we used dynamic causal modeling (DCM) to interrogate effective connectivity between the MTL and oculomotor regions using a scene construction task in which participants’ eye movements were either restricted (fixed-viewing) or unrestricted (free-viewing). By omitting external visual input, and by contrasting free- versus fixed-viewing, the directionality of neural connectivity during scene construction could be determined. Compared to when eye movements were restricted, allowing free viewing during scene construction strengthened top-down connections from the MTL to the frontal eye fields and to lower-level cortical visual processing regions, suppressed bottom-up connections along the visual stream, and enhanced the vividness of the constructed scenes. Taken together, these findings provide novel, non-invasive evidence for the underlying directional connectivity between the MTL memory system and the oculomotor system associated with constructing vivid mental representations of scenes.
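Dynamic causal modeling, as used here, rests on a bilinear neuronal state equation dx/dt = (A + u_mod·B)x + C·u, in which B captures how an experimental factor (free viewing) modulates specific connections. A minimal two-node sketch with illustrative values, omitting DCM's hemodynamic forward model entirely:

```python
import numpy as np

# Toy two-region network; the MTL/FEF labels and values are illustrative, not fitted.
A = np.array([[-0.5, 0.0],
              [ 0.3, -0.5]])   # endogenous connectivity; MTL -> FEF = 0.3
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])     # free viewing strengthens the MTL -> FEF connection
C = np.array([1.0, 0.0])       # the task input drives the MTL node

def simulate(u_task, u_mod, dt=0.01, T=10.0):
    """Euler integration of the bilinear DCM state equation dx/dt = (A + u_mod*B)x + C*u."""
    x = np.zeros(2)
    for _ in range(int(T / dt)):
        x = x + dt * ((A + u_mod * B) @ x + C * u_task)
    return x

fixed = simulate(u_task=1.0, u_mod=0.0)  # fixed-viewing: no modulation
free = simulate(u_task=1.0, u_mod=1.0)   # free-viewing: top-down coupling strengthened
print(free[1] - fixed[1])                # larger steady-state "FEF" response under free viewing
```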
A BIM-Guided Virtual-to-Real Framework for Component-Level Semantic Segmentation of Construction Site Point Clouds
LiDAR point cloud semantic segmentation is pivotal for scan-to-BIM workflows; however, contemporary deep learning approaches remain constrained by their reliance on extensive annotated datasets, which are challenging to acquire in actual construction environments due to prohibitive labeling costs, structural occlusion, and sensor noise. This study proposes a BIM-guided Virtual-to-Real (V2R) framework that requires no real annotations. The method is trained entirely on a large synthetic point cloud (SPC) dataset consisting of 132 scans and approximately 8.75×10⁹ points, generated directly from BIM models with component-level labels. A multi-feature fusion network combines the global contextual modeling of the Point Cloud Transformer (PCT) with the local geometric encoding of PointNet++, producing robust representations across scales. A learnable point cloud augmentation module and multi-level domain adaptation strategies are incorporated to mitigate differences in noise, density, occlusion, and structural variation between synthetic and real scans. Experiments on real construction floors from high-rise residential buildings, together with the BIM-Net benchmark, show that the proposed method achieves 70.89% overall accuracy, 53.14% mean IoU, 69.67% mean accuracy, 54.75% FWIoU, and 59.66% Cohen’s κ, consistently outperforming baseline models. The Fusion model achieves 73 of 80 best scene–metric results and 31 of 70 best component-level scores, demonstrating stable performance across the evaluated scenes and floors. These results confirm the effectiveness of BIM-generated SPC and indicate the potential of the V2R framework for BIM–reality updates and automated site monitoring within similar building contexts.
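All of the reported figures (overall accuracy, mean accuracy, mean IoU, FWIoU, Cohen's κ) are derivable from a single class confusion matrix. A minimal sketch of those standard formulas, assuming rows index ground truth and columns index predictions:

```python
import numpy as np

def segmentation_metrics(conf):
    """Standard semantic-segmentation metrics from a (K x K) confusion matrix
    (rows = ground truth, columns = prediction)."""
    conf = conf.astype(float)
    total = conf.sum()
    tp = np.diag(conf)
    gt = conf.sum(axis=1)                            # ground-truth points per class
    pred = conf.sum(axis=0)                          # predicted points per class
    iou = tp / np.maximum(gt + pred - tp, 1e-12)     # per-class intersection over union
    oa = tp.sum() / total                            # overall accuracy
    pe = np.sum(gt * pred) / total ** 2              # chance agreement for Cohen's kappa
    return {
        "OA": oa,
        "mAcc": float(np.mean(tp / np.maximum(gt, 1e-12))),
        "mIoU": float(iou.mean()),
        "FWIoU": float(np.sum((gt / total) * iou)),  # frequency-weighted IoU
        "kappa": (oa - pe) / (1.0 - pe),
    }

print(segmentation_metrics(np.array([[50, 5], [10, 35]])))
```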
Does hippocampal volume explain performance differences on hippocampal-dependant tasks?
Highlights:
• Evidence is mixed about whether hippocampal volume affects cognitive task performance.
• This is particularly the case concerning individual differences in healthy people.
• We collected structural MRI data from 217 healthy people.
• They also had widely-varying performance on cognitive tasks linked to the hippocampus.
• In-depth analyses showed little evidence hippocampal volume affected task performance.

Marked disparities exist across healthy individuals in their ability to imagine scenes, recall autobiographical memories, think about the future and navigate in the world. The importance of the hippocampus in supporting these critical cognitive functions has prompted the question of whether differences in hippocampal grey matter volume could be one source of performance variability. Evidence to date has been somewhat mixed. In this study we sought to mitigate issues that commonly affect these types of studies. Data were collected from a large sample of 217 young, healthy adult participants, including whole brain structural MRI data (0.8 mm isotropic voxels) and widely-varying performance on scene imagination, autobiographical memory, future thinking and navigation tasks. We found little evidence that hippocampal grey matter volume was related to task performance in this healthy sample. This was the case using different analysis methods (voxel-based morphometry, partial correlations), when whole brain or hippocampal regions of interest were examined, when comparing different sub-groups (divided by gender, task performance, self-reported ability), and when using latent variables derived from across the cognitive tasks. Hippocampal grey matter volume may not, therefore, significantly influence performance on tasks known to require the hippocampus in healthy people. Perhaps only in extreme situations, as in the case of licensed London taxi drivers, are measurable ability-related hippocampus volume changes consistently exhibited.
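One of the analysis methods mentioned, partial correlation, amounts to correlating two variables after regressing nuisance covariates out of both. The sketch below runs on simulated data with a hypothetical covariate (total intracranial volume); it is not the study's pipeline, and the p-value's degrees of freedom are only approximate here:

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after residualizing both on the covariates."""
    Z = np.column_stack([np.ones(len(x)), covariates])   # design matrix with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]    # residuals of x given Z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]    # residuals of y given Z
    return stats.pearsonr(rx, ry)                        # note: df not adjusted for covariates

rng = np.random.default_rng(0)
n = 217                                        # sample size matching the abstract
ticv = rng.normal(1500.0, 120.0, n)            # hypothetical total intracranial volume (cm^3)
hpc = 0.003 * ticv + rng.normal(0.0, 0.3, n)   # hippocampal volume scaling with head size
score = rng.normal(50.0, 10.0, n)              # task score simulated as unrelated to volume
r, p = partial_corr(hpc, score, ticv[:, None])
print(f"partial r = {r:.3f}, p = {p:.3f}")     # expected to be near zero, echoing the null result
```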
Imaging the human hippocampus with optically-pumped magnetoencephalography
Optically-pumped (OP) magnetometers allow magnetoencephalography (MEG) to be performed while a participant’s head is unconstrained. To fully leverage this new technology, and in particular its capacity for mobility, the activity of deep brain structures which facilitate explorative behaviours such as navigation, must be detectable using OP-MEG. One such crucial brain region is the hippocampus. Here we had three healthy adult participants perform a hippocampal-dependent task – the imagination of novel scene imagery – while being scanned using OP-MEG. A conjunction analysis across these three participants revealed a significant change in theta power in the medial temporal lobe. The peak of this activated cluster was located in the anterior hippocampus. We repeated the experiment with the same participants in a conventional SQUID-MEG scanner and found similar engagement of the medial temporal lobe, also with a peak in the anterior hippocampus. These OP-MEG findings indicate exciting new opportunities for investigating the neural correlates of a range of crucial cognitive functions in naturalistic contexts including spatial navigation, episodic memory and social interactions.

Highlights:
• Optically-pumped (OP) magnetometers are advancing magnetoencephalography (MEG).
• OP-MEG brain scanning can be performed while the head is unconstrained.
• We combined OP-MEG with a hippocampal-dependent task.
• Task-related changes in theta power in the hippocampus were detectable.
• We observed similar hippocampal engagement in the same participants using SQUID-MEG.
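The key dependent measure here, a task-related change in theta power, is commonly computed by integrating a Welch power spectral density over the 4–8 Hz band. A minimal sketch on synthetic signals, with an illustrative sampling rate:

```python
import numpy as np
from scipy.signal import welch

def theta_power(sig, fs, band=(4.0, 8.0)):
    """Band-limited power: integrate Welch's PSD over the theta band."""
    f, pxx = welch(sig, fs=fs, nperseg=int(2 * fs))  # 2 s windows
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].sum() * (f[1] - f[0])           # rectangle-rule integral over the band

fs = 600                                             # illustrative MEG sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
rest = rng.standard_normal(t.size)
task = rest + 0.8 * np.sin(2 * np.pi * 6.0 * t)      # inject a 6 Hz component during "task"
print(10 * np.log10(theta_power(task, fs) / theta_power(rest, fs)), "dB theta increase")
```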
Style-Enhanced Transformer for Image Captioning in Construction Scenes
Image captioning is important for improving the intelligence of construction projects and assisting managers in mastering construction site activities. However, there are few image-captioning models for construction scenes at present, and existing methods do not perform well in complex construction scenes. Drawing on the characteristics of construction scenes, we label a text description dataset based on the MOCS dataset and propose a style-enhanced Transformer for image captioning in construction scenes, abbreviated SETCAP. Specifically, we extract grid features using the Swin Transformer. Then, to enhance the style information, we not only use the grid features as the initial detail semantic features but also extract style information with a style encoder. In addition, in the decoder, we integrate the style information into the text features. The interaction between the image semantic information and the text features is carried out to generate content-appropriate sentences word by word. Finally, we add a sentence style loss to the total loss function to make the style of generated sentences closer to the training set. The experimental results show that the proposed method achieves encouraging results on both the MSCOCO and MOCS datasets. In particular, SETCAP outperforms state-of-the-art methods by 4.2% in CIDEr score on the MOCS dataset and by 3.9% on the MSCOCO dataset.
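The abstract does not give the exact form of the sentence style loss, but the described training objective has a generic shape: a token-level captioning loss plus a weighted style term, L_total = L_CE + λ·L_style. A hypothetical sketch, in which the style embeddings and λ are assumptions:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Token-level cross-entropy for a caption (logits: T x V, targets: length-T token ids)."""
    logits = logits - logits.max(axis=1, keepdims=True)               # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def style_loss(gen_style, ref_style):
    """Hypothetical style penalty: squared distance between sentence-style embeddings."""
    return float(np.sum((gen_style - ref_style) ** 2))

def total_loss(logits, targets, gen_style, ref_style, lam=0.1):
    # Caption loss plus a weighted style term, mirroring the "sentence style loss" idea.
    return cross_entropy(logits, targets) + lam * style_loss(gen_style, ref_style)

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 100))                   # 5 tokens, vocabulary of 100
targets = rng.integers(0, 100, size=5)
print(total_loss(logits, targets, rng.normal(size=8), rng.normal(size=8)))
```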
Hardware Implementation of Improved Oriented FAST and Rotated BRIEF-Simultaneous Localization and Mapping Version 2
The field of autonomous driving has seen continuous advances, yet achieving higher levels of automation in real-world applications remains challenging. A critical requirement for autonomous navigation is accurate map construction, particularly in novel and unstructured environments. In recent years, Simultaneous Localization and Mapping (SLAM) has evolved to support diverse sensor modalities, with some implementations incorporating machine learning to improve performance. However, these approaches often demand substantial computational resources. The key challenge lies in achieving efficiency within resource-constrained environments while minimizing errors that could degrade downstream tasks. This paper presents an enhanced ORB-SLAM2 (Oriented FAST and Rotated BRIEF Simultaneous Localization and Mapping, version 2) algorithm implemented on a Raspberry Pi 3 (ARM A53 CPU) to improve mapping performance under limited computational resources. ORB-SLAM2 comprises four main stages: Tracking, Local Mapping, Loop Closing, and Full Bundle Adjustment (BA). The proposed improvements include employing a more efficient feature descriptor to increase stereo feature-matching rates and optimizing loop-closing parameters to reduce accumulated errors. Experimental results demonstrate that the proposed system achieves notable improvements on the Raspberry Pi 3 platform. For monocular SLAM, RMSE is reduced by 18.11%, mean error by 22.97%, median error by 29.41%, and maximum error by 17.18%. For stereo SLAM, RMSE decreases by 0.30% and mean error by 0.38%. Furthermore, the ROS topic frequency stabilizes at 10 Hz, with quad-core CPU utilization averaging approximately 90%. These results indicate that the system satisfies real-time requirements while maintaining a balanced trade-off between accuracy and computational efficiency under resource constraints.
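The quoted error statistics (RMSE, mean, median, maximum) are standard absolute-trajectory-error summaries over per-pose position errors. A minimal sketch, assuming the estimated trajectory has already been aligned to ground truth (e.g., by Umeyama alignment):

```python
import numpy as np

def trajectory_errors(estimated, ground_truth):
    """Absolute trajectory error statistics from two aligned (N x 3) position arrays."""
    err = np.linalg.norm(estimated - ground_truth, axis=1)  # per-pose Euclidean error
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "max": float(err.max()),
    }

rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(500, 3)), axis=0)    # synthetic ground-truth path
est = gt + rng.normal(scale=0.05, size=gt.shape)     # estimate with small noise
print(trajectory_errors(est, gt))
```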
Scene Construction, Visual Foraging, and Active Inference
This paper describes an active inference scheme for visual searches and the perceptual synthesis entailed by scene construction. Active inference assumes that perception and action minimize variational free energy, where actions are selected to minimize the free energy expected in the future. This assumption generalizes risk-sensitive control and expected utility theory to include epistemic value; namely, the value (or salience) of information inherent in resolving uncertainty about the causes of ambiguous cues or outcomes. Here, we apply active inference to saccadic searches of a visual scene. We consider the (difficult) problem of categorizing a scene, based on the spatial relationship among visual objects where, crucially, visual cues are sampled myopically through a sequence of saccadic eye movements. This means that evidence for competing hypotheses about the scene has to be accumulated sequentially, calling upon both prediction (planning) and postdiction (memory). Our aim is to highlight some simple but fundamental aspects of the requisite functional anatomy; namely, the link between approximate Bayesian inference under mean field assumptions and functional segregation in the visual cortex. This link rests upon the (neurobiologically plausible) process theory that accompanies the normative formulation of active inference for Markov decision processes. In future work, we hope to use this scheme to model empirical saccadic searches and identify the prior beliefs that underwrite intersubject variability in the way people forage for information in visual scenes (e.g., in schizophrenia).
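In this scheme, saccades are chosen to minimize expected free energy, which for discrete (MDP) models decomposes into risk (divergence of predicted from preferred outcomes) plus ambiguity (expected outcome entropy). A toy sketch under flat preferences, where only the epistemic term differs and the informative fixation wins:

```python
import numpy as np

def expected_free_energy(qs, A, log_C):
    """One-step expected free energy: risk = KL[Q(o) || C(o)], ambiguity = E_Q(s)[H[P(o|s)]].
    qs: predicted state distribution; A: likelihood P(o|s) with columns summing to 1;
    log_C: log preferences over outcomes."""
    qo = A @ qs                                              # predicted outcome distribution
    risk = float(np.sum(qo * (np.log(qo + 1e-16) - log_C)))
    ambiguity = float(qs @ (-np.sum(A * np.log(A + 1e-16), axis=0)))
    return risk + ambiguity

qs = np.array([0.5, 0.5])                 # maximal uncertainty about the scene's hidden state
log_C = np.log([0.5, 0.5])                # flat preferences: only epistemic value matters
A_ambiguous = np.array([[0.5, 0.5],
                        [0.5, 0.5]])      # this fixation's outcome says nothing about the state
A_informative = np.array([[0.9, 0.1],
                          [0.1, 0.9]])    # this fixation's outcome largely resolves the state
print(expected_free_energy(qs, A_ambiguous, log_C))    # ~0.69: high ambiguity
print(expected_free_energy(qs, A_informative, log_C))  # ~0.33: the salient saccade
```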
How sports event scenarios shape urban image and influence audience conative tendency
This study investigates how large-scale sports events influence urban image construction and audience behavioral responses through the logical chain of “scene cognition–emotional identification–conative tendency.” Drawing on empirical data from the 2025 Wuxi Marathon (valid sample N = 438), the event scenario is conceptualized across four dimensions: role participation, environmental perception, activity evaluation, and cultural identification. Integrating the cognition–affection–conation (C–A–C) model, a research framework was constructed that incorporates emotional mediation and motivational moderation. Results from structural equation modeling indicate that: (1) scene cognition significantly enhances both event-related and city-related emotional identification, though the magnitude of effects varies across dimensions; (2) emotional identification mediates the relationship between scene cognition and conative tendency, with city-related emotional identification exerting the strongest effect; (3) participation motivation positively moderates the link between scene cognition and conative tendency. Theoretically, this research extends the applicability of scene theory to the study of sports events and advances the use of the C–A–C model; practically, it offers managerial implications for event organizers and urban policymakers in scenario construction, brand building, and audience cultivation.
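The mediation finding (scene cognition → emotional identification → conative tendency) can be illustrated with a product-of-coefficients estimate and a bootstrap interval. The sketch below uses simulated data and plain least squares, not the study's structural equation model:

```python
import numpy as np

def indirect_effect(x, m, y):
    """Simple mediation x -> m -> y: a from regressing m on x,
    b from regressing y on m while controlling for x; indirect effect = a * b."""
    a = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([np.ones_like(x), m, x]), y, rcond=None)[0][1]
    return a * b

rng = np.random.default_rng(1)
n = 438                                        # sample size matching the abstract
x = rng.normal(size=n)                         # scene cognition (simulated)
m = 0.5 * x + rng.normal(size=n)               # emotional identification (mediator)
y = 0.6 * m + 0.1 * x + rng.normal(size=n)     # conative tendency
boot = [indirect_effect(x[idx], m[idx], y[idx])
        for idx in (rng.integers(0, n, n) for _ in range(2000))]
print(np.percentile(boot, [2.5, 97.5]))        # bootstrap 95% CI for the indirect effect
```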
Enabling High-Level Worker-Centric Semantic Understanding of Onsite Images Using Visual Language Models with Attention Mechanism and Beam Search Strategy
Visual information is becoming increasingly essential in construction management. However, a significant portion of this information remains underutilized by construction managers due to the limitations of existing image processing algorithms. These algorithms primarily rely on low-level visual features and struggle to capture high-order semantic information, leading to a gap between computer-generated image semantics and human interpretation. At the same time, current research lacks a comprehensive justification for the necessity of employing scene understanding algorithms to address this gap, and the absence of large-scale, high-quality open-source datasets remains a major obstacle, hindering further research progress and algorithmic optimization in this field. To address these issues, this paper proposes a construction scene visual language model based on an attention mechanism and an encoder–decoder architecture, with the encoder built on ResNet101 and the decoder built on LSTM (long short-term memory). The attention mechanism and beam search strategy improve the model's accuracy and generalizability. To verify the effectiveness of the proposed method, a publicly available construction scene visual-language dataset containing 16 common construction scenes, SODA-ktsh, is built and verified. The experimental results demonstrate that the proposed model achieves a BLEU-4 score of 0.7464, a CIDEr score of 5.0255, and a ROUGE_L score of 0.8106 on the validation set. These results indicate that the model effectively captures and accurately describes the complex semantic information present in construction images. Moreover, the model exhibits strong generalization, perceptual, and recognition capabilities, making it well suited for interpreting and analyzing intricate construction scenes.
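The beam search strategy mentioned above can be sketched generically; this is not the paper's exact decoder. `step_logprobs` is a hypothetical callback that returns next-token log-probabilities given a partial token sequence:

```python
import numpy as np

def beam_search(step_logprobs, beam_width=3, max_len=20, bos=1, eos=2):
    """Minimal beam search: keep the `beam_width` highest-scoring partial captions,
    retire any that emit <eos>, and return the best by length-normalized log-probability."""
    beams = [([bos], 0.0)]                                 # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logp = step_logprobs(seq)
            for tok in np.argsort(logp)[-beam_width:]:     # expand the top-k tokens per beam
                candidates.append((seq + [int(tok)], score + float(logp[tok])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:                                      # every surviving beam has finished
            break
    return max(finished + beams, key=lambda c: c[1] / len(c[0]))

rng = np.random.default_rng(0)
table = np.log(rng.dirichlet(np.ones(10), size=10))        # toy bigram "language model"
print(beam_search(lambda seq: table[seq[-1]], beam_width=3))
```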