Catalogue Search | MBRL
27 result(s) for "Photography Masking."
Real-Time Aerial Multispectral Object Detection with Dynamic Modality-Balanced Pixel-Level Fusion
2025
Aerial object detection plays a critical role in numerous fields, utilizing the flexibility of airborne platforms to achieve real-time tasks. Combining visible and infrared sensors can overcome limitations under low-light conditions, enabling full-time tasks. While feature-level fusion methods exhibit comparable performances in visible–infrared multispectral object detection, they suffer from heavy model size, inadequate inference speed and visible light preferences caused by inherent modality imbalance, limiting their applications in airborne platform deployment. To address these challenges, this paper proposes a YOLO-based real-time multispectral fusion framework combining pixel-level fusion with dynamic modality-balanced augmentation called Full-time Multispectral Pixel-wise Fusion Network (FMPFNet). Firstly, we introduce the Multispectral Luminance Weighted Fusion (MLWF) module consisting of attention-based modality reconstruction and feature fusion. By leveraging YUV color space transformation, this module efficiently fuses RGB and IR modalities while minimizing computational overhead. We also propose the Dynamic Modality Dropout and Threshold Masking (DMDTM) strategy, which balances modality attention and improves detection performance in low-light scenarios. Additionally, we refine our model to enhance the detection of small rotated objects, a requirement commonly encountered in aerial detection applications. Experimental results on the DroneVehicle dataset demonstrate that our FMPFNet achieves 76.80% mAP50 and 132 FPS, outperforming state-of-the-art feature-level fusion methods in both accuracy and inference speed.
Journal Article
Landslide Displacement Prediction via Attentive Graph Neural Network
2022
Landslides are among the most common geological hazards that result in considerable human and economic losses globally. Researchers have put great efforts into addressing the landslide prediction problem for decades. Previous methods either focus on analyzing the landslide inventory maps obtained from aerial photography and satellite images or propose machine learning models—trained on historical land deformation data—to predict future displacement and sedimentation. However, existing approaches generally fail to capture complex spatial deformations and their inter-dependencies in different areas. This work presents a novel landslide prediction model based on graph neural networks, which utilizes graph convolutions to aggregate spatial correlations among different monitored locations. Besides, we introduce a novel locally historical transformer network to capture dynamic spatio-temporal relations and predict the surface deformation. We conduct extensive experiments on real-world data and demonstrate that our model significantly outperforms state-of-the-art approaches in terms of prediction accuracy and model interpretations.
Journal Article
Quality Index for Stereoscopic Images by Separately Evaluating Adding and Subtracting
2015
The human visual system (HVS) plays an important role in stereo image quality perception. How to exploit knowledge of visual perception in image quality assessment models has therefore attracted considerable interest. This paper proposes a full-reference metric for quality assessment of stereoscopic images based on the binocular difference channel and binocular summation channel. For a stereo pair, the binocular summation map and binocular difference map are computed first by adding and subtracting the left image and right image. Then the binocular summation is decoupled into two parts, namely additive impairments and detail losses. The quality of binocular summation is obtained as the adaptive combination of the quality of detail losses and additive impairments. The quality of binocular summation is computed by using the Contrast Sensitivity Function (CSF) and weighted multi-scale (MS-SSIM). Finally, the quality of binocular summation and binocular difference is integrated into an overall quality index. The experimental results indicate that, compared with existing metrics, the proposed metric is highly consistent with the subjective quality assessment and is a robust measure. The results have also indirectly proved the hypothesis of the existence of binocular summation and binocular difference channels.
Journal Article
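The first step described in the abstract above, forming the binocular summation and difference maps by adding and subtracting the left and right images, can be sketched as follows. This is only a minimal illustration: the function name binocular_maps and the tiny 2 × 2 grayscale images are hypothetical, and the later stages of the metric (decoupling, CSF weighting, MS-SSIM) are not covered.

```python
def binocular_maps(left, right):
    """Return (summation, difference) maps for two equal-sized grayscale images.

    Images are plain nested lists of pixel intensities (an assumption made
    to keep the sketch dependency-free).
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("left and right images must have the same shape")
    # Pixel-wise sum of the two views.
    summation = [
        [l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)
    ]
    # Pixel-wise difference (left minus right).
    difference = [
        [l - r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)
    ]
    return summation, difference


# Hypothetical 2 x 2 stereo pair.
left = [[10, 20], [30, 40]]
right = [[12, 18], [30, 35]]
summation, difference = binocular_maps(left, right)
```

Pixels where the two views agree give a zero in the difference map, so that map isolates the disparity between the views while the summation map carries the shared content.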
A transformer-based UAV instance segmentation model TF-YOLOv7
2024
In dense-target scenes in real cities, the key challenge for UAV instance segmentation is to label different targets efficiently while overcoming the mutual occlusion that dense targets cause. To address this problem, this paper proposes TF-YOLOv7, a model for UAV instance segmentation. The model introduces the Swin Transformer structure in the backbone network to construct a hierarchical feature map by fusing deep network feature blocks, which is well suited to the dense recognition task of instance segmentation. In addition, the Bottleneck Transformer structure is introduced in the detection stage: convolution recognizes the abstract information of the underlying features, and the higher-level information obtained through the convolution layer is processed using the self-attention mechanism, which can effectively handle high-resolution images. Finally, the Focal-EIoU loss function is introduced to further optimize masking performance on mutually occluded small targets and improve the segmentation of occluded targets. In experimental validation on the UAV aerial photography dataset VisDroneDET, the proposed model achieves a 2.2% performance improvement over the benchmark model YOLOv7, indicating that the model is suitable for UAV instance segmentation tasks.
Journal Article
Physics-Prior-Guided Feature Pyramid Network for Unified Multi-Angle Spectral–Polarimetric Cloud Detection
by Gan, Yongyin; Wang, Xinqiang; Ji, Xingyuan
in Algorithms, Artificial intelligence, attention mechanism
2026
Accurate cloud detection remains a significant challenge due to the spectral ambiguity between clouds and bright or heterogeneous surfaces (e.g., snow, desert). While multi-angle and polarization data offer rich information, the discriminative power of joint spectral analysis for resolving these ambiguities has been underexploited. In this work, we demonstrate that physically motivated spectral band ratios and differences can robustly enhance cloud signatures. Motivated by this insight, we propose a novel deep learning framework, the Multi-angle Polarization Feature Pyramid Structure (MP-FPS), that explicitly leverages joint spectral features as discriminative priors. Our architecture employs a dual-branch network to disentangle and adaptively fuse spectral and multi-angle polarization modalities. Within this framework, a hierarchical, multi-scale cross-channel multi-angle fusion module dynamically captures spatial–spectral–angular dependencies, enriching the structural representation of clouds. Furthermore, a channel-space dual-path attention mechanism refines sub-pixel responses, significantly improving detection accuracy in challenging regions such as cloud edges and thin cirrus. Evaluated on the global POLDER-3 dataset, MP-FPS achieves a mean Intersection over Union (mIoU) of 0.8662 across diverse surface types, surpassing the official baseline by 12.4%. This study establishes joint spectral analysis as a critical enabler for high-precision cloud masking, and demonstrates its synergistic value when integrated with multi-angle polarimetric information in a unified deep architecture.
Journal Article
CAT: Causal Attention with Linear Complexity for Efficient and Interpretable Hyperspectral Image Classification
2026
Hyperspectral image (HSI) classification is pivotal in remote sensing, yet deep learning models, particularly Transformers, remain susceptible to spurious spectral–spatial correlations and suffer from limited interpretability. These issues stem from their inability to model the underlying causal structure in high-dimensional data. This paper introduces the Causal Attention Transformer (CAT), a novel architecture that integrates causal inference with a hierarchical CNN-Transformer backbone to address these limitations. CAT incorporates three key modules: (1) a Causal Attention Mechanism that enforces temporal and spatial causality via triangular masking and axial decomposition to eliminate spurious dependencies; (2) a Dual-Path Hierarchical Fusion module that adaptively integrates spectral and spatial causal features using learnable gating; and (3) a Linearized Causal Attention module that reduces the computational complexity from O(N²) to O(N) via kernelized cumulative summation, enabling scalable high-resolution HSI processing. Extensive experiments on three benchmark datasets (Indian Pines, Pavia University, Houston2013) demonstrate that CAT achieves state-of-the-art performance, outperforming leading CNN and Transformer models in both accuracy and robustness. Furthermore, CAT provides inherently interpretable spectral–spatial causal maps, offering valuable insights for reliable remote sensing analysis.
Journal Article
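The abstract above mentions triangular masking and a reduction from quadratic to linear cost via kernelized cumulative summation. The sketch below illustrates that general idea only; it is not the CAT implementation. It assumes the feature map elu(x) + 1 that is commonly used for linear attention, and all function names and sample values are hypothetical. The quadratic version applies the triangular mask explicitly (position i attends only to positions j ≤ i), while the linear version obtains the same outputs from running sums.

```python
import math


def phi(x):
    # Positive feature map elu(x) + 1 (an assumption; the paper's kernel is not given).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(Q, K, V):
    """Kernelized attention with an explicit triangular mask: O(N^2) in sequence length."""
    d_v = len(V[0])
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        # Triangular mask: only keys at positions j <= i contribute.
        weights = [dot(fq, phi(K[j])) for j in range(i + 1)]
        total = sum(weights)
        out.append(
            [sum(w * V[j][c] for j, w in enumerate(weights)) / total for c in range(d_v)]
        )
    return out


def causal_attention_linear(Q, K, V):
    """Same outputs from cumulative sums: O(N) in sequence length."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / norm for c in range(d_v)])
    return out


# Hypothetical sequence of three tokens with 2-dimensional queries, keys and values.
Q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
K = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
quadratic = causal_attention_quadratic(Q, K, V)
linear = causal_attention_linear(Q, K, V)
```

Because the running sums S and z are updated once per token, the cost per token does not depend on how many tokens came before it, which is where the linear scaling comes from.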
Tree-CRowNN: A Network for Estimating Forest Stand Density from VHR Aerial Imagery
by Zhang, Ying; Richardson, Galen; Richardson, Elisha
in Aerial photography, Annotations, Artificial intelligence
2023
Estimating the number of trees within a forest stand, i.e., the forest stand density (FSD), is challenging at large scales. Recently, researchers have turned to a combination of remote sensing and machine learning techniques to derive these estimates. However, in most cases, the developed models rely heavily upon additional data such as LiDAR-based elevations or multispectral information and are mostly applied to managed environments rather than natural/mixed forests. Furthermore, they often require the time-consuming manual digitization or masking of target features, or an annotation using a bounding box rather than a simple point annotation. Here, we introduce the Tree Convolutional Row Neural Network (Tree-CRowNN), an alternative model for tree counting inspired by Multiple-Column Neural Network architecture to estimate the FSD over 12.8 m × 12.8 m plots from high-resolution RGB aerial imagery. Our model predicts the FSD with very high accuracy (MAE: ±2.1 stems/12.8 m², RMSE: 3.0) over a range of forest conditions and shows promise in linking to Sentinel-2 imagery for broad-scale mapping (R²: 0.43, RMSE: 3.9 stems/12.8 m²). We believe that the satellite imagery linkage will be strengthened with future efforts, and transfer learning will enable the Tree-CRowNN model to predict the FSD accurately in other ecozones.
Journal Article
How ubiquitous is the direct-gaze advantage? Evidence for an averted-gaze advantage in a gaze-discrimination task
by Huestegge, Lynn; Riechelmann, Eva; Böckler, Anne
in Advantages, Attention, Behavioral Science and Psychology
2021
Human eye gaze conveys an enormous amount of socially relevant information, and the rapid assessment of gaze direction is of particular relevance in order to adapt behavior accordingly. Specifically, previous research demonstrated evidence for an advantage of processing direct (vs. averted) gaze. The present study examined discrimination performance for gaze direction (direct vs. averted) under controlled presentation conditions: Using a backward-masking gaze-discrimination task, photographs of faces with direct and averted gaze were briefly presented, followed by a mask stimulus. Additionally, effects of facial context on gaze discrimination were assessed by either presenting gaze direction in isolation (i.e., by only showing the eye region) or in the context of an upright or inverted face. Across three experiments, we consistently observed a facial context effect with highest discrimination performance for faces presented in upright position, lower performance for inverted faces, and lowest performance for eyes presented in isolation. Additionally, averted gaze was generally responded to faster and with higher accuracy than direct gaze, indicating an averted-gaze advantage. Overall, the results suggest that direct gaze is not generally associated with processing advantages, thereby highlighting the important role of presentation conditions and task demands in gaze perception.
Journal Article
Zebra Stripes through the Eyes of Their Predators, Zebras, and Humans
by Kline, Donald W.; Hiramatsu, Chihiro; Caro, Tim
in Animals, Biology and Life Sciences, Camouflage
2016
The century-old idea that stripes make zebras cryptic to large carnivores has never been examined systematically. We evaluated this hypothesis by passing digital images of zebras through species-specific spatial and colour filters to simulate their appearance for the visual systems of zebras' primary predators and zebras themselves. We also measured stripe widths and luminance contrast to estimate the maximum distances from which lions, spotted hyaenas, and zebras can resolve stripes. We found that beyond ca. 50 m (daylight) and 30 m (twilight) zebra stripes are difficult for the estimated visual systems of large carnivores to resolve, but not humans. On moonless nights, stripes are difficult for all species to resolve beyond ca. 9 m. In open treeless habitats where zebras spend most time, zebras are as clearly identified by the lion visual system as are similar-sized ungulates, suggesting that stripes cannot confer crypsis by disrupting the zebra's outline. Stripes confer a minor advantage over solid pelage in masking body shape in woodlands, but the effect is stronger for humans than for predators. Zebras appear to be less able than humans to resolve stripes although they are better than their chief predators. In conclusion, compared to the uniform pelage of other sympatric herbivores it appears highly unlikely that stripes are a form of anti-predator camouflage.
Journal Article