Catalogue Search | MBRL
1,913 result(s) for "multi-scale features"
Multi-Scale Attention Network for Building Extraction from High-Resolution Remote Sensing Images
2024
Precise building extraction from high-resolution remote sensing images is of significant value for urban planning, resource management, and environmental conservation. In recent years, deep neural networks (DNNs) have garnered substantial attention for their adeptness in learning and extracting features, becoming integral to building extraction methodologies and yielding noteworthy performance. Nonetheless, prevailing DNN-based models for building extraction often overlook spatial information during the feature extraction phase. Additionally, many existing models employ a simplistic and direct approach in the feature fusion stage, potentially leading to spurious target detection and the amplification of internal noise. To address these concerns, we present a multi-scale attention network (MSANet) tailored for building extraction from high-resolution remote sensing images. In our approach, we first extract multi-scale building feature information, leveraging a multi-scale channel attention mechanism and a multi-scale spatial attention mechanism. Subsequently, we apply adaptive hierarchical weighting to the extracted building features. Concurrently, we introduce a gating mechanism to facilitate the effective fusion of multi-scale features. The efficacy of the proposed MSANet was evaluated on the WHU aerial image dataset and the WHU satellite image dataset. The experimental results demonstrate compelling performance, with F1 scores of 93.76% and 77.64% on the WHU aerial imagery dataset and WHU satellite dataset II, respectively. Furthermore, the intersection over union (IoU) values stood at 88.25% and 63.46%, surpassing benchmarks set by DeepLabV3 and GSMC.
Journal Article
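The multi-scale channel attention that MSANet builds on follows the familiar squeeze-and-excitation pattern: globally pool each channel, pass the result through a small bottleneck, and gate the channels with a sigmoid. The NumPy sketch below is illustrative only; the random weights stand in for learned parameters, it handles a single scale, and it is not the authors' MSANet code.

```python
import numpy as np

def channel_attention(feats, reduction=2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    feats: array of shape (C, H, W). Returns the reweighted features.
    """
    c, h, w = feats.shape
    # Squeeze: global average pooling over spatial dims -> one scalar per channel
    z = feats.mean(axis=(1, 2))                                # (C,)
    # Excite: tiny two-layer bottleneck; random weights stand in for learned ones
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c))
    w2 = rng.standard_normal((c, c // reduction))
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # sigmoid gate, (C,)
    # Scale each channel by its gate
    return feats * s[:, None, None]

x = np.ones((4, 8, 8))
y = channel_attention(x)
print(y.shape)
```

A spatial attention branch would do the dual operation: pool across channels to get an H×W map and gate each spatial position.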
Remote Sensing Imagery Super Resolution Based on Adaptive Multi-Scale Feature Fusion Network
by Wu, Yingdan; Wang, Xinying; Ming, Yang
in adaptive multi-scale feature fusion; remote sensing imagery; super-resolution
2020
Due to increasingly complex factors of image degradation, inferring high-frequency details of remote sensing imagery is more difficult than for ordinary digital photos. This paper proposes an adaptive multi-scale feature fusion network (AMFFN) for remote sensing image super-resolution. Firstly, features are extracted from the original low-resolution image. Then several adaptive multi-scale feature extraction (AMFE) modules, together with squeeze-and-excitation and adaptive gating mechanisms, are adopted for feature extraction and fusion. Finally, the sub-pixel convolution method is used to reconstruct the high-resolution image. Experiments are performed on three datasets; key characteristics, such as the number of AMFE modules and the gating connection scheme, are studied, and super-resolution of remote sensing imagery at different scale factors is qualitatively and quantitatively analyzed. The results show that our method outperforms classic methods such as the Super-Resolution Convolutional Neural Network (SRCNN), the Efficient Sub-Pixel Convolutional Network (ESPCN), and the multi-scale residual CNN (MSRN).
Journal Article
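The sub-pixel convolution step in AMFFN reconstructs the high-resolution image by rearranging r²·C coarse channels into a C-channel image upscaled by a factor of r (the standard pixel-shuffle layout). A minimal NumPy sketch of that rearrangement, not the authors' implementation:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    # Interleave the r*r channel groups into each h,w cell
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# Four 1x1 channels become one 2x2 image: [[0., 1.], [2., 3.]]
up = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2)
print(up[0])
```

In the full network a convolution first produces the r²·C channels; this function only performs the lossless reshuffle that follows it.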
A Multi-Scale Feature Pyramid Network for Detection and Instance Segmentation of Marine Ships in SAR Images
2022
In the remote sensing field, synthetic aperture radar (SAR) is a type of active microwave imaging sensor working in all-weather and all-day conditions, providing high-resolution SAR images of objects such as marine ships. Detection and instance segmentation of marine ships in SAR images has become an important problem in remote sensing, but current deep learning models cannot accurately quantify marine ships because of their multi-scale property in SAR images. In this paper, we propose a multi-scale feature pyramid network (MS-FPN) to achieve the simultaneous detection and instance segmentation of marine ships in SAR images. The proposed MS-FPN model uses a pyramid structure and is mainly composed of two proposed modules, namely the atrous convolutional pyramid (ACP) module and the multi-scale attention mechanism (MSAM) module. The ACP module is designed to extract both the shallow and deep feature maps, and these multi-scale feature maps are crucial for the description of multi-scale marine ships, especially the small ones. The MSAM module is designed to adaptively learn and select important feature maps obtained from different scales, leading to improved detection and segmentation accuracy. Quantitative comparison of the proposed MS-FPN model with several classical and recently developed deep learning models, using the high-resolution SAR images dataset (HRSID) that contains multi-scale marine ship SAR images, demonstrated the superior performance of MS-FPN over other models.
Journal Article
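The atrous convolutional pyramid (ACP) relies on dilated (atrous) convolutions, which space out the kernel taps to enlarge the receptive field without adding parameters. A 1-D toy sketch of the idea, illustrative only and not the MS-FPN code:

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D convolution with dilated (atrous) taps."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective receptive field
    out = []
    for i in range(len(signal) - span + 1):
        taps = signal[i : i + span : dilation]   # skip `dilation - 1` samples between taps
        out.append(float(np.dot(taps, kernel)))
    return np.array(out), span

x = np.arange(10.0)
# A 3-tap kernel with dilation 2 covers 5 input samples per output
y, rf = dilated_conv1d(x, np.array([1.0, 1.0, 1.0]), dilation=2)
print(rf, y)
```

Stacking several dilation rates, as an ACP-style pyramid does, lets the same kernel size summarize context at multiple scales.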
HCViT‐Net: Hybrid CNN and multi scale query transformer network for dermatological image segmentation
2025
Background: Dermoscopic lesion segmentation is crucial for dermatology, yet existing methods struggle to integrate global context with local details under the efficiency constraints required for clinical use. Purpose: We aim to develop a lightweight model that simultaneously captures long‐range spatial dependencies and preserves fine‐grained boundary details for dermoscopic lesions. The method is designed to achieve a favorable accuracy–efficiency trade‐off, thereby improving segmentation performance and ensuring potential for practical clinical deployment. Methods: We propose a lightweight hybrid model, HCViT‐Net, featuring an encoder–decoder architecture. It incorporates a multi‐scale query transformer (MSQFormer) into each stage of its convolutional encoder to efficiently capture global, multi‐scale context. Furthermore, a wavelet‐guided attention refinement module (WARM) is introduced on the highest‐resolution skip connection to selectively enhance high‐frequency boundary details and bridge the semantic gap between the encoder and decoder, thus improving model performance. Results: Evaluated on ISIC 2017 and 2018, our model achieved mean intersection‐over‐union (mIoU) of 87.76% and 87.45%, respectively. With only 5.76M parameters and 7.51 GFLOPs, it demonstrates performance competitive with existing methods at a significantly lower computational cost. Conclusions: HCViT‐Net achieves an excellent accuracy–efficiency trade‐off. It improves segmentation accuracy with a low computational footprint, showing strong potential for practical deployment in dermatology workflows.
Journal Article
Crop Classification Using MSCDN Classifier and Sparse Auto-Encoders with Non-Negativity Constraints for Multi-Temporal, Quad-Pol SAR Data
2021
Accurate and reliable crop classification information is a significant data source for agricultural monitoring and food security evaluation research. It is well known that polarimetric synthetic aperture radar (PolSAR) data provides ample information for crop classification. Moreover, multi-temporal PolSAR data can further increase classification accuracy, since crops show different external forms as they grow. In this paper, we distinguish crop types with multi-temporal PolSAR data. First, due to the “dimension disaster” of multi-temporal PolSAR data caused by excessive scattering parameters, a neural network of sparse auto-encoder with non-negativity constraint (NC-SAE) was employed to compress the data, yielding efficient features for accurate classification. Second, a novel crop discrimination network with multi-scale features (MSCDN) was constructed to improve classification performance, and it is shown to be superior to the popular classifiers of convolutional neural networks (CNN) and support vector machine (SVM). The performance of the proposed method was evaluated and compared with traditional methods using simulated Sentinel-1 data provided by the European Space Agency (ESA). For the final classification results of the proposed method, the overall accuracy and kappa coefficient reach 99.33% and 99.19%, respectively, almost 5% and 6% higher than the CNN method. The classification results indicate that the proposed methodology is promising for practical use in agricultural applications.
Journal Article
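One common way to impose a non-negativity constraint of the kind NC-SAE uses is projected gradient descent: after each weight update, negative entries are clipped to zero. The sketch below applies that projection to a tiny tied-weight autoencoder on random stand-in data; it illustrates the constraint only and is not the authors' NC-SAE (which also includes a sparsity penalty).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))                 # hypothetical stand-in for scattering-parameter vectors
W = rng.standard_normal((8, 4)) * 0.1    # encoder weights; decoder is the tied transpose W.T

lr = 0.05
for _ in range(100):
    H = np.maximum(X @ W, 0.0)           # ReLU encoding
    X_hat = H @ W.T                      # tied-weight decoding
    err = X_hat - X
    # Gradient of 0.5*||X_hat - X||^2 w.r.t. W (tied weights, ReLU mask on the encoder path)
    grad = X.T @ ((err @ W) * (H > 0)) + err.T @ H
    W -= lr * grad / len(X)
    W = np.maximum(W, 0.0)               # non-negativity constraint: project onto W >= 0

print(W.shape, (W >= 0).all())
```

The projection keeps every learned basis vector non-negative, which tends to yield parts-based, more interpretable features.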
Multi-Stage Multi-Scale Local Feature Fusion for Infrared Small Target Detection
2023
The detection of small infrared targets with dense distributions and large-scale variations is an extremely challenging problem. This paper proposes a multi-stage, multi-scale local feature fusion method for infrared small target detection to address this problem. Firstly, considering the significant variation in target sizes, ResNet-18 is utilized to extract image features at different stages. Then, for each stage, multi-scale feature pyramids are employed to obtain the corresponding multi-scale local features. Secondly, to enhance the detection rate of densely distributed targets, the multi-stage and multi-scale features are progressively fused and concatenated to form the final fusion results. Finally, the fusion results are fed into the target detector. The experimental results on the SIRST and MDFA datasets demonstrate that the proposed method effectively improves infrared small target detection, achieving mIoU values of 63.43% and 46.29%, along with F-measure values of 77.62% and 63.28%, respectively.
Journal Article
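A multi-scale local feature pyramid of the kind described above can be mimicked by average-pooling a feature map at several scales and upsampling the pooled maps back to full resolution for fusion. An illustrative NumPy sketch (not the paper's exact pyramid, which operates on learned ResNet-18 features):

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling of a 2-D map."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def multi_scale_features(fmap, scales=(1, 2, 4)):
    """Pool at several scales, upsample back by repetition, stack along a new axis."""
    outs = []
    for s in scales:
        pooled = avg_pool(fmap, s)
        up = np.repeat(np.repeat(pooled, s, axis=0), s, axis=1)
        outs.append(up)
    return np.stack(outs)                # (num_scales, H, W), ready for fusion

f = np.arange(16.0).reshape(4, 4)
ms = multi_scale_features(f)
print(ms.shape)
```

Scale 1 preserves fine local detail, while coarser scales carry increasingly global context; concatenating them is the simplest form of the fusion the abstract describes.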
Multi‐scale feature extraction for energy‐efficient object detection in remote sensing images
2024
Object detection in remote sensing images aims to interpret images to obtain the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature-descriptor methods to extract features, resulting in underperformance. Deep learning methods, especially one‐stage detectors such as the Real‐Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, difficulty in extracting features from complex backgrounds and localising targets under scale variations limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi‐Scale Feature Extraction‐assist RTMDet (MRTMDet), is proposed, which addresses these limitations through enhanced feature extraction and fusion networks. At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC‐FPN, which enhance the model's ability to capture global and multi‐scale features through carefully designed hybrid feature-processing units combining a CNN with a transformer, based on the vision transformer (ViT) and poly‐scale convolution (PSConv), respectively. These networks integrate a lightweight vision transformer and a multi‐scale feature extraction module in different structures, thereby enhancing the overall quality of feature representation and further augmenting the model's ability to perceive both global and multi‐scale features. Ablation and comparison experiments on the publicly available DIOR‐R dataset demonstrate that MRTMDet achieves a competitive performance of 62.2% mAP, with an excellent balance between precision and a lightweight design.
Journal Article
High-Resolution SAR Image Classification Using Multi-Scale Deep Feature Fusion and Covariance Pooling Manifold Network
2021
The classification of high-resolution (HR) synthetic aperture radar (SAR) images is of great importance for SAR scene interpretation and application. However, the presence of intricate spatial structural patterns and complex statistical nature makes SAR image classification a challenging task, especially in the case of limited labeled SAR data. This paper proposes a novel HR SAR image classification method using a multi-scale deep feature fusion network and covariance pooling manifold network (MFFN-CPMN). MFFN-CPMN combines the advantages of local spatial features and global statistical properties and considers the multi-feature information fusion of SAR images in representation learning. First, we propose a Gabor-filtering-based multi-scale feature fusion network (MFFN) to capture the spatial patterns and obtain discriminative features of SAR images. The MFFN is a deep convolutional neural network (CNN). To make full use of the large amount of unlabeled data, the weights of each layer of the MFFN are optimized by an unsupervised denoising dual-sparse encoder. Moreover, the feature fusion strategy in the MFFN can effectively exploit the complementary information between different levels and different scales. Second, we utilize a covariance pooling manifold network to further extract the global second-order statistics of SAR images over the fused feature maps. The resulting covariance descriptor is more discriminative across various land covers. Experimental results on four HR SAR images demonstrate the effectiveness of the proposed method, which achieves promising results compared with other related algorithms.
Journal Article
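Covariance (second-order) pooling, the core of the CPMN branch, summarizes a C-channel feature map by the C×C covariance of its channels across spatial positions, capturing how channels co-vary rather than just their means. A minimal sketch, illustrative and not MFFN-CPMN's code:

```python
import numpy as np

def covariance_pool(feats):
    """Global second-order (covariance) pooling: (C, H, W) -> (C, C)."""
    c = feats.shape[0]
    v = feats.reshape(c, -1)                 # each channel as a vector of H*W spatial samples
    v = v - v.mean(axis=1, keepdims=True)    # center per channel
    return (v @ v.T) / (v.shape[1] - 1)      # sample covariance across spatial positions

rng = np.random.default_rng(0)
f = rng.standard_normal((6, 8, 8))           # toy stand-in for fused feature maps
cov = covariance_pool(f)
print(cov.shape)
```

The resulting matrix is symmetric positive semi-definite, which is why manifold-aware layers (operating on the SPD manifold) are typically applied on top of it.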
FI‐Net: Rethinking Feature Interactions for Medical Image Segmentation
by
Liu, Jinhui
,
Liang, Haisu
,
Huang, Jinliang
in
Algorithms
,
Artificial neural networks
,
Batch processing
2024
To solve the problems of existing hybrid networks based on convolutional neural networks (CNNs) and Transformers, we propose FI‐Net, a new CNN‐Transformer encoder–decoder network for medical image segmentation that rethinks the uniqueness of feature interactions in medical images through four effective modules. In the encoder, a dual‐stream encoder captures local details and long‐range dependencies. Moreover, an attentional feature fusion module performs interactive fusion of the dual‐branch features, maximizing the retention of local details and global semantic information in medical images. At the same time, a multi‐scale feature aggregation module aggregates local information and captures multi‐scale context to mine more semantic details. A multi‐level feature bridging module is used in skip connections to bridge multi‐level features and mask information, assisting multi‐scale feature interaction. Experimental results on seven public medical image datasets fully demonstrate the effectiveness and advantages of our method. In future work, we plan to extend FI‐Net to 3D medical image segmentation tasks and to combine self‐supervised learning with knowledge distillation to alleviate the overfitting problem of training on limited data.
Journal Article
Wavelet‐Based Feature Extraction for Efficient High‐Resolution Image Classification
by
Akowuah, Emmanuel Kofi
,
Acquah, Isaac
,
Nunoo‐Mensah, Henry
in
Accuracy
,
Artificial neural networks
,
classification
2025
Convolutional neural networks (CNNs) typically compress high‐resolution images to minimize computational requirements. However, this can lead to loss of information and reduced accuracy in classification tasks. This paper introduces WaveNet, a novel approach for processing high‐resolution images using wavelet‐domain inputs in CNNs. We address the challenge of maintaining classification accuracy with high‐resolution inputs while minimizing computational complexity. Our method employs the wavelet packet transform (WPT) for image pre‐processing, extracting detailed multi‐scale and directional information from high‐resolution images. We propose a wavelet‐adaptive efficient channel attention (WAECA) module to dynamically select the most informative wavelet subbands. Popular CNN architectures such as ResNet‐50 and MobileNetV2 are adapted by replacing their initial convolutional layers with wavelet‐transformed inputs, enabling direct learning in the wavelet domain. Experiments conducted on the Caltech‐256 and ALOT datasets demonstrate that WaveNet improves classification accuracy while reducing computational complexity. For instance, our wavelet‐enhanced ResNet‐50 achieves a Top‐1 accuracy of 72.47% on Caltech‐256, outperforming the baseline (70.65%) while reducing FLOPs from 16.52G to 3.98G. Similar improvements are observed across different architectures and datasets. We also evaluate various wavelet filters and ResNet backbones, finding that the bior1.1 filter and ResNet‐50 provide optimal performance. This work presents a practical solution for developing more accurate and efficient models for high‐resolution inputs without extensive computational resources or complex architectural modifications.
Journal Article
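The wavelet packet transform used by WaveNet recursively splits an image into subbands. One level of the simplest case, the 2-D Haar transform, can be sketched as below; this uses an averaging normalization for clarity and is illustrative only (the paper's bior1.1 filters and full WPT recursion differ):

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar wavelet transform: returns LL, LH, HL, HH subbands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]   # 2x2 block corners
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0     # approximation (low-low): local averages
    lh = (a + b - c - d) / 4.0     # horizontal detail
    hl = (a - b + c - d) / 4.0     # vertical detail
    hh = (a - b - c + d) / 4.0     # diagonal detail
    return ll, lh, hl, hh

img = np.ones((8, 8))              # a constant image has no detail content
ll, lh, hl, hh = haar2d(img)
print(ll.shape)
```

Each subband is half the input resolution, so feeding subbands to a CNN in place of raw pixels shrinks the spatial input by 4x per level while keeping directional high-frequency information, which is the efficiency lever the abstract describes.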