Catalogue Search | MBRL
53 result(s) for "Diao, Wenhui"
IoU-Adaptive Deformable R-CNN: Make Full Use of IoU for Multi-Class Object Detection in Remote Sensing Imagery
by Yan, Jiangqiao; Wang, Hongqi; Diao, Wenhui
in anchor matching; Artificial neural networks; Aspect ratio
2019
Recently, methods based on the Faster region-based convolutional neural network (R-CNN) have become popular for multi-class object detection in remote sensing images due to their outstanding detection performance. These methods generally propose candidate regions of interest (ROIs) through a region proposal network (RPN), and regions whose intersection-over-union (IoU) against the ground truth is high enough are treated as positive samples for training. In this paper, we find that the detection results of such methods are sensitive to the choice of IoU threshold. Specifically, detection performance on small objects is poor when a conventional higher threshold is chosen, while a lower threshold results in poor localization accuracy caused by a large number of false positives. To address these issues, we propose a novel IoU-Adaptive Deformable R-CNN framework for multi-class object detection. Specifically, by analyzing the different roles that IoU can play in different parts of the network, we propose an IoU-guided detection framework to reduce the loss of small-object information during training. In addition, an IoU-based weighted loss is designed, which learns the IoU information of positive ROIs to improve detection accuracy effectively. Finally, a class aspect-ratio constrained non-maximum suppression (CARC-NMS) is proposed, which further improves the precision of the results. Extensive experiments validate the effectiveness of our approach, and we achieve state-of-the-art detection performance on the DOTA dataset.
Journal Article
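The threshold sensitivity this abstract describes comes down to how ROIs are matched against ground truth: each ROI is labeled positive only if its best IoU clears a preset cutoff. A minimal sketch of that matching step (function names and the 0.5 threshold are illustrative, not taken from the paper):

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_positives(rois, gt_boxes, pos_thresh=0.5):
    # An ROI is a positive training sample if its best IoU against any
    # ground-truth box reaches the threshold; lowering pos_thresh admits
    # more (noisier) positives, raising it starves small objects.
    return [max(iou(r, g) for g in gt_boxes) >= pos_thresh for r in rois]
```

Small objects rarely reach a high IoU with any proposal, which is why a single fixed threshold trades small-object recall against localization quality.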
WSF-NET: Weakly Supervised Feature-Fusion Network for Binary Segmentation in Remote Sensing Image
2018
Binary segmentation in remote sensing aims to obtain a binary prediction mask that classifies each pixel in a given image. Deep learning methods have shown outstanding performance in this task, but existing fully supervised methods need massive high-quality datasets with manual pixel-level annotations, which are generally expensive and sometimes unreliable. Recently, weakly supervised methods using only image-level annotations have proven effective on natural imagery, significantly reducing the dependence on manual fine labeling. In this paper, we review existing methods and propose a novel weakly supervised binary segmentation framework that addresses class imbalance via a balanced binary training strategy. In addition, a weakly supervised feature-fusion network (WSF-Net) is introduced to adapt to the unique characteristics of objects in remote sensing images. Experiments were conducted on two challenging remote sensing datasets: the Water dataset, acquired from Google Earth at a resolution of 0.5 m, and the Cloud dataset, acquired by the Gaofen-1 satellite at a resolution of 16 m. The results demonstrate that, using only image-level annotations, our method achieves results comparable to fully supervised methods.
Journal Article
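A balanced binary training strategy of the kind this abstract mentions typically equalizes the number of foreground and background pixels seen per batch. One simple way to do that, sketched here under the assumption of per-pixel 0/1 labels (the sampling scheme is illustrative, not the paper's exact strategy):

```python
import random

def balanced_sample(pixel_labels, n_per_class, seed=0):
    # Draw an equal number of foreground (1) and background (0) pixel indices,
    # capped by the rarer class, so neither class dominates the loss.
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(pixel_labels) if y == 1]
    neg = [i for i, y in enumerate(pixel_labels) if y == 0]
    k = min(n_per_class, len(pos), len(neg))
    return rng.sample(pos, k) + rng.sample(neg, k)
```

In water or cloud masks the foreground can be a tiny fraction of the image, so without such balancing a network can score well by predicting all background.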
LAM: Remote Sensing Image Captioning with Label-Attention Mechanism
2019
Significant progress has been made in remote sensing image captioning by encoder-decoder frameworks. The conventional attention mechanism is prevalent in this task but has a drawback: it uses only visual information about the remote sensing images, without using label information to guide the calculation of attention masks. To this end, a novel attention mechanism, the Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally utilizes the label information of high-resolution remote sensing images to generate natural sentences describing the given images. Notably, instead of high-level image features, the word embedding vectors of the predicted categories are adopted to guide the calculation of attention masks. Representing the content of images as word embedding vectors can filter out redundant image features while preserving pure and useful information for generating complete sentences. Experimental results on UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM improves the model's performance in describing high-resolution remote sensing images and obtains better S_m scores than other methods, where S_m is a hybrid score derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by an experiment using true labels.
Journal Article
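The core idea of label-guided attention is that the query is a class word embedding rather than an image feature: each region is scored against the predicted label's embedding and the scores are softmaxed into an attention mask. A toy sketch with plain dot-product attention (the real LAM operates inside an encoder-decoder; this shape is an assumption for illustration):

```python
import math

def label_attention(region_feats, label_embedding):
    # Score each region feature by its dot product with the predicted class's
    # word embedding, softmax the scores into attention weights, and return
    # the attention-pooled feature together with the weights.
    scores = [sum(f * w for f, w in zip(feat, label_embedding)) for feat in region_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(region_feats[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, region_feats)) for d in range(dim)]
    return pooled, weights
```

Regions aligned with the label embedding receive higher weight, which is how label information suppresses redundant visual features.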
SA-SatMVS: Slope Feature-Aware and Across-Scale Information Integration for Large-Scale Earth Terrain Multi-View Stereo
2024
Satellite multi-view stereo (MVS) is a fundamental task in large-scale Earth surface reconstruction. Recently, learning-based multi-view stereo methods have shown promising results in this field. However, these methods are mainly developed by transferring the general learning-based MVS framework to satellite imagery, without considering the specific terrain features of the Earth's surface, which results in inadequate accuracy. In addition, mainstream learning-based methods mainly use equal height-interval partitions, which under-utilize the height hypothesis surface and lead to inaccurate height estimation. To address these challenges, we propose an end-to-end terrain feature-aware height estimation network named SA-SatMVS for large-scale Earth surface multi-view stereo, which integrates information across different scales. Firstly, we transform the Sobel operator into slope feature-aware kernels to extract terrain features, and a dual encoder-decoder architecture with residual blocks is applied to incorporate slope information and geometric structural characteristics to guide the reconstruction process. Secondly, we introduce a pixel-wise unequal interval partition method using a Laplacian distribution based on the probability volume obtained from other scales, resulting in more accurate height hypotheses for height estimation. Thirdly, we apply an adaptive spatial feature extraction network to search for the optimal fusion method for feature maps at different scales. Extensive experiments on the WHU-TLC dataset demonstrate that our proposed model achieves the best MAE of 1.875 and an RMSE of 3.785, a state-of-the-art performance.
Journal Article
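The Sobel operator the abstract starts from is a pair of 3×3 kernels whose responses approximate the horizontal and vertical height gradients; their magnitude is a simple slope proxy. A plain-Python sketch of that baseline (the paper's learned slope-aware kernels would replace these fixed coefficients):

```python
def sobel_slope(height):
    # Gradient magnitude of a 2-D height grid via the standard 3x3 Sobel
    # kernels; border cells are left at zero for simplicity.
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    rows, cols = len(height), len(height[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = sum(kx[a][b] * height[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * height[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out
```

Flat terrain yields zero response and a uniform ramp yields a constant one, which is exactly the kind of signal a slope-aware reconstruction network can condition on.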
Dynamic Pseudo-Label Generation for Weakly Supervised Object Detection in Remote Sensing Images
2021
In recent years, fully supervised object detection methods with good performance have been developed for remote sensing images. However, this approach requires a large number of instance-level annotated samples, which are relatively expensive to acquire. Therefore, weakly supervised learning using only image-level annotations has attracted much attention. Most weakly supervised object detection methods are based on multi-instance learning, and their performance depends on how candidate region proposals are scored during training. In this process, supervision from image-level labels alone usually cannot yield optimal results due to the lack of object location information. To address this problem, a dynamic sample pseudo-label generation framework is proposed to generate pseudo-labels for each proposal without additional annotations. First, we propose a pseudo-label generation algorithm (PLG) that generates the category label of a proposal by using the localization information of the object. Specifically, we use the pixel average of the object's localization map within the proposal as the proposal's category confidence and compute the pseudo-label by comparing this confidence with a preset threshold. In addition, an effective adaptive threshold selection strategy is designed to eliminate the effect of shape differences between categories when computing sample pseudo-labels. Comparative experiments on the NWPU VHR-10 dataset demonstrate that our method significantly improves detection performance compared to existing methods.
Journal Article
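The PLG scoring rule described here is concrete enough to sketch directly: the proposal's confidence is the mean of the localization map inside its box, thresholded into a binary pseudo-label. A minimal version (coordinate convention and the 0.5 default are assumptions; the paper selects the threshold adaptively per category):

```python
def proposal_confidence(loc_map, box):
    # Mean activation of a 2-D localization map inside a proposal box
    # (x1, y1, x2, y2), with x2 and y2 exclusive.
    x1, y1, x2, y2 = box
    vals = [loc_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(vals) / len(vals)

def pseudo_label(loc_map, box, thresh=0.5):
    # Positive pseudo-label when the proposal's mean activation clears the threshold.
    return 1 if proposal_confidence(loc_map, box) >= thresh else 0
```

Averaging over the whole box is what makes a fixed threshold shape-sensitive: elongated or hollow objects dilute their own activation, which motivates the adaptive threshold strategy.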
Towards Automated Ship Detection and Category Recognition from High-Resolution Aerial Images
2019
Ship category classification in high-resolution aerial images has attracted great interest in applications such as maritime security, naval construction, and port management. However, previous methods have been limited by the following issues: (i) existing ship category classification methods mainly classify accurately cropped image patches, which is unsatisfactory in practical applications because the location of the ship within a patch obtained from object detection varies greatly; (ii) factors such as target scale variation and class imbalance strongly influence classification performance. To address these issues, we propose a novel ship detection and category classification framework in which classification is based on accurate localization. The detection network generates more precise rotated bounding boxes in large-scale aerial images by introducing a novel Sequence Local Context (SLC) module. In addition, three different ship category classification networks are proposed to eliminate the effect of scale variation, and a Spatial Transform Crop (STC) operation is used to obtain aligned image patches. To cope with both insufficient samples and class imbalance, a Proposals Simulation Generator (PSG) is introduced. Experiments on the 19-class ship dataset HRSC2016 and our multiclass warship dataset demonstrate the state-of-the-art performance of our framework.
Journal Article
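The rotated bounding boxes this abstract relies on are usually parameterized as (center, width, height, angle); recovering the four corner points is the first step of any align-and-crop operation like STC. A small geometric sketch (the parameterization is the common convention, not necessarily the paper's exact one):

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_rad):
    # Corners of a rotated box given center (cx, cy), size (w, h) and rotation
    # angle; listed top-left, top-right, bottom-right, bottom-left before rotation.
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]
```

Rotating the crop to these corners is what yields the aligned, orientation-normalized patches that make downstream category classification tractable.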
C3Net: Cross-Modal Feature Recalibrated, Cross-Scale Semantic Aggregated and Compact Network for Semantic Segmentation of Multi-Modal High-Resolution Aerial Images
2021
Semantic segmentation of multi-modal remote sensing images is an important branch of remote sensing image interpretation. Multi-modal data has been proven to provide rich complementary information for dealing with complex scenes. In recent years, semantic segmentation based on deep learning has made remarkable achievements. It is common to simply concatenate multi-modal data or to use parallel branches that extract multi-modal features separately. However, most existing works ignore the effects of noise and redundant features from different modalities, which can lead to unsatisfactory results. On the one hand, existing networks neither learn the complementary information of different modalities nor suppress the mutual interference between them, which may decrease segmentation accuracy. On the other hand, introducing multi-modal data greatly increases the running time of pixel-level dense prediction. In this work, we propose an efficient C3Net that strikes a balance between speed and accuracy. More specifically, C3Net contains several backbones for extracting the features of different modalities. Then, a plug-and-play module is designed to effectively recalibrate and aggregate multi-modal features. To reduce the number of model parameters while preserving model performance, we redesign the semantic contextual extraction module based on lightweight convolutional groups. In addition, a multi-level knowledge distillation strategy is proposed to improve the performance of the compact model. Experiments on the ISPRS Vaihingen dataset demonstrate the superior performance of C3Net, with 15× fewer FLOPs than the state-of-the-art baseline network while providing comparable overall accuracy.
Journal Article
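Knowledge distillation, which the abstract uses to recover the compact model's accuracy, typically minimizes the KL divergence between temperature-softened teacher and student distributions. A minimal sketch of that loss (the T² scaling follows the standard distillation convention; the paper's multi-level variant applies this at several depths):

```python
import math

def _softmax(logits, temperature):
    # Numerically stable softmax of temperature-scaled logits.
    m = max(l / temperature for l in logits)
    exps = [math.exp(l / temperature - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from the softened teacher distribution to the student's,
    # scaled by T^2 so gradients keep a consistent magnitude across temperatures.
    p = _softmax(teacher_logits, temperature)
    q = _softmax(student_logits, temperature)
    return temperature ** 2 * sum(pi * math.log(pi / qi)
                                  for pi, qi in zip(p, q) if pi > 0)
```

A higher temperature flattens the teacher's distribution, exposing the inter-class similarities that the compact student would otherwise never see.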
AF-EMS Detector: Improve the Multi-Scale Detection Performance of the Anchor-Free Detector
2021
As a precursor step for computer vision algorithms, object detection plays an important role in various practical application scenarios. As the objects to be detected become more complex, multi-scale object detection has attracted more and more attention, especially in remote sensing. Early convolutional neural network detection algorithms mostly rely on artificially preset anchor boxes to divide the image into regions and obtain prior positions of the target. However, anchor boxes are difficult to set reasonably and cause a large amount of computational redundancy, which affects the generality of a detection model obtained under fixed parameters. In the past two years, anchor-free detection algorithms have achieved remarkable progress on natural images. However, there has been insufficient research on how to handle multi-scale detection more effectively in an anchor-free framework and how to use such detectors on remote sensing images. In this paper, we propose a specific-attention Feature Pyramid Network (FPN) module, which generates a feature pyramid based on the characteristics of objects of various sizes and is better suited to multi-scale object detection. In addition, a scale-aware detection head is proposed, containing a multi-receptive feature fusion module and a size-based feature compensation module. The new anchor-free detector obtains a more effective multi-scale feature expression. Experiments on challenging datasets show that our approach performs favorably against other methods in terms of multi-scale object detection performance.
Journal Article
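The multi-scale problem in any FPN-style detector starts with deciding which pyramid level handles which object size. The widely used heuristic assigns larger objects to coarser levels via a logarithmic rule; a sketch of that baseline assignment (the canonical size and level bounds here are the common defaults, not values from the paper):

```python
import math

def fpn_level(w, h, canonical_size=224, canonical_level=4, min_level=2, max_level=5):
    # Map an object of size (w, h) to a pyramid level: an object of the
    # canonical size lands on the canonical level, and each doubling of scale
    # moves it one level coarser, clamped to the available levels.
    k = canonical_level + math.log2(max(1e-6, math.sqrt(w * h) / canonical_size))
    return max(min_level, min(max_level, int(math.floor(k))))
```

Small remote sensing targets pile up on the finest levels under this rule, which is the imbalance that scale-aware heads and feature compensation modules aim to correct.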
Active Bidirectional Self-Training Network for Cross-Domain Segmentation in Remote-Sensing Images
2024
Semantic segmentation with cross-domain adaptation in remote-sensing images (RSIs) is crucial and mitigates the expense of manually labeling target data. However, the performance of existing unsupervised domain adaptation (UDA) methods is still significantly impacted by domain bias, leaving a considerable gap to supervised trained models. To address this, our work focuses on semi-supervised domain adaptation, using active learning (AL) to select a small subset of target annotations that maximizes the information gained for domain adaptation. Overall, we propose a novel active bidirectional self-training network (ABSNet) for cross-domain semantic segmentation in RSIs. ABSNet consists of two sub-stages: a multi-prototype active region selection (MARS) stage and a source-weighted class-balanced self-training (SCBS) stage. The MARS approach captures the diversity of the labeled source data by introducing multi-prototype density estimation based on Gaussian mixture models; we then measure inter-domain similarity to select complementary and representative target samples. Through fine-tuning with the selected active samples, we propose an enhanced self-training strategy, SCBS, designed for weighted training on source data to avoid the negative effects of interfering samples. We conduct extensive experiments on the LoveDA and ISPRS datasets to validate the superiority of our method over existing state-of-the-art domain-adaptive semantic segmentation methods.
Journal Article
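One simplified reading of the MARS idea is to model the source feature distribution as a mixture of Gaussians centered on class prototypes, then spend the annotation budget on the target samples that the source mixture explains worst. A 1-D toy sketch under that reading (the selection criterion here is an assumption; the paper combines density with complementarity and representativeness):

```python
import math

def mixture_density(x, prototype_means, variance):
    # Density of a 1-D feature under an equal-weight Gaussian mixture
    # centered on the source prototypes with a shared variance.
    norm = math.sqrt(2 * math.pi * variance)
    return sum(math.exp(-(x - m) ** 2 / (2 * variance)) / norm
               for m in prototype_means) / len(prototype_means)

def select_for_annotation(target_feats, prototype_means, variance, budget):
    # Pick the target samples least explained by the source mixture: the most
    # source-dissimilar points are the most informative ones to label.
    return sorted(target_feats,
                  key=lambda x: mixture_density(x, prototype_means, variance))[:budget]
```

Samples lying far from every source prototype are exactly the ones self-training mislabels most often, so annotating them first gives the largest adaptation gain per label.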