Catalogue Search | MBRL

Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation

by Zheng Zhedong , Yang, Yi in Adaptation , Benchmarks , Confidence

2021

This paper focuses on unsupervised domain adaptation, transferring knowledge from the source domain to the target domain in the context of semantic segmentation. Existing approaches usually regard the pseudo label as the ground truth to fully exploit the unlabeled target-domain data. Yet the pseudo labels of the target-domain data are usually predicted by the model trained on the source domain. Thus, the generated labels inevitably contain incorrect predictions due to the discrepancy between the training domain and the test domain, which could be transferred to the final adapted model and largely compromise the training process. To overcome the problem, this paper proposes to explicitly estimate the prediction uncertainty during training to rectify the pseudo label learning for unsupervised semantic segmentation adaptation. Given the input image, the model outputs the semantic segmentation prediction as well as the uncertainty of the prediction. Specifically, we model the uncertainty via the prediction variance and incorporate the uncertainty into the optimization objective. To verify the effectiveness of the proposed method, we evaluate it on two prevalent synthetic-to-real semantic segmentation benchmarks, i.e., GTA5 → Cityscapes and SYNTHIA → Cityscapes, as well as one cross-city benchmark, i.e., Cityscapes → Oxford RobotCar. We demonstrate through extensive experiments that the proposed approach (1) dynamically sets different confidence thresholds according to the prediction variance, (2) rectifies the learning from noisy pseudo labels, and (3) achieves significant improvements over the conventional pseudo label learning and yields competitive performance on all three benchmarks.
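
The abstract does not spell out how the prediction variance enters the objective. The sketch below shows one plausible per-pixel form, offered as an illustration rather than the authors' implementation. It assumes the model has two prediction heads, measures their disagreement with a KL divergence, and uses that value both to down-weight the pseudo-label cross-entropy and as a regularizer. The function names (`kl_divergence`, `rectified_pixel_loss`) and the example probabilities are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class-probability vectors for one pixel."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(p, label, eps=1e-12):
    """Cross-entropy of prediction p against a hard (pseudo) label index."""
    return -math.log(p[label] + eps)

def rectified_pixel_loss(p_main, p_aux, pseudo_label):
    """Assumed form: exp(-variance) * CE + variance.

    'variance' is the disagreement between the two heads. A large
    disagreement shrinks the weight on the (possibly wrong) pseudo label,
    while the additive term keeps the model from inflating the variance
    everywhere just to ignore the labels.
    """
    variance = kl_divergence(p_main, p_aux)
    ce = cross_entropy(p_main, pseudo_label)
    return math.exp(-variance) * ce + variance

# Hypothetical predictions for one pixel over three classes.
agree_main = [0.7, 0.2, 0.1]
agree_aux = [0.7, 0.2, 0.1]
clash_main = [0.6, 0.3, 0.1]
clash_aux = [0.2, 0.5, 0.3]

loss_agree = rectified_pixel_loss(agree_main, agree_aux, 0)
loss_clash = rectified_pixel_loss(clash_main, clash_aux, 0)
weight_clash = math.exp(-kl_divergence(clash_main, clash_aux))
```

When the two heads agree, the variance is zero and the loss reduces to the ordinary cross-entropy; when they disagree, the pseudo-label term receives a weight below one, which acts like a soft, per-pixel confidence threshold.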

Journal Article

Recent progress in semantic image segmentation

by Yang, Yuhan , Deng, Zhidong , Liu, Xiaolong in Accuracy , Algorithms , Annotations

2019

Semantic image segmentation, which has become one of the key applications in image processing and computer vision, has been used in multiple domains such as medicine and intelligent transportation. Many benchmark datasets have been released for researchers to verify their algorithms. Semantic segmentation has been studied for many years. Since the emergence of the Deep Neural Network (DNN), segmentation has made tremendous progress. In this paper, we divide semantic image segmentation methods into two categories: traditional methods and recent DNN methods. First, we briefly summarize the traditional methods as well as the datasets released for segmentation; then we comprehensively investigate recent DNN-based methods, which are described in eight aspects: fully convolutional networks, up-sampling methods, FCN combined with CRF methods, dilated convolution approaches, progress in backbone networks, pyramid methods, multi-level feature and multi-stage methods, and supervised, weakly-supervised and unsupervised methods. Finally, a conclusion on this area is drawn.

Journal Article

Towards a guideline for evaluation metrics in medical image segmentation

by Kramer, Frank , Müller, Dominik , Soto-Rey, Iñaki in Accuracy , Algorithms , Artificial Intelligence

2022

In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieved similar results as clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and showed statistical bias by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen’s Kappa, and Hausdorff distance. Furthermore, common issues like class imbalance and statistical as well as interpretation biases in evaluation are discussed. As a summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
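
Four of the listed metrics follow directly from the confusion counts of a binary mask. The sketch below is a minimal illustration of the standard definitions, not code from the paper. The helper names and the toy masks are hypothetical, and returning NaN for an empty denominator is one convention among several.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn

def _ratio(num, den):
    # Undefined when the denominator is empty (e.g. no foreground at all).
    return num / den if den else float("nan")

def segmentation_metrics(pred, truth):
    """Dice, Jaccard, sensitivity and specificity for one binary class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }

# Toy 8-pixel masks: 2 true positives, 1 false positive,
# 1 false negative, 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
```

Specificity depends on the true-negative count, so it stays high whenever the background dominates the image. This is one way the class-imbalance issue mentioned in the abstract shows up in practice.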

Journal Article

A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains

by Chan, Lyndon , Hosseini, Mahdi S , Plataniotis, Konstantinos N in Algorithms , Datasets , Domains

2021

Recently proposed methods for weakly-supervised semantic segmentation have achieved impressive performance in predicting pixel classes despite being trained with only image labels which lack positional information. Because image annotations are cheaper and quicker to generate, weak supervision is more practical than full supervision for training segmentation algorithms. These methods have been predominantly developed to solve the background separation and partial segmentation problems presented by natural scene images and it is unclear whether they can be simply transferred to other domains with different characteristics, such as histopathology and satellite images, and still perform well. This paper evaluates state-of-the-art weakly-supervised semantic segmentation methods on natural scene, histopathology, and satellite image datasets and analyzes how to determine which method is most suitable for a given dataset. Our experiments indicate that histopathology and satellite images present a different set of problems for weakly-supervised semantic segmentation than natural scene images, such as ambiguous boundaries and class co-occurrence. Methods perform well for datasets they were developed on, but tend to perform poorly on other datasets. We present some practical techniques for these methods on unseen datasets and argue that more work is needed for a generalizable approach to weakly-supervised semantic segmentation. Our full code implementation is available on GitHub: https://github.com/lyndonchan/wsss-analysis.

Journal Article

PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation

by Sun, Baigui , Xie, Haoyu , Wang, Changqi in Computer vision , Contrastive learning , Image enhancement

2024

Tremendous breakthroughs have been made in Semi-Supervised Semantic Segmentation (S4) through contrastive learning. However, due to limited annotations, the guidance on unlabeled images is generated by the model itself, which inevitably contains noise and disturbs the unsupervised training process. To address this issue, we propose a robust contrastive-based S4 framework, termed the Probabilistic Representation Contrastive Learning (PRCL) framework, to enhance the robustness of the unsupervised training process. We model the pixel-wise representation as Probabilistic Representations (PR) via a multivariate Gaussian distribution and tune the contribution of the ambiguous representations to tolerate the risk of inaccurate guidance in contrastive learning. Furthermore, we introduce Global Distribution Prototypes (GDP) by gathering all PRs throughout the whole training process. Since the GDP contains the information of all representations with the same class, it is robust to the instant noise in representations and bears the intra-class variance of representations. In addition, we generate Virtual Negatives (VNs) based on GDP for use in the contrastive learning process. Extensive experiments on two public benchmarks demonstrate the superiority of our PRCL framework.

Journal Article

Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow

by Yang, Kuiyuan , Tao, Dacheng , Zhang, Jiangning in Alignment , Computer networks , Datasets

2024

In this paper, we focus on exploring effective methods for faster and accurate semantic segmentation. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. Two strategies are widely used: atrous convolutions and feature pyramid fusion, while both are either computationally intensive or ineffective. Inspired by the Optical Flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels and broadcast high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our FAM to a standard feature pyramid structure exhibits superior performance over other real-time methods, even on lightweight backbone networks, such as ResNet-18 and DFNet. Then to further speed up the inference procedure, we also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps where we term the improved version network as SFNet-Lite. Extensive experiments are conducted on several challenging datasets, where results show the effectiveness of both SFNet and SFNet-Lite. In particular, when using Cityscapes test set, the SFNet-Lite series achieve 80.1 mIoU while running at 60 FPS using ResNet-18 backbone and 78.8 mIoU while running at 120 FPS using STDC backbone on RTX-3090. Moreover, we unify four challenging driving datasets (i.e., Cityscapes, Mapillary, IDD, and BDD) into one large dataset, which we named Unified Driving Segmentation (UDS) dataset. It contains diverse domain and style information. We benchmark several representative works on UDS. Both SFNet and SFNet-Lite still achieve the best speed and accuracy trade-off on UDS, which serves as a strong baseline in such a challenging setting. The code and models are publicly available at https://github.com/lxtGH/SFSegNets.

Journal Article

Deep semantic segmentation of natural and medical images: a review

in Deep learning , Educational activities , Groups

2021

The semantic image segmentation task consists of classifying each pixel of an image into an instance, where each instance corresponds to a class. This task is a part of the concept of scene understanding or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the leading deep learning-based medical and non-medical image segmentation solutions into six main groups of deep architectural, data synthesis-based, loss function-based, sequenced models, weakly supervised, and multi-task methods and provide a comprehensive review of the contributions in each of these groups. Further, for each group, we analyze each variant of these groups and discuss the limitations of the current approaches and present potential future research directions for semantic image segmentation.

Journal Article

Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling

by Ru Lixiang , Du, Bo , Zhan Yibing in Classification , Datasets , Image classification

2022

Weakly-supervised semantic segmentation (WSSS) methods with image-level labels generally train a classification network to generate the Class Activation Maps (CAMs) as the initial coarse segmentation labels. However, current WSSS methods still perform far from satisfactorily because their adopted CAMs (1) typically focus on partial discriminative object regions and (2) usually contain useless background regions. These two problems are attributed to the sole image-level supervision and aggregation of global information when training the classification networks. In this work, we propose the visual words learning module and hybrid pooling approach, and incorporate them in classification network to mitigate the above problems. In visual words learning module, we counter the first problem by enforcing the classification network to learn fine-grained visual word labels so that more object extents could be discovered. Specifically, the visual words are learned with a codebook, which could be updated via two proposed strategies, i.e. learning-based strategy and memory-bank strategy. The second drawback of CAMs is alleviated with the proposed hybrid pooling, which incorporates the global average and local discriminative information to simultaneously ensure object completeness and reduce background regions. We evaluated our methods on PASCAL VOC 2012 and MS COCO 2014 datasets. Without any extra saliency prior, our method achieved 70.6% and 70.7% mIoU on the val and test set of PASCAL VOC dataset, respectively, and 36.2% mIoU on the val set of MS COCO dataset, which significantly surpassed the performance of state-of-the-art WSSS methods.

Journal Article

Computer Vision and Deep Learning Techniques for the Analysis of Drone-Acquired Forest Images, a Transfer Learning Study

by Roure, Ferran , Serrano, Daniel , Kentsch, Sarah in computer vision , data collection , deciduous forests

2020

Unmanned Aerial Vehicles (UAV) are becoming an essential tool for evaluating the status and the changes in forest ecosystems. This is especially important in Japan due to the sheer magnitude and complexity of the forest area, made up mostly of natural mixed broadleaf deciduous forests. Additionally, Deep Learning (DL) is becoming more popular for forestry applications because it allows for the inclusion of expert human knowledge into the automatic image processing pipeline. In this paper we study and quantify issues related to the use of DL with our own UAV-acquired images in forestry applications such as: the effect of Transfer Learning (TL) and the Deep Learning architecture chosen or whether a simple patch-based framework may produce results in different practical problems. We use two different Deep Learning architectures (ResNet50 and UNet), two in-house datasets (winter and coastal forest) and focus on two separate problem formalizations (Multi-Label Patch or MLP classification and semantic segmentation). Our results show that Transfer Learning is necessary to obtain a satisfactory outcome in the problem of MLP classification of deciduous vs evergreen trees in the winter orthomosaic dataset (with a 9.78% improvement from no transfer learning to transfer learning from a general-purpose dataset). We also observe a further 2.7% improvement when Transfer Learning is performed from a dataset that is closer to our type of images. Finally, we demonstrate the applicability of the patch-based framework with the ResNet50 architecture in a different and complex example: Detection of the invasive broadleaf deciduous black locust (Robinia pseudoacacia) in an evergreen coniferous black pine (Pinus thunbergii) coastal forest typical of Japan. In this case we detect images containing the invasive species with 75% True Positives (TP) and 9% False Positives (FP), while the detection of native trees was 95% TP and 10% FP.

Journal Article

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

by Ding, Hao , Wang, Xing , Zhang, Zekai in Artificial neural networks , Benchmarks , Computer applications

2024

Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance.

Journal Article
