Catalogue Search | MBRL

Occluded Video Instance Segmentation: A Benchmark

by Liu, Xiaoyu , Bai, Xiang , Hu, Yao in Algorithms , Datasets , Image segmentation

2022

Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion. Built upon MaskTrack R-CNN and SipMask, we obtain a remarkable AP improvement on the OVIS dataset. The OVIS dataset and project code are available at http://songbai.site/ovis.

Journal Article

HTC+ for SAR Ship Instance Segmentation

by Zhang, Xiaoling , Zhang, Tianwen in Accuracy , Adaptive learning , convolutional neural network

2022

Existing instance segmentation models mostly pay less attention to the targeted characteristics of ships in synthetic aperture radar (SAR) images, which hinders further accuracy improvements, leading to poor segmentation performance in more complex SAR image scenes. To solve this problem, we propose a hybrid task cascade plus (HTC+) for better SAR ship instance segmentation. Aiming at the specific SAR ship task, seven techniques are proposed to ensure the excellent performance of HTC+ in more complex SAR image scenes, i.e., a multi-resolution feature extraction network (MRFEN), an enhanced feature pyramid network (EFPN), a semantic-guided anchor adaptive learning network (SGAALN), a context ROI extractor (CROIE), an enhanced mask interaction network (EMIN), a post-processing technique (PPT), and a hard sample mining training strategy (HSMTS). Results show that each of them offers an observable accuracy gain, and the instance segmentation performance in more complex SAR image scenes becomes better. On two public datasets SSDD and HRSID, HTC+ surpasses the other nine competitive models. It achieves 6.7% higher box AP and 5.0% higher mask AP than HTC on SSDD. These are 4.9% and 3.9% on HRSID.

Journal Article

OV-VIS: Open-Vocabulary Video Instance Segmentation

by Jiang, Xiaolong , Kang, Guoliang , Tang, Xu in Categories , Datasets , Instance segmentation

2024

Conventionally, the goal of Video Instance Segmentation (VIS) is to segment and categorize objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation (OV-VIS), which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark OV-VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS), that contains well-annotated objects from 1196 diverse categories, significantly surpassing the category size of existing datasets by more than an order of magnitude. Third, we propose a transformer-based OV-VIS model, OV2Seg+, which associates per-frame segmentation masks with a memory-induced transformer and clarifies objects in videos with a voting module given language guidance. In addition, to monitor the progress, we set up the evaluation protocols for OV-VIS and propose a set of strong baseline models to facilitate future endeavors. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of OV2Seg+. The dataset and code are released here https://github.com/haochenheheda/LVVIS. The competition website is provided here https://www.codabench.org/competitions/1748.
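The abstract above mentions a voting module that assigns a category to a tracked object across frames. The following is only a minimal sketch of one plausible voting scheme (summing per-frame category scores and taking the best total); it is an assumption for illustration and not the OV2Seg+ implementation. The function name `vote_category` and the score format are hypothetical.

```python
from collections import defaultdict


def vote_category(per_frame_scores):
    """Pick one category for a tracked object by score voting.

    per_frame_scores: list of dicts, one per frame, mapping a
    category name to that frame's score (hypothetical format).
    Returns (best_category, mean_score_over_frames).
    """
    if not per_frame_scores:
        raise ValueError("need at least one frame of scores")

    # Accumulate each category's score over all frames.
    totals = defaultdict(float)
    for scores in per_frame_scores:
        for category, score in scores.items():
            totals[category] += score

    # The category with the largest accumulated score wins the vote.
    best = max(totals, key=totals.get)
    return best, totals[best] / len(per_frame_scores)


frames = [
    {"cat": 0.6, "dog": 0.4},
    {"cat": 0.3, "dog": 0.7},  # a single frame disagrees
    {"cat": 0.8, "dog": 0.2},
]
label, mean_score = vote_category(frames)
```

Aggregating over the whole track means that a single badly classified frame, for example one with heavy occlusion, does not flip the object's label.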

Journal Article

GCBANet: A Global Context Boundary-Aware Network for SAR Ship Instance Segmentation

by Ke, Xiao , Zhang, Xiaoling , Zhang, Tianwen in Ablation , Accuracy , boundary-aware box prediction

2022

Synthetic aperture radar (SAR) is an advanced microwave sensor, which has been widely used in ocean surveillance, and its operation is not affected by light and weather. SAR ship instance segmentation can provide not only the box-level ship location but also the pixel-level ship contour, which plays an important role in ocean surveillance. However, most existing methods have limited box positioning ability, which hinders further accuracy improvement of instance segmentation. To solve the problem, we propose a global context boundary-aware network (GCBANet) for better SAR ship instance segmentation. Specifically, we propose two novel blocks to guarantee GCBANet’s excellent performance, i.e., a global context information modeling block (GCIM-Block) which is used to capture spatial global long-range dependences of ship contextual surroundings, enabling larger receptive fields, and a boundary-aware box prediction block (BABP-Block) which is used to estimate ship boundaries, achieving better cross-scale box prediction. We conduct ablation studies to confirm each block’s effectiveness. Ultimately, on the two public datasets SSDD and HRSID, GCBANet outperforms the other nine competitive models. On SSDD, it achieves 2.8% higher box average precision (AP) and 3.5% higher mask AP than the existing best model; on HRSID, they are 2.7% and 1.9%, respectively.

Journal Article

Video Instance Segmentation in an Open-World

by Shah, Mubarak , Narayan, Sanath , Laaksonen, Jorma in Artificial Intelligence , Computer Imaging , Computer Science

2025

Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as ‘unknown’ and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, that introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism based on a light-weight auxiliary network aims at accurate pixel-level (unknown) object delineation from the background as well as distinguishing category-specific known semantic classes. The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. Moreover, we also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in the OW-VIS setting. Further, we evaluate our contributions in the standard fully-supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the Youtube-VIS 2019 val. set. Lastly, we show the generalizability of our contributions for the open-world detection (OWOD) setting, outperforming the best existing OWOD method in the literature. Code, models along with OW-VIS splits are available at https://github.com/OmkarThawakar/OWVISFormer.

Journal Article

A new deep-learning strawberry instance segmentation methodology based on a fully convolutional neural network

by Perez-Borrero, Isaac , Vasallo-Vazquez, Manuel J. , Marin-Santos, Diego in Algorithms , Artificial Intelligence , Artificial neural networks

2021

Instance segmentation is one of the image processing problems where deep learning techniques are beginning to show potential. In agriculture, one of its main applications is automatic fruit harvesting. This study focuses on its application to strawberry crops, where the development of automatic harvesting machines is of particular interest. At present, the reference methodology to deal with instance segmentation is Mask R-CNN. However, Mask R-CNN requires substantial processing power, which limits its implementation in real-time systems. This work proposes a new methodology to carry out instance segmentation of strawberries based on the use of a fully convolutional neural network. Instance segmentation is achieved by adding two new channels to the network output so that each strawberry pixel predicts the centroid of its strawberry. The final segmentation of each strawberry is obtained by applying a grouping and filtering algorithm. The methodology was tested using the publicly available StrawDI_Db1 database. The evaluation results show values of mean average precision (mAP) and mean instance intersection over union (I²oU) of 52.61 and 93.38, respectively, with a processing speed of 30 fps. These figures mean an increase in precision higher than 15% and a fps rate six times higher than those obtained in the reference methodologies based on Mask R-CNN. Therefore, the methodology presented in this paper can be considered as the latest reference methodology for strawberry segmentation, meeting the precision and speed requirements needed for it to be used in the automatic strawberry harvesting systems that work in real time.
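The centroid-prediction idea in this abstract can be illustrated with a small sketch. The abstract does not specify the grouping and filtering algorithm, so the greedy clustering below, together with the `radius` and `min_pixels` parameters and the function name, is an assumption for illustration only: pixels whose predicted centroids fall close together are grouped into one instance, and very small groups are discarded as noise.

```python
from math import hypot


def group_by_centroid_votes(votes, radius=2.0, min_pixels=3):
    """Group foreground pixels into instances by their centroid votes.

    votes: dict mapping a pixel (x, y) to the centroid (cx, cy) that
    the network predicted for that pixel (hypothetical format).
    Returns a list of pixel lists, one list per kept instance.
    """
    clusters = []  # each entry: [sum_cx, sum_cy, pixels]
    for pixel, (cx, cy) in sorted(votes.items()):
        for cluster in clusters:
            n = len(cluster[2])
            # Compare the vote with the cluster's running mean centroid.
            if hypot(cluster[0] / n - cx, cluster[1] / n - cy) <= radius:
                cluster[0] += cx
                cluster[1] += cy
                cluster[2].append(pixel)
                break
        else:
            # No nearby cluster: this vote starts a new candidate instance.
            clusters.append([cx, cy, [pixel]])

    # Filtering step: drop groups too small to be a real object.
    return [c[2] for c in clusters if len(c[2]) >= min_pixels]


votes = {
    (0, 0): (1.0, 1.0), (1, 0): (1.1, 0.9),
    (0, 1): (0.9, 1.0), (1, 1): (1.0, 1.1),
    (10, 10): (11.0, 11.0), (11, 10): (11.2, 10.9),
    (10, 11): (10.9, 11.1),
    (5, 5): (30.0, 30.0),  # isolated vote, removed by the filter
}
instances = group_by_centroid_votes(votes)
```

Because grouping operates on the two extra output channels alone, no region proposals are needed, which is consistent with the speed advantage over Mask R-CNN that the abstract reports.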

Journal Article

MambaYOLACT: you only look at mamba prediction head for head-neck lymph nodes

by Chai, Wenwen , Chen, Kaixiong , Zhang, Zhe in Artificial Intelligence , Attention , Computer Science

2025

Lymph nodes in the head-neck are often infected when malignant tumors metastasize. At present, Magnetic Resonance Imaging (MRI) is widely used in the evaluation of head-neck lymph nodes. However, head-neck lymph nodes vary in size and show low contrast, which lowers instance segmentation accuracy and in turn affects patients' treatment decisions and the evaluation of surgical outcomes. To solve these problems, a single-stage Mamba YOLACT instance segmentation model is proposed in this paper. The main contributions are as follows. Firstly, a Cross-field and Cross-direction Feature Enhancement module (CCFE) is designed. Through a channel grouping mechanism, the module enhances the ability of each group of features to express different spatial semantic information, and it uses a mixed attention mechanism to improve feature extraction for lesions with different dimensions. Secondly, a MambaNet-based prediction head module is designed. The module combines the State-Space Model (SSM) with a self-attention mechanism to capture global image dependencies accurately and highlight the lesion area. Thirdly, a dataset of MRI images of head-neck lymph nodes is used to verify the model's effectiveness. The results show that the values of APdet, APseg, ARdet, ARseg, mAPdet and mAPseg are 69.8%, 70.9%, 55.3%, 56.4%, 39.4% and 41.0%, respectively. The model can achieve accurate segmentation of the lymph nodes, which is of value for the auxiliary diagnosis of lymph nodes.

Journal Article

Synergistic Attention for Ship Instance Segmentation in SAR Images

by Shi, Zhenwei , Qi, Jing , Su, Zhenhua in Algorithms , data collection , Datasets

2021

This paper takes account of the fact that there is a lack of consideration for imaging methods and target characteristics of synthetic aperture radar (SAR) images among existing instance segmentation methods designed for optical images. Thus, we propose a method for SAR ship instance segmentation based on the synergistic attention mechanism which not only improves the performance of ship detection with multi-task branches but also provides pixel-level contours for subsequent applications such as orientation or category determination. The proposed method—SA R-CNN—presents a synergistic attention strategy at the image, semantic, and target level with the following module corresponding to the different stages in the whole process of the instance segmentation framework. The global attention module (GAM), semantic attention module (SAM), and anchor attention module (AAM) were constructed for feature extraction, feature fusion, and target location, respectively, for multi-scale ship targets under complex background conditions. Compared with several state-of-the-art methods, our method reached 68.7 AP in detection and 56.5 AP in segmentation on the HRSID dataset, and showed 91.5 AP in the detection task on the SSDD dataset.

Journal Article

Scale in Scale for SAR Ship Instance Segmentation

by Zeng, Tianjiao , Ke, Xiao , Wei, Shunjun in Accuracy , Aspect ratio , data collection

2023

Ship instance segmentation in synthetic aperture radar (SAR) images can provide more detailed location information and shape information, which is of great significance for port ship scheduling and traffic management. However, there is little research work on SAR ship instance segmentation, and the general accuracy is low because the characteristics of target SAR ship task, such as multi-scale, ship aspect ratio, and noise interference, are not considered. In order to solve these problems, we propose an idea of scale in scale (SIS) for SAR ship instance segmentation. Its essence is to establish multi-scale modes in a single scale. In consideration of the characteristic of the targeted SAR ship instance segmentation task, SIS is equipped with four tentative modes in this paper, i.e., an input mode, a backbone mode, an RPN mode (region proposal network), and an ROI mode (region of interest). The input mode establishes multi-scale inputs in a single scale. The backbone mode enhances the ability to extract multi-scale features. The RPN mode makes bounding boxes better accord with ship aspect ratios. The ROI mode expands the receptive field. Combined with them, a SIS network (SISNet) is reported, dedicated to high-quality SAR ship instance segmentation on the basis of the prevailing Mask R-CNN framework. For Mask R-CNN, we also redesign (1) its feature pyramid network (FPN) for better small ship detection and (2) its detection head (DH) for a more refined box regression. We conduct extensive experiments to verify the effectiveness of SISNet on the open SSDD and HRSID datasets. The experimental results reveal that SISNet surpasses the other nine competitive models. Specifically, the segmentation average precision (AP) index is superior to the suboptimal model by 4.4% on SSDD and 2.5% on HRSID.
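The RPN mode described above adapts bounding boxes to ship aspect ratios. As a rough illustration, the sketch below uses the standard anchor-shape construction from region proposal networks (fixed area per scale, varied height-to-width ratio); it is not SISNet's code, and the scale and ratio values are placeholders chosen for the example rather than values from the paper.

```python
from math import sqrt


def make_anchor_shapes(scales, ratios):
    """Return (width, height) anchor shapes.

    For each scale s and ratio r = height / width, the anchor keeps
    an area of s * s while its shape follows the ratio.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            width = s / sqrt(r)
            height = s * sqrt(r)
            shapes.append((width, height))
    return shapes


# Elongated ratios (placeholders) suit long, thin targets such as ships.
shapes = make_anchor_shapes(scales=[32], ratios=[0.25, 1.0, 4.0])
```

Adding more extreme ratios gives the proposal stage anchors that overlap elongated targets better than square anchors of the same area would.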

Journal Article

Research on persimmon fruit diameter accurate detection method based on improved RCNN instance segmentation algorithm

by Feng, Ya , Fang, Yuan , Liu, Yangyang in Accuracy , Algorithms , Artificial neural networks

2025

To address inaccurate fruit recognition and fruit diameter detection in the persimmon inspection process, this research proposes a novel persimmon recognition and fruit diameter detection algorithm based on the Mask Region-based Convolutional Neural Network (Mask RCNN) instance segmentation algorithm. The algorithm strategically targets the object of interest by integrating cropping, morphological processing, and concave point segmentation modules into the fully connected layer following the Region of Interest (RoI) feature. Initially, the algorithm separates the foreground and background of the cropped target object using morphological processing to obtain a binarized image. Subsequently, concave point segmentation is applied to address sticking issues arising from overlapping or occlusion between fruits, while a template matching algorithm helps in image recognition. The improved instance segmentation algorithm enhances the segmentation accuracy of the target fruit and reduces the relative error in the fruit diameter measurement caused by sticking problems during occlusion and overlap. Notably, compared with the original algorithm, the improved Mask RCNN instance segmentation algorithm achieves a mean Average Precision (mAP) of 94.25%, representing an improvement of 8.05%, with the Mean Intersection-over-Union (MIoU) value increasing by 18.5%. The maximum relative error in fruit diameter measurement is reduced to 1.3%, while the maximum relative error in fruit thickness measurement is 1.98%, meeting the stringent requirements of orchard inspection. Overall, the proposed method enhances the precision and accuracy of fruit diameter detection, offering valuable theoretical and technical insights for intelligent inspection, yield estimation, fruit detection, and mechanized picking in the agricultural domain.

Journal Article
