Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
4 result(s) for "task-specific representation"
One-Shot Multiple Object Tracking in UAV Videos Using Task-Specific Fine-Grained Features
2022
Multiple object tracking (MOT) in unmanned aerial vehicle (UAV) videos is a fundamental task and can be applied in many fields. MOT consists of two critical procedures, i.e., object detection and re-identification (ReID). One-shot MOT, which incorporates detection and ReID in a unified network, has gained attention due to its fast inference speed. It significantly reduces the computational overhead by making two subtasks share features. However, most existing one-shot trackers struggle to achieve robust tracking in UAV videos. We observe that the essential difference between detection and ReID leads to an optimization contradiction within one-shot networks. To alleviate this contradiction, we propose a novel feature decoupling network (FDN) to convert shared features into detection-specific and ReID-specific representations. The FDN searches for characteristics and commonalities between the two tasks to synergize detection and ReID. In addition, existing one-shot trackers struggle to locate small targets in UAV videos. Therefore, we design a pyramid transformer encoder (PTE) to enrich the semantic information of the resulting detection-specific representations. By learning scale-aware fine-grained features, the PTE empowers our tracker to locate targets in UAV videos accurately. Extensive experiments on VisDrone2021 and UAVDT benchmarks demonstrate that our tracker achieves state-of-the-art tracking performance.
Journal Article
Learning task-specific discriminative representations for multiple object tracking
2023
One-shot multiple object tracking (MOT), which learns object detection and identity embedding in a unified network, has attracted increasing attention due to its low complexity and high tracking speed. However, most one-shot trackers ignore that detection and re-identification (ReID) require different representations of features. The inherent difference between these two subtasks leads to optimization contradictions in the training procedure. This issue would result in suboptimal tracking performance. To alleviate this contradiction, we propose a novel dual-path transformation network (DTN) that decouples the shared features into detection-specific and ReID-specific representations. By learning task-specific features, this module satisfies the different requirements of both subtasks. Moreover, we observe that previous trackers generally utilize local information to distinguish targets and ignore global semantic relations, which are crucial for tracking. Therefore, we design a pyramid non-local network (PNN) that allows our network to explore pixel-to-pixel relations with a global receptive field. Meanwhile, PNN considers the scale information to enhance the robustness to scale variations. Extensive experiments conducted on three benchmarks, i.e., MOT16, MOT17, and MOT20, demonstrate the superiority of our tracker, namely DPTrack. The experimental results reveal that DPTrack achieves state-of-the-art performance, e.g., MOTA of 77.1% and IDF1 of 74.9% on MOT17. Moreover, DPTrack runs at 14.9 FPS, and our lightweight version runs at 26.6 FPS with only a slight performance decay.
Journal Article
Artificial intelligence in medical imaging: From task-specific models to large-scale foundation models
2025
Artificial intelligence (AI), particularly deep learning, has demonstrated remarkable performance in medical imaging across a variety of modalities, including X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), and pathological imaging. However, most existing state-of-the-art AI techniques are task-specific and focus on a limited range of imaging modalities. Compared to these task-specific models, emerging foundation models represent a significant milestone in AI development. These models can learn generalized representations of medical images and apply them to downstream tasks through zero-shot or few-shot fine-tuning. Foundation models have the potential to address the comprehensive and multifactorial challenges encountered in clinical practice. This article reviews the clinical applications of both task-specific and foundation models, highlighting their differences, complementarities, and clinical relevance. We also examine their future research directions and potential challenges. Unlike the replacement relationship seen between deep learning and traditional machine learning, task-specific and foundation models are complementary, despite inherent differences. While foundation models primarily focus on segmentation and classification, task-specific models are integrated into nearly all medical image analyses. However, with further advancements, foundation models could be applied to other clinical scenarios. In conclusion, all indications suggest that task-specific and foundation models, especially the latter, have the potential to drive breakthroughs in medical imaging, from image processing to clinical workflows.
Journal Article
Rock Art, Identity, and Indigeneity
by Layton, Robert
in Indigenous peoples, devising measurements of time, task specific based on seasons; oral cultures, knowledge control among the Indigenous; oral traditions, the past through a “lens” capable of representation
2012
This chapter contains sections titled:
Abstract
Interpreting Oral Traditions
Shared Understandings
Rock Art and Land Rights
Conclusions
References
Book Chapter