Catalogue Search | MBRL

FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking

by Wang Xinggang , Zeng Wenjun , Zhang, Yifu in Accuracy , Computer vision , Datasets

2021

Multi-object tracking (MOT) is an important problem in computer vision which has a wide range of applications. Formulating MOT as multi-task learning of object detection and re-ID in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computation efficiency. However, we find that the two tasks tend to compete with each other, which needs to be carefully addressed. In particular, previous works usually treat re-ID as a secondary task whose accuracy is heavily affected by the primary detection task. As a result, the network is biased toward the primary detection task, which is not fair to the re-ID task. To solve the problem, we present a simple yet effective approach termed FairMOT, based on the anchor-free object detection architecture CenterNet. Note that it is not a naive combination of CenterNet and re-ID. Instead, we present several detailed design choices, identified through thorough empirical studies, that are critical to achieving good tracking results. The resulting approach achieves high accuracy for both detection and tracking. The approach outperforms the state-of-the-art methods by a large margin on several public datasets. The source code and pre-trained models are released at https://github.com/ifzhang/FairMOT.

Journal Article


Comparison of Object Detection and Patch-Based Classification Deep Learning Models on Mid- to Late-Season Weed Detection in UAV Imagery

by Shi, Yeyin , Scott, Stephen , Veeranampalayam Sivakumar, Arun Narenthiran in algorithms , altitude , Faster RCNN

2020

Mid- to late-season weeds that escape routine early-season weed management threaten agricultural production by creating a large number of seeds for several future growing seasons. Rapid and accurate detection of weed patches in the field is the first step of site-specific weed management. In this study, object detection-based convolutional neural network models were trained and evaluated over low-altitude unmanned aerial vehicle (UAV) imagery for mid- to late-season weed detection in soybean fields. Two object detection models, Faster RCNN and the Single Shot Detector (SSD), were evaluated and compared in terms of weed detection performance, using mean Intersection over Union (IoU), and inference speed. It was found that the Faster RCNN model with 200 box proposals had similarly good weed detection performance to the SSD model in terms of precision, recall, f1 score, and IoU, as well as a similar inference time. The precision, recall, f1 score and IoU were 0.65, 0.68, 0.66 and 0.85 for Faster RCNN with 200 proposals, and 0.66, 0.68, 0.67 and 0.84 for SSD, respectively. However, the optimal confidence threshold of the SSD model was found to be much lower than that of the Faster RCNN model, which indicated that SSD might have lower generalization performance than Faster RCNN for mid- to late-season weed detection in soybean fields using UAV imagery. The performance of the object detection model was also compared with a patch-based CNN model. The Faster RCNN model yielded better weed detection performance than the patch-based CNN with and without overlap. The inference time of Faster RCNN was similar to that of the patch-based CNN without overlap, but significantly less than that of the patch-based CNN with overlap. Hence, Faster RCNN was found to be the best model in terms of weed detection performance and inference time among the different models compared in this study. This work is important in understanding the potential and identifying the algorithms for on-farm, near real-time weed detection and management.
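
The metrics named in this abstract can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names are hypothetical and are not taken from the paper. The f1 score is the harmonic mean of precision and recall, so the reported precision/recall pairs can be checked against the reported f1 values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def box_iou(a, b) -> float:
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Precision/recall pairs reported in the abstract.
print(round(f1_score(0.65, 0.68), 2))  # Faster RCNN, 200 proposals -> 0.66
print(round(f1_score(0.66, 0.68), 2))  # SSD -> 0.67
```

Both rounded values agree with the f1 scores quoted in the abstract.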

Journal Article


Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model

by Aly, Mohammed , Fathi, Islam S. in 631/553/117 , 631/553/2695 , 631/61

2025

Gesture recognition plays a vital role in computer vision, especially for interpreting sign language and enabling human–computer interaction. Many existing methods struggle with challenges like heavy computational demands, difficulty in understanding long-range relationships, sensitivity to background noise, and poor performance in varied environments. While CNNs excel at capturing local details, they often miss the bigger picture. Vision Transformers, on the other hand, are better at modeling global context but usually require significantly more computational resources, limiting their use in real-time systems. To tackle these issues, we propose a Hybrid Transformer-CNN model that combines the strengths of both architectures. Our approach begins with CNN layers that extract detailed local features from both the overall hand and specific hand regions. These CNN features are then refined by a Vision Transformer module, which captures long-range dependencies and global contextual information within the gesture. This integration allows the model to effectively recognize subtle hand movements while maintaining computational efficiency. Tested on the ASL Alphabet dataset, our model achieves a high accuracy of 99.97%, runs at 110 frames per second, and requires only 5.0 GFLOPs—much less than traditional Vision Transformer models, which need over twice the computational power. Central to this success is our feature fusion strategy using element-wise multiplication, which helps the model focus on important gesture details while suppressing background noise. Additionally, we employ advanced data augmentation techniques and a training approach incorporating contrastive learning and domain adaptation to boost robustness. Overall, this work offers a practical and powerful solution for gesture recognition, striking an optimal balance between accuracy, speed, and efficiency—an important step toward real-world applications.

Journal Article


A novel genetic algorithm-based approach for compression and acceleration of deep learning convolution neural network: an application in computer tomography lung cancer data

by Skandha, Sanagala S. , Utkarsh, Kumar , Koppula, Vijaya K. in Artificial Intelligence , Artificial neural networks , Computational Biology/Bioinformatics

2022

Deep learning (DL) models are computationally expensive in space and time, which makes it difficult to deploy them on edge computing devices such as Raspberry-Pi or Jetson Nano. The strategy presented here uses a genetic algorithm (GA) to compress deep convolution neural network models without compromising performance. GA was applied by converting the CNN layers into binary vectors. Further, the fitness function in GA was computed based on (i) the minimization of hidden units and (ii) test accuracy. The GA-based strategy was applied to four pre-trained architectures, namely AlexNet, VGG16, SqueezeNet, and ResNet50, using three datasets, namely MNIST, CIFAR-10, and CIFAR-100. The proposed approach reduced the storage space of AlexNet by 87.62%, 80.97%, and 86.20% on MNIST, CIFAR-10, and CIFAR-100, respectively. Further, across the same three datasets, the average compression for VGG16, ResNet50, and SqueezeNet was 91.15%, 78.42%, and 38.40%, respectively. In addition, the inference time of the models using the proposed strategy was significantly improved, with an average of the four datasets of ~35.61%, 9.23%, 73.76%, and 79.93% corresponding to the AlexNet, SqueezeNet, ResNet50, and VGG16 models. Further, our method, when applied to the proposed CNN using the LIDC-IDRI dataset, showed a 90.3% reduction in the storage space and inference time. A DL system optimized using GA shows improved performance in both storage and execution time.
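
As a rough illustration of the idea described in this abstract (binary vectors over hidden units, with a fitness that rewards both fewer units and test accuracy), the following is a minimal sketch and not the authors' implementation. The weight `alpha`, the selection and crossover scheme, and the `toy_accuracy` stand-in for evaluating a pruned CNN are all assumptions made for the example.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = keep the hidden unit, 0 = prune it


def fitness(mask: Mask, accuracy_fn: Callable[[Mask], float], alpha: float = 0.5) -> float:
    """Reward high test accuracy and a small fraction of kept units.

    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    kept_fraction = sum(mask) / len(mask)
    return alpha * accuracy_fn(mask) + (1.0 - alpha) * (1.0 - kept_fraction)


def evolve(n_units, accuracy_fn, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Tiny GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_units)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, accuracy_fn), reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_units)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, accuracy_fn))


# Toy stand-in for "evaluate the pruned CNN on the test set": only the first
# four units matter, so the ideal mask keeps those and prunes the rest.
def toy_accuracy(mask: Mask) -> float:
    return sum(mask[:4]) / 4


best = evolve(n_units=16, accuracy_fn=toy_accuracy)
print(best, round(fitness(best, toy_accuracy), 3))
```

In a real setting `accuracy_fn` would rebuild the network with the masked units removed and measure test accuracy, which is far more expensive than the toy function used here.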

Journal Article


YOLIP: An Enhanced Framework for UAV-Assisted Wildlife Monitoring Based on YOLO Integrated with the CLIP Model

by Zhang, Leyan , Hu, Ruiheng , Pi, Hao in Analysis , cross-modal learning , Drone aircraft

2026

UAV-based wildlife monitoring encounters tremendous challenges posed by complex environments, such as the extremely low proportion of effective targets in aerial images and variations in remote sensing scales. This paper presents a novel fusion framework named YOLIP, which integrates a detection head with semantic perception capabilities and an implicit feature adjustment module to boost detection accuracy and feature representation ability. Specifically, this paper redesigns the detection head to enable it to simultaneously learn spatial positioning and semantic features, thereby achieving more reliable extraction of regional features. The implicit feature modulation module introduces a dual-path fusion mechanism, which elevates the feature quality through geometric-semantic fusion, thereby improving the consistency and robustness of the detection. Furthermore, this paper also develops an asynchronous scheduling strategy, which can selectively execute computationally intensive operations to achieve computational optimization, enabling this framework to adapt to actual detection scenarios based on unmanned aerial vehicles. In this study, we conducted numerous experiments on the self-built drone wildlife dataset as well as the publicly available aerial wildlife dataset. The results demonstrate that, compared with existing detection models, YOLIP improves mAP@0.5 by 11.6% while maintaining an efficient inference speed. In addition, cross-dataset evaluation verifies the stable performance and generalization capability of the proposed method across multiple real-world scenarios.

Journal Article


Federated learning system on autonomous vehicles for lane segmentation

by Darweesh, M. Saeed , Abd El-Hafez, Mohamed T. , Yousri, Retaj in 639/166 , 639/166/987 , Automation

2024

The autonomous vehicle (AV) industry is one of the fastest-evolving industries of the last decade. However, one of the bottlenecks of this evolution is providing data that contains different scenarios and scenes to improve the models without compromising the privacy and security of the edge vehicles. The authors of this research propose a secure and efficient novel solution for lane segmentation in AVs through the use of Federated Learning (FL). The proposed system, FedLane, involves initial training of U-Net, ResUNet, and ResUNet++ models, followed by real-time inference on edge devices and the application of FL to update the server model using clients’ data. The study found that FL significantly enhanced the performance of lane segmentation over the baseline, enabling decentralized privacy-preserving collaborative optimization, with the Dice coefficient increasing from 0.9429 to 0.9794 for U-Net, from 0.9291 to 0.9854 for ResUNet, and from 0.9079 to 0.9675 for ResUNet++. Additionally, the models show increased stability over the training iterations, highlighting the potential of FL to play a significant role in the future of automation in the AV industry.
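
The two quantities this abstract relies on, the Dice coefficient and a server-side update built from client models, can be sketched as follows. This is an illustration under stated assumptions: masks are flat lists of 0/1 values, and the aggregation shown is a sample-size-weighted average in the style of federated averaging, which the abstract does not confirm as the rule FedLane uses. All function names are hypothetical.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def federated_average(client_weights, client_sizes):
    """Sample-size-weighted mean of client parameter vectors (assumed rule)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# One predicted lane mask against its ground truth.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))

# Two clients with 1 and 3 local samples; the larger client dominates.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only parameter vectors and sample counts reach the server in this sketch, which is the property that keeps raw driving data on the vehicles.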

Journal Article


Edge-priority-extraction network using re-parameterization for real-time super-resolution

by Ying, Wen-yuan , Dong, Tian-yang , Fan, Jing in Artificial Intelligence , Computer Graphics , Computer Science

2024

Recently, super-resolution (SR) has achieved superior performance with the development of deep learning. However, previous methods usually require considerable computational resources with a large model size, which hinders practical applications. To achieve real-time inference and high quality for SR, this paper presents an edge-priority-extraction network, which is constructed with our proposed edge-priority blocks (EPB). The EPB utilizes multiple branches with edge information to further improve the network representation. Moreover, it can be re-parameterized for efficient inference. For more effective utilization of edge information, this paper also proposes the mix-priority filter with edge extraction of horizontal and vertical priorities to improve the network performance. The filters can adaptively extract the edge information with multi-direction derivatives. The experimental results show that our models can use less computational cost to meet the real-time demand and have a better SR performance than the recent real-time SR models.

Journal Article


Fast-MFQE: A Fast Approach for Multi-Frame Quality Enhancement on Compressed Video

by Zeng, Huanqiang , Shen, Xueyuan , Chen, Kemi in Bandwidths , Coding standards , compressed video enhancement

2023

For compressed images and videos, quality enhancement is essential. Though there have been remarkable achievements related to deep learning, deep learning models are too large to apply to real-time tasks. Therefore, a fast multi-frame quality enhancement method for compressed video, named Fast-MFQE, is proposed to meet the requirement of video-quality enhancement for real-time applications. There are three main modules in this method. One is the image pre-processing building module (IPPB), which is used to reduce redundant information of input images. The second one is the spatio-temporal fusion attention (STFA) module. It is introduced to effectively merge temporal and spatial information of input video frames. The third one is the feature reconstruction network (FRN), which is developed to effectively reconstruct and enhance the spatio-temporal information. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of lightweight parameters, inference speed, and quality enhancement performance. Even at a resolution of 1080p, the Fast-MFQE achieves a remarkable inference speed of over 25 frames per second, while providing a PSNR increase of 19.6% on average when QP = 37.
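
For readers unfamiliar with the PSNR metric quoted in this abstract, a minimal sketch of the standard definition follows. It assumes 8-bit pixel values flattened into lists; the function name and the sample values are illustrative and not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one grey level gives MSE = 1.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

Because PSNR is logarithmic, a higher value after enhancement means the enhanced frame is closer to the uncompressed reference.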

Journal Article


Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection

by Lai, Jian-Huang , Xie, Xiaohua , Chen, Zixuan in Affine transformations , Anomalies , Artificial Intelligence

2025

Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images. These detectors are trained on anomaly-free samples and have successfully distinguished anomalies from most normal samples. However, hard-normal examples are scattered and far apart from most normal samples, and thus they are often mistaken for anomalies by existing methods. To address this issue, we propose Hard-normal Example-aware Template Mutual Matching (HETMM), an efficient framework to build a robust prototype-based decision boundary. Specifically, HETMM employs the proposed Affine-invariant Template Mutual Matching (ATMM) to mitigate the effects of affine transformations and easy-normal examples. By mutually matching the pixel-level prototypes within the patch-level search spaces between query and template set, ATMM can accurately distinguish between hard-normal examples and anomalies, achieving low false-positive and missed-detection rates. In addition, we also propose PTS to compress the original template set for speed-up. PTS selects cluster centres and hard-normal examples to preserve the original decision boundary, allowing this tiny set to achieve comparable performance to the original one. Extensive experiments demonstrate that HETMM outperforms state-of-the-art methods, while using a 60-sheet tiny set can achieve competitive performance and real-time inference speed (around 26.1 FPS) on a Quadro 8000 RTX GPU. HETMM is training-free and can be hot-updated by directly inserting novel samples into the template set, which can promptly address some incremental learning issues in industrial manufacturing.

Journal Article


Hallucination-aware learning and latency optimization transformer (HALL-OPT) for real-time edge intelligence

by Algawiaz, Danah in 639/166 , 639/705 , Accuracy

2026

Transformer architectures and large language models remain competitive across a broad range of AI tasks, but they are challenging to deploy in resource-constrained edge computing environments due to high resource demands and the generation of erroneous or fabricated outputs (hallucinations). In this paper, a single scheme, HALL-OPT, is proposed to address both hallucination detection and latency reduction for real-time edge intelligence. The framework has three main elements: (1) a dual-stream hallucination detector that analyses internal attention behaviour, (2) an adaptive token-pruning system, which decodes and extracts the necessary context at minimal computation, and (3) a lightweight edge-optimized transformer obtained by knowledge distillation. On SQuAD 2.0 and CNN/DailyMail, HALL-OPT detects hallucinations with 94.3% accuracy and achieves a 67.8% reduction in inference latency with only a 2.1% decrease in accuracy compared to the BERT-base model. When deployed on edge hardware, the system provides sub-50 ms response times while consuming 43% less energy. It is appropriate for real-time applications in industrial IoT, autonomous systems, healthcare monitoring, and other applications where low latency is critical. Existing transformer optimisation and hallucination mitigation approaches treat reliability and efficiency as separate objectives, limiting their applicability in real-time edge environments. HALL-OPT integrates hallucination-aware attention, adaptive pruning, and edge-oriented optimisation into a single unified framework, enabling simultaneous reductions in hallucination, latency, and energy consumption. This integrated design distinguishes HALL-OPT from prior work that optimises accuracy or efficiency in isolation.

Journal Article
