Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
25 result(s) for "Soccer Object Detection"
DeCon-Net: decoupled hierarchical contrast for soccer object detection
2026
Soccer video analysis has significant application value in sports broadcasting, tactical research, and athlete training, with accurate object detection serving as the key foundation for automated analysis. Soccer object detection typically improves performance through enhanced feature representation and optimized network architectures, but these methods assume that models can automatically learn discriminative features of targets. Through experiments, we reveal the “feature collapse” phenomenon in soccer detection, where features of players from the same team are excessively clustered in high-dimensional space, and soccer ball features degenerate to near background noise. Furthermore, existing methods lack progressive feature evolution mechanisms, resulting in insufficient discriminative capability when handling dense scenes. To address these issues, we propose DeCon-Net, which contains a Decoupled Feature Learning Module (DFLM) and a Hierarchical Contrastive Constraint Module (HCCM). Specifically, DFLM designs dual-stream encoders to extract appearance features and identity features separately, forcing the identity stream to learn truly discriminative representations through mutual exclusivity constraints. HCCM adopts dynamic threshold contrastive learning, adaptively adjusting learning intensity based on feature distances between sample pairs, achieving progressive optimization from coarse to fine granularity. Experimental results demonstrate that DeCon-Net achieves significant performance improvements on the SportsMOT and SoccerNet-Tracking datasets, particularly showing substantial gains in ball detection.
Journal Article
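The dynamic-threshold contrastive idea described in the DeCon-Net abstract can be illustrated with a toy pairwise loss in which the margin for negative pairs adapts to the current feature distance, so well-separated pairs receive a weaker push (coarse-to-fine). This is only a sketch under assumptions: the abstract does not give HCCM's exact formulation, and `base_margin` and `alpha` are hypothetical parameters.

```python
import numpy as np

def dynamic_contrastive_loss(f_a, f_b, same_identity, base_margin=1.0, alpha=0.5):
    """Pairwise contrastive loss with a distance-adaptive margin.

    Illustrative sketch only: HCCM's published design may differ.
    Positive pairs (same identity) are pulled together; negative pairs
    are pushed apart up to a margin that grows with current distance,
    so already-separated pairs contribute less (progressive refinement).
    """
    d = np.linalg.norm(f_a - f_b)            # Euclidean feature distance
    if same_identity:
        return d ** 2                        # pull positives together
    margin = base_margin + alpha * d         # dynamic (distance-aware) threshold
    return max(0.0, margin - d) ** 2         # hinge: push negatives past margin
```

With `alpha < 1` the hinge still vanishes for far-apart negatives, which is what lets the constraint relax as features separate.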
SEMANTIC3D.NET: A NEW LARGE-SCALE POINT CLOUD CLASSIFICATION BENCHMARK
2017
This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over the state of the art. CNNs have become the de facto standard for many tasks in computer vision and machine learning, like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides denser and more complete point clouds with a much higher overall number of labelled points than those already available to the research community. We further provide baseline method descriptions and a comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
Journal Article
A survey on soccer player detection and tracking with videos
by
Yang, Meng
,
Jiang, Linlu
,
Li, Zhen
in
Artificial Intelligence
,
Computer Graphics
,
Computer Science
2025
Soccer is a popular sport, and there is a growing need for automated analysis of soccer videos, for which the detection and tracking of players is an indispensable prerequisite. In this paper, we first introduce and classify multi-object tracking and then present two widely used multi-object tracking methods, DeepSort and TrackFormer. When multi-object tracking is applied to soccer scenarios, some preprocessing and post-processing are generally performed, with preprocessing including processing of the video, such as splicing and background removal, and post-processing including further applications, such as player mapping onto a 2D stadium. By directly employing the two methods above, we test them on real scenes and train TrackFormer to obtain further results. Meanwhile, in order to assist researchers who are interested in multi-object tracking as well as in the direction of player tracking, recent advances in preprocessing and post-processing methods for soccer player tracking are given and future research directions are suggested.
Journal Article
FoT: an efficient transformer framework for real-time small object detection in football videos
2025
Football videos are playing an increasingly important role in event analysis and tactical evaluation within computer vision. Traditional object detection methods, relying on region proposals and anchor generation, struggle to balance real-time performance and accuracy in complex scenarios such as multi-view footage, motion blur, and small object recognition. Meanwhile, Transformer-based methods face challenges in capturing fine-grained target information due to their high computational cost and slow training convergence. To address these problems, we propose a novel end-to-end detection framework, the Football Transformer (FoT). By introducing the Local Interaction Aggregation Unit (LIAU) and the Multi-Scale Feature Interaction Module (MFIM), FoT achieves an efficient balance between global semantic expression and local detail capture. Specifically, LIAU reduces the self-attention computation complexity from O(N²) to O(N) through feature aggregation within local windows and a window offset mechanism. MFIM strengthens the collaborative expression of low-level details and high-level semantics through multi-scale feature alignment and progressive fusion, significantly improving small object detection performance. Experimental results show that FoT achieves a 3.0% mAP improvement over the best baseline on the Soccer-Det dataset and a 1.3% gain on the FIFA-Vid dataset, while maintaining real-time inference speed. These results validate the effectiveness and robustness of the proposed method under complex football video scenarios.
Journal Article
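The local-window idea behind units like FoT's LIAU can be sketched in a few lines: each token attends only to the tokens in its own block, so score computation costs O(N·w) rather than the O(N²) of full self-attention. This is a generic sketch of windowed attention, not the published LIAU design (the window-offset mechanism, in particular, is omitted).

```python
import numpy as np

def windowed_self_attention(x, window=8):
    """Self-attention restricted to non-overlapping local windows.

    Generic sketch of local-window attention (not the exact LIAU module).
    Each token attends only to the `window` tokens in its own block, so
    pairwise score computation is O(N * window) instead of O(N^2).
    """
    n, d = x.shape
    out = np.empty_like(x)
    for start in range(0, n, window):
        blk = x[start:start + window]                # (w, d) local block
        scores = blk @ blk.T / np.sqrt(d)            # (w, w) local scores only
        scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)
        out[start:start + window] = attn @ blk       # weighted sum within window
    return out
```

Shifting window boundaries between layers (as offset mechanisms do) restores cross-window information flow while keeping the per-layer cost linear in N.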
An Improved YOLOv11n Model Based on Wavelet Convolution for Object Detection in Soccer Scenes
2025
Object detection in soccer scenes serves as a fundamental task for soccer video analysis and target tracking. This paper proposes WCC-YOLO, a symmetry-enhanced object detection framework based on YOLOv11n. Our approach integrates symmetry principles at multiple levels: (1) The novel C3k2-WTConv module synergistically combines conventional convolution with wavelet decomposition, leveraging the orthogonal symmetry of Haar wavelet quadrature mirror filters (QMFs) to achieve balanced frequency-domain decomposition and enhance multi-scale feature representation. (2) The Channel Prior Convolutional Attention (CPCA) mechanism incorporates symmetrical operations—using average-max pooling pairs in channel attention and multi-scale convolutional kernels in spatial attention—to automatically learn to prioritize semantically salient regions through channel-wise feature recalibration, thereby enabling balanced feature representation. Coupled with InnerShape-IoU for refined bounding box regression, WCC-YOLO achieves a 4.5% improvement in mAP@0.5:0.95 and a 5.7% gain in mAP@0.5 compared to the baseline YOLOv11n while simultaneously reducing the number of parameters and maintaining near-identical inference latency (δ < 0.1 ms). This work demonstrates the value of explicit symmetry-aware modeling for sports analytics.
Journal Article
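The frequency-domain split that wavelet-convolution modules such as C3k2-WTConv build on can be shown with a single-level 2D Haar decomposition: the Haar analysis filters form a quadrature mirror pair, so the four half-resolution sub-bands tile the spectrum symmetrically. This is a minimal sketch of the Haar transform itself, not the module's layering.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar wavelet decomposition.

    Returns four half-resolution sub-bands: LL (approximation),
    LH/HL (detail along the two axes) and HH (diagonal detail).
    Computed directly on 2x2 blocks; assumes even height and width.
    """
    a = img[0::2, 0::2]            # top-left of each 2x2 block
    b = img[0::2, 1::2]            # top-right
    c = img[1::2, 0::2]            # bottom-left
    d = img[1::2, 1::2]            # bottom-right
    ll = (a + b + c + d) / 2.0     # low-low: local average
    lh = (a - b + c - d) / 2.0     # low-high: horizontal detail
    hl = (a + b - c - d) / 2.0     # high-low: vertical detail
    hh = (a - b - c + d) / 2.0     # high-high: diagonal detail
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the sub-bands preserve the signal's energy exactly, which is the "balanced frequency-domain decomposition" property the abstract refers to.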
Bin-Picking for Planar Objects Based on a Deep Learning Network: A Case Study of USB Packs
2019
Random bin-picking is a prominent, useful, and challenging industrial robotics application. However, many industrial and real-world objects are planar and have oriented surface points that are not sufficiently compact and discriminative for methods using geometry information, especially depth discontinuities. This study solves the above-mentioned problems by proposing a novel and robust solution for random bin-picking of planar objects in a cluttered environment. Different from other research that has mainly focused on 3D information, this study first applies an instance segmentation-based deep learning approach using 2D image data for classifying and localizing the target object while generating a mask for each instance. The presented approach, moreover, serves as a pioneering method to extract 3D point cloud data based on 2D pixel values for building the appropriate coordinate system on the planar object plane. The experimental results showed that the proposed method reached an accuracy rate of 100% for classifying two-sided objects in the unseen dataset, and appropriate 3D pose prediction was highly effective, with average translation and rotation errors less than 0.23 cm and 2.26°, respectively. Finally, the system success rate for picking up objects was over 99% at an average processing time of 0.9 s per step, fast enough for continuous robotic operation without interruption. This represents a promisingly higher pickup success rate compared to previous approaches to random bin-picking problems. Successful implementation of the proposed approach for USB packs provides a solid basis for other planar objects in a cluttered environment. With remarkable precision and efficiency, this study shows significant commercialization potential.
Journal Article
Flexible coding of object motion in multiple reference frames by parietal cortex neurons
by
Anzai Akiyuki
,
Angelaki, Dora E
,
DeAngelis, Gregory C
in
Alliances
,
Cortex (parietal)
,
Monkeys
2020
Neurons represent spatial information in diverse reference frames, but it remains unclear whether neural reference frames change with task demands and whether these changes can account for behavior. In this study, we examined how neurons represent the direction of a moving object during self-motion, while monkeys switched, from trial to trial, between reporting object direction in head- and world-centered reference frames. Self-motion information is needed to compute object motion in world coordinates but should be ignored when judging object motion in head coordinates. Neural responses in the ventral intraparietal area are modulated by the task reference frame, such that population activity represents object direction in either reference frame. In contrast, responses in the lateral portion of the medial superior temporal area primarily represent object motion in head coordinates. Our findings demonstrate a neural representation of object motion that changes with task requirements.
Sasaki et al. demonstrate that neurons in the macaque parietal cortex (ventral intraparietal area) flexibly represent object motion in either a head-centered or world-centered reference frame depending on the requirements of the task.
Journal Article
Review of medical image recognition technologies to detect melanomas using neural networks
2020
Background
Melanoma is one of the most aggressive types of cancer and has become a world-class problem. According to World Health Organization estimates, 132,000 cases of the disease and 66,000 deaths from malignant melanoma and other forms of skin cancer are reported annually worldwide (https://apps.who.int/gho/data/?theme=main), and those numbers continue to grow. In our opinion, due to the increasing incidence of the disease, it is necessary to find new, easy-to-use and sensitive methods for the early diagnosis of melanoma in a large number of people around the world. Over the last decade, neural networks have shown highly sensitive, specific, and accurate results.
Objective
This study presents a review of PubMed papers matching the queries "melanoma neural network" and "melanoma neural network dermatoscopy". We review recent research and discuss its opportunities for clinical practice.
Methods
We searched the PubMed database for systematic reviews and original research papers matching the queries "melanoma neural network" and "melanoma neural network dermatoscopy" published in English. Only papers that reported results, progress and outcomes are included in this review.
Results
We found 11 papers matching our queries; they examined convolutional and deep-learning neural networks, combined with fuzzy clustering or World Cup Optimization algorithms, for analyzing dermatoscopic images. All of them rely on the ABCD (asymmetry, border, color, and differential structures) algorithm or its derivatives (in combination with the ABCD algorithm or separately). They also require a large dataset of dermatoscopic images and optimized estimation parameters to provide high specificity, accuracy and sensitivity.
Conclusions
According to the analyzed papers, neural networks show higher specificity, accuracy and sensitivity than dermatologists. Neural networks are able to evaluate features that might be unavailable to the naked human eye. Despite that, more datasets are needed to confirm those statements. Nowadays, machine learning is becoming a helpful tool for the early diagnosis of skin diseases, especially melanoma.
Journal Article
Ball Detection Using Deep Learning Implemented on an Educational Robot Based on Raspberry Pi
2023
RoboCupJunior is a project-oriented competition for primary and secondary school students that promotes robotics, computer science and programming. Through real-life scenarios, students are encouraged to engage in robotics in order to help people. One of the popular categories is Rescue Line, in which an autonomous robot has to find and rescue victims. The victim is in the shape of a silver ball that reflects light and is electrically conductive. The robot should find the victim and place it in the evacuation zone. Teams mostly detect victims (balls) using random walk or distance sensors. In this preliminary study, we explored the possibility of using a camera, the Hough transform (HT) and deep learning methods for finding and locating balls with the educational mobile robot Fischertechnik with Raspberry Pi (RPi). We trained, tested and validated the performance of different algorithms (convolutional neural networks for object detection and the U-NET architecture for semantic segmentation) on a handmade dataset made of images of balls in different light conditions and surroundings. RESNET50 was the most accurate, and MOBILENET_V3_LARGE_320 was the fastest object detection method, while EFFICIENTNET-B0 proved to be the most accurate, and MOBILENET_V2 was the fastest semantic segmentation method on the RPi. HT was by far the fastest method, but produced significantly worse results. These methods were then implemented on a robot and tested in a simplified environment (one silver ball with white surroundings and different light conditions), where HT had the best ratio of speed and accuracy (4.71 s, DICE 0.7989, IoU 0.6651). The results show that microcomputers without GPUs are still too weak for complicated deep learning algorithms in real-time situations, although these algorithms show much higher accuracy in complicated environments.
Journal Article
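The Hough-transform ball detector mentioned in the abstract above can be illustrated with a toy voting scheme for a circle of known radius: each edge point votes for every centre that could place it on such a circle, and the best-supported centre wins. Production detectors (e.g. OpenCV's HoughCircles) also search over radius and use gradient direction; this sketch fixes the radius and samples candidate centres coarsely.

```python
import math
from collections import Counter

def hough_circle(edge_points, radius):
    """Vote for circle centres at a fixed, known radius.

    Toy accumulator-based Hough transform: every edge point casts votes
    for candidate centres lying `radius` away from it; the centre with
    the most votes is returned. Coarse 5-degree sampling keeps it fast.
    """
    votes = Counter()
    for (x, y) in edge_points:
        for deg in range(0, 360, 5):                 # sample candidate centres
            t = math.radians(deg)
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            votes[(cx, cy)] += 1
    return votes.most_common(1)[0][0]                # centre with most votes
```

Because the accumulator is just a vote count over integer centre hypotheses, the method degrades gracefully with noisy edges, which is consistent with HT's speed-over-accuracy profile reported in the study.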
The Eye in the Sky—A Method to Obtain On-Field Locations of Australian Rules Football Athletes
by
Mundt, Marion
,
Mian, Ajmal
,
Alderson, Jacqueline
in
Accuracy
,
Algorithms
,
Artificial neural networks
2024
The ability to overcome an opposition in team sports is reliant upon an understanding of the tactical behaviour of the opposing team members. Recent research is limited to a performance analysts’ own playing team members, as the required opposing team athletes’ geolocation (GPS) data are unavailable. However, in professional Australian rules Football (AF), animations of athlete GPS data from all teams are commercially available. The purpose of this technical study was to obtain the on-field location of AF athletes from animations of the 2019 Australian Football League season to enable the examination of the tactical behaviour of any team. The pre-trained object detection model YOLOv4 was fine-tuned to detect players, and a custom convolutional neural network was trained to track numbers in the animations. The object detection and the athlete tracking achieved an accuracy of 0.94 and 0.98, respectively. Subsequent scaling and translation coefficients were determined through solving an optimisation problem to transform the pixel coordinate positions of a tracked player number to field-relative Cartesian coordinates. The derived equations achieved an average Euclidean distance from the athletes’ raw GPS data of 2.63 m. The proposed athlete detection and tracking approach is a novel methodology to obtain the on-field positions of AF athletes in the absence of direct measures, which may be used for the analysis of opposition collective team behaviour and in the development of interactive play sketching AF tools.
Journal Article
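The scaling-and-translation optimisation described in the abstract above can be sketched as a per-axis linear least-squares fit from tracked pixel positions to field-relative coordinates. The authors' exact objective and solver are not specified here; this assumes independent scale and offset per axis and matched point pairs.

```python
import numpy as np

def fit_scale_translation(pixels, field):
    """Least-squares scale-and-translation fit from pixel to field coordinates.

    Hedged sketch of the coordinate-transform step: solves for per-axis
    scale (sx, sy) and offset (tx, ty) minimising ||s * p + t - f||^2
    over matched point pairs, one 1D linear least-squares problem per axis.
    """
    pixels = np.asarray(pixels, dtype=float)
    field = np.asarray(field, dtype=float)
    coeffs = []
    for axis in (0, 1):
        # Design matrix [p, 1] for the model f = s * p + t on this axis.
        A = np.stack([pixels[:, axis], np.ones(len(pixels))], axis=1)
        (s, t), *_ = np.linalg.lstsq(A, field[:, axis], rcond=None)
        coeffs.append((s, t))
    return coeffs  # [(sx, tx), (sy, ty)]
```

With four or more well-spread point pairs the fit is overdetermined, so residuals give a direct check against the reported ~2.63 m average Euclidean error.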