Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
645 result(s) for "Spotting"
Intra- and inter-video context-aware action spotting
2026
As a recently proposed task in video understanding, action spotting aims to locate and classify actions in a long video clip, and it can be applied to automatically generating video summaries and highlights. To address this problem, this article proposes a novel framework capable of capturing both intra- and inter-video contextual information. In particular, we present a transformer-based model that views the task as a set prediction problem, matching the set of predicted action instances to the set of ground truths. It is able to capture long-range intra-video temporal information and discover causal relationships between actions. Next, based on the observation that actions of the same type recur in different videos, we propose to exploit inter-video contextual information from the dataset. To do so, we design an action memory module which stores a compact feature representation of each action class during training, so as to improve action recognition and localization performance. We evaluate our model on a public benchmark and demonstrate that it outperforms state-of-the-art methods by a large margin.
Journal Article
Airborne optical and thermal remote sensing for wildfire detection and monitoring
2016
NRC publication: Yes
Journal Article
Real-Time Hand Gesture Spotting and Recognition Using RGB-D Camera and 3D Convolutional Neural Network
2020
Using hand gestures is a natural method of interaction between humans and computers. We use gestures to express meaning and thoughts in our everyday conversations. Gesture-based interfaces are used in many applications across a variety of fields, such as smartphones, televisions (TVs), video gaming, and so on. With advancements in technology, hand gesture recognition is becoming an increasingly promising and attractive technique in human–computer interaction. In this paper, we propose a novel method for fingertip detection and hand gesture recognition in real time using an RGB-D camera and a 3D convolutional neural network (3DCNN). This system can accurately and robustly extract fingertip locations and recognize gestures in real time. We demonstrate the accuracy and robustness of the interface by evaluating hand gesture recognition across a variety of gestures. In addition, we develop a tool for manipulating computer programs to show the potential of hand gesture recognition. The experimental results showed that our system achieves a high level of hand gesture recognition accuracy, making it a promising approach to gesture-based interfaces for human–computer interaction.
Journal Article
Reading Text in the Wild with Convolutional Neural Networks
by Simonyan, Karen; Vedaldi, Andrea; Zisserman, Andrew
in Algorithms; Artificial Intelligence; Boxes
2016
In this work we present an end-to-end system for text spotting—localising and recognising text in natural scene images—and text-based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage to improve precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at once, departing from the character-classifier-based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human-labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system that allows thousands of hours of news footage to be instantly searchable via a text query.
Journal Article
Hough Transform-Based Angular Features for Learning-Free Handwritten Keyword Spotting
by Moon, Yoon Young; Geem, Zong Woo; Sarkar, Ram
in Digitization; dynamic time warping; handwritten word
2021
Handwritten keyword spotting (KWS) is of great interest to the document image research community. In this work, we propose a learning-free keyword spotting method following the query-by-example (QBE) setting for handwritten documents. It consists of four key processes: pre-processing, vertical zone division, feature extraction, and feature matching. The pre-processing step deals with the noise found in the word images and the skew of the handwriting caused by individuals' varied writing styles. Next, vertical zone division splits the word image into several zones, with the number of zones guided by the number of letters in the query word image. To obtain this information (i.e., the number of letters in a query word image) during experimentation, we use the text encoding of the query word image, which the user provides to the system. The feature extraction process uses the Hough transform. The last step is feature matching, which compares the features extracted from the word images and generates a similarity score. The performance of this algorithm has been tested on three publicly available datasets: IAM, QUWI, and ICDAR KWS 2015. The proposed method outperforms the state-of-the-art learning-free KWS methods considered here for comparison when evaluated on these datasets. We also evaluate the performance of the present KWS model against state-of-the-art deep features and find that the features used in this work perform better than deep features extracted using the InceptionV3, VGG19, and DenseNet121 models.
Journal Article
A simplified model to incorporate firebrand transport into coupled fire atmosphere models
by Filippi, Jean-Baptiste; Nguyen, Ha-Ninh; Alonso-Pinar, Alberto
in Aerodynamic coefficients; Aerodynamics; Agricultural sciences
2025
Background: During wildfires, vegetation elements can be ignited and detached, leading to the generation of firebrands. These firebrands can be lifted by the fire plume, transported far from the main fire, and ignite new fires – a phenomenon known as fire spotting. Recently, numerical simulations of fire spotting using coupled fire–atmosphere models have provided insights into the role of different components of the phenomenon, such as fire intensity and turbulence. However, current fire propagation models do not account for long-range spotting distances.
Aim: This study aims to develop a medium- and long-range firebrand transport model that can provide firebrand trajectories under the numerical and time constraints of a coupled fire–atmosphere model.
Methods: A computationally efficient model for calculating firebrand transport is proposed. This model is evaluated against more complex models incorporating drag and lift coefficients and combustion models.
Key results: The reduced model accurately replicates firebrand landing patterns for both simple and complex topographies.
Conclusions: The proposed transport model reproduces firebrand landing patterns with computational time reduced by a factor of 7 compared to the more complex model.
Implications: Using the proposed model, spotting phenomena can be integrated within coupled fire–atmosphere models, thereby improving fire management.
Journal Article
DSNet: An End‐to‐End Scene Text Spotting Network With Dual‐Stream Feature Fusion
by Gao, Quanli; Wang, Xihan; Zhong, Mengjie
in computer vision; deep learning; feature reconstruction
2025
End-to-end scene text spotting has attracted considerable academic interest in recent years. However, due to complex environmental factors, text recognition remains a formidable challenge. In this paper, we introduce an end-to-end scene text spotting framework, referred to as DSNet. The framework comprises two principal modules: a text feature enhancement module (TFEM) for enhancing text regions and a redundant feature suppression module (RFSM) for noise suppression. Within the TFEM, we design multiple transformer layers for feature encoding, used to extract and enhance the feature representation of text regions. In the RFSM, we design a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU) that effectively suppress irrelevant information through feature reconstruction. The proposed framework jointly optimizes text features by operating the TFEM and RFSM in parallel; the fused features from both modules are then passed to the decoder, enabling precise text localization and robust character recognition. Extensive experiments demonstrate that our model achieves competitive performance in end-to-end scene text spotting, attaining an F-measure of 90.2% on ICDAR2015, closely approaching the state of the art (91.0%).
Journal Article
Generalisation Gap of Keyword Spotters in a Cross-Speaker Low-Resource Scenario
by Piczak, Karol J.; Nowak, Robert; Lepak, Łukasz
in Acoustics; automatic speech recognition; Data Curation
2021
Models for keyword spotting in continuous recordings can significantly improve the experience of navigating vast libraries of audio recordings. In this paper, we describe the development of such a keyword spotting system for detecting regions of interest in Polish call centre conversations. Unfortunately, in spite of recent advancements in automatic speech recognition systems, the human-level transcription accuracy reported on English benchmarks does not reflect the performance achievable in low-resource languages, such as Polish. Therefore, in this work, we shift our focus from complete speech-to-text conversion to acoustic similarity matching in the hope of reducing the demand for data annotation. As our primary approach, we evaluate Siamese and prototypical neural networks trained on several datasets of English and Polish recordings. While we obtain usable results in English, our models' performance remains unsatisfactory when applied to Polish speech, after both mono- and cross-lingual training. This performance gap shows that generalisation with limited training resources is a significant obstacle to actual deployments in low-resource languages. As a potential countermeasure, we implement a detector using audio embeddings generated with a generic pre-trained model provided by Google. It has a much more favourable profile when applied in a cross-lingual setup to detect Polish audio patterns. Nevertheless, despite these promising results, its performance on out-of-distribution data is still far from stellar. This would indicate that, in spite of the richness of internal representations created by more generic models, such speech embeddings are not entirely amenable to cross-language transfer.
Journal Article
Stability of Immobilized Chemosensor‐Filled Vesicles on Anti‐Fouling Polymer Brush Surfaces (Adv. Mater. Interfaces 21/2024)
by Hirtz, Michael; Xiao, Jiangxiong; Schäfer, Andreas H.
in cucurbiturils; lipid vesicles; membrane permeability
2024
Vesicle Immobilization: Vesicles filled with chemosensors are immobilized into regular arrays on a polymer brush antifouling surface. Analytes entering the vesicles can turn off the chemosensor fluorescence. In article 2400200, Michael Hirtz and co-workers investigate the stability of these systems and their potential for membrane permeability assays.
Journal Article
Novel Speech Recognition Systems Applied to Forensics within Child Exploitation: Wav2vec2.0 vs. Whisper
by Vásquez-Correa, Juan Camilo; Álvarez Muniain, Aitor
in Acoustics; Business-to-business market; Child
2023
The growth in online child exploitation material is a significant challenge for European Law Enforcement Agencies (LEAs). One of the most important sources of such online information is audio material that needs to be analyzed to find evidence in a timely and practical manner. That is why LEAs require a next-generation AI-powered platform to process audio data from online sources. We propose the use of speech recognition and keyword spotting to transcribe audiovisual data and to detect the presence of keywords related to child abuse. The considered models are based on two of the most accurate neural architectures to date: Wav2vec2.0 and Whisper. The systems were tested under an extensive set of scenarios in different languages. Additionally, keeping in mind that data obtained from LEAs are very sensitive, we explore the use of federated learning to provide more robust systems for the addressed application while maintaining the privacy of the LEAs' data. The considered models achieved a word error rate between 11% and 25%, depending on the language. In addition, the systems are able to recognize a set of spotted words with true-positive rates between 82% and 98%, depending on the language. Finally, federated learning strategies can maintain, and even improve, the performance of the systems when compared to centrally trained models. The proposed systems set the basis for an AI-powered platform for automatic analysis of audio in the context of forensic applications concerning child abuse. The use of federated learning is also promising for the addressed scenario, where data privacy is an important issue to be managed.
Journal Article