Catalogue Search | MBRL
2,282 result(s) for "Moving object recognition"
Heterogeneous Feature Fusion for Improving Performance of Action Detection
2024
We present a novel framework aimed at improving video action detection through the integration of heterogeneous features. Conventional action detection methods, which focus on modeling the relationships between person/object instances, rely exclusively on video features and do not exploit valuable intra-instance heterogeneous features, such as person pose, positional information, or object category, that can support action recognition. Our proposed framework, termed the Heterogeneous Feature Fusion (HFF) framework, addresses this limitation by integrating such intra-instance heterogeneous features for person/object instances, and can improve existing action detection methods. To efficiently exploit each heterogeneous feature, whose importance varies depending on the action and/or scene, we introduce an attention mechanism that dynamically enhances important heterogeneous features within an instance. Experiments on the JHMDB and AVA v2.2 datasets show that our HFF framework significantly enhances the action detection performance of two existing methods.
Journal Article
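The attention-weighted fusion this abstract describes can be sketched minimally: score each per-instance feature, softmax the scores, and take the weighted sum. The scoring vector `w_att` and the feature names below are illustrative stand-ins, not the authors' HFF implementation:

```python
import numpy as np

def attention_fuse(features, w_att):
    """Fuse heterogeneous per-instance features with a softmax attention.

    features: dict of name -> 1-D feature vector (all projected to dim d)
    w_att:    (d,) scoring vector (a stand-in for a learned attention head)
    Returns the attention-weighted sum of the feature vectors.
    """
    names = sorted(features)
    F = np.stack([features[n] for n in names])   # (k, d)
    scores = F @ w_att                           # one relevance score per feature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over the k features
    return weights @ F                           # (d,) fused representation

rng = np.random.default_rng(0)
d = 8
feats = {"visual": rng.normal(size=d),
         "pose": rng.normal(size=d),
         "obj_category": rng.normal(size=d)}
fused = attention_fuse(feats, rng.normal(size=d))
print(fused.shape)  # (8,)
```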
Moving object detection in dynamic background
2021
Applying the RPCA model to moving object detection can accurately extract the moving foreground, but the model performs poorly under complex dynamic background conditions. To address this, this paper proposes an improved RPCA model based on rank-1 regularization and 3D total variation (3D-TV). The improved model uses a rank-1 regularization term to describe the low rank of the video background, 3D-TV to constrain the spatiotemporal continuity of moving objects, and the F-norm to suppress dynamic interference in the video background. Experimental results show that the improved model proposed in this paper can effectively handle complex dynamic backgrounds and obtain a complete foreground object.
Journal Article
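A minimal sketch of the underlying RPCA decomposition (low-rank background plus sparse foreground) via a standard augmented-Lagrangian alternating scheme from the robust PCA literature; the paper's rank-1 regularization and 3D-TV terms are not reproduced here, and the toy "video" below is illustrative:

```python
import numpy as np

def shrink(X, tau):
    # soft-thresholding: proximal operator of the l1 norm
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # singular value thresholding: proximal operator of the nuclear norm
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(D, n_iter=500, tol=1e-7):
    """Split D into low-rank L (static background) + sparse S (moving
    foreground) with an augmented-Lagrangian alternating scheme."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard PCP trade-off weight
    mu = m * n / (4.0 * np.abs(D).sum())    # common step-size heuristic
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y += mu * R
        if np.linalg.norm(R) < tol * np.linalg.norm(D):
            break
    return L, S

# toy "video": 20 frames as columns of a static rank-1 background,
# plus a few bright foreground pixels per frame
rng = np.random.default_rng(0)
bg = np.outer(rng.uniform(0.2, 1.0, 60), np.ones(20))
fg = np.where(rng.random(bg.shape) < 0.05, 3.0, 0.0)
L, S = rpca(bg + fg)
print(np.linalg.norm(L - bg) / np.linalg.norm(bg))  # small recovery error
```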
A Review of Machine Learning and Deep Learning for Object Detection, Semantic Segmentation, and Human Action Recognition in Machine and Robotic Vision
by
Manakitsa, Nikoleta
,
Fragulis, George F.
,
Maraslidis, George S.
in
Algorithms
,
Artificial intelligence
,
Autonomous vehicles
2024
Machine vision, an interdisciplinary field that aims to replicate human visual perception in computers, has experienced rapid progress and significant contributions. This paper traces the origins of machine vision, from early image processing algorithms to its convergence with computer science, mathematics, and robotics, resulting in a distinct branch of artificial intelligence. The integration of machine learning techniques, particularly deep learning, has driven its growth and adoption in everyday devices. This study focuses on the objectives of computer vision systems: replicating human visual capabilities including recognition, comprehension, and interpretation. Notably, image classification, object detection, and image segmentation are crucial tasks requiring robust mathematical foundations. Despite the advancements, challenges persist, such as clarifying terminology related to artificial intelligence, machine learning, and deep learning. Precise definitions and interpretations are vital for establishing a solid research foundation. The evolution of machine vision reflects an ambitious journey to emulate human visual perception. Interdisciplinary collaboration and the integration of deep learning techniques have propelled remarkable advancements in emulating human behavior and perception. Through this research, the field of machine vision continues to shape the future of computer systems and artificial intelligence applications.
Journal Article
A Survey on Contrastive Self-Supervised Learning
by
Jaiswal, Ashish
,
Babu, Ashwin Ramesh
,
Banerjee, Debapriya
in
Annotations
,
Classification
,
Computer vision
2021
Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and of using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
Journal Article
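The push-pull objective the abstract describes can be illustrated with the NT-Xent loss used by SimCLR-style methods; the batch size, temperature, and toy embeddings below are arbitrary choices:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature cross entropy (NT-Xent) contrastive loss.

    z1, z2: (n, d) embeddings of two augmented views of the same n samples;
    row i of z1 and row i of z2 form the positive pair, every other row
    in the 2n-sample batch acts as a negative.
    """
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # a sample is not its own pair
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))  # log-sum-exp
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_pos = nt_xent(z, z + 0.01 * rng.normal(size=z.shape))  # views agree
loss_rand = nt_xent(z, rng.normal(size=(8, 16)))            # views unrelated
print(loss_pos < loss_rand)  # agreeing views give the lower loss
```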
New optical method for studying a magnetic track from a moving object
by
Glinuchkin, A P
,
Rud', V Yu
,
Logunov, S E
in
Ferromagnetism
,
Moving object recognition
,
Physics
2019
A new method for detecting a magnetic track from a moving magnetic object is presented. A technique has been developed to study the nature of changes in the magnetic field in a magnetic track using ferromagnetic fluid. The results of experimental studies are presented.
Journal Article
All-in-one two-dimensional retinomorphic hardware device for motion detection and recognition
by
Hu, Weida
,
Liu, Chunsen
,
Xie, Runzhang
in
Chemistry and Materials Science
2022
With the advent of the Internet of Things era, the detection and recognition of moving objects is becoming increasingly important. The current motion detection and recognition (MDR) technology based on the complementary metal oxide semiconductor (CMOS) image sensors (CIS) platform contains redundant sensing, transmission conversion, processing and memory modules, rendering the existing systems bulky and inefficient in comparison to the human retina. Until now, non-memory capable vision sensors have only been used for static targets, rather than MDR. Here, we present a retina-inspired two-dimensional (2D) heterostructure based retinomorphic hardware device with all-in-one perception, memory and computing capabilities for the detection and recognition of moving trolleys. The proposed 2D retinomorphic device senses an optical stimulus to generate progressively tuneable positive/negative photoresponses and memorizes it, combined with interframe differencing computations, to achieve 100% separation detection of moving trichromatic trolleys without ghosting. The detected motion images are fed into a conductance mapped neural network to achieve fast trolley recognition in as few as four training epochs at 10% noise level, outperforming previous results from similar customized datasets. The prototype demonstration of a 2D retinomorphic device with integrated perceptual memory and computation provides the possibility of building compact, efficient MDR hardware.
A retina-inspired two-dimensional material based retinomorphic device exhibits all-in-one perception, memory and computing capabilities for motion detection and recognition.
Journal Article
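The interframe differencing computation mentioned in the abstract can be sketched with plain thresholded frame differences; the toy frames and threshold below are illustrative, not the device's analogue behaviour:

```python
import numpy as np

def motion_mask(prev, curr, thresh=0.1):
    """Interframe differencing: flag pixels whose intensity changed."""
    return np.abs(curr.astype(float) - prev.astype(float)) > thresh

# toy frames: a bright 2x2 "trolley" shifts one pixel to the right
f0 = np.zeros((6, 8)); f0[2:4, 1:3] = 1.0
f1 = np.zeros((6, 8)); f1[2:4, 2:4] = 1.0
mask = motion_mask(f0, f1)
print(int(mask.sum()))  # 4 pixels changed: where the object left and arrived
```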
Multi-input CNN-GRU based human activity recognition using wearable sensors
by
Dua, Nidhi
,
Semwal, Vijay Bhaskar
,
Singh, Shiva Nand
in
Accelerometers
,
Artificial neural networks
,
Classification
2021
Human Activity Recognition (HAR) has attracted much attention from researchers in the recent past. The intensification of research into HAR stems from the motive to understand human behaviour and inherently anticipate human intentions. Human activity data obtained via wearable sensors such as gyroscopes and accelerometers take the form of time-series data, as each reading has a timestamp associated with it. For HAR, it is important to extract the relevant temporal features from raw sensor data. Most approaches to HAR involve a good amount of feature engineering and data pre-processing, which in turn requires domain expertise. Such approaches are time-consuming and application-specific. In this work, a deep neural network based model, which uses a Convolutional Neural Network and a Gated Recurrent Unit, is proposed as an end-to-end model performing both automatic feature extraction and classification of the activities. The experiments in this work were carried out using the raw data obtained from wearable sensors with nominal pre-processing and do not involve any handcrafted feature extraction techniques. The accuracies obtained on the UCI-HAR, WISDM, and PAMAP2 datasets are 96.20%, 97.21%, and 95.27%, respectively. The results of the experiments establish that the proposed model achieves superior classification performance compared to other similar architectures.
Journal Article
Zero-Shot Visual Recognition via Bidirectional Latent Embedding
2017
Zero-shot learning for visual recognition, e.g., object and action recognition, has recently attracted a lot of attention. However, it remains challenging to bridge the semantic gap between visual features and their underlying semantics and to transfer knowledge to semantic categories unseen during learning. Unlike most existing zero-shot visual recognition methods, we propose a stagewise bidirectional latent embedding framework of two subsequent learning stages for zero-shot visual recognition. In the bottom–up stage, a latent embedding space is first created by exploring the topological and labeling information underlying training data of known classes via a proper supervised subspace learning algorithm, and the latent embeddings of the training data are used to form landmarks that guide embedding semantics underlying unseen classes into this learned latent space. In the top–down stage, semantic representations of unseen-class labels in a given label vocabulary are then embedded to the same latent space to preserve the semantic relatedness between all different classes via our proposed semi-supervised Sammon mapping with the guidance of landmarks. Thus, the resultant latent embedding space allows for predicting the label of a test instance with a simple nearest-neighbor rule. To evaluate the effectiveness of the proposed framework, we have conducted extensive experiments on four benchmark datasets in object and action recognition, i.e., AwA, CUB-200-2011, UCF101 and HMDB51. The experimental results under comparative studies demonstrate that our proposed approach yields the state-of-the-art performance under inductive and transductive settings.
Journal Article
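The nearest-neighbor prediction rule in the shared latent space can be sketched as follows; the 2-D class prototypes and label names are illustrative, not the paper's learned embeddings:

```python
import numpy as np

def zero_shot_predict(x_emb, class_embs, class_names):
    """Assign each test embedding the label of its nearest class embedding."""
    # (n, k) squared Euclidean distances from each sample to each class prototype
    d2 = ((x_emb[:, None, :] - class_embs[None, :, :]) ** 2).sum(-1)
    return [class_names[i] for i in d2.argmin(1)]

# toy latent space: three class prototypes and three test embeddings
classes = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
names = ["walk", "run", "jump"]
x = np.array([[0.5, 0.2], [9.0, 1.0], [1.0, 9.0]])
preds = zero_shot_predict(x, classes, names)
print(preds)  # ['walk', 'run', 'jump']
```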
Deep Learning Models for Real-time Human Activity Recognition with Smartphones
by
Lianyong, Qi
,
Xu, Xiaolong
,
Tong, Chao
in
Accelerometers
,
Artificial neural networks
,
Data acquisition
2020
With the widespread application of mobile edge computing (MEC), MEC is serving as a bridge to narrow the gaps between medical staff and patients. Relatedly, MEC is also moving toward supervising individual health in an automatic and intelligent manner. One of the main MEC technologies in healthcare monitoring systems is human activity recognition (HAR). Built-in multifunctional sensors make smartphones a ubiquitous platform for acquiring and analyzing data, thus making it possible for smartphones to perform HAR. The task of recognizing human activity using a smartphone’s built-in accelerometer has been well resolved, but in practice, with the multimodal and high-dimensional sensor data, these traditional methods fail to identify complicated and real-time human activities. This paper designs a smartphone inertial accelerometer-based architecture for HAR. When the participants perform typical daily activities, the smartphone collects the sensory data sequence, extracts the high-efficiency features from the original data, and then obtains the user’s physical behavior data through multiple three-axis accelerometers. The data are preprocessed by denoising, normalization and segmentation to extract valuable feature vectors. In addition, a real-time human activity classification method based on a convolutional neural network (CNN) is proposed, which uses a CNN for local feature extraction. Finally, CNN, LSTM, BLSTM, MLP and SVM models are utilized on the UCI and Pamap2 datasets. We explore how to train deep learning methods and demonstrate how the proposed method outperforms the others on two large public datasets: UCI and Pamap2.
Journal Article
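The segmentation and normalization pre-processing mentioned above can be sketched as sliding-window slicing plus per-channel z-scoring; the window length, step, and synthetic accelerometer stream below are illustrative choices:

```python
import numpy as np

def segment(signal, win, step):
    """Slice a (t, channels) sensor stream into overlapping windows."""
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

def zscore(windows):
    # per-channel normalization, computed over the whole recording
    mu = windows.mean(axis=(0, 1))
    sd = windows.std(axis=(0, 1)) + 1e-8
    return (windows - mu) / sd

rng = np.random.default_rng(1)
stream = rng.normal(size=(500, 3))             # e.g. a 3-axis accelerometer
w = zscore(segment(stream, win=128, step=64))  # 50% overlap, common in HAR
print(w.shape)  # (6, 128, 3): windows x timesteps x channels
```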
A Robust and Efficient Video Representation for Action Recognition
by
Wang, Heng
,
Schmid, Cordelia
,
Oneata, Dan
in
Artificial Intelligence
,
Cameras
,
Computer Imaging
2016
This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body as human motion is not constrained by the camera. Trajectories consistent with the homography are considered as due to camera motion, and thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvement on motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding approach to the standard bag-of-words (BOW) histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to BOW encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results.
Journal Article
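The homography estimation at the core of the camera-motion compensation can be sketched with the direct linear transform (DLT); the paper wraps this least-squares step in RANSAC over SURF/optical-flow matches, which is omitted here:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT algorithm.

    src, dst: (n, 2) matched points, n >= 4.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence gives two linear constraints on the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)   # null vector = smallest singular direction
    return H / H[2, 2]         # fix the arbitrary scale

# points related by a known transform (a special case of a homography)
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 3.0]])
H_true = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, -1.0], [0.0, 0.0, 1.0]])
pts = np.c_[src, np.ones(len(src))] @ H_true.T
dst = pts[:, :2] / pts[:, 2:]
H = homography_dlt(src, dst)
print(np.allclose(H, H_true, atol=1e-6))  # True
```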