Catalogue Search | MBRL

A Survey on Contrastive Self-Supervised Learning

by Jaiswal, Ashish , Babu, Ashwin Ramesh , Banerjee, Debapriya in Annotations , Classification , Computer vision

2021

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
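
The contrastive objective described in this abstract (pull two augmented views of the same sample together, push views of different samples apart) can be illustrated with a minimal InfoNCE-style loss. This is a sketch under stated assumptions, not the survey's own code: the function names, the cosine similarity, the temperature value, and the toy embeddings are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss (illustrative).

    anchors[i] and positives[i] are embeddings of two views of sample i;
    every positives[j] with j != i acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching view as the correct "class".
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed): two samples, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(view_a, view_b)         # matching views are paired
shuffled = info_nce(view_a, view_b[::-1])  # pairs deliberately mismatched
print(aligned < shuffled)
```

The loss is small when each anchor is most similar to its own second view and grows when the pairing is wrong, which is the behaviour the abstract describes.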

Journal Article

Active Video Games Improve Muscular Fitness and Motor Skills in Children with Overweight or Obesity

by Perez-Lasierra, Jose Luis , Marín-Puyalto, Jorge , Lozano-Berges, Gabriel in Body Mass Index , Child , COVID-19

2022

(1) Background: Childhood obesity is an important public health problem. Children with overweight or obesity often tend to show the pediatric inactivity triad components; these involve exercise deficit disorder, pediatric dynapenia, and physical illiteracy. The aim of the study was to examine the influence of an active video games (AVG) intervention combined with multicomponent exercise on muscular fitness, physical activity (PA), and motor skills in children with overweight or obesity. (2) Methods: A total of 29 (13 girls) children (10.07 ± 0.84 years) with overweight or obesity were randomly allocated in the intervention group (AVG group; n = 21) or in the control group (CG; n = 8). The intervention group performed a 5-month AVG training using the Xbox 360® with the Kinect, the Nintendo Wii®, dance mats, and the BKOOL® interactive cycling simulator, combined with multicomponent exercise, performing three sessions per week. The control group continued their daily activities without modification. Weight, PA using accelerometers, and motor competence using the Test of Gross Motor Development 3rd edition were measured. Muscular fitness was evaluated through the Counter Movement Jump height, maximal isometric strength of knee extension and handgrip strength, and lean mass using Dual-energy X-ray Absorptiometry. Mann–Whitney U and Wilcoxon signed rank tests were performed. The biserial correlation coefficients (r) were calculated. Spearman’s correlation coefficients among PA, muscular fitness, and motor competence variables were also calculated. (3) Results: The AVG group significantly increased their knee extension maximal isometric strength (4.22 kg; p < 0.01), handgrip strength (1.93 kg; p < 0.01), and jump height (1.60 cm; p < 0.01), while the control group only increased the knee extension maximal isometric strength (3.15 kg; p < 0.01). The AVG group improved motor competence and light physical activity (p < 0.05) and decreased sedentary time (p < 0.05). Lean mass improved in both AVG group and CG (p < 0.05). Lastly, the percentage of improvement of motor skills positively correlated with the percentage of improvement in vigorous PA (r = 0.673; p = 0.003) and the percentage of improvement in CMJ (r = 0.466; p = 0.039). (4) Conclusions: A 5-month intervention combining AVG with multicomponent training seems to have positive effects on muscle fitness, motor competence, and PA in children with overweight or obesity.

Journal Article

Anomaly Detection in Traffic Surveillance Videos Using Deep Learning

by Ullah, Syed Sajid , Khan, Sardar Waqar , Iqbal, Jawaid in accident detection , Accidents , Algorithms

2022

In the recent past, a huge number of cameras have been placed in a variety of public and private areas for the purposes of surveillance, the monitoring of abnormal human actions, and traffic surveillance. The detection and recognition of abnormal activity in a real-world environment is a big challenge, as there can be many types of alarming and abnormal activities, such as theft, violence, and accidents. This research deals with accidents in traffic videos. In the modern world, video traffic surveillance cameras (VTSS) are used for traffic surveillance and monitoring. As the population is increasing drastically, the likelihood of accidents is also increasing. The VTSS is used to detect abnormal events or incidents regarding traffic on different roads and highways, such as traffic jams, traffic congestion, and vehicle accidents. In most accidents, people are helpless, and some die because emergency treatment is unavailable on long highways and in places far from cities. This research proposes a methodology for detecting accidents automatically through surveillance videos. A review of the literature suggests that convolutional neural networks (CNNs), which are a specialized deep learning approach pioneered to work with grid-like data, are effective in image and video analysis. This research uses CNNs to find anomalies (accidents) from videos captured by the VTSS and implements a rolling prediction algorithm to achieve high accuracy. In the training of the CNN model, a vehicle accident image dataset (VAID), composed of images with anomalies, was constructed and used. For testing the proposed methodology, the trained CNN model was checked on multiple videos, and the results were collected and analyzed. The results of this research show the successful detection of traffic accident events with an accuracy of 82% in the traffic surveillance system videos.
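
The abstract does not spell out its rolling prediction algorithm. One common reading, assumed here, is rolling prediction averaging: per-frame class probabilities are averaged over a sliding window before a label is chosen, which suppresses single-frame flicker. The function name, window size, and example probabilities below are hypothetical.

```python
from collections import deque


def rolling_predictions(frame_probs, window=3):
    """Label each frame from the mean of the last `window` probability vectors.

    frame_probs: list of per-frame class-probability lists, e.g. the output
    of a per-frame classifier (assumed; no real model is called here).
    """
    recent = deque(maxlen=window)  # oldest frame is dropped automatically
    labels = []
    for probs in frame_probs:
        recent.append(probs)
        n_classes = len(probs)
        mean = [sum(p[c] for p in recent) / len(recent) for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


# Assumed toy scores: class 0 = normal traffic, class 1 = accident.
frame_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.3, 0.7],  # third frame is an isolated spike
    [0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.2, 0.8],
]

raw = [max(range(2), key=p.__getitem__) for p in frame_probs]
smoothed = rolling_predictions(frame_probs, window=3)
print(raw)       # [0, 0, 1, 0, 1, 1, 1]
print(smoothed)  # [0, 0, 0, 0, 1, 1, 1]
```

The isolated spike in the third frame is smoothed away, while the sustained run of accident scores at the end is still detected, at the cost of a short delay that depends on the window size.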

Journal Article

A novel keyframe extraction method for video classification using deep neural networks

by Gan, John Q. , Escobar, Juan José , Savran Kızıltepe, Rukiye in Artificial Intelligence , Artificial neural networks , Classification

2023

Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification problems as spatial–temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study to investigate how temporal information can be utilized to improve the performance of video classification when CNNs and RNNs are combined in various architectures. To enhance the performance of the identified architecture for effective combination of CNN and RNN, a novel action template-based keyframe extraction method is proposed by identifying the informative region of each frame and selecting keyframes based on the similarity between those regions. Extensive experiments on KTH and UCF-101 datasets with ConvLSTM-based video classifiers have been conducted. Experimental results are evaluated using one-way analysis of variance, which reveals the effectiveness of the proposed keyframe extraction method in the sense that it can significantly improve video classification accuracy.

Journal Article

Activity Recognition for Ambient Assisted Living with Videos, Inertial Units and Ambient Sensors

by Vargas, Patricia Amancio , Ranieri, Caetano Mazzoni , Dragone, Mauro in Activities of Daily Living , Algorithms , Ambient Intelligence

2021

Worldwide demographic projections point to a progressively older population. This fact has fostered research on Ambient Assisted Living, which includes developments on smart homes and social robots. To endow such environments with truly autonomous behaviours, algorithms must extract semantically meaningful information from whichever sensor data is available. Human activity recognition is one of the most active fields of research within this context. Proposed approaches vary according to the input modality and the environments considered. Different from others, this paper addresses the problem of recognising heterogeneous activities of daily living centred in home environments considering simultaneously data from videos, wearable IMUs and ambient sensors. For this, two contributions are presented. The first is the creation of the Heriot-Watt University/University of Sao Paulo (HWU-USP) activities dataset, which was recorded at the Robotic Assisted Living Testbed at Heriot-Watt University. This dataset differs from other multimodal datasets due to the fact that it consists of daily living activities with either periodical patterns or long-term dependencies, which are captured in a very rich and heterogeneous sensing environment. In particular, this dataset combines data from a humanoid robot’s RGBD (RGB + depth) camera, with inertial sensors from wearable devices, and ambient sensors from a smart home. The second contribution is the proposal of a Deep Learning (DL) framework, which provides multimodal activity recognition based on videos, inertial sensors and ambient sensors from the smart home, on their own or fused to each other. The classification DL framework has also been validated on our dataset and on the University of Texas at Dallas Multimodal Human Activities Dataset (UTD-MHAD), a widely used benchmark for activity recognition based on videos and inertial sensors, providing a comparative analysis between the results on the two datasets considered. Results demonstrate that the introduction of data from ambient sensors substantially improved the accuracy results.

Journal Article

Vehicular Traffic Congestion Classification by Visual Features and Deep Learning Approaches: A Comparison

by Balducci, Fabrizio , Impedovo, Donato , Dentamaro, Vincenzo in Algorithms , Cameras , Datasets

2019

Automatic traffic flow classification is useful to reveal road congestion and accidents. Nowadays, roads and highways are equipped with a huge number of surveillance cameras, which can be used for real-time vehicle identification and thus provide traffic flow estimation. This research provides a comparative analysis of state-of-the-art object detectors, visual features, and classification models useful to implement traffic state estimations. More specifically, three different object detectors are compared to identify vehicles. Four machine learning techniques are successively employed to explore five visual features for classification aims. These classic machine learning approaches are compared with the deep learning techniques. This research demonstrates that, when methods and resources are properly implemented and tested, results are very encouraging for both methods, but the deep learning method is the most accurate, reaching an accuracy of 99.9% for binary traffic state classification and 98.6% for multiclass classification.

Journal Article

On the Use of Deep Learning for Video Classification

by Kabir, Md Alamgir , Belhaouari, Samir Brahim , ur Rehman, Atiq in Algorithms , automatic video classification , Classification

2023

The video classification task has gained significant success in recent years. Specifically, the topic has gained more attention after the emergence of deep learning models as a successful tool for automatically classifying videos. In recognition of the importance of the video classification task and to summarize the success of deep learning models for this task, this paper presents a very comprehensive and concise review on the topic. There are several existing reviews and survey papers related to video classification in the scientific literature. However, the existing review papers do not include the recent state-of-the-art works, and they also have some limitations. To provide an updated and concise review, this paper highlights the key findings based on the existing deep learning models. The key findings are also discussed in a way to provide future research directions. This review mainly focuses on the type of network architecture used, the evaluation criteria to measure the success, and the datasets used. To make the review self-contained, the emergence of deep learning methods towards automatic video classification and the state-of-the-art deep learning methods are well explained and summarized. Moreover, a clear insight of the newly developed deep learning architectures and the traditional approaches is provided. The critical challenges based on the benchmarks are highlighted for evaluating the technical progress of these methods. The paper also summarizes the benchmark datasets and the performance evaluation metrics for video classification. Based on the compact, complete, and concise review, the paper proposes new research directions to solve the challenging video classification problem.

Journal Article

A Short Video Classification Framework Based on Cross-Modal Fusion

by Yan, Ming , Pang, Nuo , Chan, Chien Aun in Accuracy , Algorithms , Analysis

2023

The explosive growth of online short videos has brought great challenges to the efficient management of video content classification, retrieval, and recommendation. Video features for video management can be extracted from video image frames by various algorithms, and they have been proven to be effective in the video classification of sensor systems. However, frame-by-frame processing of video image frames requires huge computing power, and classification algorithms based on a single modality of video features cannot meet the accuracy requirements in specific scenarios. In response to these concerns, we introduce a short video categorization architecture centered around cross-modal fusion in visual sensor systems which jointly utilizes video features and text features to classify short videos, avoiding processing a large number of image frames during classification. First, the image space is extended to three-dimensional space–time by a self-attention mechanism, and a series of patches are extracted from a single image frame. Each patch is linearly mapped into the embedding layer of the TimeSformer network and augmented with positional information to extract video features. Second, the text features of subtitles are extracted through the Bidirectional Encoder Representations from Transformers (BERT) pre-training model. Finally, cross-modal fusion is performed based on the extracted video and text features, resulting in improved accuracy for short video classification tasks. The outcomes of our experiments showcase a substantial superiority of our introduced classification framework compared to alternative baseline video classification methodologies. This framework can be applied in sensor systems for potential video classification.

Journal Article

A Coarse-to-Fine Framework for Resource Efficient Video Recognition

by Davis, Larry S , Wu Zuxuan , Li Hengduo in Artificial neural networks , Classification , Experiments

2021

Deep neural networks have demonstrated remarkable recognition results on video classification; however, great improvements in accuracies come at the expense of large amounts of computational resources. In this paper, we introduce LiteEval for resource efficient video recognition. LiteEval is a coarse-to-fine framework that dynamically allocates computation on a per-video basis, and can be deployed in both online and offline settings. Operating by default on low-cost features that are computed with images at a coarse scale, LiteEval adaptively determines on-the-fly when to read in more discriminative yet computationally expensive features. This is achieved by the interactions of a coarse RNN and a fine RNN, together with a conditional gating module that automatically learns when to use more computation conditioned on incoming frames. We conduct extensive experiments on three large-scale video benchmarks, FCVID, ActivityNet and Kinetics, and demonstrate, among other things, that LiteEval offers impressive recognition performance while using significantly less computation for both online and offline settings.

Journal Article

Variable Temporal Length Training for Action Recognition CNNs

by Chan, Kwok-Leung , Tjahjadi, Tardi , Li, Tan-Kun in Accuracy , action recognition , Classification

2024

Most current deep learning models are suboptimal in terms of the flexibility of their input shape. Usually, computer vision models only work on one fixed shape used during training; otherwise, their performance degrades significantly. For video-related tasks, the length of each video (i.e., number of video frames) can vary widely; therefore, sampling of video frames is employed to ensure that every video has the same temporal length. This training method brings about drawbacks in both the training and testing phases. For instance, a universal temporal length can damage the features in longer videos, preventing the model from flexibly adapting to variable lengths for the purposes of on-demand inference. To address this, we propose a simple yet effective training paradigm for 3D convolutional neural networks (3D-CNN) which enables them to process videos with inputs having variable temporal length, i.e., variable length training (VLT). Compared with the standard video training paradigm, our method introduces three extra operations during training: sampling twice, temporal packing, and subvideo-independent 3D convolution. These operations are efficient and can be integrated into any 3D-CNN. In addition, we introduce a consistency loss to regularize the representation space. After training, the model can successfully process video with varying temporal length without any modification in the inference phase. Our experiments on various popular action recognition datasets demonstrate the superior performance of the proposed method compared to the conventional training paradigm and other state-of-the-art training paradigms.
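
The conventional fixed-length sampling that this abstract contrasts itself with can be sketched as follows. This illustrates the standard baseline only, not the proposed VLT method; the function name and the segment-midpoint rule are assumptions chosen for the example.

```python
def uniform_frame_indices(num_frames, target_length):
    """Pick `target_length` frame indices spread evenly over a video.

    The video is split into `target_length` equal segments and the frame at
    each segment's midpoint is taken. Short videos repeat frames; long videos
    skip frames, which is the information loss the abstract refers to.
    """
    if num_frames <= 0 or target_length <= 0:
        raise ValueError("num_frames and target_length must be positive")
    segment = num_frames / target_length
    return [
        min(num_frames - 1, int(segment * (i + 0.5)))
        for i in range(target_length)
    ]


long_clip = uniform_frame_indices(num_frames=10, target_length=4)
short_clip = uniform_frame_indices(num_frames=3, target_length=6)
print(long_clip)   # [1, 3, 6, 8]
print(short_clip)  # [0, 0, 1, 1, 2, 2]
```

Every clip ends up with the same number of frames regardless of its original length, which is what lets a fixed-shape 3D-CNN batch them together.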

Journal Article
