87 result(s) for "action evaluation dataset"
A Survey of Vision-Based Human Action Evaluation Methods
The fields of human activity analysis have recently begun to diversify. Many researchers have taken much interest in developing action recognition or action prediction methods. The research on human action evaluation differs by aiming to design computation models and evaluation approaches for automatically assessing the quality of human actions. This line of study has become popular because of its rapidly emerging real-world applications, such as physical rehabilitation, assistive living for elderly people, skill training on self-learning platforms, and sports activity scoring. This paper presents a comprehensive survey of approaches and techniques in action evaluation research, including motion detection and preprocessing using skeleton data, handcrafted feature representation methods, and deep learning-based feature representation methods. The benchmark datasets from this research field and some evaluation criteria employed to validate the algorithms’ performance are introduced. Finally, the authors present several promising future directions for further studies.
Procedure-Aware Action Quality Assessment: Datasets and Performance Evaluation
In this paper, we investigate the problem of procedure-aware action quality assessment, which analyzes the action quality by delving into the semantic and spatial-temporal relationships among various composed steps of the action. Most existing action quality assessment methods regress on deep features of entire videos to learn diverse scores, which ignore the relationships among different fine-grained steps in actions and result in limitations in visual interpretability and generalization ability. To address these issues, we construct a fine-grained competitive sports video dataset called FineDiving with detailed semantic and temporal annotations, which helps understand the internal structures of each action. We also propose a new approach (i.e., spatial-temporal segmentation attention, STSA) that introduces procedure segmentation to parse an action into consecutive steps, learns powerful representations from these steps by constructing spatial motion attention and procedure-aware cross-attention, and designs a fine-grained contrastive regression to achieve an interpretable scoring mechanism. In addition, we build a benchmark on the FineDiving dataset to evaluate the performance of representative action quality assessment methods. Then, we expand FineDiving to FineDiving+ and construct three new benchmarks to investigate the transferable abilities between different diving competitions, between synchronized and individual dives, and between springboard and platform dives to demonstrate the generalization abilities of our STSA in unknown scenarios, scoring rules, action types, and difficulty degrees. Extensive experiments demonstrate that our approach, designed for procedure-aware action quality assessment, achieves substantial improvements. Our dataset and code are available at https://github.com/xujinglin/FineDiving.
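The core idea behind fine-grained contrastive regression can be illustrated with a toy sketch: rather than regressing an absolute score, predict the score *difference* between a query and a scored exemplar, then offset the exemplar's known score. All names here are hypothetical, and the linear head stands in for a learned regression network:

```python
def contrastive_regress(query_feat, exemplar_feat, exemplar_score, head_weights):
    """Toy contrastive regression: predict the relative score difference
    between a query and a scored exemplar, then offset the exemplar's
    known score. `head_weights` stands in for a learned regression head."""
    diff = [q - e for q, e in zip(query_feat, exemplar_feat)]
    delta = sum(w * d for w, d in zip(head_weights, diff))
    return exemplar_score + delta
```

Scoring relative to an exemplar with a known judge score tends to be easier to learn than absolute scoring, which is the intuition the abstract describes.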
Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities
Human action recognition systems use data collected from a wide range of sensors to accurately identify and interpret human actions. One of the most challenging issues for computer vision is the automatic and precise identification of human activities. A significant increase in feature learning-based representations for action recognition has emerged in recent years, due to the widespread use of deep learning-based features. This study presents an in-depth analysis of human activity recognition that investigates recent developments in computer vision. Augmented reality, human–computer interaction, cybersecurity, home monitoring, and surveillance cameras are all examples of computer vision applications that often incorporate human action detection. We give a taxonomy-based, rigorous study of human activity recognition techniques, discussing the best ways to acquire human action features, derived using RGB and depth data, as well as the latest research on deep learning and hand-crafted techniques. We also explain a generic architecture for recognizing human actions in the real world and the field's currently prominent research topics. Finally, we offer analytical observations and proposals for future research; researchers studying human action recognition in depth will find this review an effective tool.
Evaluating Strategies for Adaptation to Climate Change in Grapevine Production–A Systematic Review
In many areas of the world, maintaining grapevine production will require adaptation to climate change. While rigorous evaluations of adaptation strategies provide decision makers with valuable insights, those that are published often overlook major constraints, ignore local adaptive capacity, and suffer from a compartmentalization of disciplines and scales. The objective of our study was to identify current knowledge of evaluation methods and their limitations, reported in the literature. We reviewed 111 papers that evaluate adaptation strategies in the main vineyards worldwide. Evaluation approaches are analyzed through key features (e.g., climate data sources, methodology, evaluation criteria) to discuss their ability to address climate change issues, and to identify promising outcomes for climate change adaptations. We highlight the fact that combining adaptation levers in the short and long term (location, vine training, irrigation, soil, and canopy management, etc.) enables local compromises to be reached between future water availability and grapevine productivity. The main findings of the paper are three-fold: (1) the evaluation of a combination of adaptation strategies provides better solutions for adapting to climate change; (2) multi-scale studies allow local constraints and opportunities to be considered; and (3) only a small number of studies have developed multi-scale and multi-lever approaches to quantify feasibility and effectiveness of adaptation. In addition, we found that climate data sources were not systematically clearly presented, and that climate uncertainty was hardly accounted for. Moreover, only a small number of studies have assessed the economic impacts of adaptation, especially at farm scale. We conclude that the development of methodologies to evaluate adaptation strategies, considering both complementary adaptations and scales, is essential if relevant information is to be provided to the decision-makers of the wine industry.
A study of autoencoders as a feature extraction technique for spike sorting
Spike sorting is the process of grouping spikes of distinct neurons into their respective clusters. Most frequently, this grouping is performed by relying on the similarity of features extracted from spike shapes. In spite of recent developments, current methods have yet to achieve satisfactory performance, and many investigators still favour manual sorting, even though it is an intensive, time-consuming undertaking. To automate the process, a diverse array of machine learning techniques has been applied. The performance of these techniques, however, depends critically on the feature extraction step. Here, we propose deep learning using autoencoders as a feature extraction method and extensively evaluate the performance of multiple designs. The models presented are evaluated on publicly available synthetic and real “in vivo” datasets, with various numbers of clusters. The proposed methods achieve higher spike-sorting performance than other state-of-the-art techniques.
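The feature-extraction idea can be sketched in miniature: train an autoencoder on spike waveforms and use the bottleneck activations as low-dimensional features for clustering. The toy below is a linear autoencoder on synthetic spikes, trained by plain gradient descent; all dimensions, templates, and hyperparameters are illustrative, not the paper's actual designs:

```python
import random

random.seed(0)

D, K = 6, 2            # spike-window length and latent size (toy values)
LR, EPOCHS = 0.005, 300

# Toy "spikes": two template waveforms plus small noise.
templates = [[1, 2, 3, 2, 1, 0], [0, -1, -2, -3, -2, -1]]
spikes = [[v + random.gauss(0, 0.1) for v in templates[i % 2]] for i in range(20)]

W_enc = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(K)]  # encoder
W_dec = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(D)]  # decoder

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def train_step(x):
    h = matvec(W_enc, x)          # latent code: the extracted feature vector
    y = matvec(W_dec, h)          # reconstruction of the spike
    err = [yi - xi for yi, xi in zip(y, x)]
    # Gradient descent on squared reconstruction error.
    for i in range(D):
        for j in range(K):
            W_dec[i][j] -= LR * 2 * err[i] * h[j]
    back = [sum(W_dec[i][j] * err[i] for i in range(D)) for j in range(K)]
    for j in range(K):
        for i in range(D):
            W_enc[j][i] -= LR * 2 * back[j] * x[i]
    return sum(e * e for e in err)

losses = [sum(train_step(x) for x in spikes) for _ in range(EPOCHS)]
features = [matvec(W_enc, x) for x in spikes]   # 2-D features for clustering
```

After training, each spike is summarized by `K` numbers, and any clustering algorithm can group them; the paper's deep, nonlinear autoencoders follow the same encode-compress-decode pattern at scale.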
Video-Based Stress Detection through Deep Learning
Stress has become an increasingly serious problem in current society, threatening people’s well-being. With video cameras now ubiquitously deployed in our surroundings, detecting stress with contact-free camera sensors becomes a cost-effective, wide-reaching approach, free of interference from artificial traits and factors. In this study, we leverage users’ facial expressions and action motions in video and present a two-leveled stress detection network (TSDNet). TSDNet first learns face- and action-level representations separately, then fuses the results through a stream-weighted integrator with local and global attention for stress identification. To evaluate the performance of TSDNet, we constructed a video dataset containing 2092 labeled video clips. The experimental results on this dataset show that: (1) TSDNet outperformed hand-crafted feature engineering approaches with a detection accuracy of 85.42% and an F1-score of 85.28%, demonstrating the feasibility and effectiveness of using deep learning to analyze facial and action motions; and (2) considering both facial expressions and action motions improved detection accuracy and F1-score over face-only or action-only methods by more than 7%.
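The stream-weighted integration step can be illustrated with a minimal sketch: blend the face-level and action-level predictions with scalar weights normalized by a softmax. The function names and the use of scalar weights are simplifying assumptions, not TSDNet's actual architecture:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(face_probs, action_probs, w_face, w_action):
    """Toy stream-weighted integrator: blend face-level and action-level
    predictions with (hypothetical) learned scalar weights, softmax-normalized
    so the blend is a convex combination."""
    wf, wa = softmax([w_face, w_action])
    return [wf * f + wa * a for f, a in zip(face_probs, action_probs)]
```

With equal weights the two streams contribute evenly; training would push the weights toward whichever stream is more reliable for a given input.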
Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition
In recent years, deep learning techniques have excelled in video action recognition. However, currently commonly used video action recognition models minimize the importance of different video frames and spatial regions within some specific frames when performing action recognition, which makes it difficult for the models to adequately extract spatiotemporal features from the video data. In this paper, an action recognition method based on improved residual convolutional neural networks (CNNs) for video frames and spatial attention modules is proposed to address this problem. The network can guide what and where to emphasize or suppress with essentially little computational cost using the video frame attention module and the spatial attention module. It also employs a two-level attention module to emphasize feature information along the temporal and spatial dimensions, respectively, highlighting the more important frames in the overall video sequence and the more important spatial regions in some specific frames. Specifically, we create the video frame and spatial attention map by successively adding the video frame attention module and the spatial attention module to aggregate the spatial and temporal dimensions of the intermediate feature maps of the CNNs to obtain different feature descriptors, thus directing the network to focus more on important video frames and more contributing spatial regions. The experimental results further show that the network performs well on the UCF-101 and HMDB-51 datasets.
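The temporal half of such a two-level attention module can be sketched simply: score each frame, softmax over frames, and reweight the frame features so that informative frames dominate the aggregation. Average-pooling as the scoring function is a stand-in for a learned scoring layer, and all names are hypothetical:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def frame_attention(frames):
    """Toy video-frame attention: score each frame by average-pooling its
    feature vector (stand-in for a learned scorer), softmax over frames,
    and reweight the features so important frames dominate."""
    scores = [sum(f) / len(f) for f in frames]
    weights = softmax(scores)
    reweighted = [[w * v for v in f] for w, f in zip(weights, frames)]
    return weights, reweighted
```

A spatial attention module applies the same score-normalize-reweight pattern across locations within a frame rather than across frames.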
Enhancing Long-Term Action Quality Assessment: A Dual-Modality Dataset and Causal Cross-Modal Framework for Trampoline Gymnastics
Action quality assessment (AQA) plays a pivotal role in intelligent sports analysis, aiding athlete training and refereeing decisions. However, existing datasets and methods are limited to short-term actions, lacking comprehensive spatiotemporal modeling for complex, long-duration sequences like those in trampoline gymnastics. To bridge this gap, we introduce Trampoline-AQA, a novel dataset comprising 206 video clips from major competitions (2018–2024), featuring dual-modality (RGB and optical flow) data and rich annotations. Leveraging this dataset, we propose a framework comprising a Temporal Feature Enhancer (TFE) and a forward-looking causal cross-modal attention (FCCA) module, which improves action quality assessment by delivering more accurate and robust scoring for long-duration, high-speed routines, particularly under motion ambiguities. Our approach achieves a Spearman correlation of 0.938 on Trampoline-AQA and 0.882 on UNLV-Dive, demonstrating superior performance and generalization capability.
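The Spearman correlation reported above is the standard AQA metric: it compares the *ranking* of predicted scores against the ranking of judge scores. A minimal pure-Python version (with average ranks for ties) looks like this:

```python
def rankdata(xs):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(pred, true):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rp, rt = rankdata(pred), rankdata(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    num = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    den = (sum((a - mp) ** 2 for a in rp)
           * sum((b - mt) ** 2 for b in rt)) ** 0.5
    return num / den
```

Because only ranks matter, a model whose scores are a monotone transform of the judges' scores still attains a correlation of 1.0; in practice `scipy.stats.spearmanr` computes the same quantity.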
Meta-analytic Gaussian Network Aggregation
A growing number of publications focus on estimating Gaussian graphical models (GGM, networks of partial correlation coefficients). At the same time, generalizability and replicability of these highly parameterized models are debated, and sample sizes typically found in datasets may not be sufficient for estimating the underlying network structure. In addition, while recent work has emerged that aims to compare networks based on different samples, these studies do not take potential cross-study heterogeneity into account. To this end, this paper introduces methods for estimating GGMs by aggregating over multiple datasets. We first introduce a general maximum likelihood estimation modeling framework in which all discussed models are embedded. This modeling framework is subsequently used to introduce meta-analytic Gaussian network aggregation (MAGNA). We discuss two variants: fixed-effects MAGNA, in which heterogeneity across studies is not taken into account, and random-effects MAGNA, which models sample correlations and takes heterogeneity into account. We assess the performance of MAGNA in large-scale simulation studies. Finally, we exemplify the method using four datasets of post-traumatic stress disorder (PTSD) symptoms, and summarize findings from a larger meta-analysis of PTSD symptoms.
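The fixed-effects idea can be illustrated with a deliberately simplified sketch: pool per-study correlation matrices by sample-size weighting, then read the GGM edge weights (partial correlations) off the inverse of the pooled matrix. This is a crude stand-in for MAGNA, which works on the likelihood rather than averaging raw matrices:

```python
def pooled_correlation(cors, ns):
    """Naive fixed-effects pooling: sample-size-weighted average of
    per-study correlation matrices (simplification of MAGNA's likelihood
    approach)."""
    total = sum(ns)
    p = len(cors[0])
    return [[sum(n * R[i][j] for R, n in zip(cors, ns)) / total
             for j in range(p)] for i in range(p)]

def inverse(M):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def partial_correlations(R):
    """GGM edge weights: partial correlations from the precision matrix."""
    T = inverse(R)
    p = len(R)
    return [[1.0 if i == j else -T[i][j] / (T[i][i] * T[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]
```

Random-effects aggregation would additionally model between-study variance around the pooled correlations instead of treating all studies as draws from one population.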
Bus Violence: An Open Benchmark for Video Violence Detection on Public Transport
The automatic detection of violent actions in public places through video analysis is difficult because the employed Artificial Intelligence-based techniques often suffer from generalization problems. Indeed, these algorithms hinge on large quantities of annotated data and usually experience a drastic drop in performance when used in scenarios never seen during the supervised learning phase. In this paper, we introduce and publicly release the Bus Violence benchmark, the first large-scale collection of video clips for violence detection on public transport, in which actors simulated violent actions inside a moving bus under changing conditions, such as background or lighting. Moreover, we conduct a performance analysis on this newly established use case of several state-of-the-art video violence detectors pre-trained on general violence detection databases. The moderate performance achieved reveals how poorly these popular methods generalize and indicates the need for this new collection of labeled data, which is beneficial for specializing them to this new scenario.