59 result(s) for "spatial-temporal graph convolutional network"
Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network
The timely detection of falls and prompt alerting of medical aid are critical for health monitoring of elderly individuals living alone. This paper focuses on issues such as poor adaptability, privacy infringement, and low recognition accuracy associated with traditional visual-sensor-based fall detection. We propose an infrared video-based fall detection method utilizing spatial-temporal graph convolutional networks (ST-GCNs) to address these challenges. Our method uses a fine-tuned AlphaPose model to extract 2D human skeleton sequences from infrared videos. The skeleton data are then represented in Cartesian and polar coordinates and processed through a two-stream ST-GCN to recognize fall behaviors promptly. To enhance the network's ability to recognize fall actions, we improved the adjacency matrix of the graph convolutional units and introduced multi-scale temporal graph convolution units. To facilitate practical deployment, we optimized the time window and network depth of the ST-GCN, striking a balance between model accuracy and speed. Experimental results on a proprietary infrared human action recognition dataset demonstrate that the proposed algorithm identifies fall behaviors with an accuracy of up to 96%. Moreover, the algorithm performs robustly, identifying falls in both near-infrared and thermal-infrared videos.
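The two-stream input described above pairs Cartesian coordinates with a polar representation of the same skeleton. A minimal sketch of that conversion (the root-joint choice and the `to_polar` helper are illustrative, not taken from the paper):

```python
import numpy as np

def to_polar(joints, center_idx=0):
    """Convert 2D joint coordinates (J, 2) to polar coordinates
    (radius, angle) relative to a chosen root joint (hypothetical
    choice here: joint 0, e.g. the pelvis)."""
    rel = joints - joints[center_idx]          # translate so the root is the origin
    r = np.linalg.norm(rel, axis=1)            # radius per joint
    theta = np.arctan2(rel[:, 1], rel[:, 0])   # angle per joint
    return np.stack([r, theta], axis=1)

# toy 3-joint skeleton: root at (1, 1), one joint to its right, one above it
skel = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
polar = to_polar(skel)
print(polar[1])  # joint to the right of the root: radius 1, angle 0
```

Each stream then sees the same motion in a different coordinate system, which is the intuition behind feeding both to the two-stream ST-GCN.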
Combining Supervised and Unsupervised Learning Algorithms for Human Activity Recognition
Human activity recognition has been an extensively researched topic over the last decade. Recent methods employ supervised and unsupervised deep learning techniques that model spatial and temporal dependencies. This paper proposes a novel approach to human activity recognition using skeleton data, combining supervised and unsupervised learning algorithms to provide high-quality results in real time. The proposed method involves a two-stage framework: the first stage applies an unsupervised clustering technique to group activities by similarity, while the second stage classifies the data assigned to each group using graph convolutional networks. Different clustering techniques and data augmentation strategies are explored to improve the training process. The results were compared against state-of-the-art methods, and the proposed model achieved 90.22% Top-1 accuracy on the NTU-RGB+D dataset (an improvement of approximately 9% over the baseline graph convolutional method), while the inference time and total number of parameters stay within the same order of magnitude. Extending the initial set of activities with additional classes is fast and robust, since only the cluster to which the new activity is assigned must be retrained rather than the entire architecture.
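The two-stage routing described above can be sketched in miniature. Everything here is illustrative: nearest-centroid lookups stand in for both the clustering stage and the per-cluster graph convolutional classifiers, and the activity names and numbers are invented:

```python
import numpy as np

# Stage 1: cluster centroids that route a sample to a group of similar activities.
cluster_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Stage 2: one tiny "classifier" per cluster, represented as labeled prototypes
# (in the paper this role is played by a graph convolutional network per group).
per_cluster_prototypes = {
    0: {"sit": np.array([0.0, 1.0]), "stand": np.array([1.0, 0.0])},
    1: {"run": np.array([9.0, 10.0]), "jump": np.array([11.0, 10.0])},
}

def classify(x):
    # route to the nearest cluster, then classify within that cluster only
    cluster = int(np.argmin(np.linalg.norm(cluster_centroids - x, axis=1)))
    protos = per_cluster_prototypes[cluster]
    labels = list(protos)
    dists = [np.linalg.norm(protos[label] - x) for label in labels]
    return labels[int(np.argmin(dists))]

print(classify(np.array([10.8, 10.0])))  # routed to cluster 1, labeled "jump"
```

The structure also shows why adding a class is cheap in this design: a new prototype only touches the one cluster it joins.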
Skeleton-Based Fall Detection with Multiple Inertial Sensors Using Spatial-Temporal Graph Convolutional Networks
The application of wearable devices to fall detection has been the focus of much research over the past few years. One of the most common problems in established fall detection systems is the large number of false positives. In this paper, to exploit the dependence between human joints and improve the accuracy and reliability of fall detection, we propose a fall-recognition method based on the skeleton and spatial-temporal graph convolutional networks (ST-GCN), using human motion data of body joints acquired by inertial measurement units (IMUs). First, the motion data of five inertial sensors were extracted from the UP-Fall dataset, and a human skeleton model for fall detection was established from the natural connections between body joints. An ST-GCN-based fall-detection model was then built to extract the motion features of human falls and activities of daily living (ADLs) at spatial and temporal scales. The influence of two hyperparameters and of the window size on algorithm performance was examined, and the recognition results of the ST-GCN were compared with those of MLP, CNN, RNN, LSTM, TCN, TST, and MiniRocket. The experimental results show that the ST-GCN fall-detection model outperforms the other seven algorithms in accuracy, precision, recall, and F1-score. This study provides a new method for IMU-based fall detection and a reference for improving the accuracy and robustness of fall detection.
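The skeleton-graph construction from a handful of IMU-instrumented joints can be sketched as follows. The five-node star topology and the symmetric normalization are illustrative assumptions, not the paper's exact sensor placement:

```python
import numpy as np

# Hypothetical 5-node skeleton, one node per IMU (e.g. waist plus both
# wrists and ankles, each connected to the waist).
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]

n = 5
A = np.eye(n)                      # self-loops, as in the usual ST-GCN formulation
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))   # symmetric normalization D^-1/2 A D^-1/2

# one graph-convolution step on toy node features of shape (nodes, channels)
X = np.arange(10, dtype=float).reshape(n, 2)
print(A_norm @ X)   # each node now mixes in its neighbours' features
```

A learned weight matrix multiplied on the right of `A_norm @ X` would complete a spatial GCN layer; the temporal part of the ST-GCN then convolves these features across frames.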
Optimizing the early diagnosis of neurological disorders through the application of machine learning for predictive analytics in medical imaging
Early diagnosis of Neurological Disorders (ND) such as Alzheimer’s disease (AD) and Brain Tumors (BT) can be highly challenging, since these diseases cause only minor changes in the brain’s anatomy. Magnetic Resonance Imaging (MRI) is a vital tool for diagnosing and visualizing these ND; however, standard techniques contingent upon human analysis can be inaccurate and time-consuming, and may miss the early-stage symptoms whose detection is necessary for effective treatment. Spatial Feature Extraction (FE) has been improved by Convolutional Neural Networks (CNN) and hybrid models, both advances in Deep Learning (DL); however, these methods frequently fail to capture temporal dynamics, which are significant for a complete assessment. The present investigation introduces STGCN-ViT, a hybrid model that integrates CNN, Spatial–Temporal Graph Convolutional Network (STGCN), and Vision Transformer (ViT) components to address these gaps: EfficientNet-B0 for spatial FE, the STGCN for temporal FE, and the ViT for attention-based FE. Applying the Open Access Series of Imaging Studies (OASIS) and Harvard Medical School (HMS) benchmark datasets, the recommended approach proved effective, with Group A attaining an accuracy of 93.56%, a precision of 94.41%, and an Area under the Receiver Operating Characteristic Curve (AUC-ROC) score of 94.63%. Compared with standard and transformer-based models, the model attains better results for Group B, with an accuracy of 94.52%, a precision of 95.03%, and an AUC-ROC score of 95.24%. These results support the model’s use in real-time medical applications by demonstrating the feasibility of accurate early-stage ND diagnosis.
Federated duelling deep Q‐network based collaborative energy scheduling for a power distribution network
The collaborative energy scheduling of sources, loads, and energy storage has great potential to meet the active control requirements of power-distribution networks. In this study, a federated deep reinforcement learning framework was developed to facilitate collaborative energy scheduling and maximize the total economic benefit in a distribution network. Modelling energy scheduling as a Markov decision process, a spatial-temporal graph convolutional network transformer-based power generation packaging model for renewable energy sources was presented, and a collaborative energy scheduling strategy based on a federated duelling deep Q‐network was designed. The simulation results indicate that the developed collaborative scheduling strategy can maximize the economic benefits of a power distribution network while ensuring data privacy.
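The duelling architecture mentioned above recombines a state-value head and an advantage head. A minimal sketch of that aggregation alone (not the federated training loop), with made-up numbers:

```python
import numpy as np

def duelling_q(value, advantages):
    """Duelling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage keeps V and A identifiable."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# toy state: V(s) = 2.0, three candidate scheduling actions
q = duelling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
print(q)  # the Q-values centre on V(s) because the mean advantage is removed
```

In the federated setting, each participant would train such a head locally and share only model updates, which is what preserves data privacy.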
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network
3D human motion prediction, i.e., predicting future human poses on the basis of historically observed motion sequences, is a core task in computer vision, with successful applications in autonomous driving and human–robot interaction. Previous work has usually employed Recurrent Neural Network (RNN)-based models to predict future human poses. However, as prior studies have amply demonstrated, RNN-based prediction models suffer from unrealistic and discontinuous predictions due to the accumulation of prediction errors. To address this, we propose a feed-forward, 3D skeleton-based model for human motion prediction: the Spatial–Temporal Graph Convolutional Network (ST-GCN), which automatically learns the spatial and temporal patterns of human motion from input sequences and overcomes the limitations of previous approaches. Specifically, our ST-GCN model is based on an encoder-decoder architecture. The encoder consists of 5 ST-GCN modules, each comprising a spatial GCN layer and a 2D convolution-based TCN layer, which together encode the spatio-temporal dynamics of human motion. The decoder, consisting of 5 TCN layers, then exploits the encoded spatio-temporal representation to predict future human poses. We performed extensive experiments on various large-scale 3D human activity pose datasets (Human3.6M, AMASS, 3DPW), adopting MPJPE (Mean Per Joint Position Error) as the evaluation metric. The experimental results demonstrate that our ST-GCN model outperforms the baseline models in both short-term (< 400 ms) and long-term (> 400 ms) prediction.
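The TCN layers in the decoder apply convolutions along the time axis. A toy sketch of that idea with a hand-picked smoothing kernel (the real layers use learned filters over many channels):

```python
import numpy as np

def temporal_conv(seq, kernel):
    """'Valid' 1D convolution over time for one joint coordinate sequence."""
    k = len(kernel)
    return np.array([np.dot(seq[t:t + k], kernel)
                     for t in range(len(seq) - k + 1)])

seq = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # one coordinate over 5 frames
out = temporal_conv(seq, kernel=np.array([0.25, 0.5, 0.25]))
print(out)  # a symmetric 3-tap kernel leaves a linear trajectory unchanged
```

Stacking several such layers gives the decoder a growing temporal receptive field over the encoded motion representation.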
Hybrid Spatial–Temporal Graph Convolutional Networks for On-Street Parking Availability Prediction
With the development of sensors and the Internet of Things (IoT), smart cities can provide people with a variety of information for a more convenient life. Effective on-street parking availability prediction can improve parking efficiency and, at times, alleviate city congestion. Conventional methods of parking availability prediction often do not consider the spatial–temporal features of parking duration distributions. To this end, we propose a parking space prediction scheme called hybrid spatial–temporal graph convolutional networks (HST-GCNs). We use graph convolutional networks and gated linear units (GLUs) with a 1D convolutional neural network to obtain the spatial features and the temporal features, respectively. Then, we construct a spatial–temporal convolutional block to capture instantaneous spatial–temporal correlations. We further propose an attention mechanism, called distAtt, to measure the similarity of parking duration distributions. Through the distAtt mechanism, we add long-term spatial–temporal correlations to the spatial–temporal convolutional block, and thus capture complex hybrid spatial–temporal correlations, achieving higher accuracy in parking availability prediction. Based on real-world datasets, we compare the proposed scheme with benchmark models. The experimental results show that the proposed scheme achieves the best performance in predicting the parking occupancy rate.
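The gated linear unit used for the temporal stream gates each feature through a sigmoid: GLU(x) = (xW + b) ⊙ σ(xV + c). A scalar sketch (real GLUs apply learned 1D convolutions rather than these illustrative scalar weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, w=1.0, b=0.0, v=1.0, c=0.0):
    """Scalar GLU: a linear path multiplied by a sigmoid gate.
    w, b, v, c stand in for learned convolution parameters."""
    return (x * w + b) * sigmoid(x * v + c)

x = np.array([-5.0, 0.0, 5.0])
print(glu(x))  # strongly negative inputs are gated toward zero
```

The gate lets the temporal stream suppress uninformative time steps while passing strong signals nearly unchanged, which is why GLUs pair well with the 1D convolutions described above.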
Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
In sports training, personalized skill assessment and feedback are crucial for athletes to master complex movements and improve performance. However, existing research on skill transfer predominantly focuses on skill evaluation through video analysis, addressing only a single facet of the multifaceted process required for skill acquisition. Furthermore, in the limited studies that generate expert comments, the learner’s skill level is predetermined, and the spatial-temporal information of human movement is often overlooked. To address these issues, we propose a novel approach that generates skill-level-aware expert comments by leveraging a Large Multimodal Model (LMM) and spatial-temporal motion features. Our method employs a Spatial-Temporal Attention Graph Convolutional Network (STA-GCN) to extract motion features that encapsulate the spatial-temporal dynamics of human movement and to classify skill levels from those features. The classified skill levels, the extracted motion features (intermediate features from the STA-GCN), and the original sports video are then fed into the LMM. This integration enables the generation of detailed, context-specific expert comments that offer actionable insights for performance improvement. Our contributions are twofold: (1) we incorporate skill level classification results as inputs to the LMM, ensuring that feedback is appropriately tailored to the learner’s skill level; and (2) we integrate motion features that capture spatial-temporal information into the LMM, enhancing its ability to generate feedback grounded in the learner’s specific actions. Experimental results demonstrate that the proposed method effectively generates expert comments, overcoming the limitations of existing methods and offering valuable guidance for athletes across various skill levels.
Human Action Recognition and Note Recognition: A Deep Learning Approach Using STA-GCN
Human action recognition (HAR) is a growing area of machine learning with a wide range of applications. One challenging aspect of HAR is recognizing human actions while playing music, further complicated by the need to recognize the musical notes being played. This paper proposes a deep learning-based method for simultaneous HAR and musical note recognition in music performances. We conducted experiments on performances on the Morin khuur, a traditional Mongolian instrument. The proposed method consists of two stages. First, we created a new dataset of Morin khuur performances, using motion capture systems and depth sensors to collect data that includes hand keypoints, instrument segmentation information, and detailed movement information. We then analyzed RGB images, depth images, and motion data to determine which type of data provides the most valuable features for recognizing actions and notes in music performances. The second stage utilizes a Spatial Temporal Attention Graph Convolutional Network (STA-GCN) to recognize musical notes as continuous gestures. The STA-GCN model is designed to learn the relationships between hand keypoints and instrument segmentation information, which are crucial for accurate recognition. Evaluation on our dataset demonstrates that our model outperforms the traditional ST-GCN model, achieving an accuracy of 81.4%.
Thermal infrared action recognition with two-stream shift Graph Convolutional Network
The extensive deployment of camera-based IoT devices in our society is heightening the vulnerability of citizens’ sensitive information and individual data privacy. In this context, thermal imaging techniques become essential for data desensitization, eliminating sensitive data to safeguard individual privacy. Thermal imaging can also play an important role in industry, where environments are often characterized by low resolution, high noise, and unclear object features. Moreover, existing works often process the entire video as a single entity, which results in suboptimal robustness because individual actions occurring at different times are overlooked. In this paper, we propose a lightweight, skeleton-based algorithm for action recognition in thermal infrared videos to address these issues. Our approach includes YOLOv7-tiny for target detection, AlphaPose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCN) for spatial-temporal feature extraction in action prediction. To overcome detection and pose-estimation challenges, we created the OQ35-human and OQ35-keypoint datasets for training. The proposed model further enhances robustness by using visible-spectrum data for GCN training, and we introduce a two-stream shift Graph Convolutional Network to improve action recognition accuracy. Our experimental results on a custom thermal infrared action dataset (InfAR-skeleton) demonstrate Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%; on the filtered kinetics-skeleton dataset, the algorithm achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal infrared action recognition thus protects individual privacy while meeting the requirements of action recognition.
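The shift idea behind a shift Graph Convolutional Network can be illustrated on the temporal axis: fixed channel shifts replace a learned temporal convolution, mixing context across frames at no parameter cost. The equal three-way channel split below is an illustrative choice, not the paper's configuration:

```python
import numpy as np

def temporal_shift(x):
    """x: (T, C) per-frame features. Shift the first third of channels one
    frame back in time, the second third one frame forward, and leave the
    rest in place; vacated slots are zero-padded."""
    T, C = x.shape
    out = x.copy()
    third = C // 3
    out[:-1, :third] = x[1:, :third]                     # look one frame ahead
    out[-1, :third] = 0.0
    out[1:, third:2 * third] = x[:-1, third:2 * third]   # look one frame back
    out[0, third:2 * third] = 0.0
    return out

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 frames, 3 channels
print(temporal_shift(x))
```

A pointwise layer applied after the shift can then combine past, present, and future information per frame, which is what makes shift-based GCNs lightweight.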