Catalogue Search | MBRL
25 result(s) for "MediaPipe Hand"
Sign Language Recognition Method Based on Palm Definition Model and Multiple Classification
by Karipzhanova, Ardak; Amangeldy, Nurzada; Kassymova, Akmaral
in Algorithms, Analysis, Deafness
2022
Technologies for pattern recognition are used in various fields. One of the most relevant and important directions is the use of pattern recognition, such as gesture recognition, in socially significant tasks: the development of real-time automatic sign language interpretation systems. More than 5% of the world's population (about 430 million people, including 34 million children) are deaf-mute and cannot always use the services of a live sign language interpreter. Almost 80% of people with a disabling hearing loss live in low- and middle-income countries. The development of low-cost automatic sign language interpretation systems, without expensive sensors or specialized cameras, would improve the lives of people with disabilities and contribute to their unhindered integration into society. To find an optimal solution to this problem, this article analyzes suitable gesture recognition methods in the context of their use in automatic gesture recognition systems, in order to determine the most suitable ones. From this analysis, an algorithm based on a palm definition model and linear models for recognizing the shapes of the numbers and letters of the Kazakh sign language is proposed. The advantage of the proposed algorithm is that it fully recognizes 41 of the 42 letters in the Kazakh sign alphabet; until now, only the Russian letters within the Kazakh alphabet had been recognized. In addition, a unified function has been integrated into our system to configure the frame depth map mode, which improved recognition performance and can be used to create a multimodal database of video data of gesture words for the gesture recognition system.
Journal Article
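The abstract above does not spell out the palm definition model, so as a rough, hedged illustration of the general pipeline it describes (per-frame landmark extraction followed by a linear model over normalized hand shapes), a minimal Python sketch using MediaPipe Hands and scikit-learn; the normalization scheme and the commented training call are assumptions, not the paper's method:

    # Sketch: hand-landmark features + a linear classifier for letter shapes.
    # The paper's actual palm definition model is not reproduced here.
    import mediapipe as mp
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

    def landmark_vector(rgb_frame):
        """Return a normalized 63-dim vector (21 landmarks x, y, z) or None."""
        result = hands.process(rgb_frame)
        if not result.multi_hand_landmarks:
            return None
        pts = np.array([[p.x, p.y, p.z]
                        for p in result.multi_hand_landmarks[0].landmark])
        pts -= pts[0]                            # wrist becomes the origin
        scale = np.linalg.norm(pts[9]) or 1.0    # middle-finger MCP sets scale
        return (pts / scale).flatten()

    # With X: (n_samples, 63) vectors and y: letter labels (hypothetical data),
    # a linear model could then be fit as:
    # clf = LogisticRegression(max_iter=1000).fit(X, y)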
IPN HandS: Efficient Annotation Tool and Dataset for Skeleton-Based Hand Gesture Recognition
by Takahashi, Hiroki; Olivares-Mercado, Jesus; Sanchez-Perez, Gabriel
in Annotations, Automation, Datasets
2025
Hand gesture recognition (HGR) heavily relies on high-quality annotated datasets. However, annotating hand landmarks in video sequences is a time-intensive challenge. In this work, we introduce IPN HandS, an enhanced version of our IPN Hand dataset, which now includes approximately 700,000 hand skeleton annotations and corrected gesture boundaries. To generate these annotations efficiently, we propose a novel annotation tool that combines automatic detection, inter-frame interpolation, copy–paste capabilities, and manual refinement. This tool significantly reduces annotation time from 70 min to just 27 min per video, allowing for the scalable and precise annotation of large datasets. We validate the advantages of the IPN HandS dataset by training a lightweight LSTM-based model using these annotations and comparing its performance against models trained with annotations from the widely used MediaPipe hand pose estimators. Our model achieves an accuracy that is 12% higher than the MediaPipe Hands model and 8% higher than the MediaPipe Holistic model. These results underscore the importance of annotation quality in training generalization and overall recognition performance. Both the IPN HandS dataset and the annotation tool will be released to support reproducible research and future work in HGR and related fields.
Journal Article
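The annotation tool itself has not yet been released, so as a hedged sketch of how its inter-frame interpolation step could plausibly work (linear interpolation of landmark positions between two manually annotated keyframes; shapes and names here are hypothetical):

    # Sketch: linearly interpolating hand-skeleton annotations between two
    # keyframes; one plausible reading of the tool's interpolation step.
    import numpy as np

    def interpolate_skeletons(kp_a, kp_b, n_between):
        """kp_a, kp_b: (21, 2) keyframe landmarks -> list of (21, 2) arrays."""
        frames = []
        for i in range(1, n_between + 1):
            t = i / (n_between + 1)          # fraction of the way from a to b
            frames.append((1 - t) * kp_a + t * kp_b)
        return frames

    a = np.zeros((21, 2)); b = np.ones((21, 2))
    mid = interpolate_skeletons(a, b, 3)[1]  # halfway point: all 0.5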
Personal identification using a cross-sectional hyperspectral image of a hand
2025
I explore hyperspectral imaging, a rapid and noninvasive technique with significant potential in biometrics and medical diagnosis. Personal identification was performed using cross-sectional hyperspectral images of palms, offering a simpler and more robust method than conventional vascular pattern identification methods.
I aim to demonstrate the potential of local cross-sectional hyperspectral palm images to identify individuals with high accuracy.
Hyperspectral imaging of palms, artificial intelligence (AI)-based region of interest (ROI) detection, feature vector extraction, and dimensionality reduction were utilized to validate personal identification accuracy using the area under the curve (AUC) and equal error rate (EER).
The feature vectors extracted by the proposed method demonstrated higher intra-cluster similarity when the clustering data were reduced through uniform manifold approximation and projection compared with principal component analysis and t-distributed stochastic neighbor embedding. A maximum AUC of 0.98 and an EER of 0.04% were observed.
I proposed a biometric method using cross-sectional hyperspectral imaging of human palms. The procedure includes AI-based ROI detection, feature extraction, dimension reduction, and intra- and inter-subject matching using Euclidean distances as a discriminant function. The proposed method has the potential to identify individuals with high accuracy.
Journal Article
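As a worked illustration of the reported metric (not the paper's code), the equal error rate is the operating point where the false-acceptance and false-rejection rates cross; a sketch with synthetic genuine/impostor Euclidean distances:

    # Sketch: computing EER from genuine/impostor distance scores
    # (synthetic data; illustrates the metric, not the paper's pipeline).
    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    genuine  = rng.normal(0.3, 0.1, 500)   # intra-subject distances (smaller)
    impostor = rng.normal(0.8, 0.1, 500)   # inter-subject distances (larger)

    # Similarity = negative distance, so larger means "same person".
    scores = np.concatenate([-genuine, -impostor])
    labels = np.concatenate([np.ones(500), np.zeros(500)])
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]   # point where FAR == FRR
    print(f"EER ~ {eer:.4f}")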
Video-Based Hand Movement Analysis of Parkinson Patients before and after Medication Using High-Frame-Rate Videos and MediaPipe
by Dafotakis, Manuel; Dill, Sebastian; Hoog Antink, Christoph
in Accelerometers, Accuracy, Artificial intelligence
2022
Tremor is one of the common symptoms of Parkinson’s disease (PD). Thanks to the recent evolution of digital technologies, monitoring of PD patients’ hand movements with contactless methods has gained momentum. Objective: We aimed to quantitatively assess hand movements in patients suffering from PD using the artificial intelligence (AI)-based hand-tracking technology of MediaPipe. Method: High-frame-rate videos and accelerometer data were recorded from 11 PD patients, two of whom showed classical Parkinsonian-type tremor. Video recordings were obtained in the OFF-state and again 30 minutes after the patients took their standard oral medication (ON-state). First, we investigated the frequency and amplitude relationship between the video and accelerometer data. Then, we focused on quantifying the effect of the standard oral treatment. Results: The data extracted from the video correlated well with the accelerometer-based measurement system. Our video-based approach identified the tremor frequency with a small error (mean absolute error 0.229 (±0.174) Hz) and the amplitude with a high correlation. The frequency and amplitude of hand movement differed before and after medication: patients showed a decrease in mean frequency from 2.012 (±1.385) Hz to 1.526 (±1.007) Hz and in mean amplitude from 8.167 (±15.687) a.u. to 4.033 (±5.671) a.u. Conclusions: Our work achieved automatic estimation of movement frequency, including the tremor frequency, with a low error rate; to the best of our knowledge, this is the first paper to present automated tremor analysis before/after medication in PD, in particular using high-frame-rate video data.
Journal Article
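The abstract does not give implementation details, but a common way to estimate a dominant movement frequency from a tracked landmark trajectory is spectral analysis; a hedged sketch using Welch's method (the frame rate and signal here are assumed, not the paper's data):

    # Sketch: dominant hand-movement frequency from a landmark trajectory
    # via Welch's method (a standard approach; not the paper's exact code).
    import numpy as np
    from scipy.signal import welch

    def dominant_frequency(y, fs):
        """y: 1-D landmark coordinate over time; fs: frame rate in Hz."""
        y = y - np.mean(y)                 # remove the DC offset
        freqs, psd = welch(y, fs=fs, nperseg=min(len(y), 512))
        return freqs[np.argmax(psd)]

    fs = 120.0                             # high-frame-rate video (assumed)
    t = np.arange(0, 10, 1 / fs)
    y = np.sin(2 * np.pi * 4.0 * t) + 0.1 * np.random.randn(t.size)
    print(dominant_frequency(y, fs))       # ~4.0 Hz for this synthetic tremor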
Signer-Independent Arabic Sign Language Recognition System Using Deep Learning Model
by Chowdhury, Muhammad E. H.; Ayari, Mohamed Arselene; Kadir, Muhammad Abdul
in Accuracy, Acknowledgment, Analysis
2023
Every one of us has a unique manner of communicating to explore the world, and such communication helps to interpret life. Sign language is the principal means of communication for hearing- and speech-disabled people, yet when a sign language user interacts with a non-signer, it becomes difficult for the signer to express themselves; a sign language recognition system can bridge this gap. This study presents a sign language recognition system capable of recognizing Arabic Sign Language from recorded RGB videos. Two datasets were considered: (1) a raw dataset and (2) a face–hand region-based segmented dataset produced from the raw dataset. Moreover, an operational layer-based multi-layer perceptron, “SelfMLP”, is proposed in this study to build CNN-LSTM-SelfMLP models for Arabic Sign Language recognition. MobileNetV2- and ResNet18-based CNN backbones and three SelfMLPs were used to construct six different CNN-LSTM-SelfMLP architectures for performance comparison. The study examined the signer-independent mode to reflect real-world application circumstances. MobileNetV2-LSTM-SelfMLP on the segmented dataset achieved the best accuracy of 87.69%, with 88.57% precision, 87.69% recall, 87.72% F1 score, and 99.75% specificity. Overall, face–hand region-based segmentation and the SelfMLP-infused MobileNetV2-LSTM-SelfMLP surpassed previous findings on Arabic Sign Language recognition by 10.97% in accuracy.
Journal Article
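The SelfMLP's operational layers are not detailed in the abstract, so as a hedged sketch of the general CNN-LSTM family the paper builds on, here is a MobileNetV2-backbone video classifier with a plain MLP head standing in for SelfMLP; all hyperparameters and shapes are illustrative assumptions:

    # Sketch: MobileNetV2 features per frame -> LSTM over time -> MLP head
    # (the MLP is a stand-in for the paper's SelfMLP). PyTorch.
    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v2

    class CnnLstmClassifier(nn.Module):
        def __init__(self, n_classes, hidden=256):
            super().__init__()
            self.cnn = mobilenet_v2(weights=None).features  # frame features
            self.pool = nn.AdaptiveAvgPool2d(1)             # spatial pooling
            self.lstm = nn.LSTM(1280, hidden, batch_first=True)
            self.head = nn.Sequential(                      # SelfMLP stand-in
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_classes))

        def forward(self, video):                           # (B, T, 3, H, W)
            b, t = video.shape[:2]
            x = self.cnn(video.flatten(0, 1))               # CNN on each frame
            x = self.pool(x).flatten(1).view(b, t, -1)      # (B, T, 1280)
            _, (h, _) = self.lstm(x)                        # last hidden state
            return self.head(h[-1])

    logits = CnnLstmClassifier(n_classes=50)(torch.randn(2, 8, 3, 224, 224))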
Evaluation of Commercial Camera-Based Solutions for Tracking Hand Kinematics
by Wolf, Evelynne; Kamper, Derek; Vogel, Christopher
in Adult, Biomechanical Phenomena - physiology, Cameras
2025
Tracking hand kinematics is essential for numerous clinical and scientific applications. Markerless motion capture devices have advantages over other modalities in terms of calibration, setup, and overall ease of use; however, their accuracy during dynamic tasks has not been fully explored. This study examined the performance of two popular markerless systems, the Leap Motion Controller 2 (LM2) and MediaPipe (MP), in capturing joint motion of the digits. Data were compared to joint motion collected from a marker-based multi-camera system (Vicon). Eleven participants performed six tasks with their dominant right hand at three movement speeds while all three devices simultaneously captured the position of hand landmarks. Using these data, digit joint angles were calculated. The root mean squared error (RMSE) and correlation coefficient (r) relative to the Vicon angles were computed for LM2 and MP. LM2 achieved a lower error (p < 0.001, mean RMSE = 14.8°) and a higher correlation (p = 0.007, mean r = 0.58) than the MP system (mean RMSE = 22.5°, mean r = 0.45). Greater movement speed led to significantly higher RMSE (p < 0.001) and lower r (p < 0.001) for MP but not for LM2. Error was substantially greater for the proximal interphalangeal joint than for other finger joints, although r values were higher for this joint. Overall, the LM2 and MP systems were able to capture motion at the joint level across digits for a variety of tasks in real time, although the level of error may not be acceptable for certain applications.
Journal Article
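A minimal sketch of the kind of joint-angle and RMSE computation this evaluation describes, assuming 3-D landmark positions are already available (the exact angle convention used by the study is an assumption here):

    # Sketch: a digit joint angle from three 3-D landmarks, plus RMSE
    # against a reference trace (illustrates the evaluation, not its code).
    import numpy as np

    def joint_angle(p_prox, p_joint, p_dist):
        """Angle at p_joint (degrees) between the two adjacent segments."""
        u = p_prox - p_joint
        v = p_dist - p_joint
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    def rmse(estimated, reference):
        diff = np.asarray(estimated) - np.asarray(reference)
        return np.sqrt(np.mean(diff ** 2))

    # Example: a right angle at a PIP joint (hypothetical points).
    print(joint_angle(np.array([0., 1, 0]), np.zeros(3),
                      np.array([1., 0, 0])))          # 90.0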
Biomimetic learning of hand gestures in a humanoid robot
by Karri, Bharat Kashyap; Kakoty, Nayan M.; Vinjamuri, Ramana
in bioinspired robots, biomimetic robots, hand kinematics
2024
Hand gestures are a natural and intuitive form of communication, and integrating this communication method into robotic systems presents significant potential to improve human-robot collaboration. Recent advances in motor neuroscience have focused on replicating human hand movements from synergies, also known as movement primitives. Synergies, the fundamental building blocks of movement, serve as a strategy adopted by the central nervous system to generate and control movements. Identifying how synergies contribute to movement can help in the dexterous control of robotics, exoskeletons, and prosthetics, and extends to applications in rehabilitation. In this paper, 33 static hand gestures were recorded through a single RGB camera and identified in real time through the MediaPipe framework as participants made various postures with their dominant hand. Assuming an open palm as the initial posture, uniform joint angular velocities were obtained for all these gestures. By applying a dimensionality reduction method, kinematic synergies were obtained from these joint angular velocities. Kinematic synergies that explain 98% of the variance of the movements were used to reconstruct new hand gestures via convex optimization. Reconstructed hand gestures and selected kinematic synergies were translated onto a humanoid robot, Mitra, in real time as the participants demonstrated various hand gestures. The results showed that using only a few kinematic synergies it is possible to generate various hand gestures with 95.7% accuracy. Furthermore, utilizing low-dimensional synergies to control high-dimensional end effectors holds promise for enabling near-natural human-robot collaboration.
Journal Article
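The abstract names only "a dimensionality reduction method"; PCA is the common choice for extracting kinematic synergies, so as a hedged sketch under that assumption (the data shape and DoF count are hypothetical):

    # Sketch: kinematic synergies from joint angular velocities via PCA,
    # keeping components that explain 98% of the movement variance,
    # then reconstructing joint-space motion from the low-dim weights.
    import numpy as np
    from sklearn.decomposition import PCA

    velocities = np.random.randn(500, 20)    # (samples, joint DoFs), assumed

    pca = PCA(n_components=0.98)             # retain 98% of the variance
    weights = pca.fit_transform(velocities)  # per-sample synergy activations
    print(pca.n_components_, "synergies retained")

    reconstructed = pca.inverse_transform(weights)  # back to joint space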
Lightweight real-time hand segmentation leveraging MediaPipe landmark detection
2023
Real-time hand segmentation is a key process in applications that require human–computer interaction, such as gesture recognition or augmented reality systems. However, the infinite shapes and orientations that hands can adopt, their variability in skin pigmentation and the self-occlusions that continuously appear in images make hand segmentation a truly complex problem, especially with uncontrolled lighting conditions and backgrounds. The development of robust, real-time hand segmentation algorithms is essential to achieve immersive augmented reality and mixed reality experiences by correctly interpreting collisions and occlusions. In this paper, we present a simple but powerful algorithm based on the MediaPipe Hands solution, a highly optimized neural network. The algorithm processes the landmarks provided by MediaPipe using morphological and logical operators to obtain the masks that allow dynamic updating of the skin color model. Different experiments were carried out comparing the influence of the color space on skin segmentation, with the CIELab color space chosen as the best option. An average intersection over union of 0.869 was achieved on the demanding Ego2Hands dataset running at 90 frames per second on a conventional computer without any hardware acceleration. Finally, the proposed segmentation procedure was implemented in an augmented reality application to add hand occlusion for improved user immersion. An open-source implementation of the algorithm is publicly available at https://github.com/itap-robotica-medica/lightweight-hand-segmentation.
Journal Article
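A compressed sketch of the general idea described above (seed a skin-color model from MediaPipe landmark pixels, then threshold in CIELab and clean up morphologically); this is not the authors' released implementation, which is linked in the entry, and the tolerance value is an assumption:

    # Sketch: landmark-seeded CIELab skin-color model for hand masking.
    import cv2
    import mediapipe as mp
    import numpy as np

    hands = mp.solutions.hands.Hands(max_num_hands=2)

    def hand_mask(bgr_frame, lab_tolerance=25.0):
        rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
        res = hands.process(rgb)
        if not res.multi_hand_landmarks:
            return np.zeros(bgr_frame.shape[:2], np.uint8)
        lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB).astype(np.float32)
        h, w = bgr_frame.shape[:2]
        samples = []                       # skin samples at landmark pixels
        for hand in res.multi_hand_landmarks:
            for p in hand.landmark:
                x, y = int(p.x * w), int(p.y * h)
                if 0 <= x < w and 0 <= y < h:
                    samples.append(lab[y, x])
        mean = np.mean(samples, axis=0)    # current skin-color estimate
        dist = np.linalg.norm(lab - mean, axis=2)
        mask = (dist < lab_tolerance).astype(np.uint8) * 255
        # Morphological opening removes small false-positive specks.
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                np.ones((5, 5), np.uint8))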
Detection of Rehabilitation Training Effect of Upper Limb Movement Disorder Based on MPL-CNN
2024
Stroke represents a medical emergency and can lead to movement disorders such as abnormal muscle tone, limited range of motion, or abnormalities in coordination and balance. To help stroke patients recover as soon as possible, rehabilitation training methods employ various movement modes, such as ordinary movements and joint reactions, to induce active responses in the limbs and gradually restore normal function. Rehabilitation effect evaluation helps physicians understand the rehabilitation needs of different patients, determine effective treatment methods and strategies, and improve treatment efficiency. To achieve real-time, accurate action detection, this article uses MediaPipe’s detection algorithms and proposes a model based on MPL-CNN. MediaPipe is used to identify keypoint features of the patient’s upper limbs and, simultaneously, keypoint features of the hand. To detect the effect of rehabilitation training for upper limb movement disorders, LSTM and CNN are combined into a new LSTM-CNN model, which identifies the action features of upper limb rehabilitation training extracted by MediaPipe. The MPL-CNN model can effectively assess the accuracy of rehabilitation movements during upper limb rehabilitation training for stroke patients. To ensure the scientific validity and unified standards of the rehabilitation training movements, this article employs the postures in the Fugl-Meyer Upper Limb Rehabilitation Training Functional Assessment Form (FMA) and establishes an FMA upper limb rehabilitation dataset for experimental verification. Experimental results show that, in each stage of the Fugl-Meyer upper limb rehabilitation training evaluation, the MPL-CNN-based method’s recognition accuracy on upper limb rehabilitation training actions reached 95%, while the average accuracy across the various upper limb rehabilitation training actions reached 97.54%. This shows that the model is highly robust across different action categories and that MPL-CNN is an effective and feasible solution. This MPL-CNN-based method can provide high-precision detection for evaluating the rehabilitation of upper limb movement disorders after stroke, helping clinicians evaluate a patient’s rehabilitation progress and adjust the rehabilitation plan based on the results. This will help improve the personalization and precision of rehabilitation treatment and promote patient recovery.
Journal Article
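The abstract says MediaPipe identifies upper-limb and hand keypoints simultaneously; MediaPipe Holistic is one way to do that in a single pass, so as a hedged sketch of the feature-extraction step (the paper's exact extraction setup is not specified):

    # Sketch: upper-limb + hand keypoints in one pass with MediaPipe Holistic
    # (one plausible front end for the MPL-CNN features; an assumption).
    import cv2
    import mediapipe as mp

    holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5)

    def upper_limb_features(bgr_frame):
        res = holistic.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
        feats = []
        if res.pose_landmarks:                 # shoulders, elbows, wrists...
            feats += [(p.x, p.y, p.z) for p in res.pose_landmarks.landmark]
        for hand in (res.left_hand_landmarks, res.right_hand_landmarks):
            if hand:                           # 21 landmarks per hand
                feats += [(p.x, p.y, p.z) for p in hand.landmark]
        return feats                           # one feature row per frame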
Light-Weight Deep Learning Techniques with Advanced Processing for Real-Time Hand Gesture Recognition
2022
In the discipline of hand gesture and dynamic sign language recognition, deep learning approaches with high computational complexity and large numbers of parameters have achieved remarkable success. However, deploying sign language recognition applications on mobile phones with restricted storage and computing capacity is greatly constrained by those limited resources. In light of this situation, we propose lightweight deep neural networks with advanced processing for real-time dynamic sign language recognition (DSLR). This paper presents a DSLR application to narrow the gap between hearing-impaired communities and the rest of society. The DSLR application was developed using two robust deep learning models, a GRU and a 1D CNN, combined with the MediaPipe framework. The authors implement advanced processing to solve most DSLR problems, especially in real-time detection, e.g., differences in depth and location. The solution consists of three main parts. First, the input dataset is preprocessed with our algorithm to standardize the number of frames. Then, the MediaPipe framework extracts hand and pose landmarks (features) to detect and locate them. Finally, after unifying the depth and location of the body, the features are passed to the models to recognize the dynamic sign accurately. To support this, the authors built a new American video-based sign dataset, named DSL-46, containing 46 daily-use signs recorded with all the details and properties needed. The experimental results show that the presented method can recognize dynamic signs extremely quickly and accurately, even in real-time detection. The DSLR reaches an accuracy of 98.8%, 99.84%, and 88.40% on the DSL-46, LSA64, and LIBRAS-BSL datasets, respectively.
Journal Article
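The first preprocessing step above standardizes the number of frames per clip; the paper's own algorithm is not detailed, but a common approach is uniform temporal sampling, sketched here with hypothetical shapes:

    # Sketch: standardizing a variable-length clip to a fixed frame count
    # by uniform temporal sampling (an assumption, not the paper's code).
    import numpy as np

    def standardize_frames(frames, target=30):
        """frames: (n_frames, feat_dim) array -> (target, feat_dim)."""
        frames = np.asarray(frames)
        idx = np.linspace(0, len(frames) - 1, num=target)
        return frames[np.round(idx).astype(int)]

    clip = np.random.randn(47, 63)          # 47 frames of 63-dim landmarks
    print(standardize_frames(clip).shape)   # (30, 63)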