Search Results
44 result(s) for "fundamental frequency extraction"
A Study on the Utilization of Polyphonic Folk Songs in College Choral Teaching in the Context of Big Data
Introducing polyphonic folk songs into the choral teaching classroom in colleges and universities not only helps to carry forward excellent traditional culture but also promotes students' emotional education and their physical and mental health. This paper analyzes the status quo and difficulties of using polyphonic folk songs in college choral teaching and proposes, as countermeasures, a music choral intelligent tutoring system (ITS) model and a spectral tone recognition method. A collaborative filtering algorithm is used to construct the learning-resource recommendation component of the system, which is built on the ITS architecture. The YIN algorithm is used to extract the fundamental frequency from the audio, and pitch template matching is accomplished with an improved DTW algorithm to complete the choral ability assessment. In the fundamental frequency extraction stage, an improved normal harmonic algorithm is proposed to preprocess the audio signal and achieve spectral tone recognition for multi-part folk songs. Teaching practice with polyphonic folk song chorus was carried out at a comprehensive university in Sichuan Province, China. After the experiment, students' scores in the dimensions of learning interest and learning confidence increased by 0.976 and 0.698 on average, with all p-values below 0.05, indicating significant differences.
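The YIN fundamental-frequency extraction mentioned in this abstract can be sketched for a single frame as follows; this is a minimal illustration of the published difference-function method, with an assumed threshold and search range, and without the parabolic interpolation a production implementation would add:

```python
import numpy as np

def yin_f0(x, sr, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Single-frame YIN pitch estimate (difference-function method);
    fmin/fmax/threshold are typical defaults, not values from the paper."""
    x = np.asarray(x, dtype=float)
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    # Step 1: difference function d(tau) = sum_j (x[j] - x[j+tau])^2
    d = np.array([np.sum((x[:-tau] - x[tau:]) ** 2)
                  for tau in range(1, tau_max + 1)])
    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = d * np.arange(1, tau_max + 1) / np.cumsum(d)
    # Step 3: first lag below threshold, then descend to the local minimum
    below = np.where(cmnd[tau_min:] < threshold)[0]
    i = below[0] + tau_min if below.size else int(np.argmin(cmnd[tau_min:])) + tau_min
    while i + 1 < tau_max and cmnd[i + 1] < cmnd[i]:
        i += 1
    return sr / (i + 1)  # index i corresponds to lag tau = i + 1
```

On a clean periodic signal this recovers the pitch to within a sample-quantized lag; the paper's choral-assessment pipeline would run this per frame before DTW template matching.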
Position and speed estimation for high speed permanent magnet synchronous motors using wideband synchronous fundamental-frequency extraction filters
Non-ideal factors such as the inherent chattering of a sliding mode observer (SMO), control algorithm delay, and the dead-time effect give rise to harmonic errors in position estimation. To improve the position observation performance of high-speed permanent magnet synchronous motors (HSPMSMs), a wideband synchronous fundamental-frequency extraction filter (WSFEF) is proposed. On this basis, a novel signal processing method consisting of a WSFEF-PLL is applied to extract the fundamental-frequency signal of the estimated back electromotive force (EMF). The phase-locked loop (PLL) makes the resonance frequency of the WSFEF adaptive, which is essential for variable-speed operation in sensorless HSPMSM drive systems. Using the WSFEF-PLL in an SMO-based position estimator, the rotor position estimation error caused by harmonics in the back EMF can be effectively eliminated, improving the accuracy and dynamic performance of rotor position estimation. Simulations and experiments verify the feasibility and effectiveness of the method.
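The paper's WSFEF design is not reproduced here, but the underlying idea of a resonant filter that passes the fundamental and rejects harmonics can be illustrated with a second-order generalized integrator (SOGI), a standard building block for adaptive fundamental extraction; the gain and the forward-Euler integration below are illustrative assumptions, not the authors' design:

```python
import numpy as np

def sogi_filter(u, w, dt, k=0.5):
    """Second-order generalized integrator (SOGI) band-pass: passes the
    component of u at the resonance w (rad/s) with unity gain and zero
    phase while attenuating other harmonics. k = 0.5 trades slower
    settling for stronger harmonic rejection (illustrative choice)."""
    v = qv = 0.0
    out = np.empty(len(u))
    for i, ui in enumerate(u):
        dv = w * (k * (ui - v) - qv)   # band-pass state
        dqv = w * v                    # quadrature state
        v += dv * dt
        qv += dqv * dt
        out[i] = v
    return out
```

In a PLL-driven arrangement like the WSFEF-PLL described above, `w` would be updated each step from the PLL's speed estimate, making the resonance track the variable electrical frequency.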
High speed directional relaying algorithm based on the fundamental frequency positive sequence superimposed components
Conventional travelling-wave protection is susceptible to high-frequency transient quantities, which can reduce protection reliability. A directional-relay algorithm based on positive-sequence superimposed components is proposed in this study. In microprocessor relays, the extraction of fundamental-frequency voltages and currents is conventionally provided by phasor estimation methods such as the Fourier algorithm, and the time required for fault detection is approximately one to two cycles. This study presents a high-speed algorithm, based on the Park transformation, for extracting the superimposed components of the fundamental-frequency positive-sequence voltages and currents. In addition, specially designed high-speed algorithms are proposed to address practical problems that arise in real systems. Extensive simulations show that the proposed algorithm is fast and reliable for power transmission line protection, and that it is insensitive to fault resistance and system conditions.
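The key property the abstract relies on is that, under the Park transformation, a fundamental-frequency positive-sequence set becomes a DC quantity in the rotating d-q frame, so its superimposed component can be separated quickly. A minimal sketch of the transform (one common amplitude-invariant convention; the paper's exact formulation may differ):

```python
import numpy as np

def park_abc_to_dq(a, b, c, theta):
    """Park transformation: project three-phase quantities onto a d-q
    frame rotating at angle theta. A balanced fundamental-frequency
    positive-sequence set maps to constant (DC) d-q values, which a
    relay can then track with a simple averaging/low-pass stage."""
    two_thirds = 2.0 / 3.0
    d = two_thirds * (a * np.cos(theta)
                      + b * np.cos(theta - 2 * np.pi / 3)
                      + c * np.cos(theta + 2 * np.pi / 3))
    q = -two_thirds * (a * np.sin(theta)
                       + b * np.sin(theta - 2 * np.pi / 3)
                       + c * np.sin(theta + 2 * np.pi / 3))
    return d, q
```

Because the positive-sequence fundamental is constant in this frame, detecting its superimposed change does not require a full Fourier window, which is the basis for the sub-cycle speed claimed above.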
Quantification of speech and synchrony in the conversation of adults with autism spectrum disorder
Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder characterized by impairments in social reciprocity and communication together with restricted interests and stereotyped behaviors. The Autism Diagnostic Observation Schedule (ADOS) is considered a 'gold standard' instrument for diagnosis of ASD and depends mainly on subjective assessments made by trained clinicians. To develop a quantitative and objective surrogate marker for ASD symptoms, we investigated speech features including F0, speech rate, speaking time, and turn-taking gaps, extracted from footage recorded during a semi-structured socially interactive situation from the ADOS. We calculated statistics not only over a whole session of the ADOS activity but also in a block analysis, computing statistics of the prosodic features in each 8 s sliding window. The block analysis identified whether participants changed volume or pitch according to the flow of the conversation. We also measured the synchrony between the participant and the ADOS administrator. Participants with high-functioning ASD showed significantly longer turn-taking gaps, a greater proportion of pause time, less variability, and less synchronous changes in the blockwise mean of intensity compared with participants with typical development (TD) (p<0.05 corrected). In addition, the ASD group had a significantly wider distribution than the TD group in the within-participant variability of the blockwise mean of log F0 (p<0.05 corrected). The clinical diagnosis could be discriminated using the speech features with 89% accuracy. The features of turn-taking and pausing were significantly correlated with deficits of ASD in reciprocity (p<0.05 corrected). Additionally, regression analysis yielded a mean absolute error of 1.35 in the prediction of deficits in reciprocity, to which the synchrony of intensity especially contributed. The findings suggest that the variance of speech features and the interaction and synchrony with the conversation partner are critical to characterizing atypical features in the conversation of people with ASD.
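The turn-taking gap feature used in this study can be illustrated with a small sketch; the input format (speaker-labelled start/end times) and the rule that only speaker changes count are my assumptions for the example, not the authors' exact definition:

```python
def turn_taking_gaps(turns):
    """Gaps (seconds) between consecutive turns of different speakers.
    turns: list of (speaker, start, end) tuples sorted by start time.
    A positive gap is silence before the next speaker starts;
    a negative gap indicates overlapping speech."""
    gaps = []
    for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
        if spk_a != spk_b:  # same-speaker pauses are not turn-taking gaps
            gaps.append(start_b - end_a)
    return gaps
```

Session-level statistics (mean, variance) of such gaps, alongside pause proportion and the blockwise intensity/F0 measures, would then feed the group comparisons described above.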
Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm
In recent years, Speech Emotion Recognition (SER) has received considerable attention in the field of affective computing. In this paper, an improved SER system is proposed. In the feature extraction step, a hybrid high-dimensional rich feature vector is extracted from both the speech signal and the glottal-waveform signal using techniques such as MFCC, PLPC, and MVDR. Prosodic features derived from the fundamental frequency (f0) contour are also added to this feature vector. The proposed system is based on a holistic approach that employs a modified quantum-behaved particle swarm optimization (QPSO) algorithm (called pQPSO) to estimate both the optimal projection matrix for feature-vector dimension reduction and the Gaussian Mixture Model (GMM) classifier parameters. Since the problem parameters lie in a limited range while the standard QPSO algorithm searches an infinite range, the QPSO is modified to use a truncated probability distribution, making the search more efficient. The system works in real time and is evaluated on three standard emotional speech databases: the Berlin Database of Emotional Speech (EMO-DB), Surrey Audio-Visual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The proposed method improves the accuracy of the SER system compared to classical methods such as FA, PCA, PPCA, LDA, standard QPSO, wQPSO, and deep neural networks, and also outperforms many recent state-of-the-art approaches that use the same datasets.
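The truncation idea can be sketched on the standard QPSO position update. The paper's pQPSO derives a truncated distribution analytically; the rejection-sampling realization below is only one simple way to confine the draw to the feasible range, and all parameter values are illustrative:

```python
import numpy as np

def qpso_step(x, pbest_i, mbest, beta, lo, hi, rng):
    """One QPSO position update with the draw truncated to [lo, hi].
    Standard QPSO: x' = p +/- beta * |mbest - x| * ln(1/u), u ~ U(0,1).
    Here u is redrawn (rejection sampling) until x' is in bounds, which
    realizes a truncated search distribution (illustrative, not pQPSO)."""
    phi = rng.random(x.shape)
    p = phi * pbest_i + (1 - phi) * mbest      # local attractor
    spread = beta * np.abs(mbest - x)
    x_new = np.empty_like(x)
    for j in range(x.size):
        for _ in range(100):                   # cap rejection attempts
            u = rng.random()
            sign = 1.0 if rng.random() < 0.5 else -1.0
            cand = p[j] + sign * spread[j] * np.log(1.0 / u)
            if lo <= cand <= hi:
                x_new[j] = cand
                break
        else:
            x_new[j] = np.clip(p[j], lo, hi)   # fall back to the attractor
    return x_new
```

The benefit mirrored here is the one the abstract claims: no search effort is wasted on candidate projections or GMM parameters outside their valid range.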
Exploring Prosodic Features Modelling for Secondary Emotions Needed for Empathetic Speech Synthesis
A low-resource emotional speech synthesis system for empathetic speech, based on modelling prosody features, is presented here. Secondary emotions, identified as needed for empathetic speech, are modelled and synthesised in this investigation. As secondary emotions are subtle in nature, they are more difficult to model than primary emotions, and this study is one of the few to model them in speech, as they have not been extensively studied so far. Current speech synthesis research uses large databases and deep learning techniques to develop emotion models. There are many secondary emotions, so developing a large database for each of them is expensive. Hence, this research presents a proof of concept using handcrafted feature extraction and a low-resource-intensive machine learning approach to model these features, thus creating synthetic speech with secondary emotions. A quantitative-model-based transformation is used to shape the emotional speech's fundamental frequency contour, while speech rate and mean intensity are modelled via rule-based approaches. Using these models, an emotional text-to-speech synthesis system is developed to synthesise five secondary emotions: anxious, apologetic, confident, enthusiastic, and worried. A perception test to evaluate the synthesised emotional speech was also conducted; participants could identify the correct emotion in a forced-response test with a hit rate greater than 65%.
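The general shape of such rule-based prosody modification can be sketched as below. The specific rules and parameter values are invented for illustration and are not the paper's models:

```python
import numpy as np

def apply_prosody_rules(f0, mean_shift=1.1, range_scale=1.3, rate_scale=0.9):
    """Illustrative rule-based prosody transform (parameter values are
    made up, not the paper's): shift the mean F0, expand its range, and
    change speech rate by resampling the contour along time. Unvoiced
    frames (f0 == 0) are kept at zero; note that resampling a contour
    with mixed voiced/unvoiced frames would blend across boundaries."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    mu = f0[voiced].mean()
    out = np.where(voiced, (f0 - mu) * range_scale + mu * mean_shift, 0.0)
    # Rate change: stretch or compress the time axis of the contour
    n_new = int(round(len(out) / rate_scale))
    idx = np.linspace(0, len(out) - 1, n_new)
    return np.interp(idx, np.arange(len(out)), out)
```

A quantitative emotion model, as in the study, would choose these scaling factors per target emotion rather than hard-coding them.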
Biomarker confirmed reference values for acoustic voice features: findings from the Framingham Heart Study
Background Interest in acoustic voice features as digital biomarkers of underlying Alzheimer's disease (AD) has been increasing. However, the lack of confirmation of AD specificity and of reference values or normative information, particularly in relation to AD-specific biomarkers, greatly limits the ability to determine clinically meaningful measurement thresholds. We present preliminary normative values of acoustic voice features for those who are positron emission tomography (PET) beta-amyloid positive (Aß+) and negative (Aß-). Method This study included 268 cognitively unimpaired participants (mean age 57.2 ± 9.9 years; 50.4% female) from the Framingham Heart Study Brain Aging Program who had voice recordings of neuropsychological assessments obtained within one year before amyloid PET imaging. Sixty-five acoustic features (i.e., prosodic, spectral, and sound quality voice features) were extracted from recordings of the Wechsler Memory Scale Logical Memory Delayed Recall tests using the open-source Speech and Music Interpretation by Large-space Extraction (openSMILE) toolkit. Reference values were established at the 2.5th, 25th, 50th, 75th, and 97.5th percentiles for each acoustic feature within the entire sample and within the amyloid-positive (Aß+) and amyloid-negative (Aß-) groups. Differences between the Aß+ and Aß- groups were evaluated using Mann-Whitney U tests. Result Of the 268 participants, 30 (11%) were Aß+. Reference values for all 65 acoustic features were established across all percentile thresholds within the whole sample and within the Aß+ and Aß- groups (see Table). Four acoustic features differed between the Aß+ and Aß- groups: voicingFinalUnclipped (P = 0.03), pcm_fftMag_spectralKurtosis (P = 0.04), MFCC[5] (P = 0.02), and MFCC[10] (P = 0.03). Three of these had higher median values in the Aß+ group. As a sound quality measure, voicingFinalUnclipped indicates the voicing probability of the final fundamental frequency candidate without zero-clipping. pcm_fftMag_spectralKurtosis represents the kurtosis of the FFT magnitude spectrum. MFCCs reflect the power spectrum of a sound and are mathematical representations of essential human speech characteristics. Conclusion These results suggest acoustic features may be an effective marker for preclinical AD screening among older adults who are Aß+. Future studies should stratify by biomarker status to refine reference values and extend this work to more diverse populations.
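The two statistical building blocks of this study, percentile reference values and the Mann-Whitney U comparison, are straightforward to reproduce. The sketch below uses the same percentile grid as the abstract; the U statistic is computed directly from pairwise comparisons (in practice one would use scipy.stats.mannwhitneyu, which also returns the p-value):

```python
import numpy as np

def reference_values(values, percentiles=(2.5, 25, 50, 75, 97.5)):
    """Percentile-based reference values for one acoustic feature."""
    return dict(zip(percentiles, np.percentile(values, percentiles)))

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for two independent groups, counting
    ties as 1/2 (p-values would come from scipy.stats.mannwhitneyu)."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return greater + 0.5 * ties
```

Applied per feature to the Aß+ and Aß- subsamples, this yields exactly the kind of reference table and group-difference tests reported above.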
Harmonics Signal Feature Extraction Techniques: A Review
Harmonic estimation is essential for mitigating or suppressing harmonic distortions in power systems. Beyond estimation itself, spectrum analysis, waveform estimation, harmonic source classification, source location, the determination of harmonic source contributions, data clustering, and filter-based harmonic elimination capacity are also considered. Feature extraction is a fundamental component of the optimization that improves the effectiveness of harmonic mitigation methods. In this study, techniques for extracting fundamental frequencies and harmonics in the frequency, time, and spatial domains are reviewed across 67 publications and assessed overall. Combinations of signal processing with artificial intelligence (AI) techniques are also reviewed and evaluated. The benefit of feature extraction methods is that they distill the essential information from sensor feedback signals while removing redundancy, improving the efficiency of subsequent processing stages. This study provides an overview of fundamental frequency and harmonic extraction methods of recent years, together with an analysis of their advantages and limitations.
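The simplest frequency-domain technique in this family, FFT-based estimation of the fundamental and its harmonics, can be sketched as follows; the nearest-bin lookup assumes the analysis window covers an integer number of fundamental cycles (otherwise windowing and interpolation are needed):

```python
import numpy as np

def harmonic_magnitudes(x, fs, f1, n_harmonics=5):
    """Estimate the magnitudes of fundamental f1 (Hz) and its harmonics
    from the FFT of a signal sampled at fs (Hz). Assumes the record
    length spans an integer number of fundamental cycles, so each
    harmonic falls on (or very near) an FFT bin."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) * 2 / n      # amplitude-scaled spectrum
    freqs = np.fft.rfftfreq(n, 1 / fs)
    mags = []
    for h in range(1, n_harmonics + 1):
        k = int(np.argmin(np.abs(freqs - h * f1)))  # nearest FFT bin
        mags.append(spec[k])
    return mags
```

The reviewed literature largely concerns what replaces this baseline when the assumptions fail: inter-harmonics, frequency drift, and noisy measurements, which is where the AI-assisted methods come in.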
Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features
This work attempts to recognize emotions from human speech using prosodic information represented by variations in duration, energy, and fundamental frequency (F0) values. For this, the speech signal is first automatically segmented into syllables. Prosodic features at the utterance level (15 features) and syllable level (10 features) are extracted using the syllable boundaries and trained separately using deep neural network classifiers. The effectiveness of the proposed approach is demonstrated on the German speech corpus EMOTional Sensitivity ASistance System (EmotAsS) for people with disabilities, the dataset used for the Interspeech 2018 Atypical Affect Sub-Challenge. The initial set of prosodic features yields an unweighted average recall (UAR) of 30.15% on evaluation. A fusion of the decision scores of these features with spectral features gives a UAR of 36.71%. This paper also employs an attention mechanism and feature selection using resampling-based recursive feature elimination (RFE) to enhance system performance. Implementing attention and feature selection followed by score-level fusion improves the UAR to 36.83% and 40.96% for prosodic features and the overall fusion, respectively. Fusing the scores of the best individual system of the Atypical Affect Sub-Challenge with the proposed system provides a UAR (43.71%) above the best reported test result. The effectiveness of the proposed system has also been demonstrated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database with a UAR of 63.83%.
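The score-level fusion this abstract relies on can be sketched generically; the equal-weight averaging below is an assumption for illustration (the paper may weight or calibrate the per-system scores differently):

```python
import numpy as np

def score_fusion(score_sets, weights=None):
    """Score-level fusion: weighted average of per-class score matrices
    (utterances x classes) from several classifiers, followed by an
    argmax decision. Equal weights are used when none are given."""
    score_sets = [np.asarray(s, dtype=float) for s in score_sets]
    if weights is None:
        weights = np.ones(len(score_sets)) / len(score_sets)
    fused = sum(w * s for w, s in zip(weights, score_sets))
    return fused.argmax(axis=1)
```

Fusing prosodic and spectral score matrices this way is what lifts the UAR from 30.15% (prosodic alone) to 36.71% in the results above.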