Catalogue Search | MBRL
Explore the vast range of titles available.
9,749 result(s) for "speech patterns"
Gender Differences in Speech Temporal Patterns Detected Using Lagged Co-occurrence Text-Analysis of Personal Narratives
by
Cohen, Shuki J.
in
Averages
,
Behavioral Science and Psychology
,
Biological and medical sciences
2009
This paper describes a novel methodology for the detection of speech patterns. Lagged co-occurrence analysis (LCA) utilizes the likelihood that a target word will be uttered in a certain position after a trigger word. Using this methodology, it is possible to uncover statistically significant repetitive temporal patterns of word use, compared with a random choice of words. To demonstrate this new tool on autobiographical narratives, 200 subjects each related a 5-min story; these stories were transcribed and subjected to LCA using software written by the author. This study focuses on establishing the usefulness of LCA in psychological research by examining its associations with gender. The application of LCA to the corpus of personal narratives revealed significant differences in the temporal patterns of using the word “I” between male and female speakers. This finding is particularly demonstrative of the potential for studying speech temporal patterns using LCA, as men and women tend to utter the pronoun “I” in comparable frequencies. Specifically, LCA of the personal narratives showed that, on average, men tended to have shorter intervals between uses of the pronoun, while women spoke longer between two subsequent utterances of it. The results of this study are discussed in light of psycholinguistic factors governing male and female speech communities.
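The lag-counting idea at the heart of LCA can be illustrated with a short sketch. This is a simplified stand-in for the author's software, which the abstract does not describe in detail; tokenization and the significance test against chance are omitted:

```python
from collections import Counter

def lag_distribution(tokens, trigger, target, max_lag=10):
    """Count how often `target` occurs exactly k positions after `trigger`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == trigger:
            for k in range(1, max_lag + 1):
                if i + k < len(tokens) and tokens[i + k] == target:
                    counts[k] += 1
    return counts

# Toy narrative: at what lags does "then" tend to follow "i"?
story = "i went home and then i slept and then i woke".split()
print(lag_distribution(story, "i", "then"))
```

In a full analysis, the observed lag distribution would be compared against the distribution expected under random word order to establish statistical significance.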
Journal Article
Speech pattern disorders in verbally fluent individuals with autism spectrum disorder: a machine learning analysis
2025
Diagnosing Autism Spectrum Disorder (ASD) in verbally fluent individuals based on speech patterns in examiner-patient dialogues is challenging because speech-related symptoms are often subtle and heterogeneous. This study aimed to identify distinctive speech characteristics associated with ASD by analyzing recorded dialogues from the Autism Diagnostic Observation Schedule (ADOS-2).
We analyzed examiner-participant dialogues from ADOS-2 Module 4 and extracted 40 speech-related features categorized into intonation, volume, rate, pauses, spectral characteristics, chroma, and duration. These acoustic and prosodic features were processed using advanced speech analysis tools and used to train machine learning models to classify ASD participants into two subgroups: those with and without A2-defined speech pattern abnormalities. Model performance was evaluated using cross-validation and standard classification metrics.
Using all 40 features, the support vector machine (SVM) achieved an F1-score of 84.49%. After removing Mel-Frequency Cepstral Coefficients (MFCC) and Chroma features to focus on prosodic, rhythmic, energy, and selected spectral features aligned with ADOS-2 A2 scores, performance improved, achieving 85.77% accuracy and an F1-score of 86.27%. Spectral spread and spectral centroid emerged as key features in the reduced set, while MFCC 6 and Chroma 4 also contributed significantly in the full feature set.
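The F1-scores quoted above combine precision and recall. As a reminder of the metric (a generic sketch, not the study's own evaluation code), it can be computed from raw confusion-matrix counts:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 true positives, 15 false positives, 11 false negatives
print(f1_score(80, 15, 11))
```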
These findings demonstrate that a compact, diverse set of non-MFCC and selected spectral features effectively characterizes speech abnormalities in verbally fluent individuals with ASD. The approach highlights the potential of context-aware, data-driven models to complement clinical assessments and enhance understanding of speech-related manifestations in ASD.
Journal Article
Using Acoustic Speech Patterns From Smartphones to Investigate Mood Disorders: Scoping Review
2021
Mood disorders are commonly underrecognized and undertreated, as diagnosis relies on self-reporting and clinical assessments that are often not timely. The speech characteristics of those with mood disorders differ from those of healthy individuals. With the wide use of smartphones and the emergence of machine learning approaches, smartphones can be used to monitor speech patterns to help diagnose and monitor mood disorders.
The aim of this review is to synthesize research on using speech patterns from smartphones to diagnose and monitor mood disorders.
Literature searches of major databases, Medline, PsycInfo, EMBASE, and CINAHL, initially identified 832 relevant articles using the search terms "mood disorders", "smartphone", "voice analysis", and their variants. Only 13 studies met inclusion criteria: use of a smartphone for capturing voice data, focus on diagnosing or monitoring a mood disorder(s), clinical populations recruited prospectively, and in the English language only. Articles were assessed by 2 reviewers, and data extracted included data type, classifiers used, methods of capture, and study results. Studies were analyzed using a narrative synthesis approach.
Studies showed that voice data alone had reasonable accuracy in predicting mood states and mood fluctuations based on objectively monitored speech patterns. While a fusion of different sensor modalities revealed the highest accuracy (97.4%), nearly 80% of included studies were pilot trials or feasibility studies without control groups and had small sample sizes ranging from 1 to 73 participants. Studies were also carried out over short or varying timeframes and had significant heterogeneity of methods in terms of the types of audio data captured, environmental contexts, classifiers, and measures to control for privacy and ambient noise.
Approaches that allow smartphone-based monitoring of speech patterns in mood disorders are growing rapidly. The current body of evidence supports the value of speech patterns for monitoring, classifying, and predicting mood states in real time. However, many challenges remain around the robustness, cost-effectiveness, and acceptability of such an approach; further work is required to build on current research, reduce the heterogeneity of methodologies, and clinically evaluate the benefits and risks of such approaches.
Journal Article
Automatic speech patterns recognition of commands using SVM and PSO
by
Duarte Lopes de Oliveira
,
Santos Silva, Washington Luis
,
Saotome, Osamu
in
Acknowledgment
,
Algorithms
,
Automatic speech recognition
2019
This paper proposes an Automatic Speech Recognition (ASR) process based on extracting Mel-Frequency Cepstral Coefficients (MFCCs) from voice-command signals, applying the Discrete Cosine Transform (DCT) to these coefficients, and training a Support Vector Machine (SVM) optimized by the Particle Swarm Optimization (PSO) technique to speed up the whole process, with One-Against-All (OAA) multiclass SVM classification. The main contribution lies in the training phase, which combines the SVM with the PSO algorithm, reducing computational load and processing time. This novel algorithm, referred to here as the PSO-SVM hybrid training approach, is demonstrated experimentally on voice commands in the Brazilian Portuguese language. The commands comprise 10 isolated digits (zero to nine) and 20 action commands such as “go ahead”, “finish”, and “pause”; that is, there are 30 different pattern types (classes) to be separated (recognized). The process is speaker-independent; that is, the voice bank used in training differs from the one used in tests. The tests yielded success rates of 92% to 99% for the classifier using an RBF kernel function. The comparison section shows that this technique is 25 times faster than recognition without optimization and improves the recognition success rate by 10% compared with the well-known Gaussian Mixture Model (GMM) algorithm. In addition, the proposed algorithm can be applied on any data processing board for voice signals (DSP, FPGA, DSPIC, ...).
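The PSO component can be sketched generically. In the sketch below, a toy quadratic objective stands in for the SVM training criterion, since the abstract does not reproduce the paper's exact objective or parameter encoding; the inertia and acceleration constants are common textbook defaults, not the paper's values:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimal Particle Swarm Optimization over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # swarm's best position
    w, c1, c2 = 0.7, 1.5, 1.5                       # inertia, cognitive, social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in objective: a 2-D sphere function (e.g. two SVM hyperparameters)
best, val = pso_minimize(lambda x: sum(t * t for t in x), dim=2)
print(best, val)
```

In the paper's setting, the objective evaluated per particle would be the SVM training cost, so that PSO steers the training toward good parameters faster than unoptimized search.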
Journal Article
Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques
2024
Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumination, particularly for improving the performance of automatic lip-reading. The variations due to lighting conditions were quantitatively analyzed for optical, IR, and depth images. Then, deep neural network (DNN)-based lip-reading rules were built for each image modality. Speech recognition techniques based on IR or depth imaging required an additional light source that emitted light in the IR range, along with a special camera. To mitigate this problem, we propose a method that does not use an IR/depth image directly, but instead estimates images based on the optical RGB image. To this end, a modified U-net was adopted to estimate the IR/depth image from an optical RGB image. The results show that the IR and depth images were rarely affected by the lighting conditions. The recognition rates for the optical, IR, and depth images were 48.29%, 95.76%, and 92.34%, respectively, under various lighting conditions. Using the estimated IR and depth images, the recognition rates were 89.35% and 80.42%, respectively. This was significantly higher than for the optical RGB images.
Journal Article
Speech Features as Predictors of Momentary Depression Severity in Patients With Depressive Disorder Undergoing Sleep Deprivation Therapy: Ambulatory Assessment Pilot Study
by
Zillich, Lea
,
Limberger, Matthias F
,
Schultz, Tanja
in
Affect (Psychology)
,
Ambulatory assessment
,
Antidepressants
2024
The use of mobile devices to continuously monitor objectively extracted parameters of depressive symptomatology is seen as an important step in the understanding and prevention of upcoming depressive episodes. Speech features such as pitch variability, speech pauses, and speech rate are promising indicators, but empirical evidence is limited, given the variability of study designs.
Previous research studies have found different speech patterns when comparing single speech recordings between patients and healthy controls, but only a few studies have used repeated assessments to compare depressive and nondepressive episodes within the same patient. To our knowledge, no study has used a series of measurements within patients with depression (eg, intensive longitudinal data) to model the dynamic ebb and flow of subjectively reported depression and concomitant speech samples. However, such data are indispensable for detecting and ultimately preventing upcoming episodes.
In this study, we captured voice samples and momentary affect ratings over the course of 3 weeks in a sample of patients (N=30) with an acute depressive episode receiving inpatient care. Patients underwent sleep deprivation therapy, a chronotherapeutic intervention that can rapidly improve depression symptomatology. We hypothesized that within-person variability in depressive and affective momentary states would be reflected in the following 3 speech features: pitch variability, speech pauses, and speech rate. We parametrized them using the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) from open-source Speech and Music Interpretation by Large-Space Extraction (openSMILE; audEERING GmbH) and extracted them from a transcript. We analyzed the speech features along with self-reported momentary affect ratings, using multilevel linear regression analysis. We analyzed an average of 32 (SD 19.83) assessments per patient.
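The within-person logic of such a multilevel analysis can be illustrated with a minimal sketch: fit a plain least-squares slope per patient's repeated measures, then average the slopes across patients. The study itself used full multilevel linear regression, and the variable names and toy values below are hypothetical:

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def mean_within_person_slope(data):
    """data maps patient id -> (speech_feature_values, affect_ratings)."""
    slopes = [ols_slope(xs, ys) for xs, ys in data.values()]
    return sum(slopes) / len(slopes)

# Hypothetical repeated measures: pause duration vs. momentary depression rating
data = {
    "p01": ([0.2, 0.4, 0.6], [3.0, 3.4, 3.8]),
    "p02": ([0.1, 0.3, 0.5], [2.0, 2.8, 3.6]),
}
print(mean_within_person_slope(data))
```

A multilevel model additionally pools information across patients and models the slopes' variance, rather than averaging independent fits.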
Analyses revealed that pitch variability, speech pauses, and speech rate were associated with depression severity, positive affect, valence, and energetic arousal; furthermore, speech pauses and speech rate were associated with negative affect, and speech pauses were additionally associated with calmness. Specifically, pitch variability was negatively associated with improved momentary states (ie, lower pitch variability was linked to lower depression severity as well as higher positive affect, valence, and energetic arousal). Speech pauses were negatively associated with improved momentary states, whereas speech rate was positively associated with improved momentary states.
Pitch variability, speech pauses, and speech rate are promising features for the development of clinical prediction technologies to improve patient care as well as timely diagnosis and monitoring of treatment response. Our research is a step forward on the path to developing an automated depression monitoring system, facilitating individually tailored treatments and increased patient empowerment.
Journal Article
The predictors of foreign-accentedness in the home language of Polish–English bilingual children
by
WREMBEL, MAGDALENA
,
SZEWCZYK, JAKUB
,
OTWINOWSKA, AGNIESZKA
in
Accentuation
,
Acceptability
,
Adult Basic Education
2019
We investigated the speech patterns and accentedness of Polish–English bilingual children raised in Great Britain to verify whether their L1 Polish would be perceived as different from that of monolinguals matched for age and socioeconomic status. To this end, Polish-language speech samples of 32 bilinguals and 10 monolinguals (a 3:1 ratio, M Age = 5.79) were phonetically analysed by trained phoneticians and rated by 55 Polish raters, who assessed the degree of native accent, intelligibility, acceptability and perceived age. The results show significant differences in the phonetic performance of bilingual and monolingual children – both in terms of atypical speech patterns uncovered in the phonetic analysis and in terms of the holistic accentedness ratings. We also explored the socio-linguistic predictors of accent ratings in bilingual speech and found that the amount of L1 Polish input was the main predictor of accentedness in children's L1 Polish speech, while L2 English input was marginally significant.
Journal Article
A Wavelet Based Hybrid Threshold Transform Method for Speech Intelligibility and Quality in Noisy Speech Patterns of English Language
by
Patil, Sharada
,
Kaur Ojhla, Harjeet
in
Adaptive filters
,
Algorithms
,
Communications Engineering
2020
The paper proposes a method to improve the performance of speech communication systems in a highly noisy industrial environment. The input data set comprises speech signals from different noise environments (car noise, railway station, babble noise, and street noise) corrupted with additional noise. This database is processed using suitable filters that remove the effect of noise to some extent. Various algorithms have been proposed to minimize the effect of noise; the denoising algorithms are generally wavelet thresholding methods that remove noise from the speech signal. Many researchers have worked on soft and hard thresholding for image processing. The proposed hybrid thresholding method combines the soft and hard thresholding processes and performs comparatively better than previous methods. It can be applied to non-stationary noise and also removes the problem of edges. Unlike the traditional approach of using a single value, different values are used for the adaptive filtering to remove the edges. In the experiments, the IIIT-H dataset with a set of noisy files from the Noizeus and AURORA databases, sampled at 16 kHz, was used. Results are calculated with subjective and objective measures for fine and broad-level quality assessment. SNR, SSNR, PSNR, NRMSE, and PESQ are used as performance parameters, and the proposed combination outperforms conventional methods. The hybrid threshold method yields significant improvement in speech quality and intelligibility.
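One plausible form of such a hybrid rule is to hard-keep large wavelet coefficients untouched and soft-shrink the mid-range ones. The abstract does not give the paper's exact combination, so the two-threshold split below and the threshold values are assumptions:

```python
def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; zero it if |c| <= t."""
    if abs(c) <= t:
        return 0.0
    return (abs(c) - t) * (1.0 if c > 0 else -1.0)

def hybrid_threshold(coeffs, t_low, t_high):
    """Hard-keep coefficients at or above t_high; soft-shrink the rest with t_low."""
    out = []
    for c in coeffs:
        if abs(c) >= t_high:
            out.append(c)                         # hard regime: strong signal, keep as-is
        else:
            out.append(soft_threshold(c, t_low))  # soft regime: likely noise, shrink
    return out

# Toy wavelet coefficients: strong peak, mid-range detail, small noise, negative detail
print(hybrid_threshold([5.0, 2.0, 0.5, -3.0], t_low=1.0, t_high=4.0))
# [5.0, 1.0, 0.0, -2.0]
```

In a full denoiser this rule would be applied to the detail coefficients of a wavelet decomposition of the noisy speech frame before reconstruction, with thresholds adapted per subband.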
Journal Article
Psychoeducational intervention improves Chinese mothers' parenting and symptoms of their autistic children
2019
In China, mothers of children with autism spectrum disorder struggle with parenting, often becoming depressed. This can harm the well-being and sociocognitive development of their children. We determined whether a psychoeducational group intervention increases the frequency of mothers' positive speech patterns and alleviates their depression, and, in turn, whether these changes improve the behavior of their children. Mothers (8 from Shanghai, 8 from Taiwan) participated in a 12-week intervention of a weekly counseling session. At the end, analysis of transcripts of the mothers' speech showed that the frequency of positive emotional words increased and negative emotional words decreased, and their scores on the Beck Depression Inventory and Beck Anxiety Inventory decreased significantly. Children also showed a significant reduction in scores on the Childhood Autism Rating Scale. These results demonstrate that counseling via a semistructured group intervention can improve mothers' parenting and coping skills, and help to alleviate their children's autism symptoms.
Journal Article
Image-based features for speech signal classification
by
Mukherjee Himadri
,
Phadikar Santanu
,
Kaushik, Roy
in
Accuracy
,
Artificial neural networks
,
Classification
2020
Analyzing speech signals is a crucial task under the purview of pattern classification. People often mix different languages while talking, which complicates this task. This happens frequently in India, since different languages are used from one state to another; the southern part of India in particular suffers from this situation, where distinguishing its languages is important. In this paper, we propose image-based features for speech signal classification, because different patterns can be identified by visualizing speech signals. Modified Mel-frequency cepstral coefficient (MFCC) features, namely MFCC-Statistics Grade (MFCC-SG), were extracted, visualized using plotting techniques, and then fed to a convolutional neural network. In this study, we used the top four languages, namely Telugu, Tamil, Malayalam, and Kannada. Experiments were performed on more than 900 hours of data collected from YouTube, leading to over 150,000 images, and a highest accuracy of 94.51% was obtained.
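The statistics-over-frames idea behind a feature like MFCC-SG can be sketched as follows. The abstract does not specify the exact statistic set, so the choice of mean, population standard deviation, minimum, and maximum per coefficient is an assumption:

```python
import statistics

def mfcc_statistics_grade(frames):
    """Collapse a (frame x coefficient) MFCC matrix into per-coefficient statistics."""
    n_coeffs = len(frames[0])
    feats = []
    for j in range(n_coeffs):
        col = [frame[j] for frame in frames]       # one coefficient across all frames
        feats.extend([statistics.mean(col), statistics.pstdev(col), min(col), max(col)])
    return feats

# Two toy frames with two MFCC coefficients each
print(mfcc_statistics_grade([[1.0, 2.0], [3.0, 4.0]]))
# [2.0, 1.0, 1.0, 3.0, 3.0, 1.0, 2.0, 4.0]
```

The resulting fixed-length vector (or a plot of it) is what would then be rendered as an image and passed to the convolutional network.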
Journal Article