Catalogue Search | MBRL
639 result(s) for "Formants"
Influence of Facial, Head, and Neck Dimensions on Vocal Acoustic Parameters in Polish Speakers
2026
The relationships between human voice parameters and body dimensions have been previously described, but the connections between voice and face geometry remain poorly researched. This study aims to determine the relationships between face dimensions and acoustic parameters in both sexes, examining 111 adult participants (30 males). Each participant undergoes voice recording, which includes five sustained vowels, along with anthropometric measurements of the neck, head, and face regions. Comparisons between voice parameters and the head, face, and neck regions are conducted using Pearson’s correlation coefficients (r) and a multiple linear regression model. The results reveal significant relationships between head, neck, and face dimensions and acoustic parameters in both sexes. Males with higher noses, greater head circumferences, and wider faces tend to have lower formants and more stable voices. Females with larger head circumferences have lower formant values, and those with greater neck circumferences tend to have more stable voices. Also, females with increased nose height have a lower fourth formant (F4). Moreover, females with wider faces, noses, and jaws tend to have less rough voices (lower jitter) and longer maximum phonation time (MPT). These findings may be useful for scientists and law enforcement authorities in creating algorithms that build face models based on voice signals.
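The Pearson correlation step described in the abstract can be sketched in a few lines. The head-circumference and F4 values below are invented for illustration only, not data from the study:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2)))

# Hypothetical illustration: head circumference (cm) vs. fourth formant F4 (Hz);
# a negative r would match the reported head-size/formant relationship.
head_circ = [54.0, 55.5, 56.0, 57.2, 58.1, 59.0]
f4_hz = [3550, 3510, 3490, 3430, 3400, 3360]
r = pearson_r(head_circ, f4_hz)
```

With these made-up values the two series fall together almost linearly, so r comes out strongly negative.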
Journal Article
Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles
by
Cavalcanti, Julio Cesar
,
Madureira, Sandra
,
Barbosa, Plinio A.
in
Acoustic Phonetics
,
Acoustic properties
,
Acoustic resonance
2024
Differentiating subjects through the comparison of their recorded speech is a common endeavor in speaker characterization. When using an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker discriminatory power of vowel formants (resonance peaks in the vocal tract) in two different speaking styles: Dialogue and Interview. Different testing procedures were applied, specifically metrics compatible with the likelihood ratio paradigm. Only high-quality recordings were analyzed in this study. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through Multivariate Kernel Density analysis: the log-likelihood-ratio cost (Cllr) and equal error rates (EER). As expected, the discriminatory performance was stronger for style-matched analyses than for mismatched-style analyses. In order of relevance, F3, F4, and F1 performed the best in style-matched comparisons, as suggested by lower Cllr and EER values. F2 performed the worst intra-style in both Dialogue and Interview. The discriminatory power of all individual formants (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of higher formants ‘F3 + F4’ outperformed the combination of lower formants ‘F1 + F2’. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also influenced the overall probability distribution, shifting the probability density distribution towards lower Cllr and EER values. In general, front and central vowels were found to be more speaker-discriminatory than back vowels as far as the ‘F1 + F2’ relation was concerned.
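The Cllr metric used in this study has a standard published definition (Brümmer's log-likelihood-ratio cost). A minimal sketch with invented LR scores, not scores from the paper:

```python
import math

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost. Lower is better; a system that always
    outputs LR = 1 (uninformative) scores exactly 1.0."""
    p1 = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    p2 = sum(math.log2(1 + lr) for lr in diff_speaker_lrs) / len(diff_speaker_lrs)
    return 0.5 * (p1 + p2)

# Hypothetical LR scores: a well-calibrated system yields LR > 1 for
# same-speaker comparisons and LR < 1 for different-speaker comparisons.
good = cllr([8.0, 12.0, 5.0], [0.1, 0.2, 0.05])
uninformative = cllr([1.0, 1.0], [1.0, 1.0])  # all LR = 1 -> Cllr = 1.0
```

This makes the abstract's reading concrete: combining formants helps precisely when it pushes same-speaker LRs above 1 and different-speaker LRs below 1, which drives Cllr down.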
Journal Article
Vowel Acoustic Space Development in Children: A Synthesis of Acoustic and Anatomic Data
2007
Contact author: Houri K. Vorperian, 481 Waisman Center, 1500 Highland Avenue, Madison, WI 53705. E-mail: vorperian{at}waisman.wisc.edu.
Purpose: This article integrates published acoustic data on the development of vowel production. Age specific data on formant frequencies are considered in the light of information on the development of the vocal tract (VT) to create an anatomic–acoustic description of the maturation of the vowel acoustic space for English.
Method: Literature searches identified 14 studies reporting data on vowel formant frequencies. Data on corner vowels are summarized graphically to show age- and sex- related changes in the area and shape of the traditional vowel quadrilateral.
Conclusions: Vowel development is expressed as follows: (a) establishment of a language-appropriate acoustic representation (e.g., F1–F2 quadrilateral or F1–F2–F3 space), (b) gradual reduction in formant frequencies and F1–F2 area with age, (c) reduction in formant-frequency variability, (d) emergence of male–female differences in formant frequency by age 4 years with more apparent differences by 8 years, (e) jumps in formant frequency at ages corresponding to growth spurts of the VT, and (f) a decline of f0 after age 1 year, with the decline being more rapid during early childhood and adolescence. Questions remain about optimal procedures for VT normalization and the exact relationship between VT growth and formant frequencies. Comments are included on nasalization and vocal fundamental frequency as they relate to the development of vowel production.
KEY WORDS: vowels, speech development, formant frequencies, nasalization, vocal fundamental frequency, vocal tract development
Journal Article
Formant analysis of vertebrate vocalizations: achievements, pitfalls, and promises
by
Fitch, W. Tecumseh
,
Reby, David
,
Valente, Daria
in
Acoustics
,
Animal communication
,
Animal vocalization
2025
When applied to vertebrate vocalizations, source-filter theory, initially developed for human speech, has revolutionized our understanding of animal communication, resulting in major insights into the form and function of animal sounds. However, animal calls and human nonverbal vocalizations can differ qualitatively from human speech, often having more chaotic and higher-frequency sources, making formant measurement challenging. We review the considerable achievements of the “formant revolution” in animal vocal communication research, then highlight several important methodological problems in formant analysis. We offer concrete recommendations for effectively applying source-filter theory to non-speech vocalizations and discuss promising avenues for future research in this area.
Brief
Formants (vocal tract resonances) play key roles in animal communication, offering researchers exciting promise but also potential pitfalls.
Journal Article
Influences of Fundamental Frequency, Formant Frequencies, Aperiodicity, and Spectrum Level on the Perception of Voice Gender
2014
Purpose: To determine the relative importance of acoustic parameters (fundamental frequency [F0], formant frequencies [FFs], aperiodicity, and spectrum level [SL]) on voice gender perception, the authors used a novel parameter-morphing approach that, unlike spectral envelope shifting, allows the application of nonuniform scale factors to transform formants and a more direct comparison of parameter impact. Method: In each of 2 experiments, 16 listeners with normal hearing (8 female, 8 male) classified voice gender for morphs between female and male speakers, using syllable tokens from 2 male-female speaker pairs. Morphs varied single acoustic parameters (Experiment 1) or selected combinations (Experiment 2), keeping residual parameters androgynous, as determined in a baseline experiment. Results: The strongest cue for gender perception was F0, followed by FF and SL. Aperiodicity did not systematically influence gender perception. Morphing F0 and FF in conjunction produced convincing changes in perceived gender, equivalent to those for Full morphs interpolating all parameters. Despite the importance of F0, morphing FF and SL in combination produced effective changes in voice gender perception. Conclusions: The most important single parameters for gender perception are, in order, F0, FF, and SL. At the same time, F0 and vocal tract resonances have a comparable impact on voice gender perception.
Journal Article
Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech
2010
Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA: the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register treatment effects. Method: Speech recordings of 38 individuals with idiopathic Parkinson's disease and dysarthria (19 of whom received 1 month of intensive speech therapy [Lee Silverman Voice Treatment; LSVT LOUD]) and 14 healthy control participants were acoustically analyzed. Vowels were extracted from short phrases. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u) / (F2i + F1a); the VSA, expressed as ABS([F1i x (F2a - F2u) + F1a x (F2u - F2i) + F1u x (F2i - F2a)] / 2); a logarithmically scaled version of the VSA (LnVSA); and the F2i/F2u ratio. Results: Unlike the VSA and the LnVSA, the FCR and F2i/F2u ratio robustly differentiated dysarthric from healthy speech and were not gender sensitive. All metrics effectively registered treatment effects and were strongly correlated with each other. Conclusion: Albeit preliminary, the present findings indicate that the FCR is a sensitive, valid, and reliable acoustic metric for distinguishing dysarthric from unimpaired speech and for monitoring treatment effects, probably because of reduced sensitivity to interspeaker variability and enhanced sensitivity to vowel centralization.
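Because the abstract gives the FCR and VSA formulas explicitly, both can be computed directly. The formant values below are hypothetical, not data from the study:

```python
def fcr(f1i, f1u, f1a, f2i, f2u, f2a):
    """Formant centralization ratio as defined in the abstract:
    (F2u + F2a + F1i + F1u) / (F2i + F1a). Centralized (dysarthric-like)
    vowels push the ratio toward 1; a large vowel space keeps it below 1."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vsa(f1i, f1u, f1a, f2i, f2u, f2a):
    """Vowel space area of the /i a u/ triangle, matching the abstract's
    ABS([F1i*(F2a - F2u) + F1a*(F2u - F2i) + F1u*(F2i - F2a)] / 2)
    (the shoelace formula in F1-F2 space)."""
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2

# Hypothetical corner-vowel formants (Hz) for an unimpaired adult male
healthy = dict(f1i=300, f1u=350, f1a=750, f2i=2300, f2u=900, f2a=1300)
ratio = fcr(**healthy)   # below 1 for this spread-out vowel triangle
area = vsa(**healthy)    # in Hz^2
```

Note that the FCR puts the formants that shrink under centralization in the denominator and those that grow in the numerator, which is why it rises toward 1 as vowels centralize.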
Journal Article
Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage
by
Samuni, Liran
,
Girard-Buttoz, Cédric
,
Bortolato, Tatiana
in
Acoustics
,
Animal biology
,
Animals
2022
The origins of human speech are obscure; it is still unclear what aspects are unique to our species or shared with our evolutionary cousins, in part due to a lack of a common framework for comparison. We asked what chimpanzee and human vocal production acoustics have in common. We examined visible supra-laryngeal articulators of four major chimpanzee vocalizations (hoos, grunts, barks, screams) and their associated acoustic structures, using techniques from human phonetic and animal communication analysis. Data were collected from wild adult chimpanzees, Taï National Park, Ivory Coast. Both discriminant and principal component classification procedures distinguished the call types. Discriminating acoustic features include voice quality and formant structure, mirroring phonetic features in human speech. Chimpanzee lip and jaw articulation variables also offered similar discrimination of call types. Formant maps distinguished call types with different vowel-like sounds. Comparing our results with published primate data, humans show less F1–F2 correlation and further expansion of the vowel space, particularly for [i] sounds. Unlike recent studies suggesting monkeys achieve human vowel space, we conclude from our results that supra-laryngeal articulatory capacities show moderate evolutionary change, with vowel space expansion continuing through hominoid evolution. Studies on more primate species will be required to substantiate this.
This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
Journal Article
The Use of Formants’ Correlation in Assessing the Sadness State of the Speakers
2022
The purpose of this study is to further the comparative analysis of the variations in the correlation coefficients of the formants for the Romanian vowels during emotional speech. We created an annotated speech database composed of recordings of speakers pronouncing the following sentences: /vine mama/ (/mother is coming/), /aseară/ (/last night/), /cine a făcut asta/ (/who did this/), first in a neutral tone of voice and then expressing sadness. The analysis focuses on the influence that sadness has on the vocal signal. The formants and F0 (pitch) of each vowel were extracted. Statistical analysis techniques were applied to verify whether the correlation coefficients between F0 and F1-F4 vary significantly or tend to be homogeneous under the following conditions: (1) same speaker – same vowel – different sentences; (2) same speaker – same vowel – emotional neutrality vs. sadness; (3) different speakers – same sentence – same vowel.
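The F0-formant correlation analysis described above can be illustrated with a correlation matrix over repeated measurements of one vowel. The numbers below are invented placeholders, not the Romanian data:

```python
import numpy as np

# Hypothetical per-repetition measurements for one speaker, one vowel:
# columns are F0, F1, F2, F3, F4 (Hz); rows are repetitions.
neutral = np.array([
    [200, 610, 1900, 2600, 3500],
    [205, 620, 1880, 2580, 3480],
    [198, 600, 1910, 2620, 3520],
    [210, 630, 1870, 2570, 3470],
], dtype=float)

# Full correlation matrix; row/column 0 is F0, so the off-diagonal
# entries corr[0, 1:] are the F0-F1 ... F0-F4 correlation coefficients.
corr = np.corrcoef(neutral, rowvar=False)
f0_vs_formants = corr[0, 1:]
```

Comparing such vectors of coefficients across the study's three conditions (sentence, emotion, speaker) is what the significance testing in the abstract amounts to.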
Journal Article
Educational computer game supporting skills development in timbre solfege
2026
This paper presents Sound Jobs, an educational computer game designed to develop timbre solfege skills for sound engineers and audio professionals. Unlike existing ear-training tools that operate as standalone applications or web-based services, Sound Jobs integrates listening exercises within an engaging game narrative set in the 1970s hacking culture. The system was developed using the Unity game engine integrated with FMOD Studio middleware, enabling precise control over audio signal processing parameters essential for timbre discrimination tasks. The game offers three modes (Jobs, Training, and Testing) with exercises covering equalization recognition, dynamic range discrimination, distortion detection, reverb characterization, and delay identification. Evaluation through user surveys with Music in Multimedia students and professional audio engineers revealed positive reception, with participants particularly appreciating the gamification approach and the progressive difficulty system. Based on feedback, a second version was developed incorporating game save functionality and interface improvements. The tool is currently employed in timbre solfege instruction at the University of Silesia. Future development plans include expanding the sound material library and establishing a public repository for broader accessibility.
Journal Article
Gender detection in children’s speech utterances for human-robot interaction
2022
Human speech inherently carries paralinguistic information used in many real-time applications, and detecting gender from children's speech is considered more challenging than from adults' speech. In this study, a system for human-robot interaction (HRI) is proposed to detect gender in children's speech utterances without depending on the text. The robot's perception includes three phases. In the feature-extraction phase, four formants are measured at each glottal pulse and a median is then calculated across these measurements; from these, three types of features are computed: formant average (AF), formant dispersion (DF), and formant position (PF). In the feature-standardization phase, the measured feature dimensions are standardized using the z-score method. In the semantic-understanding phase, the children's gender is detected using a logistic regression classifier. At the same time, the action of the robot is specified via a speech response using the text-to-speech (TTS) technique. Experiments are conducted on the Carnegie Mellon University (CMU) Kids dataset to measure the suggested system's performance, which reaches an overall accuracy of 98%. The results show a relatively clear improvement in accuracy of up to 13% compared to related works that utilized the CMU Kids dataset.
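The abstract names AF, DF, and PF but does not define them. Assuming the common definitions from the voice literature (AF as the mean of F1-F4, DF as the mean spacing between adjacent formants, PF as the mean of z-scored formants against population statistics), a sketch, not the paper's actual code:

```python
import statistics

def formant_features(f1, f2, f3, f4, pop_means=None, pop_sds=None):
    """Assumed definitions of the three features named in the abstract.
    AF: average formant frequency.
    DF: mean spacing between adjacent formants; since the gaps telescope,
        ((f2-f1) + (f3-f2) + (f4-f3)) / 3 simplifies to (f4 - f1) / 3.
    PF: formant position, the mean of per-formant z-scores (requires
        population means and standard deviations)."""
    formants = [f1, f2, f3, f4]
    af = statistics.mean(formants)
    df = (f4 - f1) / 3
    pf = None
    if pop_means and pop_sds:
        pf = statistics.mean((f - m) / s
                             for f, m, s in zip(formants, pop_means, pop_sds))
    return af, df, pf

# Hypothetical median formants (Hz) for one child utterance, with
# made-up population statistics for the z-scoring step.
af, df, pf = formant_features(650, 1850, 3000, 4000,
                              pop_means=[600, 1800, 2900, 3900],
                              pop_sds=[80, 150, 200, 250])
```

All three features collapse four formant measurements into one scalar, which is what lets the downstream logistic regression work on a compact fixed-length vector.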
Journal Article