Search Results
Filters: Discipline, Is Peer Reviewed, Item Type, Subject, Year (From/To), Source, Language
408 result(s) for "Formants (Speech)"
Sonic Signatures
Sonic Signatures is devoted to the representation of sound patterns and sound structures across a range of typologically distinct languages. Its aim is to understand the nature of linguistic data structures by balancing representational economy against the interfaces of phonology with other domains, including the acoustic and the visual.
Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles
Differentiating subjects through the comparison of their recorded speech is a common endeavor in speaker characterization. When using an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker-discriminatory power of vowel formants (resonance peaks in the vocal tract) in two different speaking styles: Dialogue and Interview. Different testing procedures were applied, specifically metrics compatible with the likelihood-ratio paradigm. Only high-quality recordings were analyzed in this study. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through multivariate kernel density analysis: log-likelihood-ratio costs (Cllr) and equal error rates (EER). As expected, discriminatory performance was stronger for style-matched analyses than for mismatched-style analyses. In order of relevance, F3, F4, and F1 performed best in style-matched comparisons, as suggested by lower Cllr and EER values. F2 performed worst intra-style in both Dialogue and Interview. The discriminatory power of all individual formants (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of higher formants 'F3 + F4' outperformed the combination of lower formants 'F1 + F2'. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also shifted the probability density distribution towards lower Cllr and EER values. In general, front and central vowels were found to be more speaker-discriminatory than back vowels as far as the 'F1 + F2' relation was concerned.
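The two evaluation metrics named in the abstract can be sketched numerically. A minimal sketch, assuming same-speaker and different-speaker comparison scores are available as plain lists; the Cllr expression is the standard log-likelihood-ratio cost from the forensic-voice-comparison literature, and the EER here is the usual threshold-sweep approximation. Neither implementation is taken from the study itself.

```python
import math

def cllr(ss_lrs, ds_lrs):
    """Log-likelihood-ratio cost (in bits): penalizes same-speaker trials
    with low likelihood ratios and different-speaker trials with high ones."""
    ss = sum(math.log2(1 + 1 / lr) for lr in ss_lrs) / len(ss_lrs)
    ds = sum(math.log2(1 + lr) for lr in ds_lrs) / len(ds_lrs)
    return 0.5 * (ss + ds)

def eer(ss_scores, ds_scores):
    """Equal error rate via a threshold sweep: the operating point where the
    false-rejection and false-acceptance rates (approximately) coincide."""
    best = 1.0
    for t in sorted(ss_scores + ds_scores):
        frr = sum(s < t for s in ss_scores) / len(ss_scores)   # same-speaker rejected
        far = sum(s >= t for s in ds_scores) / len(ds_scores)  # different-speaker accepted
        best = min(best, max(frr, far))
    return best
```

An uninformative system (all likelihood ratios equal to 1) yields Cllr = 1 bit, the reference value against which lower, better scores are judged.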
Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech
Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA: the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register treatment effects. Method: Speech recordings of 38 individuals with idiopathic Parkinson's disease and dysarthria (19 of whom received 1 month of intensive speech therapy [Lee Silverman Voice Treatment; LSVT LOUD]) and 14 healthy control participants were acoustically analyzed. Vowels were extracted from short phrases. The same vowel-formant elements were used to construct the FCR, expressed as (F2u + F2a + F1i + F1u) / (F2i + F1a); the VSA, expressed as ABS([F1i x (F2a - F2u) + F1a x (F2u - F2i) + F1u x (F2i - F2a)] / 2); a logarithmically scaled version of the VSA (LnVSA); and the F2i/F2u ratio. Results: Unlike the VSA and the LnVSA, the FCR and the F2i/F2u ratio robustly differentiated dysarthric from healthy speech and were not gender sensitive. All metrics effectively registered treatment effects and were strongly correlated with each other. Conclusion: Albeit preliminary, the present findings indicate that the FCR is a sensitive, valid, and reliable acoustic metric for distinguishing dysarthric from unimpaired speech and for monitoring treatment effects, probably because of reduced sensitivity to interspeaker variability and enhanced sensitivity to vowel centralization.
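Both metrics can be computed directly from corner-vowel formant values using the expressions the abstract gives. A minimal sketch; the function names are ours, and the corner-vowel frequencies in the usage lines are typical adult-male values chosen for illustration, not data from the study.

```python
def fcr(f1i, f2i, f1u, f2u, f1a, f2a):
    """Formant centralization ratio: (F2u + F2a + F1i + F1u) / (F2i + F1a).
    Centralized (dysarthric-like) vowels push the ratio upward."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vsa(f1i, f2i, f1u, f2u, f1a, f2a):
    """Triangular vowel space area over the corner vowels /i/, /u/, /a/, per
    the abstract: ABS([F1i*(F2a - F2u) + F1a*(F2u - F2i) + F1u*(F2i - F2a)] / 2)."""
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2

# Illustrative corner-vowel formants in Hz (/i/, /u/, /a/), not from the study:
print(round(fcr(270, 2290, 300, 870, 730, 1090), 3))  # ratio near 0.84
print(vsa(270, 2290, 300, 870, 730, 1090))            # area in Hz^2
```

Note how the FCR places the formants that rise under centralization in the numerator and those that fall in the denominator, which is why centralization drives the ratio up while leaving it unitless and less gender-sensitive than a raw area.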
Conventional and contemporary approaches used in text to speech synthesis: a review
Nowadays speech synthesis, or text-to-speech (TTS), the ability of a system to produce a natural, human-like voice from written text, is gaining popularity in the field of speech processing. For any TTS system, intelligibility and naturalness are the two important measures of the quality of a synthesized sound, which depends heavily on prosody modeling via the synthesizer's acoustic model. The purpose of this review is first to study and analyze the various approaches used traditionally (articulatory synthesis, formant synthesis, concatenative speech synthesis, and statistical parametric techniques based on hidden Markov models) and recently (statistical parametric techniques based on deep learning) for acoustic modeling, with their pros and cons. Deep learning approaches to acoustic modeling have contributed significantly to the advancement of TTS, as models based on deep learning can capture the complex context dependencies in the input data. Apart from these, this article also reviews TTS approaches for generating speech with different voices and emotions, which make TTS more realistic to use. It also addresses the subjective and objective metrics used to measure the quality of the synthesized voice. Various well-known speech synthesis systems based on autoregressive and non-autoregressive models, such as Tacotron, Deep Voice, WaveNet, Parallel WaveNet, Parallel Tacotron, and FastSpeech from global tech giants Google, Facebook, and Microsoft, employ deep learning architectures for end-to-end speech waveform generation and have attained remarkable mean opinion scores (MOS).
Vowel Acoustic Space Development in Children: A Synthesis of Acoustic and Anatomic Data
Contact author: Houri K. Vorperian, 481 Waisman Center, 1500 Highland Avenue, Madison, WI 53705. E-mail: vorperian{at}waisman.wisc.edu. Purpose: This article integrates published acoustic data on the development of vowel production. Age-specific data on formant frequencies are considered in the light of information on the development of the vocal tract (VT) to create an anatomic-acoustic description of the maturation of the vowel acoustic space for English. Method: Literature searches identified 14 studies reporting data on vowel formant frequencies. Data on corner vowels are summarized graphically to show age- and sex-related changes in the area and shape of the traditional vowel quadrilateral. Conclusions: Vowel development is expressed as follows: (a) establishment of a language-appropriate acoustic representation (e.g., F1-F2 quadrilateral or F1-F2-F3 space), (b) gradual reduction in formant frequencies and F1-F2 area with age, (c) reduction in formant-frequency variability, (d) emergence of male-female differences in formant frequency by age 4 years, with more apparent differences by 8 years, (e) jumps in formant frequency at ages corresponding to growth spurts of the VT, and (f) a decline of f0 after age 1 year, with the decline being more rapid during early childhood and adolescence. Questions remain about optimal procedures for VT normalization and the exact relationship between VT growth and formant frequencies. Comments are included on nasalization and vocal fundamental frequency as they relate to the development of vowel production. KEY WORDS: vowels, speech development, formant frequencies, nasalization, vocal fundamental frequency, vocal tract development
Formant analysis of vertebrate vocalizations: achievements, pitfalls, and promises
When applied to vertebrate vocalizations, source-filter theory, initially developed for human speech, has revolutionized our understanding of animal communication, resulting in major insights into the form and function of animal sounds. However, animal calls and human nonverbal vocalizations can differ qualitatively from human speech, often having more chaotic and higher-frequency sources, making formant measurement challenging. We review the considerable achievements of the "formant revolution" in animal vocal communication research, then highlight several important methodological problems in formant analysis. We offer concrete recommendations for effectively applying source-filter theory to non-speech vocalizations and discuss promising avenues for future research in this area. In brief: Formants (vocal tract resonances) play key roles in animal communication, offering researchers exciting promise but also potential pitfalls.
Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison
The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants (F1-F4) were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, captured directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in Praat. F1-F4 formant estimates were automatically extracted from the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns in the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, both F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by the front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels proved even more explanatory of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found to be phonetically identical.
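The Hertz-to-Bark conversion used for the psychoacoustic comparison can be sketched as follows. The abstract does not say which Bark formula the study applied; Traunmueller's (1990) approximation below is a common choice in formant work, and the one-Bark threshold in the helper is a frequently cited criterion assumed here purely for illustration.

```python
def hz_to_bark(f_hz):
    """Traunmueller's (1990) approximation of the Bark scale:
    z = 26.81 * f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def differs_psychoacoustically(f_a_hz, f_b_hz, threshold_bark=1.0):
    """True if two formant values lie more than `threshold_bark` apart on the
    Bark scale, an assumed criterion for a perceptually relevant difference."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz)) > threshold_bark
```

Because the Bark scale compresses high frequencies, a fixed Hz difference between two speakers' F4 values counts for less, perceptually, than the same difference in F1, which is why the study reports comparisons on both scales.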
Consonant Cluster Productions in Preschool Children Who Speak African American English
Purpose: The aim of this study was to compare word-initial and word-final consonant cluster productions in young children who speak African American English (AAE) and compare their productions to what we know about cluster productions in children who speak Mainstream American English (MAE), in order to minimize misdiagnosis of speech sound disorders. Method: Twenty-two children (ages 2;10-5;4 [years;months]) labeled pictures whose names contained at least one consonant cluster in word-initial and/or word-final position. Most two-element clusters of English were sampled, the majority in two or more words. The participants' responses were transcribed using a consensus transcription procedure. Each cluster attempt was analyzed for its similarity with MAE. Results: Percentage matching scores were significantly higher for word-initial than word-final clusters. Word-final clusters produced as singletons were significantly more common than word-final cluster substitutions. However, word-initial cluster substitutions were significantly more common than word-initial clusters produced as singletons. Word-initial cluster mismatches were consistent with markedness theory and the sonority sequencing principle (SSP). By contrast, word-final cluster mismatches were not consistent with the SSP, while the voicing generalization seen in adult speakers of AAE was evident. Conclusion: Culturally and linguistically appropriate assessment of phonological development in children who speak AAE requires an understanding of the contrastive and noncontrastive features exemplified in their consonant cluster productions.
Icelandic Children's Acquisition of Consonants and Consonant Clusters
Purpose: This study investigated Icelandic-speaking children's acquisition of singleton consonants and consonant clusters. Method: Participants were 437 typically developing children aged 2;6-7;11 (years;months) acquiring Icelandic as their first language. Single-word speech samples of the 47 single consonants and 45 consonant clusters were collected using Málhljóðapróf ÞM (ÞM's Test of Speech Sound Disorders). Results: Percentage of consonants correct for children aged 2;6-2;11 was 73.12 (SD = 13.33) and increased to 98.55 (SD = 3.24) for children aged 7;0-7;11. Overall, singleton consonants were more likely to be accurate than consonant clusters. The earliest consonants to be acquired were /m, n, p, t, j, h/ in word-initial position and /f, l/ within words. The last consonants to be acquired were /x, r, r̥, s, θ, n̥/, and consonant clusters in word-initial /sv-, stl-, str-, skr-, θr-/, within-word /-ðr-, -tl-/, and word-final /-kl̥, -xt/ contexts. Within-word phonemes were more often accurate than those in word-initial position, with word-final position the least accurate. Accuracy of production was significantly related to increasing age, but not sex. Conclusions: This is the first comprehensive study of consonant and consonant cluster acquisition by typically developing Icelandic-speaking children. The findings align with trends for other Germanic languages; however, there are notable language-specific differences of clinical importance.
Effects of Phonetic Context on Relative Fundamental Frequency
Purpose: The effect of phonetic context on relative fundamental frequency (RFF) was examined in order to develop stimulus sets with minimal within-speaker variability that can be implemented in future clinical protocols. Method: Sixteen speakers with healthy voices produced RFF stimuli. Uniform utterances consisted of 3 repetitions of the same voiced sonorant-voiceless consonant-voiced sonorant speech sequence; moderately variable sentences contained speech sequences with a single voiceless phoneme (/f/, /s/, /ʃ/, /p/, /t/, or /k/); highly variable sentences were loaded with speech sequences using multiple phonemes. Effects of stimulus type (uniform, moderately variable, and highly variable) and phoneme identity (/f/, /s/, /ʃ/, /p/, /t/, and /k/) on RFF means and standard deviations were determined. Results: Stimulus type and the interaction of vocal cycle and stimulus type were significant for RFF means and standard deviations, but with small effect sizes. Phoneme identity and the interaction of vocal cycle and phoneme identity on RFF means and standard deviations were also significant, with small to medium effect sizes. Conclusions: For speakers with healthy voices, uniform utterances with /f/ and /ʃ/ have the lowest standard deviations and are thus recommended for RFF-based assessments. Future work is necessary to extend these findings to disordered voices.
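RFF values are conventionally reported in semitones relative to a steady-state reference cycle. A minimal sketch of that normalization, assuming per-cycle f0 estimates have already been extracted from the voiced sonorants around the voiceless consonant; the cycle-picking and f0 extraction themselves are the hard part and are not shown, and this sketch is not taken from the study's own protocol.

```python
import math

def rff_semitones(cycle_f0s_hz, ref_f0_hz):
    """Express each vocal cycle's f0 in semitones relative to a reference
    cycle (12 * log2(f / ref)), the normalization conventionally used so
    that RFF is comparable across speakers with different habitual pitch."""
    return [12 * math.log2(f / ref_f0_hz) for f in cycle_f0s_hz]
```

A cycle at the reference frequency maps to 0 ST and a doubling of f0 to +12 ST, so the measure tracks relative pitch movement into and out of the voiceless consonant rather than absolute frequency.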