Search Results

20,262 result(s) for "Artificial Speech"
Robotics, Vision and Control: Fundamental Algorithms in MATLAB®, Second, Completely Revised, Extended and Updated Edition
Robotic vision, the combination of robotics and computer vision, involves the application of computer algorithms to data acquired from sensors. The research community has developed a large body of such algorithms, but for a newcomer to the field this can be quite daunting. For over 20 years the author has maintained two open-source MATLAB® Toolboxes, one for robotics and one for vision. They provide implementations of many important algorithms and allow users to work with real problems, not just trivial examples. This book makes the fundamental algorithms of robotics, vision and control accessible to all. It weaves together theory, algorithms and examples in a narrative that covers robotics and computer vision separately and together. Using the latest versions of the Toolboxes, the author shows how complex problems can be decomposed and solved using just a few simple lines of code. The topics covered are guided by real problems observed by the author over many years as a practitioner of both robotics and computer vision. It is written in an accessible but informative style, easy to read and absorb, and includes over 1000 MATLAB and Simulink® examples and over 400 figures. The book is a real walk through the fundamentals of mobile robots and arm robots, then camera models, image processing, feature extraction and multi-view geometry, finally bringing it all together with an extensive discussion of visual servo systems. This second edition is completely revised, updated and extended with coverage of Lie groups, matrix exponentials and twists; inertial navigation; differential drive robots; lattice planners; pose-graph SLAM and map making; restructured material on arm-robot kinematics and dynamics; series-elastic actuators and operational-space control; Lab color spaces; light field cameras; structured light, bundle adjustment and visual odometry; and photometric visual servoing. "An authoritative book, reaching across fields, thoughtfully conceived and brilliantly accomplished!" (Oussama Khatib, Stanford).
A comparison of different support vector machine kernels for artificial speech detection
Because voice biometrics offer enhanced security and convenience, voice biometric-based applications such as speaker verification are gradually replacing less secure authentication techniques. However, automatic speaker verification (ASV) systems are exposed to spoofing attacks, especially artificial speech attacks that can be generated in large quantities in a short time using state-of-the-art speech synthesis and voice conversion algorithms. Although the support vector machine (SVM) has been used extensively in recent work, no study has investigated how different SVM settings perform in artificial speech detection. In this paper, the performance of different SVM settings in artificial speech detection is investigated, with the objective of identifying appropriate SVM kernels for the task. An experiment was conducted to find the appropriate combination of the proposed features and SVM kernels. Experimental results showed that the polynomial kernel was able to detect artificial speech effectively, with an equal error rate (EER) of 1.42% when applied to the presented handcrafted features.
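The abstract does not include the authors' feature set or code, but the kernel comparison it describes can be sketched with scikit-learn: train one SVM per kernel on genuine-versus-artificial speech features and score each by its equal error rate (EER). The random feature matrix below is only a placeholder for the paper's handcrafted acoustic features; this is an illustrative assumption, not the published implementation.

    # Hedged sketch: compare SVM kernels for artificial speech detection by EER.
    # The features and labels are random placeholders, NOT the paper's data.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 40))        # placeholder acoustic feature vectors
    y = rng.integers(0, 2, size=1000)      # 0 = genuine speech, 1 = artificial speech
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    def equal_error_rate(y_true, scores):
        # EER: operating point where the false positive rate equals the false negative rate.
        fpr, tpr, _ = roc_curve(y_true, scores)
        fnr = 1 - tpr
        i = np.nanargmin(np.abs(fpr - fnr))
        return (fpr[i] + fnr[i]) / 2

    for kernel in ("linear", "poly", "rbf", "sigmoid"):
        clf = SVC(kernel=kernel, gamma="scale").fit(X_tr, y_tr)
        eer = equal_error_rate(y_te, clf.decision_function(X_te))
        print(f"{kernel:8s} EER = {100 * eer:.2f}%")

With real spoofing-detection features, the same loop would reproduce the kind of kernel comparison the paper reports; on the random placeholders the EERs simply hover near chance.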
Surface Electromyography–Based Recognition, Synthesis, and Perception of Prosodic Subvocal Speech
Purpose: This study aimed to evaluate a novel communication system designed to translate surface electromyographic (sEMG) signals from articulatory muscles into speech using a personalized, digital voice. The system was evaluated for word recognition, prosodic classification, and listener perception of synthesized speech. Method: sEMG signals were recorded from the face and neck as speakers with (n = 4) and without (n = 4) laryngectomy subvocally recited (silently mouthed) a speech corpus comprising 750 phrases (150 phrases with variable phrase-level stress). Corpus tokens were then translated into speech via personalized voice synthesis (n = 8 synthetic voices) and compared against phrases produced by each speaker when using their typical mode of communication (n = 4 natural voices, n = 4 electrolaryngeal [EL] voices). Naïve listeners (n = 12) evaluated synthetic, natural, and EL speech for acceptability and intelligibility in a visual sort-and-rate task, as well as phrasal stress discriminability via a classification mechanism. Results: Recorded sEMG signals were processed to translate sEMG muscle activity into lexical content and categorize variations in phrase-level stress, achieving a mean accuracy of 96.3% (SD = 3.10%) and 91.2% (SD = 4.46%), respectively. Synthetic speech was significantly higher in acceptability and intelligibility than EL speech, also leading to greater phrasal stress classification accuracy, whereas natural speech was rated as the most acceptable and intelligible, with the greatest phrasal stress classification accuracy. Conclusion: This proof-of-concept study establishes the feasibility of using subvocal sEMG-based alternative communication not only for lexical recognition but also for prosodic communication in healthy individuals, as well as those living with vocal impairments and residual articulatory function.
A Systematic Review of Tablet Computers and Portable Media Players as Speech Generating Devices for Individuals with Autism Spectrum Disorder
Powerful, portable, off-the-shelf handheld devices, such as tablet-based computers (e.g., iPad®, Galaxy®) or portable multimedia players (e.g., iPod®), can be adapted to function as speech generating devices for individuals with autism spectrum disorders or related developmental disabilities. This paper reviews the research in this new and rapidly growing area and delineates an agenda for future investigations. In general, participants using these devices acquired verbal repertoires quickly. Studies comparing these devices to picture exchange or manual sign language found that acquisition was often quicker when using a tablet computer and that the vast majority of participants preferred using the device to picture exchange or manual sign language. Future research in interface design, user experience, and extended verbal repertoires is recommended.
Predicting Perceived Vocal Roughness Using a Bio-Inspired Computational Model of Auditory Temporal Envelope Processing
Purpose: Vocal roughness is present in many voice disorders, but its assessment depends mainly on subjective auditory-perceptual evaluation and lacks acoustic correlates. This study aimed to apply the concept of roughness in general sound quality perception to vocal roughness assessment and to characterize the relationship between vocal roughness and temporal envelope fluctuation measures obtained from an auditory model. Method: Ten /ɑ/ recordings with a wide range of roughness were selected from an existing database. Ten listeners rated the roughness of the recordings in a single-variable matching task. Temporal envelope fluctuations of the recordings were analyzed with an auditory processing model of amplitude modulation that uses a modulation filterbank spanning different modulation frequencies. Pitch strength and the smoothed cepstral peak prominence (CPPS) were also obtained for comparison. Results: Individual simple regression models yielded the envelope standard deviation from a modulation filter with a low center frequency (64.3 Hz) as a statistically significant predictor of vocal roughness with a strong coefficient of determination (r² = 0.80). Pitch strength and CPPS were not significant predictors of roughness. Conclusion: This result supports the possible utility of envelope fluctuation measures from an auditory model as objective correlates of vocal roughness.
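As a rough illustration of the envelope measure described above (not the study's auditory model, and with an assumed passband around the reported 64.3 Hz modulation center frequency), one can extract a Hilbert envelope and take the standard deviation of that envelope after band-pass filtering it in the modulation band:

    # Simplified sketch, assuming a Hilbert envelope and a ~45-91 Hz modulation
    # band as a stand-in for the paper's modulation filterbank centred at 64.3 Hz.
    import numpy as np
    from scipy.signal import hilbert, butter, sosfiltfilt

    def modulation_band_envelope_sd(x, fs, f_lo=45.0, f_hi=91.0):
        """SD of the temporal envelope within one modulation band."""
        env = np.abs(hilbert(x))                                   # temporal envelope
        sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, env - env.mean())                  # modulation-band fluctuation
        return float(np.std(band))

    # Toy check: a 150 Hz tone amplitude-modulated at 64 Hz ("rough") versus unmodulated.
    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    carrier = np.sin(2 * np.pi * 150 * t)
    rough = (1 + 0.5 * np.sin(2 * np.pi * 64 * t)) * carrier
    print(modulation_band_envelope_sd(rough, fs), modulation_band_envelope_sd(carrier, fs))

Under these assumptions the modulated token should yield a much larger envelope SD than the unmodulated one, in the same direction as the regression the abstract reports.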
Does the Speech Cue Profile Affect Response to Amplitude Envelope Distortion?
Purpose: A broad area of interest to our group is to understand the consequences of the "cue profile" (a measure of how well a listener can utilize audible temporal and/or spectral cues for listening scenarios in which a subset of cues is distorted). The study goal was to determine if listeners whose cue profile indicated that they primarily used temporal cues for recognition would respond differently to speech-envelope distortion than listeners who utilized both spectral and temporal cues. Method: Twenty-five adults with sensorineural hearing loss participated in the study. The listener's cue profile was measured by analyzing identification patterns for a set of synthetic syllables in which envelope rise time and formant transitions were varied. A linear discriminant analysis quantified the relative contributions of spectral and temporal cues to identification patterns. Low-context sentences in noise were processed with time compression, wide-dynamic range compression, or a combination of time compression and wide-dynamic range compression to create a range of speech-envelope distortions. An acoustic metric, a modified version of the Spectral Correlation Index, was calculated to quantify envelope distortion. Results: A binomial generalized linear mixed-effects model indicated that envelope distortion, the cue profile, the interaction between envelope distortion and the cue profile, and the pure-tone average were significant predictors of sentence recognition. Conclusions: The listeners with good perception of spectro-temporal contrasts were more resilient to the detrimental effects of envelope compression than listeners who used temporal cues to a greater extent. The cue profile may provide information about individual listening that can direct choice of hearing aid parameters, especially those parameters that affect the speech envelope.
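The cue-profile idea lends itself to a small illustration: given each stimulus's cue values and a listener's responses, a linear discriminant analysis can weigh the temporal cue against the spectral cue. The sketch below simulates a spectrally dominant listener; the cue names, ranges, and response model are assumptions for illustration only, not the study's stimuli or analysis.

    # Hedged sketch: quantify relative reliance on a temporal cue (envelope rise
    # time) versus a spectral cue (formant transition slope) with LDA.
    # Simulated data only; not the study's stimuli, coding, or statistics.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    n = 400
    rise_time = rng.uniform(10, 40, n)       # temporal cue, ms (assumed range)
    formant_slope = rng.uniform(-8, 8, n)    # spectral cue, Hz/ms (assumed range)

    # Simulate a listener whose identifications track the spectral cue most strongly.
    latent = 0.2 * (rise_time / 40) + 0.8 * (formant_slope > 0) + rng.normal(0, 0.2, n)
    response = (latent > 0.5).astype(int)

    X = StandardScaler().fit_transform(np.column_stack([rise_time, formant_slope]))
    lda = LinearDiscriminantAnalysis().fit(X, response)
    w_temporal, w_spectral = np.abs(lda.coef_[0])
    total = w_temporal + w_spectral
    print(f"relative temporal weight: {w_temporal / total:.2f}")
    print(f"relative spectral weight: {w_spectral / total:.2f}")

The normalized discriminant weights give a simple per-listener index of cue reliance, analogous in spirit to the cue profile the abstract describes.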
Dissociation Between Linguistic and Nonlinguistic Statistical Learning in Children with Autism
Statistical learning (SL), the ability to detect and extract regularities from inputs, is considered a domain-general building block for typical language development. We compared 55 verbal children with autism (ASD, 6–12 years) and 50 typically developing children in four SL tasks. The ASD group exhibited reduced learning in the linguistic SL tasks (syllable and letter), but showed intact learning for the nonlinguistic SL tasks (tone and image). In the ASD group, better linguistic SL was associated with higher language skills measured by parental report and sentence recall. Therefore, the atypicality of SL in autism is not domain-general but tied to specific processing constraints related to verbal stimuli. Our findings provide a novel perspective for understanding language heterogeneity in autism.
Processing of Acoustic Cues in Lexical-Tone Identification by Pediatric Cochlear-Implant Recipients
Purpose: The objective was to investigate acoustic cue processing in lexical-tone recognition by pediatric cochlear-implant (CI) recipients who are native Mandarin speakers. Method: Lexical-tone recognition was assessed in pediatric CI recipients and listeners with normal hearing (NH) in 2 tasks. In Task 1, participants identified naturally uttered words that were contrastive in lexical tones. For Task 2, a disyllabic word ("yanjing") was manipulated orthogonally, varying in fundamental-frequency (F0) contours and duration patterns. Participants identified each token with the second syllable "jing" pronounced with Tone 1 (a high level tone) as "eyes" or with Tone 4 (a high falling tone) as "eyeglasses." Results: CI participants' recognition accuracy was significantly lower than NH listeners' in Task 1. In Task 2, CI participants' reliance on F0 contours was significantly less than that of NH listeners; their reliance on duration patterns, however, was significantly higher than that of NH listeners. Both CI and NH listeners' performance in Task 1 was significantly correlated with their reliance on F0 contours in Task 2. Conclusion: For pediatric CI recipients, lexical-tone recognition using naturally uttered words is primarily related to their reliance on F0 contours, although duration patterns may be used as an additional cue.
Auditory Sensory Gating of Speech and Nonspeech Stimuli
Purpose: Auditory sensory gating is a neural measure of inhibition and is typically measured with a click or tonal stimulus. This electrophysiological study examined whether stimulus characteristics and the use of speech stimuli affected auditory sensory gating indices. Method: Auditory event-related potentials were elicited using natural speech, synthetic speech, and nonspeech stimuli in a traditional auditory gating paradigm in 15 adult listeners with normal hearing. Cortical responses were recorded at 64 electrode sites, and peak amplitudes and latencies to the different stimuli were extracted. Individual data were analyzed using repeated-measures analysis of variance. Results: Significant gating of P1-N1-P2 peaks was observed for all stimulus types. N1-P2 cortical responses were affected by stimulus type, with significantly less neural inhibition of the P2 response observed for natural speech compared to nonspeech and synthetic speech. Conclusions: Auditory sensory gating responses can be measured using speech and nonspeech stimuli in listeners with normal hearing. The results of the study indicate that the amount of gating and neural inhibition observed is affected by the spectrotemporal characteristics of the stimuli used to evoke the neural responses.
Exploring the “anchor word” effect in infants: Segmentation and categorisation of speech with and without high frequency words
High frequency words play a key role in language acquisition, with recent work suggesting they may serve both speech segmentation and lexical categorisation. However, it is not yet known whether infants can detect novel high frequency words in continuous speech, nor whether they can use them to help learning for segmentation and categorisation at the same time. For instance, when hearing “you eat the biscuit”, can children use the high-frequency words “you” and “the” to segment out “eat” and “biscuit”, and determine their respective lexical categories? We tested this in two experiments. In Experiment 1, we familiarised 12-month-old infants with continuous artificial speech comprising repetitions of target words, which were preceded by high-frequency marker words that distinguished the targets into two distributional categories. In Experiment 2, we repeated the task using the same language but with additional phonological cues to word and category structure. In both studies, we measured learning with head-turn preference tests of segmentation and categorisation, and compared performance against a control group that heard the artificial speech without the marker words (i.e., just the targets). There was no evidence that high frequency words helped either speech segmentation or grammatical categorisation. However, segmentation was seen to improve when the distributional information was supplemented with phonological cues (Experiment 2). In both experiments, exploratory analysis indicated that infants’ looking behaviour was related to their linguistic maturity (indexed by infants’ vocabulary scores), with infants with high versus low vocabulary scores displaying novelty and familiarity preferences, respectively. We propose that high-frequency words must reach a critical threshold of familiarity before they can be of significant benefit to learning.