56,836 results for "Voice recognition"
Toward a Chatbot for Financial Sustainability
This study examines the effectiveness of technology for industry demand in which artificial intelligence (AI) is applied in the financial sector. It summarizes prior studies on chatbots and customer service and reviews theories on acceptance attitudes toward innovative technologies. By setting variables, the study methodologically examines bank revenue and assesses the impact of customer service and chatbots on bank revenue through customer age classification. The results indicate that new product-oriented funds or housing subscription savings are more suitable for purchase through customer service than through a chatbot, whereas chatbot services for existing products positively affect banks’ net income. When classified by age, purchases by the majority age group in a channel positively affect bank profits. Finally, small banking transactions tend to be processed through the chatbot system, which saves transaction and management costs and positively affects profits. Through empirical analysis, we first examine the effect of an AI-based chatbot system implemented to strengthen financial soundness and suggest policy alternatives. Second, we use banking data to increase the study’s real-world applicability and show that problems in customer service can be solved through a chatbot system. Finally, we investigate how resistance to the technology can be reduced and the technology efficiently accommodated.
Normal recognition of famous voices in developmental prosopagnosia
Developmental prosopagnosia (DP) is a condition characterised by lifelong face recognition difficulties. Recent neuroimaging findings suggest that DP may be associated with aberrant structure and function in multimodal regions of cortex implicated in the processing of both facial and vocal identity. These findings suggest that both facial and vocal recognition may be impaired in DP. To test this possibility, we compared the performance of 22 DPs and a group of typical controls on closely matched tasks that assessed famous face and famous voice recognition ability. As expected, the DPs showed severe impairment on the face recognition task relative to typical controls. In contrast, however, the DPs and controls identified a similar number of voices. Despite evidence of interactions between facial and vocal processing, these findings suggest some degree of dissociation between the two processing pathways, whereby one can be impaired while the other develops typically. A possible explanation for this dissociation in DP could be that the deficit originates in the early perceptual encoding of face structure, rather than at later, post-perceptual stages of face identity processing, which may be more likely to involve interactions with other modalities.
Neural dissociation of the acoustic and cognitive representation of voice identity
Recognising a speaker's identity by the sound of their voice is important for successful interaction. The skill depends on our ability to discriminate minute variations in the acoustics of the vocal signal. Performance on voice identity assessments varies widely across the population, yet the neural underpinnings of this ability and its individual differences remain poorly understood. Here we provide critical tests of a theoretical framework for the neural processing stages of voice identity and address how individual differences in identity discrimination mediate activation in this neural network. We scanned 40 individuals on an fMRI adaptation task involving voices drawn from morphed continua between two personally familiar identities. Analyses dissociated neuronal effects induced by repetition of acoustically similar morphs from those induced by a switch in perceived identity. Activation in temporal voice-sensitive areas decreased with acoustic similarity between consecutive stimuli. This repetition suppression effect was mediated by performance on an independent voice assessment, highlighting an important functional role of adaptive coding in voice expertise. Bilateral anterior insulae and medial frontal gyri responded to a switch in perceived voice identity compared to an acoustically equidistant switch within identity. Our results support a multistep model of voice identity perception.
Neural Network-Enabled Flexible Pressure and Temperature Sensor with Honeycomb-like Architecture for Voice Recognition
Flexible pressure sensors have been studied as wearable voice-recognition devices for human-machine interaction. However, developing highly sensitive, skin-attachable, and comfortable sensing devices that achieve clear voice detection remains a considerable challenge. Herein, we present a wearable and flexible pressure and temperature sensor with a sensitive response to vibration, which can accurately recognize the human voice when combined with an artificial neural network. The device consists of a polyethylene terephthalate (PET) film printed with a silver electrode, a filament-microstructured polydimethylsiloxane (PDMS) film embedded with single-walled carbon nanotubes, and a polyimide (PI) film sputtered with a patterned Ti/Pt thermistor strip. The pressure sensor exhibits a sensitivity of 0.398 kPa−1 in the low-pressure regime, and the temperature sensor shows a desirable temperature coefficient of resistance of 0.13%/°C in the range of 25 °C to 105 °C. By training and testing the neural network model with waveform data of the sensor obtained from human pronunciation, the vocal fold vibrations of different words can be successfully recognized, with a total recognition accuracy of 93.4%. Our results suggest that the fabricated sensor has substantial potential for application in human-computer interface fields such as voice control, vocal healthcare monitoring, and voice authentication.
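The abstract above trains a neural network on sensor waveform data to classify spoken words. As a minimal, stdlib-only sketch of the general idea (not the authors' model; the two features and the nearest-centroid rule are illustrative assumptions), each waveform can be reduced to simple features and classified by the closest class centroid:

```python
import math

def waveform_features(samples):
    """Two simple features of a vibration waveform: RMS energy and zero-crossing rate."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (n - 1)
    return (rms, zcr)

def nearest_centroid(train, query):
    """train maps label -> list of feature tuples; return the label whose
    mean feature vector is closest (Euclidean) to the query features."""
    def centroid(vecs):
        return tuple(sum(c) / len(c) for c in zip(*vecs))
    cents = {label: centroid(vecs) for label, vecs in train.items()}
    return min(cents, key=lambda label: math.dist(cents[label], query))
```

A real system would use many labelled recordings per word and a trained network; the centroid rule only illustrates the classify-by-feature-distance step.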
A French Version of a Voice Recognition Symbol Digit Modalities Test Analog
We previously showed that a fully automated voice recognition analog of the Symbol Digit Modalities Test (VR-SDMT) is sensitive in detecting processing speed deficits in people with multiple sclerosis (pwMS). We subsequently developed a French language version and administered it to 49 French-Canadian pwMS and 29 matched healthy control (HC) subjects. Significant correlations between the VR-SDMT and traditional oral SDMT were found in the MS (r = −0.716, p < 0.001) and HC (r = −0.623, p < 0.001) groups. These findings in French replicate our previous findings and confirm the utility of voice recognition software in assessing cognition in pwMS.
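The abstract above reports Pearson correlations between the VR-SDMT and the oral SDMT. A small sketch of the statistic itself (the standard formula, not the study's analysis code):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences:
    covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A negative r, as reported in the abstract, simply means the two measures move in opposite directions (e.g. longer completion times pairing with lower scores).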
Evaluation of a Sensor System for Detecting Humans Trapped under Rubble: A Pilot Study
Rapid localization of injured survivors by rescue teams is critical to preventing deaths. In this paper, a sensor system for human rescue comprising three different types of sensors, a CO2 sensor, a thermal camera, and a microphone, is proposed. The performance of this system in detecting living victims under rubble has been tested in a high-fidelity simulated disaster area. Results show that the CO2 sensor is useful for effectively narrowing down the area of concern, while the thermal camera can confirm the victim's position. Moreover, it is believed that the use of microphones in combination with the other sensors would be of great benefit for the detection of casualties. In this work, an algorithm to recognize voices or suspected human noise under rubble has also been developed and tested.
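The abstract above mentions an algorithm for recognizing voices or suspected human noise picked up by a microphone. A hedged, stdlib-only sketch of one crude building block of such a detector (frame-level RMS energy thresholding; the frame size and threshold are illustrative assumptions, not the authors' algorithm):

```python
import math

def detect_activity(samples, rate, frame_ms=50, threshold=0.1):
    """Split a microphone signal into fixed-length frames and flag each
    frame whose RMS energy exceeds a threshold -- a crude indicator of
    possible voice or human-made noise on an otherwise quiet channel."""
    frame = max(1, int(rate * frame_ms / 1000))
    flags = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(s * s for s in chunk) / frame)
        flags.append(rms > threshold)
    return flags
```

A deployed system would add spectral cues and fuse the result with the CO2 and thermal readings; energy thresholding alone cannot separate voices from machinery noise.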
Design and Development of Smart Blind Stick for Visually Impaired People
Blindness is a condition in which a person loses the ability to see because of physiological or neurological issues. This paper proposes a smart blind stick that uses modern technologies to make traveling easier for visually impaired people. Ultrasonic, light, water, and height sensors are used in the blind stick. An ultrasonic sensor is used to identify obstacles ahead of the user. In addition, water sensors detect the presence of water in the surrounding region. One ultrasonic sensor is placed on the walking stick to classify the height of a barrier. An LDR is used to distinguish day from night, and GPS helps blind people track their location. Furthermore, a voice recognition system was employed to deliver messages by human voice. The stick's height can be adjusted easily. After studying customers' demands, several ideas were incorporated into a product prototype. A cost analysis has been conducted, showing that mass production of the product is quite profitable. The smart blind stick is a low-cost, fast, and easy solution for blind and visually impaired people in third-world countries.
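The abstract above relies on ultrasonic sensors to detect obstacles. A small sketch of the standard echo-ranging arithmetic such a stick depends on (the temperature-dependent speed-of-sound formula is a common approximation, not taken from the paper):

```python
def echo_distance_cm(echo_time_s, temp_c=20.0):
    """Obstacle distance from an ultrasonic echo time. The pulse travels to
    the obstacle and back, so distance = speed * time / 2. The speed of
    sound in air is approximately 331.3 + 0.606 * T m/s at temperature T in
    degrees Celsius."""
    speed_m_s = 331.3 + 0.606 * temp_c
    return speed_m_s * echo_time_s / 2 * 100  # metres -> centimetres
```

On a microcontroller the echo time comes from timing the sensor's echo pin; the stick would compare the resulting distance to an alert threshold before vibrating or speaking a warning.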
A hybrid model for unsupervised single channel speech separation
The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from a single-channel speech mixture under unsupervised conditions. The proposed method combines three techniques: speech segmentation, NMF (Nonnegative Matrix Factorization), and masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying the speaker change-over points. The segmentation block groups the speech frames belonging to individual speakers but lacks continuity of the speech samples; therefore, a second stage is built using NMF. The NMF algorithm performs better at separating the speech mixture when parts of the individual speech signals are known a priori, a requirement satisfied by the speech segmentation stage. NMF further separates the individual speech signals in the mixture while maintaining continuity of the speech samples over time. To further improve the accuracy of the separated speech signals, various masking methods, namely TFR (Time-Frequency Ratio), SM (Soft Mask), and HM (Hard Mask), are applied. The separation results are compared with other unsupervised algorithms. The proposed hybrid model produces promising results in unsupervised single-channel speech separation and can be applied at the front end of any voice recognition platform to further improve recognition efficiency.
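The abstract above builds its second stage on NMF. A compact, stdlib-only sketch of the classic multiplicative-update NMF of Lee and Seung (squared-error form, factoring a nonnegative matrix V into W and H; a generic illustration, not the paper's implementation):

```python
import random

def nmf(V, rank, iters=200, seed=0):
    """Factor a nonnegative matrix V (list of rows) as V ~= W @ H using
    multiplicative updates that minimize squared Euclidean error."""
    rnd = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rnd.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rnd.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def T(A):  # transpose
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(iters):
        WH = matmul(W, H)
        WtV, WtWH = matmul(T(W), V), matmul(T(W), WH)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        WH = matmul(W, H)
        VHt, WHHt = matmul(V, T(H)), matmul(WH, T(H))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H
```

In speech separation, V would be a magnitude spectrogram and each speaker's segments (from the segmentation stage) would seed or constrain the corresponding columns of W.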
An Efficient Framework of Human Voice Verification for Robotic Applications
Wavelets can be used to decode speech signals more efficiently. This study presents an accessible and robust approach for obtaining voice recognition features. We suggest a new text-related method for the identification of human voices (TDHVR), which utilizes the discrete wavelet transform (DWT) for low-level feature extraction, the Relative Spectral Algorithm (RSA) for denoising the voice signal, and Additive Prognostication (AP) for estimating the formants. First, the proposed methods are applied to the enrolled voice signals, and a training feature vector is constructed from the derived low-level features and estimated formant parameters. The same technique is then applied to the test speech signal to construct a test feature vector. The Euclidean distance between the vectors is then used to compare them and distinguish one voice from another: the probe voice is judged to match the enrolled speaker’s voice if the distance between the two vectors is nearly zero. Computational results were compared with the LPC scheme; verification trials carried out with the fifty preconfigured voice signals of six speakers reached a best accuracy of approximately 90 percent, and the suggested methodology surpassed the existing one.
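The abstract above extracts features with the DWT and matches voices by Euclidean distance. A hedged sketch of both steps (a one-level Haar transform and nearest-distance matching; combining exactly these two functions is an illustrative assumption, not the paper's pipeline):

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: pairwise sums give
    the approximation coefficients, pairwise differences the detail
    coefficients, each scaled by 1/sqrt(2). Signal length must be even."""
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def match_speaker(enrolled, probe_features):
    """Return the enrolled label whose feature vector has the smallest
    Euclidean distance to the probe's feature vector."""
    return min(enrolled, key=lambda label: math.dist(enrolled[label], probe_features))
```

A full system would apply several DWT levels, denoise, and append formant estimates before the distance comparison.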
Deepfake Voice Detection: An Approach Using End-to-End Transformer with Acoustic Feature Fusion by Cross-Attention
Deepfake technology uses artificial intelligence to create highly realistic but fake audio, video, or images that are often difficult to distinguish from real content. Because of its potential use for misinformation, fraud, and identity theft, deepfake technology has gained a bad reputation in the digital world. Recently, many works have reported on the detection of deepfake videos and images, but few studies have concentrated on developing robust deepfake voice detection systems. In most existing studies in this field, a deepfake voice detection system requires a large amount of training data and a robust backbone to distinguish real audio from logical access attack audio. For acoustic feature extraction, Mel-frequency Filter Bank (MFB)-based approaches are more suitable for speech signals than using the raw spectrum as input. Recurrent Neural Networks (RNNs) have been successfully applied to Natural Language Processing (NLP), but these backbones suffer from gradient vanishing or explosion when processing long sequences. In addition, most deepfake voice recognition systems perform weakly in cross-dataset evaluation, indicating a robustness issue. To address these issues, we propose an acoustic feature-fusion method that combines Mel-spectrum and pitch representations using cross-attention mechanisms. We then combine a Transformer encoder with a convolutional neural network block to extract global and local features as a front end, and connect the back end to a single linear layer for classification. We summarize the performance of several deepfake voice detectors on the silence-segment-processed ASVspoof 2019 dataset. Our proposed method achieves an Equal Error Rate (EER) of 26.41%, while most existing methods result in an EER higher than 30%. We also tested our method on the ASVspoof 2021 dataset and found that it achieves an EER as low as 28.52%, while the EER values of existing methods are all higher than 28.9%.
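The abstract above reports results as Equal Error Rates. A small, stdlib-only sketch of how an EER is typically computed from genuine and impostor scores (a generic illustration of the metric, not the paper's evaluation code):

```python
def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over all observed scores and return the
    operating point where the false-acceptance rate (impostors accepted)
    and false-rejection rate (genuine trials rejected) are closest.
    Higher scores mean 'more likely genuine'."""
    best_far, best_frr = 1.0, 1.0
    best_gap = float("inf")
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            best_far, best_frr = far, frr
    return (best_far + best_frr) / 2
```

Reported EERs such as 26.41% correspond to the threshold at which the detector misses fakes exactly as often as it flags genuine audio.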