Catalogue Search | MBRL

Performance Analysis of Feature sets in Speaker Diarization techniques

by Maloji, Suman , Sailaja, C , Manepalli, Kasiprasad in Audio data , Audio equipment , False alarms

2021

Speech is the most important form of communication among humans. Speech signal processing covers many tasks, including speech coding, speaker recognition, and speaker verification. Speaker diarization is the pre-processing stage for many applications of speaker recognition systems. It is the task of determining “who spoke when” in an audio recording that contains an unknown amount of speech and an unknown number of speakers. Speaker diarization has become a key technology for tasks such as navigation, retrieval, and higher-level inference on audio data. It mainly performs three operations: feature extraction, voice activity detection, and classification. In this paper, we review a few speaker diarization techniques. Modern speaker diarization systems have achieved good results. The performance of a few speaker diarization systems is evaluated in terms of diarization error, tracking time, and false alarm.

Journal Article

FAT-Net: A Spectral-Attention Transformer Network for Industrial Audio Anomaly Detection Using MFCC and Raw Features

by Shi, Yanhua

2025

This paper proposes FAT-Net, an audio noise anomaly detection method that integrates big data with a Transformer-based architecture. The model combines Mel-Frequency Cepstral Coefficients (MFCCs) and raw audio features to capture both spectral and temporal characteristics. A novel Spectral Attention Mechanism (SAM) is introduced to enhance sensitivity to anomaly-relevant frequency bands. Experiments were conducted on a large industrial dataset comprising approximately 3,000 audio recordings collected under real manufacturing conditions. FAT-Net was evaluated using accuracy, precision, recall, and F1-score as metrics, achieving a best F1-score of 98.05% and outperforming baseline models such as CNN (90.31%), LSTM (89.04%), and MFCC+LSTM (97.04%). These results demonstrate the effectiveness and generalization capability of FAT-Net for deployment in industrial environments.

Journal Article

Real-Time Smart-Digital Stethoscope System for Heart Diseases Monitoring

by Mansoor, Samar , M. Tahir, Anas , Khandakar, Amith in Algorithms , Automation , Cardiology

2019

One of the major causes of death all over the world is heart disease or cardiac dysfunction. These diseases could be identified easily from variations in the sound produced by heart activity. Such sophisticated auscultation requires substantial clinical experience and concentrated listening skills. Therefore, there is an unmet need for a portable system for the early detection of cardiac illnesses. This paper proposes a prototype of a smart digital-stethoscope system that monitors a patient’s heart sounds and diagnoses any abnormality in real time. The system consists of two subsystems that communicate wirelessly using Bluetooth low energy technology: a portable digital stethoscope subsystem and a computer-based decision-making subsystem. The portable subsystem captures the patient’s heart sounds, filters and digitizes them, and sends them wirelessly to a personal computer, which visualizes the heart sounds and processes them further to decide whether they are normal or abnormal. Twenty-seven time-domain, frequency-domain, and Mel frequency cepstral coefficient (MFCC) features from a public database were used for training, to identify the best-performing algorithm for classifying abnormal and normal heart sounds (HS). Hyperparameter optimization, with and without a feature reduction method, was tested to improve accuracy. The cost-adjusted optimized ensemble algorithm can produce 97% and 88% accuracy in classifying abnormal and normal HS, respectively.

Journal Article

Questionnaires for the Assessment of Cognitive Function Secondary to Intake Interviews in In-Hospital Work and Development and Evaluation of a Classification Model Using Acoustic Features

by Yumi Umeda-Kameyama , Toshiharu Igarashi , Misato Nihei in Accuracy , Acoustics , Aged

2023

The number of people with dementia is increasing each year, and early detection allows for early intervention and treatment. Since conventional screening methods are time-consuming and expensive, a simple and inexpensive screening is expected. We created a standardized intake questionnaire with thirty questions in five categories and used machine learning to categorize older adults with moderate and mild dementia and mild cognitive impairment, based on speech patterns. To evaluate the feasibility of the developed interview items and the accuracy of the classification model based on acoustic features, 29 participants (7 males and 22 females) aged 72 to 91 years were recruited with the approval of the University of Tokyo Hospital. The MMSE results showed that 12 participants had moderate dementia with MMSE scores of 20 or less, 8 participants had mild dementia with MMSE scores between 21 and 23, and 9 participants had MCI with MMSE scores between 24 and 27. As a result, Mel-spectrogram generally outperformed MFCC in terms of accuracy, precision, recall, and F1-score in all classification tasks. The multi-classification using Mel-spectrogram achieved the highest accuracy of 0.932, while the binary classification of moderate dementia and MCI group using MFCC achieved the lowest accuracy of 0.502. The FDR was generally low for all classification tasks, indicating a low rate of false positives. However, the FNR was relatively high in some cases, indicating a higher rate of false negatives.
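The MMSE cut-offs and the two error rates named in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the example counts are hypothetical, and FDR and FNR are assumed to follow their standard confusion-matrix definitions.

```python
def mmse_group(score: int) -> str:
    """Map an MMSE score to the group labels quoted in the abstract."""
    if score <= 20:
        return "moderate dementia"
    if score <= 23:
        return "mild dementia"
    if score <= 27:
        return "MCI"
    # Scores above 27 fall outside the ranges reported in the abstract.
    return "outside reported ranges"


def fdr(tp: int, fp: int) -> float:
    """False discovery rate: share of positive predictions that are wrong."""
    total = tp + fp
    return fp / total if total else 0.0


def fnr(tp: int, fn: int) -> float:
    """False negative rate: share of actual positives that are missed."""
    total = tp + fn
    return fn / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show a low FDR next to a higher FNR.
    print(mmse_group(22), fdr(tp=9, fp=1), fnr(tp=8, fn=2))
```

A low FDR alongside a higher FNR, as the abstract reports, means that positive predictions are mostly correct while a larger share of true cases goes undetected.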

Journal Article

Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review

by Ben Othman, Mohamed Tahar , Belkhier, Youcef , Rehman, Ateeq Ur in acoustic model , automatic speech recognition , children’s speech recognition

2022

Automatic speech recognition (ASR) is one of the ways used to transform acoustic speech signals into text. Over the last few decades, an enormous amount of research work has been done in the research area of speech recognition (SR). However, most studies have focused on building ASR systems based on adult speech. The recognition of children’s speech was neglected for some time, which means that the field of children’s SR research is wide open. Children’s SR is a challenging task due to the large variations in children’s articulatory, acoustic, physical, and linguistic characteristics compared to adult speech. Thus, the field became a very attractive area of research and it is important to understand where the main center of attention is, and what are the most widely used methods for extracting acoustic features, various acoustic models, speech datasets, the SR toolkits used during the recognition process, and so on. ASR systems or interfaces are extensively used and integrated into various real-life applications, such as search engines, the healthcare industry, biometric analysis, car systems, the military, aids for people with disabilities, and mobile devices. A systematic literature review (SLR) is presented in this work by extracting the relevant information from 76 research papers published from 2009 to 2020 in the field of ASR for children. The objective of this review is to throw light on the trends of research in children’s speech recognition and analyze the potential of trending techniques to recognize children’s speech.

Journal Article

Non-Contact Monitoring and Classification of Breathing Pattern for the Supervision of People Infected by COVID-19

by Lin, Ding-Bing , Hendria, Willy Fitra , Adiprabowo, Tjahjo in Auscultation , Body temperature , Cameras

2021

During the pandemic of coronavirus disease-2019 (COVID-19), medical practitioners need non-contact devices to reduce the risk of spreading the virus. People with COVID-19 usually experience fever and have difficulty breathing. Unsupervised care to patients with respiratory problems will be the main reason for the rising death rate. Periodic linearly increasing frequency chirp, known as frequency-modulated continuous wave (FMCW), is one of the radar technologies with a low-power operation and high-resolution detection which can detect any tiny movement. In this study, we use FMCW to develop a non-contact medical device that monitors and classifies the breathing pattern in real time. Patients with a breathing disorder have an unusual breathing characteristic that cannot be represented using the breathing rate. Thus, we created an Xtreme Gradient Boosting (XGBoost) classification model and adopted Mel-frequency cepstral coefficient (MFCC) feature extraction to classify the breathing pattern behavior. XGBoost is an ensemble machine-learning technique with a fast execution time and good scalability for predictions. In this study, MFCC feature extraction assists machine learning in extracting the features of the breathing signal. Based on the results, the system obtained an acceptable accuracy. Thus, our proposed system could potentially be used to detect and monitor the presence of respiratory problems in patients with COVID-19, asthma, etc.

Journal Article

Combination of VMD Mapping MFCC and LSTM: A New Acoustic Fault Diagnosis Method of Diesel Engine

by Jia, Xisheng , Yan, Hao , Wen, Liang in Accuracy , Acoustic properties , acoustic signals

2022

Diesel engines have a wide range of functions in the industrial and military fields. An urgent problem is how to diagnose and identify their faults effectively and in a timely manner. In this paper, a diesel engine acoustic fault diagnosis method is proposed that is based on variational mode decomposition (VMD) mapping of Mel frequency cepstral coefficients (MFCC) and a long short-term memory (LSTM) network. VMD is used to remove noise from the original signal and to separate the signal into multiple modes. The sound pressure signals of the different modes are mapped to the Mel filter bank in the frequency domain, and the Mel frequency cepstral coefficients of each mode signal are then calculated within the mapped frequency range. The optimized coefficients are used as the input of the LSTM network, which is trained and verified to obtain the fault diagnosis model of the diesel engine. The experimental part compares the fault diagnosis performance of different feature extraction methods, modal decomposition methods, and classifiers, verifying the feasibility and effectiveness of the proposed method and providing a solution to the problem of fault diagnosis using acoustic signals.
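The mapping of a frequency range onto a Mel filter bank can be sketched as below. This is an illustrative fragment under stated assumptions, not the method from the paper: it uses the widely used 2595 · log10(1 + f/700) Mel formula, and the function names, filter count, and frequency limits are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to the Mel scale (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of filters spaced evenly on the Mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical band limits; each decomposed mode could use its own range.
    for centre in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(centre, 1))
```

Because the filters are evenly spaced in Mel rather than in Hz, they are dense at low frequencies and sparse at high frequencies, which is the property MFCC-based features rely on.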

Journal Article

A Novel Underwater Acoustic Target Recognition Method Based on MFCC and RACNN

by Wang, Baozhu , Liu, Dali , Hou, Weimin in Accuracy , Acoustics , Algorithms

2024

In ocean remote sensing missions, recognizing an underwater acoustic target is a crucial technology for conducting marine biological surveys, ocean explorations, and other scientific activities that take place in water. The complex acoustic propagation characteristics present significant challenges for underwater acoustic target recognition (UATR). Methods such as extracting the DEMON spectrum of a signal and inputting it into an artificial neural network for recognition, and fusing the multidimensional features of a signal for recognition, have been proposed. However, there is still room for improvement in terms of noise immunity, computational performance, and reliance on specialized knowledge. In this article, we propose the Residual Attentional Convolutional Neural Network (RACNN), a convolutional neural network that quickly and accurately recognizes the type of ship-radiated noise. This network is capable of extracting internal features of the Mel Frequency Cepstral Coefficients (MFCC) of underwater ship-radiated noise. Experimental results demonstrate that the proposed model achieves an overall accuracy of 99.34% on the ShipsEar dataset, surpassing conventional recognition methods and other deep learning models.

Journal Article

Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders

by Cho, Young-Im , Makhmudov, Fazliddin , Oteniyazov, Rashid in Accuracy , Algorithms , Analysis

2023

Understanding and identifying emotional cues in human speech is a crucial aspect of human–computer communication. The application of computer technology in dissecting and deciphering emotions, along with the extraction of relevant emotional characteristics from speech, forms a significant part of this process. The objective of this study was to architect an innovative framework for speech emotion recognition predicated on spectrograms and semantic feature transcribers, aiming to bolster performance precision by acknowledging the conspicuous inadequacies in extant methodologies and rectifying them. To procure invaluable attributes for speech detection, this investigation leveraged two divergent strategies. Primarily, a wholly convolutional neural network model was engaged to transcribe speech spectrograms. Subsequently, a cutting-edge Mel-frequency cepstral coefficient feature abstraction approach was adopted and integrated with Speech2Vec for semantic feature encoding. These dual forms of attributes underwent individual processing before they were channeled into a long short-term memory network and a comprehensive connected layer for supplementary representation. By doing so, we aimed to bolster the sophistication and efficacy of our speech emotion detection model, thereby enhancing its potential to accurately recognize and interpret emotion from human speech. The proposed mechanism underwent a rigorous evaluation process employing two distinct databases: RAVDESS and EMO-DB. The outcome displayed a predominant performance when juxtaposed with established models, registering an impressive accuracy of 94.8% on the RAVDESS dataset and a commendable 94.0% on the EMO-DB dataset. This superior performance underscores the efficacy of our innovative system in the realm of speech emotion recognition, as it outperforms current frameworks in accuracy metrics.

Journal Article

TriSpectraKAN: a novel approach for COPD detection via lung sound analysis

by Oza, Aditya , Roy, Abhinav , Singh, Anurag in 631/114/1305 , 692/308 , 692/699/1785

2025

This study aims to create an automated, accessible, and cost-effective diagnostic tool for chronic obstructive pulmonary disease (COPD). Traditional diagnostic methods are expensive, time-consuming, and require specialized equipment. The proposed TriSpectraKAN model leverages audio-based lung sound features to improve early diagnosis. TriSpectraKAN is a hybrid model combining spectral features and the Kolmogorov–Arnold Network (KAN) to analyze lung sounds using Mel-frequency cepstral coefficients (MFCCs), chromagram, and Mel spectrograms. Each sub-model focuses on a different audio feature, capturing unique sonic signatures. These features are merged through a hybrid network for comprehensive analysis. The model, trained on a COPD dataset, was deployed on a Raspberry Pi for real-time use. TriSpectraKAN achieved 93% accuracy, an F1 score of 0.98, precision of 0.97, and recall of 0.98. This multimodal approach captured a broad range of lung sound features, improving diagnosis accuracy compared to traditional methods. The integration of multiple audio features in TriSpectraKAN enhances COPD diagnosis, demonstrating the potential of AI and machine learning to transform respiratory disease diagnosis through accessible tools.
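How the F1 score relates to the precision and recall quoted in this abstract can be sketched as follows. This is a minimal illustration assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name is hypothetical and the code is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rounded precision and recall as quoted in the abstract.
    print(f"{f1_score(0.97, 0.98):.4f}")
```

With the rounded inputs 0.97 and 0.98 the formula gives roughly 0.975. The small gap from the reported 0.98 is consistent with the published precision and recall being rounded values.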

Journal Article
