129 results for "automatic sound classification"
A Spiking Neural Network Framework for Robust Sound Classification
Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, and hence high power consumption, remains a major hurdle for large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans are highly effective and consume little power whilst analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses an unsupervised self-organizing map (SOM) to represent the frequency content embedded within the acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking-pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken-digit datasets, which demonstrate classification accuracies competitive with other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we demonstrate the early decision-making capability of the proposed framework: an accurate classification can be made with only a partial presentation of the input.
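The SOM stage described above is straightforward to prototype. Below is a minimal NumPy sketch of a self-organizing map trained on frame-level spectral features, mapping each frame to a best-matching unit (BMU); the grid size, learning-rate and neighborhood schedules, and the random features are illustrative assumptions, not the SOM-SNN paper's settings, and the spiking-network stage is omitted.

```python
# Minimal self-organizing map (SOM) sketch: map frame-level spectral
# features onto a 2D grid, as in the SOM stage of SOM-SNN. Grid size,
# learning rate, and neighborhood schedule are illustrative assumptions.
import numpy as np

def train_som(features, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h * w, features.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    n_steps, step = epochs * len(features), 0
    for _ in range(epochs):
        for x in rng.permutation(features):
            t = step / n_steps
            lr = lr0 * (1 - t)                   # linearly decayed learning rate
            sigma = sigma0 * (1 - t) + 1e-3      # shrinking neighborhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            neigh = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * neigh[:, None] * (x - weights)
            step += 1
    return weights

# Usage: the BMU index sequence of a sound clip can then drive an
# event-based (spiking) classifier downstream.
frames = np.random.default_rng(1).random((200, 40))  # 200 frames, 40 spectral bins
som = train_som(frames)
bmu_sequence = np.argmin(((som[None] - frames[:, None]) ** 2).sum(-1), axis=1)
```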
Classification of Heart Sounds Using Convolutional Neural Network
Heart sounds play an important role in the diagnosis of cardiac conditions. Due to the low signal-to-noise ratio (SNR), it is difficult and time-consuming for experts to discriminate between different kinds of heart sounds. Thus, objective classification of heart sounds is essential. In this study, we combined a conventional feature engineering method with deep learning algorithms to automatically classify normal and abnormal heart sounds. First, 497 features were extracted from eight domains. Then, we fed these features into the designed convolutional neural network (CNN), in which the fully connected layers usually placed before the classification layer were replaced with a global average pooling layer to capture global information about the feature maps and avoid overfitting. To address the class imbalance, class weights were set in the loss function during training to improve the classification performance. Stratified five-fold cross-validation was used to evaluate the proposed method. The mean accuracy, sensitivity, specificity, and Matthews correlation coefficient observed on the PhysioNet/CinC Challenge 2016 dataset were 86.8%, 87%, 86.6%, and 72.1%, respectively. The proposed algorithm achieves an appropriate trade-off between sensitivity and specificity.
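The two design choices called out above, a global-average-pooling head in place of fully connected layers and class weights in the loss, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's architecture: the layer sizes, input shape, and class counts are assumptions.

```python
# Sketch: CNN with a global-average-pooling head and a class-weighted
# loss to counter class imbalance. Layer sizes and the feature-map
# shape are illustrative assumptions.
import torch
import torch.nn as nn

class GapCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.gap = nn.AdaptiveAvgPool2d(1)     # global average pooling head
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        x = self.gap(self.features(x)).flatten(1)
        return self.classifier(x)

# Class weights inversely proportional to class frequency (assumed counts).
counts = torch.tensor([1000.0, 300.0])          # e.g. normal vs abnormal
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)

model = GapCNN()
logits = model(torch.randn(8, 1, 64, 64))       # batch of 2D feature maps
loss = criterion(logits, torch.randint(0, 2, (8,)))
```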
Low-complexity F0-based speech/nonspeech discrimination approach for digital hearing aids
Digital hearing aids impose strong complexity and memory constraints on the digital signal processing algorithms that implement their applications. This paper proposes a low-complexity approach for automatic sound classification in digital hearing aids. The proposed scheme, which operates on a frame-by-frame basis, consists of two stages: an analysis stage and a classification stage. The analysis stage provides a set of low-complexity signal features derived from fundamental frequency (F0) estimation. Here, F0 estimation is performed with a decimated difference function, which results in a reduced-complexity analysis stage. The classification stage was designed to reduce complexity while maintaining high accuracy rates. Three low-complexity classifiers were evaluated (the tree-based C4.5, the 1-nearest neighbor (1-NN), and a multilayer perceptron (MLP)); the MLP was chosen because it provides the best accuracy rates and fits the computational and memory constraints of ultra-low-power DSP-based hearing aids. The classification stage is composed of an MLP classifier followed by a hidden Markov model (HMM), providing a good trade-off between complexity and classification accuracy. The goal of the proposed approach is to perform robust discrimination between the speech and nonspeech parts of audio signals in commercial digital hearing aids, where computational cost is a critical issue. For the experiments, an audio database including speech, music, and noise signals was used.
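A minimal sketch of the analysis stage follows: F0 estimation with a difference function computed on a decimated frame. The decimation factor, frame length, lag range, and dip threshold are illustrative assumptions, and the cumulative-mean normalization follows the well-known YIN recipe rather than the paper's exact formulation.

```python
# Sketch: F0 estimation via a difference function on a decimated frame.
# Parameters are illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.signal import decimate

def estimate_f0(frame, fs, decim=4, f0_min=60.0, f0_max=400.0, thresh=0.1):
    x = decimate(frame, decim)                  # cut the sample rate first
    fs_d = fs / decim
    lag_min, lag_max = int(fs_d / f0_max), int(fs_d / f0_min)
    n = len(x) - lag_max                        # fixed comparison window
    lags = np.arange(1, lag_max + 1)
    d = np.array([np.sum((x[:n] - x[tau:tau + n]) ** 2) for tau in lags])
    cmnd = d * lags / np.maximum(np.cumsum(d), 1e-12)  # cumulative-mean norm (YIN)
    search = cmnd[lag_min - 1:]
    below = np.where(search < thresh)[0]        # first clear dip = period
    tau = lag_min + (below[0] if below.size else int(np.argmin(search)))
    return fs_d / tau

fs = 16000
t = np.arange(int(0.064 * fs)) / fs             # one 64 ms frame
print(estimate_f0(np.sin(2 * np.pi * 200 * t), fs))   # ~200.0 Hz
```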
Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review
The aim of this systematic literature review (SLR) is to identify and critically evaluate current research advancements with respect to small data and the use of data augmentation methods to increase the amount of data available for deep learning classifiers for sound (including voice, speech, and related audio signals) classification. Methodology: This SLR was carried out following standard PRISMA-based SLR guidelines, and three bibliographic databases were examined, namely Web of Science, SCOPUS, and IEEE Xplore. Findings: The initial searches using a variety of keyword combinations over the last five years (2017–2021) returned a total of 131 papers. To select relevant articles within the scope of this study, we applied screening exclusion criteria and snowballing (forward and backward), which resulted in 56 selected articles. Originality: Shortcomings of previous research include the lack of sufficient data, weakly labelled data, unbalanced datasets, noisy datasets, poor representations of sound features, and the lack of effective augmentation approaches, all of which affect the overall performance of classifiers, as we discuss in this article. Following the analysis of the identified articles, we give an overview of the sound datasets, feature extraction methods, data augmentation techniques, and their applications across different areas of the sound classification research problem. Finally, we conclude with a summary of the SLR, answers to the research questions, and recommendations for the sound classification task.
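Two of the waveform-level augmentations such reviews commonly cover, additive noise at a chosen signal-to-noise ratio and a random time shift, can be sketched as follows; the SNR, shift fraction, and synthetic clip are illustrative assumptions.

```python
# Sketch of two common waveform-level audio augmentations.
import numpy as np

def add_noise(x, snr_db, rng):
    noise = rng.standard_normal(len(x))
    # Scale noise so that 10*log10(P_signal / P_noise) == snr_db.
    p_sig, p_noise = np.mean(x ** 2), np.mean(noise ** 2)
    noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return x + noise

def time_shift(x, max_frac, rng):
    limit = int(max_frac * len(x))
    return np.roll(x, rng.integers(-limit, limit + 1))  # circular shift

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of audio
augmented = [time_shift(add_noise(clip, snr_db=20, rng=rng), 0.1, rng)
             for _ in range(4)]                            # 4 new variants
```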
Transformers for Urban Sound Classification—A Comprehensive Performance Evaluation
Many relevant sound events occur in urban scenarios, and robust classification models are required to identify abnormal and relevant events correctly. These models need to identify such events within a useful time frame, being both effective and prompt. It is also essential to determine for how long these events prevail. This article presents an extensive analysis developed to identify the best-performing model for classifying a broad set of sound events occurring in urban scenarios. Transformer models were analyzed and trained on available public datasets with different sets of sound classes, and their performance was compared to that of the baseline model and of end-to-end convolutional models. Furthermore, the benefits of pre-training from the image and sound domains and of data augmentation techniques were identified. Additionally, complementary methods for improving model performance and good practices for obtaining robust sound classification models were investigated. After an extensive evaluation, the most promising results were obtained by a Transformer model trained with an Adam optimizer with weight decay and transfer learning from the audio domain, reusing the weights from AudioSet, which led to accuracy scores of 89.8% on the UrbanSound8K dataset, 95.8% on the ESC-50 dataset, and 99% on the ESC-10 dataset.
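The winning recipe, fine-tuning a pretrained audio Transformer with Adam plus decoupled weight decay (AdamW), can be sketched as below. The model is a stand-in module, the checkpoint path is hypothetical, and all hyperparameters are assumptions; only the optimizer setup reflects the technique named above.

```python
# Sketch: fine-tuning with AdamW (Adam with decoupled weight decay).
import torch
import torch.nn as nn

model = nn.Sequential(                       # stand-in for an audio Transformer
    nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 10))
# model.load_state_dict(torch.load("audioset_pretrained.pt"))  # hypothetical checkpoint

# Common AdamW practice: exclude biases and norm parameters from decay.
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if p.ndim == 1 else decay).append(p)
optimizer = torch.optim.AdamW(
    [{"params": decay, "weight_decay": 0.01},
     {"params": no_decay, "weight_decay": 0.0}], lr=1e-4)

criterion = nn.CrossEntropyLoss()
features = torch.randn(16, 128)              # batch of pooled audio embeddings
loss = criterion(model(features), torch.randint(0, 10, (16,)))
loss.backward()
optimizer.step()
```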
PCG Classification Using Multidomain Features and SVM Classifier
This paper proposes a method using multidomain features and a support vector machine (SVM) for classifying normal and abnormal heart sound recordings. The database was provided by the PhysioNet/CinC Challenge 2016. A total of 515 features are extracted from nine feature domains, i.e., time interval, frequency spectrum of states, state amplitude, energy, frequency spectrum of records, cepstrum, cyclostationarity, high-order statistics, and entropy. Correlation analysis is conducted to quantify the discriminative ability of the features, and the results show that "frequency spectrum of states", "energy", and "entropy" are the top domains contributing effective features. An SVM with a radial basis kernel function was trained for signal quality estimation and classification. The SVM classifier is independently trained and tested with many groups of top features. The average sensitivity, specificity, and overall score reach 0.88, 0.87, and 0.88, respectively, when the top 400 features are used, a score competitive with the best previous results. The classifier performs very well even when only a small number of top features is used for training, and its output is stable regardless of which features are randomly selected for training. These simulations demonstrate that the proposed features and SVM classifier are jointly powerful for classifying heart sound recordings.
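A minimal scikit-learn sketch of this pipeline, ranking features by their absolute correlation with the label and training an RBF-kernel SVM on the top features under stratified cross-validation, is shown below on synthetic data; the feature counts and SVM settings are illustrative assumptions, not the paper's configuration.

```python
# Sketch: correlation-based feature ranking + RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 515))          # 515 multidomain features (synthetic)
y = (X[:, :5].sum(axis=1) + 0.5 * rng.standard_normal(600) > 0).astype(int)

# Correlation-based ranking as a stand-in for the paper's analysis.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top = np.argsort(corr)[::-1][:400]           # keep the top 400 features

clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
scores = cross_val_score(clf, X[:, top], y, cv=5)  # stratified folds by default
print(scores.mean())
```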
Past and Trends in Cough Sound Acquisition, Automatic Detection and Automatic Classification: A Comparative Review
Cough is a very common symptom and the most frequent reason for seeking medical advice. Optimized care inevitably requires adequate recording of this symptom and automatic processing. This study provides an updated, exhaustive quantitative review of the field of cough sound acquisition, automatic detection of cough in longer audio sequences, and automatic classification of cough nature or underlying disease. Related studies were analyzed, and metrics were extracted and processed to create a quantitative characterization of the state of the art and its trends. A list of objective criteria was established to select a subset of the most complete detection studies from the perspective of deployment in clinical practice. One hundred and forty-four studies were short-listed, and a picture of the state-of-the-art technology is drawn. The trends show an increasing number of classification studies, growing dataset sizes (in part from crowdsourcing), a rapid increase in COVID-19 studies, the prevalence of smartphones and wearable sensors for acquisition, and a rapid expansion of deep learning. Finally, a subset of 12 detection studies is identified as the most complete. An unequaled quantitative overview is presented. The field shows remarkable momentum, boosted by research on COVID-19 diagnosis, and is well suited to mobile health.
An improved ViT model for music genre classification based on mel spectrogram
Automating the task of music genre classification offers opportunities to enhance user experiences, streamline music management processes, and unlock insights into the rich and diverse world of music. In this paper, an improved ViT model is proposed to extract more comprehensive music genre features from Mel spectrograms by leveraging the strengths of both convolutional neural networks and Transformers. The paper also incorporates a channel attention mechanism that amplifies the differences between channels within the Mel spectrograms of individual music genres, thereby facilitating more precise classification. Experimental results on the GTZAN dataset show that the proposed model achieves an accuracy of 86.8%, paving the way for more accurate and efficient music genre classification compared with earlier approaches.
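The channel attention idea can be sketched as a squeeze-and-excitation-style block in PyTorch: pool each channel to a scalar, pass the vector through a small bottleneck MLP, and rescale the channels. The reduction ratio and tensor shapes are assumptions; the paper's exact attention design may differ.

```python
# Sketch: squeeze-and-excitation-style channel attention for
# Mel-spectrogram feature maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                     # x: (batch, C, H, W)
        squeezed = x.mean(dim=(2, 3))         # global average per channel
        scale = self.fc(squeezed)             # learned per-channel weights
        return x * scale[:, :, None, None]    # reweight the channels

feats = torch.randn(4, 64, 32, 32)            # CNN features of a Mel spectrogram
out = ChannelAttention(64)(feats)             # same shape, channels rescaled
```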
CNN–Aided Optical Fiber Distributed Acoustic Sensing for Early Detection of Red Palm Weevil: A Field Experiment
Red palm weevil (RPW) is a harmful pest that destroys many date, coconut, and oil palm plantations worldwide. It is not difficult to apply curative methods to trees infested with RPW; however, the early detection of RPW remains a major challenge, especially on large farms. In a controlled environment and on an outdoor farm, we report on the integration of optical fiber distributed acoustic sensing (DAS) and machine learning (ML) for the early detection of true weevil larvae less than three weeks old. Specifically, temporal and spectral data recorded with the DAS system and processed with a 100–800 Hz band-pass filter are used to train convolutional neural network (CNN) models, which distinguish between "infested" and "healthy" signals with a classification accuracy of ∼97%. In addition, a strict ML-based classification approach is introduced that improves the system's false-alarm performance metric by ∼20%. In the controlled-environment experiment, the highest infestation alarm counts for infested and healthy trees are 1131 and 22, respectively, highlighting our system's ability to distinguish between infested and healthy trees. On the outdoor farm, in contrast, the acoustic noise produced by wind is the major source of false alarms in our system, and the best performance of our sensor is obtained at wind speeds below 9 mph. In a representative outdoor experiment at wind speeds below 9 mph, the highest infestation alarm counts for infested and healthy trees are 1622 and 94, respectively.
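The 100–800 Hz preprocessing step is a standard band-pass filter; a SciPy sketch follows. The filter order and the DAS sample rate are illustrative assumptions.

```python
# Sketch: 100-800 Hz band-pass preprocessing of a DAS trace before
# CNN training. Filter order and sample rate are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs, lo=100.0, hi=800.0, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)                 # zero-phase filtering

fs = 4000                                      # assumed DAS sample rate
t = np.arange(fs) / fs
raw = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 20 * t)  # larva-band tone + low-frequency drift
clean = bandpass(raw, fs)                      # out-of-band component removed
```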
Two-stage CNNs for computerized BI-RADS categorization in breast ultrasound images
Background: Quantizing the Breast Imaging Reporting and Data System (BI-RADS) criteria into different categories with the single ultrasound modality has always been a challenge. To achieve this, we propose a two-stage grading system, based on convolutional neural networks (CNNs), that automatically evaluates breast tumors in ultrasound images into five categories. Methods: The newly developed automatic grading system consists of two stages: tumor identification and tumor grading. The network constructed for tumor identification, denoted ROI-CNN, identifies the region containing the tumor in the original breast ultrasound image. The subsequent tumor categorization network, denoted G-CNN, generates effective features for differentiating the identified regions of interest (ROIs) into five categories: Category "3", Category "4A", Category "4B", Category "4C", and Category "5". In particular, to make the predictions of the ROI-CNN better tailored to the tumor, a Level-set-based refinement procedure is applied between the identification stage and the grading stage. Results: We tested the proposed two-stage grading system on 2238 cases of breast tumors in ultrasound images. With accuracy as the indicator, our automatic computerized grading of breast tumors performed comparably to the subjective categorization determined by physicians. Experimental results show that our two-stage framework achieves accuracies of 0.998 on Category "3", 0.940 on Category "4A", 0.734 on Category "4B", 0.922 on Category "4C", and 0.876 on Category "5". Conclusion: The proposed scheme extracts effective features from breast ultrasound images for the final classification of breast tumors by decoupling the identification features and the classification features with different CNNs. Moreover, it extends the diagnosis of breast tumors in ultrasound images to five sub-categories according to BI-RADS, rather than merely distinguishing malignant breast tumors from benign ones.
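The two-stage wiring, an ROI network feeding a grading network, can be sketched in PyTorch as below. Both networks are stand-in modules, the box parameterization and crop logic are assumptions, and the paper's Level-set refinement between the stages is omitted.

```python
# Sketch: stage 1 proposes a tumor ROI, the crop is resized, and
# stage 2 grades it into five BI-RADS categories. Stand-in networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(8 * 16, out_dim))
    def forward(self, x):
        return self.net(x)

roi_cnn = TinyCNN(4)        # stage 1: predicts a normalized box (x, y, w, h)
g_cnn = TinyCNN(5)          # stage 2: grades the ROI into 5 categories

image = torch.rand(1, 1, 128, 128)             # one ultrasound image
x, y, w, h = torch.sigmoid(roi_cnn(image))[0]  # box in [0, 1] coordinates
x0, y0 = int(x * 128), int(y * 128)
x1 = min(128, x0 + max(8, int(w * 128)))       # clamp to a non-empty crop
y1 = min(128, y0 + max(8, int(h * 128)))
roi = image[:, :, y0:y1, x0:x1]
roi = F.interpolate(roi, size=(64, 64))        # resize crop for stage 2
category = g_cnn(roi).argmax(dim=1)            # index into {3, 4A, 4B, 4C, 5}
```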