5,530 results for "Spectrograms"
MCAF-Net: A non-invasive early screening method for coronary artery disease based on a multi-scale cross-modal model
Coronary artery disease (CAD) is one of the leading causes of death worldwide, so early, non-invasive, and highly accurate detection is crucial for reducing its mortality rate. This study uses a high-sensitivity MEMS-based PCG-ECG Synchronous Auscultation System to build a high-fidelity clinical dataset of synchronized PCG-ECG recordings for CAD. Leveraging deep learning, we developed a Multi-scale Cross-modal Attention Fusion Network (MCAF-Net). The network extracts multi-resolution features from synchronized PCG-ECG spectrograms using an improved residual network and performs feature interaction and fusion through Mutual Cross Attention (MCA), achieving high-accuracy detection of CAD (96.29% on the clinical dataset, 97.69% on the public dataset).
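The abstract does not give the internals of Mutual Cross Attention, but the core operation of any cross-modal attention layer is the same: features from one modality query features from the other. Below is a minimal pure-Python sketch of scaled dot-product cross-attention; the function names and the toy PCG/ECG feature vectors are illustrative, not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query vector from one
    modality attends over the key/value vectors of the other modality."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # weights over the other modality's features
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# toy example: 2 PCG feature vectors attending over 3 ECG feature vectors
pcg = [[1.0, 0.0], [0.0, 1.0]]
ecg = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(pcg, ecg, ecg)
```

Each fused output is a convex combination of the other modality's value vectors, which is what lets the two spectrogram streams exchange information before the final classifier.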
An improved feature extraction for Hindi language audio impersonation attack detection
Audio impersonation attacks pose a substantial risk to voice-based authentication systems and other speech recognition applications, so robust detection methods are needed to ensure system security and dependability. This paper presents a new approach to improving the front-end feature extraction of an audio imitation attack detection system, specifically in the context of the Hindi language. The proposed model is implemented in three main steps. First, Gammatone spectrograms, Mel spectrograms, and Acoustic Ternary Pattern Audio Features (TPAF) spectrograms are generated from the recorded audio samples. Second, an optimized Residual Network (ResNet27) is employed to capture distinctive characteristics from these spectrograms. Last, four binary classifier algorithms, eXtreme Gradient Boosting (XGBoost), Random Forest (RF), K-nearest neighbor (KNN), and Naïve Bayes (NB), are applied individually to the three feature combinations, yielding twelve distinct systems. All systems were evaluated on a purpose-built dataset, the Voice Impersonation Corpus in Hindi Language (VIHL), for audio impersonation attacks; the proposed models were also evaluated on the ASVspoof 2019 and ASVspoof 2021 datasets for spoof, impersonation, replay, and deepfake attacks. The results show that the Gammatone spectrogram-ResNet27 combination with the XGBoost classifier achieves a 0.9% Equal Error Rate (EER) for impersonation attacks, surpassing existing techniques in accurately identifying such attacks.
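The Equal Error Rate quoted above is the operating point at which the false-acceptance rate (impostors accepted) equals the false-rejection rate (genuine speakers rejected). A minimal sketch of estimating it by sweeping thresholds over two score lists; the toy scores are made up for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over all observed scores and return
    the EER estimate at the point where the false-acceptance rate (FAR)
    and false-rejection rate (FRR) are closest."""
    best_gap, eer = 1.0, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# toy scores: higher means "more likely genuine"
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.5, 0.4, 0.3, 0.1]
eer = equal_error_rate(genuine, impostor)
```

With these perfectly separable toy scores the estimate is 0; real systems report a positive EER, such as the 0.9% above.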
Penetration State Identification of Aluminum Alloy Cold Metal Transfer Based on Arc Sound Signals Using Multi-Spectrogram Fusion Inception Convolutional Neural Network
The CMT welding process is widely used for aluminum alloy welding, and the weld's penetration state is essential for evaluating welding quality. Arc sound signals carry rich information about the penetration state of the weld. This paper studies the correlation between the frequency-domain features of arc sound signals and the weld penetration state, as well as the correlation between Mel, Gammatone, and Bark spectrograms and the penetration state. Arc sound features fused with multiple spectrograms are constructed as inputs to a custom Inception CNN model, optimized from GoogLeNet, for CMT weld penetration state recognition. The experimental results show that the proposed method identifies the penetration state of CMT welds in aluminum alloy plates with 97.7% accuracy, higher than the accuracy achieved with any single spectrogram as input. The recognition accuracy of the customized Inception CNN is 0.93% higher than that of GoogLeNet, and it also outperforms AlexNet and ResNet.
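The paper's actual fusion happens inside the Inception network, but the simplest way to present several time-frequency representations to one CNN is to stack same-shaped spectrograms as input channels. A sketch of that channel stacking, with made-up 2x3 toy spectrograms; this is a generic pattern, not the paper's exact pipeline.

```python
def stack_spectrograms(mel, gammatone, bark):
    """Stack three same-shaped spectrograms into one multi-channel
    input: each time-frequency cell becomes a 3-vector, one entry per
    representation (Mel, Gammatone, Bark)."""
    assert len(mel) == len(gammatone) == len(bark)
    return [[[m, g, b] for m, g, b in zip(mr, gr, br)]
            for mr, gr, br in zip(mel, gammatone, bark)]

# toy 2x3 spectrograms (rows = frequency bins, cols = time frames)
mel = [[1, 2, 3], [4, 5, 6]]
gam = [[0, 1, 0], [1, 0, 1]]
bark = [[9, 8, 7], [6, 5, 4]]
stacked = stack_spectrograms(mel, gam, bark)
```

The stacked array keeps the original time-frequency layout while giving the first convolutional layer all three representations at every cell.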
Deep features-based speech emotion recognition for smart affective services
Emotion recognition from speech signals is an active research area with applications in smart healthcare, autonomous voice response systems, assessing situational seriousness through caller affective-state analysis in emergency centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs use square kernels and pooling operators at various layers, which suit 2D image data. In a spectrogram, however, the information is encoded differently: time runs along the x-axis, the y-axis shows the frequency of the speech signal, and amplitude is indicated by the intensity value at each position. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling over rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and outperforms many state-of-the-art techniques when evaluated on the Emo-DB and Korean speech datasets.
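To make the rectangular-pooling idea concrete, here is a minimal pure-Python sketch of non-overlapping max pooling with a rectangular window: the window shape lets frequency (rows) and time (columns) be downsampled at different rates, which is the point the abstract makes about spectrograms. The 4x4 toy grid is illustrative.

```python
def max_pool_2d(grid, pool_h, pool_w):
    """Non-overlapping max pooling over a 2D grid with a rectangular
    pool_h x pool_w window (square pooling is the special case
    pool_h == pool_w)."""
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[r + i][c + j]
                 for i in range(pool_h) for j in range(pool_w))
             for c in range(0, cols - pool_w + 1, pool_w)]
            for r in range(0, rows - pool_h + 1, pool_h)]

# toy 4x4 "spectrogram": halve the frequency axis, collapse time by 4
spec = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 7],
]
pooled = max_pool_2d(spec, 2, 4)  # 2 rows tall, 4 columns wide window
```

A 2x4 window keeps twice as much frequency resolution as time resolution, whereas a square window would treat both axes identically.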
Fully Automated Reduction of Longslit Spectroscopy with the Low Resolution Imaging Spectrometer at the Keck Observatory
This paper presents and summarizes a software package ("LPipe") for completely automated, end-to-end reduction of both bright and faint sources with the Low Resolution Imaging Spectrometer (LRIS) at Keck Observatory. It supports all gratings, grisms, and dichroics, and also reduces imaging observations, although it does not yet include multislit or polarimetric reduction capabilities. It is suitable for on-the-fly quick-look reductions at the telescope, for large-scale reductions of archival data sets, and (in many cases) for science-quality post-run reductions of PI data. To demonstrate its capabilities, the pipeline is run in fully automated mode on all LRIS longslit data in the Keck Observatory Archive acquired during the 12-month period between 2016 August and 2017 July. The reduced spectra (of 675 single-object targets, totaling ∼200 hours of on-source integration time in each camera), and the pipeline itself, are made publicly available to the community.
Improved Audio Separation Using U-Net and ICA
This paper introduces UNetICA, an innovative hybrid model for audio source separation that integrates the strengths of U-Net and Independent Component Analysis (ICA). The model is designed to effectively isolate individual audio sources such as vocals, drums, bass, and other instruments from mixed music tracks. Initially, the U-Net architecture is employed to process spectrograms, extracting multi-scale features and generating coarse estimates of each source. These preliminary outputs are then refined through ICA, which enhances separation by leveraging the statistical independence of audio components. This two-stage approach allows UNetICA to address both spectral structure and statistical properties of sources, resulting in more accurate separation. The model was trained and evaluated on the MUSDB18 dataset, which includes 100 tracks for training and 50 for testing. Performance was measured using Signal-to-Distortion Ratio (SDR). UNetICA demonstrated superior results, achieving an SDR of 19.05 dB for bass, significantly outperforming existing models. Vocals and other sources also showed competitive SDRs of 8.792 dB and 8.868 dB, respectively. When compared with state-of-the-art models such as Open-Unmix, Demucs, and Conv-Tasnet, UNetICA consistently achieved better separation performance, validating the effectiveness of the proposed hybrid framework.
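The SDR figures above follow the standard idea of reference-signal power over residual power, in decibels. Below is a minimal sketch of the plain definition; note that benchmark toolkits for MUSDB18 typically use the BSS-eval variant, which additionally allows a distortion filter, so this simpler form is illustrative only. The four-sample signals are made up.

```python
import math

def sdr_db(target, estimate):
    """Signal-to-Distortion Ratio in dB: power of the reference source
    divided by the power of the residual (reference minus estimate),
    on a log scale. Higher is better."""
    num = sum(t * t for t in target)
    den = sum((t - e) ** 2 for t, e in zip(target, estimate))
    return 10.0 * math.log10(num / den)

target = [1.0, -1.0, 1.0, -1.0]
close = [0.9, -1.1, 1.0, -1.0]   # small distortion
half = [0.5, -0.5, 0.5, -0.5]    # larger distortion
```

With these toy signals the near-perfect estimate scores about 23 dB while the heavily attenuated one scores about 6 dB, mirroring how separation quality is ranked in the comparison above.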
IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients
Internet of Things (IoT)-enabled wireless body area networks (WBANs) are an emerging technology that combines medical devices, wireless devices, and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the healthcare domain and in machine learning: it automatically identifies speakers' emotions from their speech. However, SER systems, especially in the healthcare domain, face several challenges, including low prediction accuracy, high computational complexity, delays in real-time prediction, and the difficulty of identifying appropriate features from speech. Motivated by these research gaps, we propose an emotion-aware IoT-enabled WBAN system within the healthcare framework in which data processing and long-range data transmission are performed by an edge AI system, both for real-time prediction of patients' speech emotions and for capturing changes in emotion before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model combining a convolutional neural network (CNN) with bidirectional long short-term memory (BiLSTM), as well as a regularized CNN model. We combined the models with different optimization strategies and regularization techniques to improve prediction accuracy, reduce generalization error, and reduce the computational complexity of the neural networks in terms of computation time, power, and space. Different experiments were performed to check the efficiency and effectiveness of the proposed algorithms. The proposed models are compared with a related existing model for evaluation and validation using standard performance metrics such as prediction accuracy, precision, recall, F1 score, the confusion matrix, and the differences between actual and predicted values. The experimental results show that one of the proposed models outperformed the existing model with an accuracy of about 98%.
Bottom-up broadcast neural network for music genre classification
Music genre classification based on visual representations has been successfully explored in recent years, with growing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods employ mature CNN structures from image recognition without modification, which yields learned features that are not well suited to music genre classification. To address this issue, we fully exploit the low-level information in audio spectrograms and develop a novel CNN architecture in this paper. The proposed architecture takes multi-scale time-frequency information into consideration, passing more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. Experiments are conducted on the benchmark GTZAN, Ballroom, and Extended Ballroom datasets. The results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2% respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the trained model is tiny, only 0.18M, and can be deployed on mobile phones and other devices with limited computational resources. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.
Modelling the una-corda effect in pianos
Most notes of a piano are fitted with two or three strings that, in normal playing conditions, are struck simultaneously by the hammer. In grand pianos, the una corda pedal offers alternative musical effects, notably a softer and duller tonal quality. The pedal operates a lever system that displaces the action to one side, causing the hammer to hit only one of two, or two of three, strings and thereby altering the sound. Meanwhile, the other strings of the same note undergo sympathetic vibration, owing to their structural connection through the bridge. This paper introduces a dynamic model that replicates the effects of the una corda pedal. It consists of a state-space scheme serving as a framework to couple the dynamic behaviour of the various components: stiff-string models describe the strings' vibration in three dimensions, while a reduced modal model captures the dynamics of the soundboard. Results show how the string vibration and the force transmitted to the soundboard are affected by applying the hammer excitation to individual or multiple strings. When all strings are struck, the transverse vibration exhibits beating; this effect is audible and is evident in the spectrograms, where the amplitude of the partials oscillates over time. In the una corda case, the force exerted on the bridge by the passive string initially grows with time, and its contribution to the overall transmitted force is smaller; however, there is no beating, the decay is more even, and the spectrograms show no irregularities over time.
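The beating described in the abstract follows directly from the identity sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2): two strings detuned by Δf produce a carrier near their mean frequency whose envelope oscillates at Δf Hz. A minimal sketch of that mechanism; the 440/441 Hz detuning is illustrative, not a value from the paper.

```python
import math

def two_string_sample(t, f1=440.0, f2=441.0):
    """Sum of two slightly detuned partials, as from two strings of
    the same piano note struck together."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def envelope(t, f1=440.0, f2=441.0):
    """Closed-form beat envelope 2|cos(pi (f1 - f2) t)|: it cycles at
    |f1 - f2| Hz, which is what shows up as oscillating partial
    amplitudes in a spectrogram."""
    return 2.0 * abs(math.cos(math.pi * (f1 - f2) * t))

# at t = 0 the strings are in phase (envelope 2); half a beat period
# later (t = 0.5 s for a 1 Hz detuning) they cancel (envelope 0)
```

With only one active string, as in the una corda case, there is no second partial to beat against, which matches the smoother decay the paper reports.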
The spectral characteristics of plasma-erosion torch on the surface of carbon-bearing materials
The paper presents spectrograms of the plasma-erosion torch on the surface of glassy carbon in air and in vacuum, from which the presence and abundance of particular particles in the torch can be judged as a function of the laser-radiation parameters. Experiments show that the composition of the plasma-erosion torch, in both air and vacuum, has a complex structure containing different particles and depends on the parameters of the laser radiation, which can affect the characteristics and modes of carbon-film deposition.