Catalogue Search | MBRL
Explore the vast range of titles available.
14,666 result(s) for "Audio data"
Mixing and mastering in the box : the guide to making great mixes and final masters on your computer
"Mixing and mastering, the two final steps in the complex process of sound engineering, require both artistic finesse and technical facility. Even the slightest difference in the way a sound is processed can lead to a shift in the overall aesthetic of a piece, and so sound engineers must work towards an understanding of sound engineering that is particularly oriented towards the artistic and aesthetic. In order to create effective mixes, a sound engineer must maintain a distinct set of artistic goals while drawing on an in-depth understanding of the software involved in the process. Creating final masters requires specialized aural skills and a similarly advanced understanding of the software in order to fine-tune the product with respect to these goals. Mixing and Mastering in the Box addresses the practical and technological necessities of these two final steps without neglecting the creative process that is integral to the creation of high-quality recordings. Savage focuses primarily on creating mixes and masters in the Digital Audio Workstation (DAW), or "in the box," currently a popular platform in the field of sound engineering due to the creative advantages and advanced technological capabilities it offers to its users. However, much of the information presented in Mixing and Mastering in the Box is also applicable to analog mixing gear or a hybrid system of digital and analog tools. This book, which features over one hundred illustrations and a comprehensive companion website, is ideal for beginning or intermediate students in sound engineering with a focus on DAW, recording artists who do their own mixing and mastering, or musicians who wish to be better informed when collaborating on mixes and masters"-- Provided by publisher.
Fault Detection and Diagnosis of Railway Point Machines by Sound Analysis
by Chung, Yongwha; Yoon, Sukhan; Lee, Jonguk
in audio data; railway condition monitoring system; railway point machine
2016
Railway point devices act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Point failure can significantly affect railway operations, with potentially disastrous consequences. Therefore, early detection of anomalies is critical for monitoring and managing the condition of rail infrastructure. We present a data mining solution that utilizes audio data to efficiently detect and diagnose faults in railway condition monitoring systems. The system enables extracting mel-frequency cepstrum coefficients (MFCCs) from audio data with reduced feature dimensions using attribute subset selection, and employs support vector machines (SVMs) for early detection and classification of anomalies. Experimental results show that the system enables cost-effective detection and diagnosis of faults using a cheap microphone, with accuracy exceeding 94.1% whether used alone or in combination with other known methods.
Journal Article
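The pipeline described in the abstract above — extract spectral features from audio frames, then classify them to flag faults — can be sketched in miniature. The paper itself uses MFCCs and SVMs; as a simplified, hedged stand-in (all names and thresholds here are hypothetical), this sketch computes per-frame log-energy features with only the standard library and classifies by a fixed energy threshold in place of a trained classifier:

```python
import math

def frame_log_energies(samples, frame_len=256, hop=128):
    """Split a signal into overlapping frames and return the log energy of
    each frame. (A stand-in for MFCC extraction, which would additionally
    apply a mel filterbank and a DCT to each frame's spectrum.)"""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-12))
    return feats

# Toy signals: a quiet "normal" hum and a louder "faulty" burst.
normal = [0.01 * math.sin(2 * math.pi * 50 * t / 8000) for t in range(2048)]
faulty = [0.5 * math.sin(2 * math.pi * 50 * t / 8000) for t in range(2048)]

THRESHOLD = -6.0  # decision boundary a trained classifier (e.g. an SVM) would learn

def is_faulty(samples):
    feats = frame_log_energies(samples)
    return sum(feats) / len(feats) > THRESHOLD
```

In the actual system, the per-frame feature vectors (MFCCs after attribute subset selection) would be fed to an SVM rather than compared against a hand-set threshold.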
Augmented human : how technology is shaping the new reality
Augmented reality (AR) blurs the boundary between the physical and digital worlds. In AR's current exploration phase, innovators are beginning to create compelling and contextually rich applications that enhance a user's everyday experiences. In this book, Dr. Helen Papagiannis, a world leading expert in the field, introduces you to AR: how it's evolving, where the opportunities are, and where it's headed.
Comparative analysis of audio-MAE and MAE-AST models for real-time audio classification
2025
Real-time audio classification is a complex process that requires systems to be highly accurate and reduce latency in signal processing. The main challenges include processing large amounts of data, particularly for high-quality audio files, which require significant computing resources. Another important problem is noise and other interference, which systems must effectively filter without losing useful information. In addition, the diversity of audio signals, such as speech recordings with different accents and tones, requires flexibility and adaptability of classification models. Implementing real-time processing involves optimizing performance to minimize latency, which is critical for responding quickly to incoming data. The ability of systems to adapt in response to new conditions and signals ensures their effectiveness in dynamic environments. This article is devoted to a comparative analysis of Audio-MAE and MAE-AST models, as well as their performance, efficiency, and parallelization capabilities. The paper discusses innovative solutions to overcome the existing challenges aimed at achieving a balance between processing speed and classification accuracy, as well as optimizing the use of hardware resources.
Journal Article
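Both models compared above are masked autoencoders: they hide a large random subset of spectrogram patches and train the network to reconstruct them. A minimal sketch of that masking step (the patch count and ratio here are illustrative, not taken from either model):

```python
import random

def mask_patches(n_patches, mask_ratio=0.8, seed=0):
    """Masked-autoencoder-style masking: hide a large random subset of
    spectrogram patches; the encoder sees only the visible ones."""
    rng = random.Random(seed)
    idx = list(range(n_patches))
    rng.shuffle(idx)
    n_visible = max(1, round(n_patches * (1 - mask_ratio)))
    visible = sorted(idx[:n_visible])
    masked = sorted(idx[n_visible:])
    return visible, masked

visible, masked = mask_patches(100, mask_ratio=0.8)
# 20 visible patch indices, 80 masked patch indices
```

The high mask ratio is what makes such models cheap to train: the encoder only processes the small visible subset.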
Speech Emotion Recognition Based on Parallel CNN-Attention Networks with Multi-Fold Data Augmentation
by Lee, Yun Kyung; Shin, Hyun Soon; Bautista, John Lorenzo
in Accuracy; Audio data; Classification
2022
In this paper, an automatic speech emotion recognition (SER) task of classifying eight different emotions was carried out using parallel networks trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A combination of a CNN-based network and attention-based networks, running in parallel, was used to model both spatial and temporal feature representations. Multiple augmentation techniques using Additive White Gaussian Noise (AWGN), SpecAugment, Room Impulse Response (RIR), and Tanh Distortion were used to augment the training data and further generalize the model representation. Raw audio data were transformed into mel-spectrograms as the model's input. Leveraging CNNs' proven capability in image classification and spatial feature representation, the spectrograms were treated as images whose height and width correspond to the spectrogram's frequency and time scales. Temporal feature representations were captured by attention-based models: Transformer and BLSTM-Attention modules. The proposed architectures of parallel CNN-based networks running alongside Transformer and BLSTM-Attention modules were compared with standalone CNN architectures and attention-based networks, as well as with hybrid architectures in which CNN layers wrapped in time-distributed wrappers were stacked on attention-based networks. In these experiments, the highest accuracies of 89.33% for the parallel CNN-Transformer network and 85.67% for the parallel CNN-BLSTM-Attention network were achieved on a 10% hold-out test set from the dataset. These networks showed promising results based on their accuracies, while keeping significantly fewer training parameters compared with non-parallel hybrid models.
Journal Article
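The key representational move in the abstract above is treating a spectrogram as an image whose rows are frequency bins and whose columns are time frames. A minimal sketch of that transformation, using a naive DFT from the standard library (a real mel-spectrogram would add a window function and a mel filterbank; all sizes here are illustrative):

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram laid out as an image:
    rows = frequency bins, columns = time frames."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Naive DFT of one frame (keep only the non-redundant half of the bins).
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    # Transpose so the "image" has height = frequency, width = time.
    return [list(col) for col in zip(*frames)]

# A pure tone at 8 cycles per 64-sample frame lights up frequency bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
img = spectrogram(tone)
# img is 33 bins tall and 7 frames wide; the energy peak sits in row 8
```

Once in this form, the 2-D array can be passed to a CNN exactly as an image would be.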
EMD-based time-frequency analysis methods of audio signals
2024
Using appropriate signal processing tools to analyze time series data accurately is essential for correctly interpreting the underlying processes. Commonly employed methods include kernel-based transforms that utilize basis functions and modifications to depict time series data. This paper addresses the analysis of audio data using two such transforms: the Fourier transform and the wavelet transform, both based on assumptions regarding the signal's linearity and stationarity. However, in audio engineering, these assumptions often do not hold, as the statistical characteristics of most audio signals vary over time, making them unsuitable for treatment as outputs of a Linear Time-Invariant (LTI) system. Consequently, more recent methods have shifted towards breaking down signals into various modes in an adaptive, data-specific manner, potentially offering benefits over traditional kernel-based methods. Techniques like empirical mode decomposition and Holo-Hilbert Spectral Analysis are examples of this. The effectiveness of these methods was tested through simulations using speech signals for both kernel-based and adaptive decomposition methods, demonstrating that the adaptive methods are effective for analyzing audio data that is both nonstationary and the output of a nonlinear system.
Journal Article
An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
2022
The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial information carried by audio signals. This is in contrast to model-based methods, which impose spatial information from, for example, metadata like the intended position of a source onto signals that are otherwise free of spatial information. Signal processing has traditionally been at the core of spatial audio systems, and it continues to play a very important role. The irruption of deep learning in many closely related fields has put the focus on the potential of learning-based approaches for the development of data-based spatial audio applications. This article reviews the most important application domains of data-based spatial audio, including well-established methods that employ conventional signal processing, while paying special attention to the most recent achievements that make use of machine learning. Our review is organized based on the topology of the spatial audio pipeline, which consists of capture, processing/manipulation, and reproduction. The literature on the three stages of the pipeline is discussed, as well as on the spatial audio representations that are used to transmit the content between them, highlighting the key references and elaborating on the underlying concepts. We reflect on the literature based on a juxtaposition of the prerequisites that made machine learning successful in domains other than spatial audio with those that are found in the domain of spatial audio as of today. Based on this, we identify routes that may facilitate future advancement.
Journal Article
Towards a unified terminology for sonification and visualization
by Iber, Michael; Aigner, Wolfgang; Höldrich, Robert
in Audio data; Computer Science; Mobile Computing
2023
Both sonification and visualization convey information about data by effectively using the human perceptual system, but they transform the data in different ways. Over the past 30 years, the sonification community has repeatedly called for a holistic perspective on data representation, including audio-visual analysis. A design theory of audio-visual analysis would be a relevant step in this direction, and an indispensable foundation for this endeavor is a terminology describing the combined design space. To build a bridge between the domains, we adopt three established theoretical constructs from visualization theory for the field of sonification: the spatial substrate, the visual mark, and the visual channel. In our model, we choose time to be the temporal substrate of sonification. Auditory marks are then positioned in time, just as visual marks are positioned in space. Auditory channels are encoded into auditory marks to convey information. The proposed definitions allow discussing visualization and sonification designs, as well as multi-modal designs, based on a common terminology. While the identified terminology can support audio-visual analytics research, it also provides a new perspective on sonification theory itself.
Journal Article
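The terminology above — a temporal substrate carrying auditory marks, with auditory channels encoding the data — can be made concrete with a toy sonification. This sketch (all parameter names and the value-to-pitch mapping are hypothetical choices, not taken from the paper) maps a data series to note events, where onset time encodes the sample index and pitch encodes the value:

```python
def sonify(values, t0=0.0, dt=0.5, base_midi=60, midi_per_unit=2.0):
    """Map a data series to auditory marks on a temporal substrate:
    each value becomes a note event (onset time in seconds, MIDI pitch),
    with onset encoding the sample index and pitch acting as the
    auditory channel that encodes the value."""
    marks = []
    for i, v in enumerate(values):
        onset = t0 + i * dt                    # position on the temporal substrate
        pitch = base_midi + v * midi_per_unit  # pitch as the auditory channel
        marks.append((round(onset, 3), round(pitch, 2)))
    return marks

events = sonify([0, 2, 5, 3])
# [(0.0, 60.0), (0.5, 64.0), (1.0, 70.0), (1.5, 66.0)]
```

Other auditory channels (loudness, timbre, duration) could be added to the same marks in exactly the way visual channels (color, size) are layered onto visual marks.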
Estimating the first and second derivatives of discrete audio data
2024
A new method for estimating the first and second derivatives of discrete audio signals, intended to achieve higher computational precision in analyzing the performance and characteristics of digital audio systems, is presented. The method could find numerous applications in modeling nonlinear audio circuit systems, e.g., for audio synthesis and creating audio effects, music recognition and classification, time-frequency analysis based on nonstationary audio signal decomposition, audio steganalysis and digital audio authentication, or audio feature extraction methods. The proposed algorithm employs the ordinary 7-point-stencil central-difference formulas with improvements that minimize the round-off and truncation errors. This is achieved by treating the step size of numerical differentiation as a regularization parameter, which acts as a decision threshold in all calculations. This approach requires shifting discrete audio data by fractions of the initial sample rate, which is accomplished with fractional-delay FIR filters designed with modified 11-term cosine-sum windows for interpolation and shifting of audio signals. The maximum relative errors in estimating the first and second derivatives of discrete audio signals are on the order of 10⁻¹³ and 10⁻¹⁰, respectively, over the entire audio band, which is close to double-precision floating-point accuracy for the first derivative and better than single-precision floating-point accuracy for the second derivative estimation. Numerical testing showed that the performance of the proposed method is not influenced by the type of signal being differentiated (either stationary or nonstationary), and that it provides better results than other known differentiation methods in the audio band up to 21 kHz.
Journal Article
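The ordinary 7-point-stencil central-difference formulas that the paper builds on are standard and can be demonstrated directly (this sketch shows only the baseline formulas on a smooth test function, not the paper's regularization or fractional-delay refinements):

```python
import math

# 7-point central-difference formulas (6th-order accurate):
#   f'(x)  ≈ (-f(x-3h) + 9 f(x-2h) - 45 f(x-h)
#             + 45 f(x+h) - 9 f(x+2h) + f(x+3h)) / (60 h)
#   f''(x) ≈ (2 f(x-3h) - 27 f(x-2h) + 270 f(x-h) - 490 f(x)
#             + 270 f(x+h) - 27 f(x+2h) + 2 f(x+3h)) / (180 h^2)
def derivatives(f, x, h):
    s = [f(x + k * h) for k in range(-3, 4)]  # s[3] is f(x)
    d1 = (-s[0] + 9 * s[1] - 45 * s[2]
          + 45 * s[4] - 9 * s[5] + s[6]) / (60 * h)
    d2 = (2 * s[0] - 27 * s[1] + 270 * s[2] - 490 * s[3]
          + 270 * s[4] - 27 * s[5] + 2 * s[6]) / (180 * h * h)
    return d1, d2

d1, d2 = derivatives(math.sin, 1.0, 0.01)
# d1 ≈ cos(1.0), d2 ≈ -sin(1.0)
```

The step size h is exactly the quantity the paper treats as a regularization parameter: too large and truncation error dominates; too small and round-off error does.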
Security analysis of an audio data encryption scheme based on key chaining and DNA encoding
2021
Fairly recently, a new audio encryption scheme was proposed. The cryptosystem is based on a substitution-permutation algorithm using DNA encoding. The key generation of the proposed scheme relies on a key-chaining mode that generates a new key block for every plain block using the chaotic logistic map. Based on several statistical tests, the authors of the scheme claimed that their cryptosystem is robust. In this paper, we scrutinize the cryptosystem from a cryptanalytic perspective and mount several security attacks to evaluate the immunity of the system and to assess its possible adoption in real-world applications. We demonstrate two successful conventional attacks on the scheme: a chosen-ciphertext attack and a chosen-plaintext attack. The design of the cryptosystem's shuffling process is scrutinized as well, and a cycle attack is described based on the results obtained. Lessons learned from this cryptanalysis are then outlined so that they can be considered in further designs and proposals.
Journal Article
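The shape of a chosen-plaintext attack like the one described above can be illustrated on a deliberately weak toy cipher (this stand-in is not the reviewed DNA-encoding scheme; it only shows why letting an attacker encrypt chosen inputs can expose key material):

```python
def toy_encrypt(plaintext: bytes, keystream: bytes) -> bytes:
    """A deliberately weak stand-in cipher: XOR with a fixed keystream.
    XOR ciphers are self-inverse, so the same function also decrypts."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

secret_keystream = bytes([0x5A, 0x13, 0xC7, 0x88, 0x3E])

# Chosen-plaintext attack: asking the oracle to encrypt all-zero bytes
# leaks the keystream directly, because 0 XOR k = k.
recovered = toy_encrypt(bytes(5), secret_keystream)
assert recovered == secret_keystream

# With the keystream recovered, any intercepted ciphertext falls.
intercepted = toy_encrypt(b"audio", secret_keystream)
decrypted = toy_encrypt(intercepted, recovered)
# decrypted == b"audio"
```

Real attacks on the reviewed scheme are subtler, but the principle is the same: if chosen inputs let an attacker isolate the key material of one block, the key-chaining mode cannot save the rest.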