Search Results

2,211 results for "Music transcriptions"
Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
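The forced-alignment idea mentioned above can be prototyped with off-the-shelf tools. Below is a minimal sketch, assuming a recording and a synthesized rendition of its score (file names and hop length are illustrative choices, not from the paper): chroma features from both signals are aligned with dynamic time warping, yielding a mapping from score time to performance time.

```python
# Minimal sketch of score-to-audio forced alignment via DTW on chroma
# features, as one way to bootstrap training data from score/audio pairs.
import librosa
import numpy as np

HOP = 512

# Load the real recording and a synthesized rendition of the score
# (e.g. rendered from MIDI with a software synthesizer).
audio, sr = librosa.load("performance.wav")
synth, _ = librosa.load("score_synth.wav", sr=sr)

# Chroma features are fairly robust to the timbre differences between
# the real recording and the synthetic rendition.
chroma_audio = librosa.feature.chroma_cqt(y=audio, sr=sr, hop_length=HOP)
chroma_synth = librosa.feature.chroma_cqt(y=synth, sr=sr, hop_length=HOP)

# Dynamic time warping yields a frame-level correspondence (warping path)
# between score time and performance time.
_, wp = librosa.sequence.dtw(X=chroma_synth, Y=chroma_audio, metric="cosine")

# Convert the path to seconds: each pair maps a score frame to an audio
# frame, giving note-level timings for the recording once the score is symbolic.
path_sec = np.asarray(wp)[::-1] * HOP / sr
```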
Superfamilies of Evolved and Designed Networks
Complex biological, technological, and sociological networks can be of very different sizes and connectivities, making it difficult to compare their structures. Here we present an approach to systematically study similarity in the local structure of networks, based on the significance profile (SP) of small subgraphs in the network compared to randomized networks. We find several superfamilies of previously unrelated networks with very similar SPs. One superfamily, including transcription networks of microorganisms, represents "rate-limited" information-processing networks strongly constrained by the response time of their components. A distinct superfamily includes protein signaling, developmental genetic networks, and neuronal wiring. Additional superfamilies include power grids, protein-structure networks and geometric networks, World Wide Web links and social networks, and word-adjacency networks from different languages.
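A minimal sketch of the significance-profile computation follows, assuming networkx and toy parameters (the swap count and ensemble size are arbitrary illustrative choices): count connected triads, compare against degree-preserving randomizations, and normalize the z-score vector to unit length so that networks of different sizes become comparable.

```python
# Sketch of the triad significance profile (SP): census of 3-node subgraphs
# in a directed network vs. degree-preserving randomized networks.
import random
import numpy as np
import networkx as nx

# The 13 connected triad types (the census also returns 3 disconnected ones).
TRIADS = ["021D", "021U", "021C", "111D", "111U", "030T", "030C",
          "201", "120D", "120U", "120C", "210", "300"]

def triad_counts(G):
    census = nx.triadic_census(G)
    return np.array([census[t] for t in TRIADS], dtype=float)

def randomize(G, nswap=1000):
    """Degree-preserving randomization via repeated directed edge swaps."""
    R = G.copy()
    for _ in range(nswap):
        edges = list(R.edges())
        (a, b), (c, d) = random.sample(edges, 2)
        # a->b, c->d becomes a->d, c->b; in/out degrees are unchanged.
        if a != d and c != b and not R.has_edge(a, d) and not R.has_edge(c, b):
            R.remove_edges_from([(a, b), (c, d)])
            R.add_edges_from([(a, d), (c, b)])
    return R

def significance_profile(G, n_rand=50):
    real = triad_counts(G)
    rand = np.array([triad_counts(randomize(G)) for _ in range(n_rand)])
    z = (real - rand.mean(axis=0)) / (rand.std(axis=0) + 1e-9)
    return z / np.linalg.norm(z)   # unit-length SP, comparable across sizes
```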
Towards Automatic Expressive Pipa Music Transcription Using Morphological Analysis of Photoelectric Signals
The musical signal produced by plucked instruments often exhibits non-stationarity due to variations in the pitch and amplitude, making pitch estimation a challenge. In this paper, we assess different transcription processes and algorithms applied to signals captured by optical sensors mounted on a pipa—a traditional Chinese plucked instrument—played using a range of techniques. The captured signal demonstrates a distinctive arched feature during plucking. This facilitates onset detection to avoid the impact of the spurious energy peaks within vibration areas that arise from pitch-shift playing techniques. Subsequently, we developed a novel time–frequency feature, known as continuous time-period mapping (CTPM), which contains pitch curves. The proposed process can also be applied to playing techniques that mix pitch shifts and tremolo. When evaluated on four renowned pipa music pieces of varying difficulty levels, our fully time-domain-based onset detectors outperformed four short-time methods, particularly during tremolo. Our zero-crossing-based pitch estimator achieved performance comparable to that of short-time methods, with far better computational efficiency, demonstrating its suitability as a lightweight algorithm in future work.
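As an illustration of the zero-crossing idea (not a reproduction of the paper's CTPM feature or onset logic), the sketch below estimates f0 from the intervals between positive-going zero crossings of a low-pass-filtered frame; the cutoff frequency and filter order are assumptions.

```python
# Illustrative zero-crossing pitch estimator operating purely in the
# time domain, in the spirit of the approach described above.
import numpy as np
from scipy.signal import butter, filtfilt

def zero_crossing_f0(frame, sr, cutoff_hz=1200.0):
    # Low-pass filter to suppress harmonics above the expected f0 range,
    # so zero crossings track the fundamental rather than upper partials.
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    x = filtfilt(b, a, frame)

    # Indices where the signal crosses zero going upward.
    ups = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    if len(ups) < 2:
        return 0.0  # too few crossings: silence or sub-audio content

    # The median crossing interval is robust to the amplitude
    # non-stationarity typical of plucked-string signals.
    period = np.median(np.diff(ups)) / sr
    return 1.0 / period
```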
Advancing deep learning for expressive music composition and performance modeling
The pursuit of expressive and human-like music generation remains a significant challenge in the field of artificial intelligence (AI). While deep learning has advanced AI music composition and transcription, current models often struggle with long-term structural coherence and emotional nuance. This study presents a comparative analysis of three leading deep learning architectures for AI-generated music composition and transcription on the MAESTRO dataset: Long Short-Term Memory (LSTM) networks, Transformer models, and Generative Adversarial Networks (GANs). Our key innovation lies in the integration of a dual evaluation framework that combines objective metrics (perplexity, harmonic consistency, and rhythmic entropy) with subjective human evaluations via a Mean Opinion Score (MOS) study involving 50 listeners. The Transformer model achieved the best overall performance (perplexity: 2.87; harmonic consistency: 79.4%; MOS: 4.3), indicating its superior ability to produce musically rich and expressive outputs. However, human compositions remained highest in perceptual quality (MOS: 4.8). Our findings provide a benchmarking foundation for future AI music systems and emphasize the need for emotion-aware modeling, real-time human-AI collaboration, and reinforcement learning to bridge the gap between machine-generated and human-performed music.
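Of the three objective metrics, perplexity is the most standardized: the exponential of the mean negative log-likelihood of the next token. A minimal sketch for a tokenized note sequence follows; the model interface and token shapes are assumptions, not details from the paper.

```python
# Sequence perplexity under a next-token model of note-event tokens.
import torch
import torch.nn.functional as F

def perplexity(model, tokens):
    """tokens: LongTensor of shape (1, T) of note-event token ids.
    Assumes model(input_ids) returns raw logits of shape (1, T-1, vocab)."""
    with torch.no_grad():
        logits = model(tokens[:, :-1])          # predict token t from t-1
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
    return torch.exp(nll).item()
```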
Sound and music biases in deep music transcription models: a systematic analysis
Automatic Music Transcription (AMT)—the task of converting music audio into note representations—has seen rapid progress, driven largely by deep learning systems. Due to the limited availability of richly annotated music datasets, much of the progress in AMT has been concentrated on classical piano music, and even a few very specific datasets. Whether these systems can generalize effectively to other musical contexts remains an open question. Complementing recent studies on distribution shifts in sound (e.g., recording conditions), in this work we investigate the musical dimension—specifically, variations in genre, dynamics, and polyphony levels. To this end, we introduce the MDS corpus, comprising three distinct subsets—(1) genre, (2) random, and (3) MAEtest—to emulate different axes of distribution shift. We evaluate the performance of several state-of-the-art AMT systems on the MDS corpus using both traditional information-retrieval and musically informed performance metrics. Our extensive evaluation isolates and exposes varying degrees of performance degradation under specific distribution shifts. In particular, we measure a note-level F1 performance drop of 20 percentage points due to sound, and 14 due to genre. Generally, we find that dynamics estimation proves more vulnerable to musical variation than onset prediction. Musically informed evaluation metrics, particularly those capturing harmonic structure, help identify potential contributing factors. Furthermore, experiments with randomly generated, non-musical sequences reveal clear limitations in system performance under extreme musical distribution shifts. Altogether, these findings offer new evidence of the persistent impact of the corpus bias problem in deep AMT systems.
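The note-level F1 figures quoted above follow standard transcription scoring; a minimal sketch using the mir_eval reference implementation is shown below, with toy intervals and pitches standing in for real data from the MDS corpus.

```python
# Note-level F1 for transcription: intervals in seconds, pitches in Hz.
import numpy as np
import mir_eval

ref_intervals = np.array([[0.00, 0.50], [0.50, 1.00]])
ref_pitches = np.array([440.0, 494.0])          # A4, B4
est_intervals = np.array([[0.02, 0.48], [0.55, 1.02]])
est_pitches = np.array([440.0, 494.0])

# Default tolerances: onset within 50 ms, pitch within 50 cents; offsets
# are ignored here (offset_ratio=None), matching common note-level scoring.
p, r, f1, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    offset_ratio=None,
)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```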
A Comprehensive Review on Music Transcription
Music transcription is the process of transforming recorded sound of musical performances into symbolic representations such as sheet music or MIDI files. Extensive research and development have been carried out in the field of music transcription and technology. This comprehensive review surveys the diverse methodologies, techniques, and advancements that have shaped the landscape of music transcription. The paper outlines the significance of music transcription in preserving, analyzing, and disseminating musical compositions across various genres and cultures, and provides a historical perspective by tracing the evolution of music transcription from traditional manual methods to modern automated approaches. It highlights the challenges posed by complex singing techniques, variations in instrumentation, ambiguity in pitch, tempo changes, rhythm, and dynamics. The review categorizes transcription techniques into four types, frame-level, note-level, stream-level, and notation-level, discussing their strengths and limitations, and spans the various research domains of music transcription, from general melody extraction to vocal melody, note-level monophonic to polyphonic vocal transcription, single-instrument to multi-instrument transcription, and multi-pitch estimation. The survey further covers a broad spectrum of music transcription applications in music production and creation, and reviews state-of-the-art open-source and commercial transcription tools for pitch estimation, onset and offset detection, general melody detection, and vocal melody detection, along with the Python libraries currently available for music transcription. Finally, the review highlights the open-source benchmark datasets for different areas of music transcription and provides a wide range of references supporting the historical context, theoretical frameworks, and foundational concepts behind music transcription.
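As a concrete taste of the frame-level category, the sketch below uses librosa's pYIN estimator; librosa is one example of the open-source Python libraries of the kind the review catalogues, and the file name is a placeholder.

```python
# Frame-level transcription in miniature: a per-frame f0 track with a
# voicing flag, from librosa's probabilistic YIN (pYIN) implementation.
import librosa

y, sr = librosa.load("melody.wav")
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, sr=sr,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
)
# f0 holds one estimate per frame (NaN where unvoiced); note-level and
# notation-level transcription would segment and quantize this track further.
notes = [librosa.hz_to_note(f) for f in f0 if f == f]   # skip NaN frames
```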
A perceptual measure for evaluating the resynthesis of automatic music transcriptions
This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change. We propose to distinguish the concept of “performance” from that of “interpretation”, which expresses the “artistic intention”. Towards assessing this distinction, we carried out an experimental evaluation where 91 subjects were invited to listen to various audio recordings created by resynthesizing MIDI data obtained through Automatic Music Transcription (AMT) systems and a sensorized acoustic piano. During the resynthesis, we simulated different contexts and asked listeners to evaluate how much the interpretation changes when the context changes. Results show that: (1) the MIDI format alone is not able to completely grasp the artistic intention of a music performance; (2) the usual objective evaluation measures based on MIDI data present low correlations with the average subjective evaluation. To bridge this gap, we propose a novel measure which is meaningfully correlated with the outcome of the tests. In addition, we investigate multimodal machine learning by providing a new score-informed AMT method and propose an approximation algorithm for the p-dispersion problem.
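For context, the p-dispersion problem asks for p items whose minimum pairwise distance is maximized, e.g. to pick maximally diverse stimuli for a listening test. The sketch below shows a common greedy heuristic for it; this illustrates the problem itself and is not necessarily the approximation algorithm the paper proposes.

```python
# Greedy heuristic for p-dispersion: seed with the farthest pair, then
# repeatedly add the point farthest from the already-chosen set.
import numpy as np

def greedy_p_dispersion(X, p):
    """X: (n, d) array of points; returns indices of p well-spread points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed with the two mutually farthest points.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < p:
        # For each candidate, its distance to the closest chosen point;
        # pick the candidate that is farthest from the current set.
        d_to_set = D[:, chosen].min(axis=1)
        d_to_set[chosen] = -1.0          # exclude already-chosen points
        chosen.append(int(np.argmax(d_to_set)))
    return chosen
```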
Harmonizing minds and machines: survey on transformative power of machine learning in music
This survey explores the symbiotic relationship between Machine Learning (ML) and music, focusing on the transformative role of Artificial Intelligence (AI) in the musical sphere. Beginning with a historical contextualization of the intertwined trajectories of music and technology, the paper discusses the progressive use of ML in music analysis and creation. Emphasis is placed on present applications and future potential. A detailed examination of music information retrieval, automatic music transcription, music recommendation, and algorithmic composition presents state-of-the-art algorithms and their respective functionalities. The paper underscores recent advancements, including ML-assisted music production and emotion-driven music generation. The survey concludes with a prospective contemplation of future directions of ML within music, highlighting the ongoing growth, novel applications, and anticipation of deeper integration of ML across musical domains. This comprehensive study asserts the profound potential of ML to revolutionize the musical landscape and encourages further exploration and advancement in this emerging interdisciplinary field.
SpectTrans: Joint Spectral–Temporal Modeling for Polyphonic Piano Transcription via Spectral Gating Networks
Automatic Music Transcription (AMT) plays a fundamental role in Music Information Retrieval (MIR) by converting raw audio signals into symbolic representations such as MIDI or musical scores. Despite advances in deep learning, accurately transcribing piano performances remains challenging due to dense polyphony, wide dynamic range, sustain pedal effects, and harmonic interactions between simultaneous notes. Existing approaches using convolutional and recurrent architectures, or autoregressive models, often fail to capture long-range temporal dependencies and global harmonic structures, while conventional Vision Transformers overlook the anisotropic characteristics of audio spectrograms, leading to harmonic neglect. In this work, we propose SpectTrans, a novel piano transcription framework that integrates a Spectral Gating Network with a multi-head self-attention Transformer to jointly model spectral and temporal dependencies. Latent CNN features are projected into the frequency domain via a Real Fast Fourier Transform, enabling adaptive filtering of overlapping harmonics and suppression of non-stationary noise, while deeper layers capture long-term melodic and chordal relationships. Experimental evaluation on polyphonic piano datasets demonstrates that this architecture produces acoustically coherent representations, improving the robustness and precision of transcription under complex performance conditions. These results suggest that combining frequency-domain refinement with global temporal modeling provides an effective strategy for high-fidelity AMT.
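A minimal sketch of the spectral-gating idea, as it might look in PyTorch, is given below: latent features are projected into the frequency domain with a real FFT, multiplied by a learnable complex gate, and transformed back. The layer shapes and initialization are illustrative assumptions, not the authors' implementation.

```python
# Spectral gating over the sequence axis of latent CNN features.
import torch
import torch.nn as nn

class SpectralGate(nn.Module):
    def __init__(self, seq_len, dim):
        super().__init__()
        n_freq = seq_len // 2 + 1            # rfft output length
        # One complex gate per (frequency bin, channel); identity at init.
        self.gate = nn.Parameter(torch.ones(n_freq, dim, dtype=torch.cfloat))

    def forward(self, x):                    # x: (batch, seq_len, dim)
        spec = torch.fft.rfft(x, dim=1)      # frequency-domain view of sequence
        spec = spec * self.gate              # adaptive filtering of harmonics/noise
        return torch.fft.irfft(spec, n=x.size(1), dim=1)

# The gated output would then feed the multi-head self-attention stack
# that models long-range melodic and chordal structure.
```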
Piano Transcription Using Temporal Harmonic Diagram and Transfer Window Attention in Self-Attention Networks
Music transcription is an important means to record and transmit music culture. However, existing music transcription algorithms still exhibit errors in practical applications. To address this problem, the study adopts the constant-Q transform to process music signals, introduces a note-onset and frame-level pitch recognition module and transfer window attention, constructs a temporal harmonic diagram for music melody extraction, and adopts a significance function for melody smoothing. The study uses the MAESTRO dataset, containing about 200 hours of paired audio and MIDI recordings covering different performance styles, and the MedleyDB dataset, containing 122 pieces of music. The experimental results show that the overall accuracy of the transcription algorithm is 2.58% and 2.35% higher than the other algorithms, and the raw pitch accuracy is 2.23% and 1.06% higher than the other algorithms, respectively, for a frequency point count of 600 and a search range of 0.5. The accuracy, recall, and F1 value of the transcription algorithm are 2.11%, 2.27%, and 2.21% higher than those of the second-best algorithm, and removing the window attention and recognition modules decreases the accuracy of the algorithm by 8.07% and 16.76%, respectively. The average processing time of the transcription algorithm is 7.2 ms lower than that of the traditional method, and the computational complexity grows more slowly as the amount of data grows. It can be concluded that the piano music transcription algorithm can effectively improve the accuracy of music recognition and transcription, and quickly and accurately convert the relevant audio into the corresponding notes.
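The constant-Q front end is the natural first step here because its log-spaced bins align with equal-tempered pitch, unlike the linearly spaced bins of the plain STFT. A minimal sketch using librosa follows; the hop length and file name are illustrative, not the paper's configuration.

```python
# Constant-Q transform covering the 88 piano keys (A0 to C8), as a
# pitch-aligned time-frequency input for a transcription network.
import numpy as np
import librosa

y, sr = librosa.load("piano.wav")
cqt = librosa.cqt(
    y, sr=sr, hop_length=512,
    fmin=librosa.note_to_hz("A0"),
    n_bins=88, bins_per_octave=12,       # one bin per piano key
)
log_cqt = librosa.amplitude_to_db(np.abs(cqt))   # log-magnitude features
```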