TY - CHAP
T1 - End-to-End Piano Performance-MIDI to Score Conversion with Transformers
AU - Beyer, Tim
AU - Dai, Angela
N1 - Publisher Copyright:
© T. Beyer and A. Dai.
PY - 2024
Y1 - 2024
N2 - The automated creation of accurate musical notation from an expressive human performance is a fundamental task in computational musicology. To this end, we present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files. We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data. Framing the task as sequence-to-sequence translation rather than note-wise classification reduces alignment requirements and annotation costs, while allowing the prediction of more concise and accurate notation. To serialize symbolic music data, we design a custom tokenization stage based on compound tokens that carefully quantizes continuous values. This technique preserves more score information while reducing sequence lengths by 3.5× compared to prior approaches. Using the transformer backbone, our method demonstrates better understanding of note values, rhythmic structure, and details such as staff assignment. When evaluated end-to-end using transcription metrics such as MUSTER, we achieve significant improvements over previous deep learning approaches and complex HMM-based state-of-the-art pipelines. Our method is also the first to directly predict notational details like trill marks or stem direction from performance data. Code and models are available on GitHub.
AB - The automated creation of accurate musical notation from an expressive human performance is a fundamental task in computational musicology. To this end, we present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files. We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data. Framing the task as sequence-to-sequence translation rather than note-wise classification reduces alignment requirements and annotation costs, while allowing the prediction of more concise and accurate notation. To serialize symbolic music data, we design a custom tokenization stage based on compound tokens that carefully quantizes continuous values. This technique preserves more score information while reducing sequence lengths by 3.5× compared to prior approaches. Using the transformer backbone, our method demonstrates better understanding of note values, rhythmic structure, and details such as staff assignment. When evaluated end-to-end using transcription metrics such as MUSTER, we achieve significant improvements over previous deep learning approaches and complex HMM-based state-of-the-art pipelines. Our method is also the first to directly predict notational details like trill marks or stem direction from performance data. Code and models are available on GitHub.
UR - http://www.scopus.com/inward/record.url?scp=85219094393&partnerID=8YFLogxK
M3 - Chapter
AN - SCOPUS:85219094393
T3 - Proceedings of the International Society for Music Information Retrieval Conference
SP - 319
EP - 326
BT - Proceedings of the International Society for Music Information Retrieval Conference
PB - International Society for Music Information Retrieval
ER -