A multimodal approach for percussion music transcription from audio and video

Multimodal approach for percussion music transcription from audio and video recordings, applying digital signal processing techniques.

Bernardo Marenco, Magdalena Fuentes, Florencia Lanzaro, Martín Rocamora, and Alvaro Gómez.

November 2, 2015

Montevideo, Uruguay

Proceedings of the 20th Iberoamerican Congress on Pattern Recognition

Lecture Notes in Computer Science, vol. 9423

Springer International Publishing Switzerland

2015-11_CIARP2015

Abstract

This work proposes a multimodal approach for percussion music transcription from audio and video recordings. It is part of an ongoing research effort to develop tools for the computer-aided analysis of Candombe drumming, a popular Afro-rooted rhythm from Uruguay. Several signal processing techniques are applied to automatically extract meaningful information from each source; for the video stream, this involves detecting certain relevant objects in the scene. The location of events is obtained from the audio signal, and this information is used to drive the processing of both modalities. The detected events are then classified by combining the information from each source in a feature-level fusion scheme. The experiments conducted yield promising results that show the advantages of the proposed method.
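
As a rough illustration of what a feature-level fusion scheme can look like, the following Python sketch concatenates the audio and video feature vectors of each detected event and classifies the fused vector. It is a minimal sketch under stated assumptions, not the method used in the paper: the feature values, the class labels "a" and "b", the function names, and the nearest-centroid classifier are all hypothetical and chosen only to keep the example self-contained.

```python
from math import dist


def fuse(audio_feats, video_feats):
    """Feature-level (early) fusion: concatenate the per-event feature vectors."""
    return list(audio_feats) + list(video_feats)


def fit_centroids(fused_vectors, labels):
    """Compute the mean fused vector of each class (hypothetical classifier choice)."""
    sums, counts = {}, {}
    for vec, label in zip(fused_vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(fused_vec, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(fused_vec, centroids[label]))


# Toy, made-up training events: (audio features, video features, label).
training_events = [
    ([0.9, 0.1], [0.2], "a"),
    ([0.8, 0.2], [0.1], "a"),
    ([0.1, 0.9], [0.8], "b"),
    ([0.2, 0.8], [0.9], "b"),
]

fused_training = [fuse(audio, video) for audio, video, _ in training_events]
training_labels = [label for _, _, label in training_events]
centroids = fit_centroids(fused_training, training_labels)

# A new event located in the audio signal, with features from both modalities.
new_event = fuse([0.7, 0.3], [0.3])
predicted = classify(new_event, centroids)
```

The point of the sketch is the order of operations: the two modalities are merged into a single vector before any decision is made, in contrast to decision-level fusion, where each modality would be classified separately and the outputs combined afterwards.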