Training Deep Convolutional Networks with Unlimited Synthesis of Musical Examples for Multiple Instrument Recognition

Rameel Sethi, Noah Weninger, Abram Hindle, Vadim Bulitko, Michael Frishkopf

2018/05/10

Authors

Rameel Sethi, Noah Weninger, Abram Hindle, Vadim Bulitko, Michael Frishkopf

Venue

15th Sound and Music Computing Conference (SMC 2018), Limassol, Cyprus

Abstract

Deep learning has yielded promising results in music information retrieval (MIR) and other domains compared to machine learning algorithms trained on hand-crafted feature representations, but it is often limited by the availability of data and a vast hyper-parameter space. It is difficult to obtain large amounts of annotated recordings due to prohibitive labelling costs and copyright restrictions. This is especially true when the MIR task is low-level in nature, such as instrument recognition, and applied to a wide range of world instruments, causing most MIR techniques to focus on recovering easily verifiable metadata such as genre. We tackle this data availability problem using two techniques: generating synthetic recordings from MIDI files and synthesizers, and augmenting the generated samples with added noise and filters. We investigate the application of deep synthetically trained models to two related low-level MIR tasks, frame-level polyphony detection and instrument classification in polyphonic recordings, and empirically show that deep models trained on synthetic recordings augmented with noise can outperform a majority class baseline on a dataset of polyphonic recordings labeled with predominant instruments.

Bibtex

@inproceedings{sethi2018SMC-synthesis,
 abstract = {Deep learning has yielded promising results in music information retrieval (MIR) and other domains compared to machine learning algorithms trained on hand-crafted feature representations, but it is often limited by the availability of data and a vast hyper-parameter space. It is difficult to obtain large amounts of annotated recordings due to prohibitive labelling costs and copyright restrictions. This is especially true when the MIR task is low-level in nature, such as instrument recognition, and applied to a wide range of world instruments, causing most MIR techniques to focus on recovering easily verifiable metadata such as genre. We tackle this data availability problem using two techniques: generating synthetic recordings from MIDI files and synthesizers, and augmenting the generated samples with added noise and filters. We investigate the application of deep synthetically trained models to two related low-level MIR tasks, frame-level polyphony detection and instrument classification in polyphonic recordings, and empirically show that deep models trained on synthetic recordings augmented with noise can outperform a majority class baseline on a dataset of polyphonic recordings labeled with predominant instruments.},
 accepted = {2018-05-10},
 author = {Rameel Sethi and Noah Weninger and Abram Hindle and Vadim Bulitko and Michael Frishkopf},
 authors = {Rameel Sethi, Noah Weninger, Abram Hindle, Vadim Bulitko, Michael Frishkopf},
 booktitle = {15th Sound and Music Computing Conference (SMC 2018)},
 code = {sethi2018SMC-synthesis},
 date = {2018-05-10},
 funding = {KIAS, NSERC Discovery},
 location = {Limassol, Cyprus},
 pagerange = {1--10},
 pages = {1--10},
 role = {Author},
 title = {Training Deep Convolutional Networks with Unlimited Synthesis of Musical Examples for Multiple Instrument Recognition},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sethi2018SMC-synthesis.pdf},
 venue = {15th Sound and Music Computing Conference (SMC 2018)},
 year = {2018}
}