Unsupervised analysis of polyphonic music using sparse coding

This page contains supporting material for the article:

S. A. Abdallah and M. D. Plumbley. Unsupervised analysis of polyphonic music using sparse coding. IEEE Transactions on Neural Networks, 17(1), 179-196, January 2006.

Synthetic harpsichord recording
Sparse coding resulted in the dictionary shown above, which is available for download as a MAT file (105 kB) or a compressed text file (97 kB). Many of the dictionary elements have approximately harmonic spectra and, when sonified (by using each spectrum to filter Gaussian white noise), give rise to a clear pitch percept. The sonified dictionary elements are available as an uncompressed WAV (368 kB) or an MP3 (48 kB), in the same order as in the figure above.

The sparse encoding of the audio spectrum using this dictionary gives a representation much like a piano roll. Because dictionary elements and notes correspond almost one-to-one, this encoding can be used to generate a MIDI encoding of the music. We used a simple threshold-crossing detector to trigger MIDI events; some of the resulting MIDI files are available here [MIDI file 0, MIDI file 1, MIDI file 2, MIDI file 3]. Note that these MIDI files use the piano patch rather than the harpsichord patch, as the harpsichord patches on most consumer systems (and, some would say, harpsichords in general) are rather painful to listen to.

Real piano recording

The results in this section were generated by analysing real piano recordings from two commercially available CDs (Jeno Jando playing Bach's Well-Tempered Clavier, Naxos 855097071, and Andras Schiff playing Bach's Two- and Three-Part Inventions). Before the analysis, the stereo signals were down-sampled to 11 kHz and the left and right channels were summed.
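The page does not describe in detail the threshold-crossing detector that produced the harpsichord MIDI files. The following is a minimal sketch of one way such a detector could work, not the authors' code. It assumes the sparse coefficients are held in an array of shape (elements × frames) with a fixed frame hop in seconds, and that a hand-made mapping from dictionary element to MIDI note number is available. The function name `threshold_crossings_to_notes`, the mapping and the threshold value are all hypothetical. A note-on is emitted when an element's coefficient magnitude rises above the threshold, and the matching note-off when it falls back.

```python
import numpy as np


def threshold_crossings_to_notes(activations, threshold, hop_seconds, element_to_pitch):
    """Turn sparse coefficients into (midi_pitch, onset_s, offset_s) note events.

    activations: array of shape (n_elements, n_frames), one row per dictionary element.
    threshold: a note counts as "on" while the coefficient magnitude exceeds this value.
    hop_seconds: time between successive frames.
    element_to_pitch: hypothetical dict mapping element index -> MIDI note number;
        elements missing from the dict (e.g. non-pitched components) are ignored.
    """
    magnitudes = np.abs(np.asarray(activations, dtype=float))
    notes = []
    for k, row in enumerate(magnitudes):
        pitch = element_to_pitch.get(k)
        if pitch is None:
            continue
        active = row > threshold
        # Pad with "off" at both ends so notes touching the edges still open and close.
        padded = np.concatenate(([False], active, [False])).astype(int)
        changes = np.diff(padded)
        onsets = np.flatnonzero(changes == 1)    # first frame above the threshold
        offsets = np.flatnonzero(changes == -1)  # first frame back at or below it
        for on, off in zip(onsets, offsets):
            notes.append((pitch, float(on * hop_seconds), float(off * hop_seconds)))
    # Order events by onset time, then by pitch.
    notes.sort(key=lambda note: (note[1], note[0]))
    return notes


# Toy example: two elements, five frames, 0.25 s hop.
acts = np.array([[0.0, 0.9, 0.8, 0.1, 0.0],
                 [0.0, 0.0, 0.7, 0.7, 0.7]])
print(threshold_crossings_to_notes(acts, 0.5, 0.25, {0: 60, 1: 64}))
# [(60, 0.25, 0.75), (64, 0.5, 1.25)]
```

Writing these events to a `.mid` file needs a MIDI library and is left out here. A practical detector might also add a minimum note length, or separate on and off thresholds (hysteresis), to suppress flickering notes; whether the original detector did so is not stated.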
Sparse coding resulted in the dictionary above. The following two audio files differ in how the sonified dictionary elements were normalised: in this MP3 (239 kB), the overall power relationships between the elements are preserved (so some elements are much quieter than others), while in this MP3 (239 kB), each element was individually scaled to the same energy so that they all have similar loudness.
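Both the sonification described for the harpsichord dictionary (filtering Gaussian white noise with each dictionary spectrum) and the two normalisation schemes above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code used to make the audio files. It assumes each dictionary column is a magnitude response sampled at equally spaced frequencies from 0 Hz to the Nyquist frequency, applies the filter in a single FFT (zero phase), and takes 11 kHz to mean 11025 Hz. The function name `sonify_dictionary` and its parameters are hypothetical.

```python
import numpy as np


def sonify_dictionary(dictionary, sample_rate=11025, duration=1.0,
                      normalise="equal", seed=0):
    """Render each dictionary column as Gaussian white noise filtered by that spectrum.

    dictionary: array of shape (n_bins, n_elements); column j is treated as a
        magnitude response on n_bins equally spaced frequencies from 0 Hz to Nyquist
        (absolute values are taken, an assumption for elements with negative entries).
    normalise: "power" keeps the relative power of the elements;
        "equal" gives every element the same energy.
    Returns an array of shape (n_elements, n_samples) scaled so the loudest
    sample has magnitude 1 (unless the dictionary is all zeros).
    """
    spectra = np.abs(np.asarray(dictionary, dtype=float))
    n_bins, n_elements = spectra.shape
    n_samples = int(round(duration * sample_rate))
    rng = np.random.default_rng(seed)

    dict_freqs = np.linspace(0.0, sample_rate / 2, n_bins)          # dictionary bins
    noise_freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)   # FFT bins of the noise

    out = np.empty((n_elements, n_samples))
    for j in range(n_elements):
        noise = rng.standard_normal(n_samples)
        # Resample the element's spectrum onto the noise's FFT bins ...
        gain = np.interp(noise_freqs, dict_freqs, spectra[:, j])
        # ... and apply it as a zero-phase filter in the frequency domain.
        out[j] = np.fft.irfft(np.fft.rfft(noise) * gain, n=n_samples)

    if normalise == "equal":
        # Unit energy per element, so all elements have similar loudness.
        energy = np.sqrt(np.sum(out ** 2, axis=1, keepdims=True))
        out = out / np.maximum(energy, 1e-12)
    elif normalise != "power":
        raise ValueError("normalise must be 'equal' or 'power'")

    # One common gain for all elements: peak of 1 without changing their ratios.
    return out / max(np.max(np.abs(out)), 1e-12)


# Hypothetical usage: a random 129-bin, 4-element "dictionary", played one after another
# with about 0.25 s of silence between elements.
D = np.abs(np.random.default_rng(1).standard_normal((129, 4)))
clips = sonify_dictionary(D, normalise="equal")
gap = np.zeros((clips.shape[0], 2756))
audio = np.hstack([clips, gap]).ravel()
```

With `normalise="power"` only the final global gain is applied, so the ratios between element energies survive and quiet elements stay quiet; with `normalise="equal"` each element is first scaled to unit energy. The resulting `audio` array could then be converted to 16-bit integers and written with Python's standard `wave` module.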
Acknowledgements

This work was funded by the EPSRC grant "Automatic Music Transcription using ICA".
© Queen Mary, University of London 2005

Electronic Engineering, Queen Mary University of London, Mile End Road, London E1 4NS, UK. Tel: +44 (0)20 7882 5346, Fax: +44 (0)20 7882 7997