Digital Music Research Network

EPSRC Network GR/R64810/01

Funded by
Engineering and Physical Sciences Research Council

Report on SARC Research Workshop on Music Informatics and Cognition

29 May, 2004
Sonic Arts Research Centre (SARC),
Queen's University of Belfast

Report by:
Dave Meredith (Centre for Computational Creativity, City University, London)

The SARC Research Workshop on Music Informatics and Cognition was one of the events in the programme of the 2004 Sonorities Festival of Contemporary Music. This festival marked the opening of the Sonic Arts Research Centre (SARC) at Queen's University of Belfast.

The workshop was organised by Christina Anagnostopoulou and consisted of four lectures given by Alan Marsden, Henkjan Honing, Mark Steedman and François Pachet. All four speakers are well-established researchers in the exciting multi-disciplinary fields of music informatics and computational music cognition.

Alan Marsden began by tracing the changes that have taken place since the 1970s in the prevailing attitude towards the dichotomy between music data and the processing of this data. Whereas the main focus in the 1970s was on the development of music data formats, the 1980s saw the emergence of artificial intelligence approaches emphasizing the importance of modelling musical behaviour. In the 1990s, the development of neural networks, genetic algorithms and agent-based systems resulted in the breakdown of the distinction between data and processing. He observed that three essentially different unstructured data formats are still used for exchanging musical information - digital audio, MIDI and notation encodings - and he stressed that we should "return to the issue of information and processing".

Dr. Marsden then used the example task of "shortening" (i.e., summarising) a piece of music to demonstrate the need for structural representations and tools for extracting structure from music. Finally, he demonstrated a program called Novagen which uses Schenkerian principles of melodic structure to automatically generate tonal melodies in response to a dancer's movements captured using EyesWeb, a system developed in Genoa for visual capture and analysis of dance gestures.

Henkjan Honing described some exciting recent work on categorical rhythm perception that he has carried out with Peter Desain and the other members of the Music, Mind, Machine Group at the Universities of Amsterdam and Nijmegen. Dr. Honing described experiments in which expert listeners were asked to transcribe a range of temporal patterns spanning the space of all possible four-event patterns. These experiments revealed that expert listeners categorically perceive temporal patterns varying along a continuous time-scale as rhythms in which the inter-onset intervals are related by small-integer ratios.

The rhythmic categories found in these experiments were shown to be connected, quasi-convex regions in a ternary plot of the complete rhythm space. Moreover, the results showed that the categories were not centred on mechanical renditions of the notated rhythms. The experiments also showed the powerful effect that both metre and tempo have on rhythm perception. For example, most listeners perceive a rhythm in which the inter-onset intervals are (0.263s, 0.421s, 0.316s) as a 1-2-1 rhythm when it is accompanied by a beat that induces a duple metre but as a 1-3-2 rhythm when it is accompanied by a triple-metre beat.
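The core idea - mapping a continuously varying temporal pattern onto the nearest small-integer rhythm - can be made concrete with a brute-force sketch. This is purely illustrative: the actual categories reported by Desain and Honing were derived from listener data (and depend on metre and tempo), not from a nearest-prototype rule like the one below.

```python
from itertools import product

def nearest_integer_rhythm(iois, max_int=4):
    """Return the small-integer duration pattern whose proportions lie
    closest to those of the given inter-onset intervals.

    The candidate patterns, the squared-distance measure and the
    max_int bound are illustrative assumptions, not the published
    model's parameters."""
    total = sum(iois)
    props = [x / total for x in iois]  # point in the ternary rhythm space
    best, best_dist = None, float("inf")
    for cand in product(range(1, max_int + 1), repeat=len(iois)):
        ctotal = sum(cand)
        cprops = [c / ctotal for c in cand]
        dist = sum((p - q) ** 2 for p, q in zip(props, cprops))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best

# A pattern with exact 1:2:1 proportions is quantized to (1, 2, 1);
# ambiguous patterns such as (0.263, 0.421, 0.316) sit near category
# boundaries, which is where metre exerts its influence.
print(nearest_integer_rhythm([0.25, 0.5, 0.25]))
```

Note that a listener model would also need the metrical context as an input; the point of the workshop example was precisely that proportional distance alone cannot predict the perceived category.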

Dr. Honing then showed how an analysis of jazz performances revealed that performances of the same notated rhythm by different performers may belong to different rhythmic categories.

Mark Steedman described the advances that he has made since 1984 in the development of a grammar for characterising the set of 12-bar blues chord sequences. Professor Steedman explained that more complicated chord sequences may be derived from simpler ones by propagating perfect cadences backwards. He showed that his earlier (1984) grammar could be interpreted as a finite-state transducer but that this could not be used to construct a harmonic analysis of a piece. In order to do this, the idea of syntactic substitution had to be abandoned in favour of a grammar founded in a model theory for harmony, which he based on Longuet-Higgins's three-dimensional tonal space. Professor Steedman proposed that "musically coherent chord sequences" correspond to orderly progressions between two points by small steps within this space. Moreover, he claimed that representing the chord sequences in Longuet-Higgins's tonal space makes it clear that the dominant seventh tends to resolve to the tonic because the tonic triad "fits neatly" into a "hole" in the dominant seventh chord when viewed in this space.
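Longuet-Higgins's tonal space arrays pitches along a perfect-fifth axis and a major-third axis. A minimal sketch of a two-dimensional slice of that space (octave dimension omitted) shows how the chords discussed above can be given coordinates; the particular coordinate choices below are illustrative assumptions, not Steedman's analysis.

```python
# Pitch classes reached from C by steps of a perfect fifth (7 semitones)
# along one axis and a major third (4 semitones) along the other.
NAMES = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}

def pitch_class(fifths, thirds):
    """Pitch class at coordinate (fifths, thirds), measured from C."""
    return (7 * fifths + 4 * thirds) % 12

# Illustrative coordinates for the chords mentioned in the talk:
# the dominant seventh of C major (G B D F) and the tonic triad (C E G).
dominant_seventh = {"G": (1, 0), "B": (1, 1), "D": (2, 0), "F": (-1, 0)}
tonic_triad = {"C": (0, 0), "E": (0, 1), "G": (1, 0)}

for name, (f, t) in {**dominant_seventh, **tonic_triad}.items():
    assert NAMES[pitch_class(f, t)] == name
```

Plotting these coordinates on the (fifths, thirds) grid is one way to visualise the "hole" Steedman referred to: the tonic triad's points sit inside the region spanned by the dominant-seventh's points.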

In the second half of his talk, Professor Steedman explained how a combinatory categorial grammar would probably work better for characterising the system of permissible chord structures in 12-bar blues because it allows left-branching analyses of structures that we usually think of as being predominantly right-branching.

The final talk was given by François Pachet who presented an overview of the technologies developed at the Sony Computer Science Laboratory in Paris over the course of the three-year Cuidado project which ended in December 2003. The main application developed in the Cuidado project is the Music Browser, a database management system capable of handling large music catalogues and offering many novel content-based access methods in an integrated environment. It provides a sophisticated interface that allows the user to search or browse for music titles using not only textual editorial information (e.g., composer, date, publisher, etc.) but also acoustic descriptors, extracted directly from the audio data. The Music Browser also gives results organised by acoustic and cultural similarity. For example, the user may specify that he wishes to find "high energy" music or pieces with a similar timbre to some specified work.

Perhaps the most impressive technology presented was the Extractor Discovery System (EDS) incorporated into the Music Browser. This is the first generic scheme for extracting arbitrary high-level music descriptors from audio data. EDS is capable of automatically generating a complete audio extractor when given only a test database and corresponding perceptual tests. It searches for specific and efficient audio features using genetic programming and combines these features using a machine-learning algorithm. EDS has been used to generate an extractor that identifies whether or not a voice is present in a work with over 80% accuracy.

Finally, Dr. Pachet described the MusicCrawler application which automatically computes cultural associations between various text strings which may denote performers, composers, genres, song titles etc. The MusicCrawler crawls the web, accumulating text web pages and detecting occurrences of search items. It then computes co-occurrence matrices from which it derives inter-item distances which can be used as measures of "cultural similarity".
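The co-occurrence computation at the heart of MusicCrawler can be sketched in a few lines. The counting scheme, the cosine-style normalisation and the example items below are illustrative assumptions, not Sony CSL's actual algorithm.

```python
from itertools import combinations
from math import sqrt

def cooccurrence_distances(pages, items):
    """Count, across a collection of page texts, how often each pair of
    items co-occurs, and turn the counts into a 'cultural distance':
    0 for items that always appear together, 1 for items that never do."""
    count = {i: 0 for i in items}                      # pages mentioning item
    co = {pair: 0 for pair in combinations(items, 2)}  # pages mentioning both
    for page in pages:
        text = page.lower()
        present = [i for i in items if i.lower() in text]
        for i in present:
            count[i] += 1
        for a, b in combinations(items, 2):
            if a in present and b in present:
                co[(a, b)] += 1
    # Distance = 1 minus a cosine-style normalised co-occurrence score.
    dist = {}
    for (a, b), n in co.items():
        denom = sqrt(count[a] * count[b])
        dist[(a, b)] = 1.0 - (n / denom if denom else 0.0)
    return dist

pages = [
    "miles davis played with john coltrane",
    "john coltrane recorded a love supreme",
    "miles davis recorded kind of blue with john coltrane",
    "mozart wrote symphonies",
]
items = ["miles davis", "john coltrane", "mozart"]
d = cooccurrence_distances(pages, items)
```

On this toy corpus the distance between "miles davis" and "john coltrane" is small, while the distance from either of them to "mozart" is maximal, which is the kind of culturally derived ordering the Music Browser exploits.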

Many of those attending the workshop had travelled far, and they were rewarded with the opportunity to hear extended presentations of cutting-edge research by leading practitioners in the field. Most attendees were either doctoral students or professionals in the field, so the coffee breaks and lunch afforded a valuable opportunity for interesting discussions with other researchers from various parts of the world.

Dave Meredith