Nicolas Chetry
Contact Details
Title: Research Student
Research Group / Lab: DSP & Multimedia / Centre for Digital Music
Research Project: Musical Instrument Identification / Audio Signal Parametric Modelling

Research Interests

Musical Instrument Identification

Within the Centre for Digital Music, we are interested in information retrieval from audio and musical signals. A particular aspect of our work focuses on the study and design of automatic musical instrument identification systems. Such algorithms aim to reproduce the ability of humans to recognise and identify the sounds populating their environment. In terms of classification, these systems can provide tools to manage and browse musical databases: retrieving pieces of music containing a given instrument, looking for similarities between different musical extracts, or classifying pop songs according to the lead singer's voice are possible end-user applications.

Audio source recognition systems are based on the concept of timbre, the quality or tone colour of a sound. Thanks to timbre, humans can tell different instruments apart even when they play notes at the same pitch and loudness. Timbre modelling is an essential step in the design of a musical instrument identification system. During this stage, features are extracted from the audio signals and then used as input to a statistical classifier whose aim is to build a unique model for each instrument considered.

Our work first focused on the use of Line Spectrum Frequencies (LSF) in conjunction with a k-means classifier. Widely used in speech compression, the LSF are derived from the linear prediction polynomial coefficients. Because they represent the short-term spectral envelope of the signal, the LSF have been shown to be good candidates for modelling the timbre of a sound. During the training phase, the statistical modelling is performed using a k-means algorithm in its classical LBG (Linde-Buzo-Gray) form.
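As an illustration of how LSF can be derived from the linear prediction polynomial, here is a minimal sketch in Python with NumPy. It is not the implementation used in this work: the function names `lpc` and `lpc_to_lsf`, the prediction order of 10, the Hann window and the synthetic test frame are all assumptions chosen for the example. The prediction polynomial A(z) is estimated with the autocorrelation method, and the LSF are taken as the angles of the unit-circle roots of the sum and difference polynomials P(z) and Q(z).

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method linear prediction (Levinson-Durbin recursion).

    Returns the coefficients [1, a1, ..., ap] of A(z) = 1 + sum_k a_k z^-k.
    Assumes a non-silent frame that is longer than the prediction order.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a):
    """Convert A(z) coefficients [1, a1, ..., ap] to LSF in radians, ascending."""
    a_ext = np.append(a, 0.0)
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


# Hypothetical test frame: two sinusoids plus noise, Hann-windowed (16 kHz assumed).
rng = np.random.default_rng(0)
t = np.arange(1024) / 16000.0
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
frame = (frame + 0.1 * rng.standard_normal(t.size)) * np.hanning(t.size)
lsf = lpc_to_lsf(lpc(frame, order=10))  # LSF vector for this frame
```

In a system of the kind described here, one such LSF vector would be computed per short-term frame, and the resulting set of vectors would form the training data for the k-means codebook.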
Each instrument in the database is modelled by an optimised (in the k-means sense) dictionary of LSF vectors. In essence, each codeword can be seen as a characteristic short-term spectral envelope of the given class. During the testing phase, an identical number of codewords is determined for each excerpt in the testing set. The identification process consists of evaluating the distortion between two codebooks: one taken from the model database and one corresponding to the sample to identify.

Recent research has reported that including temporal features in automatic identification systems significantly increases their robustness, particularly when dealing with instruments belonging to the same family. As a result, our base algorithm is currently being modified so that the modelling of onsets and transients can be separated from the modelling of pseudo-stationary signals.

Perceptually Motivated Signal Decomposition

Harmonic signals are composed of sinusoidal components whose frequencies are integer multiples of a fundamental frequency. There is an infinite number of acoustic signals with harmonically or locally harmonically related components: the tones that can be produced with musical instruments and the voiced sounds produced by the human voice both exhibit harmonic patterns. From a musical point of view, the use of a sinusoidal basis to decompose signals provides an elegant and meaningful approach to the parametric representation problem: techniques such as Spectral Modelling Synthesis (SMS) and the Matching Pursuit and Harmonic Matching Pursuit (MP, HMP) algorithms have found various applications in audio signal analysis, processing (effects, mastering, ...) and compression. From a perceptual point of view, pure tones belong to the elementary signal families from which psychoacoustic tables (e.g. ISO/IEC 11172-3) are built.
When more complex sounds are analysed, a mapping between empirical and novel data is performed and the global masking curves are extrapolated using rules determined during listening tests.

Another goal of our work is to include psychoacoustic knowledge in pre-processing modules for signal analysis and high-level information retrieval. For example, we are interested in seeing whether correct identification rates in musical instrument identification systems improve if a psychoacoustic decimation is performed prior to feature extraction. This involves the design of an analysis/processing/synthesis stage during which perceptually relevant partials (with respect to the model) are selected and retained for the synthesis. We are currently evaluating the performance of a system based on the short-term Fourier transform and the MPEG-1/2 psychoacoustic models together with the musical instrument classifier described above.

High Quality Voice Coding

Digital speech coders are nowadays very effective at coding narrow-band speech signals (300 Hz-4 kHz). State-of-the-art dedicated coders achieve compression ratios as high as 1:10 (e.g. G729 A at 8 kbits/s) for a quality equivalent to an 8 kHz sampled, 16-bit PCM signal. Increasing network bandwidth, together with the computational efficiency of new portable DSPs, feeds the need for higher-quality digital speech. Recently, the new AMR-WB (Adaptive Multi-Rate Wide Band) coder has been selected by both 3GPP/ETSI and ITU to become the standard for the next generation of mobile communication systems (3G/UMTS): a 16 kHz sampled speech signal can be encoded at nine different bit rates (from 6.60 to 23.85 kbits/s), depending on the quality of the connection. This coder achieves very good speech quality, equivalent to that of G722 (SB-ADPCM) at 64 kbits/s for the 23.05 kbits/s mode.
In our research, we are interested in the design of a scalable wide-band (50 Hz-8 kHz) and ultra-wide-band (30 Hz-16 kHz) speech coder. For this purpose, we are exploring new ways of encoding and transmitting the signal. In terms of source coding, bandwidth extension techniques are of strong interest for bit rate reduction. The aim is to predict the upper spectral band from the narrow band with no or very little side information. More specifically, spectral replication of the low-frequency short-term residual of the signal has been shown to produce relatively good quality speech when the sampling frequency is increased. As a preliminary experiment, the GSM-FR (13 kbits/s, MOS = 3.5) speech coder has been modified into a bandwidth-extended version. The side information for the high-frequency band (8 spectral envelope parameters) is encoded on 3 bits per parameter, totalling 24 bits per frame of 160 ms, or 150 bits/s. In terms of channel coding, we aim to develop techniques for hiding the side information representing the high-frequency band in the encoded bitstream using watermarking algorithms.

Publications

N. Chetry, M. Davies and M. Sandler. Identification of Monophonic Instrument Recordings using K-means and Support Vector Machines, in Proc. Digital Music Research Network conference, Glasgow, 2005.
N. Chetry, M. Davies and M. Sandler. Musical Instrument Identification using LSF and K-means, in Proc. AES 118, Barcelona, 2005.
C. Duxbury, N. Chetry, M. Sandler and M. Davies. An Efficient Two-stages Implementation of Harmonic Matching Pursuit, in Proc. EUSIPCO, Vienna, 2004.
© Queen Mary, University of London 2008
Electronic Engineering, Queen Mary University of London, Mile End Road, London E1 4NS, UK
Tel: +44 (0)20 7882 5346, Fax: +44 (0)20 7882 7997