VITA 2010

  Vienna Talk 2010 on Music Acoustics
"Bridging the Gaps"
      September 19–21


Applications of musical sound analysis/synthesis including timbre perception (J. Beauchamp, S. Carral)

In this session we welcome all papers that present methods to describe, characterise and/or differentiate instruments by analysing the radiated sound they produce.


Beauchamp, James:
(Keynote) / O
In Western musical culture, instruments have been developed according to unique acoustical features based on types of excitation, resonance, and radiation. These include the woodwind, brass, bowed and plucked string, and percussion families of instruments. On the other hand, instrument performance depends on musical training, and music listening depends on perception of instrument output. Since musical signals are easier to understand in the frequency domain than in the time domain, much effort has been made to perform spectral analysis and extract salient parameters, such as spectral centroid changes, in order to create simplified synthesis models for musical instrument sound synthesis. Moreover, perceptual tests have been made to determine the relative importance of various parameters, such as spectral centroid variation, spectral incoherence, and spectral irregularity. It turns out that the importance of particular parameters depends both on their strengths within musical sounds and on the robustness of their effect on perception. The methods that the author and his colleagues have used to explore timbre perception are: 1) discrimination of parameter reduction or elimination, 2) parameter perturbation, and 3) multidimensional scaling based on perception of timbre dissimilarity. Ramifications of this work for sound synthesis and timbre morphing will be discussed and demonstrated.
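The spectral centroid mentioned in the abstract above is the amplitude-weighted mean frequency of a magnitude spectrum. A minimal Python sketch (an illustrative computation on a synthetic 500 Hz test tone, not the author's actual analysis code):

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Spectral centroid (Hz) of one frame: the amplitude-weighted
    mean frequency of its magnitude spectrum."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = mags.sum()
    if total == 0.0:
        return 0.0
    return float((freqs * mags).sum() / total)

# A pure tone's centroid sits at the tone's own frequency.
sr = 8000
t = np.arange(2048) / sr
tone = np.sin(2 * np.pi * 500 * t)
print(spectral_centroid(tone, sr))  # approximately 500.0
```

Tracking this value frame by frame over a note yields the "spectral centroid variation" used as a timbre parameter.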
Buen, Anders:
(Invited) / O
During this presentation one should be able to hear and see what makes a violin sound more or less classic Italian. Additionally, some ideas on how to increase the probability of making an instrument with such characteristics are presented.
Violin timbre is discussed and demonstrated in light of Heinrich Dünnwald's work on parameters for the relative levels of certain frequency ranges of narrowband violin body spectra. He analyzed more than 700 violins and found some characteristic objective features in the spectra of Old Italian violins that differed from modern master-made and factory-made violins.
Using data from my impact hammer data set of violins and Hardanger fiddles, I compare the spectra of instruments with low versus high values of these Dünnwald parameters and make "difference filters" between them. These filters are then used in modifications of a short played violin phrase, demonstrating the effects as the Dünnwald parameters are varied. I will also present some preliminary results from a data mining project in which I extract some simple significant correlations between construction details and these objective timbre parameters.
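A "difference filter" of this kind can be understood as a per-bin level difference (in dB) between two averaged spectra, imposed on a recording as an EQ curve. The following is a hypothetical illustration of that idea in Python, not the author's code:

```python
import numpy as np

def difference_filter_db(spec_a_db, spec_b_db):
    """Per-bin level difference (dB) between two averaged magnitude
    spectra; usable as an EQ curve (a stand-in for the 'difference
    filters' described above)."""
    return spec_a_db - spec_b_db

def apply_db_curve(signal, curve_db):
    """Impose a dB gain curve on a signal's rfft bins and resynthesize."""
    spec = np.fft.rfft(signal)
    gain = 10.0 ** (curve_db / 20.0)
    return np.fft.irfft(spec * gain, n=len(signal))

# Toy check: a flat +6 dB difference curve scales the waveform
# by 10**(6/20), i.e. roughly doubles it.
x = np.sin(2 * np.pi * np.arange(256) / 16)
curve = difference_filter_db(np.full(129, 6.0), np.zeros(129))
y = apply_db_curve(x, curve)
print(np.allclose(y, x * 10 ** (6 / 20)))  # prints True
```

In practice the curve would come from averaging body spectra of two instrument groups rather than from constants as here.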
Fritz, Claudia:
(Invited) / O
This study aims at investigating what "nasality" and "clarity" mean to violinists, and what their acoustical correlates are. Based on his measurements of the acoustical properties of a large range of violins, Dünnwald (1991) associated a large amplitude in the band 650-1300 Hz with "nasality" and a low amplitude in the band 4200-6400 Hz with "clarity", but without any perceptual testing. Therefore, Fritz et al. (2009) conducted listening tests with English-speaking violinists to analyse the relationship of these verbal descriptors to specific acoustical features of computer-generated ("virtual") violin sounds.
The listening test was redone with French-speaking violinists. To go beyond simple labelling, the participants were asked to give explicit descriptions of what "clair" and "nasal" mean to them and the reasons for their choices.
The results are very similar to those obtained in English and conflict with Dünnwald's suggestions. As for "clear", the results for "clair" yielded a high degree of consistency between subjects, and the term was associated with an increase in energy in the frequency range 1.6 kHz to 3.2 kHz. For "nasal", subjects could be divided into two groups, each showing high consistency: an increase in the band 1.6 kHz to 3.2 kHz increased "nasalité" for the first group while decreasing it for the second. Linguistic analyses were then conducted on the descriptions given by the participants and compared with the results from the listening test. While acoustical parameters cannot discriminate the two meanings of "clair" ("rich in high frequencies" and "precise"/"definite"), the two groups for "nasal" were semantically identified as corresponding to two meanings of nasality (the "vocal nasality" referring to the quality of a twangy voice for one group, and the "phonetical nasality" and/or the fact that the sounds appear low-pass filtered for the other group).
Kostek, Bozena:
(Invited) / O
The effectiveness of the blind separation of musical sounds contained in sound mixtures is examined using subjective tests and a machine-learning approach. First, the separation results are evaluated based on perceptual assessment and on analysis of the energy-based error between the original signals used for mixing and the separated ones. Then, an alternative approach to objective evaluation is introduced and discussed: artificial neural networks are employed to recognize the separated sounds. The separation process uses a sinusoidal modeling approach. The frequency-domain representation is sampled by complex exponentials of non-constant frequency, and may therefore robustly represent long sounds that are closely spaced in frequency and exhibit significant pitch variations. Detection of the sinusoidal content is performed, and the amplitude and phase tracks of the sounds contained in the mixture are estimated from short-time Fourier spectra. An extension to existing separation methods is also shown, in which harmonic partials are retrieved directly from the frequency domain. The signals in the mixture are then represented as a sum of sinusoids with time-varying frequencies, phases and amplitudes. Four separation algorithms are presented, their performance with regard to the new approach to objective evaluation is assessed, results are discussed and conclusions drawn.
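The sinusoidal-modeling front end described above starts by detecting sinusoidal peaks in each analysis frame. A deliberately rudimentary Python sketch of that first step (local maxima of the magnitude spectrum above a relative threshold; the actual system is far more elaborate):

```python
import numpy as np

def detect_sinusoids(frame, sample_rate, threshold=0.1):
    """Toy sinusoidal detection for one analysis frame: local maxima
    of the Hann-windowed magnitude spectrum above a relative threshold
    are taken as sinusoidal peaks and returned as frequencies (Hz)."""
    win = np.hanning(len(frame))
    mags = np.abs(np.fft.rfft(frame * win))
    floor = threshold * mags.max()
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > floor
             and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]
    bin_hz = sample_rate / len(frame)
    return [k * bin_hz for k in peaks]

# Two-tone mixture: both components should be found.
sr = 8000
t = np.arange(1024) / sr
mix = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
print(detect_sinusoids(mix, sr))  # approximately [500.0, 1500.0]
```

Chaining such per-frame peaks over time yields the amplitude and phase tracks from which the separated sounds are resynthesized.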
Marchand, Sylvain:
(Invited) / O
Spectral models attempt to parametrize sound at the basilar membrane of the ear. Thus, sound representations and transformations in these models should be closely linked to perception. However, the perceived quality is highly dependent on the analysis stage. For decades, researchers have devoted considerable effort to improving the precision of sound analysis, and yet the resulting quality is not sufficient for demanding applications.

One approach is to try to improve the analysis methods even further, though without guarantee of success, since theoretical bounds may exist indicating the minimal error (i.e. maximal quality) reachable without extra information (the blind approach).

Another approach is to inject some information. This can be prior knowledge about the sound sources and/or the way the human auditory system will perceive them (the computational auditory scene analysis approach). But when access to the compositional process is given, another option is to use some bits of the ground truth as additional information to help the analysis process. This is the concept of "informed analysis" (as opposed to the blind approach), used recently to improve sound source separation.

The additional information can be embedded in the sound signal itself using audio watermarking techniques. The stereo mix can then be stored on an audio CD in a manner fully backward compatible with standard CD players, while permitting enhanced sound analysis thanks to the additional information. The precision of the analysis is thus improved well beyond the limitations of the blind approach.
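As a crude illustration of the idea of hiding side information inside the audio itself, the sketch below substitutes the least significant bit of 16-bit PCM samples. This is a hypothetical toy, not the watermarking scheme used in the work described above (real audio watermarks are perceptually shaped and far more robust):

```python
import numpy as np

def embed_bits(pcm, bits):
    """Hide side-information bits in the least significant bit of
    16-bit PCM samples (toy LSB substitution; audible change is at
    most one quantization step per sample)."""
    out = pcm.copy()
    out[: len(bits)] = (out[: len(bits)] & ~1) | np.asarray(bits, dtype=out.dtype)
    return out

def extract_bits(pcm, n):
    """Recover the first n embedded bits."""
    return list(pcm[:n] & 1)

pcm = np.array([1000, -2000, 3000, -4000, 5000], dtype=np.int16)
payload = [1, 0, 1, 1]
marked = embed_bits(pcm, payload)
print(extract_bits(marked, 4))  # prints [1, 0, 1, 1]
```

The key property the real system shares with this toy is backward compatibility: a standard player simply plays the (imperceptibly altered) samples, while an informed analyser reads the payload back out.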

This opens up impressive new applications, such as "active listening", which enables the listener to interact with the sound while it is played. The musical parameters (loudness, pitch, timbre, duration, spatial location) of the sound entities (sources) present in the musical mix stored on the CD can thus be changed interactively, with high perceived quality.
Reuter, Christoph:
(Invited) / O
Herbert Griebel and Christoph Reuter.
Jaco is a tool that allows you to work directly in the time-frequency domain. Basic operations are deleting, amplifying, copying and pasting arbitrary time-frequency regions. These operations require only a selection, i.e., a time-frequency mask, and need not take signal components such as noise and sinusoids into account. More advanced functions allow the selection of signal components so that the basic operations can be performed on these components as well. For example, sinusoids can be selected by simply clicking on them. In the demonstration, we will focus on a few features used at the Institute of Musicology in Vienna. Thresholding the energy distribution over the time-frequency plane allows sounds to be characterized; this is done by removing signal energy below or above a certain threshold. Using sinusoid tracks, multiple operations can be performed, such as removing or amplifying voices in a dense music recording or analyzing the frequencies of the harmonics using mathematical models. Further examples will apply spectral envelopes when working with timbre.
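Deleting a selected time-frequency region can be illustrated in miniature as zeroing a band of FFT bins. The sketch below is a hypothetical whole-signal simplification (Jaco itself operates on local time-frequency selections, not the entire signal at once):

```python
import numpy as np

def delete_band(signal, sample_rate, lo_hz, hi_hz):
    """Zero out one frequency band of a signal: the whole-signal
    analogue of deleting a selected region in a spectral editor."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spec[(freqs >= lo_hz) & (freqs <= hi_hz)] = 0.0
    return np.fft.irfft(spec, n=len(signal))

# Deleting the 1.5-2.5 kHz band removes the 2 kHz component
# of the mixture while leaving the 500 Hz component intact.
sr = 8000
t = np.arange(1024) / sr
mix = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
cleaned = delete_band(mix, sr, 1500, 2500)
```

The "amplify" and "copy/paste" operations correspond to multiplying or transplanting the masked bins instead of zeroing them.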
Interactive tools have proved useful for music education, as they provide the user with real-time performance assessment, unlimited orientation and practice time, entertaining ways of displaying content and variable levels of difficulty in a single application. The algorithms behind these tools generally require solid rhythmic, harmonic and melodic processing approaches as well as flexibility in terms of parameter selection and processing time. In recent years, many efforts have been made by the Music Information Retrieval (MIR) community to improve algorithm performance. Even though good results have been obtained in systems for harmonic-percussive separation, singing voice extraction, accompaniment track creation, pitch tracking, rhythm and beat extraction, audio segmentation and chord transcription, among others, there is still much room for improvement and exploration. In particular, a better understanding of instrument acoustics and models and their implications for signal processing is needed. Many studies have attempted to include instrument-dependent spectral information in their algorithms. However, most of them are trained systems that use a set of collected reference data but include very limited or no real acoustical models and parameters. In contrast, a few systems have attempted to use instrument-specific acoustical models, but these generally suffer from a lack of robustness due to the great variability in playing styles, setups and performers, which necessarily creates deviations from the model. It is therefore important and relevant to explore possibilities for building algorithms that combine the flexibility of trained systems with current signal processing techniques and real, structured acoustical information about the instruments treated. Furthermore, real testing data, recordings, measurements and design parameters used by acousticians and instrument makers can also be important for the generality and flexibility of any built system.
This study investigates the appearance of combination tones in violins.
An experiment was performed in which a violinist played a particular musical interval as accurately as possible. This interval was recorded and subsequently analysed using a Fourier transform. In addition to the partial tones of the primary interval, the resulting spectrum showed frequencies corresponding to combination tones. Such frequencies may influence the timbre of musical intervals played on the violin.
With our newly devised tone matrix one can compute all potential combination tones that can occur between any pair of partial tones. Detailed analysis of musical intervals, by means of both the frequency spectrum and the tone matrix, reveals characteristic mirror and point symmetries in the partial-tone structure.
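A matrix of this kind can be sketched as a table of first-order sum and difference tones between every pair of partials of the two notes of an interval. This is a hypothetical reconstruction for illustration; the authors' actual tone matrix may differ in detail:

```python
import itertools

def tone_matrix(f1, f2, n_partials=4):
    """Map each pair (partial of f1, partial of f2) to its first-order
    combination tones (sum tone, difference tone), in Hz."""
    p1 = [k * f1 for k in range(1, n_partials + 1)]
    p2 = [k * f2 for k in range(1, n_partials + 1)]
    return {(a, b): (a + b, abs(a - b))
            for a, b in itertools.product(p1, p2)}

# Just major third, 320 Hz and 400 Hz: the fundamentals produce a
# sum tone at 720 Hz and a difference tone at 80 Hz.
m = tone_matrix(320, 400)
print(m[(320, 400)])  # prints (720, 80)
```

Because swapping the two partials leaves the sum tone unchanged and negates the difference before the absolute value, such a table exhibits the kind of symmetry the abstract describes.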
It is hoped that this research will lead to results relevant for composers, interpreters and violin acousticians. The presentation will be interactive and includes practical examples.
The Pipa is a short-necked lute and one of the oldest and most important solo instruments in China. Because of its complicated technique and wide range of expressive musical forms and styles, the Pipa is an instrument which is popular not only in China, but also in Western culture.
The traditional Pipa schools were formed in Shanghai, the most prosperous place, in the middle of the nineteenth century. Each Pipa School had its own special music collections, notations, fingerings (techniques), styles, different improvisations, performance aesthetics and representative Pipa teachers.
For this study, rather than simply analysing the musical score, recordings of three representative Pipa masters playing the same civil piece, "Yue er gao" (The Moon on High), were analysed using the computer program SNDAN, developed by Rob Maher and James Beauchamp. This paper explores how the masters performed the same core melody but embellished it in various ways governed by the different aesthetic principles of their respective schools. In addition, this paper examines how various playing techniques produce different sound effects with regard to pitch variation and tremolo.