VITA 2010

  Vienna Talk 2010 on Music Acoustics
"Bridging the Gaps"
      September 19–21


Timbre synthesis and perception (Chr. Reuter)

In this session, model representations of timbre perception and synthesis will be discussed. The focus will be on questions that may help to fill or bridge some gaps, such as: Which timbre cues lead to which kind of instrument recognition (from a music-psychological perspective), and which kind of cortical representation accompanies them (from a neuronal perspective)? How can synthesis techniques like physical modeling or pulse forming help us to deepen our knowledge about timbre genesis in real instruments? Which instruments and synthesis techniques from the first part of the 20th century are more or less entirely unknown to Western researchers? How is it possible to extract individual acoustical events (such as noises or instruments) from a complex auditory scene with the help of mathematics?


Andermann; Martin: 
(Invited) / P
The size of a sound source determines the scale of its resonances (“acoustic scale”) in harmonically complex periodic sounds like vowels or instrumental tones. Here, the representation of acoustic scale in human auditory cortex was investigated in two experiments using magnetoencephalography (MEG). Using the STRAIGHT software (Kawahara et al., 2004), French horn tones were scaled to sound larger or smaller by shifting their spectral envelope as a unit along a logarithmic frequency axis, independent of the fundamental frequency. In experiment 1, auditory evoked fields (AEFs) were recorded in fifteen subjects as they listened to scaled French horn tones which had additionally been “whitened” to eliminate pitch cues. In experiment 2, the same subjects were presented with French horn tone triplets varying in pitch and scale. The resulting cortical activity was evaluated by spatio-temporal source analysis with two equivalent dipoles in each hemisphere. In the first experiment, the change of acoustic scale elicited a well-defined N1m/P2m AEF complex in all subjects, with the cortical generators of the N1m component located in the planum temporale and the P2m generated in Heschl’s gyrus. The N1m model was then applied to the data from the second experiment, where the neuromagnetic responses to a scale change could be clearly separated from responses to changes in pitch. The results demonstrate that acoustic scale as a physical property of sound can be distinguished in its cortical representation from pitch-associated cortical activity. Further, the existence of a “Scale-N1m” within the complex structure of the N1m highlights the importance of this component for the processing of spectral features in auditory cortex.
Davidenkova; Ekaterina: 
(Invited) / O
This paper analyzes the history of electro-musical instrument development in Russia and shows that the period from the 1930s to the 1970s was characterized by great interest among acousticians and engineers in producing a new generation of such instruments. The first instrument was the "Thereminvox" (1919), created by the outstanding Russian engineer L.S. Termen, who worked for many years to invent further instruments of the new generation: “Terpsitone”, “Rhythmicon”, “Harmonium” and others. A series of other electro-musical instruments followed: in 1924 V.A. Gurov and V.I. Volynkin built a monodic neck-rheostatic instrument named “Violena”; in 1925 S.N. Rzhevkin created the polyphonic keyboard instrument “Cathodic Harmonium”; by 1935 A.A. Volodin had engineered a series of keyboard-necked instruments, the “Ekvodin”. In 1937 I.D. Simonov presented the keyboard instrument “Companola”, and in 1944 A.V. Rimsky-Korssakov, V.A. Kreytser and A.A. Ivanov produced the keyboard electro-musical instrument “Emiriton”. In 1955 S.G. Korssunsky invented the harmonium “Kristadin”, and in 1958 I.D. Simonov engineered a piano with an electronic sound driver. The first analogue synthesizer (ANS), which served as the prototype for subsequent instruments of this type, was engineered by E.A. Murzin between 1938 and 1957. Based on a unique method of optical generation of pure tones followed by photoelectric reading of the signals, the ANS combined the generation, recording and playback of sounds by means of painting on glass. With a 72-step temperament and a sounding range of 10 octaves, the ANS let the composer use all frets and tone scales as well as noises, and alter timbres without limit.
Dörfler; Monika: 
(Invited) / O
In a mathematical setting, most commonly used representations of audio signals may be modelled in the time-frequency domain. For musical instrument sounds possessing the same fundamental frequency and onset-time, the difference in timbre can be described via a transition mask. Using this mask, the transition between instrumental timbres may be directly studied. In particular, intermediate timbres can be produced in a controlled manner, i.e. while observing the changes of the time-frequency pattern.
In this contribution we will introduce the techniques necessary to understand the approach. In particular, we will point out the importance of judiciously designed time-frequency dictionaries, which allow for perfect reconstruction. This design step requires the inspection of certain operators associated with time-frequency representations. We will show that in situations of practical relevance, these operators are pure multiplication operators and can thus be easily inverted. This inversion step then provides a perfect analysis-synthesis system allowing for controlled coefficient modification. Furthermore, we will show and play several examples. As an application, the approach is used for the classification of instrumental sounds.
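The analysis-synthesis idea can be illustrated with a minimal sketch. It assumes a short-time Fourier dictionary whose window is no longer than the FFT length, in which case the frame operator reduces to pointwise multiplication by the sum of the shifted squared windows and is inverted by a simple division. The function names, the test tones and the magnitude-ratio mask are illustrative assumptions, not the estimation method of the contribution.

```python
import numpy as np

def frame_indices(length, win_len, hop):
    # Sample indices of every frame; the signal is treated as periodic.
    starts = hop * np.arange(length // hop)
    return (starts[:, None] + np.arange(win_len)[None, :]) % length

def analyze(x, win, hop):
    # Windowed frames -> time-frequency coefficients.
    idx = frame_indices(len(x), len(win), hop)
    return np.fft.rfft(x[idx] * win, axis=1)

def synthesize(coeffs, win, hop, length):
    # Overlap-add, then invert the frame operator, which here is a pure
    # multiplication by the sum of shifted squared windows.
    idx = frame_indices(length, len(win), hop)
    frames = np.fft.irfft(coeffs, n=len(win), axis=1) * win
    out = np.zeros(length)
    diag = np.zeros(length)
    np.add.at(out, idx, frames)
    np.add.at(diag, idx, np.ones_like(frames) * win**2)
    return out / diag

def tone(amps, f0=200.0, sr=8000, length=2048):
    # Harmonic tone; the amplitudes of the partials define the "timbre".
    t = np.arange(length) / sr
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))

def morph(a, b, t, win, hop, eps=1e-8):
    # Naive transition mask (assumption): ratio of coefficient magnitudes.
    # t = 0 returns a; intermediate t gives intermediate timbres.
    ca, cb = analyze(a, win, hop), analyze(b, win, hop)
    mask = (np.abs(cb) + eps) / (np.abs(ca) + eps)
    return synthesize(ca * mask**t, win, hop, len(a))

win_len, hop = 256, 64
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_len) / win_len)  # periodic Hann
a = tone([1.0, 0.5, 0.25])   # same fundamental and onset,
b = tone([0.3, 0.2, 1.0])    # different spectral weighting
halfway = morph(a, b, 0.5, win, hop)
```

Raising the mask to a power between 0 and 1 is only one possible way to move between the two coefficient patterns; it was chosen here because the unmodified signal is recovered exactly at `t = 0`.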
Fricke; Jobst P.: 
(Invited) / O
In 1951 Licklider presented a model of pitch perception based on a periodicity analysis of neural spikes. This process of pitch extraction is similar to an autocorrelation analysis. In 1988 Langner and Schreiner demonstrated that a periodicity analysis exists in the auditory system and that periodicity pitch is neurally represented in the inferior colliculus and the cortex. Its dimension is independent of the tonotopic representation and runs approximately orthogonal to it. The periodicity of the sound signals of consonant intervals has also been demonstrated neuronally by means of a periodicity analysis (Tramo et al. 2001). In music performance, however, the periodicity of acoustic signals is imperfect. Intonation deviations, which disturb the periodicity, are tolerated by the hearing process to a considerable extent. This can be seen particularly in the judgment of consonant intervals. Depending on the musical context, standard deviations of 13 cents around the optimal intonation were measured for the fourth as well as for the fifth. The total variation was as large as 70 cents. The statistical processes of neural coding and processing, in particular the neural integration for the autocorrelation, seem to be responsible for hearing tolerances of this experimentally determined magnitude.
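As a rough illustration of the two quantities discussed above, the following sketch estimates pitch from the autocorrelation of a waveform and expresses an intonation deviation in cents. It is a signal-level analogy only: the model described here operates on neural spike trains, and the function names, sampling rate and test tone are assumptions made for the example.

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=50.0, fmax=1000.0):
    # Pitch estimate from the lag of the largest autocorrelation peak
    # inside the search range [fmin, fmax].
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi + 1])
    return sr / lag

def cents(f_ref, f):
    # Interval between two frequencies in cents (1200 cents per octave).
    return 1200.0 * np.log2(f / f_ref)

sr = 22000
t = np.arange(2200) / sr
# Harmonic complex tone with a 220 Hz fundamental (period of 100 samples).
x = sum(a * np.sin(2 * np.pi * 220.0 * k * t) for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
estimated_f0 = autocorr_pitch(x, sr)

pure_fifth = cents(220.0, 330.0)            # 3:2 ratio
sharp_fifth = 330.0 * 2 ** (13 / 1200)      # upper tone raised by 13 cents
deviation = cents(330.0, sharp_fifth)
```

A deviation of 13 cents changes the upper frequency of this fifth by only about 2.5 Hz, which gives a sense of how small the tolerated disturbances of periodicity are in absolute terms.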
Oehler; Michael: 
(Invited) / O
The aim of the presented study is to investigate the role of formants and noise parts (e.g. breath noise, air flow) of several woodwind sounds with respect to the perceived typicality of the instrument sounds. In accordance with the results of Reuter (1995, 1996), it is hypothesized that formants play a dominant role if the fundamental frequency stays below the first formant area, whereas noise parts become more important when the fundamental frequency exceeds the region of the first formant. By means of the currently developed analysis software Jaco Visual Signal (by Herbert Griebel, with special functions for the Musicological Institute of the University of Vienna), recorded woodwind sounds could be precisely decomposed into spectral parts specified a priori. Clarinet, flute, oboe and bassoon sounds with deleted formant areas, deleted noise parts and a combination of both deletions, as well as the inverse versions (only the extracted formant areas etc.), were produced in different registers and used as stimuli in a listening experiment. Subjects were presented with the original instrument sounds and subsequently judged the similarity as well as the naturalness of the modified and original stimuli. Although the conditions were rated slightly differently depending on the specific instrument, the overall results seem to support the hypothesis. At the same time, the described corpus of stimuli is being used in neuropsychological timbre experiments (MEG) that, in a further step, will be related to the results of the presented study. Furthermore, future experiments may additionally include the phase parameter, in order to correlate the psychological results with current research in the field of digital pulse forming as an explanation for the sound production of wind instruments.
Rupp; Andre: 
(Invited) / O
Harmonically complex sounds like instrumental tones and vowels exhibit a detailed and distinct formant structure which is crucial for the identification of the instrument. In the current experiment we investigated the specific cortical representation of formants by recording auditory evoked fields (AEFs) evoked by different tones of the oboe (d1, a1, d2, a2, d3) and the bassoon (D, A, d, a, d1). In addition to the original tones, we presented the isolated formants as well as the original tones where the formant structure had been extracted using the Jaco Visual Editor of Herbert Griebel (see Abstract "Jaco" of session "Applications of musical sound analysis/synthesis" (Beauchamp/Carral)). The cutoff for isolating the formants included the range from 0 dB (max level) to -30 dB. The AEFs of the oboe and bassoon tones were recorded in two separate sessions. The original and manipulated tones were presented diotically in a pseudo-random order at a comfortable level. The inter-stimulus interval varied between 800 and 1000 ms. AEFs were recorded in normal-hearing subjects using whole-head magnetoencephalography and evaluated by spatio-temporal source analysis. This technique makes it possible to disentangle the specific neuromagnetic representation of the left and right auditory cortex. Besides describing the AEF waveform morphology, the presentation will focus on the relation of the AEF to the results from simulations of (i) the neural activity at the output of cochlear preprocessing and (ii) the spectro-temporal processing as reflected by the strobed temporal integration using the Auditory Image Model of Patterson, Allerhand & Giguere (1995).
Siddiq; Sadjad: 
(Invited) / P
The acoustic properties of the sitar are studied with the aid of physical models. The non-linearity of the string movement, caused by the bridge acting as an obstacle to the vibrating string, is of special interest.

Different kinds of physical modelling techniques are investigated, each offering certain advantages and drawbacks. Several hypotheses concerning the non-linearity and the sound formation can be verified in the models proposed.

A mass-spring model gives acoustically satisfying results and sheds light on the interaction of string and bridge, while being very slow in calculation. The important role of dispersion for the sound of the sitar is demonstrated in a finite difference model and further studied in a waveguide model.

After discussing the difficulties of the implementation of the models, their results are compared to recordings of the instrument. The waveguide model is found to yield satisfying results while being very fast at the same time.
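The string-bridge interaction can be pictured with a deliberately crude sketch: a finite-difference ideal string whose displacement is clipped at a curved one-sided barrier near one end. This is not any of the models from the abstract; it omits dispersion, stiffness and damping, and the barrier shape, pluck position and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_string(barrier=None, n_points=100, n_steps=2000,
                    pluck_pos=0.3, amp=0.01):
    # Ideal string with fixed ends, explicit scheme at Courant number 1.
    x = np.linspace(0.0, 1.0, n_points)
    # Triangular pluck shape, released with zero initial velocity.
    y = amp * np.where(x < pluck_pos, x / pluck_pos, (1 - x) / (1 - pluck_pos))
    y_prev = y.copy()
    states = np.empty((n_steps, n_points))
    for step in range(n_steps):
        y_next = np.zeros_like(y)
        y_next[1:-1] = y[2:] + y[:-2] - y_prev[1:-1]
        if barrier is not None:
            # One-sided obstacle: the string may not pass below the barrier.
            y_next = np.maximum(y_next, barrier)
        y_next[0] = y_next[-1] = 0.0
        y_prev, y = y, y_next
        states[step] = y
    return states

n_points = 100
x = np.linspace(0.0, 1.0, n_points)
# Hypothetical curved bridge under the first tenth of the string;
# -inf marks positions without any obstacle.
barrier = np.where(x < 0.1, -0.05 * x**2, -np.inf)

states_lin = simulate_string(barrier=None, n_points=n_points)
states_bar = simulate_string(barrier=barrier, n_points=n_points)
```

Comparing `states_bar` with `states_lin` shows that the one-sided constraint alone changes the motion of the whole string, which is the kind of non-linearity the abstract attributes to the bridge.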
