Human-computer analysis of musical sound mixtures
Bozena Kostek

The effectiveness of blind separation of musical sounds contained in sound mixtures is examined using subjective tests and a machine-learning approach. First, the separation results are evaluated through perceptual assessment and through analysis of the energy-based error between the original signals used for mixing and the separated ones. Then, an alternative approach to objective evaluation is introduced and discussed. In the proposed solution, Artificial Neural Networks are employed to recognize the separated sounds. The separation process is based on sinusoidal modeling. The frequency-domain representation is sampled by complex exponentials of non-constant frequency and can therefore robustly represent long sounds that are closely spaced in the frequency domain and exhibit significant pitch variations. The sinusoidal content is detected, and the amplitude and phase tracks of the sounds contained in the mixture are estimated from short-time Fourier spectra. An extension to existing separation methods is also shown, in which harmonic partials are retrieved directly from the frequency domain. The signals in the mixture are then represented as a sum of sinusoids with time-varying frequencies, phases, and amplitudes. Four separation algorithms are presented, their performance is assessed with regard to the new approach to objective evaluation, the results are discussed, and conclusions are drawn.
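
As an illustration of two ideas named in the abstract, the sketch below builds a signal as a sum of sinusoids with time-varying amplitudes and frequencies, and computes one possible energy-based error between an original signal and a separated one. The abstract does not give the exact error formula, so the signal-to-error energy ratio in dB used here is an assumption, as are the function names `synthesize` and `energy_error_db`, the test tone, and the scaled copy standing in for a separation result. None of this is the paper's own implementation.

```python
import math


def synthesize(partials, sample_rate, num_samples):
    """Sum of sinusoids with time-varying amplitude and frequency.

    Each partial is a pair of functions (amp(t), freq_hz(t)). The phase of a
    partial is obtained by accumulating its instantaneous frequency.
    """
    out = [0.0] * num_samples
    for amp, freq in partials:
        phase = 0.0
        for n in range(num_samples):
            t = n / sample_rate
            out[n] += amp(t) * math.sin(phase)
            phase += 2.0 * math.pi * freq(t) / sample_rate
    return out


def energy_error_db(original, separated):
    """Original-signal energy over error energy, in dB (assumed measure)."""
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, separated))
    if error_energy == 0.0:
        return math.inf  # identical signals: no error at all
    return 10.0 * math.log10(signal_energy / error_energy)


def f0(t):
    """Hypothetical fundamental: 440 Hz with a slow 5 Hz vibrato."""
    return 440.0 + 5.0 * math.sin(2.0 * math.pi * 5.0 * t)


# Hypothetical two-partial tone: steady fundamental, decaying second harmonic.
partials = [
    (lambda t: 1.0, f0),
    (lambda t: 0.5 * math.exp(-3.0 * t), lambda t: 2.0 * f0(t)),
]

sample_rate = 8000
original = synthesize(partials, sample_rate, sample_rate // 2)

# Stand-in for a separation result: the original attenuated by 10 percent.
separated = [0.9 * x for x in original]

print(round(energy_error_db(original, separated), 2))
```

With the attenuated copy, the error is one tenth of the original in amplitude, so the ratio comes out at about 20 dB. A real evaluation would replace `separated` with the output of a separation algorithm, aligned in time with the original.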