Informed sound analysis for active music listening
Sylvain Marchand

Spectral models attempt to parametrize sound at the basilar membrane of the ear. Sound representations and transformations in these models should therefore be closely linked to perception. However, the perceived quality depends heavily on the analysis stage. For decades, researchers have put a great deal of effort into improving the precision of sound analysis, yet the resulting quality is still not sufficient for demanding applications.

One approach is to improve the analysis methods even further. Success is not guaranteed, though, since theoretical bounds may exist that set the minimal error (i.e. the maximal quality) reachable without extra information (the blind approach).

Another approach is to inject some information. This can be prior knowledge about the sound sources and/or the way the human auditory system will perceive them (the computational auditory scene analysis approach). But when access to the compositional process is given, another option is to use some bits of the ground truth as additional information to help the analysis process. This is the concept of "informed analysis" (as opposed to the blind approach), which has recently been used to improve sound source separation.

The additional information can be embedded in the sound signal itself, using audio watermarking techniques. The stereo mix can then be stored on an audio CD, in a manner fully backward compatible with standard CD players, while permitting an enhanced sound analysis thanks to the additional information. The precision of the analysis is thus improved well beyond the limitations of the blind approach.
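To make the idea of carrying side information inside the signal concrete, here is a minimal sketch. It does not reproduce the watermarking technique used in this work, which the abstract does not specify. As a stand-in, it uses plain least-significant-bit (LSB) embedding on 16-bit PCM samples, the sample format of an audio CD. The function names `embed_bits` and `extract_bits` and the example payload are hypothetical, chosen only for illustration. A practical watermark would also need to be inaudible by psychoacoustic criteria and robust to processing, which this sketch does not attempt.

```python
def embed_bits(samples, payload):
    """Hide payload bytes in the least significant bit of integer PCM samples.

    Assumption: simple LSB embedding, one payload bit per sample.
    Each sample changes by at most 1 quantization step.
    """
    # Unpack each byte into 8 bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for this signal")
    marked = list(samples)
    for n, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        marked[n] = (marked[n] & ~1) | bit
    return marked


def extract_bits(samples, num_bytes):
    """Recover num_bytes of payload from the LSBs of the first samples."""
    out = bytearray()
    for k in range(num_bytes):
        byte = 0
        for s in samples[8 * k: 8 * k + 8]:
            byte = (byte << 1) | (s & 1)
        out.append(byte)
    return bytes(out)
```

A decoder that knows the payload length can call `extract_bits(marked, len(payload))` to read the side information back, while a standard player simply plays `marked` as ordinary audio.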

This opens up impressive new applications, such as "active listening", which enables the listener to interact with the sound while it is played. The musical parameters (loudness, pitch, timbre, duration, spatial location) of the sound entities (sources) present in the musical mix stored on the CD can thus be changed interactively, with high perceived quality.