Sunday, October 16, 2005
Spatialization techniques
Here is what I have compiled so far about current spatialization technology, and I've tried to roughly categorize the papers. As I mentioned in the last meeting, I won't be able to come to this week's meeting, but I would appreciate some feedback on this list.
Spatial Perception
HRTF-based binaural
Cinema-style loudspeaker arrays (5.1, 8-channel)
Ambisonics
Holophonics and Wave-field synthesis
Miscellaneous
Thanks, Grace
Spatial Perception
Spatial perception is frequency dependent: localization cues come mostly from the higher spectral components, so sounds with widely distributed spectra are easier to localize.
Frequency cues for localization seem to be based on experience. In nature, high-frequency sounds are more likely to originate above our heads and low-frequency sounds from below; inverting this mapping degrades localization.
Stationary sounds are easier to locate than moving ones; adding a Doppler shift to a moving source eases its localization.
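As a rough illustration of the Doppler cue mentioned above, here is a small Python sketch of my own (the function name and example values are made up, not taken from any of the papers below) that computes the shifted frequency a stationary listener hears from a moving source:

    import math

    SPEED_OF_SOUND = 343.0  # m/s in dry air at roughly 20 degrees C

    def doppler_frequency(f_source, source_pos, source_vel, listener_pos):
        """Frequency heard by a stationary listener from a moving source.

        Standard Doppler formula f' = f * c / (c + v_r), where v_r is the
        radial velocity of the source (positive when it moves away)."""
        # Vector from the source toward the listener
        dx = [l - s for l, s in zip(listener_pos, source_pos)]
        dist = math.sqrt(sum(d * d for d in dx))
        if dist == 0.0:
            return f_source
        # Radial velocity: positive when the source is receding
        v_radial = -sum(v * d for v, d in zip(source_vel, dx)) / dist
        return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND + v_radial)

    # A 440 Hz source passing the listener at 20 m/s is heard sharp on
    # approach and flat once it has passed:
    print(doppler_frequency(440.0, (-10.0, 2.0), (20.0, 0.0), (0.0, 0.0)))  # ~467 Hz
    print(doppler_frequency(440.0, (10.0, 2.0), (20.0, 0.0), (0.0, 0.0)))   # ~416 Hz

This approach/recede pitch glide is one of the cues Chowning's moving-source simulation (cited below) reproduces.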
- Malham, CMJ 19(4)
- Malham, D. G. “Approaches to spatialization” Organised Sound 3(2): 167–77.
- J. Chowning, "The simulation of moving sound sources", Journal of the Audio Engineering Society, vol. 19, no. 1, pp. 2-6, 1971.
- F. R. Moore, "A general model for spatial processing of sounds", Computer Music Journal, vol. 7, no. 3, pp. 6-15, 1983.
- M. Kleiner, B.-I. Dalenback, P. Svensson, "Auralization - An overview", Journal of the Audio Engineering Society, vol. 41, no. 11, pp. 861-875, 1993
- Theile, G. and G. Plenge “Localization of Lateral Phantom Sources” JAES 25(4): 196-200.
HRTF-based binaural
These techniques mimic the changes in timbre, delay, and amplitude that occur at the two ears.
They are most successful when personal HRTFs are used to tailor the experience to the individual listener, and when the listener’s head movement is tracked (which helps prevent front-back reversal errors). Measurements of dummy heads such as KEMAR provide an approximate effect. Recordings made in binaural format can be transferred to loudspeaker stereo using interaural crosstalk cancellation, though the result has an extremely small sweet spot. Binaural setups are suitable for Internet concerts, but not for traditional concert settings. HRTF processing also demands far more computation than the other spatialization techniques.
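To make the core operation concrete, here is a minimal sketch of my own (the function name is hypothetical; it assumes a pair of head-related impulse responses of equal length, e.g. taken from a KEMAR measurement set, at the same sample rate as the signal):

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        """Place a mono signal at the direction the HRIR pair was measured
        for, by convolving it with the left- and right-ear impulse responses.
        Assumes the two HRIRs have the same length and sample rate as mono.
        Returns an (N, 2) array of left/right samples, peak-normalized."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        out = np.stack([left, right], axis=1)
        return out / np.max(np.abs(out))

This only covers the static case: tracking the listener's head means switching between (and interpolating) HRIR pairs as the head turns, which is where filter-interpolation work such as Larcher and Jot's comes in.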
Researchers: Larcher, Jot, Warusfel
- Larcher, Véronique and Jean-Marc Jot “Techniques d'interpolation de filtres audio-numériques: Application à la reproduction spatiale des sons sur écouteurs” [Digital audio filter interpolation techniques: application to spatial sound reproduction over headphones], Congrès Français d'Acoustique, Marseille, France, April 1997
- Martens, William L. “Psychophysical calibration of auditory range control in binaural synthesis with independent adjustment of virtual source loudness” (JASA)
- Miller, Robert E. (Robin) III “Practical system for recording spatially lifelike 5.1 surround sound and 3D fully periphonic reproduction” (JASA)
- “Individualized HRTFs using computer vision and computational acoustics” JASA 108(5): 2597
- Cooper, D. H., and Bauck, J. L. 1989. "Prospects for transaural recording". J. Audio Eng. Soc. 37(1/2).
Cinema-style loudspeaker arrays (5.1, 8-channel)
Phantom sound images are created between loudspeakers. When there is at least an 18 dB difference in power between loudspeakers, the sound source appears to be located near the louder speaker. The perceived width of the sound image varies with the spectral content of the sound. The optimum angle between adjacent speakers is 60 degrees, meaning that a minimum of six speakers is needed to cover the two-dimensional space surrounding the listener; fewer speakers produce less stable images. The speaker layout cannot change between the studio monitoring setup and the concert hall.
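As a sketch of the underlying amplitude-panning idea (my own simplification; VBAP, listed just below, generalizes this to gain vectors and to 3-D speaker triplets), a constant-power pan between the pair of adjacent speakers that brackets the source direction looks roughly like this:

    import math

    def pan_pairwise(source_angle_deg, speaker_angles_deg):
        """Pairwise constant-power panning over a ring of loudspeakers.
        Returns one gain per speaker; only the two speakers bracketing the
        source direction get non-zero gain.  A simplified 2-D relative of
        VBAP, not the algorithm itself."""
        n = len(speaker_angles_deg)
        gains = [0.0] * n
        order = sorted(range(n), key=lambda i: speaker_angles_deg[i])
        for k in range(n):
            i, j = order[k], order[(k + 1) % n]
            lo = speaker_angles_deg[i]
            span = (speaker_angles_deg[j] - lo) % 360 or 360.0
            offset = (source_angle_deg - lo) % 360
            if offset <= span:
                # Sine/cosine crossfade keeps total power constant
                t = offset / span * (math.pi / 2)
                gains[i] = math.cos(t)
                gains[j] = math.sin(t)
                return gains
        return gains

    # Six speakers at 60-degree spacing, source at 40 degrees:
    print(pan_pairwise(40, [0, 60, 120, 180, 240, 300]))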
- VBAP (Vector-Based Amplitude Panning)
- Jot, Jean-Marc and Olivier Warusfel “Spat~: A Spatial Processor for Musicians and Sound Engineers” CIARM: International Conference on Acoustics and Musical Research, May 1995.
- Jot, J.-M., and Warusfel, O. 1995. "A real-time spatial sound processor for music and virtual reality applications". Proc. 1995 ICMC
- Dérogis, Philippe, René Caussé and Olivier Warusfel “On the Reproduction of Directivity Patterns Using Multi-Loudspeaker Sources” ISMA: International Symposium on Musical Acoustics, 1995
Ambisonics
Ambisonic systems can be either pantophonic (2-D) or periphonic (3-D, i.e. with height). Encoding and decoding are separate, largely independent processes, so an Ambisonic signal can be decoded with little information about how it was encoded. Ambisonic microphones can provide 3-D encoded signals directly. After encoding, each channel of a first-order (B-format) signal (W, X, Y, Z) can be manipulated independently. A pantophonic setup requires 3 channels and at least 4 loudspeakers; a with-height setup requires 4 channels and at least 8 loudspeakers.
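To make the encode/decode separation concrete, here is a minimal first-order sketch of my own (coefficient conventions differ between Ambisonic formulations, so treat the exact gains as illustrative rather than definitive):

    import math

    def encode_bformat(sample, azimuth, elevation):
        """First-order (B-format) encoding of a mono sample.  W is the
        omnidirectional component (conventionally scaled by 1/sqrt(2));
        X, Y, Z are figure-of-eight components.  Angles in radians,
        azimuth measured anticlockwise from straight ahead."""
        w = sample / math.sqrt(2)
        x = sample * math.cos(azimuth) * math.cos(elevation)
        y = sample * math.sin(azimuth) * math.cos(elevation)
        z = sample * math.sin(elevation)
        return w, x, y, z

    def decode_square(w, x, y):
        """Naive pantophonic decode of W/X/Y to four loudspeakers at 45,
        135, 225 and 315 degrees (a square around the listener); each feed
        is a virtual cardioid aimed at its speaker."""
        speaker_azimuths = [math.radians(a) for a in (45, 135, 225, 315)]
        return [0.5 * (math.sqrt(2) * w + x * math.cos(az) + y * math.sin(az))
                for az in speaker_azimuths]

Note that the decoder only needs to know its own speaker positions, nothing about how the signal was encoded, which is the independence described above.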
- Fellgett, Peter. “Ambisonics. Part One: General System Description”. Studio Sound, 1: 20-22, 40, August 1975.
- Gerzon, Michael A. “Periphony: With-Height Sound Reproduction”. Journal of the Audio Engineering Society, 21(1): 2-10, 1973.
- Gerzon, Michael A. “Ambisonics. Part Two: Studio Techniques”. Studio Sound, pages 24-30, October 1975
- Malham, David 'Higher order Ambisonic systems for the spatialisation of sound' Proceedings, ICMC99, Beijing, October 1999
- Malham, D and Myatt, A “3-D Sound Spatialization using Ambisonic Techniques” CMJ 19(4) 1995
- D.G. Malham “Homogeneous And Nonhomogeneous Surround Sound Systems”.
- Daniel, Jérôme, Jean-Bernard Rault and Jean-Dominique Polack “Ambisonics Encoding of Other Audio Formats for Multiple Listening Conditions” AES Convention, 1998
- David Malham at the University of York
Holophonics and Wave-field synthesis
Multiple loudspeakers are used to reproduce secondary sources of the wave front (see Huygens’ Principle). Because the actual sound field is reconstructed, this technique creates an acceptable image over a much larger listening area, but spatial aliasing becomes a problem above a frequency set by the loudspeaker spacing.
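As a back-of-the-envelope illustration of that limit (my own sketch, using the commonly quoted rule of thumb f_alias ≈ c / (2 · Δx · sin θ_max)):

    import math

    SPEED_OF_SOUND = 343.0  # m/s

    def wfs_aliasing_frequency(speaker_spacing_m, max_incidence_deg=90.0):
        """Approximate frequency above which spatial aliasing appears in a
        WFS loudspeaker array: f_alias = c / (2 * dx * sin(theta_max)).
        theta_max = 90 degrees gives the worst case, c / (2 * dx)."""
        return SPEED_OF_SOUND / (2.0 * speaker_spacing_m
                                 * math.sin(math.radians(max_incidence_deg)))

    # A spacing of 17 cm, for example, puts the aliasing limit near 1 kHz:
    print(wfs_aliasing_frequency(0.17))  # ~1009 Hz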
- Acoustic rendering with wave field synthesis, Marinus M. Boone
- Wave field synthesis: A promising spatial audio rendering concept, Günther Theile and Helmut Wittek, Acoust. Sci. & Tech. 25, 6 (2004)
- Wave Field Synthesis and Analysis Using Array Technology, Diemer de Vries and Marinus M. Boone, Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct. 17-20, 1999
- Nicol, R. and M. Emerit, 1999
Miscellaneous
- Hyper-dense transducer array (Malham)
- Hybrid speaker-headphone approach?
- Huopaniemi, J. 1999. Virtual acoustics and 3-D sound in multimedia signal processing
- Organised Sound, Issue 3(2)
- Computer Music Journal Issue 19(4)
- Jot, Jean-Marc “Synthesizing Three-Dimensional Sound Scenes in Audio or Multimedia Production and Interactive Human-Computer Interfaces” 5th International Conference: Interface to Real & Virtual Worlds, Montpellier, France, May 1996
- Belin, Pascal, Bennett Smith, L. Thivard, Sophie Savel, Séverine Samson and Yves Samson “The functional anatomy of sound intensity change detection” Society for Neuroscience, 1997.
- Belin, Pascal, Stephen McAdams, Bennett K. Smith, Sophie Savel, Lionel Thivard and Séverine Samson “The functional anatomy of sound intensity discrimination” Journal of Neuroscience, 1998.
- Jullien, Jean-Pascal and Olivier Warusfel “Technologies et perception auditive de l'espace” [Technologies and auditory perception of space], Cahiers de l'Ircam (5), March 1994
- Jot, Jean-Marc “Efficient models for reverberation and distance rendering in computer music and virtual audio reality” ICMC: International Computer Music Conference, September 1997.