The Composition and Performance of Spatial Music
Enda Bates
Trinity College Dublin, August 2009.
Declaration
I hereby declare that this thesis has not been submitted as an exercise for a degree at
this or any other University and that it is entirely my own work.
I agree that the Library may lend or copy this thesis upon request.
Signed,
___________________
Enda Bates
Summary
The use of space as a musical parameter is a complex issue which involves a
number of different, yet interrelated factors. The technical means of performance, the
sonic material, and the overall musical aesthetic must all work in tandem to produce a
spatial impression in the listener which is in some way musically significant.
Performances of spatial music typically involve a distributed audience and often take
place in an acoustically reverberant space. This situation is quite different from the
case of a single listener at home, or the composer in the studio. As a result, spatial
strategies which are effective in this context may not be perceived correctly when
transferred to a performance venue. This thesis examines these complex issues in
terms of both the technical means of spatialization, and the compositional approach to
the use of space as a musical parameter. Particular attention will be paid to the
effectiveness of different spatialization techniques in a performance context, and what
this implies for compositional strategies which use space as a musical parameter.
Finally, a number of well-known works of spatial music, and some original
compositions by the author, are analyzed in terms of the perceptual effectiveness of
the spatialization strategy.
The results of a large number of listening tests and simulations were analysed
to determine the fundamental capabilities of different spatialization techniques under
the less than ideal conditions typically encountered during a performance. This
analysis focussed on multichannel stereophony, Ambisonics, and Wavefield
Synthesis. Other methods which are orientated toward a single listener are not
addressed in this thesis. The results indicated that each spatialization scheme has
particular strengths and weaknesses, and that the optimum technique in any situation
is dependent on the particular spatial effect required. It was found that stereophonic
techniques based on amplitude panning provided the most accurate localization but
suffered from a lack of spaciousness and envelopment. Ambisonics provided an
improved sense of envelopment but poor localization accuracy, particularly with first
order Ambisonics systems. Consequently it would appear that stereophony is
preferable when the directionality and focus of the virtual source is paramount, while
Ambisonics is preferable if a more diffuse enveloping sound field is required.
Ambisonics was consistently preferred for dynamically moving sources as this
Acknowledgements
I am deeply grateful to my two supervisors, Dr. Dermot Furlong and Mr. Donnacha
Dennehy, for all their support and guidance over the past four years. Without their
incredible knowledge and encouragement, this thesis would not have been possible.
I would also like to particularly thank Dr. Fionnuala Conway and Gavin Kearney for
their help over the years, all those who took part in listening tests, and my colleagues
at the Spatial Music Collective.
Finally I am very grateful for the support of my family and friends, and all the
students and staff of the Music and Media Technology course in Trinity College.
Table of Contents
Summary ...................................................................................................................... iii
Acknowledgements........................................................................................................v
Table of Contents..........................................................................................................vi
List of Figures ................................................................................................................x
List of Tables ..............................................................................................................xiv
1 Introduction.................................................................................................................1
1.1 Spatial Music: A Personal Perspective ................................................................1
1.1.1 What Now in the Age of Disillusionment.....................................................2
1.1.2 Why Spatial Music?......................................................................................3
1.1.3 Why Talk About Spatial Hearing?................................................................5
1.1.4 The Imaginative Use of Empirical Thinking ................................................6
1.2 The Research Question ........................................................................................7
1.3 Aims and Objectives ............................................................................................8
1.4 Methodology ........................................................................................................8
1.5 Motivation............................................................................................................9
1.6 Outline................................................................................................................10
2 Spatial Hearing..........................................................................................................13
2.1 Directional Hearing............................................................................................14
2.2 Directional Hearing and Acoustics ....................................................................19
2.3 Distance Hearing & Moving Sources ................................................................22
2.3.1 Summary of Spatial Hearing.......................................................................25
2.4 Spatial Hearing with Multiple Sources ..............................................................26
2.4.1 The Limits of Auditory Perception .............................................................28
2.4.2 Spatial Hearing and Virtual Sources...........................................................29
2.4.3 Spatial Audio Techniques ...........................................................................30
3 Stereophony ..............................................................................................................32
3.1 Quadraphonic Sound..........................................................................................35
3.2 Cinema Surround Sound and 5.1 .......................................................................37
3.3 Multichannel Amplitude Panning Techniques...................................................38
3.4 Theories of Stereophonic Localization ..............................................................39
3.5 Critics of Summing Localization .......................................................................40
3.6 Meta-Theories of Localization...........................................................................41
List of Figures
Fig. 2.1 Spherical coordinate system ...........................................................................13
Fig. 2.2 Lateral source example...................................................................................15
Fig. 2.3 Zone-of-confusion example............................................................................17
Fig. 2.4 Impulse response of a room with a relatively short reverb time ....................18
Fig. 2.5 Source distance v sound intensity...................................................................23
Fig. 2.6 Spatial cues and stream segregation ...............................................................28
Fig. 3.1 Bell Labs stereophony, proposed (left) and implemented (right)...................32
Fig. 3.2 Blumlein's coincident microphone arrangement...........................................33
Fig. 3.3 Standard stereophonic arrangement................................................................34
Fig. 3.4 Lateral phantom source direction versus ILD for a quadraphonic layout ......36
Fig. 3.5 ITU 5.1 loudspeaker arrangement ..................................................................37
Fig. 4.1 Basic multichannel stereophony example ......................................................44
Fig. 4.2 Microphone responses derived from two figure-of-eight microphones .........45
Fig. 4.3 The Soundfield microphone ...........................................................................46
Fig. 4.4 Zero and first order spherical harmonics........................................................47
Fig. 4.5 Second (top) and third (bottom) order spherical harmonics...........................49
Fig. 4.6 First, second & third order microphones ........................................................49
Fig. 4.7 Ambisonic plane wave - theoretical (left) and real (right) sources ................53
Fig. 4.8 Directivity patterns of various ambisonic decoding schemes ........................54
Fig. 4.9 Illustration of the Huygens principle ..............................................................55
Fig. 4.10 The WFS method..........................................................................................56
Fig. 4.11 WFS reproduction of different wavefronts...................................................56
Fig. 4.12 WFS reproduction of two-channel stereo.....................................................57
Fig. 4.13 Truncation effects (a) and (b) 4ms later .......................................................58
Fig. 4.14 A distant source (a) below the aliasing freq & (b) above.............................59
Fig. 4.15 Optimised Phantom Source Imaging WFS...................................................59
Fig. 4.16 A WFS cinema system in Ilmenau ...............................................................60
Fig. 5.1 Moore's spatial model (a) direct signal (b) reflected signal paths .................64
Fig. 6.1 Direction of phantom source versus ILD reported by Theile.........................70
Fig. 6.2 Naturalness responses reported by Guastavino ..............................................74
Fig. 6.3 Decoder criteria related to the size of the listening area.................................75
List of Tables
Table 4.1 Furse-Malham set of encoding coefficients.................................................50
Table 4.2 Analysis of Ambisonics normalisation schemes .........................................51
Table 4.3 Summary of ambisonic decoding schemes..................................................53
Table 8.1 Auto Harp tuning scheme ..........................................................................185
1 Introduction
One of the defining characteristics of electroacoustic music has been the use of
technology to expand musical boundaries, whether through electronic processes in
conjunction with traditional instruments, or through electronic processes alone. This
allowed previously neglected musical parameters such as timbre to come to the fore,
while other parameters such as rhythm, melody and traditional harmony were
relegated to the background. The use of space as a musical parameter is another novel
aspect of this artistic movement. However, space is, in many respects, fundamentally
different from these other parameters. Timbre relates to the spectral and temporal
relationship between the components of an individual sound object, while rhythm,
melody and harmony involve temporal and spectral relationships between sound
objects. Space as a musical parameter is, however, much broader and more difficult
to define. It incorporates the dimensions of individual sound objects, the relationships
between sound objects and even the relationship between the sound objects and the
acoustic space in which they are heard. Furthermore, audible space itself can only
become apparent through the temporal development of sounds themselves. Space as a
musical parameter is therefore all-encompassing and yet difficult to define. It
encompasses every aspect of a sound and yet the spatial aspects of a sound are often
not consciously perceived as separate from the sound itself. In our evolutionary
development, the where is dealt with immediately by our subconscious while the what
and why become the primary focus of conscious attention.
1.1.1 What Now in the Age of Disillusionment
The basic philosophy outlined in the previous Section is not in any sense
unusual and it could in fact be considered as quite symptomatic of our times. The title
of this section has been used to refer to various time periods following the First World
War; however, this description is as appropriate now as in any other period of history.
The English speaking world has lost a great deal of faith in science, religion, politics,
and the media (often, it should be said, with very good reason) and while this can be in
many respects quite liberating, it may also lead to a certain loss of direction. In a
musical context, this dichotomy has been further magnified by the development
of digital technology, which fundamentally alters the relationship
between the composer and their instrument. In the past, the sheer expense of
hardware-based music technology meant that composers were by necessity limited in
the technological options available to them. However, this in turn forced composers
to be highly creative in their use of these limited devices, and to develop sophisticated
this point. I very quickly formed the opinion that spatialization was a critical and
necessary aspect of electronic music, as the spatial movement of sounds provided a
physicality and dynamism that greatly enhanced the listening experience. This
opinion was reinforced when I realized that the earliest practitioners of electronic
music had also believed in the absolute necessity of spatialization, particularly in a
performance context (see Chapter 8).
As with many other composers, this early enthusiasm was quickly tempered
by the experience of hearing my work performed for the first time outside of the
studio over a large loudspeaker array. The piece Discordianism, which is discussed in
detail in Chapter 10.2, is a clear example of a work which functions quite well for a
single listener, but rapidly degrades as the size of the listening area increases. This is
particularly true of the dynamically moving noise drone which is the focal point of the
third and final movement. Amplitude panning and a quadraphonic system simply
could not effectively reproduce this movement when the distance between the
loudspeakers was increased to cater for a larger audience. However, having created a
stereo mix of this piece for a CD release, I remained convinced of the benefits of
spatialization, as much of the dynamism and clarity of the work was lost in the
reduction to two channels. As I began to explore these issues it quickly became
apparent that a gap existed between empirical research on spatialization techniques
and the issues faced by a composer of spatial music during a live performance. The
majority of the scientific research conducted in this area has focused on the ideal
conditions of a single listener in an acoustically treated space. In addition, much of
the writing by composers of spatial music has discussed both the single listener and
performance contexts and this seemed somewhat anachronistic considering the drastic
decline in sales of music media in the modern era and the continued widespread
enthusiasm for live performances. These very practical concerns were the initial
motivation for much of the technical research discussed in this thesis. However, the
experience of composing Discordianism also raised other, broader questions about the
nature of spatial music composition. While the organization of material in this work
can be analysed using Denis Smalley's theory of spectromorphology (see Chapter 8.5),
this was not consciously implemented at the time. Certain aspects of the work are
quite effective, such as the spatial distribution of different layers of rhythmic material
for example. However, the way in which spatial movement is used to support gestural
interplay between the different layers of material is quite inconsistent, as there was no
underlying rationale guiding the compositional process in this regard. In addition, this
argument could perhaps also be applied to the mix of different aesthetics contained
within the piece. As I began to examine how other composers had dealt with the use
of space as musical parameter, it quickly became apparent that this issue could not be
treated in isolation, as the way in which these composers used space is intimately
connected to their overall musical philosophy. A study of the use of space in music
rapidly led to debates which are as old as electronic music itself. Spatial music is in
many respects a microcosm of electroacoustic music which can refer to many of the
different styles within this aesthetic, but is not tied to any one in particular. The study
of the aesthetics of spatial music and the use of space as a musical parameter
therefore appeared to be a good way to indirectly approach electroacoustic music
composition and the performance of electronic music in general.
1.2 The Research Question
The two, seemingly distinct issues discussed in the previous Section, one quite
practical and technical, the other more conceptual and artistic, emerged at the same
time from a single piece of music, and have been ever present since. It therefore felt
entirely natural to me to approach the study of spatial music via these distinct, yet
interrelated topics. The musical use of space exploits a fundamental aspect of
hearing, namely our ability to locate sounds in space. However, unlike other
parameters such as pitch, rhythm and timbre, the movement of sounds through space
is usually not an intrinsic aspect of the sound itself, but is instead an illusion created
through the careful blending of a number of static sources. A composer of spatial
music cannot therefore treat this parameter in the same way as pitch or rhythm, and
the technical details of how this illusion is created and maintained are clearly a
fundamental and necessary part of the composer's craft. While practical problems
remain, there exists a clear empirical approach to solving these issues, and this will
allow us to impose some order on space by revealing just how far the illusion can be
pushed before it falls apart. This in turn will provide us with a firm basis upon which
we can reliably exploit space as a musical parameter. The empirical and systematic
examination of these techniques will indicate their particular strengths, capabilities
and weaknesses and will in effect function as a form of orchestration guide for spatial,
electroacoustic music.
presented. Some of these works, particularly early pieces such as Discordianism, are
good illustrations of the practical and technical problems common to works of spatial
music and which are examined in detail earlier in the thesis. However, later works
such as Auto Harp and Rise hopefully illustrate the benefits of this research and a
more positive use of the empirical examination of spatialization techniques. The
algorithmic control of space using flocking algorithms represents an even tighter
merging of empirical and creative processes. In this case, it is the algorithm and not
the composer which dictates the movement of sound through space and works such as
Flock and Rise use this relatively new technique to explore the implications of this
ceding of control for the compositional process.
1.4 Methodology
This thesis begins with an examination of the perceptual mechanisms related
to spatial hearing and a scientific evaluation of the perceptual capabilities of the most
commonly used spatialization schemes, namely stereophony, Ambisonics and
wavefield synthesis (WFS). The perceptual performance of these systems under the
less than ideal conditions typically found in a performance is examined in detail
through a series of listening tests carried out by the author. The results of these tests
are then incorporated into a meta-analysis of the existing research in this area which
summarizes the results of a large number of other listening tests and simulations. The
conclusions drawn from this meta-analysis are then used to assess the validity of the
various spatial strategies adopted by composers of spatial music such as Charles Ives,
Karlheinz Stockhausen, Iannis Xenakis, Denis Smalley and Pierre Boulez. Finally,
this research was utilized in the composition of a number of original works of spatial
music, the development of a spatial music instrument and an implementation of the
Boids flocking algorithm for spatial music composition in the Csound synthesis
language.
This particular methodology was adopted so as to ensure that the real technical
and perceptual limitations of the practical means of spatialization are fully considered.
This emphasis on the limitations of these systems is perhaps somewhat negative, yet
if space is to be used effectively as a musical parameter then these practical issues
must be fully appreciated. This is particularly true in the case of spatial music
performances: while a particular technique may be effective for a single listener, it
may be much less effective for a distributed audience.
1.5 Motivation
The use of space in electroacoustic music composition and performance is
perhaps one of the most distinctive aspects of this artistic movement. However, the
dissemination of this music via fixed media and domestic audio technology is still a
significant challenge. This problem has been further exacerbated by the drastic
decline in sales of fixed media music and the increase in online distribution. Yet
despite these difficulties, the public's appetite for live musical performances is
undiminished, and in fact has significantly expanded over the last decade. The
musical and social experience of attending a concert is fundamentally different from
listening to music on an mp3 player, on the internet, or at home, and this is
particularly true of live performances of spatial electroacoustic music. The
experience of listening to music performed with a large and carefully configured
loudspeaker array or loudspeaker orchestra provides a unique selling point and an
experience which cannot be easily reproduced elsewhere. This aspect of
electroacoustic music is the primary motivation for this thesis, which concentrates
1.6 Outline
This thesis is broadly divided into two parts. The first part deals with auditory
perception and the various perceptual mechanisms related to directional hearing. This
provides a perceptual basis for an assessment of various spatialization techniques in
terms of their perceptual performance in a performance context. The second half of
this thesis examines the use of space as a musical parameter through the analysis of
various works of spatial music. In each case, the compositional and spatialization
strategy is assessed in terms of its perceptual effectiveness, based upon the findings
presented in the first half of this thesis.
Chapter Two summarizes the perceptual cues associated with spatial hearing.
The perceptual mechanisms which allow a listener to determine the direction and
distance of a source signal are presented and the effect of acoustic reflections and
reverberance on spatial hearing is discussed. Finally, the perception of multiple,
concurrent sources is discussed in terms of Bregman's theory of auditory scene
analysis (ASA).
Chapter Three examines the technique of stereophony which for the first time
allowed for the creation of virtual sources that are not fixed at the physical location of
a loudspeaker. The development of this technique is presented from its conception in
the 1930s to modern multichannel formats such as 5.1. The perceptual basis of the
stereophonic principle is also investigated along with various theories of stereophonic
localization.
Chapter Four introduces more recently developed spatialization techniques
such as Ambisonics and wavefield synthesis (WFS). The development of
Ambisonics from Alan Blumlein's work with coincident microphone techniques is
presented and the perceptual optimization of ambisonic decoders is discussed. The
final part of this chapter addresses the theoretical background of the new technique of
WFS and some of the practical issues related to this technically demanding method
are discussed.
Stockhausen and Boulez are examined in terms of their technical and artistic approach
and the specific difficulties associated with this form of spatial music are discussed.
In Chapter Ten, a number of original works of acoustic, electronic and mixed-media spatial music are presented and analyzed. An original work of choral spatial
music is discussed in terms of the spatial choral music discussed previously in
Chapter Seven. Two works of electronic spatial music are presented which illustrate
the divergent approaches to electronic music composition discussed in Chapter Eight.
Finally an original mixed-media composition by the author is analyzed and the
spatialization approach adopted for this work is evaluated.
Chapter Eleven examines various musical instruments which can be used for
the live performance of spatial music. Various augmented instruments such as the
hypercello are introduced, and the use of the hexaphonic guitar as a spatial music
instrument is discussed. Finally the technical implementation of a hexaphonic system
is presented along with an original electroacoustic composition for the hexaphonic
guitar.
Chapter Twelve discusses the use of flocking algorithms such as Boids for
sound spatialization and synthesis. Real-time and off-line applications are evaluated
along with two original compositions which illustrate these different approaches to
the use of flocking algorithms in electroacoustic composition.
2 Spatial Hearing
Traditional musical parameters such as pitch, rhythm and timbre are perceived
with a relatively high degree of accuracy. Various studies have shown that a change
in pitch of a fraction of a semitone is quite perceptible and our ability to temporally
segregate an audio signal is similarly precise. The cross-modal perception of spatial
locations is also relatively accurate but is reduced significantly when the visual
element is removed. Parameters such as pitch, timbre and rhythm are often directly
related to the physical structure of the instrument or the actions of the musician.
However, the use of space in music often relies on electronic processes which can
only simulate the effect of spatial movement. The ability of these processes to satisfy
the various perceptual mechanisms involved would appear to be crucial if this aspect
of the work is to be successful. However, before these spatialization processes can be
assessed it is first necessary to understand the various perceptual mechanisms
involved in normal spatial hearing. By necessity, the mechanisms which allow the
location of a real sound to be determined must be first understood, before the illusion
of sounds moving in space can be created.
In this thesis, spatial locations will be described using the spherical coordinate
system illustrated in Figure 2.1 [Blauert, 1997]. The azimuth angle indicates the
angular position in the horizontal plane (with zero degrees being straight ahead), the
elevation indicates the vertical angle of incidence in the median plane, and the
distance is measured from a point at the centre of the head, directly to the source.
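By way of illustration, this spherical convention can be expressed in a few lines of code. The following Python sketch (added here for clarity; the Cartesian axis convention of x forward, y left and z up is an assumption for illustration, not a definition taken from this thesis) converts azimuth, elevation and distance into Cartesian coordinates:

    import math

    def spherical_to_cartesian(azimuth_deg, elevation_deg, distance):
        # Assumed convention: x points straight ahead, y to the left,
        # z upward; azimuth is measured anticlockwise in the horizontal
        # plane and elevation upward from that plane.
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        x = distance * math.cos(el) * math.cos(az)
        y = distance * math.cos(el) * math.sin(az)
        z = distance * math.sin(el)
        return x, y, z

    # A source 2 m away, 30 degrees to the left of straight ahead, at ear height:
    print(spherical_to_cartesian(30, 0, 2.0))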
The study of spatial hearing can be broadly divided into two categories, which
typically focus on either certain characteristics of the source signal or of the acoustic
environment. A typical audio scene may contain sources that are relatively discrete
and localizable (localization refers to the perceived direction and distance of the
source signal), however, the same scene may also contain reflections which are
diffuse and not easily localizable and provide information about the acoustic
environment. This chapter will begin with the more straightforward scenario of a
single source under free-field conditions. This implies that the contribution of
reflections of the source signal from nearby surfaces is negligible and that the
influence of the acoustic environment can be ignored. While this rarely occurs under
normal circumstances, outdoor locations such as an open field or mountain top can be
considered as approximately equivalent to free-field conditions.
Three distinct factors can therefore be identified in this scenario:
- The source signal produced by the sounding object
- The listener
- The acoustic environment containing both the source and the listener
It is also reasonable to assume that these three factors must interact in some fashion to
produce an impression in the listener of the spatial location of the source signal.
Consider now a simple example of a laterally displaced sound source
positioned in front of a single listener as shown in Figure 2.2. For now the discussion
is limited to the horizontal plane and free-field conditions. As the signal produced by
the source spreads out and progresses toward the listener, the wavefront will first
arrive at the right ear, before then diffracting around the head to reach the left ear.
This ability to hear binaurally, i.e. with two ears, is an important aspect of the human
auditory system. Localization mechanisms are often distinguished as being either
interaural, i.e. related to the differences between the signals at the two ears, or
monaural, i.e. related to attributes of the signal that are perceived equally with
both ears.
The preceding example illustrates how a laterally displaced source will result
in a time delay between the two signals arriving at the ears. This interaural time delay
(ITD) is one of the principal cues used to determine the source azimuth and an
extensive amount of experimental work has been carried out to examine its influence
on source localization. Various experiments have shown that both the spectral
content of the source signal and the attack portion of the signal envelope can produce
ITD cues, depending on the context [Blauert, 1997].
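The approximate magnitude of the ITD can be estimated with a simple spherical-head model such as Woodworth's formula, ITD = (r/c)(θ + sin θ). The short Python sketch below is the author's illustration of this model, not a figure from the experimental work cited above; the head radius of 8.75 cm and speed of sound of 343 m/s are typical assumed values:

    import math

    def itd_woodworth(azimuth_deg, head_radius=0.0875, c=343.0):
        # Woodworth's spherical-head estimate of the interaural time
        # delay for a distant source at the given azimuth (0-90 degrees).
        theta = math.radians(azimuth_deg)
        return (head_radius / c) * (theta + math.sin(theta))

    itd_max = itd_woodworth(90)                        # fully lateral source
    print(f"maximum ITD: {itd_max * 1000:.2f} ms")     # roughly 0.66 ms

    # The interaural phase cue discussed below becomes ambiguous once
    # half a period of the signal equals this delay, i.e. above roughly:
    print(f"ambiguity above ~{1 / (2 * itd_max):.0f} Hz")   # ~760 Hz

This back-of-the-envelope figure is consistent with the frequency region, discussed below, in which the interaural phase cue becomes unreliable.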
If the source in the preceding example consisted of a continuous signal then
the interaural delay will result in a phase shift between the two ear signals. This
localization cue will be referred to here as an interaural phase difference (IPD) to
distinguish it from the more general ITD cue. It should be noted that this phase shift
can only be determined by comparing the relative phase of the two ear signals. As the
frequency of the source signal increases it eventually produces a phase shift between
the two signals that is greater than 180°. At this point the IPD cue becomes a less
thought to be resolved using both head movements and monaural localization cues
created by the filtering effect of the shoulders and ears.
The complex shape of the ear pinnae acts as an acoustic filter which
introduces time and level differences between the individual spectral components of
each ear input signal. As the shape of the upper body and pinnae is irregular and
highly idiosyncratic, this filtering effect changes with the spatial location
of the source signal. The total filtering effect of the pinnae, head and shoulders is
often measured and characterized as the Head Related Transfer Function (HRTF). As
the ear pinnae are orientated toward the front, the two source signals in the preceding
example would be filtered in different ways, allowing the listener to distinguish
between the two source locations. It is still not entirely clear as to how the brain
distinguishes between the spectral content of the source signal and spectral changes
introduced by the HRTF. This could be achieved using slight movements of the head,
as this would introduce changes in the HRTF independently of the source signal and,
indeed, numerous experiments have shown that head movements allow listeners to
determine whether a source is positioned in front of or behind, above or below
[Blauert, 1997; Spikofski, 2001]. Rotating the head toward the source will alter the
various interaural cues in such a way as to resolve the cone of confusion: the ITD
and ILD will decrease as the head turns towards a source positioned in front but
increase for a source to the rear. In the same way, head movements will result in a
change in the filtering effect of the head and pinnae which could also help to resolve
the cone of confusion. Of course, head movements can only be effective if the signal
duration is long enough to allow for their effect on the different cues to be evaluated.
In a number of experiments with narrowband sinusoidal signals, certain
frequencies were found to correlate with specific angles of elevation, independently of
the actual source position [Blauert, 1997]. This suggests that HRTF cues not only
help to resolve the cone of confusion in the horizontal plane, but also contribute to the
localization of elevated sources and, in fact, HRTF cues have been shown to be
crucial in determining the position of a source anywhere in the median plane. Studies
suggest that spectral cues in both low (below 2kHz) and high frequency (above 5kHz)
regions are used to resolve front-back confusion while the prominent spectral cues for
the judgement of elevation are derived solely from the high frequency components
(above 5kHz) [Asano et al, 1990].
Fig. 2.4 Impulse response of a room with a relatively short reverb time
music is heard. In an enclosed space, a sounding object will produce a wavefront that
will travel directly to the ears and also indirectly via multiple reflections from the
floor, walls and ceiling of the space. This indirect signal is generally divided into
early reflections which arrive relatively shortly after the direct sound, and the later
arriving, more diffuse reflections or reverberance (see Figure 2.4 [Wiggins, 2004]).
Despite the presence of these additional wavefronts arriving from multiple different
directions, it is still quite possible to accurately localize a sound in an enclosed space
[Bates et al, 2007a]. The law of the first wavefront, or precedence effect, is thought to
account for this ability and, as shall be seen later, is an important consideration in
multichannel loudspeaker systems.
the room reverberation time, but may depend on the specific room geometry which
can result in significant early reflections [Hartmann, 1983].
Early reflections and reverberation also have a significant effect on both the
perceived size of the source, and the spatial impression of the acoustic environment.
The term spaciousness has been used to refer to both these effects, and other terms
such as spatial impression, ambiance, apparent source width, immersion and
envelopment are also frequently used, sometimes interchangeably, and sometimes
with more specific definitions. A strong correlation has been found between the
degree of coherence between the two ear signals and the lateral component of the ear
signals, and this is thought to influence the spatial impression of both the source and
environment [Blauert, 1997; Plenge et al, 1975]. Early studies of spaciousness
described this characteristic in terms of the lateral and frontal components of the ear
signals [Barron et al, 1981]. However, what is meant by spaciousness is different
depending on whether this lateral component is derived from early reflections alone,
or from both early reflections and later reverberation. The addition of lateral early
reflections results in a change in the perceived size of the source, and the term apparent
source width (ASW) is generally used to refer to this source-specific measure
[Beranek, 1996]. Early reflected energy arriving within approximately 80 ms of the
direct sound results in an increased ASW and the extent of this broadening effect
depends upon the ratio of the total energy to the energy of the lateral component
[Barron et al, 1981]. Blauert introduced the term locatedness as a measure of the
degree to which an auditory event can be said to be clearly in a particular location
[Blauert, 1997] and this is clearly related to ASW. Localization refers only to the
perceived direction and does not therefore directly relate to the locatedness or ASW,
although clearly the source direction will be difficult to precisely determine in the
case of a very large ASW. The important distinction between these two measures will
be discussed in more detail in Chapter Six.
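The link between early lateral reflected energy and ASW described above is commonly quantified using the early lateral energy fraction associated with Barron's work cited above. The following Python sketch is the author's illustration of that standard measure; it assumes a pair of room impulse responses measured with an omnidirectional microphone and a figure-of-eight microphone whose null is aimed at the source:

    import numpy as np

    def lateral_energy_fraction(h_omni, h_fig8, fs):
        # Energy arriving laterally between 5 ms and 80 ms, relative to
        # all energy arriving within the first 80 ms.
        t5, t80 = int(0.005 * fs), int(0.080 * fs)
        lateral = np.sum(h_fig8[t5:t80] ** 2)
        total = np.sum(h_omni[:t80] ** 2)
        return lateral / total

    # Usage, given a measured pair of room impulse responses at 48 kHz:
    # lf = lateral_energy_fraction(h_omni, h_fig8, fs=48000)

Larger values of this fraction correspond to a broader apparent source width.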
Later arriving reverberation alters the spatial impression in a different way
which is primarily related to the acoustic environment. The term spaciousness is
often used in this case, as opposed to ASW which is primarily related to the spatial
impression of the source. The terms spaciousness and envelopment are often
considered equivalent, although occasionally slightly different definitions are given.
When distinguished, spaciousness is described as the sense of open space in which the
source is located while envelopment refers to the sense of immersion and involvement
in a reverberant sound field which fully surrounds the listener [Rumsey, 2001].
Numerous studies have shown that envelopment and spaciousness are
generally desirable qualities in a concert hall [Schroeder et al, 1974; Barron et al,
1981; Beranek, 1996] and many concert hall designs have attempted to increase the
amount of lateral reflected energy directed to the seating area for this reason. The
acoustician David Griesinger has suggested that the importance of the distinction
between early arriving lateral reflections and later reverberation is often overlooked in
this context [Griesinger, 2009]. It has already been shown that early arriving
reflections reduce localization accuracy and Griesinger suggests that an increase in
lateral energy of this sort will negatively impact clarity. This clearly contradicts
previous studies which have stressed the importance of ASW. Griesinger suggests
that the direct sound and later lateral reverberation should be emphasized to improve
clarity (which corresponds to improved localization) and also spaciousness (meaning
the desirable spatial characteristics of the hall).
Rumsey has similarly pointed out the difference between acoustic research,
which suggests that large values of ASW are preferable, and studies of spatialization
techniques which emphasize localization accuracy [Rumsey, 1998]. Griesinger has
noted a similar contradiction between the differing levels of reverberation typically
found in performances and recordings of classical music [Griesinger, 2009]. A clear
distinction can be made between the three-dimensional spatial experience of a live
instrumental performance in a concert hall and a two-channel stereo reproduction
which must reproduce both the direct and reflected sound from the front. Improved
localization accuracy is perhaps desirable in the latter case in order to distinguish
sources in the foreground from background reverberation [Rumsey, 1998]. However,
the situation is much more complicated in the case of electronic spatial music
performances which often utilize multi-channel loudspeaker arrays and electronic
spatialization techniques within a concert hall. The distinction between ASW,
spaciousness and envelopment introduced earlier may also be hard to maintain in this
situation, as a multichannel reproduction of a single source from multiple positions
around the audience will provide some sense of envelopment, but by the direct sound
and not the reverberant field. In this situation, the sense of spaciousness may be
associated with the source rather than the acoustic environment, and the distinction
between these terms becomes harder to define. In addition, it is hard to predict the
effect of a lateral source on perceptual aspects such as ASW or spaciousness.
The preceding discussion indicates the significant effect of acoustic reflections
on the perception of an auditory source. In addition, reverberation provides a
significant amount of information regarding the size and composition of the
environment within which the source is situated. The degree of attenuation of
acoustic reflections provides an indication of the nature of the reflecting surfaces
while the time and duration of the diffuse late reverberation can indicate the
dimensions of the space. Room reflections and reverberation can also provide
information on another highly important aspect of spatial hearing, namely, the
distance of the source from the listener and this will be examined in more detail in the
next section
2.3 Distance Hearing & Moving Sources
It has long been known that the perception of distance is also influenced by the
effect of acoustic reflections in the listening environment. The level of the direct and
early reflected sound will change substantially as the distance from the source to the
listener changes. However, the level of diffuse reverberation is largely independent of
the position of the listener in the room. Therefore, as the source distance increases,
the direct sound will decrease while the reverberant sound remains constant. Beyond
a certain distance, the reverberant signal level will be greater than the direct signal
level, and the perceived distance becomes fixed and independent of the actual source
distance. This critical distance is indicated in Figure 2.5 [Howard et al, 1996].
Various studies have shown that in reverberant rooms, the perceived distance of a real
source is independent of the source level [Nielsen, 1993], which suggests that the
ratio between the direct and reverberant signals, the D/R ratio, is a significant distance
cue in real rooms. This theory was first proposed in the 1960s and this simple ratio is
still commonly used by sound engineers and producers to control the depth of
different sources in two-channel stereo mixes.
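The behaviour described above is easily sketched numerically: the direct energy falls as 1/r² while the diffuse reverberant energy remains roughly constant, so the D/R ratio passes through 0 dB at the critical distance. The Python sketch below illustrates this using the common diffuse-field approximation d_c ≈ 0.057·√(V/RT60) for an omnidirectional source (the room volume and reverberation time are assumed values, chosen purely for illustration):

    import math

    def critical_distance(volume_m3, rt60_s):
        # Distance at which direct and reverberant energy are equal
        # (omnidirectional source, diffuse-field approximation).
        return 0.057 * math.sqrt(volume_m3 / rt60_s)

    def dr_ratio_db(distance_m, d_crit):
        # Direct energy falls as 1/r^2 while reverberant energy is
        # constant, so the D/R ratio is 0 dB at the critical distance.
        return 20 * math.log10(d_crit / distance_m)

    d_c = critical_distance(volume_m3=2000, rt60_s=1.8)   # a small hall
    print(f"critical distance ~ {d_c:.1f} m")
    for r in (1, 2, 4, 8, 16):
        print(f"{r:2d} m : D/R = {dr_ratio_db(r, d_c):+.1f} dB")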
The D/R ratio can provide a relative sense of distance but this simple ratio
ignores the fine spatial and temporal structure of the reflected indirect signals.
Experiments by Kendall reported that a strong impression of distance was perceived
when listening to dry test signals augmented solely with a limited number of artificial
early reflections, even when these reflections were restricted to those which followed
the direct signal by 33ms or less [Kendall et al, 1984]. Michelsen carried out a
similar test which also found that a better distinction of distance was achieved when
simulated early reflections were added instead of solely diffuse reverberation
[Michelsen et al, 1997]. Neher investigated the perceptual effect of different early
reflection patterns and found that listeners were unable to distinguish between an
early reflection pattern comprised of accurately panned reflections, and one that was
physically identical except that each reflection was simply reproduced by the nearest
available loudspeaker. This suggests that although spatial differences in the early
reflections pattern are perceptually salient, the actual angles of incidence of
reflections may not be crucial [Neher, 2004].
Michael Gerzon presented a similar model of distance hearing, based on a
theory originally proposed by Peter Craven [Gerzon, 1992b]. The Craven hypothesis
assumes that the apparent distance of sounds is derived from the relationship between
the relative time delay and amplitude of the early reflections and the direct signal.
Gerzon and others have suggested that closely-spaced or coincident microphone
techniques having a substantially omnidirectional total energy response will
reproduce the absolute source distance better than microphones with a more
directional response [Gerzon, 1992b; Theile, 1991]. Gerzon also points out that
although it is now known that the simple direct/reverberant ratio does not provide an
absolute measure of distance, it is still a useful subsidiary cue for relative distance,
and is thus preferably made consistent with the apparent distance [Gerzon, 1992b].
In general it has been found that the perceived distance of a sound source in a
room is compressed, as it increases virtually linearly with source distance at short
range, but converges to a certain limit when the source distance is increased beyond
the critical distance [Mershon et al, 1975; Nielsen, 1993]. There is therefore a nonlinear relationship between the perceived and actual source distance. Bronkhurst
suggests that this non-linearity arises not only due to the different mechanisms
involved for different source distances, but also because the auditory system is not
always able to accurately separate the direct and reflected signals [Bronkhurst, 2002].
In a number of listening tests, the effect of early reflections on perceived source
distance was assessed in terms of the angle of incidence of the reflected sound. When
the lateral walls were made completely absorbent, the source was perceived to be
close to the head, virtually independently of the actual source distance. When lateral
reflections were introduced, the perceived distance more closely matched the actual
source distance, although larger distances were still underestimated [Bronkhurst,
2002]. This suggests that the direct to reverberant ratio is estimated by the auditory
system using directional binaural cues to separate the direct and reverberant signals,
although further tests are needed to confirm the validity of this hypothesis.
While clearly the amplitude of the source signal and the specific relationship
between the direct and indirect signals have been shown to be dominant cues in
distance perception, other secondary cues also provide some indication of relative
distance. In general, an increase in source distance results in a reduction of the high
frequency spectral content of the source signal due to the effect of air absorption.
This occurs at large distances outdoors, but also in rooms due to the absorptive nature of
the boundary surfaces and the large overall distances travelled by the indirect signals
as they reflect around the room.
The perceived shift in frequency due to the movement of a source relative to
the listener, i.e. the Doppler effect, also provides an indication of the relative motion
of the source.
the perceived size of the source, which can result in a corresponding decrease in
localization accuracy.
ear signals, dividing it into separate streams, while at the same time longer schema
processes interpret each stream as it changes in terms of prior experience and
expectation of its future state. Interestingly, a similar two stage process is thought to
operate for the visual senses, as first uncovered in the 1959 paper "Receptive fields of
single neurons in the cat's striate cortex," by Hubel and Wiesel [Hubel et al, 1959].
Further research in machine vision has suggested that when the visual input enters the
brain from the eyes, it is immediately sent through two separate neurological
pathways [Goodale et al, 1992]. The fast path quickly transmits a rough, blurred
outline of the image to the frontal cortex while the second path performs a slower (the
slow image arrives in the prefrontal cortex about 50ms after the fast image) analysis
of the image, using prior experience and knowledge to fill out and refine the rather
crude initial impression.
It has been shown that spatial auditory cues are not a dominant factor in
determining the number of competing sound sources [Bregman, 1990]. However,
other studies have shown that spatial hearing is highly important for the intelligibility
of multiple, simultaneously presented speech signals [Shinn-Cunningham, 2003; Best,
2004] and that our ability to segregate an audio scene into multiple streams strongly
influences our perception of fundamental musical constructs such as melody and
rhythm [Bregman, 1990]. The concept of spatial masking, in which a listener's ability
to detect and understand the content of multiple signals is improved when the signals
are spatially separated, is highly important in spatial music. As shall be seen later in
Chapter Seven, composers such as Charles Ives, Henry Brant and Karlheinz
Stockhausen regularly used multiple, spatially separated musicians to reinforce the
inherent polyphony in the musical work. Of course spatial masking can also be used
in the opposite way, to deliberately undermine the individuality of each source in an
attempt to create a single sound mass using multiple musicians positioned together on
stage. Works such as Atmosphères by György Ligeti or Metastasis by Iannis Xenakis
are clear examples of orchestral works which deliberately utilise spatial masking in
this way. These two different uses of spatial masking can be further expanded in
purely electronic works, where the perception of the various spectral components of a
signal can be heightened through their dynamic spatial separation. The
electroacoustic composer Natasha Barrett illustrates this process with a number of
audio examples in a paper on spatio-musical composition strategies [Barrett, 1990].
In the first example, eight continuous sounds with a significant degree of temporal
and textural overlap are played simultaneously at static points in space while in the
second example each sound is dynamically moved in space. Barrett suggests that
only five distinct sources are perceivable when positioned at static spatial locations
but that this number increases when each source is provided with individual and
dynamic spatial trajectories.
reported that a cyclical lateral movement must take at least 172ms to occur if the
trajectory is to be accurately followed while 233ms is required for a front-rear
movement. Sounds following a circular path around a listener that exceed these limits
will at first appear to oscillate from left and right before stabilising at a central
position at even faster speeds [Blauert, 1997]. This effect was deliberately employed
by the composer Karlheinz Stockhausen in his eight-channel composition Cosmic
Pulses [Sonoloco, 2007]. The technical implementation of this work was carried out
by Joachim Haas and Gregorio Garcia Karman at the Experimental studio in Freiburg
between December 2006 and April 2007. Karman describes the perceptual effect of
the OKTEG (Oktophonic effect generator) system developed for this piece to
implement very high speed rotational effects, as follows:
Like in the Rotationsmühle (a device used in the spherical auditorium at the
World's Fair in Osaka and later implemented as output stage of the Klangwandler)
the OKTEG provides the performer with manual control of rotation velocity, and
different routings are accomplished by means of matrix programs. The
Rotationstisch, first used as a spatialization instrument in KONTAKTE, was later
further developed for exploring the artefacts, which appeared at very high rotation
speeds. Following this idea, the OKTEG provides with sample accurate trajectories
and arbitrary high rotation speeds, assisting the exploration of a continuum linking
space and timbre. When sound trajectories get close to the upper velocity range of
16 rot/sec in the composition of COSMIC PULSES, the perception of movement is
gradually transformed into a diffuse and vibrating spatial quality. Higher rotation
frequencies manifest themselves as audible modulation effects. [Sonoloco, 2007]
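The perceptual limits quoted above translate directly into maximum trackable rotation rates, as the following short calculation (the author's illustration, using only the figures already quoted) makes clear:

    # Blauert's reported minimum cycle durations for a trajectory
    # to be accurately followed (see above):
    t_lateral = 0.172     # seconds, left-right component
    t_front_rear = 0.233  # seconds, front-rear component

    # A circular path contains both components, so the slower
    # (front-rear) limit governs the fastest trackable rotation.
    print(f"lateral limit    : {1 / t_lateral:.1f} rotations/sec")     # ~5.8
    print(f"front-rear limit : {1 / t_front_rear:.1f} rotations/sec")  # ~4.3

At 16 rotations per second, Cosmic Pulses therefore exceeds the trackable limit several times over, which is consistent with Karman's description of movement dissolving into a vibrating spatial quality.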
Three distinct factors are involved in spatial hearing:
- The source signal produced by the sounding object
- The listener
- The acoustic environment containing both the source and the listener
The preceding chapter summarized the effect of these parameters on our perception of
the direction and location of a real sounding object. The composition of spatial music
obviously involves the direct manipulation of the first parameter, however, the latter
two parameters can only be dealt with indirectly, if at all. Some spatial music,
particularly acoustic music for distributed musicians, can be understood relatively
simply in terms of the localization mechanisms discussed in this chapter. However,
most electronic spatial music does not confine itself to multiple independent signals
The first approach of manipulating either phase/time or more usually level differences
between pairs of loudspeakers is often referred to as stereophony. The production of
ITD and ILD cues through the manipulation of these factors can be achieved both
acoustically through the use of different microphone arrays, and through electronic
processing techniques such as amplitude panning. Stereophony originally referred to
any method of reproducing a sound field using a number of loudspeakers, but is now
generally used to refer specifically to techniques based on the manipulation of level
and/or time differences in pairs or multiple pairs of loudspeakers, such as in two-channel stereo and 5.1 surround sound.
Ambisonics and Wavefield Synthesis are two techniques which attempt to
reconstruct a sound field within a listening area using loudspeaker arrays.
Ambisonics is a complete set of techniques for recording, manipulating and
synthesizing artificial sound fields [Malham et al, 1995] which has been regularly
used in spatial music and theatre for the past three decades. While never a
commercial success, Ambisonics has proved enduringly popular for spatial music
presentations for various reasons, such as its independence from a specific
loudspeaker configuration and its elegant theoretical construction which is based on
spherical harmonics.
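The spherical harmonic basis mentioned here can be made concrete with the traditional first-order B-format encoding equations: W = S/√2, X = S·cos(θ)·cos(φ), Y = S·sin(θ)·cos(φ) and Z = S·sin(φ), for a source signal S at azimuth θ and elevation φ. The Python sketch below is a minimal mono-source encoder written to illustrate these equations only; it is not a complete Ambisonics implementation:

    import math

    def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
        # First-order B-format: W is the omnidirectional component
        # (scaled by 1/sqrt(2) in the traditional convention), while
        # X, Y and Z are the three figure-of-eight components.
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        w = sample / math.sqrt(2)
        x = sample * math.cos(az) * math.cos(el)
        y = sample * math.sin(az) * math.cos(el)
        z = sample * math.sin(el)
        return w, x, y, z

    # A source 45 degrees to the left in the horizontal plane:
    print(encode_bformat(1.0, 45.0))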
3 Stereophony
Fig. 3.1 Bell Labs stereophony, proposed (left) and implemented (right)
The earliest work on stereophony was carried out independently by both Bell
Laboratories in the United States and Alan Blumlein at EMI in the UK in the early
nineteen thirties. The approach adopted by Bell labs was based on the concept of an
acoustic curtain [Steinberg et al, 1934], namely that a sound source recorded by a
large number of equally spaced microphones could then be reproduced using a
matching curtain of loudspeakers (Figure 3.1 left). In theory, the source wavefront is
sampled by the microphone array and then reconstructed using the loudspeaker array.
In practice, this approach had to use a reduced number of channels, so a system was
developed using three matching spaced omni-directional microphones and three
loudspeakers placed in a front-left, centre and front-right arrangement (Figure 3.1 right).
This approach was problematic however, as the reduction in channels distorted the
wavefront and audible echoes sometimes occurred due to the phenomenon of spatial
aliasing (see Section 3.2.2). Spaced microphone techniques such as this capture the
different onset arrival times of high frequency transients, and so capture the ITD
localization cues present in the original signal. However, this also makes it difficult
to process the audio afterward as unpredictable time differences are fixed in the
recording.
extent resembles natural hearing, as it produces IPD and ILD cues in the frequency
ranges at which these localization cues are most effective. It therefore uses the
unavoidable cross-talk between the loudspeakers to its advantage, as this cross-talk
produces an IPD which is related to the original source direction. Critics of this
approach of summing localization argue that level differences alone cannot produce
the ITD cues necessary for correct localization of onset transients [Theile, 1980].
However, subjective listening tests have shown that this is not the case and that
transients can be clearly localized in Blumlein stereo recordings [Rumsey, 2001]. In
addition, this approach allows for the post-processing of the stereo image by adjusting
the combination of the two microphone signals. More recently, alternative
microphone arrangements such as ORTF or the Decca tree have been developed
which represent a trade-off between the two approaches and reduce the conflicting
ITD cues that arise for transient and steady-state signals with purely coincident
techniques.
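The post-processing flexibility of coincident techniques noted above can be illustrated with a simple mid-side (M/S) width adjustment, which rebalances the sum and difference of the two coincident channels. The sketch below is the author's illustration of this standard operation, not a procedure described in this thesis:

    import numpy as np

    def adjust_width(left, right, width=1.0):
        # Scale the difference (side) signal against the sum (mid)
        # signal: width < 1 narrows the stereo image, width > 1 widens it.
        mid = (left + right) / 2.0
        side = (left - right) / 2.0
        return mid + width * side, mid - width * side

    # Example: slightly widen a coincident recording.
    left = np.array([0.5, 0.3, -0.2])
    right = np.array([0.4, 0.1, -0.3])
    print(adjust_width(left, right, width=1.4))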
position is highly dependent on the position and orientation of the listener [Rumsey,
2001]. Amplitude panning introduces level differences by simply weighting the
signal routed to each loudspeaker and this technique is quite effective when used with
a symmetrical pair of loudspeakers in front of a single, centrally positioned listener,
with an optimal separation angle of 30°. Amplitude panning can be considered as a
simplification of Blumlein's coincident microphone technique shown in Fig. 3.2.
With this arrangement, a signal in the front left quadrant will arrive at the maximum
of the blue microphone response characteristic and at the null point of the red
microphone. Amplitude panning simplifies this idea so that a signal panned hard left
will only be produced by the left-most loudspeaker, and vice versa, while a signal
panned to the centre will be created as a phantom image by both loudspeakers. This
has the result that a slight yet perceptible change in timbre occurs when a signal is
panned from a loudspeaker position to a point in between.
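This trade-off can be illustrated with a short sketch of a constant-power pan law, in which the pan position is mapped to an angle so that the two loudspeaker gains follow a sine/cosine relationship (a minimal illustration in Python; the function name and interface are purely illustrative):

    import numpy as np

    def constant_power_pan(signal, pan):
        # pan: 0.0 = hard left, 0.5 = centre, 1.0 = hard right.
        # The sine/cosine law keeps left^2 + right^2 constant, so the
        # overall level does not dip as the source crosses the centre.
        theta = pan * np.pi / 2
        return np.cos(theta) * signal, np.sin(theta) * signal

With pan values of 0 or 1 only a single loudspeaker radiates, while intermediate values produce the phantom image, and the associated timbral change, described above.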
The specific implementation of stereophony for two loudspeakers, i.e. two-channel stereo, is by far the most commonly used audio format in the world today.
However, as this format only utilises a pair of front loudspeakers, it must necessarily
reproduce both the direct source signal and reverberation from the front. One of the
earliest formal extensions of this method to more than two channels is the
Quadraphonic system, which is summarized in the next section.
Fig. 3.4 Lateral phantom source direction versus ILD for a quadraphonic layout
While 5.1 surround can be highly effective when used for its intended purpose
as support for a frontal visual image, it is much less suitable for the presentation of
spatial music to a distributed audience. The front three loudspeakers allow accurate frontal images to be created, but the problems with lateral and rear virtual images discussed in the previous section are magnified. Of course, the commercial attractiveness of 5.1 cannot be overstated. The means of production and
delivery are by now well-established and it is unlikely that any new format will
replace it in the near future. Some research has reported good results using 5.1 as a
delivery format for playback with an alternative loudspeaker arrangement and
encoding such as Ambisonics [Gerzon, 1992c; Bamford, 1995; Wiggins, 2004].
Moore, 1983] which will be discussed later in this Chapter. The supplied output
module can be configured for reproduction using standard two-channel stereo,
discrete intensity panning over various multichannel loudspeaker configurations (up
to a maximum of eight), Ambisonics B-format encoding, or binaural encoding for
reproduction over headphones.
would produce an effective listening area suitable for a single listener, while 400Hz
[Gerzon, 1992a] would be suitable for a domestic listening situation with
approximately six listeners.
Above this cross-over frequency, the decoder emphasizes the ILD localization
cues which arise due to the directional behaviour of the energy field around the
listener. It can be shown mathematically that it is only possible to recreate the energy
field of a real sound source using a small number of loudspeakers if the sound happens to be at the position of one of the loudspeakers. Therefore, at mid and high frequencies, not all of the ear's localization mechanisms can be satisfied in a practical reproduction system. The direction of the energy localization vector can, however, be adjusted so that it matches the velocity localization vector (θE = θV) for all frequencies up to 4kHz. This is similar to the stereophonic approach recommended by Clark (see Section 3.4) [Clark et al, 1958]. In addition, Gerzon's design optimizes rE in all
directions, which necessarily compromises localization in the directions of the
loudspeakers in favour of making the quality of the localization uniform in all
directions [Benjamin et al, 2006]. This effectively eliminates the timbral change
which occurs with amplitude panning as the signal moves from a position at a
loudspeaker to one in between loudspeakers. This also means, however, that the
localization of sources positioned at a loudspeaker will be less than optimal. In
summary, therefore, Gerzon recommends that the following optimizations be
implemented when designing an ambisonic decoder:
- The velocity and energy vector directions are the same up to 4kHz (θE = θV).
- At low frequencies, the magnitude of the velocity vector should be near unity for all directions (rV = 1).
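These criteria can be verified numerically for any candidate set of decoder gains. The following minimal sketch, assuming real gains and a horizontal-only loudspeaker array (the names are illustrative), computes the magnitudes rV and rE and the directions θV and θE of the two localization vectors:

    import numpy as np

    def gerzon_vectors(gains, azimuths_deg):
        # Velocity vector: amplitude-weighted sum of loudspeaker
        # directions (low-frequency model); energy vector: the
        # energy-weighted equivalent (mid/high-frequency model).
        az = np.radians(np.asarray(azimuths_deg, dtype=float))
        g = np.asarray(gains, dtype=float)
        vx = (g * np.cos(az)).sum() / g.sum()
        vy = (g * np.sin(az)).sum() / g.sum()
        ex = (g**2 * np.cos(az)).sum() / (g**2).sum()
        ey = (g**2 * np.sin(az)).sum() / (g**2).sum()
        return (np.hypot(vx, vy), np.degrees(np.arctan2(vy, vx)),
                np.hypot(ex, ey), np.degrees(np.arctan2(ey, ex)))

A well designed decoder should return rV close to unity at low frequencies, with θV equal to θE.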
4.1 Ambisonics
Ambisonics was developed by Michael Gerzon, Peter Fellgett and others in the
1970s as an alternative spatialization technique for surround and periphonic
loudspeaker arrangements [Gerzon, 1974a; Fellgett, 1975]. The basic approach taken
by Ambisonics is described by one of its authors as follows.
For each possible position of a sound in space, for each possible direction and for
each possible distance away from the listener, assign a particular way of storing the
sound on the available channels. Different sound positions correspond to the stored
sound having different relative phases and amplitudes on the various channels. To
reproduce the sound, first decide on a layout of loudspeakers around the listener, and
then choose what combinations of the recorded information channels, with what
phases and amplitudes, are to be fed to each speaker. The apparatus that converts
the information channels to speaker feed signals is called a decoder, and must be
designed to ensure the best subjective approximation to the effect of the original
sound field [Gerzon, 1974b].
This approach differs from Blumlein stereophony in a number of ways. Firstly, the
initial encoding stage is removed from the eventual playback system, its sole aim
being to capture as much information about the sound scene as possible using a
certain number of channels. The decoding stage can now use the recorded spatial
information to determine the appropriate loudspeaker signals that will recreate this
spatial scene. Furthermore, as discussed in the last section, this decoding stage can be
psychoacoustically optimized so that as many localization mechanisms as possible are
satisfied [Gerzon, 1992a]. The technical means by which this system was realised
built on the work by Blumlein with coincident microphone techniques. This chapter
will therefore begin with a summary of the Ambisonics system and the associated
encoding and decoding process based on microphone directivity patterns. This will
be followed with a more detailed description of the theoretical basis of the system and
the application of psychoacoustic principles to the decoding process.
eight microphones, it will be possible to create almost any response pattern in any
direction. If the four microphones shown in Figure 4.1 are replaced with this new
arrangement, then four virtual response patterns corresponding to the original
loudspeaker arrangement can be derived. However, if the number of loudspeakers is
increased from four to eight, eight new virtual response patterns corresponding to this
new loudspeaker arrangement can also be derived. In fact, with this particular
arrangement of four microphones it is possible to create response patterns that
correspond to whatever loudspeaker arrangement is available. In addition, the
directivity of the response characteristic can be adjusted, which is highly important in the decoding process.
The process just outlined describes in broad terms the basic Ambisonics
encoding method. This particular microphone arrangement is called a Soundfield
microphone and is shown in Fig. 4.3. This apparatus contains the four described
microphone capsules and produces four audio signals which correspond to the four
microphone response patterns. In Ambisonics terminology, this set of four signals is referred to as A-format. Obviously it is not possible for four microphone capsules to occupy exactly the same point in space; however, Soundfield microphones are able to overcome this problem with electronic equalization so that the output produced is essentially coincident up to 10kHz [Rumsey, 2001]. These
many capsules can be positioned at approximately the same point in space. This topic
is currently the focus of considerable research and various higher order microphone
arrangements are being examined (see Figure 4.6) [Bertet et al, 2009; Moreau et al,
2006]. However, there is no such problem when encoding monophonic recordings in
software and it is in this regard that higher order Ambisonics has proved most useful.
Various software implementations of high order ambisonic encoders and decoders are
now available and this has important consequences for situations involving multiple
listeners.
Fig. 4.5 Second (top) and third (bottom) order spherical harmonics
order systems, and a spherical harmonic formulation for the higher order channels in a
form which is consistent with B-format systems, has not yet been agreed upon
[Malham, 1999].
W = 1/√2
X = cos(A)·cos(E)
Y = sin(A)·cos(E)
Z = sin(E)
R = (3·sin²(E) − 1)/2
S = cos(A)·sin(2E)
T = sin(A)·sin(2E)
U = cos²(E)·cos(2A)
V = cos²(E)·sin(2A)
K = sin(E)·(5·sin²(E) − 3)/2
L = √(135/256)·cos(A)·cos(E)·(5·sin²(E) − 1)
M = √(135/256)·sin(A)·cos(E)·(5·sin²(E) − 1)
N = √(27/4)·cos(2A)·sin(E)·cos²(E)
O = √(27/4)·sin(2A)·sin(E)·cos²(E)
P = cos(3A)·cos³(E)
Q = sin(3A)·cos³(E)

where A = source azimuth and E = source elevation

Table 4.1 Furse-Malham set of encoding coefficients
in existing ambisonic hardware such as the Soundfield microphone. The FMH set of
coefficients for encoding a signal into third order Ambisonics is shown in Table 4.1.
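The coefficients of Table 4.1 translate directly into software. The following Python sketch encodes a mono signal into the sixteen third order FuMa channels; it is a direct transcription of the table rather than any particular published implementation:

    import numpy as np

    def fuma_encode(signal, azimuth_deg, elevation_deg):
        # Returns the 16 channels W X Y Z R S T U V K L M N O P Q.
        A = np.radians(azimuth_deg)
        E = np.radians(elevation_deg)
        sE, cE = np.sin(E), np.cos(E)
        sA, cA = np.sin(A), np.cos(A)
        coeffs = [
            1 / np.sqrt(2),                                   # W
            cA * cE,                                          # X
            sA * cE,                                          # Y
            sE,                                               # Z
            (3 * sE**2 - 1) / 2,                              # R
            cA * np.sin(2 * E),                               # S
            sA * np.sin(2 * E),                               # T
            cE**2 * np.cos(2 * A),                            # U
            cE**2 * np.sin(2 * A),                            # V
            sE * (5 * sE**2 - 3) / 2,                         # K
            np.sqrt(135 / 256) * cA * cE * (5 * sE**2 - 1),   # L
            np.sqrt(135 / 256) * sA * cE * (5 * sE**2 - 1),   # M
            np.sqrt(27 / 4) * np.cos(2 * A) * sE * cE**2,     # N
            np.sqrt(27 / 4) * np.sin(2 * A) * sE * cE**2,     # O
            np.cos(3 * A) * cE**3,                            # P
            np.sin(3 * A) * cE**3,                            # Q
        ]
        return np.array([c * signal for c in coeffs])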
[Table: a comparison of the Furse-Malham, SN3D, N3D, N2D and Hybrid W channel formats against various criteria, including which format offers the greatest dynamic range when the major sources are in or near the horizontal plane, and which when the major sources are distributed over the sphere.]
The deviation of the FMH set from a strict mathematical spherical harmonic
formulation does, however, have some negative consequences. Firstly, while
decoding equations can be readily obtained up to third order, this is much more
difficult at higher orders. In addition, the calculation of matrices for tilt, tumble and
dominance effects has proven to be extremely difficult, and the alphabetical naming scheme is impractical above third order. As it is much easier to design and implement
Ambisonics systems based on more mathematically pure formulations of the basic
equations, particularly at higher orders, a number of different formats have been
proposed. Efforts are being made by the Ambisonics community to standardize these different formats, and an analysis of the advantages and disadvantages of the different formats is summarized in the table above.
These formulations of the first order decoding equations differ in a number of ways.
The first difference relates to how the 3dB gain adjustment (a factor of √2) is distributed between the zero and first order components. Similarly, the gain
adjustments which are used to alter the directivity of the response characteristic can
be applied in a number of ways, as shown in the second two equations. Interestingly,
only the first equation includes a weighting factor due to the number of loudspeakers,
so if the second two equations are used, the perceived volume will increase as more loudspeakers are added. This latter approach has some potential benefits because, if the volume is automatically reduced when more loudspeakers are added, as in the first equation, then the signal-to-noise ratio (or the bit resolution in a digital implementation) will also be reduced.
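A minimal sketch of a first order decode of this general form, assuming FuMa-style channel scaling and the 1/N loudspeaker weighting just discussed (the interface is illustrative only), derives each loudspeaker feed by sampling a virtual microphone pattern pointed at that loudspeaker:

    import numpy as np

    def decode_first_order(W, X, Y, azimuths_deg, directivity=1.0):
        # W, X, Y: horizontal B-format signals. The directivity factor
        # scales the first order components and so alters the virtual
        # microphone pattern used for each loudspeaker feed.
        az = np.radians(np.asarray(azimuths_deg, dtype=float))
        n = len(az)
        # The 1/n factor keeps the overall level roughly constant as
        # loudspeakers are added, as in the first equation above.
        return [(np.sqrt(2) * W
                 + directivity * (np.cos(a) * X + np.sin(a) * Y)) / n
                for a in az]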
Fig. 4.7 Ambisonic plane wave - theoretical (left) and real (right) sources
Decoder titles /      Strict, Matched,        Energy, max-rE,         In-phase,
descriptions          Velocity, max-rV,       Rationalised Energy     Controlled opposites,
                      Furse-Malham,           Decoder, Rationalised   Cardioid
                      Systematic, Ideal       Square Decode

Approx. optimum       < 700Hz                 500Hz - 5kHz            All
frequency range

Response              Hypercardioid, strong   Hypercardioid, reduced  Cardioid, no anti-phase
characteristic        anti-phase components   anti-phase components   components

Effective listening   Single listener         Increased listening     Large listening area but
range                                         area, still some        reduced localization
                                              anti-phase components   accuracy

Criteria (1st order)  rV = 1, rE = 0.667      rV = rE = 0.707         rV = 0.5, rE = 0.667
Criteria (2nd order)  rV = 1, rE = 0.8        rV = rE = 0.866         rV = 0.667, rE = 0.8
Criteria (3rd order)  rV = 1, rE = 0.857      rV = rE = 0.924         rV = 0.75, rE = 0.857
Criteria (all)        θV = θE                 θV = θE                 θV = θE

Table 4.3 Summary of ambisonic decoding schemes
A third common decoding scheme is the in-phase decode proposed by David Malham for large listening areas. Figure 4.8 illustrates the directivity patterns of these different ambisonic decoding schemes [Daniel, 2000].
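Following Daniel's formulation [Daniel, 2000], these schemes can be expressed as per-order gain weights applied to a basic (velocity) decode. The following sketch gives the weights for horizontal-only systems (the expressions for full 3D systems differ):

    import numpy as np
    from math import factorial

    def order_weights_2d(M, scheme="max-rE"):
        # Per-order weights g_m for m = 0..M applied to a basic decode.
        m = np.arange(M + 1)
        if scheme == "max-rE":
            return np.cos(m * np.pi / (2 * M + 2))
        if scheme == "in-phase":
            return np.array([factorial(M)**2
                             / (factorial(M + k) * factorial(M - k))
                             for k in m], dtype=float)
        return np.ones(M + 1)  # basic (velocity) decode

For a first order system these weights reproduce the criteria of Table 4.3: the max-rE weight is cos(45°) = 0.707 and the in-phase weight is 0.5.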
It should also be pointed out that there is a minimum number of loudspeakers needed to successfully reproduce each ambisonic order, and that this number is always greater than the number of channels available to the decoder [Wiggins, 2004]. The required number of loudspeakers N can be calculated from the following expressions, where M is the order of the system:

N > 2M + 1 (horizontal systems)

N > (M + 1)² (full periphonic systems)

For example, a second order horizontal system uses five channels (2M + 1 = 5) and therefore requires at least six loudspeakers.
Plane wave sources can also be achieved with WFS, as shown in the middle
diagram in Figure 4.11. This type of reproduction mimics the effect of very distant
sources and can potentially be used to extend the suitable listening area for standard
two-channel stereo reproduction. Figure 4.12 illustrates how generating the two
stereo signals as plane waves can potentially maintain the angular relationship
between the two wavefronts across a greater area [Franco et al, 2004].
[Boone et al, 1995]. However, this solution does reduce the effective listening area
[Sonke, 2000].
Perhaps the most significant issue with WFS is the problem of spatial aliasing which arises due to the discrete nature of the loudspeaker array. Essentially, the inter-loudspeaker spacing places an upper limit on the frequencies which can be accurately reconstructed with the array. This effect is somewhat analogous to Nyquist's sampling theorem, which states that the sampling rate must be at least twice the highest frequency to be reproduced. A WFS system effectively samples a virtual wavefront, and in this case it is the inter-loudspeaker spacing which places an upper limit on the frequencies which can be accurately sampled and reproduced.
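As a rough rule of thumb, the aliasing frequency can therefore be estimated from the inter-loudspeaker spacing in direct analogy with the Nyquist criterion (a simplification: as noted below, the exact limit also depends on the source and listener geometry):

    def wfs_aliasing_frequency(spacing_m, c=343.0):
        # Half a wavelength per loudspeaker spacing, by analogy with
        # the Nyquist sampling criterion.
        return c / (2.0 * spacing_m)

    # e.g. a 12cm spacing gives an aliasing frequency of roughly 1.4kHz
    print(wfs_aliasing_frequency(0.12))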
This frequency limit, the so-called aliasing frequency, is also dependent on the
angular position of the source and listener, relative to the array. Figure 4.14 shows a
distant source positioned at 30° to the left behind the array for two frequencies above
and below the aliasing frequency [Franco et al, 2004]. Above this limit, the original
shape of the wavefront is not reproduced and the perceived direction and spectral
coloration of the source will be highly dependent on the listening position.
The aliasing frequency can obviously be increased by using a greater number
of loudspeakers placed closer together. However, in practical WFS systems this is
often not possible. One solution is to randomize the time-offsets of all of the high
frequency source content in an attempt to reduce the extent of the artefacts which
occur due to spatial aliasing [Start, 1997]. This approach does, however, sacrifice the spatial accuracy of the sound field at high frequencies and has proven to be quite
Fig. 4.14 A distant source (a) below the aliasing freq & (b) above
available. One system has been successfully operating in the Ilmenau cinema since
2003 (see Figure 4.16). In addition, the WONDER system, developed by Marije
Baalman for the Linux environment, is a free open-source WFS program developed
specifically for the production of spatial music [Baalman, 2006]. While WFS is
certainly a highly promising technique, the large amount of hardware required for its
implementation will most likely restrict its use to large dedicated venues for spatial
music production, at least for the near future.
WFS is one of the few spatialization techniques which can theoretically
recreate the correct wavefield over an extended listening area and its ability to
simulate virtual sources within the listening area is certainly unique. Other techniques
such as stereophony and Ambisonics can only reproduce sources at the distance of the
loudspeakers, and additional processes must be used to simulate distance. These
processes attempt to simulate the perceptual cues related to distance hearing presented
in Section 2.2, and are discussed in the next Section.
While recorded sounds may retain many of these perceptual cues, the same cannot be
said for synthesized sounds. A method of simulating distance through the
manipulation of these perceptual cues was first proposed by John Chowning in 1971.
This scheme, although originally specified for a Quadraphonic system, can readily be
adapted to other loudspeaker arrangements and directional panning techniques. When
this system was first proposed, the influence of early reflections on distance
perception was not yet fully understood. Chowning's system therefore used the direct
to reverberant signal ratio as the primary distance cue, with some modifications. The reverberant signal is divided into two components: the global reverberation, which is produced in all directions and made proportional to 1/distance, and the local reverberation, which is produced in the same direction as the direct signal and made proportional to 1 - (1/distance). Therefore, as
the distance increases, the reverberation becomes increasingly localized,
compensating for the decreasing direct signal level. Chowning suggests that this is a
fair approximation of a real acoustical situation, as when the distance of a sound source increases, the distance to nearby reflecting surfaces decreases, thereby giving the reverberation some directional emphasis [Chowning, 1971]. More recent research
has suggested that the directional aspect of the reverberant signal, particularly of the
lateral early reflections, is used by the auditory system to estimate the egocentric
distance to the source [Bronkhurst, 2002; Gerzon, 1992b]. The directional behaviour
of the reverberant signal in Chowning's system is somewhat similar to this situation.
However, as the precise spatio-temporal pattern of the reflected signal is not
recreated, it is likely that this system will not provide an absolute sense of distance.
The advantage of this system lies primarily in its efficiency and Chowning went on to
develop a real-time digital implementation of the system using a standard
Quadraphonic loudspeaker array. The inclusion of the Doppler effect and other
secondary cues such as high frequency air absorption, when combined with
independent control of the direct and reverberant signals resulted in an effective
simulation of movement which was highly sophisticated for its time and indeed this
approach is still widely used today, particularly for real-time applications.
Chowning illustrated this new technique, along with a number of FM synthesis
algorithms (also developed by Chowning) in the landmark work, Turenas, which was
completed in 1972.
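A minimal sketch of the gain scheme described above, with distances expressed in multiples of a reference distance and the names purely illustrative, is:

    def chowning_distance_gains(distance):
        # Distance is clamped to the reference distance of 1 unit.
        d = max(float(distance), 1.0)
        direct = 1.0 / d               # direct signal falls with distance
        local_reverb = 1.0 - 1.0 / d   # reverb sent in the source direction
        global_reverb = 1.0 / d        # reverb distributed in all directions
        return direct, local_reverb, global_reverb

As the distance d increases, the direct and global components fall away while the localized reverberation grows, producing the directional emphasis described above.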
A significant increase in computational processing power was required if
multiple discrete reflections, as well as the diffuse global reverberation, were to be
accurately simulated. Indeed, it was more than a decade after Chowning's 1971 paper
before such a spatialization system was proposed.
[Blauert, 1997; Gardner, 1962]. Moore's model represents the performance and illusory acoustic spaces as two rooms, namely:
- The performance space or inner room, within which the listeners are located and whose boundaries are defined by the loudspeaker positions.
- The illusory acoustic space or outer room from which sounds emanate.
Fig. 5.1 Moore's spatial model (a) direct signal (b) reflected signal paths
- Absorption due to collision with the outer walls of the inner room (these are modelled as being completely absorptive)
The various attenuation factors due to reflections and absorption are derived from
measurements of real acoustic spaces and surfaces. A reverberation algorithm based
on the overall size and shape of the outer room and the reflective properties of its
walls is used to generate the diffuse global reverberation. Multiple delay lines are
used to model the transmission times of the direct and reflected signals from the
source to the listener. A change in source distance will therefore result in a change in
the amount of delay, producing a pitch shift which is very similar to the Doppler
Effect.
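The pitch shift arises naturally from the changing delay, as the following sketch of a linearly interpolated, time-varying delay line illustrates (a simplification: a full implementation of Moore's model would run one such delay line per direct and reflected path):

    import numpy as np

    def variable_delay(signal, delay_samples):
        # delay_samples: per-output-sample delay. A decreasing delay
        # (an approaching source) raises the pitch and an increasing
        # delay lowers it, reproducing the Doppler effect.
        out = np.zeros(len(signal))
        for n in range(len(signal)):
            t = n - delay_samples[n]
            i = int(np.floor(t))
            if 0 <= i < len(signal) - 1:
                frac = t - i
                out[n] = (1 - frac) * signal[i] + frac * signal[i + 1]
        return out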
Real-time implementations of Moore's model were later developed [Ballan et al, 1994], including a version for the Max/MSP environment [Yadegari et al, 2002]. The simplifications introduced in order to produce a real-time configuration included the down-sampling of source paths, interpolating delay times, and improvements in the inner room ray intersection detection algorithm [Yadegari et al, 2002]. The real-time modular spatial-sound-processing system Spat (discussed previously in Section 3.1.3) is also based on the algorithms developed by Chowning and Moore [Jot, 1999]. More recent adaptations of this model for the Csound processing language have also been developed [Hofmann, 2008] and will be discussed later in Chapter Twelve.
space. Moore argues that although each listener will hear something different, all
listeners will in general perceive different perspectives of the same illusion. He
states:
Depending on proximity to the loudspeakers, each listener will hear these
sounds from a slightly different perspective. Information is presented at each
loudspeaker about all sound sources in the virtual outer room. The
differences in perception among listeners in the inner room are analogous to
perspective distortion. [Moore, 1983]
In effect, Moore assumes that each listener will localize the virtual source to a
different direction, but that the relative source distance and the broad overall
trajectory will still be apparent to each listener.
techniques and so tend to focus on measurements of the size and location of the
source signal. However, while this distinction serves a reasonable purpose in this
case, it is not clear that such a distinction can be made in the area under discussion in
this thesis. A complex sound scene will typically contain multiple sound objects
which may overlap or even merge depending upon the context. This may result in a
diffuse source which leads to a sense of spaciousness without a clearly perceived
direction. In addition localization measurements only record the perceived source
direction and do not indicate whether the source is sharply defined and focussed or
widely dispersed around a particular direction. Blauerts measure of locatedness
which was discussed earlier in this thesis will be an important measure in this regard
[Blauert, 1997].
A common approach in this area is to restrict the discussion to the ideal
situation of a single, centrally positioned listener in an anechoic or near-anechoic
room. While this approach may provide useful information about the relative
strengths of different techniques it does not address their performance under less than
ideal conditions. For performances of spatial music, it is critical to also assess the
influence of off-centre listener positions and the effect of the acoustic reflections in
the playback room.
source velocity which degrades as the velocity increases or decreases beyond the
optimum range [Saberi et al, 1990].
chamber using both listening tests and a binaural auditory model which calculated
localization cues from the signals arriving at the ear canals. The results indicated that the perceived direction accurately matched the panning direction when the centre of the loudspeaker pair was near the median plane, but degraded as the loudspeaker pair was displaced laterally. In general, a bias toward the median plane was reported for
sources produced by laterally biased loudspeaker pairs. Similar results were obtained
for a variety of different source signals [Pulkki, 2002]. The results of tests using
vertically orientated loudspeaker pairs suggest that the perception of the elevation of
amplitude panned phantom images varies widely from person to person. Only if a
virtual source is positioned at the same elevation as a loudspeaker can it be reliably
localized in that direction.
Gröhn also carried out listening tests to assess the localization of both static and dynamic sources in an immersive virtual environment [Gröhn, 2002]. The VBAP
system was used with fourteen loudspeakers which were placed in a non-symmetrical
arrangement due to the presence of various graphical displays. A variety of source
signals were presented while subjects were asked to indicate the position and
movement of the virtual source using a tracked baton pointer method. As expected,
localization was best when the source was at a loudspeaker position, and worst when
panned between loudspeakers. The trajectory of dynamic sources therefore tended to
bend toward the loudspeaker positions and similar results were reported for all test
signals. In an additional test, it was found that a second distracting stimulus
decreased localization accuracy in line with other similar experiments [Blauert, 1997],
but only if the distracting signal was at least 10-15dB less than the target signal.
6.2.1 Discussion
The results of the tests discussed in the previous Section clearly suggest that a
minimum of six loudspeakers is required to ensure reasonably accurate localization
for a single, central listener. In addition, the quality of lateral sources in quadraphonic
systems decreases significantly as the size of the array is increased to accommodate a
distributed audience (see Section 10.2). Other tests suggest, perhaps unsurprisingly,
that localization accuracy increases as more and more loudspeakers are added [Ballas
et al, 2001].
various encoded test signals were played back using square, rectangular and
hexagonal arrays in a medium-sized, acoustically treated listening room.
Interestingly, the initial tests were carried out in an ordinary untreated room but were abandoned because extremely poor localization was achieved. Four different decoding schemes were examined: a full-band velocity decode, a full-band energy decode, a full-band in-phase decode, and a dual-band energy/velocity decode based on Gerzon's original design with a transition frequency of 400Hz. Subjects were free to move
around within the array and were asked to listen for attributes such as directional
accuracy, perspective, timbral changes, artefacts and loudspeaker proximity effects.
Overall, the majority of test subjects preferred the hexagonal array with dual-band
decoding. The square layout was least preferred due to poor lateral imaging and
spectral changes for different source directions. The rectangular layout was found to
work well for frontal sources with rear ambience. In terms of the decoding scheme,
the velocity and in-phase decoders were least preferred for opposing reasons. The
velocity decoder produced uncomfortable in-head imaging and comb-filtering,
probably due to the high frequency anti-phase components which would be present in
a full-band velocity decode. The in-phase decoder on the other hand was judged to be
much too diffuse and reverberant, although comb filtering and artefacts due to listener
movement were eliminated. The full-band energy decoder was judged to provide a
balance between these two extremes and was found to work well at off-centre listener
positions. However, the shelf filter decoder produced more defined sources as it
appeared to pull the various spectral components of the signal to the same perceived
direction. An interesting general finding was that the loudspeaker layout is
significantly more important than the choice of decoder. The results of the initial
failed test would suggest that the acoustics of the listening room are also highly
important.
Guastavino conducted a number of listening tests which compared 1D, 2D and 3D ambisonic presentations [Guastavino et al, 2004], in an acoustically treated room containing six symmetrically arranged loudspeakers in a hexagonal formation, with two sets of three loudspeakers arranged above and below. The twenty-seven expert listeners were first asked to rate various ambient recordings made with a Soundfield microphone and decoded using a full-band in-phase decoding scheme. The
test results show a strong preference for the 2D, hexagonal layout in terms of
naturalness, source distance, envelopment and timbral coloration. The 3D schemes
were described as sounding further away, indistinct and less enveloping while the 1D
scheme was found to be the most stable with listener movement. In a second
experiment, a more directive decoding scheme (similar to a max-rE scheme) was used
as this provided a better balance between localization accuracy and sensitivity to
listener position [Guastavino et al, 2004]. Similar results were achieved in both tests
and an analysis of the results suggests that the preferred layout, at least in terms of
naturalness, is dependent on the source material (see Figure 6.2 [Guastavino et al,
2004]). The 3D layout appeared to be preferred for indoor environments, while the
2D layout was preferred for outdoor scenes and the 1D scheme for frontal music
scenes.
6.3.1 Discussion
The results of these tests confirm Gerzon's original proposal in that a dual-band decoder which optimizes the velocity and energy vectors is preferred when there is a single listener. However, when off-centre listener positions are taken into account, decoders which optimize rV are least preferred due to the significant anti-phase components which are required to maximize the velocity component. The in-phase decoding scheme eliminates these anti-phase components entirely and so is very stable across a wide listening area, but is also very diffuse. The max-rE decoder represents a good compromise between these two extremes, particularly at higher orders. As with stereophony, it appears that Ambisonics requires a minimum of six loudspeakers for optimum performance.
Fig. 6.3 Decoder criteria related to the size of the listening area
Daniel proposes that for a given order and distance from the centre of the
array, the max-rE decoding scheme is most suitable [Daniel, 2000]. If the listening
area extends beyond this distance, or to the loudspeaker periphery, then the in-phase
scheme is preferred (see Figure 6.3 [Daniel, 2000]). He proposes a tri-band decoding
scheme which applies the basic, max-rE and in-phase decoders in three consecutive
frequency bands, based upon the size of the listening area.
polarity cosine function was preferred, as this method contains no disturbing anti-phase components.
Dickins analysed the performance of first and second order Ambisonics and a
custom non-negative least squares (NNLS) panning algorithm by measuring the
directional energy vector magnitude rE [Dickins et al, 1999]. Two listening locations were considered, one at the sweet spot in the centre of the array, and another towards the rear of the array. The tests were carried out in an acoustically treated listening room. Martin suggests that in general a compromise must be made between optimizing the directionality of the source, and minimising panning artefacts as the source moves. The NNLS algorithm is therefore similar to Martin's polarity-restricted cosine function and Ville Pulkki's VBAP in that it allows a trade-off
between maximum directivity at loudspeaker positions and a more diffuse panning
which is homogeneous in all directions. The NNLS algorithm was preferred to the
second order Ambisonics system as it functioned well at off-centre listening positions
and could be extended to non-symmetrical loudspeaker arrays. However, as with the
tests carried out by Martin, little detail is given regarding the ambisonic decoding
scheme used. A strict decoding scheme which optimized rV would be expected to
function poorly away from the sweet spot, while a decode which optimized rE would
be much more similar to the NNLS algorithm and would provide a better comparison.
Guastavino conducted a number of listening tests which compared two-channel stereo, transaural reproduction and B-format Ambisonics using six symmetrically arranged loudspeakers.
Pulkki carried out a number of listening tests in order to assess the validity of
a binaural auditory model [Pulkki et al, 2005]. The experiment was conducted in an
anechoic chamber using a symmetrical eight-channel array with first and second order Ambisonics.
6.4.1 Discussion
The results of the tests presented in the preceding Section indicate that
Ambisonics is consistently preferred to amplitude panning for dynamically moving
sources as it produces a more uniform phantom image and hence disguises the
loudspeaker position. However, amplitude panning was also consistently preferred
for static sources as this method uses fewer loudspeakers and so reduces the
localization blur. This would seem to support Martin's view that in general a compromise must be made between optimizing the directionality of the source, and minimising panning artefacts as the source moves [Martin et al, 1999a]. The finding that Ambisonics produces a more diffuse, enveloping sound field but less tightly focussed sources is arguably another interpretation of the same fundamental difference between the two spatialization techniques.
A number of alternative amplitude panning techniques were presented which
attempt to reduce the timbral changes produced when a source is dynamically panned
to different positions [Pulkki, 2005; Martin, 1999a; Dickins et al, 1999]. These
techniques are similar in that they can control the number of contributing
loudspeakers independently of the source azimuth. In this way, they can ensure that a
minimum number of loudspeakers is always used, which then smoothes the perceived
trajectory. This is clearly very similar to the Ambisonics approach of optimizing the
energy vector rE for all directions, at the cost of reducing the maximum localization
accuracy that could be achieved at the loudspeaker positions. Pernaux et al point out that amplitude panning algorithms like VBAP can be considered as analogous to a local ambisonic velocity decode whereby only the loudspeakers closest to the source direction are used, and the requirement to optimize the velocity component (rV = 1) is dropped [Pernaux et al, 1998]. They go on to develop a dual-band Vector Based Panning (VBP) algorithm which uses VBAP at low frequencies and a Vector Based Intensity panning (VBIP) algorithm at high frequencies which, like Ambisonics, ensures that θE = θV. The significant advantage of this system over Ambisonics is that
appropriate decoding factors can be more readily obtained for non-regular
loudspeaker arrays.
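For a single loudspeaker pair, the core of VBAP reduces to inverting a 2x2 matrix of loudspeaker unit vectors and normalising the resulting gains for constant power. A minimal sketch (with illustrative names) is:

    import numpy as np

    def vbap_pair_gains(source_az_deg, speaker_az_deg):
        # Columns of the base matrix are the unit vectors of the two
        # loudspeakers; solving base * g = p yields the panning gains.
        p = np.array([np.cos(np.radians(source_az_deg)),
                      np.sin(np.radians(source_az_deg))])
        base = np.array([[np.cos(np.radians(a)) for a in speaker_az_deg],
                         [np.sin(np.radians(a)) for a in speaker_az_deg]])
        g = np.linalg.solve(base, p)
        return g / np.linalg.norm(g)  # constant-power normalisation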
Fig. 6.6 Ambisonic decoder directivity patterns where M is the system order
dummy loudspeakers were used to increase the choice of angle for the listeners
[Bates et al, 2007b].
available seating area in a rectangular shaped room (such as the one used for this test),
delay and gain adjustments must be made to displaced loudspeakers. In this test, the
appropriate delay was applied to each of the two lateral loudspeakers when encoding
the test signals. The gain adjustments were applied to these two loudspeakers by
calibrating each loudspeaker in the array to 70dBA at the centre listening position.
This approach is preferable to using the inverse square law when operating in a
reverberant acoustic environment, due to the superposition of the direct and
reverberant sound affecting the total SPL.
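The delay adjustment described here amounts to time-aligning each loudspeaker with the most distant one, as measured from the central listening position. A minimal sketch, with the gain adjustments left to in-room SPL calibration as described above, is:

    def alignment_delays(distances_m, c=343.0):
        # Delay each loudspeaker so that its signal arrives at the
        # centre of the array at the same time as that of the most
        # distant loudspeaker.
        d_max = max(distances_m)
        return [(d_max - d) / c for d in distances_m]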
In order to assess the effect of various stimuli, subjects were presented with one-second unfiltered recordings of male speech, female speech, Gaussian white noise and a music sample containing fast transients. The results indicate that for most
combinations of listening and source position, the influence of the room is not enough
to cause a listener to incorrectly localize a monophonic source away from the desired
location when using an asymmetrical 8-speaker array. However, for extreme cases
such as a front-corner listening position with a source positioned to the rear, the
degree of accuracy becomes heavily dependent on the nature of the source signal.
In a second test, first and second order Ambisonics, VBAP and Delta
Stereophony were assessed using the same loudspeaker configuration and a forced-choice speaker identification method. The first order Ambisonics was implemented using IRCAM's Spat software and a traditional, dual-band decoding scheme. The
second order system was implemented using the ICST externals for Max/MSP and a
max-rE decoding scheme devised by David Malham specifically for horizontal eight
channel arrays [Schacher et al, 2006]. Delta Stereophony (DSS) is a sound
reinforcement system which has been used successfully in large auditoria. The main
objective of DSS is to reinforce the original direct sound while also ensuring accurate
sound source localization. This can be achieved if the listener at any place in the
room receives the first wavefront from the direction of the sound event being
reinforced, rather than from any of the other loudspeaker positions [Fels, 1996].
The results of the listening tests were verified using calculated ITDs inferred
from high resolution binaural measurements recorded in the test environment. An
equivalent acoustic model was also implemented to investigate specific aspects of the
effect of the room acoustic. The results indicate that neither amplitude panning nor
Ambisonics can create consistently localized virtual sources for a distributed audience
in a reverberant environment. Source localization at non-central listener positions is
Fig. 6.9 Reported (blue) and actual (orange) direction for a source at speaker 14
The results of this test suggest that accurate localization of phantom sources is
difficult to achieve for multiple listeners in a reverberant environment. While
acoustic reflections and reverberation in the listening room probably contribute to the
overall reduction in localization accuracy, particularly at certain listener locations, the
results shown in Figure 6.10 would seem to indicate that this is not the primary
negative influence on the performance of the various systems. At locations away
from the centre of the array, the temporal relationship between the contributing sound
waves is distorted and so the precedence effect effectively dominates source
localization. The proximity of the loudspeaker array to certain listeners results in
phantom images which are consistently localized to the nearest contributing
loudspeaker. The best results are therefore achieved with VBAP, as this system only
ever uses a maximum of two loudspeakers, while the worst results were achieved with
b-format as this system uses nearly all of the loudspeakers to produce a phantom
image. The increased directivity of the second order Ambisonics max-rE decoding scheme performs better than the first order system, particularly at off-centre listener positions.
Fig. 6.12 Quality maps for 5ms (left) and 50ms (right) time difference
A large number of experiments have been carried out to examine the influence
of the precedence effect [Wallach et al, 1949; Litovsky et al, 1999]. It has been found
that a number of correlated sound waves arriving in close succession will be fused
together and perceived as a single sound at a single location. The duration over which
fusion will take place is highly dependent on the nature of the source signal, but
studies have found an approximate lower limit of 5ms for transient sounds and an
upper limit of 50ms for broadband sounds [Litovsky et al, 1999]. Moore used these
limits to evaluate a number of listener positions with the ITU 5.1 layout in terms of
the precedence effect. By checking the time difference between each loudspeaker
pair, a mark out of ten was given for each position in the listening area allowing
performance to be quantified at different positions in the reproduction area [Moore et
al, 2007]. The results for a reproduction area of 20m² are shown in Figure 6.12.
They indicate that transient signals will only be correctly localized in a relatively
small central listening area due to the large travel distances involved in such a large
array. A large effective listening area is shown for a 50ms time difference and the
author suggests that the effective listening area could be extended using a form of
transient suppression applied to the output signals [Moore et al, 2007].
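A crude version of such a quality map can be computed by finding, for each listening position, the largest arrival-time difference between any pair of loudspeakers and comparing it with the 5ms and 50ms fusion limits (an illustrative sketch, not Moore's actual scoring method):

    import numpy as np

    def max_pairwise_delay_ms(listener_xy, speaker_xys, c=343.0):
        # Largest difference in arrival time (in ms) between any two
        # loudspeakers at this listening position; values beyond the
        # 5ms/50ms fusion limits predict precedence-effect problems.
        d = [np.hypot(listener_xy[0] - x, listener_xy[1] - y)
             for (x, y) in speaker_xys]
        return (max(d) - min(d)) / c * 1000.0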
It has been suggested that altering the directivity characteristic of the
loudspeakers can help to reduce the detrimental effect of the room acoustic on
stereophonic source localization. The results of a series of listening tests carried out
by Harris suggested that diffuse acoustic radiators such as Distributed-Mode
Loudspeakers (DML) reduce the degradation caused by room acoustics on
stereophonic localization [Harris et al, 1998].
reverberation and early reflections in the listening room also influence the perception
of virtual sources, the extent and exact nature of this influence is more difficult to
define as this will entirely depend on the exact dimensions and layout of the particular
space.
The results of an initial test indicated that there was an easily recognizable difference
between signals processed using a specular and diffused reflection model, whether a
reverberant tail was included or not. In general, a mix of specular and diffused
reflection models was found to be preferable to a typical perfectly specular reflection
model. Martin also suggests that the inclusion of a height component in a synthetic
room model improved the attractiveness and realism of the resulting sound, even
when reproduced using a two-dimensional loudspeaker configuration [Martin et al,
2001].
It is still unclear how the addition of artificial reverberation affects auditory
perception at non-central listener positions. Lund developed a 5.1 surround sound
spatialization system based on amplitude panning with suitable spatially distributed
early reflections and diffuse reverberation [Lund, 2000]. He suggests that the
localization of virtual sources will be made more robust and accurate if the desired
source location is supported by accurately modelled early reflections and
reverberation. The results of a listening test with a very small number of participants
(five listeners) seem to support this view as the system produced more consistently
localized phantom sources and greatly increased the effective listening area [Lund,
2000]. However, further tests with a greater number of subjects are required to fully
verify these results.
Michelsen found that the addition of artificial reverberation and early
reflections produced a clear and unchanged sense of distance in both an anechoic and
a reverberant listening room [Michelsen et al, 1997]. This suggests that the listening
room acoustic is not significantly detrimental to the artificial simulation of distance
cues. However, the perceptual effect of combining artificial reverberation with the
natural reverberation of the listening room itself is unknown. The composer Denis
Smalley suggested that this would produce a negative cognitive effect, which he
refers to as spatial dissonance, as the auditory system would be presented with
localization cues that suggest two, potentially conflicting, acoustic spaces. However,
very few listening tests have been undertaken to ascertain how detrimental this effect
actually is. In the author's experience, the layering of two synthetic reverberation
effects does not create two clearly perceivable acoustic spaces but instead results in
one, slightly indistinct space. Clearly, the simulation of distance is highly dependent
on the use of artificial reverberation, and it is hard to see how else this effect could be implemented other than by the physical placement of loudspeakers at different distances.
As discussed in Chapter Three, wavefield synthesis differs from stereophony and
Ambisonics in that it can theoretically recreate virtual sources at different distances by
synthesizing the correct wavefront over an extended area. The evaluation of this
system and the veracity of these claims will be examined in the next section.
synthesized is between two lines from the listener to the edges of the array, as shown
in Figure 6.14. This area is further reduced if tapering windows are applied to the
array to reduce truncation effects (see Section 3.2.2).
WFS can only accurately synthesize a sound field up to the spatial aliasing
frequency, so clearly the high frequency content will be distorted in some way. Start
reported that the imperfect reconstruction of the sound field above the spatial aliasing
frequency gives rise to an increase in the apparent source width (ASW) due to the
uncertain directionality of the high frequency content [Start, 1997]. Wittek points out
that this must be taken into account when evaluating the localization accuracy of any
WFS system [Wittek, 2003]. For example, a measure of the standard deviation in the
reported directional data will not indicate whether listeners perceive a tightly focussed
source within the range of directions reported, or a broad diffuse source distributed
between the range of reported angles. As discussed earlier in Chapter Two,
locatedness is a measure of the degree to which an auditory event can be said to be
clearly in a particular location [Blauert, 1997], and this parameter is sometimes used in the
assessment of WFS systems to determine the focus or apparent width of the virtual
source.
Some of the earliest perceptual experiments with WFS systems were carried
out by Vogel [Vogel, 1993]. In an experiment with an array of twelve loudspeakers
each separated by 45cm, he found that correct directional localization was maintained
despite the very low spatial aliasing frequency of 380Hz of this system. However,
Wittek points out that as this system can only correctly synthesize frequencies below
380Hz, it cannot be assumed that WFS is responsible for the correct localization
[Wittek, 2003]. For a WFS virtual source positioned behind the array, the
loudspeaker nearest to a direct line from the listener to the source will be producing
the earliest and often the loudest signal. The precedence effect would therefore
provide a localization cue at all frequencies, which, in this case, coincides with the
source position specified by the WFS system. The mean directional error reported in
Vogel's test is no lower than would be expected if the precedence effect was
dominating localization and so these results do not indicate that localization accuracy
is improved by this particular WFS system. This situation does not occur with
focussed sources in front of the array, as in this case the first wavefront does not
arrive from the same direction as the virtual source [Wittek, 2003].
Since these early tests, further experiments have been carried out with
loudspeaker arrays of greater size and resolution [Vogel, 1993; Huber, 2002]. The
results of these tests demonstrated that localization accuracy was greater than the
actual physical resolution of the loudspeaker array, no doubt due to the increased
resolution of the array and the associated increase in the spatial aliasing frequency.
The results of both these tests clearly indicate the importance of the spatial aliasing
frequency in terms of the performance of the WFS system.
Start compared the minimal audible angle (MAA) of real sources and virtual
sources produced using a WFS system [Start, 1997]. He found no difference between
the MAA of a real source and the WFS source for a spatial aliasing frequency of
1.5kHz, for both broadband and low-pass-filtered noise signals. When the spatial
aliasing frequency was reduced to 750Hz however, the MAA increased somewhat.
Start suggested that this result implied that a spatial aliasing frequency of 1.5kHz
would ensure that the dominant low frequency localization cues are satisfied and so
the source will be accurately localized.
Huber conducted listening tests in an anechoic chamber to compare real
sources, two-channel stereo, WFS with loudspeaker spacings of 4cm and 12cm, and
an augmented WFS system based on Optimized Phantom Source Imaging (OPSI)
[Huber, 2002]. OPSI uses amplitude panning to position the portion of the signal
which lies above the spatial aliasing frequency [Wittek, 2002], thereby reducing the
artefacts which occur due to the incorrect reconstruction of the high frequency signals
in WFS systems. Figure 6.15 shows the scaled judgements of locatedness (which
Huber refers to as localization quality) for each of the five systems. Clearly none are
able to match the real source in terms of locatedness; however, a significant improvement is apparent when the loudspeaker spacing is reduced to 4cm (which results in an aliasing frequency of 3kHz). The worst results were achieved with stereo
while the hybrid OPSI method was found to produce approximately the same results
as the normal WFS system.
accuracy shown in Figure 6.16 does not indicate any differences between the real and
WFS sources, which indicates the importance of assessing the perceptual sense of
locatedness as well as the perceived direction.
The results were similar to Huber's in that none of the systems matched the
performance of a real source in terms of locatedness, but better results were reported
with the WFS system than with amplitude panning (see Figure 6.18). As with Huber,
better results were achieved when the spatial aliasing frequency was increased from
2.5kHz to 7.5kHz. Similarly, these differences are not evident when only the standard
deviations of the measured auditory event directions are considered. No degradation
in localization quality was found using the hybrid OPSI method.
Sanson examined localization inaccuracies in the synthesis of virtual sound
sources using WFS at high frequencies [Sanson et al, 2008]. Objective and
perceptual analyses were carried out through a binaural simulation of the WFS array
at the ears of the listener using individual head related transfer functions (HRTFs).
The array could be configured for a loudspeaker spacing of 15cm, resulting in an aliasing frequency around 1500Hz, or a loudspeaker spacing of 30cm, resulting in an aliasing frequency around 700Hz. Two listener positions were evaluated, one central
and one laterally displaced to the right by 1m. The results of the test indicated that
localization accuracy was dependent on the listening position, the source position and
the frequency content of the source signal. Localization accuracy decreased as the listener position moved laterally away from the centre point. As the source cut-off frequency was increased, localization at the off-centre position degraded, but not at
the central listening position. The authors suggest that this is due to the unequal
distribution of high frequency content at either ear when the listener is positioned at a
non-central location. This would provide a conflicting localization cue relative to the
low frequency content which is accurately reproduced by the WFS system. Clearly,
technical solutions to the distorted high frequency content in WFS systems must
address localization for off-centre listener positions.
- The averaged RMS error for the low-pass-filtered noise signal is almost identical for the real and virtual sources in the anechoic room, but the results for a real source are somewhat better in the auditorium and concert hall.
- The best results were achieved with the speech signal.
- The averaged RMS error for the high-pass-filtered noise signal is much larger for synthesized sources.
- The localization performance of the WFS system is worse in the concert hall, while real sources are localized to the same degree of accuracy as in the other rooms.
Start suggests that the reduction in localization accuracy of the WFS system in
the concert hall is primarily due to the lower spatial aliasing frequency (750Hz) of this
system. The standard deviation of the averaged perceived direction for each room is
shown in Figure 6.20. In the anechoic room and the auditorium a clear difference can
be seen in the results for the low and high pass filtered noise. In each case, the low
frequency signal (<1.4kHz and <1.2kHz, respectively) whether real or synthesized, is
localized more accurately than the high-pass-filtered signal. The localization
accuracy with synthesized sources is particularly worse with the high-pass-filtered
signal. However, this is not the case in the concert hall as here the worst results are
achieved with the low-pass-filtered signal (now <750Hz) for both real and
synthesized sources. These results suggest that the WFS system is not working
correctly in this particular room. Start suggests that this is solely due to the decrease
in the spatial aliasing frequency. However, the drastic reduction in performance for the low-pass-filtered noise signal suggests that the performance of the WFS system is also significantly affected by the reproduction room acoustic.
Fig. 6.21 Virtual Source (a) behind the array & (b) in front of the array
gray bars: virtual sources (loudspeaker spacing = 11cm)
black bars: virtual sources (loudspeaker spacing = 22cm)
white bars: real sources [Verheijen, 1998]
A similar experiment was carried out by Verheijen using two different arrays
with spatial aliasing frequencies of 0.75kHz and 1.5kHz respectively, and virtual
sources behind and in front of the array [Verheijen, 1998].
utilized so that central loudspeaker spacing (24cm) was greater than lateral
loudspeakers (13cm). This unusual arrangement results in an aliasing frequency
which varies with the source angle and listener position, as shown in Figure 6.23.
The results illustrated the general and expected trend that localization ability
decreases as the listener orientation changes from frontal to lateral orientation. This is
particularly evident for the listener position closest to the loudspeaker array, which
suggests that there is a limit to how close to the loudspeakers listeners can be placed
without a significant decrease in localization accuracy. The best results were obtained
for the central listener position while a slight reduction in accuracy was reported for
the listener position furthest from the array. As expected localization accuracy
generally decreased in the more reflective room but, in general, good localization was
achieved. However, Marentakis points out that signals with strong transient
characteristics, such as the enveloped white noise signals used in this test, are
localized independently of the room reverberation [Hartmann, 1983].
two other simultaneously reproduced virtual reference sources (see Figure. 6.24). The
subjects were asked to move around in the listening area throughout each test. In the
first experiment, the subjects could only manipulate the WFS source distance while
other parameters such as the direct to reverberant energy ratio and signal level were
kept constant. The results showed that the subjects were indeed able to position the
middle guitar in between the other two solely using the perspective cue of the WFS
sound field. This result suggests that the perception of distance due to the virtual
source position in a WFS system can be perceived independently of the subjective
distance impression.
sources. Wittek suggests that these results indicate that at a fixed listening position,
the curvature of the wavefront of a dry WFS virtual source does not support distance
perception. However, in spite of not being a crucial cue, a correct wavefront
curvature (and thus a consistency between curvature and actual distance) may support
the perception of distance, particularly if the listener can move [Wittek, 2003].
Fig. 6.25 Distance of real (left) and virtual (right) sources reported by Kerber
Usher similarly found that in the absence of any indirect sound, when a source
is positioned beyond a certain distance using a WFS system, the curvature of the
wavefront does not seem to be used to determine the distance of the virtual source, but
rather the timbre of the perceived source dominates [Usher et al, 2004]. It would
seem, therefore, that in order to accurately produce WFS virtual sources at different
distances, some form of artificial reverberation is needed to provide additional
distance cues. However, Wittek points out that disturbing reflections caused by the
WFS array itself may in fact also hinder the perception of the distance of virtual
sources in front of or behind the array. It is not the case that a dry WFS virtual source
will automatically produce a natural reflection pattern in the reproduction room
[Wittek, 2003] and this is illustrated in Figure 6.26 which shows a WFS system with a
virtual source (blue dot) positioned in front of the array. The correct reflections that
would arise if a real source was at this position are indicated by the green dots, while
the actual reflections that arise are shown as orange dots. Clearly both the timing and
direction of the actual reflections do not correspond to the desired source position, but
rather to the distance of the loudspeaker array itself. The virtual source distance will
therefore most likely be perceived to be the distance of the loudspeaker array rather
than the specified source distance. This again indicates the significant influence of
the reproduction room acoustic on the performance of WFS systems. Various
listening room compensation schemes have been proposed which can actively cancel
early reflections in the horizontal plane [Corteel et al, 2003; Spors et al, 2003]. The
results of simulations suggest that these techniques could help to reduce the
detrimental effect of early reflections in the listening room over a large area.
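The geometry behind this effect can be made concrete with a short sketch. The following Python fragment is a minimal illustration, assuming a linear array with 15 cm spacing and a virtual source 2 m behind it; it computes only the per-loudspeaker delays and a crude amplitude law, and omits the pre-equalisation filter and edge tapering of a real WFS driving function.

    import numpy as np

    c = 343.0                                     # speed of sound (m/s)
    spacing = 0.15                                # loudspeaker spacing (m), assumed
    xs = np.arange(-3.0, 3.0 + spacing, spacing)  # linear array along y = 0
    virt = np.array([0.0, -2.0])                  # virtual source 2 m behind array

    # Each loudspeaker radiates a delayed, attenuated copy of the source so
    # that the superposed wavefront carries the curvature of the virtual source.
    r = np.hypot(xs - virt[0], -virt[1])          # virtual source to each speaker
    delays = (r - r.min()) / c                    # relative delay per speaker (s)
    gains = np.sqrt(r.min() / r)                  # crude 1/sqrt(r) amplitude law

    # The wavefront appears to come from 'virt', but room reflections are
    # excited at the physical array (y = 0), so their timing and direction
    # correspond to the loudspeaker distance, as in Figure 6.26.

Because the delayed, weighted copies are radiated from the physical array, any reflections excited in the reproduction room depart from the loudspeaker positions rather than from the virtual source position, which is precisely the mismatch illustrated in Figure 6.26.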
Of course, WFS can also simulate different source distances through the
addition of early reflections and reverberation. Figure 6.27 illustrates a scheme
proposed by Caulkins to artificially inject early reflections which are otherwise absent
in the WFS reproduction of focussed sources [Caulkins et al, 2003]. As with other
spatialization techniques, more listening tests are required to fully determine the
perceptual effect of combined and potentially conflicting virtual and real acoustic
reflections.
Fig. 6.28 Perceived colouration for various WFS and OPSI systems
Wittek carried out a series of listening tests to determine the relative level of
colouration of real, WFS, WFS+OPSI and stereophonic sources in an acoustically
treated listening room [Wittek, 2007]. The results, shown in Figure 6.28, indicate that
the amount of signal colouration increases with spatial aliasing, and that the OPSI
method can significantly reduce the perceived colouration in comparison to the WFS
system. The author suggests that the non-zero result for the real reference source was
due to the non-individualised HRTF used in the experiment. Interestingly, the lowest
level of colouration was reported with the standard stereo system.
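The dependence of colouration on spatial aliasing can be related directly to the array geometry. As a rough rule of thumb, the worst-case aliasing frequency of a WFS array is often estimated as f_alias = c / (2 × spacing); the short sketch below simply evaluates this estimate for a few loudspeaker spacings, which are chosen here purely for illustration.

    # Worst-case spatial aliasing frequency of a WFS array, using the commonly
    # quoted estimate f_alias = c / (2 * spacing):
    c = 343.0
    for spacing in (0.10, 0.15, 0.30):            # illustrative spacings (m)
        print(f"spacing {spacing:.2f} m -> aliasing above {c / (2 * spacing):.0f} Hz")

Even a closely spaced array therefore aliases well within the audible band, which is consistent with the colouration results reported by Wittek.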
conditions. This view seems to be supported by the results of tests which found that
localization accuracy is decreased for focussed sources in front of the array
[Verheijen, 1998] or for non-central listener positions [Marentakis et al, 2008]. The
quality of localization or locatedness is another important factor, as measures such as
the standard deviation in source angle may suggest good localization was achieved
when in fact significant source broadening occurred. Further subjective tests are
required to fully determine the localization accuracy of WFS systems for focussed
sources in front of the array and for non-central listener positions.
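The distinction between localization accuracy and locatedness can be illustrated with simple circular statistics. In the sketch below the pointer responses are invented for illustration: both sets indicate the same mean direction, but the resultant length (a standard measure of angular concentration) separates a compact image from a broadened one. Locatedness proper would additionally require subjective ratings of perceived source width.

    import numpy as np

    def mean_direction(deg):
        """Circular mean angle and resultant length (1 = fully concentrated)."""
        z = np.exp(1j * np.radians(np.asarray(deg))).mean()
        return np.degrees(np.angle(z)), np.abs(z)

    # Invented pointer responses for a source presented at 30 degrees:
    compact = [28, 30, 31, 29, 32]    # small angular spread
    broad = [0, 15, 30, 45, 60]       # same mean direction, broadened image
    for name, resp in (("compact", compact), ("broad", broad)):
        mu, R = mean_direction(resp)
        print(f"{name}: mean {mu:5.1f} deg, resultant length {R:.2f}")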
It has been suggested that WFS can be used to enlarge the effective listening
area of other stereophonic techniques, and this appears to be true for certain
applications. Various tests [Corteel et al, 2004] have found that WFS does
significantly increase the listening area for two-channel stereo as sources are less
likely to collapse to the nearest loudspeaker when listeners are displaced laterally
from the sweet spot. WFS has similarly been successfully used in a domestic
situation as a flexible and robust method for 5.1 reproduction as it is less sensitive to
listener position and non-standard loudspeaker positions [Corteel et al, 2004]. Other
studies have found that plane wave reproduction with a WFS system can be used to
diffuse the rear channels in a 5.1 reproduction set-up, again increasing the effective
listening area [Boone et al, 1999]. The use of WFS as a flexible reproduction format
for domestic cinema and audio applications would therefore seem to be one of the
most promising applications of this technique. The CARROUSO Project (Creating,
assessing and rendering in real-time of high quality audio-visual environments in
MPEG-4 context) has attempted to merge WFS with the flexible MPEG-4 standard
for the transfer of both recorded and synthesized sound fields between different
reproduction systems. Much of this work has concentrated on developing practical
implementations of WFS suitable for the domestic market, such as flat-panel,
distributed mode loudspeakers [Farina et al, 2000].
The reproduction of virtual sources at different distances is one of the most
widely lauded features of the WFS method. However, it appears that when the
listener position is fixed, the correct wavefront curvature produced by a WFS virtual
source does not provide any perceptible distance information [Kerber et al, 2004]. In
addition, reflections from the array and reproduction room walls will tend to pull the
perceived source distance to the distance of the loudspeakers [Wittek, 2003].
Although it has been suggested that this weak cue might help to support other more
dominant distance cues [Noguès et al, 2003], there is little evidence to support this
claim. The one significant exception is when the listener can move, and in this
instance the ability of WFS to reproduce a sense of changing perspective or motion
parallax has been shown to support the estimation of distance.
6.8.1 Discussion
The preceding discussion illustrates the difficulties in the presentation of
spatial audio to multiple listeners. The influence of the precedence effect is
particularly noticeable for off-centre listeners and it appears that a high degree of
directional localization accuracy can only really be achieved for every listener if a
single loudspeaker is used. Spatialization techniques such as pair-wise amplitude
panning, and to a lesser extent, higher order Ambisonics, produce the next best
results, as the number of contributing loudspeakers is restricted and the loudspeakers are situated in the
6.8.2 Implications
The results presented in the preceding section suggest that it is very difficult to
produce spatial locations and trajectories which are unambiguously perceived by
every listener, in the same way. Even in the case of point sources which are clearly
localized, each listener will be orientated differently with regards to the loudspeaker
array, and so will have a different perspective on the spatial layout. As noted earlier,
directional localization accuracy is the main topic under investigation in many of
these tests, but this is not necessarily the only way in which space can be utilized in a
musical composition. The results presented earlier suggest that exploring these other
uses of space may in fact be a necessity. However, it is just as important to know
whether these other uses of space are clearly perceptible to an audience, and if so,
which spatialization technique can achieve this most effectively, if at all. Clearly,
Ambisonics is the preferred
spatialization technique for dynamically moving sources. However, it is also clear
that the precise trajectory perceived by each listener will be strongly influenced by
their position within the array.
If a recorded sound is to be used in spatial music composition, the Ambisonic
Soundfield microphone represents the most flexible recording option if an enveloping
sound field is required. However, if a more directional diffusion is required, then
monophonic or stereophonic microphone techniques are perhaps more applicable as
although multi-channel microphone techniques can be very effective, they are tied to
a specific reproduction layout.
While many composers continue to utilize various multi-channel techniques,
others have adopted an entirely different approach based upon a single two-channel
stereo source and a large, disparate collection of spatially distributed pairs of
loudspeakers, i.e. a loudspeaker orchestra. This aesthetic represents a very different
approach to the multi-channel techniques discussed in the preceding chapters.
However, the art of diffusion is admirably focussed on the perception of the audience
and the real technical problems which arise in these kinds of performances, something
which is often lacking in multi-channel tape compositions.
The second half of this thesis will focus on spatial music composition via the
analysis of a number of different composers and aesthetics, and some original
compositions by the author. Different approaches to the use of space as a musical
parameter will be assessed in terms of the technical and perceptual research presented
in the preceding chapters. Inevitably, greater emphasis will be placed on music from
the twentieth century as many significant aspects of spatial music are dependent on
technical developments from this era; however, spatial music is not solely a
twentieth-century phenomenon. The spatial distribution of performers has been used for
centuries in European religious choral music, and this antiphonal style is itself derived
from the even more ancient call-and-response form. The next chapter in this thesis
will examine this early form of spatial music and investigate the development of
acoustic spatial music in the first half of the twentieth century, prior to the
development of recording and amplification technology and electronic spatialization
techniques.
been situated at some other point away from the main choir. This theory is supported
by historical accounts from the time, one of which details the method of
synchronizing the various spatially distributed groups. It appears that two additional
conductors were employed to relay the beat indicated by the principal conductor,
situated at floor level with the main choir, to the musicians in each organ loft.
Giovanni Gabrieli's famous work In Ecclesiis is an excellent example of this form of
polychoral music, using four separate groups of instrumental and singing performers
accompanied by organ and basso continuo to create a spatial dialogue and echo
effects (see score extract in Figure 7.2).
The Venetian school was highly influential across Europe, helped in part no
doubt by the invention of the printing press a century before. The English composer
Thomas Tallis composed Spem in Alium in 1573 for forty separate vocal parts
arranged in eight choirs, while Orazio Benevoli's Festal Mass was written for
fifty-three parts, two organs and basso continuo. Over the next four hundred years, the use
of antiphony became rarer with some notable exceptions such as the antiphonal choral
effects of J. S. Bach's St. Matthew Passion (1729), and the motivic interplay of
spatially separated groups of Mozart's Serenade in D for four Orchestras (1777). In
the Romantic era, composers occasionally placed groups of musicians away from the
main orchestra for dramatic effect. One example is Berlioz's Requiem (1837), which
at its premiere included four brass ensembles positioned at the four cardinal points,
along with a massive ensemble of singers, woodwinds, horns, strings, and percussion.
Berlioz was aware in advance that this work would be premiered in Les Invalides, the
gigantic domed cathedral of the military hospital in Paris, and he exploited the
characteristics of this space in this new commission. In the famous Tuba Mirum
section, God's fury with the damned is invoked through the
consecutive entrances of the four brass ensembles, which gradually build to a dramatic
climax of massed timpani and voices (see Figure 7.3).¹
Although Berlioz was clearly thinking about the use of space in music (he
referred to it as "architectural music"), the spatial distribution of the performers in this
case is largely for dramatic effect and is not a critical feature of the work, and this is
true of most of the historical examples discussed in this section. In the early part of
the twentieth century, space was sometimes used to create a sense of perspective by
contrasting the orchestra on stage with more instruments placed at a distance off-stage.

¹ Interestingly, during the first performance, the conductor Habeneck (a rival of Berlioz) is alleged to
have broken for a pinch of snuff during the critical entrance of the brass ensembles, requiring Berlioz
himself to leap in and take over the conducting duties [Bromberger, 2007].

The Austrian composer Gustav Mahler often used off-stage musicians in
addition to the main orchestra, such as, for example, the brass and percussion in the
fifth movement of Symphony No. 2 (1894) or the off-stage snare drum in Symphony
No. 3 (1896). Other significant composers of the era also used similar effects,
although not as frequently as Mahler. Igor Stravinsky made use of tubas dans la
coulisse (in the wings) in the ballet score of Firebird (1910) and
Strauss featured six trombones auf dem Theater in Die Frau ohne Schatten (1919).
This period was one of great upheaval in Western Art music as many composers
began to move away from the strict functional tonality and defined meters of previous
eras. While composers like Stravinsky and later Schoenberg experimented with the
basic foundations of musical structure, others retained aspects of traditional music
practice but utilized them in a very different way. The American Charles Ives is one
such composer who regularly combined traditional tonality with the then new ideas of
musical quotation, polyrhythms and meters, and spatial effects. Ives was a
contemporary of Mahler (both produced most of their music within the same
thirty-year period from 1888 to 1918) and although their music is quite disparate, and
derived from very different musical traditions, both composers do exhibit certain
similarities [Morgan, 1978]. For example, both composers were interested in the
quotation of other musical material within their own compositions and both regularly
combined and juxtaposed layers of different and contrasting material. Both
composers also retained aspects of functional tonality in their work and made
extensive use of overlapping yet unrelated tempi. While both composers used the
spatial distribution of performers in their work, it would be Ives who developed this
practice further and have the most lasting effect on the development of spatial music,
particularly in America.
accounts exist of his experiments with the spatial effect of simultaneous marching
bands [Ross, 2009; Mortenson, 1987]. This influence can clearly be seen in the music
of Charles Ives, which often makes use of overlapping keys and meters and the
combination of European Art music with American popular and church music. Over
the course of his life, Ives would go on to explore many of the musical innovations
which would become associated with modern contemporary music such as
polytonality and polyrhythm, tone clusters and microtonality, musical quotation and
collage, and also spatial music. Much of Ives' music involves the juxtaposition of
various disparate elements and his compositional use of space generally reflected this.
The Unanswered Question (1908), one of his most famous compositions, uses the
spatial distribution of musicians to highlight the three distinct layers of strings,
woodwinds and brass. The three layers operate independently at their own tempo and
key and this musical separation is further accentuated through the placement of the
string orchestra off-stage, the woodwind ensemble on stage, and the solo trumpet
positioned at some other distant position, such as a balcony. Throughout the piece the
string orchestra performs slow, sustained tonal triads which are punctuated by short
trumpet phrases (the question), which are in turn answered in an increasingly
incoherent fashion by the woodwinds. The symbolism of this work is beautifully
supported by the spatial distribution and layering of the musical material. The slow-moving tonal strings represent the natural world which surrounds the audience and
remains in constant yet slow motion, undisturbed by the question and answer dialogue
of the trumpet and woodwinds. The trumpet sounds out the question with a clear,
atonal phrase which is then answered by the woodwind ensemble. In contrast to the
clear question of the solo trumpet, the woodwinds respond with multiple, overlapping
and seemingly unrelated phrases and so, no answer is found to the eternal question of
the title.
In much of Ives' music, space is used to clarify and define the various
overlapping yet independent musical layers, and other composers at the time were
also beginning to experiment with the layering of disparate musical material. Italian
futurists like Luigi Russolo were exploring collage form and noise, while Darius
Milhaud's 1918 ballet L'Homme et son désir used multiple, spatially distributed
ensembles playing independently of each other, sometimes in different metres
[Zvonar, 2004]. However, Ives' fourth and last symphony (1910-1916) is certainly
one of the most ambitious works that made use of this technique. This work, for a
gigantic orchestra with additional off-stage ensembles, was not performed in full until
nearly a decade after Ives' death. The second movement juxtaposes so much disparate
thematic material that a second conductor is generally required while the final
movement contains a contrasting dialogue between discordant and tonal material. In
the conductor's note Ives wrote [Johnson, 2002];
"As the distant hills, in a landscape, row upon row, grow gradually into the horizon,
so there may be something corresponding to this in the presentation of music. Music
seems too often all foreground, even if played by a master of dynamics.
It is difficult to reproduce the sounds and feeling that distance gives to sound wholly
by reducing or increasing the number of instruments or by varying their intensities.
A brass band playing pianissimo across the street is a different sounding thing than
the same band playing the same piece forte, a block or so away. Experiments, even
on a limited scale, as when a conductor separates a chorus from the orchestra or
places a choir off the stage or in a remote part of the hall, seem to indicate that there
are possibilities in this matter that may benefit the presentation of music, not only
from the standpoint of clarifying the harmonic, rhythmic, thematic material, etc., but
of bringing the inner content to a deeper realization."
Clearly Ives used the spatial separation of performers to create a sense of distance and
perspective, in much the same way as European composers such as Mahler.
However, Ives also used this spatial distribution to clarify different layers of
independent and potentially dissonant musical material and to facilitate the
performance of overlapping yet unrelated musical layers, often at different tempi or
metres. While the spatial separation of musical material at different tempi obviously
has practical benefits for its performers, the above quote also indicates that Ives had
intuitively realised that this spatial separation also benefited the listener. It is
unknown whether this insight was informed by the recent development of Gestalt
psychology in Germany, or derived from Ives' own experience. However, it has since
been shown by Bregman and others that our ability to segregate an audio scene into
multiple streams strongly influences our perception of musical parameters such as
melody and rhythm [Bregman, 1990]. Bregman's work on Auditory Scene Analysis
(see Chapter Two, Section 2.3) emphasized the importance of spatial cues in the
segregation of audio streams and he suggested that the spatial separation of a
multiplicity of sounds prevents the auditory system from computing dissonances
between them. Other studies have also found that a listeners ability to detect and
understand the content of multiple signals is improved if the signals are spatially
separated signals [Shinn-Cunningham, 2003; Best, 2004] and this would also appear
to support Ives use of space to clarify the harmonic, rhythmic and thematic
material. Although largely ignored for much of his career, Charles Ives would
eventually be recognized as a highly creative and innovative composer, and his
experiments with spatial music would be an important influence on a number of
American composers.
Henry Brant (1913-2008) is one such composer who was
influenced by Ives. Over the course of his career, Brant wrote seventy-six works of
spatial music (along with fifty-seven non-spatial works [Harley, 1997]) and has
become one of the most famous composers of orchestral spatial music. Brant's use of
space was clearly influenced by Ives, as illustrated by this extremely admiring 1954
description by Brant of Ives' The Unanswered Question:
This unique, unprecedented little work, written in 1908, presents, with extraordinary
economy and concentration, the entire twentieth-century spatial spectrum in music, and
offers guidelines for solving all the practical problems involved. The spatial-contrapuntal-polytemporal
principles so brilliantly exemplified in this piece are the basis
for the more complicated spatial superimpositions present in all my own recent
large-scale works [Brant, 1967].
Brant composed and wrote extensively on the use of space in orchestral music and
his first spatial work, Antiphony I (1953) contains many of the core ideas which the
composer would continue to use throughout his career. In this piece, the orchestra is
divided into five groups which are placed at different parts of the auditorium and
perform material in contrasting tempi, meter and harmonies. Although the entrance of
each group is cued, they then proceed independently at their own speed. The
composer states that "a purposeful lack of relationship between the intervals,
phrasing, note-values, tone-quality and sonorities of the various lines will necessarily
produce a complex result as soon as the lines are combined" [Harley, 1997]. Brant
presented his ideas about the compositional use of space in a brief article written in
1967 [Brant, 1967], which is summarized as follows:
This piece also includes other spatial effects such as the stepwise introduction
of trombone/trumpet pairs, which are described by Harley as "sound axes" [Harley,
1997]. This spatial movement is linked to pitch, as the entries begin with a very high
trumpet note paired with a very low note on the trombone directly opposite, and end
with a convergence around middle C (see Figures 7.5 and 7.6). This movement is
quite different from the successive entry of the spatial assembly instruments as this
time a relationship exists between the material, instead of each instrument having its
own independent melody, key and meter. Brant used the term "spill" to describe the
effect of spatially distributed musicians performing similar, harmonically related
material. He uses the Tuba Mirum section of Berlioz's Requiem (see Figure 7.3) as
an example and describes how the common tonality and tone quality of the four
groups causes the resulting musical texture to extend from the corners to fill the room
[Brant, 1967].
Throughout his long career, Brant continued to compose spatial works of
greater and greater scale, culminating in Bran(d)t aan de Amstel, the massive
spectacle of spatial music which encompassed most of the city of Amsterdam in the
1984 Holland Festival [Harley, 1997]. This huge work involved a colossal number of
musicians including numerous bands in public squares, a youth jazz band, two
choruses, two brass bands, four street organs and four boatloads of performers moving
through the city's canals.
Ives' influence can also be seen in the spatial distribution scheme adopted by
the early composers of electronic music in America in the early 1950s. The relatively
recent development of magnetic tape was quickly adopted by composers as it greatly
facilitated the editing and splicing together of different sounds. The Project for Music
for Magnetic Tape was established in New York in the early 1950s and over the next
two years this group produced three new electronic works, Cage's Williams Mix
(1952), Earle Brown's Octet (1952) and Morton Feldman's Intersection (1953). Each
piece was realized using eight unsynchronized monophonic tapes positioned
equidistantly around the auditorium. The spatial separation of multiple independent
musical layers, in this case electronically generated, is clearly reminiscent of the
approach taken by Ives and Brant. In a lecture on experimental music given in 1957,
Cage described this approach as follows:
Rehearsals have shown that this new music, whether for tape or for instruments, is
more clearly heard when the several loudspeakers or performers are separated in
space rather than grouped closely together. For this music is not concerned with
harmoniousness as generally understood, where the quality of harmony results from
a blending of several elements. Here we are concerned with the coexistence of
dissimilars, and the central points where fusion occurs are many: the ears of the
listeners wherever they are. This disharmony, to paraphrase Bergson's statement
about disorder, is simply a harmony to which many are unaccustomed [Cage, 1957].
In a 1992 interview, Brant stated that the main function of space in music is
"to make complexity intelligible" [Harley, 1997], and Cage's "coexistence of dissimilars"
is very reminiscent of this. Brant did, however, distinguish his music from Cage's,
stating that his approach "is opposed to what later came to be termed aleatoric or
indeterminate music, in which accident and chance are looked upon as fundamental
musical parameters. When uncoordinated rhythmic layers are combined with spatial
distribution, accident is no more a factor than it is in the performance of rubato in a
complex Chopin ratio" [Brant, 1967].
The music of Ives, Brant and Cage uses space as a fundamental musical
parameter, as in this case, the spatial separation of the different musical lines is crucial
for the correct perception of the work. However, although space is a critical aspect of
this music, it is nevertheless used in a somewhat limited way, namely just to separate
and clarify the various musical lines. While composers like Cage adopted a strictly
indeterminate approach, others began to develop more formal systems to organize the
spatial aspects of a work. Brant was well aware of the dangers in this approach,
saying that "schemes for spatial distribution that are conceived in terms of their visual
expressiveness on paper cannot be expected to produce any effect on the aural
mechanism". He also discussed the degradation in spatial impression that occurs after
a certain level of musical activity is reached, stating that "the impression of the sound
travelling gradually down the first wall is very strong; this impression of moving
direction becomes less well defined as the further entrances and accumulations occur"
[Brant, 1967]. However, Brant did use geometrical patterns to map out spatial
trajectories, such as the sound axes of Millennium discussed earlier, and in the years
to follow many more composers would also turn to geometric abstraction in an
attempt to systematically organize the spatial relationships within a composition. This
approach was undoubtedly influenced by the development of electronic music as now,
for the first time, sounds could be entirely removed from their source and reproduced
at will. The different aesthetics of electronic music which developed in the middle of
the twentieth century would profoundly influence the use of space, not only in
electronic music, but also in the way composers employed space in orchestral music.
Fig. 8.1 Pierre Henry performing with the potentiomètre d'espace, Paris, 1952
being derived from them. Nevertheless, Schaeffer's ideas were well received at a
lecture at the Darmstadt festival in 1951 and, perhaps due to the influence of visiting
composers like Boulez, Messiaen and Stockhausen, serialist tendencies, although
resisted by Schaeffer, began to emerge within the GRMC [Palombini, 1993]. This
eventually led to the resignation of Schaeffer, Henry and others from the GRMC in
1958 and the founding of a new collective, the Groupe de Recherches Musicales
(GRM), which was later joined by composers such as Luc Ferrari, Iannis Xenakis,
Bernard Parmegiani, and François Bayle.
Meanwhile in Germany, another electronic music studio was established
which would be far more amenable to the tenets of serialism. The physicist Werner
Meyer-Eppler had published a thesis in 1949 on the production of electronic music
using purely electronic processes [Meyer-Eppler, 1949] and in 1951, Meyer-Eppler,
with Robert Beyer and Herbert Eimert, established a new studio for this purpose in
Cologne at the Nordwestdeutscher Rundfunk (NWDR). Their approach to
Elektronische Musik differed from Musique Concrète in its use of synthesized sounds
rather than recorded acoustic sounds. The two aesthetics also differed
fundamentally in terms of their basic approach to electronic music composition.
Composition with sound synthesis is inherently more suited to abstract structuring
principles such as serialism, as the material can be deliberately generated to fit the
preconceived score. This is much more difficult with the complex acoustic sounds of
musique concrète and indeed Schaeffer struggled to find a formal compositional
system which originated from the intrinsic properties of the sounds. Schaeffer's
disappointment is evident in this quote from an interview in 1986:
"I fought like a demon throughout all the years of discovery and exploration in
musique concrète. I fought against electronic music, which was another approach, a
systemic approach, when I preferred an experimental approach actually working
directly, empirically with sound. But at the same time, as I defended the music I was
working on, I was personally horrified at what I was doing… I was happy at
overcoming great difficulties - my first difficulties with the turntables when I was
working on Symphonie pour un Homme Seul, my first difficulties with the tape
recorders when I was doing Étude aux objets - that was good work, I did what I set
out to do. My work on the Solfège - it's not that I disown everything I did - it was a
lot of hard work. But each time I was to experience the disappointment of not
arriving at music. I couldn't get to music, what I call music. I think of myself as an
explorer struggling to find a way through in the far north, but I wasn't finding a way
through. " [Hodgkinson, 2001]
between sounds, does the spatial movement or distribution arise from the sound itself
or is it imposed upon the sound?
The NWDR studio in Cologne swiftly became one of the most well-known
electronic music studios in the world, helped in no small part by the growing fame of
the composer Karlheinz Stockhausen, who had joined the studio in 1953.
Stockhausen had worked briefly with Schaeffer in 1952 and produced a single
Konkrete Etüde in 1952. Stockhausen's approach to composition at this time was
based upon the ideas of total serialism, meaning the application of a measurable scale
of proportions (the series), not only to pitches, but also to non-pitched parameters
such as timbre, rhythm and also space. Schaeffer's musique concrète would
eventually lead to an aesthetic in which space is used in a performance to highlight
and exaggerate the pre-existing content of the music, and this will be looked at in
more detail later in this chapter. The alternative legacy of composers like
Stockhausen can be seen in the large number of abstract schemes which composers
began to use to organize the spatial distribution of material.
Opus 24, "What is essential is not a uniquely chosen gestalt (theme, motive), but a
chosen sequence of proportions for pitch, duration and volume" [Morgan, 1991].
Electronic music would be the ideal medium to implement these ideas as it allowed
the composer to create the musical material and hence organize timbre and space
according to a series of proportions. These ideas were implemented by Stockhausen
in the electronic work Gesang der Jünglinge (1955/1956), which has been described as
"the first masterpiece of electronic music" [Simms, 1996] and was also the first piece
to serialize "the projection of sound in space" [Smalley J., 2000]. In this piece,
Stockhausen attempts to forge a connection between recordings of a boy soprano and
electronically synthesized sounds ranging from white noise to sine tones.
Stockhausen categorized the vocal recordings into basic phonetic components such as
noise-like plosive consonants and vowel sounds which resemble pure tones. These
were then combined with artificial consonant and vowel-like sounds created from
layered sine waves and filtered white noise to produce a range of material which fills
a continuum of timbres from pure tones to noise. This positioning of material within
a continuum of timbres illustrates Stockhausen's conception of serialism as a
graduated scale of values between two opposing extremes, which he later described in
an interview in 1971:
"Serialism is the only way of balancing different forces. In general it means simply
that you have any number of degrees between two extremes that are defined at the
beginning of a work, and you establish a scale to mediate between these two
extremes. Serialism is just a way of thinking." [Cott, 1973]
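Stockhausen's notion of a graduated scale between two extremes can be sketched in a few lines. The fragment below simply crossfades between a pure tone and white noise in seven equal steps; this linear mixture is only a crude stand-in for Stockhausen's actual method of building intermediate timbres from layered sine waves and filtered noise.

    import numpy as np

    sr, dur, f0 = 44100, 0.5, 440.0
    t = np.arange(int(sr * dur)) / sr
    rng = np.random.default_rng(0)

    tone = np.sin(2 * np.pi * f0 * t)         # one extreme: a pure tone
    noise = rng.uniform(-1.0, 1.0, t.size)    # the other extreme: white noise

    # A seven-step scale mediating between the two defined extremes.
    steps = np.linspace(0.0, 1.0, 7)
    continuum = [(1.0 - s) * tone + s * noise for s in steps]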
Although both recorded and synthesized sounds are used in Gesang der
Jünglinge, every sonic event and parameter (including spatial locations and the
comprehensibility of the text) was controlled and organized serially. The original
work was performed using five groups of loudspeakers at its premiere in 1956 but was
subsequently mixed down to four tracks. No five-track tape machines existed at this
time, so a four-track machine was used to feed four loudspeakers positioned around
the audience while an additional tape machine was used to feed a fifth loudspeaker
positioned on stage (see Figure 8.2 [Smalley J., 2000]). Although it is known that
Stockhausen attempted to organize space in this work in the same way as every other
parameter, the exact means by which serial techniques were applied to the spatial
distribution and movement is not entirely known [Smalley J., 2000]. The spatial
distance, especially of the boys voice, is clearly an important aspect of this work, and
the composer relates this parameter to the comprehensibility of the voice. When the
voice is positioned close-by with little reverb, the text is clearly comprehensible and is
the primary focus of attention. However, as the voice recedes into the distance and
the level of reverberation increases, comprehensibility decreases and the complex
interaction of the voice with itself and other similar electronic elements becomes the
primary focus of concern [Moritz, 2002].
Stockhausen's theoretical writings from this time are primarily concerned with
the difficulties in applying serialist proportions to non-pitched parameters such as
space and timbre. Stockhausen was well aware of the difficulties inherent in this
approach, particularly in terms of timbre, writing that "we only hear that one
instrument is different from another, but not that they stand in specific relationship to
one another", and it is clear that a similar problem exists in terms of the perception of
spatial locations and movements. Stockhausen's solution is described in the well-known text Wie die Zeit vergeht (How Time Passes), which was published in 1957
[Stockhausen, 1959]. In this essay, Stockhausen presents a theory of the unity of
musical time in which every musical parameter is considered at different temporal
levels. So, for example, the macro-level rhythm of an individual musical phrase can
be related to the specific micro-level subdivision of the spectrum by its harmonic
structure, which is also similarly related to timbre [Morgan, 1991]. At even greater
durations, the entire composition can be considered as a timbre with a spectrum
derived from the pitch, rhythm and dynamic envelope of the combined individual
musical phrases. This new approach moved the focus away from the pointillistic note
relationships of early serialism and onto what Stockhausen referred to as "group
composition", which emphasized the overall character of large groups of
proportionally related material, rather than on the relationship between individual
pitches. This approach is demonstrated in Gruppen (1955-57), for three orchestras
positioned to the left, in front and to the right of the audience. The spatial separation
of the three orchestras was initially devised to clarify the carefully constructed
relationships between the three layers of material, which is clearly reminiscent of the
approach adopted by Brant and Ives. However, in certain passages, musical material
is passed from one group to another and Stockhausen notes that similar orchestration
was deliberately used for each of the three groups in order to achieve this effect. The
spatial movement was produced using overlapping crescendos and decrescendos (see
Figure 8.5) which are clearly reminiscent of stereophonic panning and illustrates how
Stockhausens experience with electronic music composition influenced his
composing for instruments. Indeed, Stockhausen originally intended to write
Gruppen for both orchestral and electronic forces but the electronic element was
eventually abandoned due to practical and economic constraints [Misch et al, 1998].
The composer describes the spatial aspects of this work as follows:
The spatial separation of the groups initially resulted from the superimposition of
several time layers having different tempi which would be unplayable for one
orchestra. But this then led to a completely new conception of instrumental music in
space: the entire process of this music was co-determined by the spatial disposition of
the sound, the sound direction, sound movement (alternating, isolated, fusing,
rotating movements, etc.), as in the electronic music Gesang der Jünglinge for five
groups of loudspeakers, which was composed in 1955/56 [Moritz, 2002].
The spatial segregation of the three groups clearly helps to distinguish the
different layers of material but the more elaborate spatial effects also seem to be quite
effective. Various reviews of performances of this work have commented on the
dramatic effect of a single chord in the brass instruments travelling around the hall,
and this spatial movement is also quite apparent in stereo recordings of this piece (in
which the three groups are generally panned hard left, centre and hard right). The
approach to space used in Gruppen was further developed in Carré (1959-1960)
which was composed by Stockhausen and his assistant at the time, the British
composer Cornelius Cardew. This piece was composed for four orchestras which, as
the name suggests (carré literally means square in French), were arranged in a square
around the audience, as shown in Figure 8.6. As with Gruppen, similar orchestration
is used for each group with the addition of a mixed choir of eight to twelve singers.
The vocal text consists largely of phonemes and other vocal sounds chosen for their
relationship to the instrumental sounds.
In 1958 Stockhausen published two articles in the German music journal Die
Reihe entitled Elektronische und Instrumentale Musik (Electronic and Instrumental
Music) [Stockhausen, 1975a] and Musik im Raum (Music in Space) [Stockhausen,
1975b]. In the latter essay, Stockhausen briefly discusses the history of Western
spatial music before going on to outline the functional use of space in his own work.
The composer describes two difficulties which often occur in serialist music. Firstly,
the extensive use of serialist processes results in music in which every parameter is
constantly changing, and this often results in a rather static, pointillistic texture.
Secondly, in order to articulate longer time-phrases, one parameter must remain
constant and dominate, but this is in direct contradiction with the serialist aesthetic.
Stockhausen suggests that the spatial distribution of sounds can be used to articulate
longer time-phrases and structure the material, and also to clarify the complex
relationships between the different layers [Stockhausen, 1975b]. Stockhausen also
discusses the perception of distance in terms of air absorption, reverberation and early
reflections which, despite the composer's typically idiosyncratic terminology, compares relatively
well to the current research reviewed earlier in this thesis. Stockhausen discusses the
relative nature of the perception of distance, and notes the importance of source
recognition in this context and the difficulty in estimating the distance of unknown,
synthesized sounds. Consequently, Stockhausen concluded that the distance of a
source is a secondary musical parameter which is determined by the timbre and
loudness of the source signal, and that the direction of the source should be the
primary spatial parameter which is related to the overall serial structure.
Fig. 8.7 Spatial intervals and directions from Stockhausen's Musik im Raum
Interestingly, Stockhausen's next work, Kontakte (1958-60), would complete the idea for a
combination of instrumental and electronic music originally considered for Gruppen.
This famous work can be performed as a four-channel tape piece or as a mixed-media
work for four-channel tape, live piano and percussion. As the title suggests, this work
explores the various points of contact between static instrumental sounds and dynamic
electronic textures, between different spatial movements, and also the temporal
relationship between pitch and rhythm. At one point in this piece, Stockhausen
dramatically illustrates his theory of the unity of musical time as a high-frequency,
clearly pitched tone smoothly transforms into a slower rhythmical procession of
clicks. Although direction, or more precisely changes in direction, is the primary
spatial element in Kontakte, distance is also used to support dynamic aspects of the
music, such as the fast oscillating spatial movements which emphasize certain loud
dynamic sounds. In addition, spatial depth is suggested through the contrast of a
dense layer of material in the foreground which is suddenly removed to reveal another
more distant layer in the background. In order to generate dynamic spatial trajectories
and rotation effects, Stockhausen developed a rotating loudspeaker mechanism which
is surrounded by four microphones (see Figure 8.8). Sounds played back with the
rotating loudspeaker are recorded and reproduced using four corresponding
loudspeakers positioned around the audience. The physical rotation of the
loudspeaker results in a Doppler shift, time varying filtering, phase shifts and other
distortions which are difficult to accurately reproduce electronically [Roads, 1996].
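The essential mechanism of the rotation table, namely a propagation delay that varies as the source circles the microphones, can be sketched digitally. The fragment below simulates only the Doppler shift and distance attenuation heard at a single fixed microphone; the rotation rate, radius and microphone position are assumed for illustration, and the directivity filtering and other distortions of the physical mechanism noted above are not modelled.

    import numpy as np

    sr, dur = 44100, 2.0
    c, radius, rps = 343.0, 1.5, 4.0          # rotation rate of 4 rev/s, assumed
    t = np.arange(int(sr * dur)) / sr
    src = np.sin(2 * np.pi * 440.0 * t)       # tone fed to the rotating speaker

    # Path length from the circling loudspeaker to a fixed microphone at (2, 0):
    x = radius * np.cos(2 * np.pi * rps * t)
    y = radius * np.sin(2 * np.pi * rps * t)
    d = np.hypot(x - 2.0, y)

    # Reading the source through the time-varying propagation delay produces
    # the Doppler shift automatically; dividing by d adds distance attenuation.
    idx = (t - d / c) * sr
    out = np.interp(idx, np.arange(t.size), src) / d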
Six distinct spatial movements are used in Kontakte, namely rotation, looping,
alternation, a static distribution with a duplicated source, a static distribution with
different sources, and single static sources. These spatial motifs are implemented at
different speeds and directions and interact with each other, and with the static
instrumental performers. This series of spatial movements is treated in the same way
as pitch and rhythm using a system based on the change in speed and angular
direction. In fact, Stockhausen equates the spatial movements with rhythm, arguing
that changes in spatial location can articulate durations in exactly the same way [Cott,
1973]. The interaction between the instruments and the electronics is one of the most
important aspects of this work and, as in Gesang der Jünglinge, Stockhausen
effectively positions the concrete sounds, now a piano and percussion instead of a boy
soprano, within the electronic texture. Dramatic crescendos in the instruments seem
to trigger electronic textures while at other times the pitch and rhythmic motion of the
instrumental parts reflect the spatial motion of the electronic sounds.
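The pitch-to-rhythm transformation described above follows directly from Stockhausen's unity of musical time, and is easily reproduced. The sketch below generates an impulse train whose repetition rate glides exponentially from 800 impulses per second, heard as a pitch, down to 2 per second, heard as a pulse; the endpoint values are chosen purely for illustration.

    import numpy as np

    sr, dur = 44100, 8.0
    t = np.arange(int(sr * dur)) / sr

    # Repetition rate glides exponentially from 800 impulses per second
    # (heard as pitch) down to 2 per second (heard as rhythm).
    rate = 800.0 * (2.0 / 800.0) ** (t / dur)
    phase = np.cumsum(rate) / sr                     # integrated rate (cycles)
    clicks = np.diff(np.floor(phase), prepend=0.0)   # one impulse per cycle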
One of the high points of avant-garde electronic music was undoubtedly the
extravagant international expositions of the late 1950s and 60s which featured
multimedia art works, music, and performances by prominent composers such
as Karlheinz Stockhausen, Edgard Varèse and Iannis Xenakis. An elaborate
multimedia environment was commissioned by Philips for the 1958 Brussels World's
Fair to showcase their new advances in audio and visual technology. The pavilion
(see Figure 8.9) was designed by the architect and composer Iannis Xenakis in the
form of an electronic poem [Zvonar, 2004]. It contained within its unusual
hyperbolic paraboloid structure a long unbroken projection surface with elaborate
lighting and projection equipment and an eleven-channel multi-track tape system
which could be routed to over four hundred loudspeakers. The main musical element
of the program consisted of Edgard Varèse's tape piece Poème Électronique, which
was synchronized to the visual effects and dynamically distributed through the space
in nine different "sound routes" using a switching system controlled with an
additional tape track.
The architect and designer of the Philips Pavilion, Iannis Xenakis, was also a
composer and his tape piece Concrte PH was played as an interlude between shows.
Xenakis had moved to Paris from his native Greece in 1947 and took a position as an
assistant to the architect Le Corbusier. In his spare time, Xenakis studied composition
and produced his first major work, Metastasis, in 1954 after studying with Olivier
Messiaen. An architectural influence is evident in the score for this work which
features massed string glissandi in which each individually notated instrument is
arranged according to the large scale, deterministic structure (see Figure 8.10). In a
1954 article, Xenakis describes this approach as follows: "the sonorities of the
orchestra are building materials, like brick, stone and wood… the subtle structures of
orchestral sound masses represent a reality that promises much" [Hoffman, 2001].
Xenakis went on to use the structural design of Metastasis as the basis for the wall
Fig. 8.9 The Philips Pavilion at the 1958 World's Fair in Brussels
for multi-channel tape was composed for the 1970 World's Fair in Osaka and was
performed in the Japanese pavilion through eight hundred loudspeakers arranged
above, around and under the audience. Stockhausen was also present at Osaka 70 and
his music was performed twice a day for 183 days in the German Pavilion by twenty
instrumentalists and singers (see Figure 8.12). The composer himself controlled the
sound projection from a position in the centre of the spherical venue which contained
fifty-five loudspeakers arranged in seven rings, and six small balconies for the
musicians. The design of the auditorium was largely based on Stockhausens
specifications, which are clearly expressed in the quote at the beginning of this
section. As with his earlier works, Stockhausen again used spatial trajectories to
articulate different musical events. However, the spatial distribution was now
implemented in real-time. This approach is somewhat similar to the live spatial
diffusion practised by Pierre Schaeffer, albeit with a greater emphasis on geometrical
paths. Stockhausen describes the experience as follows:
I can create with my hand up to six or seven revolutions a second. And you can
draw a polyphony of two different movements, or let one layer of sound stay at the left
side, then slowly move up to the centre of the room, and then all of a sudden another
layer of sound will start revolving like mad around you in a diagonal circle. And the
third spatially polyphonic layer will just be an alteration between front-right and
back-left, or below me and above me, above me and below me, alternating. This
polyphony of spatial movements and the speed of the sound become as important as
the pitch of the sound, the duration of the sound, or the timbre of the sound. [Cott,
1973].
changes color, because different distances occur between the sound sources."
[Felder, 1977]
and he criticized Gruppen as not really being spatial because "all the orchestras have
brass, woodwinds, and percussion, so the direction and the tone quality cannot
indicate the source of the material" [Harley, 1997]. However, this criticism is not
entirely justified as Stockhausen is clearly using the spatial distribution of the
musicians to articulate different layers of material, albeit with similar instrumentation.
While the delineation of spatial locations with different timbres may be necessary
when many disparate layers of material are operating concurrently, such as in the
music of Henry Brant, this is not necessarily required when each group produces
intermittent, rhythmically and harmonically distinct passages of music, as in the
moment-to-moment form typically employed by Stockhausen.
One of the most contentious aspects of Stockhausen's treatise on spatial music
was his proposal for the serialization of direction based upon the division of a circle
into a series of proportions. Critics argue that the absolute directions specified on
paper will not be perceived by the audience [Harrison, 1999; Harley, 1998a], and the
tests discussed earlier in this thesis would seem to support this view. The results of
these tests illustrate the significant variations in perceived source direction which
occur with electronic spatialization systems and suggest that a reliable and absolute
perception of direction is difficult to achieve electronically. This problem is
exacerbated in a performance setting when the audience is seated at different locations
within the loudspeaker array. The apparent contradiction between the compositional
design and the perceptible results in the music of Stockhausen, and other composers,
is not only evident in their use of space, but also in other areas, as illustrated by the
quote at the beginning of this section. Some composers questioned whether the equal
division of parameters such as pitch, motion, duration and form on paper, translates to
an equal division in the perception of the listener [Bernard, 1987]. The twelve-tone
music of the Second Viennese School, which preceded serialism, had primarily focussed on a
serialized pitch row which was, in theory at least, evident in the resulting pointillistic
texture. By 1955, however, Stockhausen had begun to use these serialist procedures
to control the overall character of large groups of material, in what he referred to as
"group composition". In this aesthetic, it is questionable whether the audience is
intended to perceive the absolute and tightly controlled internal relationships within
the composition. Instead perhaps, it is the overall result of these procedures which is
important. Serialism began as a negation of tonality, and serialist procedures serve to
eliminate any perceivable melody or theme, or any defined tempo or rhythm. These
145
purely negative goals could also be achieved by simply randomizing every parameter,
yet the controlled bursts of activity in Kontakte or Gruppen sound anything but
random. In effect, the serialist procedures eliminate any repetition, of pitches,
rhythms or indeed spatial movements, while also maintaining a definite coherence in
the audible material. In these works, direction is therefore divided into a series of
proportions in the same way as every other parameter, and it is the consistency of this
approach which produces the non-repeating, yet entirely coherent material in these
works. The precisely specified spatial locations presented in Musik im Raum are
therefore perhaps not intended to be perceived in an absolute sense, but rather as a
way to remove any recognizable or re-occurring spatial motifs. Stockhausen therefore
uses space to create a sense of perspective in the listener which is not in fact fixed, as
in the classic concert listening experience, but varying. This approach ties in with the
composers overall aesthetic in which the music is similarly removed from a definite
tonal, harmonic or rhythmical centre [Cott, 1973].
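The principle of dividing the circle into a series of proportions can be sketched in a few lines; the ratio series below is invented for illustration and does not reproduce Stockhausen's actual scale of directions from Musik im Raum.

    # A hypothetical series of proportions dividing the circle into successive
    # directions; the ratios are invented and the resulting angular intervals,
    # like the serial intervals of pitch or duration, never repeat.
    ratios = [1, 4, 2, 5, 3, 6]
    total = sum(ratios)
    angle, directions = 0.0, []
    for r in ratios:
        angle = (angle + 360.0 * r / total) % 360.0
        directions.append(round(angle, 1))
    print(directions)   # non-repeating angular steps around the audience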
Stockhausen's use of space, particularly in his early career, suggests that the
composer was well aware of the perceptual issues with his approach. Musik im
Raum contains a detailed and relatively accurate assessment of the perception of
auditory distance and also suggests that a perceptually appropriate scale of directions
should be determined by listening tests. In a later interview, Stockhausen clearly
recognizes the importance of the source material in terms of its perceived spatial
movement:
I can say in general that the sharper the sound and the higher the frequency, the
better it moves and the clearer its direction. But I'd also say that the more a sound is
chopped, the better it moves in space. And the sharper the attack of each segment of
a sound event, the better it moves. Whereas a continuous or harmonious sound or a
spectrum of vowels doesn't move very well." [Cott, 1973]
which Stockhausen was no stranger) which concentrate solely on the serial control of
direction discussed in Musik im Raum. Consider the following statement:
"Stockhausen dismisses the use of space by Gabrieli, Berlioz, and Mahler as being
too theatrical, and argues instead that direction is the only spatial feature worthy of
compositional attention because it could be serialized." [Harley, 1998a]
Although Stockhausen does state that direction is the only spatial parameter suitable
for serialization, he clearly does not consider this to be the only spatial feature
"worthy of compositional attention" [Harley, 1998a]. As discussed previously in this
Chapter, distance is used extensively in Kontakte and Gesang der Jünglinge but, as
this parameter is highly subjective, and dependent on multiple parameters such as
amplitude, timbre and the nature of the source signal, it is not suitable for serial
control.
Stockhausens acoustic spatial music also displayed an awareness of the
limitations of the medium. His use of spatial movement in Gruppen was one of the
first attempts to replicate stereophony using acoustic instruments and is particularly
successful in this regard due to the relative restraint displayed by the composer.
Spatial movements are implemented, often in isolation, as short, distinct gestures, and
this helps to clarify the perceived movement. A more complex spatial movement is in
fact implemented only once in the dramatic and famous passage shown in Figure 8.4
where a single travelling chord rotates around the distributed brass instruments. The
results of experiments presented in Chapter Six indicate that the presence of an
additional distracting stimulus reduced the localization accuracy for the primary
source [Blauert, 1997]. This suggests that when these spatial movements are isolated
in this way, they are much more likely to be clearly perceived by the audience.
However, if multiple complex trajectories are occurring simultaneously, it will be
much harder for the listener to determine the precise trajectory of each source. In
addition, the fragility of movements created using stereophonic processes has been
clearly demonstrated and this too imposes perceptual limitations on the complexity of
the spatialization process. It would therefore appear to be quite difficult to justify the
elaborate and visually-orientated spatial schemes implemented by composers such as
Varèse and Xenakis. Xenakis appeared to reach the same conclusion much later in
his career, as evident in the following comment made by the composer in 1992.
In reality, sound movements are usually more complex and depend on the
architecture of the performance space, the position of the speakers and many other
things. When you want to reproduce such a complicated phenomenon with live
musicians playing one after another with amplitude changing in the same way that
you change the levels in a stereo sound projection, sometimes it will work and
sometimes it will not. It depends on the speed of the sound as well as on the angle of
two loudspeakers or musicians, that is, on the relative position of the listener. These
two considerations are equally important.
(Xenakis, Music, Space and Spatialization, 1992: 67)
The approach developed by Henry and Bayle was adopted by other composers
and institutions such as the Groupe de Musique de Bourges in France. The
Birmingham ElectroAcoustic Sound Theatre (BEAST) at the University of
Birmingham is another important centre for research in this area. Although these
groups differ in terms of the precise technical setup and loudspeaker layout, they use
many similar working methods and techniques. The source material is generally a
stereo CD, and is often the commercial CD release of the piece. The stereo track is
routed to a special mixing desk which allows the diffusion engineer to control the
routing of the stereo track to different loudspeaker pairs. Often this will be a
commercial desk, modified so that it takes a stereo signal as the input, and
each individual fader channel feeds a different loudspeaker pair.
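The routing principle of such a desk can be expressed compactly. The sketch below sends one stereo signal to four loudspeaker pairs through independent fader gains; the pair names and gain values are hypothetical, and in performance the faders would of course move continuously rather than remain fixed.

    import numpy as np

    # Hypothetical fader settings: one fader per loudspeaker pair.
    pairs = ["main", "wide", "distant", "rear"]
    faders = {"main": 1.0, "wide": 0.5, "distant": 0.0, "rear": 0.2}

    def diffuse(stereo):
        """stereo: (n, 2) block; returns (n, 8) loudspeaker feeds."""
        return np.concatenate([stereo * faders[p] for p in pairs], axis=1)

    block = np.random.default_rng(0).standard_normal((1024, 2))
    feeds = diffuse(block)   # channels 0-1 main L/R, 2-3 wide L/R, and so on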
The diffusion process is as much concerned with adapting the material for the
particular performance space as it is with the spatial articulation of the material. For
this reason, the loudspeaker pairs are often specifically arranged in an attempt to
preserve the stereophonic image for as much of the audience as possible. Jonty
Harrison, who works at the University of Birmingham, illustrates this approach using
the two layouts shown in Figure 8.15 [Harrison, 1999]. In a normal two-channel
stereo system, only listeners seated in the sweet spot (point (a) in Figure 8.15 (left))
will perceive the stereo image correctly. At point (b), the stereo image will collapse
to the left speaker due to the precedence effect, while at point (c) listeners will
experience a significant hole-in-the-middle effect due to the wide loudspeaker angle at
this location close to the loudspeakers. Meanwhile a distant listener at point (d) will
perceive a drastically narrowed image. Diffusion systems attempt to overcome these
problems through the introduction of additional pairs of loudspeakers and a typical
layout is shown in Figure 8.15 (right). In this arrangement, the main stereo pair has
been narrowed to reduce the hole-in-the-middle effect for listeners close to the stage.
This is supported by another pair of similar loudspeakers positioned at a wider angle
which can be used to increase the image width as necessary. Additional distance
effects are supported through the use of a distant loudspeaker pair, positioned at the
back of the stage and angled across the stage. Finally, a rear pair is added so that the
stereo image can be extended out from the stage and around the audience. This group
of loudspeakers is described as the "main eight" in the BEAST system and this layout
is described by Harrison as the absolute minimum arrangement required for the
playback of stereo tapes [Harrison, 1999]. More loudspeakers are often added to the
main eight to further extend the capabilities of the system. For example, additional
side-fill loudspeakers are often added to facilitate the creation of smoother movements
from front to back. Various elaborate systems of this kind have been developed
by these and other institutions.
Diffusion has traditionally been controlled manually but some work has been carried
out with automated processes and digital spatialization techniques [Truax, 1999;
Moore, 1983]. Harrison suggests that this approach reflects the primarily manual
processes used in the studio by composers in the early days of Musique Concrète
[Harrison, 1999]. Others, such as Smalley, have compared the spatial gestures produced
by the diffusion engineer to the physical gestures of traditional instrumental
performers [Smalley, 2007]. However, although the diffusion process clearly adds
something to the performance, the diffusionist can only emphasize the pre-existing
material in the work. Clearly then, for a successful performance of a diffusion piece,
the composer must organize the material in a way that supports its eventual
performance. Denis Smalley's theory of spectromorphology details how a gestural
approach can be used as a structuring principle in electroacoustic, and particularly
acousmatic, music.
acoustic of the listening room. He therefore divides the perceived space into the
composed space (which contains the spatial cues created by the composer), and the
listening space (the space in which the composed space is heard), as shown in Figure
8.17. The composed space consists of both the internal space of the sounding object,
such as the resonances of a struck enclosure, and the external space of the
environment containing the sound object, which is made apparent through reflections
and reverberation. The listening space is also subdivided into the personal space,
which relates to the precise position of the listener within the venue, and the diffused
space created by the various loudspeakers distributed around the venue. Smalley
suggests that, in his experience, the perception of a number of parameters will be
different when the work is transferred from a single listener in a studio to a large
performance space [Smalley, 1997]. Spatial depth which is captured using stereo
recording techniques or synthesized using artificial reverberation can easily create
images which appear to originate from outside the array for a single listener.
However, in a larger space these images may instead become superimposed within the
space, rather than beyond the physical loudspeaker positions. Similarly, creating a
sense of spatial intimacy [Smalley et al, 2000] becomes much more difficult as the
size of the listening area increases. The spatial variants suggested by Smalley are
shown in Figure 8.17 [Smalley, 1997].
and technical formats. Spectromorphology can be used to relate and structure a wide
variety of sounds as either gesture or texture, performance or environmental. Human
activity, whether in the form of traditional instrumental performance or the
manipulation of non-traditional instruments (such as the garden pots in Empty
Vessels), can be related to synthesized sounds via the shaping of dynamic spatial and
spectromorphological gestures. In addition, environmental recordings can be related
to continuous synthesized or processed sounds without accented attack and decay
envelopes. Soundscape composition [Truax, 2008] is a compositional aesthetic which
utilizes the latter approach and is predominantly based upon environmental sounds
and textures. Luc Ferrari's Presque Rien No. 1 (1970), which consists solely of
layered recordings from a day at the beach with a minimum level of manipulation of
the material, is a well known example of this style of composition. The Vancouver
Soundscape (1970) by the World Soundscape Project similarly consists of direct
reproductions of natural soundscapes with a minimum level of human intervention.
Since the seventies, soundscape composition has developed beyond this minimalist
approach to include digital synthesis and multichannel techniques and the Canadian
composer Barry Truax is one of the chief exponents of this aesthetic. His early work
used granular synthesis and a quadraphonic speaker system to create highly textural
works with a strong environmental character. In later works such as Pacific (1990),
or Basilica (1992), Truax used granulated environmental recordings with an
octophonic, eight-channel array and multiple decorrelated granular streams to create
an immersive sonic environment. Truax has argued that avoiding the
representational meaning of environmental sounds is difficult, stating that
"environmental sound acquires its meaning both in terms of its own properties and in
terms of its relation to context" [Truax, 1996]. Despite this difficulty, environmental
recordings have also been used in a more symbolic fashion, in which the different
recorded sounds and spaces are used to construct a kind of narrative. Various composers
and theorists have explored this compositional aesthetic and suggested various
symbolic interpretations of different spaces and movements [Trochimczyk, 2001;
Wishart, 1985].
The gestural use of space suggested by Denis Smalley in his theory of
spectromorphology originated in the acousmatic tradition of stereo diffusion, but this
idea is equally applicable to compositions for multi-channel loudspeaker arrays, or
mixed-media electroacoustic works. The notion of gestural shaping also suggests an
obvious approach to works which combine acoustic and spatialized electronic sounds
as the spectromorphological profile of the synthetic sounds can be deliberately
designed to match or mimic the instrumental gestures of the performers.
sounds within the overall serialist structure, as discussed in detail in Chapter Eight. In
addition, various points of contact between the static instrumental sounds and
dynamic electronic textures are created through crescendos in one part which provoke
a response from the other. Harrison suggests that the unusual loudspeaker
arrangement also contributes to the success of this piece. As the four loudspeakers are
placed in a cruciform arrangement rather than in the corners of the space, three of the
four loudspeakers and the live instruments (piano and percussion) are in front of or to
the sides of the listeners, with only one loudspeaker behind [Harrison, 1999]. The
instrumental sounds and most of the electronic parts are therefore produced largely
from in front of the audience, which helps to unify the two parts into an integrated
whole.
Although Kontakte successfully integrates the static instrumental performers
and the dynamic spatialized electronic parts within the work as a whole, in much of
the piece the static spatial locations of the performers are in fact deliberately
contrasted with the dynamic spatial trajectories in the electronic part. The various
points of contact between the two spaces are then heightened for dramatic effect in
much the same way as other elements in this piece, such as pitch and rhythm. The
composer himself wrote that in works which combine electronic and instrumental
music, there remains the problem of "finding the larger laws of connection beyond
that of contrast, which represents the most primitive kind of form" [Stockhausen,
1975a].
In electroacoustic works such as Kontakte, the performers must perform while
listening to a click track in order to synchronize with the tape part. However, some
composers were uncomfortable with the rigidity of this arrangement, and the French
composer Pierre Boulez was a particularly vocal critic, stating that "give and take
with the tempo of a piece is one of the basic features of music". It is perhaps
unsurprising that Boulez was so attuned to this particular limitation considering his
career as a conductor, and he would go on to compose for one of the earliest systems
for the real-time processing and spatialization of sounds, an approach which would
capture "all the spontaneity of public performance, along with its human
imperfections" [Boulez et al, 1988].
Faced with these difficulties, Boulez abandoned his work with electronic sounds and
it would be over twenty years before he would return to this area. His
opportunity to do so came in 1970 when he was asked by the French president
Georges Pompidou to create and head a new institute for musical research and
composition, the Institut de Recherche et Coordination Acoustique/Musique
(IRCAM). The institute opened in Paris in 1977 and, along with the Centre for
Computer Research in Music and Acoustics (CCRMA) at Stanford, IRCAM quickly
became one of the most important centres for research in digital music technology.
assisted by Andrew Gerzso in the design of software patches for the 4X processor and
several technicians were required during the performance to manage the various
devices. Boulez positioned the instruments and loudspeakers in Répons using a very
similar layout to that of the earlier work, Poésie pour Pouvoir (see Figure 9.2).
According to the composer, the un-amplified and unprocessed orchestra is situated in
the centre of the auditorium in order to provide a clear visual focus for the audience
[Boulez et al, 1988]. The audience is therefore seated in-the-round, surrounding the
orchestra, and within the circle formed by the six soloists and the loudspeaker array.
The title of this piece comes from a medieval French term for the antiphonal choral
music discussed earlier in this thesis. This call-and-response process is implemented
throughout Répons and the dialogue between the soloists and the main orchestra is
particularly reminiscent of medieval choral antiphony. Boulez relates this process of
multiple responses (the orchestra) to a single question (the soloist) to the notion of the
multiplication and the proliferation of sounds. This idea is also implemented
electronically as various processes multiply single instrumental notes into a multitude
of notes, chords or timbres, all of which are related to the original in a clear way.
After an opening movement performed solely by the central orchestra, the six
soloists enter dramatically with different but concurrent, short arpeggios. These
instrumental arpeggios are captured by the 4X system and circulated around the
loudspeaker array in the trajectories shown as coloured arrows in Figure 9.2. The
composer describes the effect of this passage as follows:
The attention of the audience is thereby suddenly turned away from the centre of the
hall to the perimeter, where the soloists and the speakers are. The audience hears the
soloist's sounds travelling around the hall without being able to distinguish the paths
followed by the individual sounds. The overall effect highlights the antiphonal
relation between the central group and the soloists by making the audience aware of
the spatial dimensions that separate the ensemble from the soloists and that separate
the individual soloists as well. [Boulez et al, 1988]
The various spatial trajectories are therefore not intended to be perceived directly but
are instead used to distinguish the different electronic gestures associated with each of
the six soloists. This effect is further exaggerated through the use of different
velocities for each spatial trajectory, which are determined by the amplitude envelope
of the original instrumental gesture. Each instrument has a similar amplitude
envelope (see Figure 9.3), consisting of a sharp attack and a slowly decreasing decay;
however, the duration of the decay depends both on the pitch of the note and on the
instrument on which it is played [Boulez et al, 1988]. Therefore, the velocity of each
trajectory decreases proportionally to the decay of the associated instrumental
passage, linking the instrumental gesture of the soloist to the spatial gesture of the
electronic part.
loudspeaker, slowing the spatial trajectory. This process is illustrated in Figure 9.4 for a
system with four loudspeakers, and hence four flip-flop units.
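The behaviour described above can be approximated in a few lines. The following Python sketch is a schematic model only, not a reconstruction of the 4X patch: the signal hops around a ring of four loudspeakers at a rate proportional to the instantaneous amplitude envelope, so that the trajectory slows as the instrumental gesture decays (the base rate of eight hops per second is a hypothetical value, chosen for illustration).

    import numpy as np

    def switching_trajectory(envelope, sr, base_rate=8.0):
        # Accumulate "hops" at a rate proportional to the envelope level;
        # the integer part of the accumulator selects the active loudspeaker,
        # wrapping around the ring of four.
        phase = np.cumsum(envelope * base_rate / sr)
        return np.floor(phase).astype(int) % 4

    sr = 1000                                  # control rate in Hz
    t = np.arange(0, 5, 1 / sr)
    env = np.exp(-t)                           # sharp attack, slow decay
    active = switching_trajectory(env, sr)     # speaker index per control frame

As the envelope decays toward zero, successive hops take longer and longer, reproducing the link between the instrumental decay and the velocity of the spatial gesture.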
The themes of multiplication and displacement are further developed through the
recording, processing and playback by the 4X processor of arpeggiated chords
performed by each soloist. Boulez developed the idea of an arpeggio as a
displacement in time and pitch which may be applied, not just to the notes of a chord,
but also to the spreading of the electronic material in pitch, time and space. Boulez
describes this process as an arpeggio (the multiple recorded and processed copies
created by the 4X) of an arpeggio (the sequentially played notes of the instrumental
chord) of an arpeggio (the individually cued soloists) [Boulez et al, 1988]. The
speaker switching mechanism could also be interpreted as a form of spatial arpeggio
and this is perhaps another reason why this switching mechanism was used instead of
a standard panning algorithm.
As with many large-scale works of spatial music, Répons has been performed
relatively infrequently. As always, finding a venue which can accommodate the
various distributed musicians and loudspeakers is difficult, and the dependence of this
piece on highly specific hardware is an additional limitation in this case.
"The same factors are present, but they are considered differently because the focus
of a live performance visually and musically - is the performer. So, I don't want to
use as full a diffusion system as I do for tape pieces, because overdoing the diffusion
will tend to undermine the carefully considered musical relationships between the live
performer and the content of the acousmatic domain. - If you have more than one
performer, there are different considerations, because the visual and sonic weight in
front, on stage, is increased. Therefore, I think that there is more for the eye to focus
on and follow, and the acousmatic possibilities become reduced. For example, one
can't have a lot of visual silences on stage, where people are sitting doing nothing. "
[Smalley et al, 2000]
resonant filters are used to create a variety of sustained drones, including a heavily
distorted and sustained drone taken untreated from the original performance. The
various drones represent the most abstract material, and hence remote surrogacy.
10.1.1 Analysis
The title of this piece is a reference to the modern, chaos-based religion
founded in either 1958 or 1959 which has been described as both an elaborate joke
disguised as a religion, and as a religion disguised as an elaborate joke. The number
five is highly important in this parody religion and Discordianism is structured
accordingly. It is in 5/4 time, it is composed of five twenty-five-bar sections, and it is
approximately five minutes long. It also uses elements of chaos theory in its
construction, specifically in the granular synthesis algorithm, which incorporates the
classic population-growth model. This chaotic process, combined with the
preciso e meccanico rhythmical activity, is intended to reflect the irreverent
philosophy referred to in the title.
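The classic population-growth model referred to here is the logistic map. The exact mapping used in the piece is not detailed above, so the following Python sketch simply illustrates how such a chaotic sequence might drive grain density from bar to bar; the scaling to a range of 5-100 grains per second is hypothetical.

    def logistic(x, r=3.9):
        # One step of the logistic map x' = r * x * (1 - x), the classic
        # population-growth model; for r near 4 the sequence is chaotic.
        return r * x * (1.0 - x)

    # Hypothetical mapping of the chaotic sequence onto grain density.
    x = 0.5
    for bar in range(1, 26):
        x = logistic(x)
        grains_per_second = 5 + int(x * 95)    # map (0, 1) to 5-100
        print(bar, grains_per_second)

Small changes to the initial value or to r produce entirely different density sequences, which suits the irregular, bubbling character of the granulated texture described below.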
The opening section of this work is marked by repeated sequences (in
multiples of five) of a plucked guitar harmonic which cycle around the four channels
and build in dynamic to a sudden decrescendo. This first-order surrogacy gesture
interacts with the other material in various ways, sometimes triggering a short burst of
feedback, sometimes affecting the activity or density of the bubbling, spatialized
granular texture operating in the background, or sometimes provoking simply
silence. Gradually, the density and activity of the granular texture increases until at
the one minute mark, a burst of feedback triggers a heavy, distorted feedback drone in
both front channels. The sustained nature of this drone extends the second-order
surrogacy even further, although occasional human gestural activity from the original
performance remains. The short feedback bursts now function in a structural sense,
marking out five-bar intervals and triggering the introduction of new material such as
rapid granulated tremoloed notes and waves of white noise produced by resonant
filters. These additional drones slowly drift around the four channels, in contrast to
the static distorted drone locked to the front channels. The dynamic and density of the
material steadily increases before a brief decrescendo which is immediately followed
by a sudden, highly dynamic reintroduction of all the material, triggered again by a
short burst of feedback.
This highly dramatic point signals the start of the third and final section of this
work and the gradual introduction of four layers of percussive material proceeding at
different tempi from the four corners of the array. It also forcefully reintroduces both
first-order and remote surrogacy and finally confirms the connection between them in
the form of spatially distributed layers of clearly instrumental material, operating at
different tempi, and a granulated version of the distorted drone from the second
section. This new, heavily distorted drone now moves dynamically in space, with
irregular lateral movements created by the stereo spreading of the granulation process
and a rapid back-and-forth movement between the front and rear channels using
amplitude panning. The final section steadily increases in dynamic and intensity as
more layers of rhythmic and granulated material are added, building to a sudden
crescendo and rapid cut-off, before a brief, swinging spatial movement of the
distorted drone which is finally cut off by one last burst of feedback.
10.2 Case Study II: Sea Swell (for four SATB choirs)
The initial inspiration for this work came from some experiments carried out
by the author with granulated white noise. When a very large grain duration was
employed the granulation algorithm produced a rolling noise texture which rose and
fell in a somewhat irregular fashion and was highly reminiscent of the sound of
breaking waves. This noise texture was originally intended to accompany the choir;
however, in practice this proved to be unnecessary as the choir could easily produce
hissing noise textures which achieved the same effect.
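The original noise texture is easily approximated. The following Python sketch (all parameter values are illustrative) granulates white noise with very long, overlapping Hanning-windowed grains placed at irregular onsets, producing the rolling, wave-like swells described above:

    import numpy as np

    def noise_swell(sr=44100, duration=10.0, grain_dur=1.5, n_grains=40):
        # Granulate white noise with very long, overlapping Hanning-windowed
        # grains at irregular onsets; the summed result rises and falls in
        # a somewhat irregular fashion, like breaking waves.
        rng = np.random.default_rng(0)
        out = np.zeros(int(sr * duration))
        glen = int(sr * grain_dur)
        window = np.hanning(glen)
        for _ in range(n_grains):
            onset = rng.integers(0, len(out) - glen)
            out[onset:onset + glen] += window * rng.standard_normal(glen)
        return out / np.max(np.abs(out))

    texture = noise_swell()

With grain durations this large, the individual grains are heard not as a texture of discrete events but as overlapping swells, which is what suggested the sound of the sea.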
The choir is divided into four SATB groups positioned symmetrically around
the audience and facing a central conductor. The piece utilizes many of Brant's ideas,
albeit with a much less atonal harmonic language. In particular, spatial separation is
used to clarify and distinguish individual lines within the extremely dense, sixteen-part
harmony, which is shown in Figure 10.1. If this chord were produced from a
single spatial location, it would collapse into a dense and largely static timbre.
However, this problem is significantly reduced when the individual voices are
spatially distributed. Certain individual voices are further highlighted through offset
rhythmic pulses and sustained notes, as shown in the score excerpt in Figure 10.2.
Fig. 10.3 Sea Swell notation example for opening sibilant section
10.2.1 Analysis
Brant used the term "spill" to describe the effect of overlapping material
produced by spatially distributed groups of musicians. However, Brant was also
aware of the perceptual limitations of this effect, and he noted that precise spatial
trajectories become harder to determine as the complexity and density of the spatial
scene increase. The opening of this piece will contain a significant degree of spill
due to the common tonality of each part, and will thus be perceived as a very large and
diffuse texture. Large numbers of these sibilant phrases are overlapped in an irregular
fashion to deliberately disguise the individual voices and create a very large, spatially
complex timbre of rolling noise-like waves. This opening section can be seen in full
in Figure 10.4.
The loose ABA structure of this work ends with a shortened refrain of the
opening section discussed earlier. The B section consists of a lyrical and highly
melodic progression of layered canons. The lyrics consist of some fragments of
poetry by Pablo Neruda as well as some original lines by the author. Three different
lines are sung in canon between the sopranos, altos and tenors in a form of spatial and
melodic counterpoint, while the basses reintroduce the sibilant sounds from the
introduction.
10.3.1 Analysis
The primary concept behind this piece is that of an endless ascent, inspired by the
optical paradox of the Penrose stairs (see Figure 10.9) or its auditory equivalent, the
Shepard tone. This concept is implemented using an ascending dominant seventh
scale on G, which is equivalent to the Mixolydian mode. This mode uses the same
series of intervals as the major scale (C major in this case), except the fifth is taken as
the tonic. Although this rather straightforward progression is the basis of the entire
work, it is generally disguised and distorted in different ways. At the beginning of the
piece, the cello begins with the tonic G, the first note in the progression, however, the
other three parts respectively start on the fifth, sixth and seventh notes in the series.
Similarly, different rhythmic groupings are used for each part and this is particularly
evident in the opening ascent which is shown in Figure 10.10. The cello holds each
note for twelve crotchets, the viola for ten, the second violin for eight and the first
violin for six, and in addition, the higher three parts begin after a single crotchet rest,
in order to reduce the number of times more than one part changes note
simultaneously and disguise the exact harmonic root of the progression.
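The effect of these rhythmic groupings can be verified directly. The following Python sketch lists the beats at which more than one part changes note, given the cycle lengths and the one-crotchet offset described above:

    # Beats (in crotchets) at which each part changes note in the opening
    # ascent: the cello changes every 12 crotchets from beat 0, the upper
    # parts every 10, 8 and 6 crotchets after a one-crotchet rest.
    cycles = {"cello": (12, 0), "viola": (10, 1),
              "violin II": (8, 1), "violin I": (6, 1)}
    length = 120

    changes = {t: [] for t in range(length)}
    for part, (period, offset) in cycles.items():
        for t in range(offset, length, period):
            changes[t].append(part)

    # Beats where more than one part moves at once are comparatively rare,
    # which disguises the harmonic root of the progression.
    print({t: parts for t, parts in changes.items() if len(parts) > 1})

Apart from the shared entry on the second crotchet, simultaneous note changes occur only occasionally, so the underlying dominant seventh progression is rarely stated by more than one voice at a time.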
grain durations, the original timbre is much more apparent, and this approach was
adopted here. As a result, the granulated drone has a similar timbre to the original
recordings, and therefore to the acoustic quartet. In addition, the bowing action which
is clearly audible in the original recordings is also apparent in the granulated drone.
Using Smalley's terminology, the drone retains aspects of the spectromorphological
profile of the original instrumental gestures, which will in turn bond with the live
instrumental gestures of the quartet. Multiple layers of these drones were finally
superimposed to create a very dense drone which nonetheless retains the
spectromorphology of the original instrumental gesture.
The large scale structure of this piece is delineated by a series of six
crescendos, which is clearly illustrated in Figure 10.11. At first, the electronic part is
positioned to the front, with the quartet, and is harmonically restricted to the same
dominant seventh scale. As the dynamic increases, the electronic drone becomes
increasingly chromatic as more notes not present in the original scale are introduced,
and by the first crescendo all twelve pitches are present in the electronic part. At this
point, the electronics move from the front to a wider, 90° spread. As the quartet begin
the following syncopated section (beginning at bar 45) the electronic part mimics the
short, stabbing rhythms of the instruments and extends them spatially. The
descending progression which ends at the third crescendo is reminiscent of the
opening ascent, but now the tone intervals of the original progression are filled,
resulting in a more chromatic descent by the quartet which reflects the corrupting
influence of the electronic part. The spatialization of the electronics matches the
descending progression of the quartet as it collapses from a wide spatial distribution
to a narrow, frontal position matching the position of the quartet.
The second section (beginning at bar 98) opens with a single, granulated cello
trill on the root G, diffused to the rear of the hall. This is the first time that the tonic is
clearly revealed; however, the exact harmonic root is still slightly uncertain due to the
slow and irregular glissandoing of the time-stretched trill. The quartet explores this
idea with a series of slow, descending dominant seventh progressions which are
continually interrupted by glissandos and the introduction of chromatic notes by the
quartet and the electronics. By the fourth, and lowest crescendo, the quartet has
resolved to a somewhat stable harmonic centre, but this quickly disintegrates into the
penultimate ascent which is primarily driven by a two and a half octave upward
glissando on the cello. The fifth and final ascent is essentially a refrain of the initial
progression with a significantly increased dynamic. The final eight bars again reveal
the slow glissando in the electronic part which is finally echoed and resolved by the
cello and viola.
10.13 (d)). The example in Figure 10.12 indicates that the azimuth should move from
approximately 10-20° to 90° over the course of ten bars.
The two channels can be spatialized directly from the patch to a multichannel
array using higher order Ambisonics, or alternatively, a stereo feed can be routed
directly from the patch into a mixing desk for manual diffusion to a loudspeaker
orchestra. If the spatialization is to be implemented using Ambisonics, then a MIDI
controller is also required to adjust the distance and two azimuth controls during the
180
performance. This controller (or the computer keyboard) is also used to trigger
playback of the various audio files.
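As a sketch of what the Ambisonic option involves, the following Python code encodes a mono signal into horizontal-only second-order components (W, X, Y, U, V), with the W channel scaled by 1/sqrt(2) in the traditional B-format manner. The azimuth argument may be a per-sample array, so a notated movement such as the 10-20° to 90° sweep mentioned above can be rendered as a simple ramp. This is a simplified stand-in for the actual patch; the decoding stage and the distance control are omitted.

    import numpy as np

    def encode_ho2(mono, azimuth_deg):
        # Horizontal-only second-order Ambisonic encoding (cylindrical
        # components W, X, Y, U, V); W carries the source scaled by
        # 1/sqrt(2), following the traditional B-format convention.
        az = np.radians(azimuth_deg)
        w = mono * np.ones_like(az) / np.sqrt(2.0)
        return np.stack([w,
                         mono * np.cos(az), mono * np.sin(az),
                         mono * np.cos(2 * az), mono * np.sin(2 * az)])

    sr = 44100
    sig = np.random.randn(5 * sr)                 # stand-in for one channel
    ramp = np.linspace(15.0, 90.0, sig.size)      # azimuth sweep over ten bars
    components = encode_ho2(sig, ramp)            # shape (5, samples)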
Fig. 10.12 Auto Harp source material, grouped according to order of surrogacy
(from second order to remote surrogacy): Strummed arpeggios (mono/stereo),
Picked Theme (mono), Franssen Intro/Outro (eight channel), Time-stretched
Arpeggio (eight channel), Programmed Theme (stereo), Triad Drone (eight channel),
Programmed Sequences (stereo/eight channel), Strummed arpeggios (B-format).
Table 8.1 Auto Harp string tuning: harmonic number (octave adjusted) and
resulting frequency for each of the twenty-four strings.

String No.   Harmonic   Freq (Hz)
 1            1         261.6
 2            1         261.6
 3            3         392.4
 4            7         457.8
 5            2         523.3
 6            9         294.3
 7            5         327.0
 8           11         359.7
 9            3         392.4
10            7         457.8
11            2         523.3
12            9         294.3
13            5         327.0
14           21         343.4
15            3         392.4
16            7         457.8
17           15         490.5
18           17         278.0
19           19         310.7
20            5         327.0
21           23         376.1
22            3         392.4
23            7         457.8
24            2         523.3
10.4.1 Analysis
The introductory section of Auto Harp uses the loudspeaker amplitude
envelope shown in Figure 10.7 in conjunction with twenty-four sine tones which
correspond to the twenty-four strings of the autoharp. Each string is tuned to a
different harmonic (octave adjusted) of the lowest note (C, 261.6 Hz), as shown in
Table 8.1. Each harmonic is adjusted so that it falls within a two-octave range. This
spectral tuning produces a complex resonance with very little beating due to the close
harmonic relationship between each string. Each sine tone onset is presented in
sequence from the four loudspeakers to the right, moving from front to back, and is
shifted to the matching loudspeaker on the left using a transition time t of 60 ms.
After all twenty-four sine tones have been introduced, reversed recordings of each
individual string are gradually introduced at the same real spatial locations as the
associated sine tone, and the entire sequence builds to a sharp crescendo. The overall
effect of this sequence is to delay and disguise the movement of the layered sine tones
from right to left, as the initial onsets are clearly localized to the right, while the final
reversed autoharp recordings clearly crescendo on the left side. Localization during
the middle steady-state section will therefore vary depending on the acoustical
environment, the individual listener and the associated strength of the Franssen effect,
introducing an element of variability in a composition for fixed media.
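The octave adjustment in Table 8.1 can be reproduced by folding each harmonic down until it lies no more than an octave above the fundamental; this folding rule is inferred from the tabulated values rather than documented explicitly. A minimal Python sketch:

    def octave_adjusted(harmonic, fundamental=261.626):
        # Fold a harmonic of middle C (261.626 Hz unrounded) down by octaves
        # until it lies no more than one octave above the fundamental; this
        # rule is inferred from the values in Table 8.1.
        f = harmonic * fundamental
        while f > 2 * fundamental:
            f /= 2.0
        return f

    harmonics = [1, 1, 3, 7, 2, 9, 5, 11, 3, 7, 2, 9, 5, 21, 3, 7,
                 15, 17, 19, 5, 23, 3, 7, 2]
    for string, h in enumerate(harmonics, start=1):
        print(string, h, round(octave_adjusted(h), 1))

Running this reproduces every frequency in the table, confirming the close harmonic relationship between the strings that produces the beat-free composite resonance.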
The overall structure of Auto Harp is marked on the waveform of a
monophonic reduction of the eight-channel work in Figure 10.8. The crescendo
which marks the end of the initial Franssen section is articulated with a spatial sweep
from front to back on the left side of the array. This is immediately responded to by a
sweep through each individual note in a corresponding spatial motion from back right
to front right. The entire Play section which follows uses this type of spatial dialogue
between static arpeggio recordings and dynamic, algorithmic sequences. In each case,
a recorded arpeggio (mono, stereo or b-format) produces an algorithmic response,
which is constructed from the individual string recordings, pitch shifted up an octave.
The spectromorphological profile of each recorded arpeggio was carefully applied to
the algorithmic response by matching the duration of each sequence. Karlheinz Essl's
Real-Time Composition Library for Max/MSP was used to create these algorithmic
responses [Essl, 1995]. The series object was used to create a random sweep through
all twenty-four notes, while the sneak object randomly selects notes but only chooses
notes adjacent to the currently selected note. This object was used to create the
distinctive and rapid sequences of notes which are prominent in this section.
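The behaviour of these two objects is easy to sketch. The following Python lines model the behaviour, not the implementation, of Essl's objects:

    import random

    random.seed(1)

    # 'series' behaviour: a random sweep through all 24 notes, no repetition.
    sweep = list(range(24))
    random.shuffle(sweep)

    # 'sneak' behaviour: a random walk that only ever steps to a note
    # adjacent to the one currently selected.
    note = random.randrange(24)
    walk = [note]
    for _ in range(29):
        note = min(max(note + random.choice([-1, 1]), 0), 23)
        walk.append(note)
    print(sweep, walk)

Because the sneak walk only moves stepwise through the tuning of Table 8.1, its output is heard as rapid scalar runs rather than as random leaps, which is what gives this section its distinctive character.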
The next section is introduced by a mono recording of the plucked sequence
which functions as the main melodic theme. The theme is presented four times in this
section, initially as the performed sequence in the original mono recording. This is
followed by a sequenced recreation of the theme using the individually recorded
strings which are spread across the front pair of loudspeakers. The final two iterations
of the theme use both the original progression, and various sequenced progressions
which are distributed across each of the four loudspeaker pairs. In this case, the
sequenced progressions are also transposed to harmonize with the initial progression.
A granulated time-stretched version of this progression is also introduced in this
section, shifting the focus away from the highly gestural material which has
dominated up to this point. The spatial distribution and dynamic of this drone alters
in response to the instrumental gestures and at times the drone itself displays strong
spatial gestures. This is evident in the crescendo that ends this section, and also at
348, when the diffuse drone builds and then suddenly collapses to a low-frequency
drone statically positioned in the front loudspeaker pair.
The Triad Drone section is the first to move away completely from the highly
gestural arpeggios and sequences which dominated the opening half of this piece.
The first harmonies which are not contained within the tuning system shown earlier
are also introduced in this section. Multiple stereo channels of a spectrally rich drone
were constructed from a recorded arpeggio which was time-stretched using the
granule opcode in the Csound processing language (Csound is discussed in more
detail in Section 12.3). Two of these decorrelated stereo channels quickly begin a
slow upward glissando before eventually settling at two pitches to form the triad
drone which gives this section its name. The two pitch-shifted drones were tuned by
ear to produce a relatively consonant harmony. The tuning of these two new drones
corresponds approximately to the fifth and ninth harmonics of the original root note.
The penultimate section is, as its name suggests, primarily constructed from a
time-stretched arpeggio created using a granulation algorithm in Max/MSP. The
resulting drone quickly separates into a dense cloud of clicks created from the initial
plucking action and a thick, mid-frequency drone created from the resonance of the
instrument. This spatially diffuse sequence decays slowly and then builds again to a
final crescendo which is punctuated by occasional unprocessed arpeggios. The piece
ends with a reversal of the initial Franssen section, as the drone fades into associated
sine tones which collapse spatially in the same way as they appeared.
layered to eventually create a large, enveloping drone which gradually overcomes the
live performer and closes the piece.
11.3.1 Analysis
In this etude for hexaphonic guitar the strings are consecutively routed in pairs
through three instances of Ambience, the freeware reverb VST plugin by
Smartelectronix (see Figure 11.5). Various plucked harmonics and intervals are
sustained using the reverb hold function which is triggered with a MIDI foot
controller. After each group of three harmonics/intervals has been layered, the
combined drone is then dynamically routed to a spatialization algorithm (second order
Ambisonics using the ICST externals for Max/MSP) which chaotically pans the drone
around the entire array. This entire process is then repeated three times with
increasing intensity, culminating in a very loud drone which dominates the
instrumental part. This process is highlighted during the final section as the performer
remains motionless while the vast and spatially dynamic drone continues, culminating
in a loud crescendo.
12 Behavioural Space
Modern computer technology has now advanced to such a degree that an
ordinary laptop computer and a suitable audio interface are capable of processing and
spatializing numerous channels of audio in real-time. These developments have
allowed composers to utilise more advanced algorithms to control the spatialization of
sounds. Flocking algorithms have proved to be particularly popular in this regard,
and conceptually the notion of flocking, or swarming, contains obvious connotations
in terms of spatial motion. Flocking algorithms such as Craig Reynolds' Boids
[Reynolds, 1987] model the overall movement of groups of autonomous agents, such
and animation. Recently, these algorithms have also been adapted for musical
applications including spatialization.
Granular synthesis and flocking algorithms share many characteristics. Both use large numbers of individual agents or grains,
both produce macro-level complexity through relatively simple micro-level structures,
and both are indirectly controlled through various micro and macro level parameters.
Flocking algorithms would therefore appear to be highly suitable as a control
mechanism for granular synthesis, and they also suggest an obvious spatialization
scheme in which each grain is positioned in space according to the location of a single
agent in the algorithmically generated flock.
These processes were combined so that each grain corresponds to a single agent in
the flocking algorithm. The flock of individual grainboids is then spatialized using
one of two spatialisation algorithms implemented with IRCAM's Le Spatialisateur.
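A minimal version of this coupling can be sketched as follows. The Python code below implements the three classic Boids rules (cohesion, alignment and separation, with illustrative weights that are not those used in the piece) for one flock of eight grainboids, and maps each agent's position to an azimuth for its grain; the attract point and the synthesis itself are omitted.

    import numpy as np

    def boids_step(pos, vel, dt=0.05, w_cohesion=0.01, w_align=0.05,
                   w_separate=0.2, min_dist=0.5):
        # One update of a minimal Boids flock [Reynolds, 1987]: steer toward
        # the flock centre, match the average velocity, avoid crowding.
        centre = pos.mean(axis=0)
        avg_vel = vel.mean(axis=0)
        for i in range(len(pos)):
            vel[i] += w_cohesion * (centre - pos[i]) + w_align * (avg_vel - vel[i])
            for j in range(len(pos)):
                d = pos[i] - pos[j]
                if i != j and np.linalg.norm(d) < min_dist:
                    vel[i] += w_separate * d      # separation: push apart
        pos = pos + vel * dt
        return pos, vel

    def grain_azimuths(pos):
        # Map each grainboid's x/y position to an azimuth for its grain.
        return np.degrees(np.arctan2(pos[:, 1], pos[:, 0]))

    rng = np.random.default_rng(3)
    pos = rng.standard_normal((8, 2)) * 2.0       # one flock of 8 grainboids
    vel = rng.standard_normal((8, 2)) * 0.1
    for _ in range(100):
        pos, vel = boids_step(pos, vel)
    print(grain_azimuths(pos))

Because the flocking rules operate at the micro level of the individual agent, the macro-level spatial behaviour of the grain cloud emerges indirectly, in the same way that the macro-level texture emerges from the individual grains.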
The source material used in this piece consists of noise/click samples and
pre-recorded tremoloed notes on viola and cello, each of which is associated with a
particular flock of eight grainboids using the Max/MSP patch shown at the start of
this section in Figure 12.3. While the individual spatial movement of each grainboid
is controlled by the flocking algorithm, the overall movement of each flock can be
adjusted by the user in a number of ways. Firstly, a bounding box can be created
which will contain the flock within a certain specified area. Alternatively, the
strength of the flock's attraction to a user-specified attract point can be used to direct
the flock to follow a particular point in space. In this piece, each flock follows a
slightly different overall trajectory, using the associated attract point with a medium
attraction strength setting so that the specified path is only loosely followed.
12.2.1 Analysis
The nature of the source signal has a large bearing on the effectiveness of the
perceived spatialization (see Chapter Two), and the source material for this piece was
chosen with that in mind. Localization accuracy has been found to improve when the
source signal contains plenty of onset transients, and when broadband signals are
used. The initial testing of this system was carried out using a noise/click sample for
this reason and this material was then also used in the final composition. Tremoloed
string samples were chosen as they provide pitched material which also contains lots
of irregular transient information due to the rapid bowing action. In addition, the fast
irregular bowing motion recorded in the original tremolo string samples is well
matched to the granulation process.
Two spatialisation methods were used in order to investigate the perceptual
difference between these two techniques, namely amplitude panning and Higher
Order Ambisonics. A subtle yet distinct difference was perceived between these two
spatialization schemes; however, this difference is certainly also influenced by the
artificial reverberation and early reflections added by IRCAM's Le Spatialisateur
program. This perceptual difference, although relatively subtle, does help to spatially
distinguish the various flocks and heightens the spatial micro-complexity of the piece.
The Doppler Effect was utilized throughout the work, perhaps surprisingly
considering the presence of pitched material. However, the view was taken that the
pitch variations introduced by the spatial movement and the resulting Doppler shift
were aesthetically pleasing and fitted the micro-variations in pitch resulting from the
rapid bowing action of the original samples, as well as supporting the perceived
movement of each flock.
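For a static listener and a moving source, the shift follows the standard relation f_heard = f_source * c / (c - v), where v is the radial velocity of the source (positive when approaching). A one-line Python sketch, using an illustrative grainboid speed:

    def doppler_ratio(radial_velocity, c=343.0):
        # Pitch ratio for a moving source and a static listener:
        # f_heard = f_source * c / (c - v), with v > 0 when approaching.
        return c / (c - radial_velocity)

    # A grainboid passing at 20 m/s (an illustrative speed) shifts pitch by
    # roughly +6% on approach and about -5.5% on retreat.
    print(doppler_ratio(20.0), doppler_ratio(-20.0))

Shifts of this order are comparable in size to the pitch micro-variations of the tremolo bowing, which is why the two blend rather than conflict.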
granular event generators and instruments, entitled AmbiBoid, were developed by the
author to implement various granular synthesis techniques such as granulation,
glisson synthesis and grainlet additive synthesis. Each grain is then spatialized using
the Boids algorithm and second order Ambisonics with distance modelling.2
The score file generators read spatial co-ordinates from text files generated by
a Max/MSP patch containing Singer's Boids object. Due consideration was given to
coding the Boids algorithm within either Csound or C; however, generating the
spatial trajectories in real-time has a number of benefits, namely:
2 The AmbiBoid source code, along with various audio examples, can be downloaded at
www.endabates.net/ambiboid.
Once the desired spatial trajectories have been performed and recorded in
Max/MSP, the AmbiBoid utilities can be used to generate a Csound score file. Each
utility/instrument pair implements a specific form of granular synthesis with shared
parameters such as overall duration, number of grain streams, global reverb, etc. The
density is controlled via the inter-grain spacing, which sets the amount of silence
between each consecutive grain as a multiple of that stream's grain duration. The
density can be either fixed or dynamically varying between specified values at five
hit-points equally distributed across the total duration. The grain duration can also be
fixed or dynamically varying in a similar fashion. Various windowing functions such
as Gaussian, Hamming and Hanning windows can also be chosen. Three different
granular synthesis techniques were implemented:
12.3.1 Granulation
The AmbiBoid granule score generator and instrument performs straightforward
granulation of a monophonic sample. The grain durations can either be a
single fixed value for all streams, a single dynamically varying value for all streams,
or a randomly chosen duration for each stream varying between a given maximum
and minimum value. The playback position within the file for each grain is randomly
chosen and the playback rate of the audio sample is defined in the score file.
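The density control described above can be sketched as follows. This Python code mimics the documented behaviour (inter-grain spacing as a multiple of the stream's grain duration, interpolated across five hit-points spread evenly over the total duration); it is not the released AmbiBoid code.

    import numpy as np

    def grain_onsets(total_dur, grain_dur, spacing_hitpoints):
        # Onset times for one grain stream. The inter-grain spacing (the
        # silence between consecutive grains, as a multiple of this stream's
        # grain duration) is interpolated between five hit-points spread
        # evenly across the total duration.
        hp_times = np.linspace(0.0, total_dur, num=len(spacing_hitpoints))
        onsets, t = [], 0.0
        while t < total_dur:
            onsets.append(t)
            spacing = np.interp(t, hp_times, spacing_hitpoints)
            t += grain_dur * (1.0 + spacing)     # grain plus its gap
        return onsets

    # Density rises toward the middle hit-point and falls away again.
    print(len(grain_onsets(30.0, 0.05, [4.0, 1.0, 0.1, 1.0, 4.0])))

Each onset would then become one Csound score event, with the grainboid co-ordinates recorded from Max/MSP supplying the spatial position of that grain.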
12.4.1 Analysis
Two distinct trajectories were used throughout this piece:
a fast spiralling trajectory with lots of vertical movement around the centre point, and
a slow trajectory moving from front to back across the array.
The first trajectory was created specifically for use with glisson synthesis as it
contains significant vertical movement. Elevation values are mapped to the frequency
and direction of the glissando with the glissando start frequency restricted to linear
harmonic intervals, producing a more melodic texture. This can be clearly heard in
isolation at the very beginning of the piece.
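A single glisson can be sketched as follows. In this Python fragment the fixed octave glide stands in for the elevation mapping used in the piece, and all parameter values are illustrative:

    import numpy as np

    def glisson_grain(f_start, octave_up=True, dur=0.08, sr=44100):
        # One glisson: a Hanning-windowed grain whose frequency glides from
        # f_start toward a pitch an octave above or below. In the piece the
        # elevation of the grainboid governed the glide direction and extent;
        # mapping the glide to a fixed octave here is a simplified stand-in.
        n = int(dur * sr)
        f_end = f_start * (2.0 if octave_up else 0.5)
        freqs = np.linspace(f_start, f_end, n)   # linear frequency glide
        phase = 2.0 * np.pi * np.cumsum(freqs) / sr
        return np.hanning(n) * np.sin(phase)

    grain = glisson_grain(220.0, octave_up=True)  # 220 Hz is illustrative

The second section begins with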
synthesized tones created using the grainlet additive synthesis instrument and the slow
front-back trajectory. These synthesized tones provide the underlying harmony
which is derived from a two chord guitar progression (a granulated version of which
can be faintly heard later on). These strongly tonal textures are contrasted with
wideband noise textures which also follow the same front-back trajectory. These
rolling waves of noise were created using the granulation instrument and a heavily
distorted version of the original guitar progression. The underlying tonality of these
noise bursts becomes more apparent as the piece progresses and the grain duration
The above quote illustrates the vast range of issues related to the production,
composition and performance of spatial music which this thesis has attempted to
address. A composer of spatial music must consider specific aspects of the work such
as the types of sounds and the particular spatialization process involved, while also
remembering that many of these specific aspects may not be perceived directly by the
listener. As Smalley states, "the piece as a whole must be looked at, because the
whole is the space or spaces of the piece". The success of any work of spatial music
can therefore only be considered in terms of the overall compositional strategy which
describes the relationship between space and every other musical parameter.
This thesis has attempted to examine all of these issues in terms of auditory
perception as ultimately, the success of any work of spatial music depends upon the
perception of the listener. A number of different spatialization schemes were
analysed, and a meta-analysis of the results of a large number of other listening tests
and simulations was presented. The results indicate that each spatialization scheme
has particular strengths and weaknesses, with the result that the most applicable
technique in any situation is dependent on the particular spatial effect required.
Fig. 13.1 Octagonal loudspeaker layout with centre front (a) and without (b)
this fact, and they made extensive use of spatial distribution in their work for this
reason.
The presentation of any work of spatial music to a distributed audience will
necessarily result in certain members of the audience being situated outside the sweet
spot. This will result in an unavoidable distortion in the spatial trajectory which is
perceived by each member of the audience, and this distortion depends upon the
position of the listener within the array. Moore suggested that these differences in
perception among listeners are analogous to perspective distortion in photography or
cinematography [Moore, 1983]. Although this is perhaps unsurprising, it raises
significant questions about spatial music compositions which attempt to create and
relate recognizable sound shapes, such as in certain works by Iannis Xenakis. This
form of spatial counterpoint, which McNabb describes as the "motivic use of space",
assumes that spatial locations and trajectories are clearly and unambiguously
perceived by every listener [McNabb, 1986]. However, the results of
the listening tests presented in this thesis show that this is extremely difficult to
achieve. Even in the case of point sources which are clearly localized, each listener
will be orientated differently with regards to the loudspeaker array, and so will have a
different perspective on the spatial layout. In this case it is very hard to see how
spatial motifs can be clearly and unambiguously perceived unless they are restricted
to very rudimentary movements. In the words of the composer Henry Brant,
"Ideas of that kind seem to me more an expression of hope than of reality... It is
hard enough to make the sounds do what you want in sound without saying that the
sound should be shaped like a pretzel or something like that." [Brant, 1967]
will not be precisely perceived by most, if not all, members of the audience, this is
perhaps not as critical as it first appears. Spatial movement in Kontakte is controlled
serially in the same way as rhythm and pitch. The precise angular location is not
intended to be accurately perceived by each member of the audience, instead, a
change in spatial location is used to indicate a certain temporal duration. So even
though each listener will perceive a slightly different spatial movement, they will all
perceive the same spatio-temporal movement. The abstract designs of these serialist
composers differ therefore from the abstract graphical designs discussed earlier in that
the specific trajectories and movements are not intended to be perceived by the
audience. The following quote from Pierre Boulez represents a very different
conception of spatial music to the highly visual form suggested by Xenakis and
Varèse.
However, I do not want to suggest that one thereby plays as in tennis or car racing,
back and forth or in a cycle with everything following exactly. For me, sound
mixtures are far more interesting; one feels that they rotate somehow, but one can't
have the picture of an exactly comprehensible movement. With these different rates
and with these different times, one gets an impression, which is much richer for me
than a more normal circle or a loop to the left or right. For me, this is too anecdotal.
[Boulez et al, 1988]
Abstract and complex spatial designs can perhaps therefore be effective when used
indirectly in this fashion. This approach is somewhat reminiscent of other abstract
processes which have been used to indirectly create and control complex textures in
orchestral music. Consider this quote by the composer György Ligeti from an
interview in 1978.
Technically speaking, I have always approached musical texture through part-writing.
Both Atmosphères and Lontano have a dense canonic structure. But you
cannot actually hear the polyphony, the canon. You hear a kind of impenetrable
texture, something like a very densely woven cobweb. The polyphonic structure does
not come through, you cannot hear it, it remains hidden in a microscopic, underwater
world, to us inaudible [Bernard, 1987].
The dynamic spatial textures created using granular synthesis and flocking algorithms
would seem to function in the same way, as in this case only the overall motion and
the motion of sounds relative to each other will be perceived, while the individual
trajectories followed by each grainboid are not generally clearly perceptible. In this
case, it is the overall motion, rather than the specific location or direction, which is
important and hence, Ambisonics would seem to be a suitable spatialization scheme
for this purpose.
13.3 Conclusion
This thesis has attempted to examine the various interrelated factors which
influence the musical use of space in a performance context. It has been shown that
the optimum spatialization technique in any situation is highly dependent on other
factors, such as the overall compositional aesthetic. Various landmark works of
spatial music have been assessed in terms of the perceptibility of the particular
spatialization scheme adopted for that work, and also in terms of how the various
14 List of Publications
P1
Bates, E. & Furlong, D., Score File Generators for Boids Based Granular
Synthesis in Csound, Proceedings of the 126th Audio Engineering Society
Convention, Munich, Germany, May 2009.
In this paper we present a set of score file generators and granular synthesis
instruments for the Csound language. The applications use spatial data
generated by the Boids flocking algorithm along with various user-defined
values to generate score files for grainlet additive synthesis, granulation and
glisson synthesis instruments. Spatialization is accomplished using Higher
Order Ambisonics and distance effects are modelled using the Doppler Effect,
early reflections and global reverberation. The sonic quality of each synthesis
method is assessed and an original composition by the author is presented.
P2
This paper describes how polyphonic pickup technology can be adapted for
the spatialization of electric stringed instruments such as the violin, cello and
guitar. It is proposed that mapping the individual strings to different spatial
locations integrates the spatial diffusion process with the standard musical
gestures of the performer. The development of polyphonic guitar processing
is discussed and a method of adapting MIDI guitar technology for this purpose
is presented. The compositional and technical strategies used with various
augmented instruments are presented, along with an analysis of three
compositions by the author for spatialized hexaphonic guitar.
Bibliography
[Ahrens et al, 2009] Ahrens, J. and Spors, S., Spatial Encoding and Decoding of
Focussed Virtual Sound Sources, Graz Ambisonics Symposium, 2009.
[Ambisonics Standards, 2009] Analysis of Normalisation Schemes: A supporting
document for the "Universal Ambisonic Standard", Ambisonics Google Groups,
http://docs.google.com/Doc?id=df4dtw69_66d4khc7fb&hl=en
[Asano et al, 1990] Asano, F., Suzuki, Y. and Sone, T., Role of spectral cues in
median plane localization, Journal of the Acoustical Society of America, Vol. 88,
pp. 159-168, 1990.
[Baalman, 2006] Baalman, M., swonder3Dq: Auralisation of 3D objects with Wave
Field Synthesis, Proceedings of the Linux Audio Conference, 2006.
[Ballan et al, 1994] Ballan, O., Mozzoni, L. and Rocchesso, D., Sound spatialization
in real time by first reflection simulation, Proceedings of the International Computer
Music Conference, pp. 475-476, 1994.
[Ballas et al, 2001] Ballas, J. A., Brock, D., Stroup, J. and Fouad, H., The Effect of
Auditory Rendering on Perceived Movement: Loudspeaker Density and HRTF,
Proceedings of the 2001 International Conference on Auditory Display, 2001.
[Bamford, 1995] Bamford, J. S., An Analysis of Ambisonic Sound Systems of First
and Second Order, Doctoral Thesis, University of Waterloo, Canada, 1995.
[Barrett, 1990] Barrett, N., Spatio-musical Composition Strategies, Organised
Sound, Vol. 7(3), 2003.
[Barrett et al, 2007] Barrett, N. and Otondo, F. Creating Sonic Spaces: an interview
with Natasha Barrett, Computer Music Journal, Vol. 31(2), pp. 10-19, 2007.
[Barron et al, 1981] Barron, M. and Marshall, A. H., Spatial Impression due to
[Boone et al, 2003] Boone, M. M. and de Bruijn, W., "Improved speech intelligibility
in teleconferencing using Wave Field Synthesis", Proceedings of the 114th Convention
of the Audio Engineering Society, Preprint 5800, 2003.
[Boulanger, 1986] Boulanger, R., Toward a new age of performance: Reading the
book of dreams with the Mathews electronic violin, Perspectives of New Music, Vol.
24(2), pp. 130155, 1986.
[Boulanger, 2000] Boulanger, R., The Csound book, MIT Press, 2000.
[Boulez et al, 1988] Boulez, P. and Gerzso, A., Computers in Music, Scientific
American, Vol. 258(4), 1988.
[Brant, 1967] Brant, H., "Space as an essential aspect of musical composition",
Contemporary Composers on Contemporary Music, Holt, Rinehart and Winston, pp.
221-242, 1967.
[Bregman, 1990] Bregman, A. S., Auditory Scene Analysis, pp. 196-200, pp. 204-250,
pp. 455-490, MIT Press, 1990.
[Bromberger, 2007] Bromberger, E., Berlioz Requiem - San Diego Symphony
Orchestra Program Notes, 2007,
www.sandiegosymphony.com/uploads/pdfs/pdfmw060713.pdf, retrieved on 06/06/2009.
[Bronkhorst, 2002] Bronkhorst, A. W., "Modeling auditory distance perception in
rooms", Proceedings of the EAA Forum Acusticum Sevilla, 2002.
[Brungart et al, 1999] Brungart, D. S. and Rabinowitz, W. M., Auditory localization
of nearby sources. Head-related transfer functions, Journal of the Acoustical
Society of America, Vol. 106(3), 1999.
[Bryant, 1981] Bryant, D., The cori spezzati of St Mark's: myth and reality, Early
Music History, Vol. 1, pp. 165-186, 1981.
[Cage, 1957] Cage, J., Experimental Music, Address given to the convention of
Music Teachers National Association, Chicago, 1957. Reprinted in Silence, MIT
Press, 1961.
[Capra et al, 2007] Capra, A., Fontana, S., Adriaensen, F., Farina, A. and Grenier, Y.,
Listening Tests of the Localization Performance of Stereodipole and Ambisonic
Systems, Proceedings of the 123rd Convention of the Audio Engineering Society,
Preprint 7187, 2007.
[Caulkins et al, 2003] Caulkins, T., Corteel, E. and Warusfel, O., Wave Field
Synthesis Interaction with the Listening Environment, Improvements in the
Reproduction of Virtual Sources Situated inside the Listening Room, Proceedings of
the 6th International Conference on Digital Audio Effects (DAFX-03), 2003.
[Chowning, 1971] Chowning, J., The simulation of moving sound sources, Journal
of the Audio Engineering Society, Vol. 19, pp. 2-6, 1971.
[Clark et al, 1958] Clark, H. A. M., Dutton, G. F. and Vanderlyn, P. B., The
'Stereosonic' Recording and Reproducing System: A Two-Channel System for
Domestic Tape Records, Journal of the Audio Engineering Society, Vol. 6(2), pp.
102-117, 1958.
[Corteel et al, 2003] Corteel, E. and Nicol, R., "Listening room compensation for
Wave Field Synthesis. What can be done?", Proceedings of the 23rd International
Conference of the Audio Engineering Society, 2003.
[Corteel et al, 2004] Corteel E., Caulkins T., Warusfel O., Pellegrini R., Rosenthal M.
and Kuhn C., Sound Scene Creation using Wave Field Synthesis, Actes des Journées
du Design Sonore, 2004.
[Corteel et al, 2008] Corteel, E., Pellegrini, R. and Kuhn-Rahloff, C., Wave Field
Synthesis with increased aliasing frequency, Proceedings of the 124th Convention of
the Audio Engineering Society, Preprint 7362, 2008.
[Cott, 1973] Cott, J., Stockhausen: Conversations with the Composer, Simon and
Schuster, 1973.
[Daniel et al, 1998] Daniel, J., Rault, J. and Polack, J. D., Ambisonics Encoding of
Other Audio Formats for Multiple Listening Conditions, Proceedings of the 105th
Convention of the Audio Engineering Society, Preprint 4795, 1998.
[Daniel et al, 1999] Daniel, J., Nicol, R. and Moreau, S., Further Investigations of
High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging,
Proceedings of the 114th Convention of the Audio Engineering Society, Preprint
5788, 1999.
[Daniel, 2000] Daniel, J., Représentation de champs acoustiques, application à la
transmission et à la reproduction de scènes sonores complexes dans un contexte
multimédia, Doctoral thesis, Université Paris, 2000.
[Daniel, 2003] Daniel, J., Spatial Sound Encoding Including Near Field Effect:
Introducing Distance Coding Filters and a Viable, New Ambisonic Format,
Proceedings of the 23rd International Conference of the Audio Engineering Society,
2003.
[de Vries, 1995] de Vries, D., "Sound enhancement by wave field synthesis: adaption
of the synthesis operator to the loudspeaker directivity characteristics", Proceedings
of the 98th Convention of the Audio Engineering Society, Preprint 3971, 1995.
[Dickins et al, 1999] Dickins, D., Flax, M., McKeag, A. and McGrath, D., Optimal
3D speaker panning, Proceedings of the 16th International Conference of the Audio
Engineering Society, 1999.
[Dobrian, 2008] Dobrian, C., Doppler shift in a stereo Csound instrument,
http://music.arts.uci.edu/dobrian/dsp2001/csound/dopplertheory.htm, retrieved on
07/07/2008.
[Elfner et al, 1968] Elfner, L. F. and Tomsic, R. T., Temporal and intensive factors
in binaural lateralization of auditory transients, Journal of the Acoustical Society of
America, Vol. 43, pp. 747-751, 1968.
[Emmerson, 1986] Emmerson, S., The relation of language to materials, The
Language of Electroacoustic Music, pp. 20-24, Macmillan Press, 1986.
[Essl, 1995] Essl, K., Lexikon-Sonate. An Interactive Realtime Composition for
Computer-Controlled Piano, Proceedings of the 2nd Brazilian Symposium on
Computer Music, 1995.
[Farina et al, 1998] Farina, A. and Ugolotti, E., Software Implementation of B-Format
Encoding and Decoding, Proceedings of the 104th Convention of the Audio
Engineering Society, 1998.
[Farina et al, 2000] Farina, A., Zanolin, M. and Crema, E., Measurement of sound
scattering properties of diffusing panels through the Wave Field Synthesis approach,
Proceedings of the 108th Convention of the Audio Engineering Society, 2000.
[Felder, 1977] Felder, D., An interview with Karlheinz Stockhausen, Perspectives of
New Music, Vol. 16(1), pp. 85-101, 1977.
[Fellget, 1975] Fellgett, P., Ambisonics. Part One: General System Description,
Studio Sound, Vol. 17, pp. 20-22, 40, 1975.
[Fels, 1996] Fels, P., 20 Years Delta Stereophony System - High Quality Sound
Design, Proceedings of the 100th Convention of the Audio Engineering Society,
Preprint 4188, 1996.
[Franco et al, 2004] Franco, A. F., Merchel, S., Pesqueux, L., Rouaud, M. and
Soerensen, M. O., Sound Reproduction by Wave Field Synthesis, Project Report,
Aalborg University, 2004.
[Frank et al, 2008] Frank, M., Zotter, F. and Sontacchi, A., Localization
Experiments Using Different 2D Ambisonics Decoders, 25th Tonmeistertagung
VDT International Convention, 2008.
[Franssen, 1962] Franssen, N. V., Stereophony, Philips Technical Library, The
Netherlands, English translation (1964).
[Furlong et al, 1992] Furlong, D. J. and MacCabe, C. J., Loudspeakers, Listening
Rooms, and Effective Synthetic Auralization, Proceedings of the 93rd Convention of
the Audio Engineering Society, Preprint 3445, 1992.
[Furse, 2000] Furse, R., First and Second Order Ambisonic Decoding Equations,
http://www.muse.demon.co.uk/ref/speakers.html, retrieved on 03/06/2009.
[Gardner, 1962] Gardner, M. B., Binaural Detection of Single-Frequency Signals in
Presence of Noise, Journal of the Acoustical Society of America, Vol. 34, pp. 1824-1830, 1962.
[Gayou, 2007] Gayou, E., The GRM: landmarks on a historic route, Organised
Sound, Vol. 12(3), pp. 203-211, Cambridge University Press, 2007.
[Gerzon, 1974a] Gerzon, M. A., Sound Reproduction Systems, Patent No. 1494751.
[Gerzon, 1974b] Gerzon, M. A., Surround Sound Psychoacoustics, Wireless World,
Vol. 80, pp. 483-486, 1974.
[Guastavino et al, 2004] Guastavino, C. and Katz, B. F. G., Perceptual evaluation of
multi-dimensional spatial audio reproduction, Journal of the Acoustical Society of
America, Vol. 116(2), 2004.
[Guastavino et al, 2007] Guastavino, C., Larcher, V., Catusseau, G. and Boussard, P.,
Spatial Audio Quality Evaluation: Comparing Transaural, Ambisonics and Stereo,
Proceedings of the 13th International Conference on Auditory Display, 2007.
[Kerber et al, 2004] Kerber, S., Wittek, H., Fastl, H. and Theile, G., Experimental
investigations into the distance perception of nearby sound sources: Real vs. WFS
virtual nearby sources, Joint Meeting of the German and the French Acoustical
Societies (CFA/DAGA), 2004.
[Komiyama et al, 1991] Komiyama, S., Morita, A., Kurozumi, K. and Nakabayashi,
K., Distance control system for a sound image, Proceedings of the 9th International
Conference of the Audio Engineering Society, pp. 233-239, 1991.
[Kratschmer et al, 2009] Kratschmer, M. and Rabenstein, R., Implementing
Ambisonics on a 48 Channel Circular Loudspeaker Array, Proceedings of the Graz
Ambisonics Symposium, 2009.
[Ligeti, 1958] Ligeti, G., Metamorphoses of Musical Form, Die Reihe, Vol. 7, pp.
5-19, 1958.
[Litovsky et al, 1999] Litovsky, R. Y., Colburn, H. S., Yost, W. A. and Guzman, S. J.,
The Precedence Effect, Journal of the Acoustical Society of America, Vol. 106(4),
pp. 1633-1654, 1999.
[Lund, 2000] Lund, T., Enhanced Localization in 5.1 Production, Proceedings of
the 109th Convention of the Audio Engineering Society, Preprint 5243, 2000.
[MacDonald, 1995] MacDonald, A., Performance Practice in the Presentation of
Electroacoustic Music, Computer Music Journal, Vol. 19(4), pp. 88-92, 1995.
[Machover, 1992] Machover, T., Hyperinstruments - MIT Media Lab Research
Report, 1992.
[MacPherson, 2002] MacPherson, E. A. and Middlebrooks, J. C., Listener weighting
of cues for lateral angle: The Duplex Theory of sound localization revisited, Journal
of the Acoustical Society of America, Vol. 111, pp. 2219-2236, May 2002.
[Misch et al, 1998] Misch, L., Hentschel, F. and Kohl, J., On the Serial Shaping of
Stockhausen's Gruppen für drei Orchester, Perspectives of New Music, Vol. 36(1),
pp. 143-187, 1998.
[Moore, 1983] Moore, F. R., A general model for spatial processing of sounds,
Computer Music Journal, Vol. 7, pp. 6-15, 1983.
[Moore et al, 2007] Moore, J. D. and Wakefield, J. P., Surround Sound for Large
Audiences: What are the Problems?, Annual University of Huddersfield
Researchers' Conference, 2007.
[Moorer, 1979] Moorer, J. A., About This Reverberation Business, Computer Music
Journal, Vol. 3(2), pp. 13-28, 1979.
[Moreau et al, 2006] Moreau, S., Daniel, J. and Bertet, S., 3D Sound Field
Recording with Higher Order Ambisonics: Objective Measurements and Validation
of a 4th Order Spherical Microphone, Proceedings of the 120th Convention of the
Audio Engineering Society, 2006.
[Morgan, 1978] Morgan, R. P., Ives and Mahler: Mutual Responses at the End of an
Era, 19th-Century Music, Vol. 2(1), pp. 72-81, University of California Press, 1978.
[Morgan, 1991] Morgan, R. P., Stockhausen's Writings on Music, The Musical
Quarterly, Vol. 75(4), pp. 194-206, Oxford University Press, 1991.
[Moritz, 2002] Moritz, A., Stockhausen: Essays on the Works,
http://home.earthlink.net/~almoritz/, retrieved on 01/05/2009.
[Mortenson, 1987] Mortenson, G. C., Father to Son: The Education of Charles
Ives, Music Educators Journal, Vol. 73(7), pp. 33-37, 1987.
[Neher, 2004] Neher, T., Towards A Spatial Ear Trainer, Doctoral Thesis,
University of Surrey, pp. 126-164, 2004.
[Pulkki, 1997] Pulkki, V., Virtual sound source positioning using Vector Base
Amplitude Panning, Journal of the Audio Engineering Society, Vol. 45, pp. 456-466,
1997.
[Pulkki, 1999] Pulkki, V., Uniform spreading of amplitude panned virtual sources,
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to
Audio and Acoustics, Mohonk Mountain House, 1999.
[Pulkki et al, 2001] Pulkki, V. and Karjalainen, M., Directional Quality of 3-D
Amplitude-Panned Virtual Sources, Proceedings of the 7th International Conference
on Auditory Display (ICAD2001), pp. 239-244, 2001.
[Pulkki, 2002] Pulkki, V., Compensating displacement of amplitude-panned virtual
sources, Proceedings of the 22nd International Conference of the Audio Engineering
Society, pp. 186-195, 2002.
[Pulkki et al, 2005] Pulkki, V. and Hirvonen, T., Localization of Virtual Sources in
Multichannel Audio Reproduction, IEEE Transactions on Speech and Audio
Processing, Vol. 13(1), 2005.
[Ratliff, 1974] Ratliff, P. A., Properties of Hearing Related to Quadraphonic
Reproduction, BBC Research Department Report, BBC RD 1974/38, 1974.
[Reynolds, 1987] Reynolds, C. W., Flocks, herds, and schools: A distributed
behavioral model, Computer Graphics, Vol. 21(4), pp. 25-34, 1987.
[Roads, 1996] Roads, C., The Computer Music Tutorial, pp. 470-472, MIT Press,
1996.
[Roads, 2004] Roads, C., Microsound, MIT Press, 2004.
[Ross, 2009] Ross, A., The Rest is Noise, Farrar, Straus and Giroux, 2007.
[Simms, 1986] Simms, B. R., Music of the Twentieth Century: Style and Structure,
Schirmer Books, 1986.
[Singer, 2008] Singer, E., Boids for Max MSP/Jitter, http://www.ericsinger.com,
retrieved on 07/07/2008.
[Smalley, 1996] Smalley, D., The Listening Imagination: Listening in the Electroacoustic Era, Contemporary Music Review, Vol. 13(2), pp. 77-107, Harwood
Academic Publishers, 1996.
[Smalley, 1997] Smalley, D., Spectromorphology: Explaining Sound-Shapes,
Organised Sound, Vol. 2(2), pp. 107-126, Cambridge University Press, 1997.
[Smalley, 2007] Smalley, D., Space-Form and the Acousmatic Image,
Organised Sound, Vol. 12(1), pp. 35-58, Cambridge University Press, 2007.
[Smalley J., 2000] Smalley, J., Gesang der Jünglinge: History and Analysis,
Masterpieces of 20th-Century Electronic Music: A Multimedia Perspective, Columbia
University Computer Music Centre,
www.music.columbia.edu/masterpieces/notes/stockhausen, retrieved on 02/07/2009.
[Smalley et al, 2000] Smalley, D. and Austin, L., Sound Diffusion in Composition
and Performance: An Interview with Denis Smalley, Computer Music Journal, Vol.
24(2), pp. 10-21, MIT Press, 2000.
[Smith, 1985] Smith III, J. O., A new approach to digital reverberation using closed
waveguide networks, Proceedings of the International Computer Music Conference,
pp. 47-53, 1985.
[Sonke, 2000] Sonke, J. J., Variable Acoustics by wave field synthesis, Doctoral
Thesis, Delft University of Technology, 2000.
[Theile et al, 1976] Theile, G. and Plenge, G., Localization of lateral phantom sources, Journal of the Audio Engineering Society, Vol. 25, pp. 196-200, 1976.
[Theile, 1980] Theile, G., On the Localisation in the Superimposed Soundfield,
Doctoral Thesis, Berlin University of Technology, 1980.
[Theile, 1991] Theile, G., On the Naturalness of Two-Channel Stereo Sound,
Journal of the Audio Engineering Society, Vol. 39, 1991.
[Theile et al, 2003] Theile, G., Wittek, H. and Reisinger, M., Potential Wave Field
Synthesis Applications in the Multichannel Stereophonic World, Proceedings of the
24th International Conference of the Audio Engineering Society, 2003.
[Trochimczyk, 2001] Trochimczyk, M., From Circles to Nets: On the Signification
of Spatial Sound Imagery in New Music, Computer Music Journal, Vol. 25(4), pp.
39-56, MIT Press, 2001.
[Truax, 1990] Truax, B., Composing with Real-Time Granular Sound, Perspectives
of New Music, Vol. 28(2), 1990.
[Truax, 1996] Truax, B., Soundscape, Acoustic Communication & Environmental
Sound Composition, Contemporary Music Review, Vol. 15(1), pp. 49-65, 1996.
[Truax, 1999] Truax, B., Composition and diffusion: space in sound in space,
Organised Sound, Vol. 3(2), pp. 141-146, 1999.
[Truax, 2008] Truax, B., Soundscape Composition as Global Music: Electroacoustic
Music as Soundscape, Organised Sound, Vol. 13(2), pp. 103-109, 2008.
[Unemi et al, 2005] Unemi, T. and Bisig, D., Music by interaction among two
flocking species and human, Proceedings of the 3rd International Conference on
Generative Systems in Electronic Arts, pp. 171-179, 2005.
[Uozumi, 2007] Uozumi, Y., Gismo2: An application for agent-based composition,
Proceedings of the 2007 EvoWorkshops, pp. 609-616, 2007.
[Usher et al, 2004] Usher, J., Martens, W. L. and Woszczyk, W., The influence of
the presence of multiple sources on auditory spatial imagery as indicated by a
graphical response technique, Proceedings of the 18th International Congress on
Acoustics (ICA 2004), 2004.
[Varèse, 1998] Varèse, E., The Liberation of Sound, Contemporary Composers on
Contemporary Music, p. 197, Perseus Publishing, 1998.
[Verheijen, 1998] Verheijen, E., Sound reproduction by Wave Field Synthesis,
Doctoral Thesis, Delft University of Technology, 1998.
[Vogel, 1993] Vogel, P., Application of wave field synthesis in room acoustics,
Doctoral Thesis, Delft University of Technology, 1993.
[Wallach et al, 1949] Wallach, H., Newman, E. B. and Rosenzweig, M. R., The
precedence effect in sound localization, American Journal of Psychology, Vol. 62,
pp. 315-336, 1949.
[Wessel et al, 2002] Wessel, D., Wright, M. and Schott, J., Situated Trio - an
interactive live performance for a hexaphonic guitarist and two computer musicians,
Proceedings of the 2002 Conference on New Instruments for Musical Expression
(NIME-02), pp. 171-173, 2002.
[Wessel, 2006] Wessel, D., An Enactive Approach to Computer Music
Performance, Le Feedback dans la Création Musicale, pp. 93-98, 2006.
[White, 2008] White, N. S., The Rosen - a pretty thing, but is it a serious
autoharp?, Autoharp Notes, www.ukautoharps.org.uk, retrieved on 01/05/2009.
[Wiggins, 2004] Wiggins, B., An Investigation into the Real-time Manipulation and
Control of Three-dimensional Sound Fields, Doctoral Thesis, University of Derby,
2004.
[Wishart, 1985] Wishart, T., On Sonic Art, Imagineering Press, 1985.
[Wittek, 2002] Wittek, H., Optimised Phantom Source Imaging (OPSI) of the high
frequency content of virtual sources in Wave Field Synthesis,
www.hauptmikrofon.de/wfs.html, retrieved on 03/06/2009.
[Wittek, 2003] Wittek, H., Perception of Spatially Synthesized Sound Fields, 2003,
www.hauptmikrofon.de/wfs.html, retrieved on 03/06/2009.
[Wittek, 2007] Wittek, H., Perceptual differences between wavefield synthesis and
stereophony, Doctoral Thesis, University of Surrey, 2007.
[Xenakis, 2008] Xenakis, I., Music and Architecture: Architectural Projects, Texts,
and Realizations, compiled, translated and commented by Sharon Kanach, p. 146,
Pendragon Press, 2008.
[Yadegari et al, 2002] Yadegari, S., Moore, F. R., Castle, H., Burr, A. and Apel, T.,
Real-Time Implementation of a General Model for Spatial Processing of Sounds,
Proceedings of the International Computer Music Conference, pp. 244-247, 2002.
[Yeung, 2004] Yeung, B., Guitar dreams, an interview with Adrian Freed,
www.sfweekly.com, 2004, retrieved on 01/04/2007.
[Zacharov, 1998] Zacharov, N., Subjective Appraisal of Loudspeaker Directivity for
Multichannel Reproduction, Journal of the Audio Engineering Society, Vol. 46(4),
1998.
[Zvonar, 2004] Zvonar, R., A History of Spatial Music, eContact!, Vol. 7(4),
Canadian Electroacoustic Community, 2004.
Appendix A: Scores