Sound Localization
Sound localization accuracy is best at frequencies below 1000 Hz (the region where interaural time differences are useful cues) and above 2000 Hz (the region where interaural level differences are useful cues), and poorest in the transition region around 1500 Hz, where neither cue is strong; this pattern is consistent with the duplex theory of sound localization. Sound localization acuity is best for azimuthal judgements. Differences as small as 1° of angle can be discriminated when the sound sources are directly in front of the listener (discriminable differences in sound-source location are referred to as minimum audible angles; see Mills, 1972). Changes in interaural time difference as small as 10 microseconds and changes in interaural level difference as small as 0.5 dB can be discriminated (Blauert, 1997).
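As a rough sense of scale, Woodworth's classic spherical-head approximation relates azimuth to interaural time difference as ITD ≈ (a/c)(θ + sin θ). The sketch below uses this approximation with assumed round values for head radius and the speed of sound; it is an illustration, not a model of any particular listener.

    import math

    def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
        # Woodworth's spherical-head approximation to the interaural time
        # difference (in seconds) for a source at the given azimuth.
        theta = math.radians(azimuth_deg)
        return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

    # A 1-degree shift from straight ahead produces an ITD of roughly
    # 9 microseconds, close to the ~10-microsecond threshold cited above.
    print(woodworth_itd(1.0) * 1e6)  # approximately 8.9 (microseconds)

On this approximation, the 1° minimum audible angle and the 10-microsecond time-difference threshold are two descriptions of nearly the same limit.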
All sound sources that lie on a given cone of confusion (Mills, 1972) generate the same interaural time and level differences. One such (degenerate) cone is the midsagittal plane: the vertical plane passing from directly in front of a listener, through directly overhead, to directly behind, and through directly below. All locations on the midsagittal plane produce zero interaural time and level differences, so these interaural differences cannot by themselves indicate a sound's location within this plane. Yet listeners can accurately locate sound sources in this plane without moving their heads (head movements would change the cone of confusion on which a source lies). Thus, cues other than the interaural differences are most likely used to locate sounds in the vertical plane (in the up-down direction).
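The geometry is easy to verify in a sketch: with the ears placed symmetrically about the midline (an idealization that ignores head shadowing), every midsagittal source is equidistant from the two ears, so the interaural time difference computed from path lengths is zero.

    import math

    def itd_from_geometry(source, ear_left=(-0.0875, 0.0, 0.0),
                          ear_right=(0.0875, 0.0, 0.0), speed_of_sound=343.0):
        # Interaural time difference (seconds) from straight-line path lengths;
        # positive values mean the sound reaches the right ear first.
        return (math.dist(source, ear_left) -
                math.dist(source, ear_right)) / speed_of_sound

    # Directly in front, overhead, and behind all lie on the midsagittal
    # plane (x = 0) and all yield an ITD of exactly zero.
    for src in [(0, 2, 0), (0, 0, 2), (0, -2, 0)]:
        print(itd_from_geometry(src))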
The head-related transfer functions (HRTFs) discussed earlier in relation to the outer ear describe the spectral changes that
sound undergoes as it travels from its source across the body
and head of the listener toward the middle ear. The spectral
characteristics of the HRTF are dependent on the location of
the sound source. In particular, there are spectral peaks and
valleys in frequency regions above 4000 Hz that change
spectral location in a systematic and orderly manner as
a function of the vertical position of the sound source
(Wightman & Kistler, 1989b). Thus, the frequency location
of these HRTF spectral peaks and valleys are probable cues
for sound localization in the vertical direction. For instance,
vertical sound localization is degraded if sound is low-passed
filtered so that there is little or no energy above 4000 Hz where
the spectral peaks and valleys are located. Acuity in the verti-
cal direction is generally poorer than in the horizontal direction
(Middlebrooks, 1992). The greatest number of sound localiza-
tion errors occur along cones of confusions. For instance, there
can be significant front-back and back-front confusions in the
midsagittal plane (Wightman & Kistler, 1989b).
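One hypothetical way such a cue could be read out is sketched below: given a measured HRTF magnitude spectrum (the spectrum here is fabricated for illustration), locate the deepest spectral valley above 4000 Hz and treat its frequency as the elevation cue. The function name and toy data are assumptions, not an established algorithm from the sources cited above.

    import numpy as np

    def notch_frequency(freqs_hz, magnitude_db, cutoff_hz=4000.0):
        # Frequency of the deepest spectral valley above cutoff_hz in an
        # HRTF magnitude response (a candidate vertical-localization cue).
        band = freqs_hz >= cutoff_hz
        return freqs_hz[band][np.argmin(magnitude_db[band])]

    # Toy spectrum: flat response with a 10-dB valley imposed near 8 kHz.
    freqs = np.linspace(200.0, 16000.0, 512)
    mag_db = -10.0 * np.exp(-((freqs - 8000.0) / 600.0) ** 2)
    print(notch_frequency(freqs, mag_db))  # approximately 8000 Hz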
Accuracy in judging the distance of a sound source is poorer than accuracy in either horizontal or vertical sound localization. The primary cues for distance perception
are either the relative sound level or the ratio of reverberant
to direct sound impinging on the listener (Loomis, Klatzky,
Philbeck, & Golledge, 1998). If the sound source is within
the near field of a listener (within about one meter), then in-
teraural level differences may aid in distance judgements.
Relative sound level is a useful cue only if the listener has some a priori knowledge of the source level, because sound level can vary at the source as well as with distance. The ratio of reflected (reverberant) sound to sound coming directly from the source also varies with distance, making this ratio a probable cue for distance judgements when there are reflective surfaces (e.g., the ground).
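Both cues follow simple textbook relations, sketched below: the direct sound falls about 6 dB per doubling of distance (inverse-square law), while in a diffuse reverberant field the reverberant level is roughly constant, so their ratio also falls with distance. The critical distance used here is an assumed value for illustration.

    import math

    def direct_level_db(distance_m, reference_m=1.0):
        # Level of the direct sound relative to its level at reference_m,
        # by the inverse-square law (about -6 dB per doubling of distance).
        return -20.0 * math.log10(distance_m / reference_m)

    def direct_to_reverberant_db(distance_m, critical_distance_m=2.0):
        # Simplified diffuse-field model: the ratio is 0 dB at the assumed
        # critical distance and falls 6 dB per doubling beyond it.
        return 20.0 * math.log10(critical_distance_m / distance_m)

    for d in (1.0, 2.0, 4.0, 8.0):
        print(d, round(direct_level_db(d), 1), round(direct_to_reverberant_db(d), 1))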
Models of Sound Localization
Neural coincidence networks have been suggested as one means by which the auditory system might compute interaural differences, especially interaural time differences (Colburn & Durlach, 1978). The network contains cells receiving bilateral inputs. The cells fire upon the simultaneous arrival of neural information from the two inputs. The network can serve as a neural cross-correlator of the timing information arriving at each ear. Thus, sound arriving at the same time at each ear activates neurons in the middle of the network, whereas sound arriving at one ear ahead of the other activates neurons to one side of the network. The activation of these neurons in the coincidence network could form a type of spatial neural map. Several computational models based on coincidence and cross-correlation have been successful in accounting for a great deal of data based on manipulations of interaural time differences. There is neural evidence in some species, especially birds, for just this sort of coincidence network (Konishi et al., 1988).
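A minimal computational analogue of such a network, sketched below, treats each candidate internal delay as one coincidence cell and picks the delay with the greatest coincidence (cross-correlation) between the two ear signals. The sampling rate, maximum lag, and test signals are assumptions for illustration.

    import numpy as np

    def estimate_itd(left, right, fs, max_itd_s=800e-6):
        # Cross-correlation analogue of a coincidence network: each lag is a
        # "cell," and the lag with the greatest coincidence estimates the ITD
        # in seconds (positive values mean the left ear leads).
        max_lag = int(max_itd_s * fs)
        lags = np.arange(-max_lag, max_lag + 1)
        scores = [np.sum(left[max(0, -k):len(left) - max(0, k)] *
                         right[max(0, k):len(right) - max(0, -k)])
                  for k in lags]
        return lags[int(np.argmax(scores))] / fs

    # Toy input: a noise burst reaching the right ear 10 samples late
    # (about 227 microseconds at a 44.1-kHz sampling rate).
    fs = 44100
    noise = np.random.default_rng(0).standard_normal(4096)
    right = np.concatenate([np.zeros(10), noise[:-10]])
    print(estimate_itd(noise, right, fs))  # approximately 0.000227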
Lateralization Versus Localization
When sounds are presented over headphones and interaural differences of time or level are varied, listeners report that the sounds move left and right, much as real-world sounds do. However, the sounds are lateralized inside the head rather than out in space, where any real-world sound source would be located and therefore localized. Thus, in the study of spatial hearing, lateralization often refers to headphone-delivered sounds, and localization to sounds whose sources lie in the external world (Blauert, 1997; Yost & Gourevitch, 1987).
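The effect is easy to produce in a sketch: imposing an interaural time or level difference on an otherwise identical (diotic) pair of headphone channels lateralizes the image toward the leading or more intense ear. The parameter values below are arbitrary illustrations.

    import numpy as np

    def lateralize(signal, fs, itd_s=0.0, ild_db=0.0):
        # Return (left, right) headphone channels; a positive itd_s delays the
        # right channel and a positive ild_db attenuates it, so positive values
        # of either cue lateralize the image toward the left ear.
        delay = int(round(itd_s * fs))  # this sketch assumes itd_s >= 0
        left = np.concatenate([signal, np.zeros(delay)])
        right = np.concatenate([np.zeros(delay), signal]) * 10.0 ** (-ild_db / 20.0)
        return left, right

    # A 500-Hz tone lateralized toward the left by a 300-microsecond ITD.
    fs = 44100
    t = np.arange(0, 0.5, 1.0 / fs)
    left, right = lateralize(np.sin(2 * np.pi * 500.0 * t), fs, itd_s=300e-6)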
One reason that sound delivered over headphones may
be lateralized rather than localized is that the headphone-delivered sounds have not undergone the spectral transformations associated with the HRTFs that naturally occurring
sounds undergo. If the information about the HRTF is put
back into the sound delivered over headphones, then it is
possible to produce a sound over headphones (using HRTF
filters) that is spectrally identical to that which would have
arrived at the middle ear from a real sound source at some