Psychoacoustics 59
There are two challenges when using spectral cues.
The first is discriminating between the filtering feature
and the spectrum of the source. For instance, if one
hears a notch around 9 kHz, it might be due to an HRTF,
or the original source spectrum might have a notch
around 9 kHz. Unfortunately there is no simple way to
discriminate between them. However, for a familiar
sound (voice, instruments, etc.) whose spectrum is known
to the auditory system, it is easier to infer the
HRTFs, and thus easier to localize the source, than for an
unfamiliar sound. If one has trouble discriminating
sounds along the cone of confusion, one can use
head-motion cues. For example, suppose a listener
turns his or her head to the left. If the source appears to move to
the right relative to the head, the source is in front; whereas if it
appears to move farther to the left, it must be in the back. The
second challenge is the individuality of HRTFs. No two
people share the same pinna and head shape, and each of us
has learned the cues of our own pinnae and head size/shape over
years of experience. If one listens to sounds convolved
with the HRTFs of someone else, left–right
localization will be good, but there will be a lot of
front–back confusion,^47 unless the listener’s head and
ears happen to be similar in size and shape to those
of the person whose HRTFs were measured.^48 The human binaural system
is remarkably adaptive. Experiments with ear molds^49
show that if a subject listens exclusively through
another set of ears, there is initially a lot of
front–back confusion, but in about 3 weeks the subject
learns the new ears and localizes almost as well as with
the original ears. Instead of forgetting either the new
or the old ears, the subject retains both sets
and becomes, in a sense, bilingual, able to
switch between the two sets of ears.
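The head-motion cue described above can be sketched in a few lines. This is a minimal illustration, not a model of the auditory system: it assumes a source in the horizontal plane, azimuths measured clockwise from straight ahead (positive to the right), and it uses the sine of the head-relative azimuth as a stand-in for the left–right cue. The function names and the 20°/160° example positions are hypothetical choices made for this sketch, and the rule only holds for small head turns and sources not directly to the side.

```python
import math

def lateral_position(source_az_deg, head_yaw_deg=0.0):
    """Left-right position of a source relative to the head.

    Angles are in degrees, clockwise from straight ahead (assumed convention).
    Returns sin(relative azimuth): -1 is fully left, +1 is fully right.
    A source at azimuth a and its mirror image at 180 - a give the same
    value, which is the front-back ambiguity in the horizontal plane.
    """
    return math.sin(math.radians(source_az_deg - head_yaw_deg))

def front_or_back(lateral_before, lateral_after, head_turn_deg):
    """Classify a source from how its lateral position shifts during a head turn.

    head_turn_deg < 0 is a turn to the left, > 0 a turn to the right.
    """
    shift = lateral_after - lateral_before
    if head_turn_deg == 0 or shift == 0:
        return "ambiguous"
    # A frontal source shifts opposite to the head turn;
    # a rear source shifts in the same direction as the head turn.
    return "front" if shift * head_turn_deg < 0 else "back"

if __name__ == "__main__":
    turn = -10.0  # listener turns the head 10 degrees to the left
    for az in (20.0, 160.0):  # a frontal source and its rear mirror image
        before = lateral_position(az)
        after = lateral_position(az, turn)
        print(az, round(before, 3), round(after, 3),
              front_or_back(before, after, turn))
```

Before the head turn, both example sources produce the same lateral position, so they cannot be told apart; only the direction of the shift during the turn separates them.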
3.11.3 Externalization
Many listeners prefer listening to music through loudspeakers
instead of through headphones. One of the reasons
is that when listening through headphones, the
pinnae are effectively bypassed, and the auditory system
does not receive any of the cues that the pinnae produce.
Over headphones, the instruments and singers’ voices
are all perceived or localized inside the head. When listening
through loudspeakers, although the localization
cues are not perfect, the sounds are externalized, if not
localized, somewhat more naturally. If, however, music
played through headphones includes the HRTFs of
the listener, he or she should be able to externalize the
sound perfectly.^50 Algorithms are available to simulate
3D sound sources at any location in free field and in a
regular room with reverberation. The simulation is
accurate up to 16 kHz, and listeners cannot discriminate
between the real source and the virtual (simulated)
sound.^51,52 An inconvenience, nevertheless, is that the
system has to be calibrated to each listener and each
room. In 1985, Jones et al.^53 devised a test for stereo
imagery utilizing a reverberator developed at the
Northwestern University Computer Music Studio. The
reverberator utilized HRTFs to create very compelling
simulations of 3D space and moving sound sources
within 3D space. The test by Jones et al.,^53 called LEDR
(Listening Environment Diagnostic Recording) NU™,
contained sound examples that moved along very specific
sound paths. When played over loudspeaker systems
that were free from phase or temporal distortions and in
environments free from early reflections, the paths were
perceived as intended. In the presence of
early reflections or misaligned crossovers or drivers, the
paths were audibly corrupted.
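Playing music over headphones so that it "includes the HRTFs of the listener" amounts to convolving the source signal with a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTFs), one per ear. The following plain-Python sketch shows the idea. The HRIRs here are toy placeholders (a pure delay and gain per ear), not measured data, and the function names are hypothetical; a practical renderer would load HRIRs measured for the listener and would normally use an FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    The output has len(signal) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right HRIR pair.

    Returns (left, right) sample lists intended for headphone playback.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIR pair for a source on the listener's right (placeholder values):
# the right ear receives the sound first and at full level; the left ear
# receives it three samples later and attenuated.
HRIR_RIGHT = [1.0, 0.0, 0.0, 0.0]
HRIR_LEFT = [0.0, 0.0, 0.0, 0.5]

if __name__ == "__main__":
    click = [1.0, 0.0]  # unit impulse as a test signal
    left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
    print("left: ", left)
    print("right:", right)
```

With a unit impulse as input, each output channel simply reproduces the corresponding HRIR, which is a quick way to check that the two ears have not been swapped.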
Figure 3-21. Cone of confusion for a spherical head with
two holes at the ear positions. If only ITD cues are available,
the listener cannot discriminate positions on the surface of
the cone of confusion, corresponding to a given ITD. If ILD
cues are also available, due to the diffraction of the head,
the listener can further limit the confusion range to a circle
(the dark “donut” in the figure).
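For a rigid spherical head, the ITD of a distant source is often approximated by Woodworth's formula, ITD = (a/c)(θ + sin θ), where a is the head radius, c is the speed of sound, and θ is the lateral angle of the source. The sketch below uses it to show that a frontal source and its rear mirror image yield the same ITD. The formula is not taken from the text above, and the head radius of 8.75 cm, the speed of sound of 343 m/s, the azimuth convention (clockwise from straight ahead), and the function name are all assumptions made for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at roughly room temperature

def woodworth_itd(azimuth_deg):
    """ITD in seconds for a distant source in the horizontal plane.

    Azimuth is in degrees, clockwise from straight ahead. A positive ITD
    means the sound reaches the right ear first.
    """
    az = math.radians(azimuth_deg)
    # Fold front and rear azimuths onto the lateral angle in [-90, 90] degrees;
    # positions that share a lateral angle lie on the same cone of confusion.
    lateral = math.asin(math.sin(az))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (lateral + math.sin(lateral))

if __name__ == "__main__":
    for az in (30.0, 150.0, 90.0):
        print(f"{az:6.1f} deg -> {woodworth_itd(az) * 1e6:.0f} microseconds")
```

Because only the lateral angle enters the formula, an ITD-only listener in this model has no way to separate 30° (front right) from 150° (rear right).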
Figure 3-22. Head-related transfer functions. Each curve
shows the filtering feature (i.e., the gain added by the external
ear at each frequency) for one incident angle. This figure
shows orientations in the horizontal plane. The angles
are referenced to the medial sagittal plane, ipsilateral to the
ear. The angle of 0° is straight ahead of the subject.
[Figure 3-22 plot: Gain–dB (−5 to 25) versus Frequency–kHz (0.2 to 12); curves labeled 0°, +45°, +90°, and +135°.]