frequency of the two vowels is made to differ, then vowel
recognition often improves. Because a change in the fundamental
voicing frequency also alters the harmonic structure
of each vowel, this result suggests that harmonicity does sup-
port sound source segregation.
Spectral Profile
Most of the time, the identifiable properties of a sound source
are level independent. For instance, an uttered sentence has
about the same intelligibility at many different overall loudness
levels. Thus, the overall spectral-temporal structure of
the sound from a source, which remains constant as overall
level is varied, is the important determinant of sound-source
identification. An area of study referred to as profile analysis
(Green, 1989) has been used to study this property of audi-
tory perception.
In a typical profile analysis experiment, several tones that
are all of the same level but of different frequencies are mixed
together. The frequency spacing of the tones is usually loga-
rithmic to avoid generating sounds with harmonic structure
that may have complex pitches. The level of a tone in the
spectral middle of the complex (the signal tone) is increased,
and the level of this signal tone required for the complex with
the signal to be just discriminable from the complex with
all tones equal in level is measured in several different condi-
tions. The key aspect of these profile studies is that the overall
level of both complexes is randomly varied across stimulus
presentations over a large range, such as 40 dB. The random
variation would affect two possible cues for detection. If
detection were based on just attending to the signal tone,
the overall random-level variation would require a very large
signal-level increment for discrimination. An increase in the
level of the signal will increase the overall level of the com-
plex as compared to the complex in which all tones are pre-
sented at the same level. Thus, overall level (or loudness)
could be a cue for detection. However, the random overall
level variation would again require a very large signal-level
increment if this loudness cue were to be the basis for
discrimination. If, on the other hand, listeners could use the
level of the signal relative to the level of the other tones in
the complex, then the
random overall level variation would not affect this cue. The
complex with the signal increment would have a pointed spec-
tral profile, whereas the complex without the signal increment
would have a flat profile. Thus, if this spectral profile cue were
used, then discrimination between the signal and nonsignal
complexes might occur for small changes in signal level.
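The argument above can be checked numerically. The following is a minimal Python sketch, not the procedure used in the original experiments: the function names are hypothetical, and the 200-5000 Hz frequency range, the 60-dB component level, and the 2-dB increment are illustrative assumptions (only the logarithmic spacing, the 40-dB rove, and the 11-tone complex come from the text). It shows that the signal-tone level and the overall level both shift with the rove, whereas the level of the signal relative to the other tones does not.

```python
import math
import random


def log_spaced_frequencies(n_tones, f_low=200.0, f_high=5000.0):
    """Return n_tones frequencies (Hz) equally spaced on a logarithmic axis.

    The 200-5000 Hz range is an assumption made for this sketch.
    """
    ratio = (f_high / f_low) ** (1.0 / (n_tones - 1))
    return [f_low * ratio ** k for k in range(n_tones)]


def component_levels(n_tones, base_db, increment_db, rove_db):
    """Level (dB) of each tone: rove_db is added to every tone,
    increment_db only to the middle (signal) tone."""
    levels = [base_db + rove_db] * n_tones
    levels[n_tones // 2] += increment_db
    return levels


def overall_level_db(levels):
    """Overall level of the complex: sum the component powers, convert to dB."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels))


def profile_cue_db(levels):
    """Signal-tone level minus the mean level of the other tones (dB)."""
    mid = len(levels) // 2
    others = levels[:mid] + levels[mid + 1:]
    return levels[mid] - sum(others) / len(others)


if __name__ == "__main__":
    random.seed(0)
    n_tones, base_db, increment_db = 11, 60.0, 2.0  # illustrative values
    print("middle frequency (Hz):", round(log_spaced_frequencies(n_tones)[n_tones // 2], 1))
    for _ in range(5):
        rove = random.uniform(-20.0, 20.0)  # 40-dB roving range, as in the text
        flat = component_levels(n_tones, base_db, 0.0, rove)
        sig = component_levels(n_tones, base_db, increment_db, rove)
        print(
            f"rove {rove:+6.1f} dB | signal-tone level {sig[n_tones // 2]:6.1f} dB | "
            f"overall {overall_level_db(sig):6.1f} dB | "
            f"profile cue: flat {profile_cue_db(flat):.2f}, signal {profile_cue_db(sig):.2f}"
        )
```

Across presentations the first two quantities change by as much as the rove itself, so a small increment is buried in the level variation, while the profile cue stays at the size of the increment for the signal complex and at zero for the flat complex.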
The data in Figure 5.14 suggest that such a spectral profile
cue is being used. The level of the tonal signal required for
detection of the signal increment is shown as a function of
the total number of tones in the complex. With 11 tones in the
complex, the threshold is about the same as when
the signal is presented in isolation, without any flanking tones.
When the complex contains fewer or more than 11 tones,
thresholds are higher. When a large number of tones
fit into the same bandwidth, the tones are so close together
that they directly interact, so that tones near the signal frequency
mask the signal. The increase in threshold with increases in
tonal density is consistent with other masking data. When
there are only a few tones in the complex, it is argued that the
profile is difficult to determine; for example, there is a large
spectral difference between the signal tone and its nearest
neighbor, making it difficult to discern the spectral profile. A
model of listener performance, based on how listeners weigh
the spectral information in these tonal profile complexes,
suggests that listeners do monitor the spectral profile of these
stimuli as the basis for their discrimination judgments
(Green, 1989).
Experiments like these profile experiments suggest that the
auditory system is very sensitive to subtle changes in the spec-
tral shape of complex signals. Thus, sounds from different
sources can be segregated based on changes in spectral shape.
Note that the use of spectral shape requires the auditory sys-
tem to process information across a wide spectral range.
Figure 5.14 The results from a profile analysis experiment in which the
number of masker frequency components surrounding a 1000-Hz signal
component increased from 4 to 42. The thresholds for detecting an increment
in the 1000-Hz signal component (the center component) are shown in
decibels relative to the intensity of the masker components. The
asterisk on the far left indicates the typical threshold for detecting a level
increment in a single 1000-Hz tone. Thresholds for the 10-masker condition
are almost as low as those for the single-tone condition, and the thresholds
first decrease and then increase as the number of masker components
increases from 4 to 42. Source: From Yost (2000), based on data from Green
(1989), with permission.