Handbook of Psychology, Volume 4: Experimental Psychology

Audition

Common Onsets and Offsets

Although sounds from different sources often occur at about the same time, one sound may come on or go off at a slightly different time than another. When this happens, all of the spectral-temporal characteristics of one sound come on and go off at a different time than those of the other sound. Thus, the common onset or offset of these spectral-temporal cues could be used for sound source segregation.
Asynchronous onsets, and in some cases offsets, have been shown to provide powerful cues for sound source segregation (Yost & Sheft, 1993). In some cases, onset cues can amplify other cues that might be used for sound source segregation. As described above in the section on pitch, a harmonic sequence can produce a complex pitch equal to the fundamental frequency of the complex. If two complexes with different fundamentals are mixed, listeners in most conditions do not perceive the two pitches corresponding to the two original fundamental frequencies. Instead, the spectral characteristics of the new complex, which consists of the mixture of the two harmonic sequences, appear to be analyzed as a whole (synthetically). However, if one of the harmonic complexes is turned on slightly (e.g., 50 ms) before the other, listeners often perceive the two pitches (Darwin, 1981), even though the two complexes occur together for most of their duration (perhaps a second).
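The onset-asynchrony stimulus described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Darwin's (1981) stimulus-generation procedure: the equal-amplitude sine-phase harmonics, the number of harmonics, the sampling rate, and the 200-Hz and 240-Hz fundamentals are all hypothetical choices. The function names `harmonic_complex` and `mix_with_onset_asynchrony` are likewise illustrative. The sketch only builds the waveform; it does not model how listeners hear it.

```python
import math


def harmonic_complex(f0, n_harmonics, duration_s, fs, onset_s=0.0):
    """Equal-amplitude, sine-phase harmonic complex with fundamental f0 (Hz).

    The complex is silent until onset_s and then sounds until duration_s,
    so all of its harmonics turn on at the same instant (a common onset).
    """
    n_samples = int(round(duration_s * fs))
    onset_index = int(round(onset_s * fs))
    samples = []
    for i in range(n_samples):
        if i < onset_index:
            samples.append(0.0)  # before this complex's onset
            continue
        t = (i - onset_index) / fs  # time since this complex's onset
        value = sum(math.sin(2.0 * math.pi * k * f0 * t)
                    for k in range(1, n_harmonics + 1))
        samples.append(value / n_harmonics)
    return samples


def mix_with_onset_asynchrony(f0_a, f0_b, lead_s, duration_s,
                              fs=16000, n_harmonics=10):
    """Sum two harmonic complexes; complex A leads complex B by lead_s seconds.

    Both complexes end together at duration_s, so they overlap for
    duration_s - lead_s seconds.
    """
    a = harmonic_complex(f0_a, n_harmonics, duration_s, fs)
    b = harmonic_complex(f0_b, n_harmonics, duration_s, fs, onset_s=lead_s)
    return [x + y for x, y in zip(a, b)]


# Hypothetical example: 200-Hz and 240-Hz complexes, A leading by 50 ms,
# 1 s total duration.
mixture = mix_with_onset_asynchrony(200.0, 240.0, lead_s=0.05, duration_s=1.0)
```

Setting `lead_s=0.0` gives the synchronous mixture that, according to the text, is usually heard with a single synthetic pitch. The 50-ms lead is the only difference between the two conditions.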

Common Modulation

Most everyday sound sources impart slow amplitude and frequency modulation (change) to the overall spectral-temporal properties of the sound they produce. Each sound source produces a different pattern of modulation, and these modulation patterns may allow for sound source segregation (Yost & Sheft, 1993). When a person speaks, the vocal cords open and close in a nearly periodic manner that determines the pitch of the voice (see Fowler's chapter in this volume). However, the frequency of these glottal openings varies slightly (frequency modulation, voicing vibrato), and the amplitude of air released by each opening also varies randomly over a small range (amplitude modulation, voicing jitter). Each person has a different pattern of vibrato and jitter. Speech can be synthesized by computer (see Fowler's chapter in this volume) with constant glottal frequency and amplitude. If two such constant speech sounds are generated and mixed, it is often difficult to segregate the mixture into the two different speech signals. However, if random variation (random vibrato and jitter) is introduced into the computer-generated glottal openings and closings, segregation can occur (McAdams, 1984).
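The difference between "constant" and randomly perturbed synthetic voicing can be sketched as a train of glottal pulses. This is a hypothetical illustration, not McAdams's (1984) synthesis method. It assumes uniform random perturbations and a bare pulse-train representation with no vocal-tract filtering. The function name `glottal_pulses`, the perturbation depths, and the 120-Hz fundamental are illustrative choices. In the text's terms, `freq_perturbation` corresponds to vibrato and `amp_perturbation` to jitter.

```python
import random


def glottal_pulses(f0, n_pulses, freq_perturbation=0.0, amp_perturbation=0.0,
                   seed=None):
    """Onset times (s) and amplitudes of a synthetic glottal pulse train.

    With both perturbation depths at 0.0 the train is strictly periodic with
    constant amplitude, like the constant synthetic speech described above.
    Otherwise each period is scaled by an independent uniform random factor in
    [1 - freq_perturbation, 1 + freq_perturbation] (random frequency
    modulation), and each pulse amplitude is drawn uniformly from
    [1 - amp_perturbation, 1 + amp_perturbation] (random amplitude modulation).
    """
    rng = random.Random(seed)
    nominal_period = 1.0 / f0
    times, amplitudes = [], []
    t = 0.0
    for _ in range(n_pulses):
        times.append(t)
        amplitudes.append(1.0 + rng.uniform(-amp_perturbation, amp_perturbation))
        t += nominal_period * (1.0 + rng.uniform(-freq_perturbation, freq_perturbation))
    return times, amplitudes


# Two hypothetical "talkers" with the same mean f0 but different random
# perturbation patterns (different seeds).
talker_1 = glottal_pulses(120.0, 240, freq_perturbation=0.02, amp_perturbation=0.05, seed=1)
talker_2 = glottal_pulses(120.0, 240, freq_perturbation=0.02, amp_perturbation=0.05, seed=2)
```

Using a different seed for each talker mirrors the point that each person has a different pattern of variation. With both depths left at zero, the two trains would be identical apart from any difference in `f0`.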

Thus, common amplitude and frequency modulation are possible cues for sound source segregation. However, frequency modulation per se is probably not used for sound source segregation (Carlyon, 1991), whereas amplitude modulation most likely is. Two experimental procedures have been studied extensively to investigate the role of amplitude modulation in auditory processing: comodulation masking release (CMR) and modulation detection interference (MDI).

In a typical CMR experiment (Hall, Haggard, & Fernandes, 1984; Yost & Sheft, 1993), listeners are asked to detect a tonal signal spectrally centered in a narrow band of noise (the target band). Detection in this condition is compared with detection when another narrow band of noise (the flanking band) is simultaneously added in a different region of the spectrum. If the target and flanking bands are completely independent, adding the flanking band has little effect on the signal threshold. This is consistent with the critical-band view of auditory processing: the flanking band falls outside the critical band centered on the target band and therefore should have little influence on detection of the signal within the target band. However, if the target and flanking bands have the same pattern of amplitude modulation (i.e., they are comodulated), the signal threshold is lowered by 10–15 dB. This improvement in threshold due to comodulation is called CMR, and results from a typical experiment are shown in Figure 5.15. The CMR results suggest that common modulation increases the listener's ability to detect the signal. One explanation assumes that comodulation groups the flanking and target bands into a single perceived sound source that contains more information than either band alone.
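What "comodulated" means for the two maskers can be made concrete with a short sketch. It assumes each narrow band is generated as "multiplied noise": a tone multiplied by a slowly varying (low-pass) random modulator, whose envelope is the magnitude of that modulator. This is one common way to build such bands but is an assumption here, not necessarily the procedure behind Figure 5.15. The 1000-Hz target, 3000-Hz flank, running-mean low-pass filter, and all function names are hypothetical. The running mean is a crude filter, so the bands are only approximately narrow. The sketch leaves out the tonal signal and any model of detection; it shows only that comodulated bands share one envelope while independent bands do not.

```python
import math
import random


def lowpass_noise(n, window, rng):
    """Crude low-pass Gaussian noise: running mean of white noise over `window` samples."""
    white = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    out, acc = [], sum(white[:window])
    for i in range(n):
        out.append(acc / window)
        if i + window < len(white):
            acc += white[i + window] - white[i]  # slide the window one sample
    return out


def multiplied_noise(center_hz, modulator, fs):
    """Narrowband noise centered on center_hz: a tone times a low-pass modulator.

    The envelope of this band is |modulator|.
    """
    return [m * math.sin(2.0 * math.pi * center_hz * i / fs)
            for i, m in enumerate(modulator)]


def cmr_masker(comodulated, target_hz=1000.0, flank_hz=3000.0,
               n=16000, fs=16000, window=80, seed=0):
    """Return (target_band, flanking_band, target_envelope, flanking_envelope).

    Comodulated: both bands share one modulator, hence one envelope pattern.
    Independent: each band gets its own, statistically independent modulator.
    """
    rng = random.Random(seed)
    m_target = lowpass_noise(n, window, rng)
    m_flank = m_target if comodulated else lowpass_noise(n, window, rng)
    target = multiplied_noise(target_hz, m_target, fs)
    flank = multiplied_noise(flank_hz, m_flank, fs)
    return target, flank, [abs(m) for m in m_target], [abs(m) for m in m_flank]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Envelope correlation between target and flanking bands in each condition.
for comod in (True, False):
    _, _, env_t, env_f = cmr_masker(comod)
    print("comodulated" if comod else "independent", round(pearson(env_t, env_f), 3))
```

In the comodulated case the envelope correlation is 1 (up to rounding), so the low-amplitude valleys of the two bands line up in time. In the independent case it is close to 0. That envelope relationship is the only property that distinguishes the two masker conditions in this sketch.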
Independent (non-comodulated) bands of noise would not come from a single sound source and therefore would not be grouped together. The additional information in the combined (grouped) sound might aid signal detection. For instance, it might make the low-amplitude valleys in the modulated noises more obvious, increasing the auditory system's ability to detect the tone occurring in those valleys. In addition, adding the signal changes the correlation between the signal-plus-masker stimulus and the masker-alone stimulus, and the combined stimulus may enhance this change in correlation, improving signal detection.

In an MDI condition (Yost, 1992b; Yost & Sheft, 1993), listeners are asked to discriminate between two amplitude-modulated tonal carriers (the probe stimuli) on the basis of their depth of amplitude modulation. Threshold performance is typically a 3% change in modulation depth. If a tone of a different frequency and not
