Handbook for Sound Engineers

Designing for Speech Intelligibility

36.1 Introduction


The fundamental purpose of a paging, announcement,
voice alarm, or speech reinforcement system is to
deliver intelligible speech to the listener. A surprising
number of systems, however, fail to achieve this basic
goal. There can be many reasons for this, ranging from
inadequate signal-to-noise ratio to poor room acoustics
or inappropriate choice or location of the loudspeaker. It
is the job of the sound system designer to be aware of
these factors and take them into account when designing
a sound system and selecting devices to provide the
degree of intelligibility required. In order to do this,
however, an understanding of the basic factors that
affect speech intelligibility and the way we hear speech
is required. This chapter therefore begins by looking at
the nature of the speech signal and how we hear it
before discussing design strategies and ways of
optimizing system design and performance. Current
methods of assessing and measuring intelligibility are
then discussed, together with comments on their
practical limitations.


36.2 Parameters Affecting Speech Intelligibility


Although sound quality and speech intelligibility are
inextricably linked, they are not the same thing. For
example, it is quite possible to have a poor-sounding
system that is highly intelligible (e.g., a re-entrant horn
with a limited and resonant frequency response) or,
alternatively, a high-quality system that is virtually
unintelligible (e.g., a hi-fi loudspeaker in an aircraft hangar).
Similarly, a common mistake, often made when
discussing intelligibility, is to confuse audibility with
clarity. Just because a sound is audible does not mean
that it is intelligible. Audibility relates to the ability
of a listener to physically hear a sound,
whereas clarity describes the ability to detect the
structure of the sound. In the case of speech, this means
hearing the consonants and vowels correctly in order to
identify the words and sentence structure and so give
the speech sounds intelligible meaning.


36.3 The Nature of Speech


A speech signal involves the dimensions of sound
pressure, time, and frequency. Fig. 36-1 shows some typical
speech waveforms representing the numbers “one,”
“two,” and “three.” The waveforms are highly complex,
with amplitudes and frequency contents that change
almost millisecond by millisecond. Consonant sounds
typically have durations of around 65 ms and vowels
100 ms. The duration of syllables is typically
300–400 ms, whereas complete words are about
600–900 ms in length, depending on their complexity
and the rate of speech. When speech is transmitted into a
reverberant space, local reflections and the general
reverberation distort the speech waveform by smearing
it in time. The reverberant tail of one syllable or word
can overhang the start of the next and so mask it,
thereby reducing the potential clarity and intelligibility
(Fig. 36-2). Equally, if the background noise level is high
or, more accurately, if the speech signal-to-noise ratio is
too low, then parts of words or syllables are again
lost and intelligibility deteriorates (Fig. 36-3). There are
many other factors that can affect the potential
intelligibility and perceived clarity of a speech signal; the most
important are summarized below.

Figure 36-1. Anechoic speech waveforms for the numbers
“one,” “two,” and “three.”

Figure 36-2. Speech waveforms (as Fig. 36-1) but with
reverberation (RT60 = 2.4 s). The way one word runs into
the next can clearly be seen, but with concentration the
individual words can still be understood.

[Figure plots: Input data (V) versus Time (ms), 0 to 1500 ms.]
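
To give a feel for why the reverberant tail of one syllable overhangs the next, the short Python sketch below estimates how far the tail has decayed after the typical speech durations quoted in the text, using the reverberation time from the caption of Fig. 36-2. It assumes an idealized decay that is linear in decibels (60 dB in one RT60), which real rooms only approximate; the function name decay_db is hypothetical and chosen purely for illustration.

```python
def decay_db(rt60_s: float, elapsed_s: float) -> float:
    """Level drop in dB of an idealized reverberant tail after elapsed_s seconds.

    Assumption: the decay is linear in dB, falling 60 dB in one RT60.
    """
    # The decay rate is 60 / RT60 dB per second.
    return 60.0 * elapsed_s / rt60_s


RT60_S = 2.4  # reverberation time quoted in the caption of Fig. 36-2

# Typical durations from the text; lower ends of the quoted ranges
# are used for syllables and words.
durations_ms = {"consonant": 65, "vowel": 100, "syllable": 300, "word": 600}

for name, ms in durations_ms.items():
    drop = decay_db(RT60_S, ms / 1000.0)
    print(f"{name:9s} {ms:4d} ms: tail has decayed by {drop:.1f} dB")
```

Under this assumption, the tail of a 300 ms syllable has fallen by only 7.5 dB by the time the next syllable would begin, so it remains loud enough to mask weaker sounds that follow.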
