Table 2 Summary of Relevant ITU-T Coding Standards: Quality and Complexity
Are Indicated by Asterisks, Where a Single Asterisk Means Low, and More
Asterisks Mean Increasing Complexity or Quality

Coder    Algorithm  Rate (kb/s)     Frame size (samples)  Complexity  Quality
G.711    Mu/A-Law   64              1                     *           *****
G.726    ADPCM      32, 40, 16, 24  1                     **          *****
G.727    ADPCM      32, 40, 16, 24  1                     **          *****
G.728    LD-CELP    16              5                     *****       *****
G.729    CS-ACELP   8               80                    ****        ****
G.723.1  MP-MLQ     6.3             240                   ****        ***
         ACELP      5.3             240                   ****        **

The International Telecommunication Union, ITU-T, establishes
worldwide telephony and communication standards, whereas
regional standards bodies such as ETSI and TIA define
standards that are more regional in character (e.g.,
wireless standards). For Internet applications there is a
proliferation of ITU-T standards and protocols. Table 2
summarizes the most relevant ITU-T speech coding
standards.
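The rate and frame-size columns of Table 2 together determine how long each frame lasts and how many bits the coder spends on it. The following is a minimal sketch of that arithmetic, assuming the frame sizes are expressed in samples at the 8-kHz telephony sampling rate; the names `SAMPLE_RATE_HZ`, `CODERS`, `frame_duration_ms`, and `bits_per_frame` are introduced here for illustration only.

```python
# Assumption: Table 2 frame sizes are in samples at an 8-kHz sampling rate.
SAMPLE_RATE_HZ = 8000

# (coder, nominal rate in kb/s, frame size in samples), copied from Table 2.
CODERS = [
    ("G.711", 64.0, 1),
    ("G.728", 16.0, 5),
    ("G.729", 8.0, 80),
    ("G.723.1 MP-MLQ", 6.3, 240),
    ("G.723.1 ACELP", 5.3, 240),
]


def frame_duration_ms(frame_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


def bits_per_frame(rate_kbps, frame_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bits available per frame at the nominal bit rate."""
    return rate_kbps * 1000.0 * frame_samples / sample_rate_hz


for name, rate, frame in CODERS:
    print(f"{name:15s} {frame_duration_ms(frame):7.3f} ms "
          f"{bits_per_frame(rate, frame):6.1f} bits/frame")
```

Under these assumptions G.729 spends 80 bits on each 10-ms frame, whereas G.711 spends 8 bits on every sample. Because the rates in Table 2 are rounded nominal values, the results for G.723.1 are approximate.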
G.729 is one of the more commonly used coders in
VoIP applications. Besides the main standard there are
many extensions, which in ITU-T terms are referred to
as annexes. Annex A, for example, is a low-complexity
version of G.729 that generates bit-compatible output,
so it interoperates with the main standard. It requires
only about half the computational complexity, at the
expense of a minor degradation in speech quality. Other
annexes of G.729 define a lower-bit-rate version (Annex D),
a higher-bit-rate version (Annex E), and integration with
voice activity detection (Annex B).
Most of the discussion so far has focused on coders for an
8-kHz sampling rate, which limits the audio bandwidth
to 4 kHz. With the availability of new endpoints, it is now
more feasible to provide higher-quality output, specifically
increased audio bandwidth. A commonly used sampling
rate is 16 kHz, which supports audio bandwidths up to
8 kHz. Table 3 summarizes several ITU-T standards that
have been defined for encoding these so-called wideband
signals.
Most of these ITU-T standards not only come with
detailed technical descriptions of their underlying
algorithms but also are accompanied by reference source
code and test vectors. The source code is provided as
floating-point C code, fixed-point C code, or both.
Fixed-point code reflects implementation on signal
processors and VLSI chips, which are typically used in
portable devices.
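To illustrate what fixed-point code involves, the sketch below models Q15 arithmetic, a common 16-bit fractional format on signal processors, using plain Python integers. It is an illustration only and does not reproduce the operators used in the ITU-T reference code; the Q15 format and the names `to_q15`, `from_q15`, `saturate`, and `q15_mul` are assumptions introduced here.

```python
# Assumption: Q15 format, where a 16-bit signed integer q represents q / 32768.
Q15_ONE = 1 << 15        # scale factor (1.0 would be 32768)
Q15_MAX = Q15_ONE - 1    # largest 16-bit signed value, 32767
Q15_MIN = -Q15_ONE       # smallest 16-bit signed value, -32768


def saturate(x):
    """Clamp an integer to the 16-bit signed range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, x))


def to_q15(x):
    """Convert a float in [-1, 1) to its nearest Q15 integer."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values: 32-bit product, rounded, rescaled, saturated."""
    return saturate((a * b + (1 << 14)) >> 15)


half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))
```

The explicit rounding and saturation steps are what distinguish fixed-point code from its floating-point counterpart: every intermediate result must be kept within the word length of the target processor.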
Although standards make the technology readily
available, they are not license-free. In most cases proper royalty
agreements must be obtained before one can use
the coder in a commercial application.

AUDIO CODING TECHNIQUES
The most common high-quality audio distribution for-
mat is based on the compact disc format introduced in
the early 1980s. The signal is encoded using PCM with
16 bits/sample at a 44.1-kHz sampling rate. For stereo
signals this means a data rate of 44,100 × 16 × 2 ≈ 1.41
Mb/s. More recent formats such as DVD-audio support up
to 24 bits/sample, multichannel formats (e.g., 5.1), and
sampling rates up to 192 kHz, resulting in even higher
data rates. For most practical purposes these signals will
be used as digitized source signals. For Internet streaming
and computer storage applications, it is necessary to re-
duce these rates significantly and to bring them into the 32
to 128 kb/s range. As discussed earlier, this can be accom-
plished by the use of perceptually lossless coders, which
take advantage of the limitations of our auditory system.
For CD quality it is possible to make signals sound per-
ceptually indistinguishable from the original at 64 kb/s
per channel (128 kb/s for stereo). At lower rates we lose
some of the information, but if this is done by proper
combinations of bandwidth reduction, reduced dynamic
range, and the use of mono instead of stereo, the resulting
signal will still be acceptable for many applications.
Perceptually lossy and lossless compression uses two main
techniques. First we have irrelevancy removal, which
removes parts of the signal that we cannot hear. The second
technique, redundancy removal, finds the most compact
signal representation.
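The data rates quoted in this section follow directly from the sampling rate, the word length, and the number of channels. The following minimal sketch reproduces the CD figure and the compression ratios implied by the 32 to 128 kb/s target range; the function names `pcm_rate_bps` and `compression_ratio`, and the choice of a 192-kHz, 24-bit stereo configuration as a high-resolution example, are assumptions made for illustration.

```python
def pcm_rate_bps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps, coded_bps):
    """How many times smaller the coded stream is than the source."""
    return source_bps / coded_bps


cd_stereo = pcm_rate_bps(44_100, 16, 2)       # compact disc, stereo
hi_res_stereo = pcm_rate_bps(192_000, 24, 2)  # illustrative high-resolution case

print(f"CD stereo:     {cd_stereo / 1e6:.2f} Mb/s")
print(f"192 kHz/24b:   {hi_res_stereo / 1e6:.2f} Mb/s")

# Ratios needed to reach the streaming range mentioned in the text.
for target_kbps in (128, 64, 32):
    ratio = compression_ratio(cd_stereo, target_kbps * 1000)
    print(f"CD -> {target_kbps:3d} kb/s: about {ratio:.1f}:1")
```

Coding CD stereo at 128 kb/s therefore corresponds to a reduction of roughly 11:1, and reaching 32 kb/s requires roughly 44:1, which is why the lower rates rely on reduced bandwidth, reduced dynamic range, or mono.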
Table 3 ITU-T Wideband Coders for 16-kHz Sampling Rate

Coder    Algorithm  Rate (kb/s)      Frame size (samples)  Complexity  Quality
G.722    ADPCM      64, 56, 48       1                     *           *****
G.722.1  Transform  32, 24           320                   ***         ****
G.722.2  CELP       15.85, 6.6–23.05 320                   ****        ***** to ****

Irrelevancy removal exploits the properties of the
human auditory system. The human auditory system is a
highly sophisticated system with tremendous
capabilities. It acts as a converter of acoustic waves to auditory
nerve firings, while performing a spectral analysis as part