
Figure 13: Block diagram of a generic audio decoder: the bit stream passes through noiseless decoding, an inverse quantizer, and a synthesis filterbank (one chain per channel) to produce the audio output.

operation. The resulting bit rate will be variable and signal-dependent. In many implementations an iterative procedure is used to find the quantizer step sizes that keep the coding noise below the masked threshold while yielding the lowest possible bit rate.
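As a rough illustration of such a loop, the sketch below searches for a single global step size; this is a simplification, since practical coders adjust step sizes per scale-factor band, and the helper estimate_bits is only a crude stand-in for the real noiseless-coding cost.

```python
import numpy as np

def estimate_bits(q):
    """Crude stand-in for the entropy cost of the quantized symbols."""
    vals, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return int(np.ceil(-(counts * np.log2(p)).sum()))

def choose_step_size(coeffs, masked_threshold, bit_budget,
                     initial_step=0.5, growth=1.1, max_iters=100):
    """Illustrative rate/distortion loop: grow the step size until the frame
    fits the bit budget, then check whether the resulting coding noise still
    lies below the masked threshold."""
    step = initial_step
    for _ in range(max_iters):
        q = np.round(coeffs / step)          # uniform quantization (simplified)
        rec = q * step                       # reconstructed coefficients
        noise = np.abs(coeffs - rec) ** 2    # per-coefficient noise power
        bits = estimate_bits(q)
        if bits <= bit_budget:
            audible = bool(np.any(noise > masked_threshold))
            return step, bits, audible       # caller may flag audible distortion
        step *= growth                       # coarser quantizer -> fewer bits
    return step, bits, True
```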
The decoder performs these operations in reverse, without the need for a perceptual model. A generic block diagram of an audio decoder is shown in Figure 13. After the coefficients are reconstructed, the signal is transformed back to the time domain and is ready for playback.
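A minimal sketch of this three-stage decoder chain follows; the pass-through noiseless_decode stub and the bare, non-overlapped IMDCT are simplifying assumptions rather than any particular standard's processing.

```python
import numpy as np

def noiseless_decode(frame_symbols):
    """Stand-in for Huffman/arithmetic decoding; here the symbols are assumed
    to be already unpacked integer quantizer indices."""
    return np.asarray(frame_symbols, dtype=float)

def inverse_quantize(indices, step_size):
    """Map quantizer indices back to spectral coefficient estimates."""
    return indices * step_size

def synthesis_filterbank(coeffs):
    """Bare IMDCT as an illustrative synthesis filterbank; a real decoder adds
    windowing and overlap-add with the previous block."""
    n = len(coeffs)
    k = np.arange(n)
    t = np.arange(2 * n)
    basis = np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
    return (2.0 / n) * basis @ coeffs

def decode_frame(frame_symbols, step_size):
    coeffs = inverse_quantize(noiseless_decode(frame_symbols), step_size)
    return synthesis_filterbank(coeffs)
```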
The block diagrams of Figures 12 and 13 show the principle for coding a single audio channel. For encoding multiple channels (N channels) one could in principle use N of these encoder/decoder pairs. In practice, however, one would like to take advantage of the correlations that exist between the various channels. Also, for transparent coding, one has to take into account that masking levels differ for signals that are spatial in nature: some distortions that are inaudible in each individual channel become audible when listening to the multichannel (e.g., stereo) version.
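One common way to exploit inter-channel correlation in stereo material is to code the sum and difference (mid/side) signals instead of left and right; when the two channels are similar, nearly all the energy ends up in the mid signal and the side signal is cheap to code. A minimal sketch:

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side transform: highly correlated channels yield a low-energy
    side signal that costs few bits to code."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """Exact inverse of the mid/side transform."""
    return mid + side, mid - side

# Example: nearly identical channels -> almost all energy ends up in 'mid'.
t = np.arange(1024)
left = np.sin(0.01 * t)
right = 0.98 * np.sin(0.01 * t)
mid, side = ms_encode(left, right)
print(np.sum(side**2) / np.sum(mid**2))   # small ratio for correlated channels
```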
Most state-of-the-art audio coders produce a variable bit rate, whereas in most communication scenarios a fixed bit rate is more desirable. A fixed rate is accomplished by a buffering scheme that interacts with the coder's quantization decisions. Designing buffering schemes that minimize both the buffer size (and its corresponding delay) and the impact on audio quality turns out to be a challenge, and various solutions exist, each with advantages and disadvantages.
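The sketch below illustrates one possible feedback rule of this kind: coded frames of variable size enter a buffer that drains at the fixed channel rate, and the bit budget offered to the encoder shrinks as the buffer fills. The constants and the specific rule are illustrative assumptions, not a standardized scheme.

```python
def run_buffer_control(frame_bits, channel_bits_per_frame, buffer_size):
    """Illustrative constant-bit-rate buffering: track buffer fullness and
    return the per-frame bit budgets the encoder would be offered."""
    fullness = 0
    budgets = []
    for produced in frame_bits:
        fullness += produced - channel_bits_per_frame  # fill with coded frame, drain at channel rate
        fullness = max(fullness, 0)                    # underflow handled by stuffing bits (ignored here)
        if fullness > buffer_size:
            raise RuntimeError("buffer overflow: encoder must quantize more coarsely")
        headroom = 1.0 - fullness / buffer_size
        budgets.append(int(channel_bits_per_frame * (0.5 + 0.5 * headroom)))
    return budgets

# Example: a burst of large frames pushes the budget for later frames down.
print(run_buffer_control([900, 1200, 1500, 800, 700],
                         channel_bits_per_frame=1000, buffer_size=2000))
```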

Audio Coding Standards
Two types of standards exist. The first type is sanctioned by a standards organization (e.g., ISO and its MPEG standards). The second type is a widely proliferated proprietary de facto standard (Windows Media Player, RealPlayer, Dolby AC-3, etc.). In both cases
proper licensing is needed to use these standards in com-
mercial applications. The MPEG coding standards are
widely used for audio and video. The well-known MP3
standard is actually MPEG-1, Layer 3. Table 4 summa-
rizes the MPEG standards.
The MPEG standards are defined differently from most ITU-T speech coding standards. They consist of a normative part defining the bit stream syntax and the decoder description (sometimes accompanied by reference C code). As a result, anyone should be able to implement a proper decoder. The encoder is covered in the informative part, which describes only the algorithmic concepts and operating modes. To build good encoders it is necessary to understand the underlying algorithms, and the standard document does not provide details on how to build a good encoder. Consequently, standard-compliant encoders can differ considerably in performance.
Besides coders based on the traditional coding
paradigm described above, other paradigms exist as well.
These alternative paradigms play a role in very-low-rate
coding. Two paradigms that have become part of the
MPEG-4 standards are structured audio and parametric
audio coding. Structured audio consists of a set of tools
that can be used to generate music in a way similar to a
music synthesizer. It also contains a structured way of de-
scribing musical scores and their subsequent conversion
to sound using synthesizers. A structured audio bit stream
describes how to build the synthesizers and provides the
musical score and information on how to play this on the
synthesizer. The resulting description can be very com-
pact and low-bit-rate (only several kb/s). The resulting au-
dio signal can be of very high quality, and because many
modern music productions use synthesizers extensively, it
can be very close to some originals. Parametric audio cod-
ing uses ideas similar to those used in speech compression
by modeling some of the production mechanisms of the
music sounds. Although it will work well for some sounds,
it is difficult to make this technique work in a consistent
way for a wide variety of input signals.
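To give a rough sense of why a structured description can be so compact, the toy sketch below renders a short score of (frequency, start, duration) triples with a trivial decaying-sine "instrument"; it only mimics the idea behind MPEG-4 Structured Audio and is not the actual SAOL/SASL machinery.

```python
import numpy as np

FS = 44100  # sampling rate in Hz

def instrument(freq, dur):
    """Trivial 'synthesizer': a decaying sine; a structured-audio bit stream
    would instead ship a description of a much richer instrument."""
    t = np.arange(int(dur * FS)) / FS
    return np.exp(-3.0 * t) * np.sin(2 * np.pi * freq * t)

# A compact 'score': (frequency in Hz, start time in s, duration in s).
score = [(440.0, 0.0, 0.5), (554.4, 0.5, 0.5), (659.3, 1.0, 1.0)]

out = np.zeros(int(2.5 * FS))
for freq, start, dur in score:
    note = instrument(freq, dur)
    i = int(start * FS)
    out[i:i + len(note)] += note   # mix each note in at its start time

# A few dozen bytes of score expand into a couple of seconds of audio.
```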

Table 4: Overview of Various MPEG Audio Standards

Standard | Year | Rates for transparency | Channels | Comments
MPEG-1 Audio, Layer I | 1992 | 384 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-1, Layer II | 1992 | 192-256 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-1, Layer III | 1992 | 128-160 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-2, Layers I, II, III | 1994 | See MPEG-1 rates | Mono, stereo, and backward-compatible 5.1 multichannel | MPEG-1 plus enhancements; supports lower sampling rates
MPEG-2, AAC | 1997 | 96-128 kb/s | Mono, stereo, multichannel up to 48 channels | Not compatible with MPEG-1/-2; supports 96-kHz sampling
MPEG-4 Version 1 | 1998 | 9-128 kb/s | Mono, stereo, multichannel | Supports various coding tools
MPEG-4 Version 2 | 1999 | 9-128 kb/s | Mono, stereo, multichannel | Supports error robustness tools