
Figure 13: Block diagram of a generic audio decoder: the bit stream passes through noiseless decoding, an inverse quantizer, and a synthesis filterbank (one chain per channel) to produce the audio output.

operation. The resulting bit rate will be variable and signal-dependent. In many implementations an iterative procedure is used to find the quantizer step sizes that keep the coding noise below the masked threshold while yielding the lowest possible bit rate.
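As a rough illustration of such a loop, the sketch below searches for a single global step size; this is a simplification, since practical coders adjust step sizes per scale-factor band, and the helper estimate_bits is only a crude stand-in for the real noiseless-coding cost.

```python
import numpy as np

def estimate_bits(q):
    """Crude stand-in for the entropy cost of the quantized symbols."""
    vals, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return int(np.ceil(-(counts * np.log2(p)).sum()))

def choose_step_size(coeffs, masked_threshold, bit_budget,
                     initial_step=0.5, growth=1.1, max_iters=100):
    """Illustrative rate/distortion loop: grow the step size until the frame
    fits the bit budget, then check whether the resulting coding noise still
    lies below the masked threshold."""
    step = initial_step
    for _ in range(max_iters):
        q = np.round(coeffs / step)          # uniform quantization (simplified)
        rec = q * step                       # reconstructed coefficients
        noise = np.abs(coeffs - rec) ** 2    # per-coefficient noise power
        bits = estimate_bits(q)
        if bits <= bit_budget:
            audible = bool(np.any(noise > masked_threshold))
            return step, bits, audible       # caller may flag audible distortion
        step *= growth                       # coarser quantizer -> fewer bits
    return step, bits, True
```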
The decoder performs these operations in reverse, without the need for a perceptual model. A generic block diagram of an audio decoder is shown in Figure 13. After the coefficients are reconstructed, the signal is transformed back to the time domain and is ready for playback.
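A minimal sketch of this three-stage decoder chain follows; the pass-through noiseless_decode stub and the bare, non-overlapped IMDCT are simplifying assumptions rather than any particular standard's processing.

```python
import numpy as np

def noiseless_decode(frame_symbols):
    """Stand-in for Huffman/arithmetic decoding; here the symbols are assumed
    to be already unpacked integer quantizer indices."""
    return np.asarray(frame_symbols, dtype=float)

def inverse_quantize(indices, step_size):
    """Map quantizer indices back to spectral coefficient estimates."""
    return indices * step_size

def synthesis_filterbank(coeffs):
    """Bare IMDCT as an illustrative synthesis filterbank; a real decoder adds
    windowing and overlap-add with the previous block."""
    n = len(coeffs)
    k = np.arange(n)
    t = np.arange(2 * n)
    basis = np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
    return (2.0 / n) * basis @ coeffs

def decode_frame(frame_symbols, step_size):
    coeffs = inverse_quantize(noiseless_decode(frame_symbols), step_size)
    return synthesis_filterbank(coeffs)
```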
The block diagrams of Figures 12 and 13 show the principle for coding a single audio channel. For encoding multiple channels (N channels) one could in principle use N of these encoder/decoder pairs. In practice, however, one would like to take advantage of the correlations that exist between the various channels. Also, for transparent coding, one has to take into account that masking levels differ for signals that are spatial in nature: some distortions that are inaudible in each individual channel become audible when listening to the multichannel (e.g., stereo) version.
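One common way to exploit inter-channel correlation in stereo material is to code the sum and difference (mid/side) signals instead of left and right; when the two channels are similar, nearly all the energy ends up in the mid signal and the side signal is cheap to code. A minimal sketch:

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side transform: highly correlated channels yield a low-energy
    side signal that costs few bits to code."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """Exact inverse of the mid/side transform."""
    return mid + side, mid - side

# Example: nearly identical channels -> almost all energy ends up in 'mid'.
t = np.arange(1024)
left = np.sin(0.01 * t)
right = 0.98 * np.sin(0.01 * t)
mid, side = ms_encode(left, right)
print(np.sum(side**2) / np.sum(mid**2))   # small ratio for correlated channels
```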
Most state-of-the-art audio coders produce a variable bit rate, whereas in most communication scenarios a fixed bit rate is more desirable. A fixed rate is accomplished by a buffering scheme that interacts with the coder's quantization decisions. Designing buffering schemes that minimize both the buffer size (and its corresponding delay) and the impact on audio quality turns out to be a challenge, and various solutions exist, each with advantages and disadvantages.
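The sketch below illustrates one possible feedback rule of this kind: coded frames of variable size enter a buffer that drains at the fixed channel rate, and the bit budget offered to the encoder shrinks as the buffer fills. The constants and the specific rule are illustrative assumptions, not a standardized scheme.

```python
def run_buffer_control(frame_bits, channel_bits_per_frame, buffer_size):
    """Illustrative constant-bit-rate buffering: track buffer fullness and
    return the per-frame bit budgets the encoder would be offered."""
    fullness = 0
    budgets = []
    for produced in frame_bits:
        fullness += produced - channel_bits_per_frame  # fill with coded frame, drain at channel rate
        fullness = max(fullness, 0)                    # underflow handled by stuffing bits (ignored here)
        if fullness > buffer_size:
            raise RuntimeError("buffer overflow: encoder must quantize more coarsely")
        headroom = 1.0 - fullness / buffer_size
        budgets.append(int(channel_bits_per_frame * (0.5 + 0.5 * headroom)))
    return budgets

# Example: a burst of large frames pushes the budget for later frames down.
print(run_buffer_control([900, 1200, 1500, 800, 700],
                         channel_bits_per_frame=1000, buffer_size=2000))
```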

Audio Coding Standards
Two types of standards exist. The first type is sanctioned by a standards organization (e.g., ISO and its MPEG standards). The second type is a widely proliferated proprietary de facto standard (Windows Media Player, RealPlayer, Dolby AC-3, etc.). In both cases
proper licensing is needed to use these standards in com-
mercial applications. The MPEG coding standards are
widely used for audio and video. The well-known MP3
standard is actually MPEG-1, Layer 3. Table 4 summa-
rizes the MPEG standards.
The MPEG standards are defined differently from most ITU-T speech coding standards. They consist of a normative part defining the bit stream syntax and the decoder description (sometimes accompanied by reference C code). As a result, anyone should be able to implement a proper decoder. The encoder is covered in the informative part, which describes only the algorithmic concepts and operating modes. To build good encoders it is necessary to understand the underlying algorithms, and the standard document does not provide details on how to build a good encoder. Consequently, standard-compliant encoders can differ considerably in performance.
Besides coders based on the traditional coding
paradigm described above, other paradigms exist as well.
These alternative paradigms play a role in very-low-rate
coding. Two paradigms that have become part of the
MPEG-4 standards are structured audio and parametric
audio coding. Structured audio consists of a set of tools
that can be used to generate music in a way similar to a
music synthesizer. It also contains a structured way of de-
scribing musical scores and their subsequent conversion
to sound using synthesizers. A structured audio bit stream
describes how to build the synthesizers and provides the
musical score and information on how to play this on the
synthesizer. The resulting description can be very com-
pact and low-bit-rate (only several kb/s). The resulting au-
dio signal can be of very high quality, and because many
modern music productions use synthesizers extensively, it
can be very close to some originals. Parametric audio cod-
ing uses ideas similar to those used in speech compression
by modeling some of the production mechanisms of the
music sounds. Although it will work well for some sounds,
it is difficult to make this technique work in a consistent
way for a wide variety of input signals.
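To give a rough sense of why a structured description can be so compact, the toy sketch below renders a short score of (frequency, start, duration) triples with a trivial decaying-sine "instrument"; it only mimics the idea behind MPEG-4 Structured Audio and is not the actual SAOL/SASL machinery.

```python
import numpy as np

FS = 44100  # sampling rate in Hz

def instrument(freq, dur):
    """Trivial 'synthesizer': a decaying sine; a structured-audio bit stream
    would instead ship a description of a much richer instrument."""
    t = np.arange(int(dur * FS)) / FS
    return np.exp(-3.0 * t) * np.sin(2 * np.pi * freq * t)

# A compact 'score': (frequency in Hz, start time in s, duration in s).
score = [(440.0, 0.0, 0.5), (554.4, 0.5, 0.5), (659.3, 1.0, 1.0)]

out = np.zeros(int(2.5 * FS))
for freq, start, dur in score:
    note = instrument(freq, dur)
    i = int(start * FS)
    out[i:i + len(note)] += note   # mix each note in at its start time

# A few dozen bytes of score expand into a couple of seconds of audio.
```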

Table 4: Overview of Various MPEG Audio Standards

Standard | Year | Rates for transparency | Channels | Comments
MPEG-1 Audio, Layer I | 1992 | 384 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-1, Layer II | 1992 | 192-256 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-1, Layer III | 1992 | 128-160 kb/s stereo | Mono, stereo | 48-, 44.1-, 32-kHz sampling rates
MPEG-2, Layers I, II, III | 1994 | See MPEG-1 rates | Mono, stereo, and backward-compatible 5.1 multichannel | MPEG-1 plus enhancements; supports lower sampling rates
MPEG-2, AAC | 1997 | 96-128 kb/s | Mono, stereo, multichannel up to 48 channels | Not compatible with MPEG-1/-2; supports 96-kHz sampling
MPEG-4 Version 1 | 1998 | 9-128 kb/s | Mono, stereo, multichannel | Supports various coding tools
MPEG-4 Version 2 | 1999 | 9-128 kb/s | Mono, stereo, multichannel | Supports error robustness tools