590 Chapter 19
Applications for MPEG-4 audio might include “mix minus 1” applications, in which an
orchestra is recorded minus the concerto instrument, allowing a musician to play along
on his or her own instrument at home, or in which all effects and music tracks in a feature film
are “mix minus the dialogue,” allowing very flexible multilingual applications because
each language is a separate audio object and can be selected as required in the decoder.
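The decoder-side language selection can be pictured as a very small mixing step. The Python sketch below assumes that each audio object arrives already decoded as an equal-length list of floating-point samples. The names `mix` and `render_programme` are hypothetical illustrations, not part of any MPEG-4 interface. A real MPEG-4 decoder composes objects according to the transmitted scene description rather than with hand-written code like this.

```python
from typing import Dict, List

Signal = List[float]  # assumption: one decoded audio object as a list of samples


def mix(objects: List[Signal], gains: List[float]) -> Signal:
    """Sum equal-length audio objects sample by sample, each scaled by its gain."""
    length = len(objects[0])
    out = [0.0] * length
    for signal, gain in zip(objects, gains):
        if len(signal) != length:
            raise ValueError("all objects must have the same length")
        for i, sample in enumerate(signal):
            out[i] += gain * sample
    return out


def render_programme(music_and_effects: Signal,
                     dialogue: Dict[str, Signal],
                     language: str,
                     dialogue_gain: float = 1.0) -> Signal:
    """Mix the 'mix minus the dialogue' bed with one chosen dialogue object.

    Dialogue objects for the other languages are simply never mixed in.
    """
    if language not in dialogue:
        raise KeyError(f"no dialogue object for language {language!r}")
    return mix([music_and_effects, dialogue[language]], [1.0, dialogue_gain])


bed = [0.5, 0.25, 0.0]
dialogue = {"en": [0.5, 0.0, -0.5], "fr": [0.25, 0.25, 0.25]}
print(render_programme(bed, dialogue, "fr"))  # [0.75, 0.5, 0.25]
```

Setting `dialogue_gain` to zero yields the bare music-and-effects bed, the “mix minus the dialogue” case itself. The same structure also covers “mix minus 1”: the soloist's object is simply left out of the list passed to `mix`.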
In principle, none of these applications is anything but straightforward; they could all be
handled by existing digital (or analogue) systems. The problem, once again, is bandwidth.
MPEG-4 is designed for very low bit rates, which suggests that MPEG have designed
(or integrated) a number of powerful audio tools to reduce the necessary data throughput.
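Simple arithmetic shows how large the bandwidth problem is. The sketch below assumes a hypothetical programme that carries a stereo music-and-effects bed plus five mono dialogue languages, all as uncompressed 16-bit PCM at 44.1 kHz. It compares the total with an illustrative 64 kbit/s channel. The programme layout and the channel rate are assumptions chosen for the example, not MPEG-4 figures.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed linear PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


bed = pcm_bit_rate(44_100, 16, 2)           # stereo M&E bed: 1,411,200 bit/s
per_language = pcm_bit_rate(44_100, 16, 1)  # one mono dialogue object: 705,600 bit/s
languages = 5                               # assumption: five dialogue languages
total = bed + languages * per_language      # 4,939,200 bit/s

budget = 64_000                             # assumption: illustrative channel rate
print(f"total PCM rate: {total:,} bit/s")             # total PCM rate: 4,939,200 bit/s
print(f"reduction needed: about {total / budget:.0f}:1")  # reduction needed: about 77:1
```

Each extra language object adds another 705,600 bit/s in PCM form. This is why delivering many separate objects only becomes practical with strong compression tools.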
These tools include the MPEG-4 Structured Audio format, which uses low-bit-rate
algorithmic sound models to code sounds. MPEG-4 also provides the functionality to apply
and control postproduction panning and reverberation effects at the decoder. In addition, it
includes SAOL (Structured Audio Orchestra Language), a signal-processing language that
allows music synthesis and sound effects to be generated, once again, at the terminal rather
than prior to transmission.
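Applying effects at the decoder means transmitting a dry object plus a handful of control numbers instead of a processed mix. The Python sketch below illustrates this idea under stated assumptions. It uses a sine/cosine (constant-power) pan law and a single feedback comb filter as stand-ins for panning and reverberation. The functions `constant_power_pan` and `feedback_comb` are hypothetical, and neither the pan law nor the comb filter is claimed to be the processing MPEG-4 itself specifies.

```python
import math
from typing import List, Tuple


def constant_power_pan(mono: List[float], position: float) -> Tuple[List[float], List[float]]:
    """Place a mono object in a stereo image with a sine/cosine pan law.

    position runs from -1.0 (hard left) through 0.0 (centre) to +1.0 (hard right).
    Because cos^2 + sin^2 = 1, total power is the same wherever the object sits.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [left_gain * s for s in mono], [right_gain * s for s in mono]


def feedback_comb(signal: List[float], delay: int, feedback: float) -> List[float]:
    """Single feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    One crude building block of Schroeder-style reverberators, not a full reverb.
    Keep |feedback| < 1 so the output decays.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    out = list(signal)
    for n in range(delay, len(out)):
        out[n] += feedback * out[n - delay]  # out[n - delay] is already y, not x
    return out


# The transmitted object is dry; three numbers control the effects at the decoder.
dry = [1.0, 0.0, 0.0, 0.0, 0.0]
wet = feedback_comb(dry, delay=2, feedback=0.5)       # [1.0, 0.0, 0.5, 0.0, 0.25]
left, right = constant_power_pan(wet, position=-1.0)  # hard left: right is silent
```

A listener's decoder could change `position` or `feedback` without any change to the transmitted audio, which is the flexibility the text describes.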
19.7.1 Structured Audio
We have already seen how MPEG (and Dolby) coding aims to remove perceptual
redundancy from an audio signal, as well as removing other, simpler representational
redundancy by means of efficient bit-coding schemes. Structured audio (SA) compression
schemes compress sound by, first, exploiting another type of redundancy in signals:
structural redundancy.
Structural redundancy is a natural result of the way sound is created in human situations.
The same sounds, or very similar sounds, occur over and over again. For
example, a performance of a work for solo piano consists of many piano notes. Each
time the performer strikes the “middle C” key on the piano, the piano’s mechanism creates
a very similar sound. To a first approximation, we could view the sound as
exactly the same upon each strike; to a closer one, we could view it as the same except
for the velocity with which the key is struck, and so on. In a PCM representation of the
piano performance, each note is treated as a completely independent entity; each time
“middle C” is struck, the sound of that note is independently represented in the data
sequence. This is true even in a perceptual coding of the sound: the representation has
been compressed, but the structural redundancy of re-representing the same note as
different events has not been removed.
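A toy model makes the piano example concrete. The Python sketch below contrasts a PCM rendering, in which every strike of middle C is stored in full, with a structured description that stores one shared note model plus a short list of (onset, pitch, velocity, duration) events. It rests on several assumptions: a deliberately crude “instrument” (an exponentially decaying sine, standing in for a real piano model), an 8 kHz sample rate, 16-bit PCM samples, and 32-bit storage per event field. `NoteEvent`, `note_model` and `render` are hypothetical names, not Structured Audio or SAOL syntax.

```python
import math
from dataclasses import dataclass, fields
from typing import List

SAMPLE_RATE = 8_000   # assumption: a low rate keeps the sketch small
BYTES_PER_SAMPLE = 2  # 16-bit PCM
BYTES_PER_FIELD = 4   # assumption: each event field stored as a 32-bit value


@dataclass(frozen=True)
class NoteEvent:
    start_s: float   # onset time in seconds
    freq_hz: float   # pitch of the key struck
    velocity: float  # 0..1; in this toy model it only scales loudness
    dur_s: float     # how long the note sounds


def note_model(freq_hz: float, velocity: float, dur_s: float) -> List[float]:
    """Toy stand-in for a piano model: an exponentially decaying sine."""
    n = int(dur_s * SAMPLE_RATE)
    return [velocity * math.exp(-3.0 * i / SAMPLE_RATE)
            * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def render(score: List[NoteEvent]) -> List[float]:
    """Decoder side: synthesize every event from the one shared model and mix."""
    placed = [(int(e.start_s * SAMPLE_RATE), note_model(e.freq_hz, e.velocity, e.dur_s))
              for e in score]
    out = [0.0] * max(offset + len(samples) for offset, samples in placed)
    for offset, samples in placed:
        for i, s in enumerate(samples):
            out[offset + i] += s
    return out


MIDDLE_C_HZ = 261.63
# Eight strikes of middle C, half a second apart, with varying key velocities.
score = [NoteEvent(0.5 * k, MIDDLE_C_HZ, v, 0.5)
         for k, v in enumerate([0.8, 0.8, 0.6, 0.9, 0.8, 0.7, 0.8, 0.5])]

pcm = render(score)
pcm_bytes = len(pcm) * BYTES_PER_SAMPLE
structured_bytes = len(score) * len(fields(NoteEvent)) * BYTES_PER_FIELD
print(pcm_bytes, structured_bytes)  # 64000 128
```

In this sketch, the PCM rendering occupies 64,000 bytes while the event list needs only 128. A real structured-audio stream must also carry the instrument model itself, a one-off cost this sketch ignores. Because the first two strikes share the same velocity, `pcm[0:4000]` and `pcm[4000:8000]` are identical sample for sample. PCM stores that repeated material twice, and this is exactly the structural redundancy described above.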