[Figure 15: An example network topology using the H.323 protocol. Labels shown: H.323 MCU, H.323 terminal (three), H.323 gatekeeper, packet-based network, H.323 gateway, PSTN, ISDN.]
as intermediaries between end points. Two protocols that
are widely used are H.323 and SIP. Both of these protocols
build on existing protocols. The H.323 protocol suite was
defined by ITU-T to support multimedia communications
over IP independent of network topology. It defines four
components: (1) the terminal, which serves as the interface
to the user, (2) the gatekeeper, which supports services
such as billing and authentication, (3) the gateway, which
supports connection to other networks, and (4) a multi-
point control unit (MCU), which supports teleconferenc-
ing. Figure 15 shows an H.323 network topology. A simple
end-to-end call can be accomplished with two H.323 terminals,
without the need for the other functional modules.
Although we can build a telephony network independently
of the traditional PSTN (public switched telephone
network), in most cases we need to be able to interface
with the legacy networks, for example when making a
call from an IP phone to a traditional PSTN phone. The
interface function between the Internet-based telephony
and PSTN telephony is served by a so-called gateway. This
gateway will facilitate the protocol translation and will
also convert the audio data into a format suitable for the
PSTN.
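The audio conversion performed by the gateway can be illustrated with a minimal sketch, assuming the PSTN side expects 8-bit companded samples at 8 kHz of the kind standardized in G.711. The code below uses the continuous mu-law formula rather than the piecewise-linear tables that G.711 actually specifies, and the function names are hypothetical.

```python
import math

MU = 255  # mu-law companding constant


def mu_law_compress(sample: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    sample = max(-1.0, min(1.0, sample))
    magnitude = math.log1p(MU * abs(sample)) / math.log1p(MU)
    return math.copysign(magnitude, sample)


def mu_law_expand(value: float) -> float:
    """Inverse of mu_law_compress: recover the linear sample."""
    value = max(-1.0, min(1.0, value))
    magnitude = math.expm1(abs(value) * math.log1p(MU)) / MU
    return math.copysign(magnitude, value)


def to_8bit(value: float) -> int:
    """Quantize a companded value in [-1, 1] to one of 256 levels."""
    return int(round((value + 1.0) / 2.0 * 255))


# One byte per sample at 8000 samples per second gives 64 kb/s.
PSTN_BIT_RATE = 8000 * 8
```

Companding spends more of the 256 levels on low-amplitude samples, which is why 8 bits per sample are enough for telephone-quality speech.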
The session initiation protocol (SIP) was proposed by
the IETF. It is modeled after the simple mail transfer protocol
(SMTP). It is independent of the underlying packet
protocol, and it leverages the Internet and Web structure.
Although most existing IP phones use H.323, SIP is gain-
ing momentum due to its simplicity. It is expected that
both protocols will coexist for a long time and that most
equipment will support both protocols.
Despite the use of RTP, there are still no guarantees that
all packets will arrive on time, or will arrive at all. Buffer-
ing the incoming packets can compensate for late arrival,
at the expense of an increase in delay. Hence it is common
to make this buffer, the so-called jitter buffer, adaptive and
to have it increase in size if the throughput becomes less
reliable. Proper error mitigation is also important, and
some coders have this built in, whereas others such as
G.711 need external concealment. Due to buffering and
other transport delay mechanisms, quite often there is a
need for echo cancellation, especially if a speakerphone is
used (or the speakers on a PC).
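The adaptive jitter buffer described above can be sketched as follows. This is one possible approach, not a method given in the text: it assumes the receiver can measure a per-packet network delay, smooths the delay and its deviation with exponential averages, and sizes the buffer as a multiple of the deviation. The class name, the smoothing factor, the safety factor, and the delay limits are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveJitterBuffer:
    """Tracks a target buffer depth that grows when arrivals become irregular."""

    alpha: float = 0.125         # smoothing factor (assumed value)
    safety_factor: float = 4.0   # deviations of headroom to keep (assumed value)
    min_depth_ms: float = 20.0   # never buffer less than this (assumed value)
    max_depth_ms: float = 500.0  # cap the added delay (assumed value)
    mean_ms: float = 0.0         # smoothed network delay
    dev_ms: float = 0.0          # smoothed absolute deviation from the mean
    initialized: bool = False

    def observe(self, network_delay_ms: float) -> float:
        """Update the estimates with one packet's delay; return the target depth."""
        if not self.initialized:
            self.mean_ms = network_delay_ms
            self.initialized = True
        else:
            error = network_delay_ms - self.mean_ms
            self.mean_ms += self.alpha * error
            self.dev_ms += self.alpha * (abs(error) - self.dev_ms)
        return self.target_depth_ms()

    def target_depth_ms(self) -> float:
        """Buffer depth in milliseconds, clamped to the configured limits."""
        target = self.safety_factor * self.dev_ms
        return max(self.min_depth_ms, min(self.max_depth_ms, target))
```

With steady arrivals the deviation stays near zero and the buffer sits at its minimum depth. When delays start to vary, the deviation estimate rises and the buffer deepens, trading extra delay for fewer late packets.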
Although it is hard to believe that Internet telephony
over the public Internet will replace PSTN services soon,
it is very likely to become a dominant application in
corporate networks. Besides the economic advantage of
only needing one network structure, these networks are
also more carefully managed, thereby making it possible
to guarantee a minimum quality of service.
Audio Streaming
There are two popular ways of distributing music over
the Internet. Downloading is a copying operation from
some server containing the original or compressed ver-
sion of the material. The downloading will typically use
the TCP/IP protocol, and one has to wait until all the material
has been downloaded before being able to play the
music. Once downloaded, the same material can be played
repeatedly. The other common form of music distribution
is streaming, which is very much like radio broadcasting.
The material is transmitted to the end user, but the signal
is played out almost instantaneously without local stor-
age. Similarly to the VoIP scenario, it is necessary to buffer
the packets to compensate for late arrival. Because this
is a broadcast or one-way communication scenario, the
amount of buffering can be large (from a few seconds up to 30 s). Although
in principle the buffers could be made even larger, this
would create large delays before a player produced an au-
dible signal. From a user interface perspective this is un-
desirable. Hence, sophisticated buffer control has been
developed to sustain continuous throughput while
keeping the buffer relatively small.
Some streaming services accomplish this by reducing the
coder rate temporarily if the average available connection
rate cannot support the initial streaming rate.
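The rate reduction just described can be sketched as a simple selection rule. This is a hypothetical illustration, not the logic of any particular streaming service: it assumes the player has recent throughput measurements and a fixed set of coder rates to choose from, and it keeps some headroom below the averaged connection rate. The function name, the headroom value, and the rate ladder in the usage lines are assumptions.

```python
def pick_coder_rate(throughput_samples_kbps, available_rates_kbps, headroom=0.8):
    """Return the highest coder rate that fits within the averaged connection rate.

    Falls back to the lowest available rate when none fits.
    """
    if not throughput_samples_kbps:
        raise ValueError("need at least one throughput measurement")
    if not available_rates_kbps:
        raise ValueError("need at least one coder rate")
    average = sum(throughput_samples_kbps) / len(throughput_samples_kbps)
    budget = average * headroom  # leave margin so the buffer can refill
    candidates = [r for r in sorted(available_rates_kbps) if r <= budget]
    return candidates[-1] if candidates else min(available_rates_kbps)


# Illustrative rate ladder in kb/s.
RATES = [24, 64, 96, 128]
good_link = pick_coder_rate([140, 150, 160], RATES)  # budget is 120 kb/s
poor_link = pick_coder_rate([60, 50, 40], RATES)     # budget is 40 kb/s
```

Averaging over several measurements keeps the player from switching rates on every short-lived dip, which matches the idea of a temporary reduction rather than constant oscillation.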
The compression techniques used are typically based
on audio coding algorithms, because most material that
is being streamed will be music. For lower rates it is pos-
sible that speech-coding algorithms are used instead. A
commonly used format is the MP3 (MPEG-1 Layer III) for-
mat running at bit rates varying from 64 to 128 kb/s. For
streaming applications, it is possible to use proprietary
coders, as long as the decoders are available as down-
loads. However, most content providers will only support
one or two formats, and as a result a couple of proprietary
standards have become de facto standards. Examples are
Apple’s QuickTime, Microsoft’s Windows Media Player,
and RealNetworks’ RealPlayer. All of these proprietary
coders have a reasonable quality vs. bit rate performance,
while trading off other parameters such as delay, com-
plexity, and audio bandwidth. As should be clear from the
previous sections, the best quality is obtained at higher bit
rates (e.g., 96 to 128 kb/s), whereas at lower rates (e.g., 24
to 64 kb/s) tradeoffs will be made by reducing the audio
bandwidth, or even switching to mono. It should also be
noted that even at the same bit rate and using the same
format, differences in quality could exist due to the qual-
ity of the source material and the encoder used. For most
streaming applications it is important that a variety of
rates can be accommodated to support the various con-
nection speeds. It is also important that the decoder have
a relatively low complexity to allow it to run on the host processor.
With the advances in computing speed, this has become
less of an issue. However, if these formats are used
for downloading in portable players, complexity becomes
an issue because it is connected to battery life and cost.
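The bit rates quoted in this section translate directly into transfer and storage requirements, which matters for downloading to portable players. A minimal sketch of that arithmetic follows, assuming 1 kb/s means 1000 bits per second and ignoring any file-format overhead. The function name and the 4-minute track length are illustrative choices, not figures from the text.

```python
def track_size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Size of compressed audio at a constant bit rate, in bytes."""
    bits = bit_rate_kbps * 1000 * duration_s
    return bits / 8


# A hypothetical 4-minute track (240 s) at the rates mentioned in the text.
for rate in (24, 64, 96, 128):
    size_mb = track_size_bytes(rate, 240) / 1_000_000
    print(f"{rate:>3} kb/s -> {size_mb:.2f} MB")
```

Under these assumptions, the same track needs twice the storage and download time at 128 kb/s as it does at 64 kb/s.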