The Internet Encyclopedia (Volume 3)


together with the estimate of their prediction error (Shi
& Sun, 2000). The three-dimensional transform tech-
nique has become feasible due to recent advances in elec-
tronics manufacture and represents a growing area for
research.

Rate Control of Compressed Digital Video
The information content of digital video sources may vary
in complexity from scene to scene. Constant bit rate (CBR)
encoding forces the reconstructed video quality to vary in
order to maintain a (near) constant bit rate for the channel
coder. Variable bit rate (VBR) encoding aims to maintain
(near) constant quality by varying the bit rate allocated
to scenes of varying complexity. Popular methods of rate
control include varying the quantization of the transform
coefficients in a feedback loop and embedded zerotree
encoding (Shapiro, 1993).
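
The feedback idea can be illustrated with a minimal sketch. It is not taken from any standard or from the cited work: the function names, the proportional update rule, the gain value, the quantizer range of 1 to 31, and the toy cost model (bits = complexity / quantizer step) are all assumptions chosen only to show the direction of the control loop.

```python
def update_quantizer(q, bits_used, bits_target, q_min=1, q_max=31, gain=0.5):
    """Return the quantizer step to use for the next frame.

    If the last frame used more bits than its target, quantize more
    coarsely (larger q); if it used fewer, quantize more finely.
    The proportional rule and the default limits are illustrative only.
    """
    error = (bits_used - bits_target) / bits_target  # relative overshoot
    q_next = q * (1.0 + gain * error)
    return max(q_min, min(q_max, round(q_next)))


def simulate(complexities, bits_target, q_start=8):
    """Toy model: a frame of complexity c coded at step q costs c / q bits."""
    q = q_start
    history = []
    for c in complexities:
        bits_used = c / q
        history.append((q, bits_used))
        q = update_quantizer(q, bits_used, bits_target)
    return history
```

In this sketch a scene change that doubles the complexity produces one oversized frame, after which the quantizer step rises and pulls the bit count back toward the target.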

Psychovisual Modeling
Human attention is often attracted to faces and facial
expressions and distracted by the motion of peripheral
objects across the field of view (see Colmenarez, Frey, &
Huang, 1999). An encoder may choose to encode these
regions of interest at a higher bit rate than others if the
regions can be bounded (typically, by focus boxes).
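
As a hedged illustration of region-of-interest coding, the sketch below builds a per-macroblock quantizer map in which macroblocks inside a focus box receive a finer quantizer (and hence more bits) than the background. The function name, the rectangle convention, and the two quantizer values are hypothetical and are not drawn from the cited work.

```python
def roi_quantizer_map(width_mb, height_mb, focus_boxes, q_roi=4, q_background=12):
    """Build a per-macroblock quantizer map.

    focus_boxes is a list of (left, top, right, bottom) rectangles in
    macroblock units, with right and bottom exclusive. The quantizer
    values are illustrative assumptions.
    """
    qmap = [[q_background] * width_mb for _ in range(height_mb)]
    for left, top, right, bottom in focus_boxes:
        # Clip each box to the picture so out-of-range boxes are harmless.
        for y in range(max(0, top), min(height_mb, bottom)):
            for x in range(max(0, left), min(width_mb, right)):
                qmap[y][x] = q_roi  # finer quantization, so more bits
    return qmap
```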

Statistical Multiplexing
Often, a broadcaster wishes to transmit several program
streams on a single channel. Statistical multiplexing is a
process by which the instantaneous encoding rates of multiple
program streams are adjusted by VBR techniques either to
maintain (near) constant quality or to optimize the
aggregate bit rate in the channel.
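
One simple way to picture the allocation step is to share the channel among the streams in proportion to a per-stream complexity estimate. The sketch below assumes exactly that; the proportional rule, the function name, and the optional per-stream minimum rate are illustrative assumptions, and practical multiplexers use more elaborate models.

```python
def allocate_bit_rates(complexities, channel_rate, min_rate=0.0):
    """Split channel_rate among streams in proportion to complexity.

    Each stream first receives min_rate; the remainder is shared in
    proportion to the stream's current complexity estimate.
    """
    n = len(complexities)
    if n == 0:
        return []
    spare = channel_rate - n * min_rate
    if spare < 0:
        raise ValueError("channel cannot carry the minimum rates")
    total = sum(complexities)
    if total == 0:
        # No complexity information: share the channel equally.
        return [channel_rate / n] * n
    return [min_rate + spare * c / total for c in complexities]
```

Re-running the allocation at short intervals lets a stream showing a demanding scene borrow capacity from streams that are momentarily easy to code, while the sum of the rates stays equal to the channel rate.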

Network Transmission Issues
In networks without quality of service guarantees, such as
the Internet, the varying availability of bandwidth leads to
network congestion and packet jitter. Approaches for
addressing these issues include buffering the video stream
and employing techniques that yield multiple bit rates for
the same stream. Hunt (2001) filed a patent application that
describes a congestion management scheme, intended
primarily for mobile videophone users, in which graphics
and text may be displayed when it is necessary to
buffer the video stream so that the user perceives the video
streaming process to be in real time.
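
The effect of buffering on jitter can be shown with a small sketch. It assumes a very simple receiver model that is not described in the text: playback starts a fixed startup delay after the first frame arrives, frames are then displayed at a constant interval, and a frame is "late" if it arrives after its display deadline. The function name and the millisecond units are assumptions.

```python
def late_frames(arrival_times_ms, frame_interval_ms, startup_delay_ms):
    """Return indices of frames that arrive after their playout deadline.

    Frame i is scheduled for display at
    arrival_times_ms[0] + startup_delay_ms + i * frame_interval_ms.
    """
    if not arrival_times_ms:
        return []
    start = arrival_times_ms[0] + startup_delay_ms
    return [i for i, t in enumerate(arrival_times_ms)
            if t > start + i * frame_interval_ms]
```

Under this model a larger startup delay (a deeper buffer) reduces the number of late frames at the cost of a longer wait before playback begins.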

DIGITAL VIDEO COMPRESSION STANDARDS
MPEG-1, -2, -4 Visual Codecs
Introduction
Koenen (2001) pointed out that the Moving Picture
Experts Group (MPEG), whose formal name is
ISO/IEC JTC1/SC29/WG11, has developed the widely
used MPEG-1, -2, and -4 codecs to facilitate and
standardize infrastructure for the interoperability of
multimedia. MPEG-1 is still widely used on the Web and is
targeted at bit rates of 64 kbps to 1.5 Mbps; MPEG-2 is the
basis for current digital television and Digital Versatile Disc
(DVD) standards and is targeted at 1.5 Mbps to 7 Mbps or
higher; and MPEG-4 is an extensible toolbox of algorithms that
addresses the coding of audiovisual objects throughout
an extended bit rate range from about 5 kbps to more than
1 Gbps. The standards address video compression,
together with audio and systems elements. The MPEG Web
site (http://mpeg.telecomitalialab.com) provides detailed
information about the MPEG standards, including
an overview.

Intra-, Predictive, and Bidirectional Frames
Table 2 shows the relationship among the intra (I),
predictive (P), and bidirectional (B) frames referred to in the
MPEG standards and introduced in the MPEG-1 standard
(ISO/IEC 11172-2:1993). I frames are coded using
information only from that frame. P frames are coded using
forward prediction from previous I or P frames. B frames
are coded using prediction from both previous and succeeding
I or P frames. This, in turn, implies that the encoding order
may be markedly different from the display order. Since B
frames require both past and future frames to be decoded
before they can be decoded and displayed, the presence
of B frames adds considerably to the encoder delay. The
MPEG-2 simple profile at main level requires only I and
P frames, thus reducing delay.
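
The dependence of B frames on a later reference can be made concrete with a short sketch. It assumes one common convention, in which each I or P frame is emitted ahead of the B frames that precede it in display order; the function name and the string encoding of frame types are hypothetical, and the ordering used in any particular bitstream may differ.

```python
def coding_order(display_types):
    """Reorder frames so each B frame follows its later reference.

    display_types is a string such as "IBBPBBP" in display order.
    Returns a list of (display_index, frame_type), with 1-based indices.
    """
    order = []
    pending_b = []  # B frames waiting for their next I or P reference
    for index, frame_type in enumerate(display_types, start=1):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            order.append((index, frame_type))  # the reference goes first
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames would need a reference from the next GOP.
    order.extend(pending_b)
    return order
```

For the display sequence "IBBPBBP" this sketch yields frames 1, 4, 2, 3, 7, 5, 6, which shows why the coder must hold B frames back until the later reference is available.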

Sequence, Group of Pictures, Slice, Macroblock, Block, and Pixel
A group of pictures (GOP) contains an I frame and,
potentially, P and B frames. During encoding the user selects
the duration of the GOP as the number of frames between
successive I frames (N) together with the distance between

Table 2 Display and Transmission of a Group of 18 NTSC MPEG-2 Intra (I),
Predicted (P), and Bidirectional (B) Pictures

Display   I  B  P  B  P  B  P  B  P  B  P  B  P  B  P  B  P  B
Frame     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
Transmit  B  I  B  P  B  P  B  P  B  P  B  P  B  P  B  P  B  P
Frame     2  1  4  3  6  5  8  7 10  9 12 11 14 13 16 15 18 17

Note: The P pictures are predicted from the previous I or P picture, and the B pictures
are interpolated from both the previous and next I or P pictures. The decoder must buffer P frames and compute B frames.
The maximum length of a group of pictures (GOP) is 18 pictures on the NTSC system and 15 pictures on the PAL system.