gestures. In this situation, the video information can be
represented by a key frame along with delta frames con-
taining the changes between the frames. This is known
as interframe compression. In addition, individual frames
may be compressed using lossy techniques. An example of
this is a technique where the number of bits representing
color information is reduced and some color information
is lost. This is known as intraframe compression. Com-
bining the interframe and intraframe compression tech-
niques can result in compression ratios of up to 200:1 (Compaq,
1998).
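As a rough sketch of the interframe idea (hypothetical Python, assuming grayscale frames stored as NumPy arrays; an illustration, not any codec's actual algorithm), the following stores a key frame and, for each later frame, only the pixels that changed:

```python
import numpy as np

def delta_encode(frames):
    """Encode equal-sized frames as a key frame plus per-frame deltas.

    Each delta records only the coordinates and new values of the
    pixels that changed relative to the previous frame.
    """
    key = frames[0]
    deltas = []
    prev = key
    for frame in frames[1:]:
        changed = np.argwhere(frame != prev)   # coordinates of changed pixels
        values = frame[frame != prev]          # their new values, in the same order
        deltas.append((changed, values))
        prev = frame
    return key, deltas

def delta_decode(key, deltas):
    """Rebuild the frame sequence from the key frame and the deltas."""
    frames = [key]
    current = key
    for changed, values in deltas:
        current = current.copy()
        current[tuple(changed.T)] = values     # apply only the changed pixels
        frames.append(current)
    return frames
```

When little moves between frames, each delta is far smaller than a full frame, which is where the interframe savings come from.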
Another compression technique is quantization, which is
the basis for most lossy compression algorithms. Essen-
tially, it is a process in which data values are rounded to
reduce precision. For the most part, the eye
cannot detect these changes to the fine details (Fischer &
Schroeder, 1996). An example of this type of compression
is the intraframe compression described above. Another
example is the conversion from the RGB color format used
in computer monitors to the YCrCb format used in digital
video, which was discussed in the capturing and digitizing
section of this chapter.
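A minimal sketch of quantization, assuming 8-bit samples: dropping the low-order bits rounds each value to a coarser grid, so fewer distinct values need to be coded. (The function below is illustrative, not part of any standard.)

```python
import numpy as np

def quantize(samples, bits_kept=4):
    """Round 8-bit samples down to the nearest level representable in bits_kept bits."""
    step = 2 ** (8 - bits_kept)        # keeping 4 of 8 bits gives a step of 16
    return (samples // step) * step    # snap each value to the coarser grid

pixels = np.array([17, 52, 200, 255], dtype=np.uint8)
print(quantize(pixels))                # [ 16  48 192 240]
```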
Filtering is a very common technique that involves the
removal of unnecessary data. Transforming is another
technique, where a mathematical function is used to con-
vert the data into a code used for transmission. The trans-
form can then be inverted to recover the data (Vantum
Corporation, 2001).
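As a toy illustration of an invertible transform (a single Haar averaging/differencing step; the MPEG and H.26x algorithms discussed below actually use the discrete cosine transform), the sketch converts pairs of samples into averages and small differences, and then recovers the originals exactly:

```python
import numpy as np

def haar_forward(x):
    """One Haar step: pairwise averages and differences (even-length input)."""
    x = x.astype(float)
    avg = (x[0::2] + x[1::2]) / 2
    diff = (x[0::2] - x[1::2]) / 2
    return avg, diff

def haar_inverse(avg, diff):
    """Invert the transform: recover the original samples exactly."""
    x = np.empty(avg.size * 2)
    x[0::2] = avg + diff
    x[1::2] = avg - diff
    return x

samples = np.array([10, 12, 200, 202, 50, 54])
avg, diff = haar_forward(samples)
print(avg)                       # [ 11. 201.  52.]  carries most of the content
print(diff)                      # [-1. -1. -2.]     small, so it compresses well
print(haar_inverse(avg, diff))   # [ 10.  12. 200. 202.  50.  54.]
```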
For videos that include audio, the process used to com-
press the audio is very different from the one used to
compress the video, even though the underlying tech-
niques are similar to those described above. This is because the
eye and ear work very differently. The ear has a much
higher dynamic range and resolution. The ear can pick
out more details but it is slower than the eye (Filippini,
1997). Sound is recorded as voltage levels and it is sam-
pled by the computer a number of times per second. The
higher the sampling rate, the higher the quality and hence
the greater the need for compression. Compressing audio
data involves removing the unneeded and redundant parts
of the signal. In addition, the portions of the signal that
cannot be heard are removed.
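A quick back-of-the-envelope calculation (the CD-quality figures here are common reference values, not taken from this chapter's sources) shows how quickly uncompressed audio data accumulates:

```python
# Uncompressed PCM data rate = sampling rate x bits per sample x channels.
# CD-quality stereo audio is commonly 44,100 samples/s, 16 bits, 2 channels.
sampling_rate = 44_100               # samples per second
bits_per_sample = 16
channels = 2

bits_per_second = sampling_rate * bits_per_sample * channels
print(bits_per_second / 1_000_000)   # about 1.41 Mbps before any compression
```

VIDEO COMPRESSION ALGORITHMS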
Some algorithms were designed for wide bandwidths and
some for narrow bandwidths. Some algorithms were de-
veloped specifically for CD-ROMs and others for stream-
ing video. There are a number of compression algorithms
available for streaming video; this chapter will discuss the
major ones in use today. These algorithms are MPEG-1,
MPEG-2, MPEG-4, H.261, H.263, and MJPEG. The
video compression algorithms can be separated into two
groups: those that make use of frame-to-frame redun-
dancy and those that do not. The algorithms that make
use of this redundancy can achieve significantly greater
compression. However, more computational power is re-
quired to encode video where frame-to-frame redundan-
cies are utilized.
As mentioned earlier in this chapter, MPEG stands for
Moving Picture Experts Group, a working group of the
International Organization for Standardization (ISO) (Compaq,
1998). This group has defined several levels of standards
for video and audio compression. The MPEG standard
only specifies a data model for compression and, thus,
it is an open, independent standard. MPEG is becoming
very popular with streaming video creators and users.
The first of these standards, MPEG-1, was made avail-
able in 1993 and was aimed primarily at video conferenc-
ing, videophones, computer games, and first-generation
CD-ROMs. It was designed for consumer video and
CD-ROM audio applications that operate at a data rate of
approximately 1.5 Mbps and a frame rate of 30 frames per
second. It has a resolution of 352×240 and supports play-
back functions such as fast forward, reverse, and random
access into the bitstream (Compaq, 1998). It is currently
used for video CDs and it is a common format for video
on the Internet when good quality is desired and when
its bandwidth requirements can be supported (Vantum
Corporation, 2001).
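A rough calculation, using the figures above and assuming 24-bit RGB pixels, shows why a high degree of compression is needed to fit the 1.5 Mbps budget:

```python
# Uncompressed video rate for MPEG-1-class material, assuming 24-bit color.
width, height = 352, 240
frames_per_second = 30
bits_per_pixel = 24

raw_bps = width * height * bits_per_pixel * frames_per_second
target_bps = 1_500_000               # MPEG-1's roughly 1.5 Mbps budget

print(raw_bps / 1_000_000)           # about 60.8 Mbps uncompressed
print(round(raw_bps / target_bps))   # roughly 41:1 compression needed
```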
MPEG-1 uses interframe compression to remove
redundant data between the frames, as discussed in the
previous section on compression techniques. It also uses
intraframe compression within an individual frame as
described in the previous section. This compression al-
gorithm generates three types of frames: I-frames, P-
frames, and B-frames. I-frames do not reference other
previous or future frames. They are stand-alone or Inde-
pendent frames and they are larger than the other frames.
They are compressed only with intraframe compression.
They are the entry points for indexing or rewinding the
video, because they represent complete pictures (Compaq,
1998).
On the other hand, P-frames contain predictive infor-
mation with respect to the previous I- or P-frame. They
contain only the pixels that have changed since the last
frame, and they account for motion. In addition, they
are smaller than the I-frames, because they are more
compressed. I-frames are sent at regular intervals during
the transmission process. P-frames are sent at some time in-
terval after the I-frames have been sent (this time inter-
val will vary based on the transmission of the streaming
video).
If the video has a lot of motion, the P-frames may not
come fast enough to give the perception of smooth mo-
tion. Therefore, B-frames are inserted between the I- and
P-frames. B-frames use data in the previous I- or P-frames
as well as the future I- or P-frames, thus, they are consid-
ered bidirectional. The data that they contain are an in-
terpolation of the data in the previous and future frames,
with the assumption that the pixels will not drastically
change between the two frames. As a result, the B-frames
have the most compression and are the smallest of the
three types of frames. In order for a decoder to decode
the B-frames, it must have the I- and P-frames that they
are based on; thus the frames may be transmitted out of
order to reduce decoding delays (Compaq, 1998).
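The following simplified sketch shows the bidirectional idea (real MPEG encoders interpolate motion-compensated blocks, not raw pixels as here): a B-frame is predicted from the reference frames on either side of it, and only a small correction needs to be stored:

```python
import numpy as np

def predict_b_frame(prev_ref, next_ref, weight=0.5):
    """Approximate a B-frame by interpolating its surrounding reference frames."""
    return weight * prev_ref.astype(float) + (1 - weight) * next_ref.astype(float)

def b_frame_residual(actual, prev_ref, next_ref):
    """The correction the encoder stores: the actual frame minus the prediction.

    If pixels change smoothly between the two references, this residual is
    near zero almost everywhere, which is why B-frames are the smallest.
    """
    return actual.astype(float) - predict_b_frame(prev_ref, next_ref)
```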
A frame sequence consisting of an I-frame and its fol-
lowing B- and P-frames before the next I-frame is called
a group of pictures (GOP) (Compaq, 1998). There are usu-
ally around 15 frames in a GOP. An example of the MPEG
encoding process can be seen in Figure 1. The letters I, P,
and B in the figure represent the I-, P-, and B-frames that
could possibly be included in a group of pictures. The let-
ters were sized to indicate the relative size of the frame
(as compared to the other frames).
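The sketch below (hypothetical, assuming the common pattern of two B-frames between references) builds a 15-frame GOP in display order and then reorders it for transmission so that every B-frame follows both of the reference frames it depends on, as described above:

```python
def make_gop(length=15, b_between=2):
    """Build a GOP in display order: an I-frame, then repeating B..B,P groups."""
    gop = ["I"]
    while len(gop) < length:
        gop.extend(["B"] * b_between + ["P"])
    return gop[:length]

def transmission_order(gop):
    """Reorder so each B-frame is sent after the two references it needs."""
    out, held_b = [], []
    for frame in gop:
        if frame == "B":
            held_b.append(frame)    # hold B-frames until their future reference
        else:
            out.append(frame)       # send the I- or P-frame first...
            out.extend(held_b)      # ...then the B-frames that depend on it
            held_b = []
    # B-frames at the very end would really wait for the next GOP's I-frame;
    # they are simply appended here to keep the sketch self-contained.
    return out + held_b

gop = make_gop()
print("".join(gop))                      # IBBPBBPBBPBBPBB (display order)
print("".join(transmission_order(gop)))  # IPBBPBBPBBPBBBB (bitstream order)
```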