APPENDIX A: Audio Concepts, Terminology, and Codecs 653

In digital imaging and digital video, the resolution is quantified in the number of colors, and in digital
audio, this resolution is quantified in how many bits of data are used to define each of the audio
samples taken. Just like in digital imaging (more colors yield better quality), a higher sample
resolution yields better sound reproduction. The only difference between the two is that digital audio
supports 12-bit sample resolution. I often wish that digital imaging formats supported 12-bit color!

Thus, higher sampling resolutions—using more data to reproduce a given sound wave sample—will
yield a higher audio playback quality at the expense of a larger data footprint. This is the reason that
16-bit audio, termed CD-quality audio, sounds better than 8-bit audio, just like truecolor images look
better than indexed color images.

In digital audio there is now 24-bit audio sampling, known as HD audio in the consumer electronics
industry. HD digital audio broadcast radio uses a 24-bit sample resolution, so each audio sample, or
slice of a sound wave, can take one of 16,777,216 possible values. Some newer Android devices
now support HD audio, such as the smartphones you see advertised featuring “HD quality” audio.
This means they have 24-bit audio hardware, that is, a 24-bit capable audio decoder chip installed.
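
As a quick check on these figures, the number of distinct values a single sample can hold is 2 raised to the power of the sample resolution. Here is a minimal Python sketch of that relationship; the function name quantization_levels is my own, chosen purely for illustration.

```python
def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values one audio sample can take at this bit depth."""
    return 2 ** bit_depth


# The three sample resolutions discussed in the text.
for bits in (8, 16, 24):
    print(f"{bits}-bit audio: {quantization_levels(bits):,} possible values per sample")
```

Each extra bit doubles the number of values, which is why moving from 16-bit to 24-bit audio multiplies the available values by 256.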

Besides a digital audio sample resolution, there is also a digital audio sampling frequency. This is
how many of these samples at a particular sample resolution are taken during one second of sample
time. In digital image editing, the sampling frequency is analogous to the number of pixels contained
within an image. More pixels in an image would equate to the analog image being sampled more
frequently.

Sampling frequency can also be called the sampling rate. You are probably familiar with the term
CD-quality audio, which is defined as using a 16-bit sample resolution and a 44.1kHz sampling rate.
This means taking 44,100 samples per second, each of which contains 16 bits of sample resolution,
so that each sample can take one of 65,536 possible values.

Let’s do some math and find out how many bits of data provide one second of raw or uncompressed
digital audio data. This is calculated by multiplying the 16-bit sample resolution by the 44,100
sample frequency. This gives you 705,600 bits to represent one second of single-channel (mono)
CD-quality audio. Divide this by eight to get 88,200 bytes, and by 1,024 to get roughly 86
kilobytes. Stereo audio carries two channels, which doubles this to 176,400 bytes, or roughly 172
kilobytes, per second.

Raw (uncompressed) audio stores every sample at the full 16 bits, no matter how quiet or simple the
sound is, so this figure is exact rather than a maximum. A three-minute stereo track at CD quality
therefore takes about 30 megabytes.

So to figure out the raw data in an audio file, you would multiply the sample resolution (in bits) by
the sampling frequency, by the number of channels, and by the number of seconds in that audio
snippet. You can see that it can quickly add up to a large number! Audio codecs are really great at
optimizing this data down to an amazingly small data footprint with very little (audible) loss in
quality, as you will see during the course of this book.
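
The raw-size calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name raw_audio_bytes and its channels parameter are hypothetical, and the result covers the sample data alone, not any file header a container format might add.

```python
def raw_audio_bytes(bit_depth: int, sample_rate_hz: int, seconds: int, channels: int = 1) -> int:
    """Size in bytes of uncompressed audio sample data (no file header)."""
    total_bits = bit_depth * sample_rate_hz * channels * seconds
    return total_bits // 8  # eight bits per byte


# One second of mono CD-quality audio: 16 bits x 44,100 samples.
print(raw_audio_bytes(16, 44_100, 1))  # 88200

# Three minutes of stereo CD-quality audio, shown in megabytes.
three_minutes = raw_audio_bytes(16, 44_100, 180, channels=2)
print(round(three_minutes / (1024 * 1024), 1))  # 30.3
```

Comparing a figure like this against the size of an encoded file is a simple way to see how much work a codec is doing.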

So the exact same trade-off that you have in digital imaging and in digital video exists with digital
audio as well. The more data you include, the better quality result you get, but at the cost of a larger
data footprint.

In the visual medium, this is defined using color depth and pixels. With digital video, it’s defined in
frames. In the aural medium, it is defined via the sample resolution in combination with the sampling
rate. Common sampling rates in the digital audio industry include 8kHz, 22.05kHz, 32kHz, 44.1kHz,
48kHz, 96kHz, 192kHz, and even 384kHz.
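
To get a feel for how the data footprint grows across these sampling rates, here is a rough Python sketch that prints the raw data rate of a single channel at 16-bit and 24-bit sample resolution. The helper name kb_per_second is my own, and I am assuming 22,050Hz for the rate commonly rounded to 22kHz.

```python
# Common sampling rates from the text, in Hz (22,050 is assumed for "22kHz").
COMMON_RATES_HZ = [8_000, 22_050, 32_000, 44_100, 48_000, 96_000, 192_000, 384_000]


def kb_per_second(bit_depth: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Raw data rate in kilobytes (1,024 bytes) per second."""
    return bit_depth * sample_rate_hz * channels / 8 / 1024


for rate in COMMON_RATES_HZ:
    print(f"{rate / 1000:g}kHz: "
          f"{kb_per_second(16, rate):.1f} KB/s at 16-bit, "
          f"{kb_per_second(24, rate):.1f} KB/s at 24-bit")
```

Doubling either the sampling rate or the number of channels doubles the raw data rate, while going from 16-bit to 24-bit samples increases it by half.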
