Space Savers:
Bend, Twist and Crunch Your Audio Files for the Web
by Neil Leonard
www.neilleonard.com
published in Electronic Musician, July 1998
Now that so many people are connected via the Web, it is easier to exchange
music than ever before. All you have to do is point your browser to a Web
page with posted music files, click on a file name and download the music.
Right? Actually, in many respects the Web has increased the complexities
of distribution. When you buy a CD or standard audio cassette there is no
question that it will play on your home system.
Instant playback is not guaranteed when dealing with audio files that you
download from the Internet. Web distribution is more complex than CD or
cassette distribution partly due to the limitations of available bandwidth.
A person using a dedicated T1 line can transfer up to 1.544 Mbps (million
bits per second), which is fast enough to download a CD quality audio file
in real-time. However, a vast number of people accessing the Internet still
use 28.8 Kbps (thousand bits per second) or even 14.4 Kbps modems. If you
try to download one minute of CD quality audio using one of these modems
you might find yourself waiting for the better part of an hour, or longer.
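The waiting times are easy to estimate. The sketch below assumes the standard
CD format (44.1 kHz, 16-bit, stereo) and an ideal connection with no protocol
overhead; the function names are illustrative only.

```python
# CD-quality audio: 44,100 samples per second, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_audio_bits(seconds):
    """Number of bits in `seconds` of uncompressed CD-quality audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * seconds


def download_minutes(audio_seconds, link_kbps):
    """Ideal transfer time in minutes, ignoring any protocol overhead."""
    return cd_audio_bits(audio_seconds) / (link_kbps * 1000) / 60


if __name__ == "__main__":
    # Rate needed to play CD audio as fast as it arrives.
    print(f"real-time rate: {cd_audio_bits(1) / 1000:.1f} Kbps")
    for kbps in (28.8, 14.4):
        minutes = download_minutes(60, kbps)
        print(f"{kbps} Kbps modem: about {minutes:.0f} minutes per minute of audio")
```

Real connections rarely reach their nominal speed, so actual downloads tend to
take longer than these figures suggest.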
In order to expedite transfer rates and reduce the amount of disk space
occupied by digital audio files, several approaches to encoding audio data
have evolved. You will find yourself using these technologies when you download
audio clips, tune in to a Web radio station, play a digital video, or listen
to audio streaming from a musician's or record company's Web site. Even when
you are not on-line you might use an audio compression algorithm simply
to save disk space.
The piece of software that makes all this possible is called a codec, which
stands for compressor/decompressor. It's worth pointing out that a codec
is unlike a software-based dynamic range compressor. Here the goal is to
reduce the number of bits that are required to represent the waveform.
The majority of codecs that we will look at here are lossy, meaning that
once the audio signal has been encoded, there is no guarantee that decompressing
it will produce an exact replica of the original data. You might ask, 'How
can we just throw away parts of the signal?' To answer that question let's
look at how a type of codec called a waveform coder works.
Waveform coders produce a close approximation of the waveform using fewer
bits. One widely used waveform coder is IMA-ADPCM which stands for International
Multimedia Association's specification for Adaptive Differential Pulse-Code
Modulation. This is a variant of the ADPCM that is widely used in the telecommunications
industry. IMA-ADPCM was designed specifically for desktop audio applications
and is incorporated into the Windows operating system, where it is referred
to as ADPCM. It is also part of Apple's QuickTime software. Mac users often
refer to it as IMA compression. In some cases, files compressed with IMA-ADPCM
can sound indistinguishable from the original uncompressed version.
IMA-ADPCM works because the amplitude of an audio waveform tends to change
gradually. As it turns out, the waveform can be represented more efficiently
by saving the difference between consecutive samples, as opposed to saving
the absolute value of individual samples.
When an audio signal is sampled by a 16-bit analog to digital converter,
the incoming analog signal is measured at periodic intervals and converted
to corresponding 16-bit quantities. These 16-bit values can represent any
whole number between 0 and 65,535. If we measure the difference between
consecutive samples in one channel, we might find that the absolute difference
between two values rarely exceeds 100. Well, if the difference values of
a waveform never exceed 100, then we do not need 16 bits per sample to represent
it.
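To see the saving in numbers, here is a minimal sketch of difference coding.
The sample values and function names are made up for illustration; the point
is only that small differences fit in fewer bits than absolute values do.

```python
def deltas(samples):
    """First sample as-is, then the difference between consecutive samples."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def undo_deltas(diffs):
    """Rebuild the original samples with a running sum of the differences."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


def bits_needed(max_abs):
    """Bits for a signed value in -max_abs..max_abs (magnitude bits plus sign)."""
    return max_abs.bit_length() + 1


if __name__ == "__main__":
    samples = [1000, 1040, 1095, 1120, 1090, 1010, 950]  # hypothetical values
    diffs = deltas(samples)
    print(diffs)                               # small numbers after the first
    print(undo_deltas(diffs) == samples)       # the round trip is exact
    print(bits_needed(100), "bits cover differences up to 100")
```

Nothing is lost at this stage. The loss comes later, when the differences are
squeezed into a fixed, smaller number of bits.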
In IMA-ADPCM each sample is represented by a 4-bit difference value. A 16-bit,
44.1 kHz, stereo file will be reduced to 25% of its original size. This
gives us a fixed 4:1 compression ratio. [Fig 1. Sound Converter can be used
to encode audio files using a variety of codecs including IMA-ADPCM.] Encoding
the difference between samples is sometimes referred to as difference
modulation, or delta modulation. Difference modulation is not unique to
IMA-ADPCM; in fact, it is the basis for other audio codecs including Dolby
Labs' AC-1 codec.
Let's have a closer look at how these four-bit values are used. Four bits
can represent sixteen different patterns. So, in its simplest form, with one
bit used for the sign, a four-bit difference value can represent a number
between -7 and 7. However, what happens when the waveform's amplitude jumps
by a value greater than 7? Well, rather than using a range of numbers between
-7 and 7, we could use the same 4 bits to represent even numbers between
-14 and +14. To wrap up our overview of this codec, let's look at how
IMA-ADPCM formats the waveform data.
The IMA-ADPCM codec groups consecutive samples in bundles. On the Macintosh
each bundle consists of 64 samples. Bundles begin with a step index, or
multiplier, that scales the difference values. For example, this value
determines whether the difference values are on a -7 to 7 or a -14 to 14
scale. The step index value can vary, or be adapted to the needs of each
bundle, hence the A (adaptive) in ADPCM. The beginning of each bundle also
has a predictor value to specify the absolute amplitude of the first sample
of each bundle.
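The bundle idea can be sketched in a few lines. This is a simplified
illustration of the scheme described above, not the actual IMA-ADPCM
algorithm: real implementations differ in how the step is chosen and stored,
and the function names and the rule for picking the multiplier here are
assumptions made for the example.

```python
def encode_bundle(samples):
    """Encode one bundle as (predictor, step, codes).

    predictor: absolute value of the first sample
    step:      multiplier shared by every code in the bundle (assumed rule)
    codes:     one value in -7..7 for each remaining sample
    """
    predictor = samples[0]
    biggest = max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0)
    step = max(1, -(-biggest // 7))  # ceiling division: smallest step covering the largest jump
    codes, current = [], predictor
    for target in samples[1:]:
        code = max(-7, min(7, round((target - current) / step)))
        codes.append(code)
        current += code * step  # follow what the decoder will reconstruct
    return predictor, step, codes


def decode_bundle(predictor, step, codes):
    """Rebuild an approximation of the bundle from its header and codes."""
    out = [predictor]
    for code in codes:
        out.append(out[-1] + code * step)
    return out


if __name__ == "__main__":
    original = [0, 5, 12, 20, 26, 30]  # hypothetical sample values
    predictor, step, codes = encode_bundle(original)
    print(step, codes)
    print(decode_bundle(predictor, step, codes))  # close to, not equal to, the original
```

When the largest jump in a bundle is 7 or less, the multiplier stays at 1 and
the bundle decodes exactly. Larger jumps force a coarser scale, which is where
the approximation comes in.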
Despite its often stunning results, IMA-ADPCM has drawbacks. The IMA did not
define the number of samples that are in a bundle or the number of bits
that are allocated for the step index and predictor values. As a result,
Microsoft and Apple came up with their own incompatible implementations
of IMA-ADPCM that use different bundle sizes and bit allotments for the step
index and predictor values. You might need to know what platform was used
to create an IMA-ADPCM file prior to selecting a piece of software to listen
to it.
There are additional caveats. ADPCM does not lend itself to random access.
You might have to decode your IMA-ADPCM files with a piece of utility software
before editing them with your favorite waveform editor. IMA-ADPCM encoders
convert the incoming file to 16 bits before creating the final file. If
you process an 8-bit sample file, it will automatically be converted to
16 bits before it is reduced to a 4-bit-per-sample file. So, you are better
off encoding a 16-bit version of the file. It will sound much better and
use the same amount of disk space.
The waveform coder is just one type of codec. What happens when you go to
a Web page where audio playback happens nearly instantaneously and files
are not downloaded? These streaming technologies rely on perceptual coders,
which use more intensive algorithms to provide even greater data reduction
ratios. Perceptual coders are the basis of MPEG (used in Shockwave) and
Dolby AC-2 and Dolby AC-3. (See "Surfing the Pipeline," EM, September
1997, for an overview of products that use these technologies.)
Perceptual coders radically reduce the amount of stored data, yet can yield
CD quality sound files. The music industry now views this as a powerful
alternative to traditional distribution methods. Web distribution practically
eliminates manufacturing costs and provides around-the-clock shopping.
At present you can audition high quality preview files or even purchase
tracks that have been encoded using a perceptual coder. If you download
a track you can use a piece of software to decode it and burn it to a standard
Red Book audio CD. One such system, Liquid Audio, has already been used
to publish Duran Duran's new album. Liquid Audio's file server software
generates broadcast reports for BMI, ASCAP and the Harry Fox Agency. While
this delivery medium is in its infancy, some specialists believe that on-line
sales will reach $1.3 billion by the end of this millennium. [Fig 2. Liquid
Audio screen shot.] Both Liquid Audio and RealAudio use perceptual coders
developed by Dolby Laboratories as the basis of their streaming technologies.
Unlike waveform coders, perceptual coders do not attempt to preserve the
contour of the original waveform. Instead, the goal is to ensure that the
final output signal sounds like the original. To achieve this, the encoding
algorithm uses a model of the human auditory system to determine what parts
of the signal are masked, or inaudible. These parts of the audio signal
are deemed irrelevant and are removed. Hence, the amount of information
that needs to be stored is reduced.
For example, if a guitar concerto was encoded using a perceptual coder,
the algorithm would determine that the frequencies produced by the guitar
are of critical importance during cadenzas. However, when the full string
section comes in we cannot always distinguish the guitar part. At these
points the coder would eliminate the frequencies produced by the guitar,
without any perceptible loss of audio quality.
In order to perform these tasks the encoder analyzes the input signal within
consecutive overlapping time blocks that might be anywhere from a few hundred
to a few thousand samples long. Each block is divided into narrow frequency
sub-bands of different sizes according to the frequency sensitivity of human
hearing. A psychoacoustic model is then used to determine which sub-bands
contain irrelevant information that can be discarded.
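The analysis stage can be sketched roughly as follows. Everything specific
here is an assumption made for illustration: the block size, the sub-band
edges, and above all the "model", which simply flags bands that are far
quieter than the loudest one. A real psychoacoustic model is much more
elaborate.

```python
import cmath
import math

# Hypothetical sub-band edges, in DFT bins; the bands widen as frequency rises.
BAND_EDGES = [0, 2, 4, 8, 16, 32]


def overlapping_blocks(signal, size=64, hop=32):
    """Yield consecutive blocks that overlap by (size - hop) samples."""
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]


def band_energies(block):
    """Energy in each sub-band, computed with a plain (slow) DFT."""
    n = len(block)
    energies = []
    for lo, hi in zip(BAND_EDGES, BAND_EDGES[1:]):
        total = 0.0
        for k in range(lo, hi):
            bin_value = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                            for t, x in enumerate(block))
            total += abs(bin_value) ** 2
        energies.append(total)
    return energies


def discardable_bands(energies, ratio=1e-3):
    """Crude stand-in for a psychoacoustic model (an assumption, not a real
    masking calculation): flag bands far quieter than the loudest band."""
    loudest = max(energies)
    return [e < loudest * ratio for e in energies]


if __name__ == "__main__":
    # A pure tone that falls in the third sub-band (bins 4 to 7).
    tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(128)]
    for block in overlapping_blocks(tone):
        print(discardable_bands(band_energies(block)))
```

For the test tone, every band except the one containing the tone is flagged,
so only that band would need to be stored.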
Perceptual coders are scalable codecs, meaning that the compression ratio
can be adjusted by the user. It is common for these coders to include a
dialog box that allows the user to set the compression ratio to meet a minimum
bit rate that is expected when the file is played back via a modem of a
particular speed. [Fig 3. Macromedia's Shockwave Audio codec allows the
user to scale the size of the encoded file to match the limits of a particular
modem speed.] This information is used to help determine how many bits to
allocate for different frequency ranges. Sub-bands that are deemed more
critical to our perception of the music get a more generous allotment of
bits.
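As a rough illustration of how a target bit rate turns into a per-band
allotment, consider the sketch below. The importance weights, block spacing
and function names are hypothetical; an actual coder derives the weights from
its psychoacoustic model.

```python
def bits_per_block(target_kbps, sample_rate_hz, hop_samples):
    """Bit budget available for one block at the target bit rate."""
    blocks_per_second = sample_rate_hz / hop_samples
    return int(target_kbps * 1000 / blocks_per_second)


def allocate_bits(budget, importance):
    """Split `budget` bits across sub-bands in proportion to `importance`.

    `importance` is a list of positive integer weights, one per sub-band.
    """
    total = sum(importance)
    shares = [budget * weight // total for weight in importance]
    # Bits lost to rounding down go to the most important band.
    shares[importance.index(max(importance))] += budget - sum(shares)
    return shares


if __name__ == "__main__":
    budget = bits_per_block(target_kbps=16, sample_rate_hz=32_000, hop_samples=512)
    print(budget)
    print(allocate_bits(budget, [8, 4, 2, 1, 1]))  # hypothetical weights
```

Lowering the target bit rate shrinks the budget for every block, so the less
important bands are the first to end up with too few bits to be represented
well.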
Encoded files can be streamed or posted on the Web for downloading. In either
case, a special piece of software is required to play back the file. At playback
time the decoder uses an inverse filter bank to synthesize audio.
So, we have examined two types of audio codecs. Are there more? Definitely.
If you are running Windows 95, look at the Advanced Multimedia Properties
in the Control Panel. Chances are that you will find over a half dozen audio
codecs listed here. Fortunately, in most basic cases Windows finds the right
codec for the task, and you might not even know that codecs are being used.
Once you begin to explore the available codecs for your OS you might run
across µ-Law, which has been used for some time in telephone systems in North
America and Japan. It was defined by CCITT (International Telegraph and
Telephone Consultative Committee). It compresses audio using 8 bits per sample
and can achieve a signal-to-noise ratio and dynamic range equivalent to that
of a 12-bit system. The step size follows a logarithmic scale that is well
suited for encoding speech. Another waveform coder is Apple's MACE (Macintosh
Audio Compression and Expansion) for encoding 8-bit files using difference
modulation.
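The logarithmic scale behind µ-Law can be sketched with the textbook
companding formula. This uses the continuous curve with µ = 255; the telephone
standard itself uses a segmented approximation of that curve, and the 8-bit
packing shown here is a simplification assumed for the example.

```python
import math

MU = 255  # value used in North America and Japan


def mu_law_compress(x):
    """Map a sample in -1.0..1.0 onto a logarithmic scale in -1.0..1.0."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Quantize the compressed value to one of 256 levels (0..255)."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)


def decode_8bit(code):
    """Turn an 8-bit code back into an approximate sample value."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        code = encode_8bit(sample)
        print(sample, code, round(decode_8bit(code), 5))
```

Quiet samples get finely spaced codes and loud samples get coarsely spaced
ones, which is why 8 bits can cover a wider dynamic range than 8-bit linear
coding.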
Does it end here? Hardly. Audio coding technologies are being updated on
a monthly, if not weekly, basis. Emagic just introduced ZAP (Zero-loss Audio
Packer), a stand-alone application that allows users to archive their work
with up to 60% savings in file size. When expanded from the compressed files,
the original waveform is restored unchanged. ZAP supports SoundDesigner
II, AIFF and Windows Wave file formats. Files that have been compressed
with ZAP can be saved as self-extracting files, making it possible to decompress
the files without additional software.
By the time you read this, Apple should have released QuickTime 3.0, which
ups the ante even further by incorporating two new audio codecs. The QDesign
Music Codec (QDMC) is designed to deliver CD quality music via a 28.8 Kbps
modem in real-time. QDMC offers 99 percent file size reduction, without
reducing audio quality. QUALCOMM's PureVoice is optimized for speech and
can stream telephone quality speech information over a 28.8 Kbps modem.