Copyright © 2016 by Wayne Stegall
Updated February 29, 2016. See Document History at end for details.
How does compressed audio like mp3 work and what effect does it have on the sound?
Lossy audio compression formats such as mp3 have replaced CD
audio for many users. Advocates of such formats claim CD-quality
sound. Meanwhile, others lament that these formats mark the
demise of high-fidelity sound. How does such compression work, and
what actual effects on the sound are to be expected from an analysis
of the process?
Description of Process
Lossy audio compression operates as in figure 1. A PCM audio stream,
usually CD quality, is processed as follows:
- The stream is cut into time slices suitable for conversion by a
fast Fourier transform.
- A fast Fourier transform converts each time slice into its
discrete frequency equivalent.
- The frequency spectrum is analyzed to determine which frequency
samples to keep and which can be discarded.
- The frequency samples that remain are compressed by presumably
- The stream of compressed FFTs is stored in a file.
As a result, the entire effect of compression depends on the algorithm
used to choose frequencies to discard. Ordinarily, a principle
called psychoacoustic masking, along with the amount of compression
chosen, governs that choice. It is deemed that louder frequency samples mask quieter
adjacent ones so that discarding them does not affect signal quality.
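The encoding steps and the discard rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any real codec: a naive DFT stands in for the FFT, and the names encode, slice_len and rel_threshold are hypothetical. In particular, rel_threshold is a crude stand-in for a psychoacoustic model; it simply drops any frequency sample much quieter than the loudest one in the same slice.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (an FFT gives the same result, faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def encode(pcm, slice_len=32, rel_threshold=0.1):
    """Cut pcm into slices, transform each, and keep only the louder bins.

    rel_threshold is an assumed, simplified discard rule: a bin is dropped
    when its magnitude is below this fraction of the slice's loudest bin.
    """
    encoded = []
    for start in range(0, len(pcm) - slice_len + 1, slice_len):
        spectrum = dft(pcm[start:start + slice_len])
        peak = max(abs(c) for c in spectrum)
        kept = {k: c for k, c in enumerate(spectrum)
                if abs(c) >= rel_threshold * peak}
        encoded.append(kept)  # sparse slice: bin index -> complex value
    return encoded

# A loud tone at bin 4 plus a much quieter tone at bin 9, two slices long
n = 32
pcm = [math.sin(2 * math.pi * 4 * t / n) + 0.01 * math.sin(2 * math.pi * 9 * t / n)
       for t in range(2 * n)]
slices = encode(pcm)
kept_bins = sorted(slices[0])  # bin 4 and its mirror image, bin 28
```

In this example the quiet tone at bin 9 falls below the threshold and is discarded, so only the loud tone is stored.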
Decoding compressed audio operates as in figure 2. Decoding itself
imposes no penalty; any and all loss occurs on the encoding end.
- The stream of compressed FFTs is retrieved from the audio file.
- Each compressed FFT is decompressed to a full FFT containing
as many frequency samples as before encoding.
- An inverse FFT (IFFT) restores each original PCM time slice.
- Individual time slices are recombined into a continuous PCM stream.
- Of course, after the PCM stream is restored from the compressed
file, it is converted to analog audio by a DAC as any PCM audio signal would be.
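The decoding steps above can be sketched the same way. Again this is a hedged illustration rather than a real decoder: the function decode and the sparse dictionary layout (bin index to complex value) are assumptions made for the example. It shows the point made above that decoding adds no loss of its own: when the encoder discards nothing, the inverse transform returns the original samples to within rounding error.

```python
import cmath
import math

def dft(x):
    """Naive forward transform, used here only to build test input."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse transform; the real part is the PCM slice."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def decode(encoded, slice_len):
    """Rebuild a PCM stream from sparse slices (bin index -> complex value)."""
    pcm = []
    for kept in encoded:
        # Discarded bins come back as zero, restoring the full-length spectrum
        full = [kept.get(k, 0j) for k in range(slice_len)]
        pcm.extend(idft(full))
    return pcm

# Round trip with nothing discarded: the slice is restored
n = 16
original = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
            for t in range(n)]
lossless = [dict(enumerate(dft(original)))]
restored = decode(lossless, n)
max_error = max(abs(a - b) for a, b in zip(original, restored))
```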
Analysis of sound effect
How does this process affect the sound? Although the first
impression is that removing anything from a recording would harm it,
some might argue that
removing the quietest frequencies would not be so bad. Perhaps
noise reduction might even result. The answer has to do with the way
frequencies are rendered. Certainly, in the frequency domain only
certain frequencies are represented, at discrete intervals. In-between
frequencies have to be approximated by weighted combinations of
the discrete frequencies just below and just above the one to be
represented. If the actual frequency is very close to one of its
discrete neighbors, the other combining one would add at such a low
level that it would be at risk of deletion by the compression
process. Then, if it were deleted, the original tone would be
rounded up or down to the nearest discrete frequency. Practical
algorithms would not delete a low-level frequency sample immediately
next to one of large magnitude. However, in order to achieve a set
level of compression, there is some low level at which this
digitization of frequency is bound to occur, resulting in some
perception of the alteration of sound. The more compression that
is required, the more likely it is that some low-level sound is going to sound altered.
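The rounding of an in-between frequency described above can be observed with a small experiment. The sketch below is illustrative only: the slice length N, the test frequency TRUE_FREQ and the 10 percent discard rule are assumptions chosen to make the effect visible, and a naive DFT stands in for the FFT. A tone at 10.05 cycles per slice lies between bins 10 and 11 but very close to bin 10, so its bin 11 component is weak.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (an FFT gives the same result, faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
TRUE_FREQ = 10.05  # cycles per slice, deliberately between bins 10 and 11

tone = [math.sin(2 * math.pi * TRUE_FREQ * t / N) for t in range(N)]
mags = [abs(c) for c in dft(tone)]
peak = max(mags)

# How strong is the upper neighbor compared with the nearest bin?
neighbor_ratio = mags[11] / mags[10]

# Assumed discard rule: drop every bin below 10 percent of the peak
kept = [k for k, m in enumerate(mags) if m >= 0.1 * peak]
```

With these values only bin 10 and its mirror image, bin 54, survive. A decoder given just those two samples can only reproduce a tone at exactly 10 cycles per slice, which is the rounding to the nearest discrete frequency discussed above.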
Another effect is likely to occur as well. Because the Fourier
transform is symmetrical – that is, the inverse
transform is the same as the forward apart from a scaling factor –
altering the frequency domain is likely to create time anomalies as well. The
beginning and end of a musical impulse may be altered or smeared.
Even analog filters create time distortion, in the form of phase shift
and group delay, when frequency is altered, although these are not
considered as offensive as digital effects because they are
linear. Distortions that would create harmonic or intermodulation
distortion when applied to the time domain would
create pre- and post-echoes in the time domain when applied to the
frequency domain, an effect likely invoked by these algorithms.
These effects would likely be diminished in their severity by the shortness
of the time slices used, because they would be contained by them.
Perhaps at some rate of time slices – say
20 to 30 per second or more – the hearer may not readily notice
that musical elements have been rearranged in time. What a mess!
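The time smearing described above can also be demonstrated numerically. The following is a sketch under assumed values, not a model of any particular codec: the slice length N, the onset position ONSET and the 50 percent discard rule are all chosen for illustration. The slice is silent until a short tone burst begins near its end. After the weaker frequency samples are discarded and the slice is transformed back, sound appears before the burst starts.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (an FFT gives the same result, faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse transform; the real part is the PCM slice."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

N = 64
ONSET = 48  # the burst starts here; everything before it is silence

burst = [0.0] * ONSET + [math.sin(2 * math.pi * 8 * t / N) for t in range(ONSET, N)]

spectrum = dft(burst)
peak = max(abs(c) for c in spectrum)
# Assumed discard rule: zero every bin below half of the peak magnitude
pruned = [c if abs(c) >= 0.5 * peak else 0j for c in spectrum]
decoded = idft(pruned)

# Energy in the 16 samples just before the onset, before and after compression
original_pre_energy = sum(s * s for s in burst[32:ONSET])
pre_echo_energy = sum(s * s for s in decoded[32:ONSET])
```

The original has no energy at all before the onset, while the decoded slice does. Because the smearing cannot extend beyond the slice, shorter slices limit how far ahead of the event the pre-echo can be heard.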
All of these effects can be proven and observed. The
next time you receive a call from someone on a cell phone, notice the
background sounds and how robotic they sound. This results from the
high levels of compression that such systems use.
The above description is meant to be an overview of how all such
systems work. My original understanding was informed by a tract
describing the operation of the compression system used by Sony
Minidisc recorders. Minor details of implementation may differ in
attempts to improve this system. For example, the claim is made that
the open source Ogg Vorbis system improves the compressed sound by
making digital artifacts sound like white noise, a result confirmed by
listening tests. Likely the
average value of discarded frequency samples is retained for each time
slice, so that white noise can fill the gaps in the frequency spectrum
when the FFTs are uncompressed on decoding.
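The noise-filling idea guessed at above might look something like the sketch below. It illustrates only that guess and should not be read as the actual Ogg Vorbis design; the function noise_fill and its parameters are hypothetical. The encoder is assumed to have stored, alongside the kept samples, one average magnitude avg_mag for the discarded ones. On decoding, each gap is filled with a sample of that magnitude and random phase, mirrored so that the inverse transform still yields a real signal.

```python
import cmath
import math
import random

def noise_fill(kept, n, avg_mag, rng):
    """Expand sparse bins to a full spectrum, filling gaps with noise.

    kept    -- bin index -> complex value (assumed conjugate-symmetric)
    n       -- number of bins in the full spectrum
    avg_mag -- average magnitude of the discarded bins, as stored by
               a hypothetical encoder
    rng     -- random.Random instance supplying the phases
    """
    full = [kept.get(k) for k in range(n)]
    for k in range(1, n // 2):
        if full[k] is None and full[n - k] is None:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            full[k] = cmath.rect(avg_mag, phase)  # magnitude avg_mag, random phase
            full[n - k] = full[k].conjugate()     # mirror keeps the output real
    # Anything still empty (such as the DC bin) stays at zero
    return [c if c is not None else 0j for c in full]

# One loud tone was kept; the gaps are filled at a low average level
kept = {4: -16j, 28: 16j}
full = noise_fill(kept, 32, 0.2, random.Random(0))
```

The kept samples pass through unchanged, and every gap now carries low-level energy with no fixed phase relationship, which is what would make the artifacts resemble noise rather than tones.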
It will remain a subjective matter whether listeners can tolerate the
defects introduced to the sound by these systems for the sake of
compact memory storage. In my own experience, I reduced my set of
60+ Bible CDs to one mp3 DVD and like the results for that
purpose. However, I have declined to buy albums that I wanted when
they were only available for download as mp3 and not on CD.
Document History
February 27, 2016 Created.
February 29, 2016 Added time anomalies to Analysis of sound effect.