

Copyright © 2016 by Wayne Stegall
Updated February 29, 2016.  See Document History at end for details.




Lossy Compression

How does compressed audio like mp3 work and what effect does it have on sound?

Introduction

Lossy audio compression formats such as mp3 have replaced CD audio for many users.  Advocates of such formats claim CD quality sound.  Meanwhile, others lament that these formats mark the demise of high-fidelity sound.  How does such compression work, and what actual effects on the sound are to be expected from an analysis of the process?

Description of Process

Encoding

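Lossy audio compression operates as in figure 1.  A PCM audio stream, usually of CD quality, is cut into time slices, each slice is transformed to the frequency domain, some frequencies are discarded, and the remainder are compressed and stored.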
Figure 1: Encode Lossy Compression
(each line: processing step → data produced)

PCM
Cut PCM stream into time slices → PCM
FFT → FFT
Choose/Reject Frequencies → FFT
Compress Remaining Frequencies → Compressed FFTs
File storage



As a result, the entire effect of compression depends on the algorithm used to choose which frequencies to discard.  Ordinarily, the choice is governed by a principle called psychoacoustic masking, along with the amount of compression chosen.  It is deemed that louder frequency samples mask quieter adjacent ones, so that discarding the quieter ones does not affect signal quality.
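The encoding steps can be sketched roughly as follows. This is only an illustration under stated assumptions: a naive DFT stands in for the FFT, the function names are made up for this example, and the keep_ratio threshold is a crude stand-in for a psychoacoustic model, not the rule any real encoder uses.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for a real FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def encode(pcm, slice_len=32, keep_ratio=0.1):
    """Slice the PCM stream, transform each slice, and keep only strong bins.

    keep_ratio is a hypothetical parameter: a bin survives only if its
    magnitude is at least keep_ratio times the loudest bin in its slice.
    """
    encoded = []
    for start in range(0, len(pcm) - slice_len + 1, slice_len):
        spectrum = dft(pcm[start:start + slice_len])
        peak = max(abs(c) for c in spectrum)
        kept = {k: c for k, c in enumerate(spectrum)
                if peak > 0 and abs(c) >= keep_ratio * peak}
        encoded.append(kept)  # sparse spectrum: bin index -> complex value
    return encoded

# A loud tone at bin 4 plus a much quieter tone at bin 9, two slices long.
pcm = [math.sin(2 * math.pi * 4 * t / 32) + 0.01 * math.sin(2 * math.pi * 9 * t / 32)
       for t in range(64)]
encoded = encode(pcm)
print([sorted(kept) for kept in encoded])  # bin indices that survived in each slice
```

In this toy signal the quiet tone falls below the threshold and is dropped along with the empty bins, so only the loud tone (and its mirror-image bin) is stored.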


Decoding

Decoding compressed audio operates as in figure 2.  Decoding itself imposes no penalty; any and all loss occurs on the encoding end.

Figure 2: Decode Compression
(each line: processing step → data produced)

File storage → Compressed FFTs
Decompress FFTs → FFT
IFFT → PCM
Recombine time slices → PCM
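Decoding can be sketched in the same spirit. The sparse format below (one dictionary of bin index to complex value per time slice) is an assumption made for this illustration, not a real file format, and a naive inverse DFT stands in for the IFFT.

```python
import cmath
import math

def idft(spectrum):
    """Naive inverse DFT; returns the real part, since the audio is real-valued."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def decode(encoded, slice_len=32):
    """Rebuild each slice from its surviving bins and join the slices in order."""
    pcm = []
    for kept in encoded:
        # Discarded bins are simply treated as zero.
        spectrum = [kept.get(k, 0j) for k in range(slice_len)]
        pcm.extend(idft(spectrum))
    return pcm

# -16j at bin 4 and +16j at bin 28 is the 32-point DFT of sin(2*pi*4*t/32).
encoded = [{4: -16j, 28: 16j}] * 2
pcm = decode(encoded)
```

Nothing is lost in this step: whatever bins reach the decoder are reproduced exactly, which is why the loss is attributed entirely to the encoder.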


Analysis of sound effect

How does this process affect the sound?  Although the first impression is that removing anything from a recording would harm it, some might argue that removing the quietest frequencies would not be so bad.  Perhaps noise reduction might even result.  The answer has to do with the way frequencies are rendered.  In the frequency domain, only certain frequencies, spaced at discrete intervals, are represented exactly.  A frequency that falls in between has to be approximated by a weighted combination of the discrete frequencies around it, chiefly the ones just below and just above it.  If the actual frequency is very close to one of its discrete neighbors, the other neighbor contributes at such a low level that it is at risk of deletion by the compression process.  If it were deleted, the original tone would be rounded up or down to the nearest discrete frequency.  Practical algorithms would not delete a low-level frequency sample immediately next to one of large magnitude.  However, in order to achieve a set level of compression, there is some low level at which this digitization of frequency is bound to occur, resulting in some perceptible alteration of the sound.  The more compression that is required, the more likely it is that some low-level sound is going to sound digital.
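The rounding of an in-between frequency can be demonstrated with a small sketch. The assumptions are a 64-sample slice, a naive DFT in place of the FFT, and a deliberately extreme encoder (the hypothetical keep_strongest below) that keeps only the single strongest bin and its mirror image.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for a real FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part, since the audio is real-valued."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def keep_strongest(spectrum):
    """Zero every bin except the strongest one and its mirror-image bin."""
    n = len(spectrum)
    peak = max(range(n // 2 + 1), key=lambda k: abs(spectrum[k]))
    kept = [0j] * n
    kept[peak] = spectrum[peak]
    kept[(n - peak) % n] = spectrum[(n - peak) % n]
    return peak, kept

n = 64
# A tone that falls between bins: 10.1 cycles per slice.
tone = [math.sin(2 * math.pi * 10.1 * t / n) for t in range(n)]
spectrum = dft(tone)
peak, kept = keep_strongest(spectrum)
rounded = idft(kept)  # now a pure tone at exactly 10 cycles per slice
```

Here bin 10 carries most of the tone and bin 11 only a small share. Once the small share is discarded, the decoded slice contains a tone at exactly bin 10, so the original frequency has been rounded down.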

Another effect is likely to occur as well.  Because the Fourier transform is symmetrical (the inverse transform is the same as the forward one apart from a scaling factor and the sign of the exponent), distorting the frequency domain is likely to create time anomalies as well.  The beginning and end of a musical impulse may be altered or smeared.  Even analog filters create time distortion, in the form of phase shift and group delay, when they alter the frequency response, although these are not considered as offensive as digital effects because they are linear.  Distortions that would create harmonic or intermodulation distortion when applied to the time domain would instead create pre- and post-echoes in the time domain when applied to the frequency domain, an effect these algorithms likely invoke.  The severity of these time effects would likely be limited by the shortness of the time slices used, because the effects are contained within each slice.  Perhaps at some rate of time slices, say 20 to 30 per second or more, the hearer may not readily notice that musical elements have been rearranged in time.  What a mess!
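A minimal sketch of the time smearing follows. It assumes a 64-sample slice that is silent except for a single click, and it alters the spectrum in a deliberately crude way (dropping every bin above a hypothetical cutoff) rather than by any real codec's rule. The point is only that sound appears before the click after decoding.

```python
import cmath

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for a real FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part, since the audio is real-valued."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def energy_before(signal, index):
    """Sum of squared samples that occur before the given sample index."""
    return sum(s * s for s in signal[:index])

n = 64
click_at = 48
frame = [0.0] * n
frame[click_at] = 1.0  # silence, then a single click

spectrum = dft(frame)
cutoff = 8  # hypothetical: keep only the 8 lowest bins and their mirror images
thinned = [c if (k <= cutoff or k >= n - cutoff) else 0j
           for k, c in enumerate(spectrum)]
decoded = idft(thinned)

print(energy_before(frame, click_at), energy_before(decoded, click_at))
```

The original slice has no energy at all before the click, while the decoded slice does. The smear stays inside the slice, which is why shorter slices limit how far ahead of the event it can be heard.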

All of these effects can be demonstrated and observed.  The next time you receive a call from someone on a cell phone, notice how robotic the background sounds are.  This results from the high levels of compression that such systems use.

Variations

The above description is meant to be an overview of how all such systems work.  My original understanding was informed by a tract describing the operation of the compression system used by Sony Minidisc recorders.  Minor details of implementation may differ in attempts to improve on this scheme.  For example, the claim is made that the open source Ogg Vorbis system improves the compressed sound by making digital artifacts sound like white noise, a result confirmed by listening tests.  Likely the average value of the discarded frequency samples is retained for each time slice, so that the gaps in the frequency spectrum can be filled with white noise when the FFTs are uncompressed on decoding.
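The gap-filling idea conjectured above might look something like the sketch below. It illustrates the guess only and is not drawn from the Vorbis specification. The function names, the threshold, and the choice of random phases are all assumptions made for this example.

```python
import cmath
import math
import random

def encode_with_noise_level(spectrum, threshold):
    """Keep bins at or above threshold; remember the mean magnitude of the rest."""
    kept = {k: c for k, c in enumerate(spectrum) if abs(c) >= threshold}
    dropped = [abs(c) for c in spectrum if abs(c) < threshold]
    noise_level = sum(dropped) / len(dropped) if dropped else 0.0
    return kept, noise_level

def fill_gaps(kept, noise_level, n, rng):
    """Rebuild the spectrum, filling discarded bins with random-phase noise."""
    spectrum = [kept.get(k, 0j) for k in range(n)]
    # Bins 0 and n/2 are left alone; the rest are filled in mirrored pairs
    # so that the decoded audio stays real-valued.
    for k in range(1, n // 2):
        if k not in kept and (n - k) not in kept:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            spectrum[k] = cmath.rect(noise_level, phase)
            spectrum[n - k] = spectrum[k].conjugate()
    return spectrum

# Hand-made 16-bin spectrum: one strong pair of bins over a low-level floor.
n = 16
original = [0.5 + 0j] * n
original[3] = 8 + 0j
original[13] = 8 + 0j

kept, noise_level = encode_with_noise_level(original, threshold=1.0)
filled = fill_gaps(kept, noise_level, n, random.Random(0))
```

Only one extra number per time slice is stored, so the cost in file size is small, and the decoder replaces empty gaps in the spectrum with a uniform low-level hiss instead of silence.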


Final remarks

It will remain a subjective matter whether listeners can tolerate the defects these systems introduce to the sound for the sake of compact storage.  In my own experience, I reduced my set of 60+ Bible CDs to one mp3 DVD and like the results for that purpose.  However, I have declined to buy albums I wanted that were available only as mp3 downloads and not on CD.




Document History
February 27, 2016  Created.
February 29, 2016  Added time anomalies to Analysis of sound effect.