Lossy Compression

How does compressed audio like mp3 work and what effect does it have on sound?

Introduction

Lossy audio compression formats such as mp3 have replaced CD audio for many users. Advocates of such formats claim CD-quality sound. Meanwhile, others lament that these formats mark the demise of high-fidelity sound. How does such compression work, and what effects on the sound should be expected from an analysis of the process?

Description of Process

Encoding

Lossy audio compression operates as in figure 1. A PCM audio stream, usually of CD quality, is processed as follows:

The stream is cut into time slices suitable for conversion by a fast Fourier transform.
A fast Fourier transform converts each time slice into its discrete frequency equivalent.
The frequency spectrum is analyzed to determine which frequency samples to keep and which can be discarded.
The frequency samples that remain are compressed, presumably by lossless means.
The stream of compressed FFTs is stored in a file.
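The encoding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real codec is written: the names `dft` and `encode`, the slice length of 32 samples, and the rule "keep the few largest frequency samples" are all made up for the example. A plain (slow) discrete Fourier transform stands in for the FFT, since both give the same numbers, and the final lossless compression and file storage steps are left out.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (an FFT gives the same result, only faster)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def encode(pcm, slice_len=32, keep=4):
    """Return one sparse spectrum {bin_index: complex_value} per time slice."""
    frames = []
    # Step 1: cut the stream into time slices (a trailing partial slice is dropped).
    for start in range(0, len(pcm) - slice_len + 1, slice_len):
        # Step 2: convert the slice to its discrete frequency equivalent.
        spectrum = dft(pcm[start:start + slice_len])
        # For real input only bins 0..n/2 are needed; the rest are mirror images.
        half = range(slice_len // 2 + 1)
        # Step 3 (hypothetical rule): keep only the `keep` largest bins.
        kept = sorted(half, key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
        frames.append({k: spectrum[k] for k in sorted(kept)})
    return frames

# Two slices of a tone sitting exactly on bin 4, plus a much fainter tone on bin 9.
pcm = [
    math.sin(2 * math.pi * 4 * t / 32) + 0.01 * math.sin(2 * math.pi * 9 * t / 32)
    for t in range(64)
]
frames = encode(pcm, slice_len=32, keep=1)
print(frames)
```

With `keep=1` only the loud tone on bin 4 survives in each slice and the faint tone on bin 9 is thrown away, which is the whole idea of the scheme in miniature.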

Figure 1: Encode Lossy Compression

PCM -> Cut PCM stream into time slices -> PCM -> FFT -> Choose/Reject Frequencies -> FFT -> Compress Remaining Frequencies -> Compressed FFTs -> File storage

As a result, the entire effect of compression depends on the algorithm used to choose which frequencies to discard. Ordinarily, the choice is governed by a principle called psychoacoustic masking, along with the amount of compression chosen. Louder frequency samples are deemed to mask quieter adjacent ones, so that discarding the quieter ones does not affect the perceived signal quality.
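A crude version of such a masking rule might look like the following. It is a sketch only: the function name `masked_bins`, the reach of two neighboring samples, and the 30 dB margin are assumptions picked for illustration. Real psychoacoustic models are considerably more elaborate than a fixed margin against the loudest neighbor.

```python
import math

def masked_bins(magnitudes, reach=2, margin_db=30.0):
    """Return indices of frequency samples judged inaudible by a crude masking rule.

    A sample counts as masked when the loudest sample within `reach` bins of it
    is more than `margin_db` decibels louder (both values are assumptions).
    """
    masked = []
    for k, mag in enumerate(magnitudes):
        lo = max(0, k - reach)
        hi = min(len(magnitudes), k + reach + 1)
        loudest = max(magnitudes[lo:hi])
        if mag == 0.0:
            masked.append(k)  # nothing to keep in an empty bin
        elif 20 * math.log10(loudest / mag) > margin_db:
            masked.append(k)
    return masked

magnitudes = [0.001, 0.1, 1.0, 0.02, 0.001, 0.001, 0.3, 0.001]
print(masked_bins(magnitudes))
```

Here the sample at index 1 is only 20 dB below its loud neighbor and is kept, while the much quieter samples around the two peaks are marked for discarding. Raising the compression amounts to lowering `margin_db`, so that more samples are thrown away.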

Decoding

Decoding compressed audio operates as in figure 2. Decoding itself imposes no penalty; any and all loss occurs on the encoding end.

The stream of compressed FFTs is retrieved from the audio file.
Each compressed FFT is decompressed to a full FFT containing as many frequency samples as before encoding.
An inverse FFT (IFFT) restores each original PCM time slice.
Individual time slices are recombined into a continuous PCM stream.
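The decoding steps above can be sketched the same way. Again this is an illustration under assumptions: `idft` and `decode` are made-up names, each stored slice is taken to be a small table of surviving frequency samples (bin number and complex value), and the lossless decompression step is skipped. Discarded samples simply come back as zero.

```python
import cmath
import math

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def decode(frames, slice_len):
    """Rebuild a PCM stream from sparse spectra {bin_index: complex_value}."""
    pcm = []
    for kept in frames:
        # Expand to a full spectrum; every discarded bin stays at zero.
        spectrum = [0j] * slice_len
        for k, value in kept.items():
            spectrum[k] = value
            # Restore the mirror-image bin so the output comes out real-valued.
            if 0 < k < slice_len / 2:
                spectrum[slice_len - k] = value.conjugate()
        # The inverse transform restores the time slice; slices are then joined.
        pcm.extend(sample.real for sample in idft(spectrum))
    return pcm

# Two stored slices, each holding a single surviving frequency sample on bin 4.
frames = [{4: -16j}, {4: -16j}]
pcm = decode(frames, slice_len=32)
print(len(pcm), round(pcm[2], 6))
```

The decoder has no way of knowing what was discarded. It faithfully rebuilds whatever spectrum it is given, which is why all of the loss is on the encoding end.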

Figure 2: Decode Compression

File storage -> Compressed FFTs -> Decompress FFTs -> FFT -> IFFT -> PCM -> Recombine time slices -> PCM

Of course, after the PCM stream is restored from the compressed file, it is converted to analog audio by a DAC as any PCM audio signal would be.

Analysis of sound effect

How does this process affect the sound? Although the first impression is that removing anything from a recording would harm it, some might argue that removing the quietest frequencies would not be so bad. Perhaps noise reduction might even result. The answer has to do with the way frequencies are rendered. In the frequency domain, only certain frequencies, spaced at discrete intervals, are represented. A frequency that falls in between has to be approximated by a weighted combination of the discrete frequencies around it, chiefly the one just below and the one just above. If the actual frequency is very close to one of its discrete neighbors, the other neighbor contributes at such a low level that it is at risk of deletion by the compression process. If it were deleted, the original tone would be rounded up or down to the nearest discrete frequency. Practical algorithms would not delete a low-level frequency sample immediately next to one of large magnitude. However, in order to achieve a set level of compression, there is some low level at which this digitization of frequency is bound to occur, resulting in some perceptible alteration of the sound. The more compression that is required, the more likely it is that some low-level sound is going to sound digital.
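The in-between frequency case is easy to try out. The sketch below assumes a slice of 64 samples and a tone of 10.1 cycles per slice, that is, just above discrete frequency 10. The helper `dft` is a plain discrete Fourier transform written out for the example.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

N = 64
# A tone of 10.1 cycles per slice falls between discrete frequencies 10 and 11.
tone = [math.sin(2 * math.pi * 10.1 * t / N) for t in range(N)]
mags = [abs(x) for x in dft(tone)[: N // 2 + 1]]

peak = max(range(len(mags)), key=lambda k: mags[k])
# Level of the upper neighbor relative to the peak, in decibels.
ratio_db = 20 * math.log10(mags[peak + 1] / mags[peak])
print(peak, round(ratio_db, 1))
```

The strongest sample lands on discrete frequency 10, and the neighbor at 11 carries the remaining information about the true pitch at a clearly lower level. An encoder that discards that weaker neighbor leaves a tone of exactly 10 cycles per slice, which is the rounding described above.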

Another effect is likely to occur as well. Because the Fourier transform is symmetrical (that is, the inverse transform is the same as the forward transform apart from a scaling factor and the sign of the exponent), distorting the frequency domain is likely to create time anomalies as well. The beginning and end of a musical impulse may be altered or smeared. Even analog filters create time distortion, in the form of phase shift and group delay, when they alter the frequency response, although these are not considered as offensive as digital effects because they are linear. Distortions that would create harmonic or intermodulation distortion if applied to the time domain create pre- and post-echoes in the time domain when applied to the frequency domain, an effect likely invoked by these algorithms. These time effects would likely be diminished in severity by the shortness of the time slices used, because they would be contained within each slice. Perhaps at some rate of time slices, say 20 to 30 per second or more, the hearer may not readily notice that musical elements have been rearranged in time. What a mess!
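The pre-echo effect can be shown with a single click. The sketch below assumes a 64-sample slice that is silent except for one click at sample 48, and a deliberately blunt discard rule (drop every frequency sample above bin 8) chosen for clarity, not because any real encoder works this way. The helpers `dft` and `idft` are written out for the example.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

N = 64
click_at = 48
pcm = [0.0] * N
pcm[click_at] = 1.0  # silence, then a single click late in the slice

spectrum = dft(pcm)
cutoff = 8  # hypothetical discard rule: drop everything above bin 8
for k in range(N):
    # min(k, N - k) is the bin's distance from DC, counting mirror-image bins.
    if min(k, N - k) > cutoff:
        spectrum[k] = 0j

decoded = [v.real for v in idft(spectrum)]
# Largest sample value that now appears BEFORE the click.
pre_echo = max(abs(v) for v in decoded[:click_at])
print(round(pre_echo, 3), round(decoded[click_at], 3))
```

Before encoding, everything ahead of the click is exact silence. After decoding, the click is weaker and ripples appear on both sides of it, including ahead of it in time. The ripples stay inside the slice, which is why shorter slices contain the damage.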

All of these effects can be demonstrated and observed. The next time you receive a call from someone on a cell phone, notice how robotic the background sounds are. This results from the high levels of compression that such systems use.

Variations

The above description is meant to be an overview of how all such systems work. My original understanding was informed by a tract describing the operation of the compression system used by Sony MiniDisc recorders. Details of implementation may differ in attempts to improve on this scheme. For example, the claim is made that the open source Ogg Vorbis system improves the compressed sound by making digital artifacts sound like white noise, a result confirmed by listening tests. Likely the average value of the discarded frequency samples is retained for each time slice, so that the gaps in the frequency spectrum can be filled with white noise when the FFTs are decompressed on decoding.
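The noise-filling guess above could work roughly as follows. To be clear, this sketch illustrates only my conjecture and is not taken from the Ogg Vorbis specification. The function names, the "keep the largest samples" rule, and the use of random phase are all assumptions, and the code works on one half-spectrum without converting back to audio.

```python
import cmath
import math
import random

def encode_with_noise_level(spectrum, keep):
    """Keep the `keep` largest bins plus one average magnitude for all the rest."""
    order = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    kept = {k: spectrum[k] for k in order[:keep]}
    dropped = order[keep:]
    noise_level = sum(abs(spectrum[k]) for k in dropped) / len(dropped) if dropped else 0.0
    return kept, noise_level

def decode_with_noise_fill(kept, noise_level, n_bins, rng):
    """Rebuild the spectrum, filling each gap with noise_level at a random phase."""
    spectrum = []
    for k in range(n_bins):
        if k in kept:
            spectrum.append(kept[k])
        else:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            spectrum.append(cmath.rect(noise_level, phase))
    return spectrum

half_spectrum = [complex(v) for v in [0.2, 0.1, 8.0, 0.3, 0.2, 5.0, 0.1, 0.3]]
kept, level = encode_with_noise_level(half_spectrum, keep=2)
filled = decode_with_noise_fill(kept, level, len(half_spectrum), random.Random(0))
print(sorted(kept), round(level, 3))
```

Only one extra number per time slice has to be stored, yet the decoded spectrum no longer has dead gaps between the surviving samples. Whether that sounds better than silence in the gaps is a matter for listening tests.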

Final remarks

It will remain a subjective matter whether listeners can tolerate the defects introduced to the sound by these systems for the sake of compact memory storage. In my own experience, I reduced my set of 60+ Bible CDs to one mp3 DVD and like the results for that purpose. However, I have declined to buy albums that I wanted when they were available only as mp3 downloads and not on CD.

Document History
February 27, 2016 Created.
February 29, 2016 Added time anomalies to Analysis of sound effect.