Mathematical Consideration of Some of the Limitations of Digital Audio

Introduction

Having had a turntable as the center of my first hi-fi system, I was disappointed when I heard my first CD player. Like others, I thought the treble was coarse and grainy. Although I was immediately hooked on the absence of noise, the quality of the sound was not what I expected.

Bit Resolution and Round-off Error

My first CD player had an 18-bit digital filter on the front end of its DAC. I suspect round-off error to be the primary reason for the initial disappointment with CD playback. That upsampling has recently been favored over oversampling suggests the importance of bit resolution. In my efforts to determine the difference between these two processes, I found synchronous upsampling and oversampling to be the same. The difference in sound may be due to bit resolution. Oversampling is usually done in the DAC, presumably at its resolution of 16, 18, 20, or 24 bits. Currently, the best external asynchronous resamplers (which are used to do synchronous upsampling) have an internal resolution of 28 bits. Jitter reduction may add to the improvement in the case of asynchronous upsampling, but that subject has been thoroughly covered by others. It is also worth noting that a CD player using a new 32-bit AKM DAC is now highly acclaimed. Some hold that the higher noise of analog, and the resulting limit on dynamic range, diminishes the need for higher bit resolutions. I believe the resolution of analog is greater than its dynamic range. This is the subjective opinion of many. That a signal can be extracted from a much louder noise background by correlation algorithms is objective support for the same conclusion.
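
As a rough illustration of how round-off error shrinks with bit resolution, the following sketch rounds a full-scale sine wave to several word lengths and measures the ratio of signal power to rounding-error power. It is only a minimal model, assuming simple rounding with no dither and a 997 Hz test tone; the function names quantize and snr_db are hypothetical and are not taken from any particular DAC or resampler.

```python
import math

def quantize(x, bits):
    """Round x (nominally in -1.0..1.0) to the nearest step of a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1)          # steps per unit of amplitude, e.g. 32768 for 16 bits
    return round(x * levels) / levels

def snr_db(bits, n=44100, freq=997.0, fs=44100.0):
    """Signal-to-rounding-error ratio in dB for a full-scale sine (assumed test tone)."""
    signal_power = 0.0
    error_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / fs)
        e = quantize(x, bits) - x     # round-off error of this sample
        signal_power += x * x
        error_power += e * e
    return 10 * math.log10(signal_power / error_power)

# Word lengths mentioned in the text above.
for bits in (16, 18, 20, 24, 28):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each added bit should lower the rounding error by roughly 6 dB in this model. A real digital filter accumulates round-off over many multiply-and-add steps, so the sketch shows only the error of a single rounding operation.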

Phase Compression and Magnitude Uncertainty

It is evident that a waveform at half the sampling frequency (f_s/2 = 22.05 kHz) is forced into phase with the digital clock, and its amplitude is diminished from its true value by an amount related to the phase compression. To what degree are lower frequencies subject to the same effect? Because of three-phase AC motors’ reputation for smoothness, it occurred to me that with a signal at one third of the sampling rate (f_s/3 = 14.7 kHz) the 120 degree separation of sampling vectors would certainly allow for proper phase and magnitude representation. From this frequency down, all would be well in this regard. Above this frequency, there is the potential for trouble. Mathematically, the digital filter was expected to eliminate all of these effects, because presumably the error in the signal is contained in the out-of-band digital images created by sampling. The phase and magnitude anomalies would be filtered out along with the unwanted images. This math depends on the signal being predictable and steady, however. Music is transient by nature. If the signal does not hold steady long enough for a reasonable representation of the beat frequencies created by the sampling process, there may be no certainty that the exact phase and magnitude can be extracted from frequencies above f_s/3. The importance of the f_s/3 frequency of 14.7 kHz may explain why some have found the CD medium deficient and others have not. Many do not have hearing above this frequency due to the abuse of concerts and other loud music.

Figure 1: f_s/2 Polar Plot - Black arrows show sampling vectors at f_s/2 (22.05 kHz). Full phase compression: the vectors cannot add to any other phase. Magnitude is trigonometrically related to phase alignment.

Figure 2: f_s/3 Polar Plot - Black arrows show sampling vectors at f_s/3 (14.7 kHz). Pink and red vectors show one example of how two sampling vectors can add to any desired magnitude and phase angle.
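
The behavior described for the two polar plots can be checked numerically. The sketch below assumes an ideal, steady sine wave sampled at the CD rate, which is exactly the condition that music does not guarantee. The function names sample_sine and recover_at_third are hypothetical, chosen only for this illustration. At f_s/2 two different waves produce the same samples; at f_s/3 two consecutive samples are enough to solve for both amplitude and phase.

```python
import math

FS = 44100.0  # CD sampling rate in Hz

def sample_sine(amplitude, phase_deg, freq, count):
    """Samples of amplitude*sin(2*pi*freq*t + phase) taken at the rate FS."""
    phase = math.radians(phase_deg)
    return [amplitude * math.sin(2 * math.pi * freq * n / FS + phase)
            for n in range(count)]

def recover_at_third(x0, x1):
    """Amplitude and phase (degrees) of a steady f_s/3 sine from two consecutive samples.

    Assumes x0 = A*sin(p) and x1 = A*sin(p + 120 degrees); valid only for a
    steady tone at exactly f_s/3.
    """
    a_sin = x0                                     # A*sin(p)
    a_cos = (x1 + 0.5 * x0) / (math.sqrt(3) / 2)   # A*cos(p), from the angle-sum identity
    return math.hypot(a_sin, a_cos), math.degrees(math.atan2(a_sin, a_cos))

# At f_s/2 every sample equals +/- A*sin(phase), so these two waves look identical.
half_a = sample_sine(1.0, 30.0, FS / 2, 6)
half_b = sample_sine(0.5, 90.0, FS / 2, 6)
print("f_s/2, A=1.0, 30 deg:", [round(v, 6) for v in half_a])
print("f_s/2, A=0.5, 90 deg:", [round(v, 6) for v in half_b])

# At f_s/3 the 120 degree spacing lets two samples fix both unknowns.
x0, x1 = sample_sine(0.8, 50.0, FS / 3, 2)
print("f_s/3 recovered:", recover_at_third(x0, x1))
```

The recovery step works here only because the frequency is known in advance and the tone is steady. It does not address the transient case raised above.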

Limits on Frequency Resolution

Mention of the transient nature of music affecting digital reproduction suggests another limitation. First consider the nature of the digital signal. The time-sampled signal is related through a discrete Fourier transform to a discrete frequency spectrum. Because the Fourier transform is linear, we can break up our music into separate Fourier transforms, each corresponding to a different transient segment of the entire recording. This is true for the continuous or analog Fourier transform as well. For clarity, let’s define a musical impulse as the smallest transient of music separated from the whole that has mathematical significance. With each musical impulse in its own sequence of samples, we can analyze each separately. It is then apparent from the properties of the transform and the digital domain that the frequency resolution of each musical impulse or transient depends on its duration. This is because there are half as many frequencies represented as time domain samples. Specifically, the lowest represented frequency, the separation of the discrete frequencies, and the resulting frequency resolution are all the inverse of the transient duration. For example, a 1 ms¹ transient can only represent frequencies from 1 kHz to 22 kHz in 1 kHz steps. A 5 ms transient can only resolve frequencies from 200 Hz up in 200 Hz steps. A 1 s musical impulse could resolve 1 Hz frequency distinctions. This also implies that the frequencies of a transient are ambiguous at the first instant and become more defined as the transient proceeds. This is an observation that can be made by tuning a string on a musical instrument, whether by ear or by an electronic tuner. All of the musical impulses in a recording would have different resolutions depending on their duration. Because the information contained in a transient is proportional to its length, this limitation is at first a natural one.
These natural signal limitations are then approximated by the number of samples that the sampling rate allows during the transient. Upon sampling, it becomes very clear that the limited number of samples translates to a similar granularity of frequency resolution. A musical impulse having more frequency information than its duration would allow would compress the extra frequency information, creating the sort of digital graininess that many actually claim to hear. A cymbal crash, resembling white noise of continuous frequency, is a prime example of a transient that has more frequency information than its length will allow after sampling. These results may seem odd, but they represent the breakdown of wide-ranging music into its most fundamental building blocks. That is, as a whole, the sum of all the impulses conveys the sound with all of its apparent frequencies plus added digital artifacts.
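
The relation between transient duration and frequency spacing can be worked out directly. The following sketch assumes the CD sampling rate of 44.1 kHz and counts only whole samples; dft_grid is a hypothetical helper name used for illustration. It reports how many samples fit in a transient, the spacing between discrete Fourier transform frequencies, and how many frequencies lie between that spacing and f_s/2.

```python
FS = 44100  # CD sampling rate in Hz

def dft_grid(duration_s, fs=FS):
    """Return (sample count, frequency spacing in Hz, number of frequencies up to fs/2)."""
    n = int(duration_s * fs)   # only whole samples fit inside the transient
    spacing = fs / n           # spacing of the discrete frequencies, about 1/duration
    bins = n // 2              # half as many frequencies as time-domain samples
    return n, spacing, bins

# Durations used as examples in the text: 1 ms, 5 ms and 1 s.
for duration in (0.001, 0.005, 1.0):
    n, spacing, bins = dft_grid(duration)
    print(f"{duration * 1000:g} ms: {n} samples, "
          f"{spacing:.2f} Hz spacing, {bins} frequencies")
```

Because a whole number of samples must fit in the transient, the spacing for the 1 ms and 5 ms cases comes out slightly above the round figures of 1 kHz and 200 Hz quoted above.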

Conclusion

These thoughts suggest that greater bit resolution and higher sampling rates are indeed desirable after all.

¹s is the metric abbreviation for second; ms is short for millisecond = 1/1000 of a second

Document History
May 23, 2009 Created.
May 23, 2009 Revised.
April 19, 2012 Changed seconds unit from S to s to avoid confusion with the Siemens unit.
December 2, 2015 Improved formatting.