7 Myths of Digital Audio Dispelled

Introduction:

Everything in high-end audio is a compromise. No typology is best at everything, and digital is no different. Despite popular belief, there is no single type of DAC, quantization method, or CODEC that is ideal for all situations or better than the others in every way. Each has its advantages and disadvantages, and certain ones work better in certain situations.

Over the years I have heard and read all sorts of myths and misconceptions regarding digital audio. I hope to dispel the 7 most common categories of myths. I’m going to attempt to explain each typology in layman’s terms so that all of you can decide for yourselves what compromises you want to make.

Myth #1: Clocking

Despite what you may hear in some company’s marketing message or read on forums, the accuracy of the clock makes no difference. The cheapest quartz clock used in a $100 DVD player has more accuracy than you can hear, something like 0.005%. And you cannot hear the difference between 0.005% and 0.000005%.
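To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the accuracy figure behaves like a steady frequency offset (my simplification, and it says nothing about jitter or noise), and the function names are just for illustration.

```python
import math

def percent_to_ppm(percent):
    """Convert a percentage tolerance to parts per million."""
    return percent * 10_000

def pitch_shift_cents(ppm):
    """Pitch shift, in cents, of a clock that runs a steady `ppm` fast."""
    return 1200 * math.log2(1 + ppm / 1_000_000)

cheap_ppm = percent_to_ppm(0.005)       # roughly 50 ppm
exotic_ppm = percent_to_ppm(0.000005)   # roughly 0.05 ppm

for label, ppm in (("0.005%", cheap_ppm), ("0.000005%", exotic_ppm)):
    print(f"{label}: {ppm:g} ppm, {pitch_shift_cents(ppm):.5f} cents")
```

A semitone is 100 cents, so under this assumption even the cheap clock shifts pitch by less than a tenth of a cent.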

So, what makes one clock sound better than another? Two things: how low the hash noise is that the clock creates in the audible spectrum and how little electromagnetic interference it creates. Both of these pollute other components in the same chassis.

An OCXO, or Oven Controlled Crystal Oscillator, may have lower accuracy than a TCXO, or Temperature Compensated Crystal Oscillator, but it will sound better because it has lower hash noise. And the oven surrounding the clock does a better job of shielding electromagnetic radiation. BTW, OCXOs were invented for use in extremely cold environments, such as deep sea, arctic, or aerospace. None were created for high-end audio. It just so happens some of the best sounding clocks in high-end audio are oven-controlled crystal oscillators.

Here's a photo of the 10MHz OCXO clock (silver rectangle upper right) used in the new Jay's CDT-3 MkIII CD transport, which is my current reference digital audio source. It is not the accuracy of the clock that makes it sound so good, but rather that it minimizes hash noise and electromagnetic radiation, so it does not negatively affect the surrounding circuitry. Notice this clock has a dedicated power supply and does not share power with any other devices in the chassis. The power supply isolation and proximity isolation of this clock, and how low the hash noise and electromagnetic interference are in the 10MHz clock they use, are what make this transport sound so amazing, not how accurate the clock is.

So, what about Master Clocks? Think about it: they are isolated in a separate chassis with a separate power supply. That is what makes them sound so good, not that they are more accurate than integrated clocks.

Myth #2: DSD Sounds Better Than PCM or PCM Sounds Better Than DSD.

Does DSD, PCM, or MQA sound better than the others? The answer is not universal. Delta-Sigma DACs read native DSD and R-2R DACs read native PCM, so DSD will sound better on a Delta-Sigma DAC and PCM will sound better on an R-2R DAC. MQA is a CODEC, not a quantization typology. I’ll get into CODECs a bit later.

Many of you are thinking “my DAC decodes all of those,” which is another misconception. To begin with, Delta-Sigma DACs don’t “decode,” they “interpolate,” but I’ll get into that a bit later. So instead of the word “decode” let’s use the word “convert,” which is what all digital to analog converters actually do.

Delta-Sigma DACs must first convert PCM to DSD, and R-2R DACs must first convert DSD to PCM, before they can convert them to analog. When people make misguided universal statements that DSD is better than PCM or PCM is better than DSD, they are correct only with respect to specific DACs.
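For the curious, here is a minimal sketch of what turning PCM-style samples into a 1-bit stream can look like. This is a textbook first-order delta-sigma modulator, not the algorithm of any particular DAC chip; real converters use higher-order modulators and proper filtering, and the function names and the 64x oversampling figure are my own assumptions.

```python
def delta_sigma_1bit(samples, oversample=64):
    """First-order 1-bit delta-sigma modulation of samples in [-1.0, 1.0]."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        for _ in range(oversample):          # hold each sample for several ticks
            integrator += x - feedback       # accumulate the running error
            feedback = 1.0 if integrator >= 0.0 else -1.0
            bits.append(1 if feedback > 0 else 0)
    return bits

def average_level(bits):
    """Map bits back to +1/-1 and average them (a very crude low-pass filter)."""
    return sum(1.0 if b else -1.0 for b in bits) / len(bits)

bits = delta_sigma_1bit([0.5] * 100)
print(len(bits), round(average_level(bits), 3))
```

The point of the sketch is only that the density of ones in the bit stream tracks the level of the original sample, which is the basic idea behind a 1-bit format.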

Listening tests were performed, and they concluded that no significant percentage of people could distinguish the same recording quantized in PCM vs DSD. Of course, those tests were done 100% fairly using one of those rare DACs which has both an R-2R and a Delta-Sigma DAC chip. Since both chips used identical input stages, identical output stages, and identical power supplies, and since the chips were warmed up identically because they were in the same chassis, those comparisons were fair. Very few other comparisons you may read about were done fairly.

Myth #3: Certain CODECs Sound Better

Many people confuse CODECs with quantization typologies. DSD, PCM, and Wide-DSD are examples of quantization typologies. FLAC, ALAC, and MP3 are examples of CODECs. As I mentioned, MQA is a very advanced CODEC.

CODEC stands for “coder/decoder” (often read as “compressor/decompressor”), and in this context it is a way to make audio files take up less storage space. This was a big deal a decade ago when storage was expensive and physically large. Today most people can fit their entire music library on one SSD, so compression is only relevant in terms of HD streaming bandwidth.

Some CODECs are what is called “lossless,” which means they claim to decompress to exactly the original audio file. Even if they do decompress to the original audio file, the processing required to decompress them lowers the performance of the player software they are being played on. For optimal performance you don’t want to store compressed files.
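If you want to see what “lossless” means in practice, here is a small sketch. It uses Python's general-purpose zlib compressor as a stand-in (it is not FLAC or ALAC), and the test tone and helper name are made up for illustration. The idea is simply that the decompressed bytes can be compared bit for bit against the original.

```python
import math
import struct
import zlib

def make_pcm16(num_samples=1000, freq_hz=440.0, rate_hz=44100):
    """Build a short mono 16-bit little-endian PCM test tone."""
    frames = []
    for i in range(num_samples):
        value = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        frames.append(struct.pack("<h", value))
    return b"".join(frames)

pcm = make_pcm16()
packed = zlib.compress(pcm)        # stand-in for a lossless audio CODEC
restored = zlib.decompress(packed)

print("bit-identical after round trip:", restored == pcm)
```

Whether the extra decompression work is audible on your player software is a separate question that this sketch does not address.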

Interestingly enough, there are versions of FLAC and ALAC which are not compressed. So though called the same thing as the compressed versions, they are most certainly not CODECs. What these formats do which is desirable is package the album cover art and advanced metadata with the music data.

WAV is the most basic form of music file, and is not a CODEC. WAV typically holds the same uncompressed PCM audio you find on a CD. WAV can have basic metadata with the album and track names but cannot have album cover art packaged with the music data. Of course, player software can associate the album cover art with a specific WAV file, yielding the same convenience as uncompressed FLAC and ALAC, with better performance.

WAV sounds the best because it is the least complex and requires the least amount of processing. And WAV is universally played on most player software. FLAC is optimized for Windows and ALAC is optimized for Apple, so each tends to work and sound best when played through their respective operating systems. Since Apple is a fancy GUI on top of Unix, ALAC will tend to sound better in Linux.

MQA can have more than one level of performance. Many believe it sounds notably better than DSD or PCM. The problem is that to take advantage of MQA’s full potential, you must use MQA compression in the studio when creating the original recording: you cannot convert DSD or PCM to the most advanced MQA. So, you’re never going to hear The Beatles, Billie Holiday, or Itzhak Perlman in advanced MQA.

What they call MQA on the HD streaming services is the most basic form of MQA compression and is a very efficient CODEC which allows your favorite HD streaming service to use a fraction of the bandwidth to bring you 24-bit 88.2kHz PCM.
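To get a feel for the bandwidth involved, here is a quick sketch of the raw, uncompressed PCM data rate. It assumes plain two-channel stereo and says nothing about what any streaming service or CODEC actually sends; the function name is just for illustration.

```python
def pcm_bitrate_bps(bit_depth, sample_rate_hz, channels=2):
    """Raw PCM data rate in bits per second (stereo assumed by default)."""
    return bit_depth * sample_rate_hz * channels

cd_rate = pcm_bitrate_bps(16, 44_100)    # Red Book CD
hd_rate = pcm_bitrate_bps(24, 88_200)    # 24-bit 88.2kHz

print(f"CD:      {cd_rate / 1_000_000:.4f} Mbit/s")
print(f"24/88.2: {hd_rate / 1_000_000:.4f} Mbit/s")
print(f"ratio:   {hd_rate / cd_rate:.1f}x")
```

Uncompressed 24-bit 88.2kHz stereo needs three times the data rate of CD, which is why an efficient CODEC matters for streaming.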

Of course, the specific player software you are using can have more of an impact on performance than the file format. So, you may want to compare different file formats or CODECs on your specific player software before deciding what will sound best for you. My guess would be that WAV will sound best on all, but that may not necessarily be true.

Since the benefits from statistical error correction can be better than the advantage R-2R has in converting MQA, MQA files streamed from the internet could sound better on a specific Delta-Sigma DAC than a specific R-2R DAC. It would depend on the specific bit stream that reaches the DAC and the specific DAC.

Myth #4: Delta-Sigma Sounds Better Than R-2R or R-2R Sounds Better Than Delta-Sigma.

There are basically two quantization typologies commercial music is marketed in: PCM and DSD. There is also Wide-DSD, aka 5-bit or 8-bit DSD. Though it is widely used in studios during the recording, mixing, and mastering process, sadly there are no commercially available versions of our favorite recordings in Wide-DSD, even though a significant percentage of Delta-Sigma DAC chips can convert it.

All DAC chips, discrete or integrated, use R-2R and/or Delta-Sigma conversion. Some are “Hybrid” DAC chips which use both R-2R and Delta-Sigma. For the most significant bits they use R-2R, for the least significant bits they use Delta-Sigma, and they use an algorithm to weight and combine the two.

This same weighting and combining can also be done with more than one R-2R resistor ladder. This concept is called “segmented conversion.” The famous 24-bit PCM1704 DAC chip, often mistakenly called a single-ladder R-2R, is an excellent example of an IC segmented R-2R DAC chip. The greatest bit depth you can laser match on one IC’s silicon wafer is 20 bits. Something like 0.000005% matching tolerance is required to achieve a bit depth of 20 bits. To get 24 bits of digital resolution (note I said “digital resolution”) you need more than one resistor ladder and an algorithm to weight and combine them.
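Here is a rough sketch of the weight-and-combine idea. The 12/12 split of a 24-bit word, the unsigned codes, and the function names are all my own assumptions for illustration; this is not the internal architecture of the PCM1704 or any other chip.

```python
def split_word(code, total_bits=24, upper_bits=12):
    """Split an unsigned code into an upper and a lower segment."""
    lower_bits = total_bits - upper_bits
    upper = code >> lower_bits
    lower = code & ((1 << lower_bits) - 1)
    return upper, lower

def combine_segments(upper, lower, total_bits=24, upper_bits=12):
    """Weight each segment's output and sum them, as a fraction of full scale."""
    lower_bits = total_bits - upper_bits
    full_scale = (1 << total_bits) - 1
    return (upper * (1 << lower_bits) + lower) / full_scale

code = 0xABCDEF
upper, lower = split_word(code)
print(hex(upper), hex(lower), combine_segments(upper, lower))
```

In this idealized model the two smaller ladders reproduce the full 24-bit value exactly. In real hardware the weighting between segments is where matching errors creep in.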

Those so-called discrete R-2R DACs are actually segmented R-2R. They use commercially available resistors of modest tolerance, such as 0.5%, put them in multiple lower bit resistor ladders, weight the voltage coming out of each resistor ladder, and then use algorithms to combine the weighted voltages from each ladder. This is one reason why segmented R-2Rs made from discrete parts have relatively poor linearity. Discrete R-2R DACs are a brilliant piece of engineering and can have a very attractive sound. To me they sound halfway between a true R-2R and a Delta-Sigma DAC. Not my cup of tea, but for some of you this may be the best of both worlds.

In any event, R-2R DACs, segmented or single ladder, are the only DAC typology which decode what’s in the bit stream. They take each digital word, which could be 16 bits or 24 bits, put it through a serial-to-parallel shift register, and then each bit is put through a rung in a resistor ladder, and all the voltages from all the bits are summed in a summing amplifier.

For those of you who don’t completely understand digital theory let me get back to basics. In binary digital quantization each bit has double the value of the previous. 1, 2, 4, 8, 16, 32, 64, 128, and so on. Then you combine these numbers to create any number: 1 + 2 + 4 = 7 or 32 + 128 = 160. These numbers correspond to relative voltages or volumes, aka quantization values, in a PCM digital recording.
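The same arithmetic can be sketched in a few lines of Python. This models an ideal ladder with perfectly matched resistors, unsigned codes, and a 1.0 volt reference, all of which are simplifying assumptions on my part; the function names are just for illustration.

```python
def bit_weights(value):
    """List the binary weights (1, 2, 4, ...) that add up to `value`."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]

def ideal_ladder_output(code, bits=16, v_ref=1.0):
    """Sum each set bit's contribution, as an ideal binary-weighted ladder would."""
    total = 0.0
    for i in range(bits):
        if (code >> i) & 1:
            total += v_ref * (1 << i) / (1 << bits)
    return total

print(bit_weights(7))                 # [1, 2, 4]
print(bit_weights(160))               # [32, 128]
print(ideal_ladder_output(0x8000))    # 0.5 (only the most significant bit set)
```

Each bit contributes its own weighted share of the reference voltage, and the output is simply the sum of the shares for the bits that are switched on.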

One thing many people find frustrating is the better your DAC gets, the more flaws you’re going to hear in your recordings. With R-2R you hear what is in the recording, warts and all. I prefer the articulation, proper time and tune, and harmonic coherency of R-2R. But I can’t tell you how many times I’ve played a modern recording and found it fatiguing. With R-2R, as your DAC gets better, your best recordings will sound transcendent while your worst recordings will become unlistenable.

As I mentioned, Delta-Sigma DACs, which comprise over 95% of the DAC chips sold today, do not actually “decode” the bit stream but rather "interpolate" it. They take in the digital bit stream faster than the music is playing, analyze it, noise shape it, error correct it, interpolate what they think the musical signal was supposed to look like, and then output a flawless waveform. Not quite the waveform which was quantized, but a very smooth and very even waveform. That is why Delta-Sigma DACs sound so smooth and refined. This is also why Delta-Sigma DACs have an advantage when playing mediocre sources such as music streamed from the internet.

Of course, algorithms cannot tell the difference between a bit-read error and emotional content. Think about it: emotional content is when the musician plays harder/softer or faster/slower. The algorithms in Delta-Sigma DACs see those as different than the other times similar notes were played, and they attempt to correct something that does not need correcting. I jokingly call statistical error correction a “defunkification filter.” Not my cup of tea, but I can see why so many people love that super smooth hear-no-evil sound. Sometimes a bit of tasteful color hides many sins.

FPGA DACs are the heavyweights in the Delta-Sigma world. They have much more powerful processors, so they can run much more sophisticated algorithms than a normal Delta-Sigma DAC. We are talking about super statistical error correction combined with insane upsampling. And the manufacturer can evolve and improve their algorithms and update their firmware. So, FPGA DACs are the most obsolescence-proof of all the DAC typologies. If you like that smooth, refined sound, and you’re planning on keeping your DAC for years to come, you’ll probably love an FPGA DAC.

Myth #5: The Higher the Upsampling, the Better the Sound

When some people claim upsampling sounds better and others claim it sounds worse, they are both correct. If their DAC uses statistical error correction, they are correct that insane upsampling can sound better, and if their DAC does not use statistical error correction, they are correct that insane upsampling can sound worse. All this stuff about higher resolution sounding better is not universal, but DAC dependent.

44.1kHz done right sounds amazing. But it does have digital artifacts in the audible spectrum at fractions of 44.1kHz, such as 22.05kHz and 11.025kHz. By upsampling from 44.1kHz to 88.2kHz or 176.4kHz you move those digital artifacts above the audible spectrum and you get a subtle improvement in smoothness and a bit more harmonic coherency.
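As a bare-bones illustration of integer-ratio upsampling, here is a sketch that doubles or quadruples the sample count using plain linear interpolation. Linear interpolation is my stand-in for simplicity; real upsamplers in transports, player software, and DACs typically use proper low-pass interpolation filters, and the function name is hypothetical.

```python
def upsample_linear(samples, factor):
    """Insert linearly interpolated points so the rate rises by `factor`."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])      # keep the final original sample
    return out

original_rate = 44_100
print(upsample_linear([0.0, 1.0, 0.0], 2))    # [0.0, 0.5, 1.0, 0.5, 0.0]
print(original_rate * 2, original_rate * 4)   # 88200 176400
```

Note that the original samples are all still there. The upsampler only fills in points between them, and how it fills them in is where different products differ.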

Upsampling can be done by some CD transports or can be done on a computer music server using player software. Some player software will do upsampling and others will not. HQ Player is considered by many to be the heavyweight in the upsampling player software world.

As a rule, an R-2R DAC would not have statistical error correction. But theoretically you could add statistical error correction to an R-2R DAC if you feed it from a field programmable gate array (FPGA) running the right algorithms. An FPGA is another place you can do upsampling. Most FPGAs I’ve heard doing upsampling sounded better.

As a rule I've loved upsampling on a wide range of CD transports. Generally, I never loved it on my computer audio. Of course I only listen to R-2R DACs, so that would make perfect sense. Upsampling is very component and system dependent and over the years customers have reported both positively and negatively regarding upsampling.

Myth #6: Higher Resolution Files Sound Better

Despite what you might hear in some company’s marketing message or might read on a forum, Dr. Nyquist knew what he was talking about, and sampling at 44.1kHz perfectly quantizes the audio wave in frequencies up to 22kHz. My current reference digital source is a 16-bit 44.1kHz Red Book only CD transport. It sounds better than any high-definition 24-bit 352.8kHz PCM file, SACD, DSD, MQA, or anything else you can name played on computer audio.
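Here is a small sketch of the Nyquist arithmetic. It computes the highest frequency a given sample rate can represent, and where a tone above that limit would fold back to if it were sampled without an anti-aliasing filter. It is an idealized model, and the function names are just for illustration.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2

def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears (folding around Nyquist)."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_hz(44_100))          # 22050.0
print(alias_hz(20_000, 44_100))    # 20000 (below Nyquist, unchanged)
print(alias_hz(25_000, 44_100))    # 19100 (above Nyquist, folds back)
```

Anything below half the sample rate comes through at its own frequency, which is why 44.1kHz covers the audible band.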

Without upsampling, 16-bit 44.1kHz Red Book CDs, played on a proper CD transport, will sound better than any HD computer audio I’ve ever heard. Granted I've not heard everything. But I've heard some of the best-of-the-best in computer audio, and there is no comparison, in terms of time, tune, tone, and timbre, harmonic coherency, and attack - bloom - decay, to a proper CD transport. Seriously: no comparison. The best CD transports use the top loading clamped disc CD typology and one of the vintage Red Book CD only mechanisms. Note that all things being equal, CD transports which use multi-disc DVD drives and tray loading CD transports can never sound as good as a simple Red Book 16-bit 44.1kHz only old school CD mechanism with a top clamp.

Myth #7: One transfer protocol sounds better than all others.

Transfer protocols are things like USB, Ethernet, S/PDIF, and I2S. Most computer transfer protocols, such as USB and Ethernet, are asynchronous, meaning they buffer and reclock the incoming data.
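The buffer-and-reclock idea can be sketched like this. Samples arrive with uneven timing, sit in a buffer, and are released on an even grid set by a local clock. The nanosecond timestamps, the start delay, and the function name are invented for illustration; this is not how any specific USB or Ethernet receiver is implemented.

```python
def reclock(arrival_ns, period_ns, start_delay_ns):
    """Release buffered samples on an evenly spaced local clock grid.

    Raises ValueError if a sample has not arrived by its release time
    (a buffer underrun).
    """
    if not arrival_ns:
        return []
    release_ns = []
    tick = arrival_ns[0] + start_delay_ns    # let the buffer fill a little first
    for arrived in arrival_ns:
        if arrived > tick:
            raise ValueError("buffer underrun: sample arrived too late")
        release_ns.append(tick)
        tick += period_ns
    return release_ns

arrivals = [0, 105, 195, 310, 398]           # uneven incoming timing
released = reclock(arrivals, period_ns=100, start_delay_ns=50)

gaps_in = [b - a for a, b in zip(arrivals, arrivals[1:])]
gaps_out = [b - a for a, b in zip(released, released[1:])]
print(gaps_in)     # [105, 90, 115, 88]
print(gaps_out)    # [100, 100, 100, 100]
```

The timing of the output depends only on the local clock, which is why the quality of that clock and its surroundings matters more than the timing of the incoming data.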

This is no different from what an external USB or Ethernet reclocker/regenerator does. So theoretically, if the internal USB or Ethernet input in a specific DAC is better than a specific external USB or Ethernet reclocker/regenerator, adding that “magic box” will degrade rather than improve performance.

S/PDIF stands for Sony/Philips Digital Interface and is one of the oldest and most commonly used digital transfer protocols. All CD transports and most DVD players output S/PDIF coaxial with an RCA jack. S/PDIF is not inherently a computer transfer protocol, but there are some servers and streamers which output S/PDIF, and there are USB and Ethernet to S/PDIF converters.

AES is the balanced version of S/PDIF: same code with dual phase and better shielding. AES was created for pro audio to minimize noise when running longer distances. Most of the better CD transports output AES. In most cases I’ve found that coaxial S/PDIF can sound as good as, or even better than, balanced AES over the normal 1-meter cables used in most home audio. This is partially because a good low-mass RCA connector is made from high-purity copper or silver, whereas most balanced XLR connectors are made from less conductive high copper content brass. Same thing with BNC: the metallurgy of the connector can be a more important factor than being a perfect 75 ohms.

One of the big advantages of S/PDIF or AES is that the clock sits in the source component, outside the DAC. This lowers the potential noise inside the DAC’s chassis. Several highly regarded DAC manufacturers don’t integrate a USB or Ethernet input into their DACs, but rather make a high-performance USB or Ethernet to S/PDIF or AES converter, to better isolate noisy computer audio from their sensitive DAC.

Some DACs buffer and reclock the S/PDIF signal. This has advantages and disadvantages. When using an inferior source, such as a $1,000 multi-disc DVD player, the additional buffering and reclocking will improve performance. But when using a world-class CD transport with uncompromising clocking, the internal buffering and reclocking will degrade rather than improve performance.

I2S was engineered as an internal transfer protocol for inside of DACs and ADCs and is the language most DAC chips read. In most DACs all other transfer protocols are converted to I2S before they can be sent to the DAC chip. The official specification for I2S is that it should not be used for longer than 4 inches. This is why so few companies sell I2S compatible CD transports or DACs: it is not necessarily a good idea.

Think about it: all other transfer protocols are a bit stream with embedded clocking. Companies who boast about the performance of their I2S claim that the clocking in a single bit stream becomes corrupted. You see, I2S has three wires: a serial data line, a bit clock which synchronizes with each bit, and a word clock (word select) which synchronizes with each digital word. If clocking in a single data stream can get corrupted, then why would it make sense to try to keep a data line and two separate clock lines synchronized?

The only reason I2S sounds better on a specific DAC is because the other transfer protocols are of a lower level of performance. In a sense I2S saves the manufacturer money in that they are relying on expensive clocking from the component feeding their DAC rather than integrating such high-performance clocking.

So, which transfer protocol has the best sound? That would depend on the digital source (server, streamer, or CD transport), and the quality of the specific digital input on a specific DAC. Most DACs don’t have the same performance from all their inputs. Many DAC manufacturers will even state their best input is USB or Ethernet or S/PDIF. And even if you have the best input on your DAC, if you’re using a less than optimal digital source, overall performance won’t be all that good. So, once again, transfer protocols are not universal, but highly component dependent.

Conclusions:

So as you can see, different DAC typologies, different quantization formats, and different CODECs, each work best under specific situations. Different CODECs sound better with different operating systems and player software. And upsampling and different transfer protocols are both very hardware and system dependent.

If your main digital source is an HD streaming service, or a less than optimal computer or transport, you may very well prefer the smoothing sound of statistical error correction and Delta-Sigma DACs. Remember: FPGA DACs are the heavyweights in the Delta-Sigma world. If you are a purist and want the optimal time, tune, tone, timbre, and harmonic coherency, then you very well may prefer an R-2R DAC.

The most important thing is to be open minded and not assume anything. Always evaluate components in the system they will be played in using the digital source and software that will be feeding it. Even if you've found something like upsampling or a USB reclocker, or a specific CODEC sounded best with your current DAC or digital source, when considering new digital source components, do some blind A/B comparisons with friends, where the listener does not know what they are hearing. I've always found blind A/B comparisons are the best way to evaluate new components.
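If you want to score a blind A/B session, here is a simple sketch of the arithmetic. It assumes each trial is an independent two-way guess with a 50% chance of being right by luck, which is my simplification, and the function name is just for illustration.

```python
from math import comb

def chance_probability(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by pure guessing."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

print(f"12 of 16: {chance_probability(12, 16):.4f}")
print(f" 8 of 16: {chance_probability(8, 16):.4f}")
```

The smaller the number, the less likely it is that the listener was just guessing. A handful of trials rarely tells you much either way, so more trials are better.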

If you enjoyed this blog, you also may want to read my other blogs:

DSD vs. PCM: Myth vs. Truth and The 24-Bit Delusion.

Enjoy!

Benjamin Zwickel
Owner, Mojo Audio