Sunday, 20 October 2013

Audio compression bitrates

If you are ripping a CD, and compressing the music for storage on your computer or portable player, what bitrate setting is enough?

I am writing about "lossy compression". Software processes the audio tracks on a CD, removing parts of the sound that are (theoretically) inaudible, so that the music takes up less space. MP3, AAC and Ogg Vorbis are examples of lossy compression formats, and my portable player can play music files encoded in these formats. The result is that I can fit more music on the player.

When compressing music, you have a choice of format and bitrate setting. The bitrate setting specifies how much data the compressed version uses for each second of audio (192k means 192 kilobits per second), so it determines the file size - generally, a higher bitrate means higher quality. The format also affects the quality, with some formats making better use of space than others. For instance, AAC is often regarded as more efficient than MP3.
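To make the relationship between bitrate and file size concrete, here is a rough sketch of the arithmetic. The four-minute track length and the helper name `size_in_megabytes` are my own assumptions for illustration, and real files will differ slightly because of variable bitrates, tags and container overhead.

```python
# CD audio: 44,100 samples/s * 16 bits * 2 channels = 1,411,200 bits/s
CD_BITRATE_BPS = 44_100 * 16 * 2


def size_in_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size (in MB, 1 MB = 1,000,000 bytes) of audio at a constant bitrate."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    track_seconds = 4 * 60  # hypothetical four-minute track
    settings = [
        ("CD (uncompressed)", CD_BITRATE_BPS / 1000),
        ("320k", 320),
        ("256k", 256),
        ("192k", 192),
        ("128k", 128),
    ]
    for label, kbps in settings:
        print(f"{label:>18}: {size_in_megabytes(kbps, track_seconds):6.2f} MB")
```

At 192k the track comes to a little under 6 MB, against roughly 42 MB for the uncompressed CD audio.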

Recently, I began an effort of re-ripping CDs in order to have better quality copies on my computer and portable player. I did this because I worried that I was listening to poor-quality copies, and losing something important that I should be able to hear. Sometimes the copies were extremely poor, having been made many years ago, and it was a great improvement to return to the original CD.

The effort posed a question: what bitrate setting is enough for the new copies? Too high, and space is wasted, on the hard disk and on the portable player. Too low and the sound quality suffers.

What is "enough"? Good enough to be listenable? Good enough to sound like the CD? Or, good enough to be indistinguishable from the CD? The final definition is called transparency. When lossy compression is transparent, its effects are inaudible. Ideally, this is what I want. I can't have music files that sound better than the original CD - but I can have music files that sound just as good.

There's actually a scientific test to determine if compression is transparent. It's called ABX. The technique is used to compare lossy compression software.

Comparisons are made by a test subject, who is able to listen to three files: "A", "B" and "X". As an example, "A" might be the original version of a track, taken straight from a CD. "B" is the compressed version. And "X" is either "A" or "B", chosen at random. The test subject listens to A, B, and X, switching between them as often as desired. The test ends when the subject decides either that X is A, or that X is B.

[Image: The ABX comparison plugin for the foobar2000 application]

If "A" and "B" really do sound different, then this task is easy and the test subject does not need to guess. But if they sound very similar, or the same, then the test subject guesses.
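The procedure can be sketched as a small simulation. This is only an illustration under my own assumptions: the listener is modelled by a made-up parameter `p_hear`, the chance of genuinely hearing the difference on a given trial, and the function name `run_abx_session` is hypothetical. It is not how any particular ABX tool works.

```python
import random


def run_abx_session(p_hear: float, trials: int, rng: random.Random) -> int:
    """Simulate an ABX session and return the number of correct answers.

    p_hear is an assumed probability that the listener really hears the
    difference on a trial; otherwise the listener guesses at random.
    """
    correct = 0
    for _ in range(trials):
        x = rng.choice("AB")           # the computer secretly picks X
        if rng.random() < p_hear:
            answer = x                 # difference heard: answer is right
        else:
            answer = rng.choice("AB")  # no difference heard: pure guess
        if answer == x:
            correct += 1
    return correct


if __name__ == "__main__":
    rng = random.Random(2013)
    for p in (0.0, 0.5, 1.0):
        score = run_abx_session(p, trials=16, rng=rng)
        print(f"p_hear={p:.1f}: {score}/16 correct")
```

A listener who always hears the difference scores 16/16, while one who never does will hover around half, with some run-to-run variation.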

The ABX trial is repeated many times, keeping score of the number of times that the test subject got the answer right. With enough trials, a subject who is guessing will be right about 50% of the time, whereas a subject who can hear a difference will be right most of the time.
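How many correct answers are needed before a score stops looking like guesswork? One common way to judge this is a one-sided binomial test: work out the probability that pure guessing would produce a score at least as good. The sketch below is my own illustration (the function name `abx_p_value` is hypothetical), not something taken from a particular ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by guessing alone (each guess is right with probability 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for score in (8, 12, 14):
        print(f"{score}/16 correct: p = {abx_p_value(score, 16):.4f}")
```

With 16 trials, 12 correct answers would happen by guessing a little under 4% of the time, so a score like that is reasonable evidence of an audible difference; 8 out of 16 is exactly what guessing would be expected to produce.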

The experimental results are useful because the tests are completely blind: only the computer knows which file is which, and it provides no clues to "help" the user, as these would introduce some bias.

I decided to try some ABX experiments on myself, and I found that I'm not generally able to tell the difference between high bitrate compression (in any common format) and the original CD. Even at lower bitrates, such as 128k and 160k MP3, it can be very hard to tell, and I find I can't reliably distinguish all files. I guessed, sometimes without even realising I was guessing.

ABX experiments have already been carried out, on a large scale, for all the common audio compression tools. For instance, they inform the selection of the LAME MP3 encoder's presets. I wondered why Amazon.com had encoded all of its MP3 files at (approximately) 256k... and now I know that the reason is that LAME's designers determined that this setting was transparent for virtually everyone, based on ABX testing results. The LAME MP3 encoder calls the maximum bitrate setting (320k) "insane" and will not use it unless forced to do so. The experiments show that the extra space required for 320k is wasted.

ABX experiments also provide definitive proof that "high-definition" consumer audio (e.g. 96kHz, 24-bit etc.) is indistinguishable from "CD quality" audio. That's an important result - but evidence won't necessarily have much impact on audiophiles who actually believe that £100 USB cables sound different to £2 USB cables.

The general results of ABX experiments, plus my personal ABX results, suggest a lower bound of around 192k for fully transparent results. Accordingly, I added a small margin and started encoding files with the Nero AAC encoder at quality 0.6, which means an average bitrate above 200k.

This setting is probably similar to Amazon's (it's not directly comparable due to the difference in format). It's certainly lower than my current favourite music store (7digital - 320k AAC). But I think that the music stores pick higher bitrates just in order to please customers who assume that higher bitrates are audibly better. Experimental evidence says otherwise, but in the sales business, perceived differences are crucial!

I'll conclude with a link to a very interesting article that I found while researching this topic. It is about a (flawed) ABX experiment carried out to compare digital and analogue recording. A self-professed audiophile and hi-fi expert was asked to distinguish between sound played directly from a record and sound played from the record via a digital encoder and decoder. Initially he was unable to do so, but later it appears that he began to listen for subtle clues about the sound sources. Bias crept in - the listener started to listen for the source that sounded different - as evidenced by the fact that he unintentionally identified the digital source as being better! It's a fascinating study of (1) the weird things that audiophiles believe, and (2) the problems with carrying out ABX experiments involving analogue equipment.