MP3 Players

Cutting Edge By Dave Shapton
Published October 2000

Diamond Rio, perhaps the best known MP3 player.

This month, Dave Shapton suggests that emerging compression techniques may actually improve our audio and multimedia experiences.

Compression. The data reduction type, that is. It's not something that those of us working in the professional audio business have had much to do with until quite recently. The few professional devices that did employ (data) compression were used in radio, where priorities were different. For radio transmission, the ability to transfer over a network and ISDN, alongside making the best use of expensive storage, was more important than absolute quality. But now it seems as if everyone in the audio industry is talking about compression. Why?

The Medium And The Message

Minidisc was the first successful consumer format to use data compression.

Well, what's happening is that we now have to take responsibility for the way we move music about and so we've started to make a distinction between the medium that carries our music recordings and the digital data files themselves. The best example is an MP3 file. Before MP3, the way in which we experienced digital audio was via CDs — DAT never had a chance to succeed as a consumer medium so, as far as digital music was concerned, our choice was limited to CDs.

When you put an audio CD into a CD player you see a list of tracks. Up to now we've not thought of them as files (in the same sense as a word‑processor file) but CD writers have changed that. Making copies of CDs is now trivially easy, as is converting the resulting WAV or AIFF files to MP3. And when you've got your MP3s, you have choices. Do you keep them on your computer's hard disk, copy them back onto a CD‑R (about a hundred and fifty of them per disc), or transfer them to a portable MP3 player (which is, nowadays, nearly every portable electronic device that you can buy, including wristwatches)?

The important point is that we, the music users, are now involved in some pretty technical stuff when we move our music files around. Unless someone simply shows us how it's done, we need to know about file sizes, and probably all sorts of Internet‑related stuff as well, including the question of bandwidth.

I don't think the issue of compression is going to go away very soon, so I thought I'd spend a bit of time looking at this area. If you read on to the end of this article there are some curious, if not profound, conclusions to be drawn about the way compression techniques may actually improve our audio and multimedia experiences.

More For Less?

Sony's Super Audio CD combines much of the technology developed for the DVD format with that of conventional CDs.

Did you know that when you sample analogue audio you are actually compressing it? Do you believe me? Think about it. In nature (or indeed in a musician's bedroom) there is no fixed limit to the audio frequency spectrum. There is, of course, a limit to our own hearing range and, because of that, there is little point in being able to record and reproduce frequencies which we can't hear. (There is a fascinating debate about whether listeners do actually benefit from reproducing frequencies that they can't hear directly, but there isn't space to discuss that here.)

So we only 'sample' audio at a resolution that is appropriate to the nature of the best equipment we might reproduce it on. You could say that the act of sampling is actually a pretty severe form of compression, because not only do we use a relatively low sample rate, with its consequent limiting of the reproduced frequency range, but we also limit the dynamic range through the use of 8‑, 16‑ or 24‑bit sampling.
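
To put rough numbers on those two limits, here is a minimal Python sketch. It assumes the usual textbook relationships (the highest reproducible frequency is half the sample rate, and each extra bit doubles the number of amplitude steps); the function names are purely illustrative.

```python
import math

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0

def quantisation_levels(bits: int) -> int:
    """Number of distinct amplitude values available at a given word length."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and the smallest step, in decibels."""
    return 20.0 * math.log10(2 ** bits)

for rate in (22_050, 44_100, 48_000, 96_000):
    print(f"{rate:>6} Hz sampling -> frequencies up to {nyquist_hz(rate):,.0f} Hz")

for bits in (8, 16, 24):
    print(f"{bits:>2}-bit -> {quantisation_levels(bits):>10,} levels, "
          f"about {dynamic_range_db(bits):.0f} dB")
```

The figures are theoretical ceilings; real converters and playback systems fall some way short of them.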

Even though the process of sampling analogue audio represents, in absolute terms, a very severe form of compression, the results are actually very good. So good, in fact, that most of us never think of the sampling process as a data reduction process.

Profit From Loss

Once your media is in the digital domain it can be further compressed digitally or, should I say, mathematically. The ideal sort of compression is loss free, because it allows you to reproduce the original data without change or degradation. The trouble with loss‑free compression is that it actually doesn't compress your material by very much. If you've ever had to buy a computer tape backup device you'll have seen that most of them come with two stated capacities, one normally twice the other.

For example, a 6GB tape might also have the figure of 12GB written on it. The lower figure is the true or 'native' capacity of the media and the higher figure is the 'compressed' capacity. By compressing the data, as it is put on the tape, to half of its original size, you can effectively put twice the amount of data onto the tape. But there's a catch to this. This type of compression only works with conventional 'computer' type files, which typically contain text, or computer program code. You only have to look at the average computer program (or indeed the content of Cutting Edge) to see that this type of data is very repetitive and ideal for compression.

Digital audio and video, on the other hand, are less suited to this type of compression. Compression ratios of around 1.6:1 (where 1:1 is uncompressed) are about the best you could expect, with figures much worse than this being quite normal. So, to achieve useful compression ratios with media, we must use 'lossy' compression.
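
You can get a feel for the difference with Python's standard zlib module. This is only a sketch: the repeated sentence stands in for text‑like data, and uniformly random bytes stand in for media. Random bytes are a deliberate worst case (real audio sits somewhere between the two extremes), and both data sets are invented for illustration.

```python
import random
import zlib

def lossless_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))

# Repetitive, text-like data: the kind a tape drive's compression likes.
text_like = b"The quick brown fox jumps over the lazy dog. " * 2000

# Noise-like data standing in for detailed audio or video samples.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise_like = bytes(rng.randrange(256) for _ in range(len(text_like)))

print(f"text-like : {lossless_ratio(text_like):.1f}:1")
print(f"noise-like: {lossless_ratio(noise_like):.2f}:1")
```

The repetitive data shrinks dramatically, while the noise‑like data barely changes size at all, which is why loss‑free methods alone don't get us very far with media.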

Using lossy compression means that the degree of data reduction we can achieve is remarkable. For example, MP3 at 128Kbits/sec still sounds OK (although discernibly inferior to CD quality) and represents a compression ratio of about 11:1. With video, the results are even more impressive. Uncompressed PAL video has a data rate of (wait for it) 270Mbits/sec. That's a little over two thousand times the data rate of a typical MP3 data stream. At that rate, you'd get only about half a minute of video per Gigabyte of storage, and only about 20 seconds on a whole CD‑ROM. Incredibly, you'd still only get a maximum of about four minutes of uncompressed video onto a typical dual‑layer DVD!
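
If you'd like to check this sort of arithmetic yourself, a few lines of Python will do it. The data rates are the ones quoted above; the storage capacities (decimal gigabytes, a 650MB CD‑ROM and an 8.5GB dual‑layer DVD) are assumptions of mine, so substitute your own.

```python
CD_AUDIO_BPS = 44_100 * 16 * 2   # stereo CD audio: 1,411,200 bits/sec
MP3_BPS = 128_000                # the 128Kbits/sec MP3 rate quoted above
PAL_VIDEO_BPS = 270_000_000      # the 270Mbits/sec video rate quoted above

# Assumed nominal capacities in bytes (decimal units).
CAPACITIES = {
    "1 gigabyte": 1e9,
    "CD-ROM (650MB)": 650e6,
    "dual-layer DVD (8.5GB)": 8.5e9,
}

def seconds_of_media(capacity_bytes: float, bits_per_second: float) -> float:
    """How long a stream at the given rate takes to fill the given capacity."""
    return capacity_bytes * 8 / bits_per_second

print(f"MP3 compression ratio versus CD: {CD_AUDIO_BPS / MP3_BPS:.1f}:1")
print(f"Video rate versus MP3 rate: {PAL_VIDEO_BPS / MP3_BPS:,.0f} times")
for name, size in CAPACITIES.items():
    secs = seconds_of_media(size, PAL_VIDEO_BPS)
    print(f"{name}: {secs:.0f} seconds ({secs / 60:.1f} minutes) of uncompressed video")
```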

In practice, when using MPEG‑2 compression you can get a two and a half hour feature film, plus multiple sound tracks, onto a DVD and most people would agree that the picture and sound quality from DVDs are excellent. So what's the compression ratio with DVDs? The data rate on a typical commercial DVD will probably vary between 4Mbit/sec and about 10Mbit/sec which, measured against the 270Mbits/sec uncompressed figure, represents compression ratios of between roughly 27:1 and 68:1.
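
The same kind of sum shows what a film has to be squeezed down to. This sketch assumes an 8.5GB dual‑layer disc and ignores the space taken by sound tracks and menus, so treat the result as an upper bound on the average video rate rather than a real authoring budget.

```python
DVD_DUAL_LAYER_BYTES = 8.5e9     # assumed nominal capacity, decimal units
UNCOMPRESSED_BPS = 270_000_000   # the uncompressed PAL figure quoted earlier

def average_bitrate_bps(capacity_bytes: float, running_time_s: float) -> float:
    """Average data rate that exactly fills the disc over the running time."""
    return capacity_bytes * 8 / running_time_s

def compression_ratio(compressed_bps: float,
                      uncompressed_bps: float = UNCOMPRESSED_BPS) -> float:
    """Uncompressed rate divided by compressed rate."""
    return uncompressed_bps / compressed_bps

film_seconds = 2.5 * 3600
budget = average_bitrate_bps(DVD_DUAL_LAYER_BYTES, film_seconds)
print(f"Average rate for a 2.5 hour film: {budget / 1e6:.2f} Mbit/sec "
      f"({compression_ratio(budget):.0f}:1)")

for mbit in (4, 6, 8, 10):
    print(f"{mbit:>2} Mbit/sec -> {compression_ratio(mbit * 1e6):.1f}:1")
```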

All types of lossy compression rely on the principle that if you only throw away data which won't be missed, then the compressed product will still be perfectly acceptable. It starts to get interesting when you push the process to, or beyond, its limits and when you move between one method of compression and another. (I'm talking a lot about video here because the technology is, to some extent, similar to audio and the effects of compression are easier to explain with images.)

Take It To The Limit

When you push video compression to the limit, it's very easy to see the results. Previously smooth, 'natural' images will break up into blocks. You'll also see 'ringing' around sharp edges. These effects, which are analogous to the high‑frequency unpleasantness that you hear with the Internet streamed‑audio formats, are called 'compression artifacts'.

Sometimes the content of a picture can simply be too complex to compress accurately. If you've got a DV camcorder, try filming a field of grass waving in the wind. You'll notice that the blue sky looks fine, but the chances are that if you look closely at the grass it will just be a mass of green noise. (DV compression is a close relative of MPEG, which is said to stand for Moving Pictures Except Grass.) That's bad enough, but most people don't notice this type of effect unless they look closely. Now consider what would happen if you were to convert this video material into another compression format. Most compression algorithms find randomness a problem. So our field of grass, which has now become an area of what is basically green noise, has so much randomness about it that it could completely trip up any subsequent attempts to compress it.

Cheesy Analogy

The scenario just mentioned is probably a worst possible case. In actual fact, as long as you stay with the same kind of compression, you can probably compress and recompress several times. Don't expect to be able to go between, say, MP3 and ATRAC (Sony's Minidisc compression format) too many times, though. To understand why, think of your digital material, audio or video, as a piece of cheese. (Obviously!)

Now imagine pushing the cheese through a grid of cheese‑cutting wire. What you'll get is the same block of cheese with a regular pattern of slices through it. Hold them together and they will look like the same block of cheese. Chances are that no‑one would notice.

Now, carefully, pass the cheese through the same grid, making sure that you line it up with the original pattern of wires. What happens? Nothing. Apart from its original carve‑up, it's none the worse for its second pass through the slicing grid. And of course you can go on doing this, perhaps only losing a small amount of cheese because it's difficult to line up the cheese with total accuracy. But what would happen if you changed the shape of the grid (to, say, a triangular pattern), or if you rotated it? Do this enough times and you'll end up with more gaps between the chunks of cheese than cheese itself.
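
For those who prefer numbers to dairy products, the same idea can be sketched in Python. Here the 'grid' is nothing more than rounding each sample to the nearest multiple of a step size. That is a drastic simplification of what MP3 or ATRAC actually do, and the step sizes of 7 and 5 are arbitrary, but the behaviour is the one the cheese illustrates.

```python
def slice_with_grid(samples, step):
    """Round each sample to the nearest multiple of `step` (the 'grid')."""
    return [((s + step // 2) // step) * step for s in samples]

def worst_error(original, processed):
    """Largest difference between any original sample and its processed copy."""
    return max(abs(a - b) for a, b in zip(original, processed))

original = list(range(0, 70))  # a simple ramp standing in for the cheese

first_pass = slice_with_grid(original, 7)
same_grid_again = slice_with_grid(first_pass, 7)
different_grid = slice_with_grid(first_pass, 5)

print("Second pass through the same grid changed anything:",
      same_grid_again != first_pass)
print("Worst error after one pass:", worst_error(original, first_pass))
print("Worst error after a pass through a different grid:",
      worst_error(original, different_grid))
```

A second pass through the same grid leaves the data exactly as it was, while a pass through a different grid moves samples that had already been rounded once, so the errors stack up.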

Moral Of The Story

The moral of the analogy is that if you must compress and decompress digital media repeatedly, then you should do it with the same type of compression. Very broadly speaking, all the 'flavours' of compression which I've mentioned so far use the technique of converting waveforms into frequency spectra. By only encoding frequencies that are actually there (and even then only encoding them with as much precision as is necessary) very large compression ratios can be achieved. Some video compression techniques encode only the differences between frames. But these methods are very susceptible to noise (which is constantly changing and can cause every frame to look completely different).
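
Here is a toy illustration of 'only encoding frequencies that are actually there'. It uses a plain discrete Fourier transform written out longhand, which is an assumption made for clarity: real codecs use related but more refined transforms plus a model of human hearing or vision. The test signal, block size and helper names are all invented for the example.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier transform: waveform in, frequency coefficients out."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Rebuild the waveform from its frequency coefficients."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def keep_strongest(spectrum, count):
    """Zero every coefficient except the `count` largest by magnitude."""
    order = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0j for k, c in enumerate(spectrum)]

N = 64
# Two tones that fit the block exactly, so their energy lands in a few bins.
signal = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 10 * t / N)
          for t in range(N)]

spectrum = dft(signal)
compact = keep_strongest(spectrum, 4)  # two tones, each a positive and negative bin
rebuilt = inverse_dft(compact)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))

print(f"Kept {sum(1 for c in compact if c != 0)} of {N} coefficients")
print(f"Largest reconstruction error: {max_error:.2e}")
```

Four numbers out of 64 are enough to rebuild this particular waveform almost perfectly. Feed the same code a noisy signal and the energy spreads across every bin, which is exactly why noise is so expensive to compress.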

So where can we go from here? The Windows Media 7 video codec uses what it calls an 'enhanced' version of MPEG‑4 (which should more properly be referred to, perhaps, as 'not' MPEG‑4). There's nothing very remarkable about the MPEG‑4 codec itself, except that hidden within the specification for MPEG‑4 is a facility for working with media 'objects' rather than sampled digital media. It seems that the next leap ahead in media recording and reproduction is to treat every source of audio and every 'thing' in a video as a distinct entity. If you know the characteristics of the object, then you know how it will behave under any given circumstance. Does this sound familiar? It's like physical modelling, of course.

Get Real

Clearly, there's a long way to go before object‑based media is a practical proposition, but the advantages are enormous. What I find really exciting is that the more we learn about the brain, the more we can describe media objects to look like the raw input which our brain receives from our sense organs. Our brain has the ability to make dreams seem real and it seems to me that we might be able to call upon this ability to actually make the stuff we create in our studios look and sound better. More real, in other words.

A Lesson In Compression

If you compress something well, you will hardly notice it. For example, 16‑bit resolution limits us to just over 65,000 possible amplitudes whereas to be able to reproduce the level of a sound with absolute precision, we would have to use an infinite number of bits to describe it. Let's see, infinity minus 65,000 is... infinity, which could suggest that 16‑bit sampling is, at best, completely useless. But, of course, we know perfectly well that this is not the case.

Sixteen‑bit resolution is actually so good that it exceeds the capabilities of most cheap to mid‑range hi‑fi systems. It's only when you listen to very low level passages in music with a wide dynamic range that 16‑bit playback is a problem at all. The drawbacks that 16‑bit recordings have are easily fixed by 24‑bit recording and reproduction, which is so good that I've yet to hear anyone suggest that we should consider going to an even higher resolution (notwithstanding 'one bit' techniques like Sony's Super Audio CD).
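
To see how small the rounding error really is, you can measure it. The Python sketch below rounds a full‑scale sine wave to a given word length and compares the power of the signal with the power of the error. The test tone, its frequency and the simple rounding scheme (no dither) are all assumptions chosen for illustration.

```python
import math

def quantise(sample: float, bits: int) -> float:
    """Round a sample in the range -1.0 to 1.0 to the nearest available level."""
    scale = 2 ** (bits - 1) - 1
    return round(sample * scale) / scale

def signal_to_noise_db(bits: int, num_samples: int = 10_000) -> float:
    """Measured ratio of signal power to rounding-error power, in decibels."""
    # A full-scale test tone at an arbitrary frequency (in cycles per sample).
    tone = [math.sin(2 * math.pi * 0.01234567 * n) for n in range(num_samples)]
    noise = [quantise(s, bits) - s for s in tone]
    signal_power = sum(s * s for s in tone) / num_samples
    noise_power = sum(e * e for e in noise) / num_samples
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16, 24):
    print(f"{bits:>2}-bit: rounding error is about "
          f"{signal_to_noise_db(bits):.0f} dB below the signal")
```

Each extra bit buys roughly another 6dB, so the error at 16 bits is already a very long way below the music, and at 24 bits it is further down still.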
