Digital Audio Primer

Aka Sound for Programmers

Many systems contain hardware to generate sound. To understand the hardware behind audio synthesis, it helps to know how sound can be represented digitally.

Sound 101

The sound we hear is just changes in air pressure on our ears. The key information communicated by the sound is this variation in pressure over time.

We call this a “wave” and it can be visualized like this.

Horizontally on the x axis is time, and vertically on the y axis is the value corresponding to the pressure.

Let's take a look at the two main attributes we use to describe a sound wave.

Amplitude

First up is amplitude. This tells us the height of the peaks of the wave. In the previous example, the wave had amplitude 1, meaning it ranged from +1 to -1.

Amplitude changes how loud we perceive the sound to be. A wave with a lower amplitude sounds quieter than a wave with higher amplitude.

Here is the wave from earlier, this time with amplitude 0.5 instead of 1.
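In code, changing amplitude is just multiplying every value of the wave by a scale factor. Here is a minimal sketch; the wrapper name `withAmplitude` is an illustrative choice, not from the original:

```javascript
// Wrap a wave function so every value is scaled by the amplitude factor.
function withAmplitude(fn, amplitude) {
  return (t) => amplitude * fn(t);
}

// Example: a full-amplitude 440 Hz sine reduced to amplitude 0.5.
const wave = (t) => Math.sin(2 * Math.PI * 440 * t);
const half = withAmplitude(wave, 0.5);
```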

Frequency

Next is frequency, which is a measure of how quickly a wave repeats. Most sounds we recognize as tones come from waves that repeat the same pattern many times per second.

Frequency is typically measured in Hertz (Hz), which is the number of times a wave repeats in one second. Sometimes we wish to talk about the duration of one up-and-down repetition of a wave. This is called its period, and it is the inverse of the frequency: a 440 Hz wave has a period of 1/440 ≈ 2.3 ms.

Human hearing is limited to a band of frequencies, roughly 20 Hz to 20,000 Hz.

A440 is a frequency often used for tuning instruments. On a piano, it corresponds with the A note just above the middle of the keyboard. A440 means it has a frequency of 440 Hz. We heard A440 earlier. Here you can see and listen to the next note up, corresponding with about 466 Hz.

Here is a visualization of it over a fraction of a second. You can click the button to hear the tone as well.
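That 466 Hz figure comes from the equal-temperament tuning system, where adjacent notes differ by a factor of 2^(1/12). A quick sketch, with illustrative variable names:

```javascript
// In equal temperament, each semitone is a factor of 2^(1/12) above the last.
const a4 = 440;                        // A440, in Hz
const semitone = Math.pow(2, 1 / 12);  // ≈ 1.0595
const aSharp4 = a4 * semitone;         // ≈ 466.16 Hz, the next note up
```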

Audio Digitization

Natural sound waves are continuous. When we want to represent audio data digitally, the audio gets chunked up. Often the size of these chunks is small enough that one cannot tell the difference.

This “chunking” happens on both dimensions.

Sampling

Sampling is this “chunking” applied on the horizontal or time dimension. The sample rate refers to how frequently the value of an audio wave is stored.

Typically the sampling rate is high enough that human ears cannot tell the difference between a smooth analog signal and a digital reproduction; CD audio, for example, uses 44,100 samples per second.

Here is the A440 signal from earlier, this time sampled at only 440 * 3 = 1320 Hz.
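Sampling like this amounts to evaluating the wave at evenly spaced points in time. A minimal sketch of capturing that 1320 Hz sampling of the A440 sine:

```javascript
// Sample a 440 Hz sine wave at 1320 samples per second
// (three samples per period).
const sampleRate = 1320;
const frequency = 440;
const numSamples = 16;

const samples = [];
for (let i = 0; i < numSamples; i++) {
  const t = i / sampleRate; // time, in seconds, of the i-th sample
  samples.push(Math.sin(2 * Math.PI * frequency * t));
}
```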


We publish about 1 post a week discussing emulation and retro systems. Join our email list to get notified when a new post is available. You can unsubscribe at any time.

Quantization

Quantization refers to the resolution along the vertical axis. Typically, it means the number of bits per sample, which limits how many unique values are possible for each sample.

Here is the original A440 signal, quantized to 8 levels (3 bits per sample).
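A quantizer like that can be sketched by snapping each sample to the nearest of the allowed levels. This is a minimal sketch; the function name is illustrative, and real systems often use slightly different level mappings:

```javascript
// Quantize a sample in [-1, 1] to one of 2^bits evenly spaced levels.
function quantize(sample, bits) {
  const levels = Math.pow(2, bits); // e.g. 8 levels for 3 bits
  const step = 2 / (levels - 1);    // spacing between adjacent levels
  const index = Math.round((sample + 1) / step); // nearest level index
  return -1 + index * step;         // map the index back to [-1, 1]
}
```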

Common Wave Types

Now let's take a look at a few common wave types used in audio synthesis. Code snippets for each are provided in vanilla JavaScript.

Sine

We’ve already looked at the sine wave. It is a smooth, bell-like tone. It’s called the sine wave since it can be described by the sine math function.

The sine wave is described primarily by its frequency.

And can be described by the following code:

// Value of a 440 Hz sine wave at time t (in seconds).
function fn(t) {
  return Math.sin(2 * Math.PI * 440 * t);
}

Square / Pulse

While the sine wave is mathematically simple, this doesn’t mean it is simple to generate. Particularly in early computer hardware, producing a sine wave in real time could prove challenging.

Instead, many early devices used a square or pulse wave. This is a wave that flips between +1 and -1 at some frequency. Here is a square wave at 440 Hz.

In addition to frequency, square waves are controlled through their “duty cycle”. This refers to the fraction of time the wave is high during one period. The last example used a 50% duty cycle, which most closely mirrors the sine wave.

Below is a 10% duty cycle square wave, still at 440 Hz.

let p = 1.0 / 440; // period of a 440 Hz wave
let duty = 0.1;    // high for 10% of each period

function fn(t) {
  if ((t % p) / p < duty) {
    return 1;
  } else {
    return -1;
  }
}

Square waves were popular in early devices because they are easy to generate in hardware: essentially just a counter and a comparison.
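That counter-and-comparison idea can be sketched in code. The tick rate and names here are illustrative assumptions, not a description of any specific chip:

```javascript
// A counter ticks through one period; the output is high while the count
// is below the duty-cycle threshold, then low for the rest of the period.
const ticksPerPeriod = 16;
const dutyTicks = 8; // 8 of 16 ticks high = 50% duty cycle

let counter = 0;
function tick() {
  const out = counter < dutyTicks ? 1 : -1;
  counter = (counter + 1) % ticksPerPeriod; // wrap at the end of the period
  return out;
}
```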

Triangle

Next up is the triangle wave. Unlike the square wave, which ideally jumps between only two values, the triangle wave ramps linearly from -1 up to +1 and back down every period.

You may notice that this looks like the earlier sine wave with a low sampling rate.

let p = 1.0 / 440;       // period of a 440 Hz wave
let slope = 2 / (p / 2); // rise of 2 (from -1 to +1) over half a period

function fn(t) {
  let rem = t % p;
  if (rem / p < 0.5) {
    return -1 + slope * rem;          // rising half
  } else {
    return 1 - slope * (rem - p / 2); // falling half
  }
}

Sawtooth

And the last of the frequently used waves is the sawtooth. Conceptually, the sawtooth is somewhere between a pulse and a triangle wave. Over the full period, it sweeps from the negative peak to the positive. Then it transitions back to the negative instantly.

let p = 1.0 / 440; // period of a 440 Hz wave
let slope = 2 / p; // rise of 2 (from -1 to +1) over one full period

function fn(t) {
  let rem = t % p;
  return -1 + slope * rem;
}

Conclusion

And that covers most of what you’ll need to know when building systems to generate audio.

If you’re interested in how specific computer systems generate sound, take a look at the systems pages for deep dives on classic hardware.
