NES Audio (APU)

In this article we will be looking at the sound features of the NES. We’ll learn how to emulate sound at a high level, and also write programs that generate sound on the NES. Sound on the NES is handled by custom hardware, often called the “Audio Processing Unit” or APU, which is embedded in the CPU chip (the Ricoh 2A03).

Introduction

When we looked at the NES graphics hardware, we saw that the NES uses a number of tricks to minimize storage requirements while still giving developers a great deal of flexibility.

Similar optimizations were made for the sound hardware. Rather than store audio waveforms directly, the system includes some simple synthesizer hardware. This makes it easy for developers to achieve a variety of sound and music effects, and it also largely eliminates the need to store audio data.

Note, this article assumes some basic understanding of how computers represent sound. If this is new to you, you can take a look at the Digital Audio Primer.

Hardware Features

The APU hardware manages five key hardware units, each capable of generating a different type of sound. All five of these units, or channels, can be active at the same time. These are:

1. Pulse 1 (square wave)
2. Pulse 2 (square wave)
3. Triangle
4. Noise
5. DMC (Delta Modulation Channel)

Sound wave types

Many of these channels support similar features, and the key distinction between them is the type of sound wave they generate. Let’s look at these sound waves briefly, then look at the hardware in detail.

Pulse 1 & 2

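These channels generate a pulse or square wave, which is a wave that jumps quickly between two values. The duty cycle is also adjustable. The duty cycle is the fraction of each period spent at the high value, and it can be set to 12.5%, 25%, 50% or 75%. This allows thinner or thicker waves.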
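The sketch below makes the duty cycle concrete. It is a minimal Python model of one period of each pulse waveform, sampled as 8 steps. The step patterns follow the duty sequence table documented on the NESdev wiki. The names `DUTY_SEQUENCES`, `pulse_sample` and `duty_fraction` are hypothetical helpers for illustration, not part of any real emulator API.

```python
# One period of each pulse waveform, 8 steps long, indexed by the
# 2-bit duty setting (DD). 1 = high, 0 = low.
DUTY_SEQUENCES = [
    [0, 1, 0, 0, 0, 0, 0, 0],  # DD=00: 12.5%
    [0, 1, 1, 0, 0, 0, 0, 0],  # DD=01: 25%
    [0, 1, 1, 1, 1, 0, 0, 0],  # DD=10: 50%
    [1, 0, 0, 1, 1, 1, 1, 1],  # DD=11: 75% (the 25% wave inverted)
]


def pulse_sample(duty: int, step: int, volume: int) -> int:
    """Output level (0-15) of a pulse channel at a given sequencer step."""
    return volume if DUTY_SEQUENCES[duty][step % 8] else 0


def duty_fraction(duty: int) -> float:
    """Fraction of each period that the waveform spends high."""
    return sum(DUTY_SEQUENCES[duty]) / 8


# One period of the 50% waveform at full volume.
print([pulse_sample(2, step, 15) for step in range(8)])  # [0, 15, 15, 15, 15, 0, 0, 0]
```

The 25% and 75% settings use the same shape inverted, so they sound nearly identical to the ear.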

Triangle

Next up is the triangle wave, which ramps linearly between the two values rather than switching directly. The triangle has a softer, rounder sound than the pulse channels, and it has no volume control. In games it commonly carries the bass line, and sometimes the melody.
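On the NES the triangle is not a true ramp. It is a fixed 32-step sequence of levels that steps from 15 down to 0 and back up to 15. The minimal Python sketch below assumes the NTSC CPU clock of 1789773 Hz and uses the frequency formula f = f_CPU / (32 × (T + 1)) given on the NESdev wiki. `TRIANGLE_SEQUENCE`, `triangle_sample` and `triangle_frequency` are hypothetical names for illustration.

```python
# The triangle channel steps through a fixed 32-entry sequence:
# 15 down to 0, then 0 back up to 15. It has no volume control.
TRIANGLE_SEQUENCE = list(range(15, -1, -1)) + list(range(0, 16))

CPU_HZ_NTSC = 1789773  # NTSC CPU clock


def triangle_sample(step: int) -> int:
    """Output level (0-15) of the triangle channel at a sequencer step."""
    return TRIANGLE_SEQUENCE[step % 32]


def triangle_frequency(timer: int, cpu_hz: int = CPU_HZ_NTSC) -> float:
    """Tone frequency for an 11-bit timer value (32 steps per period)."""
    return cpu_hz / (32 * (timer + 1))


print(round(triangle_frequency(253), 1))  # 220.2
```

The triangle’s period has 32 steps instead of the pulse channel’s 16 timer-scaled steps. As a result, the same timer value plays roughly an octave lower on the triangle than on a pulse channel.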

Noise

The pulse and triangle channels are meant for playing notes of varying shapes. But what about other sounds, like percussion?

The NES contains a “noise” channel, which can be thought of as producing random values on its output. This sounds something like a “sh”, and it is typically used for sound effects or hi-hat style percussion.

The NES generates these values with a hardware pseudo-random number generator (a 15-bit linear feedback shift register), so the “random” sequence actually repeats. Conceptually, though, it is meant to produce random samples for the channel.
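The noise generator is easy to model. The Python sketch below is based on the shift register behaviour described on the NESdev wiki:

- the register powers up holding 1;
- on each clock, bit 0 is XORed with bit 1, or with bit 6 when the channel’s mode flag (bit 7 of 0x400E) is set;
- the result is shifted in at bit 14.

The channel is silent whenever bit 0 is set. `noise_step`, `noise_sample` and `sequence_length` are hypothetical names for illustration.

```python
def noise_step(shift: int, short_mode: bool = False) -> int:
    """Advance the 15-bit noise shift register by one clock."""
    tap = 6 if short_mode else 1
    feedback = (shift & 1) ^ ((shift >> tap) & 1)
    return (shift >> 1) | (feedback << 14)


def noise_sample(shift: int, volume: int) -> int:
    """The channel outputs nothing while bit 0 of the register is set."""
    return 0 if shift & 1 else volume


def sequence_length(short_mode: bool = False, seed: int = 1) -> int:
    """Count the steps until the register returns to its starting value."""
    shift, steps = noise_step(seed, short_mode), 1
    while shift != seed:
        shift = noise_step(shift, short_mode)
        steps += 1
    return steps


print(sequence_length())  # 32767: the "random" pattern repeats
```

In the default mode the pattern repeats only every 32767 steps, which is long enough to sound like noise. The short mode produces a much shorter loop with a buzzier, more tonal character.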

DMC

Finally, the DMC allows software to send specific sample values to the sound chip. This allows great flexibility, but using it effectively ties up the CPU. In practice, this means it is often used for cutscenes or when gameplay is paused. It also takes a lot of storage space, so many games stick to the other channels.

It is called the “Delta Modulation Channel” because of how data is sent to the sound chip. Internally, the DMC maintains a 7-bit output counter. If raw audio samples were stored, each sample would take 7 bits. The DMC playback rate is adjustable, ranging from 4181 Hz at minimum to about 33.14 kHz at maximum (NTSC).

So at its most space-efficient setting, raw samples would take 7 bits * 4181 / 8 = 3.658 KB per second of audio. Early cartridges held as little as 16 KB of program ROM, so adding four seconds of sound (about 14.6 KB) would nearly double the storage requirements, and thus the cost, of such a cartridge.
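The storage estimate is simple arithmetic. It is written out below as a small Python helper (hypothetical name `bytes_per_second`). The 16 KB figure is taken as 16384 bytes.

```python
def bytes_per_second(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Storage cost of sampled audio at a given rate and bit depth."""
    return sample_rate_hz * bits_per_sample / 8


raw = bytes_per_second(4181, 7)  # 3658.375 bytes per second
four_seconds = 4 * raw           # 14633.5 bytes
print(four_seconds / 16384)      # fraction of a 16 KB program ROM
```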

So instead of storing raw samples, the DMC compresses them by a factor of 7. Each sample is a single bit meaning “increase by 2” or “decrease by 2”. The DMC applies this delta to its existing counter to get the new output level. This costs some audio quality in return for massive savings: audio encoded this way takes 1 bit per sample, or about 522 bytes per second at the lowest rate. Still a lot of space, but not nearly as bad.
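Delta decoding can be sketched in a few lines of Python. This follows the behaviour described on the NESdev wiki:

- bits are read least-significant first;
- a 1 adds 2 to the 7-bit level and a 0 subtracts 2;
- a step that would leave the 0-127 range is skipped.

The starting level is a parameter here. On hardware it is whatever software last wrote to 0x4011. `dmc_decode` is a hypothetical name for illustration.

```python
def dmc_decode(data: bytes, level: int = 0) -> list[int]:
    """Expand 1-bit DMC deltas into 7-bit output levels (0-127)."""
    levels = []
    for byte in data:
        for bit in range(8):  # least-significant bit first
            if (byte >> bit) & 1:
                if level <= 125:  # adding 2 must stay within 127
                    level += 2
            elif level >= 2:      # subtracting 2 must stay at or above 0
                level -= 2
            levels.append(level)
    return levels


# One byte of all 1s ramps upward from the middle of the range.
print(dmc_decode(bytes([0xFF]), level=64))  # [66, 68, 70, 72, 74, 76, 78, 80]
```

Each stored byte yields eight output samples. This is where the factor-of-7 saving over 7-bit raw samples comes from.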

Controlling the Sound Hardware

Before we try to build a model of the sound hardware for an emulator, let’s see how software interacts with it.

As with all other devices on the NES, these channels are controlled through a number of memory mapped IO addresses.

Each channel has a few internal registers, and its internal state is updated about four times per frame. Using any channel requires writing to the APU Status register. Beyond that, each channel has its own set of control registers.

Rather than cover all the channels here, I’ll look at the pulse channel in detail. The NESDev wiki, linked at the end of this article, has details on the other channels. After reading my explanation of the pulse channel, information about the other channels should be easier to understand.

Status (0x4015)

The Status MMIO register can be used to enable or disable each of the above sound channels. Reading from the address also tells you whether a given channel has finished playing.

Writing a 1 enables, and 0 disables. Reading a 1 means the channel is still playing, and 0 means it has finished.

             7654 3210
0x4015 bits: ---D NT21

D: DMC
N: Noise
T: Triangle
2: Pulse 2
1: Pulse 1
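The bit layout above maps directly onto masks. The Python sketch below builds a value to write to 0x4015 and decodes one back into channel names. The constant and function names are hypothetical, chosen for illustration.

```python
# Bit masks for the 0x4015 layout ---D NT21
PULSE1 = 0x01
PULSE2 = 0x02
TRIANGLE = 0x04
NOISE = 0x08
DMC = 0x10

CHANNEL_NAMES = ["Pulse 1", "Pulse 2", "Triangle", "Noise", "DMC"]


def status_value(*channels: int) -> int:
    """Combine channel masks into a byte to write to 0x4015."""
    value = 0
    for mask in channels:
        value |= mask
    return value


def active_channels(value: int) -> list[str]:
    """Name the channels whose bits are set in a 0x4015 value."""
    return [name for i, name in enumerate(CHANNEL_NAMES) if value & (1 << i)]


print(hex(status_value(PULSE1, TRIANGLE)))  # 0x5
print(active_channels(0x09))                # ['Pulse 1', 'Noise']
```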

Pulse 1 Registers (0x4000 - 0x4003)

The pulse channel has its own set of registers at hex 0x4000-0x4003. Pulse 2 uses the same organization, instead at address 0x4004-0x4007.

0x4000 : Pulse1 Main register
         7654 3210
         DDLC VVVV
DD: Duty cycle (00 = 12.5%, 01 = 25%, 10 = 50%, 11 = 75%).
L : Loop. If set, the length counter will not decrease,
resulting in a tone that plays continuously. This bit
also makes the envelope loop.
C: Constant volume. If 1, the envelope is disabled and
the volume stays fixed.
VVVV: Volume (C=1) or envelope period (C=0)
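An emulator has to pull these fields out of each write to 0x4000. The Python sketch below packs and unpacks the DDLC VVVV layout. The `PulseControl` dataclass and the two functions are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class PulseControl:
    duty: int       # DD: 0-3 selects 12.5%, 25%, 50%, 75%
    loop: bool      # L: halt the length counter / loop the envelope
    constant: bool  # C: constant volume instead of the envelope
    volume: int     # VVVV: volume, or envelope period when C is 0


def decode_4000(value: int) -> PulseControl:
    return PulseControl(
        duty=(value >> 6) & 0b11,
        loop=bool(value & 0x20),
        constant=bool(value & 0x10),
        volume=value & 0x0F,
    )


def encode_4000(ctrl: PulseControl) -> int:
    return (
        ((ctrl.duty & 0b11) << 6)
        | (int(ctrl.loop) << 5)
        | (int(ctrl.constant) << 4)
        | (ctrl.volume & 0x0F)
    )


print(decode_4000(0xBF))
# PulseControl(duty=2, loop=True, constant=True, volume=15)
```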

Most channels include a way to vary parameters over time. On the pulse channels, volume is shaped over time by the envelope (active when C is 0). Pitch can be varied over time by the “sweep unit”, which periodically adjusts the timer period to produce pitch slides. We’ll leave the sweep disabled for our program, but feel free to experiment with these settings later.

0x4001: Sweep controls
        7654 3210
        EPPP NSSS
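E: Enable
P: Divider period (how often the sweep adjusts the timer)
N: Negate. If 0, the timer period increases (pitch falls);
if 1, it decreases (pitch rises).
S: Shift count (the period changes by period >> S)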
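The sketch below shows how an emulator might compute where the sweep is heading. It follows the sweep behaviour described on the NESdev wiki. The change amount is the current timer period shifted right by S. When negating, pulse 1 subtracts one extra (ones' complement) but pulse 2 does not. The channel is muted if the period is below 8 or the target exceeds 0x7FF. All function names are hypothetical.

```python
def decode_4001(value: int) -> dict:
    """Split a sweep register write into its EPPP NSSS fields."""
    return {
        "enabled": bool(value & 0x80),
        "period": (value >> 4) & 0x07,
        "negate": bool(value & 0x08),
        "shift": value & 0x07,
    }


def sweep_target(timer: int, shift: int, negate: bool, pulse1: bool) -> int:
    """Timer period that the sweep unit moves toward."""
    change = timer >> shift
    if not negate:
        return timer + change
    # Pulse 1 subtracts one extra (ones' complement); pulse 2 does not.
    return timer - change - (1 if pulse1 else 0)


def sweep_mutes(timer: int, target: int) -> bool:
    """The sweep unit silences very high pitches and timer overflow."""
    return timer < 8 or target > 0x7FF


print(sweep_target(253, 1, negate=False, pulse1=True))  # 379
```

The off-by-one difference between the two pulse channels is the kind of small detail that is easy to miss but audible when both channels sweep together.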

Finally, we have the length and timer settings. The timer can be thought of as controlling the frequency or pitch of the sound. The length controls the duration of the sound. This allows the APU to automatically shut the sound off after a certain amount of time has passed. This is much more efficient than requiring the CPU to monitor playback, since it leaves more CPU cycles available for the game code.

The timer is an 11-bit value. Since each MMIO register holds only 8 bits, the timer is split across two registers.

0x4002 : Timer lower bits
0x4003 : Length & Timer upper bits
   LLLL LTTT
L: Length counter load (an index into a lookup table of durations)
T: Upper 3 timer bits.
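The Python sketch below splits a timer value across 0x4002 and 0x4003, and converts between timer values and pitch. It assumes the NTSC CPU clock of 1789773 Hz and the pulse formula Freq = Freq_CPU / (16 × (T + 1)). The function names are hypothetical.

```python
CPU_HZ_NTSC = 1789773  # NTSC CPU clock


def pulse_frequency(timer: int, cpu_hz: int = CPU_HZ_NTSC) -> float:
    """Pulse tone frequency for an 11-bit timer value."""
    return cpu_hz / (16 * (timer + 1))


def pulse_timer_for(freq_hz: float, cpu_hz: int = CPU_HZ_NTSC) -> int:
    """Nearest timer value for a target frequency."""
    return round(cpu_hz / (16 * freq_hz) - 1)


def split_timer(timer: int, length_index: int) -> tuple[int, int]:
    """Bytes to write to 0x4002 (timer low) and 0x4003 (LLLL LTTT)."""
    if not (0 <= timer <= 0x7FF and 0 <= length_index <= 31):
        raise ValueError("timer must fit in 11 bits, length index in 5")
    low = timer & 0xFF
    high = (length_index << 3) | (timer >> 8)
    return low, high


t = pulse_timer_for(440)
print(t, [hex(b) for b in split_timer(t, 31)])  # 253 ['0xfd', '0xf8']
```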

That’s it for the pulse channel. Other channels have similar features, each controlled by its own set of MMIO addresses.

Frame counter (0x4017)

The last register involved in sound generation for the pulse channel is called the “Frame Counter”. It is shared by all channels and drives the parts of each channel that change over time, such as envelopes, sweeps and length counters. The frame counter can be thought of as running a fixed program in a loop, using either a 4-step or a 5-step sequence.

4 Step (Mode 0):

- - - I
- L - L
C C C C

5 Step (Mode 1):

- - - - -
- L - - L
C C C - C

I: Interrupt, if enabled.
L: Clock the length counters and sweep units.
C: Clock the envelopes and the triangle’s linear counter.

The external controls of the Frame Counter come down to only two bits:

0x4017: MI-- ----
M: Mode (0 = 4 step, 1 = 5 step).
I: IRQ inhibit. If set, the frame counter does not raise interrupts.
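The step tables above can be encoded as data. The Python sketch below lists which events fire on each step after a given 0x4017 write. It borrows the NESdev wiki’s terms: “quarter frame” for the C events and “half frame” for the L events. It deliberately ignores the exact CPU-cycle timing of each step. `frame_counter_events` and the table names are hypothetical.

```python
# Per-step events from the tables above:
#   "quarter" = clock envelopes and the triangle's linear counter (C)
#   "half"    = clock length counters and sweep units (L)
#   "irq"     = raise the frame interrupt (I)
FOUR_STEP = [
    {"quarter"},
    {"quarter", "half"},
    {"quarter"},
    {"quarter", "half", "irq"},
]
FIVE_STEP = [
    {"quarter"},
    {"quarter", "half"},
    {"quarter"},
    set(),
    {"quarter", "half"},
]


def frame_counter_events(reg_4017: int, steps: int) -> list[set[str]]:
    """Events for the first `steps` frame counter steps after a 0x4017 write."""
    five_step = bool(reg_4017 & 0x80)    # M bit
    irq_inhibit = bool(reg_4017 & 0x40)  # I bit
    sequence = FIVE_STEP if five_step else FOUR_STEP
    events = []
    for i in range(steps):
        step = set(sequence[i % len(sequence)])  # copy so tables stay intact
        if irq_inhibit:
            step.discard("irq")
        events.append(step)
    return events
```

Keeping the sequence as data rather than branching code makes it easy to compare against the tables above when debugging.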

Sound interrupts can be used to implement sequencing, that is, playing back a sequence of notes for a song. On each interrupt, the CPU checks the next step of the song and updates the channel registers to match.

Writing a test program

Let’s put together what we’ve learned about the APU and write a program to generate a basic pulse wave. Comments are included to make it easy to follow along.

```asm
; Enable pulse channel 1 first: while a
; channel is disabled in $4015, writes to
; its length counter (via $4003) are ignored.
; ---D NT21
; 0000 0001
lda #$01
sta $4015

; Configure pulse 1
; 50% duty, loop, const vol @ 15/15
; DDLC VVVV
; 1011 1111
; B    F
lda #$BF
sta $4000 ; Pulse 1 Main

; Disable the sweep unit for a constant
; pitch.
lda #$00
sta $4001

; Calculate timer T to play
; note A440, 440 Hz
; Freq = Freq_CPU / (16 x (T+1))
; T = (Freq_CPU / (16 x Freq)) - 1
; Freq_CPU = 1789773 (NTSC)
; Freq = 440
; T = 253.2, rounded to 253 = $FD
lda #$FD
sta $4002

; Length index 31 (the loop flag set above
; halts the length counter, so the tone
; keeps playing), and no upper timer bits
; since T < 256
; LLLL LTTT
; 1111 1000
lda #$F8
sta $4003

; Use mode 0 (4 step) for the frame counter
lda #$00
sta $4017
```
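The order of these register writes matters because of how the length counter interacts with 0x4015. The minimal emulator-side sketch below models that interaction in Python, based on the behaviour described on the NESdev wiki:

- while a channel is disabled, writes to 0x4003 do not load its length counter;
- disabling a channel clears its length counter;
- the loop flag halts counting.

The length table values come from the NESdev wiki. The class and method names are hypothetical.

```python
# Length counter load values, indexed by the 5-bit LLLLL field of 0x4003.
LENGTH_TABLE = [
    10, 254, 20, 2, 40, 4, 80, 6, 160, 8, 60, 10, 14, 12, 26, 14,
    12, 16, 24, 18, 48, 20, 96, 22, 192, 24, 72, 26, 16, 28, 32, 30,
]


class LengthCounter:
    def __init__(self) -> None:
        self.enabled = False  # this channel's bit in 0x4015
        self.halt = False     # the L (loop) flag in 0x4000
        self.value = 0

    def write_status(self, enabled: bool) -> None:
        self.enabled = enabled
        if not enabled:
            self.value = 0  # disabling a channel clears its counter

    def write_4003(self, data: int) -> None:
        if self.enabled:  # loads are ignored while the channel is disabled
            self.value = LENGTH_TABLE[data >> 3]

    def clock(self) -> None:
        """Called on each L step of the frame counter."""
        if not self.halt and self.value > 0:
            self.value -= 1

    def audible(self) -> bool:
        return self.value > 0


ch = LengthCounter()
ch.write_4003(0xF8)    # written before enabling: ignored
ch.write_status(True)
print(ch.audible())    # False
```

If 0x4015 is written after 0x4003, the counter stays at 0 and the channel stays silent. With the loop flag set, it never counts down or back up on its own.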

And with this your NES or emulator should produce a solid tone. You can find the full source code, as well as a pre-assembled test rom, on my NES index page.

Testing sound support

The test rom we built previously can be modified per channel, giving you a basic test of sound support.

Beyond this, many NES games play a theme song on the title screen. You can build support for the APU one channel at a time, and you should hear a more complete theme as each channel is added when testing against your game backup of choice.

Other Quirks

The APU can trigger interrupts in two different situations. The first is when a DMC sample finishes playing, if the DMC interrupt is enabled. This is useful because code can then queue up the next sample, enabling streaming audio. The second is when the frame counter reaches the end of its 4-step sequence, unless the I bit in 0x4017 inhibits it. This gives software a natural way to tweak various parameters of the sound over time.

Hyrum’s Law observes that every observable behavior of a system will eventually be relied upon by someone. Some games use these interrupts to get sub-frame timing. Correctly emulating all NES games therefore requires stepping these timers the way the hardware does, multiple times per frame, rather than once per frame.

Extra resources

Details in this article come from my experience in building an emulator, and the wealth of information on the nesdev wiki.
