⭅ Previous (Architecture) Next (Emulator Design) ⭆

6502 Assembly Language

Also available on YouTube.

In the previous post we looked at the key components that make up the NES, and learned a bit about how computers represent values and how they’re stored in memory.

One issue mentioned was that memory is physically outside the CPU, and thus slower to access. How does a CPU, particularly the 6502-like CPU in the NES, work around that? The answer is registers!

What is a register?

CPUs have a small amount of storage inside the same chip. By being physically closer, it can be used much more quickly than external memory. However, especially in the early days of integrated circuits, it was hard to fit much of this storage on the same chip. The 6502, and thus the NES, has only a few registers:

PC aka Program Counter : 16 bits
SP aka Stack Pointer   : 8 bits

accumulator : 8 bits
x index     : 8 bits
y index     : 8 bits

status      : 8 bits

Note this is a total of 7 bytes, far less than the 65,536 bytes (64 KiB) addressable in memory. But registers can be accessed without the delay of going out to memory, and so efficient programming style prefers to keep values in registers while computing. Registers can be backed up or “swapped out” to memory once different values are needed.
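As a quick check of the sizes above, here is a minimal sketch in Python. Python is only an assumption for illustration, not necessarily the language used later in this series, and the name REGISTER_BITS is made up for this example.

```python
# Width in bits of each 6502 register, as listed above.
# REGISTER_BITS is a hypothetical name used only for this sketch.
REGISTER_BITS = {
    "pc": 16,      # program counter
    "sp": 8,       # stack pointer
    "a": 8,        # accumulator
    "x": 8,        # x index
    "y": 8,        # y index
    "status": 8,   # status flags
}

# Total register storage in bytes.
total_register_bytes = sum(REGISTER_BITS.values()) // 8

# A 16 bit program counter can name 2**16 distinct addresses.
addressable_bytes = 2 ** REGISTER_BITS["pc"]

print(total_register_bytes)  # 7
print(addressable_bytes)     # 65536
```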

Of these registers, only acc, x, and y are truly general purpose. Status is updated indirectly, and serves to give information about the last operation completed. PC and SP are used to control memory accesses. PC can point to any memory address, while SP has some limitations, which is why it needs only one byte. PC is how the CPU keeps track of what it needs to do next, and SP makes memory a little easier to use, which we’ll see in a bit. First, let’s get a sense of how we tell the CPU to do things.

Instructions

A computer program is a list of steps for the CPU to run. Modern programs are typically written in an abstract or high level programming language, which is easy for humans to write. In the era of the NES and the 6502, computers were much slower, and so humans wrote code directly in terms of what the CPU understood.

This is known as “assembly language”, which can be directly translated to the “machine code” that the CPU understands. Here is an example:

; Load the a register with the value hex $20 aka 32
lda #$20 

This is fed into a program called an assembler, which uses a table to convert this into machine code for the CPU. Our assembler ignores anything after a semicolon (;), which allows us to place notes alongside our code.

In the 6502, this lda command or “instruction” is represented by the value 0xa9. The input to the command comes right afterwards. So the assembly language above turns into the machine code (represented in hex):

a9 20

Notice that assembly language and machine code directly represent the same data. Using an assembly language just saves the programmer from needing to memorize the numeric representation, or opcode, of each instruction.
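To make the table lookup concrete, here is a rough sketch of a one-instruction assembler in Python (chosen only for illustration). The names OPCODES and assemble_line are hypothetical, and the sketch handles nothing beyond the immediate form of lda shown above.

```python
# Hypothetical one-entry table: (mnemonic, addressing mode) -> opcode.
OPCODES = {("lda", "immediate"): 0xA9}


def assemble_line(line):
    """Translate one line of assembly into machine code bytes.

    Only immediate hex operands such as '#$20' are supported in this sketch.
    """
    # Everything after a semicolon is a comment and is ignored.
    code = line.split(";", 1)[0].strip()
    if not code:
        return b""
    mnemonic, operand = code.split()
    if not operand.startswith("#$"):
        raise ValueError("this sketch only handles immediate hex operands")
    value = int(operand[2:], 16)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate operand must fit in one byte")
    opcode = OPCODES[(mnemonic.lower(), "immediate")]
    return bytes([opcode, value])


machine_code = assemble_line("lda #$20 ; load 32 into a")
print(machine_code.hex(" "))  # a9 20
```

A real assembler has one table entry per instruction and addressing mode, but the translation step is the same kind of lookup.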

The 6502 uses one byte for each opcode, and the size of the input depends on which instruction is used. That lda instruction takes only a single byte as input, so the full lda instruction is 1 byte of opcode + 1 byte of input = 2 bytes.

After the CPU reads, decodes, and executes that instruction, it updates the program counter (PC) by adding 2, and continues to the next instruction. The CPU knows how long each instruction is based on its opcode, and reads through the program sequentially in this way.
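The way PC steps through a program can be sketched as below, again in Python for illustration only. INSTRUCTION_LENGTH and instruction_starts are made-up names. Besides lda, the table includes two other real 6502 opcodes (ldx with a one-byte input, and dex with no input) so that the lengths differ. The sketch only walks straight through the bytes and does not execute anything.

```python
# Total length in bytes (opcode + input) for a few 6502 opcodes.
# INSTRUCTION_LENGTH is a hypothetical name used only for this sketch.
INSTRUCTION_LENGTH = {
    0xA9: 2,  # lda with a one-byte input
    0xA2: 2,  # ldx with a one-byte input
    0xCA: 1,  # dex, no input
}


def instruction_starts(program):
    """Return the PC value at the start of each instruction."""
    pc = 0
    starts = []
    while pc < len(program):
        starts.append(pc)
        opcode = program[pc]
        # The opcode alone tells us how far to advance PC.
        pc += INSTRUCTION_LENGTH[opcode]
    return starts


# lda #$20 ; ldx #$03 ; dex
program = bytes([0xA9, 0x20, 0xA2, 0x03, 0xCA])
print(instruction_starts(program))  # [0, 2, 4]
```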

Let’s take a look at a few other types of instruction to get a feel for what the CPU does. Once a value is loaded into a register, we can do math with it.

; add to accumulator from memory
; aka the byte (8 bits) at address 0xdead, plus the carry flag, is added to acc
adc $dead

This instruction reads a value from memory, adds it to the accumulator register, and stores the result back in the accumulator. Since the input to this instruction is a 16 bit (2 byte) address, this instruction takes 3 bytes instead of the two we saw previously. And since it needs an extra read from memory, it will take longer to complete than the lda above, whose input is part of the instruction itself. We’ll see timing in more detail later.
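Here is a rough Python sketch of what this adc does, for illustration only. The function name adc_absolute is hypothetical. It relies on two facts about the 6502: the address is stored low byte first, so adc $dead becomes the bytes 6d ad de, and adc also adds in the carry flag. The other status flags are ignored here.

```python
def adc_absolute(acc, carry, memory, operand_lo, operand_hi):
    """Add the byte at a 16 bit address, plus carry, to the accumulator.

    Returns the new 8 bit accumulator and the new carry flag (0 or 1).
    """
    # The two input bytes form the address, low byte first.
    address = operand_lo | (operand_hi << 8)
    total = acc + memory[address] + carry
    # The accumulator keeps only the low 8 bits; overflow goes to carry.
    return total & 0xFF, 1 if total > 0xFF else 0


memory = bytearray(0x10000)  # 65,536 bytes of addressable memory
memory[0xDEAD] = 0x05

instruction = bytes([0x6D, 0xAD, 0xDE])  # adc $dead
acc, carry = adc_absolute(0x20, 0, memory, instruction[1], instruction[2])
print(hex(acc), carry)  # 0x25 0
```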

Now if the CPU could only follow instructions one after another, it wouldn’t be too useful. We could easily predict ahead of time everything it would do. The last instruction type we’ll look at is a branch, which is how CPUs in general handle decision making.

lda #$1   ; acc = 1
ldx #$03  ; x = 0x03 = 3
.loop:
    asl a      ; acc = acc << 1, aka times 2
    dex        ; decrease x by 1
    bpl .loop  ; if the last result was not negative (0 or greater), branch to .loop
.done:

Alright, a few new things in this example. Comments are on each line to help explain what the code does. The important part here is just understanding branching, so no worries if the rest doesn’t make complete sense. We start with the accumulator set to 1, and x set to 3. Then we have a repeated series of instructions, often called a loop. The code .loop: is called a label, which just gives us a way to talk about a position in the code.

In the loop, we arithmetic shift left by 1. This is sort of an assembly language trick to multiply by 2, since early CPUs including the 6502 had no multiply instruction. For a number represented by the bits ABCD EFGH, after doing an asl instruction the value will be BCDE FGH0. So as long as the top bit (A) was zero, the result is the same as multiplying by two. The base ten equivalent, which might be easier to see, is shifting 12 left by one digit to get 120, which multiplies by ten.
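The shift can be sketched in a few lines of Python (used here only for illustration, with asl as a made-up function name). On the 6502 the bit that falls off the top lands in the carry flag, which the sketch returns alongside the result.

```python
def asl(value):
    """Shift an 8 bit value left by one, as the asl instruction does.

    Returns the 8 bit result and the bit shifted out of the top.
    """
    carry = (value >> 7) & 1       # bit A, which falls off the top
    result = (value << 1) & 0xFF   # BCDE FGH0, kept to 8 bits
    return result, carry


print(asl(0b00000110))  # (12, 0): 6 doubled is 12
print(asl(0b10000001))  # (2, 1): top bit was set, so not a clean doubling
```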

Dex means decrease x by 1, aka decrement x.

Finally we have bpl. This instruction uses the status register to decide whether to continue to the next instruction, or set PC to the instruction right after our .loop label. The status register gets updated by most instructions, but we can’t see it directly. In this case, when we decrement x, the status register gets updated to tell us if the result was negative. bpl goes back to .loop only as long as the new x value was 0 or greater.

So if we could watch this code run, it would look like:

a = 1, x = 3
a = 2, x = 2
a = 4, x = 1
a = 8, x = 0
a = 16, x = -1

Then the code leaves the loop because x is now negative.
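The trace above can be reproduced with a small Python sketch, shown for illustration only. The function run_loop is a made-up name, and this is a hand-written imitation of the three loop instructions rather than a real emulator. Registers are kept as 8 bit values, so -1 is stored as 255, whose top bit is what marks the result as negative.

```python
def run_loop():
    """Imitate the asl / dex / bpl loop and record (a, x) after each pass."""
    a, x = 1, 3                  # lda #$1 ; ldx #$03
    trace = [(a, x)]
    while True:
        a = (a << 1) & 0xFF      # asl a
        x = (x - 1) & 0xFF       # dex, wrapping 0 around to 255
        negative = bool(x & 0x80)  # the top bit set means negative
        trace.append((a, x))
        if negative:             # bpl only branches when not negative
            break
    return trace


for a, x in run_loop():
    # Show x as a signed number so 255 prints as -1.
    signed_x = x - 256 if x & 0x80 else x
    print(f"a = {a}, x = {signed_x}")
```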

And with that, we have about everything needed to start writing the CPU emulator. Next up we’ll look at some basic emulator code, as well as some clever 6502 design tricks that made it simple to build and also simpler to emulate.
