Document Version: 1.0.0
Last Updated: 2025-12-18
- Overview
- Technical Specifications
- Registers
- Addressing Modes
- Instruction Set
- Interrupt Handling
- DMA Operation
- Implementation Guide
The NES CPU is a Ricoh 2A03 (NTSC) or 2A07 (PAL), a custom variant of the MOS Technology 6502 microprocessor. Key differences from the stock 6502:
- Decimal mode disabled: The D flag exists but has no effect
- Audio Processing Unit (APU) integrated on the same die
- DMA controller for OAM sprite data transfers
- Clock frequency: 1.789773 MHz (NTSC) or 1.662607 MHz (PAL)
NTSC Master Clock: 21.477272 MHz
CPU Clock = Master ÷ 12 = 1.789773 MHz (~559 ns per cycle)
PAL Master Clock: 26.601712 MHz
CPU Clock = Master ÷ 16 = 1.662607 MHz (~601 ns per cycle)
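The clock figures above follow directly from the master clock and the divider. A minimal sketch of that arithmetic, where the constant and function names are illustrative and not part of the emulator:

```rust
/// Master clock frequencies in Hz, as listed above.
const NTSC_MASTER_HZ: f64 = 21_477_272.0;
const PAL_MASTER_HZ: f64 = 26_601_712.0;

/// CPU clock in Hz for a given master clock and divider (12 on NTSC, 16 on PAL).
fn cpu_clock_hz(master_hz: f64, divider: f64) -> f64 {
    master_hz / divider
}

/// Duration of one CPU cycle in nanoseconds.
fn cycle_ns(cpu_hz: f64) -> f64 {
    1.0e9 / cpu_hz
}
```

An emulator would typically use these values to convert between CPU cycles and wall-clock time, for example when pacing audio output.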
| Specification | Value |
|---|---|
| Architecture | 8-bit, little-endian |
| Data Bus | 8-bit |
| Address Bus | 16-bit (64KB address space) |
| Clock Speed (NTSC) | 1.789773 MHz |
| Clock Speed (PAL) | 1.662607 MHz |
| Instruction Set | 56 official instructions (151 opcodes) |
| Addressing Modes | 13 modes |
| Stack | 256 bytes ($0100-$01FF), descending |
| Interrupts | NMI, IRQ, BRK, RESET |
Instructions take 2-7 cycles:
- 2 cycles: Register operations (INX, DEY, TAX, etc.)
- 3-4 cycles: Memory read/write (LDA, STA)
- 5-7 cycles: Read-Modify-Write (INC, DEC, ASL, ROL); the absolute,X form takes 7
- 7 cycles: Interrupts (NMI, IRQ, BRK)
Page Crossing Penalty: +1 cycle for some indexed addressing modes when crossing a 256-byte page boundary.
The 6502 has six registers, all 8-bit except the Program Counter (16-bit):
pub a: u8
- Primary register for arithmetic and logic operations
- Holds one operand and receives results
- Used with all ALU operations (ADC, SBC, AND, ORA, EOR, CMP)
pub x: u8
pub y: u8
- Used for indexed addressing modes
- Can be used as loop counters
- X is the only register that can be copied to or from the stack pointer (TSX, TXS)
- Y is the index for indirect indexed addressing; X is the index for indexed indirect
pub sp: u8
- Points to next free location on stack
- Stack located at $0100-$01FF (256 bytes)
- Descending (decrements on push, increments on pull)
- Full address = $0100 + SP
- Initialized to $FD on power-up
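The descending stack described above can be sketched as follows. `Stack` is a hypothetical stand-in for the CPU plus its bus (a flat 64 KiB array here), not the emulator's real type:

```rust
/// Hypothetical minimal stack model: flat 64 KiB memory plus the 8-bit SP.
struct Stack {
    mem: Vec<u8>,
    sp: u8,
}

impl Stack {
    fn new() -> Self {
        // SP starts at $FD, matching the power-up value given above.
        Stack { mem: vec![0; 0x10000], sp: 0xFD }
    }

    /// Write at $0100 + SP, then decrement SP (wrapping within page 1).
    fn push(&mut self, value: u8) {
        self.mem[0x0100 + self.sp as usize] = value;
        self.sp = self.sp.wrapping_sub(1);
    }

    /// Increment SP (wrapping), then read from $0100 + SP.
    fn pull(&mut self) -> u8 {
        self.sp = self.sp.wrapping_add(1);
        self.mem[0x0100 + self.sp as usize]
    }
}
```

Using `wrapping_sub` and `wrapping_add` keeps SP inside page 1 on overflow or underflow, which mirrors the 8-bit hardware register.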
pub pc: u16
- Points to next instruction to execute
- Automatically incremented after instruction fetch
- Modified by jumps, branches, subroutines, and interrupts
- Initialized to value at $FFFC-$FFFD on RESET
use bitflags::bitflags;

bitflags! {
    pub struct Status: u8 {
        const CARRY     = 0b0000_0001; // C - Carry flag
        const ZERO      = 0b0000_0010; // Z - Zero flag
        const INTERRUPT = 0b0000_0100; // I - Interrupt disable
        const DECIMAL   = 0b0000_1000; // D - Decimal mode (no effect on NES)
        const BREAK     = 0b0001_0000; // B - Break command
        const UNUSED    = 0b0010_0000; // - - Always 1
        const OVERFLOW  = 0b0100_0000; // V - Overflow flag
        const NEGATIVE  = 0b1000_0000; // N - Negative flag
    }
}

Carry (C):
- Set when unsigned overflow occurs (addition) or no borrow needed (subtraction)
- Updated by ADC, SBC, CMP, CPX, CPY; ADC and SBC also read it as an input
- Set or cleared directly by SEC and CLC; shifted through by ASL, LSR, ROL, ROR
Zero (Z):
- Set when result equals zero
- Affected by most ALU, load, and transfer operations (stores do not change flags)
Interrupt Disable (I):
- When set, IRQ interrupts are masked (NMI still occurs)
- Set by SEI, cleared by CLI
- Automatically set during interrupt handling
Decimal (D):
- On stock 6502, enables Binary-Coded Decimal (BCD) mode
- Has no effect on NES - hardware is removed, but flag still exists
- Set by SED, cleared by CLD
Break (B):
- Distinguishes BRK from IRQ when status is pushed to the stack
- Set (1) when pushed by BRK or PHP, clear (0) when pushed by IRQ/NMI
- Not a physical bit in the status register; it exists only in the copy of P pushed to the stack
Unused:
- Always reads as 1
- Pushed to stack as 1
Overflow (V):
- Set when signed overflow occurs in addition/subtraction
- Formula (ADC): (A ^ result) & (M ^ result) & 0x80; SBC uses the same test with M inverted
- Used to detect when a signed arithmetic result is incorrect
- Cleared by CLV
- Tested by BVC/BVS
Negative (N):
- Copy of bit 7 of result
- Used for signed comparisons
- Affected by most ALU operations
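The flag rules above can be illustrated with a standalone ADC sketch. `AdcResult` and `adc` are hypothetical names for this example; a real implementation would update the status register in place instead of returning a struct:

```rust
/// Result of an 8-bit add with carry, plus the four flags ADC updates.
#[derive(Debug, PartialEq)]
struct AdcResult {
    value: u8,
    carry: bool,
    zero: bool,
    overflow: bool,
    negative: bool,
}

/// Binary-mode ADC (the only mode that matters on the NES).
fn adc(a: u8, m: u8, carry_in: bool) -> AdcResult {
    // Widen to 16 bits so the carry out of bit 7 is visible.
    let sum = a as u16 + m as u16 + carry_in as u16;
    let value = sum as u8;
    AdcResult {
        value,
        carry: sum > 0xFF,
        zero: value == 0,
        // Signed overflow: both inputs differ in sign from the result.
        overflow: ((a ^ value) & (m ^ value) & 0x80) != 0,
        negative: (value & 0x80) != 0,
    }
}
```

For example, `adc(0x50, 0x50, false)` yields 0xA0 with V and N set and C clear: two positive inputs produced a negative signed result. SBC can reuse the same routine by passing `!m`.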
The 6502 supports 13 addressing modes:
Operates on a register or flag, no operand needed.
Syntax: INX, DEY, CLC, SEI
Bytes: 1
Cycles: 2
Example: INX ; X = X + 1
Operand is the accumulator.
Syntax: ASL A, ROL A, LSR A, ROR A
Bytes: 1
Cycles: 2
Example: ASL A ; A = A << 1
Operand is the next byte after opcode.
Syntax: LDA #$10
Bytes: 2
Cycles: 2
Effective Address: PC
Example: LDA #$42 ; A = 0x42
Operand at address $00XX (first 256 bytes).
Syntax: LDA $10
Bytes: 2
Cycles: 3
Effective Address: $00nn
Example: LDA $80 ; A = memory[$0080]
Zero page address + X register.
Syntax: LDA $10,X
Bytes: 2
Cycles: 4 (includes dummy read)
Effective Address: ($00nn + X) & $00FF
Example: LDA $80,X ; A = memory[($80 + X) & $FF]
Zero page address + Y register (used with LDX, STX).
Syntax: LDX $10,Y
Bytes: 2
Cycles: 4
Effective Address: ($00nn + Y) & $00FF
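The zero page wrap in the two indexed zero page modes falls out naturally from 8-bit arithmetic. A minimal sketch, with `zero_page_indexed` as an illustrative helper name:

```rust
/// Effective address for zero page,X or zero page,Y.
/// The sum is computed in 8 bits, so it never leaves page 0.
fn zero_page_indexed(base: u8, index: u8) -> u16 {
    base.wrapping_add(index) as u16
}
```

With `base = $F0` and an index of `$20`, the result is `$0010`, not `$0110`.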
Full 16-bit address.
Syntax: LDA $1234
Bytes: 3
Cycles: 4
Effective Address: $nnnn
Example: LDA $8000 ; A = memory[$8000]
Absolute address + X register.
Syntax: LDA $1234,X
Bytes: 3
Cycles: 4 (+1 if page crossed)
Effective Address: $nnnn + X
Example: LDA $8000,X ; A = memory[$8000 + X]
Page Crossing: If ($nnnn & $FF00) != (($nnnn + X) & $FF00), add 1 cycle (dummy read from wrong page).
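The page-crossing check above translates almost literally into code. This sketch uses hypothetical helper names and models only the read case (such as LDA), where the penalty is conditional:

```rust
/// Returns (effective address, page_crossed) for absolute,X or absolute,Y.
fn absolute_indexed(base: u16, index: u8) -> (u16, bool) {
    let addr = base.wrapping_add(index as u16);
    let crossed = (base & 0xFF00) != (addr & 0xFF00);
    (addr, crossed)
}

/// Cycle count for a read such as LDA abs,X: 4, plus 1 on a page cross.
fn absolute_indexed_read_cycles(base: u16, index: u8) -> u8 {
    let (_, crossed) = absolute_indexed(base, index);
    4 + crossed as u8
}
```

Returning the crossing flag alongside the address lets the instruction handler decide whether the extra cycle applies.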
Absolute address + Y register.
Syntax: LDA $1234,Y
Bytes: 3
Cycles: 4 (+1 if page crossed)
Effective Address: $nnnn + Y
Used only with JMP. Address stored at given location.
Syntax: JMP ($1234)
Bytes: 3
Cycles: 5
Effective Address: memory[$nnnn] | (memory[$nnnn+1] << 8)
Example: JMP ($FFFC) ; PC = memory[$FFFC] | (memory[$FFFD] << 8)
Hardware Bug: If the low byte of the pointer address is $FF, the high byte of the target is fetched from the start of the same page instead of the next page:
JMP ($10FF) reads:
Low: memory[$10FF]
High: memory[$1000] <- should be $1100!
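An emulator has to reproduce this bug, since some software relies on it. One way to sketch it, assuming a flat memory slice in place of the real bus (`jmp_indirect_target` is an illustrative name):

```rust
/// Resolve the target of JMP (ptr), including the page-wrap bug.
fn jmp_indirect_target(mem: &[u8], ptr: u16) -> u16 {
    let lo = mem[ptr as usize] as u16;
    // Only the low byte of the pointer is incremented; the page stays fixed.
    let hi_addr = (ptr & 0xFF00) | (ptr.wrapping_add(1) & 0x00FF);
    let hi = mem[hi_addr as usize] as u16;
    (hi << 8) | lo
}
```

For `ptr = $10FF` the high byte comes from `$1000`; for any other pointer the function behaves like a normal little-endian 16-bit read.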
Address from zero page table indexed by X.
Syntax: LDA ($10,X)
Bytes: 2
Cycles: 6
Effective Address: memory[($nn + X) & $FF] | (memory[($nn + X + 1) & $FF] << 8) (both pointer bytes wrap within zero page)
Example: LDA ($80,X) ; addr = ZP[$80 + X], A = memory[addr]
Address from zero page, then indexed by Y.
Syntax: LDA ($10),Y
Bytes: 2
Cycles: 5 (+1 if page crossed)
Effective Address: (memory[$00nn] | (memory[($nn + 1) & $FF] << 8)) + Y (the pointer's high byte wraps within zero page)
Example: LDA ($80),Y ; addr = ZP[$80] + Y, A = memory[addr]
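Both indirect modes read a 16-bit pointer from zero page, and on the 6502 both bytes of that pointer are fetched from page 0 even when the pointer starts at $FF. A sketch under the assumption of a flat memory slice, with illustrative function names:

```rust
/// Read a little-endian pointer from zero page; the second byte wraps to $00.
fn read_zp_pointer(mem: &[u8], zp: u8) -> u16 {
    let lo = mem[zp as usize] as u16;
    let hi = mem[zp.wrapping_add(1) as usize] as u16;
    (hi << 8) | lo
}

/// Indexed indirect, (zp,X): add X to the operand first, then dereference.
fn indexed_indirect(mem: &[u8], operand: u8, x: u8) -> u16 {
    read_zp_pointer(mem, operand.wrapping_add(x))
}

/// Indirect indexed, (zp),Y: dereference first, then add Y.
/// Returns (effective address, page_crossed).
fn indirect_indexed(mem: &[u8], operand: u8, y: u8) -> (u16, bool) {
    let base = read_zp_pointer(mem, operand);
    let addr = base.wrapping_add(y as u16);
    (addr, (base & 0xFF00) != (addr & 0xFF00))
}
```

The order of "index" and "dereference" is the only difference between the two modes, which is why they can share one pointer-read helper.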
Used by branch instructions. Signed 8-bit offset from PC.
Syntax: BEQ label
Bytes: 2
Cycles: 2 (+1 if branch taken, +1 more if the target is on a different page)
Effective Address: PC + signed_offset (PC = address of the instruction after the branch)
Example: BEQ $02 ; if Z=1, PC = PC + 2
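Relative addressing and its timing can be sketched in one function. The name `branch` and its signature are hypothetical; `pc_after` is the address of the instruction that follows the branch:

```rust
/// Returns (new PC, cycles consumed) for a conditional branch.
fn branch(pc_after: u16, offset: u8, taken: bool) -> (u16, u8) {
    if !taken {
        return (pc_after, 2);
    }
    // Reinterpret the operand as signed, then sign-extend to 16 bits.
    let target = pc_after.wrapping_add(offset as i8 as u16);
    let crossed = (pc_after & 0xFF00) != (target & 0xFF00);
    (target, 3 + crossed as u8)
}
```

The `as i8 as u16` cast chain is what turns an operand like `$FC` into a backward jump of 4 bytes.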
| Mnemonic | Description | Flags | Example |
|---|---|---|---|
| LDA | Load Accumulator | N, Z | LDA #$42 |
| LDX | Load X Register | N, Z | LDX $80 |
| LDY | Load Y Register | N, Z | LDY $80,X |
| STA | Store Accumulator | - | STA $8000 |
| STX | Store X Register | - | STX $80 |
| STY | Store Y Register | - | STY $80 |
| Mnemonic | Description | Flags | Cycles |
|---|---|---|---|
| TAX | Transfer A to X | N, Z | 2 |
| TAY | Transfer A to Y | N, Z | 2 |
| TXA | Transfer X to A | N, Z | 2 |
| TYA | Transfer Y to A | N, Z | 2 |
| TSX | Transfer SP to X | N, Z | 2 |
| TXS | Transfer X to SP | - | 2 |
| Mnemonic | Description | Flags | Cycles |
|---|---|---|---|
| PHA | Push Accumulator | - | 3 |
| PHP | Push Processor Status | - | 3 |
| PLA | Pull Accumulator | N, Z | 4 |
| PLP | Pull Processor Status | All (B and unused bits ignored) | 4 |
| Mnemonic | Description | Flags | Notes |
|---|---|---|---|
| ADC | Add with Carry | N, V, Z, C | A = A + M + C |
| SBC | Subtract with Carry | N, V, Z, C | A = A - M - (1-C) |
| INC | Increment Memory | N, Z | M = M + 1 |
| INX | Increment X | N, Z | X = X + 1 |
| INY | Increment Y | N, Z | Y = Y + 1 |
| DEC | Decrement Memory | N, Z | M = M - 1 |
| DEX | Decrement X | N, Z | X = X - 1 |
| DEY | Decrement Y | N, Z | Y = Y - 1 |
| Mnemonic | Description | Flags | Formula |
|---|---|---|---|
| AND | Logical AND | N, Z | A = A & M |
| ORA | Logical OR | N, Z | A = A \| M |
| EOR | Exclusive OR | N, Z | A = A ^ M |
| BIT | Bit Test | N, V, Z | N=M7, V=M6, Z=((A & M) == 0) |
| Mnemonic | Description | Flags | Operation |
|---|---|---|---|
| ASL | Arithmetic Shift Left | N, Z, C | C <- [76543210] <- 0 |
| LSR | Logical Shift Right | N, Z, C | 0 -> [76543210] -> C |
| ROL | Rotate Left | N, Z, C | C <- [76543210] <- C |
| ROR | Rotate Right | N, Z, C | C -> [76543210] -> C |
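The rotate diagrams above describe a 9-bit rotation through the carry flag. A minimal sketch with illustrative function names, returning the new value and the new carry:

```rust
/// ROL: bit 7 moves into carry, the old carry moves into bit 0.
fn rol(value: u8, carry_in: bool) -> (u8, bool) {
    let result = (value << 1) | carry_in as u8;
    (result, (value & 0x80) != 0)
}

/// ROR: bit 0 moves into carry, the old carry moves into bit 7.
fn ror(value: u8, carry_in: bool) -> (u8, bool) {
    let result = (value >> 1) | ((carry_in as u8) << 7);
    (result, (value & 0x01) != 0)
}
```

ASL and LSR are the same operations with `carry_in` fixed to `false`. N and Z are then derived from the returned value as usual.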
| Mnemonic | Description | Flags | Operation |
|---|---|---|---|
| CMP | Compare Accumulator | N, Z, C | A - M |
| CPX | Compare X Register | N, Z, C | X - M |
| CPY | Compare Y Register | N, Z, C | Y - M |
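The compare instructions perform the subtraction only to set flags; the register is left unchanged. A sketch of the flag computation, with `compare` as a hypothetical helper:

```rust
/// Returns (carry, zero, negative) for CMP/CPX/CPY with register `reg` and operand `m`.
fn compare(reg: u8, m: u8) -> (bool, bool, bool) {
    let diff = reg.wrapping_sub(m);
    // Carry means "no borrow", i.e. reg >= m as unsigned values.
    (reg >= m, reg == m, (diff & 0x80) != 0)
}
```

Note that N reflects bit 7 of the 8-bit difference, so it is not a reliable "less than" test for unsigned values; C is.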
All branches take 2 cycles if not taken, 3 if taken (same page), 4 if taken (different page).
| Mnemonic | Description | Condition |
|---|---|---|
| BCC | Branch if Carry Clear | C = 0 |
| BCS | Branch if Carry Set | C = 1 |
| BEQ | Branch if Equal (Zero) | Z = 1 |
| BNE | Branch if Not Equal | Z = 0 |
| BMI | Branch if Minus | N = 1 |
| BPL | Branch if Plus | N = 0 |
| BVC | Branch if Overflow Clear | V = 0 |
| BVS | Branch if Overflow Set | V = 1 |
| Mnemonic | Description | Cycles | Stack Effect |
|---|---|---|---|
| JMP | Jump (Absolute) | 3 | None |
| JMP | Jump (Indirect) | 5 | None |
| JSR | Jump to Subroutine | 6 | Push address of JSR's last byte (return address - 1), 2 bytes |
| RTS | Return from Subroutine | 6 | Pull PC, PC = PC + 1 |
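The off-by-one relationship between JSR and RTS is easy to get wrong, so here is a sketch of both. `MiniCpu` is a hypothetical stand-in with flat memory, and `pc` is assumed to point at the JSR opcode when `jsr` is called:

```rust
/// Hypothetical minimal CPU state for illustrating JSR/RTS.
struct MiniCpu {
    pc: u16,
    sp: u8,
    mem: Vec<u8>,
}

impl MiniCpu {
    fn push(&mut self, value: u8) {
        self.mem[0x0100 + self.sp as usize] = value;
        self.sp = self.sp.wrapping_sub(1);
    }

    fn pull(&mut self) -> u8 {
        self.sp = self.sp.wrapping_add(1);
        self.mem[0x0100 + self.sp as usize]
    }

    /// JSR is 3 bytes; it pushes the address of its own last byte, high byte first.
    fn jsr(&mut self, target: u16) {
        let ret = self.pc.wrapping_add(2);
        self.push((ret >> 8) as u8);
        self.push(ret as u8);
        self.pc = target;
    }

    /// RTS pulls low byte then high byte, and adds 1 to land on the next instruction.
    fn rts(&mut self) {
        let lo = self.pull() as u16;
        let hi = self.pull() as u16;
        self.pc = ((hi << 8) | lo).wrapping_add(1);
    }
}
```

A JSR at `$8000` therefore pushes `$8002`, and the matching RTS resumes at `$8003`.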
| Mnemonic | Description | Cycles | Stack Effect |
|---|---|---|---|
| BRK | Break | 7 | Push PC+2, Push P with B=1 (P \| 0x10) |
| RTI | Return from Interrupt | 6 | Pull P, Pull PC |
| Mnemonic | Description | Flag | Value |
|---|---|---|---|
| CLC | Clear Carry | C | 0 |
| SEC | Set Carry | C | 1 |
| CLI | Clear Interrupt Disable | I | 0 |
| SEI | Set Interrupt Disable | I | 1 |
| CLV | Clear Overflow | V | 0 |
| CLD | Clear Decimal (no effect) | D | 0 |
| SED | Set Decimal (no effect) | D | 1 |
| Mnemonic | Description | Cycles |
|---|---|---|
| NOP | No Operation | 2 |
The 6502 supports three hardware interrupts (RESET, NMI, IRQ) plus the software interrupt BRK:
Highest priority, non-maskable.
Trigger: Power-on or reset button
Duration: 7 cycles
Vector: $FFFC-$FFFD
Stack: No push (SP decremented by 3 but nothing written)
I Flag: Set
Effect: PC = memory[$FFFC] | (memory[$FFFD] << 8)
Second priority, cannot be masked by the I flag (the PPU can suppress it via bit 7 of $2000).
Trigger: PPU VBlank (start of scanline 241)
Duration: 7 cycles
Vector: $FFFA-$FFFB
Stack: Push PCH, Push PCL, Push P (B=0)
I Flag: Set
Effect: PC = memory[$FFFA] | (memory[$FFFB] << 8)
Cycle Breakdown:
Cycle 1-2: Read next instruction (dummy)
Cycle 3: Push PCH
Cycle 4: Push PCL
Cycle 5: Push P (with B=0, U=1)
Cycle 6: Fetch vector low byte
Cycle 7: Fetch vector high byte
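The cycle breakdown above maps onto code roughly as follows. This is a sketch with a hypothetical `MiniCpu` type and flat memory; the two dummy reads are noted but not modeled:

```rust
const FLAG_INTERRUPT: u8 = 0b0000_0100;
const FLAG_BREAK: u8 = 0b0001_0000;
const FLAG_UNUSED: u8 = 0b0010_0000;
const NMI_VECTOR: u16 = 0xFFFA;

/// Hypothetical minimal CPU state; `mem` stands in for the bus.
struct MiniCpu {
    pc: u16,
    sp: u8,
    p: u8,
    mem: Vec<u8>,
}

impl MiniCpu {
    fn push(&mut self, value: u8) {
        self.mem[0x0100 + self.sp as usize] = value;
        self.sp = self.sp.wrapping_sub(1);
    }

    /// Services an NMI and returns the cycles consumed.
    fn handle_nmi(&mut self) -> u8 {
        let (pch, pcl) = ((self.pc >> 8) as u8, self.pc as u8);
        // Cycles 1-2: dummy reads of the next instruction (not modeled).
        self.push(pch); // cycle 3
        self.push(pcl); // cycle 4
        let pushed = (self.p | FLAG_UNUSED) & !FLAG_BREAK;
        self.push(pushed); // cycle 5: B=0, U=1
        self.p |= FLAG_INTERRUPT;
        let lo = self.mem[NMI_VECTOR as usize] as u16; // cycle 6
        let hi = self.mem[NMI_VECTOR as usize + 1] as u16; // cycle 7
        self.pc = (hi << 8) | lo;
        7
    }
}
```

An IRQ handler would follow the same shape with the vector at `$FFFE`.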
Lowest priority, maskable via I flag.
Trigger: Mapper IRQ, APU frame counter IRQ
Duration: 7 cycles
Vector: $FFFE-$FFFF
Stack: Push PCH, Push PCL, Push P (B=0)
I Flag: Set
Masked: When I flag is set
Effect: PC = memory[$FFFE] | (memory[$FFFF] << 8)
RESET > NMI > IRQ
If multiple interrupts occur simultaneously, RESET takes precedence, then NMI, then IRQ.
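The priority order can be expressed as a simple selection function. The enum and function names below are illustrative, and the sketch ignores the finer points of interrupt polling timing:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Interrupt {
    Reset,
    Nmi,
    Irq,
}

/// Pick the interrupt to service, if any, in RESET > NMI > IRQ order.
/// `i_flag` is the current state of the Interrupt Disable flag.
fn next_interrupt(reset: bool, nmi_pending: bool, irq_line: bool, i_flag: bool) -> Option<Interrupt> {
    if reset {
        Some(Interrupt::Reset)
    } else if nmi_pending {
        Some(Interrupt::Nmi)
    } else if irq_line && !i_flag {
        Some(Interrupt::Irq)
    } else {
        None
    }
}
```

Only the IRQ arm consults the I flag; RESET and NMI are serviced regardless of it.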
Software interrupt, similar to IRQ but sets B flag.
Opcode: $00
Duration: 7 cycles
Vector: $FFFE-$FFFF (same as IRQ)
Stack: Push PCH, Push PCL, Push P (B=1)
I Flag: Set
PC: Incremented by 2 before push (skips signature byte)
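Because BRK and IRQ share the `$FFFE` vector, a handler tells them apart by the B bit in the status byte on the stack. A sketch of how that byte is formed and tested, using hypothetical helper names:

```rust
const FLAG_BREAK: u8 = 0b0001_0000;
const FLAG_UNUSED: u8 = 0b0010_0000;

/// Status byte as written to the stack: bit 5 always set, bit 4 set only for BRK.
fn pushed_status(p: u8, from_brk: bool) -> u8 {
    let base = (p | FLAG_UNUSED) & !FLAG_BREAK;
    if from_brk { base | FLAG_BREAK } else { base }
}

/// What a handler at $FFFE would check after reading the pushed byte back.
fn was_brk(pushed: u8) -> bool {
    (pushed & FLAG_BREAK) != 0
}
```

Keeping this logic in one helper avoids storing a B bit in the live status register, which matches the hardware.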
Writing to $4014 triggers a Direct Memory Access of 256 bytes to PPU OAM.
Write Value: Page number ($00-$FF)
Source: $XX00-$XXFF (where XX is written value)
Destination: PPU OAM ($2004)
Duration: 513 or 514 CPU cycles
Effect: CPU halted during transfer
Cycle Breakdown:
Cycle 1: Dummy read (or write) - alignment
Cycle 2-513: 256 reads + 256 writes (alternating)
Read from $XX00 -> Write to $2004
Read from $XX01 -> Write to $2004
... (256 times)
Alignment:
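- If DMA triggered on odd CPU cycle: 514 total cycles (1 halt + 1 alignment + 512)
- If DMA triggered on even CPU cycle: 513 total cycles (1 halt + 512)
- Parity here follows the NESdev convention; an emulator whose cycle counter is offset by one will see the cases swapped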
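The transfer and its cost can be sketched as below. Both function names are hypothetical, memory is a flat slice in place of the bus, and the copy assumes OAMADDR is 0 so that the 256 writes through $2004 fill OAM in order. Whether the extra alignment cycle applies is left to the caller:

```rust
/// Total CPU cycles the CPU is halted: 1 halt cycle, an optional
/// alignment cycle, then 256 read/write pairs.
fn oam_dma_cycles(extra_alignment: bool) -> u16 {
    1 + extra_alignment as u16 + 256 * 2
}

/// Copy page `page` ($XX00-$XXFF) into OAM.
fn oam_dma_copy(mem: &[u8], page: u8, oam: &mut [u8; 256]) {
    let base = (page as usize) << 8;
    oam.copy_from_slice(&mem[base..base + 256]);
}
```

A cycle-accurate emulator would interleave the individual reads and writes with PPU and APU ticks instead of copying in one step; this version only captures the end state and the cycle total.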
The APU's Delta Modulation Channel (DMC) can also steal CPU cycles:
Every sample byte fetch: up to 4 CPU cycles stolen (the exact count depends on what the CPU is doing when the fetch starts)
Frequency: Based on DMC rate (varies)
Effect: Can interfere with controller reads and exact timing
// Status, Bus, InstructionFn, and AddressingMode are defined elsewhere in the crate.
pub struct Cpu {
    // Registers
    pub a: u8,
    pub x: u8,
    pub y: u8,
    pub sp: u8,
    pub pc: u16,
    pub p: Status,
    // Internal state
    cycles: u64,
    nmi_pending: bool,
    irq_pending: bool,
    irq_line: bool,
    // Lookup tables (for performance), indexed by opcode
    instruction_table: [InstructionFn; 256],
    addressing_mode_table: [AddressingMode; 256],
    cycle_table: [u8; 256],
}

impl Cpu {
    /// Executes one instruction, or services one pending interrupt,
    /// and returns the number of CPU cycles consumed.
    pub fn step(&mut self, bus: &mut Bus) -> u8 {
        // Check for interrupts; NMI takes priority over IRQ
        if self.nmi_pending {
            return self.handle_nmi(bus);
        }
        // IRQ is serviced only while the I flag is clear
        if self.irq_pending && !self.p.contains(Status::INTERRUPT) {
            return self.handle_irq(bus);
        }
        // Fetch opcode
        let opcode = self.read(bus, self.pc);
        self.pc = self.pc.wrapping_add(1);
        // Dispatch
        let addr_mode = self.addressing_mode_table[opcode as usize];
        let instruction = self.instruction_table[opcode as usize];
        let base_cycles = self.cycle_table[opcode as usize];
        // Execute (returns extra cycles for page crossing)
        let extra_cycles = instruction(self, bus, addr_mode);
        base_cycles + extra_cycles
    }
}

Essential Test ROMs:
- nestest.nes - Golden log comparison (all instructions)
- instr_test-v5 - Comprehensive instruction tests
- cpu_interrupts_v2 - NMI/IRQ timing
- cpu_dummy_reads - Bus behavior verification
- CPU_TIMING.md - Cycle-by-cycle instruction timing
- CPU_UNOFFICIAL_OPCODES.md - Illegal opcode reference
- ../bus/MEMORY_MAP.md - CPU address space details
- ../ARCHITECTURE.md - System integration
- NESdev Wiki: CPU
- NESdev Wiki: 6502 Instructions
- Visual6502 - Transistor-level simulation
- MOS Technology 6502 Hardware Manual