This project implements a 5-Stage Pipelined RISC-V Processor in SystemVerilog on a Nexys A7 (Artix-7 FPGA). It extends the previous Single-Cycle Processor architecture with instruction-level parallelism by dividing instruction execution into five stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). The processor supports the complete RV32I instruction set (R, I, S, B, U, and J formats). It integrates pipeline registers, forwarding logic, and a hazard detection unit to resolve data and control hazards, and it achieves a significant throughput improvement over the single-cycle design.
| Type | Tool / Technology |
|---|---|
| Hardware | Nexys A7 FPGA (Artix-7, Digilent) |
| Language | SystemVerilog (HDL) |
| Software | Xilinx Vivado (Design, Simulation, Synthesis) |
| Extras | RISC-V Assembly, .mem files for memory initialization |
Each instruction passes through five sequential stages. Because each stage works on a different instruction, up to five instructions overlap in execution:
- IF – Instruction Fetch: Fetch the instruction from instruction memory at the address held in the PC.
- ID – Instruction Decode: Decode the instruction, read the source registers, generate control signals, and generate the immediate.
- EX – Execute: Perform arithmetic/logic operations and calculate branch or memory addresses.
- MEM – Memory Access: Read or write data memory as directed by the control signals.
- WB – Write Back: Write the result to the destination register.
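The overlap can be illustrated with a small Python behavioral model. This is not part of the SystemVerilog design, and `STAGES` and `pipeline_diagram` are hypothetical names. The model prints the classic pipeline diagram for an ideal pipeline with no stalls. Instruction *i* enters IF in cycle *i* and advances one stage per cycle, so *N* instructions finish in *N* + 4 cycles.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instr):
    """Return, per instruction, the stage it occupies in each cycle.

    Ideal 5-stage pipeline: no stalls or flushes. Instruction i enters IF
    in cycle i (0-based) and advances one stage per cycle. "." marks a cycle
    in which the instruction is not in the pipeline.
    """
    total_cycles = n_instr + len(STAGES) - 1
    rows = []
    for i in range(n_instr):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(4)):
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

After the four fill cycles, one instruction leaves WB every cycle. This is the source of the pipelined design's throughput gain.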
| Feature | Single-Cycle | Pipelined |
|---|---|---|
| Execution Time / Instruction | One long cycle | Five shorter cycles |
| Clock Period | Determined by slowest operation | Reduced (stage-based) |
| Throughput | One instruction per (long) cycle | One instruction per (short) cycle, after fill |
| Latency | 1 cycle / instruction | 5 cycles / instruction |
| Hazard Handling | Not required | Forwarding, stalls, flushes |
| Performance | Moderate | 3–5× higher throughput |
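A short worked calculation makes the latency and throughput rows concrete. This sketch assumes perfectly balanced stages, so the stage period is one fifth of the single-cycle period, and it ignores pipeline-register overhead. The function names are hypothetical. The timing values and stall count in the usage are illustrative only and are not measurements from this design.

```python
def single_cycle_time(n_instr, t_single):
    # Every instruction takes one long cycle set by the slowest path.
    return n_instr * t_single


def pipelined_time(n_instr, t_stage, stall_cycles=0, n_stages=5):
    # The first instruction needs n_stages cycles to fill the pipeline.
    # After that, one instruction completes per cycle, plus any bubble
    # cycles introduced by stalls or flushes.
    return (n_stages + (n_instr - 1) + stall_cycles) * t_stage


def speedup(n_instr, t_single, t_stage, stall_cycles=0):
    return single_cycle_time(n_instr, t_single) / pipelined_time(
        n_instr, t_stage, stall_cycles
    )


if __name__ == "__main__":
    # Illustrative numbers: 10 ns single-cycle period, 2 ns stage period.
    print(f"{speedup(1000, 10.0, 2.0):.2f}x with no stalls")
    print(f"{speedup(1000, 10.0, 2.0, stall_cycles=250):.2f}x with 250 bubble cycles")
```

With these assumed numbers, a single instruction still takes 5 × 2 ns = 10 ns in the pipeline, so latency does not improve. Throughput, however, approaches the ideal 5× (about 4.98× for 1000 instructions). Adding 250 bubble cycles lowers the gain to about 3.99×, which shows how stalls and flushes pull real speedup below the ideal.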
- Program Counter (PC): Holds the current instruction address and updates either sequentially (PC + 4) or to a branch/jump target.
- Instruction Memory: Stores compiled RISC-V machine code and supplies 32-bit instructions each cycle.
- Control Unit: Decodes the opcode in the ID stage and generates the control signals, which are carried through the pipeline registers to the stages that use them.
- Immediate Generator: Extracts and sign-extends immediates for all instruction types.
- Register File: Contains 32 general-purpose registers (x0–x31, with x0 hardwired to zero); supports 2 reads and 1 write per cycle.
- ALU (Arithmetic Logic Unit): Executes arithmetic and logical operations as per ALUOp signals.
- Data Memory: Handles 32-bit load/store operations with aligned access.
- Branch Comparator: Compares register values for conditional branches.
- Pipeline Registers: IF/ID, ID/EX, EX/MEM, MEM/WB — store intermediate data and control signals.
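- Forwarding Unit: Forwards results from the EX/MEM or MEM/WB pipeline register to the EX stage, resolving most read-after-write (RAW) dependencies without stalling.
- Hazard Detection Unit: Stalls the pipeline on load-use hazards and flushes incorrectly fetched instructions on taken branches and jumps.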
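The forwarding and load-use checks above can be expressed as a small Python reference model. This is a sketch under assumptions, not the project's RTL. The names `Fwd`, `WriteStage`, `forward_select`, and `load_use_stall` are hypothetical. The 2-bit select encoding follows the common textbook convention and may differ from the actual SystemVerilog signals.

```python
from dataclasses import dataclass
from enum import Enum


class Fwd(Enum):
    REGFILE = 0b00  # use the operand read from the register file in ID
    MEM_WB = 0b01   # forward the write-back value held in MEM/WB
    EX_MEM = 0b10   # forward the ALU result held in EX/MEM


@dataclass
class WriteStage:
    reg_write: bool  # does the instruction in this pipeline register write rd?
    rd: int          # destination register index


def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the ALU operand source for one source register in EX."""
    # EX/MEM is checked first because it holds the most recent producer.
    # rd == 0 is excluded because x0 is hardwired to zero.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return Fwd.EX_MEM
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return Fwd.MEM_WB
    return Fwd.REGFILE


def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when the instruction in ID needs a value a load in EX has not read yet."""
    return id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2)


if __name__ == "__main__":
    # sub x1,... in MEM and addi x1,... in WB: the newer result (EX/MEM) wins.
    print(forward_select(1, WriteStage(True, 1), WriteStage(True, 1)))
    # lw x1,0(x4) in EX followed by add x2,x1,x3 in ID needs one bubble.
    print(load_use_stall(True, 1, 1, 3))
```

This stall check compares both `rs1` and `rs2` even for instructions that do not read `rs2`, such as I-type instructions. That conservative choice can insert an unnecessary bubble. A design could avoid it by also decoding which source registers are actually used.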
The design follows a modular approach, allowing each component (ALU, Register File, Control Unit, etc.) to be independently tested using SystemVerilog testbenches before integration.
All modules are integrated in the top-level `top.sv` module, which distributes the global clock and reset and connects the datapath between stages. The complete design is synthesized in Vivado.
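The following hypothetical cycle-level Python model shows how hazard signals can act on the pipeline registers at each clock edge. It assumes branches and jumps are resolved in EX, which the source does not state, and it represents a bubble (NOP) as `None`. `Pipeline` and `clock_edge` are illustrative names, not modules from `top.sv`.

```python
from dataclasses import dataclass
from typing import Optional

BUBBLE = None  # a NOP in a pipeline register: all control signals deasserted


@dataclass(frozen=True)
class Pipeline:
    pc: int
    if_id: Optional[str]
    id_ex: Optional[str]
    ex_mem: Optional[str]
    mem_wb: Optional[str]


def clock_edge(p, fetched, stall=False, flush=False, target=None):
    """Advance all pipeline registers at once, as on a rising clock edge.

    Assumption: a taken branch/jump is resolved in EX, so a flush squashes
    the two younger instructions (in IF and ID) and redirects the PC.
    """
    if flush:
        if target is None:
            raise ValueError("flush requires a branch/jump target")
        return Pipeline(pc=target, if_id=BUBBLE, id_ex=BUBBLE,
                        ex_mem=p.id_ex, mem_wb=p.ex_mem)
    if stall:
        # Load-use hazard: hold PC and IF/ID, inject a bubble into ID/EX.
        return Pipeline(pc=p.pc, if_id=p.if_id, id_ex=BUBBLE,
                        ex_mem=p.id_ex, mem_wb=p.ex_mem)
    # Normal advance: every register takes the value from the stage before it.
    return Pipeline(pc=p.pc + 4, if_id=fetched, id_ex=p.if_id,
                    ex_mem=p.id_ex, mem_wb=p.ex_mem)


if __name__ == "__main__":
    p = Pipeline(pc=8, if_id="add", id_ex="lw", ex_mem=None, mem_wb=None)
    print(clock_edge(p, "sub"))
    print(clock_edge(p, "sub", stall=True))
    print(clock_edge(p, "sub", flush=True, target=0x40))
```

Because every field is computed from the old state before any is written, the model mirrors non-blocking register updates in a clocked `always_ff` block.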
Testing was performed through simulation and FPGA implementation:
- Module-Level Testing: Each unit (ALU, Hazard Unit, Forwarding Logic, etc.) verified individually.
- Integration Testing: Pipeline registers and control paths validated for signal synchronization.
- System-Level Testing: Complete processor executed RISC-V programs for end-to-end verification.
- FPGA Verification: Design successfully implemented on the Nexys A7 and run with a 100 MHz clock.
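Module-level tests need expected values to compare against. One common approach is a small Python golden model of the ALU, sketched here as an assumption rather than the project's actual flow. `alu_ref`, `vector_lines`, and the operation names are hypothetical. The semantics follow RV32I: 32-bit wraparound, signed versus unsigned compares, and shift amounts taken from the low 5 bits. Expected results can be written one hex word per line so a testbench can load them with `$readmemh`.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def alu_ref(op, a, b):
    """Reference result of one RV32I ALU operation, as a 32-bit unsigned pattern."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # RV32I shifts use only the low 5 bits
    if op == "add":
        r = a + b
    elif op == "sub":
        r = a - b
    elif op == "and":
        r = a & b
    elif op == "or":
        r = a | b
    elif op == "xor":
        r = a ^ b
    elif op == "slt":
        r = int(to_signed(a) < to_signed(b))
    elif op == "sltu":
        r = int(a < b)
    elif op == "sll":
        r = a << shamt
    elif op == "srl":
        r = a >> shamt
    elif op == "sra":
        r = to_signed(a) >> shamt  # Python's >> on negative ints is arithmetic
    else:
        raise ValueError(f"unsupported op: {op}")
    return r & MASK32


def vector_lines(cases):
    """Expected results as 8-digit hex words, one per line, for a testbench."""
    return [f"{alu_ref(op, a, b):08x}" for op, a, b in cases]


if __name__ == "__main__":
    cases = [("add", 0xFFFF_FFFF, 1), ("sub", 0, 1), ("slt", 0xFFFF_FFFF, 0)]
    print("\n".join(vector_lines(cases)))
```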
| Instruction Type | Examples | Status |
|---|---|---|
| R-Type | add, sub, and, or, slt | Passed |
| I-Type | addi, andi, ori, lw, jalr | Passed |
| S-Type | sw | Passed |
| B-Type | beq, bne, blt, bge | Passed |
| U-Type | lui, auipc | Passed |
| J-Type | jal | Passed |
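Each instruction format places its immediate bits differently, which is what the Immediate Generator has to undo. The Python model below is a sketch of that decoding, with bit layouts taken from the RISC-V unprivileged specification. The names `bits`, `sign_extend`, and `imm_gen` are hypothetical, not the project's SystemVerilog module. R-type has no immediate, and `jalr` uses the I-type layout.

```python
def bits(x, hi, lo):
    """Extract bit field x[hi:lo] (inclusive), like a Verilog part-select."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_gen(inst, fmt):
    """Return the sign-extended immediate of a 32-bit RV32I instruction."""
    if fmt == "I":  # imm[11:0] = inst[31:20]
        return sign_extend(bits(inst, 31, 20), 12)
    if fmt == "S":  # imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]
        return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)
    if fmt == "B":  # imm[12|10:5|4:1|11], imm[0] = 0
        raw = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11)
               | (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
        return sign_extend(raw, 13)
    if fmt == "U":  # imm[31:12] = inst[31:12], low 12 bits zero
        return sign_extend(bits(inst, 31, 12) << 12, 32)
    if fmt == "J":  # imm[20|10:1|11|19:12], imm[0] = 0
        raw = ((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12)
               | (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1))
        return sign_extend(raw, 21)
    raise ValueError(f"no immediate for format {fmt!r}")


if __name__ == "__main__":
    print(imm_gen(0xFFF00093, "I"))  # addi x1, x0, -1
    print(imm_gen(0xFE000EE3, "B"))  # beq x0, x0, -4
```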
| Metric | Single-Cycle Processor | 5-Stage Pipelined Processor |
|---|---|---|
| Execution Flow | One instruction at a time | Up to five instructions overlapped (one per stage) |
| Clock Period | Long (slowest path) | Shorter (stage-based) |
| Throughput | 1 instruction / cycle | 1 instruction / short cycle (after fill) |
| Hazard Handling | None required | Forwarding + Stalls + Flush |
| Performance Gain | – | ≈ 4× Improvement |
These RTL (register-transfer level) schematics were auto-generated in Vivado to visualize the structural connectivity among modules.
Timing waveforms confirm correct overlap of instructions, data forwarding, and stall behavior across pipeline stages.
The 5-Stage Pipelined RISC-V Processor successfully demonstrates a modern pipelined architecture implemented entirely in SystemVerilog and deployed on the Nexys A7 FPGA.
Through instruction-level parallelism, forwarding, and hazard management, the processor achieves an approximately 4× increase in throughput compared to a single-cycle design, while remaining functionally correct and meeting timing at 100 MHz.
- Branch Prediction Unit – reduce control hazard penalties.
- Instruction & Data Caches – improve memory latency.
- Out-of-Order Execution – further boost parallelism.
- Exception/Interrupt Handling – add system-level robustness.
- RISC-V Extensions – support the M, F, and Zicsr (CSR) extensions for advanced features.
This project is licensed under the MIT License.
Awais Asghar
NUST Chip Design Centre (NCDC)