2010 R&E Computer System Education &
Research
Lecture 9. MIPS Processor Design
Instruction Fetch
Prof. Taeweon Suh
Computer Science Education
Korea University
Introduction
Microarchitecture:
How to implement an
architecture in hardware
Multiple implementations for
a single architecture
Single-cycle
Each instruction executes in a
single cycle
Multicycle
Each instruction is broken up into a series of shorter
steps
We don't cover this in this class
Pipeline
Each instruction is broken up into
a series of steps
Multiple instructions execute
simultaneously
Levels of abstraction (top to bottom):
Application software: programs
Operating systems: device drivers
Architecture: instructions, registers
Microarchitecture: datapaths, controllers
Logic: adders, memories
Digital circuits: AND gates, NOT gates
Analog circuits: amplifiers, filters
Devices: transistors, diodes
Physics: electrons
Korea Univ
Processor Performance
Program execution time
Execution Time = (#instructions) x (cycles/instruction) x (seconds/cycle)
Challenge in designing microarchitecture
is to satisfy constraints of:
Cost
Power
Performance
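As a worked example of the execution-time formula above, assume purely hypothetical numbers (they are not from the lecture): a program of 10^9 instructions, 1 cycle per instruction as in a single-cycle design, and a clock period of 2 ns.

```latex
T_{\text{exec}}
  = 10^{9}\ \text{instructions}
    \times 1\ \frac{\text{cycle}}{\text{instruction}}
    \times 2 \times 10^{-9}\ \frac{\text{s}}{\text{cycle}}
  = 2\ \text{s}
```

Halving any one of the three factors halves the execution time, which is why the microarchitecture (cycles per instruction and seconds per cycle) matters as much as the instruction count.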
Overview
In chapter 4, we are going to implement (design) a MIPS CPU
The implemented CPU should be able to execute the machine
code we discussed so far
For the sake of your understanding, we simplify the
processor system structure
Real PC system: CPU, FSB (Front-Side Bus), North Bridge, main memory (DDR), DMI (Direct Media I/F), South Bridge
Simplified system: MIPS CPU connected to memory (instruction, data) through an address bus and a data bus
Our MIPS Model
Our MIPS CPU model has separate connections
to instruction memory and data memory
Actually, this structure is more realistic as we will see
in chapter 5
[Figure: MIPS CPU with one address bus and data bus to instruction memory, and a separate address bus and data bus to data memory]
Processor
Our MIPS implementation is simplified by implementing
only
memory-reference instructions: lw, sw
arithmetic-logical instructions: add, sub, and, or, slt
Control flow instructions: beq, j
Generic implementation steps
Fetch: use the program counter (PC) to supply the instruction
address and fetch the instruction from memory (and update the
PC)
Decoding: decode the instruction (and read registers)
Execution: execute the instruction
[Figure: MIPS CPU (Fetch with PC = PC + 4, Decode, Execute) connected to instruction memory and data memory, each through its own address bus and data bus]
Instruction Execution in CPU
Fetch
Fetch instruction by accessing memory with PC
Decoding
Extract opcode: Determine what operation should be done
Extract operands: Register numbers or immediate from fetched
instruction
Read registers from register file
Execution
Use ALU to calculate (depending on instruction class)
Arithmetic result
Memory address for load/store
Branch target address
Access data memory for load/store
[Figure: MIPS CPU (Fetch, Decode, Execute) connected to instruction memory and data memory; for the next fetch, the PC becomes either the target address or PC + 4]
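The decoding step above (extract the opcode and the operands from the fetched instruction) can be sketched in Verilog as plain bit slicing. The module name decode_fields and its port names are hypothetical and are not part of the lecture code; the bit positions are the standard MIPS field positions.

```verilog
// Minimal sketch: split a 32-bit MIPS instruction into its fields.
// The module and port names are assumptions made for illustration.
module decode_fields (input  [31:0] instr,
                      output [5:0]  opcode,  // which operation
                      output [4:0]  rs,      // first source register number
                      output [4:0]  rt,      // second source / I-type destination
                      output [4:0]  rd,      // R-type destination register number
                      output [15:0] imm,     // 16-bit immediate (I-type)
                      output [5:0]  funct);  // R-type function code

  assign opcode = instr[31:26];
  assign rs     = instr[25:21];
  assign rt     = instr[20:16];
  assign rd     = instr[15:11];
  assign imm    = instr[15:0];
  assign funct  = instr[5:0];

endmodule
```

For the instruction word 0x20020005, this yields opcode = 001000, rs = 0, rt = 2, and imm = 0x0005. All fields are extracted in parallel; the control logic decides later which of them are meaningful for the given opcode.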
Revisiting Logic Design Basics
Combinational logic
Output is directly determined by input
Sequential logic
Output is determined not only by input, but
also by internal state
Sequential logic needs state elements to store
information
Flip-flops and latches are used to store the state
information
But avoid using latches in digital design
Korea Univ
Combinational Logic Examples
Adder: Y = A + B
AND gate: Y = A & B
Multiplexer (Mux): Y = S ? I1 : I0
Arithmetic Logic Unit (ALU): Y = F(A, B)
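As one possible Verilog rendering of the multiplexer equation Y = S ? I1 : I0 above, here is a minimal sketch. The module name mux2 and the 32-bit width are assumptions chosen to match the 32-bit datapath; they do not come from the lecture code.

```verilog
// Minimal sketch of a 2:1 multiplexer (name and width are assumptions).
// Purely combinational: y follows the inputs with no stored state.
module mux2 (input  [31:0] i0, i1,
             input         s,
             output [31:0] y);

  assign y = s ? i1 : i0;   // s = 1 selects i1, s = 0 selects i0

endmodule
```

The adder and the AND gate follow the same pattern with a single continuous assignment (y = a + b, y = a & b).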
State Element (Register)
Register (flip-flop): stores data in a circuit
Clock signal determines when to update the stored
value
Edge-triggered
Rising-edge triggered: update when clock changes from 0 to 1
Falling-edge triggered: update when clock changes from 1 to 0
The data input determines the value (0 or 1) that the output is
updated to
[Figure: flip-flop (register) with inputs D and Clk and output Q, with a Clk/D/Q timing diagram]
State Element (Register)
Register with write control
Only updates on clock edge when write
control input is 1
[Figure: register with inputs D, Clk, and Write and output Q, with a Clk/Write/D/Q timing diagram]
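The register with write control described above can be sketched as follows. The module name reg_we and the 32-bit width are assumptions for illustration, and the rising clock edge is chosen arbitrarily; a falling-edge version would use negedge instead.

```verilog
// Minimal sketch of a rising-edge-triggered register with write control.
// The name reg_we and the 32-bit width are assumptions.
module reg_we (input             clk,
               input             write,
               input      [31:0] d,
               output reg [31:0] q);

  // q changes only on a rising clock edge, and only when write is 1;
  // otherwise the stored value is held.
  always @(posedge clk)
    if (write) q <= d;

endmodule
```

Because there is no reset in this sketch, q is unknown (x) in simulation until the first write.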
Clocking Methodology
Virtually all digital systems are essentially synchronous to
the clock
Combinational logic sits between state elements (registers)
Combinational logic transforms data during clock cycles
Between clock edges
Input from state elements
Output to the next state elements
Longest delay determines clock period (frequency)
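To illustrate how the longest delay sets the clock frequency, suppose (hypothetically, with a number that is not from the lecture) that the slowest path between two registers takes 5 ns. This simplified sketch ignores the registers' own timing overhead.

```latex
T_c \ge t_{\text{longest}} = 5\ \text{ns}
\quad\Rightarrow\quad
f_{\max} = \frac{1}{T_c} = \frac{1}{5 \times 10^{-9}\ \text{s}} = 200\ \text{MHz}
```

Every other path may be faster, but the clock still has to wait for the slowest one.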
Building a Datapath
Processor is composed of datapath and control
Datapath
Elements that process data and addresses in the CPU
Registers, ALUs, muxes, memories, etc.
Control
Logic that controls operations
When to write to a register
What kind of operation the ALU should perform
Addition, subtraction, exclusive OR, and so on
We will build a MIPS datapath incrementally and provide
Verilog code
We adopt both structural and behavioral modeling
Behavioral modeling describes what a module does
For example, the lowest-level modules (such as the ALU and register file) will be
designed with behavioral modeling
Structural modeling builds a module from simpler modules via
instantiation
For example, the top module (such as MIPS_CPU) will be designed with
structural modeling
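The difference between the two modeling styles can be sketched with a small example. Everything here is hypothetical: the module names alu_sketch and add_then_and, the port names, and in particular the 3-bit op encoding are assumptions for illustration, not the ALU used later in the course.

```verilog
// Behavioral modeling: describe WHAT the module does.
// The op encoding is an assumption made for this sketch.
module alu_sketch (input      [31:0] a, b,
                   input      [2:0]  op,
                   output reg [31:0] y);

  always @(*)
    case (op)
      3'b000:  y = a & b;                                     // and
      3'b001:  y = a | b;                                     // or
      3'b010:  y = a + b;                                     // add
      3'b110:  y = a - b;                                     // sub
      3'b111:  y = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0; // slt
      default: y = 32'd0;
    endcase

endmodule

// Structural modeling: build a module by instantiating simpler ones.
// Computes (a + b) & mask using two instances of alu_sketch.
module add_then_and (input  [31:0] a, b, mask,
                     output [31:0] y);

  wire [31:0] sum;

  alu_sketch u_add (a,   b,    3'b010, sum);
  alu_sketch u_and (sum, mask, 3'b000, y);

endmodule
```

The structural module contains no operators of its own; its behavior comes entirely from the instances and the wires that connect them.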
Overview of CPU Design
[Figure: MIPS CPU connected to instruction memory and data memory through address and data buses, mapped onto the Verilog files below]
mips_tb.v (testbench): reset, clock
mips_cpu_mem.v: mips_cpu.v, imem.v, dmem.v
mips_cpu.v: fetch, pc, decoding, register file, ALU, memory access
imem.v (instruction memory): signals Address, Instruction; holds the binary (machine code)
dmem.v (data memory): signals Address, DataOut, DataIn; holds the data in your program, stack, heap
Instruction Fetch
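[Figure: inside the MIPS CPU, the PC (a 32-bit register built from flip-flops, with reset and clock inputs) drives the 32-bit address to instruction memory, which returns the instruction; an adder increments the PC by 4 for the next instruction]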
What is PC on reset?
MIPS initializes the PC to 0xBFC0_0000
For the sake of simplicity, let's initialize the PC to 0x0000_0000 in our design
How about x86 and ARM?
x86 reset vector is 0xFFFF_FFF0. BIOS ROM is located there
ARM reset vector is 0x0000_0000
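Since the reset value of the PC differs between architectures, one way to keep it configurable is a Verilog parameter. The sketch below is an assumption-laden variant, not the lecture's pc module: the name pc_param, the parameter name RESET_VECTOR, and the port order are all hypothetical.

```verilog
// Minimal sketch: a PC register whose reset value is a parameter.
// Module name, parameter name, and port order are assumptions.
module pc_param #(parameter [31:0] RESET_VECTOR = 32'h0000_0000)
                 (input             clk, reset,
                  input      [31:0] pcnext,
                  output reg [31:0] pc);

  // Asynchronous reset loads the reset vector;
  // otherwise the PC takes pcnext on each rising clock edge.
  always @(posedge clk, posedge reset)
    if (reset) pc <= RESET_VECTOR;
    else       pc <= pcnext;

endmodule
```

With the default parameter the PC starts at 0x0000_0000 as in our design; instantiating it as pc_param #(32'hBFC0_0000) pcreg (clk, reset, pcnext, pc); would start at the MIPS reset address instead.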
Instruction Fetch Verilog Model
[Figure: PC register with reset and clock inputs, and an adder that adds 4 to the PC]

// PC register
`include "delay.v"

module pc (input             clk, reset,
           output reg [31:0] pc,
           input      [31:0] pcnext);

  // asynchronous reset to address 0; otherwise load pcnext on the rising edge
  always @(posedge clk, posedge reset)
  begin
    if (reset) pc <= #`mydelay 32'h00000000;
    else       pc <= #`mydelay pcnext;
  end

endmodule

// 32-bit adder
`include "delay.v"

module adder (input  [31:0] a, b,
              output [31:0] y);

  assign #`mydelay y = a + b;

endmodule

// top-level CPU (instruction fetch only)
`include "delay.v"

module mips_cpu (input         clk, reset,
                 output [31:0] pc,
                 input  [31:0] instr);

  wire [31:0] pcnext;

  // instantiate pc and adder modules
  pc    pcreg  (clk, reset, pc, pcnext);
  adder pcadd4 (pc, 32'b100, pcnext);   // pcnext = pc + 4

endmodule
Memory
As studied in Computer Logic Design,
memory is classified into RAM (Random Access
Memory) and ROM (Read-Only Memory)
RAM is classified into DRAM (Dynamic RAM) and SRAM
(Static RAM)
DDR is a type of DRAM
DDR is short for DDR (Double Data Rate) SDRAM (Synchronous
DRAM)
DDR is used as main memory in modern computers
We use a simple Verilog memory model that
stores your program, since our focus is on how
the CPU works
Simple MIPS Test Code
[Figure: example MIPS assembly code and the result of assembling it]
Instruction Memory Verilog Model

module imem (input  [6:0]  a,
             output [31:0] rd);

  reg [31:0] RAM[127:0];   // 128 words, each word 32 bits

  initial
  begin
    $readmemh("memfile.dat", RAM);   // load the compiled binary file
  end

  assign #1 rd = RAM[a];   // word aligned

endmodule

Data comes out on rd[31:0] (32 bits) from the address a[6:0]

memfile.dat (compiled binary file):
20020005
2003000c
2067fff7
00e22025
00642824
00a42820
10a7000a
0064202a
10800001
20050000
00e2202a
00853820
00e23822
ac670044
8c020050
08000011
20020001
ac020054
Depending on your needs, you can increase or decrease the memory size
Examples
For 1KB word-addressable memory, reg [31:0] RAM[255:0]
For 16KB byte-addressable memory, reg [7:0] RAM[16*1024-1:0]
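To make the byte-addressable declaration above concrete, here is a minimal sketch of how a 32-bit instruction could be read from such a memory. The module name imem_byte, the 14-bit address port, and the big-endian byte order are assumptions for illustration, and loading the memory contents is omitted.

```verilog
// Minimal sketch of a 16 KB byte-addressable instruction memory.
// Name, port widths, and big-endian byte order are assumptions.
// Loading the contents (for example with $readmemh) is omitted here.
module imem_byte (input  [13:0] a,     // byte address: 2^14 = 16 KB
                  output [31:0] rd);

  reg [7:0] RAM[16*1024-1:0];          // 16 K entries of one byte each

  // Force word alignment by clearing the two low address bits.
  wire [13:0] base = {a[13:2], 2'b00};

  // Assemble one 32-bit word from four consecutive bytes,
  // with the most significant byte at the lowest address.
  assign rd = {RAM[base], RAM[base + 1], RAM[base + 2], RAM[base + 3]};

endmodule
```

With this organization the full PC byte address (here its low 14 bits) can be connected directly, whereas the word-addressable version takes the PC with its two low bits dropped.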
MIPS CPU with imem and Testbench

module mips_cpu_mem (input clk, reset);

  wire [31:0] pc, instr;

  // instantiate processor and memories
  mips_cpu imips_cpu  (clk, reset, pc, instr);
  // pc[8:2] is the 7-bit word address that matches imem's a[6:0]
  imem     imips_imem (pc[8:2], instr);

endmodule

module mips_tb();

  reg clk;
  reg reset;

  // instantiate device to be tested
  mips_cpu_mem imips_cpu_mem (clk, reset);

  // initialize test
  initial
  begin
    reset <= 1;
    # 32;
    reset <= 0;
  end

  // generate clock to sequence tests
  initial
  begin
    clk <= 0;
    forever #10 clk <= ~clk;
  end

endmodule
Simulation and Synthesis
Instruction fetch simulation
Synthesis
Try synthesizing pc and adder with
Quartus II