Risc PPT Final v1

RISC_processor_HDL_verilog
Computer Architecture
• Computer architecture
• Definition of ISA to facilitate implementation of
software layers
• The hardware/software interface

• Computer micro-architecture
• Design processor, memory, I/O to implement ISA
• Efficiently implementing the interface

The Next Step – Simple RISC Processor
• Reduced Instruction Set Computer

• Key features
• Large number of general purpose registers, or use of compiler technology to optimize register use
• Limited and simple instruction set
• Emphasis on optimising the instruction pipeline
Simple RISC Processor?
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, separated by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers. Blocks: PC, +4 adder, instruction memory, register file, immediate extend, ALU, data memory (addr, din, dout), jump/branch target computation, control, forward unit, hazard detection.]
Reduced Instruction Set Computer
RISC-V = Reduced Instruction Set Computer (RISC)
• ≈ 200 instructions, 32 bits each, 4 formats
• all operands in registers
• almost all are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]

x86 = Complex Instruction Set Computer (CISC)
• > 1000 instructions, 1 to 15 bytes each
• operands in dedicated registers, general purpose registers,
memory, on stack, …
• can be 1, 2, 4, 8 bytes, signed or unsigned
• 10s of addressing modes
• e.g. Mem[segment + reg + reg*scale + offset]
Comparison of RISC and CISC (x86)
Parameter             RISC                                  CISC
Instruction Set Size  ≈ 200 instructions                    > 1000 instructions
Instruction Length    32 bits each                          1 to 15 bytes each
Instruction Formats   4 formats                             N/A (variable-length instructions)
Operand Location      All operands in registers             Operands in dedicated registers, general purpose registers, memory, on stack, etc.
Operand Size          Almost all are 32 bits each           Can be 1, 2, 4, 8 bytes, signed or unsigned
Addressing Modes      ≈ 1 addressing mode (Mem[reg + imm])  10s of addressing modes (e.g., Mem[segment + reg + reg*scale + offset])
Implications - RISC

Best support is given by optimising most used and
most time consuming features

Large number of registers

Operand referencing (assignments, locality)

Careful design of pipelines

Conditional branches and procedures

Simplified (reduced) instruction set - for optimization
of pipelining and efficient use of registers
Why CISC (1)?
• Compiler simplification?
• Disputed…
• Complex machine instructions harder to exploit
• Optimization more difficult
• Smaller programs?
• Program takes up less memory but…
• Memory is now cheap
• May not occupy fewer bits, just look shorter in symbolic form
• More instructions require longer op-codes
• Register references require fewer bits
Why CISC (2)?
• Faster programs?
• Bias towards use of simpler instructions
• More complex control unit, thus even simple instructions
take longer to execute

• It is far from clear that CISC is the appropriate solution


What is Superscalar?
• Common instructions (arithmetic, load/store, conditional
branch) can be initiated simultaneously and executed
independently
• Applicable to both RISC & CISC
Why Superscalar?
• Most operations are on scalar quantities
• Improve these operations by executing them concurrently in
multiple pipelines
• Requires multiple functional units
• Requires re-arrangement of instructions
General Superscalar Organization
Limitations
• Instruction level parallelism: the degree to which instructions can, in theory, be executed in parallel
• To achieve it:
• Compiler based optimisation
• Hardware techniques
• Limited by
• Data dependency
• Procedural dependency
• Resource conflicts
The RISC Tenets
RISC                                                   CISC
• Single-cycle execution                               • many multicycle operations
• Hardwired control                                    • microcoded multi-cycle operations
• Load/store architecture                              • register-mem and mem-mem
• Few memory addressing modes                          • many modes
• Fixed-length insn format                             • many formats and lengths
• Reliance on compiler optimizations                   • hand assemble to get good performance
• Many registers (compilers are better at using them)  • few registers
Comparison of RISC vs CISC vs SuperScalar
Instruction Set Architecture (ISA)
Different CPU architectures specify different instructions

Two classes of ISAs


• Reduced Instruction Set Computers (RISC)
IBM Power PC, Sun Sparc, MIPS, Alpha
• Complex Instruction Set Computers (CISC)
Intel x86, PDP-11, VAX

Another ISA classification: Load/Store Architecture
• Data must be in registers to be operated on
For example: array[x] = array[y] + array[z]
1 add? OR 2 loads, an add, and a store?
• Keeps HW simple → many RISC ISAs are load/store
RISC Arithmetic Instructions
1.Addition (ADD):
•add rd, rs1, rs2 (rd = rs1 + rs2)
2.Subtraction (SUB):
•sub rd, rs1, rs2 (rd = rs1 - rs2)
3.Multiplication (MUL):
•mul rd, rs1, rs2 (rd = rs1 * rs2)
4.Division (DIV):
•div rd, rs1, rs2 (rd = rs1 / rs2)
5.Remainder (REM):
•rem rd, rs1, rs2 (rd = rs1 % rs2)
6.Multiply High (MULH, MULHU, MULHSU): upper XLEN bits of the
2×XLEN-bit product
•mulh rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, signed × signed)
•mulhu rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, unsigned × unsigned)
•mulhsu rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, signed rs1 × unsigned rs2)
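The multiply-high variants differ only in how each operand is widened before the multiply. A minimal C sketch, assuming XLEN = 32 (the function names are illustrative, not part of any toolchain):

```c
#include <stdint.h>

/* Upper 32 bits of a signed x signed product (mulh on RV32). */
uint32_t rv32_mulh(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;      /* both operands sign-extended */
    return (uint32_t)((uint64_t)p >> 32);     /* shift as unsigned: well defined */
}

/* Upper 32 bits of an unsigned x unsigned product (mulhu on RV32). */
uint32_t rv32_mulhu(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)a * (uint64_t)b;   /* both operands zero-extended */
    return (uint32_t)(p >> 32);
}

/* Upper 32 bits of a signed x unsigned product (mulhsu on RV32). */
uint32_t rv32_mulhsu(int32_t a, uint32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;      /* a sign-extended, b zero-extended */
    return (uint32_t)((uint64_t)p >> 32);
}
```

The low half of the product is the same for all three cases, which is why a single mul instruction is enough for the lower XLEN bits.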
RISC Logical Instructions
1.Bitwise AND (AND):
•and rd, rs1, rs2 (rd = rs1 & rs2)
2.Bitwise OR (OR):
•or rd, rs1, rs2 (rd = rs1 | rs2)
3.Bitwise XOR (XOR):
•xor rd, rs1, rs2 (rd = rs1 ^ rs2)
4.Bitwise NOT (NOT, a pseudoinstruction):
•not rd, rs1 (rd = ~rs1, assembled as xori rd, rs1, -1)
5.Shift Left Logical (SLL):
•sll rd, rs1, rs2 (rd = rs1 << (rs2 % XLEN))
6.Shift Right Logical (SRL):
•srl rd, rs1, rs2 (rd = rs1 >> (rs2 % XLEN))
7.Shift Right Arithmetic (SRA):
•sra rd, rs1, rs2 (rd = rs1 >> (rs2 % XLEN),
sign-extended)
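The three register shifts can be modelled as below. This is a sketch assuming XLEN = 32, so only the low 5 bits of rs2 are used as the shift amount; the function names are made up for illustration. The arithmetic shift is built from a logical shift plus an explicit sign fill, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* sll: shift left, filling with zeros. */
uint32_t rv32_sll(uint32_t a, uint32_t b) { return a << (b & 31u); }

/* srl: shift right, filling with zeros. */
uint32_t rv32_srl(uint32_t a, uint32_t b) { return a >> (b & 31u); }

/* sra: shift right, filling with copies of the sign bit. */
uint32_t rv32_sra(uint32_t a, uint32_t b) {
    uint32_t sh = b & 31u;
    uint32_t result = a >> sh;
    if ((a & 0x80000000u) != 0u && sh != 0u) {
        /* set the top sh bits that the logical shift cleared */
        result |= (uint32_t)~(UINT32_MAX >> sh);
    }
    return result;
}
```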
Instruction Processing
[Figure: single-cycle datapath with PC, +4 adder, program memory, register file (three 5-bit register indices), ALU, data memory and control.]
Instructions:
stored in memory, encoded in binary
00100000000000100000000000001010
00100000000000010000000000000000
00000000001000100001100000101010
A basic processor
• fetches
• decodes
• executes
one instruction at a time
Levels of Interpretation: Instructions
High Level Language
• HDL, C, Java, Python, ADA, …
• Loops, control flow, variables
for (i = 0; i < 10; i++)
printf("go cucs");

Assembly Language
• No symbols (except labels)
• One operation per statement
• “human readable machine language”
main: addi x2, x0, 10
addi x1, x0, 0
loop: slt x3, x1, x2
...

Machine Language
• Binary-encoded assembly
• Labels become addresses
• The language of the CPU
10 x2 x0 op=addi
00000000101000010000000000010011
00100000000000010000000000010000
00000000001000100001100000101010

Instruction Set Architecture

Machine Implementation (Microarchitecture)
ALU, Control, Register File, …
ISA-Architecture
HDL:
int x = 10;
x = 2 * x + 15;
↓ compiler
RISC-V assembly:
addi x5, x0, 10    # x5 = x0 + 10 (x0 = 0)
muli x5, x5, 2     # x5 = x5 << 1, i.e. x5 = x5 * 2 (RISC-V has no muli; a left shift by 1 is used)
addi x5, x5, 15    # x5 = x5 + 15
↓ assembler
machine code:
00000000101000000000001010010011    (10 | x0 | x5 | op = addi)
00000000001000101000001010000000    (op = r-type | x5 | shamt=1 | x5 | func=sll)
00000000111100101000001010010011    (15 | x5 | x5 | op = addi)
↓
CPU → Circuits → Gates → Transistors → Silicon
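The assembler step for the two addi lines can be reproduced by packing the I-type fields into one word. A minimal sketch (function names are hypothetical; the field positions are those of the I-type format: imm[31:20], rs1[19:15], funct3[14:12], rd[11:7], op[6:0]):

```c
#include <stdint.h>

/* Pack the five I-type fields into a 32-bit instruction word. */
uint32_t encode_itype(int32_t imm, uint32_t rs1, uint32_t funct3,
                      uint32_t rd, uint32_t opcode) {
    return (((uint32_t)imm & 0xFFFu) << 20) |   /* 12-bit immediate */
           ((rs1 & 31u) << 15) |
           ((funct3 & 7u) << 12) |
           ((rd & 31u) << 7) |
           (opcode & 0x7Fu);
}

/* addi uses opcode 0010011 and funct3 000. */
uint32_t encode_addi(uint32_t rd, uint32_t rs1, int32_t imm) {
    return encode_itype(imm, rs1, 0u, rd, 0x13u);
}
```

encode_addi(5, 0, 10) yields 0x00A00293 and encode_addi(5, 5, 15) yields 0x00F28293, which are the first and third machine-code words of the listing written in hex.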
Big Picture: Where are we going?
High Level Languages (HDL):
int x = 10;
x = 2 * x + 15;
↓ compiler
RISC-V assembly:
addi x5, x0, 10
muli x5, x5, 2
addi x5, x5, 15
↓ assembler
machine code:
00000000101000000000001010010011
00000000001000101000001010000000
00000000111100101000001010010011
Instruction Set Architecture (ISA)
CPU → Circuits → Gates → Transistors → Silicon
Single-Cycle RISC-V Datapath
Big Picture: Building a Processor
[Figure: a single cycle processor. Blocks: PC, +4 adders, instruction memory, register file, immediate extend, ALU, comparator (=?), branch offset/target logic, data memory (addr, din, dout), control.]
• Understanding the basics of a processor
• We now have the technology to build a CPU!

• Putting it all together:
• Arithmetic Logic Unit (ALU)
• Register File
• Memory
• SRAM: cache
• DRAM: main memory
• RISC-V Instructions & how they are executed
RISC-V Register File
• RISC-V register file
• 32 registers, 32-bits each
• x0 wired to zero
• Write port indexed via RW
• on falling edge when WE=1
• Read ports indexed via RA, RB
[Figure: dual-read-port, single-write-port 32 x 32 register file. Inputs: DW (32 bits), WE (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each).]
RISC-V Register File
[Figure: the same register file drawn as registers x0, x1, …, x31 with write data W (32 bits), read outputs A and B (32 bits each), and control inputs WE (1 bit), RW, RA, RB (5 bits each).]
• RISC-V register file
• Numbered from 0 to 31
• Can be referred by number: x0, x1, x2, … x31
• Convention, each register also has a name:
• x10 – x17 → a0 – a7, x28 – x31 → t3 – t6
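A behavioural sketch of the register file ports described above, in C. The type and function names are illustrative only, and clock edges are not modelled: each call to regfile_write stands for one write event with WE, RW and DW as inputs.

```c
#include <stdint.h>

typedef struct {
    uint32_t x[32];                 /* x0 .. x31, 32 bits each */
} RegFile;

/* Clear every register. */
void regfile_init(RegFile *rf) {
    for (int i = 0; i < 32; i++) {
        rf->x[i] = 0u;
    }
}

/* Read port: used once with RA (giving QA) and once with RB (giving QB). */
uint32_t regfile_read(const RegFile *rf, uint32_t ra) {
    return rf->x[ra & 31u];
}

/* Write port: stores DW into register RW when WE = 1.
   Writes to x0 are dropped, so x0 always reads as zero. */
void regfile_write(RegFile *rf, int we, uint32_t rw, uint32_t dw) {
    rw &= 31u;
    if (we && rw != 0u) {
        rf->x[rw] = dw;
    }
}
```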
RISC-V Memory
• 32-bit address
• 32-bit data (but byte addressed)
• Enable + 2 bit memory control (mc)
00: read word (4 byte aligned)
01: write byte
10: write halfword (2 byte aligned)
11: write word (4 byte aligned)
[Figure: memory block with inputs Din (32 bits), addr (32 bits), mc (2 bits) and E, and output Dout (32 bits), next to a column of 1-byte locations at addresses 0x00000005 through 0x0000000b and up to 0x000fffff, one of them holding the example value 0x05.]
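The enable and mc encoding can be exercised with a toy model. This is a sketch under several assumptions that are not on the slide: a 64-byte memory, little-endian byte order, addresses that wrap modulo the memory size, and misaligned addresses rounded down to the required alignment. All names are hypothetical.

```c
#include <stdint.h>

enum { MC_READ_WORD = 0, MC_WRITE_BYTE = 1, MC_WRITE_HALF = 2, MC_WRITE_WORD = 3 };

#define MEM_BYTES 64u                      /* toy size, assumption */

typedef struct {
    uint8_t bytes[MEM_BYTES];              /* byte addressed */
} Memory;

/* One memory access. Returns Dout for a read, 0 otherwise. */
uint32_t mem_access(Memory *m, int enable, unsigned mc,
                    uint32_t addr, uint32_t din) {
    if (!enable) {
        return 0u;                         /* E = 0: nothing happens */
    }
    switch (mc & 3u) {
    case MC_WRITE_BYTE:
        m->bytes[addr % MEM_BYTES] = (uint8_t)din;
        return 0u;
    case MC_WRITE_HALF:
        addr = (addr & ~1u) % MEM_BYTES;   /* 2 byte aligned */
        m->bytes[addr] = (uint8_t)din;
        m->bytes[addr + 1u] = (uint8_t)(din >> 8);
        return 0u;
    case MC_WRITE_WORD:
        addr = (addr & ~3u) % MEM_BYTES;   /* 4 byte aligned */
        for (unsigned i = 0; i < 4u; i++) {
            m->bytes[addr + i] = (uint8_t)(din >> (8u * i));
        }
        return 0u;
    default:                               /* MC_READ_WORD */
        addr = (addr & ~3u) % MEM_BYTES;   /* 4 byte aligned */
        return (uint32_t)m->bytes[addr] |
               ((uint32_t)m->bytes[addr + 1u] << 8) |
               ((uint32_t)m->bytes[addr + 2u] << 16) |
               ((uint32_t)m->bytes[addr + 3u] << 24);
    }
}
```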
Putting it all together: Basic Processor
A RISC-V CPU with a (modified) Harvard architecture
• Modified: instructions & data in common address space, separate
instr/data caches can be accessed in parallel
[Figure: CPU (registers, ALU, control) connected by data, address and control lines to a separate Program Memory and Data Memory, each drawn as a column of binary words.]
Takeaway
A processor executes instructions
• Processor has some internal state in storage elements
(registers)
A memory holds instructions and data
• (modified) Harvard architecture: separate insts and data
• von Neumann architecture: combined inst and data
A bus connects the two

We now have enough building blocks to build machines that can perform
non-trivial computational tasks
Takeaway
A RISC-V processor and ISA (instruction set architecture) is an example
of a Reduced Instruction Set Computer (RISC), where simplicity is key,
thus enabling us to build it!
Next Goal
How are instructions executed?
What is the general datapath to execute an instruction?
Five-Stage RISC-V Datapath
Five Stages of RISC-V Datapath
[Figure: datapath with PC, +4 adder, program memory, register file (three 5-bit indices), ALU, data memory and control, divided left to right into Fetch, Decode, Execute, Memory and WB.]
A single cycle processor – this diagram is not 100% spatial
Five Stages of RISC-V Datapath
Basic CPU execution loop
1. Instruction Fetch
2. Instruction Decode
3. Execution (ALU)
4. Memory Access
5. Register Writeback

Stage 1: Instruction Fetch
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB).]
Fetch 32-bit instruction from memory
Increment PC = PC + 4
Stage 2: Instruction Decode
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB).]
Gather data from the instruction
Read opcode; determine instruction type, field lengths
Read in data from register file
(0, 1, or 2 reads for jump, addi, or add, respectively)
Stage 3: Execution (ALU)
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB).]
Useful work done here (+, -, *, /), shift, logic operation, comparison (slt)
Load/Store? lw x2, 32(x3) → compute address
Stage 4: Memory Access
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB), with addr, Data in/out and R/W signals on the data memory.]
Used by load and store instructions only
Other instructions will skip this stage
Stage 5: Writeback
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB).]
Write to register file
• For arithmetic ops, logic, shift, etc, load. What about stores?
Update PC
• For branches, jumps
Takeaway
• The datapath for a RISC-V processor has five stages:
1. Instruction Fetch
2. Instruction Decode
3. Execution (ALU)
4. Memory Access
5. Register Writeback

• This five stage datapath is used to execute all RISC-V instructions
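The five stages can be traced in software for a handful of instructions. The sketch below is an assumption-laden toy, not the datapath from the slides: it handles only addi, add, lw and sw, selects on the opcode alone (funct3/funct7 are not checked), uses small word-indexed instruction and data memories, and all names are made up.

```c
#include <stdint.h>

#define IMEM_WORDS 16u
#define DMEM_WORDS 16u

typedef struct {
    uint32_t pc;
    uint32_t x[32];
    uint32_t imem[IMEM_WORDS];
    uint32_t dmem[DMEM_WORDS];
} Cpu;

/* Sign-extend a 12-bit field. */
static int32_t sext12(uint32_t v) {
    v &= 0xFFFu;
    return (v & 0x800u) ? (int32_t)v - 4096 : (int32_t)v;
}

/* Execute one instruction, stage by stage. */
void cpu_step(Cpu *c) {
    /* 1. Instruction Fetch */
    uint32_t inst = c->imem[(c->pc / 4u) % IMEM_WORDS];
    uint32_t next_pc = c->pc + 4u;

    /* 2. Instruction Decode: split fields, read registers */
    uint32_t op  = inst & 0x7Fu;
    uint32_t rd  = (inst >> 7) & 31u;
    uint32_t rs1 = (inst >> 15) & 31u;
    uint32_t rs2 = (inst >> 20) & 31u;
    uint32_t a = c->x[rs1];
    uint32_t b = c->x[rs2];
    int32_t imm_i = sext12(inst >> 20);
    int32_t imm_s = sext12(((inst >> 25) << 5) | rd);

    /* 3. Execution (ALU) */
    uint32_t alu = 0u;
    int write_back = 0;
    switch (op) {
    case 0x13: alu = a + (uint32_t)imm_i; write_back = 1; break; /* addi */
    case 0x33: alu = a + b;               write_back = 1; break; /* add  */
    case 0x03: alu = a + (uint32_t)imm_i; break;        /* lw: address */
    case 0x23: alu = a + (uint32_t)imm_s; break;        /* sw: address */
    default: break;                                     /* unsupported */
    }

    /* 4. Memory Access: loads and stores only */
    uint32_t result = alu;
    if (op == 0x03) {
        result = c->dmem[(alu / 4u) % DMEM_WORDS];
        write_back = 1;
    } else if (op == 0x23) {
        c->dmem[(alu / 4u) % DMEM_WORDS] = b;
    }

    /* 5. Register Writeback (x0 stays zero), then update PC */
    if (write_back && rd != 0u) {
        c->x[rd] = result;
    }
    c->pc = next_pc;
}
```

Stores reach stage 5 with nothing to write to the register file, which is one answer to the "What about stores?" question on the writeback slide.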


Next Goal
• Specific datapaths for RISC-V instructions
Instruction Types
• Arithmetic
• add, subtract, shift left, shift right, multiply, divide
• Memory
• load value from memory to a register
• store value to memory from a register
• Control flow
• conditional jumps (branches)
• jump and link (subroutine call)

• Many other instructions are possible
• vector add/sub/mul/div, string operations
• manipulate coprocessor
• I/O
RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: result and source register, shift amount in 12-bit immediate
with sign extension
• U-type: result register, 20-bit immediate placed in the upper bits
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• U-type (J variant): jump-and-link
• I-type: jump-and-link register
• S-type (B variant): conditional branches: pc-relative addresses
RISC-V instruction formats
All RISC-V instructions are 32 bits long, have 4 formats
• R-type   funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits)  | op (7 bits)
• I-type   imm (12 bits)                  | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits)  | op (7 bits)
• S-type   imm (7 bits)    | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | imm (5 bits) | op (7 bits)
• U-type   imm (20 bits)                                                   | rd (5 bits)  | op (7 bits)
R-Type (1): Arithmetic and Logic
00000000011001000100001000110011
funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
op       funct3  funct7   mnemonic          description
0110011  000     0000000  ADD rd, rs1, rs2  R[rd] = R[rs1] + R[rs2]
0110011  000     0100000  SUB rd, rs1, rs2  R[rd] = R[rs1] – R[rs2]
0110011  110     0000000  OR rd, rs1, rs2   R[rd] = R[rs1] | R[rs2]
0110011  100     0000000  XOR rd, rs1, rs2  R[rd] = R[rs1] ^ R[rs2]
ADD and SUB share op and funct3; funct7 tells them apart.
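Decoding an R-type word is a matter of shifting and masking at the field boundaries listed above. A minimal sketch (the struct and function names are illustrative):

```c
#include <stdint.h>

typedef struct {
    uint32_t funct7, rs2, rs1, funct3, rd, opcode;
} RFields;

/* Split a 32-bit R-type instruction into its six fields. */
RFields decode_rtype(uint32_t inst) {
    RFields f;
    f.funct7 = (inst >> 25) & 0x7Fu;   /* bits 31..25 */
    f.rs2    = (inst >> 20) & 0x1Fu;   /* bits 24..20 */
    f.rs1    = (inst >> 15) & 0x1Fu;   /* bits 19..15 */
    f.funct3 = (inst >> 12) & 0x07u;   /* bits 14..12 */
    f.rd     = (inst >> 7)  & 0x1Fu;   /* bits 11..7  */
    f.opcode = inst & 0x7Fu;           /* bits 6..0   */
    return f;
}
```

Applied to the example word on this slide, 00000000011001000100001000110011 (0x00644233), this gives funct7 = 0, rs2 = 6, rs1 = 8, funct3 = 100, rd = 4, op = 0110011, i.e. xor x4, x8, x6.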
Arithmetic and Logic
[Figure: five-stage datapath for R-type arithmetic and logic (program memory, register file, ALU, PC, +4, control); the Memory stage is marked "skip".]
R-Type (2): Shift Instructions
0000000001100010000101000011011
funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
op       funct3  funct7   mnemonic          description
0110011  001     0000000  SLL rd, rs1, rs2  R[rd] = R[rs1] << R[rs2]
0110011  101     0000000  SRL rd, rs1, rs2  R[rd] = R[rs1] >> R[rs2] (zero ext.)
0110011  101     0100000  SRA rd, rs1, rs2  R[rd] = R[rs1] >>> R[rs2] (sign ext.)
SRL and SRA share op and funct3; funct7 tells them apart.
Shift
[Figure: five-stage datapath for shift instructions (program memory, register file, ALU, PC, +4, control); the Memory stage is marked "skip".]
I-Type (1): Arithmetic w/ immediates
00000000010100101000001010010011
imm (12 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)

op       funct3  mnemonic           description
0010011  000     ADDI rd, rs1, imm  R[rd] = R[rs1] + sign_extend(imm)
0010011  111     ANDI rd, rs1, imm  R[rd] = R[rs1] & sign_extend(imm)
0010011  110     ORI rd, rs1, imm   R[rd] = R[rs1] | sign_extend(imm)
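In the RISC-V base ISA the 12-bit I-type immediate is sign-extended to the register width for ADDI, ANDI and ORI alike. A short sketch of that extension and its effect, assuming 32-bit registers (function names are hypothetical):

```c
#include <stdint.h>

/* Sign-extend a 12-bit immediate to 32 bits. */
int32_t sign_extend12(uint32_t imm12) {
    imm12 &= 0xFFFu;
    return (imm12 & 0x800u) ? (int32_t)imm12 - 4096 : (int32_t)imm12;
}

/* R[rd] = R[rs1] + sign_extend(imm) */
uint32_t exec_addi(uint32_t rs1_val, uint32_t imm12) {
    return rs1_val + (uint32_t)sign_extend12(imm12);
}

/* R[rd] = R[rs1] & sign_extend(imm) */
uint32_t exec_andi(uint32_t rs1_val, uint32_t imm12) {
    return rs1_val & (uint32_t)sign_extend12(imm12);
}
```

With imm = 0xFFF (that is, -1), addi subtracts one and andi leaves the register unchanged, because the extended immediate is all ones.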
Arithmetic w/ immediates
[Figure: five-stage datapath for arithmetic with immediates; the immediate passes through the imm/extend block (with a shamt path) into the ALU, and the Memory stage is marked "skip".]
U-Type (1): “Load” Upper Immediate
00000000000000000101001010110111
imm (20 bits) | rd (5 bits) | op (7 bits)

op       mnemonic     description
0110111  LUI rd, imm  R[rd] = imm << 12
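LUI fills the upper 20 bits of a register and clears the low 12, so a full 32-bit constant is usually built from LUI followed by ADDI. Because ADDI sign-extends its immediate, the upper part has to be rounded up when bit 11 of the constant is set. A sketch with hypothetical names:

```c
#include <stdint.h>

/* R[rd] = imm << 12 */
uint32_t exec_lui(uint32_t imm20) {
    return (imm20 & 0xFFFFFu) << 12;
}

typedef struct {
    uint32_t upper20;   /* immediate for lui  */
    int32_t  lower12;   /* immediate for addi */
} LuiAddiPair;

/* Split a 32-bit constant into a lui/addi immediate pair. */
LuiAddiPair split_constant(uint32_t value) {
    LuiAddiPair p;
    uint32_t low = value & 0xFFFu;
    p.lower12 = (low & 0x800u) ? (int32_t)low - 4096 : (int32_t)low;
    /* adding 0x800 compensates for a negative lower12 */
    p.upper20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
    return p;
}

/* What lui followed by addi leaves in the register. */
uint32_t rebuild_constant(LuiAddiPair p) {
    return exec_lui(p.upper20) + (uint32_t)p.lower12;
}
```

For the example word on this slide (imm = 5, rd = x5), exec_lui(5) gives 0x5000.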
Load Upper Immediate
[Figure: five-stage datapath for LUI; the immediate is shifted in the imm/extend block and written to the register file, and the Memory stage is marked "skip".]
Multiplication

Hardware Multiply: Sequential
[Figure: Multiplicand register (32 bit, shifts left by 1), Multiplier register (16 bit, shifts right by 1), 32-bit adder, and Product register (32 bit) whose write enable is driven by the "lsb==1?" test.]
• Control: repeat 16 times
• If least significant bit of multiplier is 1…
• Then add multiplicand to product
• Shift multiplicand left by 1
• Shift multiplier right by 1
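The control loop above translates almost line for line into C. A sketch using the widths from the slide (32-bit multiplicand and product, 16-bit multiplier); the function name is made up:

```c
#include <stdint.h>

/* Shift-and-add multiply: 16 iterations, one per multiplier bit. */
uint32_t seq_multiply(uint32_t multiplicand, uint16_t multiplier) {
    uint32_t product = 0u;
    for (int step = 0; step < 16; step++) {
        if (multiplier & 1u) {
            product += multiplicand;   /* lsb == 1: add */
        }
        multiplicand <<= 1;            /* shift multiplicand left by 1 */
        multiplier >>= 1;              /* shift multiplier right by 1 */
    }
    return product;
}
```

The product wraps modulo 2^32, as it would in a 32-bit Product register.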
Division

Divider Circuit
[Figure: divider datapath with Divisor, Quotient (shift in 0 or 1), Remainder (shift in 0 or 1, fed from the Dividend msb), Dividend (shift in 0), and a subtractor (Sub) with a ">= 0" test.]
• N cycles for n-bit divide
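One common way to realise such a circuit is restoring division: each cycle shifts the next dividend bit into the remainder, tries a subtraction, and shifts the outcome into the quotient. The sketch below assumes that scheme and 32-bit unsigned operands; the slide does not say which variant is intended, and the names are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} DivResult;

/* Restoring division: 32 iterations for a 32-bit divide. */
DivResult seq_divide(uint32_t dividend, uint32_t divisor) {
    DivResult r = {0u, 0u};
    for (int step = 0; step < 32; step++) {
        /* shift the dividend msb into the remainder */
        uint64_t rem = ((uint64_t)r.remainder << 1) | (dividend >> 31);
        dividend <<= 1;                /* shift in 0 */
        r.quotient <<= 1;
        if (rem >= divisor) {          /* Sub >= 0: keep the difference */
            rem -= divisor;
            r.quotient |= 1u;          /* shift in 1 */
        }
        r.remainder = (uint32_t)rem;
    }
    return r;
}
```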
Shifts & Rotates

Shift and Rotation Instructions
• Left/right shifts are useful…
• Fast multiplication/division by small constants (next)
• Bit manipulation: extracting and setting individual bits in words
• Right shifts
• Can be logical (shift in 0s) or arithmetic (shift in copies of MSB)
srl 110011, 2 = 001100
sra 110011, 2 = 111100
• Caveat: for negative numbers, sra is not equal to division by a power of 2
• Consider: -1 / 16 = ? (C-style division truncates to 0, but sra by 4 gives -1)
• Rotations are less useful…
• But almost “free” if shifter is there
• MIPS and LC4 have only shifts, x86 has shifts and rotations
Compiler Opt: Strength Reduction
• Strength reduction: compilers will do this (sort of)
A * 4 = A << 2
A * 5 = (A << 2) + A
A / 8 = A >> 3 (only if A is unsigned)
• Useful for address calculation: all basic data types are 2^M in size
int A[100];
&A[N] = A+(N*sizeof(int)) = A+N*4 = A+(N<<2)
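The identities can be checked directly in C. This sketch assumes 32-bit values and a 4-byte int, as on the slide; the function names are made up, and the signed function is there only to show why the shift rewrite is restricted to unsigned operands.

```c
#include <stdint.h>

uint32_t times4(uint32_t a) { return a << 2; }           /* A * 4 */
uint32_t times5(uint32_t a) { return (a << 2) + a; }     /* A * 5 */
uint32_t div8_unsigned(uint32_t a) { return a >> 3; }    /* A / 8, unsigned only */

/* Byte offset of A[n] for an array of 4-byte ints. */
uint32_t int_array_offset(uint32_t n) { return n << 2; }

/* Signed division truncates toward zero in C, so -1 / 16 is 0,
   whereas an arithmetic right shift of -1 by 4 would give -1. */
int32_t div16_signed(int32_t a) { return a / 16; }
```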
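A Simple Shifter
• The simplest 16-bit shifter: can only shift left by 1
• Implement using wires (no logic!)
• Slightly more complicated: can shift left by 1 or 0
• Implement using wires and a multiplexor (mux16_2to1)
[Figure: input A (A0 … A15) wired one position up to output O with a 0 shifted in at the bottom, drawn as a "<<1" block; the same block followed by a 2-to-1 mux that selects A or A <<1.]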
Barrel Shifter
• What about shifting left by any amount 0–15?
• 16 consecutive “left-shift-by-1-or-0” blocks?
– Would take too long (how long?)
• Barrel shifter: 4 “shift-left-by-X-or-0” blocks (X = 1,2,4,8)
• What is the delay?
[Figure: A passes through four blocks <<8, <<4, <<2, <<1, enabled by shift[3], shift[2], shift[1], shift[0] respectively, producing O.]
• Similar barrel designs for right shifts and rotations
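The four-block structure can be modelled in software, one conditional shift per bit of the shift amount. This is a C sketch of the behaviour, not the hardware description itself; the function name is illustrative.

```c
#include <stdint.h>

/* 16-bit left barrel shifter: four "shift-by-X-or-0" stages. */
uint16_t barrel_shift_left(uint16_t a, unsigned shift) {
    uint16_t v = a;
    if (shift & 8u) v = (uint16_t)(v << 8);   /* stage enabled by shift[3] */
    if (shift & 4u) v = (uint16_t)(v << 4);   /* stage enabled by shift[2] */
    if (shift & 2u) v = (uint16_t)(v << 2);   /* stage enabled by shift[1] */
    if (shift & 1u) v = (uint16_t)(v << 1);   /* stage enabled by shift[0] */
    return v;
}
```

Any amount from 0 to 15 passes through exactly four stages, so the delay grows with the number of bits in the shift amount rather than with the shift amount itself.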


Shifter in Verilog
• Logical shift operators << >>
• performs zero-extension for >>
wire [15:0] a = b << c[3:0];
• Arithmetic shift operator >>>
• performs sign-extension
• requires a signed wire input
wire signed [15:0] b;
wire [15:0] a = b >>> c[3:0];

Single-Cycle Performance
Single-Cycle Datapath Performance-Sequential
[Figure: single-cycle datapath with PC, +4 adder, insn memory, register file (s1, s2, d), sign extend (SX), <<2 branch target adder, ALU and data memory.]
• One cycle per instruction (CPI = 1)
• Clock cycle time proportional to worst-case logic delay
• In this datapath: insn fetch, decode, register read, ALU, data memory access, write register
• Can we do better?
RISC Pipelining
• Most instructions are register to register
• Arithmetic/logic instruction:
• I: Instruction fetch
• E: Execute (ALU operation with register input and output)
• Load/store instruction:
• I: Instruction fetch
• E: Execute (calculate memory address, see virtual memory)
• D: Memory (register to memory or memory to register
operation)
Foreshadowing: Pipelined Datapath
[Figure: the same datapath split into stages by pipeline latches holding PC, IR, A, B, O and D values.]
• Split datapath into multiple stages
• Assembly line analogy
• 5 stages results in up to 5x clock & performance improvement
Delay Slots in the Pipeline
Sequential 1 2 3 4 5 6 7 8 9 10 11
LOAD rA, m1 I E D
LOAD rB, m2 I E D
ADD rC, rA, rB I E
STORE m3, rC I E D

Pipelined 1 2 3 4 5 6 7
LOAD rA, m1 I E D
LOAD rB, m2 I E D
ADD rC, rA, rB I E
STORE m3, rC I E D
Optimization of Pipelining
• Code reorganization techniques to reduce data and
branch dependencies
• Delayed branch
• Does not take effect until the execution of following
instruction
• This following instruction is the delay slot
• More successful with unconditional branch
• 1st approach: insert NOOP (prevents fetching instr., no
pipeline flush and delays the effect of jump)
• 2nd approach: reorder instructions
Normal and Delayed Branch
Address  Normal branch  1st Delayed branch  2nd Delayed branch
100      LOAD rA, X     LOAD rA, X          LOAD rA, X
101      ADD rA, #1     ADD rA, #1          JUMP 105
102      JUMP 105       JUMP 106            ADD rA, #1
103      ADD rA, rB     NOOP                ADD rA, rB
104      SUB rC, rB     ADD rA, rB          SUB rC, rB
105      STORE Z, rA    SUB rC, rB          STORE Z, rA
106                     STORE Z, rA
Use of Delayed Branch
Normal branch 1 2 3 4 5 6 7 8
100. LOAD rA, X I E D
101. ADD rA, #1 I E
102. JUMP 105 I E
103. ADD rA, rB I
105. STORE Z, rA I E D

Delayed branch 1 2 3 4 5 6
100. LOAD rA, X I E D
102. JUMP 105 I E
101. ADD rA, #1 I E
105. STORE Z, rA I E D
Goals for today
Memory
• CPU: Register Files (i.e. Memory w/in the CPU)
• Scaling Memory: Tri-state devices
• Cache: SRAM (Static RAM—random access memory)
• Memory: DRAM (Dynamic RAM)
Last time: How do we store one bit

D Q
clk D Flip Flop stores 1 bit
Goal for today

How do we store results from ALU computations?


Goal for today
How do we store results from ALU computations?
How do we use stored results in subsequent operations?
Register File
How does a Register File work? How do we design it?
Register File
• N read/write registers
• Indexed by register number
[Figure: dual-read-port, single-write-port 32 x 32 register file. Inputs: DW (32 bits), W (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each).]
Register File
Recall: Register
• D flip-flops in parallel
• shared clock
• extra clocked inputs: write_enable, reset, …
[Figure: four D flip-flops (D0 … D3) sharing clk, drawn as a 4-bit register; the same structure widened to a 32-bit register.]
Register File
• N read/write registers
• Indexed by register number
How to write to one register in the register file?
• Need a decoder
[Figure: 32-bit write data D fanned out to Reg 0 … Reg 31; a 5-to-32 decoder driven by RW (5 bits) and W selects the register to write. Example: addi x1, x0, 10 drives RW = 00001.]
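Aside: 3-to-8 decoder truth table & circuit
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
0  0  0  | 1  0  0  0  0  0  0  0
0  0  1  | 0  1  0  0  0  0  0  0
0  1  0  | 0  0  1  0  0  0  0  0
0  1  1  | 0  0  0  1  0  0  0  0
1  0  0  | 0  0  0  0  1  0  0  0
1  0  1  | 0  0  0  0  0  1  0  0
1  1  0  | 0  0  0  0  0  0  1  0
1  1  1  | 0  0  0  0  0  0  0  1
[Figure: 3-to-8 decoder driven by RW = 001 (3 bits); example gates for o0 (inputs i2, i1, i0) and o5 (inputs i2, i1, i0).]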
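The truth table is a one-hot encoding: exactly one output is 1 for each input value. A small C sketch, with bit k of the result standing for output ok (names are illustrative):

```c
#include <stdint.h>

/* 3-to-8 decoder: returns a one-hot byte, bit k = output ok. */
uint8_t decode3to8(unsigned i) {
    return (uint8_t)(1u << (i & 7u));
}

/* Gate-level form of a single output: o5 is high only for i2 i1 i0 = 1 0 1. */
int decoder_o5(int i2, int i1, int i0) {
    return i2 && !i1 && i0;
}
```

The 5-to-32 decoder on the register file write port follows the same pattern with 1u << (rw & 31u).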
Register File
• N read/write registers
• Indexed by register number
How to read from two registers?
• Need a multiplexor
[Figure: Reg 0 … Reg 31 feed two 32-bit multiplexors, selected by RA and RB (5 bits each), producing QA and QB. Example: add x1, x0, x5.]
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: complete register file: write data D (32 bits) and a 5-to-32 decoder (W, RW) on the write side; two muxes selected by RA and RB producing QA and QB (32 bits each) on the read side.]
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
What happens if the same register is read and written during the same clock cycle?
Tradeoffs
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn’t scale
e.g. 32Mb register file with 32 bit registers
Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
How many logic gates/transistors?
[Figure: 8-to-1 mux with data inputs a … h and select inputs s2, s1, s0.]
Takeaway
Register files are very fast storage (only a few gate
delays), but do not scale to large memory sizes.
RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: result and source register, shift amount in 12-bit
immediate with sign extension
• U-type: result register, 20-bit immediate placed in the upper bits
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• U-type (J variant): jump-and-link
• I-type: jump-and-link register
• S-type (B variant): conditional branches: pc-relative addresses
Summary
We have all that it takes to build a processor!
• Arithmetic Logic Unit (ALU)
• Register File
• Memory

RISC-V processor and ISA is an example of a Reduced Instruction Set Computer (RISC)
• Simplicity is key, thus enabling us to build it!

We now know the data path for the RISC-V ISA:
• register, memory and control instructions