Risc PPT Final v1
Verilog
Computer Architecture
• Computer architecture
• Definition of ISA to facilitate implementation of
software layers
• The hardware/software interface
• Computer micro-architecture
• Design processor, memory, I/O to implement ISA
• Efficiently implementing the interface
The Next Step – Simple RISC Processor
• Reduced Instruction Set Computer
• Key features
• Large number of general purpose registers
or use of compiler technology to optimize register use
• Limited and simple instruction set
• Emphasis on optimising the instruction pipeline
Simple RISC Processor?
[Figure: five-stage pipelined datapath. Blocks: PC, +4 adder, instruction memory, register file, imm extend, ALU, compute jump/branch targets, data memory, forward unit, hazard detection, control. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Stages: Fetch, Decode, Execute, Memory, Write-Back.]
Reduced Instruction Set Computer
RISC-V = Reduced Instruction Set Computer (RISC)
• ≈ 200 instructions, 32 bits each, 4 formats
• all operands in registers
• almost all operands are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]
RISC-V: Reduced Instruction Set Computer (RISC)
• ≈ 200 instructions
• 32 bits each, 4 formats
• all operands in registers
• almost all are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]
x86: Complex Instruction Set Computer (CISC)
• > 1000 instructions
• 1 to 15 bytes each
• operands in dedicated registers, general purpose registers, memory, on stack, …
• can be 1, 2, 4, 8 bytes, signed or unsigned
• 10s of addressing modes
• e.g. Mem[segment + reg + reg*scale + offset]
Comparison of RISC and CISC (x86)
Parameter            | RISC                                 | CISC
Instruction Set Size | ≈ 200 instructions                   | > 1000 instructions
Instruction Length   | 32 bits each                         | 1 to 15 bytes each
Instruction Formats  | 4 formats                            | N/A (variable-length instructions)
Operand Location     | All operands in registers            | Operands in dedicated registers, general purpose registers, memory, on stack, etc.
Operand Size         | Almost all are 32 bits each          | Can be 1, 2, 4, 8 bytes, signed or unsigned
Addressing Modes     | ≈ 1 addressing mode (Mem[reg + imm]) | 10s of addressing modes (e.g., Mem[segment + reg + reg*scale + offset])
Implications - RISC
• Best support is given by optimising most used and most time consuming features
• Large number of registers
• Operand referencing (assignments, locality)
• Careful design of pipelines
• Conditional branches and procedures
• Simplified (reduced) instruction set - for optimization of pipelining and efficient use of registers
Why CISC (1)?
• Compiler simplification?
• Disputed…
• Complex machine instructions harder to exploit
• Optimization more difficult
• Smaller programs?
• Program takes up less memory but…
• Memory is now cheap
• May not occupy fewer bits, just look shorter in symbolic form
• More instructions require longer op-codes
• Register references require fewer bits
Why CISC (2)?
• Faster programs?
• Bias towards use of simpler instructions
• More complex control unit, thus even simple instructions
take longer to execute
[Figure: basic datapath with PC, +4 adder, Prog. Mem, Reg. File, ALU, Data Mem, and control; register indices are 5 bits each]
Instructions: stored in memory, encoded in binary
00100000000000100000000000001010
00100000000000010000000000000000
00000000001000100001100000101010
A basic processor
• fetches
• decodes
• executes
one instruction at a time
Levels of Interpretation: Instructions
High Level Language
• HDL, C, Java, Python, ADA, …
• Loops, control flow, variables
    for (i = 0; i < 10; i++)
        printf("go cucs");
Circuits
Gates
Transistors
Silicon
Big Picture: Where are we going?
HDL / High Level Languages
    int x = 10;
    x = 2 * x + 15;
compiler
RISC-V assembly
    addi x5, x0, 10
    muli x5, x5, 2
    addi x5, x5, 15
(muli is shown for illustration; the base RISC-V ISA has no multiply-immediate instruction)
assembler
machine code
    00000000101000000000001010010011
    00000000001000101000001010000000
    00000000111100101000001010010011
Instruction Set Architecture (ISA)
CPU
Circuits
Gates
Transistors
Silicon
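The assembler step can be sketched in a few lines. This is only an illustration, assuming the standard I-type field layout (imm, rs1, funct3, rd, opcode); the helper name encode_i_type and the constant OP_IMM are made up for this example. It reproduces the machine code shown for the two addi instructions.

```python
# Sketch: pack I-type fields into a 32-bit word (imm | rs1 | funct3 | rd | opcode).
OP_IMM = 0b0010011  # opcode used by addi (hypothetical constant name)


def encode_i_type(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Return the 32-bit encoding of an I-type instruction."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


first = encode_i_type(imm=10, rs1=0, funct3=0b000, rd=5, opcode=OP_IMM)  # addi x5, x0, 10
third = encode_i_type(imm=15, rs1=5, funct3=0b000, rd=5, opcode=OP_IMM)  # addi x5, x5, 15

print(f"{first:032b}")  # 00000000101000000000001010010011
print(f"{third:032b}")  # 00000000111100101000001010010011
```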
Single-Cycle RISC-V Datapath
Big Picture: Building a Processor
[Figure: single-cycle datapath. PC, +4 adders, instruction memory (inst), register file, ALU, data memory (addr, din, dout), control, imm extend, offset, comparator (=?, cmp), new pc, target]
RISC-V Register File
[Figure: single-cycle datapath. PC, +4 adders, instruction memory (inst), register file, ALU, data memory (addr, din, dout), control, imm extend, offset, comparator (=?, cmp), new pc, target]
RISC-V Register File
• RISC-V register file
• 32 registers, 32-bits each
• x0 wired to zero
• Write port indexed via RW
• on falling edge when WE=1
• Read ports indexed via RA, RB
[Figure: register file x0 … x31 with 32-bit write data W, 32-bit read outputs A and B, 1-bit WE, and 5-bit RW, RA, RB]
• RISC-V register file
• Numbered from 0 to 31
• Can be referred to by number: x0, x1, x2, … x31
• Convention, each register also has a name:
• x10 – x17 → a0 – a7, x28 – x31 → t3 – t6
RISC-V Memory
[Figure: single-cycle datapath. PC, +4 adders, instruction memory (inst), register file, ALU, data memory (addr, din, dout), control, imm extend, offset, comparator (=?, cmp), new pc, target]
[Figure: CPU (Registers, Control, ALU) connected by data, address, and control lines to Data Memory and Program Memory, each holding binary words]
Takeaway
A processor executes instructions
• Processor has some internal state in storage elements
(registers)
A memory holds instructions and data
• (modified) Harvard architecture: separate insts and data
• von Neumann architecture: combined inst and data
A bus connects the two
Stage 1: Instruction Fetch
[Figure: basic datapath with PC, +4 adder, Prog. Mem (inst), Reg. File, ALU, Data Mem, and control; register indices are 5 bits each]
RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers, shift amount
• I-type: result and source register, shift amount in 12-bit immediate
with sign extension
• U-type: result register, 20-bit immediate
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• J-type (a U-type variant): jump-and-link
• I-type: jump-and-link register
• B-type (an S-type variant): conditional branches: pc-relative addresses
RISC-V instruction formats
All RISC-V instructions are 32 bits long and have 4 formats
• R-type
  funct7 | rs2    | rs1    | funct3 | rd     | op
  7 bits | 5 bits | 5 bits | 3 bits | 5 bits | 7 bits
• I-type
  imm     | rs1    | funct3 | rd     | op
  12 bits | 5 bits | 3 bits | 5 bits | 7 bits
• U-type
  imm     | rd     | op
  20 bits | 5 bits | 7 bits
R-Type (1): Arithmetic and Logic
00000000011001000100001000110011
funct7   rs2      rs1      funct3   rd       op
7 bits   5 bits   5 bits   3 bits   5 bits   7 bits
op        funct3   mnemonic           description
0110011   000      ADD rd, rs1, rs2   R[rd] = R[rs1] + R[rs2]
0110011   000      SUB rd, rs1, rs2   R[rd] = R[rs1] - R[rs2]
0110011   110      OR rd, rs1, rs2    R[rd] = R[rs1] | R[rs2]
0110011   100      XOR rd, rs1, rs2   R[rd] = R[rs1] ^ R[rs2]
ADD and SUB share funct3 = 000; funct7 tells them apart (0000000 for ADD, 0100000 for SUB).
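A small sketch of how the R-type fields drive the ALU, decoding the example word from this slide. The names decode_r_type, execute_r_type and R_TYPE_OPS are hypothetical, and the register file is modelled as a plain Python list; this is an illustration, not the datapath itself.

```python
MASK32 = 0xFFFFFFFF

# (funct3, funct7) selects the ALU operation for opcode 0110011.
R_TYPE_OPS = {
    (0b000, 0b0000000): ("ADD", lambda a, b: (a + b) & MASK32),
    (0b000, 0b0100000): ("SUB", lambda a, b: (a - b) & MASK32),
    (0b110, 0b0000000): ("OR", lambda a, b: a | b),
    (0b100, 0b0000000): ("XOR", lambda a, b: a ^ b),
}


def decode_r_type(word):
    """Split a 32-bit word into the six R-type fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "op": word & 0x7F,
    }


def execute_r_type(word, regs):
    """Apply the selected operation to regs (a list of 32 ints); return its mnemonic."""
    f = decode_r_type(word)
    name, alu = R_TYPE_OPS[(f["funct3"], f["funct7"])]
    if f["rd"] != 0:  # x0 stays zero
        regs[f["rd"]] = alu(regs[f["rs1"]], regs[f["rs2"]])
    return name


regs = [0] * 32
regs[8], regs[6] = 0b1100, 0b1010
fields = decode_r_type(0b00000000011001000100001000110011)
name = execute_r_type(0b00000000011001000100001000110011, regs)
print(name, fields["rd"], fields["rs1"], fields["rs2"], regs[4])  # XOR 4 8 6 6
```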
Arithmetic and Logic
[Figure: datapath for arithmetic and logic instructions. PC, +4 adder, Prog. Mem, Reg. File, ALU, control; register indices are 5 bits each]
R-Type (2): Shift Instructions
00000000011000100001010000110011
funct7   rs2      rs1      funct3   rd       op
7 bits   5 bits   5 bits   3 bits   5 bits   7 bits
op        funct3   mnemonic           description
0110011   001      SLL rd, rs1, rs2   R[rd] = R[rs1] << R[rs2]
0110011   101      SRL rd, rs1, rs2   R[rd] = R[rs1] >>> R[rs2] (zero ext.)
0110011   101      SRA rd, rs1, rs2   R[rd] = R[rs1] >>> R[rs2] (sign ext.)
SRL and SRA share funct3 = 101; funct7 tells them apart (0000000 for SRL, 0100000 for SRA).
Shift
[Figure: datapath for shift instructions. PC, +4 adder, Prog. Mem, Reg. File, ALU, control; register indices are 5 bits each]
I-Type (1): Arithmetic w/ immediates
00000000010100101000001010010011
imm     | rs1    | funct3 | rd     | op
12 bits | 5 bits | 3 bits | 5 bits | 7 bits
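A sketch of decoding the example word on this slide, including sign extension of the 12-bit immediate. The function names sign_extend and decode_i_type are illustrative only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_i_type(word):
    """Split a 32-bit word into I-type fields; imm is sign-extended from 12 bits."""
    return {
        "imm": sign_extend((word >> 20) & 0xFFF, 12),
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "op": word & 0x7F,
    }


fields = decode_i_type(0b00000000010100101000001010010011)
# opcode 0010011 with funct3 000 is addi, so this word is addi x5, x5, 5
print(fields)
print(sign_extend(0xFFF, 12))  # -1: an all-ones immediate is negative
```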
Arithmetic w/ immediates
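[Figure: datapath for arithmetic with immediates. PC, +4 adder, Prog. Mem, Reg. File, ALU, control, imm extend (12-bit immediate), shamt; register indices are 5 bits each]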
U-Type (1): "Load" Upper Immediate
00000000000000000101001010110111
imm     | rd     | op
20 bits | 5 bits | 7 bits
op        mnemonic      description
0110111   LUI rd, imm   R[rd] = imm << 12
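A sketch of what LUI computes for the example word on this slide. In RISC-V the 20-bit immediate lands in bits 31:12 of the destination register, with the low 12 bits cleared. The names decode_u_type and lui are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def decode_u_type(word):
    """Split a 32-bit word into U-type fields."""
    return {
        "imm": (word >> 12) & 0xFFFFF,
        "rd": (word >> 7) & 0x1F,
        "op": word & 0x7F,
    }


def lui(imm20):
    """Value written to rd: the 20-bit immediate in the upper bits, low 12 bits zero."""
    return ((imm20 & 0xFFFFF) << 12) & MASK32


fields = decode_u_type(0b00000000000000000101001010110111)  # lui x5, 5
value = lui(fields["imm"])
print(fields["rd"], hex(value))  # 5 0x5000
```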
Load Upper Immediate
[Figure: datapath for load upper immediate. PC, +4 adder, Prog. Mem, Reg. File, ALU, control, imm extend, shamt; register indices are 5 bits each]
Multiplication
Hardware Multiply: Sequential
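[Figure: sequential multiplier. Multiplicand register (32 bit) shifts left by 1 each step; Multiplier register (16 bit) shifts right by 1; when the multiplier's lsb is 1, a 32-bit adder adds the multiplicand into the Product register (32 bit, write enable we)]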
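The shift-and-add loop in the figure can be sketched in software. This is a behavioural model only, assuming unsigned operands and the register widths labelled in the figure; the function name and parameters are illustrative.

```python
def sequential_multiply(multiplicand, multiplier, multiplier_bits=16, product_bits=32):
    """Unsigned shift-and-add multiply: one multiplier bit is examined per step."""
    mask = (1 << product_bits) - 1
    multiplicand &= mask
    product = 0
    for _ in range(multiplier_bits):
        if multiplier & 1:                       # lsb == 1? then add (write enable)
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask  # multiplicand << 1
        multiplier >>= 1                           # multiplier >> 1
    return product


print(sequential_multiply(13, 11))  # 143
```

Each loop iteration corresponds to one clock step of the hardware, so a 16-bit multiplier takes 16 steps.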
Divider Circuit
[Figure: divider circuit. Divisor feeds a subtractor (Sub) with a >= 0 test; Dividend shifts msb-first into Remainder; Quotient shifts in 0 or 1 depending on the test result]
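One common way to realise such a circuit is restoring division. The sketch below assumes that scheme and unsigned operands; the slide itself only gives the block labels, so treat the details as an assumption.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (msb first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder - divisor >= 0:      # Sub, then test >= 0
            remainder -= divisor
            quotient = (quotient << 1) | 1  # shift in 1
        else:
            quotient = quotient << 1        # shift in 0, remainder unchanged
    return quotient, remainder


print(restoring_divide(100, 7))  # (14, 2)
```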
Shift and Rotation Instructions
• Left/right shifts are useful…
• Fast multiplication/division by small constants (next)
• Bit manipulation: extracting and setting individual bits in words
• Right shifts
• Can be logical (shift in 0s) or arithmetic (shift in copies of MSB)
srl 110011, 2 = 001100
sra 110011, 2 = 111100
• Caveat: for negative numbers, sra is not equal to division by a power of 2
• Consider: -1 / 16 = ? (sra by 4 gives -1, while division that truncates toward zero gives 0)
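A sketch of both right shifts on fixed-width words, reproducing the 6-bit examples above and the -1 / 16 caveat. The helper names srl and sra follow the mnemonics; the bits parameter is an addition for illustration.

```python
def srl(value, amount, bits=32):
    """Logical right shift: zeros are shifted in from the left."""
    return (value & ((1 << bits) - 1)) >> amount


def sra(value, amount, bits=32):
    """Arithmetic right shift: copies of the MSB are shifted in from the left."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):  # MSB set: reinterpret as a negative number
        value -= 1 << bits
    return (value >> amount) & mask  # Python's >> on negative ints is arithmetic


print(f"{srl(0b110011, 2, bits=6):06b}")  # 001100
print(f"{sra(0b110011, 2, bits=6):06b}")  # 111100

minus_one = 0xFFFFFFFF                    # -1 as a 32-bit word
print(hex(sra(minus_one, 4)))             # 0xffffffff, i.e. still -1
print(int(-1 / 16))                       # 0 when division truncates toward zero
```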
A Simple Shifter
• The simplest 16-bit shifter: can only shift left by 1
• Implement using wires (no logic!)
• Slightly more complicated: can shift left by 1 or 0
• Implement using wires and a multiplexor (mux16_2to1)
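[Figure: 16-bit input A (A0 … A15) and output O. Left: fixed <<1 wiring with 0 shifted into bit 0. Right: a mux selecting between A and A <<1]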
Barrel Shifter
• What about shifting left by any amount 0–15?
[Figure: A passes through four mux stages, <<8, <<4, <<2, <<1, selected by shift[3], shift[2], shift[1], shift[0], to produce O]
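The staged structure can be modelled directly: each bit of the shift amount enables one fixed shift. This is a behavioural sketch of a 16-bit left barrel shifter; the function name and the width parameter are illustrative.

```python
def barrel_shift_left(a, shift, width=16):
    """Shift a left by 0..15 using four fixed stages: <<8, <<4, <<2, <<1."""
    mask = (1 << width) - 1
    out = a & mask
    for stage in (3, 2, 1, 0):
        if (shift >> stage) & 1:               # shift[stage] selects the mux input
            out = (out << (1 << stage)) & mask  # fixed shift by 8, 4, 2 or 1
    return out


print(hex(barrel_shift_left(0xABCD, 4)))  # 0xbcd0
```

Four 2-to-1 mux stages cover all 16 shift amounts, instead of one 16-to-1 mux per output bit.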
Single-Cycle Performance
Single-Cycle Datapath Performance - Sequential
[Figure: single-cycle datapath. PC, +4 adder, << 2 shifter, Insn Mem, Register File (s1, s2, d), sign extend (SX), ALU, Data Mem (a, d)]
RISC Pipelining
• Most instructions are register to register
• Arithmetic/logic instruction:
• I: Instruction fetch
• E: Execute (ALU operation with register input and output)
• Load/store instruction:
• I: Instruction fetch
• E: Execute (calculate memory address, see virtual memory)
• D: Memory (register to memory or memory to register
operation)
Foreshadowing: Pipelined Datapath
[Figure: pipelined datapath. PC, +4 adder, << 2 shifter, Insn Mem, Register File (s1, s2, d), sign extend (SX), ALU, Data Mem, separated by pipeline latches holding PC, IR, A, B, O, D]
Pipelined 1 2 3 4 5 6 7
LOAD rA, m1 I E D
LOAD rB, m2 I E D
ADD rC, rA, rB I E
STORE m3, rC I E D
Optimization of Pipelining
• Code reorganization techniques to reduce data and
branch dependencies
• Delayed branch
• Does not take effect until the execution of following
instruction
• This following instruction is the delay slot
• More successful with unconditional branch
• 1st approach: insert NOOP (prevents fetching instr., no
pipeline flush and delays the effect of jump)
• 2nd approach: reorder instructions
Normal and Delayed Branch
Address | Normal branch | 1st Delayed branch | 2nd Delayed branch
100     | LOAD rA, X    | LOAD rA, X         | LOAD rA, X
101     | ADD rA, #1    | ADD rA, #1         | JUMP 105
102     | JUMP 105      | JUMP 106           | ADD rA, #1
103     | ADD rA, rB    | NOOP               | ADD rA, rB
104     | SUB rC, rB    | ADD rA, rB         | SUB rC, rB
105     | STORE Z, rA   | SUB rC, rB         | STORE Z, rA
106     |               | STORE Z, rA        |
Use of Delayed Branch
Normal branch 1 2 3 4 5 6 7 8
100. LOAD rA, X I E D
101. ADD rA, #1 I E
102. JUMP 105 I E
103. ADD rA, rB I
105. STORE Z, rA I E D
Delayed branch 1 2 3 4 5 6
100. LOAD rA, X I E D
102. JUMP 105 I E
101. ADD rA, #1 I E
105. STORE Z, rA I E D
Goals for today
Memory
• CPU: Register Files (i.e. Memory w/in the CPU)
• Scaling Memory: Tri-state devices
• Cache: SRAM (Static RAM—random access memory)
• Memory: DRAM (Dynamic RAM)
Last time: How do we store one bit?
[Figure: D flip-flop with inputs D and clk and output Q]
D Flip Flop stores 1 bit
Goal for today
[Figure: single-cycle datapath. PC, +4 adders, data memory (addr, din, dout), control, imm extend, offset, comparator (=?, cmp), new pc, target]
Register File
• N read/write registers
• Indexed by register number
[Figure: Dual-Read-Port Single-Write-Port 32 x 32 Register File. 32-bit write data DW, 32-bit read outputs QA and QB, 1-bit W, 5-bit RW, RA, RB]
Register File
Recall: Register
• D flip-flops in parallel
• shared clock
• extra clocked inputs: write_enable, reset, …
[Figure: four D flip-flops D0 … D3 sharing clk, drawn as a 4-bit reg with 4-bit input and output]
Register File
Recall: Register
• D flip-flops in parallel
• shared clock
• extra clocked inputs: write_enable, reset, …
[Figure: D flip-flops sharing clk, drawn as a 32-bit reg with 32-bit input and output]
Register File
• N read/write registers
• Indexed by register number
[Figure: 32-bit data input D fans out to Reg 0 … Reg 31; a 5-to-32 decoder driven by 5-bit RW and write enable W selects the register to write. Example: addi x1, x0, 10 sets RW = 00001]
How to write to one register in the register file?
• Need a decoder
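Aside: 3-to-8 decoder truth table & circuit
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
0  0  0  | 1  0  0  0  0  0  0  0
0  0  1  | 0  1  0  0  0  0  0  0
0  1  0  | 0  0  1  0  0  0  0  0
0  1  1  | 0  0  0  1  0  0  0  0
1  0  0  | 0  0  0  0  1  0  0  0
1  0  1  | 0  0  0  0  0  1  0  0
1  1  0  | 0  0  0  0  0  0  1  0
1  1  1  | 0  0  0  0  0  0  0  1
[Figure: 3-to-8 decoder driven by 3-bit RW (e.g. 001); gates shown for o0 and o5, each an AND of i2, i1, i0 with the appropriate inputs inverted]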
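The truth table corresponds to eight AND gates, one per output, each fed the inputs or their complements. A gate-level sketch with an illustrative function name:

```python
def decode_3_to_8(i2, i1, i0):
    """Return [o0, ..., o7]; exactly one output is 1 for each input combination."""
    n2, n1, n0 = 1 - i2, 1 - i1, 1 - i0  # inverted inputs
    return [
        n2 & n1 & n0,  # o0: input 000
        n2 & n1 & i0,  # o1: input 001
        n2 & i1 & n0,  # o2: input 010
        n2 & i1 & i0,  # o3: input 011
        i2 & n1 & n0,  # o4: input 100
        i2 & n1 & i0,  # o5: input 101
        i2 & i1 & n0,  # o6: input 110
        i2 & i1 & i0,  # o7: input 111
    ]


print(decode_3_to_8(0, 0, 1))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

A 5-to-32 decoder for the register file follows the same pattern with five inputs and 32 AND gates.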
Register File
• N read/write registers
• Indexed by register number
[Figure: 32-bit outputs of Reg 0 … Reg 31 feed a MUX that produces 32-bit QA; RA and RB are 5-bit read indices]
• Need a multiplexor
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: 32-bit D input, 5-to-32 decoder driven by W and 5-bit RW, Reg 0 … Reg 31, and two 32-bit MUXes selected by 5-bit RA and RB producing QA and QB]
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: Dual-Read-Port Single-Write-Port 32 x 32 Register File. 32-bit write data DW, 32-bit read outputs QA and QB, 1-bit W, 5-bit RW, RA, RB]
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
What happens if the same register is read and written during the same clock cycle?
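A behavioural sketch of the register file described above, with two read ports and one write port. The class and method names are hypothetical. In this model the reads are evaluated before the write within a cycle, so a same-cycle read returns the old value; that ordering is an assumption of the sketch, not something the slides specify.

```python
class RegisterFile:
    """Dual-read-port, single-write-port register file (behavioural model)."""

    def __init__(self, count=32, width=32):
        self.regs = [0] * count
        self.mask = (1 << width) - 1

    def read(self, ra, rb):
        """Two read ports: each acts as a mux selecting one register."""
        return self.regs[ra], self.regs[rb]

    def write(self, rw, data, we):
        """One write port: the decoder enables only register rw when we is 1."""
        if we and rw != 0:  # x0 stays wired to zero
            self.regs[rw] = data & self.mask

    def cycle(self, ra, rb, rw, data, we):
        """One clock cycle: reads see the values held before the write (assumed)."""
        outputs = self.read(ra, rb)
        self.write(rw, data, we)
        return outputs


rf = RegisterFile()
rf.write(1, 10, we=1)                                       # x1 = 10
old_a, old_b = rf.cycle(ra=1, rb=0, rw=1, data=99, we=1)    # read x1 while writing x1
new_a, _ = rf.read(1, 0)
print(old_a, old_b, new_a)  # 10 0 99
```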
Tradeoffs
[Figure: an 8-to-1 mux]
RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers, shift amount
✔ • I-type: result and source register, shift amount in 12-bit
immediate with sign extension
• U-type: result register, 20-bit immediate
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• J-type (a U-type variant): jump-and-link
• I-type: jump-and-link register
• B-type (an S-type variant): conditional branches: pc-relative addresses
Summary
We have all that it takes to build a processor!
• Arithmetic Logic Unit (ALU)
• Register File
• Memory