Risc PPT Final v1

Uploaded by

Vaishnavi Pandurangan

RISC_processor_HDL_verilog
Computer Architecture
• Computer architecture
• Definition of ISA to facilitate implementation of
software layers
• The hardware/software interface

• Computer micro-architecture
• Design processor, memory, I/O to implement ISA
• Efficiently implementing the interface

The Next Step – Simple RISC Processor
• Reduced Instruction Set Computer

• Key features
• Large number of general purpose registers
or use of compiler technology to optimize register use
• Limited and simple instruction set
• Emphasis on optimising the instruction pipeline
Simple RISC Processor?
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, separated by pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks: PC, +4 adder, instruction memory, register file, immediate extend, ALU, jump/branch target computation, data memory (din/dout), forward unit, hazard detection, control.]
Reduced Instruction Set Computer
RISC-V = Reduced Instruction Set Computer (RISC)
• ≈ 200 instructions, 32 bits each, 4 core formats (R, I, S, U; the B and J formats are immediate-encoding variants of S and U)
• all operands in registers
• almost all are 32 bits each
• ≈ 1 addressing mode: Mem[reg + imm]

x86 = Complex Instruction Set Computer (CISC)
• > 1000 instructions, 1 to 15 bytes each
• operands in dedicated registers, general purpose registers, memory, on stack, …
• can be 1, 2, 4, 8 bytes, signed or unsigned
• 10s of addressing modes
• e.g. Mem[segment + reg + reg*scale + offset]
Comparison of RISC and CISC (x86)
Parameter | RISC | CISC
Instruction Set Size | ≈ 200 instructions | > 1000 instructions
Instruction Length | 32 bits each | 1 to 15 bytes each
Instruction Formats | 4 formats | N/A (variable-length instructions)
Operand Location | All operands in registers | Operands in dedicated registers, general purpose registers, memory, on stack, etc.
Operand Size | Almost all are 32 bits each | Can be 1, 2, 4, 8 bytes, signed or unsigned
Addressing Modes | ≈ 1 addressing mode (Mem[reg + imm]) | 10s of addressing modes (e.g., Mem[segment + reg + reg*scale + offset])
Implications - RISC
• Best support is given by optimising most used and most time consuming features
• Large number of registers
• Operand referencing (assignments, locality)
• Careful design of pipelines
• Conditional branches and procedures
• Simplified (reduced) instruction set - for optimization of pipelining and efficient use of registers
Why CISC (1)?
• Compiler simplification?
• Disputed…
• Complex machine instructions harder to exploit
• Optimization more difficult
• Smaller programs?
• Program takes up less memory but…
• Memory is now cheap
• May not occupy fewer bits, just look shorter in symbolic form
• More instructions require longer op-codes
• Register references require fewer bits
Why CISC (2)?
• Faster programs?
• Bias towards use of simpler instructions
• More complex control unit, thus even simple instructions
take longer to execute

• It is far from clear that CISC is the appropriate solution

What is Superscalar?
• Common instructions (arithmetic, load/store, conditional
branch) can be initiated simultaneously and executed
independently
• Applicable to both RISC & CISC
Why Superscalar?
• Most operations are on scalar quantities
• Improve these operations by executing them concurrently in
multiple pipelines
• Requires multiple functional units
• Requires re-arrangement of instructions
General Superscalar Organization
Limitations
• Instruction level parallelism: the degree to which instructions can, in theory, be executed in parallel
• To achieve it:
• Compiler based optimisation
• Hardware techniques
• Limited by
• Data dependency
• Procedural dependency
• Resource conflicts
The RISC Tenets
RISC | CISC
Single-cycle execution | many multicycle operations
Hardwired control | microcoded multi-cycle operations
Load/store architecture | register-mem and mem-mem
Few memory addressing modes | many modes
Fixed-length insn format | many formats and lengths
Reliance on compiler optimizations | hand assemble to get good performance
Many registers (compilers are better at using them) | few registers
Comparison of RISC vs CISC vs SuperScalar
Instruction Set Architecture (ISA)
Different CPU architectures specify different instructions

Two classes of ISAs

• Reduced Instruction Set Computers (RISC)
IBM Power PC, Sun Sparc, MIPS, Alpha
• Complex Instruction Set Computers (CISC)
Intel x86, PDP-11, VAX

Another ISA classification: Load/Store Architecture

• Data must be in registers to be operated on
For example: array[x] = array[y] + array[z]
1 add ? OR 2 loads, an add, and a store ?
• Keeps HW simple → many RISC ISAs are load/store
RISC Arithmetic Instructions
1. Addition (ADD):
• add rd, rs1, rs2 (rd = rs1 + rs2)
2. Subtraction (SUB):
• sub rd, rs1, rs2 (rd = rs1 - rs2)
3. Multiplication (MUL):
• mul rd, rs1, rs2 (rd = low XLEN bits of rs1 * rs2)
4. Division (DIV):
• div rd, rs1, rs2 (rd = rs1 / rs2, signed)
5. Remainder (REM):
• rem rd, rs1, rs2 (rd = rs1 % rs2, signed)
6. Multiply High (MULH, MULHU, MULHSU):
• mulh rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, signed × signed)
• mulhu rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, unsigned × unsigned)
• mulhsu rd, rs1, rs2 (rd = (rs1 * rs2) >> XLEN, signed rs1 × unsigned rs2)
• MUL, MULH*, DIV and REM belong to the "M" standard extension, not to the base integer ISA
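The difference between MUL and the three multiply-high variants is easiest to see on concrete values. The following is a minimal Python sketch for XLEN = 32; the helper names (`to_signed`, `mul`, `mulh`, `mulhu`, `mulhsu`) are illustrative and not part of any toolchain.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def to_signed(x):
    """Interpret the low XLEN bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def mul(a, b):
    """Low XLEN bits of the product (identical for signed and unsigned)."""
    return (a * b) & MASK


def mulh(a, b):
    """High XLEN bits of signed x signed."""
    return ((to_signed(a) * to_signed(b)) >> XLEN) & MASK


def mulhu(a, b):
    """High XLEN bits of unsigned x unsigned."""
    return (((a & MASK) * (b & MASK)) >> XLEN) & MASK


def mulhsu(a, b):
    """High XLEN bits of signed a x unsigned b."""
    return ((to_signed(a) * (b & MASK)) >> XLEN) & MASK


# 0xFFFFFFFF is -1 when read as signed and 2**32 - 1 when read as unsigned.
print(hex(mul(0xFFFFFFFF, 2)), hex(mulh(0xFFFFFFFF, 2)), hex(mulhu(0xFFFFFFFF, 2)))
```

The same bit pattern gives different upper halves depending on how each operand is interpreted, which is why three separate instructions exist.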
RISC Logical Instructions
1. Bitwise AND (AND):
• and rd, rs1, rs2 (rd = rs1 & rs2)
2. Bitwise OR (OR):
• or rd, rs1, rs2 (rd = rs1 | rs2)
3. Bitwise XOR (XOR):
• xor rd, rs1, rs2 (rd = rs1 ^ rs2)
4. Bitwise NOT (NOT):
• not rd, rs1 (rd = ~rs1; a pseudo-instruction for xori rd, rs1, -1)
5. Shift Left Logical (SLL):
• sll rd, rs1, rs2 (rd = rs1 << (rs2 % XLEN))
6. Shift Right Logical (SRL):
• srl rd, rs1, rs2 (rd = rs1 >> (rs2 % XLEN), zeros shifted in)
7. Shift Right Arithmetic (SRA):
• sra rd, rs1, rs2 (rd = rs1 >> (rs2 % XLEN), copies of the sign bit shifted in)
Instruction Processing
[Figure: PC with +4 adder, program memory, register file (three 5-bit register indices), ALU, data memory, control]
Instructions: stored in memory, encoded in binary
00100000000000100000000000001010
00100000000000010000000000000000
00000000001000100001100000101010
A basic processor
• fetches
• decodes
• executes
one instruction at a time
Levels of Interpretation: Instructions
High Level Language
for (i = 0; i < 10; i++)
    printf("go cucs");
• HDL, C, Java, Python, ADA, …
• Loops, control flow, variables

Assembly Language
main: addi x2, x0, 10
      addi x1, x0, 0
loop: slt x3, x1, x2
      ...
• No symbols (except labels)
• One operation per statement
• "human readable machine language"

Machine Language
00000000101000000000000100010011   (imm = 10, rs1 = x0, rd = x2, op = addi)
00000000000000000000000010010011
00000000001000001010000110110011
• Binary-encoded assembly
• Labels become addresses
• The language of the CPU

Instruction Set Architecture
Machine Implementation (Microarchitecture)
• ALU, Control, Register File, …
ISA-Architecture
HDL (input to the compiler):
int x = 10;
x = 2 * x + 15;
RISC-V assembly (input to the assembler):
addi x5, x0, 10     # x5 = x0 + 10 (x0 = 0)
slli x5, x5, 1      # x5 = x5 << 1, i.e. x5 = x5 * 2
addi x5, x5, 15     # x5 = x5 + 15
Machine code (executed by the CPU):
00000000101000000000001010010011   # imm = 10, rs1 = x0, rd = x5, op = addi
00000000000100101001001010010011   # shamt = 1, rs1 = x5, rd = x5, op = slli
00000000111100101000001010010011   # imm = 15, rs1 = x5, rd = x5, op = addi
Layers below the machine code: CPU, Circuits, Gates, Transistors, Silicon
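The assembler step can be checked by packing the I-type fields by hand. The sketch below assumes the multiply by 2 is realized as `slli x5, x5, 1` (RISC-V has no `muli` instruction); `encode_i` is a hypothetical helper name, and the field positions follow the I-type layout imm[11:0] | rs1 | funct3 | rd | opcode.

```python
OP_IMM = 0b0010011  # opcode shared by addi and slli


def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack I-type fields into one 32-bit instruction word."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


program = [
    encode_i(10, 0, 0b000, 5, OP_IMM),  # addi x5, x0, 10
    encode_i(1, 5, 0b001, 5, OP_IMM),   # slli x5, x5, 1 (shamt in the low immediate bits)
    encode_i(15, 5, 0b000, 5, OP_IMM),  # addi x5, x5, 15
]

for word in program:
    print(f"{word:032b}")
```

The first and third words printed match the `addi` encodings shown on the slide.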
Big Picture: Where are we going?
High Level Languages: HDL (input to the compiler)
int x = 10;
x = 2 * x + 15;
RISC-V assembly (input to the assembler):
addi x5, x0, 10
slli x5, x5, 1
addi x5, x5, 15
Machine code:
00000000101000000000001010010011
00000000000100101001001010010011
00000000111100101000001010010011
Instruction Set Architecture (ISA)
Layers below the ISA: CPU, Circuits, Gates, Transistors, Silicon
Single-Cycle RISC-V Datapath
Big Picture: Building a Processor
[Figure: a single cycle processor. PC with +4 adders, instruction memory, register file, immediate extend, ALU, comparator (=?), branch offset/target logic selecting the new PC, control, and data memory (addr, din, dout).]
• Understanding the basics of a processor
• We now have the technology to build a CPU!

• Putting it all together:

• Arithmetic Logic Unit (ALU)
• Register File
• Memory
• SRAM: cache
• DRAM: main memory
• RISC-V Instructions & how they are executed

RISC-V Register File
• RISC-V register file
• 32 registers, 32-bits each
• x0 wired to zero
• Write port indexed via RW
• on falling edge when WE=1
• Read ports indexed via RA, RB
[Figure: 32 x 32 register file, dual read port, single write port. Inputs: DW (32 bits), WE (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each). Internally registers x0, x1, …, x31.]
• RISC-V register file
• Numbered from 0 to 31
• Can be referred by number: x0, x1, x2, … x31
• Convention, each register also has a name:
• x10 – x17 → a0 – a7, x28 – x31 → t3 – t6
RISC-V Memory
[Figure: memory block with 32-bit Din and Dout, 32-bit addr, 2-bit mc and enable E; one byte per address, with byte addresses 0x00000005 through 0x0000000b and 0x000fffff shown and one location holding 0x05]
• 32-bit address
• 32-bit data (but byte addressed)
• Enable + 2 bit memory control (mc)
00: read word (4 byte aligned)
01: write byte
10: write halfword (2 byte aligned)
11: write word (4 byte aligned)
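The memory interface above can be modelled behaviourally. This is a sketch under stated assumptions: the class name `Memory`, the method `access` and the little-endian byte order (the order RISC-V uses) are choices made here, not details from the slide.

```python
class Memory:
    """Byte-addressed memory with a 32-bit data port and a 2-bit control (mc)."""

    READ_WORD, WRITE_BYTE, WRITE_HALF, WRITE_WORD = 0b00, 0b01, 0b10, 0b11

    def __init__(self, size):
        self.bytes = bytearray(size)

    @staticmethod
    def _check(addr, width):
        # Word and halfword accesses must be naturally aligned.
        if addr % width:
            raise ValueError(f"address {addr:#x} is not {width}-byte aligned")

    def access(self, addr, mc, din=0, enable=True):
        """Return the word read for mc == 00, otherwise store din and return None."""
        if not enable:
            return None
        if mc == self.READ_WORD:
            self._check(addr, 4)
            return int.from_bytes(self.bytes[addr:addr + 4], "little")
        width = {self.WRITE_BYTE: 1, self.WRITE_HALF: 2, self.WRITE_WORD: 4}[mc]
        self._check(addr, width)
        value = din & ((1 << (8 * width)) - 1)  # keep only the bytes being written
        self.bytes[addr:addr + width] = value.to_bytes(width, "little")
        return None


mem = Memory(16)
mem.access(4, Memory.WRITE_WORD, 0x11223344)
mem.access(5, Memory.WRITE_BYTE, 0xAA)
print(hex(mem.access(4, Memory.READ_WORD)))
```

A byte write touches only one of the four bytes of the word, so the word read back differs from the original in a single byte.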
Putting it all together: Basic Processor
A RISC-V CPU with a (modified) Harvard architecture
• Modified: instructions & data in common address space, separate instr/data caches can be accessed in parallel
[Figure: CPU (registers, control, ALU) connected by data, address and control lines to a program memory holding binary-encoded instructions and to a separate data memory]
Takeaway
A processor executes instructions
• Processor has some internal state in storage elements
(registers)
A memory holds instructions and data
• (modified) Harvard architecture: separate insts and data
• von Neumann architecture: combined inst and data
A bus connects the two

We now have enough building blocks to build machines that can perform non-trivial computational tasks
Takeaway
A RISC-V processor and its ISA (instruction set architecture) are an example of a Reduced Instruction Set Computer (RISC), where simplicity is key, thus enabling us to build it!
Next Goal
How are instructions executed?
What is the general datapath to execute an instruction?
Five-Stage RISC-V Datapath
Five Stages of RISC-V Datapath
[Figure: a single cycle processor divided into Fetch, Decode, Execute, Memory and WB regions: PC with +4 adder, program memory, register file (three 5-bit indices), ALU, data memory, control. This diagram is not 100% spatial.]
Five Stages of RISC-V Datapath
Basic CPU execution loop
1. Instruction Fetch
2. Instruction Decode
3. Execution (ALU)
4. Memory Access
5. Register Writeback

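The five steps can be traced in software. The sketch below is a deliberately tiny interpreter, not a full RV32I model: it assumes only `add`, `addi`, `lw` and `sw` appear, models data memory as a word-addressed dictionary, and uses hypothetical names (`run`, `sign_extend`).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def run(program, data, steps):
    """Execute `steps` instructions; program is a list of 32-bit words."""
    regs = [0] * 32
    pc = 0
    for _ in range(steps):
        # 1. Instruction Fetch
        inst = program[pc // 4]
        next_pc = pc + 4
        # 2. Instruction Decode (register reads happen here)
        opcode = inst & 0x7F
        rd = (inst >> 7) & 0x1F
        rs1 = (inst >> 15) & 0x1F
        rs2 = (inst >> 20) & 0x1F
        imm_i = sign_extend(inst >> 20, 12)
        imm_s = sign_extend(((inst >> 25) << 5) | rd, 12)  # S-type splits the immediate
        a, b = regs[rs1], regs[rs2]
        # 3. Execution (ALU)
        if opcode == 0x33:            # add (funct3/funct7 assumed to be 0)
            alu = a + b
        elif opcode in (0x13, 0x03):  # addi, or address for lw
            alu = a + imm_i
        elif opcode == 0x23:          # address for sw
            alu = a + imm_s
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")
        alu &= 0xFFFFFFFF
        # 4. Memory Access (loads and stores only)
        result = alu
        if opcode == 0x03:
            result = data.get(alu, 0)
        elif opcode == 0x23:
            data[alu] = b
        # 5. Register Writeback (stores write nothing; x0 stays zero)
        if opcode != 0x23 and rd != 0:
            regs[rd] = result
        pc = next_pc
    return regs


program = [
    0x00500093,  # addi x1, x0, 5
    0x00700113,  # addi x2, x0, 7
    0x002081B3,  # add  x3, x1, x2
    0x00302423,  # sw   x3, 8(x0)
    0x00802203,  # lw   x4, 8(x0)
]
data = {}
regs = run(program, data, steps=len(program))
print(regs[1:5], data)
```

Branches and jumps are left out, so the PC update is always PC + 4 in this sketch.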
Stage 1: Instruction Fetch
[Figure: five-stage datapath: Fetch, Decode, Execute, Memory, WB]
Fetch 32-bit instruction from memory
Increment PC = PC + 4
Stage 2: Instruction Decode
[Figure: five-stage datapath: Fetch, Decode, Execute, Memory, WB]
Gather data from the instruction
Read opcode; determine instruction type, field lengths
Read in data from register file
(0, 1, or 2 reads for jump, addi, or add, respectively)
Stage 3: Execution (ALU)
[Figure: five-stage datapath: Fetch, Decode, Execute, Memory, WB]
Useful work done here (+, -, *, /), shift, logic operation, comparison (slt)
Load/Store? lw x2, 32(x3) → Compute address
Stage 4: Memory Access
[Figure: five-stage datapath: Fetch, Decode, Execute, Memory, WB; the ALU output drives the data memory address (addr), with data in/out and R/W control]
Used by load and store instructions only
Other instructions will skip this stage
Stage 5: Writeback
[Figure: five-stage datapath: Fetch, Decode, Execute, Memory, WB]
Write to register file
• For arithmetic ops, logic, shift, etc, load. What about stores?
Update PC
• For branches, jumps
Takeaway
• The datapath for a RISC-V processor has five stages:
1. Instruction Fetch
2. Instruction Decode
3. Execution (ALU)
4. Memory Access
5. Register Writeback

• This five stage datapath is used to execute all RISC-V instructions

Next Goal
• Specific datapaths for RISC-V instructions
Instruction Types
• Arithmetic
• add, subtract, shift left, shift right, multiply, divide
• Memory
• load value from memory to a register
• store value to memory from a register
• Control flow
• conditional jumps (branches)
• jump and link (subroutine call)

• Many other instructions are possible

• vector add/sub/mul/div, string operations
• manipulate coprocessor
• I/O

RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers (shift amount taken from a register)
• I-type: result and source register, 12-bit sign-extended immediate (also holds the shift amount for immediate shifts)
• U-type: result register, 20-bit upper immediate
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• U-type (J variant): jump-and-link
• I-type: jump-and-link register
• S-type (B variant): conditional branches: pc-relative addresses
RISC-V instruction formats
All RISC-V instructions are 32 bits long, have 4 formats
• R-type: funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
• I-type: imm (12 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
• S-type: imm (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | imm (5 bits) | op (7 bits)
• U-type: imm (20 bits) | rd (5 bits) | op (7 bits)
R-Type (1): Arithmetic and Logic
00000000011001000100001000110011
funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
op | funct3 | mnemonic | description
0110011 | 000 | ADD rd, rs1, rs2 | R[rd] = R[rs1] + R[rs2]
0110011 | 000 | SUB rd, rs1, rs2 | R[rd] = R[rs1] – R[rs2] (funct7 = 0100000; the other rows use funct7 = 0000000)
0110011 | 110 | OR rd, rs1, rs2 | R[rd] = R[rs1] | R[rs2]
0110011 | 100 | XOR rd, rs1, rs2 | R[rd] = R[rs1] ^ R[rs2]
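Decoding the example word on this slide field by field shows how the table is used. The sketch below covers only the four instructions listed; `decode_r`, `disassemble_r` and `R_TYPE_OPS` are illustrative names.

```python
def decode_r(word):
    """Split a 32-bit R-type word into its six fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


# (funct7, funct3) -> mnemonic; ADD and SUB differ only in funct7.
R_TYPE_OPS = {
    (0b0000000, 0b000): "add",
    (0b0100000, 0b000): "sub",
    (0b0000000, 0b110): "or",
    (0b0000000, 0b100): "xor",
}


def disassemble_r(word):
    """Return the assembly text for one of the R-type instructions above."""
    f = decode_r(word)
    if f["opcode"] != 0b0110011:
        raise ValueError("not an R-type arithmetic/logic instruction")
    name = R_TYPE_OPS[(f["funct7"], f["funct3"])]
    return f"{name} x{f['rd']}, x{f['rs1']}, x{f['rs2']}"


print(disassemble_r(0b00000000011001000100001000110011))
```

For the word on the slide this yields an XOR with rd = x4, rs1 = x8 and rs2 = x6.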
Arithmetic and Logic
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB) for R-type arithmetic and logic; the Memory stage is skipped]
R-Type (2): Shift Instructions
00000000011000100001010000110011
funct7 (7 bits) | rs2 (5 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
op | funct3 | mnemonic | description
0110011 | 001 | SLL rd, rs1, rs2 | R[rd] = R[rs1] << R[rs2]
0110011 | 101 | SRL rd, rs1, rs2 | R[rd] = R[rs1] >> R[rs2] (zero ext.)
0110011 | 101 | SRA rd, rs1, rs2 | R[rd] = R[rs1] >>> R[rs2] (sign ext.; funct7 = 0100000)
Shift
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB) for shift instructions; the Memory stage is skipped]
I-Type (1): Arithmetic w/ immediates
00000000010100101000001010010011
imm (12 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | op (7 bits)
op | funct3 | mnemonic | description
0010011 | 000 | ADDI rd, rs1, imm | R[rd] = R[rs1] + sign_extend(imm)
0010011 | 111 | ANDI rd, rs1, imm | R[rd] = R[rs1] & sign_extend(imm)
0010011 | 110 | ORI rd, rs1, imm | R[rd] = R[rs1] | sign_extend(imm)
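In RISC-V the 12-bit I-type immediate is sign-extended for ADDI, ANDI and ORI alike, which matters as soon as bit 11 of the immediate is set. A small sketch, with illustrative function names and 32-bit registers assumed:

```python
MASK32 = 0xFFFFFFFF


def sign_extend12(imm):
    """Sign-extend a 12-bit immediate to a Python int."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def addi(rs1_val, imm):
    return (rs1_val + sign_extend12(imm)) & MASK32


def andi(rs1_val, imm):
    return rs1_val & (sign_extend12(imm) & MASK32)


def ori(rs1_val, imm):
    return (rs1_val | sign_extend12(imm)) & MASK32


# The slide's example word encodes addi x5, x5, 5.
print(addi(5, 5), hex(addi(0, 0xFFF)), hex(ori(0, 0x800)))
```

An immediate of 0xFFF therefore acts as -1: adding it decrements, and AND-ing with it leaves the register unchanged.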
Arithmetic w/ immediates
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB) for I-type arithmetic: the 12-bit immediate (or shamt) passes through the extend unit to the ALU; the Memory stage is skipped]
U-Type (1): "Load" Upper Immediate
00000000000000000101001010110111
imm (20 bits) | rd (5 bits) | op (7 bits)
op | mnemonic | description
0110111 | LUI rd, imm | R[rd] = imm << 12
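In RISC-V, LUI places its 20-bit immediate in bits 31:12 of the destination, and a following ADDI supplies the low 12 bits. Because ADDI sign-extends, the upper part has to be adjusted when bit 11 of the constant is set. The sketch below shows that split; `split_constant`, `lui` and `build` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20):
    """Value written by LUI: the 20-bit immediate shifted left by 12."""
    return (imm20 & 0xFFFFF) << 12


def split_constant(value):
    """Return (hi, lo) such that lui(hi) + lo == value modulo 2**32."""
    value &= MASK32
    lo = value & 0xFFF
    if lo & 0x800:
        lo -= 0x1000  # ADDI will sign-extend, so compensate in hi
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def build(value):
    """Rebuild a 32-bit constant the way a lui + addi pair would."""
    hi, lo = split_constant(value)
    return (lui(hi) + lo) & MASK32


print(hex(lui(5)), [hex(x) for x in split_constant(0x12345678)])
```

For the example word on the slide (`lui x5, 5`) the register ends up holding 0x5000.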
Load Upper Immediate
[Figure: five-stage datapath (Fetch, Decode, Execute, Memory, WB) for LUI: the 20-bit immediate is shifted left by 12 and written to rd (0x5000 for lui x5, 5); the Memory stage is skipped]
Multiplication
Hardware Multiply: Sequential
[Figure: multiplicand register (32 bit, shifted left by 1), multiplier register (16 bit, shifted right by 1), 32-bit adder, and product register (32 bit) whose write enable is driven by the test "lsb == 1?"]
• Control: repeat 16 times
• If least significant bit of multiplier is 1…
• Then add multiplicand to product
• Shift multiplicand left by 1
• Shift multiplier right by 1
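The control loop above translates almost line for line into software. A minimal sketch, assuming a 32-bit product register that wraps on overflow; the function name `sequential_multiply` is illustrative.

```python
def sequential_multiply(multiplicand, multiplier, steps=16):
    """Shift-and-add multiply: one add (at most) and two shifts per step."""
    product = 0
    multiplicand &= 0xFFFFFFFF
    multiplier &= (1 << steps) - 1
    for _ in range(steps):
        if multiplier & 1:  # least significant bit of multiplier is 1
            product = (product + multiplicand) & 0xFFFFFFFF
        multiplicand = (multiplicand << 1) & 0xFFFFFFFF  # shift multiplicand left
        multiplier >>= 1                                 # shift multiplier right
    return product


print(sequential_multiply(1234, 567))
```

Each loop iteration corresponds to one clock cycle of the hardware, so a 16-bit multiplier costs 16 cycles regardless of the operand values.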
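Division
Divider Circuit
[Figure: divisor register, subtractor with a ">= 0" test, remainder and dividend shift registers (the dividend msb shifts into the remainder, 0 shifts into the dividend), and a quotient register that shifts in 0 or 1]
• N cycles for n-bit divide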
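One way to read the divider circuit is as shift-and-subtract (restoring) division, where each cycle produces one quotient bit. That reading is an assumption about the figure, and the sketch below, with the hypothetical name `restoring_divide`, handles unsigned operands only.

```python
def restoring_divide(dividend, divisor, n=32):
    """Unsigned n-bit division, one quotient bit per iteration."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder, quotient = 0, 0
    for i in range(n - 1, -1, -1):
        # Shift the next dividend bit (msb first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:  # the trial subtraction would be >= 0
            remainder -= divisor
            quotient = (quotient << 1) | 1  # shift in 1
        else:
            quotient <<= 1                  # shift in 0
    return quotient, remainder


print(restoring_divide(100, 7))
```

The loop runs n times, matching the "N cycles for n-bit divide" cost stated above.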
Shifts & Rotates
Shift and Rotation Instructions
• Left/right shifts are useful…
• Fast multiplication/division by small constants (next)
• Bit manipulation: extracting and setting individual bits in words
• Right shifts
• Can be logical (shift in 0s) or arithmetic (shift in copies of MSB)
srl 110011, 2 = 001100
sra 110011, 2 = 111100
• Caveat: for negative numbers, sra is not equal to division by a power of 2 (sra rounds toward negative infinity, division rounds toward zero)
• Consider: -1 / 16 = ?
• Rotations are less useful…
• But almost "free" if shifter is there
• MIPS and LC4 have only shifts, x86 has shifts and rotations
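The two right shifts and the division caveat can be checked directly. In this sketch `srl` and `sra` are illustrative functions over a configurable word width; note that Python's `//` rounds toward negative infinity, so `int(a / b)` is used to mimic the truncating division of C.

```python
def srl(value, shamt, width=32):
    """Logical right shift: zeros enter from the left."""
    return (value & ((1 << width) - 1)) >> shamt


def sra(value, shamt, width=32):
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    mask = (1 << width) - 1
    value &= mask
    if value >> (width - 1):  # negative in two's complement
        value -= 1 << width
    return (value >> shamt) & mask


print(f"{srl(0b110011, 2, 6):06b}", f"{sra(0b110011, 2, 6):06b}")

minus_one = 0xFFFFFFFF            # -1 as a 32-bit pattern
print(hex(sra(minus_one, 4)))     # still -1
print(int(-1 / 16))               # truncating division gives 0
```

So sra of -1 by 4 stays at -1, while -1 / 16 in C is 0, which answers the question posed on the slide.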
Compiler Opt: Strength Reduction
• Strength reduction: compilers will do this (sort of)
A * 4 = A << 2
A * 5 = (A << 2) + A
A / 8 = A >> 3 (only if A is unsigned)
• Useful for address calculation: all basic data types are 2^M in size
int A[100];
&A[N] = A+(N*sizeof(int)) = A+N*4 = A+(N<<2)
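These identities are easy to verify exhaustively over a small range. The sketch below uses illustrative function names and assumes 4-byte array elements, as in the `int A[100]` example; it also shows why the parentheses in `A+(N<<2)` matter, since `+` binds tighter than `<<` in both C and Python.

```python
def times4(a):
    return a << 2


def times5(a):
    return (a << 2) + a


def div8_unsigned(a):
    return a >> 3  # valid for non-negative a only


def element_address(base, n, elem_size_log2=2):
    """Address of element n in an array of 2**elem_size_log2-byte elements."""
    return base + (n << elem_size_log2)


print(element_address(100, 3))  # base + 3 * 4
print(100 + 3 << 2)             # parsed as (100 + 3) << 2, a different value
```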
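A Simple Shifter
• The simplest 16-bit shifter: can only shift left by 1
• Implement using wires (no logic!)
• Slightly more complicated: can shift left by 1 or 0
• Implement using wires and a multiplexor (mux16_2to1)
[Figure: input bits A0 … A15 wired one position to the left with 0 entering at bit 0 to form output O (block "<<1"); a second version adds a 2-to-1 multiplexor that selects between A and A << 1]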
Barrel Shifter
• What about shifting left by any amount 0–15?
• 16 consecutive "left-shift-by-1-or-0" blocks?
– Would take too long (how long?)
• Barrel shifter: 4 "shift-left-by-X-or-0" blocks (X = 1,2,4,8)
• What is the delay?
[Figure: input A passes through four blocks in series, <<8, <<4, <<2, <<1, enabled by shift[3], shift[2], shift[1], shift[0] respectively, producing output O]
• Similar barrel designs for right shifts and rotations
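The four-block structure can be modelled as four conditional shifts, one per bit of the shift amount. This is a behavioural sketch with an illustrative function name, not a gate-level description.

```python
def barrel_shift_left(a, shift, width=16):
    """Shift left by 0..15 using four shift-by-X-or-0 stages (X = 8, 4, 2, 1)."""
    mask = (1 << width) - 1
    value = a & mask
    for stage in (3, 2, 1, 0):
        if (shift >> stage) & 1:  # shift[stage] selects "shift by 2**stage"
            value = (value << (1 << stage)) & mask
    return value


print(hex(barrel_shift_left(0x0001, 13)))
```

Any amount from 0 to 15 passes through exactly four stages, so the delay is four multiplexor levels instead of the sixteen a chain of shift-by-1 blocks would need.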

Shifter in Verilog
• Logical shift operators << >>
• performs zero-extension for >>
wire [15:0] a = b << c[3:0];
• Arithmetic shift operator >>>
• performs sign-extension
• requires a signed wire input
wire signed [15:0] b;
wire [15:0] a = b >>> c[3:0];

Single-Cycle Performance
Single-Cycle Datapath Performance - Sequential
[Figure: single-cycle datapath: PC, +4 adder, instruction memory, register file (s1, s2, d), sign extend (SX), << 2 shifter with branch adder, ALU, data memory]
• One cycle per instruction (CPI = 1)
• Clock cycle time proportional to worst-case logic delay
• In this datapath: insn fetch, decode, register read, ALU, data memory access, write register
• Can we do better?
RISC Pipelining
• Most instructions are register to register
• Arithmetic/logic instruction:
• I: Instruction fetch
• E: Execute (ALU operation with register input and output)
• Load/store instruction:
• I: Instruction fetch
• E: Execute (calculate memory address, see virtual memory)
• D: Memory (register to memory or memory to register
operation)
Foreshadowing: Pipelined Datapath
[Figure: the single-cycle datapath split into stages by pipeline latches holding the PC, the instruction (IR), register operands (A, B), the ALU output (O) and loaded data (D)]
• Split datapath into multiple stages
• Assembly line analogy
• 5 stages results in up to 5x clock & performance improvement
Delay Slots in the Pipeline
Sequential 1 2 3 4 5 6 7 8 9 10 11
LOAD rA, m1 I E D
LOAD rB, m2 I E D
ADD rC, rA, rB I E
STORE m3, rC I E D

Pipelined 1 2 3 4 5 6 7
LOAD rA, m1 I E D
LOAD rB, m2 I E D
ADD rC, rA, rB I E
STORE m3, rC I E D
Optimization of Pipelining
• Code reorganization techniques to reduce data and
branch dependencies
• Delayed branch
• Does not take effect until the execution of following
instruction
• This following instruction is the delay slot
• More successful with unconditional branch
• 1st approach: insert NOOP (prevents fetching instr., no
pipeline flush and delays the effect of jump)
• 2nd approach: reorder instructions
Normal and Delayed Branch
Address | Normal branch | 1st Delayed branch | 2nd Delayed branch
100 | LOAD rA, X | LOAD rA, X | LOAD rA, X
101 | ADD rA, #1 | ADD rA, #1 | JUMP 105
102 | JUMP 105 | JUMP 106 | ADD rA, #1
103 | ADD rA, rB | NOOP | ADD rA, rB
104 | SUB rC, rB | ADD rA, rB | SUB rC, rB
105 | STORE Z, rA | SUB rC, rB | STORE Z, rA
106 | | STORE Z, rA |
Use of Delayed Branch
Normal branch 1 2 3 4 5 6 7 8
100. LOAD rA, X I E D
101. ADD rA, #1 I E
102. JUMP 105 I E
103. ADD rA, rB I
105. STORE Z, rA I E D

Delayed branch 1 2 3 4 5 6
100. LOAD rA, X I E D
102. JUMP 105 I E
101. ADD rA, #1 I E
105. STORE Z, rA I E D
Goals for today
Memory
• CPU: Register Files (i.e. Memory w/in the CPU)
• Scaling Memory: Tri-state devices
• Cache: SRAM (Static RAM—random access memory)
• Memory: DRAM (Dynamic RAM)
Last time: How do we store one bit?
[Figure: D flip-flop with inputs D and clk and output Q]
• A D flip-flop stores 1 bit
Goal for today
How do we store results from ALU computations?
How do we use stored results in subsequent operations?
How does a Register File work? How do we design it?

Register File
• N read/write registers
• Indexed by register number
[Figure: 32 x 32 register file, dual read port, single write port. Inputs: DW (32 bits), W (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each).]
Register File
Recall: Register
• D flip-flops in parallel
• shared clock
• extra clocked inputs: write_enable, reset, …
[Figure: a 4-bit register built from four D flip-flops (D0 … D3) sharing clk; the same structure with 32 flip-flops gives a 32-bit register]
Register File
• N read/write registers
• Indexed by register number
How to write to one register in the register file?
• Need a decoder
[Figure: the 32-bit data input D fans out to Reg 0 … Reg 31; a 5-to-32 decoder driven by RW (5 bits) and W selects the register to write. Example: addi x1, x0, 10 drives RW = 00001.]
Aside: 3-to-8 decoder truth table & circuit
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
0  0  0  | 1  0  0  0  0  0  0  0
0  0  1  | 0  1  0  0  0  0  0  0
0  1  0  | 0  0  1  0  0  0  0  0
0  1  1  | 0  0  0  1  0  0  0  0
1  0  0  | 0  0  0  0  1  0  0  0
1  0  1  | 0  0  0  0  0  1  0  0
1  1  0  | 0  0  0  0  0  0  1  0
1  1  1  | 0  0  0  0  0  0  0  1
[Figure: 3-to-8 decoder driven by a 3-bit RW (e.g. 001); gate-level circuits are shown for o0 and o5, each an AND of i2, i1 and i0 with inversions where needed]
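Each decoder output is a single AND of the three inputs, inverted where the corresponding index bit is 0. The sketch below writes out all eight gates on 0/1 values; the function name `decoder3to8` is illustrative.

```python
def decoder3to8(i2, i1, i0):
    """Return [o0, ..., o7]; exactly one output is 1 for any input."""
    n2, n1, n0 = 1 - i2, 1 - i1, 1 - i0  # inverted inputs
    return [
        n2 & n1 & n0,  # o0 selected by 000
        n2 & n1 & i0,  # o1 selected by 001
        n2 & i1 & n0,  # o2 selected by 010
        n2 & i1 & i0,  # o3 selected by 011
        i2 & n1 & n0,  # o4 selected by 100
        i2 & n1 & i0,  # o5 selected by 101
        i2 & i1 & n0,  # o6 selected by 110
        i2 & i1 & i0,  # o7 selected by 111
    ]


print(decoder3to8(0, 0, 1))
```

A 5-to-32 decoder for the register file follows the same pattern with five inputs and 32 AND gates.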
Register File
• N read/write registers
• Indexed by register number
How to read from two registers?
• Need a multiplexor
[Figure: Reg 0 … Reg 31 feed two 32-to-1 multiplexors selected by RA and RB (5 bits each), producing QA and QB (32 bits each). Example: add x1, x0, x5.]
Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: D (32 bits) and a 5-to-32 decoder driven by W and RW write Reg 0 … Reg 31; two multiplexors selected by RA and RB drive QA and QB]
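The three ingredients (storage, a write decoder, read multiplexors) can be put together in a behavioural model. This sketch ignores clock edges, and the class and method names are hypothetical; it also keeps register 0 at zero, as the RISC-V register file slide requires.

```python
class RegisterFile:
    """Dual-read-port, single-write-port register file (behavioural model)."""

    def __init__(self, count=32, width=32):
        self.regs = [0] * count  # one row of D flip-flops per register
        self.mask = (1 << width) - 1

    def read(self, ra, rb):
        """Two read ports: each index drives its own multiplexor."""
        return self.regs[ra], self.regs[rb]

    def write(self, we, rw, dw):
        """One write port: a decoder on rw, gated by we, enables one register."""
        enables = [bool(we) and i == rw for i in range(len(self.regs))]
        for i, enabled in enumerate(enables):
            if enabled and i != 0:  # x0 is wired to zero
                self.regs[i] = dw & self.mask


rf = RegisterFile()
rf.write(we=1, rw=1, dw=10)   # like addi x1, x0, 10
rf.write(we=1, rw=0, dw=99)   # writes to x0 are discarded
rf.write(we=0, rw=2, dw=5)    # no write when WE = 0
print(rf.read(1, 0), rf.read(2, 1))
```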
Register File
What happens if same register read and written during same clock cycle?
Tradeoffs
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn’t scale
e.g. 32Mb register file with 32 bit registers
Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
How many logic gates/transistors?
[Figure: 8-to-1 mux with data inputs a … h and select inputs s2, s1, s0]
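The numbers in the scaling example follow from simple arithmetic: 32 Mb of storage in 32-bit registers is 1M registers, which needs a 20-bit register index. A short sketch, with an illustrative function name and "Mb" taken as 2^20 bits:

```python
def register_file_shape(total_bits, reg_width):
    """Return (number of registers, index bits needed to address them)."""
    count = total_bits // reg_width
    index_bits = (count - 1).bit_length()
    return count, index_bits


print(register_file_shape(32 * 2**20, 32))  # the 32Mb example
print(register_file_shape(32 * 32, 32))     # the 32 x 32 RISC-V register file
```

Each read port then needs a multiplexor with one input per register for every data bit, which is what makes this organization impractical at memory sizes.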
Takeaway
Register files are very fast storage (only a few gate delays), but do not scale to large memory sizes.
RISC-V Instruction Types
• Arithmetic/Logical
• R-type: result and two source registers (shift amount taken from a register)
✔ • I-type: result and source register, 12-bit sign-extended immediate (also holds the shift amount for immediate shifts)
• U-type: result register, 20-bit upper immediate
• Memory Access
• I-type for loads and S-type for stores
• load/store between registers and memory
• word, half-word and byte operations
• Control flow
• U-type (J variant): jump-and-link
• I-type: jump-and-link register
• S-type (B variant): conditional branches: pc-relative addresses
Summary
We have all that it takes to build a processor!
• Arithmetic Logic Unit (ALU)
• Register File
• Memory

A RISC-V processor and its ISA are an example of a Reduced Instruction Set Computer (RISC)
• Simplicity is key, thus enabling us to build it!

We now know the data path for the RISC-V ISA:
• register, memory and control instructions

103 pages
Aaa 3250 Annual Crop Production
No ratings yet
Aaa 3250 Annual Crop Production
2 pages
Manitou MT-X 1440 A (1)
No ratings yet
Manitou MT-X 1440 A (1)
5 pages
Thigh Ultrasound Monitoring Identifies Decreases.27
No ratings yet
Thigh Ultrasound Monitoring Identifies Decreases.27
9 pages
Locomotor Non Locomotor
No ratings yet
Locomotor Non Locomotor
4 pages
ANT-ATR4518R3-0998-001 Datasheet PDF
No ratings yet
ANT-ATR4518R3-0998-001 Datasheet PDF
2 pages
MKT 627 - Nitol Motors
No ratings yet
MKT 627 - Nitol Motors
3 pages
Lateral Condyle Fracture
No ratings yet
Lateral Condyle Fracture
4 pages
Lab 13 Gas Laws
No ratings yet
Lab 13 Gas Laws
15 pages
SAMPLE Mastercam X9 Handbook Volume 2
No ratings yet
SAMPLE Mastercam X9 Handbook Volume 2
36 pages