Introduction to Processor Design
&
The ARM Architecture
ARM Applications
History
RISC idea from Stanford & Berkeley Universities (1980)
Stored Program Digital Computer (Principle)
Concept started in the 1940s (Princeton)
Implemented in 1948 as the ‘Baby’ machine, which ran at Manchester University, England
Computer Architecture
Describes the user's view of the computer
Eg.
– Instruction Set,
– Visible Registers,
– Memory Management Table Structure,
– Exception Handling Models etc
Computer Organization
Describes the user-invisible implementation of the architecture
Eg.
– Pipeline Structure,
– Transparent Cache,
– Translation Look Aside Buffers etc
What is a Processor?
Finite State Automaton
Executes instructions held in memory
State depends on the values held by registers & memory
Instruction Set Design
4-Address Instructions, add d,s1,s2,
nextAdd
3-Address Instructions, add d,s1,s2
2-Address Instructions, add d,s1
1-Address Instructions, add s1
0-Address Instructions, add ; top of
stack
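The difference between these formats can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary-based register file are hypothetical, and the 4-address form (which also names the next instruction) is left out.

```python
def add_3addr(regs, d, s1, s2):
    """add d, s1, s2 : d := s1 + s2"""
    regs[d] = regs[s1] + regs[s2]

def add_2addr(regs, d, s1):
    """add d, s1 : d := d + s1 (the destination doubles as a source)"""
    regs[d] = regs[d] + regs[s1]

def add_1addr(acc, regs, s1):
    """add s1 : accumulator := accumulator + s1 (returns the new accumulator)"""
    return acc + regs[s1]

def add_0addr(stack):
    """add : pop the top two stack entries and push their sum"""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

if __name__ == "__main__":
    regs = {"d": 0, "s1": 2, "s2": 3}
    add_3addr(regs, "d", "s1", "s2")
    print(regs["d"])
```

Fewer explicit addresses give shorter instructions, but more state (destination, accumulator, stack) becomes implicit.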
Instructions Types
Data Processing
Data Movement
Control Flow
Special Instructions
– Eg. Switching to privileged mode
How will you improve the Processor Performance?
Instruction Type    Dynamic Usage
Data Movement       43%
Control Flow        23%
Arithmetic          15%
Comparisons         13%
Logical             5%
Other               1%
How will you improve the Processor Performance?
Pipeline
Cache
Superscalar Architecture (multiple instructions are executed by dispatching them to redundant functional units)
Pipelines
Fetch
Decode
Register Access
ALU
Memory, if necessary
Write Back
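As a rough illustration of why splitting execution into these six stages helps, the Python sketch below counts clock cycles with and without pipelining. It assumes an ideal pipeline: one cycle per stage and no hazards or stalls. The function names are hypothetical and not taken from the slides.

```python
STAGES = ["Fetch", "Decode", "Register Access", "ALU", "Memory", "Write Back"]

def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    """Each instruction completes every stage before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """Ideal pipeline: a new instruction enters on every cycle, with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one adds a cycle.
    return n_stages + (n_instructions - 1)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

Real pipelines fall short of this ideal because of hazards and branches.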
Pipeline Hazards
[Figure: two timing diagrams, each showing instructions 1 and 2 passing through the Fetch, Dec, Reg, ALU, Mem and Res pipeline stages]
Branch Instructions?
RISC Architecture
RISC                            CISC
Fixed width instructions        Variable length instructions
Few formats of instructions     Several formats of instructions
Load/Store Architecture         Memory values can be used as operands in instructions
Large Register Bank             Small Register Bank
Instructions are pipelinable    Cannot pipeline instructions
RISC Organization
RISC                                     CISC
Hardwired instruction decode             Microcode ROMs instruction decoder
Single cycle execution of instruction    Multi cycle execution of instruction
RISC Advantages
A Smaller Die Size
A Shorter Development Time
Higher Performance (Bit Tricky)
– Smaller things have higher natural
frequencies
RISC Disadvantages
Generally poor code density (Fixed
Length Instruction)
ARM History
ARM – Acorn RISC Machine(1983–1985)
– Acorn Computers Limited, Cambridge,
England
ARM – Advanced RISC Machine 1990
– ARM Limited, 1990
– ARM has been licensed to many
semiconductor manufacturers
ARM Architect
Steve Furber
Father of ARM
Architectural Inheritance
When the first ARM chip was designed,
examples of other RISC architectures were
– Berkeley RISC I & II
– MIPS
Earlier Machines did share some of the
features
– PDP-8
– Cray-1
– IBM 801
Semiconductor Partners
Features Used from Berkeley RISC
A Load/Store Architecture
Fixed Length 32-bit Instructions
3- Address Instruction Formats
Features Rejected from Berkeley RISC
Delayed Branches
– Branches cause problems in pipelines
– Most RISC processors use delayed branches, where the branch takes effect only after the following instruction has executed
– The original ARM did not use delayed branches because they make exception handling complex
– This later helped simplify the re-implementation of the architecture
Features Rejected from Berkeley RISC
Single Cycle Execution of ALL Instructions
– ARM uses a single memory for instructions & data
– Even a simple load/store will require at least two cycles
– Separate data & instruction memories were the solution, but were too costly at the time
The ARM designers used the extra cycle for something useful, such as supporting auto-indexing
The ARM Programmers Model
When writing user-level programs, only
– the 15 general-purpose 32-bit registers (r0-r14),
– the Program Counter (r15) and
– the CPSR (Current Program Status Register)
need to be considered
The remaining registers are used only for system-level programming & for handling exceptions
Support for ARM Modes of Operations
CPSR
User-level programs use the CPSR to store the condition code bits
– N Negative
– C Carry
– Z Zero
– V Overflow
The bottom bits are protected from change by user-level programs
– I, F, T, mode[4:0]
CPSR – Current Program Status Register
bits   31-28     27-8      7   6   5   4-0
       N Z C V   unused    I   F   T   mode
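The bit positions in the CPSR layout can be turned into a small decoder. The Python sketch below is illustrative only: `decode_cpsr` is a hypothetical helper name, and the example value is made up.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields shown in the layout."""
    return {
        "N": (cpsr >> 31) & 1,   # Negative
        "Z": (cpsr >> 30) & 1,   # Zero
        "C": (cpsr >> 29) & 1,   # Carry
        "V": (cpsr >> 28) & 1,   # Overflow
        "I": (cpsr >> 7) & 1,    # bit 7
        "F": (cpsr >> 6) & 1,    # bit 6
        "T": (cpsr >> 5) & 1,    # bit 5
        "mode": cpsr & 0x1F,     # mode[4:0]
    }

if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
```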
The Memory System
Memory may be viewed as a linear array of bytes numbered from 0 to 2^32 – 1
Data items may be 8-bit (B), 16-bit (HW), or 32-bit (W)
Words are always aligned on 4-byte boundaries, i.e. the two least significant address bits are zero
Half-words are aligned on even byte boundaries
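The alignment rules amount to simple checks on the low address bits. The following Python sketch uses hypothetical helper names to illustrate them.

```python
def is_aligned(address, size_bytes):
    """True if address is a multiple of the item size (1, 2 or 4 bytes)."""
    return address % size_bytes == 0

def word_address(address):
    """Clear the two least significant bits to get the enclosing word address."""
    return address & ~0x3 & 0xFFFFFFFF

if __name__ == "__main__":
    for addr in (0x1000, 0x1002, 0x1003):
        print(hex(addr), is_aligned(addr, 4), is_aligned(addr, 2),
              hex(word_address(addr)))
```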
ARM Memory Organization
Byte address (bit 31 on the left, bit 0 on the right)    Contents
23 22 21 20
19 18 17 16                                              word16
15 14 13 12                                              half-word14, half-word12
11 10  9  8                                              word8
 7  6  5  4                                              byte6, half-word4
 3  2  1  0                                              byte3, byte2, byte1, byte0
Load-Store Architecture
Data Processing Instructions
Data Transfer Instructions
Control Flow Instruction
ARM Exceptions
ARM supports a range of interrupts, traps and supervisor calls, all grouped under the general heading of exceptions
The PC is saved in r14 (link register) and the CPSR in the SPSR for the exception type
Exception Priorities
1. Reset (Highest Priority)
2. Data Abort
3. FIQ
4. IRQ
5. Prefetch Abort
6. SWI, Undefined Instruction (including absent coprocessor)
Exception Vector Addresses
Exception                 Mode     Vector Address
Reset                     SVC      0x00000000
Undefined Instruction     UND      0x00000004
Software Interrupt        SVC      0x00000008
Prefetch Abort            Abort    0x0000000C
Data Abort                Abort    0x00000010
IRQ (Normal Interrupt)    IRQ      0x00000018
FIQ (Fast Interrupt)      FIQ      0x0000001C
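The priority list and the vector table can be combined into a small lookup. The Python sketch below is illustrative only. `next_exception` is a hypothetical name. SWI and Undefined Instruction are placed together at the lowest priority on the assumption that a single instruction cannot raise both.

```python
# Vector addresses as listed in the table.
VECTORS = {
    "Reset": 0x00000000,
    "Undefined Instruction": 0x00000004,
    "Software Interrupt": 0x00000008,
    "Prefetch Abort": 0x0000000C,
    "Data Abort": 0x00000010,
    "IRQ": 0x00000018,
    "FIQ": 0x0000001C,
}

# Highest priority first, following the priority list.
PRIORITY = [
    "Reset", "Data Abort", "FIQ", "IRQ", "Prefetch Abort",
    "Software Interrupt", "Undefined Instruction",
]

def next_exception(pending):
    """Return (name, vector) of the highest-priority pending exception, or None."""
    for name in PRIORITY:
        if name in pending:
            return name, VECTORS[name]
    return None

if __name__ == "__main__":
    print(next_exception({"IRQ", "Data Abort"}))
```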
The I/O System
Handles I/O as memory-mapped devices with interrupt support
Internal registers of devices appear as addressable locations
A device attracts the attention of the ARM with a normal interrupt (IRQ) or a fast interrupt (FIQ)
Agenda
– Data Transfer Instructions
– Control Flow Instructions
– Writing Simple Assembly
Data Processing Instructions
– Operate on the data values in ARM registers
– Typically require two operands & produce a single result
Rules for Data Processing Instructions
– The result, if there is any, is 32 bits wide and is placed in a register (Exception: Long Multiplies)
Operands in Data Processing
– Register Operands
– Immediate Operands
– Shifted Register Operands
Data Processing Operations
– Arithmetic Operations
– Bit-wise Logical Operations
– Register Movement Operations
– Comparison Operations
Arithmetic Operations
ADD r0, r1, r2    ; r0 := r1 + r2
ADC r0, r1, r2    ; r0 := r1 + r2 + C
SUB r0, r1, r2    ; r0 := r1 - r2
SBC r0, r1, r2    ; r0 := r1 - r2 + C - 1
RSB r0, r1, r2    ; r0 := r2 - r1
RSC r0, r1, r2    ; r0 := r2 - r1 + C - 1
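A minimal Python model of these six operations may make the role of the carry bit C clearer. It is only a sketch: results are wrapped to 32 bits with a mask, and `add64` is a hypothetical helper showing how ADC can extend an addition to 64 bits.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def add(r1, r2):
    return (r1 + r2) & MASK

def adc(r1, r2, c):
    return (r1 + r2 + c) & MASK

def sub(r1, r2):
    return (r1 - r2) & MASK

def sbc(r1, r2, c):
    return (r1 - r2 + c - 1) & MASK

def rsb(r1, r2):
    return (r2 - r1) & MASK

def rsc(r1, r2, c):
    return (r2 - r1 + c - 1) & MASK

def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from a low-word add followed by ADC on the high words."""
    total = a_lo + b_lo
    carry = total >> 32          # carry out of the low-word addition
    return total & MASK, adc(a_hi, b_hi, carry)

if __name__ == "__main__":
    print(add64(0xFFFFFFFF, 0, 1, 0))
```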
Bit-wise Logical Operations
AND r0, r1, r2    ; r0 := r1 and r2
ORR r0, r1, r2    ; r0 := r1 or r2
EOR r0, r1, r2    ; r0 := r1 xor r2
BIC r0, r1, r2    ; r0 := r1 and not r2
Register Movement Operations
MOV r0, r2    ; r0 := r2
MVN r0, r2    ; r0 := not r2
Comparison Operations
CMP r1, r2    ; set cc on r1 - r2
CMN r1, r2    ; set cc on r1 + r2
TST r1, r2    ; set cc on r1 and r2
TEQ r1, r2    ; set cc on r1 xor r2
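To show what "set cc" means for a comparison, the Python sketch below computes the N, Z, C and V bits for CMP and the N and Z bits for TST. The function names are hypothetical, and operands are assumed to be unsigned 32-bit values. For a subtraction, C is 1 when no borrow occurs.

```python
MASK = 0xFFFFFFFF

def cmp_flags(r1, r2):
    """Flags (N, Z, C, V) for CMP r1, r2, i.e. for the result of r1 - r2."""
    result = (r1 - r2) & MASK
    n = result >> 31                             # sign bit of the result
    z = int(result == 0)
    c = int(r1 >= r2)                            # carry set means no borrow
    v = ((r1 ^ r2) & (r1 ^ result)) >> 31        # signed overflow
    return n, z, c, v

def tst_flags(r1, r2):
    """Flags (N, Z) for TST r1, r2, i.e. for the result of r1 and r2."""
    result = r1 & r2 & MASK
    return result >> 31, int(result == 0)

if __name__ == "__main__":
    print(cmp_flags(5, 5), cmp_flags(3, 5), tst_flags(0xF0, 0x0F))
```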
Immediate Operands
– ADD r3, r3, #1 ; r3 := r3 + 1
– AND r8, r7, #&ff ; r8 := r7 and &ff
Shifted Register Operands
The second register operand is subjected to a shift before it is combined with the first operand
ADD r3, r2, r1, LSL #3 ; r3 := r2 + (r1*8)
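ARM Shift Operations
– LSL: Logical Shift Left
– LSR: Logical Shift Right
– ASL: Arithmetic Shift Left
– ASR: Arithmetic Shift Right
– ROR: Rotate Right
– RRX: Rotate Right Extended
[Figure: LSL, LSR, ASL, ASR, ROR, RRX. Panels: LSL #5 and LSR #5 (five zeros shifted in); ASR #5, positive operand (filled with 0); ASR #5, negative operand (filled with 1); ROR #5; RRX (rotate through the carry flag C)]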
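The shift operations can be modelled on 32-bit values as in the Python sketch below. It is an illustration under simple assumptions: shift amounts are taken to be in the range 0 to 31, and `rrx` returns the new carry alongside the result. ASL is the same operation as LSL.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at the bottom."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter at the top."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is copied into the top bits."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32             # reinterpret as a negative Python int
    return (x >> n) & MASK

def ror(x, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one bit through the carry; returns (result, new_carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1

if __name__ == "__main__":
    r1, r2 = 2, 10
    print(r2 + lsl(r1, 3))       # models ADD r3, r2, r1, LSL #3
```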
Shift Value in Register
– A register can specify the number of bits the second operand should be shifted by
Setting the Condition Codes
– Except for comparisons, a special request needs to be made
– At assembly level the request is made by adding S to the opcode (e.g. ADDS)
Multiplies
– Immediate second operand not supported
– The result register must not be the same as the first source register
Data Transfer Instructions
Single Register Load & Store
– transfer of a data item (byte, half-word, word) between ARM registers and memory
Multiple Register Load & Store
– enable transfer of large quantities of data
– used for procedure entry and exit, to
save/restore workspace registers, to copy
blocks of data around memory
Single Register Swap Instructions
– allow exchange between a register and memory
in one instruction
– used to implement semaphores to ensure
mutual exclusion on accesses to shared data in
multis
Register-Indirect Addressing
LDR r0, [r1]    ; r0 := mem32[r1]
STR r0, [r1]    ; mem32[r1] := r0
Note: r1 keeps a word address (2 LSBs are 0)
Offset up to 4 KBytes
Pre Indexed Addressing
LDR r0, [r1, #4]     ; r0 := mem32[r1 + 4]
Post Indexed Addressing
LDR r0, [r1], #4     ; r0 := mem32[r1]
                     ; r1 := r1 + 4
Auto Indexing Addressing
LDR r0, [r1, #4]!    ; r0 := mem32[r1 + 4]
                     ; r1 := r1 + 4
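The three indexed forms differ only in which address is used for the access and whether the base register is updated. The Python sketch below models this with a hypothetical `Machine` class, using a dictionary as word-addressed memory. It is not a full ARM model.

```python
MASK = 0xFFFFFFFF

class Machine:
    """Toy state: 16 registers and a sparse memory of 32-bit words."""
    def __init__(self):
        self.regs = [0] * 16
        self.mem = {}

    def load32(self, addr):
        return self.mem.get(addr, 0)

def ldr_pre(m, rd, rn, offset, writeback=False):
    """LDR rd, [rn, #offset]; with writeback=True, LDR rd, [rn, #offset]!"""
    addr = (m.regs[rn] + offset) & MASK
    m.regs[rd] = m.load32(addr)
    if writeback:                # auto-indexing: base register is updated
        m.regs[rn] = addr

def ldr_post(m, rd, rn, offset):
    """LDR rd, [rn], #offset : access at rn, then rn := rn + offset"""
    addr = m.regs[rn]
    m.regs[rd] = m.load32(addr)
    m.regs[rn] = (addr + offset) & MASK

if __name__ == "__main__":
    m = Machine()
    m.mem = {0x100: 11, 0x104: 22}
    m.regs[1] = 0x100
    ldr_post(m, 0, 1, 4)
    print(m.regs[0], hex(m.regs[1]))
```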
Where do I use this?
Exercise
Algorithm:
– Pointer to Table1
– Pointer to Table2
– Load [Table1]
– Store [Table2]
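Answer
COPY:   ADR r1, TABLE1      ; r1 points to TABLE1
        ADR r2, TABLE2      ; r2 points to TABLE2
LOOP:   LDR r0, [r1]        ; load a word from TABLE1
        STR r0, [r2]        ; store it to TABLE2
        ADD r1, r1, #4      ; advance the source pointer
        ADD r2, r2, #4      ; advance the destination pointer
        ...
TABLE1: ...
TABLE2: ...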
Better Answer
COPY:   ADR r1, TABLE1      ; r1 points to TABLE1
        ADR r2, TABLE2      ; r2 points to TABLE2
LOOP:   LDR r0, [r1], #4    ; load a word, then r1 := r1 + 4
        STR r0, [r2], #4    ; store the word, then r2 := r2 + 4
        ...
TABLE1: ...
TABLE2: ...
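The effect of the post-indexed copy loop can be checked with a short Python simulation. This is a sketch only. The slide elides how the loop terminates, so the word count `n_words` is an assumption made for illustration. `copy_table` is a hypothetical name.

```python
def copy_table(mem, table1, table2, n_words):
    """Copy n_words 32-bit words from table1 to table2 using post-indexing."""
    r1, r2 = table1, table2      # ADR r1, TABLE1 / ADR r2, TABLE2
    for _ in range(n_words):     # loop bound is assumed, not from the slide
        r0 = mem[r1]             # LDR r0, [r1], #4
        r1 += 4
        mem[r2] = r0             # STR r0, [r2], #4
        r2 += 4
    return r1, r2                # final pointer values

if __name__ == "__main__":
    mem = {0x100: 1, 0x104: 2, 0x108: 3}
    print(copy_table(mem, 0x100, 0x200, 3), mem)
```

Each post-indexed instruction replaces a load or store plus a separate ADD, so the loop body shrinks from four instructions to two.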
Multiple Register Transfer
– Used when a large quantity of data needs to be transferred
– But there is a trade-off, i.e. ...
Example Multiple Transfer
LDMIA r1, {r0, r2, r5}    ; r0 := mem32[r1]
                          ; r2 := mem32[r1 + 4]
                          ; r5 := mem32[r1 + 8]
Base address should be word aligned
Order of the registers in the list does not matter
Normal practice is to specify them in increasing order
Including r15 is also possible
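A small Python model of LDMIA shows why the order of the register list does not matter: the lowest-numbered register always receives the word at the lowest address. The helper name `ldmia` and the optional `writeback` flag are illustrative assumptions.

```python
def ldmia(regs, mem, base_reg, reg_list, writeback=False):
    """Load registers from consecutive words starting at regs[base_reg]."""
    addr = regs[base_reg]
    if addr % 4 != 0:
        raise ValueError("base address must be word aligned")
    for r in sorted(reg_list):   # lowest register takes the lowest address
        regs[r] = mem[addr]
        addr += 4
    if writeback:                # optionally leave the base past the block
        regs[base_reg] = addr

if __name__ == "__main__":
    regs = [0] * 16
    regs[1] = 0x100
    mem = {0x100: 10, 0x104: 20, 0x108: 30}
    ldmia(regs, mem, 1, [5, 0, 2])   # same effect as {r0, r2, r5}
    print(regs[0], regs[2], regs[5])
```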
Exercise
– ... locations 0x8000-2000 & 0x8000-2001?
– Check the questi...
Exercise
Convert the following C statements
– X = A + B
– X = A - B
– X = B - A
– X = A + B*4
– X = ...