Lecture2.2 ARM Instruction Set Architecture
Curriculum
• CPU: ARM Cortex-M (Week 4)
• Memory and Interfaces (Weeks 5-6)
• Real-time Operating systems (Weeks 10-12)
• Project (Week 15)
Outline
• Basic Processor Architecture
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary
Intelligence in your pocket!
Since 2012, virtually all PDAs and smartphones contain ARM CPUs, and
ARMs account for 75 percent of all 32-bit embedded systems and 90
percent of embedded RISC systems!
ARM Doesn’t Actually Produce Microprocessors
Which architecture is my processor?
What are the differences between ARM7 and ARMv7?
Processor core                     Architecture
• ARM7TDMI family                  v4T
  – ARM720T, ARM740T
• ARM9TDMI family                  v4T
  – ARM920T, ARM922T, ARM940T
• ARM9E family                     v5TE, v5TEJ
  – ARM946E-S, ARM966E-S, ARM926EJ-S
• ARM10E family                    v5TE, v5TEJ
  – ARM1020E, ARM1022E, ARM1026EJ-S
• ARM11 family                     v6
  – ARM1136J(F)-S
  – ARM1156T2(F)-S                 v6T2
  – ARM1176JZ(F)-S                 v6Z
• Cortex family (2004)
  – ARM Cortex-A8                  v7A
  – ARM Cortex-R4                  v7R
  – ARM Cortex-M3                  v7M
ARM Cortex Profile
The ARM Cortex family includes processors based on the three distinct profiles of the ARMv7 architecture.
• The A profile for sophisticated, high-end Applications running open and complex operating systems
• The R profile for Real-time systems
• The M profile optimized for cost-sensitive and Microcontroller applications
ARM Cortex-M Series
Cortex M0
• 32-bit RISC processor
• 3-stage pipeline, von Neumann architecture
• ARMv6-M architecture
• 16-bit Thumb instruction set with Thumb-2 technology.
• Load-Store Architecture
• 56 Assembly Instructions
• Low power support
ARM Cortex-M Series
Cortex M0+
• 32-bit RISC processor
• 2-stage pipeline: branch instructions execute in fewer clock cycles, which also reduces power consumption
• ARMv6-M architecture
• The most energy efficient ARM processor
• Cortex-M compatibility – 100% compatible with the Cortex-M0 instruction set and a subset of the Cortex-M3/M4 instruction set
• Ultra-Low power
ARM Cortex-M Series
Cortex M3
• 32-bit RISC processor
• 3-stage pipeline for high performance embedded systems while providing low power advantages.
• ARMv7-M architecture
• Thumb® /Thumb-2 instruction set.
• High-performance low-cost platforms
ARM Cortex-M Series
Cortex M4
• 32-bit RISC processor
• 3-stage pipeline for high performance embedded systems while providing low power advantages.
• ARMv7-M architecture
• Thumb® /Thumb-2 instruction set.
• Combines high-efficiency signal processing functionality with low power
ARM Cortex-M Series
• ARM is a RISC architecture
– Most instructions execute in a single cycle
• ARM is a 32-bit load / store architecture
– operations cannot be performed directly on memory locations
– operands must be loaded into the CPU and results are stored back to main memory
• When used in relation to the ARM:
– Halfword means 16 bits (two bytes)
– Word means 32 bits (four bytes)
– Doubleword means 64 bits (eight bytes)
• The earlier ARMs implement two instruction sets:
– 32-bit ARM Instruction Set
– 16-bit Thumb Instruction Set
• Latest ARM cores (e.g. Cortex-M) introduce a new instruction set called Thumb-2
– Provides a mixture of 32-bit and 16-bit instructions
– Maintains code density with increased flexibility
ARM Cortex-M Series
Comparing 16-bit Multiply Operations Across Processor Architectures
ARM Cortex-M Series
Cortex-M Advantages:
1. Energy efficiency
2. Smaller code
3. Ease of use
4. High performance
ARM Cortex-M Series
Compatibility between Cortex-M processors
ARM Cortex-M Series
• Bit-band feature
• Assembly Instructions
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary
ARM data types
Operation Modes & States
• Thread mode:
– Executes application software. The processor enters Thread mode
when it comes out of reset.
• Handler mode:
– Handles exceptions. The processor returns to Thread mode when it
has finished all exception processing.
Operation Modes & States
• Unprivileged access level: The software
– has limited access to system registers using the MSR and MRS instructions,
and cannot use the CPS instruction to mask interrupts
– cannot access the system timer, NVIC, or system control block
– might have restricted access to memory or peripherals.
• Privileged:
– The software can use all the instructions and has access to all resources.
Core Registers
• R0-R12 - General purpose, for data processing
• SP - Stack pointer (R13)
– Can refer to one of two SPs
• Main Stack Pointer (MSP)
• Process Stack Pointer (PSP)
– Uses MSP initially, and in Handler mode
– In Thread mode, can select either MSP or PSP
using SPSEL flag in CONTROL register.
• LR - Link Register (R14)
– Holds the return address when a subroutine is called with the Branch and Link (BL) instruction
• PC - program counter (R15)
Program Counter (r15)
• When the processor is executing in ARM state:
– All instructions are 32 bits wide
– All instructions must be word aligned
– Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined
(because ARM instructions cannot be halfword or byte aligned)
Special Registers
• Contain the processor status and define the operation
states and interrupt/exception masking
– development of simple applications does not require access to these registers
– they are needed for development of an embedded OS
• Special registers are not memory mapped, and can be
accessed using special register access instructions such as
MSR and MRS:
MRS <reg>, <special_reg>; Move special register into register
MSR <special_reg>, <reg>; Move register to special register
Program Status Registers
xPSR - combined Program Status Register (PSR) provides
information about program execution and the ALU flags.
• The APSR contains the ALU flags to
control conditional branches
• EPSR cannot be accessed by software code
• IPSR is read only and contains the number of the currently executing ISR (Interrupt Service Routine)
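The split of xPSR into APSR, IPSR and EPSR can be pictured as bit-field extraction. The sketch below is a small Python model, not target code; the bit positions (N, Z, C, V in bits 31 to 28, T in bit 24, exception number in bits 8 to 0) are assumed from the ARMv7-M layout and are not stated on the slide, and the function name decode_xpsr is made up for illustration.

```python
def decode_xpsr(xpsr):
    """Split a 32-bit xPSR value into the fields discussed above.

    Assumed ARMv7-M layout: APSR flags in bits 31..28, EPSR T bit in bit 24,
    IPSR exception number in bits 8..0.
    """
    return {
        "N": (xpsr >> 31) & 1,              # negative flag (APSR)
        "Z": (xpsr >> 30) & 1,              # zero flag (APSR)
        "C": (xpsr >> 29) & 1,              # carry flag (APSR)
        "V": (xpsr >> 28) & 1,              # overflow flag (APSR)
        "T": (xpsr >> 24) & 1,              # Thumb state bit (EPSR)
        "exception_number": xpsr & 0x1FF,   # current exception (IPSR)
    }

fields = decode_xpsr(0x61000000)
```

An exception number of 0 corresponds to Thread mode; a non-zero value means the processor is in Handler mode.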
Program Status Registers
Cortex-M0 and Cortex-M0+
Other Special Registers
Cortex-M
Note: The FAULTMASK and BASEPRI registers are not available in ARMv6-M
Stack
• Stack memory is organized as a LIFO (Last In, First Out) structure:
– The stack pointer indicates the last stacked item in stack memory.
– Thread mode uses the MSP (Main Stack Pointer) by default, but can be switched to the PSP (Process Stack Pointer).
ARM memory organization
Memory model
• 32-bit address space:
– Linear 4 GB address space removes complex paging schemes,
simplifying software architecture
• Virtual memory is not supported in ARMv6-M.
• Supports fetching instructions from 16-bit flash memories
– makes the microcontroller design simpler, smaller, and consequently cheaper
• Data accesses are always naturally aligned
ARM memory organization
The Cortex-M memory map
The Cortex-M memory map and buses
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow controlling operations.
• Summary
ARM and Thumb Instructions
https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
– Condition Codes
– Load/store instructions
• ARM flow controlling operations.
• Summary
Data processing Instructions
• The data processing instructions only work on registers, NOT
memory.
• Instruction Format:
<Operation>{<cond>}{S} Rd, Rn, Operand2
• <Operation>: three-letter mnemonic that specifies the operation
– ADD, SUB, ...
• {<cond>}: Optional two-letter condition code
– ADDEQ r0, r1, r2
• {S}: Optional suffix; when present, the instruction updates the condition flags
– ADDS r0, r1, r2
• Rd: Destination register
– ADD r0, r1, r2 (Rd = r0)
• Rn: First operand register
– ADD r0, r1, r2 (Rn = r1)
Data processing Instructions
• The data processing instructions only work on registers, NOT memory.
• Instruction Format:
<Operation>{<cond>}{S} Rd, Rn, Operand2
• Operand 2 can be a register or an (8-bit) immediate value
– SUB r0, r1, r2 (r0 = r1 - r2)
– ADD r1, r4, #0xFF (r1 = r4 + 0xFF)
• Comparisons set flags only - they do not specify Rd
– CMP r0, r3
• Data movement does not specify Rn
– MOV r0, r1
• Operand 2 is sent to the ALU via barrel shifter
Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the barrel shifter first.
Arithmetic:
– ADD, ADC : add (w. carry)
– SUB, SBC : subtract (w. carry)
– RSB, RSC : reverse subtract (w. carry)
Logical:
– AND
– ORR
– EOR : Exclusive OR
– BIC : bit clear
• BIC r0,r1,r2 ; r0 = r1 and (not r2).
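The register-to-register behaviour of a few of these operations can be checked with a small Python model. This is only a sketch under the assumption of 32-bit registers with wrap-around arithmetic; the function names mirror the mnemonics but are otherwise made up, and the carry-in variants (ADC, SBC, RSC) are left out.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def add(rn, op2):
    """ADD Rd, Rn, Operand2: Rd = Rn + Operand2 (wraps at 32 bits)."""
    return (rn + op2) & MASK32

def sub(rn, op2):
    """SUB Rd, Rn, Operand2: Rd = Rn - Operand2."""
    return (rn - op2) & MASK32

def rsb(rn, op2):
    """RSB Rd, Rn, Operand2: reverse subtract, Rd = Operand2 - Rn."""
    return (op2 - rn) & MASK32

def bic(rn, op2):
    """BIC Rd, Rn, Operand2: bit clear, Rd = Rn AND (NOT Operand2)."""
    return rn & ~op2 & MASK32
```

Reverse subtract is useful because only Operand 2 can be an immediate: rsb(r0, 0) yields the two's complement negation of r0.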
ARM data instructions
• Shift operations are used as part of data processing instructions.
– bits can be shifted from 0-31 places, typically without performance penalty
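LSL: Logical Shift Left
– Multiplication by a power of 2
– CF ← Destination ← 0
LSR: Logical Shift Right
– Division by a power of 2
– 0 → Destination → CF
ASR: Arithmetic Shift Right
– Division by a power of 2, preserving the sign bit
– sign bit → Destination → CF
ROR: Rotate Right
– Bit rotate with wrap around from LSB to MSB
– Destination → CF, with the bit shifted out re-entering at the MSB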
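The four barrel-shifter operations can be modelled on 32-bit values as follows. This is an illustrative Python sketch: the helper names are assumptions, shift amounts are taken to be in the range 0 to 31, and the carry flag output is not modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at the LSB (multiply by 2**n)."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros enter at the MSB (unsigned divide by 2**n)."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated (signed divide by 2**n)."""
    x &= MASK32
    if x & 0x80000000:        # reinterpret as a negative two's complement value
        x -= 1 << 32
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32
```

Note that ASR rounds towards minus infinity, so it matches signed division only for values that divide exactly or are non-negative.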
Multiply and Divide
• There are 2 classes of multiply - producing 32-bit and 64-bit results
• 32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data processing operations.
• Condition Codes
• Load/store instructions
– ARM flow of control.
• Summary
Condition Codes
• All operations can be performed conditionally by testing the N, Z, C, and V flags in xPSR:
– Note AL is the default and does not need to be specified
• By default, data processing instructions do not affect the condition flags, but this can be achieved by suffixing the instruction (and any condition code) with an “S”
loop
ADD r2, r2, r3 ; r2 = r2 + r3
SUBS r1, r1, #0x01 ; decrement r1 and set flags
BNE loop ; if Z flag clear then branch
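To see how the S suffix and BNE cooperate, the loop above can be traced with a small Python model. It is a sketch only: subs and run_loop are hypothetical helper names, registers are assumed to be 32 bits wide, and C is modelled with the usual ARM convention that it is set when the subtraction does not borrow.

```python
MASK32 = 0xFFFFFFFF

def subs(rn, op2):
    """Model SUBS: return (result, flags) for result = rn - op2."""
    rn &= MASK32
    op2 &= MASK32
    result = (rn - op2) & MASK32
    flags = {
        "N": result >> 31,                                  # result is negative
        "Z": int(result == 0),                              # result is zero
        "C": int(rn >= op2),                                # no borrow occurred
        "V": (((rn ^ op2) & (rn ^ result)) >> 31) & 1,      # signed overflow
    }
    return result, flags

def run_loop(r1, r2, r3):
    """Trace the loop: add r3 to r2 once per iteration, r1 counts down."""
    while True:
        r2 = (r2 + r3) & MASK32      # ADD  r2, r2, r3  (flags untouched)
        r1, flags = subs(r1, 1)      # SUBS r1, r1, #0x01
        if flags["Z"]:               # BNE loop falls through once Z is set
            return r2

total = run_loop(3, 0, 5)
```

With r1 = 3 the body runs three times, so total is 15. Because ADD has no S suffix, it cannot disturb the flags that BNE tests.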
Conditional execution examples
                 With branches     With conditional execution
Instructions     5                 3
Code size        5 words           3 words
Execution time   5 or 6 cycles     3 cycles
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data processing operations.
• Condition Codes
• Load/store memory access instructions
– ARM flow controlling operations.
• Summary
ARM load/store instructions
Load      Store     Data size
LDR       STR       Word
LDRB      STRB      Byte
LDRH      STRH      Halfword
LDRSB               Signed byte load
LDRSH               Signed halfword load
• Syntax:
– LDR{<cond>}{<size>} Rd, <address>
– STR{<cond>}{<size>} Rd, <address>
ARM load/store instructions
• Addressing modes: Address accessed by LDR/STR is
specified by a base register with an offset
– register indirect : LDR r0,[r1]
• loads r0 from the address given by r1
– with second register : LDR r0,[r1,-r2]
• loads r0 from the address given by r1 - r2
– with constant : LDR r0,[r1,#4]
• loads r0 from the address r1 + 4.
– Pre-indexing with write-back (auto-indexing) updates the base register first: LDR r0,[r1,#4]!
• r1 = r1 + 4, then loads r0 from the new address in r1
– Post-indexing fetches, then applies the offset: LDR r0,[r1],#4
• Loads r0 from the address in r1, then adds 4 to r1.
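The difference between the three constant-offset forms is only in when, and whether, the base register changes. The Python sketch below models memory as a dictionary from word address to value and registers as a dictionary from name to value; both representations and all function names are assumptions made for illustration.

```python
def ldr_offset(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]: load from base + offset, base unchanged."""
    regs[rd] = mem[regs[rn] + offset]

def ldr_pre_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]!: update the base first, then load from it."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]

def ldr_post_indexed(mem, regs, rd, rn, offset):
    """LDR rd, [rn], #offset: load from the base, then update the base."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset

# Three consecutive words starting at address 0x100.
mem = {0x100: 11, 0x104: 22, 0x108: 33}
```

Post-indexing is the natural fit for walking an array forwards, since each load leaves the base pointing at the next element.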
Loading 32 bit constants
• To allow larger constants to be loaded, the assembler offers a pseudo-
instruction:
– LDR rd, =const
• This will either:
– Produce a MOV or MVN instruction to generate the value (if possible)
or
– Generate a LDR instruction with a PC-relative address to read the constant from
a literal pool (Constant data area embedded in the code)
• For example
– LDR r0, =0xFF => MOV r0, #0xFF
– LDR r0, =0x55555555 => LDR r0, [PC, #Imm12]
…
…
DCD 0x55555555
• This is the recommended way of loading constants into a register
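The choice the assembler makes for LDR rd, =const can be sketched in Python. The rule assumed here is the classic 32-bit ARM immediate encoding (an 8-bit value rotated right by an even amount); Thumb-2 on Cortex-M uses different immediate encodings, so this is a model of the idea rather than of any particular assembler. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotate_left(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def fits_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Undo every possible even rotation and see if 8 bits are enough.
    return any(rotate_left(value, rot) <= 0xFF for rot in range(0, 32, 2))

def expand_ldr_constant(value):
    """Say which instruction LDR rd, =value would turn into in this model."""
    value &= MASK32
    if fits_arm_immediate(value):
        return "MOV"
    if fits_arm_immediate(~value & MASK32):
        return "MVN"                      # load the bitwise complement
    return "LDR from literal pool"
```

In this model 0xFF becomes a MOV and 0x55555555 needs the literal pool, matching the two cases on the slide.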
Example: C assignments
• C:
x = (a + b) - c;
• Assembly:
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
LDR r4,=b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
LDR r4,=c ; get address for c
LDR r2, [r4] ; get value of c
C assignment, cont’d.
SUB r3,r3,r2 ; complete computation of x
LDR r4,=x ; get address for x
STR r3,[r4] ; store value of x
Example: C assignment
• C:
y = a*(b+c);
• Assembly:
LDR r4,=b ; get address for b
LDR r0,[r4] ; get value of b
LDR r4,=c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
C assignment, cont’d.
MUL r2,r2,r0 ; compute final value for y
LDR r4,=y ; get address for y
STR r2,[r4] ; store value of y
Example: C assignment
• C:
z = (a << 2) | (b & 15);
• Assembler:
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
LDR r4,=b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
C assignment, cont’d.
LDR r4,=z ; get address for z
STR r1,[r4] ; store value of z
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data operations.
• Condition Codes
• Load/store instructions
– ARM flow controlling operations.
• Summary
Branch Instructions
• Branch instructions have the following format:
– B{L}{<cond>} label
– subroutine calls can be made by specifying the optional {L}
– a 24 bit address offset field is part of the instruction encoding
• On execution this is left shifted 2 places (since ARM instructions are always word
aligned) to give a 26 bit value, thus giving a relative branch range of ± 32 MB
– Causes a pipeline flush
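The ±32 MB figure follows directly from the encoding: a signed 24-bit field shifted left by 2 gives a signed 26-bit byte offset. The Python sketch below reproduces that arithmetic; the function name is made up, and the pipeline-related adjustment of the PC value that the offset is added to is deliberately ignored.

```python
def branch_byte_offset(imm24):
    """Sign-extend the 24-bit offset field and shift it left by 2 places."""
    imm24 &= 0xFFFFFF
    if imm24 & 0x800000:          # bit 23 is the sign bit of the field
        imm24 -= 1 << 24
    return imm24 << 2             # word offset to byte offset

MB = 1024 * 1024
max_forward = branch_byte_offset(0x7FFFFF)    # largest positive field value
max_backward = branch_byte_offset(0x800000)   # most negative field value
```

The backward limit is exactly 32 MB and the forward limit is 32 MB minus one word.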
Example: if statement
• C:
if (a < b) { x = 5; y = c + d; } else x = c - d;
• Assembly:
; compute and test condition
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
LDR r4,=b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block
If statement, cont’d.
; false block
fblock LDR r4,=c ; get address for c
LDR r0,[r4] ; get value of c
LDR r4,=d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute c-d
LDR r4,=x ; get address for x
STR r0,[r4] ; store value of x
after ...
Example: Conditional instruction implementation
; true block: conditional
MOVLT r0,#5 ; generate value for x
LDRLT r4,=x ; get address for x
STRLT r0,[r4] ; store x
LDRLT r4,=c ; get address for c
LDRLT r0,[r4] ; get value of c
LDRLT r4,=d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
LDRLT r4,=y ; get address for y
STRLT r0,[r4] ; store y
; false block
LDRGE r4,=c ; get address for c
LDRGE r0,[r4] ; get value of c
LDRGE r4,=d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute c-d
LDRGE r4,=x ; get address for x
STRGE r0,[r4] ; store value of x
Example: switch statement
• C:
switch (test) { case 0: … break; case 1: … }
• Assembly:
LDR r2,=test ; get address for test
LDR r0,[r2] ; load value for test
LDR r1,=switchtab ; load address for switch table
LDR r15,[r1,r0,LSL #2] ; index switch table
switchtab DCD case0
DCD case1
...
case0
ADD r0, r1, r2 ; Operation 0
B Endcase ; Break
case1
SUB r0, r1,
Endcase
...
Example: FIR filter (1)
• C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
• Assembly
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
LDR r2,=N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f
Example: FIR filter (2)
Subroutines
• Implementing a conventional subroutine call requires two steps:
– Store the return address
– Branch to the address of the required subroutine
• These steps are carried out in one instruction, BL
– The return address is stored in the link register (lr/r14)
– Branch to an address anywhere within a +/- 32MB range
• Returning is performed by restoring the program counter (pc) from lr
– MOV r15,r14
– or BX lr
C:
void func1 (void)
{
:
func2();
:
}
Assembly:
func1            func2
:                :
BL func2         :
:                BX lr
Generating Branches with LDR
• The ARM’s branch instruction is limited to a range of ±32MB
– However branches can also be performed by loading address values directly into
the PC (r15)
– armasm provides pseudo instructions to make this easier
LDR pc, =label ; load address of label into PC
Memory Block Copying (1)
• The use of base register updating enables simple copying routines to be written
– For example: The post-indexed variant could be used to copy a block of memory
; r8 points to start of source data
; r9 points to end of source data
; r10 points to start of destination data
Figure: source block (r8 to r9) and destination block (from r10), in order of increasing memory address.
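A word-by-word copy with post-indexed loads and stores can be modelled as below. This Python sketch assumes that r9 is an exclusive end address and that memory is a dictionary from word address to value; the comments name the post-indexed instruction each step stands for, but the loop itself is an illustration rather than the slide's original listing.

```python
def copy_words(mem, r8, r9, r10):
    """Copy the words in [r8, r9) to the block starting at r10."""
    while r8 < r9:
        r0 = mem[r8]      # LDR r0, [r8], #4  : load, then ...
        r8 += 4           #                     ... advance the source pointer
        mem[r10] = r0     # STR r0, [r10], #4 : store, then ...
        r10 += 4          #                     ... advance the destination pointer
    return r8, r10

memory = {0x100: 1, 0x104: 2, 0x108: 3}
final_r8, final_r10 = copy_words(memory, 0x100, 0x10C, 0x200)
```

Each pointer update comes for free with the memory access, so the loop body needs no separate ADD instructions.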
Load and Store Multiples
• Syntax:
– <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
– LDMIA / STMIA increment after
– LDMIB / STMIB increment before
– LDMDA / STMDA decrement after
– LDMDB / STMDB decrement before
LDMxx r10, {r0,r1,r4} ; loads r0, r1, r4 from the address r10
STMxx r10, {r0,r1,r4} ; store r0, r1, r4 to the address r10
Memory layout relative to the base register (Rb) r10, highest address first:
Address      IA    IB    DA    DB
r10 + 12           r4
r10 + 8      r4    r1
r10 + 4      r1    r0
r10          r0          r4
r10 - 4                  r1    r4
r10 - 8                  r0    r1
r10 - 12                       r0
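The four addressing modes differ only in where the block of transferred words sits relative to the base register. The Python sketch below computes those addresses; the function name and return shape are assumptions, words are taken to be 4 bytes, and the returned base is the value written back when the optional "!" is used.

```python
def transfer_addresses(base, count, mode):
    """Return (addresses, new_base) for an LDM/STM of count registers.

    addresses[0] belongs to the lowest-numbered register in the list.
    """
    size = 4 * count
    if mode == "IA":              # increment after: start at the base
        start = base
    elif mode == "IB":            # increment before: start one word above
        start = base + 4
    elif mode == "DA":            # decrement after: block ends at the base
        start = base - size + 4
    elif mode == "DB":            # decrement before: block ends below the base
        start = base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    new_base = base + size if mode in ("IA", "IB") else base - size
    return [start + 4 * i for i in range(count)], new_base

ia_addresses, ia_base = transfer_addresses(0x1000, 3, "IA")
db_addresses, db_base = transfer_addresses(0x1000, 3, "DB")
```

In every mode the lowest-numbered register uses the lowest address, which is why the register list order does not matter.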
Memory Block Copying (2)
• As well as being used for stack operations, the STM / LDM instructions can perform block copying of memory
• For example
; r8 points to start of source data
; r9 points to end of source data
; r10 points to start of destination data
Figure: source block (r8 to r9) and destination block (from r10), in order of increasing memory address.
Stacks
• The stack is a last-in-first-out (LIFO) structure
• The top of the stack is the most recently allocated space
• The ARM stack grows down to lower addresses in memory
• The stack pointer, SP (R13), points to the top of the stack
Figure: The stack (a) before expansion and (b) after two-word expansion
Stacks
ARM stack operations are implemented by block transfer instructions:
• STMFD (Push) Store Multiple - Full Descending stack [STMDB]
• LDMFD (Pop) Load Multiple - Full Descending stack [LDMIA]
Note: Multiple registers are always stacked in register order, with the lowest-numbered register at the lowest memory address. The order in which the registers are specified has no effect.
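A full descending stack can be modelled in a few lines of Python. In this sketch the push corresponds to STMFD sp!, {...} and the pop to LDMFD sp!, {...}; the dictionary-based memory and registers, the register names such as "r14", and the function names are all assumptions for illustration.

```python
def register_number(name):
    """Turn a name such as 'r14' into the number 14."""
    return int(name[1:])

def push(mem, sp, regs, names):
    """STMFD sp!, {names}: decrement SP first, lowest register at lowest address."""
    ordered = sorted(names, key=register_number)
    sp -= 4 * len(ordered)
    for i, name in enumerate(ordered):
        mem[sp + 4 * i] = regs[name]
    return sp

def pop(mem, sp, regs, names):
    """LDMFD sp!, {names}: load from SP upwards, then increment SP."""
    ordered = sorted(names, key=register_number)
    for i, name in enumerate(ordered):
        regs[name] = mem[sp + 4 * i]
    return sp + 4 * len(ordered)

stack = {}
registers = {"r4": 100, "r5": 0xFF, "r14": 0x8034}
sp_after_push = push(stack, 0x1010, registers, ["r14", "r5", "r4"])
```

The list is given in reverse order on purpose: after sorting, r4 still ends up at the lowest address, as the note above states.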
Figure: Stack frame example showing registers r4-r7, lr and pc, the stack memory contents up to the top of memory, and the old and new SP positions.
xPSR access
Figure: xPSR bit field layout, bits 31 down to 0.
• In User Mode, all bits can be read but only the condition flags (_f) can be
modified
Example Source File
Defines start of a read-only area, called “ARMEX”, containing code
Summary
• Several computer architecture taxonomies are in common use today
– von Neumann vs. Harvard
– CISC vs. RISC
– Superscalar vs. VLIW
• The programming model is a description of the architecture relevant to instruction operation.
• ARM Instruction set
– Load/store architecture
– Most instructions are RISC-style and execute in a single cycle.
• Some multi-register operations take longer.
– All instructions can be executed conditionally.
Quiz (1)
Quiz (2)
Q5: Answer the following questions about the ARM programming model:
• a. How many general-purpose registers are there?
• b. What is the purpose of the xPSR?
• c. What is the purpose of the Z bit?
• d. Where is the program counter kept?
Quiz (3)
Q6: Write an ARM instruction which will implement each of the following:
a) r0 = 16
b) r0 = r1 / 16 (signed numbers)
c) r1 = r2 * 3
d) r0 = -r0
Q7: Explain the operation of the BL instruction, including the state of ARM
registers before and after its operation.
Q8: Which data processing instructions always set the condition flags?
Quiz (4)