Lecture2.2 ARM Instruction Set Architecture

The document outlines the Embedded Systems course (ELT3240) at VietNam National University, covering topics such as ARM architecture, instruction sets, and real-time operating systems. It details the curriculum schedule, including weeks dedicated to processor architecture, embedded software, and interfacing with real-world applications. Additionally, it discusses the ARM Cortex family, emphasizing the differences between various Cortex-M processors and their applications in embedded systems.


VietNam National University

University of Engineering and Technology

EMBEDDED SYSTEM FUNDAMENTALS


(ELT3240, NHẬP MÔN HỆ THỐNG NHÚNG: Introduction to Embedded Systems)

Dr. Nguyễn Kiêm Hùng


Curriculum Path
• Week 1-2: Introduction to Embedded Systems
• Week 3: Introduction to C Language
• Week 4: CPU: ARM Cortex-M
• Week 5-6: Memory and Interfaces
• Week 7: ARM-based Embedded System
• Week 8-9: Embedded Software
• Week 10-12: Real-time Operating Systems
• Week 13-14: Interfacing Embedded With Real-World
• Week 15: Project

Outline
• Basic Processor Architecture
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary

Intelligence in your pocket!

MIDs are changing your life…

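Since 2012, virtually all PDAs and smartphones contain ARM CPUs, and ARM processors account for 75 percent of all 32-bit embedded systems and 90 percent of embedded RISC systems!

ARM Doesn’t Actually Produce Microprocessors
– ARM designs the processor cores and licenses the designs to semiconductor companies, which manufacture the chips.
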
Which architecture is my processor?
What are the differences between ARM7 and ARMv7?

• An architecture is a design that defines the programmer’s model, covering all aspects of the design
– registers, addressing modes, memory architecture, basic operations, etc.
– modifications from previous architectures
• A processor is a device that is a silicon implementation of an architecture
– depends on an architecture but also adds other features (e.g. pipeline stages, cache size)
• ARM7 is therefore a processor family, whereas ARMv7 is an architecture version.

Which architecture is my processor?
Processor core Architecture
• ARM7TDMI family v4T
– ARM720T, ARM740T
• ARM9TDMI family v4T
– ARM920T,ARM922T,ARM940T
• ARM9E family v5TE, v5TEJ
– ARM946E-S, ARM966E-S, ARM926EJ-S
• ARM10E family v5TE, v5TEJ
– ARM1020E, ARM1022E, ARM1026EJ-S
• ARM11 family v6
– ARM1136J(F)-S
– ARM1156T2(F)-S v6T2
– ARM1176JZ(F)-S v6Z
• Cortex family (2004)
– ARM Cortex-A8 v7A
– ARM Cortex-R4 v7R
– ARM Cortex-M3 v7M

• Note: Implementations of the same architecture can be very different
– ARM7TDMI: architecture v4T, von Neumann core with 3-stage pipeline
– ARM920T: architecture v4T, Harvard core with 5-stage pipeline and MMU

ARM processor family

ARM Cortex Profile
The ARM Cortex family includes processors based on the three distinct profiles of the ARMv7 architecture.
• The A profile for sophisticated, high-end Applications running open and complex operating systems
• The R profile for Real-time systems
• The M profile optimized for cost-sensitive and Microcontroller applications

ARM Cortex-M Series

Cortex-M is a complete Microcontroller Unit (MCU) architecture, not just a CPU core.

ARM Cortex-M Series

The five most commonly used variants of the Cortex-M profile

ARM Cortex-M Series

Cortex-M0
• 32-bit RISC processor
• 3-stage pipeline, von Neumann architecture
• ARMv6-M architecture
• 16-bit Thumb instruction set with Thumb-2 technology
• Load-store architecture
• 56 assembly instructions
• Low power support

ARM Cortex-M Series
Cortex-M0+
• 32-bit RISC processor
• 2-stage pipeline, enabling faster branch instruction execution with fewer clock cycles and minimizing power consumption
• ARMv6-M architecture
• The most energy-efficient ARM processor
• Cortex-M compatibility: 100% compatible with the Cortex-M0 instruction set and a subset of the Cortex-M3/M4 instruction set
• Ultra-low power

ARM Cortex-M Series
Cortex-M3
• 32-bit RISC processor
• 3-stage pipeline for high-performance embedded systems while providing low-power advantages
• ARMv7-M architecture
• Thumb®/Thumb-2 instruction set
• High-performance, low-cost platforms

ARM Cortex-M Series
Cortex-M4
• 32-bit RISC processor
• 3-stage pipeline for high-performance embedded systems while providing low-power advantages
• ARMv7E-M architecture (ARMv7-M plus the DSP extension)
• Thumb®/Thumb-2 instruction set
• Combines high-efficiency signal processing functionality with low power

ARM Cortex-M Series
• ARM is a RISC architecture
– Most instructions execute in a single cycle
• ARM is a 32-bit load / store architecture
– operations cannot be performed directly on memory locations
– operands must be loaded into the CPU and results are stored back to main memory
• When used in relation to the ARM:
– Halfword means 16 bits (two bytes)
– Word means 32 bits (four bytes)
– Doubleword means 64 bits (eight bytes)
• The earlier ARMs implement two instruction sets:
– 32-bit ARM Instruction Set
– 16-bit Thumb Instruction Set
• Latest ARM cores (e.g. Cortex-M) introduce a new instruction set called Thumb-2
– Provides a mixture of 32-bit and 16-bit instructions
– Maintains code density with increased flexibility

ARM Cortex-M Series

Instruction set comparison between Cortex-M processors and ARM7TDMI:

ARM Cortex-M Series
Comparing 16-bit Multiply Operations Across Processor Architectures

ARM Cortex-M Series

Cortex-M Advantages:
1. Energy efficiency
2. Smaller code
3. Ease of use
4. High performance

ARM Cortex-M Series

Range of Instructions in Different Cortex-M Processors:

ARM Cortex-M Series
Compatibility between Cortex-M processors

von Neumann / Harvard

ARM Cortex-M Series

How about compatibility between the Cortex-M0 (ARMv6-M) and Cortex-M3 (ARMv7-M) processors?

Moving from ARMv6-M to ARMv7-M involves switching the device driver library. Points to check:
• NVIC (Nested Vectored Interrupt Controller) and SCB (System Control Block) registers:
– Availability?
– 8/16-bit transfer modes
• Bit-band feature
• Assembly instructions

Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary
ARM data types

• A word is 32 bits long.
• A word can be divided into four 8-bit bytes.
• ARM addresses are 32 bits long.
• An address refers to a byte, not a word, in memory.
– Word 0 is at byte address 0, word 1 is at byte address 4, etc.
• An ARM processor can be configured at power-up for either little-endian or big-endian mode.

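The byte-addressing and endianness rules above can be sketched in C. This is a minimal illustration, not target code: the helper names (word_address, load_le32, load_be32, is_little_endian) are hypothetical, and is_little_endian reports the byte order of whatever machine runs it.

```c
#include <stdint.h>
#include <string.h>

/* Byte address of the n-th 32-bit word: addresses count bytes, so words are 4 apart. */
uint32_t word_address(uint32_t word_index)
{
    return word_index * 4u;
}

/* Assemble a word from four consecutive bytes, lowest address first (little-endian). */
uint32_t load_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* The same four bytes interpreted with the most significant byte first (big-endian). */
uint32_t load_be32(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         | (uint32_t)bytes[3];
}

/* Returns 1 if the machine running this code stores the least significant byte first. */
int is_little_endian(void)
{
    uint32_t word = 0x11223344u;
    uint8_t first_byte;
    memcpy(&first_byte, &word, 1);  /* read the byte at the lowest address */
    return first_byte == 0x44u;
}
```

The same four bytes give 0x11223344 under load_le32 and 0x44332211 under load_be32, which is why the endianness setting matters whenever data is shared byte by byte.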
Operation Modes & States
• Thread mode:
– Executes application software. The processor enters Thread mode
when it comes out of reset.
• Handler mode:
– Handles exceptions. The processor returns to Thread mode when it
has finished all exception processing.

Operation Modes & States
• Unprivileged access level: The software
– has limited access to system registers using the MSR and MRS instructions,
and cannot use the CPS instruction to mask interrupts
– cannot access the system timer, NVIC, or system control block
– might have restricted access to memory or peripherals.
• Privileged:
– The software can use all the instructions and has access to all resources.

Core Registers
• R0-R12 - General purpose, for data processing
• SP - Stack pointer (R13)
– Can refer to one of two SPs
• Main Stack Pointer (MSP)
• Process Stack Pointer (PSP)
– Uses MSP initially, and in Handler mode
– In Thread mode, can select either MSP or PSP
using SPSEL flag in CONTROL register.
• LR - Link Register (R14)
– Holds the return address when a subroutine is called with the Branch with Link instruction (BL)
• PC - Program Counter (R15)

Program Counter (r15)
• When the processor is executing in ARM state:
– All instructions are 32 bits wide
– All instructions must be word aligned
– Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (because ARM instructions cannot be halfword or byte aligned)
• When the processor is executing in Thumb state:
– Instructions are 16 bits wide (Thumb-2 adds some 32-bit encodings)
– All instructions must be halfword aligned
– Therefore the pc value is stored in bits [31:1] with bit [0] undefined (because Thumb instructions cannot be byte aligned)

Special Registers
• Contain the processor status and define the operation states and interrupt/exception masking
– development of simple applications does not require access to these registers
– they are needed for development of an embedded OS
• Special registers are not memory mapped, and can be accessed using special register access instructions such as MSR and MRS:
MRS <reg>, <special_reg> ; Move special register into register
MSR <special_reg>, <reg> ; Move register to special register

Program Status Registers
xPSR - the combined Program Status Register (PSR) provides information about program execution and the ALU flags.
• The APSR (Application PSR) contains the ALU flags used to control conditional branches
• The EPSR (Execution PSR) cannot be accessed by software code
• The IPSR (Interrupt PSR) is read only and contains the number of the currently executing ISR (Interrupt Service Routine)

Program Status Registers
Cortex-M0 and Cortex-M0+

Program Status Registers

• Condition code flags
– N = Negative result from ALU
– Z = Zero result from ALU
– C = Carry: result from ALU > 32 bits (unsigned overflow)
– V = oVerflow: result from ALU > 31 bits (signed overflow)
• Sticky Overflow flag - Q flag
– Not available in ARMv6-M
– Indicates if saturation has occurred
• ICI/IT bits
– Interrupt-Continuable Instruction (ICI) bits, IF-THEN instruction status bits for conditional execution of Thumb-2 instruction groups
• T bit: Thumb state
– Always T = 1; trying to clear this bit will cause a fault exception
• Exception Number
– Indicates which exception the processor is handling
• New bits in ARMv7E-M
– GE[3:0] used by some SIMD instructions

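As a rough illustration of how the N, Z, C and V flags follow from an addition, the sketch below models a flag-setting 32-bit add in C. The Flags type and the adds_flags name are hypothetical helpers introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical container for the four condition code flags. */
typedef struct { int n, z, c, v; } Flags;

/* Model of a flag-setting add: computes a + b and derives N, Z, C and V. */
Flags adds_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;  /* keep the 33rd bit */
    uint32_t r = (uint32_t)wide;
    Flags f;

    f.n = (int)(r >> 31);          /* N: bit 31 of the result */
    f.z = (r == 0u);               /* Z: result is zero */
    f.c = (int)(wide >> 32);       /* C: carry out of bit 31 */
    /* V: operands have the same sign but the result's sign differs */
    f.v = (int)((~(a ^ b) & (a ^ r)) >> 31);

    *result = r;
    return f;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C, while 0xFFFFFFFF + 1 sets Z and C but not V.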
Program Status Registers

• Compare instructions automatically update the xPSR
• Most other instructions (e.g. arithmetic, logical, or shifting operations) do not automatically update the xPSR but can be forced to by adding the “S” suffix to the instruction mnemonic (e.g. ADDS)

Other Special Registers
Cortex-M exception masking registers: PRIMASK, FAULTMASK and BASEPRI

Note: The FAULTMASK and BASEPRI registers are not available in ARMv6-M

• Used for exception or interrupt masking
• Can only be accessed in the privileged access level
• When PRIMASK is set, it blocks all exceptions and interrupts apart from the Non-Maskable Interrupt (NMI) and the HardFault exception
• When FAULTMASK is set, it blocks all exceptions and interrupts, including the HardFault exception (only the NMI remains active)
• BASEPRI masks exceptions or interrupts based on priority level. When it is set to a non-zero value, it blocks exceptions/interrupts that have the same or lower priority level.

Special Registers
• CONTROL register defines:
– The selection of stack pointer (Main Stack Pointer/Process Stack Pointer)
– Access level in Thread mode (Privileged/Unprivileged)
• The CONTROL register can only be modified in the privileged access level

Stack
• The stack memory is organized as a LIFO (Last In, First Out) structure:
– The stack pointer indicates the last stacked item in the stack memory.
• Two stacks, two independent stack pointers.
– Handler mode always uses the MSP (Main Stack Pointer)
– Thread mode uses the MSP (Main Stack Pointer) by default, or can use the PSP (Process Stack Pointer).
• In an OS environment, ARM recommends that threads running in Thread mode use the process stack and the kernel and exception handlers use the main stack
• The stack can fill up quickly, depending on the situation

Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary
ARMv6 Endianness
• Instructions are always little-endian
• Loads and stores to the Private Peripheral Bus are always little-endian
• Data: depends on the implementation, or on the reset configuration
– Kinetis processors are little-endian

ARM memory organization
Memory model
• 32-bit address space:
– Linear 4 GB address space removes complex paging schemes,
simplifying software architecture
• Virtual memory is not supported in ARMv6-M.
• Supports fetching instructions from 16-bit flash memories
– makes the microcontroller design simpler, smaller, and consequently cheaper
• Data accesses are always naturally aligned (ARMv6-M does not support unaligned accesses)

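Natural alignment means that an access of size S bytes uses an address that is a multiple of S. A minimal C sketch of that check follows; the function names are hypothetical, and the access size is assumed to be a power of two (1, 2, 4 or 8).

```c
#include <stdint.h>

/* Returns 1 if an access of size_bytes at address is naturally aligned.
   size_bytes is assumed to be a power of two: 1, 2, 4 or 8. */
int is_naturally_aligned(uint32_t address, uint32_t size_bytes)
{
    return (address & (size_bytes - 1u)) == 0u;
}

/* Rounds address down to the nearest multiple of size_bytes (same assumption). */
uint32_t align_down(uint32_t address, uint32_t size_bytes)
{
    return address & ~(size_bytes - 1u);
}
```

A word access at 0x20000002 fails the check, while a halfword access at the same address passes.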
ARM memory organization
The Cortex-M memory map

ARM memory organization
The Cortex-M memory map and buses

Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
• ARM flow control operations.
• Summary

ARM and Thumb Instructions

• Thumb technology reduces program memory size and bandwidth requirements
• Thumb provides a subset of ARM 32-bit instructions re-encoded into fewer bits (most 16 bits, some 32 bits)
– Not all 32-bit instructions available
– Most 16-bit instructions can only access low registers (R0-R7), but a few can access high registers (R8-R15)
• 1995: Thumb-1 instruction set
– 16-bit instructions
• 2003: Thumb-2 instruction set
– Adds some 32-bit instructions
– Improves speed with little memory overhead
• CPU operating state
– CPU decodes instructions based on whether it is in Thumb state or ARM state, controlled by the T bit
– Thumb state is indicated by the program counter being odd (LSB = 1)
• Cortex-M0+ only uses Thumb instructions and is always in Thumb state
• See the ARMv6-M Architecture Reference Manual for specifics per instruction (Section A.6.7)

Cortex-M Instruction Groups

Group | Instr bits | M0,M0+,M1 | M3 | M4 | M7 | M23 | M33,M35P | Instructions
Thumb-1 | 16 | Yes | Yes | Yes | Yes | Yes | Yes | ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD
Thumb-1 | 16 | No | Yes | Yes | Yes | Yes | Yes | CBNZ, CBZ
Thumb-1 | 16 | No | Yes | Yes | Yes | No | Yes | IT
Thumb-2 | 32 | Yes | Yes | Yes | Yes | Yes | Yes | BL, DMB, DSB, ISB, MRS, MSR
Thumb-2 | 32 | No | Yes | Yes | Yes | Yes | Yes | SDIV, UDIV
Thumb-2 | 32 | No | Yes | Yes | Yes | No | Yes | ADC, ADD, ADR, AND, ASR, B, BFC, BFI, BIC, CDP, CLREX, CLZ, CMN, CMP, DBG, EOR, LDC, LDM, LDR, LDRB, LDRBT, LDRD, LDREX, LDREXB, LDREXH, LDRH, LDRHT, LDRSB, LDRSBT, LDRSH, LDRSHT, LDRT, LSL, LSR, MCR, MCRR, MLA, MLS, MOV, MOVT, MRC, MRRC, MUL, MVN, NOP, ORN, ORR, PLD, PLDW, PLI, POP, PUSH, RBIT, REV, REV16, REVSH, ROR, RRX, RSB, SBC, SBFX, SEV, SMLAL, SMULL, SSAT, STC, STM, STR, STRB, STRBT, STRD, STREX, STREXB, STREXH, STRH, STRHT, STRT, SUB, SXTB, SXTH, TBB, TBH, TEQ, TST, UBFX, UMLAL, UMULL, USAT, UXTB, UXTH, WFE, WFI, YIELD
DSP | 32 | No | No | Yes | Yes | No | Optional | PKH, QADD, QADD16, QADD8, QASX, QDADD, QDSUB, QSAX, QSUB, QSUB16, QSUB8, SADD16, SADD8, SASX, SEL, SHADD16, SHADD8, SHASX, SHSAX, SHSUB16, SHSUB8, SMLABB, SMLABT, SMLATB, SMLATT, SMLAD, SMLALBB, SMLALBT, SMLALTB, SMLALTT, SMLALD, SMLAWB, SMLAWT, SMLSD, SMLSLD, SMMLA, SMMLS, SMMUL, SMUAD, SMULBB, SMULBT, SMULTT, SMULTB, SMULWT, SMULWB, SMUSD, SSAT16, SSAX, SSUB16, SSUB8, SXTAB, SXTAB16, SXTAH, SXTB16, UADD16, UADD8, UASX, UHADD16, UHADD8, UHASX, UHSAX, UHSUB16, UHSUB8, UMAAL, UQADD16, UQADD8, UQASX, UQSAX, UQSUB16, UQSUB8, USAD8, USADA8, USAT16, USAX, USUB16, USUB8, UXTAB, UXTAB16, UXTAH, UXTB16
SP Float | 32 | No | No | Optional | Optional | No | Optional | VABS, VADD, VCMP, VCMPE, VCVT, VCVTR, VDIV, VLDM, VLDR, VMLA, VMLS, VMOV, VMRS, VMSR, VMUL, VNEG, VNMLA, VNMLS, VNMUL, VPOP, VPUSH, VSQRT, VSTM, VSTR, VSUB
DP Float | 32 | No | No | No | Optional | No | No | VCVTA, VCVTM, VCVTN, VCVTP, VMAXNM, VMINNM, VRINTA, VRINTM, VRINTN, VRINTP, VRINTR, VRINTX, VRINTZ, VSEL
TrustZone | 16 | No | No | No | No | Optional | Optional | BLXNS, BXNS
TrustZone | 32 | No | No | No | No | Optional | Optional | SG, TT, TTT, TTA, TTAT
Co-processor | 16 | No | No | No | No | No | Optional | CDP, CDP2, MCR, MCR2, MCRR, MCRR2, MRC, MRC2, MRRC, MRRC2

Source: https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM architecture
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM Instruction Set
• ARM data processing operations.
– Condition codes
– Load/store instructions
• ARM flow control operations.
• Summary

Data processing instructions

ARM Calculation Unit

Data processing Instructions
• The data processing instructions only work on registers, NOT
memory.
• Instruction Format:
<Operation>{<cond>}{S} Rd, Rn, Operand2
• <Operation>: three-letter mnemonic that names the operation
– ADD, SUB, ...
• {<cond>}: Optional two-letter condition code
– ADDEQ r0, r1, r2 (executes only if the Z flag is set)
• {S}: Optional suffix that makes the instruction update the condition flags
– ADDS r0, r1, r2
• Rd: Destination register
– ADD r0, r1, r2 (Rd is r0)
• Rn: First operand register
– ADD r0, r1, r2 (Rn is r1)

Data processing Instructions
• Instruction Format (continued):
<Operation>{<cond>}{S} Rd, Rn, Operand2
• Operand 2 can be a register or an (8-bit) immediate value
– SUB r0, r1, r2 (r0 = r1 - r2)
– ADD r1, r4, #0xFF (r1 = r4 + 0xFF)
• Comparisons set flags only - they do not specify Rd
– CMP r0, r3
• Data movement does not specify Rn
– MOV r0, r1
• Operand 2 is sent to the ALU via the barrel shifter
• By default, data processing instructions do not update the condition flags.
[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the result.]

Data processing Instructions

Arithmetic:
– ADD, ADC : add (w. carry)
– SUB, SBC : subtract (w. carry)
– RSB, RSC : reverse subtract (w. carry)

Logical:
– AND
– ORR
– EOR : Exclusive OR
– BIC : bit clear
• BIC r0,r1,r2 ; r0 = r1 and (not r2).

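The less familiar operations in the lists above (add/subtract with carry, reverse subtract, bit clear) can be written out in C as a sketch of their results. The arm_* function names are hypothetical, and the carry flag is passed in explicitly as 0 or 1.

```c
#include <stdint.h>

/* ADC: Rd = Rn + Operand2 + C */
uint32_t arm_adc(uint32_t rn, uint32_t op2, unsigned carry_in)
{
    return rn + op2 + (carry_in & 1u);
}

/* SBC: Rd = Rn - Operand2 - NOT(C); ARM uses the carry flag as "no borrow". */
uint32_t arm_sbc(uint32_t rn, uint32_t op2, unsigned carry_in)
{
    return rn - op2 - (1u - (carry_in & 1u));
}

/* RSB: Rd = Operand2 - Rn (operands reversed compared with SUB). */
uint32_t arm_rsb(uint32_t rn, uint32_t op2)
{
    return op2 - rn;
}

/* BIC: Rd = Rn AND NOT(Operand2), i.e. clear the bits that are set in Operand2. */
uint32_t arm_bic(uint32_t rn, uint32_t op2)
{
    return rn & ~op2;
}
```

RSB with an operand of zero negates a register (0 - Rn), and ADC/SBC are what chain 32-bit operations into 64-bit arithmetic.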
Data processing Instructions

ARM comparison instructions
– CMP : compare (using subtraction on operands)
– CMN : compare negative (using addition on operands)
– TST : bit-wise test (using AND on operands)
– TEQ : test equivalence (using EOR on operands)

These instructions set only the NZCV bits of xPSR; the computed result is discarded.

Data processing Instructions

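ARM move instructions
– MOV, MVN : move (MVN moves the negated value: the one’s complement of the operand bits)
• put values only from registers or from immediate values into registers
MOV r0, r1 ; sets r0 to r1
MOV r0, #4 ; sets r0 to the constant 4
MOV r0, r1, LSR #1 ; sets r0 to r1 shifted right by 1 bit
MOV r0, r1, LSR r2 ; sets r0 to r1 shifted right by the amount in r2
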
ARM data instructions
• Shift operations are used as part of data processing instructions.
– bits can be shifted by 0-31 places, typically without performance penalty
• LSL: Logical Shift Left (CF ← Destination ← 0)
– Multiplication by a power of 2
• LSR: Logical Shift Right (0 → Destination → CF)
– Unsigned division by a power of 2
• ASR: Arithmetic Shift Right (sign bit → Destination → CF)
– Division by a power of 2, preserving the sign bit
• ROR: Rotate Right (Destination → CF)
– Bit rotate with wrap around from LSB to MSB
• RRX: Rotate Right Extended (CF → Destination → CF)
– Single bit rotate with wrap around from CF to MSB

Shift Operations

• ARM cores in ARM state do not actually need separate shift instructions: the barrel shifter can perform a shift during an instruction by specifying a shift on the second operand directly inside the instruction.
• In Thumb state, the instructions are simplified and dedicated shift instructions do exist.

ADD r0, r5, r5, LSL #1 ; r0 = r5 x 3
MOVS R1,R0,LSR #3 ; R1 = R0 >> 3

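The five shift types can be modelled in C as a rough sketch. The function names are hypothetical, shift amounts are assumed to lie in 1..31, and the carry-out is not modelled. ASR is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Shift amounts n are assumed to be in the range 1..31. */

uint32_t lsl(uint32_t x, unsigned n) { return x << n; }  /* zeros enter at the LSB */
uint32_t lsr(uint32_t x, unsigned n) { return x >> n; }  /* zeros enter at the MSB */

/* Arithmetic shift right: copies of the sign bit enter at the MSB. */
uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> n);  /* fill the vacated top n bits with ones */
    return shifted;
}

/* Rotate right: bits leaving at the LSB re-enter at the MSB. */
uint32_t ror(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Rotate right extended: one-bit rotate through the carry flag (carry_in is 0 or 1). */
uint32_t rrx(uint32_t x, unsigned carry_in)
{
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}

/* Equivalent of ADD r0, r5, r5, LSL #1: r5 + 2*r5 = 3*r5. */
uint32_t times3(uint32_t r5)
{
    return r5 + lsl(r5, 1);
}
```

Viewed as two's complement, asr(0xFFFFFFF0, 2) turns -16 into -4, whereas lsr on the same value would give a large positive number.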
Multiply and Divide
• There are 2 classes of multiply - producing 32-bit and 64-bit results
• 32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles
– MUL r0, r1, r2 ; r0 = r1 * r2
– MLA r0, r1, r2, r3 ; r0 = (r1 * r2) + r3
• 64-bit multiply instructions offer both signed and unsigned versions
– For these instructions there are 2 destination registers
– [U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
– [U|S]MLAL r4, r5, r2, r3 ; r5:r4 = (r2 * r3) + r5:r4
• Most ARM cores do not offer integer divide instructions (among the Cortex-M processors, the M0, M0+ and M1 lack SDIV/UDIV, while the M3, M4, M7, M23 and M33 provide them)
– Division operations will be performed by C library routines or inline shifts

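The 64-bit multiply forms split their result across two registers (high:low). A sketch of that behaviour in C, using hypothetical function names that mirror the mnemonics; the signed version assumes the usual two's complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* MLA Rd, Rm, Rs, Rn : Rd = (Rm * Rs) + Rn, keeping the low 32 bits. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return rm * rs + rn;
}

/* UMULL RdLo, RdHi, Rm, Rs : RdHi:RdLo = Rm * Rs (unsigned). */
void umull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    uint64_t product = (uint64_t)rm * (uint64_t)rs;
    *lo = (uint32_t)product;
    *hi = (uint32_t)(product >> 32);
}

/* SMULL RdLo, RdHi, Rm, Rs : RdHi:RdLo = Rm * Rs (signed). */
void smull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    int64_t product = (int64_t)(int32_t)rm * (int64_t)(int32_t)rs;
    uint64_t bits = (uint64_t)product;
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}

/* UMLAL RdLo, RdHi, Rm, Rs : RdHi:RdLo = (Rm * Rs) + RdHi:RdLo (unsigned). */
void umlal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    uint64_t acc = ((uint64_t)*hi << 32) | (uint64_t)*lo;
    acc += (uint64_t)rm * (uint64_t)rs;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

Multiplying 0xFFFFFFFF by 2 shows the difference between the two variants: unsigned gives 0x1:FFFFFFFE, signed (-1 * 2) gives 0xFFFFFFFF:FFFFFFFE.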
Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data processing operations.
• Condition Codes
• Load/store instructions
– ARM flow of control.
• Summary

Condition Codes
• All operations can be performed conditionally by testing the N, Z, C, and V flags in xPSR:
– Note AL is the default and does not need to be specified

Suffix   Description               Flags tested
EQ       Equal                     Z=1
NE       Not equal                 Z=0
CS/HS    Unsigned higher or same   C=1
CC/LO    Unsigned lower            C=0
MI       Minus                     N=1
PL       Positive or Zero          N=0
VS       Overflow                  V=1
VC       No overflow               V=0
HI       Unsigned higher           C=1 & Z=0
LS       Unsigned lower or same    C=0 or Z=1
GE       Greater or equal          N=V
LT       Less than                 N!=V
GT       Greater than              Z=0 & N=V
LE       Less than or equal        Z=1 or N!=V
AL       Always

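The table above translates directly into a small decision function. The sketch below is only an illustration: the Flags type and condition_passed name are hypothetical, and the suffix is passed as a string for readability rather than as the 4-bit encoding a real decoder would use.

```c
#include <string.h>

/* Hypothetical container for the four condition code flags (each 0 or 1). */
typedef struct { int n, z, c, v; } Flags;

/* Returns 1 if the condition holds, 0 if not, -1 for an unknown suffix. */
int condition_passed(const char *suffix, Flags f)
{
    if (strcmp(suffix, "EQ") == 0) return f.z == 1;
    if (strcmp(suffix, "NE") == 0) return f.z == 0;
    if (strcmp(suffix, "CS") == 0 || strcmp(suffix, "HS") == 0) return f.c == 1;
    if (strcmp(suffix, "CC") == 0 || strcmp(suffix, "LO") == 0) return f.c == 0;
    if (strcmp(suffix, "MI") == 0) return f.n == 1;
    if (strcmp(suffix, "PL") == 0) return f.n == 0;
    if (strcmp(suffix, "VS") == 0) return f.v == 1;
    if (strcmp(suffix, "VC") == 0) return f.v == 0;
    if (strcmp(suffix, "HI") == 0) return f.c == 1 && f.z == 0;
    if (strcmp(suffix, "LS") == 0) return f.c == 0 || f.z == 1;
    if (strcmp(suffix, "GE") == 0) return f.n == f.v;
    if (strcmp(suffix, "LT") == 0) return f.n != f.v;
    if (strcmp(suffix, "GT") == 0) return f.z == 0 && f.n == f.v;
    if (strcmp(suffix, "LE") == 0) return f.z == 1 || f.n != f.v;
    if (strcmp(suffix, "AL") == 0) return 1;
    return -1;
}
```

For example, after CMP with operands 1 and 2 the flags are N=1, Z=0, C=0, V=0, so LT, LO and NE pass while GE and HI fail.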
Conditional Execution and Flags
• ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code
– This can increase code density and increase performance by reducing the number of forward branches
CMP r0, r1 ; r0 - r1, compare r0 with r1 and set flags
ADDGT r2, r2, #1 ; if > then r2=r2+1, flags remain unchanged
ADDLE r3, r3, #1 ; if <= then r3=r3+1, flags remain unchanged

• By default, data processing instructions do not affect the condition flags, but this can be achieved by postfixing the instruction (and any condition code) with an “S”
loop
ADD r2, r2, r3 ; r2=r2+r3
SUBS r1, r1, #0x01 ; decrement r1 and set flags
BNE loop ; if Z flag clear then branch

Conditional execution examples

C source code:
if (r0 == 0)
{
r1 = r1 + 1;
}
else
{
r2 = r2 + 1;
}

ARM instructions, unconditional:
CMP r0, #0
BNE else
ADD r1, r1, #1
B end
else
ADD r2, r2, #1
end
...
• 5 instructions
• 5 words
• 5 or 6 cycles

ARM instructions, conditional:
CMP r0, #0
ADDEQ r1, r1, #1
ADDNE r2, r2, #1
...
• 3 instructions
• 3 words
• 3 cycles

Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data processing operations.
• Condition Codes
• Load/store memory access instructions
– ARM flow controlling operations.
• Summary

ARM load/store instructions
Load     Store    Data size
LDR      STR      Word
LDRB     STRB     Byte
LDRH     STRH     Halfword
LDRSB    -        Signed byte load
LDRSH    -        Signed halfword load

• Memory system must support all access sizes
• Syntax:
– LDR{<cond>}{<size>} Rd, <address>
– STR{<cond>}{<size>} Rd, <address>

ARM load/store instructions
• Addressing modes: Address accessed by LDR/STR is
specified by a base register with an offset
– register indirect : LDR r0,[r1]
• loads r0 from the address given by r1
– with second register : LDR r0,[r1,-r2]
• loads r0 from the address given by r1 – r2
– with constant : LDR r0,[r1,#4]
• loads r0 from the address r1 + 4.
– Auto-indexing (pre-indexed with write-back) updates the base register first: LDR r0,[r1,#4]!
• r1 = r1 + 4, then loads r0 from the new address in r1
– Post-indexing fetches, then applies the offset: LDR r0,[r1],#4
• Loads r0 from the address in r1, then adds 4 to r1.

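The difference between offset, pre-indexed and post-indexed addressing is easiest to see by tracking the base register. In this C sketch, memory is modelled as an array of words and the base register holds a byte address; all names are hypothetical and addresses are assumed to be word aligned.

```c
#include <stdint.h>

/* mem models memory as 32-bit words; addr is a byte address, assumed word aligned. */
static uint32_t load_word(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4u];
}

/* LDR r0,[r1,#offset] : load from base + offset, base register unchanged. */
uint32_t ldr_offset(const uint32_t *mem, uint32_t base, int32_t offset)
{
    return load_word(mem, base + (uint32_t)offset);
}

/* LDR r0,[r1,#offset]! : update the base first, then load from the new base. */
uint32_t ldr_pre_indexed(const uint32_t *mem, uint32_t *base, int32_t offset)
{
    *base += (uint32_t)offset;
    return load_word(mem, *base);
}

/* LDR r0,[r1],#offset : load from the old base, then update the base. */
uint32_t ldr_post_indexed(const uint32_t *mem, uint32_t *base, int32_t offset)
{
    uint32_t value = load_word(mem, *base);
    *base += (uint32_t)offset;
    return value;
}
```

Post-indexing is the form that walks an array: each call returns the current element and leaves the base pointing at the next one.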
Loading 32 bit constants
• To allow larger constants to be loaded, the assembler offers a pseudo-
instruction:
– LDR rd, =const
• This will either:
– Produce a MOV or MVN instruction to generate the value (if possible)
or
– Generate a LDR instruction with a PC-relative address to read the constant from
a literal pool (Constant data area embedded in the code)
• For example
– LDR r0, =0xFF => MOV r0, #0xFF
– LDR r0, =0x55555555 => LDR r0, [PC, #Imm12]
...
DCD 0x55555555 ; constant placed in the literal pool
• This is the recommended way of loading constants into a register

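Whether the assembler can use MOV or MVN depends on whether the constant fits the immediate encoding. The sketch below assumes the ARM-state (32-bit) data processing encoding, in which an immediate is an 8-bit value rotated right by an even amount; Thumb encodings follow different rules. The names is_arm_immediate, choose_load and the LoadKind enum are hypothetical.

```c
#include <stdint.h>

typedef enum { USE_MOV, USE_MVN, USE_LITERAL_POOL } LoadKind;

static uint32_t rotate_left(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Returns 1 if value equals an 8-bit constant rotated right by 2*rot, rot = 0..15. */
int is_arm_immediate(uint32_t value)
{
    unsigned rot;
    for (rot = 0; rot < 16u; rot++) {
        /* Undo a rotate-right by 2*rot; the result must fit in 8 bits. */
        if (rotate_left(value, 2u * rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Mirrors the choice described for LDR rd, =const. */
LoadKind choose_load(uint32_t value)
{
    if (is_arm_immediate(value))  return USE_MOV;
    if (is_arm_immediate(~value)) return USE_MVN;  /* MVN loads the bitwise inverse */
    return USE_LITERAL_POOL;
}
```

Under this rule 0xFF and 0xFF000000 can be loaded with MOV, 0xFFFFFFFF with MVN, and 0x55555555 needs the literal pool, matching the two examples on the slide.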
Example: C assignments
• C:
x = (a + b) - c;

• Assembly:
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
LDR r4,=b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
LDR r4,=c ; get address for c
LDR r2,[r4] ; get value of c
SUB r3,r3,r2 ; complete computation of x
LDR r4,=x ; get address for x
STR r3,[r4] ; store value of x

Example: C assignment

• C:
y = a*(b+c);

• Assembly:
LDR r4,=b ; get address for b
LDR r0,[r4] ; get value of b
LDR r4,=c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
MUL r2,r2,r0 ; compute final value for y
LDR r4,=y ; get address for y
STR r2,[r4] ; store y

Example: C assignment

• C:
z = (a << 2) | (b & 15);

• Assembler:
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
LDR r4,=b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
LDR r4,=z ; get address for z
STR r1,[r4] ; store value for z

Outline
• Computer architecture taxonomy
• Basic concept of Instruction set
• ARM instruction set
– ARM versions.
– ARM programming model.
– ARM memory organization.
– ARM data operations.
• Condition Codes
• Load/store instructions
– ARM flow controlling operations.
• Summary

Branch Instructions
• Branch instructions have the following format:
– B{L}{<cond>} label
– subroutine calls can be made by specifying the optional {L}
– a 24-bit address offset field is part of the instruction encoding
• On execution this is left shifted 2 places (since ARM instructions are always word aligned) to give a 26-bit signed value, thus giving a relative branch range of ± 32 MB
– Causes a pipeline flush

B start ; perform PC-relative branch to label “start”
.
.
start ; continue execution from here

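The ±32 MB range follows from sign-extending the 24-bit field after the shift. A C sketch of the target calculation, assuming the ARM-state convention that the PC reads as the address of the branch instruction plus 8; the function name is hypothetical.

```c
#include <stdint.h>

/* Computes the destination of an ARM-state B/BL from its 24-bit offset field. */
uint32_t arm_branch_target(uint32_t branch_address, uint32_t imm24)
{
    uint32_t offset = (imm24 & 0x00FFFFFFu) << 2;  /* 26-bit byte offset */

    if (offset & 0x02000000u)                      /* bit 25 is the sign bit */
        offset |= 0xFC000000u;                     /* sign-extend to 32 bits */

    /* In ARM state the PC reads as the instruction address + 8. */
    return branch_address + 8u + offset;
}
```

An offset field of 0xFFFFFE encodes -8 bytes, so the branch targets itself, which is the encoding of the "stop B stop" idiom.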
Example: if statement

• C:
if (a < b) { x = 5; y = c + d; } else x = c - d;

• Assembly:
; compute and test condition
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
LDR r4,=b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a with b
BGE fblock ; if a >= b, branch to false block
; true block: unconditional
MOV r0,#5 ; generate value for x
LDR r4,=x ; get address for x
STR r0,[r4] ; store x
LDR r4,=c ; get address for c
LDR r0,[r4] ; get value of c
LDR r4,=d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
LDR r4,=y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block
; false block
fblock LDR r4,=c ; get address for c
LDR r0,[r4] ; get value of c
LDR r4,=d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute c-d
LDR r4,=x ; get address for x
STR r0,[r4] ; store value of x
after ...

Example: Conditional instruction implementation

; compute and test condition
LDR r4,=a ; get address for a
LDR r0,[r4] ; get value of a
LDR r4,=b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a with b
; true block: conditional (executes only if a < b)
MOVLT r0,#5 ; generate value for x
LDRLT r4,=x ; get address for x
STRLT r0,[r4] ; store x
LDRLT r4,=c ; get address for c
LDRLT r0,[r4] ; get value of c
LDRLT r4,=d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
LDRLT r4,=y ; get address for y
STRLT r0,[r4] ; store y
; false block: conditional (executes only if a >= b)
LDRGE r4,=c ; get address for c
LDRGE r0,[r4] ; get value of c
LDRGE r4,=d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute c-d
LDRGE r4,=x ; get address for x
STRGE r0,[r4] ; store value of x

Example: switch statement
• C:
switch (test) { case 0: … break; case 1: … }

• Assembly:
LDR r2,=test ; get address for test
LDR r0,[r2] ; load value for test
LDR r1,=switchtab ; load address for switch table
LDR r15,[r1,r0,LSL #2] ; index switch table and jump
switchtab DCD case0
DCD case1
...
case0
ADD r0, r1, r2 ; Operation 0
B Endcase ; Break
case1
SUB r0, r1,
Endcase
...
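The switch table above is the assembly form of an array of code addresses indexed by the switch value. A comparable C sketch uses an array of function pointers; the handler bodies and the dispatch name are hypothetical, and the bounds check is an addition that the assembly version does not perform.

```c
/* Each case body becomes a small function; the bodies are only placeholders. */
typedef int (*CaseHandler)(int, int);

static int case0(int a, int b) { return a + b; }  /* mirrors ADD r0, r1, r2 */
static int case1(int a, int b) { return a - b; }  /* placeholder for a SUB case */

/* Equivalent of "switchtab DCD case0 / DCD case1". */
static const CaseHandler switchtab[] = { case0, case1 };

/* Equivalent of LDR r15,[r1,r0,LSL #2]: index the table and jump. */
int dispatch(unsigned test, int a, int b)
{
    if (test >= sizeof switchtab / sizeof switchtab[0])
        return 0;                 /* out-of-range value: no case selected */
    return switchtab[test](a, b);
}
```

The LSL #2 in the assembly corresponds to the pointer size scaling that C array indexing performs implicitly.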
Example: FIR filter (1)

• C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];

• Assembly
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
LDR r2,=N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f
LDR r3,=c ; load r3 with base of c
LDR r5,=x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue

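For reference, the C fragment for the FIR filter can be wrapped into a complete function. This is a minimal sketch with a hypothetical signature; it assumes 32-bit integer coefficients and samples, matching the word-sized loads in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products of coefficients c[i] and samples x[i] for i = 0..n-1. */
int32_t fir(const int32_t *c, const int32_t *x, size_t n)
{
    int32_t f = 0;
    size_t i;

    for (i = 0; i < n; i++)
        f = f + c[i] * x[i];   /* MUL then ADD in the assembly version */

    return f;
}
```

One difference worth noting: the assembly loop tests the condition at the bottom, so its body runs once even when N is 0, whereas the C for loop tests before the first iteration.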
Subroutines
• Implementing a conventional subroutine call requires two steps:
– Store the return address
– Branch to the address of the required subroutine
• These steps are carried out in one instruction, BL
– The return address is stored in the link register (lr/r14)
– Branch to an address anywhere within a +/- 32MB range
• Returning is performed by restoring the program counter (pc) from lr
– MOV r15,r14
– or BX lr

C caller:
void func1 (void)
{
:
func2();
:
}

Assembly:
func1
:
BL func2
:
func2
:
BX lr

Generating Branches with LDR
• The ARM’s branch instruction is limited to a range of ±32MB
– However branches can also be performed by loading address values directly into
the PC (r15)
– armasm provides pseudo instructions to make this easier
Assembler code:
LDR pc, =label ; load address of label into PC

Object code generated by armasm:
LDR pc, [pc, #n]
.
--------------
DCD 0x12345678 ; literal pool: address data

Branches anywhere within the 4GB address space are thus possible

Memory Block Copying (1)

• The use of base register updating enables simple copying routines to be written
– For example: The post-indexed variant could be used to copy a block of memory

; r8 points to start of source data
; r9 points to end of source data
; r10 points to start of destination data
loop LDR r0, [r8], #4 ; load 4 bytes
STR r0, [r10], #4 ; and store them
CMP r8, r9 ; check for the end
BLT loop ; else loop

• In this example 1 word is copied per iteration
[Figure: memory with increasing addresses; r8 marks the start of the source block, r9 its end, and r10 the start of the destination block.]

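The word-by-word copy loop corresponds to the familiar pointer idiom in C. The sketch below uses a hypothetical copy_words function and keeps the assembly's structure, including the test at the bottom of the loop, so it assumes the block holds at least one word.

```c
#include <stdint.h>

/* Copies words from [src, src_end) to dst; assumes at least one word to copy. */
void copy_words(const uint32_t *src, const uint32_t *src_end, uint32_t *dst)
{
    do {
        *dst++ = *src++;       /* LDR r0,[r8],#4 then STR r0,[r10],#4 */
    } while (src < src_end);   /* CMP r8,r9 then BLT loop */
}
```

The post-increment on each pointer plays the role of the post-indexed addressing mode: use the address, then advance it by one word.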
Load and Store Multiples
• Syntax:
– <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
– LDMIA / STMIA : increment after
– LDMIB / STMIB : increment before
– LDMDA / STMDA : decrement after
– LDMDB / STMDB : decrement before

LDMxx r10, {r0,r1,r4} ; loads r0, r1, r4 starting from the address in r10
STMxx r10, {r0,r1,r4} ; stores r0, r1, r4 starting from the address in r10

Memory layout relative to the base register Rb = r10 (from the original figure):
– IA: r0 at r10, r1 at r10+4, r4 at r10+8
– IB: r0 at r10+4, r1 at r10+8, r4 at r10+12
– DA: r4 at r10, r1 at r10-4, r0 at r10-8
– DB: r4 at r10-4, r1 at r10-8, r0 at r10-12

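The four addressing modes differ only in where the block starts relative to the base register and in which direction the base moves on write-back. A C sketch of that arithmetic follows, with hypothetical names; count is the number of registers in the list.

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } BlockMode;

/* Lowest address touched by the transfer; the lowest-numbered register goes there. */
uint32_t block_start_address(BlockMode mode, uint32_t base, unsigned count)
{
    switch (mode) {
    case MODE_IA: return base;                       /* increment after  */
    case MODE_IB: return base + 4u;                  /* increment before */
    case MODE_DA: return base - 4u * count + 4u;     /* decrement after  */
    default:      return base - 4u * count;          /* MODE_DB: decrement before */
    }
}

/* Value written back to the base register when "!" is specified. */
uint32_t block_writeback(BlockMode mode, uint32_t base, unsigned count)
{
    if (mode == MODE_IA || mode == MODE_IB)
        return base + 4u * count;
    return base - 4u * count;
}
```

With base 0x1000 and three registers, the transfers start at 0x1000 (IA), 0x1004 (IB), 0x0FF8 (DA) and 0x0FF4 (DB).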
Memory Block Copying (2)

• As well as being used for stack operations, the STM / LDM instructions can perform block copying of memory
• For example
; r8 points to start of source data
; r9 points to end of source data
; r10 points to start of destination data
loop LDMIA r8!, {r0-r7} ; load 32 bytes
STMIA r10!, {r0-r7} ; and store them
CMP r8, r9 ; check for the end
BLT loop ; and loop
• In this example 8 words are copied per loop

Stacks
• The stack is a last-in-first-out (LIFO) queue:
– The top of the stack is the most recently allocated space
– The ARM stack grows down to lower addresses in memory
– The stack pointer, SP (R13), points to the top of the stack
[Figure: The stack (a) before expansion and (b) after two-word expansion]

Stacks
• ARM stack operations are implemented by block transfer instructions:
– STMFD (Push) Store Multiple - Full Descending stack [STMDB]
– LDMFD (Pop) Load Multiple - Full Descending stack [LDMIA]
• Note: Multiple registers are always stacked in register order, with the lowest-numbered register at the lowest memory address
– The order in which the registers are specified has no effect.

STMFD sp!, {r4-r7, lr} ; push r4-r7 and the return address
LDMFD sp!, {r4-r7, pc} ; pop r4-r7 and return by loading pc

[Figure: stack frame before and after STMFD and LDMFD; sp moves down by five words on the push and back up on the pop.]

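A full descending stack can be modelled with an array and an index that moves downward on push. The sketch below is a simplified model with hypothetical names (Stack, push_multiple, pop_multiple); it uses a word index instead of a byte address and performs no overflow or underflow checks.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* sp is a word index into mem; sp == STACK_WORDS means the stack is empty. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;
} Stack;

void stack_init(Stack *s)
{
    s->sp = STACK_WORDS;
}

/* STMFD sp!, {regs}: move sp down first, then store with the lowest-numbered
   register at the lowest address. */
void push_multiple(Stack *s, const uint32_t *regs, unsigned count)
{
    unsigned i;
    s->sp -= count;
    for (i = 0; i < count; i++)
        s->mem[s->sp + i] = regs[i];
}

/* LDMFD sp!, {regs}: load starting at sp, then move sp back up. */
void pop_multiple(Stack *s, uint32_t *regs, unsigned count)
{
    unsigned i;
    for (i = 0; i < count; i++)
        regs[i] = s->mem[s->sp + i];
    s->sp += count;
}
```

"Full" means sp always points at the last item stored, and "descending" means that item sits at the lowest used address.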
xPSR access
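Bit fields: [31] N, [30] Z, [29] C, [28] V, [27] Q, [26:25] IT[1:0], [24] J, [19:16] GE[3:0], [15:10] IT[7:2], [9] E, [8] A, [7] I, [6] F, [5] T, [4:0] mode
Byte fields: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

• MRS and MSR allow the contents of xPSR to be transferred to / from a general purpose register; MSR can also take an immediate value
– MSR allows the whole status register, or just parts of it, to be updated
– MRS (Register ← Status Register)
– MSR (Status Register ← Register)
• Interrupts can be enabled/disabled and modes changed by writing to the xPSR
– Typically a read/modify/write strategy should be used:
MRS r0,xPSR ; read xPSR into r0
BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
MSR xPSR_c,r0 ; write modified value to ‘c’ byte only
• In User Mode, all bits can be read but only the condition flags (_f) can be modified
• Note: the I/F/mode fields and the _f/_s/_x/_c suffixes belong to the status register of classic ARM cores (CPSR); on Cortex-M, interrupts are masked through PRIMASK instead, for example with the CPS instruction.
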
Example Source File
AREA ARMEX, CODE, READONLY ; defines start of a read-only area, called “ARMEX”, containing code
ENTRY ; marks software entry point
ARM ; start of a sequence of ARM instructions
main ; a label starts in the first column
MOV r0, #10
MOV r1, #3
ADD r2, r0, r1 ; this is a comment
stop
B stop
END ; marks end of source file

Summary
• Several computer architecture taxonomies are in common use today
– von Neumann vs. Harvard
– CISC vs. RISC
– Superscalar vs. VLIW
• The programming model is a description of the architecture relevant to instruction operation.
• ARM Instruction set
– Load/store architecture
– Most instructions are RISC-style and execute in a single cycle.
• Some multi-register operations take longer.
– All instructions can be executed conditionally.

Quiz (1)

Q1: What is the difference between von Neumann and Harvard architecture?
Q2: What is the difference between CISC and RISC?
Q3: What is the difference between microprocessor and microcontroller architecture?
Q3: What is the difference between a superscalar processor and a VLIW processor?
Q4: What is the difference between a big-endian and a little-endian data representation?

Quiz (2)
Q5: Answer the following questions about the ARM programming model:
• a. How many general-purpose registers are there?
• b. What is the purpose of the xPSR?
• c. What is the purpose of the Z bit?
• d. Where is the program counter kept?

Quiz (3)

Q6: Write an ARM instruction which will implement each of the following:
a) r0 = 16
b) r0 = r1 / 16 (signed numbers)
c) r1 = r2 * 3
d) r0 = -r0

Q7: Explain the operation of the BL instruction, including the state of ARM
registers before and after its operation.
Q8: Which data processing instructions always set the condition flags?

Quiz (4)

1. What assembler directive should start every assembler source file?

2. What instructions can be used to return from a leaf subroutine call?

3. What instructions should be used to enable or disable IRQ interrupts?

4. What instructions can be used to overcome the ± 32MB limitation of the standard ARM Branch instruction?
