ARM-ISA-and-Cortex-M0

Uploaded by

khaliljouili16

Cortex-M0+ CPU Core and

ARM Instruction Set Architecture

1
Microcontroller vs. Microprocessor
 Both have a CPU core to execute instructions
 Microcontroller has peripherals for embedded interfacing and control
 Analog
 Non-logic level signals
 Timing
 Clock generators
 Communications
 point to point
 network
 Reliability and safety

2
Cortex-M0+ Core

3
ARM Processor Core Registers
 R0-R12 - General purpose, for data processing
 SP - Stack pointer (R13)
 Can refer to one of two SPs
 Main Stack Pointer (MSP)
 Process Stack Pointer (PSP)
 Uses MSP initially, and in Handler mode
 In Thread mode, can select either MSP or PSP
using SPSEL flag in CONTROL register.
 LR - Link Register (R14)
 Holds return address when a subroutine is called with the Branch with Link instruction (BL)
 PC - program counter (R15)

4
Operating Modes
State diagram: after Reset the core is in Thread mode (MSP or PSP). Starting exception processing moves it to Handler mode (MSP); when exception processing is completed it returns to Thread mode.
 Which SP is active depends on operating mode, and SPSEL (CONTROL register bit 1)
 SPSEL == 0: MSP
 SPSEL == 1: PSP
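
The selection rule above can be modeled in a few lines of Python. This is only an illustrative sketch, not processor code; the function name and the mode strings "thread" and "handler" are assumptions made for the example.

```python
def active_stack_pointer(mode: str, control: int) -> str:
    """Return which stack pointer is active: "MSP" or "PSP".

    mode    -- "thread" or "handler" (hypothetical labels for this sketch)
    control -- value of the CONTROL register; bit 1 is SPSEL
    """
    if mode == "handler":
        return "MSP"                  # Handler mode always uses MSP
    if mode != "thread":
        raise ValueError("mode must be 'thread' or 'handler'")
    spsel = (control >> 1) & 1        # extract SPSEL (CONTROL bit 1)
    return "PSP" if spsel else "MSP"
```

For example, active_stack_pointer("thread", 0b10) selects PSP, while the same CONTROL value in Handler mode still gives MSP.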

5
ARM Program Status Register

 Three views of same register


 Application PSR (APSR)
 Condition code flag bits Negative, Zero, oVerflow, Carry used for
conditional branches, extended precision math, error detection
 Interrupt PSR (IPSR)
 Holds exception number of currently executing ISR
 Execution PSR (EPSR)
 Thumb state
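
As a rough illustration of the three views, the sketch below splits a combined PSR value into its fields. The bit positions (N, Z, C, V in bits 31 to 28, T in bit 24, exception number in bits 5 to 0) are taken from the ARMv6-M architecture rather than from the slide, and the function name is hypothetical.

```python
def decode_xpsr(xpsr: int) -> dict:
    """Split a 32-bit combined PSR value into APSR, EPSR and IPSR fields."""
    return {
        # APSR view: condition code flags
        "N": (xpsr >> 31) & 1,
        "Z": (xpsr >> 30) & 1,
        "C": (xpsr >> 29) & 1,
        "V": (xpsr >> 28) & 1,
        # EPSR view: Thumb state bit
        "T": (xpsr >> 24) & 1,
        # IPSR view: number of the exception being handled (0 in Thread mode)
        "exception_number": xpsr & 0x3F,
    }
```

With this model, the value 0x6100000B decodes to Z = 1, C = 1, T = 1 and exception number 11.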

6
ARM Processor Core Registers

 PRIMASK - Exception mask register


 Bit 0: PM Flag
 Set to 1 to prevent activation of all exceptions with configurable priority
 Access using CPS, MSR and MRS instructions
 Use to prevent data race conditions with code needing atomicity

 CONTROL
 Bit 1: SPSEL flag
 Selects SP when in thread mode: MSP (0) or PSP (1)
 Bit 0: nPRIV flag
 Defines whether thread mode is privileged (0) or unprivileged (1)
 With OS environment,
 Threads use PSP
 OS and exception handlers (ISRs) use MSP

7
Different Instruction Sets for Different Design Spaces?

 ARM instructions optimized for resource-rich high-performance computing systems
 Deeply pipelined processor, high clock rate, wide (e.g. 32-bit) memory bus
 https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets
 Low-end embedded computing systems are different
 Slower clock rates, shallow pipelines
 Different cost factors – e.g. code size matters much more
 Bit and byte operations critical

8
The Memory Wall

 It has been easier to speed up the CPU than the memory
 Facts of life
 Off-chip memory is slower than on-chip memory. May not want to put all memory on-chip, even if possible.
 Flash is slower to read or write than RAM.
 Fast RAM is more expensive than slow RAM. Same for flash.
 Design for high-performance CPUs
 Use caches (small fast RAM) to make main memory (large slow RAM, flash) look faster at a low cost.
 Put cache(s) on chip if possible.
 Increase bandwidth by widening memory bus, improving protocol (reducing overhead, split transactions, using page mode, etc.)
 Design for low-performance CPUs
 Put memory on-chip with CPU. RAM, flash ROM
 Increase flash ROM bandwidth by widening memory bus, adding prefetch buffer, branch target buffer, etc.
 Add cache
 Change instruction set size to reduce instruction bandwidth needed
(Figure: design options arranged along an axis from Low Performance to High Performance; callout: Double flash bus width)
9
ARM and Thumb Instructions

 Thumb reduces program memory size and bandwidth requirements
 Subset of instructions re-encoded into fewer bits (most 16 bits, some 32 bits)
 Not all 32-bit instructions available
 Most 16-bit instructions can only access low registers (R0-R7), but a few can access high registers (R8-R15)
 1995: Thumb-1 instruction set
 16-bit instructions
 2003: Thumb-2 instruction set
 Adds some 32 bit instructions
 Improves speed with little memory overhead

 CPU operating state
 CPU decodes instructions based on whether in Thumb state or ARM state - controlled by T bit
 Thumb state indicated by program counter being odd (LSB = 1)
 Cortex-M0+ only uses Thumb instructions, is always in Thumb state
 See ARMv6-M Architecture Reference Manual for specifics per instruction (Section A.6.7)

10
Cortex-M Instruction Groups
Group | Instr bits | Instructions | M0,M0+,M1 | M3 | M4 | M7 | M23 | M33,M35P
Thumb-1 | 16 | ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD | Yes | Yes | Yes | Yes | Yes | Yes
Thumb-1 | 16 | CBNZ, CBZ | No | Yes | Yes | Yes | Yes | Yes
Thumb-1 | 16 | IT | No | Yes | Yes | Yes | No | Yes
Thumb-2 | 32 | BL, DMB, DSB, ISB, MRS, MSR | Yes | Yes | Yes | Yes | Yes | Yes
Thumb-2 | 32 | SDIV, UDIV | No | Yes | Yes | Yes | Yes | Yes
Thumb-2 | 32 | ADC, ADD, ADR, AND, ASR, B, BFC, BFI, BIC, CDP, CLREX, CLZ, CMN, CMP, DBG, EOR, LDC, LDM, LDR, LDRB, LDRBT, LDRD, LDREX, LDREXB, LDREXH, LDRH, LDRHT, LDRSB, LDRSBT, LDRSH, LDRSHT, LDRT, LSL, LSR, MCR, MCRR, MLA, MLS, MOV, MOVT, MRC, MRRC, MUL, MVN, NOP, ORN, ORR, PLD, PLDW, PLI, POP, PUSH, RBIT, REV, REV16, REVSH, ROR, RRX, RSB, SBC, SBFX, SEV, SMLAL, SMULL, SSAT, STC, STM, STR, STRB, STRBT, STRD, STREX, STREXB, STREXH, STRH, STRHT, STRT, SUB, SXTB, SXTH, TBB, TBH, TEQ, TST, UBFX, UMLAL, UMULL, USAT, UXTB, UXTH, WFE, WFI, YIELD | No | Yes | Yes | Yes | No | Yes
DSP | 32 | PKH, QADD, QADD16, QADD8, QASX, QDADD, QDSUB, QSAX, QSUB, QSUB16, QSUB8, SADD16, SADD8, SASX, SEL, SHADD16, SHADD8, SHASX, SHSAX, SHSUB16, SHSUB8, SMLABB, SMLABT, SMLATB, SMLATT, SMLAD, SMLALBB, SMLALBT, SMLALTB, SMLALTT, SMLALD, SMLAWB, SMLAWT, SMLSD, SMLSLD, SMMLA, SMMLS, SMMUL, SMUAD, SMULBB, SMULBT, SMULTT, SMULTB, SMULWT, SMULWB, SMUSD, SSAT16, SSAX, SSUB16, SSUB8, SXTAB, SXTAB16, SXTAH, SXTB16, UADD16, UADD8, UASX, UHADD16, UHADD8, UHASX, UHSAX, UHSUB16, UHSUB8, UMAAL, UQADD16, UQADD8, UQASX, UQSAX, UQSUB16, UQSUB8, USAD8, USADA8, USAT16, USAX, USUB16, USUB8, UXTAB, UXTAB16, UXTAH, UXTB16 | No | No | Yes | Yes | No | Optional
SP Float | 32 | VABS, VADD, VCMP, VCMPE, VCVT, VCVTR, VDIV, VLDM, VLDR, VMLA, VMLS, VMOV, VMRS, VMSR, VMUL, VNEG, VNMLA, VNMLS, VNMUL, VPOP, VPUSH, VSQRT, VSTM, VSTR, VSUB | No | No | Optional | Optional | No | Optional
DP Float | 32 | VCVTA, VCVTM, VCVTN, VCVTP, VMAXNM, VMINNM, VRINTA, VRINTM, VRINTN, VRINTP, VRINTR, VRINTX, VRINTZ, VSEL | No | No | No | Optional | No | No
TrustZone | 16 | BLXNS, BXNS | No | No | No | No | Optional | Optional
TrustZone | 32 | SG, TT, TTT, TTA, TTAT | No | No | No | No | Optional | Optional
Co-processor | 16 | CDP, CDP2, MCR, MCR2, MCRR, MCRR2, MRC, MRC2, MRRC, MRRC2 | No | No | No | No | No | Optional

https://en.wikipedia.org/wiki/ARM_Cortex-M#Instruction_sets
11
Reference for ARM Instruction Set Architecture

 ARMv6-M Architecture Reference Manual, Chapter A5. The Thumb Instruction Set Encoding
 16- or 32-bit instruction?
 Bits [15:11]
 0b11101, 0b11110, 0b11111: 32-bit instruction. Page A5-91
 Else 16-bit instruction. Page A5-84
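
The width check on bits [15:11] can be sketched as follows. The function name is hypothetical, and the input is assumed to be the first 16-bit halfword fetched for an instruction.

```python
def is_32bit_instruction(first_halfword: int) -> bool:
    """Return True if this halfword starts a 32-bit Thumb instruction."""
    top5 = (first_halfword >> 11) & 0x1F      # bits [15:11]
    return top5 in (0b11101, 0b11110, 0b11111)
```

For instance, 0xF000 (the first halfword of a BL) is reported as 32-bit, while 0x4770 (BX LR) and 0xE7FE (an unconditional B) are 16-bit.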

12
Example Instruction Encoding: ADC (register)

 Page A6-106 of ARM-V6M ARM

13
Example Instruction Encoding: ADD (register)

 Page A6-109 of ARM-V6M ARM

14
Assembler Instruction Format

 <operation> <operand1> <operand2> <operand3>


 There may be fewer operands
 First operand is typically destination (<Rd>)
 Other operands are sources (<Rn>, <Rm>)

 Examples
 ADDS <Rd>, <Rn>, <Rm>
 Add registers: <Rd> = <Rn> + <Rm>
 ANDS <Rdn>, <Rm>
 Bitwise and, updating condition flags: <Rdn> = <Rdn> & <Rm>
 CMP <Rn>, <Rm>
 Compare: Set condition flags based on result of computing <Rn> - <Rm>

15
Update Condition Codes in APSR?

 “S” suffix indicates the instruction updates APSR


 ADD vs. ADDS
 ADC vs. ADCS
 SUB vs. SUBS
 MOV vs. MOVS
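
To make the flag update concrete, here is a small Python model of what ADDS computes for two 32-bit operands. It is a sketch of the arithmetic only; the function name and the dictionary layout are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF


def adds(rn: int, rm: int):
    """Model ADDS: return (32-bit result, APSR flags) for rn + rm."""
    rn &= MASK32
    rm &= MASK32
    full = rn + rm                       # unbounded sum, may exceed 32 bits
    result = full & MASK32               # value written to the destination
    flags = {
        "N": result >> 31,               # sign bit of the result
        "Z": int(result == 0),           # result is zero
        "C": int(full > MASK32),         # unsigned carry out of bit 31
        # signed overflow: both operands differ in sign from the result
        "V": (((rn ^ result) & (rm ^ result)) >> 31) & 1,
    }
    return result, flags
```

Adding 1 to 0x7FFFFFFF sets N and V (signed overflow), while adding 1 to 0xFFFFFFFF sets Z and C (unsigned carry).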

16
USING REGISTERS

17
AAPCS Register Use Conventions

 Make it easier to create modular, isolated and integrated code

 Scratch registers are not expected to be preserved upon returning from a called
subroutine
 r0-r3

 Preserved (“variable”) registers are expected to have their original values upon
returning from a called subroutine
 r4-r8, r10-r11

18
AAPCS Core Register Use

 Preserved registers: must be saved and restored by the callee procedure if it will modify them. The calling subroutine expects these to retain their value.
 Scratch registers: don't need to be saved. May be used for arguments, results, or temporary values.

19
Instruction Set Summary
Instruction Type | Instructions
Move | MOV
Load/Store | LDR, LDRB, LDRH, LDRSH, LDRSB, LDM, STR, STRB, STRH, STM
Add, Subtract, Multiply | ADD, ADDS, ADCS, ADR, SUB, SUBS, SBCS, RSBS, MULS
Compare | CMP, CMN
Logical | ANDS, EORS, ORRS, BICS, MVNS, TST
Shift and Rotate | LSLS, LSRS, ASRS, RORS
Stack | PUSH, POP
Conditional branch | B, BL, B{cond}, BX, BLX
Extend | SXTH, SXTB, UXTH, UXTB
Reverse | REV, REV16, REVSH
Processor State | SVC, CPSID, CPSIE, SETEND, BKPT
No Operation | NOP
Hint | SEV, WFE, WFI, YIELD
Barriers | DMB, DSB, ISB

20
Load and Store Register Instructions

 ARM is a load/store architecture, so must process data in registers (not memory)
 LDR: load register with word (32 bits) from memory
 LDR <Rt>, source address
 STR: store register contents (32 bits) to memory
 STR <Rt>, destination address

 Source and destination addresses are specified using available addressing modes
 Offset Addressing mode: [<Rn>, <offset>] accesses address <Rn>+<offset>
 Base Register <Rn> can be R0-R7, SP or PC
 <offset> is added or subtracted from base register to create effective address
 Can be an immediate constant
 Can be another register, used as index <Rm>
 Auto-update: Can write effective address back to base register
 Pre-indexing: use effective address to access memory, then update base register
 Post-indexing: use base register to access memory, then update base register
21
Memory Maps For Cortex M0+ and MCU
KL25Z128VLK4
 16 KB SRAM: 0x1FFF_F000 to 0x2000_2FFF
 SRAM_L (1/4): starts at 0x1FFF_F000, below 0x2000_0000
 SRAM_U (3/4): 0x2000_0000 to 0x2000_2FFF
 128KB Flash: 0x0000_0000 to 0x0001_FFFF
 Some RAM is located in Code segment, allowing code to run from RAM to allow flash reprogramming or for better speed on faster systems
22
Endianness
 For a multi-byte value, in what order are the bytes stored?
 Register holds bytes B3 B2 B1 B0: B3 = bits 31-24 (msbyte), B2 = bits 23-16, B1 = bits 15-8, B0 = bits 7-0 (lsbyte)
 Little-Endian: Start with least-significant byte
 Address A: B0 (lsbyte)
 A+1: B1
 A+2: B2
 A+3: B3 (msbyte)
 Big-Endian: Start with most-significant byte
 Address A: B3 (msbyte)
 A+1: B2
 A+2: B1
 A+3: B0 (lsbyte)
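
The two byte orders can be checked with Python's built-in integer conversion. This is a host-side illustration only; the helper name is hypothetical.

```python
def store_word(value: int, byteorder: str) -> list:
    """Return the 4 bytes of a 32-bit word in memory order (address A first).

    byteorder -- "little" or "big", as accepted by int.to_bytes
    """
    return list((value & 0xFFFFFFFF).to_bytes(4, byteorder))


word = 0x11223344
little = store_word(word, "little")   # least-significant byte at address A
big = store_word(word, "big")         # most-significant byte at address A
```

Here little holds 0x44, 0x33, 0x22, 0x11 from address A upward, and big holds the same bytes in the opposite order.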

24
ARMv6-M Endianness
 Instructions are always little-endian

 Loads and stores to Private Peripheral Bus are always little-endian

 Data: Depends on implementation, or from reset configuration


 Kinetis processors are little-endian

25
Loading/Storing Smaller Data Sizes
Signed Unsigned
Byte LDRSB LDRB
Half-word LDRSH LDRH

 Some load and store instructions can handle half-word (16 bits) and byte (8 bits)
 Store just writes to half-word or byte
 STRH, STRB
 Loading a byte or half-word requires padding or extension: What do we put in the upper bits of the
register?
 Example: How do we extend 0x80 into a full word?
 Unsigned? Then 0x80 = 128, so zero-pad to extend to word 0x0000_0080 = 128
 Signed? Then 0x80 = -128, so sign-extend to word 0xFFFF_FF80 = -128
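
The two extension rules can be written out in Python as a quick check of the 0x80 example. The function names are hypothetical; results are returned as unsigned 32-bit register values.

```python
def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and fill the upper bits with zeros."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy the sign bit of a `bits`-wide value into the upper register bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    # (value ^ sign) - sign yields the signed value; mask back to 32 bits
    return ((value ^ sign) - sign) & 0xFFFFFFFF
```

zero_extend(0x80, 8) gives 0x00000080, while sign_extend(0x80, 8) gives 0xFFFFFF80.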

26
In-Register Size Extension
Signed Unsigned
Byte SXTB UXTB
Half-word SXTH UXTH

 Can also extend byte or half-word already in a register


 Signed or unsigned (zero-pad)
 How do we extend 0x80 into a full word?
 Unsigned? Then 0x80 = 128, so zero-pad to extend to word 0x0000_0080 = 128
 Signed? Then 0x80 = -128, so sign-extend to word 0xFFFF_FF80 = -128

27
Load/Store Multiple
 LDM/LDMIA: load multiple registers starting from [base register], update base register afterwards
 LDM <Rn>!,<registers>
 LDM <Rn>,<registers>

 STM/STMIA: store multiple registers starting at [base register], update base register after
 STM <Rn>!, <registers>

 LDMIA and STMIA are pseudo-instructions, translated by assembler
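
A minimal model of the write-back forms (LDM <Rn>! and STM <Rn>!) is sketched below. Memory is represented as a Python dict keyed by word address, which is purely an assumption of the example, as are the function names.

```python
def stm(memory: dict, base: int, values: list) -> int:
    """Store words at increasing addresses from base; return updated base."""
    for i, value in enumerate(values):
        memory[base + 4 * i] = value & 0xFFFFFFFF
    return base + 4 * len(values)        # base advances 4 bytes per register


def ldm(memory: dict, base: int, count: int):
    """Load `count` words from base upward; return (values, updated base)."""
    values = [memory[base + 4 * i] for i in range(count)]
    return values, base + 4 * count
```

Storing three registers at 0x2000_0000 leaves the base at 0x2000_000C, and loading three words from the same start address returns them in the order stored.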

28
Load Literal Value into Register

 Assembly pseudo-instruction: LDR <rd>, =value
 Assembler generates code to load <rd> with value
 Assembler selects best approach depending on value
 Load immediate
 MOV instruction provides 8-bit unsigned immediate operand (0-255)
 Load and shift immediate values
 Can use MOV, shift, rotate, sign extend instructions
 Load from literal pool
 1. Place value as a 32-bit literal in the program's literal pool (table of literal values to be loaded into registers)
 2. Use instruction LDR <rd>, [pc,#offset] where offset indicates position of literal relative to program counter value

 Example formats for literal values (depends on compiler and toolchain used)
 Decimal: 3909
 Hexadecimal: 0xa7ee
 Character: 'A'
 String: "44??"
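
The offset in step 2 can be computed as in the sketch below. It relies on details of the 16-bit LDR (literal) encoding that are not stated on the slide: the PC reads as the instruction address plus 4, is rounded down to a word boundary, and the offset must be a multiple of 4 between 0 and 1020. The function name is hypothetical.

```python
def literal_offset(instr_addr: int, literal_addr: int) -> int:
    """Return the #offset for LDR <rd>, [pc, #offset] reaching literal_addr."""
    base = (instr_addr + 4) & ~3         # PC value, aligned down to 4 bytes
    offset = literal_addr - base
    if offset < 0 or offset > 1020 or offset % 4 != 0:
        raise ValueError("literal not reachable from this instruction")
    return offset
```

An instruction at 0x100 or at 0x102 reaches a literal at 0x110 with the same offset of 12, because both see an aligned PC of 0x104.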

29
Move (Pseudo-)Instructions
 Copy data from one register to another without updating condition flags
 MOV <Rd>, <Rm>
 Copy data from one register to another and update condition flags
 MOVS <Rd>, <Rm>
 Copy immediate literal value (0-255) into register and update condition flags
 MOVS <Rd>, #<imm8>
 Assembler translates pseudo-instructions into equivalent instructions (shifts, rotates)

30
Stack Operations
 Push some or all of registers (R0-R7, LR) to stack
 PUSH {<registers>}
 Decrements SP by 4 bytes for each register saved
 Pushing LR saves return address
 PUSH {r1, r2, LR}
 Always pushes registers in same order

 Pop some or all of registers (R0-R7, PC) from stack


 POP {<registers>}
 Increments SP by 4 bytes for each register restored
 If PC is popped, then execution will branch to new PC value after this POP instruction (e.g. return address)
 POP {r5, r6, r7}
 Always pops registers in same order (opposite of pushing)
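
The stack behaviour can be modeled as below. The sketch assumes the usual ARM full-descending stack in which the lowest-numbered register ends up at the lowest address, a detail from the architecture rather than the slide. The dict-based memory and the function names are hypothetical.

```python
# Register numbers decide the memory order, regardless of list order.
REG_NUMBER = {"r0": 0, "r1": 1, "r2": 2, "r3": 3, "r4": 4,
              "r5": 5, "r6": 6, "r7": 7, "lr": 14, "pc": 15}


def push(memory: dict, sp: int, regs: dict) -> int:
    """Model PUSH {regs}: store register values, return the new SP."""
    names = sorted(regs, key=REG_NUMBER.get)
    sp -= 4 * len(names)                 # SP drops 4 bytes per register
    for i, name in enumerate(names):
        memory[sp + 4 * i] = regs[name]  # lowest register at lowest address
    return sp


def pop(memory: dict, sp: int, names: list):
    """Model POP {names}: return (restored values, new SP)."""
    ordered = sorted(names, key=REG_NUMBER.get)
    values = {name: memory[sp + 4 * i] for i, name in enumerate(ordered)}
    return values, sp + 4 * len(ordered)  # SP rises 4 bytes per register
```

Pushing r1, r2 and LR and later popping r1, r2 and PC restores the two registers and places the saved return address in PC, which is the common function epilogue pattern.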

31
Add Instructions
 Add registers, update condition flags
 ADDS <Rd>,<Rn>,<Rm>

 Add registers and carry bit, update condition flags


 ADCS <Rdn>,<Rm>

 Add registers
 ADD <Rdn>,<Rm>

 Add immediate value to register


 ADDS <Rd>,<Rn>,#<imm3>
 ADDS <Rdn>,#<imm8>

32
Add Instructions with Stack Pointer
 Add SP and immediate value
 ADD <Rd>, SP, #<imm8>
 ADD SP, SP, #<imm7>

 Add SP and register


 ADD <Rdm>, SP, <Rdm>
 ADD SP, <Rm>

33
Address to Register Pseudo-Instruction
 Add immediate value to PC, write result in register
 ADR <Rd>,<label>

 How is this used?


 Enables storage of constant data near program counter
 First, load register R2 with address of const_data
 ADR R2, const_data
 Second, load const_data into R2
 LDR R2, [R2]

 Value must be close to current PC value

34
Subtract
 Subtract immediate from register, update condition flags
 SUBS <Rd>, <Rn>, #<imm3>
 SUBS <Rdn>, #<imm8>

 Subtract registers, update condition flags


 SUBS <Rd>, <Rn>, <Rm>

 Subtract registers with carry, update condition flags


 SBCS <Rdn>, <Rm>

 Subtract immediate from SP


 SUB SP, SP, #<imm7>

35
Multiply
 Multiply source registers, save lower word of result in destination register, update condition flags
 MULS <Rdm>, <Rn>, <Rdm>
 <Rdm> = <Rdm> * <Rn>

 Signed multiply

 Note:
 32-bit * 32-bit = 64-bit
 Upper word of result is truncated
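
The truncation can be illustrated with a one-line Python model (the function name is hypothetical). Python integers are unbounded, so the mask plays the role of the 32-bit destination register.

```python
def muls_result(rn: int, rdm: int) -> int:
    """Return the low 32 bits of rn * rdm, as written to <Rdm> by MULS."""
    return ((rn & 0xFFFFFFFF) * (rdm & 0xFFFFFFFF)) & 0xFFFFFFFF
```

For example, 0x10000 * 0x10000 is 2^32, so the stored result is 0. Multiplying 0xFFFFFFFF (which is -1 when read as signed) by 2 stores 0xFFFFFFFE, which reads as -2.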

36
Logical Operations
 All of these instructions update the condition flags

 Bitwise AND registers


 ANDS <Rdn>,<Rm>
 Bitwise OR registers
 ORRS <Rdn>,<Rm>
 Bitwise Exclusive OR registers
 EORS <Rdn>,<Rm>
 Bitwise AND register and complement of second register
 BICS <Rdn>,<Rm>
 Move inverse of register value to destination
 MVNS <Rd>,<Rm>
 Bitwise AND two registers, discard result
 TST <Rn>, <Rm>

37
Compare
 Compare - subtracts second value from first, updates condition flags, discards result
 CMP <Rn>,#<imm8>
 CMP <Rn>,<Rm>

 Compare negative - adds two values, updates condition flags, discards result
 CMN <Rn>,<Rm>

38
Shift and Rotate
 Common features
 All of these instructions update APSR condition flags
 Shift/rotate amount (in number of bits) specified by last operand
 Logical shift left - shifts in zeroes on right
 LSLS <Rd>,<Rm>,#<imm5>
 LSLS <Rdn>,<Rm>
 Logical shift right - shifts in zeroes on left
 LSRS <Rd>,<Rm>,#<imm5>
 LSRS <Rdn>,<Rm>
 Arithmetic shift right - shifts in copies of sign bit on left (to maintain arithmetic sign)
 ASRS <Rd>,<Rm>,#<imm5>
 Rotate right
 RORS <Rdn>,<Rm>
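
The four operations can be compared side by side in Python. The sketch models only the result value (not the carry flag) and assumes a shift amount from 0 to 31; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, n: int) -> int:
    """Logical shift left: zeros enter on the right."""
    return (value << n) & MASK32


def lsr(value: int, n: int) -> int:
    """Logical shift right: zeros enter on the left."""
    return (value & MASK32) >> n


def asr(value: int, n: int) -> int:
    """Arithmetic shift right: copies of the sign bit enter on the left."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> n) & MASK32


def ror(value: int, n: int) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK32
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32
```

Shifting 0x80000000 right by 31 gives 1 with lsr but 0xFFFFFFFF with asr, which is the difference between treating the value as unsigned and as signed.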

39
Reversing Bytes

 REV - reverse all bytes in word
 REV <Rd>,<Rm>
 REV16 - reverse bytes in both half-words
 REV16 <Rd>,<Rm>
 REVSH - reverse bytes in low half-word (signed) and sign-extend
 REVSH <Rd>,<Rm>
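
The three byte-reversal instructions can be modeled on 32-bit values as follows. This is an illustrative sketch with hypothetical function names.

```python
def rev(value: int) -> int:
    """REV: reverse the order of all four bytes."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rev16(value: int) -> int:
    """REV16: swap the two bytes inside each half-word."""
    return ((value & 0x00FF00FF) << 8) | ((value & 0xFF00FF00) >> 8)


def revsh(value: int) -> int:
    """REVSH: swap the bytes of the low half-word, then sign-extend it."""
    low = value & 0xFFFF
    swapped = ((low & 0xFF) << 8) | (low >> 8)
    if swapped & 0x8000:                 # bit 15 set: fill upper half with 1s
        swapped |= 0xFFFF0000
    return swapped
```

With the input 0x11223344, rev gives 0x44332211 and rev16 gives 0x22114433. revsh(0x00000080) gives 0xFFFF8000 because the swapped half-word 0x8000 is negative.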
40
Changing Program Flow - Branches

 Unconditional Branches
 B <label>
 Target address must be within 2 KB of branch instruction (-2048 B to
+2046 B)

 Conditional Branches
 B<cond> <label>
 <cond> is condition - see next page
 B<cond> target address must be within 256 B of branch instruction (-256 B to +254 B)
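
A reachability check for the two 16-bit branch forms might look like the sketch below. It measures the offset from the PC value seen by the branch, assumed here to be the branch address plus 4 as defined by the architecture, and requires an even offset. The function name is hypothetical.

```python
def branch_in_range(instr_addr: int, target: int, conditional: bool) -> bool:
    """Return True if a 16-bit B or B<cond> at instr_addr can reach target."""
    offset = target - (instr_addr + 4)   # offset is relative to PC = addr + 4
    low, high = (-256, 254) if conditional else (-2048, 2046)
    return low <= offset <= high and offset % 2 == 0
```

When a conditional branch target is out of range, a common workaround is to branch on the inverted condition over an unconditional B, which has the larger range.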

41
Condition Codes

 Append to branch instruction (B) to make a conditional branch
 Full ARM instructions (not Thumb or Thumb-2) support conditional execution of arbitrary instructions
 Note: Carry bit = not-borrow for compares and subtractions
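
The not-borrow convention is easiest to see by modeling CMP as an addition of the inverted second operand plus one. The sketch below does that and evaluates two standard conditions, HS (unsigned higher or same, C = 1) and LT (signed less than, N != V). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn: int, rm: int) -> dict:
    """Model CMP rn, rm: compute rn - rm, keep only the flags."""
    rn &= MASK32
    rm &= MASK32
    full = rn + (~rm & MASK32) + 1       # subtraction as add of complement
    result = full & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(full > MASK32),         # 1 means no borrow occurred
        "V": (((rn ^ rm) & (rn ^ result)) >> 31) & 1,
    }


def cond_hs(flags: dict) -> bool:
    """HS: unsigned rn >= rm."""
    return flags["C"] == 1


def cond_lt(flags: dict) -> bool:
    """LT: signed rn < rm."""
    return flags["N"] != flags["V"]
```

Comparing 5 with 3 leaves C = 1 (no borrow), while comparing 3 with 5 clears C because the subtraction borrows.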

42
Changing Program Flow - Subroutines

 Call
 BL <label> - branch with link
 Call subroutine at <label>
 PC-relative, range limited to PC+/-16MB
 Save return address in LR
 BLX <Rd> - branch with link and exchange
 Call subroutine at address in register Rd (exchange Rd with PC)
 Supports full 4GB address range
 LSB of target address must be set to 1 to ensure continued execution in Thumb state
 Save return address in LR

 Return
 BX <Rd> branch and exchange
 Branch to address specified by <Rd>
 LSB of target address must be set to 1 to ensure continued execution in Thumb state
 Supports full 4 GB address space
 BX LR - Return from subroutine
 POP {PC}

43
Special Register Instructions
 Move to Register from Special Register
 MRS <Rd>, <spec_reg>

 Move to Special Register from Register
 MSR <spec_reg>, <Rn>

 Change Processor State - Modify PRIMASK register
 CPSIE - Interrupt enable
 CPSID - Interrupt disable

44
Other

 No Operation - does nothing!


 NOP

 Breakpoint - causes hard fault or debug halt - used to implement software breakpoints
 BKPT #<imm8>

 Wait for interrupt - Pause program, enter low-power state until a WFI wake-up event occurs (e.g. an
interrupt)
 WFI

 Supervisor call generates SVC exception (#11), same as software interrupt


 SVC #<imm>

45
