ARM Processors and Architectures - Uni Program
A Comprehensive Overview
ARM University Program
September 2012
More information about ARM and our offices on our web site:
http://www.arm.com/aboutarm/
Optional features
VFPv3 Vector Floating-Point
NEON media processing engine
Dual-issue, super-scalar 13-stage pipeline
Branch Prediction & Return Stack
NEON and VFP implemented at end of pipeline
Mode               Description
Supervisor (SVC)   Entered on reset and when a Supervisor Call instruction (SVC) is executed
[Figure: register set by mode. A single cpsr is shared by all modes; each exception mode has its own banked spsr]
Syntax:
<Operation>{<cond>}{S} {Rd,} Rn, Operand2
Examples:
ADD r0, r1, r2 ; r0 = r1 + r2
TEQ r0, r1     ; if r0 = r1, Z flag will be set
University Program Material
Copyright ARM Ltd 2012
Single Access Data Transfer
Use to move data between one or two registers and memory
LDRD   STRD   Doubleword
LDR    STR    Word
LDRB   STRB   Byte
LDRH   STRH   Halfword
LDRSB         Signed byte load
LDRSH         Signed halfword load
[Figure: transfer between memory and Rd (bits 31 to 0). Upper bits are zero filled or sign extended on load]
Syntax:
LDR{<size>}{<cond>} Rd, <address>
STR{<size>}{<cond>} Rd, <address>
Example:
LDRB r0, [r1] ; load bottom byte of r0 from the
; byte of memory at address in r1
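The difference between the zero-filled and sign-extended byte loads can be sketched in C. This is only an illustration of the effect on the destination register, not a definition of the instructions; the function names are hypothetical.

```c
#include <stdint.h>

/* Models LDRB: the byte lands in bits 7..0 and the upper bits are zero filled. */
uint32_t load_byte_zero_extend(const uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* Models LDRSB: bit 7 of the loaded byte is copied into bits 31..8. */
uint32_t load_byte_sign_extend(const uint8_t *addr)
{
    uint32_t value = *addr;
    return (value & 0x80u) ? (value | 0xFFFFFF00u) : value;
}
```

For a memory byte of 0x80, the first function gives 0x00000080 and the second gives 0xFFFFFF80.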
Multiple Register Data Transfer
These instructions move data between multiple registers and memory
Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
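4 addressing modes
Increment after/before (IA, IB)
Decrement after/before (DA, DB)
IA is the default when no mode is given
[Figure: where r0, r1 and r4 are placed relative to base register (Rb) r10 for each of IA, IB, DA and DB, with address increasing up the page]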
Also
PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
Example
LDM r10, {r0,r1,r4} ; load registers, using r10 base
PUSH {r4-r6,pc} ; store registers, using SP base
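The four addressing modes differ only in where the block of registers sits relative to the base address. A minimal C sketch of that arithmetic follows, assuming 4-byte registers and that the lowest-numbered register always goes to the lowest address; the type and function names are hypothetical.

```c
#include <stdint.h>

enum ldm_mode { MODE_IA, MODE_IB, MODE_DA, MODE_DB };

/* Address used for the lowest-numbered register in the list.
 * The remaining registers follow at +4 bytes each. */
uint32_t ldm_start_address(enum ldm_mode mode, uint32_t base, unsigned count)
{
    switch (mode) {
    case MODE_IA: return base;                      /* first access at base      */
    case MODE_IB: return base + 4u;                 /* first access above base   */
    case MODE_DA: return base - 4u * count + 4u;    /* last access at base       */
    default:      return base - 4u * count;         /* MODE_DB: last below base  */
    }
}

/* Value written back to Rb when '!' is given. */
uint32_t ldm_writeback(enum ldm_mode mode, uint32_t base, unsigned count)
{
    if (mode == MODE_IA || mode == MODE_IB)
        return base + 4u * count;
    return base - 4u * count;
}
```

With a base of 0x1000 and three registers, DB starts at 0x0FF4 and writes back 0x0FF4, which is the behaviour PUSH relies on.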
[Figure: subroutine call and return]
func1: void func1 (void) { ... func2(); ... } where the call is a BL func2
func2: ends with BX lr, which returns to func1
The SVC handler can examine the SVC number to decide what operation
has been requested
But the core ignores the SVC number
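Because the core ignores the SVC number, a handler that wants it has to read it back out of the instruction encoding. The sketch below shows only the field extraction, assuming the handler has already fetched the SVC instruction that caused the exception; the function names are hypothetical.

```c
#include <stdint.h>

/* ARM state: SVC is a 32-bit instruction and the number is bits 23..0. */
uint32_t svc_number_arm(uint32_t instruction)
{
    return instruction & 0x00FFFFFFu;
}

/* Thumb state: SVC is a 16-bit instruction and the number is bits 7..0. */
uint32_t svc_number_thumb(uint16_t instruction)
{
    return instruction & 0x00FFu;
}
```

A handler would typically switch on the returned value to select the requested operation.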
Adjusts LR based on exception type
[Figure: NEON registers and lanes. Labels: destination register Dd, lane, D30 and D31 (which together form Q15)]
[Figure: memory hierarchy. Labels: ARM core, D-Cache RAM, on-chip SRAM, BIU, off-chip memory; levels L1, L2, L3]
Cache Lockdown
Prevents line Eviction from a specified Cache Way (discussed later)
Streaming, Critical-Word-First
Cache data is forwarded to the core as soon as the requested word is received in
the Linefill buffer
Any word in the cache line can be requested first using a WRAP burst on the bus
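The order in which words arrive during a critical-word-first linefill can be sketched as follows. This assumes an 8-word line and a wrapping burst that increments from the requested word and wraps at the line boundary; the function name is hypothetical.

```c
#define WORDS_PER_LINE 8u

/* Fills order[] with the word indices in the order a wrapping burst
 * delivers them when critical_word is requested first. */
void wrap_burst_order(unsigned critical_word, unsigned order[WORDS_PER_LINE])
{
    for (unsigned beat = 0; beat < WORDS_PER_LINE; beat++)
        order[beat] = (critical_word + beat) % WORDS_PER_LINE;
}
```

Requesting word 5 first gives the sequence 5, 6, 7, 0, 1, 2, 3, 4, so the core can continue as soon as the first beat arrives.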
[Figure: cache organization. The address is split into fields labelled 19, 8 and 3 bits. Each cache line holds a tag, a valid bit (v), dirty bit(s) (d) and data words 7 to 0. A victim counter selects the line to replace]
Cache has 8 words of data in each line
Each cache line contains Dirty bit(s)
    Indicates whether a particular cache line was modified by the ARM core
Each cache line can be Valid or invalid
    An invalid line is not considered when performing a Cache Lookup
v - valid bit    d - dirty bit(s)
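A cache lookup using the valid bit and tag can be sketched in C. Several details here are assumptions rather than facts from the slide: the 19/8/3 labels are read as tag, set index and word-in-line widths (with 2 further bits for the byte in a word), the number of ways is set to 4, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS       4u    /* assumption: not stated in the source   */
#define NUM_SETS       256u  /* 8-bit index                            */
#define WORDS_PER_LINE 8u    /* 3-bit word select, 8 words per line    */

struct cache_line {
    uint32_t tag;                   /* upper 19 address bits           */
    bool     valid;                 /* v: line holds usable data       */
    bool     dirty;                 /* d: line was modified by the core */
    uint32_t data[WORDS_PER_LINE];
};

struct cache {
    struct cache_line way[NUM_WAYS][NUM_SETS];
};

/* Returns true on a hit and stores the requested word in *word_out.
 * Invalid lines are skipped even when the tag happens to match. */
bool cache_lookup(const struct cache *c, uint32_t address, uint32_t *word_out)
{
    uint32_t word  = (address >> 2) & 0x7u;   /* bits 4..2   */
    uint32_t index = (address >> 5) & 0xFFu;  /* bits 12..5  */
    uint32_t tag   = address >> 13;           /* bits 31..13 */

    for (unsigned w = 0; w < NUM_WAYS; w++) {
        const struct cache_line *line = &c->way[w][index];
        if (line->valid && line->tag == tag) {
            *word_out = line->data[word];
            return true;
        }
    }
    return false;
}
```

The dirty bit plays no part in the lookup itself; it matters only when a line is evicted and may need writing back.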
[Figure: four pairs of data (D$) and instruction (I$) caches]
Cortex M3 Total
60k* Gates
Cortex-M0
ARMv6-M Architecture
16-bit Thumb-2 with system control instructions
Fully programmable in C
3-stage pipeline
von Neumann architecture
AHB-Lite bus interface
Fixed memory map
1-32 interrupts
Configurable priority levels
Non-Maskable Interrupt support
Low power support
Core configured with or without debug
Variable number of watchpoints and breakpoints
Agenda
Introduction
ARM Architecture Overview
ARMv7-AR Architecture
Programmers Model
Memory Systems
ARMv7-M Architecture
Programmers Model
Memory Systems
Floating Point Extensions
ARM System Design
Software Development Tools
Registers R0-R12
    General-purpose registers
R13 is the stack pointer (SP) - 2 banked versions
R14 is the link register (LR)
R15 is the program counter (PC)
PSR (Program Status Register)
    Not explicitly accessible
    Saved to the stack on an exception
    Subsets available as APSR, IPSR, and EPSR
[Figure: register file showing R0 to R12, R13 (SP), R14 (LR), R15 (PC) and PSR]
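The three PSR subsets are views of different bit fields in one 32-bit value. The sketch below decodes a saved PSR value in C. The bit positions are not given on the slide; they are taken from my understanding of the ARMv7-M architecture and should be checked against the architecture manual. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define APSR_FLAGS_MASK 0xF8000000u /* N, Z, C, V, Q: bits 31..27      */
#define IPSR_MASK       0x000001FFu /* exception number: bits 8..0     */
#define EPSR_T_BIT      0x01000000u /* Thumb state bit: bit 24         */

/* IPSR view: 0 in Thread Mode, otherwise the active exception number. */
unsigned psr_exception_number(uint32_t xpsr)
{
    return xpsr & IPSR_MASK;
}

/* APSR view: the Z (zero) condition flag, bit 30. */
unsigned psr_zero_flag(uint32_t xpsr)
{
    return (xpsr >> 30) & 1u;
}

/* EPSR view: non-zero when the Thumb state bit is set. */
unsigned psr_thumb_bit(uint32_t xpsr)
{
    return (xpsr & EPSR_T_BIT) != 0u;
}
```

For example, a stacked value of 0x61000003 decodes as Z set, Thumb bit set, and exception number 3.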
[Figure: ARM processor modes. Reset enters Thread Mode, which runs application code. Exception entry switches to Handler Mode, which runs exception code. Exception return switches back to Thread Mode]
Memory Access:
STRB r2, [r10, r1]       ; store lower byte in r2 at
                         ; address {r10 + r1}
LDR r0, [r1, r2, LSL #2] ; load r0 with data at address
                         ; {r1 + r2 * 4}
Program Flow:
BL <label>               ; PC relative branch to <label>
                         ; location, and return address stored
                         ; in LR (r14)
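The scaled register offset in the LDR example is the addressing form a compiler can use for indexing an array of words. A small C sketch of the address arithmetic and the equivalent array access follows; the function names are hypothetical.

```c
#include <stdint.h>

/* Effective address of LDR r0, [r1, r2, LSL #2]: base plus index * 4. */
uint32_t scaled_register_offset(uint32_t r1, uint32_t r2)
{
    return r1 + (r2 << 2);
}

/* The same access written in C: the table pointer plays the role of r1
 * and the index plays the role of r2. */
uint32_t load_word(const uint32_t *table, uint32_t index)
{
    return table[index];
}
```

With r1 = 0x1000 and r2 = 3, the load reads the word at 0x100C.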
Interrupt handling
Interrupts are a sub-class of exception
Automatic save and restore of processor registers (xPSR, PC, LR, R12, R3-R0)
Allows handler to be written entirely in C
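The automatically saved registers form a fixed 8-word frame on the stack, which is what lets a plain C function act as a handler. The layout below is a sketch based on my understanding of the ARMv7-M basic frame (lowest address first); the slide only lists the registers, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Registers pushed by hardware on exception entry, lowest address first. */
struct exception_frame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code            */
    uint32_t pc;    /* return address                        */
    uint32_t xpsr;  /* status register at the time of entry  */
};

/* Reads the return address from a frame located at stack pointer 'sp'. */
uint32_t frame_return_address(const uint32_t *sp)
{
    const struct exception_frame *frame = (const struct exception_frame *)sp;
    return frame->pc;
}
```

A fault handler can use the same structure to report where the faulting instruction was.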
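[Figure: the NVIC inside the Cortex-Mx takes INTNMI and INTISR[0] to INTISR[N] as inputs and signals the processor core]
[Figure: interrupt timeline. IRQ1, IRQ2 and IRQ3 arrive over time; core execution on the base CPU runs Foreground, ISR2, ISR1, ISR2 (ISR 2 resumes), ISR3, Foreground]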
[Figure: exception handling flow between Main, the Reset Handler, the Exception Vector and the Exception Handler, with numbered arrows]
1. Exception occurs
Current instruction stream stops
Processor accesses vector table
2. Vector address for the exception loaded from the vector table
3. Exception handler executes in Handler Mode
4. Exception handler returns to main
During (or after) state saving the address of the ISR is read from the Vector Table
ExecFuncPtr exception_table[] = {
    (ExecFuncPtr)&Image$$ARM_LIB_STACK$$ZI$$Limit, /* Initial SP */
    (ExecFuncPtr)__main, /* Initial PC */
    NMIException,
    HardFaultException,
    MemManageException,
    BusFaultException,
    UsageFaultException,
    0, 0, 0, 0, /* Reserved */
    SVCHandler,
    DebugMonitor,
    0, /* Reserved */
    PendSVC,
    SysTickHandler
    /* Configurable interrupts start here...*/
};
#pragma arm section
The vector table at address 0x0 is minimally required to have 4 values: stack top, reset routine location, NMI ISR location, HardFault ISR location
The SVCall ISR location must be populated if the SVC instruction will be used
Once interrupts are enabled, the vector table (whether at 0 or in SRAM) must then have pointers to all enabled (by mask) exceptions
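Each entry in the vector table is one 32-bit word, so the location the processor reads for a given exception follows directly from its position in the table. A minimal C sketch, assuming 4-byte entries and the entry order shown in the table (entry 0 is the initial SP); the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Positions of some entries, counted from the start of the table. */
enum {
    EXC_RESET     = 1,
    EXC_NMI       = 2,
    EXC_HARDFAULT = 3,
    EXC_SVCALL    = 11,
    EXC_PENDSV    = 14,
    EXC_SYSTICK   = 15
};

/* Address of the table word that holds the handler address. */
uint32_t vector_entry_address(uint32_t table_base, unsigned exception_number)
{
    return table_base + 4u * exception_number;
}
```

With the table at address 0, the SVCall entry is read from 0x2C and the SysTick entry from 0x3C.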
Vector Table in Assembly
PRESERVE8
THUMB
IMPORT ||Image$$ARM_LIB_STACK$$ZI$$Limit||
AREA RESET, DATA, READONLY
EXPORT __Vectors
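Address range            Region
E000_0000 - FFFF_FFFF    512MB System (XN)
A000_0000 - DFFF_FFFF    1GB External Peripheral
6000_0000 - 9FFF_FFFF    1GB External SRAM
4000_0000 - 5FFF_FFFF    512MB Peripheral
2000_0000 - 3FFF_FFFF    512MB SRAM
0000_0000 - 1FFF_FFFF    512MB Code

System region detail (base address, block)
E004_2000    UNUSED (up to FFFF_FFFF)
E004_1000    ETM
E004_0000    TPIU
E000_F000    RESERVED (up to E003_FFFF)
E000_E000    NVIC
E000_3000    RESERVED
E000_2000    FPB
E000_1000    DWT
E000_0000    ITM
E000_0000 - E003_FFFF is the Internal Private Peripheral Bus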
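Because the memory map is fixed, the region an address belongs to can be found with simple range checks. The C sketch below follows the region boundaries listed in the map; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum region {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_SRAM,
    REGION_EXTERNAL_PERIPHERAL,
    REGION_SYSTEM
};

/* Classifies an address by comparing it with each region's upper bound. */
enum region region_of(uint32_t address)
{
    if (address < 0x20000000u) return REGION_CODE;
    if (address < 0x40000000u) return REGION_SRAM;
    if (address < 0x60000000u) return REGION_PERIPHERAL;
    if (address < 0xA0000000u) return REGION_EXTERNAL_SRAM;
    if (address < 0xE0000000u) return REGION_EXTERNAL_PERIPHERAL;
    return REGION_SYSTEM;
}
```

For example, the NVIC base address E000_E000 falls in the System region and 2000_0000 is the first address of SRAM.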
Cortex-M4F Floating Point Registers
FPU provides a further 32 single-precision registers
Can be viewed as either
    32 x 32-bit registers
    16 x 64-bit doubleword registers
    Any combination of the above
[Figure: S0 to S31 overlaid on D0 to D15. D0 = S0 and S1, D1 = S2 and S3, D2 = S4 and S5, D3 = S6 and S7, ..., D14 = S28 and S29, D15 = S30 and S31]
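The two views of the register bank overlap the same storage, which a C union can model. This is a sketch of the aliasing only, using integer bit patterns instead of floating-point values; the union and function names are hypothetical.

```c
#include <stdint.h>

/* One 128-byte bank seen either as 32 words or as 16 doublewords. */
union fp_register_bank {
    uint32_t s[32];  /* S0..S31 */
    uint64_t d[16];  /* D0..D15 */
};

/* Index of the doubleword register that contains a given S register. */
unsigned d_register_for_s(unsigned s_index)
{
    return s_index / 2u;
}
```

Writing D1 changes S2 and S3 and leaves every other S register untouched, so code that mixes the two views has to track which registers are live.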
[Figure: ARMv7-M Architecture and ARMv6-M Architecture]
[Figure: ARM-based system. Labels: core of varying width, speed and size; AMBA AXI interface; AMBA APB; other peripherals and interfaces; other CoreLink peripherals; can include on-chip memory]
[Figure: example AMBA system. A high performance ARM processor and a high bandwidth external memory interface sit on AHB; an APB bridge connects AHB to the UART, timer and keypad on APB]
HADDR
HADDR HWDATA Slave
Master HWDATA
#1
HRDATA
#1
HRDATA
Address/Control
Slave
#2
Master
#2
Write Data
Slave
Read Data #3
Master
#3
Slave
#4
Decoder
[Figure: inter-connection architecture linking the master interfaces (ARM, Master 2) to the slave interfaces]
Linux Support
Pre-built Linux images are available for ARM hardware platforms
DS-5 accepts kernel images built with the GNU toolchain
Can also debug applications or loadable kernel modules
RVCT can be used to build Linux applications or libraries
Giving performance benefits
ARM does not provide technical support for the GNU toolchain, or Linux
kernel/driver development
August 2012