ARM Architecture Overview
4T: ARM7TDMI, ARM922T. Thumb instruction set.
5TE: ARM926EJ-S, ARM946E-S, ARM966E-S. Improved ARM/Thumb interworking; DSP instructions. Extensions: Jazelle (5TEJ).
6: ARM1136JF-S, ARM1176JZF-S, ARM11 MPCore. SIMD instructions; unaligned data support. Extensions: Thumb-2 (6T2), TrustZone (6Z), Multicore (6K).
7: Cortex-A8/R4/M3/M1. Thumb-2. Extensions: v7-A (applications): NEON; v7-R (real time): HW divide; v7-M (microcontroller): HW divide and Thumb-2 only.
ARM7TDMI: architecture v4T. Von Neumann core with a 3-stage pipeline.
ARM920T: architecture v4T. Harvard core with a 5-stage pipeline and MMU.
Memory management support (MMU)
Highest performance at low power
Influenced by multi-tasking OS system requirements
TrustZone and Jazelle-RCT for a safe, extensible system

Protected memory (MPU)
Low latency and predictability for real-time needs
Evolutionary path for traditional embedded business
Programmers Model
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)
Doubleword means 64 bits (eight bytes)
32-bit ARM instruction set
16-bit Thumb instruction set
Thumb-2: provides a mixture of 32-bit and 16-bit instructions, maintaining code density with increased flexibility
Processor Modes
Each mode has access to its own stack and a different subset of registers
Some operations can only be carried out in a privileged mode
Mode              Description
Supervisor (SVC)  Entered on reset and when a Software Interrupt instruction (SWI) is executed
FIQ               Entered when a high priority (fast) interrupt is raised
IRQ               Entered when a low priority (normal) interrupt is raised
Abort             Used to handle memory access violations
Undef             Used to handle undefined instructions
System            Privileged mode using the same registers as User mode
User              Mode under which most applications / OS tasks run

Privileged modes: SVC, FIQ, IRQ, Abort, Undef, System
Exception modes: SVC, FIQ, IRQ, Abort, Undef
Unprivileged mode: User
ARM has 37 registers, all 32 bits long
A subset of these registers is accessible in each mode
[Figure: program status register layout, showing the fields N, Z, C, V, Q, IT, GE[3:0], E, A, I, F, T and mode.]
T bit:
  T = 0: processor in ARM state
  T = 1: processor in Thumb state
  Introduced in architecture 4T
Mode bits:
  Specify the processor mode
New bits in v6:
  GE[3:0] used by some SIMD instructions
  E bit controls load/store endianness
  A bit disables imprecise data aborts
  IT[abcde]: IF-THEN conditional execution of Thumb-2 instruction groups
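The status register fields listed above can be illustrated with a small decoder. This is a minimal sketch in Python, not part of the original slides: the bit positions and mode encodings are taken from the ARM Architecture Reference Manual layout rather than from this deck, and `decode_cpsr` is a hypothetical helper name.

```python
# Mode encodings for CPSR bits [4:0] (from the ARM ARM, not from the slides).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into its main fields (hypothetical helper)."""

    def bit(n: int) -> int:
        return (cpsr >> n) & 1

    return {
        # Condition flags and sticky saturation flag.
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28), "Q": bit(27),
        # v6 additions: SIMD greater-or-equal flags, endianness, abort mask.
        "GE": (cpsr >> 16) & 0xF,
        "E": bit(9), "A": bit(8),
        # Interrupt masks and Thumb state bit.
        "I": bit(7), "F": bit(6), "T": bit(5),
        "state": "Thumb" if bit(5) else "ARM",
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


fields = decode_cpsr(0x600000D3)
print(fields["mode"], fields["state"], fields["Z"], fields["C"])
```

The example value 0x600000D3 is chosen only for illustration: Z and C are set, IRQ and FIQ are masked, and the mode bits select Supervisor in ARM state.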
Data alignment
Prior to architecture v6, data accesses must be appropriately aligned for the access size
Unaligned data can be accessed using multiple aligned accesses combined with shift/mask operations
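The shift/mask approach can be sketched as follows. This is an illustrative Python model, not code from the slides: it assumes little-endian memory, and `aligned_word` and `unaligned_word` are hypothetical helper names.

```python
def aligned_word(memory: bytes, addr: int) -> int:
    """Read a little-endian 32-bit word; the address must be word aligned."""
    assert addr % 4 == 0, "word access must be 4-byte aligned"
    return int.from_bytes(memory[addr:addr + 4], "little")


def unaligned_word(memory: bytes, addr: int) -> int:
    """Build an unaligned 32-bit read from two aligned reads plus shift/mask."""
    base = addr & ~3            # aligned address at or below addr
    shift = (addr & 3) * 8      # bit offset of the wanted data in that word
    if shift == 0:
        return aligned_word(memory, base)
    low = aligned_word(memory, base) >> shift
    high = aligned_word(memory, base + 4) << (32 - shift)
    return (low | high) & 0xFFFFFFFF


memory = bytes(range(16))       # bytes 0x00, 0x01, ... 0x0F
print(hex(unaligned_word(memory, 1)))
```

Only aligned addresses ever reach `aligned_word`, which mirrors the restriction on pre-v6 cores; the combining work is done entirely in registers.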
Exception Handling
On an exception, the core:
Copies CPSR into SPSR_<mode>
Sets appropriate CPSR bits:
  Change to ARM state
  Change to exception mode
  Disable interrupts (if appropriate)
Stores the return address in LR_<mode>
Sets PC to vector address
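The entry sequence above can be modelled in a few lines. This is a simplified sketch in Python under stated assumptions: the vector addresses and the rule that FIQ is masked only on FIQ entry come from the ARM Architecture Reference Manual, not from these slides; reset is left out; and `Core` and `take_exception` are hypothetical names.

```python
from dataclasses import dataclass, field

# Vector addresses and target modes (from the ARM ARM; reset omitted).
VECTORS = {"undef": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
           "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C}
EXCEPTION_MODE = {"undef": "und", "swi": "svc", "prefetch_abort": "abt",
                  "data_abort": "abt", "irq": "irq", "fiq": "fiq"}


@dataclass
class Core:
    pc: int = 0
    mode: str = "usr"
    thumb: bool = False
    irq_disabled: bool = False
    fiq_disabled: bool = False
    spsr: dict = field(default_factory=dict)   # banked SPSR_<mode>
    lr: dict = field(default_factory=dict)     # banked LR_<mode>


def take_exception(core: Core, kind: str, return_address: int) -> None:
    new_mode = EXCEPTION_MODE[kind]
    # 1. Copy CPSR into SPSR_<mode> (modelled as a tuple of the CPSR fields).
    core.spsr[new_mode] = (core.mode, core.thumb,
                           core.irq_disabled, core.fiq_disabled)
    # 2. Set CPSR bits: ARM state, exception mode, interrupts masked.
    core.thumb = False
    core.mode = new_mode
    core.irq_disabled = True
    if kind == "fiq":
        core.fiq_disabled = True
    # 3. Store the return address in LR_<mode>.
    core.lr[new_mode] = return_address
    # 4. Set PC to the vector address.
    core.pc = VECTORS[kind]


core = Core(pc=0x8000, mode="usr", thumb=True)
take_exception(core, "irq", return_address=0x8004)
print(core.mode, hex(core.pc), core.thumb, core.spsr["irq"])
```

The return address is passed in rather than computed, because the exact offset depends on the exception type and is not given in this section.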
[Figure: Vector Table, with the Reset vector at address 0x00.]
Vector table can also be at
Must be done in ARM state in most cores, but... ...Thumb-2 capable cores can do this in Thumb state
Example data processing instructions
  SUB r0,r1,#5            ; r0 = r1 - 5
  ADD r2,r3,r3,LSL #2     ; r2 = r3 + (r3 * 4)
  ADDEQ r5,r5,r6          ; IF EQ condition true, r5 = r5 + r6
Example branching instruction
  B <Label>               ; Branch to <Label>
Example memory access instructions
  LDR r0,[r1]             ; Load word at address r1 into r0
  STRNEB r2,[r3,r4]       ; IF NE condition true, store bottom byte of r2 to address r3+r4
  STMFD sp!,{r4-r8,lr}    ; Store registers r4 to r8 and lr on stack. Then update stack pointer
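The data processing examples above combine two ARM features: a shifted second operand and conditional execution. The following Python sketch models just those three instructions on a dictionary of register values; it is an illustration, not an emulator, and `lsl`, `condition_passed` and `run_examples` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def condition_passed(cond: str, flags: dict) -> bool:
    """Evaluate the EQ and NE condition codes against the Z flag."""
    if cond == "EQ":
        return flags["Z"] == 1
    if cond == "NE":
        return flags["Z"] == 0
    return True  # AL (always)


def run_examples(regs: dict, flags: dict) -> dict:
    r = dict(regs)
    # SUB r0,r1,#5
    r["r0"] = (r["r1"] - 5) & MASK32
    # ADD r2,r3,r3,LSL #2  (multiply r3 by 5 in one instruction)
    r["r2"] = (r["r3"] + lsl(r["r3"], 2)) & MASK32
    # ADDEQ r5,r5,r6  (only executes when Z is set)
    if condition_passed("EQ", flags):
        r["r5"] = (r["r5"] + r["r6"]) & MASK32
    return r


start = {"r0": 0, "r1": 12, "r2": 0, "r3": 7, "r5": 1, "r6": 2}
taken = run_examples(start, {"Z": 1})
skipped = run_examples(start, {"Z": 0})
print(taken["r0"], taken["r2"], taken["r5"], skipped["r5"])
```

With Z set the ADDEQ updates r5; with Z clear the instruction has no effect and r5 keeps its old value.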
Optimized for code density from C code (~65% of ARM code size)
Improved performance from narrow memory
Subset of the functionality of the ARM instruction set
Constraints are not generally consistent
Targeted at compiler generation, not hand coding
Adds 32-bit instructions to implement almost all of the ARM ISA functionality
Retains the complete 16-bit Thumb instruction set
No switching between ARM and Thumb states
Compiler automatically selects mix of 16-bit and 32-bit instructions
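Because 16-bit and 32-bit encodings share one instruction stream, a decoder has to tell them apart from the first halfword. The sketch below shows one way to do that in Python. The rule it uses (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction) comes from the Thumb-2 encoding in the ARM Architecture Reference Manual, not from these slides, and the function names are hypothetical.

```python
def thumb2_length(first_halfword: int) -> int:
    """Return the instruction length in bytes, judged from its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_stream(halfwords: list) -> list:
    """Group a list of halfwords into 16-bit and 32-bit instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        count = thumb2_length(halfwords[i]) // 2
        instructions.append(tuple(halfwords[i:i + count]))
        i += count
    return instructions


# Example halfwords: MOV r0,r1 (16-bit), MOV.W r0,#1 (32-bit), BX lr (16-bit).
stream = [0x4608, 0xF04F, 0x0001, 0x4770]
for instruction in split_stream(stream):
    print(len(instruction) * 2, [hex(h) for h in instruction])
```

No state change separates the narrow and wide instructions, which is the point made above about not switching between ARM and Thumb states.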
[Figure: chart of performance and code density; labels include Thumb-2, random mix and profiled mix.]
Processor Cores
ARM7TDMI Processor
ARM926EJ-S Processor
Architecture v5TE
5-stage pipeline
Single-cycle 32x16 multiplier
Caches and TCMs
Memory management unit (MMU)
2 AHB memory interfaces
Jazelle technology
TrustZone
8-stage pipeline
Branch prediction
Four AXI memory ports
IEM (Intelligent Energy Management)
Integrated VFP coprocessor

[Figure: block diagram with four MP11 CPUs.]
Architecture v7-M (Thumb-2 only)
Very different from previous ARM processors
No CPSR register
Vector table contains addresses, not instructions
Atypical implementation
Fixed memory map
Integrated interrupt controller
Serial-Wire Debug
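To make "addresses, not instructions" concrete, the sketch below reads the first entries of a v7-M style vector table as plain 32-bit words. It is an illustrative Python example: the entry order (initial stack pointer, Reset, NMI, HardFault) and the use of bit 0 to mark Thumb code follow the ARMv7-M Architecture Reference Manual rather than these slides, and every address in the sample image is made up.

```python
import struct

ENTRY_NAMES = ["initial_sp", "reset", "nmi", "hard_fault"]


def parse_vector_table(image: bytes) -> dict:
    """Read the first four little-endian words of a v7-M vector table."""
    words = struct.unpack_from("<4I", image, 0)
    return dict(zip(ENTRY_NAMES, words))


# Hypothetical image: stack top in RAM, handlers in flash with bit 0 set.
image = struct.pack("<4I", 0x20002000, 0x08000101, 0x08000121, 0x08000131)
table = parse_vector_table(image)

# Clearing bit 0 gives the address of the first handler instruction.
reset_handler = table["reset"] & ~1
print(hex(table["initial_sp"]), hex(reset_handler))
```

On earlier ARM cores each vector slot instead holds an instruction, typically a branch to the handler, so the same bytes would be executed rather than loaded.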
The ARM7TDMI uses a 3-stage pipeline in order to increase the speed of the flow of instructions to the processor
Stage     ARM      Thumb    Operation
FETCH     PC       PC       Instruction fetched from memory
DECODE    PC - 4   PC - 2   Decoding of registers used in instruction
EXECUTE   PC - 8   PC - 4   Register(s) read from Register Bank; shift and ALU operation; write register(s) back to Register Bank
Debug tools will hide this from you
This is now part of the ARM Architecture and applies to all processors
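The practical consequence is that an instruction reading the PC sees a value two instructions ahead of its own address. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the Thumb case assumes 16-bit instructions.

```python
def pc_read_value(instruction_address: int, thumb: bool = False) -> int:
    """Value seen when the executing instruction reads the PC."""
    instruction_size = 2 if thumb else 4
    # The PC holds the address being fetched, two instructions ahead.
    return instruction_address + 2 * instruction_size


def pipeline_addresses(executing_address: int, thumb: bool = False) -> dict:
    """Addresses held in each stage while one instruction executes."""
    size = 2 if thumb else 4
    return {
        "EXECUTE": executing_address,
        "DECODE": executing_address + size,
        "FETCH": executing_address + 2 * size,
    }


print(hex(pc_read_value(0x8000)))              # ARM state
print(hex(pc_read_value(0x8000, thumb=True)))  # Thumb state
print(pipeline_addresses(0x8000))
```

This offset is why PC-relative addressing in hand-written assembly has to account for 8 bytes in ARM state and 4 bytes in Thumb state.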
Optimal Pipelining
Cycle        1  2  3  4  5  6  7  8
Operation
ADD          F  D  E
SUB             F  D  E
ORR                F  D  E
AND                   F  D  E
ORR                      F  D  E
EOR                         F  D  E
F - Fetch
D - Decode
E - Execute
All operations here are on registers (single cycle execution)
In this example it takes 6 clock cycles to execute 6 instructions
Clock cycles per instruction (CPI) = 1
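The ideal schedule can be generated programmatically. The following Python sketch assumes a stall-free 3-stage pipeline in which each instruction spends exactly one cycle per stage; `pipeline_chart` is a hypothetical helper, not something from the slides.

```python
def pipeline_chart(instructions, stages=("F", "D", "E")):
    """Return (rows, total_cycles) for an ideal pipeline with no stalls."""
    rows = []
    for index, name in enumerate(instructions):
        # Instruction number `index` enters the first stage in cycle index + 1.
        timing = {index + offset + 1: stage
                  for offset, stage in enumerate(stages)}
        rows.append((name, timing))
    total_cycles = len(instructions) + len(stages) - 1
    return rows, total_cycles


program = ["ADD", "SUB", "ORR", "AND", "ORR", "EOR"]
rows, total_cycles = pipeline_chart(program)

for name, timing in rows:
    cells = [timing.get(cycle, ".") for cycle in range(1, total_cycles + 1)]
    print(f"{name:<4}" + " ".join(cells))

# Cycle in which each instruction leaves the execute stage.
completion = [max(timing) for _, timing in rows]
print("total cycles:", total_cycles)
print("completion cycles:", completion)
```

Once the pipeline is full, one instruction completes in every cycle, which is the sense in which CPI = 1; the first result only appears after the two fill cycles.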
F - Fetch
D - Decode
E - Execute
L - Linkret
A - Adjust
Breaking the pipeline
Note that the core is executing in ARM state
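[Figure: pipeline diagram. Instruction fetch and instruction decode stages (D2 to D4, with an instruction queue) feed execute stages E0 to E5, split into ALU pipes (Shift, ALU, SAT, BP update, WB), a multiply pipe (MUL1, MUL2, WB) and a load/store pipe (AGU, RAM + TLB, format/forward, WB).]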
Optimising code to make use of the processor pipeline is very difficult
Leave it to the compiler!
Reference Slides
Reference Material
ARM DDI 0100E covers v5TE DSP extensions
Can be purchased from booksellers: ISBN 0-201-737191 (Addison-Wesley)
Available for download from ARM's website
ARMv7-M ARM available for download from ARM's website
Contact ARM if you need a different version (v6, v7-AR, etc.)
Naming Conventions
ARMx1z (e.g. ARM710T) indicates cache and full MMU
ARMx2z (e.g. ARM720T) indicates cache, MMU and Process ID support
ARMx3z (e.g. ARM1136J-S) indicates physically mapped caches and MMU
ARMx4z (e.g. ARM740T) indicates cache and MPU
ARMx5z (e.g. ARM1156T2-S) indicates cache, MPU and error correcting memory
ARMx6z (e.g. ARM966E-S) indicates write buffer but no caches
ARMx7z (e.g. ARM1176JZ-S) indicates AXI bus, physically mapped caches and MMU
ARMxy6 (e.g. ARM946E-S) indicates TCMs
ARM7TDMI family
ARM1136J(F)-S
ARM1156T2(F)-S
ARM1176JZ(F)-S
ARM11 MPCore
ARM Cortex-A8
ARM Cortex-R4(F)
ARM Cortex-M3
ARM Cortex-M1
Cortex family
For ARM processor naming conventions and features, please see the Appendix
ARMv4T Cores:
Core | Architecture | Cache | Associativity | TCM | Replacement | Write Strategy | Write Buffer | MMU/MPU | Hi Vectors | Streaming | Standby Mode
7TDMI | von Neumann | None | N/A | No | N/A | N/A | None | None | No | N/A | No
720T | von Neumann | 8K Unified, 4 words/line | 4-way | No | Random | Write Through | 8 Words, 4 Addresses | MMU | Yes | Yes | No
740T | von Neumann | 8K Unified, 4 words/line | 4-way | No | Random | Write Through | 8 Words, 4 Addresses | MPU | No | Yes | No
920T | Harvard | 16K Instr + 16K Data, 8 words/line | 64-way | No | Random, Round Robin | Write Through, Write Back | 16 Words, 4 Addresses | MMU | Yes | Yes | Yes
940T | Harvard | 4K Instr + 4K Data, 4 words/line | 64-way | No | Random | Write Through, Write Back | 8 Words, 4 Addresses | MPU | Yes | Yes | Yes
SA1100 | Harvard | 16K Instr + 16K Data, 4 words/line | 32-way | No | Round Robin | Write Back | 8 Words, 4 Addresses | MMU | Yes | Yes | Yes
ARMv5 Cores:
Core | Architecture | Cache | Associativity | TCM | Replacement | Write Strategy | Write Buffer | MMU/MPU | Hi Vectors | Streaming | Standby Mode
926EJ-S | Harvard | 4-128K Instr, 4-128K Data, 8 words/line | 4-way | 0-1024K Instr, 0-1024K Data | Random, Round Robin | Write Through, Write Back | 16 Words, 4 Addresses | (not given) | (not given) | (not given) | (not given)
946E-S | Harvard | 0-1024K Instr, 0-1024K Data, 8 words/line | 4-way | 0-1024K Instr, 0-1024K Data | Random, Round Robin | Write Through, Write Back | 16 Words, Data or Address | MPU | Yes | Yes | Yes
966E-S | Harvard | None | N/A | 0-64M Instr, 0-64M Data | N/A | N/A | 12 Words, Data or Address | None | Yes | N/A | Yes
968E-S | Harvard | None | N/A | 0-64M Instr, 0-64M Data | N/A | Write Through, Write Back | 12 Words, Data or Address | None | Yes | N/A | Yes
1026EJ-S | Harvard | 0-128K Instr, 0-128K Data, 8 words/line | 4-way | 0-1024K Instr, 0-1024K Data | Random, Round Robin | Write Through, Write Back | 8 Words, Data or Address | MMU or MPU | Yes | Yes | Yes
XScale | Harvard | 32K Instr, 32K Data, 8 words/line | 32-way | No | Random, Round Robin | Write Through, Write Back | 8 x 16 Bytes, Coalescing | MMU with extensions | Yes | Yes | Yes
ARMv6 Cores:
Core | Architecture | Cache | Associativity | TCM | Replacement | Write Strategy | MMU/MPU | Hi Vectors | Streaming | Standby Mode | Bus | VFP Support
1136J(F)-S | Harvard | 4-64K Instr, 4-64K Data, 8 words/line | 4-way | 0-64K Instr, 0-64K Data | Random, Round Robin | Write Through, Write Back | MMU | Yes | Yes | Yes | AHB/APB | Yes
1156T2(F)-S | Harvard | 0-64K Instr, 0-64K Data, 8 words/line | 4-way | 0-256K Instr, 0-256K Data | Random, Round Robin | Write Through, Write Back | MPU | Yes | Yes | Yes | AXI | Yes
1176JZ(F)-S | Harvard | 4-64K Instr, 4-64K Data, 8 words/line | 4-way | 0-64K Instr, 0-64K Data | Random, Round Robin | Write Through, Write Back | MMU | Yes | N/A | Yes | AXI | Yes
ARM11 MPCore | Harvard | 16-64K Instr, 16-64K Data, 8 words/line | 4-way | None | Random, Round Robin | Write Through, Write Back | MMU | Yes | Yes | Yes | AXI | Yes
Cortex Cores:
Core | Architecture | Cache | Associativity | TCM | Replacement | Write Strategy | MMU/MPU | Hi Vectors | Streaming | Standby Mode | Bus | VFP Support
Cortex-M3 | Harvard | None
Cortex-M1 | Harvard | None
Cortex-R4 | Harvard | 4-64K Instr, 4-64K Data, 8 words/line | 4-way | 0-8M Instr, 0-8M Data | Random | Write Through, Write Back | MPU (optional) | Yes | Yes | Yes | AXI | Yes
Cortex-A8 | Harvard | 16 or 32 Instr, 16 or 32 Data, 16 words/line | 4-way | None | Random | Write Through, Write Back | MMU | Yes | Yes | Yes | AXI | Yes
Remaining Cortex-M3 / Cortex-M1 cells (column positions not recoverable): N/A, None, N/A, N/A, 0-1M Instr 0-1M Data, N/A
TrustZone Computing
TrustZone adds a parallel world to allow trusted programs and data to be safely separated from the OS and applications
Introduced for ARM1176, standard for ARMv7-A cores
Features:
New Secure Monitor mode: gatekeeper for secure state
New S-bit in CP15 to indicate when the processor is running in a secure state
Security state exposed on external bus accesses to permit security-aware memory and peripherals
Ability to restrict debug to non-secure state
Best use of available memory bandwidth
Eliminates data arrangement overhead
Operates on separate register file
SIMD framework is an excellent target for compilers
End