02 ARM Processor Fundamentals
Minsoo Ryu Department of Computer Science and Engineering Hanyang University [email protected]
Real-Time Computing and Communications Lab., Hanyang University http://rtcc.hanyang.ac.kr
Topics Covered
ARM Processor Fundamentals
ARM Core Dataflow Model
Registers and Current Program Status Register
Pipeline
Exceptions, Interrupts, and the Vector Table
Core Extensions
ARM Architecture Revisions and Families
ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd
Source operands are read from the register file using the internal bus
For load and store instructions the incrementer updates the address register before the core reads or writes the next register value from or to the next sequential memory location
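To make the incrementer's role concrete, here is a minimal C sketch of a load-multiple in increment-after form (in the spirit of `LDMIA rn, {reglist}`). The register-file struct, the word-array memory and the function name are hypothetical illustrations, not a model of any particular core's datapath.

```c
#include <stdint.h>

/* Hypothetical model of a 16-entry register file r0-r15. */
typedef struct {
    uint32_t r[16];
} regfile_t;

/* Sketch of load-multiple, increment-after.
   reglist is a 16-bit mask: bit i set means "load register ri".
   Registers are filled in ascending order from consecutive 32-bit words;
   the incrementer advances the address by 4 bytes after each transfer,
   so the next register is read from the next sequential word. */
static uint32_t load_multiple_ia(regfile_t *rf, const uint32_t *mem,
                                 uint32_t base, uint16_t reglist)
{
    uint32_t addr = base;                 /* byte address, word aligned */
    for (int i = 0; i < 16; i++) {
        if (reglist & (1u << i)) {
            rf->r[i] = mem[addr / 4];     /* read the word at the current address */
            addr += 4;                    /* incrementer: next sequential word */
        }
    }
    return addr;  /* final address, i.e. what a writeback form (rn!) would store */
}
```

For example, a mask of 0x001A selects r1, r3 and r4, which receive the words at byte offsets 0, 4 and 8, and the function returns 12.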
There are instructions that treat r14 (the link register, lr) and r15 (the program counter, pc) in a special way
Processor Modes
The processor mode determines which registers are active and the access rights to the cpsr register itself
A privileged mode allows full read-write access to the cpsr
A nonprivileged mode only allows read access to the control field in the cpsr, but still allows full read-write access to the condition flags
Processor Modes
Abort mode
When there is a failed attempt to access memory
Supervisor mode
The mode the processor is in after reset (when power is applied), and generally the mode that an operating system kernel operates in
System mode
A special version of user mode that allows full read-write access to the cpsr
Undefined mode
When the processor encounters an instruction that is undefined or not supported by the implementation
User mode
Used for programs and applications
Processor Modes
Mode               Abbreviation  Privileged  Mode bits [4:0]
Abort              abt           Yes         10111
Fast interrupt     fiq           Yes         10001
Interrupt request  irq           Yes         10010
Supervisor         svc           Yes         10011
System             sys           Yes         11111
Undefined          und           Yes         11011
User               usr           No          10000
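A minimal C sketch of decoding the mode field, using the bit patterns from the table above. The struct, array and function names are hypothetical; the sketch only looks at cpsr bits [4:0] and ignores every other bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the mode table: encoding, abbreviation, privilege. */
typedef struct {
    uint32_t bits;
    const char *abbrev;
    bool privileged;
} arm_mode_t;

static const arm_mode_t arm_modes[] = {
    {0x17u, "abt", true},   /* 10111 */
    {0x11u, "fiq", true},   /* 10001 */
    {0x12u, "irq", true},   /* 10010 */
    {0x13u, "svc", true},   /* 10011 */
    {0x1Fu, "sys", true},   /* 11111 */
    {0x1Bu, "und", true},   /* 11011 */
    {0x10u, "usr", false},  /* 10000 */
};

/* Return the table entry for the mode encoded in cpsr[4:0],
   or NULL if the bits match none of the listed modes. */
static const arm_mode_t *decode_mode(uint32_t cpsr)
{
    uint32_t m = cpsr & 0x1Fu;
    for (size_t i = 0; i < sizeof arm_modes / sizeof arm_modes[0]; i++) {
        if (arm_modes[i].bits == m)
            return &arm_modes[i];
    }
    return NULL;
}
```

For instance, a cpsr value of 0xD3 (svc mode, IRQ and FIQ masked, ARM state) decodes to the privileged svc entry, while 0x10 decodes to the nonprivileged usr entry.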
Banked Registers
There are 37 registers in the register file
20 registers are hidden from a program at different times
These registers are called banked registers and are identified by shading in the register-file figure
Banked Registers
Banked registers are available only when the processor is in a particular mode
Abort mode has banked registers r13_abt, r14_abt, and spsr_abt
Every processor mode except user mode can change mode by writing directly to the mode bits of the cpsr
A banked register maps one-to-one onto a user mode register
If you change processor mode, a banked register from the new mode will replace an existing register
Banked Registers
When the processor is in the interrupt request mode, the instructions still access registers named r13 and r14
However, these registers are the banked registers r13_irq and r14_irq
The user mode registers r13 and r14 are not affected by an instruction referencing these registers
A program still has normal access to the other registers r0 to r12
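The banking rule can be sketched in C as a lookup from (mode, register number) to a physical register. The sketch assumes the usual ARMv4/v5 layout, in which fiq mode banks r8 to r14 and irq, svc, abt and und bank only r13 and r14; spsr registers are left out, and all type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bank identifiers; usr and sys share the user bank. */
enum bank { BANK_USR, BANK_FIQ, BANK_IRQ, BANK_SVC, BANK_ABT, BANK_UND, NUM_BANKS };

typedef struct {
    uint32_t usr[16];                /* r0-r15 as seen in user/system mode */
    uint32_t fiq_r8_r14[7];          /* r8_fiq .. r14_fiq */
    uint32_t r13_r14[NUM_BANKS][2];  /* r13_<mode>, r14_<mode>; only irq/svc/abt/und rows used */
} banked_regs_t;

/* Map cpsr mode bits [4:0] to a register bank. */
static enum bank bank_for_mode(uint32_t mode)
{
    switch (mode & 0x1Fu) {
    case 0x11: return BANK_FIQ;
    case 0x12: return BANK_IRQ;
    case 0x13: return BANK_SVC;
    case 0x17: return BANK_ABT;
    case 0x1B: return BANK_UND;
    default:   return BANK_USR;  /* usr (0x10), sys (0x1F); other values treated as user here */
    }
}

/* Return the physical register that an instruction naming rn reaches
   in the given mode (n in 0..15). */
static uint32_t *reg_ref(banked_regs_t *rf, uint32_t mode, int n)
{
    enum bank b = bank_for_mode(mode);
    if (b == BANK_FIQ && n >= 8 && n <= 14)
        return &rf->fiq_r8_r14[n - 8];
    if (b != BANK_USR && b != BANK_FIQ && (n == 13 || n == 14))
        return &rf->r13_r14[b][n - 13];
    return &rf->usr[n];  /* unbanked: same register in every mode */
}
```

Writing r13 in irq mode (0x12) therefore updates r13_irq and leaves the user-mode r13 untouched, while r5 resolves to the same register in both modes.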
Mode Change
There are two ways to change mode:
By a program that writes directly to the cpsr
By hardware when the core responds to an exception or interrupt
Note that the spsr can only be modified and read in a privileged mode
There is no spsr available in user mode
Note that the cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr
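The difference between the two kinds of mode change can be sketched in C. This is a deliberately simplified model with hypothetical names: only the irq bank's spsr is modeled, and other parts of real exception entry (saving the return address in r14_irq, switching to ARM state, jumping to the vector) are omitted.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_I (1u << 7)   /* IRQ mask bit */
#define MODE_IRQ 0x12u     /* 10010 */

typedef struct {
    uint32_t cpsr;
    uint32_t spsr_irq;     /* only the irq bank is modeled here */
} core_state_t;

/* Hardware path (simplified): taking an IRQ copies cpsr into spsr_irq,
   switches the mode bits to irq and masks further IRQs. */
static void take_irq(core_state_t *c)
{
    c->spsr_irq = c->cpsr;
    c->cpsr = (c->cpsr & ~CPSR_MODE_MASK) | MODE_IRQ;
    c->cpsr |= CPSR_I;
}

/* Software path: a privileged program writes the mode bits directly.
   Nothing is copied into any spsr. */
static void write_mode(core_state_t *c, uint32_t mode)
{
    c->cpsr = (c->cpsr & ~CPSR_MODE_MASK) | (mode & CPSR_MODE_MASK);
}
```

After `take_irq` the old cpsr is preserved in spsr_irq so the handler can restore it; after `write_mode` the previous spsr_irq contents are unchanged.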
The Jazelle (J) and Thumb (T) bits in the cpsr reflect the state of the processor
When both J and T bits are 0, the processor is in ARM state and executes ARM instructions
Interrupts are handled as normal, and cause an immediate return from Java state to ARM state to run the interrupt handler
At the end of the interrupt routine, the normal return mechanism will return the processor to Java state
                    ARM (T=0, J=0)   Thumb (T=1, J=0)   Jazelle (T=0, J=1)
Instruction size    32-bit           16-bit             8-bit
Core instructions   58               30                 Over 60% of Java bytecodes in hardware; the rest in software
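A short C sketch of reading the processor state from the cpsr. It relies on the T bit being cpsr bit 5 and the J bit being cpsr bit 24; the enum and function names are hypothetical, and the T=1, J=1 combination is reported separately because the table above does not cover it.

```c
#include <stdint.h>

#define CPSR_T (1u << 5)    /* Thumb state bit */
#define CPSR_J (1u << 24)   /* Jazelle state bit */

typedef enum { STATE_ARM, STATE_THUMB, STATE_JAZELLE, STATE_OTHER } core_state_kind;

/* Decode the instruction-set state from the T and J bits. */
static core_state_kind decode_state(uint32_t cpsr)
{
    int t = (cpsr & CPSR_T) != 0;
    int j = (cpsr & CPSR_J) != 0;
    if (!t && !j) return STATE_ARM;      /* 32-bit ARM instructions */
    if (t && !j)  return STATE_THUMB;    /* 16-bit Thumb instructions */
    if (!t && j)  return STATE_JAZELLE;  /* 8-bit Java bytecodes */
    return STATE_OTHER;                  /* T=1, J=1: not covered by the table above */
}
```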
Interrupt Masks
Interrupt masks are used to stop specific interrupt requests from interrupting the processor
Two interrupt levels: interrupt request (IRQ) and fast interrupt request (FIQ)
The I bit in the cpsr masks IRQ when set to binary 1
The F bit in the cpsr masks FIQ when set to binary 1
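The masking rule can be expressed as bit operations on a cpsr value. This C sketch assumes I is cpsr bit 7 and F is cpsr bit 6; on a real core software performs this read-modify-write with MRS and MSR in a privileged mode, whereas the hypothetical helpers below only compute the new value.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_I (1u << 7)   /* 1 = IRQ masked */
#define CPSR_F (1u << 6)   /* 1 = FIQ masked */

static uint32_t mask_irq(uint32_t cpsr)   { return cpsr | CPSR_I; }
static uint32_t unmask_irq(uint32_t cpsr) { return cpsr & ~CPSR_I; }
static uint32_t mask_fiq(uint32_t cpsr)   { return cpsr | CPSR_F; }

/* A pending request only interrupts the core if its mask bit is clear. */
static bool irq_can_interrupt(uint32_t cpsr, bool pending)
{
    return pending && !(cpsr & CPSR_I);
}

static bool fiq_can_interrupt(uint32_t cpsr, bool pending)
{
    return pending && !(cpsr & CPSR_F);
}
```

With cpsr = 0x93 (svc mode, I set, F clear) a pending IRQ is held off but a pending FIQ still gets through.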
Condition Flags
Condition flags are updated by comparison instructions and by ALU operations that specify the S instruction suffix
For example, if a SUBS (subtract) instruction results in a register value of zero, the Z flag in the cpsr is set
Condition flags
N: Negative result from ALU
Z: Zero result from ALU
C: ALU operation Carried out
V: ALU operation oVerflowed
Q: Overflow & Saturation (ARMv5TEJ only)
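As a worked example of flag setting, this C sketch computes the result and the N, Z, C and V flags of a SUBS. It assumes the flags occupy cpsr bits 31 to 28 and uses ARM's convention that C after a subtraction means "no borrow"; the Q flag is not modeled, and the function name is hypothetical.

```c
#include <stdint.h>

#define FLAG_N (1u << 31)
#define FLAG_Z (1u << 30)
#define FLAG_C (1u << 29)
#define FLAG_V (1u << 28)

/* Compute rd = rn - rm and the flags SUBS would set.
   Returns only the NZCV bits; the rest of the cpsr is left to the caller. */
static uint32_t subs_flags(uint32_t rn, uint32_t rm, uint32_t *rd)
{
    uint32_t r = rn - rm;                 /* wraps like two's-complement hardware */
    uint32_t f = 0;
    if (r & 0x80000000u) f |= FLAG_N;     /* negative result */
    if (r == 0)          f |= FLAG_Z;     /* zero result */
    if (rn >= rm)        f |= FLAG_C;     /* C = NOT borrow for subtraction */
    if ((rn ^ rm) & (rn ^ r) & 0x80000000u)
        f |= FLAG_V;                      /* signed overflow */
    *rd = r;
    return f;
}
```

Subtracting 5 from 5 sets Z and C; subtracting 5 from 3 sets only N (a borrow occurred, so C is clear); subtracting 1 from 0x80000000 sets C and V because the signed result overflows.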
Conditional Execution
Conditional execution controls whether or not the core will execute an instruction
The condition attribute is postfixed to the instruction mnemonic and is encoded into the instruction
Prior to execution, the processor compares the condition attribute with the condition flags in the cpsr
If they match, the instruction is executed; otherwise the instruction is ignored
Conditional Execution
Mnemonic  Name                                  Condition flags
EQ        Equal                                 Z
NE        Not equal                             z
CS/HS     Carry set / unsigned higher or same   C
CC/LO     Carry clear / unsigned lower          c
MI        Minus / negative                      N
PL        Plus / positive or zero               n
VS        Overflow                              V
(Uppercase flag = set, lowercase flag = clear)
Conditional Execution
Mnemonic  Name                                  Condition flags
HI        Unsigned higher                       zC
LS        Unsigned lower or same                Z or c
GE        Signed greater than or equal          NV or nv
LT        Signed less than                      Nv or nV
GT        Signed greater than                   NzV or nzv
LE        Signed less than or equal             Z or Nv or nV
AL        Always (unconditional)                Ignored
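The two condition tables can be turned into a single C function that decides whether a conditional instruction executes. The enum values follow ARM's 4-bit condition encodings (EQ = 0 through AL = 14); the sketch also includes VC (overflow clear), which the first table does not list. Names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
};

/* Return true if an instruction with condition c would execute,
   given the N, Z, C, V flags in cpsr bits 31..28. */
static bool cond_passed(enum cond c, uint32_t cpsr)
{
    bool n  = (cpsr >> 31) & 1u;
    bool z  = (cpsr >> 30) & 1u;
    bool cf = (cpsr >> 29) & 1u;
    bool v  = (cpsr >> 28) & 1u;
    switch (c) {
    case COND_EQ: return z;
    case COND_NE: return !z;
    case COND_CS: return cf;
    case COND_CC: return !cf;
    case COND_MI: return n;
    case COND_PL: return !n;
    case COND_VS: return v;
    case COND_VC: return !v;
    case COND_HI: return cf && !z;          /* zC */
    case COND_LS: return !cf || z;          /* Z or c */
    case COND_GE: return n == v;            /* NV or nv */
    case COND_LT: return n != v;            /* Nv or nV */
    case COND_GT: return !z && n == v;      /* NzV or nzv */
    case COND_LE: return z || n != v;       /* Z or Nv or nV */
    case COND_AL: return true;              /* flags ignored */
    }
    return false;
}
```

For example, after comparing 3 with 5 the flags are N=1, Z=0, C=0, V=0, so an LT-conditioned instruction executes and a GE-conditioned one is ignored.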
Pipeline
The mechanism a RISC processor uses to execute instructions in parallel
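ARM7: 3-stage pipeline (fetch, decode, execute)
ARM9: 5-stage pipeline (fetch, decode, execute, memory, write)
ARM10: 6-stage pipeline (fetch, issue, decode, execute, memory, write)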
As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to attain a higher operating frequency
This in turn increases the performance
It also increases the latency: more cycles are needed to fill the pipeline before the core can execute an instruction
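The latency trade-off can be put into numbers under idealized assumptions (one instruction enters per cycle, no stalls, no branches): a k-stage pipeline needs k cycles before the first instruction completes and then completes one instruction per cycle, so N instructions take k + N - 1 cycles. The C helper below is a hypothetical illustration of that formula, not a model of any real ARM timing.

```c
#include <stdint.h>

/* Idealized cycle count for n instructions on a k-stage pipeline:
   k cycles to fill, then one instruction completes per cycle. */
static uint64_t pipeline_cycles(uint64_t stages, uint64_t n_instructions)
{
    if (n_instructions == 0)
        return 0;
    return stages + n_instructions - 1;
}
```

For 100 instructions a 3-stage pipeline needs 102 cycles and a 5-stage pipeline 104: the deeper pipeline spends two extra cycles filling, but because each stage does less work its clock can run faster, which is where the performance gain comes from.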
Pipeline
The ARM9 adds a memory and writeback stage
1.1 Dhrystone MIPS per MHz: an increase in instruction throughput of around 13% compared with an ARM7
ARM9 and ARM10 use the same pipeline executing characteristics as an ARM7
Code written for the ARM7 will execute on an ARM9 or ARM10
The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words
On some processors the vector table can optionally be located at a higher address in memory (starting at the offset 0xffff0000)
Operating systems such as Linux and Microsoft's embedded products can take advantage of this feature
Exception Vectors
Reset: when power is applied
Undefined instruction: when the processor cannot decode an instruction
Software interrupt: when the processor encounters an SWI instruction
Prefetch abort: when the processor attempts to fetch an instruction from an address without the correct access permissions
Data abort: when an instruction attempts to access data memory without the correct access permissions
Interrupt request (IRQ): when external hardware interrupts the normal execution flow of the processor
Fast interrupt request (FIQ): when hardware requiring faster response times interrupts the normal execution flow of the processor
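Each exception jumps to a fixed word in the vector table, at the standard offsets 0x00 (reset), 0x04 (undefined instruction), 0x08 (SWI), 0x0C (prefetch abort), 0x10 (data abort), 0x18 (IRQ) and 0x1C (FIQ), with 0x14 reserved. This C sketch computes the vector address for either table base; the `high_vectors` flag is a hypothetical stand-in for the CP15 control bit that selects the high table on cores that support it, which is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of each exception's entry within the vector table. */
enum exception {
    EXC_RESET = 0x00, EXC_UNDEF = 0x04, EXC_SWI  = 0x08, EXC_PABT = 0x0C,
    EXC_DABT  = 0x10, /* 0x14 is reserved */ EXC_IRQ = 0x18, EXC_FIQ = 0x1C
};

/* Address the core branches to when exception e is taken. */
static uint32_t vector_address(enum exception e, bool high_vectors)
{
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + (uint32_t)e;
}
```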
Core Extensions
There are some hardware extensions that are standard components placed next to the ARM core
Cache and tightly coupled memory
Memory management unit
Coprocessors
The logic and control is the glue logic that connects the memory system to the AMBA bus
Coprocessors
A coprocessor extends the processing features of a core by extending the instruction set or by providing configuration registers
More than one coprocessor can be added to the ARM core via the coprocessor interface
Coprocessors
The coprocessor can extend the instruction set by providing a specialized group of new instructions
Vector floating-point (VFP) operations can be added
These new instructions are processed in the decode stage
If the decode stage sees a coprocessor instruction, it offers it to the relevant coprocessor
But if the coprocessor is not present or doesn't recognize the instruction, the ARM takes an undefined instruction exception
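The offer-or-trap behaviour can be sketched in C with a table of coprocessor slots. The sketch assumes the instruction has already been identified as a coprocessor instruction and reads the coprocessor number from bits [11:8] (VFP uses coprocessor numbers 10 and 11). The callback type, the slot table and `demo_accept_all` are hypothetical stand-ins for the hardware handshake.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the coprocessor recognizes and accepts insn. */
typedef bool (*cp_accept_fn)(uint32_t insn);

typedef enum { CP_EXECUTED, CP_UNDEFINED_EXCEPTION } cp_result;

/* Hypothetical stand-in for a coprocessor that accepts everything. */
static bool demo_accept_all(uint32_t insn)
{
    (void)insn;
    return true;
}

/* Offer a coprocessor instruction to the coprocessor it names.
   cps[i] == NULL means coprocessor i is not present. */
static cp_result offer_to_coprocessor(const cp_accept_fn cps[16], uint32_t insn)
{
    unsigned cp_num = (insn >> 8) & 0xFu;   /* coprocessor number, bits [11:8] */
    cp_accept_fn cp = cps[cp_num];
    if (cp != NULL && cp(insn))
        return CP_EXECUTED;
    return CP_UNDEFINED_EXCEPTION;          /* absent or unrecognized */
}
```

With only slot 10 populated, an instruction naming coprocessor 10 is accepted, while one naming coprocessor 5 ends in the undefined instruction exception.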
Nomenclature
Nomenclature
All ARM cores after the ARM7TDMI include the TDMI features even though they may not include those letters
The processor family is a group of processor implementations that share the same characteristics
The ARM7TDMI, ARM740T, and ARM720T all share the same family characteristics and belong to the ARM7 family
JTAG is described by IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture
It is a serial protocol used by ARM to send and receive debug information between the core and test equipment
Nomenclature
EmbeddedICE macrocell is the debug hardware built into the processor that allows breakpoints and watchpoints to be set
Synthesizable means that the processor core is supplied as source code that can be compiled into a form easily used by EDA tools
Also known as soft cores, delivered as HDL or a gate netlist
Can be used as building blocks within ASIC chip designs or FPGA logic designs
Soft cores follow the SPR design flow (synthesis, placement, and route)
Architecture Evolution
ARMv5 to ARMv8
ARM7 Family
The ARM7TDMI was the first of a new range of processors introduced in 1995
Licensed by many of the top semiconductor companies around the world
Characteristics
Good performance-to-power ratio
The first core to introduce the Thumb instruction set
ARM9 Family
ARM9 family was announced in 1997
The memory system has been redesigned to follow the Harvard architecture, which separates the data (D) and instruction (I) buses
ARM946E-S and ARM966E-S execute v5TE instructions and support the ETM (Embedded Trace Macrocell)
ARM926EJ-S was designed for small portable Java-enabled devices such as 3G phones
The first to include the Jazelle technology
ARM10 Family
The ARM10, announced in 1999, was designed for performance
6-stage pipeline and an optional VFP (vector floating-point) unit, which adds a seventh stage to the ARM10 pipeline
VFP increases floating-point performance and is compliant with the IEEE 754-1985 floating-point standard
ARM11 Family
The ARM1136J-S was designed for high-performance and power-efficient applications
The first processor implementation to execute architecture ARMv6 instructions
8-stage pipeline with separate load-store and arithmetic pipelines
Single instruction multiple data (SIMD) extensions for media processing, specifically designed to increase video processing performance
Cortex Series
Three "profiles" are defined
"Application" profile: Cortex-A series
Provides an entire range of solutions for devices hosting a rich OS platform and user applications
"Real-time" profile: Cortex-R series
Designed for high performance, dependability, and error resistance with highly deterministic behavior
"Microcontroller" profile: Cortex-M series
Optimized for cost- and power-sensitive MCU and mixed-signal devices
Specialized Processors
StrongARM was originally co-developed by Digital Semiconductor
Now exclusively licensed by Intel Corporation
Popular for PDAs
Harvard architecture with separate D + I caches
5-stage pipeline
No support for the Thumb instruction set
Specialized Processors
Intel's XScale is a follow-on product to the StrongARM
Dramatic increase in performance
Runs at up to 1 GHz
Harvard architecture and MMU
Processors since the ARM7TDMI have featured the Thumb instruction set, which has its own processor state
The "T" in "TDMI" indicates the Thumb feature
The space-saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared to the ARM instructions executed in the ARM instruction set state
Thumb-2 extends both the ARM and Thumb instruction sets with yet more instructions, including bit-field manipulation, table branches, and conditional execution
SIMD
SIMD (Single Instruction, Multiple Data) is a technique employed to achieve data-level parallelism, as in a vector or array processor
Example: the same value is added to a large number of data points
A typical application is changing the brightness of an image
Each pixel of an image consists of three values for the brightness of the red, green, and blue portions of the color
To change the brightness, the R, G, and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory
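The brightness example can be sketched in portable C using "SIMD within a register": four unsigned 8-bit color values are packed into one 32-bit word and the same addition is applied to all four lanes with ordinary 32-bit operations, clamping each lane at 255 instead of letting it wrap into the neighbouring lane. This is an emulation for illustration only; ARMv6's SIMD extension provides a dedicated instruction (UQADD8) for this per-lane saturating add. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Lane-wise unsigned saturating add of four packed 8-bit values. */
static uint32_t add_u8x4_saturating(uint32_t a, uint32_t b)
{
    const uint32_t H = 0x80808080u;   /* top bit of each lane */
    const uint32_t L = 0x7F7F7F7Fu;   /* low 7 bits of each lane */
    /* Wrapping per-lane sum: adding the low 7 bits cannot carry across
       lanes; the top bit of each lane is then fixed up with XOR. */
    uint32_t sum = ((a & L) + (b & L)) ^ ((a ^ b) & H);
    /* Carry out of bit 7 of each lane marks a lane that overflowed. */
    uint32_t carry = ((a & b) | ((a | b) & ~sum)) & H;
    /* Expand each carry bit into a 0xFF lane mask to clamp at 255. */
    return sum | ((carry >> 7) * 0xFFu);
}

/* Brighten n packed words (four 8-bit values each) by delta. */
static void brighten(uint32_t *words, size_t n, uint8_t delta)
{
    uint32_t d = 0x01010101u * delta;   /* delta replicated into every lane */
    for (size_t i = 0; i < n; i++)
        words[i] = add_u8x4_saturating(words[i], d);
}
```

Adding 0x20 to the lanes 0x10, 0xF0, 0xFF, 0x00 gives 0x30, 0xFF, 0xFF, 0x20: the two lanes that would overflow are clamped, and no carry leaks into the neighbouring color values.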
Features
Single-cycle 16x16 and 32x16 MAC implementations
New instructions to load and store pairs of registers, with enhanced addressing modes
New CLZ instruction improves normalization in arithmetic operations and improves divide performance
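To show why CLZ helps normalization, this C sketch counts leading zeros with a portable loop (the work a single CLZ instruction does in hardware) and uses the count to shift a value so its most significant set bit lands in bit 31. The function names are hypothetical; the zero case returns 32, matching CLZ's result for a zero input.

```c
#include <stdint.h>

/* Portable count-leading-zeros; CLZ computes this in one instruction. */
static unsigned clz32(uint32_t x)
{
    if (x == 0)
        return 32;
    unsigned n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Normalize x so its top set bit is bit 31.
   *shift receives the shift amount (the exponent adjustment). */
static uint32_t normalize(uint32_t x, unsigned *shift)
{
    unsigned s = clz32(x);
    *shift = s;
    return s == 32 ? 0u : x << s;
}
```

For example, 0x300 has 22 leading zeros, so it normalizes to 0xC0000000 with a shift of 22; without CLZ the loop above would take up to 31 iterations.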
NEON
The Advanced SIMD extension
A combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications
At least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD
NEON
NEON instructions perform "Packed SIMD" processing:
Registers are considered as vectors of elements of the same data type
Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single-precision floating point
Instructions perform the same operation in all lanes
TrustZone
The Security Extensions
Provide two virtual processors backed by hardware-based access control
Enable the application core to switch between two states, referred to as worlds, in order to prevent information from leaking from the more trusted world to the less trusted world
Each world can operate independently of the other while using the same core
Typical applications are to run a rich operating system in the less trusted world, and smaller security-specialized code in the more trusted world
The specific implementation details of TrustZone are proprietary and have not been publicly disclosed for review
TrustZone
Modes in an ARM for the Security Extensions
The entry to monitor mode can be triggered by software executing a dedicated Secure Monitor Call (SMC) instruction, or by a subset of the hardware exceptions
The IRQ, FIQ, external Data Abort, and external Prefetch Abort exceptions can all be configured to cause the processor to switch into monitor mode
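The routing choice can be sketched in C as a decision function. The struct fields are hypothetical switches; on ARMv7-A the corresponding controls are assumed here to be the IRQ, FIQ and EA bits of the Secure Configuration Register, with one EA switch covering both external data and prefetch aborts. SMC is treated as always entering monitor mode, and details such as SMC being undefined in user mode are not modeled.

```c
#include <stdbool.h>

/* Events that may enter monitor mode (names are illustrative). */
typedef enum { TZ_SMC, TZ_IRQ, TZ_FIQ, TZ_EXT_DABT, TZ_EXT_PABT } tz_event;

/* Hypothetical routing configuration. */
typedef struct {
    bool irq_to_monitor;
    bool fiq_to_monitor;
    bool ext_abort_to_monitor;   /* external data and prefetch aborts */
} tz_routing;

typedef enum { TARGET_USUAL_MODE, TARGET_MONITOR } tz_target;

/* Decide whether an event is taken in monitor mode or in its usual
   exception mode (irq, fiq, abt) of the current world. */
static tz_target route_event(const tz_routing *cfg, tz_event e)
{
    switch (e) {
    case TZ_SMC:
        return TARGET_MONITOR;   /* dedicated software entry to monitor */
    case TZ_IRQ:
        return cfg->irq_to_monitor ? TARGET_MONITOR : TARGET_USUAL_MODE;
    case TZ_FIQ:
        return cfg->fiq_to_monitor ? TARGET_MONITOR : TARGET_USUAL_MODE;
    case TZ_EXT_DABT:
    case TZ_EXT_PABT:
        return cfg->ext_abort_to_monitor ? TARGET_MONITOR : TARGET_USUAL_MODE;
    }
    return TARGET_USUAL_MODE;
}
```

A common arrangement routes FIQ to the monitor so the secure world can service it, while IRQs stay with the rich OS in the less trusted world; the sketch expresses that as `fiq_to_monitor = true`, `irq_to_monitor = false`.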