Copyright
© Attribution Non-Commercial (BY-NC)
ARM Processor Fundamentals

Minsoo Ryu
Department of Computer Science and Engineering, Hanyang University
[email protected]
Real-Time Computing and Communications Lab., Hanyang University
http://rtcc.hanyang.ac.kr

Topics Covered
ARM Processor Fundamentals
  ARM Core Dataflow Model
  Registers and Current Program Status Register
  Pipeline
  Exceptions, Interrupts, and the Vector Table
  Core Extensions
  ARM Architecture Revisions and Families


ARM Core Dataflow Model


An ARM core can be viewed as functional units connected by data buses
  The data may be an instruction or a data item
  The figure shows a Von Neumann implementation of the ARM (data items and instructions share the same bus)
  Harvard implementations of the ARM use two different buses

ARM Core Dataflow Model


The instruction decoder translates instructions
Data items are placed in the register file
  A storage bank made up of 32-bit registers
  Most instructions treat the registers as holding signed or unsigned 32-bit values
  The sign extend hardware converts signed 8-bit and 16-bit numbers into 32-bit values
ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd
  Source operands are read from the register file using the internal buses (A and B)
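
The sign extension step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a model of the actual hardware; the function name sign_extend is made up for this example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Widen a signed `bits`-wide value to a 32-bit two's complement value."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    # XOR followed by subtract gives the top bit a negative weight
    # without needing a branch.
    signed = (value ^ sign_bit) - sign_bit
    return signed & 0xFFFFFFFF        # view the result as a 32-bit register


print(hex(sign_extend(0x80, 8)))      # 0xffffff80  (-128 as a byte)
print(hex(sign_extend(0x7F, 8)))      # 0x7f        (+127 stays unchanged)
print(hex(sign_extend(0x8000, 16)))   # 0xffff8000  (-32768 as a halfword)
```

Negative inputs get their upper bits filled with ones, while non-negative inputs are simply zero-padded.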


ARM Core Dataflow Model


The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result
  Data processing instructions write the result in Rd directly to the register file
  Load and store instructions use the ALU to generate an address to be held in the address register and broadcast on the Address bus
For load and store instructions, the incrementer updates the address register before the core reads or writes the next register value from or to the next sequential memory location


Registers and Current Program Status Register


General-purpose registers hold either data or an address
  The figure shows the active registers available in user mode
  All the registers shown are 32 bits in size
  16 data registers + 2 processor status registers
  Three registers, r13, r14, and r15, are assigned to a particular task or special function (the shaded registers)


Special Purpose Registers

Register r13 is traditionally used as the stack pointer (sp) and stores the head of the stack in the current processor mode
Register r14 is called the link register (lr) and is where the core puts the return address whenever it calls a subroutine
Register r15 is the program counter (pc) and contains the address of the next instruction to be fetched by the processor


Special Purpose Registers

Depending upon the context, registers r13 and r14 can also be used as general-purpose registers, which can be particularly useful since these registers are banked during a processor mode change
  However, it is dangerous to use r13 as a general register when the processor is running any form of operating system, because operating systems often assume that r13 always points to a valid stack frame
Registers r0 to r13 are orthogonal
  Any instruction that you can apply to r0 you can equally well apply to any of the other registers
There are instructions that treat r14 and r15 in a special way

Current Program Status Register

The ARM core uses the cpsr to monitor and control internal operations
  Divided into four fields: flags, status, extension, and control
  In current designs the extension and status fields are reserved for future use


Current Program Status Register

The control field
  Processor mode
  State
  Interrupt mask bits
The flags field
  Condition flags
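
As a rough illustration, the four cpsr fields can be pulled apart with shifts and masks. The sketch below assumes the usual ARM layout, which the slides do not spell out: each field is one byte wide (flags in bits 31:24, status in 23:16, extension in 15:8, control in 7:0), and within the control field I is bit 7, F is bit 6, T is bit 5, and the mode occupies bits 4:0. The function names are hypothetical.

```python
def split_cpsr(cpsr: int) -> dict:
    """Split a 32-bit cpsr value into its four byte-wide fields."""
    return {
        "flags": (cpsr >> 24) & 0xFF,      # condition flags live here
        "status": (cpsr >> 16) & 0xFF,     # reserved in the designs discussed
        "extension": (cpsr >> 8) & 0xFF,   # reserved in the designs discussed
        "control": cpsr & 0xFF,            # mode, state, interrupt masks
    }


def decode_control(control: int) -> dict:
    """Decode the control field (assumed layout: I=7, F=6, T=5, mode=4:0)."""
    return {
        "irq_masked": bool(control & 0x80),
        "fiq_masked": bool(control & 0x40),
        "thumb_state": bool(control & 0x20),
        "mode_bits": control & 0x1F,
    }


fields = split_cpsr(0x600000D3)
print(fields)
print(decode_control(fields["control"]))
```

For the sample value 0x600000D3 the control field is 0xD3: both interrupt masks set, ARM state, and mode bits 10011.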


Processor Modes
The processor mode determines which registers are active and the access rights to the cpsr register itself
  A privileged mode allows full read-write access to the cpsr
  A nonprivileged mode only allows read access to the control field in the cpsr, but still allows full read-write access to the condition flags
Seven processor modes
  Six privileged modes: abort, fast interrupt request, interrupt request, supervisor, system, and undefined
  One nonprivileged mode: user

Processor Modes
Abort mode
  Entered when there is a failed attempt to access memory
Fast interrupt request and interrupt request modes
  Correspond to the two interrupt levels
Supervisor mode
  The mode the processor is in after reset (when power is applied), and generally the mode that an operating system kernel operates in
System mode
  Special version of user mode that allows full read-write access to the cpsr
Undefined mode
  Entered when the processor encounters an instruction that is undefined or not supported by the implementation
User mode
  Used for programs and applications

Processor Modes
Mode                    Abbreviation  Privileged  Mode bits [4:0]
Abort                   abt           Yes         10111
Fast interrupt request  fiq           Yes         10001
Interrupt request       irq           Yes         10010
Supervisor              svc           Yes         10011
System                  sys           Yes         11111
Undefined               und           Yes         11011
User                    usr           No          10000
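
The mode table above maps directly onto a lookup keyed by cpsr bits [4:0]. The following is a small sketch; the names MODES and decode_mode are made up for illustration.

```python
# Mode bits [4:0] -> (abbreviation, privileged), copied from the table.
MODES = {
    0b10111: ("abt", True),
    0b10001: ("fiq", True),
    0b10010: ("irq", True),
    0b10011: ("svc", True),
    0b11111: ("sys", True),
    0b11011: ("und", True),
    0b10000: ("usr", False),
}


def decode_mode(cpsr: int) -> tuple:
    """Return (abbreviation, privileged) for the mode held in a cpsr value."""
    bits = cpsr & 0x1F
    if bits not in MODES:
        raise ValueError(f"mode bits {bits:05b} are not in the table")
    return MODES[bits]


print(decode_mode(0x600000D3))   # ('svc', True)
print(decode_mode(0x00000010))   # ('usr', False)
```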

Banked Registers
There are 37 registers in the register file
  20 registers are hidden from a program at different times
  These registers are called banked registers and are identified by the shading in the figure


Banked Registers
Banked registers are available only when the processor is in a particular mode
  Abort mode has banked registers r13_abt, r14_abt, and spsr_abt
Every processor mode except user mode can change mode by writing directly to the mode bits of the cpsr
A banked register maps one-to-one onto a user mode register
  If you change processor mode, a banked register from the new mode will replace an existing register

Banked Registers
When the processor is in the interrupt request mode, the instructions still access registers named r13 and r14
  However, these registers are the banked registers r13_irq and r14_irq
  The user mode registers r13 and r14 are not affected by the instruction referencing these registers
  A program still has normal access to the other registers r0 to r12
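
A toy model may make the banking behavior easier to see. The sketch below is an assumption-laden simplification: it models only the irq and abt banks of r13 and r14 named in the slides, ignores the spsr, and uses a made-up class name RegisterFile.

```python
# Only the banks named in the slides are modeled here.
BANKED = {"irq": ("r13", "r14"), "abt": ("r13", "r14")}


class RegisterFile:
    def __init__(self):
        self.mode = "usr"
        self.user = {f"r{i}": 0 for i in range(16)}
        self.banks = {mode: dict.fromkeys(names, 0)
                      for mode, names in BANKED.items()}

    def _bank(self, name):
        # Pick the banked copy if the current mode has one, else the user copy.
        bank = self.banks.get(self.mode, {})
        return bank if name in bank else self.user

    def read(self, name):
        return self._bank(name)[name]

    def write(self, name, value):
        self._bank(name)[name] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write("r13", 0x8000)      # user stack pointer
rf.mode = "irq"
rf.write("r13", 0x4000)      # lands in r13_irq, not the user r13
rf.write("r0", 7)            # r0 is not banked in irq mode
rf.mode = "usr"
print(hex(rf.read("r13")), rf.read("r0"))   # 0x8000 7
```

Writing r13 while in interrupt request mode leaves the user mode r13 untouched, whereas r0 is shared by both modes.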


Mode Change
Two ways of mode change
  By a program that writes directly to the cpsr
  By hardware when the core responds to an exception or interrupt
The following exceptions and interrupts cause a mode change
  Reset, interrupt request, fast interrupt request, software interrupt, data abort, prefetch abort, and undefined instruction
  Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a specific location


Mode Change from User to Interrupt Request


The saved program status register (spsr) appears in interrupt request mode
  The cpsr is copied into spsr_irq
  To return to user mode, a special instruction is used that instructs the core to restore the original cpsr from the spsr_irq and bank in the user registers r13 and r14
Note that the spsr can only be modified and read in a privileged mode
  There is no spsr available in user mode
Note that the cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr
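
The difference between a hardware mode change and a direct write to the cpsr can be sketched as follows. This is a simplified model, not the core's actual logic: the class and method names are hypothetical, and setting the I bit on IRQ entry is standard ARM behavior that the slides do not state.

```python
MODE_MASK = 0x1F
USR, IRQ = 0b10000, 0b10010
I_BIT = 1 << 7   # assumption: IRQ entry masks further IRQs


class Core:
    def __init__(self, cpsr=USR):
        self.cpsr = cpsr
        self.spsr = {}            # keyed by mode bits; user mode has no spsr

    def mode(self):
        return self.cpsr & MODE_MASK

    def take_irq(self):
        """Hardware mode change: the old cpsr is saved into spsr_irq."""
        saved = self.cpsr
        self.cpsr = (saved & ~MODE_MASK) | IRQ | I_BIT
        self.spsr[IRQ] = saved

    def write_mode_bits(self, mode_bits):
        """Software mode change: the spsr is NOT updated."""
        if self.mode() == USR:
            raise PermissionError("user mode cannot write the control field")
        self.cpsr = (self.cpsr & ~MODE_MASK) | mode_bits

    def return_from_exception(self):
        """Restore the cpsr from the spsr of the current mode."""
        self.cpsr = self.spsr[self.mode()]


core = Core(0x60000010)          # user mode with Z and C set
core.take_irq()
print(hex(core.cpsr), hex(core.spsr[IRQ]))   # 0x60000092 0x60000010
core.return_from_exception()
print(hex(core.cpsr))                        # 0x60000010
```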

States and Instruction Sets


The state of the core determines which instruction set is being executed (three instruction sets)
  ARM: active in ARM state
  Thumb: active in Thumb state
  Jazelle: active in Jazelle state
The Jazelle J and Thumb T bits in the cpsr reflect the state of the processor
  When both J and T bits are 0, the processor is in ARM state and executes ARM instructions


Jazelle: ARM Architecture Extensions for Java

ARM has introduced a set of extensions to the ARM architecture that allow an ARM processor to directly execute Java bytecodes alongside existing operating systems, middleware, and application code
To execute Java bytecodes, you require the Jazelle technology plus a specially modified version of the Java virtual machine
  It is important to note that the hardware portion of Jazelle only supports a subset of the Java bytecodes
  The rest are emulated in software


Jazelle: ARM Architecture Extensions for Java

There is a single new ARM instruction, BXJ Rm, for entering Java state
  This first performs a test on one of the condition codes
  If the condition is met, it then stores the current PC, puts the processor into Java state, branches to a target address specified in Rm, and begins executing Java bytecodes
Interrupts are handled as normal, and cause an immediate return from Java state to ARM state to run the interrupt handler
  At the end of the interrupt routine, the normal return mechanism will return the processor to Java state


States and Instruction Set Features

                   ARM        Thumb      Jazelle
Instruction size   32-bit     16-bit     8-bit
Core instructions  58         30         Over 60% of the Java bytecodes in hardware; the rest in software
cpsr               T=0, J=0   T=1, J=0   T=0, J=1
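
The cpsr row of the table above can be read as a small decision function. In this sketch the bit positions (T at bit 5, J at bit 24) are the usual ARM layout rather than something the slides state, and the function name core_state is made up. The combination T=1, J=1 is not listed in the table, so the sketch rejects it.

```python
T_BIT = 1 << 5    # assumption: Thumb bit position
J_BIT = 1 << 24   # assumption: Jazelle bit position


def core_state(cpsr: int) -> str:
    """Return the instruction-set state implied by the T and J bits."""
    t = bool(cpsr & T_BIT)
    j = bool(cpsr & J_BIT)
    if not t and not j:
        return "ARM"
    if t and not j:
        return "Thumb"
    if j and not t:
        return "Jazelle"
    raise ValueError("T=1, J=1 is not covered by the table")


print(core_state(0x00000010))          # ARM
print(core_state(0x00000030))          # Thumb
print(core_state(0x01000010))          # Jazelle
```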


Interrupt Masks
Interrupt masks are used to stop specific interrupt requests from interrupting the processor
  Two interrupt levels: interrupt request (IRQ) and fast interrupt request (FIQ)
  The I bit in the cpsr masks IRQ when set to binary 1
  The F bit in the cpsr masks FIQ when set to binary 1
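
A minimal sketch of how the two mask bits gate pending requests is given below. It assumes the usual bit positions (I at bit 7, F at bit 6) and that a pending FIQ is served before a pending IRQ; neither detail is stated in the slides, and the function name is hypothetical.

```python
I_BIT = 1 << 7   # assumption: IRQ mask bit position
F_BIT = 1 << 6   # assumption: FIQ mask bit position


def accepted_interrupt(cpsr: int, irq_pending: bool, fiq_pending: bool):
    """Return which request, if any, is allowed to interrupt the core."""
    if fiq_pending and not (cpsr & F_BIT):
        return "FIQ"              # assumed to take priority over IRQ
    if irq_pending and not (cpsr & I_BIT):
        return "IRQ"
    return None                   # everything pending is masked


print(accepted_interrupt(0x10, True, True))            # FIQ
print(accepted_interrupt(0x10 | F_BIT, True, True))    # IRQ
print(accepted_interrupt(0xD3, True, True))            # None
```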


Condition Flags
Condition flags are updated by comparisons and by the result of ALU operations that specify the S instruction suffix
  If a SUBS subtract instruction results in a register value of zero, then the Z flag in the cpsr is set
Condition flags
  N: Negative result from ALU
  Z: Zero result from ALU
  C: ALU operation carried out
  V: ALU operation overflowed
  Q: Overflow and saturation (ARMv5TEJ only)
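
To make the flag definitions concrete, here is a sketch of how a 32-bit SUBS could compute N, Z, C, and V. It follows the usual ARM convention that C is set when the subtraction does not borrow, which the slides do not mention; the function name subs is only illustrative and the Q flag is left out.

```python
MASK32 = 0xFFFFFFFF


def subs(rn: int, rm: int):
    """Model SUBS Rd, Rn, Rm: return (result, flags) for 32-bit operands."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    flags = {
        "N": result >> 31,                          # sign bit of the result
        "Z": int(result == 0),
        "C": int(rn >= rm),                         # set means no borrow
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the first operand's sign.
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,
    }
    return result, flags


print(subs(5, 5))             # (0, {'N': 0, 'Z': 1, 'C': 1, 'V': 0})
print(subs(0, 1))             # (4294967295, {'N': 1, 'Z': 0, 'C': 0, 'V': 0})
print(subs(0x80000000, 1))    # (2147483647, {'N': 0, 'Z': 0, 'C': 1, 'V': 1})
```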

Conditional Execution
Conditional execution controls whether or not the core will execute an instruction
  The condition attribute is postfixed to the instruction mnemonic, which is encoded into the instruction
  Prior to execution, the processor compares the condition attribute with the condition flags in the cpsr
  If they match, then the instruction is executed; otherwise the instruction is ignored


Conditional Execution
Mnemonic  Name                                 Condition flags
EQ        Equal                                Z
NE        Not equal                            z
CS/HS     Carry set / unsigned higher or same  C
CC/LO     Carry clear / unsigned lower         c
MI        Minus / negative                     N
PL        Plus / positive or zero              n
VS        Overflow                             V
(An uppercase letter means the flag is set; a lowercase letter means the flag is clear)

Conditional Execution
Mnemonic  Name                          Condition flags
HI        Unsigned higher               zC
LS        Unsigned lower or same        Z or c
GE        Signed greater than or equal  NV or nv
LT        Signed less than              Nv or nV
GT        Signed greater than           NzV or nzv
LE        Signed less than or equal     Z or Nv or nV
AL        Always (unconditional)        Ignored
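
The two condition tables translate into one boolean expression per mnemonic. The sketch below uses a hypothetical function condition_passed and also includes VC (overflow clear), the standard complement of VS, which does not appear in the tables above.

```python
def condition_passed(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Decide whether an instruction with condition `cond` would execute."""
    checks = {
        "EQ": z,            "NE": not z,
        "CS": c, "HS": c,   "CC": not c, "LO": not c,
        "MI": n,            "PL": not n,
        "VS": v,            "VC": not v,   # VC is not in the slide tables
        "HI": c and not z,  "LS": z or not c,
        "GE": n == v,       "LT": n != v,
        "GT": (not z) and n == v,
        "LE": z or n != v,
        "AL": True,
    }
    return bool(checks[cond])


# Flags as left by comparing 3 with 5 (3 - 5): N=1, Z=0, C=0, V=0.
flags = dict(n=True, z=False, c=False, v=False)
for cond in ("LT", "GE", "LO", "HI", "NE", "AL"):
    print(cond, condition_passed(cond, **flags))
```

With these flags, LT, LO, NE, and AL pass while GE and HI fail, matching the signed and unsigned reading of 3 < 5.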

Pipeline
The mechanism a RISC processor uses to execute instructions in parallel
  [Figures: pipeline stages of the ARM7, ARM9, and ARM10]
As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to attain a higher operating frequency
  This in turn increases the performance
  This also increases the latency

Pipeline
The ARM9 adds a memory and writeback stage
  1.1 Dhrystone MIPS per MHz
  Increase in instruction throughput by around 13% compared with an ARM7
The ARM10 adds an issue stage
  1.3 Dhrystone MIPS per MHz
  34% more throughput than an ARM7
ARM9 and ARM10 use the same pipeline executing characteristics as an ARM7
  Code written for the ARM7 will execute on an ARM9 or ARM10
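
The two percentages can be checked with simple arithmetic. The slides do not give the ARM7 baseline; the sketch below assumes the commonly quoted figure of 0.97 Dhrystone MIPS per MHz for the ARM7, so treat that constant as an assumption rather than a value from this deck.

```python
ARM7_DMIPS_PER_MHZ = 0.97   # assumption: baseline not stated in the slides


def throughput_gain_percent(dmips_per_mhz: float) -> float:
    """Per-MHz throughput gain over the assumed ARM7 baseline, in percent."""
    return (dmips_per_mhz / ARM7_DMIPS_PER_MHZ - 1.0) * 100.0


print(round(throughput_gain_percent(1.1)))   # 13  (ARM9)
print(round(throughput_gain_percent(1.3)))   # 34  (ARM10)
```

Under that assumption the computed gains round to the 13% and 34% quoted above.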

Exceptions, Interrupts, and the Vector Table

When an exception or interrupt occurs, the processor sets the pc to a specific memory address
  The address is within a special address range called the vector table
  The entries in the vector table are instructions that branch to specific routines designed to handle a particular exception or interrupt
The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words
  On some processors the vector table can be optionally located at a higher address in memory (starting at the offset 0xffff0000)
  Operating systems such as Linux and Microsoft's embedded products can take advantage of this feature

Exception Vectors
Reset: when power is applied
Undefined instruction: when the processor cannot decode an instruction
Software interrupt: when the processor meets an SWI instruction
Prefetch abort: when the processor attempts to fetch an instruction from an address without the correct access permissions
Data abort: when an instruction attempts to access data memory without the correct access permissions
Interrupt request (IRQ): when external hardware interrupts the normal execution flow of the processor
Fast interrupt request (FIQ): when hardware requiring faster response times interrupts the normal execution flow of the processor

Exception Vector Table

Exception               Shorthand  Vector address  High address
Reset                   RESET      0x00000000      0xffff0000
Undefined instruction   UNDEF      0x00000004      0xffff0004
Software interrupt      SWI        0x00000008      0xffff0008
Prefetch abort          PABT       0x0000000c      0xffff000c
Data abort              DABT       0x00000010      0xffff0010
Reserved                -          0x00000014      0xffff0014
Interrupt request       IRQ        0x00000018      0xffff0018
Fast interrupt request  FIQ        0x0000001c      0xffff001c
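
Every entry in the vector table is one 32-bit word, so each vector address is the table base plus four times the entry's index. A short sketch, using a hypothetical helper vector_address and the shorthand names from the table ("RESERVED" stands in for the unnamed entry):

```python
VECTORS = ["RESET", "UNDEF", "SWI", "PABT", "DABT", "RESERVED", "IRQ", "FIQ"]


def vector_address(shorthand: str, high_vectors: bool = False) -> int:
    """Address the pc is set to for the given exception."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + 4 * VECTORS.index(shorthand)   # one 32-bit word per entry


print(hex(vector_address("SWI")))                      # 0x8
print(hex(vector_address("IRQ")))                      # 0x18
print(hex(vector_address("FIQ", high_vectors=True)))   # 0xffff001c
```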

Core Extensions
There are some hardware extensions that are standard components placed next to the ARM core
  Cache and tightly coupled memory
  Memory management unit
  Coprocessors


Cache and Tightly Coupled Memory

The cache is a block of memory placed between main memory and the core
  With a cache the processor core can run for the majority of the time without having to wait for data from slow external memory
  Most ARM-based embedded systems use a single-level cache internal to the processor
ARM has two forms of cache
  The first is found attached to the Von Neumann-style (Princeton) cores
    It combines both data and instruction into a single unified cache
  The second is attached to the Harvard-style cores
    It has separate caches for data and instruction

A Simplified Von Neumann Architecture with Cache

The logic and control is the glue logic that connects the memory system to the AMBA bus

A Simplified Harvard Architecture with TCM


ARM1136JF-S Processor Block Diagram


Tightly Coupled Memory (TCM)

A cache provides an overall increase in performance but at the expense of predictable execution
But for real-time systems, it is paramount that code execution is deterministic
  The time taken for loading and storing instructions and data must be predictable
  This is achieved using a form of memory called TCM
  TCM is fast SRAM located close to the core and guarantees the clock cycles required to fetch instructions or data
  TCMs appear as memory in the address map and can be accessed as fast memory


Memory Management Unit

Three types of memory management hardware
  Non-protected memory
    Small embedded systems that require no protection from rogue applications
  MPU (Memory Protection Unit)
    Simple systems that use a limited number of memory regions
    The memory regions are controlled with a set of coprocessor registers, and each region is defined with specific access permissions
  MMU (Memory Management Unit)
    Uses a set of translation tables to support a virtual-to-physical address map
    More sophisticated platform operating systems that support multitasking

Coprocessors
A coprocessor extends the processing features of a core by extending the instruction set or by providing configuration registers
  More than one coprocessor can be added to the ARM core via the coprocessor interface


Coprocessors
The coprocessor can extend the instruction set by providing a specialized group of new instructions
  Vector floating-point (VFP) operations can be added
  These new instructions are processed in the decode stage
  If the decode stage sees a coprocessor instruction, then it offers it to the relevant coprocessor
  But if the coprocessor is not present or doesn't recognize the instruction, the ARM takes an undefined instruction exception
The coprocessor can also be accessed through configuration registers
  Coprocessor 15 registers can be used to control cache, TCMs, and memory management
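
The "offer it to the relevant coprocessor, otherwise take an undefined instruction exception" rule can be sketched as a lookup. Everything here is illustrative: the function offer, the exception class, and the operation names such as "cache_control" are hypothetical placeholders, not real coprocessor opcodes.

```python
class UndefinedInstruction(Exception):
    """Stands in for the ARM undefined instruction exception."""


def offer(cp_num: int, operation: str, coprocessors: dict) -> str:
    """Offer a decoded coprocessor instruction to coprocessor `cp_num`."""
    supported = coprocessors.get(cp_num)
    # Coprocessor absent, or present but it does not recognize the operation.
    if supported is None or operation not in supported:
        raise UndefinedInstruction(f"p{cp_num}: {operation}")
    return f"p{cp_num} executes {operation}"


# Hypothetical system with only coprocessor 15 attached.
attached = {15: {"cache_control", "tcm_control"}}

print(offer(15, "cache_control", attached))
try:
    offer(10, "vector_add", attached)
except UndefinedInstruction as exc:
    print("undefined instruction:", exc)
```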

Architecture Revisions and Families


The ISA has evolved to keep up with the demands of the embedded market
This evolution has been carefully managed by ARM, so that code written to execute on an earlier architecture will also execute on a later revision of the architecture


Nomenclature


Nomenclature
All ARM cores after the ARM7TDMI include the TDMI features even though they may not include those letters
The processor family is a group of processor implementations that share the same characteristics
  The ARM7TDMI, ARM740T, and ARM720T all share the same characteristics and belong to the ARM7 family
JTAG is described by IEEE 1149.1 Standard Test Access Port and boundary scan architecture
  It is a serial protocol used by ARM to send and receive debug information between the core and test equipment

Nomenclature
EmbeddedICE macrocell is the debug hardware built into the processor that allows breakpoints and watchpoints to be set
Synthesizable means that the processor core is supplied as source code that can be compiled into a form easily used by EDA tools
  Also known as soft cores, which are delivered as HDL or a gate netlist
  Can be used as building blocks within ASIC chip designs or FPGA logic designs
  Soft cores follow the SPR design flow (synthesis, placement, and route)

Architecture Evolution


ARMv5 to ARMv8


ARM Architecture and Family


ARM Processor Families


ARM families
  ARM7, ARM9, ARM10, ARM11, and Cortex cores
  The postfix numbers indicate different core designs
  ARM8 was developed but was soon superseded


ARM Processor Variants


ARM7 Family
The ARM7TDMI was the first of a new range of processors introduced in 1995
  Licensed by many of the top semiconductor companies around the world
Characteristics
  Good performance-to-power ratio
  The first core that introduced the Thumb instruction set


ARM9 Family
ARM9 family was announced in 1997
  The memory system has been redesigned to follow the Harvard architecture, which separates the data (D) and instruction (I) buses
ARM920T was the first processor in the ARM9 family
  A separate 16K/16K D + I cache and an MMU
ARM946E-S and ARM966E-S execute v5TE instructions and support ETM (embedded trace macrocell)
ARM926EJ-S was designed for small portable Java-enabled devices such as 3G phones
  The first to include the Jazelle technology

ARM10 Family
The ARM10, announced in 1999, was designed for performance
  6-stage pipeline and optional VFP (vector floating point) unit, which adds a seventh stage to the ARM10 pipeline
  VFP increases floating-point performance and is compliant with the IEEE 754-1985 floating-point standard


ARM11 Family
The ARM1136J-S was designed for high-performance and power-efficient applications
  The first processor implementation to execute architecture ARMv6 instructions
  8-stage pipeline with separate load-store and arithmetic pipelines
  Single instruction multiple data (SIMD) extensions for media processing, specifically designed to increase video processing performance


Cortex Series
Three "profiles" are defined
  "Application" profile: Cortex-A series
    Provides an entire range of solutions for devices hosting a rich OS platform and user applications
  "Real-time" profile: Cortex-R series
    Designed for high performance, dependability, and error resistance with highly deterministic behavior
  "Microcontroller" profile: Cortex-M series
    Optimized for cost- and power-sensitive MCU and mixed-signal devices
Profiles are allowed to subset the architecture
  For example, the ARMv6-M profile (used by the Cortex-M0) is a subset of the ARMv7-M profile (it supports fewer instructions)

Specialized Processors
StrongARM was originally co-developed by Digital Semiconductor
  Now exclusively licensed by Intel Corporation
  Popular for PDAs
  Harvard architecture with separate D + I caches
  5-stage pipeline
  No support for the Thumb instruction set


Specialized Processors
Intel's XScale is a follow-on product to the StrongARM
  Dramatic increase in performance
  Runs up to 1 GHz
  Harvard architecture and MMU
SC100 is at the other end of the performance spectrum
  Designed specifically for low-power security applications
  The SC100 is the first SecureCore and is based on an ARM7TDMI with an MPU
  Small, with low voltage and current requirements
  Attractive for smart card applications


Thumb Instruction Set


A compact 16-bit encoding for a subset of the ARM instruction set
  The purpose is to improve compiled code density
Processors since the ARM7TDMI have featured the Thumb instruction set, which has its own state
  The "T" in "TDMI" indicates the Thumb feature
The space saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared to the ARM instructions executed in the ARM instruction set state

Thumb-2 Instruction Set

Thumb-2 technology made its debut in the ARM1156 core, announced in 2003
  Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth, thus producing a variable-length instruction set
  A stated aim for Thumb-2 is to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory
Thumb-2 extends both the ARM and Thumb instruction sets with yet more instructions, including bit-field manipulation, table branches, and conditional execution

SIMD
SIMD (Single Instruction, Multiple Data) is a technique employed to achieve data-level parallelism, as in a vector or array processor
Example: the same value is being added to a large number of data points
  A typical case is changing the brightness of an image
  Each pixel of an image consists of three values for the brightness of the red, green, and blue portions of the color
  To change the brightness, the R, G, and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory
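
The brightness example can be sketched by packing four 8-bit channel values into one 32-bit word and adding to all of them in a single call. This is only a software illustration of the idea: the helper name packed_add_sat is made up, and clamping each lane at 255 (saturation) is an added assumption so that bright pixels do not wrap around to dark ones.

```python
def packed_add_sat(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes of two 32-bit words, clamping at 255."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        out |= min(total, 0xFF) << shift     # saturate instead of wrapping
    return out


pixels = 0x10F08020        # four channel values: 0x10, 0xF0, 0x80, 0x20
delta = 0x20202020         # the same brightness step in every lane
print(hex(packed_add_sat(pixels, delta)))    # 0x30ffa040
```

A scalar loop would perform the same four additions one after another; a SIMD instruction applies them across all lanes of the register at once.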


The ARM DSP Extensions and SIMD


The ARM DSP instruction set extensions increase DSP processing capability
  Optimized for a broad range of software applications including servo motor control, Voice over IP (VoIP), and video and audio codecs
Features
  Single-cycle 16x16 and 32x16 MAC implementations
  New instructions to load and store pairs of registers, with enhanced addressing modes
  New CLZ instruction improves normalization in arithmetic operations and improves divide performance
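
To show why a count-leading-zeros operation helps with normalization, here is a small software model. The function names clz and normalize are illustrative; the model assumes 32-bit operands and returns 32 for an input of zero.

```python
def clz(x: int) -> int:
    """Count leading zero bits in a 32-bit value (32 when x is zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def normalize(x: int):
    """Shift x left until its top bit is set; return (value, shift)."""
    shift = clz(x)
    return (x << shift) & 0xFFFFFFFF, shift


print(clz(1), clz(0x80000000), clz(0))   # 31 0 32
value, shift = normalize(0x00012345)
print(hex(value), shift)                 # 0x91a28000 15
```

Without such an instruction, normalization needs a loop that shifts and tests one bit at a time; with it, the shift amount is known in one step.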


NEON
The Advanced SIMD extension
  A combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications
  At least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD
NEON is included in all Cortex-A8 devices but is optional in Cortex-A9 devices


NEON
NEON instructions perform "Packed SIMD" processing:
  Registers are considered as vectors of elements of the same data type
  Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single-precision floating point
  Instructions perform the same operation in all lanes
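
The "same operation in all lanes" idea can be modeled by treating a 64-bit integer as a vector whose lane width is chosen per instruction. This is a plain Python sketch with made-up helper names (lanes, pack, lane_add); it uses wrapping integer addition and does not model real NEON instructions or floating-point lanes.

```python
def lanes(reg: int, lane_bits: int, reg_bits: int = 64) -> list:
    """Split a register value into lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(reg >> s) & mask for s in range(0, reg_bits, lane_bits)]


def pack(values: list, lane_bits: int) -> int:
    """Pack lane values (lowest lane first) back into one register value."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return reg


def lane_add(a: int, b: int, lane_bits: int, reg_bits: int = 64) -> int:
    """Add two registers lane by lane; each lane wraps independently."""
    sums = [x + y for x, y in zip(lanes(a, lane_bits, reg_bits),
                                  lanes(b, lane_bits, reg_bits))]
    return pack(sums, lane_bits)


a = 0x00FF000100020003
b = 0x0001000100010001
print(hex(lane_add(a, b, 16)))   # 0x100000200030004 (four 16-bit lanes)
print(hex(lane_add(a, b, 8)))    # 0x200030004       (eight 8-bit lanes)
```

The same bit patterns give different results for 16-bit and 8-bit lanes because a carry never crosses a lane boundary.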


TrustZone
The Security Extensions
  Provide two virtual processors backed by hardware-based access control
  Enable the application core to switch between two states, referred to as worlds, in order to prevent information from leaking from the more trusted world to the less trusted world
  Each world can operate independently of the other while using the same core
Typical applications are to run a rich operating system in the less trusted world, and smaller security-specialized code in the more trusted world
  The specific implementation details of TrustZone are proprietary and have not been publicly disclosed for review

TrustZone
Modes in an ARM for the Security Extensions

The entry to monitor mode can be triggered by software executing a dedicated Secure Monitor Call (SMC) instruction, or by a subset of the hardware exceptions
  The IRQ, FIQ, external Data Abort, and external Prefetch Abort exceptions can all be configured to cause the processor to switch into monitor mode