Unit-11 ARM Processor Fundamentals_Technical
Syllabus
Registers, Current Program Status Register, Pipeline, Exceptions, Interrupts and Vector Table,
Core Extensions, Architecture Revisions, ARM Processor Families
Contents
14.1 Introduction (Summer-16, 19 (CSE); Winter-17, 19 (CSE); Marks 7)
14.2 Registers
14.3 Current Program Status Register
14.4 Pipeline
14.5 Exceptions, Interrupts and Vector Table
14.6 Core Extensions
14.7 Architecture Revisions
14.8 ARM Processor Families
Microprocessors and Microcontrollers 14-2 ARM Processor Fundamentals
ARM cores are very small. Typically they occupy a few square millimeters of chip
area.
Fig. 14.1.1 ARM core dataflow model
(Blocks shown: data bus, instruction decoder, sign extend, register file including pc (r15) with Rn and Rm read onto the A and B buses and Rd written from the result bus, barrel shifter, MAC, ALU, address register, incrementer and address bus.)
Since the ARM processor is basically a RISC processor, it uses a load-store
architecture. This means it has two instruction types, load and store, for
transferring data in and out of the processor respectively.
LOAD : This instruction copies data from memory to registers in the processor
core.
STORE : This instruction copies data from registers in the processor core to
memory.
The ARM processor instruction set does not include instructions that directly
manipulate data in memory. Data processing is carried out only in registers.
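The load-store idea can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `memory` dictionary, the `regs` list and the helper names `ldr`, `str_` and `add` are hypothetical stand-ins chosen for illustration and are not part of any ARM tool.

```python
# Toy model of a load-store machine (illustrative only, not an ARM simulator).
MASK32 = 0xFFFFFFFF

memory = {0x100: 7, 0x104: 5, 0x108: 0}   # hypothetical word-addressed memory
regs = [0] * 16                            # r0..r15

def ldr(rd, addr):
    """LOAD: copy a word from memory into register rd."""
    regs[rd] = memory[addr] & MASK32

def str_(rd, addr):
    """STORE: copy register rd out to memory."""
    memory[addr] = regs[rd] & MASK32

def add(rd, rn, rm):
    """Data processing reads and writes registers only, never memory."""
    regs[rd] = (regs[rn] + regs[rm]) & MASK32

# Adding two values that live in memory therefore takes four steps:
ldr(0, 0x100)      # r0 <- memory[0x100]
ldr(1, 0x104)      # r1 <- memory[0x104]
add(2, 0, 1)       # r2 <- r0 + r1
str_(2, 0x108)     # memory[0x108] <- r2
print(memory[0x108])
```

The point of the sketch is that `add` has no way to name a memory address; only `ldr` and `str_` touch `memory`.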
Data bus
The data enters the ARM core through the data bus. The data is either in the form
of an instruction opcode or a data item.
Since this model uses the Von Neumann architecture, data items and instructions share the
same bus. This is in contrast with the Harvard architecture, which uses two different
buses, one for instructions and one for data.
Instruction decoder
This unit decodes the instruction opcode read from the memory and then the
instruction is executed.
Register file
This is a bank of 32-bit registers used for storing data items.
Sign extend
The ARM core is a 32-bit processor, so most instructions of the ARM processor
treat registers as holding signed or unsigned 32-bit values.
When the processor reads signed 8-bit or 16-bit numbers from memory, the sign
extend hardware converts these numbers to 32-bit values and then places them in
the register file.
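What the sign extend hardware does can be sketched in a few lines of Python. The function names `sign_extend` and `zero_extend` are assumptions made for this illustration; the code only models the bit pattern that ends up in a 32-bit register.

```python
def sign_extend(value, bits):
    """Return the 32-bit register pattern for a signed value that is `bits` wide."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):            # sign bit of the narrow value is set
        value |= 0xFFFFFFFF & ~low_mask      # copy the sign bit into the upper bits
    return value

def zero_extend(value, bits):
    """Return the 32-bit register pattern for an unsigned value that is `bits` wide."""
    return value & ((1 << bits) - 1)

print(hex(sign_extend(0x80, 8)))     # signed byte -128
print(hex(zero_extend(0x80, 8)))     # unsigned byte 128
print(hex(sign_extend(0x8000, 16)))  # signed halfword -32768
```

A negative byte such as 0x80 fills the upper 24 bits with ones, while a non-negative byte such as 0x7F is left unchanged.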
ALU (Arithmetic Logic Unit) and MAC (Multiply-Accumulate Unit)
Most ARM instructions are two-operand instructions. The two source
registers Rn and Rm hold these operands, which are read from the register file
over the internal A and B buses respectively.
The ALU or MAC takes the operand values of Rn and Rm from the A and
B buses, performs the operation and writes the computed result over the
internal result bus to the destination register Rd in the register file.
The load and store instructions use the ALU to generate an address, which is held in the
address register.
Address register
This holds the address generated by the load and store instructions and places it
on the address bus.
Barrel shifter
The contents of the Rm register can optionally be preprocessed in the barrel
shifter before being applied as an input to the ALU.
A wide range of expressions and addresses can be calculated using the barrel
shifter and ALU together.
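The following Python sketch models the four basic shift operations of a 32-bit barrel shifter and shows how a shifted Rm combines with Rn in the ALU. The function names and the restriction to shift amounts 0 to 31 are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left by n (0..31)."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right by n (0..31); zeros enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right by n (0..31); the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                 # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right by n; bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def add_shifted(rn, rm, shift_amount):
    """Model of an ADD whose second operand is Rm shifted left first."""
    return (rn + lsl(rm, shift_amount)) & MASK32

# Multiply by 5 without a multiplier: x + (x << 2)
print(add_shifted(3, 3, 2))
# Address of element 4 in a word array at the hypothetical base 0x1000
print(hex(add_shifted(0x1000, 4, 2)))
```

Because the shift is applied on the way into the ALU, a scaled array index or a multiplication by a small constant needs only one data processing step.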
Incrementer
For load and store instructions, the incrementer updates the contents of the
address register before the processor core reads or writes the next register value
from or to the consecutive memory location.
The processor core continues executing instructions until an exception
or interrupt changes the normal execution flow.
14.1.2.2 Data Types
ARM processors support the following data types :
Byte : 8 bits.
Halfword : 16 bits (halfwords must be aligned to two-byte boundaries).
Word : 32 bits (words must be aligned to four-byte boundaries).
Note
All three types are supported in ARM architecture version 4 and above. Only
bytes and words were supported prior to ARM architecture version 4.
When any of these types is described as unsigned, the N-bit data value
represents a non-negative integer in the range $0$ to $+2^{N}-1$, using normal binary
format.
When any of these types is described as signed, the N-bit data value represents
an integer in the range $-2^{N-1}$ to $+2^{N-1}-1$, using two's complement format.
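These ranges can be checked numerically. The sketch below uses hypothetical helper names (`unsigned_range`, `signed_range`, `as_signed`) and simply evaluates the two formulas for a given width N.

```python
def unsigned_range(n):
    """Range of an unsigned n-bit value: 0 to 2^n - 1."""
    return 0, (1 << n) - 1

def signed_range(n):
    """Range of a signed n-bit value: -2^(n-1) to 2^(n-1) - 1."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1

def as_signed(value, n):
    """Interpret an n-bit pattern as a two's complement integer."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value

for name, width in (("byte", 8), ("halfword", 16), ("word", 32)):
    print(name, unsigned_range(width), signed_range(width))
```

The same bit pattern therefore has two readings: 0xFF is 255 as an unsigned byte and -1 as a signed byte.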
All data operations, for example ADD, are performed on word quantities.
Load and store operations can transfer bytes, halfwords and words to and from
memory, automatically zero-extending or sign-extending bytes or halfwords as
they are loaded.
ARM instructions are exactly one word (and are aligned on a four-byte
boundary). Thumb instructions are exactly one halfword (and are aligned on a
two-byte boundary).
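The alignment rules reduce to a divisibility test on the address. The following is a minimal sketch; the table `ALIGNMENT` and the function `is_aligned` are names invented for this example.

```python
# Required alignment in bytes for each data type and instruction size.
ALIGNMENT = {
    "byte": 1,
    "halfword": 2,
    "word": 4,
    "thumb_instruction": 2,   # one halfword
    "arm_instruction": 4,     # one word
}

def is_aligned(address, kind):
    """True if `address` is a legal location for an item of type `kind`."""
    return address % ALIGNMENT[kind] == 0

print(is_aligned(0x1002, "halfword"))         # address is a multiple of 2
print(is_aligned(0x1002, "word"))             # address is not a multiple of 4
print(is_aligned(0x1002, "arm_instruction"))
```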
TECHNICAL PUBLICATIONs - An up thrust for knowledge
Review Questions
2. FIQ (Fast Interrupt reQuest): This mode supports high speed interrupt
handling.
3. IRQ (Interrupt ReQuest) : This mode supports all other interrupt sources in a
system.
4. Abort : If an instruction or data is fetched from an invalid memory location,
an abort exception will be generated.
5. Undefined : If a fetched opcode is not an ARM instruction, an undefined
instruction exception will be generated.
6. User: This mode is used to run the application code. In the user mode we
cannot change the contents of CPSR (Current Program Status Register) and
modes can only be changed when an exception is generated. This mode is also
known as Unprivileged mode.
7. System : This mode is used for running operating system tasks. It uses the
same registers as user mode.
All the above modes, except user mode, are privileged modes.
For all operating modes, registers r0 - r7 are common. However, FIQ mode
replaces the user registers r8 to r14 with its own banked registers r8_fiq to r14_fiq. Similarly, each of the
other exception modes has its own r13 and r14 registers, so that each of these modes has
its own unique stack pointer and link register.
14.2.2 Programming Model
Fig. 14.2.1 shows the programming model of ARM processor.
Fig. 14.2.1 Programming model of ARM processor
(Register sets shown per mode: User and System, Fast interrupt request, Interrupt request, Supervisor, Undefined and Abort; registers r0 to r15 (pc) and cpsr.)
Out of 37 registers, the 20 registers shown shaded in Fig. 14.2.1 are the
banked registers. Fig. 14.2.1 also shows which banked registers are used in which
mode. Banked registers of a particular mode are denoted by r<number>_<mode>.
For example, supervisor mode has banked registers r13_svc, r14_svc and spsr_svc.
Similarly, abort mode has banked registers r13_abt, r14_abt and spsr_abt.
Registers r8 to r12 have two banked physical registers each. The first group of
physical registers is referred to as r8_usr to r12_usr and the second group as
r8_fiq to r12_fiq. The r8_usr to r12_usr group is used in all processor modes other
than FIQ mode, and the r8_fiq to r12_fiq group is used in FIQ mode.
Registers r13 and r14 have six banked physical registers each. One is used in User
and System modes, while each of the remaining five is used in one of the five
exception modes.
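The banking rules above can be written as a small lookup, which also confirms the total of 37 registers. This is a sketch only: the mode abbreviations follow the r13_svc style used in the text, and the function names are hypothetical.

```python
EXCEPTION_MODES = ["fiq", "irq", "svc", "abt", "und"]
ALL_MODES = ["usr", "sys"] + EXCEPTION_MODES

def physical_register(mode, n):
    """Name of the physical register that appears as r<n> in the given mode."""
    if mode not in ALL_MODES or not 0 <= n <= 15:
        raise ValueError("unknown mode or register number")
    if 8 <= n <= 12:                                  # two banks: FIQ and the rest
        return f"r{n}_fiq" if mode == "fiq" else f"r{n}_usr"
    if n in (13, 14):                                 # six banks: usr/sys plus five
        bank = mode if mode in EXCEPTION_MODES else "usr"
        return f"r{n}_{bank}"
    return f"r{n}"                                    # r0-r7 and r15 are shared

def all_physical_registers():
    """Every distinct physical register, including the status registers."""
    names = {physical_register(m, n) for m in ALL_MODES for n in range(16)}
    names.add("cpsr")
    names.update(f"spsr_{m}" for m in EXCEPTION_MODES)
    return names

print(physical_register("svc", 13))
print(physical_register("sys", 14))
print(len(all_physical_registers()))
```

Counting the set gives 9 shared registers (r0 to r7 and r15), 10 for r8 to r12, 12 for r13 and r14, one cpsr and five spsr registers.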
The registers r0 to r13 are orthogonal. This means that any instruction which you can
apply to r0, you can equally well apply to any of the registers r1 to r13. This is
not the case with the r14 and r15 registers.
Review Questions
Note : User mode and System mode do not have an SPSR, because they are not
exception modes. All instructions which read or write the SPSR are
Fig. 14.3.1 Format of cpsr and spsr
(Bits 31-28: condition flags N, Z, C, V; bits 27-8: undefined; bits 7 and 6: interrupt masks; bit 5: Thumb state; bits 4-0: mode.)
Control flags (Bits 0-7)
The control bits change when an exception arises and can be altered by software
only when the processor is in a privileged mode.
Bits 0-4 (Mode Select Bits) : Processor modes
These bits determine the processor mode as shown in Table 14.3.1.
System 11111
Undefined 11011
User 10000
Table 14.3.1 Processor mode
Bit 5 (Thumb State Bit) :
This bit gives the state of the core. The state of the core determines which
instruction set is being executed.
There are three instruction sets : ARM, Thumb and Jazelle. Each is active
only when the processor is in the corresponding state, that is, ARM state, Thumb state or
Jazelle state respectively.
Thumb
Thumb instructions are 16 bits wide (instead of the usual 32 bits). This allows for
greater code density in places where memory is restricted.
The Thumb set can generally address only the first eight registers (r0 - r7), and
apart from branches there is no conditional execution. Also, Thumb cannot do a number of things
required for low-level processor exceptions, so the Thumb instruction set will
always come alongside the full ARM instruction set.
Exceptions and the like can be handled in ARM code, with Thumb used for the
more regular code.
Table 14.3.2 gives the comparison of ARM and Thumb instruction set features.
Feature              ARM (cpsr T = 0)    Thumb (cpsr T = 1)
Instruction size     32-bit              16-bit
Core instructions    58                  30
The Jazelle instruction set is a closed instruction set and is not openly available.
Additional software, licensed from both ARM Limited and Sun Microsystems, is
required to use Jazelle.
Bits 6 and 7 (Interrupt Masks) :
There are two interrupts available on the ARM processor core :
Interrupt Request (IRQ) and
Fast Interrupt Request (FIQ).
These are maskable interrupts and their masking is controlled by bits 6 and 7 of
cpsr. Bit 6 (F) controls FIQ and bit 7 (I) controls IRQ.
When a mask bit is set to binary 1, the corresponding interrupt request is masked, and
when the bit is 0, the interrupt is enabled.
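The control bits can be decoded from a cpsr value with simple shifts and masks. The sketch below is illustrative: the function name `decode_control_bits` is an assumption, and the mode encodings for FIQ, IRQ, Supervisor and Abort are the standard ARM values, which do not appear in the surviving rows of Table 14.3.1.

```python
# Mode select bits, cpsr[4:0]. System, Undefined and User are taken from
# Table 14.3.1; the other four are the standard ARM encodings (assumed here).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_control_bits(cpsr):
    """Split the control field (bits 0-7) of a cpsr value into its parts."""
    return {
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
        "state": "Thumb" if (cpsr >> 5) & 1 else "ARM",   # T bit
        "fiq_masked": bool((cpsr >> 6) & 1),              # F bit
        "irq_masked": bool((cpsr >> 7) & 1),              # I bit
    }

# Low byte 0xD3 = 1101 0011: I = 1, F = 1, T = 0, mode = 10011
print(decode_control_bits(0x600000D3))
```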
Condition code flags
These flags in the cpsr can be tested by most instructions to determine whether
the instruction is to be executed.
The condition code flags are usually modified by :
Execution of a comparison instruction (CMN, CMP, TEQ or TST).
Execution of some other arithmetic, logical or move instruction, where the
destination register of the instruction is not r15. Most of these instructions have
both a flag-preserving and a flag-setting variant, with the latter being selected by
adding an S qualifier to the instruction mnemonic. Some of these instructions only
have a flag-preserving version.
Bit 27 (Saturation flag, Q)
This flag is available on ARM processor cores which include the DSP
extensions. If an overflow and/or saturation occurs in an enhanced DSP
instruction, the Q bit is set to 1. The flag is 'sticky', which means the hardware
only ever sets this flag. Software must write to the cpsr directly to clear the flag
bit.
Similarly, bit [27] of each spsr is a Q flag and is used to preserve and restore the
cpsr Q flag if an exception occurs.
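The sticky behaviour of the Q flag can be modelled as follows. This is a sketch, not a description of any particular core: `qadd` stands in for a saturating signed addition of the kind provided by the enhanced DSP instructions, and `clear_q` stands in for an explicit software write to the cpsr.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
Q_BIT = 1 << 27

def qadd(a, b, cpsr):
    """Saturating signed add. Returns (result, new_cpsr)."""
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, cpsr | Q_BIT     # clamp high and set Q
    if total < INT32_MIN:
        return INT32_MIN, cpsr | Q_BIT     # clamp low and set Q
    return total, cpsr                     # Q is left as it was: never cleared here

def clear_q(cpsr):
    """Model of software clearing Q by writing the cpsr directly."""
    return cpsr & ~Q_BIT & 0xFFFFFFFF

cpsr = 0
result, cpsr = qadd(INT32_MAX, 1, cpsr)    # saturates, Q becomes 1
result, cpsr = qadd(1, 1, cpsr)            # no saturation, Q stays 1
print(result, bool(cpsr & Q_BIT))
cpsr = clear_q(cpsr)
print(bool(cpsr & Q_BIT))
```

Because the flag stays set, a program can run a whole block of DSP arithmetic and test Q once at the end to learn whether any step saturated.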
Bit 28 (Overflow flag, V)
It is set in one of two ways :
For an addition or subtraction, V is set to 1 if signed overflow occurred,
regarding the operands and result as two's complement signed integers.
For non-addition/subtractions, V is normally left unchanged.
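The signed overflow rule for addition can be made concrete with a short model. The function `adds` below is a hypothetical stand-in for a flag-setting 32-bit addition. It computes V as described above and, for context, also derives N, Z and C using their usual definitions (sign bit of the result, result equal to zero, unsigned carry out), which are assumed here rather than taken from this section.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """32-bit flag-setting add. Returns (result, flags) with flags N, Z, C, V."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                     # sign bit of the result
        "Z": int(result == 0),                 # result is zero
        "C": int(full > MASK32),               # unsigned carry out of bit 31
        # Signed overflow: both operands have the same sign and the
        # result has the opposite sign.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags

print(adds(0x7FFFFFFF, 1))    # largest positive + 1 overflows the signed range
print(adds(0xFFFFFFFF, 1))    # -1 + 1 = 0: carry out but no signed overflow
```

The two examples show that C and V are independent: the first sets V without C, the second sets C without V.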
Review Questions
14.4 Pipeline
Fetching the next instruction while another instruction is being executed is called
pipelining. It speeds up program execution. The ARM processor uses
this pipeline mechanism.
ARM7 uses a simple three-stage pipeline as shown in the Fig. 14.4.1.
Execute : In this stage the processor processes the instruction and stores (writes)
result in a register.
By overlapping the above stages of execution of different instructions, the
speed of execution is increased. Once the pipeline is full, one instruction completes
in every cycle, which increases the throughput.
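The overlap can be tabulated with a short Python sketch. It assumes an ideal pipeline with no stalls or branches, and the names `pipeline_schedule` and `total_cycles` are invented for this example.

```python
STAGES = ["Fetch", "Decode", "Execute"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """Return {cycle: {stage: instruction number}} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s + 1                 # instruction i enters stage s here
            schedule.setdefault(cycle, {})[stage] = i + 1
    return schedule

def total_cycles(n_instructions, depth=len(STAGES)):
    """Cycles needed: depth to fill the pipeline, then one per instruction."""
    return n_instructions + depth - 1 if n_instructions else 0

for cycle, row in sorted(pipeline_schedule(4).items()):
    print(cycle, row)
print(total_cycles(6), "cycles pipelined versus", 6 * len(STAGES), "unpipelined")
```

From cycle 3 onward all three stages are busy, so each further cycle completes one instruction.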
Fig. 14.4.2 shows the 3-stage pipelined instruction execution.
Fig. 14.4.2 3-stage pipelined instruction execution
(Pipeline stages Fetch, Decode and Execute shown against time in cycles.)
Fig. 14.4.4 ARM9TDMI 5-stage pipeline organization
(Stages shown: Fetch from the I-cache, Decode with register bank read, Execute with shift and ALU, Memory access through the D-cache, and register Write.)
The 5-stage pipeline stages are :
Fetch : In this stage the processor fetches an instruction from memory and places
it in the instruction pipeline.
Decode : In this stage
1. The instruction is decoded and
1. What is pipelining ?
2. Explain the concept of pipeline used in ARM processor.
3. Explain the ARM 5-stage pipelining.
4. Draw the ARM-10 six-stage pipeline.
5. With a diagram, explain the various blocks in the 3-stage pipeline of the ARM processor organization.
6. Explain the pipeline mechanism in (Advanced RISC Machine) ARM processor.
Review Questions
(Figure: ARM core with Data TCM and Instruction TCM.)
The ARM processor core supports several different types of closely coupled
coprocessors, including floating point, SIMD, and system control and cache
maintenance.
Each coprocessor present in an ARM system has a unique 4-bit ID code.
Coprocessor instructions contain a field for the ID code of the coprocessor on which
they will execute.
One of the primary goals of the ARM coprocessor interface is not to slow down
the CPU core. Beyond checking to see if a coprocessor instruction is coded for an
existing coprocessor, the core does not spend time sorting out coprocessor
instructions within its own pipeline. The core sends all the instructions it fetches
from memory directly to all coprocessors. The coprocessor decodes all incoming
instructions, which include both ordinary ARM instructions as well as coprocessor
instructions. During the decoding stage, the coprocessor rejects any instructions
that are not recognised as its own. This includes both ARM instructions and
instructions coded for other coprocessors. The coprocessor recognises its own
instructions and adds only those to its internal execution pipeline. The coprocessor
then sends a signal back to the core indicating that it has accepted an instruction.
The coprocessor can be accessed through a group of dedicated ARM instructions
that provide a load-store type interface.
The coprocessor can also extend the instruction set by providing a specialized
group of new instructions. For example, floating-point instructions can be added to
the standard ARM instruction set by attaching a vector floating-point (VFP)
coprocessor.
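The accept or reject step described above can be sketched in Python. The class `Coprocessor` and the function `broadcast` are hypothetical. The sketch assumes the classic 32-bit ARM encoding, in which coprocessor instructions have bits 27-24 equal to 1110 or bits 27-25 equal to 110 and carry the 4-bit coprocessor ID in bits 11-8.

```python
def coprocessor_number(instr):
    """Return the 4-bit coprocessor ID, or None for a non-coprocessor instruction."""
    top = (instr >> 24) & 0xF
    if top == 0b1110 or (top >> 1) == 0b110:   # data operation/register transfer, or load/store
        return (instr >> 8) & 0xF
    return None

class Coprocessor:
    def __init__(self, cp_id):
        self.cp_id = cp_id
        self.pipeline = []            # instructions this coprocessor has accepted

    def offer(self, instr):
        """Decode an incoming instruction; accept it only if it is our own."""
        if coprocessor_number(instr) == self.cp_id:
            self.pipeline.append(instr)
            return True               # models the 'accepted' signal to the core
        return False

def broadcast(instr, coprocessors):
    """The core sends every fetched instruction to all coprocessors."""
    accepted_by = []
    for cp in coprocessors:
        if cp.offer(instr):
            accepted_by.append(cp.cp_id)
    return accepted_by

cps = [Coprocessor(10), Coprocessor(15)]
print(broadcast(0xEE110F10, cps))   # MRC p15, 0, r0, c1, c0, 0
print(broadcast(0xE0810002, cps))   # ADD r0, r1, r2: an ordinary ARM instruction
```

The core itself does no sorting here; each coprocessor filters the instruction stream by comparing the ID field with its own ID.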
Review Questions
M - Fast multiplier : Older ARM processors used a small and simple multiplier
unit. This multiplier unit required more clock cycles to complete a single
multiplication. With the introduction of the fast multiplier unit, the clock cycles
required for multiplication are significantly reduced and modern ARM processors
are capable of calculating a 32-bit product in a single cycle.
I - Embedded ICE Macrocell : ARM processors have on-chip debug hardware that
allows the processor to set breakpoints and watchpoints.
E - Enhanced instructions : ARM processors with this extension support the
extended DSP instruction set for high performance DSP applications. With these
extended DSP instructions, the DSP performance of the ARM processors can be
increased without high clock frequencies.
J - Jazelle : ARM processors with Jazelle technology can be used for accelerated
execution of Java bytecodes. Jazelle DBX, or Direct Bytecode eXecution, is used in
mobile phones and other consumer devices for high performance Java execution
without affecting memory or battery.
F - Vector floating-point unit : The floating-point architecture in ARM processors
provides execution of floating-point arithmetic operations. The dynamic range and
precision offered by the floating-point architecture in ARM processors are used in
many real-time applications in the industrial and automotive areas.
S - Synthesizable : The ARM processor core is available as source code. This
software core can be compiled into a format that can be easily understood by the
EDA tools. Using the processor source code, it is possible to modify the
architecture of the ARM processor.
The families are based on the ARM7, ARM9, and ARM11 cores. The postfix
numbers 7,9 and 11 indicate different core designs.
14.8.1 ARM7
The ARM7 family was introduced in 1994 (ARM7TDMI, ARM7EJ-S, ARM720T).
This family has been immensely successful and has established ARM as the
architecture of choice in the digital world.
Over the years, more than 10 billion ARM7 processor family based devices have
powered a variety of cost and power sensitive applications.
Due to the availability of more advanced ARM processors, the ARM7 processor
family (ARM7TDMI) is not recommended for new designs.
14.8.1.1 Features of ARM7
1. Pipeline depth : Three stage (Fetch, Decode, Execute)
2. Operating frequency : 80 MHz
3. Power consumption : 0.06 mW/MHz
4. MIPS/MHz : 0.97
5. Architecture used : Von Neumann
6. MMU/MPU : Not present
7. Cache memory : Not present
8. Jazelle instruction : Not present
9. Thumb instruction : Yes (16-bit instruction set)
10. ARM instruction set : Yes (32-bit)
11. ISA (Instruction Set Architecture) : v4T (4th version)
12. Interrupt controller : Not present
13. ISR entry : Non-deterministic ISR entry
14. Power management : No in-built power management
15. Instruction set performance v/s code size : Optimal performance and code size balance
requires interworking between ARM and Thumb code
16. Ease of application porting from one device to another : Lack of standardization
inhibits application porting.
They support the optional embedded trace macrocell (ETM), which allows a
developer to trace instruction and data execution in real time on the processor.
The ARM946E-S includes TCM (Tightly Coupled Memory), cache, and an MPU.
The sizes of the TCM and caches are configurable. It is designed for use in
embedded control applications that require deterministic real-time response.
On the other hand, the ARM966E does not have the MPU and cache extensions
but does have configurable TCMs.
ARM926EJ-S
It is a synthesizable processor core, announced in 2000.
It is designed for use in small portable Java-enabled devices such as 3G phones
and Personal Digital Assistants (PDAs).
It supports the Jazelle technology, which accelerates Java bytecode execution.
It features an MMU, configurable TCMs, and D + I caches with zero or nonzero
wait state memories.
Architecture used : Harvard. It separates the data (D) and instruction (I) buses.
MMU/MPU : Present
Cache memory : Present (separate 32 K)
ARM / Thumb instruction : Supports both
It also supports an optional Vector Floating-Point (VFP) unit, which adds a seventh
stage to the ARM10 pipeline. The VFP significantly increases floating-point
performance and is compliant with the IEEE 754-1985 floating-point standard.
ARM10 is the first ARM core to support architecture version 5T. This is a
superset of version 4T, adding BLX (branch-with-link and toggle Thumb/ARM
mode), CLZ (Count Leading Zeroes, useful for DSP operations), and BKPT
(software breakpoint).
Production ARM10 processors actually support v5TE, which adds signal
processing (saturate-on-overflow) instructions.
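The effect of CLZ can be modelled in one line of Python, and the sketch below also shows one typical DSP use, normalising a value so that its top bit is set. The function names `clz` and `normalise` are assumptions for this example.

```python
def clz(x):
    """Count leading zero bits in a 32-bit value (32 when the value is zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def normalise(x):
    """Shift x left until bit 31 is set. Returns (shifted value, shift amount)."""
    shift = clz(x)
    if shift == 32:                  # zero has no set bit to move
        return 0, 32
    return (x << shift) & 0xFFFFFFFF, shift

print(clz(1), clz(0x00010000), clz(0x80000000))
print(normalise(0x00012345))
```

Without a single instruction for this, the leading zeros would have to be found with a loop or a sequence of compare and shift steps.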
ARM1020E
The ARM1020E is the first processor to use an ARM10E core.
Like the ARM9E, it includes the enhanced E instructions.
It has separate 32K D + I caches, an optional vector floating-point unit, and an MMU.
The ARM1020E also has a dual 64-bit bus interface for increased performance.
ARM1026EJ-S
ARM1026EJ-S is very similar to the ARM926EJ-S but with both MPU and MMU.
This processor has the performance of the ARM10 with the flexibility of an
ARM926EJ-S.
Review Questions