Module3 ARM
MODULE 3
STEVE FURBER, ARM SYSTEM-ON-CHIP ARCHITECTURE (2ND
EDITION)
Contents
◻ RISC vs CISC
◻ The RISC philosophy concentrates on reducing the complexity of
instructions performed by the hardware because it is easier to
provide greater flexibility and intelligence in software rather than
hardware. As a result, a RISC design places greater demands on
the compiler.
◻ The traditional complex instruction set computer (CISC) relies
more on the hardware for instruction functionality, and
consequently the CISC instructions are more complicated.
ARM – Advanced RISC Machines
◻ First RISC microprocessor developed for commercial use (the Acorn RISC Machine)
◻ Developed by Acorn Computers Limited of Cambridge, England, between 1983 and 1985. In 1990, Advanced RISC Machines Limited (ARM Ltd.) was formed
◻ Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
◻ ARM7: GBA, iPod
◻ ARM9: NDS, PSP, Sony Ericsson, BenQ
◻ ARM11: Apple iPhone, Nokia N93, N800
◻ 75% of 32-bit embedded processors
Architectural inheritance
ARM: From Berkeley RISC
◻ Features used:
🞑 Load-store architecture: instructions that process data operate only on registers and are separate from instructions that access memory
🞑 Fixed-length 32-bit instructions
🞑 3-address instruction formats
◻ Features rejected:
🞑 Register windows: ARM exposes 16 registers rather than the 32 visible at a time in Berkeley RISC
🞑 Delayed branches: ARM does not execute the instruction following a branch in a delay slot
🞑 Single-cycle execution of all instructions
ARM Architecture
◻ Based on Berkeley RISC design.
◻ Large uniform register file.
◻ Load/store architecture.
◻ The only operations on memory are copying memory values into registers (load) and copying register values into memory (store).
◻ ARM does not support memory to memory operations
◻ Simple addressing modes
◻ Uniform and fixed-length instruction fields (32 bits)
◻ 3-address instruction formats.
◻ Load-store architecture: memory is accessed only through load and store instructions. A load instruction copies data from memory into the register file, whereas a store instruction writes data from the registers to memory. All arithmetic and logical instructions access only the register file, which keeps operand access simple and fast.
◻ Data size and instruction set: the data processing capability of an instruction basically depends on the register width. The group of registers called the register file typically holds signed or unsigned 32-bit or 64-bit data, depending on the ARM core family. The ARM sign-extend hardware converts a byte (8 bits) or half-word (16 bits, or two bytes) into a word (32 bits, or four bytes) for storage in the register file.
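The byte and half-word extension described above can be sketched in Python. This is only an illustrative model of the arithmetic, not of the hardware; the function names sign_extend and zero_extend are assumptions made for this example.

```python
WORD_MASK = 0xFFFFFFFF  # a 32-bit word


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a 32-bit word."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    # If the sign bit is set, fill the upper bits of the word with ones.
    if value & sign_bit:
        value |= WORD_MASK & ~mask
    return value


def zero_extend(value: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of `value` (unsigned data)."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    print(hex(sign_extend(0x80, 8)))     # signed byte -128 as a word
    print(hex(sign_extend(0x8000, 16)))  # signed half-word as a word
    print(hex(zero_extend(0x80, 8)))     # unsigned byte 128 as a word
```

A signed byte keeps its numeric value when widened this way, while an unsigned byte simply has its upper 24 bits cleared.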
ARM : Visible Registers
User-level programs: 15 general-purpose 32-bit registers (r0 – r14)
🞑 Program counter, PC(r15)
🞑 Current Program Status Register(CPSR)
Remaining registers
🞑 System-level programming
🞑 Handling Exceptions
ARM11 Registers
Current Program Status Register (CPSR)
◻ The ARM has seven basic operating modes: six privileged modes and one non-privileged mode (user). The privileged modes are used to service interrupts and exceptions and to access protected resources.
◻ User mode: an unprivileged mode under which most tasks run. This mode is used for executing application programs.
◻ System mode: a privileged mode for running user and system programs. Code running in this mode has all access permissions. This mode uses the same set of registers as the non-privileged user mode.
◻ Supervisor mode: the mode in which the OS kernel operates. It is entered on reset and when a Software Interrupt instruction is executed.
◻ Two of the privileged modes are allocated to interrupt handling. In general, ARM has two levels of interrupt.
◻ Fast Interrupt Request (FIQ): FIQ mode is entered when a high-priority (fast) interrupt is raised. FIQ supports channel communication for data transfer.
◻ Interrupt Request (IRQ): IRQ mode is entered when a low-priority (normal) interrupt is raised. This is a privileged mode for general-purpose interrupt handling.
◻ Abort: used to handle memory access violations. The abort mode handles data aborts and prefetch aborts.
◻ Undefined: used to handle undefined instructions that are not supported by the implementation.
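The mode descriptions above can be summarised as a small lookup table. The following Python sketch only restates which event enters which mode as listed on this slide; the dictionary keys and function names are hypothetical labels chosen for the example.

```python
# Modes described on the slide; "user" is the only non-privileged one.
PRIVILEGED_MODES = {"system", "supervisor", "fiq", "irq", "abort", "undefined"}

# Event -> mode entered, as stated in the bullets above.
EXCEPTION_TO_MODE = {
    "reset": "supervisor",
    "software_interrupt": "supervisor",
    "fast_interrupt": "fiq",
    "interrupt": "irq",
    "data_abort": "abort",
    "prefetch_abort": "abort",
    "undefined_instruction": "undefined",
}


def mode_for(event: str) -> str:
    """Return the mode the processor enters for the given event."""
    return EXCEPTION_TO_MODE[event]


def is_privileged(mode: str) -> bool:
    """True for the six privileged modes, False for user mode."""
    return mode in PRIVILEGED_MODES


if __name__ == "__main__":
    for event in EXCEPTION_TO_MODE:
        print(f"{event:22s} -> {mode_for(event)}")
```

System mode does not appear as a target in the table because the slide does not describe any exception that enters it.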
The Memory System – Byte Ordering
Little vs Big Endian
Quantity               Address divisible by    Binary address ends in
Byte                   1                       Anything
Half-word (16 bits)    2                       0
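The alignment rule in the table and the two byte orderings can be illustrated with a short Python sketch. The sample value 0x12345678 and the helper name is_aligned are assumptions made for the example; struct is the standard-library module for packing binary data.

```python
import struct


def is_aligned(address: int, size_in_bytes: int) -> bool:
    """An item is aligned when its address is divisible by its size."""
    return address % size_in_bytes == 0


WORD = 0x12345678

# Little-endian: least significant byte at the lowest address.
little = struct.pack("<I", WORD)
# Big-endian: most significant byte at the lowest address.
big = struct.pack(">I", WORD)

if __name__ == "__main__":
    print("little-endian bytes:", little.hex(" "))
    print("big-endian bytes:   ", big.hex(" "))
    print("half-word at 0x1002 aligned?", is_aligned(0x1002, 2))
    print("half-word at 0x1003 aligned?", is_aligned(0x1003, 2))
```

The same 32-bit value therefore occupies the same four addresses in both schemes; only the order of the bytes within those addresses differs.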
◻ All ARM instructions are 32 bits wide and are aligned on 4-byte boundaries in
memory.
◻ The load-store architecture.
◻ 3-address data processing instructions (that is, the two source operand registers and
the result register are all independently specified).
◻ Conditional execution of every instruction.
◻ Load and store multiple register instructions.
◻ The ability to perform a general shift operation and a general ALU operation in a
single instruction that executes in a single clock cycle.
◻ Open instruction set extension through the coprocessor instruction set, including
adding new registers and data types to the programmer's model.
◻ 16-bit compressed representation of the instruction set in the Thumb architecture.
The I/O System
'ADD' is simple addition, 'ADC' is add with carry, 'SUB' is subtract, 'SBC' is subtract with carry, 'RSB' is reverse subtraction and 'RSC' is reverse subtract with carry.
Data Processing Instructions – Bit-wise Logical & Register Movement
Data Processing Instructions – Comparison
◻ These instructions do not produce a result (which is therefore omitted from the
assembly language format) but just set the condition code bits (N, Z, C and V) in the
CPSR according to the selected operation.
◻ Any data processing instruction can set the condition codes (N, Z, C and V) if the
programmer wishes it to.
◻ The comparison operations only set the condition codes, so there is no option with them,
but for all other data processing instructions a specific request must be made. At the
assembly language level this request is indicated by adding an 's' to the opcode, standing
for 'Set condition codes'.
◻ The code performs a 64-bit addition of two numbers held in r0-r1 and r2-r3, using the C
condition code flag to store the intermediate carry:
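The carry-chained 64-bit addition can be modelled in Python as a rough sketch. It assumes r0 and r2 hold the low words and r1 and r3 the high words (the slide does not say which), and the helper names adds, adc and add64 are illustrative stand-ins for an add that sets the C flag followed by an add with carry.

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int):
    """32-bit add that also returns the carry out (the C flag)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a: int, b: int, carry: int):
    """32-bit add with carry in; also returns the carry out."""
    total = (a & MASK32) + (b & MASK32) + carry
    return total & MASK32, total >> 32


def add64(r0: int, r1: int, r2: int, r3: int):
    """Add the 64-bit value in r0 (low), r1 (high) to r2 (low), r3 (high)."""
    low, carry = adds(r2, r0)     # low words; carry out goes to C
    high, _ = adc(r3, r1, carry)  # high words plus the stored carry
    return low, high


if __name__ == "__main__":
    low, high = add64(0xFFFFFFFF, 0x0, 0x1, 0x0)
    print(hex(high), hex(low))
```

Without the carry passed between the two steps, the high word would miss the overflow from the low word.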
Data Transfer Instructions
◻ Data transfer instructions move data between ARM registers and memory. There
are three basic forms of data transfer instruction in the ARM instruction set:
◻ Single register load and store instructions.
🞑 These instructions provide the most flexible way to transfer single data items between an ARM register and memory. The data item may be a byte, a 32-bit word, or a 16-bit half-word.
◻ Multiple register load and store instructions.
🞑 These instructions enable large quantities of data to be transferred more efficiently. They are used for procedure entry and exit, to save and restore workspace registers, and to copy blocks of data around memory.
◻ Single register swap instructions.
🞑 These instructions allow a value in a register to be exchanged with a value in memory.
◻ Register-indirect addressing
🞑 The ARM data transfer instructions are all based around register-indirect
addressing, with modes that include base-plus-offset and base-plus-index
addressing.
🞑 Register-indirect addressing uses a value in one register (the base register) as a
memory address and either loads the value from that address into another
register or stores the value from another register into that memory address.
🞑 LDR r0, [r1] ; r0 := mem32[r1]
🞑 STR r0, [r1] ; mem32[r1] := r0
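The effect of the LDR and STR forms above can be sketched with a toy Python model. The Machine class, its 64-byte memory and the little-endian byte order are assumptions made for illustration only.

```python
class Machine:
    """Toy model: 16 registers and a small byte-addressed memory."""

    def __init__(self, mem_size: int = 64):
        self.r = [0] * 16
        self.mem = bytearray(mem_size)

    def ldr(self, rd: int, rn: int) -> None:
        """LDR rd, [rn]  ; rd := mem32[rn]"""
        addr = self.r[rn]  # base register supplies the address
        self.r[rd] = int.from_bytes(self.mem[addr:addr + 4], "little")

    def str_(self, rd: int, rn: int) -> None:
        """STR rd, [rn]  ; mem32[rn] := rd"""
        addr = self.r[rn]
        self.mem[addr:addr + 4] = self.r[rd].to_bytes(4, "little")


if __name__ == "__main__":
    m = Machine()
    m.r[1] = 8           # r1 is the base register
    m.r[0] = 0xDEADBEEF
    m.str_(0, 1)         # STR r0, [r1]
    m.ldr(2, 1)          # LDR r2, [r1]
    print(hex(m.r[2]))
```

In both instructions the base register is only read; the data register is the one that is loaded or stored.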
        MOV  r0, #0       ; initialize counter
LOOP    …….
        ADD  r0, r0, #1   ; increment loop counter
        CMP  r0, #10      ; compare with limit
        BNE  LOOP         ; repeat if not equal
        .…                ; else fall through
Control Flow Instructions – Branch Conditions
Control Flow Instructions – Conditional Execution
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions. A branch which is used to skip a small number of following instructions may be omitted altogether by giving those instructions the opposite condition.
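Conditional execution can be sketched in Python by modelling how a comparison sets the N, Z, C and V flags and how a condition is then tested against them. The sequence modelled in run_example (CMP r0, #5 followed by ADDNE and SUBNE) is an illustrative choice rather than code from these slides, only a subset of the condition codes is shown, and all function names are assumptions.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict:
    """Model CMP a, b: compute a - b and keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,                    # result is negative
        "Z": int(result == 0),                # result is zero
        "C": int(a >= b),                     # no borrow occurred
        "V": ((a ^ b) & (a ^ result)) >> 31,  # signed overflow
    }


CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}


def condition_passes(cond: str, flags: dict) -> bool:
    return CONDITIONS[cond](flags)


def run_example(r0: int, r1: int, r2: int) -> int:
    """Model: CMP r0, #5 ; ADDNE r1, r1, r0 ; SUBNE r1, r1, r2."""
    flags = cmp_flags(r0, 5)
    if condition_passes("NE", flags):  # ADDNE executes only if r0 != 5
        r1 = (r1 + r0) & MASK32
    if condition_passes("NE", flags):  # SUBNE executes only if r0 != 5
        r1 = (r1 - r2) & MASK32
    return r1


if __name__ == "__main__":
    print(run_example(5, 10, 3))  # both instructions skipped
    print(run_example(4, 10, 3))  # both instructions executed
```

Giving both instructions the NE condition has the same effect as branching around them when r0 equals 5, without the branch itself.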
⮚ Nested subroutine
BL SUB1
SUB1 BL SUB2
….
SUB2 …….
Control Flow Instructions – Subroutine Return Instructions
⮚ Nested subroutine
        BL   SUB1
        ……..
SUB2    …….
        MOV  pc, r14      ; copy r14 into r15 to return
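Because a branch with link places the return address in r14, a nested call overwrites the caller's return address unless it is saved first. The Python sketch below illustrates this; the Cpu class, the addresses 0x100, 0x200 and 0x300, and the use of a list as a stack are all assumptions made for the example.

```python
class Cpu:
    """Toy model with only a program counter, a link register and a stack."""

    def __init__(self):
        self.pc = 0
        self.r14 = 0
        self.stack = []

    def bl(self, return_address: int, target: int) -> None:
        """BL target: save the return address in r14, then branch."""
        self.r14 = return_address
        self.pc = target

    def ret(self) -> None:
        """MOV pc, r14: copy r14 into r15 to return."""
        self.pc = self.r14


def run_nested_call(save_link: bool) -> int:
    """Return the address reached when SUB1 finally returns."""
    cpu = Cpu()
    cpu.bl(0x104, 0x200)               # main at 0x100: BL SUB1
    if save_link:
        cpu.stack.append(cpu.r14)      # SUB1 saves its return address
    cpu.bl(0x204, 0x300)               # SUB1 at 0x200: BL SUB2
    cpu.ret()                          # SUB2 returns into SUB1
    if save_link:
        cpu.r14 = cpu.stack.pop()      # SUB1 restores its return address
    cpu.ret()                          # SUB1 returns
    return cpu.pc


if __name__ == "__main__":
    print(hex(run_nested_call(save_link=True)))
    print(hex(run_nested_call(save_link=False)))
```

With the link register saved, SUB1 returns to the instruction after the original call; without it, SUB1 returns into itself because r14 still holds the address written by the inner call.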
Control Flow Instructions – Supervisor Calls
3-Stage Pipeline ARM Organization – Components
◻ The register bank
🞑 Stores the processor state.
🞑 It has two read ports and one write port which can each be used to
access any register, plus an additional read port and an additional
write port that give special access to r15, the program counter.
◻ The barrel shifter
🞑 can shift or rotate one operand by any number of bits.
◻ The ALU
🞑 performs the arithmetic and logic functions required by the instruction set.
◻ The address register and incrementer
🞑 which select and hold all memory addresses and generate sequential addresses.
Single Cycle Instruction
Multi-Cycle Instruction
Breaks in Pipeline
PC Generation
◻ The pipeline reads instruction operands one stage earlier in the pipeline.
◻ The incremented PC value is fed directly into the decode stage, bypassing the pipeline register between the two stages.
ARM Instruction Execution – Data Processing Instructions
◻ The instruction is passed to the instruction decoder.
◻ The instruction is decoded and, based on the opcode and operands,
🞑 control signals are generated that select which registers are accessed.
◻ Data is read from memory and stored in the register.
◻ Source operands (Rn, Rm)
🞑 are read from the register file using buses A and B respectively, and the result Rd is written back.
◻ The ALU (Arithmetic and Logic Unit) takes the register values Rn and Rm from buses A and B and computes a result.
◻ Data processing instructions write the result to Rd in the register file.
ARM Instruction Execution – Data Transfer Instructions
◻ Load and store instructions use the ALU to generate an address, which is held in the address register and broadcast on the address bus.
◻ Barrel shifter
🞑 Register Rm can alternatively be preprocessed in the barrel shifter before it enters the ALU.
🞑 This allows a wide range of expressions and addresses to be generated in the same cycle.
◻ The PC value in the address register is fed into the incrementer, and the incremented value is written back to r15.
◻ E.g., the PC addresses the instruction at address 1000:
🞑 The instruction at location 1000 is read.
🞑 The address is incremented to 1004 and fed back to the PC.
◻ The incremented address is also written into the address register
🞑 to be used as the address for the next fetch.
ARM Instruction Execution – Branching Instructions
◻ Branch instructions compute the target address in the first cycle.
◻ A 24-bit immediate field is extracted from the instruction and then shifted
left two bit positions to give a word-aligned offset which is added to the
PC.
◻ The result is issued as an instruction fetch address, and while the
instruction pipeline refills the return address is copied into the link
register (r14) if this is required (that is, if the instruction is a 'branch with
link').
◻ The third cycle, which is required to complete the pipeline refilling, is also
used to make a small correction to the value stored in the link register in
order that it points directly at the instruction which follows the branch.
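The target-address calculation described above can be sketched in Python. Two details are added here beyond the slide text and should be read as assumptions of the sketch: the 24-bit field is treated as a signed value, and the PC seen by the branch is taken to be the branch's own address plus 8. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def branch_target(branch_address: int, instruction: int) -> int:
    """Compute the fetch address issued by a branch instruction."""
    offset = instruction & 0x00FFFFFF  # extract the 24-bit immediate
    if offset & 0x800000:              # treat it as a signed value
        offset -= 1 << 24
    pc = branch_address + 8            # PC value seen by the instruction
    return (pc + (offset << 2)) & MASK32  # shift left 2: word-aligned


def link_value(branch_address: int) -> int:
    """Corrected r14 value: the instruction that follows the branch."""
    return (branch_address + 4) & MASK32


if __name__ == "__main__":
    # 0xEAFFFFFE encodes a branch to itself.
    print(hex(branch_target(0x8000, 0xEAFFFFFE)))
    print(hex(link_value(0x8000)))
```

Shifting the offset left by two bit positions means the 24-bit field covers a 26-bit byte range while always producing a word-aligned address.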
The ARM Coprocessor Interface
◻ The ARM may decide not to execute the coprocessor instruction, either because it falls in a branch shadow or because it fails its condition code test. ARM will not assert cpi, and the instruction will be discarded by all parties.
◻ The ARM may decide to execute it but no present coprocessor can take it, so cpa stays active. ARM will take the undefined instruction trap and use software to recover, possibly by emulating the trapped instruction.
◻ ARM decides to execute the instruction and a coprocessor accepts it, but cannot execute it yet.
◻ ARM decides to execute the instruction and a coprocessor accepts it for immediate execution. cpi, cpa and cpb are all taken low and both sides commit to complete the instruction.
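The four handshake outcomes listed above can be summarised as a decision function. The Python sketch below deliberately uses plain booleans rather than the cpi, cpa and cpb signal levels, and the function name and outcome strings are hypothetical labels for the example.

```python
def handshake(arm_executes: bool,
              coprocessor_accepts: bool,
              coprocessor_ready: bool) -> str:
    """Classify how a coprocessor instruction handshake proceeds."""
    if not arm_executes:
        # Branch shadow or failed condition test: nobody acts on it.
        return "discarded"
    if not coprocessor_accepts:
        # No coprocessor present that can take the instruction.
        return "undefined instruction trap"
    if not coprocessor_ready:
        # Accepted, but the coprocessor cannot execute it yet.
        return "accepted, waiting for coprocessor"
    # Accepted for immediate execution: both sides commit.
    return "committed"


if __name__ == "__main__":
    for args in [(False, False, False), (True, False, False),
                 (True, True, False), (True, True, True)]:
        print(args, "->", handshake(*args))
```

The order of the checks mirrors the list: the ARM's own decision comes first, then whether any coprocessor accepts, then whether it can start immediately.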
ARM Coprocessor Data Transfers & Pre-emptive Execution
Assignment Topics
5-Stage Pipeline ARM Organization- Assignment Topics
Performance Improvement- Assignment Topics
How to handle the issues? - Assignment Topics
◻ 5-stage pipeline
🞑 Breaking instruction execution into five stages reduces the maximum work to be completed in each clock cycle
◻ Separate instruction and data memories
🞑 These can be separate caches connected to a unified instruction and data main memory
🞑 Significant reduction in the core’s CPI
Stages of 5-Stage Pipeline- Assignment Topics
◻ Fetch
🞑 The instruction is fetched from memory and placed in the instruction pipeline.
◻ Decode
🞑 The instruction is decoded and register operands read from the register file.
🞑 There are three operand read ports in the register file, so most ARM instructions can source all their operands in one cycle.
◻ Execute
🞑 An operand is shifted and the ALU result generated.
🞑 If the instruction is a load or store the memory address is computed in the
ALU.
◻ Buffer/data
🞑 Data memory is accessed if required. Otherwise the ALU result is simply
buffered for one clock cycle to give the same pipeline flow for all instructions.
◻ Write-back
🞑 The results generated by the instruction are written back to the register file, including any data loaded from memory.
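The flow of instructions through the five stages can be sketched as an occupancy table. This Python model is idealised: it assumes one instruction enters the pipeline per cycle and that there are no stalls, and the function names are assumptions made for the example.

```python
STAGES = ["Fetch", "Decode", "Execute", "Buffer/data", "Write-back"]


def stage_at(instruction_index: int, cycle: int):
    """Stage occupied by an instruction in a given cycle, or None."""
    position = cycle - instruction_index  # one new instruction per cycle
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def total_cycles(instruction_count: int) -> int:
    """Cycles needed to drain the pipeline when nothing stalls."""
    return instruction_count + len(STAGES) - 1


if __name__ == "__main__":
    n = 3
    for i in range(n):
        row = [(stage_at(i, c) or "-") for c in range(total_cycles(n))]
        print(f"instr {i}: " + " | ".join(f"{s:11s}" for s in row))
```

Each row is the previous one shifted right by one cycle, which is why every instruction follows the same stage sequence even when it does not access data memory.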
5-Stage Pipeline – Data Forwarding - Assignment Topics
◻ Because instruction execution is spread across three pipeline stages, the only way to resolve data dependencies without stalling the pipeline is to introduce forwarding paths.
◻ Data dependencies arise when an instruction needs to use the result of one of its predecessors before that result has returned to the register file.
◻ Forwarding paths allow results to be passed between stages as soon as they are available, and the 5-stage ARM pipeline requires each of the three source operands to be forwarded from any of three intermediate result registers.
ARM Implementation – Clocking Scheme - Assignment Topics
ARM Implementation – Datapath Timing - Assignment Topics
ARM1 ripple carry adder- Assignment Topics
ARM2 4-bit carry look-ahead adder - Assignment Topics
ARM2 ALU Logic- Assignment Topics
ARM6 Carry-Select Adder- Assignment Topics
ARM6 ALU Organization- Assignment Topics
ARM9 Carry arbitration encoding - Assignment Topics