The ARM Processor
CSC 522 Embedded Systems Summer, 2006
ARM What is it?
ARM stands for Advanced RISC Machines.
An ARM processor is basically any 16/32-bit microprocessor designed and licensed by ARM Ltd, a microprocessor design company headquartered in England, founded in 1990 by Hermann Hauser.
A characteristic feature of ARM processors is their low electric power consumption, which makes them particularly suitable for use in portable devices.
It is one of the most widely used processor families currently on the market.
Examples of ARM Based Products
The Toshiba 46HM94 46-inch Television
The iPod nano
Samsung S3FJ9SK Smartcard IC
The Motorola E680i is one of the latest mobile handsets
History of ARM
Acorn Computers: a British computer company founded in Cambridge, England, in 1978, by Hermann Hauser and Chris Curry.
The company produced a number of computers which were especially popular in the UK. These included the Acorn Electron, the BBC Micro and the Acorn Archimedes.
Acorn's BBC Micro computer dominated the UK educational computer market during the 1980s and early 1990s.
VLSI Technology, Inc. produced the first ARM processor based on Acorn designs.
ARM-based PCs did not sell well, and Acorn was acquired by Olivetti in 1985.
ARM was contracted by Apple to develop the processor for the Apple Newton handheld, built by VLSI.
The company was broken up into several independent operations in 2000, one of which, notably, was ARM Holdings.
ARM Holdings' primary business model is to license its RISC-based designs to other manufacturers.
General Computer Architecture Idealized Baseline
A stored-program digital computer keeps its instructions and data in the same memory system, allowing the instructions to be treated as data when necessary.
Sometimes this flexibility shows up as a desktop machine where the user runs different programs at different times.
Other times it shows up as the same processor being used in a range of different applications, each with a fixed program, i.e. an embedded system.
MU0 (University of Manchester): basically a simplified MARIE computer
Basic components
Program Counter (PC)
Accumulator (ACC)
Arithmetic-Logic Unit (ALU)
Instruction Register (IR)
An instruction set
MU0 data path example
MU0 instruction format
MU0 Instruction Set
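The fetch, decode and execute behaviour of a machine like MU0 can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference model: it assumes the commonly published MU0 format (16-bit words, a 4-bit opcode and a 12-bit address) and the usual eight instructions (LDA, STO, ADD, SUB, JMP, JGE, JNE, STP). The helper names `encode` and `run` are made up for this sketch.

```python
# Assumed MU0 opcodes (4-bit field); taken from the commonly published
# description of MU0, not from this deck.
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)


def encode(opcode, address=0):
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    return ((opcode & 0xF) << 12) | (address & 0xFFF)


def run(memory, max_steps=1000):
    """Run a program stored in `memory` (a list of 16-bit words)."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                    # fetch into the instruction register
        pc = (pc + 1) & 0xFFF              # program counter moves on
        opcode, s = ir >> 12, ir & 0xFFF   # decode opcode and address
        if opcode == LDA:
            acc = memory[s]
        elif opcode == STO:
            memory[s] = acc
        elif opcode == ADD:
            acc = (acc + memory[s]) & 0xFFFF
        elif opcode == SUB:
            acc = (acc - memory[s]) & 0xFFFF
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:               # sign bit clear: non-negative
                pc = s
        elif opcode == JNE:
            if acc != 0:
                pc = s
        else:                              # STP (or an undefined opcode)
            break
    return acc, memory


# Instructions and data share one memory: words 0-3 are code, 4-6 are data.
program = [
    encode(LDA, 4),   # ACC = mem[4]
    encode(ADD, 5),   # ACC = ACC + mem[5]
    encode(STO, 6),   # mem[6] = ACC
    encode(STP),
    2, 3, 0,
]
acc, mem = run(list(program))
```

Because code and data live in the same list, the example also shows the stored-program idea: nothing stops a program from loading or overwriting one of its own instruction words.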
General Computer Architecture Definitions
Semantic Gap: the distance, in implementation terms, between a high-level language construct and a machine instruction.
Compiler: a computer program that translates a high-level language program into a sequence of machine instructions.
Processor Design Trade-offs: the aim of processor design is to define an instruction set that supports the functions that are useful to the programmer while at the same time allowing an implementation that is as efficient as possible. Good processor design should define the instruction set to be a good compiler target rather than something that the programmer will use directly.
General Computer Architecture Two Types of Instruction Sets
Complex Instruction Set Computers (CISC)
Intended to reduce the semantic gap
Single-instruction procedure entries and exits
Variable-length instructions with many formats
Complex sequences of operations over many clock cycles
Processors based on CISC were sold on the sophistication and number of their addressing modes, data types, etc.
Developed in the 1970s, when computers had slow main memory, so processors were controlled by faster ROMs
Frequently used operations are drawn from ROM as microcode sequences rather than having instructions pulled from main memory
Reduced Instruction Set Computers (RISC)
Pipelined execution: starting a second instruction before the first one has finished
A fixed (32-bit) instruction size with few formats
A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
A large register bank of 32-bit registers, all of which can be used for any purpose, to allow the load-store architecture to operate efficiently
Hard-wired instruction decode logic
Single-cycle execution
RISC Architecture Advantages/Disadvantages
Advantages
A smaller die size: a simpler processor requires fewer transistors and less silicon area.
A shorter development time: less design effort and therefore a lower cost.
A higher performance: simpler instructions are executed faster.
Disadvantages
Poor code density compared with CISCs
Doesn't execute x86 code
RISC Power-efficient Processing
Principles of low-power circuit design
Minimize the power supply voltage
Minimize the circuit activity
Minimize the number of gates
Minimize the clock frequency
Strategy of low-power circuit design
Minimize voltage: choose the lowest clock frequency that delivers the required performance, then set the power supply voltage as low as is practical.
Minimize off-chip activity: off-chip capacitances are much higher than on-chip loads.
Minimize on-chip activity: avoid clocking unnecessary circuit functions and employ sleep modes where possible.
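The four principles line up with the terms of the usual first-order model of CMOS dynamic power, P = activity x C x Vdd^2 x f. The model itself is an assumption brought in for illustration (the slides do not state it), and every number below is hypothetical. A small sketch of why halving both voltage and clock frequency pays off so well:

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order CMOS switching power in watts.

    activity      -- fraction of the capacitance switched per clock (0..1)
    capacitance_f -- total switched capacitance in farads (grows with gate count)
    vdd           -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical operating points, chosen only to show the scaling.
base = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd=3.3, freq_hz=50e6)
scaled = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd=1.65, freq_hz=25e6)
ratio = scaled / base   # voltage enters squared, frequency linearly
```

Halving the frequency alone halves the power; halving the voltage as well cuts it to one eighth, which is why the strategy is to pick the lowest adequate clock first and then lower the supply voltage.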
ARM Architecture
RISC features incorporated by ARM
A load-store architecture
Fixed-length 32-bit instructions
3-address instruction formats
Pipelining
RISC features not incorporated into ARM
Delayed branches
Single-cycle execution of all instructions
ARM Architecture Instruction Set Foundation
Visible Registers
[Figure: the visible registers, divided into user-addressable and system-addressable registers]
Current Program Status Register
Used in user-level programs to store the condition code bits.
N: Negative; the last ALU operation which changed the flags produced a negative result.
Z: Zero; the last ALU operation which changed the flags produced a zero result.
C: Carry; the last ALU operation which changed the flags generated a carry-out.
V: Overflow; the last arithmetic ALU operation which changed the flags generated an overflow into the sign bit.
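How the four flags fall out of a 32-bit addition can be sketched as follows. This is an illustrative model only; the function name `add_with_flags` is made up, and it covers addition, not the full set of flag-setting operations.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def add_with_flags(a, b):
    """Add two 32-bit values; return (result, (N, Z, C, V))."""
    a &= MASK
    b &= MASK
    full = a + b                       # unbounded Python integer sum
    result = full & MASK               # what fits in a 32-bit register
    n = bool((result >> 31) & 1)       # sign bit of the result
    z = result == 0
    c = full > MASK                    # carry out of bit 31
    # Overflow: both operands share a sign that the result does not have.
    v = bool((((a ^ result) & (b ^ result)) >> 31) & 1)
    return result, (n, z, c, v)


# Largest positive value plus one wraps into the sign bit: N and V are set.
r1, f1 = add_with_flags(0x7FFFFFFF, 1)
# All-ones plus one wraps to zero with a carry-out: Z and C are set.
r2, f2 = add_with_flags(0xFFFFFFFF, 1)
```

The two cases show why C and V are separate flags: the first is an error only if the operands are treated as signed, the second only if they are treated as unsigned.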
The Memory System
In addition to the processor registers, the ARM system has memory state.
Memory is viewed as a linear array of bytes numbered from 0 to 2^32 - 1.
Data items may be 8-bit bytes, 16-bit half-words, or 32-bit words.
Words are always aligned on a 4-byte boundary
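A byte-addressed memory with the three data sizes and the alignment rule might be modelled like this. The sketch rests on two assumptions that the slide does not state: half-words are aligned on 2-byte boundaries, and bytes within a word are stored little-endian. The class and method names are hypothetical.

```python
class Memory:
    """Tiny byte-addressed memory with ARM-style alignment checks."""

    def __init__(self, size):
        self.data = bytearray(size)

    def _check(self, address, width):
        if width not in (1, 2, 4):
            raise ValueError("width must be 1, 2 or 4 bytes")
        if address % width != 0:
            raise ValueError("unaligned access at address %#x" % address)
        if not 0 <= address <= len(self.data) - width:
            raise IndexError("address out of range")

    def store(self, address, value, width=4):
        self._check(address, width)
        value &= (1 << (8 * width)) - 1          # truncate to the item size
        self.data[address:address + width] = value.to_bytes(width, "little")

    def load(self, address, width=4):
        self._check(address, width)
        return int.from_bytes(self.data[address:address + width], "little")


mem = Memory(16)
mem.store(4, 0x11223344)            # word store on a 4-byte boundary
low_byte = mem.load(4, width=1)     # same location read as a byte
high_half = mem.load(6, width=2)    # upper half-word of the same word
```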
Load-store Architecture
The instruction set will only process (add, subtract, etc.) values which are in registers (or specified directly within the instruction itself), and will always place the results of such processing into a register.
The only operations which apply to memory state are ones which copy memory values into registers (load instructions) or copy register values into memory (store instructions).
ARM instructions fall into three categories
Data processing instructions
These use and change only register values.
Data transfer instructions
These copy memory values into registers (load instructions) or copy register values into memory (store instructions). An additional form, useful only in systems code, exchanges a memory value with a register value.
Control flow instructions
These cause execution to switch to a different address, either permanently (branch instructions), or saving a return address to resume the original sequence (branch and link instructions), or trapping into system code (supervisor calls).
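The split between data processing and data transfer can be made concrete with a toy model. The sketch below computes c = a + b the load-store way: two loads, one register-only add, one store. The function names, register numbers and addresses are all hypothetical, and control flow is left out.

```python
regs = [0] * 16                           # r0..r15
memory = {0x100: 7, 0x104: 35, 0x108: 0}  # a, b, c at made-up addresses


def ldr(rd, address):
    """Data transfer: copy a memory value into a register (load)."""
    regs[rd] = memory[address]


def str_(rs, address):
    """Data transfer: copy a register value into memory (store)."""
    memory[address] = regs[rs]


def add(rd, rn, rm):
    """Data processing: reads and writes registers only, never memory."""
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF


# c = a + b
ldr(0, 0x100)      # r0 = a
ldr(1, 0x104)      # r1 = b
add(2, 0, 1)       # r2 = r0 + r1
str_(2, 0x108)     # c = r2
```

A CISC-style machine might do the same work in one memory-to-memory instruction; the load-store version uses four simpler ones, which is the code density cost mentioned earlier.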
Supervisor mode
The ARM processor supports a protected supervisor mode.
The protection mechanism ensures that user code cannot gain supervisor privileges without appropriate checks being carried out to ensure that the code is not attempting illegal operations.
The ARM Instruction Set
Instruction Set Features
The load-store architecture
3-address data processing instructions
All ARM instructions are 32 bits wide and are aligned on 4-byte boundaries; the exception is the compressed 16-bit Thumb instructions
Conditional execution of every instruction
Inclusion of load and store multiple register instructions
Ability to perform a general shift operation and a general ALU operation in a single instruction that executes in a single clock cycle
Open instruction set extension through the coprocessor instruction set, including adding new registers and data types
A very dense 16-bit compressed representation of the instruction set in the Thumb architecture
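Conditional execution means each instruction first tests the N, Z, C and V flags and is skipped if the test fails. The sketch below uses the standard ARM condition mnemonics (EQ, NE, GE and so on), which come from the ARM architecture and are not listed in these slides. The function names are hypothetical, and the second function models only an ADD with a left-shifted second operand.

```python
# Each condition is a test on the flags (n, z, c, v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                    # equal
    "NE": lambda n, z, c, v: not z,                # not equal
    "CS": lambda n, z, c, v: c,                    # carry set
    "CC": lambda n, z, c, v: not c,                # carry clear
    "MI": lambda n, z, c, v: n,                    # minus
    "PL": lambda n, z, c, v: not n,                # plus or zero
    "VS": lambda n, z, c, v: v,                    # overflow set
    "VC": lambda n, z, c, v: not v,                # overflow clear
    "HI": lambda n, z, c, v: c and not z,          # unsigned higher
    "LS": lambda n, z, c, v: (not c) or z,         # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,               # signed >=
    "LT": lambda n, z, c, v: n != v,               # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,   # signed >
    "LE": lambda n, z, c, v: z or n != v,          # signed <=
    "AL": lambda n, z, c, v: True,                 # always
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition would execute."""
    return bool(CONDITIONS[cond](n, z, c, v))


def conditional_add(cond, flags, rn_value, rm_value, lsl=0):
    """Model ADD<cond> Rd, Rn, Rm, LSL #lsl; return None when skipped."""
    if not condition_passed(cond, *flags):
        return None
    return (rn_value + (rm_value << lsl)) & 0xFFFFFFFF


flags = (False, False, False, False)                 # N, Z, C, V all clear
taken = conditional_add("NE", flags, 10, 3, lsl=2)   # 10 + (3 << 2)
skipped = conditional_add("EQ", flags, 10, 3, lsl=2)
```

The shift and the add happen in one instruction here, which mirrors the combined shift and ALU operation listed among the features above.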
The I/O System
The ARM handles I/O peripherals as memory-mapped devices with interrupt support.
The internal registers in these devices appear as addressable locations within the ARM's memory map and may be read and written using the same (load-store) instructions as any other memory location.
Peripherals may attract the processor's attention by making an interrupt request using either the normal interrupt (IRQ) or the fast interrupt (FIQ) input.
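Memory-mapped I/O can be pictured as a bus that routes some address ranges to device registers instead of RAM. The sketch below is purely illustrative: the `Uart` device, its register layout and the base address 0x10000000 are invented, and the interrupt request is reduced to a boolean that the device raises.

```python
class Uart:
    """Hypothetical peripheral: offset 0 = data register, offset 4 = status."""

    def __init__(self):
        self.sent = []
        self.irq = False            # stands in for the IRQ input line

    def read(self, offset):
        return 1 if offset == 4 else 0      # status: always ready

    def write(self, offset, value):
        if offset == 0:
            self.sent.append(value & 0xFF)
            self.irq = True                 # request the processor's attention


class Bus:
    """Routes loads and stores to RAM or to an attached device."""

    def __init__(self):
        self.ram = {}
        self.devices = []           # list of (base, size, device)

    def attach(self, base, size, device):
        self.devices.append((base, size, device))

    def _find(self, address):
        for base, size, device in self.devices:
            if base <= address < base + size:
                return device, address - base
        return None, address

    def load(self, address):
        device, offset = self._find(address)
        return device.read(offset) if device else self.ram.get(address, 0)

    def store(self, address, value):
        device, offset = self._find(address)
        if device:
            device.write(offset, value)
        else:
            self.ram[address] = value


bus = Bus()
uart = Uart()
bus.attach(0x10000000, 8, uart)
bus.store(0x20, 99)                   # ordinary memory write
bus.store(0x10000000, ord("A"))       # same operation, but lands in the device
status = bus.load(0x10000004)
```

The program side uses one `load` and one `store` operation for everything, which is the point of the scheme: no special I/O instructions are needed.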
ARM Organization and Implementation
3-stage pipeline organization
Principal components
The register bank
The barrel shifter: can shift or rotate one operand by any number of bits
The ALU
The address register and incrementer: select and hold all memory addresses and generate sequential addresses
The data registers
The instruction decoder and associated control logic
Process Instruction Flow
In a single-cycle data processing instruction, two register operands are accessed, the value on the B bus is shifted and combined with the value on the A bus in the ALU, then the result is written back into the register bank.
The program counter value is in the address register, from where it is fed into the incrementer; the incremented value is then copied back into r15 in the register bank and also into the address register, to be used as the address for the next instruction fetch.
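The flow just described can be traced in a small model: operand B goes through the barrel shifter, meets operand A in the ALU, the result is written back, and the incrementer advances r15. This is a simplified sketch under stated assumptions: the ALU operation is fixed to ADD, shift amounts are limited to 0..31, the destination is assumed not to be r15, and the shift names LSL, LSR, ASR and ROR follow standard ARM usage and are not taken from the slides.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def barrel_shift(value, kind, amount):
    """Shift or rotate a 32-bit value by 0..31 bits."""
    if not 0 <= amount < 32:
        raise ValueError("this sketch supports shift amounts 0..31 only")
    value &= MASK
    if amount == 0:
        return value
    if kind == "LSL":                       # logical shift left
        return (value << amount) & MASK
    if kind == "LSR":                       # logical shift right
        return value >> amount
    if kind == "ASR":                       # arithmetic shift right
        shifted = value >> amount
        if value >> 31:                     # replicate the sign bit
            shifted |= (MASK << (32 - amount)) & MASK
        return shifted
    if kind == "ROR":                       # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError("unknown shift kind: %s" % kind)


def data_processing_step(regs, rd, rn, rm, kind="LSL", amount=0):
    """One single-cycle ADD: rd = rn + shift(rm), then advance the PC."""
    a_bus = regs[rn]                                # first operand
    b_bus = barrel_shift(regs[rm], kind, amount)    # second operand, shifted
    regs[rd] = (a_bus + b_bus) & MASK               # ALU result written back
    regs[15] = (regs[15] + 4) & MASK                # incrementer: next fetch


regs = [0] * 16
regs[1], regs[2], regs[15] = 10, 3, 0x8000
data_processing_step(regs, rd=0, rn=1, rm=2, kind="LSL", amount=2)
```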
ARM processors employ a simple 3-stage pipeline with the following pipeline stages:
Fetch: the instruction is fetched from memory and placed in the instruction pipeline.
Decode: the instruction is decoded and the data path control signals prepared for the next cycle. In this stage the instruction owns the decode logic but not the data path.
Execute: the instruction owns the data path; the register bank is read, an operand shifted, the ALU result generated and written back into a destination register.
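The overlap between the three stages is easiest to see as a timetable. The sketch below assumes an ideal pipeline in which every instruction spends exactly one cycle in each stage, with no stalls or branches; the function name and the instruction mnemonics are placeholders.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal 3-stage pipeline."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i enters fetch in cycle i + 1 and moves on one
            # stage per cycle after that.
            schedule.setdefault(i + s + 1, {})[stage] = instr
    return schedule


sched = pipeline_schedule(["ADD", "SUB", "MOV"])
for cycle in sorted(sched):
    print(cycle, sched[cycle])
```

Three instructions finish in five cycles rather than nine, and from cycle 3 onward one instruction completes every cycle: while ADD executes, SUB is being decoded and MOV is being fetched.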
Example ARM Instruction Set
Summary
The ARM processor has a rich history both in academia and in the commercial space. It uses innovative architectural design to achieve high performance with low power consumption. It is heavily used in mobile and embedded devices because of its power characteristics and is one of the most widely used processors today. It relies on a RISC instruction set to achieve this performance, together with organizational techniques such as pipelining. The ARM processor is a robust development platform that will be in use for many years to come.