
UNIT 2:

EMBEDDED PROCESSORS

• Processors are the main functional units of an embedded board, and are primarily responsible for
processing instructions and data.
• An electronic device contains at least one master processor, acting as the central controlling device, and
can have additional slave processors that work with and are controlled by the master processor.
• These slave processors may either extend the instruction set of the master processor or act to manage
memory, buses, and I/O (input/output) devices.
Instruction Set Architecture (ISA)
• The ISA is the interface between the hardware and the software; it instructs the hardware about what needs to be done through software instructions.
• It allows the software to make full use of the hardware's capabilities.
The Role of the ISA:
 What operation is to be performed?
 Where are the operands stored?
 Where is the result of the operation stored?
 How many operands are needed to perform the operation?
 What is the type of the operation?
 What is the format of the operation?
 What is the size of the operands?
 What addressing modes are supported? How are the operands addressed?
Sample ISA operations:

Operands
Operands are the data that operations manipulate. An ISA defines the types and formats of operands for a particular
architecture.
For example, in the case of the MPC823 (Motorola/Freescale PowerPC), SA-1110 (Intel StrongARM), and many
other architectures, the ISA defines simple operand types of bytes (8 bits), halfwords (16 bits), and words (32 bits).
An ISA also defines the operand formats (how the data looks) that a particular architecture can support, such as
binary, decimal and hexadecimal.
Example:
MOV registerX, 10d ; Move decimal value 10 into register X

MOV registerX, $0Ah ; Move hexadecimal value A (decimal 10) to register X

Storage:
The ISA specifies the features of the programmable storage used to store the data being operated on.
1. Memory is simply an array of programmable storage that stores data, including operations, operands, and so on.
The indices of this array are locations referred to as memory addresses, where each location is a unit of memory that can be addressed separately.
An ISA defines specific characteristics of the address space, such as whether it is:
Linear. A linear address space is one in which specific memory locations are represented incrementally, typically starting at “0”.
Segmented. In a segmented address space, specific memory locations can be accessed only by specifying a segment identifier (a segment number that can be explicitly defined or implicitly obtained from a register) and an offset within that segment.
Each segment is defined by a base address and a limit, which map onto a portion of memory set up as a linear address space. If the offset is less than or equal to the limit, the offset is added to the base address, giving the unsegmented address within the linear address space.
An important note regarding ISAs and memory is that different ISAs not only define where data are stored, but also how data are stored in memory, specifically the order in which the bytes (or bits) that make up the data are stored, or byte ordering.
The two byte-ordering approaches are big-endian, in which the most significant byte is stored first (at the lowest address), and little-endian, in which the least significant byte is stored first.
Example:
68000 and SPARC are big-endian
x86 is little-endian
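As a quick illustration (a minimal C sketch, not from the original text), the byte ordering of the machine a program runs on can be observed by viewing a multi-byte word one byte at a time:

#include <stdio.h>
#include <stdint.h>

/* Minimal sketch: inspect how a 32-bit word is laid out in memory to
 * determine the byte ordering of the host processor at runtime. */
int main(void)
{
    uint32_t word = 0x0A0B0C0D;
    uint8_t *bytes = (uint8_t *)&word;  /* view the word byte by byte */

    if (bytes[0] == 0x0D)
        printf("little-endian: least significant byte stored first\n");
    else if (bytes[0] == 0x0A)
        printf("big-endian: most significant byte stored first\n");
    return 0;
}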
2. Register Set
A register is simply fast programmable memory normally used to store operands that are immediately or
frequently used.
A processor’s set of registers is commonly referred to as the register set or the register file.
Different processors have different register sets, and the number of registers in their sets varies from very few to several hundred (even over a thousand).
3. How Registers Are Used:
An ISA defines which registers are reserved for specific uses, such as special-purpose and floating-point registers, and which can be used by the programmer in a general fashion (general-purpose registers).
Addressing Modes
 Addressing modes define how the processor can access operand storage. In fact, the
 usage of registers is partly determined by the ISA’s Memory Addressing Modes.
The two most common types of addressing mode models are:
Load-Store Architecture, which only allows operations to process data in registers, not anywhere else in memory.
For example, the PowerPC architecture has only one addressing mode for load and store instructions: register
plus displacement (supporting register indirect with immediate index, register indirect with index, etc.).
Register-Memory Architecture, which allows operations to be processed both within registers and within other types of memory. Intel's i960 Jx processor is an example of an architecture based upon the register-memory model (supporting absolute, register indirect, etc.).
Interrupts and Exception Handling:
Interrupts (also referred to as exceptions or traps depending on the type) are mechanisms that stop the standard
flow of the program in order to execute another set of code in response to some event, such as problems with the
hardware, resets, and so forth.
The ISA defines what type of hardware support, if any, a processor has for interrupts.
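As a hypothetical C sketch of how this looks to software (the ISR and variable names are illustrative; the actual vectoring mechanism is defined by the processor's ISA and its startup code):

#include <stdint.h>

/* Sketch: an interrupt stops the standard program flow, the hardware vectors
 * to the ISR, and normal execution resumes when the ISR returns. The names
 * timer_isr and g_tick are made up for illustration. */
volatile uint32_t g_tick = 0;   /* shared with the ISR, hence volatile */

void timer_isr(void)            /* assumed to be installed in the vector table */
{
    g_tick++;                   /* keep ISR work short; defer the rest to main */
}

int main(void)
{
    uint32_t last = 0;
    for (;;) {                  /* standard program flow */
        if (g_tick != last) {   /* notice that an interrupt occurred */
            last = g_tick;
            /* respond to the event here */
        }
    }
}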
ISA models:
1. Application-Specific ISA Models: These ISA models define processors that are intended for specific embedded applications, such as processors made only for TVs. They include:
 Controller Model: The Controller ISA is implemented in processors that are not required to perform complex data manipulation, such as video and audio processors that are used as slave processors on a TV board.
 Datapath Model: The Datapath ISA is implemented in processors whose purpose is to repeatedly perform fixed computations on different sets of data. Example: digital signal processors (DSPs).
 Finite State Machine with Datapath (FSMD) Model: The FSMD ISA is an implementation based upon a combination of the Datapath ISA and the Controller ISA, for processors that are not required to perform complex data manipulation but must repeatedly perform fixed computations on different sets of data. Examples: application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and field-programmable gate arrays (FPGAs).
 FSMD
An FSMD (finite state machine with datapath) combines an FSM and regular sequential circuits. The FSM, which is sometimes known as a control path, examines the external commands and status and generates control signals to specify the operation of the regular sequential circuits, which are known collectively as a datapath. Algorithms described in RT (register transfer) operations, in which the operations are specified as data manipulation and transfer among a collection of registers, can be converted to an FSMD and realized in hardware.
 Java Virtual Machine (JVM) Model: The JVM ISA is based upon one of the Java Virtual Machine standards. Real-world JVMs can be implemented in an embedded system via hardware.
Example: aJile's aj-80 and aj-100 processors.
2. General-Purpose ISA Models
 Complex Instruction Set Computing (CISC) Model
 Reduced Instruction Set Computing (RISC) Model
3. Instruction-Level Parallelism ISA Models
 Single Instruction Multiple Data (SIMD) Model
 Superscalar Machine Model
 Very Long Instruction Word Computing (VLIW) Model
MIPS Architecture:
MIPS (an acronym for Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems).
Internal Processor Design:
Internal processor components consist of the CPU, memory, input components, output components, and buses.
These internal components are the basis of the von Neumann model.
Central Processing Unit (CPU)

 The CPU is the processing unit within a processor.
 The CPU is responsible for fetching, decoding, and executing instructions.
 This three-step process is commonly referred to as a three-stage pipeline.
These cycles are implemented through some combination of four major CPU components:
 The arithmetic logic unit (ALU), which implements the ISA's operations
 Registers, a type of fast memory
 The control unit (CU), which manages the entire fetching and execution cycle
 The internal CPU buses, which interconnect the ALU, registers, and the CU
The MPC860 CPU – the PowerPC core
Internal CPU Buses
 The CPU buses are the mechanisms that interconnect the ALU, the CU, and
registers
 Buses are simply wires that interconnect the various other components
within the CPU.
 Each bus's wires are typically divided into logical functions, such as:
 Data: carries data, bi-directionally, between the registers and the ALU
 Address: carries the locations of the registers that contain the data to be transferred
 Control: carries control signal information, such as timing and control signals, between the registers, the ALU, and the CU
 In the PowerPC Core, there is a Control Bus that carries the control
signals between the ALU, CU, and registers.
 In the PowerPC, source buses are the data buses that carry data between the registers and the ALU.
 There is an additional bus, called the write-back bus, dedicated to writing data received from a source bus directly back from the load/store unit to the fixed-point or floating-point registers.
ALU
 The arithmetic logic unit (ALU) implements the comparison, mathematical and logical
operations defined by the ISA.
 The format and types of operations implemented in the ALU can vary depending on the ISA.
 The ALU is responsible for accepting multiple n-bit binary operands and performing any
logical (AND, OR, NOT, etc.), mathematical (+, –, *, etc.), and comparison (=, <, >, etc.)
operations on these operands.
 In the PowerPC core, the ALU is part of the “Fixed Point Unit” that implements all fixed-point
instructions other than load/store instructions.
 The ALU is responsible for fixed-point logic, add, and subtract instruction implementation.
 In the case of the PowerPC, generated results of the ALU are stored in an Accumulator.
 The PowerPC has an IMUL/IDIV unit (essentially another ALU) specifically for performing
multiplication and division operations.
Registers
 Registers are simply a combination of various flip-flops that can be used to
temporarily store data or to delay signals.
 A storage register is a form of fast programmable internal processor memory usually used to temporarily store, copy, and modify operands that are immediately or frequently used by the system.
 Shift registers delay signals by passing the signals between the various internal flip-
flops with every clock pulse.
 ISA designs do not use all registers in the same way to process data; register storage typically falls into one of two categories, either general purpose or special purpose.
 General purpose registers can be used to store and manipulate any type of data determined by the programmer.
 Special purpose registers can only be used in a manner specified by the ISA, including:
 Holding the results of specific types of computations
 Having predetermined flags (single bits within a register that can act and be controlled independently)
 Acting as counters (registers that can be programmed to change states, that is, increment, asynchronously or synchronously, after a specified length of time)
 Controlling I/O ports (registers managing the external I/O pins connected to the body of the processor and to board I/O)
 Shift registers are inherently special purpose, because of their limited functionality.
 The PowerPC Core has a “Register Unit” which contains all registers visible to a user.
 PowerPC processors generally have two types of registers: general-purpose and special-
purpose (control) registers.
Control Unit (CU)
 The control unit (CU) is primarily responsible for generating timing signals, as well as controlling and
coordinating the fetching, decoding, and execution of instructions in the CPU.
 After the instruction has been fetched from memory and decoded, the control unit determines what operation will be performed by the ALU, and selects and sends the appropriate signals to each functional unit within or outside of the CPU (e.g., memory, registers, ALU).
 The PowerPC core's CU is called a "sequencer unit," and is the heart of the PowerPC core.
 The sequencer unit is responsible for managing the continuous cycle of fetching, decoding, and
executing instructions.
 Provides the central control of the data and instruction flow among the other major units within the
PowerPC core (CPU), such as registers, ALU and buses.
 Implements the basic instruction pipeline. Fetches instructions from memory to issue these
instructions to available execution units.
 Maintains a state history for handling exceptions.
The CPU and the System (Master) Clock
 A processor's execution is ultimately synchronized by an external system or master clock, located on the board.
 The master clock is an oscillator along with a few other components, such as a crystal.
 It produces a fixed-frequency sequence of regular on/off pulse signals.
Processor Performance
 There are several measures of processor performance, but are all based upon the processor’s behaviour
over a given length of time. One of the most common definitions of processor performance is a
processor’s throughput, the amount of work the CPU completes in a given period of time.
 CPU throughput (in bytes/sec or MB/sec) = 1 / CPU execution time = CPU performance
 CPU execution time in seconds per program = (total number of instructions per program or instruction
count) * (CPI in number of cycle cycles/instruction) * (clock period in seconds per cycle) =
((instruction count) * (CPI in number of cycle cycles/instruction)) / (clock rate in MHz)
 CPI (average number of clock cycles per instruction) can be determined CPI =∑ (CPI per instruction
* instruction frequency)
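For example (assumed numbers, for illustration only): a program of 2,000,000 instructions with an average CPI of 2 cycles/instruction on a 100 MHz clock (10^8 cycles/second) gives CPU execution time = (2,000,000 * 2) / 10^8 = 0.04 seconds, so CPU performance = 1 / 0.04 = 25 runs of the program per second.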
Other definitions of performance besides throughput include:
 A processor’s responsiveness, or latency, which is the length of elapsed time a processor takes to
respond to some event.
 A processor’s availability, which is the amount of time the processor runs normally without failure;
reliability, the average time between failures or MTBF (mean time between failures); and
recoverability, the average time the CPU takes to recover from failure or MTTR (mean time to
recover).
 One of the most common performance measures used for processors in the embedded market is millions of instructions per second, or MIPS.
 MIPS = instruction count / (CPU execution time * 10^6) = clock rate / (CPI * 10^6)
 The MIPS performance measure gives the impression that faster processors have higher MIPS values, since part of the MIPS formula is inversely proportional to the CPU's execution time.
Memory: Basic Concepts
 Stores a large number of bits
 m x n: m words of n bits each
 k = log2(m) address input signals, i.e., m = 2^k words
 e.g., a 4,096 x 8 memory:
 32,768 bits
 12 address input signals
 8 input/output data signals
 Memory access:
 r/w: selects read or write
 enable: read or write occurs only when asserted
 multiport: multiple accesses to different locations simultaneously
 Traditional ROM/RAM distinctions
 ROM
 read only, bits stored without power
 RAM
 read and write, lose stored bits without power
 Other differences:
 Advanced ROMs can be written to, e.g., EEPROM
 Advanced RAMs can hold bits without power, e.g., NVRAM
 Write ability: the manner and speed with which a memory can be written
 Storage permanence: the ability of memory to hold stored bits after they are written
 Ranges of write ability
 High end
 processor writes to memory simply and quickly
 e.g., RAM
 Middle range
 processor writes to memory, but slower
 e.g., FLASH, EEPROM
 Lower range
 special equipment, “programmer”, must be used to write to memory
 e.g., EPROM, OTP ROM
 Low end
 bits stored only during fabrication
 e.g., Mask-programmed ROM
 In-system programmable memory
 Can be written to by a processor in the embedded system using the memory
 Memories in high end and middle range of write ability
 Range of storage permanence
 High end
 essentially never loses bits
 e.g., mask-programmed ROM
 Middle range
 holds bits days, months, or years after memory’s power source turned off
 e.g., NVRAM
 Lower range
 holds bits as long as power supplied to memory
 e.g., SRAM
 Low end
 begins to lose bits almost immediately after written
 e.g., DRAM
 Nonvolatile memory
 Holds bits after power is no longer supplied
 High end and middle range of storage permanence
ROM
 Nonvolatile memory
 Can be read from but not written to, by a processor in an embedded system
 Traditionally written to, “programmed”, before inserting to embedded system
 Uses
 Store software program for general-purpose processor
 program instructions can be one or more ROM words
 Store constant data needed by system
 Implement combinational circuit
Example: 8x4 ROM
 Horizontal lines = words
 Vertical lines = data
 Lines connected only at circles
 Decoder sets word 2’s line to 1 if address input is 010
 Data lines Q3 and Q1 are set to 1 because there is a
“programmed” connection with word 2’s line
 Word 2 is not connected with data lines Q2 and Q0
 Output is 1010
Implementing a combinational function
 Any combinational circuit of n functions of the same k variables can be implemented with a 2^k x n ROM.
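As a software analogy (a C sketch; the two functions chosen are illustrative), a 2^k x n ROM is simply a lookup table indexed by the k input variables, with each word holding the n function outputs:

#include <stdio.h>

/* Sketch: a 2^3 x 2 ROM implementing 2 combinational functions of the same
 * 3 inputs (a, b, c). Bit 0 of each word is XOR(a,b,c); bit 1 is the
 * majority function of a, b, c. The ROM address is the input combination. */
static const unsigned char rom[8] = {
    /* abc=000 */ 0x0, /* 001 */ 0x1, /* 010 */ 0x1, /* 011 */ 0x2,
    /* 100 */     0x1, /* 101 */ 0x2, /* 110 */ 0x2, /* 111 */ 0x3,
};

int main(void)
{
    unsigned addr = 0x5;              /* inputs a=1, b=0, c=1 */
    unsigned word = rom[addr & 0x7];  /* "read" the ROM word */
    printf("xor=%u majority=%u\n", word & 1, (word >> 1) & 1);
    return 0;
}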
Most common types of on-chip ROM include:
 MROM (mask ROM), which is ROM (with data content) that is permanently etched into the
microchip during the manufacturing of the processor, and cannot be modified later.
 PROMs (programmable ROM), or OTPs (one-time programmable ROM), can be integrated on-chip and are one-time programmable by a PROM programmer (in other words, they can be programmed outside the manufacturing factory).
 EPROM (erasable programmable ROM) can be integrated on a processor, and its content can be erased and reprogrammed more than once (the number of times erasure and re-use can occur depends on the processor). The content of EPROM is written to the device using special separate devices, and erased, typically in its entirety, using devices that output intense ultraviolet light through the processor's built-in window.
 EEPROM (electrically erasable programmable ROM) can be erased and reprogrammed more than once; the number of times erasure and re-use can occur depends on the processor. Unlike EPROMs, the content of EEPROM can be written and erased without using any special devices while the embedded system is functioning, and erasure can be done selectively (e.g., byte by byte) rather than only in its entirety.
RAM: “Random-access” memory
 Typically volatile memory
 bits are not held without power supply
 Read and written to easily by embedded system during execution
 Internal structure more complex than ROM
 a word consists of several memory cells, each storing 1 bit
 each input and output data line connects to each cell in its column
 rd/wr connected to every cell
 when row is enabled by decoder, each cell has logic that stores input
data bit when rd/wr indicates write or outputs stored bit when rd/wr
indicates read
 RAM (random access memory) is main memory in which any location can be accessed directly (randomly, rather than sequentially from some starting point), and whose content can be changed more than once (how many times depends on the hardware).
 Unlike ROM, the contents of RAM are erased if RAM loses power, meaning RAM is volatile.
Basic types of RAM
 SRAM: Static RAM
 Memory cell uses flip-flop to store bit
 Requires 6 transistors
 Holds data as long as power supplied
 DRAM: Dynamic RAM
 Memory cell uses MOS transistor and capacitor to store bit
 More compact than SRAM
 “Refresh” required due to capacitor leak
 word’s cells refreshed when read
RAM variations
 PSRAM: Pseudo-static RAM
 DRAM with built-in memory refresh controller
 Popular low-cost high-density alternative to SRAM
 NVRAM: Nonvolatile RAM
 Holds data after external power removed
 Battery-backed RAM
 SRAM with own permanently connected battery
 writes as fast as reads
 no limit on number of writes unlike nonvolatile ROM-based memory
 SRAM with EEPROM or flash
 stores complete RAM contents on EEPROM or flash before power turned off
Composing memory:
 Memory size needed often differs from size of readily available memories
 When available memory is larger, simply ignore unneeded high-order address bits and
higher data lines
 When available memory is smaller, compose several smaller memories into one larger
memory
 Connect side-by-side to increase width of words
 Connect top to bottom to increase number of words
 added high-order address line selects smaller memory containing desired word using
a decoder
 Combine techniques to increase number and width of words
Composing memory (figures): a. increase width of words, b. increase number of words, c. increase number and width of words.
 Address Size Expansion: a 32x4 RAM module built from 8x4 RAM chips, assuming a processor with 7 address lines (A0-A6).
[Circuit diagram: four 8x4 RAM chips (RAM1-RAM4) share data lines D0-D3 and address lines A0-A2; A3-A4 feed a 2x4 decoder whose outputs Y0-Y3 drive the chip selects (CS), and A5-A6 drive the address selection circuit that enables the decoder. RD and WR are common to all chips.]
 Effect of the Address Selection Circuit
 The memory block occupied by the memory module depends on the connection of the address selection circuit (an AND gate) that enables the decoder.
 Two address lines (A5, A6) are used to control the address selection circuit, so the circuit can be configured to occupy four different areas in the address space.

The four possible memory map configurations (selected by which values of A6 and A5 enable the address selection circuit; all addresses in hex):
A6 A5 = 00: RAM1 00-07, RAM2 08-0F, RAM3 10-17, RAM4 18-1F; 20-7F not used
A6 A5 = 01: 00-1F not used; RAM1 20-27, RAM2 28-2F, RAM3 30-37, RAM4 38-3F; 40-7F not used
A6 A5 = 10: 00-3F not used; RAM1 40-47, RAM2 48-4F, RAM3 50-57, RAM4 58-5F; 60-7F not used
A6 A5 = 11: 00-5F not used; RAM1 60-67, RAM2 68-6F, RAM3 70-77, RAM4 78-7F
Example: a 32x4 RAM module built from 8x4 RAM chips, assuming a processor with 8 address lines (A0-A7).
[Circuit diagram: four 8x4 RAM chips share data lines D0-D3 and address lines A0-A2; A3-A4 feed the 2x4 decoder driving the chip selects, and A5-A7 drive the address selection circuit that enables the decoder. RD and WR are common to all chips.]
Memory Map for the previous example
 There are three address lines (A5, A6, A7) connected to the address selection circuit, so there can be eight different memory map configurations.
 Three possible memory map configurations are shown below (all addresses in hex):
A7 A6 A5 = 000: RAM1 00-07, RAM2 08-0F, RAM3 10-17, RAM4 18-1F; 20-FF not used
A7 A6 A5 = 101: 00-9F not used; RAM1 A0-A7, RAM2 A8-AF, RAM3 B0-B7, RAM4 B8-BF; C0-FF not used
A7 A6 A5 = 111: 00-DF not used; RAM1 E0-E7, RAM2 E8-EF, RAM3 F0-F7, RAM4 F8-FF
Design Example:
Design an 8KX8 RAM module using 2KX8 RAM chips. The module should be connected on an 8-bit
processor with a 16-bit address bus, and occupy the address range starting from the address A000.
Show the circuit and the memory map.
 Number of memory devices needed = 8K / 2K = 4
 Starting address = A000 = 1010 0000 0000 0000, so A15 = 1, A14 = 0, A13 = 1
 Number of address lines on each 2Kx8 memory chip = 11, since 2K = 2^1 x 2^10 = 2^11 (A0..A10)
 Decoder needed = 2x4, so 2 address lines are needed for the decoder (A11..A12)
 Number of address lines needed for the address selection circuit = 16 - 11 - 2 = 3 (A13, A14, A15)
Memory map (addresses in hex):
0000-9FFF: not used
A000-A7FF: RAM1
A800-AFFF: RAM2
B000-B7FF: RAM3
B800-BFFF: RAM4
C000-FFFF: not used
Circuit Diagram
[Four 2Kx8 RAM chips share data lines D0-D7 and address lines A0-A10; A11-A12 feed a 2x4 decoder whose outputs drive the chip selects (CS), and A13-A15 drive the address selection circuit that enables the decoder. RD and WR are common to all chips.]
Example: Design an 8K x 32 memory out of 2K x 8 ROM chips, using appropriate control signals, and show the address range of each memory block.
First Step:
Combine the 2K x 8 ROMs into a 2K x 32 ROM (this requires four 2K x 8 ROM ICs per 2K x 32 unit). The address inputs are common and need to be connected in parallel. The data outputs are kept separate to form the 32 lines required. Don't forget there are also control lines, usually a chip enable and a read line (usually active LOW), but check the specs.
Second Step:
This involves combining four 2K x 32 ROM units. The input ADDRESS LINES (A0-A10) are connected together in parallel. The OUTPUT DATA lines are also connected together in parallel. This just leaves the problem of the CONTROL LINES. The READ line is simply common, as you want the ROM to output the data with a single 'read' signal. The CHIP ENABLE lines are used as an extra ADDRESS signal to ensure that only ONE 2K x 32 block is addressed at any given time. Address lines A11 and A12 give the full 8K address range for the ROM; a 2-to-4 line decoder converts these address lines into CHIP ENABLE selections.
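A rough C sketch of this second-step decoding (the example address is arbitrary): the low-order lines A0-A10 select a word within a 2K block, and A11-A12 select which block's chip enable is asserted:

#include <stdint.h>
#include <stdio.h>

/* Sketch: split a 13-bit address (8K words) into the word offset inside a
 * 2K x 32 block (A0..A10) and the 2-to-4 decoder inputs (A11..A12) that
 * assert exactly one chip enable. */
typedef struct {
    unsigned chip;    /* which 2K x 32 block is enabled (0..3) */
    unsigned offset;  /* word offset inside that block (0..2047) */
} decode_t;

static decode_t decode(uint16_t addr)
{
    decode_t d;
    d.offset = addr & 0x07FF;        /* A0..A10 */
    d.chip   = (addr >> 11) & 0x3;   /* A11..A12 -> decoder inputs */
    return d;
}

int main(void)
{
    decode_t d = decode(0x1803);     /* arbitrary example address */
    printf("chip %u, offset 0x%03X\n", d.chip, d.offset);  /* chip 3, offset 0x003 */
    return 0;
}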
On-Chip Memory
Embedded platforms have a memory hierarchy, a collection of different types of memory, each with unique speeds, sizes,
and usages.
Some of this memory can be physically integrated on the processor, such as registers, read-only memory (ROM), certain
types of random access memory (RAM), and level-1 cache.
1. Registers
 CPU registers are at the topmost level of this hierarchy; they hold the most frequently used data. They are very limited in number and are the fastest.
 They are often used by the CPU and the ALU for performing arithmetic and logical operations and for temporary storage of data.
2. Cache
 The very next level consists of small, fast cache memories near the CPU. They act as
staging areas for a subset of the data and instructions stored in the relatively slow main
memory.
 There are often two or more levels of cache as well. The cache at the topmost level after the registers is the primary cache; the others are secondary caches.
 Often there is a cache on-chip with the CPU, along with other cache levels outside the chip.
3. Main Memory:
 The next level is the main memory; it stages data stored on large, slow disks, often called hard disks. These hard disks are also called secondary memory and are the last level in the hierarchy. The main memory is also called primary memory.
 The secondary memory often also serves as a staging area for data stored on the disks or tapes of other machines connected by networks.
Cache
 Cache is a small amount of fast, expensive memory.
 The cache goes between the processor and the slower, dynamic main memory.
 It keeps a copy of the most frequently used data from the main memory.
 Memory access speed increases overall, because we’ve made the common case faster.
 Reads and writes to the most frequently used addresses will be serviced by the cache.
 We only need to access the slower main memory for less frequently used data.
The principle of locality
 It's usually difficult or impossible to figure out what data will be most frequently accessed before a program actually runs, which makes it hard to know what to store in the small, precious cache memory.
 But in practice, most programs exhibit locality, which the cache can take advantage of.
 The principle of temporal locality says that if a program accesses one memory
address, there is a good chance that it will access the same address again.
 The principle of spatial locality says that if a program accesses one memory
address, there is a good chance that it will also access other nearby addresses
 Because locality of reference applies in a computer system, instead of fetching a single instruction, a group of instructions is sent from primary memory to the cache. These groups of instructions are called blocks.
 The entire primary memory is divided into equal-sized blocks, and cache memory is divided into blocks of the same size. For placing primary memory blocks into cache memory blocks, 3 different mapping techniques are available.
Cache memory mapping techniques:
 Direct Mapping
 Associative Mapping
 Set – Associative Mapping
Direct Mapping:
 The direct mapping rule is: if the ith block of main memory has to be placed at the jth block of cache memory, then the mapping is defined as:
j = i % (number of blocks in cache memory)
Let’s see this with the help of an example.
 Suppose, there are 4096 blocks in primary memory and 128 blocks in the cache memory.
Then the situation is like if I want to map the 0th block of main memory into the cache
memory, then I apply the above formula and I get:
0 % 128 = 0
 So, the 0th block of main memory is mapped to the 0th block of cache memory. Here, 128 is the
total number of blocks in the cache memory.
1 % 128 = 1
2 % 128 = 2
 The following diagram illustrates the direct mapping process.
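In place of the diagram, a small C sketch of the rule, using the 128-block cache from the example:

#include <stdio.h>

/* Sketch: direct mapping places main memory block i at cache block
 * i % CACHE_BLOCKS, so blocks 0, 128, 256, ... all compete for cache block 0. */
#define CACHE_BLOCKS 128

int main(void)
{
    unsigned blocks[] = { 0, 1, 2, 128, 129, 4095 };
    for (unsigned k = 0; k < 6; k++)
        printf("main memory block %4u -> cache block %3u\n",
               blocks[k], blocks[k] % CACHE_BLOCKS);
    return 0;
}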
Associative Mapping:
 In the direct cache memory mapping technique, the problem was that every block of main memory was mapped to one fixed block of cache memory. The major drawback was a high conflict miss rate: we had to replace a cache memory block even when other blocks in the cache memory were empty.
 Suppose I have already loaded the 0th block of main memory into the 0th block of cache memory using the direct mapping technique. Now suppose the next block that I need is block 128. Even if blocks 1, 2, 3, ... of cache memory are all empty, I still have to map block 128 of main memory to the 0th block of cache memory, since
128 % 128 = 0
 Therefore, I have to replace the previously loaded 0th block of main memory with block 128. That was the reason for the high conflict miss rate: I have to replace a cache block even if other cache blocks are empty. To overcome this drawback of the direct mapping technique, the concept of the associative mapping technique was introduced.
 The idea of the associative mapping technique is to avoid the high conflict miss: any block of main memory can be placed anywhere in the cache memory. Associative mapping is the fastest and most flexible mapping technique. The following diagram illustrates the associative mapping process.
Set Associative mapping:
 Set associative mapping is introduced to overcome the high conflict miss of the direct mapping technique and the large tag comparisons of associative mapping. In this cache memory mapping technique, the cache blocks are divided into sets. The set size is always a power of 2; if the cache has 2 blocks per set it is called 2-way set associative, and if it has 4 blocks per set it is called 4-way set associative.
 It basically means that instead of referring to a cache block directly, we refer to a particular set in the cache memory: a particular block of main memory is mapped to a particular set of the cache, and within that set the block can be mapped to any of the cache blocks that are available.
 Consider a system with 128 cache memory blocks and 4096 primary memory blocks. Here we consider 2 blocks in each set, i.e., a 2-way set-associative cache. Since there are 2 blocks in each set, there will be 64 sets in total in our cache memory.
 Here, to determine the set in which a main memory block will be placed, we use the rule: if the ith block of main memory has to be placed in the jth set of cache memory, then
j = i % (number of sets in cache)
 After determining the set position, the primary memory block may be placed in any block inside that set.
The following diagram illustrates this process.
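In place of the diagram, a small C sketch of the set-placement rule, using the 64-set, 2-way example above:

#include <stdio.h>

/* Sketch: with 128 cache blocks grouped 2 per set (64 sets), main memory
 * block i maps to set i % SETS and may occupy either block of that set,
 * so blocks 0, 64, 128, ..., 4032 all map to set 0. */
#define SETS 64

int main(void)
{
    unsigned blocks[] = { 0, 64, 128, 4032, 130 };
    for (unsigned k = 0; k < 5; k++)
        printf("main memory block %4u -> set %2u\n", blocks[k], blocks[k] % SETS);
    return 0;
}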
Mapping of main memory address to cache memory:
Example: Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words, and assume that the main memory is addressable by a 16-bit address. Main memory is 64K words, which will be viewed as 4K blocks of 16 words each.
1)Direct Mapping:

1) The simplest way to determine the cache location in which to store a memory block is the direct mapping technique.
2) In this technique, block j of main memory maps onto block (j modulo 128) of the cache. Thus main memory blocks 0, 128, 256, ... are stored at cache block 0; blocks 1, 129, 257, ... are stored at block 1, and so on.
3) Placement of a block in the cache is determined from the memory address. The memory address is divided into 3 fields; the lower 4 bits select one of the 16 words in a block.
4) When a new block enters the cache, the 7-bit cache block field determines the cache position in which this block must be stored.
5) The higher-order 5 bits of the block's memory address are stored in 5 tag bits associated with its location in the cache. They identify which of the 32 blocks that map onto this cache position is currently resident in the cache.
2) Associative Mapping:-

1) This is a more flexible mapping method, in which a main memory block can be placed into any cache block position.
2) In this technique, 12 tag bits are required to identify a memory block when it is resident in the cache.
3) The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see if the desired block is present. This is known as the associative mapping technique.
4) The cost of an associative-mapped cache is higher than the cost of a direct-mapped cache because of the need to search all 128 tag patterns to determine whether a block is in the cache. This is known as an associative search.
3) Set-Associative Mapping
1) It is the combination of direct and associative mapping
technique.
2) Cache blocks are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. Hence the contention problem of direct mapping is eased; at the same time, the hardware cost is reduced by decreasing the size of the associative search.
3) Consider a cache with two blocks per set. In this case, memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can occupy either of the two blocks within this set.
4) Having 64 sets means that the 6-bit set field of the address determines which set of the cache might contain the desired block. The tag bits of the address must be associatively compared to the tags of the two blocks of the set to check if the desired block is present. This is a two-way associative search.
Examples on cache mapping:
 Where would the byte from memory address 6146 be stored in a direct-mapped cache with 2^10 blocks and 2^2-byte blocks?
6146 in binary is 00...01 1000 0000 0010. The lowest 2 bits (10) give the byte offset within the block, the next 10 bits (10 0000 0000) give the cache block index (block 512), and the remaining high-order bits form the tag.
Example 1: A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a direct-mapped manner with 64 bytes per cache block. Assume that the size of each memory word is 1 byte.
 Calculate the number of bits in each of the Tag, Block, and Word fields of the memory address.
 Repeat the calculation assuming the cache is organized as a 2-way set-associative cache.
(Both parts are worked out in problem 3 below.)
Average Memory Access Time:
Average memory access time = (hit ratio x time taken to access memory on a hit) + ((1 - hit ratio) x time taken to access memory on a miss)
Miss ratio = 1 - hit ratio
Example: Assume that for a certain processor, a read request takes 50 nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose that while running a program, it was observed that 80% of the processor's read requests result in a cache hit. What is the average read access time in nanoseconds?
Given:
Hit ratio = 0.8
Time taken in case of a hit = 5 ns
Time taken in case of a miss = 50 ns
Average memory access time = 0.8 x 5 + (1 - 0.8) x 50 = 4 + 0.2 x 50 = 4 + 10 = 14 ns
1). Given the following three cache designs, find the one with the best performance by calculating the
average cost of access. Show all calculations.
(a) 4 Kbyte, 8-way set-associative cache with a 6% miss rate; cache hit costs one cycle, cache miss costs
12 cycles.
(b) 8 Kbyte, 4-way set-associative cache with a 4% miss rate; cache hit costs two cycles, cache miss costs
12 cycles.
(c) 16 Kbyte, 2-way set-associative cache with a 2% miss rate; cache hit costs three cycles, cache miss
costs 12 cycles.
a) 4 Kbyte, 8-way set-associative cache with a 6% miss rate; cache hit costs 1 cycle, cache miss costs 12 cycles.
miss rate = 0.06
hit rate = 1 - miss rate = 0.94
0.94 * 1 cycle (hit) + 0.06 * 12 cycles (miss) = 0.94 + 0.72 = 1.66 cycles avg.
b) 8 Kbyte, 4-way set-associative cache with a 4% miss rate; cache hit costs 2 cycles, cache miss costs 12 cycles.
miss rate = 0.04
hit rate = 1 - miss rate = 0.96
0.96 * 2 cycles (hit) + 0.04 * 12 cycles (miss) = 1.92 + 0.48 = 2.4 cycles avg.
c) 16 Kbyte, 2-way set-associative cache with a 2% miss rate; cache hit costs 3 cycles, cache miss costs 12 cycles.
miss rate = 0.02
hit rate = 1 - miss rate = 0.98
0.98 * 3 cycles (hit) + 0.02 * 12 cycles (miss) = 2.94 + 0.24 = 3.18 cycles avg.
BEST PERFORMANCE: a) 1.66 cycles avg.
2). Given a 2-level cache design where the hit rates are 88% for the smaller cache and 97% for the
larger cache, the access costs for a miss are 12 cycles and 20 cycles, respectively, and the access
cost for a hit is one cycle, calculate the average cost of access.
hit rate = .88
L1 miss / L2 hit rate = .12 * .97
L1 miss / L2 miss rate = .12 * .03
Avg. cost = (.88 * 1) + (.12 * .97 * 12) + (.12 * .03 * 20)
= .88 + 1.3968 + .072
= 2.3488 cycles
 3) A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized (1) in a direct-mapped manner with 64 bytes per cache block, and (2) as a 2-way set-associative cache. Assume that the size of each memory word is 1 byte. Calculate the number of bits in each of the Tag, Block, and Word fields of the memory address.
1. For direct-mapped:
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, number of bits in the Word field = 6
Cache size = 2K bytes = 2^11 bytes
Number of cache blocks = cache size / block size = 2^11 / 2^6 = 2^5
Therefore, number of bits in the Block field = 5
Total number of address bits = 16
Therefore, number of bits in the Tag field = 16 - 6 - 5 = 5
For a given 16-bit address, the 5 most significant bits represent the Tag, the next 5 bits represent the Block, and the 6 least significant bits represent the Word.
2. For 2-way set-associative:
Block size = 64 bytes = 2^6 bytes = 2^6 words
Therefore, number of bits in the Word field = 6
Cache size = 2K bytes = 2^11 bytes
Number of cache blocks per set = 2
Number of sets = cache size / (block size * number of blocks per set) = 2^11 / (2^6 * 2) = 2^4
Therefore, number of bits in the Set field = 4
Total number of address bits = 16
Therefore, number of bits in the Tag field = 16 - 6 - 4 = 6
Cache replacement policy
 The cache-replacement policy is the technique for choosing which cache block to replace when a fully-associative cache is full, or when a set-associative cache's set is full. Note that there is no choice in a direct-mapped cache; a main memory address always maps to the same cache address and thus replaces whatever block is already there.
 There are three common replacement policies.
1. A random replacement policy chooses the block to replace randomly. While simple to implement, this policy does nothing to prevent replacing a block that is likely to be used again soon.
2. A least-recently used (LRU) replacement policy replaces the block that has not been accessed for the longest time,
assuming that this means that it is least likely to be accessed in the near future. This policy provides for an
excellent hit/miss ratio but requires expensive hardware to keep track of the times blocks are accessed.
3. A first-in-first-out (FIFO) replacement policy uses a queue of size N, pushing each block address onto the queue
when the address is accessed, and then choosing the block to replace by popping the queue.
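A minimal C sketch of the FIFO policy for one N-way set (the structure and names are illustrative):

#include <stdio.h>

/* Sketch: FIFO replacement keeps resident block addresses in arrival order
 * in a circular queue; the victim is always the oldest entry (the head). */
#define WAYS 4

typedef struct {
    unsigned slot[WAYS];  /* block addresses currently resident in the set */
    unsigned head;        /* index of the oldest entry = next victim */
} fifo_set_t;

/* Replace the oldest resident block with new_block; return the evicted block. */
unsigned fifo_insert(fifo_set_t *s, unsigned new_block)
{
    unsigned victim = s->slot[s->head];
    s->slot[s->head] = new_block;
    s->head = (s->head + 1) % WAYS;   /* the next-oldest entry becomes the head */
    return victim;
}

int main(void)
{
    fifo_set_t set = { {10, 20, 30, 40}, 0 };          /* set already full */
    printf("evicted block %u\n", fifo_insert(&set, 50)); /* evicts 10, the oldest */
    return 0;
}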
Cache write techniques
 When we write to a cache, we must at some point update main memory. Such an update is only an issue for a data cache, since an instruction cache is read-only.
 There are two common update techniques, write-through and write-back.
 In the write-through technique, whenever we write to the cache, we also write to main memory, requiring the
processor to wait until the write to main memory completes. While easy to implement, this technique may
result in several unnecessary writes to main memory.
 For example, suppose a program writes to a block in the cache, then reads it, and then writes it again, with the
block staying in the cache during all three accesses. There would have been no need to update the main
memory after the first write, since the second write overwrites this first write.
 The write-back technique reduces the number of writes to main memory by writing a block to main
memory only when the block is being replaced, and then only if the block
was written to during its stay in the cache. This technique requires that we associate an
extra bit, called a dirty bit, with each block. We set this bit whenever we write to the
block in the cache, and we then check it when replacing the block to determine if we
should copy the block to main memory.
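A minimal C sketch of this dirty-bit bookkeeping (the block layout is illustrative):

#include <stdbool.h>
#include <stdint.h>

/* Sketch: each cache block carries a dirty bit, set on every write; on
 * replacement the block is copied back to main memory only if dirty. */
typedef struct {
    uint32_t tag;
    bool     valid;
    bool     dirty;
    uint8_t  data[64];
} cache_block_t;

void cache_write(cache_block_t *b, unsigned offset, uint8_t value)
{
    b->data[offset] = value;
    b->dirty = true;             /* block now differs from main memory */
}

void cache_replace(cache_block_t *b)
{
    if (b->valid && b->dirty) {
        /* write-back: copy b->data to main memory here */
    }
    b->dirty = false;            /* the incoming block starts out clean */
    /* load the new block into b->data and set tag/valid here */
}

int main(void)
{
    cache_block_t b = {0};
    b.valid = true;
    cache_write(&b, 0, 0xAB);    /* sets the dirty bit */
    cache_replace(&b);           /* would copy the block back first */
    return 0;
}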
Memory Management of External Memory
 There are several different types of memory that can be integrated into a system, and there are also differences between how software running on the CPU views memory addresses (logical/virtual addresses) and the actual physical memory addresses (the two-dimensional array of rows and columns). Memory managers are ICs designed to manage these issues.
 The two most common types of memory managers found on an embedded board are memory controllers (MEMC) and memory management units (MMUs).
 The two most common types of memory managers found on an embedded board are memory controllers
(MEMC) and memory management units (MMUs).
A memory controller (MEMC) is used to implement and provide glueless interfaces to the different types of memory in the system, such as SRAM and DRAM, synchronizing access to memory and verifying the integrity of the data being transferred. Memory controllers access memory directly using the memory's own physical two-dimensional addresses. The controller manages requests from the master processor and accesses the appropriate banks, awaiting feedback and returning that feedback to the master processor. In some cases, where the memory controller mainly manages one type of memory, it may be referred to by that memory's name, such as DRAM controller, cache controller, and so forth.
 Memory management units (MMUs) mainly provide the flexibility of having a larger virtual (abstract) memory space within an actual smaller physical memory. An MMU can exist outside the master processor and is used to translate logical (virtual) addresses into physical addresses (memory mapping), as well as to handle memory security (memory protection), control caching, handle bus arbitration between the CPU and memory, and generate the appropriate exceptions.

 In the case of translated addresses, the MMU can use level-1 cache, or portions of cache allocated as buffers for caching address translations (commonly referred to as the translation lookaside buffer, or TLB), on the processor to store the mappings of logical addresses to physical addresses. MMUs must also support the various schemes for translating addresses, mainly segmentation, paging, or some combination of both schemes.
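A simplified C sketch of translation through a TLB (the sizes, the preloaded mapping, and the miss handling are illustrative; a real MMU does this in hardware):

#include <stdint.h>
#include <stdio.h>

/* Sketch: the virtual page number is looked up in a tiny TLB of cached
 * page mappings; on a hit the physical frame number replaces it, and on a
 * miss the MMU would walk the page tables or raise an exception (omitted). */
#define PAGE_SHIFT  12           /* 4 KB pages */
#define TLB_ENTRIES 8

typedef struct { uint32_t vpn, pfn; int valid; } tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES] = {
    { 0x00012, 0x00345, 1 },     /* one preloaded mapping (made up) */
};

int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | (vaddr & 0xFFF);
            return 1;            /* TLB hit */
        }
    return 0;                    /* TLB miss: page-table walk / exception path */
}

int main(void)
{
    uint32_t pa;
    if (translate(0x00012ABC, &pa))
        printf("virtual 0x00012ABC -> physical 0x%08X\n", (unsigned)pa);
    return 0;
}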
