SJB Institute of Technology: CO & ARM Microcontrollers (21EC52)
Module-2
Memory System & Basic Processing Unit
Dr. Supreeth H S G
Associate Professor
Dept. of ECE SJBIT
VTU syllabus
Module – 02
Memory System: Basic Concepts, Semiconductor RAM Memories, Read Only Memories, Speed,
Size, and Cost, Cache Memories – Mapping Functions, Replacement Algorithms, Performance
Considerations.
Textbook 1: Chapter 5 – 5.1 to 5.4, 5.5 (5.5.1, 5.5.2), 5.6
Textbook:
Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, 5th Edition, Tata McGraw
Hill, 2002. (Listed topics only from Chapters 1, 2, 4, 5, 8).
Contents
1. Basic Concepts
2. Semiconductor RAM Memories
3. Read Only Memories
4. Speed, Size and Cost
5. Cache Memories –
Mapping Functions
Replacement Algorithms
6. Performance Considerations.
1. Some basic concepts
• Maximum size of the Main Memory
• byte-addressable
• CPU-Main Memory Connection
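Figure: Connection of the main memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; the memory has up to 2^k addressable locations. Control lines carry R/W, MFC, etc.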
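As a quick check of the "up to 2^k addressable locations" label in the figure, the following minimal sketch computes the size of the address space for a k-bit address bus. The function name and the sample bus widths are illustrative assumptions, not values from the slides.

```python
def addressable_locations(k: int) -> int:
    """Number of distinct addresses that a k-bit address bus can select."""
    return 2 ** k


# Sample address widths, assumed only for illustration.
for k in (16, 32):
    print(f"{k}-bit address bus: up to {addressable_locations(k)} locations")
```

In a byte-addressable memory each of these locations is one byte, so a 32-bit address covers 4 GiB.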
Some basic concepts(Contd.,)
Measures for the speed of a memory:
memory access time.
memory cycle time.
An important design issue is to provide a
computer system with as large and fast a
memory as possible, within a given cost target.
Several techniques to increase the effective size
and speed of the memory:
Cache memory (to increase the effective speed).
Virtual memory (to increase the effective size).
2. Semiconductor RAM memories
The Memory System
Internal organization of memory chips
• Each memory cell can hold one bit of information.
• Memory cells are organized in the form of an array.
• One row is one memory word.
• All cells of a row are connected to a common line, known as the
“word line”.
• Word line is connected to the address decoder.
• Sense/write circuits are connected to the data input/output lines of
the memory chip.
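Internal organization of memory chips (Contd.,)
Figure: Organization of bit cells in a memory chip. Address inputs A0 to A3 feed the address decoder, which drives the word lines W0 to W15; each row of memory cells (FF) connects to the bit lines for bits 7 to 0.
Figure: A memory cell with transistors T1 and T2, nodes X and Y, the word line, and the bit lines.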
Asynchronous DRAMs
• Static RAMs (SRAMs):
– Consist of circuits that are capable of retaining their state as long as the power
is applied.
– Volatile memories, because their contents are lost when power is interrupted.
– Access times of static RAMs are in the range of a few nanoseconds.
– However, the cost is usually high.
Fast Page Mode
Suppose we want to access consecutive bytes in the selected row.
This can be done without having to reselect the row.
Add a latch at the output of the sense circuit in each column.
All the latches are loaded when the row is selected.
Different column addresses can then be applied to select and place different bytes on the data lines.
A consecutive sequence of column addresses can be applied under the control of the CAS signal, without reselecting the row.
This allows a block of data to be transferred at a much faster rate than random accesses.
A small group of bytes is usually referred to as a block.
This transfer capability is referred to as the fast page mode feature.
Synchronous DRAMs
• Operation is directly synchronized with the processor clock signal.
• The outputs of the sense circuits are connected to a latch.
• During a Read operation, the contents of the cells in a row are loaded onto the latches.
• During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches.
• Data held in the latches that correspond to the selected columns are transferred to the output.
• For a burst mode of operation, successive columns are selected using the column address counter and the clock. The CAS signal need not be generated externally. New data is placed on the data lines at the rising edge of the clock.
Figure: Synchronous DRAM organization. Blocks: refresh counter, row address latch, row decoder, cell array, column address counter, column decoder, Read/Write circuits & latches, mode register and timing control, data input register, data output register. Signals: Row/Column address, Clock, RAS, CAS, R/W, CS, Data.
Latency, Bandwidth, and DDR SDRAMs
• Memory latency is the time it takes to transfer
a word of data to or from memory.
• Memory bandwidth is the number of bits or
bytes that can be transferred in one second.
• DDR SDRAMs
– Cell array is organized in two banks.
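The bandwidth definition above can be turned into a one-line calculation. This is a minimal sketch: the function name, the 64-bit data path, and the rate of 100 million transfers per second are assumed example values, not figures from the slides.

```python
def bandwidth_bytes_per_second(bus_width_bits: int, transfers_per_second: float) -> float:
    """Bytes moved per second = bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * transfers_per_second


# Assumed example: a 64-bit data path doing 100 million transfers per second.
example = bandwidth_bytes_per_second(64, 100e6)
print(f"{example / 1e6:.0f} MB/s")
```

Latency is a separate measure: it is the delay before the first word arrives, whereas bandwidth describes the sustained rate once data is flowing.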
Static memories
Implement a memory unit of 2M words of 32 bits each.
Use 512K × 8 static memory chips.
Each column consists of 4 chips. Each chip implements one byte position.
A chip is selected by setting its chip select control line to 1.
The selected chip places its data on the data output lines; the outputs of the other chips are in the high-impedance state.
21 bits are needed to address a 32-bit word.
The high-order 2 bits are needed to select the row, by activating one of the four Chip Select signals.
The remaining 19 bits are used to access specific byte locations inside each chip of the selected row.
Figure: Organization of a 2M × 32 memory module using 512K × 8 static memory chips. Labels: 21-bit address (A0 to A20), 19-bit internal chip address, 2-bit decoder, chip select, 19-bit address and 8-bit data input/output on each chip, data lines D31-24, D23-16, D15-8, D7-0.
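The address split described above can be sketched as follows: the high-order 2 bits go to the decoder that produces the four chip-select signals, and the low-order 19 bits go to every chip. The function names and the sample address are assumptions made for illustration.

```python
ROW_BITS = 2          # high-order bits, decoded into four chip-select signals
CHIP_ADDR_BITS = 19   # low-order bits, sent to every 512K x 8 chip


def decode(word_address: int):
    """Split a 21-bit word address into (row, internal chip address)."""
    if not 0 <= word_address < 1 << (ROW_BITS + CHIP_ADDR_BITS):
        raise ValueError("address does not fit in 21 bits")
    row = word_address >> CHIP_ADDR_BITS
    chip_address = word_address & ((1 << CHIP_ADDR_BITS) - 1)
    return row, chip_address


def chip_selects(row: int):
    """Output of the 2-bit decoder: exactly one select line is 1."""
    return [1 if r == row else 0 for r in range(1 << ROW_BITS)]


row, chip_address = decode(0x180003)   # sample address, assumed
print(row, chip_address, chip_selects(row))
```

Four rows of 512K words each give 4 × 512K = 2M words, which matches the required capacity.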
Dynamic memories
Large dynamic memory systems can be implemented using
DRAM chips in a similar way to static memory systems.
Placing large memory systems directly on the motherboard
will occupy a large amount of space.
Also, this arrangement is inflexible since the memory system cannot be expanded easily.
Figure: Use of a memory controller. Blocks: processor, memory controller, memory. Signals: Request, R/W, RAS, CAS, CS, Clock, Data.
3. Read-Only Memories (ROMs)
Figure: Processor, cache, and main memory.
• When the processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.
• Subsequent references to the data in this block of words are found in the cache.
• At any given time, only some blocks in the main memory are held in the cache.
Which blocks in the main memory are in the cache is determined by a
“mapping function”.
• When the cache is full, and a block of words needs to be transferred
from the main memory, some block of words in the cache must be
replaced. This is determined by a “replacement algorithm”.
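The roles of the mapping function and of replacement can be illustrated with a small sketch. The slides do not fix a particular mapping at this point, so the code assumes direct mapping, in which memory block j is placed in cache block j mod N and a conflicting block simply replaces the resident one. The class and method names are hypothetical.

```python
class DirectMappedCache:
    """Tracks which main-memory block occupies each cache block."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.resident = [None] * num_blocks  # None means the cache block is empty

    def access(self, memory_block: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        index = memory_block % self.num_blocks   # the assumed mapping function
        if self.resident[index] == memory_block:
            return True
        self.resident[index] = memory_block      # replaces whatever was there
        return False


cache = DirectMappedCache(num_blocks=4)
results = [cache.access(block) for block in (0, 1, 0, 4, 0)]
print(results)
```

Blocks 0 and 4 compete for the same cache block in this sketch, so the final access to block 0 misses even though the rest of the cache is not full.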
Cache hit
• The existence of a cache is transparent to the processor. The processor issues
Read and Write requests in the same manner.
• Read hit:
The data is obtained from the cache.
• Write hit:
The cache has a replica of the contents of the main memory, so there are two options.
Write-through protocol: the contents of the cache and the main memory are updated simultaneously.
Write-back (copy-back) protocol: only the contents of the cache are updated, and the block is marked as updated by setting a bit known
as the dirty bit or modified bit. The contents of the main memory are updated
when this block is replaced.
Cache miss
• If the data is not present in the cache, then a Read miss or Write miss
occurs.
• Read miss:
Block of words containing this requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor.
The desired word may also be forwarded to the processor as soon as it is
transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
• Write-miss:
If the write-through protocol is used, the contents of the main memory are
updated directly.
If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word
is overwritten with new information.
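The hit and miss cases above can be sketched in a few lines. This is a simplified model under stated assumptions: each block holds a single word, main memory is a dictionary, and the names TinyCache, Line, and evict are hypothetical.

```python
class Line:
    def __init__(self, value):
        self.value = value
        self.dirty = False   # dirty (modified) bit, used only by write-back


class TinyCache:
    def __init__(self, memory, write_back):
        self.memory = memory          # dict: address -> value (main memory)
        self.write_back = write_back
        self.lines = {}               # address -> Line held in the cache

    def read(self, address):
        if address not in self.lines:             # read miss: load the block
            self.lines[address] = Line(self.memory[address])
        return self.lines[address].value          # read hit

    def write(self, address, value):
        if address not in self.lines:             # write miss
            if not self.write_back:
                self.memory[address] = value      # write-through: memory only
                return
            self.lines[address] = Line(self.memory[address])  # bring block in
        line = self.lines[address]
        line.value = value
        if self.write_back:
            line.dirty = True                     # memory updated on eviction
        else:
            self.memory[address] = value          # write-through on a hit

    def evict(self, address):
        line = self.lines.pop(address)
        if line.dirty:
            self.memory[address] = line.value     # copy back on replacement


memory = {0: 10}
cache = TinyCache(memory, write_back=True)
cache.write(0, 99)
before = memory[0]    # memory is still stale here
cache.evict(0)
after = memory[0]     # memory is updated when the block is replaced
```

With write_back=False the same write reaches main memory immediately, at the cost of a memory access on every write.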
Cache Coherence Problem
• A bit called the “valid bit” is provided for each block.
• If the block contains valid data, then the bit is set to 1, else it is 0.
• Valid bits are set to 0, when the power is just turned on.
• When a block is loaded into the cache for the first time, the valid bit is set to 1.
• Data transfers between main memory and disk occur directly bypassing the cache.
• When the data on a disk changes, the main memory block is also updated.
• However, if the data is also resident in the cache, then the valid bit is set to 0.
• What happens if the data on the disk and in the main memory changes while the write-back protocol is being used?
• In this case, the data in the cache may also have changed, as indicated by the
dirty bit.
• The copies of the data in the cache and in the main memory are then different. This is
called the cache coherence problem.
• One option is to force a write-back before the main memory is updated from the
disk.
Contents-Part B
Basic Processing Unit
1. Some Fundamental Concepts
2. Execution of a Complete Instruction
3. Multiple Bus Organization
4. Hard-wired Control
5. Microprogrammed Control
6. Basic concepts of pipelining
1. Fundamental Concepts
• Processor fetches one instruction at a time and performs
the operation specified.
• Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
• Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
• The Instruction Register (IR) holds the instruction that is currently being executed.
2. Executing an Instruction
• Fetch the contents of the memory location pointed to by
the PC. The contents of this location are loaded into the
IR (fetch phase).
IR ← [[PC]]
• Assuming that the memory is byte addressable,
increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
• Carry out the actions specified by the instruction in the
IR (execution phase).
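The two register transfers of the fetch phase can be written out as a short sketch. Memory is modelled as a dictionary keyed by byte address, and the two-instruction program is a hypothetical example; each instruction is assumed to occupy 4 bytes, which is what the increment by 4 implies.

```python
WORD_BYTES = 4   # assumed instruction length in a byte-addressable memory


def fetch(memory, pc):
    """One fetch phase: returns the new (IR, PC) pair."""
    ir = memory[pc]               # IR <- [[PC]]
    return ir, pc + WORD_BYTES    # PC <- [PC] + 4


# Hypothetical program stored at byte addresses 0 and 4.
memory = {0: "Add (R3),R1", 4: "Move (R1),R2"}
pc = 0
ir, pc = fetch(memory, pc)
print(ir, pc)
```

Calling fetch again with the updated PC returns the instruction stored at address 4, which is how successive instructions are reached until a branch changes the PC.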
Internal Organization of the Processor
Figure: Single-bus organization of the datapath inside a processor.
Executing an Instruction
• Transfer a word of data from one processor
register to another or to the ALU.
• Perform an arithmetic or a logic operation and
store the result in a processor register.
• Fetch the contents of a given memory location
and load them into a processor register.
• Store a word of data from a processor register
into a given memory location.
Register Transfers
Figure 7.2. Input and output gating for the registers in Figure 7.1. Labels: internal processor bus, register Ri with Riin and Riout, register Y with Yin, constant 4, MUX with Select, ALU inputs A and B, register Z with Zin and Zout.
Register Transfers
• All operations and data transfers are controlled by the processor clock.
Figure 7.3. Input and output gating for one register bit. Labels: Bus, D flip-flop (D, Q), Riin, Riout, Clock.
Performing an Arithmetic or Logic
Operation
• The ALU is a combinational circuit that has no internal
storage.
• ALU gets the two operands from MUX and bus. The
result is temporarily stored in register Z.
• What is the sequence of operations to add the contents
of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
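The three control steps above can be traced with a small single-bus simulation. It is a sketch under simplifying assumptions: signal names ending in "out" gate a register onto the bus, names ending in "in" load a register from the bus, Zin loads the ALU result, and only the Add function is modelled. The function name run and the initial register values are hypothetical.

```python
def run(steps, registers):
    """Apply each step's set of control signals to a single-bus datapath."""
    state = dict(registers, Y=0, Z=0)
    for signals in steps:
        sources = [s[:-3] for s in signals if s.endswith("out")]
        if len(sources) != 1:
            raise ValueError("exactly one register may drive the bus per step")
        bus = state[sources[0]]
        alu = None
        if "Add" in signals:
            # The MUX feeds input A with Y when SelectY is set, else constant 4.
            operand_a = state["Y"] if "SelectY" in signals else 4
            alu = operand_a + bus            # input B comes from the bus
        for s in signals:
            if s == "Zin":
                if alu is None:
                    raise ValueError("Zin needs an ALU operation in the same step")
                state["Z"] = alu
            elif s.endswith("in"):
                state[s[:-2]] = bus
    return state


steps = [
    {"R1out", "Yin"},
    {"R2out", "SelectY", "Add", "Zin"},
    {"Zout", "R3in"},
]
final = run(steps, {"R1": 5, "R2": 7, "R3": 0})
print(final["R3"])
```

The check on the number of "out" signals reflects the single-bus constraint: only one register can place its contents on the bus in any step, which is why the addition needs three steps.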
Fetching a Word from Memory
• Address into MAR; issue Read operation; data into MDR.
Figure 7.4. Connection and control signals for register MDR. Labels: memory-bus data lines, internal processor bus, MDR, MDRoutE, MDRout.
Fetching a Word from Memory
• The response time of each memory access varies (cache
miss, memory-mapped I/O,…).
• To accommodate this, the processor waits until it
receives an indication that the requested operation has
been completed (Memory-Function-Completed, MFC).
• Move (R1), R2
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
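The wait for MFC in the sequence above can be sketched with a generator that stands in for the memory. The latency value, the register contents, and the function names are assumptions chosen for illustration; a real memory reports completion through the MFC control line rather than through a Python generator.

```python
def memory_read(memory, address, latency_cycles):
    """Yield (MFC, data) once per cycle; MFC becomes True when data is ready."""
    for _ in range(latency_cycles):
        yield False, None
    yield True, memory[address]


def move_indirect(regs, memory, latency_cycles):
    """Model of Move (R1),R2. Returns the number of cycles spent waiting."""
    mar = regs["R1"]                                   # MAR <- [R1]
    waited = 0
    for mfc, data in memory_read(memory, mar, latency_cycles):  # start Read
        if mfc:
            mdr = data                                 # load MDR from the bus
            break
        waited += 1                                    # wait for MFC
    regs["R2"] = mdr                                   # R2 <- [MDR]
    return waited


regs = {"R1": 100, "R2": 0}
memory = {100: 42}
waited = move_indirect(regs, memory, latency_cycles=3)
print(regs["R2"], waited)
```

Changing latency_cycles models a cache hit versus a miss: the sequence of steps is the same, only the number of wait cycles differs.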
Figure: Timing of the memory Read operation over steps 1 to 3. Signals shown: Clock, MR, MDRinE, Data.
Execution of a Complete Instruction
Table: Step, Action (control sequence).
Figure: Datapath with PC, incrementer, register file, constant 4, MUX, ALU (inputs A and B, output R), instruction decoder, IR, MDR, and MAR.
Exercise: … sequence for … instruction … phase? (Assume single bus architecture)
Figure 7.1. Single-bus organization of the datapath inside a processor. Labels: internal processor bus, PC, instruction decoder and control logic, MAR (address lines), MDR (data lines), IR, ALU with ALU control lines and carry-in, XOR, TEMP.
4. Hardwired Control
Overview
• To execute instructions, the processor must
have some means of generating the control
signals needed in the proper sequence.
• Two categories: hardwired control and
microprogrammed control
• A hardwired system can operate at high speed,
but with little flexibility.
Control Unit Organization
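Figure: Control unit organization. Blocks: clock (CLK), control step counter, decoder/encoder, IR. Inputs to the decoder/encoder: external inputs and condition codes. Output: control signals.
Figure: Control unit with a separate step decoder (T1, T2, …, Tn), instruction decoder (INS1, INS2, …, INSm), and encoder. Blocks and signals: IR, external inputs, condition codes, Run, End, control signals.
Figure 7.12. Generation of the Zin control signal for the processor in Figure 7.1. Inputs shown: T1, T4, T6.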
Generating End
• End = T7 • ADD + T5 • BR + (T5 • N + T4 • N̄) • BRN + …
Figure: Generation of the End control signal. Inputs: T7 with Add; T5 with Branch; T5 with N and T4 with N̄ for Branch<0.
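The End expression can be checked with a short Boolean function. This sketch covers only the three terms written out above (the terms behind the ellipsis are not modelled), and it reads the T4 term of the Branch<0 case as using the complement of N, which is an assumption about the original figure. The function name and argument names are hypothetical.

```python
def end_signal(step: int, instruction: str, n_flag: bool) -> bool:
    """End = T7.ADD + T5.BR + (T5.N + T4.not N).BRN, for the listed terms only."""
    def t(k: int) -> bool:
        return step == k   # Tk is 1 only during control step k

    return (
        (t(7) and instruction == "ADD")
        or (t(5) and instruction == "BR")
        or (((t(5) and n_flag) or (t(4) and not n_flag)) and instruction == "BRN")
    )


print(end_signal(7, "ADD", False))   # Add ends after step 7
print(end_signal(4, "BRN", False))   # branch not taken: ends early at step 4
print(end_signal(5, "BRN", True))    # branch taken: ends at step 5
```

In a hardwired control unit this expression is implemented with gates, so each instruction added to the machine adds terms to logic equations such as this one.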
Figure: Block diagram of a processor with an instruction cache, a data cache, and a bus interface, connected over the system bus to the main memory and the input/output.
Microinstructions (one bit per control signal):
Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
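The bit pattern of each microinstruction can be decoded back into the control signals it asserts. The sketch below assumes the column order PCin, PCout, MARin, Read, MDRout, IRin, Yin, Select, Add, Zin, Zout, R1out, R1in, R3out, WMFC, End; that order is an inference from the signal labels on the slide, chosen because it makes the seven rows read as a consistent control sequence for Add (R3),R1. The names SIGNALS, ROWS, and active_signals are hypothetical.

```python
# Assumed column order of the 16 control-signal bits.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

# The seven microinstructions, copied row by row from the table.
ROWS = {
    1: "0111000111000000",
    2: "1000001000100010",
    3: "0000110000000000",
    4: "0011000000000100",
    5: "0000001000010010",
    6: "0000100011000000",
    7: "0000000000101001",
}


def active_signals(bits: str):
    """Names of the control signals whose bit is 1 in a microinstruction."""
    return [name for name, bit in zip(SIGNALS, bits) if bit == "1"]


for step, bits in ROWS.items():
    print(step, ", ".join(active_signals(bits)))
```

Most bits in every row are 0, which is the inefficiency that motivates grouping mutually exclusive signals into encoded fields.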
Overview
• Control store
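Figure: Basic organization of a microprogrammed control unit. Blocks: IR, starting address generator, µPC (driven by Clock), control store, CW.
One function cannot be carried out by this simple organization.
Figure: Organization with a starting and branch address generator. Blocks: IR, starting and branch address generator (with condition codes as input), µPC (driven by Clock), control store, CW.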
Microinstruction fields:
F1: 0000 No transfer; 0001 PCout; 0010 MDRout; 0011 Zout; 0100 R0out; 0101 R1out; 0110 R2out; 0111 R3out; 1010 TEMPout; 1011 Offsetout
F2: 000 No transfer; 001 PCin; 010 IRin; 011 Zin; 100 R0in; 101 R1in; 110 R2in; 111 R3in
F3: 000 No transfer; 001 MARin; 010 MDRin; 011 TEMPin; 100 Yin
F4: 0000 Add; 0001 Sub; …; 1111 XOR (16 ALU functions)
F5: 00 No action; 01 Read; 10 Write
F6 (1 bit), F7 (1 bit), F8 (1 bit)
What is the price paid for this scheme?
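The field encoding can be sketched as a small encoder that maps control-signal names to field codes and concatenates the fields. Several points are assumptions: the widths of F1 to F5 are taken from the code lengths shown on the slide, F1 is placed in the most significant position, only a subset of the codes is included, and F6 to F8 are left at 0 because their meanings are not given here. The names FIELDS, CODES, and encode are hypothetical.

```python
# Field widths in bits (F1 to F5 inferred from the code lengths on the slide).
FIELDS = [("F1", 4), ("F2", 3), ("F3", 3), ("F4", 4),
          ("F5", 2), ("F6", 1), ("F7", 1), ("F8", 1)]

# A subset of the codes listed on the slide.
CODES = {
    "F1": {"PCout": 0b0001, "MDRout": 0b0010, "Zout": 0b0011, "R1out": 0b0101},
    "F2": {"PCin": 0b001, "IRin": 0b010, "Zin": 0b011, "R3in": 0b111},
    "F3": {"MARin": 0b001, "MDRin": 0b010, "TEMPin": 0b011, "Yin": 0b100},
    "F5": {"Read": 0b01, "Write": 0b10},
}


def encode(signals):
    """Return the microinstruction as a bit string, F1 first."""
    values = {}
    for signal in signals:
        for field, table in CODES.items():
            if signal in table:
                if field in values:
                    raise ValueError(f"{field} can carry only one signal per step")
                values[field] = table[signal]
                break
        else:
            raise ValueError(f"unknown signal: {signal}")
    return "".join(format(values.get(name, 0), f"0{width}b")
                   for name, width in FIELDS)


word = encode(["Zout", "R3in"])
print(word, len(word))
```

The ValueError branch shows one price of this scheme: two signals that share a field, such as PCout and Zout, cannot be asserted in the same microinstruction, and the codes must be decoded before they can drive the datapath.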
Figure: Microinstruction-sequencing organization. Blocks: decoding circuits (inputs: external inputs, condition codes, IR), µAR, control store, next address field, µIR, microinstruction decoder. Output: control signals.
Address   Microinstruction
(octal)   F0        F1   F2   F3   F4    F5  F6  F7  F8  F9  F10
000       00000001  001  011  001  0000  01  1   0   0   0   0
001       00000010  011  001  100  0000  00  0   1   0   0   0
002       00000011  010  010  000  0000  00  0   0   0   0   0
003       00000000  000  000  000  0000  00  0   0   1   1   0
121       01010010  100  011  001  0000  01  1   0   0   0   0
122       01111000  011  100  000  0000  00  0   1   0   0   1
170       01111001  010  000  001  0000  01  0   1   0   0   0
171       01111010  010  000  100  0000  00  0   0   0   0   0
172       01111011  101  011  000  0000  00  0   0   0   0   0
173       00000000  011  101  000  0000  00  0   0   0   0   0
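The next-address idea can be illustrated by following the first eight bits of each microinstruction from one control-store address to the next. Treating those eight bits as a next-address field F0 is an assumption; it is supported by the fact that each value equals the octal address of the following row. The sketch ignores any modification of the address by other fields, and the names F0 and trace are hypothetical.

```python
# First 8 bits of each microinstruction, keyed by control-store address (octal).
F0 = {
    0o000: "00000001", 0o001: "00000010", 0o002: "00000011", 0o003: "00000000",
    0o121: "01010010", 0o122: "01111000", 0o170: "01111001", 0o171: "01111010",
    0o172: "01111011", 0o173: "00000000",
}


def trace(start: int):
    """Follow the next-address field until it points to 000 or leaves the table."""
    path = [start]
    while True:
        next_address = int(F0[path[-1]], 2)
        path.append(next_address)
        if next_address == 0o000 or next_address not in F0:
            return [format(a, "03o") for a in path]


print(trace(0o121))
```

Because every microinstruction names its successor explicitly, related microroutines do not have to sit in consecutive control-store locations, as the jump from 122 to 170 shows.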
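Figure: Control circuitry of the microprogrammed control unit. Blocks: IR (Rsrc and Rdst fields) with two decoders, decoding circuits (inputs: external inputs, condition codes), µAR, control store, microinstruction decoder. Signals: InstDecout, ORmode, ORindsrc, Rdstout, Rdstin, Rsrcout, Rsrcin.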