COD - Unit-3 - N - 4 - PPT AJAY Kumar

The document provides information about Computer Organization and Design units 3 and 4. It discusses the Arithmetic Logic Unit (ALU) which performs arithmetic and logical operations. It describes how the ALU handles integers and floating point numbers. It also covers topics like integer representation using sign-magnitude and two's complement methods, addition, subtraction, multiplication, division, and floating point number representation according to the IEEE 754 standard. Furthermore, it distinguishes between hardwired and microprogrammed control units, with hardwired used for RISC instruction sets and microprogrammed used for CISC instruction sets.

Uploaded by

Ajay Kumar

Computer Organization and Design

(BCO 009A)

UNIT-3,4
Submitted By:
Ajay Kumar
Assistant Professor-I
CSE


UNIT - 3


Arithmetic & Logic Unit (ALU)

• Part of the computer that actually performs arithmetic and


logical operations on data
• All of the other elements of the computer system are there
mainly to bring data into the ALU for it to process and then to
take the results back out
• Based on the use of simple digital logic devices that can store
binary digits and perform simple Boolean logic operations
Arithmetic & Logic Unit

• Does the calculations


• Everything else in the computer is there to service this unit
• Handles integers
• May handle floating point (real) numbers
• May be a separate FPU (maths co-processor)
• May be an on-chip FPU (486DX and later)
ALU Inputs and Outputs
Integer Representation

• Only have 0 & 1 to represent everything


• Positive numbers stored in binary
• e.g. 41=00101001
• No minus sign
• No period
• Sign-Magnitude
• Two’s complement
Sign-Magnitude

• Left most bit is sign bit


• 0 means positive
• 1 means negative
• +18 = 00010010
• -18 = 10010010
• Problems
• Need to consider both sign and magnitude in arithmetic
• Two representations of zero (+0 and -0)
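The two-zeros problem can be seen in a short sketch (illustrative Python, not from the slides):

```python
def sm_encode(value, bits=8):
    """Sign-magnitude: sign in the MSB, magnitude in the remaining bits."""
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)

def sm_decode(word, bits=8):
    """Decode a sign-magnitude word back to an integer."""
    magnitude = word & ((1 << (bits - 1)) - 1)
    return -magnitude if word >> (bits - 1) else magnitude

print(format(sm_encode(18), "08b"))    # 00010010
print(format(sm_encode(-18), "08b"))   # 10010010
# Both representations of zero decode to the same value:
print(sm_decode(0b00000000), sm_decode(0b10000000))  # 0 0
```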
Two’s Complement

• +3 = 00000011
• +2 = 00000010
• +1 = 00000001
• +0 = 00000000
• -1 = 11111111
• -2 = 11111110
• -3 = 11111101
Benefits

• One representation of zero


• Arithmetic works easily (see later)
• Negating is fairly easy
• 3 = 00000011
• Boolean complement gives 11111100
• Add 1 to LSB 11111101
Negation Special Case 1

• 0= 00000000
• Bitwise not 11111111
• Add 1 to LSB +1
• Result 1 00000000
• Overflow is ignored, so:
• -0=0
Negation Special Case 2

• -128 = 10000000
• bitwise not 01111111
• Add 1 to LSB +1
• Result 10000000
• So:
• -(-128) = -128 (wrong!)
• Monitor MSB (sign bit)
• It should change during negation
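The negation rule and both special cases can be checked with a small sketch (illustrative Python; the mask models the fixed register width):

```python
def negate(word, bits=8):
    """Two's-complement negation: bitwise NOT, add 1, discard the carry out."""
    mask = (1 << bits) - 1
    return ((~word) + 1) & mask

print(format(negate(0b00000011), "08b"))  # 11111101  (-3)
print(format(negate(0b00000000), "08b"))  # 00000000  (-0 = 0, overflow ignored)
print(format(negate(0b10000000), "08b"))  # 10000000  (-(-128) wraps back to -128)
```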
Range of Numbers

• 8 bit two's complement
• +127 = 01111111 = 2^7 - 1
• -128 = 10000000 = -2^7
• 16 bit two's complement
• +32767 = 01111111 11111111 = 2^15 - 1
• -32768 = 10000000 00000000 = -2^15
Conversion Between Lengths

• Positive number pack with leading zeros


• +18 = 00010010
• +18 = 00000000 00010010
• Negative numbers pack with leading ones
• -18 = 11101110
• -18 = 11111111 11101110
• i.e. pack with copies of the MSB (sign bit): sign extension
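Sign extension is mechanical enough to sketch (illustrative Python; the function and parameter names are hypothetical):

```python
def sign_extend(word, from_bits, to_bits):
    """Widen a two's-complement word by replicating its sign bit (MSB)."""
    sign = (word >> (from_bits - 1)) & 1
    if sign:  # negative: pack the new high bits with ones
        word |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return word

print(format(sign_extend(0b00010010, 8, 16), "016b"))  # 0000000000010010 (+18)
print(format(sign_extend(0b11101110, 8, 16), "016b"))  # 1111111111101110 (-18)
```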
Addition
OVERFLOW RULE:

If two numbers are added,


and they are both positive or
both negative, then overflow
occurs if and only if the result
has the opposite sign.
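The overflow rule can be expressed directly (a Python sketch; operands are 8-bit two's-complement patterns):

```python
def add_signed(a, b, bits=8):
    """Add two two's-complement words and apply the overflow rule:
    overflow iff both operands have the same sign and the result differs."""
    mask = (1 << bits) - 1
    result = (a + b) & mask
    sign = lambda w: (w >> (bits - 1)) & 1
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return result, overflow

print(add_signed(0b01111111, 0b00000001))  # 127 + 1 overflows to -128: (128, True)
print(add_signed(0b00000011, 0b00000010))  # 3 + 2 = 5, no overflow: (5, False)
```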
SUBTRACTION RULE:

To subtract one number


(subtrahend) from another
(minuend), take the twos
complement (negation) of
the subtrahend and add it
to the minuend.
Subtraction
Addition and Subtraction

• Normal binary addition


• Monitor sign bit for overflow

• Take two's complement of subtrahend and add to minuend


• i.e. a - b = a + (-b)

• So we only need addition and complement circuits
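Putting the subtraction rule together (an illustrative Python sketch):

```python
def subtract(a, b, bits=8):
    """a - b = a + (-b): negate the subtrahend, add, discard the carry out."""
    mask = (1 << bits) - 1
    neg_b = ((~b) + 1) & mask     # two's-complement negation of b
    return (a + neg_b) & mask

print(format(subtract(7, 3), "08b"))  # 00000100 (4)
print(format(subtract(3, 7), "08b"))  # 11111100 (-4 in two's complement)
```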


Hardware for Addition and Subtraction
Flow chart for Addition and subtraction with
signed magnitude Data
Multiplication

• Complex
• Work out partial product for each digit
• Take care with place value (column)
• Add partial products
Multiplication Example

      1011     Multiplicand (11 dec)
    x 1101     Multiplier (13 dec)
      1011     Partial products
     0000      Note: if a multiplier bit is 1, copy the
    1011       multiplicand (shifted to its place value),
   1011        otherwise the partial product is zero
  10001111     Product (143 dec)

• Note: need double length result
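The same paper-and-pencil procedure, as a shift-and-add sketch in Python:

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add: one partial product per multiplier bit."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:         # multiplier bit is 1:
            product += multiplicand << i  # add multiplicand at its place value
    return product                        # double-length result

print(format(multiply_unsigned(0b1011, 0b1101), "08b"))  # 10001111 (143 dec)
```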
Hardware
Implementation
of Unsigned
Binary
Multiplication
Flowchart for
Unsigned Binary
Multiplication
Twos Complement Multiplication
Comparison
Execution of Example
Multiplying Negative Numbers

• This does not work!


• Solution 1
• Convert to positive if required
• Multiply as above
• If signs were different, negate answer
• Solution 2
• Booth’s algorithm
Booth’s Algorithm
Example of Booth’s Algorithm
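A minimal sketch of Booth's recoding idea (illustrative Python: scan multiplier bit pairs and subtract the shifted multiplicand at the start of a run of 1s, add it at the end):

```python
def booth_multiply(m, q, bits=4):
    """Booth's algorithm sketch: m and q are signed ints fitting in `bits` bits.
    Bit pair (1, 0) -> subtract m << i; pair (0, 1) -> add m << i."""
    qbits = q & ((1 << bits) - 1)  # two's-complement bit pattern of q
    product, prev = 0, 0
    for i in range(bits):
        bit = (qbits >> i) & 1
        if bit == 1 and prev == 0:
            product -= m << i      # start of a run of 1s
        elif bit == 0 and prev == 1:
            product += m << i      # end of a run of 1s
        prev = bit
    return product

print(booth_multiply(7, 3))    # 21
print(booth_multiply(7, -3))   # -21  (negative multiplier handled directly)
print(booth_multiply(-8, -8))  # 64
```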
Division
Flowchart for
Unsigned
Binary Division
Example of Restoring Twos Complement Division
Division of Unsigned Binary Integers

              00001101    Quotient
Divisor 1011 ) 10010011   Dividend
               1011
                001110    Partial
                  1011    remainders
                  001111
                    1011
                     100  Remainder
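The restoring-division procedure in the flowchart can be sketched as (illustrative Python):

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Restoring division: shift the partial remainder left, bring in the next
    dividend bit, try subtracting the divisor; keep it only if non-negative."""
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient = (quotient << 1) | 1
        else:
            quotient = quotient << 1    # "restore": subtraction not taken
    return quotient, remainder

print(divide_unsigned(0b10010011, 0b1011))  # (13, 4): 147 / 11 = 13 remainder 4
```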
Flowchart for Unsigned Binary Division
Real Numbers

• Numbers with fractions


• Could be done in pure binary
• 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
• Where is the binary point?
• Fixed?
• Very limited
• Moving?
• How do you show where it is?
Floating Point

• +/- .significand x 2^exponent

• Misnomer (an inaccurate name)
• Point is actually fixed between sign bit and body of mantissa
• Exponent indicates place value (point position)


Floating-Point Representation

• Principles
• With a fixed-point notation it is possible to represent a range of
positive and negative integers centered on or near 0
• By assuming a fixed binary or radix point, this format allows
the representation of numbers with a fractional component as
well
• Limitations:
• Very large numbers cannot be represented nor can very small
fractions
• The fractional part of the quotient in a division of two large
numbers could be lost
Floating-Point

• Significand
• The final portion of the word
• Any floating-point number can be expressed in many
ways
The following are equivalent, where the significand is expressed
in binary form:
0.110 * 2^5
110 * 2^2
0.0110 * 2^6
• Normal number
• The most significant digit of the significand is nonzero
Expressible Numbers
IEEE Standard 754

The most important floating-point representation is defined in IEEE Standard 754. The standard was developed to facilitate the portability of floating-point programs from one processor to another and to encourage the development of sophisticated, numerically oriented programs.

The standard has been widely adopted and is used on virtually all contemporary processors and arithmetic coprocessors. IEEE 754-2008 covers both binary and decimal floating-point representations.
IEEE 754-2008

• Defines the following different types of floating-point formats:


• Arithmetic format
• All the mandatory operations defined by the standard are supported
by the format. The format may be used to represent floating-point
operands or results for the operations described in the standard.
• Basic format
• This format covers five floating-point representations, three binary
and two decimal, whose encodings are specified by the standard,
and which can be used for arithmetic. At least one of the basic
formats is implemented in any conforming implementation.
• Interchange format
• A fully specified, fixed-length binary encoding that allows data
interchange between different platforms and that can be used for
storage.
IEEE 754
Formats
Table 10.3

IEEE 754

Format
Parameters

* not including implied bit and not including sign bit


Floating Point Examples
Signs for Floating Point

• Mantissa is stored in two's complement


• Exponent is in excess or biased notation
• e.g. Excess (bias) 128 means
• 8 bit exponent field
• Pure value range 0-255
• Subtract 128 to get correct value
• Range -128 to +127
Normalization

• FP numbers are usually normalized


• i.e. exponent is adjusted so that leading bit (MSB) of mantissa
is 1
• Since it is always 1 there is no need to store it
• (cf. scientific notation, where numbers are normalized to give
a single digit before the decimal point,
e.g. 3.123 x 10^3)
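The stored fields of a concrete IEEE 754 single-precision word can be unpacked as a sketch (illustrative Python; note that IEEE single precision uses an exponent bias of 127, slightly different from the excess-128 example above, and omits the implied leading 1):

```python
import struct

def decode_single(x):
    """Unpack an IEEE 754 single-precision value into sign, unbiased
    exponent, and 23-bit fraction (the implied leading 1 is not stored)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = word >> 31
    exponent = (word >> 23) & 0xFF     # stored with a bias of 127
    fraction = word & 0x7FFFFF
    return sign, exponent - 127, fraction

# 9.625 = 1001.101 (binary) = 1.001101 x 2^3
sign, exp, frac = decode_single(9.625)
print(sign, exp, format(frac, "023b"))  # 0 3 00110100000000000000000
```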
FP Ranges

• For a 32 bit number


• 8 bit exponent
• +/- 2^256 ≈ 1.5 x 10^77
• Accuracy
• The effect of changing the LSB of the mantissa
• 23 bit mantissa: 2^-23 ≈ 1.2 x 10^-7
• About 6 decimal places
Expressible Numbers
IEEE 754

• Standard for floating point storage


• 32 and 64 bit standards
• 8 and 11 bit exponent respectively
• Extended formats (both mantissa and exponent) for
intermediate results
Hardwired Vs Microprogrammed Control Unit
• The Hardwired and Microprogrammed control unit generates
the control signals to fetch and execute instructions. The
fundamental difference between hardwired and microprogrammed
control unit is that hardwired is a circuitry approach whereas, the
microprogram control unit is implemented by programming.
• The hardwired control unit is designed for the RISC style
instruction set. On the other hand, the microprogrammed control
unit was designed for the CISC style instruction set.
• These control units can be distinguished on the several parameters
which we have discussed below.
Hardwired Vs Microprogrammed Control Unit
• The hardwired control unit is implemented using a hardware
circuit while a microprogrammed control unit is implemented
by programming.
• A hardwired control unit is designed for RISC style instruction
set. On the other hand, a microprogrammed control unit is
for CISC style instruction set.
• Implementing modification in a microprogrammed control unit
is easier as it is easy to change the code. But implementing
modification in Hardwired control unit is difficult as changing the
circuitry will add cost.
• The hardwired control unit executes simple instructions easily
but has difficulty executing complex instructions. The
microprogrammed control unit executes complex
instructions easily.
Hardwired Vs Microprogrammed Control Unit
• Building up a hardwired control unit requires the hardware circuit
which is costly. Building up microprogrammed control unit is
cheaper as it only requires coding.
• The hardwired control unit does not require a control memory as
here; the control signal is generated by hardware. The
microprogrammed control unit requires the control memory as
the microprograms which are responsible for generating the
control signals are stored in control memory.
• Execution in the hardwired control unit is fast, as everything is
in hardware. But the microprogrammed control unit is slow, as it
has to access control memory for the execution of every instruction.
UNIT - 4


MEMORY ORGANIZATION

• Static memory and Dynamic Memory


• Memory Hierarchy

• Main Memory

• Auxiliary Memory

• Associative Memory

• Cache Memory

• Virtual Memory
Static memory and Dynamic Memory
Static RAM
• SRAM uses transistors (a flip-flop) to store a single bit of data
• SRAM does not need periodic refreshing to maintain data
• SRAM's structure is more complex than DRAM's
• SRAM is more expensive than DRAM
• SRAM is faster than DRAM
• SRAM is used in cache memory

Dynamic RAM
• DRAM uses a separate capacitor to store each bit of data
• DRAM needs periodic refreshing to maintain the charge in the
capacitors holding the data
• DRAM's structure is simpler than SRAM's
• DRAM is less expensive than SRAM
• DRAM is slower than SRAM
• DRAM is used in main memory
MEMORY HIERARCHY

The goal of the memory hierarchy is to obtain the highest possible
access speed while minimizing the total cost of the memory system.

[Figure: memory hierarchy — the CPU accesses registers and cache
directly; main memory is backed by auxiliary memory (magnetic disks
and tapes) through an I/O processor]

Fastest to slowest:
Register
Cache
Main Memory
Magnetic Disk
Magnetic Tape
MAIN MEMORY
RAM and ROM Chips

Typical RAM chip (128 x 8): chip select inputs CS1 and CS2, read (RD)
and write (WR) controls, a 7-bit address (AD7), and an 8-bit data bus.

CS1 CS2 RD WR   Memory function   State of data bus
 0   0   x  x   Inhibit           High-impedance
 0   1   x  x   Inhibit           High-impedance
 1   0   0  0   Inhibit           High-impedance
 1   0   0  1   Write             Input data to RAM
 1   0   1  x   Read              Output data from RAM
 1   1   x  x   Inhibit           High-impedance

Typical ROM chip (512 x 8): chip select inputs CS1 and CS2, a 9-bit
address (AD9), and an 8-bit data bus.
MEMORY ADDRESS MAP
Address space assignment to each memory chip

Example: 512 bytes RAM and 512 bytes ROM

Component   Hexa address   Address bus: 10 9 8 7 6 5 4 3 2 1
RAM 1       0000 - 007F                  0 0 0 x x x x x x x
RAM 2       0080 - 00FF                  0 0 1 x x x x x x x
RAM 3       0100 - 017F                  0 1 0 x x x x x x x
RAM 4       0180 - 01FF                  0 1 1 x x x x x x x
ROM         0200 - 03FF                  1 x x x x x x x x x

Memory Connection to CPU


- RAM and ROM chips are connected to a CPU
through the data and address buses

- The low-order lines in the address bus select


the byte within the chips and other lines in the
address bus select a particular chip through
its chip select inputs
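The chip-select logic of this memory map can be sketched as (illustrative Python; bit positions follow the address-map table, with bus line 1 as the least significant bit):

```python
def decode_address(addr):
    """Map a 10-bit address to a chip and an in-chip offset.
    Bus line 10 selects the ROM; lines 9-8 select one of the four RAMs."""
    if (addr >> 9) & 1:                 # line 10 set -> ROM (0200-03FF)
        return "ROM", addr & 0x1FF      # 9-bit in-chip address
    chip = (addr >> 7) & 0b11           # lines 9-8 -> decoder -> RAM select
    return f"RAM {chip + 1}", addr & 0x7F

print(decode_address(0x0000))  # ('RAM 1', 0)
print(decode_address(0x00FF))  # ('RAM 2', 127)
print(decode_address(0x0200))  # ('ROM', 0)
```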
CONNECTION OF MEMORY TO CPU

[Figure: the CPU's address bus, RD/WR lines, and 8-bit data bus connect
to four 128 x 8 RAM chips and one 512 x 8 ROM; address lines 8-9 feed a
decoder whose outputs drive the RAM chip selects, line 10 selects the
ROM, lines 1-7 address a byte within a RAM chip, and lines 1-9 address
a byte within the ROM]
The Memory System: Memory Hierarchy
A memory system normally comprises a hierarchy of memories:
• Cache - very fast (1 or 2 cycle access), but small (e.g. 32 KB-64 KB)
• built with SRAM on-board the processor chip
• designed as two separate caches (to improve bandwidth) - one for
instructions and one for data
• Main Memory - larger (typically 32 MB - 256 MB) and slower (50 ns
access) than cache
• built with DRAM chips on separate modules/card
• Virtual Memory - very large (say 2 GB - 16 GB), but also very slow (15 -
20 ms access)
• built with magnetic (hard) disk
• Ideally, we would like the memory system to always appear as very large
and very fast!!
Memory Systems: Hierarchy
• Concept of an infinite cache:
• fetches by the CPU for instructions or data normally come from
cache (say 95% of time)
• if instructions or operands are not in cache, a "miss" occurs and CPU
waits while MMU (memory management unit) goes to main memory
for the missing instruction or operand
• on the very rare occasion that the operand or instruction is not in
main memory, the CPU must go to the hard disk to find it (while the
processor either waits idle or branches)
• most of the time the instructions/data are available in cache
giving the appearance of a large, fast memory!
• Memory addressing: 32 bit address can access 4 GB of data/instructions
• Speed & Cost of 4GB DRAM Main Memory:
• if all memory were only main memory (DRAM), 4 GB would
cost $24,000 at $6/MB
• access time would be only 50 ns, rather than the 2-3 ns obtainable
with on-board cache
• Memory hierarchy is essential to achieving high speed, large
memory, & low cost!!!
Why it works: Locality of Reference
• temporal locality
• programs tend to contain loops (often nested loops) where an
instruction and/or data are accessed many times in sequence
• spatial locality
• instructions and/or data that are stored in contiguous (neighboring)
locations are often repeatedly accessed for reading or writing in a
typical program
• the memory hierarchy makes use of temporal and spatial locality by
transferring at one time a group of instructions/data into cache or into
main memory
• A group of instructions or data transferred from main memory into
cache is called a line of data (say 32 bytes)
• A group of instructions or data transferred from disk storage into
main memory is called a page of data (say 4K bytes)
Virtual memory = the appearance that all 4GB addressable memory resides
in main memory

Studies of the execution of computer programs have demonstrated the


importance of locality of reference in designing a hierarchical memory
system.
Temporal and spatial locality allow us to achieve a near-infinite cache
in practice for the operation of most computer programs!
thrashing = phenomenon of frequent disk accesses due to a particular
program perhaps accessing a database which does not fit entirely into main
memory
Solution: need a larger main memory!


Cache Memory Organization
• Cache organization schemes:
• direct mapped
• fully associative
• set-associative
• Line: A block of data transferred into cache at a given time (4B in text
illustrations)
• the memory address is comprised of 5 bit tag, 3 bit index, and 2 bit
byte fields
• the cache stores both the data (line) as well as the main memory
address (tag) of the data
Hit and Miss
When CPU requests data from cache, the address of requested data is
compared with addresses of data in cache. If both tag and index
addresses match (called a cache hit), the requested data is present in
cache
data word (or byte) is transferred to CPU
If the address of requested data does not match tag plus index address
of data present in cache, the cache signals the CPU that a cache miss
has occurred.
Main memory transfers a new line (containing the requested data
word) into the cache and also sends the requested word (or byte)
along to the CPU
When a cache miss occurs and a new line of data is to be transferred
in from main memory, the cache is likely already full of existing data
lines so that one of them needs to be replaced with the new data line.
If the line to be replaced has been changed since it was brought in
from main memory, it must be written back into main memory first.


Direct Mapped Cache
• cache address given by the index address bits
• Example at left: 8 lines stored in cache with 3 index address bits
• a memory line can be mapped to only one location in cache given by the
index address
• on a cache access, tag bits for given index are compared with the CPU address
tag bits
• cache hit: tag bits are identical to address tag
• word is fetched from cache and sent on the bus to the CPU
• cache miss: tag bits do not match address tag bits
• cache sends signal to main memory to fetch the correct line with
matching tag address bits
• new line of data (or instructions) is sent both to the CPU and to update
the cache
• direct mapping is usually not optimal for the cache hit ratio, since
only one line per index address can reside in the cache at any one time
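A minimal model of direct-mapped lookup with the 5-bit tag / 3-bit index / 2-bit byte split (illustrative Python; one tag stored per line, data omitted):

```python
def split_address(addr):
    """Split a 10-bit address into 5-bit tag, 3-bit index, 2-bit byte."""
    return addr >> 5, (addr >> 2) & 0b111, addr & 0b11

class DirectMappedCache:
    """8 lines (one per index); each line remembers only its tag."""
    def __init__(self):
        self.tags = [None] * 8

    def access(self, addr):
        tag, index, _byte = split_address(addr)
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag   # miss: fetch line, replace the old one
        return "miss"

cache = DirectMappedCache()
print(cache.access(0b0000010000))  # miss (cold)
print(cache.access(0b0000010001))  # hit  (same tag and index, other byte)
print(cache.access(0b1000010000))  # miss (same index, different tag)
```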
Fully Associative Cache Mapping
• Any memory line can be placed into any location in the cache
• no limitation to only store one line of data for a given index address as
in the direct mapped cache
• Tag address now includes the three bits that were previously labeled
index address
• When a request for data comes from the CPU, the entire main memory tag
address must be compared with all tag addresses presently residing in the
cache (next chart), to see if the requested data word is in the cache.
• If not, a miss occurs and a new data line (with the requested word) is
brought into the cache from main memory
Two-way Set Associative Cache Mapping
• A set associative cache is a compromise between direct mapped and fully
associative cache approaches
• Index bits again specify cache (set) address
• can have two lines of data per set (2-way set associative) with two
different tag addresses or four lines of data per set (4-way set
associative) with four different tag addresses
• Example at left: 2 index bits → 4 sets with two-way set associative
organization
• more realistic example: 16KB cache with 4-way set associativity, 16
bit address
• line size = 16 words = 64 bytes → 256 lines in the cache
• 4 groups of 64 sets of 64B lines in the cache → 6 bits in index
address, 6 bits for word & byte fields
• with a 16 bit address, there would be 4 bits left for the tag address
Set-Associative Cache Block Diagram

• Cache must have logic to perform 2-way or 4-way compare of cache tag
bits with CPU address tag bits
• if tag bits in cache match the CPU tag address, one of the match
logic outputs is “1”
• the selected word is gated out to the CPU/Main Memory bus by the
tri-state buffer
• only one match can occur for a given tag address
• a logic OR gate pulls Hit/Miss’ line high to signal to CPU a Hit is
achieved
• If the Tag address from CPU does not match any stored Tags, then both
Match logic circuits give zero outputs, pulling the Hit/Miss’ line low to
signal the CPU and main memory that a cache Miss has occurred.
Set-Associative Cache with 4-word Lines
• Cache lines are normally defined to contain many words
• 2n where n is the number of bits in the Word address field
• (earlier examples had assumed only one word per line for simplicity
of the charts)
• Example below:
• a Line contains 4 Words (each Word has 4 Bytes) implying 2 address
bits in the Word field
• the Index address contains 2 bits implying 4 sets of lines specified by
the Index address
• 10 bit address implies 4 bits in the Tag address field
• 2-way set associative organization implies that a given line of data can
be in either the “left” or “right” side of the cache at the specified
location given by index address bits
Cache Write/Replacement Method
Cache Replacement Algorithm:
• In the event of a cache miss, assuming all lines are filled in the
cache, some existing line must be replaced by the new line brought
in from main memory! Which line to replace?
• random replacement scheme
• FIFO scheme
• LRU (least recently used) algorithm
– a crude approximation to the LRU approach is often used
• Line size: bus between CPU and cache and between cache and memory
is made as wide as possible, based on the line size
• line size is a tradeoff between bandwidth to cache vs cost and
performance limitations
Cache Write Methods:
Write-Through
the result is written to main memory (and to the cache if there
is a cache hit)
write buffering sometimes used to avoid a slow down
Write-Back (also called copy-back)
CPU writes only to the cache (assuming a cache hit)
if a cache miss occurs, two choices are possible
write-allocate: read the line to be written from main
memory and then write to both main memory & cache
no-write-allocate: write back only to main memory
Valid bit: indicates that the associated cache line is valid
Dirty bit: indicates that a line has been written in cache (so the copy
in main memory is stale)


256K Cache Example – Block Diagram
• 256KB memory with 2-way set associativity and write-through
approach
• 32 bit memory address with byte addressing capability
• line size = 16 bytes = 4 words → 2 bits each for word address and
byte address
• index field = 13 bits → 8192 sets
• tag field = 15 bits
• 2-way set associativity → 16,384 line entries in cache = 64K words =
256K bytes
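These parameters can be sanity-checked arithmetically (a Python sketch of the derivation):

```python
# Sanity check of the 256 KB, 2-way, 16-byte-line example (32-bit address):
line_size   = 16                       # bytes -> 2 word bits + 2 byte bits
cache_size  = 256 * 1024               # bytes
ways        = 2
lines       = cache_size // line_size  # total line entries in the cache
sets        = lines // ways            # lines per way
index_bits  = sets.bit_length() - 1    # log2(sets)
offset_bits = 4                        # word + byte fields
tag_bits    = 32 - index_bits - offset_bits
print(lines, sets, index_bits, tag_bits)  # 16384 8192 13 15
```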
Additional Cache Hierarchy Design Issues

• Instruction and Data L1 Caches:


• two separate caches for instructions and for data increases bandwidth
from CPU to memory
• fetch instructions while at same time write or fetch data
• two separate caches allows individual (simpler) design
• instruction cache may be direct mapping while data cache may be
2-way or 4-way set associative
• sometimes a single cache (unified cache) is more economical/practical
Multiple-Level Caches:
a second level of cache (L2) often improves "infinite cache" access time
if L1 cache miss occurs, we go to L2 for instructions/data
L2 access time (latency) may be only 2X-4X longer than L1 cache
improvement over main memory which may be 10X-20X longer
latency
allows use of smaller single cycle L1 cache and larger 3-4 cycle L2
cache nearby
L2 may be on a separate chip on the back side of module, or due to
recent VLSI advances with 0.18 um and 0.13 um litho capability, it may
be on the same chip
example: Intel Coppermine Pentium III
example: AMD Athlon processor
example: recent IBM Power4 PPC processor has 1.5 MB of shared
L2 cache on chip with 2 processors each having their own L1 caches


Virtual Memory Concept and
Implementation
• Virtual Memory is large memory storage addressable by 32 bit (or higher)
memory address but beyond the size capability of the physical address space in
main memory
• desire that virtual memory appears to the CPU to be main memory
• addressability
• average latency not to exceed main memory access time by very much
• each program sees a memory space equal to the virtual address space
• the hardware must map each virtual address into a physical address in
main memory
• Virtual address space from different programs may map to the same physical
address space
• Allows code and data to be shared across multiple programs
• normally implemented with hard disk (or tape drive) storage
• Pages: Blocks of addresses in virtual memory which map into physical page
frames of identical size in main memory
• analogous to "line" in main memory/cache hierarchy
Virtual-to-Physical Memory Address
Mapping
• Mapping virtual memory pages into physical page frames: (see example at
left)
• page = 4KB (1K words x 32 bits)
• page offset address = 12 bits
• Used to address words or bytes within a page
• same 12 bits for both virtual address and physical address
• virtual page number = 20 bits
• → 2^20 = 1M pages in virtual address space
• 16 MB main memory
• → 2^12 = 4K page frames in main memory
• 24 bit main memory address contains 12 bit page offset and 12 bit
physical page frame number
• A virtual page can be mapped to any physical page frame
• Data in FFC and FFE would be invalid since no mapping is shown!
• Page Table: Data structure used to store the mappings between pages in
virtual memory and physical page frames in main memory
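A single-level translation using these field widths can be sketched as (illustrative Python; the page-table contents are hypothetical):

```python
PAGE_OFFSET_BITS = 12                      # 4 KB pages

def translate(virtual_addr, page_table):
    """Look up the physical frame for a virtual address (single-level
    sketch). `page_table` maps virtual page number -> frame number."""
    vpn    = virtual_addr >> PAGE_OFFSET_BITS          # 20-bit page number
    offset = virtual_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    frame  = page_table[vpn]                           # 12-bit frame number
    return (frame << PAGE_OFFSET_BITS) | offset        # 24-bit physical address

page_table = {0x00000: 0x005, 0x00001: 0xFFF}          # hypothetical mappings
print(hex(translate(0x00001ABC, page_table)))          # 0xfffabc
```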
Format for Page Table Entries
• 12 bit physical page frame number
• Valid bit
• "1" if the page frame data in memory is valid, "0" if the data is invalid
• note that data will be invalid when power is first applied until all pages in main memory have
eventually been written by the CPU (or loaded from virtual memory)
• Dirty bit
• if "1", there has been a write to the page in main memory and a correct copy of the page
must be placed back in virtual memory before it can be replaced in main memory
• if "0", then there has not been a write into the page since it was written into main memory, so
it can simply be replaced because the copy in virtual memory is correct
• Used bit
• a simple approximation to a LRU (least recently used) replacement scheme for replacing
pages when main memory is full of valid pages and a miss occurs
• Other flag bits might also be present – such as page access authorization
Page Table Structure
• Page table mappings are themselves stored in page tables
• assume 1K page table mappings can be stored in one 4KB page table
• can be stored in either main memory or hard disk
• a Directory Page provides mappings used to locate the 4KB program
page tables
• Directory Page Pointer is a register which points to the location of the
directory page
• 32-bit virtual address:
• 20-bit virtual page number contains
• 10 bit directory offset used to locate page table page number from
directory
• 10 bit page table offset used to locate physical page number from
page table
• Physical page number points to the physical page frame in main
memory
• 12-bit page offset used to locate the desired word or byte within the
physical page frame
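Splitting a 32-bit virtual address into the three fields described above (illustrative Python):

```python
def split_virtual(addr):
    """Split a 32-bit virtual address into directory offset (10 bits),
    page-table offset (10 bits), and page offset (12 bits)."""
    directory = (addr >> 22) & 0x3FF
    table     = (addr >> 12) & 0x3FF
    offset    = addr & 0xFFF
    return directory, table, offset

d, p, off = split_virtual(0xFFC01234)
print(d, p, hex(off))  # 1023 1 0x234
```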
Translation Lookaside Buffer
• the TLB is a high speed cache which holds the locations of recently addressed
pages
• without the TLB, three accesses to main memory would be needed for a
single instruction or operand fetch
• access for directory entry
• access for the page table entry
• access for the operand or instruction
• designed as fully associative or set-associative
• virtual page number from the address is compared in parallel with all the virtual
page number tags in the TLB
• if a hit occurs, the physical page frame number is outputted and put with the
page offset
• if a miss occurs, the main memory is accessed for the directory table entry
and page table entry which is brought into the TLB cache
• requires 3 memory accesses
• if the physical page does not exist in main memory, a "page fault" occurs
• interrupt brings in S/W to fetch the page, while the CPU may execute a
different program
• if both physical page and page table are not in main memory, then two
pages are transferred!
Assuming both virtual memory and a cache in a typical CPU: 2 cycles are
required for TLB and cache accesses
Thank You!
