Unit 5 Notes
CACHE MEMORY
b. Set associative mapping - This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are removed. A block in memory can map to any one of the lines of a specific set.
2. Reducing the Miss Penalty Using Multilevel Caches
1. L1 cache, or primary cache, is extremely fast but relatively small, and is usually
embedded in the processor chip as CPU cache.
2. L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on the
CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative system bus
connecting the cache and CPU. That way it doesn't get slowed by traffic on the main system bus.
3. Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and L2. L1 and L2 can be significantly faster than L3, though L3 is usually double the speed of DRAM. A short sketch quantifying the miss-penalty reduction follows this list.
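The benefit of a second cache level can be made concrete with the average memory access time (AMAT) relation: AMAT = L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x memory penalty). Below is a minimal C sketch; all timings and miss rates in it are illustrative assumptions, not figures from these notes.

#include <stdio.h>

/* AMAT with a two-level cache:
 * AMAT = hitTimeL1 + missRateL1 * (hitTimeL2 + missRateL2 * memPenalty)
 * All numbers below are assumed for illustration. */
int main(void) {
    double hit_time_l1 = 1.0;    /* ns, assumed L1 hit time  */
    double hit_time_l2 = 5.0;    /* ns, assumed L2 hit time  */
    double mem_penalty = 100.0;  /* ns, assumed DRAM access  */
    double miss_rate_l1 = 0.05;  /* assumed 5% L1 miss rate  */
    double miss_rate_l2 = 0.20;  /* assumed 20% local L2 miss rate */

    /* Without L2: every L1 miss pays the full DRAM penalty. */
    double amat_l1_only = hit_time_l1 + miss_rate_l1 * mem_penalty;

    /* With L2: an L1 miss first tries L2; only an L2 miss goes to DRAM. */
    double amat_two_level =
        hit_time_l1 + miss_rate_l1 * (hit_time_l2 + miss_rate_l2 * mem_penalty);

    printf("AMAT, L1 only : %.2f ns\n", amat_l1_only);   /* 6.00 ns */
    printf("AMAT, L1 + L2 : %.2f ns\n", amat_two_level); /* 2.25 ns */
    return 0;
}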
2. Explain the techniques of cache mapping.
Cache Mapping
There are three different types of mapping used for the purpose of cache memory, which are as follows:
Direct Mapping
Associative Mapping
Set-Associative Mapping
1. Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. In other words, direct mapping assigns each memory block to a specific line in the cache. If a line is already occupied by a memory block when a new block needs to be loaded, the old block is discarded. An address is split into two parts: an index field and a tag field. The cache stores the tag field, whereas the rest is stored in the main memory. Direct mapping's performance is directly proportional to the hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
(Figure: Direct Mapping)
For purposes of cache access, each main memory address can be viewed as consisting of three fields. The least significant w bits identify a unique word or byte within a block of main memory. In most contemporary machines, the address is at the byte level. The remaining s bits specify one of the 2^s blocks of main memory. The cache logic interprets these s bits as a tag of s-r bits (the most significant portion) and a line field of r bits. This latter field identifies one of the m = 2^r lines of the cache. The line field supplies the index bits in direct mapping.
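A minimal C sketch of this three-field decomposition, assuming 16-byte blocks (w = 4) and a 256-line cache (r = 8); the address value is arbitrary.

#include <stdio.h>
#include <stdint.h>

/* Direct-mapped decomposition: tag (s-r bits) | line (r bits) | word (w bits).
 * Block and cache sizes are illustrative assumptions. */
#define W_BITS 4                    /* word/byte offset within a block */
#define R_BITS 8                    /* line (index) field              */
#define NUM_LINES (1u << R_BITS)    /* m = 2^r = 256 lines             */

int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t word  = addr & ((1u << W_BITS) - 1);   /* low w bits      */
    uint32_t block = addr >> W_BITS;                /* j               */
    uint32_t line  = block & (NUM_LINES - 1);       /* i = j mod m     */
    uint32_t tag   = block >> R_BITS;               /* upper s-r bits  */

    printf("address 0x%08X -> tag 0x%X, line %u, word %u\n",
           addr, tag, line, word);
    return 0;
}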
3. Set-Associative Mapping
This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are removed. Set-associative mapping addresses the problem of possible thrashing in the direct mapping method. It does this by grouping a few lines together into a set, instead of having exactly one line that a block can map to in the cache. A block in memory can then map to any one of the lines of a specific set. Set-associative mapping thus allows two or more blocks of main memory to reside in the cache under the same index address. It combines the best of the direct and associative cache mapping techniques. In set-associative mapping, the index bits are given by the set offset bits. In this case, the cache consists of a number of sets, each of which consists of a number of lines.
(Figure: Set-Associative Mapping)
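A short C sketch of set-associative placement; the 256-line, 4-way (64-set) organization and the block number are assumptions for illustration.

#include <stdio.h>
#include <stdint.h>

/* Set-associative placement: a block maps to exactly one set, but may
 * occupy any line within that set. Parameters are assumed. */
#define NUM_LINES 256
#define WAYS      4
#define NUM_SETS  (NUM_LINES / WAYS)   /* 64 sets */

int main(void) {
    uint32_t block = 0x1234567;        /* main memory block number        */
    uint32_t set = block % NUM_SETS;   /* set index (set offset bits)     */
    uint32_t tag = block / NUM_SETS;   /* tag compared against WAYS lines */
    printf("block 0x%X -> set %u (any of %d ways), tag 0x%X\n",
           block, set, WAYS, tag);
    return 0;
}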
Cache Replacement Algorithms
When the cache is full, a replacement algorithm decides which block to evict:
1. Random Replacement
2. FIFO
3. Least Recently Used (LRU)
4. Most Recently Used (MRU)
1. Random Replacement - Randomly selects a candidate item and discards it to make
space when necessary. This algorithm does not require keeping any information about
the access history.
Example (Most Recently Used) - Here, A, B, C and D are placed in the cache while there is still space available. At the 5th access, E, the block which held D is replaced with E, as that block was used most recently. C is accessed again, and at the next access, D replaces C, as C was the block accessed just before D, and so on. A small simulation of this policy is sketched below.
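A minimal C simulation of the MRU policy, assuming a 4-block cache and the access sequence A B C D E C D traced above.

#include <stdio.h>

/* Most-Recently-Used (MRU) replacement: on a miss with a full cache,
 * evict the block that was touched last. Cache size of 4 is assumed. */
#define CAP 4

int main(void) {
    char cache[CAP];
    int used = 0;
    int mru = -1;                 /* slot of the most recently used block */
    const char *seq = "ABCDECD";

    for (int i = 0; seq[i] != '\0'; i++) {
        char key = seq[i];
        int hit = -1;
        for (int j = 0; j < used; j++)
            if (cache[j] == key) { hit = j; break; }

        if (hit >= 0) {
            mru = hit;                     /* hit: this slot is now MRU */
            printf("%c: hit\n", key);
        } else if (used < CAP) {
            cache[used] = key;             /* space left: just insert   */
            mru = used++;
            printf("%c: placed\n", key);
        } else {
            printf("%c: miss, evicts %c (MRU)\n", key, cache[mru]);
            cache[mru] = key;              /* full: evict the MRU block */
        }
    }
    return 0;
}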
Memory Translations
The physical address space is your system RAM, the memory modules inside your ESXi hosts,
also referred to as the global system memory. When talking about virtual memory, we are talking
about the memory that is controlled by an operating system, or a hypervisor like vSphere ESXi.
Whenever workloads access data in memory, the system needs to look up the physical memory
address that matches the virtual address. This is what we refer to as memory translations or
mappings.
To map virtual memory addresses to physical memory addresses, page tables are used. A page
table consists of numerous page table entries (PTE).
A memory page referenced by a PTE contains data structures consisting of different sizes of 'words'. Each type of word contains multiple bytes of data: a WORD is 16 bits (2 bytes), a DWORD is 32 bits (4 bytes) and a QWORD is 64 bits (8 bytes). Executing a memory translation for every possible word, or virtual memory page, into a physical memory address is not very efficient, as this could potentially mean billions of PTEs. We need PTEs to find the physical address space in the system's global memory, so there is no way around them.
To make memory translations more efficient, we use page tables to group chunks of memory addresses into one mapping. Looking at an example with DWORD entries of 4 bytes: a page table entry then covers 4 kilobytes instead of just the 4 bytes of data in a single word. For example, using a page table, we can translate virtual address space 0 to 4095 and say it is found in physical address space 4096 to 8191. Now we no longer need to map all the PTEs separately, and we are far more efficient by using page tables.
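This grouping can be sketched in C as follows, assuming 4 KB pages; the page table contents are illustrative, with virtual page 0 mapped to physical frame 1 (addresses 4096 to 8191) as in the example above.

#include <stdio.h>
#include <stdint.h>

/* Page-granular translation, a minimal sketch assuming 4 KB pages. */
#define PAGE_SIZE 4096u

int main(void) {
    uint32_t page_table[] = { 1, 7, 3, 0 };    /* vpn -> physical frame */

    uint32_t vaddr  = 100;                     /* anywhere in page 0    */
    uint32_t vpn    = vaddr / PAGE_SIZE;       /* virtual page number   */
    uint32_t offset = vaddr % PAGE_SIZE;       /* byte within the page  */
    uint32_t paddr  = page_table[vpn] * PAGE_SIZE + offset;

    printf("virtual %u -> page %u, offset %u -> physical %u\n",
           vaddr, vpn, offset, paddr);         /* physical 4196         */
    return 0;
}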
TLB in Detail
The TLB acts as a cache for the MMU that is used to reduce the time taken to access physical
memory. The TLB is a part of the MMU. Depending on the make and model of a CPU, there can be more than one TLB, or even multiple levels of TLB, much like with memory caches, to avoid TLB misses and keep memory latency as low as possible.
In essence, the TLB stores recent translations from virtual to physical memory. It is a cache for page tables. Because it is part of the MMU, the TLB lives inside the CPU package. This is why the TLB is faster than main memory, which is where the page tables exist. Typical access times for a TLB are ~10 ns, whereas main memory access times are around 100 ns.
Now that we covered the basics on memory translation, let’s take a look at some example
scenarios for the TLB.
TLB hit
A virtual memory address comes in, and needs to be translated to the physical address. The first
step is always to dissect the virtual address into a virtual page number, and the page offset. The
offset consists of the last bits of the virtual address. The offset bits are not translated and passed
through to the physical memory address. The offset contains bits that can represent all the
memory addresses in a page table.
So, the offset is directly mapped to the physical memory layer, and the virtual page number
matches a tag already in the TLB. The MMU now immediately knows what physical memory page
to access without the need to look into the global memory.
In the example provided in the above diagram, the virtual page number is found in the TLB, and
immediately translated to the physical page number.
1. The virtual address is dissected into the virtual page number and the page offset.
2. The page offset is passed through, as it is not translated.
3. The virtual page number is looked up in the TLB, looking for a tag with the corresponding number.
4. There is an entry in the TLB (hit), meaning the virtual address can immediately be translated to the physical address, as in the sketch below.
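A minimal C sketch of this hit path; the page size, TLB size and the cached entry are assumptions for illustration.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* TLB hit path, following steps 1-4 above. */
#define PAGE_BITS 12                          /* 4 KB pages (assumed)    */
#define TLB_SIZE  4

typedef struct { bool valid; uint32_t vpn, pfn; } TlbEntry;

static TlbEntry tlb[TLB_SIZE] = {
    { true, 0x3, 0x6 },                       /* assumed cached mapping  */
};

int main(void) {
    uint32_t vaddr  = 0x3ABC;                 /* vpn 0x3, offset 0xABC   */
    uint32_t vpn    = vaddr >> PAGE_BITS;     /* 1. dissect the address  */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1); /* 2. passed through */

    for (int i = 0; i < TLB_SIZE; i++) {      /* 3. look up the tag      */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            /* 4. hit: translate immediately, no memory access needed */
            uint32_t paddr = (tlb[i].pfn << PAGE_BITS) | offset;
            printf("TLB hit: 0x%X -> 0x%X\n", vaddr, paddr);
            return 0;
        }
    }
    printf("TLB miss: page table walk required\n");
    return 0;
}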
TLB miss
What happens when a virtual page number is not found in the TLB, also referred to as a TLB miss? The TLB needs to consult the system's global memory to find out what physical page number is used. Reaching out to physical memory means higher latency compared to a TLB hit. If the TLB is full when a TLB miss occurs, the least recently used TLB entry is flushed, and the new entry is placed in its stead. In the following example, the virtual page number is not found in the TLB, and the TLB needs to look into memory to get the page number.
1. The virtual address is dissected into the virtual page number and the page offset.
2. The page offset is passed through as it is not translated.
3. The virtual page number is looked up in the TLB, looking for a tag with a corresponding number. In
this example, the TLB does not yet have a valid entry.
4. TLB reaches out to memory to find page number 3 (because of the tag, derived from the virtual
page number). Page number 3 is retrieved in memory with value 0x0006.
5. The memory translation is done and the entry is now cached in the TLB, as in the sketch below.
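A minimal C sketch of the miss path, extending the hit sketch above: on a miss the in-memory page table is consulted and the least recently used TLB slot is refilled. Sizes and table contents are assumptions, with page number 3 holding 0x0006 as in the example.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 12
#define TLB_SIZE  2

typedef struct { bool valid; uint32_t vpn, pfn, age; } TlbEntry;

static TlbEntry tlb[TLB_SIZE];                 /* starts empty (all miss) */
static uint32_t page_table[8] = { 0, 0, 0, 0x0006, 0, 0, 0, 0 };
static uint32_t tick;

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    int victim = 0;

    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {       /* TLB hit       */
            tlb[i].age = ++tick;
            return (tlb[i].pfn << PAGE_BITS) | offset;
        }
        if (tlb[i].age < tlb[victim].age) victim = i;  /* track LRU     */
    }

    /* TLB miss: fetch the frame number from the in-memory page table
     * (slower), then cache it in the least recently used slot.        */
    uint32_t pfn = page_table[vpn];
    tlb[victim] = (TlbEntry){ true, vpn, pfn, ++tick };
    printf("miss: cached vpn %u -> pfn 0x%04X\n", vpn, pfn);
    return (pfn << PAGE_BITS) | offset;
}

int main(void) {
    translate(0x3123);    /* miss: page 3 found in memory, value 0x0006 */
    translate(0x3456);    /* hit: now served from the TLB               */
    return 0;
}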
A TLB miss is not ideal, but the worst-case scenario is data that is not residing in memory but on
storage media (flash or disk). Where we are talking nanoseconds to retrieve data in caches or
global memory, getting data from storage media will quickly run into milliseconds or seconds
depending on the media used.
1. The virtual address is dissected into the virtual page number and the page offset.
2. The page offset is passed through as it is not translated.
3. The virtual page number is looked up in the TLB, looking for a tag with a corresponding number. In
this example, the TLB does not yet have a valid entry.
4. The TLB reaches out to memory to find page number 0 (because of the tag, derived from the virtual page number). Page number 0 is looked up in memory, but it turns out the data does not reside in memory, but on storage. A page fault is triggered, because we cannot translate memory pages for data that is not in memory. We need to wait for the data from storage.
4. Present an outline of virtual address, physical address, address translation, segmentation, page table, swap space and page fault.
VIRTUAL MEMORY
❖ Virtual Memory is a storage scheme that provides the user with the illusion of having a very big main memory. This is done by treating a part of secondary memory as the main memory.
❖ In this scheme, the user can load processes bigger than the available main memory, under the illusion that enough memory is available to load the process.
❖ Instead of loading one big process in the main memory, the Operating System loads
the different parts of more than one process in the main memory.
❖ By doing this, the degree of multiprogramming will be increased and therefore, the
CPU utilization will also be increased.
➢Virtual memory uses both hardware and software to operate. When an application is in use, data from
that program is stored in a physical address using RAM.
➢A memory management unit (MMU) maps a logical address space to a corresponding physical address space.
➢If, at any point, the RAM space is needed for something more urgent, data can be swapped out of
RAM and into virtual memory. The computer's memory manager is in charge of keeping track of the
shifts between physical and virtual memory. If that data is needed again, the computer's MMU will
use a context switch to resume execution.
➢While copying virtual memory into physical memory, the OS divides memory into pagefiles or swap files, each containing a fixed number of addresses. Each page is stored on a disk, and when the page is needed, the OS copies it from the disk to main memory and translates the virtual addresses into real addresses.
➢ However, the process of swapping virtual memory to physical is rather slow. This
means using virtual memory generally causes a noticeable reduction in performance.
Address Translation
● Address translation is the translation of the virtual page number to a physical page number. The physical page number constitutes the upper portion of the physical address, while the page offset, which is not changed, constitutes the lower portion.
● The number of bits in the page offset field determines the page size, as illustrated below.
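A tiny C illustration of how the offset width fixes the page size (the 10 to 14 bit range is arbitrary):

#include <stdio.h>

/* With n offset bits a page holds 2^n bytes; a 12-bit offset gives the
 * common 4 KB page (assumed example values). */
int main(void) {
    for (int bits = 10; bits <= 14; bits++)
        printf("%2d offset bits -> %6u-byte pages\n", bits, 1u << bits);
    return 0;
}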
Translation Lookaside Buffer
● A translation lookaside buffer (TLB) is a memory cache that stores the recent translations
of virtual memory to physical memory. It is used to reduce the time taken to access a user
memory location. It can be called an address-translation cache.
● TLB contains page table entries that have been most recently used.
● Since the page tables are stored in main memory, every memory access by a program can
take at least twice as long: one memory access to obtain the physical address and a second
access to get the data.
Process:
1. Given a virtual address, the processor examines the TLB; if a page table entry is present (TLB hit), the frame number is retrieved and the real address is formed.
2. If a page table entry is not found in the TLB (TLB miss), the page number is used as an index into the page table. The page table entry indicates whether the page is already in main memory; if it is not, a page fault is issued. The TLB is then updated to include the new page entry.
Benefits of Virtual Memory
● Allowing users to operate multiple applications at the same time, or applications that are larger than the main memory
● Freeing applications from having to compete for shared memory space and allowing
multiple applications to run at the same time
● Allowing core processes to share memory between libraries, which consists of written code
that provides the foundation for a program's operations
● Improving security by isolating and segmenting where the computer stores information
● Improving efficiency and speed by allowing more processes to sit in virtual memory
● Lowering the cost of computer systems as you find the right amount of main memory and
virtual memory.
5. Outline direct memory access with a neat diagram.
Direct Memory Access (DMA)
● Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).
● Without DMA, when the CPU is using programmed input/output, it is typically fully
occupied for the entire duration of the read or write operation, and is thus unavailable
to perform other work. With DMA, the CPU first initiates the transfer, then it does
other operations while the transfer is in progress, and it finally receives an interrupt
from the DMA controller (DMAC) when the operation is done.
● This feature is useful at any time that the CPU cannot keep up with the rate of data
transfer, or when the CPU needs to perform work while waiting for a relatively slow
I/O data transfer.
Working steps
1. If the DMA controller is free, it requests the control of bus from the
processor by raising the bus request signal.
2. Processor grants the bus to the controller by raising the bus grant signal, now DMA
controller is the bus master.
3. The processor initiates the DMA controller by sending the memory addresses,
number of blocks of data to be transferred and direction of data transfer.
4. After assigning the data transfer task to the DMA controller, instead of waiting idly until completion of the data transfer, the processor resumes the execution of the program after retrieving instructions from the stack.
5. The DMA controller performs the data transfer according to the control instructions received from the processor.
6. After completion of the data transfer, the DMA controller disables the bus request signal and the CPU disables the bus grant signal, thereby returning control of the buses to the CPU. A register-level sketch of steps 3 and 4 follows.
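A hedged C sketch of how a processor might initiate such a transfer through memory-mapped registers. The register layout (DMA_SRC, DMA_DST, DMA_COUNT, DMA_CTRL) and base address are hypothetical, not any real controller's map.

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers (assumed layout). */
#define DMA_BASE   0x40001000u
#define DMA_SRC    (*(volatile uint32_t *)(DMA_BASE + 0x0))
#define DMA_DST    (*(volatile uint32_t *)(DMA_BASE + 0x4))
#define DMA_COUNT  (*(volatile uint32_t *)(DMA_BASE + 0x8))
#define DMA_CTRL   (*(volatile uint32_t *)(DMA_BASE + 0xC))
#define DMA_START  (1u << 0)

void start_dma_transfer(uint32_t src, uint32_t dst, uint32_t nwords) {
    DMA_SRC   = src;       /* step 3: processor supplies the addresses,  */
    DMA_DST   = dst;       /* word count and direction of the transfer   */
    DMA_COUNT = nwords;
    DMA_CTRL  = DMA_START; /* controller becomes bus master and proceeds */
    /* step 4: the CPU returns to its program; completion is signalled
     * by an interrupt from the controller (step 6).                     */
}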
Types of Data Transfer
a) Burst Mode: In this mode, the DMA controller hands the buses back to the CPU only after completion of the whole data transfer. Meanwhile, if the CPU requires the bus, it has to stay idle and wait for the data transfer to finish.
b) Cycle Stealing Mode: In this mode, the DMA controller gives control of the buses back to the CPU after the transfer of every byte. It continuously issues a request for bus control, makes the transfer of one byte and returns the bus. By this, the CPU doesn't have to wait for a long time if it needs the bus for its own work.
c) Transparent Mode: Here, the DMA controller transfers data only when the CPU is executing instructions that do not require the system buses.
Key Points
➔ To speed up the transfer of data between I/O devices and memory, DMA
controller acts as station master.
➔ DMA controller is a control unit, part of I/O device’s interface circuit, which can
transfer blocks of data between I/O devices and main memory with minimal
intervention from the processor.
➔ It is controlled by the processor. The processor initiates the DMA controller by sending the starting address, the number of words in the data block and the direction of transfer of data, i.e. from I/O devices to memory or from main memory to I/O devices.
➔ More than one external device can be connected to the DMA
controller.
➔ DMA controller contains an address unit, for generating addresses and selecting
I/O device for transfer.
➔ It also contains the control unit and data count for keeping counts of the number of
blocks transferred and indicating the direction of transfer of data.
➔ When the transfer is completed, DMA informs the processor by raising an
interrupt.
Advantages of DMA Controller
Direct Memory Access speeds up memory operations and data transfer.
CPU is not involved while transferring data.
DMA requires very few clock cycles while transferring data.
DMA distributes workload very appropriately.
DMA helps the CPU in decreasing its load.
Disadvantages of DMA Controller
Direct Memory Access is a costly operation because of additional operations.
DMA suffers from Cache-Coherence Problems.
DMA Controller increases the overall cost of the system.
DMA Controller increases the complexity of the software.
I/O Interface - Parallel and Serial
● The I/O interface of a device consists of the circuitry needed to connect that
device to the bus.
● On one side of the interface are the bus lines for address, data, and control. On the other side are the connections needed to transfer data between the interface and the I/O device. This side is called a port, and it can be either a parallel or a serial port.
● A parallel port transfers multiple bits of data simultaneously to or from the device.
A serial port sends and receives data one bit at a time.
Working of Keyboard:
A typical keyboard consists of mechanical switches that are normally open.
1. When a key is pressed, its switch closes and establishes a path for an
electrical signal.
2. This signal is detected by an encoder circuit that generates the ASCII code for the corresponding character.
Main issue:
Bouncing - A difficulty with such mechanical pushbutton switches is that the contacts
bounce when a key is pressed, resulting in the electrical connection being made then
broken several times before the switch settles in the closed position.
Solution:
1. Using a debouncing circuit along with the encoder circuit. The I/O routine can then read the input character as soon as it detects that KIN is equal to 1.
2. Using a software-based solution - The software detects that a key has been pressed when it observes that the keyboard status flag, KIN, has been set to 1. The I/O routine can then introduce sufficient delay before reading the contents of the input buffer, KBD_DATA, to ensure that bouncing has subsided. A polling sketch follows below.
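A minimal C sketch of the software solution; KIN and KBD_DATA are the names used in these notes, while the register addresses and the delay calibration are assumptions.

#include <stdint.h>

/* Hypothetical memory-mapped keyboard interface registers (assumed). */
#define KBD_STATUS (*(volatile uint8_t *)0x4004u)   /* bit 0 = KIN flag */
#define KBD_DATA   (*(volatile uint8_t *)0x4000u)   /* input buffer     */
#define KIN        0x01u

static void delay_ms(unsigned ms) {
    for (volatile unsigned i = 0; i < ms * 1000u; i++)
        ;                                 /* crude busy-wait, assumed rate */
}

uint8_t read_key(void) {
    while ((KBD_STATUS & KIN) == 0)
        ;                                 /* poll until a key is pressed   */
    delay_ms(20);                         /* let contact bouncing subside  */
    return KBD_DATA;                      /* reading clears the KIN flag   */
}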
Encoder circuit:
● The output of the encoder consists of one byte of data representing the encoded
character and one control signal called Valid.
● When a key is pressed, the Valid signal changes from 0 to 1, causing the ASCII
code of the corresponding character to be loaded into the KBD_DATA register
and the status flag KIN to be set to 1.
Status flag
● The status flag is cleared to 0 when the processor reads the contents of the KBD_DATA register.
Address
● When the processor requests a Read operation, it places the address of the
appropriate register on the address lines of the bus.
Slave Ready
● The Slave-ready signal is set at the same time, to inform the processor that the requested data or status information has been placed on the data lines.
STATUS FLAG CIRCUIT
Both the input interface and the output interface have one more diagram containing gates. If required, please refer to the book.
Serial Interface
➔ A serial interface is used to connect the processor to I/O devices that transmit data serially, that is, one bit at a time.
Asynchronous Transmission
● The line connecting the transmitter and the receiver is in the 1 state when idle.
● Start bit=0, followed by 8 data bits and 1 or 2 Stop bits. The Stop bits have a
logic value of 1.
● The 1-to-0 transition at the beginning of the Start bit alerts the receiver that
data transmission is about to begin.
● Using its own clock, the receiver determines the position of the next
8 bits, which it loads into its input register. The Stop bits following the transmitted
character, which are equal to 1, ensure that the Start bit of the next character will be
recognized.
● When transmission stops, the line remains in the 1 state until another character is transmitted. The framing is illustrated in the sketch below.
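A small C illustration of the frame described above; the LSB-first data bit order is an assumption (common in UARTs), and the program merely prints the bit pattern.

#include <stdio.h>
#include <stdint.h>

/* Asynchronous framing: idle line at 1, a Start bit of 0, 8 data bits,
 * then a Stop bit of 1. */
int main(void) {
    uint8_t ch = 'A';                 /* 0x41 = 0100 0001               */

    printf("idle: 1 1 1 ...\n");
    printf("frame: 0");               /* Start bit: the 1-to-0 transition
                                         alerts the receiver            */
    for (int i = 0; i < 8; i++)
        printf(" %d", (ch >> i) & 1); /* 8 data bits, LSB first (assumed) */
    printf(" 1\n");                   /* Stop bit returns the line to 1 */
    return 0;
}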
Synchronous Transmission
★ Asynchronous transmission is useful only where the speed of transmission is sufficiently low.
★ In synchronous transmission, the receiver generates a clock that is synchronized to
that of the transmitter by observing successive
1-to-0 and 0-to-1 transitions in the received signal.
★ It adjusts the position of the active edge of the clock to be in the center of the bit
position.
★ A variety of encoding schemes are used to ensure that enough signal transitions occur
to enable the receiver to generate a synchronized clock and to maintain
synchronization.
★ Once synchronization is achieved, data transmission can continue indefinitely.