Virtual Memory Management – Module 4

Virtual memory allows a process to have a logical address space that is larger than the available physical memory. When a process accesses a page that is not currently in RAM, a page fault occurs, which triggers the operating system to load the requested page from secondary storage. This demand-paging approach saves memory and I/O by loading pages only when they are needed rather than all up front. Page-replacement algorithms determine which page to evict from memory when no free frames are available.


Chapter 9: Virtual Memory

● Background
● Demand Paging
● Copy-on-Write
● Page Replacement
● Allocation of Frames
● Thrashing
Objectives

● To describe the benefits of a virtual memory system


● To explain the concepts of demand paging, page replacement
algorithms, and allocation of page frames
● To discuss the principle of the working-set model
● To examine the relationship between shared memory and
memory-mapped files
● To explore how kernel memory is managed
Background
● Code needs to be in memory to execute, but entire program rarely used
– Error code, unusual routines, large data structures
● Entire program code not needed at same time
● Consider ability to execute partially-loaded program
– Program no longer constrained by limits of physical memory
– Each program takes less memory while running -> more programs run at the same time
● Increased CPU utilization and throughput with no increase in response time or turnaround time
– Less I/O needed to load or swap programs into memory -> each user program runs faster
Background
• Virtual memory – separation of user logical memory from physical
memory
– Only part of the program needs to be in memory for execution
– Logical address space can therefore be much larger than physical
address space
– Allows address spaces to be shared by several processes
– Allows for more efficient process creation
– More programs running concurrently
– Less I/O needed to load or swap processes
Background

● Virtual address space – logical view of how process is stored in memory
– Usually start at address 0, contiguous addresses until end of space
– Meanwhile, physical memory organized in page frames
– MMU must map logical to physical
• Virtual memory can be implemented via:
– Demand paging
– Demand segmentation
Virtual Memory That is Larger Than Physical Memory
Virtual-address Space

● Usually design logical address space for stack to start at max logical address and grow “down” while heap grows “up”
● Maximizes address space use
● Unused address space between the two is a hole
● No physical memory needed until heap or stack grows to a given new page
● Enables sparse address spaces with holes left for growth, dynamically linked libraries, etc.
● System libraries shared via mapping into virtual address
space
● Shared memory by mapping pages read-write into virtual
address space
● Pages can be shared during fork(), speeding process
creation
Shared Library Using Virtual Memory
Demand Paging

● Could bring entire process into memory at load time
● Or bring a page into memory only
when it is needed
– Less I/O needed, no unnecessary
I/O
– Less memory needed
– Faster response
– More users
Demand Paging

● Similar to paging system with swapping
• Page is needed -> reference to it
– invalid reference -> abort
– not-in-memory -> bring to memory
• Lazy swapper – never swaps a page into memory unless page will be
needed
– Swapper that deals with pages is a pager
Basic Concepts

With swapping, pager guesses which pages will be used before swapping
out again
• Instead, pager brings in only those pages into memory
• How to determine that set of pages?
– Need new MMU functionality to implement demand paging
• If pages needed are already memory resident
– No difference from non demand-paging
• If page needed and not memory resident
– Need to detect and load the page into memory from storage
• Without changing program behavior
• Without programmer needing to change code
Valid-Invalid Bit

● With each page table entry a valid–invalid bit is associated
(v -> in-memory / memory resident, i -> not-in-memory)
• Initially, valid–invalid bit is set to i on all entries
• Example of a page table snapshot:
● During MMU address translation, if the valid–invalid bit in a page table entry is i, the access causes a page-fault trap
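A minimal C sketch of how a page-table entry might carry the valid–invalid bit; the field layout and the access_page helper are illustrative, not any real architecture's MMU format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative page-table entry: one valid-invalid bit plus a frame
 * number. Real MMU formats differ per architecture. */
typedef struct {
    uint32_t frame : 20;   /* physical frame number               */
    uint32_t valid : 1;    /* v = 1 (memory resident), i = 0      */
    uint32_t dirty : 1;    /* set by hardware on a write          */
    uint32_t refd  : 1;    /* reference bit, used later for LRU   */
} pte_t;

/* Check the MMU performs on translation: an access through an entry
 * whose valid bit is i (0) raises a page-fault trap to the OS. */
bool access_page(pte_t *pt, uint32_t vpn)
{
    if (!pt[vpn].valid)
        return false;      /* page fault: trap to the OS          */
    pt[vpn].refd = 1;      /* hardware marks the page referenced  */
    return true;           /* translation proceeds normally       */
}
```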
Page Table When Some Pages Are Not in Main Memory
Page Fault

• If there is a reference to a page, first reference to that page will trap to


operating system: page fault
1. Operating system looks at another table to decide:
– Invalid reference abort
– Just not in memory
2. Find free frame
3. Swap page into frame via scheduled disk operation
4. Reset tables to indicate page now in memory Set validation bit = v
5. Restart the instruction that caused the page fault
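The same sequence as a hedged C sketch. Every helper here (check_reference, find_free_frame, and so on) is a hypothetical stand-in for real kernel services, not an actual kernel API.

```c
#include <stdint.h>

/* Hypothetical kernel services standing in for the real thing. */
typedef enum { ACCESS_OK, ACCESS_INVALID } access_t;

extern access_t check_reference(uint32_t vpn);                 /* step 1 */
extern int      find_free_frame(void);                         /* step 2 */
extern void     read_page_from_disk(uint32_t vpn, int frame);  /* step 3 */
extern void     set_pte_valid(uint32_t vpn, int frame);        /* step 4 */
extern void     abort_process(void);
extern void     restart_instruction(void);                     /* step 5 */

void page_fault_handler(uint32_t vpn)
{
    if (check_reference(vpn) == ACCESS_INVALID) {
        abort_process();             /* invalid reference -> abort      */
        return;
    }
    int frame = find_free_frame();   /* may trigger page replacement    */
    read_page_from_disk(vpn, frame); /* scheduled disk operation        */
    set_pte_valid(vpn, frame);       /* mark page resident (bit = v)    */
    restart_instruction();           /* rerun the faulting instruction  */
}
```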
Steps in Handling a Page Fault
Aspects of Demand Paging
● Extreme case – start process with no pages in memory
– OS sets instruction pointer to first instruction of process, non-memory-resident -> page fault
– And the same for every other page of the process on first access
– Pure demand paging (never bring a page into memory until it is required)
• Actually, a given instruction could access multiple pages -> multiple page faults
– Consider fetch and decode of an instruction which adds 2 numbers from memory and stores the result back to memory
– Pain decreased because of locality of reference (same set of pages accessed over a short period of time)
• Hardware support needed for demand paging
1. Page table with valid/invalid bit
2. Secondary memory (swap device with swap space)
3. Instruction restart is crucial (we must be able to restart the process in exactly the same place and state, except that the desired page is now in memory and is accessible)
Instruction Restart

● Consider an instruction that could access several different locations
– block move (256 bytes)
– auto increment/decrement location
– Restart the whole operation?
• What if source and destination overlap? We cannot simply restart the instruction
• 2 methods:
1. The microcode computes and attempts to access both ends of both blocks before the move begins, so any page fault happens up front. The move can then take place
2. The other solution uses temporary registers to hold the values of overwritten locations. If there is a page fault, all the old values are written back into memory so that the instruction can be repeated
Performance of Demand Paging
• Stages in Demand Paging (worst case)

1. Trap to the operating system

2. Save the user registers and process state

3. Determine that the interrupt was a page fault

4. Check that the page reference was legal and determine the location of the page on the disk

5. Issue a read from the disk to a free frame:

5.1 Wait in a queue for this device until the read request is serviced

5.2. Wait for the device seek and/or latency time

5.3. Begin the transfer of the page to a free frame


6. While waiting, allocate the CPU to some other user

7. Receive an interrupt from the disk I/O subsystem (I/O completed)

8. Save the registers and process state for the other user

9. Determine that the interrupt was from the disk

10. Correct the page table and other tables to show page is now in memory

11. Wait for the CPU to be allocated to this process again

12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction
Performance of Demand Paging

• Demand paging can significantly affect the performance of a computer system


• Page Fault Rate 0 ≤ p ≤ 1.0
– if p = 0 no page faults
– if p = 1, every reference is a fault

• Effective Access Time (EAT)


EAT = (1 – p) x memory access
+ p (page fault overhead
+ [swap page out ]
+ swap page in
+ restart overhead)
Demand Paging Example
Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 – p) x 200 + p (8 milliseconds)
= (1 – p ) x 200 + p x 8,000,000
= 200 + p x 7,999,800
• If one access out of 1,000 causes a page fault, then EAT = 8.2 microseconds.
This is a slowdown by a factor of 40!!
• If we want performance degradation < 10 percent:
– 220 > 200 + 7,999,800 x p
– 20 > 7,999,800 x p
– p < .0000025 (fewer than one page fault in every 400,000 memory accesses)
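A small C program that reproduces the arithmetic above; the constants come straight from the example.

```c
#include <stdio.h>

/* Recomputes the worked example: 200 ns memory access,
 * 8 ms (8,000,000 ns) page-fault service time. */
int main(void)
{
    const double mem_ns   = 200.0;
    const double fault_ns = 8000000.0;

    double p   = 1.0 / 1000.0;   /* one fault per 1,000 accesses */
    double eat = (1 - p) * mem_ns + p * fault_ns;
    printf("EAT = %.1f ns (slowdown ~%.0fx)\n", eat, eat / mem_ns);

    /* Largest p that keeps degradation under 10 percent:
       220 > 200 + p * (fault_ns - mem_ns)  =>  p < 20 / 7,999,800 */
    double p_max = (0.10 * mem_ns) / (fault_ns - mem_ns);
    printf("p must stay below %.7f (~1 fault per %.0f accesses)\n",
           p_max, 1.0 / p_max);
    return 0;
}
```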
Copy-on-Write
• Copy-on-Write (COW) allows both parent and child processes to initially share the
same pages in memory
– If either process writes to a shared page, a copy of the shared page is created
• COW allows more efficient process creation as only modified pages are copied
• In general, operating system allocates the free pages using a technique known as zero-fill-on-demand
– Pool should always have free frames for fast demand page execution
• Don’t want to have to free a frame as well as do other processing on a page fault
– Why zero-out a page before allocating it? Because it erases the previous content
• vfork() (for virtual memory fork) – variation on fork() system call: the parent process is suspended and the child uses the address space of the parent
– Designed to have child call exec()
– Because no copying of pages takes place, vfork() is a very efficient method of process creation
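A short C demonstration of the copy semantics that COW preserves on a POSIX system: after fork(), parent and child share pages until one of them writes. The sharing itself is invisible to the program; what can be observed is that the child's write lands in a private copy, so the parent's data is unchanged.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* A writable page shared (copy-on-write) between parent and child. */
static char buffer[4096] = "original";

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                       /* child */
        strcpy(buffer, "child wrote");    /* write fault -> page copied */
        printf("child  sees: %s\n", buffer);
        _exit(0);
    }
    waitpid(pid, NULL, 0);                /* parent */
    printf("parent sees: %s\n", buffer);  /* still "original" */
    return 0;
}
```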
Before Process 1 Modifies Page C

After Process 1 Modifies Page C


What Happens if There is no Free Frame?

● All memory used up by process pages
• Also in demand from the kernel, I/O buffers, etc.
• How much to allocate to each?
• Page replacement – find some page in memory, but not really in use,
page it out
– Algorithm – terminate? swap out? replace the page?
– Performance – want an algorithm which will result in minimum
number of page faults
• Same page may be brought into memory several times
Page Replacement
• Prevent over-allocation of memory by modifying page-fault service routine to
include page replacement.

• Use modify (dirty) bit to reduce overhead of page transfers


– only modified pages are written to disk.

• Page replacement completes separation between logical memory and physical


memory
– large virtual memory can be provided on a smaller physical memory.
Need For Page Replacement
Basic Page Replacement

Page replacement takes the following approach.


1. Find the location of the desired page on the disk
2. Find a free frame:
a) If there is a free frame, use it
b) If there is no free frame, use a page replacement algorithm to select a victim
frame
c) Write the victim frame to the disk; change the page and frame tables
accordingly
3. Bring the desired page into the newly free frame; change the page and frame
tables
4. Restart the user process
Note: if no frames are free, two page transfers (one out and one in) are required. This situation doubles the page-fault service time and increases the effective access time
Page Replacement
Page and Frame Replacement Algorithms
● We must solve two major problems to implement demand paging:
• Frame-allocation algorithm determines
– How many frames to give each process
– Which frames to replace
• Page-replacement algorithm
– Want lowest page-fault rate on both first access and re-access
• We evaluate algorithm by running it on a particular string of memory references
(reference string) and computing the number of page faults on that string
– String is just page numbers, not full addresses
– Repeated access to the same page does not cause a page fault
– Results depend on number of frames available
• In all our examples, the reference string of referenced page numbers is
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
Graph of Page Faults Versus The Number of Frames
First-In-First-Out (FIFO) Algorithm
• This is the simplest page-replacement algorithm.
• In this algorithm, the operating system keeps track of all pages in memory in a queue, with the oldest page at the front. When a page needs to be replaced, the page at the front of the queue is selected for removal.
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
• 3 frames (3 pages can be in memory at a time per process)
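A small C simulation of FIFO on this reference string with 3 frames; running it reports the 15 faults this example produces.

```c
#include <stdio.h>

/* Counts page faults for FIFO replacement on the chapter's
 * reference string with 3 frames (expected result: 15 faults). */
int main(void)
{
    int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof ref / sizeof ref[0];
    int frames[3] = {-1, -1, -1};
    int next = 0, faults = 0;         /* next = oldest slot (FIFO)  */

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < 3; j++)
            if (frames[j] == ref[i]) hit = 1;
        if (!hit) {
            frames[next] = ref[i];    /* evict oldest, insert new   */
            next = (next + 1) % 3;
            faults++;
        }
    }
    printf("FIFO page faults: %d\n", faults);
    return 0;
}
```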
FIFO Illustrating Belady’s Anomaly

In the graph we see that the number of page faults for four frames is greater than for three frames. This unexpected result is known as Belady’s Anomaly.
Optimal Algorithm
• Replace page that will not be used for the longest period of time
– 9 faults is optimal for the example reference string
• How do you know this? – You can’t read the future
• Used for measuring how well your algorithm performs
Least Recently Used (LRU) Algorithm
Use past knowledge rather than future
• Replace page that has not been used for the longest period of time
• Associate time of last use with each page
• Page faults=12
Contd..

12 faults – better than FIFO but worse than OPT

• Generally good algorithm and frequently used

• The major problem is: how do we implement it?

- it requires substantial hardware assistance


LRU Algorithm (Cont.)
• Counter implementation
– Every page entry has a counter; every time page is referenced through this
entry, copy the clock into the counter
– When a page needs to be changed, look at the counters to find smallest value
• Search through table needed
• Stack implementation
– Keep a stack of page numbers in doubly linked form:
– Page referenced:
• move it to the top
• requires 6 pointers to be changed
– But each update more expensive
– No search for replacement
• LRU and OPT are cases of stack algorithms that don’t have Belady’s Anomaly
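A minimal C sketch of the counter implementation described above, run against the chapter's reference string with 3 frames; it reports the 12 faults quoted earlier.

```c
#include <stdio.h>

/* Counter-style LRU: stamp each frame with a logical clock on every
 * reference and evict the frame with the smallest stamp.
 * Expected result on this string with 3 frames: 12 faults. */
int main(void)
{
    int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof ref / sizeof ref[0];
    int page[3]      = {-1, -1, -1};
    int last_used[3] = {0, 0, 0};
    int faults = 0;

    for (int t = 0; t < n; t++) {
        int slot = -1;
        for (int j = 0; j < 3; j++)
            if (page[j] == ref[t]) slot = j;     /* hit                */
        if (slot < 0) {                          /* miss: evict LRU    */
            slot = 0;
            for (int j = 1; j < 3; j++)
                if (last_used[j] < last_used[slot]) slot = j;
            page[slot] = ref[t];
            faults++;
        }
        last_used[slot] = t + 1;        /* copy the clock into counter */
    }
    printf("LRU page faults: %d\n", faults);
    return 0;
}
```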
Use Of A Stack to Record Most Recent Page References
LRU Approximation Algorithms
• LRU needs special hardware and still slow
• Reference bit
– With each page associate a bit, initially = 0
– When page is referenced bit set to 1
– Replace any with reference bit = 0 (if one exists)
• We do not know the order, however
• Second-chance algorithm (clock algorithm)
– Generally FIFO, plus hardware-provided reference bit
– Clock replacement
– If page to be replaced has
• Reference bit = 0 -> replace it
• reference bit = 1 then:
– set reference bit 0, leave page in memory
– replace next page, subject to same rules
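A minimal C sketch of the second-chance scan; NFRAMES and the frame bookkeeping are illustrative choices, not a real kernel's structures.

```c
#include <stdint.h>

#define NFRAMES 64

/* Frames treated as a circular list: a set reference bit buys the
 * page one more lap, a clear bit makes it the victim. */
struct frame {
    int     page;      /* resident page number           */
    uint8_t refbit;    /* set by hardware on each access */
};

static struct frame frames[NFRAMES];
static int hand = 0;   /* clock hand position */

int clock_select_victim(void)
{
    for (;;) {
        if (frames[hand].refbit == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;             /* reference bit 0: replace it */
        }
        frames[hand].refbit = 0;       /* give a second chance        */
        hand = (hand + 1) % NFRAMES;
    }
}
```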
Second-Chance (clock) Page-Replacement Algorithm
Enhanced Second-Chance Algorithm

• Improve algorithm by using reference bit and modify bit (if available) in
concert
• Take ordered pair (reference, modify)
1. (0, 0) neither recently used nor modified – best page to replace
2. (0, 1) not recently used but modified – not quite as good, must write out before
replacement
3. (1, 0) recently used but clean – probably will be used again soon
4. (1, 1) recently used and modified – probably will be used again soon and need
to write out before replacement
• When page replacement is called for, use the clock scheme but with the four classes: replace the first page encountered in the lowest non-empty class
– Might need to search circular queue several times
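A simplified C sketch of the class-ordered scan. It is deliberately reduced: a full implementation advances a clock hand and may clear reference bits between passes over the circular queue, and the data layout here is illustrative.

```c
#include <stdint.h>

#define NFRAMES 64

/* Illustrative frame bookkeeping, not a real kernel's structures. */
struct eframe {
    int     page;      /* resident page number      */
    uint8_t refbit;    /* recently referenced?      */
    uint8_t dirty;     /* modified since load?      */
};

static struct eframe ef[NFRAMES];

/* Scan for the lowest non-empty class, where class =
 * (reference << 1) | modify: (0,0) < (0,1) < (1,0) < (1,1). */
int enhanced_select_victim(void)
{
    for (int cls = 0; cls < 4; cls++) {
        int want_ref = cls >> 1, want_dirty = cls & 1;
        for (int i = 0; i < NFRAMES; i++)
            if (ef[i].refbit == want_ref && ef[i].dirty == want_dirty)
                return i;   /* first page in the lowest class wins */
    }
    return 0;               /* unreachable: every frame has a class */
}
```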
Counting Algorithms

• Keep a counter of the number of references that have been made to each page
– Not common
• Least Frequently Used (LFU) Algorithm: replaces page with smallest count
• Most Frequently Used (MFU) Algorithm: based on the argument that the page
with the smallest count was probably just brought in and has yet to be used
Page-Buffering Algorithms

• Keep a pool of free frames, always


– Then frame available when needed, not found at fault time
– Read page into free frame and select victim to evict and add to free pool
– When convenient, evict victim
• Possibly, keep list of modified pages
– When backing store otherwise idle, write pages there and set to non-dirty
• Possibly, keep free frame contents intact and note what is in them
– If referenced again before reused, no need to load contents again from disk
– Generally useful to reduce penalty if wrong victim frame selected
Applications and Page Replacement

• All of these algorithms have OS guessing about future page access


• Some applications have better knowledge – i.e. databases
• Memory intensive applications can cause double buffering
– OS keeps copy of page in memory as I/O buffer
– Application keeps page in memory for its own work
• Operating system can give direct access to the disk, getting out of the way of the applications
– Raw disk mode
• Bypasses buffering, locking, etc
Allocation of Frames

• Each process needs minimum number of frames


• Example: IBM 370
– 6 pages to handle SS MOVE instruction:
– instruction is 6 bytes, might span 2 pages
– 2 pages to handle from
– 2 pages to handle to
• Maximum of course is total frames in the system
• Two major allocation schemes
– fixed allocation
– priority allocation
• Many variations
Fixed Allocation
• Equal allocation
– For example, if there are 100 frames (after allocating frames for the OS) and 5
processes, give each process 20 frames
– Keep some as free frame buffer pool
• Proportional allocation
– Allocate according to the size of process
– Dynamic as degree of multiprogramming, process sizes change
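A small C illustration of proportional allocation, where each process receives a_i = (s_i / S) x m frames for total frames m and S = sum of the s_i; the frame count and process sizes below are made-up sample values.

```c
#include <stdio.h>

/* Proportional allocation: split m frames among processes in
 * proportion to their sizes. Sizes are sample values. */
int main(void)
{
    int m   = 62;                     /* free frames to hand out    */
    int s[] = {10, 127};              /* hypothetical process sizes */
    int n   = sizeof s / sizeof s[0];

    int total = 0;
    for (int i = 0; i < n; i++) total += s[i];

    for (int i = 0; i < n; i++) {
        int a = (int)((double)s[i] / total * m);  /* a_i = s_i/S * m */
        printf("process %d: %d of %d frames\n", i, a, m);
    }
    return 0;
}
```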
Priority Allocation

• Use a proportional allocation scheme using priorities rather than size


• If process Pi generates a page fault, either:
– select for replacement one of its frames, or
– select for replacement a frame from a process with lower priority number
Global vs. Local Allocation
• Global replacement

– process selects a replacement frame from the set of all frames;


one process can take a frame from another
– But then process execution time can vary greatly
– But greater throughput so more common

• Local replacement

– each process selects from only its own set of allocated frames
– More consistent per-process performance
– But possibly underutilized memory
Non-Uniform Memory Access
• So far all memory accessed equally
• Many systems are NUMA – speed of access to memory varies
– Consider system boards containing CPUs and memory, interconnected over a
system bus
• Optimal performance comes from allocating memory “close to” the CPU on
which the thread is scheduled
– And modifying the scheduler to schedule the thread on the same system board
when possible
– Solved by Solaris by creating lgroups
• Structure to track CPU / Memory low latency groups
• Used by the scheduler and pager
• When possible schedule all threads of a process and allocate all memory for that
process within the lgroup
Thrashing

• If a process does not have “enough” pages, the page-fault rate is very high
– Page fault to get page
– Replace existing frame
– But quickly need replaced frame back
– This leads to:
● Low CPU utilization
● Operating system thinking that it needs to increase the degree of
multiprogramming
● Another process added to the system
● A process is thrashing if it is spending more time paging than executing; this high paging activity is called thrashing
Cause of Thrashing
We can limit the effects of thrashing by using a local replacement algorithm (or
priority replacement algorithm).
• To prevent thrashing, we must provide a process with as many frames as it
needs
Cause of Thrashing (contd)

• But how do we know how many frames it “needs”?
• The working-set strategy starts by looking at how many frames a process is actually using
• Locality model
– Process migrates from one locality to another
– A locality is a set of pages that are actively used together
– Localities may overlap
• For example, when a function is called, it defines a new locality: memory references are made to its instructions, its local variables, and a subset of the global variables
• When we exit the function, the process leaves this locality
Cause of thrashing(contd..)

• localities are defined by the program structure and its data structures.
• If accesses to any types of data were random rather than patterned,
caching would be useless
• Why does thrashing occur?
– size of locality > total memory size
– Limit effects by using local or priority page replacement
Locality In A Memory-Reference Pattern
Working-Set Model

• Working-set model is based on the assumption of locality.
• This model uses a parameter Δ to define the working-set window.
• The idea is to examine the most recent Δ page references.
• If a page is in active use, it will be in the working set. If it is no longer being used, it will drop from the working set Δ time units after its last reference.
• The working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus, it optimizes CPU utilization.
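A short C illustration that computes the working-set size |WS(t, Δ)| (the number of distinct pages among the last Δ references) over the chapter's reference string; the window Δ = 4 is an arbitrary value chosen for illustration.

```c
#include <stdio.h>

/* Working set at time t: distinct pages among the last delta refs. */
int main(void)
{
    int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof ref / sizeof ref[0];
    int delta = 4;                       /* illustrative window size */

    for (int t = 0; t < n; t++) {
        int seen[8] = {0}, size = 0;     /* pages here are 0..7      */
        int start = (t - delta + 1 < 0) ? 0 : t - delta + 1;
        for (int i = start; i <= t; i++)
            if (!seen[ref[i]]) { seen[ref[i]] = 1; size++; }
        printf("t=%2d page=%d |WS|=%d\n", t, ref[t], size);
    }
    return 0;
}
```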
Working-Set Model
Working-Set Model

• In practice the working set is approximated with a fixed-interval timer interrupt plus a reference bit (e.g., Δ = 10,000 references, with a timer interrupt every 5,000 time units and 2 in-memory history bits per page)
• Why is this not completely accurate?
– because we cannot tell where, within an interval of 5,000, a reference occurred
• Improvement: we can reduce the uncertainty by increasing the number of history bits and the frequency of interrupts (e.g., 10 bits and an interrupt every 1,000 time units)
• But the cost to service these more frequent interrupts will be correspondingly higher
Page-Fault Frequency (PFF)

• More direct approach than WSS


• We can establish upper and lower bounds on the desired page-fault rate
– If the actual rate falls below the lower limit, the process loses a frame
– If the actual rate exceeds the upper limit, the process gains a frame
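A hedged C sketch of the PFF control rule; the thresholds and helper functions are hypothetical, not from any real operating system.

```c
/* Hypothetical stubs: measure a process's recent fault rate and
 * move frames between processes. */
extern double measure_fault_rate(int pid);  /* faults per second */
extern void   take_frame_from(int pid);
extern void   give_frame_to(int pid);

/* Nudge a process's allocation between two made-up bounds. */
void pff_adjust(int pid)
{
    const double lower = 2.0, upper = 10.0; /* illustrative bounds */
    double rate = measure_fault_rate(pid);

    if (rate < lower)
        take_frame_from(pid);   /* too few faults: frame is spare  */
    else if (rate > upper)
        give_frame_to(pid);     /* too many faults: needs a frame  */
}
```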
Working Sets and Page Fault Rates

● Direct relationship between the working set of a process and its page-fault rate
● Working set changes over time
● Peaks and valleys over time
● A peak in the page-fault rate occurs when we begin demand paging a new locality
● Once the working set of that locality is in memory, the page-fault rate falls
● When the process moves to a new working set, the page-fault rate rises toward a peak once again
