Usha Mittal Institute of Technology, SNDT Women's University, MUMBAI - 400049
Year: 2020-21
MEMORY ORGANIZATION
INDEX

Sr. No.   Contents
1         Introduction
2         Memory Mapping
3         Memory Hierarchy
4         Cache Memory
5         Associative Memory
6         Virtual Memory
7         Memory Management Unit
8         Conclusion
INTRODUCTION
A memory unit is a collection of storage units or devices. The memory unit
stores binary information in the form of bits. Generally, memory/storage is classified
into two categories:
Volatile Memory: This loses its data when power is switched off.
Non-Volatile Memory: This is permanent storage and does not lose any data when
power is switched off.
MEMORY-MAPPING
Mapped host memory lets a device access host memory directly, which simplifies
programming, but application performance is the cost associated with this simplicity.
Using mapped memory means that the programmer gives up control over data movement
between the host and the device. From the forums and from experience, it is not unusual
for kernel performance to drop when using mapped memory, because there are no guarantees
about when or how often data will need to be transferred across the PCIe bus. Other
considerations when using mapped memory include:
If the contents of the mapped memory are modified, the application must
synchronize memory accesses using streams or events to avoid any potential read-
after-write, write-after-read, or write-after-write hazards.
The host memory needs to be page aligned. The simplest and most portable way to
enforce this is to use cudaHostAlloc() when allocating mapped host memory (a minimal
sketch follows).
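As a rough illustration of the allocation step just described, here is a minimal C sketch using the CUDA runtime calls cudaSetDeviceFlags(), cudaHostAlloc() and cudaHostGetDevicePointer(). The buffer size and variable names are arbitrary choices for this example, and error checking is omitted for brevity.

/* Minimal sketch: allocating mapped (pinned, page-aligned) host memory. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    /* The device must first be allowed to map host memory. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *h_buf = NULL;                     /* page-aligned, pinned host buffer   */
    float *d_buf = NULL;                     /* device-side alias of the same data */
    size_t bytes = 1024 * sizeof(float);

    /* cudaHostAlloc() returns page-aligned, pinned memory; the
       cudaHostAllocMapped flag makes it accessible from the device. */
    cudaHostAlloc((void **)&h_buf, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_buf, h_buf, 0);

    /* d_buf can now be passed to a kernel; every device access travels
       across the PCIe bus, which is why kernel performance may drop.   */

    cudaFreeHost(h_buf);
    return 0;
}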
Benefits of Memory-Mapping:
The principal benefits of memory-mapping are efficiency, faster file access, the ability
to share memory between applications, and more efficient coding.
MEMORY HIERARCHY
The many trade-offs in designing for high performance include the structure of
the memory hierarchy, i.e. the size and technology of each component. The various
components can be viewed as forming a hierarchy of memories (m1, m2, ..., mn) in which
each member mi is, in a sense, subordinate to the next higher member mi+1 of the
hierarchy. To limit waiting by higher levels, a lower level responds by filling a buffer
and then signaling to activate the transfer.
There are four major storage levels.
1. Internal – Processor registers and cache.
2. Main – the system RAM and controller cards.
3. On-line mass storage – Secondary storage.
4. Off-line bulk storage – Tertiary and Off-line storage.
This is a general memory hierarchy structure; many other structures are useful. For
example, a paging algorithm may be considered a level of virtual memory when
designing a computer architecture.
CACHE MEMORY
Cache memory, also called CPU memory, is random access memory (RAM) that a
computer microprocessor can access more quickly than it can access regular RAM. This
memory is typically integrated directly with the CPU chip or placed on a separate chip that
has a separate bus interconnect with the CPU. The basic purpose of cache memory is to
store program instructions that are frequently re-referenced by software during operation.
Fast access to these instructions increases the overall speed of the software program.
As the microprocessor processes data, it looks first in the cache memory; if it finds the
instructions there (from a previous reading of data), it does not have to do a more time-
consuming reading of data from larger memory or other data storage devices. Most
programs use very few resources once they have been opened and run for a time,
mainly because frequently re-referenced instructions tend to be cached. This explains why,
in system performance measurements, computers with slower processors but larger caches
can outperform computers with faster processors but more limited cache space.
Memory cache configurations:
Caching configurations continue to evolve, but memory cache traditionally works
under three different configurations:
Direct mapping, in which each block is mapped to exactly one cache location.
Conceptually, this is like rows in a table with three columns: the data block or cache
line that contains the actual data fetched and stored, a tag that contains all or part of
the address of the fetched data, and a flag bit that indicates whether the row entry
holds valid data (a minimal sketch of this follows the list).
Fully associative mapping is similar to direct mapping in structure, but allows a block
to be mapped to any cache location rather than to a pre-specified cache location (as is
the case with direct mapping).
Set associative mapping can be viewed as a compromise between direct mapping and
fully associative mapping in which each block is mapped to a subset of cache
locations. It is sometimes called N-way set associative mapping, which provides for a
location in main memory to be cached to any of "N" locations in the L1 cache.
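As a rough illustration of direct mapping, the following C sketch splits a byte address into tag, index and offset, and uses the valid flag and tag to decide whether an access hits. The line size, cache size and names are arbitrary choices for this example, not taken from any particular processor.

/* Minimal sketch of a direct-mapped lookup. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64            /* bytes per cache line                    */
#define NUM_LINES 1024          /* number of lines (sets) in the cache     */

typedef struct {
    bool     valid;             /* flag bit: does this entry hold valid data? */
    uint32_t tag;               /* upper address bits of the cached block     */
    /* uint8_t data[LINE_SIZE]; payload omitted in this sketch */
} line_t;

static line_t cache[NUM_LINES];

/* Returns true on a cache hit for the given byte address. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr / LINE_SIZE) % NUM_LINES;   /* selects exactly one line */
    uint32_t tag   = addr / (LINE_SIZE * NUM_LINES);   /* remaining upper bits     */
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    uint32_t addr = 0x1234ABCD;
    if (!lookup(addr)) {                               /* first access: a miss */
        uint32_t index = (addr / LINE_SIZE) % NUM_LINES;
        cache[index].valid = true;                     /* allocate the entry    */
        cache[index].tag   = addr / (LINE_SIZE * NUM_LINES);
    }
    printf("second access hits: %d\n", lookup(addr));  /* now a hit */
    return 0;
}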
Specialized caches:
In addition to instruction and data caches, there are other caches designed to provide
specialized functions in a system. By some definitions, the L3 cache is a specialized cache
because of its shared design.
Other definitions separate instruction caching from data caching, referring to each as
a specialized cache. Other specialized memory caches include the translation lookaside
buffer (TLB) whose function is to record virtual address to physical address translations. Still
other caches are not, technically speaking, memory caches at all. Disk caches, for example,
may leverage RAM or flash memory to provide much the same kind of data caching as
memory caches do with CPU instructions.
If data is frequently accessed from disk, it is cached into DRAM or flash-based silicon
storage technology for faster access and response.
Specialized caches also exist for such applications as Web browsers, databases,
network address binding and client-side Network File System protocol support. These types
of caches might be distributed across multiple networked hosts to provide greater scalability
or performance to an application that uses them.
A CPU cache is a cache used by the central processing unit (CPU) of a computer to
reduce the average time to access data from the main memory. The cache is a smaller, faster
memory which stores copies of the data from frequently used main memory locations. Most
CPUs have different independent caches, including instruction and data caches, where the
data cache is usually organized as a hierarchy of more cache levels (L1, L2 etc.).
When the processor needs to read from or write to a location in main memory, it first
checks whether a copy of that data is in the cache. If so, the processor immediately reads
from or writes to the cache, which is much faster than reading from or writing to main
memory.
Most modern desktop and server CPUs have at least three independent caches: an
instruction cache to speed up executable instruction fetch, a data cache to speed up data
fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical
address translation for both executable instructions and data. The data cache is usually
organized as a hierarchy of more cache levels.
Cache entries:
Data is transferred between memory and cache in blocks of fixed size, called cache
lines. When a cache line is copied from memory into the cache, a cache entry is created.
The cache entry will include the copied data as well as the requested memory location (now
called a tag). When the processor needs to read or write a location in main memory, it first
checks for a corresponding entry in the cache. The cache checks for the contents of the
requested memory location in any cache lines that might contain that address. If the
processor finds that the memory location is in the cache, a cache hit has occurred. However,
if the processor does not find the memory location in the cache, a cache miss has occurred.
In the case of:
a cache hit, the processor immediately reads or writes the data in the cache line
a cache miss, the cache allocates a new entry and copies in data from main memory,
then the request is fulfilled from the contents of the cache.
Cache performance:
The proportion of accesses that result in a cache hit is known as the hit rate, and can
be a measure of the effectiveness of the cache for a given program or algorithm. Read misses
delay execution because they require data to be transferred from memory, which is much
slower than reading from the cache. Write misses may occur without such penalty, since
the processor can continue execution while data is copied to main memory in the
background.
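To put the hit rate into perspective with purely illustrative figures (not taken from any particular system): if 95% of accesses hit a cache with a 1 ns access time and the remaining 5% pay a 100 ns penalty to reach main memory, the average access time is roughly 1 ns + 0.05 x 100 ns = 6 ns, much closer to the speed of the cache than to that of main memory.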
Replacement policies:
In order to make room for the new entry on a cache miss, the cache may have to evict
one of the existing entries. The heuristic that it uses to choose the entry to evict is called the
replacement policy. The fundamental problem with any replacement policy is that it must
predict which existing cache entry is least likely to be used in the future. Predicting the
future is difficult, so there is no perfect way to choose among the variety of replacement
policies available. One popular replacement policy, least-recently used (LRU), replaces the
least recently accessed entry. Marking some memory ranges as non-cacheable can improve
performance, by avoiding caching of memory regions that are rarely re-accessed. This
avoids the overhead of loading something into the cache without having any reuse.
Cache entries may also be disabled or locked depending on the context.
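A minimal sketch of LRU replacement follows, assuming a 2-way set-associative organization and a simple per-set access counter used as a timestamp. Both are illustrative choices for this example, not how any particular CPU implements the policy.

/* Minimal sketch: LRU replacement within one 2-way set. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t last_used;     /* set counter value at the most recent access */
} way_t;

typedef struct {
    way_t    way[2];
    uint32_t clock;         /* per-set access counter */
} set_t;

/* Access a tag within one set; returns true on a hit. On a miss the
   least recently used (or an empty) way is chosen as the victim. */
bool access_set(set_t *s, uint32_t tag) {
    s->clock++;
    for (int i = 0; i < 2; i++) {
        if (s->way[i].valid && s->way[i].tag == tag) {
            s->way[i].last_used = s->clock;   /* refresh recency on a hit */
            return true;
        }
    }
    int victim;
    if (!s->way[0].valid)      victim = 0;
    else if (!s->way[1].valid) victim = 1;
    else victim = (s->way[0].last_used <= s->way[1].last_used) ? 0 : 1;

    s->way[victim].valid     = true;          /* evict and refill the victim way */
    s->way[victim].tag       = tag;
    s->way[victim].last_used = s->clock;
    return false;
}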
Write policies:
If data is written to the cache, at some point it must also be written to main memory.
The timing of this write is known as the write policy. In a write-through cache, every write
to the cache causes a write to main memory. Alternatively, in a write-back or copy-back
cache, writes are not immediately mirrored to the main memory. Instead, the cache tracks
Page | 11
which locations have been written over (these locations are marked dirty). The data in these
locations are written back to the main memory only when that data is evicted from the
cache. For this reason, a read miss in a write-back cache may sometimes require two
memory accesses to service: one to first write the dirty location to memory and then
another to read the new location from memory. There are intermediate policies as well. The
cache may be write-through, but the writes may be held in a store data queue temporarily,
usually so that multiple stores can be processed together (which can reduce bus turnarounds
and improve bus utilization). The data in main memory being cached may be changed by
other entities (e.g. peripherals using direct memory access, or another core in a
multi-core processor), in which case the copy in the cache may become out-of-date or
stale. Alternatively, when a
CPU in a multiprocessor system updates data in the cache, copies of data in caches
associated with other CPUs will become stale. Communication protocols between the cache
managers which keep the data consistent are known as cache coherence protocols.
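A minimal sketch of the write-back idea appears below. The structure and function names (cache_line_t, write_hit, evict) are invented for this example; a real cache performs these steps in hardware, but the dirty-bit logic is the same.

/* Minimal sketch: write-back policy with a dirty bit. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64

typedef struct {
    bool     valid;
    bool     dirty;               /* set when the cached copy differs from memory */
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
} cache_line_t;

/* On a write hit, a write-back cache only marks the line dirty. */
void write_hit(cache_line_t *line, size_t offset, uint8_t value) {
    line->data[offset] = value;
    line->dirty = true;           /* main memory is updated later, on eviction */
}

/* On eviction, a dirty line must be written back before it is replaced. */
void evict(cache_line_t *line, uint8_t *main_memory, uint32_t block_addr) {
    if (line->valid && line->dirty) {
        memcpy(&main_memory[block_addr], line->data, LINE_SIZE);
    }
    line->valid = false;
    line->dirty = false;
}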
CPU stalls:
The time taken to fetch one cache line from memory (read latency) matters because
the CPU will run out of things to do while waiting for the cache line. When a CPU reaches
this state, it is called a stall. As CPUs become faster, stalls due to cache misses displace more
potential computation; modern CPUs can execute hundreds of instructions in the time
taken to fetch a single cache line from main memory. Various techniques have been
employed to keep the CPU busy during this time.
Out-of-order CPUs (the Pentium Pro and later Intel designs, for example) attempt to
execute independent instructions after the instruction that is waiting for the cache
miss data.
ASSOCIATIVE MEMORY
Associative memory, also called content-addressable memory (CAM), is used in specific
high-speed searching applications. Most computer memory, known as random access memory
(RAM), works by having the user provide a memory address, after which the RAM returns
whatever data is stored at that address. CAM works the other way round: the user provides
a data word, and the entire memory is searched to see whether that word is stored anywhere
in it. If the word is found, the memory returns a list of all of the storage addresses at
which it occurs.
CAM is faster than RAM in almost every search application, but most systems still use
RAM because CAM is considerably more expensive. The reason for the higher cost is that
each CAM cell must have both full storage capability and logic circuits that can match
its content against an external search argument. Associative memory is best suited to
users who need searches to take place quickly and whose searches are critical for the
work done on the machine.
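The following C loop models what the associative search does: return every address whose contents match the supplied word. A real CAM performs all of the comparisons in parallel in hardware; this loop only models the behaviour, not the speed, and the data values are invented for illustration.

/* Minimal sketch: content-addressable (associative) search. */
#include <stdint.h>
#include <stdio.h>

#define MEM_WORDS 8

int main(void) {
    uint16_t memory[MEM_WORDS] = { 7, 42, 19, 42, 3, 42, 0, 11 };
    uint16_t argument = 42;                 /* the word being searched for */

    for (int addr = 0; addr < MEM_WORDS; addr++) {
        if (memory[addr] == argument) {     /* each cell matches on content, not address */
            printf("match at address %d\n", addr);
        }
    }
    return 0;
}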
Alternatively, associative memory can be defined as follows:
A storage unit of digital computers in which selection (entry) is performed not
according to concrete address but rather according to a preset combination (association) of
attributes characteristic of the desired information. Such attributes can be part of a word
(number), attached to it for detection among other words, certain features of the word itself
(for example, the presence of specific codes in its digits), the absolute value of a word, its
presence in a preset range, and so on.
VIRTUAL MEMORY
The operating system manages virtual address spaces and the assignment of real
memory to virtual memory. Address translation hardware in the CPU, often referred to as a
memory management unit or MMU, automatically translates virtual addresses to physical
addresses. Software within the operating system may extend these capabilities to provide a
virtual address space that can exceed the capacity of real memory and thus reference more
memory than is physically present in the computer.
The primary benefits of virtual memory include freeing applications from having to
manage a shared memory space, increased security due to memory isolation, and being able
to conceptually use more memory than might be physically available, using the technique
of paging.
Properties:
Virtual memory makes application programming easier by hiding fragmentation of
physical memory; by delegating to the kernel the burden of managing the memory
hierarchy (eliminating the need for the program to handle overlays explicitly); and, when
each process is run in its own dedicated address space, by obviating the need to relocate
program code or to access memory with relative addressing.
Eventually, the OS will need to retrieve data that was moved temporarily to
disk storage -- but remember, the only reason the OS moved pages of data from RAM to
disk storage in the first place was that it was running out of RAM. To solve the problem,
the operating system will need to move other pages to hard disk so it has room to bring back
the pages it needs right away from temporary disk storage. This process is known as paging
or swapping and the temporary storage space on the hard disk is called a pagefile or a swap
file.
Swapping, which happens so quickly that the end user does not know it is happening, is
carried out by the operating system's memory manager, with the hardware memory management
unit (MMU) detecting when a page is not present. The memory manager may use one of several
algorithms to choose which page should be swapped out, including
Least Recently Used (LRU), Least Frequently Used (LFU) or Most Recently Used (MRU).
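A minimal sketch of LRU page replacement appears below: it walks an invented reference string, keeps a small number of frames, and counts page faults. The frame count and the reference string are illustrative only.

/* Minimal sketch: LRU page replacement over a small reference string. */
#include <stdio.h>

#define FRAMES 3

int main(void) {
    int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
    int nrefs  = sizeof(refs) / sizeof(refs[0]);

    int frame[FRAMES];        /* which page each frame holds (-1 = empty) */
    int last_use[FRAMES];     /* when each frame was last referenced      */
    for (int i = 0; i < FRAMES; i++) { frame[i] = -1; last_use[i] = -1; }

    int faults = 0;
    for (int t = 0; t < nrefs; t++) {
        int hit = -1;
        for (int i = 0; i < FRAMES; i++)
            if (frame[i] == refs[t]) hit = i;

        if (hit >= 0) {
            last_use[hit] = t;                 /* page already in memory        */
        } else {
            faults++;                          /* page fault: pick the LRU victim */
            int victim = 0;
            for (int i = 1; i < FRAMES; i++)
                if (last_use[i] < last_use[victim]) victim = i;
            frame[victim]    = refs[t];        /* "swap in" the needed page */
            last_use[victim] = t;
        }
    }
    printf("page faults: %d\n", faults);       /* 10 for this reference string */
    return 0;
}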
MEMORY MANAGEMENT UNIT
A hardware device or circuit that supports virtual memory and paging by translating
virtual addresses into physical addresses.
The virtual address space (the range of addresses used by the processor) is divided into
pages, whose size is 2^N, usually a few kilobytes. The bottom N bits of the address (the
offset within a page) are left unchanged. The upper address bits are the (virtual) page
number. The MMU contains a page table which is indexed (possibly associatively) by the
page number. Each page table entry (PTE) gives the physical page number corresponding to
the virtual one. This is combined with the page offset to give the complete physical address.
A PTE may also include information about whether the page has been written to,
when it was last used (for a least recently used replacement algorithm), what kind of
processes (user mode, supervisor mode) may read and write it, and whether it should be
cached.
It is possible that no physical memory (RAM) has been allocated to a given virtual
page, in which case the MMU will signal a "page fault" to the CPU. The operating system
will then try to find a spare page of RAM and set up a new PTE to map it to the requested
virtual address. If no RAM is free it may be necessary to choose an existing page, using some
replacement algorithm, and save it to disk (this is known as "paging"). There may also be a
shortage of PTEs, in which case the OS will have to free one for the new mapping.
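A minimal sketch of the translation just described follows, assuming 4 KB pages and a toy single-level page table. The field names and sizes are invented for illustration; a missing mapping is reported as a page fault.

/* Minimal sketch: MMU-style virtual-to-physical address translation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                             /* N: 4 KB pages         */
#define PAGE_SIZE (1u << PAGE_BITS)              /* 2^N bytes per page    */
#define NUM_PAGES 1024                           /* size of this toy table */

typedef struct {
    bool     present;                            /* is RAM allocated for the page? */
    bool     dirty;                              /* has the page been written to?  */
    uint32_t frame;                              /* physical page number           */
} pte_t;

static pte_t page_table[NUM_PAGES];

/* Translate a virtual address; returns false to model a page fault. */
bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;            /* virtual page number   */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);       /* low N bits, unchanged */

    if (vpn >= NUM_PAGES || !page_table[vpn].present)
        return false;                                /* MMU signals a page fault */

    *paddr = (page_table[vpn].frame << PAGE_BITS) | offset;
    return true;
}

int main(void) {
    page_table[3].present = true;                    /* map virtual page 3 ...  */
    page_table[3].frame   = 7;                       /* ... to physical frame 7 */

    uint32_t paddr;
    if (translate(0x3ABC, &paddr))                   /* page 3, offset 0xABC */
        printf("physical address: 0x%X\n", paddr);   /* prints 0x7ABC        */
    else
        printf("page fault\n");
    return 0;
}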
In a multitasking system all processes compete for the use of memory and of the
MMU. Some memory management architectures allow each process to have its own area or
configuration of the page table, with a mechanism to switch between different mappings on
a process switch. This means that all processes can have the same virtual address space
rather than require load-time relocation.
The memory addresses that a program uses to reference its data are logical addresses. The
real-time translation to physical addresses is performed in hardware by the CPU's Memory
Management Unit (MMU). The MMU has two special registers that are accessed by the
CPU's control unit. Data to be sent to main memory or retrieved from memory is stored in
the Memory Data Register (MDR). The desired logical memory address is stored in the
Memory Address Register (MAR). The address translation is also called address binding and
uses a memory map that is programmed by the operating system.
CONCLUSION
Memory is a critical function. A behaving organism without memory -- one that operates
purely on reflex, taxis, and instinct to respond to physical stimuli present in the
current environment -- is possible, but such an organism is severely limited.
Memory takes us beyond the present, and permits us to transcend the here-and-
now. Without memory, intelligent behavior -- behavior which responds flexibly to changes
in the situation -- just isn't possible, because memory provides the cognitive basis for other
cognitive functions.
The field of memory management is large, complex, time-consuming to research and
difficult to apply to practical implementations. This is partly because of the difficulty
of modeling how real multi-programmed systems behave, which means that theoretical
examinations of virtual memory algorithms often depend on simulations of specific
workloads. Simulations are necessary because modeling how scheduling, paging behavior and
multiple processes interact presents a considerable challenge. Page replacement policies,
which have been the focus of a considerable amount of research, are a good example: a
given policy is typically only shown to work well for particular workloads. The problem of
adjusting algorithms and policies to different workloads is addressed as much by having
administrators tune systems as by research and new algorithms.