Terminologies in Cache Memory Organization
Last Updated: 26 Apr, 2025
Cache memory is a small, high-speed storage used to temporarily hold frequently accessed data or instructions. It helps speed up processing by providing faster access to data compared to main memory (RAM). Cache memory is typically smaller but much faster than main memory, and it holds a portion of the data that's currently in use.
The cache memory’s performance and structure are defined by several factors, such as its size, the number of sets it has, its block size, and how data is organized within it. Additionally, cache systems can be designed with different strategies for fetching and writing data, which impact how efficiently the cache operates.
In some systems, the cache is split into two separate parts: one for instructions (code) and one for data; this split arrangement follows the Harvard architecture. A unified cache, in the von Neumann style, holds both instructions and data in a single structure. The combination of all these factors determines how well the cache performs and how effectively it speeds up the computer's overall processing.
Read more about Cache Memory
1. First-Level Cache (L1 Cache)
The L1 cache is the closest cache to the CPU. It is typically split into two separate caches:
- L1I: Cache for instructions
- L1D: Cache for data
2. Second-Level Cache (L2 Cache)
The L2 cache is also known as the secondary cache. It is located between the CPU and the main memory and serves as a buffer to improve access times.
3. Main Memory
Main memory or RAM is the last level of memory the CPU will check when looking for data. If the data is not found in the cache, the CPU fetches it from main memory.
4. Memory Hierarchy
A memory hierarchy is the sequence of memory levels between the CPU and main memory. A true multi-level hierarchy has at least two caches between the CPU and main memory. Caches closer to the CPU are referred to as upstream or predecessor caches, while caches closer to main memory are referred to as downstream or successor caches.
5. Block
A block (or line) is the smallest unit of data in the cache. It is associated with a tag, which helps identify the corresponding part of the main memory.
6. Set
A set is a collection of blocks that can be checked in parallel. If a cache has only one block per set, it is called a direct-mapped cache. A set-associative cache has multiple blocks per set, allowing more flexibility in placing data.
Read about Cache Mapping Techniques
7. Associativity
Associativity refers to the number of blocks in a set. A direct-mapped cache has one block per set, meaning each memory block has exactly one possible location in the cache. Higher degrees of associativity allow multiple possible locations for each memory block, which typically reduces conflict misses at the cost of a more complex lookup. The sketch below shows how an address is decomposed under this scheme.
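As a concrete illustration, the following minimal Python sketch splits a byte address into its tag, set index, and block offset. The cache geometry (32 KiB, 64-byte blocks, 4-way set-associative) is an assumption chosen for illustration, not something specified in this article:

```python
# Illustrative address breakdown for a set-associative cache.
# Assumed geometry (not from the article): 32 KiB cache, 64-byte
# blocks, 4-way set-associative => 32768 / 64 / 4 = 128 sets.

CACHE_SIZE = 32 * 1024   # total cache capacity in bytes
BLOCK_SIZE = 64          # bytes per block (line)
WAYS = 4                 # blocks per set (associativity)
NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * WAYS)   # 128 sets

def split_address(addr: int):
    """Split a byte address into (tag, set index, block offset)."""
    offset = addr % BLOCK_SIZE                # position within the block
    index = (addr // BLOCK_SIZE) % NUM_SETS   # which set to search
    tag = addr // (BLOCK_SIZE * NUM_SETS)     # identifies the memory block
    return tag, index, offset

tag, index, offset = split_address(0x1A2B3C)
print(f"tag={tag:#x} set={index} offset={offset}")
```

Setting WAYS = 1 turns this into a direct-mapped cache, where the set index alone fixes a block's only possible location.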
8. Sub-block
A sub-block is a smaller unit of data within a block, associated with a valid bit. The size of the sub-block is typically less than or equal to the block size.
9. Fetch Size
The fetch size is the maximum amount of memory that can be fetched from the next memory level (such as from main memory to cache). It is typically a multiple of the sub-block size and can be larger or smaller than the block size.
10. Read
A read is a request to fetch a consecutive collection of words from a cache at a specific address. The CPU generates both instruction fetches and load references, both of which are read operations.
11. Write
A write request contains an address, the number of sub-blocks to be written, and a mask indicating which portions of the data are actually to be written.
12. Read Miss
A read miss occurs when the data requested is not found in the cache. It can happen if none of the tags in the set match the requested address, or if one or more of the sub-blocks in a matching block are invalid.
13. Miss Ratios
- Local Read Miss Ratio: The number of read misses in a cache divided by the total number of read requests to that cache.
- Global Read Miss Ratio: The number of read misses in a cache divided by the number of read requests generated by the CPU.
- Solo Read Miss Ratio: The miss ratio for a cache when it is the only cache in the memory hierarchy.
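For example, with hypothetical numbers: if the CPU issues 1,000 reads, the first-level cache misses 100 of them, and the second-level cache misses 20 of the 100 requests it receives, then the second-level cache's local read miss ratio is 20/100 = 0.20, while its global read miss ratio is 20/1,000 = 0.02.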
14. Read Traffic Ratio
The read traffic ratio is the number of words fetched from the next level in the hierarchy (such as from main memory) divided by the number of words fetched from the cache.
15. Write Traffic Ratio
The write traffic ratio is the number of words written out by a cache to the next level in the hierarchy divided by the number of words written into the cache.
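Continuing with hypothetical numbers: if each of a cache's 100 read misses fetches a 16-word block from the next level while the CPU fetched 1,000 single words from the cache, the read traffic ratio is (100 × 16) / 1,000 = 1.6. Likewise, if the cache writes 50 words out to the next level while the CPU wrote 500 words into it, the write traffic ratio is 50/500 = 0.1.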
16. Write Strategy
There are two basic strategies for writing data:
- Write-through: Data is written to both the cache and the next level of memory simultaneously.
- Write-back: Data is initially written only to the cache and is later written to the next level of memory when it is evicted.
Both strategies may involve write buffering (characterized by the buffer's width and depth), and a policy for handling write misses must also be chosen. The sketch after the link below contrasts the two strategies.
Read more about Write Through and Write Back in Cache
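To make the difference concrete, here is a minimal Python sketch, assuming a hypothetical single-level cache that simply counts the words each policy sends to the next level (not a faithful hardware model; write-miss allocation is not modeled):

```python
# Hypothetical single-level cache contrasting write-through and
# write-back at block granularity.

class TinyCache:
    def __init__(self, policy: str):
        self.policy = policy          # "write-through" or "write-back"
        self.lines = {}               # block address -> dirty flag
        self.next_level_writes = 0    # words written to the next level

    def write(self, block_addr: int):
        if self.policy == "write-through":
            # Update the cache and the next level on every write.
            self.lines[block_addr] = False
            self.next_level_writes += 1
        else:
            # Write-back: mark the block dirty; defer the next-level write.
            self.lines[block_addr] = True

    def evict(self, block_addr: int):
        # A dirty block must be written back when it is evicted.
        if self.lines.pop(block_addr, False):
            self.next_level_writes += 1

for policy in ("write-through", "write-back"):
    cache = TinyCache(policy)
    for _ in range(10):               # ten writes to the same block
        cache.write(0x40)
    cache.evict(0x40)
    print(policy, "->", cache.next_level_writes, "next-level writes")
```

For ten writes to the same block, write-through performs ten next-level writes while write-back performs a single write-back at eviction, which is also why write-back caches tend to have a lower write traffic ratio.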
17. Replacement Strategy
The replacement strategy determines how the cache decides which block to evict when space is needed. Common strategies include:
- Random: A random block is chosen for eviction.
- Least Recently Used (LRU): The block that has not been accessed for the longest time is evicted.
For direct-mapped caches, there is no need for a replacement strategy, as each block has a fixed location in the cache.
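A minimal sketch of LRU for a single set, using Python's OrderedDict as the recency list; the capacity and reference string are illustrative assumptions, not values from the article:

```python
# LRU replacement within one cache set: most recently used tags sit
# at the end of the OrderedDict, the least recently used at the front.

from collections import OrderedDict

def simulate_lru(refs, ways):
    cache_set = OrderedDict()   # tag -> None, ordered by recency
    hits = 0
    for tag in refs:
        if tag in cache_set:
            hits += 1
            cache_set.move_to_end(tag)         # refresh recency on a hit
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = None
    return hits

refs = [1, 2, 3, 1, 4, 1, 2]
print("hits:", simulate_lru(refs, ways=3))
```

With three ways, the fourth and sixth references (both to tag 1) hit, so the sketch reports 2 hits; the miss on tag 4 evicts tag 2, the least recently used block at that point.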
18. Hit Ratio
The hit ratio is the proportion of cache accesses that result in a cache hit (i.e., the requested data is found in the cache).
Formula: Hit Ratio = Number of Hits / Total Number of References
19. Miss Ratio
The miss ratio is the proportion of cache accesses that result in a cache miss (i.e., the requested data is not found in the cache).
Formula: Miss Ratio = Number of Misses / Total Number of References
Since Miss Ratio = 1 - Hit Ratio, they are complementary.
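For example, if 950 of 1,000 references are found in the cache, the hit ratio is 950/1,000 = 0.95 and the miss ratio is 1 - 0.95 = 0.05.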
20. Miss Penalty
The miss penalty is the time required to fetch a block from main memory to the cache when a cache miss occurs. This includes the time to access main memory and any additional delays from memory hierarchy levels.
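These quantities are commonly combined (a standard textbook formula, not stated above) into the average memory access time:
Formula: AMAT = Hit Time + (Miss Ratio × Miss Penalty)
For instance, with a 1 ns hit time, a 0.05 miss ratio, and a 100 ns miss penalty, AMAT = 1 + 0.05 × 100 = 6 ns.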