
Terminologies in Cache Memory Organization

Cache memory is a small, high-speed storage layer used to temporarily hold frequently accessed data or instructions. It speeds up processing by providing faster access to data than main memory (RAM). Cache memory is typically much smaller but much faster than main memory, and it holds a portion of the data that is currently in use.

The cache memory’s performance and structure are defined by several factors, such as its size, the number of sets it has, its block size, and how data is organized within it. Additionally, cache systems can be designed with different strategies for fetching and writing data, which impact how efficiently the cache operates.

In some systems, the cache is split into two separate parts: one for instructions (code) and one for data; this split design is often described as a Harvard-style cache. Alternatively, a unified (Von Neumann-style) design uses a single cache for both instructions and data. The combination of these factors determines how well the cache performs and how effectively it speeds up the computer's overall processing.

Read more about Cache Memory

1. First-Level Cache (L1 Cache)

The L1 cache is the closest cache to the CPU. It is split into two separate caches:

  • L1I: Cache for instructions
  • L1D: Cache for data

2. Second-Level Cache (L2 Cache)

The L2 cache is also known as the secondary cache. It is located between the CPU and the main memory and serves as a buffer to improve access times.

3. Main Memory

Main memory or RAM is the last level of memory the CPU will check when looking for data. If the data is not found in the cache, the CPU fetches it from main memory.

4. Memory Hierarchy

A memory hierarchy describes the levels of memory between the CPU and main memory. For a true multi-level hierarchy, there should be at least two caches between the CPU and main memory. Caches closer to the CPU are referred to as upstream or predecessor caches, while caches closer to the main memory are referred to as downstream or successor caches.

5. Block

A block (or line) is the smallest unit of data in the cache. It is associated with a tag, which helps identify the corresponding part of the main memory.
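To make the tag concrete, here is a minimal sketch of how a cache might split an address into tag, set index, and block offset. The block size and set count below are invented for the example; real caches vary.

```python
# Minimal sketch: splitting a 32-bit address into tag / set index / block
# offset. The parameters (64-byte blocks, 128 sets) are illustrative
# assumptions, not fixed values.

BLOCK_SIZE = 64    # bytes per block (line)
NUM_SETS   = 128

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64)  = 6
INDEX_BITS  = NUM_SETS.bit_length() - 1     # log2(128) = 7

def split_address(addr: int):
    offset = addr & (BLOCK_SIZE - 1)                 # byte within the block
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # which set to search
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)      # identifies the memory block
    return tag, index, offset

print(split_address(0x12345678))  # -> (tag, set index, byte offset)
```

The tag is whatever remains of the address after the offset and index bits are removed; it is what the cache compares against to decide whether a block is the one being requested.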

6. Set

A set is a collection of blocks that can be checked in parallel. If a cache has only one block per set, it is called a direct-mapped cache. A set-associative cache has multiple blocks per set, allowing more flexibility in placing data.

Read about Cache Mapping Techniques

7. Associativity

Associativity refers to the number of blocks in a set. A direct-mapped cache has one block per set, meaning each memory block has a single location in the cache. Higher degrees of associativity allow for multiple possible locations for each memory block, improving cache performance.
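The sketch below illustrates how associativity changes the number of candidate locations for a memory block. The cache size of 8 blocks is an arbitrary assumption for the example.

```python
# Sketch: where a memory block may live for different associativities.
# The total cache size of 8 blocks is an assumption for illustration.

CACHE_BLOCKS = 8   # total blocks in the cache

def candidate_slots(block_number: int, ways: int):
    """Return the cache slots a memory block may occupy."""
    num_sets = CACHE_BLOCKS // ways
    s = block_number % num_sets            # set index for this block
    return [s * ways + w for w in range(ways)]

print(candidate_slots(12, 1))   # direct-mapped: exactly one slot -> [4]
print(candidate_slots(12, 2))   # 2-way set-associative: two slots -> [0, 1]
print(candidate_slots(12, 8))   # fully associative: any slot -> [0..7]
```

With one way, the block has exactly one possible slot (direct-mapped); with eight ways in this 8-block cache, it can go anywhere (fully associative).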

8. Sub-block

A sub-block is a smaller unit of data within a block, associated with a valid bit. The size of the sub-block is typically less than or equal to the block size.

9. Fetch Size

The fetch size is the maximum amount of memory that can be fetched from the next memory level (such as from main memory to cache). It is typically a multiple of the sub-block size and can be larger or smaller than the block size.

10. Read

A read is a request to fetch a consecutive collection of words from a cache at a specific address. The CPU generates both instruction fetches and load references, both of which are read operations.

11. Write

A write request contains an address, a number of sub-blocks to be written, and a mask to define the data to be written.
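As a small illustration of the fields a write request carries, here is a hypothetical record type; the field names are invented for the example, not part of any real interface.

```python
from dataclasses import dataclass

# Sketch of the fields a write request carries, per the definition above.

@dataclass
class WriteRequest:
    address: int        # where the data goes
    num_subblocks: int  # how many sub-blocks are written
    mask: int           # bitmask selecting which words are actually written

req = WriteRequest(address=0x2000, num_subblocks=2, mask=0b1011)
print(req)
```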

12. Read Miss

A read miss occurs when the data requested is not found in the cache. It can happen if none of the tags in the set match the requested address, or if one or more of the sub-blocks in a matching block are invalid.
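The two miss conditions can be expressed directly in code. The sketch below is a simplified model, assuming each block stores a tag and one valid bit per sub-block.

```python
# Sketch of the two read-miss conditions defined above: either no tag in
# the set matches, or a matching block has an invalid sub-block.

def is_read_miss(cache_set, tag, subblock):
    for block in cache_set:                      # checked in parallel in hardware
        if block["tag"] == tag:
            return not block["valid"][subblock]  # miss if sub-block is invalid
    return True                                  # miss: no tag matched

cache_set = [{"tag": 0x9A, "valid": [True, False]}]
print(is_read_miss(cache_set, 0x9A, 0))  # False: hit
print(is_read_miss(cache_set, 0x9A, 1))  # True: matching tag, invalid sub-block
print(is_read_miss(cache_set, 0x7F, 0))  # True: no matching tag
```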

13. Miss Ratios

  • Local Read Miss Ratio: The number of read misses in a cache divided by the total number of read requests to that cache.
  • Global Read Miss Ratio: The number of read misses in a cache divided by the number of read requests generated by the CPU.
  • Solo Read Miss Ratio: The miss ratio for a cache when it is the only cache in the memory hierarchy.
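A short numeric sketch helps separate local from global miss ratios; all counts below are invented for a hypothetical two-level (L1/L2) hierarchy.

```python
# Hypothetical two-level hierarchy used to illustrate the ratios;
# the counts are invented for the example.

cpu_reads = 1000   # read requests generated by the CPU
l1_misses = 100    # reads that miss in L1 (and go on to L2)
l2_misses = 20     # reads that also miss in L2

l1_local  = l1_misses / cpu_reads   # 0.10 (L1 sees every CPU read)
l2_local  = l2_misses / l1_misses   # 0.20 (L2 only sees L1's misses)
l2_global = l2_misses / cpu_reads   # 0.02 (normalized to CPU references)

print(l1_local, l2_local, l2_global)
```

Note that L2's local miss ratio (0.20) looks much worse than its global miss ratio (0.02), because L2 only ever sees the references that already missed in L1.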

14. Read Traffic Ratio

The read traffic ratio is the number of words fetched from the next level in the hierarchy (such as from main memory) divided by the number of words fetched from the cache.

15. Write Traffic Ratio

The write traffic ratio is the number of words written out to the next level in the hierarchy (such as to main memory) divided by the number of words written into the cache.
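As a quick illustration with invented numbers: if the CPU fetches 1,000 words from a cache and the cache in turn fetches 200 words from main memory, the read traffic ratio is 200 / 1,000 = 0.2. Likewise, if 500 words are written into the cache and 50 words are written out to main memory, the write traffic ratio is 50 / 500 = 0.1.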

16. Write Strategy

There are two basic strategies for writing data to the next level of the memory hierarchy:

  • Write-through: Data is written to both the cache and the next level of memory simultaneously.
  • Write-back: Data is initially written only to the cache and is later written to the next level of memory when it is evicted.

Both strategies may involve write buffering (width and depth), and a strategy for handling write misses must also be chosen.
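The toy model below contrasts the two policies for a single cached block. Plain dictionaries stand in for the cache and the next memory level; this is a sketch of the idea, not a real controller.

```python
# Toy sketch contrasting write-through and write-back for one cached block.

memory = {0x100: 0}
cache  = {}                      # addr -> {"data": ..., "dirty": bool}

def write(addr, value, policy):
    cache[addr] = {"data": value, "dirty": policy == "write-back"}
    if policy == "write-through":
        memory[addr] = value     # next level updated immediately

def evict(addr):
    line = cache.pop(addr)
    if line["dirty"]:            # write-back defers this until eviction
        memory[addr] = line["data"]

write(0x100, 42, "write-back")
print(memory[0x100])             # 0: main memory is still stale
evict(0x100)
print(memory[0x100])             # 42: flushed on eviction
```

Under write-through, memory would already hold 42 immediately after the write; under write-back, the update only reaches the next level when the block is evicted.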

Read more about Write Through and Write Back in Cache

17. Replacement Strategy

The replacement strategy determines how the cache decides which block to evict when space is needed. Common strategies include:

  • Random: A random block is chosen for eviction.
  • Least Recently Used (LRU): The block that has not been accessed for the longest time is evicted.

For direct-mapped caches, there is no need for a replacement strategy, as each memory block maps to exactly one location in the cache.
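A common way to prototype LRU is with an ordered map, evicting from the oldest end. The sketch below models a single 4-way set; the capacity and access pattern are arbitrary choices for the example.

```python
from collections import OrderedDict

# Minimal LRU eviction sketch for one cache set with 4 ways.

WAYS = 4
lru_set = OrderedDict()          # tag -> data, ordered oldest-to-newest

def access(tag, data=None):
    if tag in lru_set:
        lru_set.move_to_end(tag)                     # hit: mark most recently used
    else:
        if len(lru_set) >= WAYS:
            victim, _ = lru_set.popitem(last=False)  # evict the LRU block
            print("evicting tag", victim)
        lru_set[tag] = data                          # install the new block

for t in [1, 2, 3, 4, 1, 5]:     # tag 2 is least recently used when 5 arrives
    access(t)                    # prints "evicting tag 2"
```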

18. Hit Ratio

The hit ratio is the proportion of cache accesses that result in a cache hit (i.e., the requested data is found in the cache).

Formula: Hit Ratio = Number of Hits / Total Number of References

19. Miss Ratio

The miss ratio is the proportion of cache accesses that result in a cache miss (i.e., the requested data is not found in the cache).

Formula: Miss Ratio = Number of Misses / Total Number of References

Since Miss Ratio = 1 - Hit Ratio, they are complementary.
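For example (with invented numbers), if a program makes 1,000 memory references and 950 of them hit in the cache, the hit ratio is 950 / 1,000 = 0.95 and the miss ratio is 1 - 0.95 = 0.05.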

20. Miss Penalty

The miss penalty is the time required to fetch a block from main memory to the cache when a cache miss occurs. This includes the time to access main memory and any additional delays from memory hierarchy levels.
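For example (with invented timings), if accessing main memory takes 100 ns and transferring the block into the cache adds another 20 ns, the miss penalty is 120 ns.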
 

