0% found this document useful (0 votes)

136 views

Intellegent Cache System

The document discusses the Selective-Mode Intelligence (SMI) cache system, which aims to optimize both temporal and spatial locality. It does this through a direct-mapped cache with small blocks to exploit temporal locality, and a fully associative spatial buffer with large blocks to exploit spatial locality. When a block is missed in the direct-mapped cache, the large block containing it is fetched to the spatial buffer. Blocks within that are referenced during its residency are then moved to the direct-mapped cache, selectively retaining those with high temporal locality. This SMI cache reduces misses compared to a conventional cache, due to better exploiting both types of data locality.

Uploaded by

api-3746880

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

136 views

Intellegent Cache System

Uploaded by

api-3746880

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

You are on page 1/ 18

1: Introduction

Cache memory is a key mechanism for improving overall system

performance. A cache exploits the locality inherent in the reference stream
of the typical application. Two primary type of locality are available and the
degrees to which they can be exploited depend on the program input
characteristic. Temporal locality relies on the greater probability that recently
accessed data will be accessed in near future. Spatial locality refers to the
tendency for adjacent or nearby locations to be reference together in time.

Prefetching mechanism can be used to reduce cache misses. It

also reduces cache stall time by bringing data into cache before it use, so that it
can be accessed without delay. Hardware-based prefetching requires some
modification to of the cache, but little modification to processor core. Its main
advantage is that prefetches are handled dynamically at run time without
compiler intervention. The drawbacks are that extra hardware is needed and the
memory reference for complex access pattern is difficult to predict. In contrast
software-based approaches rely on compiler technology to perform static
program analysis and to selectively insert prefetch instructions. The draw backs
are the nonnegligible execution overhead is introduced due to extra instructions
and that static analysis may misses some prefetch opportunities that are
obvious at runtime.

Most hardware prefetching mechanisms generate a

prefetch signal when either a cache miss or hit occurs. Therefore a large
number of prefetch signal are generated, leading to higher power
consumption and cache pollution. Prefetching can also cause an increase in
the MCPI (Memory Cycle Per Instruction) in the worst case. This intelligent
prefetching mechanism avoids these problems by reducing the prefetch
generation rate. As a result, power consumption is also reduced and increase
in MCPI is negligible.

A SMI cache is constructed in three parts; a conventional direct

mapped cache With a small block size, a fully associative buffer with a large
block size at the same cache level, and a hardware prefetching unit. The
improvement in performance is achieved by exploiting the basic characteristic
of locality. Specifically, two different block size are used, i.e., a small block
size to exploit temporal locality and a large block size that is multiple of the
small block size to exploit spatial locality. Also, temporal locality is enhanced
by selectively retaining block with a high probability of repeated reference in
time domain, and spatial locality is enhanced by both a large fetch size and an
intelligent prefetching mechanism. The technique we used for enhancing
temporal locality is to monitor the behavior of a block over a time interval
before choosing to store it into the direct mapped cache. That is, instead of
placing every missed block directly into the direct mapped cache, we first load
a large block including the missed block into the spatial buffer. The missed
block is not moved into the direct mapped cache until the large block is
replaced from the spatial buffer, and then only if it has shown evidence of
temporal locality while it was resident in the spatial buffer. Thus the SMI cache
exploits information about utilization of the block that is obtained during this
time interval. The interval itself is the proportional of the entries in the spatial
buffer. The behavior that we observe during this time enables us to determine
which block show strong temporal locality.
The average miss ratio and memory access time of the SMI
cache for a given cache space (e.g., 8 KB) are equivalent to those of a
conventional direct-mapped cache with four times as much space (e.g., 32 KB).
Thus, a 60 to 80 percent reduction in chip area can be obtained over a direct
mapped cache while achieving a high performance gain. Also, the SMI cache
can reduce the miss ratio by around 20% and the average memory access time
by 10%, compared with a victim cache.

2: Selective-Mode Intelligence Cache System

The motivation for the design, the architectural model for the SMI cache
system, and its operation in the context of hardware prefetching is as follows.

2.1: Motivation:

Common design objective for the cache are to improve utilization of

the temporal and spatial locality inherent in applications. However no single
cache organization exploit both temporal and spatial locality optimally because
of their contradictory characteristics.
Increasing block size reduces the number of cache blocks. Thus, conventional
cache design attempt to compromise by selecting an adequate block size. The
cache’s lack of adaptability to pattern of references of different types of locality
poses a further drawback. A fixed block size result in suboptimal overall access
time and bandwidth
Utilization because it fails to match the varying spatial and temporal locality
across and within programs. A cache system that provides two caches with
different configurations is an effective structure for overcoming these structural
drawbacks.
A significant reason for using a direct map catch with a small block size is to
reduce power consumption and cycle time. To make up for the poor spatial
locality of such a direct mapped cache, we provide a fully associative spatial
buffer with a large block size. The small block size exploits temporal locality
and the large block size exploits spatial locality. Small blocks are first loaded
as part of larger blocks that are fetched into the spatial buffer. We selectively
extend the lifetime of those small blocks that exhibits high temporal locality by
storing them in the direct mapped cache. When a large block is replaced in the
spatial buffer, small blocks belonging to it that have been accessed at least once
during its residency are moved into the direct mapped cache. This approach
effectively reduces conflict misses and thrashing at the same time.

Making the block size of direct mapped cache as small as possible

improves temporal locality by increasing the number of available entries for a
given cache space. For example, the number of entries for four byte block is 16
times more than for 64 byte blocks. Therefore, the lifetime of a data item in a
direct mapped cache with s – byte block size is at most l/s times that for a
cache with l byte (l>s) block size, for a given cache size. In conventional direct
mapped cache the increase in spatial locality due to a larger block size gives a
greater performance improvement than the increase in temporal locality
resulting from increasing the number of entries. However, the spatial buffer of
the SMI cache provides the necessary spatial locality and thus we choose the
smallest block size possible for the direct mapped cache.

Fetching a large block when a cache miss occurs increase spatial

locality. Our simulations show the probability that neighboring small blocks
will be accessed during the life time of initial blocks is more than 50% for most
benchmarks. An intelligent prefetching approach that emphasizes prefetching
of data that has exhibited evidence of spatial locality can also improve cache
performance.

Instead of loading a missed block directly into the direct mapped

cache, we load it into the fully associative spatial buffer. Then, time interval
proportional to the number of entries in the buffer elapsed before it is evicted.
The small blocks, which are individual words in the simple case within the
large block, that have been referenced during that time are moved into the
direct mapped cache when the large block is evicted.

2.2: SMI Cache System structure:

The direct mapped cache is the main cache and its organization
is similar to a traditional direct mapped cache. But with a smaller block size.
The spatial buffer is designed so that each entry is a collection of several
banks, each of which is the size of a block in the direct mapped cache. The tag
space of the buffer is a content addressable memory (CAM). The small blocks
in each bank include a hit bit (H), to distinguish referenced blocks from
unreferenced blocks. Each large block entry in the spatial buffer further has a
prefetch bit (P), which direct the operation of prefetch controller. The prefetch
bits are used to dynamically generate prefetch operations.

When CPU performs a memory reference, the direct mapped cache

and the spatial buffer are searched in parallel. A hit in the direct mapped cache
is processed is the same way as a hit in the conventional L1 cache. When a
miss occurs in both the direct mapped cache and the spatial buffer, a large
block is fetched into the spatial buffer. If a reference misses in the direct
mapped cache, but hits in the spatial buffer, its corresponding small block is
simply fetched from the spatial buffer and the hit bit for that small block is set
at the same time.

The prefetch controller generates a prefetch signal when a large

block is accessed and found to have multiple hit bits set. Two operations are
performed consecutively by the prefetch controller. Given a hit in the ith large
block, the first operation search the tags of the spatial buffer to direct whether
the (i+1)th large block is already present. A one cycle penalty occurs in this
case, but the overhead is negligible because prefetching is initiated in response
to only about 1.5 ~2.5 % of the total number of addresses generated by the
CPU. Thus the average MCPI is increased by about 0.06 %.If the (i+1)th large
block is not already in the spatial buffer, the second operation is performed: It
is faced into a prefetched buffer and the P bit in the ith large block is set to
prevent it from generating further prefetches. The number of prefetch buffer
entries is assumed to be one.

If misses occurs in both the direct mapped cache and the

spatial buffer, the cache control initiates miss handling. When this occurs, a
block that was already in prefetch buffer is transferred to spatial buffer.
Therefore, the transfer time is hidden because 19 clock cycles are required for
handling a miss. But, the missed block may already be present in the prefetch
buffer. Therefore when the block in the prefetch buffer is transferred to the
spatial buffer the tag value should be simultaneously compared with the
generated address. This comparison can be implemented either by using the
comparator in the direct mapped cache or an additional comparator. If the
comparator shows a match, the requested data item is transmitted to the CPU
and transferred to the spatial buffer at the same time. Also the ongoing miss
signal should be cancelled by the cache controller

2.3: Operational Model of The SMI Cache:

The algorithm for managing the SMI cache is as follows.

Prefetching is performed dynamically depending upon both H and P bits for a
particular large block in the spatial buffer. On every memory access, both the
direct mapped cache and the spatial buffer are accessed at the same time. The
different cases for the operational model are as follows.

1. Case of Cache Hits:

On every memory access, a hit may occurs at the direct mapped

cache or the spatial buffer. First, consider the case of a hit in the direct mapped
cache. If a read access to the direct mapped cache is a hit, the requested data
item is transmitted to the CPU without delay. If a write access to the direct
mapped cache is a hit, the write operation is performed and the dirty bit for the
block is set.

In second case when the hit is in the spatial buffer, the

data in the small block are sent to the CPU and the hit bit is set to the mark it
as referenced. As an aside, we also reduce power consumption by using the
most significant bits of the large block offset to activate just one of the banks
in the spatial buffer. For example, if the size of a small block is 8-byte and a
large block is 32-byte, there are four banks in the spatial buffer. When a large
block is accessed and its P bit is in the reset state, a prefetch operation is
initiated if a hit occurs in any bank of the spatial buffer and one or more hit
bits are already set. At the same time, the tags of the spatial buffer are
searched for the prefetch address to check whether it is already present, in
which case the prefetch is squashed and the P bit is set. If the address is not in
the spatial buffer, the prefetch controller generates the prefetch signal and the
target address to fetch the large block into the prefetch buffer from the next
level of memory. At the same time, the P bit of the original large block is set to
prevent the repetition of the prefetch. If the P bit of the large block is set, the
consecutive large block must be present in either the spatial buffer or the
prefetch buffer, and there is no need to search the tags within the spatial
buffer.

If a large block is in the prefetch buffer when a

subsequent prefetch signal is generated as a result of a cache miss, then the
miss stalls while the content of the prefetch buffer are moved into the spatial
buffer. According to simulation result, this case almost never occurs, even for
a prefetch buffer with one entry. This is because the overall rate of prefetch
operation is only about 0.3% and the miss-ratio is about 1.7 percent.
Therefore, the probability for a prefetch to be initiated in this manner is about
six times smaller than the miss ratio. Because the utilization of the prefetch
block is over 90 percent, it is not worth adding hardware specifically to handle
this rare case, relying instead on the existing cache stall mechanism.

2. Case of Cache Misses:

If a miss occurs in both the caches, a large block including

the missed small block is brought in the spatial buffer from the next level of
memory. Let an 8 K.B. direct mapped cache with a small block size of 8 bytes
and a 1 K.B. spatial buffer with a large block size of 32 bytes, so four
sequential blocks are contained within a 32 byte block. There are further two
cases.

(a) The Spatial Buffer is not full:

If at least one entry in the spatial buffer is in the invalid state, a large
block is fetched and stored in the spatial buffer. When a particular small block
is accessed by the CPU, the corresponding hit bit is set to one. Thus, the hit bit
of the small block identifies it as a referenced block.

(b) The Spatial Buffer is full:

If the spatial buffer is full, the oldest entry is replaced according

to a FIFO policy. At that point, the blocks in the entry whose hit bits are set are
moved into the direct mapped cache. Because these actions are accomplished
while the cache controller is handling a miss, this operation does not introduce
any additional delay.

Cache write back does not occur from the spatial buffer because
any modified or referenced small block is always moved to the direct mapped
cache or victim cache, which would typically have the small block size (e.g.,
32 bytes) as the spatial buffer, write back must be performed for the full 32
byte block, even though one word require write back. In contrast, the SMI
cache executes the write back operation only for the marked 8-byte small
blocks. Therefore, write traffic into memory is potentially reduced to a
significant degree.
The potential exists in any split cache for incoherent copies of
blocks appear in the different subcaches. To avoid this problem, we chose and
simulated a simple mechanism, which is as follows: When a global miss
occurs, the cache controller searches the tags of the temporal caches to detect
whether any of the four small blocks belonging to the particular large block
being fetched are present in the temporal caches. If a match is detected, then all
the corresponding small blocks in the temporal cache are invalidated. Each of
these small blocks that is also dirty is than used to update its corresponding
entry in the spatial buffer once the large block has been loaded. This search
operation can be accomplished while the cache controller is handling a miss.
Further, the power consumption is negligible because the miss ratio is only
about 1.7% of the total number of the addresses generated by the CPU.

A small block may thus temporarily exist in the temporal

cache in the invalid state, while its valid copy is being referenced in the spatial
buffer. When its corresponding large block is replaced, the small block is
copied into the temporal cache once again. Therefore, there is almost no
performance decrease. Of course, if three or four small blocks are present in
both the temporal cache and the spatial cache, then the effective utilization of
total cache space decreases a bit more, but is still negligible. This mechanism
also applies in the case of transferring a prefetched block into the spatial buffer.

3: Performance Evaluation
The direct mapped cache is chosen, for comparison in
terms of performance and cost.

3.1: Time of Prefetch Signal Generation and Overhead:

The more meaningful measure to evaluate the performance of

any given memory hierarchy is the average memory access time:

Average memory access time=

Hit time + Miss rate * Miss penalty

Here, hit time is the time to process a hit in the cache and miss penalty is the
additional time to service the miss. The basic parameters for the simulation are
as follows: The hit time of the direct mapped cache and fully associative buffer
are both assumed to be one cycle. We assumed 15 cycles are needed for a miss.
Therefore, each 8-byte block is transferred from the off-chip memory after a 15
cycle penalty. These parameters are based on the values for common 32-bit
embedded processor (e.g., Hitachi SH4 or ARM920T).

Simulations show s that prefetching when the number of hit bits

is two achieves a more significant miss gain than the other case, in spite of the
potential for greater overhead due to increased prefetching frequency. With
respect to power consumption, memory traffic, and the accuracy of the
prefetching operation, the “Prefetch-4” mechanism provides the most
significant effect.
Finally in case of prefatching a block that does not exist in the special
buffer, the target block is simply fetched into the prefetch buffer. The rate at
which prefetch actually occurred and that prefetched blocks are actually
referenced is shown in fig 4 and fig 5 respectively. With only a small number
of prefetch operations, the SMI cache system achieves a significant
performance gain with low overhead. For the “prefetching-4” mechanism, the
utilization of the prefetched blocks is over 90%. This data clearly show that
spatial locality is enhanced by prefatching a neighboring block intelligently
when a spatial buffer hit occurs.

3.2: Comparison of a Conventional Cache with the SMI

Cache Configuration

Two common performance metrics, the miss ratio and the average
memory access time, are used to evaluate and compare an SMI cache system
operating in a “prefetch-4” configuration with other approaches. Here, the
direct mapped cache is compared with the SMI cache in terms of miss ratio and
average memory access time.

Miss Ratio and Average Memory Access Time

The combination of an 8-byte small block
and a 32-byte large block shows the best performance for most cases.

The cache miss ratio for the conventional direct-mapped cache

and the SMI cache is shown in the Table-A. For the direct mapped cache,
denoted as DM, the notation “32KB-32byte” denotes a 32KB direct mapped
cache with a block size of 32bytes. The SMI cache notation “8K8-1K32”
denotes an 8KB direct-mapped cache with a block size of 8bytes and a 1KB
spatial buffer with a block size of 32-bytes. The average miss ratio of the SMI
cache for a given size (e.g., 8KB) is equal to a conventional direct mapped
cache with a cache size of four or eight time as much space (e.g., 32 or 64KB)
in nonprefetching mode and prefetching mode, respectively. Moreover,
increasing the spatial buffer space achieves a more significant performance
gain than increasing the direct mapped cache space. The miss ratios for a
conventional two way set associative cache and the SMI cache are compared in
Table-B. The two-way set associative cache greatly reduces the miss ratio, but
because of its slower access time and higher power consumption, this is not
much effective. The result shows that the SMI cache can achieve better
performance than a two way set associative cache with double the space.
The average memory access time for the conventional
direct mapped cache and the SMI cache are compared in Table-C. We can see
that 8K8-1K32(Prefetch mode) cache has greater Average Memory Access
Time (cycle) than the Direct mapped cache, while 8K8-2K32(Prefetch mode)
has less Average Memory Access Time (Cycle). This analysis shows that
application with a high degree of locality, show an especially strong
performance improvement with the SMI cache.

4.3: Relation between Cost and Performance

In general the, logic to manage the tags for the fully associative
cache is designed as a CAM structure for simultaneous comparison for each
entry. Because each CAM cell is a combination of storage and comparison, the
size of CAM cell is double of RAM cell. For fair performance/cost analysis,
the performance for various direct mapped-cache and buffer size is evaluated.
The metric is rbe (register bit equivalent) and the total area can be calculated
as follows:

Area = PLA + RAM + CAM. (1)

Here, the control logic PLA (Programmable logic array) is assumed to be 130
rbe, a RAM cell as 0.6 rbe, and a CAM cell is 1.2 rbe. Equation (2) represents
the RAM area:

RAM = 0.6 * (#entries + Lsense_amp) *

((#data_bits + #status_bits) + Wdriver), (2)

where Lsense_amp is the bit length of a bit line sense amplifier, Wdriver
the data width of a driver, #entries the number of rows of the tag array or data
array, #data_bits the tag bit or data bit of one set, and #status_bits the state bits
of one set. Finally, (3) calculate the area of the CAM:

CAM = 0.6 * (2^ (1/2) * #entries + Lsense_amp) *

(2^ (1/2) * #tag_bits + Wdriver), (3)

where #tag_bits is the number of bits for one set in the tag array. Table 1
shows the performance/cost ratio for direct mapped cache and SMI cache.

The SMI cache shows about a 60% area reduction compared with the 64KB-
32byte conventional direct mapped cache, even though it provides higher
performance. And, it offers an 80 percent area reduction compared with the
64KB-32byte configuration, while providing much higher performance. Also
the improvement ratio for the average memory access time shows that the
8KB-2KB SMI is the best configuration.

Performance and Cost of the SMI cache and Direct-mapped Cache

32KB-32byte 8KB-1KB 64KB-2KB 8KB-2KB

(DM) (SMI) (DM) (SMI)
Area 177496 rbe 67431 rbe 352596 rbe 73680 rbe
(Improvement ratio) (1.00) (0.38) (1.99) (0.42)

Miss ratio 1.89% 1.61% 1.46% 1.37%

(Improvement ratio) (1.00) (0.85) (0.77) (0.73)

Avg. memory 1.34 cycle 1.29 cycle 1.26 cycle 1.25 cycle
access time (1.00) (0.96) (0.94) (0.93)
(Improvement ratio)

Miss Ratio (%)

32KB-32byte (DM) 1.90
64KB-32 byte (DM) 1.45
8K8-1K32 (NPF) 2.05
8K8-2K32 (NPF) 1.80
8K8-1K32 (PF) 1.70
8K8-2K32 (PF) 1.45

Table-A: Miss Ratio of the Direct Cache and SMI Cache.

Miss Ratio (%)

16KB-32byte (2-way) 2.10

32KB-32 byte (2-way) 1.10
8K8-1K32 (NPF) 2.05
8K8-2K32 (NPF) 1.80
8K8-1K32 (PF) 1.70
8K8-2K32 (PF) 1.45

Table-B: Miss Ratio if the Two-way set associative cache and SMI cache.

Average Memory Access Time (Cycle)

32KB-32byte (DM) 1.35

64KB-32 byte (DM) 1.28
8K8-1K32 (NPF) 1.36
8K8-2K32 (NPF) 1.31
8K8-1K32 (PF) 1.28
8K8-2K32 (PF) 1.25

Table-C: Average memory access time of the direct mapped cache and the SMI
cache.
4. Conclusion

To design a simple but high performance cache system with low

cost , a new caching mechanism for exploiting two types of locality
effectively and adaptively is designed: A direct mapped cache with a
small block size for exploiting temporal locality and a fully associative
spatial buffer with a large block size for exploiting spatial buffer. An
intelligence hardware based prefetching mechanism is used to maximize
the effect of spatial locality.

The SMI cache overcomes the structural drawbacks of

direct mapped caches, such as conflict misses and thrashing. We can
evaluate the SMI cache system in two configurations, nonprefetching
mode and prefetching mode, to analyze the contribution of intelligent
prefetching. Both modes provide high performance, but non prefetching
mode offers lower power consumption, while prefetching mode offers
higher performance.

The temporal mechanism of the SMI cache

decreases conflict misses by about 26% and the spatial locality miss
ratio decrease by about 48%.

Table of Content:
1. Introduction
2. Selective Mode Intelligent Cache System

2.1. Motivation

2.2. SMI Cache System Structure

2.3. Operational Model of The SMI cache

2.3.1. Case of Cache Hits

2.3.2. Case of Cache misses

(A). The spatial buffer is not full:

(B). The spatial buffer is full:

3. Performance Evaluation

3.1. Time of Prefetch Signal Generation and Overhead

3.2. Comparison of a Conventional Cache with the SMI
Cache Configuration (Miss Ratio and Average
Memory Access Time)
3.3. Relation between Cost and Performance

4. Conclusion

References:
1. IEEE transaction On computers,
Vol. 52, NO.5, MAY 2003
2. Computer Architecture and Organization:
John P. Hayes
3. Computer Organization and Architecture:
William Stallings

Cache Memory Presentation Slides
No ratings yet
Cache Memory Presentation Slides
25 pages
HP - HPE0-V25.vNov-2023.by .Gary .34q
No ratings yet
HP - HPE0-V25.vNov-2023.by .Gary .34q
19 pages
Invitation To Computer Science 7th Edition Schneider Test Bank 1
100% (65)
Invitation To Computer Science 7th Edition Schneider Test Bank 1
10 pages
Elements of Cache Design
No ratings yet
Elements of Cache Design
6 pages
ARM Cortex-A76 Block Diagram
No ratings yet
ARM Cortex-A76 Block Diagram
1 page
Group Project
85% (13)
Group Project
50 pages
Reconfigurable Cache Architecture: Major Technical Project On
No ratings yet
Reconfigurable Cache Architecture: Major Technical Project On
9 pages
Design of Cache Memory Mapping Techniques For Low Power Processor
No ratings yet
Design of Cache Memory Mapping Techniques For Low Power Processor
6 pages
Unit 5
No ratings yet
Unit 5
40 pages
Computer Mapping and Different Memory
No ratings yet
Computer Mapping and Different Memory
9 pages
MTP 01 Final J.raghunat b15216
No ratings yet
MTP 01 Final J.raghunat b15216
10 pages
Computer Organization and Architecture Module 3
100% (1)
Computer Organization and Architecture Module 3
34 pages
Caches in Multicore Systems: Universitatea Politehnica Din Timisoara Facultatea de Automatica Şi Calculatoare
No ratings yet
Caches in Multicore Systems: Universitatea Politehnica Din Timisoara Facultatea de Automatica Şi Calculatoare
7 pages
Cache Memory: Replacement Algorithms
No ratings yet
Cache Memory: Replacement Algorithms
9 pages
CPU Cache: From Wikipedia, The Free Encyclopedia
No ratings yet
CPU Cache: From Wikipedia, The Free Encyclopedia
19 pages
Non Inclusive Caches
No ratings yet
Non Inclusive Caches
10 pages
Cache: Why Level It: Departamento de Informática, Universidade Do Minho 4710 - 057 Braga, Portugal Nunods@ipb - PT
No ratings yet
Cache: Why Level It: Departamento de Informática, Universidade Do Minho 4710 - 057 Braga, Portugal Nunods@ipb - PT
8 pages
Conspect of Lecture 7
No ratings yet
Conspect of Lecture 7
13 pages
Cache Memory
No ratings yet
Cache Memory
8 pages
Chapter 2 Neede For Guide Line Help From Smiw
No ratings yet
Chapter 2 Neede For Guide Line Help From Smiw
7 pages
Unit 5 Memory Management
No ratings yet
Unit 5 Memory Management
20 pages
Evaluating Stream Buffers As A Secondary Cache Replacement
No ratings yet
Evaluating Stream Buffers As A Secondary Cache Replacement
10 pages
Unit 5 Notes (1)
No ratings yet
Unit 5 Notes (1)
26 pages
cache
No ratings yet
cache
5 pages
Associative Memory
No ratings yet
Associative Memory
25 pages
4 Unit Speed, Size and Cost
No ratings yet
4 Unit Speed, Size and Cost
5 pages
Shared-Memory Multiprocessors - Symmetric Multiprocessing Hardware
No ratings yet
Shared-Memory Multiprocessors - Symmetric Multiprocessing Hardware
7 pages
Krishna M. Kavi The University of Alabama in Huntsville: Cache Memories
No ratings yet
Krishna M. Kavi The University of Alabama in Huntsville: Cache Memories
5 pages
CACHE memory
No ratings yet
CACHE memory
24 pages
Mapping Functions
No ratings yet
Mapping Functions
23 pages
Cache Entries
100% (1)
Cache Entries
13 pages
Memory Hierarchy
No ratings yet
Memory Hierarchy
10 pages
kientrucenglish
No ratings yet
kientrucenglish
9 pages
Lecture 2.2.4 (Associative Memory, Cache Memory and Its Design Issues)
No ratings yet
Lecture 2.2.4 (Associative Memory, Cache Memory and Its Design Issues)
54 pages
CPU Cache: Details of Operation
No ratings yet
CPU Cache: Details of Operation
18 pages
Chapter 2
No ratings yet
Chapter 2
6 pages
UNIT 4_ee4604ab899872c2fb7aa8d7693177f7
No ratings yet
UNIT 4_ee4604ab899872c2fb7aa8d7693177f7
36 pages
CS Note
No ratings yet
CS Note
51 pages
Cache Memory
No ratings yet
Cache Memory
11 pages
Cache 13115
No ratings yet
Cache 13115
20 pages
18bce2429 Da 2 Cao
No ratings yet
18bce2429 Da 2 Cao
13 pages
Implementation of Cache Memory
No ratings yet
Implementation of Cache Memory
15 pages
Cache Memory
No ratings yet
Cache Memory
4 pages
Cache Design
No ratings yet
Cache Design
59 pages
Computer Organization Answer
No ratings yet
Computer Organization Answer
6 pages
Dynamic Storage Allocation Techniques Final PDF
No ratings yet
Dynamic Storage Allocation Techniques Final PDF
14 pages
Ec6009 Advanced Computer Architecture Unit V Memory and I/O: Cache Performance
No ratings yet
Ec6009 Advanced Computer Architecture Unit V Memory and I/O: Cache Performance
16 pages
Assignment4-Rennie Ramlochan
No ratings yet
Assignment4-Rennie Ramlochan
7 pages
Cache Memory: Computer Architecture Unit-1
No ratings yet
Cache Memory: Computer Architecture Unit-1
54 pages
Sampriya Chandra Cache Memory
No ratings yet
Sampriya Chandra Cache Memory
36 pages
Memory Hierarchy - Introduction: Cost Performance of Memory Reference
No ratings yet
Memory Hierarchy - Introduction: Cost Performance of Memory Reference
52 pages
Cache Memory in Computer Organizatin
No ratings yet
Cache Memory in Computer Organizatin
12 pages
Shashank Aca Assignment
No ratings yet
Shashank Aca Assignment
21 pages
Cache Memory
No ratings yet
Cache Memory
20 pages
Unit 3 Notes
No ratings yet
Unit 3 Notes
7 pages
Memory Hierarchy 4.0
No ratings yet
Memory Hierarchy 4.0
50 pages
Cache Memory in Computer Organization
No ratings yet
Cache Memory in Computer Organization
5 pages
Cache Misses
No ratings yet
Cache Misses
8 pages
Term Paper: Cahe Coherence Schemes
No ratings yet
Term Paper: Cahe Coherence Schemes
12 pages
Cache Coherence Protocols: Evaluation Using A Multiprocessor Simulation Model
No ratings yet
Cache Coherence Protocols: Evaluation Using A Multiprocessor Simulation Model
26 pages
Literature Review of Cache Memory
100% (2)
Literature Review of Cache Memory
7 pages
Memory Management: Abstract
No ratings yet
Memory Management: Abstract
20 pages
6.cache Memory - BVK
No ratings yet
6.cache Memory - BVK
47 pages
Optimized Caching Techniques: Application for Scalable Distributed Architectures
From Everand
Optimized Caching Techniques: Application for Scalable Distributed Architectures
Peter Jones
No ratings yet
Tool Command Language
100% (1)
Tool Command Language
21 pages
VB Windows Services
No ratings yet
VB Windows Services
9 pages
Web Application
No ratings yet
Web Application
13 pages
Quantam Computers
No ratings yet
Quantam Computers
21 pages
The Oops of C++
100% (4)
The Oops of C++
26 pages
Global System For Mobile Communication
No ratings yet
Global System For Mobile Communication
21 pages
Network Custody Systems
100% (1)
Network Custody Systems
20 pages
Ecash
No ratings yet
Ecash
33 pages
Mobile Adhoc Network Full Seminar Report
No ratings yet
Mobile Adhoc Network Full Seminar Report
14 pages
Domain Name System
No ratings yet
Domain Name System
39 pages
VB EXCEPTIONHANDLING
No ratings yet
VB EXCEPTIONHANDLING
5 pages
Final Fuzzy Report
No ratings yet
Final Fuzzy Report
19 pages
Digitel Signature
No ratings yet
Digitel Signature
21 pages
Cryptography and Digital Signature
No ratings yet
Cryptography and Digital Signature
29 pages
Computer Viruses
No ratings yet
Computer Viruses
29 pages
Criptography
No ratings yet
Criptography
9 pages
Biometrics Is The Technique of Using Unique, Non-Transferable
No ratings yet
Biometrics Is The Technique of Using Unique, Non-Transferable
24 pages
Intro To ASP NET
100% (1)
Intro To ASP NET
18 pages
Cdma
No ratings yet
Cdma
17 pages
Bluetooth Tech
100% (1)
Bluetooth Tech
23 pages
Background:: Artificial Neural Network
100% (1)
Background:: Artificial Neural Network
22 pages
5th Input Output Organization
No ratings yet
5th Input Output Organization
37 pages
1.06 Prospectus HMP LFGpro
No ratings yet
1.06 Prospectus HMP LFGpro
12 pages
Chapter 1 Computer Abstractions and Technology
No ratings yet
Chapter 1 Computer Abstractions and Technology
23 pages
Computer Programming For Engineers
100% (1)
Computer Programming For Engineers
41 pages
Codigos de Error Maquinas Milltronics
No ratings yet
Codigos de Error Maquinas Milltronics
15 pages
WXXBinarylog
No ratings yet
WXXBinarylog
9 pages
Logcat Prev CSC Log
No ratings yet
Logcat Prev CSC Log
98 pages
Unit-1 Introduction To Computer
No ratings yet
Unit-1 Introduction To Computer
13 pages
Upgrading The Storage Management, SVP Software, and Storage System Firmware
No ratings yet
Upgrading The Storage Management, SVP Software, and Storage System Firmware
20 pages
SSDs
100% (1)
SSDs
4 pages
File System: File System Management and Optimization Example File Systems
No ratings yet
File System: File System Management and Optimization Example File Systems
33 pages
Embedded Systems RTOS
No ratings yet
Embedded Systems RTOS
47 pages
BIOS and Integrated Management Controller Setup
No ratings yet
BIOS and Integrated Management Controller Setup
17 pages
Sample Invitation To Tender Advertisement (Contract) Invitation To Tender
No ratings yet
Sample Invitation To Tender Advertisement (Contract) Invitation To Tender
11 pages
Corso GPFS Part 2
No ratings yet
Corso GPFS Part 2
20 pages
CH01 COA10e Stallings
No ratings yet
CH01 COA10e Stallings
57 pages
C01-01 Comp. Fund.
No ratings yet
C01-01 Comp. Fund.
3 pages
A Series Data Sheet
No ratings yet
A Series Data Sheet
6 pages
Flash Diagnosis
No ratings yet
Flash Diagnosis
73 pages
BCA Program Guide 2011
50% (2)
BCA Program Guide 2011
104 pages
COA Model Paper
No ratings yet
COA Model Paper
3 pages
Basic Monitoring of IO On AIX
No ratings yet
Basic Monitoring of IO On AIX
46 pages
GuardII EV User Manual
No ratings yet
GuardII EV User Manual
37 pages
Computer and statistics
No ratings yet
Computer and statistics
46 pages
Equipo Panasonic
No ratings yet
Equipo Panasonic
2 pages
Enusjg15 0219
No ratings yet
Enusjg15 0219
46 pages

Intellegent Cache System

Uploaded by

Intellegent Cache System

Uploaded by

1: Introduction

Cache memory is a key mechanism for improving overall system

Prefetching mechanism can be used to reduce cache misses. It

Most hardware prefetching mechanisms generate a

A SMI cache is constructed in three parts; a conventional direct

2: Selective-Mode Intelligence Cache System

Common design objective for the cache are to improve utilization of

Making the block size of direct mapped cache as small as possible

Fetching a large block when a cache miss occurs increase spatial

Instead of loading a missed block directly into the direct mapped

2.2: SMI Cache System structure:

When CPU performs a memory reference, the direct mapped cache

The prefetch controller generates a prefetch signal when a large

If misses occurs in both the direct mapped cache and the

2.3: Operational Model of The SMI Cache:

The algorithm for managing the SMI cache is as follows.

1. Case of Cache Hits:

On every memory access, a hit may occurs at the direct mapped

In second case when the hit is in the spatial buffer, the

If a large block is in the prefetch buffer when a

2. Case of Cache Misses:

If a miss occurs in both the caches, a large block including

(a) The Spatial Buffer is not full:

(b) The Spatial Buffer is full:

If the spatial buffer is full, the oldest entry is replaced according

A small block may thus temporarily exist in the temporal

3.1: Time of Prefetch Signal Generation and Overhead:

The more meaningful measure to evaluate the performance of

Average memory access time=

Simulations show s that prefetching when the number of hit bits

3.2: Comparison of a Conventional Cache with the SMI

Miss Ratio and Average Memory Access Time

The cache miss ratio for the conventional direct-mapped cache

4.3: Relation between Cost and Performance

Area = PLA + RAM + CAM. (1)

RAM = 0.6 * (#entries + Lsense_amp) *

CAM = 0.6 * (2^ (1/2) * #entries + Lsense_amp) *

Performance and Cost of the SMI cache and Direct-mapped Cache

32KB-32byte 8KB-1KB 64KB-2KB 8KB-2KB

Miss ratio 1.89% 1.61% 1.46% 1.37%

Miss Ratio (%)

Table-A: Miss Ratio of the Direct Cache and SMI Cache.

Miss Ratio (%)

16KB-32byte (2-way) 2.10

Average Memory Access Time (Cycle)

32KB-32byte (DM) 1.35

To design a simple but high performance cache system with low

The SMI cache overcomes the structural drawbacks of

The temporal mechanism of the SMI cache

2.2. SMI Cache System Structure

2.3. Operational Model of The SMI cache

2.3.1. Case of Cache Hits

2.3.2. Case of Cache misses

(A). The spatial buffer is not full:

3.1. Time of Prefetch Signal Generation and Overhead

You might also like