Interleaved Memory Organisation, Associative Memory Organisation

1. Memory interleaving is an organisation that divides main memory into multiple modules or banks that can be accessed simultaneously to improve memory bandwidth.
2. There are two main types of interleaving: low-order interleaving, which spreads contiguous memory locations across modules horizontally, and high-order interleaving, which assigns contiguous locations to the same module.
3. The Motorola 68040 processor uses interleaving to reduce memory access time by spreading the four long words of a burst access across two physical modules, allowing the individual accesses to overlap.


Computer Organisation and Architecture: Evolutionary Concepts, Principles, and Designs


Fundamental Computer Organisation
Memory Organisation

Interleaved Memory Organisation


Background

Table of Contents:

Background

Memory Interleaving

Types of Interleaving

Interleaving in Motorola 68040

Conclusion

Associative Memory Organisation

Background

Implementation

Word-Organised Associative Memory

Main memory is centrally located: it is both the destination of input and the source of output, satisfying CPU requests while simultaneously serving as the interface to I/O. Performance measures of main memory emphasise both latency and bandwidth. Amdahl's law warns what happens if we concentrate on only one parameter to speed up the computation while underweighting or ignoring the others, so latency and bandwidth improvement should be pursued at the same time with equal importance. Memory latency is traditionally the primary concern of cache usage, and is minimised by various types of cache organisation. Memory bandwidth, on the other hand, is critical to both the processor and I/O, and the memory design goal in this regard is to broaden the effective memory bandwidth so that more memory words can be accessed per unit time. The ultimate aim, however, is to match the memory bandwidth to that of the processor and of the bus to which the memory is attached. Some bandwidth improvement can also be obtained by increasing the cache block size, but that may not be a cost-effective approach.

Innovative organisations of main memory are, hence, needed.


Memory interleaving is such an organisation, aimed more at improving memory bandwidth than at reducing latency, at almost no extra cost. Increasing width is one way to improve bandwidth, but a further benefit is the potential parallelism extracted by having multiple memory modules in the memory system, as demanded by pipelined processors and vector processors for their optimal performance. These processors often require simultaneous access to memory from two or more of their sources (stages). An instruction pipeline may access memory to fetch a new instruction and, at the same time, to fetch an operand from a different segment (stage) of the processor for another instruction already executing in the pipeline. An arithmetic pipeline similarly requires two or more operands at the same time before entering its execution stage. Such simultaneous accesses would require two memory buses; this can be avoided if the memory is divided into a number of modules connected to a common memory address bus and data bus.

Memory Interleaving
Here, the main memory is constructed from multiple modules. Memory chips can be organised in banks to read or write multiple words at a time rather than a single word. These memory modules are connected to a system bus or a switching network through which other resources, such as processors and I/O devices, communicate with them. A memory array is thus formed when each memory module has its own address register and data register (like the MAR and MBR).

Each bank is often one word wide, so that the width of the bus and cache needs no change, and each bank returns one word per cycle. Sending different addresses to several banks simultaneously permits them to operate at the same time, so that multiple words can be accessed in parallel or, at least, in a pipelined fashion.
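The bandwidth gain from overlapping bank accesses can be sketched with a toy timing model. All figures below (access time, bus cycle, bank count) are hypothetical, chosen only to show the pipelining effect, not taken from the text:

```python
# theta = full access time of one module; tau = bus cycle to transfer one word.

def serial_fetch_time(k, theta, tau):
    """Fetch k words with no interleaving: every word waits out a full access."""
    return k * (theta + tau)

def interleaved_fetch_time(k, m, theta, tau):
    """Fetch k words spread over m banks: accesses overlap, and words stream
    out one per bus cycle once the first access completes (assumes k <= m)."""
    assert k <= m, "model only covers a single pass over the banks"
    return theta + k * tau

# Example: 8 words, 8 banks, 100 ns module access, 10 ns bus cycle.
print(serial_fetch_time(8, 100, 10))          # 880 ns without interleaving
print(interleaved_fetch_time(8, 8, 100, 10))  # 180 ns with 8-way interleaving
```

Even with these rough assumptions, the pipelined organisation fetches the block several times faster than repeated full-latency accesses.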

Consider a memory system formed with memory interleaving having m = 2^x memory modules, each module containing w = 2^y words. The total capacity of the memory system is m × w = 2^(x+y) words. These words are assigned the usual linear addresses. Linear addresses can be assigned in different ways, giving rise to different memory organisations, both for random access and for block access. Block access at consecutive addresses is sometimes needed for fetching a sequence of instructions or for accessing a linearly ordered set of data. The block size may correspond to the size of one cache block, or to that of several cache blocks (cache lines). The memory organisation should therefore also consider this aspect of block access on contiguous words at design time.

The number of modules (banks) present in a memory system is called the interleaving factor or degree of interleaving. Word-oriented interleaving optimises sequential memory access, and is equally well matched to serving cache read misses, since the words of a cache block are read sequentially. Write-back caches, which write blocks out sequentially, likewise gain efficiency from word-interleaved memory.

Types of Interleaving
The low-order x bits of the memory address are used to identify the target memory module (bank). The high-order y bits of the address form the word address (displacement or offset) of the target location within each module. The same word address can be applied to all memory modules simultaneously. Such an arrangement of modules is called low-order interleaving. Figure 4.39 illustrates this scheme. In this arrangement, contiguous memory locations are
FIGURE 4.39

Low-order m-way interleaving word-address scheme with m = 2^x modules, each module having w = 2^y words.

spread across the m modules horizontally. Low-order interleaving facilitates block access in a pipelined fashion.
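The low-order address split can be sketched as follows. The module and word counts here are hypothetical examples (m = 2^2 = 4 modules, w = 2^4 = 16 words per module), not values fixed by the text:

```python
X_BITS = 2            # module-select bits
M = 1 << X_BITS       # m = 4 modules

def low_order_decode(addr):
    """Split a linear address into (module, offset) under low-order interleaving."""
    module = addr & (M - 1)   # low-order x bits pick the module
    offset = addr >> X_BITS   # high-order y bits are the word within the module
    return module, offset

# Consecutive addresses rotate round-robin over the modules:
for addr in range(6):
    print(addr, low_order_decode(addr))
```

Addresses 0, 1, 2, 3 land in modules 0, 1, 2, 3 at offset 0, and address 4 wraps back to module 0 at offset 1, which is exactly what makes pipelined block access possible.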

When a failure is detected in one module, however, the remaining modules cannot be used: fault isolation cannot be carried out in this low-order organisation, and a single module failure may paralyse the entire memory bank. This type of organisation is therefore not fault tolerant.

Alternatively, the high-order x bits of the memory address can be used as the module address and the low-order y bits as the target word address within each module. Contiguous memory locations are then assigned to the same module. Such an arrangement of modules is called high-order interleaving. Figure 4.40 shows this arrangement. Only one word can be accessed from each module in each memory cycle, so block access of contiguous locations cannot be obtained from high-order interleaving.
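The mirror-image decode for high-order interleaving, with the same hypothetical sizes (4 modules of 16 words), looks like this:

```python
Y_BITS = 4            # word-offset bits
W = 1 << Y_BITS       # w = 16 words per module

def high_order_decode(addr):
    """Split a linear address into (module, offset) under high-order interleaving."""
    module = addr >> Y_BITS   # high-order x bits pick the module
    offset = addr & (W - 1)   # low-order y bits are the word within the module
    return module, offset

# Consecutive addresses stay in the same module:
for addr in range(3):
    print(addr, high_order_decode(addr))   # all land in module 0
```

Addresses 0 through 15 all fall in module 0, so a burst of contiguous words cannot be overlapped across modules, which is why block access suffers under this scheme.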

On the other hand, since sequential addresses are assigned within each module in this organisation, it is easier to handle a module failure. When a failure is detected in one module of a memory bank of m memory modules, the faulty module can be isolated, and the remaining
FIGURE 4.40

High-order m-way interleaving word-address scheme with m = 2^x modules, each module having w = 2^y words.

modules can still be used by opening another window in the address space. Fault tolerance is thus a salient feature of this organisation, despite its other distinct disadvantages.

Interleaving in Motorola 68040


The Motorola 68000 series was introduced in the late 1970s. It was one of the first chips to use 32 bits (4 bytes) for addresses, in principle able to access a memory of 2^32 different locations (words). The memory organisation of the Motorola 68040 processor, in particular, served to reduce the overall access time of DRAM. The 68040 performs burst accesses that read or write 16 bytes of data, as 4 adjacent long words, between its caches and memory in a single bus transaction. An interleaved memory configuration is designed to speed up 68040 burst accesses by as much as 30% (the actual speed-up depends on the DRAM access time and the system clock speed). The four long words of a burst access are spread across two physical modules of DRAM; the individual accesses over the two modules can

FIGURE 4.41

Interleaved burst access timing.


be overlapped to reduce overall access time, and to hide part or
all of the memory access delay. This is illustrated in Figure 4.41.
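As a rough sketch of why the overlap helps, consider a toy model of the burst with made-up clock counts (these are illustrative figures, not Motorola's actual DRAM timings):

```python
# 4 long words per burst; each DRAM access takes `access` clocks, followed by
# `recover` clocks of precharge before the same module can start again.

def burst_one_module(n, access, recover):
    """All n accesses hit one module: each must wait out the previous precharge."""
    return n * access + (n - 1) * recover

def burst_two_modules(n, access, recover):
    """Even/odd long words alternate between two modules, hiding the precharge
    (assumes recover <= access, so the idle module is always ready in time)."""
    assert recover <= access
    return n * access

single = burst_one_module(4, 3, 2)    # 18 clocks
dual = burst_two_modules(4, 3, 2)     # 12 clocks
print(round(1 - dual / single, 2))    # 0.33 saving with these assumed timings
```

With these assumed numbers the overlapped burst saves about a third of the clocks, consistent in spirit with the "up to 30%" figure quoted above.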

Conclusion
Some aspects of memory organisation favour low-order interleaving, while others argue strongly for high-order interleaving. High-order and low-order interleaving can, however, be combined to yield many different interleaved memory organisations. These hybrid types normally offer better bandwidth, even in the case of a module failure. One such representative organisation is shown in Figure 4.42, which uses four-way low-order interleaving within each of two memory banks, for a clear understanding of this hybrid organisation.

The advantage of this arrangement is that, in case of a module failure, the two-bank four-way design shown in Figure 4.42 will still offer a reduced bandwidth
FIGURE 4.42

Four-way interleaving within each memory bank; 1 bit to address one of 2 banks; 2 bits to address one of 4 modules within each bank; 3 bits to address any of 8 words in any module.

to four words per memory cycle, since only the bank containing the faulty module becomes invalid. Pure low-order interleaving in the same situation would put the entire memory out of use.
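The 6-bit hybrid address of Figure 4.42 can be decoded as below. The exact bit layout is an assumption on my part: the low 2 bits select the module within a bank (low-order interleaving inside the bank), the next 3 bits select the word, and the top bit selects the bank (high-order selection between banks):

```python
def hybrid_decode(addr):
    """Decode a 6-bit address of the two-bank, four-way hybrid organisation."""
    module = addr & 0b11          # low 2 bits: one of 4 modules in the bank
    word = (addr >> 2) & 0b111    # next 3 bits: one of 8 words in the module
    bank = (addr >> 5) & 0b1      # top bit: one of 2 banks
    return bank, module, word

# Consecutive addresses rotate over the 4 modules of bank 0 first:
print(hybrid_decode(0))    # (0, 0, 0)
print(hybrid_decode(3))    # (0, 3, 0)
print(hybrid_decode(32))   # (1, 0, 0) -- first address of bank 1
```

Under this layout, a burst within one bank still enjoys four-way overlap, while a failure in one bank leaves the other bank's 32 words fully usable.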

In an interleaved memory organisation, the trade-offs to consider are the degree of interleaving needed to obtain maximum memory bandwidth, fault tolerance, and the independence of each memory bank, so that even in the worst case of a module failure something can still be extracted from the design.

Associative Memory Organisation


Background
As technology progresses, the CPU-main memory speed disparity continuously increases, creating a severe bottleneck at the CPU's end. Including cache memory in the memory hierarchy primarily addresses this latency problem, and organising main memory in an interleaved fashion mostly addresses the bandwidth problem. But supplying data to the CPU first requires reaching the particular target item within a list of items, and doing so as fast as possible. Searching for the target data before it can be supplied is thus an inherent process in both user-oriented and system-oriented applications. The traditional search procedure chooses a sequence of addresses, reads the memory content at each address, and compares the information read with the item being searched for, until either a match occurs or the sequence of addresses is exhausted with no match. Under this scheme, the total time required to access the desired data depends on the number of memory accesses, which in turn depends on the location of the target item, the organisation of the list of items, and the efficiency of the search algorithm employed. Many techniques have been proposed to optimise this approach to searching, but within its inherent limits they have achieved only modest gains at best.

The conventional approach of identifying data by its address (location) has therefore been set aside, and an innovative hardware-based mechanism devised to accomplish a fast search for the desired item. The item under search is identified for access by its content rather than by an address. A memory unit in which any stored item can be accessed directly by using the contents of the item in question is called an Associative Memory, Content Addressable Memory (CAM), or Parallel Search Memory. The entire memory of this type is accessed simultaneously, and in parallel, on the basis of data content rather than by a specific address as usually happens with RAM. When data is stored in this memory, no address is linked with it.

Associative memory is a small hardware device, usually inside the MMU or within the CPU chip, containing a small number of entries, rarely more than 32. Due to its particular form of organisation, this memory is uniquely suited to performing parallel searches by data association. Searches can be done on an entire word, or on a specific field within a word; the field chosen to address the memory is called the key. Items stored in associative memory can be viewed as having the format (KEY, DATA), where KEY is the address (a subfield of the record) and DATA is the information (the contents of the record) to be accessed. Applications in which the search time is critical and must be very short are the ideal situation for using associative memory.

Associative memory is generally used to hold data that is heavily used, in the form of a table, during the execution of a process. Sometimes it holds a small fraction of the page table entries that are frequently demanded, thereby accelerating the mapping of virtual addresses to physical addresses without going through the entire page table held in RAM. It is hence sometimes called a Translation Lookaside Buffer (TLB). The MIPS R2000, a RISC machine, took the associative memory idea to its limit: the CPU contains a 64-entry associative memory on the CPU chip, each 64-bit entry holding a virtual page number and other related information. When the CPU generates a virtual address during execution, these entries are used for faster address translation. Note that the addressing of this associative memory is performed with the contents of one of the fields (here, the virtual page number) in each row of the table.

The basic difference between associative memory and RAM is that associative memory is content-addressable, allowing parallel access to multiple memory words, whereas RAM must be accessed by specifying word addresses. The inherent parallelism of associative memory has had a great impact on the architecture of associative processors, a special class of SIMD array processors equipped with associative memories.

An associative memory is more expensive than a RAM of the same size, because each cell must have storage capability as well as logic circuits for matching its content with the supplied argument. Additional circuitry, such as the select circuit, is also included in the hardware to provide other services, as will be discussed later.

The major advantage of associative memory over RAM is its capability of performing parallel search and comparison operations, which are needed in many important applications, such as table look-up, information storage and retrieval in rapidly changing databases, radar-signal tracking, image processing, and real-time artificial intelligence computation.

The major disadvantage of associative memory is its much higher hardware cost. Currently, associative memory is much more expensive than RAM, even though both are built with integrated circuitry. However, with the rapid advance of VLSI technology, the price gap between these types of memory is gradually narrowing.

Implementation

Word-Organised Associative Memory

It consists of a memory array of m words of n bits each, together with the related logic for those m words. In this organisation, several registers are employed to carry out different responsibilities. Figure 4.43 shows the structure of a simple associative memory of this type. The functions that the different registers perform are as follows:

Input register (I): The input register I holds the data to be written into the associative memory, or the data to be searched for. At any instant it holds one word of data, i.e. a string of n bits of the memory; consequently, the length of the input register is n bits.

Mask register (M): Each unit of stored information is a fixed-length word (record). Any subfield of the word may be chosen as the key. The mask register provides a mask for choosing a particular field or key (i.e. the key to be searched) in the input register's word. The maximum length of this register is n bits, because it has to hold a portion of, or all, the bits of the word to be searched. For example, consider an inventory file containing various items, where each item is a record in the file. Each such record contains
FIGURE 4.43

Block structure of a simple associative memory.

several fields, such as product code, type, product name, description, etc. Any field can be chosen as a key for searching an item over this file. Assume here that the "product code" field will be used as the key. This desired key is specified by the mask register, whose contents identify the bit positions (not necessarily adjacent) in the record that define the key field.

To illustrate the searching mechanism involved with associative


memory, let the input register and mask register contain the
following information:

and let the file contain, for example, three records (words) to be searched, given as:
The contents of the mask register show four 1s in its leftmost four bits. This signifies that the leftmost four bits of the input register will be used as the key for searching. The key field obtained from the input register is then 1011, and this string of bits is searched over all the records present in the file. Only the record (or records) containing 1011 in the leftmost four bits (i.e. at the corresponding positions given in the mask register) will be selected as a match, irrespective of the contents of the other fields in the record. It is found that there is a match for word 3 only, and not for the others.
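The masked search can be simulated in software as below. The loop here stands in for the hardware's parallel comparison, and the three 8-bit records are hypothetical stand-ins for the lost example data; only word 3 carries the key 1011 in its leftmost four bits:

```python
def cam_search(words, inp, mask):
    """Return the 1-based indices of stored words that match the input
    on the bit positions selected by the mask (all others are don't-cares)."""
    return [i + 1 for i, w in enumerate(words)
            if (w & mask) == (inp & mask)]

words = [0b01011010, 0b10010001, 0b10111100]  # hypothetical 8-bit records
inp = 0b10110000                              # key 1011 in the leftmost bits
mask = 0b11110000                             # select the leftmost four bits

print(cam_search(words, inp, mask))           # [3] -- only word 3 matches
```

In real hardware every word's match circuit evaluates `(w & mask) == (inp & mask)` simultaneously and raises its match line; the select circuit then sets the corresponding bits of the select register.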

The current key is compared simultaneously with all stored words; those that match the key emit a match signal, which enters a select circuit. This circuit has a select register S of m bits, one for each memory word. If matches are found after comparing the input data in the I register, under the key field given by the M register, the corresponding bits of the select register S are set. The select circuit then enables the matched data fields to be accessed (the back arrow from the select circuit to S).

If several entries have the same key (i.e. more than one match), the select circuit determines which data field is to be read out; it may, for example, read out all matched entries in some predetermined order. Since all words in the memory (storage cell arrays) are
FIGURE 4.44

A representative block diagram of an associative memory of size m × n.

required to compare their keys simultaneously, each must have its own match circuit. The match and select circuits make associative memories much more complex and expensive than conventional memories (RAM). The arrangement of the memory array and the four external registers linked with the associative memory system is depicted in Figure 4.44.

In practice, most associative memories have the capability of word-parallel operation: all words in the associative memory array take part in the parallel search operation. This differs radically from the word-serial operations encountered in RAMs.

Based on how the bit slices are involved in the operation, there are mainly two different associative memory organisations: (i) bit-parallel organisation, and (ii) bit-serial organisation.

i. Bit-parallel organisation: Under this scheme, the comparison process is performed in a parallel-by-word and parallel-by-bit fashion. All bit slices not masked off by the masking pattern take part in the comparison, so essentially the entire array of cells is involved in a search operation. The Parallel Element Processing Ensemble (PEPE) of Burroughs Corporation employed associative memory with bit-parallel organisation.

ii. Bit-serial organisation: This memory organisation operates on one bit slice at a time across all the words. The particular bit slice is selected by extra logic and control, and the bit-cell readout is used in subsequent bit-slice operations. The associative processor STARAN (Goodyear Aerospace) has a bit-serial memory organisation.

The bit-serial organisation requires less hardware but is slower. The bit-parallel organisation, while requiring additional word-match detection logic, is faster.

The logic circuit for a 1-bit associative memory cell is given, with a figure, on the web site: http://routledge.com/9780367255732.

The bit-serial and bit-parallel organisations are also shown with figures on the web site: http://routledge.com/9780367255732.
