Uniform Memory Access

Uniform Memory Access (UMA) is a shared memory architecture used in parallel computers.


All the processors in the UMA model share the physical
memory uniformly. In a UMA architecture, access time to a
memory location is independent of which processor makes the
request or which memory chip contains the transferred data.
Uniform Memory Access computer architectures are often
contrasted with Non-Uniform Memory Access (NUMA)
architectures.
In the UMA architecture, each processor may use a private cache, and peripherals are also shared in some fashion. The UMA model is suitable for general-purpose and time-sharing applications by multiple users. It can also be used to speed up the execution of a single large program in time-critical applications.
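The defining property — access time independent of both requester and location — can be pictured with a toy cost model. The latency figure below is an assumed, illustrative constant, not a measurement of any real machine.

```python
# Toy UMA cost model: by definition, access time does not depend on
# which processor issues the request or which memory module holds the
# data. The 100 ns figure is an assumed, illustrative value.

UMA_LATENCY_NS = 100

def uma_access_time(processor: int, memory_module: int) -> int:
    """Return the access latency; both arguments are deliberately ignored."""
    return UMA_LATENCY_NS

# Every (processor, module) pair sees the same latency.
times = {uma_access_time(p, m) for p in range(4) for m in range(4)}
print(times)  # {100}
```

The point of the sketch is only that the function's arguments are dead: in a UMA machine no placement decision can change the cost of an access.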
The same abbreviation is also used for Unified Memory Architecture (UMA), a computer architecture in which graphics chips are built into the motherboard and part of the computer's main memory is used for video memory.
Types of UMA architectures

1. UMA using bus-based SMP architectures
2. UMA using crossbar switches
3. UMA using multistage switching networks
Non-Uniform Memory Access (NUMA)

Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.
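The local/remote asymmetry can be sketched with the same kind of toy model. The latencies below are assumed, illustrative values (remote roughly 3x local); real ratios vary by machine.

```python
# Toy NUMA cost model: an access is cheap when the data lives on the
# requesting processor's own node and more expensive otherwise.
# Both latencies are assumed, illustrative values.

LOCAL_NS, REMOTE_NS = 100, 300

def numa_access_time(processor_node: int, memory_node: int) -> int:
    return LOCAL_NS if processor_node == memory_node else REMOTE_NS

# A trace that is 90% node-local averages close to the local latency,
# which is why data placement matters so much on NUMA machines.
trace = [(0, 0)] * 9 + [(0, 1)]
avg = sum(numa_access_time(p, m) for p, m in trace) / len(trace)
print(avg)  # 120.0
```

Unlike the UMA case, here the arguments are live: moving a computation to the node that holds its data changes the cost of every access.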

NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP) architectures. Their commercial development came in work by Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), Silicon Graphics, Sequent Computer Systems, Data General and Digital during the 1990s. Techniques developed by these companies later featured in a variety of Unix-like operating systems, and to some extent in Windows NT.
Basic concept

[Figure: one possible architecture of a NUMA system. The processors connect to the bus or crossbar by links of varying thickness/number, showing that different CPUs have different priorities for memory access based on their location.]

Modern CPUs operate considerably faster than the main memory to which they are
attached. In the early days of computing and data processing the CPU generally ran slower
than its memory. The performance lines crossed in the 1960s with the advent of the
first supercomputers and high-speed computing. Since then, CPUs, increasingly starved for
data, have had to stall while they wait for memory accesses to complete. Many
supercomputer designs of the 1980s and 90s focused on providing high-speed memory
access as opposed to faster processors, allowing them to work on large data sets at speeds
other systems could not approach.

Limiting the number of memory accesses provided the key to extracting high performance
from a modern computer. For commodity processors, this means installing an ever-
increasing amount of high-speed cache memory and using increasingly sophisticated
algorithms to avoid "cache misses". But the dramatic increase in size of the operating
systems and of the applications run on them has generally overwhelmed these cache-
processing improvements. Multi-processor systems make the problem considerably worse.
Now a system can starve several processors at the same time, notably because only one
processor can access memory at a time.

NUMA attempts to address this problem by providing separate memory for each processor,
avoiding the performance hit when several processors attempt to address the same
memory. For problems involving spread data (common for servers and similar applications),
NUMA can improve the performance over a single shared memory by a factor of roughly the
number of processors (or separate memory banks).
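The scaling claim above can be made concrete with back-of-the-envelope arithmetic. The bandwidth figure is an assumed, illustrative number, and the model deliberately ignores coherence traffic and imperfect partitioning.

```python
# Illustrative throughput model for the paragraph above: with one shared
# memory, all processors contend for a single bank; with per-node memory
# and fully partitioned data, the banks serve requests in parallel.
# BANK_BW_GBPS is an assumed number, not a measured one.

BANK_BW_GBPS = 10
processors = 8

shared_bw = BANK_BW_GBPS               # one bank serializes everyone
numa_bw = BANK_BW_GBPS * processors    # one bank per node, no sharing

speedup = numa_bw / shared_bw
print(speedup)  # 8.0 -> roughly the number of separate memory banks
```

The next paragraph explains why this is an upper bound: any data that must move between banks eats into the parallel bandwidth.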

Of course, not all data ends up confined to a single task, which means that more than one
processor may require the same data. To handle these cases, NUMA systems include
additional hardware or software to move data between banks. This operation has the effect
of slowing down the processors attached to those banks, so the overall speed increase due
to NUMA will depend heavily on the exact nature of the tasks run on the system at any given
time.

Cache coherent NUMA (ccNUMA)


Nearly all CPU architectures use a small amount of very fast non-shared memory known
as cache to exploit locality of reference in memory accesses. With NUMA, maintaining cache
coherence across shared memory has a significant overhead.

Although simpler to design and build, non-cache-coherent NUMA systems become prohibitively complex to program in the standard von Neumann architecture programming model. As a result, virtually all NUMA computers sold on the market use special-purpose hardware to maintain cache coherence, and are thus classed as "cache-coherent NUMA", or ccNUMA.

Typically, this takes place by using inter-processor communication between cache


controllers to keep a consistent memory image when more than one cache stores the same
memory location. For this reason, ccNUMA may perform poorly when multiple processors
attempt to access the same memory area in rapid succession. Operating-system support for
NUMA attempts to reduce the frequency of this kind of access by allocating processors and
memory in NUMA-friendly ways and by avoiding scheduling and locking algorithms that
make NUMA-unfriendly accesses necessary. Alternatively, cache coherency protocols such
as the MESIF protocol attempt to reduce the communication required to maintain cache
coherency. Scalable Coherent Interface (SCI) is an IEEE standard defining a directory based
cache coherency protocol to avoid scalability limitations found in earlier multiprocessor
systems. SCI is used as the basis for the Numascale NumaConnect technology.
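The directory idea behind protocols like SCI can be sketched in a few lines. This is a minimal MSI-style directory model, not the SCI protocol itself; the class and method names are hypothetical, and real protocols add states, write-backs, and distributed (linked-list) directories.

```python
# Minimal sketch of a directory-based coherence scheme: a directory
# tracks, per cache line, which nodes hold a copy, so a write only
# needs to contact those sharers instead of broadcasting to everyone.
# MSI-style and simplified; write-back of dirty data is omitted.

class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of node ids caching it
        self.owner = {}     # line address -> node holding it Modified

    def read(self, node, addr):
        # A read adds the node to the sharer set; any modified copy is
        # conceptually downgraded first (data transfer not modeled).
        self.owner.pop(addr, None)
        self.sharers.setdefault(addr, set()).add(node)

    def write(self, node, addr):
        # A write invalidates every other cached copy, then records
        # the writer as the exclusive owner.
        invalidated = self.sharers.get(addr, set()) - {node}
        self.sharers[addr] = {node}
        self.owner[addr] = node
        return invalidated  # nodes that must discard their copy

d = Directory()
d.read(0, 0x100)
d.read(1, 0x100)
print(d.write(2, 0x100))  # {0, 1} -> both readers are invalidated
```

The model also shows why the paragraph above warns about rapid write-sharing: every write to a contended line triggers a round of invalidations proportional to the sharer set.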

Current ccNUMA systems are multiprocessor systems based on the AMD Opteron,
which can be implemented without external logic, and Intel Itanium, which requires the
chipset to support NUMA. Examples of ccNUMA enabled chipsets are the SGI Shub (Super
hub), the Intel E8870, the HP sx2000 (used in the Integrity and Superdome servers), and
those found in recent NEC Itanium-based systems. Earlier ccNUMA systems such as those
from Silicon Graphics were based on MIPS processors and the DEC Alpha 21364 (EV7)
processor.

Intel announced NUMA support for its x86 and Itanium servers in late 2007 with the Nehalem and Tukwila CPUs. Both CPU families share a common chipset; the interconnection is called the Intel QuickPath Interconnect (QPI).

NUMA vs. cluster computing


One can view NUMA as a very tightly coupled form of cluster computing. The addition
of virtual memory paging to a cluster architecture can allow the implementation of NUMA
entirely in software where no NUMA hardware exists. However, the inter-node latency of
software-based NUMA remains several orders of magnitude greater than that of hardware-
based NUMA.

Cache-only memory architecture (COMA)

Cache-only memory architecture (COMA) is a computer memory organization for use in multiprocessors in which the local memories (typically DRAM) at each node are used as cache. This is in contrast to using the local memories as actual main memory, as in NUMA organizations.
In NUMA, each address in the global address space is typically assigned a fixed home node. When processors access some data, a copy is made in their local cache, but space remains allocated in the home node. With COMA, by contrast, there is no home: an access from a remote node may cause the data to migrate. Compared to NUMA, this reduces the number of redundant copies and may allow more efficient use of the memory resources. On the other hand, it raises the problems of how to find a particular piece of data (there is no longer a home node) and what to do when a local memory fills up (migrating data into the local memory then requires evicting other data, which has no home to go to). Hardware memory coherence mechanisms are typically used to implement the migration.
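The "attraction memory" behavior and the eviction problem it creates can be sketched as follows. The class is hypothetical, and the FIFO eviction and linear search for the current holder are illustrative choices; real COMA hardware uses directories and more careful replacement policies.

```python
# Toy sketch of a COMA "attraction memory": blocks have no home node.
# A remote access migrates the block to the requester; if the local
# memory is full, some resident block is evicted and must find a new
# host elsewhere. FIFO eviction is an illustrative choice.

from collections import deque

class Node:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = deque()            # addresses currently resident here

    def access(self, addr, system):
        if addr in self.blocks:
            return None                  # local hit, nothing moves
        # Find and remove the block from whichever node currently holds it
        # (this search is the "how to find the data" problem from the text).
        for other in system:
            if other is not self and addr in other.blocks:
                other.blocks.remove(addr)
                break
        evicted = None
        if len(self.blocks) >= self.capacity:
            evicted = self.blocks.popleft()  # this block now needs a new host
        self.blocks.append(addr)
        return evicted

a, b = Node(capacity=2), Node(capacity=2)
system = [a, b]
b.blocks.extend(["x", "y"])
a.access("x", system)                    # "x" migrates from node b to node a
print(list(a.blocks), list(b.blocks))    # ['x'] ['y']
```

The return value of `access` surfaces the awkward case the text describes: an evicted block has no home node waiting for it, so some other node must be persuaded to take it in.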
A huge body of research has explored these issues. Various
forms of directories, policies for maintaining free space in the
local memories, migration policies, and policies for read-only
copies have been developed. Hybrid NUMA-COMA organizations have also been proposed, such as Reactive NUMA, which allows pages to start in NUMA mode and switch to COMA mode if appropriate, and which is implemented in the Sun WildFire.
