Distributed Computing
Applications
Distributed Computing Systems have a number of applications,
including:
• Cloud Computing: Cloud computing systems are a type of
distributed computing system used to deliver resources such as
computing power, storage, and networking over the Internet.
• Peer-to-Peer Networks: Peer-to-Peer Networks are a type of
distributed computing system that is used to share resources
such as files and computing power among users.
• Distributed Architectures: Many modern computing systems,
such as microservices architectures, use distributed architectures
to distribute processing and data storage across multiple devices
or systems.
Parallel computing:
• Refers to using multiple processors or computing units
simultaneously to solve large problems by breaking them into
smaller, independent parts (a minimal sketch follows this list).
• Involves multiple CPUs communicating via shared memory and
combining results upon completion.
• Increases computation power and speeds up application
processing.
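As a rough illustration (not part of the original notes), the following Python sketch breaks a large summation into smaller independent parts, computes them on several worker processes, and combines the partial results; the problem size, chunking, and worker count are arbitrary assumptions:

    from multiprocessing import Pool

    def partial_sum(bounds):
        """Compute the sum of squares over one independent chunk."""
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n, workers = 1_000_000, 4
        step = n // workers
        # Break the problem into smaller, independent parts.
        chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
        with Pool(processes=workers) as pool:
            partials = pool.map(partial_sum, chunks)   # run chunks in parallel
        print(sum(partials))                           # combine the results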
EFFICIENCY
Speedup Performance Law
1. Amdahl’s Law
Amdahl’s Law was named after Gene Amdahl, who presented it in
1967. In general terms, Amdahl’s Law states that in parallelization,
if P is the proportion of a system or program that can be made
parallel, and 1-P is the proportion that remains serial, then the
maximum speedup S(N) that can be achieved using N processors is:
S(N)=1/((1-P)+(P/N))
As N grows the speedup tends to 1/(1-P).
Speedup is limited by the total time needed for the sequential (serial)
part of the program. For 10 hours of computing, if 9 hours can be
parallelized and 1 hour cannot, then the maximum speedup is limited
to 10 times as fast: no matter how many processors are used, or how
fast they become, the 1-hour serial part still has to run, so the
speedup stays bounded.
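A small plain-Python sketch of Amdahl's Law, evaluating the formula above for the 10-hour example (P = 0.9); the processor counts chosen are arbitrary:

    def amdahl_speedup(p, n):
        """Maximum speedup with parallel fraction p on n processors."""
        return 1.0 / ((1.0 - p) + p / n)

    # 9 of 10 hours parallelizable => P = 0.9
    for n in (1, 2, 10, 100, 10_000):
        print(n, round(amdahl_speedup(0.9, n), 2))
    # The speedup approaches 1 / (1 - 0.9) = 10 as n grows.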
2. Gustafson’s Law
This law says that increasing the problem size on large machines can
retain scalability with respect to the number of processors. The
American computer scientist and businessman John L. Gustafson (born
January 19, 1955) found that practical problems show much better
speedup than Amdahl predicted.
Gustafson's law: the computation time is held constant (instead of the
problem size); an increasing number of CPUs solves a bigger problem
and gets better results in the same time.
The execution time of a program on a parallel computer is (a + b),
where a is the sequential time and b is the parallel time.
The total amount of work to be done in parallel varies linearly with
the number of processors, so b is fixed as p is varied; the run time of
the same work on a single processor would be (a + p*b).
The speedup is therefore S(p) = (a + p*b) / (a + b).
Define α = a / (a + b), the sequential fraction of the execution time; then
S(p) = α + p(1 - α) = p - α(p - 1),
where p is the number of processors and α is the serial portion of the
problem. Any sufficiently large problem can therefore be efficiently
parallelized with this speedup.
Gustafson proposed a fixed-time concept, which leads to scaled
speedup for larger problem sizes. Basically, we use larger systems
with more processors to solve larger problems.
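A matching sketch for Gustafson's scaled speedup, using an assumed serial fraction α = 0.1; note how the speedup keeps growing with p instead of saturating at 1/α:

    def gustafson_speedup(alpha, p):
        """Scaled speedup with serial fraction alpha on p processors."""
        return p - alpha * (p - 1)

    # alpha = 0.1 (10% serial): unlike Amdahl's fixed-size bound of
    # 1 / alpha = 10, the scaled speedup grows with p.
    for p in (1, 10, 100, 1000):
        print(p, gustafson_speedup(0.1, p))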
UNIPROCESSOR ARCHITECTURE:
Computer architecture encompasses the design and structure of
computer systems, including their components and the
interconnections between them.
It defines the functionality of a computer system by specifying the
operations it can perform, such as arithmetic calculations, data
storage, and control flow.
Computer architecture determines how the various hardware
components, such as the central processing unit (CPU), memory,
input/output (I/O) devices, and peripherals, are organized and
interact with each other.
It involves designing instruction sets and addressing modes that
dictate how the CPU interprets and executes instructions.
Computer architecture also includes the design of memory
hierarchies, which involve different levels of memory (such as
cache, RAM, and secondary storage) that work together to store and
retrieve data efficiently.
Additionally, computer architecture encompasses the design of
communication pathways and protocols that facilitate data transfer
between different components, as well as the overall system
performance and power consumption considerations.
RISC vs CISC
1. RISC is a reduced instruction set; CISC is a complex instruction set.
2. The number of instructions in RISC is less as compared to CISC; the number of instructions in CISC is more as compared to RISC.
3. RISC has fewer addressing modes; CISC has more addressing modes.
4. RISC works with a fixed instruction format; CISC works with a variable instruction format.
5. RISC consumes low power; CISC consumes high power.
6. RISC processors are highly pipelined; CISC processors are less pipelined.
7. RISC optimizes performance by focusing on software; CISC optimizes performance by focusing on hardware.
8. RISC requires more RAM; CISC requires less RAM.
Multicomputer
A multicomputer is a system architecture composed of multiple
independent computers connected via a network. Each computer in a
multicomputer operates autonomously with its own local resources,
including memory and processors. Communication between
computers in a multicomputer is achieved through message passing
over the network. Multicomputers can be highly scalable, allowing
for the addition of more computers to expand computational power.
Unlike shared memory architectures, multicomputers do not have a
single global address space. Multicomputers offer flexibility in terms
of heterogeneous hardware configurations and distributed computing
models. Load balancing and task distribution are important
considerations in multicomputer systems to ensure efficient
utilization of resources across the network.
NORMA (No-Remote Memory Access)
NORMA (No-Remote Memory Access) restricts direct remote
memory access in a multiprocessor system.
Each processor in NORMA has its dedicated local memory for direct
access.
NORMA does not provide a shared global address space like UMA
or NUMA.
Communication between processors in NORMA relies on explicit
message passing or inter-processor communication methods.
NORMA emphasizes efficient utilization of local memory to reduce
remote memory access latency.
NORMA offers improved scalability and performance by limiting
remote memory access and promoting local memory usage.
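Because both multicomputers and NORMA systems communicate only by explicit message passing, the following Python sketch uses a multiprocessing Pipe to stand in for the network link between two independent "nodes"; the node roles and the payload are illustrative assumptions:

    from multiprocessing import Process, Pipe

    def worker(conn):
        """A 'node' with only local state; it communicates by messages."""
        local_data = conn.recv()          # receive work over the 'network'
        conn.send(sum(local_data))        # send the result back as a message
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        node = Process(target=worker, args=(child_end,))
        node.start()
        parent_end.send([1, 2, 3, 4])     # no shared address space is used
        print(parent_end.recv())          # -> 10
        node.join()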
Multiprocessor Model vs Multicomputer Model
• Memory: Multiprocessor: shared memory architecture with a single memory accessed by all processors. Multicomputer: each computer has its own local memory.
• Communication: Multiprocessor: direct communication between processors through shared memory. Multicomputer: communication between computers is achieved through message passing over a network.
• Scalability: Multiprocessor: limited scalability due to contention and shared memory access. Multicomputer: highly scalable, as computers can be added or removed from the network easily.
• Global Address Space: Multiprocessor: provides a single global address space accessible by all processors. Multicomputer: no single global address space; each computer has its own address space.
• Synchronization: Multiprocessor: synchronization mechanisms rely on shared memory for efficient coordination. Multicomputer: requires explicit message passing for synchronization between computers.
• Load Balancing: Multiprocessor: load balancing can be challenging due to shared memory access and contention. Multicomputer: load balancing can be implemented by distributing tasks across computers.
• Fault Tolerance: Multiprocessor: failure of a processor can impact the entire system's functionality and performance. Multicomputer: individual computer failures have a localized impact.
UMA (Uniform Memory Access) vs NUMA (Non-Uniform Memory Access) vs COMA (Cache-Only Memory Architecture) vs NORMA (No-Remote Memory Access)
• Memory Access: UMA: a single memory is accessed by all processors. NUMA: memory access time can vary based on location. COMA: memory is replaced by cache memory. NORMA: each processor has its own local memory.
• Shared Memory: UMA: shared memory architecture. NUMA: shared memory architecture. COMA: shared memory consists of cache memory. NORMA: no shared global address space.
• Memory Hierarchy: UMA: yes. NUMA: yes. COMA: no. NORMA: no.
• Communication: UMA: shared bus, multiple bus, or crossbar switch. NUMA: interconnect network. COMA: cache directory (D) for remote cache access. NORMA: explicit message passing.
• Scalability: UMA: limited scalability due to shared memory access. NUMA: scalable with addition/removal of processors. COMA: limited scalability due to lack of memory hierarchy. NORMA: scalable, as processors can have dedicated memory.
• Performance: UMA: balanced shared memory access among processors. NUMA: varying memory access time based on distance. COMA: emphasizes efficient utilization of cache memory. NORMA: limits remote memory access, improving performance.
• Example: UMA: Symmetric Multiprocessor (SMP) systems. COMA: Kendall Square Research's KSR-1 machine.
Flynn's taxonomy: a classification system used to categorize
computer architectures based on the number of instruction streams
and data streams that can be processed simultaneously. It was
proposed by Michael J. Flynn in 1966 and has four main categories,
as follows:
Single Instruction, Single Data (SISD):
This category represents the traditional sequential processing model.
It consists of a single processor executing a single instruction stream
on a single set of data.
Examples include most conventional desktop computers.
Single Instruction, Multiple Data (SIMD):
In this category, a single instruction is applied to multiple data
elements simultaneously.
It involves a single control unit that issues the same instruction to
multiple processing units.
SIMD architectures are commonly used for tasks that involve
parallel processing, such as multimedia applications and scientific
simulations.
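As a rough illustration of the SIMD idea (one instruction applied to many data elements at once), this sketch uses NumPy, whose element-wise array operations are typically backed by SIMD/vector instructions; the array contents are arbitrary:

    import numpy as np

    a = np.arange(8, dtype=np.float32)      # [0, 1, ..., 7]
    b = np.full(8, 2.0, dtype=np.float32)   # [2, 2, ..., 2]

    # One "instruction" (element-wise multiply) is applied to all
    # elements at once, instead of looping over them one by one.
    c = a * b
    print(c)    # [ 0.  2.  4.  6.  8. 10. 12. 14.]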
Multiple Instruction, Single Data (MISD):
This category involves multiple instructions operating on a single
data stream.
MISD architectures are relatively rare and less commonly used in
practical systems.
They were initially proposed for error checking and fault tolerance
purposes but have limited real-world implementations.
Multiple Instruction, Multiple Data (MIMD):
MIMD architectures have multiple processors that independently
execute different instruction streams on different data streams.
Each processor in a MIMD system can operate on its own data and
execute its own instructions.MIMD is the most common category
and includes various parallel processing architectures, such as
clusters, multiprocessor systems, and distributed systems.
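A minimal MIMD-flavoured sketch: two independent processes execute different instruction streams on different data streams; the two worker functions are made up for illustration:

    from multiprocessing import Process

    def count_words(text):
        print("words:", len(text.split()))

    def sum_numbers(numbers):
        print("sum:", sum(numbers))

    if __name__ == "__main__":
        # Different instruction streams operating on different data
        # streams, running independently of one another.
        p1 = Process(target=count_words, args=("multiple instruction multiple data",))
        p2 = Process(target=sum_numbers, args=([1, 2, 3, 4],))
        p1.start(); p2.start()
        p1.join(); p2.join()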
Feng’s classification:
Feng’s classification (1972) is based on serial versus parallel
processing. Under this classification, there are four types:
1. Word Serial and Bit Serial (WSBS):
In WSBS architecture, both data and instructions are processed
serially, one word or bit at a time.
The entire word or bit is processed before moving on to the next
word or bit.
This type of architecture is typically slower compared to parallel
processing types.
2. Word Parallel and Bit Serial (WPBS):
WPBS architecture involves parallel processing of multiple words
simultaneously, but within each word, processing occurs serially.
Multiple words are processed concurrently, but within a word, the
individual bits are processed serially.
This type of architecture can provide some performance
improvement over WSBS by utilizing parallelism at the word level.
3. Word Serial and Bit Parallel (WSBP):
In WSBP architecture, each word is processed serially, but multiple
bits within the word are processed in parallel.
The architecture allows for parallel processing of the bits within a
word, improving performance compared to WSBS.
This type of architecture is commonly used in SIMD (Single
Instruction, Multiple Data) systems.
4. Word Parallel and Bit Parallel (WPBP):
WPBP architecture involves parallel processing of both words and
bits.
Multiple words are processed simultaneously, and within each word,
multiple bits are processed concurrently.
This type of architecture offers the highest level of parallelism and
can provide significant performance improvements.
Distributed Memory Multi-computers
In a distributed-memory multiprocessor, each processor has its own
associated memory module.
Processors can directly access their own memory but require a
message passing mechanism (such as MPI) to access memory
associated with other processors.
Memory access in distributed-memory multiprocessors is non-
uniform, meaning it depends on which memory module a processor
is trying to access. This is known as a Non-Uniform Memory Access
(NUMA) multiprocessor system.
If all processors in a distributed-memory multiprocessor are
identical, it is referred to as a Symmetric Multiprocessor (SMP).
If the processors in a distributed-memory multiprocessor are
heterogeneous, it is called an Asymmetric Multiprocessor (ASMP).
Distributed-memory systems are easier to build but harder to use, as
they consist of multiple shared-memory computers with separate
operating systems and memory.
Distributed-memory multiprocessors are the architecture of choice
for constructing modern supercomputers due to their scalability and
ability to handle large-scale parallel computing tasks.
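The notes above name MPI as the message passing mechanism; a minimal sketch with the mpi4py binding (assuming mpi4py and an MPI runtime are installed, and the script is launched with something like mpirun -n 2 python example.py) might look as follows:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()                   # this process's id (0, 1, ...)

    if rank == 0:
        data = {"task": "sum", "values": [1, 2, 3]}
        comm.send(data, dest=1, tag=0)       # explicit message passing
    elif rank == 1:
        data = comm.recv(source=0, tag=0)    # no shared memory involved
        print("rank 1 received:", data)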
Shared Memory Multi-processors
In a shared-memory multiprocessor, both data and code in a parallel
program are stored in the main memory accessible to all processors.
All processors in a shared-memory system have direct hardware
access to the entire main memory address space.
Shared-memory multiprocessors consist of a limited number of
processors that can concurrently access and modify shared memory.
In this architecture, all CPU cores can access the same memory,
similar to multiple workers sharing a whiteboard, and are controlled
by a single operating system.
Modern processors are often multicore, with multiple CPU cores
integrated on a single chip.
Shared-memory systems are relatively easier to use, as all processors
can access shared memory without explicit message passing.
Shared-memory multiprocessors are well-suited for laptops and
desktops, providing a convenient architecture for general-purpose
computing tasks.
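A small sketch of the shared-memory model using Python's multiprocessing shared Value: several workers directly update the same memory location (the "whiteboard"), coordinated by a lock; the counter and worker count are illustrative assumptions:

    from multiprocessing import Process, Value, Lock

    def worker(counter, lock):
        """Each worker directly updates the same shared memory."""
        for _ in range(1000):
            with lock:                 # coordinate access to shared memory
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)        # integer placed in shared memory
        lock = Lock()
        procs = [Process(target=worker, args=(counter, lock)) for _ in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()
        print(counter.value)           # -> 4000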
Introduction to Distributed Systems
Distributed System:
A collection of autonomous computer systems connected by a
centralized computer network.
Autonomous systems communicate by sharing resources and files.
Example: a social media platform, where a centralized network acts
as the headquarters and autonomous systems provide user access.