Distributed
Introduction
A distributed system may have a common goal, such as solving a large computational
problem;[10] the user then perceives the collection of autonomous processors as a unit.
Alternatively, each computer may have its own user with individual needs, and the
purpose of the distributed system is to coordinate the use of shared resources or
provide communication services to the users.[11]
Each computer has only a limited, incomplete view of the system. Each computer may
know only one part of the input.[14]
In distributed computing, each processor has its own private memory (distributed
memory). Information is exchanged by passing messages between the processors.[19]
The figure on the right illustrates the difference between distributed and parallel
systems. Figure (a) is a schematic view of a typical distributed system; the system is
represented as a network topology in which each node is a computer and each line
connecting the nodes is a communication link. Figure (b) shows the same distributed
system in more detail: each computer has its own local memory, and information can be
exchanged only by passing messages from one node to another by using the available
communication links. Figure (c) shows a parallel system in which each processor has
direct access to a shared memory.
The situation is further complicated by the traditional uses of the terms parallel
algorithm and distributed algorithm, which do not quite match the above definitions of
parallel and distributed systems (see below for more detailed discussion). Nevertheless, as a rule of
thumb, high-performance parallel computation in a shared-memory multiprocessor uses
parallel algorithms while the coordination of a large-scale distributed system uses
distributed algorithms.[20]
History
The use of concurrent processes which communicate through message-passing has its
roots in operating system architectures studied in the 1960s.[21] The first widespread
distributed systems were local-area networks such as Ethernet, which was invented in
the 1970s.[22]
ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and
ARPANET e-mail was invented in the early 1970s. E-mail became the most successful
application of ARPANET,[23] and it is probably the earliest example of a large-scale
distributed application. In addition to ARPANET (and its successor, the Internet), other
early worldwide computer networks included Usenet and FidoNet from the 1980s, both
of which were used to support distributed discussion systems.[24]
The study of distributed computing became its own branch of computer science in the
late 1970s and early 1980s. The first conference in the field, Symposium on Principles of
Distributed Computing (PODC), dates back to 1982, and its counterpart International
Symposium on Distributed Computing (DISC) was first held in Ottawa in 1985 as the
International Workshop on Distributed Algorithms on Graphs.[25]
Architectures
Various hardware and software architectures are used for distributed computing. At a
lower level, it is necessary to interconnect multiple CPUs with some sort of network,
regardless of whether that network is printed onto a circuit board or made up of loosely
coupled devices and cables. At a higher level, it is necessary to interconnect processes
running on those CPUs with some sort of communication system.[26]
Distributed programming typically falls into one of several basic architectures:
client–server, three-tier, n-tier, or peer-to-peer; or into one of two categories: loose
coupling or tight coupling.[27]
Client–server: architectures where smart clients contact the server for data, then
format and display it to the users. Input at the client is committed back to the server
when it represents a permanent change.
Three-tier: architectures that move the client intelligence to a middle tier so that
stateless clients can be used. This simplifies application deployment. Most web
applications are three-tier.
n-tier: architectures, typically of web applications, that forward their
requests on to other enterprise services. This type of application is the one most
responsible for the success of application servers.
Peer-to-peer: architectures where there are no special machines that provide a service
or manage the network resources.[28]:227 Instead all responsibilities are uniformly
divided among all machines, known as peers. Peers can serve both as clients and as
servers.[29] Examples of this architecture include BitTorrent and the bitcoin network.
Applications
Reasons for using distributed systems and distributed computing may include:
The very nature of an application may require the use of a communication network
that connects several computers: for example, data produced in one physical location
and required in another location.
There are many cases in which the use of a single computer would be possible in
principle, but the use of a distributed system is beneficial for practical reasons. For
example, it may be more cost-efficient to obtain the desired level of performance by
using a cluster of several low-end computers, in comparison with a single high-end
computer. A distributed system can provide more reliability than a non-distributed
system, as there is no single point of failure. Moreover, a distributed system may be
easier to expand and manage than a monolithic uniprocessor system.[32]
Examples
telecommunication networks:
routing algorithms;
network applications:
parallel computation:
scientific computing, including cluster computing, grid computing, cloud
computing,[34] and various volunteer computing projects (see the list of distributed
computing projects),
Theoretical foundations
Models
Many tasks that we would like to automate by using a computer are of question–answer
type: we would like to ask a question and the computer should produce an answer. In
theoretical computer science, such tasks are called computational problems. Formally, a
computational problem consists of instances together with a solution for each instance.
Instances are questions that we can ask, and solutions are desired answers to these
questions.
The field of concurrent and distributed computing studies similar questions in the case
of either multiple computers, or a computer that executes a network of interacting
processes: which computational problems can be solved in such a network and how
efficiently? However, it is not at all obvious what is meant by "solving a problem" in the
case of a concurrent or distributed system: for example, what is the task of the algorithm
designer, and what is the concurrent or distributed equivalent of a sequential
general-purpose computer?
The discussion below focuses on the case of multiple computers, although many of the
issues are the same for concurrent processes running on a single computer.
Parallel algorithms in shared-memory model
All processors have access to a shared memory. The algorithm designer chooses the
program executed by each processor.
One theoretical model that is used is the parallel random-access machine (PRAM).[37]
However, the classical PRAM model assumes synchronous access to the shared
memory.
Parallel algorithms in message-passing model
The algorithm designer chooses the structure of the network, as well as the program
executed by each computer.
Models such as Boolean circuits and sorting networks are used.[40] A Boolean circuit
can be seen as a computer network: each gate is a computer that runs an extremely
simple computer program. Similarly, a sorting network can be seen as a computer
network: each comparator is a computer.
Distributed algorithms in message-passing model
The algorithm designer only chooses the computer program. All computers run the
same program. The system must work correctly regardless of the structure of the
network.
A commonly used model is a graph with one finite-state machine per node.
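The sorting-network view mentioned above can be made concrete with a small simulation. The following is a minimal sketch, not taken from the cited sources: the five-comparator network for four inputs and the names COMPARATORS, run_network and sorts_all_binary_inputs are illustrative assumptions. Each comparator plays the role of a very simple "computer" that receives two values and forwards the smaller and the larger one on fixed wires.

```python
from itertools import product

# Assumed example: a 4-input network. Each pair (i, j) is one comparator that
# places the smaller value on wire i and the larger value on wire j.
COMPARATORS = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def run_network(values, comparators=COMPARATORS):
    """Apply the comparators in order and return the resulting wire values."""
    wires = list(values)
    for i, j in comparators:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires


def sorts_all_binary_inputs(n, comparators):
    """Check the network on every 0/1 input of length n (zero-one principle)."""
    return all(
        run_network(bits, comparators) == sorted(bits)
        for bits in product([0, 1], repeat=n)
    )


print(run_network([3, 1, 4, 2]))                 # [1, 2, 3, 4]
print(sorts_all_binary_inputs(4, COMPARATORS))   # True
```

The check relies on the zero-one principle: a comparator network that sorts every sequence of zeros and ones sorts every input. The structure of the network (the comparator list) is chosen by the designer, which is what distinguishes this model from the distributed message-passing model.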
An example
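Consider the computational problem of finding a coloring of a given graph G. Different
fields might take the following approaches:
Centralised algorithms
The graph G is encoded as a string, and the string is given as input to a computer.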
The computer program finds a coloring of the graph, encodes the coloring as a string,
and outputs the result.
Parallel algorithms
Again, the graph G is encoded as a string. However, multiple computers can access
the same string in parallel. Each computer might focus on one part of the graph and
produce a coloring for that part.
Distributed algorithms
The graph G is the structure of the computer network. There is one computer for each
node of G and one communication link for each edge of G. Initially, each computer only
knows about its immediate neighbors in the graph G; the computers must exchange
messages with each other to discover more about the structure of G. Each computer
must produce its own color as output.
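The distributed viewpoint can be illustrated with a simulation of synchronous rounds. The sketch below is a simple greedy scheme chosen for illustration (it is not the Cole–Vishkin algorithm, and it may need as many rounds as there are nodes); it assumes that nodes have unique, comparable identifiers, and the function name distributed_greedy_coloring is hypothetical. In each round, an uncolored node whose identifier is larger than those of all its uncolored neighbours picks the smallest color not used by its neighbours.

```python
def distributed_greedy_coloring(graph):
    """Simulate synchronous rounds; graph maps node id -> list of neighbour ids.

    Returns (color, rounds), where color maps each node to its chosen color.
    """
    color = {v: None for v in graph}
    rounds = 0
    while any(c is None for c in color.values()):
        rounds += 1
        # Step 1: every node sends its current color (or None) to its neighbours.
        inbox = {v: {u: color[u] for u in graph[v]} for v in graph}
        # Step 2: every node decides using only its own state and its inbox.
        for v in graph:
            if color[v] is not None:
                continue
            uncolored = [u for u, c in inbox[v].items() if c is None]
            if all(v > u for u in uncolored):
                used = {c for c in inbox[v].values() if c is not None}
                # A node with d neighbours always finds a free color in 0..d.
                color[v] = next(c for c in range(len(graph[v]) + 1) if c not in used)
    return color, rounds


# Example: a ring of five computers.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring, rounds_used = distributed_greedy_coloring(ring)
print(coloring, rounds_used)
```

Two adjacent nodes never choose in the same round, because only the one with the larger identifier is allowed to act, so the resulting coloring is proper. Each node uses only messages from its immediate neighbours, as required in this model.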
While the field of parallel algorithms has a different focus than the field of distributed
algorithms, there is much interaction between the two fields. For example, the Cole–
Vishkin algorithm for graph coloring[41] was originally presented as a parallel algorithm,
but the same technique can also be used directly as a distributed algorithm.
Complexity measures
In parallel algorithms, yet another resource in addition to time and space is the number
of computers. Indeed, often there is a trade-off between the running time and the
number of computers: the problem can be solved faster if there are more computers
running in parallel (see speedup). If a decision problem can be solved in polylogarithmic
time by using a polynomial number of processors, then the problem is said to be in the
class NC.[43] The class NC can be defined equally well by using the PRAM formalism or
Boolean circuits—PRAM machines can simulate Boolean circuits efficiently and vice
versa.[44]
In the analysis of distributed algorithms, more attention is usually paid to
communication operations than to computational steps. Perhaps the simplest model of
distributed computing is a synchronous system where all nodes operate in
lockstep. This model is commonly known as the LOCAL model. During each
communication round, all nodes in parallel (1) receive the latest messages from their
neighbours, (2) perform arbitrary local computation, and (3) send new messages to their
neighbours. In such systems, a central complexity measure is the number of synchronous
communication rounds required to complete the task.[45]
This complexity measure is closely related to the diameter of the network. Let D be the
diameter of the network. On the one hand, any computable problem can be solved
trivially in a synchronous distributed system in approximately 2D communication
rounds: simply gather all information in one location (D rounds), solve the problem, and
inform each node about the solution (D rounds).
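The relationship between round complexity and diameter can be checked with a small simulation. The sketch below is an illustrative variant of the gathering argument above: instead of collecting everything at one location, every node forwards everything it knows to all neighbours in each round, so that every node holds all inputs after D rounds. It assumes a connected network and unbounded message size, and the function name rounds_until_all_informed is hypothetical.

```python
def rounds_until_all_informed(graph):
    """graph maps node -> list of neighbours; the graph is assumed connected.

    Returns the number of synchronous rounds until every node knows every input.
    """
    known = {v: {v} for v in graph}   # initially each node knows only its own input
    everything = set(graph)
    rounds = 0
    while any(known[v] != everything for v in graph):
        rounds += 1
        # All messages of a round are sent before any of them is delivered.
        outgoing = {v: set(known[v]) for v in graph}
        for v in graph:
            for u in graph[v]:
                known[v] |= outgoing[u]
    return rounds


# A path of five nodes has diameter 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(rounds_until_all_informed(path))   # 4
```

After r rounds a node knows exactly the inputs of nodes within distance r, which is why the count equals the diameter here. The global termination test in the loop is a convenience of the simulation; real nodes would need another way to know when to stop, such as a known bound on D.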
On the other hand, if the running time of the algorithm is much smaller than D
communication rounds, then the nodes in the network must produce their output
without having the possibility to obtain information about distant parts of the network.
In other words, the nodes must make globally consistent decisions based only on
information that is available in their local neighbourhood, that is, within a distance
bounded by the running time. Many distributed
algorithms are known with the running time much smaller than D rounds, and
understanding which problems can be solved by such algorithms is one of the central
research questions of the field.[46] Typically an algorithm which solves a problem in
polylogarithmic time in the network size is considered efficient in this model.
Another commonly used measure is the total number of bits transmitted in the network
(cf. communication complexity).[47] The features of this concept are typically captured
with the CONGEST(B) model, which is defined similarly to the LOCAL model, except that a
single message can contain only B bits.
Other problems
Traditional computational problems take the perspective that the user asks a question, a
computer (or a distributed system) processes the question, then produces an answer
and stops. However, there are also problems where the system is required not to stop,
including the dining philosophers problem and other similar mutual exclusion problems.
In these problems, the distributed system is supposed to continuously coordinate the
use of shared resources so that no conflicts or deadlocks occur.
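One well-known way to avoid deadlock in the dining philosophers problem is to impose a global order on the shared resources. The following is a minimal sketch under stated assumptions: philosophers are threads, forks are locks, each philosopher always picks up the lower-numbered fork first, and the run is cut off after a fixed number of meals so that the example terminates (the problem itself has no such limit). The name dine and its parameters are hypothetical.

```python
import threading


def dine(num_philosophers=5, meals_each=100):
    """Run the philosophers as threads; requires at least two philosophers."""
    forks = [threading.Lock() for _ in range(num_philosophers)]
    meals = [0] * num_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % num_philosophers
        # Resource ordering: always acquire the lower-numbered fork first.
        first, second = min(left, right), max(left, right)
        for _ in range(meals_each):
            with forks[first]:
                with forks[second]:
                    meals[i] += 1   # "eat" while holding both forks

    threads = [
        threading.Thread(target=philosopher, args=(i,))
        for i in range(num_philosophers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals


print(dine())
```

Because every thread acquires forks in increasing order, no cycle of threads waiting for each other can form, so the run always finishes. If every philosopher instead picked up the left fork first, all of them could hold one fork and wait forever for the other.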
There are also fundamental challenges that are unique to distributed computing, for
example those related to fault-tolerance. Examples of related problems include
consensus problems,[48] Byzantine fault tolerance,[49] and self-stabilisation.[50]
Election
Coordinator election (or leader election) is the process of designating a single process
as the organizer of some task distributed among several computers (nodes). Before the
task is begun, all network nodes are either unaware which node will serve as the
"coordinator" (or leader) of the task, or unable to communicate with the current
coordinator. After a coordinator election algorithm has been run, however, each node
throughout the network recognizes a particular, unique node as the task coordinator.
[54]
The network nodes communicate among themselves to decide which of them
will get into the "coordinator" state. For that, they need some method to break
the symmetry among them. For example, if the nodes have unique and comparable
identities, then they can compare their identities and decide that the node with
the highest identity is the coordinator.[54]
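The highest-identity rule can be sketched as a simulation of synchronous rounds in which each node repeatedly forwards the largest identity it has seen so far. This is a minimal illustration, not one of the cited algorithms; it assumes a connected network with unique, comparable identities, and the function name elect_leader is hypothetical.

```python
def elect_leader(graph):
    """graph maps node identity -> list of neighbour identities (connected graph).

    Returns a dict mapping each node to the identity it accepts as coordinator.
    """
    best = {v: v for v in graph}   # each node starts by proposing itself
    changed = True
    while changed:
        changed = False
        sent = dict(best)          # messages sent at the start of the round
        for v in graph:
            for u in graph[v]:
                if sent[u] > best[v]:
                    best[v] = sent[u]
                    changed = True
    return best


# A ring of five nodes with identities 3, 7, 2, 9 and 5.
ring = {3: [7, 5], 7: [3, 2], 2: [7, 9], 9: [2, 5], 5: [9, 3]}
print(elect_leader(ring))   # every node maps to 9
```

The loop stops when no node changes its candidate, which the simulation detects globally. In an actual distributed system the nodes cannot observe this directly and would typically rely on a known bound on the network size or diameter, or on an explicit termination-detection mechanism.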
Many other algorithms have been suggested for different kinds of network graphs, such as
undirected rings, unidirectional rings, complete graphs, grids, directed Euler graphs, and
others. A general method that decouples the issue of the graph family from the design
of the coordinator election algorithm was suggested by Korach, Kutten, and Moran.[57]
So far the focus has been on designing a distributed system that solves a given
problem. A complementary research problem is studying the properties of a given
distributed system.[59][60]
The halting problem is an analogous example from the field of centralised computation:
we are given a computer program and the task is to decide whether it halts or runs
forever. The halting problem is undecidable in the general case, and naturally
understanding the behaviour of a computer network is at least as hard as understanding
the behaviour of one computer.[61]
However, there are many interesting special cases that are decidable. In particular, it is
possible to reason about the behaviour of a network of finite-state machines. One
example is telling whether a given network of interacting (asynchronous and
non-deterministic) finite-state machines can reach a deadlock. This problem is
PSPACE-complete,[62] i.e., it is decidable, but it is unlikely that there is an efficient
(centralised, parallel or distributed) algorithm that solves the problem in the case of large networks.
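For small networks, deadlock reachability can be decided by exhaustively exploring the global states. The sketch below is an illustrative explicit-state search over a toy model, not a method from the cited sources: each process is a four-state machine that takes two locks in a fixed order and then releases both, and a global state is the tuple of local states. The names find_deadlock, lock_model and all_done are hypothetical.

```python
from collections import deque


def find_deadlock(initial, successors, is_final):
    """Breadth-first search; returns a reachable deadlocked state, or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        next_states = successors(state)
        if not next_states and not is_final(state):
            return state   # no process can move, yet the run is not finished
        for nxt in next_states:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


def lock_model(orders):
    """Toy model: process p takes orders[p][0], then orders[p][1], then releases both.

    Local states: 0 = holds nothing, 1 = holds first lock, 2 = holds both, 3 = done.
    """
    def successors(state):
        held = set()
        for pc, (first, second) in zip(state, orders):
            if pc in (1, 2):
                held.add(first)
            if pc == 2:
                held.add(second)
        result = []
        for p, pc in enumerate(state):
            first, second = orders[p]
            can_move = (
                (pc == 0 and first not in held)
                or (pc == 1 and second not in held)
                or pc == 2
            )
            if can_move:   # asynchronous semantics: one process moves at a time
                result.append(state[:p] + (pc + 1,) + state[p + 1:])
        return result
    return successors


def all_done(state):
    return all(pc == 3 for pc in state)


opposite = lock_model([("A", "B"), ("B", "A")])
same = lock_model([("A", "B"), ("A", "B")])
print(find_deadlock((0, 0), opposite, all_done))   # (1, 1)
print(find_deadlock((0, 0), same, all_done))       # None
```

The number of global states grows exponentially with the number of machines, so this approach is practical only for small networks, which is consistent with the hardness result stated above.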