Chapter 1: Introduction to Distributed Systems
Informatics
4/23/2020 WSU@School-Informatics
Outline
• Definition of a Distributed System
• Goals of a Distributed System
• Types of Distributed Systems
What Is A Distributed System?
• A collection of independent computers that
appears to its users as a single coherent system.
• Features:
– No shared memory – message-based communication
– Each runs its own local OS
– Heterogeneity
• Ideal: to present a single-system image:
– The distributed system “looks like” a single
computer rather than a collection of separate
computers.
Distributed System
Characteristics
• To present a single-system image:
– Hide internal organization, communication details
– Provide uniform interface
• Easily expandable
– Adding new computers is hidden from users
• Continuous availability
– Failures in one component can be covered by other
components
• Supported by middleware
Definition of a Distributed System
Middleware Examples
• CORBA (Common Object Request Broker
Architecture)
• DCOM (Distributed Component Object
Model) – being replaced by .NET
• Sun’s ONC RPC (Remote Procedure Call)
• RMI (Remote Method Invocation)
• SOAP (Simple Object Access Protocol)
Middleware Examples
• All of the previous examples support
communication across a network:
• They provide protocols that allow a
program running on one kind of computer,
using one kind of operating system, to call a
program running on another computer with
a different operating system
– The communicating programs must be running
the same middleware.
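The idea above can be sketched with Python's standard xmlrpc modules acting as a stand-in for middleware. XML-RPC is not one of the systems listed earlier, and the function name add and the loopback address are assumptions made for this demo. Both sides speak the same protocol, which is what lets the client call the server as if it were a local function.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Procedure exposed to remote callers (hypothetical example)."""
    return a + b


def demo_remote_add():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]

    # Run the server in a background thread so the client can call it.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # The proxy turns a method call into an XML-RPC request over HTTP.
        with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            return proxy.add(2, 3)
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

In a real deployment the server and client would run on different machines; here they share one process only so the sketch is self-contained.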
Distributed System Goals
• Resource Accessibility
• Distribution Transparency
• Openness
• Scalability
Goal 1 – Resource Accessibility
• Support user access to remote resources (printers,
data files, web pages, CPU cycles) and the fair
sharing of the resources
• Economics of sharing expensive resources
• Performance enhancement – from multiple
processors, and from easier collaboration and
information exchange through access to remote services
– Groupware: tools to support collaboration
• Resource sharing introduces security problems.
Goal 2 – Distribution Transparency
• Software hides some of the details of the
distribution of system resources.
– Makes the system more user friendly.
• A distributed system that appears to its users &
applications to be a single computer system is said
to be transparent.
– Users & apps should be able to access remote
resources in the same way they access local
resources.
• Transparency has several dimensions.
Types of Transparency
Transparency   Description
Access         Hide differences in data representation & resource access (enables interoperability)
Location       Hide location of resource (can use resource without knowing its location)
Migration      Hide possibility that a system may change location of resource (no effect on access)
Replication    Hide the possibility that multiple copies of the resource exist (for reliability and/or availability)
Concurrency    Hide the possibility that the resource may be shared concurrently
Failure        Hide failure and recovery of the resource. How does one differentiate between slow and failed?
Relocation     Hide that resource may be moved during use
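As a rough illustration of location and migration transparency, the sketch below routes every access through a name lookup. The classes NameService and Client, the host names, and the file name are all hypothetical; real systems use naming services that are far more elaborate.

```python
class NameService:
    """Maps a logical resource name to its current host (hypothetical)."""

    def __init__(self):
        self._location = {}

    def register(self, name, host):
        self._location[name] = host

    def migrate(self, name, new_host):
        # Migration transparency: only the mapping changes, not the name.
        self._location[name] = new_host

    def resolve(self, name):
        return self._location[name]


class Client:
    """Accesses resources by name only; never hard-codes a host."""

    def __init__(self, names, hosts):
        self._names = names
        self._hosts = hosts  # host -> {resource name -> content}

    def read(self, name):
        host = self._names.resolve(name)  # location is looked up at access time
        return self._hosts[host][name]


def demo_transparency():
    hosts = {"hostA": {"report.txt": "Q1 numbers"}, "hostB": {}}
    names = NameService()
    names.register("report.txt", "hostA")
    client = Client(names, hosts)

    before = client.read("report.txt")

    # Move the file to another host; the client code does not change.
    hosts["hostB"]["report.txt"] = hosts["hostA"].pop("report.txt")
    names.migrate("report.txt", "hostB")

    after = client.read("report.txt")
    return before, after
```

The client issues the same read call before and after the move, which is the property the Location and Migration rows describe.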
Goal 3 - Openness
• An open distributed system “…offers services according to
standard rules that describe the syntax and semantics of those
services.” In other words, the interfaces to the system are
clearly specified and freely available.
– Compare to network protocols
– Not proprietary
• Interface Definition/Description Languages (IDL): used to
describe the interfaces between software components, usually
in a distributed system
– Definitions are language & machine independent
– Support communication between systems using different
OS/programming languages; e.g. a C++ program running on Windows
communicates with a Java program running on UNIX
– Communication is usually RPC-based.
Examples of IDLs
Goal 3-Openness
• IDL: Interface Description Language
– The original
• WSDL: Web Services Description Language
– Provides machine-readable descriptions of the
services
• OMG IDL: used for RPC in CORBA
– OMG – Object Management Group
Open Systems Support …
• Interoperability: the ability of two different
systems or applications to work together
– A process that needs a service should be able to talk to
any process that provides the service.
– Multiple implementations of the same service may be
provided, as long as the interface is maintained
• Portability: an application designed to run on one
distributed system can run on another system which
implements the same interface.
• Extensibility: Easy to add new components, features
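A small sketch of why a maintained interface enables interoperability: the client below is written against an interface only, so any implementation can be substituted. Python's typing.Protocol is used here as a loose stand-in for an IDL, and the names TimeService, FixedClock, OffsetClock, and greeting are hypothetical.

```python
from typing import Protocol


class TimeService(Protocol):
    """The published interface: the only thing a client depends on."""

    def current_hour(self) -> int: ...


class FixedClock:
    """One implementation (hypothetical): always reports the same hour."""

    def __init__(self, hour: int) -> None:
        self._hour = hour

    def current_hour(self) -> int:
        return self._hour


class OffsetClock:
    """Another implementation: wraps a clock and applies an hour offset."""

    def __init__(self, base: TimeService, offset: int) -> None:
        self._base = base
        self._offset = offset

    def current_hour(self) -> int:
        return (self._base.current_hour() + self._offset) % 24


def greeting(clock: TimeService) -> str:
    # Client code is written against the interface, not an implementation.
    return "Good morning" if clock.current_hour() < 12 else "Good afternoon"
```

Adding a third implementation requires no change to greeting, which is a simple case of the extensibility point above.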
Goal 4 - Scalability
• Dimensions that may scale:
– With respect to size
– With respect to geographical distribution
– With respect to the number of administrative
organizations spanned
• A scalable system still performs well as it
scales up along any of the three dimensions.
Size Scalability
Decentralized Algorithms
Scalability - Administrative
• Different domains may have different
policies about resource usage, management,
security, etc.
• Trust often stops at administrative
boundaries
– Requires protection from malicious attacks
Scaling Techniques
• Scalability problems most often show up as
performance problems.
• Three techniques to improve scalability:
– Hiding communication latencies
– Distribution
– Replication
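One common way to hide communication latency is asynchronous communication: start several requests and do not block on each one in turn. The sketch below simulates this with asyncio; the fetch function, the server names, and the 0.05 second delays are assumptions standing in for real network calls.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a remote request; the delay simulates network latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all():
    # Issue all three requests at once instead of waiting for each in turn.
    # The total wait is roughly the slowest request, not the sum of all three.
    return await asyncio.gather(
        fetch("server-a", 0.05),
        fetch("server-b", 0.05),
        fetch("server-c", 0.05),
    )


def demo_latency_hiding():
    return asyncio.run(fetch_all())
```

The latency is still there for each request; the technique only keeps the caller from waiting idle while a reply is in transit.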
Hiding Communication Delays
Scaling Techniques (2)
Summary
Goals for Distribution
• Resource accessibility
– For sharing and enhanced performance
• Distribution transparency
– For easier use
• Openness
– To support interoperability, portability, extensibility
• Scalability
– With respect to size (number of users), geographic
distribution, administrative domains
Issues/Pitfalls of Distribution
• Requirement for advanced software to realize the
potential benefits.
• Security and privacy concerns regarding network
communication
• Replication of data and services provides fault
tolerance and availability, but at a cost.
• Network reliability, security, heterogeneity,
topology
• Latency and bandwidth
• Administrative domains
Distributed Systems
• Early distributed systems emphasized the
single system image – often tried to make a
networked set of computers look like an
ordinary general purpose computer
• Examples: Amoeba, Sprite, NOW, Condor
(distributed batch system), …
Types of Distributed Systems
• Distributed Computing Systems
– Clusters
– Grids
– Clouds
• Distributed Information Systems
– Transaction Processing Systems
– Enterprise Application Integration
• Distributed Embedded Systems
– Home systems
– Health care systems
– Sensor networks
Cluster Computing
• A collection of similar processors (PCs,
workstations) running the same operating
system, connected by a high-speed LAN.
• Parallel computing capabilities using
inexpensive PC hardware
• Replace big parallel computers (MPPs)
Cluster Types & Uses
• High Performance Clusters (HPC)
– Run large parallel programs
– Scientific, military, engineering apps; e.g., weather
modeling
• Load Balancing Clusters
– Front end processor distributes incoming requests
– Server farms (e.g., at banks or popular web sites)
• High Availability Clusters (HA)
– Provide redundancy – back up systems
– May be more fault tolerant than large mainframes
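The front-end role in a load balancing cluster, combined with the redundancy idea of a high availability cluster, can be sketched as a round-robin dispatcher that skips servers marked as down. The LoadBalancer class and the server names are hypothetical; production balancers use health checks and more refined policies.

```python
from itertools import cycle


class LoadBalancer:
    """Front end that spreads requests over back-end servers (hypothetical)."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Round robin, skipping servers that are currently marked down.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

Clients keep calling pick in the same way when a server fails, so the failure is covered by the remaining servers.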
Clusters
• Linux-based
• Master-slave paradigm
– One processor is the master; allocates tasks to
other processors, maintains batch queue of
submitted jobs, handles interface to users
– Master has libraries to handle message-based
communication or other features (the
middleware).
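The master's job of queuing submitted work and handing it to other processors can be sketched as follows. Threads stand in for cluster nodes here, which is an assumption made to keep the example on one machine; the squaring job and the function names are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Worker: repeatedly take a job from the master's queue and run it."""
    while True:
        job = tasks.get()
        if job is None:  # sentinel from the master: no more work
            break
        job_id, number = job
        results.put((job_id, number * number))


def run_master(numbers, num_workers=3):
    """Master: queue the jobs, start the workers, collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    for job_id, number in enumerate(numbers):
        tasks.put((job_id, number))
    for _ in range(num_workers):
        tasks.put(None)  # one stop signal per worker

    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Workers finish in any order, so sort by job id before returning.
    collected = [results.get() for _ in range(len(numbers))]
    return [value for _, value in sorted(collected)]
```

On a real cluster the queue operations would be messages sent over the network by the middleware libraries mentioned above.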
Cluster Computing Systems
• Figure 1-6. An example of a cluster
computing system.
Clusters – MOSIX model
• Provides a symmetric, rather than
hierarchical paradigm
– High degree of distribution transparency (single
system image)
– Processes can migrate between nodes
dynamically and preemptively (more about this
later). Migration is automatic.
• Used to manage Linux clusters
More About MOSIX
“The MOSIX Management System for Linux Clusters, Multi-clusters,
GPU Clusters and Clouds”, A. Barak and A. Shiloh
OGSA – Another Grid Architecture*
Globus Toolkit*
• An example of grid middleware
• Supports the combination of heterogeneous
platforms into virtual organizations.
• Implements the OGSA standards, among
others.
Cloud Computing
• Provides scalable services as a utility over
the Internet.
• Often built on a computer grid
• Users buy services from the cloud
– Grid users may develop and run their own
software
• Cluster/grid/cloud distinctions blur at the
edges!
Types of Distributed Systems
Distributed Information Systems
• Business-oriented
• Systems to make a number of separate
network applications interoperable and
build “enterprise-wide information
systems”.
• Two types discussed here:
– Transaction processing systems
– Enterprise application integration (EAI)
Transaction Processing Systems
• Provide a highly structured client-server
approach for database applications
• Transactions are the communication model
• Obey the ACID properties:
– Atomic: all or nothing
– Consistent: invariants are preserved
– Isolated: concurrent transactions do not interfere (serializable)
– Durable: committed operations can’t be undone
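Atomicity and consistency can be seen in a few lines using Python's built-in sqlite3 module. This is a single-node sketch, not a distributed transaction system; the account table, the names alice and bob, and the balances are hypothetical.

```python
import sqlite3


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                 "balance INTEGER CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE name = ?", (amount, dst))
            # The CHECK constraint (the invariant) rejects an overdraft here.
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When the second update violates the invariant, the credit already applied to the destination account is rolled back as well, so no partial transfer is ever visible.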
Transaction Processing Systems
• Figure 1-8. Example primitives for
transactions.
Transactions
• Transaction processing may be centralized
(traditional client/server system) or
distributed.
• A distributed database is one in which the
data storage is distributed – connected to
separate processors.
Nested Transactions
• A nested transaction is a transaction within
another transaction (a sub-transaction)
– Example: a transaction may ask for two things
(e.g., airline reservation info + hotel info)
which would spawn two nested transactions
• Primary transaction waits for the results.
– While children are active parent may only
abort, commit, or spawn other children
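SQLite savepoints give a rough, single-database approximation of nested transactions: each savepoint behaves like a child that can be rolled back without aborting the parent. This is an assumption-laden sketch, since true nested transactions in a distributed system span several servers; the booking table and the flight_ok and hotel_ok flags are hypothetical.

```python
import sqlite3


def book_trip(flight_ok, hotel_ok):
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE booking (kind TEXT)")

    conn.execute("BEGIN")                      # parent transaction
    for kind, ok in (("flight", flight_ok), ("hotel", hotel_ok)):
        conn.execute(f"SAVEPOINT {kind}")      # child sub-transaction starts
        conn.execute("INSERT INTO booking VALUES (?)", (kind,))
        if not ok:
            # Abort only this child; the parent and siblings are unaffected.
            conn.execute(f"ROLLBACK TO {kind}")
        conn.execute(f"RELEASE {kind}")        # child ends
    conn.execute("COMMIT")                     # parent commits what survived

    rows = conn.execute("SELECT kind FROM booking ORDER BY kind").fetchall()
    conn.close()
    return [kind for (kind,) in rows]
```

Here the parent chooses to commit whatever children succeeded; a parent could equally decide to abort everything when any child fails, which would replace the final COMMIT with a ROLLBACK.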
Transaction Processing Systems
Home System
• Built around one or more PCs, but can also
include other electronic devices:
– Automatic control of lighting, sprinkler
systems, alarm systems, etc.
– Network enabled appliances
– PDAs and smart phones, etc.
Electronic Health Care Systems
Sensor Networks
• A collection of geographically distributed nodes,
each consisting of a communication device, a power
source, some kind of sensor, a small processor…
• Purpose: to collectively monitor sensory data
(temperature, sound, moisture, etc.) and transmit
the data to a base station
• “smart environment” – the nodes may do some
rudimentary processing of the data in addition to
their communication responsibilities.
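The "rudimentary processing" a node might do can be sketched as local aggregation: each node sends a short summary, not every raw sample, and the base station combines the summaries. The SensorNode class, the summary fields, and the readings are hypothetical.

```python
from statistics import mean


class SensorNode:
    """A node that summarizes its own readings before transmitting."""

    def __init__(self, node_id, readings):
        self.node_id = node_id
        self.readings = list(readings)

    def summary(self):
        # Local processing: send a count and an average, not every sample.
        return {"node": self.node_id,
                "count": len(self.readings),
                "avg": mean(self.readings)}


def base_station_average(summaries):
    """Combine per-node summaries into one count-weighted average."""
    total = sum(s["count"] for s in summaries)
    return sum(s["avg"] * s["count"] for s in summaries) / total
```

Weighting by count keeps the combined average equal to the average over all raw samples, while each node transmits only two numbers. This matters because communication usually dominates a sensor node's power budget.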
Sensor Networks
Summary – Types of Systems
• Distributed computing systems – our main
emphasis
• Distributed information systems – we will
talk about some aspects of them
• Distributed pervasive (embedded) systems – not so
much
Additional Slides
• Middleware: CORBA, ONC RPC, SOAP
• Distributed Systems – Historical
Perspective
• Grid Computing Sites
CORBA
• “CORBA is the acronym for Common Object
Request Broker Architecture, OMG's open,
vendor-independent architecture and infrastructure
that computer applications use to work together
over networks. Using the standard protocol IIOP,
a CORBA-based program from any vendor, on
almost any computer, operating system,
programming language, and network, can
interoperate with a CORBA-based program from
the same or another vendor, on almost any other
computer, operating system, programming
language, and network.”
http://www.omg.org/gettingstarted/corbafaq.htm
ONC RPC
• “ONC RPC, short for Open Network
Computing Remote Procedure Call, is a
widely deployed remote procedure call
system. ONC was originally developed by
Sun Microsystems as part of their Network
File System project, and is sometimes
referred to as Sun ONC or Sun RPC.”
http://en.wikipedia.org/wiki/Open_Network_Computing_Remote_Procedure_Call
Simple Object Access Protocol
• SOAP is a lightweight protocol for exchange of information in a
decentralized, distributed environment. It is an XML based protocol that
consists of three parts: an envelope that defines a framework for describing
what is in a message and how to process it, a set of encoding rules for
expressing instances of application-defined data types, and a convention for
representing remote procedure calls and responses. SOAP can potentially be
used in combination with a variety of other protocols; however, the only
bindings defined in this document describe how to use SOAP in combination
with HTTP and HTTP Extension Framework.
• http://www.w3.org/TR/2000/NOTE-SOAP-20000508/
Historical Perspective - MPPs
• Compare clusters to the Massively Parallel
Processors of the 1990s
• Many separate nodes, each with its own private
memory – hundreds or thousands of nodes (e.g.,
Cray T3E, nCube)
– Manufactured as a single computer with a
proprietary OS, very fast communication network.
– Designed to run large, compute-intensive parallel
applications
– Expensive, long time-to-market cycle
Historical Perspective - NOWs
• Networks of Workstations
• Designed to harvest idle workstation cycles
to support compute-intensive applications.
• Advocates contended that if done properly,
you could get the power of an MPP at
minimal additional cost.
• Supported general-purpose processing and
parallel applications
Other Grid Resources
• The Globus Alliance: “a community of organizations
and individuals developing fundamental technologies
behind the "Grid," which lets people share computing
power, databases, instruments, and other on-line tools
securely across corporate, institutional, and geographic
boundaries without sacrificing local autonomy”
• Grid Computing Info Center: “aims to promote the
development and advancement of technologies that
provide seamless and scalable access to wide-area
distributed resources”