CC Notes

The document discusses the transition from High-Performance Computing (HPC) to High-Throughput Computing (HTC) and the implementation of Peer-to-Peer (P2P) systems, highlighting their self-organizing nature and challenges. It also covers cluster architectures, design principles, and classifications, emphasizing the importance of scalability, fault tolerance, and resource management. Additionally, it introduces various virtualization levels, including instruction set architecture and hardware abstraction, as well as the vCUDA model for GPU virtualization.

MODULE 1

High-Throughput Computing

The development of market-oriented high-end computing systems is undergoing a strategic change from an HPC paradigm to an HTC paradigm. This HTC paradigm pays more attention to high-flux computing. HTC is usually implemented using P2P systems.

P2P systems

In a P2P system, every node acts as both a client and a server, providing part of the system resources. Peer machines are simply client computers connected to the Internet. All client machines act autonomously and can join or leave the system freely. No master-slave relationship exists among the peers. No central coordination or central database is needed, and no peer machine has a global view of the entire P2P system. The system is self-organizing with distributed control, and the peers are totally unrelated. Each peer machine joins or leaves the P2P network voluntarily; only the participating peers form the physical network at any time. Unlike a cluster or grid, a P2P network does not use a dedicated interconnection network.

P2P system challenges

P2P computing faces three types of heterogeneity problems, in hardware, software, and network requirements. Lack of trust among peers poses another problem. There are too many hardware models and architectures to select from, and incompatibility exists between software and the OS.

High-Performance Computing

For many years, HPC systems have emphasized raw speed performance. The speed of HPC systems increased from Gflops in the early 1990s to Pflops by 2010. This improvement was driven mainly by demands from the scientific, engineering, and manufacturing communities. HPC systems are usually implemented using clusters.

Cluster:

A computing cluster consists of interconnected stand-alone computers which work cooperatively as a single integrated computing resource.
Cluster Architecture

A typical server cluster is built around a low-latency, high-bandwidth interconnection network. This network can be as simple as a SAN (e.g., Myrinet) or a LAN (e.g., Ethernet). To build a larger cluster with more nodes, the interconnection network can be built with multiple levels of Gigabit Ethernet, Myrinet, or InfiniBand switches. Through hierarchical construction using a SAN, LAN, or WAN, one can build scalable clusters with an increasing number of nodes. The cluster is connected to the Internet via a virtual private network (VPN) gateway, whose IP address locates the cluster. The system image of a computer is decided by the way the OS manages the shared cluster resources. Most clusters have loosely coupled node computers, and all resources of a server node are managed by its own OS.
MODULE 2

Basic cluster architecture

 A simple cluster of computers built with commodity components and fully supported with
desired SSI features and HA capability.
 The processing nodes are commodity workstations, PCs, or servers.
 These commodity nodes are easy to replace or upgrade with new generations of hardware.
 The node operating systems should be designed for multiuser, multitasking, and
multithreaded applications.
 The nodes are interconnected by one or more fast commodity networks.
 These networks use standard communication protocols and operate at speeds about two orders of magnitude faster than current TCP/IP over Ethernet.
 The network interface card is connected to the node’s standard I/O bus (e.g., PCI).
 When the processor or the operating system is changed, only the driver software needs to
change.
 It is desirable to have a platform-independent cluster operating system sitting on top of the node platforms.
 A cluster middleware can be used to glue together all node platforms at the user space. An
availability middleware offers HA services.
 An SSI layer provides a single entry point, a single file hierarchy, and a single point of control.
Design Principles Of Computer Clusters
Single-System Image Features:

 SSI does not mean a single copy of an operating system image residing in memory, as in
an SMP or a workstation.
 Single system:
 The entire cluster is viewed by users as one system that has multiple processors.
 The user could say, “Execute my application using five processors.” This is different from
a distributed system.
 Single control:
 Logically, an end user or system user utilizes services from one place with a single
interface.
Single Entry Point:

 The single-entry point enables users to log in (e.g., through Telnet, rlogin, or HTTP) to
a cluster as one virtual host, although the cluster may have multiple physical host nodes
to serve the login sessions.
 The system transparently distributes the user’s login and connection requests to
different physical hosts to balance the load, as sketched below.
 Clusters could substitute for mainframes and supercomputers.
 Example: Four nodes of a cluster are used as host nodes to receive users’ login requests.
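
As a hedged illustration of this load-distributed single entry point, the sketch below assigns incoming login sessions to host nodes round-robin. The node names and the standalone dispatcher are assumptions for illustration; real clusters typically do this with DNS round-robin or a front-end load balancer.

```python
# Minimal sketch: round-robin dispatch of login sessions to host nodes.
# Node names are hypothetical placeholders.
import itertools

HOST_NODES = ["node1", "node2", "node3", "node4"]  # the four login host nodes
_next_host = itertools.cycle(HOST_NODES)

def assign_login_host() -> str:
    """Pick the next host node for an incoming login session."""
    return next(_next_host)

# Eight login requests spread evenly over the four hosts.
for i in range(8):
    print(f"login {i} -> {assign_login_host()}")
```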
Single File Hierarchy:

 Single system
 There is just one file hierarchy from the user’s viewpoint.
 Symmetry
 A user can access the global storage (e.g., /scratch) using a cluster service from any
node.
 In other words, all file services and functionalities are symmetric to all nodes and all
users, except those protected by access rights.

Single I/O, Networking, and Memory Space:

Single Networking:

 A properly designed cluster should behave as one system. In other words, it is
like a big workstation with four network connections and four I/O devices.

Single Point of Control:

 The system administrator should be able to configure, monitor, test, and control the
entire cluster and each individual node from a single point.
 Many clusters help with this through a system console that is connected to all nodes
of the cluster.
Single Memory Space:

 Single memory space gives users the illusion of a big, centralized main memory,
which in reality may be a set of distributed local memory spaces.

Single I/O Address Space:

 Example: The cluster is used as a web server. The web information database is
distributed between two RAIDs.
 An HTTP daemon is started on each node to handle web requests, which come from
all four network connections.

Cluster Family Classification


Compute clusters

 These are clusters designed mainly for collective computation over a single large job.
 A good example is a cluster dedicated to numerical simulation of weather conditions.
 The compute clusters do not handle many I/O operations, such as database services.
 When a single compute job requires frequent communication among the cluster nodes,
the cluster must share a dedicated network, and thus the nodes are mostly homogeneous
and tightly coupled.
 This type of cluster is also known as a Beowulf cluster.
High-Availability clusters

 HA (high-availability) clusters are designed to be fault-tolerant and achieve high
availability of services.
 HA clusters operate with many redundant nodes to sustain faults or failures.
 The simplest HA cluster has only two nodes that can fail over to each other. Of course,
higher redundancy provides higher availability.

Load-balancing clusters

 These clusters aim for higher resource utilization through load balancing among all
participating nodes in the cluster.
 All nodes share the workload or function as a single virtual machine (VM).
 Requests initiated by users are distributed among all the node computers that form the cluster.

Fundamental Cluster Design Issues


 Scalable Performance This refers to the fact that scaling of resources (cluster nodes,
memory capacity, I/O bandwidth, etc.) leads to a proportional increase in performance.
Clustering is driven by scalability; a small efficiency sketch follows this list.
 Single-System Image (SSI) A set of workstations connected by an Ethernet network is not
necessarily a cluster. A cluster is a single system.
 Availability Support Clusters can provide cost-effective HA capability with lots of
redundancy in processors, memory, disks, I/O devices, networks, and operating system
images.
 Cluster Job Management Clusters try to achieve high system utilization from traditional
workstations or PC nodes that are normally not highly utilized. Job management software
is required to provide batching, load balancing, parallel processing, and other functionality.
 Internode Communication Because of their higher node complexity, cluster nodes cannot
be packaged as compactly as MPP nodes. The internode physical wire lengths are longer in
a cluster than in an MPP. This is true even for centralized clusters. A long wire implies
greater interconnect network latency.
 Fault Tolerance and Recovery Clusters of machines can be designed to eliminate all
single points of failure. Through redundancy, a cluster can tolerate faulty conditions up to
a certain extent. Heartbeat mechanisms can be installed to monitor the running condition
of all nodes.
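
To make the scalable-performance bullet concrete, the sketch below computes speedup and scaling efficiency from single-node and n-node runtimes. The timing numbers are invented for illustration.

```python
# Minimal sketch: speedup S(n) = T(1)/T(n) and efficiency E(n) = S(n)/n.
# E(n) close to 1.0 means performance grows in proportion to resources.
def speedup(t1: float, tn: float) -> float:
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    return speedup(t1, tn) / n

t1 = 1000.0  # runtime on one node, in seconds (illustrative)
for n, tn in [(2, 520.0), (4, 270.0), (8, 150.0)]:
    print(f"{n} nodes: speedup {speedup(t1, tn):.2f}, "
          f"efficiency {efficiency(t1, tn, n):.2f}")
```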

Resource Sharing in Clusters / Scenarios in Which Resource Management Is Critical
Shared nothing architecture:
The shared-nothing architecture is used in most clusters, where the nodes are connected
through the I/O bus. The shared-nothing configuration simply connects two or more
autonomous computers via a LAN such as Ethernet.

Shared disk architecture:

The shared-disk architecture is favored for small-scale availability clusters in business
applications. When one node fails, the other node takes over. This is what most business
clusters desire, so that they can enable recovery support in case of node failure. The shared
disk can hold checkpoint files or critical system images to enhance cluster availability.
Shared-memory cluster

A shared-memory cluster is much more difficult to realize. The nodes could be connected by
a Scalable Coherent Interface (SCI) ring, which is connected to the memory bus of each
node through an NIC module. In the other two architectures, the interconnect is attached to
the I/O bus. The memory bus operates at a higher frequency than the I/O bus.

Middleware Support for SSI and HA in Clusters


SSI and HA features in a cluster are not obtained free of charge. They must be supported by
hardware, software, middleware, or OS extensions. Any change in hardware design and OS
extensions must be done by the manufacturer, and such hardware and OS support could be
cost-prohibitive to ordinary users. The programming level is also a big burden to cluster users.
Therefore, middleware support at the application level costs the least to implement. Close
to the user application end, middleware packages are needed at the cluster management level:
one for fault management to support failover and failback, and another to achieve HA using
failure detection and recovery and packet switching. In the middle, one needs to modify the
Linux OS to support HA, along with special drivers to support HA, I/O, and hardware
devices. Toward the bottom, special hardware is needed to support hot-swapped devices and
provide router interfaces.

Relationship among clustering middleware at the job management, programming, and implementation levels

• Single job management system All cluster jobs can be submitted from any node to a single
job management system.

• Single user interface The users use the cluster through a single graphical interface. Such an
interface is available for workstations and PCs. A good direction to take in developing a cluster
GUI is to utilize web technology.

• Single process space All user processes created on various nodes form a single process space
and share a uniform process identification scheme. A process on any node can create (e.g.,
through a UNIX fork) or communicate with (e.g., through signals, pipes, etc.) processes on
remote nodes; a local sketch of these primitives follows this list.

• Middleware support for SSI clustering Various SSI features are supported by middleware
developed at three cluster application levels:
• Management level This level handles user applications and provides a job
management system such as GLUnix, MOSIX, Load Sharing Facility (LSF), or Codine.
• Programming level This level provides single file hierarchy (NFS, xFS, AFS, Proxy)
and distributed shared memory (TreadMark, Wind Tunnel).

 Implementation level This level supports a single process space, checkpointing, process
migration, and a single I/O space. These features must interface with the cluster hardware
and OS platform.
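
The single-process-space bullet above relies on the ordinary UNIX process primitives. The local sketch below (POSIX only, illustrative) shows the fork and pipe mechanics that an SSI cluster generalizes, so that the child could transparently be placed on a remote node.

```python
# Minimal local sketch of fork + pipe; an SSI single process space extends
# these same primitives across cluster nodes. POSIX only.
import os

r, w = os.pipe()            # pipe for parent <-> child communication
pid = os.fork()             # create a child process
if pid == 0:                # child branch
    os.close(r)
    os.write(w, b"hello from child")
    os._exit(0)
else:                       # parent branch
    os.close(w)
    msg = os.read(r, 64)
    os.waitpid(pid, 0)      # reap the child
    print(f"child {pid} said: {msg.decode()}")
```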

Centralized vs decentralized
 In a centralized cluster, all the nodes are owned, controlled, managed, and administered
by a central operator.
 In a decentralized cluster, the nodes have individual owners.

Exposed vs Enclosed
 Intra-cluster communication can be either exposed or enclosed.
 In an exposed cluster, the communication paths among the nodes are exposed to the
outside world.
 An outside machine can access the communication paths, and thus individual nodes,
using standard protocols (e.g., TCP/IP).
 Such exposed clusters are easy to implement, but have several disadvantages:
o Being exposed, intra-cluster communication is not secure, unless the
communication subsystem performs additional work to ensure privacy and
security.
o Outside communications may disrupt intra-cluster communications in an
unpredictable fashion.
o Standard communication protocols tend to have high overhead.
Dedicated versus Enterprise Clusters
 A dedicated cluster is typically installed in a deskside rack in a central computer room.
It is homogeneously configured with the same type of computer nodes and managed by
a single administrator group like a frontend host.
 An enterprise cluster is mainly used to utilize idle resources in the nodes. Each node is
usually a full-fledged SMP, workstation, or PC, with all the necessary peripherals
attached. The nodes are typically geographically distributed, and are not necessarily in
the same room or even in the same building. The nodes are individually owned by
multiple owners.

Fault-Tolerant Cluster Configurations


 Hot standby server clusters In a hot standby cluster, only the primary node is actively
doing all the useful work normally. The standby node is powered on (hot) and running
some monitoring programs to communicate heartbeat signals to check the status of the
primary node, but is not actively running other useful workloads.

 Active-takeover clusters In this case, the architecture is symmetric among multiple
server nodes. Both servers are primary, doing useful work normally. Both failover and
failback are often supported on both server nodes. When a node fails, the user
applications fail over to the available node in the cluster.

 Failover cluster This is probably the most important feature demanded in current
clusters for commercial applications. When a component fails, this technique allows
the remaining system to take over the services originally provided by the failed
component. A failover mechanism must provide several functions, such as failure
diagnosis, failure notification, and failure recovery. Failure diagnosis refers to the
detection of a failure and the location of the failed component that caused the failure. A
commonly used technique is heartbeat, whereby the cluster nodes send out a stream of
heartbeat messages to one another. If the system does not receive the stream of heartbeat
messages from a node, it can conclude that either the node or the network connection
has failed.
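
A minimal sketch of the heartbeat technique just described: each node reports in periodically, and a monitor declares a node failed when its heartbeats stop arriving. The node names and timeout are illustrative assumptions; real clusters exchange heartbeats over the network.

```python
# Hedged sketch: heartbeat-based failure diagnosis. Timeout and node names
# are illustrative; cluster nodes would really send these over the network.
import time

HEARTBEAT_TIMEOUT = 3.0   # seconds of silence => node presumed failed

class HeartbeatMonitor:
    def __init__(self) -> None:
        self.last_seen: dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record a heartbeat message received from a node."""
        self.last_seen[node] = time.monotonic()

    def failed_nodes(self) -> list[str]:
        """Nodes gone silent: either the node or its network link failed."""
        now = time.monotonic()
        return [n for n, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]

monitor = HeartbeatMonitor()
monitor.beat("node1")
monitor.beat("node2")
print(monitor.failed_nodes())   # [] -- both nodes healthy so far
```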

MODULE 3
Virtualization levels

Instruction Set Architecture Level

With this approach, it is possible to run a large amount of legacy binary code written for
various processors on any given new hardware host machine. Instruction set emulation leads
to virtual ISAs created on any hardware machine. The basic emulation method is code
interpretation: an interpreter program interprets the source instructions to target instructions
one by one. Dynamic binary translation, a faster approach, translates basic blocks of dynamic
source instructions to target instructions.

Hardware Abstraction Level

Hardware-level virtualization is performed right on top of the bare hardware. This approach
generates a virtual hardware environment for a VM. The idea is to virtualize a computer’s
resources, such as its processors, memory, and I/O devices, with the intention of improving
the hardware utilization rate by letting multiple users share the hardware concurrently.

Operating System Level

This refers to an abstraction layer between the traditional OS and user applications. OS-level
virtualization creates isolated containers on a single physical server, and the OS instances
utilize the hardware and software in data centers. The containers behave like real servers.
Library Support Level
Most applications use APIs exported by user-level libraries rather than using lengthy system
calls by the OS. Since most systems provide well-documented APIs, such an interface
becomes another candidate for virtualization. Virtualization with library interfaces is possible
by controlling the communication link between applications and the rest of a system through
API hooks.

User-Application Level
Virtualization at the application level virtualizes an application as a VM. On a traditional OS,
an application often runs as a process. Therefore, application-level virtualization is also
known as process-level virtualization. The most popular approach is to deploy high level
language (HLL) VMs. In this scenario, the virtualization layer sits as an application program
on top of the operating system, and the layer exports an abstraction of a VM that can run
programs written and compiled to a particular abstract machine definition.

vCUDA

CUDA is a programming model and library for general-purpose GPUs. The vCUDA employs
a client-server model to implement CUDA virtualization. It consists of three user space
components: the vCUDA library, a virtual GPU in the guest OS (which acts as a client), and
the vCUDA stub in the host OS (which acts as a server). The vCUDA library resides in the
guest OS as a substitute for the standard CUDA library. It is responsible for intercepting and
redirecting API calls from the client to the stub. vCUDA also creates vGPUs and manages
them.
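
The intercept-and-redirect flow can be sketched as follows. This is a hypothetical pseudodesign, not the actual vCUDA code: the guest-side substitute library packs each API call into a message, and the host-side stub unpacks it and invokes the real CUDA library.

```python
# Hedged sketch of vCUDA-style API redirection (hypothetical design only).
# The guest-side library intercepts CUDA calls; the host-side stub, which
# owns the real GPU, executes them and returns the result.
import json

class GuestCudaLibrary:
    """Stand-in for the vCUDA library loaded in the guest OS (the client)."""
    def __init__(self, transport) -> None:
        self.transport = transport        # channel to the host-side stub

    def call(self, api_name: str, *args):
        # Intercept the API call and redirect it to the server (stub).
        request = json.dumps({"api": api_name, "args": list(args)})
        return self.transport(request)

def host_stub(request: str):
    """Stand-in for the vCUDA stub in the host OS (the server)."""
    msg = json.loads(request)
    backend = {"cudaMalloc": lambda size: f"devptr<{size} bytes>"}  # fake GPU
    return backend[msg["api"]](*msg["args"])

lib = GuestCudaLibrary(transport=host_stub)
print(lib.call("cudaMalloc", 1024))   # guest call executed host-side
```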

The Cluster-on-Demand (COD) Project at Duke University

The COD partitions a physical cluster into multiple virtual clusters (vClusters). The COD
servers are backed by a configuration database. This system provides resource policies and
template definitions in response to user requests. The vClusters run a batch schedule.

Live VM migration:
Steps 0 and 1: Start migration. This step makes preparations for the migration, including
determining the migrating VM and the destination host. Although users could manually make
a VM migrate to an appointed host, in most circumstances, the migration is automatically
started by strategies such as load balancing and server consolidation.

Step 2: Transfer memory. Since the whole execution state of the VM is stored in memory,
sending the VM’s memory to the destination node ensures continuity of the service provided
by the VM. All of the memory data is transferred in the first round, and then the migration
controller recopies the memory data which was changed in the last round. These steps keep
iterating until the dirty portion of the memory is small enough to handle the final copy.
Although pre-copying is performed iteratively, the execution of programs is not noticeably
interrupted.

Step 3: Suspend the VM and copy the last portion of the data. The migrating VM’s
execution is suspended when the last round’s memory data is transferred. Other nonmemory
data such as CPU and network states should be sent as well. During this step, the VM is stopped
and its applications will no longer run. This “service unavailable” time is called the “downtime”
of migration, which should be as short as possible so that it can be negligible to users.

Steps 4 and 5: Commit and activate the new host. After all the needed data is copied, on the
destination host, the VM reloads the states and recovers the execution of programs in it, and
the service provided by this VM continues. Then the network connection is redirected to the
new VM and the dependency to the source host is cleared. The whole migration process finishes
by removing the original VM from the source host.
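
The iterative pre-copy of Step 2 can be sketched as a loop. The dirty-page tracking, transfer function, and threshold below are simplified placeholders; real hypervisors track dirtied pages in hardware and ship actual page contents.

```python
# Hedged sketch of iterative pre-copy migration (Step 2), simplified.
# get_dirty_pages() and send_pages() stand in for hypervisor machinery.
import random

def get_dirty_pages(just_copied: set[int]) -> set[int]:
    # Pretend ~20% of the just-copied pages were written again meanwhile.
    return {p for p in just_copied if random.random() < 0.2}

def send_pages(pages: set[int]) -> None:
    print(f"transferred {len(pages)} pages")

def pre_copy(all_pages: set[int], threshold: int = 8) -> set[int]:
    """Iterate until the dirty set is small enough for the final copy."""
    to_send = set(all_pages)                 # round 1: all of memory
    while len(to_send) > threshold:
        send_pages(to_send)
        to_send = get_dirty_pages(to_send)   # recopy pages dirtied last round
    return to_send                           # copied during downtime (Step 3)

remaining = pre_copy(set(range(1000)))
print(f"suspend VM, copy last {len(remaining)} pages, then commit (Steps 4-5)")
```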
Xen Architecture

Xen is an open-source hypervisor program developed at Cambridge University. Xen is a
microkernel hypervisor, which separates the policy from the mechanism. The Xen hypervisor
implements all the mechanisms, leaving the policy to be handled by Domain 0; it just provides
a mechanism by which a guest OS can have direct access to the physical devices. The core
components of a Xen system are the hypervisor, kernel, and applications. The guest OS, which
has control ability, is called Domain 0, and the others are called Domain U. Domain 0 is a
privileged guest OS of Xen. It is first loaded when Xen boots, without any file system drivers
being available. Domain 0 is designed to access hardware directly and manage devices. One of
the responsibilities of Domain 0 is to allocate and map hardware resources for the guest
domains (the Domain U domains).

Physical versus Virtual Clusters


Virtual clusters are built with VMs installed at distributed servers from one or more physical
clusters. The VMs in a virtual cluster are interconnected logically by a virtual network across
several physical networks.

Virtual clusters have the following properties:

• The virtual cluster nodes can be either physical or virtual machines. Multiple VMs running
with different OSes can be deployed on the same physical node.
• A VM runs with a guest OS, which is often different from the host OS that manages the
resources of the physical machine on which the VM is implemented.
• The purpose of using VMs is to consolidate multiple functionalities on the same server. This
will greatly enhance server utilization and application flexibility.
• VMs can be colonized (replicated) in multiple servers for the purpose of promoting
distributed parallelism, fault tolerance, and disaster recovery.
• The size (number of nodes) of a virtual cluster can grow or shrink dynamically.
• The failure of any physical node may disable some VMs installed on the failing node, but
the failure of VMs will not pull down the host system.

Advantages of OS-level virtualization:

Compared to hardware-level virtualization, the benefits of OS extensions are twofold:

 VMs at the operating system level have minimal startup/shutdown costs, low resource
requirements, and high scalability.
 For an OS-level VM, it is possible for the VM and its host environment to synchronize
state changes when necessary.
These benefits can be achieved via two mechanisms of OS-level virtualization:
 All OS-level VMs on the same physical machine share a single operating system
kernel.
 The virtualization layer can be designed in a way that allows processes in VMs to
access as many resources of the host machine as possible, but never to modify them.

Disadvantages of OS-level virtualization:

 All the VMs at the operating system level on a single container must have the same kind
of guest operating system.
 The access requests from a VM need to be redirected to the VM’s local resource
partition on the physical machine.
 There are two ways to implement virtual root directories: duplicating common resources
to each VM partition, or sharing most resources with the host environment and creating
private resource copies on the VM on demand (sketched below).
 The first way incurs significant resource costs and overhead on a physical machine.
 This issue neutralizes the benefits of OS-level virtualization, compared with hardware-
assisted virtualization.
 Therefore, OS-level virtualization is often a second choice.
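
A hedged sketch of the second virtual-root strategy above: share the host's files, and privatize a copy only on first write. The paths and policy are illustrative assumptions, not any particular container system (real implementations do this inside the kernel).

```python
# Hedged sketch: on-demand private copies for a VM's virtual root directory.
# HOST_ROOT / VM_ROOT paths are illustrative placeholders.
import shutil
from pathlib import Path

HOST_ROOT = Path("/")                  # shared host environment
VM_ROOT = Path("/tmp/vm1-root")        # this VM's private resource partition

def resolve(path: str, for_write: bool) -> Path:
    """Redirect a VM file access; create a private copy on first write."""
    rel = path.lstrip("/")
    private = VM_ROOT / rel
    if private.exists():
        return private                         # already privatized
    if not for_write:
        return HOST_ROOT / rel                 # share the host's copy
    private.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(HOST_ROOT / rel, private)      # copy on first write
    return private

print(resolve("etc/hostname", for_write=False))   # -> /etc/hostname (shared)
```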
MODULE 4

Public, Private and hybrid clouds


Public cloud
A public cloud is built over the Internet and can be accessed by any user who has paid for the
service. Public clouds are owned by service providers and are accessible through a subscription.
Many public clouds are available, including Google App Engine (GAE), Amazon Web Services
(AWS), Microsoft Azure, IBM Blue Cloud, and Salesforce.com’s Force.com. The providers of
the aforementioned clouds are commercial providers that offer a publicly accessible remote
interface for creating and managing VM instances within their proprietary infrastructure.


Private Clouds:
A private cloud is built within the domain of an intranet owned by a single organization.
Therefore, it is client owned and managed, and its access is limited to the owning clients and
their partners. A private cloud is supposed to deliver more efficient and convenient cloud
services. It may impact the cloud standardization, while retaining greater customization and
organizational control.

Hybrid Clouds:

A hybrid cloud is built with both public and private clouds. A hybrid cloud provides access to
clients, the partner network, and third parties. In summary, public clouds promote
standardization and offer application flexibility. Private clouds attempt to achieve
customization and offer higher efficiency, resiliency, security, and privacy. Hybrid clouds
operate in the middle, with many compromises in terms of resource sharing.

Service models:

Cloud computing delivers infrastructure, platform, and software (application) as services,
which are made available as subscription-based services in a pay-as-you-go model to
consumers.

Infrastructure as a Service:
This model allows users to use virtualized IT resources for computing, storage, and networking.
In short, the service is performed by rented cloud infrastructure. The user can deploy and run
his applications over his chosen OS environment. The user does not manage or control the
underlying cloud infrastructure, but has control over the OS, storage, deployed applications,
and possibly select networking components. This IaaS model encompasses storage as a service,
compute instances as a service, and communication as a service.

Platform-as-a-service

To be able to develop, deploy, and manage the execution of applications using provisioned
resources demands a cloud platform with the proper software environment. Such a platform
includes operating system and runtime library support. The platform cloud is an integrated
computer system consisting of both hardware and software infrastructure. The user application
can be developed on this virtualized cloud platform using some programming languages and
software tools supported by the provider (e.g., Java, Python, .NET). The user does not manage
the underlying cloud infrastructure. The cloud provider supports user application development
and testing on a well-defined service platform.

Software as a Service (SaaS)

This refers to browser-initiated application software delivered to thousands of cloud customers.
Services and tools offered by PaaS are utilized in the construction of applications and management
of their deployment on resources offered by IaaS providers. The SaaS model provides software
applications as a service. As a result, on the customer side, there is no upfront investment in
servers or software licensing. On the provider side, costs are kept rather low, compared with
conventional hosting of user applications.

The best examples of SaaS services include Google Workspace, Microsoft SharePoint, and the
CRM software from Salesforce.com.

Amazon Web Services (AWS)

Amazon has been a leader in providing public cloud services (http://aws.amazon.com/).
Amazon applies the IaaS model in providing its services. EC2 provides the virtualized
platforms to host the VMs where cloud applications can run. S3 (Simple Storage Service)
provides the object-oriented storage service for users. EBS (Elastic Block Service) provides
the block storage interface which can be used to support traditional applications. SQS stands
for Simple Queue Service, and its job is to ensure a reliable message service between two
processes.
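
As an illustration of these services, the snippet below touches S3, SQS, and EC2 through the boto3 SDK. Configured AWS credentials are assumed, and the bucket name, queue name, and AMI ID are placeholders, not real resources.

```python
# Hedged sketch: S3, SQS, and EC2 via boto3. Resource names and the AMI ID
# are placeholders; valid AWS credentials are assumed to be configured.
import boto3

s3 = boto3.client("s3")                               # object storage (S3)
s3.put_object(Bucket="my-example-bucket", Key="notes.txt",
              Body=b"object-oriented storage")

sqs = boto3.client("sqs")                             # reliable messaging (SQS)
queue = sqs.create_queue(QueueName="my-example-queue")
sqs.send_message(QueueUrl=queue["QueueUrl"],
                 MessageBody="message between two processes")

ec2 = boto3.client("ec2")                             # virtualized platform (EC2)
ec2.run_instances(ImageId="ami-00000000", InstanceType="t2.micro",
                  MinCount=1, MaxCount=1)
```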
Cloud Design Objectives:
 Shifting computing from desktops to data centers: Computer processing, storage,
and software delivery is shifted away from desktops and local servers and toward data-
centers over the Internet.
 Service provisioning and cloud economics: Providers supply cloud services by
signing SLAs with consumers and end users. The services must be efficient in terms of
computing, storage, and power consumption. Pricing is based on a pay-as-you-go
policy.
 Scalability in performance: The cloud platforms and software and infrastructure
services must be able to scale in performance as the number of users increases.
 Data privacy protection: Can you trust data-centers to handle your private data and
records? This concern must be addressed to make clouds successful as trusted services.
 High quality of cloud services: The QoS of cloud computing must be standardized to
make clouds interoperable among multiple providers.
 New standards and interfaces: This refers to solving the data lock-in problem
associated with data-centers or cloud providers. Universally accepted APIs and access
protocols are needed to provide high portability and flexibility of virtualized
applications.

Data-Center management issues


• Making common users happy The data center should be designed to provide quality service
to the majority of users for at least 30 years.

• Controlled information flow Information flow should be streamlined. Sustained services
and high availability (HA) are the primary goals.
• Multiuser manageability The system must be managed to support all functions of a data
center, including traffic flow, database updating, and server maintenance.
• Scalability to prepare for database growth The system should allow growth as workload
increases. The storage, processing, I/O, power, and cooling subsystems should be scalable.

• Reliability in virtualized infrastructure Failover, fault tolerance, and VM live migration
should be integrated to enable recovery of critical applications from failures or disasters.

• Low cost to both users and providers The cost to users and providers of the cloud system
built over the data centers should be reduced, including all operational costs.

• Security enforcement and data protection Data privacy and security defense mechanisms
must be deployed to protect the data center against network attacks and system interrupts and
to maintain data integrity from user abuses or network attacks.

• Green information technology Saving power consumption and upgrading energy efficiency
are in high demand when designing and operating current and future data centers.
MODULE 5

Simple REST interaction between user and server in the HTTP specification

 REST is a software architecture style for distributed systems, particularly distributed
hypermedia systems, such as the World Wide Web.
 It has recently gained popularity among enterprises such as Google, Amazon, Yahoo!.
 Resource Identification through URIs: The RESTful web service exposes a set of
resources which identify targets of interaction with its clients.
 The key abstraction of information in REST is a resource.
 Any information that can be named can be a resource, such as a document or image
or a temporal service.
 A resource is a conceptual mapping to a set of entities.
 Each particular resource is identified by a unique name, or more precisely, a
Uniform Resource Identifier (URI) which is of type URL, providing a global
addressing space for resources.
 Uniform, Constrained Interface: Interaction with RESTful web services is done via
the HTTP standard, a client/server cacheable protocol.
 Resources are manipulated using a fixed set of four CRUD (create, read, update,
delete) verbs or operations: PUT, GET, POST, and DELETE (see the sketch after this list).
 Self-Descriptive Message: A REST message includes enough information to describe
how to process the message.
 This enables intermediaries to do more with the message without parsing the
message contents.
 Stateless Interactions: The REST interactions are stateless.
 Stateless interaction treats each message independently.
 Stateless communications improve visibility.
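
The four verbs map directly onto HTTP calls. A minimal sketch using the requests library, against a hypothetical resource URI:

```python
# Hedged sketch: the four CRUD verbs on one resource via the requests
# library. The URI is a hypothetical placeholder.
import requests

RESOURCE = "http://example.com/api/notes/42"     # URI identifying the resource

requests.put(RESOURCE, json={"text": "v1"})      # PUT    -> create
note = requests.get(RESOURCE).json()             # GET    -> read
requests.post(RESOURCE, json={"text": "v2"})     # POST   -> update
requests.delete(RESOURCE)                        # DELETE -> delete

# Stateless interaction: each request is self-contained; the server keeps
# no session state between these calls.
```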
OGSA architecture

OGSA is intended to:

• Facilitate use and management of resources across distributed, heterogeneous environments
• Deliver seamless QoS
• Define open, published interfaces in order to provide interoperability of diverse resources
• Exploit industry-standard integration technologies
• Develop standards that achieve interoperability
• Integrate, virtualize, and manage services and resources in a distributed, heterogeneous environment
• Deliver functionality as loosely coupled, interacting services aligned with industry-accepted web service standards

 Based on OGSA, a grid is built from a small number of standards-based components,
called grid services.
 A grid service implements one or more interfaces, where each interface defines a set of
operations that are invoked by exchanging a defined sequence of messages, based on
the Open Grid Services Infrastructure (OGSI).
 OGSA services fall into seven broad areas, defined in terms of capabilities frequently
required in a grid scenario. These services are summarized as follows:
 Infrastructure Services Refer to a set of common functionalities.
 Execution Management Services Concerned with issues such as starting and
managing tasks.
 Data Management Services Provide functionality to move data to where it is needed.
 Resource Management Services Provide management capabilities for grid resources:
management of the resources themselves.
 Security Services Facilitate the enforcement of security-related policies within a
(virtual) organization.
 Information Services Provide efficient production of, and access to, information about
the grid.
 Self-Management Services Support service-level attainment for a set of services.

Three-tier architecture

Enterprise applications often use multitier architecture to encapsulate and integrate various
functionalities. Multitier architecture is a kind of client/server architecture in which the
presentation, the application processing, and the data management are logically separate
processes. The traditional two-tier, client/server model requires clustering and disaster recovery
to ensure resiliency. While the use of fewer nodes in an enterprise simplifies manageability,
change management is difficult as it requires servers to be taken offline for repair, upgrading,
and new application deployments. The business logic and data can be shared by both automated
and GUI clients. The only differences are the nature of the client and the presentation layer of
the middle tier. Separating business logic from data access enables database independence.

• Presentation layer: Presents information to external entities and allows them to interact with
the system by submitting operations and getting responses.

• Business/application logic layer or middleware: Programs that implement the actual
operations requested by the client through the presentation layer. The middle tier can also
control user authentication and access to resources, as well as performing some of the query
processing for the client, thus removing some of the load from the database servers.

• Resource management layer: Also known as the data layer, deals with and implements the
different data sources of an information system.
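
A minimal sketch of the three tiers as separable components; the in-memory dict standing in for a database is an assumption for illustration.

```python
# Hedged sketch: presentation, business logic, and resource management as
# separate layers. The dict is a stand-in for a real data source.

class ResourceLayer:                      # data layer
    def __init__(self) -> None:
        self._db = {"42": "hello"}        # stand-in database
    def read(self, key: str) -> str:
        return self._db[key]

class BusinessLayer:                      # application logic / middleware
    def __init__(self, data: ResourceLayer) -> None:
        self.data = data
    def greeting(self, user: str, key: str) -> str:
        # authentication, access control, and query processing live here
        return f"{self.data.read(key)}, {user}"

class PresentationLayer:                  # interface to external entities
    def __init__(self, logic: BusinessLayer) -> None:
        self.logic = logic
    def handle_request(self, user: str, key: str) -> str:
        return self.logic.greeting(user, key).upper()

app = PresentationLayer(BusinessLayer(ResourceLayer()))
print(app.handle_request("alice", "42"))  # HELLO, ALICE
```

Because only the presentation layer differs between GUI and automated clients, the same business and data tiers serve both.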

Data organization in UDDI registry / Simple web service interaction among provider, user, and UDDI registry

UDDI specifications define a way to describe, publish, and discover information about web
services by creating a platform-independent, open framework. The UDDI specification is
focused on the definition of a collection of services supporting the description and discovery
of: Businesses, organizations, and other web services providers, the web services they make
available, and the technical interfaces which may be used to access those services. There are
two primary types of registries. A public registry is a logically centralized distributed service
that replicates data with other public registries on a regular basis. A private registry is only
accessible within a single organization or is shared by a group of business partners for a special
purpose. Data in a UDDI registry is organized as instance types:

• businessEntity Describes an organization or a business that provides the web services,
including the company name, contact information, industry/product/geographic classification,
and so on

• businessService Describes a collection of related instances of web services offered by an
organization, such as the name of the service, a description, and so forth

• bindingTemplate Describes the technical information necessary to use a particular web
service, such as the URL address to access the web service instance and references to its
description

• tModel A generic container for specification of WSDL documents in general web services

• publisherAssertion Defines a relationship between two or more businessEntity elements

• subscription A standing request to keep track of changes to the entities in the subscription
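
The containment among the first three types can be pictured as nested records. The sketch below is an illustrative data shape with invented values, not the actual UDDI XML schema.

```python
# Hedged sketch: containment among UDDI instance types as nested records.
# Values are invented; real UDDI entries are XML documents keyed by UUIDs.
business_entity = {
    "name": "Example Corp",                          # businessEntity
    "contact": "info@example.com",
    "businessServices": [{
        "name": "Quote Service",                     # businessService
        "description": "returns price quotes",
        "bindingTemplates": [{
            "accessPoint": "http://example.com/quote",      # bindingTemplate
            "tModelRef": "uddi:example:quote-wsdl",  # points at the interface spec
        }],
    }],
}
print(business_entity["businessServices"][0]
      ["bindingTemplates"][0]["accessPoint"])
```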

Basic types of API operations are applied to UDDI components:

• UDDI Inquiry API Used to find the set of registry entries (business, service, binding, or
tModel details) matching a particular search criterion (find_ operations), or to retrieve the
details of an entry corresponding to a given UDDI key (get_ operations).

• UDDI Publishers API Enables adding, modifying, and deleting entries by providing save_
and delete_ operations. In addition to these look-up and publishing APIs, UDDI also defines
general-purpose operation types, covered by the four specialized APIs that follow.

• UDDI Security API Allows users to get and discard authentication tokens (get_authToken,
discard_authToken)

• UDDI Custody and Ownership Transfer API Enables registries to transfer the custody of
information among themselves and to transfer ownership of these structures to one another
(transfer_entities, transfer_custody)

• UDDI Subscription API Enables monitoring of changes in a registry by subscribing to track
new, modified, and deleted entries (delete_subscription, get_subscriptionResults,
get_subscriptions, save_subscriptions)

• UDDI Replication API Supports replication of information between registries so that
different registries can be kept synchronized
