CC Unit4 Final
Cloud Programming and Software Environments: Features of Cloud and Grid Platforms, Parallel & Distributed Programming Paradigms, Programming Support of Google App Engine, Programming on Amazon AWS and Microsoft Azure, Emerging Cloud Software Environments.
Commercial clouds provide a large number of capabilities, offering cost-effective utility computing with flexibility. Collectively, these extra capabilities are termed "Platform as a Service" (PaaS). The current platform features of Azure include tables, queues, web and worker roles, SQL database, and blobs. Although Amazon is usually viewed as offering just "Infrastructure as a Service" (IaaS), its platform features include SimpleDB, queues, notifications, monitoring, a content delivery network, a relational database, and MapReduce. The capabilities of cloud platforms are listed below.
Physical or virtual computing platform: The cloud environment incorporates both physical and virtual platforms. Virtual platforms have the unique capability of isolating environments for different applications and users.
Massive data storage service, distributed file system: Cloud data storage services provide large disk capacity and a service interface that allows users to put and get data. The distributed file system offers massive data storage and can present an interface similar to a local file system.
Massive database storage service: Clouds need a service similar to a DBMS so that developers can store data in a semantic way.
Massive data processing method and programming model: The cloud infrastructure provides many computing nodes even for simple applications, so programmers must handle issues such as network failure or scaling of running code in order to use all the services provided by the platform.
Workflow and data query language support: The programming model offers an abstraction of the cloud infrastructure. Providers have built workflow languages and data query languages to better support application logic.
Programming interface and services deployment: Cloud applications need web interfaces or special APIs such as J2EE, PHP, ASP, or Rails. They can make use of Ajax technologies to improve the user experience when web browsers are used for function access.
Runtime support: Runtime support is transparent to users as well as applications. It incorporates distributed monitoring services, a distributed task scheduler, distributed locking, and so on.
Support services: Important support services include data and computing services.
Infrastructure cloud features can be summarized in a similar fashion. Traditional features found in cluster, grid, and parallel computing environments are given below:
Cluster management: Clusters are built using the tools and packages provided by ROCKS.
Data management: Metadata support such as RDF triple stores is provided, in addition to SQL and NoSQL databases.
Portals: Also termed gateways; the technology has evolved from portlets to HUBzero.
Virtual organizations: These range from specialized grid solutions to popular Web 2.0 capabilities such as Facebook.
Grid programming environments: These range from linking services together, as in the Open Grid Services Architecture, to GridRPC and SAGA.
OpenMP/Threading: Includes parallel compilers such as Cilk and roughly equivalent shared-memory technologies, in addition to transactional memory and fine-grained data flow.
The platform features supported by clouds, and common to both grids and clouds, are as follows:
1. Workflows: Workflow projects in the U.S. and Europe include Pegasus, Taverna, and Kepler, alongside commercial systems such as Pipeline Pilot, AVS, and LIMS environments. Trident from Microsoft is a recent system built on top of Windows Workflow Foundation; since Trident itself works on Azure or Windows, it runs workflow proxy services on external environments.
2. Data Transport: The cost and performance of data transfer is a major issue in commercial clouds. If commercial clouds become an important component of the national cyberinfrastructure, we can expect high-bandwidth links between the clouds and facilities such as TeraGrid. Cloud data can be structured into tables and blocks to enable high-performance parallel algorithms, in addition to HTTP mechanisms for transferring data between academic systems/TeraGrid and commercial clouds.
3. Security, Privacy and Availability: The following techniques related to security, privacy, and availability are used in developing a healthy and dependable cloud programming environment (an HTTPS access sketch follows this list):
- Using virtual clustering to achieve dynamic resource provisioning with minimum overhead cost.
- Using special APIs for authenticating users and sending e-mail via commercial accounts.
- Accessing cloud resources using security protocols such as HTTPS and SSL.
- Using stable and persistent data storage with fast queries for data access.
- Including features for improving availability and disaster recovery with live migration of VMs.
- Using fine-grained access control to protect data integrity and deter intruders and hackers.
- Protecting shared data from malicious alteration, deletion, and copyright violations.
- Using reputation systems to protect data centers; such systems authorize only trusted clients and stop pirates.
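As a minimal illustration of the HTTPS/SSL point above, the sketch below calls a hypothetical cloud REST endpoint over TLS with certificate verification enabled; the URL and token are placeholders, and the third-party requests library is assumed to be installed.

import requests  # third-party HTTP library

ENDPOINT = "https://storage.example-cloud.com/v1/objects/report.csv"  # hypothetical service
TOKEN = "REPLACE_WITH_ACCESS_TOKEN"

# verify=True (the default) enforces TLS certificate validation,
# so the request travels over HTTPS/SSL as recommended above.
response = requests.get(
    ENDPOINT,
    headers={"Authorization": "Bearer " + TOKEN},
    timeout=10,
    verify=True,
)
response.raise_for_status()
print(len(response.content), "bytes downloaded securely")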
4. Program Library: Attempts have been made to develop a VM image library to manage the images used in academic and commercial clouds.
5. Blobs and Drives: The basic storage concept in clouds is blobs for Azure and S3 for Amazon, organized into containers (Azure) and buckets (S3). Users can also attach storage directly to compute instances, as with Azure drives and Amazon's Elastic Block Store. Cloud storage is intrinsically fault tolerant, whereas storage on TeraGrid requires backup storage.
6. DPFS: This covers support for file systems such as the Google File System (used with MapReduce), HDFS (Hadoop), and Cosmos (Dryad), whose compute-data affinity is optimized for data processing. DPFS could be linked with a blob- and drive-based architecture, but it seems best to use DPFS as an application-centric storage model with optimized compute-data affinity, and to use blobs and drives as the repository-centric view. DPFS file systems are developed to execute data-intensive applications efficiently.
7. Table and NoSQL Non-relational Databases: A large number of developments have taken place around a simplified database structure termed "NoSQL", which emphasizes distribution and scalability. It is used in the clouds as BigTable in Google, SimpleDB in Amazon, and Azure Table in Azure. Non-relational databases have also been used in several instances as triple stores built on MapReduce and tables (or the Hadoop File System) with good success.
Cloud tables such as Azure Table and Amazon SimpleDB support lightweight storage for document stores. They are schema-free and are likely to gain importance in scientific computing.
8. Queuing Services: Both Amazon and Azure provide robust, scalable queuing services that allow the components of an application to communicate with each other. The messages are short and are accessed through a REST (Representational State Transfer) interface with at-least-once delivery semantics. The queues are controlled by timeouts that bound the amount of processing time allotted to a client.
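A hedged sketch of this queuing pattern using Amazon SQS through the boto3 Python SDK; the queue name is a placeholder and AWS credentials are assumed to be configured in the environment.

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Create (or look up) a queue; the name is a placeholder.
queue_url = sqs.create_queue(QueueName="demo-task-queue")["QueueUrl"]

# A producer component posts a short message.
sqs.send_message(QueueUrl=queue_url, MessageBody="process item 42")

# A consumer receives it; VisibilityTimeout bounds the processing time granted
# to this client before the message becomes visible again, which is the
# timeout-controlled, at-least-once behavior described above.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                           VisibilityTimeout=30)
for msg in resp.get("Messages", []):
    print("working on:", msg["Body"])
    # Deleting the message acknowledges successful processing.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])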
The following features provide programming and runtime support for parallel programming and for the major functions used in grids and clouds.
1. Worker and Web Roles: Azure provides roles to support nontrivial functionality while preserving better platform affinity than a fully virtualized environment. Roles are schedulable processes that can be launched automatically. Queues are an important concept here, since they offer a natural method for assigning tasks in a fault-tolerant, distributed fashion. Web roles provide an important method for building the portal.
2. MapReduce: There is great interest in data-parallel languages for the loosely coupled computations that execute over different data samples. Such languages and their runtimes provide efficient execution of many grid applications. MapReduce is more advantageous than traditional implementations for these task problems because it supports dynamic execution, strong fault tolerance, and an easy-to-use high-level interface. Hadoop and Dryad are the main MapReduce implementations and can be executed with or without VMs; Hadoop is provided by Amazon, and Dryad was to be made available on Azure.
3. Cloud Programming Models: The GAE and Manjrasoft Aneka environments represent two important programming models applied to clouds, although these models are not specific to this architecture. Iterative MapReduce is another interesting programming model that offers portability between cloud, HPC, and cluster environments.
4. SaaS: Services are used in a similar fashion in commercial clouds and in recent distributed systems. Users can package their programs as services, so SaaS can be enabled without any additional support. A SaaS environment is expected to provide many useful tools for developing cloud applications over large data sets. In addition, SaaS offers various protection features for achieving scalability, security, privacy, and availability.
Distributed and parallel programs are considered here as parallel programs running on a set of computing engines, that is, on a distributed computing system. Distributed computing denotes computational engines interconnected by a network and intended to run a job or application, while parallel computing denotes the use of one or more computational engines to run a job or application. Parallel programs can be run on distributed computing systems, but doing so raises certain issues, described below:
i) Computation Partitioning: The given program or job is divided into multiple tasks based on identifying the portions that can be executed concurrently. Different parts of the program may process different data or copies of the same data.
ii) Data Partitioning: The input or intermediate data is divided into multiple partitions that can be processed by different workers; a copy of the program, or different parts of it, processes each piece of data (a minimal partitioning sketch follows this list).
iii) Mapping: The process of assigning parts of the program or pieces of the data to the appropriate resources is called mapping, and it is handled by the system's resource allocator.
iv) Scheduling: A scheduler picks a set of jobs or programs and runs them on the distributed computing system. Scheduling is required when the resources are not sufficient to run all jobs or programs simultaneously, and it is carried out according to a scheduling policy.
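A minimal sketch of data partitioning, assuming the input is an in-memory list to be split into M contiguous partitions for M workers; real systems partition files or storage blocks rather than Python lists.

def partition(data, m):
    # Split data into m roughly equal, contiguous partitions.
    size = (len(data) + m - 1) // m  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(m)]

# Example: 10 input records split into M = 3 partitions.
print(partition(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]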
These programming models provide an abstraction layer that hides from users the implementation details of the data flow, which they would otherwise have to code themselves. An important metric for parallel and distributed programming paradigms is therefore how simply parallel programs can be coded. In the MapReduce paradigm, the user expresses the computation through two functions, Map and Reduce, and the data flow of the program is manipulated through these functions. The figure below illustrates the flow of data from Map to Reduce.
In this figure, the abstraction layer hides data flow steps such as partitioning, mapping, synchronization, communication, and scheduling from the user. The Map and Reduce functions can be overridden by the user to achieve specific goals, and they are passed the required parameters, such as spec and results. The structure of a user program containing the Map and Reduce subroutines is illustrated below:
Map Function(....){..........}
Reduce Function(....){..........}
Main Function(....)
{
Initialize spec object
...
...
MapReduce(spec, &results)
}
The input to the Map function is a (key, value) pair, where the key indicates the line offset within the input file and the value is the content of that line. The output of the Map function is also a (key, value) pair, called an intermediate pair. The Reduce function receives the intermediate (key, value) pairs as (key, [set of values]), produced by sorting and grouping the pairs that share the same key; it processes them and generates a group of (key, value) pairs as output. The formal notation of the Map and Reduce functions is:
Map (key1, value1) -> List(key2, value2)
Reduce (key2, List(value2)) -> List(value2)
That is, the various occurrences of the same intermediate key are gathered together and the Reduce function is applied to them to produce another list.
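A minimal, self-contained sketch of these two functions, assuming a word-count application and simulating the framework's sort/group step in plain Python (no real MapReduce runtime is involved).

from collections import defaultdict
from typing import Iterator, List, Tuple

def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    # key = line offset, value = line content; emit one (word, 1) pair per word.
    for word in value.split():
        yield (word, 1)

def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Receive (key, [set of values]) and produce an output (key, value) pair.
    return (key, sum(values))

lines = ["the cloud runs the job", "the grid runs too"]

# Framework stand-in: run Map, then sort and group intermediate pairs by key.
groups = defaultdict(list)
for offset, line in enumerate(lines):
    for k, v in map_fn(offset, line):
        groups[k].append(v)

output = [reduce_fn(k, vs) for k, vs in sorted(groups.items())]
print(output)  # [('cloud', 1), ('grid', 1), ('job', 1), ('runs', 2), ('the', 3), ('too', 1)]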
The MapReduce framework (library) is responsible for running the program efficiently on a distributed computing system. This process is detailed as follows:
1. Data Partitioning: The MapReduce library retrieves the input data from GFS and splits it into M pieces; these partitions correspond to the number of map tasks.
3. Determining the Master and Workers: The MapReduce architecture is based on a master-worker model. One copy of the user program becomes the master and the remaining copies become workers. The master is responsible for assigning map and reduce tasks to idle workers, and each worker runs its map/reduce task by executing the Map/Reduce function.
4. Reading the Input Data (Data Distribution): Each map worker reads its respective input partition, divides it into (key, value) pairs, and passes them to the Map function.
5. Map Function: The Map function receives the input data as (key, value) pairs, processes them, and produces intermediate (key, value) pairs.
6. Combiner Function: This function is applied to the intermediate (key, value) pairs and is invoked inside the user program. It merges the local data of each map worker before it is sent over the network, which decreases the communication cost.
7. Partitioning Function: The intermediate (key, value) pairs are partitioned using a partitioning function, typically a hash function, so that all pairs with the same key are stored in the same region (a small sketch appears after the figure below). The locations of this buffered data are later sent to the master, which in turn forwards them to the reduce workers.
9. Communication: A reduce worker uses remote procedure calls to read the data from the map workers. Because an all-to-all communication takes place between the map and reduce workers, network congestion can arise; for this reason a data transfer module is used to schedule the data transfers.
10. Sorting and Grouping: When a reduce worker has read its input data, it groups the intermediate (key, value) pairs by sorting the data according to the keys. All occurrences of the same key are grouped together, so that unique keys, each with its list of values, are produced.
Figure: Linking Map and Reduce workers through MapReduce Partitioning Functions.
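A hedged sketch of the hash partitioning of step 7 and the sorting/grouping of step 10, assuming R reduce workers and small in-memory Python lists rather than a real runtime; Python's built-in hash is used only for illustration (real frameworks use a stable partitioning hash).

from collections import defaultdict

R = 3  # assumed number of reduce workers
intermediate = [("the", 1), ("cloud", 1), ("the", 1), ("grid", 1)]

# Step 7: hash partitioning - pairs with the same key always land in the same region.
regions = defaultdict(list)
for key, value in intermediate:
    regions[hash(key) % R].append((key, value))

# Step 10: each reduce worker sorts its region and groups values by key.
for r, pairs in regions.items():
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    print("reduce worker", r, dict(grouped))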
The performance of any runtime needs to be evaluated, and MPI and MapReduce are frequently compared. Communication and load imbalance are the important sources of parallel overhead. The communication overhead can be high in MapReduce for the following reasons:
- MapReduce reads and writes via files, whereas MPI transfers data directly between nodes over the network.
- MPI does not transfer the complete data set; it transfers only the data required for an update. The MPI flow is therefore called a delta flow, while the MapReduce flow is called a full data flow.
This phenomenon can be observed in all classic, loosely synchronous parallel applications, which exhibit an iterative structure of alternating compute phases and communication phases. The performance issues can be addressed with the following changes:
- Transfer data between steps directly, without writing the intermediate results to disk.
- Use long-running threads or processes to communicate the flow between iterations.
These changes improve performance at the cost of weaker fault tolerance and weaker support for dynamic changes such as the set of available nodes. The figure below depicts the Twister programming paradigm and its runtime architecture. Twister distinguishes static data, which is never reloaded, from the dynamic flow that is communicated; the Map and Reduce pair is executed iteratively in long-running threads. The next figure compares the thread and process structures of parallel programming paradigms such as Hadoop, Dryad, Twister, and MPI.
Figure: Four Parallel Programming Paradigms for Thread and Process structure
Yahoo! Hadoop: uses short-running processes that communicate via disk and tracking processes.
Microsoft Dryad: uses short-running processes that communicate via pipes, disk, or shared memory between cores.
MPI: uses long-running processes with rendezvous-style message-exchange synchronization.
The MapReduce engine is the upper layer of Hadoop. It is responsible for managing the data flow and control flow of MapReduce jobs over distributed computing systems. The engine has a master/slave architecture with a single JobTracker (acting as the master) and several TaskTrackers (acting as slaves). The JobTracker manages a MapReduce job over the cluster and monitors and assigns jobs and tasks to the TaskTrackers. Each TaskTracker manages the execution of the map/reduce tasks on one computation node in the cluster.
Every TaskTracker is given a number of execution slots for running map or reduce tasks. A map task running in a slot processes one data block; there is a one-to-one correspondence between a map task and the data block of a DataNode.
The components required to run a job in this system are a user node, a JobTracker, and a set of TaskTrackers. The data flow starts when the user program calls the function runJob(conf). The conf parameter is an object that holds the configuration of the MapReduce framework and HDFS; the call is analogous to MapReduce(spec, &results) in the earlier pseudocode.
Figure: Data flow in running MapReduce job at various task trackers using Hadoop Library
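Because the Hadoop engine itself is written in Java, a common way to drive it from Python is Hadoop Streaming, in which the mapper and reducer read standard input and write standard output. The hedged word-count sketch below doubles as mapper and reducer; file names and the exact location of the streaming JAR vary by installation.

#!/usr/bin/env python3
# wordcount_streaming.py - usable as both a Hadoop Streaming mapper and reducer.
# Local test: cat input.txt | python3 wordcount_streaming.py map | sort \
#             | python3 wordcount_streaming.py reduce
import sys

def run_mapper():
    # Emit one "word<TAB>1" line per word read from standard input.
    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

def run_reducer():
    # Streaming sorts the mapper output, so equal keys arrive as contiguous runs.
    current, count = None, 0
    for line in sys.stdin:
        if not line.strip():
            continue
        word, value = line.rsplit("\t", 1)
        if word != current and current is not None:
            print(current + "\t" + str(count))
            count = 0
        current = word
        count += int(value)
    if current is not None:
        print(current + "\t" + str(count))

if __name__ == "__main__":
    run_mapper() if sys.argv[1] == "map" else run_reducer()

A cluster invocation would resemble the following (flags and the JAR path depend on the Hadoop version): hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "python3 wordcount_streaming.py map" -reducer "python3 wordcount_streaming.py reduce" -file wordcount_streaming.py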
HDFS:
HDFS is a distributed file system that organizes and stores data over a distributed computing system. Its architecture is master/slave, with a single NameNode acting as master and a number of DataNodes acting as slaves. Files are divided into fixed-size blocks by this file system and the blocks are stored on the DataNodes.
1. File Read: To perform a read, the user sends an "open" request to the NameNode to obtain the locations of the file's blocks. The response contains the addresses of the DataNodes on which replicas of the data are stored; the addresses depend on the placement of the block replicas. The client then connects to the nearest DataNode and streams the block; the connection is terminated after the block has been streamed. This process repeats until the whole file has been streamed to the user.
2. File Write: The user first sends a "create" request to the NameNode to create a new file, then writes data to it using a write function. An internal data queue holds each data block before it is written to a DataNode, while a data streamer monitors the queue; replicas of the data blocks are created on other DataNodes in parallel.
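A purely conceptual sketch of the read path just described; NameNodeStub and DataNodeStub are hypothetical stand-ins (not a real HDFS API) used only to mirror the open / locate-replicas / stream sequence.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class NameNodeStub:
    # Hypothetical NameNode: maps a file path to, per block, the DataNode addresses of its replicas.
    block_locations: Dict[str, List[List[str]]]

    def open(self, path: str) -> List[List[str]]:
        return self.block_locations[path]

@dataclass
class DataNodeStub:
    # Hypothetical DataNode: streams the bytes of one block.
    blocks: Dict[str, bytes]

    def stream_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]

def read_file(namenode, datanodes, path):
    data = b""
    for block_index, replica_addresses in enumerate(namenode.open(path)):
        nearest = replica_addresses[0]  # pretend the first listed replica is the nearest
        block_id = path + "#blk" + str(block_index)
        data += datanodes[nearest].stream_block(block_id)  # connect, stream, disconnect
    return data

# Tiny example: one file split into two blocks, each replicated on two DataNodes.
nn = NameNodeStub({"/logs/a.txt": [["dn1", "dn2"], ["dn2", "dn3"]]})
dns = {
    "dn1": DataNodeStub({"/logs/a.txt#blk0": b"hello "}),
    "dn2": DataNodeStub({"/logs/a.txt#blk0": b"hello ", "/logs/a.txt#blk1": b"hdfs"}),
    "dn3": DataNodeStub({"/logs/a.txt#blk1": b"hdfs"}),
}
print(read_file(nn, dns, "/logs/a.txt"))  # b'hello hdfs'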
4.3.1 Programming Google App Engine: The key features of the GAE programming model for languages such as Java and Python are illustrated in the following figure.
GAE applications can be debugged on the local machine using the client environment, which includes an Eclipse plug-in for Java. Java web application developers are provided with GWT (Google Web Toolkit), which can also be used with JavaScript or Ruby. Python is used with frameworks such as Django and CherryPy, but Google also provides its own webapp Python environment. Data is stored and accessed using various constructs of the NoSQL data store; entities are retrieved by queries that filter and sort on property values. For Java, the JDO (Java Data Objects) and JPA (Java Persistence API) interfaces are offered, implemented by the open source DataNucleus Access Platform; Python is provided with an SQL-like query language called GQL.
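A hedged sketch of this programming style using the legacy Python 2.7 GAE runtime (webapp2 plus the ndb datastore API); the Greeting model, handler, and route are illustrative names, not part of any real application.

import webapp2                          # Google's lightweight "webapp" framework
from google.appengine.ext import ndb    # NoSQL datastore API

class Greeting(ndb.Model):
    # Illustrative entity stored in the App Engine datastore.
    author = ndb.StringProperty()
    content = ndb.TextProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Query: filter/sort entities by property values, then render them.
        greetings = Greeting.query().order(-Greeting.date).fetch(10)
        self.response.headers["Content-Type"] = "text/plain"
        for g in greetings:
            self.response.write("%s: %s\n" % (g.author, g.content))

    def post(self):
        Greeting(author=self.request.get("author"),
                 content=self.request.get("content")).put()
        self.redirect("/")

# Routing table handed to the GAE runtime (referenced from app.yaml).
app = webapp2.WSGIApplication([("/", MainPage)])

The equivalent GQL query for the fetch above would be roughly SELECT * FROM Greeting ORDER BY date DESC LIMIT 10.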
Applications can execute multiple datastore operations in a single transaction, which either all succeed or all fail together. A GAE application can assign entities to entity groups for this purpose. Google later added the Blobstore feature for large files.
The Google SDC (Secure Data Connector) can tunnel through the Internet and connect an intranet to an external GAE application. The URL Fetch service enables applications to fetch resources and to interact with other hosts on the Internet using HTTP and HTTPS requests. It accesses web resources through the same high-speed Google infrastructure that retrieves web pages for many other Google products.
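A minimal sketch of the URL Fetch service in the same legacy Python runtime; the target URL is a placeholder.

from google.appengine.api import urlfetch

# Fetch an external resource over HTTPS from inside a GAE request handler.
result = urlfetch.fetch("https://www.example.com/data.json", deadline=10)
if result.status_code == 200:
    payload = result.content  # raw bytes of the response body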
GFS was designed as a storage service for Google's search engine; it was basically designed to store and process the huge amounts of data needed by Google.
The Google File System is a distributed file system developed to support Google applications, which typically operate on very large files of 100 MB and upward. GFS partitions a file into fixed-size segments called chunks, each 64 MB in size. It also ensures the reliability of the data by distributing replicated copies across multiple chunk servers.
GFS also allows multiple append operations to proceed concurrently. It uses a single master to coordinate access and keep the metadata, while the data itself is stored on the chunk servers. It provides an accessing interface similar to a POSIX file system, which allows applications to see the physical locations of file blocks, and it uses a customized API to support the record append operation.
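A small arithmetic sketch of fixed-size chunking, assuming the 64 MB chunk size mentioned above; real GFS chunk handles are opaque 64-bit identifiers, so the index computed here is only illustrative.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per GFS chunk

def chunk_index(byte_offset):
    # Which chunk of a file contains the given byte offset.
    return byte_offset // CHUNK_SIZE

file_size = 1_000_000_000  # a 1 GB file
num_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
print(num_chunks)                        # 15 chunks, each replicated on several chunk servers
print(chunk_index(200 * 1024 * 1024))    # byte offset 200 MB falls in chunk 3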
The architecture includes a single master that stores the metadata for the whole cluster; the other nodes act as chunk servers, each responsible for storing data. The master also manages the file system namespace and locking facilities, and it interacts with the chunk servers to collect management information from them and to instruct them to perform tasks such as load balancing or failure recovery.
A single master can manage the whole cluster, which avoids the use of complicated distributed algorithms in the GFS architecture design. Despite this, relying on a single master may affect the performance of the system.
To overcome this performance bottleneck, Google employs a shadow master. Data is replicated on the master so that data operations between the client and the chunk servers are performed directly, without any further interruption; besides this, a copy of the control messages transferred between the client and the chunk servers is maintained for further use. These facilities allow a single master to manage a cluster of about 1,000 nodes. The figure below illustrates data mutation operations such as write and append in GFS:
1. The client first asks the master which chunk server holds the current lease for the chunk and where the other replicas are located.
2. If no chunk server currently holds a lease, the master grants one to a chosen replica and replies to the client with the identity of the primary and the locations of the other (secondary) replicas.
3. The client caches this information for future mutations.
4. The client then pushes the data to all the replicas. Each chunk server accepts the replicated data and keeps it in an internal LRU buffer cache. To improve performance, the data flow is decoupled from the control flow, and the expensive data flow is scheduled based on the network topology.
5. After receiving acknowledgments from all the replicas, the client sends a write request to the primary replica. The primary assigns consecutive serial numbers to all the mutations it receives, possibly from multiple clients, thereby serializing them, and applies the mutations to its own local state in serial order.
6. The primary then forwards the write request to the secondary replicas, which apply the mutations in the same serial order as the primary.
7. The secondary replicas reply to the primary when they have completed the operation.
8. The primary replica in turn replies to the client, reporting any errors encountered during the mutation; the client code handles such errors by retrying the failed mutation.
GFS allows users to perform a record append operation, which appends a data block at the end of a file. GFS offers fast recovery from various system errors. Besides this, it also ensures:
1. High availability
2. High performance
3. High fault tolerance
4. High scalability
Amazon pioneered the use of VMs for application hosting. Rather than using physical machines to run their applications, customers rent VMs; they can load any desired software onto a VM, and they can create, launch, and terminate server instances, paying only for what they use. Amazon provides several types of VMs as preconfigured instances called Amazon Machine Images (AMIs), based on Linux or Windows together with additional software. Three types of AMIs are defined as follows:
1. Private AMI: Images created by a user are private by default; the creator can grant other users permission to launch them.
2. Public AMI: Images created by users and released to the AWS community, allowing others to launch instances from them.
3. Paid AMI: Images that others may launch only after paying a fee to the image creator, in addition to the standard usage charges.
The figure below shows the execution environment, in which AMIs serve as the templates for instances running as VMs. Public, private, and paid AMIs all support this environment.
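A hedged sketch of launching and terminating an instance from an AMI using the boto3 Python SDK; the AMI ID and key pair name are placeholders, and AWS credentials are assumed to be configured in the environment.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one VM instance from an AMI template (placeholder image ID).
reservation = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # a private, public, or paid AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",             # placeholder key pair for SSH access
)
instance_id = reservation["Instances"][0]["InstanceId"]
print("launched", instance_id)

# ...use the instance, then terminate it so that billing stops.
ec2.terminate_instances(InstanceIds=[instance_id])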
Amazon S3 (Simple Storage Service) provides a simple web service interface for storing and retrieving data on the web at any time, from anywhere. The service takes the form of object-oriented storage; objects are accessible to users through SOAP (Simple Object Access Protocol) with supporting browsers or client programs. SQS, in contrast, provides a reliable message service between any two processes. The figure below shows the Amazon S3 execution environment.
The fundamental unit of S3 is the object. A bucket holds objects, which are accessed through keys; an object also carries other attributes such as values, access control information, and metadata. Users perform read, write, and delete operations on objects through a key-value programming interface, and they can access data in the Amazon cloud through the REST and SOAP interfaces.
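A hedged sketch of this key-value object interface using boto3; the bucket and key names are placeholders and the bucket is assumed to already exist.

import boto3

s3 = boto3.client("s3")

# Write: store an object under a key inside a bucket.
s3.put_object(Bucket="my-demo-bucket", Key="reports/2024/summary.txt",
              Body=b"quarterly totals...")

# Read: fetch the object back by bucket + key.
obj = s3.get_object(Bucket="my-demo-bucket", Key="reports/2024/summary.txt")
print(obj["Body"].read())

# Delete: remove the object.
s3.delete_object(Bucket="my-demo-bucket", Key="reports/2024/summary.txt")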
Amazon Elastic Block Store (EBS) provides a block-level volume interface for saving and restoring the virtual images of EC2 instances. Traditional EC2 instances are deleted once their use is completed, but their status can be saved in the EBS system at machine shutdown; EBS is therefore used to save running data along with EC2 instances. Users can create volumes ranging from 1 GB to 1 TB and mount them to EC2 instances; multiple volumes can be mounted to the same instance. Users can create a file system on top of Amazon EBS volumes or use the volumes in any other way they wish. Data is saved through snapshots, which also improve performance. Amazon charges according to usage.
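A hedged boto3 sketch of the volume life cycle described above (create, attach, snapshot); the size, availability zone, device name, and instance ID are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a 10 GB volume in a specific availability zone.
volume = ec2.create_volume(Size=10, AvailabilityZone="us-east-1a")
volume_id = volume["VolumeId"]

# Attach it to a running EC2 instance as a block device.
ec2.attach_volume(VolumeId=volume_id,
                  InstanceId="i-0123456789abcdef0",
                  Device="/dev/sdf")

# Save the volume's state as a snapshot for durability and later restore.
ec2.create_snapshot(VolumeId=volume_id, Description="nightly backup")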
Amazon SimpleDB provides a simplified data model along the lines of the relational database data model. User data is organized into domains, which can be considered as tables; a domain contains items as its rows and attribute values as the cells of the corresponding row. A single cell may also hold multiple values. Developers simply want to store, access, and query their data easily, and SimpleDB gives up strict consistency in exchange for this simplicity. Azure Table likewise manages modest amounts of data in a distributed table, so these systems are sometimes called "little tables", whereas BigTable is meant to store big data. SimpleDB costs $0.140 per Amazon SimpleDB machine hour.
The bottom layer in the figure above is the fabric, which consists of virtualized hardware together with a sophisticated control environment that dynamically assigns resources and implements fault tolerance. Domain name system and monitoring capabilities are implemented here as well. Service models can be defined by XML templates, and multiple copies of a service can be instantiated.
Services are monitored while the system is running, and users can access event logs, trace/debug data, IIS web server logs, crash dumps, performance counters, and other log files. All of this data is held in Azure storage, where it can be examined for debugging. Azure is connected to the Internet through a customized compute VM known as a web role, which supports basic Microsoft web hosting; such VMs are called appliances. Roles supporting HTTP(S) and TCP provide the following methods:
OnStart(): Called by the fabric on startup; it allows the user to perform initialization tasks. It reports a Busy status to the load balancer until the method returns.
OnStop(): Invoked when the role is to be shut down; the role then exits.
SQL Azure: SQL Azure provides SQL Server as a service. The other storage modalities are accessed through a REST interface, except for the recently introduced drives, which are analogous to Amazon EBS and provide a file system interface as a durable NTFS volume backed by blob storage. The REST interfaces are associated with URLs by default. The storage is replicated three times for fault tolerance and is guaranteed to be consistent in access.
The storage system is built up from blobs, which are analogous to S3 for Amazon. Blobs are classified as block blobs and page blobs. Containers are analogous to directories in traditional file systems, with the account as the root. A block blob streams data and is arranged as a sequence of blocks of up to 4 MB each, up to a total size of 200 GB. Page blobs are intended for random read/write access and consist of a set of pages with a total size of up to 1 TB.
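A hedged sketch of uploading and downloading a block blob with the azure-storage-blob Python SDK (v12 style); the connection string, container, and blob names are placeholders and the container is assumed to exist.

from azure.storage.blob import BlobServiceClient

# Placeholder connection string taken from the storage account's access keys.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(conn_str)
blob = service.get_blob_client(container="reports", blob="2024/summary.txt")

# Upload a block blob, then read it back.
blob.upload_blob(b"quarterly totals...", overwrite=True)
data = blob.download_blob().readall()
print(data)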
Azure Tables: The Azure Table and Queue storage modes are intended for smaller volumes of data. Queues provide reliable message delivery and support work spooling between the web roles and worker roles; there is no limit on the number of messages in a queue. Azure supports PUT, GET, and DELETE operations on messages as well as CREATE and DELETE on queues. Each account can hold an unlimited number of tables, whose rows and columns are entities and properties, respectively.
The number of entities in a table is not restricted; a table may hold a huge number of entities spread across distributed computers. Entities carry general properties of the form <name, type, value>. Two special properties, PartitionKey and RowKey, are also assigned to every entity: the RowKey gives each entity a unique label, while the PartitionKey is designed to be shared, so that entities with the same PartitionKey are stored near one another.
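A hedged sketch of inserting and retrieving an entity with the azure-data-tables Python SDK, assuming that package's TableClient API; the connection string, table name, and keys are placeholders and the table is assumed to already exist (it can be created with create_table()).

from azure.data.tables import TableClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
table = TableClient.from_connection_string(conn_str, table_name="Students")

# Every entity carries a shared PartitionKey and a unique RowKey.
entity = {
    "PartitionKey": "IT-DeptA",   # entities of one department share a partition
    "RowKey": "21B81A1201",       # unique label for this entity
    "Name": "A. Student",
    "Marks": 87,
}
table.create_entity(entity)

fetched = table.get_entity(partition_key="IT-DeptA", row_key="21B81A1201")
print(fetched["Name"], fetched["Marks"])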
4.5 EMERGING CLOUD SOFTWARE ENVIRONMENTS
Eucalyptus: Eucalyptus supports the development of both compute clouds and storage clouds. Images are stored in the Walrus storage system, which is similar to the Amazon S3 service, and they can be uploaded and retrieved at any time; this helps users create special virtual appliances. The figure above depicts the architecture, which is built around the requirement of managing VM images.
Nimbus:
Nimbus is a set of open source tools that together provide an IaaS cloud computing solution. It provides a special web interface known as Nimbus Web, which is centered around a Python Django web application installed independently of the Nimbus service. A storage cloud implementation known as Cumulus is integrated with the other central services, and it is compatible with the Amazon S3 REST API.
Nimbus supports two resource management strategies:
1. Resource pool: In this (default) mode, the service has direct control of a pool of VM manager nodes.
2. Pilot: In this mode, the service requests a cluster's Local Resource Management System (LRMS) to obtain VM manager nodes on which to deploy VMs.
Nimbus implements Amazon’s EC2 interface in order to allow the users to make use of
clients which are developed with the aim of real EC2 system against Nimbus-based clouds.
4.5.2 OpenNebula: OpenNebula is an open source toolkit that enables users to transform existing infrastructure into an IaaS cloud. It is designed to be flexible and modular so that it can be combined with different storage and network infrastructure configurations and hypervisor technologies. Its main components are:
1. Core
2. Capacity manager or Scheduler
3. Access Drivers.
1. Core: The core is a centralized component that controls the complete life cycle of a virtual machine, including setting up networks for a group of virtual machines and managing storage requirements such as VM disk image deployment or software environments.
Apart from this, the core provides management interfaces so that it can be integrated with other data-center tools such as accounting or monitoring frameworks. It implements the libvirt API and a command-line interface (CLI) for virtual machine management, and it supports two features for changing environments: live migration and VM snapshots.
Sector/Sphere: Sector/Sphere is software that supports very large distributed data storage and data processing on large clusters, within one data center or across multiple data centers. It consists of the Sector distributed file system and the Sphere parallel data processing framework. Using fast network connections, the Sector DFS can be deployed over wide areas and enables users to manage large data sets. Fault tolerance is achieved by replicating and managing data within the file system. Sector is aware of the network topology, which improves reliability, availability, and access. Communication is carried out using UDP and UDT (UDP-based Data Transfer): UDP is used for message passing and UDT is used for data transfer.
Sphere is a parallel data processing engine designed to work with data managed by Sector. Developers can process the data stored in Sector using the programming framework that Sphere provides; application inputs and outputs are Sector files. Multiple Sphere processing segments can be combined to support more complex applications. The Sector/Sphere system consists of the following components:
1. Security server
2. Slave nodes
3. Client
4. Space
1. Security Server: Responsible for authenticating master servers, slave nodes, and users. The master server, in turn, maintains the file system metadata, schedules jobs, and responds to users' requests.
2. Slave Nodes: Used to store and process the data. They can be placed within a single data center or across multiple data centers connected by high-speed networks.
3. Client: Provides tools and programming APIs for accessing and processing Sector data.
Open Stack:
OpenStack was introduced by Rackspace and NASA in July 2010. The project builds an open source community that shares resources and technologies with the goal of a massively scalable and secure cloud infrastructure. Its two main components are as follows:
a) Open Stack Compute
b) Open Stack Storage
a) Open Stack Compute: This is the internal fabric of the cloud, used to create and control large sets of virtual private servers.
The cloud computing fabric controller, which is part of an IaaS system, is called Nova. It is built on the ideas of a shared-nothing architecture and message-based information exchange. Communication is carried out using message queues; to prevent components from blocking while waiting for a response from one another, deferred objects are used, with callbacks that are triggered when a response is received.
The shared-nothing paradigm is achieved by keeping the entire system state in a distributed data store. In this architecture, the API server receives HTTP requests from boto clients, converts the commands to the internal API format, and then forwards the requests to the cloud controller. The cloud controller interacts with the user manager through the Lightweight Directory Access Protocol (LDAP). In addition, Nova integrates networking components to control private networks, public IP addressing, VPN connectivity, and firewall rules. It includes three kinds of networking options: flat networking, flat DHCP networking, and VLAN-based networking.
b) Open Stack Storage: The storage solution is built from interacting components and concepts, including a proxy server, the ring, an object server, a container server, an account server, replication, updaters, and auditors. The proxy server exposes the accounts, containers, and objects of the system through its API.
An object server is a simple blob storage server that stores, retrieves, and deletes objects held on local devices. A container server lists the objects within a particular container, while containers themselves are tracked by the account server.
Aneka is a cloud application platform developed by Manjrasoft. It aims to support the development and deployment of parallel and distributed applications on private and public clouds. It provides a collection of APIs for utilizing distributed resources and expressing the business logic of applications through programming abstractions, while system administrators use management tools to monitor and control the deployed infrastructure. Aneka works as a workload distribution and management platform for accelerating applications in both Linux and Microsoft .NET framework environments. Among its advantages over other workload distribution solutions, Aneka offers three types of capabilities that are essential for building, accelerating, and managing clouds and their applications:
1. Build: Aneka includes an SDK with APIs and tools that allow developers to build distributed applications using several programming models.
2. Accelerate: Aneka supports the rapid deployment of applications and scales them out over private and public cloud resources to reduce execution time.
3. Manage: Management tools and capabilities supported by Aneka include a GUI and APIs to set up, monitor, manage, and maintain remote and global Aneka compute clouds.
In Aneka, the available services can be aggregated into three major categories:
1. Fabric Services
2. Foundation Services
3. Application Services
1. Fabric Services: These form the lowest level of the Aneka middleware and provide access to the underlying physical and virtual resources, handling tasks such as resource provisioning and node membership.
2. Foundation Services: These services constitute the core functionalities of the Aneka middleware. They provide a basic set of capabilities that enhance application execution in the cloud, including storage management, resource reservation, reporting, accounting, billing, services monitoring, and licensing.
3. Application Services: These services deal directly with the execution of applications and
are in charge of providing the appropriate runtime environment for each application model.
At this level, Aneka can support different application models and distributed programming
patterns.