CCS335 Cloud Computing - Unit IV Notes
UNIT IV
Google App Engine – Amazon AWS – Microsoft Azure; Cloud Software Environments –
Eucalyptus– OpenStack.
1. Discuss in detail about the Google App Engine and its architecture (or) Discuss in detail
about GAE applications. (Nov/Dec 2020) (or) Explain the functional modules of GAE with
an example. (May 2022) (or) Demonstrate the programming environment of Google App
Engine. (May 2023)
Google App Engine (GAE) is a Platform-as-a-Service (PaaS) cloud computing model that supports
many programming languages.
GAE is a scalable runtime environment devoted mostly to executing Web applications. In fact, it
allows developers to integrate third-party frameworks and libraries while the infrastructure is still
managed by Google.
It gives developers a ready-made platform to develop and deploy web applications using
development tools, a runtime engine, databases and middleware solutions. It supports languages
like Java, Python, .NET, PHP, Ruby, Node.js and Go, in which developers can write their code and
deploy it on the available Google infrastructure with the help of a Software Development Kit (SDK).
In GAE, SDKs are required to set up your computer for developing, deploying, and managing
your apps in App Engine. GAE enables users to run their applications on the large number of data
centers associated with Google's search engine operations. Presently, Google App Engine is a
fully managed, serverless platform that lets developers choose from several popular languages,
libraries, and frameworks to develop their applications; App Engine then takes care
of provisioning servers and scaling application instances based on demand. The functional
architecture of the Google cloud platform for App Engine is shown in Fig. 4.1.
The infrastructure for the Google cloud is managed inside data centers. All the cloud services
and applications on Google run on servers inside these data centers. Inside each data center,
thousands of servers form different clusters, and each cluster can run multipurpose servers.
The infrastructure for GAE is composed of four main components: Google File System (GFS),
MapReduce, BigTable, and Chubby. GFS is used for storing large amounts of data on Google
storage clusters. MapReduce is used for application program development with data
processing on large clusters. Chubby is used as a distributed application locking service, while
BigTable offers a storage service for accessing structured as well as unstructured data. In this
architecture, users interact with Google applications via the web interface provided by each
application.
Fig. 4.1 : Functional architecture of the Google cloud platform for app engine
• Application runtime environment offers a platform with a built-in execution engine
for scalable web programming and execution.
• Software Development Kit (SDK) for local application development and deployment
over the Google cloud platform (a minimal example follows this list).
• Datastore to provision object-oriented, distributed, structured data storage for
application data. It also provides secure data management operations based on
BigTable techniques.
• Admin console used for easy management of user application development and resource
management.
• GAE web service for providing APIs and interfaces.
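For illustration, a minimal App Engine application might look as follows. This is only a sketch, assuming the modern Python 3 standard environment with the Flask microframework; the handler, file names and runtime version are illustrative.

    # main.py - minimal App Engine (standard environment) application.
    # Flask must be listed in requirements.txt alongside this file.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def home():
        # App Engine routes incoming HTTP requests to this handler and
        # scales the number of instances up or down based on demand.
        return "Hello from Google App Engine!"

    # app.yaml (deployment descriptor in the same directory) needs only:
    #     runtime: python39
    # Deploy with the Cloud SDK:
    #     gcloud app deploy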
Google provides programming support for its cloud environment, that is, Google App
Engine, through the Google File System (GFS), BigTable, and Chubby. The following sections provide
a brief description of GFS, BigTable, Chubby and the Google APIs.
Google File System (GFS)
Google designed a distributed file system, named GFS, to meet its exacting demands
of processing a large amount of data. Most of the objectives of designing GFS are similar
to those of earlier distributed file systems, including availability, performance, reliability,
and scalability. GFS has also been designed under certain challenging assumptions that
provide opportunities for developers and researchers to achieve these objectives.
Some of the assumptions are listed as follows :
a) Component failures are the norm rather than the exception, since the system is built from
many inexpensive commodity components; constant monitoring, fault tolerance, and automatic
recovery are therefore integral to the design.
b) Efficient storage support for large-sized files, as the huge amounts of data to be processed are
stored in these files. Storage support is provided for small-sized files without requiring any
optimization for them.
c) With workloads that mainly consist of two kinds of reads, large streaming reads and small
random reads, the system should be performance conscious, so that applications batch and sort
their small reads and advance steadily through the file rather than going back and forth.
d) The system supports small writes without being inefficient, along with the usual large
and sequential writes through which data is appended to files.
e) Well-defined semantics are implemented for multiple clients that concurrently append to the
same file.
g) Provision for high sustained bandwidth is given priority over low latency. Google
takes the aforementioned assumptions into consideration and supports its
cloud platform, Google App Engine, through GFS. Fig. 4.2 shows the architecture of
the GFS clusters.
GFS provides a file system interface and different APIs for supporting different file operations,
such as create to create a new file instance, delete to delete a file instance, open to open a named
file and return a handle, close to close a given file specified by a handle, read to read data from
a specified file and write to write data to a specified file.
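GFS itself is internal to Google and has no public client library, so the following is a hypothetical Python sketch of how the operations named above could be exercised; the gfs module and every call on it are assumed names, not a real API.

    # Hypothetical sketch of the GFS-style file interface; 'gfs' and all
    # of its methods are assumed names, not a real library.
    import gfs  # hypothetical client library

    client = gfs.connect(master="gfs-master.example.com")

    client.create("/logs/crawl.dat")                # create a new file instance
    handle = client.open("/logs/crawl.dat")         # open a named file, get a handle
    client.write(handle, offset=0, data=b"record-1\n")   # write data to the file
    data = client.read(handle, offset=0, length=9)       # read the data back
    client.close(handle)                            # close the file via its handle
    client.delete("/logs/crawl.dat")                # delete the file instance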
As shown in Fig. 4.2, a GFS cluster comprises a single GFS Master and multiple chunk servers
(three in the figure) serving multiple clients (two in the figure). The clients, the chunk servers,
and the Master are Linux machines, each running a server process at the user level. These
processes are known as user-level server processes.
Applications use a file-system-specific Application Programming Interface (API) that is
implemented by the GFS client code linked into each application. The client code then
communicates with the GFS Master and chunk servers to perform the read and write
operations on behalf of the application.
The clients interact with the Master only for metadata operations; data-bearing
communications go directly to the chunk servers. The POSIX API, a feature common to
most popular file systems, is not included in GFS, and therefore a Linux vnode layer hook-in
is not required.
Neither clients nor chunk servers cache file data. Because the workload is mostly
streamed, caching brings little benefit to clients, while caching by chunk servers is of little
consequence because Linux's buffer cache already keeps frequently requested data in memory
locally.
GFS provides the following features: large-scale data storage distributed across clusters of
commodity machines; fault tolerance through chunk replication, with each chunk stored on
multiple chunk servers; automatic recovery from component failures; high aggregate throughput
for streaming reads and appends; and an atomic record-append operation for concurrent writers.
Big Table
Google's Big Table is a distributed storage system that allows storing huge volumes of
structured as well as unstructured data on storage mediums.
Google created Big Table with the aim of developing a fast, reliable, efficient and scalable
storage system that can process concurrent requests at high speed.
Millions of users access billions of web pages and many hundreds of TBs of satellite images.
A lot of semi-structured data is generated from Google or from web access by users.
This data needs to be stored, managed, and processed to retrieve insights, which requires
data management systems to have very high scalability.
Google's aim behind developing Big Table was to provide a highly efficient system for
managing a huge amount of data so that it can support cloud storage services.
Concurrent processes must be able to update various data pieces so that the most
recent data can be accessed easily and at a fast speed. The design requirements of Big Table
are as follows :
1. High speed
2. Reliability
3. Scalability
4. Efficiency
5. High performance
Big Table is a popular, distributed data storage system that is highly scalable and self-
managed. It involves thousands of servers, terabytes of in-memory data storage,
millions of read/write requests per second, and petabytes of data stored
on disks. Its self-managing services support dynamic addition and removal of servers that
are capable of adjusting load imbalance by themselves.
It has gained extreme popularity at Google as it stores almost all kinds of data, such as Web
indexes, personalized searches, Google Earth, Google Analytics, and Google Finance. The data
from the Web that it contains is referred to as a Web table. The generalized architecture of Big
Table is shown in Fig. 4.3.
It is composed of three entities, namely the client, the Big Table master and the tablet servers.
Big Tables are implemented over one or more clusters that are similar to GFS clusters. The client
application uses libraries to execute Big Table queries on the master server. A Big Table is
broken up into units of 100 to 200 MB each, called tablets, which are assigned to slave servers,
called tablet servers, for the execution of secondary tasks.
The master server is responsible for allocating tablets to tablet servers, clearing garbage
collections and monitoring the performance of tablet servers. The master server splits tasks and
executes them over the tablet servers. The master server is also responsible for maintaining a
centralized view of the system to support optimal placement and load-balancing decisions.
It performs separate control and data operations strictly with tablet servers. Upon being granted
their tasks, tablet servers provide row access to clients. Fig. 4.4 shows the structure of Big Table :
Big Table is arranged as a sorted map that is spread across multiple dimensions and is
sparse, distributed, and persistent. The Big Table data model primarily combines
three dimensions, namely row, column, and timestamp. The first two dimensions are string types,
whereas the time dimension is a 64-bit integer. The value resulting from a combination of these
dimensions is a string.
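As a toy illustration of this model, the map can be pictured as an ordinary Python dictionary keyed by (row, column, timestamp); the webtable row and anchor column below follow the well-known example from the Bigtable paper.

    # Toy model of Big Table's sparse, sorted, multidimensional map:
    #   (row: str, column: str, timestamp: int64) -> value: str
    webtable = {
        ("com.cnn.www", "contents:", 1050): "<html>version 1</html>",
        ("com.cnn.www", "contents:", 1100): "<html>version 2</html>",
        ("com.cnn.www", "anchor:cnnsi.com", 1100): "CNN",
    }

    # Rows are kept sorted lexicographically by row key; URLs are stored
    # with their domain components reversed so pages of a site sort together.
    for key in sorted(webtable):
        print(key, "->", webtable[key])

    # The current version of a cell is the one with the largest timestamp.
    contents = {ts: v for (r, c, ts), v in webtable.items()
                if r == "com.cnn.www" and c == "contents:"}
    print(contents[max(contents)])   # -> <html>version 2</html>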
Each row in Big Table has an associated row key, which is an arbitrary string of up to
64 KB in size. Row names are strings, and the rows are ordered in lexicographic
form. Although Big Table rows do not support the relational model, they offer atomic access
to the data: reads and writes under a single row key are atomic. The rows contain a large
amount of data about a given entity such as a web page; the row keys represent URLs that
contain information about the resources referenced by those URLs.
The other important dimension assigned to Big Table is the timestamp. In Big Table,
the multiple versions of data in a given cell are indexed by timestamp. The timestamp either
reflects real time or is an arbitrary value assigned by a programmer, and it is used for
storing various data versions in a cell.
By default, any new data inserted into Big Table is taken as current, but you can
explicitly set the timestamp for any new write operation in Big Table. Timestamps provide a
Big Table lookup option that returns the specified number of the most recent values. They can
be used for marking the attributes of column families: the attributes either retain the most
recent values up to a specified number or keep the values for a particular time duration.
Big Table supports APIs that developers can use to perform a wide range of operations,
such as metadata operations, read/write operations, and modify/update operations. The
operations commonly used through the APIs are as follows:
• Creation and deletion of tables
• Creation and deletion of column families within tables
• Writing or deleting cell values
• Accessing data from rows
• Associating metadata, such as access control information, with tables and column
families
The functions used for atomic write operations are Set ( ), which writes cell values in a row;
DeleteCells ( ), which deletes cells from a row; and DeleteRow ( ), which deletes all cells in a
row. Each such mutation on a single row is applied atomically.
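Big Table's original C++ API is internal to Google, but its public descendant, Cloud Bigtable, exposes the same style of row-level mutation. A sketch using the google-cloud-bigtable Python client, where the project, instance and table names are placeholders:

    # Sketch: atomic single-row writes against Cloud Bigtable, the public
    # descendant of Big Table. Project/instance/table names are placeholders.
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    table = client.instance("my-instance").table("webtable")

    row = table.direct_row(b"com.cnn.www")
    # Queue a Set on the 'contents' column family; all mutations queued on
    # one row key are applied atomically when commit() is called.
    row.set_cell("contents", b"html", b"<html>...</html>")
    row.commit()

    # Deleting a whole row is likewise a single-row atomic mutation.
    row = table.direct_row(b"com.cnn.www")
    row.delete()
    row.commit()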
Chubby
Chubby is a crucial service in the Google infrastructure that offers storage and coordination
for other infrastructure services such as GFS and BigTable. It is a coarse-grained distributed
locking service used for synchronizing distributed activities in an asynchronous
environment on a large scale. It is used as a name service within Google and provides reliable
storage for file systems, along with the election of a coordinator among multiple replicas. The
Chubby interface is similar to the interfaces provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to provide
reliable storage with consistent availability.
It is designed for use with loosely coupled distributed systems that are connected by a
high-speed network and contain several small-sized machines. The lock service enables the
synchronization of the activities of clients and permits the clients to reach a consensus about
the environment in which they are placed. Chubby's main aim is to efficiently handle a large set
of clients by providing them a highly reliable and available system; its other
characteristics, such as throughput and storage capacity, are secondary. Fig. 4.5 shows the
typical structure of a Chubby system :
The Chubby architecture involves two primary components, namely the server and the client
library. The two components communicate through Remote Procedure Calls (RPC), and the
library has the special purpose of linking the clients to a Chubby cell. A Chubby cell
contains a small set of servers, also called replicas; usually, five servers are
used in every cell. The Master is elected from the five replicas through a distributed consensus
protocol. A majority of the replicas must vote for the Master, with the assurance that they will
elect no other Master for a certain duration, termed the Master lease.
Chubby supports a file system similar to UNIX's, although simpler. The files and
directories, known as nodes, are contained in the Chubby namespace,
and each node is associated with different types of metadata. The nodes are opened to obtain
UNIX-like file descriptors known as handles. The specifiers for handles include check digits for
preventing clients from guessing handles, handle sequence numbers, and mode information for
recreating the lock state when the Master changes. Reader and writer locks are implemented by
Chubby using files and directories: a single client can hold a lock in writer mode with exclusive
permission, while any number of clients can share a lock in reader mode.
Another important term used with Chubby is an event, which clients can subscribe to
after the creation of handles. An event is delivered when the action that corresponds to it
is completed. An event can be :
a. Modification in the contents of a file
b. Addition, removal, or modification of a child node
c. Failover of the Chubby Master
d. Invalidation of a handle
In Chubby, caching is done by clients, which store file data and metadata to reduce the traffic
for reader locks. Although handles and file locks can also be cached, the
Master maintains a list of the clients that may be caching. Thanks to caching, clients find data
to be consistent; if this is not the case, an error is flagged. Chubby maintains sessions between
clients and servers with the help of keep-alive messages, which are required every few seconds to
remind the system that the session is still active.
If a Master failure occurs, the Master does not respond to a client's
keep-alive message within the local lease timeout. This incident sends the session into jeopardy.
It can be recovered in a manner as explained in the following points:
• The cache needs to be cleared.
• The client needs to wait for a grace period, which is about 45 seconds.
• Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy is over.
However, if the attempt fails, the client assumes that the session is lost. Fig. 4.6 shows the case
of the failure of the Master :
Chubby offers a decent level of scalability, which means that there can be any (unspecified)
number of Chubby cells. If these cells are fed with heavy loads, the lease timeout is increased;
this increment can be anything between 12 seconds and 60 seconds. The data is small and
held entirely in Random-Access Memory (RAM). The Chubby system also uses
partitioning mechanisms to divide data into smaller packages. With all of its services and
applications included, Chubby has proved to be a great innovation when it comes to storage,
locking, and program support services.
Chubby is implemented using the following APIs :

API                  Description
Open                 Opens a file or directory and returns a handle
Close                Closes a file or directory and destroys the associated handle
Delete               Deletes the file or directory
ReadDir              Returns the contents of a directory
SetContents          Writes the contents of a file
GetStat              Returns the metadata
GetContentsAndStat   Returns the file contents and the metadata associated with the file
Acquire              Acquires a lock on a file
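Chubby is likewise internal to Google, so the following is a hypothetical sketch of how the Open/Acquire style of API above might be used, for example to elect a primary among replicas; the chubby module and all of its calls are assumed names.

    # Hypothetical sketch of coarse-grained locking in the Chubby style;
    # 'chubby' and its methods are assumed names, not a real library.
    import chubby  # hypothetical client library

    cell = chubby.connect("chubby-cell.example.com")
    handle = cell.open("/ls/cell/service/primary-lock")

    if cell.acquire(handle, mode="writer"):       # exclusive writer lock
        # The lock holder becomes the elected primary and advertises
        # itself by writing its address into the lock file.
        cell.set_contents(handle, b"primary=10.0.0.5:8080")
    else:
        # Other replicas read the file to discover the current primary.
        contents, stat = cell.get_contents_and_stat(handle)
        print(contents)

    cell.close(handle)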
Google developed a set of Application Programming Interfaces (APIs) that can be used to
communicate with Google services; this set of APIs is referred to as the Google APIs. They also
help in integrating Google services with other services.
Google App Engine helps in deploying an API for an app without the developer being aware of
its infrastructure. Google App Engine also hosts the endpoint APIs created by Google
Cloud Endpoints. A set of libraries, tools, and capabilities that can be used to generate client
libraries and APIs from an App Engine application is known as Google Cloud Endpoints. It eases
data accessibility for client applications. We can also save the time of writing the network
communication code by using Google Cloud Endpoints, which can also generate client libraries
for accessing the backend API.
AWS:
2. Explain in detail about AWS EC2 and EB with an example.
Programming on Amazon EC2
✓ Amazon was the first company to introduce VMs in application hosting. Customers can
rent VMs instead of physical machines to run their own applications. By using VMs,
customers can load any software of their choice.
✓ The elastic feature of such a service is that a customer can create, launch, and terminate
server instances as needed, paying by the hour for active servers. Amazon provides
several types of preinstalled VMs.
✓ Instances are often called Amazon Machine Images (AMIs), which are preconfigured
with operating systems based on Linux or Windows, and additional software.
Table 6.12 defines three types of AMI, and Figure 6.24 shows an execution environment. AMIs are
the templates for instances, which are running VMs. The workflow to create a VM is:

Create an AMI → Create Key Pair → Configure Firewall → Launch

This sequence is supported by public, private, and paid AMIs shown in Figure 6.24. The AMIs are
formed from the virtualized compute, storage, and server resources shown at the bottom of
Figure 6.23. A boto3 sketch of this sequence appears below.
Cluster compute instances offer increased network performance and are well suited for
high-performance computing (HPC) applications and other demanding network-bound
applications; they use 10 Gigabit Ethernet interconnections.
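A sketch of the create-key-pair, configure-firewall, launch sequence using the boto3 Python SDK; the region, AMI ID and resource names are placeholders.

    # Sketch of the key pair -> firewall -> launch workflow with boto3.
    # Region, AMI ID, key name and security-group name are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # 1. Create a key pair for SSH access to the instance.
    ec2.create_key_pair(KeyName="demo-key")

    # 2. Configure the firewall: a security group allowing inbound SSH.
    sg = ec2.create_security_group(GroupName="demo-sg",
                                   Description="allow ssh")
    ec2.authorize_security_group_ingress(GroupId=sg["GroupId"],
                                         IpProtocol="tcp", FromPort=22,
                                         ToPort=22, CidrIp="0.0.0.0/0")

    # 3. Launch an instance from an AMI; billing is per hour of active
    #    use, and the instance can be terminated when it is done.
    resp = ec2.run_instances(ImageId="ami-0123456789abcdef0",
                             InstanceType="t2.micro", KeyName="demo-key",
                             SecurityGroupIds=[sg["GroupId"]],
                             MinCount=1, MaxCount=1)
    print(resp["Instances"][0]["InstanceId"])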
Amazon S3
Amazon S3 offers a simple web services interface that can be used to store and retrieve any
amount of data from anywhere on the web, at any time. It gives any developer access to the same
scalable, secure, fast, low-cost data storage infrastructure that Amazon uses to operate its own
global website network. S3 is an online backup and storage system. A high-speed data transfer
feature known as AWS Import/Export exchanges data with AWS using portable storage devices
that are loaded and unloaded over Amazon's own internal network, bypassing the Internet.
Amazon S3 is a cloud-based storage system that allows storage of data objects ranging
from 1 byte up to 5 GB in a flat namespace. The storage containers in S3 are called
buckets. A bucket serves the function of a directory, though there is no object hierarchy within
a bucket, and the user saves objects, not files, to it. Here it is important to note that the
concept of a file system is not associated with S3, because file systems are not supported; only
objects are stored. In addition, the user is not required to mount a bucket, as opposed to a
file system. Fig. 4.7 shows S3 diagrammatically.
The S3 system allows buckets to be named (Fig. 4.8), but each name must be unique in the S3
namespace across all consumers of AWS. A bucket can be accessed through the S3 web API
(with SOAP or REST), which is similar to a normal disk storage system.
The performance of S3 limits it to non-operational functions such as data archiving,
retrieval and disk backup. The REST API is preferred to the SOAP API because it is easier to
work with large binary objects in REST.
Amazon S3 offers large volumes of reliable storage with high protection and low-bandwidth
access. S3 is most ideal for applications that need storage archives; for example,
S3 is used by large storage sites that share photos and images.
The APIs to manage buckets support operations such as creating a bucket, writing an object
to a bucket (PUT), reading an object (GET), deleting an object (DELETE), and listing the keys
contained in a bucket.
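A sketch of these operations with the boto3 Python SDK; the bucket name is a placeholder and, as noted above, must be unique across all of S3.

    # Sketch of basic S3 bucket and object operations with boto3.
    # The bucket name is a placeholder; it must be globally unique.
    import boto3

    s3 = boto3.client("s3")
    bucket = "ccs335-demo-bucket-2024"

    s3.create_bucket(Bucket=bucket)                      # create a bucket
    s3.put_object(Bucket=bucket, Key="notes/unit4.txt",  # write an object
                  Body=b"GAE, AWS, Azure, Eucalyptus, OpenStack")
    obj = s3.get_object(Bucket=bucket, Key="notes/unit4.txt")   # read it
    print(obj["Body"].read())
    print(s3.list_objects_v2(Bucket=bucket)["KeyCount"])        # list keys
    s3.delete_object(Bucket=bucket, Key="notes/unit4.txt")      # delete it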
The S3 service can be used by many users as the backup component in a 3-2-1 backup method:
your original data is 1, a copy of your data is 2 and an off-site copy of the data is 3.
In this method, S3 is the third level of backup. In addition to this, Amazon S3 provides a
versioning feature.
With versioning, every version of an object stored in an S3 bucket is retained, provided the
user enables the versioning feature. Any HTTP or REST operation, namely PUT, POST, COPY
or DELETE, creates a new object that is stored along with the older version. A GET operation
retrieves the newest version of the object, but the ability to recover older versions and undo
actions is also available. Versioning is a useful method for preserving and
archiving data.
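Versioning is switched on per bucket; a minimal boto3 sketch, using the same placeholder bucket as above:

    # Sketch: enabling versioning on an existing bucket.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket="ccs335-demo-bucket-2024",
        VersioningConfiguration={"Status": "Enabled"})

    # From now on each PUT/POST/COPY/DELETE creates a new version; older
    # versions stay listable and retrievable by their version IDs.
    versions = s3.list_object_versions(Bucket="ccs335-demo-bucket-2024")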
Amazon Glacier
Amazon Glacier is a very low-priced online file storage web service that offers secure, flexible
and durable storage for online data backup and archiving. This web service is specially designed
for data that is not accessed frequently: data for which a retrieval time of three to five hours is
acceptable is a good fit for the Amazon Glacier service.
You can store virtually any type, format and amount of data using Amazon Glacier. Files in ZIP
and TAR format are the most common types of data stored in Amazon Glacier.
Some of the common uses of Amazon Glacier are :
• Replacing the traditional tape solutions with backup and archive which can last
longer.
• Storing data which is used for the purposes of compliance.
Glacier vs. S3
Amazon S3 and Amazon Glacier work in almost the same way; however, certain
important aspects reflect the difference between them. Table 6.10.1 shows
the comparison of Amazon Glacier and Amazon S3.
You can also use the Amazon S3 interface to avail the offerings of Amazon Glacier, with no need
to learn a new interface. This can be done by utilising Glacier as an S3 storage class along with
object lifecycle policies, as sketched below.
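A sketch of such a lifecycle policy with boto3, moving objects under a prefix to the Glacier storage class after 90 days; the bucket name, rule ID and prefix are placeholders.

    # Sketch: using Glacier as an S3 storage class via a lifecycle policy.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="ccs335-demo-bucket-2024",
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-old-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            # Objects transition to the low-cost Glacier tier at 90 days.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]})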
Azure:
4. Explain in detail about Azure Architecture and its Components with an example.
Microsoft Windows Azure (Azure)
In 2008, Microsoft launched the Windows Azure platform to meet the challenges of cloud computing.
This platform is built over Microsoft data centers. Figure 4.22 shows the overall architecture of
Microsoft's cloud platform, which is divided into three major component platforms.
Windows Azure offers a cloud platform built on the Windows OS and based on Microsoft
virtualization technology. Applications are installed on VMs deployed on the data-center servers.
Azure manages all servers, storage, and network resources of the data center. On top of the
infrastructure are the various services for building different cloud applications:
• Live service : Users can visit Microsoft Live applications and apply the data involved across
multiple machines concurrently.
• .NET service : This package supports application development on local hosts and execution on
cloud machines.
• SQL Azure : This function makes it easier for users to visit and use the relational database
associated with the SQL Server in the cloud.
• SharePoint service : This provides a scalable and manageable platform for users to develop their
special business applications in upgraded web services.
• Dynamic CRM service : This provides software developers a business platform for managing CRM
applications in financing, marketing, and sales and promotions.
✓ All these cloud services in Azure can interact with traditional Microsoft software
applications, such as Windows Live, Office Live, Exchange Online, SharePoint Online, and
Dynamics CRM Online.
✓ The Azure platform applies the standard web communication protocols SOAP and REST. The
Azure service applications allow users to integrate the cloud application with other
platforms or third-party clouds.
✓ You can download the Azure development kit to run a local version of Azure. The powerful
SDK allows Azure applications to be developed and debugged on the Windows hosts.
SQL Azure
Azure offers a very rich set of storage capabilities, as shown in Figure 6.25. All the storage
modalities are accessed with REST interfaces, except for the recently introduced Drives, which
are analogous to Amazon EBS (discussed above in the AWS section) and offer a file system
interface as a durable NTFS volume backed by blob storage. The REST interfaces are
automatically associated with URLs, and all storage is replicated three times for fault tolerance
and is guaranteed to be consistent in access.
The basic storage system is built from blobs, which are analogous to Amazon's S3. Blobs are
arranged in a three-level hierarchy: Account → Containers → Page or Block Blobs.
Containers are analogous to directories in traditional file systems, with the account acting as the
root. Block blobs are used for streaming data; each such blob is made up of a sequence of
blocks of up to 4 MB each, and each block has a 64-byte ID.
Block blobs can be up to 200 GB in size. Page blobs are for random read/write access and consist
of an array of pages, with a maximum blob size of 1 TB. One can associate metadata with blobs
as <name, value> pairs, with up to 8 KB per blob.
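A sketch of the Account → Container → Blob hierarchy using the azure-storage-blob Python SDK; the connection string, container and blob names are placeholders.

    # Sketch of Account -> Container -> Blob with azure-storage-blob.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")

    container = service.create_container("notes")        # container ~ directory
    blob = container.get_blob_client("unit4/azure.txt")  # blob inside it

    blob.upload_blob(b"Blob storage is replicated three times.")
    blob.set_blob_metadata({"course": "ccs335", "unit": "4"})  # <name, value> pairs
    print(blob.download_blob().readall())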
Azure Tables
The Azure Table and Queue storage modes are aimed at much smaller data volumes. Queues
provide reliable message delivery and are naturally used to support work spooling between web
and worker roles. Queues consist of an unlimited number of messages, each of which can be
retrieved and processed at least once, with an 8 KB limit on message size.
Azure supports PUT, GET, and DELETE message operations, as well as CREATE and DELETE for
queues. Each account can have any number of Azure tables, which consist of rows called entities
and columns called properties.
There is no limit to the number of entities in a table, and the technology is designed to scale well
to a large number of entities stored on distributed computers. All entities can have up to 255
general properties, which are <name, type, value> triples.
Two extra properties, PartitionKey and RowKey, must be defined for each entity, but
otherwise there are no constraints on the names of properties; this makes the table very flexible.
RowKey is designed to give each entity a unique label, while PartitionKey is designed to be
shared, and entities with the same PartitionKey are stored next to each other; a good use of
PartitionKey can speed up search performance. An entity can have, at most, 1 MB of storage; if
you need larger values, just store a link to a blob store in the Table property value. ADO.NET and
LINQ support table queries.
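A sketch with the azure-data-tables Python SDK showing the mandatory PartitionKey/RowKey pair alongside free-form properties; the connection string, table name and entity values are placeholders.

    # Sketch: inserting and querying entities with azure-data-tables.
    from azure.data.tables import TableServiceClient

    service = TableServiceClient.from_connection_string("<connection-string>")
    table = service.create_table_if_not_exists("students")

    # PartitionKey and RowKey are mandatory; entities sharing a
    # PartitionKey are stored together, which speeds up partition queries.
    table.create_entity({
        "PartitionKey": "cse-3rd-year",
        "RowKey": "2024-001",
        "name": "Asha",     # further <name, value> properties are free-form
        "marks": 91,
    })

    for e in table.query_entities("PartitionKey eq 'cse-3rd-year'"):
        print(e["RowKey"], e["name"])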
Eucalyptus:
Eucalyptus is a Linux-based open-source software architecture for cloud computing, and also a
storage platform, that implements Infrastructure as a Service (IaaS). It provides quick and efficient
computing services. Eucalyptus was designed to provide services compatible with Amazon's EC2
cloud and Simple Storage Service (S3).
Eucalyptus Architecture
Eucalyptus CLIs can manage both Amazon Web Services and their own private instances, and
clients have the freedom to migrate instances from Eucalyptus to the Amazon Elastic Compute
Cloud (as sketched below). The virtualization layer oversees the network, storage, and computing,
and instances are isolated from one another through hardware virtualization.
Components of Architecture
• Node Controller : Manages the lifecycle of instances running on each node. It interacts with the
operating system, hypervisor, and Cluster Controller, and it controls the working of VM instances
on the host machine.
• Cluster Controller : Manages one or more Node Controllers and communicates with the Cloud
Controller simultaneously. It gathers information and schedules VM execution.
• Storage Controller (Walrus) : Allows the creation of snapshots of volumes and provides
persistent block storage for VM instances. The Walrus storage controller is a simple file storage
system that stores images and snapshots; it stores and serves files using the S3 (Simple Storage
Service) APIs.
• Cloud Controller : The front-end for the entire architecture. It acts as an EC2/S3-compliant web
services endpoint for client tools on one side and interacts with the rest of the components on
the other side.
• Managed Mode : Offers numerous security groups to users, as the network is large. Each
security group is assigned a set or a subset of IP addresses, and ingress rules are applied through
the security groups specified by the user. The network is isolated by VLAN between the Cluster
Controller and the Node Controllers. Two IP addresses are assigned to each virtual machine.
• Managed (No VLAN) Mode : Does not provide VM network isolation; the root user on a
virtual machine can snoop into other virtual machines running on the same network layer.
• System Mode : The simplest of all modes, with the fewest features. A MAC address is assigned
to a virtual machine instance and attached to the Node Controller's bridge Ethernet device.
• Static Mode : Similar to System Mode but with more control over the assignment of IP
addresses: each MAC address/IP address pair is mapped to a static entry within the DHCP server,
and the next available pair from the mapping is assigned to an instance.
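Because the Cloud Controller exposes EC2-compatible web services, ordinary EC2 tooling can be pointed at a Eucalyptus endpoint. A sketch with boto3 follows; the endpoint URL, credentials and EMI (Eucalyptus Machine Image) ID are placeholders for a hypothetical private cloud.

    # Sketch: driving Eucalyptus through its EC2-compatible API with boto3.
    # Endpoint URL, credentials and image ID are placeholders.
    import boto3

    ec2 = boto3.client(
        "ec2",
        endpoint_url="https://compute.eucalyptus.example.internal:8773/",
        aws_access_key_id="<euca-access-key>",
        aws_secret_access_key="<euca-secret-key>",
        region_name="eucalyptus")

    # The same calls that work against AWS work against the private cloud,
    # which is what makes Eucalyptus-to-EC2 migration straightforward.
    resp = ec2.run_instances(ImageId="emi-0123456789abcdef0",
                             InstanceType="m1.small",
                             MinCount=1, MaxCount=1)
    print(resp["Instances"][0]["InstanceId"])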
Advantages of the Eucalyptus Cloud
1. Eucalyptus can be utilized to build both private and public Eucalyptus clouds.
2. Amazon or Eucalyptus machine images can be run on either cloud.
3. Its API is compatible with all the Amazon Web Services.
4. Eucalyptus can be used with DevOps tools like Chef and Puppet.
5. Although it is not as popular yet, it has the potential to be an alternative to OpenStack and
CloudStack.
6. It can be used to build hybrid, public and private clouds.
7. It allows users to turn their own data centers into a private cloud and, hence, extend the
services to other organizations.
Nimbus
Nimbus is a toolkit that, once installed on a cluster, provides an Infrastructure-as-a-Service
cloud to its clients through WSRF-based web service APIs or the Amazon EC2 WSDL.
Nimbus is free and open-source software, subject to the requirements of the Apache License,
version 2.
Nimbus supports both the Xen and KVM hypervisors, as well as the Portable Batch System and
Oracle Grid Engine resource schedulers. It allows the deployment of self-configured virtual
clusters, and it is configurable with respect to scheduling, network leases, and usage accounting.
OpenStack
OpenStack is an open-source cloud operating system that is increasingly gaining admiration
among data centers. This is because OpenStack provides a cloud computing platform to handle
enormous computing, storage, database and networking resources in a data center. Put simply,
OpenStack is an open-source, highly scalable cloud computing platform that
provides tools for developing private, public or hybrid clouds, along with a web interface for users
to access resources and for admins to manage those resources.
Put otherwise, OpenStack is a platform that enables potential cloud providers to create,
manage and bill their custom-made VMs to their future customers. OpenStack is free and open,
which essentially means that everyone can have access to its source code, can suggest or make
changes to it, and can share it with the OpenStack community. Technically, OpenStack provides
Infrastructure-as-a-Service (IaaS) that enables its users to create, manage and deploy virtual
machines and other instances, and to manage virtual private servers in their data centers.
OpenStack provides the required software tools and technologies to abstract the underlying
infrastructure to a uniform consumption model. Basically, OpenStack allows various
organisations to provide cloud services to the user community by leveraging the organization’s
pre-existing infrastructure. It also provides options for scalability so that resources can be scaled
whenever organisations need to add more resources without hindering the ongoing processes.
The main objective of OpenStack is to provide a cloud computing platform that is :
• Global
• Open-source
• Freely available
• Easy to use
• Highly and easily scalable
• Easy to implement
• Interoperable
OpenStack is for all. It satisfies the needs of users, administrators and operators of private
clouds as well as public clouds. Some examples of open-source cloud platforms already available
are Eucalyptus, OpenNebula, Nimbus, CloudStack and OpenStack, which are used for
infrastructure control and are usually implemented in private clouds.
Components of OpenStack
OpenStack consists of many different components. Because the OpenStack cloud is open-source,
developers can add components to benefit the OpenStack community. The following are the core
components of OpenStack, as identified by the OpenStack community:
• Nova : This is one of the primary services of OpenStack, which provides numerous tools for
the deployment and management of a large number of virtual machines. Nova is the
compute service of OpenStack (a usage sketch follows this list).
• Swift : Swift provides storage services for storing files and objects. Swift can be equated
with Amazon’s Simple Storage System (S3).
• Cinder : This component provides block storage to Nova Virtual Machines. Its working
is similar to a traditional computer storage system where the computer is able to access
specific locations on a disk drive. Cinder is analogous to AWS’s EBS.
• Glance : Glance is OpenStack's image service component, which provides virtual templates
(images) of hard disks. These templates can be used for new VMs. Glance may use either
Swift or flat files to store these templates.
• Neutron (formerly known as Quantum) : This component of OpenStack provides
Networking-as-a-Service, Load-Balancer-as-a-Service and Firewall-as-a-Service. It also
ensures communication between other components.
• Heat : It is the orchestration component of OpenStack. It allows users to manage
infrastructural needs of applications by allowing the storage of requirements in files.
• Keystone : This component provides identity management in OpenStack.
• Horizon : This is the dashboard of OpenStack, which provides a graphical user interface.
• Ceilometer : This component of OpenStack provisions meters and billing models for
users of the cloud services, and it keeps an account of the resources used by each
individual user of the OpenStack cloud.
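A sketch tying several of these services together with the openstacksdk Python library: Keystone authenticates the connection, Glance supplies the image, Neutron the network, and Nova boots the server. The cloud profile, image, flavor and network names are placeholders.

    # Sketch: booting a VM with openstacksdk. Keystone authenticates,
    # Glance provides the image, Neutron the network, Nova the compute.
    # The cloud profile and resource names below are placeholders.
    import openstack

    conn = openstack.connect(cloud="my-private-cloud")  # reads clouds.yaml

    image = conn.compute.find_image("ubuntu-22.04")
    flavor = conn.compute.find_flavor("m1.small")
    network = conn.network.find_network("private-net")

    server = conn.compute.create_server(
        name="demo-vm", image_id=image.id, flavor_id=flavor.id,
        networks=[{"uuid": network.id}])
    server = conn.compute.wait_for_server(server)
    print(server.status)   # e.g. ACTIVE once Nova finishes booting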
The basic architectural components of OpenStack, shown in Fig. 4.12, include its core and
optional services/components. The optional services of OpenStack are also known as Big Tent
services; OpenStack can be used without these components, or they can be used as per
requirement.
We have already discussed the core services and the four optional services. Let us now
discuss the rest of the services.
• Designate : This component offers DNS services analogous to Amazon’s Route 53.
The following are the subsystems of Designate :
Mini DNS Server
Pool Manager
Central Service and APIs
• Barbican : Barbican is the key management service of OpenStack, comparable to
KMS from AWS. It provides secure storage, retrieval, provisioning and management
of various types of secret data, such as keys, certificates, and even binary data.
• AMQP : AMQP stands for Advanced Message Queuing Protocol and is the messaging mechanism
used by OpenStack. The AMQP broker sits between any two components of Nova and enables
them to communicate in a loosely coupled fashion.
Further, OpenStack uses two architectures - Conceptual and Logical, which are
discussed in the next section.
OpenStack helps build cloud environments by providing the ability to integrate various
technologies of your choice. Apart from the fact that OpenStack is open-source, there are
numerous benefits that make it stand out. Following are some of the features and benefits of
OpenStack Cloud :
• Compatibility : OpenStack supports both private and public clouds and is very easy to
deploy and manage. OpenStack APIs are compatible with Amazon Web Services. This
compatibility eliminates the need for rewriting applications for AWS, thus enabling easy
portability between public and private clouds.
• Security : OpenStack addresses the security concerns, which are the top- most concerns
for most organisations, by providing robust and reliable security systems.
• Real-time Visibility : OpenStack provides real-time client visibility to administrators,
including visibility of resources and instances, thus enabling administrators and providers
to track what clients are requesting.
• Live Upgrades : This feature allows upgrading services without any downtime.
Earlier, upgrades required complete systems to be shut down, which
resulted in loss of performance. Now, OpenStack enables upgrading of systems while they
are running, by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.
Fig. 4.13 depicts a magnified version of the architecture, showing the relationships among
different services and between the services and VMs. This expanded representation is
also known as the conceptual architecture of OpenStack.
From Fig. 4.13, we can see that every service of OpenStack depends on other services within
the system, and all these services exist in a single ecosystem working together to produce a
virtual machine. Any service can be turned on or off depending on the VM required to be
produced. These services communicate with each other through APIs and, in some cases, through
privileged admin commands.
Let us now discuss the relationship between the various components or services specified in the
conceptual architecture of OpenStack. As you can see in Fig. 4.13, three components, Keystone,
Ceilometer and Horizon, are shown on top of the OpenStack platform.
Here, Horizon provides the user interface through which users or administrators interact with
the underlying OpenStack components or services; Keystone provides authentication to the
users by mapping the central directory of users to the accessible OpenStack services; and
Ceilometer monitors the OpenStack cloud for the purposes of scalability, billing,
benchmarking, usage reporting and other telemetry services. Inside the OpenStack platform,
various processes are handled by different OpenStack services. Glance registers
Hadoop images, provides image services to OpenStack and allows retrieval and storage of
disk images. Glance stores the images in Swift, which is responsible for reliably storing data
in the form of objects and files. All other OpenStack components also store their data
or job binaries in Swift. Cinder, which offers permanent block storage or
volumes to VMs, also stores backup volumes in Swift. Trove stores backup databases in Swift and
boots database instances via Nova, which is the main computing engine that provides and
manages virtual machines using disk images.
Neutron enables network connectivity for VMs and facilitates the PXE network for Ironic, which
fetches images via Glance. VMs are used by the users or administrators to avail and provide the
benefits of cloud services, and all the OpenStack services are used by VMs in order to provide the
best services to the users. The infrastructure required for running cloud services is managed by
Heat, the orchestration component of OpenStack, which orchestrates clusters and stores the
necessary resource requirements of a cloud application. Here, Sahara is used to offer a
simple means of provisioning a data processing framework to the cloud users.
Table 4.14 shows the dependencies of these services.
OpenStack majorly operates in two modes - single host and multi host. A single host mode of
operation is that in which the network services are based on a central server, whereas a multi
host operation mode is that in which each compute node has a duplicate copy of the network
running on it and the nodes act like Internet gateways that are running on individual nodes.
In addition to this, in a multi host operation mode, the compute nodes also individually host
floating IPs and security groups. On the other hand, in a single host mode of operation, floating
IPs and security groups are hosted on the cloud controller to enable communication.
Both single host and multi host modes of operation are widely used and have their own
sets of advantages and limitations. The single host mode has a major limitation: if
the cloud controller goes down, the entire system fails because instances
stop communicating. This is overcome by the multi host mode, in which a copy of the
network is provisioned to every node. However, the multi host mode requires a unique public
IP address for each compute node to enable communication; in case public IP addresses are
not available, using the multi host mode is not possible.
PART-A
7. Define GAE.
Google App Engine (GAE) is a Platform-as-a-Service cloud computing model that
supports many programming languages. GAE is a scalable runtime environment devoted
mostly to executing Web applications; it allows developers to integrate third-party
frameworks and libraries while the infrastructure is still managed by Google. It gives
developers a ready-made platform to develop and deploy web applications using
development tools, a runtime engine, databases and middleware solutions. It supports
languages like Java, Python, .NET, PHP, Ruby, Node.js and Go, in which developers can write
their code and deploy it on the available Google infrastructure with the help of a Software
Development Kit (SDK). In GAE, SDKs are required to set up your computer for developing,
deploying, and managing your apps in App Engine.
OpenStack is an open source highly scalable cloud computing platform that provides
tools for developing private, public or hybrid clouds, along with a web interface for
users to access resources and admins to manage those resources.
The different components of Openstack architecture are :
a. Nova (Compute)
b. Swift (Object storage)
c. Cinder (Block level storage)
d. Neutron (Networking)
e. Glance (Image Management)
f. Keystone (Identity management)
g. Horizon (Dashboard)
h. Ceilometer (Metering)
i. Heat (Orchestration)
• Version Control - With this feature, users don't need to worry about tracking the latest
version or who made any changes.
• Data Protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters, such as floods, earthquakes and human error.
• Disaster Recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.
The descriptions of popular cloud storage providers are given as follows :
• Google Bigtable : Bigtable is a distributed storage system built on large numbers of
commodity servers that can store petabytes of data together. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.
• The size of the Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is now open to developers as part of the Google App Engine, their
cloud computing platform.
• Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
to be synchronized between two or more computers on Windows or Mac OS
platforms. It supports mesh objects that consist of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
• Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS) to store data on premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup, or unstructured archives that need long-
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. CloudNAS has built-in disaster data recovery and an automatic
data replication feature for up to three geographically distributed storage nodes.
17. What is meant by Amazon Elastic Block Store (EBS) and SimpleDB?
• The Elastic Block Store (EBS) provides the volume block interface for saving and
restoring the virtual images of EC2 instances.
• Traditional EC2 instances will be destroyed after use. The status of EC2 can now be saved
in the EBS system after the machine is shut down. Users can use EBS to save persistent
data and mount to the running instances of EC2.
18. What is Amazon SimpleDB Service?
• SimpleDB provides a simplified data model based on the relational database data model.
Structured data from users must be organized into domains. Each domain can be
considered a table. The items are the rows in the table.
• A cell in the table is recognized as the value for a specific attribute (column name) of the
corresponding row. This is similar to a table in a relational database. However, it is
possible to assign multiple values to a single cell in the table.