Cloud Computing (UNIT - IV)
1. Hadoop - MapReduce
Hadoop is an open source software framework for developing data processing applications that run in a
distributed computing environment. The Hadoop architecture has two basic components:
The first is HDFS (Hadoop Distributed File System) for storage, which lets you store data of various formats
across a cluster.
The second is YARN, for resource management. It enables parallel processing of the data stored
across HDFS.
MapReduce is the core data processing component of the Hadoop framework. It is a processing technique built on
the divide-and-conquer approach and consists of two tasks, Map and Reduce. The Map task takes a set of data and converts
it into another set of data in which individual elements are broken down into tuples (key-value pairs). The Reduce task takes
the output of a map as its input and combines those tuples into a smaller set of tuples, which forms the result.
How does the MapReduce algorithm work? The whole process goes through four phases of execution: splitting,
mapping, shuffling, and reducing. The data passes through these phases in order:
Input Splits: In this phase the input (say, a data set) is divided into fixed-size pieces called input splits.
Mapping: This is the first processing phase in the execution of a MapReduce program. It treats the input splits as
smaller sub-tasks and performs the required computation on each sub-task in parallel. In a word-count job, the output of the
map function is a set of key-value pairs of the form <word, frequency>.
Shuffling: The shuffle function is sometimes called the “combine function” (not to be confused with Hadoop's optional
combiner, which pre-aggregates map output locally). It performs the following two sub-steps:
Merging
Sorting
This phase consumes the output of the mapping phase and applies these two sub-steps to every key-value pair.
o The merging step combines all key-value pairs that have the same key.
o The sorting step takes the output of the merging step and sorts all key-value pairs by key.
Finally, the shuffle function returns a sorted list of <Key, List<Value>> pairs to the next step.
Reducing: In this phase, the output values from the shuffling phase are aggregated. For each key, the reducer combines
the list of values into a single output value. In short, this phase summarizes the complete dataset.
Let's understand this with an example –
Suppose you have the following input data for your MapReduce program:
Welcome to Hadoop Class
Hadoop is good
Hadoop is bad
The final output of the MapReduce task is
bad 1
Class 1
good 1
Hadoop 3
is 2
to 1
Welcome 1
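The four phases described above can be sketched in plain Python. This is only a single-process illustration, not Hadoop code: the function names (`split_input`, `map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, each line is treated as one input split for simplicity, and keys are sorted case-insensitively so the result follows the same order as the output listed above.

```python
from collections import defaultdict


def split_input(text):
    """Input splits: here, one split per non-empty line (a simplification)."""
    return [line for line in text.splitlines() if line.strip()]


def map_phase(split):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffling: merge values that share a key, then sort by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)          # merging step
    # sorting step (case-insensitive, an assumption made for this sketch)
    return sorted(grouped.items(), key=lambda item: item[0].lower())


def reduce_phase(grouped):
    """Reducing: collapse each list of values into a single count."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(text):
    mapped = []
    for split in split_input(text):
        # On a real cluster the splits would be mapped in parallel.
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    data = "Welcome to Hadoop Class\nHadoop is good\nHadoop is bad"
    for word, count in word_count(data):
        print(word, count)
```

Note that `shuffle_phase` returns pairs of the form (key, list of values), which corresponds to the <Key, List<Value>> structure handed to the reducer.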
any 32-bit processor compatible with the x86 instruction set]. It acts as a hypervisor, creating a VM (virtual machine) in
which the user can run another OS (operating system).
The operating system on which VirtualBox runs is called the "host" OS. The operating system running in the VM
is called the "guest" OS. VirtualBox supports Windows, Linux, or macOS as its host OS.
Guest operating systems supported by VirtualBox include:
o Windows 10, 8, 7, XP, Vista, 2000, NT, and 98.
o Solaris and OpenSolaris
o MS-DOS.
o OS/2
o QNX
o BeOS
3. Google App Engine (GAE): Google App Engine is a Platform-as-a-Service offering. Among Google's various cloud-based products,
App Engine has become quite popular.
It is a service for developing and hosting web applications in Google's data centers, and it belongs to the platform as a
service (PaaS) category of cloud computing. Applications must be written in one of a few supported
languages, such as Java, Python, Go and PHP. It is essentially a cloud computing platform on which applications can
be run in a serverless environment. App Engine supports the development, testing and delivery of software on
demand in a cloud computing environment that is highly scalable and can serve millions of users.
Through App Engine, Google extends its own platform and infrastructure to cloud customers. It offers the
platform to those who want to develop SaaS solutions at competitive costs.
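To make "an application written in a supported language" concrete, here is a minimal sketch of a Python web application that uses only the standard library. It assumes the convention of App Engine's Python standard environment, where by default a WSGI callable named `app` in a file called `main.py` is served; treat that as an assumption and check the current runtime documentation. The greeting text is made up for illustration, and the accompanying deployment configuration is not shown.

```python
# main.py - a minimal WSGI application using only the standard library.
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Answer every request with a short plain-text greeting."""
    body = b"Hello from a hypothetical App Engine service\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Call the app once in-process with a fake request, without a server.
    environ = {}
    setup_testing_defaults(environ)
    print(app(environ, lambda status, headers: None))
```

For a local trial, the same `app` object can be served with `wsgiref.simple_server.make_server` from the standard library; on the platform itself, the hosting environment supplies the web server.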
Advantages of Google App Engine: Google App Engine has many advantages that help take your app
ideas to the next level. These include:
Infrastructure for Security: Google's Internet infrastructure is widely regarded as among the most secure in the world.
Application data and code are stored on highly secure servers, so unauthorized access is rare.
Faster Time to Market: Releasing a product or service to market quickly matters to every business.
Speeding up the development and maintenance of an app is critical to deploying the product fast. With
the help of Google App Engine, a business can quickly develop:
Feature-rich apps with a quick development process
The backend of an application in a PaaS-style environment
NoSQL-style flexible data storage, or Google Cloud SQL for relational database support.
Quick to Start: With no product or hardware to purchase and maintain, you can prototype and deploy the app to your
users without taking much time.
Easy to Use: Google App Engine (GAE) incorporates the tools that you need to develop, test, launch, and update the
applications.
Rich set of APIs & Services:
Google App Engine has several built-in APIs and services that allow developers to build robust and feature-rich apps.
These features include:
Access to the application log
Blobstore, to serve large data objects
Google Cloud Storage
SSL Support
Page Speed Services
Google Cloud Endpoints, for mobile applications
URL Fetch API, User API, Memcache API, Channel API, XMPP API, File API
Platform Independence: You can move all your data to another environment without any difficulty as there are not
many dependencies on the app engine platform.
Cost Savings: You don’t have to hire engineers to manage your servers or manage them yourself. You can invest the money
saved into other parts of your business.
Performance and Reliability: Google is one of the world's leading technology brands, which is worth keeping in mind
when discussing performance and reliability. Over the past 15 years, the company has set new
benchmarks with the performance of its services and products. App Engine is designed to provide the same reliability and
performance as other Google products.
your applications from security threats using App Engine firewall capabilities, identity and access management (IAM)
rules, and managed SSL/TLS certificates.
Pay only for what you use: Choose to run your applications in a serverless environment without worrying about over-
or under-provisioning. App Engine scales automatically with your application traffic and consumes
resources only when your code is running. You pay only for the resources you consume.
Features:
Popular languages
Open and flexible
Fully managed
Monitoring, logging, and diagnostics
Application versioning
Traffic splitting
Application security
Services ecosystem
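The "Application versioning" and "Traffic splitting" features listed above can be illustrated with a small sketch of the general idea: several versions of an application are deployed side by side and each receives a fixed share of requests. This is not App Engine's actual mechanism. The function `pick_version`, the version names and the user IDs are all hypothetical, and hashing the user ID is simply one way to keep a given user on the same version.

```python
import hashlib


def pick_version(user_id, weights):
    """Map a user ID to a version name, honouring fractional weights.

    `weights` maps version name -> share of traffic; shares must sum to 1.
    The same user ID always lands on the same version.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, share in sorted(weights.items()):
        cumulative += share
        if point < cumulative:
            return version
    return version  # guard against floating-point rounding at the top end


if __name__ == "__main__":
    split = {"v1": 0.9, "v2": 0.1}  # hypothetical 90/10 rollout
    hits = {"v1": 0, "v2": 0}
    for i in range(10_000):
        hits[pick_version(f"user-{i}", split)] += 1
    print(hits)
```

Sending a small share of traffic to a new version first makes it possible to observe its behaviour before shifting all users to it.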
5. OpenStack:
OpenStack is an open source cloud computing platform that allows businesses to control large pools of compute,
storage and networking resources in a data centre. It uses pooled virtual resources to build and manage private and public clouds.
OpenStack is thus an Infrastructure-as-a-Service (IaaS) solution that consists of a set of interrelated services.
OpenStack is highly configurable: there are many different ways to use it, which makes it a
flexible tool that can work alongside other software.
Another reason to adopt OpenStack is that it supports different hypervisors (Xen, VMware or kernel-based
virtual machine [KVM], for instance) and several deployment environments (such as bare metal or high-performance
computing).
OpenStack components: The OpenStack cloud platform is not a single product but a group of software modules that
serve different purposes. The components are shaped by open source contributions from the developer
community, and adopters can implement some or all of them. Key OpenStack components, by category,
include:
o Compute: “Nova”, a full management and access tool for OpenStack compute resources that handles
scheduling, creation, and deletion.
o Storage: “Swift”, an object storage service.
o Networking and content delivery: “Neutron”, which connects networks across other OpenStack services.
o Data and analytics: “Searchlight”, a data indexing and search service.
o Security and compliance: “Barbican”, a management service for passwords, encryption keys and X.509
certificates.
o Deployment: “Kolla”, a service for container deployment.
o Management: “Rally”, an OpenStack benchmark service.
o Applications: “Solum”, a software development tool.
o Monitoring: “Monasca”, a high-speed metrics monitoring and alerting service.
OpenStack pros and cons:
► Avoids vendor lock-in - Vendor lock-in makes a customer dependent on one vendor for products and services,
unable to use another vendor without substantial switching costs. The most common vendor lock-in is the operating
system: when custom programs are written for a specific operating system, it is time-consuming and costly to convert
those programs to another platform. Because OpenStack is open source, it helps avoid this dependency.
► Strong security - OpenStack includes robust security features to help protect your cloud.
► Open-source - Being open source makes OpenStack a favourite cloud platform among developers and
entrepreneurs. You can change OpenStack according to your growing needs. Because it is open source, you can
always add extra features, which makes it very flexible software. OpenStack is also free of cost, and there are
no restrictions on its use.
► Development support - OpenStack has received solid development support from many
prominent companies and top developers in the IT industry for many years.
► An array of services for different tasks.
► Easy to access and manage.
But potential enterprise adopters must also consider some drawbacks.
Perhaps the biggest disadvantage of OpenStack is its very size and scope -- such complexity requires an IT staff to have
significant knowledge to deploy the platform and make it work. In some cases, an organization might require additional
staff or a consulting firm to deploy OpenStack, which adds time and cost.
As open source software, OpenStack is not owned or directed by any one vendor or team. This can make it
difficult to obtain support for the technology -- other than support from the open source community.
Consistency and access controls are managed when two or more independent geographically distributed clouds share
authentication, files, computing resources, control structures or access to storage resources. This means that the right
information must flow from one cloud to the other and vice-versa.
There are four basic types of federation: 1) Permissive 2) Verified 3) Encrypted 4) Trusted
What happens in a Federated Cloud?
In a federated cloud, the boundary between two clouds is always present. However, the elements of the boundary that
prevent the interoperability of the two clouds are removed. The relevancy and visibility depend on who is performing
which action to complete a task.
CLOUD FEDERATION BENEFITS:
1) The federation of cloud resources allows a client to optimize enterprise IT service delivery.
2) The federation of cloud resources allows a client to choose the best cloud service providers, in terms of
flexibility, cost and availability of services, to meet a particular business or technological need within their organization.
3) Federation across different cloud resource pools allows applications to run in the most appropriate
infrastructure environments.
4) The federation of cloud resources allows an enterprise to distribute workloads around the globe, move data
between disparate networks and implement innovative security models for user access to cloud resources.
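Benefit 2 above, choosing a provider by flexibility, cost and availability, can be pictured as a simple weighted scoring exercise. The following is a minimal sketch under stated assumptions: the `Provider` class, the `best_provider` function, the provider names and all numbers are hypothetical, and a real selection would depend on the organization's own criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    flexibility: float   # 0..1, higher is better (hypothetical rating)
    cost: float          # price per hour, lower is better
    availability: float  # fraction of time the service is up


def best_provider(providers, w_flex=1.0, w_cost=1.0, w_avail=1.0):
    """Return the provider with the highest weighted score."""
    if not providers:
        raise ValueError("no providers to choose from")
    max_cost = max(p.cost for p in providers)

    def score(p):
        # Convert cost into a "cheapness" value so that higher is better.
        cheapness = 1.0 - p.cost / max_cost if max_cost > 0 else 1.0
        return w_flex * p.flexibility + w_cost * cheapness + w_avail * p.availability

    return max(providers, key=score)


if __name__ == "__main__":
    federation = [
        Provider("provider-a", flexibility=0.6, cost=0.10, availability=0.999),
        Provider("provider-b", flexibility=0.9, cost=0.20, availability=0.995),
        Provider("provider-c", flexibility=0.4, cost=0.05, availability=0.990),
    ]
    print(best_provider(federation).name)              # equal weights
    print(best_provider(federation, w_flex=3.0).name)  # flexibility matters most
```

Changing the weights changes the winner, which reflects the point of the benefit: within a federation, different business or technological needs can be matched to different providers.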
6.1 Levels of federation: Each cloud federation level presents different challenges and operates at a different layer of
the IT stack. Each therefore requires different approaches and technologies. Taken together, the solutions to the
challenges faced at each of these levels constitute a reference model for a cloud federation.
CONCEPTUAL LEVEL: The conceptual level addresses the challenges in presenting a cloud federation as a favorable
solution compared with the use of services leased from a single cloud provider. At this level it is important to clearly
identify the advantages for both service providers and service consumers in joining a federation and to describe the
new opportunities that a federated environment creates compared with the single-provider solution.
Elements of concern at this level are:
Motivations for cloud providers to join a federation.
Motivations for service consumers to leverage a federation.
Advantages for providers in leasing their services to other providers.
Obligations of providers once they have joined the federation.
Trust agreements between providers.
Transparency toward consumers.
LOGICAL & OPERATIONAL LEVEL: The logical and operational level of a federated cloud identifies and addresses the
challenges in devising a framework that enables the aggregation of providers that belong to different administrative
domains within a context of a single overlay infrastructure, which is the cloud federation.
At this level, policies and rules for interoperation are defined. Moreover, this is the layer at which decisions are made as
to how and when to lease a service to—or to leverage a service from— another provider.
The logical component defines a context in which agreements among providers are settled and services are negotiated,
whereas the operational component characterizes and shapes the dynamic behaviour of the federation as a result of
the single providers’ choices.
This is the level where Market-Oriented Cloud Computing (MOCC) is implemented and realized. It is important at
this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud provider, or an agreement?
• How should we define the rules and policies that allow providers to join a federation?
• What are the mechanisms in place for settling agreements among providers?
• What are the providers’ responsibilities with respect to each other?
• When should providers and consumers take advantage of the federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which fraction of resources should we lease?
The logical and operational level provides opportunities for both academia and industry.
INFRASTRUCTURE LEVEL: The infrastructural level addresses the technical challenges involved in enabling
heterogeneous cloud computing systems to interoperate seamlessly. It deals with the technology barriers that keep
separate cloud computing systems belonging to different administrative domains. By having standardized protocols and
interfaces, these barriers can be overcome.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which technologies should be used for interoperation?
• How can we realize a software system, design platform components, and services enabling interoperability?
Server. This allows you to authenticate external users without having to let unauthenticated traffic into
your internal network
Attribute Stores: The attribute store is where the values used for the claims are stored. After authentication, the
STS will query the attribute store to find the appropriate user information needed to set the claims and create the
token.
Relying Parties: The relying party is the consumer of the claims created by the STS. Since ADFS supports both active
and passive clients, the relying parties can be web applications or web services. The STS must be configured with
the configuration information for each relying party that it will support.
Endpoints: Endpoints are used to provide access to services on the federation server. There are several types of
endpoints that can be used with ADFS including WS-Trust 1.3, WS-Trust 2005, WS-Federation Passive, SAML SSO,
Federation Metadata, SAML Artifact Resolution, and WS-Trust WSDL.
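The interplay between the attribute store, the STS and a relying party described above can be sketched as follows. This is only an illustration of the flow, not ADFS code: the store contents, the relying-party identifier, the key and the function names are hypothetical, and an HMAC over a JSON payload stands in for the certificate-based token signing that a real federation server would use.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical data: an attribute store, per-relying-party claim
# configuration, and a demo signing key shared with the relying party.
ATTRIBUTE_STORE = {"alice": {"email": "alice@example.com", "role": "auditor"}}
RELYING_PARTIES = {"urn:example:reports": ["email", "role"]}
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(user, relying_party):
    """STS side: build and sign a claims token for an authenticated user."""
    wanted = RELYING_PARTIES[relying_party]   # relying-party configuration
    attributes = ATTRIBUTE_STORE[user]        # attribute-store lookup
    claims = {name: attributes[name] for name in wanted}
    payload = json.dumps(
        {"sub": user, "aud": relying_party, "claims": claims}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_token(token):
    """Relying-party side: check the signature, then read the claims."""
    payload_b64, signature_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    signature = base64.urlsafe_b64decode(signature_b64)
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch")
    return json.loads(payload)


if __name__ == "__main__":
    token = issue_token("alice", "urn:example:reports")
    print(verify_token(token)["claims"])
```

The relying party never queries the attribute store itself; it only trusts the claims inside a token whose signature it can verify, which is why the STS must hold configuration for each relying party it supports.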
6.2.3 Future of federation: Cloud federation remains an open issue in the current cloud market. Cloud federation
would address many existing limitations in cloud computing:
Cloud end users are often tied to a single cloud provider, because the different APIs, image formats, and
access methods exposed by different providers make it very difficult for an average user to move
applications from one cloud to another, leading to a vendor lock-in problem.
Many big companies (e.g. banks, hosting companies, etc.) and many large institutions maintain several
distributed data centers or server farms, for example to serve multiple geographically distributed offices.
Resources and networks in these distributed data centers are usually configured as non-cooperative, separate
elements, so that every single service or workload is usually deployed at a single site or replicated across multiple
sites.
Many educational and research centers deploy their own computing infrastructures, which usually do not
cooperate with those of other institutions, except in some specific situations (e.g. in joint projects or initiatives).
Often, even different departments within the same institution maintain their own non-cooperative
infrastructures.
This Study Group will evaluate the main challenges in enabling the provision of federated cloud infrastructures, with
special emphasis on inter-cloud networking and security issues:
-Security and Privacy
-Interoperability and Portability
-Performance and Networking Cost