
Cloud Computing (UNIT - IV)

cloud computing notes

Uploaded by

Safalta Singh
Copyright
© All Rights Reserved

Cloud computing (UNIT – V)

1. Hadoop - MapReduce
Hadoop is an open-source software framework for developing data processing applications that run in a
distributed computing environment. The Hadoop architecture has two basic components:
The first is HDFS (Hadoop Distributed File System) for storage, which lets you store data of various formats
across a cluster.
The second is YARN, for resource management in Hadoop. It enables parallel processing of the data stored
across HDFS.

Fig. : Hadoop Framework

MapReduce is the core data processing component of the Hadoop framework. It is a processing technique built on the
divide-and-conquer approach and consists of two tasks: Map and Reduce. The Map task takes a set of data and converts
it into another set of data, in which individual elements are broken down into tuples (key-value pairs). The Reduce task
takes the output of a map as its input and combines those tuples into a smaller set of tuples, which forms the result.

How does the MapReduce algorithm work? The whole process goes through four phases of execution: splitting,
mapping, shuffling, and reducing.
Input Splits: In this phase the input (say, a data set) is divided into fixed-size pieces called input splits.
Mapping: This is the first processing phase in the execution of a MapReduce program. It takes the input splits as
smaller sub-tasks and performs the required computation on each sub-task in parallel. The output of the
Map function is a set of key-value pairs; in a word-count job these have the form <word, frequency>.

Shuffling: The Shuffle function is sometimes loosely called the “Combine function” (not to be confused with Hadoop's
optional Combiner, which pre-aggregates map output locally). It performs the following two sub-steps:
• Merging
• Sorting
This phase consumes the output of the mapping phase and applies these two sub-steps to every key-value pair.
o The merging step combines all key-value pairs that have the same key.
o The sorting step takes the output of the merging step and sorts all key-value pairs by key.
Finally, the Shuffle function returns a sorted list of <Key, List<Value>> pairs to the next step.
Reducing: In this phase, the output values from the shuffling phase are aggregated. For each key, the list of values
is combined into a single output value. In short, this phase summarizes the complete dataset.
Let's understand this with an example –
Consider you have following input data for your Map Reduce Program
Welcome to Hadoop Class
Hadoop is good
Hadoop is bad

The final output of the MapReduce task is
bad 1
Class 1
good 1
Hadoop 3
is 2
to 1
Welcome 1
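
The four phases can be sketched in a few lines of plain Python. This is only a single-process illustration of the idea, not Hadoop's API: the function names (split_input, map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, one input split per line is an assumption, and the case-insensitive sort is chosen only so the result matches the ordering of the output listed above.

```python
from collections import defaultdict


def split_input(lines):
    """Input splits: one split per non-empty line (an assumption for illustration)."""
    return [line for line in lines if line.strip()]


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: merge values that share a key, then sort the groups by key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    # Case-insensitive sort so the order matches the example output.
    return sorted(grouped.items(), key=lambda item: item[0].lower())


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run split -> map -> shuffle -> reduce over the given lines."""
    mapped = []
    for split in split_input(lines):
        mapped.extend(map_phase(split))
    return [reduce_phase(key, values) for key, values in shuffle_phase(mapped)]


if __name__ == "__main__":
    data = ["Welcome to Hadoop Class", "Hadoop is good", "Hadoop is bad"]
    for word, count in word_count(data):
        print(word, count)
```

For the three input lines of the example, shuffle_phase produces pairs such as ("Hadoop", [1, 1, 1]), and reduce_phase turns each of them into a single count such as ("Hadoop", 3). In a real cluster the map and reduce calls run on different nodes; here they run one after another in a single process.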

2. VirtualBox: VirtualBox is open-source software for virtualizing the x86 computing architecture [x86 is the Intel
CPU architecture. Today, the term "x86" is used generally to refer to any 32-bit processor compatible with the x86
instruction set]. It acts as a hypervisor, creating a VM (virtual machine) in which the user can run another OS
(operating system).
The operating system on which VirtualBox runs is called the "host" OS. The operating system running in the VM
is called the "guest" OS. VirtualBox supports Windows, Linux, or macOS as its host OS.
Guest operating systems supported by VirtualBox include:
o Windows 10, 8, 7, XP, Vista, 2000, NT, and 98.
o Solaris and OpenSolaris
o MS-DOS.
o OS/2
o QNX
o BeOS

3. Google App Engine (GAE): Google App Engine is a Platform-as-a-Service (PaaS) offering and one of the more popular
of Google's cloud-based products.
It is a service for developing and hosting web applications in Google's data centers. Applications must be written in
one of the supported languages, such as Java, Python, Go, or PHP. It is essentially a cloud computing platform on which
applications run in a serverless environment. App Engine supports the development, testing, and delivery of software on
demand in a cloud computing environment that is highly scalable and can serve millions of users.
Through App Engine, Google extends its platform and infrastructure to the cloud and offers that
platform to those who want to develop SaaS solutions at competitive costs.
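
To make "hosting a web application" concrete, here is a minimal sketch of the kind of Python web application a PaaS can host. It uses only the standard WSGI interface from the Python standard library; it is not App Engine's own API, and the deployment configuration that App Engine expects is deliberately not shown. The names app and call_app are hypothetical.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: answers every request with plain text."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the app in-process with a fake request, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


if __name__ == "__main__":
    # On a PaaS the platform supplies the web server; here we call the app directly.
    print(call_app("/hello"))
```

The point of the sketch is the division of labour: the developer writes only the request-handling function, while the platform is responsible for running it behind a web server and scaling it.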

Features of App Engine


A. Runtimes and Languages: You can use Go, Java, PHP, or Python to write an App Engine application. You can develop
and test an app locally using the SDK, which contains tools for deploying apps. Every language has its own SDK and runtime.
Your code is executed in a:
• Java 7 environment by the Java runtime
• Python 2.7 environment by the Python runtime
• PHP 5.4 environment by the PHP runtime
• Go 1.2 environment by the Go runtime
B. Generally Available Features: These are covered by the deprecation policy and the service-level agreement of
App Engine. Any changes made to such a feature are backward-compatible, and the implementation of such a feature is
usually stable. These include data storage, retrieval, and search; communications; process management; computation;
app configuration and management.
C. Features in Preview: These features are intended to become generally available in some future release of
App Engine. However, their implementation might change in backward-incompatible ways while they
are in preview. These include Sockets, MapReduce, and the Google Cloud Storage Client Library.
D. Experimental Features: These might or might not become generally available in app engine releases in the future.
The experimental features include Appstats Analytics, Restore/Backup/Datastore Admin, Task Queue Tagging,
MapReduce, Task Queue REST API, OAuth, Prospective Search, PageSpeed and OpenID.

Advantages of Google App Engine: Google App Engine has many advantages that help take your app
ideas to the next level. These include:

Infrastructure for Security: Google's Internet infrastructure is widely regarded as among the most secure in the world.
Application data and code are stored on highly secure servers, and unauthorized access has rarely been reported
to date.
Faster Time to Market: Releasing a product or service to market quickly is important for every business.
Speeding up the development and maintenance of an app is critical when it comes to deploying the product fast. With
the help of Google App Engine, a business can quickly develop:
• Feature-rich apps with a quick development process
• The backend application in a PaaS-style environment
• NoSQL-style storage, flexible data storage, or Google Cloud SQL for relational database support.
Quick to Start: With no product or hardware to purchase and maintain, you can prototype and deploy the app to your
users without taking much time.
Easy to Use: Google App Engine (GAE) incorporates the tools that you need to develop, test, launch, and update the
applications.
Rich set of APIs & Services:
Google App Engine has several built-in APIs and services that allow developers to build robust and feature-rich apps.
These features include:
• Access to the application log
• Blobstore, for serving large data objects
• Google Cloud Storage
• SSL support
• PageSpeed services
• Google Cloud Endpoints, for mobile applications
• URL Fetch API, User API, Memcache API, Channel API, XMPP API, File API
Platform Independence: You can move all your data to another environment without any difficulty as there are not
many dependencies on the app engine platform.
Cost Savings: You don’t have to hire engineers to manage your servers or to do that yourself. You can invest the money
saved into other parts of your business.
Performance and Reliability: Google is one of the leading global technology brands, which is worth keeping in mind
when assessing performance and reliability. Over the past 15 years, the company has set new
benchmarks with the performance of its products and services. App Engine provides the same reliability and
performance as other Google products.

4. Programming Environment for GAE :


Build and deploy applications on a fully managed platform. Scale your applications seamlessly from zero to planet scale
without having to worry about managing the underlying infrastructure. With zero server management and zero
configuration deployments, developers can focus only on building great applications without the management
overhead. App Engine enables developers to stay more productive and agile by supporting popular development
languages and a wide range of developer tools.
Open and familiar languages and tools: Quickly build and deploy applications using many of the popular languages, such as
Java, PHP, Node.js, Python, C#, .NET, Ruby, and Go, or bring your own language runtimes and frameworks if you choose.
Manage resources from the command line, debug source code in production, and run API backends easily, using
industry-leading tools such as Cloud SDK, Cloud Source Repositories, IntelliJ IDEA, Visual Studio, and PowerShell.
Just add code: Focus on writing code without worrying about managing the underlying infrastructure. With
capabilities such as automatic scaling of your application up and down between zero and planet scale, and fully
managed patching and management of your servers, you can offload your infrastructure concerns to Google. Protect
your applications from security threats using App Engine firewall capabilities, identity and access management (IAM)
rules, and managed SSL/TLS certificates.
Pay only for what you use: Choose to run your applications in a serverless environment without the worry of
over- or under-provisioning. App Engine automatically scales with your application traffic and consumes
resources only when your code is running. You pay only for the resources you consume.
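
The pay-per-use idea can be illustrated with a little arithmetic. The sketch below is a toy model, not App Engine's actual scaling or billing algorithm: the per-instance capacity, the hourly traffic figures, and the price are made-up numbers, and all function names are hypothetical. It compares instance-hours consumed when capacity follows traffic with instance-hours when capacity is fixed at the peak.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance):
    """Instances required for the load; zero traffic means zero instances."""
    if requests_per_second <= 0:
        return 0
    return math.ceil(requests_per_second / capacity_per_instance)


def usage_cost(hourly_traffic, capacity_per_instance, price_per_instance_hour):
    """Instance-hours and cost when capacity follows traffic hour by hour."""
    instance_hours = sum(
        instances_needed(rps, capacity_per_instance) for rps in hourly_traffic
    )
    return instance_hours, instance_hours * price_per_instance_hour


def peak_provisioned_hours(hourly_traffic, capacity_per_instance):
    """Instance-hours when enough instances for the peak run the whole time."""
    peak = max(instances_needed(rps, capacity_per_instance) for rps in hourly_traffic)
    return peak * len(hourly_traffic)


if __name__ == "__main__":
    traffic = [0, 0, 40, 120, 250, 90, 0, 0]  # requests/second, one value per hour (made up)
    hours, cost = usage_cost(traffic, capacity_per_instance=50,
                             price_per_instance_hour=0.05)
    print("scaled instance-hours:", hours, "cost:", round(cost, 2))
    print("fixed instance-hours:", peak_provisioned_hours(traffic, 50))
```

With these made-up figures, following the traffic uses 11 instance-hours, while provisioning for the peak uses 40. That gap is what "consumes resources only when your code is running" is meant to capture.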
Features:
Popular languages
Open and flexible
Fully managed
Monitoring, logging, and diagnostics
Application versioning
Traffic splitting
Application security
Services ecosystem

5. Open Stack:
OpenStack is an open-source cloud computing platform that allows businesses to control large pools of compute,
storage, and networking resources in a data centre. It uses pooled virtual resources to build and manage private and public clouds.
OpenStack is therefore an Infrastructure-as-a-Service (IaaS) solution that consists of a set of interrelated services.
OpenStack is highly configurable: there are many different ways to use it, which makes it a
flexible tool that can work alongside other software.
Another reason to adopt OpenStack is that it supports different hypervisors (Xen, VMware, or kernel-based
virtual machine [KVM], for instance) and other deployment scenarios (such as bare metal or high-performance
computing).
OpenStack components: The OpenStack cloud platform is not a single thing, but a group of software modules that
serve different purposes. OpenStack components are shaped by open source contributions from the developer
community, and adopters can implement some or all of these components. Key OpenStack components, by category,
include:
o Compute: “Nova” is a full management and access tool for OpenStack compute resources, handling
scheduling, creation, and deletion.
o Storage: “Swift” is an object storage service.
o Networking and content delivery: “Neutron” provides network connectivity between other OpenStack services.
o Data and analytics: “Searchlight” is a data indexing and search service.
o Security and compliance: “Barbican” is a management service for passwords, encryption keys, and X.509
certificates.
o Deployment: “Kolla” is a service for container deployment.
o Management: “Rally” is an OpenStack benchmark service.
o Applications: “Solum” is a software development tool.
o Monitoring: “Monasca” is a high-speed metrics monitoring and alerting service.
OpenStack pros and cons:
► Avoids vendor lock-in - Vendor lock-in makes a customer dependent on a vendor for products and services,
unable to use another vendor without substantial switching costs. The most common vendor lock-in is the operating
system: when custom programs are written for a specific operating system, it is time consuming and costly to convert
those programs to another platform. Because OpenStack is not owned by any one vendor, it helps avoid this dependency.
► Strong security - OpenStack includes security features (for example, the Barbican key management service listed
above) that help keep deployments secure.
► Open source - OpenStack is open source, which makes it a favourite cloud platform among developers and
entrepreneurs. You can adapt OpenStack to your growing needs and add extra features at any time, which makes it
very flexible software. OpenStack is free of cost, and there are no restrictions on using it.
► Development support - OpenStack has received solid development support from many
prestigious companies and from top developers in the IT industry for many years.
► An array of services for different tasks.
► OpenStack is easy to access and manage.
But potential enterprise adopters must also consider some drawbacks.
Perhaps the biggest disadvantage of OpenStack is its very size and scope -- such complexity requires an IT staff to have
significant knowledge to deploy the platform and make it work. In some cases, an organization might require additional
staff or a consulting firm to deploy OpenStack, which adds time and cost.
As open source software, OpenStack is not owned or directed by any one vendor or team. This can make it
difficult to obtain support for the technology -- other than support from the open source community.

6. Federation in the cloud:


A cloud federation is the deployment and management of multiple external and internal cloud computing services
to match business needs. It means that the functions and resources of two geographically different clouds are
completely available to each other. A federation is the union of several smaller parts that perform a common action.

Consistency and access controls are managed when two or more independent geographically distributed clouds share
authentication, files, computing resources, control structures or access to storage resources. This means that the right
information must flow from one cloud to the other and vice-versa.
There are four basic types of federation: 1) Permissive 2) Verified 3) Encrypted 4) Trusted
What happens in a Federated Cloud?
In a federated cloud, the boundary between two clouds is always present, but the elements of the boundary that
prevent the two clouds from interoperating are removed. What is relevant and visible depends on who is performing
which kind of action to complete a task.
CLOUD FEDERATION BENEFITS:
1) The federation of cloud resources allows a client to optimize enterprise IT service delivery.
2) The federation of cloud resources allows a client to choose the best cloud service providers, in terms of
flexibility, cost, and availability of services, to meet a particular business or technological need within their organization.
3) Federation across different cloud resource pools allows applications to run in the most appropriate
infrastructure environments.
4) The federation of cloud resources allows an enterprise to distribute workloads around the globe, move data
between disparate networks, and implement innovative security models for user access to cloud resources.
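
Benefit 2 (choosing the best provider by cost and availability) can be sketched as a simple selection over the offers visible in a federation. This is a hypothetical model for illustration only: the Offer fields, the provider names, the prices, and the selection rule (cheapest offer that meets the constraints) are all assumptions, not part of any federation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One provider's offer inside the federation (all fields are illustrative)."""
    provider: str
    region: str
    price_per_hour: float
    availability: float  # fraction of time the service is up, e.g. 0.999


def choose_offer(offers, min_availability, allowed_regions=None):
    """Return the cheapest offer meeting the availability and region constraints."""
    candidates = [
        offer for offer in offers
        if offer.availability >= min_availability
        and (allowed_regions is None or offer.region in allowed_regions)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda offer: offer.price_per_hour)


if __name__ == "__main__":
    offers = [
        Offer("internal-cloud", "eu", 0.12, 0.995),
        Offer("cloud-a", "eu", 0.09, 0.999),
        Offer("cloud-b", "us", 0.07, 0.9999),
    ]
    print(choose_offer(offers, min_availability=0.999))
    print(choose_offer(offers, min_availability=0.999, allowed_regions={"eu"}))
```

Without a region constraint the cheapest qualifying offer comes from cloud-b; restricting the workload to the "eu" region selects cloud-a instead. A real federation broker would weigh many more factors, such as data transfer cost, trust agreements, and compliance rules.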

6.1 Levels of federation: Each cloud federation level presents different challenges and operates at a different layer of
the IT stack. Each level therefore requires different approaches and technologies. Taken together, the solutions to the
challenges faced at each of these levels constitute a reference model for a cloud federation.
CONCEPTUAL LEVEL: The conceptual level addresses the challenges in presenting a cloud federation as a favorable
solution with respect to the use of services leased by single cloud providers. In this level it is important to clearly
identify the advantages for either service providers or service consumers in joining a federation and to describe the
new opportunities that a federated environment creates with respect to the single-provider solution.
Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage (make use of) a federation.
• Advantages for providers in leasing their services to other providers.
• Obligations of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency toward consumers.
LOGICAL & OPERATIONAL LEVEL: The logical and operational level of a federated cloud identifies and addresses the
challenges in devising a framework that enables the aggregation of providers that belong to different administrative
domains within a context of a single overlay infrastructure, which is the cloud federation.
At this level, policies and rules for interoperation are defined. Moreover, this is the layer at which decisions are made as
to how and when to lease a service to—or to leverage a service from— another provider.
The logical component defines a context in which agreements among providers are settled and services are negotiated,
whereas the operational component characterizes and shapes the dynamic behaviour of the federation as a result of
the single providers’ choices.
This is the level where market-oriented cloud computing (MOCC) is implemented and realized. It is important at
this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud provider, or an agreement?
• How should we define the rules and policies that allow providers to join a federation?
• What are the mechanisms in place for settling agreements among providers?
• What are providers' responsibilities with respect to each other?
• When should providers and consumers take advantage of the federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which fraction of resources should we lease?
The logical and operational level provides opportunities for both academia and industry.
INFRASTRUCTURE LEVEL: The infrastructural level addresses the technical challenges involved in enabling
heterogeneous cloud computing systems to interoperate seamlessly. It deals with the technology barriers that keep
cloud computing systems belonging to different administrative domains separate. These barriers can be overcome
by using standardized protocols and interfaces.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which technologies should be used for interoperation?
• How can we realize a software system, design platform components, and services enabling interoperability?

6.2 Federated Services and Applications:


Active Directory Federation Services (AD FS) enables federated identity and access management by securely sharing
digital identity and entitlement rights across security and enterprise boundaries. In ADFS, an identity federation is
established between two organizations. On one side is the federation server, which authenticates the user through
standard means using Active Directory and issues a token containing the user's claims. On the other side is
the resource: its federation service validates this token and accepts the claimed identity. This allows the federation to
give a user access to resources that belong to another secure server.
It provides a secure, reliable, scalable, and extensible identity federation solution.
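
The issue-then-validate flow described above can be sketched as follows. This is a deliberately simplified illustration and not how AD FS is implemented: real deployments use SAML or WS-Federation tokens signed with X.509 certificates, whereas this sketch uses an HMAC over a shared key to stand in for the trust between the two sides. DIRECTORY, TRUST_KEY, and every function name are hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret standing in for the federation trust relationship.
TRUST_KEY = b"demo-trust-key"

# Hypothetical stand-in for the directory the federation server authenticates against.
DIRECTORY = {
    "alice": {"password": "s3cret", "claims": {"name": "alice", "role": "engineer"}},
}


def issue_token(claims, key=TRUST_KEY):
    """Serialize the claims and sign them so the other side can detect tampering."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload + "." + signature


def federation_server_login(username, password):
    """Account side: authenticate the user, then issue a token with their claims."""
    entry = DIRECTORY.get(username)
    if entry is None or not hmac.compare_digest(
        entry["password"].encode("utf-8"), password.encode("utf-8")
    ):
        return None
    return issue_token(entry["claims"])


def validate_token(token, key=TRUST_KEY):
    """Resource side: check the signature and return the claims, or None if invalid."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    token = federation_server_login("alice", "s3cret")
    print(validate_token(token))
```

The resource side never sees the user's password. It only checks that the token was signed by a party it trusts and then reads the claims, which is the essence of claims-based federation.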
6.2.1 ADFS Functionality: ADFS 2.0 uses a true claims-based approach to authentication, authorization, and federation.
ADFS takes a standards-based approach to implementing functionality, which allows greater interoperability with other
token services and claims-based IdPs (identity providers).
• Claims-Based Authentication Clients: ADFS provides full claims-based authentication (CBA) functionality by
supporting both active and passive clients. Passive clients are generally used in website-based activities; most web
browsers have built-in passive CBA client functionality. Active clients are mostly used
with web services and are usually developed using the Windows Identity Foundation framework.
• Security Assertion Markup Language (SAML): In order to provide standard token support, ADFS supports the
use of SAML. This makes it compatible with a wide range of federation technologies. It can interoperate with
virtually any implementation that adheres to the SAML standard.
• Federation with Other Security Token Services: ADFS supports federation with other Security Token Services
(STSs). This allows you to trust tokens that were generated by another issuer. The federation server then
performs a token transformation: it pulls the claims from the incoming token and uses them
to create a token of its own. The new token can then be used by relying parties that trust your STS.
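
The token transformation step can be sketched in the same simplified style. Again, this is an illustration under stated assumptions rather than AD FS behaviour: tokens are HMAC-signed JSON instead of SAML assertions, and PARTNER_KEY, OWN_KEY, CLAIM_MAP, and the function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

PARTNER_KEY = b"partner-sts-key"  # hypothetical key of the trusted external issuer
OWN_KEY = b"own-sts-key"          # hypothetical key of our own STS

# Hypothetical mapping from the partner's claim names to our local claim names.
CLAIM_MAP = {"mail": "email", "group": "role"}


def sign_claims(claims, key):
    """Serialize and sign a set of claims (a stand-in for issuing a real token)."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload + "." + signature


def verify_token(token, key):
    """Return the claims if the signature matches the given key, else None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


def transform_token(incoming_token):
    """Validate the partner's token, map its claims, and re-issue under our own key."""
    claims = verify_token(incoming_token, PARTNER_KEY)
    if claims is None:
        raise ValueError("incoming token was not issued by a trusted STS")
    # Keep only the claims we know how to map; everything else is dropped.
    local_claims = {
        CLAIM_MAP[name]: value for name, value in claims.items() if name in CLAIM_MAP
    }
    local_claims["issuer"] = "own-sts"
    return sign_claims(local_claims, OWN_KEY)


if __name__ == "__main__":
    incoming = sign_claims(
        {"mail": "a@partner.example", "group": "sales", "internal_id": "42"},
        PARTNER_KEY,
    )
    print(verify_token(transform_token(incoming), OWN_KEY))
```

Relying parties only need to trust OWN_KEY: they never handle the partner's token directly, and claims that have no local mapping (internal_id in this sketch) do not leak into the new token.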
6.2.2 ADFS Components: An ADFS 2.0 implementation includes several key components, each of which plays a
different role in providing the total solution. They include the federation
servers, the attribute store, relying parties, and endpoints.
• Federation Service: The Federation Service is one of the key components of an ADFS environment. The federation
server is the server that manages the tokens; it is the server where the STS is installed. The Federation Service
manages the trust relationships with the relying parties and with other IdPs. The federation server can be
configured using the Federation Server Configuration Wizard.
• Federation Proxy Servers: Federation Proxy Servers allow external users access to your internal ADFS environment.
A Federation Proxy Server can be installed in your DMZ (a demilitarized zone is a host or network that
acts as a secure intermediate network or path between an organization's internal network and the external network).
External users authenticate against the proxy, and the proxy forwards the requests to your internal Federation
Server. This allows you to authenticate external users without having to let unauthenticated traffic into
your internal network.
• Attribute Stores: The attribute store is where the values used for the claims are stored. After authentication, the
STS queries the attribute store to find the user information needed to set the claims and create the
token.
• Relying Parties: The relying party is the consumer of the claims created by the STS. Since ADFS supports both active
and passive clients, the relying parties can be web applications or web services. The STS must be configured with
the configuration information for each relying party that it will support.
• Endpoints: Endpoints are used to provide access to services on the federation server. There are several types of
endpoints that can be used with ADFS, including WS-Trust 1.3, WS-Trust 2005, WS-Federation Passive, SAML SSO,
Federation Metadata, SAML Artifact Resolution, and WS-Trust WSDL.
6.2.3 Future of federation: Cloud federation remains an open issue in the current cloud market. Cloud federation
would address many existing limitations in cloud computing:
• Cloud end users are often tied to a single cloud provider, because the different APIs, image formats, and
access methods exposed by different providers make it very difficult for an average user to move
applications from one cloud to another, leading to a vendor lock-in problem.
• Many big companies (e.g. banks, hosting companies, etc.) and many large institutions maintain several
distributed data centers or server farms, for example to serve multiple geographically distributed offices.
Resources and networks in these distributed data centers are usually configured as non-cooperative separate
elements, so that every single service or workload is usually deployed at a single site or replicated across multiple
sites.
• Many educational and research centers deploy their own computing infrastructures, which usually do not
cooperate with other institutions except in some specific situations (e.g. in joint projects or initiatives). Often,
even different departments within the same institution maintain their own non-cooperative
infrastructures.
This Study Group will evaluate the main challenges in enabling the provision of federated cloud infrastructures, with
special emphasis on inter-cloud networking and security issues:
-Security and Privacy
-Interoperability and Portability
-Performance and Networking Cost
