
Cloud Computing Technology

Lecture1 (Module-1)
Cloud Computing is on-demand access to storage & computing resources
(applications, data storage, physical or virtual servers, development
tools, simulation software, networking capabilities & many more)
through the internet.

Storage & computing resources are hosted at remote data centers managed by a CSP (Cloud Service Provider).

Hence, there are two main components of cloud –


1. Cloud Service Providers.
2. Users.
The CSP makes these resources available to users for a regular subscription fee – bills them as per usage, with no upfront cost – pay as you go.
There is a hype around Cloud:
Past: resources – compute power and data storage – were owned & maintained by individual companies (on-premise hardware).
Now: companies are shifting towards the Cloud.
Yahoo Finance Report (5th November 2021):

IDC Forecasts (14th September 2021):

Gartner Forecasts (21st April, 2021):

https://www.businesswire.com/news/home/20210810005902/en/Global-Cloud-Computing-Market-2021-to-2028---Size-Share-Trends-Analysis-Report---ResearchAndMarkets.com
Major Cloud Providers:

Amazon Web Services (AWS):


Elastic Compute Cloud (EC2) – provides computing services.
Simple Storage Service (S3) – provides the ability to store data so that you can access it from anywhere.
Elastic Block Storage (EBS) – volume storage that EC2 instances can access while they are running.

Microsoft Azure

Google Cloud Platform (GCP)

RightScale, Salesforce, IBM, EMC, GigaSpaces, 10gen, DataStax, Oracle, VMware,
Yahoo, Cloudera & many more.
Cloud: Private Cloud vs Public Cloud

Private Cloud – a private cloud is accessible only to specially privileged employees of a company. For example, if you're working in company A, and company A has a data center, a cluster, or a cloud that is accessible only to the employees of that company, that's a private cloud; it is not openly accessible.

Public Cloud – a public cloud can be accessed by anyone, anywhere in the world, if they are willing to swipe their credit/debit card. Example: AWS, GCP.
Private Cloud – you buy a data center or a cloud and run it yourself; that is your own private cloud.

Public Cloud – you access it on a regular (monthly) subscription, pay-as-you-go basis.

The AWS S3 Simple Storage Service allows you to store arbitrary data sets, and you pay per gigabyte-month that you store; there are no upfront costs.

The Elastic Compute Cloud (EC2) allows you to upload and run arbitrary OS images. These are virtual machines (VMs), and you pay per CPU-hour that you use. Therefore, if you use, say, 3 VMs for 1 hour, all at the same time, you will be paying for 3 CPU-hours. The other services that AWS or Microsoft Azure offer are priced similarly.
The GB-month is a measure of how many gigabytes of storage are
provisioned to your account and how long the storage is provisioned to
your account.

1 GB-Month = storage of 1 GB for 1 month (31 days = 744 hours)

In a 31-day month, you are billed for one GB-month in the following
scenarios:
• If you have provisioned a 744-GB volume for 1 hour.
• If you have provisioned a 31-GB volume for 24 hours.
• If you have provisioned a 1-GB volume for 744 hours (the number of
hours in 31 days).
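A minimal sketch of the GB-month arithmetic above (the helper name is made up for illustration); it reproduces the three 31-day scenarios, each of which comes to exactly one GB-month:

```python
# GB-month billing: provisioned gigabytes multiplied by the fraction of the month provisioned.
HOURS_PER_31_DAY_MONTH = 744  # 31 days * 24 hours

def gb_months(volume_gb: float, hours_provisioned: float) -> float:
    """Return the GB-months consumed by a volume provisioned for the given number of hours."""
    return volume_gb * hours_provisioned / HOURS_PER_31_DAY_MONTH

print(gb_months(744, 1))    # 1.0 -> 744-GB volume provisioned for 1 hour
print(gb_months(31, 24))    # 1.0 -> 31-GB volume provisioned for 24 hours
print(gb_months(1, 744))    # 1.0 -> 1-GB volume provisioned for 744 hours
```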
A core-hour is a measurement of computational time: if you run one CPU core for one hour, that's one core-hour. If you run 1000 CPUs for 1 hour, that's 1000 core-hours.
There are variants of services –

Google App Engine or Compute Engine offers you the ability to develop applications within App Engine; you then upload your data, which is imported into their format, and run your application.

In this way, you can use these services to host your own web services, or your data so that it is served out, or to run your own computation.

Hybrid Cloud = Private + Public

Orchestration among different platforms - on-premises infrastructure, private cloud services, and
public clouds.
Why Cloud?

Hundreds of startups across the globe are leveraging the popular cloud providers (and a variety of other cloud providers as well) rather than buying their own machines.

Save money & time.

• Agility – accelerates the development of new software.
• Elasticity – scale up & down to grow or shrink capacity.
• Cost saving – eliminates the cost of on-site infrastructure.
• Globally available – cloud services are available worldwide.

Further study:
https://www.leadingedgetech.co.uk/it-services/it-consultancy-services/cloud-computing/advantages-of-cloud-computing/
What is Cloud?

It's a cluster – a bunch of servers connected by a network.

It has huge computing power – it can process TB, PB, EB or even ZB of data.
It can also store huge data volumes.

Cloud = a lot of storage resources + compute cycles located nearby.

Example: you have a 1000-node cluster connected via Ethernet, and you want to scan 100 TB of data; each node holds roughly 0.1 TB. If the data comes from remote storage at 10 MB/s, the scan takes about 165 minutes. If, on the other hand, the data is stored locally on the nodes and read at 50 MB/s, it takes about 33 minutes.

Hence, we can't move data easily; moving computation to the data is much more feasible.
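A quick back-of-the-envelope check of those numbers (assuming the stated per-node transfer rates; the helper function is just for illustration):

```python
# Per-node scan time for the 1000-node / 100 TB example (each node reads ~0.1 TB in parallel).
def scan_minutes(data_per_node_tb: float, rate_mb_per_s: float) -> float:
    data_mb = data_per_node_tb * 1_000_000  # 1 TB ~ 10^6 MB
    return data_mb / rate_mb_per_s / 60

print(round(scan_minutes(0.1, 10)))  # ~167 min reading from remote storage at 10 MB/s
print(round(scan_minutes(0.1, 50)))  # ~33 min reading from local disks at 50 MB/s
```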
Cloud/Cluster can be of 2 types – (1) Single-Site Cloud and (2) Geo-Distributed Cloud.

Single-Site Cloud / Datacenter –
(i) Computing nodes.
(ii) Computing nodes are grouped together to form a rack.
(iii) All the servers of a rack share the same power supply and a top-of-rack switch.
(iv) The top-of-rack switches are connected in a tree-like network topology.
(v) So, all top-of-rack switches are connected to a core switch.
(vi) Front-end nodes – user-facing nodes – servers used for computing – where users' requests come in or clients submit jobs.
(vii) Software running on these servers and routers – OS, user-level applications, the actual IP protocols, routing protocols.
(viii) Back-end nodes – storage – SSDs, large hard disks.

This datacenter is generally housed in a single building – a data warehouse.

Example – many companies have such a datacenter.


Geo-Distributed Cloud:

Multiple such sites.

There can be multiple datacenters that are geographically distributed, and they are connected with each other via the internet.

Each site typically has the same structure and services as the others; at least as far as software is concerned, it has the same software stack.

Different sites might also run different software stacks.


History of cloud computing:

1. It's not today's technology – it evolved gradually through a sequence of phases – client-server computing, distributed computing (Grid computing, P2P, etc.).
2. These technologies shaped what today's cloud is.

EARLY 1960s – John McCarthy: "Computing can be sold as a utility, like water and electricity." – the concept of timesharing, enabling organizations to simultaneously use an expensive mainframe. This is described as a significant contribution to the development of the Internet, and a pioneer of cloud computing.

IN 1969 – The idea of an "Intergalactic Computer Network" or "Galactic Network" (a computer networking concept similar to today's Internet) was introduced by J.C.R. Licklider, who was responsible for enabling the development of ARPANET (Advanced Research Projects Agency Network). His vision was for everyone on the globe to be interconnected and able to access programs and data at any site, from anywhere.

In the early days of the internet, data centers rented physical servers for hosting websites. Multiple websites shared one server, where each website got a virtual web server hosted from a folder on the server's disk. It was not possible to install anything on the server.

IN 1970 – Using virtualization software like VMware, it became possible to run more than one operating system simultaneously in an isolated environment: a completely different computer (a virtual machine) could run inside a different operating system.

IN 1997 – The first known definition of the term “Cloud Computing” seems to be by Prof. Ramnath Chellappa in
Dallas in 1997 – “A computing paradigm where the boundaries of computing will be determined by economic
rationale rather than technical limits alone.”

IN 1999 – The arrival of Salesforce.com in 1999 pioneered the concept of delivering enterprise applications via a simple website. The services firm paved the way for both specialist and mainstream software firms to deliver applications over the Internet.
IN 2003 – The first public release of Xen, which provides a Virtual Machine Monitor (VMM), also known as a hypervisor – a software system that allows the execution of multiple virtual guest operating systems simultaneously on a single machine.

In the early 2000s, hosting providers shifted towards virtualization, so that one physical machine could run multiple virtual servers – one virtual server per customer. Each customer would then install an OS & configure a virtual server from scratch.

Amazon was a book-selling company that later started to sell a variety of items. Eventually, the computing infrastructure necessary to support their business grew so large that they decided to rent their excess capacity to other companies.

IN 2006 – Amazon commercialized this idea as AWS and expanded its cloud services. First was its Elastic Compute Cloud (EC2), which allowed people to access computers and run their own applications on them, all in the cloud.

Then they brought out Simple Storage Service (S3). This introduced the pay-as-you-go model to both users and
the industry as a whole, and it has basically become standard practice now.

Sources: www.timesofcloud.com, www.udacity.com


How is today's cloud different from previous-generation distributed systems?

1. Massive scale – large datacenters – hundreds of thousands of servers connected. You can run a computation on as many servers as you want.
2. On-demand access – (i) you don't need to sign a contract to purchase the resources; you pay only when you use them – no upfront cost. (ii) Anyone can use it.
3. Data-intensive nature – TB, EB & ZB of data, diversity of data.
4. New cloud programming paradigms – easier processing of large volumes of data (MapReduce), easy storage (Cassandra and NoSQL stores such as MongoDB) and querying of data (see the sketch below).
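A minimal, single-process sketch of the MapReduce idea named in point 4 (plain Python, not the Hadoop API): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
# Toy word count in the MapReduce style (illustration only, not Hadoop).
from collections import defaultdict

def map_phase(document: str):
    # Map: emit (word, 1) for every word in the input split.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Reduce: aggregate all values emitted for one key.
    return word, sum(counts)

documents = ["the cloud is elastic", "the cloud is on demand"]

groups = defaultdict(list)            # shuffle: group intermediate pairs by key
for doc in documents:
    for word, one in map_phase(doc):
        groups[word].append(one)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)   # {'the': 2, 'cloud': 2, 'is': 2, 'elastic': 1, 'on': 1, 'demand': 1}
```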
Massive Scale: Facebook, in 2012, ran about 180,000 servers and it had exponentially grown the
number of servers from 30,000 in 2009 to 60,000 in 2010.

Microsoft in 2008 was running 150,000 machines out of which 80,000 were running Bing, their
search engine, and they were growing at that point of time at 10,000 machines per month.

Yahoo in 2009 was running 100,000 machines in the data centers. And these are split into
clusters of 4,000 because of the way Hadoop runs.

Amazon Web Services was running 40K machines in 2009.


Google is known for its search engine; it needs to index the data & websites that are crawled by its crawlers.
In 2006, this indexing was a chain of 24 MapReduce jobs.
It processed about 50 petabytes per month, using 200,000 MapReduce jobs.

Yahoo similarly has a Web Map application – a chain of 100 MapReduce jobs.
It processed 280 terabytes of data in 73 hours, using 2,500 nodes.
Hadoop + Pig (developed on top of Hadoop).

Facebook uses Hadoop + Hive (developed on top of Hadoop) – 3,000 MapReduce jobs processed
55 terabytes per day, adding 2 terabytes of new data per day, in 2008.
Power at the Datacenter – hydro power, solar power, coal power.

WUE = annual water usage / IT equipment energy (L/kWh)

PUE = total facility power / IT equipment power

Lower values are preferred.

Google's PUE – 1.11
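A quick worked example of the PUE formula above (the 1.11 figure is the Google value quoted; the wattages below are made up purely for illustration):

```python
# PUE = total facility power / IT equipment power (dimensionless, >= 1.0; lower is better).
it_load_kw = 1000.0    # hypothetical IT equipment load
overhead_kw = 110.0    # hypothetical cooling, lighting and power-distribution overhead
pue = (it_load_kw + overhead_kw) / it_load_kw
print(round(pue, 2))   # 1.11 -> only 11% extra facility energy on top of the IT load
```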

Cooling at the Datacenter –

Air is sucked in from the top of the data center -> purified-water spray -> cool air is moved by motors near the servers.

https://www.youtube.com/watch?v=ujO-xNvXj3g , https://www.youtube.com/watch?v=XZmGGAbHqa0
Cloud = hundreds to thousands of machines on the data center side, all connected; this we call the server side. On the client side there may be millions or even more machines accessing the services hosted by these servers.

Client side – web pages, websites, objects to be stored, or other kinds of services.
The servers on the data center side (i.e. the server side) communicate amongst one another.
The clients communicate directly with one or multiple servers.

Every time you access your Facebook profile, your client process accesses multiple Facebook servers – some for the photos, some for the profile, some for the wall, and so on.

Clients can also talk to each other (P2P).

All this communication happens over a network.

This is a Distributed System.
Distributed System – a distributed system is a collection of connected machines that appears to its users as one local machine.

This working definition is not 100% perfect; for example, the web is a distributed system, although it does not appear as a single entity to the user.

Also, it is not always a client-server architecture; P2P systems, for example.

Features: autonomous machines,
asynchronous (unlike parallel systems),
failure-prone,
communicating through an unreliable medium.
Grid System – used for HPC applications.

[Figure: a DAG of jobs (Job 0 → Job 1, Job 2 → Job 3) submitted to grid sites A, B and C; the data passed between jobs can be several GBs, and jobs can take several hours to several days.]

Life cycle of a job:
1. Stage-in / Initialization stage
2. Execution stage
3. Stage-out
4. Publish
Each job is split into several tasks (1,000 or even more).

Each task runs on one machine, or on one processor, of a particular site (A, B or C).

These tasks typically do not talk to each other, because they are highly parallelized.

When they are done, they return their results to the job itself, which then aggregates them (a small scatter/gather sketch follows below).

So, the main question here is a scheduling problem:

How do you schedule this overall application (a DAG of jobs) across the grid resources, which are distributed out over the wide area?
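A minimal local sketch of that split-then-aggregate pattern (the task body and counts are made up for illustration; a real grid scheduler additionally decides which site each task runs on):

```python
# One "job" split into independent tasks, run in parallel, then aggregated by the job.
from concurrent.futures import ProcessPoolExecutor

def task(chunk: range) -> int:
    # Illustrative task body: each task processes its own chunk independently.
    return sum(chunk)

def run_job(n_items: int, n_tasks: int) -> int:
    chunk_size = n_items // n_tasks
    chunks = [range(i * chunk_size, (i + 1) * chunk_size) for i in range(n_tasks)]
    with ProcessPoolExecutor() as pool:
        partial_results = pool.map(task, chunks)   # tasks do not talk to each other
    return sum(partial_results)                    # the job aggregates the results

if __name__ == "__main__":
    print(run_job(n_items=1_000_000, n_tasks=100))  # same answer as a single sequential pass
```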
[Figure: grid sites A, B and C, each running an intra-site protocol (example – HTCondor), connected to one another by an inter-site protocol; jobs (Job 0 … Job 3) are placed across the sites.]
Intra-site protocol: internal allocation, scheduling, monitoring, and distribution & publishing of files.

It runs on several workstations.

When a workstation is free, it asks the site's central server (or the inter-site protocol) for a task.

It is given a task to run. When the task completes, it asks for another task, and so on.

If a user comes in and hits a keystroke or mouse click, the task needs to be stopped – either kill the task or ask the server to reschedule it. The partial results from that task are sent to the central server so that the rest of the work of that task can be completed on some other workstation.

It can run on dedicated machines as well (a sketch of this worker loop follows below).
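A schematic sketch of that cycle-scavenging worker loop (every helper here – fetch_task, user_is_active, report_partial, report_result – is a hypothetical placeholder, not an HTCondor API):

```python
# Schematic worker loop for an HTCondor-style cycle-scavenging site (placeholders, not real APIs).
import time

def worker_loop(server):
    while True:
        task = server.fetch_task()            # ask the site's central server for work
        if task is None:
            time.sleep(30)                    # no work available; poll again later
            continue
        while not task.done():
            if server.user_is_active():       # keystroke or mouse click detected
                server.report_partial(task)   # ship partial results to the central server
                task.stop()                   # vacate so the task can resume elsewhere
                break
            task.run_step()                   # otherwise keep computing
        else:
            server.report_result(task)        # task finished here; loop and ask for another
```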


Inter-site protocol:
1. External allocation & scheduling.
2. Stage-in & stage-out of files.
3. The internal structure of a site is invisible to the inter-site protocol – it does not necessarily need to know what the intra-site protocol does inside site A. It has a well-defined API that it uses to communicate with these intra-site protocols.

[Figure repeated: sites A, B and C with their intra-site protocols, connected by the inter-site protocol.]
So, we have seen different computing paradigms –

1. Centralized computing – all resources (storage + memory + computation) are centralized in one physical system & tightly integrated with one operating system (example – a mainframe computer).
2. Parallel computing – processors on a single board, or different processors, communicate via a tightly coupled network; a problem is broken into discrete parts that can be solved concurrently.
3. Distributed computing – a collection of network-connected machines.
4. Cluster computing – multiple commodity machines connected via a LAN and controlled by specific software tools that manage them as a single entity. An alternative to supercomputers and mainframes.
5. Grid computing – harnesses the resources of multiple systems.
6. Utility computing – computing service as a utility.

Cloud Computing = ?
Cloud Economics: the study of cloud computing's costs and benefits and the economic principles that underpin them.

Is the cloud always a money-saver?

When you start up your own company there are two options – (a) set up your own private cloud, or (b) use a public cloud / outsource?

Public Cloud – pay for CPU-hours or GB-months only; not for power, cooling, set-up cost, management cost, etc.
Private Cloud – pay for everything.

Example:
An organization wishes to run its business for n months.

The service requires 128 servers (1,024 cores) & 524 TB of storage.

Outsource (via AWS): monthly cost – S3 cost: $0.12 / GB-month & EC2 cost: $0.10 / CPU-hour
Storage cost/month – 0.12 × 524,000 GB ≈ $62 K
Computation cost/month – 0.10 × 1,024 × 24 × 30 ≈ $74 K
Total cost/month – ≈ $136 K

Private: Storage = $350 K / n months
Computing hardware + power + network = $1,200 K / n months
Total cost/month = $1,550 K / n months + $7.5 K (1 sysadmin per 100 nodes)

Industry rule of thumb: 0.45 : 0.4 : 0.15 for hardware : power : network (3 years of hardware lifetime).

This analysis shows that owning your own private cloud is preferable only if:
Storage: $350 K / n < $62 K
Total: $1,550 K / n + $7.5 K < $136 K

i.e. Storage: n > ~5.5 months
Total: n > ~12 months (a small break-even sketch follows below)
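A small sketch of that break-even comparison, using only the figures from this example (not current AWS list prices):

```python
# Break-even analysis: from how many months of operation is the private cloud cheaper?
S3_PER_GB_MONTH = 0.12       # $ per GB-month (example rate)
EC2_PER_CPU_HOUR = 0.10      # $ per CPU-hour (example rate)
CORES, STORAGE_GB = 1024, 524_000

public_monthly = (S3_PER_GB_MONTH * STORAGE_GB
                  + EC2_PER_CPU_HOUR * CORES * 24 * 30)    # ~ $136.6 K per month

PRIVATE_CAPEX = 1_550_000    # storage + hardware + power + network, spread over n months
SYSADMIN_MONTHLY = 7_500     # 1 sysadmin per 100 nodes

def private_monthly(n_months: int) -> float:
    return PRIVATE_CAPEX / n_months + SYSADMIN_MONTHLY

# Smallest n for which the private cloud's monthly cost drops below the public cloud's.
n = next(m for m in range(1, 121) if private_monthly(m) < public_monthly)
print(round(public_monthly), n)   # ~136608, 13 -> the private cloud wins only beyond ~12 months
```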
Summary
