CloudComputingTechnology1 Unit1
Lecture 1 (Module 1)
Cloud Computing is on-demand access to storage and computing resources
(applications, data storage, physical or virtual servers, development
tools, simulation software, networking capabilities, and more) over the
internet.
https://www.businesswire.com/news/home/20210810005902/en/Global-Cloud-Computing-Market-2021-to-2028---Size-Share-Trends-Analysis-Report---ResearchAndMarkets.com
Major Cloud Providers:
Amazon Web Services (AWS), Microsoft Azure, and others.
Public Cloud – you access resources on a regular (e.g., monthly) subscription or pay-as-you-go
basis.
AWS S3 (Simple Storage Service) lets you store arbitrary data sets; you pay
per gigabyte-month stored, with no upfront costs.
The Elastic Compute Cloud (EC2) lets you upload and run arbitrary OS images as
virtual machines (VMs), and you pay per CPU-hour used. So if you run,
say, 3 VMs for 1 hour at the same time, you pay for 3 CPU-hours. The
other services that AWS or Microsoft Azure offer are priced similarly.
The GB-month is a measure of how many gigabytes of storage are
provisioned to your account and how long the storage is provisioned to
your account.
In a 31-day month, you are billed for one GB-month in the following
scenarios:
• If you have provisioned a 744-GB volume for 1 hour.
• If you have provisioned a 31-GB volume for 24 hours.
• If you have provisioned a 1-GB volume for 744 hours (the number of
hours in 31 days).
A core-hour is a measurement of computational time. In OnScale, for example, if you run one CPU
for one hour, that's one core-hour; if you run 1000 CPUs for 1 hour, that's 1000
core-hours.
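A minimal sketch in Python of how these two billing units turn usage into a bill (illustrative arithmetic only; real provider bills add many more line items such as requests, egress, and tiered rates):

HOURS_PER_MONTH = 31 * 24  # 744 hours in a 31-day month

def gb_months(volume_gb, hours_provisioned):
    # GB-months = provisioned size x fraction of the month it was provisioned
    return volume_gb * hours_provisioned / HOURS_PER_MONTH

def core_hours(num_cpus, hours):
    # core-hours = number of CPUs x hours each one runs
    return num_cpus * hours

# The three scenarios above all bill as exactly one GB-month:
print(gb_months(744, 1), gb_months(31, 24), gb_months(1, 744))  # 1.0 1.0 1.0
# 3 VMs for 1 hour = 3 CPU-hours; 1000 CPUs for 1 hour = 1000 core-hours:
print(core_hours(3, 1), core_hours(1000, 1))                    # 3 1000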
There are a variety of such services –
In this way, you can use these to host your own web services, to serve out your
data, or to run your own computations.
Orchestration among different platforms - on-premises infrastructure, private cloud services, and
public clouds.
Why Cloud?
Hundreds of startups across the globe are leveraging the popular cloud providers, and a
variety of other cloud providers as well, rather than buying their own machines.
Benefits of cloud:
• Agility – accelerates the development of new software.
• Elasticity – scale up and down to grow or shrink capacity.
• Cost saving – eliminates the cost of on-site infrastructure.
• Globally available – the cloud service is available worldwide.
Further study: https://www.leadingedgetech.co.uk/it-services/it-consultancy-services/cloud-computing/advantages-of-cloud-computing/
What is Cloud?
Example: you have a 1000-node cluster connected via Ethernet, and you want to
scan 100 TB of data on it, so each node handles roughly 0.1 TB. If the data
comes from remote storage at about 10 MB/s per node, the scan takes about
165 minutes. If, on the other hand, the data is stored locally on the nodes
and read at about 50 MB/s, it takes about 33 minutes.
Hence, we can't move data easily; rather, moving computation to the data is
much more feasible.
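The arithmetic behind these numbers, as a small Python sketch (assuming the speeds are per-node read bandwidths in MB/s, as above, and 1 TB = 10^6 MB):

PER_NODE_MB = 100 / 1000 * 1e6   # 100 TB over 1000 nodes = 0.1 TB = 100,000 MB each

def scan_minutes(bandwidth_mb_per_s):
    # time for each node to read its 0.1 TB share at the given bandwidth
    return PER_NODE_MB / bandwidth_mb_per_s / 60

print(round(scan_minutes(10)))  # remote storage @ 10 MB/s -> ~167 (about 165) minutes
print(round(scan_minutes(50)))  # local disk     @ 50 MB/s -> ~33 minutes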
Cloud/Cluster can be of 2 types – (1) Single Site Cloud and (2)
Geo-distributed Cloud
Single Site Cloud / Datacenter –
(i) Computing nodes.
(ii) Computing nodes are grouped together to form a rack.
(iii) All the servers in a rack share the same power supply and a top-of-rack switch.
(iv) The top-of-rack switches are connected in a tree-like network
topology.
(v) So, all top-of-rack switches are connected to a core switch.
(vi) Front-end nodes – user-facing nodes – servers used for computing, where
users' requests come in and clients submit jobs.
(vii) Software running on these servers and routers – OS, user-level
applications, the actual IP protocols, routing protocols.
(viii) Back-end nodes – storage – SSDs, large hard disks.
Each site typically has the same structure and services as the others, or
at least, as far as software is concerned, the same software stack.
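A toy Python model of this topology (the class names are illustrative, not any real API):

from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    role: str  # "front-end" (user-facing compute) or "back-end" (storage)

@dataclass
class Rack:
    tor_switch: str                 # servers in a rack share power and this ToR switch
    servers: list = field(default_factory=list)

@dataclass
class CoreSwitch:                   # root of the tree-like topology
    racks: list = field(default_factory=list)

site = CoreSwitch(racks=[
    Rack("tor-0", [Server("fe-0", "front-end"), Server("fe-1", "front-end")]),
    Rack("tor-1", [Server("be-0", "back-end"), Server("be-1", "back-end")]),
])
print(sum(len(r.servers) for r in site.racks), "servers behind 1 core switch")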
In 1969 – The idea of an "Intergalactic Computer Network" or "Galactic Network" (a computer networking concept
similar to today's Internet) was introduced by J.C.R. Licklider, who was responsible for enabling the development of
ARPANET (Advanced Research Projects Agency Network). His vision was for everyone on the globe to be
interconnected and able to access programs and data at any site, from anywhere.
In the early days of the internet, data centers rented physical servers for hosting websites. Multiple websites shared one
server, where each website got a virtual web server hosted from a folder on the server's disk. It was not possible to
install anything on the server.
In 1970 – Using virtualization software (of the kind later popularized by VMware), it became possible to run more than
one operating system simultaneously in an isolated environment: a completely different computer (a virtual machine)
could run inside another operating system.
In 1997 – The first known definition of the term "Cloud Computing" appears to be by Prof. Ramnath Chellappa in
Dallas in 1997 – "A computing paradigm where the boundaries of computing will be determined by economic
rationale rather than technical limits alone."
In 1999 – The arrival of Salesforce.com in 1999 pioneered the concept of delivering enterprise applications via a simple
website. The firm paved the way for both specialist and mainstream software firms to deliver applications
over the Internet.
In 2003 – The first public release of Xen, which provides a Virtual Machine Monitor (VMM), also known as a
hypervisor: a software system that allows the execution of multiple virtual guest operating systems simultaneously
on a single machine.
In the early 2000s, hosting providers shifted towards virtualization, so that one physical machine could run multiple
virtual servers – one virtual server per customer. Each customer would then install an OS and configure a virtual
server from scratch.
Amazon was a book-selling company that later started to sell a variety of items. Eventually, the computing
infrastructure necessary to support its business grew so large that it decided to rent its excess capacity to other
companies.
In 2006 – Amazon commercialized this idea as AWS and expanded its cloud services. First was its Elastic
Compute Cloud (EC2), which allowed people to access computers and run their own applications on them, all on
the cloud.
Then they brought out the Simple Storage Service (S3). This introduced the pay-as-you-go model to both users and
the industry as a whole, and it has essentially become standard practice now.
1. Massive Scale – large datacenters with hundreds of thousands of servers connected. You can
run a computation on as many servers as you want.
2. On-Demand Access – (i) you don't need to sign a contract to purchase resources; you
pay only when you use them – no upfront cost. (ii) Anyone can use it.
3. Data-Intensive Nature – TBs, EBs, and ZBs of data, and a diversity of data.
4. New Cloud Programming Paradigms – easier processing of large volumes of data
(MapReduce) and easy storage of, and queries on, data (Cassandra and NoSQL stores such
as MongoDB); a word-count sketch of the MapReduce idea follows this list.
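To give a flavor of the MapReduce paradigm mentioned in point 4, here is a minimal word-count sketch in plain Python (no Hadoop involved; it only mimics the map, shuffle, and reduce steps):

from collections import defaultdict

def map_phase(line):
    for word in line.split():
        yield (word, 1)            # map: emit a (key, value) pair per word

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:          # "shuffle": group values by key
        counts[word] += n          # reduce: sum the counts per word
    return dict(counts)

lines = ["the cloud is elastic", "the cloud is on demand"]
pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(pairs))         # {'the': 2, 'cloud': 2, 'is': 2, ...}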
Massive Scale: Facebook, in 2012, ran about 180,000 servers, having grown exponentially
from 30,000 servers in 2009 to 60,000 in 2010.
Microsoft in 2008 was running 150,000 machines, of which 80,000 were running Bing, their
search engine, and at that point they were growing at 10,000 machines per month.
Yahoo! in 2009 was running 100,000 machines in its data centers, split into
clusters of 4,000 because of the way Hadoop runs.
Yahoo! similarly has a WebMap application – a chain of 100 MapReduce jobs – that
processed 280 terabytes of data in 73 hours using 2,500 nodes, with
Hadoop + Pig (developed on top of Hadoop).
Facebook uses Hadoop + Hive (developed on top of Hadoop) – 3,000 MapReduce jobs
processing 55 terabytes per day, and adding 2 terabytes per day, as of 2008.
Power at the Datacenter – hydro power, solar power, coal power.
Google reports a PUE (Power Usage Effectiveness = total facility power / IT equipment power) of about 1.11.
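A quick worked example of the PUE formula (the load numbers below are hypothetical):

def pue(total_facility_kw, it_equipment_kw):
    # PUE = total facility power / IT equipment power; 1.11 means only
    # ~11% extra power goes to cooling, lighting, power distribution, etc.
    return total_facility_kw / it_equipment_kw

print(round(pue(1110, 1000), 2))   # hypothetical load -> 1.11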
Cooling at the Datacenter –
Air is sucked in from the top of the data center → passed through a purified-water
spray → fans move the cooled air past the servers.
https://www.youtube.com/watch?v=ujO-xNvXj3g , https://www.youtube.com/watch?v=XZmGGAbHqa0
Cloud = hundreds to thousands of connected machines on the data center side – this
is called the server side – plus, on the client side, millions or even
more machines accessing the services hosted by those servers.
Client side – web pages, websites, objects to be stored, or other kinds of services.
The servers on the data center side, i.e., the server side, communicate amongst one
another.
The clients communicate directly with one or multiple servers.
Every time you access your Facebook profile, your client process accesses
multiple Facebook servers – some for the photos, some for the profile, some for the wall,
and so on and so forth.
This is a Distributed System.
Distributed System – a collection of connected machines whose
distribution appears to the user as one local machine.
This working definition is not 100% perfect; for example, the web is a distributed
system, although it does not appear to the user as a single entity.
(Figure: a DAG of jobs spread across sites A, B, and C – Job 0 feeds Jobs 1 and 2,
which in turn feed Job 3; each job reads and writes several GBs of data, and jobs
can take several hours to several days.)
Stages of a job:
1. Stage-in / Initialization stage
2. Execution stage
3. Stage-out
4. Publish
Each job is split into several tasks (1,000 or even more).
Each task runs on one machine, or on one processor, at a particular site (A, B, or C).
These tasks typically do not talk to each other, because they are highly parallelizable.
When they are done, they return their results to the job itself, which then aggregates them.
How do you schedule this overall application (a DAG of jobs) across the grid
resources, which are distributed out over the wide area?
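One way to picture it is as a sketch in Python: run the jobs in dependency (topological) order, fanning each job out into independent tasks across the sites (the task counts and round-robin site assignment here are made up for illustration):

from graphlib import TopologicalSorter

dag = {                 # job -> the jobs it depends on (Job 0 feeds 1 and 2, which feed 3)
    "job1": {"job0"},
    "job2": {"job0"},
    "job3": {"job1", "job2"},
}
sites = ["A", "B", "C"]

def run_task(job, task_id, site):
    return f"{job}/task{task_id}@{site}"   # stand-in for real per-task work

for job in TopologicalSorter(dag).static_order():
    # split the job into (here) 6 independent tasks spread round-robin across sites
    results = [run_task(job, t, sites[t % len(sites)]) for t in range(6)]
    print(job, "aggregated", len(results), "task results")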
(Figure: sites A, B, and C, each running an intra-site protocol internally and
connected to one another by an inter-site protocol; jobs such as Job 0 and Job 3
run at different sites.)
Intra-site protocol, example – HTCondor:
A workstation is given a task to run. When the task completes, it asks for
another task, and so on and so forth.
If the user comes back and hits a keystroke or clicks the mouse, the task needs
to be stopped – either kill the task or ask the server to reschedule it. The
partial results from that task are sent to the central server so that
the rest of the work of that task can be completed on some other
workstation.
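A rough sketch of that loop in Python (helpers like fetch_task and user_is_active are hypothetical stand-ins, not HTCondor APIs):

import random

def fetch_task(queue):           # hypothetical: ask the central server for work
    return queue.pop(0) if queue else None

def user_is_active():            # hypothetical keystroke/mouse-click detector
    return random.random() < 0.3

central_server = [f"task-{i}" for i in range(5)]

while (task := fetch_task(central_server)) is not None:
    if user_is_active():
        # ship partial results back; the server reschedules the remainder elsewhere
        central_server.append(task)
        print(task, "preempted; checkpoint sent to central server")
        break
    print(task, "completed; asking for another task")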
So, we have seen different computing paradigms.
Cloud Computing = ?
Cloud Economics: Study of cloud computing’s costs and benefits and the economic
principles that underpin them
When you start up your own company, there are two options – (a) set up your own private cloud,
or (b) use a public cloud / outsource?
Public Cloud – you pay for CPU-hours or GB-months only; not power, cooling, set-up cost,
management cost, etc.
Private Cloud – you pay for everything.
Example:
An organization wishes to run its business for n months.
Outsource (via AWS): monthly cost – S3: $0.12/GB-month and EC2: $0.10/CPU-hour.
Storage cost/month (524 TB): 0.12 × 524 × 1000 ≈ $62 K
Computation cost/month (1,024 CPUs, 24 h/day, 30 days): 0.10 × 1024 × 24 × 30 ≈ $74 K
Total cost/month ≈ $136 K
Industry rule of thumb: hardware : power : network costs split roughly 0.45 : 0.40 : 0.15 (over a 3-year hardware lifetime).
This analysis shows that owning your own private cloud is preferable only if:
Storage: $350 K / n < $62 K, i.e., n > ~5.6 months
Total: $1500 K / n + $7.5 K < $136 K, i.e., n > ~12 months
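The breakeven arithmetic above, as a small Python sketch using the same numbers:

s3_per_gb_month, ec2_per_cpu_hour = 0.12, 0.10

storage = s3_per_gb_month * 524 * 1000        # 524 TB               -> ~$62 K/month
compute = ec2_per_cpu_hour * 1024 * 24 * 30   # 1024 CPUs, 24 h/day  -> ~$74 K/month
print(storage, compute, storage + compute)    # 62880.0 73728.0 136608.0 (~$136 K)

# Private cloud wins once the amortized hardware cost drops below the cloud bill:
print(350 / 62)             # storage breakeven: n > ~5.6 months
print(1500 / (136 - 7.5))   # total breakeven:   n > ~11.7 months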