Module 1 ppt

The document discusses the evolution of computing and IT trends over the past 30 years, focusing on high-performance computing (HPC) and high-throughput computing (HTC) systems. It highlights the transition from centralized computing to distributed and cloud computing paradigms, emphasizing scalability, performance, and security. Additionally, it explores the impact of innovative applications and the need for efficient, reliable systems in various domains such as science, business, and the Internet of Things.

Cloud Computing & Security

By Prof. Shivakumar M

Department of Information Science & Engineering

www.cambridge.edu.in
Chapter 1 assesses the evolutionary changes in computing and IT
trends in the past 30 years, driven by applications with variable
workloads and large data sets.

We study both high-performance computing (HPC) systems for
scientific computing and high-throughput computing (HTC) systems
for business computing.

We study clusters/MPP, grids, P2P networks, and Internet clouds.

These systems are distinguished by their platform architectures,
OS platforms, processing algorithms, communication protocols,
security demands, and service models applied.

The study emphasizes the scalability, performance, availability,
security, energy efficiency, workload outsourcing, and data center
protection of these systems.
SCALABLE COMPUTING OVER THE INTERNET

Instead of using a centralized computer to solve computational


problems, a parallel and distributed computing system uses multiple
computers to solve large-scale problems over the Internet. Thus,
distributed computing becomes data-intensive and network-centric.
i. The Age of Internet Computing
• Supercomputer sites and large data centers must
provide high-performance computing services to
huge numbers of Internet users concurrently.
• Because of this high demand, the emergence of
computing clouds demands high-throughput
computing (HTC) systems built with parallel and
distributed computing technologies.
• We have to upgrade data centers using fast servers, storage
systems, and high-bandwidth networks.
ii. The Platform Evolution
⚫ Computer technology has gone through five generations
of development.
⚫ From 1950 to 1970, a handful of mainframes, including the
IBM 360 and CDC 6400, were built to satisfy the demands of
large businesses and government organizations.
⚫ From 1960 to 1980, lower-cost minicomputers such as the
DEC PDP 11 and VAX Series became popular among small
businesses and on college campuses.
⚫ From 1970 to 1990, we saw widespread use of personal
computers built with VLSI microprocessors.
⚫ From 1980 to 2000, massive numbers of portable computers
and pervasive devices appeared in both wired and wireless
applications.
⚫ Since 1990, the use of both HPC and HTC systems hidden in
clusters, grids, or Internet clouds has proliferated.
Clouds and Internet of Things

Fig. Evolutionary trend toward parallel, distributed, and cloud
computing with clusters, MPPs, P2P networks, grids, clouds,
web services, and the Internet of Things.
 Figure above illustrates the evolution of HPC and HTC systems.

 On the HPC side, supercomputers (massively parallel processors or


MPPs) are gradually replaced by clusters of cooperative computers out
of a desire to share computing resources. The cluster is often a
collection of homogeneous compute nodes that are physically connected
in close range to one another.

 On the HTC side, peer-to-peer (P2P) networks are formed for distributed
file sharing and content delivery applications. A P2P system is built over
many client machines. Peer machines are globally distributed in nature.

 Clustering and P2P technologies lead to the development of


computational grids or data grids.
⚫ High-performance computing (HPC) is the use of parallel
processing for running advanced application programs
efficiently, reliably, and quickly. The term applies especially
to systems that function above a teraflop, or 10^12
floating-point operations per second.
⚫ High-throughput computing (HTC) is a computer science
term to describe the use of many computing resources
over long periods of time to accomplish a computational task.
⚫ The HTC paradigm pays more attention to high-flux computing.
The main application for high-flux computing is in Internet
searches and web services used by millions or more users
simultaneously.
⚫ The performance goal thus shifts to measure high
throughput or the number of tasks completed per unit of time.
⚫ HTC technology needs to not only improve in terms of batch
processing speed, but also address the acute
problems of cost, energy savings, security, and reliability
at many data and enterprise computing centers.
iii. Three New Computing Paradigms
⚫ With the introduction of SOA, Web 2.0
services become available.
⚫ Advances in virtualization make
it possible to see the growth of
Internet clouds as a new computing
paradigm.
⚫ The maturity of radio-frequency
identification (RFID), Global Positioning
System (GPS), and sensor technologies
has triggered the development of the
Internet of Things (IoT).
iv. Computing Paradigm Distinctions
In general, distributed computing is the opposite of centralized
computing. The field of parallel computing overlaps with
distributed computing to a great extent, and cloud computing
overlaps with distributed, centralized, and parallel computing.

i. Centralized computing: a paradigm by which all computer
resources are centralized in one physical system. All
resources (processors, memory, and storage) are fully shared
and tightly coupled within one integrated OS. Many data
centers and supercomputers are centralized systems, but they
are used in parallel, distributed, and cloud computing
applications.
ii. Parallel computing: here, all processors are either tightly
coupled with centralized shared memory or loosely coupled with
distributed memory.
Interprocessor communication is accomplished through shared
memory or via message passing.
A computer system capable of parallel computing is commonly
known as a parallel computer.
Programs running in a parallel computer are called parallel
programs.
The process of writing parallel programs is often referred to
as parallel programming.
iii. Distributed computing: the field of computer
science/engineering that studies distributed systems.
A distributed system consists of multiple autonomous
computers, each having its own private memory, communicating
through a computer network.
Information exchange in a distributed system is accomplished
through message passing.
A computer program that runs in a distributed system is known
as a distributed program. The process of writing distributed
programs is referred to as distributed programming.
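As a rough illustration (not from the original slides), the sketch below
mimics message passing between two processes that share no memory, using
Python's multiprocessing module. In a real distributed system the peers
would run on separate machines and exchange messages over a network
(e.g., sockets or MPI) rather than a local Pipe.

# Minimal sketch: message passing between two autonomous processes,
# each with its own private memory. The task and the result travel
# only as messages; no variables are shared.
from multiprocessing import Process, Pipe

def worker(conn):
    task = conn.recv()              # receive a message (the task)
    result = sum(task)              # compute using only private memory
    conn.send(result)               # reply with a message
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4])  # send the task as a message
    print(parent_conn.recv())       # prints 10
    p.join()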
iv. Cloud computing : An Internet cloud of resources
can be either a centralized or a distributed
computing system.
⚫ The cloud applies parallel or
distributed computing, or both.
⚫ Clouds can be built with physical or
virtualized resources over large data
centers that are centralized or distributed.
⚫ Some authors consider cloud computing to
be a form of utility computing or service
computing.
⚫ The high-tech community prefers the term
concurrent computing or concurrent programming.
These terms typically refer to the union of parallel
computing and distributed computing.
⚫ Ubiquitous computing refers to computing with
pervasive devices at any place and time, using
wired or wireless communication.
⚫ The Internet of Things (IoT) is a
networked connection of everyday objects
including computers, sensors, humans, etc.
⚫ The IoT is supported by Internet clouds to
achieve ubiquitous computing with any object at
any place and time.
⚫ Finally, the term Internet computing is even
broader and covers all computing paradigms
over the Internet
v. Distributed System Families
⚫ Technologies used for building P2P networks and
networks of clusters have been consolidated into
many national projects designed to establish wide
area computing infrastructures, known as computational
grids or data grids
⚫ Internet clouds are the result of moving desktop computing
to service-oriented computing using server clusters
and huge databases at data centers.
⚫ In October 2010, the highest performing cluster machine
was built in China with 86,016 CPU cores and 3,211,264 GPU
cores in a Tianhe-1A system.
The largest computational grid connects up to hundreds of
server clusters.
<A graphics processing unit (GPU), also occasionally called a visual
processing unit (VPU), is a specialized electronic circuit designed to
rapidly manipulate and alter memory to accelerate the creation of images
in a frame buffer intended for output to a display.>
⚫ In the future, both HPC and HTC systems will
demand multicore or many-core processors that
can handle large numbers of computing threads
per core.
⚫ Both HPC and HTC systems emphasize
parallelism and distributed computing.
⚫ Future HPC and HTC systems must be able to
satisfy this huge demand in computing power in
terms of throughput, efficiency, scalability, and
reliability.
⚫ The system efficiency is decided by speed,
programming, and energy factors (i.e., throughput
per watt of energy consumed).

Meeting these goals requires the following design
objectives:
⚫ Efficiency measures the utilization rate of resources in an
execution model by exploiting massive parallelism in
HPC. For HTC, efficiency is more closely related
to job throughput, data access, storage, and power
efficiency.
⚫ Dependability measures the reliability and self-
management from the chip to the system and
application levels. The purpose is to provide high-
throughput service with Quality of Service (QoS)
assurance, even under failure conditions.
⚫ Adaptation in the programming model measures the ability
to support billions of job requests over massive data
sets and virtualized cloud resources under various
workload and service models.
⚫ Flexibility in application deployment measures the ability of
distributed systems to run well in both HPC (science and
engineering) and HTC (business) applications.
Scalable Computing Trends and
New Paradigms
⚫ Degrees of Parallelism
When hardware was bulky and expensive, most computers were designed
in a bit-serial fashion.
Bit-level parallelism (BLP) converts bit-serial processing to word-level
processing gradually.
Over the years, users graduated from 4-bit microprocessors to 8-, 16-, 32-,
and 64-bit CPUs. This led us to the next wave of improvement, known as
instruction-level parallelism (ILP), in which the processor executes
multiple instructions simultaneously rather than only one instruction at a
time.
For the past 30 years, we have practiced ILP through pipelining, superscalar
computing, VLIW (very long instruction word) architectures, and multithreading.
ILP requires branch prediction, dynamic scheduling, speculation, and
compiler support to work efficiently.

Data-level parallelism (DLP): was made popular through SIMD (single
instruction, multiple data) and vector machines using vector or array types of
instructions.
DLP requires even more hardware support and compiler assistance to work
properly. Ever since the introduction of multicore processors and chip
multiprocessors (CMPs), we have been exploring task-level parallelism (TLP).
A modern processor explores all of the aforementioned parallelism types. In
fact, BLP, ILP, and DLP are well supported by advances in hardware and
compilers. However, TLP is far from being very successful due to difficulty in
programming and compilation of code for efficient execution on multicore
CMPs.
As we move from parallel processing to distributed processing, we will
see an increase in computing granularity to job-level parallelism (JLP).
It is fair to say that coarse-grain parallelism is built on top of
fine-grain parallelism.
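As a hypothetical Python sketch (not part of the original slides), the
snippet below contrasts two of these granularities: NumPy expresses
data-level parallelism by applying one operation across a whole array,
while a process pool expresses coarser task-level parallelism by running
independent jobs on separate cores.

# Illustrative sketch only: DLP versus TLP in Python.
import numpy as np
from multiprocessing import Pool

def simulate_job(seed: int) -> float:
    """An independent coarse-grained task (TLP/JLP granularity)."""
    rng = np.random.default_rng(seed)
    data = rng.random(1_000_000)
    return float(data.mean())

if __name__ == "__main__":
    # DLP: one operation applied element-wise over a large array.
    x = np.arange(1_000_000, dtype=np.float64)
    y = 2.0 * x + 1.0

    # TLP: several independent tasks scheduled onto multiple cores.
    with Pool(processes=4) as pool:
        results = pool.map(simulate_job, range(8))
    print(y[:3], [round(r, 3) for r in results[:2]])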

⚫ Innovative Applications
A few key applications have driven the development of
parallel and distributed systems over the years.

These applications spread across many important domains in science, engineering,


business, education, health care, traffic control, Internet and web services, military,
and government applications.
Almost all applications demand computing economics, web-scale data collection,
system reliability, and scalable performance. For example, distributed transaction
processing is often practiced in the banking and finance industry. Transactions
represent 90 percent of the existing market for reliable banking systems.
The Trend toward Utility
Computing:Technology Convergence toward
HPC for Science and HTC for Business

(Courtesy of Raj Buyya, University of Melbourne, 2011)



Fig. 2011 Gartner “IT Hype Cycle” for Emerging Technologies.


Disillusionment: a feeling of disappointment resulting from the
discovery that something is not as good as one believed it to be.

Inflated: excessively or unreasonably high.

Enlightenment: greater knowledge and understanding about a subject or
situation.

Trough: a low point; a short period of low activity, low prices, etc.

Cyber-Physical Systems
A cyber-physical system (CPS) is the result of interaction between
computational processes and the physical world. A CPS integrates
“cyber” (heterogeneous, asynchronous) with “physical” (concurrent
and information-dense) objects. A CPS merges the “3C”
technologies of computation, communication, and control into an
intelligent closed feedback system between the physical world and
the information world, a concept which is actively explored in the
United States.
TECHNOLOGIES FOR NETWORK-BASED
SYSTEMS
It is time to explore the hardware, software, and network technologies used
in distributed computing system design and applications.
Performance Metrics and Scalability
Analysis
• Performance metrics are needed to measure various distributed systems.

• In a distributed system, performance is attributed to a large number of factors.

• System throughput is often measured in MIPS, Tflops (tera floating-point
operations per second), or TPS (transactions per second). Other measures
include job response time and network latency (see the sketch after this
list).

• An interconnection network that has low latency and high bandwidth is


preferred.

• System overhead is often attributed to OS boot time, compile time, I/O


data rate, and the runtime support system used.

• Other performance-related metrics include the QoS for Internet and


web services; system availability and dependability; and security resilience
for system defense against network attacks.
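To make the throughput and response-time metrics above concrete, here is a
small sketch with made-up timings (hypothetical numbers, not from the text)
showing how transactions per second (TPS) and mean job response time could
be computed from measured request start and end times.

# Hypothetical request timings, in seconds.
request_start = [0.00, 0.05, 0.10, 0.20, 0.30]
request_end   = [0.40, 0.35, 0.55, 0.60, 0.75]

n_transactions = len(request_start)
wall_clock = max(request_end) - min(request_start)

tps = n_transactions / wall_clock                       # throughput in TPS
avg_response = sum(e - s for s, e in zip(request_start, request_end)) / n_transactions

print(f"Throughput: {tps:.1f} TPS, mean response time: {avg_response * 1000:.0f} ms")
# e.g., Throughput: 6.7 TPS, mean response time: 400 ms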
Dimensions of Scalability
• Users want to have a distributed system that can achieve scalable
performance.

• The following dimensions of scalability are characterized in parallel and


distributed systems:

• Size scalability This refers to achieving higher performance or more


functionality by increasing the machine size. The word “size” refers to
adding processors, cache, memory, storage, or I/O channels. The most
obvious way to determine size scalability is to simply count the number of
processors installed.

• Software scalability This refers to upgrades in the OS or compilers,


adding mathematical and engineering libraries, porting new application
software, and installing more user-friendly programming environments.

• Application scalability This refers to matching problem size scalability
with machine size scalability. Instead of increasing machine size, users
can enlarge the problem size to enhance system efficiency or
cost-effectiveness.
• Technology scalability This refers to a system that can adapt to changes in
building technologies, such as the component and networking technologies.

• When scaling a system design with new technology one must consider three
aspects: time, space, and heterogeneity.
(1) Time refers to generation scalability. When changing to new-generation
processors, one must consider the impact to the motherboard, power supply,
packaging and cooling, and so forth.
(2) Space is related to packaging and energy concerns.
(3) Heterogeneity refers to the use of hardware components or software
packages from different vendors.
Scalability versus OS Image Count

 Scalable performance implies that the system can achieve higher speed by adding
more processors or servers, enlarging the physical node’s memory size, extending
the disk capacity, or adding more I/O channels.

 The OS image is counted by the number of independent OS images observed in a


cluster, grid, P2P network, or the cloud.

• An SMP (symmetric multiprocessor) server has a single system


image, which could be a single node in a large cluster.
• NUMA (nonuniform memory access) machines are often made out
of SMP nodes with distributed, shared memory. A NUMA machine
can run with multiple operating systems, and can scale to a few
thousand processors communicating with the MPI library. For
example, a NUMA machine may have 2,048 processors running 32
SMP operating systems, resulting in 32 OS images in the 2,048-
processor NUMA system.

• Many cluster nodes are SMP or multicore servers, and hence the
total number of processors or cores in a cluster system is one or two
orders of magnitude greater than the number of OS images running in
the cluster.
Amdahl’s Law
• Assume that a fraction α of the code must be executed
sequentially, called the sequential bottleneck. Therefore, (1 − α)
of the code can be compiled for parallel execution by n
processors.

• The total execution time of the program is calculated by
αT + (1 − α)T/n, where the first term is the sequential execution
time on a single processor and the second term is the parallel
execution time on n processing nodes.

• Amdahl’s Law states that the speedup factor of using the n-processor
system over a single processor is expressed as
Speedup S = T / [αT + (1 − α)T/n] = 1 / [α + (1 − α)/n].
• The maximum speedup of n is achieved only if the sequential
bottleneck α is reduced to zero or the code is fully parallelizable
with α = 0.

• For example, the maximum speedup achieved is 4 if α = 0.25 (i.e., 1
− α = 0.75), even if one uses hundreds of processors (see the sketch
below).

• Amdahl’s law teaches us that we should make the sequential
bottleneck as small as possible. Increasing the cluster size alone
may not result in a good speedup.
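A minimal sketch (assuming the speedup formula above) that reproduces the
α = 0.25 example: the speedup saturates near 4 no matter how many
processors are added.

# Amdahl's Law: S(n) = 1 / (alpha + (1 - alpha) / n),
# where alpha is the sequential fraction of the code.
def amdahl_speedup(alpha: float, n: int) -> float:
    return 1.0 / (alpha + (1.0 - alpha) / n)

for n in (4, 16, 256, 4096):
    print(n, round(amdahl_speedup(0.25, n), 2))
# 4 2.29, 16 3.37, 256 3.95, 4096 4.0 (approximately)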

Problem with Fixed Workload
• To execute a fixed workload on n processors, parallel processing
may lead to a system efficiency defined as follows:
• E = S/n = 1/[αn + 1 − α]

• Very often the system efficiency is rather low, especially when the
cluster size is very large. This is because only a few processors
(say, 4) are kept busy, while the majority of the nodes are left idle.