
CLOUD COMPUTING

UNIT - I
SYSTEMS MODELING, CLUSTERING AND VIRTUALIZATION

1. Scalable Computing over the Internet


Over the past 60 years, computing technology has undergone a series of
platform and environment changes, including evolutionary changes in
machine architecture, operating system platform, network connectivity,
and application workload.
Instead of using a centralized computer to solve computational
problems, a parallel and distributed computing system uses multiple
computers to solve large-scale problems over the Internet.

2. The age of Internet Computing


The age of Internet computing refers to the era in which computing
resources and services have become widely available over the Internet,
transforming how individuals, businesses, and governments access and
use technology. This era is best characterized by cloud computing.
Phases of Internet Computing
1) Pre-Internet computing (1960-1980)
2) The advent of the Internet (1990s)
3) Cloud computing emergence (2000s)
4) Distributed and decentralized computing (2010s)
5) Modern era (2020 and beyond)

3. Technologies for network-based systems


Technologies for network-based systems play a critical role in
ensuring efficient communication, resource allocation, and service
delivery. These underlying technologies enable cloud infrastructures to
handle more data, connect distributed resources, and meet the demands
of modern applications.

1) Multicore CPUs and Multithreading Technologies
2) Advances in CPU processors
3) Multicore CPUs and Many-core GPU Architectures
4) Multithreading Technology & How GPUs Work
5) Memory, storage, and wide-area networking
6) Memory Technology
7) Disk and storage technology
8) System area interconnects
9) Virtual machines and virtualization middleware
10) Virtual machines
11) VM primitive operations
12) Virtual infrastructures
13) Data center virtualization for cloud computing
14) Data center growth and cost breakdown
4. System models for distributed and cloud computing
Distributed and cloud computing systems are built over a large
number of autonomous computer nodes. All these nodes are
interconnected by SANs, LANs, and WANs in a hierarchical manner. With a
few LAN switches we can connect hundreds of machines as a working
cluster, and a WAN can connect many local clusters to form a large
cluster of clusters. Massive systems are considered highly scalable and
can reach web-scale connectivity, either physically or logically.
Massive systems are classified into four groups:

a) Clusters : A distributed systems cluster is a group of machines that
are virtually or geographically separated and that work together to
provide the same service or application to clients. Many of the services
you run in your network today may already be part of a distributed
cluster.

b) P2P Networks : In a P2P system, every node acts as both a client and
a server, providing part of the system resources. Peer machines are
simply client computers connected to the Internet. All client machines act
autonomously to join or leave the system freely. This implies that no
master-slave relationship exists among the peers. No central coordination
or central database is needed. The system is self-organizing with
distributed control.

c) Computing grids : This is the use of widely distributed computer


resources to reach a common goal. A computing grid can be thought of as
a distributed system with non-interactive workloads that involve many
files. Grid computing is distinguished from conventional high-performance
computing systems such as cluster computing in that grid computers
have each node set to perform a different task/application. Grid
computers also tend to be more heterogeneous and geographically
dispersed than cluster computers.

d) Internet clouds : The idea is to move desktop computing to a
service-oriented platform using server clusters and huge databases at
data centers. Cloud computing leverages its low cost and simplicity to
benefit both users and providers. Machine virtualization has enabled such
cost-effectiveness. Cloud computing intends to satisfy many user
applications simultaneously. Virtualized resources from data centers form
an Internet cloud, provisioned with hardware, software, storage, network,
and services for paid users to run their applications.
5. Performance, Security and Energy Efficiency

1.5.1 Performance metrics and Scalability analysis

Performance metrics are needed to measure various distributed
systems. This section examines various dimensions of scalability and
performance laws, as well as system scalability against the number of OS
images and the limiting factors encountered.
1.5.1.1. Performance metrics
1.5.1.2. Dimensions of Scalability
1.5.1.3. Scalability versus OS Image count

1.5.2 Fault Tolerance and System Availability


In addition to performance, system availability and application
flexibility are two other important design goals in a distributed computing
system.
1.5.2.1. System Availability

1.5.3 Network Threats and data integrity


Clusters, grids, P2P networks, and clouds demand security and
copyright protection if they are to be accepted in today’s digital society.
This section introduces system vulnerability, network threats, defense
countermeasures, and copyright protection in distributed or cloud
computing systems.
1.5.3.1. Threats to systems and networks
1.5.3.2. Security Responsibilities
1.5.3.3. Copyright protection
1.5.3.4. System Defense Technologies
1.5.3.5. Data Protection infrastructure

1.5.4 Energy efficiency in distributed computing


Energy efficiency in distributed computing is very important
because, every day, many servers around the world run their daily tasks
while other servers sit idle with no work, yet those idle servers consume
power much like the working ones. To save this energy, energy-efficiency
techniques must be applied throughout distributed computing.
The potential savings in turning off these servers are large: $3.8
billion globally in energy costs alone, and $24.7 billion in the total cost of
running nonproductive servers, according to a study by 1E Company in
partnership with the Alliance to Save Energy (ASE). This amount of
wasted energy is equal to 11.8 million tons of carbon dioxide per year,
which is equivalent to the CO2 pollution of 2.1 million cars.
1.5.4.1. Energy consumption of unused servers
1.5.4.2. Reducing energy in active servers
1.5.4.3. DVFS Method for energy efficiency
1.5.4.4. Middleware layer
1.5.4.5. Resource layer
UNIT – II
VIRTUAL MACHINES AND VIRTUALIZATION OF
CLUSTERING AND DATA CENTER
Virtualization:
Virtualization is the "creation of a virtual (rather than actual) version of
something, such as a server, a desktop, a storage device, an operating
system or network resources".
In other words, virtualization is a technique that allows a single
physical instance of a resource or an application to be shared among
multiple customers and organizations. It does so by assigning a logical
name to physical storage and providing a pointer to that physical
resource when it is demanded.

1. Implementation Levels of Virtualization


Virtualization is a computer architecture technology by which multiple
virtual machines (VMs) are multiplexed in the same hardware machine.
The purpose of a VM is to enhance resource sharing by many users and
improve computer performance in terms of resource utilization and
application flexibility. Hardware resources (CPU, memory, I/O devices,
etc.) or software resources (operating system and software libraries) can
be virtualized in various functional layers. The idea is to separate the
hardware from the software to yield better system efficiency.

2.1.1. Levels of Virtualization


Virtualization is not a simple setup: a computer runs an operating
system that is configured for particular hardware, and running another
operating system on the same hardware is difficult. To do this we need a
hypervisor.
A hypervisor is a bridge between the hardware and the virtual
operating system that allows smooth functioning. Virtualization is
commonly implemented at five different levels:
1) Instruction set Architecture level (ISA)
2) Hardware Abstraction level (HAL)
3) Operating System Level
4) Library level
5) Application level
2.1.2. VMM Design Requirements and Providers
Hardware-level virtualization inserts a layer between the real
hardware and the traditional operating system. This layer is called the
Virtual Machine Monitor (VMM), and it manages the hardware resources
of a computing system.

➢ There are three requirements for a VMM:
1. First, a VMM should provide an environment for programs which is
essentially identical to the original machine.
2. Second, programs run in this environment should show, at worst,
only minor decreases in speed.
3. Third, a VMM should be in complete control of the system
resources.

➢ Complete control of these resources by a VMM includes the following
aspects:
1. The VMM is responsible for allocating hardware resources for
programs;
2. it is not possible for a program to access any resource not
explicitly allocated to it; and
3. it is possible under certain circumstances for a VMM to regain
control of resources already allocated.

Not all processors satisfy these requirements for a VMM.
A VMM is tightly related to the architectures of processors.

2.1.3. Virtualization support at the OS level


• Operating system virtualization (or OS Image virtualization) inserts a
virtualization layer inside an operating system to partition a
machine's physical resources.
• It enables multiple isolated VMs within a single operating system
kernel.
• This kind of VM is often called a virtual execution environment (VE),
Virtual Private System (VPS), or simply container.
• From the user's point of view, VEs look like real servers: a VE has its
own set of processes, file system, user accounts, network interfaces
with IP addresses, routing tables, firewall rules, and other personal
settings.
2.1.4. Middleware support for virtualization
The library-level virtualization is also known as user-level Application
Binary Interface (ABI) or API emulation. This type of virtualization can
create execution environments for running alien programs on a platform
rather than creating a VM to run the entire operating system. API call
interception and remapping are the key functions performed. The various
library-level virtualization systems are the Windows Application Binary
Interface (WABI), lxrun, WINE, Visual MainWin, and vCUDA.
The WABI offers middleware to convert Windows system calls to
Solaris system calls. Lxrun is really a system call emulator that enables
Linux applications written for x86 hosts to run on UNIX systems. Similarly,
Wine offers library support for virtualizing x86 processors to run Windows
applications on UNIX hosts. Visual MainWin offers a compiler support
system to develop Windows applications using Visual Studio to run on
some UNIX hosts.

2. Virtualization Structures/ Tools and Mechanisms


In a VM architecture there are three typical layers: the application, the
virtualization layer (hypervisor or VMM), and the hardware running the
host OS. Before virtualization there are also three layers: the application,
the host operating system, and the hardware.
Before virtualization, the operating system manages the hardware.
After virtualization, a virtualization layer is inserted between the hardware
and the operating system. In such a case, the virtualization layer is
responsible for converting portions of the real hardware into virtual
hardware. Therefore, different operating systems such as Linux and
Windows can run on the same physical machine, simultaneously.

Based on the position of the virtualization layer, there are several classes
of VM architecture, as described below.

2.2.1. Hypervisor and Xen Architecture


The hypervisor supports hardware-level virtualization on bare-metal
devices such as the CPU, memory, disk, and network interfaces. The
hypervisor software sits directly between the physical hardware and its
OS. This virtualization layer is referred to as either the VMM or the
hypervisor. The hypervisor provides hypercalls for the guest OSes and
applications.
2.2.2. Binary Translation with full Virtualization
Depending on implementation technologies, hardware virtualization can
be classified into two categories: full virtualization and host-based
virtualization. Full virtualization does not need to modify the host OS. It
relies on binary translation to trap and to virtualize the execution of
certain sensitive, nonvirtualizable instructions. The guest OSes and their
applications consist of noncritical and critical instructions. In a host-based
system, both a host OS and a guest OS are used. A virtualization software
layer is built between the host OS and guest OS. These two classes of VM
architecture are introduced next.
➢ Full virtualization
With full virtualization, noncritical instructions run on the hardware
directly while critical instructions are discovered and replaced with
traps into the VMM to be emulated by software. Both the hypervisor
and VMM approaches are considered full virtualization.

➢ Binary translation of guest OS requests using a VMM


Binary translation of guest OS requests using a VMM approach was
implemented by VMware and many other software companies.
VMware puts the VMM at Ring 0 and the guest OS at Ring 1. The
VMM scans the instruction stream and identifies the privileged,
control- and behavior-sensitive instructions. When these instructions
are identified, they are trapped into the VMM, which emulates the
behavior of these instructions. The method used in this emulation is
called binary translation.

➢ Host based virtualization


An alternative VM architecture is to install a virtualization layer on
top of the host OS. This host OS is still responsible for managing the
hardware. The guest OSes are installed and run on top of the
virtualization layer. Dedicated applications may run on the VMs.
Certainly, some other applications can also run with the host OS
directly.
2.2.3. Para-Virtualization with compiler support
Para-virtualization needs to modify the guest operating systems. A
para-virtualized VM provides special APIs requiring substantial OS
modifications in user applications. Performance degradation is a critical
issue of a virtualized system. No one wants to use a VM if it is much
slower than using a physical machine. The virtualization layer can be
inserted at different positions in a machine software stack. However,
para-virtualization attempts to reduce the virtualization overhead, and
thus improve performance by modifying only the guest OS kernel.

➢ Para-Virtualization Architecture
When the x86 processor is virtualized, a virtualization layer is
inserted between the hardware and the OS. According to the x86
ring definition, the virtualization layer should also be installed at
Ring 0. Different instructions at Ring 0 may cause some problems. In
Figure 3.8, we show that para-virtualization replaces non-
virtualizable instructions with hypercalls that communicate directly
with the hypervisor or VMM. However, when the guest OS kernel is
modified for virtualization, it can no longer run on the hardware
directly. Compared with full virtualization, para-virtualization is
relatively easy and more practical. The main problem in full
virtualization is its low performance in binary translation. To speed
up binary translation is difficult. Therefore, many virtualization
products employ the para-virtualization architecture. The popular
Xen, KVM, and VMware ESX are good examples.
3. Virtualization of CPU
Virtualization of the CPU refers to the process of creating a virtual
version of a physical CPU, allowing multiple virtual machines (VMs) to
share and use the same underlying hardware resources. This concept is
central to modern cloud computing, data centers, and server
management, enabling efficient use of physical hardware.

How CPU Virtualization Works:


➢ Hypervisor: A hypervisor is software or firmware that creates and
manages virtual machines (VMs). It allocates CPU resources from the
host machine to each VM and manages execution.
• Type 1 Hypervisor: Runs directly on the host hardware (e.g.,
VMware ESXi, Microsoft Hyper-V).
• Type 2 Hypervisor: Runs on top of a host operating system (e.g.,
VirtualBox, VMware Workstation).

➢ Virtual CPUs (vCPUs): The hypervisor creates virtual CPUs for


each VM by dividing the physical CPU(s) into smaller, manageable
units. Each VM perceives these as if they are dedicated CPUs, but
they are shared by multiple VMs.
➢ Scheduling: The hypervisor schedules and assigns vCPUs to the
physical CPU cores, ensuring efficient use of resources and isolation
between VMs.
➢ Hardware-Assisted Virtualization: Modern CPUs (e.g., Intel VT-x,
AMD-V) offer hardware support for virtualization, improving
performance and reducing overhead. This allows the hypervisor to
interact more directly with the CPU.
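
A quick way to see whether a host offers this hardware assistance is to inspect the CPU feature flags. The short sketch below assumes a Linux host exposing /proc/cpuinfo; the function is our own illustration, not part of any hypervisor API.

    # Detect hardware virtualization extensions on a Linux host by reading
    # /proc/cpuinfo: the "vmx" flag marks Intel VT-x, "svm" marks AMD-V.
    def hw_virtualization_support(cpuinfo_path: str = "/proc/cpuinfo") -> str:
        flags = set()
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        if "vmx" in flags:
            return "Intel VT-x available"
        if "svm" in flags:
            return "AMD-V available"
        return "no hardware virtualization extensions reported"

    print(hw_virtualization_support())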
4. Memory and I/O Devices
2.4.1. Memory virtualization
Memory virtualization is the process of abstracting and managing
the system's physical memory (RAM) to create the illusion of a
larger, more flexible pool of memory for applications and operating
systems. It plays a key role in modern computing, particularly in
virtualization, cloud environments, and operating systems.

How Memory Virtualization Works:


➢ Virtual Memory: The core concept of memory virtualization is
virtual memory, which provides each application or process with its
own address space. This means the application perceives it has
access to a large, contiguous block of memory, even though the
actual physical memory may be fragmented or limited.
➢ Memory Management Unit (MMU): The MMU in the CPU is
responsible for translating virtual addresses
into physical addresses. It maintains a mapping between the virtual
and physical memory using structures like page tables.
➢ Paging: Memory is divided into fixed-size blocks called pages.
When an application accesses a piece of data, the MMU translates
the virtual address to the physical address of the corresponding
page.
➢ Page Tables: The MMU uses page tables to map virtual memory
pages to physical memory frames. These tables keep track of where
data is stored and manage the mapping efficiently (a worked
example of this translation appears after this list).
➢ Swapping (Virtual Memory Extension): When the system runs
out of physical memory, it can move inactive pages to disk storage
(swap space) and load them back into memory when needed. This
extends the effective memory available to applications, albeit at a
performance cost.
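
Here is the worked example of the MMU translation promised above: a toy Python model with 4 KB pages and a made-up page table, so all numbers are illustrative only.

    PAGE_SIZE = 4096                       # 4 KB pages

    # Hypothetical page table: virtual page number -> physical frame number.
    page_table = {0: 7, 1: 3, 2: 12}

    def translate(virtual_addr: int) -> int:
        page, offset = divmod(virtual_addr, PAGE_SIZE)   # split into page and offset
        frame = page_table[page]                         # a missing entry would be a page fault
        return frame * PAGE_SIZE + offset                # same offset inside the physical frame

    print(hex(translate(0x1A3C)))   # virtual page 1 maps to frame 3 -> 0x3A3C

Virtual address 0x1A3C lies in page 1 at offset 0x0A3C; since page 1 maps to frame 3, the physical address is 3 x 4096 + 0x0A3C = 0x3A3C.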
2.4.2. I/O Virtualization
I/O virtualization involves managing the routing of I/O requests
between virtual devices and the shared physical hardware. At the time of
this writing, there are three ways to implement I/O virtualization: full
device emulation, para-virtualization, and direct I/O.

➢ Full device emulation is the first approach for I/O virtualization.


Generally, this approach emulates well known, real-world devices. All
the functions of a device or bus infrastructure, such as device
enumeration, identification, interrupts, and DMA, are replicated in
software. This software is located in the VMM and acts as a virtual
device. The I/O access requests of the guest OS are trapped in the
VMM which interacts with the I/O devices.
➢ The para-virtualization method of I/O virtualization is typically used
in Xen. It is also known as the split driver model consisting of a
frontend driver and a backend driver. The frontend driver is running
in Domain U and the backend driver is running in Domain 0. They
interact with each other via a block of shared memory.
➢ Direct I/O virtualization lets the VM access devices directly. It can
achieve close-to-native performance without high CPU costs.
However, current direct I/O virtualization implementations focus on
networking for mainframes. There are a lot of challenges for
commodity hardware devices.
5. Virtual Clusters and Resource Management
Virtual Clusters and Resource Management are essential
components in distributed computing and cloud environments, where
multiple virtual machines (VMs), containers, or workloads are run on
shared infrastructure. Virtual clusters provide an abstraction of physical
clusters, allowing for more flexible and efficient resource utilization, while
resource management ensures that computational resources (like CPU,
memory, storage, and network) are allocated, balanced, and optimized
for performance, scalability, and cost-efficiency. VMs can communicate
with one another freely through the virtual network interface card and
configure the network automatically.
2.5.1. Physical versus virtual clusters
The provisioning of VMs to a virtual cluster is done dynamically to have
the following interesting properties:
● The virtual cluster nodes can be either physical or virtual machines.
Multiple VMs running with different OSes can be deployed on the
same physical node.
● A VM runs with a guest OS, which is often different from the host OS,
that manages the resources in the physical machine, where the VM
is implemented.
● The purpose of using VMs is to consolidate multiple functionalities on
the same server. This will greatly enhance server utilization and
application flexibility.

2.5.2. Live VM Migration steps and performance Effects


In a cluster built with mixed nodes of host and guest systems, the
normal method of operation is to run everything on the physical machine.
When a VM fails, its role could be replaced by another VM on a different
node, as long as they both run with the same guest OS.
2.5.3. Migration of memory, files, and network resource
Since clusters have a high initial cost of ownership, including space,
power conditioning, and cooling equipment, leasing or sharing access to
a common cluster is an attractive solution when demands vary over time.
Shared clusters offer economies of scale and more effective utilization of
resources by multiplexing. Early configuration and management systems
focus on expressive and scalable mechanisms for defining clusters for
specific types of service, and physically partition cluster nodes among
those types.
2.5.4. Dynamic Deployment of virtual clusters
Dynamic deployment of virtual clusters involves the creation,
scaling, and management of clusters of virtual machines (VMs) or
containers in response to workload demands. This process is highly
automated, allowing for efficient resource utilization and flexible scaling.
6. Virtualization for Data-Center Automation
Data-center automation means that huge volumes of hardware,
software, and database resources in these data centers can be allocated
dynamically to millions of Internet users simultaneously, with guaranteed
QoS and cost-effectiveness. This automation process is triggered by the
growth of virtualization products and cloud computing services. The latest
virtualization development highlights high availability (HA), backup
services, workload balancing, and further increases in client bases.
➢ Server consolidation in Data centers
In data centers, a large number of heterogeneous workloads can run
on servers at various times. These heterogeneous workloads can be
roughly divided into two categories: chatty workloads and noninteractive
workloads. Chatty workloads may burst at some point and return to a
silent state at some other point.
➢ Virtual storage management
The term “storage virtualization” was widely used before the
rebirth of system virtualization. Yet the term has a different meaning
in a system virtualization environment. Previously, storage
virtualization was largely used to describe the aggregation and
repartitioning of disks at very coarse time scales for use by physical
machines. In system virtualization, virtual storage includes the
storage managed by VMMs and guest OSes. Generally, the data
stored in this environment can be classified into two categories: VM
images and application data. The VM images are special to the
virtual environment, while application data includes all other data
which is the same as the data in traditional OS environments.
➢ Cloud OS for virtualized data centers
➢ Trust management In virtualized data center
A VMM changes the computer architecture. It provides a layer of
software between the operating systems and system hardware to
create one or more VMs on a single physical platform. A VM entirely
encapsulates the state of the guest operating system running inside
it. Encapsulated machine state can be copied and shared over the
network and removed like a normal file, which proposes a challenge
to VM security. In general, a VMM can provide secure isolation and a
VM accesses hardware resources through the control of the VMM, so
the VMM is the base of the security of a virtual system.
UNIT - III
Cloud Platform Architecture
1. Cloud Computing and Service Models
Cloud computing is the delivery of computing services, including
storage, processing power, databases, networking, software, and more,
over the Internet ("the cloud"). These services are provided by cloud
service providers such as Amazon Web Services (AWS) and Google Cloud,
which allow users to access resources on demand without having to own
and maintain physical infrastructure.
➢ Cloud Computing Service Models
1.Infrastructure as a Service (IaaS):
• Definition: IaaS provides virtualized computing resources over
the internet. This includes physical or virtual machines, storage,
and networking components. Users are responsible for
managing the OS, middleware, applications, and data.
• Use Case: Ideal for businesses that want to build and manage
their own applications but don't want to invest in physical
hardware.
• Examples: Amazon EC2, Google Compute Engine, Microsoft
Azure Virtual Machines.
2.Platform as a Service (PaaS):
• Definition: PaaS provides a platform allowing customers to
develop, run, and manage applications without the complexity
of building and maintaining the underlying infrastructure. PaaS
solutions handle the OS, middleware, and runtime environment.
• Use Case: Suitable for developers who want to focus on
building and deploying applications without worrying about the
underlying infrastructure.
• Examples: Heroku, Google App Engine, Microsoft Azure App
Service.
3.Software as a Service (SaaS):
• Definition: SaaS delivers fully functional, ready-to-use software
applications over the internet. Users can access the software
through web browsers, and the service provider manages
everything from the infrastructure to the application.
• Use Case: Great for users who need ready-to-use software
without managing hardware or software updates.
• Examples: Google Workspace (Gmail, Google Docs),
Salesforce, Microsoft Office 365.
➢ Cloud Deployment Models
1.Public Cloud:
• Definition: Resources are shared and delivered over the
internet to multiple customers by a third-party provider. These
are typically available to the general public.
• Examples: AWS, Microsoft Azure, Google Cloud.
2.Private Cloud:
• Definition: The cloud infrastructure is used exclusively by a
single organization, either hosted on-premises or by a third
party. It provides more control over the infrastructure and
security.
• Examples: VMware, OpenStack private cloud.
3.Hybrid Cloud:
• Definition: A combination of both public and private clouds,
allowing data and applications to be shared between them.
Hybrid clouds offer more flexibility by enabling businesses to
scale workloads across public and private clouds as needed.
• Examples: A company using its private cloud for sensitive data
and public cloud for less critical operations.
4.Community Cloud:
• Definition: A cloud infrastructure shared by several
organizations with similar requirements, typically managed by
one of the organizations or a third-party provider.
• Examples: Research institutions or government organizations
with shared resources.

2. Public Cloud Platforms


Public cloud platforms are operated by cloud providers that offer
cloud services to the general public. Several companies provide such
services to public users, including IBM, Amazon, Google, Microsoft, and
Salesforce.

3.2.1. Public Clouds and service offerings


Public clouds provide a variety of scalable and flexible services that are
typically grouped into different categories such as Infrastructure,
Platform, and Software as a Service. These services allow businesses to
reduce the need for on-premises infrastructure and pay only for the
resources they use.
3.2.2. Google app engine(GAE)
Google has the world’s largest search engine facilities. The company
has extensive experience in massive data processing that has led to new
insights into data-center design and novel programming models that
scale to incredible sizes. Google has hundreds of data centers and has
installed more than 460,000 servers worldwide. Google App Engine (GAE)
offers a PaaS platform supporting various cloud and web applications.

3.2.3. Amazon web services


VMs can be used to share computing resources both flexibly and safely.
Amazon has been a leader in providing public cloud services
(http://aws.amazon.com/). Amazon applies the IaaS model in providing
its services. EC2 provides the virtualized platforms that host the VMs
where cloud applications can run. S3 (Simple Storage Service) provides
the object-oriented storage service for users. EBS (Elastic Block Store)
provides the block storage interface which can be used to support
traditional applications. SQS stands for Simple Queue Service, and its job
is to ensure a reliable message service between two processes. The
message can be kept reliably even when the receiver processes are not
running. Users can access their objects through SOAP with either
browsers or other client programs which support the SOAP standard.
3.2.4. Microsoft windows Azure
In 2008, Microsoft launched a Windows Azure platform to meet the
challenges in cloud computing. This platform is built over Microsoft data
centers. The figure below shows the overall architecture of Microsoft’s
cloud platform. The platform is divided into three major component
platforms. Windows Azure offers a cloud platform built on Windows OS
and based on Microsoft virtualization technology. Applications are
installed on VMs deployed on the data-center servers. Azure manages all
servers, storage, and network resources of the data center. On top of the
infrastructure are the various services for building different cloud
applications.

3. Service Oriented Architecture


The service-oriented Architecture is used to design a software
system that makes use of services of new or legacy applications through
their published or discoverable interfaces. These applications are often
distributed over the networks. SOA also aims to make service
interoperability extensible and effective. It prompts architecture styles
such as loose coupling, published interfaces, and a standard
communication model in order to support this goal. The World Wide Web
Consortium (W3C) defines SOA as a form of distributed systems
architecture. SOA is related to early efforts on the architecture style of
large-scale distributed systems, particularly Representational State
Transfer (REST). Nowadays, REST still provides an alternative to the
complex standard-driven web services technology and is used in many
Web 2.0 services.
3.3.1. REST and systems of systems
REST is a software architecture style for distributed systems,
particularly distributed hypermedia systems, such as the World Wide Web.
It has gained popularity among enterprises such as Google, Amazon,
Yahoo!, and especially social networks such as Facebook and Twitter
because of its simplicity, and its ease of being published and consumed by
clients. RESTful web services can be considered an alternative to SOAP
stack or “big web services” described in the next section, because of their
simplicity, lightweight nature, and integration with HTTP. With the help of
URIs and hyperlinks, REST has shown that it is possible to discover web
resources without an approach based on registration to a centralized
repository.
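
A small example makes the REST style concrete: resources are identified by URIs and manipulated with plain HTTP verbs, usually exchanging JSON representations. The endpoint below is hypothetical and only sketches the interaction pattern, using the Python requests library.

    import requests

    BASE = "https://api.example.com"   # hypothetical RESTful service

    # Read a resource: GET on its URI returns a representation (commonly JSON).
    resp = requests.get(f"{BASE}/users/42", timeout=10)
    if resp.ok:
        print(resp.json())

    # Create a resource: POST a representation to the collection URI.
    resp = requests.post(f"{BASE}/users", json={"name": "Ada"}, timeout=10)
    print(resp.status_code)            # e.g., 201 Created on success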

3.3.2. Services and Web Services


The term “web service” often refers to a self-contained, self-
describing, modular application designed to be used by and accessible to
other software applications across the web. Once a web service is
deployed, other applications and other web services can discover and
invoke the deployed service. The Web Services Description Language
(WSDL) describes the interface, a set of operations supported by a web
service, in a standard format. It standardizes the representation of input
and output parameters of its operations as well as the service’s protocol
binding, the way in which the messages will be transferred on the wire.
Using WSDL enables disparate clients to automatically understand how to
interact with a web service.

3.3.3. Enterprise Multitier Architecture


Enterprise applications often use multitier architecture to
encapsulate and integrate various functionalities. Multitier architecture is
a kind of client/server architecture in which the presentation, the
application processing, and the data management are logically separate
processes. The simplest known multilayer architecture is a two-tier or
client/server system. This traditional two-tier, client/server model requires
clustering and disaster recovery to ensure resiliency. While the use of
fewer nodes in an enterprise simplifies manageability, change
management is difficult as it requires servers to be taken offline for
repair, upgrading, and new application deployments.
3.3.4. Grid services and OGSA
The OGSA, developed within the OGSA Working Group of the Global
Grid Forum (recently renamed to Open Grid Forum or OGF and being
merged with the Enterprise Grid Alliance or EGA in June 2006), is a
service-oriented architecture that aims to define a common, standard,
and open architecture for grid-based applications. “Open” refers to both
the process to develop standards and the standards themselves. In OGSA,
everything from registries, to computational tasks, to data resources is
considered a service. This extensible set of services forms the building
blocks of an OGSA-based grid. OGSA is intended to:
• Facilitate use and management of resources across distributed,
heterogeneous environments
• Deliver seamless QoS
• Define open, published interfaces in order to provide interoperability of
diverse resources
• Exploit industry-standard integration technologies
• Develop standards that achieve interoperability
• Integrate, virtualize, and manage services and resources in
a distributed, heterogeneous environment
4. Programming on Amazon AWS and Microsoft Azure
To program on Amazon AWS and Microsoft Azure, a few topics must
be covered first. We begin by reviewing the AWS platform and its updated
service offerings, and then study the EC2, S3, and SimpleDB services with
programming examples. Regarding programming environment features,
Amazon (like Azure) offers a Relational Database Service (RDS) with a
messaging interface. The Elastic MapReduce capability is equivalent to
Hadoop running on the basic EC2 offering. Amazon has NoSQL support in
SimpleDB. However, Amazon does not directly support BigTable.

3.4.1. Programming on Amazon Elastic Compute Cloud (EC2)


Amazon was the first company to introduce VMs in application
hosting. Customers can rent VMs instead of physical machines to run
their own applications. By using VMs, customers can load any software of
their choice. The elastic feature of such a service is that a customer can
create, launch, and terminate server instances as needed, paying by the
hour for active servers. Amazon provides several types of preinstalled
VMs. Instances are often called Amazon Machine Images (AMIs) which are
preconfigured with operating systems based on Linux or Windows, and
additional software.
Figure: Amazon EC2 services.
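
A minimal sketch of this elastic create/launch/terminate cycle with the boto3 SDK is shown below. The AMI ID, region, and instance type are placeholders, and valid AWS credentials are assumed to be configured in the environment.

    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")   # region is an assumption

    # Launch one small instance from a preconfigured AMI (placeholder image ID).
    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance = instances[0]
    instance.wait_until_running()       # hourly billing applies while the server is active
    instance.reload()
    print(instance.id, instance.state["Name"])

    # Terminate the server instance when it is no longer needed.
    instance.terminate()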
3.4.2. Amazon Simple Storage Services(S3)
Amazon S3 provides a simple web services interface that can be
used to store and retrieve any amount of data, at any time, from
anywhere on the web. S3 provides the object-oriented storage service for
users. Users can access their objects through Simple Object Access
Protocol (SOAP) with either browsers or other client programs which
support SOAP. SQS is responsible for ensuring a reliable message service
between two processes, even if the receiver processes are not running.
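
The sketch below stores and retrieves one object through the S3 web services interface using the boto3 SDK (rather than the SOAP access mentioned above); the bucket name and key are placeholders.

    import boto3

    s3 = boto3.client("s3")              # credentials are assumed to be configured
    bucket = "my-example-bucket"         # placeholder bucket name

    # Store any amount of data as an object addressed by a key within the bucket.
    s3.put_object(Bucket=bucket, Key="notes/unit3.txt", Body=b"cloud storage demo")

    # Retrieve the object again, from anywhere on the web, at any time.
    obj = s3.get_object(Bucket=bucket, Key="notes/unit3.txt")
    print(obj["Body"].read().decode())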

3.4.3. Amazon Elastic Block Store (EBS) and simpleDB


The Elastic Block Store (EBS) provides the volume block interface for
saving and restoring the virtual images of EC2 instances. Traditional EC2
instances will be destroyed after use. The status of EC2 can now be saved
in the EBS system after the machine is shut down. Users can use EBS to
save persistent data and mount to the running instances of EC2. Note
that S3 is “Storage as a Service” with a messaging interface. EBS is
analogous to a distributed file system accessed by traditional OS disk
access mechanisms. EBS allows you to create storage volumes from 1 GB
to 1 TB that can be mounted by EC2 instances.
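
As an illustration of this block interface, the boto3 calls below create a volume and attach it to a running instance so that its data persists independently of the instance. The size, availability zone, and instance ID are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # region is an assumption

    # Create a 10 GB block-storage volume in a chosen availability zone.
    volume = ec2.create_volume(Size=10, AvailabilityZone="us-east-1a", VolumeType="gp2")

    # Attach it to a running EC2 instance; it then appears as an ordinary block
    # device that can be formatted and mounted like a local disk.
    ec2.attach_volume(
        VolumeId=volume["VolumeId"],
        InstanceId="i-0123456789abcdef0",   # placeholder instance ID
        Device="/dev/sdf",
    )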

3.4.3.1 Amazon simpleDB service


SimpleDB provides a simplified data model based on the relational
database data model. Structured data from users must be organized into
domains. Each domain can be considered a table. The items are the rows
in the table. A cell in the table is recognized as the value for a specific
attribute (column name) of the corresponding row. This is similar to a
table in a relational database. However, it is possible to assign multiple
values to a single cell in the table. This is not permitted in a traditional
relational database which wants to maintain data consistency.
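
The domain/item/attribute model, including multi-valued cells, can be pictured with plain Python data structures (no SDK calls); the names below are invented for illustration.

    # A SimpleDB "domain" behaves like a table whose cells may hold several values.
    domain = {                               # domain    ~ table
        "item-001": {                        # item      ~ row
            "title": {"Cloud Notes"},        # attribute ~ column; each cell is a set,
            "tags": {"cloud", "storage"},    # so one cell can hold multiple values,
        },                                   # which a relational table would not allow
    }

    domain["item-001"]["tags"].add("aws")    # a multi-valued cell simply grows
    print(sorted(domain["item-001"]["tags"]))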
3.4.4. Microsoft Azure programming Support
First we have the underlying Azure fabric consisting of virtualized
hardware together with a sophisticated control environment
implementing dynamic assignment of resources and fault tolerance. This
implements domain name system (DNS) and monitoring capabilities.
Automated service management allows service models to be defined by
an XML template and multiple service copies to be instantiated on
request.
When the system is running, services are monitored and one can
access event logs, trace/debug data, performance counters, IIS web
server logs, crash dumps, and other log files. This information can be
saved in Azure storage. One can divide the basic features into storage and
compute capabilities. The Azure application is linked to the Internet
through a customized compute VM called a web role supporting basic
Microsoft web hosting. Such configured VMs are often called appliances.
The other important compute class is the worker role reflecting the
importance in cloud computing of a pool of compute resources that are
scheduled as needed.
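
As one concrete example of persisting such monitoring output, the sketch below writes a small log blob to Azure storage with the azure-storage-blob package; the connection string, container, and blob names are placeholders for an assumed storage account.

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string for an assumed Azure storage account.
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")

    # Write trace output from a worker role into a log container.
    blob = service.get_blob_client(container="logs", blob="worker-role/trace.log")
    blob.upload_blob(b"worker role started", overwrite=True)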
UNIT-5
STORAGE SYSTEMS

1. Evolution of storage technology


5.1.1. Early Storage (1940s - 1950s)
• Punch Cards and Paper Tape: Early computers used punch cards
and paper tapes for data storage, which were extremely limited in
capacity and speed.

5.1.2. Magnetic Tape and Disk Storage (1950s - 1960s)


• Magnetic Tape: Developed in the 1950s, magnetic tape was used
for sequential data storage, primarily for backup and archival
purposes. It offered larger storage capacities than punch cards but
had slow access times.

• Hard Disk Drives (HDD): IBM introduced the first commercial HDD
in 1956, the IBM 350, with a storage capacity of 5 MB spread across
50 platters. This marked the beginning of disk-based storage
technology, which allowed direct access to data.
5.1.3. Early Semiconductor Memory (1970s)
• Core Memory: Before the dominance of modern semiconductor
memory, core memory (magnetic cores) was a common technology.
Each bit was stored on a magnetic core.

• Dynamic RAM (DRAM): In 1968, DRAM was invented by Robert


Dennard, leading to the development of faster, smaller, and more
reliable memory for computers. This began the transition to
semiconductor-based storage.

5.1.4. Floppy Disks and Optical Storage (1970s - 1980s)


• Floppy Disks: Introduced in the 1970s, floppy disks provided
portable storage for personal computers, with capacities starting at
80 KB and evolving up to 1.44 MB.
• CDs and DVDs: In the 1980s and 1990s, optical storage like CDs
(Compact Discs) and DVDs (Digital Versatile Discs) became popular
for data storage and media distribution. CDs had a capacity of 700
MB, and DVDs could store 4.7 GB.

5.1.5. Modern HDDs and Flash Memory (1980s - 2000s)


• Modern HDDs: Hard drives improved significantly in terms of
capacity, speed, and reliability. By the early 2000s, HDD capacities
had grown to hundreds of gigabytes and later to multiple terabytes.
• Flash Memory: In the 1980s, flash memory was invented by Dr.
Fujio Masuoka at Toshiba. Flash memory became widely adopted in
the form of USB drives, SSDs (Solid State Drives), and memory
cards. It offered faster access times and better durability than HDDs,
but with lower capacities in the beginning.

5.1.6. Solid State Drives (2000s - Present)


• SSD Technology: SSDs, based on NAND flash memory, began to
replace HDDs in many applications due to their faster read/write
speeds, smaller size, and resistance to physical shock. Over time,
SSD capacities have grown to several terabytes, making them more
competitive with traditional HDDs.
• NVMe and PCIe SSDs: Newer SSD interfaces like NVMe (Non-
Volatile Memory Express) and PCIe (Peripheral Component
Interconnect Express) have significantly improved data transfer
speeds, reducing bottlenecks in storage performance.
5.1.7. Cloud Storage (2000s - Present)
• Cloud Storage: With the rise of high-speed internet, cloud storage
services like Google Drive, Dropbox, and Amazon S3 have become
popular. These services allow users to store and access their data
over the internet, providing scalability, redundancy, and data
protection.

5.1.8. Future Trends


• DNA Storage: DNA-based storage is an emerging technology that
promises to store massive amounts of data (exabytes) in incredibly
small spaces by encoding data in the sequences of DNA.

• Quantum Storage: In the long term, quantum storage technologies


could revolutionize data storage with massive potential capacities
and ultra-fast access times.
• Holographic Storage: Holographic data storage is another
futuristic technology that aims to use light patterns to store data in
three dimensions, potentially increasing both speed and capacity.

2. Storage models
Storage models describe the layout of a data structure in a physical
storage; a data model captures the most important logical aspects of a
data structure in a database. Physical storage can be a local disk, a
removable media, or storage accessible via the network.
Two abstract models of storage are commonly used: cell storage and
journal storage. Cell storage assumes that the storage consists of cells of
the same size and that each object fits in one cell. This model reflects the
physical organization of several storage media; the primary memory of a
computer is organized as an array of memory cells and a secondary
storage device, e.g., a disk, is organized in sectors or blocks read and
written as a unit. Read/write coherence and before-or-after atomicity are
two highly desirable properties of any storage model and, in particular, of
cell storage.

Journal storage is a fairly elaborate organization for storing composite


objects such as records consisting of multiple fields. Journal storage
consists of a manager and a cell storage where the entire history of a
variable is maintained, rather than just the current value. The user does
not have direct access to the cell storage, instead it can request the
journal manager to: (i) start a new action; (ii) read the value of a cell; (iii)
write the value of a cell; (iv) commit an action; and (v) abort an action.
The journal manager translates user requests into commands sent to the
cell storage: (i) read a cell; (ii) write a cell; (iii) allocate a cell; and (iv)
deallocate a cell.
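
A minimal in-memory sketch of this journal manager is shown below: writes are recorded per action and applied to the cell storage only on commit. It is a simplified illustration of the interface just described, not a production design.

    class JournalStorage:
        """Toy journal manager layered over a dict-based cell storage."""

        def __init__(self):
            self.cells = {}       # cell storage: committed values only
            self.pending = {}     # action id -> journal of tentative writes
            self.next_id = 0

        def start(self):
            self.next_id += 1
            self.pending[self.next_id] = {}
            return self.next_id

        def read(self, action, cell):
            # An action sees its own uncommitted writes, else the committed value.
            return self.pending[action].get(cell, self.cells.get(cell))

        def write(self, action, cell, value):
            self.pending[action][cell] = value            # recorded in the journal only

        def commit(self, action):
            self.cells.update(self.pending.pop(action))   # install all writes at once

        def abort(self, action):
            self.pending.pop(action)                      # discard the tentative history

    store = JournalStorage()
    a = store.start()
    store.write(a, "balance", 100)
    store.commit(a)
    print(store.read(store.start(), "balance"))   # -> 100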
3. File systems
A file system consists of a collection of directories and each directory
provides information about a set of files. High-performance systems can
choose among three classes of file systems: Network File Systems (NFS),
Storage Area Networks (SAN), and Parallel File Systems (PFS). Network file
systems are very popular and have been used for some time, but do not
scale well and have reliability problems; an NFS server could be a single
point of failure.

Advances in networking technology allow separation of storage systems


from computational servers; the two can be connected by an SAN. SANs
offer additional flexibility and allow cloud servers to deal with
nondisruptive changes in the storage configuration.

Parallel file systems are scalable, are capable of distributing files across a
large number of nodes, and provide a global naming space. In a parallel
data system, several I/O nodes serve data to all computational nodes.

4. Databases
Most cloud applications are data-intensive, test the limitations of existing
cloud storage infrastructure, and demand database management systems
capable of supporting rapid application development and a short time-to-
market. Cloud applications require low latency, scalability, high
availability, and demand a consistent view of data. These requirements
cannot be satisfied simultaneously by existing database models; for
example, relational databases are easy to use for application
development but do not scale well.

As its name implies, the NoSQL model does not support SQL as a query
language and may not guarantee the ACID, Atomicity, Consistency,
Isolation, and Durability properties of traditional databases. It usually
guarantees an eventual consistency for transactions limited to a single
data item.
5. Distributed file systems
A Distributed File System (DFS) is a file system that is distributed on multiple file
servers or multiple locations. It allows programs to access or store isolated files as
they do with the local ones, allowing programmers to access files from any
network or computer.
DFS (Distributed File System) is a technology that allows you to group shared
folders located on different servers into one or more logically structured
namespaces. The main purpose of the Distributed File System (DFS) is to allow
users of physically distributed systems to share their data and resources by using
a common file system. A collection of workstations and mainframes connected by
a Local Area Network (LAN) is a typical configuration for a Distributed File System.
A DFS is executed as a part of the operating system. In DFS, a namespace is
created and this process is transparent for the clients.

Components of DFS

• Location Transparency: Location transparency is achieved through the
namespace component.
• Redundancy: Redundancy is achieved through a file replication component.

6. General parallel file systems


General Parallel File System (GPFS™) is a cluster file system. This means that it provides
concurrent access to a single file system or set of file systems from multiple nodes. These
nodes can all be SAN attached or a mix of SAN and network attached. This enables high
performance access to this common set of data to support a scale-out solution or provide a
high availability platform.
GPFS has many features beyond common data access including data replication, policy
based storage management, and multi-site operations. You can create a GPFS cluster of
AIX® nodes, Linux nodes, Windows server nodes, or a mix of all three. GPFS can run on
virtualized instances, providing common data access in environments that leverage
logical partitioning or other hypervisors. Multiple GPFS clusters can share data within a location or
across wide area network (WAN) connections.
7. Google file systems
Google Inc. developed the Google File System (GFS), a scalable distributed file
system (DFS), to meet the company’s growing data processing needs. GFS offers
fault tolerance, dependability, scalability, availability, and performance to big
networks and connected nodes. GFS is made up of a number of storage systems
constructed from inexpensive commodity hardware parts. The search engine,
which creates enormous volumes of data that must be kept, is only one example
of how it is customized to meet Google’s various data use and storage
requirements.
The Google File System minimizes the impact of hardware flaws while exploiting
the cost gains of commercially available servers.
GoogleFS is another name for GFS. It manages two types of data, namely file
metadata and file data.
The GFS node cluster consists of a single master and several chunk servers that
various client systems regularly access. On local discs, chunk servers keep data in
the form of Linux files. Large (64 MB) pieces of the stored data are split up and
replicated at least three times around the network. Reduced network overhead
results from the greater chunk size.
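
A small worked example of the chunk layout: with fixed 64 MB chunks, a file byte offset maps to a chunk index (whose replicas the master locates on chunk servers) plus an offset within that chunk. The arithmetic below is illustrative only.

    CHUNK_SIZE = 64 * 2**20      # fixed 64 MB chunk size

    def chunk_location(byte_offset: int) -> tuple[int, int]:
        """Return (chunk index, offset within the chunk) for a file byte offset."""
        return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

    # Byte 200 MB of a file falls 8 MB into chunk 3; that chunk is stored as a
    # Linux file on (at least) three different chunk servers.
    print(chunk_location(200 * 2**20))   # -> (3, 8388608)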
