Practicals Cloud Computing
Practical No: 1
Oracle VM VirtualBox (formerly Sun VirtualBox, Sun xVM VirtualBox and innotek
VirtualBox), a virtualization software package for x86 and AMD64/Intel64-based computers
from Oracle Corporation, forms part of Oracle's family of virtualization products. innotek GmbH
first developed the product; Sun Microsystems purchased it in 2008; Oracle has continued
development since 2010.
The VirtualBox package installs on an existing host operating system as an application; this host
application allows additional guest operating systems, each known as a Guest OS, to be loaded
and run, each with its own virtual environment.
Supported host operating systems include Linux, Mac OS X, Windows XP, Windows Vista,
Windows 7, Windows 8, Solaris, and OpenSolaris; there are also ports to FreeBSD[4] and
Genode.[5]
Supported guest operating systems include versions and derivations of Windows, Linux, BSD,
OS/2, Solaris, Haiku and others.[6] Since release 3.2.0, VirtualBox also allows limited
virtualization of Mac OS X guests on Apple hardware, though OSX86 can also be installed using
VirtualBox.[7][8]
Since version 4.3 (released in October 2013[9]), Microsoft Windows guests on supported
hardware can take advantage of the recently implemented WDDM driver included in the guest
additions; this allows Windows Aero to be enabled along with Direct3D support.
Guest Additions should be installed in order to achieve the best possible experience.[10] The
Guest Additions are designed for installation inside a virtual machine after the installation of the
guest operating system. They consist of device drivers and system applications that optimize the
guest operating system for better performance and usability.
Emulated environment
Users of VirtualBox can load multiple guest OSs under a single host operating system (host OS).
Each guest can be started, paused and stopped independently within its own virtual machine
(VM). The user can independently configure each VM and run it under a choice of
software-based virtualization or hardware-assisted virtualization if the underlying host hardware
supports this. The host OS and guest OSs and applications can communicate with each other through a
number of mechanisms including a common clipboard and a virtualized network facility. Guest
VMs can also directly communicate with each other if configured to do so.[27]
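As a hedged sketch (assuming the VBoxManage command-line tool shipped with VirtualBox is on the PATH, and that a VM named "DemoVM" already exists), the start/pause/stop lifecycle described above can be driven from a short Python script:

import subprocess

def vboxmanage(*args):
    # Run a VBoxManage subcommand, raising an error if it fails.
    subprocess.run(["VBoxManage", *args], check=True)

vboxmanage("startvm", "DemoVM", "--type", "headless")  # start without a GUI window
vboxmanage("controlvm", "DemoVM", "pause")             # freeze the guest
vboxmanage("controlvm", "DemoVM", "resume")            # continue execution
vboxmanage("controlvm", "DemoVM", "poweroff")          # hard power-off

VBoxManage exposes VirtualBox's functionality from the command line, which makes it convenient for scripting practical exercises.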
Software-based virtualization
The system reconfigures the guest OS code, which would normally run in ring 0, to
execute in ring 1 on the host hardware. Because this code contains many privileged
instructions which cannot run natively in ring 1, VirtualBox employs a Code Scanning
and Analysis Manager (CSAM) to scan the ring 0 code recursively before its first
execution to identify problematic instructions and then calls the Patch Manager (PATM)
to perform in-situ patching. This replaces the instruction with a jump to a VM-safe
equivalent compiled code fragment in hypervisor memory.
The guest user-mode code, running in ring 3, generally runs directly on the host hardware
in ring 3.
In both cases, VirtualBox uses CSAM and PATM to inspect and patch the offending instructions
whenever a fault occurs. VirtualBox also contains a dynamic recompiler, based on QEMU, to
recompile any real-mode or protected-mode code entirely (e.g. BIOS code, a DOS guest, or any
operating-system startup).[28]
Using these techniques, VirtualBox can achieve performance comparable to that of
VMware.[29][30]
Hardware-assisted virtualization
VirtualBox supports both Intel's VT-x and AMD's AMD-V hardware virtualization. Making use
of these facilities, VirtualBox can run each guest VM in its own separate address-space; the guest
OS ring 0 code runs on the host at ring 0 in VMX non-root mode rather than in ring 1.
VirtualBox supports some guests (including 64-bit guests, SMP guests and certain proprietary
OSs) only on hosts with hardware-assisted virtualization.
Device virtualization
The system emulates hard disks in one of three disk image formats:
1. a VirtualBox-specific container format, called "Virtual Disk Image" (VDI), storing files
(with a .vdi suffix) on the host operating system
2. VMware Virtual Machine Disk Format (VMDK)
3. Microsoft Virtual PC VHD format
A VirtualBox virtual machine can, therefore, use disks previously created in VMware or
Microsoft Virtual PC, as well as its own native format. VirtualBox can also connect to iSCSI
targets and to raw partitions on the host, using either as virtual hard disks. VirtualBox emulates
IDE (PIIX4 and ICH6 controllers), SCSI, SATA (ICH8M controller) and SAS controllers to
which hard drives can be attached.
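As a hedged sketch (the VM name "DemoVM" and file name "demo.vdi" are placeholders), creating a disk in the native VDI format and attaching it to an emulated SATA controller can be scripted as follows:

import subprocess

def vboxmanage(*args):
    subprocess.run(["VBoxManage", *args], check=True)

# Create a 10 GB (10240 MB) dynamically allocated disk in the native VDI format.
vboxmanage("createhd", "--filename", "demo.vdi", "--size", "10240", "--format", "VDI")

# Add a SATA storage controller to the VM and attach the new disk to port 0.
vboxmanage("storagectl", "DemoVM", "--name", "SATA", "--add", "sata")
vboxmanage("storageattach", "DemoVM", "--storagectl", "SATA",
           "--port", "0", "--device", "0", "--type", "hdd", "--medium", "demo.vdi")

Passing "--format", "VMDK" or "--format", "VHD" instead would create the disk in the VMware or Microsoft Virtual PC format listed above.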
VirtualBox has supported Open Virtualization Format (OVF) since version 2.2.0 (April 2009).[31]
Both ISO images and host-connected physical devices can be mounted as CD/DVD drives. For
example, the DVD image of a Linux distribution can be downloaded and used directly by
VirtualBox.
By default VirtualBox provides graphics support through a custom virtual graphics-card that is
VESA compatible. The Guest Additions for Windows, Linux, Solaris, OpenSolaris, or OS/2
guests include a special video-driver that increases video performance and includes additional
features, such as automatically adjusting the guest resolution when resizing the VM window[32]
or desktop composition via virtualized WDDM drivers.
For an Ethernet network adapter, VirtualBox virtualizes a range of common Network Interface
Cards.[33]
The emulated network cards allow most guest OSs to run without the need to find and install
drivers for networking hardware as they are shipped as part of the guest OS. A special
paravirtualized network adapter is also available, which improves network performance by
eliminating the need to match a specific hardware interface, but requires special driver support in
the guest. (Many distributions of Linux ship with this driver included.) By default, VirtualBox
uses NAT through which Internet software for end-users such as Firefox or ssh can operate.
Bridged networking via a host network adapter or virtual networks between guests can also be
configured. Up to 36 network adapters can be attached simultaneously, but only four are
configurable through the graphical interface.
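As a hedged sketch (again using the placeholder VM "DemoVM", and assuming a host interface named "eth0"), the NAT and bridged modes described above are selected per adapter:

import subprocess

def vboxmanage(*args):
    subprocess.run(["VBoxManage", *args], check=True)

# Adapter 1: NAT, the default mode, suitable for outbound Internet access.
vboxmanage("modifyvm", "DemoVM", "--nic1", "nat")

# Adapter 2: bridged to a physical host interface, so the guest appears
# on the same LAN as the host.
vboxmanage("modifyvm", "DemoVM", "--nic2", "bridged", "--bridgeadapter2", "eth0")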
For a sound card, VirtualBox virtualizes Intel HD Audio, Intel ICH AC'97 and Sound Blaster 16
devices.[34]
A USB 1.1 controller is emulated so that any USB devices attached to the host can be seen in the
guest. The proprietary extension pack adds a USB 2.0 controller and, if VirtualBox acts as an
RDP server, it can also use USB devices on the remote RDP client as if they were connected to
the host, although only if the client supports this VirtualBox-specific extension (Oracle provides
clients for Solaris, Linux and Sun Ray thin clients that can do this, and has promised support
for other platforms in future versions).
Practical No: 2
Google App Engine (often referred to as GAE or simply App Engine) is a platform as a service
(PaaS) cloud computing platform for developing and hosting web applications in Google-
managed data centers. Applications are sandboxed and run across multiple servers.[1] App Engine
offers automatic scaling for web applications—as the number of requests increases for an
application, App Engine automatically allocates more resources for the web application to handle
the additional demand.[2]
Google App Engine is free up to a certain level of consumed resources. Fees are charged for
additional storage, bandwidth, or instance hours required by the application.[3] It was first
released as a preview version in April 2008 and came out of preview in September 2011.
Currently, the supported programming languages are Python, Java (and, by extension, other JVM
languages such as Groovy, JRuby, Scala, Clojure), Go, and PHP. Go and PHP are in
experimental status.[4] Google has said that it plans to support more languages in the future, and
that Google App Engine has been written to be language independent.[5]
Python web frameworks that run on Google App Engine include Django, CherryPy, Pyramid,
Flask, web2py and webapp2,[6] as well as a custom Google-written webapp framework and
several others designed specifically for the platform that have emerged since the release.[7] Any
Python framework that supports WSGI via the CGI adapter can be used to create an
application; the framework can be uploaded with the developed application. Third-party libraries
written in pure Python may also be uploaded.[8][9]
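As a minimal sketch of the webapp2 framework mentioned above (the handler and route are illustrative; a real application also needs an app.yaml that maps requests to the "app" object):

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # App Engine routes HTTP GET requests here and returns whatever is
        # written to the response object.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, App Engine!')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)

Because webapp2 is a plain WSGI application, the same code can be exercised locally with the App Engine development server before deployment.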
Google App Engine supports many Java standards and frameworks. Core to this is the servlet 2.5
technology using the open-source Jetty Web Server,[10] along with accompanying technologies
such as JSP. JavaServer Faces operates with some workarounds. Though the datastore used may
be unfamiliar to programmers, it is easily accessed and supported with JPA. JDO and other
methods of reading and writing data are also provided. The Spring Framework works with GAE;
however, the Spring Security module (if used) requires workarounds. Apache Struts 1 is
supported, and Struts 2 runs with workarounds.[11]
The Django web framework and applications running on it can be used on App Engine with
modification. Django-nonrel[12] aims to allow Django to work with non-relational databases and
the project includes support for App Engine.[13]
Applications developed for the Grails web application framework could once be modified and
deployed to Google App Engine with very little effort using the App Engine Plugin,[14] but this
is no longer possible because the Grails GAE plugin project is no longer maintained.
Practical No: 3
Microsoft Azure (English pronunciation: /ˈæʒər/) (formerly Windows Azure before 25 March
2014) is a cloud computing platform and infrastructure, created by Microsoft, for building,
deploying and managing applications and services through a global network of Microsoft-
managed datacenters. It provides both PaaS and IaaS services and supports many different
programming languages, tools and frameworks, including both Microsoft-specific and third-
party software and systems. Azure was released on 1 February 2010.[1]
Features
Microsoft Azure is Microsoft's cloud application platform. In June 2012, Microsoft Azure
released the following new features:
Websites - allows developers to build sites using ASP.NET, PHP, Node.js, or Python, which
can be deployed using FTP, Git, Mercurial or Team Foundation Server.
Virtual machines - let developers migrate applications and infrastructure without changing
existing code, and can run both Windows Server and Linux virtual machines.
Cloud services - Microsoft's Platform as a Service (PaaS) environment can be used to
create scalable applications and services. It supports multi-tier architectures and
automated deployments.
Data management - SQL Database, formerly known as SQL Azure Database, works to
create, scale and extend applications into the cloud using Microsoft SQL Server
technology. It also integrates with Active Directory and Microsoft System Center and
Hadoop.[2]
Media services - A PaaS offering that can be used for encoding, content protection,
streaming, and/or analytics.
The Microsoft Azure Platform provides an API built on REST, HTTP, and XML that allows a
developer to interact with the services provided by Microsoft Azure. Microsoft also provides a
client-side managed class library which encapsulates the functions of interacting with the
services. It also integrates with Microsoft Visual Studio, Git, and Eclipse.
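As a hedged sketch of that REST surface, the following reads a blob from Azure Storage over plain HTTPS. The account, container, and blob names are placeholders, and an unauthenticated GET like this succeeds only if the container allows anonymous public read access; otherwise a signed Authorization header is required, which the managed client libraries generate automatically:

import requests

url = "https://myaccount.blob.core.windows.net/mycontainer/hello.txt"
resp = requests.get(url)   # a bare REST call, no SDK involved
resp.raise_for_status()    # raises on 4xx/5xx responses
print(resp.text)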
Services
Web sites - High density hosting of web sites. This feature was announced in preview
form in June 2012 at the Meet Microsoft Azure event.[3] Customers can create web sites
in PHP, ASP.NET, Node.js, or Python, or select from several open source applications
from a gallery to deploy. This comprises one aspect of the Platform as a Service (PaaS)
offerings for the Windows Azure Platform.
Virtual machines - Announced in preview form at the Meet Windows Azure event in
June 2012[3] the Windows Azure Virtual Machines comprise the Infrastructure as a
Service (IaaS) offering from Microsoft for their public cloud. Customers can create
Virtual Machines, over which they have complete control, to run in the Microsoft Data
Centers. As of the preview the Virtual Machines supported Windows Server 2008 and
2012 operating systems and a few distributions of Linux. The General Availability
version of Virtual Machine was released in May 2013.
Cloud services - Previously named "Hosted Services", the Cloud Services for Windows
Azure comprise one aspect of the PaaS offerings from the Windows Azure Platform. The
Cloud Services are containers of hosted applications. These applications can be internet-
facing public web applications (such as web sites and e-commerce solutions), or they can
be private processing engines for other work, such as processing orders or analyzing data.
o Developers can write code for Cloud Services in a variety of different
programming languages; however, there are specific software development kits
(SDKs) provided by Microsoft for Python, Java, Node.js and .NET.[4] Other
languages may have support through Open Source projects. Microsoft published
the source code for their client libraries on GitHub.
Practical No: 4
Apache Hadoop is an open-source software framework for distributed storage and distributed
processing of Big Data on clusters of commodity hardware. Its Hadoop Distributed File System
(HDFS) splits files into large blocks (64 MB or 128 MB by default) and distributes the blocks
amongst the nodes in the cluster. To process the data, Hadoop MapReduce ships code
(specifically JAR files) to the nodes that have the required data, and the nodes then process the
data in parallel. This approach takes advantage of data locality,[2] in contrast to conventional
HPC architecture which usually relies on a parallel file system (compute and data separated, but
connected with high-speed networking).[3]
Since 2012,[4] the term "Hadoop" often refers not to just the base Hadoop package but rather to
the Hadoop Ecosystem, which includes all of the additional software packages that can be
installed on top of or alongside Hadoop, such as Apache Hive, Apache Pig and Apache Spark.
The base Apache Hadoop framework is composed of the following modules:
Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on
commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN – a resource-management platform responsible for managing compute
resources in clusters and using them for scheduling of users' applications.
Hadoop MapReduce – a programming model for large scale data processing.
All the modules in Hadoop are designed with a fundamental assumption that hardware failures
(of individual machines, or racks of machines) are common and thus should be automatically
handled in software by the framework. Apache Hadoop's MapReduce and HDFS components
originally derived respectively from Google's MapReduce and Google File System (GFS)
papers.
YARN stands for "Yet Another Resource Negotiator" and was added later as part of Hadoop 2.0.
YARN takes the resource management capabilities that were in MapReduce and packages them
so they can be used by new engines. This also streamlines MapReduce to do what it does best,
process data. With YARN, you can now run multiple applications in Hadoop, all sharing a
common resource manager. As of September 2014, YARN manages only CPU (number of
cores) and memory,[5] but management of other resources such as disk, network and GPU is
planned for the future.[6]
Beyond HDFS, YARN, and MapReduce, the entire Apache Hadoop "platform" is now
commonly considered to consist of a number of related projects as well – Apache Pig, Apache
Hive, Apache HBase, Apache Spark, and others.[7]
For the end-users, though MapReduce Java code is common, any programming language can be
used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's
program.[8] Related projects such as Apache Pig, Apache Hive and Apache Spark expose
higher-level user interfaces (Pig Latin and a SQL variant among them). The Hadoop framework itself
is mostly written in the Java programming language, with some native code in C and command
line utilities written as shell-scripts.
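As a hedged sketch of Hadoop Streaming, here is the classic word-count pair of scripts; any executables that read stdin and write "key<TAB>value" lines to stdout would work the same way:

#!/usr/bin/env python
# mapper.py - emit "word<TAB>1" for every word read from stdin.
import sys
for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py - sum the counts per word; Streaming sorts mapper output by key,
# so all lines for one word arrive together.
import sys
current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))

The job is submitted with the streaming JAR shipped with Hadoop, along the lines of: hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the JAR's exact path varies by distribution).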
Architecture
Hadoop consists of the Hadoop Common package, which provides filesystem and OS level
abstractions, a MapReduce engine (either MapReduce/MR1 or YARN/MR2)[13] and the Hadoop
Distributed File System (HDFS). The Hadoop Common package contains the necessary Java
ARchive (JAR) files and scripts needed to start Hadoop. The package also provides source code,
documentation, and a contribution section that includes projects from the Hadoop
Community.
For effective scheduling of work, every Hadoop-compatible file system should provide location
awareness: the name of the rack (more precisely, of the network switch) where a worker node is.
Hadoop applications can use this information to run work on the node where the data is, and,
failing that, on the same rack/switch, reducing backbone traffic. HDFS uses this method when
replicating data to try to keep different copies of the data on different racks. The goal is to reduce
the impact of a rack power outage or switch failure, so that even if these events occur, the data
may still be readable.[14]
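Rack awareness is supplied by the administrator: Hadoop can be configured (via the net.topology.script.file.name property in recent versions) to call an external script that maps node addresses to rack paths. A hedged sketch, with an invented address-to-rack table:

#!/usr/bin/env python
# topology.py - Hadoop passes one or more node addresses as arguments and
# expects one rack path per address on stdout.
import sys

RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}

for addr in sys.argv[1:]:
    print(RACKS.get(addr, "/default-rack"))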
A small Hadoop cluster includes a single master and multiple worker nodes. The master node
consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave or worker node acts
as both a DataNode and TaskTracker, though it is possible to have data-only worker nodes and
compute-only worker nodes. These are normally used only in nonstandard applications.[15]
Hadoop requires Java Runtime Environment (JRE) 1.6 or higher. The standard startup and
shutdown scripts require that Secure Shell (ssh) be set up between nodes in the cluster.[16]
In a larger cluster, the HDFS is managed through a dedicated NameNode server to host the file
system index, and a secondary NameNode that can generate snapshots of the namenode's
memory structures, thus preventing file-system corruption and reducing loss of data. Similarly, a
standalone JobTracker server can manage job scheduling. In clusters where the Hadoop
MapReduce engine is deployed against an alternate file system, the NameNode, secondary
NameNode, and DataNode architecture of HDFS are replaced by the file-system-specific
equivalents.
Practical No: 5
Amazon Web Services (AWS) is a collection of remote computing services (also called web
services) that together make up a cloud computing platform, offered over the Internet by
Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon
S3. The service is advertised as providing a large computing capacity (potentially many servers)
much faster and cheaper than building a physical server farm.[2]
Architecture
AWS is located in 11 geographical "regions": US East (Northern Virginia), where the majority
of AWS servers are based,[3] US West (northern California), US West (Oregon), Brazil (São
Paulo), Europe (Ireland and Germany), Southeast Asia (Singapore), East Asia (Tokyo and
Beijing) and Australia (Sydney). There is also a "GovCloud", based in the Northwestern United
States, provided for U.S. government customers, complementing existing government agencies
already using the US East Region.[4] Each Region is wholly contained within a single country
and all of its data and services stay within the designated Region.
Each Region has multiple "Availability Zones", which are distinct data centers providing AWS
services. Availability Zones are isolated from each other to prevent outages from spreading
between Zones. Several services operate across Availability Zones (e.g., S3, DynamoDB) while
others can be configured to replicate across Zones to spread demand and avoid downtime from
failures.
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the
Amazon Web Services (AWS) cloud. Using Amazon EC2 eliminates your need to invest in
hardware up front, so you can develop and deploy applications faster. You can use Amazon EC2
to launch as many or as few virtual servers as you need, configure security and networking, and
manage storage. Amazon EC2 enables you to scale up or down to handle changes in
requirements or spikes in popularity, reducing your need to forecast traffic.
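As a hedged sketch using the boto library (a Python SDK for AWS of this era; the AMI ID, key pair, and security group below are placeholders, and credentials are assumed to be configured in the environment), launching an instance looks like this:

import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances(
    "ami-12345678",              # placeholder AMI ID
    instance_type="t1.micro",
    key_name="my-key-pair",
    security_groups=["my-web-sg"],
)
instance = reservation.instances[0]
print(instance.id, instance.state)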
For more information about cloud computing, see What is Cloud Computing?
Amazon EC2 provides the following features:
Preconfigured templates for your instances, known as Amazon Machine Images (AMIs),
that package the bits you need for your server (including the operating system and
additional software)
Various configurations of CPU, memory, storage, and networking capacity for your
instances, known as instance types
Secure login information for your instances using key pairs (AWS stores the public key,
and you store the private key in a secure place)
Storage volumes for temporary data that's deleted when you stop or terminate your
instance, known as instance store volumes
Persistent storage volumes for your data using Amazon Elastic Block Store (Amazon
EBS), known as Amazon EBS volumes
Multiple physical locations for your resources, such as instances and Amazon EBS
volumes, known as regions and Availability Zones
A firewall that enables you to specify the protocols, ports, and source IP ranges that can
reach your instances using security groups
Metadata, known as tags, that you can create and assign to your Amazon EC2 resources
Virtual networks you can create that are logically isolated from the rest of the AWS
cloud, and that you can optionally connect to your own network, known as virtual private
clouds (VPCs)
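As a companion to the key pairs and security groups listed above, here is a hedged boto sketch (placeholder names again) that creates both before launching anything:

import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# Key pair: AWS stores the public key; save() writes the private key
# (my-key-pair.pem) into the given directory.
key = conn.create_key_pair("my-key-pair")
key.save(".")

# Security group: a firewall admitting HTTP from anywhere and SSH from one range.
sg = conn.create_security_group("my-web-sg", "web servers")
sg.authorize(ip_protocol="tcp", from_port=80, to_port=80, cidr_ip="0.0.0.0/0")
sg.authorize(ip_protocol="tcp", from_port=22, to_port=22, cidr_ip="203.0.113.0/24")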
For more information about the features of Amazon EC2, see the Amazon EC2 product page.
For more information about running your website on AWS, see Websites & Website Hosting.
The first thing you need to do is get set up to use Amazon EC2. After you are set up, you are
ready to complete the Getting Started tutorial for Amazon EC2. Whenever you need more
information about a feature of Amazon EC2, you can read the technical documentation.
Basics: Instance Types, Tags, Security Groups
Storage: Amazon EBS, Instance Store
If you have questions about whether AWS is right for you, contact AWS Sales. If you have
technical questions about Amazon EC2, use the Amazon EC2 forum.
Related Services
You can provision Amazon EC2 resources, such as instances and volumes, directly using
Amazon EC2. You can also provision Amazon EC2 resources using other services in AWS. For
more information, see the following documentation:
To automatically distribute incoming application traffic across multiple instances, use Elastic
Load Balancing. For more information, see Elastic Load Balancing Developer Guide.
To monitor basic statistics for your instances and Amazon EBS volumes, use Amazon
CloudWatch. For more information, see the Amazon CloudWatch Developer Guide.
To monitor the calls made to the Amazon EC2 API for your account, including calls made by the
AWS Management Console, command line tools, and other services, use AWS CloudTrail. For
more information, see the AWS CloudTrail User Guide.
To get a managed relational database in the cloud, use Amazon Relational Database Service
(Amazon RDS) to launch a database instance. Although you can set up a database on an EC2
instance, Amazon RDS offers the advantage of handling your database management tasks, such
as patching the software, backing up, and storing the backups. For more information, see
Amazon Relational Database Service Developer Guide.
Amazon EC2 provides a web-based user interface, the Amazon EC2 console. If you've signed up
for an AWS account, you can access the Amazon EC2 console by signing into the AWS
Management Console and selecting EC2 from the console home page.
If you prefer to use a command line interface, you have several options:
AWS Command Line Interface (CLI) - Provides commands for a broad set of AWS products,
and is supported on Windows, Mac, and Linux. To get started, see AWS Command Line
Interface User Guide. For more information about the commands for Amazon EC2, see ec2
in the AWS Command Line Interface Reference.
Amazon EC2 Command Line Interface Tools - Provides commands for Amazon EC2,
Amazon EBS, and Amazon VPC, and is supported on Windows, Mac, and Linux. To get
started, see Setting Up the Amazon EC2 Command Line Interface Tools on Linux and
Commands (CLI Tools) in the Amazon EC2 Command Line Reference.
AWS Tools for Windows PowerShell - Provides commands for a broad set of AWS products
for those who script in the PowerShell environment. To get started, see the AWS Tools for
Windows PowerShell User Guide. For more information about the cmdlets for Amazon EC2,
see the AWS Tools for Windows PowerShell Reference.
Amazon EC2 provides a Query API. These requests are HTTP or HTTPS requests that use the
HTTP verbs GET or POST and a Query parameter named Action. For more information about
the API actions for Amazon EC2, see Actions in the Amazon EC2 API Reference.
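As a hedged sketch of that request shape (real calls must also carry AWS signature parameters, which the SDKs and CLI add automatically, so this unsigned URL would be rejected; it only illustrates the form):

try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

params = {
    "Action": "DescribeInstances",   # the Query parameter named Action
    "Version": "2014-06-15",         # an API version of this era
}
url = "https://ec2.us-east-1.amazonaws.com/?" + urlencode(params)
print(url)  # issue this as an HTTP GET, with signature parameters added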
If you prefer to build applications using language-specific APIs instead of submitting a request
over HTTP or HTTPS, AWS provides libraries, sample code, tutorials, and other resources for
software developers. These libraries provide basic functions that automate tasks such as
cryptographically signing your requests, retrying requests, and handling error responses, making
it easier for you to get started. For more information, see AWS SDKs and Tools.
Practical No: 6
Aneka Architecture
Aneka is a platform and a framework for developing distributed applications on the Cloud. It
harnesses the spare CPU cycles of a heterogeneous network of desktop PCs and servers or
datacenters on demand. Aneka provides developers with a rich set of APIs for transparently
exploiting such resources and expressing the business logic of applications by using the preferred
programming abstractions. System administrators can leverage a collection of tools to
monitor and control the deployed infrastructure. This can be a public cloud available to anyone
through the Internet, or a private cloud constituted by a set of nodes with restricted access.
An Aneka-based computing cloud is a collection of physical and virtualized resources connected
through a network, which can be either the Internet or a private intranet. Each of these resources
hosts an instance of the Aneka Container representing the runtime environment where the
distributed applications are executed. The container provides the basic management features of
the single node and delegates all the other operations to the services that it is hosting. The
services are broken up into fabric, foundation, and execution services. Fabric services directly
interact with the node through the Platform Abstraction Layer (PAL) and perform hardware
profiling and dynamic resource provisioning. Foundation services identify the core system of the
Aneka middleware, providing a set of basic features to enable Aneka containers to perform
specialized and specific sets of tasks. Execution services directly deal with the scheduling and
execution of applications in the Cloud.
One of the key features of Aneka is its ability to provide different ways of expressing
distributed applications by offering different programming models; execution services are mostly
concerned with providing the middleware with an implementation for these models. Additional
services such as persistence and security are transversal to the entire stack of services that are
hosted by the Container. At the application level, a set of different components and tools are
provided to: 1) simplify the development of applications (SDK); 2) port existing applications
to the Cloud; and 3) monitor and manage the Aneka Cloud.
A common deployment of Aneka is as follows. An Aneka-based Cloud is constituted
by a set of interconnected resources that are dynamically modified according to the user needs by
using resource virtualization or by harnessing the spare CPU cycles of desktop machines. If the
deployment is a private Cloud, all the resources are in-house, for example within the
enterprise. This deployment can be extended by adding publicly available resources on demand or by
interacting with other Aneka public clouds providing computing resources connected over the
Internet.
Aneka
Manjrasoft is focused on the creation of innovative software technologies for simplifying the
development and deployment of applications on private or public Clouds. Our product Aneka
plays the role of Application Platform as a Service for Cloud Computing. Aneka supports
various programming models involving Task Programming, Thread Programming and
MapReduce Programming and tools for rapid creation of applications and their seamless
deployment on private or public Clouds to distribute applications.
Aneka technology primarily consists of two key components:
1. a software development kit (SDK) containing application programming interfaces (APIs)
and tools for the rapid development of applications, and
2. a runtime engine and platform for managing the deployment and execution of applications
on private or public Clouds.
Highlights of Aneka
Technical Value
Support of multiple programming and application environments
Simultaneous support of multiple run-time environments
Rapid deployment tools and framework
Simplicity in developing applications on Cloud
Dynamic Scalability
Ability to harness multiple virtual and/or physical machines for accelerating application
results
Provisioning based on QoS/SLA
Business Value
Improved reliability
Simplicity
Faster time to value
Operational Agility
Definite application performance enhancement
Optimizing the capital expenditure and operational expenditure
All these features make Aneka a winning solution for enterprise customers in the
Platform-as-a-Service scenario.