
MODULE IV

• ANEKA
• DATA INTENSIVE COMPUTING
Topic 1:
ANATOMY OF THE ANEKA CONTAINER / SERVICES
PROVIDED BY ANEKA / FRAMEWORK (15 MARK)
• Aneka is a cloud application development and management platform
developed by Manjrasoft, an Australian company.
• It provides a framework for building, deploying, and managing
applications on private or public cloud infrastructures.
• Aneka is a pure PaaS (Platform as a Service) solution for cloud
computing.
• One of Aneka's key advantages is its extensible set of APIs
associated with different types of programming models.
Services installed in the Aneka container
The services installed in the Aneka container can be classified into
three major categories:

1. fabric services
2. foundation services
3. application services
• Foundation Services
1. Storage Management
Aneka offers two different facilities for storage management:
• Centralized file storage, where all files are stored on a single
server or storage system; this is mostly used for the execution of
compute-intensive applications.
- Compute-intensive applications mostly require powerful processors
and do not have high demands in terms of storage, which in many
cases is used to store small files that are easily transferred
from one node to another.
• A distributed file system, where files are stored across multiple
servers or locations; this is more suitable for the execution of
data-intensive applications.
- Data-intensive applications are characterized by large data files
(gigabytes or terabytes), and for these the processing power
required by tasks does not constitute the performance bottleneck.
2. Accounting, Billing, and Resource Pricing
- A complete history of application execution and storage, as well as
other resource utilization parameters, is captured and mined by the
accounting services. This information constitutes the foundation on
top of which users are charged in Aneka.
- Billing is another important feature of accounting. The Aneka billing
service provides detailed information about the resource usage of
each user with the associated costs.
- Each resource can be priced differently according to the set of
services that are available on the corresponding Aneka container or
the software installed on the node, as in the pricing sketch below.
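The charging rule above is essentially an accumulation of rate × usage per resource. A minimal Python sketch, with illustrative resource names and rates only (not Aneka's actual accounting API):

```python
# Minimal sketch of per-resource pricing. Each resource carries its own
# hourly rate, reflecting the services/software installed on it; a user's
# bill sums rate * hours over all captured usage records.

usage_records = [
    # (user, resource, hours_used) -- hypothetical data
    ("alice", "node-gpu-01", 3.0),
    ("alice", "node-std-02", 10.0),
    ("bob",   "node-gpu-01", 1.5),
]

hourly_rate = {"node-gpu-01": 0.90, "node-std-02": 0.10}  # priced differently

def bill(user: str) -> float:
    """Total cost for one user across all captured usage records."""
    return sum(hours * hourly_rate[res]
               for u, res, hours in usage_records if u == user)

print(bill("alice"))  # 3.0*0.90 + 10.0*0.10, roughly 3.70
```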
3. Resource Reservation
• Resource reservation allows reserving resources for exclusive use by
specific applications.
• Resource reservation is built out of two different kinds of services:
the Resource Reservation Service and the Allocation Service. The former
keeps track of all the reserved time slots in the Aneka Cloud, while
the latter manages the database of information regarding the allocated
slots on the local node (see the sketch below).
• 3 types of reservation: (a) Basic Reservation (b) Libra Reservation
(c) Relay Reservation
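A minimal sketch of how these two services could cooperate, assuming hypothetical class names and integer time slots for simplicity (these are not the real Aneka classes):

```python
from dataclasses import dataclass

@dataclass
class Slot:
    node: str
    start: int   # e.g., hour of day; kept to integers for the sketch
    end: int

class AllocationService:
    """Per-node service: manages the slots allocated on the local node."""
    def __init__(self, node):
        self.node, self.slots = node, []

    def is_free(self, start, end):
        # Free when the requested window overlaps no existing slot.
        return all(end <= s.start or start >= s.end for s in self.slots)

    def allocate(self, start, end):
        self.slots.append(Slot(self.node, start, end))

class ReservationService:
    """Master-side service: global view of all reserved slots in the Cloud."""
    def __init__(self, allocators):
        self.allocators = allocators   # one AllocationService per node

    def reserve(self, start, end):
        for a in self.allocators:
            if a.is_free(start, end):
                a.allocate(start, end)
                return Slot(a.node, start, end)
        return None  # no node can honor the reservation

rs = ReservationService([AllocationService("n1"), AllocationService("n2")])
print(rs.reserve(9, 11))   # Slot(node='n1', start=9, end=11)
print(rs.reserve(10, 12))  # n1 is busy, so this falls through to n2
```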
• Application Services
1. Scheduling
Common tasks that are performed by the scheduling component are
the following (a toy sketch follows this list):
● Job-to-node (virtual machine) mapping
● Rescheduling of failed jobs
● Job status monitoring
● Application status monitoring
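A toy sketch of the first two duties, job-to-node mapping and rescheduling of failed jobs. The data structures, the round-robin policy, and the random failure stub are illustrative assumptions, not Aneka's actual scheduling algorithm:

```python
import random
from collections import deque

def run_on(job, node):
    """Stub for remote execution; fails randomly to exercise rescheduling."""
    return random.random() > 0.2

def schedule(jobs, nodes, max_retries=3):
    queue = deque((job, 0) for job in jobs)
    completed, failed, turn = {}, [], 0
    while queue:
        job, tries = queue.popleft()
        node = nodes[turn % len(nodes)]        # round-robin job-to-node mapping
        turn += 1
        if run_on(job, node):
            completed[job] = node              # job status: completed
        elif tries + 1 < max_retries:
            queue.append((job, tries + 1))     # reschedule the failed job
        else:
            failed.append(job)                 # give up after max_retries
    return completed, failed

print(schedule(["j1", "j2", "j3"], ["node-a", "node-b"]))
```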
2. Execution
• Execution services control the execution of the single jobs that
compose applications.
• They are in charge of setting up the runtime environment
hosting the execution of jobs. Common tasks include:
• Unpacking the jobs received from the scheduler
• Retrieval of input files required for the job execution
• Sandboxed execution of jobs
• Submission of output files at the end of execution
• Execution failure management (i.e., capturing sufficient
contextual information useful to identify the nature of the
failure)
• Performance monitoring
• Fabric Services

1. Profiling and Monitoring
❖ Heartbeat Service
✅ Sends periodic heartbeat signals from worker nodes to the Aneka master to
indicate that they are active.
✅ If a node stops sending heartbeats, it is marked as inactive or failed,
triggering task reallocation (see the sketch below).
✅ Ensures fault detection and maintains system reliability by proactively
identifying failures.
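A minimal single-process sketch of heartbeat-based failure detection; the function names and the TIMEOUT value are assumptions for illustration, and the real service runs distributed across nodes:

```python
import time

TIMEOUT = 3.0                      # seconds without a beat => node failed
last_beat = {}                     # node -> timestamp of last heartbeat

def heartbeat(node):
    """Called periodically by each worker to signal it is alive."""
    last_beat[node] = time.monotonic()

def failed_nodes():
    """Master-side check: nodes whose last beat is older than TIMEOUT."""
    now = time.monotonic()
    return [n for n, t in last_beat.items() if now - t > TIMEOUT]

heartbeat("worker-1")
heartbeat("worker-2")
time.sleep(0.1)
print(failed_nodes())   # [] -- both workers still within the timeout window
```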
❖ Monitoring Service
✅ Continuously tracks the status and performance of worker nodes and
resources.
✅ Detects failures and resource underutilization to optimize execution.
✅ Helps in decision-making for workload balancing.
❖ Reporting Service
✅ Collects execution logs, resource usage data, and application performance
metrics.
✅ Generates detailed reports for performance analysis, debugging, and
optimization.
2. Resource Management
❖ Resource Membership (Index Service or Membership Catalogue)
✅ Maintains a catalogue of available resources in the Aneka cloud
environment.
✅ The Index Service keeps an updated list of all active, idle, and busy
resources.
❖ Resource Reservation
✅ Allows users or applications to pre-book specific computing resources for
dedicated use.
✅ Ensures guaranteed availability of resources for critical tasks or
high-priority jobs.
❖ Resource Provisioning
✅ Dynamically allocates and deallocates resources based on workload demand.
✅ Enables elastic scaling, where resources are added when demand is high and
released when demand decreases, as in the provisioning sketch below.
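A toy threshold rule illustrating elastic provisioning; the policy and its parameters (jobs_per_node, node limits) are assumptions, not Aneka's actual provisioning logic:

```python
def rescale(queued_jobs, jobs_per_node=4, min_nodes=1, max_nodes=10):
    """Return the node count to provision for the current queue length."""
    needed = -(-queued_jobs // jobs_per_node)      # ceiling division
    return max(min_nodes, min(max_nodes, needed))  # clamp to the pool limits

print(rescale(queued_jobs=17))  # 5: scale out while demand is high
print(rescale(queued_jobs=2))   # 1: release nodes when demand decreases
```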
Platform Abstraction Layer (PAL)
• The PAL (Platform Abstraction Layer) in Aneka is a software layer that
hides the heterogeneity of the underlying operating system and hardware
from the rest of the framework.
• PAL provides a uniform interface through which the container and its
services interact with the hosting platform, for example to collect
performance information such as CPU and memory usage.
• By abstracting these platform-specific details, PAL lets the same
container code run unchanged across different operating systems.
• It helps developers focus on application logic while Aneka manages the
cloud execution environment.
• On top of this portable runtime, Aneka supports multiple programming
models, allowing developers to use different execution models such as
Task, Thread, and MapReduce.
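Aneka's real PAL is part of its .NET runtime; the Python class below is only a sketch of the idea, one uniform call that hides OS-specific details from the rest of the framework:

```python
import os
import platform
import socket

class PlatformAbstraction:
    """Illustrative stand-in for a platform abstraction layer."""

    def host_info(self):
        """Uniform view of the hosting platform, whatever the OS is."""
        return {
            "hostname": socket.gethostname(),
            "os": platform.system(),      # 'Linux', 'Windows', 'Darwin', ...
            "arch": platform.machine(),
            "cpus": os.cpu_count(),
        }

# Callers never branch on the OS themselves; they just ask the PAL.
print(PlatformAbstraction().host_info())
```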
Aneka container
• Aneka containers are the execution environments, installed on each
node, that host Aneka services and run cloud applications.
• Task Isolation – Containers isolate tasks, ensuring they don’t interfere
with each other’s execution.
• Resource Management – They manage resource allocation, ensuring
tasks are distributed efficiently across available nodes.
• Scalability – Containers allow dynamic scaling of applications based
on demand.
• Fault Tolerance – They help ensure reliability by automatically
recovering from failures.
Topic 2: CLOUD PROGRAMMING AND MANAGEMENT

a. Aneka SDK
• The Aneka SDK (Software Development Kit) is a Platform-as-a-Service
(PaaS) framework for developing and deploying parallel and distributed
applications on cloud infrastructure.
• Multiple Programming Models: It supports Task Model, Thread Model, and
MapReduce Model, allowing developers to create applications based on
different computational needs.
• Dynamic Resource Management: Aneka enables scalable and flexible resource
provisioning, utilizing public, private, and hybrid cloud environments efficiently.
• The Aneka SDK provides support for both programming models and services by
means of the Application Model and the Service Model.
a.1. Service Model
The Aneka Service Model defines the basic requirements needed to
implement a service that can be hosted in the Aneka Cloud.
The container defines the runtime environment where services are
hosted.
Each service that is hosted in the container must be compliant with the
IService interface, which exposes the following methods and properties
(a sketch of this contract follows):
● Name and status
● Control operations such as the Start, Stop, Pause, and Continue methods
● Message handling by means of the HandleMessage method
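A Python rendering of the contract just described. Aneka's actual IService interface is defined in .NET; this sketch merely mirrors its shape (name/status properties, lifecycle control operations, and message handling):

```python
from abc import ABC, abstractmethod

class IService(ABC):
    """Shape of the container's service contract (illustrative only)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def status(self) -> str: ...

    # Control operations
    @abstractmethod
    def start(self): ...
    @abstractmethod
    def stop(self): ...
    @abstractmethod
    def pause(self): ...
    @abstractmethod
    def resume(self): ...  # 'Continue' in the .NET interface; renamed here
                           # because 'continue' is a Python keyword

    # Message handling
    @abstractmethod
    def handle_message(self, message): ...
```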
a.2. Application Model
• The Aneka SDK provides a flexible application model that enables
developers to design and execute parallel and distributed
applications on cloud infrastructure.
Workflow in the Aneka SDK Application Model (sketched after this list):
• Application Submission: The application is designed using Aneka APIs
and submitted for execution.
• Job Scheduling: Aneka's resource manager schedules tasks or threads
to available computing resources.
• Task Execution: Tasks are executed across distributed cloud resources.
• Result Collection: The results are collected and merged after
execution.
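A pseudocode sketch of these four steps. All names here are hypothetical; Aneka's real client API is .NET-based and distributes execution across cloud nodes rather than running tasks in a local loop:

```python
class Task:
    """Hypothetical unit of work, standing in for an Aneka task."""
    def __init__(self, payload):
        self.payload = payload

    def execute(self):
        return self.payload.upper()        # stand-in for real computation

def run_application(payloads):
    tasks = [Task(p) for p in payloads]    # 1. application submission
    # 2.-3. In Aneka, the scheduler maps each task to a node and the
    # execution services run it remotely; here we just run them in turn.
    results = [t.execute() for t in tasks]
    return results                         # 4. result collection

print(run_application(["alpha", "beta"]))  # ['ALPHA', 'BETA']
```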
b. Management Tools
1. Infrastructure Management (IaaS)
2. Platform Management (PaaS)
3. Application Management (SaaS)
Topic 3: BUILDING ANEKA CLOUDS

• Aneka is primarily a platform for developing distributed applications
for Clouds.
• As a software platform, it requires infrastructure to be deployed on,
which needs to be managed.
• Infrastructure management tools are specifically designed for this
task, and building Clouds is one of the primary tasks of
administrators.
• Different deployment models for Public, Private and Hybrid Clouds
are supported.
Logical Organization(15M)
The logical organization of Aneka Clouds can be very diverse, since it strongly depends on the
configuration selected for each of the container instances belonging to the Cloud. The most
common scenario is using a master-worker configuration with separated nodes for storage as
shown in Fig. 5.4.
A common configuration of the master node is the following:
● Index service (master copy)
● Heartbeat service
● Logging service
● Reservation service
● Resource provisioning service
● Accounting service
● Reporting and monitoring service
● Scheduling services for the supported programming models
Private Cloud Deployment Mode
Public Cloud Deployment Mode
Hybrid Cloud Deployment Mode
Topic 4: DATA-INTENSIVE COMPUTING (15 MARK)
• Data-intensive computing focuses on processing, analyzing, and
managing extremely large volumes of data, often reaching terabytes or
petabytes, using cloud infrastructure.
• Example: Platforms like Netflix analyze massive datasets of user
preferences to provide personalized recommendations.
• Key features:
1. Large-Scale Data Handling: Manages terabytes to petabytes of data.
2. Distributed and Parallel Processing: Tasks are divided and processed
across multiple servers simultaneously.
3. Scalability on Demand: Resources are scaled up or down based on
workload.
TECHNOLOGIES FOR DATA-INTENSIVE
COMPUTING
1. STORAGE SYSTEMS
2. PROGRAMMING PLATFORMS

1. STORAGE SYSTEMS:
Category 1: High-Performance Distributed File Systems and Storage Clouds
a) Lustre
• High-Speed Parallel File System: Lustre distributes data across multiple
servers to enable fast data access.
• Optimized for Large Workloads: Handles petabytes of data, making it
suitable for big data applications.
• Applications: Commonly used in supercomputing and scientific research
like genomics and climate modeling.
b) IBM General Parallel File System (GPFS)
• Scalable and Fault-Tolerant: Developed by IBM, it supports large-scale data
storage and ensures reliability by replicating data.
• Efficient Data Management: Enables quick access and sharing of large
datasets.
• Applications: Used in enterprise systems, artificial intelligence, and
analytics workloads.
c) Google File System (GFS)
• Distributed and High Throughput: Stores and processes massive datasets
with an emphasis on batch processing over low latency.
• Manages Large Files: Designed for handling terabytes of data efficiently.
• Applications: Powers Google’s services like search engines, Gmail, and
YouTube.
d) Sector
• Open-Source Distributed Storage: Sector is an affordable storage
solution designed for handling big data tasks.
• Optimized for Analytics: Provides high performance and is suitable
for data analytics and computational workloads.
• Applications: Used in academic research and small-scale businesses
for big data projects.
e) Amazon Simple Storage Service (S3)
(Refer to Module 5 notes: AWS)
Category 2: Not Only SQL (NoSQL) Systems

(a) Apache CouchDB and MongoDB.


• Apache CouchDB is an open-source NoSQL database designed to store and
manage large amounts of data in the form of documents.
• It follows a document-oriented model, where data is stored as JSON documents,
making it flexible and easy to work with for applications dealing with semi-
structured data.
• MongoDB is an open-source, document-oriented NoSQL database designed for
storing and managing large volumes of unstructured or semi-structured data.
• It stores data in a flexible, JSON-like format called BSON (Binary JSON), which
makes it ideal for applications that require rapid changes to the schema.
Notes:
1. NoSQL databases typically do not require a fixed schema. NoSQL (Not Only SQL) refers to a category of non-
relational databases designed to store, retrieve, and manage data that doesn't fit neatly into tables and
rows like in traditional relational databases (RDBMS).
2. JSON is text-based, while BSON is binary. JSON is a lightweight, text-based data interchange format that is
easy for humans to read and write. BSON is a binary-encoded serialization of JSON-like documents (see the
document sketch below).
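A short sketch of a schema-free document as CouchDB/MongoDB would present it. The field names are made up, and only the text-based JSON side is shown; MongoDB serializes the same structure to binary BSON internally:

```python
import json

# A document with nested and list-valued fields; no fixed schema is
# required, so a second document may carry different fields entirely.
doc = {"_id": "user:42", "name": "Asha", "tags": ["cloud", "nosql"],
       "visits": 17}

text = json.dumps(doc)            # text-based JSON, as applications see it
print(text)
print(json.loads(text)["tags"])   # ['cloud', 'nosql']
```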
(b) Amazon Dynamo
• NoSQL Key-Value Database: Amazon DynamoDB is a fully managed,
serverless NoSQL database designed for high-speed, low-latency
applications, storing data in a key-value format.
• Scalable and Reliable: It automatically scales to handle any amount of
traffic and ensures high availability with built-in replication across multiple
regions.
(c) Google Bigtable
• Google Bigtable is a distributed, scalable NoSQL database designed for
handling large amounts of structured data across many machines.
• It was developed by Google to handle high-volume, low-latency workloads
and is used internally by many of Google's services, such as Search, Gmail,
and Google Maps.
• Integration with Google Cloud: It integrates with other Google Cloud
products like Google Cloud Storage and Google Cloud Datastore for
enhanced functionality.
(d) Apache Cassandra
• Apache Cassandra is an open-source, distributed NoSQL database designed to
manage large amounts of data across multiple servers while ensuring high
availability and fault tolerance
• Cassandra was initially developed by Facebook and is now a top-level
Apache project.
• Currently, it provides storage support for several very large Web applications such
as Facebook itself, Digg, and Twitter.
(e) Hadoop Hbase
• HBase is the distributed database supporting the storage needs of the Hadoop
distributed programming platform(open-source framework for processing and
storing large datasets across distributed clusters of computers).
• HBase is designed by taking inspiration from Google Bigtable, and its main goal is
to offer real-time read/write operations for tables with billions of rows and
millions of columns.
2. Programming Platforms

2.1. The MapReduce Programming Model
• MapReduce expresses a computation as two functions: map, which reads the
input and emits intermediate (key, value) pairs, and reduce, which
aggregates all the values that share the same key.
• Working example (word count): for the input "cat dog cat", the map phase
emits (cat, 1), (dog, 1), (cat, 1); after grouping by key, the reduce
phase sums the counts and outputs (cat, 2), (dog, 1).
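The same word count written as plain, sequential Python to make the model concrete; a real MapReduce run distributes the map and reduce tasks across many nodes:

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: aggregate all values that share the same key."""
    return word, sum(counts)

lines = ["cat dog cat"]

grouped = defaultdict(list)                    # shuffle: group pairs by key
for key, value in chain.from_iterable(map_fn(l) for l in lines):
    grouped[key].append(value)

print([reduce_fn(w, c) for w, c in grouped.items()])
# [('cat', 2), ('dog', 1)]
```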
2.2 Variations and Extensions of MapReduce
(a) Hadoop
• Hadoop is an open-source framework for distributed storage and processing of large datasets.
• It allows data to be stored across many machines and processed in parallel, making it highly
scalable and fault-tolerant.
• Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
(b) Pig
• Pig is a high-level platform built on top of Hadoop used for analyzing large datasets. Pig was
developed by Yahoo!
• It provides a scripting language called Pig Latin, which is easier to use than writing raw
MapReduce code.
(c) Hive
• Hive is a data warehouse infrastructure built on top of Hadoop that provides a SQL-like query
language called HiveQL for querying and managing large datasets stored in HDFS (Hadoop
Distributed File System).
• Developed by Facebook in 2007.
