module 4
• ANEKA
• DATA INTENSIVE COMPUTING
Topic 1:
ANATOMY OF THE ANEKA CONTAINER / SERVICES
PROVIDED BY ANEKA / FRAMEWORK (15 MARKS)
• Aneka is a cloud application development and management platform
developed by Manjrasoft, an Australian company.
• It provides a framework for building, deploying, and managing
applications on private or public cloud infrastructures.
• Aneka is Manjrasoft's solution for developing, deploying, and
managing Cloud applications.
• Aneka is a Pure PaaS solution for Cloud computing.
• One of the key advantages of Aneka is its extensible set of APIs
associated with different types of programming models.
Services installed in the Aneka container
The services installed in the Aneka container can be classified into
three major categories:
1. fabric services
2. foundation services
3. application services
• Foundation Services
1. Storage Management
Aneka offers two different facilities for storage management:
• Centralized file storage: all files are stored on a single server or
storage system. It is mostly used for the execution of
compute-intensive applications.
-Compute-intensive applications mostly require powerful
processors and do not have high demands in terms of storage,
which in many cases is used to store small files that are easily
transferred from one node to another.
• Distributed file system: files are stored across multiple servers
or locations. It is more suitable for the execution of
data-intensive applications.
-Data-intensive applications are characterized by large data files
(gigabytes or terabytes), and the processing power required by
tasks does not constitute a performance bottleneck.
2. Accounting, Billing, and Resource Pricing
-A complete history of application execution and storage, as well as
other resource utilization parameters, is captured and mined by the
accounting services. This information constitutes the foundation on
top of which users are charged in Aneka.
- Billing is another important feature of accounting. Aneka billing
service provides detailed information about the resource usage of
each user with the associated costs.
-Each resource can be priced differently according to the different set
of services that are available on the corresponding Aneka container or
the installed software in the node.
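The pricing and billing idea above can be sketched in a few lines of Python. This is only an illustration, not Aneka's accounting API: the names `UsageRecord`, `PRICE_PER_HOUR`, and `bill_for`, as well as the prices, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical price list: cost per hour for each kind of node.
PRICE_PER_HOUR = {"basic-node": 0.5, "gpu-node": 2.0}

@dataclass
class UsageRecord:
    user: str
    resource: str   # key into PRICE_PER_HOUR
    hours: float    # how long the resource was used

def bill_for(records):
    """Return the total cost per user from the accounting history."""
    totals = defaultdict(float)
    for r in records:
        totals[r.user] += PRICE_PER_HOUR[r.resource] * r.hours
    return dict(totals)

history = [
    UsageRecord("alice", "basic-node", 4),
    UsageRecord("alice", "gpu-node", 1),
    UsageRecord("bob", "basic-node", 2),
]
bills = bill_for(history)
```

Here each usage record is charged at the rate of the resource it ran on, so the same number of hours costs more on a more capable node.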
3. Resource Reservation
• Resource reservation allows reserving resources for exclusive use by
specific applications.
• Resource reservation is built out of two different kinds of services:
the Resource Reservation Service and the Allocation Service. The former
keeps track of all the reserved time slots in the Aneka Cloud, while the
latter manages the database of information regarding the allocated slots
on the local node.
• Three types of reservation: (a) Basic Reservation (b) Libra Reservation
(c) Relay Reservation
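A minimal sketch of how reserved time slots might be tracked, assuming a slot is a half-open interval [start, end) on a named node. The class `ReservationService` and its method are hypothetical and do not reproduce Aneka's reservation services.

```python
class ReservationService:
    """Tracks reserved time slots per node (illustrative only)."""

    def __init__(self):
        self.slots = {}  # node name -> list of (start, end, application)

    def reserve(self, node, start, end, application):
        """Reserve [start, end) on a node; refuse overlapping slots."""
        if start >= end:
            raise ValueError("start must be before end")
        for s, e, _ in self.slots.get(node, []):
            if start < e and s < end:   # the two intervals overlap
                return False
        self.slots.setdefault(node, []).append((start, end, application))
        return True

service = ReservationService()
first = service.reserve("node-1", 9, 12, "app-A")    # free slot
clash = service.reserve("node-1", 11, 13, "app-B")   # overlaps app-A
later = service.reserve("node-1", 12, 14, "app-B")   # starts when app-A ends
```

Rejecting any overlapping request is what gives the reserving application exclusive use of the node during its slot.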
• Application Services
1. Scheduling
Common tasks that are performed by the scheduling component are
the following:
● Job-to-node (virtual machine) mapping
● Rescheduling of failed jobs
● Job status monitoring
● Application status monitoring
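The scheduling tasks listed above can be illustrated with a small Python sketch. It assumes a simple round-robin mapping and a retry limit; the function `schedule` and the `run_job` callback are hypothetical and stand in for whatever policy a real scheduler uses.

```python
from itertools import cycle

def schedule(jobs, nodes, run_job, max_attempts=3):
    """Map jobs to nodes round-robin and reschedule failures.

    run_job(job, node) returns True on success, False on failure.
    Returns a dict: job -> (status, node of the last attempt).
    """
    node_cycle = cycle(nodes)
    status = {}
    for job in jobs:
        for _ in range(max_attempts):
            node = next(node_cycle)          # job-to-node mapping
            if run_job(job, node):
                status[job] = ("completed", node)
                break
            status[job] = ("failed", node)   # retried on the next node
    return status

# Hypothetical executor: node "n2" is broken and fails every job.
def run_job(job, node):
    return node != "n2"

result = schedule(["j1", "j2", "j3"], ["n1", "n2"], run_job)
```

The `status` dictionary plays the role of job status monitoring: a job that fails on one node is rescheduled on the next node until it completes or runs out of attempts.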
2. Execution
• Execution services control the execution of single jobs that
compose applications.
• They are in charge of setting up the runtime environment
hosting the execution of jobs.
• Unpacking the jobs received from the scheduler
• Retrieval of input files required for the job execution
• Sandboxed execution of jobs
• Submission of output files at the end of execution
• Execution failure management (i.e., capturing sufficient
contextual information useful to identify the nature of the
failure)
• Performance Monitoring
• Fabric Services
✅ Nodes periodically send heartbeat signals to indicate that they are active.
✅ If a node stops sending heartbeats, it is marked as inactive or failed, triggering
task reallocation.
✅ Ensures fault detection and maintains system reliability by proactively
identifying failures.
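A minimal sketch of heartbeat-based failure detection, assuming each node reports a timestamp and a fixed timeout decides when it counts as failed. The class `HeartbeatMonitor` and the timeout value are hypothetical.

```python
class HeartbeatMonitor:
    """Marks nodes as failed when heartbeats stop (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds allowed between heartbeats
        self.last_seen = {}      # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

monitor = HeartbeatMonitor(timeout=10)
monitor.heartbeat("node-1", now=0)
monitor.heartbeat("node-2", now=0)
monitor.heartbeat("node-1", now=8)     # node-2 goes silent
failed = monitor.failed_nodes(now=15)  # candidates for task reallocation
```

Times are passed in explicitly so the example is deterministic; a real monitor would read a clock instead.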
❖ Monitoring Service
✅ Continuously tracks the status and performance of worker nodes and
metrics.
✅ Generates detailed reports for performance analysis, debugging, and
optimization.
2. Resource Management
❖ Resource Membership (Index Service or Membership Catalogue)
✅ Maintains a catalog of available resources in the Aneka cloud environment.
✅ The Index Service keeps an updated list of all active, idle, and busy resources.
❖ Resource Reservation
✅ Allows users or applications to pre-book specific computing resources for
dedicated use.
✅ Ensures guaranteed availability of resources for critical tasks or high-priority
jobs.
❖ Resource Provisioning
✅ Dynamically allocates and deallocates resources based on workload demand.
✅ Enables elastic scaling, where resources are added when demand is high and
released when demand decreases.
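Elastic scaling can be sketched as a rule that derives the number of nodes from the current workload. The function `target_node_count` and its parameters are hypothetical, and the rule itself (pending tasks divided by a per-node capacity) is only one simple assumption about how demand might be measured.

```python
import math

def target_node_count(pending_tasks, tasks_per_node, min_nodes, max_nodes):
    """Nodes needed for the current workload, clamped to [min, max]."""
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))

# High demand: scale out, but never beyond max_nodes.
high = target_node_count(95, tasks_per_node=10, min_nodes=1, max_nodes=8)
# Lower demand: release nodes that are no longer needed.
low = target_node_count(12, tasks_per_node=10, min_nodes=1, max_nodes=8)
# No demand: keep the minimum pool alive.
idle = target_node_count(0, tasks_per_node=10, min_nodes=1, max_nodes=8)
```

A provisioning service would compare this target with the nodes currently running and add or release the difference.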
Platform Abstraction Layer (PAL)
• PAL (Platform Abstraction Layer) in Aneka is a software layer that
provides a simplified interface for developers to build and execute
applications on cloud infrastructure.
• PAL provides an abstraction layer that simplifies how applications interact
with cloud resources.
• PAL provides a unified API for managing resources, supports various
programming models, and abstracts the complexities of cloud
infrastructure.
• Helps developers focus on application logic while PAL manages the cloud
execution environment.
• Supports Multiple Programming Models – Allows developers to use
different execution models like Task, Thread, and MapReduce.
Aneka container
• Aneka containers are the execution environments that run those
applications in the cloud.
• Task Isolation – Containers isolate tasks, ensuring they don’t interfere
with each other’s execution.
• Resource Management – They manage resource allocation, ensuring
tasks are distributed efficiently across available nodes.
• Scalability – Containers allow dynamic scaling of applications based
on demand.
• Fault Tolerance – They help ensure reliability by automatically
recovering from failures.
Topic 2: CLOUD PROGRAMMING AND MANAGEMENT
a. Aneka SDK
• Aneka SDK (software development kit) is a Platform-as-a-Service (PaaS)
framework for developing and deploying parallel and distributed applications
on cloud infrastructure.
• Multiple Programming Models: It supports Task Model, Thread Model, and
MapReduce Model, allowing developers to create applications based on
different computational needs.
• Dynamic Resource Management: Aneka enables scalable and flexible resource
provisioning, utilizing public, private, and hybrid cloud environments efficiently.
• The Aneka SDK provides support for both programming models and services by
means of the Application Model and the Service Model.
a.1.Service Model
The Aneka Service Model defines the basic requirements needed to
implement a service that can be hosted in the Aneka Cloud.
The container defines the runtime environment where services are
hosted.
Each service that is hosted in the container must be compliant with the
IService interface, which exposes the following methods and
properties:
● Name and status
● Control operations such as the Start, Stop, Pause, and Continue methods
● Message handling by means of the HandleMessage method
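The shape of such a service can be sketched in Python. Aneka itself is a .NET platform, so the class below is only an analogue of the members listed above, not the real interface; the names `Service`, `EchoService`, and the status strings are hypothetical.

```python
from abc import ABC, abstractmethod

class Service(ABC):
    """Python analogue of the members listed above (not Aneka's real API)."""

    def __init__(self, name):
        self.name = name
        self.status = "stopped"

    # Control operations change the service status.
    def start(self):
        self.status = "running"

    def stop(self):
        self.status = "stopped"

    def pause(self):
        self.status = "paused"

    def resume(self):
        # Stands in for "Continue"; `continue` is a reserved word in Python.
        self.status = "running"

    @abstractmethod
    def handle_message(self, message):
        """Process a message delivered by the container."""

class EchoService(Service):
    """Hypothetical service that replies with the message it receives."""

    def handle_message(self, message):
        if self.status != "running":
            return None        # ignore messages unless running
        return f"{self.name} received: {message}"

svc = EchoService("echo")
ignored = svc.handle_message("ping")   # service not started yet
svc.start()
reply = svc.handle_message("ping")
```

The container only needs to know the common base type: it can start, pause, or stop any hosted service and pass messages to it without knowing what the service does.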
a.2.Application Model
• Aneka SDK provides a flexible application model that enables
developers to design and execute parallel and distributed
applications on cloud infrastructure.
Workflow in Aneka SDK Application Model
• Application Submission: The application is designed using Aneka APIs
and submitted for execution.
• Job Scheduling: Aneka's resource manager schedules tasks or threads
to available computing resources.
• Task Execution: Tasks are executed across distributed cloud resources.
• Result Collection: The results are collected and merged after
execution.
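The four workflow steps can be mimicked locally with Python's standard thread pool. This is a sketch under a strong simplification: the pool stands in for Aneka's scheduler and remote nodes, and `square` and `run_application` are hypothetical names, not Aneka APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """A stand-in task; real tasks would run on remote nodes."""
    return n * n

def run_application(inputs, workers=4):
    # 1. Application submission: one task per input item.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # 2-3. Scheduling and execution are delegated to the pool.
        futures = [pool.submit(square, n) for n in inputs]
        # 4. Result collection: gather results in submission order.
        results = [f.result() for f in futures]
    # Merge the partial results into a single answer.
    return sum(results)

total = run_application([1, 2, 3, 4])
```

The application code only describes the tasks and how to merge their results; where and when each task runs is left to the scheduler.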
b. Management Tools
1. Infrastructure Management (IaaS)
2. Platform Management (PaaS)
3. Application Management (SaaS)
Topic 3: BUILDING ANEKA CLOUDS
1. STORAGE SYSTEM:
Category 1: High-Performance Distributed File Systems and Storage Clouds
a)LUSTRE
• High-Speed Parallel File System: Lustre distributes data across multiple
servers to enable fast data access.
• Optimized for Large Workloads: Handles petabytes of data, making it
suitable for big data applications.
• Applications: Commonly used in supercomputing and scientific research
like genomics and climate modeling.
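Distributing a file across multiple servers can be pictured as round-robin striping. The sketch below is a simplified illustration of that idea, not Lustre's implementation; the function `stripe` and the tiny stripe size are hypothetical.

```python
def stripe(data, num_servers, stripe_size):
    """Split data into fixed-size chunks and place them round-robin."""
    servers = [[] for _ in range(num_servers)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        index = (offset // stripe_size) % num_servers
        servers[index].append(chunk)
    return servers

layout = stripe(b"abcdefghij", num_servers=3, stripe_size=2)
```

Because consecutive chunks land on different servers, a large read can be served by several servers in parallel, which is where the speed of a parallel file system comes from.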
b) IBM General Parallel File System (GPFS)
• Scalable and Fault-Tolerant: Developed by IBM, it supports large-scale data
storage and ensures reliability by replicating data.
• Efficient Data Management: Enables quick access and sharing of large
datasets.
• Applications: Used in enterprise systems, artificial intelligence, and
analytics workloads.
c) Google File System (GFS)
• Distributed and High Throughput: Stores and processes massive datasets
with an emphasis on batch processing over low latency.
• Manages Large Files: Designed for handling terabytes of data efficiently.
• Applications: Powers Google’s services like search engines, Gmail, and
YouTube.
d) Sector
• Open-Source Distributed Storage: Sector is an affordable storage
solution designed for handling big data tasks.
• Optimized for Analytics: Provides high performance and is suitable
for data analytics and computational workloads.
• Applications: Used in academic research and small-scale businesses
for big data projects.
e) Amazon Simple Storage Service (S3).
(Refer module 5 notes-aws)
Category 2: Not Only SQL (NoSQL) Systems