Grid Computing PPT 5 Wecompress - Com 1

Uploaded by

Jaya R
Grid Computing

(Center for Computational Mathematics)


Dr Ashok Mishra

Team: Dr. Banitamani Mallik, Dr. Tumbanath Samantara, Mr. Balaji Padhy, Mrs. Sasmita Jena
Lecture_5
Data-intensive Applications
Data-intensive computing is a class of parallel
computing applications that use a data-parallel
approach to process large volumes of data,
typically terabytes or petabytes in size and
commonly referred to as big data.
Computing applications which devote most of
their execution time to computational
requirements are deemed compute-intensive,
whereas computing applications which require
large volumes of data and devote most of their
processing time to I/O and manipulation of data
are deemed data-intensive.
ABSTRACT MODEL OF A WORKFLOW MANAGEMENT SYSTEM:

The architecture of a Grid workflow system is
based on the workflow reference model
proposed by the Workflow Management Coalition
(WfMC).
The build-time and run-time boundaries separate
the functionality of the design into defining
tasks and executing tasks, respectively.
At the core of the run time are components that
actively process both data and tasks equally.
• The scheduler, which forms the core of the
engine, handles data flow schedules on top of
task schedules.
• For example, if a workflow is modelled such
that the data transfer tasks are separate from
computation tasks, the scheduler may apply a
different scheduling policy to the data transfer
tasks.
• Similarly, when there is no distinction between
these tasks, the scheduler may prioritize data
transfers between certain tasks over
computation depending on the structure of the
workflow, scheduling objectives, and so forth.
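The first case above (data transfer tasks modelled separately from computation tasks) can be sketched roughly as follows. This is only an illustration, not the scheduler of any particular Grid workflow system: the `Task` fields, the `build_queues` name, and both policies (smallest transfer first, first come first served for computation) are assumptions chosen for the example, and task dependencies are ignored.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str              # "transfer" or "compute" (hypothetical labels)
    size_mb: float = 0.0   # data volume; only meaningful for transfers
    submitted: int = 0     # submission order


def build_queues(tasks):
    """Split tasks by kind and order each queue with its own policy."""
    transfers = [t for t in tasks if t.kind == "transfer"]
    computes = [t for t in tasks if t.kind == "compute"]
    # Assumed policy for data transfer tasks: smallest data volume first.
    transfers.sort(key=lambda t: t.size_mb)
    # Assumed policy for computation tasks: first come, first served.
    computes.sort(key=lambda t: t.submitted)
    return {"transfer": transfers, "compute": computes}
```

A real engine would also respect dependencies between the two queues; the point here is only that each kind of task can get its own scheduling policy.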
SURVEY:
In this section, we characterize and classify key
concepts and techniques used for scheduling and
managing data-intensive application workflows.
We have classified the techniques into seven
major categories:
(a) data locality,
(b) data transfer,
(c) data-footprint,
(d) granularity,
(e) model,
(f) platform,
(g) miscellaneous technologies
Data Locality

• Transferring data between computing nodes
takes a significant amount of time, depending on
the size of the data and the network capacity
between participating nodes. Hence, most
scheduling techniques aim to optimize data
transfers by exploiting the locality of data.
These techniques can be classified into:
(i) spatial clustering,
(ii) task clustering, and
(iii) worker centric.
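As a rough illustration of exploiting data locality in general (not an implementation of any of the three techniques listed above), a scheduler could place a task on the node that already holds most of its input data. The function name `pick_node` and the dictionary layouts are assumptions made for this sketch.

```python
def pick_node(input_files, file_sizes, node_files):
    """Return the node that already holds the most input bytes.

    input_files: names of the files the task reads
    file_sizes:  file name -> size in bytes
    node_files:  node name -> set of file names stored on that node
    """
    def local_bytes(node):
        held = node_files[node]
        return sum(file_sizes[f] for f in input_files if f in held)

    # Iterating over sorted node names makes ties deterministic:
    # max() keeps the first node that reaches the highest score.
    return max(sorted(node_files), key=local_bytes)
```

Running the task where the largest share of its input lives means only the remaining files have to cross the network.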
Data Transfer:

• Workflow systems use several mechanisms for
transferring data so that data transfer time is
minimized.
These techniques are:
• (i) data parallelism,
• (ii) data streaming, and
• (iii) data throttling
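The slide does not define these techniques. Assuming the common meaning of data streaming (the consumer works on data piece by piece as it arrives, instead of waiting for the whole file), a minimal sketch could look like this. The names `stream_chunks` and `count_newlines_streaming` are hypothetical, and an in-memory byte string stands in for a network transfer.

```python
def stream_chunks(data: bytes, chunk_size: int):
    """Yield the data in fixed-size chunks, as a transfer might deliver it."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def count_newlines_streaming(chunks):
    """Process each chunk as soon as it is available."""
    total = 0
    for chunk in chunks:
        total += chunk.count(b"\n")
    return total
```

Because the consumer never needs the whole data set at once, transfer and processing can overlap and memory use stays bounded by the chunk size.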
Data Footprint:

Workflow systems adopt several
mechanisms to track and utilize the data
footprint of the application. These
mechanisms can be classified into:
• cleaning jobs,
• restructuring of workflow,
• data placement & replication.
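The slide does not say how cleaning jobs work. Assuming a cleaning job removes a file once no remaining task reads it, the point at which each file becomes removable can be found by recording its last reader. The function name `cleanup_points` and the input layout are assumptions for this sketch.

```python
def cleanup_points(order, inputs):
    """Map each file to the last task, in execution order, that reads it.

    order:  task names in the order they will run
    inputs: task name -> list of files that task reads
    """
    last_reader = {}
    for task in order:
        for f in inputs.get(task, []):
            # A later reader overwrites an earlier one.
            last_reader[f] = task
    return last_reader
```

A cleaning job for a file could then be scheduled right after its last reader finishes, which reduces the storage the workflow occupies at any one time.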
Granularity :
• Workflow schedulers can make scheduling
decisions at either:
• (a) the task level, or
• (b) the workflow level.
• Task-level schedulers map individual tasks to
compute resources.
• The decision of resource selection and data
movement is based on the characteristics of
the individual task and its dependencies on
other tasks.
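A task-level scheduler of this kind can be sketched as a greedy loop that places one task at a time. This is an illustrative heuristic under stated assumptions, not a published algorithm: the name `map_tasks`, the flat `transfer_cost` charged per parent on a different resource, and the runtime table are all hypothetical. It uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def map_tasks(deps, runtime, transfer_cost, resources):
    """Greedily map each task to its cheapest resource.

    deps:          task -> set of parent tasks it depends on
    runtime:       (task, resource) -> estimated run time
    transfer_cost: cost added for each parent placed on another resource
    resources:     list of resource names
    """
    placement = {}
    # static_order() yields every parent before its children.
    for task in TopologicalSorter(deps).static_order():
        def cost(res):
            moved = sum(1 for p in deps.get(task, ()) if placement[p] != res)
            return runtime[(task, res)] + transfer_cost * moved

        placement[task] = min(resources, key=cost)
    return placement
```

Each decision looks only at the task's own characteristics (its run time) and its dependencies (where its parents were placed), which is what distinguishes it from a workflow-level decision.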
Model :
The workflow scheduling model depends on
the way the tasks and data are
composed and handled.
Models can be classified into two
categories:

(i) task-based,
(ii) service-based
Platform :

Data-intensive application workflows can be
executed in different resource configurations and
environments (e.g. clusters, Data Grids, Clouds),
depending on the requirements of the
application.
Miscellaneous :
In this section, we list some technologies
that have been used for enhancing the
performance of data-intensive
application workflows:
Semantic Technology :

Database Technology

• Thank You
