Grid Computing
(Center for Computational Mathematics)
Dr. Ashok Mishra
Team: Dr. Banitamani Mallik, Dr. Tumbanath Samantara, Mr. Balaji Padhy, Mrs. Sasmita Jena

Lecture 5: Data-intensive Applications

Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and commonly referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.
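As a minimal, self-contained sketch of the data-parallel approach (the partition files and record counts below are invented purely for illustration), the same function is applied to independent chunks of the input in parallel:

```python
# A minimal data-parallel sketch: one worker process per data partition,
# each applying the same operation to its own chunk of the input.
from multiprocessing import Pool

def count_records(chunk_path: str) -> int:
    """Process one partition independently of all others."""
    with open(chunk_path) as f:
        return sum(1 for _ in f)

if __name__ == "__main__":
    # Create three hypothetical partitions so the sketch is self-contained.
    chunks = []
    for i in range(3):
        path = f"part-{i:03d}.txt"
        with open(path, "w") as f:
            f.write("record\n" * (i + 1) * 10)
        chunks.append(path)
    with Pool(processes=3) as pool:
        counts = pool.map(count_records, chunks)  # same function, different data
    print("total records:", sum(counts))          # -> 60
```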
ABSTRACT MODEL OF A WORKFLOW MANAGEMENT SYSTEM:
The architecture of a Grid workflow system is based on the workflow reference model proposed by the Workflow Management Coalition (WfMC). The build-time and run-time boundaries separate the functionality of the design into defining tasks and executing tasks, respectively. At the core of the run time are components that actively process both data and tasks equally:
• The scheduler, which forms the core of the engine, handles data-flow schedules on top of task schedules.
• For example, if a workflow is modelled such that the data transfer tasks are separate from the computation tasks, the scheduler may apply a different scheduling policy to the data transfer tasks (see the sketch after this list).
• Similarly, when there is no distinction between these tasks, the scheduler may prioritize data transfers between certain tasks over computation, depending on the structure of the workflow, the scheduling objectives, and so forth.
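A minimal sketch of this scheduling idea, not the API of any particular workflow engine; the task names, kinds, sizes, and the two policies are hypothetical:

```python
# One policy for data transfer tasks (largest payload first, so long copies
# start early), another for computation tasks (plain FIFO order).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str           # "transfer" or "compute"
    size_mb: float = 0  # payload size, used only for transfer tasks

def schedule(tasks: list[Task]) -> list[Task]:
    transfers = [t for t in tasks if t.kind == "transfer"]
    computes  = [t for t in tasks if t.kind == "compute"]
    transfers.sort(key=lambda t: t.size_mb, reverse=True)  # transfer policy
    return transfers + computes                            # compute policy: FIFO

tasks = [Task("stage-in", "transfer", 512), Task("analyze", "compute"),
         Task("stage-out", "transfer", 64)]
for t in schedule(tasks):
    print(t.name)   # -> stage-in, stage-out, analyze
```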
SURVEY:
In this section, we characterize and classify the key concepts and techniques used for scheduling and managing data-intensive application workflows. We classify the techniques into seven major categories: (a) data locality, (b) data transfer, (c) data footprint, (d) granularity, (e) model, (f) platform, and (g) miscellaneous technologies.
Data Locality:
• Transferring data between computing nodes takes a significant amount of time, depending on the size of the data and the network capacity between the participating nodes. Hence, most scheduling techniques aim to optimize data transfers by exploiting the locality of data. These techniques can be classified into: (i) spatial clustering, (ii) task clustering, and (iii) worker-centric scheduling (a locality sketch follows).
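A minimal sketch of one locality-aware placement rule, assuming the scheduler knows input file sizes and which node currently holds each file (all names and sizes below are hypothetical): run each task where most of its input already resides, so only the remainder must be moved.

```python
# Map a task to the node holding the largest share of its input bytes.
def pick_node(input_sizes: dict[str, int], location: dict[str, str]) -> str:
    """input_sizes: file -> bytes; location: file -> node holding the file."""
    bytes_on_node: dict[str, int] = {}
    for f, size in input_sizes.items():
        node = location[f]
        bytes_on_node[node] = bytes_on_node.get(node, 0) + size
    # Run where most input already resides; only the rest is transferred.
    return max(bytes_on_node, key=bytes_on_node.get)

inputs   = {"a.dat": 4_000, "b.dat": 120_000, "c.dat": 9_000}
location = {"a.dat": "node1", "b.dat": "node2", "c.dat": "node1"}
print(pick_node(inputs, location))  # -> node2 (already holds 120 KB)
```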
Data Transfer:
• Workflow systems employ several mechanisms for transferring data so that data transfer time is minimized. These techniques are:
• (i) data parallelism,
• (ii) data streaming, and
• (iii) data throttling (see the sketch after this list).
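A minimal sketch of data throttling, with the actual copy replaced by a sleep and a semaphore capping the number of concurrent transfers; the limit and file names are hypothetical:

```python
# Throttling: at most MAX_CONCURRENT_TRANSFERS copies run at once, so
# transfers do not saturate the network or overwhelm a storage server.
import threading, time

MAX_CONCURRENT_TRANSFERS = 2
slots = threading.Semaphore(MAX_CONCURRENT_TRANSFERS)

def transfer(name: str, seconds: float) -> None:
    with slots:                 # block until a transfer slot is free
        print(f"start {name}")
        time.sleep(seconds)     # stand-in for the actual data copy
        print(f"done  {name}")

threads = [threading.Thread(target=transfer, args=(f"file{i}", 0.5))
           for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()
```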
Data Footprint:
Workflow systems adopt several mechanisms to track and manage the data footprint of the application. These mechanisms can be classified into:
• cleanup jobs,
• restructuring of the workflow, and
• data placement and replication (a cleanup-job sketch follows this list).
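A minimal sketch of cleanup-job planning over an already ordered workflow (the tasks and file names are hypothetical): after each task runs, any intermediate file that no later task consumes can be deleted, shrinking the footprint.

```python
# For each task, compute which of its input files are safe to delete
# afterwards because no subsequent task still needs them.
tasks = [  # executed in this (already scheduled) order
    {"name": "t1", "inputs": [],          "outputs": ["raw.dat"]},
    {"name": "t2", "inputs": ["raw.dat"], "outputs": ["mid.dat"]},
    {"name": "t3", "inputs": ["mid.dat"], "outputs": ["out.dat"]},
]

def cleanup_plan(tasks):
    plan = []
    for i, t in enumerate(tasks):
        still_needed = {f for later in tasks[i + 1:] for f in later["inputs"]}
        doomed = [f for f in t["inputs"] if f not in still_needed]
        plan.append((t["name"], doomed))  # files deletable after t finishes
    return plan

for name, doomed in cleanup_plan(tasks):
    print(f"after {name}: delete {doomed or 'nothing'}")
```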
Granularity:
• Workflow schedulers can make scheduling decisions at either: (a) the task level, or (b) the workflow level.
• Task-level schedulers map individual tasks to compute resources.
• The decisions of resource selection and data movement are based on the characteristics of the individual task and its dependencies with other tasks (a task-level sketch follows this list).
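A minimal sketch of task-level mapping (the resource speeds and task runtimes are hypothetical): each task is placed on its own, using an earliest-finish-time rule, without considering the rest of the workflow.

```python
# Task-level scheduling: pick, per task, the resource that finishes it
# earliest given current load; no workflow-wide optimization is attempted.
resources = {"r1": {"speed": 1.0, "free_at": 0.0},
             "r2": {"speed": 2.0, "free_at": 0.0}}

def map_task(runtime_units: float) -> str:
    best = min(resources,
               key=lambda r: resources[r]["free_at"]
                             + runtime_units / resources[r]["speed"])
    resources[best]["free_at"] += runtime_units / resources[best]["speed"]
    return best

for name, work in [("t1", 4), ("t2", 4), ("t3", 2)]:
    print(name, "->", map_task(work))  # -> t1->r2, t2->r1, t3->r2
```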
Model:
The workflow scheduling model depends on the way the tasks and data are composed and handled. Models can be classified into two categories: (i) task-based, and (ii) service-based.
Platform:
Data-intensive application workflows can be executed in different resource configurations and environments (e.g., Clusters, Data Grids, Clouds) depending on the requirements of the application.

Miscellaneous:
In this section, we list some technologies that have been used to enhance the performance of data-intensive application workflows.

Semantic Technology: