Grid Computing Material
Grid deployments that require access to, and processing of, data are called data grids. They are optimized for
data-oriented operations. Although they may consume a lot of storage capacity, these grids are not to be
confused with storage service providers.
26. What are Utility Grids?
Utility grids are commercial compute resources that are maintained and managed by a service provider.
Customers that have the need to augment their existing, internal computational resources may purchase
cycles from a utility grid. In addition to overflow applications, customers may choose to use utility grids for
business continuity and disaster recovery purposes. Utility grid providers are also called Grid Resource
Providers (GReP). Along with computing resources, some utility grids also offer key business applications that
can be purchased by the minute.
Unit II
management, and data marshalling. By comparison, protocols such as MPI are significantly more complex,
with some 380 primary calls.
Finally, GridIron XLR8 is embedded directly into the software applications. Once compiled and installed, users
can benefit from the speed of distributed computing without having to change the way they use the
application and without learning special skills.
40. What is Hyperthreading?
There are a number of currently available technologies that provide the facility for performance improvement
through coprocessor and software optimizations, such as vectorization (e.g., AltiVec), Single Instruction
Multiple Data (SIMD), Pthreads, SSE2, etc. One such technology is hyperthreading.
Hyperthreading is an evolving Intel processor technology (first available on Intel's Xeon server processors
and now being delivered on all desktop 3.06 GHz+ processors) that provides simultaneous execution of
two threads on the same physical processor. Performance improvements for most multi-threaded
applications range from a typical 5 percent to a current theoretical maximum of approximately 30 percent.
Hyperthreading was utilized in this implementation to demonstrate that such technologies are
complementary to distributed computing and will achieve cumulative performance improvements.
41. What are the advantages of a grid?
Single sign-on: Globus credential creation using Grid Security Infrastructure (GSI) and X.509 certificates. This
allows the user to seamlessly establish his or her identity across all campus grid resources.
Resource information: Viewable status information on grid resources, both static and dynamic attributes such
as operating systems,
Job specification and submission: A GUI that enables the user to enter job specifications such as the compute
resource, I/O, and queue requirements. Automated translation of these requirements into Resource
Specification Language (RSL) and subsequent job submission to Globus Resource Allocation Managers (GRAM)
are supported by the portal. Scripts have been implemented to enable job handoff to SGE via Globus services.
Further, automated translation of some job requirements into SGE parameters is supported.
Precise usage control: Policy-based authorization and accounting services to examine and evaluate usage
policies of the resource providers. Such a model is critical when sharing resources in a heterogeneous
environment such as the campus grid.
Job management: Storage and retrieval of relevant application profile information, history of job executions,
and related information. Application profiles are metadata that can be composed to characterize the
applications.
Data handling: Users can transparently authenticate with and browse remote file systems of the grid
resources. Data can be securely transferred between grid resources using the GSI-enabled data transport
services.
44. Explain about Grid Engine Enterprise Edition
Grid Engine Enterprise Edition (GEEE) is installed at each of the four nodes: Maxima, Snowdon, Titania, and
Pascali. The command line and GUI of GEEE are the main access points to each node for local users. The
Enterprise Edition version of Grid Engine provides policy-driven resource management at the node level.
There are four policy types which may be implemented:
Share Tree Policy: GEEE keeps track of how much usage users/projects have already received. At each
scheduling interval, the Scheduler adjusts all jobs' shares of resources to ensure that users/groups and projects
get very close to their allocated share of the system over the accumulation period.
Functional Policy: Functional scheduling, sometimes called priority scheduling, is a non-feedback scheme
for determining a job's importance by its association with the submitting user/project/department.
Deadline Policy: Deadline scheduling ensures that a job is completed by a certain time by starting it soon
enough and giving it enough resources to finish on time.
Override Policy: Override scheduling allows the GEEE operator to dynamically adjust the relative
importance of an individual job or of all the jobs associated with a user/department/project.
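The feedback idea behind the Share Tree Policy can be illustrated with a small sketch. This is not GEEE's actual algorithm; the function name `adjusted_shares`, the compensation formula, and the project names are assumptions made purely for illustration. It shows how a scheduler might raise the weight of a project that has received less than its allocated share during the accumulation period, and lower the weight of one that has received more.

```python
# Hypothetical sketch of share-tree-style feedback; not GEEE's real algorithm.
def adjusted_shares(target, usage):
    """Return per-project scheduling weights for the next interval.

    target: mapping project -> allocated fraction of the system (sums to 1).
    usage:  mapping project -> usage accumulated over the accumulation period.
    """
    total_usage = sum(usage.values())
    weights = {}
    for project, share in target.items():
        # Fraction of all accumulated usage that this project received.
        received = usage.get(project, 0.0) / total_usage if total_usage else 0.0
        # Assumed compensation rule: add the shortfall (or subtract the excess).
        weights[project] = max(share + (share - received), 0.0)
    norm = sum(weights.values())
    if norm == 0:
        return dict(target)
    # Normalize so the weights again sum to 1.
    return {p: w / norm for p, w in weights.items()}


target = {"physics": 0.5, "chemistry": 0.5}
usage = {"physics": 30.0, "chemistry": 10.0}
weights = adjusted_shares(target, usage)
print(weights)
```

Here "physics" has consumed three quarters of the recent usage against a target of one half, so its weight for the next interval drops while the weight of "chemistry" rises. A functional (non-feedback) policy, by contrast, would ignore `usage` entirely.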
Unit III
45. List out the characteristics of Data Grid.
They are numerous.
They are owned and managed by different, potentially mutually distrustful organizations and individuals.
They are potentially faulty.
They have different security requirements and policies.
They are heterogeneous, i.e., they have different CPU architectures, are running different operating systems,
and have different amounts of memory and disk.
They are connected by heterogeneous, multilevel networks.
They have different resource management policies.
They are likely to be separated geographically (on a campus, in an enterprise, on a continent).
46. Write short notes on Network File System (NFS)
NFS is the standard Unix solution for accessing files on remote machines within a LAN. With NFS, a disk on a
remote machine can be made part of the local machine's file system. Accessing data from the remote system
then becomes a matter of accessing a particular part of the file system in the usual manner.
47. Write short notes on File Transfer Protocol (FTP)
FTP has been the tool of choice for transferring files between computers since the 1970s. FTP is a
command-line tool that provides its own command prompt and has its own set of commands. Several of the
commands resemble Unix commands, although others, particularly those for file transfer and for manipulating
the local file system, are different. FTP may be used within a script; however, in that case, the password for
the remote machine must be stored in a clear-text file on the local machine.
48. Write short notes on GridFTP
GridFTP is a tool for transferring files. It is built on top of the Globus Toolkit. GridFTP is an example of a
service that characterizes the Globus "sum of services" approach to grid architecture.
The Andrew File System is a distributed network file system that enables access to files and directories
distributed across multiple sites. Access to files involves becoming part of a single virtual file system. AFS
comprises several cells, with each cell representing an independently administered file system.
User Expectations: The user of the PC on the corporate desktop views it as a truly personal part
of his or her work experience, much like a telephone or a stapler. It is often running many concurrent
applications and needs to appear as if it is always and completely available to serve that employee's needs.
After a distributed computing component is deployed on an employee's PC, that component will tend to be
blamed for every future fault that occurs, at least until the next new component or application is
installed.
Unit IV
platform. The central grid server distributes the EDRs among the nodes for processing in distributed
mode.
Once the validation step has been completed, an evaluation of the EDRs takes place. In this evaluation, the
EDR data are transformed for inclusion in the DataWareHouse. The system loads the evaluation rules
from a database and sends them to the platform. The evaluation of the validated EDRs is distributed
among the computers in the platform. Finally, the results are committed to the DataWareHouse.
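The validate, evaluate, and commit flow described above can be sketched as follows. This is only an illustration under stated assumptions: a thread pool stands in for the grid nodes, the EDR fields (`id`, `duration_s`), the validation check, and the two rules in `RULES` are hypothetical, and the rules are hard-coded here rather than loaded from a database.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical evaluation rules; in the text these are loaded from a database.
RULES = [
    lambda edr: {**edr, "duration_min": edr["duration_s"] / 60},
    lambda edr: {**edr, "billable": edr["duration_s"] > 0},
]


def validate(edr):
    """Assumed validation: required fields present, duration non-negative."""
    return {"id", "duration_s"} <= edr.keys() and edr["duration_s"] >= 0


def evaluate(edr, rules=RULES):
    """Apply each rule in order to transform one EDR for loading."""
    for rule in rules:
        edr = rule(edr)
    return edr


def process_edrs(edrs, nodes=4):
    """Distribute validation, then evaluation, across worker 'nodes'."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        flags = list(pool.map(validate, edrs))
        valid = [e for e, ok in zip(edrs, flags) if ok]
        # The returned rows are what would be committed to the warehouse.
        return list(pool.map(evaluate, valid))


edrs = [
    {"id": 1, "duration_s": 120},
    {"id": 2, "duration_s": -5},  # fails the assumed validation check
    {"id": 3, "duration_s": 0},
]
warehouse_rows = process_edrs(edrs)
print(warehouse_rows)
```

Because each EDR is validated and evaluated independently of the others, the work divides cleanly among nodes, which is what makes this kind of batch processing a natural fit for a grid.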
65. Write short notes on Smart System Software (SSS)
Smart System Software (SSS) virtualizes independent operating-system instances to provide an HPC
service. Next to the attractive price/performance of COTS components, SSS plays a key role here. SSS
allows a number of distinct systems to appear as one, even though each runs its own instance of the
operating system. There are two possibilities for SSS. At one extreme, the Single System Image (SSI) is SSS
that involves kernel modification. At the other extreme, the Single System Environment (SSE) is SSS that
runs in user space as a layered service. The arrows in the figure emphasize interconnections and the
corresponding communications.
66. Explain about Single System Environment (SSE)
Clustering solutions can also be delivered via an SSE. In contrast to SSI, clustering via SSE does not require
modifications to the kernel. Instead, SSE runs in user space and provides a distributed process abstraction
that includes primitives for process creation and process control.
The user-space approach releases the single-operating-system restriction, and allows third parties to
craft cross-platform clusters based on Linux, Mac OS, UNIX, and/or Windows. SSE directly addresses the
tension between supply and demand by matching an application's resource requirements with the
resources capable of filling the need. By effectively arbitrating the supply-demand budget over an
enterprise-scale IT infrastructure, subject to policy-driven objectives, SSE solutions allow organizations
to derive maximal utilization from all available computer resources.
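The matching of an application's resource requirements to capable resources can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `Host` fields, the `match` function, and the best-fit selection rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    os: str
    free_cpus: int
    free_mem_gb: float


def match(requirements, hosts):
    """Return the smallest host that satisfies the requirements, or None."""
    candidates = [
        h for h in hosts
        if h.os in requirements["os"]
        and h.free_cpus >= requirements["cpus"]
        and h.free_mem_gb >= requirements["mem_gb"]
    ]
    # Assumed policy: best fit, keeping larger hosts free for larger jobs.
    return min(candidates, key=lambda h: (h.free_cpus, h.free_mem_gb), default=None)


hosts = [
    Host("a", "linux", 2, 4.0),
    Host("b", "linux", 8, 32.0),
    Host("c", "windows", 4, 8.0),
]
linux_only = match({"os": {"linux"}, "cpus": 4, "mem_gb": 8}, hosts)
any_os = match({"os": {"linux", "windows"}, "cpus": 4, "mem_gb": 8}, hosts)
print(linux_only.name, any_os.name)
```

Allowing more than one operating system in the requirements widens the pool of candidate hosts, which reflects the cross-platform benefit of the user-space approach.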
67. Write short notes on Electronic Design Automation (EDA)
The high-tech field of electronic design automation (EDA) offers rich possibilities for illustrating SSE in
capacity-driven simulation. In EDA, the fundamental challenge stems from incremental progress into
deeper sub-micron design technologies; this advance implies staggering challenges for design synthesis,
verification, timing closure, and power consumption. Through direct association with Moore's Law,
design synthesis has gained a profile. However, it is design verification that has an even greater potential
to become the ultimate design bottleneck: As design complexity increases, verification requirements
escalate rapidly.
68. Explain about Open Grid Services Architecture (OGSA)
The Open Grid Services Architecture (OGSA) is a set of technical specifications which define a common
framework that will allow businesses to build grids both across the enterprise and with their business
partners. It is expected that OGSA will define the standards required for both open source and
commercial software for a broadly applicable and widely adopted global grid infrastructure.
The Open Grid Services Architecture (OGSA) has been proposed as an enabling infrastructure for systems
and applications that require the integration and management of services within distributed,
heterogeneous, dynamic virtual organizations.
69. Write short notes on Submission-execution topologies for Platform MultiCluster
72. Write short notes on Data Access and Integration Services (DAIS)
The Data Access and Integration Services working group is focused on defining grid data services that provide
consistent access to existing, autonomously managed databases. Although there had already been a lot of
work around Grid Services for file management (e.g., GridFTP), database integration was not really covered by
this work, even though databases play a central role in both the research and commercial computing domains.
73. Explain the PortTypes for Basic Services
OGSI defines a set of portTypes and describes the behavior of a collection of common distributed computing
patterns that are fundamental to OGSI.
GridService: encapsulates the root behavior of the service model.
HandleResolver: mapping from a GSH to a GSR.
NotificationSource: allows clients to subscribe to notification messages.
NotificationSubscription: defines the relationship between a single NotificationSource and NotificationSink pair.
NotificationSink: defines a single operation for delivering a notification message to the service instance that
implements the operation.
Factory: standard operation for creation of Grid Service instances.
ServiceGroup: allows clients to maintain groups of services.
ServiceGroupRegistration: allows Grid Services to be added to and removed from a ServiceGroup.
ServiceGroupEntry: defines the relationship between a Grid Service and its membership within
a ServiceGroup.
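The relationship among the three notification portTypes can be illustrated with a plain Python sketch. This is not OGSI or Globus Toolkit code: the classes only mirror the roles described above, and the method names, the topic string "jobStatus", and the in-process delivery are assumptions made for illustration.

```python
class NotificationSink:
    """Receives messages through a single delivery operation."""

    def __init__(self):
        self.received = []

    def deliver_notification(self, message):
        self.received.append(message)


class NotificationSubscription:
    """Represents one source/sink pairing; destroying it ends delivery."""

    def __init__(self, source, sink, topic):
        self.source, self.sink, self.topic = source, sink, topic

    def destroy(self):
        self.source._subscriptions.remove(self)


class NotificationSource:
    """Lets clients subscribe sinks and pushes messages to them."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, sink, topic):
        subscription = NotificationSubscription(self, sink, topic)
        self._subscriptions.append(subscription)
        return subscription

    def notify(self, topic, message):
        # Iterate over a copy so a subscription may be destroyed mid-loop.
        for sub in list(self._subscriptions):
            if sub.topic == topic:
                sub.sink.deliver_notification(message)


source = NotificationSource()
sink = NotificationSink()
subscription = source.subscribe(sink, "jobStatus")
source.notify("jobStatus", "RUNNING")
subscription.destroy()
source.notify("jobStatus", "DONE")  # not delivered: subscription destroyed
print(sink.received)
```

Modelling the subscription as its own object, rather than as a flag on the source, matches the description of NotificationSubscription as the relationship between one source and one sink.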
74. List out the serviceData elements in the GridService
interface: a list of the QNames of all portTypes implemented by the service.
serviceDataName: a list of QNames of all SDEs supported by this service instance. This includes SDEs defined
at the interface level, as well as SDEs added dynamically during the lifetime of the service instance.
factoryLocator: a service locator that points to the Grid Service instance that created this Grid Service
instance.
gridServiceHandle: zero or more GSHs of this Grid Service instance.
gridServiceReference: zero or more GSRs of this Grid Service instance.
findServiceDataExtensibility: a set of operation extensibility declarations for the findServiceData operation.
The client can use a query expression that conforms to any of the listed inputElement types.
UNIT V
77. What is Hive Computing?
(Nov 10)
Hive Computing is an approach to the development, deployment, and management of mission-critical
applications that is designed to complement and extend the vision of Grid Computing. Hive Computing
enables businesses to build a transactional resource, called a Hive, that can be plugged into a grid and
host the transaction-oriented applications upon which businesses depend. The goal of Hive Computing is
to expand the range of problems that can be solved with a grid and bring the benefits of Grid Computing
to the mainstream of business computing. Hive Computing defines a new type of resource, called
a Transactional Resource, that can be integrated into an existing grid. The transactional resource handles
all the transaction-oriented applications.
78. What are the services performed by the Hive ?
Get a real-time quote based on a CUSIP (stock identifier)
Get a delayed quote based on a CUSIP
Generate a 30-day or other price chart based on a CUSIP
79. What are the components of Hive Computing?
Extensibility
Diversity (Multiple information sources)
Dynamicity
Flexibility
Security
Deployability
Decentralized maintenance
White pages: These contain basic contact information and identifiers about a company, including
business name, address, contact information, and unique identifiers such as its Dun & Bradstreet
(DUNS) number or tax IDs. This information allows others to discover Web Services based on business
identification. In the context of Grid Computing, white pages can provide the retrieval of an IP address or
the amount of memory available on a particular resource.
Yellow pages: These contain information that describes a Web Service using different business
categories (taxonomies). This information allows others to discover Web Services based on their
categorization (such as flower sellers or car sellers).
Green pages: These contain technical information about Web Services that are exposed by a
business, including references to specifications of interfaces, as well as support for pointers to various file
and URL-based discovery mechanisms.
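The three kinds of lookup can be illustrated with a toy in-memory registry. This is a sketch only and not a real registry API: the `Registry` class, its method names, the DUNS value, and the example.com URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessEntry:
    name: str
    duns: str                                            # unique identifier
    categories: list = field(default_factory=list)       # taxonomy terms
    interface_urls: list = field(default_factory=list)   # technical pointers


class Registry:
    """Toy registry offering white-, yellow-, and green-page lookups."""

    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def white_pages(self, duns):
        """Find a business by its unique identifier."""
        return next((e for e in self._entries if e.duns == duns), None)

    def yellow_pages(self, category):
        """Find the names of businesses listed under a category."""
        return [e.name for e in self._entries if category in e.categories]

    def green_pages(self, duns):
        """Return the technical interface pointers for a business."""
        entry = self.white_pages(duns)
        return entry.interface_urls if entry else []


registry = Registry()
registry.publish(BusinessEntry(
    name="Example Flowers",
    duns="000000000",                                    # hypothetical value
    categories=["flower sellers"],
    interface_urls=["http://example.com/orders.wsdl"],   # hypothetical URL
))
print(registry.yellow_pages("flower sellers"))
```

A client that already knows a business's identifier uses the white-page lookup, one that knows only the kind of service it needs uses the yellow pages, and the green pages supply the technical details required to invoke the service once it has been found.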
(Nov 10)