GCC-2 Marks PDF
3. Difference between distributed and parallel computing.
Distributed computing:
Each processor has its own private memory (distributed memory).
Information is exchanged by passing messages between the processors.
It is loosely coupled.
An important goal and challenge of distributed systems is location transparency.
Parallel computing:
All processors may have access to a shared memory to exchange information between processors.
It is tightly coupled.
Large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").
UNIT-3
1. Define private cloud.
The private cloud is built within the domain of an intranet owned by a single organization.
Therefore, they are client owned and managed. Their access is limited to the owning clients and
their partners. Their deployment was not meant to sell capacity over the Internet through
publicly accessible interfaces. Private clouds give local users a flexible and agile private
infrastructure to run service workloads within their administrative domains.
2. Define public cloud.
A public cloud is built over the Internet, which can be accessed by any user who has paid for
the service. Public clouds are owned by service providers. They are accessed by subscription.
Many companies have built public clouds, namely Google App Engine, Amazon AWS,
Microsoft Azure, IBM Blue Cloud, and Salesforce Force.com. These are commercial providers
that offer a publicly accessible remote interface for creating and managing VM instances within
their proprietary infrastructure.
3. Define hybrid cloud.
A hybrid cloud is built with both public and private clouds. Private clouds can also support a
hybrid cloud model by supplementing local infrastructure with computing capacity from an
external public cloud. For example, the Research Compute Cloud (RC2) is a private cloud built by
IBM.
4. List the essential characteristics of cloud computing.
1. On-demand self-service 2. Broad network access 3. Resource pooling
4. Rapid elasticity 5. Measured service
5. List the design objectives of cloud computing.
Shifting Computing from Desktops to Datacenters
Service Provisioning and Cloud Economics
Scalability in Performance
Data Privacy Protection
High Quality of Cloud Services
6. Define anything-as-a-service.
Providing services to clients to meet their demands at some pay-per-use cost, such as data
storage as a service, network as a service, and communication as a service, is generally
denoted as anything as a service (XaaS).
7. What is meant by SaaS?
Software as a service refers to browser-initiated application software serving thousands of
paying customers. The SaaS model applies to business process industry applications, customer
relationship management (CRM), enterprise resource planning (ERP), human resources (HR),
and collaborative applications.
8. What is meant by IaaS?
The Infrastructure as a Service model puts together the infrastructure demanded by the user,
namely servers, storage, networks, and the data center fabric. The user can deploy and run
multiple VMs, each running a guest OS, for specific applications.
9. What is PaaS?
The Platform as a Service model enables the user to deploy user-built applications onto a
virtualized cloud platform. It includes middleware, databases, development tools, and some
runtime support such as Web 2.0 and Java. It includes both hardware and software integrated
with specific programming interfaces.
10. What is meant by Virtualization?
Virtualization is a computer architecture technology by which multiple virtual machines
(VMs) are multiplexed in the same hardware machine. The purpose of a VM is to enhance
resource sharing by many users and improve computer performance in terms of resource
utilization and application flexibility.
11. Define virtual machine monitor.
A traditional computer runs with a host operating system specially tailored for its hardware
architecture. After virtualization, different user applications managed by their own
operating systems (guest OSes) can run on the same hardware, independent of the host OS.
This is often done by adding additional software, called a virtualization layer. This
virtualization layer is known as hypervisor or virtual machine monitor (VMM).
12. List the requirements of VMM.
A VMM should provide an environment for programs that is essentially identical to
the original machine.
Programs run in this environment should show, at worst, only minor decreases in speed.
The VMM should be in complete control of the system resources; any program run under a VMM
should exhibit behavior identical to that which it exhibits when run on the original machine directly.
13. Define Host OS and Guest OS.
The host OS runs directly on the physical hardware and manages it, while a guest OS runs inside
a virtual machine on top of the virtualization layer. In Xen, the guest OS which has control
ability is called Domain 0, and the others are called Domain U. Domain 0 is a privileged guest
OS of Xen. It is first loaded when Xen boots, without any file system drivers being available.
Domain 0 is designed to access hardware directly and manage devices.
14. What are the responsibilities of VMM?
The VMM is responsible for allocating hardware resources for programs.
It is not possible for a program to access any resource not explicitly allocated to it.
It is possible under certain circumstances for a VMM to regain control of resources
already allocated.
15. Define CPU virtualization.
A CPU architecture is virtualizable if it supports the ability to run the VM’s privileged and
unprivileged instructions in the CPU’s user mode while the VMM runs in supervisor mode.
When the privileged instructions including control- and behavior-sensitive instructions of a VM
are executed, they are trapped in the VMM. In this case, the VMM acts as a unified mediator for
hardware access from different VMs to guarantee the correctness and stability of the whole
system.
16. Define memory virtualization.
Virtual memory virtualization is similar to the virtual memory support provided by modern
operating systems. In a traditional execution environment, the operating system maintains
mappings of virtual memory to machine memory using page tables, which is a one-stage
mapping from virtual memory to machine memory. All modern x86 CPUs include a memory
management unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory
performance.
17. What is meant by I/O virtualization?
I/O virtualization involves managing the routing of I/O requests between virtual devices and the
shared physical hardware. There are three ways to implement I/O virtualization:
full device emulation (the first approach for I/O virtualization)
para-virtualization
direct I/O
18. Distinguish the physical and virtual cluster. (Jan.2014)
A physical cluster is a collection of servers (physical machines) connected by a physical network
such as a LAN. Virtual clusters have different properties and potential applications. There are
three critical design issues of virtual clusters: live migration of virtual machines (VMs), memory
and file migrations, and dynamic deployment of virtual clusters.
19. What is memory migration?
Moving the memory instance of a VM from one physical host to another can be approached in
any number of ways. The memory instance to be migrated can range from hundreds of megabytes to
a few gigabytes in a typical system today, and it needs to be moved in an efficient manner. The Internet
Suspend-Resume (ISR) technique exploits temporal locality as memory states are likely to have
considerable overlap in the suspended and the resumed instances of a VM.
20. What is meant by host-based virtualization?
An alternative VM architecture is to install a virtualization layer on top of the host OS. This host
OS is still responsible for managing the hardware. The guest OSes are installed and run on top of
the virtualization layer. Dedicated applications may run on the VMs. Certainly, some other
applications can also run with the host OS directly.
21. Define KVM.
Kernel-Based VM (KVM) is a Linux para-virtualization system, a part of the Linux 2.6.20
kernel. Memory management and scheduling activities are carried out by the existing Linux
kernel. KVM does the rest, which makes it simpler than a hypervisor that controls
the entire machine. KVM is a hardware-assisted para-virtualization tool, which improves
performance and supports unmodified guest OSes such as Windows, Linux, Solaris, and other
UNIX variants.
UNIT-4
1. List out the grid middleware packages.
BOINC: Berkeley Open Infrastructure for Network Computing.
UNICORE: Middleware developed by the German grid computing community.
Globus (GT4): A middleware library jointly developed by Argonne National Lab.
CGSP in ChinaGrid: The CGSP (ChinaGrid Support Platform) is a middleware library developed by
20 top universities in China as part of the ChinaGrid Project.
Condor-G: Originally developed at the Univ. of Wisconsin for general distributed computing, and
later extended to Condor-G for grid job management.
Sun Grid Engine (SGE): Developed by Sun Microsystems for business grid applications. Applied to
private grids and local clusters within enterprises or campuses.
2. Define MapReduce.
The MapReduce software framework provides an abstraction layer for the data flow and flow of
control to users, and hides the implementation of all data flow steps such as data partitioning,
mapping, synchronization, communication, and scheduling. The data flow in such a framework is
predefined; the abstraction layer provides two well-defined interfaces in the form of two
functions: Map and Reduce.
3. What is the role of the Map function?
Each Map function receives the input data split as a set of (key, value) pairs to process
and produces the intermediate (key, value) pairs.
4. What is the role of the Reduce function?
The reduce worker iterates over the grouped (key, value) pairs, and for each unique key, it sends
the key and corresponding values to the Reduce function. Then this function processes its input
data and stores the output results in predetermined files in the user’s program.
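As an illustration of the two roles, the following is a minimal word-count sketch against the
standard Hadoop MapReduce API (the class names TokenMapper and SumReducer are hypothetical):
the Map function emits a (word, 1) pair per token, and the Reduce function sums the values
grouped under each unique word.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit one intermediate (word, 1) pair per token in the input split
public class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue; // skip leading-whitespace artifact
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// Reduce: sum the values grouped under each unique key and write the result
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum)); // stored in the job's output files
    }
}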
5. List out the Hadoop core fundamental layers
The Hadoop core is divided into two fundamental layers: the MapReduce engine and HDFS.
The MapReduce engine is the computation engine running on top of HDFS as its data storage
manager. HDFS is a distributed file system inspired by GFS that organizes files and stores their
data on a distributed computing system.
6. What are the features of HDFS?
HDFS is not a general-purpose file system: since it only executes specific types of applications,
it does not need to meet all the requirements of a general distributed file system. For example,
security has never been supported for HDFS systems.
7. List the areas where HDFS cannot be used?
Low-latency data access
Lots of small files
Multiple writers, arbitrary file modifications
8. Why is a block in HDFS so large?
HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks.
By making a block large enough, the time to transfer the data from the disk can be made to be
significantly larger than the time to seek to the start of the block. Thus the time to transfer a
large file made of multiple blocks operates at the disk transfer rate.
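To make the trade-off concrete, a back-of-the-envelope sketch (the seek time, transfer rate, and
block size below are assumed typical figures, not taken from the source):

public class BlockSizeMath {
    public static void main(String[] args) {
        double seekMs = 10.0;          // assumed average disk seek time
        double transferMBps = 100.0;   // assumed sustained disk transfer rate
        double blockMB = 128.0;        // assumed HDFS block size
        double transferMs = blockMB / transferMBps * 1000.0;   // 1280 ms per block
        double seekOverhead = seekMs / (seekMs + transferMs);  // fraction lost to seeking
        System.out.printf("transfer = %.0f ms, seek overhead = %.2f%%%n",
                transferMs, seekOverhead * 100);               // about 0.78%
    }
}

With these figures, seeking costs under 1 percent of the read time; with a 4 KB block, the seek
time would dominate the transfer time.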
9. Define Namenode in HDFS
The namenode manages the filesystem namespace. It maintains the filesystem tree and the
metadata for all the files and directories in the tree. This information is stored persistently on the
local disk in the form of two files: the namespace image and the edit log. The namenode also
knows the datanodes on which all the blocks for a given file are located; however, it does not
store block locations persistently, since this information is reconstructed from datanodes when
the system starts.
10. Define Datanode in HDFS
Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are
told to (by clients or the namenode), and they report back to the namenode periodically with lists
of blocks that they are storing.
11. What are the permission models for files and directories in HDFS
There are three types of permission: the read permission (r), the write permission (w) and the
execute permission (x). The read permission is required to read files or list the contents of a
directory. The write permission is required to write a file, or for a directory, to create or delete
files or directories in it. The execute permission is ignored for a file since you can’t execute a file
on HDFS (unlike POSIX), and for a directory it is required to access its children.
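As a sketch of how these permissions are applied through the Hadoop FileSystem API (the path
below is hypothetical), the FsPermission class models the owner, group, and others classes
directly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // rwx for the owner, r-x for the group, no access for others
        FsPermission perm =
                new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE);
        fs.setPermission(new Path("/data/reports"), perm);
    }
}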
12. Define FUSE interface?
Filesystem in Userspace (FUSE) allows filesystems that are implemented in user space to be
integrated as a Unix filesystem. Hadoop’s Fuse-DFS contrib module allows any Hadoop
filesystem (but typically HDFS) to be mounted as a standard filesystem. You can then use Unix
utilities (such as ls and cat) to interact with the filesystem, as well as POSIX libraries to access
the filesystem from any programming language. Fuse-DFS is implemented in C using libhdfs as
the interface to HDFS.
13. Define globbing in HDFS?
It is a common requirement to process sets of files in a single operation. Rather than
enumerating each file and directory to specify the input, it is convenient to use wildcard
characters to match multiple files with a single expression, an operation that is known as
globbing.
14. How to process globs in hadoop filesystem?
Hadoop provides two FileSystem methods for processing globs:
public FileStatus[] globStatus(Path pathPattern) throws IOException
public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException
The globStatus() method returns an array of FileStatus objects whose paths match the supplied
pattern, sorted by path. An optional PathFilter can be specified to restrict the matches further.
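A minimal usage sketch (the path pattern below is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Match every .log file one directory level below /data
        FileStatus[] matches = fs.globStatus(new Path("/data/*/*.log"));
        for (FileStatus status : matches) {
            System.out.println(status.getPath());
        }
    }
}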
15. How to delete file or directory in hadoop filesystem?
Use the delete() method on FileSystem to permanently remove files or directories:
public boolean delete(Path f, boolean recursive) throws IOException
If f is a file or an empty directory, then the value of recursive is ignored. A nonempty directory
is only deleted, along with its contents, if recursive is true (otherwise an IOException is thrown).
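For example, a minimal sketch (the path below is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // recursive = true is required to delete a nonempty directory
        boolean deleted = fs.delete(new Path("/tmp/scratch"), true);
        System.out.println("deleted: " + deleted);
    }
}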
16. Define iterative MapReduce.
It is important to understand the performance of different runtimes, and in particular to
compare MPI and MapReduce. The two major sources of parallel overhead are load imbalance
and communication. The communication overhead in MapReduce can be high for two reasons:
MapReduce reads and writes files, whereas MPI transfers information directly between nodes
over the network.
MPI does not transfer all data from node to node.
17. Define HDFS.
HDFS is a distributed file system inspired by GFS that organizes files and stores their data on
a distributed computing system. The Hadoop implementation of MapReduce uses the Hadoop
Distributed File System as its underlying layer, rather than GFS.
18. List the characteristics of HDFS.
HDFS fault tolerance
Block replication
Replica placement
Heartbeat and block report messages
HDFS high-throughput access to large data sets
19. What are the operations of HDFS?
The control flow of HDFS operations such as read and write properly highlights the roles of the
namenode and datanodes in managing operations. The control flow of the main operations of HDFS
on files is further described to manifest the interaction between users, the namenode, and the
datanodes.
20. Define block replication.
To reliably store data in HDFS, file blocks are replicated in this system. HDFS stores a
file as a set of blocks, and each block is replicated and distributed across the whole cluster.
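A minimal sketch of controlling the per-file replication factor through the Hadoop FileSystem
API (the path and factor below are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Request three replicas for every block of this file
        boolean accepted = fs.setReplication(new Path("/data/important.log"), (short) 3);
        System.out.println("replication change accepted: " + accepted);
    }
}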
21. Define heart beat in Hadoop. What are the advantages of heart beat?
Heartbeats are periodic messages sent to the namenode by each datanode in the cluster.
Receipt of a heartbeat implies that the datanode is functioning properly, while each block
report contains a list of all blocks on a datanode. The namenode receives such messages because
it is the sole decision maker for all replicas in the system.
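As a toy sketch of the pattern only (this is not Hadoop's actual implementation; the names and
the 3-second interval are invented for illustration):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatDemo {
    interface NameNode { void heartbeat(String dataNodeId); }

    public static void main(String[] args) {
        NameNode nameNode = id -> System.out.println("NameNode: " + id + " is alive");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Each datanode reports periodically; a real system would mark a
        // node dead after several missed heartbeats.
        scheduler.scheduleAtFixedRate(
                () -> nameNode.heartbeat("datanode-1"), 0, 3, TimeUnit.SECONDS);
    }
}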
UNIT-5
1. What are the challenges of grid sites?
The first challenge is integration with existing systems and technologies. The
second challenge is interoperability with different hosting environments.
The third challenge is to construct trust relationships among interacting hosting environments.
2. Define Reputation-Based Trust Model
In a reputation-based model, jobs are sent to a resource site only when the site is trustworthy to
meet users’ demands. The site trustworthiness is usually calculated from the following
information: the defense capability, direct reputation, and recommendation trust.
3. Define direct reputation
Direct reputation is based on experience with jobs previously submitted to the site. The
reputation is measured by many factors such as prior job execution success rate, cumulative site
utilization, job turnaround time, job slowdown ratio, and so on. A positive experience
associated with a site will improve its reputation. On the contrary, a negative experience with a
site will decrease its reputation.
4. What are the major authentication methods in the grid?
The major authentication methods in the grid include passwords, PKI, and Kerberos. The
password is the simplest method to identify users, but the most vulnerable one to use. The PKI is
the most popular method supported by GSI.
5. List the types of authority in grid
The authority can be classified into three categories: attribute authorities, policy authorities,
and identity authorities. Attribute authorities issue attribute assertions; policy authorities issue
authorization policies; identity authorities issue certificates. The authorization server makes the
final authorization decision.
6. Define grid security infrastructure
The Grid Security Infrastructure (GSI), formerly called the Globus Security Infrastructure, is a
specification for secret, tamper-proof, delegatable communication between software in a grid
computing environment. Secure, authenticatable communication is enabled using asymmetric
encryption.
7. What are the functions present in GSI?
GSI may be thought of as being composed of four distinct functions: message protection,
authentication, delegation, and authorization.
8. List the protection mechanisms in GSI
GSI allows three additional protection mechanisms. The first is integrity protection, by which
a receiver can verify that messages were not altered in transit from the sender. The second is
encryption, by which messages can be protected to provide confidentiality. The third is replay
prevention, by which a receiver can verify that it has not received the same message previously.
9. What is the primary information in a GSI certificate?
In GSI authentication, a certificate includes four primary pieces of information: (1) a subject name,
which identifies the person or object that the certificate represents; (2) the public key belonging
to the subject; (3) the identity of a CA that has signed the certificate to certify that the public
key and the identity both belong to the subject; and (4) the digital signature of the named CA.
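These four fields can be read from any X.509 certificate; a minimal sketch using Java's standard
java.security.cert API (not GSI itself; the certificate file name is hypothetical):

import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class CertInfo {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        try (FileInputStream in = new FileInputStream("user-cert.pem")) {
            X509Certificate cert = (X509Certificate) cf.generateCertificate(in);
            System.out.println("Subject:    " + cert.getSubjectX500Principal());        // (1) subject name
            System.out.println("Public key: " + cert.getPublicKey().getAlgorithm());    // (2) subject's public key
            System.out.println("Issuer CA:  " + cert.getIssuerX500Principal());         // (3) signing CA
            System.out.println("Signature:  " + cert.getSignature().length + " bytes"); // (4) CA's digital signature
        }
    }
}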