Distributed Programming Using Java: Quick Recap: UNIT-1

The document provides an introduction to distributed programming using Java and advanced Java concepts like generics. It discusses the anatomy of distributed applications including processes, threads, objects and agents. It then covers client-server and distributed object models, and provides details on RMI and CORBA for distributed object programming. Key advantages of Java generics are also summarized such as type safety and avoiding type casting.

Uploaded by

Pavan Pulicherla
Copyright: © All Rights Reserved

Introduction to BIG DATA

UNIT-1

SYLLABUS:
1. Distributed programming using Java: Quick Recap
2. Advanced Java Programming: Generics,
3. Threads,
4. Sockets,
5. Simple client server programming using Java,
6. Difficulties in developing distributed programs for large scale clusters, and
7. Introduction to cloud computing.

1. DISTRIBUTED PROGRAMMING USING JAVA: QUICK RECAP

Anatomy of a Distributed Application

A distributed application is built upon several layers. At the lowest level, a network connects a group of
host computers together so that they can talk to each other. Network protocols like TCP/IP let the
computers send data to each other over the network by providing the ability to package and address
data for delivery to another machine. Higher-level services can be defined on top of the network
protocol, such as directory services and security protocols. Finally, the distributed application itself
runs on top of these layers, using the mid-level services and network protocols as well as the computer
operating systems to perform coordinated tasks across the network.

At the application level, a distributed application can be broken down into the following parts:

Processes

A process is created by describing a sequence of steps in a programming language, compiling the
program into an executable form, and running the executable in the operating system.

Threads

Every process has at least one thread of control. Some operating systems support the creation of
multiple threads of control within a single process. Each thread in a process can run independently
from the other threads, although there is usually some synchronization between them.

Objects

An object is a group of related data, with methods available for querying or altering the data
(getName(), setName()), or for taking some action based on the data (sendName(OutputStream o)).
Objects can be accessed by one or more threads within the process. With the introduction of
distributed object technology like RMI and CORBA, an object can also be logically spread across
multiple processes, on multiple computers.

Agents

We use the term "agent" as a general way to refer to significant functional elements of a distributed
application. An agent is a higher-level system component, defined around a particular function,
utility, or role in the overall system.

G B Gangadhar 1

Example: A remote banking application might be broken down into a customer agent,
a transaction agent and an information brokerage agent. Agents can be distributed across multiple
processes, and can be made up of multiple objects and threads in these processes. Our customer
agent might be made up of an object in a process running on a client desktop that's listening for
data and updating the local display, along with an object in a process running on the bank server,
issuing queries and sending the data back to the client.

Developing distributed object-based applications can be done in Java using RMI or JavaIDL (an
implementation of CORBA).

The Client/Server Model

The client/server model is a form of distributed computing in which one program (the client)
communicates with another program (the server) for the purpose of exchanging information. In this
model, both the client and server usually speak the same language -- a protocol that both the client and
server understand -- so they are able to communicate.

While the client/server model can be implemented in various ways, it is typically done using low-level
sockets. Using sockets to develop client/server systems means that we must design a protocol, which is a
set of commands agreed upon by the client and server through which they will be able to communicate.
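
As a minimal sketch of what "designing a protocol" means, the class below defines a tiny line-based text protocol: the client sends one command per line and the server answers with one line. The command names (HELLO, ECHO, BYE), the OK/ERR reply prefixes and the class name GreetingProtocol are assumptions made up for this illustration; they are not part of any standard.

```java
import java.util.Locale;

// Hypothetical line-based protocol: one request line in, one reply line out.
class GreetingProtocol {

    // Maps a single request line to a single reply line.
    static String respond(String requestLine) {
        String line = requestLine.trim();
        String[] parts = line.split("\\s+", 2);            // command, then optional argument
        String command = parts[0].toUpperCase(Locale.ROOT);
        String argument = parts.length > 1 ? parts[1] : "";
        switch (command) {
            case "HELLO": return "OK Hello " + argument;
            case "ECHO":  return "OK " + argument;
            case "BYE":   return "OK Goodbye";
            default:      return "ERR unknown command";    // both sides must agree on error replies too
        }
    }

    public static void main(String[] args) {
        System.out.println(respond("HELLO Alice"));
        System.out.println(respond("dance"));
    }
}
```

A server built on sockets would read each line from the connection, pass it to respond(), and write the result back. Every such rule has to be agreed on and implemented on both sides, which is the effort that RMI and CORBA remove.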

The Distributed Objects Model

A distributed object-based system is a collection of objects that isolates the requesters of services (clients)
from the providers of services (servers) by a well-defined encapsulating interface. In other words, clients
are isolated from the implementation of services as data representations and executable code. This is one
of the main differences that distinguishes the distributed object-based model from the pure client/server
model.

In the distributed object-based model, a client sends a message to an object, which in turn interprets the
message to decide what service to perform. This service, or method, selection could be performed by either
the object or a broker. The Java Remote Method Invocation (RMI) and the Common Object Request Broker
Architecture (CORBA) are examples of this model.

RMI

RMI is a distributed object system that enables you to easily develop distributed Java applications.
Developing distributed applications in RMI is simpler than developing with sockets since there is no need
to design a protocol, which is an error-prone task. In RMI, the developer has the illusion of calling a local
method from a local class file, when in fact the arguments are shipped to the remote target and interpreted,
and the results are sent back to the callers.

The Genesis of an RMI Application

Developing a distributed application using RMI involves the following steps:

1. Define a remote interface
2. Implement the remote interface
3. Develop the server
4. Develop a client
5. Generate stubs and skeletons (needed only on old JDKs; since Java 5 stubs are generated
dynamically at run time), then start the RMI registry, the server, and the client
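
The steps above can be sketched in a single file. The names Greeter, GreeterImpl and RmiSketch, the binding name "greeter" and port 1099 are assumptions chosen for illustration. The sketch relies on dynamically generated stubs, so no separate stub-generation step is run, and it puts the server and client parts in one main() only to keep the example short; a real application would run them as separate programs.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Step 1: the remote interface (hypothetical example).
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// Step 2: the implementation of the remote interface.
class GreeterImpl implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class RmiSketch {
    public static void main(String[] args) throws Exception {
        int port = 1099;                       // assumed free on this machine
        GreeterImpl impl = new GreeterImpl();

        // Step 3 (server): start a registry, export the object, bind its stub.
        Registry registry = LocateRegistry.createRegistry(port);
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("greeter", stub);

        // Step 4 (client): look the stub up and call it like a local object.
        Registry clientView = LocateRegistry.getRegistry("localhost", port);
        Greeter remote = (Greeter) clientView.lookup("greeter");
        System.out.println(remote.greet("world"));

        // Unexport so that the JVM can exit once the demo is done.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

Note that the client code never opens a socket or parses a message: the call remote.greet("world") looks local, while the stub ships the argument to the exported object and returns the result.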


CORBA

The Common Object Request Broker Architecture (or CORBA) is an industry standard developed by the
Object Management Group (OMG) to aid in distributed objects programming. It is important to note that
CORBA is simply a specification. A CORBA implementation is known as an ORB (or Object Request Broker).
There are several CORBA implementations available on the market such as VisiBroker, ORBIX, and others.
JavaIDL is another implementation; it shipped as a core package with JDK 1.3 and later, until it was removed in Java SE 11.

CORBA was designed to be platform and language independent. Therefore, CORBA objects can run on any
platform, located anywhere on the network, and can be written in any language that has Interface
Definition Language (IDL) mappings.

Similar to RMI, CORBA objects are specified with interfaces. Interfaces in CORBA, however, are specified in
IDL. While IDL is similar to C++, it is important to note that IDL is not a programming language.

The Genesis of a CORBA Application

There are a number of steps involved in developing CORBA applications. These are:

1. Define an interface in IDL
2. Map the IDL interface to Java (done automatically)
3. Implement the interface
4. Develop the server
5. Develop a client
6. Run the naming service, the server, and the client.

CORBA vs. RMI

Code-wise, it is clear that RMI is simpler to work with since the Java developer does not need to be familiar
with the Interface Definition Language (IDL). In general, however, CORBA differs from RMI in the following
areas:

• CORBA interfaces are defined in IDL and RMI interfaces are defined in Java. RMI-IIOP allows you to
write all interfaces in Java.
• CORBA supports in and out parameters, while RMI does not since local objects are passed by copy
and remote objects are passed by reference.
• CORBA was designed with language independence in mind. This means that some of the objects can
be written in Java, for example, and other objects can be written in C++ and yet they all can
interoperate. Therefore, CORBA is an ideal mechanism for bridging islands between different
programming languages. On the other hand, RMI was designed for a single language where all
objects are written in Java. Note, however, that with RMI-IIOP it is possible to achieve interoperability.
• CORBA objects are not garbage collected. As we mentioned, CORBA is language independent and
some languages (C++ for example) do not support garbage collection. RMI objects are garbage
collected automatically.

2. ADVANCED JAVA PROGRAMMING: GENERICS

Generics were introduced in J2SE 5 to deal with type-safe objects.

Before generics, a collection could store objects of any type, i.e., it was non-generic. Generics force the
Java programmer to store a specific type of object.


• Collections can store Objects of any Type
• Generics restrict the Objects that can be put in a collection
• Generics make it easier to catch at compile time errors that would otherwise appear at run time

Advantage of Java Generics

There are mainly 3 advantages of generics. They are as follows:

1) Type-safety: A generic collection holds only a single type of object. It does not allow other types
of objects to be stored.

2) Type casting is not required: There is no need to typecast the object.

Before Generics, we need to type cast.

List list = new ArrayList();
list.add("hello");
String s = (String) list.get(0); // typecasting

After Generics, we don't need to typecast the object.

List<String> list = new ArrayList<String>();
list.add("hello");
String s = list.get(0);

3) Compile-time checking: Types are checked at compile time, so the problem will not occur at run time. A good
programming strategy is to handle a problem at compile time rather than at run time.

List<String> list = new ArrayList<String>();
list.add("hello");
list.add(32); // Compile Time Error

Syntax to use generic collection

ClassOrInterface<Type>

Example to use Generics in Java

ArrayList<String>

Example of Generics in Java

Here, we are using the ArrayList class, but you can use any collection class such as ArrayList,
LinkedList, HashSet, TreeSet or HashMap. Generic interfaces such as Comparator are used in the same way.

import java.util.*;

class TestGenerics1 {
    public static void main(String args[]) {
        ArrayList<String> list = new ArrayList<String>();
        list.add("rahul");
        list.add("jai");
        // list.add(32); // compile time error

        String s = list.get(1); // type casting is not required
        System.out.println("element is: " + s);

        Iterator<String> itr = list.iterator();
        while (itr.hasNext()) {
            System.out.println(itr.next());
        }
    }
}

Output:
element is: jai
rahul
jai

Generic class

A class that can refer to any type is known as generic class. Here, we are using T type parameter
to create the generic class of specific type.

Let’s see the simple example to create and use the generic class.

Creating generic class:

class MyGen<T> {
    T obj;
    void add(T obj) { this.obj = obj; }
    T get() { return obj; }
}

The T type indicates that it can refer to any type (like String, Integer, Employee etc.). The type you
specify for the class, will be used to store and retrieve the data.

Using generic class:

Let’s see the code to use the generic class.

class TestGenerics3 {
    public static void main(String args[]) {
        MyGen<Integer> m = new MyGen<Integer>();
        m.add(2);
        // m.add("vivek"); // Compile time error
        System.out.println(m.get());
    }
}

Output: 2

Type Parameters

Knowing the naming conventions for type parameters makes generic code easier to read. The
commonly used type parameters are as follows:

1. T - Type
2. E - Element
3. K - Key
4. N - Number
5. V - Value

Generic Method

Like generic class, we can create generic method that can accept any type of argument.

Let’s see a simple example of java generic method to print array elements. We are using here E to
denote the element.

public class TestGenerics4 {

    public static <E> void printArray(E[] elements) {
        for (E element : elements) {
            System.out.println(element);
        }
        System.out.println();
    }

    public static void main(String args[]) {
        Integer[] intArray = { 10, 20, 30, 40, 50 };
        Character[] charArray = { 'J', 'A', 'V', 'A', 'T', 'P', 'O', 'I', 'N', 'T' };

        System.out.println("Printing Integer Array");
        printArray(intArray);

        System.out.println("Printing Character Array");
        printArray(charArray);
    }
}

Output:
Printing Integer Array
10
20
30
40
50

Printing Character Array
J
A
V
A
T
P
O
I
N
T

Java Generics Wildcards

The question mark (?) is the wildcard in generics and represents an unknown type. The wildcard can be used as
the type of a parameter, field, or local variable and sometimes as a return type. We can't use wildcards
while invoking a generic method or instantiating a generic class. In the following sections, we will learn about
upper bounded wildcards, lower bounded wildcards, and wildcard capture.


Java Generics Upper Bounded Wildcard

Upper bounded wildcards are used to relax the restriction on the type of variable in a method. Suppose we
want to write a method that will return the sum of numbers in the list, so our implementation will be
something like this.

public static double sum(List<Number> list){
    double sum = 0;
    for(Number n : list){
        sum += n.doubleValue();
    }
    return sum;
}

The problem with the above implementation is that it won't work with a List of Integers or Doubles, because
List<Integer> and List<Double> are not subtypes of List<Number>. This is where an upper bounded wildcard is
helpful. We use the generics wildcard with the extends keyword and the upper bound class or interface, which
allows us to pass an argument of the upper bound type or any of its subtypes.

The above implementation can be modified like below program.

package com.journaldev.generics;

import java.util.ArrayList;
import java.util.List;

public class GenericsWildcards {

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(3); ints.add(5); ints.add(10);
        double sum = sum(ints);
        System.out.println("Sum of ints="+sum);
    }

    public static double sum(List<? extends Number> list){
        double sum = 0;
        for(Number n : list){
            sum += n.doubleValue();
        }
        return sum;
    }
}

This is similar to writing our code in terms of an interface: in the above method we can use all the methods of
the upper bound class Number. Note that with an upper bounded list, we are not allowed to add any object to the
list except null. If we try to add an element to the list inside the sum method, the program won't compile.

Java Generics Unbounded Wildcard

Sometimes we want our generic method to work with all types; in this
case an unbounded wildcard can be used. It is the same as using <? extends Object>.

public static void printData(List<?> list){
    for(Object obj : list){
        System.out.print(obj + "::");
    }
}

We can provide List<String> or List<Integer> or any other type of Object list argument to the printData
method. Similar to an upper bounded list, we are not allowed to add anything to the list except null.

Java Generics Lower bounded Wildcard

Suppose we want a method that adds Integers to a list. We could keep the argument type as
List<Integer>, but then the method would be tied to lists of Integers, whereas List<Number> and List<Object>
can also hold integers. We can use a lower bounded wildcard to achieve this: the generics wildcard (?) with
the super keyword and the lower bound class.

We can pass the lower bound or any super type of the lower bound as an argument in this case, and the Java
compiler allows us to add objects of the lower bound type to the list.

public static void addIntegers(List<? super Integer> list){
    list.add(Integer.valueOf(50)); // new Integer(50) is deprecated since Java 9
}

Subtyping using Generics Wildcard

List<? extends Integer> intList = new ArrayList<>();
// OK. List<? extends Integer> is a subtype of List<? extends Number>
List<? extends Number> numList = intList;

Java Generics Type Erasure

Generics were added to Java to provide type checking at compile time and have no use at run time, so the Java
compiler uses type erasure to remove all the generic type information from the byte code and insert
type casts where necessary. Type erasure ensures that no new classes are created for parameterized types;
consequently, generics incur no runtime overhead.

For example if we have a generic class like below;

public class Test<T extends Comparable<T>> {

    private T data;
    private Test<T> next;

    public Test(T d, Test<T> n) {
        this.data = d;
        this.next = n;
    }

    public T getData() { return this.data; }
}

The Java compiler replaces the bounded type parameter T with its first bound, the interface Comparable, as
in the code below:

public class Test {

    private Comparable data;
    private Test next;

    public Test(Comparable d, Test n) {
        this.data = d;
        this.next = n;
    }

    public Comparable getData() { return data; }
}
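
One visible consequence of type erasure can be checked directly: two lists with different type arguments share the same runtime class. The following is a small sketch; the class name ErasureDemo and the method name sameRuntimeClass are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    // After erasure both variables simply refer to java.util.ArrayList objects.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                            // prints "true"
        System.out.println(new ArrayList<String>().getClass().getName());  // prints "java.util.ArrayList"
    }
}
```

This is also why an expression such as list instanceof List<String> is rejected by the compiler: the type argument does not exist at run time.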

3. THREADS
Multitasking refers to a computer's ability to perform multiple jobs concurrently: more than one program
is running at the same time, e.g., UNIX.

A thread is a single sequence of execution within a program.

Multithreading refers to multiple threads of control within a single program: each program can run
multiple threads of control within it, e.g., a Web browser.

[Figures: Concurrency vs. Parallelism; Threads and Processes]

What are Threads Good For?

• To maintain responsiveness of an application during a long running task.
• To enable cancellation of separable tasks.
• Some problems are intrinsically parallel.
• To monitor status of some resource (DB).
• Some APIs and systems demand it: Swing.

Application Thread

• When we execute an application:

– The JVM creates a Thread object whose task is defined by the main() method

– It starts the thread

– The thread executes the statements of the program one by one until the method returns
and the thread dies


Multiple Threads in an Application

• Each thread has its private run-time stack

• If two threads execute the same method, each will have its own copy of the local variables the
method uses

• However, all threads see the same dynamic memory (heap)

• Two different threads can act on the same object and same static fields concurrently

Creating Threads

• There are two ways to create our own Thread object

1. Subclassing the Thread class and instantiating a new object of that class

2. Implementing the Runnable interface

• In both cases the run() method should be implemented

Extending Thread

public class ThreadExample extends Thread {
    public void run() {
        for (int i = 1; i <= 100; i++) {
            System.out.println("Thread: " + i);
        }
    }
}

Implementing Runnable

public class RunnableExample implements Runnable {
    public void run() {
        for (int i = 1; i <= 100; i++) {
            System.out.println("Runnable: " + i);
        }
    }
}

• The Thread object's run() method calls the Runnable object's run() method
• Allows threads to run inside any object, regardless of inheritance

Thread Methods
void start()
– Creates a new thread and makes it runnable
– This method can be called only once
void run()
– The new thread begins its life inside this method
void stop() (deprecated)
– Terminates the thread
static void yield()
– Hints that the currently executing thread is willing to pause temporarily and allow
other threads to execute; the scheduler is free to ignore the hint
– Typically gives only threads of the same priority a chance to run
static void sleep(long m) / sleep(long m, int n)
– The thread sleeps for m milliseconds, plus n nanoseconds
Starting the Threads

public class ThreadsStartExample
{
    public static void main(String argv[])
    {
        new ThreadExample().start();
        new Thread(new RunnableExample()).start();
    }
}

Scheduling Threads

Example:

public class PrintThread1 extends Thread
{
    String name;

    public PrintThread1(String name)
    {
        this.name = name;
    }

    public void run()
    {
        for (int i = 1; i < 500; i++)
        {
            try {
                // sleep for a random time of up to 100 ms so that the three threads interleave
                sleep((long)(Math.random() * 100));
            } catch (InterruptedException ie) { }
            System.out.print(name);
        }
    }

    public static void main(String args[])
    {
        PrintThread1 a = new PrintThread1("*");
        PrintThread1 b = new PrintThread1("-");
        PrintThread1 c = new PrintThread1("=");
        a.start();
        b.start();
        c.start();
    }
}
• Thread scheduling is the mechanism used to determine how runnable threads are allocated CPU
time
• A thread-scheduling mechanism is either preemptive or nonpreemptive

Preemptive Scheduling
• Preemptive scheduling – the thread scheduler preempts (pauses) a running thread to allow
different threads to execute
• Nonpreemptive scheduling – the scheduler never interrupts a running thread
• The nonpreemptive scheduler relies on the running thread to yield control of the CPU so that
other threads may execute

Starvation
• A nonpreemptive scheduler may cause starvation (runnable threads, ready to be executed, wait
to be executed in the CPU a lot of time, maybe even forever)
• Starvation is sometimes loosely called livelock, although strictly a livelock is a case where threads
stay active but make no progress

Time-Sliced Scheduling
• Time-sliced scheduling – the scheduler allocates a period of time that each thread can use the
CPU
• when that amount of time has elapsed, the scheduler preempts the thread and switches to a
different thread
• Nontime-sliced scheduler – the scheduler does not use elapsed time to determine when to
preempt a thread
• it uses other criteria such as priority or I/O status

Java Scheduling
• Scheduler is preemptive and based on priority of threads
• Uses fixed-priority scheduling:
– Threads are scheduled according to their priority w.r.t. other threads in the ready queue
• The highest priority runnable thread is always selected for execution above lower priority threads
• When multiple threads have equally high priorities, only one of those threads is guaranteed to be
executing
• Java threads are guaranteed to be preemptive, but not time-sliced


Thread Priority
• Every thread has a priority
• When a thread is created, it inherits the priority of the thread that created it
• The priority values range from 1 to 10, in increasing priority
• The priority can be adjusted subsequently using the setPriority() method
• The priority of a thread may be obtained using getPriority()
• Priority constants are defined:
• MIN_PRIORITY=1
• MAX_PRIORITY=10
• NORM_PRIORITY=5
• Thread implementation in Java is actually based on operating system support
• Some Windows operating systems support only 7 priority levels, so different levels in Java may actually be
mapped to the same operating system level
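
The priority methods listed above can be tried out with a short sketch. The class name PriorityDemo and the helper newThread are hypothetical; only getPriority(), setPriority() and the three constants come from java.lang.Thread.

```java
class PriorityDemo {

    // Creates (but does not start) a thread and gives it the requested priority.
    static Thread newThread(Runnable task, int priority) {
        Thread t = new Thread(task);   // starts out with the creating thread's priority
        t.setPriority(priority);       // must lie between MIN_PRIORITY and MAX_PRIORITY
        return t;
    }

    public static void main(String[] args) {
        Thread inherited = new Thread(() -> { });
        System.out.println("creator:   " + Thread.currentThread().getPriority());
        System.out.println("inherited: " + inherited.getPriority());

        Thread low = newThread(() -> { }, Thread.MIN_PRIORITY);
        Thread high = newThread(() -> { }, Thread.MAX_PRIORITY);
        System.out.println("low:  " + low.getPriority());
        System.out.println("high: " + high.getPriority());
    }
}
```

Because the mapping to operating system priorities differs between platforms, a program should treat priorities as a hint and never rely on them for correctness.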

Daemon Threads
• Daemon threads are “background” threads, that provide services to other threads, e.g., the garbage
collection thread
• The Java VM will not exit if non-Daemon threads are executing
• The Java VM will exit if only Daemon threads are executing
• Daemon threads die when the Java VM exits
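
A minimal sketch of a daemon thread follows. The class name DaemonDemo and the method startBackgroundTicker are assumptions for illustration; setDaemon() and isDaemon() are the standard Thread methods, and setDaemon() has to be called before start().

```java
class DaemonDemo {

    // Starts a background thread that loops until it is interrupted.
    static Thread startBackgroundTicker() {
        Thread ticker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;            // stop when asked to
                }
            }
        });
        ticker.setDaemon(true);        // must be set before start()
        ticker.start();
        return ticker;
    }

    public static void main(String[] args) {
        Thread ticker = startBackgroundTicker();
        System.out.println("daemon? " + ticker.isDaemon());
        // main() returns here. Since only a daemon thread is left, the JVM exits
        // even though the ticker loop never finishes on its own.
    }
}
```

Removing the setDaemon(true) call turns the ticker into a normal thread, and the program then keeps running after main() returns.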

Multithreading Client-Server
• Refer to section 5, Simple Client Server Programming Using Java, Example-2

Concurrency
• An object in a program can be changed by more than one thread

Race Condition
• A race condition – the outcome of a program is affected by the order in which the program's
threads are allocated CPU time
• Two threads are simultaneously modifying a single object
• Both threads “race” to store their value

Race Condition Example
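
The following sketch shows a race on a shared counter. The class names Counter and RaceDemo are hypothetical. Several threads increment two fields of one object: one without protection and one through a synchronized method (synchronized methods are described in the Monitors section that follows). The unprotected total may come out lower than expected because count++ is a read, an add and a write, and two threads can interleave between those steps.

```java
class Counter {
    private int unsafeCount = 0;
    private int safeCount = 0;

    // Not atomic: two threads can read the same old value and both write old + 1.
    void unsafeIncrement() { unsafeCount++; }

    // Only one thread at a time can run a synchronized method on the same object.
    synchronized void safeIncrement() { safeCount++; }

    int unsafeCount() { return unsafeCount; }
    synchronized int safeCount() { return safeCount; }
}

class RaceDemo {

    // Runs `threads` threads that each increment both counters `perThread` times.
    static Counter run(int threads, int perThread) {
        Counter counter = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.unsafeIncrement();
                    counter.safeIncrement();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();              // wait until every worker has finished
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        Counter c = run(4, 100_000);
        System.out.println("expected:       " + 4 * 100_000);
        System.out.println("synchronized:   " + c.safeCount());
        System.out.println("unsynchronized: " + c.unsafeCount()); // may be smaller, varies per run
    }
}
```

The synchronized total always equals the expected value, while the unsynchronized one depends on how the threads happened to be scheduled, which is exactly what makes race conditions hard to reproduce.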

Monitors
• Each object has a “monitor” that is a token used to determine which application thread has control
of a particular object instance

• In execution of a synchronized method (or block), access to the object monitor must be gained
before the execution
• Access to the object monitor is queued
• Entering a monitor is also referred to as locking the monitor, or acquiring ownership of the
monitor
• If a thread A tries to acquire ownership of a monitor and a different thread has already entered the
monitor, the current thread (A) must wait until the other thread leaves the monitor

Critical Section
• The synchronized methods define critical sections
• Execution of critical sections is mutually exclusive.

Thread Synchronization
• We need to synchronize between threads, for example in the producer-consumer scenario
• wait() and notify() allow two threads to cooperate
• Based on a single shared lock object
– Marge puts a cookie, notifies Homer, and waits
– Homer eats a cookie, notifies Marge, and waits
The wait() Method
• The wait() method is part of the java.lang.Object class
• It requires a lock on the object's monitor to execute
• It must be called from a synchronized method, or from a synchronized segment of code.
• wait() causes the current thread to wait until another thread invokes the notify() method or the
notifyAll() method for this object
• Upon call for wait(), the thread releases ownership of this monitor and waits until another thread
notifies the waiting threads of the object
• wait() is also similar to yield()
• Both take the current thread off the execution stack and force it to be rescheduled
• However, a thread that called wait() is not automatically put back into the scheduler queue
• notify() or notifyAll() must be called in order to get the thread back into the scheduler's queue

Consumer Producer

Consumer:
synchronized (lock) {
    while (!resourceAvailable()) {
        lock.wait();
    }
    consumeResource();
}

Producer:
produceResource();
synchronized (lock) {
    lock.notifyAll();
}
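
The fragments above can be turned into a complete program. The sketch below uses the cookie example from this section; the class names CookieJar and ProducerConsumerDemo and the thread names are made up for illustration, and the jar is unbounded, so only the consumer ever has to wait.

```java
class CookieJar {
    private int cookies = 0;

    // Producer side: add a cookie and wake up any thread waiting in take().
    synchronized void put() {
        cookies++;
        notifyAll();
    }

    // Consumer side: wait (releasing the monitor) until a cookie is available.
    synchronized void take() throws InterruptedException {
        while (cookies == 0) {         // re-check the condition after every wake-up
            wait();
        }
        cookies--;
    }
}

class ProducerConsumerDemo {

    // Lets one producer put n cookies and one consumer eat n cookies.
    static int run(int n) {
        CookieJar jar = new CookieJar();
        int[] eaten = {0};             // written only by the consumer, read after join()
        Thread homer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    jar.take();
                    eaten[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread marge = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                jar.put();
            }
        });
        homer.start();
        marge.start();
        try {
            marge.join();
            homer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return eaten[0];
    }

    public static void main(String[] args) {
        System.out.println("cookies eaten: " + run(1000));
    }
}
```

The while loop around wait() matters: a woken thread has to reacquire the monitor first, and by then another consumer may already have taken the cookie.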


Wait/Notify Sequence

4. SOCKETS
What Is a Socket?

Normally, a server runs on a specific computer and has a socket that is bound to a specific port number.
The server just waits, listening to the socket for a client to make a connection request.

On the client-side: The client knows the hostname of the machine on which the server is running and the
port number on which the server is listening. To make a connection request, the client tries to rendezvous
with the server on the server's machine and port. The client also needs to identify itself to the server so it
binds to a local port number that it will use during this connection. This is usually assigned by the system.

If everything goes well, the server accepts the connection. Upon acceptance, the server gets a new socket
bound to the same local port and also has its remote endpoint set to the address and port of the client. It
needs a new socket so that it can continue to listen to the original socket for connection requests while
tending to the needs of the connected client.

On the client side, if the connection is accepted, a socket is successfully created and the client can use the
socket to communicate with the server.

The client and server can now communicate by writing to or reading from their sockets.

Definition:
A socket is one endpoint of a two-way communication link between two programs running on the network.
A socket is bound to a port number so that the TCP layer can identify the application that data is destined
to be sent to.


An endpoint is a combination of an IP address and a port number. Every TCP connection can be uniquely
identified by its two endpoints. That way you can have multiple connections between your host and the
server.
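
The endpoints of a connection can be inspected from Java. The sketch below (class name EndpointDemo and method connectOnce are hypothetical) opens a connection over the loopback interface inside a single process and returns the port numbers involved. It asks the operating system for a free port by passing 0 to ServerSocket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class EndpointDemo {

    // Returns { listening port, accepted socket's local port,
    //           accepted socket's remote port, client's local port }.
    static int[] connectOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);   // 0 = any free port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] {
                server.getLocalPort(),
                accepted.getLocalPort(),
                accepted.getPort(),
                client.getLocalPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = connectOnce();
        System.out.println("server listens on port:        " + ports[0]);
        System.out.println("accepted socket, local port:   " + ports[1]);
        System.out.println("accepted socket, remote port:  " + ports[2]);
        System.out.println("client socket, local port:     " + ports[3]);
    }
}
```

The first two numbers are equal, which matches the description above: the socket returned by accept() is bound to the same local port as the listening socket. The last two are equal as well, because the remote endpoint seen by the server is the local endpoint of the client.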

Socket class

The Socket class can be used to create a socket.

Important methods

Method Description

1) public InputStream getInputStream() returns the InputStream attached with this socket.

2) public OutputStream getOutputStream() returns the OutputStream attached with this socket.

3) public synchronized void close() closes this socket

ServerSocket class

The ServerSocket class can be used to create a server socket. This object is used to establish
communication with the clients.

Important methods

Method Description

1) public Socket accept() returns the socket and establishes a connection between server and client.

2) public synchronized void close() closes the server socket.

Example of Java Socket Programming

• Refer to the next section, 5. Simple Client Server Programming Using Java

5. SIMPLE CLIENT SERVER PROGRAMMING USING JAVA


Let's see a simple example of Java socket programming in which the client sends a text and the server receives it.

File: MyServer.java

import java.io.*;
import java.net.*;

public class MyServer {
    public static void main(String[] args) {
        try {
            ServerSocket ss = new ServerSocket(6666);
            Socket s = ss.accept(); // establishes connection
            DataInputStream dis = new DataInputStream(s.getInputStream());
            String str = dis.readUTF();
            System.out.println("message= " + str);
            s.close();
            ss.close();
        } catch (Exception e) { System.out.println(e); }
    }
}

File: MyClient.java

import java.io.*;
import java.net.*;

public class MyClient {
    public static void main(String[] args) {
        try {
            Socket s = new Socket("localhost", 6666);
            DataOutputStream dout = new DataOutputStream(s.getOutputStream());
            dout.writeUTF("Hello Server");
            dout.flush();
            dout.close();
            s.close();
        } catch (Exception e) { System.out.println(e); }
    }
}

To execute this program, open two command prompts and run the server in one of them first, then the
client in the other.

After running the client application, a message will be displayed on the server console.

Example-2: Client Server Multithread programming

Server

import java.net.*;
import java.io.*;

class HelloServer {
    public static void main(String[] args)
    {
        int port = Integer.parseInt(args[0]);
        ServerSocket server; // declared outside the try block so that the accept loop can use it
        try
        {
            server = new ServerSocket(port);
        }
        catch (IOException ioe)
        {
            System.err.println("Couldn't run server on port " + port);
            return;
        }
        while (true)
        {
            try {
                Socket connection = server.accept();
                ConnectionHandler handler = new ConnectionHandler(connection);
                new Thread(handler).start(); // one thread per client
            } catch (IOException ioe1) { }
        }
    }
}

Connection Handler

// Handles a connection of a client to a HelloServer.
// Talks with the client in the 'hello' protocol
class ConnectionHandler implements Runnable
{
    // The connection with the client
    private Socket connection;

    public ConnectionHandler(Socket connection)
    {
        this.connection = connection;
    }

    public void run()
    {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                connection.getInputStream()));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(
                connection.getOutputStream()));

            String clientName = reader.readLine();
            writer.println("Hello " + clientName);
            writer.flush();
            connection.close(); // done with this client
        } catch (IOException ioe) {}
    }
}

Client side

import java.net.*;
import java.io.*;

// A client of a HelloServer
class HelloClient {
    public static void main(String[] args)
    {
        String hostname = args[0];
        int port = Integer.parseInt(args[1]);
        Socket connection = null;
        try
        {
            connection = new Socket(hostname, port);
        }
        catch (IOException ioe)
        {
            System.err.println("Connection failed");
            return;
        }
        try
        {
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                connection.getInputStream()));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(
                connection.getOutputStream()));
            writer.println(args[2]); // client name
            writer.flush(); // send the name before waiting for the reply
            String reply = reader.readLine();
            System.out.println("Server reply: " + reply);
            connection.close();
        }
        catch (IOException ioe1) { }
    }
}

6. Difficulties In Developing Distributed Programs For Large Scale Clusters

Designing and implementing a distributed program for large-scale clusters involves more than just
sending and receiving messages and deciding upon the computational and architectural models. While all
these are extremely important, they do not tell the whole story of developing programs for
large-scale clusters.
Some of the difficulties in developing distributed programs for large-scale clusters are:

1. Heterogeneity
2. Scalability
3. Communication
4. Synchronization
5. Fault-tolerance
6. Security and privacy
7. Scheduling
8. Openness and extensibility
9. Transparency

1. Heterogeneity

• Distributed programs must be designed in a way that masks the heterogeneity of the underlying
hardware, networks, operating systems and programming languages.

• Another serious problem that requires a great deal of attention from distributed programmers is
performance variation.

• Performance variation means that running the same distributed program on the same cluster twice
can result in widely different execution times.
• Clearly, this can create tricky load imbalance and subsequently degrade overall performance.
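
One concrete kind of hardware heterogeneity is byte order: machines may store multi-byte numbers differently. A common way to mask it is to agree on a fixed wire format. The following is a minimal sketch, assuming a made-up message layout (a 4-byte id followed by an 8-byte value) and a hypothetical class name WireFormat; it uses java.nio.ByteBuffer with an explicitly chosen byte order so both ends interpret the bytes the same way.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical wire format: [int id][long value], always big-endian.
class WireFormat {
    static final int SIZE = Integer.BYTES + Long.BYTES;

    // Encode with an explicit byte order instead of relying on the host's.
    static byte[] encode(int id, long value) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(id).putLong(value);
        return buf.array();
    }

    // Decode using the same agreed order; returns {id, value}.
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        return new long[] { buf.getInt(), buf.getLong() };
    }

    public static void main(String[] args) {
        byte[] wire = encode(7, 123456789L);
        long[] back = decode(wire);
        System.out.println(back[0] + " " + back[1]); // prints "7 123456789"
    }
}
```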

2. Scalability
• A distributed program is said to be scalable if it remains effective when the number of users, the
volume of data and the amount of resources are increased significantly.
• Large-scale clusters can comprise tens or hundreds of thousands of machines, and the program must
maintain its performance and keep the load balanced at that scale.

3. Communication

• Distributed systems are composed of networked computers that can communicate by explicitly
passing messages or implicitly accessing shared memories.
• Even with distributed shared memory systems, messages are internally passed between machines,
yet in a manner that is totally transparent to users.
• Distributed systems such as Big Data platforms rely heavily on the underlying network to deliver
messages rapidly enough to destination entities for three main reasons: performance, cost and
quality of service (QoS).
• Specifically, faster delivery of messages entails minimized execution times, reduced costs and
higher QoS, especially for audio and video applications.
• Communication is at the heart of large-scale clusters and is one of their major bottlenecks.

4. Synchronization
• Distributed tasks should be allowed to operate simultaneously on shared data without corrupting
the data or causing any inconsistency.
• Race conditions must be avoided. In a race condition, two tasks might try to modify data on a
shared edge at the same time, resulting in a corrupted value.
• Semaphores, locks and barriers are widely used for this purpose.
• Synchronizing access to the data also requires avoiding deadlock and enforcing mutual
exclusion.
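
The idea of mutual exclusion can be sketched on a single machine with threads, as a small-scale stand-in for distributed tasks. In the example below, several threads update one shared counter; a java.util.concurrent.locks.ReentrantLock ensures that only one thread modifies the value at a time, and join() acts as a simple barrier before the result is read. The class name SharedCounter and the method runWorkers are hypothetical names chosen for this sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared counter protected by a lock (mutual exclusion).
class SharedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value = 0;

    void increment() {
        lock.lock();
        try {
            value++; // read-modify-write happens while holding the lock
        } finally {
            lock.unlock(); // always release, even if an exception occurs
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Starts the workers, waits for all of them, then returns the total.
    static long runWorkers(int workers, int incrementsEach) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsEach; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // barrier: wait until every worker has finished
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4, 100_000)); // prints 400000
    }
}
```

Without the lock, value++ is not atomic, so concurrent updates can be lost and the total may come out lower than expected.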
5. Fault-tolerance
• The ability to tolerate faults in a software system is required in applications such as nuclear
plants, space missions and medical equipment.
• Fault injection techniques are used to evaluate fault tolerance by deliberately injecting faults
into the system under test.
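
As a minimal sketch of fault injection, the example below wraps a task so that its first few calls fail on purpose, and then checks that a simple retry loop tolerates those faults. The names FaultInjectionDemo, failFirst and retry are hypothetical, and retrying is only one of many possible recovery strategies; the notes do not prescribe a particular one.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical fault-injection helper plus a retry loop that tolerates the faults.
class FaultInjectionDemo {

    // Wraps a task so that its first `failures` calls throw an injected fault.
    static <T> Callable<T> failFirst(int failures, Callable<T> task) {
        int[] remaining = { failures };
        return () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                throw new IOException("injected fault");
            }
            return task.call();
        };
    }

    // Calls the task up to maxAttempts times; rethrows the last failure.
    static <T> T retry(int maxAttempts, Callable<T> task) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the fault and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        Callable<String> flaky = failFirst(2, () -> "ok");
        System.out.println(retry(3, flaky)); // two injected faults, then prints "ok"
    }
}
```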

6. Security and privacy

• How to apply security policies to interdependent systems is a major issue in distributed
systems. Since distributed systems deal with sensitive data and information, they must
have strong security and privacy measures.
• Protection of distributed system assets is an important issue. These assets include base
resources, storage, communications and user-interface I/O, as well as higher-level composites of
these resources such as processes, files, messages, display windows and more complex objects.

7. Scheduling
• Scheduling problems arise in both homogeneous and heterogeneous parallel distributed systems.
• The performance of a distributed system is affected by broadcast/multicast processing, so a
delivery procedure is required that completes this processing in minimum time.
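
To illustrate scheduling on heterogeneous machines, here is a minimal sketch of a greedy heuristic: each task, taken in order, goes to the machine that would finish it earliest given that machine's speed and current backlog. This is one simple heuristic chosen for illustration, not an algorithm prescribed by these notes, and it does not guarantee an optimal schedule. The class name GreedyScheduler and the units (work units, work units per second) are assumptions.

```java
import java.util.Arrays;

// Hypothetical greedy scheduler for machines of different speeds.
class GreedyScheduler {

    // tasks[t]  : amount of work in task t (work units)
    // speeds[m] : speed of machine m (work units per second)
    // returns   : assignment[t] = index of the machine chosen for task t
    static int[] assign(double[] tasks, double[] speeds) {
        double[] busyUntil = new double[speeds.length]; // when each machine becomes free
        int[] assignment = new int[tasks.length];
        for (int t = 0; t < tasks.length; t++) {
            int best = 0;
            double bestFinish = Double.POSITIVE_INFINITY;
            for (int m = 0; m < speeds.length; m++) {
                double finish = busyUntil[m] + tasks[t] / speeds[m];
                if (finish < bestFinish) {
                    bestFinish = finish;
                    best = m;
                }
            }
            busyUntil[best] = bestFinish;
            assignment[t] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[] tasks = { 6, 2, 4 };
        double[] speeds = { 2, 1 }; // machine 0 is twice as fast as machine 1
        System.out.println(Arrays.toString(assign(tasks, speeds))); // prints [0, 1, 0]
    }
}
```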

8. Openness and Extensibility

• Interfaces should be separated and publicly available so that existing components can be
extended easily and new components can be added.
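
A small sketch of this idea in Java: the system depends only on a published interface, so a new component can be plugged in without changing existing code. The names CodecRegistry and MessageCodec and the two sample codecs are hypothetical and exist only to illustrate the principle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry that knows components only through a public interface.
class CodecRegistry {

    // The published interface: anything that can transform a message.
    interface MessageCodec {
        String encode(String message);
    }

    private final Map<String, MessageCodec> codecs = new LinkedHashMap<>();

    // New components are added by registration, not by editing this class.
    void register(String name, MessageCodec codec) {
        codecs.put(name, codec);
    }

    String encode(String name, String message) {
        MessageCodec codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("Unknown codec: " + name);
        }
        return codec.encode(message);
    }

    public static void main(String[] args) {
        CodecRegistry registry = new CodecRegistry();
        registry.register("upper", s -> s.toUpperCase());
        // A component added later, without touching existing code:
        registry.register("reverse", s -> new StringBuilder(s).reverse().toString());
        System.out.println(registry.encode("upper", "hello"));   // prints HELLO
        System.out.println(registry.encode("reverse", "hello")); // prints olleh
    }
}
```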
9. Transparency
• Transparency concerns the extent to which the distributed system should appear to the user
as a single system. A distributed system must be designed to hide as much of its complexity as
possible.

7. Introduction to cloud computing


Cloud computing is a form of Internet-based computing that provides shared computer processing
resources and data to computers and other devices on demand.

Characteristics

Cloud computing has a variety of characteristics, with the main ones being:

Shared Infrastructure
- Uses a virtualized software model, enabling the sharing of physical services, storage, and
networking capabilities.
- The cloud infrastructure, regardless of deployment model, seeks to make the most of the
available infrastructure across a number of users.

Dynamic Provisioning
- Allows for the provision of services based on current demand requirements. This is done
automatically using software automation, enabling the expansion and contraction of service
capability, as needed. This dynamic scaling needs to be done while maintaining high levels
of reliability and security
Network Access
- Needs to be accessed across the internet from a broad range of devices such as PCs, laptops,
and mobile devices, using standards-based APIs (for example, ones based on HTTP).
- Deployments of services in the cloud include everything from using business applications
to the latest application on the newest smartphones.
Managed Metering
- Uses metering for managing and optimizing the service and to provide reporting and billing
information.
- In this way, consumers are billed for services according to how much they have actually
used during the billing period.
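
Dynamic provisioning and managed metering can be sketched with two small calculations. The rule below sizes the number of instances from the current load and clamps it between a minimum and a maximum; the bill is simply usage multiplied by a rate. The class name CloudSketch, the parameter names and all the numbers are illustrative assumptions, not the policy or pricing of any real provider.

```java
// Hypothetical sizing rule and pay-per-use bill; all figures are made up.
class CloudSketch {

    // Instances needed so that each serves at most capacityPerInstance
    // requests per second, kept within [min, max].
    static int instancesFor(double load, double capacityPerInstance, int min, int max) {
        int needed = (int) Math.ceil(load / capacityPerInstance);
        return Math.max(min, Math.min(max, needed));
    }

    // Metered charge in cents: consumers pay only for the hours actually used.
    static long billCents(long instanceHours, long centsPerHour) {
        return instanceHours * centsPerHour;
    }

    public static void main(String[] args) {
        System.out.println(instancesFor(950, 100, 1, 20));  // prints 10 (scale out)
        System.out.println(instancesFor(30, 100, 2, 20));   // prints 2  (floor at min)
        System.out.println(instancesFor(5000, 100, 1, 20)); // prints 20 (capped at max)
        System.out.println(billCents(720, 5));              // prints 3600
    }
}
```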

Service Models
Once a cloud is established, how its cloud computing services are deployed in terms of business models
can differ depending on requirements. The primary service models being deployed (see Figure 1) are
commonly known as:
• Software as a Service (SaaS)
- Consumers purchase the ability to access and use an application or service that is hosted in the
cloud.
- A benchmark example of this is Salesforce.com, as discussed previously, where necessary
information for the interaction between the consumer and the service is hosted as part of the
service in the cloud.
- Also, Microsoft is expanding its involvement in this area, and as part of the cloud computing
option for Microsoft Office 2010, its Office Web Apps are available to Office volume licensing
customers and Office Web App subscriptions through its cloud-based Online Services.
• Platform as a Service (PaaS)
- Consumers purchase access to the platforms, enabling them to deploy their own software and
applications in the cloud. The operating systems and network access are not managed by the
consumer, and there might be constraints as to which applications can be deployed.
• Infrastructure as a Service (IaaS)
- Consumers control and manage the systems in terms of the operating systems, applications,
storage, and network connectivity, but do not themselves control the cloud infrastructure.

There are also various subsets of these models that may be related to a particular industry or market.
Communications as a Service (CaaS) is one such subset model, used to describe hosted IP telephony
services. Along with the move to CaaS is a shift to more IP-centric communications and more SIP trunking
deployments. With IP and SIP in place, it can be as easy to have the PBX in the cloud as it is to have it on
the premises. In this context, CaaS could be seen as a subset of SaaS.

Deployment Models

Deploying cloud computing can differ depending on requirements, and the following four deployment
models have been identified, each with specific characteristics that support the needs of the services
and users of the clouds in particular ways (see Figure 2).

• Private Cloud
- The cloud infrastructure has been deployed, and is maintained and operated for a specific
organization. The operation may be in-house or with a third party on the premises.
• Community Cloud
- The cloud infrastructure is shared among a number of organizations with similar interests and
requirements.
- This may help limit the capital expenditure costs for its establishment as the costs are shared
among the organizations. The operation may be in-house or with a third party on the premises.
• Public Cloud
- The cloud infrastructure is available to the public on a commercial basis by a cloud service
provider. This enables a consumer to develop and deploy a service in the cloud with very little
financial outlay compared to the capital expenditure requirements normally associated with
other deployment options.

• Hybrid Cloud
- The cloud infrastructure consists of a number of clouds of any type, but the clouds have the
ability through their interfaces to allow data and/or applications to be moved from one cloud
to another. This can be a combination of private and public clouds that support the
requirement to retain some data in an organization, and also the need to offer services in the
cloud.

Benefits

The following are some of the possible benefits for those who offer cloud computing-based
services and applications:

• Cost Savings
- Companies can reduce their capital expenditures and use operational expenditures for
increasing their computing capabilities. This is a lower barrier to entry and also requires fewer
in-house IT resources to provide system support.
• Scalability/Flexibility
- Companies can start with a small deployment and grow to a large deployment fairly rapidly, and
then scale back if necessary. Also, the flexibility of cloud computing allows companies to use extra
resources at peak times, enabling them to satisfy consumer demands.
• Reliability
- Services using multiple redundant sites can support business continuity and disaster recovery.
• Maintenance
- Cloud service providers do the system maintenance, and access is through APIs that do not
require application installations onto PCs, thus further reducing maintenance requirements.
• Mobile Accessible
- Mobile workers have increased productivity due to systems accessible in an infrastructure
available from anywhere.

Challenges

The following are some of the notable challenges associated with cloud computing, and although some of
these may cause a slowdown when delivering more services in the cloud, most also can provide
opportunities, if resolved with due care and attention in the planning stages.

• Security and Privacy
- Perhaps two of the more “hot button” issues surrounding cloud computing relate to storing and
securing data, and monitoring the use of the cloud by the service providers. These issues are
generally blamed for slowing the deployment of cloud services. These challenges can be
addressed, for example, by storing the information internal to the organization, but allowing it to
be used in the cloud. For this to occur, though, the security mechanisms between the organization
and the cloud need to be robust, and a hybrid cloud could support such a deployment.
• Lack of Standards
- Clouds have documented interfaces; however, no standards are associated with these, and thus
it is unlikely that most clouds will be interoperable. The Open Grid Forum is developing an Open
Cloud Computing Interface to resolve this issue and the Open Cloud Consortium is working on
cloud computing standards and practices. The findings of these groups will need to mature, but it is
not known whether they will address the needs of the people deploying the services and the
specific interfaces these services need. However, keeping up to date on the latest standards as they
evolve will allow them to be leveraged, if applicable.

• Continuously Evolving
- User requirements are continuously evolving, as are the requirements for interfaces,
networking, and storage. This means that a “cloud,” especially a public one, does not remain static
and is also continuously evolving.
