
Exp no: WORD COUNT MAP REDUCE PROGRAM TO UNDERSTAND MAP REDUCE PARADIGM


Date:

AIM:

To run a basic Word Count MapReduce program and understand the MapReduce paradigm.

PREREQUISITES:

• Java Installation:
• Ensure Java Development Kit (JDK) is installed on all nodes of your Hadoop cluster.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
• Hadoop Installation:
• Install Apache Hadoop on your cluster. Ensure Hadoop is properly configured and all
nodes are accessible.
• Hadoop HDFS should be up and running, and you should have basic knowledge of
configuring Hadoop properties (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.).
• Development Environment:
• Set up a development environment with Hadoop installed locally if you're testing on a
single-node setup (pseudo-distributed mode).

SOURCE CODE :

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class wordCount {

  // Mapper: for every input line, emit (word, 1) for each whitespace-separated token.
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the counts emitted for each word and write (word, total).
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}
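The driver above uses the older new Job(conf, ...) constructor, which still works but is deprecated in Hadoop 2.x. Below is a minimal alternative driver, offered only as a sketch: it uses the non-deprecated Job.getInstance API, sets the job jar via setJarByClass, and registers the Reduce class as an optional combiner to cut shuffle traffic. The class name WordCountDriver is illustrative, and it assumes the wordCount class above is compiled in the same package.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "wordcount");
    job.setJarByClass(WordCountDriver.class);      // ship the jar containing this class to the cluster
    job.setMapperClass(wordCount.Map.class);       // Mapper defined in the listing above
    job.setCombinerClass(wordCount.Reduce.class);  // optional: pre-aggregate counts on the map side
    job.setReducerClass(wordCount.Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}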
OUTPUT:

RESULT:
Exp no: IMPLEMENTING MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
Date:

AIM:

To implement Matrix Multiplication with Hadoop MapReduce.

MAPPING:

CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.

PREREQUISITES :

Java Installation:
• Ensure Java Development Kit (JDK) is installed on all nodes of your Hadoop cluster.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
Hadoop Installation:
• Install Apache Hadoop on your cluster. Ensure Hadoop is properly configured and all
nodes are accessible.
• Hadoop HDFS should be up and running, and you should have basic knowledge of
configuring Hadoop properties (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.).
Development Environment:
• Set up a development environment with Hadoop installed locally if you're testing on a
single-node setup (pseudo-distributed mode).

ALGORITHM FOR MAP FUNCTION:

a. For each element mij of M, produce the (key, value) pairs ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element njk of N, produce the (key, value) pairs ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs, so that each key (i,k) has a list with values (M, j, mij) and (N, j, njk) for all possible values of j.

ALGORITHM FOR REDUCE FUNCTION:

d. For each key (i,k):
e. Sort the values that begin with M by j into listM, and sort the values that begin with N by j into listN; multiply mij and njk for the jth value of each list.
f. Sum up the products and return ((i,k), Σj mij × njk).
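A small worked example may make the algorithm concrete (the numbers are assumed purely for illustration). Let M = [[1, 2], [3, 4]] and N = [[5, 6], [7, 8]]. For the output cell (1,1), the map phase emits the values (M, 1, 1) and (M, 2, 2) from the first row of M, and (N, 1, 5) and (N, 2, 7) from the first column of N, all under the key (1,1). The reduce phase pairs these values by j and sums the products: 1×5 + 2×7 = 19, so it emits ((1,1), 19), which is indeed the (1,1) entry of M×N.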
STEP 1: Download the Hadoop jar files with these links.
Download Hadoop Common jar files: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
Download Hadoop MapReduce jar file: https://goo.gl/KT8yfB
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

CODING:

// The package must match the fully-qualified class name (edu.uta.cse6331.Multiply) used on the hadoop jar command line below.
package edu.uta.cse6331;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
class Element implements Writable {
int tag;
int index;
double value;
Element() {
tag = 0;
index = 0;
value = 0.0;
}
Element(int tag, int index, double value) {
this.tag = tag;
this.index = index;
this.value = value;
}
@Override
public void readFields(DataInput input) throws IOException {
tag = input.readInt();
index = input.readInt();
value = input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair> {
int i;
int j;
Pair() {
i = 0;
j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException {
i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare) {
if (i > compare.i) {
return 1;
} else if ( i < compare.i) {
return -1;
} else {
if(j > compare.j) {
return 1;
} else if (j < compare.j) {
return -1;
}
}
return 0;
}
public String toString() {
// i and j are separated by a space so that the second job can re-parse the intermediate output
return i + " " + j;
}
@Override
public int hashCode() {
// equal (i,j) keys must hash to the same reducer partition
return 31 * i + j;
}
}
public class Multiply
{
public static class MatriceMapperM extends Mapper<Object,Text,IntWritable,Element> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
String[] stringTokens = readLine.split(",");
int index = Integer.parseInt(stringTokens[0]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(0, index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[1]));
context.write(keyValue, e);
}
}
public static class MatriceMapperN extends Mapper<Object,Text,IntWritable,Element> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
String[] stringTokens = readLine.split(",");
int index = Integer.parseInt(stringTokens[1]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(1,index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[0]));
context.write(keyValue, e);
}
}
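// NOTE: main() below references ReducerMxN, MapMxN and ReduceMxN, but their definitions are
// missing from this listing. The classes that follow are a hedged reconstruction (a sketch,
// not the original source) of what those stages would have to look like for the two-job
// pipeline to compile and run: ReducerMxN joins M and N elements that share the index j and
// emits one partial product per output cell (i,k); MapMxN re-parses the intermediate text
// output of the first job; ReduceMxN sums the partial products for each (i,k).
public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
@Override
public void reduce(IntWritable key, Iterable<Element> values, Context context)
throws IOException, InterruptedException {
ArrayList<Element> M = new ArrayList<Element>();
ArrayList<Element> N = new ArrayList<Element>();
Configuration conf = context.getConfiguration();
for (Element element : values) {
// Hadoop reuses the value object, so take a deep copy before storing it in a list.
Element tempElement = ReflectionUtils.newInstance(Element.class, conf);
ReflectionUtils.copy(conf, element, tempElement);
if (tempElement.tag == 0) {
M.add(tempElement);
} else if (tempElement.tag == 1) {
N.add(tempElement);
}
}
// Every mij paired with every njk for this j contributes one partial product of cell (i,k).
for (int m = 0; m < M.size(); m++) {
for (int n = 0; n < N.size(); n++) {
Pair p = new Pair(M.get(m).index, N.get(n).index);
double product = M.get(m).value * N.get(n).value;
context.write(p, new DoubleWritable(product));
}
}
}
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Intermediate lines look like "i j<TAB>partialProduct" (see Pair.toString()).
String[] pairValue = value.toString().split("\\s+");
Pair p = new Pair(Integer.parseInt(pairValue[0]), Integer.parseInt(pairValue[1]));
DoubleWritable val = new DoubleWritable(Double.parseDouble(pairValue[2]));
context.write(p, val);
}
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair, DoubleWritable> {
@Override
public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0.0;
for (DoubleWritable value : values) {
sum += value.get();
}
context.write(key, new DoubleWritable(sum));
}
}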
public static void main(String[] args) throws Exception {
Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Multiply.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MatriceMapperN.class);
job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
Job job2 = Job.getInstance();
job2.setJobName("MapFinalOutput");
job2.setJarByClass(Multiply.class);
job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);
job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job2, new Path(args[2]));
FileOutputFormat.setOutputPath(job2, new Path(args[3]));
job2.waitForCompletion(true);
}
}
#!/bin/bash
rm -rf multiply.jar classes
module load hadoop/2.6.0
mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf multiply.jar -C classes .
echo "end"
export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
hadoop jar multiply.jar edu.uta.cse6331.Multiply /user/$USER/M-matrix-large.txt /user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output
rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr
stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh
OUTPUT:

module load hadoop/2.6.0


rm -rf output intermediate
hadoop --config $HOME jar multiply.jar edu.uta.cse6331.Multiply M-matrix-small.txt N-matrix-small.txt intermediate output

RESULT:
Exp no:

HBASE PRACTICE EXAMPLES


Date:

AIM:
To install HBase and practise basic HBase shell commands.

PROCEDURE :

Installing Apache HBase involves several steps to ensure proper setup and configuration. Here's a
general procedure for installing HBase:
Prerequisites
• Java Installation:
• Ensure Java Development Kit (JDK) is installed. HBase requires Java 8 or later
versions.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
• Hadoop Installation (Optional):
• HBase typically runs on top of Hadoop HDFS. If you haven't installed Hadoop
separately, you can use HBase's standalone mode for development purposes.

COMMANDS

1.Start hbase shell:


Open your terminal or command prompt and run the following command to start the
HBase shell:
Syntax: hbase shell
2.Create a new table:

To create a new table, use the create command followed by the table
name and the names of the column families you want in the table. Here's the
basic syntax:
Syntax: create 'table_name', 'column_family1', 'column_family2', ...

3.Verify table creation:

To verify that the table has been created successfully, you can use the list
command to list all the tables in HBase.
Syntax: list

4.Insert the rows into tables


To insert data into an HBase table, you can use the put command in the HBase shell.
Here's the basic syntax for inserting data
Syntax:
put 'table_name', 'row_key', 'column_family:column_qualifier', 'value'

5.Describe the table


This command provides information about the structure of the 'students' table,
including its column families.
Syntax: describe 'table_name'

6.Disable the table


This command disables (pauses) the 'students' table, preventing any read or write
operations on it.
Syntax: disable 'table_name'

7.Enable the table


This command re-enables the 'students' table, allowing read and write operations again.
Syntax: enable 'table_name'

8.Alter the table


This command adds a new column family named 'contact_details' to the 'students' table.
Syntax: alter 'table_name', {NAME => 'new_column_family'}

9.Count rows in table


This command counts and displays the total number of rows in the 'students' table.
Syntax: count 'table_name'

10.Check if the table exists


This command checks if a table named 'employees' exists. It will return true if the
table exists or false if it doesn't
Syntax: exists 'table_name'

11.View the table


This command scans and retrieves all data in the 'students' table.

Syntax: scan 'table_name'


12.Drop the table
This command permanently deletes the 'employees' table and all its data. Use it with caution. A table must be disabled before it can be dropped.

Syntax: drop 'table_name'

13.Exit hbase shell:

This command exits the HBase shell and returns you to your system's command prompt.
Syntax: exit
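The shell commands above can also be issued programmatically. The sketch below is an illustration only, not part of the original exercise: it assumes the HBase 2.x Java client libraries are on the classpath, and the table name 'students', column family 'personal', row key and cell contents are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseShellEquivalent {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();       // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName students = TableName.valueOf("students"); // hypothetical table name
            if (!admin.tableExists(students)) {                  // shell: exists 'students'
                admin.createTable(TableDescriptorBuilder.newBuilder(students)   // shell: create 'students','personal'
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                        .build());
            }
            try (Table table = connection.getTable(students)) {
                Put put = new Put(Bytes.toBytes("row1"));        // shell: put 'students','row1','personal:name','Asha'
                put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);
                try (ResultScanner scanner = table.getScanner(new Scan())) {    // shell: scan 'students'
                    for (Result result : scanner) {
                        System.out.println(result);
                    }
                }
            }
        }
    }
}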
OUTPUT:
RESULT:
Exp no:

INSTALLATION OF HIVE WITH EXAMPLES


Date:

AIM:
To install Cloudera and Oracle VM VirtualBox and run Hive shell commands in the terminal.

PROCEDURE:
PRE-REQUISITES:
Cloudera

Oracle VM VirtualBox

INSTALLATION PROCEDURE OF CLOUDERA:


• Download Oracle VM Virtual Box

• Next, we need to download the Cloudera Quickstart VM:
https://drive.google.com/drive/folders/1vLg6XUSjcC3Jl78SWodVGGxbgX0RO0_N?usp=sharing
Version: 5.13. File size is 5.5 GB.

• Configuring Cloudera on Virtual Machine


• Once you have downloaded the Cloudera Quickstart VM from the above link, you will see two files like below.

• Open the Oracle VirtualBox

• Click on File, Import Appliance


Browse to the location of the file and select the file ending with the Open Virtualization Format (.ovf) extension.

• Click on Next once finished.

• This will be the default configuration for this Virtual Machine. You can also change the configuration and allocate resources according to your needs. It is best to provide a minimum of 4 GB of RAM for this Virtual Machine.

• Click on Import, it will take a few seconds.

• After that you will be able to see cloudera-quickstart-vm in the Oracle VM VirtualBox Manager. Right-click on it and select Settings. Go to Network and change "Attached to" from NAT to Host-only Adapter. Click OK to apply the settings.
• Click on Start to start the cloudera environment on your Virtual Machine.

• It will take a few minutes to load the Cloudera Environment on VM.


• Once it has finished loading, you will be able to see this screen. That means you have successfully installed Cloudera on the Virtual Machine.

Under the hood, everything is pre-configured so that you don’t need to configure it by
yourself.
Click on Terminal to see hadoop version, hive, oozie, pig, spark-shell, HBase and
many more.

Connect Cloudera VM from your Local System


• This is made possible by PuTTY (https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) for Windows users.

• First, we need to know the IP address/host name of this Virtual Machine. Open the Cloudera terminal and type 'ifconfig'.
ALGORITHM:

Step 1: Create a Database (if not exists)


Database name (`userdb`).

Step 2: Create a Table (if not exists)


Table name (`employee`), columns (`eid`, `name`, `salary`, `designation`), delimiters (`'\t'`,
`'\n'`), and storage location (`'/user/input'`).

Step 3: Load Data into the Table


Path to the local data file (`inputdata.txt`).

Step 4: Create a View


Input: View name (`writer_editor`), condition (`designation='Writer' OR
designation='Editor'`).

Step 5: Create an Index (with Deferred Rebuild)


Index name (`index_salary`), indexed column (`salary`), index handler
(`'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'`).

Step 6: Query Data


Retrieve and display all records from the `employee` table.
Retrieve and display records from the `writer_editor` view.
PROGRAM:

CREATE DATABASE IF NOT EXISTS userdb;

CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, designation String)
COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/input';

LOAD DATA LOCAL INPATH 'inputdata.txt' OVERWRITE INTO TABLE employee;

CREATE VIEW writer_editor AS SELECT * FROM employee WHERE designation='Writer' OR designation='Editor';

CREATE INDEX index_salary ON TABLE employee(salary) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;

SELECT * from employee;

SELECT * from writer_editor;
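The statements above are intended to be run in the Hive shell on the Cloudera VM. As a side note, the same statements can also be executed from Java through HiveServer2's JDBC interface. The sketch below is an illustration only: the host, port and user are assumptions about your environment, and the hive-jdbc driver must be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumed HiveServer2 host/port and user; adjust to your VM.
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/userdb", "cloudera", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM employee")) {
            while (rs.next()) {
                // Columns by position: eid, name (see the CREATE TABLE statement above).
                System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
            }
        }
    }
}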


OUTPUT:

RESULT:

Ex No: 06 INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH PRACTICE EXAMPLES
Date:

AIM:
To install HBase on Windows, install Thrift, and work through practice examples.

PROCEDURE:

Step-1: (Extraction of files)


Extract all the files in C drive
Step-2: (Creating Folder)
Create folders named "hbase" and "zookeeper."

Step-3: (Deleting line in HBase.cmd)


Open hbase.cmd in any text editor.
Search for line %HEAP_SETTINGS% and remove it.

Step-4: (Add lines in hbase-env.cmd)


Now open hbase-env.cmd, which is in the conf folder in any text editor.
set JAVA_HOME=%JAVA_HOME%
set HBASE_CLASSPATH=%HBASE_HOME%\lib\client-facing-thirdparty\*
set HBASE_HEAPSIZE=8000
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
set HBASE_USE_GC_LOGFILE=true
set HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false"
set HBASE_MASTER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10101"
set HBASE_REGIONSERVER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10102"
set HBASE_THRIFT_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10103"
set HBASE_ZOOKEEPER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10104"
set HBASE_REGIONSERVERS=%HBASE_HOME%\conf\regionservers
set HBASE_LOG_DIR=%HBASE_HOME%\logs
set HBASE_IDENT_STRING=%USERNAME%
set HBASE_MANAGES_ZK=true
Step-6: (Setting Environment Variables)
Now set up the environment variables.
Search "System environment variables."

Now click on " Environment Variables."


Then click on "New."

Variable name: HBASE_HOME


Variable Value: Put the path of the Hbase folder.
We have completed the HBase Setup on Windows procedure.
Step 7: Install Apache Thrift

Download Thrift:
Visit the Apache Thrift website: https://thrift.apache.org/download.
Download and extract Thrift.
Build and Install Thrift:
./configure
make
sudo make install

Step 8: Practice Examples (Using Java with HBase and Thrift)


Below is a simple Java example demonstrating how to use Apache Thrift to interact with HBase:
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.hadoop.hbase.thrift.generated.Hbase;

public class HBaseThriftExample {

public static void main(String[] args) {


TTransport transport = new TSocket("localhost", 9090);
try {
transport.open();

// Create Thrift client


Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));

// Perform operations
// ... add your HBase Thrift operations here ...
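// One concrete operation, shown only as an illustration (it is not part of the original
// record): list the tables currently known to HBase. getTableNames() is part of the
// Thrift-generated Hbase.Client interface.
for (java.nio.ByteBuffer name : client.getTableNames()) {
System.out.println(java.nio.charset.StandardCharsets.UTF_8.decode(name.duplicate()));
}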

// Close the transport


transport.close();
} catch (TException e) {
e.printStackTrace();
}
}
}

Ensure that your HBase Thrift server is running and accessible at the specified host and port. Also,
make sure the necessary HBase Thrift libraries are included in your Java project's classpath.

The provided Java code connects to an HBase Thrift server, performs whatever operations are placed inside the try block (the table-listing loop above is only an illustration), and handles exceptions. The output therefore depends on the operations you perform within the try block.

If everything runs successfully (meaning the HBase Thrift server is running and reachable, and your operations execute without errors), the illustrative version above simply prints the names of the existing tables; with no operations added, the program terminates without any output.

RESULT:
Ex.No: 07 PRACTICE IMPORTING AND EXPORTING DATA FROM VARIOUS DATABASES
Date:

AIM:
To perform importing and exporting of data with various systems such as HDFS, Apache Hive and Apache Spark.

PROCEDURE:

Importing Data:

1. Hadoop Distributed File System (HDFS):

• Use the Hadoop hdfs dfs command-line tool or the Hadoop FileSystem API to copy data from a local file system or another location to HDFS (a Java sketch using the FileSystem API appears after this list). For example:

$ hdfs dfs -put local_file.txt /hdfs/path

• This command uploads local_file.txt from the local file system to the HDFS path /hdfs/path.

2. Apache Hive:

• Hive supports data import from various sources, including local files, HDFS, and databases.
You can use the LOAD DATA statement to import data into Hive tables. For example:

LOAD DATA INPATH '/hdfs/path/data.txt' INTO TABLE my_table;

• This statement loads data from the HDFS path /hdfs/path/data.txt into the Hive table my_table.

3. Apache Spark:

• Spark provides rich APIs for data ingestion. You can use the DataFrameReader API obtained from SparkSession to read data from different sources such as CSV files, databases, or streaming systems. For example:

val df = spark.read.format("csv").load("/path/to/data.csv")

• This code reads data from the CSV file located at /path/to/data.csv into a DataFrame in Spark.
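Item 1 above also mentions the Hadoop FileSystem API. The Java sketch below performs the same upload programmatically; it is an illustration only, and the paths simply mirror the hypothetical ones used in the command-line example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);        // connect to the configured default file system (HDFS)
        // Equivalent of: hdfs dfs -put local_file.txt /hdfs/path
        fs.copyFromLocalFile(new Path("local_file.txt"), new Path("/hdfs/path/local_file.txt"));
        fs.close();
    }
}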
Exporting Data:

1. Hadoop Distributed File System (HDFS):

• Use the Hadoop hdfs dfs command-line tool or the Hadoop FileSystem API to copy data from HDFS to a local file system or another location. For example:

$ hdfs dfs -get /hdfs/path/file.txt local_file.txt

• This command downloads the file /hdfs/path/file.txt from HDFS and saves it as local_file.txt in the local file system.

2. Apache Hive:

• Exporting data from Hive can be done in various ways, depending on the desired output format. You can use the INSERT OVERWRITE statement to export data from Hive tables to files or other Hive tables. For example:

INSERT OVERWRITE LOCAL DIRECTORY '/path/to/output' SELECT * FROM my_table;

• This statement exports the data from the my_table Hive table to the local directory /path/to/output.

3. Apache Spark:

• Spark provides flexible options for data export. You can use the DataFrameWriter API to write data to different file formats, databases, or streaming systems. For example:

df.write.format("parquet").save("/path/to/output")

• This code saves the DataFrame df in Parquet format to the specified output directory.

RESULT:
