AIM:
To run a basic Word Count MapReduce program to understand the MapReduce paradigm.
PREREQUISITES:
• Java Installation:
• Ensure Java Development Kit (JDK) is installed on all nodes of your Hadoop cluster.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
• Hadoop Installation:
• Install Apache Hadoop on your cluster. Ensure Hadoop is properly configured and all
nodes are accessible.
• Hadoop HDFS should be up and running, and you should have basic knowledge of
configuring Hadoop properties (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.).
• Development Environment:
• Set up a development environment with Hadoop installed locally if you're testing on a
single-node setup (pseudo-distributed mode).
SOURCE CODE :
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
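Only the import section of the program is reproduced above. For completeness, a minimal WordCount mapper, reducer, and driver consistent with those imports is sketched below (the class names and the use of two command-line arguments for the input and output paths are illustrative choices, not part of the original listing):
public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line into tokens and emit (word, 1) for every token
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Add up all the counts emitted for the same word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        // args[0] = HDFS input path, args[1] = HDFS output path (must not already exist)
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The job can be packaged into a jar and run with, for example, hadoop jar wordcount.jar WordCount /user/<name>/input /user/<name>/output, after which the word counts can be read from the part-r-* files in the output directory.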
RESULT:
Exp no: IMPLEMENTING MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
Date:
AIM:
To implement matrix multiplication with Hadoop MapReduce.
MAPPING :
CO5:Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data
analytics.
PREREQUISITES :
Java Installation:
• Ensure Java Development Kit (JDK) is installed on all nodes of your Hadoop cluster.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
Hadoop Installation:
• Install Apache Hadoop on your cluster. Ensure Hadoop is properly configured and all
nodes are accessible.
• Hadoop HDFS should be up and running, and you should have basic knowledge of
configuring Hadoop properties (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.).
Development Environment:
• Set up a development environment with Hadoop installed locally if you're testing on a
single-node setup (pseudo-distributed mode).
a. For each element mij of M, produce the (key, value) pairs ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element njk of N, produce the (key, value) pairs ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs such that each key (i,k) has a list with the values (M, j, mij) and (N, j, njk) for all possible values of j.
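For example, if M and N are both 2 x 2 matrices, element m11 of M produces the pairs ((1,1), (M, 1, m11)) and ((1,2), (M, 1, m11)), while element n11 of N produces ((1,1), (N, 1, n11)) and ((2,1), (N, 1, n11)). The reducer for key (1,1) therefore receives (M, j, m1j) and (N, j, nj1) for j = 1, 2 and computes p11 = m11*n11 + m12*n21.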
CODING:
package edu.uta.cse6331; // package must match the class name used in the hadoop jar command below

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
class Element implements Writable {
int tag;
int index;
double value;
Element() {
tag = 0;
index = 0;
value = 0.0;
}
Element(int tag, int index, double value) {
this.tag = tag;
this.index = index;
this.value = value;
}
@Override
public void readFields(DataInput input) throws IOException {
tag = input.readInt();
index = input.readInt();
value = input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair> {
int i;
int j;
Pair() {
i = 0;
j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException {
i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare) {
if (i > compare.i) {
return 1;
} else if ( i < compare.i) {
return -1;
} else {
if(j > compare.j) {
return 1;
} else if (j < compare.j) {
return -1;
}
}
return 0;
}
public String toString() {
return i + " " + j;
}
}
public class Multiply
{
public static class MatriceMapperM extends Mapper<Object,Text,IntWritable,Element>
{
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Each line of the M matrix file has the form i,j,value; the join key is the column index j
String readLine = value.toString();
String[] stringTokens = readLine.split(",");
int index = Integer.parseInt(stringTokens[0]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(0, index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[1]));
context.write(keyValue, e);
}
}
public static class MatriceMapperN extends Mapper<Object,Text,IntWritable,Element> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Each line of the N matrix file has the form j,k,value; the join key is the row index j
String readLine = value.toString();
String[] stringTokens = readLine.split(",");
int index = Integer.parseInt(stringTokens[1]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(1,index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[0]));
context.write(keyValue, e);
}
}
public static void main(String[] args) throws Exception {
Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Multiply.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MatriceMapperN.class);
job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
Job job2 = Job.getInstance();
job2.setJobName("MapFinalOutput");
job2.setJarByClass(Multiply.class);
job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);
job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job2, new Path(args[2]));
FileOutputFormat.setOutputPath(job2, new Path(args[3]));
job2.waitForCompletion(true);
}
}
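The driver above also references ReducerMxN (the reducer of the first job) and MapMxN and ReduceMxN (the mapper and reducer of the second job), which are not reproduced in this listing. A minimal sketch of what they could look like is given below; these would be static nested classes placed inside Multiply next to MatriceMapperM and MatriceMapperN, and the sketch assumes the first job's text output carries the two result indices and the partial product separated by whitespace (as produced by the Pair.toString() above):
public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
    @Override
    public void reduce(IntWritable key, Iterable<Element> values, Context context)
            throws IOException, InterruptedException {
        // Hadoop reuses the value object, so copy each Element before storing it
        ArrayList<Element> M = new ArrayList<Element>();
        ArrayList<Element> N = new ArrayList<Element>();
        for (Element element : values) {
            Element copy = new Element(element.tag, element.index, element.value);
            if (copy.tag == 0) {
                M.add(copy);
            } else {
                N.add(copy);
            }
        }
        // Join on the shared index j: every m(i,j) is paired with every n(j,k) under this key
        for (Element m : M) {
            for (Element n : N) {
                context.write(new Pair(m.index, n.index),
                        new DoubleWritable(m.value * n.value));
            }
        }
    }
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each intermediate line holds the row index, the column index, and a partial product
        String[] tokens = value.toString().split("\\s+");
        Pair p = new Pair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
        context.write(p, new DoubleWritable(Double.parseDouble(tokens[2])));
    }
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair, DoubleWritable> {
    @Override
    public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the partial products for each (i,k) cell of the result matrix
        double sum = 0.0;
        for (DoubleWritable value : values) {
            sum += value.get();
        }
        context.write(key, new DoubleWritable(sum));
    }
}
With this structure, the first job performs the join and multiplication keyed on the shared index j, and the second job adds up the partial products for each output cell (i,k).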
#!/bin/bash
rm -rf multiply.jar classes
module load hadoop/2.6.0
mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf multiply.jar -C classes .
echo "end"
export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
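# Both matrix files are expected to contain one element per line as a comma-separated triple
# of the form row,column,value (for example 0,3,4.5), matching the split(",") in the mappers.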
hadoop jar multiply.jar edu.uta.cse6331.Multiply /user/$USER/M-matrix-large.txt /user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output
rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr
stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh
OUTPUT:
RESULT:
Exp no:
AIM :
To install HBase and execute basic HBase shell commands.
PROCEDURE :
Installing Apache HBase involves several steps to ensure proper setup and configuration. Here's a
general procedure for installing HBase:
Prerequisites
• Java Installation:
• Ensure Java Development Kit (JDK) is installed. HBase requires Java 8 or later
versions.
• Set the JAVA_HOME environment variable to point to your JDK installation
directory.
• Hadoop Installation (Optional):
• HBase typically runs on top of Hadoop HDFS. If you haven't installed Hadoop
separately, you can use HBase's standalone mode for development purposes.
COMMANDS
To verify that the table has been created successfully, you can use the list
command to list all the tables in HBase.
Syntax: list
This command exits the HBase shell and returns you to your system's command prompt.
Syntax: exit
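For reference, a short illustrative HBase shell session (the table name, column family, row key, and value are examples) that creates a table, verifies it with list, stores and scans a row, and then exits might look like:
create 'emp', 'personal'
list
put 'emp', '1', 'personal:name', 'raju'
scan 'emp'
exit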
OUTPUT:
RESULT:
Exp no:
AIM:
To install Cloudera and VirtualBox and execute Hive shell commands in the terminal.
PROCEDURE:
PRE-REQUISITES :
Cloudera
• This will be the default configuration for this virtual machine. You can also change the configuration and allocate resources according to your needs. It is best to provide a minimum of 4 GB of RAM for this virtual machine.
• Under the hood, everything is pre-configured, so you do not need to configure anything yourself.
• Click on Terminal to check the Hadoop version and to use hive, oozie, pig, spark-shell, hbase shell, and many more tools.
• First, we need to know the IP address/host of this virtual machine. Open the Cloudera terminal and type 'ifconfig'.
ALGORITHM:
CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/input';
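After creating the table, data can be loaded and queried from the same Hive shell; a short illustrative follow-up (the sample file path is an assumption) is:
LOAD DATA LOCAL INPATH '/home/cloudera/sample.txt' INTO TABLE employee;
SELECT * FROM employee;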
RESULT:
Ex No: 06 INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH PRACTICE EXAMPLES
Date:
AIM:
To install HBase on Windows.
PROCEDURE:
Download Thrift:
Visit the Apache Thrift website: https://thrift.apache.org/download.
Download and extract Thrift.
Build and Install Thrift:
./configure
make
sudo make install
// Perform operations
// ... add your HBase Thrift operations here ...
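The Java client referred to in the paragraphs below is not reproduced in this record; the two comment lines above are its only fragment shown here. A minimal sketch of such a client, assuming the Thrift 1 generated Hbase classes and the Thrift runtime library are on the classpath and that the Thrift server listens on localhost:9090 (both assumptions; 9090 is the default port), is:
import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class HBaseThriftExample {
    public static void main(String[] args) {
        TTransport transport = null;
        try {
            // Assumed host and port of the running HBase Thrift server
            transport = new TSocket("localhost", 9090);
            transport.open();
            TProtocol protocol = new TBinaryProtocol(transport);
            Hbase.Client client = new Hbase.Client(protocol);
            // Perform operations
            // ... add your HBase Thrift operations here ...
        } catch (Exception e) {
            // Any connection or operation failure is reported here
            e.printStackTrace();
        } finally {
            if (transport != null) {
                transport.close();
            }
        }
    }
}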
Ensure that your HBase Thrift server is running and accessible at the specified host and port. Also,
make sure the necessary HBase Thrift libraries are included in your Java project's classpath.
The provided Java code connects to an HBase Thrift server, performs unspecified operations (indicated
by comments), and handles exceptions. Since the actual operations are not specified in the code, the
output would depend on what operations you perform within the try block.
If everything runs successfully (meaning the HBase Thrift server is running and reachable, and your
operations execute without errors), the program will terminate without any output.
RESULT:
Ex.No: 07 PRACTICE IMPORTING AND EXPORTING DATA FROM VARIOUS DATABASES
Date:
AIM:
To perform importing and exporting of data with various systems such as HDFS, Apache Hive, and Apache Spark.
PROCEDURE:
Importing Data:
1. HDFS:
• Use the Hadoop hdfs dfs command-line tool or the Hadoop File System API to copy data from a local file system or another location to HDFS. For example:
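hdfs dfs -put data.txt /hdfs/path/data.txt
• This command (the file names here are illustrative) copies the local file data.txt into HDFS at /hdfs/path/data.txt.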
2. Apache Hive:
• Hive supports data import from various sources, including local files, HDFS, and databases.
You can use the LOAD DATA statement to import data into Hive tables. For example:
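LOAD DATA INPATH '/hdfs/path/data.txt' INTO TABLE my_table;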
• This statement loads data from the HDFS path /hdfs/path/data.txt into the Hive table my_table.
3. Apache Spark:
• Spark provides rich APIs for data ingestion. You can use the DataFrameReader or SparkSession APIs to read data from different sources such as CSV files, databases, or streaming systems. For example:
val df = spark.read.format("csv").load("/path/to/data.csv")
• This code reads the data from the CSV file located at /path/to/data.csv into a DataFrame in Spark.
Exporting Data:
1. HDFS:
• Use the Hadoop hdfs dfs command-line tool or the Hadoop File System API to copy data from HDFS to a local file system or another location. For example:
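hdfs dfs -get /hdfs/path/file.txt file.txt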
• This command downloads the file /hdfs/path/file.txt from HDFS and saves it as file.txt in the local file system.
2. Apache Hive:
• Exporting data from Hive can be done in various ways, depending on the desired output format.
You can use the INSERT OVERWRITE statement to export data from Hive tables to files or
other Hive tables. For example:
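INSERT OVERWRITE LOCAL DIRECTORY '/path/to/output' SELECT * FROM my_table;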
• This statement exports the data from the Hive table to the local directory /path/to/output.
3. Apache Spark:
• Spark provides flexible options for data export. You can use the DataFrameWriter or Dataset writer APIs to write data to different file formats, databases, or streaming systems. For example:
df.write.format("parquet").save("/path/to/output")
• This code saves the DataFrame df in Parquet format to the specified output directory.
RESULT: