BDA Record (1)
LAB RECORD
4/4 B.TECH (Computer Science and Engineering)
I Semester
Certificate
This is to certify that the experiments recorded in this book are the bonafide work
of …………………………………. student of …………..………………. carried
out in the subject ……………………. at Bapatla Engineering College, Bapatla
during the year 2020 – 2021. Number of experiments recorded is …………….
Lecturer-in-charge
Date :
Head of the Department
Department of Computer Science and Engineering
3) MapReduce - WordCount
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
Step-5:-
Run the following commands in the terminal:
sudo mkdir -p /usr/local/hadoopdata/hdfs/namenode
sudo mkdir -p /usr/local/hadoopdata/hdfs/datanode
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp (instead of hadoop1:hadoop1, give your own username and group,
e.g. username:groupname)
sudo chmod 750 /app/hadoop/tmp
Step-6:-
Modify these files in /usr/local/hadoop/etc/hadoop:
1) core-site.xml
2) hadoop-env.sh
3) mapred-site.xml
4) hdfs-site.xml
hdfs-site.xml:-
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
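The contents of the other files listed in Step-6 are not reproduced here. As a rough reference, a minimal single-node configuration commonly uses properties like the ones below (check the values against your own installation; in hadoop-env.sh only JAVA_HOME normally needs to be set to your JDK path):
core-site.xml:-
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
mapred-site.xml:-
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>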
Step-7:-
➢ source ~/.bashrc
➢ hadoop namenode -format
➢ start-all.sh (this will start all the Hadoop daemons)
➢ jps (to check that all the daemons are running; six processes in total)
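If everything has started correctly, the jps listing typically shows these six processes (the process IDs will differ on each machine):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps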
NOTE: If the datanode is not started:
First stop all the services (stop-all.sh).
STEP 1: Go to the /app/hadoop/tmp/ location and delete all the folders inside it.
STEP 2: In the terminal run:
sudo chmod -R 755 /app/hadoop
Then start all the services again using the commands mentioned above.
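Putting the recovery steps together, the terminal sequence looks roughly like this (assuming the /app/hadoop paths used in this manual):
stop-all.sh
sudo rm -rf /app/hadoop/tmp/*
sudo chmod -R 755 /app/hadoop
start-all.sh
jps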
If you get all the six daemons, then your Hadoop setup is ready.
Hadoop Commands:
1) Print the Hadoop version
Syntax: hadoop version
ex: hadoop version
2) To create a directory
Syntax: hadoop fs -mkdir [-p] <path>
-p: create parent directories along the path
Ex: hadoop fs -mkdir -p /hadoop/bec
3) List the contents in human readable format
Syntax: hadoop fs -ls [-R] [-h] <path>
Ex: hadoop fs -ls /hadoop
4) Upload a file
Syntax: hadoop fs -put <local src> <dest>
Ex: hadoop fs -put /home/ubuntu/Desktop/hadoop.txt /hadoop/bec/hadoop.txt
5) Download a file
Syntax: hadoop fs -get <src> <local dest>
Ex: hadoop fs -get /hadoop/bec/jes.txt /home/ubuntu/doc.txt
6) View the content of the file
Syntax: hadoop fs -cat <path (filename)>
Ex: hadoop fs -cat /hadoop/bec/hadoop.txt
7) Copy the file from src to destination within hdfs
Syntax: hadoop fs -cp [-f] <hdfssrc> <hdfsdest>
-f: to overwrite the file if already exists
Ex: hadoop fs -cp /hadoop/fds.txt /hadoop/bec/
8) Move the file from src to destination within the Hdfs
Syntax: hadoop fs -mv <hdfssrc> <hdfsdest>
Ex: hadoop fs -mv /hadoop/bec/hadoop.txt /hadoop/
9) Copy from local to hdfs:
Syntax: hadoop fs -copyFromLocal [-f] <local src> <dest>
-f: to overwrite the file if already exists
Ex: hadoop fs -copyFromLocal /home/ubuntu/Desktop/hadoop.txt /hadoop/bec/hadoop.txt
10) Copy to local from hdfs
Syntax: hadoop fs -copyToLocal [-f] <hdfssrc> <dest>
-f: to overwrite the file if already exists
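For example, mirroring the earlier examples (the paths here are illustrative):
Ex: hadoop fs -copyToLocal /hadoop/bec/hadoop.txt /home/ubuntu/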
3) After opening the workspace, click on “File” in the menu bar, click on “New” and then select “Java
Project”
4)Provide a name for the project (say, wordcount) and leave other settings as default and then click on
“Finish”.
Step-2:-
1) Creating a package under the project.
➢ Right click on the project name (say, wordcount), click on “New” and then select “Package”.
2) Provide a name for the package (say, Demo), leave other settings as default and click on “Finish”.
3) Right click on the project name, click on “New” and then select “Class”.
4) Provide a name for the class (say, WCount), leave other settings as default and click on “Finish”.
Step-3:-
WCount.java:
package Demo;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WCount {

    // Mapper: splits each input line into words and emits (word, 1)
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                output.collect(value, new IntWritable(1));
            }
        }
    }

    // Reducer: adds up the counts emitted for each word and writes (word, total)
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: configures and submits the job; args[0] = input file (HDFS), args[1] = output directory (HDFS)
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Note: We may get errors at the import statements as we have not included the required jars in the project’s
configuration. Follow the steps below to do so.
Step-4:-
1) We need to include some external jar files to our project.
➢ There are two jar files which can be used for this program. They can be downloaded from the
following links given below:
➢ https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.3.0
➢ https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core/3.3.0
➢ These can be included in our project by configuring the build path and adding them as external libraries to
the classpath.
2) Right click on the project and then click on “Build Path”
3) Select “Configure Build Path” and go to the “Libraries” tab
4) Click on “Add External JARs”
5) Select the downloaded jar files and then click on “Open”
6) See that the jars are added to the classpath and click on “Apply and Close”
Note: See that the errors are solved in the java class after adding the jars in classpath.
Step-5:-
1) Now, we need to convert our project into a java jar file. Right click on the project name and then select
“Export”.
2) In Export window, select “Java” and then select “JAR File” from dropdown menu.
3) In file specification window, select the path and provide a name (say, wcount.jar) for the jar file which is
to be exported.
Note: See that the jar file is saved in the given path.
Step-6:-
1) Create a text document with some text which will be considered as an input for our program.
2) Start the hadoop daemons and move the text file to any hdfs directory and check if it is moved or not
using the below commands.
➢ start-all.sh
➢ jps
➢ hadoop fs -put /home/hayath/Desktop/hayath.txt /hadoop/
➢ hadoop fs -cat /hadoop/hayath.txt
3) Now we can execute the jar file with the below command by specifying the input_text_file and an
output_dir to save the output.
Syntax: hadoop jar <JAR_FILE_PATH(In local file system)> Package_Name.Class_Name
<Input_Text_File_Path(In HDFS)> <Output_Dir_Path(In HDFS)>
Ex: hadoop jar /home/hayath/Desktop/wcount.jar Demo.WCount /hadoop/hayath.txt /hadoop/wcount_output
(the jar path, input file and output directory above are illustrative; note that the output directory must not already exist in HDFS)
4) Check that the specified output folder is created in hdfs. If found, open the directory and look out for the
“part-00000” file and download it. This file gives the output of the program which specifies the count of
each word in the given input file’s text.
Type the below commands to view the output file, or simply browse the HDFS file system and download the
“part-00000” file to view it on your system.
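For instance, assuming the output directory name used in the earlier example:
➢ hadoop fs -ls /hadoop/wcount_output
➢ hadoop fs -cat /hadoop/wcount_output/part-00000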
➢ pig -version : prints the version of Pig that is installed
➢ Pig Modes:
1. Local Mode: pig -x local
2. Mapreduce Mode: pig -x mapreduce (or) pig
Pig Operators:
Processing Operators:
Loading and Storing Data:
For example, let us consider stud.txt, which contains the following content:
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,Siddhartha,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhubaneswar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,Trivandrum
008,Bharathi,Nambiayar,24,9848022333,Chennai
Load operator:
Syntax:
Relation_name = LOAD 'Input file path' USING function as schema;
Example:
student = load '/home/ubuntu/Desktop/stud' using PigStorage(',') as (id:int, fname:chararray,
lname:chararray, age, contact, city);
Output:
Store operator:
Syntax:
STORE Relation_name INTO ' required_directory_path ' [USING function];
Example:
store student into '/home/ubuntu/Desktop/det' using PigStorage(',');
Output:
Diagnostic Operators:
Dump operator:
Syntax:
Dump Relation_Name;
Example:
dump student;
Output:
Describe Operator:
Syntax:
describe Relation_Name;
Example:
describe student;
Output:
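Since age, contact and city were left untyped in the load statement, Pig treats them as bytearray, so describe prints a schema roughly like:
student: {id: int,fname: chararray,lname: chararray,age: bytearray,contact: bytearray,city: bytearray}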
Filtering Data:
Syntax:
Relation2_name = FILTER Relation1_name BY (condition);
Example1:
filter_data = FILTER student BY city == 'Chennai';
dump filter_data;
Output:
Example2:
filter_age = FILTER student BY age == '21';
dump filter_age;
Output:
Foreach Operator:
Syntax:
Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
Example:
Grunt> foreach_data = foreach student generate id, fname, city;
Grunt> dump foreach_data;
Output:
Group All:
Syntax:
Relation_name2 = GROUP Relation_name1 All;
Example:
group_all = GROUP student All;
dump group_all;
Output:
Join Operator:
Let us consider the two files customers.txt and orders.txt:
customers.txt:
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
orders.txt:
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
And we load these two files as follows:
customers = LOAD '/home/ubuntu/Desktop/customers' USING PigStorage(',') as (id:int, name:chararray,
age:int, address:chararray, salary:int);
orders = LOAD '/home/ubuntu/Desktop/orders' USING PigStorage(',')as (oid:int, date:chararray,
customer_id:int, amount:int);
InnerJoin:
Syntax:
Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key ;
Example:
inner_join = JOIN customers BY id, orders BY customer_id;
dump inner_join;
Outer Join:
Left Outer Join:
Syntax:
Relation3_name = JOIN Relation1_name BY id LEFT OUTER, Relation2_name BY customer_id;
Example:
outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;
dump outer_left;
Output:
Union Operator:
stud.txt:
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai.
stud1.txt:
007,Komal,Nayak,9848022334,trivendram.
008,Bharathi,Nambiayar,9848022333,Chennai.
Syntax:
Relation_name3 = UNION Relation_name1, Relation_name2;
Example:
student1 = LOAD '/home/ubuntu/Desktop/stud' USING PigStorage (',') as (id:int,firstname:chararray,
lastname:chararray, phone:chararray, city:chararray);
student2 = LOAD '/home/ubuntu/Desktop/stud1' USING PigStorage (',') as (id:int,firstname:chararray,
lastname:chararray,phone:chararray, city:chararray);
student3 = UNION student1,student2;
dump student3;
Output:
Split Operator:
Syntax:
SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation3_name IF (condition2);
Example:
student_details = LOAD '/home/ubuntu/Desktop/stu' USING PigStorage (',') as (id:int, firstname:chararray,
lastname:chararray, age:int, phone:chararray, city:chararray);
SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age<25);
dump student_details1;
dump student_details2;
Executing Pig Scripts:
1. Using exec:
Syntax: exec file_path;
Example: exec /home/ubuntu/Desktop/hii.pig;
2. Using run:
Syntax: run file_path;
Example: run /home/ubuntu/Desktop/hii.pig;
Output:
PIG Scripts:
WordCount:
wordcount.pig
lines = load '/home/ubuntu/Desktop/wordcount.txt' as (line: chararray);
words = FOREACH lines GENERATE FLATTEN (TOKENIZE (line)) as word;
grouped = group words by word;
wordcount = FOREACH grouped GENERATE group, COUNT (words);
dump wordcount;
Go to terminal
Grunt> exec /home/ubuntu/Desktop/wordcount.pig
Output:
Maxtemp:
maxtemp.pig
maxtmp = load '/home/ubuntu/Desktop/Maxtemp.txt' using PigStorage(',') as (year:int, tmp:int,
city:chararray);
maxtmp_year = group maxtmp by year;
max_tmp_yr = FOREACH maxtmp_year GENERATE group, MAX (maxtmp.tmp);
dump max_tmp_yr;
Goto terminal
Grunt> exec /home/ubuntu/Desktop/maxtemp.pig
Input:
Maxtemp.txt
1992,23,HYDERABAD
1996,28,GOA
1992,53,KOLKATTA
1996,53,MUMBAI
2013,25,BAPATLA
2018,45,GUNTUR
2013,42,ONGOLE
Output:
Card Count:
cardcount.pig
cards = load '/home/ubuntu/Desktop/Cardcount.txt' USING PigStorage (',') as (color: chararray, symbol:
chararray, num: int);
colors = group cards by color;
cardcount = foreach colors generate group, COUNT(cards.num);
dump cardcount;
Goto terminal
Grunt> exec /home/ubuntu/Desktop/cardcount.pig
Input:
Cardcount.txt
red,club,1
red,diamond,5
red,sprade,6
blue,sprade,7
blue,diamond,6
black,Sprade,9
black,Sprade,4
black,diamond,3
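Based on the input above, the colors group into three red, two blue and three black cards, so the dump prints tuples along the lines of (the order may vary):
(red,3)
(blue,2)
(black,3)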
Pig UDFs:
Registering UDFs
--register_java_udf.pig
register 'your_path_to_piggybank/piggybank.jar';
divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
Registering Python UDFs: (The Python script must be in your current directory)
--register_python_udf.pig
register 'production.py' using jython as bballudfs;
players = load 'baseball' as (name:chararray, team:chararray, pos:bag{t:(p:chararray)}, bat:map[]);
Writing UDFs
Java UDFs:
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class UPPER extends EvalFunc<String>
{
public String exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
try{
String str = (String)input.get(0);
return str.toUpperCase();
}catch(Exception e){
throw new IOException("Caught exception processing input row ", e);
}
}
}
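To use this UDF from Pig, the class is compiled, packed into a jar, registered, and then called by its fully qualified name. A minimal sketch, assuming the jar is named myudfs.jar and reusing the stud data from earlier:
register 'myudfs.jar';
student = load '/home/ubuntu/Desktop/stud' using PigStorage(',') as (id:int, fname:chararray, lname:chararray, age, contact, city);
upper_names = foreach student generate myudfs.UPPER(fname);
dump upper_names;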