Big Data Analytics: Welcome
Introduction to Hive
language.
• Fetch Result: The execution engine receives the results from the data nodes.
HIVEQL TO MAPREDUCE
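[Figure: HiveQL to MapReduce. A data analyst submits a query to the Hive framework; for a row count, the map output is (rowcount, 1) pairs and the reduce output is (rowcount, N).]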
String Types
STRING
VARCHAR Only available starting with Hive 0.12.0
CHAR Only available starting with Hive 0.13.0
Strings can be expressed in either single quotes (') or double quotes (").
Miscellaneous Types
BOOLEAN
BINARY Only available starting with Hive
HIVE DATA TYPES
Collection Data Types
STRUCT Similar to a C struct. Fields are accessed using dot notation.
E.g.: struct('John', 'Doe')
MAP A collection of key-value pairs. Values are accessed using [] notation with the key.
E.g.: map('first', 'John', 'last', 'Doe')
ARRAY An ordered sequence of elements of the same type. Elements are accessed using a zero-based array index.
E.g.: array('John', 'Doe')
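As a rough illustration of the three access styles described above, the sketch below declares one column of each collection type and reads from it. The table name, column names, and delimiters are all hypothetical choices made for this example, not taken from the slides.

```sql
-- Hypothetical table: one STRUCT, one MAP and one ARRAY column.
CREATE TABLE employees_demo (
  name    STRUCT<first:STRING, last:STRING>,
  contact MAP<STRING, STRING>,
  skills  ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'            -- separates the three columns
  COLLECTION ITEMS TERMINATED BY ','  -- separates items inside a collection
  MAP KEYS TERMINATED BY ':';         -- separates a map key from its value

-- Dot notation for the struct, [key] for the map, [index] for the array.
SELECT name.first, contact['email'], skills[0]
FROM employees_demo;
```

With these delimiters, a matching input line could look like `John,Doe|email:john@example.com|hive,sql`.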
HIVE FILE FORMAT
• Sequence File:
Sequence files are flat files that store binary key-value pairs.
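In HiveQL this format is selected with the STORED AS SEQUENCEFILE clause. The sketch below is a minimal example under assumptions: both table names are hypothetical, and `demo_text` is assumed to be an existing text-format table with the same two columns.

```sql
-- Hypothetical table stored in the sequence file format.
CREATE TABLE demo_seq (id INT, name STRING)
STORED AS SEQUENCEFILE;

-- LOAD DATA only moves files without converting them, so a common route
-- is to copy rows from an existing text-format table instead.
INSERT OVERWRITE TABLE demo_seq
SELECT id, name FROM demo_text;
```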
The Hive Query Language (HiveQL) is the query language Hive uses to
process and analyze structured data described in the Metastore.
3. Evaluate functions.
• Once a database exists, we can create tables in it. To do this,
first switch to the database you want to use:
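A minimal sketch of that switch, assuming the target is the `vignan` database that appears in the warehouse paths elsewhere in these notes:

```sql
-- Assumption: the database is named vignan (inferred from the path vignan.db).
USE vignan;

-- List the tables in the database now in use.
SHOW TABLES;
```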
Hive deals with two types of table structures like Internal and External tables depending
Note: If the data to be processed is available in the local file system, we use an internal table.
• There are two types of tables:
• Managed (internal) table
• External table
Internal or Managed table:
By default, any table you create is a managed table.
hive> show tables;
trnxrecords
Customer
hive> describe formatted trnxrecords;
The output reports the table type as MANAGED_TABLE.
A managed table is a table whose data and metadata are managed by Hive.
Location of the table: /user/hive/warehouse/vignan.db/trnxrecords
• To create an internal (managed) table:
hive> CREATE TABLE guruhive_internaltable (id INT, Name STRING)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
• Load the data into the internal table:
hive> LOAD DATA INPATH '/user/guru99hive/data.txt' INTO TABLE guruhive_internaltable;
• Display the content of the table:
hive> SELECT * FROM guruhive_internaltable;
• To drop the internal table:
hive> DROP TABLE guruhive_internaltable;
EXTERNAL TABLE
1. A managed table does not need an explicit location. If you drop a
managed table, both the data and the table definition are deleted.
hive> drop table guruhive_internaltable;   (internal table)
2. An external table is created with an explicit location. If you drop it,
only the table definition is dropped and the data remains available.
hive> drop table guruhive_external;   (external table)
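The slides drop `guruhive_external` without showing how it was created. One way it could have been defined is sketched below; the columns simply mirror `guruhive_internaltable`, and the LOCATION path is a hypothetical HDFS directory chosen for illustration.

```sql
-- Sketch only: columns copied from guruhive_internaltable; the LOCATION
-- path is hypothetical and not taken from these notes.
CREATE EXTERNAL TABLE guruhive_external (id INT, Name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/guru99hive/external_demo';

-- Check the table type; the output should list EXTERNAL_TABLE.
DESCRIBE FORMATTED guruhive_external;

-- Dropping removes only the Metastore entry; the files under LOCATION
-- remain in HDFS and can be attached to a new table later.
DROP TABLE guruhive_external;
```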
ANALYZING DATA USING HIVE
• Let us take some transactional data stored on the Desktop of a Linux
system.
• $ cd Desktop
• Desktop$ vi txnsl.txt
Data inside txnsl.txt:
• 00000004,20-09-2020,40002613,0.98.81,Teamsport,Fieldhockey,Guntur,Andhapradesh,Credit
Fields, in order: Transaction Id, Date of Transaction, Customer Id, Amount Spent in dollars, Category of Sport, Product name, City, State, Credit
Imagine that we have 50,000 records in the file txnsl.txt.
• Let us also take some customer data placed on the Desktop of the Linux system.
• $ cd Desktop
• Desktop$ vi custs.txt
• Data inside custs.txt:
• 4000001,pavan,kumar,55,Pilot
• 50002341,Siva,Prasad,32,Lecturer
• 6234567,Nish,kala, 22, Software engineer
Imagine that we have around 55,000 records like this in the file custs.txt.
CREATE AND LOAD DATA INTO TXNRECORDS TABLE
[Figure: the TXNRECORDS table is a folder/directory; txnsl.txt is a file inside it.]
• We can see this information using commands:
• /user/hive/warehouse/vignan.db/txnrecords : table (directory)
• /txnsl.txt : file inside that directory
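The create and load commands themselves are not preserved in these notes. A minimal sketch of what they could look like follows; the column names and the local path `/home/hduser/Desktop/txnsl.txt` are hypothetical, while the column order follows the sample record in txnsl.txt.

```sql
-- Assumption: we are working in the vignan database.
USE vignan;

-- Hypothetical column names, one per comma-separated field of txnsl.txt.
-- The transaction id is kept as STRING to preserve its leading zeros.
CREATE TABLE txnrecords (
  txnno    STRING,
  txndate  STRING,
  custno   INT,
  amount   DOUBLE,
  category STRING,
  product  STRING,
  city     STRING,
  state    STRING,
  spendby  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOCAL reads from the Linux file system; the path below is hypothetical.
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/txnsl.txt'
OVERWRITE INTO TABLE txnrecords;

-- From the Hive CLI, list the table directory to see the loaded file.
dfs -ls /user/hive/warehouse/vignan.db/txnrecords;
```

Because the table is managed, the loaded file is copied under the table's warehouse directory, which is the folder and file relationship described above.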
b.product
into more manageable parts known as buckets. So, we can use bucketing in
hive> insert overwrite table bucket_cse
Analyze this query and find
• Function Type :