7 Hive Notes
Hive Shell
Type System
Primitive types
– Integers: TINYINT, SMALLINT, INT, BIGINT.
– Boolean: BOOLEAN.
– Floating point numbers: FLOAT, DOUBLE.
– String: STRING.
Complex types
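– Structs: {a INT; b INT}, a record with named, typed fields.
– Maps: M['group'] returns the value stored under the key 'group'.
– Arrays: ['a', 'b', 'c']; indexing is zero-based, so A[1] returns 'b'.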
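As a rough sketch of how the complex types look in practice, the HiveQL below declares one column of each kind and reads from it. The table name complex_demo and the column names s, m and arr are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table with one column per complex type.
CREATE TABLE complex_demo (
  s   STRUCT<a:INT, b:INT>,   -- struct with two named fields
  m   MAP<STRING, INT>,       -- map from string keys to integer values
  arr ARRAY<STRING>           -- ordered list of strings
);

SELECT s.a,          -- struct field access uses dot notation
       m['group'],   -- map lookup by key
       arr[1]        -- zero-based index: the second element
FROM complex_demo;
```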
Data Model Types - Tables
– Analogous to tables in a relational database
– The actual data is stored in a Hadoop filesystem, and each
table has a corresponding HDFS directory
– Metadata is always stored in the metastore
– Managed Tables
• Hive physically moves the data into its warehouse directory
– External Tables
• Hive refers to data at an existing location in HDFS
Example
– Managed Tables
• hive> CREATE TABLE managed_table (dummy STRING);
– External Tables
• hive> CREATE EXTERNAL TABLE external_table (dummy STRING)
LOCATION '/user/tom/external_table';
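The practical difference between the two table kinds shows up when data is loaded and when the table is dropped. The statements below are a minimal sketch of that behaviour; the input path /user/tom/data.txt is a hypothetical file assumed to exist in HDFS.

```sql
-- Managed table: LOAD DATA INPATH moves the file into Hive's warehouse directory.
-- '/user/tom/data.txt' is a hypothetical HDFS path used for illustration.
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;

-- Dropping a managed table deletes its metadata and its data.
DROP TABLE managed_table;

-- Dropping an external table deletes only the metadata;
-- the files under its LOCATION are left in place.
DROP TABLE external_table;
```

An external table is therefore the safer choice when other tools also read the same files.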
To list the schema of a table
DESCRIBE <tablename>;
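For instance, applied to the managed_table created earlier, the command could be used as sketched here. The FORMATTED variant is an addition not mentioned in these notes; it prints extra detail such as the table type and storage location.

```sql
-- Column names and types only.
DESCRIBE managed_table;

-- Adds detailed table information, including whether the table
-- is managed or external and where its data is stored.
DESCRIBE FORMATTED managed_table;
```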
Importing Data into Hive
DDL (contd.)
Query Operations
Hive QL – Group By
pv_users (input)
pageid  age
1       25
2       25
1       32
2       25

Result, grouped by pageid and age
pageid  age  count
1       25   1
2       25   2
1       32   1
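The three-column rows above (pageid, age, count) are consistent with a query along the following lines. The query itself does not appear in these notes, so treat it as an assumed reconstruction; the alias cnt is hypothetical.

```sql
-- Count the rows of pv_users for each distinct (pageid, age) pair.
SELECT pageid, age, COUNT(1) AS cnt
FROM pv_users
GROUP BY pageid, age;
```

Every non-aggregated column in the SELECT list must also appear in the GROUP BY clause, and the order of the output rows is not guaranteed unless an ORDER BY is added.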
Hive QL – Join
• SQL:
SELECT pv.pageid, u.age
FROM page_view pv JOIN user u ON (pv.userid = u.userid);
Outer Joins
• Left Outer Join
SELECT pv.pageid, u.age
FROM page_view pv LEFT OUTER JOIN user u ON (pv.userid = u.userid);
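The left outer join keeps every page_view row and returns NULL for u.age when no user matches. The other outer join variants follow the same pattern, as sketched below with the same two tables. One assumption to note: user is a reserved word in some Hive versions, so it is quoted with backticks here; older versions accept it unquoted, as in the queries above.

```sql
-- Right outer join: keep every user row;
-- pv.pageid is NULL for users with no page views.
SELECT pv.pageid, u.age
FROM page_view pv RIGHT OUTER JOIN `user` u ON (pv.userid = u.userid);

-- Full outer join: keep unmatched rows from both sides.
SELECT pv.pageid, u.age
FROM page_view pv FULL OUTER JOIN `user` u ON (pv.userid = u.userid);
```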