Mod 2
Prepared By: Dr. Kimmi Kumari
Contents
• Introduction to SQOOP
• Hive
• HBase
Overview of SQOOP in Hadoop
• Sqoop is a tool used to perform data transfer operations
• It transfers data from a relational database management system (RDBMS) to the Hadoop server (HDFS)
Features of Sqoop
• Parallel Import/Export
• Import Results of an SQL Query
• Connectors For All Major RDBMS Databases
• Kerberos Security Integration
• Provides Full and Incremental Load
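A minimal sketch of parallel import and importing the results of a SQL query (the database mydb, table orders, and split column id are hypothetical names used only for illustration):
$ sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username root \
  --query 'SELECT * FROM orders WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/orders \
  -m 4
Here -m 4 runs four map tasks in parallel, and $CONDITIONS is a placeholder that Sqoop replaces with the split range assigned to each map task.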
Sqoop Architecture
• The user submits an import/export command to Sqoop
• Sqoop connects to the database and fetches the data
• Map tasks then load the data into HDFS
Sqoop - Import All Tables
• Imports a set of tables from an RDBMS to HDFS
• The following syntax is used to import all tables:
$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)
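A minimal usage sketch, assuming a MySQL database named userdb (the same database referenced below):
$ sqoop import-all-tables \
  --connect jdbc:mysql://localhost/userdb \
  --username root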
Sqoop Workflow
Sqoop - Import All Tables
• Every table in that database must have a primary key field
• To verify the table data imported from the userdb database into HDFS, use the command below:
$HADOOP_HOME/bin/hadoop fs -ls
Sqoop Export
• Exports a set of files from HDFS back to an RDBMS
• The target table must already exist in the database
Modes of Sqoop Export
• Insert mode
• Update mode
Syntax for Sqoop Export
$ sqoop-export (generic-args) (export-args)
Consider the table below.
The following command exports the table data (stored in the emp_data file on HDFS) to the employee table in the db database of the MySQL server.
$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /emp/emp_data
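The command above runs in the default insert mode. A minimal sketch of update mode, assuming the employee table has an id primary-key column:
$ sqoop export \
  --connect jdbc:mysql://localhost/db \
  --username root \
  --table employee \
  --export-dir /emp/emp_data \
  --update-key id \
  --update-mode allowinsert
With --update-mode allowinsert, existing rows are updated by id and new rows are inserted; updateonly updates matching rows and skips the rest.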
Importing data from MySQL to HDFS
Step 1: Login into MySQL
Step 2: Create a database and table and insert data.
Step 3: Create a database and table in Hive into which the data should be imported.
Step 4: Run the import command on Hadoop.
Step 5: Check in Hive whether the data was imported successfully.
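A minimal end-to-end sketch of these steps; the database testdb, table emp, and columns are hypothetical names used only for illustration:
-- Step 2 (MySQL): create the source database, table, and data
CREATE DATABASE testdb;
USE testdb;
CREATE TABLE emp (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO emp VALUES (1, 'Alice'), (2, 'Bob');
-- Step 3 (Hive): create the target database and table
CREATE DATABASE testdb;
CREATE TABLE testdb.emp (id INT, name STRING);
# Step 4 (shell): run the Sqoop import into Hive
$ sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root \
  --table emp \
  --hive-import \
  --hive-table testdb.emp \
  -m 1
-- Step 5 (Hive): verify the imported rows
SELECT * FROM testdb.emp;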
Apache Flume
• Collects, aggregates, and transports streaming data
• Copies streaming data from various web servers to HDFS
• Examples: data generated via social media, email messages, log files, etc.
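A minimal sketch of a Flume agent configuration that tails a web-server log and writes it to HDFS; the agent name a1 and all paths are hypothetical:
# Name the source, channel, and sink of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Source: tail a web-server log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1
# Channel: buffer events in memory
a1.channels.c1.type = memory
# Sink: write events to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/weblogs
a1.sinks.k1.channel = c1
The agent can then be started with: $ flume-ng agent --conf conf --conf-file flume.conf --name a1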
Challenges of PUT Command
Date/Time Types
• TIMESTAMP
• DATE
String Types
• STRING
• VARCHAR
• CHAR
Complex Types
• Struct
• Map
• Array
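A minimal Hive DDL sketch using these types (the table and column names are hypothetical):
CREATE TABLE employee_profile (
  name    STRING,
  code    CHAR(10),
  dept    VARCHAR(64),
  joined  DATE,
  updated TIMESTAMP,
  skills  ARRAY<STRING>,
  phones  MAP<STRING, STRING>,
  address STRUCT<city:STRING, pin:INT>
);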
Hive DDL Commands
• CREATE
• SHOW
• DESCRIBE
• USE
• DROP
• ALTER
• TRUNCATE
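A short sketch of these DDL commands (the database salesdb and table emp are hypothetical):
CREATE DATABASE salesdb;                     -- CREATE
SHOW DATABASES;                              -- SHOW
USE salesdb;                                 -- USE
CREATE TABLE emp (id INT, name STRING, dept STRING);
DESCRIBE emp;                                -- DESCRIBE
ALTER TABLE emp ADD COLUMNS (salary DOUBLE); -- ALTER
TRUNCATE TABLE emp;                          -- TRUNCATE
DROP TABLE emp;                              -- DROP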
Hive DML Commands
• LOAD
• SELECT
• INSERT
• DELETE
• UPDATE
• EXPORT
• IMPORT
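A short sketch of these DML commands (paths and names are hypothetical; UPDATE and DELETE work only on transactional/ACID tables):
LOAD DATA LOCAL INPATH '/tmp/emp.csv' INTO TABLE emp;  -- LOAD
SELECT * FROM emp WHERE dept = 'IT';                   -- SELECT
INSERT INTO TABLE emp VALUES (3, 'Carol', 'HR');       -- INSERT
UPDATE emp SET dept = 'Finance' WHERE id = 3;          -- UPDATE (ACID tables only)
DELETE FROM emp WHERE id = 3;                          -- DELETE (ACID tables only)
EXPORT TABLE emp TO '/tmp/emp_export';                 -- EXPORT
IMPORT TABLE emp_copy FROM '/tmp/emp_export';          -- IMPORT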
Hive Joining tables
• Inner Join
• Left Outer Join
• Right Outer Join
• Full Outer Join
Inner Join in HiveQL
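A minimal inner-join sketch (the customers and orders tables are hypothetical); the outer joins listed above use the same ON-clause syntax with LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN:
SELECT c.id, c.name, o.amount
FROM customers c
JOIN orders o
  ON c.id = o.customer_id;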
Hive Partitioning
• Static Partitioning
• Dynamic Partitioning
Hive Static Partitioning
• Insert input data files individually into a partition table.
• Altering a partition is allowed in static partitioning.
• Static partitioning is used in strict mode.
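A minimal sketch of static partitioning, where each load names its partition explicitly (the sales table and partition values are hypothetical):
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (country STRING);
-- Load a file into one explicitly named (static) partition
LOAD DATA LOCAL INPATH '/tmp/sales_in.csv'
INTO TABLE sales PARTITION (country = 'IN');
-- Altering partitions is allowed, e.g. adding one manually
ALTER TABLE sales ADD PARTITION (country = 'US');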
Hive Dynamic Partitioning
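A minimal sketch of dynamic partitioning, where Hive derives the partition values from the data itself (the staging table sales_stage is hypothetical):
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Partition values are taken from the country column at insert time
INSERT INTO TABLE sales PARTITION (country)
SELECT id, amount, country FROM sales_stage;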
Advantages of Bucketing in Hive
• Efficient sampling
• Faster query responses
• Flexibility to keep the records in each bucket sorted by one or more columns
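A minimal sketch of a bucketed table (the table name and bucket count are hypothetical):
CREATE TABLE emp_bucketed (id INT, name STRING, dept STRING)
CLUSTERED BY (id)
SORTED BY (name)
INTO 4 BUCKETS;
-- Sampling roughly one quarter of the data via buckets
SELECT * FROM emp_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);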
Limitations of Bucketing in Hive