InfoSphere DataStage Hive Connector to Read Data from Hive Data Sources
This article describes a solution based on the integration of IBM InfoSphere DataStage with
Apache Hive. Data can be fetched from various Hive data sources into Information Server
modules for further processing. You will learn how IBM InfoSphere Information Server can be
used to perform read operations on a Hive data source. This step-by-step guide helps you create,
configure, compile, and run DataStage Hive Connector jobs that read data from Apache Hive.
Introduction
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis. Apache Hive supports analysis of large datasets stored in
Hadoop's HDFS and compatible file systems such as Amazon S3 filesystem. It supports queries
expressed in a language called HiveQL, which automatically translates SQL-like queries into
MapReduce jobs executed on Hadoop. An efficient solution is therefore needed to move data from
Hive data sources into the ETL space, where further operations can be performed.
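As an illustration, a HiveQL query reads much like ordinary SQL. The table and column names in
this minimal sketch are hypothetical, not taken from the article's sample job:

    -- Summarize order counts per region; Hive compiles this
    -- SQL-like query into one or more MapReduce jobs.
    SELECT region, COUNT(*) AS order_count
    FROM sales_orders
    GROUP BY region;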
The integration of IBM InfoSphere DataStage with Apache Hive is achieved by the InfoSphere Hive
Connector, a DataStage component. The Hive Connector stage fetches data from Hive and passes
it to other Information Server modules for further ETL processing. This solution helps Hive users
make intelligent business decisions based on their data.
This section demonstrates a sample use case that performs a read operation on Hive using the
Hive Connector stage. The DataStage job includes a Hive Connector stage, which specifies the
details for accessing Apache Hive, and a Sequential File stage, to which the data is extracted. In
read mode, the Hive Connector stage supports only one output link.
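The examples that follow refer to a partitioned Hive table named part_test4 with columns c1 and
c2 and a primary partition column pc1. A table of that shape could be created with DDL along the
following lines; the column types here are assumptions, since the article does not state them:

    -- Hypothetical DDL for the sample table; only the names
    -- part_test4, c1, c2, and pc1 come from this article.
    CREATE TABLE part_test4 (
      c1 INT,
      c2 STRING
    )
    PARTITIONED BY (pc1 INT);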
1. Generated SQL
The following steps describe how to read data from Hive using generated SQL mode.
Figure 1. Hive Connector Read job
1. On the Properties tab, set "Generated SQL at run time" to Yes and provide a value for "Table
name", as shown below.
2. If the table is partitioned and you want to take advantage of parallelism in the form of
partitioned reads, set "Enable Partitioned Reads" to Yes.
Figure 2. Generated SQL Read properties
3. The connector uses the primary partition column to achieve parallelism. In this case, pc1 is
the primary partition column, and the generated statements have the following format, where
the predicate on pc1 selects one partition (see the expanded example after this list):
select c1, c2 from part_test4 where pc1=1
4. Note that the statement generated by the Hive Connector is in regular SQL format, not in
HiveQL format. The conversion from SQL to HiveQL is handled internally by the driver.
5. Under Output, provide the names and types of the columns that you want to extract, as
follows:
Figure 3. Column Properties
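To expand on step 3: with partitioned reads enabled, the connector generates one statement per
partition value so that different partitions can be read in parallel. Assuming, purely for
illustration, that part_test4 has partitions pc1=1, pc1=2, and pc1=3, the generated statements
would look like this:

    -- One generated statement per partition value (values assumed):
    select c1, c2 from part_test4 where pc1=1
    select c1, c2 from part_test4 where pc1=2
    select c1, c2 from part_test4 where pc1=3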
2. User-defined SQL
The steps for reading data from Hive using user-defined SQL mode are similar, except that
instead of having the connector generate the statement at run time, you supply your own SELECT
statement in the stage properties.
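As a minimal sketch, a user-defined statement for the same hypothetical table could be an
ordinary SELECT, optionally with a filter; the predicate below is purely illustrative:

    -- Hypothetical user-defined query; the filter on c1
    -- is illustrative, not part of the original article.
    SELECT c1, c2
    FROM part_test4
    WHERE c1 > 100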
Resources
• IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.hive.usage.doc/topics/hive_connector_top_of_nav.html
Alekhya Telekicherla
Pallavi Koganti
Srinivas Mudigonda