Talend Tutorial Task Aid >

Writing and Reading Data in HDFS

This tutorial demonstrates how to write data to HDFS and read it back using Talend Data Fabric. It involves:
1. Generating random customer data with a tRowGenerator component.
2. Writing the data to HDFS with a tHDFSOutput component.
3. Reading the data back from HDFS with a tHDFSInput component that uses the same schema.
4. Sorting the data by customer ID with a tSortRow component.
5. Displaying the sorted data in the console with a tLogRow component.
The Job is then run to test the process.

This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4.

1. Create a new standard Job


a. Ensure that the Integration perspective is selected.
b. To confirm that the Hadoop cluster connection and the HDFS connection metadata have been created in the Project Repository, expand Hadoop Cluster.
c. In the Repository, expand Job Designs, right-click Standard, and click Create Standard Job. In the Name field of the New Job wizard, type ReadWriteHDFS. In the Purpose field, type Read/Write data in HDFS. In the Description field, type Standard job to write and read customers data to and from HDFS, then click Finish.
The Job opens in the Job Designer.

2. Add and configure a tRowGenerator component to generate random customer data

a. To generate random customer data, add a tRowGenerator component in the Job Designer.
b. To set the schema and function parameters for the tRowGenerator component, double-click the tRowGenerator_1 component.
c. To add columns to the schema, click the [+] icon three times and name the columns CustomerID, FirstName, and LastName. Next, you will configure the attributes of these fields.
d. To change the Type of the CustomerID column, click its Type field and select Integer. Then set the Functions field of the three columns to Numeric.random(int,int), TalendDataGenerator.getFirstName(), and TalendDataGenerator.getLastName(), respectively.
e. In the table, select the CustomerID column. Then, in the Function parameters tab, set the max value to 1000.
f. In the Number of Rows for RowGenerator field, type 1000, and click OK to save the configuration.
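For orientation: Talend Jobs compile to Java, so the tRowGenerator configuration above behaves roughly like the following sketch. The name pools are illustrative placeholders, not Talend's built-in lists; Numeric.random(min,max) returns a random integer in the given range, and the ';' separator is only for display here.

    import java.util.Random;

    public class CustomerGenerator {
        // Illustrative name pools; TalendDataGenerator draws from its own built-in lists.
        private static final String[] FIRST_NAMES = {"Ava", "Noah", "Mia", "Liam", "Zoe"};
        private static final String[] LAST_NAMES  = {"Smith", "Jones", "Garcia", "Chen", "Patel"};

        public static void main(String[] args) {
            Random rnd = new Random();
            for (int i = 0; i < 1000; i++) {        // 1000 rows, as configured above
                int customerId = rnd.nextInt(1001); // ~ Numeric.random(0, 1000)
                String first = FIRST_NAMES[rnd.nextInt(FIRST_NAMES.length)];
                String last = LAST_NAMES[rnd.nextInt(LAST_NAMES.length)];
                System.out.println(customerId + ";" + first + ";" + last);
            }
        }
    }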


3. Write data to HDFS


For this, you will create a new tHDFSOutput component that reuses the existing HDFS metadata available in the Project Repository.
a. From the Repository, under Metadata > HadoopCluster > MyHadoopCluster > HDFS, click MyHadoopCluster_HDFS and drag it to the Job Designer.
b. In the Components list, select tHDFSOutput and click OK.
c. Create a flow of data from the tRowGenerator_1 component to the MyHadoopCluster_HDFS component by linking the two with the Main row. Then double-click the MyHadoopCluster_HDFS component to open the Component view.
Note that the component is already configured with the predefined HDFS metadata connection information.
d. In the File Name box, type /user/student/CustomersData, and in the Action list, select Overwrite.
The first subjob, which writes data to HDFS, is now complete. It takes the data generated by the tRowGenerator you created earlier and writes it to HDFS over a connection defined in metadata.
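Outside of Talend Studio, the same write can be expressed directly against the Hadoop HDFS Java API. This is a minimal sketch, not the code Talend generates; the NameNode URI and the ';' field separator are assumptions, since in the Job both come from the repository metadata and the component settings.

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteCustomersToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020"); // assumed NameNode address

            try (FileSystem fs = FileSystem.get(conf);
                 // The boolean argument mirrors the Overwrite action selected above.
                 FSDataOutputStream out = fs.create(new Path("/user/student/CustomersData"), true);
                 BufferedWriter writer = new BufferedWriter(
                         new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                writer.write("42;Ava;Smith"); // one customer record per line
                writer.newLine();
            }
        }
    }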

4. Read data from HDFS


Next, you will build a subjob that reads the customer data from HDFS, sorts it, and displays it in the console. To read the customer data from HDFS, you will create a new tHDFSInput component that reuses the existing HDFS metadata available in the Project Repository.
a. From the Repository, under Metadata > HadoopCluster > MyHadoopCluster > HDFS, click MyHadoopCluster_HDFS and drag it to the Job Designer.
b. In the Components list, select tHDFSInput and click OK.
c. To open the Component view of the MyHadoopCluster_HDFS input component, double-click it.
Note that the component is already configured with the predefined HDFS metadata connection information.
d. In the File Name box, type /user/student/CustomersData.
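The read side mirrors the write. A minimal sketch with the same assumed NameNode address, again illustrating the mechanics rather than Talend's generated code:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadCustomersFromHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020"); // assumed NameNode address

            try (FileSystem fs = FileSystem.get(conf);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(fs.open(new Path("/user/student/CustomersData")),
                                               StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // each line holds one customer record
                }
            }
        }
    }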


5. Specify the schema in the MyHadoopCluster_HDFS input component to read the data from HDFS

a. To open the schema editor, in the Component view of the MyHadoopCluster_HDFS input component, click Edit schema.
b. To add columns to the schema, click the [+] icon three times and name the columns CustomerID, FirstName, and LastName.
c. To change the Type of the CustomerID column, click its Type field and select Integer.
Note: This schema is the same as in tRowGenerator and tHDFSOutput. You can copy it from either of those components and paste it into this schema.
d. Connect the tRowGenerator component to the MyHadoopCluster_HDFS input component using the OnSubjobOk trigger.
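In generated code, a schema amounts to a row class with one typed field per column, which is why the input schema must match what tHDFSOutput wrote. As a mental model only (the class and the ';' separator are illustrative, not Talend's generated code):

    // Illustrative row type matching the three-column schema.
    public class CustomerRow {
        public int customerId;   // CustomerID: Integer
        public String firstName; // FirstName: String
        public String lastName;  // LastName: String

        // Parse one line as written by the first subjob; ';' separator is an assumption.
        public static CustomerRow parse(String line) {
            String[] fields = line.split(";", -1);
            CustomerRow row = new CustomerRow();
            row.customerId = Integer.parseInt(fields[0]);
            row.firstName = fields[1];
            row.lastName = fields[2];
            return row;
        }
    }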

6. Sort data in ascending order of customer ID using the tSortRow component

a. Add a tSortRow component and connect it to the MyHadoopCluster_HDFS input component with the Main row.
b. To open the Component view of the tSortRow component, double-click the component.
c. To configure the schema, click Sync columns.
d. To add a new criterion to the Criteria table, click the [+] icon. In the Schema column, type CustomerID; in the sort num or alpha? column, select num; and in the Order asc or desc? column, select asc.
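Conceptually, this tSortRow configuration is a numeric ascending sort on the CustomerID field. A plain-Java sketch of the same idea, reusing the assumed ';' separator:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class SortCustomers {
        public static void main(String[] args) {
            List<String> lines = new ArrayList<>(List.of(
                    "847;Noah;Chen", "12;Ava;Smith", "305;Mia;Patel"));

            // num + asc in the Criteria table: compare CustomerID as an integer, ascending.
            lines.sort(Comparator.comparingInt(line -> Integer.parseInt(line.split(";")[0])));

            lines.forEach(System.out::println); // 12;... then 305;... then 847;...
        }
    }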

7. Display the sorted data in the console using a tLogRow component

a. Add a tLogRow component and connect it to the tSortRow component with the Main row.
b. To open the Component view of the tLogRow component, double-click the component.
c. In the Mode panel, select Table.
Your Job is now ready to run. First, it generates data and writes it to HDFS. Then it reads the data from HDFS, sorts it, and displays it in the console.
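In Table mode, tLogRow prints each row in an aligned grid with a header. A rough plain-Java approximation of that console output (the column widths are arbitrary):

    import java.util.List;

    public class TablePrinter {
        public static void main(String[] args) {
            List<String[]> rows = List.of(
                    new String[]{"12", "Ava", "Smith"},
                    new String[]{"847", "Noah", "Chen"});

            // Header and rows in fixed-width columns, similar to tLogRow's Table mode.
            System.out.printf("%-12s|%-12s|%-12s%n", "CustomerID", "FirstName", "LastName");
            for (String[] row : rows) {
                System.out.printf("%-12s|%-12s|%-12s%n", row[0], row[1], row[2]);
            }
        }
    }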


8. Run the Job and observe the result in the console


a. To run the Job, in the Run view, click Run.
The sorted data is displayed in the console.
