Install Guide Pentaho Data Integration With MySQL Database
Global Open Versity Pentaho Business Intelligence BI Suite Training Manual Part II Install Guide Pentaho Data Integration (Kettle) with MySQL
Kefa Rabah, Global Open Versity, Vancouver, Canada, www.globalopenversity.org
Table of Contents
INSTALL GUIDE PENTAHO DATA INTEGRATION (KETTLE) WITH MYSQL
Introduction
Background Information
Part 1: Starting MySQL Server
Part 2: Download & Install Pentaho Data Integration (Kettle)
Part 4: Hands-On Lab Assignment 1
References
Part 5: Need More Training on Windows Data Warehousing and BI Principles using Pentaho BI
Other Related Training
Part 6: Hands-on Labs Assignments
A GOV Open Access Technical Academic Publications Enhancing education & empowering people worldwide through eLearning in the 21st Century
April 2007, Kefa Rabah, Global Open Versity, Vancouver Canada
www.globalopenversity.org
By Kefa Rabah, Sept. 13, 2010
GTS Institute
Introduction
The Pentaho BI Project is open source application software that provides enterprise reporting, OLAP analysis, dashboards, data mining, workflow, and ETL capabilities for a Business Intelligence (BI) platform. These capabilities have made it the world's leading and most widely deployed open source BI suite. It also offers self-service dashboard design for business users and cloud computing support for IT. In Part I of this guide we showed you how to install Pentaho Business Intelligence BI Suite CE server with MySQL, Report Designer CE, and Design Studio CE on a Linux machine. It also covered how to set up Pentaho Data Integration (Kettle). In this second part of the series, we'll continue working with Pentaho Data Integration and show you how to build a simple input-output transformation using your own data source from a MySQL database. This guide assumes you have some basic knowledge of Linux and MySQL.
Background Information
Data integration focuses mainly on databases. A database is an organized collection of data. It's similar to a file system, which is an organizational structure for files so they're easy to find, access, and manipulate. Pentaho Data Integration (PDI) is a powerful, metadata-driven ETL tool designed to bridge the gap between business and IT.

Kettle is an acronym for "Kettle E.T.T.L. Environment." Kettle is designed to help you with your ETTL needs, which include the Extraction, Transformation, Transportation and Loading of data. Kettle itself is part of the Pentaho BI applications suite. It was an independent project initiated by Matt Casters until it was acquired by Pentaho in 2006. Ever since, Kettle has also been known as Pentaho Data Integration (PDI). Matt himself still leads PDI project development at Pentaho.

Kettle comprises four applications:
Spoon - graphical designer for designing job and transformation schemes. It is based on SWT (the Eclipse Standard Widget Toolkit), not Swing.
Pan - script that is used to execute a transformation scheme from a .ktr XML file or from a repository.
Kitchen - script that is used to execute a job scheme from a .kjb XML file or from a repository.
Carte - a lightweight web server that is used to execute jobs and transformations remotely, in a cluster or in parallel.
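To make the roles of Pan and Kitchen more concrete, here is a minimal sketch of how they are typically invoked from a shell. It only prints the command lines instead of running them, so it can be tried without PDI installed. The install location in PDI_HOME and the .ktr/.kjb file names are hypothetical and are not taken from this guide; adjust them to your own setup.

```shell
#!/bin/sh
# Sketch only: PDI_HOME and the .ktr/.kjb paths below are hypothetical.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

# Build the Pan command line for a transformation stored as a .ktr file.
pan_cmd() {
    printf '%s/pan.sh -file=%s -level=Basic\n' "$PDI_HOME" "$1"
}

# Build the Kitchen command line for a job stored as a .kjb file.
kitchen_cmd() {
    printf '%s/kitchen.sh -file=%s -level=Basic\n' "$PDI_HOME" "$1"
}

# Print the commands rather than executing them.
pan_cmd /home/pdi/transformations/load_accounts.ktr
kitchen_cmd /home/pdi/jobs/nightly_load.kjb

# A crontab entry along these lines could run the job every night at 02:00:
# 0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/home/pdi/jobs/nightly_load.kjb -level=Basic
```

Printing the command first is a deliberate choice: it lets you check paths and options before handing the same line to cron or another scheduler.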
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools Pan and Kitchen. Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data sources. Kitchen is a program that executes jobs designed in Spoon and stored as XML files or in a database repository. Jobs are usually scheduled in batch mode to be run automatically at regular intervals.

Transformations and jobs can be saved as XML files or stored in a Kettle database repository. Pan or Kitchen can then read these definitions to execute the steps described in the transformation or to run the job. In summary, Pentaho Data Integration makes data warehouses easier to build, update, and maintain.

E.T.L. and Data Warehousing - being an ETL tool, Kettle is an environment that's designed to:
- collect data from a variety of sources (extraction)
- move and modify data (transport and transform) while cleansing, denormalizing, aggregating and enriching it in the process
- frequently (typically on a daily basis) store data (loading) in the final target destination, which is usually a large, dimensionally modeled database called a data warehouse
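The three ETL stages listed above can be illustrated without Kettle at all. The following is a conceptual sketch in plain shell and awk, not a PDI transformation: the CSV file, its column names, and the sample rows are all invented for illustration, and the "warehouse" is just an output file.

```shell
#!/bin/sh
# Conceptual ETL sketch; file names, columns, and rows are hypothetical.
workdir=$(mktemp -d)

# Extract: a small CSV standing in for a source system.
cat > "$workdir/accounts_raw.csv" <<'EOF'
account_id,branch,balance
1001, vancouver ,250.00
1002,Vancouver,100.50
1003,nairobi,75.25
EOF

# Transform: skip the header, trim and upper-case the branch name (cleansing),
# then sum the balances per branch (aggregation).
awk -F',' 'NR > 1 {
    branch = $2
    gsub(/^ +| +$/, "", branch)
    branch = toupper(branch)
    total[branch] += $3
}
END {
    for (b in total) printf "%s,%.2f\n", b, total[b]
}' "$workdir/accounts_raw.csv" | sort > "$workdir/branch_totals.csv"

# Load: here the target is only a file; a real load would write to a warehouse table.
cat "$workdir/branch_totals.csv"
```

In Kettle, each of these stages would be a step in a transformation designed in Spoon, with a database table as the final target.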
4. Create a Pentaho BI database user "pbiuser" and a "bankdb" database. We're going to use them later in Part 9.
5. Now let's test the login capability of the Pentaho BI user, "pbiuser":
[root@fc10ds ~]# mysql -u pbiuser -ppassword
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| bankdb             |
| jportaldb          |
| mytestdb           |
| osmsdb             |
| phpwebdb           |
| storedb            |
+--------------------+
14 rows in set (0.37 sec)

mysql>
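For reference, step 4 (creating the "pbiuser" account and the "bankdb" database) could be carried out roughly as sketched below. The guide does not show the exact statements it used, so the host 'localhost', the password 'password', and the scope of the grant (bankdb only) are assumptions. The script only writes the SQL to a temporary file; the commands that would apply and verify it are left as comments.

```shell
#!/bin/sh
# Sketch only: host, password, and grant scope are assumptions, not the guide's exact setup.
sql_file=$(mktemp)

cat > "$sql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS bankdb;
CREATE USER 'pbiuser'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON bankdb.* TO 'pbiuser'@'localhost';
EOF

echo "SQL written to $sql_file"

# To apply it as the MySQL root user (prompts for the root password):
#   mysql -u root -p < "$sql_file"
# To verify the new account without putting its password on the command line:
#   mysql -u pbiuser -p -e 'SHOW DATABASES;'
```

Using -p without an attached password makes the client prompt for it, which keeps the password out of the shell history and the process list.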
So now we're good to go, as our "pbiuser" can log in to the MySQL server and perform all privileged operations. In the next section you will learn how to connect to a MySQL database repository and deploy your own repository.
Fig. 1
4. You'll be presented with the Repository Connection dialog box. You have the option to enter a repository name, if you have one, or you can click on the icon to connect to a data repository; see Fig. 2.
5. Now, click on the icon to access the "Select the repository type" dialog box, as shown in Fig. 3. Select the first option as shown and then click OK.
6. You'll be presented with the Repository information dialog box, as shown in Fig. 4. Click on the New button at the top-right-hand corner.
7. Follow the link below to access the full document.
The full document has moved to Docstoc.com. You can access and download it from here: Install Guide Pentaho BI Data Integration (Spoon) with MySQL
OR http://www.docstoc.com/docs/31451411/?key=NzFiYmMyZDgt&pass=MTY5ZS00OTdj
URL: www.globalopenversity.org
Other Related Articles & Hands-on Lab Manuals:
1. Install Guide for Pentaho Business Intelligence BI Suite CE
2. Install & Configure Apache PHP PostgreSQL & MySQL on Linux v1.1
3. Connecting Tomcat AS to MySQL and Oracle 10g XE DBs on Linux Using JDBC
4. Installing & Configuring Oracle Database10g XE on Linux CentOS5 v1.0
5. Using Webmin and Bind9 to Setup DNS Server on Linux
6. Build your own ISP Hosting using EHCP on Ubuntu 10.04 LTS Server
7. Build your Own Private Data Center Backup Solutions using Ubuntu Powered RESTORE Backup Server v1.0
8. Install & Setup Astaro Security Gateway to Protect Corporate Network v1.1
-----------------------------------------------
Kefa Rabah is the Founder of Global Technology Solutions Institute. Kefa is knowledgeable in several fields of Science & Technology, Information Security Compliance and Project Management, and Renewable Energy Systems. He is also the founder of Global Open Versity, a place to advance your educational and career goals using the latest innovations and technologies.