
LAB 03 : Creating an ETL Solution with SSIS

Scenario
In this lab, you will focus on the extraction of customer and sales order data from the InternetSales database
used by the company’s e-commerce site, which you must load into the Staging database.
This database contains customer data (in a table named Customers), and sales order data (in tables named
SalesOrderHeader and SalesOrderDetail).
You will extract sales order data at the line item level of granularity. The total sales amount for each sales
order line item is then calculated by multiplying the unit price of the product purchased by the quantity
ordered. Additionally, the sales order data includes only the ID of the product purchased, so your data flow
must look up the details of each product in a separate Products database.
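
The data flow you will build in Exercise 3 is conceptually equivalent to the Transact-SQL sketch below. It is for illustration only: the lab supplies the real extraction and lookup queries as InternetSales.sql and Products.sql, and the join key between the header and detail tables, the Products.dbo.Product table name, and the ProductName column are assumptions, not taken from the lab files.

-- Illustrative sketch only (not one of the lab's query files):
-- line-item extraction with a calculated sales amount and a product lookup.
SELECT soh.OrderDate,
       sod.ProductKey,
       sod.OrderQuantity,
       sod.UnitPrice,
       sod.UnitPrice * sod.OrderQuantity AS SalesAmount   -- unit price x quantity
FROM InternetSales.dbo.SalesOrderHeader AS soh
JOIN InternetSales.dbo.SalesOrderDetail AS sod
  ON sod.SalesOrderID = soh.SalesOrderID                  -- assumed join key
LEFT JOIN Products.dbo.Product AS p                       -- assumed table name
  ON p.ProductKey = sod.ProductKey;

In the package itself, the calculation is performed by a Derived Column transformation and the product lookup by a Lookup transformation, rather than by a single query.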

Objectives
After completing this lab, you will be able to :
• Extract and profile source data.
• Implement a data flow.
• Use transformations in a data flow.
Estimated Time : 60 Minutes

Exercise 1 : Exploring Source Data


Scenario
In the previous lab, you designed a data warehouse schema for Adventure Works Cycles, and now you must
design an ETL process to populate it with data from various source systems.
Before creating the ETL solution, you have decided to examine the source data so you can understand it
better.
The main tasks for this exercise are as follows :
1. Prepare the Lab Environment
2. Extract and View Sample Source Data
3. Profile Source Data

Task 1 : Prepare the Lab Environment


1. Start SQL Server Management Studio and connect to the (local) instance of the SQL Server database
engine by using Windows authentication.
2. Restore all the files in the « LABS-Atelier SID\Lab03\BackupFiles » folder. Hint: refer to the
« Préparation Environnement Labxx » file in the « LABS-Atelier SID » folder for help.

Task 2 : Extract and View Sample Source Data


1. On the Start screen, type Import and Export and then start the SQL Server 2014 Import and Export
Data (64-bit) app.
2. On the Welcome to SQL Server Import and Export Wizard page, click Next.
3. On the Choose a Data Source page, specify the following options, and then click Next :
• Data source : SQL Server Native Client 11.0
• Server name: localhost
• Authentication: Use Windows Authentication
• Database: InternetSales
4. On the Choose a Destination page, set the following options, and then click Next :
• Destination: Flat File Destination
• File name: …\LABS-Atelier SID\Lab03\Top 1000 Customers.csv
• Locale: English (United States)
• Unicode: Unselected
• Code page: 1252 (ANSI – Latin 1)
• Format: Delimited
• Text qualifier: " (a quotation mark)
• Column names in the first data row: Selected
5. On the Specify Table Copy or Query page, select Write a query to specify the data to transfer,
and then click Next.
6. On the Provide a Source Query page, enter the following Transact-SQL code, and then click Next:
SELECT TOP 1000 * FROM Customers
7. On the Configure Flat File Destination page, select the following options, and then click Next:
• Source query: [Query]
• Row delimiter: {CR}{LF}
• Column delimiter: Comma {,}
8. On the Save and Run Package page, select only Run immediately, and then click Next.
9. On the Complete the Wizard page, click Finish.
10. When the data extraction has completed successfully, click Close.
11. Double-click the Top 1000 Customers.csv file in the « …\LABS-Atelier SID\Lab03 » folder to open it.
12. Examine the data, noting the columns that exist in the Customers table and the range of data values
they contain, and then close Excel without saving any changes.

Task 3 : Profile Source Data

1. Start Visual Studio, and on the File menu, point to New, and then click Project.
2. In the New Project dialog box, select the following values, and then click OK :
• Project Template: Integration Services Project
• Name: Explore Internet Sales
• Location: « …\LABS-Atelier SID\Lab03 »
• Create directory for solution: Selected
• Solution name: Explore Internet Sales
3. If the Getting Started (SSIS) window is displayed, close it.
4. In the Solution Explorer pane, right-click Connection Managers, and then click New Connection
Manager.
5. In the Add SSIS Connection Manager dialog box, click ADO.NET, and then click Add.
6. In the Configure ADO.NET Connection Manager dialog box, click New.
7. In the Connection Manager dialog box, enter the following values, and then click OK :
• Server name: localhost
• Log on to the server : Use Windows Authentication
• Select or enter a database name: InternetSales
8. In the Configure ADO.NET Connection Manager dialog box, verify that a data connection named
localhost.InternetSales is listed, and then click OK.
9. If the SSIS Toolbox pane is not visible, on the SSIS menu, click SSIS Toolbox. Then, in the SSIS
Toolbox pane, in the Common section, double-click Data Profiling Task to add it to the Control
Flow surface. Alternatively, you can drag the task icon to the Control Flow surface.
10. Double-click the Data Profiling Task icon on the Control Flow surface to open its editor.
11. In the Data Profiling Task Editor dialog box, on the General tab, in the Destination property value
drop-down list, click <New File connection…>.
12. In the File Connection Manager Editor dialog box, in the Usage type drop-down list, click Create
file.
13. In the File box, type « …\LABS-Atelier SID\Lab03\ETL\Internet Sales Data Profile.xml », and then
click OK.
14. In the Data Profiling Task Editor dialog box, on the Profile Requests tab, in the Profile Type
dropdown list, select Column Statistics Profile Request, and then click the RequestID column.
15. In the Request Properties pane, set the following property values. Do not click OK when finished:
• ConnectionManager: localhost.InternetSales
• TableOrView: [dbo].[SalesOrderHeader]
• Column: OrderDate
16. In the Data Profiling Task Editor dialog box, on the Profile Requests tab, in the Profile Type
dropdown list for the empty row under the profile you just added, select Column Length
Distribution Profile Request, and then click the RequestID column.
17. In the Request Properties pane, set the following property values. Do not click OK when finished:
• ConnectionManager: localhost.InternetSales
• TableOrView: [dbo].[Customers]
• Column: AddressLine1
• IgnoreLeadingSpaces: False
• IgnoreTrailingSpaces: True
18. In the Data Profiling Task Editor dialog box, on the Profile Requests tab, in the Profile Type
dropdown list for the empty row under the profile you just added, select Column Null Ratio Profile
Request, and then click the RequestID column.
19. In the Request Properties pane, set the following property values. Do not click OK when finished:
• ConnectionManager: localhost.InternetSales
• TableOrView: [dbo].[Customers]
• Column: AddressLine2
20. In the Data Profiling Task Editor dialog box, on the Profile Requests tab, in the Profile Type
dropdown list for the empty row under the profile you just added, select Value Inclusion Profile
Request, and then click the RequestID column.
21. In the Request Properties pane, set the following property values :
• ConnectionManager: localhost.InternetSales
• SubsetTableOrView: [dbo].[SalesOrderHeader]
• SupersetTableOrView: [dbo].[PaymentTypes]
• InclusionColumns:
o Subset side Columns: PaymentType
o Superset side Columns: PaymentTypeKey
• InclusionThresholdSetting: None
• SupersetColumnsKeyThresholdSetting: None
• MaxNumberOfViolations: 100
22. In the Data Profiling Task Editor dialog box, click OK.
23. On the Debug menu, click Start Debugging.
24. When the Data Profiling task has completed, with the package still running, double-click the Data
Profiling task, and then click Open Profile Viewer.
25. Maximize the Data Profile Viewer window, and under the [dbo].[SalesOrderHeader] table, click
Column Statistics Profiles. Then review the minimum and maximum values for the OrderDate
column.
26. Under the [dbo].[Customers] table, click Column Length Distribution Profiles and click the
AddressLine1 column to view the statistics. Click the bar chart for any of the column lengths, and
then click the Drill Down button to view the source data that matches the selected column length.
27. Under the [dbo].[Customers] table, click Column Null Ratio Profiles and view the null statistics for
the AddressLine2 column. Select the AddressLine2 column, and then click the Drill Down button
to view the source data.
28. Under the [dbo].[SalesOrderHeader] table, click Inclusion Profiles and review the inclusion
statistics for the PaymentType column. Select the inclusion violation for the payment type value of
0, and then click the Drill Down button to view the source data.
29. Close the Data Profile Viewer window, and then in the Data Profiling Task Editor dialog box, click
Cancel.
30. On the Debug menu, click Stop Debugging.
31. Close Visual Studio, saving your changes if you are prompted.
Results: After this exercise, you should have a comma-separated text file that contains a sample of customer
data, and a data profile report that shows statistics for data in the InternetSales database.
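
If you want to sanity-check the profile report, each profile request you configured in Task 3 corresponds roughly to a simple aggregate query. The following Transact-SQL is a hedged sketch (it is not part of the lab files) using the tables and columns named in the task:

-- Column Null Ratio Profile for Customers.AddressLine2 (approximate equivalent)
SELECT COUNT(*) AS TotalRows,
       SUM(CASE WHEN AddressLine2 IS NULL THEN 1 ELSE 0 END) AS NullRows
FROM InternetSales.dbo.Customers;

-- Column Length Distribution Profile for Customers.AddressLine1
-- (LEN ignores trailing spaces, which matches IgnoreTrailingSpaces = True)
SELECT LEN(AddressLine1) AS ColumnLength, COUNT(*) AS RowsWithLength
FROM InternetSales.dbo.Customers
GROUP BY LEN(AddressLine1)
ORDER BY ColumnLength;

-- Value Inclusion Profile: PaymentType values in SalesOrderHeader
-- that have no matching PaymentTypeKey in PaymentTypes
SELECT DISTINCT soh.PaymentType
FROM InternetSales.dbo.SalesOrderHeader AS soh
WHERE NOT EXISTS (SELECT 1 FROM InternetSales.dbo.PaymentTypes AS pt
                  WHERE pt.PaymentTypeKey = soh.PaymentType);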

Exercise 2 : Transferring Data by Using a Data Flow Task

Scenario
Now that you have explored the source data in the InternetSales database, you are ready to start
implementing data flows for the ETL process. A colleague has already implemented data flows for reseller
sales data, and you plan to model your Internet sales data flows on those.
The main tasks for this exercise are as follows:
1. Examine an Existing Data Flow
2. Create a Data Flow task
3. Add a Data Source to a Data Flow
4. Add a Data Destination to a Data flow
5. Test the Data Flow Task

Task 1 : Examine an Existing Data Flow


1. Open the « …\LABS-Atelier SID\Lab03\Ex2\AdventureWorksETL.sln » solution in Visual Studio.
2. Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two
Data Flow tasks.
3. On the Data Flow tab, view the Extract Resellers task and note that it contains a source named
Resellers and a destination named Staging DB.
4. Examine the Resellers source, noting the connection manager that it uses, the source of the data,
and the columns that its output contains.
5. Examine the Staging DB destination, noting the connection manager that it uses, the destination
table for the data, and the mapping of input columns to destination columns.
6. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data
flow as it runs, noting the number of rows transferred.
7. When the data flow has completed, stop the debugging session.

N.B. If an error is generated when executing the project, you should change the value of the « file name »
property of the « Orphaned Reseller Sales » connection manager to the path of the « Orphaned Reseller Sales.csv » file.

Task 2 : Create a Data Flow task

1. In Solution Explorer, right-click SSIS Packages, and then click New SSIS Package.
2. Right-click Package1.dtsx, click Rename, and then change the name of the package to Extract
Internet Sales Data.dtsx.
3. With the Extract Internet Sales Data.dtsx package open, and the Control Flow surface visible, in the
SSIS Toolbox pane, double-click Data Flow Task, and then drag the new Data Flow task to the center
of the Control Flow surface.
4. Right-click Data Flow Task on the Control Flow surface, click Rename, and then change the name
of the task to Extract Customers.
5. Double-click the Extract Customers Data Flow task to view the Data Flow surface.

Task 3 : Add a Data Source to a Data Flow


1. In Solution Explorer, right-click Connection Managers, and then click New Connection Manager.
2. In the Add SSIS Connection Manager dialog box, click OLEDB, and then click Add.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, enter the following values, and then click OK:
• Server name: localhost
• Log on to the server: Use Windows Authentication
• Select or enter a database name: InternetSales
Note: When you create a connection manager, it is named automatically based on the server and
database name, for example, localhost.InternetSales. If you have previously created a connection
manager for the same database, a name such as localhost.InternetSales1 may be generated.
5. In the Configure OLE DB Connection Manager dialog box, verify that a new data connection is
listed, and then click OK.
6. In the SSIS Toolbox pane, in the Favorites section, double-click Source Assistant.
7. In the Source Assistant - Add New Source dialog box, in the Select source type list, select SQL
Server, in the Select connection manager list, select the connection manager for the InternetSales
database that you created previously (it may be named localhost.InternetSales1), and then click OK.
8. Drag the new OLE DB Source data source to the center of the Data Flow surface, right-click it, click
Rename, and then change the name of the data source to Customers.
9. Double-click the Customers source, set the following configuration values, and then click OK :
• On the Connection Manager page, ensure that the OLE DB connection manager for the
localhost.InternetSales database is selected, ensure that the Table or view data access mode
is selected, and then in the Name of the table or the view drop-down list, click
[dbo].[Customers].
• On the Columns tab, ensure that every column from the Customers table is selected, and that
the output columns have the same names as the source columns.

Task 4 : Add a Data Destination to a Data Flow


1. In the SSIS Toolbox pane, in the Favorites section, double-click Destination Assistant.
2. In the Destination Assistant - Add New Destination dialog box, in the Select destination type list,
click SQL Server. In the Select connection manager list, click localhost.Staging, and then click OK.
3. Drag the new OLE DB Destination data destination below the Customers data source, right-click
it, click Rename, and then change the name of the data destination to Staging DB.
4. On the Data Flow surface, click the Customers source, and then drag the blue arrow from the
Customers data source to the Staging DB destination.
5. Double-click the Staging DB destination, set the following configuration values, and then click OK :
• On the Connection Manager page, ensure that the localhost.Staging OLE DB connection
manager is selected, ensure that the Table or view – fast load data access mode is selected. In
the Name of the table or the view drop-down list, click [dbo].[Customers], and then click Keep
nulls.
• On the Mappings tab, drag the CustomerKey column from the list of available input columns
to the CustomerBusinessKey column in the list of available destination columns. Then verify
that all other input columns are mapped to destination columns of the same name.
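
The Extract Customers data flow you have just configured is roughly equivalent to selecting the rows below and bulk-inserting them into the Staging database's Customers table. This is only a sketch of the column mapping, not something you need to run; the staging table's full column list comes from the Staging database you restored earlier, and the remaining columns are assumed to keep their source names.

-- Sketch of the Extract Customers mapping: the source key is renamed
-- to the staging business-key column, all other columns keep their names.
SELECT CustomerKey AS CustomerBusinessKey /*, other columns unchanged */
FROM InternetSales.dbo.Customers;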

Task 5 : Test the Data Flow Task


1. Right-click anywhere on the Data Flow surface, click Execute Task, and then observe the task as it
runs, noting how many rows are transferred.
2. On the Debug menu, click Stop Debugging.
3. Close Visual Studio.

Results: After this exercise, you should have an SSIS package that contains a single Data Flow task, which
extracts customer records from the InternetSales database and inserts them into the Staging database.

Exercise 3 : Using Transformations in a Data Flow


Scenario
You have implemented a simple data flow to transfer customer data to the staging database. Now you must
implement a data flow for Internet sales records. The new data flow must add a new column that contains the
total sales amount for each line item (which is derived by multiplying the unit price by the quantity of units
purchased), and use a product key value to find additional data in a separate Products database. Once again,
you will model your solution on a data flow that a colleague has already implemented for reseller sales data.
The main tasks for this exercise are as follows:
1. Examine an Existing Data Flow
2. Create a Data Flow Task
3. Add a Data Source to a Data Flow
4. Add a Derived Column transformation to a data flow
5. Add a Lookup Transformation to a Data Flow
6. Add a Data Destination to a Data Flow
7. Test the Data Flow task

Task 1 : Examine an Existing Data Flow

1. Open the « …\LABS-Atelier SID\Lab03\Ex3\AdventureWorksETL.sln » solution in Visual Studio.


2. Open the Extract Reseller Data.dtsx package and examine its control flow. Note that it contains two
Data Flow tasks.
3. On the Data Flow tab, view the Extract Reseller Sales task.
4. Examine the Reseller Sales source, noting the connection manager that it uses, the source of the
data, and the columns that its output contains.
5. Examine the Calculate Sales Amount transformation, noting the expression that it uses to create a
new derived column.
6. Examine the Lookup Product Details transformation, noting the connection manager and query that
it uses to look up product data, and the column mappings used to match data and add rows to the
data flow.
7. Examine the Staging DB destination, noting the connection manager that it uses, the destination
table for the data, and the mapping of input columns to destination columns.
8. Right-click anywhere on the Data Flow design surface, click Execute Task, and then observe the data
flow as it runs, noting the number of rows transferred.
9. When the data flow has completed, stop the debugging session.

Task 2 : Create a Data Flow Task


1. In the Solution Explorer pane, double-click Extract Internet Sales Data.dtsx.
2. View the Control Flow tab, and then in the SSIS Toolbox pane, in the Favorites section, double-click
Data Flow Task.
3. Drag the new Data Flow task under the existing Extract Customers task.
4. Right-click the new Data Flow task, click Rename, and then change the name to Extract Internet
Sales.
5. Click the Extract Customers Data Flow task, and then drag the arrow from the Extract Customers
task to the Extract Internet Sales task.
6. Double-click the Extract Internet Sales task to view the Data Flow surface.

Task 3 : Add a Data Source to a Data Flow


1. In the SSIS Toolbox pane, in the Favorites section, double-click Source Assistant.
2. In the Source Assistant - Add New Source dialog box, in the Select source type list, select SQL
Server. In the Select connection manager list, select localhost.InternetSales, and then click OK.
3. Drag the new OLE DB Source data source to the center of the Data Flow surface, right-click it, click
Rename, and then change the name of the data source to Internet Sales.
4. Double-click the Internet Sales source, set the following configuration values, and then click OK:
• On the Connection Manager page, ensure that the localhost.InternetSales OLE DB connection
manager is selected, in the Data access mode list, click SQL command, click Browse, and then
import the InternetSales.sql query file from the « …\LABS-Atelier SID\Lab03 » folder.
• On the Columns tab, ensure that every column returned by the query is selected.

Task 4 : Add a Derived Column transformation to a data flow

1. In the SSIS Toolbox pane, in the Common section, double-click Derived Column.
2. Drag the new Derived Column transformation below the existing Internet Sales data source,
right-click it, click Rename, and then change the name of the transformation to Calculate Sales
Amount.
3. On the Data Flow surface, click the Internet Sales source, and then drag the blue arrow from the
Internet Sales data source to the Calculate Sales Amount transformation.
4. Double-click the Calculate Sales Amount transformation, in the Derived Column Transformation
Editor dialog box, perform the following steps, and then click OK :
• In the Derived Column Name column, type SalesAmount.
• In the Derived Column column, ensure that <add as new column> is selected.
• Expand the Columns folder, and then drag the UnitPrice column to the Expression box.
• Type *, and then drag the OrderQuantity column to the Expression box so that the expression
looks like the following example: [UnitPrice] * [OrderQuantity]
• Ensure that the Data Type column contains the value numeric [DT_NUMERIC], the Precision
column contains the value 25, and the Scale column contains the value 4.
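
For reference, the derived column configured above behaves like the Transact-SQL expression below; the CAST to NUMERIC(25, 4) mirrors the Precision 25 / Scale 4 settings in the editor. This is a sketch you do not need to run, and it assumes UnitPrice and OrderQuantity are read from the SalesOrderDetail table (the actual source is the query in InternetSales.sql).

-- Sketch of the SalesAmount derivation as Transact-SQL
SELECT UnitPrice,
       OrderQuantity,
       CAST(UnitPrice * OrderQuantity AS NUMERIC(25, 4)) AS SalesAmount
FROM InternetSales.dbo.SalesOrderDetail;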

Task 5 : Add a Lookup Transformation to a Data Flow

1. In the SSIS Toolbox pane, in the Common section, double-click Lookup.


2. Drag the new Lookup transformation below the existing Calculate Sales Amount transformation,
right-click it, click Rename, and then change the name of the transformation to Lookup Product
Details.
3. On the Data Flow surface, click the Calculate Sales Amount transformation, and then drag the blue
arrow from the Calculate Sales Amount transformation to the Lookup Product Details
transformation.
4. Double-click the Lookup Product Details transformation, and in the Lookup Transformation
Editor dialog box, perform the following steps and then click OK :
• On the General tab, under Cache mode, ensure that Full cache is selected, and under
Connection type, ensure that OLE DB connection manager is selected. In the Specify how to
handle rows with no matching entries list, click Redirect rows to no match output.
• On the Connection tab, in the OLE DB connection manager list, select localhost.Products
and click OK. Then select Use results of an SQL query, click Browse, and import the
Products.sql query file from the « …\LABS-Atelier SID\Lab03\ » folder.
• On the Columns tab, drag ProductKey from the Available Input Columns list to ProductKey
in the Available Lookup Columns list.
• In the Available Lookup Columns list, select the check box next to the Name column heading
to select all columns, and then clear the check box for the ProductKey column.
5. In the SSIS Toolbox pane, in the Other Destinations section, double-click Flat File Destination.
6. Drag the new flat file transformation to the right of the existing Lookup Product Details
transformation, right-click it, click Rename, and then change the name of the transformation to
Orphaned Sales.
7. On the Data Flow surface, click the Lookup Product Details transformation, and then drag the blue
arrow from the Lookup Product Details transformation to the Orphaned Sales destination.
8. In the Input Output Selection dialog box, in the Output list, click Lookup No Match Output, and
then click OK.
9. Double-click the Orphaned Sales destination, and then in the Flat File Destination Editor dialog
box, next to the Flat File connection manager drop-down list, click New.
10. In the Flat File Format dialog box, click Delimited, and then click OK.
11. In the Flat File Connection Manager Editor dialog box, change the text in the Connection manager
name box to Orphaned Internet Sales.
12. On the General tab, set the File name value to « …\LABS-Atelier SID\Lab03\ETL\Orphaned Internet
Sales.csv », select Column names in the first data row, and then click OK.
13. In the Flat File Destination Editor dialog box, ensure that Overwrite data in the file is selected,
and then click the Mappings tab. Verify that all input columns are mapped to destination columns
with the same name, and then click OK.
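
To see what the Lookup Product Details transformation is doing, its two outputs correspond roughly to the queries below: rows whose ProductKey matches a row returned by the product lookup query continue on the Lookup Match Output, while the rest are redirected on the Lookup No Match Output to the Orphaned Internet Sales file. This is a hedged sketch only; the real lookup uses the result of Products.sql, and the Products.dbo.Product table name is an assumption.

-- Sketch: rows with a matching product (Lookup Match Output)
SELECT s.*
FROM InternetSales.dbo.SalesOrderDetail AS s
WHERE EXISTS (SELECT 1 FROM Products.dbo.Product AS p
              WHERE p.ProductKey = s.ProductKey);

-- Sketch: orphaned rows with no matching product (Lookup No Match Output)
SELECT s.*
FROM InternetSales.dbo.SalesOrderDetail AS s
WHERE NOT EXISTS (SELECT 1 FROM Products.dbo.Product AS p
                  WHERE p.ProductKey = s.ProductKey);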

Task 6 : Add a Data Destination to a Data Flow

1. In the SSIS Toolbox pane, in the Favorites section, double-click Destination Assistant.
2. In the Destination Assistant - Add New Destination dialog box, in the Select destination type list,
click SQL Server. In the Select connection manager list, click localhost.Staging, and then click OK.
3. Drag the new OLE DB Destination data destination below the Lookup Product Details
transformation, right-click it, click Rename, and then change the name of the data destination to
Staging DB.
4. On the Data Flow surface, click the Lookup Product Details transformation, and then drag the blue
arrow from the Lookup Product Details transformation to the Staging DB destination. Note that
the Lookup Match Output is automatically selected.
5. Double-click the Staging DB destination, set the following configuration values, and then click OK :
• On the Connection Manager page, ensure that the localhost.Staging OLE DB connection
manager is selected, and ensure that the Table or view – fast load data access mode is selected.
In the Name of the table or the view drop-down list, click [dbo].[InternetSales], and select
Keep nulls.
• On the Mappings tab, drag the ProductKey column from the list of available input columns to the
ProductBusinessKey column in the list of available destination columns.
• Verify that all other input columns are mapped to destination columns of the same name.

Task 7 : Test the Data Flow task

1. Right-click anywhere on the Data Flow surface, click Execute Task, and then observe the task as it
runs, noting how many rows are transferred. There should be no orphaned sales records.
2. On the Debug menu, click Stop Debugging.
3. Close Visual Studio.

Results: After this exercise, you should have a package that contains a Data Flow task including Derived
Column and Lookup transformations.
