SSIS 2008 Tutorial
In this chapter:
- The Import and Export Wizard
- Creating a Package
- Working with Connection Managers
- Building Data Flows
- Building Control Flows
- Creating Event Handlers
- Saving and Running Packages
Files needed:
- ISProject1.zip
- ISProject2.zip
SQL Server Integration Services
Microsoft says that SQL Server Integration Services (SSIS) is a platform for building high-performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing. A simpler way to think of SSIS is that it's the solution for automating data movements. SSIS provides a way to build packages made up of tasks that can move data from place to place and alter it on the way. There are visual designers (hosted within Business Intelligence Development Studio, or BIDS) to help you build these packages, as well as an API for programming SSIS objects from other applications.
In this chapter, you'll see how to build and use SSIS packages. First, though, we'll look at a simpler facet of SSIS: the SQL Server Import and Export Wizard.
If you choose to use the supplied solution files rather than building your own, you may need to edit the properties of the OLE DB Connection Managers within the projects to point to your own test server. You'll learn more about Connection Managers in the Working with Connection Managers section later in this chapter.
The Import and Export Wizard
You can launch the Import and Export Wizard from the Tasks entry on the shortcut menu of any database in the Object Explorer window of SQL Server Management Studio.
Try It!
To import some data using the Import and Export Wizard, follow these steps:
1. Launch SQL Server Management Studio and log in to your test server.
16-2 Introduction to SQL Server 2008 Copyright 2009 Accelebrate, Inc
2. Open a new query window.
3. Select the master database from the Available Databases combo box on the toolbar.
4. Enter this text into the query window:
   CREATE DATABASE Chapter16
5. Click the Execute toolbar button to create a new database.
6. Expand the Databases node in Object Explorer.
7. Right-click on the Chapter16 database and select Tasks > Import Data.
8. Read the first page of the Import and Export Wizard and click Next.
9. Select SQL Native Client for the data source and provide login information for your test server.
10. Select the AdventureWorks2008 database as the source of the data to import.
11. Click Next.
12. Because you're importing data, the next page of the wizard will default to connection information for the Chapter16 database. Click Next.
13. Select Copy Data From One or More Tables or Views and click Next. Note that if you only want to import part of a table, you can use a query as the data source instead.
14. Select the HumanResources.Department, HumanResources.JobCandidate and HumanResources.Shift tables, as shown in Figure 16-1. As you select tables, the wizard will automatically assign names for the target tables.
15. Select the HumanResources.Shift table and click on the Edit Mappings button.
16. The Column Mappings dialog box lets you change the name, data type, and other properties of the destination table columns. You can also set other options here, such as whether to overwrite or append data when importing data to an existing table. Click Cancel when you're done inspecting the options.
17. Click Next.
18. Check Execute Immediately and click Next.
19. Click Finish to perform the import. SQL Server will display progress as it performs the import, as shown in Figure 16-2.
20. Click Close to dismiss the report.
21. Expand the Tables node of the Chapter16 database to verify that the import succeeded.
In addition to executing its operations immediately, the Import and Export Wizard can also save a package for later execution. You'll learn more about packages in the remainder of this chapter.
Creating a Package
The Import and Export Wizard is easy to use, but it only taps a small part of the functionality of SSIS. To really appreciate the full power of SSIS, you'll need to use BIDS to build an SSIS package. A package is a collection of SSIS objects including:
- Connections to data sources.
- Data flows, which include the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations.
- A control flow, which includes tasks and containers that execute when the package runs. You can organize tasks in sequences and in loops.
- Event handlers, which are workflows that run in response to the events raised by a package, task, or container.
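To make the four kinds of objects in a package more concrete, here is a minimal Python sketch of how they relate to one another. It is only a mental model: the Task and Package classes are hypothetical and are not part of any SSIS API, and the task names are borrowed from the package built later in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A control flow task or container (hypothetical model, not an SSIS type)."""
    name: str
    kind: str


@dataclass
class Package:
    """Hypothetical in-memory picture of what a package groups together."""
    name: str
    connections: dict = field(default_factory=dict)     # name -> description of the data source
    control_flow: list = field(default_factory=list)    # tasks and containers, in run order
    data_flows: dict = field(default_factory=dict)      # data flow task name -> component names
    event_handlers: dict = field(default_factory=dict)  # (executable, event) -> handler tasks


package = Package(name="Package")
package.connections["Chapter16"] = "OLE DB connection to the Chapter16 database"
package.control_flow.append(Task("File System Task", "FileSystem"))
package.control_flow.append(Task("Data Flow Task", "DataFlow"))
package.data_flows["Data Flow Task"] = [
    "OLE DB Source", "Character Map", "Flat File Destination",
]
package.event_handlers[("Data Flow Task", "OnPostExecute")] = [
    Task("Execute SQL Task", "ExecuteSQL"),
]

print([task.name for task in package.control_flow])
```

The point of the sketch is the shape: connections are shared by everything in the package, the control flow is an ordered set of tasks, each data flow lives inside one Data Flow Task, and event handlers are attached to a specific executable and event.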
You'll see how to build each of these components of a package in later sections of the chapter, but first, let's fire up BIDS and create a new SSIS package.
Try It!
To create a new SSIS package, follow these steps:
1. Launch Business Intelligence Development Studio.
2. Select File > New > Project.
3. Select the Business Intelligence Projects project type.
4. Select the Integration Services Project template.
5. Select a convenient location.
6. Name the new project ISProject1 and click OK.
Handles
- Connecting to ADO objects such as a Recordset.
- Connecting to data sources through an ADO.NET provider.
- Connecting to a cache, either in memory or in a file.
- Connecting to an Analysis Services database or cube.
- Connecting to an Excel worksheet.
- Connecting to a file or folder.
- Connecting to delimited or fixed-width flat files.
- Connecting to an FTP data source.
- Connecting to an HTTP data source.
To create a Connection Manager, right-click anywhere in the Connection Managers area of a package in BIDS and choose the appropriate command from the shortcut menu. Each Connection Manager has its own configuration dialog box with specific options that you need to fill out.
Try It!
To add some connection managers to your package, follow these steps:
1. Right-click in the Connection Managers area of your new package and select New OLE DB Connection.
2. Note that the configuration dialog box will show the data connections that you created in Chapter 15; data connections are shared across Analysis Services and Integration Services projects. Click New to create a new data connection.
3. In the Connection Manager dialog box, select the SQL Native Client provider.
4. Select your test server and provide login information.
5. Select the Chapter16 database.
6. Click OK.
7. In the Configure OLE DB Connection Manager dialog box, click OK.
8. Right-click in the Connection Managers area of your new package and select New Flat File Connection.
9. Enter DepartmentList as the Connection Manager Name.
10. Enter C:\Departments.txt as the File Name.
11. Check the Column Names in the First Data Row checkbox. Figure 16-4 shows the completed General page of the dialog box.
12. Click the Advanced icon to move to the Advanced page of the dialog box.
13. Click the New button.
14. Change the Name of the new column to DepartmentName.
15. Click OK.
16. Right-click the DepartmentList Connection Manager and select Copy.
17. Right-click in the Connection Managers area and select Paste.
18. Click on the new DepartmentList 1 connection to select it.
19. Use the Properties Window to change properties of the new connection. Change the Name property to DepartmentListBackup. Change the ConnectionString property to C:\DepartmentsBackup.txt.
Figure 16-5 shows the SSIS package with the three Connection Managers defined.
Task                               Purpose
ActiveX Script                     Execute an ActiveX Script
Analysis Services Execute DDL      Execute DDL query statements against an Analysis Services server
Analysis Services Processing       Process an Analysis Services cube
Bulk Insert                        Insert data from a file into a database
Data Mining Query                  Execute a data mining query
Data Profiling                     Generate a profile of sample data, determining distribution of values or percentage of NULLs, etc.
Execute DTS 2000 Package           Execute a Data Transformation Services Package (DTS was the SQL Server 2000 version of SSIS)
Execute Package                    Execute an SSIS package
Execute Process                    Shell out to a Windows application
Execute SQL                        Run a SQL query
File System                        Perform file system operations such as copy or delete
FTP                                Perform FTP operations
Message Queue                      Send or receive messages via MSMQ
Script                             Execute a custom task
Send Mail                          Send e-mail
Transfer Database                  Transfer an entire database between two SQL Servers
Transfer Error Messages            Transfer custom error messages between two SQL Servers
Transfer Jobs                      Transfer jobs between two SQL Servers
Transfer Logins                    Transfer logins between two SQL Servers
Transfer Master Stored Procedures  Transfer stored procedures from the master database on one SQL Server to the master database on another SQL Server
Transfer SQL Server Objects        Transfer objects between two SQL Servers
Web Service                        Execute a SOAP Web method
WMI Data Reader                    Read data via WMI
WMI Event Watcher                  Wait for a WMI event
XML                                Perform operations on XML data
Task                          Purpose
Back Up Database              Back up an entire database to file or tape
Check Database Integrity      Perform database consistency checks
Execute SQL Server Agent Job  Run a job
Execute T-SQL Statement       Run any T-SQL script
History Cleanup               Clean out history tables for other maintenance tasks
Maintenance Cleanup           Clean up files left by other maintenance tasks
Notify Operator               Send e-mail to SQL Server operators
Container     Purpose
For Loop      Repeat a task a fixed number of times
Foreach Loop  Repeat a task by enumerating over a group of objects
Sequence      Group multiple tasks into a single unit for easier management
Table 16-4: SSIS containers
Try It!
To add control flow tasks to the package you've been building, follow these steps:
1. If the Toolbox isn't visible already, hover your mouse over the Toolbox tab until it slides out from the side of the BIDS window. Use the pushpin button in the Toolbox title bar to keep the Toolbox visible.
2. Make sure the Control Flow tab is selected in the Package Designer.
3. Drag a File System Task from the Toolbox and drop it on the Package Designer.
4. Drag a Data Flow Task from the Toolbox and drop it on the Package Designer, somewhere below the File System Task.
5. Click on the File System Task on the Package Designer to select it.
6. Drag the green arrow from the bottom of the File System Task and drop it on top of the Data Flow Task. This tells SSIS the order of tasks when the File System Task succeeds.
7. Double-click the connection between the two tasks to open the Precedence Constraint Editor.
8. Change the Value from Success to Completion, because you want the Data Flow Task to execute whether the File System Task succeeds or not.
9. Click OK.
10. Select the File System Task in the designer. Use the Properties Window to set properties of the File System Task. Set the Source property to DepartmentList. Set the Destination property to DepartmentListBackup. Set the OverwriteDestinationFile property to True, then click OK.
Figure 16-6 shows the completed set of control flow tasks.
As it stands, this package uses the File System Task to copy the file specified by the DepartmentList connection to the file specified by the DepartmentListBackup connection, overwriting any target file that already exists. It then executes the Data Flow Task, whether or not the copy succeeded. In the next section, you'll see how to configure the Data Flow Task.
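The effect of choosing Completion rather than Success for the precedence constraint can be sketched in a few lines of Python. This is an illustration of the rule only, not of how SSIS is implemented: the function names are hypothetical, and shutil.copyfile stands in for the File System Task.

```python
import shutil
import tempfile
from pathlib import Path


def constraint_allows(constraint: str, predecessor_succeeded: bool) -> bool:
    """Return True when the downstream task may run under the given constraint value."""
    if constraint == "Success":
        return predecessor_succeeded
    if constraint == "Failure":
        return not predecessor_succeeded
    if constraint == "Completion":
        return True
    raise ValueError(f"unknown constraint value: {constraint}")


def copy_file_task(source: Path, destination: Path) -> bool:
    """Stand-in for the File System Task: copy, overwrite, and report success."""
    try:
        shutil.copyfile(source, destination)
        return True
    except OSError:
        return False


def run_package(source: Path, destination: Path, constraint: str = "Completion") -> list:
    """Run the two tasks in order and return the names of the tasks that executed."""
    executed = ["File System Task"]
    succeeded = copy_file_task(source, destination)
    if constraint_allows(constraint, succeeded):
        executed.append("Data Flow Task")
    return executed


with tempfile.TemporaryDirectory() as folder:
    missing_source = Path(folder) / "Departments.txt"  # deliberately not created
    backup = Path(folder) / "DepartmentsBackup.txt"
    print(run_package(missing_source, backup, "Completion"))
    print(run_package(missing_source, backup, "Success"))
```

With a missing source file the copy fails, so the Data Flow Task runs only under the Completion constraint. That matters for this package because C:\Departments.txt does not exist the first time the package runs.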
Use
- Extracts data from a database using a .NET data provider
- Extracts data from an Excel workbook
- Extracts data from a flat file
- Extracts data from a database using an OLE DB provider
- Extracts data from a raw file (proprietary Microsoft format)
- Extracts data from an XML file
Table 16-5: Data flow sources
Transformation             Effect
Aggregate                  Aggregates and groups values in a dataset
Audit                      Adds audit information to a dataset
Cache Transform            Populates a CACHE connection manager
Character Map              Applies string operations to character data
Conditional Split          Evaluates and splits up rows in a dataset
Copy Column                Copies a column of data
Data Conversion            Converts data to a different datatype
Data Mining Query          Runs a data mining query
Derived Column             Calculates a new column from existing data
Export Column              Exports data from a column to a file
Fuzzy Grouping             Groups rows that contain similar values
Fuzzy Lookup               Looks up values using fuzzy matching
Import Column              Imports data from a file to a column
Lookup                     Looks up values in a reference dataset
Merge                      Merges two sorted datasets
Merge Join                 Merges data from two datasets by using a join
Multicast                  Creates copies of a dataset
OLE DB Command             Executes a SQL command on each row in a dataset
Percentage Sampling        Extracts a subset of rows from a dataset
Pivot                      Builds a pivot table from a dataset
Row Count                  Counts the rows of a dataset
Row Sampling               Extracts a sample of rows from a dataset
Script Component           Executes a custom script
Slowly Changing Dimension  Updates a slowly changing dimension table
Sort                       Sorts data
Term Extraction            Extracts data from a column
Term Lookup                Looks up the frequency of a term in a column
Union All                  Merges multiple datasets
Unpivot                    Normalizes a pivot table
Use
- Sends data to a .NET data provider
- Sends data to an Analysis Services data mining model
- Sends data to an in-memory ADO.NET DataReader
- Processes a cube dimension
- Sends data to an Excel worksheet
- Sends data to a flat file
- Sends data to an OLE DB database
- Processes an Analysis Services partition
- Sends data to a raw file
- Sends data to an in-memory ADO Recordset
- Sends data to a SQL Server CE database
- Sends data to a SQL Server database
Table 16-7: Data Flow Destinations
If you are running SQL Server Integration Services on a 64-bit machine, the Excel source and destination will throw an exception when the package runs in the 64-bit runtime. During development, you can select Project > Project_name Properties, select the Debugging page, and change the Run64BitRuntime property to False. When deploying the package, you'll need to shell out to the 32-bit SSIS runtime when scheduling the package.
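As a rough sketch of what shelling out to the 32-bit runtime can look like, the Python helper below picks between two copies of dtexec and builds an argument list with the /FILE option. The folder locations are an assumption based on a default SQL Server 2008 installation, and the package path is purely hypothetical; check both against your own server before relying on them.

```python
from pathlib import PureWindowsPath

# Assumed default install folders for SQL Server 2008; adjust for your server.
DTEXEC_64 = PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\100\DTS\Binn\dtexec.exe")
DTEXEC_32 = PureWindowsPath(r"C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\dtexec.exe")


def dtexec_command(package_file: str, uses_excel: bool) -> list:
    """Build a dtexec argument list, choosing the 32-bit runtime for Excel packages."""
    executable = DTEXEC_32 if uses_excel else DTEXEC_64
    return [str(executable), "/FILE", package_file]


# Hypothetical package path, for illustration only.
command = dtexec_command(r"C:\Packages\ExcelImport.dtsx", uses_excel=True)
print(command)
```

The resulting list can be handed to a scheduler or to subprocess.run on the server where the package is deployed.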
Try It!
To customize the Data Flow Task in the package you're building, follow these steps:
1. Select the Data Flow tab in the Package Designer. The single Data Flow Task in the package will automatically be selected in the combo box.
2. Drag an OLE DB Source from the Toolbox and drop it on the Package Designer.
3. Drag a Character Map Transformation from the Toolbox and drop it on the Package Designer.
4. Drag a Flat File Destination from the Toolbox and drop it on the Package Designer.
5. Click on the OLE DB Source on the Package Designer to select it.
6. Drag the green arrow from the bottom of the OLE DB Source and drop it on top of the Character Map Transformation.
7. Click on the Character Map Transformation on the Package Designer to select it.
8. Drag the green arrow from the bottom of the Character Map Transformation and drop it on top of the Flat File Destination.
9. Double-click the OLE DB Source to open the OLE DB Source Editor. Notice that it uses the Chapter16 OLE DB connection manager by default.
10. Select the HumanResources.Department table. Figure 16-7 shows the completed OLE DB Source Editor.
11. Click OK.
12. Double-click the Character Map Transformation.
13. Check the Name column.
14. Select In-Place Change in the Destination column.
15. Select the Uppercase operation. Figure 16-8 shows the completed Character Map Transformation Editor.
16. Click OK.
17. Double-click the Flat File Destination.
18. Select the DepartmentList Flat File Connection Manager.
19. Select the Mappings page of the dialog box.
20. Drag the Name column from the Available Input Columns list and drop it on top of the DepartmentName column in the Available Destination Columns list. Figure 16-9 shows the completed Mappings page.
21. Click OK. Figure 16-10 shows the completed data flow.
The data flow tasks in this package take a table from the Chapter16 database, transform one of the columns in that table to all uppercase characters, and then write that transformed column out to a flat file.
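The source, transformation, and destination chain described above can be pictured as three small Python generators and functions wired together. This is only an analogy for how rows stream through a data flow: the function names are hypothetical, the two sample rows are made up, and an in-memory buffer stands in for C:\Departments.txt.

```python
import csv
import io


def ole_db_source():
    """Stand-in for the OLE DB Source; the rows are made-up sample data."""
    yield {"DepartmentID": 1, "Name": "Engineering"}
    yield {"DepartmentID": 2, "Name": "Tool Design"}


def character_map_uppercase(rows, column):
    """In-place Uppercase operation on one column, applied row by row."""
    for row in rows:
        row = dict(row)  # copy so the source row is left untouched
        row[column] = row[column].upper()
        yield row


def flat_file_destination(rows, handle):
    """Write only the mapped column (Name -> DepartmentName) with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["DepartmentName"])
    for row in rows:
        writer.writerow([row["Name"]])


output = io.StringIO()
flat_file_destination(character_map_uppercase(ole_db_source(), "Name"), output)
print(output.getvalue())
```

Only the Name column reaches the file because it is the only column mapped to the destination. The unmapped DepartmentID column is read but never written.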
By adding event handlers that call the Send Mail task to the OnError event, you can notify operators by e-mail if anything goes wrong in the course of running an SSIS package.
Try It!
To add an event handler to the package we've been building, follow these steps:
1. Open SQL Server Management Studio and connect to your test server.
2. Create a new query and select the Chapter16 database in the available databases list on the toolbar.
3. Enter this text into a query window:
   CREATE TABLE DepartmentExports(
       ExportID int IDENTITY(1,1) NOT NULL,
       ExportTime datetime NOT NULL
           CONSTRAINT DF_DepartmentExports_ExportTime DEFAULT (GETDATE()),
       CONSTRAINT PK_DepartmentExports PRIMARY KEY CLUSTERED (ExportID ASC)
   )
4. Click the Execute toolbar button to create the table.
5. Switch back to the Package Designer in BIDS.
6. Select the Event Handlers tab.
7. In the Executable drop-down list, expand the Package node and then the Executables node.
8. Select the Data Flow Task in the Executable drop-down list and click OK.
9. Select the OnPostExecute event handler.
10. Click the hyperlink on the design surface to create the event handler.
11. Drag an Execute SQL task from the Toolbox and drop it on the Package Designer.
12. Double-click the Execute SQL task to open the Execute SQL Task Editor.
13. Select the Chapter16 OLE DB connection manager as the task's connection.
14. Set the SQL Statement property to the following query:
   INSERT INTO DepartmentExports (ExportTime) VALUES (GETDATE())
15. Click OK to create the event handler.
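The idea behind the steps above, a handler attached to one executable and one event, can be sketched in Python as a small callback registry. Everything here is a simplified, hypothetical model: the helper names are invented, and a plain list stands in for the DepartmentExports table.

```python
from collections import defaultdict
from datetime import datetime

handlers = defaultdict(list)  # (executable name, event name) -> list of callables
department_exports = []       # stand-in for the DepartmentExports table


def on(executable, event):
    """Register a function as an event handler for one executable and one event."""
    def register(func):
        handlers[(executable, event)].append(func)
        return func
    return register


def raise_event(executable, event):
    """Call every handler registered for this executable and event."""
    for handler in handlers[(executable, event)]:
        handler()


def run_task(name, work):
    """Run a task, raising OnError on failure and OnPostExecute when it finishes."""
    try:
        work()
    except Exception:
        raise_event(name, "OnError")
    finally:
        raise_event(name, "OnPostExecute")


@on("Data Flow Task", "OnPostExecute")
def log_export():
    # Mirrors INSERT INTO DepartmentExports (ExportTime) VALUES (GETDATE())
    department_exports.append({
        "ExportID": len(department_exports) + 1,
        "ExportTime": datetime.now(),
    })


run_task("Data Flow Task", lambda: None)
run_task("Data Flow Task", lambda: None)
print(len(department_exports))
```

Each run of the task adds one row to the tracking list, which is the behavior to expect from the real table each time the package executes.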
This event handler will be called when the Data Flow Task finishes executing, and will insert one new row into the tracking table when it is called.
Saving and Running Packages
Try It!
To store copies of the package you've developed, follow these steps:
1. Select File > Save Copy of Package.dtsx As from the BIDS menus.
2. Select SSIS Package Store as the Package Location.
3. Select the name of your test server.
4. Enter /File System/ExportDepartments as the package path.
5. Click OK.
6. Select File > Save Copy of Package.dtsx As from the BIDS menus.
7. Select SQL Server as the Package Location.
8. Select the name of your test server and fill in your authentication information.
9. Enter ExportDepartments as the package path.
10. Click OK.
Running a Package
You can run the final package from either BIDS or SQL Server Management Studio. When you're developing a package, it's convenient to run it directly from BIDS.
When the package has been deployed to a production server (and saved to the msdb database or the Package Store), you'll probably want to run it from SQL Server Management Studio. SQL Server also includes a command-line utility, dtexec, that lets you run packages from batch files.
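As a hedged illustration of the dtexec utility, the Python sketch below builds argument lists for the three places a package can live: a .dtsx file (/FILE), the msdb database (/SQL), and the Package Store (/DTS). The helper names and the MYSERVER placeholder are hypothetical, and the example only prints the commands rather than running them.

```python
import subprocess


def dtexec_args(location: str, package: str, server: str = "") -> list:
    """Build dtexec arguments for a package in a file, msdb, or the Package Store."""
    switches = {"file": "/FILE", "msdb": "/SQL", "store": "/DTS"}
    if location not in switches:
        raise ValueError(f"unknown package location: {location}")
    args = ["dtexec", switches[location], package]
    if server and location != "file":
        args += ["/SERVER", server]  # only meaningful with /SQL or /DTS
    return args


def run_dtexec(args: list) -> int:
    """Run dtexec and return its exit code (0 means the package succeeded)."""
    return subprocess.run(args).returncode


# "MYSERVER" is a placeholder server name.
print(dtexec_args("msdb", "ExportDepartments", "MYSERVER"))
print(dtexec_args("store", r"\File System\ExportDepartments"))
```

On a machine with SQL Server installed, passing one of these lists to run_dtexec would execute the package saved in the previous steps. From a batch file, the same switches are typed directly after dtexec.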
Try It!
To run the package that you have loaded in BIDS, follow these steps:
1. Click the Start Debugging toolbar button. SSIS will execute the package, highlighting the steps in the package as they are completed. You can select any tab to watch what's going on. For example, if you select the Control Flow tab, you'll see tasks highlighted, as shown in Figure 16-11.
2. When the package finishes executing, click the hyperlink underneath the Connection Managers pane to stop the debugger.
3. Click the Execution Results tab to see detailed information on the package, as shown in Figure 16-12.
All of the events you see in the Execution Results pane are things that you can create event handlers to react to within the package. As you can see, SSIS issues quite a number of events, from progress events to warnings about extra columns of data that we retrieved but never used.
Try It!
1. In SQL Server Management Studio, click the Connect button at the top of the Object Explorer window.
2. Select Integration Services.
3. Choose the server with Integration Services installed and click Connect. This will add an Integration Services node at the bottom of Object Explorer.
4. Expand the Stored Packages node. You'll see that you can drill down into the File System node to find packages in the Package Store, or the MSDB node to find packages stored in the msdb database.
5. Expand the File System node.
6. Right-click on the ExportDepartments package and select Run Package. This will open the Execute Package Utility, shown in Figure 16-13.
7. Click Execute.
8. Click Close twice to dismiss the progress dialog box and the Execute Package Utility.
9. Enter this text into a query window with the Chapter16 database selected:
   SELECT * FROM DepartmentExports
10. Click the Execute toolbar button to verify that the package was run. You should see one entry from when the package was run from BIDS and one from when you ran it from SQL Server Management Studio.
Exercises
One common use of SSIS is in data warehousing: collecting data from a variety of different sources into a single database that can be used for unified reporting. In this exercise you'll use SSIS to perform a simple data warehousing task.
Use SSIS to create a text file, c:\EmployeeDept.txt, containing the last names, department names, and start and end dates of the AdventureWorks2008 employees. Retrieve the last names from the Person.Person table and the department start and end dates from the HumanResources.EmployeeDepartmentHistory table in the AdventureWorks2008 database, and the department names from the Chapter16 database.
You can use the Merge Join data flow transformation to join data from two sources. One tip: the inputs to this transformation need to be sorted on the joining column.
Solutions to Exercises
1. Launch Business Intelligence Development Studio.
2. Select File > New > Project.
3. Select the Business Intelligence Projects project type.
4. Select the Integration Services Project template.
5. Select a convenient location.
6. Name the new project ISProject2 and click OK.
7. Right-click in the Connection Managers area of your new package and select New OLE DB Connection.
8. Click New to create a new data connection.
9. In the Connection Manager dialog box, select the SQL Native Client provider.
10. Select your test server and provide login information.
11. Select the AdventureWorks2008 database.
12. Click OK.
13. Right-click in the Connection Managers area of your new package and select New OLE DB Connection.
14. Select the existing connection to the Chapter16 database and click OK.
15. Right-click in the Connection Managers area of your new package and select New Flat File Connection.
16. Enter EmployeeList as the Connection Manager Name.
17. Enter C:\EmployeeDept.txt as the File Name.
18. Check the Column Names in the First Data Row checkbox.
19. Click the Advanced icon to move to the Advanced page of the dialog box.
20. Click the New button.
21. Change the Name of the new column to LastName.
22. Click the New button.
23. Change the Name of the new column to Department.
24. Click the New button.
25. Change the Name of the new column to StartDate and the datatype to Date.
26. Click the New button.
27. Change the Name of the new column to EndDate and the datatype to Date.
28. Click OK.
29. Select the Control Flow tab in the Package Designer.
30. Drag a Data Flow Task from the Toolbox and drop it on the Package Designer.
31. Select the Data Flow tab in the Package Designer. The single Data Flow Task in the package will automatically be selected in the combo box.
32. Drag an OLE DB Source from the Toolbox and drop it on the Package Designer.
33. Drag a second OLE DB Source from the Toolbox and drop it on the Package Designer.
34. Drag a Sort Transformation from the Toolbox and drop it on the Package Designer.
35. Drag a second Sort Transformation from the Toolbox and drop it on the Package Designer.
36. Drag a Merge Join Transformation from the Toolbox and drop it on the Package Designer.
37. Drag a Flat File Destination from the Toolbox and drop it on the Package Designer.
38. Click on the first OLE DB Source on the Package Designer to select it.
39. Drag the green arrow from the bottom of the first OLE DB Source and drop it on top of the first Sort Transformation.
40. Click on the second OLE DB Source on the Package Designer to select it.
41. Drag the green arrow from the bottom of the second OLE DB Source and drop it on top of the second Sort Transformation.
42. Click on the first Sort Transformation on the Package Designer to select it.
43. Drag the green arrow from the bottom of the first Sort Transformation and drop it on top of the Merge Join Transformation.
44. In the Input Output Selection dialog box, select Merge Join Left Input.
45. Click OK.
46. Click on the second Sort Transformation on the Package Designer to select it.
47. Drag the green arrow from the bottom of the second Sort Transformation and drop it on top of the Merge Join Transformation.
48. Click on the Merge Join Transformation on the Package Designer to select it.
49. Drag the green arrow from the bottom of the Merge Join Transformation and drop it on top of the Flat File Destination. Figure 16-14 shows the Data Flow tab with the connections between tasks.
50. Double-click the first OLE DB Source to open the OLE DB Source Editor.
51. Select the connection to the AdventureWorks2008 database.
52. For the Data Access Mode, select SQL Command.
53. Enter the following query:
    SELECT p.LastName, dh.DepartmentID, dh.StartDate, dh.EndDate
    FROM Person.Person p
    INNER JOIN HumanResources.EmployeeDepartmentHistory dh
        ON p.BusinessEntityID = dh.BusinessEntityID
54. Click OK.
55. Double-click the second OLE DB Source to open the OLE DB Source Editor.
56. Select the connection to the Chapter16 database.
57. Select the HumanResources.Department table.
58. Click OK.
59. Double-click the first Sort Transformation.
60. Check the DepartmentID column.
61. Click OK.
62. Double-click the second Sort Transformation.
63. Check the DepartmentID column.
64. Click OK.
65. Double-click the Merge Join Transformation.
66. Check the Join Key checkbox for the DepartmentID column in both tables, if it is not already checked.
67. Check the selection checkbox for the LastName, StartDate and EndDate columns in the left-hand table and the Name column in the right-hand table; alias the Name column as DepartmentName. Figure 16-15 shows the completed Merge Join Transformation Editor.
68. Click OK.
69. Double-click the Flat File Destination.
70. Select the EmployeeList Flat File Connection Manager.
71. Select the Mappings page of the dialog box.
72. The LastName, StartDate and EndDate columns will be automatically mapped. Drag the DepartmentName column from the Available Input Columns list and drop it on top of the Department column in the Available Destination Columns list.
73. Click OK.
74. Right-click the package in Solution Explorer and select Execute Package.
75. Stop debugging when the package is finished executing.
76. Open the c:\EmployeeDept.txt file to inspect the results.
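To see why both inputs of a Merge Join have to be sorted on the joining column, here is a minimal Python sketch of an inner sort-merge join. It illustrates the general technique, not the SSIS implementation: the merge_join function is hypothetical and the employee and department rows are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner join of two row lists that are both already sorted on `key`."""
    left_groups = groupby(left, key=itemgetter(key))
    right_groups = groupby(right, key=itemgetter(key))
    left_key, left_rows = next(left_groups, (None, None))
    right_key, right_rows = next(right_groups, (None, None))
    while left_rows is not None and right_rows is not None:
        if left_key < right_key:
            # No match possible for this left key; move the left side forward.
            left_key, left_rows = next(left_groups, (None, None))
        elif left_key > right_key:
            right_key, right_rows = next(right_groups, (None, None))
        else:
            matches = list(right_rows)
            for left_row in left_rows:
                for right_row in matches:
                    yield {**left_row, **right_row}
            left_key, left_rows = next(left_groups, (None, None))
            right_key, right_rows = next(right_groups, (None, None))


# Made-up sample rows standing in for the two OLE DB Sources.
employees = [
    {"LastName": "Adams", "DepartmentID": 2},
    {"LastName": "Baker", "DepartmentID": 1},
    {"LastName": "Clark", "DepartmentID": 2},
]
departments = [
    {"DepartmentID": 1, "Name": "Engineering"},
    {"DepartmentID": 2, "Name": "Tool Design"},
]

by_department = itemgetter("DepartmentID")
joined = list(merge_join(sorted(employees, key=by_department),
                         sorted(departments, key=by_department),
                         "DepartmentID"))
for row in joined:
    print(row["LastName"], row["Name"])
```

Because each side is read once, front to back, the join can only match rows whose keys line up as both inputs advance. That is the reason the solution places a Sort Transformation in front of each Merge Join input.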