ETL
ETL Process
4 major components:
Extracting
Gathering raw data from source systems and storing it in the ETL staging environment
Cleaning and conforming
Processing data to improve its quality, format it, merge it from multiple sources, and enforce conformed dimensions
Delivering
Loading data into data warehouse tables
Managing
Management of ETL environment
ETL: Extracting
Data profiling
Identifying data that changed since last load
Extraction
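One common way to identify data that changed since the last load is a stored high-water mark. The following is a minimal sketch, assuming a hypothetical source table `SourceDB.dbo.Orders` with a reliable `ModifiedDate` column, a staging table `Staging.dbo.Orders`, and a hypothetical control table `etl.LoadControl`. None of these names come from the slides.

```sql
-- Hypothetical control table: one row per source table holding the last extracted timestamp.
DECLARE @LastLoad DATETIME, @ThisLoad DATETIME;

SELECT @LastLoad = LastExtractDate
FROM   etl.LoadControl
WHERE  SourceTable = 'Orders';

-- Fix the upper bound first so rows modified during the extract are picked up by the next run.
SET @ThisLoad = GETDATE();

INSERT INTO Staging.dbo.Orders (OrderID, CustomerID, Amount, ModifiedDate)
SELECT OrderID, CustomerID, Amount, ModifiedDate
FROM   SourceDB.dbo.Orders
WHERE  ModifiedDate >  @LastLoad
  AND  ModifiedDate <= @ThisLoad;

-- Advance the watermark only after the extract has succeeded.
UPDATE etl.LoadControl
SET    LastExtractDate = @ThisLoad
WHERE  SourceTable = 'Orders';
```

A timestamp watermark does not see rows deleted in the source; those need a separate mechanism such as a full key comparison.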
ETL: Cleaning and Conforming
Data cleansing
Recording error events
Audit dimensions
Deduping
Creating and maintaining conformed dimensions and facts
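Deduping in staging can often be expressed with a window function. This sketch assumes a hypothetical staging table `Staging.dbo.Customer` with a business key `CustomerCode` and a `LoadDate` column; the rule "keep the most recently loaded row" is an illustrative choice, not something the slides prescribe.

```sql
-- Keep one row per business key, preferring the most recently loaded row.
WITH Ranked AS (
    SELECT CustomerCode,
           ROW_NUMBER() OVER (PARTITION BY CustomerCode
                              ORDER BY LoadDate DESC) AS rn
    FROM   Staging.dbo.Customer
)
DELETE FROM Ranked
WHERE  rn > 1;   -- deleting through the CTE removes the underlying staging rows
```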
ETL: Delivering
Implementation of SCD logic
Surrogate key generation
Managing hierarchies in dimensions
Managing special dimensions such as date and time, junk, mini, shrunken, small static, and user-maintained dimensions
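As an illustration of SCD logic, the sketch below shows one possible type 2 implementation in two statements. It assumes a hypothetical dimension `DimCustomer` (surrogate key generated by an IDENTITY column, plus `RowStartDate`, `RowEndDate`, and `IsCurrent` housekeeping columns) and a deduplicated staging table `Staging.dbo.Customer`; all table and column names are made up for the example.

```sql
DECLARE @Today DATE;
SET @Today = CAST(GETDATE() AS DATE);

-- Step 1: expire the current row when the tracked attribute changed.
-- Note: "<>" does not detect changes to or from NULL; handle NULLs separately if they occur.
UPDATE d
SET    d.RowEndDate = @Today,
       d.IsCurrent  = 0
FROM   DimCustomer AS d
JOIN   Staging.dbo.Customer AS s
       ON s.CustomerCode = d.CustomerCode
WHERE  d.IsCurrent = 1
  AND  s.City <> d.City;

-- Step 2: insert a new current row for changed and brand-new customers.
-- After step 1, neither group has a row with IsCurrent = 1.
INSERT INTO DimCustomer (CustomerCode, City, RowStartDate, RowEndDate, IsCurrent)
SELECT s.CustomerCode, s.City, @Today, '99991231', 1
FROM   Staging.dbo.Customer AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   DimCustomer AS d
                   WHERE  d.CustomerCode = s.CustomerCode
                     AND  d.IsCurrent = 1);
```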
Mini dimensions
Used to track changes of dimension attributes when the type 2 technique is infeasible
Similar to junk dimensions
Typically used for large dimensions
Combinations can be built in advance or on the fly
Built from dimension table input
ETL: Delivering (Cont)
Small static dimensions
Dimensions created by the ETL system without a real source
Lookup dimensions for translations of codes, etc.
User-maintained dimensions
Master dimensions without a real source system
Descriptions, groupings, and hierarchies created for reporting and analysis purposes
ETL: Delivering (Cont)
Fact table loading
Building and maintaining bridge dimension tables
Handling late arriving data
Management of conformed dimensions
Administration of fact tables
Building aggregations
Building OLAP cubes
Transferring DW data to other environments for specific purposes
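One way to handle late arriving dimension data is to create placeholder ("inferred") dimension rows so that fact rows can still be loaded. This is a sketch only, assuming a hypothetical `DimProduct` with an `IsInferred` flag and a fact staging table `Staging.dbo.Sales`; the names and the placeholder text are illustrative.

```sql
-- Insert a placeholder dimension row for every business key that appears in
-- the fact staging data but does not yet exist in the dimension.
INSERT INTO DimProduct (ProductCode, ProductName, IsInferred)
SELECT DISTINCT s.ProductCode, 'Unknown (late arriving)', 1
FROM   Staging.dbo.Sales AS s
WHERE  s.ProductCode IS NOT NULL
  AND  NOT EXISTS (SELECT 1
                   FROM   DimProduct AS d
                   WHERE  d.ProductCode = s.ProductCode);
```

When the real dimension record arrives later, the placeholder attributes can be overwritten and the flag cleared.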
ETL: Managing
Management of ETL environment
Goals
Reliability
Availability
Manageability
Job scheduler
Backup system
Recovery and restart system
Version control system
ETL: Managing (Cont.)
Version migration system
Workflow monitor
Sorting system
Analyzing dependencies and lineage
Problem escalation system
Parallelization
Security system
Compliance manager
Metadata repository manager
ETL Process
Planning
High-level source-to-target data flow diagram
Selection and implementation of ETL tool
Development of default strategies for dimension management, error handling, and other processes
Development of data transformation diagrams by target table
Development of job sequencing
ETL Process
Developing one-time historic load
Build and test the historic dimension and fact tables load
Developing incremental load process
Build and test dimension and fact tables incremental load processes
Build and test aggregate table loads and/or OLAP processing
Design, build, and test the ETL system automation
ETL Tools: Build vs Buy
Many off-the-shelf tools exist
Benefits are not seen right away
Setup
Learning curve
High-end tools may not justify value for smaller warehouses
Off-the-shelf ETL Tools
Tool                                      Vendor
Oracle Warehouse Builder (OWB)            Oracle
Data Integrator (BODI)                    Business Objects
IBM Information Server (Ascential)        IBM
SAS Data Integration Studio               SAS Institute
PowerCenter                               Informatica
Oracle Data Integrator (Sunopsis)         Oracle
Data Migrator                             Information Builders
Integration Services                      Microsoft
Talend Open Studio                        Talend
DataFlow                                  Group 1 Software (Sagent)
Data Integrator                           Pervasive
Transformation Server                     DataMirror
Transformation Manager                    ETL Solutions Ltd.
Data Manager                              Cognos
DT/Studio                                 Embarcadero Technologies
ETL4ALL                                   IKAN
DB2 Warehouse Edition                     IBM
Jitterbit                                 Jitterbit
Pentaho Data Integration                  Pentaho
ETL Specification Document
Can be as large as 100 pages per business process; in reality, the work starts after the high-level design is documented in a few pages
Source-to-target mappings
Data profiling reports
Physical design decisions
Default strategy for extracting from each major source system
Archival strategy
Data quality tracking and metadata
Default strategy for managing changes to dimension attributes
ETL Specification Document (Cont)
System availability requirements and strategy
Design of data auditing mechanism
Location of staging areas
Historic and incremental load strategies for each table
Detailed table design
Historic data load parameters (# of months) and volumes (# of rows)
Incremental data volumes
ETL Specification Document (Cont)
Handling of late arriving data
Load frequency
Handling of changes in each dimension attribute (types 1, 2, 3)
Table partitioning
Overview of data sources; discussion of source-specific characteristics
Extract strategy for the source data
Change data capture logic for each source table
Dependencies
Transformation logic (diagram or pseudocode)
ETL Specification Document (Cont)
Preconditions to avoid error conditions
Recovery and restart assumptions for each major step of the ETL pipeline
Archiving assumptions for each table
Cleanup steps
Estimated effort
Overall workflow
Job sequencing
Logical dependencies
Loading Pointers
One time historic load
Disable RI constraints (FKs) and re-enable them after the load is complete
Drop indexes and re-create them after the load is complete
Use bulk loading techniques
Not always the case
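The pointers above might look like this in T-SQL. This is a sketch assuming a hypothetical fact table `dbo.FactSales` with a nonclustered index `IX_FactSales_DateKey`; it disables and rebuilds the index rather than dropping and re-creating it, which is a variant of the advice on the slide.

```sql
-- Disable all FK and CHECK constraints on the fact table before the load.
ALTER TABLE dbo.FactSales NOCHECK CONSTRAINT ALL;

-- Disable a nonclustered index. (Disabling a clustered index would make the table inaccessible.)
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales DISABLE;

-- ... the bulk load of dbo.FactSales runs here ...

-- Rebuild the index, then re-enable the constraints and re-validate the loaded rows.
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales REBUILD;
ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT ALL;
```

`WITH CHECK` makes SQL Server validate the rows that were loaded while the constraints were off, so the constraints end up trusted again.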
Loading Pointers (Cont)
Incremental load
Sometimes historic and incremental load logic is the same; often it is similar
Updating aggregations, if necessary
Error handling
Sample: Generation of Surrogate Keys on SQL Server
As simple as:
DECLARE @i INTEGER
SELECT @i = MAX(ID) + 1 FROM TableName
But this may not work with concurrent processes
OR
CREATE PROCEDURE pGetNextID (@SeedName VARCHAR(32), @SeedValue BIGINT OUTPUT)
AS
UPDATE Lookup_Seed
SET @SeedValue = SeedValue = SeedValue + 1
WHERE SeedID = @SeedName
Lookup_Seed table:
SeedID varchar(32)
SeedValue bigint
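A short usage sketch for the `pGetNextID` procedure, assuming `Lookup_Seed` already contains a row whose `SeedID` is `'DimCustomer'` (a hypothetical seed name chosen for the example):

```sql
DECLARE @NextID BIGINT;

-- The single UPDATE inside the procedure increments and returns the value atomically,
-- so two concurrent callers do not receive the same key.
EXEC pGetNextID @SeedName = 'DimCustomer', @SeedValue = @NextID OUTPUT;

SELECT @NextID AS NextSurrogateKey;
```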
Introduction to ETL Using Microsoft Tools
About SSIS
Microsoft's ETL tool
Solutions are created in packages
Developed in Business Intelligence Development Studio (BIDS)
A variation of Visual Studio
How to create SSIS project within BIDS
Click on Start->All Programs->Microsoft SQL Server 2008->SQL Server Business Intelligence Development Studio
From the File menu select New->Project
In the Project Type, choose Business Intelligence Projects->Integration Services Project
Enter the name of the project and select a storage location
Click the OK button
How to execute SSIS projects
Within BIDS, right click on the project name in the solution explorer and choose Execute Package
Output window shows results
Outside of BIDS
dtexec.exe
dtexecui.exe
SQL Server Management Studio
SQL Server Agent Scheduler
Custom .NET application
Tasks We Will Cover
Bulk insert
Bulk inserts data from text files into a SQL Server database
Data flow
Transforms, cleans, and modifies data as it is moved from source to destination
Execute SQL
Executes SQL statements in specified databases
Connection Managers
A connection manager is a connection to a data source
Must be configured for each data source and destination
We will use OLE DB and flat file connection managers
Other connection managers are available
Configuring Connection Manager
Open a package
Click the Control Flow tab
Right click the Connection Managers area, then click New Connection or choose a new connection of a specific type
Flat File Connection Manager
Configure General, Columns, and Advanced sections
Use the Preview option to view how your configurations were applied
OLE DB Connection Manager
Using BIDS 2008:
Select Native OLE DB/SQL Server Native Client 10.0
Enter server name
Enter login info
Select database
Test connection
Click OK
Data Migration Best Practices
Data types from source must match data types from destination
Conversions may be necessary
Do conversions as early as possible
Bring over only data fields that need to be loaded into DW
Bulk Insert Task
Fastest way to copy data to SQL Server
No transformations can be performed when copying data
Usually used to bring raw data from sources into staging databases in the DW environment
Configure General, Connection, Options, and Expressions sections
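The Bulk Insert task corresponds closely to the T-SQL `BULK INSERT` statement. A minimal sketch, assuming a hypothetical pipe-delimited file with a header row and a staging table `Staging.dbo.Customer` whose columns match the file layout:

```sql
BULK INSERT Staging.dbo.Customer
FROM 'C:\etl\incoming\customer.txt'   -- hypothetical path, resolved on the SQL Server machine
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,    -- skip the header row
    TABLOCK                 -- take a table lock for the duration of the load
);
```

Because no transformations happen during the copy, staging columns are often declared as wide character types and converted in a later step.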
Execute SQL Task
Executes one or more SQL statements or stored procedures
Executing batches is possible by placing the GO command between batches
SQL statements can be entered directly into the task editor window or may reside in a file or a variable
May return a result that can be captured in a variable
May specify parameters for queries
Use variables
Precedence Constraints
Specify order of task execution
3 types
On success: green
On failure: red
On completion: blue
Configured on the Control Flow tab
Drag the green arrow that is coming out of a task
Connect it to another task
Right click on it to specify the type; the default is On Success
Data Flow Task
Transforms, cleans, and modifies data as it is moved from source to destination
Is added to the package on the Control Flow tab
Its elements are configured on the Data Flow tab
A package may contain multiple data flow tasks
Elements are grouped into Source, Transformation, and Destination
Transformations
Numerous transformations are available
We will look at the Lookup transformation only
Essential for loading fact tables
Looks up data in a dimension table to retrieve key values to be loaded into the fact table
Caching can be used (either full or partial)
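Conceptually, each Lookup transformation behaves like a join from the incoming rows to a dimension table on the business key. The query below is a rough SQL equivalent for two lookups, assuming hypothetical tables `Staging.dbo.Sales`, `DimCustomer`, and `DimDate`, and assuming each dimension has an "Unknown" row with key -1 to receive unmatched rows.

```sql
SELECT ISNULL(c.CustomerKey, -1) AS CustomerKey,   -- -1 = assumed 'Unknown' member
       ISNULL(d.DateKey, -1)     AS DateKey,
       s.SalesAmount
FROM   Staging.dbo.Sales AS s
LEFT JOIN DimCustomer AS c ON c.CustomerCode = s.CustomerCode   -- lookup 1: customer key
LEFT JOIN DimDate     AS d ON d.CalendarDate = s.SaleDate;      -- lookup 2: date key
```

The LEFT JOIN keeps fact rows whose business key has no match, which is the case that must be decided explicitly when configuring a lookup.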
Overall Flow
Wipe out staging table
Execute SQL task
Wipe out DW table if data is fully refreshed
Execute SQL task Do not do this if data should be appended
Load raw data into staging table
Bulk Insert task/Data flow task
Load data from staging tables into dimensions
Execute SQL statement task/Data flow task
Load data into fact table
Data flow task
Create aggregations and/or OLAP cubes
Execute SQL task and/or SSAS
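The two "wipe out" steps at the start of the flow are typically one-line statements inside an Execute SQL task. A sketch with hypothetical table names:

```sql
-- Step 1: empty the staging table before each run.
TRUNCATE TABLE Staging.dbo.Sales;

-- Step 2 (full refresh only; skip when data is appended): empty the warehouse tables.
-- The fact table is cleared first so that the dimension rows it references can be removed.
TRUNCATE TABLE dbo.FactSales;

-- TRUNCATE TABLE is not allowed on a table referenced by a FOREIGN KEY constraint,
-- so DELETE is used for the dimension.
DELETE FROM dbo.DimProduct;
```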
Load Data Into Fact Table: A Closer Look
Create OLE DB source
Use a SQL command to select data from the staging table
Create a Lookup transformation for each dimension key that will be stored in the fact table
Create OLE DB destination
Specify fact table name
Specify mapping
Useful SQL Scripts
Check if a table exists prior to creating it
if not exists (select * from information_schema.tables where table_name = 'xyz')
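A complete version of the existence check might look like the following sketch. The table `dbo.xyz` and its columns are placeholders; filtering on `TABLE_SCHEMA` as well avoids a false match on a same-named table in another schema.

```sql
IF NOT EXISTS (SELECT *
               FROM   INFORMATION_SCHEMA.TABLES
               WHERE  TABLE_SCHEMA = 'dbo'
                 AND  TABLE_NAME   = 'xyz')
BEGIN
    -- Placeholder definition; replace with the real table design.
    CREATE TABLE dbo.xyz (
        id   INT         NOT NULL PRIMARY KEY,
        name VARCHAR(50) NULL
    );
END
```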
Useful SQL Scripts
Generate Dates dimension at daily grain
DECLARE @gencalendar TABLE (cal_date DATETIME PRIMARY KEY)
DECLARE @p_date SMALLDATETIME
SET @p_date = '20010101'
WHILE @p_date <= '20151231'
BEGIN
    INSERT INTO @gencalendar (cal_date) VALUES (@p_date)
    SET @p_date = dateadd(d, 1, @p_date)
END
SELECT * FROM @gencalendar
Useful SQL Scripts
If the Dates dimension has a key in YYYYMMDD format, this script populates the entire table
DECLARE @p_date SMALLDATETIME
SET IDENTITY_INSERT dimdates ON
SET @p_date = '19000101'
WHILE @p_date <= '20191231'
BEGIN
    INSERT INTO dimdates (Datekey, CalendarDate, Calendaryear, Calendarhalfyear, Calendarquarter, Calendarmonth, Calendarday)
    SELECT CAST(convert(varchar(8), @p_date, 112) as int),  -- YYYYMMDD as an integer key
           CAST(@p_date as date),
           DATEPART(YY, @p_date),
           CASE WHEN DATEPART(QQ, @p_date) < 3 THEN 1 ELSE 2 END,
           DATEPART(QQ, @p_date),
           DATEPART(MM, @p_date),
           DATEPART(DD, @p_date)
    SET @p_date = dateadd(d, 1, @p_date)
END
SET IDENTITY_INSERT dimdates OFF
Useful SQL Scripts
If the Dates dimension has a key in YYYYMMDD format and the date key has already been inserted, this script populates the remaining columns in this dimension
-- Rows still to be populated are marked with isEndOfYear = 9.
-- CalMonthName temporarily holds the YYYYMMDD key as text.
UPDATE dimDate
SET CalMonthName = CAST(datekey as varchar)
WHERE isEndOfYear = 9

UPDATE dimDate
SET CalDate = CAST(substring(CalMonthName,5,2) + '/' + substring(CalMonthName,7,2) + '/' + substring(CalMonthName,1,4) as DATE)
WHERE isEndOfYear = 9

-- Column references inside SET see the values from before the update,
-- so every derived column is computed from CalDate rather than from CalYear or CalQuarter.
UPDATE dimDate
SET CalYear = DATEPART(YY, CalDate),
    CalMonth = DATEPART(MM, CalDate),
    CalMonthName = DATENAME(MM, CalDate) + ' ''' + RIGHT(DATENAME(YY, CalDate), 2),
    CalQuarter = DATEPART(QQ, CalDate),
    CalQuarterName = DATENAME(YY, CalDate) + ' Q' + DATENAME(QQ, CalDate),
    CalHY = CASE WHEN DATEPART(QQ, CalDate) < 3 THEN DATENAME(YY, CalDate) + ' H1' ELSE DATENAME(YY, CalDate) + ' H2' END,
    isEndOfYear = 8
WHERE isEndOfYear = 9
Useful SQL Scripts
Calculate a person's age
create function dbo.fCalculateAge (@DOB datetime, @Date datetime)
returns int
as
begin
    return (
        select case
                   when month(@DOB) > month(@Date) then datediff(yyyy, @DOB, @Date) - 1
                   when month(@DOB) < month(@Date) then datediff(yyyy, @DOB, @Date)
                   when month(@DOB) = month(@Date) then
                       case when day(@DOB) > day(@Date) then datediff(yyyy, @DOB, @Date) - 1
                            else datediff(yyyy, @DOB, @Date)
                       end
               end)
end
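A quick check of the function around a birthday, using made-up dates. `datediff(yyyy, ...)` alone only counts year boundaries, which is why the month and day comparison is needed:

```sql
SELECT dbo.fCalculateAge('19800615', '20240614') AS AgeDayBeforeBirthday,  -- 43
       dbo.fCalculateAge('19800615', '20240615') AS AgeOnBirthday;         -- 44
```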
Useful SQL Scripts
Sample query for loading data into dimension table
INSERT INTO DimXXX (Col1, Col2)
SELECT DISTINCT stgcol1, stgcol2
FROM Staging.dbo.tbl
WHERE ltrim(rtrim(stgcol1)) + '|' + ltrim(rtrim(stgcol2))
      NOT IN (SELECT DISTINCT ltrim(rtrim(col1)) + '|' + ltrim(rtrim(col2)) FROM DimXXX)
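The same load can be written with `NOT EXISTS`, which compares the columns directly instead of concatenating them. This is an alternative sketch, not the query from the slide; unlike the original it also stores the trimmed values.

```sql
INSERT INTO DimXXX (Col1, Col2)
SELECT DISTINCT LTRIM(RTRIM(s.stgcol1)), LTRIM(RTRIM(s.stgcol2))
FROM   Staging.dbo.tbl AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   DimXXX AS d
                   WHERE  LTRIM(RTRIM(d.Col1)) = LTRIM(RTRIM(s.stgcol1))
                     AND  LTRIM(RTRIM(d.Col2)) = LTRIM(RTRIM(s.stgcol2)));
```

With `NOT IN`, a single NULL in the subquery result causes no rows to be inserted at all. `NOT EXISTS` does not have that problem, although staging rows with a NULL in either column never match and would be inserted again on every run.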
Useful SQL Scripts
Parsing names in a Last Name, First Name, Middle Name, Suffix format separated by commas, with Middle Name and Suffix being optional
create table tmpnames (fullname varchar(255))
The script follows.
-- Split fullname on commas into up to four trimmed parts, then decide whether
-- the third part is a middle name or a suffix.
select n.fullname,
       p.lname,
       p.fname,
       case when p.part3 not in ('Sr','Jr','II','III','IV') then p.part3 else '' end as mname,
       case when p.part3 in ('Sr','Jr','II','III','IV') then p.part3
            when p.part4 in ('Sr','Jr','II','III','IV') then p.part4
            else '' end as suffix
from tmpnames as n
-- Pad with four commas so that every comma position below is always found.
cross apply (select s = n.fullname + ',,,,') as a
cross apply (select c1 = charindex(',', a.s)) as b
cross apply (select c2 = charindex(',', a.s, b.c1 + 1)) as c
cross apply (select c3 = charindex(',', a.s, c.c2 + 1)) as d
cross apply (select c4 = charindex(',', a.s, d.c3 + 1)) as e
cross apply (select lname = ltrim(rtrim(substring(a.s, 1, b.c1 - 1))),
                    fname = ltrim(rtrim(substring(a.s, b.c1 + 1, c.c2 - b.c1 - 1))),
                    part3 = ltrim(rtrim(substring(a.s, c.c2 + 1, d.c3 - c.c2 - 1))),
                    part4 = ltrim(rtrim(substring(a.s, d.c3 + 1, e.c4 - d.c3 - 1)))) as p
OLAP
SSAS is used
Cubes
KPIs
Data mining engine
Actions
Cubes
A cube is a multidimensional structure that contains dimensions and measures
Dimensions define the structure of the cube, and measures provide the numerical values of interest to the end user
A cube allows a client application to retrieve values as if cells in the cube defined every possible summarized value
Cell positions in the cube are defined by the intersection of dimension members and contain aggregated measures
Cubes can be queried using MDX (Multidimensional Expressions), not SQL
KPI
Key performance indicators
From a business perspective, a KPI is a quantifiable measurement for gauging business success
In SSAS, a KPI is a collection of calculations, associated with a measure group in a cube, that are used to evaluate business success
Data Mining Engine
Uses mathematical analysis to extract patterns and trends that exist in data
Some scenarios:
Forecasting sales
Targeting mailings toward specific customers
Determining which products are likely to be sold together
Finding sequences in the order that customers add products to a shopping cart
Actions
SSAS commands that are used by clients
Types:
Drillthrough actions, which return the set of rows that represents the underlying data of the selected cells of the cube where the action occurs.
Reporting actions, which return a report from Reporting Services that is associated with the selected section of the cube where the action occurs.
Standard actions, which return the action element (URL, HTML, DataSet, RowSet, and other elements) that is associated with the selected section of the cube where the action occurs.
Creating Cubes
Click on Start->All Programs->Microsoft SQL Server 2008->SQL Server Business Intelligence Development Studio
From the File menu select New->Project
In the Project Type, choose Business Intelligence Projects->Analysis Services Project
Enter the name of the project and select a storage location
Click the OK button
Creating Cubes
Create a new data source
Right click on Data Sources and choose New Data Source
Create a new data source view
Right click on Data Source Views and choose New Data Source View
Create dimensions
Right click on Dimensions and choose New Dimension
Create cubes
Right click on Cubes and choose New Cube
Creating Cubes
Once development is completed, the cube is deployed and processed
Deploy = creating the cube structure on the server
Process = data calculations, aggregations, etc.
Right click on the cube name in Solution Explorer and choose Process (prior to this step, right click the project name in Solution Explorer, choose Properties, and specify the deployment location)
The cube is then ready to be consumed by users