Chapter 4
Enforcing Data Quality, Extending SQL Server Integration Services
A report on data quality from The Data Warehousing Institute (TDWI) estimates that data quality problems cost US businesses more than 600 billion dollars per year.
The report's findings are based on interviews with industry experts, customer interviews, and survey responses.
In the Data Quality Management (DQM) community, there is a generally held view that the quality of a data set depends on whether it meets defined requirements.
SQL Server 2012 provides Data Quality Services (DQS) and Master Data Services (MDS).
With these tools, you can solve data quality problems in a proactive way.
One of the most powerful options in the Master Data Services (MDS) Add-in for Excel is the ability to match and de-duplicate data by using Data Quality Services.
You can use this option if your Data Quality Services instance is installed on the same SQL Server instance as Master Data Services.
Data cleansing is performed on your source data by using a knowledge base that has been built in DQS against a known, high-quality data set.
SQL Server Data Quality Services is a knowledge-driven data quality product aimed at data stewards and IT professionals who want to improve the quality of their business data.
The data cleansing process is carried out by using the DQS data quality tools.
DQS allows you to create a knowledge base by discovering and managing information about the data; this knowledge base is then used for cleansing the data.
The aim of data cleansing is that incorrect values should be corrected and incomplete values should be made complete.
A DQS knowledge base must be available on the Data Quality Server, against which you compare and cleanse your source data.
DQS uses the knowledge base for automatic, computer-assisted data cleansing.
After the automatic process is done, you can manually review and further edit the processed data.
You can use a SQL Server or an Excel data source.
If the source data is in an Excel file, Excel must be installed on the same computer as Data Quality Client.
A DQS knowledge base must exist before you can start a DQS cleansing or
matching project.
For example:
If you are cleansing company names, the knowledge base you use should have high-quality data about company names.
A KB used for cleansing company names could have synonyms and term-based relations defined.
A DQS project uses a single KB; multiple projects can use the same KB (knowledge base).
A cleansing process has the following stages:
1) Mapping.
2) Computer Assisted Cleansing.
3) Interactive Cleansing.
4) Export.
After you have a knowledge base, you can use a DQS data quality project to validate and cleanse your data.
A matching policy consists of one or more matching rules that identify which domains will be used when DQS assesses whether records are potential duplicates.
The matching policy activity in the Knowledge Base Management wizard
analyzes sample data by applying each matching rule to compare two records
at a time throughout the range of records.
The matching policy is run on domains mapped to the sample data.
A knowledge base is available for use in a data quality project only when it is
published.
The matching rules in a knowledge base cannot be changed by any user other than the person who created them.
DQS performs data de-duplication by comparing each row in the source data to every other row, using the matching policy defined in the knowledge base, and producing a probability that the rows are a match.
A data matching project consists of a computer-assisted process and an
interactive process.
When DQS performs the matching analysis, it creates clusters of records that it considers matches.
You can export the results of the matching process either to a SQL Server
table or a .csv file.
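For example, if you export the matching results to a SQL Server table, you could review the clusters with a small C# program such as the sketch below. The connection string, the table name dbo.MatchingResults, and the column names CLUSTER_ID and SCORE are assumptions made for illustration only; the actual names depend on the export you perform.

using System;
using System.Data.SqlClient;

class InspectMatchingClusters
{
    static void Main()
    {
        // Connection string, table, and column names are assumptions for this sketch.
        const string connectionString =
            "Data Source=.;Initial Catalog=DQS_STAGING_DATA;Integrated Security=SSPI";
        const string query =
            "SELECT CLUSTER_ID, COUNT(*) AS RecordsInCluster, AVG(SCORE) AS AverageScore " +
            "FROM dbo.MatchingResults GROUP BY CLUSTER_ID ORDER BY CLUSTER_ID";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(query, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Each cluster groups records that DQS considers matches.
                    Console.WriteLine("Cluster {0}: {1} records, average score {2}",
                        reader[0], reader[1], reader[2]);
                }
            }
        }
    }
}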
The Script Task has the ability to read a variable value from the SSIS package into the script and, in addition, to write messages out of the Script Task.
The Script Task can interact with SSIS variables; you can use .NET code to manipulate and respond to variable values.
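As an illustration, the following is a minimal sketch of what the Main method of a C# Script Task might look like when it reads one package variable, writes a message, and updates another variable. The variable names User::InputPath and User::RowCount are assumptions made for this example; they would need to be listed in the ReadOnlyVariables and ReadWriteVariables properties of the task, respectively.

public void Main()
{
    // Read a package variable (assumed name User::InputPath, listed in ReadOnlyVariables).
    string inputPath = Dts.Variables["User::InputPath"].Value.ToString();

    // Write a message out of the Script Task to the SSIS progress/log output.
    bool fireAgain = true;
    Dts.Events.FireInformation(0, "Script Task",
        "Processing file: " + inputPath, string.Empty, 0, ref fireAgain);

    // Update a package variable (assumed name User::RowCount, listed in ReadWriteVariables).
    Dts.Variables["User::RowCount"].Value = 42;

    Dts.TaskResult = (int)ScriptResults.Success;
}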
The SSIS Script Task allows you to add functionality to your SSIS package that does not already exist in the other predefined tasks.
The SSIS Script Task is one of the most interesting tools for extending SSIS capabilities.
With the Script Task, you can program new functionality by using C# or VB.NET.
The code for the Script Task and the Script Component is written at design time.
The Script Task provides the entire required infrastructure for the custom code for you, letting you focus exclusively on the code.
The Script Task Editor exposes property expressions on the Expressions page, as other task editors do.
Note that you need to list the SSIS variables you want to use in your script in the ReadOnlyVariables and ReadWriteVariables properties of the Script Task.
With the Script Task and Script Component, you can implement custom programmatic logic in an SSIS package quickly and efficiently.
The definition of a script task or component is embedded in the definition of
the SSIS package itself.
In contrast, custom tasks and components can be developed, deployed, and maintained independently of the SSIS package.
After you have determined that, due to complexity, dependency, or reusability requirements, you need a custom SSIS data flow component, use the following guidelines to plan the design of the custom component:
1) Role.
2) Usage.
3) Behavior.
4) Configuration.
Role:
Usage:
If the component will perform lookup operations or will need to access data that is not available in the current data flow, it will require access to external data sources.
To access data stored in variables or parameters, the component will also need access to those variables and parameters.
Behavior:
Configuration:
Design-Time Methods
Run-Time Methods
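The design-time and run-time methods referred to above correspond to overrides on the PipelineComponent base class in the Microsoft.SqlServer.Dts.Pipeline namespace. The following is a minimal sketch of a custom transformation, assuming the SQL Server 2012 SSIS pipeline assemblies are referenced; the class name SampleTransform and the input/output names are made up for illustration.

using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

[DtsPipelineComponent(DisplayName = "Sample Transform",
    ComponentType = ComponentType.Transform)]
public class SampleTransform : PipelineComponent
{
    // Design-time method: called in the designer when the component is added
    // to the data flow; it defines the component's inputs and outputs.
    public override void ProvideComponentProperties()
    {
        base.RemoveAllInputsOutputsAndCustomProperties();

        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Sample Input";

        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Sample Output";
        // A synchronous output reuses the input buffer, so rows flow straight through.
        output.SynchronousInputID = input.ID;
    }

    // Run-time method: called for each buffer of rows while the package executes;
    // row-level transformation logic belongs here.
    public override void ProcessInput(int inputID, PipelineBuffer buffer)
    {
        while (buffer.NextRow())
        {
            // Transform the columns of the current row here.
        }
    }
}

Design-time methods such as ProvideComponentProperties and Validate run while you configure the component in the designer, whereas run-time methods such as PreExecute, ProcessInput, and PostExecute run when the data flow executes.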