IBM WebSphere QualityStage
Version 8 Release 1
Tutorial
SC18-9925-01
Note
Before using this information and the product that it supports, read the information in “Notices” on page 51.
The Designer client provides a common user interface in which you design your
data quality jobs. In addition, you have the power of the parallel processing engine
to process large stores of source data.
The integrated stages available in the Repository provide the basis for
accomplishing the following data cleansing tasks:
v Resolving data conflicts and ambiguities
v Uncovering new or hidden attributes from free-form or loosely controlled source
columns
v Conforming data by transforming data types into a standard format
v Creating one unique result
Learning objectives
The key points that you should keep in mind as you complete this tutorial include
the following concepts:
v How the processes of standardization and matching improve the quality of the
data
v The ease of combining both QualityStage and DataStage stages in the same job
v How the data flows in an iterative process from one job to another
v The surviving data results in the best available record
In this tutorial, you will create a project by using the data that is provided.
To start a QualityStage job, you open the Designer client and create a new Parallel
job. You build the QualityStage job by adding stages, source and target files, and
links from the Repository, and placing them onto the Designer canvas.
In this tutorial, you build four QualityStage jobs. Each job is built around one of
the Data Quality stages and additional DataStage stages.
The Designer client stages are stored in the Designer tool palette. You can access all
the QualityStage stages in the Data Quality group in the palette. You configure
each stage to perform the type of actions on the data that obtain the required
results. Those results are used as input data to the next stage. The following stages
are included in QualityStage:
v Investigate stage
v Standardize stage
v Match Frequency stage
v Unduplicate Match stage
v Reference Match stage
v Survive stage
You can also add any of the DataStage stages to your job. In some of the lessons,
you add DataStage stages to enhance the types of tools for processing the data.
In this tutorial, you use all of these components when you build and run your
QualityStage project.
In this tutorial, you have the role of a database analyst for a bank that provides
many financial services. The bank has a large database of customers; however,
there are problems with the customer list because it contains multiple names and
address records for a single household. Because the marketing department wants
to market additional services to existing customers, you need to find and remove
duplicate addresses.
For example, a married couple has four accounts, each in their own names. The
accounts include two checking accounts, an IRA, and a mutual fund.
To save money on the mailing, the bank wants to consolidate the household
information so that each household receives only one mailing. In this tutorial, you
are going to use IBM WebSphere QualityStage to standardize all customer
addresses. In addition, you need to locate and consolidate all records of customers
who are living at the same address.
Learning objectives
After you complete these tasks, you should understand how QualityStage stages
restructure and cleanse the data by using applied business rules.
Skill level
Audience
This tutorial is intended for business analysts and systems analysts who are
interested in understanding QualityStage.
Prerequisites
Expected results
Upon the completion of this tutorial, you should be able to use the Designer client
to create your own QualityStage projects to meet the business requirements and
data quality standards of your company.
The IBM WebSphere QualityStage server may be installed on the same Windows
computer as the clients, or it may be on a separate Windows, UNIX®, or Linux
computer. Sometimes the engine tier is referred to as the DataStage and
QualityStage server. When you created the project for the tutorial, you
automatically created a folder or directory for that project on the computer where
the engine tier is installed.
1. Open the tutorial folder TutorialData\QualityStage that you created on the client
computer and locate the input.csv file.
2. Open the project folder on the computer where the engine tier is installed for
the tutorial project you created. Where tutorial_project is the name of the
project you created, examples of path names are:
v For a Windows server: C:\IBM\InformationServer\Server\Projects\
tutorial_project
v For a UNIX or Linux server: /opt/IBM/InformationServer/Server/Projects/
tutorial_project
3. On the client computer, right-click the input.csv file in the tutorial folder and
select Copy from the shortcut menu.
4. Move to the project folder on the server computer, right-click and select Paste
from the shortcut menu.
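If you prefer to script this copy instead of using the shortcut menu, the following Python sketch performs the same operation. It assumes the engine tier is on the same Windows computer as the client and that the paths match the examples above; adjust both paths to your own tutorial and project folders, or use a network share or file-transfer tool if the engine tier is on a separate computer.

    import shutil

    # Example paths only; substitute your own tutorial and project locations.
    source = r"C:\TutorialData\QualityStage\input.csv"
    target = r"C:\IBM\InformationServer\Server\Projects\tutorial_project\input.csv"

    # Copy input.csv from the client tutorial folder into the project folder.
    shutil.copyfile(source, target)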
Starting a project
Use a Designer client project as a container for your QualityStage jobs.
Open the DataStage Designer client to begin the tutorial. The DataStage Designer
Parallel job provides the executable file that runs your QualityStage jobs.
1. Click Start → All Programs → IBM Information Server → IBM WebSphere
DataStage and QualityStage Designer. The Attach to Project window opens.
2. In the Domain field, type the name of the server that you are connected to.
3. In the User name field, type your user name.
4. In the Password field, type your password.
5. In the Project field, select the project you created (for example, Tutorial).
6. Click OK. The New Parallel job opens in the Designer client.
Creating a job
The Designer client provides the interface to the parallel engine that processes the
QualityStage jobs. You are going to save a job to a folder in the DataStage
repository.
You have created a new parallel job named Investigate1 and saved it in the folder
Jobs\MyTutorial in the repository. Using these procedures, create three more jobs in
this folder and name them Standardize1, Unduplicate1, and Survive1.
You can use the information in the reports to make basic assumptions about the
data and the steps you must take to attain the goal of providing a legitimate
address for each customer in the database.
Learning objectives
After completing the lessons in this module, you should know how to do the
following tasks:
1. Add QualityStage or DataStage stages and links to a job
2. Configure stage properties to specify which action they take when the job is
run
3. Load and process customer data and metadata
4. Compile and run a job
5. Produce data for reports
If you have not already done so, open the Designer client.
1. From the left pane of the Designer client, go to the MyTutorial folder that you
created for this tutorial and double-click Investigate1 to open the job.
2. Click Palette → Data Quality to select the Investigate stage.
If you do not see the palette, click View → Palette.
3. Drag the Investigate stage onto the Designer canvas and drop it in the middle
of the canvas.
4. Drag a second Investigate stage and drop it beneath the first Investigate
stage. You must use two Investigate stages to create the data for the reports.
5. Click Palette → File and select Sequential File.
6. Drag the Sequential File onto the Designer canvas and drop it to the left of
the first Investigate stage. This sequential file is the source file.
7. Click Palette → Processing and select the Copy stage. This stage duplicates the
data from the source file and copies it to the two Investigate stages.
8. Drag the Copy stage onto the Designer canvas and drop it between the
Sequential File and the first Investigate stage.
9. Click Palette → File, and drag a second Sequential File onto the Designer
canvas and drop it to the right of the first Investigate stage.
The data from the Investigate stage is sent to the second Sequential File
which is the target file.
Lesson checkpoint
When you set up the Investigate job, you are connecting the source file and its
source data and metadata to all the stages and linking the stages to the target files.
In completing this lesson, you learned the following about the Designer:
v How to add stages to the Designer canvas
v How to combine Data Quality and Processing stages on the Designer canvas
v How to link all the stages together
When you rename the links and stages, do not use spaces. The Designer client
resets the name back to the generic value if you enter spaces. The goal of this
lesson is to replace the generic names for the icons on the canvas with more
appropriate names.
Stage                  Change to
Copy                   CopyStage
Investigate (first)    InvestigateStage
Investigate (second)   InvestigateStage2
8. Rename the three target files from the top in the following order:
a. NameTokenReport
b. AreaTokenReport
c. AreaPatternReport
9. Right-click the names of the following links, select Rename, and type the new link name in the highlighted box:
Link                                          Change to
From InvestigateStage to NameTokenReport      TokenData
From InvestigateStage2 to AreaTokenReport     AreaTokenData
From InvestigateStage2 to AreaPatternReport   AreaPatternData
Lesson checkpoint
In this lesson, you changed the generic stages and links to names appropriate for
the job.
The goal of this lesson is to attach the input data of customer names and addresses
and load the metadata.
To add data and metadata to the Investigate job, configure the source file to locate
the input data file input.csv stored on your computer and load the metadata
columns.
This lesson explains how to configure a DataStage Processing stage, the Copy
stage, in a QualityStage job to duplicate the metadata and send the output
metadata to the two Investigate stages.
This procedure shows you how to map columns to two different outputs.
Lesson checkpoint
In this lesson, you mapped the input metadata to the two output links to continue
the propagation of the metadata to the next two stages.
The Investigate stage analyzes each record from the source file. In this lesson, you
select the USNAME rule set to apply USPS® standards.
3. Select Name from the Available Data Columns section and click the arrow
button to move the Name column into the Standard Columns pane. The
InvestigateStage analyzes the Name column by using the rule set that you
select in step 4.
4. In the Rule Set: field, click the browse button to select a rule set for the InvestigateStage.
a. In the Rule Sets window, double-click the Standardization Rules folder to
open the Standardization Rules tree.
b. Double-click the USA folder, double-click the USNAME folder, then select
USNAME. The USNAME rule set parses the Name column according to
United States Postal Service standards for names.
c. Right-click USNAME and select Provision All from the shortcut menu.
d. Click OK to exit the Rule Sets window.
Your Name map should look like the following figure:
5. Click the Token Report check box in the Output Dataset section of the
window.
6. Click the Stage Properties → Output → Mapping tab.
Lesson summary
This lesson explained how to configure the Investigate stage by using the
USNAME rule set.
You learned how to configure the Investigate stage in the Investigate job by doing
the following tasks:
v Selecting the columns to investigate
v Selecting a rule from the rules set
v Mapping the output columns
5. In the Rule Set: field, click the browse button to locate a rule set for the InvestigateStage2.
a. In the Rule Sets window, double-click the Standardization Rules folder to
open the Standardization Rules tree.
9. Click the Output → Mapping tab. Since there are two output links from the
second Investigate stage, you must map the columns to each link:
a. In the Output name field above the Columns pane, select
AreaPatternData.
b. Select the Columns pane.
c. Right-click and select Select All from the shortcut menu.
d. Right-click and select Copy from the shortcut menu.
e. Select the AreaPatternData pane, right-click and select Paste Column from
the shortcut menu. The columns are mapped to the AreaPatternData
output link.
f. In the Output name field above the Columns pane, select AreaTokenData.
g. Repeat steps b through e, except select the AreaTokenData pane in step e.
10. Click OK to close the InvestigateStage2 window.
Lesson summary
This lesson explained how to configure the second Investigate stage to use the
USAREA rule set.
You learned how to configure the second Investigate stage in the Investigate job by
doing the following tasks:
v Selecting the columns to investigate
v Selecting a rule from the rules set
v Verifying the link ordering for the output reports
The Investigate job transforms the unformed source data into readable data, which is
later configured into Investigation reports.
3. In the File field, click the browse button and browse to the path name of the
folder on the server computer where the input data file resides.
4. In the File name field, type tokrpt.csv to display the path and file name in
the File field (for example, C:\IBM\InformationServer\Server\Projects\
tutorial\tokrpt.csv).
5. Click OK to close the stage.
6. Double-click the AreaPatternReport icon.
7. Repeat steps 2 to 5 except type areapatrpt.csv.
8. Double-click the AreaTokenReport icon.
9. Repeat steps 2 to 5 except type areatokrpt.csv.
Lesson checkpoint
This lesson explained how to configure the target files for use as reports.
You configured the three target data files by linking the data to each report file.
Compile the Investigate job in the Designer client. After the job compiles
successfully, open the Director client and run the job.
2. Click the Compile button to compile the job. The Compile Job window opens
and the job begins to compile. When the compiler finishes, the following message
is shown: Job successfully compiled with no errors.
3. Click Tools → Run Director. The Director application opens with the job shown
in the Director Job Status View window.
Lesson checkpoint
In this lesson, you learned how to compile and process an Investigate job.
You processed the data into three output files by doing the following tasks:
v Compiling the Investigate job
v Running the Investigate job in the Director
Module 1: Summary
In Module 1, you set up, configured, and processed an IBM WebSphere DataStage
and QualityStage Investigate job.
An Investigate job looks at each record column-by-column and analyzes the data
content of the columns that you select. The Investigate job loads the name and
address source data stored in the database of the bank, parses the columns into a
form that can be analyzed, and then organizes the data into three data files.
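The reports are easier to interpret if you keep in mind what pattern analysis does. The following Python sketch is only a simplified illustration (QualityStage uses its own pattern notation and token classifications); it reduces each token of a value to a single class character so that records with the same structure group together.

    def token_pattern(value):
        """Classify each token as numeric (N), alphabetic (A), or mixed (X)."""
        classes = []
        for token in value.split():
            if token.isdigit():
                classes.append("N")
            elif token.isalpha():
                classes.append("A")
            else:
                classes.append("X")
        return " ".join(classes)

    print(token_pattern("123 MAIN STREET"))   # N A A
    print(token_pattern("PO BOX 98"))         # A A N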
The Investigate job transforms the unformed source data into readable data that you
can configure into Investigation reports by using the IBM Information Server Web
console. You select QualityStage Reports to access the reports interface in the
Web console.
The next module organizes the unformed data into standardized data that provides
usable data for matching and survivorship.
Lessons learned
By completing this module, you learned about the following concepts and tasks:
v How to correctly set up and link stages in a job so that the data propagates from
one stage to the next
v How to configure the stage properties to apply the correct rule set to analyze the
data
v How to compile and run a job
v How to create data for analysis
When you worked on the data in Module 1, some addresses were free form and
nonstandard. Removing duplicates of customer addresses and guaranteeing that a
single address is the correct address for that customer would be very difficult
without standardizing the data.
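To make the idea of standardization concrete, the following Python sketch shows one very small part of what conditioning does: it uppercases a raw address, strips punctuation, and expands a few common abbreviations. This is an illustration only, not how the USADDR rule set is implemented, and the abbreviation list is a hypothetical sample.

    import re

    # A few common abbreviations; real rule sets cover far more variations.
    ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "APT": "APARTMENT"}

    def standardize_address(raw):
        """Uppercase, drop punctuation, collapse spaces, and expand abbreviations."""
        cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
        tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
        return " ".join(tokens)

    print(standardize_address("123 main st., apt 4"))
    # 123 MAIN STREET APARTMENT 4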
Learning objectives
After completing the lessons in this module, you should know how to do the
following tasks:
1. Add stages and links to a Standardize job
2. Configure the various stage properties to correctly process the data when the
job is run
3. Handle nulls by using derivations
4. Generate frequency and standardized data
If you have not already done so, open the Designer client.
As you learned in Lesson 1.1, you must add stages and links to the Designer
canvas to create a Standardize job. The Investigate job that you completed helped
you determine how to formulate a business strategy by using Investigation reports.
The Standardize job applies rule sets to the source data to condition it for
matching.
Stage                   Change to
Sequential File         Customer
Standardize stage       Standardize
Transformer stage       CreateAdditionalMatchColumns
Copy stage              Copy
Data Set file           Stan
Match Frequency stage   MatchFrequency
Data Set file           Frequencies
6. Right-click the names of the following links, select Rename, and type the new link name in the highlighted box:
Link                                                Change to
From Customer to Standardize                        Input
From Standardize to CreateAdditionalMatchColumns    Standardized
From CreateAdditionalMatchColumns to Copy           ToCopy
From Copy to Stan                                   StandardizedData
From Copy to MatchFrequency                         ToMatchFrequency
From MatchFrequency to Frequencies                  ToFrequencies
The following figure shows the Standardized job stages and links.
You set up and linked a Standardize job by doing the following tasks:
v Adding Data Quality and Processing stages to the Designer canvas
v Linking all the stages
v Renaming the links and stages
The source data is attached to the Customer source file and table definitions are
loaded to organize the data into standard address columns.
c. Click to move the Name column into the Selected Columns pane.
The Optional NAMES Handling field is activated.
d. Click OK.
You configured the Standardize stage to apply the USNAME, USADDR, and
USAREA rule sets to the customer data and saved the table definitions.
The metadata from the Standardize and Transformer stages is duplicated and
written to two output links.
Lesson checkpoint
This lesson explained how to configure the source file and all the stages for the
Standardize job.
You have now applied settings to each stage and mapped the output files to the
next stage for the Standardize job. You learned how to do the following tasks:
v Configure the source file to load the customer data and metadata
v Apply United States Postal Service-compliant rule sets to the customer name and
address data
v Add additional columns for matching and create derivations to handle nulls
v Write data to two output links and associate the data to the correct links
v Create frequency data
The job standardizes the data according to applied rules and adds additional
matching columns to the metadata. The data is written to two target data sets as
the source files for a later job.
Lesson checkpoint
This lesson explained how to attach files to the target data sets to store the
processed standardized customer name and address data and frequency data.
You have configured the Stan and Frequencies target data set files to accept the
data when it is processed.
Module 2: Summary
In Module 2, you set up and configured a Standardize job.
Running a Standardize job conforms the data to ensure that all the customer name
and address data has the same content and format. The Standardize job loads the
name and address source data stored in the database of the bank and adds table
definitions to organize the data into a format that can be analyzed by the rule sets.
Further processing by the Transformer stage increases the number of columns and
frequency data is generated for input into the match job.
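As a conceptual aside, frequency data matters to matching because agreement on a rare value is stronger evidence of a match than agreement on a common one. The following Python sketch generates simple value frequencies for one column; the surname values are hypothetical, and the real Match Frequency stage produces frequency information for every match column.

    from collections import Counter

    # Hypothetical standardized surname values from the customer data.
    surnames = ["SMITH", "SMITH", "GARCIA", "SMITH", "NAKAMURA", "GARCIA"]

    frequencies = Counter(surnames)
    print(frequencies.most_common())
    # [('SMITH', 3), ('GARCIA', 2), ('NAKAMURA', 1)]
    # Two records agreeing on NAKAMURA is stronger evidence of a match
    # than two records agreeing on SMITH.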
By completing this module, you learned about the following concepts and tasks:
v How to create standardized data to match records effectively
v How to run DataStage and Data Quality stages together in one job
v How to apply country or region-specific rule sets to analyze the address data
v How to use derivations to handle nulls
v How to create the data that can be used as source data in a later job
The Unduplicate Match stage is one of two stages that match records and identify
duplicates and residuals. The other matching stage is the Reference Match stage.
The Unduplicate Match stage groups records that share common attributes. The
Match specification that you apply was configured to separate all records with
weights above a certain match cutoff as duplicates. The master record is then
identified by selecting the record within the set that matches to itself with the
highest weight.
Any records that are not part of a set of duplicates are residuals. These records,
along with the master records, are used for the next pass. Duplicates are not included
because you want them to belong to only one set.
Using a matching stage ensures data integrity because you are applying
probabilistic matching technology. This technology is applied to any relevant
attribute for evaluating the columns, parts of columns, or individual characters that
you define. In addition, you can apply agreement or disagreement weights to key
data elements.
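The following Python sketch is a rough illustration of this weighting idea, not the QualityStage implementation. The column names, weights, and cutoffs are hypothetical; in the product they come from the Match specification and the frequency data.

    # Hypothetical (agreement weight, disagreement weight) per match column.
    WEIGHTS = {
        "ZIP":         (4.0, -3.0),
        "STREETNAME":  (5.0, -4.0),
        "HOUSENUMBER": (3.5, -2.5),
        "NAME":        (4.5, -3.5),
    }
    MATCH_CUTOFF = 10.0     # at or above this score, the pair is a duplicate
    CLERICAL_CUTOFF = 5.0   # between the cutoffs, the pair goes to clerical review

    def score(rec_a, rec_b):
        """Sum agreement or disagreement weights over the match columns."""
        total = 0.0
        for column, (agree, disagree) in WEIGHTS.items():
            total += agree if rec_a.get(column) == rec_b.get(column) else disagree
        return total

    def classify(rec_a, rec_b):
        s = score(rec_a, rec_b)
        if s >= MATCH_CUTOFF:
            return "duplicate"
        if s >= CLERICAL_CUTOFF:
            return "clerical"
        return "nonmatch"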
Learning objectives
After completing the lessons in this module, you should know how to do the
following tasks:
1. Add DataStage links and stages to a job
2. Add standardized and frequency data as the source files
3. Configure stage properties to specify which action they take when the job is
run
4. Remove duplicate addresses after the first pass
5. Apply a Match specification to determine how matches are selected
6. Funnel the common attribute data to a separate target file
If you have not already done so, open the Designer client.
As you learned in the previous module, you must add stages and links to the
Designer canvas to create an Unduplicate Match job. The Standardize job that you
just completed provides the source data for this job.
Stage                          Change to
Top left Data Set              Frequencies
Lower left Data Set            Standardized
Unduplicate Match              Unduplicate
Funnel                         CollectMatched
Top right Sequential File      MatchedOutput_csv
Middle right Sequential File   ClericalOutput_csv
Lower right Sequential File    NonMatchedOutput_csv
6. Right-click the names of the following links, select Rename from the shortcut menu, and type the new link name in the highlighted box:
Link                                        Change to
From Frequencies to Unduplicate             FrequencyData
From Standardized to Unduplicate            StandardizedData
From Unduplicate to CollectMatched          MatchedData
From Unduplicate to CollectMatched          Duplicates
From CollectMatched to MatchedOutput_csv    MatchedOutput
From Unduplicate to ClericalOutput_csv      Clerical
From Unduplicate to NonMatchedOutput_csv    NonMatched
You set up and linked an Unduplicate Match job by doing the following tasks:
v Adding Data Quality and Processing stages to the Designer canvas
v Linking all the stages
v Renaming the links and stages with appropriate names
3. In the File field, click the browse button and browse to the path name of the
folder on the server computer where the input data file resides.
The data from the Standardize job is loaded into the source files for the
Unduplicate Match job.
9. Click the Output → Mapping tab and map the following columns to the
correct links:
a. In the Output name field above the Columns pane, select MatchedData.
b. Right-click in the Columns pane and select Select All from the shortcut
menu.
3. In the File field, click the browse button and browse to the folder on the server
computer where the input data file resides.
4. In the File name field, type MatchedOutput.csv to display the path and file
name in the File field (for example, C:\IBM\InformationServer\Server\
Projects\tutorial\MatchedOutput.csv).
5. Click the Formats tab, right-click and select Field Defaults from the menu.
6. Click Add sub-property from the menu.
7. Click Null field value and type double quotes (no spaces) in the Null field
value field.
8. Save the Table Definitions.
a. Click the Columns tab.
b. Click Save to open Save Table Definitions window.
c. In the Data source type field, type Table Definitions.
d. In the Data source name field, type MatchedOutput1.
e. In the Table/file name field, type MatchedOutput1.
f. Click OK to open the Save Table Definition As window.
g. Click Save to save the table definition and close the Save Table Definition
As window.
h. Click OK to close the stage window.
9. Repeat steps 1 through 8 for the remaining target files, ClericalOutput_csv and NonMatchedOutput_csv.
Lesson checkpoint
In this lesson, you combined the matched and duplicate address records into one
file. The nonmatched and clerical output records were separated into individual
files. The clerical output records can be reviewed manually for matching records.
The nonmatched records are used in the next pass. The matched and duplicate
address records are used in the Survive job.
You learned how to separate the output records from the Unduplicate Match stage
to the various target files.
Module 3: Summary
In Module 3, you set up and configured an Unduplicate stage job to isolate
matched and duplicate name and address data into one file.
In creating an Unduplicate stage job, you added a Match specification to apply the
blocking and matching criteria to the standardized and frequency data created in
the Standardize job. After applying the Match specification, the resulting records
were sent out through four output links, one for each type of record. The matches
and duplicates were sent to a Funnel stage that combined the records into one
output, which was written to a file. The unmatched records, or residuals, were sent
to a file, as were the clerical output records.
Lessons learned
By completing Module 3, you learned about the following concepts and tasks:
v How to apply a Match specification to the Unduplicate stage
v How the Unduplicate stage groups records with similar attributes
v How to ensure data integrity by applying probability matching technology
The Unduplicate job identifies a group of records with similar attributes. In the
Survive job, you specify which columns and column values from each group
create the output record for the group. The output record can include the
following information:
v An entire input record
v Selected columns from the record
v Selected columns from different records in the group
Select column values based on rules for testing the columns. A rule contains a set
of conditions and a list of targets. If a column tests true against the conditions, the
column value for that record becomes the best candidate for the target. After
testing each record in the group, the columns declared best candidates combine to
become the output record for the group. Column survival is determined by the
target. Column value survival is determined by the rules.
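As a rough illustration of this idea (the Survive stage defines its rules in its own rule editor, not in Python), the following sketch applies a simple "longest non-empty value wins" rule to each target column across a group of matched records. The record values and column names are hypothetical.

    # Hypothetical records that belong to the same match set.
    group = [
        {"qsMatchSetID": 1, "NAME": "J SMITH",      "ADDRESS": "123 MAIN ST"},
        {"qsMatchSetID": 1, "NAME": "JOHN SMITH",   "ADDRESS": "123 MAIN"},
        {"qsMatchSetID": 1, "NAME": "JOHN Q SMITH", "ADDRESS": ""},
    ]

    def survive_longest(records, targets):
        """For each target column, keep the longest non-empty value in the group."""
        best = {}
        for column in targets:
            candidates = [r[column] for r in records if r.get(column)]
            best[column] = max(candidates, key=len) if candidates else ""
        return best

    print(survive_longest(group, ["NAME", "ADDRESS"]))
    # {'NAME': 'JOHN Q SMITH', 'ADDRESS': '123 MAIN ST'}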
Learning objectives
After completing the lessons in this module, you should know how to do the
following tasks:
1. Add stages and links to a Survive job
2. Choose the selected column
3. Add the rules
4. Map the output columns
In this lesson, you add the Data Quality Survive stage, the source file of combined
data from the Unduplicate Match job, and the target file for the best records.
Stage                    Change to
Left Sequential File     MatchedOutput
Survive stage            Survive
Right Sequential File    Survived_csv
6. Right-click the names of the following links, select Rename from the shortcut menu, and type the new link name in the highlighted box:
Link                            Change to
From MatchedOutput to Survive   Matchesandduplicates
From Survive to Survived_csv    Survived
Lesson checkpoint
In this lesson, you learned how to set up a Survive job by adding the results of the
Unduplicate Match job as the source data, the Survive stage, and the target file that
receives the output record for each group.
You have learned that the Survive stage takes one input link and one output link.
In the Survive job, you are testing column values to determine which columns are
the best candidates for that record. These columns are combined to become the
output record for the group. In selecting a best candidate, you can specify that
these column values be tested:
v Record creation date
v Data source
v Length of data in a column
v Frequency of data in a group
3. In the File field, click the browse button and browse to the path name of the
folder on the server computer where the input data file resides.
4. Click on the MatchedOutput.csv file.
You attached the MatchedOutput.csv file and loaded the Table Definitions into the
MatchedOutput source file.
You can view the rules you added in the Survive grid.
9. In the Select the group identification data column section, select qsMatchSetID
from the list.
10. Click the Stage Properties → Output → Mapping tab.
11. Right click in the Columns pane and select Select All from the shortcut menu.
12. Select Copy from the shortcut menu.
13. Move to the Survived pane, right click and select Paste Column from the
shortcut menu.
2. In the File field, click the browse button and browse to the folder on the server
computer where the input data file resides.
3. In the File name field, type record.csv to display the path and file name in
the File field (for example, C:\IBM\InformationServer\Server\Projects\
tutorial\record.csv).
4. Click File → Save to save the job.
Lesson checkpoint
You have set up the Survive job, renamed the links and stages, and configured the
source and target files and the Survive stage.
Module 4: Summary
In Module 4, you completed the last job in the IBM WebSphere QualityStage work
flow. In this module, you set up and configured the Survive job to select the best
record from the matched and duplicate name and address data that you created
in the Unduplicate Match stage.
In configuring the Survive stage, you selected a rule, included columns from the
source file, added a rule to each column and applied the data. After the Survive
stage processed the records to select the best record, the information was sent to
the output file.
Lessons learned
In completing Module 4, you learned about the following tasks and concepts:
v How to use the Survive stage to create the best candidate in a record
v How to apply simple rules to the column values
The tutorial presented a common business problem which was to verify customer
names and addresses, and showed the steps to take by using QualityStage jobs to
reconcile the various names that belonged to one household. The tutorial presented
four modules that covered the four jobs in the QualityStage work flow. These jobs
provide customers with the following assurances:
v Investigating data to identify errors and validate the contents of fields in a data
file
v Conditioning data to ensure that the source data is internally consistent
v Matching data to identify all records in one file that correspond to similar
records in another file
v Identifying which records from the match data survive to create a best candidate
record
Lessons learned
By completing this tutorial, you learned about the following concepts and tasks:
v About the QualityStage work flow
v How to set up a QualityStage job
v How data created in one job is the source for the next job
v How to create quality data by using QualityStage
A subset of the product documentation is also available online from the product
documentation library at publib.boulder.ibm.com/infocenter/iisinfsv/v8r1/
index.jsp.
PDF file books are available through the IBM Information Server software installer
and the distribution media. A subset of the information center is also available
online and periodically refreshed at www.ibm.com/support/docview.wss?rs=14
&uid=swg27008803.
You can also order IBM publications in hardcopy format online or through your
local IBM representative.
You can send your comments about documentation in the following ways:
v Online reader comment form: www.ibm.com/software/data/rcf/
v E-mail: [email protected]
Contacting IBM
You can contact IBM for customer support, software services, product information,
and general information. You can also provide feedback on products and
documentation.
Customer support
For customer support for IBM products and for product download information, go
to the support and downloads site at www.ibm.com/support/us/.
You can open a support request by going to the software support service request
site at www.ibm.com/software/support/probsub.html.
My IBM
You can manage links to IBM Web sites and information that meet your specific
technical support needs by creating an account on the My IBM site at
www.ibm.com/account/us/.
For information about software, IT, and business consulting services, go to the
solutions site at www.ibm.com/businesssolutions/us/en.
General information
Product feedback
You can provide general product feedback through the Consumability Survey at
www.ibm.com/software/data/info/consumability-survey.
Documentation feedback
You can click the feedback link in any topic in the information center to comment
on the information center.
You can also send your comments about PDF file books, the information center, or
any other documentation in the following ways:
v Online reader comment form: www.ibm.com/software/data/rcf/
v E-mail: [email protected]
How to read syntax diagrams
If an optional item appears above the main path, that item has no effect on the
execution of the syntax element and is used only for readability.
v If you can choose from two or more items, they appear vertically, in a stack. If
you must choose one of the items, one item of the stack appears on the main
path. If choosing one of the items is optional, the entire stack appears below the
main path. If one of the items is the default, it appears above the main path, and
the remaining choices are shown below.
v An arrow returning to the left, above the main line, indicates an item that can be
repeated. If the repeat arrow contains a comma, you must separate repeated items
with a comma. A repeat arrow above a stack indicates that you can repeat the
items in the stack.
v Sometimes a diagram must be split into fragments. The syntax fragment is
shown separately from the main syntax diagram, but the contents of the
fragment should be read as if they are on the main path of the diagram.
The IBM Information Server product modules and user interfaces are not fully
accessible. The installation program installs the following product modules and
components:
v IBM Information Server Business Glossary Anywhere
v IBM Information Server FastTrack
v IBM Metadata Workbench
v IBM WebSphere Business Glossary
v IBM WebSphere DataStage and QualityStage
v IBM WebSphere Information Analyzer
v IBM WebSphere Information Services Director
Accessible documentation
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user’s responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply
to you.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.
IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003 U.S.A.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
All statements regarding IBM’s future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
This information is for planning purposes only. The information herein is subject to
change before the products described become available.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
© (your company name) (year). Portions of this code are derived from IBM Corp.
Sample Programs. © Copyright IBM Corp. _enter the year or years_. All rights
reserved.
If you are viewing this information softcopy, the photographs and color
illustrations may not appear.
Trademarks
IBM trademarks and certain non-IBM trademarks are marked on their first
occurrence in this information with the appropriate symbol.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries,
or both. If these and other IBM trademarked terms are marked on their first
occurrence in this information with a trademark symbol (® or ™), these symbols
indicate U.S. registered or common law trademarks owned by IBM at the time this
information was published. Such trademarks may also be registered or common
law trademarks in other countries. A current list of IBM trademarks is available on
the Web at "Copyright and trademark information" at www.ibm.com/legal/
copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered
trademarks or trademarks of Adobe Systems Incorporated in the United States,
and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo,
Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
The United States Postal Service owns the following trademarks: CASS, CASS
Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service,
USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV
and LACSLink licensee of the United States Postal Service.