Tasks
Informatica Data Integration - Free & PayGo Tasks
April 2023
© Copyright Informatica LLC 2022, 2023
This software and documentation are provided only under a separate license agreement containing restrictions on use and disclosure. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC.
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial
computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such,
the use, duplication, disclosure, modification, and adaptation is subject to the restrictions and license terms set forth in the applicable Government contract, and, to the
extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License.
Informatica, Informatica Cloud, Informatica Intelligent Cloud Services, PowerCenter, PowerExchange, and the Informatica logo are trademarks or registered trademarks
of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://
www.informatica.com/trademarks.html. Other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties. Required third party notices are included with the product.
The information in this documentation is subject to change without notice. If you find any problems in this documentation, report them to us at
[email protected].
Informatica products are warranted according to the terms and conditions of the agreements under which they are provided. INFORMATICA PROVIDES THE
INFORMATION IN THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT.
Table of Contents
Guidelines for sources and targets in data integration tasks. . . . . . . . . . . . . . . . . . . . . . . . . . 28
Rules and guidelines for flat file sources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Rules and guidelines for database sources and targets. . . . . . . . . . . . . . . . . . . . . . . . . . 29
Configuring the target. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Configuring the field mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Configuring runtime options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Running a data transfer task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Preface
Use Tasks to learn how to set up and run Data Integration tasks manually or on a schedule.
Informatica Resources
Informatica provides you with a range of product resources through the Informatica Network and other online
portals. Use the resources to get the most from your Informatica products and solutions and to learn from
other Informatica users and subject matter experts.
Informatica Documentation
Use the Informatica Documentation Portal to explore an extensive library of documentation for current and
recent product releases. To explore the Documentation Portal, visit https://docs.informatica.com.
If you have questions, comments, or ideas about the product documentation, contact the Informatica
Documentation team at [email protected].
You can collaborate with other users and subject matter experts in the Informatica Intelligent Cloud Services Community:
https://network.informatica.com/community/informatica-network/products/cloud-integration
Developers can learn more and share tips at the Cloud Developer community:
https://network.informatica.com/community/informatica-network/products/cloud-integration/cloud-developers
You can browse and share integration solutions on the Informatica Marketplace:
https://marketplace.informatica.com/
Data Integration connector documentation
You can access documentation for Data Integration Connectors at the Documentation Portal. To explore the
Documentation Portal, visit https://docs.informatica.com.
To search the Knowledge Base, visit https://search.informatica.com. If you have questions, comments, or
ideas about the Knowledge Base, contact the Informatica Knowledge Base team at
[email protected].
Subscribe to the Informatica Intelligent Cloud Services Trust Center to receive upgrade, maintenance, and
incident notifications. The Informatica Intelligent Cloud Services Status page displays the production status
of all the Informatica cloud products. All maintenance updates are posted to this page, and during an outage,
it will have the most current information. To ensure you are notified of updates and outages, you can
subscribe to receive updates for a single component or all Informatica Intelligent Cloud Services
components. Subscribing to all components is the best way to be certain you never miss an update.
For online support, click Submit Support Request in Informatica Intelligent Cloud Services. You can also use
Online Support to log a case. Online Support requires a login. You can request a login at
https://network.informatica.com/welcome.
The telephone numbers for Informatica Global Customer Support are available from the Informatica web site
at https://www.informatica.com/services-and-training/support-services/contact-us.html.
Chapter 1
Tasks
Data Integration provides the following types of tasks:
Mapping tasks process data based on the data flow logic defined in a mapping.
A mapping reads data from one or more sources, transforms the data based on logic that you define,
and writes it to one or more targets. Create a mapping when you need to augment or manipulate your
data before you load it to a target. For example, if you need to aggregate data, calculate values, perform
complex joins, normalize data, or route data to different targets, you can create a mapping to do this.
A mapping task runs the data flow logic that you've defined in a mapping. Choose this task type after
you've created the mapping that you want to run.
Data transfer tasks move data from one or two sources to a target. You can also choose to sort and
filter the data before you load it to the target.
Choose this task type when you want to transfer data from a source object, optionally add fields from a
second source object, and write the data to a new or existing target object without changing the source
data. For example, if you want to move customer records from an on-premises database table to a table
in your cloud data warehouse, create a data transfer task.
Data loader tasks provide secure data loading from multi-object sources to corresponding objects in
your cloud data warehouse. They can load data incrementally and provide support for schema drift.
To optimize performance, data loading occurs in parallel batches. To fine-tune the data, you can exclude
certain objects and fields and also apply some simple filters. If your source data changes frequently, you
can load only new and changed records each time the task runs.
Choose this task type when you need to ingest data as-is from multiple objects into your cloud data
warehouse. For example, if you need to repeatedly load all the data from files in an Amazon S3 bucket to
corresponding tables in Snowflake Data Cloud, create a data loader task.
When you create a task, Data Integration walks you through the required steps. The options and properties
that display depend on the task type.
You can create a workflow of multiple tasks by linking the tasks in taskflows. For more information, see
Taskflows.
Data filters
You can create the following types of data filters in a data transfer task:
• Simple
• Advanced
You can create a set of data filters for each object included in the task. Each set of data filters acts
independently of the other sets.
When you create multiple simple data filters, the associated task creates an AND operator between the filters
and loads rows that apply to all simple data filters.
For example, you load rows from the Account Salesforce object to a database table. However, you want to
load only accounts that have greater than or equal to $100,000 in annual revenue and that have more than
500 employees. You configure two simple data filters: one that specifies an annual revenue greater than or equal to 100000, and one that specifies more than 500 employees.
When you create an advanced data filter, you enter one expression that contains all filters. The expression
that you enter becomes the WHERE clause in the query used to retrieve records from the source.
For example, you load rows from the Account Salesforce object to a database table. However, you want to
load records where the billing state is California or New York and the annual revenue is greater than or equal
to $100,000. You configure the following advanced filter expression:
(BillingState = 'CA' OR BillingState = 'NY') AND (AnnualRevenue >= 100000)
When you create a data filter on a Salesforce object, the corresponding task generates a SOQL query with a
WHERE clause. The WHERE clause represents the data filter. The SOQL query must be less than 20,000
characters. If the query exceeds the character limit, the following error appears:
Salesforce SOQL limit of 5000 characters has been exceeded for the object: <Salesforce
object>. Please exclude more fields or decrease the filters.
Note: Filter conditions are not validated until runtime.
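For example, for the advanced filter above, the task might generate a query similar to the following SOQL, assuming the task reads only the Id, Name, BillingState, and AnnualRevenue fields. The exact field list depends on the fields included in the task:
SELECT Id, Name, BillingState, AnnualRevenue FROM Account WHERE (BillingState = 'CA' OR BillingState = 'NY') AND (AnnualRevenue >= 100000)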
1. In a data transfer task, in the Filters area on the Source or Second Source page, select Advanced.
To convert all simple data filters to one advanced data filter, select Advanced.
2. If necessary, specify the object on which to create the data filter.
You create separate data filters for each source object included in the task.
3. Enter the filter expression.
Click the field name to add the field to the expression.
4. Click OK.
To delete a data filter, click the Delete icon next to the data filter.
5. Click Next.
The following table shows the operators you can use for each field type:
String: =, !=, LIKE'_%', LIKE'%_', LIKE'%_%', Is Null, Is Not Null, <, <=, >, >=
Textarea: =, !=, LIKE'_%', LIKE'%_', LIKE'%_%', Is Null, Is Not Null, <, <=, >, >=
You can use the following variables as values in data filters:
$LastRunDate: The start date, in the GMT time zone, of the last task run that was successful or ended with a warning. Does not include time. For example, 2018-09-24. Can be used as a value for a filter where the field type is DATE.
$LastRunTime: The start date and time, in the GMT time zone, of the last task run that was successful or ended with a warning. For example, 2018-09-24 15:23:23. Can be used as a value for a filter where the field type is DATETIME.
For example, you can include the following simple filter condition:
LastModifiedDate > $LastRunTime
Note: Consider time zone differences when comparing dates across time zones. The date and time of the
$LastRunDate and $LastRunTime variables are based on the time zone set in Informatica Intelligent Cloud
Services. The date and time of the actual job is based on the GMT time zone for Salesforce sources and the
database server for database sources. The difference in the time zones may yield unexpected results.
Consider the following rules and guidelines for data filters:
• If you specify a date and no time for a date/time filter, Data Integration uses 00:00:00 (12:00:00 a.m.) as the time.
• The list of available operators in a simple data filter depends on the data type of the field included in the
data filter. Some operators do not apply to all fields included in data filters.
• When you enter more than one simple data filter, applications filter rows that meet the requirements of all
data filters.
• When you use a parameter in a data filter, start the data filter with the parameter. For example, use
$$Sales=100000 instead of 100000=$$Sales.
Field expressions
When you configure a data transfer task, you can configure the field mapping. The field mapping defines how
source fields are mapped to target fields. You can specify an expression for each field mapping.
You can map multiple source fields to the same target field. For example, you can map SourceFieldA and
SourceFieldB to TargetFieldC.
Data Integration might suggest operations when you map multiple source fields to a single target field. For
example, if you map multiple text fields to a target text field, Data Integration concatenates the source text
fields by default. You can change the default expression.
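For example, if you map two text fields SourceFieldA and SourceFieldB to TargetFieldC, the default expression might be a simple concatenation such as the first expression below. You can edit it, for example to insert a space between the values, as in the second expression. Both forms use the transformation language, and the field names are the illustrative names from the example above:
CONCAT(SourceFieldA, SourceFieldB)
SourceFieldA || ' ' || SourceFieldB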
Data Integration provides a transformation language that includes SQL-like functions to transform source
data. Use these functions to write expressions, which modify data or test whether data matches the
conditions that you specify.
For more information about functions and the Data Integration transformation language, see Function
Reference.
1. On the Field Mappings page, select the target field for which you want to add an expression.
2. Click Add or Edit Expression.
By default, the Field Expression dialog box shows the source field as the expression, which indicates
that the target contains the same value as the source.
3. Enter the new field expression.
To include source fields and system variables in the expression, you can select them from the Source
Fields and System Variables tabs to insert them into the expression or you can add them to the
expression manually.
4. Click Validate Mapping to validate the field mappings.
5. Click Save.
• When you validate mappings, Data Integration performs the following validations:
- Verifies that the source and target fields in the task exist in the source or target. If the field does not
exist, an error appears.
- Verifies that the correct parameters are used for each function and that the function is valid.
• The expression validator does not perform case-sensitive checks on field names.
• The expression validator verifies that the data type of a field in an expression matches the data type
expected by the containing function. However, the expression validator does not check for incompatible
data types between the following sets of objects:
- Source and target fields of tasks.
An expression can include the following components, as shown in the example after this list:
• Fields. Use the name of a source field to refer to the value of the field.
• Literals. Use numeric or string literals to refer to specific values.
• Functions. Use these SQL-like functions to change data in a task.
• Operators. Use transformation operators to create expressions to perform mathematical computations,
combine data, or compare data.
• Constants. Use the predefined constants to reference values that remain constant, such as TRUE.
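For example, the following expression sketch combines all of these components: a source field, numeric literals, the IIF function, arithmetic and comparison operators, and the TRUE and FALSE constants. The field name AnnualRevenue is only illustrative:
IIF(AnnualRevenue * 1.1 > 100000, TRUE, FALSE)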
Expression syntax
You can create a simple expression that only contains a field, such as ORDERS, or a numeric literal, such as
10. You can also write complex expressions that include functions nested within functions, or combine
different fields using the transformation language operators.
Note: Although the transformation language is based on standard SQL, there are differences between the two
languages.
String literals are case sensitive and can contain any character except a single quotation mark. For example,
the following string is not allowed:
'Joan's car'
To return a string containing a single quotation mark, use the CHR function:
'Joan' || CHR(39) || 's car'
Do not use single quotation marks with numeric literals. Just enter the number you want to include. For
example:
.05
or
$$Sales_Tax
Use the following rules and guidelines when you write expressions. An example that applies several of these rules follows the list.
• For each source field, you can perform a lookup or create an expression. You cannot do both.
• You cannot use strings in numeric expressions.
For example, the expression 1 + '1' is not valid because you can only perform addition on numeric data
types. You cannot add an integer and a string.
• You cannot use strings as numeric parameters.
For example, the expression SUBSTR(TEXT_VAL, '1', 10) is not valid because the SUBSTR function
requires an integer value, not a string, as the start position.
• You cannot mix data types when using comparison operators.
For example, the expression 123.4 = '123.4' is not valid because it compares a decimal value with a
string.
• You can pass a value from a field, literal string or number, or the results of another expression.
• Separate each argument in a function with a comma.
• Except for literals, the transformation language is not case sensitive.
• The colon (:), comma (,), and period (.) have special meaning and should be used only to specify syntax.
• Data integration tasks treat a dash (-) as a minus operator.
• If you pass a literal value to a function, enclose literal strings within single quotation marks. Do not use
quotation marks for literal numbers. Data integration tasks treat any string value enclosed in single
quotation marks as a character string.
• Do not use quotation marks to designate fields.
• You can nest multiple functions within an expression. Data integration tasks evaluate the expression
starting with the innermost function.
• When you use a parameter in an expression, use the appropriate function to convert the value to the
necessary data type. For example, you might use the following expression to define a quarterly bonus for
employees:
IIF(EMP_SALES < TO_INTEGER($$SalesQuota), 200, 0)
• To add a comment to an expression, precede the comment with two dashes, for example:
-- These are comments
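The following expression is a sketch that applies several of these rules. It nests functions, passes a field and string literals as arguments, and is evaluated starting with the innermost function. The field name CUST_NAME is only an example:
IIF(ISNULL(LTRIM(RTRIM(CUST_NAME))), 'UNKNOWN', UPPER(CUST_NAME))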
Reserved words
Some keywords, such as constants, operators, and system variables, are reserved for specific functions.
These include:
• :EXT
• :INFA
• :LKP
• :MCR
• :SD
• :SEQ
• :SP
• :TD
• AND
• DD_DELETE
• DD_INSERT
• DD_REJECT
• DD_UPDATE
• FALSE
• NOT
• NULL
• OR
• PROC_RESULT
• SPOUTPUT
• TRUE
• WORKFLOWSTARTTIME
The following words are reserved for Informatica Intelligent Cloud Services:
• ABORTED
• DISABLED
• FAILED
• NOTSTARTED
• STARTED
• STOPPED
• SUCCEEDED
Note: You cannot use a reserved word to name a field. Reserved words have predefined meanings in
expressions.
Advanced session properties are grouped into the following categories:
• General
• Performance
• Advanced
• Error handling
The following table describes the general options:
Session Log File Name: Name for the session log. Use any valid file name. You can customize the session log file name in one of the following ways:
- Using a static name. A static log file name is a simple static string with or without a file extension. If you use a static name, the log file name is appended with a sequence number each time the task runs, for example samplelog.1, samplelog.2. When the maximum number of log files is reached, the numbering sequence begins a new cycle.
- Using a dynamic name. A log file name is dynamic when it includes a parameter defined in a parameter file or a system variable. You can include any of the following system variables:
  - $CurrentTaskName. Replaced with the task name.
  - $CurrentTime. Replaced with the current time.
  - $CurrentRunId. Replaced with the run ID for the current job.
  If you use a dynamic name, the file name is unique for every task run. The Maximum Number of Log Files property is not applied. To purge old log files, delete the files manually.
Session Log File Directory: Directory where the session log is saved. Use a directory local to the Secure Agent to run the task. By default, the session log is saved to the following directory:
<Secure Agent installation directory>/apps/Data_Integration_Server/logs
Source File Directory: Source file directory path. Use for flat file connections only.
Treat Source Rows As: When the task reads source data, it marks each row with an indicator that specifies the target operation to perform when the row reaches the target. Use one of the following options:
- Insert. All rows are marked for insert into the target.
- Update. All rows are marked for update in the target.
- Delete. All rows are marked for delete from the target.
- Data Driven. The task uses the Update Strategy object in the data flow to mark the operation for each source row.
Commit Type: Commit type to use. Use one of the following options:
- Source. The task performs commits based on the number of source rows.
- Target. The task performs commits based on the number of target rows.
When you do not configure a commit type, the task performs a target commit.
Rollback Transactions on Errors: Rolls back the transaction at the next commit point when the task encounters a non-fatal error. When the task encounters a transformation error, it rolls back the transaction if the error occurs after the effective transaction generator for the target.
Performance settings
The following table describes the performance settings:
DTM Buffer Size: Amount of memory allocated to the task from the DTM process. By default, a minimum of 12 MB is allocated to the buffer at run time. Use one of the following options:
- Auto. Enter Auto to use automatic memory settings. When you use Auto, configure Maximum Memory Allowed for Auto Memory Attributes.
- A numeric value. Enter the numeric value that you want to use. The default unit of measure is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
You might increase the DTM buffer size in the following circumstances:
- When a task contains large amounts of character data, increase the DTM buffer size to 24 MB.
- When a source contains a large binary object with a precision larger than the allocated DTM buffer size, increase the DTM buffer size so that the task does not fail.
Session Retry on Deadlock: The task retries a write on the target when a deadlock occurs.
Create Temporary View: Allows the task to create temporary view objects in the database when it pushes the task to the database. Use when the task includes an SQL override in the Source Qualifier transformation or Lookup transformation.
Enable cross-schema pushdown optimization: Enables pushdown optimization for tasks that use source or target objects associated with different schemas within the same database. To see if cross-schema pushdown optimization is applicable to the connector you use, see the help for the relevant connector. This property is enabled by default.
Allow Pushdown for User Incompatible Connections: Indicates that the database user of the active database has read permission on idle databases. If you indicate that the database user of the active database has read permission on idle databases, and it does not, the task fails. If you do not indicate that the database user of the active database has read permission on idle databases, the task does not push transformation logic to the idle databases.
Session Sort Order: Order to use to sort character data for the task.
Advanced options
The following table describes the advanced options:
Default Buffer Block Size: Size of buffer blocks used to move data and index caches from sources to targets. By default, the task determines this value at run time. Use one of the following options:
- Auto. Enter Auto to use automatic memory settings. When you use Auto, configure Maximum Memory Allowed for Auto Memory Attributes.
- A numeric value. Enter the numeric value that you want to use. The default unit of measure is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
The task must have enough buffer blocks to initialize. The minimum number of buffer blocks must be greater than the total number of Source Qualifiers, Normalizers for COBOL sources, and targets. The number of buffer blocks in a task = DTM Buffer Size / Buffer Block Size. Default settings create enough buffer blocks for 83 sources and targets. If the task contains more than 83, you might need to increase DTM Buffer Size or decrease Default Buffer Block Size.
Line Sequential Buffer Length: Number of bytes that the task reads for each line. Increase this setting from the default of 1024 bytes if source flat file records are larger than 1024 bytes.
Maximum Memory Allowed for Auto Memory Attributes: Maximum memory allocated for automatic cache when you configure the task to determine the cache size at run time. You enable automatic memory settings by configuring a value for this attribute. Enter a numeric value. The default unit is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB. If the value is set to zero, the task uses default values for memory attributes that you set to auto.
Maximum Percentage of Total Memory Allowed for Auto Memory Attributes: Maximum percentage of memory allocated for automatic cache when you configure the task to determine the cache size at run time. If the value is set to zero, the task uses default values for memory attributes that you set to auto.
Additional Concurrent Pipelines for Lookup Cache Creation: Restricts the number of pipelines that the task can create concurrently to pre-build lookup caches. You can configure this property when the Pre-build Lookup Cache property is enabled for a task or transformation. When the Pre-build Lookup Cache property is enabled, the task creates a lookup cache before the Lookup receives the data. If the task has multiple Lookups, the task creates an additional pipeline for each lookup cache that it builds. To configure the number of pipelines that the task can create concurrently, select one of the following options:
- Auto. The task determines the number of pipelines it can create at run time.
- Numeric value. The task can create the specified number of pipelines to create lookup caches.
Custom Properties: Configure custom properties for the task. You can override the custom properties that the task uses after the job has started. The task also writes the override value of the property to the session log.
Pre-build Lookup Cache: Allows the task to build the lookup cache before the Lookup receives the data. The task can build multiple lookup cache files at the same time to improve performance. Configure one of the following options:
- Always allowed. The task can build the lookup cache before the Lookup receives the first source row. The task creates an additional pipeline to build the cache.
- Always disallowed. The task cannot build the lookup cache before the Lookup receives the first row. When you use this option, configure the Additional Concurrent Pipelines for Lookup Cache Creation property. The task can pre-build the lookup cache if this property is greater than zero.
DateTime Format String: Date time format for the task. You can specify seconds, milliseconds, microseconds, or nanoseconds.
To specify seconds, enter MM/DD/YYYY HH24:MI:SS.
To specify milliseconds, enter MM/DD/YYYY HH24:MI:SS.MS.
To specify microseconds, enter MM/DD/YYYY HH24:MI:SS.US.
To specify nanoseconds, enter MM/DD/YYYY HH24:MI:SS.NS.
By default, the format specifies microseconds, as follows: MM/DD/YYYY HH24:MI:SS.US.
Error handling
The following table describes the error handling options:
Stop on Errors: Indicates how many non-fatal errors the task can encounter before it stops the session. Non-fatal errors include reader, writer, and DTM errors. Enter the number of non-fatal errors you want to allow before stopping the session. The task maintains an independent error count for each source, target, and transformation. If you specify 0, non-fatal errors do not cause the session to stop.
On Pre-Session Command Task Error: Determines the behavior when a task that includes pre-session shell commands encounters errors. Use one of the following options:
- Stop Session. The task stops when errors occur while executing pre-session shell commands.
- Continue Session. The task continues regardless of errors.
By default, the task stops.
On Pre-Post SQL Error: Determines the behavior when a task that includes pre-session or post-session SQL encounters errors:
- Stop Session. The task stops when errors occur while executing pre-session or post-session SQL.
- Continue. The task continues regardless of errors.
By default, the task stops.
Error Log Type: Specifies the type of error log to create. You can specify flat file or no log. Default is none. You cannot log row errors from XML file sources. You can view the XML source errors in the session log. Do not use this property when you use the Pushdown Optimization property.
Error Log File Directory: Specifies the directory where errors are logged. By default, the error log file directory is $PMBadFilesDir\.
Error Log File Name: Specifies the error log file name. By default, the error log file name is PMError.log.
Log Row Data: Specifies whether or not to log transformation row data. When you enable error logging, the task logs transformation row data by default. If you disable this property, n/a or -1 appears in transformation row data fields.
Log Source Row Data: Specifies whether or not to log source row data. By default, the check box is clear and source row data is not logged.
Data Column Delimiter: Delimiter for string type source row data and transformation group row data. By default, the task uses a pipe ( | ) delimiter.
Tip: Verify that you do not use the same delimiter for the row data as the error logging columns. If you use the same delimiter, you may find it difficult to read the error log file.
Parameter files
A parameter file is a list of user-defined parameters and their associated values.
Use a parameter file to define values that you want to update without having to edit the task. You update the
values in the parameter file instead of updating values in a task. The parameter values are applied when the
task runs.
You can use a parameter file to define parameter values in mapping tasks.
You can use a parameter file to define values for the following connections in a mapping task:
• Source
• Target
• Lookup
• SQL
You can also define values for the following data objects:
• Source
• Target
• Lookup
Also, define values for parameters in data filters, expressions, and lookup expressions.
Note: Not all connectors support parameter files. To see if a connector supports runtime override of
connections and data objects, see the help for the appropriate connector.
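For example, a parameter file might contain simple name=value entries such as the following sketch. The parameter names here are hypothetical. For the exact layout that your task expects, including any task-specific sections, download the parameter file template from the task wizard:
$$SourceConnection=MySourceConnection
$$TargetObject=CUSTOMER_TARGET
$$Sales=100000
$$Sales_Tax=0.05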
Schedules
You can run tasks manually or you can use schedules to run them at a specific time or interval such as
hourly, daily, or weekly.
To use a schedule, you associate the task with a schedule when you configure the task. You can use an
existing schedule or create a new schedule. If you want to create a schedule, you can create the schedule
from the task's Schedule page during task configuration.
When you create a schedule, you specify the date and time. You can configure a schedule to run associated
assets throughout the day between 12:00 a.m. and 11:55 p.m. Informatica Intelligent Cloud Services might
add a small schedule offset to the start time, end time, and all other time configurations. As a result,
scheduled tasks and taskflows might start later than expected. For example, you configure a schedule to run
hourly until noon, and the schedule offset for your organization is 10 seconds. Informatica Intelligent Cloud
Services extends the end time for the schedule to 12:00:10 p.m., and the last hourly task or taskflow starts at
12:00:10 p.m. To see the schedule offset for your organization, check the Schedule Offset organization
property.
You can monitor scheduled tasks from the All Jobs page in Monitor. Scheduled tasks do not appear on the
My Jobs page.
When you copy a task that includes a schedule, the schedule is not associated with the new task. To
associate a schedule with the new task, edit the task.
If you remove a task from a schedule as the task runs, the job completes. Data Integration cancels any
additional runs associated with the schedule.
Repeat frequency
The repeat frequency determines how often tasks run. You can set the repeat frequency to every N minutes,
hourly, daily, weekly, biweekly, or monthly.
Every N minutes: Tasks run on an interval based on a specified number of minutes. You can configure the following options:
- Repeat frequency. Select a frequency in minutes. Options are 5, 10, 15, 20, 30, 45.
- Days. Days of the week when you want tasks to run. You can select one or more days of the week.
- Time range. Hours of the day when you want tasks to start. Select All Day or configure a time range. You can configure a time range between 00:00-23:55.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
Hourly: Tasks run on an hourly interval based on the start time of the schedule. You can configure the following options:
- Repeat frequency. Select a frequency in hours. Options are 1, 2, 3, 4, 6, 8, 12.
- Days. Days of the week when you want tasks to run. You can select one or more days of the week.
- Time range. Hours of the day when you want tasks to start. Select All Day or configure a time range. You can configure a time range between 00:00-23:55.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
Daily: Tasks run daily at the start time configured for the schedule. You can configure the following options:
- Repeat frequency. The frequency at which you want tasks to run. Select Every Day or Every Weekday.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
Weekly: Tasks run on a weekly interval based on the start time of the schedule. You can configure the following options:
- Days. Days of the week when you want tasks to run. You can select one or more days of the week.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
If you do not specify a day, the schedule runs regularly on the same day of the week as the start date.
Biweekly: Tasks run every two weeks based on the start time of the schedule. You can configure the following options:
- Days. Days of the week when you want tasks to run. You can select one or more days of the week. You must select at least one day.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
If you configure a biweekly schedule to start at 5 p.m. on a Tuesday and run tasks every two weeks on Mondays, the schedule begins running tasks on the following Monday.
Monthly: Tasks run on a monthly interval based on the start time of the schedule. You can configure the following options:
- Day. Day of the month when you want tasks to run. You can configure one of the following options:
  - Select the exact date of the month, between 1-28. If you want the task to run on days later in the month, use the <n> <day of the week> option.
  - Select the <n> <day of the week>. Options for <n> include First, Second, Third, Fourth, and Last. Options for <day of the week> include Day, and Sunday-Saturday.
  Tip: With the Day option, you can configure tasks to run on the First Day or the Last Day of the month.
- Repeat option. The range of days when you want tasks to run. You can select Repeat Indefinitely or configure an end date and time.
When you create a schedule, you select the time zone for the scheduler to use. You can select a time zone
that is different from your time zone or your organization time zone.
When Daylight Savings Time goes into effect, tasks scheduled to run between 2:00 a.m. and 2:59 a.m. do not
run the day that the time changes from 2:00 a.m. to 3:00 a.m. If a task is scheduled to run biweekly at 2 a.m.,
it will run at 3 a.m. the day of the time change and at 2 a.m. for the next run.
Daylight Savings Time does not trigger additional runs for tasks that are scheduled to run between 1:00 a.m. -
1:59 a.m. when Standard Time begins. For example, a task is scheduled to run every day at 1:30 a.m. When
the time changes from 2 a.m. to 1 a.m., the task does not run again at 1:30 a.m.
Tip: To ensure that Informatica Intelligent Cloud Services does not skip any scheduled runs near the 2 a.m.
time change, do not schedule jobs to run between 12:59 a.m. and 3:01 a.m.
Creating a schedule
You can create a schedule in Data Integration when you configure a task. You can also create a schedule in
Administrator.
The following procedure describes how to create a schedule when you access the Schedule page from Data
Integration during task configuration.
Configure the following schedule properties:
Time Zone: Select the time zone for the schedule to use. The time zone can differ from the organization time zone or user time zone.
Repeats: Repeat frequency for the schedule. Select one of the following options:
- Does Not Repeat
- Every N Minutes
- Hourly
- Daily
- Weekly
- Monthly
Default is Does Not Repeat.
Click Save to save the schedule and return to the task configuration page.
1. On the Schedule page for the task, select Run this task on a schedule.
2. To specify whether to use an existing schedule or a new schedule, perform one of the following tasks:
• To use an existing schedule, select the schedule that you want to use.
• To create a schedule to use for the task, click New, and then configure the schedule properties.
3. Click Save.
Email notification
You can configure email notification for a task. When you configure custom email notification, Data
Integration uses the custom email notification instead of the email notification options configured for the
organization.
To configure email notification options, perform the following steps in the task wizard:
1. Specify whether to use the default email notification options that have been set for your organization or
create custom email notification for the task. Configure email notification using the following options:
Use Default Email Notification Options for my Organization: Use the email notification options configured for the organization.
Use Custom Email Notification Options for this Task: Use the email notification options configured for the task. You can send email to different addresses based on whether the task failed, completed with errors, or completed successfully. Use commas to separate a list of email addresses. When you select this option, email notification options configured for the organization are not used.
Use the following rules and guidelines when you create preprocessing and postprocessing SQL commands (an example follows the list):
• Use any command that is valid for the database type. However, Data Integration does not allow nested
comments, even if the database allows them.
• Use a semicolon (;) to separate multiple statements. Data Integration issues a commit after each
statement.
• Data Integration ignores semicolons within comments. If you need to use a semicolon outside of
comments, you can escape it with a backslash (\).
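For example, the following preprocessing SQL entry is a sketch that follows these guidelines. It contains two statements separated by a semicolon, and the semicolon inside the string literal is escaped with a backslash so that it is treated as text rather than as a statement separator. The table and column names are hypothetical:
DELETE FROM stg_orders; INSERT INTO load_audit (note) VALUES ('cleanup done\; load starting')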
For operating system commands, if the Secure Agent is on a Windows machine, separate commands with an ampersand (&). If the Secure Agent is on a Linux machine, separate commands with a semicolon (;).
You can monitor jobs in the following ways:
• Monitor the jobs that you initiated on the My Jobs page in Data Integration.
• Monitor running jobs in your organization on the Running Jobs page in Monitor.
• Monitor all jobs in your organization on the All Jobs page in Monitor.
For more information about monitoring jobs, see Monitor.
Stopping a job
A job is an instance of a mapping, task, or taskflow. You can stop a running job on the All Jobs, Running
Jobs, or My Jobs page.
1. Open Monitor and select All Jobs or Running Jobs, or open Data Integration and select My Jobs.
2. In the row that contains the job that you want to stop, click the Stop icon.
To view details about the stopped job, click the job name.
Rules and guidelines for flat file sources
Consider the following rules and guidelines for flat file sources. A sample file follows the list.
• All date columns in a flat file source must have the same date format.
• The flat file cannot contain empty column names. If a file contains an empty column name, the following
error appears:
Invalid header line: Empty column name found.
• Column names in a flat file must contain tab characters or printable ASCII characters (ASCII code 32-126). If the file
contains a character that is not valid, the following error appears:
Invalid header line: Non-printable character found. The file might be binary or might
have invalid characters in the header line.
• You can use a tab, space, or any printable special character as a delimiter. The delimiter can have a
maximum of 10 characters. The delimiter must be different from the escape character and text qualifier.
• For flat file sources with multibyte data on Linux, the default locale must be UTF-8.
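For example, the following sketch of a pipe-delimited flat file follows these guidelines. Every column has a name, the header contains only printable ASCII characters, and both date columns use the same date format. The content is purely illustrative:
CustomerID|Name|CreatedDate|UpdatedDate
1001|Acme Corporation|2023-01-15|2023-02-01
1002|Globex|2023-01-20|2023-02-03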
Rules and guidelines for database sources and targets
• You can use database tables as targets. You can use database tables, aliases, and views as sources.
• Relational targets must meet the minimum system requirements.
• The database user account for each database target connection must have DELETE, INSERT, SELECT, and
UPDATE privileges.
Mapping tasks
Use the mapping task to process data based on the data flow logic defined in a mapping.
When you create a mapping task, you select the mapping for the task to use. The mapping must already exist
before you can create a mapping task for it. Alternatively, you can create a mapping task using a template.
If the mapping includes parameters, you can define the parameters when you configure the task or define the
parameters when you run the task. You can use user-defined parameters for data filters, expressions, and
lookup expressions in a mapping task. You define user-defined parameters in a parameter file associated
with the task.
At run time, a mapping task processes task data based on the data flow logic from the mapping, the
parameters defined in the task, and the user-defined parameters defined in a parameter file, when available.
Each mapping task template is based upon a mapping template. Use a mapping task template when the
mapping on which the mapping task template is based suits your needs. When you select a mapping task
template, Data Integration creates a copy of the template for you to use. When you define the mapping task
in the task wizard, you save a copy of the mapping template on which the mapping task template is based.
Templates are divided into three categories: Integration, Cleansing, and Warehousing, as shown in the
following image:
The templates range from simple templates that you can use to copy data from one source to another, to
complex templates that you can use for data warehousing-related tasks.
Related objects
When a mapping includes a source that is a parameter and is configured for multiple objects, you can join
related objects in the task.
You can join related objects based on existing relationships or custom relationships. Data Integration
restricts the type of relationships that you can create based on the connection type.
Existing relationships
You can use relationships defined in the source system to join related objects. You can join objects with
existing relationships for Salesforce, database, and some Data Integration Connectors connection types.
After you select a primary object, you select a related object from a list of related objects.
Custom relationships
You can use custom relationships to join multiple source objects. You can create custom relationships
for the database connection type.
When you create a custom relationship for database objects, you create an inner, left outer, or right outer
join on the source fields that you select.
To join source objects, you add the primary source object in the Objects and Relationships table. Then you
add related objects, specify keys for the primary and related objects, and configure the join type and operator.
For more information about related source objects, see the Source Transformation section in
Transformations.
Advanced relationships
You can create an advanced relationship for database sources when the source object in the mapping is a
parameter and configured for multiple sources. You cannot create an advanced relationship between source
objects that have been joined using a custom relationship.
When you create an advanced relationship, the wizard converts any relationships that you defined to an SQL
statement that you can edit.
To create an advanced relationship, you add the primary source object in the Objects and Relationships table.
Then you select fields and write the SQL statement that you want to use. Use an SQL statement that is valid
for the source database. You can also add additional objects from the source.
Pushdown optimization
Pushdown optimization pushes some of the transformation logic in a mapping to source or target databases
for execution, which can improve task performance. By default, pushdown optimization is enabled for all
mapping tasks.
When you run a task configured for pushdown optimization, the task converts the transformation logic to an
SQL query. The task sends the query to the database, and the database executes the query.
The amount of transformation logic that you can push to the database depends on the database,
transformation logic, and task configuration. The task processes all transformation logic that it cannot push
to a database.
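For example, a Filter transformation with the condition AnnualRevenue >= 100000 reading from a relational table might be pushed to the database as a query similar to the following sketch. The table and column names are illustrative, and the actual query that the task generates depends on the mapping and the connector:
SELECT CUSTOMER_ID, NAME, ANNUALREVENUE FROM CUSTOMERS WHERE ANNUALREVENUE >= 100000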
Note: Pushdown optimization functionality varies depending on the support available for the connector. For
more information, see the help for the appropriate connector.
Simultaneous task runs
You can use multiple task instances in a Parallel Paths step of a taskflow or in two different taskflows that run in parallel.
To enable simultaneous task runs, select the Allow the mapping task to be executed simultaneously on the
Schedule tab when you configure the task.
Use caution when you configure mapping tasks to run simultaneously. Mapping features that change each
time the task runs, such as in-out parameters and sequence generator values, might produce unexpected
results when you run the task instances simultaneously.
Field metadata
You can view and edit field metadata such as the type, precision, and scale for parameterized source and
lookup objects with certain connection types.
To view and edit field metadata, use the Edit Types option on the appropriate page of the mapping task
wizard. You configure source field metadata on the Sources page and lookup field metadata on the Input
Parameters page.
If you edit field metadata in the task and the field metadata changes after the task is saved, Data Integration
uses the updated metadata. Typically, this is the desired behavior. However, if the task uses a flat file
connection and you want to retain the metadata used at design time, enable the Retain existing fields at
runtime option.
To see if a connector supports field metadata configuration, see the help for the appropriate connector.
Schema change handling
By default, if you make changes to the schema, Data Integration does not pick up the changes automatically.
If you want Data Integration to refresh the data object schema every time the mapping task runs, you can
enable dynamic schema handling.
Data Integration automatically refreshes the schema for relational objects every time the task runs. If you
want to dynamically refresh the schema for other object types, enable dynamic schema change handling on
the Schedule page when you configure the task.
You can select the following schema change handling options:
- Asynchronous. Default. Data Integration refreshes the schema when you edit the mapping or mapping task, and when Informatica Intelligent Cloud Services is upgraded.
- Dynamic. Data Integration refreshes the schema every time the task runs. Applicable for source, target, and lookup objects of certain connector types. For some connector types, Data Integration can only refresh the schema if the data object is a flat file. If you select this option, the file object format must be delimited. Not applicable to hierarchical data. To see if a connector supports dynamic schema change handling, see the help for the appropriate connector.
If you update fields in the source object and you enable dynamic schema handling, be sure to update the
Target transformation field mapping. Data Integration writes Null to the target fields that were previously
mapped to the renamed or deleted source fields. If you use a target created at run time, update the target
object name so that Data Integration creates a new target when the task runs. The task fails if Data
Integration tries to alter a target created in a previous task run.
To select target schema options, the target field mapping must be automatic.
When you configure target schema options for objects that are created at runtime, Data Integration creates
the target the first time you run the task. In subsequent task runs, Data Integration updates the target based
on the schema change option that you select.
The schema change handling options available are based on the target connection. To see if a connector
supports dynamic schema change handling, see the help for the appropriate connector.
For database target connections, the following options are available:
- Drop Current and Recreate. Data Integration drops the existing target table and creates a new target table with the schema from the upstream transformations on every run.
- Alter and Apply Changes. Data Integration updates the target schema with additive changes to match the schema from the upstream transformations. It does not delete columns from the target.
- Don't Apply DDL Changes. Data Integration fetches the target schema at runtime and does not apply upstream schema changes to the target table.
Data Integration does not pass field constraints to the target. For example, the source contains fields S1 and
S2 configured with the NOT NULL constraint. The target contains fields T1 and T2 also configured with the
NOT NULL constraint. You select the Alter and Apply Changes schema handling option. When you run the
task, fields S1 and S2 are written to the target with no constraints.
Consider the following rules and guidelines when you enable dynamic schema change handling:
• Changes to the object schema take precedence over changes to the field metadata in the mapping. For
example, you add a field to the source object and then edit the metadata of an existing field in the
mapping. At run time, Data Integration adds the new field and does not edit the existing field.
• Data Integration resolves parameters before picking up the object schema.
• Data Integration treats renamed fields as deleted and added columns. If you rename a field, you might
need to update transformations that reference the renamed field. For example, if you rename a field that
is used in the lookup condition, the lookup cannot find the new field and the task fails.
• When you rename, add, or delete fields, you might need to update the field mapping. For example, if you
delete all the previously mapped fields in a target object, you must remap at least one field or the task
fails.
• Data Integration writes Null values to a target field in the following situations:
- You rename a target field with automatic field mapping, and the field name does not match a source
field.
- You rename a source field with manual field mapping, and you do not remap the field to the target.
• If you delete a field from a source or lookup object and a downstream transformation references the field,
the task fails.
• If you change a source or lookup field type, the task might fail if the new field type results in errors
downstream. For example, if you change an integer field in an arithmetic expression to a string field, the
expression is not valid and the task fails.
• If you change a target field type, Data Integration converts the data from the incoming field to the new
target field type. If the conversion results in an error, Data Integration drops the row. For example if you
change a string type to a date type where the string does not contain a date, Data Integration drops the
row.
When you run a mapping through a mapping task, you can configure an advanced option that specifies how
Data Integration handles mismatched schemas:
• Skip mismatched files and continue. When Data Integration finds a schema mismatch, it stops searching
for other errors in the same file and writes that error to the log. Data Integration doesn't process any other
records from that file and continues with the next file.
• Stop on first mismatched file. Data Integration stops all processing when it finds a schema mismatch
error, and it writes the error to the log. Data Integration does not roll back files processed before the error
was found, and it doesn't process the file containing the error.
When you run a mapping in the Mapping Designer, Data Integration evaluates every file and skips the entire
file when it encounters a schema mismatch.
As you work through the task wizard, you can click Save to save your work at any time. When you have
completed the wizard, you can click Finish to save and close the task wizard.
Runtime Environment: Runtime environment that contains the Secure Agent to run the task.
3. Click Next.
Configuring sources
The Sources page displays differently based on the basis for the task. If the mapping does not include source
parameters, the Sources page might not appear.
You can add a single source object or multiple source objects based on the connection type and the mapping
configuration. You can also configure a source filter.
Add Currently Processed File Name: Adds the source file name to each row. Data Integration adds the CurrentlyProcessedFileName field to the source at run time. Available for parameterized source objects with flat file connections.
2. For a parameterized source object, configure field metadata if required. You can configure field
metadata for sources with certain connection types. To see if a connector supports field metadata
configuration, see the help for the appropriate connector.
To configure field metadata, click Edit Types. In the Edit Field datatypes dialog box, configure the
following attributes and click OK:
Retain existing fields at runtime: When enabled, the task uses the field metadata that is configured in the task. If field metadata changes after a task is saved, Data Integration uses the updated field metadata. Typically, this is the desired behavior. However, if the task uses a flat file connection and you want to retain the metadata used at design time, enable this option.
Precision: Total number of digits in a number. For example, the number 123.45 has a precision of 5. The precision must be greater than or equal to 1.
Scale: Number of digits to the right of the decimal point of a number. For example, the number 123.45 has a scale of 2. Scale must be greater than or equal to 0. The scale of a number must be less than its precision. The maximum scale for a numeric data type is 65535. Not editable for all data types.
Configuring targets
The Targets page displays differently depending on the basis for the task.
For mappings, the Targets page displays when the mapping includes parameters for target connections or
target objects. The properties that you need to specify are based on the type of parameter. For example, if
the target is parameterized but the connection is not, you must specify a target and you can optionally
change the connection.
Tip: All of the properties display in one list, even when the task includes multiple objects. Place the
cursor over the properties to determine which objects the properties apply to. For example, in the following
image, the Enable target bulk load parameter applies to the target named TargetSQLAllCust.
• For a mapplet, you select a connection. You do not select objects for mapplets.
• When a connection name displays without surrounding dollar signs, it is a logical connection. If the logical
connection is associated with multiple objects on the Targets page, you select the logical connection
once, and then select each object.
• If the logical connection is associated with objects on other pages of the task wizard, be sure to use the
same connection for logical connections with the same name.
When you select an object, the Data Preview area displays a portion of the data in the object. Data preview
displays the first ten rows of the first five columns in the object. It also displays the total number of columns
in the object.
If the page has more than one object, you can select the object in the Data Preview area to display its data.
The Data Preview area does not display the following data:
• Mapplet data.
• Certain Unicode characters.
• Binary data. If the object contains binary data, data preview shows the following text:
BINARY DATA
1. On the Targets page, configure the following details as required:
2. Click Next.
You can also configure email notification and advanced options for the task on the Schedule page.
1. To specify whether to run the task on a schedule or without a schedule, choose one of the following
options:
• If you want to run the task on a schedule, click Run this task on schedule. Select the schedule you
want to use or click New to create a schedule.
• If you want to run the task without a schedule, click Do not run this task on a schedule.
2. Configure email notification options for the task.
3. Optionally, configure the following advanced options:
Maximum Number of Log Files
Number of session log files to retain. By default, Data Integration stores each type of log file for 10 runs before it overwrites the log files for new runs.
Note: If a dollar sign ($) is present in a custom session log file name, for example, MyLog_$CurrentTime, the file name is dynamic. If you customize the session log file name using a dynamic name, the Maximum Number of Log Files property doesn't apply. To purge old log files, delete the files manually.
Schema Change Handling
Determines how Data Integration picks up changes to the object schema. Select one of the following options:
- Asynchronous. Data Integration refreshes the schema when you update the mapping or mapping task, and after an upgrade.
- Dynamic. Data Integration refreshes the schema every time the task runs.
Default is Asynchronous.
Dynamic Schema Handling
Determines how Data Integration applies schema changes from upstream transformations to the target object. Available when the schema change handling is dynamic and the field mapping is automatic.
For each target, select how Data Integration updates the target schema. The options available are based on the target connection.
For more information, see “Schema change handling” on page 33 or the help for the appropriate connector.
4. Optionally, if the mapping task contains parameters, you can use parameter values from a parameter
file. Choose one of the following options:
• To use a local file, select Local. Enter the following information about the file:
Parameter File Directory
Path for the directory that contains the parameter file, excluding the parameter file name. The Secure Agent must be able to access the directory.
You can use an absolute file path or a path relative to one of the following $PM system variables:
- $PMRootDir
- $PMSourceFileDir
- $PMLookupFileDir
- $PMCacheDir
- $PMSessionLogDir
- $PMExtProcDir
- $PMTempDir
By default, Data Integration uses the following parameter file directory:
<Secure Agent installation directory>/apps/Data_Integration_Server/data/userparameters
Parameter File Name
Name of the file that contains the definitions and values of user-defined parameters used in the task. You can provide the file name or the relative path and file name in this field. For a sample of the general file format, see the sketch after this list.
• To use a cloud-hosted file, select Cloud Hosted. Enter the following information about the file:
Connection
Connection where the parameter file is stored. You can use the following connection types:
- Amazon S3
- Google Storage V2
- Azure Data Lake Store Gen2
Object
Name of the file that contains the definitions and values of user-defined parameters used in the task.
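For illustration only, a parameter file generally pairs user-defined parameters with values, optionally grouped into sections. The section headings, parameter names, and values below are hypothetical assumptions, not taken from this guide; download the parameter file template described in the next step to confirm the exact format that your task expects:

    [Global]
    $$LastLoadDate=01/01/2023

    [MyProject].[MyFolder].[mt_LoadCustomers]
    $$SourceTable=CUSTOMERS
    $$TargetObject=DW_CUSTOMERS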
5. Optionally, if you want to create a parameter file based on the parameters and default values specified in
the mapping on which the task is based, click Download Parameter File Template.
For more information about parameter file templates, see Mappings.
6. Choose whether to run the task in standard or verbose execution mode.
If you select verbose mode, the mapping generates additional data in the logs that you can use for
troubleshooting. Select verbose execution mode only for troubleshooting purposes. Verbose execution
mode impacts performance because of the amount of data it generates.
Optionally, configure the following pushdown optimization properties:
Optimization Context Type
Provides context about the mapping configuration for pushdown optimization. If you select an option other than None, Data Integration constructs a single query for pushdown optimization by combining multiple targets in the mapping based on the target configurations. If you select None, the query is not optimized.
If Data Integration cannot apply the selected context, Data Integration uses the default pushdown optimization behavior.
Select one of the following options:
- None
- SCD Type 2 merge
- Multi-insert
Default is None.
For more information, see the help for the appropriate connector.
If pushdown mode is not possible, cancel the task
Cancels the task if the selected pushdown optimization type is unavailable for the connection.
Default is disabled.
Create Temporary View
Allows the task to create temporary view objects in the database when it pushes the task to the database.
Use when the task includes an SQL override in the Source Qualifier transformation or Lookup transformation.
Default is enabled.
Disabled when the pushdown optimization type is None.
To change the beginning value, change the Current Value field on the Sequences page of the mapping task wizard. The Current Value field shows the first value the task will generate in the sequence, based on the last value generated in the previous task run.
For example, the last time you ran the CustDataIDs task, the last value generated was 124. The next time the
task runs, the first number in the sequence is 125 because the Sequence Generator transformation is
configured to increment by 1. If you want the sequence to begin with 200, you change the Current Value to
200.
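A minimal Python sketch of the bookkeeping described above, for illustration only (the class and method names are hypothetical, not Data Integration internals): the next run starts from the stored current value and advances by the configured increment.

    class SequenceState:
        """Tracks the next value a sequence will generate, as described above."""
        def __init__(self, current_value: int = 1, increment: int = 1):
            self.current_value = current_value   # first value the next run generates
            self.increment = increment

        def generate(self, rows: int) -> list[int]:
            values = [self.current_value + i * self.increment for i in range(rows)]
            # After the run, the stored current value points at the next unused value.
            self.current_value = values[-1] + self.increment
            return values

    seq = SequenceState(current_value=125, increment=1)  # last run ended at 124
    print(seq.generate(3))        # [125, 126, 127]
    seq.current_value = 200       # editing Current Value restarts the sequence at 200
    print(seq.generate(2))        # [200, 201]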
• Manually. To run a mapping task manually, on the Explore page, navigate to the task. In the row that
contains the task, click Actions and select Run.
You can also run a mapping task manually, with or without advanced options, from the Task Details page.
To access the Task Details page, click Actions and select View.
• On a schedule. To run a mapping task on a schedule, edit the task in the mapping task wizard to associate
the task with a schedule.
Jobs that reprocess incrementally-loaded files read and process the incrementally-loaded files that were
modified during the time that you specify. You can choose to reprocess files that were modified in a given
time interval or all files modified after a given start time.
You can reprocess source files when at least one source is configured to incrementally load files and the
files have been loaded at least once. The advanced options are not available if the mapping does not have
sources configured to incrementally load files, the mapping task has never run, or the last load time was
reset.
Reprocessing does not change the mapping task’s last load time, so future jobs created from the mapping
task run and load data as usual.
Note: Processing is unpredictable when the start or end time of a reprocessing job is within the repeated hour
on the day that Daylight Saving Time ends.
When you configure a data transfer task, you can augment the source data with data from a lookup source.
Based on the source connection that you use, you can also sort and filter the data before you load it to the
target.
To see if a data transfer task is applicable to the connectors you are using, see the help for the relevant
connectors.
Task operations
When you configure a data transfer task, you specify the task operation. The operations available are based
on the target that you select.
You can use the following task operations:
Insert
Inserts all source rows into the target. If Data Integration finds a source row that exists in the target, the row fails.
Update
Updates rows in the target that exist in the source. If Data Integration finds a row in the source that does
not exist in the target, the row fails.
Upsert
Updates all rows in the target that also exist in the source and inserts all new source rows in to the
target.
If a source field contains a NULL value and the corresponding target field contains a value, Data
Integration retains the existing value in the target field.
Delete
Deletes all rows from the target that exist in the source.
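The operation semantics above, including the upsert rule that a NULL source value does not overwrite an existing target value, can be illustrated with a small Python sketch over in-memory dictionaries. This is purely illustrative and is not how Data Integration executes these operations:

    def upsert(target: dict, source_rows: list, key: str) -> None:
        """Upsert: update rows that exist in the target, insert new rows.

        A NULL (None) source value leaves the existing target value unchanged,
        matching the behavior described above.
        """
        for row in source_rows:
            existing = target.get(row[key])
            if existing is None:
                target[row[key]] = dict(row)          # insert new source row
            else:
                for field, value in row.items():
                    if value is not None:             # retain target value on NULL
                        existing[field] = value

    target = {1: {"id": 1, "name": "Ana", "city": "Lisbon"}}
    source = [{"id": 1, "name": "Ana M.", "city": None},   # update, city retained
              {"id": 2, "name": "Bram", "city": "Delft"}]  # insert
    upsert(target, source, key="id")
    print(target[1]["city"])   # Lisbon
    print(target[2]["name"])   # Bram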
Data transfer task sources
You can select a single source to transfer data from.
The formatting and advanced options that you can configure for the source depend on the source connection
that you select. For example, for a flat file source, you can configure formatting options such as the
formatting type. For Salesforce sources, you can configure advanced options such as the SOQL filter
condition, row limit, and bulk query.
For information about the options that you can configure for a source connection, see the help for the
appropriate connector.
You can preview the data in the source. The preview returns the first 10 rows. To display the preview fields in alphabetical order, select Display fields in alphabetical order. Data Integration does not change the order of fields in the actual source. You can also download the preview results as a CSV file.
Source filters
Apply filter conditions to filter the source data that you transfer to the target. You can use the following filter types:
Simple
Select the source field, and configure the operator and value to use in the filter.
When you define more than one filter condition, the task evaluates them in the order that you specify. The task evaluates the filter conditions using the AND logical operator to join the conditions. It returns rows that match all the filter conditions.
Advanced
Create a filter expression using the expression editor. You enter one expression that contains all filters.
You can use source fields and built-in functions in the expression.
For more information about advanced filters, see “Advanced data filters” on page 9.
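As a rough Python sketch of the simple-filter evaluation described above (the field names, operators, and sample rows are arbitrary examples, not Data Integration internals), conditions are combined with AND, so a row must satisfy every condition to pass:

    import operator

    OPERATORS = {
        "Equals": operator.eq,
        "Not Equals": operator.ne,
        "Less Than": operator.lt,
        "Greater Than": operator.gt,
    }

    def passes_filters(row: dict, conditions: list) -> bool:
        """Return True only if the row matches ALL filter conditions."""
        return all(OPERATORS[op](row[field], value) for field, op, value in conditions)

    rows = [{"City": "New York", "Amount": 120}, {"City": "Boston", "Amount": 80}]
    conditions = [("City", "Equals", "New York"), ("Amount", "Greater Than", 100)]
    print([r for r in rows if passes_filters(r, conditions)])
    # [{'City': 'New York', 'Amount': 120}]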
Sort conditions
For certain source types, you can sort the source data to provide sorted data to the target.
When you sort data, you select one or more source fields to sort by. If you apply more than one sort
condition, Data Integration sorts fields in the listed order.
To see if a connector supports sorting, see the help for the appropriate connector.
Second sources
When you configure a data transfer task, you can add a second source to use as a lookup source. Configure
the lookup source on the Second Source page.
The task queries the lookup source based on the lookup condition that you specify and returns the result of
the lookup to the target.
Select a second source when you want to augment the source data with a related value or values from the lookup source. For example, if the source is an orders table that contains a customer ID field, you might use a customer table as the lookup source to add customer details, such as the customer name, to each order.
To optimize performance, the task caches the lookup source. The cache remains static and does not change
as the task runs. The task deletes the cache files after the task completes.
You can preview the data in the lookup source. The preview returns the first 10 rows. To display the preview fields in alphabetical order, select Display fields in alphabetical order. Data Integration does not change the order of fields in the actual source. You can also download the preview results to a CSV file.
You can also filter the data from both sources before writing it to the target.
Lookup condition
When you select a second source to use as a lookup source, you must configure one or more lookup
conditions.
A lookup condition defines when the lookup returns values from the lookup source. When you configure a
lookup condition, you compare the value of one or more fields from the original source with values in the
lookup source.
A lookup condition includes an incoming field from the original source, a field from the lookup source, and an
operator. To avoid possible naming conflicts, the data transfer task applies the prefix SRC_ to the fields from
the original source. If this results in a naming conflict for any field from the original source, the task applies
the prefix IN_SRC_ to the field from the original source.
For example, you might configure the following lookup condition when the original source contains the CustID field, the lookup source contains the CustomerID field, and you want to return values from the lookup source when the customer IDs match:
SRC_CustID Equals CustomerID
You can use the following operators in a lookup condition:
• Equals
• Not Equals
• Less Than
• Less Than or Equals
• Greater Than
• Greater Than or Equals
When you enter multiple conditions, the task evaluates the lookup conditions using the AND logical operator
to join the conditions. It returns rows that match all the lookup conditions.
When you include multiple conditions, to optimize performance, enter the conditions in the following order:
1. Equals
2. Less Than, Less Than or Equals, Greater Than, Greater Than or Equals
3. Not Equals
The lookup condition matches null values. When an input field is NULL, the task evaluates it as equal to null values in the lookup source.
If the lookup condition has multiple matches, the task returns any row.
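A compact Python sketch of the lookup behavior described in this section, for illustration only: the SRC_ prefix, the static cache, null matching, and the any-row behavior follow the rules above, but the implementation is hypothetical and is not Data Integration's.

    def build_cache(lookup_rows: list, lookup_field: str) -> dict:
        """Static cache built once before processing, keyed by the lookup field."""
        cache = {}
        for row in lookup_rows:
            # Keep one row per key: with multiple matches, any row may be returned.
            cache.setdefault(row[lookup_field], row)
        return cache

    def lookup(source_rows, lookup_rows, source_field, lookup_field):
        cache = build_cache(lookup_rows, lookup_field)
        for row in source_rows:
            prefixed = {f"SRC_{name}": value for name, value in row.items()}
            # NULL (None) in the input matches NULL in the lookup source.
            match = cache.get(row[source_field])
            yield {**prefixed, **(match or {})}

    orders = [{"CustID": 101, "Amount": 50}, {"CustID": None, "Amount": 10}]
    customers = [{"CustomerID": 101, "Name": "Ana"},
                 {"CustomerID": None, "Name": "Unknown"}]
    for out in lookup(orders, customers, "CustID", "CustomerID"):
        print(out)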
Second source filters
You can apply filter conditions to filter the combined data before it is written to the target. You can use the following filter types:
Simple
To configure a filter condition, select a source field and configure the operator and value to use in the filter. You can select a field from either source. Fields from the original source are prefixed with the characters SRC_ or IN_SRC_.
When you define more than one filter condition, the task evaluates them in the order that you specify. The task evaluates the filter conditions using the AND logical operator to join the conditions. It returns rows that match all the filter conditions.
Advanced
Create a filter expression using the expression editor. You enter one expression that contains all filters.
You can use source fields and built-in functions in the expression.
For more information about advanced filters, see “Advanced data filters” on page 9.
The task operations that you can select depend on the target connection that you use. For more information
about task operations for different target types, see the help for the appropriate connector.
You can preview the data in the target. The preview returns the first 10 rows. You can download the preview results to a CSV file. To display the preview fields in alphabetical order, select Display fields in alphabetical order. Data Integration does not change the order of fields in the actual target.
Update columns
Update columns are columns that uniquely identify rows in the target table. Add update columns when the
database target table does not contain a primary key and the data transfer task uses an update, upsert, or
delete operation.
When you run the data transfer task, the task uses the field mapping to match rows in the source to the
database table. If the data transfer task matches a source row to multiple target rows, it performs the
specified task operation on all matched target rows.
You must map at least one source field to a target field. If the task uses multiple sources, fields from the
original source are prefixed with the characters SRC_ or IN_SRC_.
Select one of the following options to control which fields are displayed:
• Show All
• Show Mapped
• Show Unmapped
Automap
Data Integration automatically links fields with the same name or a similar name. Click Automap and select one of the following mapping options:
• Exact Field Name. Data Integration matches fields with the same name.
• Smart Map. Data Integration matches fields with similar names. For example, if you have an incoming field Cust_Name and a target field Customer_Name, Data Integration automatically links the Cust_Name field with the Customer_Name field.
• Undo Automap. Data Integration clears fields mapped with Smart Map or Exact Field Name but does not clear manually mapped fields.
Actions
• Map Selected. Links the selected incoming field with the selected target field.
• Unmap Selected. Clears the link for the selected field.
• Clear Mapping. Clears all field mappings.
After you map a field, if you want to configure a field expression, click the mapped field name. You can
include fields and built-in functions in the expression but not user-defined functions.
When you create a target at run time, Data Integration maps the source fields to the target fields. You cannot
unmap or edit the source fields mapped to the target but you can add fields to the target. You can also edit
the mapped field expression and metadata and reorder the added fields. You cannot reorder the fields copied
from the source.
Field data types
When you create a data transfer task, Data Integration assigns a data type to each field in the source and
target. When you add a field to a target that you create at run time, you select the data type.
Runtime Environment
Runtime environment that contains the Secure Agent that runs the task.
3. Click Next.
1. On the Second Source page, select Yes to add a second source to the task.
If you do not want to configure a second source, select No.
2. If you add a second source, perform the following steps to configure the source:
a. In the Source Details area, select Augment Data with Lookup.
b. Select the source connection and source object.
To create a connection, click New. To edit a connection, click View, and in the View Connection dialog box, click Edit.
c. For file sources, configure formatting options.
d. If preview data does not appear automatically, expand the Data Preview area to preview the source
data.
e. Configure one or more lookup conditions.
f. Optionally, configure data filters for the combined sources.
3. Click Next.
Formatting Options
For flat file connections only. Select a delimiter and text qualifier. Optionally, select an escape character.
If you choose Other for the delimiter, the delimiter cannot be an alphanumeric character or a double quotation mark.
Truncate Target
Database targets with the Insert task operation only. Truncates a database target table before inserting new rows.
- True. Truncates the target table before inserting all rows.
- False. Inserts new rows without truncating the target table.
Default is False.
Enable Target Bulk Load
Select this option to write data in bulk mode. The default value is false.
1. To match fields with the same name, click Automap > Exact Field Name. Or, to match fields with similar
names, click Automap > Smart Map.
You can also select and drag the source fields to the applicable target fields.
2. To configure the field expression, click the mapped field. In the Field Expression window, enter the
expression you want to use and click OK.
3. If you create a target at run time and want to add target fields, click Add. Configure the following field
properties:
Precision
Total number of digits in a number. For example, the number 123.45 has a precision of 5. The precision must be greater than or equal to 1.
Scale
The number of digits to the right of the decimal point of a number. For example, the number 123.45 has a scale of 2. Scale must be greater than or equal to 0. The scale of a number must be less than its precision. The maximum scale for a numeric data type is 65535.
4. Click Next.
1. To run a data transfer task on a schedule, select Run on a schedule and then select the schedule.
Note: You must create the schedule in Administrator before you can select it in the task.
2. Configure email notification options for the task.
3. Click Save.
• Manually. To run a data transfer task manually, on the Explore page, navigate to the task. In the row that
contains the task, click Actions and select Run.
You can also run a data transfer task manually from the Task Details page. To access the Task Details
page, click Actions and select View.
• On a schedule. To run a data transfer task on a schedule, edit the task in the data transfer task wizard to
associate the task with a schedule.
To start creating a task, click New in the navigation menu on the left or click Get Started on the Welcome page.
Step 1. Connect to your source
Configure a source connection to connect to your source. Configure the source connection on the Connect
Source page. You can create a new connection or select an existing connection.
To create a source connection, click New Connection. You can also select an existing connection. After you
choose the connection, you might have to configure additional properties like Row Limit and Use queryAll
that vary based on the connection type. For more information about connection properties, see Connections.
By default, Data Integration adds only the source objects you want to include. You can configure this feature
using options in the Source Objects field.
For some connection types, Data Integration automatically detects primary key fields and watermark fields.
Watermark fields identify the records that were added or changed since the last task run.
When you create a task, it's best to remove unnecessary source objects, fields, and records from the
data flow. Removing unnecessary data decreases the time it takes to run a task and helps to minimize
rejected records.
You can choose which source objects to read on the Connect Source page. You can also select the
fields to exclude and configure filters to exclude unnecessary records.
Primary key fields uniquely identify records in the source and target objects. When you re-run a task, the
task uses the primary key fields so that it can update existing rows and insert new rows into the target
tables. If the source objects don't have primary key fields defined, the task inserts rows into the target
tables, but it cannot update existing rows, which can lead to duplicate rows in the target tables.
Data loader tasks can automatically detect primary key fields for most connection types. You can also
select the primary key fields manually. Configure primary key field options on the Connect Source page.
Watermark fields are date/time or numeric fields that identify which records were added or changed. If
the source objects don't have watermark fields defined, the task must process all records in the source
objects each time the task runs, which increases the task processing time.
Data loader tasks can automatically detect watermark fields for most connection types. You can also
select the watermark fields manually. Configure watermark field options on the Connect Source page.
The following table lists the possible primary key and watermark field configurations and the expected
results:
Primary key fields configured, watermark fields configured
Changed records are updated and new records are inserted into the target tables (upsert). This is the recommended configuration for best performance.
Primary key fields configured, watermark fields not required
Changed records are updated and new records are inserted into the target tables (upsert). Impacts task performance because the task must perform a full scan on the source.
Primary key fields not required, watermark fields configured
All records are inserted into the target tables. Records that already exist are duplicated.
Primary key fields not required, watermark fields not required
All records are inserted into the target tables. Records that already exist are duplicated. Impacts task performance because the task must perform a full scan on the source. This is the least recommended configuration.
To load the data as quickly as possible, configure the task to read only the source objects that you need to
process. Reading data from fewer source objects decreases the time it takes to run the task.
You can configure the objects to read on the Connect Source page under Define Objects.
By default, the task reads data only from the objects you include in the source location. If you want to read data from most objects in the source location, choose Exclude some, and then select the objects that you don't want to read. If you want to read data from all objects, choose Include all. If you choose Include all, a data loader task can read up to 2000 objects.
If you choose to read all objects or exclude some objects, you'll need to enter the source path for some
source types. If you entered a source path when you created the connection, the default source path is the
same as the one you entered when you created the connection. Otherwise, the source path is empty by
default. If you plan to read all objects or exclude objects in different locations, enter the path to the parent
container.
To configure the objects to exclude or include, click the plus sign (+) icon in the Excluded Source Objects or
Included Source Objects area. After you select objects, the Excluded Source Objects or Included Source
Objects area displays the objects you excluded or included. To delete an object, click the Delete icon in the
row that contains the object.
For most source types, you enter the source path in the format <container or bucket name>/<folder
name>/<subfolder name> or <database name>/<schema name>. If you enter a source path for an Amazon S3,
Azure Blob Storage, or Azure Data Lake Storage Gen 2 source, you'll need to append a slash character at the
end of each folder name to distinguish the folder from a file. So, for these sources, the source path format is
<container or bucket name>/<folder name/>/<subfolder name/>, for example, bucket1/folder1//
folder2//folder3/.
For more information about configuring the source path for a certain connection type, see the help for the
appropriate connector.
Configure the formatting options on the Connect Source page under Define Source Format. If you don't see
the Define Source Format area, then you don't have to configure formatting options for your source.
When you configure formatting options, you can preview the data to ensure that the formatting options are
correct. You can select the fields to use in the data preview.
Configure the fields to exclude on the Connect Source page under Exclude Fields. To exclude fields, click the
plus sign (+) icon in the Excluded Fields area, and then select the source object and the fields to exclude.
The Excluded Fields area displays the fields you excluded for each source object. To update the excluded
fields for a source object, click the excluded fields for that source object. To delete all excluded fields for a
source object, click the Delete icon in the row that contains the source object.
Configuring filters
You can configure filters for one or more source objects so that only records that match the filter conditions
are written to the target. The task processes the records that match all of the filter conditions.
Configure filters on the Connect Source page under Define Filters. To add filters, click + in the Filter
Conditions area. In the Configure Filters dialog box, select the source object and configure one or more filter
conditions.
To add a filter condition, click the plus sign (+) icon. Each filter condition contains a field, operator, and value.
You can use the following operators in a filter condition:
= (equals)
If you add multiple filter conditions, the task evaluates the filter conditions in the order in which they appear
in the Configure Filters dialog box.
For example, you want to extract records from the Orders table only when the city is New York and the order date is later than January 1, 1990. Select "Orders" as the source object, and configure one filter condition in which the city field equals New York and a second filter condition in which the order date field is greater than January 1, 1990.
The Filter Conditions area displays the filters you added for each source object. To update the filters for a
source object, click the filters in the row that contains the source object. To delete all filters for a source
object, click the Delete icon in the row that contains the source object.
Primary key fields uniquely identify each record in a source object. For example, in a CUSTOMERS table, the
values in the CUSTOMER_ID column uniquely identify each customer. A source object can have one primary
key field, a composite key that consists of several primary key fields, or no primary key fields.
Primary key fields are needed if you want to update rows when you re-run the task. If there are no primary key
fields, the task inserts records into the target tables, but it cannot update rows.
Select primary key fields on the Connect Source page under Define Primary Keys. Choose one of the following options:
Detect automatically
The task identifies the primary key fields automatically in each source object that has them. This is the default option for many types of sources. This option is not available for file-based sources.
If you choose this option but there are no primary key fields in the source objects or the task can't detect the primary key fields, then the task inserts records into the target tables.
To find out whether a data loader task can detect primary key fields for a specific source type, see Connections.
Primary key fields not required
Select this option if you want to perform a full load every time the task runs. For example, you want to read data from database tables that have primary key fields, but you don't want to update rows in the target when you re-run the task.
When you select this option, the task inserts records into the target tables.
Enter manually
Select this option to choose the primary key fields in each source object. If you select this option, you
must also specify whether the source objects have the same primary key fields or different primary key
fields.
If the source objects have the same primary key fields, select The same across all sources. Then, click
Choose to select the primary key fields.
If the source objects have different primary key fields, select Different across sources. Then, click the
plus sign (+) icon in the Primary Key Fields area and select the source object and the primary key fields.
Repeat this process to select primary key fields for other source objects.
The Primary Key Fields area displays the primary key fields for each source object. To update the
primary key fields for a source object, click the primary key fields for that source object. To delete all
primary key fields for a source object, click the Delete icon in the row that contains the source object.
Watermark fields are date/time or numeric fields that identify the records that were added or changed since
the last task run. For example, you might select the MODIFIED_DATE column as the watermark field for a
source table. A source object can have one or no watermark fields.
If you want the task to load only new and changed data, you'll need to make sure that watermark fields are configured. If there are no watermark fields, the task processes all records in the source objects.
Select watermark fields on the Connect Source page under Define Watermark Fields. Choose one of the following options:
Detect automatically
The task identifies the watermark field automatically in each source object that has one. This is the default option.
If you choose this option but there are no watermark fields in the source objects or the task can't detect
the watermark fields, then the task processes all records each time the task runs.
To find out whether a data loader task can detect watermark fields for a specific source type, see
Connections.
Watermark field not required
Select this option if you want the task to process all records each time the task runs.
Enter manually
Select this option to choose the watermark field in each source object. If you select this option, you must
also specify whether the source objects have the same watermark field or different watermark fields.
If the source objects have the same watermark field, select The same across all sources. Then, click
Choose to select the watermark field.
If the source objects have different watermark fields, select Different across sources. Then, click the
plus sign (+) icon in the Watermark Fields area and select the source object and the watermark field.
Repeat this process to select watermark fields for other source objects.
The Watermark Fields area displays the watermark field for each source object. To update the
watermark field for a source object, click the watermark field for that source object. To delete the
watermark field for a source object, click the Delete icon in the row that contains the source object.
Step 2. Connect to your target
To create a target connection, click New Connection. You can also select an existing connection. After you
choose the connection, you might have to configure additional properties like Write Disposition and Staging
Location that vary based on the connection type. For more information about connection properties, see
Connections.
The first time you run a data loader task, the task creates one target table for each source object. By default,
target table names are the same as the source object names. However, special characters in the source table
names will be replaced with underscore characters. For example, if the source table name is Orders$, the
corresponding target table will be named Orders_.
If you enter a target name prefix, the target object names will be the same as the source object names, but
they'll be preceded with the prefix. For example, if a source table name is Account, and you enter the prefix
tgt_, the target table name will be tgt_Account.
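A small Python sketch of the naming rule described above, for illustration only; the exact set of characters that Data Integration treats as special may differ by target type:

    import re

    def target_table_name(source_name: str, prefix: str = "") -> str:
        """Replace special characters with underscores and apply an optional prefix."""
        sanitized = re.sub(r"[^A-Za-z0-9_]", "_", source_name)
        return f"{prefix}{sanitized}"

    print(target_table_name("Orders$"))                 # Orders_
    print(target_table_name("Account", prefix="tgt_"))  # tgt_Account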
When you re-run a task, you can either load data to the target incrementally or drop and re-create the target
tables.
Incremental loading
You can configure a data loader task to load data to the target incrementally. When the task loads data
incrementally, only new and changed data is loaded to the target each time you re-run the task. Incremental
loading increases task performance since fewer rows are loaded.
To configure the task to load data incrementally, select Yes under Load to existing tables on the Connect
Target page.
When you load data incrementally, the task runs most efficiently when you configure the following options in
the source:
Primary key fields
The task uses the primary key fields to identify rows when updating or inserting data into the target tables. If there are no primary key fields, the task can insert data into the target tables, but it can't update existing rows. This can create duplicate rows in the target tables.
Watermark fields
To load data incrementally, the source objects must contain watermark fields. The task uses the watermark fields to determine which records have been added or changed since the last task run. If there are no watermark fields, the task processes all records each time it runs.
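The following Python sketch shows the general shape of watermark-driven incremental loading combined with a primary-key upsert, as described above. It is a conceptual illustration only, not Data Integration code; the table, column, and function names are hypothetical.

    from datetime import datetime

    def incremental_load(source_rows, target, last_watermark,
                         key="id", watermark="modified_date"):
        """Upsert only rows changed since the last run and return the new watermark."""
        new_watermark = last_watermark
        for row in source_rows:
            if row[watermark] <= last_watermark:
                continue                      # unchanged since the last run, skip it
            target[row[key]] = dict(row)      # upsert by primary key
            new_watermark = max(new_watermark, row[watermark])
        return new_watermark

    target = {}
    rows = [
        {"id": 1, "name": "Ana",  "modified_date": datetime(2023, 4, 1)},
        {"id": 2, "name": "Bram", "modified_date": datetime(2023, 4, 5)},
    ]
    last = incremental_load(rows, target, last_watermark=datetime(2023, 4, 2))
    print(sorted(target), last)   # only the row changed after the last run is loaded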
If the source object structure changes between task runs, the data loader task detects the changes and alters
the target tables to match the source objects. Therefore, if you add a field to a source object or change a
field's data type, the task adds the new column or updates the column data type in the corresponding target
table. The task does not delete any existing target fields, however. If you want to delete unnecessary target
fields, you can either delete the fields manually or configure the task to drop and re-create the target tables.
If the name or data type of the watermark column changes between task runs, or if you change the
watermark column between task runs, the task performs a full load even if you've configured the target to
load to existing tables.
To configure the task to drop and re-create the target tables, select No, create new tables every time under
Load to existing tables on the Connect Target page. To ensure that the task processes all records each time
it runs, you also need to select Watermark field not required under Define Watermark Fields on the Connect
Source page.
When you select these options, the task performs a full target load each time it runs.
Before you run the task, you can check it for errors using the validation panel. To open the validation panel,
click the Validation icon. If there are validation errors, you can save the task, but you can’t run it until you fix
the errors. If there are no errors, you can run the task now, save it and run it later, or run it on a schedule.
To run the task now, click Save to save the task, and then click Run. To save it and run it later, click Save.
You can also configure the schedule, email notification options, error handling, and task location. The runtime
environment is always the Informatica Cloud Hosted Agent, which is maintained by Informatica.
Select a schedule for your task on the Let's Go page under Schedule. Click Run on a schedule, and then
select the schedule.
If you want to create a new schedule, click New Schedule. To make the schedule easy to identify, give the
schedule a meaningful name such as "Fridays at 12PM." For the start time, the date format is MM-DD-YYYY,
and the time appears in the 12-hour format.
You can also create, view, edit, and delete schedules in the Administrator service. For more information, see
Organization Administration.
Configure email notification options for your task on the Let's Go page under Notifications. You can send
notifications to different addresses based on whether the task completed successfully, completed with
warnings, or failed.
You can enter multiple email addresses in each field. If you enter multiple addresses, separate them with
commas.
To determine how Data Loader handles runtime errors, set the error handling options for the task.
Configure the task location on the Let's Go page under Task Location. By default, the task is stored in the
currently open project, or, if no project is open, in the project where you saved your last task. If you haven't
previously created a task, the task is stored in the Default project. To select a different location, click the
project name and choose a different project.
If you're editing a task you saved before, the task location is read-only. You can move the task to a different
project on the Explore page.
You can view the runtime environment on the Let's Go page under Runtime Environment. The runtime
environment for all data loader tasks is the Informatica Cloud Hosted Agent.
The Informatica Cloud Hosted Agent connects to your sources and targets securely and does the data
processing when you run a task. The hosted agent is maintained by Informatica. There is nothing to
download, and there are no settings to configure.