Teradata Utilities FastLoad

FastLoad is a Teradata utility that can load large amounts of data very quickly into empty tables by taking advantage of Teradata's parallel processing architecture. It works by assembling data into 64K blocks that can be loaded simultaneously using multiple sessions. However, it has limitations in that it does not support secondary indexes, referential integrity constraints, or triggers on the target table. The document provides an analogy comparing FastLoad's data loading approach to different types of construction projects to illustrate how it works and why it is faster than alternatives like MultiLoad.


Teradata Utilities: FastLoad

Reprinted for KV Satish Kumar, IBM

Reprinted with permission as a subscription benefit of Books24x7, http://www.books24x7.com/

Table of Contents
Chapter 3: FastLoad
Why it is Called "FAST" Load
How FastLoad Works
FastLoad Has Some Limits
Three Key Requirements for FastLoad to Run
Maximum of 15 Loads
FastLoad Has Two Phases
Phase 1: Acquisition
Phase 2: Application
FastLoad Commands
Fastload Sample
Executing a FastLoad Script
Another Sample FastLoad Script
Checkpoints
Converting Data Types with FastLoad
A FastLoad Conversion Example
When You Cannot RESTART FastLoad
When You Can RESTART FastLoad
Step Two: Run the FastLoad script
What Happens When FastLoad Finishes
You Receive an Outcome Status
You Receive a Status Report
You can Troubleshoot
Restarting FastLoad: A More In-Depth Look
How the CHECKPOINT Option Works
Restarting with CHECKPOINT
Restarting without CHECKPOINT
Using INMODs with FastLoad

Chapter 3: FastLoad
"Where there is no patrol car, there is no speed limit."
- Al Capone

Why it is Called "FAST" Load


FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a
host into empty tables in Teradata. Part of this speed is achieved because it does not use the
Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the
reasons that it is fast, know that FastLoad was developed to load millions of rows into a table.

The way FastLoad works can be illustrated by home construction, of all things! Let's look at three
scenarios from the construction industry to provide an amazing picture of how the data gets loaded.

Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the
foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot.
The fewer barriers there are to deal with, the quicker the new construction can progress. Building
custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start
with an empty table, like an empty lot, and then populate it with rows of data from another source.
Because the target table is empty, this method is typically the fastest way to load data. FastLoad will
never attempt to insert rows into a table that already holds data.

Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of
land on which to build a home, but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish the old house and start fresh
from the ground up — allowing for brand new construction. FastLoad also likes this approach to
loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure,
and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to populate the existing table. Another
option is to DELETE all the data rows from a populated target table and reload it. This requires less
updating of the Data Dictionary than dropping and recreating a table. In either case, the result is a
perfectly empty target table that FastLoad requires!

Scenario Three: Sometimes, a customer has a good house already but wants to remodel a portion
of it or to add an additional room. This kind of work takes more time than the work described in
Scenario One. Such work requires some tearing out of existing construction in order to build the
new section. Besides, the builder never knows what he will encounter beneath the surface of the
existing home. So you can easily see that remodeling or additions can take more time than new
construction. In the same way, existing tables with data may need to be updated by adding new
rows of data. To load populated tables quickly with large amounts of data while maintaining the data
currently held in those tables, you would choose MultiLoad instead of FastLoad. MultiLoad is
designed for this task but, like renovating or adding onto an existing house, it may take more time.

How FastLoad Works


What makes FastLoad perform so well when it is loading millions or even billions of rows? It is
because FastLoad assembles data into 64K blocks (64,000 bytes) to load it and can use multiple
sessions simultaneously, taking further advantage of Teradata's parallel processing.

This is different from BTEQ and TPump, which load data at the row level. It has been said, "If you
have it, flaunt it!" FastLoad does not like to brag, but it takes full advantage of Teradata's parallel
architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module
Processor — the software processor in Teradata responsible for reading and writing data to the
disks) in order to maximize parallel processing. This advantage is passed along to the FastLoad
user in terms of awesome performance. Teradata is the only data warehouse that loads data, processes
data and backs up data in parallel.
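
To make the difference between row-level and block-level loading concrete, here is a minimal Python sketch. It is not FastLoad's actual block format (a real block also carries header information, which is ignored here); the function name `pack_into_blocks` and the sample rows are hypothetical. It only shows how many fewer transfers are needed when rows are grouped into blocks of at most 64,000 bytes.

```python
BLOCK_SIZE = 64_000  # block size in bytes, as quoted in the text


def pack_into_blocks(rows, block_size=BLOCK_SIZE):
    """Group encoded rows into blocks holding at most block_size bytes each."""
    blocks, current, used = [], [], 0
    for row in rows:
        if len(row) > block_size:
            raise ValueError("row does not fit in a single block")
        if used + len(row) > block_size:
            # The current block is full: close it and start a new one.
            blocks.append(current)
            current, used = [], 0
        current.append(row)
        used += len(row)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical input: 10,000 rows of 100 bytes each.
rows = [b"x" * 100 for _ in range(10_000)]
blocks = pack_into_blocks(rows)
print(len(rows), "row-level transfers versus", len(blocks), "block-level transfers")
```

With these made-up numbers, each block holds 640 rows, so the same data moves in a handful of transfers instead of ten thousand.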

Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 2

FastLoad Has Some Limits


There are more reasons why FastLoad is so fast. Many of them take the form of restrictions: whatever
FastLoad does not support cannot slow it down. For instance, can you imagine a sprinter wearing cowboy
boots in a race? Of course not! Because of its speed, FastLoad, too, must travel light! This means that it
will have limitations that may or may not apply to other load utilities. Remembering this short list will
save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a
smooth operator!

Rule #1: No Secondary Indexes are allowed on the Target Table. High performance will only
allow FastLoad to utilize Primary Indexes when loading. The reason for this is that Primary (UPI and
NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build only
data rows. A secondary index is stored in a subtable block and many times on a different AMP from
the data row. This would slow FastLoad down and they would have to call it: get ready now,
HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just
drop them. You may easily recreate them after completing the load.

Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are
defined with Referential Integrity (RI). Enforcing RI would require too much system checking of
referential constraints against a different table, and FastLoad only does one table. In short, RI
constraints will need to be dropped from the target table prior to the use of FastLoad.

Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay
attention to the needs of other tables, which is what Triggers are all about. Additionally, these
require more than one AMP and more than one table. FastLoad does one table only. Simply ALTER
the Triggers to the DISABLED status prior to using FastLoad.

Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that
allow duplicate rows, that is, rows in which the values in every column are identical. FastLoad can
load data into a multi-set table, but when it finds duplicate rows it discards them rather than
loading them.

Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP
must be repaired before the load process can be restarted. Other than this, FastLoad can recover
from system glitches and perform restarts. We will discuss Restarts later in this chapter.

Rule #6: No more than one data type conversion is allowed per column during a FastLoad.
Why just one? Data type conversion is a highly resource-intensive job on the system, which requires a
"search and replace" effort. And that takes more time. Enough said!

Three Key Requirements for FastLoad to Run


FastLoad can be run from either an MVS/Channel (mainframe) or a Network (LAN) host. In either case,
FastLoad requires three key components. They are a log table, an empty target table and two error
tables. The user must name these at the beginning of each script.

Log Table: FastLoad needs a place to record information on its progress during a load. It uses the
table called Fastlog in the SYSADMIN database. This table contains one row for every FastLoad
running on the system. In order for your FastLoad to use this table, you need INSERT, UPDATE
and DELETE privileges on that table.

Empty Target Table: We have already mentioned the absolute need for the target table to be
empty. FastLoad does not care how this is accomplished. After an initial load of an empty target
table, you are now looking at a populated table that will likely need to be maintained.

If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed
and for less interaction with the Data Dictionary, just to delete all the rows from that table and then
reload it with fresh data. The syntax DELETE <databasename>.<tablename> should be used for
this. But sometimes, as in some of our FastLoad sample scripts below (see Figure 4-1), you want to
drop that table and recreate it versus using the DELETE option. To do this, FastLoad has the ability
to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in
the script is that the script is no longer restartable and you are required to rerun the FastLoad from the
beginning. Otherwise, we recommend that you have a script for an initial run and a different script
for a restart.

AXSMOD: Short for Access Module, this command specifies an input protocol like OLEDB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.

BEGIN LOADING: This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load. CHECKPOINT and INDICATORS are subordinate commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be discussed below in detail, is not the default for FastLoad. It must be specified in the script. INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies columns with nulls and uses a bitmap at the beginning of each row to show which fields contain a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to identify the null column. The INDICATORS option does not work with VARTEXT.

CREATE TABLE: This defines the target table and follows normal syntax. If used, this should only be in the initial script. If the table is being loaded, it cannot be created a second time.

DEFINE: This names the input file and describes the columns in that file and the data types for those columns.

DELETE: Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because the table is locked.

DROP TABLE: Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time, this is not a good thing to do within a FastLoad script since it cancels the ability to restart.

END LOADING: Success! This command indicates the point at which all the data has been transmitted. It tells FastLoad to proceed to Phase 2. As mentioned earlier, it can be used as a way to partition data loads to the same table by omitting it from the script. This is true because the table remains empty until after Phase 2. Without END LOADING, FastLoad does not finish after Phase 1 but is paused instead; a later run that includes END LOADING goes on to Phase 2.

ERRLIMIT: Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase 1). This handy command can be a lifesaver when you are not sure how corrupt the data in the input file is. The more corrupt it is, the greater the clean up effort required after the load finishes. ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows beyond which FastLoad will proceed to abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, the rows in the error table are not in the data table. That becomes your responsibility.

HELP: Designed for online use, the Help command provides a list of all possible FastLoad commands along with brief, but pertinent tips for using them.

HELP TABLE: Builds the table columns list for use in the FastLoad DEFINE statement when the data matches the Create Table statement exactly. In real life this does not happen very often.

INSERT: This is FastLoad's favorite command! It inserts rows into the target table.

LOGON/LOGOFF or QUIT: No, this is not the WAX ON / WAX OFF from the movie, The Karate Kid! LOGON simply begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.

NOTIFY: Just like it sounds, the NOTIFY command is used to inform the job that follows that some event has occurred. It calls a user exit or predetermined activity when such events occur. NOTIFY is often used for detailed reporting on the FastLoad job's success.

RECORD: Specifies the beginning record number (or with THRU, the ending record number) of the Input data source, to be read by FastLoad. Syntactically, this command is placed before the INSERT keyword. Why would it be used? Well, it enables FastLoad to bypass input records that are not needed such as tape headers, manual restart, etc. When doing a partition data load, RECORD is used to over-ride the checkpoint.

SET RECORD: Used only in the LAN environment, this command states in what format the data from the Input file is coming: FastLoad, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, FastLoad.

SESSIONS: This command specifies the number of FastLoad sessions to establish with Teradata. It is written in the script just before the logon. The default is 1 session per available AMP. The purpose of multiple sessions is to enhance throughput when loading large volumes of data. Too few sessions will stifle throughput. Too many will preclude availability of system resources to other users. You will need to find the proper balance for your configuration.

SLEEP: Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to logon would cease.

TENACITY: Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number of sessions it requested to perform its task or all of the loader slots are currently used. TENACITY specifies the amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for FastLoad is "no tenacity", meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.

Figure 4-1
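
The interplay of TENACITY and SLEEP described in the table is simple arithmetic, sketched here in Python. The function name `max_logon_retries` is hypothetical, and the sketch only counts how many retry intervals fit into the tenacity window; it does not model how FastLoad itself schedules its attempts.

```python
def max_logon_retries(tenacity_hours, sleep_minutes):
    """Upper bound on the retries that fit into the TENACITY window."""
    if sleep_minutes <= 0:
        raise ValueError("SLEEP must be a positive number of minutes")
    return (tenacity_hours * 60) // sleep_minutes


# The example from the table: TENACITY 4 and SLEEP 10.
print(max_logon_retries(4, 10))
# TENACITY 4 with the default SLEEP of 6 minutes.
print(max_logon_retries(4, 6))
```

With TENACITY 4 and SLEEP 10 there is room for 24 retries in the four-hour window; with the default SLEEP of 6 minutes there is room for 40.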
Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be
populated should errors occur during the load process. These are required by the FastLoad utility,
which will automatically create them for you; all you must do is to name them. The first error table is
for any translation errors or constraint violations. For example, a row with a column containing a
wrong data type would be reported to the first error table. The second error table is for errors
caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one
occurrence for every UPI. The other occurrences will be stored in this table. However, if the entire
row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed
later for troubleshooting should errors occur during the load. For specifics on how you can
troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."
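
The routing rules for the two error tables can be summarized in a short Python sketch. This is only an illustration of the outcome described above, not of FastLoad's internals: the function `route_rows`, its parameters, and the sample rows are hypothetical, and the real utility makes these decisions in different phases of the load rather than in a single pass.

```python
def route_rows(raw_rows, convert, upi_of):
    """Sort rows into target table, error table 1, error table 2, and a duplicate count."""
    target = {}          # UPI value -> loaded row
    error_table_1 = []   # conversion (translation) errors
    error_table_2 = []   # UPI duplicates whose other columns differ
    duplicate_rows = 0   # full-row duplicates: counted, not stored
    for raw in raw_rows:
        try:
            row = convert(raw)
        except ValueError:
            error_table_1.append(raw)
            continue
        key = upi_of(row)
        if key not in target:
            target[key] = row
        elif target[key] == row:
            duplicate_rows += 1
        else:
            error_table_2.append(row)
    return target, error_table_1, error_table_2, duplicate_rows


# Hypothetical input: (employee number as text, name); the employee number is the UPI.
raw_rows = [("1", "Ann"), ("2", "Bob"), ("x", "Bad"), ("1", "Ann"), ("2", "Rob")]
target, err1, err2, dups = route_rows(
    raw_rows,
    convert=lambda r: (int(r[0]), r[1]),  # int() raises ValueError on bad data
    upi_of=lambda row: row[0],
)
print(target, err1, err2, dups)
```

Here the row with a non-numeric employee number lands in the first error table, the second row for employee 2 lands in the second error table, and the repeated row for employee 1 is only counted.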


Maximum of 15 Loads
The Teradata RDBMS will only run a maximum number of fifteen FastLoads, MultiLoads, or
FastExports at the same time. This maximum is determined by a value stored in the DBS Control
record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5
concurrent jobs.
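
As a rough picture of this limit, the following Python sketch treats it as an admission check. The names `counts_toward_limit`, `can_start` and `load_limit` are hypothetical and do not correspond to any Teradata interface; the values 5 and 15 are the ones quoted above.

```python
LIMITED_UTILITIES = {"fastload", "multiload", "fastexport"}  # utilities named in the text
INSTALL_DEFAULT = 5   # value when Teradata is first installed
HIGHEST_SETTING = 15  # largest value the setting can take


def counts_toward_limit(utility_name):
    """True if a job of this utility occupies one of the limited load slots."""
    return utility_name.lower() in LIMITED_UTILITIES


def can_start(running_jobs, load_limit=INSTALL_DEFAULT):
    """Decide whether one more limited load job may start right now."""
    if not 0 <= load_limit <= HIGHEST_SETTING:
        raise ValueError("the limit must be between 0 and 15")
    in_use = sum(1 for job in running_jobs if counts_toward_limit(job))
    return in_use < load_limit


running = ["FastLoad", "TPump", "MultiLoad"]  # hypothetical workload
print(can_start(running, load_limit=2))
print(can_start(running))
```

In this made-up workload only two of the three running jobs occupy a slot, so a new load is refused when the limit is 2 but admitted under the install default of 5.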

Since these utilities all use large blocking of rows, the system reaches a saturation point at which Teradata will
protect the amount of system resources available by queuing up the extra load. For example, if the
maximum number of jobs are currently running on the system and you attempt to run one more, that
job will not be started. You should view this limit as a safety control. Here is a tip for remembering
how the load limit applies: If the name of the load utility contains either the word "Fast" or the word
"Load", then there can be only a total of fifteen of them running at any one time.

FastLoad Has Two Phases


Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are
divided up among the AMPs. Then each AMP tackles its own portion of the task with regard to its
portion of the data. This same "divide and conquer" mentality also expedites the load process.
FastLoad divides its job into two phases, both designed for speed. They have no fancy names but
are typically known simply as Phase 1 and Phase 2. Sometimes they are referred to as Acquisition
Phase and Application Phase.

Phase 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module
Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata
does not take the time to hash each row of data based on the Primary Index. That will be done later.
Instead, it does the following:

When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the
SQL just once. The PE is the Teradata software processor responsible for parsing syntax and
generating a plan to execute the request. It then opens a Teradata session from the FastLoad client
directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems,
it is normally a good idea to limit the number of sessions using the SESSIONS command. This
capability is shown below.

Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer
to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To
accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without
any concern for which AMP gets the block. The result is that data rows arrive on different AMPs
than those on which they would live, had they been hashed.

So how do the rows get to the correct AMPs where they will permanently reside? Following the
receipt of every data block, each AMP hashes its rows based on the Primary Index, and
redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP
but remain unsorted until Phase 1 is complete.
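
The two steps of Phase 1 can be sketched in Python as follows. This is a toy model under stated assumptions: `zlib.crc32` stands in for Teradata's hashing algorithm, blocks are handed to AMPs round-robin, and the names `owning_amp` and `phase1` are hypothetical.

```python
import zlib
from itertools import cycle


def owning_amp(primary_index_value, amp_count):
    """Stand-in hash: map a Primary Index value to the AMP that owns the row."""
    return zlib.crc32(str(primary_index_value).encode()) % amp_count


def phase1(blocks, amp_count):
    """Deliver blocks to arbitrary AMPs, then redistribute each row by hash."""
    # Step 1: blocks go to whichever AMP is next, with no regard to ownership.
    received = [[] for _ in range(amp_count)]
    for amp, block in zip(cycle(range(amp_count)), blocks):
        received[amp].extend(block)
    # Step 2: every AMP hashes the rows it received and forwards each one
    # to the worktable of the owning AMP. The worktables stay unsorted.
    worktables = [[] for _ in range(amp_count)]
    for rows in received:
        for row in rows:
            worktables[owning_amp(row[0], amp_count)].append(row)
    return worktables


# Hypothetical data: 20 rows whose first column is the Primary Index, in blocks of 5.
rows = [(i, f"name{i}") for i in range(20)]
blocks = [rows[i:i + 5] for i in range(0, len(rows), 5)]
worktables = phase1(blocks, amp_count=3)
print([len(w) for w in worktables])
```

After the redistribution, every row sits in the worktable of the AMP that its Primary Index hashes to, regardless of which AMP first received its block.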

Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping
industry today. How do the key players in this industry handle a parcel? When the shipping
company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for
the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that
hub it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way that
the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP.
This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards them to
their true destination. This is like the shipping parcel being sent from a hub city to its destination city!


Phase 2: Application
Following the scenario described above, the shipping vendor must do more than get a parcel to the
destination city. Once the packages arrive at the destination city, they must then be sorted by street
and zip code, placed onto local trucks and be driven to their final, local destinations.

Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address (i.e.,
where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it
writes the rows into the table space on disks where they will permanently reside. Rows of a table
are stored on the disks in data blocks. The AMP uses the block size as defined when the target
table was created. If the table is Fallback protected, then the Fallback will be loaded after the
Primary table has finished loading. This enables the Primary table to become accessible as soon as
possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!
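
Phase 2 on a single AMP can be pictured with the short Python sketch below. It rests on assumptions that the text does not spell out: rows are ordered by a hash of the Primary Index (with `zlib.crc32` as a stand-in), and the table's block size is simplified to a fixed number of rows per block. The names `row_hash` and `phase2` are hypothetical.

```python
import zlib


def row_hash(primary_index_value):
    """Stand-in for the hash used to order rows on an AMP."""
    return zlib.crc32(str(primary_index_value).encode())


def phase2(worktable, rows_per_block):
    """Sort one AMP's worktable and cut it into data blocks."""
    ordered = sorted(worktable, key=lambda row: row_hash(row[0]))
    return [ordered[i:i + rows_per_block]
            for i in range(0, len(ordered), rows_per_block)]


# Hypothetical worktable of 10 unsorted rows on one AMP.
worktable = [(i, f"name{i}") for i in range(10)]
data_blocks = phase2(worktable, rows_per_block=4)
print([len(block) for block in data_blocks])
```

Ten rows at four rows per block yield blocks of 4, 4 and 2 rows, written in sorted order.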

FastLoad Commands
Here is a table of some key FastLoad commands and their definitions. They are used to provide
flexibility in control of the load process. Consider this your personal ready reference guide! You will
notice that there are only a few SQL commands that may be used with this utility (Create Table,
Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional
functions that would slow it down.

Fastload Sample

"Mistakes are a part of being human. Appreciate your mistakes for what they are:
precious life lessons that can only be learned the hard way. Unless it's a fatal
mistake, which, at least, others can learn from."
– Al Franken

Fastload is a utility we can use to populate empty tables. Make no mistake about how useful
Fastload can be or how fatal errors can occur. The next 2 slides illustrate the essentials needed
when constructing your fastload script. The first will highlight the important areas about the
FastLoad script, and the second slide is a blank copy of the script that you can use to create your
own FastLoad script. Use the flat file we created in the BTEQ chapter to help run the script.


Simply copy the following text into notepad, then save it with a name and location that you can
easily remember (we saved ours as c:\temp\Fastload_First_Script.txt).


This script is going to create a table called Employee_Table02. After the table is created, it's going
to take the information from our flat file and insert it into the new table. Afterwards, the
Employee_Table and Employee_Table02 should look identical.

Executing a FastLoad Script

"A good plan, violently executed now, is better than a perfect plan next week."
- George S. Patton

We can execute the Fastload utility like we do with BTEQ; however we use the command "fastload"
instead of "BTEQ". If we get a return code of 0 then the Fastload worked perfectly. What did
General Patton say when his Fastload gave him a return code of 12? I shall return 0!


Executing our Fastload script

Let's see if it worked:

The load utilities often scare people because there are many things that appear complicated. In
actuality, the load scripts are very simple. Think of FastLoad as:

• Logging onto Teradata

• Defining the Teradata table that you want to load (target table)

• Defining the INPUT data file

• Telling the system to start loading

Another Sample FastLoad Script


Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The
reason is that when any of the tables that FastLoad is using are dropped, the script cannot be
restarted. It can only be rerun from the beginning. Since FastLoad has restart logic built into it, a
restart is normally the better solution if the initial load attempt should fail. However, for purposes of
this example, it shows the table structure and the description of the data being read.

Let's look at another FastLoad script that you might see in the real world. In the script below, every
comment line is placed inside the normal Teradata comment syntax, [/* ... */]. FastLoad and SQL
commands are written in upper case in order to make them stand out. In reality, Teradata utilities,
like Teradata itself, are by default not case sensitive. You will also note that when column names
are listed vertically we recommend placing the comma separator in front of the following column.
Coding this way makes reading or debugging the script easier for everyone. The purpose of this
script is to load the Employee_Profile table in the SQL01 database. The input file used for the load
is named EMPS.TXT. Below the sample script each step will be described in detail.

/* FASTLOAD SCRIPT TO LOAD THE         */
/* Employee_Profile TABLE              */
/* Created by Coffing Data Warehousing */

/* Since this script does not drop the target or error tables, it is restartable. */
/* This is a good thing for production jobs. */

/* Setup the FastLoad Parameters */
/* Specify the number of sessions to logon. */
SESSIONS 100; /*or, the number of sessions supportable*/
/* Tenacity is set to 4 hr; Wait 10 Min between retries. */
TENACITY 4; /* the default is no tenacity, means no retry */
SLEEP 10; /* the default is 6, means retry in 6 minutes */
LOGON CW/SQL01,SQL01;
SHOW VERSIONS; /* Shows the Utility's release number */

/* Set the Record type to a comma delimited for FastLoad */
/* Starts with the second record. */
RECORD 2;
/* Specifies if record layout is vartext with a comma delimiter. */
SET RECORD VARTEXT ",";

/* Define the Text File Layout and Input File */
/* Notice that all fields are defined as VARCHAR. When using VARTEXT, the */
/* fields do not contain the length field like in these formats: text, */
/* FastLoad, or unformatted. */
DEFINE Employee_No (VARCHAR(10))
     , Last_name (VARCHAR(20))
     , First_name (VARCHAR(12))
     , Salary (VARCHAR(5))
     , Dept_No (VARCHAR(6))
FILE= EMPS.TXT;

/* Optional to show the layout of the input */
SHOW;

/* Begin the Load and Insert Process into the */
/* Employee_Profile Table */
/* BEGIN LOADING specifies the table to load and lock. ERRORFILES names the */
/* error tables. CHECKPOINT sets the number of rows at which to pause & record */
/* progress in the restart log before loading further. */
BEGIN LOADING SQL01.Employee_Profile
   ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
   CHECKPOINT 100000;

/* Defines the insert statement to use for loading the rows */
INSERT INTO SQL01.Employee_Profile VALUES
   ( :Employee_No
   ,:Last_name
   ,:First_name
   ,:Salary
   ,:Dept_No );

/* Continues loading process with Phase 2. */
END LOADING;
/* Logs off of Teradata. */
LOGOFF;

Step One: Before logging onto Teradata, it is important to specify how many sessions you need.
The syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands
in FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlying
commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot
["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us
which version of FastLoad is being used for the load. Why would we recommend this? We do so
because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may
have to be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE
structure in the DEFINE statement, you must first set the RECORD layout type for the file being read
by FastLoad. We have used VARTEXT in our example with a comma delimiter. The available options are
FastLoad, TEXT, UNFORMATTED, or VARTEXT. You need to know this about your input file
ahead of time.

Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of
the flat file to be used as the input FILE, or source file for the load.

Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what
you want loaded. In the BEGIN LOADING statement, the script must name the target table and the
two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error
tables in this script? FastLoad will automatically create them for you once you name them in the
script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1"
because it comes first and Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You may
call them whatever you like. At the same time, they must be unique within a database, so using a
combination of your userid and target table name helps ensure this uniqueness between multiple
FastLoad jobs occurring in the same database.

In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter.
We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a
vital task with regard to the load. In the old days, children were always told to focus on the three
"R's" in grade school ("reading, 'riting, and 'rithmetic"). There are two very different, yet equally
important, R's to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN
means that the job is capable of running all the processing again from the beginning of the load.
RESTART means that the job is capable of running the processing again from the point where it left
off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows
FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will
learn more about CHECKPOINT in the section on Restarting FastLoad.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers
do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to
Phase 2 without the END LOADING command.

In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the
start of the job, you might think that FastLoad cannot load rows as they arrive from different time
zones. However, to accomplish this processing, simply omit the END LOADING on the load job. Then,
you can run the same FastLoad multiple times and continue loading the worktables until the last file
is received. Then run the last FastLoad job with an END LOADING and you have partitioned your load
jobs into smaller segments instead of one huge job. This makes FastLoad even faster!

Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or
CREATE commands within the script. Additionally, every script is exactly the same with the
exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase
2. That's a pretty clever way to do a partitioned type of data load.

Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the
last utility command in your script. At this point the table lock is released and if there are no rows in
the error tables, they are dropped automatically. However, if even a single row is in one of them, you
are responsible for checking it, taking the appropriate action, and dropping the table manually.

Checkpoints

"Once the game is over, the king and the pawn go back in the same box."
- Italian Proverb

FastLoad has the ability to save checkpoints during the loading process. Checkpoints are what
enable utilities to pick up from where they left off if the loading process was interrupted in any way.
A suitable checkpoint value is easy to calculate:

Determining a Checkpoint

Add up the approximate byte count of 1 row. The row below adds up to:

• Employee_No: Integer = 4 bytes

• Dept_No: Smallint = 2 bytes

• Last_Name: Char(20) = 20 bytes

• First_Name: VarChar(12) = 14 bytes

• Salary: Decimal(8,2) = 5 bytes

• Total: = 45 bytes

Now take the total number of bytes per row (45 bytes in our case) and divide 64,000 by that
number. (64,000 / 45 = 1422.2) The number you come up with is the number of rows that will be
bundled together in each data block set.

Setting the checkpoint to 1000 would be pointless because the computer would take a checkpoint
every data block! A 1,000,000 checkpoint would work well here, sending approximately 703 data
blocks between checkpoints.
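
If you would rather let a few lines of code do the arithmetic, the calculation above can be sketched as follows. This is only an illustration in Python, not part of FastLoad; the names ROW_BYTES, BLOCK_BYTES, rows_per_block and blocks_per_checkpoint are our own, and the byte counts are the approximate ones from the example row.

```python
# Approximate bytes per column, taken from the example row in the text.
ROW_BYTES = {
    "Employee_No": 4,   # Integer
    "Dept_No": 2,       # Smallint
    "Last_Name": 20,    # Char(20)
    "First_Name": 14,   # VarChar(12), counted as 14 bytes in the text
    "Salary": 5,        # Decimal(8,2), counted as 5 bytes in the text
}

BLOCK_BYTES = 64_000  # block size figure used in the text


def rows_per_block(row_bytes: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Whole rows that fit into one data block."""
    return block_bytes // row_bytes


def blocks_per_checkpoint(checkpoint_rows: int, row_bytes: int) -> float:
    """Approximate number of data blocks sent between two checkpoints."""
    return checkpoint_rows / rows_per_block(row_bytes)


row_bytes = sum(ROW_BYTES.values())
print(row_bytes)                                           # bytes per row
print(rows_per_block(row_bytes))                           # rows per block
print(round(blocks_per_checkpoint(1_000_000, row_bytes)))  # blocks per checkpoint
print(blocks_per_checkpoint(1_000, row_bytes) < 1)         # 1000 rows is under one block
```

A checkpoint value that yields less than one block per checkpoint, such as 1000 here, is the situation the text warns against.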

Converting Data Types with FastLoad

"You don't drown by falling in the water; you drown by staying in the water."
- Edwin Louis Cole

Converting data is easy. Just define the input data types for the input file in the DEFINE statement.
Then, FastLoad will compare that to the column definitions in the Data Dictionary and convert the
data for you! But the cardinal rule is that only one data type conversion is allowed per column. In
the example below, notice how the columns in the input file are converted from one data type to
another simply by redefining the data type in the CREATE TABLE statement.

FastLoad allows seven kinds of data conversions. Here is a chart that displays them:
IN FASTLOAD YOU MAY CONVERT
CHARACTER DATA TO NUMERIC DATA
FIXED LENGTH DATA TO VARIABLE LENGTH DATA
CHARACTER DATA TO DATE
INTEGERS TO DECIMALS
DECIMALS TO INTEGERS
DATE TO CHARACTER DATA
NUMERIC DATA TO CHARACTER DATA

Figure 4-5
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite
resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is
important, keep the number of columns being converted to a minimum!

A FastLoad Conversion Example


This next script example is designed to show how FastLoad converts data automatically when the
INPUT data type differs from the Target Teradata Table data type. The actual script is in the left
column and our comments are on the right.

LOGON CDW/jones, cowboys;             LOGON TO TERADATA

CREATE TABLE SQL01.Department         NOTICE THAT DEPT_NO IS AN INTEGER HERE IN
(                                     THE TARGET TABLE, BUT A CHAR(4) IN THE FLAT
 Dept_No INTEGER                      FILE DEFINITION BELOW - CHAR(4) will
,Dept_Name CHAR(20)                   convert to integer.
,Dept_Start_Date DATE                 These date columns are DATE data type and
,Dept_Finish_Date DATE )              will be converted from CHAR(10).
UNIQUE PRIMARY INDEX ( Dept_No );

DEFINE
 Department_No (CHAR(4))              CHAR(4) converts to INTEGER
,Department_Name (CHAR(20))           Character dates in different style in the file:
,SDate (CHAR(10))                     CHAR(10) comes in as YYYY-MM-DD
,FDate (CHAR(10))                     CHAR(10) comes in as MM/DD/YYYY
FILE= Dept_Flat.txt;                  DEFINES THE FLAT FILE AND NAMES THE INPUT FILE

BEGIN LOADING SQL01.Department        Names the target table and error tables;
ERRORFILES SQL01.Dept_Err1,           don't let the word "errorfiles" fool you,
           SQL01.Dept_Err2            they are tables.
CHECKPOINT 15000;                     Will checkpoint every 15000 rows

INSERT INTO SQL01.Department          The INSERT does automatic conversion:
VALUES (
 :Department_No                       Converts character to integer
,:Department_Name
,:SDate                               Converts character from ANSI date to DATE
,:FDate(DATE,                         Converts character as other date to DATE by
 FORMAT 'mm/dd/yyyy') );              describing the input format in the file.
                                      Without the format, this row goes into the
                                      error table.

END LOADING;
LOGOFF;

Figure 4-6

When You Cannot RESTART FastLoad


There are two types of FastLoad scripts: those that you can restart and those that you cannot
without modifying the script. If any of the following conditions are true of the FastLoad script that you
are dealing with, it is NOT restartable:

• The Error Tables are DROPPED

• The Target Table is DROPPED

• The Target Table is CREATED

Can you tell from the following sample FastLoad script why it is not restartable?

LOGON CDW/tommy, cowboys;                     LOGON TO TERADATA

DROP TABLE SQL01.Department;                  DROPS TARGET TABLE AND ERROR TABLES
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;

CREATE TABLE SQL01.Department                 CREATES THE DEPARTMENT TARGET TABLE IN
(Dept_No INTEGER                              THE SQL01 DATABASE IN TERADATA.
,Dept_Name CHAR(20)
)
UNIQUE PRIMARY INDEX (Dept_No);

DEFINE Department_No (INTEGER)                DEFINES THE FLAT FILE STRUCTURE AND
,Department_Name (CHAR(20))                   NAME OF THE INPUT FILE.
FILE= Dept_Flat.txt;

BEGIN LOADING SQL01.Department                SPECIFIES TABLE TO LOAD AND ERROR TABLES.
ERRORFILES SQL01.Dept_Err1,
           SQL01.Dept_Err2
INDICATORS                                    INDICATORS are defined on the BEGIN LOADING
CHECKPOINT 5000;

INSERT INTO SQL01.Department VALUES           INSERT COMMAND
(:Department_No
,:Department_Name);

/* since data file and table are the same,    Optional use of self-defining INSERT,
   self-defining INSERT would also work:      the DEFINE would contain only the
   INSERT INTO SQL01.Department.*; */         FILE=

END LOADING;                                  START PHASE 2
LOGOFF;                                       LOGOFF

Figure 4-7
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a
system reset or some glitch that stops the job halfway through. Maybe the mainframe went
down. Well, it is not really a big deal because FastLoad is so lightning-fast that you could probably
just RERUN the job for small data loads.

However, when you are loading a billion rows, this is not a good idea because it wastes time. So the
most common way to deal with these situations is simply to RESTART the job. But what if the
normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows
loaded? In that case, you might want to make sure that the job is totally restartable. Let's see how
this is done.

When You Can RESTART FastLoad


If all of the following conditions are true, then FastLoad is ALWAYS restartable:

• The Error Tables are NOT DROPPED in the script

• The Target Table is NOT DROPPED in the script

• The Target Table is NOT CREATED in the script

• You have defined a checkpoint

So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a
table whose data changes so much that you typically drop it monthly and build it again. Let's go
back to the script we just reviewed above and see how we can break it into the two parts necessary
to make it fully RESTARTABLE. It is broken up below.

STEP ONE: Run the following SQL statements in Queryman or BTEQ before
you start FastLoad:
DROP TABLE SQL01.Department;                  DROPS TARGET TABLE AND ERROR TABLES
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;

CREATE TABLE SQL01.Department                 CREATES THE DEPARTMENT TARGET TABLE IN
(Dept_No INTEGER                              THE SQL01 DATABASE IN TERADATA
,Dept_Name CHAR(20)
)
UNIQUE PRIMARY INDEX (Dept_No);

Figure 4-8
First, you ensure that the target table and error tables, if they existed previously, are blown away. If
there had been no errors in the error tables, they would be automatically dropped. If these tables did
not exist, you have not lost anything. Next, if needed, you create the empty table structure needed
to receive a FastLoad.

Step Two: Run the FastLoad script

This is the portion of the earlier script that carries out these vital steps:

• Defines the structure of the flat file

• Tells FastLoad where to load the data and store the errors

• Specifies the checkpoint so a RESTART will not go back to row one

• Loads the data

If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with
the next record after the last checkpoint. Now, with that said, if you did not request a checkpoint, the
output message will normally indicate how many records were loaded.

You may optionally use the RECORD command to manually restart on the next record after the one
indicated in the message.

Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN
LOADING and END LOADING. It will then restart right into Phase 2.

What Happens When FastLoad Finishes


You Receive an Outcome Status
The most important thing to do is verify that FastLoad completed successfully. This is accomplished
by looking at the last output in the report and making sure that it is a return code or status code of
zero (0). Any other value indicates that something wasn't perfect and needs to be fixed.

The error tables will not be dropped without a successful completion. This is because FastLoad
assumes that it will need them for its restart. For the same reason, the lock on the target table will
not be released either. When running FastLoad, you realistically have two choices once it is started:
get it to run to a successful completion, or rerun it from the beginning. As you can imagine, the best
course of action is normally to get it to finish successfully via a restart.

You Receive a Status Report


What happens when FastLoad finishes running? Well, you can expect to see a summary report on
the success of the load. Following is an example of such a report.

Line 1: TOTAL RECORDS READ     = 1000000
Line 2: TOTAL ERRORFILE1       = 50
Line 3: TOTAL ERRORFILE2       = 0
Line 4: TOTAL INSERTS APPLIED  = 999950
Line 5: TOTAL DUPLICATE ROWS   = 0

Figure 4-9

The first line displays the total number of records read from the input file. Were all of them loaded?
Not really. The second line tells us that there were fifty rows with constraint violations, so they were
not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that
there were zero entries into the second error table, indicating that there were no duplicate Unique
Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the
empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the
duplicates would only have been counted. They are not stored in the error tables anywhere. When
FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the
number of records read in line 1.
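
If you capture these five counts in a job log, the balancing rule is easy to check mechanically. Here is a minimal sketch in Python using the numbers from the sample report; the LoadReport class and its field names are our own invention for illustration and are not produced by FastLoad.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    """The five counts from the summary report (field names are hypothetical)."""
    records_read: int      # Line 1
    errorfile1: int        # Line 2
    errorfile2: int        # Line 3
    inserts_applied: int   # Line 4
    duplicate_rows: int    # Line 5

    def accounted_for(self) -> int:
        """Sum of lines 2 through 5."""
        return (self.errorfile1 + self.errorfile2
                + self.inserts_applied + self.duplicate_rows)

    def balances(self) -> bool:
        """True when lines 2 through 5 total the records read in line 1."""
        return self.accounted_for() == self.records_read


report = LoadReport(1_000_000, 50, 0, 999_950, 0)
print(report.accounted_for())
print(report.balances())
```

A report that does not balance is a signal to look more closely at the job output before trusting the load.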

Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be
duplicate rows that are counted. This is because an error seldom occurs exactly at a checkpoint
(a quiet or quiescent point when nothing is happening within FastLoad). Therefore, some number of
rows will be sent to the AMPs again because the restart starts on the next record after the value
stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some
of the consecutive rows are sent a second time. These will be caught as duplicate rows after the
sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET
table. It assumes they are duplicates because of this logic.

You can Troubleshoot


In the example above, we know that the load was not entirely successful. But that is not enough.
Now we need to troubleshoot in order to identify the errors and correct them. FastLoad generates
two error tables that will enable us to find the culprits. The first error table, which we named
Errortable1, contains just three columns. The first column, ErrorCode, contains the Teradata
FastLoad code number for the corresponding translation or constraint error. The second column,
named ErrorFieldName, specifies which column in the table contained the error. The third column,
DataParcel, contains the row with the problem. Errortable2 contains the same columns as the target
table.

As a user, you can select from either error table. To check errors in Errortable1 you would use this
syntax:

SELECT DISTINCT ErrorCode, Errorfieldname FROM Errortable1;

Corrected rows may be inserted to the target table using another utility that does not require an
empty table.

To check errors in Errortable2 you would use the following syntax:


SELECT * FROM Errortable2;

The definition of the second error table is exactly the same as the target table with all the same
columns and data types.

Restarting FastLoad: A More In-Depth Look

"Never engage in a battle of wits against an unarmed person."


- Anonymous

How the CHECKPOINT Option Works


The CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record
that Teradata has processed a specified number of rows. When the parameter "CHECKPOINT [n]"
is included in the BEGIN LOADING clause, the system will stop loading momentarily at increments
of [n] rows.

At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly.
Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains
a row for each currently running FastLoad job with the last successfully reached checkpoint for that
job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last
successfully reported checkpoint prior to the error. It will then restart from the record immediately
following that checkpoint and start building the next block of data to load. If such an error occurs in
Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row. If this is not
desirable, the RECORD statement can be used to force a restart at the next record after the failure.
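
To make the restart point concrete, here is a small Python sketch of the bookkeeping described above. It is a simplified model, not FastLoad's actual logic: it assumes checkpoints fall exactly on multiples of [n], and the function names and the rows_sent figure are hypothetical.

```python
def restart_record(checkpoint_interval: int, rows_sent: int) -> int:
    """First input record read again after a Phase 1 failure.

    checkpoint_interval: the n in CHECKPOINT n (0 means no checkpoints).
    rows_sent: rows already sent when the job failed (hypothetical input).
    """
    if checkpoint_interval <= 0:
        return 1  # CHECKPOINT 0: restart from the very first row
    last_checkpoint = (rows_sent // checkpoint_interval) * checkpoint_interval
    return last_checkpoint + 1  # record immediately following the checkpoint


def rows_resent(checkpoint_interval: int, rows_sent: int) -> int:
    """Rows that go to the AMPs a second time after the restart."""
    return rows_sent - (restart_record(checkpoint_interval, rows_sent) - 1)


# Job with CHECKPOINT 100000 that fails after 250,300 rows were sent.
print(restart_record(100_000, 250_300))
print(rows_resent(100_000, 250_300))

# The same failure with CHECKPOINT 0.
print(restart_record(0, 250_300))
print(rows_resent(0, 250_300))
```

In this model, the rows that are sent a second time are the ones that end up counted as duplicate rows in the status report.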

Restarting with CHECKPOINT


Sometimes you may need to restart FastLoad. If the FastLoad script requests a CHECKPOINT
(other than 0), then it is restartable from the last successful checkpoint. Therefore, if the job fails,
simply resubmit the job. There are two cases, depending on the phase in which the job stopped.
Suppose Phase 1 halts prematurely; the Data Acquisition phase is incomplete. Resubmit the
FastLoad script. FastLoad will begin from RECORD 1 if no checkpoint had been reached yet, or
otherwise from the first record past the last checkpoint. If you wish to manually specify where
FastLoad should restart, locate the last successful checkpoint record by referring to the
SYSADMIN.FASTLOG table, then use the RECORD command. Normally, it is not necessary to use
the RECORD command; let FastLoad automatically determine where to restart from.

If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know
that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the
BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of the
target table.

Restarting without CHECKPOINT


When a failure occurs and the FastLoad Script did not utilize the CHECKPOINT (i.e.,
CHECKPOINT 0), one procedure is to DROP the target table and error tables and rerun the job.
Here are some other options available to you:

1. Resubmit the job and hope there is enough PERM space for all the rows already sent to
the unsorted target table plus all the rows that are going to be sent again to the same target
table. Other than using space, these rows will be rejected as duplicates. As you can
imagine, this is not the most efficient way since it processes many of the same rows twice.

2. If CHECKPOINT wasn't specified, then CHECKPOINT defaults to 0 (no checkpoint). You
can perform a manual restart using the RECORD statement. If the output print file shows
that record 100000 was read, use something like the following command: [RECORD
100001;]. This statement will skip records 1 through 100000 and resume on record 100001.

Using INMODs with FastLoad


When you find that FastLoad does not read the file type you have, or you wish to control the access
for any reason, then it might be desirable to use an INMOD. An INMOD (Input Module) is fully
compatible with FastLoad in either mainframe or LAN environments, provided that the appropriate
programming languages are used. However, INMODs replace the normal mainframe DDNAME or
LAN defined FILE name with the following statement: DEFINE INMOD=<INMOD-name>. For a
more in-depth discussion of INMODs, see the chapter of this book titled "INMOD Processing".
