Teradata Utilities FastLoad
Table of Contents
Chapter 3: FastLoad
  Why it is Called "FAST" Load
    How FastLoad Works
    FastLoad Has Some Limits
    Three Key Requirements for FastLoad to Run
    Maximum of 15 Loads
  FastLoad Has Two Phases
    Phase 1: Acquisition
    Phase 2: Application
  FastLoad Commands
  Fastload Sample
  Executing a FastLoad Script
  Another Sample FastLoad Script
  Checkpoints
  Converting Data Types with FastLoad
  A FastLoad Conversion Example
  When You Cannot RESTART FastLoad
  When You Can RESTART FastLoad
    Step Two: Run the FastLoad script
  What Happens When FastLoad Finishes
    You Receive an Outcome Status
    You Receive a Status Report
    You can Troubleshoot
  Restarting FastLoad: A More In-Depth Look
    How the CHECKPOINT Option Works
    Restarting with CHECKPOINT
  Restarting without CHECKPOINT
  Using INMODs with FastLoad
Chapter 3: FastLoad
"Where there is no patrol car, there is no speed limit."
- Al Capone
Why it is Called "FAST" Load
The way FastLoad works can be illustrated by home construction, of all things! Let's look at three
scenarios from the construction industry to provide an amazing picture of how the data gets loaded.
Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the
foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot.
The fewer barriers there are to deal with, the quicker the new construction can progress. Building
custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start
with an empty table, like an empty lot, and then populate it with rows of data from another source.
Because the target table is empty, this method is typically the fastest way to load data. FastLoad will
never attempt to insert rows into a table that already holds data.
Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of
land on which to build a home, but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish the old house and start fresh
from the ground up — allowing for brand new construction. FastLoad also likes this approach to
loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure,
and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to populate the existing table. Another
option is to DELETE all the data rows from a populated target table and reload it. This requires less
updating of the Data Dictionary than dropping and recreating a table. In either case, the result is a
perfectly empty target table that FastLoad requires!
Scenario Three: Sometimes, a customer has a good house already but wants to remodel a portion
of it or to add an additional room. This kind of work takes more time than the work described in
Scenario One. Such work requires some tearing out of existing construction in order to build the
new section. Besides, the builder never knows what he will encounter beneath the surface of the
existing home. So you can easily see that remodeling or additions can take more time than new
construction. In the same way, existing tables with data may need to be updated by adding new
rows of data. To load populated tables quickly with large amounts of data while maintaining the data
currently held in those tables, you would choose MultiLoad instead of FastLoad. MultiLoad is
designed for this task but, like renovating or adding onto an existing house, it may take more time.
How FastLoad Works
FastLoad moves data in large blocks rather than one row at a time. This is different from BTEQ and
TPump, which load data at the row level. It has been said, "If you have it, flaunt it!" FastLoad does
not like to brag, but it takes full advantage of Teradata's parallel architecture. In fact, FastLoad
will create a Teradata session for each AMP (Access Module Processor, the software processor in
Teradata responsible for reading and writing data to the disks) in order to maximize parallel
processing. This advantage is passed along to the FastLoad user in terms of awesome performance.
Teradata is the only data warehouse that loads data, processes data and backs up data in parallel.
Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 2
FastLoad Has Some Limits
Rule #1: No Secondary Indexes are allowed on the Target Table. To sustain high performance,
FastLoad utilizes only Primary Indexes when loading. The reason for this is that Primary (UPI and
NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs, so FastLoad builds
only data rows. A secondary index is stored in a subtable block and many times on a different AMP
from the data row. Maintaining it would slow FastLoad down and they would have to call it (get ready
now) HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already,
just drop them. You may easily recreate them after completing the load.
Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are
defined with Referential Integrity (RI). Enforcing RI would require too much system checking against
a different table to verify the referential constraints, and FastLoad only does one table. In short,
RI constraints will need to be dropped from the target table prior to the use of FastLoad.
Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay
attention to the needs of other tables, which is what Triggers are all about. Additionally, these
require more than one AMP and more than one table. FastLoad does one table only. Simply ALTER
the Triggers to the DISABLED status prior to using FastLoad.
Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that
allow duplicate rows, that is, rows in which the values in every column are identical. FastLoad can
load data into a multiset table, but when it finds duplicate rows, it discards them, so duplicate
rows are never loaded.
Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP
must be repaired before the load process can be restarted. Other than this, FastLoad can recover
from system glitches and perform restarts. We will discuss Restarts later in this chapter.
Rule #6: No more than one data type conversion is allowed per column during a FastLoad.
Why just one? Data type conversion is a highly resource-intensive job on the system, which requires
a "search and replace" effort. And that takes more time. Enough said!
Three Key Requirements for FastLoad to Run
Log Table: FastLoad needs a place to record information on its progress during a load. It uses the
table called Fastlog in the SYSADMIN database. This table contains one row for every FastLoad
running on the system. In order for your FastLoad to use this table, you need INSERT, UPDATE
and DELETE privileges on that table.
Empty Target Table: We have already mentioned the absolute need for the target table to be
empty. FastLoad does not care how this is accomplished. After an initial load of an empty target
table, you are now looking at a populated table that will likely need to be maintained.
If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed
and for less interaction with the Data Dictionary, just to delete all the rows from that table and then
reload it with fresh data. The syntax DELETE <databasename>.<tablename> should be used for
this. But sometimes, as in some of our FastLoad sample scripts below (see Figure 4-1), you want to
drop that table and recreate it versus using the DELETE option. To do this, FastLoad has the ability
to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in
the script is that it is no longer restartable and you are required to rerun the FastLoad from the
beginning. If you need restartability, we recommend that you have a script for an initial run and a
different script for a restart.
AXSMOD
    Short for Access Module, this command specifies an input protocol such as OLEDB or reading a
    tape from REEL Librarian. This parameter is for network-attached systems only. When used, it
    must precede the DEFINE command in the script.
BEGIN LOADING
    This identifies and locks the FastLoad target table for the duration of the load. It also
    identifies the two error tables to be used for the load. CHECKPOINT and INDICATORS are
    subordinate commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be
    discussed below in detail, is not the default for FastLoad. It must be specified in the script.
    INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies
    columns with nulls and uses a bitmap at the beginning of each row to show which fields contain
    a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to
    identify the null column. The INDICATORS option does not work with VARTEXT.
CREATE TABLE
    This defines the target table and follows normal syntax. If used, this should only be in the
    initial script. If the table is being loaded, it cannot be created a second time.
DEFINE
    This names the input file and describes the columns in that file and the data types for those
    columns.
DELETE
    Deletes all the rows of a table. This will only work in the initial run of the script. Upon
    restart, it will fail because the table is locked.
DROP TABLE
    Drops a table and its data. It is used in FastLoad to drop previous Target and error tables.
    At the same time, this is not a good thing to do within a FastLoad script since it cancels the
    ability to restart.
END LOADING
    Success! This command indicates the point at which all the data has been transmitted. It tells
    FastLoad to proceed to Phase II. It can also be used as a way to partition data loads to the
    same table by omitting it from the script. This works because the table remains empty until
    after Phase II. When END LOADING is omitted, FastLoad is paused instead of finished; a later
    job that includes END LOADING sends it on to Phase II.
ERRLIMIT
    Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). This handy
    command can be a lifesaver when you are not sure how corrupt the data in the input file is.
    The more corrupt it is, the greater the clean up effort required after the load finishes.
    ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows
    beyond which FastLoad will proceed to abort. This provides the option to restart the FastLoad
    or to scrub the input data more before loading it. Remember, none of the rows in the error
    table are in the data table. Handling them becomes your responsibility.
HELP
    Designed for online use, the Help command provides a list of all possible FastLoad commands
    along with brief, but pertinent tips for using them.
HELP TABLE
    Builds the table columns list for use in the FastLoad DEFINE statement when the data matches
    the Create Table statement exactly. In real life this does not happen very often.
INSERT
    This is FastLoad's favorite command! It inserts rows into the target table.
LOGON/LOGOFF or, QUIT
    No, this is not the WAX ON / WAX OFF from the movie, The Karate Kid! LOGON simply begins a
    session. LOGOFF ends a session.
Figure 4-1
Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be
populated should errors occur during the load process. These are required by the FastLoad utility,
which will automatically create them for you; all you must do is to name them. The first error table is
for any translation errors or constraint violations. For example, a row with a column containing a
wrong data type would be reported to the first error table. The second error table is for errors
caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one
occurrence for every UPI. The other occurrences will be stored in this table. However, if the entire
row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed
later for troubleshooting should errors occur during the load. For specifics on how you can
troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."
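The routing just described can be pictured with a short Python sketch. This is only a model of the rules in the paragraph above, not FastLoad itself: the `load` function, the `LoadResult` names, and the sample department rows are all made up for illustration, and the real utility fills the first error table in Phase 1 and the second in Phase 2 rather than in a single pass.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    target: dict = field(default_factory=dict)  # UPI value -> loaded row
    err1: list = field(default_factory=list)    # conversion / constraint errors
    err2: list = field(default_factory=list)    # UPI violations
    duplicate_rows: int = 0                      # counted, but not stored anywhere


def load(records):
    """Route each (Dept_No, Dept_Name) record the way the text describes."""
    result = LoadResult()
    for raw in records:
        try:
            row = (int(raw[0]), raw[1])          # Dept_No must convert to an integer
        except ValueError:
            result.err1.append(raw)              # wrong data type -> first error table
            continue
        existing = result.target.get(row[0])
        if existing is None:
            result.target[row[0]] = row          # first occurrence of this UPI value
        elif existing == row:
            result.duplicate_rows += 1           # entire row is a duplicate
        else:
            result.err2.append(row)              # same UPI, different data
    return result


result = load([
    ("100", "Sales"),
    ("abc", "Bad"),
    ("100", "Sales"),
    ("100", "Marketing"),
    ("200", "HR"),
])
print(len(result.target), result.err1, result.err2, result.duplicate_rows)
```

In this toy run, two rows reach the target, one row lands in each error table, and one full duplicate is counted but not kept.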
Maximum of 15 Loads
The Teradata RDBMS will only run a maximum number of fifteen FastLoads, MultiLoads, or
FastExports at the same time. This maximum is determined by a value stored in the DBS Control
record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5
concurrent jobs.
Because these utilities all move rows in large blocks, the system reaches a saturation point, and
Teradata protects the available system resources by holding back any additional load job. For
example, if the maximum number of jobs is currently running on the system and you attempt to run one
more, that job will not be started. You should view this limit as a safety control. Here is a tip for
remembering
how the load limit applies: If the name of the load utility contains either the word "Fast" or the word
"Load", then there can be only a total of fifteen of them running at any one time.
FastLoad Has Two Phases
Phase 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module
Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata
does not take the time to hash each row of data based on the Primary Index. That will be done later.
Instead, it does the following:
When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the
SQL just once. The PE is the Teradata software processor responsible for parsing syntax and
generating a plan to execute the request. It then opens a Teradata session from the FastLoad client
directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems,
it is normally a good idea to limit the number of sessions using the SESSIONS command. This
capability is shown below.
Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer
to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To
accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without
any concern for which AMP gets the block. The result is that data rows arrive on AMPs other than
the ones where they would reside had they been hashed first.
So how do the rows get to the correct AMPs where they will permanently reside? Following the
receipt of every data block, each AMP hashes its rows based on the Primary Index, and
redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP
but remain unsorted until Phase 1 is complete.
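To make the receive-then-redistribute idea concrete, here is a small Python sketch of Phase 1. It is a toy model under loud assumptions: the four-AMP system, the three-row blocks, and the use of CRC32 as a stand-in for Teradata's row hash are all invented for illustration, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size, chosen only for this sketch


def owning_amp(primary_index_value, num_amps=NUM_AMPS):
    """Stand-in for the Teradata row hash: CRC32 of the value, modulo AMP count."""
    return zlib.crc32(str(primary_index_value).encode()) % num_amps


def phase1(rows, rows_per_block=3, num_amps=NUM_AMPS):
    """Return (worktables, forwarded): rows per owning AMP and rows that had to move."""
    worktables = defaultdict(list)
    forwarded = 0
    for start in range(0, len(rows), rows_per_block):
        block = rows[start:start + rows_per_block]
        # The block goes to whichever AMP is next; no hashing happens on the way in.
        receiving_amp = (start // rows_per_block) % num_amps
        # The receiving AMP hashes each row and redistributes it to the owner.
        for row in block:
            owner = owning_amp(row[0], num_amps)
            if owner != receiving_amp:
                forwarded += 1
            worktables[owner].append(row)  # written unsorted to the worktable
    return worktables, forwarded


rows = [(emp_no, f"name{emp_no}") for emp_no in range(1, 11)]
worktables, forwarded = phase1(rows)
for amp in sorted(worktables):
    print(amp, worktables[amp])
print("rows forwarded to another AMP:", forwarded)
```

Every row ends up in the worktable of the AMP that its Primary Index value hashes to, no matter which AMP first received its block.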
Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping
industry today. How do the key players in this industry handle a parcel? When the shipping
company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for
the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that
hub it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way that
the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP.
This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards them to
their true destination. This is like the shipping parcel being sent from a hub city to its destination city!
Phase 2: Application
Following the scenario described above, the shipping vendor must do more than get a parcel to the
destination city. Once the packages arrive at the destination city, they must then be sorted by street
and zip code, placed onto local trucks and be driven to their final, local destinations.
Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address (i.e.,
where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it
writes the rows into the table space on disks where they will permanently reside. Rows of a table
are stored on the disks in data blocks. The AMP uses the block size as defined when the target
table was created. If the table is Fallback protected, then the Fallback will be loaded after the
Primary table has finished loading. This enables the Primary table to become accessible as soon as
possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!
FastLoad Commands
Here is a table of some key FastLoad commands and their definitions. They are used to provide
flexibility in control of the load process. Consider this your personal ready-reference guide! You will
notice that there are only a few SQL commands that may be used with this utility (Create Table,
Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional
functions that would slow it down.
Fastload Sample
"Mistakes are a part of being human. Appreciate your mistakes for what they are:
precious life lessons that can only be learned the hard way. Unless it's a fatal
mistake, which, at least, others can learn from."
– Al Franken
FastLoad is a utility we can use to populate empty tables. Make no mistake about how useful
FastLoad can be or how fatal errors can occur. The next two slides illustrate the essentials needed
when constructing your FastLoad script. The first highlights the important areas of the FastLoad
script, and the second is a blank copy of the script that you can use to create your own FastLoad
script. Use the flat file we created in the BTEQ chapter to help run the script.
Simply copy the following text into notepad, then save it with a name and location that you can
easily remember (we saved ours as c:\temp\Fastload_First_Script.txt).
This script is going to create a table called Employee_Table02. After the table is created, it's going
to take the information from our flat file and insert it into the new table. Afterwards, the
Employee_Table and Employee_Table02 should look identical.
"A good plan, violently executed now, is better than a perfect plan next week."
- George S. Patton
We can execute the FastLoad utility the same way we execute BTEQ; however, we use the command
"fastload" instead of "BTEQ". If we get a return code of 0, then the FastLoad worked perfectly. What
did General Patton say when his FastLoad gave him a return code of 12? I shall return 0!
The load utilities often scare people because there are many things that appear complicated. In
actuality, the load scripts are very simple. Think of FastLoad as:
• Defining the Teradata table that you want to load (target table)
this example, it shows the table structure and the description of the data being read.
Let's look at another FastLoad script that you might see in the real world. In the script below, every
comment line is placed inside the normal Teradata comment syntax, /* ... */. FastLoad and SQL
commands are written in upper case in order to make them stand out. In reality, Teradata utilities,
like Teradata itself, are by default not case sensitive. You will also note that when column names
are listed vertically we recommend placing the comma separator in front of the following column.
Coding this way makes reading or debugging the script easier for everyone. The purpose of this
script is to load the Employee_Profile table in the SQL01 database. The input file used for the load
is named EMPS.TXT. Below the sample script each step will be described in detail.
Step One: Before logging onto Teradata, it is important to specify how many sessions you need.
The syntax is [SESSIONS {n}].
Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands
in FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlying
commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot
["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us
which version of FastLoad is being used for the load. Why would we recommend this? We do
because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may
have to be revisited.
Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE
structure in the DEFINE statement, you must first set the RECORD layout type for the file being read
by FastLoad. We have used VARTEXT in our example with a comma delimiter. The available options are
FastLoad, TEXT, UNFORMATTED, or VARTEXT. You need to know this about your input file ahead of time.
Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of
the flat file to be used as the input FILE, or source file for the load.
Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what
you want loaded. In the BEGIN LOADING statement, the script must name the target table and the
two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error
tables in this script? FastLoad will automatically create them for you once you name them in the
script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1"
because it comes first and Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You may
call them whatever you like. At the same time, they must be unique within a database, so using a
combination of your userid and target table name helps ensure this uniqueness between multiple
FastLoad jobs occurring in the same database.
In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter.
We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a
vital task with regard to the load. In the old days, children were always told to focus on the three
"R's" in grade school ("reading, 'riting, and 'rithmetic"). There are two very different, yet equally
important, R's to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN
means that the job is capable of running all the processing again from the beginning of the load.
RESTART means that the job is capable of running the processing again from the point where it left
off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows
FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will
learn more about CHECKPOINT in the section on Restarting FastLoad.
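The difference between RERUN and RESTART is easy to see in a few lines of Python. This is a rough model only: `run_load`, the 1,000-row input, the checkpoint interval of 100, and the failure at row 250 are all hypothetical values picked for the illustration, and the bookkeeping FastLoad really does is hidden behind a simple row counter.

```python
def run_load(rows, checkpoint_every, start_at=0, fail_at=None):
    """Send rows from index start_at onward; return (rows_sent, last_checkpoint).

    last_checkpoint is the number of rows known to be safely recorded,
    which is where a RESTART would pick up again.
    """
    last_checkpoint = start_at
    sent = []
    for i in range(start_at, len(rows)):
        if fail_at is not None and i == fail_at:
            return sent, last_checkpoint      # job interrupted before row i was sent
        sent.append(rows[i])
        if (i + 1) % checkpoint_every == 0:
            last_checkpoint = i + 1           # checkpoint taken after this row
    return sent, len(rows)


rows = list(range(1000))

# First attempt dies at row 250; the last checkpoint was taken after row 200.
first, checkpoint = run_load(rows, checkpoint_every=100, fail_at=250)

# RESTART: resume with the first row following the last successful checkpoint.
restarted, _ = run_load(rows, checkpoint_every=100, start_at=checkpoint)

# RERUN: start over from the very beginning.
rerun, _ = run_load(rows, checkpoint_every=100)

print(checkpoint, len(restarted), len(rerun))
```

The restart resends only the rows after the checkpoint, while the rerun pays for every row again.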
Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers
do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not
proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. Because the table must be empty
at the start of the job, you could not otherwise load rows as they arrive from different time zones.
To accomplish this processing, simply omit the END LOADING on the load job. Then, you can run the
same FastLoad multiple times and continue loading the worktables until the last file is received.
Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs into
smaller segments instead of one huge job. This makes FastLoad even faster!
Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or
CREATE commands within the script. Additionally, every script is exactly the same with the
exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase
2. That's a pretty clever way to do a partitioned type of data load.
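The partitioned load can be sketched in Python as well. The `PausedLoad` class below is a hypothetical model, not anything FastLoad exposes: it only shows that rows pile up in the worktable across jobs and that the target table stays empty until a job finally arrives with END LOADING. The three regional files are invented sample data.

```python
class PausedLoad:
    """Toy model of a FastLoad that is left paused between jobs."""

    def __init__(self):
        self.worktable = []  # rows received during Phase 1, across all jobs
        self.target = []     # stays empty until Phase 2 runs

    def run_job(self, rows, end_loading=False):
        self.worktable.extend(rows)               # Phase 1: keep acquiring rows
        if end_loading:
            self.target = sorted(self.worktable)  # Phase 2: sort and write the rows
            self.worktable = []


load = PausedLoad()
load.run_job([(3, "Tokyo"), (1, "Sydney")])                  # first file, no END LOADING
load.run_job([(5, "London"), (4, "Paris")])                  # second file, still paused
target_while_paused = list(load.target)
load.run_job([(2, "New York")], end_loading=True)            # last file ends the load
print(target_while_paused, load.target)
```

Only the last job differs from the others, which matches the point made above that every script is the same except for the one containing END LOADING.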
Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the
last utility command in your script. At this point the table lock is released and if there are no rows in
the error tables, they are dropped automatically. However, if a single row is in one of them, you are
responsible to check it, take the appropriate action and drop the table manually.
Checkpoints
"Once the game is over, the king and the pawn go back in the same box."
- Italian Proverb
FastLoad has the ability to save checkpoints during the loading process. Checkpoints are what
enable utilities to pick up from where they left off if the loading process was interrupted in any way.
A suitable checkpoint value can be calculated easily:
Determining a Checkpoint
Add up the approximate byte count of 1 row. The row below adds up to:
• Employee_No: Integer = 4 bytes
• Dept_No: Smallint = 2 bytes
• Last_Name: Char(20) = 20 bytes
• First_Name: VarChar(12) = 14 bytes
• Salary: Decimal(8,2) = 5 bytes
• Total: = 45 bytes
Now take the total number of bytes per row (45 bytes in our case) and divide 64,000 by that
number. (64,000 / 45 = 1422.2) The number you come up with is the number of rows that will be
bundled together in each data block set.
Setting the checkpoint to 1000 would be pointless because the computer would take a checkpoint
every data block! A 1,000,000 checkpoint would work well here, sending approximately 703 data
blocks between checkpoints.
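The checkpoint arithmetic above is simple enough to check in Python. The byte counts are the approximate ones from the Employee example, and 64,000 is the block size used in the calculation; the variable and function names are just labels chosen for this sketch.

```python
BLOCK_BYTES = 64_000  # block size used in the calculation above

# Approximate byte count per column, taken from the example row.
row_bytes = {
    "Employee_No": 4,   # Integer
    "Dept_No": 2,       # Smallint
    "Last_Name": 20,    # Char(20)
    "First_Name": 14,   # VarChar(12)
    "Salary": 5,        # Decimal(8,2)
}

bytes_per_row = sum(row_bytes.values())
rows_per_block = BLOCK_BYTES / bytes_per_row


def blocks_between_checkpoints(checkpoint_rows):
    """How many data blocks are sent between two checkpoints."""
    return checkpoint_rows / rows_per_block


print(bytes_per_row)                                   # bytes in one row
print(round(rows_per_block, 1))                        # rows bundled per block
print(round(blocks_between_checkpoints(1_000), 2))     # less than one block
print(int(blocks_between_checkpoints(1_000_000)))      # blocks per checkpoint
```

A checkpoint of 1000 rows comes out smaller than a single block, which is why it would be pointless, while a checkpoint of 1,000,000 rows spans roughly 703 blocks.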
"You don't drown by falling in the water; you drown by staying in the water."
- Edwin Louis Cole
Converting data is easy. Just define the input data types in the input file. Then, FastLoad will
compare that to the column definitions in the Data Dictionary and convert the data for you! But the
cardinal rule is that only one data type conversion is allowed per column. In the example below,
notice how the columns in the input file are converted from one data type to another simply by
redefining the data type in the CREATE TABLE statement.
FastLoad allows several kinds of data conversions. Here is a chart that displays them:
IN FASTLOAD YOU MAY CONVERT
CHARACTER DATA TO NUMERIC DATA
FIXED LENGTH DATA TO VARIABLE LENGTH DATA
CHARACTER DATA TO DATE
INTEGERS TO DECIMALS
DECIMALS TO INTEGERS
DATE TO CHARACTER DATA
NUMERIC DATA TO CHARACTER DATA
Figure 4-5
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite
resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is
important, keep the number of columns being converted to a minimum!
/* LOGON TO TERADATA */
LOGON CDW/jones, cowboys;

/* NOTICE THAT DEPT_NO IS AN INTEGER HERE IN THE TARGET TABLE, BUT A
   CHAR(4) IN THE FLAT FILE DEFINITION BELOW - CHAR(4) will convert to integer.
   These date columns are DATE data type and will be converted from CHAR(10). */
CREATE TABLE SQL01.Department
( Dept_No INTEGER
 ,Dept_Name CHAR(20)
 ,Dept_Start_Date DATE
 ,Dept_Finish_Date DATE )
UNIQUE PRIMARY INDEX ( Dept_No );

/* CHAR(4) converts to INTEGER.
   Character dates in different style in the file:
   CHAR(10) comes in as YYYY-MM-DD */
DEFINE
  Department_No (CHAR(4))
 ,Department_Name (CHAR(20))
 ,SDate (CHAR(10))
 ,FDate (CHAR(10))

END LOADING;
LOGOFF;
Figure 4-5
Can you tell from the following sample FastLoad script why it is not restartable?
/* LOGON TO TERADATA */
LOGON CDW/tommy, cowboys;

/* DROPS TARGET TABLE AND ERROR TABLES */
DROP TABLE SQL01.Department;
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;

/* CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATABASE IN TERADATA. */
CREATE TABLE SQL01.Department
(Dept_No INTEGER
,Dept_Name CHAR(20)
)
UNIQUE PRIMARY INDEX (Dept_No);

/* DEFINES THE FLATFILE STRUCTURE AND NAME OF THE INPUT FILE. */
DEFINE Department_No (INTEGER)
,Department_Name (CHAR(20))
FILE= Dept_Flat.txt;

/* SPECIFIES TABLE TO LOAD AND ERROR TABLES.
   INDICATORS are defined on the BEGIN LOADING. */
BEGIN LOADING SQL01.Department
ERRORFILES SQL01.Dept_Err1,
SQL01.Dept_Err2
INDICATORS
CHECKPOINT 5000;

/* INSERT COMMAND */
INSERT INTO SQL01.Department VALUES
(:Department_No
,:Department_Name);

/* Optional use of self-defining INSERT; the DEFINE would contain only the FILE= */
/* since data file and table are the same, self-defining
INSERT would also work:
INSERT INTO SQL01.Department.*; */

/* START PHASE 2 */
END LOADING;

/* LOGOFF */
LOGOFF;
Figure 4-7
Why might you have to RESTART a FastLoad job, anyway? Perhaps you experience a system reset
or some glitch that stops the job halfway through. Maybe the mainframe went down. Well, it is
not really a big deal because FastLoad is so lightning-fast that you could probably
just RERUN the job for small data loads.
However, when you are loading a billion rows, this is not a good idea because it wastes time. So the
most common way to deal with these situations is simply to RESTART the job. But what if the
normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows
loaded? In that case, you might want to make sure that the job is totally restartable. Let's see how
this is done.
So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a
table whose data changes so much that you typically drop it monthly and build it again. Let's go
back to the script we just reviewed above and see how we can break it into the two parts necessary
to make it fully RESTARTABLE. It is broken up below.
STEP ONE: Run the following SQL statements in Queryman or BTEQ before
you start FastLoad:
/* DROPS TARGET TABLE AND ERROR TABLES */
DROP TABLE SQL01.Department;
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;

/* CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATABASE IN TERADATA */
CREATE TABLE SQL01.Department
(Dept_No INTEGER
,Dept_Name CHAR(20)
)
UNIQUE PRIMARY INDEX (Dept_No);
Figure 4-8
First, you ensure that the target table and error tables, if they existed previously, are blown away. If
there had been no errors in the error tables, they would be automatically dropped. If these tables did
not exist, you have not lost anything. Next, if needed, you create the empty table structure needed
to receive a FastLoad.
This is the portion of the earlier script that carries out these vital steps:
• Tells FastLoad where to load the data and store the errors
If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with
the next record after the last checkpoint. Now, with that said, if you did not request a checkpoint, the
output message will normally indicate how many records were loaded.
You may optionally use the RECORD command to manually restart on the next record after the one
indicated in the message.
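As a sketch of that manual restart, suppose the job ran without a checkpoint and the output message reported 100000 records loaded before the failure. The record count is hypothetical, and placing RECORD ahead of the INSERT is an assumption for illustration; check the FastLoad reference for your release before relying on it.

```sql
/* Hypothetical restart without a checkpoint: the message reported
   100000 records loaded, so resume at record 100001. */
LOGON CDW/tommy, cowboys;

DEFINE Department_No (INTEGER)
,Department_Name (CHAR(20))
FILE= Dept_Flat.txt;

BEGIN LOADING SQL01.Department
ERRORFILES SQL01.Dept_Err1,
SQL01.Dept_Err2
INDICATORS;

/* Assumed starting point: the (hypothetical) reported count plus one */
RECORD 100001;

INSERT INTO SQL01.Department VALUES
(:Department_No
,:Department_Name);

END LOADING;
LOGOFF;
```

The script contains no DROP or CREATE statements, so the target table and error tables left by the interrupted run remain in place.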
Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN
LOADING and END LOADING. It will then restart right into Phase 2.
The locks will not be removed and the error tables will not be dropped without a successful
completion. This is because FastLoad assumes that it will need them for its restart. The lock on
the target table, in particular, stays in place until the load finishes. Once FastLoad has
started, you realistically have two choices: get it to run to a successful completion, or rerun
it from the beginning. As you can imagine, the best course of action is
normally to get it to finish successfully via a restart.
Figure 4-9
The first line displays the total number of records read from the input file. Were all of them loaded?
Not really. The second line tells us that there were fifty rows with constraint violations, so they were
not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that
there were zero entries into the second error table, indicating that there were no duplicate Unique
Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the
empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the
duplicates would only have been counted. They are not stored in the error tables anywhere. When
FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the
number of records read in line 1.
Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be
duplicate rows that are counted. This is because an error seldom occurs exactly at a checkpoint
(a quiet or quiescent point, when nothing is happening within FastLoad). Therefore, some number of
rows will be sent to the AMPs again because the restart starts on the next record after the value
stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some
of the consecutive rows are sent a second time. These will be caught as duplicate rows after the
sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET
table. It assumes they are duplicates because of this logic.
As a user, you can select from either error table. To check errors in Errortable1 you would use this
syntax:
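A query of roughly this shape lists the kinds of errors that were recorded. It assumes the first error table from the earlier script, SQL01.Dept_Err1, and the ErrorCode and ErrorFieldName columns that FastLoad normally creates in that table; confirm the column names on your system.

```sql
/* Summarize which error codes occurred and on which input fields.
   Table name taken from the earlier sample script; column names assumed. */
SELECT DISTINCT ErrorCode, ErrorFieldName
FROM SQL01.Dept_Err1;
```

DISTINCT keeps the output to one line per kind of error rather than one line per rejected record.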
Corrected rows may be inserted to the target table using another utility that does not require an
empty table.
The definition of the second error table is exactly the same as the target table with all the same
columns and data types.
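Because the second error table mirrors the target table, it can be queried with the target table's own column names. The sketch below assumes the SQL01.Dept_Err2 table and the Dept_No and Dept_Name columns from the earlier sample script.

```sql
/* List rows rejected for Unique Primary Index violations.
   Columns match the target table SQL01.Department. */
SELECT Dept_No, Dept_Name
FROM SQL01.Dept_Err2
ORDER BY Dept_No;
```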
At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly.
Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains
a row for all currently running FastLoad jobs with the last successfully reached checkpoint for each
job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last
successfully reported checkpoint prior to the error. It will then restart from the record immediately
following that checkpoint and start building the next block of data to load. If such an error occurs in
Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row. If this is not
desirable, the RECORD statement can be used to force a restart at the next record after the failure.
If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know
that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the
BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of the
target table.
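A Phase 2 restart script along those lines might look like the following. It reuses the logon, target table, and error table names from the earlier sample script; treat it as a minimal sketch rather than a complete job.

```sql
/* Restart in Phase 2: no DEFINE or INSERT, because acquisition is complete. */
LOGON CDW/tommy, cowboys;

BEGIN LOADING SQL01.Department
ERRORFILES SQL01.Dept_Err1,
SQL01.Dept_Err2;

/* END LOADING starts the sort and the build of the target table */
END LOADING;
LOGOFF;
```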
1. Resubmit job again and hope there is enough PERM space for all the rows already sent to
the unsorted target table plus all the rows that are going to be sent again to the same target
table. Other than using space, these rows will be rejected as duplicates. As you can
imagine, this is not the most efficient way since it processes many of the same rows twice.