Teradata Utilities
Chapter 2: BTEQ
An Introduction to BTEQ
Why is it called BTEQ?
Why is BTEQ available on every Teradata system ever built? Because the Batch TEradata Query (BTEQ) tool was the original way that SQL was submitted to Teradata to get an answer set in a desired format. This is the utility that I used for training at Wal*Mart, AT&T, Anthem Blue Cross and Blue Shield, and SouthWestern Bell back in the early 1990's. BTEQ is often referred to as the Basic TEradata Query, is still used today, and continues to be an effective tool.

Here is what is excellent about BTEQ: BTEQ can be used to submit SQL in either a batch or an interactive environment. Interactive users can submit SQL and receive an answer set on the screen. Users can also submit BTEQ jobs from batch scripts, include error checking and conditional logic, and allow the work to be done in the background. BTEQ outputs a report format, whereas Queryman outputs data in a format more like a spreadsheet. This gives BTEQ a great deal of flexibility in formatting data, creating headings, and utilizing Teradata extensions, such as WITH and WITH BY, that Queryman has problems handling.

BTEQ is most often used to submit SQL, but it is also an excellent tool for importing and exporting data.

Importing Data: Data can be read from a file on either a mainframe or a LAN-attached computer and used for substitution directly into any Teradata SQL using the INSERT, UPDATE or DELETE statements.

Exporting Data: Data can be written to either a mainframe or a LAN-attached computer using a SELECT from Teradata. You can also pick the format you desire, ranging from data files to printed reports to Excel formats.

There are other utilities that are faster than BTEQ for importing or exporting data. We will talk about these in future chapters, but BTEQ is still used for smaller jobs.
Logging on to BTEQ
Before you can use BTEQ, you must have user access rights to the client system and privileges to the Teradata DBS. Normal system access privileges include a userid and a password. Some systems may also require additional user identification codes depending on company standards and operational procedures. Depending on the configuration of your Teradata DBS, you may need to include an account identifier (acctid) and/or a Teradata Director Program Identifier (TDPID).
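As a quick, hedged illustration (the TDPID "cdw" and user "sql00" are this chapter's example identifiers; substitute your own), an interactive session on a network-attached client might look like this:

bteq
.LOGON cdw/sql00
Password: xxxxxxxx
SELECT DATE;
.QUIT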
Figure 2-2
Figure 2-3 Notice that the BTEQ command is immediately followed by '<BatchScript.txt' to tell BTEQ which file contains the commands to execute. Then, '>Output.txt' names the file where the output messages are written. Here is an example of the contents of the BatchScript.txt file.
BatchScript.txt File
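The listing itself is not reproduced in this text; a minimal, hedged sketch of what such a batch script might contain (the logon string is the chapter's example, the query is assumed) is:

.LOGON cdw/sql00,whynot
SELECT * FROM SQL01.Employee_Table;
.LOGOFF
.QUIT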
Figure 2-4 The above illustration shows how BTEQ can be manually invoked from a command prompt and displays how to specify the name and location of the batch script file to be executed. The previous examples show that when logging onto BTEQ in interactive mode, the user types in a logon string and Teradata then prompts for a password. In batch mode, however, both the logon and the password must be stored directly as part of the script. Because placing this sensitive information in a script that is processed in batch mode is a security concern, it is generally recommended, and common practice, to store the logon and password in a separate file that can be secured. That way, it is not in the script for anyone to see. For example, the contents of a file called "mylogon.txt" might be: .LOGON cdw/sql00,whynot. Then, the script should contain the following command instead of a .LOGON, as shown below and again in the following script: .RUN FILE=mylogon.txt This command opens and reads the file, then executes every record in the file.
Figure 2-5
Since both DATA and INDICDATA store each column on disk in native format with known lengths and characteristics, they are the fastest methods of transferring data. However, it is imperative that you be consistent: data exported as DATA must be imported as DATA, and the same is true for INDICDATA. This internal processing is automatic and potentially important. On a network-attached system, being consistent is our only responsibility. On a mainframe system, however, you must also account for the indicator bits when defining the LRECL in the Job Control Language (JCL). Otherwise, your record length is too short and the job will end with an error.

To determine the correct length, remember that there is one indicator bit per field selected, but computers allocate data in bytes, not bits. So even if only one bit is needed, a minimum of eight bits (one byte) is allocated. INDICDATA mode therefore adds one byte to the record length for every eight columns, or portion thereof, referenced in the SELECT: one to eight columns add one byte, nine to sixteen columns add two bytes, and so on. For nine columns selected, two bytes are added even though only nine bits are needed. When executing on non-mainframe systems, the record length is maintained automatically. However, when exporting to a mainframe, the JCL (LRECL) must account for this additional length.

DIF Mode: Known as Data Interchange Format, this mode allows users to export data from Teradata to be directly utilized in spreadsheet applications like Excel, FoxPro and Lotus.

The optional LIMIT tells BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test environment to stop BTEQ before the end of transferring rows to the file.
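To make the arithmetic concrete with an invented example: suppose a SELECT returns 20 columns whose DATA-mode lengths total 100 bytes. INDICDATA needs 20 indicator bits, which round up to 3 bytes, so on a mainframe the LRECL must be defined as 103 rather than 100. The mode and the optional row limit are both chosen on the .EXPORT command; a sketch (file name assumed, and the exact LIMIT syntax may vary by BTEQ release) might be:

.EXPORT INDICDATA FILE=employee.dat, LIMIT=1000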
The following example uses a Record (DATA) Mode format. The output of the exported data will be a flat file. Employee_Table
Figure 2-6
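The figure itself is not reproduced here; a minimal, hedged sketch of such a Record (DATA) Mode export script, with file and table names assumed, might be:

.RUN FILE=mylogon.txt
.EXPORT DATA FILE=employee.dat
SELECT Employee_No, Last_name, First_name, Salary, Dept_No
FROM   SQL01.Employee_Table;
.EXPORT RESET
.LOGOFF
.QUIT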
Figure 2-7 After this script has completed, the following report will be generated on disk.

Employee_No  Last_name    First_name  Salary    Dept_No
2000000      Jones        Squiggy     32800.50  ?
1256349      Harrison     Herbert     54500.00  400
1333454      Smith        John        48000.00  200
1121334      Strickling   Cletus      54500.00  400
1324657      Coffing      Billy       41888.88  200
2341218      Reilly       William     36000.00  400
1232578      Chambers     Mandee      56177.50  100
1000234      Smythe       Richard     64300.00  10
2312225      Larkins      Loraine     40200.00  300
I remember when my mom and dad purchased my first Lego set. I was so excited about building my first space station that I ripped the box open and proceeded to follow the instructions to complete the station. However, when I was done, I was not satisfied with the design and decided to make changes. So I built another space ship and constructed another launching station. BTEQ export works in the same manner: as the basic EXPORT knowledge is acquired, we can build on that foundation. With that said, the following is a more robust example utilizing the Field (Report) option. This example will export data in Field (Report) Mode format. The output of the exported data will appear like the standard output of a SQL SELECT statement. In addition, aliases and a title have been added to the script.
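As a hedged sketch of such a script (the titles, width, and file name are assumptions, not the original figure), a Field (Report) Mode export might look like:

.RUN FILE=mylogon.txt
.SET WIDTH 120
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE=employee_report.txt
SELECT Employee_No (TITLE 'Employee Number')
      ,Last_name   (TITLE 'Last Name')
      ,First_name  (TITLE 'First Name')
      ,Salary
      ,Dept_No     (TITLE 'Department Number')
FROM   SQL01.Employee_Table;
.EXPORT RESET
.LOGOFF
.QUIT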
Figure 2-8 After this script has completed, the following report will be generated on disk.

Employee Profiles
Employee Number   Last Name   First Name   Salary   Department Number
From the above example, a number of BTEQ commands were added to the export script. Below is a review of those commands. The WIDTH command specifies the width of screen displays and printed reports, based on characters per line. The FORMAT command enables or inhibits the page-oriented format option. The HEADING command specifies a header that will appear at the top of every page of a report.
Figure 2-9 From the above example, a number of BTEQ commands were added to the import script. Below is a review of those commands. .QUIET ON limits BTEQ output to reporting only errors and request processing statistics. Note: Be careful how you spell .QUIET; if you forget the E it becomes .QUIT, and BTEQ will do exactly that. .REPEAT * causes BTEQ to read records until EOF; alternatively, .REPEAT n reads a specified number of records. The default is one record, so .REPEAT 10 would perform the loop 10 times. The USING clause defines the input data fields and their associated data types coming from the host. The following builds upon the IMPORT Record (DATA) example above. The example below will still utilize the Record (DATA) Mode format. However, this script will add a CREATE TABLE statement. In addition, the imported data will populate the newly created Employee_Profile table.
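A hedged sketch of such an import script follows; the data types in the USING clause must match however the DATA file was originally exported, and the names here are assumptions:

.RUN FILE=mylogon.txt
.IMPORT DATA FILE=employee.dat
.QUIET ON
.REPEAT *
USING (Emp_No   INTEGER
      ,Lname    CHAR(20)
      ,Fname    VARCHAR(12)
      ,Sal      DECIMAL(8,2)
      ,Dept     SMALLINT)
INSERT INTO SQL01.Employee_Profile (Employee_No, Last_name, First_name, Salary, Dept_No)
VALUES (:Emp_No, :Lname, :Fname, :Sal, :Dept);
.QUIT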
Figure 2-10 Notice that some of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when BTEQ quits, the session is terminated. A logoff makes it a friendly departure and also allows you to logon with a different user name and password.
Variable columns: Variable-length columns should be calculated as the maximum length plus two. These two bytes hold the binary length of the field. In reality you can save much space because trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a length field per column. For example:

VARCHAR(8)   10 bytes
VARCHAR(10)  12 bytes
Indicator columns: As explained earlier, the indicators utilize a single bit for each field. If your record has 8 fields (which require 8 bits), then you add one extra byte to the total length of all the fields. If your record has 9-16 fields, then add two bytes.
BTEQ Commands
The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside the tables. However, these 60 different BTEQ commands are utilized in four areas:

Session Control Commands
File Control Commands
Sequence Control Commands
Format Control Commands
Figure 2-11
Figure 2-12
Figure 2-13
Figure 2-14
Chapter 3: FastExport
An Introduction to FastExport
Why it is called "FAST" Export
FastExport is known for its lightning speed when it comes to exporting vast amounts of data from Teradata and transferring the data into flat files on either a mainframe or network-attached computer. In addition, FastExport has the ability to accept OUTMOD routines, which provide the user the capability to write, select, validate, and preprocess the exported data. Part of this speed is achieved because FastExport takes full advantage of Teradata's parallelism. In this book, we have already discovered how BTEQ can be utilized to export data from Teradata in a variety of formats. But as the demand to store data increases, so does the requirement for tools that can export massive amounts of data. This is the reason why FastExport (FEXP) is brilliant by design. A good rule of thumb is that if you have more than half a million rows of data to export to either a flat file format or with NULL indicators, then FastExport is the best choice to accomplish this task. Keep in mind that FastExport is designed as a one-way utility; that is, the sole purpose of FastExport is to move data out of Teradata. It does this by harnessing the parallelism that Teradata provides. FastExport is extremely attractive for exporting data because it takes full advantage of multiple sessions, which leverages Teradata parallelism. FastExport can also export from multiple tables during a single operation. In addition, FastExport utilizes the Support Environment, which provides a job restart capability from a checkpoint if an error occurs during the process of executing an export job.
FastExport Fundamentals
#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it takes data off of Teradata (exports data). FastExport does not import data into Teradata. Additionally, like BTEQ, it can output multiple files in a single run.

#2: FastExport only supports the SELECT statement. The only DML statement that FastExport understands is SELECT. You SELECT the data you want exported and FastExport will take care of the rest.

#3: Choose FastExport over BTEQ when exporting more than half a million rows. When a large amount of data is being exported, FastExport is recommended over BTEQ Export. The only drawback is the total number of FastLoads, FastExports, and MultiLoads that can run at the same time, which is limited to 15; BTEQ Export does not have this restriction. Of course, FastExport will work with less data, but the speed may not be much faster than BTEQ.

#4: FastExport supports multiple SELECT statements and multiple tables in a single run. You can have multiple SELECT statements with FastExport and each SELECT can join information from up to 64 tables.

#5: FastExport supports conditional logic, conditional expressions, arithmetic calculations, and data conversions. FastExport is flexible and supports the above conditions, calculations, and conversions.

#6: FastExport does NOT support error files or error limits. FastExport does not record particular error types in a table. The FastExport utility will terminate after a certain number of errors have been encountered.

#7: FastExport supports user-written routines, INMODs and OUTMODs. FastExport allows you to write INMOD and OUTMOD routines so you can select, validate and preprocess the exported data.
Maximum of 15 Loads
The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This maximum value is determined and configured by the DBS Control record. This value can be set from 0 to 15. When Teradata is initially installed, this value is set at 5. The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case, Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are attempting to connect. For example, if the maximum number of utilities on the Teradata system is reached and another job attempts to run, that job does not start. This limitation should be viewed as a safety control feature. A tip for remembering how the load limit applies is this: "If the name of the load utility contains either the word 'Fast' or the word 'Load', then there can be only a total of fifteen of them running at any one time." BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data. However, if too many load jobs are running, BTEQ is an alternative choice for exporting data.
Figure 3-1
Task Commands
Figure 3-2
SQL Commands
Figure 3-3
Figure 3-4
Figure 3-5
FastExport Formats
FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for each record being exported, which can be one of:

FASTLOAD
BINARY
TEXT
UNFORMAT

The default FORMAT is FASTLOAD in a UNIX or LAN environment.

FASTLOAD format is a two-byte integer, followed by the data, followed by an end-of-record marker. It is called FASTLOAD because the data is exported in a format ready for FastLoad. BINARY format is a two-byte integer, followed by data. TEXT is an arbitrary number of bytes followed by an end-of-record marker. UNFORMAT is exported as it is received from CLIv2 without any client modifications.
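To put the pieces together, here is a minimal, hedged sketch of a LAN-based FastExport job; the log table, logon string, file name, and table are assumed names, not the original figures:

.LOGTABLE SQL01.Export_Log;
.LOGON cdw/sql01,password;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE employee.dat MODE RECORD FORMAT FASTLOAD;
SELECT Employee_No, Last_name, First_name, Salary, Dept_No
FROM   SQL01.Employee_Table;
.END EXPORT;
.LOGOFF;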
Figure 3-6
Chapter 4: FastLoad
An Introduction to FastLoad
Why it is called "FAST" Load
FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a host into empty tables in Teradata. Part of this speed is achieved because it does not use the Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the reasons that it is fast, know that FastLoad was developed to load millions of rows into a table. The way FastLoad works can be illustrated by home construction, of all things! Let's look at three scenarios from the construction industry to provide an amazing picture of how the data gets loaded. Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot. The fewer barriers there are to deal with, the quicker the new construction can progress. Building custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start with an empty table, like an empty lot, and then populate it with rows of data from another source. Because the target table is empty, this method is typically the fastest way to load data. FastLoad will never attempt to insert rows into a table that already holds data. Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of land on which to build a home, but the lot already has a house on it. In this case, the person may determine that it is quicker and more advantageous just to demolish the old house and start fresh from the ground up-allowing for brand new construction. FastLoad also likes this approach to loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure, and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new rows, this process will run much quicker than using MultiLoad to populate the existing table. Another option is to DELETE all the data rows from a populated target table and reload it. This requires less updating of the Data Dictionary than dropping and recreating a table. In either case, the result is a perfectly empty target table that FastLoad requires! Scenario Three: Sometimes, a customer has a good house already but wants to remodel a portion of it or to add an additional room. This kind of work takes more time than the work described in Scenario One. Such work requires some tearing out of existing construction in order to build the new section. Besides, the builder never knows what he will encounter beneath the surface of the existing home. So you can easily see that remodeling or additions can take more time than new construction. In the same way, existing tables with data may need to be updated by adding new rows of data. To load populated tables quickly with large amounts of data while maintaining the data currently held in those tables, you would choose MultiLoad instead of FastLoad. MultiLoad is designed for this task but, like renovating or adding onto an existing house, it may take more time.
Rule #1: No Secondary Indexes are allowed on the Target Table. To preserve high performance, FastLoad utilizes only the Primary Index when loading. The reason for this is that Primary (UPI and NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build only data rows. A secondary index is stored in a subtable block, often on a different AMP from the data row. This would slow FastLoad down and they would have to call it, get ready now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just drop them. You may easily recreate them after completing the load.

Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential Integrity (RI). This would require too much system checking to verify referential constraints against a different table, and FastLoad only does one table. In short, RI constraints will need to be dropped from the target table prior to the use of FastLoad.

Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Additionally, these require more than one AMP and more than one table; FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.

Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that allow duplicate rows; that is, rows in which the values in every column are identical. When FastLoad finds duplicate rows, they are discarded. While FastLoad can load data into a multi-set table, it will not load duplicate rows into one, because FastLoad discards duplicate rows!

Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP must be repaired before the load process can be restarted. Other than this, FastLoad can recover from system glitches and perform restarts. We will discuss Restarts later in this chapter.

Rule #6: No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data type conversion is a highly resource-intensive job on the system, which requires a "search and replace" effort. And that takes more time. Enough said!
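As a quick, hedged illustration of the preparation Rules #1 and #3 imply (the index column, table, and trigger names here are placeholders), the cleanup before and after a FastLoad might look like:

DROP INDEX (Dept_No) ON SQL01.Employee_Profile;     /* drop a secondary index before the load */
ALTER TRIGGER Employee_Profile_Trig DISABLED;       /* disable triggers on the target table   */
/* ... run the FastLoad job here ... */
CREATE INDEX (Dept_No) ON SQL01.Employee_Profile;   /* recreate the secondary index afterwards */
ALTER TRIGGER Employee_Profile_Trig ENABLED;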
Figure 4-1 Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be populated should errors occur during the load process. These are required by the FastLoad utility, which will automatically create them for you; all you must do is to name them. The first error table is for any translation errors or constraint violations. For example, a row with a column containing a wrong data type would be reported to the first error table. The second error table is for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence for every UPI. The other occurrences will be stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."
Maximum of 15 Loads
The Teradata RDBMS will only run a maximum of fifteen FastLoads, MultiLoads, or FastExports at the same time. This maximum is determined by a value stored in the DBS Control record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5 concurrent jobs. Since these utilities all transfer large blocks of rows, they can hit a saturation point, so Teradata protects the system resources available by queuing up the extra load jobs. For example, if the maximum number of jobs is currently running on the system and you attempt to run one more, that job will not be started. You should view this limit as a safety control. Here is a tip for remembering how the load limit applies: If the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time.
PHASE 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following: When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. This capability is shown below. Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows arrive on different AMPs than those on which they would live had they been hashed. So how do the rows get to the correct AMPs where they will permanently reside? Following the receipt of every data block, each AMP hashes its rows based on the Primary Index and redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP but remain unsorted until Phase 1 is complete.

Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping industry today. How do the key players in this industry handle a parcel? When the shipping company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way that the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP. This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards them to their true destination. This is like the shipping parcel being sent from a hub city to its destination city!
PHASE 2: Application
Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must then be sorted by street and zip code, placed onto local trucks and be driven to their final, local destinations. Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address (i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside. Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!
FastLoad Commands
Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in control of the load process. Consider this your personal ready-reference guide! You will notice that there are only a few SQL commands that may be used with this utility (CREATE TABLE, DROP TABLE, DELETE and INSERT). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.
Figure 4-2
Figure 4-4 Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ; FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? Because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed to FastLoad. We have used VARTEXT in our example with a comma delimiter. The other options are FASTLOAD, TEXT, UNFORMATTED or VARTEXT. You need to know this about your input file ahead of time.

Step Four: Next comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file, for the load.

Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first and Phase 2 uses "Emp_Err2". The names are arbitrary, of course; you may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps insure this uniqueness between multiple FastLoad jobs occurring in the same database.

In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three 'R's' in grade school ("reading, riting, and rithmatic"). There are two very different, yet equally important, R's to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command. In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, this would normally prevent loading rows as they arrive from different time zones. However, to accomplish this processing, simply omit the END LOADING from the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster! Of course, to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That's a pretty clever way to do a partitioned type of data load.

Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and, if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, you are responsible to check it, take the appropriate action and drop the table manually.
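The script in the figure is not reproduced here; as a hedged sketch of the same flow (the table, file, field lengths, and error-table names are assumptions), a comma-delimited FastLoad might look like this:

SESSIONS 4;
LOGON cdw/sql01,password;
SHOW VERSIONS;
SET RECORD VARTEXT ",";
DEFINE Employee_No  (VARCHAR(11))
      ,Last_Name    (VARCHAR(20))
      ,First_Name   (VARCHAR(14))
      ,Salary       (VARCHAR(10))
      ,Dept_No      (VARCHAR(6))
FILE=employee.txt;
BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
      CHECKPOINT 100000;
INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No, :Last_Name, :First_Name, :Salary, :Dept_No);
END LOADING;
LOGOFF;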
Figure 4-5 When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!
Figure 4-5
Figure 4-7
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job halfway through. Maybe the mainframe went down. Well, it is not really a big deal, because FastLoad is so lightning-fast that you could probably just RERUN the job for small data loads. However, when you are loading a billion rows, this is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you might want to make sure that the job is totally restartable. Let's see how this is done.
Figure 4-8 First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything. Next, if needed, you create the empty table structure needed to receive a FastLoad.
The locks will not be removed and the error tables will not be dropped without a successful completion. This is because FastLoad assumes that it will need them for its restart. At the same time, the lock on the target table will not be released either. When running FastLoad, you realistically have two choices once it is started: either get it to run to a successful completion, or rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart.
Figure 4-9 The first line displays the total number of records read from the input file. Were all of them loaded? Not really. The second line tells us that there were fifty rows with constraint violations, so they were not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that there were zero entries in the second error table, indicating that there were no duplicate Unique Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the duplicates would only have been counted; they are not stored in the error tables anywhere. When FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the number of records read in line 1. Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows that are counted. This is due to the fact that an error seldom occurs exactly on a checkpoint (a quiet or quiescent point when nothing is happening within FastLoad). Therefore, some number of rows will be sent to the AMPs again, because the restart begins on the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table: it assumes they are duplicates because of this logic.
At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains a list of all currently running FastLoad jobs and the last successfully reached checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last successfully reported checkpoint prior to the error. It will then restart from the record immediately following that checkpoint and start building the next block of data to load. If such an error occurs in Phase 1 with CHECKPOINT 0, FastLoad will always restart from the very first row.
Chapter 5: MultiLoad
An Introduction to MultiLoad
Why it is called "Multi" Load
If we were going to be stranded on an island with a Teradata Data Warehouse and we could only take along one Teradata load utility, clearly, MultiLoad would be our choice. MultiLoad has the capability to load multiple tables at one time from either a LAN or Channel environment. This is in stark contrast to its fleet-footed cousin, FastLoad, which can only load one table at a time. And it gets better, yet! This feature-rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT, on up to five (5) empty or populated target tables at a time. These DML functions may be run either solo or in combinations, against one or more tables. For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block increases, the performance of MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block. In other words, MultiLoad looks at massive amounts of data and says, "Bring it on!" Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata load utilities resemble each other, although they may have some differences. You are going to be pleased to find that you do not have to learn all new commands and concepts for each load utility. MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump. The similarities will be evident as you work with them. Where there are some quirky differences, we will point them out for you.
In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that January data must be deleted to make room for May's data. Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be to INSERT the new rows for May using MultiLoad IMPORT.
You can refer to these tables with a qualified name (<databasename>.<tablename>) in the script or use the DATABASE command to change the current database. Where will you find these tables in the load script? The Logtable is generally identified immediately prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the operation of MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the "executive summary", let's look at each type of table individually.

Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the use of two error tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur while the data is being acquired from the source(s). The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you might see a UPI error that shows a second employee number "99." In this case, if the name for employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed, because employee numbers must be unique.

Each error table does the following:
Identifies errors
Provides some detail about the errors
Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them, but it is recommended that you standardize on a naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad Errors."

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only have one worktable since that mode only works on one target table. The purpose of worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once. So, this is very important.
However, there is no such thing as a free lunch: the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.
Figure 5-1
Figure 5-2 The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent a user from obtaining an exclusive lock. Although these locks still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.
Figure 5-3 Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from Phase 3 and each block of the base table is read only once, to reduce I/O operations and gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are applied to the corresponding data using the DML (SQL) identified by the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed. What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table.
MultiLoad Commands
Two Types of Commands
You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad tasks are commands that are used by the MultiLoad utility for specific individual steps as it processes a load. Support functions are those commands that involve the Teradata utility Support Environment (covered in Chapter 9), are used to set parameters, or are helpful for monitoring a load. The chart below lists the key commands, their type, and what they do.
Figure 5-4
Figure 5-5
Step Two: Identifying the Target, Work and Error tables - In this step of the script you must tell Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then you preface the names of these tables with the sub-commands TABLES, WORKTABLES and ERRORTABLES. All you must do is name the tables and specify which database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to name and locate these tables. If you do not do this, Teradata might supply some defaults of its own! At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named, the script would still execute and build these tables. They would have been built in the default database for the user. The name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively. Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a DATABASE command to point all table creations to it or qualify the names of these tables individually.

Step Three: Defining the INPUT flat file record structure - MultiLoad is going to need to know the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the starting position of the next field in the record, based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER (like above) to position the cursor to the next field, or the "*" on the Dept_No field could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE command can be used to automatically define all the .FIELDs for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, that is specified in the LAYOUT.

Step Four: Defining the DML activities to occur - The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The VALUES come from the data in each FIELD because each is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: every label must be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and its format type - This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL called INSERTS - that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT MLOAD command.
If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file. Notice that the .IMPORT goes on for 4 lines of information. This is possible because the command continues until it finds the semicolon that defines its end. This is how MultiLoad distinguishes one operation from another, so the semicolon is very important; without it, MultiLoad would have attempted to process the .END MLOAD as part of the .IMPORT, and the script wouldn't work.

Step Six: Finishing loading and logging off of Teradata - This is the closing ceremony for the load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script above in Figure 5-6 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on the back.
Figure 5-6
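The script itself appears in the figure; as a hedged sketch of the same structure (field lengths and names are assumptions where the figure is not reproduced), it looks roughly like this:

.LOGTABLE SQL01.CDW_Log;
.LOGON cdw/sql01,password;
.BEGIN IMPORT MLOAD
    TABLES SQL01.Employee_Dept
    WORKTABLES SQL01.WT_Employee_Dept
    ERRORTABLES SQL01.ET_Employee_Dept SQL01.UV_Employee_Dept;
.LAYOUT FILEIN;
.FIELD Employee_No  * CHAR(11);
.FIELD Last_Name    * CHAR(20);
.FILLER Skip_Col    * CHAR(100);
.FIELD Dept_No      * CHAR(6);
.DML LABEL INSERTS;
INSERT INTO SQL01.Employee_Dept (Employee_No, Last_Name, Dept_No)
VALUES (:Employee_No, :Last_Name, :Dept_No);
.IMPORT INFILE CDW_Join_Export.txt
    FORMAT TEXT
    LAYOUT FILEIN
    APPLY INSERTS;
.END MLOAD;
.LOGOFF;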
Figure 5-7
Figure 5-8
Figure 5-9 In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, you gain three benefits:
1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.

When doing an UPSERT, there are two rules to remember:
The default is IGNORE MISSING UPDATE ROWS, even though MARK is the default for all other operations. When doing an UPSERT, you anticipate that some rows are missing (otherwise, why do an UPSERT?), so this keeps those rows out of your error table.
The DO INSERT FOR MISSING UPDATE ROWS clause is mandatory. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn't find it. (A hedged sketch of an UPSERT label appears after Figure 5-10.)
The table that follows shows you, in more detail, how flexible your options are:
Figure 5-10
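As a hedged illustration of these options (the table and column names are assumptions), an UPSERT label in a MultiLoad script might be coded as:

.DML LABEL UPSERTER
    DO INSERT FOR MISSING UPDATE ROWS;
UPDATE SQL01.Employee_Dept
   SET Last_Name = :Last_Name
 WHERE Employee_No = :Employee_No;
INSERT INTO SQL01.Employee_Dept (Employee_No, Last_Name, Dept_No)
VALUES (:Employee_No, :Last_Name, :Dept_No);

The UPDATE is attempted first; the INSERT is applied only when the UPDATE finds no matching row.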
Figure 5-12
Figure 5-13
Figure 5-14 How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are a few that we saw:
At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command. You need not specify it in the .END MLOAD command.
You will readily notice that this mode has no .DML LABEL command. Since it is focused on just one absolute function, no APPLY clause is required, so you see no .DML LABEL.
Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad command, so it has no dot prefix.
Since default names are available for worktables (WT_<target_tablename>) and error tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically named, but be sure to define the Logtable.
Do not confuse the DELETE MLOAD task with the SQL DELETE that may be part of a MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for any reason; instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is never based upon the Primary Index. Because we are not importing any data rows, there is no need for worktables or an Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be applied to every table row. If the condition is met, then the row is deleted. Using the match tags, each target block is read once and the appropriate rows are deleted.
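Here is a hedged sketch of what a MultiLoad DELETE task might look like; the table, logon, and date column are assumed names:

.LOGTABLE SQL01.Delete_Log;
.LOGON cdw/sql01,password;
.BEGIN DELETE MLOAD TABLES SQL01.TableA;
DELETE FROM SQL01.TableA
 WHERE Sale_Date < DATE '2002-02-01';
.END MLOAD;
.LOGOFF;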
Figure 5-15
Figure 5-16
Figure 5-17
Figure 5-18
Figure 5-19
Figure 5-20
RESTARTing MultiLoad
Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works: First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it left off. Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing! Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot. Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete. Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table. HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60 or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of records is assumed.
You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it is handy for a test environment, but please don't get too reliant on it for production runs. They should be allowed to finish to guarantee data integrity.
Figure 5-21
Chapter 6: TPump
An Introduction to TPump
The chemistry of relationships is very interesting. Frederick Buechner once stated, "My assumption is that the story of any one of us is in some measure the story of us all." In this chapter, you will find that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing abilities that the other load utilities do not have. Do you remember the first Swiss Army knife you ever owned? Aside from its original intent as a compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army knife of the Teradata load utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes. And, just as the Swiss Army knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have a large, busy system with few resources to spare. Let's look in more detail at the many facets of this amazing load tool.
More benefits: Just when you think you have pulled out all of the options on a Swiss Army knife, there always seems to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows both Unique and Non-Unique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, "No problem!"

Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment also aids in the execution of DML and DDL that occur in Teradata, outside of the load utility.

Stopping without repercussions: Finally, this utility can be stopped at any time and all of the locks may be dropped with no ill consequences. Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in today's data warehouse environment.
Figure 6-2
Figure 6-3
Finishing loading and logging off of Teradata
The following script assumes the existence of a Student_Names table in the SQL01 database. You may use preexisting target tables when running TPump, or TPump may create the tables for you. In most instances you will use existing tables. The CREATE TABLE statement for this table is listed for your convenience:
CREATE TABLE SQL01.Student_Names
  ( Student_ID    INTEGER
   ,Last_Name     CHAR(20)
   ,First_Name    VARCHAR(14))
UNIQUE PRIMARY INDEX ( Student_ID);
Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the Student_Names table is being loaded with new data from the university's registrar. It will be used as an associative table for linking various tables in the data warehouse.
Figure 6-4
Step One: Setting up a Logtable and Logging onto Teradata- First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It may be placed in any database by qualifying the table name with the name of the database, using syntax like this: <databasename>.<tablename>. Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in front of the command key word.
Step Two: Begin load process, add Parameters, naming the Error Table- Here, the script reveals the parameters requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let's look at each parameter:
ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for the load.
CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute. If the factor is between 1 and 60, it refers to minutes. If it is over 60, then it refers to the number of rows at which the checkpointing should occur.
SESSIONS 64 tells TPump to establish 64 sessions with Teradata.
TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours.
PACK 40 tells TPump to "pack" 40 data rows and load them at one time.
RATE 1000 means that 1,000 data rows will be sent per minute.
Step Three: Defining the INPUT flat file structure- TPump, like MultiLoad, needs to know the structure of the INPUT flat file record. You use the .LAYOUT command to name the layout. Following that, you list the columns and data types of the INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column name and its data type? It tells TPump to calculate the starting byte of this field automatically, based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use a .FILLER with the correct number of bytes declared as characters to position the cursor at the next field, or the "*" can be replaced by a number that equals the lengths of all previous fields added together plus 1 extra byte; when you use this technique, the .FILLER is not needed. In our example, this says to begin with Student_ID, continue on to load Last_Name, and finish when First_Name is loaded.
Step Four: Defining the DML activities to occur- At this point, the .DML LABEL names and defines the SQL that is to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our example, TPump is to INSERT a row into SQL01.Student_Names. The data values coming in from the record are named in the VALUES with a colon prior to the name. This provides the PE with information on what substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT command.
Step Five: Naming the INPUT file and defining its FORMAT- Using the .IMPORT INFILE command, we have identified the INPUT data file as "CDW_Export.txt". The file was created using the TEXT format.
Step Six: Associating the data with the description- Next, we told the IMPORT command to use the LAYOUT called "FILELAYOUT."
Step Seven: Telling TPump to start loading- We then told TPump to APPLY the DML LABEL called INSREC-that is, to INSERT the data rows into the target table.
Step Eight: Finishing loading and logging off of Teradata- The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the Teradata system.
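Since the figure itself is not reproduced in this text, the following condensed sketch pulls the steps above into one script; the field positions, lengths and the FORMAT clause are assumptions for illustration only:
.LOGTABLE WORK_DB.LOG_PUMP;
.LOGON CDW/SQL01,WHYNOT;
.BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 64 TENACITY 2
       PACK 40 RATE 1000 ERRORTABLE SQL01.ERR_PUMP;
.LAYOUT FILELAYOUT;
.FIELD Student_ID   1 CHAR(11);
.FIELD Last_Name    * CHAR(20);
.FIELD First_Name   * CHAR(14);
.DML LABEL INSREC;
INSERT INTO SQL01.Student_Names
VALUES (:Student_ID, :Last_Name, :First_Name);
.IMPORT INFILE CDW_Export.txt
        FORMAT TEXT
        LAYOUT FILELAYOUT
        APPLY INSREC;
.END LOAD;
.LOGOFF;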
Figure 6-5
Figure 6-6
Figure 6-7
Figure 6-8 The following is the output from the above UPSERT:
Figure 6-9
NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However, there might soon be another way to accomplish this task. NCR has built an UPSERT and we have tested the following statement, without success:
UPDATE SQL01.Student_Profile
   SET Last_Name  = :Last_Name
      ,First_Name = :First_Name
      ,Class_Code = :Class_Code
      ,Grade_Pt   = :Grade_Pt
 WHERE Student_ID = :Student_ID;
ELSE INSERT INTO SQL01.Student_Profile
VALUES (:Student_ID
       ,:Last_Name
       ,:First_Name
       ,:Class_Code
       ,:Grade_Pt);
We are not sure if this will be a future technique for coding a TPump UPSERT, or if it is handled internally. For now, use the original coding technique.
Monitoring TPump
TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as they run and to change (remember "throttle up" and "throttle down"?) the statement rate on the fly. Key to this monitor is the "SysAdmin.TpumpStatusTbl" table in the Data Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a minute-by-minute basis when it is running. You may update the table to change the statement rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed. You can start a monitor program under UNIX with the following command:
tpumpmon [-h] [TDPID/] <UserName>,<Password>[,<AccountID>]
Below is a chart that shows the Views and Macros used to access the "SysAdmin.TpumpStatusTbl" table. Queries may be written against the Views. The macros may be executed.
Figure 6-9
Figure 6-10
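For example, using the demonstration logon from the scripts in this book, a UNIX monitoring session could be started like this (the logon values are illustrative):
tpumpmon cdw/sql01,whynot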
RESTARTing TPump
Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are not dropped. As mentioned earlier, you have a choice of setting ROBUST either ON (the default) or OFF. ROBUST ON carries more overhead and somewhat lower performance, but it provides a higher degree of data integrity on restart.
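As a sketch, the choice is simply another option on the .BEGIN LOAD statement; everything here other than the ROBUST keyword is illustrative:
.BEGIN LOAD SESSIONS 4
       ERRLIMIT 5
       ROBUST OFF;     /* omit this, or code ROBUST ON, to keep the default restart protection */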
Figure 6-11
The following diagram illustrates the logic flow when using an INMOD with the utility:
As seen in the above diagrams, there is an extra step involved with the processing of an INMOD. On the other hand, it can eliminate the need to create an intermediate file by literally using another RDBMS as its data source. However, the user still scripts and executes the utility just as when using a file; that portion does not change. The following chart shows the appropriate languages that an INMOD may be written in for mainframe and network-attached systems:
Figure 7-2
Writing an INMOD
The writing of an INMOD is primarily concerned with processing an input data source. However, it cannot do the processing haphazardly. It must wait for the utility to tell it what and when to perform every operation. It has been previously stated that the INMOD returns data to the utility. At the same time, the utility needs to know that it is expecting to receive the data. Therefore, a high degree of handshake processing is necessary for the two components (INMOD and utility) to know what is expected. As well as passing the data, a status code is sent back and forth between the utility and the INMOD. As with all processing, we hope for a successful completion. Earlier in this book, it was shown that a zero status code indicates a successful completion. That same situation is true for communications between the utility and the INMOD. Therefore, a memory area must be allocated that is shared between the INMOD and the utility. The area contains the following elements:
1. The return or status code
2. The length of the data that follows
3. The data area
Figure 7-5
Entry point for FastLoad used in the DEFINE:
Figure 7-6
NCR Corporation provides two examples for writing a FastLoad INMOD. The first is called BLKEXIT.C, which does not contain the checkpoint and restart logic; the other is BLKEXITR.C, which contains both checkpoint and restart logic.
Figure 7-7 Second Parameter definition for INMOD to MultiLoad, TPump and FastExport
Figure 7-8 Return/status codes for MultiLoad, TPump and FastExport to the INMOD
Figure 7-9
The following diagram shows how to use the return codes of 6 and 7:
Return/status codes for the INMOD to MultiLoad, TPump and FastExport:
Figure 7-10
Entry point for MultiLoad, TPump and FastExport:
Figure 7-11
Migrating an INMOD
As seen in figures 7-4 and 7-9, many of the return codes are the same. However, it should also be noted that FastLoad must remember the record count in case a restart is needed, whereas the other utilities send the record count to the INMOD. If the INMOD fails to accept the record count when it is sent, the job will abort or hang and never finish successfully. This means that if a FastLoad INMOD is used in one of the other utilities, it will work only as long as the utility never requests that a checkpoint take place. Remember that unlike FastLoad, the newer utilities default to a checkpoint every 15 minutes. The only way to turn checkpointing off is to set the CHECKPOINT option of the .BEGIN to a number that is higher than the number of records being processed, as sketched below. Therefore, it is not the best practice to simply use a FastLoad INMOD as if it is interchangeable. It is better to modify the INMOD logic for the restart and checkpoint processing necessary to receive the record count and use it for the repositioning operation.
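A sketch of that workaround follows; the table name is illustrative and the CHECKPOINT value is simply chosen to be larger than the number of records in the input file:
.BEGIN MLOAD TABLES SQL01.Student_Profile
       CHECKPOINT 999999999 ;    /* larger than the input record count, so no checkpoint is ever requested */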
Sample INMOD
Below is an example of the PROCEDURE DIVISION commands that might be used for MultiLoad, TPump or FastExport.
PROCEDURE DIVISION USING PARM-1, PARM-2.
BEGIN.
MAIN.
    {specific user processing goes here, followed by:}
    IF RETCODE = 0 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 0 - INITIALIZE & READ"
        PERFORM 100-OPEN-FILES
        PERFORM 200-READ-INPUT
        GOBACK
    ELSE IF RETCODE = 1 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 1 - READ"
        PERFORM 200-READ-INPUT
        GOBACK
    ELSE IF RETCODE = 2 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 2 - RESTART"
        PERFORM 900-GET-REC-COUNT
        PERFORM 950-FAST-FORWARD-INPUT
        GOBACK
    ELSE IF RETCODE = 3 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 3 - CHECKPOINT"
        PERFORM 600-SAVE-REC-COUNT
        GOBACK
    ELSE IF RETCODE = 5 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 5 - DONE"
        MOVE 0 TO RETLENGTH
        MOVE 0 TO RETCODE
        GOBACK
    ELSE
        DISPLAY "INMOD RECEIVED - INVALID RETURN CODE"
        MOVE 0 TO RETLENGTH
        MOVE 16 TO RETCODE
        GOBACK.
100-OPEN-FILES.
    OPEN INPUT DATA-FILE.
    MOVE 0 TO RETCODE.
200-READ-INPUT.
    READ DATA-FILE INTO DATA-AREA1
        AT END GO TO END-DATA.
    ADD 1 TO NUMIN.
    MOVE 80 TO RETLENGTH.
    MOVE 0 TO RETCODE.
    ADD 1 TO NUMOUT.
END-DATA.
    CLOSE DATA-FILE.
    DISPLAY "NUMBER OF INPUT RECORDS = " NUMIN.
    DISPLAY "NUMBER OF OUTPUT RECORDS = " NUMOUT.
    MOVE 0 TO RETLENGTH.
    MOVE 0 TO RETCODE.
    GOBACK.
As seen in the above diagram, there is an extra step involved with the processing of an OUTMOD. On the other hand, it eliminates the need to create an intermediate file; the data destination can even be another RDBMS. However, the user still executes the utility; that portion does not change. The following chart shows the available languages for mainframe and network-attached systems:
Figure 8-1
Normally the utility script contains the name of the file or JCL statement (DDNAME). When using an OUTMOD, the FILE designation is no longer specified. Instead, the name of the program to call is defined in the script. The following chart indicates the appropriate statement to define the OUTMOD:
Figure 8-2
Writing an OUTMOD
The writing of an OUTMOD is primarily concerned with processing the output data destination. However, it cannot do the processing haphazardly. It must wait for the utility to tell it what and when to perform every operation. It has been previously stated that the OUTMOD receives data from the utility. At the same time, the utility needs to know that the OUTMOD is expecting to receive the data. Therefore, a high degree of handshake processing is necessary for the two components (OUTMOD and FastExport) to know what is expected. As well as passing the data, a status code is sent back and forth between them. Just like all processing, we hope for a successful completion. Earlier in this book, it was shown that a zero status code indicates a successful completion. A memory area must be allocated that is shared between the OUTMOD and the utility. The area contains the following elements:
1. The return or status code
2. The sequence number of the SELECT within FastExport
3. The length of the data area in bytes
4. The response row from Teradata
5. The length of the output data record
6. The output data record
Chart of the various programming language definitions for the parameters:
Figure 8-6
Sample OUTMOD
Below is an example of the LINKAGE SECTION and PROCEDURE DIVISION commands that might be used for a FastExport OUTMOD.
LINKAGE SECTION.
01 OUTCODE      PIC S9(5) COMP.
01 OUTSEQNUM    PIC S9(5) COMP.
01 OUTRECLEN    PIC S9(5) COMP.
01 OUTRECORD.
   05 INDICATOR PIC 9.
   05 REGN      PIC XXX.
   05 PRODUCT   PIC X(8).
   05 QTY       PIC S9(8) COMP.
   05 PRICE     PIC S9(8) COMP.
01 OUTDATALEN   PIC S9(5) COMP.
01 OUTDATA      PIC XXXX.
PROCEDURE DIVISION USING OUTCODE, OUTSEQNUM, OUTRECLEN, OUTRECORD,
                         OUTDATALEN, OUTDATA.
BEGIN.
MAIN.
    IF OUTCODE = 1 THEN
        OPEN OUTPUT SALES-DROPPED-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 2 THEN
        CLOSE SALES-DROPPED-FILE
        CLOSE BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 3 THEN
        PERFORM TYPE-3
        GOBACK.
    IF OUTCODE = 4 THEN
        GOBACK.
    IF OUTCODE = 5 THEN
        CLOSE SALES-DROPPED-FILE
        OPEN OUTPUT SALES-DROPPED-FILE
        CLOSE BAD-REGN-SALES-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 6 THEN
        OPEN OUTPUT SALES-DROPPED-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
TYPE-3.
    IF QTY IN OUTRECORD * PRICE IN OUTRECORD < 100 THEN
        MOVE 0 TO OUTDATALEN
        WRITE DROPPED-TRANLOG FROM OUTRECORD
    ELSE
        PERFORM TEST-NULL-REGN.
TEST-NULL-REGN.
    IF REGN IN OUTRECORD = SPACES
        MOVE 999 TO REGN IN OUTRECORD
        WRITE BAD-REGN-OUTRECORD FROM OUTRECORD.
Figure 9-1 The SE allows the writer of the script to perform housekeeping chores prior to calling the desired utility with a .BEGIN. At a minimum, these chores include the specification of the restart log table and logging onto Teradata. Yet, it brings to the party the ability to perform any Data Definition Language (DDL) and Data Control Language (DCL) command available to the user as defined in the Data Dictionary. In addition, all Data Manipulation Language (DML) commands except a SELECT are allowed within the SE.
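As a sketch of that housekeeping, the statements ahead of the .BEGIN are ordinary SQL handled by the SE; the names below follow the style of the sample scripts later in this book and are illustrative:
.LOGTABLE WORK_DB.UTIL_RESTART_LOG;
.RUN FILE mylogon.txt;
DROP TABLE SQL01.ERR_PUMP;           /* housekeeping DDL executed by the SE     */
DELETE FROM SQL01.Student_Names;     /* any DML except SELECT is also permitted */
.BEGIN LOAD SESSIONS 4;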
Beginning a Utility
Once the script has connected to Teradata and established all needed environmental conditions, it is time to run the desired utility. This is accomplished using the .BEGIN command. Beyond running the utility, it is used to define most of the options used within the execution of the utility. As an example, setting the number of sessions is requested here. See each of the individual utilities for the names, usage and any recommendations for the options specific to it. The syntax for writing a .BEGIN command:
.BEGIN <utility-task> [<utility-options>];
The utility task is defined as one of the following:
Figure 9-2
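For instance, a FastExport task could be started with a modest session limit like this (the value is illustrative):
.BEGIN EXPORT SESSIONS 4;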
Ending a Utility
Once the utility finishes its task, it needs to be ended. To request the termination, use the .END command. The syntax for writing a .END command:
.END <utility-task>;
When the utility ends, control is returned to the SE. It can then check the return code status (see Figure 9-4) and verify that the utility finished the task successfully. Based on the status value in the return code, the SE can be used to determine what processing should occur next.
Optionally, the user may request that a specific return code be sent to the host computer that was used to start the utility. This might include the job control language (JCL) on a mainframe, the shell script on a UNIX system, or a bat file on DOS. This value can then be checked by that system to determine conditional processing as a result of the completion code specified.
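A minimal sketch of that pattern, assuming &SYSRC is the system variable holding the return code (see the system variables and Figure 9-4 later in this chapter):
.END EXPORT;
.IF &SYSRC = 0 THEN;
.LOGOFF;               /* normal completion                                              */
.ELSE;
.LOGOFF &SYSRC;        /* pass the failing code back to the JCL, shell script or bat file */
.ENDIF;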
The format of the accepted record is comprised of either character or numeric data. Character data must be enclosed in single quotes (' ') and numeric data does not need quotes. When multiple values are specified on a single record, a space is used to delimit them from one another. The assignment of a value to a parameter is done sequentially as the names appear in the .ACCEPT and the data appears on the record. The first value is assigned to the first parameter and so forth until there are no more parameter names to which to assign values. The system variables are defined later in this chapter. They are automatically set by the system to provide information regarding the execution of the utility. For example, they include the date, time and return code, to name a few. Here they can be used to establish the value for a user parameter instead of reading the data from a file. Example of using a .ACCEPT command:
.ACCEPT char_parm, int_num_parm, dec_num_parm FILE parm-record;
Contents of parm-record: 'This is some character data enclosed in quotes with spaces in it' 123 35.67
Once accepted, this data is available for use within the script. Optionally, an IGNORE can be used to skip one or more of the specified variables in the record. This makes it easy to provide one parameter record that is used by multiple job scripts, while allowing each script to determine which and how many of the values it needs. To skip the integer data, the above .ACCEPT would be written as:
.ACCEPT char_parm, dec_num_parm FILE parm-record IGNORE 39 THRU 42;
Note: if the system is a mainframe, the FILE is used to name the DD statement in the Job Control Language (JCL). For example, for the above .ACCEPT, the following JCL would be required:
//PARM-RECORD DD DSN=<pds-member-name>,DISP=(OLD,KEEP)
Since these are the only two pre-defined formats, any other format must be defined in the INSERT of the utility, as in the following example for a MM/DD/YYYY date: INSERT INTO my_date_table VALUES( :incoming_date (DATE, FORMAT 'mm/dd/yyyy'));
Note: If the system is a mainframe, the FILE portion of the command is used to name the DD statement in the JCL. The JCL must also contain any names, space requirements, record and block size, or disposition information needed by the system to create the file.
The comparison symbols are normally one of the following: Figure 9-3
Routing Messages
The .ROUTE command is used to write messages to an output file. This is normally system information generated by the SE during the execution of a utility. The default file is SYSPRINT on a mainframe and STDOUT on other platforms. The syntax for writing a .ROUTE command:
.ROUTE MESSAGES [ TO ] FILE <file-name> [ [WITH] ECHO { OFF | [ TO ] FILE <file-name> } ];
Note: If the system is a mainframe, the FILE is used to name the DD statement in the JCL. The JCL must also contain any names, space requirements, record and block size, or disposition information needed by the system.
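For example, to capture the messages in one file and echo a copy to a second file (the file names are illustrative):
.ROUTE MESSAGES TO FILE util_messages.txt WITH ECHO TO FILE echo_copy.txt;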
The IGNORE and THRU options work here the same way as explained for the .ACCEPT command above. Note: If the system is a mainframe, the FILE is used to name the DD statement in the JCL.
Note: The expression can be a literal value based on the data type of the variable or a mathematical operation for numeric data. The math can use one or more variables and one or more literals.
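This expression rule applies to the .SET command used in the sample script at the end of this chapter; a small sketch with illustrative variable names:
.SET char_var TO 'Monday';
.SET total_var TO &parm_data1 + 125;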
Before issuing a .SYSTEM command, it is important to know which operating system is being used. This information can be obtained from one of the system variables below. The syntax for writing a .SYSTEM command:
.SYSTEM '<operating-system-specific-command>';
Note: There is a system variable that contains this data and can be found in the System Variable section of this chapter.
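For example, on a UNIX client the script could remove a leftover work file before continuing (the command itself must match the host operating system, so this line is illustrative):
.SYSTEM 'rm ./outfl.txt';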
Figure 9-4
INSERT INTO &tablename VALUES (&variable1, '&parm_data2', &parm_data1);
.LOGOFF;
- - - - - - - - - - - - - - - -
Contents of logon-file:
.logon ProdSys/mikel,larkins1
Contents of myparmfile:
My_test_table 125 'Some character data'
- - - - - - - - - - - - - - - -
The following SYSOUT file is created from a run of the above script on a day other than Friday:
========================================================
=     FastExport Utility Release FEXP.07.02.01         =
========================================================
**** 13:40:45 UTY2411 Processing start date: TUE AUG 13, 2002
========================================================
=               Logon/Connection                       =
========================================================
0001 .LOGTABLE WORKDB.MJL_Util_log;
0002 .DATEFORM ansidate;
**** 13:40:45 UTY1200 DATEFORM has been set to ANSIDATE.
0003 .RUN FILE logonfile.txt;
0004 .logon cdw/mikel,;
**** 13:40:48 UTY8400 Maximum supported buffer size: 64K
**** 13:40:50 UTY8400 Default character set: ASCII
**** 13:40:52 UTY6211 A successful connect was made to the RDBMS.
**** 13:40:52 UTY6217 Logtable 'SQL00.MJL_Util_log' has been created.
========================================================
=          Processing Control Statements               =
========================================================
0005 /* test the system day to see if it is Friday
        notice that the character string is used in single quotes
        so that it is compared as a character string.
        Contrast this below for the table name */
     .IF '&SYSDAY' = 'FRI' THEN;
**** 13:40:52 UTY2402 Previous statement modified to:
0006 .IF 'TUE' = 'FRI' THEN;
0007 .DISPLAY '&SYSDATE(4) is a &SYSDAYday' FILE outfl.txt;
0008 .ELSE;
0009 .DISPLAY '&SYSUSER, &SYSDATE(4) is not Friday' FILE outfl.txt;
0010 .DISPLAY 'Michael, 02/08/13(4) is not Friday' FILE outfl.txt;
0011 .LOGOFF 16;
========================================================
=              Logoff/Disconnect                       =
========================================================
**** 13:40:55 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:40:55 UTY6216 The restart log table has been dropped.
**** 13:40:55 UTY2410 Total processor time used = '10.906 Seconds'
     .      Start : 13:40:45 - TUE AUG 13, 2002
     .      End   : 13:40:55 - TUE AUG 13, 2002
     .      Highest return code encountered = '16'.
The following SYSOUT file is created from a run of the above script on Friday:
========================================================
=     FastExport Utility Release FEXP.07.02.01         =
========================================================
**** 13:40:45 UTY2411 Processing start date: FRI AUG 16, 2002
========================================================
=               Logon/Connection                       =
========================================================
0001 .LOGTABLE WORKDB.MJL_Util_log;
0002 .DATEFORM ansidate;
**** 13:40:45 UTY1200 DATEFORM has been set to ANSIDATE.
0003 .RUN FILE logonfile.txt;
0004 .logon cdw/mikel,;
**** 13:40:48 UTY8400 Maximum supported buffer size: 64K
**** 13:40:50 UTY8400 Default character set: ASCII
**** 13:40:52 UTY6211 A successful connect was made to the RDBMS.
**** 13:40:52 UTY6217 Logtable 'SQL00.MJL_Util_log' has been created.
========================================================
=          Processing Control Statements               =
========================================================
0005 /* test the system day to see if it is Friday
        notice that the character string is used in single quotes
        so that it is compared as a character string.
        Contrast this below for the table name */
     .IF '&SYSDAY' = 'FRI' THEN;
**** 13:40:52 UTY2402 Previous statement modified to:
0006 .IF 'FRI' = 'FRI' THEN;
0007 .DISPLAY '&SYSDATE(4) is a &SYSDAYday' FILE outfl.txt;
**** 13:40:52 UTY2402 Previous statement modified to:
0008 .DISPLAY '02/08/13(4) is a FRIday' FILE outfl.txt;
0009 .ELSE;
0010 .DISPLAY '&SYSUSER, &SYSDATE(4) is not Friday' FILE outfl.txt;
0011 .LOGOFF 16;
     /* notice that the endif allows for more than one operation
        after the comparison */
     .ENDIF;
0012 /* establish and store data into a variable */
     /* the table name and two values are obtained from a file */
     .ACCEPT tablename, parm_data1, parm_data2 FILE myparmfile.txt;
0013 /* the table name is not in quotes here because it is not character data */
     .SET variable1 TO &parm_data1 + 125;
**** 13:40:52 UTY2402 Previous statement modified to:
0014 /* the table name is not in quotes here because it is not character data */
     .SET variable1 TO 123 + 125;
0015 INSERT INTO &tablename VALUES (&variable1, '&parm_data2', &parm_data1);
**** 13:40:52 UTY2402 Previous statement modified to:
0016 INSERT INTO My_test_table VALUES (248, 'some character data', 123);
**** 13:40:54 UTY1016 'INSERT' request successful.
0017 .LOGOFF;
=================================================================
=               Logoff/Disconnect                               =
=================================================================
**** 13:40:55 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:40:55 UTY6216 The restart log table has been dropped.
**** 13:40:55 UTY2410 Total processor time used = '10.906 Seconds'
     .      Start : 13:40:45 - FRI AUG 16, 2002
     .      End   : 13:40:55 - FRI AUG 16, 2002
     .      Highest return code encountered = '0'.
TEMPLATE',CLASS=S,MSGCLASS=0,
//*-----------------------------------------------------------------
//BTEQ1    EXEC PGM=BTQMAIN
//LOGON    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(ILOGON),DISP=SHR
//IDBENV   DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(IDBENV),DISP=SHR
//SYSIN    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(BTEQSCPT),DISP=SHR
//SYSPRINT DD  SYSOUT=*
/*---------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION ------------------------*/
/*---------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                */
/* SPECIAL OR UNUSUAL LOGIC:                                      */
/* PARM        - NONE                                             */
/* ABEND CODES: XXXX                                              */
/*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE=ILOGON;
/*JCL IDBENV - DATABASE SQL_CLASS;      */
.RUN FILE=IDBENV;
.EXPORT DATA DDNAME=REPORT
SELECT EMPLOYEE_NO,
       LAST_NAME,
       FIRST_NAME,
       SALARY,
       DEPT_NO
FROM   EMPLOYEE_TABLE;
.IF ERRORCODE > 0 THEN .GOTO Done
.EXPORT RESET
.LABEL Done
.QUIT
TEMPLATE',CLASS=S,MSGCLASS=0,
//*-----------------------------------------------------------------
//BTEQ1    EXEC PGM=BTQMAIN
//LOGON    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(ILOGON),DISP=SHR
//IDBENV   DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(IDBENV),DISP=SHR
//SYSIN    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(BTEQSCPT),DISP=SHR
//SYSPRINT DD  SYSOUT=*
/*---------------------------------------------------------------------*/
/* SCRIPT=XYYY9999                                                      */
/* SCRIPT TYPE=TERADATA BTEQ                                            */
/* LANGUAGE=UTILITY COMMANDS AND SQL                                    */
/* RUN MODE=BATCH                                                       */
/*---------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION ------------------------------*/
/*---------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                      */
/* SPECIAL OR UNUSUAL LOGIC:                                            */
/* PARM        - NONE                                                   */
/* ABEND CODES: XXXX                                                    */
/*---------------------------------------------------------------------*/
.SESSIONS 1
/*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE=ILOGON;
/*JCL IDBENV - DATABASE SQL08;          */
.RUN FILE=IDBENV;
.IMPORT DATA DDNAME=REPORT
.QUIET ON
.REPEAT *
USING EMPLOYEE_NO  (INTEGER),
      LAST_NAME    (CHAR(20)),
      FIRST_NAME   (VARCHAR(12)),
      SALARY       (DECIMAL(8,2)),
      DEPT_NO      (SMALLINT)
INSERT INTO EMPLOYEE_TABLE
VALUES (:EMPLOYEE_NO,
        :LAST_NAME,
        :FIRST_NAME,
        :SALARY,
        :DEPT_NO);
.QUIT
// // // // // // // // // // // // // // FEXP1 ILOGON IDBENV SYSIN OUTDATA DD JOBLIB DD DD DSN=C309.B0SNCR.NM.R60.APPLOAD,DISP=SHR
//*--------------------------------------------------------------
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,SPACE=(CYL,(1,1),RLSE),
//             DCB=(RECFM=FB,LRECL=80,BLKSIZE=0)
//SYSPRINT DD  SYSOUT=*
//SYSABEND DD  SYSOUT=*
//SYSTERM  DD  SYSOUT=*
//SYSDEBUG DD  DUMMY
/*---------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION ------------------------------*/
/*---------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                      */
/* SPECIAL OR UNUSUAL LOGIC:                                            */
/* PARM        - NONE                                                   */
/* ABEND CODES: XXXX                                                    */
/*---------------------------------------------------------------------*/
/*--------------- PROGRAM MODIFICATION --------------------------------*/
/*---------------------------------------------------------------------*/
/* MAINTENANCE LOG - ADD LATEST CHANGE TO THE TOP                       */
/* MOD-DATE     AUTHOR     MOD DESCRIPTION                              */
/*---------------------------------------------------------------------*/
.LOGTABLE SQL08.SQL08_RESTART_LOG;
/*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE ILOGON;
/*JCL IDBENV - DATABASE SQL_CLASS;      */
.RUN FILE IDBENV;
.BEGIN EXPORT SESSIONS 1;
.EXPORT OUTFILE OUTDATA
        MODE RECORD FORMAT TEXT;
SELECT STUDENT_ID  (CHAR(11)),
       LAST_NAME   (CHAR(20)),
       FIRST_NAME  (CHAR(14)),
       CLASS_CODE  (CHAR(2)),
       GRADE_PT    (CHAR(7))
FROM   STUDENT_TABLE;
.END EXPORT;
.LOGOFF;
DSN=C009.B0SNCR.NM.R60.TRLOAD,DISP=SHR
//*--------------------------------------------------------------
//*  FAST LOAD SCRIPT FILE
//SYSIN    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(FLODSCPT),DISP=SHR
//SYSPRINT DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//SYSTERM  DD  SYSOUT=*
/*----------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -------------------------------*/
/*----------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                       */
/* SPECIAL OR UNUSUAL LOGIC:                                             */
/* PARM        - NONE                                                    */
/* ABEND CODES: XXXX                                                     */
/*----------------------------------------------------------------------*/
/*----------------- PROGRAM MODIFICATION -------------------------------*/
/*----------------------------------------------------------------------*/
/* MAINTENANCE LOG - ADD LATEST CHANGE TO THE TOP                        */
/* MOD-DATE     AUTHOR     MOD DESCRIPTION                               */
/*----------------------------------------------------------------------*/
.SESSIONS 1;
LOGON TDP0/SQL08,SQL08;
DROP TABLE SQL08.ERROR_ET;
DROP TABLE SQL08.ERROR_UV;
DELETE FROM SQL08.EMPLOYEE_PROFILE;
DEFINE EMPLOYEE_NO  (INTEGER),
       LAST_NAME    (CHAR(20)),
       FIRST_NAME   (VARCHAR(12)),
       SALARY       (DECIMAL(8,2)),
       DEPT_NO      (SMALLINT)
/*----------------------------------------------------------------------*/
//* -----------------------------------------------------------
//*  JOB INFORMATION AND COMMENTS
//* -----------------------------------------------------------
//MLOAD    EXEC PGM=MLOAD
//STEPLIB  DD  DSN=C009.B0SNCR.NM.R60.APPLOAD,DISP=SHR
//         DD  DSN=C009.B0SNCR.NM.R60.TRLOAD,DISP=SHR
//ILOGON   DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(ILOGON),DISP=SHR
//IDBENV   DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(IDBENV),DISP=SHR
//*  SPECIFY THE MULTILOAD INPUT DATA FILE (LOAD FILE)
//INPTFILE DD  DSN=XXXXXX.YYYYYYY.INPUT.FILENAME,DISP=SHR
//SYSPRINT DD  SYSOUT=*
//SYSABEND DD  SYSOUT=*
//SYSTERM  DD  SYSOUT=*
//SYSDEBUG DD  DUMMY
//SYSIN    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(MLODSCPT),DISP=SHR
/*----------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -------------------------------*/
/*----------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                       */
/* SPECIAL OR UNUSUAL LOGIC:                                             */
/* PARM        - NONE                                                    */
/* ABEND CODES: XXXX                                                     */
/*----------------------------------------------------------------------*/
/*--------------- PROGRAM MODIFICATION ---------------------------------*/
/*----------------------------------------------------------------------*/
/* MAINTENANCE LOG - ADDED CHANGE TO THE TOP                             */
/* MOD-DATE     AUTHOR     MOD DESCRIPTION                               */
/*----------------------------------------------------------------------*/
.LOGTABLE SQL08.UTIL_RESTART_LOG;
/*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE ILOGON;
/*JCL IDBENV - DATABASE SQL08;          */
.RUN FILE IDBENV;
.BEGIN MLOAD
       TABLES Student_Profile1
       ERRLIMIT 1
       SESSIONS 1;
.LAYOUT INPUT_FILE;
.FIELD STUDENT_ID  1 CHAR(11);
.FIELD LAST_NAME   * CHAR(20);
.FIELD FIRST_NAME  * CHAR(14);
.FIELD CLASS_CODE  * CHAR(2);
.FIELD GRADE_PT    * CHAR(7);
.FIELD FILLER      * CHAR(26);
.END MLOAD;
REGION=6M,NOTIFY=B09XXZ
DSN=C009.B0SNCR.NM.R60.TRLOAD,DISP=SHR
//*  SPECIFY TPUMP SCRIPT FILE NAME
//SYSIN    DD  DSN=B09XXZ.APPLUTIL.CLASS.JCL(TPMPSCPT),DISP=SHR
//SYSPRINT DD  SYSOUT=*
//SYSABEND DD  SYSOUT=*
//SYSTERM  DD  SYSOUT=*
//SYSDEBUG DD  DUMMY
/*----------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -------------------------------*/
/*----------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                       */
/* SPECIAL OR UNUSUAL LOGIC:                                             */
/* PARM        - NONE                                                    */
/* ABEND CODES: XXXX                                                     */
/*----------------------------------------------------------------------*/
/*--------------- PROGRAM MODIFICATION ---------------------------------*/
/*----------------------------------------------------------------------*/
/* MAINTENANCE LOG - ADDED CHANGE TO THE TOP                             */
/* MOD-DATE     AUTHOR     MOD DESCRIPTION                               */
/*----------------------------------------------------------------------*/
.LOGTABLE SQL08.TPUMP_RESTART;
/*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE ILOGON;
/*JCL IDBENV - DATABASE SQL08;          */
.RUN FILE IDBENV;
DROP TABLE SQL08.TPUMP_UTIL_ET;
.BEGIN LOAD SESSIONS 1
       TENACITY 2
       ERRORTABLE TPUMP_UTIL_ET
       ERRLIMIT 5
       CHECKPOINT 1
       PACK 40
       RATE 1000
       ROBUST OFF;
.FIELD STUDENT_ID  1 CHAR(11);
.FIELD LAST_NAME   * CHAR(20);
.FIELD FIRST_NAME  * CHAR(14);
.FIELD CLASS_CODE  * CHAR(2);
.FIELD GRADE_PT    * CHAR(7);
.FIELD FILLER      * CHAR(26);
.DML LABEL INPUT_INSERT
     IGNORE DUPLICATE ROWS
     IGNORE MISSING ROWS;
INSERT INTO Student_Profile4
VALUES (:STUDENT_ID  (INTEGER),
        :LAST_NAME   (CHAR(20)),
        :FIRST_NAME  (VARCHAR(12)),
        :CLASS_CODE  (CHAR(2)),
        :GRADE_PT    (DECIMAL(5,2)));
.IMPORT INFILE INPUT
        LAYOUT INPUT_LAYOUT
        APPLY INPUT_INSERT;
.END LOAD;
.LOGOFF;