Week 1-PART 2-Understanding Data Step Processing

This document outlines the processing of SAS DATA steps in epidemiological research, detailing the two main phases: compilation and execution. During the compilation phase, the input buffer and program data vector are created, and syntax errors are checked, while the execution phase involves reading data values and initializing variables. Key concepts such as automatic variables _N_ and _ERROR_ are also introduced, which help track the execution process and errors.


Week 1-Part 2

Understanding DATA Step Processing
PHEB 631: SAS PROGRAMMING IN EPIDEMIOLOGICAL RESEARCH
Xiaohui Xu, Ph.D.
Department of Epidemiology and Biostatistics
Part 2: Understanding SAS Program Processing
Lecture Outline
• Understand the steps involved in processing SAS programs
• Identify the two phases that occur when a DATA step is processed
• Interpret automatic variables
• Identify the processing phase in which an error occurs
2.1 Understanding SAS DATA Step Processing
• A SAS DATA step is processed in two phases: compilation and execution.
2.1.1 Compilation Phase
• 2.1.1.1 Input Buffer
• At the beginning of the compilation phase, the input buffer (an area of memory) is created to hold a record from the external file.
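The contents of the input buffer can be inspected from inside a DATA step. The following is a minimal sketch, not part of the original lecture: the data set name work.visits, the variables id and age, and the in-stream records are all hypothetical, and DATALINES stands in for an external file so the example is self-contained. The automatic variable _INFILE_ refers to the current contents of the input buffer.

```sas
/* Hypothetical example: data set, variable names, and records are illustrative */
data work.visits;
   input id age;                    /* loads the next record into the input buffer */
   put 'Input buffer: ' _infile_;   /* writes the buffered record to the SAS log   */
   datalines;
101 34
102 47
;
run;
```

Each iteration should write the record that was just read to the log, which makes it easier to see that SAS works on one buffered record at a time.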
2.1.1 Compilation Phase
• 2.1.1.2 Program Data Vector (PDV)
• After the input buffer is created, the program data vector is created. The program data vector is the area of memory where SAS builds a data set, one observation at a time.
• The program data vector contains two automatic variables that can be used for processing but are not written to the data set as part of an observation:
   - _N_ counts the number of times that the DATA step begins to execute.
   - _ERROR_ signals the occurrence of an error that is caused by the data during execution. The default value is 0, which means there is no error. When one or more errors occur, the value is set to 1.
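Both automatic variables can be written to the log with a PUT statement. The sketch below is an illustration under assumed data (the data set work.visits, the variables id and age, and the records are hypothetical); the second record deliberately contains a non-numeric value for age so that a data error occurs during execution.

```sas
/* Hypothetical example: the value xx is invalid for the numeric variable age */
data work.visits;
   input id age;
   put _n_= _error_= id= age=;   /* show the automatic variables for each iteration */
   datalines;
101 34
102 xx
103 51
;
run;
```

For the second record, SAS should report invalid data for age in the log, set age to missing, and show _ERROR_=1 for that iteration only. Neither _N_ nor _ERROR_ appears in the resulting data set.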
2.1.1 Compilation Phase
• 2.1.1.3 Data Set Variables
• As the INPUT statement is compiled, a slot is added to the program data vector for each variable in the new data set.
2.1.1 Compilation Phase
• 2.1.1.4 Descriptor Portion of the SAS Data Set
• At the bottom of the DATA step (in this example, when the RUN statement is encountered), the compilation phase is complete, and the descriptor portion of the new SAS data set is created. The descriptor portion of the data set includes
   - the name of the data set
   - the number of observations and variables
   - the names and attributes of the variables.
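One common way to view the descriptor portion is PROC CONTENTS. The sketch below assumes a small hypothetical data set (work.visits, with illustrative variables id, name, and age); it is not taken from the lecture.

```sas
/* Hypothetical example data set used only to have something to describe */
data work.visits;
   input id name $ age;
   datalines;
101 Ann 34
102 Bob 47
;
run;

/* Reports the descriptor portion: data set name, number of observations
   and variables, and each variable's name, type, and length */
proc contents data=work.visits;
run;
```

The report lists the variable attributes that were fixed during compilation, separately from the data values themselves.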
2.1.1 Compilation Phase
• 2.1.1.5 Syntax Checking
• During the compilation phase, SAS also scans each statement in the DATA step, looking for syntax errors. Syntax errors include
   - missing or misspelled keywords
   - invalid variable names
   - missing or invalid punctuation
   - invalid options.
2.1.2 Execution Phase
• After the DATA step is compiled, it is ready for execution. During the execution phase, the data portion of the data set is created. The data portion contains the data values.
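2.1.2 Execution Phase
• 2.1.2.1 Initializing Variables
• At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0. The remaining variables are initialized to missing (a period for numeric variables, a blank for character variables).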
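The initial state of the program data vector can be observed by writing it to the log before the INPUT statement runs. This is a minimal sketch with hypothetical names and records (work.visits, id, name, age); PUT _ALL_ writes every variable in the program data vector, including _N_ and _ERROR_.

```sas
/* Hypothetical example: compare the PDV before and after the first read */
data work.visits;
   /* Runs before INPUT has read anything, so on the first iteration
      the PDV still holds its initial values */
   if _n_ = 1 then put 'Before INPUT: ' _all_;
   input id name $ age;
   if _n_ = 1 then put 'After INPUT:  ' _all_;
   datalines;
101 Ann 34
102 Bob 47
;
run;
```

The first log line should show id and age as missing (a period), name as blank, _N_=1, and _ERROR_=0; the second should show the values read from the first record.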
Note: Numeric data values are of two types, standard and nonstandard (e.g., dates and currency values that contain a $ sign).
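Nonstandard numeric values generally need an informat so that SAS can convert them to numbers. The sketch below is an assumption-laden illustration: the data set work.payments, its variables, and its records are hypothetical, and the COMMA and MMDDYY informats are one possible choice for currency and date values.

```sas
/* Hypothetical example: id is standard numeric; amount and paid are nonstandard */
data work.payments;
   /* The colon modifier applies an informat while still reading
      blank-delimited values */
   input id amount :comma10. paid :mmddyy10.;
   format amount dollar10.2 paid date9.;   /* display formats only */
   datalines;
101 $1,250.00 01/15/2024
102 $980.50 02/03/2024
;
run;

proc print data=work.payments;
run;
```

The informats control how the raw text is read, while the FORMAT statement only affects how the stored numeric values are displayed.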
2.1.2 Execution Phase
• 2.1.2.2 Input Data
• When an INPUT statement begins to read data values from a record that is held in the input buffer, it uses an input pointer to keep track of its position.
• The input pointer starts at column 1 of the first record, unless otherwise directed. As the INPUT statement executes, the raw data values are read in order and assigned to variables in the program data vector.
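The input pointer can also be positioned explicitly with column pointer controls. The following sketch assumes fixed-column records and hypothetical names (work.visits, id, name, age); @n moves the pointer to column n before the next value is read.

```sas
/* Hypothetical example: id in columns 1-3, name in 5-7, age in 9-10 */
data work.visits;
   input @1 id 3.       /* start at column 1, read 3 columns       */
         @5 name $3.    /* move pointer to column 5, read 3 chars  */
         @9 age 2.;     /* move pointer to column 9, read 2 digits */
   put id= name= age=;
   datalines;
101 Ann 34
102 Bob 47
;
run;
```

Without the @ controls, the pointer would begin at column 1 and simply advance as each value is read.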
• The DATA step continues to iterate until the end of the data is reached.
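The iteration behavior can be traced with _N_. In this sketch (hypothetical data set work.visits with three illustrative records), the PUT statement is placed before INPUT so that it runs every time the DATA step begins to execute.

```sas
/* Hypothetical example: trace how many times the DATA step begins to execute */
data work.visits;
   put 'Starting iteration ' _n_;   /* runs at the top of every iteration */
   input id age;                    /* the step stops here when no records remain */
   datalines;
101 34
102 47
103 51
;
run;
```

With three records, the message is expected to appear four times: the step begins a fourth iteration, finds no more data at the INPUT statement, and stops, leaving three observations in the data set.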
