An Introduction To The SAS System: Phil Spector
Statistical Computing Facility
Department of Statistics
University of California, Berkeley
What is SAS?
• Developed in the early 1970s at North Carolina State
University
• Originally intended for management and analysis of
agricultural field experiments
• Now the most widely used statistical software
• Once stood for “Statistical Analysis System”; it is no longer
an acronym for anything
• Pronounced “sass”, not spelled out as three letters.
Overview of SAS Products
• Base SAS - data management and basic procedures
• SAS/STAT - statistical analysis
• SAS/GRAPH - presentation quality graphics
• SAS/OR - Operations research
• SAS/ETS - Econometrics and Time Series Analysis
• SAS/IML - interactive matrix language
• SAS/AF - applications facility (menus and interfaces)
• SAS/QC - quality control
There are other specialized products for spreadsheets, access to
databases, connectivity between different machines running SAS,
etc.
Resources: Other SAS Institute Publications
Basic Manuals (should be available for most serious users):
SAS Language: Reference, Version 6, 1st Edition
SAS Language and Procedures: Usage, Version 6, 1st Edition
SAS Language and Procedures: Usage 2, Version 6, 1st Edition
Product Manuals (should be available for reference)
SAS/STAT User’s Guide, Version 6, 4th Edition, Volumes 1 & 2
SAS/ETS User’s Guide, Version 6, 2nd Edition
SAS/FSP Software: Usage and Reference, Version 6, 1st Edition
SAS/GRAPH Software: Usage, Version 6, 1st Edition
SAS/GRAPH Software: Reference, Version 6, 1st Edition, Volumes 1 & 2
SAS/AF Software: Usage and Reference, Version 6, 1st Edition
SAS/OR User’s Guide, Version 6, First Edition
Syntax-only Guides (may be useful for experienced users)
SAS Language and Procedures: Syntax, Version 6, 1st Edition
SAS Screen Control Language: Syntax, Version 6, 1st Edition
SAS/ETS Software: Syntax, Version 6, 1st Edition
SAS/GRAPH Software: Syntax, Version 6, 1st Edition
SAS/STAT Software: Syntax, Version 6, 1st Edition
Online Resources
Online help: Type help in the SAS display manager input window.
Sample Programs, distributed with SAS on all platforms.
SAS Institute Home Page: http://www.sas.com
SAS Institute Technical Support:
http://www.sas.com/service/techsup/find_answer.html
Searchable index to SAS-L, the SAS mailing list:
http://www.listserv.uga.edu/archives/sas-l.html
Usenet Newsgroup (equivalent to SAS-L):
comp.soft-sys.sas
SAS for the Masses:
http://faith.hypno.net/sasmass/
Michael Friendly’s Guide to SAS Resources on the Internet:
http://www.math.yorku.ca/SCS/StatResource.html#SAS
Brian Yandell’s Introduction to SAS:
http://www.stat.wisc.edu/computing/sas/intro.html
Accessing SAS
There are four ways to access SAS on a UNIX system:
1. Type sas . This opens the SAS “display manager”, which
consists of three windows (program, log, and output). Some
procedures must be run from the display manager.
2. Type sas -nodms . You will be prompted for each SAS
statement, and output will scroll by on the screen.
3. Type sas -stdio . SAS will act like a standard UNIX
program, expecting input from standard input, sending the log
to standard error, and the output to standard output.
4. Type sas filename.sas . This is the batch mode of SAS -
your program is read from filename.sas, the log goes to
filename.log and the output goes to filename.lst.
Structure of SAS programs
• Statements beginning with an asterisk (*) and ending with a
semicolon are treated as comments. Alternatively you can
enclose comments between /* and */.
• You can combine as many data and proc steps as you want, in
whatever order you want.
• Data steps begin with the word data and procedure steps
begin with the word proc.
• The run; command signals to SAS that the previous
commands can be executed.
• Terminate an interactive SAS job with the endsas; statement.
• There are global options (like linesize and pagesize) as well as
options specific to datasets and procedures.
• Informative messages are written to the SAS log - make sure
you read it!
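The points above can be seen together in a small program. This is only a sketch: the data set name heights and its values are invented for illustration.

```sas
* A comment statement ends at the first semicolon;
/* This style of comment can span several lines */
options linesize=80 pagesize=60;     /* global options */

data heights;                        /* data step: creates data set heights */
    input name $ height;
    datalines;
ann 64
bob 70
;
run;

proc print data=heights;             /* proc step: prints the data set */
run;

proc means data=heights;             /* a second proc step on the same data */
    var height;
run;
```

After running it, check the log for the notes reporting how many observations and variables were created.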
Data Step: Basics
Each data step begins with the word data and optionally one or
more data set names (and associated options) followed by a
semicolon. The name(s) given on the data step are the names of
data sets which will be created within the data step.
If you don’t include any names on the data step, SAS will create
default data set names of the form datan, where n is an integer
which starts at 1 and is incremented so that each data set created
has a unique name within the current session. Since it becomes
difficult to keep track of the default names, it is recommended that
you always explicitly specify a data set name on the data
statement.
When you are running a data step to simply generate a report, and
don’t need to create a data set, you can use the special data set
name _null_ to eliminate the output of observations.
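As a rough illustration of _null_, the following step writes a message to the log for some input lines but creates no data set. The variable names and values are made up for the example.

```sas
data _null_;
    input name $ score;
    /* report low scores in the log; nothing is written to a data set */
    if score < 50 then put 'Low score for ' name score=;
    datalines;
ann 72
bob 41
;
run;
```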
Data Step: input Statement
There are three basic forms of the input statement:
1. List input (free form) - data fields must be separated by at
least one blank. List the names of the variables, follow the
name with a dollar sign ($) for character data.
2. Column input - follow the variable name (and $ for character)
with startingcolumn – endingcolumn.
3. Formatted input - Optionally precede the variable name with
@startingcolumn; follow the variable name with a SAS format
designation. (Examples of formats: $10. (10 column
character), 6. (6 column numeric))
When mixing different input styles, note that for column and
formatted input, the next input directive reads from the column
immediately after the previous value, while for list input, the next
directive reads from the second column after the previous value.
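The three forms can be compared by reading the same kind of data each way. This is a sketch with invented data; in the second and third steps the name occupies columns 1-8 and the age columns 10-11.

```sas
data list_in;
    input name $ age;                 /* list input */
    datalines;
alice 34
bob 27
;
run;

data column_in;
    input name $ 1-8 age 10-11;       /* column input */
    datalines;
alice    34
bob      27
;
run;

data format_in;
    input @1 name $8. @10 age 2.;     /* formatted input */
    datalines;
alice    34
bob      27
;
run;
```

All three steps should produce the same two observations; only the way the fields are located differs.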
Other Modifiers for the Input Statement
+number advance number columns.
#number advance to line number.
/ advance to next line.
trailing @ hold the line to allow further input statements in this
iteration of the data step on the same data.
trailing @@ hold the line to allow continued reading from the line
on subsequent iterations of the data step.
Note: If SAS needs to read an additional line to input all the
variables referenced in the input statement it prints the following
message on the log:
NOTE: SAS went to a new line when INPUT statement reached past
the end of a line.
If you see this note, make sure you understand why it was printed!!
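A brief sketch of the two trailing modifiers, using invented data. In the first step the trailing @ holds the line so that a second input statement can finish reading it; in the second step the trailing @@ lets each iteration of the data step pick up the next value on the same line.

```sas
data mixed;
    input type $ @;                   /* read the type, hold the line */
    if type = 'A' then input x y;     /* type A lines carry two values */
    else input x;                     /* other lines carry one value */
    datalines;
A 1 2
B 3
;
run;

data vals;
    input v @@;                       /* one observation per value */
    datalines;
1 2 3
4 5
;
run;
```

The second step creates five observations from two lines of data.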
FTP Access
SAS provides the ability to read data directly from an FTP server,
without the need to create a local copy of the file, through the ftp
keyword of the filename statement.
Suppose there is a data file called user.dat in the directory
public on an ftp server named ftp.myserver.com. If your user
name is joe and your password is secret, the following statement
will establish a fileref for reading the data:
filename myftp ftp 'user.dat' cd='/public' user='joe'
pass='secret' host='ftp.myserver.com';
The fileref can now be used in the infile statement in the usual
way.
You can read files from http (web) servers in a similar fashion,
using the url keyword.
Reading SAS programs from external files
The infile statement can be used to read data which is stored in
a file separate from your SAS program. When you want SAS to
read your program from an external file you can use the %include
statement, followed by a filename or fileref. After SAS processes a
%include statement, it continues to read statements from its original
source (input file, keyboard, or display manager).
For example, suppose the SAS program statements to read a file
and create a data set are in the system file readit.sas. To process
those statements, and then print the data set, the following
commands can be used:
%include "readit.sas";
proc print;
run;
Titles and Footnotes
SAS allows up to ten lines of text at the top (titles) and bottom
(footnotes) of each page of output, specified with title and
footnote statements. The form of these statements is
title<n> text; or footnote<n> text;
where n, if specified, can range from 1 to 10, and text must be
surrounded by double or single quotes. If text is omitted, the title
or footnote is deleted; otherwise it remains in effect until it is
redefined. Thus, to have no titles, use:
title;
By default SAS includes the date and page number on the top of
each piece of output. These can be suppressed with the nodate and
nonumber system options.
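A short sketch of how titles and footnotes might be set and then cleared. The data set and the title text are invented for the example.

```sas
data mydata;
    input x y;
    datalines;
1 2
3 4
;
run;

options nodate nonumber;              /* suppress date and page number */
title 'Annual Report';                /* same as title1 */
title2 'Preliminary figures';
footnote 'For internal use only';

proc print data=mydata;
run;

title;                                /* delete all titles */
footnote;                             /* delete all footnotes */
```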
Missing Values
SAS handles missing values consistently throughout various
procedures, generally by deleting observations which contain
missing values. It is therefore very important to inspect the log and
listing output, as well as paying attention to the numbers of
observations used, when your data contains missing values.
For character variables, a missing value is represented by a blank
(" ", not a null string).
For numeric variables, a missing value is represented by a period
(with no quotes). Unlike many languages, you can test for equality
to missing in the usual fashion:
if string = " " then delete; * character variable;
if num = . then delete; * numeric variable;
Special Missing Values
In addition to the regular missing value (.), you can specify one or
more single alphabetic characters which will be treated as missing
values when encountered in your input.
Most procedures will simply treat these special missing values in
the usual way, but others (such as freq and summary) have options
to tabulate each type of missing value separately. For example,
data one;
missing x;
input vv @@;
datalines;
12 4 5 6 x 9 . 12
;
The 5th and 7th observations will both be missing, but internally they
are stored in different ways.
Note: A special missing value will not be detected by a condition
like vv = . in an if statement. In the example above, you would
need to test vv = .x to detect the special missing value, or use the
missing function of the data step, which detects every kind of
missing value.
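To see the difference between the three tests, the example can be extended with indicator variables. The names is_dot, is_x and is_miss are invented for this sketch.

```sas
data one;
    missing x;
    input vv @@;
    is_dot  = (vv = .);       /* 1 only for the ordinary missing value */
    is_x    = (vv = .x);      /* 1 only for the special missing value x */
    is_miss = missing(vv);    /* 1 for either kind of missing value */
    datalines;
12 4 5 6 x 9 . 12
;
run;

proc print data=one;
run;
```

In the printed output the 5th observation should show is_x and is_miss equal to 1, and the 7th should show is_dot and is_miss equal to 1.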
Variable Lists
SAS provides several different types of variable lists, which can be
used in all procedures, and in some data step statements.
• Numbered List - When a set of variables have the same prefix,
and the rest of the name is a consecutive set of numbers, you
can use a single dash (-) to refer to an entire range:
x1 - x3 ⇒ x1, x2, x3; x01 - x03 ⇒ x01, x02, x03
• Colon list - When a set of variables all begin with the same
sequence of characters you can place a colon after the sequence
to include them all. If variables a, b, xheight, and xwidth
have been defined, then x: ⇒ xheight, xwidth.
• Special Lists - Three keywords refer to a list with the obvious
meaning: _numeric_ _character_ _all_
In a data step, special lists will only refer to variables which
were already defined when the list is encountered.
Variable Lists (cont’d)
• Name range list - When you refer to a list of variables in the
order in which they were defined in the SAS data set, you can
use a double dash (--) to refer to the range:
If the input statement
input id name $ x y z state $ salary
was used to create a data set, then
x -- salary ⇒ x, y, z, state, salary
If you only want character or numeric variables in the name
range, insert the appropriate keyword between the dashes:
id -numeric- z ⇒ id, x, y, z
In general, variables are defined in the order they appear in the
data step. If you’re not sure about the order, you can check
using proc contents.
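The list types can be tried out on a small data set built with the input statement shown above. The data values are invented for this sketch.

```sas
data emp;
    input id name $ x y z state $ salary;
    datalines;
1 ann 1 2 3 CA 50000
2 bob 4 5 6 NY 60000
;
run;

proc print data=emp;
    var x -- salary;          /* name range: x y z state salary */
run;

proc means data=emp;
    var id -numeric- z;       /* numeric variables from id to z */
run;

proc print data=emp;
    var _character_;          /* special list: name and state */
run;

proc contents data=emp;       /* shows the variables in the data set */
run;
```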
drop= and keep= data set options
Sometimes you don’t need to use all of the variables in a data set
for further processing. To restrict the variables in an input data
set, the data set option keep= can be used with a list of variable
names. For example, to process the data set big, but only using
variables x, y, and z, the following statements could be used:
data new;
set big(keep = x y z);
. . .
Using a data set option in this way is very efficient, because only
the listed variables are read for each observation. If
you only wanted to remove a few variables from the data set, you
could use the drop= option to specify the variables in a similar
fashion.
retain statement
SAS’ default behavior is to set all variables to missing each time a
new observation is read. Sometimes it is necessary to “remember”
the value of a variable from the previous observation. The retain
statement specifies variables which will retain their values from
previous observations instead of being set to missing. You can
specify an initial value for retained variables by putting that value
after the variable name on the retain statement.
Note: Make sure you understand the difference between retain
and keep.
For example, suppose we have a data set which we assume is sorted
by a variable called x. To print a message when an out-of-order
observation is encountered, we could use the following code:
retain lastx .; * retain lastx and initialize to missing;
if x < lastx then put 'Observation out of order, x=' x;
else lastx = x;
sum Statement
Many times the sum of a variable needs to be accumulated between
observations in a data set. While a retain statement could be used,
SAS provides a special way to accumulate values known as the sum
statement. The format is
variable + expression;
where variable is the variable which will hold the accumulated
value, and expression is a SAS expression which evaluates to a
numeric value. The value of variable is automatically initialized
to zero. The sum statement is equivalent to the following:
retain variable 0;
variable = variable + expression;
with one important difference. If the value of expression is
missing, the sum statement treats it as a zero, whereas the normal
computation will propagate the missing value.
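The difference shows up as soon as a missing value is read. In this sketch (the variable names and data are invented), total1 uses the sum statement and total2 uses retain with ordinary addition.

```sas
data totals;
    input amount @@;
    total1 + amount;              /* sum statement: missing treated as 0 */
    retain total2 0;
    total2 = total2 + amount;     /* ordinary addition: missing propagates */
    datalines;
10 20 . 30
;
run;

proc print data=totals;
run;
```

total1 takes the values 10, 30, 30, 60, while total2 becomes missing at the third observation and stays missing.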
Default Data Sets
In most situations, if you don’t specify a data set name, SAS will
use a default dataset, using the following rules:
• When creating data sets, SAS uses the names data1, data2,
etc, if no data set name is specified. This can happen because
of a data step, or if a procedure automatically outputs a data
set which you have not named.
• When processing data sets, SAS uses the most recently created
data set, which has the special name _last_. This can happen
when you use a set statement with no dataset name, or invoke
a procedure without a data= argument. To override this, you
can set the value of _last_ to a data set of your choice with the
options statement:
options _last_ = mydata;
Permanent Data Sets
You can save your SAS data sets permanently by first specifying a
directory to use with the libname statement, and then using a two
level data set name in the data step.
libname project "/some/directory";
data project.one;
Data sets created this way will have filenames of the form
datasetname.ssd or datasetname.ssd01.
In a later session, you could refer to the data set directly, without
having to create it in a data step.
libname project "/some/directory";
proc reg data=project.one;
To search more than one directory, include the directory names in
parentheses.
libname both ("/some/directory" "/some/other/directory");
Operators in SAS
Arithmetic Operators:
*          multiplication
/          division
+          addition
-          subtraction
**         exponentiation
Comparison Operators:
= or eq    equal to
^= or ne   not equal to
> or gt    greater than
>= or ge   greater than or equal to
< or lt    less than
<= or le   less than or equal to
Boolean Operators:
& or and   and
| or or    or
^ or not   negation
Other Operators:
><         minimum
<>         maximum
||         character concatenation
The in operator lets you test for equality to any of several constant
values. x in (1,2,3) is the same as x=1 | x=2 | x=3.
Comparison Operators
Use caution when testing two floating point numbers for equality,
due to the limitations of precision of their internal representations.
The round function can be used to alleviate this problem.
Two SAS comparison operators can be combined in a single
statement to test if a variable is within a given range, without
having to use any boolean operators. For example, to see if the
variable x is in the range of 1 to 5, you can use if 1 < x < 5 ....
SAS treats a numeric missing value as being less than any valid
number. Comparisons involving missing values do not return
missing values.
When comparing characters, if a colon is used after the comparison
operator, the longer argument will be truncated for the purpose of
the comparison. Thus, the expression name =: "R" will be true
for any value of name which begins with R.
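The following sketch exercises each of these points in one data step. The variable names and the rounding unit of .001 are arbitrary choices for illustration.

```sas
data _null_;
    x = 0.1 + 0.2;
    exact   = (x = 0.3);                          /* may be 0 because of rounding error */
    rounded = (round(x, .001) = round(0.3, .001)); /* compare after rounding */
    y = 3;
    inrange = (1 < y < 5);                        /* two comparisons combined */
    m = .;
    below   = (m < 0);                            /* missing is less than any number */
    name = "Robert";
    starts_r = (name =: "R");                     /* compare on the first character only */
    put exact= rounded= inrange= below= starts_r=;
run;
```

The values are written to the log, where the exact and rounded comparisons can be compared directly.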
Logical Variables
When you write expressions using comparison operators, they are
processed by SAS and evaluated to 1 if the comparison is true, and
0 if the comparison is false. This allows them to be used in logical
statements like an if statement as well as directly in numerical
calculations.
For example, suppose we want to count the number of observations
in a data set where the variable age is less than 25. Using an if
statement, we could write:
if age < 25 then count + 1;
(Note the use of the sum statement.)
With logical expressions, the same effect can be achieved as follows:
count + (age < 25);
Logical Variables (cont’d)
As a more complex example, suppose we want to create a
categorical variable called agegrp from the continuous variable age
where agegrp is 1 if age is 20 or less, 2 if age is from 21 to 30, 3
if age is from 31 to 40, and 4 if age is greater than 40. To perform
this transformation with if statements, we could use statements
like the following:
agegrp = 1;
if 20 < age <= 30 then agegrp = 2;
if 30 < age <= 40 then agegrp = 3;
if age > 40 then agegrp = 4;
Using logical variables provides the following shortcut:
agegrp = 1 + (age > 20) + (age > 30) + (age > 40);
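One way to check that the two approaches agree is to compute both on a few ages around the cut points. The names agegrp1, agegrp2 and same are invented for this sketch.

```sas
data groups;
    input age @@;
    /* version using if statements */
    agegrp1 = 1;
    if 20 < age <= 30 then agegrp1 = 2;
    if 30 < age <= 40 then agegrp1 = 3;
    if age > 40 then agegrp1 = 4;
    /* version using logical expressions */
    agegrp2 = 1 + (age > 20) + (age > 30) + (age > 40);
    same = (agegrp1 = agegrp2);
    datalines;
15 20 21 30 31 40 41 67
;
run;

proc print data=groups;
run;
```

The variable same should be 1 for every observation.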
Variable Attributes
There are four attributes common to SAS variables.
• length - the number of bytes used to store the variable in a
SAS data set
• informat - the format used to read the variable from raw data
• format - the format used to print the values of the variable
• label - a descriptive character label of up to 40 characters
You can set any one of these attributes by using the statement of
the appropriate name, or you can set all four of them using the
attrib statement.
Since named variable lists depend on the order in which variables
are encountered in the data step, a common trick is to use a
length or attrib statement, listing variables in the order you
want them stored, as the first statement of your data step.
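A sketch of the attrib statement used as the first statement of a data step, so that it also fixes the order of the variables. The variable names, labels and data are invented for the example.

```sas
data people;
    attrib name   length = $12 label = 'Full name'
           income length = 8   format = dollar10. label = 'Annual income';
    input name $ income;
    datalines;
Christopher 52000
Ann 48000
;
run;

proc contents data=people;    /* lists length, format and label */
run;
```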
Variable Lengths: Character Values
• For character variables, SAS defaults to a length of 8
characters. If your character variables are longer than that,
you’ll need to use a length statement, an informat statement or
supply a format on the input statement.
• When specifying a length or format for a character variable,
make sure to precede the value with a dollar sign ($):
attrib string length = $ 12 format = $char12.;
• The maximum length of a SAS character variable is 200 in
Version 6 (32,767 in Version 7 and later).
• By default SAS removes leading blanks in character values. To
retain them use the $charw. informat.
• By default SAS pads character values with blanks at the end.
To remove them, use the trim function.
Initialization and Termination
Although the default behavior of the data step is to automatically
process each observation in an input file or existing SAS data set, it
is often useful to perform specific tasks at the very beginning or
end of a data step. The automatic SAS variable _n_ counts the
number of iterations of the data step. It is always available within
the data step, but never output to a data set. This variable will be
equal to 1 only on the first iteration of the data step, so it can be
used to signal the need for initializations.
To tell when the last observation is being processed in a data step,
the end= variable of either the infile or set statement can be
used. This variable is not output to a data set, but will be equal to
1 only when the last observation of the input file or data set is
being processed, and will equal 0 otherwise; thus any actions to be
done at the very end of processing can be performed when this
variable is equal to 1.
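As a minimal sketch of both ideas, the step below prints one message on the first iteration and a summary on the last. The data set big, the variable x and the end= variable name eof are assumptions made for the example.

```sas
data big;
    input x @@;
    datalines;
3 8 1 6
;
run;

data _null_;
    set big end=eof;
    if _n_ = 1 then put 'Starting to read data set big';
    total + x;                        /* accumulate with the sum statement */
    if eof then put 'Observations read: ' _n_ ' Total of x: ' total;
run;
```

For these four values the final message reports 4 observations and a total of 18.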
Flow Control: Subsetting if
Using an if statement without a corresponding then serves as a
filter; observations which do not meet the condition will not be
processed any further.
For example, the statement
if age < 60;
is equivalent to the statement
if age >= 60 then delete;
and will prevent observations where age is not less than 60 from
being output to the data set. This type of if statement is therefore
known as a subsetting if.
Note: You cannot use an else statement with a subsetting if.
Flow Control: stop, abort, return
Although rarely necessary, it is sometimes useful to override SAS’
default behavior of processing an entire set of data statements for
each observation. Control within the current execution of the data
step can be achieved with the goto statement; the statements below
provide more general control.
stop immediately discontinue entire execution of the data step
abort like stop, but set _error_ to 1
error set _error_ to 1 and print a message to the SAS log; execution continues
return begin execution of next iteration of data step
For example, the following statement would set _error_ to 1 and
print an error message to the log:
if age > 99 then error "Age is too large for subject number " subjno ;
Do-loops
Do-loops are one of the main tools of SAS programming. They
exist in several forms, always terminated by an end; statement.
• do; - groups blocks of statements together
• do over arrayname; - process array elements
• do var =start to end <by inc>; - range of numeric values
• do var =list-of-values;
• do while(expression); (expression evaluated before loop)
• do until(expression); (expression evaluated after loop)
The do until loop is guaranteed to be executed at least once.
Some of these forms can be combined, for example
do i= 1 to end while (sum < 100);
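Several of these forms are sketched below in a single step that writes to the log. The variable names and limits are invented for illustration.

```sas
data _null_;
    /* range of numeric values with an increment */
    do i = 1 to 9 by 2;
        put i=;
    end;
    /* list of values */
    do dose = 0.5, 1, 2;
        put dose=;
    end;
    /* do while: tested before each pass, so the body may never run */
    n = 0;
    do while (n < 3);
        n + 1;
    end;
    /* do until: tested after each pass, so the body runs at least once */
    m = 10;
    do until (m >= 3);
        m + 1;
    end;
    put n= m=;
run;
```

The last line written to the log shows n=3 and m=11: the until loop ran once even though its condition was already true on entry.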
Iterative Do-loops: Example 1
Do-loops can be nested. The following example calculates how long
it would take for an investment with interest compounded monthly
to double:
data interest;
do rate = 4,4.5,5,7,9,20;
mrate = rate / 1200; * convert from percentage;
months = 0;
start = 1;
do while (start < 2);
start = start * (1 + mrate);
months + 1;
end;
years = months / 12;
output;
end;
keep rate years;
run;
Getting out of Do-loops
There are two options for escaping a do-loop before its normal
termination:
You can use a goto statement to jump outside the loop:
count = 0;
do i=1 to 10;
if x{i} = . then count = count + 1;
if count > 5 then goto done;
end;
done: if count < 5 then output;
. . .
You can also force termination of a do-loop by modifying the value
of the index variable. Use with caution since it can create an
infinite loop.
do i=1 to 10;
if x{i} = . then count = count + 1;
if count > 5 then i=10;
end;
SAS Functions: Statistical Summaries
The statistical summary functions accept unlimited numbers of
arguments, and ignore missing values.
Name       Function
css        corrected sum of squares
cv         coefficient of variation
kurtosis   kurtosis
max        maximum
mean       mean
min        minimum
range      maximum − minimum
skewness   skewness
std        standard deviation
stderr     standard error of the mean
sum        sum
uss        uncorrected sum of squares
var        variance
In addition, the function ordinal(n,...) gives the nth ordered
value from its list of arguments.
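The effect of ignoring missing values can be seen by comparing a summary function with the equivalent arithmetic. The variable names are invented for this sketch.

```sas
data _null_;
    x1 = 4; x2 = .; x3 = 8;
    avg   = mean(x1, x2, x3);         /* missing x2 is ignored */
    total = sum(of x1-x3);            /* a variable list can follow the word of */
    plain = (x1 + x2 + x3) / 3;       /* ordinary arithmetic propagates missing */
    put avg= total= plain=;
run;
```

Here avg is 6 and total is 12, while plain is missing.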
SAS Functions: Character Manipulation
compress(target,<chars-to-remove>)
expr = "one, two: three:";
new = compress(expr,",:");
results in new equal to one two three
With no second argument compress removes blanks.
index(source,string) - finds position of string in source
where = "university of california";
i = index(where,"cal");
results in i equal to 15
indexc(source,string) - finds position of any character in
string in source
where = "berkeley, ca";
i = indexc(where,"abc");
results in i equal to 1, since b is in position 1.
index and indexc return 0 if there is no match
SAS Functions: Character Manipulation (cont’d)
substr(string,position,<n>) - returns pieces of a variable
field = "smith, joe";
last = substr(field,1,index(field,",") - 1);
results in last equal to smith
translate(string,to,from) - changes from chars to to chars
word = "eXceLLent";
new = translate(word,"xl","XL");
results in new equal to excellent
trim(string) - returns string with trailing blanks removed
upcase(string) - converts lowercase to uppercase
verify(source,string) - returns position of first char. in source
which is not in string
check = verify(val,"0123456789.");
results in check equal to 0 if val is a character string containing
only numbers and periods.
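Several of these functions are often combined. The sketch below splits a name of the form "last, first" and rebuilds it; the variable names lname, fname and full are invented for the example, and it assumes exactly one blank follows the comma.

```sas
data _null_;
    field = "smith, joe";
    comma = index(field, ",");                /* position of the comma */
    lname = substr(field, 1, comma - 1);      /* text before the comma */
    fname = substr(field, comma + 2);         /* text after the comma and blank */
    full  = trim(fname) || " " || upcase(lname);
    ok    = verify("12.5", "0123456789.");    /* 0: only digits and periods */
    put lname= fname= full= ok=;
run;
```

Without trim, the trailing blanks stored in fname would be carried into the concatenated value.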
Generating Random Numbers
The following example, which uses no input data, creates a data set
containing simulated data. Note the use of ranuni and the int
function to produce a categorical variable (group) with
approximately equal numbers of observations in each category.
data sim;
do i=1 to 100;
group = int(5 * ranuni(12345)) + 1;
y = rannor(12345);
output;
end;
keep group y;
run;
Subsetting Observations
Although the subsetting if is the simplest way to subset
observations, you can actively remove observations using a delete
statement, or include observations using an output statement.
• delete statement
if reason = 99 then delete;
if age > 60 and sex = "F" then delete;
No further processing is performed on the current observation
when a delete statement is encountered.
• output statement
if reason ^= 99 and age < 60 then output;
if x > y then output;
Subsequent statements are carried out (but not reflected in the
current observation). When a data step contains one or more
output statements, SAS’ usual automatic outputting at the end
of each data step iteration is disabled — only observations
which are explicitly output are included in the data set.
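A small sketch of the behavior just described, with invented data. The assignment after the output statement is executed, but the observation has already been written, so the change does not appear in the data set.

```sas
data bigger;
    input x y;
    if x > y then output;     /* only these observations are written */
    x = 0;                    /* executed, but too late to affect the output */
    datalines;
5 1
2 9
7 3
;
run;

proc print data=bigger;
run;
```

The resulting data set has two observations, with x equal to 5 and 7.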
Random Access of Observations: Example
The following program reads every third observation from the data
set big:
data sample;
do obsnum = 1 to total by 3;
set big point=obsnum nobs=total;
if _error_ then abort;
output;
end;
stop;
run;
Note that the set statement is inside the do-loop. If an attempt is
made to read an invalid observation, SAS will set the automatic
variable _error_ to 1. The stop statement ensures that SAS does
not go into an infinite loop, since a set statement with point=
never encounters an end of file.
Application: Random Sampling II
Now suppose we wish to randomly extract exactly n observations
from a data set. To ensure randomness, we must adjust the fraction
of observations chosen depending on how many observations we
have already chosen. This can be done using the nobs= option of
the set statement. For example, to choose exactly 15 observations
from a data set all, the following code could be used:
data some;
retain k 15 n ;
drop k n;
set all nobs=nn;
if _n_ = 1 then n = nn;
if ranuni(0) < k / n then do;
output;
k = k - 1;
end;
if k = 0 then stop;
n = n - 1;
run;
By Processing in Procedures
In procedures, the by statement of SAS allows you to perform
identical analyses for different groups in your data. Before using a
by statement, you must make sure that the data is sorted (or at
least grouped) by the variables in the by statement.
The form of the by statement is
by <descending> variable-1 · · · <<descending> variable-n> <notsorted>;
By default, SAS expects the by variables to be sorted in ascending
order; the optional keyword descending specifies that they are in
descending order.
The optional keyword notsorted at the end of the by statement
informs SAS that the observations are grouped by the by variables,
but that they are not presented in a sorted order. Any time any of
the by variables changes, SAS interprets it as a new by group.
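A typical pattern is to sort first and then use the same by statement in the procedure. This is a sketch with a small invented data set.

```sas
data emp;
    input loc $ name $ salary;
    datalines;
NY Harry 25000
SF Bob 19000
NY Fred 20000
SF Amy 29000
;
run;

proc sort data=emp;           /* put the observations in by-variable order */
    by loc;
run;

proc means data=emp mean max;
    by loc;                   /* one analysis per value of loc */
    var salary;
run;
```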
Selective Processing in Procedures: where statement
When you wish to use only some subset of a data set in a
procedure, the where statement can be used to select only those
observations which meet some condition. There are several ways to
use the where statement.
As a procedure statement:
proc reg data=old;
where sex eq 'M';
model y = x;
run;
As a data set option:
proc reg data=old(where = (sex eq 'M'));
model y = x;
run;
data new;
set old(where = (group = 'control'));
Multiple Data Sets: Overview
One of SAS’s greatest strengths is its ability to combine and
process more than one data set at a time. The main tools used to
do this are the set, merge and update statements, along with the
by statement and first. and last. variables.
We’ll look at the following situations:
• Concatenating datasets by observation
• Interleaving several datasets based on a single variable value
• One-to-one matching
• Simple Match Merging, including table lookup
• More Complex Match Merging
Concatenating Data Sets (cont’d)
Consider two data sets clerk and manager:
clerk:
Name   Store    Position  Rank
Joe    Central  Sales     5
Harry  Central  Sales     5
Sam    Mall     Stock     3
manager:
Name   Store    Position  Staff
Fred   Central  Manager   10
John   Mall     Manager   12
The SAS statements to concatenate the data sets are:
data both;
set clerk manager;
run;
resulting in the following data set:
Name Store Position Rank Staff
Joe Central Sales 5 .
Harry Central Sales 5 .
Sam Mall Stock 3 .
Fred Central Manager . 10
John Mall Manager . 12
Note that the variable staff is missing for all observations from set
clerk, and rank is missing for all observations from manager. The
observations are in the same order as the input data sets.
Interleaving Datasets based on a Single Variable
If you want to combine several datasets so that observations
sharing a common value are all adjacent to each other, you can list
the datasets on a set statement, and specify the variable to be
used on a by statement. Each of the datasets must be sorted by the
variable on the by statement.
For example, suppose we had three data sets A, B, and C, and each
contained information about employees at different locations:
Set A:
Loc  Name   Salary
NY   Harry  25000
NY   Fred   20000
NY   Jill   28000
SF   Bob    19000
Set B:
Loc  Name   Salary
LA   John   18000
NY   Joe    25000
SF   Bill   19000
SF   Amy    29000
Set C:
Loc  Name   Salary
NY   Sue    19000
NY   Jane   22000
SF   Sam    23000
SF   Lyle   22000
Notice that there are not equal numbers of observations from the
different locations in each data set.
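A sketch of how these three data sets could be interleaved. The data values are those listed for sets A, B and C; the output data set name all is an assumption.

```sas
data a;
    input loc $ name $ salary;
    datalines;
NY Harry 25000
NY Fred 20000
NY Jill 28000
SF Bob 19000
;
run;

data b;
    input loc $ name $ salary;
    datalines;
LA John 18000
NY Joe 25000
SF Bill 19000
SF Amy 29000
;
run;

data c;
    input loc $ name $ salary;
    datalines;
NY Sue 19000
NY Jane 22000
SF Sam 23000
SF Lyle 22000
;
run;

data all;
    set a b c;                /* list the data sets to combine */
    by loc;                   /* interleave on the common variable */
run;

proc print data=all;
run;
```

The combined data set holds all twelve observations grouped by location: the LA observation first, then the NY observations, then the SF observations. Within each location, observations from A come before those from B, which come before those from C.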
One-to-one matching
To combine variables from several data sets where there is a
one-to-one correspondence between the observations in each of the
data sets, list the data sets to be joined on a merge statement. The
output data set created will have as many observations as the
largest data set on the merge statement. If more than one data set
has variables with the same name, the value from the rightmost
data set on the merge statement will be used.
You can use as many data sets as you want on the merge
statement, but remember that they will be combined in the order
in which the observations occur in the data sets.
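A minimal sketch of a one-to-one merge, using two invented data sets of different sizes.

```sas
data names;
    input name $;
    datalines;
ann
bob
cal
;
run;

data ages;
    input age;
    datalines;
34
27
;
run;

data combined;
    merge names ages;         /* no by statement: match by position */
run;

proc print data=combined;
run;
```

The result has three observations, as many as the larger data set; age is missing in the third because ages has only two observations.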
Simple Match Merging
When there is not an exact one-to-one correspondence between
data sets to be merged, the variables to use to identify matching
observations can be specified on a by statement. The data sets
being merged must be sorted by the variables specified on the by
statement.
Notice that when there is exactly one observation with each by
variable value in each data set, this is the same as the one-to-one
merge described above. Match merging is especially useful if you’re
not sure exactly which observations are in which data sets.
By using the IN= data set option, explained later, you can
determine from which data set(s) a merged observation is derived.
Simple Match Merging (cont’d)
Here’s the result of the merge:
ID Score1 Score2 Score3 Score4
7 20 18 19 12
9 15 19 . .
10 . . 12 20
12 9 15 10 19
Notes
1. All datasets must be sorted by the variables on the by
statement.
2. If an observation was missing from one or more data sets, the
values of the variables which were found only in the missing
data set(s) are set to missing.
3. If there are multiple occurrences of a common variable in the
merged data sets, the value from the rightmost data set is used.
Table Lookup
Consider a dataset containing a patient name and a room number,
and a second data set with doctors' names corresponding to each of
the room numbers. There are many observations with the same
room number in the first data set, but exactly one observation for
each room number in the second data set. Such a situation is called
table lookup, and is easily handled with a merge statement
combined with a by statement.
Patients              Doctors
Patient    Room       Doctor      Room
Smith      215        Reed        215
Jones      215        Ellsworth   217
Williams   215        . . .
Johnson    217
Brown      217
. . .
Table Lookup (cont’d)
The following statements combine the two data sets.
data both;
merge patients doctors;
by room;
run;
resulting in data set both
Patient Room Doctor
Smith 215 Reed
Jones 215 Reed
Williams 215 Reed
Johnson 217 Ellsworth
Brown 217 Ellsworth
. . .
Notes:
• As always, both data sets must be sorted by the variables on
the by list.
• The data set with one observation per by variable must be the
second dataset on the merge statement.
Example: update statement
Set orig                      Set upd
ID  Account  Balance          ID  Account  Balance
 1   2443     274.40           1     .      699.00
 2   4432      79.95           2   2232       .
 3   5002     615.00           2     .      189.95
                               3   6100     200.00
Data set orig can be updated with the values in upd using the
following statements:
data orig;
update orig upd;
by id;
resulting in the updated data set:
ID Account Balance
1 2443 699.00
2 2232 189.95
3 6100 200.00
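The update above can be tried with a self-contained sketch like the following, which enters the two data sets with datalines. Writing the result to a new data set called updated (rather than overwriting orig) is a choice made here so the original values stay available for comparison.

```sas
data orig;
   input id account balance;
   datalines;
1 2443 274.40
2 4432 79.95
3 5002 615.00
;
run;

data upd;
   input id account balance;
   datalines;
1 . 699.00
2 2232 .
2 . 189.95
3 6100 200.00
;
run;

/* Missing values in upd leave the corresponding values of orig unchanged */
data updated;
   update orig upd;
   by id;
run;

proc print data=updated noobs;
   format balance 8.2;
run;
```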
More Complex Merging (cont’d)
The following example, although artificial, illustrates some of the
points about complex merging:
      one                 two                   three
  a   b   c           a   b   d            a   b   c   d
  1   3  20           1   3  17            1   3  20  17
  1   3  19           1   5  12            1   5  19  12
  1   7  22           2   9  21     =⇒     1   7  22  12
  2   9  18           2   3  15            2   9  18  21
  2   3  22           2   6  31            2   3  22  15
                                           2   6  22  31
The data sets were merged with the following statements:
data three;
merge one two;
by a;
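To experiment with this example, the two input data sets can be entered directly, as in the sketch below. Within each by group, once the shorter data set runs out of observations, its last values are carried along, which is why d stays at 12 in the third observation and c stays at 22 in the last.

```sas
data one;
   input a b c;
   datalines;
1 3 20
1 3 19
1 7 22
2 9 18
2 3 22
;
run;

data two;
   input a b d;
   datalines;
1 3 17
1 5 12
2 9 21
2 3 15
2 6 31
;
run;

/* b appears in both inputs; the value read last (from two, when present) wins */
data three;
   merge one two;
   by a;
run;

proc print data=three noobs;
run;
```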
Example of in= data set option
data both problem;
merge scores1(in=one) scores2(in=two);
by id;
if one and two then output both;
else output problem;
run;
The resulting data sets are shown below; note that the in=
variables are not output to the data sets which are created.
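A runnable sketch of the same idea, using made-up contents for scores1 and scores2. Since the in= variables are temporary, their values are copied to ordinary variables (in1 and in2, names chosen here) in case they should be kept.

```sas
data scores1;
   input id score1 score2;
   datalines;
7 20 18
9 15 19
12 9 15
;
run;

data scores2;
   input id score3 score4;
   datalines;
7 19 12
10 12 20
12 10 19
;
run;

data both problem;
   merge scores1(in=one) scores2(in=two);
   by id;
   in1 = one;   /* copies of the temporary in= flags */
   in2 = two;
   if one and two then output both;
   else output problem;
run;

/* ids 9 and 10 appear in only one input, so they land in problem */
proc print data=problem noobs;
run;
```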
Application: Finding Duplicate Observations I
Many data sets are arranged so that there should be exactly one
observation for each unique combination of variable values. In the
simplest case, there may be an identifier like a social security or
student identification number, and we want to check to make sure
there are not multiple observations with the same value for that
variable.
If the data set is sorted by the identifier variable (say, ID), code like
the following will identify the duplicates:
data check;
set old;
by id;
if first.id and ^last.id;
run;
The duplicates can now be found in data set check.
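A small sketch with made-up data shows the effect. The second data step is a variation, not part of the program above: it keeps every observation that belongs to a duplicated id rather than only the first one.

```sas
data old;
   input id score;
   datalines;
101 88
102 75
102 79
103 91
104 60
104 62
104 65
;
run;

/* One observation per duplicated id (ids 102 and 104) */
data check;
   set old;
   by id;
   if first.id and ^last.id;
run;

/* Variation: all observations whose id occurs more than once */
data alldups;
   set old;
   by id;
   if not (first.id and last.id);
run;

proc print data=alldups noobs;
run;
```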
Example of first. and last. variables (cont’d)
Here are the results of a simple example of the previous program:
Set grp
Group   X
  1    16
  1    12
  1    19              Set max
  1    15              Group  X1  X2  X3
  1    18                1    19  18  17
  1    17      =⇒        2    30  20  14
  2    10                3    59  45  18
  2    20
  2     8
  2    14
  2    30
  3    59
  3    45
  3     2
  3    18
Sorting datasets
For procedures or data steps which need a data set to be sorted by
one or more variables, there are three options available:
1. You can use proc sort; to sort your data. Note that SAS
stores information about the sort in the dataset header, so that
it will not re-sort already sorted data.
2. You can enter your data in sorted order. If you choose this
option, make sure that you use the correct sorting order.* To
prevent proc sort from re-sorting your data, use the
sortedby= data set option when you create the data set.
3. You can create an index for one or more combinations of
variables which will be stored along with the data set.
* EBCDIC Sorting Sequence (IBM mainframes):
blank .<(+|&!$*);^-/,%_>?`:#@'="abcdefghijklmnopqr~stuvwxyz{ABCDEFGHI}JKLMNOPQR\STUVWXYZ0123456789
ASCII Sorting Sequence (most other computers):
blank !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
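The first two options can be sketched as follows. The data sets and values are invented for illustration; empsort and presort are names chosen here.

```sas
data employ;
   input branch $ empno salary;
   datalines;
west 12 31000
east 7 28000
west 3 35000
east 15 30000
;
run;

/* Option 1: let proc sort do the work, writing to a new data set */
proc sort data=employ out=empsort;
   by branch empno;
run;

/* Option 2: data entered in sorted order, with the order recorded
   in the data set header through sortedby= */
data presort(sortedby=id);
   input id x;
   datalines;
1 10
2 20
3 15
;
run;

/* proc contents reports the sort information stored with the data set */
proc contents data=presort;
run;
```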
Indexed Data Sets
If you will be processing a data set using by statements, or
subsetting your data based on the value(s) of one or more variables,
you may want to store an index based on those variables to speed
future processing.
proc datasets is used to create an index for a SAS data set.
Suppose we have a data set called company.employ, with variables
for branch and empno. To store a simple index (based on a single
variable), statements like the following are used:
proc datasets library=company;
modify employ;
index create branch;
run;
More than one index for a data set can be specified by including
multiple index statements. The index will be used whenever a by
statement with the indexed variable is encountered.
In the previous example, the composite index would mean the data
set is also indexed for branch, but not for idnum.
Note: If you are moving or copying an indexed data set, be sure to
use a SAS procedure like proc copy, datasets, or cport rather
than system utilities, to ensure that the index gets correctly copied.
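A composite index on branch and idnum could be created with a sketch like the one below. The index name brid, the use of the WORK library, and the data values are assumptions made for illustration; only the variable names come from the text.

```sas
/* Deliberately unsorted data */
data employ;
   input branch $ idnum salary;
   datalines;
west 12 31000
east 7 28000
west 3 35000
east 15 30000
;
run;

/* A composite index needs its own name, distinct from any variable name */
proc datasets library=work nolist;
   modify employ;
   index create brid = (branch idnum);
quit;

/* By-group processing on the indexed variables without sorting first */
data bybranch;
   set employ;
   by branch idnum;
run;

proc print data=bybranch noobs;
run;
```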
Formats and Informats
Formats are used for controlling the appearance of variable values
in many procedures as well as for printing variable values using the
put statement of the data step. In addition, many procedures use
the format statement to group values together for analysis.
Informats are used in conjunction with the input statement to
specify the way that variables are to be read if they are not in the
usual numeric or character format.
SAS provides a large number of predefined formats, as well as the
ability to write your own formats, for input as well as output.
You can permanently associate a format with a variable by
including a format or attribute statement in the data step when
the data set is created, or temporarily by using a format statement
within a procedure.
Basic Formats
Numeric formats are of the form w. or w.d, representing a field
width of w, and containing d decimal places.
put x 6.; *write x with field width of 6;
format price 8.2; *use field width of 8 and 2 d.p. for price;
The bestw. format can be used if you’re not sure about the
number of decimals. (For example, best6. or best8..)
Simple character formats are of the form $w., where w is the
desired field width. (Don’t forget the period.)
put name $20.; * write name with field width of 20;
format city $50.; * use field width of 50 for city;
You can also use formats with the put function to create character
variables formatted to your specifications:
x = 8;
charx = put(x,8.4);
creates a character variable called charx whose value is 8.0000
(right-aligned in a field of width 8).
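The formats described on this slide can be tried in a data _null_ step, which writes to the log. The variable names and values in this sketch are arbitrary.

```sas
data _null_;
   x = 8;
   price = 1234.5;
   name = "Smith";
   put x 6.;              /* field width 6, no decimals */
   put price 8.2;         /* field width 8, 2 decimal places */
   put price best8.;      /* SAS chooses the number of decimals */
   put name $20. "|";     /* the bar marks the end of the 20-column field */
   charx = put(x, 8.4);   /* character variable built from a numeric one */
   put charx=;
run;
```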
Informats
The basic informats are the same as the basic formats, namely w.d
for numeric values, and $w. for character variables.
By default, leading blanks are stripped from character values. To
retain them, use the $charw. informat.
When you specify a character informat wider than the default of 8
columns, SAS automatically will make sure the variable is big
enough to hold your input values.
Some Other SAS Informats
Name       Description               Name        Description
hexw.      numeric hexadecimal       $hexw.      character hexadecimal
octalw.    numeric octal             $octalw.    character octal
bzw.d      treat blanks as zeroes    ew.d        scientific notation
rbw.d      floating point binary     ibw.d       integer binary
pdw.d      packed decimal            $ebcdicw.   EBCDIC to ASCII
User-defined Format: Examples
For a variable with values from 1 to 5, the format qval. displays 1
and 2 as low, 3 as medium and 4 and 5 as high.
The format mf. displays values of 1 as male, 2 as female and all
other values as invalid.
The format tt. displays values below .001 as undetected, and all
other values in the usual way.
proc format;
value qval 1-2='low' 3='medium' 4-5='high';
value mf 1='male' 2='female' other='invalid';
value tt low-.001='undetected';
run;
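Once defined, such formats are attached to variables with a format statement. The sketch below repeats two of the definitions so that it runs on its own; the data set survey and its variables rating and sex are hypothetical.

```sas
proc format;
   value qval 1-2='low' 3='medium' 4-5='high';
   value mf 1='male' 2='female' other='invalid';
run;

data survey;
   input rating sex;
   datalines;
1 1
3 2
5 2
4 9
2 1
;
run;

/* Formatted values are displayed in place of the stored numbers */
proc print data=survey noobs;
   format rating qval. sex mf.;
run;

/* proc freq groups observations by formatted value */
proc freq data=survey;
   tables rating;
   format rating qval.;
run;
```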
Recoding Values using Formats (cont’d)
If the two variables in our survey are called years and happy, the
following program would produce a cross tabulation:
proc freq;
tables years*happy / nocol norow nocum nopct;
SAS Date and Time Values
There are three types of date and time values which SAS can
handle, shown with their internal representation in parentheses:
• Time values (number of seconds since midnight)
• Date values (number of days since January 1, 1960)
• Datetime values (number of seconds since January 1, 1960)
You can specify a date and/or time as a constant in a SAS program
by surrounding the value in quotes, and following it with a t, a d or
a dt. The following examples show the correct format to use:
3PM ⇒ '3:00p't or '15:00't or '15:00:00't
January 4, 1937 ⇒ '4jan37'd
9:15AM November 3, 1995 ⇒
'3nov95:9:15'dt or '3nov95:9:15:00'dt
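A short data _null_ step can be used to look at the internal values behind such constants. This sketch uses four-digit years so that the result does not depend on how two-digit years are interpreted; the variable names are arbitrary.

```sas
data _null_;
   t  = '15:00't;                /* 54000 seconds since midnight */
   d  = '04jan1937'd;            /* negative: before January 1, 1960 */
   dt = '03nov1995:09:15:00'dt;
   d0 = '01jan1960'd;            /* day zero of the SAS date scale */
   put t= d= dt= d0=;            /* raw internal values */
   put t time8. +2 d date9. +2 dt datetime18.;   /* formatted values */
run;
```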
Other Date and Time Informats and Formats
Date and Time Functions
datepart – Extract date from datetime value
dateonly = datepart(fulldate);
day, month, year – Extract part of a date value
day = day(date);
dhms – Construct value from date, hour, minute and second
dtval = dhms(date,9,15,0);
mdy – Construct date from month, day and year
date = mdy(mon,day,1996);
time – Returns the current time of day
now = time();
today – Returns the current date
datenow = today();
intck – Returns the number of intervals between two values
days = intck('day',then,today());
intnx – Increments a value by a number of intervals
tomrw = intnx('day',today(),1);
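The following sketch exercises several of these functions on fixed dates, so the results are the same whenever it is run. The variable names are arbitrary; output goes to the log.

```sas
data _null_;
   date  = mdy(11, 3, 1995);          /* November 3, 1995 */
   dtval = dhms(date, 9, 15, 0);      /* 9:15:00 on that date */
   back  = datepart(dtval);           /* date portion recovered from the datetime */
   d = day(date);
   m = month(date);
   y = year(date);
   ndays = intck('day', '01jan1996'd, '31jan1996'd);   /* 30 day boundaries */
   next  = intnx('day', date, 1);     /* the following day */
   put date= date9. dtval= datetime18. back= date9.;
   put d= m= y= ndays=;
   put next= date9.;
run;
```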
Application: Blue Moons (cont’d)
Now we can use the put function to create a variable with the full
month name and the year.
data bluemoon;
set fullmoon;
by year month;
if last.month and not first.month then do;
when = put(date,monname.) || ", " || put(date,year.);
output;
end;
run;
proc print data=bluemoon noobs;
var when;
run;
The results look like this:
December, 1998
August, 2000
May, 2002
. . .
To print the values of the variables x and y on one line and name
and address on a second line, you could use:
put x 8.5 y best10. / name $20. @30 address ;
Additional Features of the put statement
By default, the put statement puts a newline after the last item
processed. To prevent this (for example, to build a single line with
multiple put statements), use a trailing @ at the end of the put
statement.
The n* operator repeats a string n times. Thus
put 80*"-";
prints a line full of dashes.
Following a variable name with an equal sign causes the put
statement to include the variable’s name in the output. For
example, the statements
x = 8;
put x=;
result in X=8 being printed to the current output file. The keyword
_all_ on the put statement prints out the values of all the variables
in the data set in this named format.
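These features can be combined in one small data _null_ step. The variables in this sketch are arbitrary and the output goes to the log.

```sas
data _null_;
   x = 8;
   y = 3.25;
   name = "Smith";
   put 40*"-";            /* a line of 40 dashes */
   put "x is " x @;       /* trailing @ holds the output line open */
   put "and y is " y;     /* continues on the same line */
   put x= y=;             /* named output */
   put _all_;             /* every variable, including _N_ and _ERROR_ */
run;
```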
Output Delivery System (ODS)
To provide more flexibility in producing output from SAS data
steps and procedures, SAS introduced the ODS. Using ODS, output
can be produced in any of the following formats (the parenthesized
keyword is used to activate a particular ODS stream):
• SAS data set (OUTPUT)
• Normal listing (LISTING) - monospaced font
• PostScript output (PRINTER) - proportional font
• PDF output (PDF) - Portable Document Format
• HTML output (HTML) - for web pages
• RTF output (RTF) - for inclusion in word processors
Many procedures produce ODS objects, which can then be output
in any of these formats. In addition, the ods option of the file
statement, and the ods option of the put statement allow you to
customize ODS output.
ODS Destinations
You can open an ODS output stream with the ODS command and a
destination keyword. For example, to produce HTML formatted
output from the print procedure:
ods html file="output.html";
proc print data=mydata;
run;
ods html close;
Using the print and ods options of the file statement, you can
customize ODS output:
ods printer;
data _null_;
file print ods;
... various put statements ...
run;
ods printer close;
SAS System Options
SAS provides a large number of options for fine tuning the way the
program behaves. Many of these are system dependent, and are
documented online and/or in the appropriate SAS Companion.
You can specify options in three ways:
1. On the command line when invoking SAS, for example
sas -nocenter -nodms -pagesize 20
2. In the system wide config.sas file, or in a local config.sas
file (see the SAS Companion for details).
3. Using the options statement:
options nocenter pagesize=20;
Note that you can precede the name of options which do not take
arguments with no to shut off the option. You can display the
value of all the current options by running proc options.
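A brief sketch of the options statement and proc options; the particular option values are arbitrary choices.

```sas
/* Turn centering off and set page and line sizes */
options nocenter pagesize=60 linesize=80;

/* Show the current setting of a single option in the log */
proc options option=pagesize;
run;

/* Show all current option settings */
proc options;
run;

/* Turn centering back on by dropping the no prefix */
options center;
```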
Application: Rescanning Input
Suppose we have an input file which has a county name on one line
followed by one or more lines containing x and y coordinates of the
boundaries of the county. We wish to create a separate observation,
including the county name, for each set of coordinates.
A segment of the file might look like this:
alameda
-121.55 37.82 -121.55 37.78 -121.55 37.54 -121.50 37.53
-121.49 37.51 -121.48 37.48
amador
-121.55 37.82 -121.59 37.81 -121.98 37.71 -121.99 37.76
-122.05 37.79 -122.12 37.79 -122.13 37.82 -122.18 37.82
-122.20 37.87 -122.25 37.89 -122.27 37.90
calaveras
-121.95 37.48 -121.95 37.48 -121.95 37.49 -122.00 37.51
-122.05 37.53 -122.07 37.55 -122.09 37.59 -122.11 37.65
-122.14 37.70 -122.19 37.74 -122.24 37.76 -122.27 37.79
-122.27 37.83 -122.27 37.85 -122.27 37.87 -122.27 37.90
. . .
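One possible way to handle such a file is sketched below. It is not necessarily the approach the author had in mind. It assumes that county names never look like numbers, reads a shortened copy of the data from datalines instead of an external file, and uses variable names (token, county, x, y) chosen here.

```sas
data counties;
   length county $ 12 token $ 12;
   retain county;
   infile datalines missover;
   input token $ @;                  /* read the first field and hold the line */
   if input(token, ?? best12.) = . then do;
      county = token;                /* not a number, so it is a county name */
      delete;                        /* no observation for the name line itself */
   end;
   input @1 x y @;                   /* rescan the same line from column 1 */
   do while (x ne .);
      output;                        /* one observation per coordinate pair */
      input x y @;                   /* missover gives missing at end of line */
   end;
   drop token;
   datalines;
alameda
-121.55 37.82 -121.55 37.78 -121.55 37.54 -121.50 37.53
-121.49 37.51 -121.48 37.48
amador
-121.55 37.82 -121.59 37.81 -121.98 37.71 -121.99 37.76
;
run;

proc print data=counties;
run;
```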
Application: Reshaping a Data Set I
Since SAS procedures are fairly rigid about the organization of
their input, it is often necessary to use the data step to change the
shape of a data set. For example, repeated measurements on a
single subject may be on several observations, and we may want
them all on the same observation. In essence, we want to perform
the following transformation:
Subj  Time   X
  1     1   10
  1     2   12
      ···                   Subj  X1  X2  ···  Xn
  1     n    8     =⇒         1   10  12  ···   8
  2     1   19                2   19   7  ···  21
  2     2    7
      ···
  2     n   21
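One way to carry out this transformation in a data step is sketched below, assuming n = 3 time points and input sorted by subj. The data set names long and wide and the array name xx are chosen here for illustration.

```sas
data long;
   input subj time x;
   datalines;
1 1 10
1 2 12
1 3 8
2 1 19
2 2 7
2 3 21
;
run;

data wide;
   set long;
   by subj;
   array xx{3} x1-x3;
   retain x1-x3;                      /* keep values across observations */
   if first.subj then do i = 1 to 3;  /* start each subject with missing values */
      xx{i} = .;
   end;
   xx{time} = x;                      /* time selects which variable to fill */
   if last.subj then output;          /* one observation per subject */
   keep subj x1-x3;
run;

proc print data=wide noobs;
run;
```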
Application: Reshaping a Data Set II
A similar problem to the last is the case where the data for several
observations is contained on a single line, and it is necessary to
convert each of the lines of input into several observations. Suppose
we have test scores for three different tests, for each of several
subjects; we wish to create three separate observations from each of
these sets of test scores:
data scores;
* assume set three contains id, group and score1-score3;
set three;
array sss score1-score3;
do time = 1 to dim(sss);
score = sss{time};
output;
end;
drop score1-score3;
run;
Using ODS to create data sets
Many procedures use the output delivery system to provide
additional control over the output data sets that they produce. To
find out if ODS tables are available for a particular procedure, use
the following statement before the procedure of interest:
ods trace on;
Each table will produce output similar to the following on the log:
Output Added:
-------------
Name: ExtremeObs
Label: Extreme Observations
Template: base.univariate.ExtObs
Path: Univariate.x.ExtremeObs
-------------
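Once the table name is known, an ods output statement can route that table to a data set. The sketch below uses the ExtremeObs table from the trace output above; the data set names new and extremes and the values are illustrative.

```sas
data new;
   input x @@;
   datalines;
12 19 14 21 15 33 27 82 99 7 45 60
;
run;

ods trace on;                      /* list table names in the log */
ods output ExtremeObs=extremes;    /* save the named table as a data set */
proc univariate data=new;
   var x;
run;
ods trace off;

proc print data=extremes;
run;
```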
Output Data Sets: Example I
It is often useful to have summary information about a data set
available when the data set is being processed. Suppose we have a
data set called new, with a variable x, and we wish to calculate a
variable px equal to x divided by the maximum value of x.
proc summary data=new;
var x;
output out=sumnew max=maxx;
run;
data final;
if _n_ = 1 then set sumnew(keep=maxx);
set new;
px = x / maxx;
run;
The nway option limits the output data set to observations for
each unique combination of all the variables given in the class
statement, omitting the overall and lower-order summaries.
Output Data Sets: Example III
Suppose we have a data set called hdata, consisting of three
variables: hospital, time and score, representing the score of
some medical exam taken at three different times at three different
hospitals, and we’d like to produce a plot with three lines: one for
the means of each of the three hospitals over time. The following
statements could be used:
proc means noprint nway data=hdata;
class hospital time;
var score;
output out=hmeans mean=mscore;
run;
Plotting the Means
The following program produces the graph shown on the right:
symbol1 interpol=join
value=plus;
symbol2 interpol=join
value=square;
symbol3 interpol=join
value=star;
title "Means versus Time";
proc gplot data=hmeans;
plot mscore*time=hospital;
run;
SAS Macro Language: Overview
At its simplest level, the SAS Macro language allows you to assign
a string value to a variable and to use the variable anywhere in a
SAS program:
%let header = "Here is my title";
. . .
proc print ;
var x y z;
title &header;
run;
This would produce exactly the same result as if you typed the
string "Here is my title" in place of &header in the program.
Notice that the substitution is very simple - the text of the macro
variable is substituted for the macro symbol in the program.
SAS Macro Language: Overview (cont’d)
A large part of the macro facility’s utility comes from the macro
programming statements which are all preceded by a percent sign
(%). For example, suppose we need to create 5 data sets, named
sales1, sales2, etc., each reading from a corresponding data file
insales1, insales2, etc.
%macro dosales;
%do i=1 %to 5;
data sales&i;
infile "insales&i";
input dept $ sales;
run;
%end;
%mend dosales;
%dosales;
Note that, until the last line is entered, no actual SAS statements
are carried out; the macro is only compiled.
call symput: Example
Suppose we want to put the maximum value of a variable in a title.
The following program shows how.
data new;
retain max -1e20;
set salary end = last;
if salary > max then max = salary;
if last then call symput("maxsal",max);
drop max;
run;
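A self-contained sketch of the whole idea follows. It differs from the program above in two ways that are choices made here: the running maximum is kept with the max function in a variable called biggest, and call symputx (a variant of call symput in more recent SAS releases that strips blanks from the value) is swapped in. The data values are made up.

```sas
data salary;
   input name $ salary;
   datalines;
Ann 41000
Bob 38500
Cal 52250
;
run;

data _null_;
   set salary end=last;
   retain biggest;
   biggest = max(biggest, salary);    /* max ignores the initial missing value */
   if last then call symputx("maxsal", biggest);
run;

/* Double quotes are needed so that the macro variable is resolved */
title "Salaries of employees (maximum = &maxsal)";
proc print data=salary noobs;
run;
title;
```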
Another Alternative to the Macro Facility
In addition to writing SAS statements to a file, SAS provides the
call execute function. This function takes a quoted string, a
character variable, or a SAS expression which resolves to a
character variable and then executes its input when the current
data step is completed.
For example, suppose we have a data set called new which contains
a variable called maxsal. We could generate a title statement
containing this value with statements like the following.
data _null_;
set new;
call execute('title "Salaries of employees (Maximum = '
|| put(maxsal,6.) || ')";');
run;
Application: Reading a Series of Files
Suppose we have a data set containing the names of files to be read,
and we wish to create data sets of the same name from the data in
those files. First, we use the call symput function in the data step
to create a series of macro variables containing the file names:
data _null_;
set files end=last;
n + 1;
call symput("file"||left(n),trim(name));
if last then call symput("num",n);
run;
Since macros work by simple text substitution, it is important that
there are no blanks in either the macro variable name or its value;
hence the use of left and trim.
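To see how the numbered macro variables can then be used, the sketch below builds a small files data set (the names alpha, beta and gamma are invented) and loops over the variables with a hypothetical macro called showfiles. It only writes the names to the log; a real program would put data and infile statements inside the loop.

```sas
data files;
   input name $;
   datalines;
alpha
beta
gamma
;
run;

data _null_;
   set files end=last;
   n + 1;
   call symput("file"||left(n), trim(name));
   if last then call symput("num", left(n));
run;

%macro showfiles;
   %do i = 1 %to &num;
      /* &&file&i resolves first to &file1, then to the stored name */
      %put File &i is &&file&i;
   %end;
%mend showfiles;

%showfiles
```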
Macros with Arguments
Consider the following program to print duplicate cases with
common values of the variables a and b in data set one:
data one;
input a b y @@;
datalines;
1 1 12 1 1 14 1 2 15 1 2 19 1 3 15 2 1 19 2 4 16 2 4 12 2 8 18 3 1 19
proc summary data=one nway ;
class a b;
output out=next(keep = a b _freq_ rename=(_freq_ = count));
data dups;
merge one next;
by a b;
if count > 1;
proc print data=dups;
run;
If we had a simple way of changing the input data set name and the
list of variables on the by statement, we could write a general
macro for printing duplicates in a data set.
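A sketch of such a macro, using keyword parameters, is given below. The macro name prdups, its parameters data= and by=, and the temporary data set names (_next, _srt, _dups) are all choices made here for illustration.

```sas
%macro prdups(data=, by=);
   /* count the observations for each combination of the by variables */
   proc summary data=&data nway;
      class &by;
      output out=_next(keep=&by _freq_ rename=(_freq_=count));
   run;
   /* sort a copy so the merge works even if the input is unsorted */
   proc sort data=&data out=_srt;
      by &by;
   run;
   data _dups;
      merge _srt _next;
      by &by;
      if count > 1;
   run;
   proc print data=_dups;
   run;
%mend prdups;

data one;
   input a b y @@;
   datalines;
1 1 12 1 1 14 1 2 15 1 2 19 1 3 15 2 1 19 2 4 16 2 4 12 2 8 18 3 1 19
;
run;

%prdups(data=one, by=a b)
```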
Accessing Operating System Commands
If you need to run an operating system command from inside a
SAS program, you can use the x command. Enclose the command
in quotes after the x, and end the statement with a semicolon. The
command will be executed as soon as it is encountered.
For example, in an earlier program, a file called tmpprog.sas was
created to hold program statements which were later executed. To
remove the file after the statements were executed (on a UNIX
system) you could use the SAS statement:
x 'rm tmpprog.sas';
Other interfaces to the operating system may be available. For
example, on UNIX systems the pipe keyword can be used on a
filename statement to have SAS read from or write to a process
instead of a file. See the SAS Companion for your operating system
for more details.
SAS/CONNECT
SAS also provides a product called SAS/CONNECT which lets you
initiate a SAS job on a remote computer from a local SAS display
manager session. It also provides two procedures, proc upload and
proc download to simplify transporting data sets. If
SAS/CONNECT is available on the machines between which the
data set needs to be moved, it may be the easiest way to move the
data set.
SAS/CONNECT must be run from the display manager. When
you connect with the other system, you will be prompted for a
login name and a password (if appropriate). Once you’re
connected, the rsubmit display manager command will submit jobs
to the remote host, even though the log and output will be
managed by the local host.
proc transpose
Occasionally it is useful to switch the roles of variables and
observations in a data set. The proc transpose program takes
care of this task.
To understand the workings of proc transpose, consider a data
set with four observations and three variables (x, y and z). Here’s
the transformation proc transpose performs:
Original data              Transposed data
 X   Y   Z                 _NAME_  COL1  COL2  COL3  COL4
12  19  14                   X      12    21    33    14
21  15  19        =⇒         Y      19    15    27    32
33  27  82                   Z      14    19    82    99
14  32  99
The real power of proc transpose becomes apparent when it’s
used with a by statement.
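The transformation shown in the table can be reproduced with the sketch below; the data set names orig and flipped are chosen here.

```sas
data orig;
   input x y z;
   datalines;
12 19 14
21 15 19
33 27 82
14 32 99
;
run;

/* With no other statements, all numeric variables are transposed */
proc transpose data=orig out=flipped;
run;

proc print data=flipped noobs;
run;
```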
proc transpose with a by statement (cont’d)
To make sure proc transpose understands the structure that we
want in the output data set, an id statement is used to specify
time as the variable which defines the new variables being created.
The prefix= option controls the name of the new variables:
proc transpose data=one out=two prefix=value;
by subj;
id time;
Notice that the missing value for subject 2, time 2 was handled
correctly.
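A self-contained sketch of this use of proc transpose is shown below. The values are made up, with no observation for subject 2 at time 2, and the var statement is added here to make explicit which variable is being transposed.

```sas
data one;
   input subj time x;
   datalines;
1 1 12
1 2 15
1 3 19
2 1 17
2 3 14
3 1 21
3 2 20
3 3 23
;
run;

/* New variables are named value1, value2 and value3 from the values of time */
proc transpose data=one out=two prefix=value;
   by subj;
   id time;
   var x;
run;

/* value2 is missing for subject 2 */
proc print data=two noobs;
run;
```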
proc contents
Since SAS data sets cannot be read like normal files, it is important
to have tools which provide information about data sets. proc
print can show what’s in a data set, but it may not always be
appropriate. The var and libname windows of the display manager
are other useful tools, but to get printed information or to
manipulate that information, you should use proc contents.
Among other information, proc contents provides the name,
type, length, format, informat and label for each variable in the
data set, as well as the creation date and time and the number of
observations. To use proc contents, specify the data set name as
the data= argument of the proc contents statement; to see the
contents of all the data sets in a directory, define an appropriate
libname for the directory, and provide a data set name of the form
libname._all_.
Options for proc contents
The short option limits the output to just a list of variable names.
The position option orders the variables by their position in the
data set, instead of the default alphabetical order. This can be
useful when working with double dashed lists.
The directory option provides a list of all the data sets in the
library that the specified data set comes from, along with the usual
output for the specified data set.
The nods option, when used in conjunction with a data set of the
form libname._all_, limits the output to a list of the data sets in
the specified libname, with no output for the individual data sets.
The out= option specifies the name of an output data set to
contain the information about the variables in the specified data
set(s). The program on the next slide uses this data set to write a
human readable version of a SAS data set.
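A few of these options are illustrated in the sketch below. It assumes the sashelp library and its sample data set class, which are shipped with most SAS installations; the output data set name vars is chosen here.

```sas
/* Variables listed in their order in the data set */
proc contents data=sashelp.class position;
run;

/* Just the names of the data sets in the library */
proc contents data=sashelp._all_ nods;
run;

/* Variable information saved to a data set instead of being printed */
proc contents data=sashelp.class noprint
              out=vars(keep=name type length varnum);
run;

proc print data=vars noobs;
run;
```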
The Display Manager
When SAS is invoked, it displays three windows to help you
interact with your programs and output:
• Program Window - for editing and submitting SAS statements
• Log Window - for viewing and saving log output
• Output Window - for viewing and saving output
Some commands which open other useful windows include:
• assist - menu driven version of SAS
• dir - shows data sets in a library
• var - shows variables in a data set
• notepad - simple text window
• options - view and change system options
• filename - view current filename assignments
• help - interactive help system
• libname - view current libname assignments
Entering Display Manager Commands
You can type display manager commands on the command line of
any display manager window. (To switch from menu bar to
command line select Globals -> Options -> Command line; to
switch back to menu bar, enter the command command.)
You can also enter display manager commands from the program
editor by surrounding them in quotes, and preceding them by dm,
provided that the display manager is active.
Some useful display manager commands which work in any window
include:
• clear - clear the contents of the window
• end - close the window
• endsas - end the SAS session
• file "filename" - save contents of the window to filename
• prevcmd - recall previous display manager command
Using the Program Editor
There are two types of commands which can be used with the
program editor
• Command line commands are entered in the Command ===>
prompt, or are typed into a window when menus are in effect.
• Line commands are entered by typing over the numbered lines
on the left hand side of the editor window. Many of the line
commands allow you to operate on multiple selected lines of
text.
In addition, any of the editor or other display manager commands
can be assigned to a function or control key, as will be explained
later.
Note: The undo command can be used to reverse the effect of
editing commands issued in the display manager.
Defining Function and Control Keys
You can define function keys, control keys, and possibly other keys
depending on your operating system, through the keys window of
the display manager.
To define a function key to execute a display manager command,
enter the name of the command in the right hand field next to the
key you wish to define.
To define a function key to execute an editor line command, enter
the letter(s) corresponding to the command preceded by a colon (:)
in the right hand field.
To define a function key to insert text into the editor, precede the
text to be inserted with a tilde (~) in the right hand field.
Some display manager commands only make sense when defined
through keys. For example the command home puts the cursor on
the command line of a display manager window.
Cutting and Pasting
If block moves and/or copies do not satisfy your editing needs, you
can cut and paste non-rectangular blocks of text. Using these
commands generally requires that keys have been defined for the
display manager commands mark, cut, and paste, or home.
To define a range of text, issue the mark command at the beginning
and end of the text you want cut or pasted. Then issue the cut
(destructive) or store (non-destructive) command. Finally, place
the cursor at the point to which you want the text moved, and
issue the paste command.
When using cut or store, you can optionally use the append
option, which allows you to build up the contents of the paste
buffer with several distinct cuts or copies, or the buffer=name
option, to create or use a named buffer.
Using the Find and Change Commands
• To change every occurrence of the string “sam” to “fred”,
ignoring the case of the first string, enter
change sam fred all icase
• To selectively change the word cat to dog, use
change cat dog word
followed by repeated use of rfind, to find the next occurrence
of the word, and rchange if a change is desired.
• To count the number of occurrences of the word fish, use
find fish all
and the count will be displayed under the command line.
If an area of text is marked (using the display manager mark
command), then search and/or find commands apply only to the
marked region.