Lesson 3 - Import SAS Dataset
Study Notes
The INPUT statement, which is part of the DATA step, tells SAS how to read your raw data.
To write an INPUT statement using list input, simply list the variable names after the
INPUT keyword in the order they appear in the data file. If the values are character (not
numeric), then place a dollar sign ($) after the variable name. Leave at least one space
between names, and remember to place a semicolon at the end of the statement.
DATA 'drive:\directory\filename';
SET datasetname;
RUN;
If your data lines are long, use the LRECL= option in the INFILE statement to specify a
record length at least as long as the longest record in your data file.
INFILE 'drive:\directory\filename.dat' LRECL=2000;
SAS Code Example 3.3 (similar dataset as example 3.1, but comma separated):
Try importing US_president.csv into the SASUSER library using the Import Wizard. Then print
the result as in example 3.1. You will observe the following in the log.
SAS Code Example 3.4 (similar dataset as example 3.1, now different spacing):
* Read internal data into SAS data set uspresid3_4;
DATA uspresid3_4;
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;
PROC PRINT DATA = uspresid3_4;
RUN;
• Space files
If the values in your raw data file are all separated by at least one space, then you can use
list input (also called free-format input). List input is an easy way to read raw data into SAS,
but with ease come a few limitations. You must read all the data in a record; you cannot skip
over unwanted values. Any missing data must be indicated with a period.
SAS Code Example 3.6 (same dataset as example 3.5, now in hsb.dat):
* Create a SAS data set named hsb3_6;
* Read the data file hsb.dat using list input;
DATA hsb3_6;
INFILE 'C:\Users\elainemo\hsb.dat';
INPUT id female race schtype $ read write math science;
RUN;
Column input has several features that make it useful for reading raw data. It can be used to
read character variable values that contain embedded blanks. Fields can be read in any order.
No placeholder is required for missing data, because a blank field is read as missing. Fields or
parts of fields can be re-read. Fields do not have to be separated by blanks or other delimiters.
SAS Code Example 3.8 (similar dataset as example 3.5, now different spacing):
The following is the data file named hsb_Column.dat with a ruler added on top (the actual
dataset does not have the ruler).
1---+----10---+----20--
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
18 . 3 pub 50 33 49 44
53 0 1 pub 39 40 39
50 0 2 pri 50 59 42 53
51 1 2 pri 42 36 42 31
102 0 1 . 52 41 51 53
* Create a SAS data set named hsb3_8;
* Read the hsb_Column.dat using column input;
DATA hsb3_8;
INFILE 'C:\Users\elainemo\hsb_Column.dat';
INPUT id 1-3 schtype $ 8-11 female 4-5 race 6-7
read 12-14 write 15-17 math 18-20 science 21-23;
RUN;
PROC PRINT DATA = hsb3_8;
TITLE 'Example 3.8';
RUN;
DATA weblogs;
INFILE 'C:\Users\elainemo\dogweblogs.dat';
INPUT @'[' AccessDate DATE11. @'GET' File :$20.;
/* Read the data file using column pointers */
RUN;
To read multiple lines of raw data per observation, use line pointers:
(1) Use a slash (/) to skip to the next line of raw data
(2) Use pound-n (#n) to specify the number of the line of raw data to read for that observation
(3) The #n pointer can be used to skip backwards or forwards between multiple data lines
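A minimal sketch of both line pointers, assuming each observation spans three lines of raw data. The data set name temps and the in-stream values are hypothetical.

```sas
* Each observation uses three data lines;
DATA temps;
   INPUT City $ State $
       / NormalHigh NormalLow
      #3 RecordHigh RecordLow;
   DATALINES;
Nome AK
55 44
88 29
Miami FL
90 75
97 65
;
RUN;

PROC PRINT DATA = temps;
RUN;
```

The slash moves to the second line of the observation, and #3 moves to the third line by number. Either pointer could be used for both moves; #n is the one that can also go backwards.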
To read multiple observations per line of raw data, use double trailing at signs:
(1) Place double trailing at signs (@@) at the end of the INPUT statement
(2) SAS will continue to read observations from the same line until it either runs out of data
or reaches an INPUT statement that does not end with double trailing at signs
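As a sketch of the double trailing at signs, the step below reads several City and Rain pairs from each data line. The data set name precip and the values are hypothetical.

```sas
* The @@ holds the current data line so the next iteration keeps reading it;
DATA precip;
   INPUT City $ Rain @@;
   DATALINES;
Nome 2.5 Miami 11.2 Raleigh 4.4
Boise 1.3 Tampa 9.8
;
RUN;

PROC PRINT DATA = precip;
RUN;
```

This creates five observations from two data lines. When SAS runs out of values on the first line, it moves on to the second.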
Sometimes raw data are not straightforward numeric or character values. Examples include the
nonstandard numeric data mentioned in the previous lesson, such as values that contain percent
signs, dollar signs, and commas, or date and time values. We can tell SAS how to read such
values by specifying informats in the INPUT statement.
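A short sketch of informats applied to nonstandard values, using the colon modifier so that each value is still read up to the next space. The data set name sales, the variable names, and the values are hypothetical.

```sas
* COMMA10. strips dollar signs and commas, PERCENT6. divides by 100,
  and MMDDYY10. converts the text to a SAS date value;
DATA sales;
   INPUT Item $ Price : COMMA10. Growth : PERCENT6. SaleDate : MMDDYY10.;
   FORMAT SaleDate DATE9.;
   DATALINES;
Widget $1,250.50 12% 10/31/2007
Gadget $980 5% 11-15-2007
;
RUN;

PROC PRINT DATA = sales;
RUN;
```

The FORMAT statement only affects how SaleDate is displayed. Internally the date is stored as a number of days.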
The variable Name has an informat of $16., meaning that it is a character variable 16 columns
wide. Variable Age has an informat of 3., so it is numeric, three columns wide, and has no
decimal places. The +1 skips over one column. Variable Type is character, and it is one column
wide. Variable Date has an informat MMDDYY10. and reads dates in the form 10-31-2007 or
10/31/2007, each 10 columns wide. The remaining variables, Score1 through Score5, all
require the same informat, 4.1. By putting the variables and the informat in separate sets of
parentheses, you only have to list the informat once.
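The INPUT statement that this paragraph describes does not appear in these notes. A sketch consistent with the description might look like the following. The data set name contest_sketch is hypothetical, and the two data lines are taken from the Pumpkin sample with column positions assumed to match the informats (Name in columns 1-16, Age in 17-19, Type in 21, Date in 23-32, then five 4-column scores).

```sas
* Formatted input: every variable is read from a fixed range of columns;
DATA contest_sketch;
   INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
         (Score1 Score2 Score3 Score4 Score5) (4.1);
   DATALINES;
Alicia Grossman  13 c 10-28-2008 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2008 6.5 5.9 6.8 6.0 8.1
;
RUN;

PROC PRINT DATA = contest_sketch;
RUN;
```

Because the scores already contain decimal points, the .1 in the 4.1 informat does not rescale them. It would only apply to a value written without a decimal point.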
SAS Code Example 3.15 (similar dataset as example 3.14, with different spacing):
The following is a sample of the data file named Pumpkin_List.dat.
Alicia Grossman  13 c 10-28-2008 7.8 6.5 7.2 8.0 7.9
Matthew Lee  9 D 10-30-2008 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia  10 C 10-29-2008 8.9 7.9 8.5 9.0 8.8
Lori Newcombe  6 D 10-30-2008 6.7 5.6 4.9 5.2 6.1
Jose Martinez  7 d 10-31-2008 8.9 9.5 10.0 9.7 9.0
Brian Williams  11 C 10-29-2008 7.8 8.4 8.5 7.9 8.0
DATA contest3_15;
INFILE 'C:\Users\elainemo\Pumpkin_List.dat';
LENGTH Name $ 16;
INPUT Name & Age Type $ Date : MMDDYY10.
Score1 Score2 Score3 Score4 Score5;
/* Read the data file using modified list input */
RUN;
PROC PRINT DATA = contest3_15;
TITLE 'Example 3.15';
RUN;
By default, SAS interprets two or more delimiters in a row as a single delimiter. For
delimiter-sensitive data files, in which two delimiters in a row indicate a missing value,
we can use the DSD option.
To read delimiter-sensitive data, use the DSD option in the INFILE statement. The DSD option
(1) Ignores delimiters in data values enclosed in quotation marks
(2) Does not read quotation marks as part of the data value
(3) Treats two delimiters in a row as a missing value
(4) Assumes the delimiter is a comma unless another delimiter is specified
(5) Can be combined with the DLM= option to specify other delimiters. For example,
DLM='09'X specifies a tab character, whose hexadecimal ASCII code is 09
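The DSD behaviours listed above can be sketched with a few comma-separated in-stream lines. The data set name visits, the variable names, and the values are hypothetical.

```sas
* DSD: commas inside quotation marks are kept, and two commas in a row
  mean a missing value;
DATA visits;
   INFILE DATALINES DSD;
   INPUT Name : $20. City : $15. Visits;
   DATALINES;
"Smith, John",Boston,3
Lee,,5
;
RUN;

PROC PRINT DATA = visits;
RUN;
```

In the first observation Name is read as Smith, John without the quotation marks. In the second observation City is missing. For a tab-delimited file, the INFILE statement would add DLM='09'X alongside DSD.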
DATA address3_18b;
INFILE 'C:\Users\elainemo\Address.dat' MISSOVER;
/* MISSOVER: assign a missing value when a data line
ends before a variable's field is complete */
INPUT Name $ 1-15 Number 16-19 Street $ 22-37;
RUN;
PROC PRINT DATA = address3_18b;
TITLE 'Example 3.18b';
RUN;
SAS Code Example 3.20 (similar dataset as example 3.2, with data description):
The US_president_Obs.dat has a description of the data in the first two lines and a remark at
the end of the file that was not part of the data.
Information of selected US presidents
President Party Number
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
Data copy from wiki
DATA uspresid3_20;
INFILE 'C:\Users\elainemo\US_president_Obs.dat'
FIRSTOBS=3 OBS=6;
/* Read the data from line 3 to 6 */
INPUT President $ Party $ Number;
RUN;
• Delimited files
Delimited files are raw data files in which a special character, often a comma or a tab,
separates the data values. We specify a different DBMS identifier for SAS to read each of
the following types of file:
Type of File                            Extension   DBMS Identifier
Comma-delimited                         .csv        CSV
Tab-delimited                           .txt        TAB
Delimiters other than commas or tabs    -           DLM
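A sketch of how the identifiers in the table might be used with PROC IMPORT. The folder of US_president.csv is assumed to be the same as in the other examples, the pipe-delimited file US_president_pipe.txt is hypothetical, both files are assumed to have variable names in their first row, and the output data set names are made up for illustration.

```sas
* Assumed path: the lesson does not say where US_president.csv is stored;
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president.csv'
            DBMS = CSV OUT = uspresid_csv REPLACE;
   GETNAMES = YES;   /* take variable names from the first row */
RUN;

* Hypothetical pipe-delimited file read with the DLM identifier;
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president_pipe.txt'
            DBMS = DLM OUT = uspresid_pipe REPLACE;
   DELIMITER = '|';  /* the character that separates the values */
   GETNAMES = YES;
RUN;
```

The DELIMITER statement is only needed with DBMS=DLM, because the CSV and TAB identifiers already imply their delimiter.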
Apart from the type of delimited file, there are also other options we can specify in the
PROC IMPORT statement:
SAS Code Example 3.22 (similar dataset as example 3.1, but saved in excel .xlsx file):
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president.xlsx'
DBMS=XLSX OUT = uspresid3_22 REPLACE;
/* file type extension .xlsx is excel file */
RUN;
PROC PRINT DATA = uspresid3_22;
TITLE 'Example 3.22';
RUN;