0% found this document useful (0 votes)
84 views

Reading Data in SAS

1. The document discusses different methods for reading data into SAS, including reading data inline, using column pointers, reading from ASCII files, and reading dBase files. It provides examples of code for each method. 2. Key points covered include using the INPUT statement to define variables, identifying character variables with a $, and reading multiple observations per line using the @@ symbol in the INPUT statement. The document also discusses using the INFILE and FILENAME statements to read from external files. 3. Reading external files provides more flexibility than reading inline, as it avoids having to copy all the raw data into the SAS program. Options like FIRSTOBS= and OBS= allow control over which records are read from the external file.

Uploaded by

Chand Sha
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
84 views

Reading Data in SAS

1. The document discusses different methods for reading data into SAS, including reading data inline, using column pointers, reading from ASCII files, and reading dBase files. It provides examples of code for each method. 2. Key points covered include using the INPUT statement to define variables, identifying character variables with a $, and reading multiple observations per line using the @@ symbol in the INPUT statement. The document also discusses using the INFILE and FILENAME statements to read from external files. 3. Reading external files provides more flexibility than reading inline, as it avoids having to copy all the raw data into the SAS program. Options like FIRSTOBS= and OBS= allow control over which records are read from the external file.

Uploaded by

Chand Sha
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 9

1.

Reading data inline


2. Reading data with column pointers
3. Reading data from ASCII files
4. Reading dBase files
5. Creating a permanent data set
6. Reading Excel files into SAS

6.1. Using the Import facility


6.2. Using PROC ACCESS
6.3. Using Dynamic Data Exchange (DDE)

A DATA step creates a SAS data set. Each data set consists of observations and variables.
You can think of a data set as a two-dimensional table where variables are the columns and
observations the rows. The columns are identified by variable names. The name of a variable has
to start with a character and can be up to eight characters long.

  Variable 1 Variable 2 Variable 3


Observation 1 ---->      
Observation 2 ---->      
Observation 3 ---->      

1. Reading data inline

Reading data inline means that the raw data is part of the SAS program. You enter the actual
data points inside the PROGRAM EDITOR. The following DATA step creates a data set named
SURVEY with ten observations and seven variables. The variables are called ID, SEX, AGE, INC,
R1, R2, and R3. The data stems from a consumer survey where INC denotes annual income in
thousands, R1 through R3 are rating scores of three particular products. The INPUT statement
defines the variables to be read in each line of data. The DATALINES statement indicates to SAS
that DATA step statements are completed and the next line contains real data. Notice that the
lines of data do not end in a semicolon. Instead of the DATALINES statement you can also use
the CARDS statement. The two are equivalent. CARDS stems from way back when reading data
meant feeding punch cards. The lone semicolon before the PROC PRINT statement is something
I am used to doing. You could delete that line altogether.

options nocenter;
DATA survey;
   INPUT id sex $ age inc r1 r2 r3 ;
   DATALINES;
 1  F  35 17  7 2 2
17  M  50 14  5 5 3
33  F  45  6  7 2 7
49  M  24 14  7 5 7
65  F  52  9  4 7 7
81  M  44 11  7 7 7
2   F  34 17  6 5 3
18  M  40 14  7 5 2
34  F  47  6  6 5 6
50  M  35 17  5 7 5
;
PROC PRINT; RUN;

The output from this run looks as follows:

OBS    ID    SEX    AGE     INC    R1    R2    R3

  1     1     F       35     17     7      2     2


  2    17     M       50     14     5      5     3
  3    33     F       45      6      7     2     7
  4    49     M       24     14     7      5     7
  5    65     F       52      9      4     7     7
  6    81     M       44     11     7      7     7
  7     2     F       34     17     6      5     3
  8    18     M       40     14     7      5     2
  9    34     F       47      6      6     5     6
 10    50     M       35     17     5      7     5

The variable SEX in the INPUT statement is followed by a dollar sign ($) indicating that this is a
character variable. Without specific instructions, SAS assumes that variables are numeric. If it
then encounters a character, it will assign a missing value (.) to the variable for that observation.
If the dollar sign is removed, the result is as follows:

DATA survey;
   INPUT id sex  age inc r1 r2 r3 ;
   DATALINES;
 1  F  35 17  7 2 2
17  M  50 14  5 5 3
33  F  45  6  7 2 7
49  M  24 14  7 5 7
65  F  52  9  4 7 7
81  M  44 11  7 7 7
2   F  34 17  6 5 3
18  M  40 14  7 5 2
34  F  47  6  6 5 6
50  M  35 17  5 7 5
;
PROC PRINT; RUN;

The output from this run looks as follows:

OBS    ID    SEX    AGE     INC    R1    R2    R3

  1     1     .       35     17     7      2     2


  2    17     .       50     14     5      5     3
  3    33     .       45      6      7     2     7
  4    49     .       24     14     7      5     7
  5    65     .       52      9      4     7     7
  6    81     .       44     11     7      7     7
  7     2     .       34     17     6      5     3
  8    18     .       40     14     7      5     2
  9    34     .       47      6      6     5     6
 10    50     .       35     17     5      7     5
3.2. Reading data with column pointers

Sometimes it is useful to tell SAS exactly where to find an observation. For example, if the data
set contains a characer variable whose entries are of different length or contain embedded
blanks.

data thisone;
  input name $1-20  age city $29-37 state $ ;
  datalines;
Oliver Schabenberger    33  Lansing    MI
John T. Smith        37     New York   NY
;
run;
proc print data=thisone; run;

Following the variable name NAME with $1-20 indicates that NAME is a character variable whose
entries are found in columns 1-20 of the datalines. Since blanks automatically delimit variables in
datalines, the fact that the two ages are not aligned has no effect. They will be read correctly
without pointing SAS to a specific column of the dataline. The city then is again read by pointing
SAS to specific columns. This is not necessary for the variable STATE since its entries begin in
the same column and do not contain embedded blanks. The data set THISONE looks as follows:

OBS    NAME                     AGE       CITY      STATE

 1     Oliver Schabenberger     33     Lansing      MI


 2     John T. Smith             37     New York     NY

If the $29-37 pointer after CITY is replaced with a simple $ to indicate a character variable in the
INPUT statement, here is what happens.

data thisone;
  input name $1-20  age city $ state $ ;
  datalines;
Oliver Schabenberger    33  Lansing    MI
John T. Smith        37     New York   NY
;
run;
proc print data=thisone; run;

OBS    NAME                     AGE       CITY      STATE

 1     Oliver Schabenberger     33     Lansing      MI


 2     John T. Smith             37     New          York

For the second observation, SAS finds the characters 'New' and assigns it to the CITY variable
thinking the blank between 'New' and 'York' delimits two variables. The string 'York' is then
assigned to the next variable in the INPUT statement, which is STATE.

Data lines can contain more information than is necessary for a single observation. Assume a
data set contains three variables, X, Y, and Z. A DATA step could look like:

data XYZ;
  input x y z;
  datalines;
1  1  12.4
1  2  11.3
1  3   1.4
2  1   2.1
2  2  19.4
2  3  10.0
;
run;

 You can also read multiple observations in each line of data. Simply add the @@ symbol at the
end of the input statement:

data XYZ;
  input x y z @@;
  datalines;
1  1  12.4 1  2  11.3 1  3   1.4
2  1   2.1 2  2  19.4 1  3  10.0
;
run;

SAS will go through each line of data and fill the variables X, Y, and Z in turn. Without the @@
symbol, it would switch to the next line as soon as the last variable was filled (which by the way
prevented in the last example the value 'NY'  from ever being read). With the @@ symbol, SAS
will continue filling the variables until it reaches the end of the line.

3.3. Reading data from ASCII files


Reading data from ASCII files is more convenient than reading data line. Especially when the data
contains many observations, this cuts down on the size of the actual program. To read an ASCII file in a
SAS DATA step use the infile statement before the INPUT statement. The structure of the INPUT
statement does not change. It is exactly as if you were to copy the contents of the ASCII file inside the
DATA step after the DATALINES statement.

There are two versions of using the INFILE statement. You can specify expicitly in the INFILE
statement which file to read:

data readasc;
  infile 'D:\Research\Toxin\Data1998\DON.txt';
  input location truck probe DON;
run;

or you can use the FILENAME statement. The FILENAME statement associates a physical file
with a short-cut name inside a SAS program. The FILENAME statement appears outside the
DATA step. For example

filename toxic 'D:\Research\Toxin\Data1998\DON.txt';


data readasc;
  infile toxic;
  input location truck probe DON;
run;

associates the file D:\Research\Toxin\Data1998\DON.txt with the short-cut name TOXIC. The
INFILE statement inside the DATA step then refers only to the short-cut. The short-cut in SAS
lingo is known as a fileref. If the columns of the ASCII file are not delimited by blanks (MSExcel
files saved as *.prn text files are comma delimited, for example) you can add a delimiter option to
the INFILE statement:

filename toxic 'D:\Research\Toxin\Data1998\DON.txt';


data readasc;
  infile toxic delimiter=',';
  input location truck probe DON;
run;

Many files contain header information which you do not want to make part of the SAS
data set. To skip a records at the top of the file use the FIRSTOBS= option. To read only
a specified number of records use the OBS= option. For example,

filename toxic 'D:\Research\Toxin\Data1998\DON.txt';


data readasc;
  infile toxic delimiter=',' firstobs=12 obs=100;
  input location truck probe DON;
run;

reads 100 - (12) + 1 = 89 observations from the file D:\Research\Toxin\Data1998\DON.txt


beginning with the 12th line of the file and looks for commas to delimit variables.

3.4. Reading dBase files

Although dBase has lost importance as a database programming environment, the dBase data
format is still very common and many packages including spreadsheets can read and write data
in dBase format. A dBase file can easily be transported into a SAS data set. The procedure DBF
is especially designed for this purpose. However, this procedure is not documented in the SAS
help files and thus a well-kept secret. Its usage is rather simple however. (Remark: This page
was written before 2005 and proc dbf is now documented in SAS. 07/2007)  Assume a dBase file
mydata.dbf resides in C:\Temp. You want to convert it into a SAS data set named dbdata. The
statements

filename inf 'C:\Temp\mydata.dbf';

proc dbf db4=inf out=dbdata;


run;

will do the trick. If the dBase format is not compatible with dBaseIV, e.g. dBaseIII+, use db3=
instead of db4=. If you use dBaseV format replace db4= with db5=.

3.5. Creating a permanent data set

By default all data sets created during a SAS session are temporary data sets and are deleted
when you close SAS. While this is not a real problem for small data sets that you can recreate on
the fly in the program, it is inconvenient, if you work with large data sets or your data sets require
a lot of manipulation before they are in the form that can be used by a certain procedure. A
permanent data set on the other hand is a data set that will not be deleted when SAS is exited. It
is available directly when a new SAS session is started. To understand how permanent data sets
are created and work, let's revisit a simple example used earlier. When executing the DATA step

data XYZ;
  input x y z;
  datalines;
1  1  12.4
1  2  11.3
1  3   1.4
2  1   2.1
2  2  19.4
2  3  10.0
;
run;

SAS will write to the LOG window

202  data XYZ;


203    input x y z;
204    datalines;

NOTE: The data set WORK.XYZ has 6 observations and 3 variables.


NOTE: The DATA statement used 0.26 seconds.
 

211  ;
212  run;
213

Although the data set was created with name XYZ, SAS created a data set named WORK.XYZ.
The name in front of the period (WORK) is the name of a library which can contain many data
sets (among other things, it can contain programs, graphs, etc.). All data sets associated with the
library WORK are deleted at the end of the SAS session (they are temporary). To prevent this
and thereby create a permanent data set, simply use a different library name to create a data
set. Before you can use a library name, however, you have to associate the library with a
directory accessible from your computer. This is accomplished with the LIBNAME statement. For
example

libname mylib 'D:\Research\Data\1996';

associates the directory D:\Research\Data\1996 with the name mylib. mylib is known as a libref
(a library reference) similar to file references created with the FILENAME statement. Once a libref
is defined, you can create data sets in that library (=that directory) by preceding the data set
name with the libref and delimiting the two with a period. For example,

libname mylib 'C:\Research\Data\Manure';

data mylib.XYZ;
  input x y z;
  datalines;
1  1  12.4
1  2  11.3
1  3   1.4
2  1   2.1
2  2  19.4
2  3  10.0
;
run;
will create a permanent data set and store it in the directory C:\Research\Data\Manure'. The
name of the physical file stored in the directory will be xyz.sd2. When you start a new SAS
session, the data can be accessed directly as soon as a libref pointing to the particular directory
exists. The name of the libref can be different from the name you used when creating the
permanent data set. For example if a new session is started,

libname what 'C:\Research\Data\Manure';

proc print data=what.xyz;


run;

will directly access the permanent data set. By the way, you may find that SAS data sets are
often much more compact than, e.g., spreadsheet or database files. Storing your data in SAS
form is a very economical and convenient way.

3.6. Reading Excel files into SAS

MSExcel has become a very common format for exchanging data. Although a spreadsheet
application Excel data can be converted to SAS data relatively easy. There are three ways this
can be accomplished (there may be more, but these are the ones I know off)

3.6.1.  Using the Import facility

The Import facility is accessed through the File menu. It starts a wizard that takes you through the
process of converting the Excel spreadsheet into SAS. While Excel can be running at the time of
import, the conversion will not work, if the spreadsheet that you wish to convert is open. The
easiest conversion is done if the spreadsheet is organized in variables and observations similar to
a SAS data set with the data to be converted in the top-most worksheet and the variable names
in the first row. When I convert Excel files, I make sure that the column names in Excel already
comply to SAS rules for naming variables (no more than eight characters, start with a letter, no
embedded blanks). SAS has rules for converting non-complying column names into variable
names. Once the import is completed, print the SAS data set with PROC PRINT to make
When the Import wizard asks for a member, enter the SAS data set name you wish to assign to
the data. sure all variable names and observations have been converted correctly.

Important: When selecting the file format to be converted version 6.12 of SAS will indicate that it
can convert Excel5.0 and Excel7.0 spreadsheets. Spreadsheets created with Excel97 will not
convert properly. Make sure that spreadsheets are saved in Excel5.0 (=Excel95) format.

3.6.2.  Using PROC ACCESS

Of the three methods to read Excel data this is no doubt the most flexible, but also the most
cryptic method. In order for it to work SAS/ACCESS must be installed on your system. To see
whether it is, execute

proc access;
run;

If PROC ACCESS is installed, a window will appear.

To convert an Excel spreadsheet into a SAS data set on the fly in your program, you can use
syntax such as the following. It requires that a libref is assigned to which interim data sets are
written.
proc access dbms=xls;
   create convert.xlsa.access;
   path = 'C:\Courses\Independent Study\Data\Soyref95.xls';
   scantype  = yes;
   worksheet = REF95;
   getnames  = yes;
   assign    = yes;
   unique    = yes;
   create Convert.xlsv.view;
   select all;
run;

data Soybean;
  set convert.xlsv;
run;

proc datasets library=convert;


  delete xlsv / memtype=view;
  delete xlsa / memtype=access;
run;
quit;

proc print data=soybean;


run;

This code reads the Excel spreadsheet C:\Courses\Independent Study\Data\Soyref95.xls and


converts it into the SAS data set SOYBEAN. The data is stored on worksheet REF95. The
getnames=, assign=, and unique= statements of PROC ACCESS tell SAS to look up variable
names from column names in the spreadsheet which are stored in the first row, and to keep those
variable names unique, even if two column names in Excel (if truncated to eight characters)
would be the same. The PROC DATASETS code deletes the two intermediate files created by
SAS in the conversion process.

3.6.3. Using Dynamic Data Exchange (DDE)

DDE is used to dynamically exchange data between applications. You can use it not only to read
data from Excel into SAS, but also to populate an Excel spreadsheet with information stored in a
SAS data set. For saving data in Excel format, I find the Export wizard (File-Export menu)Import
facility, Excel must be running and the spreadsheet from which to read data must be open. To
read the data converted in the PROC ACCESS example, use the following code: very helpful,
easy to use and quick. I am concentrating here on how to read data from Excel into SAS
via DDE. In contrast to import with the

filename soy dde 'Excel|C:\Courses\Independent Study\Data\[SOYREF95.XLS]REF95!


R2C2:R229C10';
data bean;
  infile soy missover;
  input labid $ month reps moisture abn dead abn1 abn2 abn3;
run;
proc print data=bean; run;

The string after dde in the filename statement is called the DDE triplet consisting of application |
topic ! item.
The specific form of the DDE triplet differs from application to application and may be hard to
figure out. Fortunately, there is a simple trick to get the DDE triplet. In Excel, highlight the rows
and columns you want to transport to SAS and copy them to the clipboard (Ctrl-C). Then go into
SAS and choose Solutions-DDE Triplet from the menu. The dialog box that appears contains the
DDE triplet. Just copy it from the dialog to the filename statement (Ctrl-C works here too). Notice
that the triplet begins reading the spreadsheet in row 2, since the first row of the spreadsheet
contains the column names. The MISSOVER option on the INFILE statement prevents SAS from
going to a new input line if it does not find values in the current line for  all the INPUT statement
variables, that is some of the values are missing. With the MISSOVER option, when an INPUT
statement reaches the end of the current record, values that are expected but not found are set to
missing.

You might also like