SAS Training - 101

Copyright
© Attribution Non-Commercial (BY-NC)

SAS Training 101

2006, Cognizant Technology Solutions. All Rights Reserved. The information contained herein is subject to change without notice.

Agenda
Module 1
  Introduction to SAS
  Getting/Extracting Data in/from SAS
  Working with the Data

Module 2
  Introduction to SAS Proc Statements
  Combining and Modifying SAS Datasets

Module 3
  Proc SQL
  Arrays / DO-END
  Retain / First. Last.

Agenda Module 1
Introduction to SAS
  Getting Started with SAS environment
  The two parts of a SAS program
  Reading the SAS Log
  SAS Dataset

Getting/Extracting Data in/from SAS
  SAS Data Libraries
  Importing Data
  Exporting Data

Working with the Data
  Data Step OPTIONS
  Using IF-THEN Statements
  Using RETAIN and SUM Statements
  PROC PRINT and PROC CONTENTS

What is SAS ?

A programming environment and language for data manipulation and analysis

Data Warehousing - Easily access, manage and analyze data from many sources

Analytical Solutions - From simple to advanced statistics

Business Solutions - Manages and reports on data from many sources

Getting Started With SAS


Interactive windows provide the interface to SAS.

Navigating the SAS Windowing Environment (screenshot callouts):
  Execute the SAS program
  View SAS datasets
  Write programs
  Contains information about the processing of the SAS program, including warning and error messages
  Contains reports generated by SAS procedures and DATA steps

Exploring SAS Libraries


Select the Explorer tab in the SAS window bar to open the Explorer window, or select View > Explorer.

The SAS Explorer works much like the file explorers of window-based systems:
  Expand and collapse directories on the left
  Drill down and open specific files on the right

Right-click a SAS dataset and select Properties to see general information about the dataset.

Double-click a dataset to open it in the VIEWTABLE window, which can be used to edit datasets, create datasets, and customize the view of a SAS dataset.

Running a SAS Program


Open a SAS Program
  Select File > Open, or click the Open icon, and select the file D:\Projects\......
  Click the Submit icon, or select Run > Submit*, to submit the program for execution

Enhanced Editor
  Access and edit existing SAS programs
  Write new SAS programs
  Submit SAS programs
  Save SAS programs to a file

* Programs can also be executed without opening them in the SAS environment, using batch submit

LOG and OUTPUT windows


The Log and Output windows are open by default. They can also be accessed by selecting Window > Log and Window > Output, respectively.

Log Window
  An audit trail of the SAS session
  Contains programming statements as submitted
  Contains notes about:
    Files read
    Records read
    Program execution and results
  Contains warning and error messages

Output Window
  Accumulates output in the order in which it is generated

Select Edit > Clear All to clear the contents of the window.

SAS Programs
A SAS program is a sequence of steps that the user submits for execution

Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Output

DATA steps are used to CREATE SAS datasets.
PROC steps are used to PROCESS SAS datasets.
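
The flow from raw data to output can be sketched with one DATA step and one PROC step. This is a minimal illustration, not part of the original deck; the in-line rows are a subset of the staff data used later in the slides.

```sas
/* DATA step: creates the SAS data set work.staff from in-line raw data */
data work.staff;
    input LastName $ JobTitle $ Salary;
    datalines;
TORRES Pilot 50000
LANGKAMM Mechanic 80000
SMITH Mechanic 40000
;
run;

/* PROC step: processes the data set and writes a report to the Output window */
proc print data=work.staff;
run;
```

After submitting, the Log window should carry a note about the observations and variables created, and the Output window should show the printed rows.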

SAS Statements
  Usually begin with an identifying keyword
  Always end with a semicolon
  Text that begins with /* and ends with */ is treated as a comment

SAS Syntax Rules
  SAS statements can be in uppercase or lowercase
  One or more blanks or special characters can be used to separate words
  Statements can begin and end in any column
  A single statement can span multiple lines
  Several statements can be on the same line

DATA and PROC steps

DATA steps
  Begin with DATA statements
  Read and modify data
  Create a SAS data set

PROC steps
  Begin with PROC statements
  Perform a specific analysis or function
  Produce results or reports
  Can also create data sets

A step ends when SAS encounters the start of a new step (a DATA or PROC statement) or a RUN statement.
A DATA step executes line by line.

Debugging a SAS Program


When SAS encounters a syntax error, it identifies the error and writes its location and an explanation to the SAS log.

Diagnosing and Correcting Syntax Errors

Syntax errors include:
  Misspelled keywords
  Missing or invalid punctuation
  Invalid options

The program below contains two deliberate syntax errors: the keyword DATA is misspelled as daat, and the PROC PRINT statement is missing its semicolon.

daat work.staff;
   infile raw-data-file;
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff
run;

Canceling Submitted Statements


Submitting a SAS Program That Contains Unbalanced Quotes

Open the code below, in which the closing quote of the INFILE statement is missing. Submit the program and browse the SAS log. There are no notes in the SAS log, because all the SAS statements after the INFILE statement have become part of the quoted string.

data work.staff;
   infile 'raw-data-file;
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff;
run;

proc means data=work.staff mean max;
   class JobTitle;
   var Salary;
run;

To correct the problem in the Windows environment, click the Break icon, select Cancel Submitted Statements in the Tasking Manager window, and select OK.

SAS Dataset
A SAS dataset has two portions: the descriptor portion is the header, and the data portion is a rectangular table of data values.

SAS Data Sets:

LastName   FirstName   JobTitle   Salary     <- variable names
TORRES     JAN         Pilot       50000
LANGKAMM   SARAH       Mechanic    80000
SMITH      MICHAEL     Mechanic    40000     <- variable values
WAGSCHAL   NADJA       Pilot       77500
TOERMOEN   JOCHEN      Pilot       65000

LastName, FirstName, and JobTitle hold character values; Salary holds numeric values.

Variables (columns): correspond to fields of data; each data column is named.
Observations (rows): correspond to records or data lines.

Variable Names and Values


Variable Names
  Can be up to 32 characters long
  Can be uppercase, lowercase, or mixed case; variable names are not case-sensitive
  Must start with a letter or underscore
  Subsequent characters can be letters, underscores, or numeric digits (no special characters)

Examples
  Valid names:   Data_5, bad, cub2c3
  Invalid names: Data 5, 1bad, count # 5

Variable Values
Variable types
  Character: contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes.
  Numeric: stored as floating-point numbers in 8 bytes of storage by default.

A date is stored as a numeric variable in SAS; conversely, any numeric variable may be interpreted as a date. Internally, a date value is an integer that represents the number of days since January 1, 1960.

SAS allows dates to be read and output in various formats. Commonly used ones are:

Stored Value   Format       Displayed Value
0              MMDDYY8.     01/01/60
0              MMDDYY10.    01/01/1960
-1             DATE9.       31DEC1959
365            DDMMYY10.    31/12/1960

The TODAY() function returns the current date.
A date literal is specified as 'ddMMMyyyy'd, e.g. '31DEC1959'd.
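
The stored values and formats above can be checked in the log with a short DATA _NULL_ step. This is only a sketch; the variable names are illustrative and not from the slides.

```sas
data _null_;
    day0   = 0;
    before = '31DEC1959'd;    /* date literal; stored as -1 */
    later  = 365;
    now    = today();         /* current date; value varies by run */
    put day0 mmddyy8.;        /* 01/01/60   */
    put day0 mmddyy10.;       /* 01/01/1960 */
    put before=;              /* before=-1  */
    put before date9.;        /* 31DEC1959  */
    put later ddmmyy10.;      /* 31/12/1960 */
    put now date9.;
run;
```

The PUT statement writes to the SAS log, so no data set or report is created.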


SAS Data Libraries


A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

libname sample 'C:\mysasfiles';

Here the library reference name sample points to the folder C:\mysasfiles, which holds the SAS files of the library.

  SAS data libraries are identified by assigning a library reference name (libref)
  On invoking SAS, one automatically has access to a temporary and a permanent SAS data library:
    Work - temporary library
    Sasuser - permanent library
  One can also create and access new permanent libraries
  The Work library and its SAS data files are deleted after the SAS session ends
  SAS datasets in permanent libraries are saved after the SAS session ends
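
As a rough illustration of temporary versus permanent storage, the sketch below writes the same data to the Work library and to a permanent library. It assumes the folder C:\mysasfiles from the slide exists; the data set name scores and its values are made up for the example.

```sas
/* Assumption: C:\mysasfiles is an existing folder (path taken from the slide) */
libname sample 'C:\mysasfiles';

/* One-level name: stored in the temporary Work library (same as work.scores) */
data scores;
    input id score;
    datalines;
1 90
2 75
;
run;

/* Two-level name: stored permanently in the sample library */
data sample.scores;
    set work.scores;
run;

/* List the members of the permanent library */
proc contents data=sample._all_ nods;
run;
```

When the session ends, work.scores is deleted while sample.scores remains in the folder.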

Creating Data
data PS_AA_team;
   input NAME $ Age prior_work_ex $;
   datalines;
Sayaji   40 Y
Vikrant  30 Y
Yashjit  30 Y
Hita     .  Y
Tuhin    20 N
Sharmila .  N
Aditi    .  N
Shikha   .  Y
Anirban  30 Y
Lata     .  Y
Deepak   20 N
Ambrish  20 N
Vaibhav  20 N
;
run;

  DATALINES (or CARDS) marks the start of in-line data
  The default type of a variable is numeric
  A missing numeric value must be entered as a period (.)
  The default length for character variables is 8

Informat statement
General form of an informat:

<$>informat-name<w>.<d>

$               indicates a character informat
informat-name   names the informat
w               is an optional field width
.               is the required delimiter
d               optionally specifies a decimal scaling factor for numeric informats

Selected Informats

Informat     Description
7. or 7.0    reads seven columns of numeric data
7.2          reads seven columns of numeric data and inserts a decimal point in the data value if it has none
$5.          reads five columns of character data and removes leading blanks
$CHAR5.      reads five columns of character data and preserves leading blanks
COMMA7.      reads seven columns of numeric data and removes selected nonnumeric characters, such as dollar signs and commas
MMDDYY10.    reads dates of the form 01/20/2000
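
A small sketch of three of these informats in use, assuming one made-up data line laid out in fixed columns; the data set and variable names are illustrative.

```sas
data informat_demo;
    /* columns 1-7: amount, 9-18: date, 20-24: code with leading blanks */
    input @1  amount comma7.
          @9  visit  mmddyy10.
          @20 code   $char5.;
    format visit date9.;      /* display the stored date value readably */
    datalines;
$12,345 01/20/2000   AB
;
run;

proc print data=informat_demo;
run;
```

COMMA7. strips the dollar sign and comma before storing the number, and $CHAR5. keeps the blanks in front of AB.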

Importing Data
List-directed input: data values must be separated by a delimiter, and all variables must be read in. In delimited data, the values are separated by a specially designated character called the delimiter. In comma-separated values, for example, the comma separates individual data values from each other.

Column input: data is in fixed columns; you must know where each value starts and ends; selected variables can be read. In fixed-format files, the data values are placed at pre-specified column positions in the data file.

Formatted input (informats): an alternative to column input; the most flexible style; must be used for special data.

Input data can carry the variable names as part of the file. If the variable names are specified in the topmost row of the file, one can use PROC IMPORT.

Choosing a method for raw data:
  Fixed format: INFILE / INPUT (@ signifies the start of the data value)
  Delimited: INFILE / INPUT with the DLM option
  Delimited, names available: PROC IMPORT (or use the Import Wizard)
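
For delimited data read with INFILE / INPUT, a minimal sketch might look like the following. The rows and variable names are invented for illustration; the DSD option is added so that two consecutive commas are read as a missing value.

```sas
data visits;
    infile datalines dlm=',' dsd;
    /* the colon modifier reads up to the delimiter using the given informat */
    input DocID Specialty :$20. City :$15.;
    datalines;
101,Cardiology,Chennai
102,,Pune
;
run;

proc print data=visits;
run;
```

To read an external file instead, the INFILE statement would name the quoted file path in place of DATALINES.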

Importing Data (Fixed Format / Delimited)


Raw Files
Infile "X:\raw-file" LRECL = <length-of-observation> MISSOVER;
Input @<start-of-var1> var1 <length-of-var1>.
      @<start-of-var2> var2 <length-of-var2>.
      ...
      @<start-of-var3> var3 $<length-of-var3>. ;

  To read a fixed-format raw file, one needs to know the exact position where each variable starts and the length of each variable
  For every character variable, the $ symbol is used when declaring its length
  If no $ symbol is used, the variable is taken as numeric by default
  The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached. If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing
  FIRSTOBS= tells SAS at which line to begin reading data
  OBS= specifies the last line to be read (which equals the number of lines read when FIRSTOBS=1)
  DLM= specifies the delimiter used

Example:
Convert a fixed-format file (YYY.txt) to a SAS dataset.

Layout of YYY.txt

Start   End   Length   Type   Variable   Description
1       9     9        Num    DOCID      Doctor ID
10      39    30       Char   Spec       Speciality
40      64    25       Char   STREET     Address - Street
65      84    20       Char   CITY       Address - City
85      86    2        Char   STATE      Address - State
87      89    3        Char   ZIP        Address - ZIP
90      99    10       Num    PHONE      Telephone Number

data <dataset>;
   infile "X:\YYY.txt" LRECL = 99 MISSOVER;
   input @1  DOCID  9.
         @10 SPEC   $30.
         @40 STREET $25.
         @65 CITY   $20.
         @85 STATE  $2.
         @87 ZIP    $3.
         @90 PHONE  10. ;
run;

PROC IMPORT
General form of the IMPORT procedure
PROC IMPORT OUT=SAS-data-set
            DATAFILE="external-file-name"
            DBMS=file-type;
   GETNAMES=YES;
RUN;

Example Code
PROC IMPORT DATAFILE='D:\fun\Ritesh Training\comp.csv'
            OUT=yyy
            DBMS=CSV REPLACE;
   GETNAMES=YES;
RUN;

Delimited Text Files


With a slight change, PROC IMPORT can read a delimited file. The general form is:
PROC IMPORT OUT=SAS-data-set
            DATAFILE="external-file-name"
            DBMS=delimiter-type REPLACE;
   GETNAMES=YES;
RUN;

where delimiter-type is CSV, TAB, or DLM (DLM is used together with a DELIMITER= statement).

Example: the following code converts a tab-delimited file to a SAS dataset.

PROC IMPORT DATAFILE='D:\fun\Ritesh Training\Broker comp file.txt'
            OUT=xxx
            DBMS=TAB REPLACE;
   GETNAMES=YES;
RUN;

IMPORT Wizard
The Import Wizard is a graphical interface provided by SAS to convert a raw data file to a SAS dataset. It can convert only delimited and Excel files.
  Select the type of raw file to be imported
  Browse to the raw file
  Enter the library and the name under which the SAS dataset is to be saved
  Press Finish to convert the raw file to a SAS dataset

The Import Wizard first generates PROC IMPORT code and then executes it. You can save the code that the wizard generates.

Exporting Data From SAS


A SAS dataset can be converted into other file formats by using either PROC EXPORT or the SAS Export Wizard.

Example Code

The following code segment uses the EXPORT procedure to write a file in CSV format.

PROC EXPORT DATA=<Name of Dataset>
            OUTFILE=<Output Filename>
            DBMS=CSV REPLACE;
RUN;

Note: The output filename should be given in quotes, with the full path.

EXPORT wizard
The SAS Export Wizard converts a SAS dataset into other file formats without any code having to be written. It also allows the corresponding PROC EXPORT code to be saved.

Step 1: Click File and select Export Data
Step 2: Select the data to be exported
Step 3: Select the file format
Step 4: Specify the output filename and its location
Step 5: Enter the filename to save the code for export


Using SAS Data Set Options


The SAS language has three types of options:
  System options: they have the most global influence (they stay in effect for the duration of your job or session) and affect how SAS operates. They are issued when you invoke SAS or when you use an OPTIONS statement.
  Statement options: they appear in individual statements and influence how SAS runs that particular DATA or PROC step. DATA=, for example, is a statement option telling SAS which dataset to use for a procedure.
  Data set options: they affect only how SAS reads or writes an individual data set. You can use data set options in DATA or PROC statements. Simply put the option between parentheses directly following the data set name. Examples:
    KEEP = variable-list
    DROP = variable-list
    RENAME = (oldvar = newvar)
    FIRSTOBS = n
    IN = new_var_name
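
A brief sketch of data set options on both the input and the output side. It uses sashelp.class, a sample table shipped with SAS; the output name teens and the new name HeightIn are arbitrary choices for the example.

```sas
data teens (drop=Weight);                       /* option on the output data set */
    set sashelp.class (keep=Name Age Height Weight
                       rename=(Height=HeightIn) /* KEEP= uses the old name */
                       firstobs=5);             /* start reading at row 5 */
run;

proc print data=teens (obs=3);                  /* option on a PROC's input */
run;
```

Each option affects only the data set it is attached to; the original sashelp.class is left unchanged.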

PUT/INPUT Functions
The PUT function converts a variable from numeric to character, and the INPUT function converts from character to numeric.

Character to Numeric: newvar = INPUT(oldvar, informat);

Numeric to Character: newvar = PUT(oldvar, format);

Examples

Character to Numeric: newB = INPUT(VarB, 1.);

Numeric to Character: newD = PUT(VarD, 2.);
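
The two conversions can be tried out in a DATA _NULL_ step. The starting values here are made up; VarB, VarD, newB, and newD follow the names used on the slide.

```sas
data _null_;
    VarB = '7';                 /* character value */
    VarD = 42;                  /* numeric value   */
    newB = input(VarB, 1.);     /* character -> numeric, using an informat */
    newD = put(VarD, 2.);       /* numeric -> character, using a format    */
    put newB= newD=;            /* writes newB=7 newD=42 to the log */
run;
```

newB can now be used in arithmetic, while newD is a character variable of length 2.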

Using IF-THEN Statements


Basic form:
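IF condition THEN action;

If Model = 'Mustang' Then Make = 'Ford';

You can use symbolic or mnemonic operators. You may also use the IN operator to make comparisons.

Symbolic     Mnemonic
=            EQ
^=, ~=       NE
>            GT
<            LT
>=           GE
<=           LE

Note: <> means NE only in WHERE expressions and PROC SQL; in an IF statement it is the MAX operator, so use ^=, ~=, or NE there.

Example:
If Model IN ('Corvette', 'Camaro') Then Make = 'Chevrolet';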

Using IF-THEN Statements


A single IF-THEN statement can have only one action. To execute more than one action, add DO and END.

Example:
If Model = 'Mustang' Then DO;
   Make = 'Ford';
   Size = 'Compact';
End;

Conditions can also be combined with AND / OR.

Example:
If Model = 'Mustang' and Year < 1975 Then Status = 'Classic';
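
Put together, the IF-THEN forms above could run as in this sketch. The three data rows are invented, and the LENGTH statement is an addition so that the character values are not truncated.

```sas
data cars;
    length Model $8 Make $9 Size $7 Status $7;
    input Model $ Year;
    /* DO / END groups several actions under one condition */
    if Model = 'Mustang' then do;
        Make = 'Ford';
        Size = 'Compact';
    end;
    /* IN compares against a list of values */
    if Model in ('Corvette', 'Camaro') then Make = 'Chevrolet';
    /* AND combines two conditions */
    if Model = 'Mustang' and Year < 1975 then Status = 'Classic';
    datalines;
Mustang 1966
Corvette 1980
Camaro 1972
;
run;

proc print data=cars;
run;
```

Character comparisons are case-sensitive, so 'Mustang' would not match a value entered as 'mustang'.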

Using IF-THEN-ELSE Statements


Basic form:
IF condition THEN action;
ELSE IF condition THEN action;
ELSE action;

An ELSE IF statement is simply an IF-THEN statement with ELSE tacked onto the front. The final ELSE action is executed automatically for all observations that fail to satisfy any of the previous conditions.

Example
Data from a survey of home improvements, containing the owner's name, a description of the work done, and the cost of the improvement. Group the cost into high, medium, and low.

Data:
Gregory cabinet facelift          2000
Molly   bathroom addition         11350
Luther  paint exterior            3910
Susan   second floor              75362.9

Code:
Data home_cost;
   Infile 'C:\Home_data.dat';
   Input Owner $ 1-7 Description $ 9-33 Cost;
   Length CostGrp $ 6;   /* otherwise CostGrp takes its length (3) from 'low' */
   If Cost < 2000 Then CostGrp = 'low';
   Else If Cost < 10000 Then CostGrp = 'medium';
   Else CostGrp = 'high';
Run;

Subsetting your data


Often you want to use some of the observations of a dataset and exclude the rest. Use a subsetting IF statement in a DATA step.

Basic form:
IF expression;

Example:
If sex = 'f';
If sex = 'm' Then delete;

Use IF when it is easier to specify a condition for including observations.
Use DELETE when it is easier to specify a condition for excluding observations.
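
Both styles are sketched below on sashelp.class, a sample table shipped with SAS in which Sex is coded as 'F' or 'M'. The output data set names are arbitrary.

```sas
/* Subsetting IF: only observations that satisfy the condition are kept */
data females;
    set sashelp.class;
    if Sex = 'F';
run;

/* DELETE: observations that satisfy the condition are dropped */
data females2;
    set sashelp.class;
    if Sex = 'M' then delete;
run;
```

With only two values of Sex in the table, both steps should produce the same observations.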

Using RETAIN and SUM statements


When reading raw data, SAS sets the values of all variables to missing at the start of each iteration of the DATA step. With a RETAIN statement, a variable keeps its value from the previous iteration of the DATA step.

Basic form:
RETAIN variables;
RETAIN variables initial-value;

A sum statement also retains values from the previous iteration of the DATA step, but you use it when you simply want to cumulatively add the value of an expression to a variable.

Basic form:
variable + expression;

Example
Data from baseball games, containing the date each game was played, the team played, and the hits and runs for the game
6-19 6-20 7-1 7-2 7-4 7-5 Columbia Peaches Columbia Peaches Plains Peanuts Plains Peanuts Sacremento Sacremento 8 3 10 2 10 12 3 4 5 3 10 8

The team wants two additional variables: the cumulative number of runs for the season and the maximum number of runs in a game to date.

Example (Contd)..

Data games;
   Infile 'C:\Games.dat';
   Input Month 1 Day 3-4 Team $ 6-25 Hits 27-28 Runs 30-31;
   RETAIN MaxRuns;
   MaxRuns = Max(MaxRuns, Runs);
   RunsToDate + Runs;
Run;
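
To try the same logic without the external file, a self-contained sketch with three made-up rows can be used; the data set name and the team labels A and B are placeholders, not data from the slides.

```sas
data games_demo;
    input Team $ Runs;
    retain MaxRuns;                  /* keeps its value across iterations */
    MaxRuns = max(MaxRuns, Runs);    /* MAX ignores the initial missing value */
    RunsToDate + Runs;               /* sum statement: retained, starts at 0 */
    datalines;
A 3
B 5
A 2
;
run;

proc print data=games_demo;
run;
```

For these rows MaxRuns should read 3, 5, 5 and RunsToDate 3, 8, 10.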

Questions ??????


Agenda Module 2
Introduction to SAS Proc Statements
  Proc Sort
  Proc Means
  Proc Freq
  Proc Summary
  Proc Transpose

Combining and Modifying SAS Datasets
  Set statement
  Merge statement

SAS Procedures
Start with the keyword PROC.
E.g.:
PROC CONTENTS DATA = Sales_force_team;

If the DATA= option is not specified, SAS uses the most recently created data set.

BY statement
  Required only for PROC SORT
  In every other procedure it is optional: SAS performs a separate analysis for each combination of values of the BY variables


PROC SORT
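The default sort order is ascending.

Form of the PROC SORT statement:
PROC SORT DATA = data-name;
   BY variable-1 variable-2 variable-3 ... variable-n;
RUN;

NODUPKEY eliminates observations that have the same values of the BY variables as an earlier observation:

PROC SORT DATA = data-name OUT = data-name NODUPKEY;

Sorting in descending order: the DESCENDING keyword applies to the variable that follows it, so the statement below sorts variable-1 in ascending order and variable-2 and variable-3 in descending order.
BY variable-1 DESCENDING variable-2 DESCENDING variable-3;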

PROC SORT Example


data marine;
   input NAME $ FAMILY $ length;
   datalines;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
run;

PROC SORT data = marine out = seasort NODUPKEY;
   BY family DESCENDING length;
run;

PROC PRINT data = seasort;
   TITLE 'Whales and Sharks';
run;

OUTPUT

Whales and Sharks

Obs   Name       Family   Length
1     humpback              50.0
2     whale      shark      40.0
3     basking    shark      30.0
4     mako       shark      12.0
5     dwarf      shark       0.5
6     blue       whale     100.0
7     sperm      whale      60.0
8     gray       whale      50.0
9     killer     whale      30.0
10    beluga     whale      15.0


PROC MEANS
Form of the PROC MEANS statement:
PROC MEANS DATA = data-name options;
   BY variable-list;
   VAR variable-list;
RUN;

If PROC MEANS is used with no other options, it gives the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables.

Writing summary statistic into a SAS dataset


PROC MEANS DATA = zoo NOPRINT;
   VAR lions tigers bears;
   OUTPUT OUT = zoosum
          MEAN(lions bears) = Avglionwt Avgbearwt
          SUM(tigers) = Tottigerwt;
RUN;

PROC MEANS Example


data cake;
   input LastName $ 1-12 Age 13-14 PresentScore 16-17
         TasteScore 19-20 Flavor $ 23-32 Layers 34;
   datalines;
Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;
run;

proc means data=cake n mean max min range std fw=8;
   var PresentScore TasteScore;
   title 'Summary of Presentation and Taste Scores';
run;

OUTPUT

Summary of Presentation and Taste Scores
The MEANS Procedure

Variable        N     Mean   Maximum   Minimum    Range   Std Dev
PresentScore   20   76.150    93.000    56.000   37.000     9.376
TasteScore     20   81.350    94.000    72.000   22.000     6.611


PROC FREQ
Form of the PROC FREQ statement:
PROC FREQ DATA = data-name options;
   BY variable-list;
   OUTPUT statistic-keyword(s) <OUT=SAS-data-set>;
   TABLES request(s) </ option(s)>;
RUN;

Use this statement   To do this
BY                   Calculate separate frequency or cross-tabulation tables for each BY group
OUTPUT               Create an output data set that contains specified statistics
TABLES               Specify frequency or cross-tabulation tables and request tests and measures of association

PROC FREQ Example


data color;
   input Region Eyes $ Hair $ Count @@;
   label eyes='Eye Color'
         hair='Hair Color'
         region='Geographic Region';
   datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
run;

The TABLES statement requests three tables: the Eyes and Hair frequencies and the Eyes by Hair cross-tabulation.
  OUT= creates the FREQCNT data set, which contains the cross-tabulation table frequencies
  OUTEXPECT stores the expected cell frequencies in FREQCNT
  SPARSE stores zero cell counts in FREQCNT

proc freq data=color;
   weight count;
   tables eyes hair eyes*hair / out=freqcnt outexpect sparse;
   title 'Eye and Hair Color of European Children';
run;

proc print data=freqcnt noobs;
   title2 'Output Data Set from PROC FREQ';
run;


PROC SUMMARY
Form of the PROC SUMMARY statement:
PROC SUMMARY <option(s)> <statistic-keyword(s)>;
   CLASS variable(s) </ option(s)>;
   VAR variable(s);
   OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)>
          <id-group-specification(s)> <maximum-id-specification(s)>
          <minimum-id-specification(s)> </ option(s)>;
RUN;

Use this statement   To do this
BY                   Calculate separate statistics for each BY group
OUTPUT               Create an output data set that contains specified statistics
CLASS                Specify the grouping variables
VAR                  List the variables to be summarized

PROC SUMMARY Example


data color;
   input Region Eyes $ Hair $ Count @@;
   label eyes='Eye Color'
         hair='Hair Color'
         region='Geographic Region';
   datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
run;

proc summary data=color;
   class eyes hair;
   var count;
   output out=Summary (drop=_freq_) sum=;
run;


Changing observations to variables using PROC TRANSPOSE


PROC TRANSPOSE transposes SAS datasets, turning observations into variables or variables into observations.

Basic form:
PROC TRANSPOSE DATA = oldname OUT = newname;
   BY variable-list;
   ID variable;
   VAR variable-list;
RUN;

Use this statement   To do this
BY                   Use if you have any grouping variables that you want to retain as variables. These variables are included in the transposed data set but are not themselves transposed
ID                   Names the variable whose formatted values will become the new variable names. In the absence of an ID statement, the new variables are named COL1, COL2, and so on
VAR                  Names the variables whose values you want to transpose

PROC TRANSPOSE Example


data color;
   input Region Eyes $ Hair $ Count @@;
   label eyes='Eye Color'
         hair='Hair Color'
         region='Geographic Region';
   datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
run;

/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=color;
   by eyes hair;
run;

proc transpose data=color out=transpose;
   by eyes hair;
   id Region;
   var count;
run;


Using SET Statement


To read a SAS data set, start with a DATA statement specifying the name of the new SAS data set. Then follow with a SET statement specifying the name of the old SAS data set you want to read.

DATA new-data-set;
   SET data-set;

To stack data sets (appending): with two or more data sets that have all or most of the same variables but different observations, the SET statement, in addition to reading the data, concatenates the data sets one on top of the other.

DATA new-data-set;
   SET data-set-1 ... data-set-n;

Interleaving data sets using SET Statement


When the data sets you want to stack are already sorted by some important variable, simple stacking would leave the combined data unsorted.
  Option 1: do a simple stacking and then use PROC SORT
  Recommended option: use a BY statement with your SET statement

DATA new-data-set;
   SET data-set-1 ... data-set-n;
   BY variable-list;

Before you can use the BY statement, the data sets must be sorted by the BY variables.
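
The difference between stacking and interleaving can be seen in a small sketch. The two input data sets and their values are invented; both are already sorted by id.

```sas
data north;
    input id city $;
    datalines;
1 Delhi
4 Agra
;
run;

data south;
    input id city $;
    datalines;
2 Chennai
3 Kochi
;
run;

/* Simple stacking: ids come out as 1, 4, 2, 3 */
data stacked;
    set north south;
run;

/* Interleaving with BY: ids come out as 1, 2, 3, 4 */
data interleaved;
    set north south;
    by id;
run;
```

Interleaving avoids the extra PROC SORT pass over the combined data.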


One to One Match Merge


First sort all data sets by the common variable(s).

Basic form:

DATA new-data-set;
   MERGE data-set-1 ... data-set-n;
   BY variable-list;

  If the data sets being merged have variables with the same names (besides the BY variables), the values from the second data set overwrite those of the same-named variables from the first data set
  All observations from both data sets are included in the final data set, irrespective of whether they had a match or not

One to Many Match Merge


Each observation in data set 1 matches more than one observation in data set 2.

Basic form:

DATA new-data-set;
   MERGE data-set-1 ... data-set-n;
   BY variable-list;

  The order of the data sets in the MERGE statement does not matter to SAS, i.e., a one-to-many merge is the same as a many-to-one merge
  A one-to-many merge cannot be done without a BY statement. Without any BY variables for matching, SAS simply joins together the first observation from each data set, then the second observation from each data set, and so on

Various ways of merging data sets


Data can be merged under certain conditions, such as keeping the data present in one file only, the data common to all data sets, or the data in one file that is not present in the other.

Basic form:
DATA new-data-set;
   MERGE data-set-1 (in = a) data-set-2 (in = b);
   BY variable-list;
   IF condition;

Various conditions used while merging data sets are:
  IF a or b: union of the two data sets
  IF a and b: intersection of the two data sets
  IF a and not b: data in the first file that is not present in the other
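
The conditions can be sketched on two small, made-up data sets that share the variable id and are already sorted by it. The names staff, pay, both, and left_only are illustrative.

```sas
data staff;
    input id name $;
    datalines;
1 Asha
2 Ravi
3 Meera
;
run;

data pay;
    input id salary;
    datalines;
2 50000
3 65000
4 40000
;
run;

data both left_only;
    merge staff (in=a) pay (in=b);
    by id;
    if a and b then output both;                 /* intersection: ids 2 and 3 */
    else if a and not b then output left_only;   /* in staff only: id 1 */
run;
```

Without any IF condition, the merge would keep the union, ids 1 through 4.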

Merging Summary statistics with the original data


Say you want to compare each observation in a group to the group's mean.
  Summarize your data using PROC MEANS and write the results to a new data set
  Merge the summarized data back with the original data using a one-to-many match merge
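
One possible version of these two steps, sketched on sashelp.class (a sample table shipped with SAS) with Sex as the grouping variable. The names class, sexmean, AvgHeight, and Diff are chosen for the example.

```sas
/* Sort so that BY-group processing and the match merge can be used */
proc sort data=sashelp.class out=class;
    by Sex;
run;

/* Step 1: one summary row per group */
proc means data=class noprint;
    by Sex;
    var Height;
    output out=sexmean mean(Height)=AvgHeight;
run;

/* Step 2: one-to-many match merge back onto the detail rows */
data compare;
    merge class sexmean (keep=Sex AvgHeight);
    by Sex;
    Diff = Height - AvgHeight;
run;
```

Each student row now carries the group mean, so Diff shows how far the student is from it.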

Combining a grand total with the original data


MERGE cannot be used, as there are no common variables. You can use two SET statements:
DATA new-data-set;
   IF _N_ = 1 THEN SET summary-data-set;
   SET original-data-set;

The original data set is the data with more than one observation, and the summary data set is the data with a single observation. SAS reads the original data set with a normal SET statement. It also reads the summary data set with a SET statement, but only in the first iteration of the DATA step, and then retains the values of the variables from the summary data set for all observations in the new data set.
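
As a sketch of the two-SET technique, the step below attaches a grand total to every row of sashelp.class (a sample table shipped with SAS). The names total, TotalWeight, share, and PctOfTotal are made up for the example.

```sas
/* Summary data set with a single observation */
proc means data=sashelp.class noprint;
    var Weight;
    output out=total (keep=TotalWeight) sum(Weight)=TotalWeight;
run;

data share;
    if _n_ = 1 then set total;    /* read once; the value is then retained */
    set sashelp.class;            /* read on every iteration */
    PctOfTotal = 100 * Weight / TotalWeight;
run;
```

Variables read with SET are retained automatically, which is why TotalWeight stays available after the first iteration.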


Tracking and selecting observations with the IN = Option


The IN= option can be used while combining two data sets to track which of the original data sets contributed to each observation.
Unlike most variables, IN= variables are temporary, existing only during the current DATA step.
SAS gives each IN= variable a value of 0 or 1: 1 means the data set contributed to the current observation, and 0 means it did not.
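Because IN= variables are not written to the output, one common pattern is to copy them into an ordinary variable. A minimal sketch, with hypothetical data sets year1 and year2:

```sas
data year1;
  input id;
  datalines;
1
2
;
run;

data year2;
  input id;
  datalines;
3
;
run;

/* in1 and in2 are dropped automatically, so record the source explicitly */
data combined;
  set year1(in=in1) year2(in=in2);
  length source $5;
  if in1 then source = 'year1';
  else if in2 then source = 'year2';
run;
```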

Writing multiple data sets using the OUTPUT statement


To create multiple data sets in a single DATA step, simply put more than one data set name in your DATA statement.
Example:
DATA lions tigers bears;

In the above example, SAS would create 3 identical data sets. To create different data sets, use the OUTPUT statement.
Basic form:
OUTPUT data-set-name;

Example:
IF family = 'Ursidae' THEN OUTPUT bears;
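Put together, a complete step might look like this sketch. The animals data set and the species variable are assumptions for illustration only.

```sas
/* Hypothetical input data */
data animals;
  input name $ species $;
  datalines;
Leo lion
Shere tiger
Baloo bear
;
run;

/* Each observation is routed to exactly one output data set */
data lions tigers bears;
  set animals;
  if species = 'lion' then output lions;
  else if species = 'tiger' then output tigers;
  else if species = 'bear' then output bears;
run;
```

Once an explicit OUTPUT statement appears in a DATA step, SAS no longer writes an observation automatically at the end of each iteration, so rows that match no condition are written nowhere.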

Making several observations from one using the OUTPUT statement


To write several observations for each pass through the DATA step, put an OUTPUT statement in a DO loop or just use several OUTPUT statements.
Example: say we want to generate data points for plotting the equation y = x**2.
DATA generate;
  DO x = 1 TO 6;
    y = x ** 2;
    OUTPUT;
  END;
RUN;

Since the OUTPUT statement is within the DO loop, an observation is created each time through the loop. Without the OUTPUT statement, SAS would have written only one observation at the end of the DATA step.

Some useful functions used in SAS


Functions let you modify or transform the values in your data.

To extract a portion of a character value:
new_variable = SUBSTR(variable, start-position, length)

To check the length of values:
new_variable = LENGTH(variable)

To remove blanks from values:
new_variable = COMPRESS(variable)

To extract the nth word of a value that is delimited by special characters like -, ( or _:
new_variable = SCAN(variable, n, 'delimiters')

To extract the month, year or day part of dates:
new_variable = MONTH(variable) or YEAR(variable) or DAY(variable)

When a variable holds both date and time (a datetime value), e.g. 23Apr06 00:00:00, the date part is extracted using:
new_variable = DATEPART(variable)
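The following sketch applies each of these functions to made-up values; the variable names and literals are hypothetical.

```sas
data demo;
  code     = 'AB-123_X';
  prefix   = substr(code, 1, 2);       /* 'AB': 2 characters from position 1 */
  len      = length(code);             /* 8 */
  spaced   = 'a b  c';
  squeezed = compress(spaced);         /* 'abc': all blanks removed */
  second   = scan(code, 2, '-_');      /* '123': 2nd word, split on - or _ */
  dt       = '23APR2006:00:00:00'dt;   /* datetime value */
  d        = datepart(dt);             /* SAS date for 23 April 2006 */
  mon      = month(d);                 /* 4 */
  yr       = year(d);                  /* 2006 */
  dy       = day(d);                   /* 23 */
  format d date9. dt datetime20.;
run;

proc print data=demo noobs;
run;
```

MONTH, YEAR and DAY expect a SAS date, so a datetime value has to go through DATEPART first.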

Agenda Module 3
Proc SQL Arrays / DO-END Retain / First. Last.

PROC SQL: What?

What can SQL do?
Selecting
Ordering/sorting
Subsetting
Restructuring
Creating tables/views
Joining/merging
Transforming variables
Editing

PROC SQL: Why?

The advantages of using SQL:
Combined functionality
Faster for smaller tables
SQL code is more portable to non-SAS applications
Does not require presorting
Does not require common variable names to join on (the join columns need the same type and length)

Selecting Data
PROC SQL;
  SELECT DISTINCT rating
  FROM MFE.MOVIES;
QUIT;

The simplest SQL code needs 3 statements.
By default, PROC SQL prints the query result; use the NOPRINT option to suppress this.
Begin with PROC SQL and end with QUIT; not RUN;
You need at least one SELECT ... FROM statement.
DISTINCT is an option that removes duplicate rows.

Ordering/Sorting Data
PROC SQL;
  SELECT *
  FROM MFE.MOVIES
  ORDER BY category;
QUIT;

Remember that line breaks within a SAS statement have no effect, so the middle statement can be written across 3 lines.
SELECT * selects all variables from the data set MFE.MOVIES.
Put ORDER BY after FROM.
Here the data is sorted by the variable category.

Sub-Setting Data: Character searching in WHERE

PROC SQL;
  SELECT title, category
  FROM MFE.MOVIES
  WHERE category CONTAINS 'Action';
QUIT;

Use a comma (,) to separate selected variables.
CONTAINS in the WHERE clause works only for character variables.
Also try WHERE UPCASE(category) LIKE '%ACTION%';
Use the percent sign (%) as a wildcard character with the LIKE operator.

Sub-Setting Data: Phonetic matching in WHERE

PROC SQL;
  SELECT title, category, rating
  FROM MFE.MOVIES
  WHERE category =* 'Drana';
QUIT;

Always put WHERE after FROM.
=* is the sounds-like operator.
The query searches category for phonetic variations of 'Drama', which also helps catch possible spelling variations.

Creating New Data: Create Table

PROC SQL;
  CREATE TABLE ACTION AS
  SELECT title, category
  FROM MFE.MOVIES
  WHERE category CONTAINS 'Action';
QUIT;

CREATE TABLE ... AS can always be placed in front of a SELECT ... FROM statement to build a SAS data set.
With a plain SELECT, the results of a query are converted to an output object (printed). Query results can also be stored as data.
The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view.
Either way, the data identified in the query can be used in later SQL statements or in other SAS steps.
The example produces a new data set (table) ACTION in the WORK library, with no printing.
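For comparison, here is a hedged sketch of CREATE VIEW. The small movies table is a hypothetical stand-in for MFE.MOVIES, and action_v is an invented view name.

```sas
/* Hypothetical stand-in for MFE.MOVIES */
data movies;
  length title $20 category $20;
  input title $ category $;
  datalines;
Speed Action
Alien Horror
Ronin Action
;
run;

proc sql;
  /* Stores the query itself; no rows are copied */
  create view action_v as
    select title, category
    from movies
    where category contains 'Action';

  /* The view is evaluated when it is used */
  select * from action_v;
quit;
```

A view always reflects the current contents of the underlying table, whereas a table created with CREATE TABLE is a snapshot taken when the query ran.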

Join Tables (Merge datasets): Cartesian Join

PROC SQL;
  SELECT *
  FROM MFE.CUSTOMERS, MFE.MOVIES;
QUIT;

Terminology: join (merge) tables (data sets).
No prior sorting is required, one advantage over a DATA step MERGE.
Use a comma (,) to separate the two data sets in FROM.
Without WHERE, all possible combinations of rows from the two tables are produced, and all columns are included.
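Adding a WHERE clause turns the Cartesian product into a matched join. The sketch below uses two small hypothetical tables, customers and rentals, rather than the MFE library.

```sas
/* Hypothetical tables sharing the key cust_id */
data customers;
  input cust_id name $;
  datalines;
1 Ann
2 Bob
;
run;

data rentals;
  input cust_id title $;
  datalines;
1 Speed
1 Alien
3 Ronin
;
run;

proc sql;
  /* Keeps only the row pairs whose cust_id values match */
  select c.cust_id, c.name, r.title
  from customers as c, rentals as r
  where c.cust_id = r.cust_id;
quit;
```

Without the WHERE clause this query would return all 2 x 3 = 6 row combinations; with it, only the two rentals belonging to customer 1 remain.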

Turn on the HTML result option for better display: Tools / Options / Preferences / Results / check Create HTML / OK

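Transforming Data: Summarizing data using SQL functions

PROC SQL;
  SELECT rating,
         COUNT(title) AS notitle,
         MAX(year) AS most_recent,
         MIN(year) AS earliest,
         SUM(length) AS total_length,
         NMISS(rating) AS nomissing
  FROM MFE.MOVIES
  GROUP BY rating;
QUIT;

Simple summarization functions are available.
All of these functions can operate on GROUPs.
Select the GROUP BY variable rather than *, so that the result has one row per group; with SELECT *, SAS remerges the summary values onto every original row.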

Array Processing
You can use arrays to simplify programs that
perform repetitive calculations
create many variables with the same attributes
read data
rotate SAS data sets by making variables into observations or observations into variables
compare variables
perform a table lookup.

What Is a SAS Array?


An array in SAS provides a means for repetitively processing variables using a DO loop.
Arrays are merely a convenient way of grouping variables, and do not persist beyond the DATA step in which they are used.
SAS arrays can be used for simple repetitive tasks, reshaping data sets, and remembering values from observation to observation.
Arrays also allow some traditional matrix-style programming techniques to be used in the DATA step.

In short, a SAS array
is a temporary grouping of SAS variables that are arranged in a particular order
is identified by an array name
exists only for the duration of the current DATA step
is not a variable.

Each value in an array is
called an element
identified by a subscript that represents the position of the element in the array.

Array Statement: Syntax


ARRAY name <{nelem}> <$> <elements> <(initial-values)>;

Examples:
array x x1-x3;
array check{5} _temporary_;
array miss{4} _temporary_ (9 9 99 9);
array dept $ dept1-dept4 ('Sales', 'Research', 'Training');
array value{3}; * generates value1, value2 and value3;

All variables in an array must have the same type (numeric or character).
An array can't have the same name as a variable in the same DATA step.
You must explicitly state the number of elements when using _temporary_; in other cases SAS figures it out from context, generating new variables if necessary.
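As an illustration of _temporary_ with initial values, the sketch below scores answers against a lookup key. The variable names and the key values are invented for this example.

```sas
data scored;
  input id q1-q3;
  /* Temporary array: 3 constants, no variables added to the data set */
  array key{3} _temporary_ (2 4 1);
  array q{3} q1-q3;
  score = 0;
  do i = 1 to 3;
    if q{i} = key{i} then score = score + 1;
  end;
  drop i;
  datalines;
1 2 4 1
2 2 3 1
;
run;
```

With these two rows, score should be 3 for id 1 and 2 for id 2. The output data set contains id, q1-q3 and score, but nothing from the key array.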

What is a SAS Array?

Array name: CONTRIB
Variables in the data set: ID, QTR1, QTR2, QTR3, QTR4

Array reference   Variable   Element
CONTRIB{1}        QTR1       First element
CONTRIB{2}        QTR2       Second element
CONTRIB{3}        QTR3       Third element
CONTRIB{4}        QTR4       Fourth element

The ARRAY Statement


The ARRAY statement defines the elements in an array. These elements will be processed as a group. You refer to elements of the array by the array name and subscript.

ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;

The ARRAY Statement


The ARRAY statement
must contain all numeric or all character elements
must be used to define an array before the array name can be referenced
creates variables if they do not already exist in the PDV
is a compile-time statement.

Defining an Array
Write an ARRAY statement that defines the four quarterly contribution variables as elements of an array.

array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;

Array name: CONTRIB
Variables in the data set: ID, QTR1, QTR2, QTR3, QTR4

Element          Variable
First element    QTR1
Second element   QTR2
Third element    QTR3
Fourth element   QTR4

Defining an Array
Variables that are elements of an array need not have similar, related or numbered names.
array Contrib2{4} Q1 Qrtr2 ThrdQ Qtr4;

Array name: CONTRIB2
Variables in the data set: ID, Q1, QRTR2, THRDQ, QTR4

Element          Variable
First element    Q1
Second element   QRTR2
Third element    THRDQ
Fourth element   QTR4

Processing an Array
Array processing often occurs within DO loops. An iterative DO loop that processes an array has the following form:

DO index-variable=1 TO number-of-elements-in-array;
  additional SAS statements using array-name{index-variable}
END;

To execute the loop as many times as there are elements in the array, specify that the values of index-variable range from 1 to number-of-elements-in-array.
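Rather than hard-coding the upper bound, the DIM function can supply the number of elements. A minimal sketch with hypothetical variables a, b and c:

```sas
data doubled;
  input a b c;
  array vals{*} a b c;       /* {*} lets SAS count the elements */
  do i = 1 to dim(vals);     /* DIM returns the element count, here 3 */
    vals{i} = vals{i} * 2;
  end;
  drop i;
  datalines;
1 2 3
4 5 6
;
run;
```

Using DIM means the loop keeps working if variables are later added to or removed from the ARRAY statement.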

Processing an Array
array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
do Qtr=1 to 4;
  Contrib{Qtr}=Contrib{Qtr}*1.25;
end;

Value of index variable Qtr   Array reference CONTRIB{QTR}   Variable   Element
1                             CONTRIB{1}                     QTR1       First element
2                             CONTRIB{2}                     QTR2       Second element
3                             CONTRIB{3}                     QTR3       Third element
4                             CONTRIB{4}                     QTR4       Fourth element

Performing Repetitive Calculations


data charity(drop=Qtr);
  set prog2.donate;
  array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
  do Qtr=1 to 4;
    Contrib{Qtr}=Contrib{Qtr}*1.25;
  end;
run;

On each iteration of the DO loop, the array reference resolves to a different variable:

When Qtr=1, Contrib{1}=Contrib{1}*1.25; is equivalent to Qtr1=Qtr1*1.25;
When Qtr=2, Contrib{2}=Contrib{2}*1.25; is equivalent to Qtr2=Qtr2*1.25;
When Qtr=3, Contrib{3}=Contrib{3}*1.25; is equivalent to Qtr3=Qtr3*1.25;
When Qtr=4, Contrib{4}=Contrib{4}*1.25; is equivalent to Qtr4=Qtr4*1.25;

Performing Repetitive Calculations


proc print data=charity noobs;
run;

Partial PROC PRINT Output

ID        Qtr1     Qtr2     Qtr3      Qtr4
E00224    15.00    41.25    27.50     .
E00367    43.75    60.00    50.00     37.50
E00441    .        78.75    111.25    112.50
E00587    20.00    23.75    37.50     36.25
E00598    5.00     10.00    7.50      1.25

Creating Variables with Arrays

Calculate the percentage that each quarter's contribution represents of the employee's total annual contribution. Base the percentage only on the employee's actual contribution and ignore the company contributions.

Partial Listing of prog2.donate

ID        Qtr1    Qtr2    Qtr3    Qtr4
E00224    12      33      22      .
E00367    35      48      40      30

Creating Variables with Arrays


data percent(drop=Qtr);
  set prog2.donate;
  Total=sum(of Qtr1-Qtr4);
  array Contrib{4} Qtr1-Qtr4;
  array Percent{4};
  do Qtr=1 to 4;
    Percent{Qtr}=Contrib{Qtr}/Total;
  end;
run;

The second ARRAY statement creates four numeric variables: Percent1, Percent2, Percent3, and Percent4.

c07s3d1.sas

Creating Variables with Arrays


proc print data=percent noobs;
  var ID Percent1-Percent4;
  format Percent1-Percent4 percent6.;
run;

Partial PROC PRINT Output

ID        Percent1    Percent2    Percent3    Percent4
E00224    18%         49%         33%         .
E00367    23%         31%         26%         20%
E00441    .           26%         37%         37%
E00587    17%         20%         32%         31%
E00598    21%         42%         32%         5%

Creating Variables with Arrays


Calculate the difference in each employee's actual contribution from one quarter to the next.

Partial Listing of prog2.donate

ID        Qtr1    Qtr2    Qtr3    Qtr4
E00224    12      33      22      .
E00367    35      48      40      30

First difference: Qtr2 - Qtr1
Second difference: Qtr3 - Qtr2
Third difference: Qtr4 - Qtr3

Creating Variables with Arrays


data change(drop=i);
  set prog2.donate;
  array Contrib{4} Qtr1-Qtr4;
  array Diff{3};
  do i=1 to 3;
    Diff{i}=Contrib{i+1}-Contrib{i};
  end;
run;

c07s3d2.sas

On each iteration of the DO loop, the array references resolve to different variables:

When i=1, Diff{1}=Contrib{2}-Contrib{1}; is equivalent to Diff1=Qtr2-Qtr1;
When i=2, Diff{2}=Contrib{3}-Contrib{2}; is equivalent to Diff2=Qtr3-Qtr2;
When i=3, Diff{3}=Contrib{4}-Contrib{3}; is equivalent to Diff3=Qtr4-Qtr3;

Creating Variables with Arrays


proc print data=change noobs;
  var ID Diff1-Diff3;
run;

Partial PROC PRINT Output

ID        Diff1    Diff2    Diff3
E00224    21       -11      .
E00367    13       -8       -10
E00441    .        26       1
E00587    3        11       -1
E00598    4        -2       -5

Assigning Initial Values


Determine the difference between employee contributions and last year's average quarterly goals of $10, $15, $5, and $10 per employee.

data compare(drop=Qtr Goal1-Goal4);
  set prog2.donate;
  array Contrib{4} Qtr1-Qtr4;
  array Diff{4};
  array Goal{4} Goal1-Goal4 (10,15,5,10);
  do Qtr=1 to 4;
    Diff{Qtr}=Contrib{Qtr}-Goal{Qtr};
  end;
run;

Assigning Initial Values


proc print data=compare noobs;
  var ID Diff1 Diff2 Diff3 Diff4;
run;

Partial PROC PRINT Output

ID        Diff1    Diff2    Diff3    Diff4
E00224    2        18       17       .
E00367    25       33       35       20
E00441    .        48       84       80
E00587    6        4        25       19
E00598    -6       -7       1        -9

Using SAS Automatic Variables


_N_ and _ERROR_
_N_ indicates the number of times SAS has looped through the DATA step (not necessarily equal to the observation number).
_ERROR_ has a value of 1 if there is a data error for that observation and 0 if there isn't.
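Automatic variables are not written to the output data set, so the sketch below copies them into ordinary variables to make them visible. The variable names iteration and bad_read are invented, and the second data line is deliberately invalid for a numeric variable.

```sas
data check;
  input x;
  iteration = _n_;      /* copy: _N_ itself is not written to the output */
  bad_read  = _error_;  /* 1 when the current record had a data error */
  datalines;
10
abc
30
;
run;

proc print data=check noobs;
run;
```

For the second record SAS sets x to missing, writes an invalid-data note to the log, and sets _ERROR_ to 1, so bad_read should be 0, 1, 0 across the three rows.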

FIRST.variable and LAST.variable

FIRST.variable and LAST.variable are available when using a BY statement in a DATA step.
FIRST.variable has a value of 1 when SAS is processing an observation with the first occurrence of a new value for that variable, and a value of 0 for the other observations.
Similarly, LAST.variable has a value of 1 for an observation with the last occurrence of a value for that variable, and 0 otherwise.
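To see the flags, they can be copied into ordinary variables, since FIRST. and LAST. variables are temporary. The visits data set below is hypothetical and already sorted by patient.

```sas
/* Hypothetical data, already sorted by patient as BY requires */
data visits;
  input patient visit_day;
  datalines;
1 5
1 9
2 3
3 1
3 4
3 8
;
run;

data flags;
  set visits;
  by patient;
  is_first = first.patient;  /* 1 on the first row of each patient */
  is_last  = last.patient;   /* 1 on the last row of each patient */
run;

proc print data=flags noobs;
run;
```

Patient 2 has a single row, so both is_first and is_last are 1 on that row.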

Use of Retain, first. and last.


The goal is to compare each observation with the previous and the next observation. If they are the same, then flag the observation.
data real_life;
  input person topicA;
cards;
1 0
1 1
3 -1
1 0
2 0
1 1
2 -1
2 -1
3 0
3 1
4 0
1 1
4 1
4 0
2 -1
4 0
4 0
1 -1
;
run;

Using first.
We need to number the observations within each person. We will use first.person to do this, so we must first sort the data on person. Then we create the variable count, which enumerates the observations within each person.
proc sort data=real_life out=sort_real;
  by person;
run;

data count_real;
  set sort_real;
  retain count;
  by person;
  if first.person then count = 0;
  count = count + 1;
run;

proc print data=count_real noobs;
run;

Use of both first. and last.

We now convert the data set from long to wide. Note: we are using first.person and last.person, but we do not need to re-sort the data since it is already sorted on person.
data wide_real;
  set count_real;
  array AtopicA(6) topicA_1-topicA_6;
  retain topicA_1-topicA_6;
  by person;
  if first.person then do;
    do i = 1 to 6;
      AtopicA[i] = .;
    end;
  end;
  AtopicA(count) = topicA; /*looping across values in the variable count*/
  if last.person then output; /* outputs only the last obs per person */
run;

proc print data=wide_real noobs;
  var person topicA_1-topicA_6;
run;

Use of both first. and last.

Now, let's find the people who have the same value for 3 observations in a row.

data three;
  set wide_real;
  array topic(6) topicA_1-topicA_6;
  do i = 2 to 5;
    if topic[i-1] ne . & topic[i] ne . & topic[i+1] ne .
       & topic[i]=topic[i-1] & topic[i]=topic[i+1] then flagA=1;
  end;
  if flagA=. then flagA=0;
run;

proc print data=three noobs;
  var person topicA_1-topicA_6 flagA;
run;

Thank you !
