Base Five 08
3. Numeric functions
5. Character functions
6. Global Statements
1. Overview of Operators in SAS
Definitions: A SAS operator is a symbol that represents a comparison, arithmetic
calculation, or logical operation; a SAS function; or grouping parentheses. SAS uses two
major kinds of operators:
prefix operators
infix operators.
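A prefix operator applies to the operand that immediately follows it; for example, the minus sign in -6 is a prefix operator. An infix operator applies to the operands on each side of it, for example, 6<8. Infix operators include the following: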
arithmetic
comparison
logical, or Boolean
minimum
maximum
concatenation.
When used to perform arithmetic operations, the plus and minus signs are infix
operators.
2. Infix Operator
Arithmetic Operators
Symbol  Definition      Example
**      exponentiation  where y = a**2;
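Only the exponentiation row of the table survives. The sketch below exercises each SAS arithmetic operator, plus the minus sign used as a prefix operator; the data set name ARITH and the value of A are made up for illustration.

```sas
data arith;                 /* hypothetical data set name */
   a = 3;
   y = a**2;                /* exponentiation: 9 */
   m = 2*a;                 /* multiplication: 6 */
   d = a/2;                 /* division: 1.5 */
   s = a+4;                 /* addition: 7 */
   t = a-5;                 /* subtraction (infix minus): -2 */
   n = -a;                  /* prefix minus: -3 */
   put y= m= d= s= t= n=;   /* writes the results to the log */
run;
```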
Comparison Operators
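A comparison in SAS evaluates to 1 when it is true and 0 when it is false. This sketch (the data set name COMPARE is made up) evaluates a few comparisons, using the 6<8 example from the overview and noting the mnemonic form of each symbol:

```sas
data compare;                  /* hypothetical data set name */
   x = 6;
   is_lt = (x < 8);            /* same as x lt 8: 1 (true) */
   is_eq = (x = 8);            /* same as x eq 8: 0 (false) */
   is_ne = (x ^= 8);           /* same as x ne 8: 1 */
   is_ge = (x >= 6);           /* same as x ge 6: 1 */
   put is_lt= is_eq= is_ne= is_ge=;
run;
```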
2.3 Others
2.3.1 IN Operator
The IN operator, which is a comparison operator, searches for character and numeric values that are equal to one of the values in a list. The list must be enclosed in parentheses, each character value must be enclosed in quotation marks, and the values are separated by either commas or blanks.
For example, suppose you want all sites that are in North Carolina or Texas. You could
specify:
where state = 'NC' or state = 'TX';
However, it is easier to use the IN operator, which selects any state in the list:
where state in ('NC','TX');
In addition, you can use the NOT logical operator to exclude a list. For example,
where state not in ('CA', 'TN', 'MA');
Example:
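data in_ex;
length store $ 12; /* store names are longer than the default length of 8 */
input store $ vcr_price vcd_price cd_player_price;
datalines;
future_shop 169.99 69.99 79.99
sony_store 179.99 64.99 84.99
radio_shack 159.99 64.99 69.99
three_d 174.99 67.49 74.99
electron 174.99 65.99 69.99
;
run;

data in_ex_2;
set in_ex;
/* keep stores whose VCR price is neither 159.99 nor 169.99;  */
/* a subsetting IF statement with the same condition also works */
where vcr_price not in (159.99, 169.99);
run;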
2.3.2 BETWEEN-AND Operator
The BETWEEN-AND operator is a fully-bounded range condition that selects observations in which the value of a variable falls within a range of values. You can specify the limits of the range as constants or expressions. Any range you specify is an inclusive range, so a value equal to one of the limits of the range is within the range. The general syntax for using BETWEEN-AND is:
WHERE variable BETWEEN value AND value;
For example:
where empnum between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
You can combine the NOT logical operator with a fully-bounded range condition to select observations that fall outside the range. Note that parentheses are required:
where not (500 <= empnum <= 1000);
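A small sketch with made-up employee numbers (the data set names EMP, IN_RANGE, and OUT_RANGE are hypothetical) shows that both limits are included in the range and that NOT BETWEEN selects the complement:

```sas
data emp;                                    /* hypothetical employee numbers */
   input empnum @@;
   datalines;
450 500 750 1000 1200
;
run;

data in_range;
   set emp;
   where empnum between 500 and 1000;        /* inclusive: keeps 500, 750, 1000 */
run;

data out_range;
   set emp;
   where empnum not between 500 and 1000;    /* keeps 450 and 1200 */
run;
```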
Logical (Boolean) Operators
Two comparisons with a common variable linked by AND can be condensed with an implied AND. For example, the following two subsetting IF statements produce the same result:
if 16<=age and age<=65;
if 16<=age<=65;
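Here is a runnable sketch of the implied AND, together with the OR and NOT forms that select the complementary observations. The AGES data set and its values are made up for illustration.

```sas
data ages;                          /* hypothetical data */
   input age @@;
   datalines;
12 16 40 65 70
;
run;

data working_age;
   set ages;
   if 16<=age<=65;                  /* implied AND: keeps 16, 40, 65 */
run;

data others_or;
   set ages;
   if age<16 or age>65;             /* OR: keeps 12 and 70 */
run;

data others_not;
   set ages;
   if not (16<=age<=65);            /* NOT with parentheses: also keeps 12 and 70 */
run;
```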
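Concatenation Operator
The concatenation operator (||) joins character values. It does not remove leading or trailing blanks, so a value padded with trailing blanks keeps them in the result. For example, in this DATA step, the value that results from the concatenation contains blanks because the length of the COLOR variable is eight:
data namegame;
length color name $8 game $12;
color='black';
name='jack';
game=color||name; /* COLOR is stored as 'black' followed by three blanks */
put game=;        /* the log shows game=black   jack */
run;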
The value of GAME is 'black   jack', with three blanks between the words. To correct this problem, use the TRIM function in the concatenation operation as follows:
game=trim(color)||name;
This statement produces a value of 'blackjack' for the variable GAME.
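Put together, the corrected step looks like the sketch below; the data set name NAMEGAME2 is made up so that it does not overwrite the first version.

```sas
data namegame2;                  /* hypothetical name for the corrected step */
   length color name $8 game $12;
   color='black';
   name='jack';
   game=trim(color)||name;       /* TRIM drops the trailing blanks of COLOR */
   put game=;                    /* the log shows game=blackjack */
run;
```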
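The next example builds a character value from a character variable and a numeric variable:
data _null_;
month='sep'; year=99;
/* PUT converts the numeric YEAR to a character value, LEFT removes  */
/* its leading blanks, and TRIM removes the trailing blanks of MONTH */
date=trim(month) || left(put(year,8.));
put date=; /* the log shows date=sep99 */
run;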
3. Numeric functions
3.1 Mathematical
The following is a brief summary of SAS functions useful for defining models.
ABS(x) the absolute value of x
COS(x) the cosine of x. x is in radians.
EXP(x) e raised to the power x
LOG(x) the natural logarithm of x
LOG10(x) the log base ten of x
LOG2(x) the log base two of x
SIN(x) the sine of x. x is in radians.
SQRT(x) the square root of x
TAN(x) the tangent of x. x is in radians and is not an odd multiple of π/2.
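A quick sketch that calls each function on an argument with an easy-to-check answer; the data set name MATH_DEMO is made up, and CONSTANT('PI') is used to supply π for the trigonometric functions.

```sas
data math_demo;                 /* hypothetical data set name */
   pi  = constant('pi');        /* CONSTANT returns mathematical constants */
   a   = abs(-4);               /* 4 */
   e1  = exp(1);                /* e, about 2.71828 */
   l   = log(exp(2));           /* natural log undoes EXP: 2 */
   l10 = log10(1000);           /* 3 */
   l2  = log2(8);               /* 3 */
   r   = sqrt(16);              /* 4 */
   s   = sin(pi/2);             /* 1; the argument is in radians */
   c   = cos(0);                /* 1 */
   t   = tan(pi/4);             /* about 1 */
   put a= e1= l= l10= l2= r= s= c= t=;
run;
```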
3.2 Statistical
Kurtosis: describes the “heaviness of the tails” of a distribution
Skewness: a measure of the tendency for the distribution values to be more spread out on one side than the other
Max and Min: the largest and the smallest value
Mean: the average, that is, the sum of the values divided by the number of values
Median: the value that lies in the middle of the distribution, that is, the value that divides the distribution exactly in half (for 3, 5, 8, 10, 11 the median is 8). It is a valuable alternative to the mean when a) there are a few extreme scores in the distribution, b) some values are undetermined, or c) the data are measured on an ordinal scale (see the example below)
Range: the maximum minus the minimum
Sum: the total of the values
Variance: a measure of variability, based on the squared deviations from the mean
Std: standard deviation, the square root of the variance
Stderr: standard error of the mean, the standard deviation divided by the square root of the sample size
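SAS provides a function with the same name for each of these statistics. The sketch below applies them across one row of values, reusing the five numbers from the median example; the data set name STATS_DEMO is made up. By default VAR (and therefore STD and STDERR) divides the sum of squared deviations by n - 1.

```sas
data stats_demo;                        /* hypothetical data set name */
   x1=3; x2=5; x3=8; x4=10; x5=11;      /* the values from the median example */
   mean_x   = mean(of x1-x5);           /* 37/5 = 7.4 */
   median_x = median(of x1-x5);         /* 8 */
   min_x    = min(of x1-x5);            /* 3 */
   max_x    = max(of x1-x5);            /* 11 */
   range_x  = range(of x1-x5);          /* 11 - 3 = 8 */
   sum_x    = sum(of x1-x5);            /* 37 */
   var_x    = var(of x1-x5);            /* 45.2/(5-1) = 11.3 */
   std_x    = std(of x1-x5);            /* square root of 11.3, about 3.36 */
   stderr_x = stderr(of x1-x5);         /* std_x/sqrt(5), about 1.50 */
   skew_x   = skewness(of x1-x5);
   kurt_x   = kurtosis(of x1-x5);
   put mean_x= median_x= range_x= sum_x= var_x= std_x= stderr_x= skew_x= kurt_x=;
run;
```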
Example:
data income_1;
input region $ income; /* list input: the two fields are separated by blanks */
cards;
Edison 99800
Edison 109800
Edison 120000
Edison 96500
Edison 90550
Edison 115000
Edison 142500
Edison 73000
Edison 79850
Edison 55890
Edison 23000
Edison 19800
Edison 82000
Edison 76800
Edison 39800
Edison 22800
Edison 58650
;
run;
proc univariate data=income_1 plot normal;
var income;
run;
proc chart data =income_1;
vbar income/levels =8;
run;
data income_2;
input region $ income; /* list input: the two fields are separated by blanks */
cards;
Edison 11800
Edison 29800
Edison 15000
Edison 26500
Edison 39550
Edison 22000
Edison 62500
Edison 83000
Edison 29850
Edison 35890
Edison 53000
Edison 19800
Edison 72000
Edison 36800
Edison 39800
Edison 22800
Edison 58650
;
run;
proc univariate data=income_2 plot normal;
var income;
run;
proc chart data =income_2;
vbar income/levels =10;
run;
SAS date value: a value representing the number of days between January 1, 1960, and a specified date.
SAS time value: a value representing the number of seconds since midnight of the current day.
SAS datetime value: a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.
Example:
options nodate pageno=1 linesize=80 pagesize=60;
data test;
Time1=86399; /* displayed as a datetime value */
format Time1 datetime.;
Date1=86399; /* displayed as a date value */
format Date1 date.;
Time2=86399; /* displayed as a time value */
format Time2 timeampm.;
run;
proc print data=test;
title 'Same Number, Different SAS Values';
run;
Output:
Obs Time1 Date1 Time2
If dates in your external data sources or SAS program statements contain two-digit years,
you can determine which century prefix should be assigned to them by using the
YEARCUTOFF= system option. The YEARCUTOFF= system option specifies the first
year of the 100-year span that is used to determine the century of a two-digit year.
Before you use the YEARCUTOFF= system option, examine the dates in your data:
If the dates in your data fall within a 100-year span, you can use the
YEARCUTOFF= system option.
If the dates in your data do not fall within a 100-year span, you must either
convert the two-digit years to four-digit years or use a DATA step with
conditional logic to assign the proper century prefix.
Once you've determined that the YEARCUTOFF= system option is appropriate for your
range of data, you can determine the setting to use. The best setting for YEARCUTOFF=
is a year just slightly lower than the lowest year in your data. For example, if you have
data in a range from 1921 to 1999, set YEARCUTOFF= to 1920, if that is not already
your system default. The result of setting YEARCUTOFF= to 1920 is that
SAS interprets all two-digit dates in the range of 20 through 99 as 1920 through
1999.
SAS interprets all two-digit dates in the range 00 through 19 as 2000 through
2019.
The following figure shows the span of years when the YEARCUTOFF= option is set to
a value of 1920. The 100-year span in this case is from 1920 to 2019.
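A sketch of the 1920 setting in practice: the data set name CENTURY and the three dates are made up. Because the default YEARCUTOFF= value can differ between SAS releases, the sketch sets the option explicitly before reading the two-digit years.

```sas
options yearcutoff=1920;        /* 100-year span: 1920 through 2019 */

data century;                   /* hypothetical data set name */
   input d mmddyy8.;            /* two-digit years are resolved with YEARCUTOFF= */
   format d date9.;             /* display four-digit years */
   datalines;
07/04/21
07/04/19
07/04/99
;
run;

proc print data=century;        /* 04JUL1921, 04JUL2019, 04JUL1999 */
run;
```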
The SAS System converts date, time and datetime values back and forth between
calendar dates and clock times with SAS language elements called formats and
informats. SAS uses formats and informats to interpret and display data in convenient
ways.
Formats present a value, recognized by SAS, such as a time or date value, as a
calendar date or clock time in a variety of lengths and notations, and provide
instructions for how to display a variable on output.
Informats read notations or a value, such as a clock time or a calendar date, which may be in a variety of lengths, and then convert the data to a SAS date, time, or datetime value and provide instructions for how to interpret data as it is read.
Informats can be specified in an INFORMAT statement or in the INPUT statement, either directly after the variable name (formatted input) or after a colon following the variable name (modified list input).
Format and informat names ALWAYS contain a period, for example date9., mmddyy8., or 8.2!
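The two directions can be seen side by side with the INPUT and PUT functions, which apply an informat and a format to a single value. In this sketch the data set name FMT_DEMO and the date are made up.

```sas
data fmt_demo;                            /* hypothetical data set name */
   d = input('07/04/1997', mmddyy10.);    /* informat: text to SAS date value */
   text = put(d, date9.);                 /* format: SAS date value to text */
   put d= text=;                          /* d=13699 text=04JUL1997 */
run;
```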
Example:
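data aa;
/* formatted input: each date informat directly follows its variable name */
input @1 id 3.
@5 doa mmddyy8.
@14 dob mmddyy8.;
format doa dob date9.;       /* display the SAS date values as calendar dates */
age = (doa-dob)/365.25;      /* approximate age in years at DOA */
age2 =int((doa-dob)/365.25); /* completed years only */
datalines;
001 06/21/97 05/13/66
002 05/04/98 11/28/96
003 10/15/99 09/25/45
;
run;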
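- INTCK
The INTCK function returns the number of boundaries of intervals of the given kind (for example, 'year', 'month', or 'day') that lie between two date or datetime values.
data intk;
input dob mmddyy8.;
/* count from the date of birth to today's date */
age_year= intck('year', dob, today());   /* January 1 boundaries crossed */
age_month= intck('month', dob, today()); /* first-of-month boundaries crossed */
age_day = intck('day', dob, today());    /* number of days */
format dob date9.;
datalines;
06/17/97
;
run;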
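Because INTCK counts boundaries rather than elapsed intervals, two dates one day apart can be "one year" apart. This sketch (the data set name INTCK_DEMO is made up) shows the effect and the optional 'c' (continuous) method argument, which counts complete intervals instead:

```sas
data intck_demo;                              /* hypothetical data set name */
   d1 = '31DEC1996'd;
   d2 = '01JAN1997'd;
   years      = intck('year', d1, d2);        /* 1: one January 1 boundary is crossed */
   months     = intck('month', d1, d2);       /* 1 */
   days       = intck('day', d1, d2);         /* 1 */
   full_years = intck('year', d1, d2, 'c');   /* 0: no complete year has elapsed */
   put years= months= days= full_years=;
run;
```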
- MDY
The MDY function converts numeric month, day, and year values into a SAS date value.
Example:
data m_d_y;
input month day year;
date=mdy(month, day, year); /* two-digit years are interpreted with YEARCUTOFF= */
drop month day year;
format date mmddyy8.;
datalines;
11 25 98
03 20 04
;
run;
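MDY also validates its arguments: a day that does not exist in the given month produces a missing value (with a note in the log) rather than a date. A minimal sketch, with a made-up data set name:

```sas
data mdy_demo;                      /* hypothetical data set name */
   good = mdy(7, 4, 1997);          /* 13699, the SAS date for July 4, 1997 */
   bad  = mdy(2, 30, 1997);         /* missing: February 30 does not exist */
   format good date9.;
   put good= bad=;
run;
```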
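- TODAY() Function
This function returns today’s date as a SAS date value from your computer’s system clock.
data to;
input @1 jobid $ @6 date yymmdd8.; /* reads an eight-digit date such as 19970426 */
a=today();
b=year(a)-year(date);              /* difference in calendar years */
format a date date9.;
datalines;
A010 19970426
;
run;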
Special features in Base SAS Software allow users to declare a particular date or time as
a constant without having to know the number of days from January 1, 1960 and/or the
number of seconds since midnight. A date or time constant is declared by enclosing a date or time in single quotes, followed by the letter D for a date, T for a time, or DT for a datetime. For example:
X = '04JUL97'D sets the new variable X equal to the number of days between January 1, 1960 and July 4, 1997.
Y = '09:00'T sets the new variable Y to the number of seconds between midnight and 9 am.
Z = '04JUL97:12:00'DT sets the variable Z to the number of seconds from January 1, 1960 to noon on July 4, 1997.
data dt;
X = '04JUL97'D ;
Y = '09:00'T ;
Z = '04JUL97:12:00'DT ;
run;
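The stored numbers can be checked directly. This sketch uses four-digit years so that the result does not depend on YEARCUTOFF=, and uses the DHMS function to rebuild the datetime from its parts; the data set name DT_CHECK is made up.

```sas
data dt_check;                        /* hypothetical data set name */
   x = '04JUL1997'd;                  /* 13699 days after January 1, 1960 */
   y = '09:00't;                      /* 32400 = 9 * 3600 seconds */
   z = '04JUL1997:12:00'dt;           /* 13699*86400 + 43200 = 1183636800 seconds */
   same_z = dhms(x, 12, 0, 0);        /* DHMS builds the same datetime from its parts */
   put x= y= z= same_z=;
run;
```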
5. Character functions
SAS software is rich in its assortment of functions that deal with character data. This class of functions is sometimes called STRING functions. In this lecture, we demonstrate some of the more useful string functions.
SUBSTR(SOURCE, POSITION, N)
The function returns N characters, beginning at character number POSITION from the string SOURCE.
- SOURCE: the larger or reference string. It can be a variable or a string of characters.
- POSITION: a positive integer that references the starting point to begin reading the internal group of characters.
- N: a positive integer that references the number of characters to read from the starting point POSITION in the field SOURCE.
Note that the SUBSTR function reads and writes character variables only.
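Applications
- Right-side application: when SUBSTR appears on the right side of an equal sign, it extracts part of a string, as in the following examples.
Examples: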
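data subs;
input city $ ;
city_id= substr(city, 1, 3); /* first three characters, for example VAN */
datalines;
VAN505
CAL408
OTT307
;
run;

data subs2;
input dxcode $; /* read as character: SUBSTR expects a character argument */
/* if DXCODE were read as numeric (input dxcode;), SAS would convert it  */
/* to character with the BEST12. format, right-aligned, so the first     */
/* three characters would be blanks                                      */
dxcode1=substr(dxcode, 1, 3);
datalines;
5791
5792
5796
;
run;

data subs3;
input longvar $15.;
cards;
19JAN1985215000
27NOV1993317500
11JUL1996376250
;
run;

data subs4; set subs3;
day=substr(longvar,1,2);      /* 19 */
month=substr(longvar,3,3);    /* JAN */
year=substr(longvar,6,4);     /* 1985 */
homeval=substr(longvar,10,6); /* 215000 */
run;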
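Examples (SCAN): SCAN(string, n <, delimiters>) returns the nth word of string. Words are separated by the delimiter characters; if delimiters is omitted, SAS uses a default list that includes the blank and several punctuation characters, such as the period and the comma.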
data auth;
input author $ 1-20 ;
datalines;
David F. Drak
David G. Hartwell
Paul S. Lovecraft
Horace V. Wadpole
Stuart D. Schiff
;
run;

data auth2;
length first_name middle_name last_name $ 20;
set auth;
first_name=scan(author, 1, ' '); /* first blank-separated word */
middle_name=scan(author, 2, ' ');
last_name=scan(author, 3, ' ');
run;

proc print data=auth2;
var first_name middle_name last_name;
title "Break the Author's Name into Three Parts"; /* double quotes: the text contains an apostrophe */
run;
data aa;
input address1 $ 1-30 address2 $ 31-64; /* column ranges match the data line below */
cards;
450 Shepard Ave. TO ON M3C7C1 450, Shepard Ave., TO, ON, M3C7C1
;
run;
data ab;
set aa;
num1 =scan(address1, 1);
st_nam1 =scan(address1, 2);
city_nam1 =scan(address1, 4);
state_nam1 =scan(address1, 5);
p_code1 =scan(address1, 6);
run;
data ac;
set aa;
num2 =scan(address2, 1, ',');
st_nam2 =scan(address2, 2, ',');
city_nam2 =scan(address2, 3, ',');
state_nam2=scan(address2, 4, ',');
p_code2 =scan(address2, 5, ',');
run;
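The UPCASE function converts all lowercase letters in its argument to uppercase.
Example:
data up1;
infile datalines;
input gender $ @@; /* @@ reads several values from each data line */
datalines;
f f m m
;
run;

data up2;
set up1;
gender=upcase(gender);
run;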
INDEXC(source,excerpt-1<,... excerpt-n>)
source specifies the character expression to search.
excerpt specifies the characters to search for in the character expression.
INDEXC searches source, from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n is found in source, INDEXC returns a value of 0.
/* Use INDEXC to locate the first occurrence of *any* character specified
   in the excerpt. This example helps us determine whether STRING is a
   street name or part of an address containing numbers. */
DATA two;
INPUT string $25.;
IF INDEXC(string,'0123456789') > 0 then has_numbers=string;
ELSE no_numbers=string;
CARDS;
Box 101
Pine Street
;
RUN;
INDEXW(source, excerpt)
source specifies the character expression to search.
excerpt specifies the string of characters to search for in the character expression. SAS
removes the leading and trailing blanks from excerpt.
The INDEXW function searches source, from left to right, for the first occurrence of excerpt as a separate word, that is, bounded by blanks or by the beginning or end of source, and returns the position in source of the word's first character. If the word is not found in source, INDEXW returns a value of 0. If there are multiple occurrences of the word, INDEXW returns only the position of the first occurrence.
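The word-based search is what distinguishes INDEXW from the INDEX function, which matches any substring. A sketch with a made-up string and data set name:

```sas
data indexw_demo;                            /* hypothetical data set name */
   source = 'the catalog lists a cat';
   pos_index  = index(source, 'cat');        /* 5: 'cat' inside 'catalog' */
   pos_indexw = indexw(source, 'cat');       /* 21: first 'cat' that is a separate word */
   put pos_index= pos_indexw=;
run;
```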
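The related FIND function returns the position of the first occurrence of a substring, whether or not it is a separate word. The modifier "I" ignores case, and a negative starting position makes FIND search from right to left:
data a2;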
source = "Single cpu, Multi-CPU.";
location1 = find(source, "cpu");
location2 = find(source, "cpu", -999, "I");
put location1= /location2=;
run;
SAS results:
location1=8
location2=19
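The CAT functions concatenate character strings. CAT keeps all leading and trailing blanks, CATS removes leading and trailing blanks, CATT removes trailing blanks, and CATX removes leading and trailing blanks and inserts the separator given as its first argument:
data test;
length new1 $40 new2-new4 $10;
input (x1-x4) ($);
x5=' 5';                   /* value with a leading blank */
new1=cat(of x1-x5);        /* keeps every blank */
new2=cats(of x1-x5);       /* strips leading and trailing blanks */
new3=catt(x1,x2,x3,x4,x5); /* strips trailing blanks only */
new4=catx(',', of x1-x5);  /* strips blanks, skips missing values, inserts commas */
keep new:;
datalines;
1 2 3 4
5 6 . 8
;
run;

proc print;
var new1-new4;
run;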
data test2;
infile cards missover; /* MISSOVER sets LAST to missing when a line holds only one name */
length first last $20;
input first $ last $ ;
datalines;
jone smith
john wayne
bill
phil hodge
;
run;
data test3;
set test2;
name = catx(", ", of last first); /* removes leading and trailing blanks, inserts separators */
name1 = cat(of last first);       /* does not remove leading or trailing blanks */
name2 = cats(of last first);      /* removes leading and trailing blanks */
name3 = catt(of last first);      /* removes trailing blanks */
newname1=trim(left(first))||' '||left(last); /* || equivalent of NEWNAME2 */
newname2=catx(' ', first, last);
run;
proc print data = test3;
run;
5.9 Compress
A more general problem is to remove selected characters from a string. For example,
suppose you want to remove blanks, parentheses, and dashes from a phone number that
has been stored as a character value. The COMPRESS function can remove any number
of specified characters from a character variable. The program below uses the COMPRESS function twice: the first call removes blanks from the string, and the second removes blanks plus the parentheses and dashes. Here is the code:
data phone;
input phone $ 1-15;
phone1 = compress(phone);
phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
;
title "Listing of Data Set PHONE";
proc print data=phone noobs;
run;
5.10 COMPBL
This example will demonstrate how to convert multiple blanks to a single blank. Suppose
you have some names and addresses in a file. Some of the data entry clerks placed extra
spaces between the first and last names and in the address fields.
data multiple;
input name $20. address $30.; /* NAME from columns 1-20, ADDRESS from 21-50 */
name = compbl(name);          /* 'Ron   Cody' becomes 'Ron Cody' */
address = compbl(address);
datalines;
Ron   Cody          89   Lazy  Brook Road
;
proc print data=multiple noobs;
run;
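TRANWRD(source, target, replacement) replaces all occurrences of the substring target in source with replacement:
data trans2;
length name $ 100;
name ='Mrs. Susan, Miss Debbie';
name1=tranwrd(name, "Mrs.", "Ms."); /* Ms. Susan, Miss Debbie */
name2=tranwrd(name, "Miss", "Ms."); /* Mrs. Susan, Ms. Debbie */
run;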
6. Global Statements
Global statements can be specified anywhere in your SAS program, and they remain in
effect until changed.
FOOTNOTE for printing footnote lines at the bottom of each page
%INCLUDE for including files of SAS statements
LIBNAME for accessing SAS data libraries
FILENAME for associating a fileref with an external file
OPTIONS for setting various SAS system options, for example PAGENO=, CENTER, LINESIZE=, PAGESIZE=, DATE, and NODATE
RUN for executing the preceding SAS statements
TITLE for printing title lines at the top of each page
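A sketch of several global statements working together. It prints SASHELP.CLASS, a sample data set that ships with SAS; the title and footnote texts are made up. The OPTIONS settings stay in effect for later steps, and a null TITLE or FOOTNOTE statement clears the earlier one.

```sas
options nodate pageno=1 linesize=80;     /* remain in effect for later steps */
title 'Class Listing';                   /* title line at the top of each page */
footnote 'Data: SASHELP.CLASS';          /* footnote line at the bottom of each page */

proc print data=sashelp.class;
run;                                     /* RUN executes the PROC step */

title;                                   /* null TITLE statement clears the title */
footnote;                                /* null FOOTNOTE statement clears the footnote */
```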