SAS Programming 3 Advanced Techniques and Efficiencies
SAS Programming 3:
Advanced Techniques and
Efficiencies
Course Notes
SAS® Programming 3: Advanced Techniques and Efficiencies Course Notes was developed by Linda
Jolley and Jane Stroupe. Additional contributions were made by Kay Alden, Brian Gayle, Alistair Horn,
Marjorie Lampton, Robert Ligtenberg, Linda Mitterling, Georg Morsing, Kent Reeve, and Jane Whitten.
Editing and production support was provided by the Curriculum Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Copyright © 2010 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1833, course code LWPRG3/PRG3, prepared date 16Sep2010. LWPRG3_003
ISBN 978-1-60764-748-5
Table of Contents
Prerequisites
2.5 Controlling the Page Size and the Number of Available Buffers (Self-Study)
Exercises
Chapter 6 Using DATA Step Hash and Hiter Objects
6.3 Loading a Hash Object with Data from a SAS Data Set
Exercises
6.5 Using a Hash Object for Chained Lookups (Self-Study)
Demonstration: Creating a List of Values
Exercises
8.1 DATA Step Merges and SQL Procedure Joins
Demonstration: Using the DATA Step to Perform a Match-Merge
Demonstration: Using a PROC SQL Join to Perform a Match-Merge
Exercises (Optional)
10.2 Writing Flexible Programs: Combining Raw Data Files Vertically
Exercises
10.4 Using FILE and PUT Statements to Create a SAS Program File
Demonstration: Using the DATA Step to Send E-Mail
Exercises
Course Description
This course is for SAS programmers who prepare data for analysis. The comparisons of manipulation
techniques and resource cost benefits are designed to help programmers choose the most appropriate
technique for their data situation. You will learn how to compare various SAS programming techniques
that enable you to
• control memory, I/O, and CPU resources
• create and use indexes
• combine data horizontally and vertically
• use hash and hiter DATA step component objects, arrays, and formats as lookup tables
• compress SAS data sets
• sample your SAS data sets.
To learn more…
For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to [email protected]. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this
Course Notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to [email protected]. Customers outside the
USA, please contact your local SAS office.
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.
Prerequisites
This course is not appropriate for beginning SAS software users. Before attending this course, you should
have at least nine months of SAS programming experience and should have completed the SAS®
Programming 2: Data Manipulation Techniques course. Specifically, you should be able to do the
following:
• understand your operating system file structures and perform basic operating system tasks
• understand programming logic concepts
• understand the compilation and execution process of the DATA step
• use different varieties of input to create SAS data sets from external files
• use SAS software to access SAS libraries
• create and use SAS date values
• read, concatenate, merge, match-merge, and interleave SAS data sets
• use the DROP=, KEEP=, and RENAME= data set options
• create multiple output data sets
• use array processing and DO loops to process data iteratively
• use SAS functions to perform data manipulation and transformations
Chapter 1 Introduction
Objectives
List the tasks in the SAS Programming 3 course.
Explain the naming convention that is used for the
course files.
Compare the three levels of exercises that are used
in the course.
Describe, at a high level, how data is used and stored
at Orion Star Sports & Outdoors.
Navigate to the Help facility.
Resource Utilization
As programmers, you want to perform these tasks
as efficiently as possible and optimize the use of the
following resources:
programmer time
I/O
CPU
memory
network bandwidth
Business Scenarios
The business scenarios are opportunities to compare
multiple techniques for performing the tasks.
For example:
Task: Table Lookups
Possible Techniques:
1.1 Course Logistics 1-5
Filename Conventions
Code  Type
a     Activity
d     Demo
e     Exercise
s     Solution
Example: The SAS Programming 3 course ID is p3, so p304d01 = SAS Programming 3, Chapter 4, Demo 1.
Sample filenames: p304a01, p304a02, p304a02s, p304d01, p304d01x, p304d02, p304e01, p304e02, p304s01, p304s02
1.02 Quiz
Start your SAS session.
Open the Help facility.
Determine the path to use to obtain information about
the SAS component objects.
SAS OnlineDoc
You can also obtain information from SAS OnlineDoc.
1.2 Measuring Efficiencies 1-9
Objectives
Identify the resources used by a SAS program.
Report computer resource usage using SAS system
options.
Interpret resource usage statistics in your operating
environment.
Benchmark resource usage.
[Figure: resources used by a SAS program, including memory, network bandwidth, and data storage space]
CPU is a measurement of the amount of time that the central processing unit
uses to perform requested tasks such as calculations, reading and writing
data, conditional and iterative logic, and so on.
Memory is the size of the work area required to hold executable program modules,
data, and buffers.
Data storage space is the amount of space that is required to store data on a disk or tape.
Programmer time is the amount of time required for the programmer to write and maintain
the program. This time can be decreased by following well-documented
programming best practices.
Network bandwidth is the amount of data that can pass through a network interface over time.
This time can be minimized by performing as much of the subsetting and
summarizing as possible on the data host before transferring the results to
the local computer. The network bandwidth is heavily dependent on
network loads.
[Figure: trade-off between data storage space and CPU usage]
For example, data file compression might decrease storage use but increase processing time when SAS
reads the compressed data.
You must decide which factors are the most important for improving resource usage at your site.
To make this decision, you must know the following:
• which resources are scarce or costly at your site
• how and when your programs will be used
• the type and volume of data that your programs will process
Environmental factors that affect the efficiency of SAS programs include the following:
System load       the number of users or jobs sharing system resources, including
                  network bandwidth and network traffic
SAS environment   which SAS software products are installed, how they were installed,
                  and which methods are available to run SAS programs at your site
Often, one or two resources constitute the limiting factor or bottleneck within an organizational
computing environment. Tuning can be used to shift dependence away from a constrained resource. By
tuning your SAS programs to use the more available resources, you might improve the performance.
When you know the characteristics of your data, you can select the techniques that best suit those
characteristics.
Considering Trade-Offs
In this class, many tasks are performed using one or more
techniques.
To decide which technique is most efficient for a given
task, benchmark (that is, measure and compare) the resource
usage of each technique.
You should benchmark with the actual data to determine
which technique is the most efficient.
There are four SAS system options that you can use to track and report on resource utilization:
STIMER       tracks the CPU time used to perform a task (DATA or PROC step).
FULLSTIMER   tracks usage of additional resources and divides CPU into system CPU
             time and user CPU time. This option is ignored unless STIMER or
             MEMRPT is in effect.
STATS        writes information tracked by the above options to the SAS log (z/OS only).
The availability, usage, and aliases of these options are specific to the operating environment.
[Table: availability of the STIMER and FULLSTIMER options by operating environment, including z/OS. Recoverable entry: STIMER | NOSTIMER, invocation option only.]
Use the OPTIONS procedure with the HOST option to determine the default settings of these
options at your site.
proc options host;
run;
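To check a single option rather than the full host list, a sketch along the following lines may help. It assumes that the OPTION= argument of PROC OPTIONS and the GETOPTION function are available in your release; whether FULLSTIMER is recognized depends on the operating environment, as noted above.

```sas
/* Report the current setting of one system option in the log */
proc options option=fullstimer;
run;

/* Write the same setting to the log with the GETOPTION function */
%put FULLSTIMER setting: %sysfunc(getoption(fullstimer));
```

Either form can be run before and after an OPTIONS statement to confirm that the setting took effect.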
Business Scenario
You should benchmark to determine the most efficient
technique for creating a new variable based on a
condition.
The following methods can be used:
IF-THEN with an assignment statement
1.07 Quiz
1. Open and submit p301a01a.
Record the user CPU: ____________
Exit SAS.
2. Start SAS.
Open and submit p301a01b.
Record the user CPU: ____________
Exit SAS.
3. Start SAS.
Open and submit p301a01c.
Record the user CPU: ____________
4. Which technique is most efficient?
In z/OS, record the CPU.
p301a01a
options fullstimer;
data _null_;
   length var $ 30;
   /* var2-var50 are numeric (0); var51-var100 are character ('ABC') */
   retain var2-var50 0 var51-var100 'ABC';
   do x=1 to 10000000;
      var1=10000000*ranuni(x);
      /* every IF condition is evaluated on every iteration */
      if var1>1000000 then var='Greater than 1,000,000';
      if 500000<=var1<=1000000
         then var='Between 500,000 and 1,000,000';
      if 100000<=var1<500000 then var='Between 100,000 and 500,000';
      if 10000<=var1<100000 then var='Between 10,000 and 100,000';
      if 1000<=var1<10000 then var='Between 1,000 and 10,000';
      if var1<1000 then var='Less than 1,000';
   end;
run;
p301a01b
options fullstimer;
data _null_;
   length var $ 30;
   retain var2-var50 0 var51-var100 'ABC';
   do x=1 to 10000000;
      var1=10000000*ranuni(x);
      /* ELSE skips the remaining tests once a condition is true */
      if var1>1000000 then var='Greater than 1,000,000';
      else if 500000<=var1<=1000000
         then var='Between 500,000 and 1,000,000';
      else if 100000<=var1<500000
         then var='Between 100,000 and 500,000';
      else if 10000<=var1<100000
         then var='Between 10,000 and 100,000';
      else if 1000<=var1<10000 then var='Between 1,000 and 10,000';
      else if var1<1000 then var='Less than 1,000';
   end;
run;
p301a01c
options fullstimer;
data _null_;
   length var $ 30;
   retain var2-var50 0 var51-var100 'ABC';
   do x=1 to 10000000;
      var1=10000000*ranuni(x);
      /* SELECT executes the first WHEN whose condition is true */
      select;
         when (var1>1000000) var='Greater than 1,000,000';
         when (500000<=var1<=1000000)
            var='Between 500,000 and 1,000,000';
         when (100000<=var1<500000) var='Between 100,000 and 500,000';
         when (10000<=var1<100000) var='Between 10,000 and 100,000';
         when (1000<=var1<10000) var='Between 1,000 and 10,000';
         when (var1<1000) var='Less than 1,000';
         otherwise;
      end;
   end;
run;
Real Time         the amount of time spent to process the SAS job. (Real time is also referred
                  to as elapsed time.)
User CPU Time     the CPU time spent to execute the SAS code as written by the user
System CPU Time   the CPU time spent to perform operating system tasks (system overhead
                  tasks) that support the execution of SAS code
OS Memory         the largest amount of memory that SAS requested from the operating
                  system during the step
Timestamp         the date and time that the statistics were produced
SAS uses the getrusage() and times() UNIX system calls for your operating environment to obtain the
statistics presented with FULLSTIMER.
Different “flavors” of UNIX show different statistics. This log was obtained on a Solaris system.
Description of FULLSTIMER statistics in the UNIX operating environment:
Real Time                      the amount of time spent to process the SAS job. (Real time is also
                               referred to as elapsed time.)
User CPU Time                  the CPU time spent to execute the SAS code as written by the user
System CPU Time                the CPU time spent to perform operating system tasks (system overhead
                               tasks) that support the execution of SAS code
OS Memory                      the largest amount of memory that SAS requested from the operating
                               system during the step
Timestamp                      the date and time that the statistics were produced
Page Faults                    the number of pages that SAS tried to access but were not in the main
                               memory and required I/O activity
Page Reclaims                  the number of pages that were accessed without I/O activity
Page Swaps                     the number of times that a SAS process was swapped out of main memory
Voluntary Context Switches     the number of times that the SAS process had to pause because of a
                               resource constraint such as a disk drive
Involuntary Context Switches   the number of times that the operating system forced the SAS session to
                               pause processing to enable other processes to run
Block Input Operations         the number of I/O operations that were performed to read the data into
                               memory
Block Output Operations        the number of I/O operations that were performed to write the data to a file
CPU Time       The actual time spent on the task. This number should be constant (within
               .02 seconds) across repetitions of the same job.
Elapsed Time   The wall-clock time required to complete the task. Because elapsed time
               varies greatly for several runs of the same job due to differences in waiting
               time caused by other tasks being performed by the CPU, it is not normally
               used to benchmark programs.
EXCP Count     The number of I/O operations required to transfer external data to and from
               memory. EXCP is the acronym for EXecute Channel Program.
Task Memory    The actual memory required for a task in kilobytes with breakdowns for
               data and program storage. This number is stable for a given task.
Total Memory   The memory required for all tasks in kilobytes. This session total is useful
               for deciding the minimum region size required for the job to execute
               successfully.
1.3 SAS DATA Step Processing 1-29
Objectives
List the attributes of a data set page and define how
it relates to the structure of SAS data sets.
Describe how SAS reads and writes data.
By default, SAS uses the minimum optimal page size for the operating environment.
The total number of bytes occupied by orion.sales_history can be calculated as shown below:
(16,384 * 18)=294,912 bytes
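As a sketch of where the two factors come from: the page size (16,384 bytes) and the number of pages (18) are among the attributes that the CONTENTS procedure reports for a data set. The second step simply repeats the arithmetic from the text; the variable names in it are illustrative.

```sas
/* The engine/host information in the output lists the page size and number of pages */
proc contents data=orion.sales_history;
run;

/* Repeat the calculation with the values quoted in the text */
data _null_;
   PageSize=16384;
   Pages=18;
   TotalBytes=PageSize*Pages;
   put TotalBytes= comma12.;   /* writes TotalBytes=294,912 to the log */
run;
```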
1.08 Quiz
Use one of the following to determine the page size
of the orion.customer_dim SAS data set:
the CONTENTS procedure
p301a02
When a raw data file is read with INFILE and INPUT statements, the following occur:
• A block of data is read into a buffer in memory. The size of each buffer is the block size of the input
raw data file. In Windows and UNIX, the data might be cached so that the data is copied from disk to
an area of memory managed by the operating environment before it is copied into the buffer managed
by SAS.
• Each record of the raw data file is copied into an input buffer.
• The data is converted from an external format to the SAS format using the instructions provided in the
INPUT statement and is stored in an area of SAS memory named the program data vector (PDV). Any
subsequent processing specified in the DATA step is performed on the values in the PDV.
• At the end of an iteration for the DATA step, the contents of the PDV are copied to an output buffer
in memory.
• After the buffer (or multiple buffers) is full, the data in the buffer is written to the output SAS data set
in one output operation.
• Sequential processing continues until the pointer reaches the end-of-file marker in the raw data file.
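The steps above can be related to a small DATA step. This is only a sketch: the file name customers.dat and the column positions are hypothetical, and the variable names are borrowed from the PDV illustration in this section.

```sas
/* Hypothetical file name and column layout, for illustration only */
data work.customers;
   infile 'customers.dat';
   /* INPUT converts each raw record to SAS values and stores them in the PDV */
   input @1  ID      6.
         @8  Gender  $1.
         @10 Country $2.
         @13 Name    $20.;
   /* Later statements operate on the values in the PDV */
   Name=upcase(Name);
run;   /* at the end of each iteration, the PDV is copied to an output buffer */
```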
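[Figure: a SAS data set is read from disk through operating system caches into buffers in memory, then into the PDV (ID, Gender, Country, Name), and then to the output SAS data set; the figure marks where I/O is measured]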
When a SAS data set is read with a SET or MERGE statement, the following occur:
• A page (or multiple pages) is read into a buffer (or multiple buffers) in memory. The size of each buffer
is the page size of the input SAS data set. In Windows and UNIX, the data might be cached so that the
data is copied from disk to an area of memory managed by the operating environment before it is
copied into the buffer managed by SAS.
• The data in the buffer is read sequentially and copied into the program data vector (PDV) one
observation at a time.
• Each observation of the new SAS data set is copied into a buffer (or multiple buffers). An observation
must fit entirely into the buffer or the observation is written to another buffer.
• After the buffer (or multiple buffers) is full, the data in the buffer is written to the output SAS data set
in one output operation.
• Sequential processing continues until the pointer reaches the end-of-file marker in the input SAS data
set.
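As a sketch of how these buffers can be influenced, the BUFNO= and BUFSIZE= data set options (the subject of the self-study section on page size and buffers) can be added to a step like the one below. The option values and the output data set name work.customer_copy are assumptions chosen for illustration, not recommendations.

```sas
options fullstimer;

/* BUFNO= sets how many buffers are used when the data set is processed. */
/* BUFSIZE= sets the page size of the output data set.                   */
/* The values below are illustrative only.                               */
data work.customer_copy(bufsize=32768 bufno=4);
   set orion.customer_dim(bufno=4);
run;

options nofullstimer;
```

Comparing the FULLSTIMER statistics with and without the options is one way to judge whether they help for a given data set.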
Exercises
Level 1
1. Benchmarking
Open the program p301e01.sas (Windows or UNIX) or '.prg3.sascode(p301e01)' (z/OS).
Use best practices to benchmark the program, change it according to step 1.d, and determine which
resource(s) were conserved.
data order_fact;
   infile 'order_fact.dat' pad;           /* note 1 */
   input @37 Order_Date date9. @;         /* note 2 */
   input @1 Customer_ID 12.
         @13 Employee_ID 12.
         @25 Street_ID 12.
         @46 Delivery_Date date9.
         @55 Order_ID 12.
         @67 Order_Type 2.
         @69 Product_ID 12.
         @81 Quantity 4.
         @90 Total_Retail_Price 13.
         @105 CostPrice_Per_Unit 10.
         @115 Discount 5.;
   if year(Order_Date)=2006;
   format Customer_ID Employee_ID Street_ID Order_ID
          Product_ID 12. Order_Date Delivery_Date date9.
          Order_Type 2. Quantity 4. Total_Retail_Price dollar13.2
          CostPrice_Per_Unit dollar10.2 Discount Percent.;
run;
Notes about the syntax:
1  PAD  controls whether SAS pads the records that are read from an external file with
        blanks to the length that is specified in the LRECL= option. The LRECL=
        option specifies the logical record length; it is dependent on the operating
        environment.
2  @    holds an input record for the execution of the next INPUT statement within the
        same iteration of the DATA step. This line-hold specifier is called a trailing @.
a. Turn on the appropriate options for reporting the resource statistics in the log.
b. Submit the program.
Level 2
Level 3
b. Can both the WORK and UTILLOC SAS system options be specified for the same SAS session?
(YES or NO)
c. Explain your answer to part b.
Chapter Review
1. What are the six resources consumed
by SAS programs?
1.5 Solutions
Solutions to Exercises
1. Benchmarking
a. Turn on the appropriate options for reporting the resource statistics in the log.
options fullstimer;
data order_fact;
<additional SAS code>
run;
b. Submit the program.
c. Record the following resource utilizations:
1) User CPU Time:
2) I/O:
(not applicable on Windows)
3) User Memory:
d. Move the subsetting IF closer to the top of the DATA step. Make sure that you move the
subsetting IF to the appropriate location in the program.
data order_fact;
   infile 'order_fact.dat' pad;
   input @37 Order_Date date9. @;
   /* subset before reading the remaining fields of the record */
   if year(Order_Date)=2006;
   input @1 Customer_ID 12.
         @13 Employee_ID 12.
         @25 Street_ID 12.
         @46 Delivery_Date date9.
         @55 Order_ID 12.
         @67 Order_Type 2.
         @69 Product_ID 12.
         @81 Quantity 4.
         @90 Total_Retail_Price 13.
         @105 CostPrice_Per_Unit 10.
         @115 Discount 5.;
   format Customer_ID Employee_ID Street_ID Order_ID Product_ID 12.
          Order_Date Delivery_Date date9.
          Order_Type 2. Quantity 4. Total_Retail_Price dollar13.2
          CostPrice_Per_Unit dollar10.2 Discount Percent.;
run;
Contents tab → SAS Products → Base SAS → SAS 9.2 Language Reference: Dictionary →
Dictionary of Component Object Language Elements
network bandwidth
CPU
Memory
I/O
Chapter 2 Controlling I/O
Processing and Memory
2.5 Controlling the Page Size and the Number of Available Buffers (Self-Study)
Objectives
Describe the importance of conserving I/O.
List techniques for reducing I/O.
I/O (Review)
SAS programs typically perform the following tasks:
reading data sets sequentially
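[Figure: input SAS data is read through operating system caches (Windows and UNIX only) into buffers in memory, then into the PDV (ID, Gender, Country, Name), then into output buffers and the output SAS data set; the figure marks where I/O is measured]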
2.1 Controlling I/O 2-5
– WHERE statement
– WHERE= data set option
– OBS= and FIRSTOBS= data set options
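The three subsetting techniques listed above can be sketched as follows. The data set orion.staff and the Job_Title condition are taken from the programs in this section; the observation numbers are arbitrary values chosen for illustration.

```sas
/* WHERE statement in a procedure */
proc print data=orion.staff;
   where Job_Title contains 'Manager';
run;

/* WHERE= data set option: the same subset, specified where the data set is named */
proc print data=orion.staff(where=(Job_Title contains 'Manager'));
run;

/* FIRSTOBS= and OBS= data set options: process observations 11 through 20 only */
proc print data=orion.staff(firstobs=11 obs=20);
run;
```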
Subsetting Data
Program 1: Subsetting in the Procedure
One way to create a subset is to use the WHERE
statement in a procedure.
data bonus;
   set orion.staff;
   YrEndBonus=Salary*0.05;
run;
proc means data=bonus mean sum;
   where Job_Title contains 'Manager';
   class Manager_ID;
   var YrEndBonus;
run;
Subsetting Data
Program 2: Subsetting in the DATA Step
Because the DATA step is required to create the variable
YrEndBonus, it is more efficient to subset in the DATA
step.
data bonus(keep=Manager_ID YrEndBonus);
   set orion.staff(keep=Job_Title Salary Manager_ID);
   where Job_Title contains 'Manager';
   YrEndBonus=Salary*0.05;
run;
proc means data=bonus mean sum;
   class Manager_ID;
   var YrEndBonus;
run;
I/O savings result from reducing the number of variables and observations in the input and
output data sets.
The data set bonus contains two variables and 41 observations.
p302d01
Because of the way that SAS reads data, the savings in the DATA step occur when the data set
bonus is written. Fewer variables and observations are written, so more observations fit on a single
data set page.
[Figure: the PDV holds Job_Title and Salary (both marked D), Manager_ID, and YrEndBonus; values pass to the output buffers and the output data set, and the figure marks where I/O is measured]
Use indexes.
data prices;
   infile 'prices.dat' dlm='*';
   input Product_ID : 12. Start_Date : date9. End_Date : date9.
         Unit_Cost_Price : dollar7.2 Unit_Sales_Price : dollar7.2;
run;
proc means data=prices(keep=Unit_Cost_Price Unit_Sales_Price);
   var Unit_Cost_Price Unit_Sales_Price;
run;
p302d03
Business Scenario
Create reports using the PRINT, TABULATE, MEANS,
and FREQUENCY procedures against a single SAS data
set.
sasfile orion.customer_dim load;
SASFILE <libref.>member-name
<(password-data-set-option(s))>
OPEN | LOAD | CLOSE;
OPEN   opens the file and allocates the buffers, but defers reading the data into
       memory until a procedure or a statement that references the file is executed.
LOAD   opens the file, allocates the buffers, and reads the data into memory.
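A minimal sketch of the complete pattern, assuming the reports all read orion.customer_dim: load the file once, run the procedures, and then release the memory with CLOSE. The particular procedure steps are placeholders.

```sas
/* Load the data set into memory once */
sasfile orion.customer_dim load;

/* Each step below reads the in-memory copy instead of rereading the file from disk */
proc print data=orion.customer_dim(obs=10);
run;

proc means data=orion.customer_dim;
run;

/* Close the file and free the buffers */
sasfile orion.customer_dim close;
```

The benefit depends on having enough memory to hold the whole file, so this is worth benchmarking before adopting it.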
Exercises
Level 1
c. Add the appropriate statement(s) to open and load the entire orion.organization_dim data set into
memory. At the end of the program, close the data set.
d. Submit the revised program.
e. Note the following resource utilizations:
1) User CPU Time:
2) I/O:
(not applicable on Windows)
3) User Memory:
f. Which resources were conserved?
Level 2
proc sql;
   select Employee_Name,
          sum(Qtr1, Qtr2, Qtr3, Qtr4) as Total_Contribution,
          Recipients
      from orion.employee_addresses as a,
           orion.employee_donations as d
      where a.Employee_ID=d.Employee_ID;
quit;
options nofullstimer;
b. Add the appropriate statement(s) to open and load both the orion.employee_addresses and
orion.employee_donations data sets into memory. At the end of the program, close the data sets.
c. Submit the revised program.
Level 3
b. Add the appropriate statement(s) to open and load the entire work.sales data set into memory, a
PROC APPEND step to append the temporary work.nonsales data set to the temporary
work.sales data set, and a PROC PRINT step to print the work.sales data set. At the end of the
program, close the data set.
c. Submit the revised program.
Objectives
List techniques to reduce data storage.
Describe how SAS stores numeric values.
Describe how to safely reduce the space required
to store numeric values in SAS data sets.
2.2 Controlling Data Set Size 2-17
+0.35298*(10**5)
Sign: +   Mantissa: 0.35298   Base: 10   Exponent: 5
Platform: IBM mainframe   Base: 16   Exponent bits: 7   Mantissa bits: 56
Log
445 data emps_short;
446 length Street_ID 6
447 Employee_ID Manager_ID 5
448 Street_Number Employee_Hire_Date
449 Employee_Term_Date Birth_Date
450 Salary 4
451 Dependents 3;
452 merge employee_addresses
453 employee_organization
454 employee_payroll
455 employee_phones;
456 by Employee_ID;
457 run;
WARNING: Multiple lengths were specified for the BY variable Employee_ID by input data sets and
LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This may cause unexpected results.
WARNING: Multiple lengths were specified for the variable Street_ID by input data set(s). This
may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Street_Number by input data set(s).
This may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Manager_ID by input data set(s). This
may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Salary by input data set(s). This may
cause truncation of data.
WARNING: Multiple lengths were specified for the variable Birth_Date by input data set(s). This
may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Employee_Hire_Date by input data
set(s). This may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Employee_Term_Date by input data
set(s). This may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Dependents by input data set(s). This
may cause truncation of data.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ADDRESSES.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ORGANIZATION.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_PAYROLL.
NOTE: There were 923 observations read from the data set WORK.EMPLOYEE_PHONES.
NOTE: The data set WORK.EMPS_SHORT has 923 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.18 seconds
cpu time 0.03 seconds
To decrease the length of all newly created numeric variables, you can use the DEFAULT= option in the
LENGTH statement:
data emps_short;
length default=4;
<additional SAS code>
run;
The length of a character variable is determined by the first reference that creates the variable
when the DATA step is compiled. In addition to the LENGTH statements, character variables can
be created by using assignment statements, format statements, and read statements, for example
SET, MERGE, and INPUT statements.
Comparing Results
To determine whether the data sets emps_short
and emps are equivalent, you can use the COMPARE
procedure.
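A minimal sketch of such a comparison, assuming both data sets are in the WORK library:

```sas
/* BASE= names the reference data set; COMPARE= names the data set checked against it */
proc compare base=emps compare=emps_short;
run;
```

If the shortened lengths lost no information, the report ends with a note that no unequal values were found.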
p302d06
Task Statement
Compare two variables in the same data set. WITH and VAR
Observation Summary
First Obs 1 1
Last Obs 923 923
NOTE: No unequal values were found. All values compared are exactly equal.
The numbers are consecutive. For example, you can store numbers from -8192 to 8192 consecutively in
3 bytes on ASCII systems.
The numbers are consecutive. For example, you can store numbers from -256 to 256 consecutively in
2 bytes on EBCDIC systems.
In the same way that a decimal number system cannot store the fraction 1/3 exactly in a finite
number of digits, a binary number system (or multiple thereof, such as octal or hexadecimal)
cannot store the fraction 1/10 exactly in any finite number of digits.
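One way to see this is to print the stored bit pattern. The sketch below uses the HEX16. format, which displays the floating-point representation of a numeric value; the comparison of 0.1*3 with 0.3 is an illustrative choice, and the exact bit patterns depend on the operating environment.

```sas
data _null_;
   X=0.1*3;
   Y=0.3;
   /* Show the formatted decimal value and the stored hexadecimal representation */
   put X= best32. X= hex16.;
   put Y= best32. Y= hex16.;
   /* Compare the stored values exactly */
   if X=Y then put 'X and Y are stored identically.';
   else put 'X and Y differ in their stored representation.';
run;
```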
2.02 Poll
Open the program p302a01 and submit it.
Look at the log.
Are the values of X and Y equal?
Yes
No
Numeric Precision
Partial SAS Log (Windows)
7 data test;
8 length x 4;
9 X=1/10;
10 Y=1/10;
11 run;
12
13 data _null_;
14 set test;
15 put X=;
16 put Y=;
17 run;
x=0.0999999642
y=0.1
NOTE: There were 1 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
Numeric Precision
Partial SAS Log (Windows)
120 data test;
121 length X 3;
122 X=8193;
123 run;
124
125 data _null_;
126 set test;
127 put X=;
128 run;
x=8192
NOTE: There were 1 observations read from the data set
WORK.TEST.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
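The same effect can be explored without creating a data set by using the TRUNC function, which truncates a numeric value to a given number of bytes and then expands it back to full length. The loop below is a sketch that searches for the shortest length that preserves a value; the starting length of 3 assumes a Windows or UNIX system, where 3 is the minimum numeric length.

```sas
data _null_;
   Value=8193;
   /* Try each length from 3 to 8 bytes until the value survives truncation */
   do MinLen=3 to 8;
      if trunc(Value, MinLen)=Value then leave;
   end;
   put Value= MinLen=;
run;
```

Given the log above, where 8193 stored in 3 bytes is read back as 8192, the loop would be expected to stop at a length of 4 on those systems.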
Exercises
Level 1
Level 2
Level 3
title;
proc print noobs;
var Value MinLen;
run;
a. Run the program that is stored in p302e06 and examine the output.
b. Investigate the Help facility to determine why the minimum length for the number 8194 is less
than that for the number 8193 (Windows and UNIX) or why the minimum length for 272 is less than
that for 271 (z/OS). The information can be found by following this path:
SAS Products → Base SAS → SAS Language Reference: Concepts → SAS System
Concepts → SAS Variables → Numeric Precision in SAS Software
2.3 Compressing SAS Data Sets 2-29
Objectives
Define the structure of a compressed SAS data file.
Create a compressed SAS data file.
List the advantages and disadvantages
of compression.
2.03 Poll
By default, the observations in a SAS data file have
varying lengths.
Yes
No
[Figure: visualization of the structure of a SAS data file]
This is a visualization tool to help you understand how SAS data files are structured. SAS data
files are not actually stored in exactly this manner.
[Figure: pages 2 through n of a compressed SAS data file, showing page overhead, per-observation overhead, variable-length observations, and unused space]
This is a visual depiction of the storage used for a compressed SAS data file.
SAS data files, but not views, can be stored in compressed form.
Compressing a file reduces the number of bytes required to represent each observation. In a compressed
file, each observation is a variable-length record.
CHAR | YES
    uses the run-length encoding (RLE) compression algorithm, which compresses repeating
    consecutive bytes, such as trailing blanks or repeated zeros.
BINARY
    uses Ross Data Compression (RDC), which combines run-length encoding and sliding
    window compression.
The COMPRESS= data set option overrides the COMPRESS= system option.
The COMPRESS= options interact with two other system or data set options, POINTOBS= and
REUSE=. See “COMPRESS= Data Set Option” in the dictionary of SAS language elements in
SAS® Language Reference: Dictionary in the Base SAS documentation or use the online Help facility
for additional information about these interactions.
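The effect of the COMPRESS= data set option can be seen with a small self-contained test. The following is a minimal sketch under stated assumptions: the data set names notes_plain and notes_char and the variables Comment and ID are hypothetical, and the data is deliberately built with long runs of trailing blanks so that COMPRESS=CHAR has something to remove.

```sas
/* One DATA step writes the same observations twice:             */
/* once uncompressed and once with COMPRESS=CHAR.                */
data work.notes_plain
     work.notes_char(compress=char);
   length Comment $ 200;
   do ID=1 to 10000;
      Comment='short text';   /* about 190 trailing blanks per value */
      output;                 /* writes to both output data sets     */
   end;
run;

/* Compare the number of data set pages in the two files.        */
proc contents data=work.notes_plain;
run;

proc contents data=work.notes_char;
run;
```

The log for the DATA step reports the percentage by which compression changed the size of notes_char, in the same form as the notes shown in this section.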
2.04 Quiz
Open the program p302a02.
1. Change the data set name to empchar. Add the
COMPRESS=CHAR data set option to the DATA
step and submit the program.
By what percentage was the data set reduced or
increased?
2. Change the data set name to empbin. Add the
COMPRESS=BINARY data set option to the DATA
step and submit the program.
By what percentage was the data set reduced or
increased?
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ADDRESSES.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ORGANIZATION.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_PAYROLL.
NOTE: There were 923 observations read from the data set WORK.EMPLOYEE_PHONES.
NOTE: The data set WORK.EMPCHAR has 923 observations and 21 variables.
NOTE: Compressing data set WORK.EMPCHAR decreased size by 60.71 percent.
Compressed is 11 pages; un-compressed would require 28 pages.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ADDRESSES.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_ORGANIZATION.
NOTE: There were 424 observations read from the data set WORK.EMPLOYEE_PAYROLL.
NOTE: There were 923 observations read from the data set WORK.EMPLOYEE_PHONES.
NOTE: The data set WORK.EMPBIN has 923 observations and 21 variables.
NOTE: Compressing data set WORK.EMPBIN decreased size by 57.14 percent.
Compressed is 12 pages; un-compressed would require 28 pages.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
Using COMPRESS=BINARY
Ross Data Compression uses both run-length encoding
and sliding window compression.
A SAS data set has these variables:
Name Type Length
Answer1 Numeric 8
...
Answer200 Numeric 8
In Ross Data Compression form, the first observation
in the data file resembles this:
[Figure: bytes 1 through 9 of the first observation in compressed form. One symbol in the figure indicates the sign and exponent.]
Compression Dependencies
Some data sets do not compress well or at all.
SAS Log (Windows)
1 data orders(compress=yes);
2 set orion.orders;
3 run;
NOTE: There were 490 observations read from the data set ORION.ORDERS.
NOTE: The data set WORK.ORDERS has 490 observations and 6 variables.
NOTE: Compressing data set WORK.ORDERS decreased size by 0.00 percent.
Compressed is 7 pages; un-compressed would require 7 pages.
NOTE: DATA statement used (Total process time):
real time 1.04 seconds
cpu time 0.12 seconds
55 data orders(compress=binary);
56 set orion.orders;
57 run;
NOTE: There were 490 observations read from the data set ORION.ORDERS.
NOTE: The data set WORK.ORDERS has 490 observations and 6 variables.
NOTE: Compressing data set WORK.ORDERS increased size by 28.57 percent.
Compressed is 9 pages; un-compressed would require 7 pages.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
p302d08
When you use the COMPRESS= data set option or the
COMPRESS= system option, SAS knows the following:
the size of the overhead introduced by compression
SAS Log (Windows)
18 data test(compress=yes);
19 x=1;
20 run;
p302d09
Compression Trade-Offs
Uncompressed | Compressed
Usually requires more disk storage | Usually requires less disk storage
Requires less CPU time to prepare an observation for I/O | Requires more CPU time to prepare an observation for I/O
Uses more I/O operations | Uses fewer I/O operations
An updated observation fits in its original location. | An updated observation might be moved from its original location.
Deleted observation space is never reused. | Deleted observation space can be reused.
New observations are always inserted at the end of the data file. | When REUSE=YES, new observations might not be inserted at the end of the data file.
Exercises
Level 1
Level 2
a. You need to merge the two data sets together by Supplier_ID and create a compressed SAS data
set named supplier_names.
Which method of compression do you think would be the most appropriate?
Your results might vary from the suggested solutions depending on the operating platform
and method used to create the data on that platform.
Level 3
9. Compressing a Library
a. Write a LIBNAME statement to assign the libref orcomp to the path as listed below. Use the
LIBNAME statement option COMPRESS=YES to compress the data sets that will be written to
that data library.
Windows C:\temp
UNIX ~/temp
z/OS .prg3.tempdata
b. Write a PROC COPY step to copy data sets from the orion library to the orcomp library. The
PROC COPY step should copy only those data sets that begin with the letter "c". In addition,
ensure that you do not compress any of the data sets created in exercises after this one.
Hint: If you do not see compression messages the first time that you submit your code, look in the
Help facility or SAS OnlineDoc at the options for the PROC COPY statement.
c. Did any of them get larger? Yes or No
d. Why or why not?
e. Write a PROC DATASETS step with a DELETE statement to delete the library orcomp.
2.4 Controlling Memory (Self-Study) 2-45
Objectives
Investigate techniques for controlling memory.
Use system options to specify the amount of available
memory.
Memory is a bigger issue on shared SAS systems (Windows servers, UNIX, z/OS) than on stand-alone
SAS systems (Windows PCs). Most individual SAS users will not encounter memory problems unless
their SAS programs use procedures with many distinct categorical values, perform sorts of large SAS data
sets, or use large in-memory lookup tables. In the first two cases, the swapping of utility files to disk
when physical memory is fully used can also increase the use of CPU and I/O resources.
p302d10
2.05 Quiz
Open and submit the program p302a03. Answer
the following questions:
1. What is the advantage of technique 1?
p302a03
options fullstimer;
data order_fact(index=(Order_Date));
set orion.order_fact;
run;
/* Technique 1 */
/* Technique 2 */
options nofullstimer;
MEMSIZE=
Consult the SAS OnlineDoc to see specifics for each operating environment.
2.5 Controlling the Page Size and the Number of Available Buffers (Self-Study) 2-53
Objectives
Control the page size of a SAS data set.
Use system and data set options to control memory
usage.
Describe the effect of operating environment caching.
[Figure: the PDV (ID, Gender, Country, Name) feeds the output buffers, which are written to the output SAS data set. A callout marks where I/O is measured.]
The size of this buffer is the page size of the output data set.
BUFNO= n
BUFSIZE= can only be used on output SAS data sets. BUFSIZE= sets the page size of a SAS data file,
which is a permanent attribute of the data set.
Increasing the BUFSIZE= option is sometimes useful for SAS data sets that are read sequentially (top to
bottom). Using a small BUFSIZE= value and BUFNO=1 is useful for SAS data sets that are read using
random access.
BUFSIZE= Value    Specifies
n | nK | nM | nG | nT
    specifies the page size in multiples of 1 (bytes); 1,024 (kilobytes); 1,048,576
    (megabytes); 1,073,741,824 (gigabytes); or 1,099,511,627,776 (terabytes). For
    example, a value of 8 specifies 8 bytes, and a value of 3M specifies 3,145,728
    bytes.
    The default is 0, which causes SAS to use the minimum optimal page size for
    the operating environment.
hexX
    specifies the page size as a hexadecimal value. You must specify the value
    beginning with a number (0-9), followed by an X. For example, the value 2dx sets
    the page size to 45 bytes.
MIN
    sets the page size to the smallest possible number in your operating environment,
    down to the smallest four-byte, signed integer, which is -2^31-1, or approximately
    -2 billion bytes.
    This setting might cause unexpected results and should be avoided. Use
    BUFSIZE=0 in order to reset the buffer page size to the default value in
    your operating environment.
MAX
    sets the page size to the maximum possible number in your operating
    environment, up to the largest four-byte, signed integer, which is 2^31-1, or
    approximately 2 billion bytes.
The specific values for the BUFSIZE= option depend on your operating environment.
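The two options can be tried together on a small output data set. The following is a minimal sketch, not one of the course programs: the data set name big and the variables ID and Value are hypothetical, and the values 16k and 2 are arbitrary choices for illustration rather than recommendations.

```sas
/* BUFSIZE= sets the page size, which becomes a permanent        */
/* attribute of work.big. BUFNO= applies only while this step    */
/* is processing the data set.                                   */
data work.big(bufsize=16k bufno=2);
   do ID=1 to 100000;
      Value=ID*2;
      output;
   end;
run;

/* PROC CONTENTS reports the page size that was actually used.   */
proc contents data=work.big;
run;
```

Because the permitted values depend on the operating environment, the page size that PROC CONTENTS reports might differ from the value that was requested.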
[Figure: 16384 x 2 = 32,768; the accompanying example specifies bufno=3.]
For information about structuring RAID arrays and SAN arrays optimally for SAS, see
the white paper: Best Practices for Configuring your IO Subsystem for SAS®9 Applications
(support.sas.com/rnd/papers/sgf07/sgf2007-iosubsystem.pdf).
NOSGIO | SGIO
Scatter-read/gather-write is active only for SAS I/O opened in INPUT or OUTPUT mode. If any SAS I/O
files are opened in UPDATE or RANDOM mode, SGIO is inactive for that process. Compressed and
encrypted files can also be read ahead using scatter-read/gather-write. I/O performance usually improves
as the value for the BUFNO increases.
Data sets in the library specified without the USEDIRECTIO data set option use UNIX caching.
SAS-data-set-name (USEDIRECTIO=NO|YES)
For more information about using direct file I/O in UNIX, refer to the following:
support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/chloptfmain.htm
2.6 Chapter Review 2-59
Chapter Review
1. What is the purpose of the SASFILE statement?
2.7 Solutions
Solutions to Exercises
1. Using the SASFILE Statement
a. Open the program p302e01 and submit it.
b. Note the following resource utilizations:
1) User CPU Time:
2) I/O:
(not applicable on Windows)
3) User Memory:
c. Add the appropriate statement(s) to open and load the entire data set orion.organization_dim into
memory. At the end of the program, close the data set.
proc sql;
select Employee_Name,
sum(Qtr1, Qtr2, Qtr3, Qtr4) as Total_Contribution,
Recipients
from orion.employee_addresses as a,
orion.employee_donations as d
where a.Employee_ID=d.Employee_ID;
quit;
/*************************************/
/* The DATASETS procedure changes */
/* the names of FIRST and LAST to be */
/* compatible with the variables in */
/* the sales data. */
/*************************************/
/*************************************/
/* Alternative Solution */
/*************************************/
data all_customers;
length Quantity 3
Customer_ID Order_Date Delivery_Date 4
Employee_ID 5
Street_ID Order_ID 6
Product_ID 7;
set orion.catalog orion.internet orion.retail;
run;
CPU times vary by platform and other factors not controllable by SAS.
data sales(compress=binary);
merge orion.supplier product_list;
by Supplier_ID;
run;
d. Which method was better?
CHAR
e. Why was that method better?
Heavily character data
9. Compressing a Library
a. Write a LIBNAME statement to assign the libref orcomp to the path as listed below. Use the
LIBNAME statement option COMPRESS=YES to compress the data sets that will be written to
that data library.
b. Write a PROC COPY step to copy data sets from the orion library to the orcomp library. The
PROC COPY step should copy only those data sets that begin with the letter "c". In addition,
ensure that you do not compress any of the data sets created in exercises after this one.
c. Did any of them get larger? Yes
d. Why or why not? Some of the data sets increased in size. They were not large data sets.
e. Write a PROC DATASETS step.
p302s09
libname orcomp 'C:\temp' compress=yes; /* Windows */
* libname orcomp '~/temp' compress=yes; /* UNIX */
* libname orcomp '.workshop.tempdata' compress=yes; /* z/OS */
Chapter 3 Accessing Observations
Objectives
Define indexes.
List the uses of indexes.
Use the DATA step to create indexes.
Use PROC DATASETS to create and maintain
indexes.
Use PROC SQL to create and maintain indexes.
3-4 Chapter 3 Accessing Observations
Using Indexes
An index is an optional file that you can create for
a SAS data file that does the following:
points to observations based on the values of one
or more key index variables
provides direct access to specific observations
The index is stored with the key values in ascending sorted order.
3.1 Creating an Index 3-5
data customer14958;
set orion.sales_history;
where Customer_ID=14958;
run;
[Figure: without an index, data pages are loaded from the input SAS data into the input buffers, and the WHERE statement selects observations by reading data sequentially before they reach the PDV.]
[Figure: with an index, only necessary pages are loaded into the input buffers, and the WHERE statement selects observations by using direct access. Selected observations pass through the PDV (ID, Gender, Country, Name) to the output buffers and the output SAS data.]
When SAS uses an index to process data, SAS does the following:
• performs a binary search on the index file
• positions the index to the first entry containing a qualified value
• transfers a page of data containing the first record identifier for the qualified value to a buffer
• directly accesses the value specified by the record identifier
• positions the index to the next entry containing a qualified value
• transfers the page of data, if it is not already in the buffer
• directly accesses the value specified by the record identifier
• continues to process the data until there is no more data that satisfies the WHERE expression
If the stored data values are sorted in ascending order by the indexed variables, fewer I/O
operations are required. If the data is not sorted on the index key values, but observations with the
same key values are near each other in the file, I/O will be minimized.
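Whether SAS chose the index for a given WHERE expression can be checked in the log. The following is a minimal sketch under stated assumptions: it works on a copy in the Work library so that orion.sales_history is not changed, and the names work.sales_history and work.customer14958 are chosen here for illustration.

```sas
options msglevel=i;   /* write INFO notes about index usage to the log */

/* Copy the data and create a simple index on Customer_ID.       */
data work.sales_history(index=(Customer_ID));
   set orion.sales_history;
run;

/* The WHERE expression references the indexed variable, so SAS  */
/* can consider direct access through the index.                 */
data work.customer14958;
   set work.sales_history;
   where Customer_ID=14958;
run;

options msglevel=n;   /* restore the default message level */
```

With MSGLEVEL=I in effect, the log for the second step includes an INFO note when an index is selected for WHERE processing.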
[Figure: the index file is read through an index buffer, and pages of the input SAS data are read through the input buffers into the PDV (ID, Gender, Country, Name). Selected observations go to the output buffers and the output SAS data.]
IBUFNO=n | nK | nM | nG | nT
SAS automatically allocates a minimal number of buffers in order to navigate the index file. Typically,
you do not need to specify extra buffers. However, using IBUFNO= to specify extra buffers can improve
execution time by limiting the number of input/output operations that are required for a particular index
file. This improvement in execution time comes at the expense of increased memory consumption.
Whereas too few buffers allocated to the index file decrease performance, over-allocation of
index buffers creates performance problems as well. Experimentation is the best way to determine
the optimal number of index buffers. For example, experiment with IBUFNO=3, then
IBUFNO=4, and so on, until you find the least number of buffers that produces satisfactory
performance results.
hexX specifies the number of extra index buffers as a hexadecimal value. You must
specify the value beginning with a number (0-9), followed by an X.
MIN sets the number of extra index buffers to 0. This is the default.
MAX sets the number of extra index buffers to the maximum that SAS allows.
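The experimentation described above can be organized as a repeatable timing test. The following is a minimal sketch under stated assumptions: it presumes that the Customer_ID index already exists on orion.sales_history, and the value 3 is only a starting point to be varied between runs.

```sas
/* FULLSTIMER adds memory and I/O statistics to the log so that  */
/* runs with different IBUFNO= values can be compared.           */
options fullstimer ibufno=3;

data _null_;
   set orion.sales_history;
   where Customer_ID=14958;
run;

/* Return to the default of no extra index buffers.              */
options nofullstimer ibufno=min;
```

Rerunning the step with IBUFNO=4, IBUFNO=5, and so on, and comparing the log statistics, shows whether extra buffers help for this particular index file.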
[Figure: binary search of the index file for where Customer_ID=14958; at each step, the question is whether the key value is in the top half or the bottom half of the remaining index entries. Each index entry pairs a key value with a record identifier, such as 17(85) or 17(80, 86).]
The binary search essentially divides the index file in half and asks, “Is the key value that I am searching
for above or below the halfway point?” The binary search continues to divide the remaining portions of
the index file in half until the key value is found.
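The halving strategy can be illustrated with a DATA step that searches a small sorted array. This is a sketch of the general binary search technique only, not of how SAS implements index navigation internally; the array Keys, its values, and the variables Target, Low, High, Middle, and Found are all hypothetical.

```sas
data _null_;
   /* Key values must be in ascending order for a binary search. */
   array Keys[8] _temporary_
      (4006 4021 4059 4063 14844 14864 14909 14958);
   Target=14958;
   Low=1;
   High=dim(Keys);
   Found=0;
   do while (Low le High and not Found);
      Middle=floor((Low+High)/2);        /* halfway point         */
      if Keys[Middle]=Target then Found=Middle;
      else if Keys[Middle] lt Target then Low=Middle+1;  /* upper half */
      else High=Middle-1;                                /* lower half */
   end;
   put Target= Found=;   /* Found=0 means the key is not present  */
run;
```

Each pass through the loop discards half of the remaining entries, which is why the number of comparisons grows slowly as the number of keys increases.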
Business Scenario
The SAS data set orion.sales_history is often queried
with a WHERE statement.
You need to create three indexes on the most frequently
used subsetting columns.
Index Name | Index Variables
Customer_ID | Customer_ID
Product_Group | Product_Group
SaleID | Order_ID, Product_ID
Partial Listing of orion.sales_history
Customer_ID | . . . | Order_ID | Order_Type | Product_ID | . . . | Product_Group | . . .
Creating an Index
[Figure: the simple indexes Customer_ID and Product_Group and the composite index SaleID (Order_ID, Product_ID) are created for orion.sales_history.]
Index Terminology
There are two types of indexes.
Index options include the following:
UNIQUE Values of the key variable(s) must be unique. This
option prevents an observation with a duplicate value
for the key variable(s) from being added to the data set.
Partial Listing of orion.sales_history
Customer_ID  Employee_ID  . . .  Order_ID    Order_Type  Product_ID    Quantity  . . .
14958        121031       . . .  1230016296  1           210200600078  1         . . .
14844        121042       . . .  1230096476  1           220100100354  1         . . .
14864        99999999     . . .  1230028104  2           240600100115  1         . . .
14909        120436       . . .  1230044374  1           240100200001  1         . . .
14862        120481       . . .  1230021668  1           240500200056  1         . . .
14853        120454       . . .  1230021653  1           220200200085  3         . . .
14838        121039       . . .  1230140184  1           220100300042  4         . . .
In an existing data set, if the variable(s) on which you attempt to create a unique index has duplicate
values, the index is not created and an error message is written to the SAS log.
NOMISS excludes all observations with missing values from the
index. Observations with missing values can still be
read from the data set, but not using the index.
Creating Indexes
To create indexes at the same time that you create
a data set, use the INDEX= data set option on the output
data set.
To create or delete indexes on existing data sets,
use one of the following:
DATASETS procedure
SQL procedure
When you create the index, do the following:
designate the key variable(s)
p303d01
SAS-data-file-name (INDEX=
   (index-specification-1 </option> </option>
   <...index-specification-n </option> </option>>));
11 options msglevel=i;
12 data orion.sales_history(index=
13 (Customer_ID Product_Group
14 SaleID=(Order_ID
15 Product_ID)/unique));
16 set orion.sales_history;
17 run;
NOTE: There were 1500 observations read from the data set ORION.SALES_HISTORY.
NOTE: The data set ORION.SALES_HISTORY has 1500 observations and 22 variables.
NOTE: Composite index SaleID has been defined.
NOTE: Simple index Product_Group has been defined.
NOTE: Simple index Customer_ID has been defined.
I prints informational or INFO notes that pertain to index creation and usage,
merge processing, host sort utilities, and threading in addition to notes,
warnings, and error messages.
The value for the MSGLEVEL= SAS system option is set to n because PROC DATASETS
issues its own notes.
The INDEX CREATE statement in PROC DATASETS cannot be used if the index to be created already
exists.
If the index to be created already exists, you must do the following:
• delete the existing index of the same name
• create the new index
If you delete and create indexes in the same step, delete indexes first so that the newly created indexes can
reuse the space of the deleted indexes.
You can specify the UNIQUE or NOMISS option in the INDEX CREATE statement.
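The delete-then-create sequence described above can be written in a single PROC DATASETS step. The following is a minimal sketch that assumes the Customer_ID index already exists on orion.sales_history; the NOMISS option is added only to illustrate where an index option is placed.

```sas
proc datasets library=orion nolist;
   modify sales_history;
   /* Delete first so that the new index can reuse the space.    */
   index delete Customer_ID;
   /* Re-create the index, this time excluding missing values.   */
   index create Customer_ID / nomiss;
quit;
```

If the index does not exist when the step runs, the INDEX DELETE statement reports a problem in the log, so check the current indexes with PROC CONTENTS first when in doubt.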
3.04 Quiz
Open and submit the program p303a01.
What error messages are in the log?
p303a01
options msglevel=n;
proc datasets library=orion nolist;
modify sales_history;
index create Customer_ID;
index create Product_Group;
index create SaleID=(Order_ID
Product_ID)/unique;
quit;
p303d03
The value for the MSGLEVEL= SAS system option is set to n because PROC SQL issues its own
notes.
PROC SQL;
DROP INDEX index-name
FROM table-name;
CREATE <option> INDEX index-name
ON table-name(column-name-1,...
column-name-n);
QUIT;
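Applied to the business scenario, the syntax above might look like the following. This is a minimal sketch that assumes the Customer_ID and SaleID indexes already exist on orion.sales_history, which is why each one is dropped before it is created again.

```sas
proc sql;
   /* Remove the existing indexes of the same names.             */
   drop index Customer_ID from orion.sales_history;
   drop index SaleID from orion.sales_history;

   /* Simple index: the index name must match the column name.   */
   create index Customer_ID
      on orion.sales_history(Customer_ID);

   /* Composite index with the UNIQUE option.                    */
   create unique index SaleID
      on orion.sales_history(Order_ID, Product_ID);
quit;
```

In PROC SQL, the option precedes the keyword INDEX, and the columns of a composite index are separated by commas rather than blanks.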
DATA Step (INDEX= Option) | PROC DATASETS | PROC SQL
The DATA step can perform data manipulation at the same time that the index is created. | PROC DATASETS cannot perform data manipulation. | The CREATE INDEX statement cannot perform data manipulation.
To delete one or more indexes, you must re-create the other required indexes. | One or more indexes can be deleted without deleting all of the indexes on the data set. | One or more indexes can be deleted without deleting all of the indexes on the data set.
An existing index can be re-created without first deleting it. | If an index exists, it must be deleted before it can be re-created. | If an index exists, it must be deleted before it can be re-created.
Documenting Indexes
The following can be used to document indexes:
SAS Explorer
PROC CONTENTS
PROC DATASETS
Index Documentation
proc contents data=orion.sales_history;
run;

proc datasets lib=orion nolist;
   contents data=sales_history;
quit;
These two steps produce identical output.
# | Index | Unique Option | # of Unique Values | Variables
1 | Customer_ID | | 1046 |
2 | Product_Group | | 56 |
3 | SaleID | YES | 1500 | Order_ID Product_ID
p303d04
Exercises
Level 1
1. Creating Indexes
a. Open the program p303e01, and add the INDEX= option to create two indexes:
• a simple index Customer_ID, based on the variable Customer_ID
• a unique index Order_ID, based on the variable Order_ID
b. Use PROC SQL to delete the Order_ID index from the orders data set.
c. Use PROC DATASETS to create a composite index named OrDate based on the Order_ID and
Order_Date variables for the orders data set.
d. Use PROC CONTENTS or PROC DATASETS to look at the index information.
Level 2
2. Updating Indexes
a. Use the orion.price_list SAS data set to create a temporary data set named price_list that contains
a new variable named Unit_Profit that is the difference between the variables Unit_Sales_Price
and Unit_Cost_Price. Create a unique index on the Product_ID variable.
b. Open the program p303e02 and submit it.
c. View the log, and determine whether the new observation was added.
d. Why or why not?
Level 3
Objectives
Describe when an index is used for WHERE statement
processing.
Describe when an index is not used for WHERE
statement processing.
In SAS 9.2, subtle improvements were made to the circumstances under which SAS uses an index.
Trailing blanks in the CONTAINS operator pattern to be searched for are ignored. Escape characters in
the LIKE operator are permitted. Examples are provided in the table below:
Condition | Examples
Fully bounded range conditions specifying both an upper and a lower limit, which includes the BETWEEN-AND operator | where 5000 < Order_ID < 10000; where Order_ID between 5000 and 10000;
For more information about when index usage is possible, see SAS 9.2 Language Reference:
Concepts → SAS Files Concepts → SAS Data Files → Understanding SAS Indexes in the
Help facility.
# | Index | Unique Option | # of Unique Values | Variables
1 | Customer_ID | | 1046 |
2 | Product_Group | | 56 |
3 | SaleID | YES | 1500 | Order_ID Product_ID
3.2 Using an Index 3-33
Using a Subsetting IF
[Figure: all pages of the input SAS data are loaded into the input buffers, and every observation is read into the PDV (ID, Gender, Country, Name). The subsetting IF statement selects observations, which are written to the output buffers and the output SAS data.]
No Index Usage
SAS does not use an index when a WHERE expression
references an indexed variable if the following conditions
exist:
No single index can supply all required observations.
Condition Examples
For more information about when an index is not used, see SAS 9.2 Language Reference:
Concepts → SAS Files Concepts → SAS Data Files → Understanding SAS Indexes in the
Help facility.
Compound Optimization
A WHERE expression that references multiple variables
can take advantage of a composite index.
For compound optimization to occur, all of the following
must be true:
At least the first two key variables in the composite
index must be used in the WHERE conditions.
The conditions must be connected using the AND
operator.
At least one condition must use the EQ, equal sign (=),
or IN operator.
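A WHERE expression that meets all three conditions for the SaleID composite index (Order_ID, Product_ID) could look like the following. This is a minimal sketch that assumes the SaleID index exists on orion.sales_history; the key values are taken from the first row of the partial listing shown earlier.

```sas
options msglevel=i;   /* write INFO notes about index usage to the log */

proc print data=orion.sales_history;
   /* Both key variables of SaleID are referenced, the           */
   /* conditions are joined with AND, and both use the equal     */
   /* sign, so compound optimization is possible.                */
   where Order_ID=1230016296 and Product_ID=210200600078;
   var Order_ID Product_ID Quantity;
run;

options msglevel=n;
```

Whether SAS actually selects the composite index still depends on its estimate of the subset size, so the INFO note in the log is the place to confirm the choice.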
Subset Size
[Figure: subset size as a percentage of the data set. From 0% to 3%, SAS will use an index. From 3% to 33.3%, SAS will probably use an index. Above 33.3%, SAS might use an index.]
To determine whether it is more efficient to satisfy the WHERE expression by using the index or by
reading the data sequentially, SAS uses these guidelines:
• If only a few observations are qualified, it is more efficient to use the index than to do a sequential
search of the entire data file.
• If most or all of the observations qualify, then it is more efficient to read the data file sequentially.
If the subset is between small and large, other factors such as data order are important.
Subset Size
The SAS index includes cumulative percentiles or
centiles. By default, SAS stores 21 centiles or every
5th percentile of the index. This information is used
to estimate the size of a qualifying subset.
For information about updating and viewing the centile information, see the UPDATE
CENTILES information in the SAS documentation for the DATASETS procedure and the
CENTILES option for the PROC CONTENTS statement.
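The two features mentioned above can be combined in a short program. The following is a minimal sketch that assumes the Customer_ID index exists on orion.sales_history; REFRESH is used here simply to force the centiles to be recomputed once.

```sas
/* List the centile information that is stored with each index.  */
proc contents data=orion.sales_history centiles;
run;

/* Recompute the centiles for one index immediately.             */
proc datasets library=orion nolist;
   modify sales_history;
   index centiles Customer_ID / refresh;
quit;
```

Refreshing the centiles is mainly of interest after many observations have been added or changed, because out-of-date centiles can lead SAS to misjudge the size of a qualifying subset.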
Data Order
For data that is sorted and indexed on the same variable(s), retrieval time through the index is much
faster than either sorted or indexed data alone.
where Customer_ID in (70201, 70187, 70175);
Fewer pages are copied into memory if the data is sorted.
[Figure: unsorted data compared with sorted data. In the unsorted data, the qualifying observations are scattered through the file (observation 8940 has Customer_ID 70175, observation 32549 has 70187, and observation 45776 has 70201). In the sorted data, they are on the same or adjacent pages.]
In the sorted data, all of the observations that meet a specific criterion (for example,
Customer_ID = 14844) are on the same or adjacent data set pages. Thus, fewer data set pages must
be read to retrieve the same selected observations.
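Sorting and indexing on the same variable can be combined as follows. This is a minimal sketch that writes to the Work library so that orion.sales_history is not changed; the name work.sales_sorted is chosen here for illustration.

```sas
/* Sort a copy of the data by the key variable.                  */
proc sort data=orion.sales_history out=work.sales_sorted;
   by Customer_ID;
run;

/* Create the index on the sorted copy.                          */
proc datasets library=work nolist;
   modify sales_sorted;
   index create Customer_ID;
quit;

/* Observations for each customer are now on the same or         */
/* adjacent pages, so fewer pages are read through the index.    */
options msglevel=i;
proc print data=work.sales_sorted;
   where Customer_ID in (14844, 14958);
   var Customer_ID Product_ID;
run;
options msglevel=n;
```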
options msglevel=i;
proc print data=orion.sales_history(idxwhere=yes);
where Customer_ID in (14844,4983,5862,10032)
and Product_Group contains 'Shoes';
var Customer_ID Product_ID Product_Group ;
title 'With an Index';
run;
p303d05
Maintaining Indexes
Data Management Tasks | Index Action Taken
Copy the data set with the COPY procedure or the DATASETS procedure | Index file constructed for new data file
Move the data set with the MOVE option in the COPY procedure | Index file deleted from IN= library; rebuilt in OUT= library
Copy the data set with a drag-and-drop action in SAS Explorer | Index file constructed for new file
Rename the data set | Index file renamed
Rename the variable | Variable renamed to new name in index file
Add observations | Value/Identifier pairs added
Delete observations | Value/Identifier pairs deleted; space recovered for re-use
Update observations | Value/Identifier pairs updated if values change
The APPEND procedure and the INSERT INTO statement in the SQL procedure update the index file
after all the data is appended or inserted.
Indexes are maintained by updates in place, such as using the VIEWTABLE window to update, add, or
delete observations, and the APPEND or SQL procedure to append data. Using the Explorer window or
the DATASETS procedure also maintains indexes when data sets or variables are renamed. However,
re-creating a data set with the SET, MERGE, or UPDATE statement does not automatically maintain
indexes.
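The loss of an index during a rebuild, and one way to restore it, can be demonstrated on a small test data set. The following is a minimal sketch; the data set work.a and the variable ID are hypothetical.

```sas
/* Create a test data set with a simple index on ID.             */
data work.a(index=(ID));
   do ID=1 to 100;
      output;
   end;
run;

/* Rebuilding the data set with SET deletes the index file.      */
data work.a;
   set work.a;
run;

/* Naming the index again in INDEX= re-creates it as part of     */
/* the rebuild.                                                  */
data work.a(index=(ID));
   set work.a;
run;

/* Confirm that the index is present.                            */
proc contents data=work.a;
run;
```

Running PROC CONTENTS after the second step instead would show that no index remains, which is the behavior described above.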
Data Management Tasks | Index Action Taken
Delete a data set. | Index file deleted
   proc datasets lib=work;
      delete a;
   run;
Rebuild a data set with a DATA step or the SQL procedure. | Index file deleted
   data a;
      set a;
   run;

   proc sql;
      create table a as
         select * from a;
   quit;
Sort the data set in place with the FORCE option in the SORT procedure. | Index file deleted
   proc sort data=a force;
      by var;
   run;
If you use the UPLOAD procedure or the DOWNLOAD procedure in SAS/CONNECT, the index is
re-created by default when you upload or download a single data set and omit the OUT= option or when
you upload or download a SAS data library. Use the INDEX=NO data set option to upload or download
without re-creating the index.
Index re-created:
proc upload data=schedule;
run;
Index not re-created:
proc download data=Sales(index=no);
run;
If you are using the CPORT procedure to create transport files, you can use the INDEX=YES option in
the PROC CPORT statement to transport the index file along with the data set. INDEX=YES is the
default.
A variable such as Gender is not discriminating. A discriminating variable is one that enables
you to break the data into many small groups or subsets.
Index Trade-offs
Advantages Disadvantages
Exercises
Level 1
4. Using an Index
Open the program p303e04, and submit it. Consult the log and answer the questions following the
program code shown here.
p303e04
options msglevel=I;
*** Example 1;
data rdu;
set orion.sales_history;
if Order_ID=1230166613;
run;
*** Example 2;
*** Example 3;
*** Example 4;
**** Example 5;
*****Example 6;
data saleshistorycopy;
set orion.sales_history;
run;
Questions:
a. Does Example 1 use an index? Why or why not?
Replace the IF statement with a WHERE statement, and resubmit the program. Does the example
now use an index? Why or why not?
Replace the OR operator with the AND operator, and resubmit the program. Does the example
now use an index? Why or why not?
Replace the NE operator with the EQ operator, and resubmit the program. Does the example now
use an index? Why or why not?
Add the IDXWHERE=NO data set option and resubmit the program. Is the output from the
PROC PRINT step with an index different from the output from the PROC PRINT step without
an index?
What message do you see in the log?
Level 2
Level 3
Objectives
Create a systematic sample.
Create a random sample with replacement.
Create a random sample without replacement.
Business Scenario
The Marketing Department wants to send customer
satisfaction questionnaires to a sample of the customers
in the orion.order_fact SAS data set.
Partial Listing of orion.order_fact
Customer_ID  Employee_ID  Street_ID   Order_Date  Delivery_Date  Order_ID    ...
63           121039       9260125492  11JAN2003   11JAN2003      1230058123  ...
5            99999999     9260114570  15JAN2003   19JAN2003      1230080101  ...
45           99999999     9260104847  20JAN2003   22JAN2003      1230106883  ...
41           120174       1600101527  28JAN2003   28JAN2003      1230147441  ...
183          120134       1600100760  27FEB2003   27FEB2003      1230315085  ...
...
3.3 Creating a Sample Data Set (Self-Study) 3-51
Business Scenario
Select a subset by reading every 50th observation from
observation number 1 to the end of the SAS data set.
data subset;
   do PickIt=1 to TotObs by 50;                /* (3) (2) */
      set orion.order_fact(keep=Customer_ID
                                Employee_ID Street_ID Order_ID)
          point=PickIt
          nobs=TotObs;                         /* (1) */
      output;                                  /* (4) */
   end;
   stop;                                       /* (5) */
run;
p303d07
(1) The NOBS= option creates a temporary numeric variable that contains the total number of
observations in the input data. This variable is populated at compilation.
(2) You can refer to the NOBS= variable in executable statements that appear before the SET statement.
(3) The DO loop assigns a value to the variable PickIt. PickIt is used by the POINT= option in the SET
statement to select an observation from the SAS data set. PickIt must have a value before the SET
statement executes.
(4) The OUTPUT statement writes the PDV values to the SAS data set.
(5) The STOP statement stops the DATA step from continuing to execute after the sample observations
are selected. Because the POINT= option reads observations directly, SAS never detects an end-of-file
condition, so without a STOP statement the DATA step continues in an infinite loop.
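The systematic sample can be tried without the Orion data. This is a sketch that assumes a hypothetical data set work.fact with 617 rows, numbered in a made-up variable Row_ID, standing in for orion.order_fact.

```sas
/* Hypothetical input: 617 rows numbered 1 through 617 */
data work.fact;
   do Row_ID=1 to 617;
      output;
   end;
run;

/* Read every 50th observation, starting with observation 1 */
data work.subset;
   do PickIt=1 to TotObs by 50;
      set work.fact point=PickIt nobs=TotObs;
      output;
   end;
   stop;   /* required: direct access never reaches end of file */
run;

proc print data=work.subset;
run;
```

With 617 rows and a step of 50, the loop selects observations 1, 51, and so on through 601, which is 13 observations. The POINT= and NOBS= variables are temporary, so only Row_ID is written to work.subset.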
3.09 Quiz
Are POINT= and NOBS= individual statements
or part of the SET statement?
data subset;
do PickIt=1 to TotObs by 50;
set orion.order_fact(keep=Customer_ID
Employee_ID Street_ID Order_ID)
point=PickIt
nobs=TotObs;
output;
end;
stop;
run;
p303d07
The POINT= option value should be an integer greater than zero and less than or equal to the number of
observations in the SAS data set.
• If the value is not integral, the SET statement effectively applies the FLOOR function to the value.
• If, during processing, the POINT= value does not match an observation number (is negative or is
greater than NOBS), a data error results and no observation is read by the SET statement. The DATA
step will output the current contents of the PDV and continue processing.
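One way to avoid the data error is to check the candidate value before the SET statement executes. The sketch below assumes a hypothetical five-row data set work.small and a made-up variable Candidate; the FLOOR function is applied explicitly so that the example does not depend on how the SET statement treats non-integer values.

```sas
/* Hypothetical input with five observations */
data work.small;
   do Row_ID=1 to 5;
      output;
   end;
run;

data work.picked;
   do Candidate=2, 4.7, 9;
      PickIt=floor(Candidate);
      /* Read only when PickIt matches an existing observation number */
      if 1<=PickIt<=TotObs then
         do;
            set work.small point=PickIt nobs=TotObs;
            output;
         end;
      else put 'Skipped an out-of-range POINT= value: ' Candidate=;
   end;
   stop;
run;
```

In this sketch, candidates 2 and 4.7 select observations 2 and 4, and candidate 9 is skipped with a message in the log.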
STOP;
Compilation
data subset;
do PickIt=1 to TotObs by 50;
set orion.order_fact
(keep=Customer_ID
Employee_ID
Street_ID
Order_ID)
point=PickIt
nobs=TotObs;
output;
end;
stop;
run;
PDV (D = variable is dropped from the output data set)
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID  Order_ID  _N_ (D)
.           617         .            .            .          .         .
During compilation, the value for TotObs is retrieved from the descriptor portion of orion.order_fact.
Execution
Partial Listing of orion.order_fact
obs  Customer_ID  Employee_ID  ...
1    63           121039       ...
2    5            99999999     ...
...
50   17023        99999999     ...
51   17023        99999999     ...
...
PDV (first iteration of the DO loop in p303d07, PickIt=1)
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
1           617         63           121039       9260125492  1230058123  1
The SET statement executes and reads the first observation. The first observation is read because the
variable PickIt has a value of 1, not because SAS is reading sequentially.
Execution
The OUTPUT statement executes. Output current observation.
PDV
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
1           617         63           121039       9260125492  1230058123  1
Execution
PDV (second iteration of the DO loop, PickIt=51)
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
51          617         17023        99999999     2600100021  1230931366  1
This time when the SET statement executes, observation 51 is read from orion.order_fact.
Execution
The OUTPUT statement executes. Output current observation.
PDV
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
51          617         17023        99999999     2600100021  1230931366  1
PDV (after the last iteration, PickIt is incremented to 651)
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
651         617         215          120175       1600102721  1243963366  1
When PickIt has a value of 651, its value is greater than the range (1-617) in the iterative DO loop.
Execution
The STOP statement executes. Execution stops.
PDV
PickIt (D)  TotObs (D)  Customer_ID  Employee_ID  Street_ID   Order_ID    _N_ (D)
651         617         215          120175       1600102721  1243963366  1
Control goes to the next executable statement after the end of the DO loop.
RANUNI(seed)
A 0 argument for the RANUNI function uses the system clock time, which results in a different
stream of random numbers each time that the program is run.
Scaling the random number to a range of integers:
Expression                Range of result     Example 1     Example 2
ranuni(seed)              between 0 and 1     .01253689     .95196500
ranuni(seed) * 5          between 0 and 5     0.06268445    4.75982500
ceil(ranuni(seed) * 5)    1, 2, 3, 4, or 5    1             5
The CEIL function returns the smallest integer that is greater than or equal to the argument.
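The same scaling can produce an observation number for the POINT= option, which gives a random sample with replacement. This is a sketch under stated assumptions: work.fact is a hypothetical 617-row stand-in for orion.order_fact, the sample size of 10 is arbitrary, and the seed 12345 is chosen only so that repeated runs give the same sample.

```sas
/* Hypothetical input: 617 rows numbered 1 through 617 */
data work.fact;
   do Row_ID=1 to 617;
      output;
   end;
run;

data work.subset;
   drop i;
   do i=1 to 10;
      /* CEIL maps a value between 0 and TotObs to an integer from 1 to TotObs */
      PickIt=ceil(ranuni(12345)*TotObs);
      set work.fact point=PickIt nobs=TotObs;
      output;
   end;
   stop;
run;
```

Because each pick is made independently, the same observation number can be drawn more than once.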
int(ranuni(seed) * 5)
3.10 Poll
Instead of the CEIL function, would the INT function return
the same results?
Yes
No
With a seed value of 0, you get different results each time that the program is executed, but it is
possible that some of the same observations that were selected in previous executions will be
selected.
Create a random sample without replacement. A sample without replacement cannot contain duplicate
observations because after an observation is output to work.subset, it cannot be selected again
programmatically.
p303d09
data subset(drop=ObsLeft SampSize);
   SampSize=10;                 /* observations still needed for the sample */
   ObsLeft=TotObs;              /* observations not yet considered */
   do while(SampSize>0 and ObsLeft>0);
      PickIt+1;                 /* consider the next observation number */
      if ranuni(0)<SampSize/ObsLeft then
         do;
            ObsPicked=PickIt;
            set orion.order_fact point=PickIt
                                 nobs=TotObs;
            output;
            SampSize=SampSize-1;
         end;
      ObsLeft=ObsLeft-1;
   end;
   stop;
run;
With a seed value of 0, you get different results each time that the program is executed, but it is
possible that some of the same observations will be selected as were selected in previous
executions.
In each iteration of the DO loop, the following occur:
1. PickIt is incremented by 1.
2. The IF expression ranuni(0) < SampSize/ObsLeft is evaluated.
a. If true, these actions occur:
1) The observation PickIt is selected in the sample.
2) SampSize is decreased by 1.
b. If false, the observation PickIt is skipped.
3. ObsLeft is decreased by 1.
The process ends when SampSize is 0; no additional observations are needed.
Be aware of the following:
• Each observation is considered for selection.
• An observation number is considered only once.
• The data set is read only when an observation number is selected.
This is an adaptation of a sampling routine that was used by statisticians for many years.
• The sample size is fixed.
• An observation can be selected only once.
• Each observation has an equal probability of being selected.
• The selection probability for an observation is independent of the selection of another
observation.
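The properties listed above can be checked on made-up data. The sketch below assumes a hypothetical 617-row data set work.fact in place of orion.order_fact and uses the seed 12345 only to make the run repeatable; the PROC SQL step counts the selected rows and the distinct row numbers.

```sas
/* Hypothetical input: 617 rows numbered 1 through 617 */
data work.fact;
   do Row_ID=1 to 617;
      output;
   end;
run;

data work.subset(drop=ObsLeft SampSize);
   SampSize=10;
   ObsLeft=TotObs;
   do while(SampSize>0 and ObsLeft>0);
      PickIt+1;
      /* When SampSize equals ObsLeft the ratio is 1, so every */
      /* remaining observation is selected                     */
      if ranuni(12345)<SampSize/ObsLeft then
         do;
            set work.fact point=PickIt nobs=TotObs;
            output;
            SampSize=SampSize-1;
         end;
      ObsLeft=ObsLeft-1;
   end;
   stop;
run;

/* Both counts should be 10: fixed sample size, no duplicates */
proc sql;
   select count(*) as N_Rows,
          count(distinct Row_ID) as N_Distinct
      from work.subset;
quit;
```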
p303d10
STRATA partitions the input data set into non-overlapping groups defined by the
STRATA variables. PROC SURVEYSELECT then selects independent
samples from these strata, according to the selection method and design
parameters specified in the PROC SURVEYSELECT statement. PROC
SURVEYSELECT expects the input data set to be sorted in the order of the
STRATA variables.
CONTROL names variables for sorting the input data set. The CONTROL variables
can be character or numeric. PROC SURVEYSELECT sorts the input data
set by the CONTROL variables before selecting the sample. If you also
specify a STRATA statement, PROC SURVEYSELECT sorts by the
CONTROL variables within the strata.
SIZE names one and only one size measure variable, which contains the size
measures to be used when sampling with probability proportional to size.
The SIZE variable must be numeric. When the value of an observation's
SIZE variable is missing or non-positive, that observation has no chance of
being selected for the sample.
ID names variables from the DATA= input data set to be included in the
OUT= data set of selected units. If there is no ID statement, PROC
SURVEYSELECT includes all variables from the DATA= data set in the
OUT= data set. The ID variables can be character or numeric.
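A short sketch of the ID and STRATA statements follows. It assumes that SAS/STAT is licensed and uses the SASHELP.CLASS sample data set; the output data set names, the sample sizes, and the seed are arbitrary choices for illustration.

```sas
/* Simple random sample of 5 rows; ID limits the variables that are kept */
proc surveyselect data=sashelp.class out=work.srs_sample
                  method=srs n=5 seed=12345;
   id Name Sex Age;
run;

/* The input data set must be sorted by the STRATA variable */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Independent simple random samples of 3 rows from each value of Sex */
proc surveyselect data=work.class_sorted out=work.strat_sample
                  method=srs n=3 seed=12345;
   strata Sex;
run;
```

Omitting the SEED= option would give a different sample on each run.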
(1) Because the SEED= option is not specified in the PROC SURVEYSELECT statement, the seed
value is obtained using the datetime value from the computer's clock.
(2) The Selection Probability for each individual unit is calculated as 10/617 (sample size/number of
observations in the input data set).
(3) The Sampling Weight is the inverse of the selection probability, 617/10.
Exercises
Level 1
Level 2
Level 3
Chapter Review
1. What is one purpose of an index?
6. Does a subsetting IF use an index?
3.5 Solutions
Solutions to Exercises
1. Creating Indexes
a. Open the program p303e01, and add the INDEX= option to create two indexes:
• a simple index Customer_ID, based on the variable Customer_ID
• a unique index Order_ID, based on the variable Order_ID
p303s01
options msglevel=i;
data orders(index=(Customer_ID Order_ID / unique));
set orion.orders;
Days_To_Delivery=Delivery_Date - Order_Date;
run;
options msglevel=n;
b. Use PROC SQL to delete the Order_ID index from the orders data set.
proc sql;
drop index Order_ID
from orders;
quit;
c. Use PROC DATASETS to create a composite index OrDate based on the Order_ID and
Order_Date variables for the orders data set.
proc datasets library=work nolist;
modify orders;
index create OrDate=(Order_ID Order_Date);
quit;
d. Use PROC CONTENTS or PROC DATASETS to look at the index information.
/* CONTENTS solution */
proc contents data=orders;
run;
/* DATASETS solution */
proc datasets library=work nolist;
contents data=orders;
quit;
2. Updating Indexes
a. Use the orion.price_list SAS data set to create a temporary data set named price_list that
contains a new variable named Unit_Profit that is the difference of the variables
Unit_Sales_Price and Unit_Cost_Price. Create a unique index on the Product_ID variable.
p303s02
data price_list(index=(Product_ID / unique));
set orion.price_list;
Unit_Profit=Unit_Sales_Price - Unit_Cost_Price;
run;
b. Open the program p303e02 and submit it.
c. View the log and determine whether the new observation was added.
Partial SAS Log
208 /* Part b */
209 proc sql;
210 insert into price_list(Product_ID, Start_Date,
211 End_Date, Unit_Cost_Price,
212 Unit_Sales_Price, Factor,Unit_Profit)
213 values (210200100009, '15FEB2007'd, '31DEC9999'd, 15.50, 34.70, 1.00,
213! 19.20);
ERROR: Duplicate values not allowed on index Product_ID for file PRICE_LIST.
NOTE: This insert failed while attempting to add data from VALUES clause 1 to
the data set.
NOTE: Deleting the successful inserts before error noted above to restore table
to a consistent state.
214 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.31 seconds
cpu time 0.04 seconds
#  Index  Unique Option  Update Centiles  Current Update Percent  # of Unique Values  Variables
b. Using the DATASETS procedure, set the indicator for updating the centile information about the
Order_ID index to 1% of the data.
p303s06
proc datasets lib=work nolist;
modify orders;
index centiles Order_ID / updatecentiles=1;
quit;
c. Submit the program p303e06c, which adds new observations to the work.orders data set.
d. Submit a PROC CONTENTS step to view the contents of orders. Compare the centile
information from step 6.a. to the current centile information. Were the centiles updated or not?
Yes
proc contents data=orders centiles;
run;
Partial PROC CONTENTS Output
Alphabetic List of Indexes and Attributes
#  Index  Unique Option  Update Centiles  Current Update Percent  # of Unique Values  Variables
p303s07
data products_sample;
do i=10 to TotObs by 10;
set orion.product_dim(keep=Product_Line Product_ID
Product_Name Supplier_Name)
nobs=TotObs
point=i;
output;
end;
stop;
run;
d. Yes, Order_ID is the primary key variable in the SaleID index, so that index could be used.
e. This statement would not execute because there is a syntax error. The WHERE statement requires a
numeric constant (3245) because Customer_ID is a numeric variable.
Chapter 4 Introduction to Lookup Techniques
Objectives
Define table lookup.
List table lookup techniques.
Table Lookups
Lookup values for a table lookup can be stored in the following:
• array
• hash object
• format (used with the FORMAT statement or the PUT function)
• data set (used with MERGE, SET/SET, or a join)
4.2 In-Memory Lookup Techniques 4-5
Objectives
Describe arrays as a lookup technique.
Describe hash objects as a lookup technique.
Describe formats as a lookup technique.
Overview of Arrays
An array is similar to a numbered row of buckets.
Overview of Arrays
General form of the ARRAY statement:
DATA data-set-name;
ARRAY array-name { subscript } <$><length>
<array-elements> <(initial-value-list)>;
< READ statement (s)>
new-variable=array-name{subscript-value};
RUN;
The ARRAY statement associates variables or initial values to be retrieved using the array name and a
subscript value.
The assignment statement retrieves values from the array based on the value of the subscript.
Overview of Arrays
data country_info;
array Cont_Name{91:96} $ 30 _temporary_
('North America',
' ',
'Europe',
'Africa',
'Asia',
'Australia/Pacific');
set orion.country;
Continent=Cont_Name{Continent_ID};
run;
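The array lookup can be run without the Orion library. This sketch assumes a hypothetical data set work.country with made-up rows, including one Continent_ID (99) that falls outside the array bounds; the LBOUND and HBOUND check and the 'Unknown' label are additions for illustration and are not part of the course program.

```sas
/* Hypothetical lookup keys; 99 is deliberately out of range */
data work.country;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
DE 93
ZZ 99
;
run;

data work.country_info;
   array Cont_Name{91:96} $ 30 _temporary_
         ('North America', ' ', 'Europe',
          'Africa', 'Asia', 'Australia/Pacific');
   set work.country;
   length Continent $ 30;
   /* Guard the subscript so that an unexpected ID does not cause an error */
   if lbound(Cont_Name)<=Continent_ID<=hbound(Cont_Name) then
      Continent=Cont_Name{Continent_ID};
   else Continent='Unknown';
run;
```

Without the guard, a subscript of 99 would stop the step with an array subscript out of range error.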
Overview of a Format
A format is similar to rows of buckets that are identified
by the data value.
Each row has a Data Value bucket and a Label bucket.
SAS puts data values and label values in the buckets when the format is used in a FORMAT statement,
PUT function, or PUT statement.
SAS uses a binary search on the data value bucket in order to return the value in the label bucket.
Overview of a Format
General form of the user-defined format:
PROC FORMAT;
   VALUE <$>fmtname range-1=label-1
                    ...
                    range-n=label-n;
RUN;
DATA data-set-name;
   <READ statement(s)>;
   new-variable=PUT(variable,fmtname.);
RUN;
The FORMAT step compiles the format and stores it on disk.
When the PUT function executes, the format is loaded into memory, and a binary search is used to
retrieve the format value.
Overview of a Format
/* The FORMAT step compiles the format and stores it on disk. */
proc format;
value Cont_Name
91='North America'
93='Europe'
94='Africa'
95='Asia'
96='Australia/Pacific';
run;
data country_info;
set orion.country;
Continent=put(Continent_ID,Cont_Name.);
run;
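The format lookup can also be tried on made-up data. In this sketch, work.country is a hypothetical stand-in for orion.country, and the OTHER= range with the label 'Unknown' is an addition for illustration that handles values with no matching range.

```sas
proc format;
   value Cont_Name
      91='North America'
      93='Europe'
      94='Africa'
      95='Asia'
      96='Australia/Pacific'
      other='Unknown';
run;

/* Hypothetical lookup keys; 99 has no matching range */
data work.country;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
ZZ 99
;
run;

data work.country_info;
   set work.country;
   /* PUT returns the label for the matching range as a character value */
   Continent=put(Continent_ID,Cont_Name.);
run;
```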
Objectives
List methods for combining data horizontally.
Use multiple SET statements to combine data
horizontally.
Compare methods for combining SAS data sets.
UPDATE statement
MODIFY statement
4.3 Disk Storage Techniques 4-15
data country_info;
merge country orion.continent;
by Continent_ID;
run;
Matches on equal values for like-named variables
p304d04
PROC SQL;
CREATE TABLE SAS-data-set AS
SELECT column-1, column-2,… ,column-n
FROM table-1, table-2,…,table-n
WHERE joining criteria
ORDER BY sorting criteria;
QUIT;
Performs an inner join based on the WHERE criteria
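The general form above might be applied as follows. The two input data sets, their rows, and the table aliases c and n are hypothetical; they only mimic the structure of the country and continent data used in this chapter.

```sas
/* Hypothetical base table */
data work.country;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
DE 93
;
run;

/* Hypothetical lookup table; values are comma-delimited */
data work.continent;
   infile datalines dlm=',';
   length Continent_Name $ 30;
   input Continent_ID Continent_Name $;
   datalines;
91,North America
93,Europe
96,Australia/Pacific
;
run;

proc sql;
   create table work.country_info as
      select c.Country, c.Continent_ID, n.Continent_Name
         from work.country as c, work.continent as n
         where c.Continent_ID=n.Continent_ID
         order by c.Country;
quit;
```

Unlike a DATA step merge, the join does not require the input tables to be sorted by Continent_ID.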
p304d05
DATA data-set-name;
SET SAS-data-set;
SET SAS-data-set;
RUN;
Listing of country_info
Obs  Country  Country_Name  Population  Country_ID  Continent_ID  Country_FormerName  Continent_Name
p304d06
Execution
one         two
X  Y        Z
1  2        A
2  3        B
3  4

data three;
   set one;
   set two;
   Total=X+Y;
run;

PDV at each step (D = variable is dropped from the output data set)
Step                                      X  Y  Z  Total  _N_ (D)
SET one reads observation 1               1  2     .      1
SET two reads observation 1               1  2  A  .      1
Total=X+Y executes                        1  2  A  3      1
Implicit OUTPUT; implicit RETURN          1  2  A  3      1
Initialize PDV                            1  2  A  .      2
SET one reads observation 2               2  3  A  .      2
SET two reads observation 2               2  3  B  .      2
Total=X+Y executes                        2  3  B  5      2
Implicit OUTPUT; implicit RETURN          2  3  B  5      2
Initialize PDV                            2  3  B  .      3
SET one reads observation 3               3  4  B  .      3
SET two reaches EOF; processing stops     3  4  B  .      3

three
X  Y  Z  Total
1  2  A  3
2  3  B  5
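The execution trace for the two SET statements can be reproduced with the following sketch, which builds the small data sets one and two in the Work library from the values shown in the trace.

```sas
data work.one;
   input X Y;
   datalines;
1 2
2 3
3 4
;
run;

data work.two;
   input Z $;
   datalines;
A
B
;
run;

/* Each SET statement reads the next observation from its own data set */
data work.three;
   set work.one;
   set work.two;
   Total=X+Y;
run;

proc print data=work.three;
run;
```

The step stops when the second SET statement reaches the end of work.two, so work.three contains two observations even though work.one has three.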
Chapter Review
1. What are the three types of in-memory table lookups?
4.5 Solutions
Chapter 5 Using DATA Step Arrays
Objectives
Define one-dimensional arrays.
Use a one-dimensional array for a table lookup task.
restructuring data
number-of-elements is the number of variables in the group. You must enclose this value in
parentheses, braces, or brackets.
length specifies the length of elements in the array that were not previously
assigned a length.
list-of-variables is a list of the names of the variables in the group. All variables that are
defined in a given array must be of the same type, either all character or
all numeric.
initial-values gives initial values for the corresponding positional elements in the array.
5.1 Using One-Dimensional Arrays 5-7
array char{4} $ 6;
creates four character variables, char1 – char4, each with a length of 6
a. 0
b. 1
c. 12
d. Unknown
Equivalent code:
array numarray{12} Num1-Num12;
<additional statements>
do i=1 to 12;
<additional statements>
end;
DIM(array-name)
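As a brief sketch, the DIM function can replace the hardcoded upper bound of the DO loop. The data set name work.scaled and the values assigned to the elements are hypothetical.

```sas
data work.scaled;
   array numarray{12} Num1-Num12;
   /* DIM returns the number of elements, so the loop adapts */
   /* if the array dimension is changed later                */
   do i=1 to dim(numarray);
      numarray{i}=i*10;
   end;
   drop i;
run;
```

The step writes one observation that contains Num1 through Num12 with the values 10, 20, and so on through 120.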
Business Scenario
The data set orion.employee_payroll contains each
employee’s hire date and current salary.
Business Scenario
The data set orion.salary_stats contains statistics for all
Orion Star employees for the years 1974 through 2007.
For example, the average salary of the employees hired
in 1974 is currently $39,243.61.
Partial Listing of orion.salary_stats
Statistic Yr1974 Yr1975 Yr1976 . . . Yr2006 Yr2007
Num_of_Emps 61 4 6 . . . 97 3
Median_Salary 30025 29442.5 30020 . . . 26970 27240
Std_Salary 28551.9 9918.35 22356.91 . . . 2579.67 2922.12
Sum_Salary 2393860 132150 235030 . . . 2704720 86585
Avg_Salary 39243.61 33037.5 39171.67 . . . 27883.71 28861.67
Business Scenario
The two data sets must be combined to calculate the
difference between the average salary and the actual
current salary for each employee based on the year
of hire.
Partial Listing of compare
Using One Dimensional Arrays
Obs Employee_ID Year_Hired Salary Average Salary_Dif
1 120101 2003 $163,040.00 $35,082.50 $127,957.50
2 120102 1989 $108,255.00 $88,588.75 $19,666.25
3 120103 1974 $87,975.00 $39,243.61 $48,731.39
4 120104 1981 $46,230.00 $36,436.67 $9,793.33
5 120105 1999 $27,110.00 $36,533.75 $-9,423.75
6 120106 1974 $26,960.00 $39,243.61 $-12,283.61
7 120107 1974 $30,475.00 $39,243.61 $-8,768.61
8 120108 2006 $27,660.00 $27,883.71 $-223.71
5.02 Poll
Can the two data sets be merged with the DATA step
MERGE statement or joined with the SQL procedure
without pre-processing the data?
Yes
No
5.03 Poll
What do the two data sets have in common?
They have the year in common.
They have nothing in common.
p305d01
(1) The array yr is associated with the variables Yr1974, Yr1975, Yr1976, and so forth through
Yr2007.
(2) Read only the observation where the value of the variable Statistic is Avg_Salary.
(3) The value of the element of the yr array is referenced positionally by the value of the variable
Year_Hired and is assigned to the variable Average.
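The notes above might correspond to a program like the following sketch. It is not the course program: it assumes array bounds that match the hire years (here only 1974 through 1976), and it uses small hypothetical data sets in the Work library. The average salaries and the first employee's salary are taken from the listings in this section; the second employee and the hire dates are made up.

```sas
/* Hypothetical statistics: one row per statistic, one column per hire year */
data work.salary_stats;
   infile datalines dlm=',';
   length Statistic $ 13;
   input Statistic $ Yr1974 Yr1975 Yr1976;
   datalines;
Num_of_Emps,61,4,6
Avg_Salary,39243.61,33037.5,39171.67
;
run;

/* Hypothetical payroll rows */
data work.employee_payroll;
   input Employee_ID Employee_Hire_Date :date9. Salary;
   format Employee_Hire_Date date9.;
   datalines;
120103 01JAN1974 87975
120999 15JUN1976 30000
;
run;

data work.compare;
   keep Employee_ID Year_Hired Salary Average Salary_Dif;
   format Salary Average Salary_Dif dollar12.2;
   /* Subscripts are the years themselves, so no offset is needed */
   array yr{1974:1976} Yr1974-Yr1976;
   /* Read the Avg_Salary row once; its values are retained */
   if _N_=1 then
      set work.salary_stats(where=(Statistic='Avg_Salary'));
   set work.employee_payroll;
   Year_Hired=year(Employee_Hire_Date);
   Average=yr{Year_Hired};
   Salary_Dif=Salary-Average;
run;
```

For employee 120103, the lookup returns an Average of 39243.61, so Salary_Dif is 48731.39.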
Resulting Data
proc print data=compare(obs=8);
var Employee_ID Year_Hired Salary Average Salary_Dif;
title 'Using One Dimensional Arrays';
run;
Obs Employee_ID Year_Hired Salary Average Salary_Dif
1 120101 2003 $163,040.00 $35,082.50 $127,957.50
2 120102 1989 $108,255.00 $88,588.75 $19,666.25
3 120103 1974 $87,975.00 $39,243.61 $48,731.39
4 120104 1981 $46,230.00 $36,436.67 $9,793.33
5 120105 1999 $27,110.00 $36,533.75 $-9,423.75
6 120106 1974 $26,960.00 $39,243.61 $-12,283.61
7 120107 1974 $30,475.00 $39,243.61 $-8,768.61
8 120108 2006 $27,660.00 $27,883.71 $-223.71
Using either of the alternative ARRAY statements, you must change the array reference that creates the
variable Average.
p305d01a
data compare;
   keep Employee_ID Year_Hired Salary Average Salary_Dif;
   format Salary Average Salary_Dif dollar12.2;
   array yr{34} Yr1974-Yr2007;                      /* (1) */
   if _N_=1 then
      set orion.salary_stats(where=(Statistic='Avg_Salary'));
   set orion.employee_payroll;
   Year_Hired=year(Employee_Hire_Date)-1973;        /* (2) */
   Average=yr{Year_Hired};
   Salary_Dif=Salary-Average;
run;
(1) The array yr is associated with the variables Yr1974, Yr1975, Yr1976, and so forth through
Yr2007.
(2) Because the subscript values for the yr array are 1 to 34, adjust the Year_Hired variable so that
yr{1} corresponds to 1974, yr{2} corresponds to 1975, and so forth. The value of the element of the
yr array is referenced positionally by the value of the variable Year_Hired and is assigned to the
variable Average.
Exercises
Level 1
Obs Customer_ID Employee_ID Street_ID Order_Date Delivery_Date Order_ID
Obs Product_ID Quantity Total_Retail_Price CostPrice_Per_Unit Discount
The data set orion.retail_information has statistics about those retail sales.
Partial Listing of orion.retail_information
Partial orion.retail_information Data Set
a. Combine the two data sets to create a data set named compare. The data set should contain the
variables from orion.retail and variables named Month and Median_Retail_Price, where
Month is the month of the date that the product was ordered.
b. Print the first eight observations of the resulting data set.
PROC PRINT Output
Partial Compare Data Set
Obs Customer_ID Employee_ID Street_ID Order_Date Delivery_Date Order_ID
Obs Product_ID Quantity Total_Retail_Price CostPrice_Per_Unit Discount Month Median_Retail_Price
Level 2
a. Use arrays to create a data set named trans that has 24 observations.
Obs Stat Product_Line Value
1 Frequency 21 66.000
2 Frequency 22 277.000
3 Frequency 23 .
4 Frequency 24 18.000
5 Mfg_Suggested_Retail_Price_Mean 21 70.788
6 Mfg_Suggested_Retail_Price_Mean 22 174.292
7 Mfg_Suggested_Retail_Price_Mean 23 .
8 Mfg_Suggested_Retail_Price_Mean 24 173.056
9 Mfg_Suggested_Retail_Price_Min 21 17.000
10 Mfg_Suggested_Retail_Price_Min 22 13.000
11 Mfg_Suggested_Retail_Price_Min 23 .
12 Mfg_Suggested_Retail_Price_Min 24 5.000
13 Mfg_Suggested_Retail_Price_Max 21 130.000
14 Mfg_Suggested_Retail_Price_Max 22 385.000
15 Mfg_Suggested_Retail_Price_Max 23 .
16 Mfg_Suggested_Retail_Price_Max 24 398.000
17 Mfg_Suggested_Retail_Price_Median 21 68.000
18 Mfg_Suggested_Retail_Price_Median 22 164.000
19 Mfg_Suggested_Retail_Price_Median 23 .
20 Mfg_Suggested_Retail_Price_Median 24 190.500
21 Mfg_Suggested_Retail_Price_StdDev 21 21.731
22 Mfg_Suggested_Retail_Price_StdDev 22 71.703
23 Mfg_Suggested_Retail_Price_StdDev 23 .
24 Mfg_Suggested_Retail_Price_StdDev 24 141.389
Level 3
1 89 1 03JAN2007 04JAN2007 6
2 89 1 01OCT2007 01OCT2007 1
3 89 1 01OCT2007 01OCT2007 1
4 89 1 15DEC2007 15DEC2007 4
5 89 2 17JUN2007 21JUN2007 2
6 2550 3 04MAY2007 09MAY2007 3
7 2550 3 04MAY2007 09MAY2007 1
b. Create the data set named all that has one observation for each Order_Type where there are a
varying number of observations for each Order_Type in the original data set order_fact. Use the
maximum number of observations for each order type as the array dimension to create three
arrays that create variables to hold the order dates, the delivery dates, and the quantity.
c. Print the first three observations of all.
PROC PRINT Output
The Resulting Data Set
Obs Delivery_Date3 Delivery_Date4 Quantity1 Quantity2 Quantity3 Quantity4
1 01OCT2007 15DEC2007 6 1 1 4
2 . . 2 . . .
3 . . 3 1 . .
1 4 89 15DEC2007 15DEC2007 1 4
2 1 89 17JUN2007 21JUN2007 2 2
3 2 2550 04MAY2007 09MAY2007 3 1
Objectives
Define a multidimensional array.
Explain the differences between a one-dimensional
array and a multidimensional array.
Use a multidimensional array as a lookup table.
Business Scenario
The SAS data set orion.profit has information about
every company for the years 2003 through 2007,
separated by month.
5.2 Using Multidimensional Arrays 5-23
5.05 Quiz
What is the type of the variable YYMM in the data set
orion.profit?
Business Scenario
This table contains the budgeted amounts for each of
those months and years. Each row represents a month,
and each column represents a year.
Yr2003 Yr2004 Yr2005 Yr2006 Yr2007
$1,590,000 $1,880,000 $2,300,000 $1,960,000 $1,970,000
$1,290,000 $1,550,000 $1,830,000 $1,480,000 $1,640,000
$1,160,000 $1,380,000 $1,640,000 $1,410,000 $1,440,000
$1,710,000 $2,100,000 $2,420,000 $2,130,000 $2,270,000
$1,990,000 $2,350,000 $2,840,000 $2,480,000 $2,670,000
$2,560,000 $3,020,000 $3,580,000 $3,070,000 $3,410,000
$2,590,000 $2,890,000 $3,550,000 $3,010,000 $3,490,000
$2,550,000 $2,840,000 $3,580,000 $3,030,000 $3,500,000
$1,070,000 $1,180,000 $1,550,000 $1,260,000 $1,520,000
$1,160,000 $1,270,000 $1,600,000 $1,360,000 $1,700,000
$1,260,000 $1,470,000 $1,780,000 $1,540,000 $1,950,000
$2,870,000 $3,120,000 $3,760,000 $3,210,000 $4,370,000
The budget values in the table are not stored in a SAS data set.
Business Scenario
You need to combine the budget amounts in the table
with the actual amount in the SAS data set to create the
following report:
Listing of budget_amt
Actual vs Budgeted Amounts (Two Observations)
5.06 Quiz
What do the data set orion.profit and the lookup table
have in common?
Partial Listing of orion.profit (where=(Sales ne .))
Company YYMM Sales Cost Salaries Profit
Logistics 03M01 $457,809 $210,914 $127,525 $119,370
Logistics 03M02 $325,138 $149,718 $127,525 $47,895
Logistics 03M03 $276,805 $127,827 $134,198 $14,780
Logistics 03M04 $558,806 $264,868 $134,198 $159,741
The keyword _TEMPORARY_ can be used instead of elements to avoid creating new variables in the
program data vector.
For this example, only the first two rows are included in the array.
The initial values fill all the columns in a row before moving on to the next row.
PDV
B1 B2 B3 B4 B5 B6 B7 B8 B9 B10
1590000 1880000 2300000 1960000 1970000 1290000 1550000 1830000 1480000 1640000
When you use a multidimensional array, the following statements are true:
• You must supply a subscript value for each dimension to process a specific array element.
• You can use a DO loop to process elements in a given dimension.
• You can use nested DO loops to process elements in more than one dimension.
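The statements above can be sketched with nested DO loops that visit every element of a two-dimensional array. The array uses the first two rows of budget values from this section; the output data set name and the variables Month and Year are made up for illustration.

```sas
data work.budget_long;
   /* Two rows (months 1 and 2) by five columns (years 2003 through 2007) */
   array B{2,2003:2007} _temporary_
         (1590000, 1880000, 2300000, 1960000, 1970000,
          1290000, 1550000, 1830000, 1480000, 1640000);
   /* Outer loop: first dimension; inner loop: second dimension */
   do Month=lbound(B,1) to hbound(B,1);
      do Year=lbound(B,2) to hbound(B,2);
         BudgetAmt=B{Month,Year};
         output;
      end;
   end;
run;
```

The step writes one observation for each element, 10 in total, with a subscript value supplied for each dimension.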
Business Scenario
Find the budgeted amounts for each company, year,
and month.
p305d02
1. Ten hardcoded values initialize the array. The _TEMPORARY_ keyword creates an array that is not
associated with variables in the program data vector.
2. The variable Y (the column number) is calculated using the YEAR function on the date variable,
YYMM.
3. The variable M (the row number) is calculated using the MONTH function on the date variable,
YYMM.
4. The row and column numbers are used to look up the value of BudgetAmt in the array B.
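The lookup B{M,Y} works only when both subscripts fall inside the bounds of the array. With a two-row array, an observation for month 3 would stop the step with an "Array subscript out of range" error. The following is a hedged sketch of one way to guard the lookup with the LBOUND and HBOUND functions; the guard is an addition for illustration and is not part of the course program.

```sas
data budget_amt;
   drop Y M;
   array B{2,2003:2007} _temporary_
         (1590000, 1880000, 2300000, 1960000, 1970000,
          1290000, 1550000, 1830000, 1480000, 1640000);
   set orion.profit(where=(Sales ne .));
   Y=year(YYMM);
   M=month(YYMM);
   /* look up the budget only when both subscripts are in range */
   if lbound(B,1) <= M <= hbound(B,1) and
      lbound(B,2) <= Y <= hbound(B,2) then BudgetAmt=B{M,Y};
   else BudgetAmt=.;
run;
```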
5.2 Using Multidimensional Arrays 5-29
Execution
data budget_amt;
   drop Y M;
   array B{2,2003:2007} _temporary_
         (1590000, 1880000, 2300000,
          1960000, 1970000, 1290000,
          1550000, 1830000, 1480000,
          1640000);
   set orion.profit(where=(Sales ne .)
                    obs=2);
   Y=year(YYMM);
   M=month(YYMM);
   BudgetAmt=B{M,Y};
run;

Partial Listing of orion.profit
Company    YYMM   Sales   Cost    . . .
Logistics  03M01  457809  210914  . . .
Logistics  03M02  325138  149718  . . .

Array B
1590000 1880000 2300000 1960000 1970000 1290000 1550000 1830000 1480000 1640000

PDV during execution (D marks a variable that is not written to the output data set)
Step                                Company    YYMM   Sales   Cost    Salaries  Profit  Y (D)  M (D)  BudgetAmt  _N_ (D)
Initialize PDV                                 .      .       .       .         .       .      .      .          1
SET reads the first observation     Logistics  03M01  457809  210914  127525    119370  .      .      .          1
Y=year(YYMM); M=month(YYMM);        Logistics  03M01  457809  210914  127525    119370  2003   1      .          1
BudgetAmt=B{1,2003};                Logistics  03M01  457809  210914  127525    119370  2003   1      1590000    1
Implicit OUTPUT; implicit RETURN;   Logistics  03M01  457809  210914  127525    119370  2003   1      1590000    1
Reinitialize PDV                    Logistics  03M01  457809  210914  127525    119370  .      .      .          2
SET reads the second observation    Logistics  03M02  325138  149718  127525    47895   .      .      .          2
Y=year(YYMM); M=month(YYMM);        Logistics  03M02  325138  149718  127525    47895   2003   2      .          2
BudgetAmt=B{2,2003};                Logistics  03M02  325138  149718  127525    47895   2003   2      1290000    2
Implicit OUTPUT; implicit RETURN;   Logistics  03M02  325138  149718  127525    47895   2003   2      1290000    2
Execution stops                     Logistics  03M02  325138  149718  127525    47895   2003   2      1290000    2
5.2 Using Multidimensional Arrays 5-35
Exercises
Level 1
The data set orion.order_fact contains the variables Customer_ID, Quantity, and Order_Type.
Partial Listing of orion.order_fact
Order_
Obs Customer_ID Type Quantity
1 63 1 1
2 5 2 1
3 45 2 1
4 41 1 2
5 183 1 3
6 79 2 1
7 23 2 1
8 23 2 2
9 45 2 2
10 45 2 1
a. Use a two-dimensional array to combine the data set with the table of values to create a data set
named customer_coupons with a variable named Coupon_Value.
Order_ Coupon_
Obs Customer_ID Type Quantity Value
1 63 1 1 10
2 5 2 1 10
3 45 2 1 10
4 41 1 2 10
5 183 1 3 15
Level 2
Product
Line 1 2
21 . 70.79
22 173.79 174.40
23 . .
24 29.65 287.8
The data set orion.shoe_sales contains the Product_ID, the Product_Name, and the
Total_Retail_Price for all of the shoes sold by Orion Star.
Partial Listing of orion.shoe_sales
Total_Retail_
Product_ID Product_Name Price
a. Create a data set named combine using a two-dimensional array to combine the table of values
with the product line and the product category ID. The product line is the first two digits of the
Product_ID variable. The product category ID is the third and fourth digits of the Product_ID
variable.
b. Print the first five observations of the combine data set.
PROC PRINT Output
Total_Retail_
Obs Product_ID Product_Name Price
Manufacturer_
Product_ Product_ Suggested_
Obs Prod_ID Line Cat_ID Price
1 220200200024 22 2 174.40
2 220200100092 22 2 174.40
3 240200100043 24 2 287.80
4 220100700024 22 1 173.79
5 220200300157 22 2 174.40
Level 3
6. Using a Three-Dimensional Array
The warehouse location for the products in the orion.product_list data set is given in the following
table:
Warehouse Locations
21 0 0 A2100
21 0 1 A2101
21 1 0 A2110
21 1 1 A2111
21 2 0 A2120
21 2 1 A2121
22 0 0 B2200
22 0 1 B2201
22 1 0 B2210
22 1 1 B2211
22 2 0 B2220
22 2 1 B2221
Open the program p305e06, which retrieves the Level 1 products from the orion.product_list data set.
p305e06
data warehouses;
   set orion.product_list(keep=Product_ID Product_Name
                               Product_Level
                          where=(Product_Level=1));
   Prod_ID=put(Product_ID,12.);
   Product_Line=input(substr(Prod_ID,1,2),2.);
   Product_Cat_ID=input(substr(Prod_ID,3,2),2.);
   Product_Loc_ID=input(substr(Prod_ID,12,1),1.);
   /* subset the data for this exercise */
   if Product_Line in (21,22) and Product_Cat_ID<=2
      and Product_Loc_ID<2;
run;
a. Type the values of the Warehouse column into a three-dimensional array using the values of
Product_Line, Product_Cat_ID, and Product_Loc_ID as the dimensions.
b. Create a data set named warehouses. Use the Product_ID variable to determine the values of
Product_Line, Product_Cat_ID, and Product_Loc_ID.
• The product line is the first two digits of the Product_ID variable.
• The product category ID is the third and fourth digits of the Product_ID variable.
• The product location ID identifies the location within a warehouse of the product and is the last
digit of the Product_ID variable.
c. Print the first five observations of the warehouses data set.
PROC PRINT Output
Warehouses Data
Product_
Obs Product_ID Product_Name Level
1 210200400020 21 2 0 A2120
2 210200400070 21 2 0 A2120
3 210201000050 21 2 0 A2120
4 220100100101 22 1 1 B2211
5 220100100241 22 1 1 B2211
Objectives
• Load a multidimensional array from a SAS data set.
• Identify the advantages of an array as a lookup table.
• Identify the disadvantages of an array as a lookup table.
Business Scenario
Budget values are stored in a SAS data set named
orion.budget where the rows represent months
and the columns represent years.
Load the array from the values in the SAS data set.
Listing of orion.budget
Month Yr2003 Yr2004 Yr2005 Yr2006 Yr2007
1 $1,590,000 $1,880,000 $2,300,000 $1,960,000 $1,970,000
2 $1,290,000 $1,550,000 $1,830,000 $1,480,000 $1,640,000
3 $1,160,000 $1,380,000 $1,640,000 $1,410,000 $1,440,000
4 $1,710,000 $2,100,000 $2,420,000 $2,130,000 $2,270,000
5 $1,990,000 $2,350,000 $2,840,000 $2,480,000 $2,670,000
6 $2,560,000 $3,020,000 $3,580,000 $3,070,000 $3,410,000
7 $2,590,000 $2,890,000 $3,550,000 $3,010,000 $3,490,000
8 $2,550,000 $2,840,000 $3,580,000 $3,030,000 $3,500,000
9 $1,070,000 $1,180,000 $1,550,000 $1,260,000 $1,520,000
10 $1,160,000 $1,270,000 $1,600,000 $1,360,000 $1,700,000
11 $1,260,000 $1,470,000 $1,780,000 $1,540,000 $1,950,000
12 $2,870,000 $3,120,000 $3,760,000 $3,210,000 $4,370,000
77
5.3 Loading a Multidimensional Array from a SAS Data Set 5-41
78
The subscript variables I and J are used to process all the budget values in orion.budget.
1. For each value of I, the SET statement reads an observation from the data set orion.budget
and fills the tmp array with the yearly values from that observation.
2. For each value of J, the yearly budget value, referenced through tmp{J}, is assigned to the
corresponding position J in the current row of the B array. The current row of the B array is
referenced by the value of I.
3. When both loops finish, the two-dimensional array B is loaded with the values that passed
through the tmp array.
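The slide program itself is not reproduced in these notes, so the following is only a sketch that is consistent with the description above. The names I, J, tmp, and B come from the text; the array bounds, the DROP list, and the placement of the ARRAY statements are assumptions based on the listing of orion.budget (12 months, Yr2003 through Yr2007).

```sas
data budget_amt;
   drop I J Y M Month Yr2003-Yr2007;
   /* 12 rows (months) by 5 columns (years), held in memory only */
   array B{12,2003:2007} _temporary_;
   /* tmp maps the five yearly variables read from orion.budget */
   array tmp{2003:2007} Yr2003-Yr2007;
   /* load the lookup array once, on the first iteration */
   if _N_=1 then do I=1 to 12;
      set orion.budget;           /* one observation per month */
      do J=2003 to 2007;
         B{I,J}=tmp{J};           /* copy the row into B */
      end;
   end;
   set orion.profit(where=(Sales ne .));
   Y=year(YYMM);
   M=month(YYMM);
   BudgetAmt=B{M,Y};
run;
```

Loading inside IF _N_=1 means that orion.budget is read only once, and every later iteration uses the values already held in B.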
5-42 Chapter 5 Using DATA Step Arrays
a. 0
b. 24
c. 48
d. 60
81
Execution
The slides trace the PDV while the array is loaded and the first two observations of orion.profit are
processed. Only the distinct states are listed here. D marks a variable that is not written to the output
data set.

Step                                         YYMM   Sales   Cost    Salaries  Profit  Y (D)  M (D)  BudgetAmt  _N_ (D)
PDV initialized; array B is empty            .      .       .       .         .       .      .      .          1
B holds 1590000 . . .                        .      .       .       .         .       .      .      .          1
B holds 1590000 1880000 . . .                .      .       .       .         .       .      .      .          1
Array loading continues                      .      .       .       .         .       .      .      .          1
First observation of orion.profit is read    03M01  457809  210914  127525    119370  .      .      .          1
Y and M are assigned                         03M01  457809  210914  127525    119370  2003   1      .          1
BudgetAmt is looked up in B                  03M01  457809  210914  127525    119370  2003   1      1590000    1
PDV is reinitialized                         03M01  457809  210914  127525    119370  .      .      .          2
Second observation of orion.profit is read   03M02  325138  149718  127525    47895   .      .      .          2
Y, M, and BudgetAmt are assigned             03M02  325138  149718  127525    47895   2003   2      1290000    2
Using an Array
Advantages of Using an Array
• faster than a hash object or a format, if you can use it
• use of positional order
• use of multiple values to determine the array element to be returned
• ability to use a non-sorted and non-indexed base data set
• use of numeric expressions to determine which element of the array is to be looked up; exact
match not required
Disadvantages of Using an Array
• a contiguous chunk of memory requested at compile time
• memory requirements to load the entire array
• requirement that you must have a numeric value as a pointer to the array elements
• the return of only a single value from the lookup operation
• dimensions supplied at compile time by either hardcoding or macro variables
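The last disadvantage, supplying dimensions through macro variables, can be sketched as follows. The macro variable name NumMonths and the DATA _NULL_ wrapper are assumptions made for this illustration; the %LET idiom for removing leading blanks follows the pattern used in the solutions for this chapter.

```sas
/* Count the observations in the lookup table */
proc sql noprint;
   select count(*) into :NumMonths
      from orion.budget;
quit;
%let NumMonths=&NumMonths;   /* remove leading blanks */

data _null_;
   /* &NumMonths resolves before the step compiles, so the */
   /* array size is still fixed at compile time            */
   array B{&NumMonths,2003:2007} _temporary_;
   array tmp{2003:2007} Yr2003-Yr2007;
   do I=1 to &NumMonths;
      set orion.budget;
      do J=2003 to 2007;
         B{I,J}=tmp{J};
      end;
   end;
   Cells=dim(B,1)*dim(B,2);
   put 'Array B holds ' Cells 'elements.';
   stop;
run;
```

The year bounds remain hardcoded in this sketch; only the number of rows is taken from the data.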
5-62 Chapter 5 Using DATA Step Arrays
Review of Arrays
Array
122
5.3 Loading a Multidimensional Array from a SAS Data Set 5-63
Exercises
Level 1
1 1 10 10 15 20 20 25
2 2 10 15 20 25 25 30
3 3 10 15 15 20 25 25
The data set orion.order_fact contains variables Customer_ID, Order_Type, and Quantity.
Partial Listing of orion.order_fact
Order_
Obs Customer_ID Type Quantity
1 63 1 1
2 5 2 1
3 45 2 1
4 41 1 2
5 183 1 3
6 79 2 1
7 23 2 1
8 23 2 2
9 45 2 2
10 45 2 1
a. Create a two-dimensional array with the values from orion.coupons. Use values from
orion.order_fact and the array to create a new variable named Coupon_Value. Name the new
data set customer_coupons.
Order_ Coupon_
Obs Customer_ID Type Quantity Value
1 63 1 1 10
2 5 2 1 10
3 45 2 1 10
4 41 1 2 10
5 183 1 3 15
6 79 2 1 10
7 23 2 1 10
8 23 2 2 15
9 45 2 2 15
10 45 2 1 10
1 1 1 10
2 1 2 10
3 1 3 15
4 1 4 20
5 1 5 20
6 1 6 25
7 2 1 10
8 2 2 15
9 2 3 20
10 2 4 25
11 2 5 25
12 2 6 30
13 3 1 10
14 3 2 15
15 3 3 15
16 3 4 20
17 3 5 25
18 3 6 25
The data set orion.order_fact contains variables Customer_ID, Order_Type, and Quantity.
1 63 1 1
2 5 2 1
3 45 2 1
4 41 1 2
5 183 1 3
6 79 2 1
7 23 2 1
8 23 2 2
9 45 2 2
10 45 2 1
a. Create a two-dimensional array with the values from orion.coupon_pct. Use values from
orion.order_fact and the array to create a new variable named Coupon_Value. Name the new
data set customer_coupons.
b. Print the first 10 observations of the customer_coupons data set.
PROC PRINT Output
The Coupon Value
Order_ Coupon_
Obs Customer_ID Type Quantity Value
1 63 1 1 10
2 5 2 1 10
3 45 2 1 10
4 41 1 2 10
5 183 1 3 15
6 79 2 1 10
7 23 2 1 10
8 23 2 2 15
9 45 2 2 15
10 45 2 1 10
5-66 Chapter 5 Using DATA Step Arrays
Level 2
1 21 2101 .
2 21 2102 70.79
3 22 2201 173.79
4 22 2202 174.40
5 23 2301 .
6 23 2302 .
7 24 2401 29.63
8 24 2402 287.80
The data set orion.shoe_sales contains the Product_ID, Product_Name, and Total_Retail_Price
for all of the shoes sold by Orion Star.
Partial Listing of orion.shoe_sales
Total_Retail_
Product_ID Product_Name Price
a. Create a data set named combine using a two-dimensional array to combine the table of values from
orion.msp with orion.shoe_sales. Create a new variable named Manufacturer_Suggested_Price
based on the values of product line and product category. The product line is the first two digits of
the Product_ID variable. The product category ID is the third and fourth digits of the Product_ID
variable. Keep only the Product_ID, Product_Name, Total_Retail_Price, and
Manufacturer_Suggested_Price variables.
1 $174.40 220200200024 Pro Fit Gel Gt 2030 Women's Running Shoes $178.50
2 $174.40 220200100092 Big Guy Men's Air Terra Sebec Shoes $83.00
3 $287.80 240200100043 Bretagne Performance Tg Men's Golf Shoes L. $282.40
4 $173.79 220100700024 Armadillo Road Dmx Women's Running Shoes $99.70
5 $174.40 220200300157 Hardcore Men's Street Shoes Large $220.20
Level 3
1 21 0 0 A2100
2 21 0 1 A2101
3 21 1 0 A2110
4 21 1 1 A2111
5 21 2 0 A2120
6 21 2 2 A2122
7 21 2 3 A2123
8 21 2 4 A2124
9 21 2 5 A2125
10 21 2 6 A2126
Chapter Review
1. Define an array.
125
Chapter Review
4. How many elements are created in the following
ARRAY statement?
array myarray{5:9,7};
5.5 Solutions
Solutions to Exercises
1. Using a One-Dimensional Array to Combine Data
a. Combine the two data sets to create a data set named compare. The data set should contain the
variables from orion.retail and variables named Month and Median_Retail_Price, where
Month is the month of the date that the product was ordered.
b. Print the first eight observations of the resulting data set.
p305s01
data compare;
drop Month1-Month12 Statistic;
array mon{12} Month1-Month12;
if _N_=1 then
set orion.retail_information
(where=(Statistic='Median_Retail_Price'));
set orion.retail;
Month=month(Order_Date);
Median_Retail_Price=mon(Month);
run;
/************************************************************/
/* The SQL step counts the number of Order Types so that */
/* you know the dimensions for the arrays that the program */
/* needs. */
/************************************************************/
proc sql;
select Order_Type, count(*)
from order_fact
group by Order_Type;
quit;
/************************************************************/
/* The DATA step creates 4 variables for the order dates,   */
/* 4 for the delivery dates, and 4 for the quantities.      */
/* N is a counter of observations for each Order_Type.      */
/* N must be reset to 0 each time the DATA step iterates.   */
/* The DATA step iterates again when the DO UNTIL loop      */
/* ends. This happens when the last observation for an      */
/* Order_Type has been processed.                           */
/* The three assignment statements in the DO loop create    */
/* the variables for each value of Order_Type.              */
/************************************************************/
data all;
array ordt{*} Ordered_Date1-Ordered_Date4;
array deldt{*} Delivery_Date1-Delivery_Date4;
array q{*} Quantity1 - Quantity4;
format Ordered_Date1-Ordered_Date4
Delivery_Date1-Delivery_Date4
date9.;
N=0;
do until (last.Order_Type);
set order_fact;
by Order_Type;
N+1;
ordt{N}=Order_Date;
deldt{N}=Delivery_Date;
q{N}=Quantity;
end;
run;
/***********************************************************/
/* to get a macro variable for the number of observations */
/* in order_fact */
/* */
/* proc sql; */
/* create table temp as */
/* select count(*) as Num */
/* from order_fact */
/* group by Customer_ID; */
/* select max(num) into :NumObs */
/* from temp; */
/* */
/* Then substitute &NumObs into the program instead of */
/* the 4 */
/***********************************************************/
proc sql;
create table temp as
select count(*) as Num
from order_fact
group by Order_Type;
select max(Num) into :NumObs
from temp;
%let NumObs=&NumObs;
quit;
data all;
array ordt{*} Ordered_Date1-Ordered_Date&NumObs;
array deldt{*} Delivery_Date1-Delivery_Date&NumObs;
array q{*} Quantity1 - Quantity&NumObs;
format Ordered_Date1-Ordered_Date&NumObs
Delivery_Date1-Delivery_Date&NumObs
date9.;
N=0;
do until (last.Order_Type);
set order_fact;
by Order_Type;
N+1;
ordt{N}=Order_Date;
deldt{N}=Delivery_Date;
q{N}=Quantity;
end;
run;
proc print data=all;
run;
b. Create a data set named warehouses. Use the Product_ID variable to determine the values of
Product_Line, Product_Cat_ID, and Product_Loc_ID.
• The product line is the first two digits of the Product_ID variable.
• The product category ID is the third and fourth digits of the Product_ID variable.
• The product location ID identifies the location within a warehouse of the product and is the last
digit of the Product_ID variable.
c. Print the first five observations of the warehouses data set.
p305s06
data warehouses;
array W{21:22,0:2,0:1} $ 5 _temporary_ ('A2100',
'A2101',
'A2110',
'A2111',
'A2120',
'A2121',
'B2200',
'B2201',
'B2210',
'B2211',
'B2220',
'B2221');
set orion.product_list(keep=Product_ID Product_Name
Product_Level
where=(Product_Level=1));
Prod_ID=put(Product_ID,12.);
Product_Line=input(substr(Prod_ID,1,2),2.);
Product_Cat_ID=input(substr(Prod_ID,3,2),2.);
Product_Loc_ID=input(substr(Prod_ID,12,1),1.);
/* subset the data for this exercise */
if Product_Line in (21,22) and Product_Cat_ID<=2
and Product_Loc_ID<2;
Warehouse=W(Product_Line, Product_Cat_ID, Product_Loc_ID);
run;
a. 0
b. 1
c. 12
d. Unknown
a. 0
b. 24
c. 48
d. 60
82
5.5 Solutions 5-83
4. 35 (five rows, subscripts 5 through 9, by seven columns)
5. What are the names of the variables created
by the ARRAY statement in question 4?
myarray1 - myarray35
5-84 Chapter 5 Using DATA Step Arrays
Chapter 6 Using DATA Step Hash and Hiter Objects
6.3 Loading a Hash Object with Data from a SAS Data Set ............................................ 6-31
Exercises .............................................................................................................................. 6-42
6.5 Using a Hash Object for Chained Lookups (Self-Study) ........................................... 6-67
Demonstration: Creating a List of Values............................................................................. 6-82
6.1 Introduction
Objectives
Define the DATA step hash object.
6.01 Poll
Have you used hash objects in SAS or other computer
languages?
Yes
No
A hash object is sized dynamically.
Additional information about the hash object is available at the DATA Step Community Web site:
support.sas.com/rnd/base/index-datastep.html
Hash object keys can be composite.
An attribute is a property.
A method is a function.
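As a small illustration of the two terms, the sketch below calls several methods (written with parentheses) and then reads the NUM_ITEMS attribute (written without parentheses). The hash object name T and the sample values are borrowed from the membership example in this chapter; the variables rc and Items are names chosen for this sketch.

```sas
data _null_;
   length Code $2 MemberType $40;
   call missing(Code, MemberType);   /* avoid "uninitialized" notes */
   declare hash T();
   rc=T.definekey('Code');           /* method */
   rc=T.definedata('MemberType');    /* method */
   rc=T.definedone();                /* method */
   rc=T.add(key:'10', data:'Orion Club members');
   rc=T.add(key:'20', data:'Orion Club Gold members');
   Items=T.num_items;                /* attribute: number of items stored */
   put Items=;
run;
```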
10
6.2 Using Hash Object Methods 6-7
Objectives
Investigate hash object syntax.
Use hash object methods to load data into
a hash object.
Use a hash object method to match records.
12
Business Scenario
The SAS data set orion.europe_customers has
variables that contain the customer type for the last year
and for the current year.
Listing of orion.europe_customers
Customer_Name        Customer_Address        Country  LastYrType  ThisYrType
Cornelia Krahl       Kallstadterstr. 9       DE       20          20
Elke Wallstab        Carl-Zeiss-Str. 15      DE       10          20
Markus Sepke         Iese 1                  DE       20          10
Ulrich Heyde         Oberstr. 61             DE       30          10
Oliver S. Füßling    Hechtsheimerstr. 18     DE       20          30
Rolf Robak           Münsterstraße 67        DE       10          30
Thomas Leitmann      Carl Von Linde Str. 13  DE       10          20
Gert-Gunter Mendler  Humboldtstr. 1          DE       20          30
Carsten Maestrini    Münzstr. 28             DE       20          30
Ines Deisser         Bahnweg 1               DE       10          20
6-8 Chapter 6 Using DATA Step Hash and Hiter Objects
Business Scenario
Customer descriptions must be assigned based on customer code values for member type. The values
are shown in the following table but are not stored in a SAS data set.
Code  Member Type
10    Orion Club members
20    Orion Club Gold members
30    Internet/Catalog customers
A set of lookup values can be stored in a hash object. Whereas an array uses a series of consecutive
integers to address array elements, a hash object can use any combination of numeric and character values
as addresses.
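To make the contrast with array subscripts concrete, here is a minimal sketch of a hash object whose key combines a character value and a numeric value. The names H, Country, Type, and Label, and the two label strings, are invented for this illustration and do not come from the course data.

```sas
data _null_;
   length Country $2 Type 8 Label $40;
   call missing(Country, Type, Label);
   declare hash H();
   H.definekey('Country','Type');   /* composite key: character plus numeric */
   H.definedata('Label');
   H.definedone();
   H.add(key:'DE', key:10, data:'German Orion Club members');
   H.add(key:'FR', key:10, data:'French Orion Club members');
   /* look up by both key values; an array would need integer subscripts */
   rc=H.find(key:'DE', key:10);
   if rc=0 then put Label=;
run;
```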
HASH Object T
KEY: Code  DATA: MemberType
10         Orion Club members
20         Orion Club Gold members
30         Internet/Catalog Customers
p306d01
19
6-10 Chapter 6 Using DATA Step Hash and Hiter Objects
Compilation
data mem_type;
   length Code $2 MemberType $40;
   if _N_=1 then do;
      declare hash T();
      T.definekey('Code');
      T.definedata('MemberType');
      T.definedone();
      T.add(key:'10',data:'Orion Club members');
      T.add(key:'20',data:'Orion Club Gold members');
      T.add(key:'30',data:'Internet/Catalog Customers');
   end;
   set orion.europe_customers;
   rc1=T.find(key:ThisYrType);
   if rc1=0 then ThisYrMember=MemberType;
   rc2=T.find(key:LastYrType);
   if rc2=0 then LastYrMember=MemberType;
run;
Partial PDV (D marks a variable that is not written to the output data set)
Code  MemberType  Customer_Name  ...  LastYrType  ThisYrType  ...  ThisYrMember  LastYrMember  rc1  rc2  _N_ (D)
                                                                                               .    .    .
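In the program above, ThisYrMember and LastYrMember are assigned only when the FIND method returns 0. The following sketch shows one possible way to make a failed lookup visible instead of leaving the value blank. The label 'Unknown', the CALL MISSING routine call, and the DROP statement are additions for illustration; the sketch also assumes that ThisYrType and LastYrType are character variables that match the type of the key Code.

```sas
data mem_type;
   length Code $2 MemberType ThisYrMember LastYrMember $40;
   drop Code MemberType;
   if _N_=1 then do;
      declare hash T();
      T.definekey('Code');
      T.definedata('MemberType');
      T.definedone();
      T.add(key:'10',data:'Orion Club members');
      T.add(key:'20',data:'Orion Club Gold members');
      T.add(key:'30',data:'Internet/Catalog Customers');
      call missing(Code, MemberType);   /* avoid "uninitialized" notes */
   end;
   set orion.europe_customers;
   /* FIND returns 0 when the key is found and nonzero otherwise */
   if T.find(key:ThisYrType)=0 then ThisYrMember=MemberType;
   else ThisYrMember='Unknown';
   if T.find(key:LastYrType)=0 then LastYrMember=MemberType;
   else LastYrMember='Unknown';
run;
```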
Execution
Partial orion.europe_customers
True
Last This
Customer data mem_type;
. . Yr Yr
_Name
Type Type length Code $2 MemberType $40;
Cornelia
. . 20 20
if _N_=1 then do;
Krahl declare hash T();
Elke T.definekey('Code');
. . 10 20
Wallstab
Markus
T.definedata('MemberType');
. . 20 10 T.definedone();
Sepke
. . . T.add(key:'10',data:'Orion Club members');
. . . . . T.add(key:'20',data:'Orion Club Gold members');
. . . T.add(key:'30',data:'Internet/Catalog Customers');
end;
set orion.europe_customers;
rc1=T.find(key:ThisYrType);
if rc1=0 then ThisYrMember=MemberType;
rc2=T.find(key:LastYrType);
if rc2=0 then LastYrMember=MemberType;
run;
Partial PDV
Member Customer_ LastYr ThisYr
Code
Type Name ... Type Type
...
ThisYr LastYr
rc1 rc2 D _N_
Member Member
. . 1
21 ...
6.2 Using Hash Object Methods 6-11
Execution
Partial orion.europe_customers
Last This
Customer data mem_type;
. . Yr Yr
_Name
Type Type length Code $2 MemberType $40;
Cornelia
. . 20 20
if _N_=1 then do;
Krahl declare hash T();
Elke T.definekey('Code');
. . 10 20
Wallstab
T.definedata('MemberType');
HASH Object T T.definedone();
KEY: DATA: T.add(key:'10',data:'Orion Club members');
T.add(key:'20',data:'Orion Club Gold members');
Code MemberType T.add(key:'30',data:'Internet/Catalog Customers');
Orion Club end;
10 set orion.europe_customers;
members
Orion Club rc1=T.find(key:ThisYrType);
20 if rc1=0 then ThisYrMember=MemberType;
Gold members rc2=T.find(key:LastYrType);
Internet/Catal if rc2=0 then LastYrMember=MemberType;
30
og Customers run;
Partial PDV
Member Customer_ LastYr ThisYr
Code
Type Name ... Type Type
...
ThisYr LastYr
rc1 rc2 D _N_
Member Member
. . 1
22 ...
6-12 Chapter 6 Using DATA Step Hash and Hiter Objects
Reference Information
HASHEXP
In order to maximize the efficiency of the hash object lookup routines, you should set the hash table size
according to the amount of data in the hash object.
• The hash table is similar to an array of buckets. The number of buckets is 2 raised to the power of
HASHEXP, so HASHEXP=4 gives 16 buckets. This does not limit the hash table to 16 key values,
because each bucket can hold an unlimited number of keys.
• When the DATA step adds data to the hash object or retrieves data values from the hash table, the key is
passed to a hash function, which returns the number of the bucket in which to add data or from which
to retrieve data.
• If the number of key and data combinations is larger than the number of buckets, performance might be
reduced because more combinations will be stored per bucket in a binary tree structure.
• If the number of key and data combinations is smaller than the number of buckets, then some of the
buckets will be empty, which wastes memory.
Try different HASHEXP values until you obtain the best result. For example, if the hash object contains a
large number of items, a hash table size of 16 (hashexp=4) is not very efficient. A hash table size of 512
or 1024 (hashexp=9 or 10) results in better performance.
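As a minimal sketch of how a HASHEXP value is specified (this step is not one of the course programs), the following DATA step declares the membership lookup table with hashexp:10. The exponent 10 and the variable Items are chosen only for illustration; the best exponent depends on the number of items that you load.

```sas
data _null_;
   length Code $2 MemberType $40;
   /* hashexp:10 requests 2**10 = 1024 buckets (illustrative value) */
   declare hash T(hashexp:10);
   T.definekey('Code');
   T.definedata('MemberType');
   T.definedone();
   call missing(Code, MemberType);
   T.add(key:'10',data:'Orion Club members');
   T.add(key:'20',data:'Orion Club Gold members');
   T.add(key:'30',data:'Internet/Catalog Customers');
   /* NUM_ITEMS counts the stored items; it does not depend on the bucket count */
   Items=T.num_items;
   put Items=;
run;
```

With only three items, a table this large mostly holds empty buckets, which is the memory trade-off described above.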
If there is not enough memory in which to load the hash object, the load fails.
Several techniques can be used to determine the amount of memory required by the hash object.
• The size of a hash record is approximately the sum of the sizes of values being placed into the record.
For example, two million 64-byte records take approximately 128 MB. If the SAS system option
MEMSIZE= is set larger than 128 MB and the machine can support executing SAS with at least 128
MB of memory free for loading the hash object, the hash object loads successfully.
• Use the FULLSTIMER SAS system option to determine how much memory the hash object uses with
fewer records. For example, if you load approximately one-third of the records into the hash object,
you can multiply the amount of memory reported by FULLSTIMER by three to determine the
approximate amount of memory needed for the entire hash table. This is an estimate, because the
reported memory usage includes the memory needed to execute the non-hash object portions of the
DATA step.
• The maximum size of the hash object that you can load depends on the maximum amount of memory
addressable per CPU on your particular operating system. For instance, a 4-CPU computer with 8 GB
of memory might limit each CPU to 2 GB of memory. In this case, the maximum size of a hash object
would be less than 2 GB.
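The FULLSTIMER technique above can be sketched as follows. This is an illustration under assumptions, not a course program: the OBS= value of 1000 is an arbitrary sample size, and using a data set option inside the dataset argument assumes SAS 9.2 or later.

```sas
options fullstimer;   /* ask SAS to report memory usage for each step in the log */

data _null_;
   /* define the host variables in the PDV without reading any data */
   if 0 then set orion.supplier(keep=Supplier_ID Supplier_Name);
   /* load only a sample of the records (obs=1000 is an assumed sample size) */
   declare hash S(dataset:'orion.supplier(obs=1000)');
   S.definekey('Supplier_ID');
   S.definedata('Supplier_Name');
   S.definedone();
   stop;   /* nothing else to do; end the step after the load */
run;

options nofullstimer;
```

Read the memory figure for this step in the log and scale it by the ratio of total records to sampled records to get a rough estimate for the full load.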
Suggestions to avoid memory constraints include the following:
• Subset large data sets before loading the data into the hash object.
• Create a view of a large data set. The view should include syntax that limits the number of columns that
need to be read from the large data set into the hash table.
• Make the length of the hash record as small as possible. Numeric data is always stored as 8 bytes in
the hash record, so instead of storing the numeric values 1 and 2, store the character values
'1' and '2'.
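One possible way to apply the view suggestion is sketched below. The view name work.supplier_v, the output data set supplier_us, and the Country='US' subset are hypothetical choices for this example, and it assumes that the hash object can be loaded from a DATA step view in the same way as from a data set.

```sas
/* DATA step view that exposes only the rows and columns the lookup needs */
data work.supplier_v / view=work.supplier_v;
   set orion.supplier(keep=Supplier_ID Supplier_Name Country);
   where Country='US';   /* hypothetical subset */
run;

data supplier_us;
   /* define the host variables from the view without reading data */
   if 0 then set work.supplier_v;
   if _N_=1 then do;
      declare hash S(dataset:'work.supplier_v');
      S.definekey('Supplier_ID');
      S.definedata('Supplier_Name','Country');
      S.definedone();
   end;
   set orion.product_list;
   if S.find()=0;   /* keep only products whose supplier is in the hash object */
run;
```

Because the view keeps three columns and a subset of rows, the hash record is shorter and fewer items are held in memory.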
In SAS 9.2, you can use data set options in the DECLARE statement when you load the hash object from
a SAS data set.
Example:
declare hash T(dataset: 'orion.members(where=(Code=102) keep=Code Member_Type)');
OBJECT.METHOD(<arg_tag-1: value-1
<,…arg_tag-n: value-n>>);
Without the DEFINEDONE method, the log reports the following errors:
ERROR: Method defineDone must be called to complete initialization of hash object before line
189 column 7.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
Selected hash object methods available in SAS 9.1 include the following:
DEFINEKEY defines key variables for the hash object.
FIND searches the hash object for a key value, and returns a zero if successful. If the
key is in the hash object, then the FIND method also sets the data variable to
the value of the data item so that it is available for use after the method call.
OUTPUT outputs the hash object’s data values to a SAS data set.
REMOVE removes a key and its associated data from the hash object.
CHECK checks whether the specified key is stored in the hash object.
Additional hash object methods available in SAS 9.2 include the following:
CLEAR removes all items from the hash object without deleting the hash object instance.
FIND_NEXT sets the current list item to the next item in the current key's multiple item list and
sets the data for the corresponding data variables.
FIND_PREV sets the current list item to the previous item in the current key's multiple item list
and sets the data for the corresponding data variables.
HAS_NEXT determines whether there is a next item in the current key's multiple data item list.
HAS_PREV determines whether there is a previous item in the current key's multiple data item
list.
REF consolidates the FIND and ADD methods into a single method call.
REMOVEDUP removes the data that is associated with the specified key's current data item from
the hash object.
REPLACEDUP replaces the data that is associated with the current key's current data item with new
data.
SUM retrieves the summary value for a given key from the hash table and stores the value
in a DATA step variable.
SUMDUP retrieves the summary value for the current data item of the current key and stores
the value in a DATA step variable.
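A few of the SAS 9.1 methods listed above can be seen together in the short sketch below. It is an illustration only: the output data set name work.member_types is hypothetical, and Code is repeated in the DEFINEDATA call so that the OUTPUT method writes the key along with the description.

```sas
data _null_;
   length Code $2 MemberType $40;
   declare hash T();
   T.definekey('Code');
   T.definedata('Code','MemberType');   /* OUTPUT writes only the data items */
   T.definedone();
   call missing(Code, MemberType);
   T.add(key:'10',data:'10',data:'Orion Club members');
   T.add(key:'20',data:'20',data:'Orion Club Gold members');
   T.add(key:'30',data:'30',data:'Internet/Catalog Customers');

   rc=T.check(key:'20');    /* 0 if the key exists; data variables are not updated */
   put 'CHECK: ' rc=;

   rc=T.find(key:'20');     /* 0 if found; also copies the data into Code and MemberType */
   put 'FIND: ' rc= MemberType=;

   rc=T.remove(key:'30');   /* deletes the key and its data */
   put 'REMOVE: ' rc=;

   rc=T.output(dataset:'work.member_types');   /* hypothetical output data set */
run;
```

CHECK is the lighter choice when you only need to know whether a key exists, because it leaves the data variables untouched.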
6.04 Quiz
Why were the statements and methods that instantiate
and load the hash object inside an IF-THEN/DO group?
data mem_type;
length Code $2 MemberType $40;
if _N_=1 then do;
declare hash T();
T.definekey('Code');
T.definedata('MemberType');
T.definedone();
T.add(key:'10',data:'Orion Club members');
T.add(key:'20',data:'Orion Club Gold members');
T.add(key:'30',data:'Internet/Catalog Customers');
end;
set orion.europe_customers;
rc1=T.find(key:ThisYrType);
if rc1=0 then ThisYrMember=MemberType;
rc2=T.find(key:LastYrType);
if rc2=0 then LastYrMember=MemberType;
run;
Execution
set orion.europe_customers;   (first observation is read, _N_=1)

Partial PDV
Code  MemberType  Customer_Name   ...  LastYrType  ThisYrType
                  Cornelia Krahl  ...  20          20

rc1  ThisYrMember  rc2  LastYrMember  _N_ (D)
.                  .                  1
Execution
rc1=T.find(key:ThisYrType);   (the key value 20 is found, so rc1 is 0)

Partial PDV
Code  MemberType  Customer_Name   ...  LastYrType  ThisYrType
                  Cornelia Krahl  ...  20          20

rc1  ThisYrMember  rc2  LastYrMember  _N_ (D)
0                  .                  1
Execution
rc1=T.find(key:ThisYrType);   (the FIND method copies the data value into MemberType)

Partial PDV
Code  MemberType               Customer_Name   ...  LastYrType  ThisYrType
      Orion Club Gold members  Cornelia Krahl  ...  20          20

rc1  ThisYrMember  rc2  LastYrMember  _N_ (D)
0                  .                  1
Execution
if rc1=0 then ThisYrMember=MemberType;

Partial PDV
Code  MemberType               Customer_Name   ...  LastYrType  ThisYrType
      Orion Club Gold members  Cornelia Krahl  ...  20          20

rc1  ThisYrMember             rc2  LastYrMember  _N_ (D)
0    Orion Club Gold members  .                  1
Execution
rc2=T.find(key:LastYrType);
if rc2=0 then LastYrMember=MemberType;

Partial PDV
Code  MemberType               Customer_Name   ...  LastYrType  ThisYrType
      Orion Club Gold members  Cornelia Krahl  ...  20          20

rc1  ThisYrMember             rc2  LastYrMember             _N_ (D)
0    Orion Club Gold members  0    Orion Club Gold members  1
Execution
Initialize PDV.

Partial PDV
Code  MemberType  Customer_Name   ...  LastYrType  ThisYrType
                  Cornelia Krahl  ...  20          20

rc1  ThisYrMember  rc2  LastYrMember  _N_ (D)
.                  .                  2
Execution
The second observation is read and both lookups are performed (_N_=2).

Partial PDV
Code  MemberType          Customer_Name  ...  LastYrType  ThisYrType
      Orion Club members  Elke Wallstab  ...  10          20

rc1  ThisYrMember             rc2  LastYrMember        _N_ (D)
0    Orion Club Gold members  0    Orion Club members  2
Execution
Continue until EOF.

Partial PDV
Code  MemberType          Customer_Name  ...  LastYrType  ThisYrType
      Orion Club members  Ines Deisser   ...  10          20

rc1  ThisYrMember             rc2  LastYrMember        _N_ (D)
0    Orion Club Gold members  0    Orion Club members  10
Partial PDV
rc1  ThisYrMember             rc2          LastYrMember  _N_ (D)
0    Orion Club Gold members  -2147450842                8
6.06 Quiz
Submit the program p306a01 and examine the SAS log.
What are the notes about Code and MemberType?
6.07 Quiz
As the last statement in the DO group in the program
p306a01, add the statement:
call missing(Code, MemberType);
If the current length of the character variable is any value up to the maximum length, the current
length is not changed. Otherwise, if no length is set for the variable, the current length is set to 1.
KEY: keyvalue specifies the key value whose type must match the corresponding key
variable that is specified in a DEFINEKEY method call. The number of
KEY: keyvalue pairs depends on the number of key variables that you
define by using the DEFINEKEY method.
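To illustrate the rule that the number of KEY: arguments follows the number of key variables, here is a minimal sketch with a two-part key. The hash object ST, the variables Country, State, and State_Name, and the values are hypothetical and serve only as an example.

```sas
data _null_;
   length Country $2 State $3 State_Name $30;
   declare hash ST();
   ST.definekey('Country','State');    /* two key variables */
   ST.definedata('State_Name');
   ST.definedone();
   call missing(Country, State, State_Name);

   /* two KEY: arguments, in the same order as in DEFINEKEY */
   ST.add(key:'AU', key:'NSW', data:'New South Wales');

   /* the lookup must also supply both key values, both character to match the key variables */
   rc=ST.find(key:'AU', key:'NSW');
   if rc=0 then put State_Name=;
run;
```

Supplying only one KEY: argument for this object, or a numeric value where the key variable is character, would not match the definition made in the DEFINEKEY call.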
Exercises
Level 1
1. Using the ADD Method to Create a Hash Object with a Single Key
The following table shows the code that Orion Star uses for each type of order and the description of
the order:
1 Retail Sale
2 Catalog Sale
3 Internet Sale
Level 2
AU Australia
Employee_
Obs State_Name Country_Name ID Country
Level 3
3. Using the ADD Method and Creating a SAS Data Set from a Hash Object
The following table contains the continent ID, the location, and the name of the continent:
a. Write a DATA step to create a hash object from the values in the table.
b. After the hash object is created, use the OUTPUT method to create a SAS data set named
continents.
Hint: Consult the SAS OnlineDoc to determine how to use the OUTPUT method.
c. Print the continents data set.
PROC PRINT Output
continents Data Set
Continent_
Obs ID Continent_Name Location
1 96 Australia/Pacific South
2 95 Asia South
3 94 Africa South
4 93 Europe North
5 91 North America North
6.3 Loading a Hash Object with Data from a SAS Data Set
Objectives
Load a hash object from a SAS data set.
Use a hash object method to match records.
Business Scenario
The data set, orion.supplier, contains demographics
about the suppliers for the products.
Partial Listing of orion.supplier
Supplier_ID  Supplier_Name                Street_ID   Supplier_Address     Sup_Street_Number  Country
50           Scandinavian Clothing A/S    6850100389  Kr. Augusts Gate 13  13                 NO
109          Petterson AB                 8500100286  Blasieholmstorg 1    1                  SE
316          Prime Sports Ltd             9250103252  9 Carlisle Place     9                  GB
755          Top Sports                   3150108266  Jernbanegade 45      45                 DK
772          AllSeasons Outdoor Clothing  9260115819  553 Cliffview Dr     553                US
...
Business Scenario
You need to combine orion.supplier with the data set,
orion.product_list, which contains product information.
Partial Listing of orion.product_list
Product_ID    Product_Name                      Supplier_ID  Product_Level  Product_Ref_ID
210000000000  Children                          .            4              .
210100000000  Children Outdoors                 .            3              210000000000
210100100000  Outdoor things, Kids              .            2              210100000000
210200000000  Children Sports                   .            3              210000000000
210200100000  A-Team, Kids                      .            2              210200000000
210200100009  Kids Sweat Round Neck,Large Logo  3298         1              210200100000
...
The first five values of Supplier_ID are missing.
Execution

orion.product_list (obs=1)
Product_ID    Product_Name  Supplier_ID  Product_Level  Product_Ref_ID
210000000000  Children      .            4              .

Partial HASH Object S
KEY: Supplier_ID  DATA: Supplier_Name        DATA: Supplier_Address  DATA: Country
50                Scandinavian Clothing A/S  Kr. Augusts Gate 13     NO
109               Petterson AB               Blasieholmstorg 1       SE
316               Prime Sports Ltd           9 Carlisle Place        GB
...
3298              A Team Sports              2687 Julie Ann Ct       US

data supplier_info;
   drop rc;
   length Supplier_Name $ 40
          Supplier_Address $ 45
          Country $ 2;
   if _N_=1 then do;
      declare hash S(dataset:'orion.supplier');
      S.definekey('Supplier_ID');
      S.definedata('Supplier_Name',
                   'Supplier_Address',
                   'Country');
      S.definedone();
      call missing(Supplier_Name,
                   Supplier_Address,
                   Country);
   end;
   set orion.product_list;
   rc=S.find();
   if rc=0;
run;

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID  Product_Name  Supplier_ID  ...  rc (D)  _N_ (D)
.                         .                 .       1
Execution
set orion.product_list;   (first observation is read, _N_=1)

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID    Product_Name  Supplier_ID  ...  rc (D)  _N_ (D)
210000000000  Children      .                 .       1
Execution
rc=S.find();   (returns 2147450842: the missing Supplier_ID is not in the hash object)

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID    Product_Name  Supplier_ID  ...  rc (D)      _N_ (D)
210000000000  Children      .                 2147450842  1
Execution
if rc=0;   (False)

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID    Product_Name  Supplier_ID  ...  rc (D)      _N_ (D)
210000000000  Children      .                 2147450842  1
Execution
Continue until _N_=6.

orion.product_list (obs=6)
Product_ID    Product_Name                      Supplier_ID  Product_Level  Product_Ref_ID
210200100009  Kids Sweat Round Neck,Large Logo  3298         1              210200100000

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID    Product_Name                      Supplier_ID  ...  rc (D)  _N_ (D)
210200100009  Kids Sweat Round Neck,Large Logo  3298              .       6
Execution
rc=S.find();   (the key value 3298 is found, so rc is 0)

Partial PDV
Supplier_Name  Supplier_Address  Country

Product_ID    Product_Name                      Supplier_ID  ...  rc (D)  _N_ (D)
210200100009  Kids Sweat Round Neck,Large Logo  3298              0       6
Execution
rc=S.find();   (the FIND method copies the data values into the PDV)

Partial PDV
Supplier_Name  Supplier_Address   Country
A Team Sports  2687 Julie Ann Ct  US

Product_ID    Product_Name                      Supplier_ID  ...  rc (D)  _N_ (D)
210200100009  Kids Sweat Round Neck,Large Logo  3298              0       6
Execution
if rc=0;   (True)

Partial PDV
Supplier_Name  Supplier_Address   Country
A Team Sports  2687 Julie Ann Ct  US

Product_ID    Product_Name                      Supplier_ID  ...  rc (D)  _N_ (D)
210200100009  Kids Sweat Round Neck,Large Logo  3298              0       6
Execution
Implicit OUTPUT;
Implicit RETURN;

Partial PDV
Supplier_Name  Supplier_Address   Country
A Team Sports  2687 Julie Ann Ct  US

Product_ID    Product_Name                      Supplier_ID  ...  rc (D)  _N_ (D)
210200100009  Kids Sweat Round Neck,Large Logo  3298              0       6
Execution
Continue until EOF.

orion.product_list (obs=556)
Product_ID    Product_Name         Supplier_ID  Product_Level  Product_Ref_ID
240800200063  Top Equipe 99 Black  13198        1              210200100000

Partial PDV
Supplier_Name  Supplier_Address    Country
Twain Inc      1648 Bloodworth St  US

Product_ID    Product_Name         Supplier_ID  ...  rc (D)  _N_ (D)
240800200000  Top Equipe 99 Black  13198             0       556
Results
proc print data=supplier_info(obs=10);
var Product_ID Supplier_ID Supplier_Name
Supplier_Address Country;
title "Product Information";
run;
p306d02
Not Creating rc
The program created the variable rc and then dropped it.
How can you avoid creating the variable so that you
do not have to drop it?
data supplier_info;
length Supplier_Name $ 40 Supplier_Address $ 45
Country $ 2;
if _N_=1 then do;
declare hash S(dataset:'orion.supplier');
S.definekey('Supplier_ID');
S.definedata('Supplier_Name',
'Supplier_Address', 'Country');
S.definedone();
call missing(Supplier_Name,
Supplier_Address,
Country);
end;
set orion.product_list;
if S.find()=0;
run;
p306d02
6.09 Quiz
How do you know the lengths of the character variables
Supplier_Name, Supplier_Address, and Country?
Exercises
Level 1
Customer_ Customer_
Obs Type_ID Customer_Type Group_ID Customer_Group
The data set orion.customer contains the Customer_ID variable and the Customer_Type_ID
variable.
Partial Listing of orion.customer
Partial orion.customer
Customer_
Obs Customer_ID Type_ID
1 4 1020
2 5 2020
3 9 2020
4 10 1040
5 11 1040
6 12 1030
7 13 2010
8 16 3010
9 17 1030
10 18 1020
a. Write a DATA step to create a data set named customers that reads the variables Customer_ID
and Customer_Type_ID from the data set orion.customer.
b. Create a hash object and load it with the data from orion.customer_type. The key should be the
variable Customer_Type_ID, and the data item should be the variable Customer_Type.
c. Use the hash object to look up the Customer_Type description.
Customer_
Obs Customer_Type Customer_ID Type_ID
Level 2
1 210000000000 Children
2 210100000000 Children Outdoors
3 210100100000 Outdoor things, Kids
4 210200000000 Children Sports
5 210200100000 A-Team, Kids
Customer_
Obs Customer_ID Country Customer_Name
1 4 US James Kvarniq
2 5 US Sandrina Stephano
3 9 DE Cornelia Krahl
4 10 US Karen Ballinger
5 11 DE Elke Wallstab
Country_
Obs Country Name
1 AU Australia
2 CA Canada
3 DE Germany
4 IL Israel
5 TR Turkey
The data set orion.order_fact contains Customer_ID and information about the orders.
Partial Listing of orion.order_fact
Partial orion.order_fact
Order_ Total_Retail_
Obs Customer_ID Date Product_ID Quantity Price
a. Create a data set named billing that reads Customer_ID, Order_Date, Product_ID, Quantity,
and Total_Retail_Price from orion.order_fact.
b. Create a hash object from orion.product_list with the key Product_ID and the data
Product_Name.
c. Create a hash object from orion.customer_dim with the key Customer_ID and the data
Customer_Country and Customer_Name.
d. Create a hash object from orion.country with the key Country and the data Country_Name.
e. Use the three hash objects to look up Customer_Name, Country_Name, and Product_Name.
f. Sort the billing data set by Customer_ID and Product_ID and print the first five observations.
Partial PROC PRINT Output
Billing Information
Using a HASH Data Step Object
Customer_
Obs Customer_ID Customer_Name Country Country_Name Product_ID
Order_ Total_Retail_
Obs Product_Name Date Quantity Price
Level 3
6. Loading the Hash Object from a SAS Data Set and Retrieving Multiple Values
The data set orion.staff contains the employee ID and the manager ID for that employee.
Partial Listing of orion.staff
Partial orion.staff
Obs   Employee_ID   Start_Date   End_Date   Job_Title   Salary
Obs   Employee_ID   Employee_Name   Street_ID   Street_Number
Obs   Street_Name   City   State   Postal_Code   Country
The data set orion.employee_payroll has an employee ID and the salary for each employee.
Partial Listing of orion.employee_payroll
Partial orion.employee_payroll
a. Write a DATA step to create a data set named manager that reads the Employee_ID and Salary
variables from orion.employee_payroll.
b. Create hash objects from the data sets orion.employee_addresses and orion.staff.
c. Use the hash object from orion.staff to return the Manager_ID value for each Employee_ID in
orion.employee_payroll.
d. Use the hash object from orion.employee_addresses to retrieve the name of each employee
and the name of that employee's manager.
Obs   EmpName   ManagerName   Employee_ID   Salary   Manager_ID
Objectives
Define a hiter object
Investigate the methods for the hiter object
Write a DATA step using the hiter object.
6.4 Using the DATA Step Hiter Object 6-49
Business Scenario
The data set orion.order_fact contains the total retail price
of items that were ordered. You need to know the two
customers who ordered the most expensive items and the
two customers who ordered the least expensive items.
Partial Listing of orion.order_fact
Customer_ID   Employee_ID   Street_ID    . . .   Total_Retail_Price   CostPrice_Per_Unit   Discount
     63          121039     9260125492   . . .         $16.50               $7.45             .
      5        99999999     9260114570   . . .        $247.50             $109.55             .
     45        99999999     9260104847   . . .         $28.30               $8.55             .
     41          120174     1600101527   . . .         $32.00               $6.50             .
    183          120134     1600100760   . . .         $63.60               $8.80             .
      .               .              .                      .                   .             .
      .               .              .                      .                   .             .
      .               .              .                      .                   .             .
Hiter Object
data top bottom;
   drop i;
   /* Compile time only: add the hash variables to the PDV */
   if 0 then set orion.order_fact(keep=Customer_ID Product_ID
                                       Total_Retail_Price);
   if _N_=1 then do;
      /* Load the hash object in descending key order */
      declare hash Customer(dataset:'orion.order_fact',
                            ordered:'descending');
      customer.definekey('Total_Retail_Price', 'Customer_ID');
      customer.definedata('Total_Retail_Price', 'Customer_ID',
                          'Product_ID');
      customer.definedone();
      declare hiter C('customer');
   end;
   /* Two most expensive items: start at the first item, move forward */
   C.first();
   do i=1 to 2;
      output top;
      C.next();
   end;
   /* Two least expensive items: start at the last item, move backward */
   C.last();
   do i=1 to 2;
      output bottom;
      C.prev();
   end;
   stop;
run;
p306d04
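The program above calls FIRST and NEXT a fixed number of times. When the number of items is not known in advance, the return codes of the hiter methods can drive the loop instead. The following is a minimal sketch, not part of the course program: the data set work.sales and the variables Cust and Amount are hypothetical names that are created here only for illustration.

```sas
/* Hypothetical sample data (not from the orion library) */
data work.sales;
   input Cust Amount;
   datalines;
101 25.50
102 310.00
103 7.25
104 99.90
;
run;

data work.ranked;
   drop rc;
   /* Compile time only: define Cust and Amount in the PDV */
   if 0 then set work.sales;
   declare hash h(dataset:'work.sales', ordered:'descending');
   h.definekey('Amount', 'Cust');
   h.definedata('Amount', 'Cust');
   h.definedone();
   declare hiter it('h');

   rc=it.first();            /* 0 means that an item was retrieved */
   do while (rc=0);
      output;                /* write the current item             */
      rc=it.next();          /* nonzero after the last item        */
   end;
   stop;                     /* the SET statement never executes   */
run;
```

Under these assumptions, work.ranked holds the four sample rows from the largest Amount to the smallest, because the hiter object traverses the hash object in key order.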
Execution
Partial Hash Object customer

KEY:                 KEY:          DATA:                DATA:         DATA:
Total_Retail_Price   Customer_ID   Total_Retail_Price   Customer_ID   Product_ID
       16.50              63              16.50              63       220101300017
      247.50               5             247.50               5       230100500026
       28.30              45              28.30              45       240600100080
       32.00              41              32.00              41       240600100010
           .               .                  .               .                  .
           .               .                  .               .                  .
           .               .                  .               .                  .
       95.10              10              95.10              10       240500200016
       48.20              10              48.20              10       240500200122
       75.20              89              75.20              89       240700200018
       33.80               5              33.80               5       220101400130

Notice the unordered hash object.

PDV (D marks a variable that is dropped from the output data sets)
Customer_ID   Product_ID   Total_Retail_Price   i (D)   _N_ (D)
     .             .                .              .        1
Execution
Hiter C View of Partial Hash Object customer

KEY:                 KEY:          DATA:                DATA:         DATA:
Total_Retail_Price   Customer_ID   Total_Retail_Price   Customer_ID   Product_ID
     1937.20           70100            1937.20           70100       240200100173
     1796.00              79            1796.00              79       240200100076
     1687.50              16            1687.50              16       230100700009
     1561.80             183            1561.80             183       240300300090
           .               .                  .               .                  .
           .               .                  .               .                  .
           .               .                  .               .                  .
        3.20              69               3.20              69       230100500004
        3.00               5               3.00               5       240100100433

This is the hiter object's view.
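The effect of the ORDERED argument can be checked without a hiter object by writing the hash object to a data set with the OUTPUT method. The following is a minimal sketch under stated assumptions: work.sales, work.by_amount, Cust, and Amount are hypothetical names that are used only for this illustration.

```sas
/* Hypothetical sample data (not from the orion library) */
data work.sales;
   input Cust Amount;
   datalines;
101 25.50
102 310.00
103 7.25
104 99.90
;
run;

data _null_;
   /* Compile time only: define Cust and Amount in the PDV */
   if 0 then set work.sales;
   declare hash h(dataset:'work.sales', ordered:'ascending');
   h.definekey('Amount');
   h.definedata('Cust', 'Amount');
   h.definedone();
   /* Write the data items to a data set in key order */
   h.output(dataset:'work.by_amount');
   stop;
run;

proc print data=work.by_amount;
run;
```

With ordered:'ascending', the rows of work.by_amount are expected in increasing Amount order. Changing the value to 'descending' should reverse them, which matches the view that the hiter object presents in the slide above.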
Execution
Hiter C View of Partial Hash Object customer

KEY:                 KEY:          DATA:                DATA:         DATA:
Total_Retail_Price   Customer_ID   Total_Retail_Price   Customer_ID   Product_ID
     1937.20           70100            1937.20           70100       240200100173
     1796.00              79            1796.00              79       240200100076
     1687.50              16            1687.50              16       230100700009
     1561.80             183            1561.80             183       240300300090
           .               .                  .               .                  .
        3.20              69               3.20              69       230100500004
        3.00               5               3.00               5       240100100433
        2.70           11171               2.70           11171       240200100021
        2.60              79               2.60              79       230100500045

PDV after each statement of the first DO loop (D marks a variable that is dropped from the output data sets)

Statement         Customer_ID   Product_ID     Total_Retail_Price   i (D)   _N_ (D)   Action
C.first();           70100      240200100173        1937.20           .        1
do i=1 to 2;         70100      240200100173        1937.20           1        1
output top;          70100      240200100173        1937.20           1        1      Output current observation.
C.next();               79      240200100076        1796.00           1        1
end;                    79      240200100076        1796.00           2        1
do i=1 to 2;            79      240200100076        1796.00           2        1
output top;             79      240200100076        1796.00           2        1      Output current observation.
C.next();               16      230100700009        1687.50           2        1
end;                    16      230100700009        1687.50           3        1
do i=1 to 2;            16      230100700009        1687.50           3        1      Exit the DO loop.
Execution
Hiter C View of Partial Hash Object customer

KEY:                 KEY:          DATA:                DATA:         DATA:
Total_Retail_Price   Customer_ID   Total_Retail_Price   Customer_ID   Product_ID
     1937.20           70100            1937.20           70100       240200100173
     1796.00              79            1796.00              79       240200100076
     1687.50              16            1687.50              16       230100700009
     1561.80             183            1561.80             183       240300300090
           .               .                  .               .                  .
        3.20              69               3.20              69       230100500004
        3.00               5               3.00               5       240100100433
        2.70           11171               2.70           11171       240200100021
        2.60              79               2.60              79       230100500045

PDV after each statement of the second DO loop (D marks a variable that is dropped from the output data sets)

Statement         Customer_ID   Product_ID     Total_Retail_Price   i (D)   _N_ (D)   Action
C.last();               79      230100500045           2.60           3        1
do i=1 to 2;            79      230100500045           2.60           1        1
output bottom;          79      230100500045           2.60           1        1      Output current observation.
C.prev();            11171      240200100021           2.70           1        1
end;                 11171      240200100021           2.70           2        1
do i=1 to 2;         11171      240200100021           2.70           2        1
output bottom;       11171      240200100021           2.70           2        1      Output current observation.
C.prev();                5      240100100433           3.00           2        1
end;                     5      240100100433           3.00           3        1
do i=1 to 2;             5      240100100433           3.00           3        1      Exit the DO loop.
stop;                    5      240100100433           3.00           3        1

The STOP statement prevents the following note in the log:
NOTE: DATA STEP stopped due to looping.
p306d04
proc print data=top;
   title 'Top 2 Big Spenders';
run;

Obs   Customer_ID   Product_ID   Total_Retail_Price

Obs   Customer_ID   Product_ID     Total_Retail_Price
  1        79       230100500045         $2.60
  2     11171       240200100021         $2.70
STOP;
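The role of the STOP statement can be seen in a much smaller step. The following is a minimal sketch, not taken from the course programs: work.one, work.demo, x, and y are hypothetical names. The SET statement sits behind IF 0, so it only defines variables at compile time and never reads an observation. SAS therefore never reaches the end of the input data, and the step needs an explicit way to end.

```sas
/* Hypothetical one-row data set used only to supply a variable definition */
data work.one;
   x=1;
run;

data work.demo;
   if 0 then set work.one;   /* never executes: defines x in the PDV only */
   y=2;
   output;                   /* write one observation explicitly          */
   stop;                     /* end the step after the first iteration    */
run;
```

If the STOP statement is removed, SAS is expected to end the step on its own and write the "DATA STEP stopped due to looping" note to the log, as described for p306d04.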
Exercises
Level 1
Obs   Product_ID   Product_Name   Total_Retail_Price
Listing of least_expensive
The Five Least Expensive Shoes
Obs   Product_ID   Product_Name   Total_Retail_Price
Level 2
Obs   Product_ID   Product_Name   Total_Retail_Price   Rank
Level 3
Obs   Customer_ID   Order_Type
1 4 1
2 4 3
3 5 1
4 5 2
5 5 3
6 9 3
7 10 1
8 10 2
9 11 3
10 12 1
6.5 Using a Hash Object for Chained Lookups (Self-Study) 6-67
Objectives
Define a chained lookup.
Use a hash object to perform a chained lookup.
Example 1
proc sort data=orion.multiple_orders
          out=multiple_orders;
   by Customer_ID;
run;

data multiple_orders;
   set multiple_orders;
   rename Order_Date=OD;
   ObsNum=_N_;
run;
p306d05
Example 1
data lookup;
   format Next_Order_Date date9.;
   keep Customer_ID Product_ID Order_Date
        Next_Order_Date;
   if _N_=1 then do;
      /* Key each order date by its observation number */
      declare hash LU(dataset: "multiple_orders");
      LU.definekey('ObsNum');
      LU.definedata('OD');
      LU.definedone();
      call missing(OD);      /* defines OD in the PDV */
   end;
   set multiple_orders(rename=(OD=Order_Date));
   by Customer_ID;
   Obs=ObsNum + 1;           /* key of the following observation */
   rc=LU.find(key:Obs);
   if rc=0 then Next_Order_Date=OD;
   /* The last order for a customer has no next order */
   if last.Customer_ID then Next_Order_Date=.;
run;
p306d05
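The same look-ahead technique can be tried on a few rows without the orion library. The following is a minimal sketch under stated assumptions: work.orders, work.next_order, and the five sample rows (including customer 20) are hypothetical and exist only for this illustration. The variable names OD, ObsNum, Obs, and rc follow Example 1.

```sas
/* Hypothetical sample data, already in Customer_ID order */
data work.orders;
   input Customer_ID OD :date9.;
   format OD date9.;
   ObsNum=_N_;               /* position of each observation */
   datalines;
16 27AUG2006
16 28AUG2006
16 30AUG2006
20 07APR2007
20 08APR2007
;
run;

data work.next_order;
   format Next_Order_Date date9.;
   keep Customer_ID Order_Date Next_Order_Date;
   if _N_=1 then do;
      declare hash LU(dataset:'work.orders');
      LU.definekey('ObsNum');
      LU.definedata('OD');
      LU.definedone();
      call missing(OD);      /* defines OD in the PDV */
   end;
   set work.orders(rename=(OD=Order_Date));
   by Customer_ID;
   Obs=ObsNum+1;             /* key of the following observation */
   rc=LU.find(key:Obs);
   /* Accept the next date only when it belongs to the same customer */
   if rc=0 and not last.Customer_ID then Next_Order_Date=OD;
   else Next_Order_Date=.;
run;
```

For these sample rows, the first two orders of customer 16 and the first order of customer 20 should receive the date of the following row, and the last order of each customer should have a missing Next_Order_Date. Folding the LAST. test into one IF statement is a design choice of this sketch; Example 1 uses two separate statements for the same result.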
6.10 Quiz
What is the purpose of the BY statement in the DATA step?
Execution
Partial Hash Object LU

ObsNum   OD
   1     27AUG2006
   2     28AUG2006
   3     30AUG2006
   4     07APR2007
   5     08APR2007
   .     .
   .     .
   .     .

PDV after each step of the first iteration (D marks a variable that is dropped from the output data set)

Step                                          Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)  First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
Initial PDV                                   .                .          .            .             .           .           1                      1                     .        .       1
set multiple_orders(rename=(OD=Order_Date));  .                .          16           220200100035  27AUG2006   1           1                      0                     .        .       1
Obs=ObsNum+1;                                 .                .          16           220200100035  27AUG2006   1           1                      0                     2        .       1
rc=LU.find(key:Obs);                          .                28AUG2006  16           220200100035  27AUG2006   1           1                      0                     2        0       1
if rc=0 then Next_Order_Date=OD;              28AUG2006        28AUG2006  16           220200100035  27AUG2006   1           1                      0                     2        0       1
if last.Customer_ID ... (False)               28AUG2006        28AUG2006  16           220200100035  27AUG2006   1           1                      0                     2        0       1
run; (Implicit OUTPUT; Implicit RETURN;)      28AUG2006        28AUG2006  16           220200100035  27AUG2006   1           1                      0                     2        0       1

The SET statement reads from multiple_orders.sas7bdat.
Execution
Partial Hash Object LU

ObsNum   OD
   1     27AUG2006
   2     28AUG2006
   3     30AUG2006
   4     07APR2007
   5     08APR2007
   .     .
   .     .
   .     .

PDV after each step of the second iteration (D marks a variable that is dropped from the output data set)

Step                                          Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)  First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
set ...; by Customer_ID; Obs=ObsNum+1;        .                .          16           220200100035  28AUG2006   2           0                      0                     3        .       2
rc=LU.find(key:Obs);                          .                30AUG2006  16           220200100035  28AUG2006   2           0                      0                     3        0       2
if rc=0 then Next_Order_Date=OD;              30AUG2006        30AUG2006  16           220200100035  28AUG2006   2           0                      0                     3        0       2
if last.Customer_ID ... (False)               30AUG2006        30AUG2006  16           220200100035  28AUG2006   2           0                      0                     3        0       2
run; (Implicit OUTPUT; Implicit RETURN;)      30AUG2006        30AUG2006  16           220200100035  28AUG2006   2           0                      0                     3        0       2
Execution
data lookup;
Partial Hash Object LU format Next_Order_Date date9.;
ObsNum OD keep Customer_ID Product_ID Order_Date
Next_Order_Date;
1 27AUG2006 if _N_=1 then do;
declare hash LU(dataset: "multiple_orders");
2 28AUG2006 LU.definekey('ObsNum');
LU.definedata('OD');
3 30AUG2006 LU.definedone();
call missing(OD);
4 07APR2007 end;
set multiple_orders(rename=(OD=Order_Date));
5 08APR2007 by Customer_ID;
. . Obs=ObsNum+1;
rc=LU.find(key:Obs);
. . if rc=0 then Next_Order_Date=OD;
if last.Customer_ID then Next_Order_Date=.;
. . run;
PDV
Next_
Customer Product_ Order_
Order_ OD D ObsNum
_ ID ID Date
Date
...
. . 16 220200100035 30AUG2006 3
First. Last.
D D D Obs D rc D _N_
Customer_ID Customer_ID
0 1 4 . 3
166 ...
Execution
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
.                07APR2007  16           220200100035  30AUG2006   3

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
0                      1                     4        0       3
6-78 Chapter 6 Using DATA Step Hash and Hiter Objects
Execution
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
07APR2007        07APR2007  16           220200100035  30AUG2006   3

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
0                      1                     4        0       3
Execution
True (Last.Customer_ID=1)
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
07APR2007        07APR2007  16           220200100035  30AUG2006   3

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
0                      1                     4        0       3
Execution
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
.                07APR2007  16           220200100035  30AUG2006   3

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
0                      1                     4        0       3
Execution
Implicit OUTPUT; Implicit RETURN;
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
.                07APR2007  16           220200100035  30AUG2006   3

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)  _N_ (D)
0                      1                     4        0       3
Execution
Continue until EOF.
PDV (D = dropped from the output data set)
Next_Order_Date  OD (D)     Customer_ID  Product_ID    Order_Date  ObsNum (D)
.                .          70165        240200100050  19SEP2007   32

First.Customer_ID (D)  Last.Customer_ID (D)  Obs (D)  rc (D)       _N_ (D)
0                      1                     33       -2147450842  32
Chained Lookup
proc print data=lookup(obs=10);
var Customer_ID Order_Date Next_Order_Date;
title 'Chained Lookup Example';
run;
PROC PRINT Output
Chained Lookup Example

Obs  Customer_ID  Order_Date  Next_Order_Date
  1           16  27AUG2006   28AUG2006
  2           16  28AUG2006   30AUG2006
  3           16  30AUG2006   .
  4           49  07APR2007   08APR2007
  5           49  08APR2007   10APR2007
  6           49  10APR2007   11APR2007
  7           49  11APR2007   .
  8           79  27SEP2007   30SEP2007
  9           79  30SEP2007   01OCT2007
 10           79  01OCT2007   .
p306d05
Partial orion.multiple_orders
Customer_ID Product_ID Order_Date
16 220200100035 27AUG2006
16 220200100035 28AUG2006
16 220200100035 30AUG2006
49 210201000126 07APR2007
. . .
. . .
. . .
Listing of lookup
All_Dates                                    Customer_ID  Product_ID
27AUG2006, 28AUG2006, 30AUG2006              16           220200100035
07APR2007, 08APR2007, 10APR2007, 11APR2007   49           210201000126
27SEP2007, 30SEP2007, 01OCT2007              79           240500100057
31AUG2007, 05SEP2007, 08SEP2007,             171          230100500004
10SEP2007, 11SEP2007, 13SEP2007, 14SEP2007
29JAN2007, 01FEB2007                         2806         240100400058
22JUL2007, 25JUL2007, 26JUL2007,             70108        240200200071
28JUL2007, 30JUL2007, 01AUG2007,
02AUG2007, 05AUG2007
08SEP2007, 10SEP2007, 16SEP2007,             70165        240200100050
18SEP2007, 19SEP2007
p306d06
proc sort data=orion.multiple_orders out=multiple_orders;
   by Customer_ID;
run;

data multiple_orders;
   set multiple_orders;
   rename Order_Date=OD;
   ObsNum=_N_;
run;

data lookup;
   length All_Dates $200;
   keep Customer_ID Product_ID All_Dates;
   if _N_=1 then do;
      declare hash LU(dataset: "multiple_orders");
      LU.definekey('ObsNum', 'Customer_ID');
      LU.definedata('OD');
      LU.definedone();
      call missing(OD);
   end;
   do until (Last);
      set multiple_orders(rename=(OD=Order_Date)) end=Last;
      by Customer_ID;
      if first.Customer_ID then All_Dates=put(Order_Date, date9.);
      Obs=ObsNum + 1;
      rc=LU.find(key:Obs, key:Customer_ID);
      if rc=0 then
         All_Dates=catx(', ', All_Dates, put(OD, date9.));
      else output;
   end;
run;
Obs  Customer_ID
  1           16
  2           49
  3           79
  4          171
  5         2806
  6        70108
  7        70165

Obs  All_Dates
This problem can also be solved using FIRST. and LAST. processing in the DATA step.
p306d07
proc sort data=orion.multiple_orders out=multiple_orders;
   by Customer_ID;
run;

data lookup;
   retain All_Dates;
   length All_Dates $200;
   keep Customer_ID Product_ID All_Dates;
   set multiple_orders;
   by Customer_ID;
   /* Start a new list at the first order for each customer; */
   /* otherwise append the current order date to the list.   */
   if first.Customer_ID then All_Dates=put(Order_Date, date9.);
   else All_Dates=catx(', ', All_Dates, put(Order_Date, date9.));
   /* Write one observation per customer. */
   if last.Customer_ID then output;
run;
title;
Exercises
Level 1
Level 2
Customer   Customer
Country    List
------------------------------------------------------------------------
US         4, 5, 10, 12, 17, 18, 20, 23, 24, 27, 31, 34, 36,
           39, 45, 49, 52, 56, 60, 63, 69, 71, 75, 79, 88,
           89, 90, 92
Level 3
b. Write a PROC REPORT step to display the data set suppliers. Ensure that the entire lists for
All_Products and All_Names are printed.
Partial Listing of suppliers
Supplier Product List
              Product        Names of
Supplier      List           Products
------------------------------------------------------------------------------------
Chapter Review
1. Describe a hash object.
Chapter Review
6. What is the purpose of the FIND method?
Chapter Review
9. Why are the DECLARE, DEFINEKEY, DEFINEDATA,
and DEFINEDONE methods executed in the IF _N_=1
THEN/DO group?
6.7 Solutions
Solutions to Exercises
1. Using the ADD Method to Create a Hash Object with a Single Key
a. Write a DATA step that creates a data set named orders.
b. Use the ADD method to create a hash table containing the values of the Order_Type as the key
values and the corresponding Sale_Type as the data values. Use the FIND method to retrieve sale
type based on the variable Order_Type in the data set orion.orders.
c. Keep only the variables Order_ID, Order_Type, and Sale_Type.
d. Print the first five observations from the orders data set.
p306s01
data orders;
   length Sale_Type $40;
   keep Order_ID Order_Type Sale_Type;
   if _N_=1 then do;
      declare hash Product();
      Product.definekey('Order_Type');
      Product.definedata('Sale_Type');
      Product.definedone();
      Product.add(key:1, data:'Retail Sale');
      Product.add(key:2, data:'Catalog Sale');
      Product.add(key:3, data:'Internet Sale');
      call missing(Sale_Type);
   end;
   set orion.orders;
   rc=Product.find();
   if rc=0;
run;
title;
4. Loading the Hash Object from a SAS Data Set
a. Write a DATA step to create a data set named customers that reads the variables Customer_ID
and Customer_Type_ID from the data set orion.customer.
b. Create a hash object and load it with the data from orion.customer_type. The key should be the
variable Customer_Type_ID, and the data item should be the variable Customer_Type.
c. Use the hash object to look up the Customer_Type description.
d. Print the first 10 observations of the customers data set.
p306s04
data customers;
   length Customer_Type $40;
   keep Customer_ID Customer_Type_ID Customer_Type;
   if _N_=1 then do;
      declare hash Customer(dataset:'orion.customer_type');
      Customer.definekey('Customer_Type_ID');
      Customer.definedata('Customer_Type');
      Customer.definedone();
      call missing(Customer_Type);
   end;
   set orion.customer;
   if Customer.find()=0;
run;

/* alternate solution */
data customers;
   keep Customer_ID Customer_Type_ID Customer_Type;
   if 0 then set orion.customer_type(keep=Customer_Type_ID
                                          Customer_Type);
   if _N_=1 then do;
      declare hash Customer(dataset:'orion.customer_type');
      Customer.definekey('Customer_Type_ID');
      Customer.definedata('Customer_Type');
      Customer.definedone();
      call missing(Customer_Type);
   end;
   set orion.customer;
   if Customer.find()=0;
run;
S.first();
do i=1 to 5;
output expensive;
S.next();
end;
S.last();
do i=1 to 5;
output least_expensive;
S.prev();
end;
stop;
run;
b. Print each of the data sets.
p306s07
proc print data=expensive;
title "The Five Most Expensive Shoes";
run;
S.first();
do i=1 to 5;
Rank=catx(' ', 'Top', i);
output;
S.next();
end;
S.last();
do i=1 to 5;
Rank=catx(' ', 'Bottom', i);
output;
S.prev();
end;
stop;
run;
data order_fact;
   set order_fact;
   rename Product_ID=PID Total_Retail_Price=TRP;
   ObsNum=_N_;
run;

data next_products;
   keep Customer_ID Product_ID Total_Retail_Price
        Next_Product_ID Next_Price;
   if _N_=1 then do;
      declare hash Lu(dataset: "order_fact");
      Lu.definekey('ObsNum');
      Lu.definedata('PID', 'TRP');
      Lu.definedone();
      call missing(PID, TRP);
   end;
   set order_fact(rename=(PID=Product_ID
                          TRP=Total_Retail_Price));
   by Customer_ID;
   Obs=ObsNum + 1;
   rc=Lu.find(key:Obs);
   if rc=0 then do;
      Next_Product_ID=PID;
      Next_Price=TRP;
   end;
   if last.Customer_ID then do;
      Next_Product_ID=.;
      Next_Price=.;
   end;
run;
data customers;
   set customers;
   ObsNum=_N_;
run;

data customer_list;
   length All_Customers $500;
   if _N_=1 then do;
      declare hash Lu(dataset: "customers");
      Lu.definekey('ObsNum','Country');
      Lu.definedata('Country','Customer_ID');
      Lu.definedone();
   end;
   do until (Last);
      set customers end=Last;
      by Country;
      if first.Country then All_Customers=Customer_ID;
      Obs=ObsNum + 1;
      rc=Lu.find(key:Obs, key:Country);
      if rc=0 then
         All_Customers=catx(', ', All_Customers, Customer_ID);
      else output;
   end;
run;
b. Open and submit the program p306e11 that contains a PROC REPORT step.
p306e11
proc report data=customer_list nowd headline headskip;
   column Country All_Customers;
   define Country / width=20 order 'Customer/Country';
   define All_Customers / width=50 flow 'Customer/List';
   break after Country / skip;
run;
data product_dim;
   set product_dim;
   ObsNum=_N_;
run;

data suppliers;
   length All_Products $500 All_Names $750;
   if _N_=1 then do;
      declare hash Lu(dataset: "product_dim");
      Lu.definekey('ObsNum', 'Supplier_ID');
      Lu.definedata('Supplier_Name','Product_ID','Product_Name');
      Lu.definedone();
   end;
   do until (Last);
      set product_dim end=Last;
      by Supplier_ID;
      if first.Supplier_ID then do;
         All_Products=Product_ID;
         All_Names=Product_Name;
      end;
      Obs=ObsNum + 1;
      rc=Lu.find(key:Obs, key:Supplier_ID);
      if rc=0 then do;
         All_Products=catx(', ', All_Products, Product_ID);
         All_Names=catx(', ', All_Names, Product_Name);
      end;
      else output;
   end;
run;
b. Write a PROC REPORT step to display the data set suppliers. Ensure that the entire lists for
All_Products and All_Names are printed.
proc report data=suppliers nowd headline headskip ls=132;
   column Supplier_Name All_Products All_Names;
   define Supplier_Name / width=30 order 'Supplier';
   define All_Products / width=30 flow 'Product/List';
   define All_Names / width=50 flow 'Names of/Products';
   break after Supplier_Name / skip;
run;
Chapter 7 Creating and Using Formats
Objectives
Create permanent formats.
Access permanent formats.
Create formats from SAS data sets.
Maintain formats.
Use formats as lookup tables.
7-4 Chapter 7 Creating and Using Formats
Business Scenario
The data set orion.country contains the country code
and the country name. Create a format from this data set.
Listing of orion.country
Country  Country_Name  Population  Country_ID  Continent_ID  Country_FormerName
AU       Australia     20,000,000  160         96
CA       Canada        .           260         91
DE       Germany       80,000,000  394         93            East/West Germany
IL       Israel        5,000,000   475         95
7.1 Using Formats as Lookup Tables 7-5
p307d01
/* Step 1 */
/* Make a CNTLIN data set containing */
/* the variables FMTNAME, START, and */
/* LABEL. */
data country;
   keep Start Label FmtName;
   retain FmtName '$country';
   set orion.country(rename=(Country=Start
                             Country_Name=Label));
run;

proc print data=country noobs;
   title 'Country';
run;
/* Step 2 */
/* Use the data set COUNTRY to */
/* make the format $country. */
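Step 2 can be sketched as follows. This is a minimal sketch rather than the course program itself: it assumes that the orion libref is already assigned and that the format is stored in the orion.MyFmts catalog, which is the catalog these notes use when documenting $country.

```sas
/* Sketch of Step 2 (assumes the orion libref is assigned).       */
/* CNTLIN= reads the control data set COUNTRY built in Step 1     */
/* and creates the $country format in the orion.MyFmts catalog.   */
proc format library=orion.MyFmts cntlin=country;
run;
```

Because FmtName begins with a dollar sign, PROC FORMAT creates a character format from the Start and Label values.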
Nesting Formats
In the VALUE statement, you can specify that the format
use a second format as the formatted value.
value=[existing-format]
p307d01
Avoid nesting formats for more than one level. The resource requirements can increase
dramatically with each additional level.
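The bracket syntax above can be illustrated with a small sketch. The format name $extra and the 'Unknown' label are illustrative choices, and the sketch assumes that $country already exists in a catalog that SAS searches for formats.

```sas
/* Sketch of one level of nesting (illustrative names and labels). */
/* Blank values are labeled directly; every other value is passed  */
/* to the existing $country format with a width of 30.             */
proc format;
   value $extra
      ' '   = 'Unknown'
      other = [$country30.];
run;
```

Only one level is nested here, in line with the advice above to avoid deeper nesting.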
SAS Catalogs
work.formats orion.formats orion.MyFmts
PROC FORMAT;
Documenting Formats
You can use the SAS Explorer Window to view the
formats stored in a catalog.
Documenting Formats
The CATALOG procedure manages entries in
SAS catalogs.
Selected capabilities of PROC CATALOG include the
following:
creating a listing of the contents of a catalog
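A catalog listing of this kind can be requested with the CONTENTS statement. The step below is a minimal sketch that assumes the orion.MyFmts catalog exists.

```sas
/* Sketch: list the entries stored in the orion.MyFmts catalog. */
proc catalog catalog=orion.MyFmts;
   contents;
run;
quit;
```

PROC CATALOG supports run-group processing, so the QUIT statement ends the procedure.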
Output
Contents of Catalog ORION.MYFMTS
p307d01
Documenting Formats
You can use the FMTLIB option in the PROC FORMAT
statement to document the format.
proc format library=orion.MyFmts fmtlib;
select $country;
run;
+--------------------------------------------------------------------------+
| FORMAT NAME: $COUNTRY LENGTH: 13 NUMBER OF VALUES: 7                     |
| MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH 13 FUZZ: 0                   |
+----------------+----------------+----------------------------------------+
|START           |END             |LABEL (VER. V7|V8 05MAY2009:12:34:42)   |
+----------------+----------------+----------------------------------------+
|AU              |AU              |Australia                               |
|CA              |CA              |Canada                                  |
|DE              |DE              |Germany                                 |
|IL              |IL              |Israel                                  |
|TR              |TR              |Turkey                                  |
|US              |US              |United States                           |
|ZA              |ZA              |South Africa                            |
+----------------+----------------+----------------------------------------+
p307d01
You can use either the SELECT or EXCLUDE statement to process specific formats rather than
an entire catalog.
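As a counterpart to the SELECT example, the following sketch uses EXCLUDE to document every format in the catalog except $country. It assumes that orion.MyFmts contains other formats in addition to $country.

```sas
/* Sketch: document all formats in orion.MyFmts except $country. */
proc format library=orion.MyFmts fmtlib;
   exclude $country;
run;
```

As in the SELECT statement, character format names are written with the leading dollar sign.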
Using Formats
You can reference formats in any of the following:
FORMAT statements
PUT statements
Using Formats
When a format is referenced, SAS does the following:
loads the format from the catalog entry into memory
7.02 Quiz
Submit the program p307a01.
What error messages do you see in the SAS log?
data customers;
set orion.customer;
Country_Name=put(Country,$country.);
run;
7.03 Quiz
1. Add the following OPTIONS statement to p307a01
and resubmit the program. What is the result?
options nofmterr;
FMTERR specifies that when SAS cannot find a specified variable format, it
generates an error message and does not allow default substitution to occur.
NOFMTERR replaces missing formats with the w. or $w. default format, issues a note,
and continues processing.
work.formats
library.formats
orion.formats
orion.MyFmts
Because orion is a libref without a catalog name, formats is assumed to be the catalog name.
SAS supplied formats are always searched first. The work.formats catalog is always searched second,
unless it appears in the FMTSEARCH list. If the library libref is assigned, the library.formats catalog is
searched after work.formats and before anything else in the FMTSEARCH list, unless it appears in the
list. To assign the library libref, use the code shown below:
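A minimal sketch of this setup follows. The directory path is a placeholder (an assumption, not a location from the course data), and the FMTSEARCH= list is the one implied by the search order shown above.

```sas
/* Hypothetical path: replace it with the directory that holds */
/* the formats catalog that you want SAS to search.            */
libname library 'path-to-your-format-library';

/* Search orion.formats, then orion.MyFmts, after work.formats */
/* and library.formats.                                        */
options fmtsearch=(orion orion.MyFmts);
```

With these settings, a reference such as $country. is resolved by searching the catalogs in the order listed.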
Maintaining Formats
To maintain formats, perform one of the following tasks:
Edit the PROC FORMAT code that created the original
format.
Create a SAS data set from the format, edit the data
set, and use the CNTLIN= option to re-create the
format.
Step 2
SAS Edit
Data Set Values
Step 3
proc format library=libref.catalog
cntlin=SAS-data-set;
run;
When the data set created by the CNTLOUT= option will be used as a CNTLIN= data set in a
subsequent FORMAT procedure step, the minimum variables that must be included are START,
END, FMTNAME, and LABEL.
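The first step of this maintenance cycle can be sketched as follows. The sketch assumes that $country is stored in orion.MyFmts and uses countryfmt as the output data set name, matching the data set that is edited in the PROC SQL step in these notes.

```sas
/* Sketch: write the definition of $country to a control data set. */
/* SELECT limits the output to the one format being maintained.    */
proc format library=orion.MyFmts cntlout=countryfmt;
   select $country;
run;
```

The resulting data set contains START, END, FMTNAME, and LABEL along with the other control variables, so it can be edited and read back with CNTLIN=.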
p307d02
/* Step 1 */
/* Step 2 */
proc sql;
   insert into countryfmt(FmtName, Start, End, Label)
      values('$country', 'BR', 'BR', 'Brazil')
      values('$country', 'CH', 'CH', 'Switzerland')
      values('$country', 'MX', 'MX', 'Mexico');
quit;
/* Step 3 */
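Step 3 might look like the following sketch, which assumes that the edited countryfmt data set from Step 2 is in the Work library and that the format is re-created in orion.MyFmts.

```sas
/* Sketch of Step 3: re-create $country from the edited data set */
/* and use FMTLIB to confirm that the new values were added.     */
proc format library=orion.MyFmts cntlin=countryfmt fmtlib;
   select $country;
run;
```

The FMTLIB listing should then include the Brazil, Switzerland, and Mexico entries inserted in Step 2.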
DEFAULT a numeric variable that indicates the default length for format or informat
EEXCL a character variable that indicates whether the range’s ending value is excluded
FILL for picture formats, a numeric variable whose value is the value of the FILL= option
FUZZ a numeric variable whose value is the value of the FUZZ= option
HLO a character variable that contains range information about the format or informat in
the form of different letters that can appear in any combination
LABEL a character variable whose value is the informatted or formatted value or the name of
an existing informat or format
LENGTH a numeric variable whose value is the value of the LENGTH= option
MAX a numeric variable whose value is the value of the MAX= option
MIN a numeric variable whose value is the value of the MIN= option
MULT a numeric variable whose value is the value of the MULT= option
NOEDIT for picture formats, a numeric variable whose value indicates whether the NOEDIT
option is in effect
PREFIX for picture formats, a character variable whose value is the value of the PREFIX=
option
SEXCL a character variable that indicates whether the range’s starting value is excluded
ability to be stored permanently
To estimate the amount of memory used by a format, refer to Usage Note 23084 at
support.sas.com/kb/23/084.html.
Exercises
Level 1
1 91 North America
2 93 Europe
3 94 Africa
4 95 Asia
5 96 Australia/Pacific
a. Create a CNTLIN data set named continent that reads the data from orion.continent and contains
the variables FmtName, Start, and Label. The name of the format should be CONTINENT.
b. Use the CNTLIN= option to create a format from the continent data set and store the format in
the orion.MyFmts catalog.
c. Open the program p307e01c and submit it. The program should execute successfully with no
errors in the SAS log.
p307e01c
/*******************/
/* Part C */
/* Use continent. */
/*******************/
data countries;
set orion.country;
Continent_Name=put(Continent_ID, continent.);
run;
proc sql;
insert into continentfmt(fmtname, Start, End, Label)
values('continent', '90', '90', 'Antarctica')
values('continent', '92', '92', 'South America');
quit;
1) Before the PROC SQL step, add a PROC FORMAT step with the CNTLOUT= option to
create a control output data set named continentfmt from the CONTINENT format.
2) Submit the program to add new observations to the continentfmt data set.
3) Add another PROC FORMAT step with the CNTLIN= option to read the continentfmt data
set and re-create the CONTINENT format. Use the FMTLIB option in this PROC FORMAT
step to ensure that the new values were added to the format CONTINENT.
Level 2
1 15 30 15-30 years
2 30 45 31-45 years
3 45 60 46-60 years
4 60 75 61-75 years
a. Create a format from the orion.ages data set and store it permanently in the orion.MyFmts
catalog. Use the appropriate option to view the values in the format.
b. Write a DATA step to create a data set named sales that reads the Employee_ID and Birth_Date
variables from the orion.sales data set. Create a new variable named Age that is the employee’s
age as of the current date and another new variable named Age_Cat that is the value of the
variable Age using the AGE format.
c. Print the first five observations of the sales data set to confirm that the new variables were created
correctly.
PROC PRINT Output (As of May 5, 2009)
Sales Data Set
Birth_
Obs Employee_ID Date Age Age_Cat
Level 3
1 15 30 15-29 years
2 30 45 30-44 years
3 45 60 45-59 years
4 60 75 60-75 years
a. Create a format named AGES_MOD from the orion.ages_mod data set and store it permanently
in the orion.MyFmts catalog. Use the appropriate option to view the values in the format.
The value of the Last_Age variable is not to be included in the Description variable. Use
SAS Help or SAS OnlineDoc to investigate the EEXCL variable that is required to get
the correct results for this exercise.
b. Write a DATA step to create a data set named sales that reads the Employee_ID and Birth_Date
variables from the orion.sales data set. Create a new variable named Age that is the employee’s
age as of the current date and another new variable named Age_Cat that is the value of the
variable Age using the AGES_MOD format.
c. Print the first five observations of the sales data set to confirm that the new variables were created
correctly.
PROC PRINT Output (as of May 5, 2009)
Sales Data Set
Birth_
Obs Employee_ID Date Age Age_Cat
Objectives
Use a picture format to format numeric data.
213 **********213
7.2 Using a Picture Format (Self-Study) 7-25
Picture Formats
Some uses for picture formats include the following:
displaying numbers with leading zeros (0005)
Business Scenario
The data set orion.phone contains the phone number of
the employees from the United States and Australia. The
phone number is stored in a numeric variable named
Phone. Create a data set that contains the phone number
in the correct formatted form.
Partial Listing of orion.phone
Employee_ID  Phone_Type  Country  Phone
120101       Home        AU       61255551849
120101       Work        AU       61255510001
120102       Home        AU       61355559700
.            .           .        .
.            .           .        .
.            .           .        .
121147       Home        US       13055510423
121148       Work        US       13055554118
121148       Home        US       13055510424
data phone_list;
   set orion.phone;
   if Country='AU' then
      Phone_Number=put(Phone,au_phone.);
   else if Country='US' then
      Phone_Number=put(Phone,us_phone.);
run;
p307d03
7.04 Quiz
Open and submit the program p307d03.
1. How are the Australian phone numbers displayed?
Employee_ Phone_
Obs ID Type Country Phone Phone_Number
<observations removed>
PROC FORMAT;
PICTURE name
value-or-range-1 <..., value-or-range-n>='picture';
RUN;
p307d03
To insert the open parenthesis in the phone number 2155555906, use the following PROC FORMAT step:
proc format;
picture us_phone_withzeros
low-<10000000='0 (000) 000-9999'
10000000-high='0 (000) 000-9999' (prefix='(');
run;
7.05 Quiz
Submit the program p307a02. How many digits are
displayed to the right of the decimal point?
proc format;
picture rtfmt 0 - high='999,999.9999';
picture wzrfmt 0 - high='000,009.0000';
run;
7.06 Quiz
Submit the program p307a03. How many digits are
printed to the left of the decimal point?
proc format;
picture small low - high='0,009.99';
picture large low - high='000,009.99';
run;
Inserting Characters
proc format;
picture paren low - high='(999)999-9999';
picture nospace low-high='999)999-9999'
(prefix= '(' );
picture space low-high=' 999)999-9999'
(prefix= '(' );
run;
Date Directives
Consider this date value: -3334 (November 15, 1950).
To Display                 Use This Directive   Variable Display
Abbreviated weekday name   %a                   Wed
Full weekday name          %A                   Wednesday
Abbreviated month name     %b                   NOV
Full month name            %B                   November
Month value as decimal     %m                   11
Date Directives
proc format;
picture longdate (default=30)
'01jan1950'd-'31dec2004'd='%A, %B %d'
(datatype=date);
picture noleadz
'01jan1950'd-'31dec2004'd='%y~%m~%d'
(datatype=date);
picture leadzero
'01jan1950'd-'31dec2004'd='%0y~%0m~%0d'
(datatype=date);
run;
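To see what these three pictures produce, the formats can be applied to the sample date from the directive table. The following is a small sketch that assumes the PROC FORMAT step above has already run in the same session; the variable name Birthday is illustrative only.

```sas
/* Sketch: write one date value to the log with each picture format. */
data _null_;
   Birthday='15nov1950'd;   /* illustrative variable name */
   put Birthday= longdate.;
   put Birthday= noleadz.;
   put Birthday= leadzero.;
run;
```

Comparing the NOLEADZ and LEADZERO lines in the log with a single-digit month or day (for example, '05jan1950'd) shows the effect of the 0 modifier in the directives.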
p307d08
Exercises
Level 1
1230058123 12-30-05-8123
b. Print the first five observations of the orion.order_fact data set to validate the formatted values
of the variable.
PROC PRINT Output
Formatted Values of Order_ID
Obs Order_ID
1 12-30-05-8123
2 12-30-08-0101
3 12-30-10-6883
4 12-30-14-7441
5 12-30-31-5085
Level 2
b. Print the first five observations of the orion.denmark_customers data set to validate the
formatted values of the variable.
PROC PRINT Output
Using a PICTURE Format
Obs Total_Retail_Price
Abbreviation Meaning
eks eksklusiv (exclusive)
Level 3
b. Print the first five observations of the orion.order_fact data set to validate the formatted values
of the variable.
PROC PRINT Output
Obs Order_Date
1 Saturday, 1.11.2003
2 Wednesday, 1.15.2003
3 Monday, 1.20.2003
4 Tuesday, 1.28.2003
5 Thursday, 2.27.2003
Chapter Review
1. What PROC FORMAT statement option is used
to create a permanent format?
Chapter Review
4. What PROC FORMAT option is used to view
the contents of a format?
7.4 Solutions
Solutions to Exercises
1. Creating Formats with Values from a SAS Data Set
a. Create a CNTLIN data set named continent that reads the data from orion.continent and contains
the variables FmtName, Start, and Label. The name of the format should be CONTINENT.
p307s01
/*********************/
/* Part A */
/* Make continent */
/*********************/
data continent;
keep Start Label FmtName;
retain FmtName 'continent';
set orion.continent(rename=(Continent_ID=Start
Continent_Name=Label));
run;
c. Print the first five observations of the sales data set to confirm that the new variables were created
correctly.
p307s02
proc print data=sales(obs=5);
format Birth_Date date9.;
title 'Sales Data Set';
run;
3. Creating Formats with Exclusive Ranges from a SAS Data Set
a. Create a format named ages_mod from the orion.ages_mod data set and store it permanently in
the orion.MyFmts catalog. Use the appropriate option to view the values in the format.
p307s03
data ages_mod;
set orion.ages_mod(rename=(First_Age=Start Last_Age=End
Description=Label));
retain fmtname 'ages_mod';
EEXCL='Y';
run;
data sales;
set orion.sales(keep=Employee_ID Birth_Date);
Age=int(yrdif(Birth_Date, today(), 'ACT/ACT'));
Age_Cat=put(Age, ages_mod.);
run;
c. Print the first five observations of the sales data set to confirm that the new variables were created
correctly.
p307s03
proc print data=sales(obs=5);
format birth_date date9.;
title 'Sales Data Set';
run;
1230058123 12-30-05-8123
b. Print the first five observations of the orion.order_fact data set to validate the formatted values
of the variable.
p307s04
proc format;
picture product low - high='99-99-99-9999';
run;
b. Print the first five observations of the orion.denmark_customers data set to validate the
formatted values of the variable.
p307s05
proc format;
picture kroner 0 - high='000.009,99 eks.moms' (mult=100
prefix='kr. ');
run;
b. Print the first five observations of the orion.order_fact data set to validate the formatted values
of the variable.
p307s06
proc format;
picture day_of_week(default=21) low - high='%A, %m.%d.%Y'
(datatype=date);
run;
12
480 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.CUSTOMERS may be incomplete. When this step was stopped there were
0 observations and 13 variables.
WARNING: Data set WORK.CUSTOMERS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
481
482 proc freq data=orion.employee_addresses;
483 tables Country;
484 format Country $extra.;
ERROR: The format $EXTRA was not found or could not be loaded.
485 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.10 seconds
cpu time 0.00 seconds
data customers;
set orion.customer;
Country_Name=put(Country,$country.);
run;
Chapter 8 Combining Data Horizontally
8.1 DATA Step Merges and SQL Procedure Joins ............................................................. 8-3
Demonstration: Using the DATA Step to Perform a Match-Merge ......................................... 8-7
Objectives
Use the DATA step with a MERGE statement
to combine more than two SAS data sets.
Use the SQL procedure to join SAS data sets
without a common variable.
Describe the differences between the DATA step
MERGE statement and PROC SQL.
8-4 Chapter 8 Combining Data Horizontally
8.1 DATA Step Merges and SQL Procedure Joins 8-5
Business Scenario
The SAS data set orion.staff contains the employee’s ID
and the employee’s manager’s ID.
Partial Listing of orion.staff
Employee_ID  Start_Date  . . .  Emp_Hire_Date  Emp_Term_Date  Manager_ID
120101       01JUL2003   . . .  01JUL2003      .              120261
120102       01JUN1989   . . .  01JUN1989      .              120101
120103       01JAN1974   . . .  01JAN1974      .              120101
120104       01JAN1981   . . .  01JAN1981      .              120101
120105       01MAY1999   . . .  01MAY1999      .              120101
120106       01JAN1974   . . .  01JAN1974      .              120104
120107       01FEB1974   . . .  01FEB1974      .              120104
120108       01AUG2006   . . .  01AUG2006      .              120104
.            .           .      .              .              .
.            .           .      .              .              .
.            .           .      .              .              .
Business Scenario
The SAS data set orion.employee_addresses
contains the employee’s ID and name.
Partial Listing of orion.employee_addresses
Employee_ID  Employee_Name         . . .  State  Postal_Code  Country
121044       Abbott, Ray           . . .  FL     33135        US
120145       Aisbitt, Sandy        . . .         2001         AU
120761       Akinfolarin, Tameaka  . . .  PA     19145        US
120656       Amos, Salley          . . .  CA     92116        US
121107       Anger, Rose           . . .  PA     19142        US
121038       Anstey, David         . . .  FL     33157        US
120273       Antonini, Doris       . . .  FL     33141        US
.            .                     .      .      .            .
.            .                     .      .      .            .
.            .                     .      .      .            .
Business Scenario
You need to combine these two data sets to determine the
employee’s name and the employee’s manager’s name.
Partial Listing of names
Employee_ID  Manager_ID  Employee_Name       Manager_Name
120102       120101      Zhou, Tom           Lu, Patrick
120103       120101      Dawes, Wilson       Lu, Patrick
120104       120101      Billington, Kareen  Lu, Patrick
120105       120101      Povey, Liz          Lu, Patrick
120121       120102      Elvish, Irenie      Zhou, Tom
120122       120102      Ngan, Christina     Zhou, Tom
120123       120102      Hotstone, Kimiko    Zhou, Tom
120124       120102      Daymond, Lucian     Zhou, Tom
.            .           .                   .
.            .           .                   .
.            .           .                   .
12
8.1 DATA Step Merges and SQL Procedure Joins 8-7
p308d01
proc sort data=orion.employee_addresses(keep=Employee_ID
Employee_Name)
out=addresses;
by Employee_ID;
run;
data temp1;
keep Employee_Name Employee_ID Manager_ID;
merge orion.staff(in=S keep=Employee_ID Manager_ID)
addresses(in=A);
by Employee_ID;
if S and A; /* Matches only */
run;
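proc sort data=temp1; /* temp1 is in Employee_ID order; the next merge is BY Manager_ID */
by Manager_ID;
run;
data names;
merge temp1(in=T)
addresses(rename=(Employee_ID=Manager_ID
Employee_Name=Manager_Name) in=A);
by Manager_ID;
if A and T; /* Matches only */
run;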
Example

      a               b
   X     Y         X     Y
   1     2         1     3

data c;
merge a b;
by X;
run;

      c
   X     Y
   1     3
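The example above shows that when both data sets contain a non-BY variable with the same name, only one value survives in the merged observation. The following is a minimal sketch of that behavior, assuming two one-row data sets that mirror the slide. The RENAME= step and the names Y_a, Y_b, and c_both are illustrative additions, not part of the course program.

```sas
/* Hypothetical data sets that mirror the slide: both contain X and Y */
data a;
   X=1; Y=2;
run;

data b;
   X=1; Y=3;
run;

/* Y exists in both inputs: the value read last (from b) stays in the PDV */
data c;
   merge a b;
   by X;
run;

/* Illustrative alternative: rename Y on input so that both values are kept */
data c_both;
   merge a(rename=(Y=Y_a))
         b(rename=(Y=Y_b));
   by X;
run;

proc print data=c;
run;

proc print data=c_both;
run;
```

In this sketch, c holds Y=3, and c_both holds Y_a=2 and Y_b=3.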
p308d02
proc sql;
create table namessql as
select e.Employee_ID,
e.Employee_Name,
Manager_ID,
m.Employee_Name as Manager_Name
from orion.staff,
orion.employee_addresses as e,
orion.employee_addresses as m
where e.Employee_ID=staff.Employee_ID
and m.Employee_ID=staff.Manager_ID
order by Manager_ID,
Employee_ID;
quit;
DATA Step Merge                               PROC SQL Join
Multiple data sets can be created.            Only one data set can be created
                                              with one CREATE TABLE statement.

Complex business logic can be incorporated    CASE logic can be used for business
using IF-THEN or SELECT/WHEN logic.           logic; however, it is not as flexible as
                                              DATA step syntax.

The data sets being merged must be sorted     The data sets being joined do not
or indexed on the BY variable(s).             have to be sorted or indexed.
Comparison Programs
The DATA step merge and the PROC SQL inner join
do not always give you the same results.
The following programs are used to generate the results
for the next four result sets:
data three;
merge one two;
by X;
run;

proc sql;
create table three as
select one.X, one.Y, two.Z
from one, two
where one.X=two.X;
quit;
Reference Information
The following SQL step produces results that are identical to those of the DATA step when there is
nonmatching data.
proc sql;
select coalesce(one.X, two.X) as X, Y, Z
from one full join two
on one.X=two.X;
quit;
The following DATA step merge produces results that are identical to those of the SQL inner join when
there is nonmatching data.
data three;
merge one(in=O) two(in=T);
by X;
if O and T;
run;
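To make the difference concrete, the sketch below runs the merge and the inner join on two small data sets whose key values only partly overlap. The data values and the output names merged and joined are hypothetical; only the two program shapes come from the comparison programs above.

```sas
/* Hypothetical inputs, already in X order; X=1 and X=3 have no partner */
data one;
   input X Y $;
   datalines;
1 A
2 B
4 D
;
run;

data two;
   input X Z $;
   datalines;
2 Q
3 R
4 S
;
run;

/* Match-merge without IN= subsetting keeps matches and nonmatches */
data merged;
   merge one two;
   by X;
run;

/* Inner join keeps only the rows where one.X=two.X */
proc sql;
   create table joined as
   select one.X, one.Y, two.Z
      from one, two
      where one.X=two.X;
quit;
```

With these inputs, merged has four observations (X=1, 2, 3, 4, with missing Y or Z for the nonmatches), and joined has two (X=2 and X=4).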
Exercises (Optional)
Level 1
Order_ Delivery_
Obs Customer_ID Employee_ID Street_ID Date Date Order_ID
1 4 James Kvarniq
2 5 Sandrina Stephano
3 9 Cornelia Krahl
4 10 Karen Ballinger
5 11 Elke Wallstab
The data set orion.product_dim has the product names and supplier names.
Partial Listing of orion.product_dim
Partial orion.product_dim
a. Combine the three data sets to create a data set named purchases that contains the customer
name, product name, and supplier name for the customers in the orion.order_fact data set.
b. Order the data by Product_ID and print the first five observations of the purchases data set.
PROC PRINT Output
Partial purchases Data Set
Level 2
Order_ Delivery_
Obs Customer_ID Employee_ID Street_ID Date Date Order_ID
1 4 James Kvarniq
2 5 Sandrina Stephano
3 9 Cornelia Krahl
4 10 Karen Ballinger
5 11 Elke Wallstab
The data set orion.product_dim has the product names and supplier names.
Partial Listing of orion.product_dim
Partial orion.product_dim
Combine the three data sets to create the following data sets:
• a data set named no_purchases that contains the customers who did not make any purchases
• a data set named purchases that contains the customer name, product name, and supplier name for
those customers in the orion.order_fact data set
• a data set named no_products that contains the product names and suppliers for products that were
not purchased
Partial Listing of no_purchases
no_purchases Data Set
1 33 Rolf Robak
2 42 Thomas Leitmann
Order_ Delivery_
Obs Customer_ID Employee_ID Street_ID Date Date Order_ID
Level 3
The data set orion.employee_addresses contains the employee IDs and the employee names for all
employees.
Partial Listing of orion.employee_addresses
Partial orion.employee_addresses
Employee_
Obs ID Employee_Name
Create a data set named manager_names that contains the Employee_ID variable, the six
Manager_ID variables, and the six manager names.
Partial Listing of manager_names
Partial manager_names Data
Manager5_ Manager6_
Obs Manager2_Name Manager3_Name Manager4_Name Name Name
8.2 Using an Index to Combine Data
Objectives
• Use the SET statement with the KEY= option to combine two SAS data sets.
• Use _IORC_ to determine whether the index search was successful.
Business Scenario
The data set orion.catalog contains the order information
for catalog sales and has 38 observations.
Business Scenario
The data set orion.customer_dim_more contains
information about customers and has 1,500 observations.
Partial Listing of orion.customer_dim_more
Customer_ID  Customer_Country  Customer_Gender  Customer_Name      . . .  Customer_Type                            Customer_Group           Customer_Age
4            US                M                James Kvarniq      . . .  Orion Club members low activity          Orion Club members       33
5            US                F                Sandrina Stephano  . . .  Orion Club Gold members medium activity  Orion Club Gold members  28
9            DE                F                Cornelia Krahl     . . .  Orion Club Gold members medium activity  Orion Club Gold members  33
10           US                F                Karen Ballinger    . . .  Orion Club members high activity         Orion Club members       23
11           DE                F                Elke Wallstab      . . .  Orion Club members high activity         Orion Club members       33
. . .
Business Scenario
You need to combine the two data sets to create two new
data sets: one with information about the customers who
purchase products from the catalog for whom you have
demographics and the other for customers for whom you
do not have any information.
Partial PROC PRINT Output: catalog_customers
Catalog Customers (Partial Output)
Obs  Customer_ID  Order_ID    Quantity  Total_Retail_Price
1    5            1230080101  1         $247.50
2    45           1230106883  1         $28.30
3    79           1230333319  1         $234.60
4    23           1230338566  1         $35.40
5    16           1230450371  2         $128.40

Obs  Customer_Country  Customer_Gender  Customer_Name      Customer_Age
1    US                F                Sandrina Stephano  28
2    US                F                Dianne Patchin     28
3    US                F                Najma Hicks        21
4    US                M                Tulio Devereaux    58
5    DE                M                Ulrich Heyde       68

PROC PRINT Output: errors
No Demographic Data Available
Obs  Customer_ID
1    15
2    66
p308d03
Assign a value to the index key variable(s) before the SET statement is executed. The index is then used
to retrieve an observation with the key value. WHERE processing is not enabled for a data set read with
the KEY= option.
Reference Information
You can use the automatic variable _IORC_ with the %SYSRC AUTOCALL macro to test for specific
I/O error conditions that are created when you use the KEY= option in the SET statement.
General form for using %SYSRC with _IORC_:
IF _IORC_=%SYSRC(mnemonic) THEN…
Mnemonic Meaning
The %SYSRC macro is in the AUTOCALL library. You must have the MACRO system option in effect
to use this macro. Consult SAS OnlineDoc for more information. Follow the path shown below:
Support & Training → Knowledge Base → Documentation → Base SAS →
SAS 9.2 Macro Language: Reference → Macro Language Dictionary → AutoCall Macros
The IORCMSG function returns the formatted error message that is associated with the current value of
the automatic variable _IORC_.
General form of the IORCMSG function:
character-variable=IORCMSG();
Character-variable specifies a character variable with a length of 200, unless the length was previously
assigned.
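The sketch below shows one way the pieces described above can fit together: %SYSRC to test _IORC_ against specific conditions, and IORCMSG to report anything unexpected. It assumes the mnemonics _SOK (the I/O operation was successful) and _DSENOM (no matching observation was found), as listed in the SAS documentation for %SYSRC. The data sets lookup, trans, matched, and unmatched and their contents are hypothetical.

```sas
/* Hypothetical lookup table with a simple index on ID */
data work.lookup(index=(ID));
   input ID Name $;
   datalines;
1 Ann
2 Bob
;
run;

/* Hypothetical transactions; ID=3 is not in the lookup table */
data work.trans;
   input ID Amount;
   datalines;
1 10
3 30
;
run;

data matched(drop=Message) unmatched(keep=ID Amount);
   set work.trans;                 /* supplies the key value */
   set work.lookup key=ID;         /* direct read through the index */
   select (_IORC_);
      when (%sysrc(_SOK)) output matched;   /* key value was found */
      when (%sysrc(_DSENOM)) do;            /* no match for this key */
         _ERROR_=0;                          /* suppress the data error */
         output unmatched;
      end;
      otherwise do;                          /* any other I/O condition */
         Message=iorcmsg();
         putlog 'Unexpected I/O condition: ' Message;
         stop;
      end;
   end;
run;
```

Testing for a mnemonic instead of comparing _IORC_ with 0 keeps the nonmatch case separate from other I/O problems, which then surface in the OTHERWISE branch.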
Example:
p308d03a
data catalog_customers(keep=Customer_ID Order_ID Quantity
Total_Retail_Price
Customer_Country
Customer_Gender
Customer_Name
Customer_Age_Group)
errors(keep=Customer_ID);
set orion.catalog(keep=Customer_ID Order_ID
Quantity Total_Retail_Price);
set orion.customer_dim_more key=Customer_ID;
if _IORC_=0 then output catalog_customers;
else do;
output errors;
Message=iorcmsg();
_ERROR_=0;
putlog _N_ ' The problem is ' Message;
end;
run;
Execution
[Figure: a simplified index on orion.customer_dim_more. Each Customer_ID value in the index (4, 5, 9, ..., 13, 16, ..., 45, ...) is paired with a record identifier (RID) that points to the matching observation in the partial listing of orion.customer_dim_more (Customer_ID, Customer_Country, Customer_Gender, Customer_Name, ...).]
8.04 Quiz
Why do you not want this observation output to
catalog_customers?
Partial PDV
Customer_ID  Order_ID    Quantity  Total_Retail_Price  ...  Customer_Name      ...  D _IORC_  D _N_
15           1240080101  3         216.50              ...  Sandrina Stephano  ...  1230015   2
8.05 Quiz
Open and submit the program p308a01.
1. What messages do you see in your SAS log?
2. What is the value of _ERROR_?
3. Replace the ELSE statement with the following ELSE
DO group:
else do;
_ERROR_=0;
output errors;
end;
4. Resubmit the program and look at the log.
5. Why are there no messages now?
one two
Variable Variable
A A
A A
A A
If there are contiguous duplications in one, each of which has a match in two, then SAS performs a
one-to-one read.
one        two
Variable   Variable
A          A
A          A
A          B    (No match: run-time error)
If there are contiguous duplications in one, some of which do not have a match in two, then SAS
performs a one-to-one read until it finds a nonmatch. At that time, SAS encounters a run-time error.
If there are contiguous duplications in one and the UNIQUE suboption in the KEY= option is used, then
SAS reads the first observation in two.
one two
Variable Variable
A A
B B
A A
If there are noncontiguous duplications in one, then SAS reads the first observation in two.
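The UNIQUE suboption mentioned above is written after a slash in the KEY= option. The following is a minimal sketch, assuming hypothetical data sets one and two, in which one contains contiguous duplicate key values and two contains each key value once.

```sas
/* Hypothetical lookup data set with a simple index on Code */
data work.two(index=(Code));
   input Code $ Descr $;
   datalines;
A First
B Second
;
run;

/* Hypothetical data set with contiguous duplicates of the key value A */
data work.one;
   input Code $;
   datalines;
A
A
B
;
run;

data combined;
   set work.one;
   /* UNIQUE restarts each search at the top of the index, so the
      second A finds the same observation in two as the first A */
   set work.two key=Code / unique;
   if _IORC_ ne 0 then do;
      _ERROR_=0;      /* suppress the data error for a nonmatch */
      Descr=' ';      /* do not carry forward a retained value */
   end;
run;
```

With these inputs, all three observations in combined find a match. Without UNIQUE, the second A would continue the search from the current index position, find no further A in two, and return a nonzero _IORC_.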
DATA Step Merge              PROC SQL Join                 Multiple SET Statements with KEY=
The data sets being          The data sets being joined    The data sets on all but the
merged must be sorted or     do not have to be sorted      first SET statement must
indexed on the BY            or indexed.                   have the index named on
variable(s).                                               the KEY= option.

An exact match on the BY     Inequality joins can be       An exact match on the key
variable(s) value(s) must    performed.                    value is required.
be found.
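As an illustration of the inequality join named in the comparison above, the sketch below matches each row to a range instead of to an exact key value. The data sets staff and bands, their variables, and their values are hypothetical and are not part of the Orion data.

```sas
/* Hypothetical detail data */
data staff;
   input Employee_ID Salary;
   datalines;
101 26000
102 48000
103 91000
;
run;

/* Hypothetical range table: one row per salary band */
data bands;
   input Low High Band $;
   datalines;
0 29999 Low
30000 59999 Mid
60000 999999 High
;
run;

proc sql;
   create table staff_bands as
   select s.Employee_ID, s.Salary, b.Band
      from staff as s, bands as b
      where s.Salary between b.Low and b.High   /* range condition, not an equality */
      order by s.Employee_ID;
quit;
```

Neither input needs to be sorted or indexed for this join, and there is no common variable: the condition compares Salary with the Low and High boundaries.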
proc sql;
create index Customer_ID
on orion.customer_dim(Customer_ID);
quit;
In_Dim=InDim;
In_Int=InInt;
In_Cat=InCat;
proc sql;
drop index Customer_ID
from orion.customer_dim;
quit;
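The PROC SQL statements above create and drop the index. As a hedged alternative, the same housekeeping can be sketched with the INDEX CREATE and INDEX DELETE statements of PROC DATASETS. The data set work.customers and its values are hypothetical stand-ins for orion.customer_dim.

```sas
/* Hypothetical stand-in for the data set that needs an index */
data work.customers;
   input Customer_ID Name $;
   datalines;
4 James
5 Sandrina
;
run;

/* Create a simple index on Customer_ID */
proc datasets library=work nolist;
   modify customers;
   index create Customer_ID;
quit;

/* PROC CONTENTS output lists the indexes defined on the data set */
proc contents data=work.customers;
run;

/* Delete the index when it is no longer needed */
proc datasets library=work nolist;
   modify customers;
   index delete Customer_ID;
quit;
```

Either approach leaves the data itself unchanged; only the index file is created or removed.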
Exercises
Level 1
Birth_ Emp_Hire_
Employee_ID Job_Title Salary Gender Date Date
Emp_Term_
Date Manager_ID SSN Employee_Name
The SAS data set orion.organization_dim contains information about all employees. There is an
index on the Employee_ID variable.
Partial Listing of orion.organization_dim
orion.organization_dim SAS Data Set
(Partial Output)
Employee_
Employee_ID Country Company Department Section Org_Group
. 2 120261 120259 . . . .
. 3 120101 120261 120259 . . .
. 3 120101 120261 120259 . . .
. 3 120101 120261 120259 . . .
. 3 120101 120261 120259 . . .
a. Create a SAS data set named sales_emps by using an index on Employee_ID to combine the
two data sets, orion.salesstaff and orion.organization_dim. Check the SAS log to ensure that
you do not have any data errors. Read only the variables Employee_ID, Department, Section,
and Org_Group from orion.organization_dim.
b. Print the first five observations of the sales_emps SAS data set.
PROC PRINT Output
Sales Employee Data
(Partial Output)
Level 2
The data set orion.shoe_prices contains pricing information for all shoes.
Partial Listing of orion.shoe_prices
shoe_prices Data Set
(Partial Listing)
Total_Retail_ CostPrice_
Obs Product_ID Price Per_Unit
Create a SAS data set named shoes and a SAS data set named errors by using an index on
Product_ID to combine the two data sets, orion.shoe_vendors and orion.shoe_prices.
a. Create a simple index on the variable Product_ID in the data set orion.shoe_prices.
b. Read only the variables Product_ID, Product_Name, Supplier_Name, and
Mfg_Suggested_Retail_Price from orion.shoe_vendors.
Hint: There is a permanent format assigned to the Supplier_Country variable. To avoid a syntax
error, use the NOFMTERR system option.
c. Read only the variables Product_ID, Total_Retail_Price, CostPrice_Per_Unit from
orion.shoe_prices.
The shoes data set should have the price information for the shoe products.
d. The errors data set should contain data that is in orion.shoe_vendors, which is not in the
orion.shoe_prices data. The errors data set should contain only the variables Product_ID,
Product_Name, and Supplier_Name.
The errors data set can then be used to determine why these vendors do not have
observations in orion.shoe_prices.
e. Delete the Product_ID index on the data set orion.shoe_prices.
f. Print the first five observations of the shoes SAS data set.
PROC PRINT Output
Shoe Data
(Partial Output)
Supplier_
Obs Product_ID Product_Name Name
1 210200400027 Toddle Children's Air Mantra (3) (Bg) Shoes Eclipse Inc
2 210200400047 Toddler Fit Shoes Eclipse Inc
3 210201000174 Freestyle Children's Leather Street Shoes 3Top Sports
4 220200100123 Big Guy Men's Deschutz Slide Shoes Eclipse Inc
Level 3
6. Combining Data Sets Using an Index and Using the Macro Facility to Monitor Errors
The data set orion.first_internet_order contains the first order that a customer placed via the
Internet.
Partial Listing of orion.first_internet_order
orion.first_internet_order SAS Data Set
(Partial Output)
Order_ Delivery_
Customer_ID Employee_ID Street_ID Date Date Order_ID
Total_Retail_ CostPrice_
Product_ID Quantity Price Per_Unit Discount
The data set orion.internet contains multiple orders that a customer placed via the Internet. There is
an index on the Order_ID variable.
Order_ Delivery_
Customer_ID Employee_ID Street_ID Date Date Order_ID
Total_Retail_ CostPrice_
Product_ID Quantity Price Per_Unit Discount
a. Create a data set named processed_orders that contains the variables from
orion.first_internet_order and a variable named Comment. Use the index on the variable
Order_ID to retrieve the matching observation from orion.internet.
b. The variable Comment has the value Order has been processed if the Order_ID is in
both orion.first_internet_order and orion.internet. The value is Order has not been
processed if the Order_ID is not in both data sets.
c. Use the %SYSRC AUTOCALL macro described in the reference information in this chapter. In
addition, refer to SAS documentation by following the path shown below:
Support & Training → Knowledge Base → Documentation → Base SAS →
SAS 9.2 Macro Language: Reference → Macro Language Dictionary → AutoCall Macros
Order_ Delivery_
Obs Customer_ID Employee_ID Street_ID Date Date Order_ID Product_ID
Total_Retail_ CostPrice_
Obs Quantity Price Per_Unit Discount Comment
8.3 Combining Summary and Detail Data
Objectives
• Create an output SAS data set that contains summary statistics from PROC SUMMARY.
• Combine the output SAS data set from PROC SUMMARY with a detail SAS data set.
• Use the SQL procedure to combine summary and detail data.
• Use the SQL procedure to calculate the summary statistic and combine it with every observation in the data set.
• Use the DATA step to calculate the summary statistic and combine it with every observation in the data set.
Business Scenario
The data set orion.totalsalaries has one observation for every value of Manager_ID.
Each observation contains the number of people who report to that manager, and
DeptSal is the total salary for all of those employees.
Partial Listing of orion.totalsalaries
Manager_ID  Numemps  DeptSal
120101      4        $269,570
120102      48       $1,344,595
120103      30       $793,835
120104      15       $425,215
120259      6        $941,155
120260      3        $216,065
120261      6        $595,935
120262      10       $545,255
120270      1        $43,635
120271      9        $280,155
Business Scenario
You need to calculate the total salaries paid by the
company. Then, divide each individual manager's
DeptSal by that total to create a variable named Percent.
Partial PROC PRINT Output
Percentage of Total Salaries
for Each Manager
(Partial Output)
Reference Information
To use the Output Delivery System to calculate the sum statistic, use the following program:
p308d05
ods output summary=sumdata;
minimum
maximum
standard deviation
Listing of summary
Obs  _TYPE_  _FREQ_  GrandTot
1    0       53      $15,695,800
The output data set has variables that contain the requested statistics, plus the following variables:
data percent;
if _N_=1 then set summary(keep=GrandTot);
set orion.totalsalaries;
Percent=DeptSal / GrandTot;
format Percent percent8.2;
run;
p308d06
The _N_=1 condition causes the summary data set to be read only during the first iteration of the DATA
step. Without it, the DATA step reaches the end of file in summary on the second iteration of the DATA
step, and the DATA step terminates with one observation in the data set percent.
One observation from the data set orion.totalsalaries is read in each iteration of the DATA step.
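The two steps of this technique can be sketched end to end as follows: PROC SUMMARY writes a one-observation data set that holds the grand total, and the DATA step reads it once and combines it with every detail observation. The work.totalsalaries data set is a hypothetical three-row subset of the values shown in the listing of orion.totalsalaries, and the PROC SUMMARY step is an assumed reconstruction rather than the course program.

```sas
/* Hypothetical subset of orion.totalsalaries */
data work.totalsalaries;
   input Manager_ID Numemps DeptSal;
   datalines;
120101 4 269570
120102 48 1344595
120103 30 793835
;
run;

/* One output observation: _TYPE_, _FREQ_, and the requested sum */
proc summary data=work.totalsalaries;
   var DeptSal;
   output out=summary sum=GrandTot;
run;

data percent;
   /* Read the summary observation once; GrandTot is retained because
      it comes from a SET statement */
   if _N_=1 then set summary(keep=GrandTot);
   set work.totalsalaries;
   Percent=DeptSal / GrandTot;
   format Percent percent8.2;
run;

proc print data=percent;
run;
```

For these three rows, GrandTot is 2,408,000, and percent contains one observation for each detail row.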
Execution
The PDV changes as follows while this DATA step executes:
data percent;
if _N_=1 then
set summary(keep=GrandTot);
set orion.totalsalaries;
Percent=DeptSal/GrandTot;
format Percent percent8.2;
run;

summary
GrandTot
15695800

Partial orion.totalsalaries
Manager_ID  Numemps  DeptSal
120101      4        269570
120102      48       1344595
120103      30       793835
120104      15       425215
120259      6        941155
120260      3        216065
. . .

PDV
Step                                                   GrandTot  Manager_ID  Numemps  DeptSal  Percent  D _N_
1. First iteration begins; _N_=1 is true               .         .           .        .        .        1
2. SET summary(keep=GrandTot) executes                 15695800  .           .        .        .        1
3. SET orion.totalsalaries executes                    15695800  120101      4        269570   .        1
4. Percent=DeptSal/GrandTot executes                   15695800  120101      4        269570   0.0172   1
5. Implicit OUTPUT; implicit RETURN                    15695800  120101      4        269570   0.0172   1
6. Initialize PDV; values read by SET are retained     15695800  120101      4        269570   .        2
7. _N_=1 is false; summary is not read                 15695800  120101      4        269570   .        2
8. SET orion.totalsalaries executes                    15695800  120102      48       1344595  .        2
9. Continue until EOF in orion.totalsalaries           15695800  121145      45       1216055  0.077    53
8.06 Quiz
Open and submit the program p308a02.
1. How many observations are in the resulting data set?
2. Why?
p308d08
The SUM function with one argument calculates the total for the column DeptSal.
Because the alias GrandTot is assigned to the sum(DeptSal) column, the SELECT statement can use the
CALCULATED keyword to refer to GrandTot as the denominator in this calculation.
When SQL remerges summary data, it puts a note in the SAS log.
SAS Log
proc sql;
2 create table percentsql as
3 select Manager_ID,
4 DeptSal,
5 sum(DeptSal) as GrandTot,
6 DeptSal/calculated GrandTot
7 as Percent format=8.2
8 from orion.totalsalaries;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.PERCENTSQL created, with 53 rows and 4 columns.
9 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.39 seconds
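The remerge shown in the log can be tried on a small table. The sketch below assumes a hypothetical three-row subset of orion.totalsalaries. The second query, which uses a scalar subquery in place of CALCULATED, is an alternative added for comparison and is not taken from the course program.

```sas
/* Hypothetical subset of orion.totalsalaries */
data work.totalsalaries;
   input Manager_ID Numemps DeptSal;
   datalines;
120101 4 269570
120102 48 1344595
120103 30 793835
;
run;

proc sql;
   /* Summary function without GROUP BY, selected alongside detail
      columns: PROC SQL remerges the total with every row */
   create table percentsql as
   select Manager_ID,
          DeptSal,
          sum(DeptSal) as GrandTot,
          DeptSal/calculated GrandTot as Percent format=percent8.2
      from work.totalsalaries;

   /* Assumed alternative: the total comes from a scalar subquery */
   create table percentsql2 as
   select Manager_ID,
          DeptSal,
          DeptSal/(select sum(DeptSal) from work.totalsalaries)
             as Percent format=percent8.2
      from work.totalsalaries;
quit;
```

Both tables contain one row per manager with the same Percent values; the first query is the one expected to write the remerging note to the log.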
In addition to using the SQL procedure for calculating the percentages in one step, the REPORT
procedure and the TABULATE procedure can calculate percentages in one step.
p308d08a
proc report data=orion.totalsalaries
out=report_pct(drop=_break_)
nowd;
column Manager_ID DeptSal DeptSal=PctSal;
define Manager_ID / display 'Manager ID';
define DeptSal / sum 'Department Salaries';
define PctSal / pctsum format=percent8.2
'Percent of Total Salaries';
run;
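The text above also names the TABULATE procedure. A minimal sketch of that route, assuming a hypothetical three-row subset of orion.totalsalaries, uses the PCTSUM statistic. The table layout and the formats are illustrative choices.

```sas
/* Hypothetical subset of orion.totalsalaries */
data work.totalsalaries;
   input Manager_ID Numemps DeptSal;
   datalines;
120101 4 269570
120102 48 1344595
120103 30 793835
;
run;

proc tabulate data=work.totalsalaries;
   class Manager_ID;
   var DeptSal;
   /* One row per manager plus a total row; PCTSUM reports each
      manager's share of the overall sum as a percentage (0 to 100) */
   table Manager_ID all,
         DeptSal*(sum*f=dollar12. pctsum*f=8.2);
run;
```

Unlike the PROC REPORT step, this sketch only prints a table; it does not create an output data set.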
data percent(drop=i);
if _N_=1 then do i=1 to TotObs; /* 1 */
set orion.totalsalaries(keep=DeptSal) /* 2 */
nobs=TotObs;
GrandTot + DeptSal; /* 3 */
end;
set orion.totalsalaries; /* 4 */
Percent=DeptSal / GrandTot; /* 5 */
format Percent percent8.2;
run;
p308d09
1. During the first execution of the DATA step, the DO loop executes the SET statement for each
observation in the orion.totalsalaries data set.
2. When the SET statement executes, it reads the value of DeptSal from orion.totalsalaries.
3. The SUM statement GrandTot + DeptSal accumulates the value of DeptSal into the variable
GrandTot.
4. The DO loop completes execution when i is greater than TotObs, preventing SAS from reaching the
end-of-file marker. The second SET statement reads the observations from orion.totalsalaries starting
with observation 1.
5. The variable Percent is calculated for each of those observations.
Reference Information
In SAS 9.2 you can use the SUM method for the hash object to calculate the grand total of the variable
DeptSal.
p308d10
data tot_sal / view=tot_sal;
set orion.totalsalaries;
Key='A';
run;
data percent;
retain GrandTot 0;
if _N_=1 then do;
dcl hash H(suminc:'DeptSal');
H.definekey('Key');
H.definedone();
do while(not Done);
set tot_sal end=Done;
H.ref();
end;
H.sum(sum:GrandTot);
end;
set orion.totalsalaries;
Percent=DeptSal / GrandTot;
format Percent percent8.2;
run;
Exercises
Level 1
Customer_
Customer_Type Customer_Group Age
Customer_
Obs AvgAge Customer_ID Age Age_Difference
1 41.9740 4 33 -8.9740
2 41.9740 5 28 -13.9740
3 41.9740 9 33 -8.9740
4 41.9740 10 23 -18.9740
5 41.9740 11 33 -8.9740
Level 2
Paid_By
Cash or Check
Payroll Deduction
Payroll Deduction
Cash or Check
Payroll Deduction
a. Select any method to create a SAS data set named compare by performing the following tasks:
• Calculate the total contribution for each employee.
• Determine the average of the total contribution for all of the employees.
• Calculate the difference between the average and each individual employee's total
contribution.
b. Print the first five observations of the compare SAS data set.
PROC PRINT Output
The compare Data Set
(Partial Output)
Avg_
Obs Donation Employee_ID Qtr1 Qtr2 Qtr3 Qtr4
1 47.2581 120265 . . . 25
2 47.2581 120267 15 15 15 15
3 47.2581 120269 20 20 20 20
4 47.2581 120270 20 10 5 .
5 47.2581 120271 20 20 20 20
Total_
Obs Recipients Paid_By Donation Difference
1 Mitleid International 90%, Save the Baby Animals 10% Cash or Check 25 -22.2581
2 Disaster Assist, Inc. 80%, Cancer Cures, Inc. 20% Payroll Deduction 60 12.7419
3 Cancer Cures, Inc. 10%, Cuidadores Ltd. 90% Payroll Deduction 80 32.7419
4 AquaMissions International 10%, Child Survivors 90% Cash or Check 35 -12.2581
5 Cuidadores Ltd. 80%, Mitleid International 20% Payroll Deduction 80 32.7419
Level 3
Order_ Delivery_
Customer_ID Employee_ID Street_ID Date Date Order_ID
The data set orion.product_dim contains the variables Product_ID and Product_Name.
Partial Listing of orion.product_dim
orion.product_dim SAS Data Set
(Partial Output)
Product_ Product_
Product_ID Line Category Product_Group Product_Name
210200100009 Children Children Sports A-Team, Kids Kids Sweat Round Neck,Large Logo
210200100017 Children Children Sports A-Team, Kids Sweatshirt Children's O-Neck
210200200022 Children Children Sports Bathing Suits, Kids Sunfit Slow Swimming Trunks
210200200023 Children Children Sports Bathing Suits, Kids Sunfit Stockton Swimming Trunks Jr.
210200300006 Children Children Sports Eclipse, Kid's Clothes Fleece Cuff Pant Kid'S
Supplier_
Country Supplier_Name Supplier_ID
a. Select any method to create a SAS data set named products by performing the following tasks:
• Calculate the total CostPrice_Per_Unit weighted by Quantity.
• Combine the weighted total with the orion.order_fact data. Create a new variable named
Percent that is based on the actual total cost (CostPrice_Per_Unit *Quantity) and the
weighted total.
b. Print the first five observations of the products SAS data set.
PROC PRINT Output
The products Data Set
(Partial Output)
CostPrice_
Obs Customer_ID Quantity Per_Unit Product_Name Percent
Reference Information
To create a running total for a variable, you can use either the DATA step or the SQL procedure.
p308d11
proc sort data=orion.order_fact out=order_fact;
by Order_Date Order_ID;
run;
data running_totals;
keep Order_Date Product_ID Total_Retail_Price
Sum_Total_Retail_Price;
set order_fact;
Sum_Total_Retail_Price + Total_Retail_Price;
format Sum_Total_Retail_Price dollar8.2;
run;
Sum_Total_
Order_ Total_Retail_ Retail_
Obs Date Product_ID Price Price
p308d11
proc sql;
create table order_fact_with_obsnum as
select monotonic() as obsnum,
*
from orion.order_fact;
create table running_totals_sql as
select o1.Order_Date,
o1.Product_ID,
o1.Total_Retail_Price,
(select sum(o2.Total_Retail_Price)
from order_fact_with_obsnum as o2
where o2.obsnum <= o1.obsnum) as Sum_Total_Retail_Price
format=dollar8.2
from order_fact_with_obsnum as o1
order by Order_Date, Order_ID, Sum_Total_Retail_Price;
quit;
The monotonic function enables you to create row numbers in SQL that are written to the table
and not only a displayed value. This Base SAS function returns 1 the first time that it is called, 2
the second time, 3 the next time, and so forth. See SAS Usage Note 15138 for more information
about the monotonic function.
Running Totals using PROC SQL
Date Order was                     Total Retail Price
placed by Customer  Product ID     for This Product    Sum_Total_Retail_Price
-----------------------------------------------------------------------------
11JAN2003           220101300017   $16.50              $16.50
15JAN2003           230100500026   $247.50             $264.00
20JAN2003           240600100080   $28.30              $292.30
28JAN2003           240600100010   $32.00              $324.30
27FEB2003           240200200039   $63.60              $387.90
02MAR2003           240100400005   $234.60             $622.50
03MAR2003           240800200062   $35.40              $657.90
03MAR2003           240800200063   $73.80              $731.70
09MAR2003           240500100004   $127.00             $858.70
09MAR2003           240500200003   $23.20              $881.90
To ensure that the data sets are the same, you can use PROC COMPARE.
p308d11
proc compare data=running_totals compare=running_totals_sql;
title 'Comparing the Resulting Data Sets';
run;
Variables Summary
Observation Summary
First Obs 1 1
Last Obs 617 617
NOTE: No unequal values were found. All values compared are exactly equal.
8.4 Combining Data Conditionally (Self-Study)
Objectives
• Combine data conditionally using multiple SET statements.
• Combine data conditionally with the SQL procedure.
• Combine data conditionally using a hash object.
Business Scenario
Some combinations of data are based on a condition.
The data set orion.order_fact contains the
Total_Retail_Price for all values of Order_Date.
orion.order_fact(where=(Order_Date between
'01SEP2007'd and '30SEP2007'd))
Customer_ID  Employee_ID  Street_ID   Order_Date  . . .  Total_Retail_Price  CostPrice_Per_Unit  Discount
928          99999999     9050100016  04SEP2007   . . .  $86.30              $41.40              .
27           99999999     9260105670  05SEP2007   . . .  $78.40              $16.45              .
31           121057       9260128428  06SEP2007   . . .  $50.30              $25.25              .
45           121065       9260104847  06SEP2007   . . .  $78.20              $39.20              .
5            121026       9260114570  09SEP2007   . . .  $52.50              $22.25              .
12           121051       9260103713  18SEP2007   . . .  $87.20              $44.95              .
69           121029       9260116402  20SEP2007   . . .  $23.50              $9.20               .
24           99999999     9260115784  25SEP2007   . . .  $46.10              $19.70              .
41           120195       1600101527  26SEP2007   . . .  $134.00             $28.90              .
11           99999999     3940108592  28SEP2007   . . .  $78.20              $19.65              .
Business Scenario
The data set orion.rates has the average conversion
rate for converting from dollars to euros for the weeks in
September 2007.
orion.rates
SDate EDate AvgRate
01SEP2007 07SEP2007 0.73117
08SEP2007 14SEP2007 0.72184
15SEP2007 21SEP2007 0.71589
22SEP2007 30SEP2007 0.70725
Business Scenario
You need to determine the Total_Retail_Price in euros.
Listing of euros
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate      EDate      AvgRate  EuroPrice
928          04SEP2007   230100600030  $86.30              01SEP2007  07SEP2007  0.73117  € 63.10
27           05SEP2007   240500200082  $78.40              01SEP2007  07SEP2007  0.73117  € 57.32
31           06SEP2007   220200100137  $50.30              01SEP2007  07SEP2007  0.73117  € 36.78
45           06SEP2007   230100600015  $78.20              01SEP2007  07SEP2007  0.73117  € 57.18
5            09SEP2007   210200500016  $52.50              08SEP2007  14SEP2007  0.72184  € 37.90
12           18SEP2007   240200100053  $87.20              15SEP2007  21SEP2007  0.71589  € 62.43
69           20SEP2007   210200700016  $23.50              15SEP2007  21SEP2007  0.71589  € 16.82
24           25SEP2007   240600100102  $46.10              22SEP2007  30SEP2007  0.70725  € 32.60
41           26SEP2007   210200600067  $134.00             22SEP2007  30SEP2007  0.70725  € 94.77
11           28SEP2007   220200100002  $78.20              22SEP2007  30SEP2007  0.70725  € 55.31
Partial PDV
Order_Date SDate EDate AvgRate
04SEP2007 01SEP2007 07SEP2007 0.73117
Condition: Order_Date between SDate and EDate
Partial PDV
Order_Date SDate EDate AvgRate
09SEP2007 01SEP2007 07SEP2007 0.73117
09SEP2007 08SEP2007 14SEP2007 0.72184
8.07 Poll
Can the DATA step merge be used for this task?
Yes
No
To use multiple SET statements in this fashion, both data sets must be sorted in order (ascending or
descending) by the variables tested in the DO WHILE statement.
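The sort requirement can be tried on a small scale. The following is a minimal sketch, not the course program: the tables work.orders and work.rates are hypothetical miniature stand-ins for orion.order_fact and orion.rates, created in-stream so that the step runs without the orion library.

```sas
/* Hypothetical miniature order table (stand-in for orion.order_fact). */
data work.orders;
   input Customer_ID Order_Date :date9. Total_Retail_Price;
   format Order_Date date9.;
   datalines;
12 18SEP2007 87.20
928 04SEP2007 86.30
5 09SEP2007 52.50
27 05SEP2007 78.40
;
run;

/* Hypothetical miniature rate table (stand-in for orion.rates). */
data work.rates;
   input SDate :date9. EDate :date9. AvgRate;
   format SDate EDate date9.;
   datalines;
01SEP2007 07SEP2007 0.73117
08SEP2007 14SEP2007 0.72184
15SEP2007 21SEP2007 0.71589
22SEP2007 30SEP2007 0.70725
;
run;

/* Both inputs are put in ascending order on the variables  */
/* that the DO WHILE condition tests.                       */
proc sort data=work.orders;
   by Order_Date;
run;

proc sort data=work.rates;
   by SDate;
run;

data work.euros;
   set work.orders;
   /* Read forward through the rates only until the current */
   /* period covers Order_Date. Values read by SET are      */
   /* retained, so a period is reused for later orders.     */
   do while (not (SDate le Order_Date le EDate));
      set work.rates;
   end;
   EuroPrice=Total_Retail_Price*AvgRate;
run;
```

Because the second SET statement only ever moves forward, an order date that falls before the current period or outside every period causes the loop to read to the end of work.rates, at which point the DATA step stops. That is why the order of both inputs matters.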
8.08 Quiz
Why do you have to use a WHERE= data set option
rather than a WHERE statement to subset by
Order_Date?
Execution
data euros;
   set orion.order_fact(where=(Order_Date between '01SEP2007'd and
                               '30SEP2007'd)
                        keep=Customer_ID Order_Date Product_ID
                             Total_Retail_Price);
   do while (not (SDate le Order_Date le EDate));
      set orion.rates;
   end;
   EuroPrice=Total_Retail_Price*AvgRate;
   format EuroPrice Euro10.2;
run;
Partial orion.order_fact
Customer_ID  . . .  Order_Date  . . .  Total_Retail_Price  . . .
928          . . .  04SEP2007   . . .  86.30               . . .
27           . . .  05SEP2007   . . .  78.40               . . .
31           . . .  06SEP2007   . . .  50.30               . . .
45           . . .  06SEP2007   . . .  78.20               . . .
5            . . .  09SEP2007   . . .  52.50               . . .
. . .
orion.rates
SDate      EDate      AvgRate
01SEP2007  07SEP2007  0.73117
08SEP2007  14SEP2007  0.72184
15SEP2007  21SEP2007  0.71589
22SEP2007  30SEP2007  0.70725
Execution
The DO WHILE condition is true, so the DO loop executes.
PDV
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate  EDate  AvgRate  EuroPrice  _N_ (D)
928          04SEP2007   230100600030  86.30               .      .      .        .          1
Execution
The DO WHILE condition is false, so the DO loop does not execute.
PDV
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate      EDate      AvgRate  EuroPrice  _N_ (D)
928          04SEP2007   230100600030  86.30               01SEP2007  07SEP2007  0.73117  .          1
Execution
Implicit OUTPUT;
Implicit RETURN;
PDV
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate      EDate      AvgRate  EuroPrice  _N_ (D)
928          04SEP2007   230100600030  86.30               01SEP2007  07SEP2007  0.73117  63.10      1
8.09 Quiz
What variables are set to missing at the top of the DATA
step?
Execution
Initialize PDV.
Execution
The DO WHILE condition is false, so the DO loop does not execute.
PDV
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate      EDate      AvgRate  EuroPrice  _N_ (D)
27           05SEP2007   240500200082  78.40               01SEP2007  07SEP2007  0.73117  .          2
Execution
Continue until ...
Execution
The DO WHILE condition is false, so the DO loop does not execute.
PDV
Customer_ID  Order_Date  Product_ID    Total_Retail_Price  SDate      EDate      AvgRate  EuroPrice  _N_ (D)
5            09SEP2007   210200500016  52.50               08SEP2007  14SEP2007  0.72184  .          5
Execution
Continue until EOF for orion.order_fact.
Obs  Customer_ID  Order_Date  Product_ID  Total_Retail_Price  SDate  EDate  AvgRate  EuroPrice
p308d13
             Date
             Order was                 Total Retail
             placed by                 Price for
Customer ID  Customer    Product ID    This Product  SDate      EDate      AvgRate   EuroPrice
----------------------------------------------------------------------------------------------
928          04SEP2007   230100600030  $86.30        01SEP2007  07SEP2007  0.731167  €63.10
27           05SEP2007   240500200082  $78.40        01SEP2007  07SEP2007  0.731167  €57.32
31           06SEP2007   220200100137  $50.30        01SEP2007  07SEP2007  0.731167  €36.78
45           06SEP2007   230100600015  $78.20        01SEP2007  07SEP2007  0.731167  €57.18
5            09SEP2007   210200500016  $52.50        08SEP2007  14SEP2007  0.72184   €37.90
12           18SEP2007   240200100053  $87.20        15SEP2007  21SEP2007  0.715886  €62.43
69           20SEP2007   210200700016  $23.50        15SEP2007  21SEP2007  0.715886  €16.82
24           25SEP2007   240600100102  $46.10        22SEP2007  30SEP2007  0.70725   €32.60
41           26SEP2007   210200600067  $134.00       22SEP2007  30SEP2007  0.70725   €94.77
11           28SEP2007   220200100002  $78.20        22SEP2007  30SEP2007  0.70725   €55.31
p308d14
data euros;
   length SDate EDate AvgRate 8;
   drop rc;
   format SDate EDate Order_Date date9. EuroPrice Euro10.2;
   if _N_=1 then do;
      /* Load orion.rates into a hash object ordered by SDate */
      declare hash H(dataset: 'orion.rates', ordered: 'ascending');
      H.definekey('SDate');
      H.definedata('SDate','EDate', 'AvgRate');
      H.definedone();
      call missing(SDate, EDate, AvgRate);
      /* The hash iterator walks the rates in ascending SDate order */
      declare hiter E('H');
   end;
   set orion.order_fact(where=(Order_Date between '01SEP2007'd and
                               '30SEP2007'd)
                        keep=Customer_ID Order_Date Product_ID
                             Total_Retail_Price);
   /* Start each order at the first rate period */
   rc=E.first();
   do until (rc ne 0);
      if SDate <= Order_Date <= EDate then do;
         EuroPrice=Total_Retail_Price * AvgRate;
         output;
         leave;
      end;
      /* Periods are in ascending order, so no later period can match */
      else if SDate > Order_Date then leave;
      rc=E.next();
   end;
run;
Exercises
Level 1
10. Combining Two Data Sets Conditionally Using the SQL Procedure
The data set orion.ages_mod contains information about age groups.
Listing of orion.ages_mod
orion.ages_mod SAS Data Set
Description First_Age Last_Age
15-29 years 15 30
30-44 years 30 45
45-59 years 45 60
60-75 years 60 75
a. Use the SQL procedure to create a data set named age_groups that contains the customer ID,
name, age, and age group (the variable Description in the orion.ages_mod SAS data set) as of
January 1, 2008. Order the data by Customer_ID.
Hint: The following calculates the customer age:
int(yrdif(Birth_Date,'01Jan2008'd, 'ACT/ACT'))
Level 2
11. Combining Two Data Sets Conditionally Using the DATA Step DO Loop
The data set orion.ages_mod contains information about age groups.
Listing of orion.ages_mod
orion.ages_mod SAS Data Set
Description First_Age Last_Age
15-29 years 15 30
30-44 years 30 45
45-59 years 45 60
60-75 years 60 75
a. Use the DATA step and a DO loop to create a data set named age_groups that contains the
customer ID, name, age, and age group (the variable Description in the orion.ages_mod
SAS data set) as of January 1, 2008.
b. Print the first five observations of the age_groups data set.
PROC PRINT Output
age_groups
(Partial Output)
Level 3
12. Combining Two Data Sets Conditionally Using the DATA Step Hash Object
The data set orion.ages_mod contains information about age groups.
Listing of orion.ages_mod
orion.ages_mod SAS Data Set
Description First_Age Last_Age
15-29 years 15 30
30-44 years 30 45
45-59 years 45 60
60-75 years 60 75
a. Use the DATA step and a hash object to create a data set named age_groups that contains the
customer ID, name, age, and age group (the variable Description in the orion.ages_mod
SAS data set) as of January 1, 2008.
b. Print the first five observations of the age_groups data set.
PROC PRINT Output
age_groups
(Partial Output)
Chapter Review
1. Given the following input data, what is one difference
in data sets created by the default DATA step MERGE
and the default SQL procedure inner join?
one         two
X  Y        X  Z
1  a        1  f
2  b        3  t
3  c        4  w
Chapter Review
2. When data sets are combined using the SET/SET
KEY= syntax, how is the data set named in the first
SET statement read?
8.5 Chapter Review 8-97
Chapter Review
4. If the following program is used to combine the
summary data set containing one observation with the
detail data set containing fifty observations, how many
observations are in the data set combined?
data combined;
set summary;
set detail;
run;
8.6 Solutions
Solutions to Exercises
1. Merging or Joining Three Data Sets
a. Combine the three data sets to create a data set named purchases that contains the customer
name, product name, and supplier name for the customers in the orion.order_fact data set.
p308s01
/* Merge Solution */
data temp;
merge order_fact(in=O)
orion.customer_dim(keep=Customer_ID Customer_Name
in=C);
by Customer_ID;
if O and C;
run;
data purchases;
keep Customer_Name Product_Name Supplier_Name;
merge temp(in=T)
orion.product_dim(keep=Product_ID Product_Name
Supplier_Name
In=P);
by Product_ID;
if P and T;
run;
data purchases
no_products(keep=Product_ID Product_Name
Supplier_Name);
merge temp(in=T)
orion.product_dim(keep=Product_ID Product_Name
Supplier_Name
in=P);
by Product_ID;
if P and T then output purchases;
else if P and not T then output no_products;
run;
proc sql;
create table no_purchases as
select Customer_ID,
Customer_Name
from orion.customer_dim
where customer_dim.Customer_ID not in
(select Customer_ID from orion.order_fact);
create table no_products as
select Product_ID, Product_Name
from orion.product_dim
where product_dim.Product_ID not in
(select Product_ID from orion.order_fact);
create table purchases as
select order_fact.*,
Customer_Name,
Product_Name,
Supplier_Name
from orion.order_fact,
orion.product_dim,
orion.customer_dim
where order_fact.Customer_ID=customer_dim.Customer_ID
and order_fact.Product_ID=product_dim.Product_ID
order by order_fact.Product_ID;
quit;
/* Merge Solution */
data manager1;
merge man1(in=M)
emp_addresses(keep=Manager_Level1 Manager1_Name);
by Manager_Level1;
if M;
run;
data manager2;
merge man2(in=M)
emp_addresses(rename=(Manager_Level1=Manager_Level2
Manager1_Name=Manger2_Name)
keep=Manager_Level1 Manager1_Name);
by Manager_Level2;
if M;
run;
data manager3;
merge man3(in=M)
emp_addresses(rename=(Manager_Level1=Manager_Level3
Manager1_Name=Manger3_Name)
keep=Manager_Level1 Manager1_Name);
by Manager_Level3;
if M;
run;
data manager4;
merge man4(in=M)
emp_addresses(rename=(Manager_Level1=Manager_Level4
Manager1_Name=Manger4_Name)
keep=Manager_Level1 Manager1_Name);
by Manager_Level4;
if M;
run;
data manager5;
merge man5(in=M)
emp_addresses(rename=(Manager_Level1=Manager_Level5
Manager1_Name=Manger5_Name)
keep=Manager_Level1 Manager1_Name);
by Manager_Level5;
if M;
run;
data manager_names;
merge man6(in=M)
emp_addresses(rename=(Manager_Level1=Manager_Level6
Manager1_Name=Manger6_Name)
keep=Manager_Level1 Manager1_Name);
by Manager_Level6;
if M;
run;
data sales_emps;
set orion.salesstaff;
set orion.organization_dim(keep=Employee_ID Department
Section Org_Group)
key=Employee_ID;
if _IORC_=0;
run;
b. Print the first five observations of the sales_emps SAS data set.
p308s04
proc print data=sales_emps(obs=5);
title 'Sales Employee Data';
title2 '(Partial Output)';
run;
The errors data can then be used to determine why these vendors do not have
observations in price_list.
e. Delete the Product_ID index on the data set orion.shoe_prices.
p308s05
proc datasets lib=orion nolist;
   modify shoe_prices;
   index delete Product_ID;
run;
quit;
/**************************************/
/* If you keep the supplier_country */
/* variable, uncomment and submit */
/* the following options statement */
/* to avoid an error */
/**************************************/
*options nofmterr;
c. Use the %SYSRC AUTOCALL macro described in the reference information in this chapter. In
addition, refer to SAS OnlineDoc by following the path shown below:
Support & Training → Knowledge Base → Documentation → Base SAS →
SAS 9.2 Macro Language: Reference → Macro Language Dictionary → AutoCall Macros
p308s06
data processed_orders;
   set orion.first_internet_order;
   set orion.internet key=Order_ID;
   length Comment $30;
   select (_IORC_);
      when (%sysrc(_sok)) do;
         Comment='Order has been processed.';
         output;
      end;
      when (%sysrc(_dsenom)) do;
         _ERROR_=0;
         Comment='Order has not been processed.';
         output;
      end;
      otherwise;
   end;
run;
d. Print the first 10 observations of processed_orders.
p308s06
proc print data=processed_orders(obs=10);
title 'Internet Orders';
title2 '(Partial Output)';
run;
7. Combining Summary Data Containing an Average with Detail Data
a. Calculate the average age of all customers.
b. Create a SAS data set named age_dif, which combines the average age of all customers with the
orion.customer_dim data set in order to determine the difference between each customer’s age
and the average for all customers. (You can use any method presented in this section.)
p308s07
/* Using PROC SUMMARY and the DATA step */
data age_dif;
if _N_=1 then set average(keep=AvgAge);
set orion.customer_dim(keep=Customer_ID Customer_Age);
Age_Difference=Customer_Age - AvgAge;
run;
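The step that creates the average data set is not shown in this listing. As a minimal self-contained sketch of the same pattern, the following uses sashelp.class as a stand-in for orion.customer_dim; the names work.average, work.age_dif, and AvgAge are chosen here for illustration and are assumptions, not the course program.

```sas
/* Summarize once: one observation holding the mean of Age. */
proc summary data=sashelp.class;
   var Age;
   output out=work.average(keep=AvgAge) mean=AvgAge;
run;

data work.age_dif;
   /* Read the single summary observation on the first      */
   /* iteration only. Its value is retained for every row.  */
   if _N_=1 then set work.average;
   set sashelp.class(keep=Name Age);
   Age_Difference=Age - AvgAge;
run;

proc print data=work.age_dif(obs=5);
run;
```

Reading the summary data set conditionally on _N_=1 keeps the DATA step from stopping after one observation, which is what would happen if both SET statements executed on every iteration.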
proc sql;
create table age_dif as
select AvgAge,
Customer_ID,
Customer_Age,
Customer_Age - AvgAge as Age_Difference
from orion.customer_dim,
average;
quit;
proc sql;
create table age_dif as
select mean(Customer_Age) as AvgAge,
Customer_ID,
Customer_Age,
Customer_Age - calculated AvgAge as Age_Difference
from orion.customer_dim;
quit;
data age_dif;
drop i Tot_Age;
if _N_=1 then do i=1 to TotObs;
set orion.customer_dim(keep=Customer_Age) nobs=TotObs;
Tot_Age + Customer_Age;
end;
set orion.customer_dim(keep=Customer_ID Customer_Age);
AvgAge=Tot_Age / TotObs;
Age_Difference=Customer_Age - AvgAge;
run;
c. Print the first five observations of the age_dif SAS data set.
p308s07
proc print data=age_dif(obs=5);
var AvgAge Customer_ID Customer_Age Age_Difference;
title 'The age_dif Data Set';
title2 '(Partial Output)';
run;
data donations;
set orion.employee_donations;
Total_Donation=sum(of Qtr1 - Qtr4);
run;
data compare;
if _N_=1 then set totals;
set donations;
Difference=Total_Donation - Avg_Donation;
run;
proc sql;
create table compare as
select Avg_Donation,
donations.*,
Total_Donation - Avg_Donation as Difference
from totals,
donations;
quit;
/* Using PROC SQL only */
proc sql;
create table compare as
select mean(sum(Qtr1, Qtr2, Qtr3, Qtr4)) as Avg_Donation,
employee_donations.*,
sum(Qtr1, Qtr2, Qtr3, Qtr4) as Total_Donation,
calculated Total_Donation - calculated Avg_Donation
as Difference
from orion.employee_donations;
quit;
data compare;
drop i;
if _N_=1 then do i=1 to TotObs;
set orion.employee_donations(keep=Qtr1 - Qtr4)
nobs=TotObs;
Total + sum(of Qtr1 - Qtr4);
end;
set orion.employee_donations;
Total_Donation=sum(of Qtr1 - Qtr4);
Avg_Donation=Total / TotObs;
Difference=Total_Donation-Avg_Donation;
run;
b. Print the first five observations of the compare SAS data set.
p308s08
proc print data=compare(obs=5);
var Avg_Donation Employee_ID Qtr1 Qtr2 Qtr3 Qtr4
Recipients Paid_By Total_Donation Difference;
title 'The compare Data Set';
title2 '(Partial Output)';
run;
proc sql;
create table products as
select Customer_ID,
CostPrice_Per_Unit,
Quantity,
Product_Name,
(Quantity * CostPrice_Per_Unit) / Total_Cost as
Percent format=percent9.3
from totals,
orion.order_fact,
orion.product_dim
where order_fact.Product_ID=product_dim.Product_ID;
quit;
proc sql;
create table products as
select Customer_ID,
CostPrice_Per_Unit,
Quantity,
Product_Name,
(Quantity * CostPrice_Per_Unit)/
sum(Quantity * CostPrice_Per_Unit)as Percent
format=percent9.3
from orion.order_fact,
orion.product_dim
where order_fact.Product_ID=product_dim.Product_ID;
quit;
10. Combining Two Data Sets Conditionally Using the SQL Procedure
a. Use the SQL procedure to create a data set named age_groups.
p308s10
proc sql;
create table age_groups as
select Customer_ID,
Customer_Name,
int(yrdif(Birth_Date, '01Jan2008'd, 'ACT/ACT')) as Age,
Description
from orion.customer,
orion.ages_mod
where calculated Age between First_Age and Last_Age
order by Customer_ID;
quit;
b. Print the first five observations of the age_groups data set.
p308s10
proc print data=age_groups(obs=5);
title 'age_groups';
title2 '(Partial Output)';
run;
11. Combining Two Data Sets Conditionally Using the DATA Step DO Loop
a. Use the DATA step and a DO loop to create a data set named age_groups.
p308s11
proc sort data=orion.customer(keep=Customer_ID Birth_Date
                                   Customer_Name)
          out=customer;
   by descending Birth_Date;
run;
data age_groups;
   keep Customer_ID Customer_Name Age Description;
   set customer;
   Age=int(yrdif(Birth_Date, '01Jan2008'd, 'ACT/ACT'));
   do while (not (First_Age le Age lt Last_Age));
      set orion.ages_mod;
   end;
run;
b. Print the first five observations of the age_groups data set.
p308s11
proc print data=age_groups(obs=5);
title 'age_groups';
title2 '(Partial Output)';
run;
12. Combining Two Data Sets Conditionally Using the DATA Step Hash Object
a. Use the DATA step and a hash object to create a data set named age_groups that contains the
customer ID, name, age, and age group (the variable Description in the orion.ages_mod
SAS data set) as of January 1, 2008.
p308s12
/* Using the DATA Step Hash Object */
data age_groups;
keep Customer_ID Customer_Name Age Description;
if _N_=1 then do;
if 0 then set orion.ages_mod;
declare hash AG(dataset: 'orion.ages_mod',
ordered: 'ascending');
AG.definekey('First_Age');
AG.definedata('First_Age', 'Last_Age', 'Description');
AG.definedone();
declare hiter A('AG');
end;
/* alternative solution */
Chapter 9 Sorting SAS Data Sets
Objectives
List the reasons for sorting data.
Define the SAS sort.
Define threading.
Calculate the workspace and library space required to sort a SAS data file.
Allocate sort workspace.
Use the EQUALS|NOEQUALS option.
Use the SORTEDBY= option.
Use the PRESORTED option.
Change the collating sequence of the SORT procedure.
Threading Terminology
In SAS®9, the SORT procedure is multi-threaded.
thread: a single, independent flow of control through a program or within a process
symmetric multiprocessing machines (SMPs): computers with multiple CPUs that share the same memory and a thread-enabled operating system, providing the ability to spawn and process multiple threads simultaneously
parallel processing: multiple units of work scheduled for concurrent execution by the operating system
9.1 Using the SORT Procedure 9-5
[Figure: SORT, partial results, collate process]
Multi-Threaded Processing
Threading can be enabled or disabled for the following
Base SAS procedures:
PROC MEANS/SUMMARY
PROC REPORT
PROC SORT
PROC TABULATE
When you benchmark using the threaded procedures, use the real-time statistic rather than the
CPU-time statistic. The back-end collating process to re-create the single data set might result in
an increase in total CPU time, while reducing wall-clock time (time from submission of code for
execution to return of results).
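A comparison of this kind can be sketched as follows. The table work.big and its variables are hypothetical test data generated for the example; THREADS and NOTHREADS are the PROC SORT options that request or suppress threaded sorting.

```sas
/* FULLSTIMER writes real time and CPU time to the log. */
options fullstimer;

/* Hypothetical test table: one million rows with a random key. */
data work.big;
   call streaminit(2010);
   do Id=1 to 1000000;
      Key=rand('uniform');
      output;
   end;
run;

/* Request a threaded sort. */
proc sort data=work.big out=work.sorted_threads threads;
   by Key;
run;

/* The same sort with threading turned off, for comparison. */
proc sort data=work.big out=work.sorted_nothreads nothreads;
   by Key;
run;
```

In the log, compare the real time values of the two PROC SORT steps rather than the cpu time values. The actual numbers depend on the machine, and a threaded sort only helps when more than one CPU is available to SAS.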
If the TAGSORT option is used with PROC SORT, threading is disabled. The TAGSORT option
stores only the BY variables and the observation numbers (named tags) in temporary files. At the
completion of the sorting process, PROC SORT uses the tags to retrieve records from the input
data set in sorted order.
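A minimal sketch of a TAGSORT request is shown below. The table work.wide is hypothetical: a short BY variable next to a long filler variable, which is the situation in which sorting only the tags might save temporary space.

```sas
/* Hypothetical table: short key, long observation. */
data work.wide;
   length Filler $200;
   Filler='x';
   do Id=1000 to 1 by -1;
      output;
   end;
run;

/* TAGSORT sorts only the BY variable and the observation  */
/* numbers, then retrieves the full records in that order. */
proc sort data=work.wide out=work.wide_sorted tagsort;
   by Id;
run;
```

The trade-off is that TAGSORT typically reduces temporary disk use at the cost of longer elapsed time, and it prevents the threaded sort from being used.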
9.01 Quiz
Open and submit the program p309a01.
How many CPUs are available in your SAS session?
The SAS Administrator might limit the number of CPUs that are available for SAS processing, so
the value ACTUAL might be less than the total number of CPUs in the machine that SAS is using.
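One way to inspect these settings is sketched below. This is an illustration and not necessarily what program p309a01 does.

```sas
/* Write the current CPUCOUNT and THREADS settings to the log. */
proc options option=cpucount;
run;

proc options option=threads;
run;

/* The same value is available to macro code through GETOPTION. */
%put NOTE: CPUCOUNT=%sysfunc(getoption(cpucount));
```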
9.02 Poll
Have you ever run out of space during a sort?
Yes
No
Reference Information
The formula below calculates the estimated amount of space needed by a single-threaded PROC SORT:
bytes required=((4 * obslen) + (2 * keylen)) * numobs
The formula below calculates the estimated amount of space needed by a multi-threaded PROC SORT:
bytes required=3 * (obslen * numobs)
The space calculation for the SAS 8.2 sort is as follows:
bytes required=(keylen + obslen) * numobs * N
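The first two formulas can be evaluated with a short DATA step. The values below are made up for illustration, and the reading of the names (obslen as the observation length in bytes, keylen as the combined length of the BY variables in bytes, numobs as the number of observations) is an assumption based on the names themselves.

```sas
data _null_;
   obslen=120;       /* assumed observation length in bytes        */
   keylen=8;         /* assumed length of the BY variables in bytes */
   numobs=1000000;   /* assumed number of observations             */

   /* Single-threaded estimate: ((4 * obslen) + (2 * keylen)) * numobs */
   single_threaded=((4 * obslen) + (2 * keylen)) * numobs;

   /* Multi-threaded estimate: 3 * (obslen * numobs) */
   multi_threaded=3 * (obslen * numobs);

   put single_threaded= comma15. multi_threaded= comma15.;
run;
```

With these assumed values, the single-threaded estimate is (480 + 16) * 1,000,000 = 496,000,000 bytes and the multi-threaded estimate is 3 * 120,000,000 = 360,000,000 bytes.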
options fullstimer;
proc sort data=orion.order_fact
          sortsize=max;
   by Order_Date;
run;
p309d02
Sorted Data
When data is sorted by SAS, the descriptor contains the
following information:
1. a sort indicator that contains the variable(s) on which
the data is sorted
2. whether the sort is validated by SAS
3. the character set used
Additional information contained in the descriptor portion
includes the following:
4. the collating sequence used for ordering the data
5. collation rules, if the data set is sorted linguistically
Sorted Data
Partial PROC CONTENTS Output
The CONTENTS Procedure
Sort Information
Sortedby Emp_Hire_Date
Validated YES
Character Set ANSI
p309d03
ANSI (American National Standards Institute) is an organization in the United States that coordinates
voluntary standards and conformity to those standards. ANSI works with the ISO (International
Organization for Standardization) to establish global standards.
By default, the ANSI character set uses ASCII (American Standard Code for Information Interchange) for
the Windows and UNIX operating environments and EBCDIC (Extended Binary Coded Decimal
Interchange Code) for z/OS.
data-set-name(SORTEDBY=BY-clause | _NULL_ )
data january(sortedby=Order_Date);
infile M1 dlm=',';
input Customer_ID Order_ID Order_Type
Order_Date : date9.
Delivery_Date : date9.;
run;
p309d04
<lines removed>
Sort Information
Sortedby Order_Date
Validated NO
Character Set ANSI
p309d04
Log
1197 proc sort data=january;
1198 by Order_Date;
1199 run;
p309d05
p309d06
In SAS 9.2, the SORTVALIDATE system option specifies whether the SORT procedure verifies that a
data set is sorted according to the variables in the BY statement when the sort indicator metadata
designates a user-specified sort order. NOSORTVALIDATE is the default.
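The effect of SORTVALIDATE can be tried on a small data set whose sort order is asserted with SORTEDBY=. This is a sketch under stated assumptions: the data set name january_demo and its three observations are made up for illustration and are not part of the course data.

```sas
/* Assert (without verifying) that the data is sorted by Order_Date */
data january_demo(sortedby=Order_Date);
   input Order_ID Order_Date : date9.;
   format Order_Date date9.;
   datalines;
1 02JAN2007
2 05JAN2007
3 09JAN2007
;
run;

/* Ask PROC SORT to verify a user-asserted sort order */
options sortvalidate;

proc sort data=january_demo;
   by Order_Date;
run;

/* Restore the default setting */
options nosortvalidate;

/* Check the Validated value in the Sort Information section */
proc contents data=january_demo;
run;
```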
Sortedby Emp_Hire_Date
Validated YES
Character Set ANSI
p309d06
p309d05
proc sort data=january;
by Order_Date;
run;
Sort Order
The character set determines the sort order of a particular
character in relation to other characters. By default,
PROC SORT uses one of the following collating
sequences, depending on the environment under which
the procedure is running:
ASCII (Windows and UNIX)
EBCDIC (z/OS)
The options EBCDIC, ASCII, NATIONAL, DANISH, SWEDISH, and REVERSE can change the default
collating sequence.
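As an illustration of overriding the default collating sequence, the sketch below sorts a small made-up data set (letters) with the EBCDIC option. In the EBCDIC sequence, lowercase letters sort before uppercase letters and digits sort last, which is the reverse of the ASCII arrangement.

```sas
/* Hypothetical sample data with digits, uppercase, and lowercase values */
data letters;
   input Code $;
   datalines;
b
A
2
a
B
1
;
run;

/* Sort with the EBCDIC collating sequence instead of the default */
proc sort data=letters out=letters_ebcdic ebcdic;
   by Code;
run;

proc print data=letters_ebcdic;
run;
```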
9.06 Quiz
The data set orion.salesstaff was previously sorted in
ascending order by Employee_Hire_Date.
Open and submit program p309a02.
proc sort data=orion.salesstaff
out=salesstaff;
by Employee_ID;
run;
proc print data=salesstaff;
where Employee_ID=120134;
run;
Variables Summary
Observation Summary
First Obs 1 1
First Unequal 14 14
Last Unequal 49 49
Last Obs 163 163
SORTSEQ= collating sequences: EBCDIC, DANISH, FINNISH, NORWEGIAN, POLISH, SWEDISH, NATIONAL
Refer to the “Collating Sequence” chapter of the SAS National Language Support (NLS): User’s
Guide for detailed information about the various collating sequences and when they are used.
63 US . . . 25 Briarforest Pl . . .
4 Burke Street
195 AU . . . . . .
Woolloongabba
215 AU . .. 23 Benjamin Street . ..
UPPER sorts uppercase letters first, and then the lowercase letters.
LOWER sorts lowercase letters first, and then the uppercase letters.
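UPPER and LOWER are values of the CASE_FIRST= suboption of SORTSEQ=LINGUISTIC. The following sketch uses a small made-up data set (mixed_case) to show where the suboption is written; it is an illustration, not one of the course programs.

```sas
/* Hypothetical sample data with mixed-case values */
data mixed_case;
   input Code $;
   datalines;
b
A
a
B
;
run;

/* Linguistic sort that places uppercase letters before lowercase letters */
proc sort data=mixed_case out=upper_first
          sortseq=linguistic(case_first=upper);
   by Code;
run;

proc print data=upper_first;
run;
```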
Sort Information
Sortedby Customer_Address
Validated YES
Character Set ANSI
Collating Sequence LINGUISTIC
Sort Information
Locale en_US
Strength 3
Numeric Collation ON
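Sort information like that shown above can be produced by a linguistic sort with numeric collation turned on. The sketch below assumes a made-up data set (street_sample); with NUMERIC_COLLATION=ON, digit sequences are compared as numbers, so that an address that starts with 9 is expected to sort before one that starts with 10.

```sas
/* Hypothetical sample addresses that begin with numbers */
data street_sample;
   infile datalines truncover;
   input Address $30.;
   datalines;
10 Main Street
9 Main Street
100 Main Street
;
run;

/* Linguistic sort that treats embedded digits as numeric values */
proc sort data=street_sample out=street_sorted
          sortseq=linguistic(numeric_collation=on);
   by Address;
run;

/* The Sort Information section reports the collating sequence used */
proc contents data=street_sorted;
run;
```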
51 79 9658 Dinwiddie Ct US
52 544 A Blok No: 1 TR
53 1100 A Blok No: 1 TR
54 1684 A Blok No: 1 TR
55 2618 Arnold Road 2 ZA
56 65 Bahnweg 1 DE
57 2550 Bryanston Drive 122 ZA
58 42 Carl Von Linde Str. 13 DE
59 11 Carl-Zeiss-Str. 15 DE
60 1033 Fahrettin Kerim Gokay Cad. No. 24 TR
61 2788 Fahrettin Kerim Gokay Cad. No. 30 TR
62 19 Hechtsheimerstr. 18 DE
63 50 Humboldtstr. 1 DE
64 13 Iese 1 DE
65 9 Kallstadterstr. 9 DE
66 908 Mayis Cad. Nova Baran Plaza Ka 11 TR
67 14703 Mivtza Boulevard 17 IL
68 12386 Mivtza Kadesh St 16 IL
69 19873 Mivtza Kadesh St 18 IL
70 14104 Mivtza Kadesh St 25 IL
71 19444 Mivtza Kadesh St 61 IL
72 3959 Moerbei Avenue 120 ZA
73 33 Münsterstraße 67 DE
74 61 Münzstr. 28 DE
75 16 Oberstr. 61 DE
76 2806 Quinn Street 11 ZA
77 928 Turkcell Plaza Mesrutiyet Cad. 142 TR
Exercises
Level 1
c. Submit a PROC CONTENTS step to determine whether the data set holidays is sorted by Date.
d. Change the BY variable to Holiday_Name and resubmit the PROC SORT step.
What is the resulting message in the log?
e. Submit a PROC CONTENTS step to determine whether the data set holidays is sorted by
Holiday_Name.
Level 2
d. Use PROC PRINT to create a report grouped by the Company variable. Print the first 24
observations of the data set.
Partial PROC PRINT Output
------------------------- Company=Logistics --------------------------
e. Use the PROC SORT PRESORTED option to turn on the validated flag and use
PROC CONTENTS to verify that the sort flag is set in the descriptor portion.
Level 3
Objectives
Define BY-group processing.
Use indexes to return the data in sorted order.
Use indexes to combine data horizontally.
Use a format to group data for BY-group processing.
Use a CLASS statement.
Specify a user-asserted sort order.
BY-Group Processing
BY-group processing has these characteristics:
is a method of processing observations that are
grouped or ordered by the values of the BY variables
can be used in both DATA and PROC steps
user-sort assertion
a CLASS statement
p309d10
9.2 BY-Group Processing (Self-Study) 9-35
Total_Retail_
Obs Customer_ID Product_ID Quantity Price
62 4 240600100017 1 $53.00
70 4 220101400145 1 $16.70
79 4 240700100011 3 $80.97
111 4 230100100053 2 $92.60
Total_Retail_
Obs Customer_ID Product_ID Quantity Price
37 5 240100100433 1 $3.00
48 5 220101400276 2 $136.80
88 5 240300200018 1 $87.20
89 5 240300300071 1 $138.00
148 5 220101400265 2 $74.20
149 5 220101400387 4 $50.40
Business Scenario
The data set orion.street_code contains the Street_ID
and the name, city, and country for the streets.
Business Scenario
The SAS data set orion.order_fact contains the
information needed for the delivery of products to Orion
customers. There is no index on Street_ID in the
orion.order_fact data set.
Partial Listing of orion.order_fact
Customer_ID   Street_ID    Delivery_Date   Order_ID     Quantity
         63   9260125492   11JAN2003       1230058123          1
          5   9260114570   19JAN2003       1230080101          1
         45   9260104847   22JAN2003       1230106883          1
         41   1600101527   28JAN2003       1230147441          2
        183   1600100760   27FEB2003       1230315085          3
         79   9260101874   03MAR2003       1230333319          1
         23   9260126679   08MAR2003       1230338566          1
         23   9260126679   08MAR2003       1230338566          2
         45   9260104847   11MAR2003       1230371142          2
Business Scenario
Combine the two data sets to expedite delivery of
customer orders.
1. Create an index on Street_ID in orion.street_code.
2. Use SET/SET with KEY=Street_ID to combine the data
sets.
Partial Listing of addresses
Customer_ID   Street_ID    . . .   Street_Name      Postal_Code   City_Name
         63   9260125492   . . .   Briarforest Pl   62201         St. Clair
data addresses;
set orion.order_fact(keep=Customer_ID Street_ID
Delivery_Date Order_ID Product_ID Quantity);
set orion.street_code(keep=Street_ID Country Street_Name
City_Name Postal_Code)
key=Street_ID / unique;
run;
Using Indexes
proc datasets library=orion nolist;
   modify street_code;
   index create Street_ID;
quit;

data addresses;
   set orion.order_fact(keep=Customer_ID
                             Street_ID
                             Delivery_Date
                             Order_ID
                             Product_ID
                             Quantity);
   set orion.street_code(keep=Street_ID
                              Country
                              Street_Name
                              City_Name
                              Postal_Code)
       key=Street_ID / unique;
run;

orion.order_fact is read sequentially.
orion.street_code is read by accessing the index and directly accessing the
appropriate observation.
p309a03
The UNIQUE option causes a KEY= search to use the first matching observation from the
indexed data set, if there are duplicates.
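When a KEY= search finds no match, SAS sets the automatic variable _IORC_ to a nonzero value and sets _ERROR_ to 1, and the variables read from the indexed data set keep the values from the previous successful search. The sketch below shows one common way to handle that case. It assumes that the orion libref is assigned and that the index on Street_ID already exists; the output data set name addresses_checked is made up for illustration.

```sas
data addresses_checked;
   set orion.order_fact(keep=Customer_ID Street_ID
       Delivery_Date Order_ID Product_ID Quantity);
   set orion.street_code(keep=Street_ID Country
       Street_Name City_Name Postal_Code)
       key=Street_ID / unique;
   /* A nonzero _IORC_ means that no matching Street_ID was found */
   if _IORC_ ne 0 then do;
      _ERROR_=0;   /* suppress the data error in the log */
      /* clear values left over from the previous successful search */
      call missing(Country, Street_Name, City_Name, Postal_Code);
   end;
run;
```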
9.11 Quiz
Open and submit the program p309a03.
Are there any data errors in the log?
data addresses;
set orion.order_fact(keep=Customer_ID Street_ID
Delivery_Date Order_ID Product_ID Quantity);
set orion.street_code(keep=Street_ID Country
Street_Name City_Name Postal_Code)
key=Street_ID / unique;
run;
9.12 Quiz
Open and submit the program p309a04.
1. What does the SAS log show for this DATA step?
2. Why do you get those error messages?
data addresses2;
set orion.order_fact(keep=Customer_ID Street_ID
Delivery_Date Order_ID Product_ID Quantity);
set orion.street_code(keep=Street_ID Country
Street_Name City_Name Postal_Code)
key=Street_ID;
run;
The data values in the addresses data set and the data
values in the addresses2 data sets are equal.
Observation Summary
First Obs 1 1
Last Obs 617 617
NOTE: No unequal values were found. All values compared are exactly equal.
p309a04
Business Scenario
You must print the data set orion.shoe_vendors
by Group_Name for the vendors where the variable
Mfg_Suggested_Retail_Price is greater than $100.
Mfg_Suggested_Retail_Price>100
Mfg_Suggested_
Obs Group_Name Supplier_Name Category_Name Retail_Price
p309d11
Supplier_ Mfg_Suggested_
Obs Name Category_Name Retail_Price
N = 4
Supplier_ Mfg_Suggested_
Obs Name Category_Name Retail_Price
N = 3
Mfg_Suggested_
Obs Supplier_Name Category_Name Retail_Price
The BYSORTED SAS system option can be used to affect how SAS treats all SAS data sets.
The BYSORTED SAS system option has the following characteristics:
• specifies that observations in a data set or data sets are sorted in alphabetic or numeric order
• should be used if the data set is ordered by the BY variable
OPTIONS BYSORTED;
If observations with the same BY value are grouped together but are not necessarily sorted in alphabetic
or numeric order, use the NOBYSORTED option.
OPTIONS NOBYSORTED;
When the NOBYSORTED option is specified, you do not have to specify the NOTSORTED
option in a BY statement to access grouped data.
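The difference between sorted and merely grouped data can be seen with a small example. In this sketch the data set grouped is made up: its Region values are grouped together but are not in alphabetic order, so the BY statement needs the NOTSORTED option (or the NOBYSORTED system option must be in effect).

```sas
/* Hypothetical data: grouped by Region, but not in alphabetic order */
data grouped;
   input Region $ Sales;
   datalines;
West 10
West 20
East 5
East 15
North 30
;
run;

/* NOTSORTED lets BY-group processing accept the existing grouping */
proc print data=grouped;
   by Region notsorted;
run;
```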
9.14 Quiz
Open and submit the program p309a06.
proc print data=orion.order_fact(obs=10);
title 'Using the NOTSORTED Option with Ungrouped Data';
by Customer_ID notsorted;
var Order_ID Order_Date Delivery_Date Quantity --
CostPrice_Per_Unit;
run;
Business Scenario
Create a SAS data set that contains the total quantity of items sold each year.
The values of Order_Date and Quantity are in the data set orion.order_fact.
Partial Listing of orion.order_fact
Order_Date   Quantity
11JAN2003           1
15JAN2003           1
20JAN2003           1
28JAN2003           2
27FEB2003           3
02MAR2003           1
03MAR2003           1
03MAR2003           2
09MAR2003           2
09MAR2003           1
15MAR2003           2
p309d12
The GROUPFORMAT option enables the BY statement to use the YEAR4. format to create
FIRST.Order_Date and LAST.Order_Date.
The NOTSORTED option can be used with the GROUPFORMAT option if the data is grouped,
but not sorted.
9.15 Quiz
Open and submit the program p309a07.
data yr_totals;
keep Order_Date YrTot;
set orion.order_fact(keep=Order_Date
Quantity);
format Order_Date year4.;
by groupformat Order_Date;
if first.Order_Date then YrTot=0;
YrTot + Quantity;
if last.Order_Date;
run;
proc print data=yr_totals;
title 'Total Quantity Sold each Year';
run;
Advantages
• can be used to create ordered/grouped reports without sorting the data
• causes the DATA step to process formatted BY values in the same way that SAS
procedures do
• frequently eliminates the need for another step
Disadvantages
• requires that the data set be sorted by the GROUPFORMAT variable or grouped by
the formatted values of the GROUPFORMAT variable
• available only in the DATA step
PROC TABULATE
PROC SUMMARY
PROC UNIVARIATE
Reference Information
ORDER= specifies the order in which to group the levels of the class variables in the output.
The values for ORDER= can be any of the following:
INTERNAL orders values by ascending unformatted values. The INTERNAL order yields
the same order as the SORT procedure. The order depends on your operating
environment. This sort sequence is particularly useful for displaying dates
chronologically. The term UNFORMATTED is an alias for INTERNAL. INTERNAL is the
default order.
DATA orders values according to their order in the input data set.
FORMATTED orders values by the ascending formatted values. This order depends on
your operating environment.
FREQ orders values by descending frequency count.
GROUPINTERNAL specifies not to apply formats to the class variables when the MEANS,
SUMMARY, or TABULATE procedures group the values to create combinations
of class variables.
MISSING considers missing values as valid class variable levels. Special missing values that
represent numeric values (the letters A through Z and the underscore (_)
character) are each considered as a separate value.
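The CLASS statement options described above are written after a slash. The following sketch uses sashelp.class, a sample data set that ships with SAS, rather than the course data; it groups by Sex without a prior sort, orders the class levels by descending frequency, and treats missing values as a valid level.

```sas
/* No PROC SORT step is needed before a CLASS statement */
proc means data=sashelp.class mean maxdec=1;
   class Sex / order=freq missing;
   var Height;
run;
```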
9.16 Quiz
1. Open and submit the program p309a08.
2. Change the BY statement to a CLASS statement and
resubmit the program.
3. Are the statistics created with a CLASS statement
equal to those created with a BY statement?
The SUMSIZE= option is available as both a SAS system option and as a PROC statement option.
Exercises
Level 1
Level 2
Total Retail Price by Customer Age Group

Supplier Name           15-30 years  31-45 years  46-60 years
3Top Sports                8,923.17     8,728.44     4,631.40
Greenline Sports Ltd       1,232.00     1,767.18     1,474.08
Pro Sportswear Inc         5,684.60     2,863.30     2,623.10
Top Sports                       $0       355.20           $0
Triple Sportswear Inc            $0        18.20        18.20
Page 2
Products by Sales Supplier and Customer Age Group
Total Retail Price by Customer Age Group

Supplier Name           15-30 years  31-45 years  46-60 years  61-75 years
3Top Sports                8,211.90     7,512.20     6,225.16     4,445.20
Greenline Sports Ltd       1,255.80       998.30       875.58     1,423.90
Pro Sportswear Inc         5,797.40     2,115.50     2,148.30     2,446.60
Top Sports                       $0        88.80           $0           $0
Triple Sportswear Inc        161.80     1,001.00       254.50        53.80
c. Modify the program so that the PROC SORT step is not necessary and the PROC TABULATE
step reads from the data set orion.purchased_products. Resubmit the program.
If necessary, consult SAS OnlineDoc or the SAS Help facility about PROC TABULATE
in order to determine what changes must be made to the program.
PROC TABULATE Output
Products by Sales Supplier and Customer Age Group
Order Type 3
Total Retail Price by Customer Age Group

Supplier Name           15-30 years  31-45 years  46-60 years  61-75 years
3Top Sports                8,923.17     8,728.44     4,631.40           $0
Greenline Sports Ltd       1,232.00     1,767.18     1,474.08           $0
Pro Sportswear Inc         5,684.60     2,863.30     2,623.10           $0
Top Sports                       $0       355.20           $0           $0
Triple Sportswear Inc            $0        18.20        18.20           $0
Page 2
Products by Sales Supplier and Customer Age Group
Order Type 2
Total Retail Price by Customer Age Group

Supplier Name           15-30 years  31-45 years  46-60 years  61-75 years
3Top Sports                8,211.90     7,512.20     6,225.16     4,445.20
Greenline Sports Ltd       1,255.80       998.30       875.58     1,423.90
Pro Sportswear Inc         5,797.40     2,115.50     2,148.30     2,446.60
Top Sports                       $0        88.80           $0           $0
Triple Sportswear Inc        161.80     1,001.00       254.50        53.80
Level 3
b. Open the program p309e06, add a PROC SORT step to create a data set named sorted_sort that
is sorted by Name, and submit the program. Record the usage statistics.
CPU
Memory
I/O
c. Open the program p309e06, add a PROC SQL step to create a new table in sorted order for Name
from temp, and submit the program. Record the usage statistics.
CPU
Memory
I/O
d. Open the program p309e06, write a DATA step using a hash object to sort the data, and submit
the program. Record the usage statistics.
CPU
Memory
I/O
Chapter Review
1. Define a threaded sort.
9.4 Solutions
Solutions to Exercises
1. Using the PRESORTED Option
a. Use PROC CONTENTS to determine whether the data set orion.holidays is sorted.
p309s01
proc contents data=orion.holidays;
run;
b. Write a PROC SORT step to sort the data orion.holidays by Date. Create a temporary data set
named holidays. Use the PRESORTED option in the PROC SORT statement.
What is the resulting message in the log?
p309s01
proc sort data=orion.holidays out=holidays presorted;
by Date;
run;
c. Submit a PROC CONTENTS step to determine whether the data set holidays is sorted by Date.
p309s01
proc contents data=holidays;
run;
d. Change the BY variable to Holiday_Name and resubmit the PROC SORT step.
What is the resulting message in the log?
p309s01
proc sort data=orion.holidays out=holidays presorted;
by Holiday_Name;
run;
e. Submit a PROC CONTENTS step to determine if the data set holidays is sorted by
Holiday_Name.
p309s01
proc contents data=holidays;
run;
2. Creating a Sorted Data Set
a. Open the program p309e02.
b. Modify the program to create a data set named profit07 and specify that it is sorted by Company
without sorting the data set.
p309s02
data profit07(sortedby=Company);
infile 'profit07.dat' dlm=',';
input Company : $30. Sales Cost Salaries Profit;
run;
c. Use PROC CONTENTS to verify that profit07 has a sort flag on the variable Company.
p309s02
proc contents data=profit07;
run;
data payroll;
set employee_payroll(keep=Salary);
by Salary groupformat;
format Salary salaryfmt. AvgSalary TotalSal dollar15.2;
if first.Salary then do;
TotalSal=0;
Count=0;
end;
TotalSal+Salary;
Count+1;
if last.Salary then do;
AvgSalary=TotalSal/Count;
output;
end;
run;
c. Open the program p309e06, add a PROC SQL step to create a new table in sorted order for Name
from temp, and submit the program. Record the usage statistics.
p309s06
proc sql;
create table sorted_sql as
select *
from temp
order by Name;
quit;
CPU Answers vary according to operating environment.
Memory Answers vary according to operating environment.
I/O Answers vary according to operating environment.
d. Open the program p309e06, write a DATA step using a hash object to sort the data, and submit
the program. Record the usage statistics.
p309s06
data _null_;
length Name $4 I J 8;
if _N_=1 then do;
declare hash S(dataset:'temp', ordered:'Ascending');
S.definekey('Name', 'I', 'J');
S.definedata('Name', 'I', 'J');
S.definedone();
call missing(Name, I, J);
end;
S.output(dataset:'sorted_hash');
run;
CPU Answers vary according to operating environment.
Memory Answers vary according to operating environment.
I/O Answers vary according to operating environment.
Obs   Order_Date   YrTot
  1         2003     233
  2         2004     182
  3         2005     153
  4         2006     225
  5         2007     285
10.2 Writing Flexible Programs: Combining Raw Data Files Vertically ......................... 10-10
Exercises ............................................................................................................................ 10-46
10.4 Using FILE and PUT Statements to Create a SAS Program File ............................ 10-78
Demonstration: Using the DATA Step to Send E-Mail ....................................................... 10-87
10.1 Introduction
Objectives
List various programming techniques for improving
programmer efficiency.
Provide examples of using the macro facility.
Use functions.
Substitute a procedure for a DATA step.
10-4 Chapter 10 Programmer Efficiency
macro definitions
p310d01
%mend PrintSubset;
The %STR function masks (that is, removes the normal meaning of) these special tokens:
+ - * / , < > = ; ' "
LT EQ GT LE GE NE AND OR NOT
blank
General form of the %STR function:
%STR(argument)
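A typical use of %STR is to store text that contains semicolons in a macro variable. The sketch below is an illustration only; the macro variable name step is made up. Without %STR, the first semicolon would end the %LET statement.

```sas
/* %STR masks the semicolons so that they become part of the value */
%let step=%str(proc print data=sashelp.class; run;);

/* Write the stored text to the log */
%put The stored text is: &step;
```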
10.02 Quiz
In addition to saving programmer time, does creating a
macro variable or a macro definition always save
computer resources?
If you do not specify the length of the new variable, the value of the new variable returned by any
of the CAT functions has a length of 200.
If an argument is numeric, the CAT functions format the numeric value with the BEST. format and
then remove leading and trailing blanks from the result. No note is written to the log when
the BEST. format is used.
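Both points can be seen in a short DATA step. This is a minimal sketch with made-up variable names: the LENGTH statement keeps FileName from defaulting to a length of 200, and the numeric argument i is converted without a note in the log.

```sas
data _null_;
   length FileName $ 12;             /* avoid the default length of 200 */
   i=9;                              /* numeric argument                */
   FileName=cats("mon", i, ".dat");  /* i is formatted and trimmed      */
   put FileName=;                    /* writes FileName=mon9.dat        */
run;
```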
Using Procedures
Example of selecting appropriate procedures for data
processing:
Use the SUMMARY procedure…
p310d04
Using Procedures
…instead of the DATA step.
proc sort data=orion.shoe_vendors(keep=Line_Name
                                       Mfg_Suggested_Retail_Price)
          out=shoe_vendors;
   by Line_Name;
run;
data sum;
keep Line_Name Avg_MSP;
set shoe_vendors;
by Line_Name;
if first.Line_Name then do;
Tot_MSP=0;
Count=0;
end;
Tot_MSP + Mfg_Suggested_Retail_Price;
if Mfg_Suggested_Retail_Price ne . then Count+1;
if last.Line_Name then do;
Avg_MSP=Tot_MSP/Count;
output;
end;
run;
p310d04
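For comparison, a PROC SUMMARY step that computes the same average could look like the following. This is a sketch, not necessarily the code in p310d04; it assumes that the orion libref is assigned, and the output data set name sum_proc is made up.

```sas
/* One step replaces the PROC SORT and the DATA step */
proc summary data=orion.shoe_vendors nway;
   class Line_Name;                      /* no prior sort is needed     */
   var Mfg_Suggested_Retail_Price;
   output out=sum_proc(drop=_TYPE_ _FREQ_)
          mean=Avg_MSP;                  /* missing values are excluded */
run;
```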
Objectives
Develop a program that is flexible.
Using the FILENAME statement, create a SAS data
set from multiple raw data files.
Using the FILEVAR= option, create a SAS data set
from multiple raw data files.
Flexible Programming
Programs that run in a production environment should
be as flexible as possible so that there is little, if any,
editing of the program code when the program is
submitted.
These programs are often developed using the following
steps:
1. Write an initial version of the program quickly, even
if it requires editing on subsequent submissions.
2. Add syntax, such as the DATE and TIME functions,
that can extract current information.
3. Make the program as efficient as possible.
10.2 Writing Flexible Programs: Combining Raw Data Files Vertically 10-11
Business Scenario
You need to use 12 raw data files to create a SAS data
set that contains the data for the current month and the
two previous months.
The raw data files have the same file structure and similar
names: mon1.dat, mon2.dat, and so forth. They are all
comma-separated files with these fields in the same
order: Customer_ID, Order_ID, Order_Type,
Order_Date, and Delivery_Date.
Partial Listing of mon1.dat
1 1 2 2 3 3 4 4 5 5 6
1---5----0----5----0----5----0----5----0----5----0----5----0
53,1232087464,1,13JAN2007,13JAN2007
49,1232092527,1,13JAN2007,13JAN2007
34,1232161564,1,23JAN2007,23JAN2007
2618,1232173841,3,25JAN2007,30JAN2007
Business Scenario
Every month you need to provide reports that contain
three months of data to Orion executives. The three
months are the current month and the previous two
months (rolling quarter).
data quarter;
infile MON dlm=',';
input Customer_ID Order_ID Order_Type
Order_Date : date9. Delivery_Date : date9.;
run;
p310d05
In Windows and UNIX, you can use the * wildcard to specify that all 12 monthly raw data files are to be
read.
filename MON ('mon*.dat');
SAS Log
764 filename MON ('mon3.dat' 'mon2.dat' 'mon1.dat'); * PC and Unix;
765 *filename MON ('.workshop.rawdata(mon3)'
766 '.workshop.rawdata(mon2)'
767 '.workshop.rawdata(mon1)'); * z/OS;
768
769 data quarter;
770 infile MON dlm=',';
771 input Customer_ID Order_ID Order_Type
772 Order_Date : date9. Delivery_Date : Date9.;
773 run;
File Name=S:\Workshop\mon3.dat,
File Name=S:\Workshop\mon1.dat,
774
775 proc print data=quarter;
776 title 'quarter ';
777 run;
NOTE: There were 20 observations read from the data set WORK.QUARTER.
NOTE: PROCEDURE PRINT used (Total process time):
real time 1.34 seconds
cpu time 0.00 seconds
A FILENAME statement can associate a fileref with multiple physical external files.
10.03 Quiz
1. Open and submit the program p310a01.
2. How many observations are in the data set quarter?
Similar to automatic variables, the FILEVAR= variable is not written to the data set.
The FILEVAR= variable can read raw data files conditionally.
mon + 9 + .dat
mon + 10 + .dat
mon + 11 + .dat
There are multiple techniques for creating the names of
the raw data files.
10.04 Quiz
If the value of the variable i is the number of the month,
which of the following could be used to create the name
of the raw data file?
a. NextFile=cats("mon",i,".dat");
b. NextFile="mon"||put(i,2.)||".dat";
NextFile=compress(NextFile);
c. NextFile=compress("mon"||put(i,2.)||".dat");
When i=11
NextFile=mon11.dat
When i=10
NextFile=mon10.dat
When i=9
NextFile=mon9.dat
The first four of the following statements are within the DO loop:
1. The assignment statement creates the name of the raw data file.
2. The INFILE statement with the FILEVAR= option names the raw data file. In addition, the
FILEVAR= option closes the current file and opens a new file if the value of the FILEVAR= variable
changes.
3. The INPUT statement copies a record of the raw data file, converts it to SAS format, and writes it to
the PDV.
4. The OUTPUT statement outputs the observation that is created by the INPUT statement.
5. The STOP statement outside the DO loop stops the DATA step after all of the observations are
written.
In this example, the DATA step does not encounter the end of file. If the STOP statement were not
included, the program would continue to execute the DO loop repetitively. Therefore, the STOP statement
is needed to prevent an infinite loop of the DATA step.
SAS Log
807
808 data movingq;
809 drop i;
810 do i=11,10,9;
811 NextFile=cats("mon",put(i,2.),".dat"); * PC and UNIX;
812 *NextFile=cats(".lwprg3.rawdata(mon",put(i,2.),")"); * mainframe ;
813 infile ORD filevar=NextFile dlm=',';
814 input Customer_ID Order_ID Order_Type
815 Order_Date:date9. Delivery_Date:Date9.;
816 output;
817 end;
818 stop;
819 run;
File Name=S:\Workshop\mon11.dat,
RECFM=V,LRECL=256
File Name=S:\Workshop\mon10.dat,
RECFM=V,LRECL=256
File Name=S:\Workshop\mon9.dat,
RECFM=V,LRECL=256
820
821 proc print data=movingq;
822 title 'Moving Quarter Data';
823 run;
NOTE: There were 3 observations read from the data set WORK.MOVINGQ.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
data movingq;
drop i;
do i=11, 10, 9;
NextFile=cats("mon", i, ".dat");
infile ORD filevar=NextFile dlm=',';
input Customer_ID Order_ID Order_Type
Order_Date : date9. Delivery_Date : date9.;
output;
end;
stop;
run;
10.05 Poll
Is the STOP statement necessary?
Yes
No
The DO WHILE statement continues to execute the INFILE statement for every record of the raw data
file until the value of LastObs=1. The DO WHILE statement checks the condition at the top of the loop.
The END= option creates the variable LastObs that can be used to determine the end of the raw data file.
The END= option names a variable whose value is one of the following:
0 when the current input data record is not the last in the current input file
1 when the current input record is the last in the current input file
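The DO WHILE loop and the END= option described above can be combined with FILEVAR= to read every record of each file. The following is a sketch that assumes the raw data files mon11.dat, mon10.dat, and mon9.dat exist in the current directory; it may differ in detail from the course program.

```sas
data movingq;
   drop i;
   do i=11, 10, 9;
      NextFile=cats("mon", i, ".dat");
      /* END= is reset to 0 each time FILEVAR= opens a new file */
      infile ORD filevar=NextFile dlm=',' end=LastObs;
      /* Read records until the last record of the current file */
      do while (not LastObs);
         input Customer_ID Order_ID Order_Type
               Order_Date : date9. Delivery_Date : date9.;
         output;
      end;
   end;
   stop;   /* the DATA step never reaches end of file on its own */
run;
```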
The MONTH function is used to obtain the month number of today’s date to begin the rolling month
range. The month numbers of the two months before today’s month number are then calculated.
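A minimal sketch of that calculation follows. It assumes that the two earlier month numbers are obtained by simple subtraction, which might not be exactly how p310d08 does it. Note what subtraction produces when the current month is January or February.

```sas
data _null_;
   MonNum=month(today());   /* month number of today's date */
   MidMon=MonNum-1;         /* previous month (assumed)     */
   LastMon=MonNum-2;        /* two months back (assumed)    */
   put MonNum= MidMon= LastMon=;
run;
```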
10.07 Poll
Will the SAS code in p310d08 produce the correct results
if the current month is January or February?
Yes
No
INTNX Function
The INTNX function increments a date value by a given
interval or intervals, and returns a date value.
EDate=intnx('interval', BDate, increment)
Formatted Value of BDate   Using the INTNX Function     Formatted Value of EDate
04JUL2008                  intnx('year', BDate, -1)     01JAN2007
04JUL2008                  intnx('year', BDate, 0)      01JAN2008
04JUL2008                  intnx('year', BDate, 1)      01JAN2009
04JUL2008                  intnx('month', BDate, -2)    01MAY2008
04JUL2008                  intnx('month', BDate, -1)    01JUN2008
04JUL2008                  intnx('month', BDate, 0)     01JUL2008
04JUL2008                  intnx('month', BDate, 1)     01AUG2008
04JUL2008                  intnx('month', BDate, 2)     01SEP2008
The INTNX function also supports multiples of an interval and shifted intervals.
The program p310d08a contains the SAS DATA step code to replicate these results.
data dates;
BDate='04JUL2008'd;
PreviousYear=intnx('year', BDate, -1);
ThisYear=intnx('year', BDate, 0);
NextYear=intnx('year', BDate, 1);
TwoMonthsBack=intnx('month', BDate, -2);
PreviousMonth=intnx('month', BDate, -1);
ThisMonth=intnx('month', BDate, 0);
NextMonth=intnx('month', BDate, 1);
TwoMonthsFromNow=intnx('month', BDate, 2);
format BDate PreviousYear ThisYear NextYear TwoMonthsBack
PreviousMonth ThisMonth NextMonth TwoMonthsFromNow date9.;
run;
INTNX Function
General form of the INTNX function:
INTNX('interval', start-from, increment<, alignment>)
The INTNX function also supports multiples of an interval and shifted intervals.
General form of the INTNX function with multiples and shift indexes:
shift-index specifies the starting point of the interval. By default, the starting point is 1. A value
that is greater than 1 shifts the start to a later point within the interval. The unit for
shifting depends on the interval. For example, YEAR.3 specifies yearly periods that
are shifted to start on the first of March of each calendar year and to end in February
of the following year. The shift index cannot be greater than the number of periods in
the entire interval. For example, YEAR2.24 has a valid shift index, but YEAR2.25 is
invalid because there is no twenty-fifth month in a two-year interval. If the default
shift period is the same as the interval type, then you can shift only multi-period
intervals with the shift index. For example, because MONTH type intervals shift by
MONTH sub-periods by default, you cannot shift monthly intervals with the shift
index. However, you can shift bimonthly intervals with the shift index, because two
MONTH intervals exist in each MONTH2 interval. The interval name MONTH2.2,
for example, specifies bimonthly periods starting on the first day of even-numbered
months.
start-from specifies a SAS expression that represents a SAS date, time, or datetime value that
identifies a starting point.
increment specifies a negative, positive, or zero integer that represents the number of date, time,
or datetime intervals. Increment is the number of intervals to shift the value of start-from.
alignment controls the position of SAS dates within the interval. Alignment can be one of these
values:
BEGINNING | B specifies that the returned date is aligned to the beginning of the
interval (DEFAULT).
MIDDLE | M specifies that the returned date is aligned to the midpoint of the
interval.
END | E specifies that the returned date is aligned to the end of the interval.
SAMEDAY | S | SAME specifies that the date that is returned is aligned to the same calendar
date with the corresponding interval increment.
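The alignment argument and a shifted interval can be sketched with the same start date that is used in the table above. This is an illustrative sketch rather than one of the course programs; the data set name align_demo and the variable names are hypothetical.

```sas
data align_demo;
   BDate='04JUL2008'd;
   /* Default (BEGINNING) alignment: first day of the next month, 01AUG2008 */
   NextMonthBegin=intnx('month', BDate, 1);
   /* END alignment: last day of the next month, 31AUG2008 */
   NextMonthEnd=intnx('month', BDate, 1, 'e');
   /* SAMEDAY alignment: same day of the next month, 04AUG2008 */
   NextMonthSame=intnx('month', BDate, 1, 's');
   /* MIDDLE alignment: midpoint of the next month */
   NextMonthMid=intnx('month', BDate, 1, 'm');
   /* YEAR.3: yearly intervals that start on March 1, so this returns 01MAR2008 */
   ShiftedYearStart=intnx('year.3', BDate, 0);
   format BDate NextMonthBegin NextMonthEnd NextMonthSame
          NextMonthMid ShiftedYearStart date9.;
run;

proc print data=align_demo noobs;
run;
```

Printing the data set with the DATE9. format makes it easy to compare each alignment against the default BEGINNING result.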
NextFile='.prog3.rawdata(mon'||put(i,2.)||')';
SAS Log
144 data movingq;
145 drop MonNum MidMon LastMon i;
146 MonNum=month(today());
147 MidMon=month(intnx('month', today(), -1));
148 LastMon=month(intnx('month', today(), -2));
149 do i=MonNum, Midmon, LastMon;
150 NextFile=cats("mon", i, ".dat");
151 infile ORD filevar=NextFile dlm=','
152 end=LastObs;
153 do while (not LastObs);
154 input Customer_ID
155 Order_ID
156 Order_Type
157 Order_Date : date9.
158 Delivery_Date : date9.;
159 output;
160 end;
161 end;
162 stop;
163 run;
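Because INTNX works on the complete date value rather than on the month number, the calculated month numbers stay valid across a year boundary. The sketch below replaces TODAY() with a fixed January date, an assumption made only so that the result is reproducible, and compares the INTNX result with plain subtraction. The variable NaiveLast is hypothetical and is not part of the course program.

```sas
data _null_;
   /* Hypothetical fixed date that stands in for today() */
   Today='15JAN2011'd;
   MonNum=month(Today);
   MidMon=month(intnx('month', Today, -1));
   LastMon=month(intnx('month', Today, -2));
   /* Plain subtraction of month numbers, shown only for comparison */
   NaiveLast=MonNum-2;
   put MonNum= MidMon= LastMon= NaiveLast=;
run;
```

The log shows MonNum=1 MidMon=12 LastMon=11 NaiveLast=-1. Only the INTNX-based values correspond to existing monthly files.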
10.08 Quiz
p310d09 contains the following code.
MonNum=month(today());
MidMon=month(intnx('month', today(), -1));
LastMon=month(intnx('month', today(), -2));
Why is the following program more efficient?
Today=today();
MonNum=month(Today);
MidMon=month(intnx('month', Today, -1));
LastMon=month(intnx('month', Today, -2));
55 p310d10
Execution (Self-Study)
a            b            c
         1            1            1
1---5----0   1---5----0   1---5----0
1, 4         2, 5         3, 8
1, 5                      3, 9

data one;
   do i='a', 'b', 'c';
      NextFile=cats(i,".dat");
      infile ORD filevar=NextFile dlm=','
             end=LastObs;
      do while(not LastObs);
         input Num1 Num2;
         output;
      end;
   end;
   stop;
run;

Input Buffer
1 2 3 4 5 6

PDV
i    D NextFile    D LastObs    Num1    Num2    D _ERROR_    D _N_
                   0            .       .       0            1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one; i is initialized to 'a'
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs;
do while(not LastObs);
input Num1 Num2;
output;
end;
end;
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a 0 . . 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs;
do while(not LastObs);
input Num1 Num2;
output;
end;
end;
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 . . 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs;
do while(not LastObs);
input Num1 Num2;
output;
end;
end;
stop; LastObs=0
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 . . 0 1
LastObs is reset to 0 because the value of FILEVAR= changed and a new file is opened.
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs;
do while(not LastObs);
input Num1 Num2;
output;
end; The DO WHILE evaluates
end; the condition at the top
stop; of the loop.
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 . . 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 4
do while(not LastObs);
input Num1 Num2;
output;
end;
end;
stop; LastObs=0
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 1 4 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
Output current observation.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 4
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end;
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 1 4 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 4
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop
stop; executes.
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 0 1 4 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end;
stop; LastObs=1
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 1 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
Output current observation.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 1 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop 1 5
stop; does not execute.
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
a a.dat 1 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one; i increments to 'b'.
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b a.dat 1 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 1 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; LastObs=0 1 5
stop;
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 0 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 1 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop 1 5
stop; executes.
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 0 1 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; LastObs=1
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 1 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
Output current observation.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 1 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop does not execute. 1 5
stop; 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
b b.dat 1 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one; i increments to 'c'.
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c b.dat 1 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; LastObs=0 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 0 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 2 , 5
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop executes. 1 5
stop; 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 0 2 5 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 8
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; LastObs=0 2 5
run;
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 0 3 8 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
Output current observation.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 8
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run; 3 8
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 0 3 8 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 8
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop executes. 1 5
stop; 2 5
run; 3 8
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 0 3 8 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 9
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; LastObs=1 1 5
stop; 2 5
run; 3 8
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 3 9 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
Output current observation.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 9
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run; 3 8
3 9
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 3 9 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 9
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; The DO WHILE loop does not execute. 1 5
stop; 2 5
run; 3 8
3 9
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 3 9 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c'; The values of i are all assigned.
NextFile=cats(i,".dat");
Input Buffer
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 9
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run; 3 8
3 9
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 3 9 0 1
Execution (Self-Study)
a b c
1 1 1
1---5----0 1---5----0 1---5----0
1, 4 2, 5 3, 8
1, 5 3, 9
data one;
do i='a', 'b', 'c';
NextFile=cats(i,".dat");
Input Buffer
The DATA step stops execution.
infile ORD filevar=NextFile dlm=',' 1 2 3 4 5 6
end=LastObs; 3 , 9
do while(not LastObs);
input Num1 Num2; one
output; Num1 Num2
end; 1 4
end; 1 5
stop; 2 5
run; 3 8
3 9
PDV
i NextFile DLastObs
D Num1 Num2 D _ERROR_ D _N_
c c.dat 1 3 9 0 1
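The execution walk-through above can be reproduced end to end with the sketch below. It assumes a directory-based host such as Windows or UNIX and writes the three small files into the directory of the WORK library. The macro variable dir, the fileref placeholder OUT, and the variables Name and OutFile are hypothetical additions; the reading step follows the program shown in the slides.

```sas
/* Assumption: WORK resolves to a writable directory on this host */
%let dir=%sysfunc(pathname(work));

/* Create a.dat, b.dat, and c.dat with the records used in the slides */
data _null_;
   length OutFile $ 200;
   input Name $ Num1 Num2;
   OutFile=cats("&dir", "/", Name, ".dat");
   /* A change in the FILEVAR= value closes the current file and opens the next */
   file OUT filevar=OutFile dlm=',';
   put Num1 Num2;
   datalines;
a 1 4
a 1 5
b 2 5
c 3 8
c 3 9
;

/* Read the three files in one DATA step iteration */
data one;
   length NextFile $ 200;
   do i='a', 'b', 'c';
      NextFile=cats("&dir", "/", i, ".dat");
      infile ORD filevar=NextFile dlm=',' end=LastObs;
      do while (not LastObs);
         input Num1 Num2;
         output;
      end;
   end;
   stop;   /* prevents a second iteration that would reopen the first file */
   drop i;
run;

proc print data=one;
run;
```

The data set one contains five observations, in the same order as the output table that builds up across the slides. NextFile and LastObs are not written to the output data set because variables named in FILEVAR= and END= are not added to the data set.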
Exercises
Level 1
                             Customer_
Obs   Customer_Name          Age_Group     Customer_Type
  1   Sandrina Stephano      15-30 years   Orion Club Gold members medium activity
  2   Cornelia Krahl         31-45 years   Orion Club Gold members medium activity
  3   Markus Sepke           15-30 years   Orion Club Gold members low activity
  4   Oliver S. Füßling      31-45 years   Orion Club Gold members high activity
  5   Cynthia Martinez       46-60 years   Orion Club Gold members medium activity

Obs   Customer_Group
Level 2
                             Customer_
Obs   Customer_Name          Age_Group     Customer_Type
  1   Sandrina Stephano      15-30 years   Orion Club Gold members medium activity
  2   Cornelia Krahl         31-45 years   Orion Club Gold members medium activity
  3   Markus Sepke           15-30 years   Orion Club Gold members low activity
  4   Oliver S. Füßling      31-45 years   Orion Club Gold members high activity
  5   Cynthia Martinez       46-60 years   Orion Club Gold members medium activity

Obs   Customer_Group
Level 3
3. Using the FILEVAR= Option to Read Filenames from a SAS Data Set
The SAS data set orion.month_file contains the names of the raw data files that need to be
concatenated.
Listing of orion.month_file
Obs File_Name
1 mon1.dat
2 mon2.dat
3 mon3.dat
4 mon4.dat
5 mon5.dat
6 mon6.dat
7 mon7.dat
8 mon8.dat
9 mon9.dat
10 mon10.dat
11 mon11.dat
12 mon12.dat
The starter file p310e03 contains the following DATA step program:
p310e03
data all_months;
format Order_Date Delivery_Date date9.;
input Customer_ID Order_ID Order_Type
Order_Date : date9. Delivery_Date : Date9.;
run;
Objectives
List the types of SAS data sets.
Create and use DATA step views.
List the advantages of DATA step views.
List guidelines for using DATA step views.
A SAS data file is a SAS file with a member type of DATA. A SAS data view is a SAS file with a member type of VIEW.
10.3 Creating Views 10-53
Compilation Execution
The name of a DATA view must be different from the name of any existing SAS data file or view in the
same SAS library.
p310d11
filename MON ('mon3.dat' 'mon2.dat' 'mon1.dat'); * PC and UNIX;
*filename MON ('.workshop.rawdata(mon3)'
'.workshop.rawdata(mon2)'
'.workshop.rawdata(mon1)'); * z/OS;
SAS Log
1011 filename MON ('mon3.dat' 'mon2.dat' 'mon1.dat'); * PC and Unix;
1012 *filename MON ('.workshop.rawdata(mon3)'
1013 '.workshop.rawdata(mon2)'
1014 '.workshop.rawdata(mon1)'); * z/OS;
1015
1016 proc print data=orion.quarter;
1017 title ' quarter';
1018 run;
File Name=S:\Workshop\mon3.dat,
File Name=S:\Workshop\mon1.dat,
NOTE: There were 20 observations read from the data set ORION.QUARTER_MON.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.81 seconds
cpu time 0.00 seconds
p310d12
data orion.movingq / view=orion.movingq;
drop Today MonNum MidMon LastMon i;
Today=today();
MonNum=month(Today);
MidMon=month(intnx('month', Today, -1));
LastMon=month(intnx('month', Today, -2));
do i=MonNum, MidMon, LastMon;
NextFile=cats("mon", i, ".dat"); * Windows and UNIX;
*NextFile=cats(".workshop.rawdata(mon", i, ")"); * z/OS ;
infile ORD filevar=NextFile dlm=',' end=LastObs;
do while (not LastObs);
input Customer_ID Order_ID Order_Type
Order_Date : date9. Delivery_Date : date9.;
output;
end;
end;
stop;
run;
SAS Log
64 data orion.movingq / view=orion.movingq;
65 drop MonNum MidMon LastMon i;
66 Today=today();
67 MonNum=month(Today);
68 MidMon=month(intnx('month', Today, -1));
69 LastMon=month(intnx('month', Today, -2));
70 do i=MonNum, MidMon, LastMon;
71 NextFile=cats("mon", i, ".dat"); * Windows and UNIX;
72 *NextFile=cats(".workshop.rawdata(mon", i, ")"); * z/OS ;
73 infile ORD filevar=NextFile dlm=',' end=LastObs;
74 do while (not LastObs);
75 input Customer_ID Order_ID Order_Type
76 Order_Date : date9. Delivery_Date : date9.;
77 output;
78 end;
79 end;
80 stop;
81 run;
p310d12
proc print data=orion.movingq;
title 'movingq';
format Order_Date date9.
Delivery_Date date9.;
run;
Partial PROC PRINT Output (created in May)
MovingQ
SAS Log
192 proc print data=orion.movingq;
193 title 'movingq';
194 format Order_Date date9.
195 Delivery_Date date9.;
196 run;
NOTE: There were 34 observations read from the data set ORION.MOVINGQ.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
You can also create SAS data files in the DATA step that creates the view, but you can only create one
view per DATA step.
The SAS data file is not created until the view is accessed.
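A minimal sketch of this behavior follows, using the SASHELP.CLASS sample data set. The names work.tall_view and work.short_file are hypothetical and are not part of the course programs.

```sas
/* One DATA step defines a view and also names a SAS data file */
data work.tall_view work.short_file / view=work.tall_view;
   set sashelp.class;
   if Height >= 60 then output work.tall_view;
   else output work.short_file;
run;

/* At this point only the view definition is stored. */
/* work.short_file does not exist yet.                */

/* Reading the view executes the stored DATA step, */
/* which also creates work.short_file.             */
proc print data=work.tall_view;
run;

proc print data=work.short_file;
run;
```

If the second PROC PRINT step were submitted before the view was read, it would fail because work.short_file had not been created.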
10.09 Quiz
Open and submit the program p310a02.
What does the log report?
data view=orion.movingq;
describe;
run;
p310d13
The SQL procedure DESCRIBE statement retrieves the SQL view code and reports it in the log.
PROC SQL;
DESCRIBE VIEW view-name;
QUIT;
SAS Log
1213 proc sql;
1214 describe view orion.names_view;
NOTE: SQL view ORION.NAMES_VIEW is defined as:
1215 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
For more information about the USING clause in PROC SQL, consult SAS OnlineDoc:
SAS OnlineDoc → Base SAS → SAS 9.2 SQL Procedure User’s Guide →
Creating and Updating Tables and Views
Advantages of Views
Advantages of Using Views
Data from multiple sources can be combined.
Complex code can be stored for reuse.
Errors and programming time can be reduced.
You can access the most current data in changing files.
A SAS copy of a large data file does not have to be stored.
You can avoid creating intermediate copies of data.
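The last advantage can be sketched as follows, assuming the SASHELP.CLASS sample data set. The names work.combined and work.sorted are hypothetical. The concatenation is defined as a view, so the combined rows are passed directly to PROC SORT instead of being written to disk first.

```sas
/* The view stores the instructions for the concatenation, not the data */
data work.combined / view=work.combined;
   set sashelp.class(where=(Sex='F'))
       sashelp.class(where=(Sex='M'));
run;

/* OUT= is needed because a view cannot be sorted in place */
proc sort data=work.combined out=work.sorted;
   by Age;
run;

proc print data=work.sorted;
run;
```

Only work.sorted is written as a data file. The intermediate concatenated data set is never stored.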
10.10 Quiz
What is the advantage of the following program?
data bonus_view(keep=Manager_ID YrEndBonus)
/ view=bonus_view;
set orion.staff;
YrEndBonus=Salary * 0.05;
where Job_Title contains 'Manager';
run;
104 p310d15
data four;
set one two three;
run;
proc sort data=four;
by X;
run;
DATA Step SORT Step
109 p310d16
Disadvantages of Views
Disadvantages of Using Views
The code executes each time that you use a view.
The PRINT procedure with the UNIFORM option, the CLASS statement in the MEANS/SUMMARY,
TABULATE, and UNIVARIATE procedures, and many SAS/STAT procedures require multiple passes
through the data.
In the case of multiple passes in a step, the view creates a temporary spill file so that SAS does
not have to read the data from disk multiple times.
MINIMUM (alias MIN) uses, for each variable, the minimum column width that
accommodates all values of the variable.
UNIFORM (alias U) uses each variable’s formatted width as its column width on all
pages. If the variable does not have a format that explicitly
specifies a field width, PROC PRINT uses the widest data value as
the column width.
UNIFORMBY (alias UBY) formats all columns uniformly within a BY group, using each
variable’s formatted width as its column width. If the variable does
not have a format that explicitly specifies a field width, PROC
PRINT uses the widest data value as the column width.
Reference Information
Because SAS macro variables are resolved during compilation, any macro variables used in a DATA step
view are resolved when the view is created.
You can use the SYMGET function to postpone macro resolution until the view is executed.
p310d18
%let OrderType=2;
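The difference between compile-time resolution and SYMGET can be sketched with the SASHELP.CLASS sample data set. This is an illustration under stated assumptions, not the course program: the macro variable MinAge and the view names are hypothetical.

```sas
%let MinAge=14;

/* &MinAge is resolved when the view is compiled, */
/* so the value 14 is stored in the view.         */
data work.fixed_view / view=work.fixed_view;
   set sashelp.class;
   if Age >= &MinAge;
run;

/* SYMGET runs each time the view is executed, */
/* so the current value of MinAge is used.     */
data work.flexible_view / view=work.flexible_view;
   set sashelp.class;
   if Age >= input(symget('MinAge'), best12.);
run;

%let MinAge=16;

proc print data=work.fixed_view;      /* still selects Age >= 14 */
run;

proc print data=work.flexible_view;   /* selects Age >= 16 */
run;
```

SYMGET returns a character value, which is why the sketch converts it with the INPUT function before the comparison.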
Exercises
Level 1
2) Create a variable named Total_Donations as the total of the variable values for Qtr1, Qtr2,
Qtr3, and Qtr4.
3) Create a new variable Donation_Category with the following values:
b. Open and submit the program p310e04 to create a report from the view cc_donations. Use the
variable Donation_Category as a class variable and the variable Total_Donations as an analysis
variable. Verify that the view was created correctly.
p310e04
proc means data=cc_donations sum n nonobs maxdec=2;
class Donation_Category;
var Total_Donations;
run;
Preferred PROC MEANS Output
The MEANS Procedure

Donation_
Category              Sum      N
--------------------------------
$100 or more       200.00      2
Level 2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
Employee_ Marital_
Obs Term_Date Status Dependents Age
1 . S 0 32
2 . O 2 39
3 . M 1 55
4 . S 0 34
5 . S 0 25
The PROC PRINT output was generated on June 23, 2009. Your results might vary due to
the value of Age.
f. Print the file older60 successfully.
Partial PROC PRINT Output
Older60 Data Set
Employee_ Marital_
Obs Term_Date Status Dependents Age
1 . M 1 60
2 . M 2 64
3 . M 2 60
4 . S 0 65
5 . M 3 65
The PROC PRINT output was generated on June 23, 2009. Your results might vary due to
the value of Age.
g. Why could you not print older60 in step d?
Level 3
6. Creating a View with the SQL Procedure and the USING Clause
You can embed a SAS LIBNAME statement in a view with the USING clause. This enables you to
store SAS libref information in the view. The scope of the libref is local to the view, and it will not
conflict with an identically named libref in the SAS session.
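As a general illustration of the USING clause (not the solution to this exercise), the sketch below copies SASHELP.CLASS into the WORK library and then defines a view that reads it through an embedded libref. The libref src, the view name work.teen_view, and the use of the WORK directory as the library path are assumptions made for the example.

```sas
/* Make a copy of the sample data in the WORK library */
data work.class;
   set sashelp.class;
run;

proc sql;
   /* The libref src exists only while the view executes */
   create view work.teen_view as
      select Name, Age
         from src.class
         where Age >= 13
         using libname src "%sysfunc(pathname(work))";
quit;

/* No LIBNAME statement for src is needed in the session */
proc print data=work.teen_view;
run;
```

Because the library path is stored in the view, the view can be used in a session where the libref src is unassigned or points somewhere else.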
The starter program p310e06 contains a PROC SQL step that creates a view.
p310e06
proc sql;
create view orion.payroll_donations as
select Employee_ID, Qtr1, Qtr2, Qtr3, Qtr4,
sum(Qtr1, Qtr2, Qtr3, Qtr4) as Total_Donations
from orion.employee_donations
where Paid_By='Payroll Deduction';
quit;
Windows s:\workshop
UNIX .
z/OS .prg3.sasdata
b. Submit a LIBNAME statement to assign a libref of sasdata to the library specified in the table
above.
c. Submit a PROC PRINT step to print the view sasdata.payroll_donations.
Partial PROC PRINT Output
orion.payroll_donations View
Total_
Obs Employee_ID Qtr1 Qtr2 Qtr3 Qtr4 Donations
1 120267 15 15 15 15 60
2 120269 20 20 20 20 80
3 120271 20 20 20 20 80
4 120272 10 10 10 10 40
5 120669 15 15 15 15 60
Objectives
Use a DATA step to write SAS program code.
Include the code and submit it.
IF/THEN logic
DO loop processing
10.4 Using FILE and PUT Statements to Create a SAS Program File 10-79
p310d19
128
jobs
Job_Title
Sales Rep. I
Sales Rep. II
Sales Rep. III
Sales Rep. IV

data _null_;
   set jobs;
   file 'jobs.sas';
   put 'proc print data=orion.salesstaff;';
   put 'title "Listing for Job Title '
       Job_Title '";';
   put 'where Job_Title="' Job_Title
       '";' / 'run;' /;
run;

PDV
Job_Title       D _N_
Sales Rep. I      1
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
title "Listing for Job Title Sales Rep. I";
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
title "Listing for Job Title Sales Rep. I";
where Job_Title="Sales Rep. I";
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
title "Listing for Job Title Sales Rep. I";
where Job_Title="Sales Rep. I";
run;
jobs
data _null_;
Job_Title set jobs;
Sales Rep. I file 'jobs.sas';
put 'proc print data=orion.salesstaff;';
Sales Rep. II put 'title "Listing for Job Title '
Sales Rep. III Job_Title '";';
put 'where Job_Title="' Job_Title
Sales Rep. IV '";' / 'run;' /;
run;
PDV Implicit RETURN;
Job_Title D _N_
Sales Rep. I 1
jobs.sas
1 1 2 2 3 3 4 4 5 5 6 6 7 7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
title "Listing for Job Title Sales Rep. I";
where Job_Title="Sales Rep. I";
run;
jobs
Job_Title
Sales Rep. I
Sales Rep. II
Sales Rep. III
Sales Rep. IV

data _null_;
   set jobs;
   file 'jobs.sas';
   put 'proc print data=orion.salesstaff;';
   put 'title "Listing for Job Title '
       Job_Title '";';
   put 'where Job_Title="' Job_Title
       '";' / 'run;' /;
run;

Processing continues until the end of file in work.jobs.

PDV
Job_Title        D _N_
Sales Rep. IV      4

jobs.sas
         1    1    2    2    3    3    4    4    5    5    6    6    7    7
1---5----0----5----0----5----0----5----0----5----0----5----0----5----0----5
proc print data=orion.salesstaff;
title "Listing for Job Title Sales Rep. IV";
where Job_Title="Sales Rep. IV";
run;
DATA _NULL_;
…DATA step statements…
FILE file-specification;
PUT @n variable1 format … @n variable-n format;
…DATA step statements…
RUN;
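The general form above, together with the objective of including and submitting the generated code, can be sketched as follows. The sketch uses the SASHELP.CLASS sample data set and a temporary fileref instead of the course files; the names work.groups and gencode are hypothetical.

```sas
/* One row per distinct value that drives the generated code */
proc sort data=sashelp.class(keep=Sex) out=work.groups nodupkey;
   by Sex;
run;

/* Temporary file that holds the generated program */
filename gencode temp;

data _null_;
   set work.groups;
   file gencode;
   put 'proc print data=sashelp.class;';
   /* +(-1) moves the pointer back over the blank that list output */
   /* writes after the variable value                              */
   put 'title "Listing for Sex=' Sex +(-1) '";';
   put 'where Sex="' Sex +(-1) '";';
   put 'run;' /;
run;

/* SOURCE2 writes the included statements to the log */
%include gencode / source2;
```

Each observation in work.groups produces one complete PROC PRINT step, so the %INCLUDE statement submits two steps, one for each value of Sex.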
variable values
put 'title "Listing for Job Title ' Job_Title '";';
NOTE: There were 63 observations read from the data set ORION.SALESSTAFF.
WHERE Job_Title='Sales Rep. I ';
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
82 +
83 +proc print data=orion.salesstaff;
84 +title "Listing for Job Title Sales Rep. II ";
85 +where Job_Title="Sales Rep. II ";
86 +run;
NOTE: There were 50 observations read from the data set ORION.SALESSTAFF.
WHERE Job_Title='Sales Rep. II ';
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: There were 34 observations read from the data set ORION.SALESSTAFF.
WHERE Job_Title='Sales Rep. III ';
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
92 +
93 +proc print data=orion.salesstaff;
94 +title "Listing for Job Title Sales Rep. IV ";
95 +where Job_Title="Sales Rep. IV ";
96 +run;
NOTE: There were 16 observations read from the data set ORION.SALESSTAFF.
WHERE Job_Title='Sales Rep. IV ';
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
97 +
NOTE: %INCLUDE (level 1) ending.
Do not submit the %INCLUDE statement that is commented out below. There are no mail servers
attached to the classroom machines and the generated e-mail addresses are not valid e-mail
addresses.
p310d20
proc sort data=orion.order_fact(keep=Customer_ID Order_ID
Delivery_Date
obs=50)
out=order_fact;
by Customer_ID;
run;
data _null_;
merge orion.customer_dim(keep=Customer_FirstName
Customer_LastName Customer_ID)
order_fact;
by Customer_ID;
file 'email.sas';
if first.Customer_ID then do;
Address=catt(Customer_FirstName,'.',
Customer_LastName,'@something.com');
FullName=catx(' ', Customer_FirstName, Customer_LastName);
put "filename mail email '" Address "' subject='Purchases';";
put 'data _null_;';
put 'file mail;';
put "put '" FullName +(-1) ",';";
put "put 'Thank you for your orders.';";
put "put 'They will be delivered as follows:'//;";
put "put @10 'Your order number' @30 'Expected Delivery Date'/;";
end;
DT=put(Delivery_Date,mmddyy10.);
put "put @15 '" Order_ID"' @35 '" DT "';";
if last.Customer_ID then do;
put "put /'Your friends at Orion Star';";
put "run;";
end;
run;
/*
%include 'email.sas';
*/
Partial Listing of email.sas
filename mail email 'James.Kvarniq@something.com ' subject='Purchases';
data _null_;
file mail;
put 'James Kvarniq,';
put 'Thank you for your orders.';
put 'They will be delivered as follows:'//;
put @10 'Your order number' @30 'Expected Delivery Date'/;
put @15 '1232410925 ' @35 '03/03/2004 ';
put @15 '1232455720 ' @35 '03/09/2004 ';
put @15 '1232530384 ' @35 '03/21/2004 ';
put @15 '1232654929 ' @35 '04/09/2004 ';
put @15 '1232654929 ' @35 '04/09/2004 ';
put @15 '1232709099 ' @35 '04/16/2004 ';
put @15 '1232998740 ' @35 '05/29/2004 ';
put @15 '1233543560 ' @35 '08/20/2004 ';
put @15 '1234348668 ' @35 '12/18/2004 ';
put /'Your friends at Orion Star';
run;
Exercises
Level 1
a. Write a DATA step to build the following two PROC FORMAT steps. Under Windows and UNIX,
name the files customer_type.sas and customer_group.sas. Under z/OS, name the files
.workshop.sascode(customer_type) and .workshop.sascode(customer_group).
Preferred Output from the DATA Step for customer_group
proc format fmtlib;
value GrpLevl
10="Orion Club members"
20="Orion Club Gold members"
30="Internet/Catalog Customers";
run;
Preferred Output from the DATA Step for customer_type
proc format fmtlib;
value TypLevl
1010="Orion Club members inactive"
1020="Orion Club members low activity"
1030="Orion Club members medium activity"
1040="Orion Club members high activity"
2010="Orion Club Gold members low activity"
2020="Orion Club Gold members medium activity"
2030="Orion Club Gold members high activity"
3010="Internet/Catalog Customers";
run;
Hint: To create the value for the label, use the $QUOTE format that writes data values that are
enclosed in double quotation marks. Investigate the descriptor portion of the data set to
determine the appropriate width.
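As a hedged illustration of the hint, the sketch below uses made-up data rather than the exercise data sets. The names work.levels, fmtcode, and DemoLvl are hypothetical. The $QUOTE width is the variable length plus 2, to leave room for the quotation marks.

```sas
/* Made-up lookup data; Label has length 30 */
data work.levels;
   infile datalines dlm=',';
   length Label $ 30;
   input Code Label $;
   datalines;
10,Club members
20,Gold members
;
run;

filename fmtcode temp;   /* hypothetical fileref for a temporary file */

data _null_;
   set work.levels end=Last;
   file fmtcode;
   if _n_=1 then put 'proc format;' / '   value DemoLvl';
   /* $QUOTE32. writes the label enclosed in double quotation marks */
   put '      ' Code '=' Label $quote32.;
   if Last then put '   ;' / 'run;';
run;

%include fmtcode / source2;
```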
b. Use the %INCLUDE statement to execute the code.
Partial SAS Log
95 %include 'Customer_Type.sas'/source2;
NOTE: %INCLUDE (level 1) file Customer_Type.sas is file S:\workshop\Customer_Type.sas.
96 +proc format fmtlib;
97 +value TypLevl
98 +1010="Orion Club members inactive"
99 +1020="Orion Club members low activity"
100 +1030="Orion Club members medium activity"
101 +1040="Orion Club members high activity"
102 +2010="Orion Club Gold members low activity"
103 +2020="Orion Club Gold members medium activity"
104 +2030="Orion Club Gold members high activity"
105 +3010="Internet/Catalog Customers"
106 +;
NOTE: Format TYPLEVL has been output.
106!+ run;
Level 2
Open the program p310e08, which contains a DATA step with a MERGE statement, and edit the
program to generate an e-mail for each employee to inform him that the total contribution was mailed.
Under Windows and UNIX, name the file donations.sas. Under z/OS, name the file
.workshop.sascode(donations). Do not include the program file. However, open the program file to
verify that it is correct.
p310e08
proc sort data=orion.employee_addresses out=employee_addresses;
by Employee_ID;
run;
data _null_;
merge orion.employee_donations(in=d) employee_addresses;
by Employee_ID;
if d;
run;
Partial Contents of donations
filename mail email '[email protected] ' subject='Your Donation';
data _null_;
file mail;
put 'Your donation of $25 has been sent to Mitleid International
90%, Save the Baby Animals 10% ';
run;
filename mail email '[email protected] ' subject='Your Donation';
data _null_;
file mail;
put 'Your donation of $60 has been sent to Disaster Assist, Inc.
80%, Cancer Cures, Inc. 20% ';
run;
Level 3
MEMNAME: The value you need to retrieve from Sashelp.VCOLUMN. Use it in the DATA= option in the PROC PRINT statement and in the TITLE statement.
The starter program p310e09 can be used as a starting point for this exercise.
p310e09
data _null_;
set sashelp.vcolumn;
where Libname='ORION' and Name='Product_ID';
file 'print_products.sas';
run;
b. Store the program in a file named Print_Products.sas under Windows and UNIX and
.workshop.sascode(Print_Products) under z/OS.
Partial Contents of Print_Products
proc print data=ORION.CATALOG(obs=5);run;
title 'First Five Observations of ORION.CATALOG ';
run;
proc print data=ORION.DENMARK_CUSTOMERS(obs=5);run;
title 'First Five Observations of ORION.DENMARK_CUSTOMERS ';
run;
proc print data=ORION.FIRST_INTERNET_ORDER(obs=5);run;
title 'First Five Observations of ORION.FIRST_INTERNET_ORDER ';
run;
proc print data=ORION.INTERNET(obs=5);run;
title 'First Five Observations of ORION.INTERNET ';
run;
proc print data=ORION.MULTIPLE_ORDERS(obs=5);run;
title 'First Five Observations of ORION.MULTIPLE_ORDERS ';
run;
proc print data=ORION.NEW_PRODUCTS(obs=5);run;
title 'First Five Observations of ORION.NEW_PRODUCTS ';
run;
proc print data=ORION.ORDER_FACT(obs=5);run;
title 'First Five Observations of ORION.ORDER_FACT ';
run;
proc print data=ORION.PRICE_LIST(obs=5);run;
title 'First Five Observations of ORION.PRICE_LIST ';
run;
proc print data=ORION.PRODUCT_DIM(obs=5);run;
title 'First Five Observations of ORION.PRODUCT_DIM ';
run;
proc print data=ORION.PRODUCT_LIST(obs=5);run;
title 'First Five Observations of ORION.PRODUCT_LIST ';
run;
c. Use the %INCLUDE statement to execute the code or open the program to verify that it is correct.
(You can use the SAS Editor window, PROC FSLIST, or Notepad to verify the contents of the
program file.)
10.5 Using the FCMP Procedure (Self-Study) 10-95
Objectives
List reasons to use the FCMP procedure.
Examine the syntax for the FCMP procedure.
Create functions using the FCMP procedure.
Use the user-written functions.
Create subroutines using the FCMP procedure.
The FCMP procedure is new for use in the DATA step in SAS 9.2.
Business Scenario
The Marketing Department at Orion needs to have a
report created daily. The requirements are that the report
must include a column that is the customer ID
concatenated with a comment.
Partial Listing
Using the FCMP Procedure
Obs Customer_ID Delivery_Date Order_Type Marketing_Comment
p310d21
All functions must return a value.
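A minimal sketch of a function definition that follows this rule is shown below. It is not the course's MKT function. The function name ID_NOTE, the package name Demo, and the threshold of 100 are hypothetical and chosen only for illustration.

```sas
proc fcmp outlib=work.functions.Demo;
   /* Hypothetical function: returns a character value up to 40 bytes long */
   function ID_NOTE(ID) $ 40;
      /* Every branch ends in RETURN, so the function always returns a value */
      if ID > 100 then return(catx(' ', put(ID, 8.), 'priority'));
      else return(catx(' ', put(ID, 8.), 'standard'));
   endsub;
run;

options cmplib=work.functions;   /* tell SAS where to find the function */

data _null_;
   length Note $ 40;
   Note=ID_NOTE(150);
   put Note=;
run;
```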
ENDSUB;
Support for PROC FCMP functions used in WHERE statements and PROC COMPUTAB was
added in the platform for SAS Business Analytics 9.2 release.
Using a Function
options cmplib=orion.functions;
data temp;
set orion.order_fact;
Marketing_Comment=
MKT(Customer_ID,Delivery_Date,Order_Type);
run;
p310d22
Obs Customer_ID Delivery_Date Order_Type Marketing_Comment
The order of the libref.data-set names in the list (libref. data-set-1 ... libref. data-set-n)
determines the order in which the data sets are searched.
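For example, a search list with two function data sets could be set as sketched here. Both data set names are assumptions, and the orion libref must already be assigned.

```sas
/* Search work.functions first, then orion.functions (both assumed to exist) */
options cmplib=(work.functions orion.functions);

/* Write the current CMPLIB= setting to the log */
proc options option=cmplib;
run;
```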
10.13 Quiz
Specify the argument of the MKT function that
corresponds to each of the following variables:
Variable Argument
Customer_ID
Delivery_Date
Order_Type
options cmplib=orion.functions;
data temp;
set orion.order_fact;
Marketing_Comment =
MKT(Customer_ID,Delivery_Date,Order_Type);
run;
Business Scenario
You need to create two functions.
Function Name Use of Function
interval<multiple><.shift-index>
Function Result
INTNX('week',Date,0) Sunday, December 30, 2007
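The effect of the shift index can be sketched with a short DATA step. The date below is an assumption chosen for illustration (the value of Date on the slide is not shown). The variable names are hypothetical.

```sas
data _null_;
   Date='01jan2008'd;                      /* a Tuesday, chosen for illustration */
   WeekStart=intnx('week', Date, 0);       /* default weeks begin on Sunday: 30DEC2007 */
   MondayStart=intnx('week.2', Date, 0);   /* shift index 2 begins weeks on Monday: 31DEC2007 */
   put WeekStart= weekdate. / MondayStart= weekdate.;
run;
```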
Business Scenario
The data set orion.Order_Fact contains the delivery date
and order number for customer orders. If the delivery date
is on Saturday or Sunday, the order will be delivered on
Saturday. However, if the delivery date is a weekday, then
the order will be delivered on some day between Monday
and Friday in that week.
Partial Listing
Delivery Information
p310d23
proc fcmp outlib=orion.functions.DateType;
function MONDAY(Date);
return(intnx('week.2', Date, 0));
endsub;
function FRIDAY(Date);
return(intnx('week.7', Date, 1)-1);
endsub;
run;
quit;
option cmplib=orion.functions;
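A quick check of the two functions might look like the sketch below. It assumes that the PROC FCMP step above has already run and that the orion libref is assigned. The test date is an assumption taken from the week shown in the listing.

```sas
options cmplib=orion.functions;

data _null_;
   Date='21apr2004'd;   /* a Wednesday */
   Mon=MONDAY(Date);    /* start of the Monday-based week that contains Date */
   Fri=FRIDAY(Date);    /* day before the next Saturday-based week begins */
   put Mon= weekdate. / Fri= weekdate.;
run;
```

For this date the log should show Monday, April 19, 2004 and Friday, April 23, 2004, which matches the first row of the listing.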
Partial Listing
Delivery Information
9 1232698281 Between Monday, April 19, 2004 and Friday, April 23, 2004
1236028541 Saturday, June 11, 2005
1236673732 Between Monday, August 15, 2005 and Friday, August 19, 2005
1237825036 Between Monday, December 5, 2005 and Friday, December 9, 2005
1238053337 Between Monday, December 26, 2005 and Friday, December 30, 2005
Support for PROC FCMP subroutines used in %SYSFUNC and %SYSCALL macro functions,
ODS tagsets, and the Graph Template Language was added in the platform for SAS Business
Analytics 9.2 release.
p310d24
proc fcmp outlib=orion.functions.Directory;
function DIROPEN(DIR $);
length DIR $ 256 FREF $ 8;
rc=filename(FREF, DIR);
if rc=0 then do;
DID=dopen(FREF);
rc=filename(FREF);
end;
else do;
MSG=sysMSG();
put MSG '(DIROPEN(' DIR= ')';
DID=.;
end;
return(DID);
endsub;
subroutine DIRCLOSE(DID);
outargs DID;
rc=dCLOSE(DID);
DID=.;
endsub;
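The OUTARGS statement used in DIRCLOSE can be illustrated with a smaller, self-contained sketch. The subroutine SWAP and the package name Demo are hypothetical and are not part of the course programs.

```sas
proc fcmp outlib=work.functions.Demo;
   /* Hypothetical subroutine: exchanges the values of its two arguments */
   subroutine SWAP(A, B);
      outargs A, B;   /* changes to A and B are passed back to the caller */
      Temp=A;
      A=B;
      B=Temp;
   endsub;
run;

options cmplib=work.functions;

data _null_;
   X=1;
   Y=2;
   call SWAP(X, Y);   /* subroutines are invoked with CALL */
   put X= Y=;
run;
```

Without OUTARGS, the subroutine would work on copies of its arguments and the caller's variables would be unchanged.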
options cmplib=orion.functions;
data _null_;
array FILES[1000] $ 256 _temporary_;
DNUM=0;
TRUNC=0;
call DIR_entries(".", FILES, DNUM, TRUNC);
if TRUNC then put 'ERROR: Not enough result array entries.
S:\Workshop\city.sas7bdat
S:\Workshop\continent.sas7bdat
S:\Workshop\country.sas7bdat
S:\Workshop\county.sas7bdat
S:\Workshop\customer.sas7bdat
S:\Workshop\customer_dim.sas7bdat
S:\Workshop\customer_type.sas7bdat
S:\Workshop\discount.sas7bdat
S:\Workshop\employee_addresses.sas7bdat
S:\Workshop\employee_organization.sas7bdat
S:\Workshop\employee_payroll.sas7bdat
S:\Workshop\employee_phones.sas7bdat
S:\Workshop\funcs.sas7bdat
S:\Workshop\funcs.sas7bndx
S:\Workshop\functions.sas7bdat
S:\Workshop\functions.sas7bndx
S:\Workshop\geography_dim.sas7bdat
S:\Workshop\geo_type.sas7bdat
S:\Workshop\lookup_agegroup.sas7bdat
S:\Workshop\lookup_country.sas7bdat
S:\Workshop\lookup_custgrp.sas7bdat
S:\Workshop\lookup_euday.sas7bdat
S:\Workshop\lookup_order_type.sas7bdat
S:\Workshop\lookup_product.sas7bdat
S:\Workshop\lookup_usday.sas7bdat
S:\Workshop\orders.sas7bdat
S:\Workshop\order_fact.sas7bdat
S:\Workshop\order_item.sas7bdat
S:\Workshop\organization.sas7bdat
S:\Workshop\organization_dim.sas7bdat
S:\Workshop\org_level.sas7bdat
S:\Workshop\postal_code.sas7bdat
S:\Workshop\price_list.sas7bdat
S:\Workshop\product_dim.sas7bdat
S:\Workshop\product_level.sas7bdat
S:\Workshop\product_list.sas7bdat
S:\Workshop\staff.sas7bdat
S:\Workshop\state.sas7bdat
S:\Workshop\street_code.sas7bdat
S:\Workshop\supplier.sas7bdat
S:\Workshop\time_dim.sas7bdat
Exercises
Level 1
Level 2
d. Print the data to ensure that the function is correctly calculating Real_Age.
PROC PRINT Output
Age Calculations using INTCK
Obs Birth_Date Actual_Date Real_Age Age
e. Create a data set named customer_ages from the orion.customer_dim data set. The new data set
should contain a new variable named Real_Age using the AGE function with
Customer_BirthDate as the Birth_Date variable and 01JAN2008 as the value of Actual_Date.
The new data set should also contain a new variable named Age calculated using the INTCK
function. Print the first five observations of customer_ages.
There is a variable, Customer_Age, in the data set orion.customer_dim. Do not use this
variable.
PROC PRINT Output
Age Calculations based on Calendar-Based Algorithm
Obs Customer_ID Customer_Name Customer_BirthDate
Level 3
Chapter Review
1. What are two parts of the macro facility?
Chapter Review
4. What is the difference between a SAS data file and a
SAS data view?
Chapter Review
5. Why would you use the FILE statement?
10.7 Solutions
Solutions to Exercises
1. Using the FILENAME Statement
a. Open the program p310e01.
b. Use the FILENAME statement to concatenate the three raw data files.
c. Modify the DATA step to use the fileref created in part b to create the SAS data set all_levels.
d. Print the all_levels data set.
p310s01
filename levels ('level_1.dat' 'level_2.dat' 'level_3.dat');
data all_levels;
length Customer_Name $ 40 Customer_Age_Group $ 12
Customer_Type $ 40 Customer_Group $ 40;
infile levels dlm=',';
input Customer_Name $ Customer_Age_Group $ Customer_Type $
Customer_Group $;
run;
2) Create a variable named Total_Donations as the total of the values of the variables Qtr1, Qtr2,
Qtr3, and Qtr4.
3) Create a new variable Donation_Category with the following values:
b. Open and submit the program p310e04 to create a report from the view cc_donations. Use the
variable Donation_Category as a class variable and the variable Total_Donations as an analysis
variable. Verify that the view was created correctly.
p310s04
data cc_donations / view=cc_donations;
set orion.employee_donations;
length Donation_Category $15;
where Paid_By='Credit Card';
Total_Donations=sum(of Qtr1-Qtr4);
if Total_Donations >= 100 then
Donation_Category='$100 or more';
else Donation_Category='Less than $100';
run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
/* z/OS */
%include '.workshop.sascode(Customer_Type)'/source2;
%include '.workshop.sascode(Customer_Group)'/source2;
SAS Log
95 %include 'Customer_Type.sas'/source2;
NOTE: %INCLUDE (level 1) file Customer_Type.sas is file S:\workshop\Customer_Type.sas.
96 +proc format fmtlib;
97 +value TypLevl
98 +1010 ="Orion Club members inactive"
99 +1020 ="Orion Club members low activity"
100 +1030 ="Orion Club members medium activity"
101 +1040 ="Orion Club members high activity"
102 +2010 ="Orion Club Gold members low activity"
103 +2020 ="Orion Club Gold members medium activity"
104 +2030 ="Orion Club Gold members high activity"
105 +3010 ="Internet/Catalog Customers"
106 +;
NOTE: Format TYPLEVL has been output.
106!+ run;
+--------------------------------------------------------------------------+
| FORMAT NAME: TYPLEVL LENGTH: 39 NUMBER OF VALUES: 8                      |
| MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH 39 FUZZ: 0                   |
+----------------+----------------+----------------------------------------+
|START           |END             |LABEL (VER. V7|V8 18FEB2008:18:08:41)   |
+----------------+----------------+----------------------------------------+
|1010            |1010            |Orion Club members inactive             |
|1020            |1020            |Orion Club members low activity         |
|1030            |1030            |Orion Club members medium activity      |
|1040            |1040            |Orion Club members high activity        |
|2010            |2010            |Orion Club Gold members low activity    |
|2020            |2020            |Orion Club Gold members medium activity |
|2030            |2030            |Orion Club Gold members high activity   |
|3010            |3010            |Internet/Catalog Customers              |
+----------------+----------------+----------------------------------------+
/* z/OS */
%include '.workshop.sascode(Print_Products)'/source2;
10. Using the FCMP Procedure to Store a Formula in a Function
a. Open the program p310e10 and submit it.
b. Use PROC FCMP to encapsulate the IF/THEN logic into a function named KB. Store the
function in work.functions.Marketing.
p310s10
proc fcmp outlib=work.functions.Marketing;
function KB(Quantity, Price);
if Quantity > 2 then return(Quantity * Price / 5);
else return(Quantity * Price / 10);
endsub;
run;
quit;
c. Write a DATA step to create a data set named kick_backs that uses the KB function to create a
variable named Kick_Back_Amt. The DATA step should not contain any IF/THEN logic.
p310s10
options cmplib=work.functions;
data kick_backs;
set orion.order_fact(keep=Employee_ID Quantity
Total_Retail_Price);
Kick_Back_Amt=KB(Quantity, Total_Retail_Price);
run;
d. Print the first five observations of the SAS data set kick_backs.
p310s10
proc print data=kick_backs (obs=5);
run;
11. Using the FCMP Procedure to Store a Date Calculation in a Function
a. Open the program p310e11 that contains the formula and a DATA step that generates test data.
b. Write a PROC FCMP step that creates a function named AGE that contains the formula. Store the
function in work.functions.Marketing.
p310s11
proc fcmp outlib=work.functions.Marketing;
function AGE(BirthDate, ActualDate);
return(intck('year', BirthDate, ActualDate)
-(put(BirthDate, mmddyy4.) gt put(ActualDate, mmddyy4.))
+(put(BirthDate, mmddyy4.)||put(ActualDate, mmddyy4.)||
put(ActualDate + 1, mmddyy4.)='022902280301')
);
endsub;
run;
quit;
c. Add an assignment statement to the DATA step in p310e11 that creates a variable named
Real_Age using the AGE function and a variable named Age by using the INTCK function.
p310s11
options cmplib=work.functions;
data real_ages;
do Birth_Date='28feb1960'd to '01mar1960'd;
do Actual_Date='28feb2004'd to '01mar2004'd,
'28feb2005'd to '01mar2005'd;
Real_Age=AGE (Birth_Date, Actual_Date);
Age=intck('year', Birth_Date, Actual_Date);
output;
end;
end;
format Birth_Date Actual_Date worddate.;
run;
d. Print the data to ensure that the function is correctly calculating Real_Age.
p310s11
proc print data=real_ages;
var Birth_Date Actual_Date Real_Age Age;
title1 'Age Calculations using INTCK';
run;
e. Create a data set named customer_ages from the orion.customer_dim data set. The new data set
should contain a new variable named Real_Age using the AGE function with
Customer_BirthDate as the Birth_Date variable and 01JAN2008 as the value of Actual_Date.
The new data set should also contain a new variable named Age calculated using the INTCK
function. Print the first five observations of customer_ages.
p310s11
data customer_ages;
set orion.customer_dim(keep=Customer_ID Customer_Name
Customer_Group Customer_BirthDate);
Real_Age=AGE(Customer_BirthDate,'01jan2008'd);
Age=intck('year',Customer_BirthDate,'01jan2008'd);
format Customer_BirthDate worddate.;
run;
b. Open the program p310e12 and submit it. Look in the log to ensure that the function is working
correctly.
p310s12
proc fcmp outlib=work.functions.Marketing;
function NUMS(DSN $);
length DSN $41;
DSID=open(DSN);
NOBS=attrn(DSID, "NLOBSF");
/* Close the data set before returning; statements after RETURN never execute */
RC=close(DSID);
return(NOBS);
endsub;
run;
quit;
options cmplib=work.functions;
data _null_;
X=NUMS('orion.internet');
put X=;
run;
b. NextFile="mon"||put(i,2.)||".dat";
NextFile=compress(NextFile);
c. NextFile=compress("mon"||put(i,2.)||".dat");
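As an alternative sketch (not one of the quiz answers), the CATS function builds the same kind of name without a separate COMPRESS call, because it strips leading and trailing blanks from each argument. The loop bounds here are an assumption for illustration.

```sas
data _null_;
   do i=1 to 12;
      /* CATS converts i to character and removes the surrounding blanks */
      NextFile=cats('mon', i, '.dat');
      put NextFile=;
   end;
run;
```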
Chapter 11 Customizing Your SAS
Session (Self-Study)
11.1 Introduction
Objectives
Review the OPTIONS procedure.
List reasons for customizing a SAS session.
Describe the methods that are used to customize a
SAS session.
Option Tasks
11.01 Quiz
Open and submit the program p311a01.
proc options listgroups;
run;
In SAS 9.2 you can view multiple groups using the following syntax:
p311a02.sas
proc options group=(sort memory);
run;
configuration file: contains only system options and the location of SAS components
autoexec file: contains SAS code such as OPTIONS statements or LIBNAME statements
SAS Registry: consists of keys and sub-keys that refer to particular aspects of SAS
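To make the distinction concrete, an autoexec file holds ordinary SAS statements, as in the minimal sketch below. The option values mirror the configuration file examples in this chapter. The library path is an assumption based on the classroom setup.

```sas
/* autoexec.sas: a minimal sketch; adjust the path for your site */
options nocenter nodate msglevel=i linesize=64 pagesize=56;
libname orion 's:\workshop';   /* assumed location of the course data */
```

A configuration file would express the same settings as invocation options (for example, -nocenter) and cannot contain statements such as LIBNAME.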
z/OS users can create a user configuration file using any text editor to write SAS system options into a
physical file. The physical file can then be specified in the CONFIG= invocation system option
interactively or in batch mode.
11.2 Editing the Configuration File 11-7
Objectives
Define the purpose of the configuration file.
List the two parts of the configuration file.
Create a custom configuration file.
Use the custom configuration file.
-FONTSLOC specifies the location that contains the SAS fonts that are loaded by some
Universal Printer drivers.
-SET defines a SAS (internal) environment variable. In this case, the variable,
FT15F001, specifies the file reference of a file that SAS opens when it
encounters a PARMCARDS (or PARMCARDS4) statement in a
procedure. The PARMCARDS statement is used in the BMDP and
EXPLODE procedures.
<lines removed>
The following are SAS librefs, which are established in the configuration file:
HELPLOC points to and concatenates the locations of the SAS Help facility
files.
The following are SAS internal environment variables, which are set in the configuration file:
sasroot sets the location where SAS software was installed, often referred to
as the current or working location or only as sasroot.
sasext1 sets the location for the National Language Support modules.
MYSASFILES sets the location for the Sasuser libref as an environment variable.
Reference Information
-nocenter
-nodate
-msglevel i
-linesize 64
-pagesize 56
-work /users/myuserid/tmp
-WORK pathname
Windows Specifics:
Creating a Configuration File
To ensure that all of the required system options are
defined in the custom configuration file, copy the default
file and modify the copy.
Name the file sasv9.cfg or .sasv9.cfg.
Example:
-nocenter
-nodate
-msglevel i
-linesize 64
-pagesize 56
-work "c:\temp"
-sasinitialfolder s:\workshop
-WORK "library-specification"
-SASINITIALFOLDER newfolder
nocenter
nodate
msglevel=i
linesize=64
pagesize=56
-work userid.myfile.mywork
-WORK library-specification
11.03 Quiz
In the Windows operating environment, navigate to
C:\Program Files\SAS\SASFoundation\9.2\nls\en\sasv9.cfg
For more information, consult the SAS Help facility by following the path described below:
Using SAS Software in Your Operating Environment →
SAS 9.2 Companion for UNIX Environments → Running SAS Software Under UNIX →
Getting Started with SAS in UNIX Environments →
Customizing Your SAS Session by Using Configuration and Autoexec Files
"c:\program files\SAS\SASFoundation\9.2\sas.exe"
-config "c:\mysas\mysasconfig.CFG"
For more information, consult the SAS Help facility by following the path described below:
Using SAS Software in Your Operating Environment → SAS 9.2 Companion for Windows →
Running SAS under Windows → Getting Started → Files Used by SAS
For more information, consult the SAS Help facility by following the path described below:
Using SAS Software in Your Operating Environment →
SAS 9.2 Companion for z/OS → Running SAS Software under z/OS →
Initializing and Configuring SAS Software → Customizing Your SAS Session
Objectives
Define an autoexec file.
Create an autoexec file.
Execute the autoexec file.
11.3 Creating an Autoexec.sas File 11-23
11.04 Poll
Have you ever created an autoexec file?
Yes
No
11.05 Poll
Is the code from the autoexec file included as part of your
log?
Yes
No
NOECHOAUTO | ECHOAUTO
sas -noautoexec
z/OS
sas autoexec(noautoexec)
Objectives
Define the SAS Registry.
Investigate techniques for modifying the SAS Registry.
11.4 Using the SAS Registry 11-29
To open the Print Setup window, select File → Print Setup.
The PRTDEF procedure creates printer definitions in batch mode either for an individual user or for all
SAS users at your site. Your system administrator can create printer definitions in the SAS Registry and
make these printers available to all SAS users at your site by using PROC PRTDEF with the
USESASHELP option. An individual user can create personal printer definitions in the SAS Registry by
using PROC PRTDEF.
Option Task
DATA= specifies the input data set that contains the printer
attributes.
FOREIGN specifies that the registry entries are created for export
to a different host.
REPLACE specifies that any printer name that already exists will
be modified by using the information in the printer
attributes data set.
11.06 Quiz
Open the Registry Editor by selecting
Solutions → Accessories → Registry Editor
or use the REGEDIT command on the command line.
Which key would contain settings from the LIBNAME
window?
CORE\EXPLORER\KEYEVENTS The valid key events for the 3270 interface. This key is
used only on the mainframe platforms.
CORE\EXPLORER\ICONS The icons displayed in the Explorer. If the icon value is -1,
this causes the icon to be hidden in the Explorer.
CORE\EXPLORER\NEW What types of objects are available from the File → New
menu in Explorer.
p311d03
11.07 Quiz
Open and submit p311a03.
p311a03
proc registry list startat='core\options\libnames';
run;
11.5 Solutions
11.01 Quiz
Open and submit the program p311a01.
proc options listgroups;
run;
Chapter 12 Learning More
12.1 Conclusions
Objectives
Review techniques for conserving computer
resources.
Functions by Category
• http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000245860.htm
SAS Efficiency
• http://www2.sas.com/proceedings/forum2007/042-2007.pdf
• http://www2.sas.com/proceedings/forum2007/209-2007.pdf
Hash Tables
• http://www2.sas.com/proceedings/forum2007/039-2007.pdf
• http://www2.sas.com/proceedings/sugi31/244-31.pdf
• http://support.sas.com/resources/papers/sgf2008/hashing92.pdf
• http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf
• http://support.sas.com/rnd/base/datastep/dot/iterator-getting-started.pdf
Arrays
• http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf
Numeric Precision
• http://support.sas.com/techsup/technote/ts654.pdf
Threading
• http://www2.sas.com/proceedings/sugi29/217-29.pdf
• http://www2.sas.com/proceedings/sugi28/282-28.pdf
Objectives
Identify areas of support that SAS offers.
List additional resources.
Education
Comprehensive training to deliver greater value to your
organization
http://support.sas.com/training/
12.2 SAS Resources 12-9
SAS Publishing
SAS offers a complete selection of publications to help
customers use SAS software to its fullest potential:
http://support.sas.com/publishing/
Computer-based
certification exams –
typically 60-70 questions
and 2-3 hours in length
Preparation materials and
practice exams available
Worldwide directory of
SAS Certified Professionals
http://support.sas.com/certify/
Support
SAS provides a variety of self-help and assisted-help
resources.
http://support.sas.com/techsup/
User Groups
SAS supports many local, regional, international, and
special-interest SAS user groups.
SAS Global Forum
http://support.sas.com/usergroups/
List of Papers
• http://support.sas.com/resources/papers/
Objectives
Identify the next set of courses that follow this course.
Next Steps
[Diagram: SAS® Programming 3: Advanced Techniques and Efficiencies leads to the SAS Macro Language, the Applications Development Curriculum, the Web Enablement Curriculum, the Data Warehousing Curriculum, the Statistical Analysis Curriculum, and Presenting Your Information.]
12.3 Beyond This Course 12-13
Next Steps
To learn more about this:             Enroll in the following:
Using the Macro Facility              SAS® Macro Language 1: Essentials
                                      SAS® Macro Language 2: Developing Macro Applications
Creating graphic reports with         SAS/GRAPH® 1: Essentials
SAS/GRAPH software
Next Steps
In addition, there are prerecorded, short, technical
discussions and demonstrations that are called e-lectures.
http://support.sas.com/training/