SAS Language Reference 9.4
SAS® Documentation
November 4, 2019
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS® 9.4 Language Reference: Concepts, Sixth
Edition. Cary, NC: SAS Institute Inc.
SAS® 9.4 Language Reference: Concepts, Sixth Edition
Copyright © 2016, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62960-821-1 (Paperback)
ISBN 978-1-62960-822-8 (PDF)
All Rights Reserved. Produced in the United States of America.
For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you
acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and
punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted
materials. Your support of others' rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at
private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software
by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR
227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights
as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other
notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those
set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
November 2019
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and
other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
Contents
Overview
SAS 9.4 has the following changes and enhancements:
• In SAS 9.4M6, Noto Sans TrueType fonts are added to support languages for harmonious web display.
• In SAS 9.4M6 and later, be aware of a restriction for PROC SQL views.
• In SAS 9.4M5, you can access SAS Cloud Analytics Services from your SAS session when you license and install SAS Viya 3.2 and later releases.
• In SAS 9.4M5, new AvenirNextforSAS and HelveticaNeueforSAS fonts replace the Avenir Next LT W04, Avenir Next Cyr W04, and Helvetica LT Pro fonts.
• In SAS 9.4M4, new TrueType fonts are added.
• In SAS 9.4M2, font slanting and emboldening features are added. For more information, see “Slanting and Emboldening Fonts”.
• In SAS 9.4M1, the LOCKDOWN statement and system option are added.
• In SAS 9.4, the following are new:
  ◦ universal printing enhancements to support additional graphic output types and animation for GIF and SVG files
  ◦ buffer size specification support for SAS DATA step views
  ◦ new multilingual and Asian monolingual TrueType fonts are added
  ◦ support for extended attributes on SAS data sets and variables
  ◦ enhanced functionality to extend the observation count for 32-bit SAS data files
  ◦ enhancements to SAS data file protection
  ◦ enhanced functionality for VIEWTABLE column headings.
xii What's New in the 9.4 Base SAS Language Reference: Concepts
Universal Printing
The following features are new in SAS 9.4:
• You can animate multi-page GIF images and SVG files.
• SAS can now create TIFF images, and the EMFPlus and EMFDual metafile formats.
• Transparency is supported for EMF Universal Printers and GIF images that are printed using the PostScript Universal Printer.
• You can add a printer’s mark that is not visible in Universal Printing output by using the COLOPHON= system option.
• SVG documents can be magnified by setting the SVGMAGNIFYBUTTON system option. SAS embeds a magnify tool in the document when the SVG document is created.
See “Creating TIFF Images Using Universal Printing” on page 372.
Fonts
• In SAS 9.4M6 and later, the following monotype TrueType fonts are added to support languages for harmonious web display:
  ◦ NotoSans-Bold
  ◦ NotoSans-BoldItalic
  ◦ NotoSans-Italic
  ◦ NotoSansJP-Bold
  ◦ NotoSansJP-Light
  ◦ NotoSansJP-Regular
  ◦ NotoSansKR-Bold
  ◦ NotoSansKR-Light
  ◦ NotoSansKR-Regular
  ◦ NotoSans-Regular
  ◦ NotoSansSC-Bold
  ◦ NotoSansSC-Light
  ◦ NotoSansSC-Regular
  ◦ NotoSansTC-Bold
  ◦ NotoSansTC-Light
  ◦ NotoSansTC-Regular
  ◦ NotoSansThai-Bold
  ◦ NotoSansThai-Regular
  Note: The NotoSansThai-Regular and the NotoSansThai-Bold fonts do not contain all of the Latin1 glyphs. When a glyph is missing, SAS substitutes the ArialUnicodeMS font, enabling the output to display.
• In SAS 9.4, the following TrueType fonts were added, replacing the fonts shown in the second column:

  Added Font               Replaced Font
  CSongGB19030-LightHWL    NSimSun

• In SAS 9.4M2, font slanting and emboldening features are new. If you specify italic or bold styles on a universal printer font that does not have italic or bold, the font will display as slanted or bold. See “Slanting and Emboldening Fonts” on page 321.
• In SAS 9.4M3, the following Avenir Next Fonts were added for the Latin and Cyrillic character sets. These fonts were replaced with AvenirNextforSAS fonts in SAS 9.4M5.

  Avenir Next Fonts for the Latin Character Set    Avenir Next Fonts for the Cyrillic Character Set
  Avenir Next LT W04 Demi Italic                   Avenir Next Cyr W04 Demi Italic
  Avenir Next LT W04 Light Italic                  Avenir Next Cyr W04 Light Italic
Extended Attributes
You can create customized attributes for variables and data sets by using extended
attributes. Extended attributes are customized metadata for your SAS files. They are
user-defined characteristics that you associate with a SAS data set or variable. See
“Extended Attributes” on page 716.
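As a minimal sketch of how extended attributes might be attached, the following step uses the XATTR statement of PROC DATASETS. The attribute names Source and Unit, their values, and the table WORK.CLASS are hypothetical choices for illustration; the sketch also assumes that the SASHELP.CLASS sample table is installed.

```sas
/* Copy a sample table so that the shipped SASHELP library is left untouched. */
data work.class;
  set sashelp.class;
run;

proc datasets library=work nolist;
  modify class;
  /* Data set level attribute (the name Source is hypothetical). */
  xattr add ds Source="Copy of SASHELP.CLASS";
  /* Variable level attribute on Height (the name Unit is hypothetical). */
  xattr add var Height (Unit="inches");
run;
quit;

/* PROC CONTENTS reports the descriptor information, including extended attributes. */
proc contents data=work.class;
run;
```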
PART 1
Chapter 1
Essential Concepts of Base SAS Software
Chapter 2
SAS Processing
Chapter 3
Rules for Words and Names in the SAS Language
Chapter 4
SAS Variables
Chapter 5
Missing Values
Chapter 6
Expressions
Chapter 7
Dates, Times, and Intervals
Chapter 8
Error Processing and Debugging
Chapter 9
SAS Output
Chapter 10
By-Group Processing in SAS Programs
Chapter 11
WHERE-Expression Processing
Chapter 12
Optimizing System Performance
Chapter 13
Support for Parallel Processing
Chapter 14
The SAS Registry
Chapter 15
Printing with SAS
Chapter 1
Essential Concepts of Base SAS Software
What Is SAS?
Overview of Base SAS Software
  Components
  Other SAS Software
  Operating System Information
Components of the SAS Language
  SAS Files
  SAS Data Files
  External Files
  Database Management System Files
  SAS Language Elements
  SAS Macro Facility
Ways to Run Your SAS Session
  Starting a SAS Session
  Different Types of SAS Sessions
  SAS Windowing Environment
  Interactive Line Mode
  Noninteractive Mode
  Batch Mode
  Object Server Mode
Customizing Your SAS Session
  Setting Default System Option Settings
  Executing Statements Automatically
  Customizing the SAS Windowing Environment
Conceptual Information about Base SAS Software
  SAS Concepts
  DATA Step Concepts
  SAS Files Concepts
What Is SAS?
SAS is a set of solutions for enterprise-wide business users and provides a powerful
fourth-generation programming language for performing tasks such as these:
• quality improvement
• applications development
With Base SAS software as the foundation, you can integrate many SAS business
solutions that enable you to perform large-scale business functions.
Examples include data warehousing and data mining, human resources
management and decision support, and financial management and decision
support.
Components
The core of SAS is Base SAS software, which consists of the following:
DATA step
a programming language that you use to manipulate and manage your data.
SAS procedures
software tools for data analysis and reporting.
macro facility
a tool for extending and customizing SAS software programs and for reducing
text in your programs.
DATA step debugger
a programming tool that helps you find logic problems in DATA step programs.
Output Delivery System (ODS)
a system that delivers output in a variety of easy-to-access formats, such as
SAS data sets, procedure output files, or Hypertext Markup Language (HTML).
SAS windowing environment
an interactive, graphical user interface that enables you to easily run and test
your SAS programs.
The SAS windowing environment is described in the online Help.
SAS Files
When you work with SAS, you use files that are created and maintained by SAS, as
well as files that are created and maintained by your operating environment, and
that are not related to SAS. Files with formats or structures known to SAS are
referred to as SAS files. All SAS files reside in a SAS library.
The most commonly used SAS file is a SAS data set. A SAS data set is structured
in a format that SAS can process. Another common type of SAS file is a SAS
catalog. Many different types of information that are used in a SAS job are stored in
SAS catalogs. Examples include instructions for reading and printing data values, or
function key settings that you use in the SAS windowing environment. A SAS stored
program is a type of SAS file that contains compiled code that you create and save
for repeated use.
In some operating environments, a SAS library is a physical relationship among
files; in others, it is a logical relationship. For more information about the
characteristics of SAS libraries, see the SAS documentation for your operating
environment:
Windows Specifics: “Introduction to SAS Files” in SAS Companion for Windows
UNIX Specifics: “Introduction to SAS Files, Libraries, and Engines” in SAS Companion for UNIX Environments
z/OS Specifics: “Introduction to SAS Files” in SAS Companion for z/OS
• SAS view
A SAS data file both describes and physically stores your data values. A SAS view,
on the other hand, does not actually store values. Instead, it is a query that creates
a logical SAS data set that you can use as if it were a single SAS data set. It
enables you to look at data stored in one or more SAS data sets or in other vendors'
software files. SAS views enable you to create logical SAS data sets without using
the storage space required by SAS data files.
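The difference between a stored data file and a view can be sketched with a DATA step view. This is only an illustration: the view name WORK.TALL_STUDENTS and the height threshold are hypothetical, and the SASHELP.CLASS sample table is assumed to be available.

```sas
/* Define a DATA step view: only the query is stored, not the selected rows. */
data work.tall_students / view=work.tall_students;
  set sashelp.class;
  where height > 60;
run;

/* The view is used like a data set; the rows are produced when it is read. */
proc print data=work.tall_students;
run;
```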
A SAS data set consists of the following:
• descriptor information
• data values
The descriptor information describes the contents of the SAS data set to SAS. The
data values are data that has been collected or calculated. They are organized into
rows, called observations, and columns, called variables. An observation is a
collection of data values that usually relate to a single object. A variable is the set of
data values that describe a given characteristic. The following figure represents a
SAS data set.
[Figure: A SAS data set, showing the descriptor portion (descriptive information) and the data values arranged by variables]
External Files
Data files that you use to read and write data, but which are in a structure unknown
to SAS, are called external files. External files can be used for storing
• raw data that you want to read into a SAS data file
For more information about the characteristics of external files in your operating
environment, see the SAS documentation for your operating environment:
• “Using External Files and Devices” in SAS Companion for UNIX Environments,
• “Using External Files under Windows” in SAS Companion for Windows, and

• PROC steps
A DATA step consists of a group of statements in the SAS language that can
perform the following tasks:
• read data from external files
Once your data is accessible as a SAS data set, you can analyze the data and write
reports by using a set of tools known as SAS procedures.
A group of procedure statements is called a PROC step. SAS procedures analyze
data in SAS data sets to produce statistics, tables, reports, charts, and plots, to
create SQL queries, and to perform other analyses and operations on your data.
They also provide ways to manage and print SAS files.
You can also use global SAS statements and options outside of a DATA step or
PROC step.
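The following sketch puts the three kinds of program pieces described above side by side: global statements, a DATA step, and a PROC step. The data set name WORK.HEIGHTS, the variable HEIGHT_CM, and the title text are hypothetical, and the SASHELP.CLASS sample table is assumed to be available.

```sas
/* Global statements: valid outside of any DATA or PROC step. */
options nodate nonumber;
title 'Average Height by Sex';

/* DATA step: reads an existing data set and computes a new variable. */
data work.heights;
  set sashelp.class;
  height_cm = height * 2.54;
run;

/* PROC step: analyzes the data set that the DATA step created. */
proc means data=work.heights mean;
  class sex;
  var height_cm;
run;

/* Clear the title so that it does not carry over to later output. */
title;
```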
In the Explorer window, you can view and manage your SAS files, which are stored
in libraries, and create shortcuts to external files. The Results window helps you
navigate and manage output from SAS programs that you submit; you can view,
save, and manage individual output items. You use the Program Editor, Log, and
Output windows to enter, edit, and submit SAS programs, view messages about
your SAS session and programs that you submit, and browse output from programs
that you submit. For more detailed information about the SAS windowing
environment, see Chapter 16, “Introduction to the SAS Windowing Environment,” on
page 385.
By default, the SAS log and output are displayed immediately following the program
statements.
Noninteractive Mode
In noninteractive mode, SAS program statements are stored in an external file. The
statements in the file execute immediately after you issue a SAS command
referencing the file. Depending on your operating environment and the SAS system
options that you use, the SAS log and output are either written to separate external
files or displayed.
For more information about how these files are named and where they are stored,
see the SAS documentation for your operating environment:
UNIX Specifics: “The Default Routings for the SAS Log and Procedure Output in
UNIX Environments” in SAS Companion for UNIX Environments
Windows Specifics: “Routing Procedure Output and the SAS Log to a File” in SAS
Companion for Windows
z/OS Specifics: “Destinations of SAS Output Files” in SAS Companion for z/OS
Batch Mode
You can run SAS jobs in batch mode in operating environments that support batch
or background execution. Place your SAS statements in a file and submit them for
execution along with the control statements and system commands required at your
site.
When you submit a SAS job in batch mode, one file is created to contain the SAS
log for the job, and another is created to hold output that is produced in a PROC
step or, when directed, output that is produced in a DATA step by a PUT statement.
For more information about executing SAS jobs in batch mode, see the SAS
documentation for your operating environment:
UNIX Specifics: “Printing and Routing Output” in
SAS Companion for UNIX Environments
Windows Specifics: “Running SAS in Batch
Mode” in SAS Companion for Windows
z/OS Specifics: “Directing SAS Log and SAS
Procedure Output” in SAS Companion for z/OS
Also, see the documentation specific to your site for local requirements for running
jobs in batch and for viewing output from batch jobs.
• Customize the Explorer window by registering member, entry, and file types.
See the SAS online Help for more information and for additional ways to customize
your SAS windowing environment.
SAS Concepts
SAS concepts include the building blocks of the SAS language: rules for words and
names; variables; missing values; expressions; dates, times, and intervals; and
each of the six SAS language elements (data set options, formats, functions,
informats, statements, and system options).
SAS system-wide concepts also include introductory information that helps you
begin to use SAS, including information about the SAS log, SAS output, error
processing, WHERE processing, and debugging. Information about SAS processing
prepares you to write SAS programs. Information about how to optimize and
monitor system performance is also included.
Chapter 2
SAS Processing
You can use different types of data as input to a DATA step. The DATA step includes
SAS statements that you write, which contain instructions for processing the data.
As each DATA step in a SAS program is compiling or executing, SAS generates a
log that contains processing messages and error messages. These messages can
help you debug a SAS program.
External files
External files can contain raw data, SAS programming statements, procedure
output, or output created by the PUT statement. They contain records composed
of formatted data (data that is arranged in columns) or free-formatted data
(data that is not arranged in columns).
Instream data
is data included in your program. You use the DATALINES statement at the
beginning of your data to identify the instream data.
For more information about raw data, see Chapter 21, “Reading Raw Data,” on
page 471.
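As a small illustration of instream data, the DATA step below reads three records that follow a DATALINES statement. The data set name WORK.SCORES, the variable names, and the values are made up for this sketch.

```sas
data work.scores;
  /* List input: a character variable followed by a numeric variable. */
  input name $ score;
  /* DATALINES marks the start of the instream data. */
  datalines;
Ana 91
Ben 78
Cy 85
;
run;

proc print data=work.scores;
run;
```

The line that contains only a semicolon ends the instream data.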
Remote access
enables you to read input data from nontraditional sources such as a TCP/IP
socket or a URL. SAS treats this data as if it were coming from an external file.
SAS enables you to access your input data remotely in the following ways:
SAS catalog
specifies the access method that enables you to reference a SAS catalog as
an external file.
Clipboard
specifies the access method that enables you to read or write text data to the
clipboard on the host computer.
DATAURL
specifies the access method that enables you to access remote files by using
the DATAURL access method.
FTP
specifies the access method that enables you to use File Transfer Protocol
(FTP) to read from or write to a file from any host computer that is connected
to a network with an FTP server running.
Hadoop
specifies the access method that enables you to access files on a Hadoop
Distributed File System (HDFS) whose location is specified in a configuration
file.
SFTP
specifies the access method that enables you to use Secure File Transfer
Protocol (SFTP) to read from or write to a file from any host computer that is
connected to a network with an OpenSSH SSHD server running.
TCP/IP socket
specifies the access method that enables you to read from or write to a
Transmission Control Protocol/Internet Protocol (TCP/IP) socket.
URL
specifies the access method that enables you to use the uniform resource
locator (URL) to read from and write to a file from any host computer that is
connected to a network with a URL server running.
WebDAV
specifies the access method that enables you to use the WebDAV protocol to
read from or write to a file from any host computer that is connected to a
network with a WebDAV server running.
ZIP
specifies the access method that enables you to access ZIP files by using
zlib services.
For more information about accessing data remotely, see the following topics:
• “FILENAME Statement: CLIPBOARD Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: CATALOG Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: DATAURL Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: FTP Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: Hadoop Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: SFTP Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: SOCKET Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: URL Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: WebDAV Access Method” in SAS Global Statements: Reference
• “FILENAME Statement: ZIP Access Method” in SAS Global Statements: Reference
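To show how an access method lets SAS treat a remote source like an external file, here is a minimal sketch that uses the URL access method in a FILENAME statement. The fileref WEBDATA and the address are placeholders rather than a real resource, and a session in a locked-down state might not permit this access method.

```sas
/* Associate a fileref with a remote text file (placeholder address). */
filename webdata url 'https://example.com/data.txt';

/* Read the remote file record by record and echo each record to the SAS log. */
data _null_;
  infile webdata;
  input;
  put _infile_;
run;

/* Release the fileref when it is no longer needed. */
filename webdata clear;
```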
SAS log
contains a list of processing messages and program errors. The SAS log is
produced by default.
SAS data file
is a SAS data set that contains two parts: a data portion and a data descriptor
portion.
SAS view
is a SAS data set that uses descriptor information and data from other files. SAS
views enable you to dynamically combine data from various sources without
using disk space to create a new data set. A SAS data file contains actual data
values. However, SAS views contain only references to data stored elsewhere.
SAS views are of member type VIEW. In most cases, you can use a SAS view
as if it were a SAS data file.
External data file
contains the results of DATA step processing. These files are data or text files.
The data can be records that are formatted or free-formatted.
Report
contains the results of DATA step processing. Although you usually generate a
report by using a PROC step, you can generate the following two types of
reports from the DATA step:
Procedure output file
contains printed results of DATA step processing, and usually contains headers
and page breaks.
HTML file
contains results that you can display on the World Wide Web. This type of output
is generated through the Output Delivery System (ODS).
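A DATA step report of the procedure output kind can be sketched with FILE PRINT and PUT statements. The column positions, formats, and title below are arbitrary choices for illustration, and the SASHELP.CLASS sample table is assumed to be available.

```sas
title 'Class Roster';

data _null_;                /* _NULL_: run the step without creating a data set */
  set sashelp.class;
  file print;               /* route PUT output to the procedure output file */
  put @1 name $10. @15 age 3. @22 height 5.1;
run;

title;
```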
set. For more information about procedure output, see Base SAS Procedures Guide
and the SAS Output Delivery System: User’s Guide.
General Information
If you are running SAS in a client/server environment (for example, you are using
SAS Enterprise Guide), the SAS server administrator can restrict access to files and
directories on the host system. Additionally, when a SAS session is in a locked-down
state, certain access methods, functions, CALL routines, and procedures are
restricted by default. For more information, see “Sign On to Locked-Down SAS
Sessions” in SAS/CONNECT User’s Guide.
When SAS is in a locked-down state, the following SAS language elements are not
available by default:
Column headings: Functions and CALL Routines, Access Methods, Procedures, Other

Access Methods:
  FTP
  EMAIL
  HADOOP (enables PROC HADOOP)
  HTTP (enables PROC HTTP and PROC SOAP)
  SOCKET
  TCPIP
  URL (enables PROC HTTP and PROC SOAP)
If you attempt to use a resource that is locked down, SAS issues an error message
to the SAS log. If the SAS session is configured for the SAS logging facility, SAS
issues an error message to the Audit.Lockdown logger.
For more information, see the following resources:
• “LOCKDOWN system option” and “LOCKDOWN statement” in SAS Intelligence Platform: Application Server Administration Guide on the SAS Intelligence Platform Documentation page at support.sas.com/documentation/onlinedoc/intellplatform.
• “Locked-Down Servers” in SAS Intelligence Platform: Security Administration Guide on the SAS Intelligence Platform Documentation page at support.sas.com/documentation/onlinedoc/intellplatform.
• To see the procedures that do not execute when the SAS server is in a locked-down state, see “Restrictions” and “Interactions” syntax information for the individual procedures in the Base SAS Procedures Guide.
z/OS-Specific Information
Restricted Features
Access to permanent z/OS data sets and UFS files and directories is not permitted
unless enabled in the lockdown list. This restriction applies to all SAS features, most
notably FILENAME and LIBNAME statements in SAS programs that are submitted
for execution on the server. This restriction also applies to the ability to list files on
the server through SAS clients such as SAS Enterprise Guide. When SAS is in the
locked-down state, SAS does not permit access to uncataloged z/OS data sets
except through externally allocated ddnames that are established by the server
administrator. However, there are no restrictions on creating temporary z/OS data
sets and UFS files, and processing them within the context of a single client
session. The z/OS data sets are considered temporary if they are allocated
DISP=(NEW,DELETE). External files are considered temporary if they are assigned
using the FILENAME device of TEMP. All members of the client WORK library are
considered temporary.
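The TEMP device mentioned above can be sketched as follows. The fileref SCRATCH and the text that is written are hypothetical; the example is not specific to z/OS and only shows how a temporary external file is assigned, written, read, and released.

```sas
/* Assign a temporary external file; SAS chooses the physical location. */
filename scratch temp;

/* Write two records to the temporary file. */
data _null_;
  file scratch;
  put 'first line';
  put 'second line';
run;

/* Read the records back and echo them to the SAS log. */
data _null_;
  infile scratch;
  input;
  put _infile_;
run;

/* Clear the fileref. */
filename scratch clear;
```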
The SAS server administrator at your installation is responsible for the content of the
lockdown list. Therefore, if you need to access a z/OS data set or UFS file that is
unavailable in the locked-down state, contact your server administrator.
Disabled Features
The following SAS procedures, which are specific to z/OS, cannot be executed
when SAS is in the locked-down state:
PDS SOURCE
PDSCOPY TAPECOPY
RELEASE TAPELABEL
The following DATA step functions, which are specific to z/OS, cannot be executed
when SAS is in the locked-down state:
ZVOLLIST ZDSATTR
ZDSLIST ZDSRATT
ZDSNUM ZDSXATT
ZDSIDNM ZDSYATT
The following access method, which is specific to z/OS, cannot be executed when
SAS is in the locked-down state:
VTOC
• FILEEXIST
• FILENAME
• RENAME
• DSNCATLGD (z/OS-specific)
Chapter 3
Rules for Words and Names in the SAS Language
Definition of Word
A word or token in the SAS programming language is a collection of characters that
communicates a meaning to SAS and which cannot be divided into smaller units
that can be used independently. A word can contain a maximum of 32,767 bytes.
A word or token ends when SAS encounters one of the following:
• the beginning of a new token
• a blank after a name or a number token
Each word or token in the SAS language belongs to one of four categories:
• names
• literals
• numbers
• special characters
• _new
• yearcutoff
• year_99
• descending
• _n_
literal
consists of 1 to 32,767 bytes enclosed in single or double quotation marks. Here
are some examples of literals:
• 'Chicago'
• "1990-91"
• 'Amelia Earhart'
Note: The surrounding quotation marks identify the token as a literal, but SAS
does not store these marks as part of the literal token.
number
in general, is composed entirely of numeric digits, with an optional decimal point
and a leading plus or minus sign. SAS also recognizes numeric values in the
following forms as number tokens: scientific (E−) notation, hexadecimal notation,
missing value symbols, and date and time literals. Here are some examples of
number tokens:
• 5683
• 2.35
• 0b0x
• -5
• 5.4E-1
• '24aug90'd
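The number tokens listed above can be tried out in a short DATA step. This is only a sketch: the variable names are hypothetical, and the value that is displayed for the date literal depends on the YEARCUTOFF= setting of the session.

```sas
data _null_;
  whole    = 5683;         /* digits only */
  decimal  = 2.35;         /* digits with a decimal point */
  negative = -5;           /* leading minus sign */
  sci      = 5.4E-1;       /* scientific (E) notation */
  hex      = 0b0x;         /* hexadecimal B0, which is 176 in decimal */
  day      = '24aug90'd;   /* date literal */
  /* Write each value to the SAS log; DAY is displayed with the DATE9. format. */
  put whole= decimal= negative= sci= hex= day= date9.;
run;
```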
special character
is usually any single keyboard character other than letters, numbers, the
underscore, and the blank. In general, each special character is a single token,
although some two-character operators, such as ** and <=, form single tokens.
The blank can end a name or a number token, but it is not a token. Here are
some examples of special-character tokens:
• =
• ;
• '
• +
• @
• /
Spacing Requirements
Here are the spacing requirements for words in SAS statements:
• You can begin SAS statements in any column of a line and write several statements on the same line.
• You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
• A blank is not treated as a character in a SAS statement unless it is enclosed in quotation marks as a literal or part of a literal. Therefore, you can put multiple blanks any place in a SAS statement where you can put a single blank. It has no effect on the syntax.
• The rules for recognizing the boundaries of words or tokens determine the use of spacing between them in SAS programs. If SAS can determine the beginning of each token due to cues such as operators, you do not need to include blanks. If SAS cannot determine the beginning of each token, you must use blanks. See Examples on page 23.
Although SAS does not have rigid spacing requirements, SAS programs are easier
to read and maintain if you consistently indent statements. The examples illustrate
useful spacing conventions.
Examples
• In this statement, blanks are not required because SAS can determine the
  boundary of every token by examining the beginning of the next token:
total=x+y;
The first special-character token, the equal sign, marks the end of the name
token total. The plus sign, another special-character token, marks the end of
the name token x. The last special-character token, the semicolon, marks the
end of the y token. Though blanks are not needed to end any tokens in this
example, you can add them for readability, as shown here:
total = x + y;
n This statement requires blank spaces because SAS cannot recognize the
individual tokens without them:
Without blanks, the entire statement up to the semicolon fits the rules for a name
token: it begins with a letter or underscore, contains letters, digits, or
underscores thereafter, and is less than 32,767 bytes long. Therefore, this
statement requires blanks to distinguish individual name and number tokens.
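As a minimal sketch of both cases, the following DATA step mixes statements that need no blanks with one that does. The data set name work.scores and the variable names are hypothetical and chosen only for illustration.

```sas
data work.scores;
   score1=80; score2=90;   /* no blanks needed: = and ; mark the token boundaries */
   total=score1+score2;    /* + separates the two name tokens */
   drop score1 score2;     /* blanks required: dropscore1score2 would be one name token */
run;
```

Without the blanks in the DROP statement, SAS would have no cue for where the keyword ends and the variable names begin.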
n user-supplied names
n Special characters, except for the underscore, are not allowed. In filerefs only,
you can use the dollar sign ($), the number sign (#), and the at sign (@).
n SAS reserves a few names for automatic variables and variable lists, SAS data
sets, and librefs.
o When creating variables, do not use the names of special SAS automatic
variables (for example, _N_ and _ERROR_) or special variable list names
(for example, _CHARACTER_, _NUMERIC_, and _ALL_).
o When associating a libref with a SAS library, do not use these libref names:
n Sashelp
n Sasmsg
n Sasuser
n Work
o When you create SAS data sets, do not use these names:
n _NULL_
n _DATA_
n _LAST_
n When assigning a fileref to an external file, do not use the filename SASCAT.
n When you create a macro variable, do not use names that begin with SYS.
SAS Language Element     Maximum Length (Bytes)
Arrays                   32
CALL routines            16
Catalog entries          32
Component objects        32
Engines                  8
Filerefs                 8
Formats, character       31
Formats, numeric         32
Functions                16
Informats, character     30
Informats, numeric       31
Librefs                  8
Macro variables          32
Macro windows            32
Macros                   32
Passwords                8
SCL variables            32
n The name can contain mixed-case letters. SAS stores and writes the
variable name in the same case that is used in the first reference to the
variable. However, when SAS processes a variable name, it internally
converts the name to uppercase. You cannot, therefore, use the same variable
name with a different combination of uppercase and lowercase letters to
represent different variables. For example, cat, Cat, and CAT all represent the
same variable.
n Do not assign variables the names of special SAS automatic variables (such
as _N_ and _ERROR_) or variable list names (such as _NUMERIC_,
_CHARACTER_, and _ALL_).
Examples season='summer';
percent_of_profit=percent;
UPCASE
is the same as V7, except that variable names are uppercased, as in earlier
versions of SAS.
ANY
n The name can be up to 32 bytes in length.
n The name can contain any characters, including blanks, national characters,
special characters, and multi-byte characters. Names containing these types
of characters must be specified as name literals on page 31.
n The name can begin with any characters, including blanks, national
characters, special characters, and multi-byte characters.
n The name cannot contain any null bytes.
n The name must contain at least one character. A name with all blanks is not
permitted.
n The name can contain mixed-case letters. SAS stores and writes the variable
name in the same case that is used in the first reference to the variable.
However, when SAS processes a variable name, SAS internally converts it to
uppercase. You cannot, therefore, use the same variable name with a
different combination of uppercase and lowercase letters to represent
different variables. For example, cat, Cat, and CAT all represent the same
variable.
Requirement If you use any characters other than the ones that are valid when
the VALIDVARNAME= system option is set to V7 (letters of the
Latin alphabet, numerals, or underscores), then you must express
the variable name as a name literal and you must set
VALIDVARNAME=ANY. If the name includes either the percent
sign (%) or the ampersand (&), then you must use single quotation
marks in the name literal in order to avoid interaction with the SAS
Macro Facility. See “SAS Name Literals” on page 31 and
“Avoiding Errors When Using Name Literals” on page 34.
See “How Many Characters Can I Use When I Measure SAS Name
Lengths in Bytes?” on page 30
CAUTION Throughout SAS, using the name literal syntax with variable
names that exceed the 32-byte limit or have excessive
embedded quotation marks might cause unexpected results.
The intent of the VALIDVARNAME=ANY system option is to enable
compatibility with other DBMS variable (column) naming conventions,
such as allowing embedded blanks and national characters.
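The following is a minimal sketch of the ANY rules in use. The data set name work.survey and both variable names are hypothetical. The second name contains an ampersand and a percent sign, so it is written with single quotation marks, as the Requirement above describes.

```sas
options validvarname=any;

data work.survey;
   'Amount ($)'n  = 125;    /* blank and special characters: name literal required */
   'R&D share %'n = 12.5;   /* single quotation marks avoid macro processing of & and % */
run;

proc print data=work.survey;
   var 'Amount ($)'n 'R&D share %'n;
run;
```

Both names stay within the 32-byte limit when the session encoding stores each of these characters in one byte.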
Rules for SAS Data Set Names, View Names, and Item
Store Names
Three types of SAS members, SAS data sets, data views, and item stores, are
expanded to have more functionality. The setting of the VALIDMEMNAME= system
option determines what rules apply to the names of these members in your SAS
session. The VALIDMEMNAME= option has two settings (COMPATIBLE and
EXTEND), each with varying degrees of flexibility for data set names, data view
names, and item store names:
COMPATIBLE
specifies that a SAS data set name, a view name, or an item store name must
follow these rules:
n The name can be up to 32 bytes in length.
n The name must begin with a letter of the Latin alphabet (A–Z, a–z) or the
underscore. Subsequent characters can be letters of the Latin alphabet,
numerals, or underscores.
n The name cannot contain blanks or special characters except for the
underscore.
n The name can contain mixed-case letters. SAS internally converts the
member name to uppercase. You cannot, therefore, use the same member
name with a different combination of uppercase and lowercase letters to
represent different members. For example, customer, Customer, and
CUSTOMER all represent the same member name. How the name appears on
disk is determined by the operating environment.
Alias COMPAT
EXTEND
specifies that a SAS data set name, a SAS view name, or an item store name
must follow these rules:
n The name can be up to 32 bytes in length.
n The name can include national characters, but it must be written as a SAS
name literal on page 31.
n The name can include special characters, except for the / \ * ? " < > |: -
characters, but it must be written as a SAS name literal.
Note: The SPD engine does not allow ‘.’ (the period) anywhere in the
member name.
n The name must contain at least one character (letters, numbers, valid special
characters, and national characters).
n Null bytes are not allowed.
Note: The SPD engine does not allow ‘$’ as the first character of the
member name.
n Leading and trailing blanks are deleted when the member is created.
n The name can contain mixed-case letters. SAS internally converts the
member name to uppercase. You cannot, therefore, use the same member
name with a different combination of uppercase and lowercase letters to
represent different members. For example, customer, Customer, and
CUSTOMER all represent the same member name. How the name appears is
determined by the operating environment.
Operating For Windows and UNIX operating environments, all Base SAS
environments windows support the extended rules when
VALIDMEMNAME=EXTEND is set.
z/OS specifics The windowing environment for Base SAS supports the extended
rules in the Editor, Log, and Output windows when
When you reference a SAS file directly by its physical name, the
final embedded period is considered to be an extension delimiter
only if what follows the period is a valid SAS extension.
Otherwise, the period is considered to be part of the member
name. For example, in the name my.member, member is
considered part of the member name and not a file extension. In
the name "my.member.sas7bdat", the member name is
"my.member" and the file extension is sas7bdat.
See “How Many Characters Can I Use When I Measure SAS Name
Lengths in Bytes?” on page 30
CAUTION Throughout SAS, using the name literal syntax with SAS
member names that exceed the 32-byte limit or have
excessive embedded quotation marks might cause
unexpected results. The intent of the
VALIDMEMNAME=EXTEND system option is to enable
compatibility with other DBMS member naming conventions, such as
allowing embedded blanks and national characters.
Note: The VALIDMEMNAME= option is not valid for the following tape engines:
V9TAPE, V8TAPE, V7TAPE, and V6TAPE.
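As an illustration of the EXTEND rules, the sketch below creates and prints a data set whose name contains a blank. The member name 'Sales 2019'n and the variables are hypothetical. The name avoids the characters that EXTEND still prohibits (/ \ * ? " < > | : -).

```sas
options validmemname=extend;

data work.'Sales 2019'n;      /* embedded blank: must be written as a name literal */
   region = 'East';
   amount = 100;
run;

proc print data=work.'Sales 2019'n;
run;
```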
When these system option values are set, the maximum number of characters that
you can use for a SAS variable name, data set name, view name, or item store
name is determined by the number of bytes of storage that are used to store one
character. This value is set by the SAS encoding value for your SAS session.
VALIDVARNAME=ANY or VALIDMEMNAME=EXTEND must be set to allow the use
of national language support (NLS) characters. Otherwise, only one-byte characters
are allowed.
The SAS encodings for western languages use one byte of storage to store one
character. Therefore, in western languages, you can use 32 characters for these
SAS names. The SAS encodings for some Asian languages use one to two bytes of
storage to store one character. The Unicode encoding, UTF-8, supports one to four
bytes of storage for a single character. When the SAS encoding uses four bytes to
store one character, the maximum length of one of these SAS names is eight
characters.
All SAS encodings support the characters A–Z and a–z as one-byte characters.
Follow these instructions for finding the maximum number of characters that can be
used for a SAS name:
2 In the table “SBCS, DBCS, and Unicode Encoding Values Used to Transcode
Data,” find the maximum number of bytes per character for the SAS encoding.
This table is in SAS National Language Support (NLS): Reference Guide.
3 Find the maximum number of bytes for a SAS name from Table 3.1 on page 25.
Divide this number by the bytes per character. The result is the maximum
number of characters that you can use for the SAS name.
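The division in the last step can be sketched in a DATA step. The two input values are assumptions for illustration: a 32-byte limit (the limit given above for variable names) and a worst case of 4 bytes per character (the UTF-8 case described earlier).

```sas
data _null_;
   max_bytes      = 32;   /* assumed byte limit for the SAS name */
   bytes_per_char = 4;    /* assumed maximum bytes per character for the encoding */
   max_chars = floor(max_bytes / bytes_per_char);
   put max_chars=;        /* 32 / 4 gives 8 characters */
run;
```

The result agrees with the eight-character maximum stated above for encodings that use four bytes per character.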
n DBMS table
n item store
n SAS view
n statement label
n variable
To use characters in a name literal other than _, A–Z, or a–z, you must set either the
VALIDVARNAME=ANY or the VALIDMEMNAME=EXTEND system option. The
following table specifies the options that you must set to use SAS name literals.
Name literals are especially useful for expressing DBMS column and table names
that contain special characters and for including national characters in SAS names.
The following is an example of a VAR statement and a name literal:
var 'a b'n;
Important Restrictions
n You can use a name literal only for variables, statement labels, DBMS column
and table names, SAS data sets, SAS views, and item stores.
n When the name literal of a SAS data set name, a SAS view name, or an item
store name contains any characters that are not allowed when
VALIDMEMNAME=COMPAT, then you must set the system option
VALIDMEMNAME=EXTEND. See “VALIDMEMNAME= System Option” in SAS
System Options: Reference.
Note: Hash objects do not support the VALIDMEMNAME=EXTEND system
option for data set names. Data set names that are used with hash objects cannot
contain special characters or national characters.
n When the name literal of a variable, DBMS table, or DBMS column contains any
characters that are not allowed when VALIDVARNAME=V7, then you must set
the system option VALIDVARNAME=ANY. See “VALIDVARNAME= System
Option” in SAS System Options: Reference.
n If you use either the percent sign (%) or the ampersand (&), then you must use
single quotation marks in the name literal in order to avoid interaction with the
SAS Macro Facility.
n When the name literal of a DBMS table or column contains any characters that
are not valid for SAS rules, you might need to specify a SAS/ACCESS LIBNAME
statement option.
Note: For more details and examples about the SAS/ACCESS LIBNAME
statement and about using DBMS table and column names that do not conform
to SAS naming conventions, see SAS/ACCESS for Relational Databases:
Reference.
n In a quoted string, SAS preserves and uses leading blanks, but SAS ignores and
trims trailing blanks.
n Blanks between the closing quotation mark and the n are not valid when you
specify a name literal.
n Note that even if you set VALIDVARNAME=ANY, the V6 engine does not support
names that have intervening blanks.
For more information about BY-Group Processing and how SAS creates the
temporary variables, FIRST and LAST, see “How SAS Determines FIRST.variable
and LAST.variable” on page 499 and “How SAS Identifies the Beginning and End of
a BY Group” in SAS DATA Step Statements: Reference.
Table 3.3 Summary of Default Rules for Naming SAS Data Sets and SAS Variables
Table 3.4 Summary of Extended Rules for Naming SAS Data Sets and SAS Variables
SAS data set names:
n can contain special characters except for / \ * ? " < > | : -. A name that contains
special characters must be specified as a name literal.
n cannot begin with a blank or a period.
n ignores leading and trailing blanks.
n can contain mixed-case letters. SAS internally converts the member name to
uppercase. You cannot, therefore, use the same member name with a different
combination of uppercase and lowercase letters to represent different members. For
example, cat, Cat, and CAT all represent the same member name.¹
SAS variable names:
n can contain special characters including / \ * ? " < > | : - . A name that contains
special characters must be specified as a name literal.
n can begin with any character, including blanks, national characters, special
characters, and multi-byte characters.
n preserves leading blanks, but trailing blanks are ignored.
n can contain mixed-case letters. SAS stores and writes the variable name in the
same case that is used in the first reference to the variable. However, when SAS
processes a variable name, it internally converts the variable name to uppercase.
You cannot, therefore, use the same variable name with a different combination of
uppercase and lowercase letters to represent different variables. For example, cat,
Cat, and CAT all represent the same variable.
n cannot contain all blanks.
¹ In the UNIX operating environment, SAS only reads data set names that are written in all
lowercase characters.
4
SAS Variables
* If not explicitly defined, a variable’s type and length are automatically set by SAS based on the
variable’s first occurrence in a DATA step.
** The minimum length is 2 bytes in some operating environments, 3 bytes in others. See the
documentation for your operating system.
Note: The maximum number of variables can be greater than 32,767. This number
depends on your environment, the file's attributes and the total length of all the
variables, which cannot exceed the maximum page size, which is 32,767.
To get information about a variable’s attributes, use the CONTENTS statement in
the DATASETS procedure or the functions that are named in the following
definitions:
name
identifies a variable. A variable name must conform to SAS naming rules. See the
list of rules in Table 3.3 on page 34.
The names _N_, _ERROR_, _FILE_, _INFILE_, _MSG_, _IORC_, and _CMD_
are reserved for automatic variables, which are generated automatically during
DATA step execution. Note that SAS products use variable names that start and
end with an underscore; it is recommended that you do not use names that start
and end with an underscore in your own applications.
To determine the value of this attribute, use the VNAME function or the
VARNAME function.
type
identifies a variable as numeric or character. Within a DATA step, a variable is
assumed to be numeric unless character is indicated. Numeric values represent
numbers, can be read in a variety of ways, and are stored in floating-point
format. Character values can contain letters, numbers, and special characters
and can be from 1 to 32,767 characters long.
In an INPUT statement, you can assign a length other than the default length to
character variables. You can also assign a length to a variable in the ATTRIB
statement.
If you create a variable for the first time in an assignment statement and do not
explicitly define its type, then SAS determines its type based on the variable’s
first occurrence in the DATA step. The variable gets the same type and length as
the expression on the right side of the assignment statement.
n A variable that appears for the first time on the left side of an assignment
statement has the same length as the expression on the right side of the
assignment statement.
n A character variable that appears for the first time in a DATA step in an
INPUT statement and whose length has not been otherwise specified, has a
default length of 8.
n A character variable that appears for the first time in a FORMAT or
INFORMAT statement has a type and length based on the category of format
or informat that is applied when it is created. See “Formats by Category” in
SAS Formats and Informats: Reference and “Informats by Category” in SAS
Formats and Informats: Reference for a list of these categories.
To determine the value of this attribute, use the VARTYPE function or the
VTYPE function.
length
refers to the number of bytes used to store each of the variable's values in a
SAS data set. You can use a LENGTH statement to set the length of both
numeric and character variables. Variable lengths that are specified in a
LENGTH statement affect the length of numeric variables only in the output data
set. During processing, all numeric variables have a length of 8. Lengths of
character variables that are specified in a LENGTH statement affect both the
length during processing and the length in the output data set.
In an INPUT statement, you can assign a length other than the default length to
character variables. You can also assign a length to a variable in the ATTRIB
statement.
If you create a variable and do not explicitly define its length, then the length is
automatically set by SAS. SAS sets the length based on the variable’s first
occurrence in the DATA step.
n A variable that appears for the first time on the left side of an assignment
statement has the same length as the expression on the right side of the
assignment statement.
n A character variable that appears for the first time in a DATA step in an
INPUT statement and whose length has not been otherwise specified, has a
default length of 8.
n A character variable that appears for the first time in a FORMAT or
INFORMAT statement has a type and length based on the category of format
or informat that is applied when it is created. See “Formats by Category” in
To determine the value of this attribute, use the OUT= option in the CONTENTS
statement of the DATASETS procedure to create an output data set. The
IdxUsage variable in the output data set contains one of the following values for
each variable:
Value Definition
extended attribute
is a user-defined attribute that is created using the XATTR ADD VAR statement
in the DATASETS procedure. For more information, see XATTR ADD statement.
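The attribute functions named in the definitions above can be tried in a short DATA step. This is a sketch only: the variables City and Count are hypothetical, and VLENGTH is a companion function that is not named in the definitions above, so treat its use here as an assumption.

```sas
data _null_;
   length City $ 12;
   City  = 'Cary';
   Count = 3;
   city_name  = vname(City);     /* name of the variable */
   city_type  = vtype(City);     /* C for character */
   count_type = vtype(Count);    /* N for numeric */
   city_len   = vlength(City);   /* compile-time length in bytes */
   put city_name= city_type= count_type= city_len=;
run;
```

For attributes of variables in a stored data set rather than in the current DATA step, the CONTENTS statement in the DATASETS procedure reports the same information.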
Overview
The recommended way to create variables is to use one of the following methods.
When using any of these methods, be sure to reference the variable for the first time
in the statement that is used to create it.
n Create a New Variable Using the LENGTH Statement.
Here are some additional ways that you can create variables in a DATA step.
However, these methods are usually recommended for changing existing variables
or for reading in existing variables that are located in external files or raw data:
n Specify a new variable in a FORMAT or INFORMAT statement.
Note: This list is not exhaustive. For example, the SET, MERGE, MODIFY, and
UPDATE statements can also create variables.
When SAS assigns a value to a character variable, it pads the value with blanks or
truncates the value on the right side to make it match the length of the target
variable. Consider the following DATA step, in which variables are assigned a length
of 200 bytes, then concatenated:
data sales;
   length address1 address2 address3 $ 200;   /* each variable is 200 bytes long */
   address3 = address1||address2;             /* result is truncated to 200 bytes */
run;
Because the length of Address3 is 200 bytes, only the first 200 bytes of the
concatenation (the value of Address1) are assigned to Address3. You might be able
to avoid this problem by using the TRIM function to remove trailing blanks from
Address1 before performing the concatenation, as follows:
data sales;
   length address1 address2 address3 $ 200;
   address3 = trim(address1)||address2;       /* remove trailing blanks before concatenating */
run;
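The effect is easier to see with values assigned. The following sketch uses hypothetical variables and shorter lengths so that the padding is visible in the log.

```sas
data _null_;
   length first last $ 10 joined1 joined2 $ 20;
   first = 'Ada';
   last  = 'Lovelace';
   joined1 = first || last;         /* first is padded to 10 bytes, so blanks separate the values */
   joined2 = trim(first) || last;   /* trailing blanks removed before concatenation */
   put joined1= / joined2=;
run;
```

In joined1 the second value starts at byte 11. In joined2 it follows the first value directly.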
To change the lengths of existing numeric variables, use the LENGTH statement or
the ATTRIB statement. See Example and “ATTRIB Statement” in SAS DATA Step
Statements: Reference for more information.
n INFORMAT=
n LENGTH=
In this example, the ATTRIB statement is specified first in the DATA step. The
ATTRIB statement uses the FORMAT= option to create a character variable named
Flavor with the $w. format and a length of 10 bytes. The ATTRIB statement also
specifies the LENGTH= option to create a variable named Sizes with a length of 20
bytes.
Example Code 4.1 Create a New Variable Using the ATTRIB Statement
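data lollipops;
   attrib Flavor format=$10.     /* character variable, length 10 from the format width */
          Sizes  length=$20;     /* character variable, length 20 */
   Flavor="Cherry";
   Sizes="Small Medium Large";   /* assigns the variable that ATTRIB declared */
run;
proc contents data=lollipops; run;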
Note:
n You cannot change the length of a character variable with a subsequent
LENGTH or ATTRIB statement within the same DATA step.
n You can change the length of a numeric variable by using a subsequent
LENGTH statement.
If the assignment statements had been specified first in the DATA step (before the
ATTRIB statement), then the DATA step would have created the character variable
Flavor with a length of 6 and the character variable Sizes with a length of 18.
Example Code 4.2 Change the Attributes of an Existing Variable Using the ATTRIB
Statement
data lollipops;
Flavor="Cherry";
attrib Flavor format=$10.;
run;
If the variable already exists, then you can use the ATTRIB statement to specify one
or more of the following variable attributes to change the existing variable:
n FORMAT=
n INFORMAT=
n LENGTH=
n LABEL=
Note: You cannot create a new variable by using a LABEL statement or the ATTRIB
statement's LABEL= attribute by itself. Labels can be applied only to existing
variables.
For more information, see “ATTRIB Statement” in SAS DATA Step Statements:
Reference.
Table 4.3 Resulting Variable Types and Lengths Produced When They Are Not Explicitly Set
If a variable appears for the first time on the right side of an assignment statement,
SAS assumes that it is a numeric variable and that its value is missing. If no later
statement gives it a value, SAS prints a note in the log that the variable is not
initialized.
You can use the “VARINITCHK= System Option” in SAS System Options:
Reference to control whether SAS writes a note, a warning, an error message, or
nothing to the SAS log when a variable is not initialized. If VARINITCHK=ERROR
is set, the DATA step stops processing.
Note: A RETAIN statement initializes a variable and can assign it an initial value,
even if the RETAIN statement appears after the assignment statement.
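Both behaviors can be sketched with two short DATA steps. The variable names are hypothetical. The first step leaves subtotal uninitialized. The second relies on the RETAIN rule in the note above.

```sas
data _null_;
   total = subtotal + 1;   /* subtotal first appears on the right side: numeric and missing */
   put total= subtotal=;
run;

data _null_;
   count = count + 1;      /* count is referenced here first */
   retain count 0;         /* RETAIN still initializes count to 0 */
   put count=;
run;
```

In the first step both values are missing, and the log reports that subtotal is uninitialized. In the second step the RETAIN statement takes effect at compile time, so the sum starts from 0 rather than from a missing value.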
See Also
n “Reading Unaligned Data with Simple List Input” in SAS DATA Step Statements:
Reference
n “When to Use List Input” in SAS DATA Step Statements: Reference
Figure 4.1 Results for Creating New Variables Using Simple List Input with Lengths
Specified
To change the lengths of existing numeric variables, use the LENGTH statement or
the ATTRIB statement. See ““Example: ”” in SAS DATA Step Statements: Reference
and “ATTRIB Statement” in SAS DATA Step Statements: Reference for more
information.
See Also
n “INPUT Statement: List” in SAS DATA Step Statements: Reference
n “INPUT Statement: Formatted” in SAS DATA Step Statements: Reference
category of format that is assigned to them. Because the variable Amount is associated
with a numeric format (COMMAw.d), SAS defines it as a numeric variable
with a default length of 8.
The Flavor variable is defined as a character type variable because the
$UPCASEw. format is a character type format. When character variables are
created using the FORMAT statement, SAS determines their length first based on
the length that you specify in the FORMAT statement. If you do not specify a length
with the format or anywhere else in the DATA step, then SAS gives the variable a
default length of 8. In this example, the format does not include a length
specification, so the length for the variable Flavor is 8 bytes.
Example Code 4.3 Specifying New Variables Using the FORMAT Statement (Without
Specifying Lengths)
data lollipops;
format Flavor $upcase. Amount comma.;
Flavor='Cherry';
Amount=10;
run;
proc contents data=lollipops; run;
The next example is identical except that a length is specified for both variables in
the FORMAT statement along with the format:
Output 4.1 PROC CONTENTS Output for Creating New Variables Using the FORMAT
Statement (Without Specifying Lengths)
Example Code 4.4 Specifying New Variables Using the FORMAT Statement (With Length
Specified)
data lollipops;
format Flavor $upcase10. Amount comma10.;
Flavor='Cherry';
Amount=10;
run;
proc contents data=lollipops; run;
Output 4.2 PROC CONTENTS Output for Specifying New Variables Using the FORMAT
Statement (With Length Specified)
In the example below, the variables Flavor and Amount are created using an
assignment statement rather than using a FORMAT statement. When a variable
appears for the first time on the left side of an assignment statement, SAS
automatically sets its type and length based on the expression on the right side of
the assignment statement.
Example Code 4.5 Changing the Format of an Existing Variable Using the FORMAT
Statement
data lollipops;
Flavor='Cherry';
Amount=10;
format Flavor $upcase10. Amount comma10.;
run;
proc contents data=lollipops; run;
Output 4.3 PROC CONTENTS Output for Changing the Format of an Existing Variable
Using the FORMAT Statement
Since the expression on the right is the 6-letter character string, Cherry, SAS
assigns a length of 6 bytes to the character variable Flavor. SAS assigns a length
of 8 bytes to the numeric variable, Amount.
See Also
n Table 4.1 on page 38
n SAS Formats and Informats: Reference
44 data _null_;
45 x= 3626885;
46 length y $ 4;
47 y=x;
48 put y;
49 run;
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
47:6
36E5
50 data _null_;
51 x1= 3626885;
52 length y1 $ 1;
53 y1=x1;
54 xs=0.000005;
55 length ys $ 1;
56 ys=xs;
57 put y1= ys=;
58 run;
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
53:7 56:7
NOTE: Invalid character data, x1=3626885.00 , at line 53 column 7.
y1=* ys=0
x1=3626885 y1=* xs=5E-6 ys=0 _ERROR_=1 _N_=1
NOTE: At least one W.D format was too small for the number to be printed. The
decimal may be shifted by the "BEST" format.
In the first DATA step of the example, SAS is able to fit the value of X into the 4-byte
variable Y by representing the value in scientific notation. In the second DATA step, SAS
cannot fit the value of X1 into the 1-byte variable Y1 and displays an asterisk (*) instead.
datalines;
San Francisco 67
Paris 427
New York 67
Moscow 770
Melbourne 802
;
proc print data=aircode;
run;
ods listing close;
This example produces the following LISTING output:
n ATTRIB
n FORMAT
n INFORMAT
n LENGTH
n RETAIN
For any of these statements to work, they must be placed prior to any one of the
following declarative statements:
n SET
n MERGE
n UPDATE
Only the variables whose positions are relevant need to be listed. Variables not
listed in these statements retain their original position.
In the following example, the data set Sashelp.Class contains variables Name, Sex,
Age, Height, and Weight (in that order). The LENGTH statement is specified before
the SET statement. Specifying the LENGTH statement before the SET statement
causes the variable Height to be moved to the first position in the output data set.
Example Code 4.6 Using the LENGTH Statement to Reorder Variables
data Class1;
   length Height 3;     /* The LENGTH statement precedes the SET statement */
   set Sashelp.Class;   /* and causes the variable Height to be placed first */
run;                    /* in the output */
The RETAIN statement is most often used to reorder variables simply because no
other variable attribute specifications are required. The RETAIN statement has no
effect on retaining values of existing variables being read from the data set. In the
following example, the RETAIN statement causes the variable Weight to be listed
first in the output data set:
Example Code 4.7 Using the RETAIN Statement to Reorder Variables
data Class2;
retain Weight; /* The RETAIN statement precedes the SET statement */
set Sashelp.Class; /* and causes the variable Weight to be placed first */
run; /* in the output */
proc print data=Class2;
run;
Automatic Variables
Automatic variables are created automatically by the DATA step or by DATA step
statements. These variables are added to the program data vector but are not
written to the output data set. The values of automatic variables are retained from
one iteration of the DATA step to the next, rather than set to missing.
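As a brief illustration, the following sketch writes the automatic variables _N_ and _ERROR_ to the log for the first three observations of Sashelp.Class. Because the step uses DATA _NULL_, no output data set is created.

```sas
data _null_;
   set sashelp.class;
   /* _N_ counts DATA step iterations; _ERROR_ is 0 unless an error occurred */
   if _n_ <= 3 then put _n_= _error_= name=;
run;
```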
Automatic variables that are created by specific statements are documented with
those statements. For examples, see the “BY Statement” in SAS DATA Step
Definition
A SAS variable list is an abbreviated method of referring to a list of variable names.
SAS enables you to use the following variable lists:
n numbered range lists
With the exception of the numbered range list, you refer to the variables in a
variable list in the same order that SAS uses to keep track of the variables. SAS
keeps track of active variables in the order in which the compiler encounters them
within a DATA step. This happens whether the active variables are read from
existing data sets, an external file, or created in the step.
In a numbered range list, you can refer to variables that were created in any order,
provided that their names have the same prefix.
Note: Only the numbered range list is used in the RENAME= option.
Definition
Numbered range lists require you to have a series of variables with the same name,
except for the last character or characters, which are consecutive numbers. For
example, the following two lists refer to the same variables:
Var1 Var2 Var3 Var4 Var5 Var6
Var1-Var6
Example
For example, suppose you decide to give some of your variables sequential names,
as in Score1, Score2, Score3, and so on. You can write an INPUT statement as
follows:
data exam;
input Score1-Score10;
datalines;
1 2 3 4 5 6 7 8 9 10
;
In a numbered range list, you can begin with any number and end with any number
as long as you do not violate the rules for user-supplied names and the numbers are
consecutive.
data exam;
input Score11-Score20;
datalines;
1 2 3 4 5 6 7 8 9 10
;
proc print data=exam noobs; run;
Example
Using the same data set from the previous example, the following example shows
how you can use a numbered range list to reference a subset of the variables:
data exam2;
set exam;
drop Score13-Score18;
run;
proc print data=exam2 noobs; run;
Example
You can also use a numbered range list in an ARRAY statement. In the following
example, notice how the variables are first defined in the INPUT statement before
they are used in the array declaration. The variables in the INPUT statement can
either be written out individually, as shown in the first DATA step below, or they can
be written as a numbered range list, as shown in the second DATA step:
data temperatures;
input day1 day2 day3 day4 day5 day6 day7;
datalines;
44.4 44.6 44.9 45.2 45.4 45.7 45.9
;
proc print data=temperatures;
title "Average Daily Low Temperature";
run;
data tempCelsius(drop=i);
set temperatures;
array celsius{7} day1-day7;
do i=1 to 7;
celsius{i}=(celsius{i} - 32) * 5/9;
end;
run;
One solution to this limitation is to rename the variables that make up the list by
appending a character to the ends of the variable names:
data test;
set test;
keep a2147483646A--a2147483648A; /* 4 */
run;
proc print data=test; run;
1 Create a data set that contains variable names longer than the maximum
allowed for lists. Specify the variables as a numbered range list in the KEEP
statement. An error is returned.
2 Use PROC SQL and SAS dictionary tables to concatenate the letter A to the
ends of all the variable names. Read those values into a macro variable named
list.
3 Use PROC DATASETS MODIFY to rename the variables by specifying the
macro variable as the argument to the RENAME statement.
4 Use the variables in a name range list in the KEEP statement (instead of in a
numbered range list). See “Name Range Lists” on page 58 for more
information.
Note: Notice the difference in syntax between a numbered range list and the
name range list used in the solution above. In a name range list, the range is
indicated using double hyphens.
Note: Notice that name range lists use a double hyphen ( - - ) to designate the
range between variables, and numbered range lists use a single hyphen to
designate the range.
You can use a name range list in an ARRAY declaration as long as you have
already defined the variables prior to declaring the array. The variables can be
defined in the same DATA step or in a previous DATA step.
Below are some examples that show how name range lists can be used with various
SAS statements and options.
The following DATA step creates the data set that will be used in Examples 1
through 3.
data patients;
input Idnum Name $ Weight Pulse BMI Gender $;
datalines;
123 Jones 155 82 27 F
456 Smith 175 78 24 M
789 Kamda 172 69 22 F
;
Example 1
In the following example, the name range list specified in the KEEP statement keeps
all variables between and including Name and Pulse.
data patientsConsec;
set patients;
keep Name--Pulse;
run;
proc print data=patientsConsec; run;
Example 2
In the following example, the name range list specified in the KEEP statement keeps
all numeric variables between and including Idnum and BMI.
data patientsNumerics;
set patients;
keep Idnum-numeric-BMI;
run;
Example 3
In the following example, the name range list specified in the KEEP statement keeps
all character variables between and including Idnum and Pulse.
data patientsCharacters;
set patients;
keep Idnum-character-Pulse;
run;
proc print data=patientsCharacters; run;
Example 4
The following example uses the Sashelp.Fish data set.
This example shows how you can use a name range list to specify the variables in
an array. The ARRAY statement reads all variables between and including the
variables Length1 and Width into an array named fish. The DO loop iterates
through the items in the array and converts the values to centimeters.
/* Array is used to convert inches to centimeters */
data fishConvert(drop=i);
set sashelp.fish(where=(species="Whitefish"));
array fish{5} Length1--Width;
do i=1 to 5;
fish{i}=fish{i} * 2.54;
end;
run;
proc print data=fishConvert; run;
Example 5
The following example uses the Sashelp.Baseball data set, in which the following
variables are defined:
In the example, the name range list specified in the KEEP statement keeps all
numeric variables between and including nAtBat and nOuts.
The name range list specified in the ARRAY statement reads all variables
between and including nAtBat and nRuns into an array named stats.
The name range list specified in the VAR statement in the PRINT procedure
specifies that only the variables between and including Name and nBB are printed in
the PROC PRINT output.
/* Array is used to multiply stats by 10 */
data changeStats(where=(YrMajor>18));
set sashelp.baseball;
keep Name nAtBat-numeric-nOuts YrMajor;
array stats(4) nAtBat--nRuns;
do i=1 to 4;
stats{i} = stats{i} * 10;
end;
run;
proc print data=changeStats;
var Name--nBB;
run;
Example 6
The following example uses the Sashelp.Baseball data set. See Output 4.8 on page
60 for a list of variables defined in the data set.
In the example, the name range list specified in the ARRAY statement reads all
character variables between and including Name and Div into an array named
case. The KEEP statement specifies name range lists to keep all numeric variables
between and including CrRuns and nOuts and all character variables between and
including Name and Div.
/* Array is used to uppercase character values */
data baseballUpcase;
set sashelp.baseball(where=(YrMajor>18));
array case{6} Name-character-Div;
do i=1 to 6;
case{i}=upcase(case{i});
end;
keep crRuns-numeric-nOuts Name-character-Div;
run;
proc print data=baseballUpcase; run;
Example 7
The following example uses the Sashelp.Baseball data set. See Output 4.8 on page
60 for a list of variables defined in the data set.
In the example, the VAR statement prints the variable Name, all variables between
and including nAtBat and nHome (specified as a name range list), and the variable
Salary.
proc print data=sashelp.baseball(obs=5);
var Name nAtBat--nHome Salary;
run;
Note: You can use the VARNUM option in PROC CONTENTS or the VAR
statement in PROC PRINT to print the variables in the order of definition.
For more information about using arrays with variable lists, see “Using Variable Lists
to Define an Array Quickly” on page 611.
This character string tells SAS to calculate the sum of all the variables that begin
with “Sales,” such as Sales_Jan, Sales_Feb, and Sales_Mar.
Definition
The OF operator enables you to specify SAS variable lists or SAS arrays as
arguments to functions. Here is the syntax for functions used with the OF operator:
FUNCTION (OF variable-list)
FUNCTION (<argument | OF variable-list | OF array-name[*]><…, <argument | OF
variable-list | OF array-name[*]>>)
The following table shows the types of SAS variable lists that are valid with the OF
operator:
Special SAS name lists    Function(OF _numeric_)    Performs the function on the _numeric_ variable list, which specifies all numeric variables that are already defined in the current DATA step.
1. Requires you to have a series of variables with the same name except for the last character or characters, which are
consecutive numbers.
2. If array-name is a temporary array, there are limitations. See “Using the OF Operator with Temporary Arrays” in SAS
Functions and CALL Routines: Reference.
In the following example, arguments are passed in as numbered range lists, both
with and without the use of the OF operator.
data _null_;
x1=30; x2=20; x3=10;
T=sum(x1-x3); /* #1 */
T2=sum(OF x1-x3); /* #2 */
put T=; /* #3 */
put T2=; /* #4 */
run;
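The OF operator also accepts other kinds of lists. The following sketch, which is illustrative rather than taken from the manual, passes a name prefix list, an array reference, and a numbered range list combined with an ordinary argument. The array q and the Sales_ variables are hypothetical names.

```sas
data _null_;
   /* Hypothetical array with initial values */
   array q{3} q1-q3 (5 10 15);
   Sales_Jan=100; Sales_Feb=200; Sales_Mar=300;

   total_prefix=sum(of Sales_:);      /* name prefix list: all variables that begin with Sales_ */
   total_array=sum(of q{*});          /* every element of the array q */
   total_mixed=sum(of q1-q3, 1000);   /* numbered range list plus an ordinary argument */

   put total_prefix= total_array= total_mixed=;
run;
```

Without OF, an expression such as sum(q1-q3) is treated as a subtraction, which is the difference that the preceding example demonstrates.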
The following table summarizes the general differences between the DROP, KEEP,
and RENAME statements and the DROP=, KEEP=, and RENAME= data set
options.
Table 4.6 Statements versus Data Set Options for Dropping, Keeping, and Renaming Variables
Statements                           Data Set Options
apply to output data sets only       apply to output or input data sets
can be used in DATA steps only       can be used in DATA steps and PROC steps
can appear anywhere in DATA steps    must immediately follow the name of each data set to which they apply
Dropping, Keeping, and Renaming Variables 65
Table 4.7 Status of Variables and Variable Names When Dropping, Keeping, and Renaming Variables
Output data set    DROP, KEEP    specifies which variables are written to all output data sets    all variables available for processing
Order of Application
If your program requires that you use more than one data set option or a
combination of data set options and statements, it is helpful to know that SAS drops,
keeps, and renames variables in the following order:
• First, options on input data sets are evaluated left to right within SET, MERGE,
and UPDATE statements. DROP= and KEEP= options are applied before the
RENAME= option.
• Next, DROP and KEEP statements are applied, followed by the RENAME
statement.
• Finally, options on output data sets are evaluated left to right within the DATA
statement. DROP= and KEEP= options are applied before the RENAME=
option.
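The order described above can be sketched as follows. The data set and variable names (scores, report, Score1, Score2, Final, Bonus, FinalScore) are hypothetical and chosen only to make each stage visible.

```sas
/* Hypothetical input data, for illustration only */
data scores;
   input Id Score1 Score2;
   datalines;
1 90 85
2 70 65
;

data report(drop=Bonus rename=(Final=FinalScore));
   /* Input options: DROP= is applied before RENAME=, */
   /* so DROP= refers to the original name Score2.    */
   set scores(drop=Score2 rename=(Score1=Final));

   /* Inside the step, the variable is known by its new name, Final. */
   Bonus=Final+5;

   /* Output options: DROP= removes Bonus first, */
   /* and then RENAME= changes Final to FinalScore. */
run;

proc print data=report noobs; run;
```

Under these assumptions, the output data set contains only Id and FinalScore.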
data newstate(drop=tempvar);
set state(rename=(poprank=tempvar));
poprank=input(tempvar,8.);
run;
and hidden from public view, but that also contains methods to both encrypt and
decrypt the data.
This section provides sample programs that use the DATA step with different
functions and methods to encrypt and decrypt variables.
/*ENCRYPT*/
do i = 1 to 8;
encrypt=strip(encrypt)||translate(substr(name,i,1),
'0123456789!@#$%^&*()-=,./?<','ABCDEFGHIJKLMNOPQRSTUVWXYZ');
end;
/*DECRYPT*/
do j = 1 to 8;
decrypt=strip(decrypt)||translate(substr(encrypt,j,1),
'ABCDEFGHIJKLMNOPQRSTUVWXYZ','0123456789!@#$%^&*()-=,./?<');
end;
drop i j;
datalines;
ROBERT
JOHN
GREG
;
proc print;
run;
The following output shows the results of the PROC PRINT for Example 1:
Encrypting Variable Values 69
/*ENCRYPT*/
encrypt=id;
i=21;
do from_1 = "C","F","E","A","D","B";
to_1=put(i,2.);
encrypt=tranwrd(encrypt,from_1,to_1);
i+1;
end;
/*DECRYPT*/
decrypt=encrypt;
j=21;
do to_2 = "C","F","E","A","D","B";
from_2=put(j,2.);
decrypt=tranwrd(decrypt,from_2,to_2);
j+1;
end;
drop i j to_1 from_1 to_2 from_2;
datalines;
ABCDEF
FEDC
ACE
BDFA
CAFDEB
BADCF
ABC
;
proc print;
run;
The following output shows the results of the PROC PRINT for Example 2:
Four ARRAY statements are used: the first array sets up the from values; the
second sets up the to values; the third holds the five separate numeric values; and
the fourth holds the five new, separate encrypted values.
The from and to arrays are each created with three elements. The from ARRAY is
assigned the same string of numbers for all three elements, and the to ARRAY is
assigned a different string of letters for each of the three elements to build the
every-third-time rotating pattern.
The PUT function converts the numeric value to a character value.
The first DO loop uses the SUBSTR function to split the value into five separate
values and assigns each to the old ARRAY. The second DO loop translates each
value by using the INDEXC function to find the original number in the from ARRAY
and, if found, translates the value using the to ARRAY, and rotates through the
list of elements every third time. The encrypted value is created by using the CATS
function to concatenate the five translated values.
If you compare the two DATA steps in the example below, you can see that the
values in the to and from arrays are reversed. This is because the second DATA
step reverses the encryption done in the first DATA step, converting the values back
to their original values.
The same process that is used to encrypt the values is also used to decrypt the
values. The only differences are that the encrypted variable is passed to the
SUBSTR function, and the final decrypted variable is passed to the INPUT function
following the CATS function. This is done to convert the final values to numeric
values.
Example Code 4.10 Using Different Functions to Encrypt Numeric Values into Character
Strings
data sample3;
input num;
array from(3) $ 10 from1-from3 ('0123456789','0123456789','0123456789');
array to(3) $ 10 to1-to3 ('ABCDEFGHIJ','KLMNOPQRST','UVWXYZABCD');
array old(5) $ old1-old5;
array new(5) $ new1-new5;
char_num=put(num,5.);
do i = 1 to 5;
old(i)=substr(char_num,i,1);
end;
j=1;
do k = 1 to 5;
if indexc(old(k),from(j)) > 0 then do;
new(k)=translate(old(k),to(j),from(j));
j+1;
if j=4 then j=1;
end;
end;
encrypt_num=cats(of new1-new5);
keep num encrypt_num;
datalines;
12345
70707
99
1111
;
run;
data sample3;
set sample3;
array to(3) $ 10 to1-to3 ('0123456789','0123456789','0123456789');
array from(3) $ 10 from1-from3 ('ABCDEFGHIJ','KLMNOPQRST','UVWXYZABCD');
array old(5) $ old1-old5;
array new(5) $ new1-new5;
do i = 1 to 5;
old(i)=substr(encrypt_num,i,1);
end;
j=1;
do k = 1 to 5;
if indexc(old(k),from(j)) > 0 then do;
new(k)=translate(old(k),to(j),from(j));
j+1;
if j=4 then j=1;
end;
end;
decrypt_num=input(cats(of new1-new5),5.);
keep num encrypt_num decrypt_num;
run;
proc print;
run;
The following output shows the results of the PROC PRINT for Example 3:
Overview
In any number system, whether it is binary or decimal, there are limitations to how
precisely numbers can be represented. As a result, approximations have to be made.
For example, in the decimal number system, the fraction 1/3 cannot be perfectly
represented as a finite decimal value because it contains infinitely repeating digits
(.333...). On computers, because of finite precision, this number must be
approximated. Numerical precision is the accuracy with which numbers are
approximated or represented.
Numerical Accuracy in SAS Software 73
where the pattern 0011 is repeated indefinitely. As a result, the value will be rounded
when stored on a computer.
Performing calculations and comparisons on imprecise numbers in SAS can lead to
unexpected results. Even the simplest calculations can lead to a wrong conclusion.
Hardware cannot always match what might seem obvious and expected in the
decimal system.
For example, in decimal arithmetic, the expression (3 x 0.1) is expected to be
equal to 0.3, so the difference between (3 x 0.1) and (0.3), must be 0. Because
the decimal values 0.1 and 0.3 do not have exact binary representations, this
equality does not hold true in binary arithmetic. If you compute the difference
between the two values in a SAS program, the result is not 0, as Example Code
4.11 on page 74 illustrates.
In the example, SAS sets the variables point_three and
three_times_point_one to 0.3 and (3 x 0.1), respectively. It then compares
the two values by subtracting one from the other and writing the result to the SAS
log:
Example Code 4.11 Comparing Imprecise Values in SAS
data a;
point_three=0.3;
three_times_point_one= 3 * 0.1;
difference= point_three - three_times_point_one;
put 'The difference is ' difference;
run;
The log output shows that 0.3 - (3 x 0.1) does not equal 0, as it does in decimal
arithmetic. This is because the variable difference is the result of calculations that
are performed on rounded approximations of infinitely repeating binary values.
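One way to see why the difference is not 0 is to write both values with the HEX16. format, which shows the stored floating-point representation. This is a minimal sketch that reuses the variable names from the example; the exact digits that are written depend on the host's floating-point format.

```sas
data _null_;
   point_three=0.3;
   three_times_point_one=3*0.1;

   /* HEX16. writes the 8-byte internal representation as 16 hexadecimal digits. */
   /* If the two values were stored identically, the two lines would match.      */
   put point_three=hex16. /
       three_times_point_one=hex16.;
run;
```

Any difference between the two hexadecimal strings, even in the last digit, is enough to make an equality comparison fail.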
There are many decimal fractions whose binary equivalents are infinitely repeating
binary numbers, so be careful when interpreting results from general rational
numbers in decimal. There are some rational numbers that do not present problems
in either number system. For example, 1/2 can be finitely represented in both the
decimal and binary systems.
To understand better why a simple calculation such as this one can go wrong, or
how a number can be out of range, it is important to understand in more detail how
SAS stores binary numbers.
If you have not explicitly specified the number of storage bytes, then SAS uses the
default length of 8 bytes, and the maximum integer then depends solely on what
operating system you are using.
The following table lists the largest integer that can be reliably stored by a SAS
variable in the mainframe, UNIX, and Windows operating environments.
Table 4.8 Largest Integer That Can Be Safely Stored in a Given Length
When Variable Length Equals ...    Largest Integer, z/OS     Largest Integer, Windows and UNIX
3                                  65,536                    8,192
4                                  16,777,216                2,097,152
5                                  4,294,967,296             536,870,912
6                                  1,099,511,627,776         137,438,953,472
7                                  281,474,976,710,656       35,184,372,088,832
CAUTION! Use the full 8 bytes to store variables that contain real numbers.
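The values in Table 4.8 are all powers of 2, so they can be reproduced with a short calculation. The two formulas in this sketch are inferred from the table values and from the storage layouts described in this chapter; they are an assumption for illustration, not a rule stated by the manual.

```sas
data _null_;
   do len=3 to 8;
      /* Assumption: IEEE layout has 1 sign bit, 11 exponent bits, and an   */
      /* implied leading bit, which leaves 8*len-11 bits of integer precision */
      ieee=2**(8*len-11);

      /* Assumption: IBM mainframe layout uses the first byte for the sign  */
      /* and exponent, which leaves len-1 bytes for the mantissa            */
      zos=2**(8*(len-1));

      put len= ieee=comma26. zos=comma26.;
   end;
run;
```

For lengths 3 through 7, the results can be checked against the table rows above.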
Floating-Point Representation
SAS stores numeric values in 8 bytes of data. The way that the numbers are stored
and the space available to store them also affects numerical accuracy. Although
there are various ways to store binary numbers internally, SAS uses floating-point
representation to store numeric values. Floating-point representation supports a
wide range of values (very large or very small numbers) with an adequate amount of
numerical accuracy.
You might already be familiar with floating-point representation because it is similar
to scientific notation. In both scientific notation and floating-point representation,
each number is represented as a mantissa, a base, and an exponent.
987 = .987 x 10^3 (mantissa = .987, base = 10, exponent = 3)
• the mantissa is the number that is being multiplied by the base. In the example,
the mantissa is .987.
• the base is the number that is being raised to a power. In the example, the base
is 10.
• the exponent is the power to which the base is raised. In the example, the
exponent is 3.
One major difference between scientific notation and floating-point representation is
that in scientific notation, the base is 10. In floating-point representation, on most
operating systems, the base is either 2 or 16 depending on the system.
The following figure shows the decimal value 987 written in the IEEE 754 binary
floating-point format. Because it is a small value, no rounding is needed.
[Figure: the value 987 in IEEE 754 format, showing the sign, the exponent (11 bit), and the mantissa]
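The stored form of 987 can be inspected directly with the HEX16. format. This is a minimal sketch; the interpretation in the comments assumes an IEEE (Windows or UNIX) host.

```sas
data _null_;
   x=987;
   /* HEX16. writes the 8-byte internal representation as 16 hex digits.  */
   /* On an IEEE host, the first 3 hex digits hold the sign bit and the   */
   /* 11-bit exponent, and the remaining 13 hex digits hold the mantissa. */
   put x=hex16.;
run;
```

On an IEEE host, this should write x=408ED80000000000. The exponent field 408 (hexadecimal) is 1032, which is the true exponent 9 plus the IEEE bias of 1023, because 987 = 1.927734375 x 2^9.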
Different host computers can have different formats and specifications for floating-
point representation. All platforms on which SAS runs use 8-byte floating-point
representation.
Precision v. Magnitude
The largest integer value that can be represented exactly (without rounding)
depends on the base and the number of bits that are allotted to the exponent. The
precision is determined by the number of bits that are allotted for the mantissa.
Whether an operating system truncates or rounds digits affects errors in
representation.
SAS stores truncated floating-point numbers using the LENGTH statement, which
reduces the number of mantissa bits. The following table shows some differences
between floating-point formats for the IBM mainframe and the IEEE standard. The
IEEE standard is used by the Windows and UNIX operating systems.
Specifications    IBM Mainframe    IEEE Standard (Windows and UNIX)    Affects
Base              16               2                                   magnitude
The following bullet points describe the table above in more detail:
• Base 16 – uses digits 0-9 and letters A-F (to represent the values 10-15).
For example, to convert the decimal value 3000 to hexadecimal, you use the
base 16 number system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
For example, to convert the decimal value 184 to binary, you use the base 2
number system:
Base 2
2^7 ... 2^4 2^3 2^2 2^1 2^0
128 ... 16 8 4 2 1
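The two integer conversions mentioned above can be checked with the HEXw. and BINARYw. formats. This is a minimal sketch; the variable names hex_value and bin_value are hypothetical.

```sas
data _null_;
   /* 3000 = 11*256 + 11*16 + 8, so the hexadecimal digits are B, B, and 8 */
   hex_value=put(3000, hex4.);

   /* 184 = 128 + 32 + 16 + 8 */
   bin_value=put(184, binary8.);

   put hex_value= bin_value=;
run;
```

This should write hex_value=0BB8 bin_value=10111000. Note that HEXw. with a width less than 16 converts the value to an integer before writing it, whereas HEX16. shows the floating-point representation.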
Storage Format
The byte layout for a 64-bit, double-precision number on Windows is as follows:
This representation corresponds to bytes of data with each character being 1 bit, as
follows:
• The S in byte 1 is the sign bit of the number. A value of 0 in the sign bit is used to
represent positive numbers.
• The remaining M characters in bytes 2 through 8 represent the bits of the
mantissa. There is an implied radix point before the left-most bit of the mantissa.
Therefore, the mantissa is always less than 1. The term radix point is used
instead of decimal point because decimal point implies that you are working with
decimal (base 10) numbers, which might not be the case. The radix point can be
thought of as the generic form of decimal point.
The exponent has a base associated with it. Do not confuse this with the base in
which the exponent is represented; the exponent is always represented in binary
format, but the exponent is used to determine how many times the base should be
multiplied by the mantissa.
Conversion Example
This example shows the conversion process for the decimal value 255.75 to
floating-point representation.
1 Use the base 2 number system to write out the value 255.75 in binary.
Note: Each bit in the mantissa represents a fraction whose numerator is 1 and
whose denominator is a power of 2; that is, the mantissa is the sum of a series of
fractions such as 1/2, 1/4, 1/8, and so on. Therefore, for any
floating-point number to be represented exactly, you must express it as the
previously mentioned sum.
Base 2
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
2 Move the decimal over until there is only one digit to the left of it. This process is
called normalizing the value. Normalizing a value in scientific notation is the
process by which the exponent is chosen in such a way that the absolute value
of the mantissa is at least one but less than ten. For this number, you move the
decimal point 7 places:
1.111 1111 11
Because the decimal point was moved 7 places, the exponent is now 7.
4 Convert the decimal value, 1030, to hexadecimal using the base 16 number
system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
The converted hexadecimal value for 1030 will be placed in the exponent portion
of the final result.
If the value that you are converting is negative, change the first bit to 1:
1100 0000 0110
6 From the value in Step 2 above, delete the first digit and the decimal point (the implied one-bit):
1111 1111 1
8 To have a complete nibble at the end, add enough zeros to complete 4 bits:
1111 1111 1000
9 Convert
1111 1111 1000
In this example, the starting decimal value, 255.75, conveniently converts to a finite
binary value that can be represented without rounding in both binary and
hexadecimal. The following section shows the conversion process for a decimal
number that cannot be represented precisely in floating-point representation.
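The result of the conversion steps for 255.75 can be checked with the HEX16. format. This is a minimal sketch that assumes an IEEE (Windows or UNIX) host and the IEEE double-precision bias of 1023; the variable names are hypothetical.

```sas
data _null_;
   x=255.75;
   put x=hex16.;                      /* full 8-byte representation */

   /* True exponent 7 plus the assumed IEEE bias of 1023 gives 1030 */
   exponent_field=7+1023;
   exp_hex=put(exponent_field, hex3.);
   put exp_hex=;
run;
```

On an IEEE host, this should write x=406FF80000000000 and exp_hex=406. The leading 406 is the exponent portion, and FF8 is the mantissa 1111 1111 1000 from the steps above.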
not equal
Although these values appear to be alike, the internal representations differ slightly,
because the IEEE floating-point representation can represent only about 15 significant
decimal digits. Here is the floating-point representation of both variables using the
HEX16. format.
x=3FE0000000000000
y=3FDFFFFFFFFFFFFF
When the number of significant digits is reduced to 15 or fewer, the floating-point
representation is the same and the values are equal.
data _null_;
x=.5000000000000000;
y=.500000000000000;
if x=y then put 'equal';
else put 'not equal';
put x=hex16./
y=hex16.;
run;
Log Output
equal
x=3FE0000000000000
y=3FE0000000000000
Storage Format
SAS for z/OS uses the traditional IBM mainframe floating-point representation as
follows:
This representation corresponds to bytes of data with each character being 1 bit, as
follows:
• The S in byte 1 is the sign bit of the number. A value of 0 in the sign bit is used to
represent positive numbers.
• The seven E characters in byte 1 represent a binary integer known as the
characteristic. The characteristic represents a signed exponent and is obtained
by adding the bias to the actual exponent. The bias is an offset used to enable
both negative and positive exponents with the bias representing 0. If a bias is not
used, an additional sign bit for the exponent must be allocated. For example, if a
system uses a bias of 64, a characteristic with the value of 66 represents an
exponent of +2, whereas a characteristic of 61 represents an exponent of –3.
• The remaining M characters in bytes 2 through 8 represent the bits of the
mantissa. There is an implied radix point before the left-most bit of the mantissa.
Therefore, the mantissa is always less than 1. The term radix point is used
instead of decimal point because decimal point implies that you are working with
decimal (base 10) numbers, which might not be the case. The radix point can be
thought of as the generic form of decimal point.
Conversion Example
The following example shows the conversion process for the decimal value 512.1 to
hexadecimal floating-point representation. This example illustrates how values that
can be precisely represented in decimal cannot always be precisely represented in
hexadecimal floating point.
1 Because the base is 16, you must first convert the value 512.1 to hexadecimal
notation.
2 First, convert the integer portion, 512, to hexadecimal using the base 16 number
system:
Base 16
16^7 ... 16^4 16^3 16^2 16^1 16^0
4 Convert the fraction portion (.1) of the original number, 512.1, to hexadecimal:
.1 = 1/10 = 1.6/16
The numerator cannot be a fraction, so keep the 1 and convert the .6 portion
again.
.6 = 6/10 = 9.6/16
Again, there cannot be fractions in the numerator, so keep the 9 and reconvert
the .6 portion.
The .6 continues to repeat as 9.6, which means that you keep the 9 and
reconvert. The closest that .1 can be represented in hexadecimal is
.1 = .1999999 × 16^0
5 The exponent for the value is 3 (Step 2 above). To determine the actual
exponent that will be stored, take the exponent value and add the bias to it:
true exponent + bias = 3 + 40 = 43 (hexadecimal) = stored exponent
The final portion to be determined is the sign of the mantissa. By convention, the
sign bit for positive mantissas is 0, and the sign for negative mantissas is 1. This
information is stored in the first bit of the first byte. From the hexadecimal value
in Step 4, compute the decimal equivalent and write it in binary format. Add the
sign bit to the first position. The stored value now looks like this:
This example shows how values that can be represented exactly in decimal notation
cannot always be represented precisely in floating-point notation. If a floating-point
value has a repeating pattern of numbers (like the above value has repeating ‘9s’),
there is a good chance that the value cannot be represented exactly.
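The mainframe representation can be previewed on other hosts with the S370FRBw.d format, which writes a numeric value in IBM mainframe real binary format. This is a minimal sketch; the variable name ibm is hypothetical, and the last hexadecimal digit might differ depending on how the conversion rounds or truncates.

```sas
data _null_;
   length ibm $8;

   /* Write 512.1 as 8 bytes in IBM mainframe floating-point format */
   ibm=put(512.1, s370frb8.);

   /* Show those 8 bytes as 16 hexadecimal digits */
   put ibm=$hex16.;
run;
```

The leading byte is expected to be the stored exponent computed in Step 5, followed by the hexadecimal digits of the mantissa with the repeating 9 pattern.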
Computational Considerations
Regardless of how much precision is available, there are still some numbers that
cannot be represented exactly. Most rational numbers (for example, .1) cannot be
represented exactly in base 2 or base 16. This is why it is often difficult to store
fractions in floating-point representation.
Consider the IBM mainframe representation of
.1: 40 19 99 99 99 99 99 99
Notice that there is an infinitely repeating 9 digit similar to the trailing 3 digit in the
attempted decimal representation of one-third (.3333 …). This lack of precision can
be compounded when arithmetic operations are performed on these values
repeatedly.
For example, when you add .33333 to .99999, the theoretical answer is 1.33332,
but in practice, this answer is not possible. The sums become more imprecise as
the values continue to be calculated.
For example, consider the following DATA step:
data _null_;
do i=-1 to 1 by .1;
put i=;
if i=0 then put 'AT ZERO';
end;
run;
The AT ZERO message in the DATA step is never printed because the accumulation
of the imprecise number introduces enough errors that the exact value of 0 is never
encountered. The calculated result is close to 0, but never exactly equal to 0.
Therefore, when numbers cannot be represented exactly in floating point,
performing mathematical operations with other non-exact values can compound the
imprecision.
run;
Example Code 4.2 Log Output for Using the ROUND Function to Avoid Computational
Errors
i=-1
i=-0.9
i=-0.8
i=-0.7
i=-0.6
i=-0.5
i=-0.4
i=-0.3
i=-0.2
i=-0.1
i=0
AT ZERO
i=0.1
i=0.2
i=0.3
i=0.4
i=0.5
i=0.6
i=0.7
i=0.8
i=0.9
i=1
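A log like the one above can be produced by rounding the index variable on each iteration. The following is a minimal sketch of that approach, based on the earlier DATA step with one added ROUND call; it is an illustration rather than the manual's own listing.

```sas
data _null_;
   do i=-1 to 1 by .1;
      /* Round to the nearest tenth so that accumulated error is removed */
      /* before the value is printed or compared                         */
      i=round(i,.1);
      put i=;
      if i=0 then put 'AT ZERO';
   end;
run;
```

Because the rounded value replaces the accumulated one, the error does not carry forward into the next iteration.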
Here is another example of a numerical precision issue that occurs on z/OS but not
on the PC.
Example Code 4.13 Using the ROUND Function with the IF Statement
data a;
input gender $ height;
datalines;
m 60
m 58
m 59
m 70
m 60
m 58
;
proc freq;
tables gender/out=new;
run;
data final;
set new;
if percent=100 then put 'equal';
else put 'not equal';
run;
Output 4.10 Output for Using the ROUND Function with the IF Statement
In the example, PROC FREQ creates an output data set that contains the variable
Percent. Because all of the values for the variable Gender are the same, you might
expect Percent to have an exact value of 100. However, when the value of Percent
is tested, the log indicates that Percent is not exactly 100.
The algorithm used by PROC FREQ to produce the variable Percent involves
mathematical computations. The result is very close to 100 but not exactly. Using
the ROUND function (or the COMPFUZZ function) in the IF statement resolves this
issue.
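The ROUND approach described above can be sketched as follows. The rounding unit of .01 is an assumption chosen for illustration, and the input data is a shortened, hypothetical version of the earlier example.

```sas
/* Hypothetical input data, for illustration only */
data a;
   input gender $ height;
   datalines;
m 60
m 58
m 59
;

/* NOPRINT suppresses the displayed table; OUT= creates the data set new */
proc freq data=a noprint;
   tables gender / out=new;
run;

data final;
   set new;
   /* Round Percent before comparing it with 100. */
   /* The unit .01 is an assumed tolerance.       */
   if round(percent,.01)=100 then put 'equal';
   else put 'not equal';
run;
```

Choose the rounding unit to match the precision that matters for the comparison; a unit that is too coarse can make unequal values compare as equal.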
A work-around for very simple calculations (for example, retaining only 2 digits to
the right of the decimal point) is to multiply the values by 100 and use the ROUND
function to round them to integers. Once you have performed the calculations on the
new whole numbers, divide by 100 to convert the values back to decimal form.
In the following example, the values for variable x are stored in the SAS data set as
real numbers. The number is multiplied by 1,000 and the ROUND function is used to
change the values to integers. The SUM statement is used to sum all the values of
New. On the last observation, which is detected using the END= option, the sum is
divided by 1,000 to convert the values back to fractions.
Example Code 4.14 Summing Rounded Values
data a;
set b end=last;
new=round(x*1000);
sum+new;
if last then sum=sum/1000;
run;
See “ROUND Function” in SAS Functions and CALL Routines: Reference for more
information about this function.
is true. But, in SAS, if you compare the literal value of 3.8 to the calculated value of
15.7 – 11.9 and output the result to the SAS log, you will get a result of 'not equal.'
Example Code 4.15 Comparing Values That Have Imprecise Representations
data a;
x=15.7-11.9;
if x=3.8 then put 'equal';
else put 'not equal';
run;
Example Code 4.3 Log Output for Comparing Values That Have Imprecise
Representations
988 data a;
989 x=15.7-11.9;
990 if x=3.8 then put 'equal';
991 else put 'not equal';
992 run;
not equal
NOTE: The data set WORK.A has 1 observations and 1 variables.
The log output indicates that the values 3.8 and (15.7 – 11.9) are not equivalent.
This is because the values involved in the computation cannot be precisely
represented in binary and hexadecimal.
If you add the PRINT procedure to display the results, you can see that the PROC
PRINT output is different from the stored value. The PROC PRINT statement
displays the value for x as 3.8 rather than the actual stored value because the
procedure automatically applies a format and rounds the results before displaying
them. This example shows how non-explicit rounding can cause confusion because,
in this case, PROC PRINT rounds only the final results after they are calculated.
proc print data=a;
run;
Example Code 4.4 Log Output: Using Formats to Confirm Precision Errors
102 data a;
103 x=15.7-11.9;
104 if x=3.8 then put 'equal';
105 else put 'not equal';
106 put x=10.8;
107 put x=18.16;
108 run;
not equal
x=3.80000000
x=3.7999999999999900
NOTE: The data set WORK.A has 1 observations and 1 variables.
Another way to verify the stored value of x is to apply the HEX16. format to the
calculated result. The HEX16. format is a special format that can be used to show
floating-point representation.
Example Code 4.17 Using the HEX16 Format to Verify Calculated Results
data a;
x=15.7-11.9;
if x=3.8 then put 'equal';
else put 'not equal';
put x=hex16.;
run;
Example Code 4.5 Using the HEX16 Format to Verify Calculated Results
123 data a;
124 x=15.7-11.9;
125 if x=3.8 then put 'equal';
126 else put 'not equal';
127 put x=hex16.;
128 run;
not equal
x=400E666666666664
NOTE: The data set WORK.A has 1 observations and 1 variables.
See “HEXw. Format” in SAS Formats and Informats: Reference for more information
about this format. See “Dictionary of Formats” in SAS Formats and Informats:
Reference for more information about formats in general.
However, if you add the ROUND function, as in the following example, the PUT
‘MATCH’ statement is executed:
data _null_;
x=1/3;
if round(x,.00001)=.33333 then put 'MATCH';
run;
Example Code 4.6 Log Output: Using the ROUND Function to Avoid Comparison Errors
1 data _null_;
2 x=1/3;
3 if round(x,.00001)=.33333 then put 'MATCH';
4 run;
MATCH
In general, if you are doing comparisons with fractional values, it is good practice to
use the ROUND function before performing any computations or comparisons.
See “ROUND Function” in SAS Functions and CALL Routines: Reference for more
information about this function.
in 8 bytes. In 2 bytes, it is truncated to 41 10. In this case, you still have the full
range of magnitude because the exponent remains intact, but there are fewer digits
involved. A decrease in the number of digits means either fewer digits to the right of
the decimal place or fewer digits to the left of the decimal place before trailing zeros
must be used.
For example, consider the number 1234567890, which is .1234567890 times 10 to the
10th power in base 10 floating-point notation. If you have only five digits of
precision, the number becomes 1234600000 (rounding up). Note that this is the case
regardless of the power of 10 that is used (.12346, 12.346, .0000012346, and so
on).
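The loss of low-order digits can be imitated in decimal terms with the ROUND function. This is only a sketch of the idea: the rounding unit of 100000 is an assumption chosen because the number has ten digits, and decimal rounding is not the same operation as the binary truncation that occurs when a value is stored in fewer bytes.

```sas
data _null_;
   x=1234567890;
   /* Keep five significant digits by rounding to the nearest 100,000 */
   y=round(x,100000);
   put x= y=;
run;
```

The log shows x=1234567890 and y=1234600000: the magnitude is unchanged, but the five low-order digits are gone.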
In addition, you must be careful in your choice of lengths, as the previous discussion
shows. Consider a length of 2 bytes on an IBM mainframe system. This value
enables 1 byte to store the exponent and sign, and 1 byte for the mantissa. The
largest value that can be stored in 1 byte is 255. Therefore, if the exponent is 0
(meaning 16 to the 0th power, or 1 multiplied by the mantissa), then the largest
integer that can be stored with complete certainty is 255. However, some larger
integers can be stored because they are multiples of 16.
For example, consider the 8-byte representation of the numbers 256 to 272 in the
following table:
Value   Sign and Exp   Mantissa 1   Mantissa 2-7    Considerations
258     43             10           200000000000
259     43             10           300000000000
271     43             10           F00000000000
The numbers from 257 to 271 cannot be stored exactly in the first 2 bytes; a third
byte is needed to store the number precisely. As a result, the following code
produces misleading results:
data temp;
length x 2;
x=257;
y1=x+1;
run;
data _null_;
set temp;
if x=257 then put 'FOUND';
y2=x+1;
run;
The PUT statement is never executed because the value of X is actually 256 (the
value 257 truncated to 2 bytes). Recall that 256 is stored in 2 bytes as 4310, but
257 is also stored in 2 bytes as 4310, with the third byte of 10 truncated.
You receive no warning that the value of 257 is truncated in the first DATA step.
Note, however, that Y1 has the value 258 because the values of X are kept in full, 8-
byte floating-point representation in the program data vector. The value is truncated
only when stored in a SAS data set. Y2 has the value 257 because X is truncated
before the number is read into the program data vector.
CAUTION! Do not use the LENGTH statement if your variable values are not
integers. Fractional numbers lose precision if truncated. Also, use the LENGTH
statement to truncate values only when disk space is limited. Refer to the length table in
the SAS documentation for your operating environment for maximum values.
See “LENGTH Statement” in SAS DATA Step Statements: Reference for more
information about this statement.
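The same effect can be seen on hosts other than the mainframe. The following sketch assumes a host that uses IEEE floating-point representation (for example, Windows or UNIX), where the documented largest integer that can be stored exactly in 3 bytes is 8,192. The data set name TEMP and the value 8193 are chosen only for illustration.

```sas
data temp;
   length x 3;        /* store X in 3 bytes instead of 8 */
   x=8193;            /* one more than the assumed 3-byte limit */
run;
data _null_;
   set temp;
   put x=;            /* the value read back is the stored, truncated one */
   if x ne 8193 then put 'value was truncated when stored';
run;
```

Under the stated assumption, X should be read back as 8192, and no warning is issued when the first DATA step runs.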
the effect of storing numbers in less than full length and then reading them. For
example, if the variable
x = 1/3
However, adding the TRUNC function makes the comparison true, as in the
following:
if x=trunc(1/3,3) then ...;
See “TRUNC Function” in SAS Functions and CALL Routines: Reference for more
information about this function.
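A minimal sketch of the comparison follows. It uses TRUNC to imitate a value of 1/3 that was stored with a length of 3 and then read back, so no intermediate data set is needed. The messages are illustrative.

```sas
data _null_;
   /* Imitate a value of 1/3 that was stored with a length of 3 */
   x=trunc(1/3,3);
   if x=1/3 then put 'full precision: equal';
   else put 'full precision: not equal';
   if x=trunc(1/3,3) then put 'after TRUNC: equal';
   else put 'after TRUNC: not equal';
run;
```

The first comparison fails because the full 8-byte value of 1/3 has nonzero bits beyond the third byte. The second succeeds because both sides are truncated in the same way.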
data temp;
set numbers;
x=value;
do L=8 to 1 by -1;
if x NE trunc(x,L) then
do;
minlen=L+1;
output;
return;
end;
end;
run;
Output 4.12 Determining How Many Bytes Are Needed to Store a Number Accurately
Note that the minimum length required for the value 271 is greater than the
minimum required for the value 272. This fact illustrates that it is possible for the
largest number in a range of numbers to require fewer bytes of storage than a
smaller number. If precision is needed for all numbers in a range, you should obtain
the minimum length for all the numbers, not just the largest one.
See “TRUNC Function” in SAS Functions and CALL Routines: Reference for more
information about this function.
5
Missing Values
n character
n special numeric
By default, SAS prints a missing numeric value as a single period (.) and a
missing character value as a blank space. See “Creating Special Missing
Values” on page 96 for more information about special numeric missing values.
Definition
special missing value
is a type of numeric missing value that enables you to represent different
categories of missing data by using the letters A–Z or an underscore.
Tips
n SAS accepts either uppercase or lowercase letters. Values are displayed and
printed as uppercase.
n If you do not begin a special numeric missing value with a period, SAS identifies
it as a variable name. Therefore, to use a special numeric missing value in a
SAS expression or assignment statement, you must begin the value with a
period, followed by the letter or underscore. For example:
x=.d;
n When SAS prints a special missing value, it prints only the letter or underscore.
n When data values contain characters in numeric fields that you want SAS to
interpret as special missing values, use the MISSING statement to specify those
characters. For further information, see the “MISSING Statement” in SAS Global
Statements: Reference.
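The following short sketch applies these tips. The variable names are arbitrary.

```sas
data _null_;
   x=.d;              /* special missing value D */
   y=._;              /* special missing value underscore */
   z=.;               /* ordinary numeric missing value */
   put x= y= z=;
   if missing(x) and missing(y) and missing(z) then put 'all three are missing';
run;
```

The PUT statement writes x=D y=_ z=. to the log: special missing values are displayed without the leading period.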
Example
The following example uses data from a marketing research company. Five testers
were hired to test five different products for ease of use and effectiveness. If a tester
was absent, there is no rating to report, and the value is recorded with an X for
“absent.” If the tester was unable to test the product adequately, there is no rating,
and the value is recorded with an I for “incomplete test.” The following program
reads the data and displays the resulting SAS data set. Note the special missing
values in the first and third data lines:
data period_a;
missing X I;
input Id $4. Foodpr1 Foodpr2 Foodpr3 Coffeem1 Coffeem2;
datalines;
1001 115 45 65 I 78
1002 86 27 55 72 86
1004 93 52 X 76 88
1015 73 35 43 112 108
;
run;
Order of Missing Values
Numeric Variables
Within SAS, a missing value for a numeric variable is smaller than all numbers. If
you sort your data set by a numeric variable, observations with missing values for
that variable appear first in the sorted data set. For numeric variables, you can
compare special missing values with numbers and with each other. The following
table shows the sorting order of numeric values.
smallest ._ underscore
. period
-n negative numbers
0 zero
For example, the numeric missing value (.) is sorted before the special numeric
missing value .A, and both are sorted before the special missing value .Z. SAS does
not distinguish between lowercase and uppercase letters when sorting special
numeric missing values.
Note: The numeric missing value sort order is the same regardless of whether your
system uses the ASCII or EBCDIC collating sequence.
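The sort order can be checked with a small test data set. In this sketch, the data set name MISS and the variable V are hypothetical, and the MISSING statement is used so that the letters and the underscore in the data lines are read as special missing values.

```sas
data miss;
   missing A Z _;     /* read A, Z, and _ in numeric fields as special missing values */
   input v @@;
   datalines;
5 Z . -3 _ A 0
;
run;
proc sort data=miss;
   by v;
run;
proc print data=miss;
run;
```

After sorting, the observations should appear in the order _ . A Z -3 0 5, with all missing values ahead of the negative number.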
Character Variables
Missing values of character variables are smaller than any printable character value.
Therefore, when you sort a data set by a character variable, observations with
missing (blank) values of the BY variable always appear before observations in
which values of the BY variable contain only printable characters. However, some
usually unprintable characters (for example, machine carriage-control characters
and real or binary numeric data that have been read in error as character data) have
values less than the blank. Therefore, when your data includes unprintable
characters, missing values might not appear first in a sorted data set.
n automatic variables
SAS replaces the missing values as it encounters values that you assign to the
variables. Thus, if you use program statements to create new variables, their values
in each observation are missing until you assign the values in an assignment
statement, as shown in the following DATA step:
data new;
input x;
if x=1 then y=2;
datalines;
4
1
3
1
;
This DATA step produces a SAS data set with the following variable values:
OBS X Y
1 4 .
2 1 2
3 3 .
4 1 2
When X equals 1, the value of Y is set to 2. Since no other statements set Y's value
when X is not equal to 1, Y remains missing (.) for those observations.
values. When all of the rows in a data set in a one-to-one merge operation (without
a BY statement) have been processed, the variables in the output data set are set
to missing and remain missing.
Invalid Operations
SAS prints a note in the log and assigns a missing value to the result if you try to
perform an invalid operation, such as the following:
n dividing by zero
run;
This DATA step results in the following log:
Example Code 5.1 SAS Log Results for a Missing Value
130 data a;
131 x=.d;
132 y=x+1;
133 put y=;
134 run;
y=.
NOTE: Missing values were generated as a result of performing an operation on
missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 132:10
NOTE: The data set WORK.A has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
This statement sets the stored value of Age to a numeric missing value if Age has a
value less than 0.
Note: You can display a missing numeric value with a character other than a period
by using the DATA step's MISSING statement or the MISSING= system option.
The following example sets the stored value of Name to a missing character value if
Name has a value of “none”:
if name="none" then name='';
Alternatively, if you want to set one or more variables to missing values,
you can use the CALL MISSING routine. For example:
call missing(sales, name);
If your data contains special missing values, you can check for either an ordinary or
special missing value with a statement that is similar to the following:
if numvar<=.z then do;
To check for a missing character value, you can use a statement that is similar to
the following:
if charvar=' ' then do;
The MISSING function enables you to check for either a character or numeric
missing value, as in:
if missing(var) then do;
In each case, SAS checks whether the value of the variable in the current
observation satisfies the condition specified. If it does, SAS executes the DO group.
Note: Missing values have a value of false when you use them with logical
operators such as AND or OR.
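The checks described in this section can be combined in one DATA step. The following is a minimal sketch; the variable names and the initial values are illustrative.

```sas
data _null_;
   length charvar $8;
   numvar=.a;
   charvar=' ';
   if numvar<=.z then put 'numvar: ordinary or special missing';
   if charvar=' ' then put 'charvar: missing';
   if missing(numvar) and missing(charvar) then put 'MISSING function: both missing';
   sales=100;
   name='Lee';
   call missing(sales, name);   /* reset both variables to missing */
   put sales= name=;
run;
```

All three messages are written to the log, and after CALL MISSING executes, Sales is a numeric missing value (.) and Name is blank.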
6
Expressions
n variable
n function
compound expression
is an expression that includes several operators. When SAS encounters a
compound expression, it follows rules to determine the order in which to evaluate
each part of the expression.
WHERE expressions
is a type of SAS expression that is used within a WHERE statement or WHERE=
data set option to specify a condition for selecting observations for processing in
a DATA or PROC step. For syntax and further information about WHERE
expressions, see Chapter 11, “WHERE-Expression Processing,” on page 197.
n x
n x+1
n age<100
n trim(last)||', '||first
SAS Constants in Expressions
Definition
A SAS constant is a number or a character string that indicates a fixed value.
Constants can be used as expressions in many SAS statements, including variable
assignment and IF-THEN statements. They can also be used as values for certain
options. Constants are also called literals.
The following are types of SAS constants:
n character
n numeric
n bit testing
Character Constants
A character constant consists of 1 to 32,767 characters and must be enclosed in
quotation marks. Character constants can also be represented in hexadecimal form.
Another way to write the same string is to enclose the string in single quotation
marks and to express the apostrophe as two consecutive quotation marks. SAS
treats the two consecutive quotation marks as one quotation mark:
name='Tom''s'
statements that follow it. For example, in name='O'Brien';, O is the character value of
Name, Brien is extraneous, and '; begins another quoted string.
In the second set of examples, SAS searches for variables named ABC and SMITH,
instead of constants.
Note: SAS distinguishes between uppercase and lowercase when comparing
character expressions. For example, the character values 'Smith' and 'SMITH' are
not equivalent.
A comma can be used to make the string more readable, but it is not part of and
does not alter the hexadecimal value. If the string contains a comma, the comma
must separate an even number of hexadecimal characters within the string, as in
this example:
if value='3132,3334'x then do;
Note: Any trailing blanks or leading blanks within the quotation marks cause an
error message to be written to the log.
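The following sketch shows that the comma does not change the value of a hexadecimal character constant. It assumes an ASCII host, where hexadecimal 31, 32, 33, and 34 represent the characters 1, 2, 3, and 4. On an EBCDIC host the same hexadecimal digits represent different characters.

```sas
data _null_;
   /* Assumption: ASCII encoding, so '31323334'x is the string 1234 */
   value='1234';
   if value='31323334'x then put 'matches without comma';
   if value='3132,3334'x then put 'matches with comma';
run;
```

Under the ASCII assumption, both messages are written to the log.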
Numeric Constants
A numeric constant is a number that appears in a SAS statement. Numeric
constants can be presented in many forms, including
n standard notation
n hexadecimal notation
1 is an unsigned integer
n 0.5e-10
n 9x
n date='01jan09'd;
n time='9:25:19pm't;
n dtime='18jan2003:9:27:05am'dt;
If the third bit of A (counting from the left) is on, and the fifth through eighth bits are
off, the comparison is true and the expression result is 1. Otherwise, the comparison
is false and the expression result is 0. The following is a more detailed example:
data test;
input @88 bits $char1.;
if bits='10000000'b
then category='a';
else if bits='01000000'b
then category='b';
else if bits='00100000'b
then category='c';
run;
Note: Bit masks cannot be used as bit literals in assignment statements. For
example, the following statement is not valid:
x='0101'b; /* incorrect*/
The $BINARYw. and BINARYw. formats and the $BINARYw., BINARYw.d, and
BITSw.d informats can be useful for bit testing. You can use them to convert
character and numeric values to their binary values, and vice versa, and to extract
specified bits from input data. See SAS Formats and Informats: Reference for
complete descriptions of these formats and informats.
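A short sketch of bit testing on a character value follows. It assumes an ASCII host, where the letter A is stored as hexadecimal 41 (binary 01000001). Periods in a bit mask mark bit positions that are not tested.

```sas
data _null_;
   bits='A';                       /* assumption: ASCII, so A is 01000001 */
   put bits= $binary8.;            /* display the bit pattern */
   if bits='01000001'b then put 'exact pattern matches';
   if bits='01......'b then put 'first two bits are 0 and 1; other bits not tested';
   if bits='1.......'b then put 'first bit is on';
   else put 'first bit is off';
run;
```

Under the ASCII assumption, the first two tests are true and the last test reports that the first bit is off.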
Table 6.2 Characters That Cause Misinterpretation When Following a Character Constant
Inserting a blank space between the ending quotation mark and the succeeding
character in the IF statement eliminates this misinterpretation. No error message is
generated and all observations with a FLIGHT value of 821 are replaced with a
value of 230.
if flight='821' then
flight='230';
SAS Variables in Expressions
Definition
variable
is a set of data values that describe a given characteristic. A variable can be
used in an expression.
Definitions
A SAS operator is a symbol that represents a comparison, arithmetic calculation, or
logical operation; a SAS function; or grouping parentheses. SAS uses two major
types of operators:
n prefix operators
n infix operators
n -25
n -cos(angle1)
n +(x*y)
An infix operator applies to the operands on each side of it (for example, 6<8). Infix
operators include the following:
n arithmetic
n comparison
n logical, or Boolean
n minimum
n maximum
n concatenation.
When used to perform arithmetic operations, the plus and minus signs are infix
operators.
SAS also provides several other operators that are used only with certain SAS
statements. The WHERE statement uses a special group of SAS operators, valid
only when used with WHERE expressions. For a discussion of these operators, see
Chapter 11, “WHERE-Expression Processing,” on page 197. The _NEW_ operator
is used to create an instance of a DATA step component object. For more
information, see Chapter 24, “Using DATA Step Component Objects,” on page 565.
Arithmetic Operators
Arithmetic operators indicate that an arithmetic calculation is performed, as shown in
the following table:
* The asterisk (*) is always necessary to indicate multiplication; 2Y and 2(Y) are not valid expressions.
Comparison Operators
Comparison operators set up a comparison, operation, or calculation with two
variables, constants, or expressions. If the comparison is true, the result is 1. If the
comparison is false, the result is 0.
Comparison operators can be expressed as symbols or with their mnemonic
equivalents, which are shown in the following table:
Symbol   Mnemonic Equivalent   Definition     Example
=        EQ                    equal to       a=3
¬=       NE                    not equal to
~=       NE                    not equal to
* The symbol that you use for NE depends on your personal computer.
** The symbol => is also accepted for compatibility with previous releases of SAS. It is not supported in
WHERE clauses or in PROC SQL.
*** The symbol =< is also accepted for compatibility with previous releases of SAS. It is not supported in
WHERE clauses or in PROC SQL.
See “Order of Evaluation in Compound Expressions” on page 124 for the order in
which SAS evaluates these operators.
You can add a colon (:) modifier to any of the operators to compare only a specified
prefix of a character string. See “Character Comparisons” on page 118 for details.
The IN Operator
You can use the IN operator to compare a value that is produced by an expression
on the left of the operator to a list of values that are given on the right. Individual
values can be separated by commas or spaces. You can use a colon to specify a
range of sequential integers.
The three forms of the IN comparison are:
expression IN(value-1<...,value-n>)
expression IN(value-1<... value-n>)
expression IN(value-1<...:value-n>)
value
must be a constant.
For more information and examples of using the IN operator, see “The IN Operator
in Numeric Comparisons” on page 117.
Numeric Comparisons
SAS makes numeric comparisons that are based on values. In the expression
A<=B, if A has the value 4 and B has the value 3, then A<=B has the value 0, or
false. If A is 5 and B is 9, then the expression has the value 1, or true. If A and B
each have the value 47, then the expression is true and has the value 1.
Comparison operators appear frequently in IF-THEN statements, as in this example:
if x<y then c=5;
else c=12;
n y = x in (1 2 3 4 5 6 7 8 9 10);
n y = x in (1:10);
You can use multiple ranges in the same IN list, and you can use ranges with other
constants in an IN list. The following example shows a range that is used with other
constants to test if X is 0, 1, 2, 3, 4, 5, or 9.
if x in (0,9,1:5);
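To see which values satisfy this IN list, you can evaluate the comparison as a numeric expression inside a loop. In this sketch, the variable HIT is a hypothetical name for the 1 or 0 result.

```sas
data _null_;
   do x=0 to 10;
      hit = x in (0,9,1:5);   /* 1 when X is 0, 1, 2, 3, 4, 5, or 9 */
      put x= hit=;
   end;
run;
```

HIT is 1 for the values 0 through 5 and for 9, and 0 for 6, 7, 8, and 10.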
You can also use the IN operator to search an array of numeric values. For
example, the following code creates an array a, defines a constant x, and then uses
the IN operator to search for x in array a. Note that the array initialization syntax of
array a{10} (2*1:5) creates an array that contains the initial values of 1, 2, 3, 4,
5, 1, 2, 3, 4, 5.
data _null_;
array a{10} (2*1:5);
x=99;
y = x in a;
put y=;
a{5} = 99;
y = x in a;
put y=;
run;
Example Code 6.2 Results from Using the IN Operator to Search an Array of Numeric
Values (Partial Output)
Character Comparisons
You can perform comparisons on character operands, but the comparison always
yields a numeric result (1 or 0). Character operands are compared character by
character from left to right. Character order depends on the collating sequence,
usually ASCII or EBCDIC, used by your computer.
For example, in the EBCDIC and ASCII collating sequences, G is greater than A.
Therefore, this expression is true:
'Gray'>'Adams'
Since trailing blanks are ignored in a comparison, 'fox ' is equivalent to 'fox'.
However, because blanks at the beginning and in the middle of a character value
are significant to SAS, ' fox' is not equivalent to 'fox'.
You can compare only a specified prefix of a character expression by using a colon
(:) after the comparison operator. SAS truncates the longer value to the length of the
shorter value during the comparison. In the following example, the colon modifier
after the equal sign tells SAS to look at only the first character of values of the
variable LastName and to select the observations with names beginning with the
letter S:
if lastname=:'S';
Because printable characters are greater than blanks, both of the following
statements select observations with values of LastName that are greater than or
equal to the letter S:
n if lastname>='S';
n if lastname>=:'S';
Note: If you compare a zero-length character value with any other character value
in either an IN: comparison or an EQ: comparison, the two character values are not
considered equal. The result always evaluates to 0, or false.
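The rules for blanks and for the colon modifier can be verified by storing each comparison result in a variable. This is a minimal sketch; the variable names A through D are arbitrary.

```sas
data _null_;
   lastname='Smith';
   a = (lastname =: 'S');    /* 1: only the first character is compared */
   b = (lastname = 'S');     /* 0: the full values differ */
   c = ('fox ' = 'fox');     /* 1: trailing blanks are ignored */
   d = (' fox' = 'fox');     /* 0: leading blanks are significant */
   put a= b= c= d=;
run;
```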
The operations that are discussed in this section show you how to compare entire
character strings and the beginnings of character strings. Several SAS character
functions enable you to search for and extract values from within character strings.
See SAS Functions and CALL Routines: Reference for complete descriptions of all
SAS functions.
You can also use the IN operator to search an array of character values. For
example, the following code creates an array a, defines a constant x, and then uses
the IN operator to search for x in array a.
data _null_;
array a{5} $ (5*'');
x='b1';
y = x in a;
put y=;
a{5} = 'b1';
y = x in a;
put y=;
run;
Example Code 6.3 Results from Using the IN Operator to Search an Array of Character
Values (Partial Output)
! OR
¦ OR
¬ NOT** not(a>b)
^ NOT
~ NOT
* The symbol that you use for OR depends on your operating environment.
** The symbol that you use for NOT depends on your operating environment.
See “Order of Evaluation in Compound Expressions” on page 124 for the order in
which SAS evaluates these operators.
In addition, a numeric expression without any logical operators can serve as a
Boolean expression. For an example of Boolean numeric expressions, see “Boolean
Numeric Expressions” on page 122.
the result is true (has a value of 1) only when both A<B and C>0 are 1 (true): that is,
when A is less than B and C is positive.
Two comparisons with a common variable linked by AND can be condensed with an
implied AND. For example, the following two subsetting IF statements produce the
same result:
n if 16<=age and age<=65;
n if 16<=age<=65;
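The equivalence of the two forms can be checked by evaluating both for several values. The test values of Age in this sketch are arbitrary.

```sas
data _null_;
   do age=10, 16, 40, 65, 70;
      a = (16<=age and age<=65);   /* explicit AND */
      b = (16<=age<=65);           /* implied AND */
      put age= a= b=;
   end;
run;
```

For every value of Age, A and B are equal: 0 for 10 and 70, and 1 for 16, 40, and 65.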
The OR Operator
If either of the quantities linked by an OR is 1 (true), then the result of the OR
operation is 1 (true). Otherwise, the OR operation produces a 0. For example,
consider the following comparison:
a<b|c>0
The result is true (with a value of 1) when A<B is 1 (true) regardless of the value of
C. It is also true when the value of C>0 is 1 (true), regardless of the values of A and
B. Therefore, it is true when either or both of those relationships hold.
Be careful when using the OR operator with a series of comparisons (in an IF,
SELECT, or WHERE statement, for example). Remember that only one comparison
in a series of OR comparisons must be true to make a condition true, and any
nonzero, nonmissing constant is always evaluated as true. (For more information
about how SAS computes Boolean expressions, see “Boolean Numeric
Expressions” on page 122.) Therefore, the following subsetting IF statement is
always true:
if x=1 or 2;
SAS first evaluates X=1, and the result can be either true or false. However, since
the 2 is evaluated as nonzero and nonmissing (true), the entire expression is true. In
this statement, however, the condition is not necessarily true because either
comparison can evaluate as true or false:
if x=1 or x=2;
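The difference between the two conditions is easy to confirm by evaluating them side by side. In this sketch, the variable names WRONG, RIGHT, and ALT are hypothetical; ALT shows the IN operator as another way to write the intended test.

```sas
data _null_;
   do x=1, 2, 3;
      wrong = (x=1 or 2);       /* always 1, because the constant 2 is true */
      right = (x=1 or x=2);     /* 1 only when X is 1 or 2 */
      alt   = x in (1,2);       /* same result as RIGHT */
      put x= wrong= right= alt=;
   end;
run;
```

When X is 3, WRONG is still 1, whereas RIGHT and ALT are 0.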
(true). The result of NOT in front of a quantity whose value is missing is also 1
(true). The result of NOT in front of a quantity with a nonzero, nonmissing value is 0
(false). That is, the result of negating a true statement is 0 (false).
For example, the following two expressions are equivalent:
n not(name='SMITH')
n name ne 'SMITH'
n a ne b | c le d
For example, suppose that you want to fill in variable Remarks depending on
whether the value of Cost is present for a given observation. You can write the IF-
THEN statement as follows:
if cost then remarks='Ready to budget';
The numeric value that is returned by a function is also a valid numeric expression:
if index(address,'Avenue') then do;
The value of Game is 'black jack'. To correct this problem, use the TRIM function
in the concatenation operation as follows:
game=trim(color)||name;
This statement produces a value of 'blackjack' for the variable Game. The
following additional examples demonstrate uses of the concatenation operator:
n If A has the value 'fortune', B has the value 'five', and C has the value
'hundred', then the following statement produces the value
'fortunefivehundred' for the variable D:
d=a||b||c;
If the value of OldName is 'Jones', then NewName has the value 'Mr. or Ms.
Jones'.
n Because the concatenation operation does not trim blanks, the following
expression produces the value 'JOHN SMITH':
name='JOHN '||'SMITH';
n This example uses the PUT function to convert a numeric value to a character
value. The TRIM function is used to trim blanks.
month='sep ';
year=99;
date=trim(month) || left(put(year,8.));
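The effect of trailing blanks on concatenation can be reproduced with two fixed-length variables. The lengths in this sketch ($8 and $16) are assumptions chosen so that the padding is visible.

```sas
data _null_;
   length color name $8 game1 game2 $16;
   color='black';
   name='jack';
   game1=color||name;         /* COLOR is padded to 8 characters before NAME is appended */
   game2=trim(color)||name;   /* trailing blanks are removed first */
   put game1= game2=;
run;
```

GAME1 contains three blanks between 'black' and 'jack', and GAME2 contains 'blackjack'.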
Priority   Order of Evaluation   Symbols   Mnemonic Equivalent   Definition     Example
                                 /                               division       f=g/h;
                                 -                               subtraction    f=g-h;
                                 =         EQ                    equal to       if y eq (x+a) then output;
                                 ¬=        NE                    not equal to   if x ne z then output;
                                                                                y = x in (1:10);
Group VI   left to right         &         AND                   logical and    if a=b & c=d then x=1;
* Because Group I operators are evaluated from right to left, the expression x=2**3**4 is evaluated as x=(2**(3**4)).
** The plus (+) sign can be either a prefix or arithmetic operator. A plus sign is a prefix operation only when it appears at the
beginning of an expression or when it is immediately preceded by an open parenthesis or another operator.
*** The minus (−) sign can be either a prefix or arithmetic operator. A minus sign is a prefix operator only when it appears at
the beginning of an expression or when it is immediately preceded by an open parenthesis or another operator.
† Depending on the characters available on your keyboard, the symbol can be the not sign (¬), tilde (~), or caret (^). The
SAS system option CHARCODE allows various other substitutions for unavailable special characters.
†† For example, the SAS System evaluates -3><-3 as -(3><-3), which is equal to -(-3), which equals +3. This is because
Group I operators are evaluated from right to left.
††† Depending on the characters available on your keyboard, the symbol that you use as the concatenation operator can be a
double vertical bar (||), broken vertical bar (¦¦), or exclamation mark (!!).
‡ Group V operators are comparison operators. The result of a comparison operation is 1 if the comparison is true and 0 if it
is false. Missing values are the lowest in any comparison operation. The symbols =< (less than or equal to) are also
allowed for compatibility with previous versions of the SAS System. When making character comparisons, you can use a
colon (:) after any of the comparison operators to compare only the first character or characters of the value. SAS
truncates the longer value to the length of the shorter value during the comparison. For example, if name=:'P' compares
the value of the first character of NAME to the letter P.
‡‡ An exception to this rule occurs when two comparison operators surround a quantity. For example, the expression x<y<z is
evaluated as (x<y) and (y<z).
‡‡‡ Depending on the characters available on your keyboard, the symbol that you use for the logical or can be a single vertical
bar (|), broken vertical bar (¦), or exclamation mark (!). You can also use the mnemonic equivalent OR.
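Several of the rules in these notes can be checked directly. The following sketch uses small values so that the results are easy to verify by hand; the variable names are arbitrary.

```sas
data _null_;
   a=2**3**2;        /* right to left: 2**(3**2) = 2**9 */
   b=(2**3)**2;      /* parentheses force 8**2 */
   c=-3><-3;         /* per the note above, evaluated as -(3><-3) */
   x=1; y=5; z=3;
   d=(x<y<z);        /* evaluated as (x<y) and (y<z) */
   e=((x<y)<z);      /* parentheses compare the 1 or 0 result of x<y with z */
   put a= b= c= d= e=;
run;
```

A is 512 and B is 64. By the rule described in the notes, C is 3. D is 0 because y<z is false, whereas E is 1 because the result of x<y (1) is less than 3.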
7
Dates, Times, and Intervals
Definitions
SAS date value
is a value that represents the number of days between January 1, 1960, and a
specified date. SAS can perform calculations on dates ranging from A.D.
November 1582 to A.D. 19,900. Dates before January 1, 1960, are negative
numbers; dates after January 1, 1960, are positive numbers.
n SAS date values account for all leap year days, including the leap year day in
the year 2000.
n SAS date values can reliably tell you what day of the week a particular day
fell on as far back as September 1752. That was when the calendar was
adjusted by dropping several days. SAS day-of-the-week and length-of-time
calculations are accurate in the future to A.D. 19,900.
n Various SAS language elements handle SAS date values: functions, formats,
and informats.
SAS time value
is a value representing the number of seconds since midnight of the current day.
SAS time values are between 0 and 86400.
SAS datetime value
is a value representing the number of seconds between January 1, 1960, and an
hour/minute/second within a specified date.
The following figure shows some dates written in calendar form and as SAS date
values.
Figure 7.1 How SAS Converts Calendar Dates to SAS Date Values
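The definitions above can be explored with date, time, and datetime constants. This is a minimal sketch; the particular constants are chosen only because their values are easy to verify.

```sas
data _null_;
   d0='01jan1960'd;              /* SAS date value 0 */
   d1='31dec1959'd;              /* one day earlier: -1 */
   t='00:01:00't;                /* 60 seconds after midnight */
   dt='01jan1960:00:01:00'dt;    /* 60 seconds after the datetime origin */
   put d0= d1= t= dt=;           /* unformatted numeric values */
   put d0= date9. t= time8. dt= datetime20.;
run;
```

The first PUT statement shows the stored numbers (0, -1, 60, and 60). The second shows the same values with date, time, and datetime formats applied.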
The following SAS language elements do not convert SAS dates to Julian dates.
They apply a Julian date format to a SAS date.
PDJULI
SAS can perform calculations on raw SAS date values and on formatted SAS date
values. This includes performing calculations on Julian formatted date values.
SAS uses these definitions of Julian dates and Julian formats:
Julian date
is the number of continuous days since January 1, 4713 BC, which is also known
as an astronomical date.
Julian format
is the representation of an ordinal SAS date in the form of a calendar day,
YYDDD or YYYYDDD.
SAS uses the Julian format (ordinal date) definition of dates. Julian-related
language elements in SAS do not convert SAS dates internally to Julian
astronomical dates. These Julian-related language elements make a SAS date look
like an ordinal date with the form YYDDD or YYYYDDD. For example, January 23,
2018 is 18023 when you apply a Julian format in SAS.
You must define the values as SAS dates before using them in calculations. The
only way you can convert a SAS date to an astronomical date is to add 2,436,934.5
to the SAS date value. This conversion enables SAS to use the values to perform
calculations. Otherwise, SAS treats the values as regular integer numeric values,
and you might get unexpected results.
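The distinction between a Julian format and an astronomical Julian date can be sketched as follows. The example uses the January 23, 2018 date from the text; the variable name JD is hypothetical.

```sas
data _null_;
   d='23jan2018'd;               /* a SAS date value */
   put d= d julian5. +1 d julian7.;
   jd=d+2436934.5;               /* offset given in the text above */
   put jd= 12.1;
run;
```

The JULIAN5. and JULIAN7. formats write the same SAS date value as 18023 and 2018023. The value of JD is a different number altogether: the astronomical day count that results from adding the offset.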
n Converts the dates into the MMDDYY10 format and the Julian format.
n Performs calculations on the two sets of dates, even though they have different
formats.
data dates; /* 1 */
input sas_date;
datalines;
21519
21522
21528
21535
21545
21555
21565
;
proc print data = dates;
run;
data dates2; /* 2 */
set dates;
formatted_sas_date = sas_date;
julian_formatted_SAS_date = sas_date;
format formatted_sas_date mmddyy10. julian_formatted_SAS_date julian.; /* 3 */
run;
proc print data=dates2;
run;
data dates3; /* 4 */
set dates2;
datediff=sas_date - lag(julian_formatted_SAS_date); /* 5 */
run;
proc print data =dates3;
run;
Output 7.1 Converting SAS Dates and Using the Results in Calculations
Five-Digit Years
Although some formats, such as DATETIME20., specify a width large enough to
accommodate a five-digit year, the SAS documentation does not display five-digit
years.
The PUT statement writes the following lines to the SAS log:
SAS date=15639
formated date=26OCT2002
Note: Whenever possible, specify a year using all four digits. Most SAS date and
time language elements support four-digit year values.
data schedule;
input @1 jobid $ @6 projdate mmddyy10.;
datalines;
A100 01/15/25
A110 03/15/2025
A200 01/30/96
B100 02/05/12
B200 06/15/2012
;
Output 7.2 Output Showing Four-Digit Years That Result from Setting YEARCUTOFF= to
1926
n Use the YEARCUTOFF= system option when converting two-digit dates to SAS
date values.
n Examine sets of raw data coming into your SAS process to make sure that any
dates containing two-digit years are correctly interpreted by the YEARCUTOFF=
system option. Look out for the following situations:
o two-digit years that are distributed over more than a 100-year period. For
dates covering more than a 100-year span, you must either use four-digit
years in the data, or use conditional logic in a DATA step to interpret them
correctly.
o two-digit years that need an adjustment to the default YEARCUTOFF= range.
For example, if the default value for YEARCUTOFF= in your operating
environment is 1926 and you have a two-digit date in your data that
represents 1925, you have to adjust your YEARCUTOFF= value downward
by a year in the SAS program that processes this value.
n Make sure that output SAS data sets represent dates as SAS date values.
n Check your SAS programs to make sure that formats and informats that use two-
digit years, such as DATE7., MMDDYY6., or MMDDYY8., are reading and
writing data correctly.
Note: The YEARCUTOFF= option has no effect on dates that are already stored as
SAS date values.
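The effect of the YEARCUTOFF= guidance above can be sketched with a short DATA step. This is a minimal illustration, assuming a cutoff of 1926 (so that two-digit years fall in the 100-year span 1926 through 2025); the variable names d1, d2, and d3 are made up for the example.

```sas
options yearcutoff=1926;   /* two-digit years map to 1926-2025 */

data _null_;
   d1 = input('01/15/25', mmddyy8.);      /* two-digit year 25 is read as 2025 */
   d2 = input('01/30/96', mmddyy8.);      /* two-digit year 96 is read as 1996 */
   d3 = input('03/15/2025', mmddyy10.);   /* four-digit year: cutoff not used  */
   put d1= date9. d2= date9. d3= date9.;
run;
```

Once the values are stored as SAS date values, changing YEARCUTOFF= no longer affects them, which is why the option matters only at the point where raw two-digit years are read.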
Task                          Type of Language Element           Language Element   Input      Result
                                                                 DAY.               19434      17
                                                                 DDMMYYB.           19434      17 03 13
                                                                 JULDAY. *          19434      76
                                                                 MMDDYYB.           19434      03 17 13
                                                                 MONTH.             19434      3
                                                                 WEEKDAY.           19434      1
                                                                 QTRR.              19434      I
Date Tasks
                                                                 HOUR               19434      5
                                                                 MINUTE             19434      23
                                                                 MONTH              19434      3
                                                                 QTR                19434      1
                                                                 SECOND             19434      54
                                                                 WEEKDAY            19434      1
Time Tasks
                                                                 HOUR.              19434      5
Write the current time as     SYSTIME automatic macro variable   SYSTIME            &SYSTIME   The time at the moment of execution, in the form HH:MM
a string
Return the current time of    Time functions                     TIME( )            ()         The SAS time value at moment of execution, in the form NNNNN.NNN.
day as a SAS time value
Datetime Tasks
Interval Tasks
* In SAS, a Julian date is a date in the form YYNNN or YYYYNNN, where YY is a two-digit year, YYYY is a four-digit year,
and NNN is the ordinal offset from January 1 of the year YY or YYYY. SAS processes Julian dates only for valid SAS
dates.
Examples
n DATETIME formats count the number of seconds since January 1, 1960. For
datetimes that are greater than 02JAN1960:00:00:01 (integer of 86401), the
datetime value is always greater than the time value.
n When in doubt, look at the contents of your data set for clues as to which type of
value you are dealing with.
This program uses the DATETIME, DATE, and TIMEAMPM formats to display the
value 86399 as a date and time, a calendar date, and a time.
options nodate;
data test;
Time1=86399;
format Time1 datetime.;
Date1=86399;
format Date1 date9.;
Time2=86399;
format Time2 timeampm.;
run;
proc print data=test;
title 'Same Number, Different SAS Values';
footnote1 'Time1 is a SAS DATETIME value';
footnote2 'Date1 is a SAS DATE value';
footnote3 'Time2 is a SAS TIME value';
run;
footnote;
Definitions
duration
is an integer representing the difference between any two dates or times or
datetimes. Date durations are integer values representing the difference, in the
number of days, between two SAS dates. Time durations are decimal values
representing the number of seconds between two times or datetimes.
TIP Date and datetime durations can be calculated by subtracting the
earlier date or datetime from the later one. When dealing with SAS times,
special care must be taken if the beginning and the end of a duration are on
different calendar days. Whenever possible, the simplest solution is to use
datetimes rather than times.
interval
is a unit of measurement that SAS can count within an elapsed period of time,
such as DAYS, MONTHS, or HOURS. SAS determines date and time intervals
based on fixed points on the calendar, the clock, or both. The starting point of an
interval calculation defaults to the beginning of the period in which the beginning
value falls, which might not be the actual beginning value specified. For
example, if you are using the INTCK function to count the months between two
dates, regardless of the actual day of the month specified by the date in the
beginning value, SAS treats it as the first of that month.
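The difference between a duration and an interval count can be sketched as follows. This is an illustrative DATA step, not taken from the manual; the variable names are assumptions.

```sas
data _null_;
   /* Duration in days: subtract the earlier date from the later one */
   days = '05FEB2013'd - '25JAN2013'd;

   /* Duration in seconds: datetimes handle the change of calendar day */
   seconds = '18MAR2013:00:15:00'dt - '17MAR2013:23:30:00'dt;

   /* Interval count: SAS treats 25JAN2013 as the first of January,  */
   /* so one MONTH boundary (February 1) lies between the two dates. */
   months = intck('month', '25JAN2013'd, '05FEB2013'd);

   put days= seconds= time8. months=;
run;
```

The two dates are only 11 days apart, yet INTCK counts one month, because it counts interval boundaries rather than elapsed time.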
Syntax
SAS provides date, time, and datetime intervals for counting different periods of
elapsed time. You can create multiples of the intervals and shift their starting point.
Use them with the INTCK and INTNX functions and with procedures that support
numbered lists (such as the PLOT procedure). This is the form of an interval:
name<multiple><.starting-point>
The terms in an interval have the following definitions:
name
is the name of the interval. See the following table for a list of intervals and their
definitions.
multiple
creates a multiple of the interval. multiple can be any positive number. The
default is 1. For example, YEAR2 indicates a two-year interval.
.starting-point
is the starting point of the interval. By default, the starting point is 1. A value
greater than 1 shifts the start to a later point within the interval. The unit for
shifting depends on the interval, as shown in the following table. For example,
YEAR.3 specifies a yearly period from the first of March through the end of
February of the following year.
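As a hedged sketch of the name<multiple><.starting-point> form, the following DATA step asks INTNX for the start of the interval that contains an arbitrary test date (an increment of 0 returns the beginning of the current interval). The date and the variable names are chosen only for illustration.

```sas
data _null_;
   d = '15JUN2012'd;                        /* arbitrary test date          */
   start_year   = intnx('year',   d, 0);    /* name only                    */
   start_year2  = intnx('year2',  d, 0);    /* multiple: two-year interval  */
   start_year_3 = intnx('year.3', d, 0);    /* shifted: year begins March 1 */
   put start_year= date9. start_year2= date9. start_year_3= date9.;
run;
```

Per the definition above, the YEAR.3 result should be the first of March of the fiscal-style year that contains the test date.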
Intervals by Category
Table 7.3 Intervals Used with Date and Time Functions
Category   Interval   Definition   Default Starting Point   Shift Period   Example   Description
Output 7.5 Calculating the Duration between Start and End Dates
Boundaries of Intervals
SAS associates date and time intervals with fixed points on the calendar. For
example, the MONTH interval represents the time from the beginning of one
calendar month to the next, not a period of 30 or 31 days. When you use date and
time intervals (for example, with the INTCK or INTNX functions), SAS bases its
calculations on the calendar divisions that are present. Consider the following
examples:
Note: The only intervals that do not begin on the same date in each year are WEEK
and WEEKDAY. A Sunday can occur on any date because the year is not divided
evenly into weeks.
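The calendar-boundary behavior described in this section can be illustrated with a few calls. This is a sketch with made-up variable names, not an excerpt from the manual.

```sas
data _null_;
   /* One day apart, but a YEAR boundary (January 1) lies between them */
   y1 = intck('year', '31DEC2012'd, '01JAN2013'd);

   /* Almost a full year apart, but no YEAR boundary is crossed */
   y2 = intck('year', '01JAN2013'd, '31DEC2013'd);

   /* INTNX moves to the beginning of the next calendar month */
   next_month = intnx('month', '17MAR2013'd, 1);

   put y1= y2= next_month= date9.;
run;
```

Here y1 is 1 and y2 is 0, and next_month is the first of April 2013, because each function works from the fixed calendar divisions rather than from the number of days elapsed.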
Single-Unit Intervals
Single-unit intervals begin at the following points on the calendar:
n Monday–Monday
n Tuesday–Tuesday
n Wednesday–Wednesday
n Thursday–Thursday
n Friday–Sunday
Multi-Unit Intervals
example, does the first of October mark the first or the second month in a two-month
interval?
For all multi-unit intervals except multi-week intervals, SAS creates an interval
beginning on January 1, 1960, and counts forward from that date to determine
where individual intervals begin on the calendar. As a practical matter, when a year
can be divided evenly by an interval, think of the intervals as beginning with the
current year. Thus, MONTH2 intervals begin with January, March, May, July,
September, and November. Consider this example:
howmany1=intck('month2','15feb2000'd,'15mar2000'd);     howmany1=1
count=intck('day50','01oct1998'd,'01jan1999'd);         count=1
In the above example, SAS counts 50 days beginning with January 1, 1960; then
another 50 days; and so on. As part of this count, SAS counts one DAY50 interval
between October 1, 1998, and January 1, 1999. For example, to determine the date
on which the next DAY50 interval begins, use the INTNX function, as follows:
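A minimal sketch of such an INTNX call is shown here; the variable name start is an assumption. An increment of 1 asks for the beginning of the interval that follows the one containing October 1, 1998.

```sas
data _null_;
   start = intnx('day50', '01OCT1998'd, 1);
   put start= date9.;   /* expected: start=17NOV1998 */
run;
```

The result follows from counting in blocks of 50 days from January 1, 1960: October 1, 1998 is SAS date 14153, and the next multiple of 50 is 14200, which is November 17, 1998.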
Multi-Week Intervals
Multi-week intervals, such as WEEK2, present a special case. In general, weekly
intervals begin on Sunday, and SAS counts a week whenever it passes a Sunday.
However, SAS cannot calculate multi-week intervals based on January 1, 1960,
because that date fell on a Friday, as shown:
Dec Su Mo Tu We Th Fr Sa Jan
1959 27 28 29 30 31 1 2 1960
Therefore, SAS begins the first interval on Sunday of the week containing January
1, 1960—that is, on Sunday, December 27, 1959. SAS counts multi-week intervals
from that point. The following example counts the number of two-week intervals in
the month of August 1998:
To see the beginning date of the next interval, use the INTNX function, as shown
here:
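The following sketch, with assumed variable names, counts the WEEK2 intervals in August 1998 and then asks INTNX for the start of the next interval.

```sas
data _null_;
   /* WEEK2 boundaries fall on every second Sunday, counted from 27DEC1959 */
   count = intck('week2', '01AUG1998'd, '31AUG1998'd);
   begin = intnx('week2', '01AUG1998'd, 1);
   put count= begin= date9.;   /* expected: count=3 begin=02AUG1998 */
run;
```

The three boundaries crossed are the Sundays August 2, August 16, and August 30, 1998, each a whole number of two-week periods after December 27, 1959.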
Shifted Intervals
counting shifted intervals from that point. The INTNX function demonstrates that the
next interval begins on January 5, 1960:
For shifted intervals based on weeks, SAS first creates an interval based on Sunday
of the week containing January 1, 1960 (that is, December 27, 1959). Then, it
moves forward the required number of days. For example, suppose you want to
create the interval WEEK2.8 (biweekly periods beginning on the second Sunday of
the period). SAS measures a two-week interval based on Sunday of the week
containing January 1, 1960, and begins counting shifted intervals on the eighth day
of that. The INTNX function shows the beginning of the next interval:
Table 7.12 Using the INTNX Function to Show the Beginning of the Next Interval
You can also shift time intervals. For example, HOUR8.7 intervals divide the day
into the periods 06:00 to 14:00, 14:00 to 22:00, and 22:00 to 06:00.
Custom Intervals
You can define custom intervals and associate interval data sets with new interval
names when you use the INTERVALDS= system option. An interval name cannot
be a reserved SAS name. The dates for these intervals are located in a SAS data
set that you create. The data set must contain the variable Begin. For each
observation, the Begin variable represents the start of an interval. You can specify a
second variable, End, to represent the end of the interval, but it is not required. If the
End variable is not present in the data set, the end of an interval is inferred by the
next Begin variable value. After the custom intervals have been defined, you can
use them with the INTCK and INTNX functions just as you would use standard
intervals.
The INTERVALDS= system option enables you to increase the number of allowable
intervals. In addition to the standard list of intervals (DAY, WEEKDAY, and so on),
the names that are listed in INTERVALDS= are valid as well.
Note: Nested custom intervals are not supported.
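The steps above can be sketched as follows. The data set name work.fiscalper, the interval name FiscalPer, and the period start dates are all hypothetical; only the Begin variable and the INTERVALDS= option come from the text.

```sas
/* Hypothetical interval data set: each observation starts a custom period */
data work.fiscalper;
   format Begin date9.;
   input Begin : date9.;
   datalines;
01FEB2012
01MAY2012
01AUG2012
01NOV2012
01FEB2013
;
run;

/* Associate the hypothetical interval name with the data set */
options intervalds=(FiscalPer=work.fiscalper);

data _null_;
   /* Beginning of the custom period that contains the date */
   start = intnx('FiscalPer', '15JUN2012'd, 0);
   /* Number of custom period boundaries between two dates */
   n = intck('FiscalPer', '15JUN2012'd, '15DEC2012'd);
   put start= date9. n=;
run;
```

Because no End variable is supplied in this sketch, each period is assumed to end where the next Begin value starts.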
The first, second, and third numbers specify the number of weeks in the first,
second, and third month of each period, respectively. Retail calendar intervals
facilitate comparisons across years, because week definitions remain consistent
from year to year.
The intervals that are created from the formats can be used in any of the following
functions: INTCINDEX, INTCK, INTCYCLE, INTFIT, INTFMT, INTGET, INTINDEX,
INTNX, INTSEAS, INTSHIFT, and INTTEST.
The following table lists calendar intervals that are used in the retail industry and
that are ISO 8601 compliant.
Interval Description
YEARV specifies ISO 8601 yearly intervals. The ISO 8601 year begins
on the Monday on or immediately preceding January 4. Note
that it is possible for the ISO 8601 year to begin in December
of the preceding year. Also, some ISO 8601 years contain a
leap week. The beginning subperiod s is written in ISO 8601
weeks (WEEKV).
R445YR is the same as YEARV except that in the retail industry the
beginning subperiod s is 4-4-5 months (R445MON).
R454YR is the same as YEARV except that in the retail industry the
beginning subperiod s is 4-5-4 months (R454MON).
R544YR is the same as YEARV except that in the retail industry the
beginning subperiod s is 5-4-4 months (R544MON).
R445MON specifies retail 4-4-5 monthly intervals. The 3rd, 6th, 9th, and
12th months are five ISO 8601 weeks long with the exception
that some 12th months contain leap weeks. All other months
are four ISO 8601 weeks long. R445MON intervals begin with
the 1st, 5th, 9th, 14th, 18th, 22nd, 27th, 31st, 35th, 40th, 44th,
and 48th weeks of the ISO year. The beginning subperiod s is
4-4-5 months (R445MON).
R454MON specifies retail 4-5-4 monthly intervals. The 2nd, 5th, 8th, and
11th months are five ISO 8601 weeks long with the exception
that some 12th months contain leap weeks. R454MON
intervals begin with the 1st, 5th, 10th, 14th, 18th, 23rd, 27th,
31st, 36th, 40th, 44th, and 49th weeks of the ISO year. The
beginning subperiod s is 4-5-4 months (R454MON).
R544MON specifies retail 5-4-4 monthly intervals. The 1st, 4th, 7th, and
10th months are five ISO 8601 weeks long. All other months
are four ISO 8601 weeks long with the exception that some
12th months contain leap weeks. R544MON intervals begin
with the 1st, 6th, 10th, 14th, 19th, 23rd, 27th, 32nd, 36th, 40th,
45th, and 49th weeks of the ISO year. The beginning
subperiod s is 5-4-4 months (R544MON).
WEEKV specifies ISO 8601 weekly intervals of seven days. Each week
begins on Monday. The beginning subperiod s is calculated in
days (DAY). Note that WEEKV differs from WEEK in that
WEEKV.1 begins on Monday, WEEKV.2 begins on Tuesday,
and so on.
8
Error Processing and Debugging
macro-related    when you use the macro facility incorrectly    macro compile time or execution time, DATA, or PROC step compile time or execution time
Syntax Errors
Syntax errors occur when program statements do not conform to the rules of the
SAS language. Here are some examples of syntax errors:
n misspelled SAS keyword
n missing a semicolon
When SAS encounters a syntax error, it first tries to correct the error by
interpreting what you mean. Then SAS continues processing your
program based on its assumptions. If SAS cannot correct the error, it prints an error
message to the log. If you do not want SAS to correct syntax errors, you can set the
NOAUTOCORRECT system option. For more information, see the
AUTOCORRECT system option in the SAS System Options: Reference.
In the following example, the DATA statement is misspelled, and SAS prints a
warning message to the log. Because SAS could interpret the misspelled word, the
program runs and produces output.
date temp;
x=1;
run;
Example Code 8.1 SAS Log: Syntax Error (Misspelled Key Word)
39 date temp;
----
14
WARNING 14-169: Assuming the symbol DATA was misspelled as date.
40 x=1;
41 run;
NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
42
43 proc print data=temp;
44 run;
NOTE: There were 1 observations read from the data set WORK.TEMP.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Some errors are explained fully by the message that SAS prints in the log. Other
error messages are not as easy to interpret because SAS is not always able to
detect exactly where the error occurred. For example, when you fail to end a SAS
statement with a semicolon, SAS does not always detect the error at the point
where it occurs. This is because SAS statements are free-format (they can begin
and end anywhere). In the following example, the semicolon at the end of the DATA
statement is missing. SAS prints the word ERROR in the log, identifies the possible
location of the error, prints an explanation of the error, and stops processing the
DATA step.
data temp
x=1;
run;
67 data temp
68 x=1;
-
22
76
ERROR 22-322: Syntax error, expecting one of the following: a name,
a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_.
69 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
70
71 proc print data=temp;
72 run;
NOTE: There were 1 observations read from the data set WORK.TEMP.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Whether subsequent steps are executed depends on which method of running SAS
you use, as well as on your operating environment.
Note: You can add these lines to your code to fix unmatched comment tags,
unmatched quotation marks, and missing semicolons:
/* '; * "; */;
quit;
run;
Semantic Errors
Semantic errors occur when the form of the elements in a SAS statement is correct,
but the elements are not valid for that usage. Semantic errors are detected at
compile time and can cause SAS to enter syntax check mode. (For a description of
syntax check mode, see “Syntax Check Mode” on page 165.)
Examples of semantic errors include the following:
n specifying the wrong number of arguments for a function
In the following example, SAS detects an invalid reference to the array All at
compile time.
data _null_;
array all{*} x1-x5;
all=3;
datalines;
1 1.5
. 3
2 4.5
3 2 7
3 . .
;
run;
Example Code 8.3 SAS Log: Semantic Error (Invalid Reference to an Array)
81 data _null_;
82 array all{*} x1-x5;
ERROR: invalid reference to the array all.
83 all=3;
84 datalines;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds
90 ;
91
92 run;
93 proc printto; run;
Here is another example of a semantic error that occurs at compile time. In this
DATA step, the libref SomeLib has not been previously assigned in a LIBNAME
statement.
data test;
set somelib.old;
run;
Example Code 8.4 SAS Log: Semantic Error (Libref Not Previously Assigned)
Execution-Time Errors
Definition
Execution-time errors are errors that occur when SAS executes a program that
processes data values. Most execution-time errors produce warning messages or
notes in the SAS log but allow the program to continue executing (see footnote 1). The location of
an execution-time error is usually given as line and column numbers in a note or
error message.
Common execution-time errors include the following:
n invalid arguments to functions
Out-of-Resources Condition
An execution-time error can also occur when you encounter an out-of-resources
condition, such as a full disk, or insufficient memory for a SAS procedure to
complete. When these conditions occur, SAS attempts to find resources for current
use. For example, SAS might ask the user for permission to perform these actions
in out-of-resource conditions:
n Delete temporary data sets that might no longer be needed.
Examples
In the following example, an execution-time error occurs when SAS uses data
values from the second observation to perform the division operation in the
assignment statement, because UnitsOnHand is 0 in that observation.
1. When you run SAS in noninteractive mode, more serious errors can cause SAS to enter syntax check mode and stop
processing the program.
data inventory;
input Item $ 1-14 TotalCost 15-20
UnitsOnHand 21-23;
UnitCost=TotalCost/UnitsOnHand;
datalines;
Hammers 440 55
Nylon cord 35 0
Ceiling fans 1155 30
;
123 ;
124
125 proc print data=inventory;
126 format TotalCost dollar8.2 UnitCost dollar8.2;
127 run;
SAS executes the entire step, assigns a missing value for the variable UnitCost in
the output, and writes the following to the SAS log:
n a note that describes the error
n the contents of the program data vector at the time the error occurred
Note that the values that are listed in the program data vector include the _N_ and
_ERROR_ automatic variables. These automatic variables are assigned temporarily
to each observation and are not stored with the data set.
In the following example of an execution-time error, the program processes an array
and SAS encounters a value of the array's subscript that is out of range. SAS prints
an error message to the log and stops processing.
data test;
array all{*} x1-x3;
input I measure;
if measure > 0 then
all{I} = measure;
datalines;
1 1.5
. 3
2 4.5
;
Example Code 8.6 SAS Log: Execution-Time Error (Subscript Out of Range)
172 ;
173
174 proc print data=test;
175 run;
NOTE: No variables in data set WORK.TEST.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Data Errors
Definition
Data errors occur when some data values are not appropriate for the SAS
statements that you have specified in the program. For example, if you define a
variable as numeric, but the data value is actually character, SAS generates a data
error. SAS detects data errors during program execution and continues to execute
the program, and does the following:
n writes an invalid data note to the SAS log.
n prints the input line and column numbers that contain the invalid value in the
SAS log. Unprintable characters appear in hexadecimal. To help determine
column numbers, SAS prints a rule line above the input line.
n prints the observation under the rule line.
In this example, a character value in the Number variable results in a data error
during program execution:
data age;
240 ;
241
242 proc print data=age;
243 run;
You can also use the INVALIDDATA= system option to assign a value to a variable
when your program encounters invalid data. For more information, see the
INVALIDDATA= system option in SAS System Options: Reference.
the ?? modifier also sets the automatic variable _ERROR_ to 0. For example, these
two sets of statements are equivalent:
n input x ?? 10-12;
n input x ? 10-12;
_error_=0;
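As an illustration of the ?? modifier, the following sketch reads a column that contains one non-numeric value. The data set, variable names, and data lines are assumptions made for the example.

```sas
data age;
   /* ?? suppresses the invalid data note and keeps _ERROR_ at 0 */
   input Name $ 1-8 Number ?? 10-12;
   datalines;
Sue      35
Joe      xx
Steve    22
;
run;

proc print data=age;
run;
```

In this sketch, Number is set to missing for the second observation, and the log should contain no invalid data note for it. Use the modifier only when invalid values are expected, because it also hides problems that you might want to see.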
Macro-related Errors
Several types of macro-related errors exist:
n macro compile time and macro execution-time errors, generated when you use
the macro facility itself
n errors in the SAS code produced by the macro facility
For more information about macros, see SAS Macro Language: Reference.
n creates the descriptor portion of any output data sets that are specified in
program statements
n does not write any observations to new data sets that SAS creates
n does not execute most of the subsequent DATA steps or procedures in the
program (exceptions include PROC DATASETS and PROC CONTENTS)
Note: Any data sets that are created after SAS has entered syntax check mode do
not replace existing data sets with the same name.
When syntax checking is enabled, SAS underlines the point where it detects a
syntax or semantic error in a DATA step and identifies the error by number. SAS
then enters syntax check mode and remains in this mode until the program finishes
executing. When SAS enters syntax check mode, all DATA step statements and
PROC step statements are validated.
276
277 proc print data=temporary;
ERROR: Variable ITEM2 not found.
ERROR: Variable ITEM3 not found.
278 var Item1 Item2 Item3;
279 run;
NOTE: The SAS System stopped processing this step because of
errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.52 seconds
cpu time 0.00 seconds
SAS displays two error messages, one for the variable Item2 and one for the
variable Item3.
When you are running debugged production programs that are unlikely to encounter
errors, you might want to force SAS to abend after a single error occurs. You can
use the ERRORABEND system option to do this.
report:
proc report data=mylib.mydata;
...more sas code...;
run;
endReadSortReport:
Note: The use of label: in checkpoint mode and restart mode is valid only outside of
a DATA or PROC statement. Checkpoint mode and restart mode for labeled code
sections are not valid for labels within a DATA step or macros.
Checkpoint mode and restart mode can be enabled for either DATA and PROC
steps or for labeled code sections, but not both simultaneously. To use checkpoint
mode and restart mode on a step-by-step basis, use the step checkpoint mode and
the step restart mode. To use checkpoint mode and restart mode based on groups
of code sections, use the label checkpoint mode and the label restart mode. Each
group of code is identified by a unique label. If you use labels, all steps in a SAS
program must belong to a labeled code section.
When checkpoint mode is enabled, SAS records information about DATA and
PROC steps or labeled code sections in a checkpoint library. When a batch program
terminates prematurely, you can resubmit the program in restart mode to complete
execution. In restart mode, global statements are re-executed, macro definitions are
recompiled, and macros are re-executed. SAS reads the data in the checkpoint
library to determine which steps or labeled code sections completed. Program
execution resumes with the step or the label that was executing when the failure
occurred.
The checkpoint-restart data contains information only about the DATA and PROC
steps or the labeled code sections that completed and the step or labeled code
sections that did not complete. The checkpoint-restart data does not contain the
following information:
n information about macro variables and macro definitions
n information that might have been processed in the step or labeled code section
that did not complete
Note: Checkpoint mode is not valid for batch programs that contain the DM
statement to submit commands to SAS. If checkpoint mode is enabled and SAS
encounters a DM statement, checkpoint mode is disabled and the checkpoint
catalog entry is deleted.
As a best practice, if you use labeled code sections, add a label at the end of your
program. When the program completes successfully, the label is recorded in the
checkpoint-restart data. If the program is submitted again in restart mode, SAS
knows that the program has already completed successfully.
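A program laid out this way might look like the following sketch. The label names (loadData, summarize, endOfProgram) and the data set names are hypothetical; the label form follows the name: pattern used for labeled code sections in this chapter.

```sas
loadData:
data work.sales;              /* hypothetical step in the first section */
   set sashelp.class;
run;

summarize:
proc means data=work.sales;   /* hypothetical step in the second section */
run;

endOfProgram:
```

Every step belongs to a labeled section, and the final label has no steps after it, so its presence in the checkpoint-restart data indicates that the program ran to completion.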
If a DATA or PROC step must be re-executed, you can add the global statement
CHECKPOINT EXECUTE_ALWAYS immediately before the step. This statement
tells SAS to always execute the following step without considering the checkpoint-
restart data. It is applicable only to the step that follows the statement. For more
information, see “CHECKPOINT EXECUTE_ALWAYS Statement” in SAS Global
Statements: Reference.
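For example, a step that rebuilds a table needed by later steps could be marked as follows. This is a minimal sketch; the data set names are assumptions.

```sas
/* Always rerun the next step, even in restart mode */
checkpoint execute_always;
data work.lookup;
   set sashelp.class;
run;

/* This step is subject to normal checkpoint-restart processing */
proc print data=work.lookup;
run;
```

This pattern is useful when the checkpoint-restart library is not the Work library, so temporary data sets from the failed run no longer exist when the program is resubmitted.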
You enable checkpoint mode and restart mode for DATA and PROC steps by using
system options when you start the batch program in SAS.
n STEPCHKPT system option enables checkpoint mode, which indicates to SAS
to record checkpoint-restart data.
n STEPCHKPTLIB system option identifies a user-specified checkpoint-restart
library.
n STEPRESTART system option enables restart mode, ensuring that execution
resumes with the DATA or PROC step indicated by the checkpoint-restart library.
You enable checkpoint mode and the restart mode for labeled code sections by
using these system options when you start the batch program in SAS:
n LABELCHKPT system option enables checkpoint mode for labeled code
sections, which indicates to SAS to record checkpoint-restart data.
n LABELCHKPTLIB system option identifies a user-specified checkpoint-restart
library.
n LABELRESTART system option enables restart mode, ensuring that execution
resumes with the labeled code section indicated by the checkpoint-restart library.
If you use the Work library as your checkpoint-restart library, you can use the
CHKPTCLEAN system option to have the files in the Work library erased after a
successful execution of your batch program.
For information, see the following system options in SAS System Options:
Reference:
n “STEPCHKPT System Option” in SAS System Options: Reference
lost. Under z/OS, it might not be practical for your site to reuse the Work library in a
batch session.
The labels for labeled code sections must be unique. If SAS enters restart mode for
a label that is a duplicate label, SAS starts at the first label. The code between the
duplicate labels might rerun needlessly.
n CHKPTCLEAN specifies whether to erase files in the Work library if the batch
program runs successfully.
In the Windows operating environment, the following SAS command resubmits a
batch program whose checkpoint-restart data was saved to the Work library:
sas -sysin 'c:\mysas\mysasprogram.sas' -stepchkpt -steprestart -noworkinit
-noworkterm -errorcheck strict -errorabend -chkptclean
n NOWORKINIT does not initialize the Work library when SAS starts.
LABELRESTART
specifies whether to execute a batch program by using checkpoint-restart data
for labeled code sections.
MERROR
specifies whether SAS issues a warning message when a macro-like name does
not match a macro keyword.
QUOTELENMAX
if a quoted string exceeds the maximum length allowed, specifies whether SAS
writes a warning message to the SAS log.
SERROR
specifies whether SAS issues a warning message when a macro variable
reference does not match a macro variable.
STEPCHKPT
specifies whether checkpoint-restart data is to be recorded for a batch program.
STEPCHKPTLIB=
specifies the libref of the library where checkpoint-restart data is saved.
STEPRESTART
specifies whether to execute a batch program by using checkpoint-restart data.
VARINITCHK=
specifies whether to stop or continue processing a DATA step when a variable is
not initialized. You can also specify the type of message that is written to the
SAS log.
VNFERR
specifies whether SAS issues an error or warning when a BY variable exists in
one data set but not another data set. SAS only issues these errors or warnings
when processing the SET, MERGE, UPDATE, or MODIFY statements.
For more information about SAS system options, see SAS System Options:
Reference.
n the SYSRC and SYSMSG functions to return information when a data set or
external-files access function encounters an error condition
n the SYSRC automatic macro variable to receive return codes
n the SYSERR automatic macro variable to detect major system errors, such as
out of memory or failure of the component system
n log control options:
MSGLEVEL=
controls the level of detail in messages that are written to the SAS log.
PRINTMSGLIST
controls the printing of extended lists of messages to the SAS log.
SOURCE
controls whether SAS writes source statements to the SAS log.
SOURCE2
controls whether SAS writes source statements included by %INCLUDE to
the SAS log.
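As a sketch of the SYSRC and SYSMSG functions listed above, the following DATA step tries to open a data set that is assumed not to exist and then reports the return code and message. The data set name work.no_such_table is hypothetical.

```sas
data _null_;
   length msg $ 200;
   dsid = open('work.no_such_table');   /* assumed not to exist */
   if dsid = 0 then do;
      rc  = sysrc();                    /* system return code of the failure */
      msg = sysmsg();                   /* text of the associated message    */
      put 'OPEN failed: ' rc= msg=;
   end;
   else rc = close(dsid);               /* close the data set if it did open */
run;
```

Checking the value that OPEN returns before using the identifier lets a program handle the failure itself instead of relying on messages in the log.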
9
SAS Output
You can write specific information to the SAS log (such as variable values or text
strings) by using the SAS statements that are described in “Writing to the Log in
All Modes” on page 190.
The log is also used by some of the SAS procedures that perform utility
functions, for example the DATASETS and OPTIONS procedures. See the Base
SAS Procedures Guide.
Because the SAS log provides a journal of program processing, it is an essential
debugging tool. However, certain system options must be in effect to make the
log effective for debugging your SAS programs. “Customizing the Log” on page
191 describes several SAS system options that you can use.
SAS console log
created when the regular SAS log is not active, for recording information,
warnings and error messages. When the SAS log is active, the SAS console log
is used only for fatal system initialization errors or late termination messages.
Note: For specific information about the destination of the SAS console
log, see the SAS documentation for your operating environment.
SAS logging facility output
contains log messages that you create using the SAS logging facility. Logging
facility messages can be created within SAS programs or they can be created by
SAS for logging SAS server messages. Logging facility log messages are based
on message categories such as authentication, administration, performance, or
customized message categories in SAS programs. In SAS programs, you use
logging facility functions, autocall macros, or DATA step component objects to
create the logging facility environment.
The logging facility environment consists of loggers, appenders, and log events.
A logger defines a message category, references one or more appenders, and
specifies the logger's message level threshold. The message level threshold can
be one of the following, from lowest to highest: trace, debug, info, warn, error, or
fatal. An appender defines the physical location to write log messages and the
format of the message. A log event consists of a log message, a message
threshold, and a logger. Log events are initiated by SAS servers and SAS
programs.
When SAS processes a logging facility log event, it compares the message level
of the log event to the message threshold of the logger that is named in the log
event. If the message level of the log event is the same as or higher than the
logger's message threshold, the message is written to the locations that are specified by
the appenders that are referenced in the logger definition. If the log event is not
accepted by the logger, the message is discarded.
Appenders are defined for the duration of a macro program or a DATA step.
Loggers are defined for the duration of the SAS session.
For more information, see SAS Logging: Configuration and Programming
Reference.
Definition
The destination in SAS is a designation that the Output Delivery System (ODS) uses
to generate a specific type of output. Or, simply put, it is how and where ODS routes
your output. For example, ODS can route your output to a browser as HTML, to a
file, or to your terminal or display as a simple list report. The destination of your
output depends on the following:
• your operating environment
Default Destinations
In SAS 9.3 and later versions, when you run SAS in windowing mode in the
Microsoft Windows and UNIX operating environments, output is sent by default to
the HTML destination. ODS Graphics is also turned on by default in the windowing
environment under UNIX and Windows for SAS 9.3 and later versions.
For running SAS in batch mode, however, LISTING is the default destination for
SAS 9.4 and earlier versions, and ODS Graphics is turned off by default. See Table
9.1 on page 178 for a comparison of output destinations based on SAS version and
operating mode. Your defaults might be different due to your registry or
configuration file settings.
The following table shows the default destinations for each method of operation
based on SAS version:
SAS Version          Mode of Running SAS   Viewer                   ODS Destination
SAS 9.3 and later    windowing mode        SAS Results Viewer or    HTML
                                           browser window
Overview
With SAS, there are many ways to control where your log, procedure, and DATA
step output is sent. The method that you use depends on your operating system
and the mode in which you are running SAS. See Table 9.2 on page 179 for a list of
commonly used methods for changing the output destination. You can route your
output directly to a PC or terminal display, to a printer, or to an external file. Output
destinations can be specified using SAS procedures, system options, commands,
statements, or global ODS statements.
PRINTTO procedure        routes DATA step, log, or procedure output from the
                         system default destinations to the destination that
                         you choose. The PRINTTO procedure defines non-ODS
                         destinations.
FILENAME statement       associates a fileref with an external file or output
                         device and enables you to specify file and device
                         attributes.
FILE command (Windows)   stores the contents of the LOG or OUTPUT windows in
                         files that you specify, when the command is issued
                         from within the windowing environment.
ODS OUTPUT statement     produces a SAS data set from an output object and
                         manages the selection and exclusion lists for the
                         OUTPUT destination.
ODS DOCUMENT statement   produces an ODS document that enables you to
                         restructure, navigate, and replay your data in
                         different ways. It also enables you to specify
                         multiple destinations without needing to rerun your
                         analysis or repeat your database query.
ODS HTML statement       opens, manages, or closes the HTML destination, which
                         produces HTML 4.0 output that contains embedded style
                         sheets.
ODS RTF statement        opens, manages, or closes the RTF destination, which
                         produces output written in Rich Text Format for use
                         with Microsoft Word 2002.
SAS system options       redefine the destination of log and output for an
                         entire SAS program. These system options are
                         specified when you invoke SAS. Here are the system
                         options used to route output:
                         • ALTLOG= (Windows, UNIX, z/OS). See “ALTLOG System
                           Option: Windows” in SAS Companion for Windows.
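As a minimal sketch of the PRINTTO method listed above, the following steps route the log and the LISTING output to external files and then restore the defaults. The file names are hypothetical; substitute paths that are valid in your operating environment.

```sas
/* Route the SAS log and procedure (LISTING) output to external files. */
/* The file names are hypothetical examples.                           */
proc printto log='myprog.log' print='myprog.lst' new;
run;

ods listing;                      /* PRINTTO affects non-ODS (LISTING) output */
proc print data=sashelp.class;
run;

/* Restore the default destinations for the log and procedure output. */
proc printto;
run;
```

The NEW option replaces the contents of existing files instead of appending to them.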
For conceptual information about global ODS statements, see the following
resources:
• “Destination Category Table” in SAS Output Delivery System: User’s Guide.
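The ODS statements described above can be combined in one program. The following sketch assumes a hypothetical output file name (class_report.rtf) and data set name (work.class_stats); it sends a PROC MEANS report to the RTF destination and also captures the same output object as a SAS data set.

```sas
/* Open the RTF destination; the file name is a hypothetical example. */
ods rtf file='class_report.rtf';

/* Capture the PROC MEANS output object named Summary as a data set. */
ods output Summary=work.class_stats;

proc means data=sashelp.class mean min max;
   var height weight;
run;

/* Close the RTF destination so that the file is written. */
ods rtf close;
```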
Customizing Output
DATE | NODATE system option   controls printing of date and time values. When
                              this option is enabled, SAS prints on the top of
                              each page of output the date and time the SAS
                              job started. When you run SAS in interactive
                              mode, the date and time the job started is the
                              date and time that you started your SAS session.
LINESIZE= and PAGESIZE=       change the default number of lines per page
system options                (page size) and characters per line (line size)
                              for printed output. The default depends on the
                              method of running SAS and the settings of
                              certain SAS system options. Specify new page
                              and line sizes in the OPTIONS statement or
                              OPTIONS window. You can also specify line and
                              page size for DATA step output in the FILE
                              statement.
                              The values that you use for the LINESIZE= and
                              PAGESIZE= system options can significantly
                              affect the appearance of the output that is
                              produced by some SAS procedures.
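As an illustration of these options, the following sketch sets the date, line size, and page size in an OPTIONS statement, and then overrides the line and page size for DATA step output in a FILE statement. The specific values are arbitrary choices for the example.

```sas
/* Suppress the date and set page dimensions for printed (LISTING) output. */
options nodate linesize=80 pagesize=60;

/* Override the line and page size for DATA step output only. */
data _null_;
   set sashelp.class;
   file print linesize=64 pagesize=24;
   put name $10. age 3. height 6.1;
run;
```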
quit;
Note: At SAS start-up, unless you have previously closed the HTML destination,
output is sent to the WORK directory by default. If you close the HTML destination
and re-open it in the same SAS session, all output goes to the current directory
rather than the WORK directory. You do not have to specify ODS HTML CLOSE; to
view your output.
ods listing;
options nodate;
title 'Students';
proc print data=sashelp.class;
where weight>100;
run;
quit;
ods html;
ods listing close;
See the procedure descriptions in the Base SAS Procedures Guide for examples of
output from SAS procedures. For a discussion and examples of DATA step output,
see the “FILE Statement” in SAS DATA Step Statements: Reference and the “PUT
Statement” in SAS DATA Step Statements: Reference.
NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA. 1
NOTE: SAS (r) Proprietary Software 9.4 (TS1B0) 2
Licensed to SAS Institute Inc., Site 1. 3
NOTE: This session is executing on the W32_7PRO platform. 4
1 options pagesize=24
2 linesize=64 pageno=1 nodate; 5
3 data logsample; 6
5 infile
5 ! '\\myserver\my-directory-path\sampledata.dat'; 7
6 input LastName $ ID $ Gender $ Birth : date7. score1
6 ! score2 score3 score4 score5 score6 score7 score8;
7 format Birth mmddyy8.;
8 run;
Filename=\\myserver\my-directory-path\sampledata.dat,
RECFM=V,LRECL=256,File Size (bytes)=296,
Last Modified=08Jun2009:15:42:26,
Create Time=08Jun2009:15:42:26
9
10 proc sort data=logsample; 12
11 by LastName;
12 run;
13
14 proc print data=logsample; 14
15 by LastName;
16 run;
The following list corresponds to the circled numbers in the SAS log shown above:
1 copyright information
2 SAS system release used to run this program
3 name and site number of the computer installation where the program ran
4 platform used to run the program
5 OPTIONS statement to set a page size of 24, a line size of 64, and to suppress
the date in the output
6 SAS statements that make up the program (if the SAS system option SOURCE
is enabled)
7 long statement continued to the next line. Note that the continuation line is
preceded by an exclamation point (!), and that the line number does not change.
8 input file information: notes or warning messages about the raw data and where
they were obtained (if the SAS system option NOTES is enabled)
9 the number and record length of records read from the input file (if the SAS
system option NOTES is enabled)
10 SAS data set that your program created; notes that contain the number of
observations and variables for each data set created (if the SAS system option
NOTES is enabled)
11 reported performance statistics when the STIMER option or the FULLSTIMER
option is set
12 procedure that sorts your data set
13 note about the sorted SAS data set
14 procedure that prints your data set
the SAS logging facility and the LOGPARM= option is ignored. The LOG= option is
honored only when the %S{App.Log} conversion character is specified in the
logging configuration file.
The following sections discuss the log options that you can configure using the
LOGPARM= system option and how you would name the SAS log for those options
when the logging facility has not been initiated.
The LOG= system option names the SAS log. The LOGPARM= system option
enables you to perform the following tasks:
• append or replace an existing SAS log
• determine when to write to the SAS log
For information about these log system options, see “LOGPARM= System Option”
in SAS System Options: Reference and the documentation for your operating
environment. For information about the SAS logging facility, see SAS Logging:
Configuration and Programming Reference.
The OPEN= option is ignored when the ROLLOVER= option of the LOGPARM=
system option is set to a specific size, n.
When the SAS log is created on February 2, 2009, the name of the log is
2009Feb02sas.log.
Directives resolve only when the value of the ROLLOVER= option of the
LOGPARM= system option is set to AUTO or SESSION. If directives are specified in
the log name and the value of the ROLLOVER option is NONE or a specific size, n,
the directive characters, such as #b or #Y, become part of the log name. Using the
example above for the LOG= system option, if the LOGPARM= system option
specifies ROLLOVER=NONE, the name of the SAS log is #Y#b#dsas.log.
For a complete list of directives, see “LOGPARM= System Option” in SAS System
Options: Reference.
Automatically Rolling Over the SAS Log When Directives Change: When the SAS
log name contains one or more directives and the ROLLOVER= option of the
LOGPARM= system option is set to AUTO, SAS closes the log and opens a new log
when the directive values change. The new SAS log name contains the new
directive values.
The following table shows some of the log names that are created when SAS is started
on the second of the month at 6:15 AM, using this SAS command:
sas -objectserver -log "london#n#d#H.log" -logparm "rollover=auto"
The directive #n inserts the system node name into the log name. #d adds the day
of the month to the log name. #H adds the hour to the log name. The node name for
this example is Thames. The log for this SAS session rolls over when the hour
changes and when the day changes.
Rolling Over the SAS Log by SAS Session: To roll over the log at the start of a SAS
session, specify the LOGPARM=“ROLLOVER=SESSION” option when SAS starts.
SAS resolves the system-specific directives by using the system information
obtained when SAS starts. No rollover occurs during the SAS session and the log
file is closed at the end of the SAS session.
Rolling Over the SAS Log by the Log Size: To roll over the log when the log reaches
a specific size, specify the LOGPARM=“ROLLOVER=n” option when SAS starts. n
is the maximum size of the log, in bytes, and it cannot be smaller than 10K (10,240)
bytes. When the log reaches the specified size, SAS closes the log and appends
the text “old” to the filename (for example, londonold.log). SAS opens a new log
using the value of the LOG= option for the log name and ignores the OPEN= option
of the LOGPARM= system option. This is done so that SAS never writes
over an existing log file. Directives in log names are ignored for logs that roll over
based on log size.
To ensure unique log filenames between servers, SAS creates a lock file that is
based on the log filename. The lock filename is logname.lck, where logname is the
value of the LOG= option. If a lock file exists for a server log and another server
specifies the same log name, the log and lock filenames for the second server have
a number appended to the names. The numbers begin with 2 and increment by 1
for subsequent requests for the same log filename. For example, if a lock exists for
the log file london.log, the second server log would be london2.log and the lock file
would be london2.lck.
No SAS Log Roll Over: To not roll over the log at all, specify the LOGPARM=
“ROLLOVER=NONE” option when SAS starts. Directives are not resolved and no
rollover occurs. For example, if LOG=“March#b.log”, the directive #b does not
resolve and the log name is March#b.log.
PUTLOG statement
writes a user-specified message to the SAS log. Use the PUTLOG statement in
a DATA step.
LIST statement
writes to the SAS log the input data records for the data line that is being
processed. The LIST statement operates only on data that are read with an
INPUT statement. It has no effect on data that are read with a SET, MERGE,
MODIFY, or UPDATE statement. Use the LIST statement in a DATA step.
DATA statement with /NESTING option
writes to the SAS log a note for the beginning and end for each nesting level of
DO-END and SELECT-END statements. This enables you to debug mismatched
DO-END and SELECT-END statements.
ERROR statement
sets the automatic _ERROR_ variable to 1 and optionally writes to the log a
message that you specify. Use the ERROR statement in a DATA step.
Use the PUT, PUTLOG, LIST, DATA, and ERROR statements in combination with
conditional processing to debug DATA steps by writing selected information to the
log.
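The following sketch shows one way to combine these statements with conditional processing. The data set name, variable names, and the valid score range (0 to 100) are assumptions made for the example.

```sas
/* Write diagnostic information to the log only for suspect records. */
data work.checked;
   input name $ score;
   if score < 0 or score > 100 then do;
      putlog 'WARNING: score out of range: ' name= score=;  /* user message  */
      list;                           /* write the current input record      */
      error 'Invalid score';          /* set _ERROR_ to 1 and write a message */
   end;
   datalines;
Ann 88
Bob 120
Cy 75
;
run;
```

Only the record for Bob satisfies the condition, so the additional messages appear in the log for that record alone.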
FULLSTIMER
writes all available system performance statistics to the SAS log.
ISPNOTES
specifies whether ISPF error messages are written to the SAS log. The
ISPNOTES system option is valid only under the z/OS operating environment.
HOSTINFOLONG
writes additional operating environment information to the SAS log when SAS
starts.
LOGPARM “OPEN=APPEND | REPLACE | REPLACEOLD”
when a log file already exists and SAS is opening the log, the LOGPARM option
specifies whether to append to the existing log or to replace the existing log. The
REPLACEOLD option specifies to replace logs that are more than one day old.
MEMRPT
specifies whether memory usage statistics are written to the SAS log for each
step. The MEMRPT system option is valid only under the z/OS operating
environment.
MLOGIC
writes macro execution trace information to the SAS log.
MLOGICNEST
writes macro nesting execution trace information to the SAS log.
MPRINT | NOMPRINT
specifies whether SAS statements that are generated by macro execution are
written to the SAS log.
MSGLEVEL=N | I
specifies the level of detail in messages that are written to the SAS log. If the
MSGLEVEL system option is set to N, the log displays notes, warnings, and
error messages only. If MSGLEVEL is set to I, then the log displays additional
notes pertaining to index usage, merge processing, Hadoop MapReduce jobs,
and sort utilities.
NEWS=external-file
specifies whether news information that is maintained at your site is written to
the SAS log.
NOTES | NONOTES
specifies whether notes (messages beginning with NOTE) are written to the SAS
log. NONOTES does not suppress error or warning messages.
OPLIST
specifies whether to write to the SAS log the values of all system options that are
specified when SAS is invoked.
OVP | NOOVP
specifies whether error messages that are printed by SAS are overprinted.
PAGEBREAKINITIAL
specifies whether the SAS log and the listing file begin on a new page.
PRINTMSGLIST | NOPRINTMSGLIST
specifies whether extended lists of messages are written to the SAS log.
RTRACE
produces a list of resources that are read during SAS execution and writes them
to the SAS log if a location is not specified for the RTRACELOC= system option.
The RTRACE system option is valid only for the Windows and UNIX operating
environments.
SOURCE | NOSOURCE
specifies whether SAS writes source statements to the SAS log.
SOURCE2 | NOSOURCE2
specifies whether SAS writes secondary source statements from files included
by %INCLUDE statements to the SAS log.
SYMBOLGEN | NOSYMBOLGEN
specifies whether the results of resolving macro variable references are written
to the SAS log.
VERBOSE
specifies whether SAS writes to the batch log or to the computer monitor the
values of the system options that are specified in the configuration file.
See SAS System Options: Reference for more information about how to use these
and other SAS system options.
Operating Environment Information: See the documentation for your operating
environment for other options that affect log output.
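As a sketch of how several of these options might be combined while debugging, the following program turns on additional log detail, runs a small macro, and then returns to quieter settings. The macro SHOW_CLASS is hypothetical and exists only to produce macro trace output in the log.

```sas
/* Turn on additional log detail while debugging. */
options source source2 notes msglevel=i
        mprint mlogic symbolgen fullstimer;

/* A small hypothetical macro, used only to show the macro trace in the log. */
%macro show_class(minage);
   proc print data=sashelp.class;
      where age >= &minage;
   run;
%mend show_class;

%show_class(14)

/* Return to quieter settings afterward. */
options nomprint nomlogic nosymbolgen nofullstimer msglevel=n;
```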
MSGCASE
specifies whether to display notes, warnings, and error messages in uppercase
letters or lowercase letters.
MISSING= system option
specifies the character to be printed for missing numeric variable values.
NUMBER system option
controls whether the page number is printed on the first title line of each page of
printed output.
PAGE statement
skips to a new page in the SAS log and continues printing from there.
PAGESIZE= system option
specifies the number of lines that you can print per page of SAS output.
SKIP statement
skips a specified number of lines in the SAS log.
STIMEFMT= system option
specifies the format to use for displaying the real and CPU processing times
when the STIMER system option is set. The STIMEFMT= system option is valid
under Windows, VMS, and UNIX operating environments.
Operating Environment Information: The range of values for the FILE statement
and for SAS system options depends on your operating environment. See the SAS
documentation for your operating environment for more information.
For more information about how to use these and other SAS system options and
statements, see SAS System Options: Reference.
10
By-Group Processing in SAS Programs
• MERGE statement
• MODIFY statement
• UPDATE statement
When you create reports or summaries with SAS procedures, BY-group processing
enables you to group information in the output according to values of one or more
variables.
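For instance, a procedure report can be grouped by the values of a variable as in the following sketch. The output data set name work.class_sorted is an assumption; the input is the SASHELP.CLASS sample data set.

```sas
/* BY-group processing expects the data to be sorted (or indexed) */
/* by the BY variable.                                            */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* Produce one set of summary statistics for each value of SEX. */
proc means data=work.class_sorted mean;
   by sex;
   var height weight;
run;
```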
SAS Data Sets,” on page 509. For even more extensive examples of BY-group
processing, see Combining and Modifying SAS Data Sets: Examples.
• For information about the BY statement, see Statements in SAS DATA Step
Statements: Reference.
• For information about how to use BY-group processing with other software
products, see the SAS documentation for those products.
11
WHERE-Expression Processing
• WHERE= data set option. The following PRINT procedure includes the
WHERE= data set option:
proc print data=employees (where=(startdate > '01jan2001'd));
run;
• WHERE clause in the SQL procedure, SCL, and SAS/IML software. For
example, the following SQL procedure includes a WHERE clause to select only
the states where the murder count is greater than seven:
proc sql;
select state from crime
where murder > 7;
• SAS view (DATA step view, SAS/ACCESS view, PROC SQL view), stored with
the definition. For example, the following SQL procedure creates an SQL view
named STAT from the data file Crime and defines a WHERE expression for the
SQL view definition:
proc sql;
create view stat as
select * from crime
where murder > 7;
In some cases, you can combine the methods that you use to specify a WHERE
expression. That is, you can use a WHERE statement as follows:
• in conjunction with a WHERE= data set option
• along with the WHERE= data set option in windowing procedures, and in
conjunction with the WHERE command
• on a SAS view that has a stored WHERE expression
For example, it might be useful to combine methods when you merge data sets.
That is, you might want different criteria to apply to each data set when you create a
subset of data. However, when you combine methods to create a subset of data,
there are some restrictions. For example, in the DATA step, if a WHERE statement
and a WHERE= data set option apply to the same data set, the data set option
takes precedence. For details, see the documentation for the method that you are
using to specify a WHERE expression.
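The DATA step restriction described above can be seen in a short sketch. The output data set name is an assumption; the input is the SASHELP.CLASS sample data set.

```sas
/* Both a WHERE= data set option and a WHERE statement apply to the  */
/* same input data set. In the DATA step, the data set option takes  */
/* precedence, so the subset is based on AGE and not on SEX.         */
data work.subset;
   set sashelp.class(where=(age >= 14));
   where sex = 'F';
run;
```

To apply both conditions, one option is to place them in a single WHERE expression joined by AND.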
Note: By default, a WHERE expression does not evaluate added and modified
observations. To specify whether a WHERE expression should evaluate updates,
you can specify the WHEREUP= data set option. See the “WHEREUP= Data Set
Option” in SAS Data Set Options: Reference.
Specifying an Operand
Variable
A variable is a column in a SAS data set. Each SAS variable has attributes like
name and type (character or numeric). The variable type determines how you
specify the value for which you are searching. For example:
where score > 50;
where date >= '01jan2001'd and time >= '9:00't;
where state = 'Texas';
In a WHERE expression, you cannot use automatic variables created by the DATA
step (for example, FIRST.variable, LAST.variable, _N_, or variables created in
assignment statements).
As in other SAS expressions, the names of numeric variables can stand alone. SAS
treats numeric values of 0 or missing as false; other values as true. In the following
example, the WHERE expression returns all rows where EMPNUM is not missing
and not zero and ID is not missing and not zero:
where empnum and id;
The names of character variables can also stand alone. SAS selects observations
where the value of the character variable is not blank. For example, the following
WHERE expression returns all observations in which LastName is not blank:
where lastname;
SAS Function
A SAS function returns a value from a computation or system manipulation. Most
functions use arguments that you supply, but a few obtain their arguments from the
operating environment. To use a SAS function in a WHERE expression, enter its
name and arguments enclosed in parentheses. Some functions that you might want
to specify include:
• SUBSTR extracts a substring.
The following DATA step produces a SAS data set that contains only observations
from data set Customer in which the value of Name begins with Mac and the value of
variable City is Charleston or Atlanta:
data testmacs;
set customer;
where substr (name,1,3) = 'Mac' and
(city='Charleston' or city='Atlanta');
run;
The OF syntax is permitted in some SAS functions, but it cannot be used when
those functions are specified in a WHERE clause. In the following DATA step
example, OF can be used with RANGE because the function is not in a WHERE clause.
data abc;
x1=2;
x2=3;
x3=4;
r=range(of x1-x3);
run;
When you use the WHERE clause with RANGE and OF, an error is written to the
SAS log.
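One possible workaround, shown here as a sketch and not as the only approach, is to list the function arguments explicitly in the WHERE expression instead of using OF. The data set names are assumptions for the example.

```sas
/* Create a one-observation data set with three numeric variables. */
data work.abc;
   x1=2;
   x2=3;
   x3=4;
run;

/* List the arguments explicitly instead of using OF in the WHERE expression. */
data work.wide;
   set work.abc;
   where range(x1, x2, x3) > 1;
run;
```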
Note: The SAS functions that are used in a WHERE expression and can be
optimized by an index are the SUBSTR function and the TRIM function.
For more information about SAS functions, see SAS Functions and CALL Routines:
Reference.
Constant
A constant is a fixed value such as a number or quoted character string, that is, the
value for which you are searching. A constant is a value of a variable obtained from
the SAS data set, or values created within the WHERE expression itself. Constants
are also called literals. For example, a constant could be a flight number or the
name of a city. A constant can also be a time, date, or datetime value.
The value is either numeric or character. Note the following rules regarding whether
to use quotation marks:
• If the value is numeric, do not use quotation marks.
where price > 200;
• You can use either single or double quotation marks, but do not mix them.
Quoted values must be exact matches, including case.
• A SAS date constant must be enclosed in quotation marks. When you specify
date values, case is not important. You can use single or double quotation
marks. The following expressions are equivalent:
where birthday = '24sep1975'd;
where birthday = "24sep1975"d;
Specifying an Operator
Arithmetic Operators
Arithmetic operators enable you to perform a mathematical operation. The
arithmetic operators include the following:
Comparison Operators
Comparison operators (also called binary operators) compare a variable with a
value or with another variable. Comparison operators propose a relationship and
ask SAS to determine whether that relationship holds. For example, the following
WHERE expression accesses only those observations that have the value 78753 for
the numeric variable ZipCode:
where zipcode eq 78753;
Symbol   Mnemonic Equivalent   Definition   Example
When you do character comparisons, you can use the colon (:) modifier to compare
only a specified prefix of a character string. For example, in the following WHERE
expression, the colon modifier, used after the equal sign, tells SAS to look at only
the first character in the values for variable LastName and to select the observations
with names beginning with the letter S:
where lastname=: 'S';
Note that in the SQL procedure, the colon modifier that is used in conjunction with
an operator is not supported; you can use the LIKE operator instead.
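The two approaches can be compared in a short sketch. The data set and its values are made up for the example.

```sas
/* Sample data for the example. */
data work.names;
   input lastname $;
   datalines;
Smith
Jones
Stone
;
run;

/* DATA step: the colon modifier compares only the prefix 'S'. */
data work.s_names;
   set work.names;
   where lastname =: 'S';
run;

/* PROC SQL: the colon modifier is not supported, so use LIKE instead. */
proc sql;
   select lastname
      from work.names
      where lastname like 'S%';
quit;
```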
IN Operator
The IN operator, which is a comparison operator, searches for character and
numeric values that are equal to one from a list of values. The list of values must be
in parentheses, with each character value in quotation marks and separated by
either a comma or blank.
For example, suppose you want all sites that are in North Carolina or Texas. You
could specify:
where state = 'NC' or state = 'TX';
However, it is easier to use the IN operator, which selects any state in the list:
where state in ('NC','TX');
In addition, you can use the NOT logical operator to exclude a list.
where state not in ('CA', 'TN', 'MA');
integers, and M, N, and all the integers between M and N are included in the range.
For example, the following statements are equivalent.
• y = x in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
• y = x in (1:10);
Note that a fully bounded range condition such as 500 <= empnum <= 1000 is equivalent to the following:
where empnum >= 500 and empnum <= 1000;
You can combine the NOT logical operator with a fully bounded range condition to
select observations that fall outside the range. Note that parentheses are required:
where not (500 <= empnum <= 1000);
BETWEEN-AND Operator
The BETWEEN-AND operator is also considered a fully bounded range condition
that selects observations in which the value of a variable falls within an inclusive
range of values.
You can specify the limits of the range as constants or expressions. Any range that
you specify is an inclusive range, so that a value equal to one of the limits of the
range is within the range. The general syntax for using BETWEEN-AND is as
follows:
WHERE variable BETWEEN value AND value;
For example:
where empnum between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
You can combine the NOT logical operator with the BETWEEN-AND operator to
select observations that fall outside the range:
where empnum not between 500 and 1000;
Note: The BETWEEN-AND operator and a fully bounded range condition produce
the same results. That is, the following WHERE expressions are equivalent:
where 500 <= empnum <= 1000;
where empnum between 500 and 1000;
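The equivalence can be checked with a sketch like the following, which subsets the SASHELP.CLASS sample data set both ways and compares the results. The output data set names and the limits 60 and 65 are assumptions for the example.

```sas
/* Subset with the BETWEEN-AND operator (inclusive limits). */
data work.in_range;
   set sashelp.class;
   where height between 60 and 65;
run;

/* Subset with a fully bounded range condition. */
data work.in_range2;
   set sashelp.class;
   where 60 <= height <= 65;
run;

/* Compare the two subsets; no differences should be reported. */
proc compare base=work.in_range compare=work.in_range2;
run;
```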
CONTAINS Operator
The most common usage of the CONTAINS (?) operator is to select observations by
searching for a specified set of characters within the values of a character variable.
The position of the string within the variable's values does not matter. However, the
operator is case sensitive when making comparisons.
The following examples select observations having the values Mobay and Brisbayne
for the variable Company, but they do not select observations containing Bayview:
where company contains 'bay';
where company ? 'bay';
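Because the operator is case sensitive, one common workaround (an addition here, not part of the text above) is to normalize case with the UPCASE function before the comparison. The data set in this sketch is made up from the company names in the example.

```sas
/* Sample data that uses the company names from the example above. */
data work.companies;
   input company $12.;
   datalines;
Mobay
Brisbayne
Bayview
;
run;

/* Case-sensitive search: Bayview is not selected. */
proc print data=work.companies;
   where company contains 'bay';
run;

/* Search on the uppercased value: all three companies are selected. */
proc print data=work.companies;
   where upcase(company) contains 'BAY';
run;
```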
You can combine the NOT logical operator with the CONTAINS operator to select
observations that do not contain a specified string:
where company not contains 'bay';
You can also use the CONTAINS operator with two variables, that is, to determine
whether one variable is contained in another. When you specify two variables, keep
in mind the possibility of trailing spaces, which can be resolved using the TRIM
function.
proc sql;
select *
from table1 as a, table2 as b
where a.fullname contains trim(b.lastname) and
a.fullname contains trim(b.firstname);
In addition, the TRIM function is helpful when you search on a macro variable.
proc print;
where fullname contains trim("&lname");
run;
And the following is equivalent for numeric data. This statement differentiates
missing values with special missing value characters:
where idnum <= .Z;
You can combine the NOT logical operator with IS NULL or IS MISSING to select
nonmissing values, as follows:
where salary is not missing;
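A short sketch of both forms follows. The data set and its values are made up for the example.

```sas
/* Sample data with one missing salary. */
data work.staff;
   input name $ salary;
   datalines;
Ann 52000
Bob .
Cy 48000
;
run;

/* Select only the observations with a missing salary. */
proc print data=work.staff;
   where salary is missing;
run;

/* Select only the observations with a nonmissing salary. */
proc print data=work.staff;
   where salary is not missing;
run;
```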
LIKE Operator
The LIKE operator selects observations by comparing the values of a character
variable to a specified pattern, which is referred to as pattern matching. The LIKE
operator is case sensitive. There are two special characters available for specifying
a pattern:
underscore (_)
matches just one character in the value for each underscore character. You can
specify more than one consecutive underscore character in a pattern, and you
can specify a percent sign and an underscore in the same pattern. For example,
you can use different forms of the LIKE operator to select character values from
this list of first names:
• Diana
• Diane
• Dianna
• Dianthus
• Dyan
The following table shows which of these names is selected by using various forms
of the LIKE operator:
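The patterns can be tried against the list of names with a sketch such as the following. The data set name work.firstnames is an assumption; the variable name FRSTNAME matches the examples in this section.

```sas
/* Load the list of first names from the text above. */
data work.firstnames;
   input frstname $12.;
   datalines;
Diana
Diane
Dianna
Dianthus
Dyan
;
run;

/* D, any one character, then an: matches four-character values only. */
title 'LIKE pattern D_an';
proc print data=work.firstnames;
   where frstname like 'D_an';
run;

/* The same prefix followed by any number of characters. */
title 'LIKE pattern D_an%';
proc print data=work.firstnames;
   where frstname like 'D_an%';
run;
title;
```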
You can use a SAS character expression to specify a pattern, but you cannot use a
SAS character expression that uses a SAS function.
You can combine the NOT logical operator with LIKE to select values that do not
have the specified pattern, such as the following:
where frstname not like 'D_an%';
Because the % and _ characters have special meaning for the LIKE operator, you
must use an escape character when searching for the % and _ characters in values.
An escape character is a single character that, in a sequence of characters, signifies
that what follows takes an alternative meaning. For the LIKE operator, an escape
character signifies to search for literal instances of the % and _ characters in the
variable's values instead of performing the special-character function.
For example, if the variable X contains the values abc, a_b, and axb, the following
LIKE operator with an escape character selects only the value a_b. The escape
character (/) specifies that the pattern searches for a literal '_' that is surrounded by
the characters a and b. The escape character (/) is not part of the search.
where x like 'a/_b' escape '/';
Without an escape character, the following LIKE operator would select the values
a_b and axb. The special character underscore in the search pattern matches any
single character, including the value with the underscore:
where x like 'a_b';
Sounds-like Operator
The sounds-like (=*) operator selects observations that contain a spelling variation
of a specified word or words. The operator uses the Soundex algorithm to compare
the variable value and the operand. For more information, see the SOUNDEX
function in SAS Functions and CALL Routines: Reference.
Note: The Soundex algorithm is English-biased and is less useful for
languages other than English.
Although the sounds-like operator is useful, it does not always select all possible
values. For example, consider that you want to select observations from the
following list of names that sound like Smith:
• Schmitt
• Smith
• Smithson
• Smitt
• Smythe
The following WHERE expression selects all the names from this list except
Smithson:
where lastname=* 'Smith';
You can combine the NOT logical operator with the sounds-like operator to select
values that do not contain a spelling variation of a specified word or words, such as:
where lastname not =* 'Smith';
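To try the operator on the list of names above, a sketch like the following can be used. The data set name work.people is an assumption.

```sas
/* Load the list of last names from the text above. */
data work.people;
   input lastname $12.;
   datalines;
Schmitt
Smith
Smithson
Smitt
Smythe
;
run;

/* Select the names that the Soundex algorithm treats as sounding like Smith. */
proc print data=work.people;
   where lastname =* 'Smith';
run;
```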
SAME-AND Operator
Use the SAME-AND operator to add more conditions to an existing WHERE
expression later in the program without retyping the original conditions. This
capability is useful with the following:
• interactive SAS procedures
Use the SAME-AND operator when you already have a WHERE expression defined
and you want to insert additional conditions. The SAME-AND operator has the
following form:
• where-expression-1;
SAS selects observations that satisfy the conditions after the SAME-AND operator
in addition to any previously defined conditions. SAS treats all of the existing
conditions as if they were conditions separated by AND operators in a single
WHERE expression.
The following example shows how to use the SAME-AND operator within RUN
groups in the GPLOT procedure. The SAS data set YEARS has three variables and
contains quarterly data for the 2009–2011 period:
proc gplot data=years;
plot unit*quar=year;
run;
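The GPLOT example above depends on SAS/GRAPH software and on the YEARS data set. As a simpler sketch of the same operator, the following PRINT procedure step uses the SASHELP.CLASS sample data set; the conditions are arbitrary choices for the example.

```sas
/* The second WHERE statement adds a condition instead of replacing */
/* the first one, so both conditions must be satisfied.             */
proc print data=sashelp.class;
   where age > 12;
   where same and sex = 'F';
run;
```

Without SAME AND, the second WHERE statement would replace the first.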
For example, if A is less than B, then the following would return the value of A:
where x = (a min b);
Note: The symbol representation >< is not supported, and <> is interpreted as “not
equal to.”
Concatenation Operator
The concatenation operator concatenates character values. You indicate the
concatenation operator as follows:
n || (two OR symbols)
For example:
where name = 'John'||'Smith';
Prefix Operators
The plus sign (+) and minus sign (-) can be either prefix operators or arithmetic
operators. They are prefix operators when they appear at the beginning of an
expression or immediately preceding an open parenthesis. A prefix operator is
applied to the variable, constant, SAS function, or parenthetic expression.
where z = -(x + y);
Syntax
You can combine or modify WHERE expressions by using the logical operators
(also called Boolean operators) AND, OR, and NOT. The basic syntax of a
compound WHERE expression is as follows:
WHERE where-expression-1 AND | OR | NOT where-expression-n
AND combines two conditions by finding observations that satisfy both conditions.
For example:
where skill eq 'java' and years eq 4;
NOT modifies a condition by finding the complement of the specified criteria. You
can use the NOT logical operator in combination with any SAS and WHERE
expression operator. And you can combine the NOT operator with AND and OR. For
example:
where skill not eq 'java' or years not eq 4;
The logical operators and their equivalent symbols are shown in the following table:
Symbol           Mnemonic Equivalent
&                AND
! or | or ¦      OR
^ or ~ or ¬      NOT
The result, however, includes all sites that license SAS/GRAPH software along with
the Canadian sites that license SAS/STAT software. To obtain the correct results,
you can use parentheses, which causes SAS to evaluate the comparisons within the
parentheses first, providing a list of sites with either product licenses, then the result
is used for the remaining condition:
where (product='GRAPH' or product='STAT') and country='Canada';
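The effect of the parentheses can be checked with a small test data set. In this sketch, the data set Work.Sites and its four rows are hypothetical, and the first WHERE expression is an assumed form of the expression without parentheses.

```sas
/* Hypothetical site list for illustration */
data work.sites;
   length product $ 5 country $ 6;
   input product $ country $;
   datalines;
GRAPH Canada
GRAPH Mexico
STAT Canada
STAT Mexico
;
run;

/* AND is evaluated before OR, so every GRAPH site is selected */
proc print data=work.sites;
   where product='GRAPH' or product='STAT' and country='Canada';
run;

/* The parentheses force the OR comparison to be evaluated first */
proc print data=work.sites;
   where (product='GRAPH' or product='STAT') and country='Canada';
run;
```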
Guideline                        Efficient                      Inefficient
Avoid using the LIKE operator    where country like 'A%INA';    where country like '%INA';
that begins with % or _.
Avoid using arithmetic           where salary > 48000;          where salary > 12*4000;
expressions.
When used with a WHERE expression, the values specified for OBS= and
FIRSTOBS= are not the physical observation number in the data set, but a logical
number in the subset. For example, obs=3 does not mean the third observation
number in the data set. Instead, it means the third observation in the subset of data
selected by the WHERE expression.
Applying OBS= and FIRSTOBS= processing to a subset of data is supported for the
WHERE statement, WHERE= data set option, and WHERE clause in the SQL
procedure.
If you are processing a SAS view that is a view of another view (nested views),
applying OBS= and FIRSTOBS= to a subset of data could produce unexpected
results. For nested views, OBS= and FIRSTOBS= processing is applied to each
SAS view, starting with the root (lowest-level) view, and then filtering observations
for each SAS view. The result could be that no observations meet the subset and
segment criteria. See “Processing a SAS View” on page 212.
1 The DATA step creates a data set named Work.A containing 100 observations
and two variables: I and X.
2 The WHERE expression I > 90 tells SAS to process only the observations that
meet the specified condition, which results in the subset of observations 91
through 100.
3 The FIRSTOBS= data set option tells SAS to begin processing with the 2nd
observation in the subset of data, which is observation 92.
4 The OBS= data set option tells SAS to stop processing when it reaches the 4th
observation in the subset of data, which is observation 94.
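The numbered steps above describe a program along the following lines. This is a sketch reconstructed from that description, so details such as the exact layout are assumptions.

```sas
/* Step 1: create Work.A with 100 observations and variables I and X */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

/* Steps 2-4: FIRSTOBS= and OBS= count within the WHERE subset,  */
/* not within the physical observations of Work.A                */
proc print data=work.a (firstobs=2 obs=4);
   where i > 90;
run;
```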
data a;                     /* 1 */
   do I=1 to 100;
      X=I + 1;
      output;
   end;
run;
data viewa/view=viewa;      /* 2 */
   set a;
   Z = X+1;
run;
data viewb/view=viewb;      /* 3 */
   set viewa;
   where I > 90;
run;
options obs=3;              /* 4 */
proc print data=viewb;      /* 5 */
run;
1 The first DATA step creates a data set named Work.A, which contains 100
observations and two variables: I and X.
2 The second DATA step creates a SAS view named Work.ViewA containing 100
observations and three variables: I, X (from data set Work.A), and Z (assigned in
this DATA step).
3 The third DATA step creates a SAS view named Work.ViewB and subsets the
data with a WHERE statement, which results in the view accessing ten
observations.
4 The OBS= system option applies to the previous SET ViewA statement, which
tells SAS to stop processing when it reaches the 3rd observation in the subset of
data being processed.
5 When SAS processes the PRINT procedure, the following occurs:
1 First, SAS applies obs=3 to Work.ViewA, which stops processing at the 3rd
observation.
2 Next, SAS applies the condition I > 90 to the three observations being
processed. None of the observations meet the criteria.
3 PROC PRINT results in no observations.
To prevent the potential of unexpected results, you can specify obs=max when
creating Work.ViewA to force SAS to read all the observations in the root (lowest-
level) view:
data viewa/view=viewa;
set a (obs=max);
Z = X+1;
run;
Task                                                    Method
Make the selection at some point during a DATA step     subsetting IF
rather than at the beginning
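A subsetting IF statement is the natural choice when the selection depends on a value that is computed inside the DATA step. The following sketch uses hypothetical data set and variable names.

```sas
/* Hypothetical input data */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

data work.subset;
   set work.a;
   z = x * 2;     /* Z is created in this step; it is not in Work.A   */
   if z > 190;    /* subsetting IF: applied after Z has been computed */
run;
```

A WHERE statement could not be used for this test, because Z does not exist in the input data set.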
12
Optimizing System Performance
these statistics by using SAS system options that can help you to measure your
job's initial performance and to determine how to improve performance.
system performance
is measured by the overall amount of I/O, memory, and CPU time that your
system uses to process SAS programs. By using the techniques discussed in
the following sections, you can reduce or reallocate your usage of these three
critical resources to improve system performance. You might not be able to take
advantage of every technique for every situation, but you can choose the ones
that are most appropriate for a particular situation.
The STIMER option reports a subset of the FULLSTIMER statistics. The following
example shows STIMER output in the SAS log.
Example Code 12.2 Sample Results of Using the STIMER Option in a UNIX Operating
Environment
on your operating environment, so statistics that you see might differ from the ones
shown.
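As a minimal sketch, the following program turns on the FULLSTIMER option before a step so that the full set of statistics is written to the SAS log. The DATA step is only a placeholder workload, and the statistics reported depend on your operating environment.

```sas
options fullstimer;

/* Placeholder workload to measure */
data _null_;
   do i=1 to 1000000;
      x=sqrt(i);
   end;
run;
```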
You can also modify your programs to reduce the number of times that they
process the data internally by:
n creating SAS data sets
n using indexes
You can reduce the number of data accesses by processing more data each time a
device is accessed by:
n setting the ALIGNSASIOFILES, BUFNO=, BUFSIZE=, CATCACHE=,
COMPRESS=, DATAPAGESIZE=, STRIPESIZE=, UBUFNO=, and UBUFSIZE=
system options
n using the SASFILE global statement to open a SAS data set and allocate
enough buffers to hold the entire data set in memory
When using SAS DATA step views, you can improve performance by:
n specifying the VBUFSIZE= system option
Note: Sometimes you might be able to use more than one method, making your
SAS job even more efficient.
However, you can get the same output from the PROC PRINT step without creating
a data set if you use a WHERE statement in the PRINT procedure, as in the
following example:
proc print data=auto.survey;
where seatbelt='yes';
run;
The WHERE statement can save resources by reducing the number of times that
you process the data. In this example, you might be able to use less time and
memory by eliminating the DATA step. Also, you use less I/O because there is no
intermediate data set. Note that you cannot use a WHERE statement in a DATA
step that reads raw data.
The extent of savings that you can achieve depends on many factors, including the
size of the data set. It is recommended that you test your programs to determine the
most efficient solution. For more information, see “Deciding Whether to Use a
WHERE Expression or a Subsetting IF Statement” on page 214.
Using Indexes
An index is an optional file that you can create for a SAS data file to provide direct
access to specific observations. The index stores values in ascending value order
for a specific variable or variables and includes information as to the location of
those values within observations in the data file. In other words, an index enables
you to locate an observation by the value of the indexed variable.
Without an index, SAS accesses observations sequentially in the order in which
they are stored in the data file. With an index, SAS accesses the observation
directly. Therefore, by creating and using an index, you can access an observation
faster.
In general, SAS can use an index to improve performance in these situations:
n For WHERE processing, an index can provide faster and more efficient access
to a subset of data.
n For BY processing, an index returns observations in the index order, which is in
ascending value order, without using the SORT procedure.
n For the SET and MODIFY statements, the KEY= option enables you to specify
an index in a DATA step to retrieve particular observations in a data file.
Note: An index exists to improve performance. However, an index conserves some
resources at the expense of others. Therefore, you must consider costs associated
with creating, using, and maintaining an index. See “Understanding SAS Indexes”
on page 692 for more information about indexes and deciding whether to create
one.
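The following sketch creates a simple index and then uses it for WHERE processing. The data set Work.Survey and the variable ID are hypothetical. The MSGLEVEL=I system option asks SAS to write informational messages about index usage to the log.

```sas
/* Hypothetical data file with a unique key variable */
data work.survey;
   length seatbelt $ 3;
   do id=1 to 1000;
      if mod(id,4)=0 then seatbelt='no';
      else seatbelt='yes';
      output;
   end;
run;

/* Create a simple index on ID */
proc datasets library=work nolist;
   modify survey;
   index create id / unique;
quit;

/* Report index usage in the log */
options msglevel=i;

proc print data=work.survey;
   where id=500;
run;
```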
The following statement does not explicitly specify an engine. In the output, notice
the Note about mixed engine types that is generated:
/* Engine not specified. */
Example Code 12.3 SAS Log Output from the LIBNAME Statement
NOTE: Directory for library FRUITS contains files of mixed engine types.
NOTE: Libref FRUITS was successfully assigned as follows:
Engine: V9
Physical Name: SAS-library
z/OS Specifics: In the z/OS operating environment, you do not need to specify an
engine for certain types of libraries.
See Chapter 37, “SAS Engines,” on page 803 for more information about SAS
engines.
set to optimize the sequential access method. To improve performance for direct
(random) access, you should change the value for BUFSIZE=.
Whether you use your operating environment's default value or specify a value,
the engine always writes complete pages regardless of how full or empty those
pages are.
If you know that the total amount of data is going to be small, you can set a small
page size with the BUFSIZE= option, so that the total data set size remains
small and you minimize the amount of wasted space on a page. In contrast, if
you know that you are going to have many observations in a data set, you
should optimize BUFSIZE= so that as little overhead as possible is needed. Note
that each page requires some additional overhead.
Large data sets that are accessed sequentially benefit from larger page sizes
because sequential access reduces the number of system calls that are required
to read the data set. Note that because observations cannot span pages,
typically there is unused space on a page.
“Calculating Data Set Size” on page 229 discusses how to estimate data set
size.
For more information, see “BUFSIZE= System Option” in SAS System Options:
Reference and the SAS documentation for your operating environment.
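As a sketch, the page size can also be set for one data set with the BUFSIZE= data set option and then checked with the CONTENTS procedure. The 64K value and the data set name are illustrative only and are not a recommendation.

```sas
/* Illustrative page size; choose a value suited to your data */
data work.big (bufsize=64k);
   do id=1 to 100000;
      x=ranuni(1);
      output;
   end;
run;

/* PROC CONTENTS reports the page size of the data set */
proc contents data=work.big;
run;
```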
CATCACHE=
SAS uses this option to determine the number of SAS catalogs to keep open at
one time. Increasing its value can use more memory, although this might be
warranted if your application uses catalogs that are needed relatively soon by
other applications. (The catalogs closed by the first application are cached and
can be accessed more efficiently by subsequent applications.)
For more information, see “CATCACHE= System Option” in SAS System
Options: Reference and the SAS documentation for your operating environment.
COMPRESS=
One further technique that can reduce I/O processing is to store your data as
compressed data sets by using the COMPRESS= data set option. However,
storing your data this way means that more CPU time is needed to decompress
the observations as they are made available to SAS. But if your concern is I/O
and not CPU usage, compressing your data might improve the I/O performance
of your application.
For more information, see “COMPRESS= System Option” in SAS System
Options: Reference.
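A minimal sketch of compressing one data set with the COMPRESS= data set option follows. The data set name and the repeated character value are made up for illustration. How much a file shrinks depends on the data.

```sas
data work.notes (compress=yes);
   length comment $ 200;
   do id=1 to 10000;
      comment='no remarks';   /* mostly blank values tend to compress well */
      output;
   end;
run;
```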
DATAPAGESIZE=
Beginning with SAS 9.4, the optimal buffer page size is increased to improve I/O
performance. The increase in page size might increase the size of the data set
or utility file. If you find that the current optimization processes are not ideal for
your SAS session, you can use DATAPAGESIZE=COMPAT93 to use the
optimization processes that were used prior to SAS 9.4.
For more information, see “DATAPAGESIZE= System Option” in SAS System
Options: Reference.
STRIPESIZE=
When data is stored in a RAID (Redundant Array of Independent Disks) device,
you can use the STRIPESIZE= system option to set the I/O buffer size for a
directory to be the size of a RAID stripe. SAS data sets or utility files that are
created in the directory have a page size that matches the RAID stripe size.
Using this option can improve the performance of individual disks.
If your SAS program consists of steps that read a SAS data set multiple times and
you have an adequate amount of memory so that the entire file can be held in real
memory, the program should benefit from using the SASFILE statement. Also,
SASFILE is especially useful as part of a program that starts a SAS server such as
a SAS/SHARE server. For more information about the SASFILE global statement,
see the SAS DATA Step Statements: Reference.
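The following sketch shows the general pattern: load the data set once, read it in several steps, and then release the buffers. The data set Work.Survey is hypothetical, and the approach assumes that the whole file fits in memory.

```sas
/* Hypothetical data set that is read several times */
data work.survey;
   do id=1 to 1000;
      score=mod(id,7);
      output;
   end;
run;

/* Open the data set and load it into memory */
sasfile work.survey load;

proc means data=work.survey;
   var score;
run;

proc freq data=work.survey;
   tables score;
run;

/* Close the data set and free the buffers */
sasfile work.survey close;
```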
System Options
If memory is a critical resource, several techniques can reduce your dependence on
increased memory. However, most of them also increase I/O processing or CPU
usage.
You can use the MEMSIZE= system option to increase the amount of memory
available to SAS and therefore decrease processing time. By increasing memory,
you reduce processing time because the amount of time spent on paging, or
reading pages of data into memory, is reduced.
The SORTSIZE= and SUMSIZE= system options enable you to limit the amount of
memory that is available to sorting and summarization procedures.
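For example, the limits might be set in an OPTIONS statement before the steps that sort or summarize data. The values and the data set in this sketch are illustrative only.

```sas
/* Illustrative limits; suitable values depend on your system */
options sortsize=256M sumsize=128M;

/* Hypothetical data to sort and summarize */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

proc sort data=work.a out=work.sorted;
   by descending x;
run;

proc means data=work.sorted;
   var x;
run;
```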
You can also make tradeoffs between memory and other resources, as discussed in
“Reducing CPU Time By Modifying Program Compilation Optimization” on page
229. To use the I/O subsystem most effectively, you must use more and larger
buffers. However, these buffers share space with the other memory demands of
your SAS session.
13
Support for Parallel Processing
Overview  231
What Is Threading Technology in SAS?  232
How Is Threading Controlled in SAS?  233
Threading in Base SAS  233
SAS/ACCESS Engines  236
SAS Scalable Performance Data Server  237
SAS Intelligence Platform  237
SAS High-Performance Analytics Portfolio of Products  238
SAS Grid Manager  239
SAS In-Database Technology  240
SAS In-Memory Analytics Technology  240
SAS High-Performance Analytics Product Integration  242
SAS Viya  244
Overview
SAS introduced threading technology starting in SAS 9 with the introduction of
several Base SAS procedures that had been enhanced to execute, in part, in
multiple threads. SAS has continued to develop and enhance products and
components that take advantage of the threaded processing capabilities provided
by proprietary internal subsystems. Threading is available on a variety of platforms
from a local desktop with multiple CPUs to high-performance platform servers.
These high-performance servers include large multi-core symmetric multi-processor
(SMP) systems and massively parallel processing (MPP) appliances typically
configured as a distributed cluster. Many SAS components that execute on these
platforms take advantage of threading technology.
With SAS 9.4M5, when you license SAS Viya, you can access SAS Cloud Analytic
Services (CAS), a distributed server environment that supports multithreaded, in-
memory processing. See “What is SAS Cloud Analytic Services?” on page 433 for
more information.
Previous releases of Base SAS 9.4 support programs written in the SAS DS2
programming language or the SAS Federated SQL language. These languages can
take advantage of threading. Many other SAS products also use threading
technology. For example, the SAS High-Performance Analytics procedures, SAS
Stored Processes, and SAS Embedded Process either execute or generate code
that executes in high-performance distributed computing environments.
n REPORT
n SORT
n SUMMARY
n TABULATE
n SQL
For details, see “Threaded Processing for Base SAS Procedures” in Base SAS
Procedures Guide. For details of the thread-enabled SQL procedure, see the
SAS SQL Procedure User’s Guide. For details about SAS system options, see the SAS
System Options: Reference.
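As a sketch, threading for a thread-enabled procedure such as SORT can be controlled with the THREADS and CPUCOUNT= system options. The CPUCOUNT= value and the data set are illustrative.

```sas
/* Hypothetical data to sort */
data work.a;
   do i=1 to 100000;
      x=ranuni(1);
      output;
   end;
run;

/* Allow threaded processing and suggest up to four CPUs */
options threads cpucount=4;

proc sort data=work.a out=work.sorted;
   by x;
run;

/* Turn threaded processing off again for comparison */
options nothreads;

proc sort data=work.a out=work.sorted2;
   by x;
run;
```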
Some procedures in SAS/STAT software are also thread-enabled and most of
them can run in either SMP or MPP mode. In SMP mode, NOTHREADS and
CPUCOUNT are honored. In MPP mode, the PERFORMANCE statement
provides the options to control threading. These are the thread-enabled
SAS/STAT procedures:
n ADAPTIVEREG
n FMM
n GLM
n GLMSELECT
n LOESS
n MIXED
n QUANTLIFE
n QUANTREG
n QUANTSELECT
n ROBUSTREG
See the SAS/STAT Procedures Guide for details for each procedure.
SAS Scalable Performance Data Engine
The SAS Scalable Performance Data Engine, which is included in Base SAS, is
engineered to exploit SMP hardware capabilities. The SAS Scalable
Performance Data Engine uses partitioned data sets that are optimized for
reading data in threads. The partition size can be configured with the SAS
Scalable Performance Data Engine PARTSIZE option. THREADNUM and
SPDEMAXTHREADS control threading for optimum threaded reads. The Base
SAS NOTHREADS and CPUCOUNT system options have no effect on SPD
Engine threaded reads. They remain in effect for the SAS thread-enabled
procedures executing on the SPD Engine data set. SPD Engine indexes are also
created in threads in parallel automatically without regard to NOTHREADS, if
set. You can use SPDEINDEXSORTSIZE= to optimize threaded index creation.
The SPD Engine is described in the SAS Scalable Performance Data Engine:
Reference.
SAS FedSQL Language
SAS FedSQL is a SAS proprietary SQL implementation based on the ANSI
SQL:1999 standard. It provides support for ANSI SQL data types and other ANSI
compliance features. The core strength of SAS FedSQL is its ability to execute
federated queries across a heterogeneous database environment and return a
single result set. FedSQL queries are automatically optimized with multi-
threaded algorithms in order to resolve large-scale operations. In addition,
FedSQL can execute outside of a SAS session, for example in the SAS
Federation Server and SAS Scalable Performance Data Server environments.
The NOTHREADS and CPUCOUNT options have no effect on FedSQL
processing.
The FedSQL procedure, which submits FedSQL programs for execution, is
included. See the SAS FedSQL Language Reference for complete information.
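A minimal sketch of submitting a FedSQL query from a SAS session follows. The table Work.A and its columns are hypothetical.

```sas
/* Hypothetical table to query */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

proc fedsql;
   select i, x
      from work.a
      where i > 90;
quit;
```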
SAS DS2 Programming Language
DS2 is a SAS proprietary programming language that is appropriate for
advanced data manipulation and data modeling applications. DS2 is included
with Base SAS and intersects with the SAS DATA step but also supports
additional data types, ANSI SQL types, programming structure elements, user-
defined methods, and packages. The DS2 SET statement accepts embedded
FedSQL syntax and the runtime-generated queries can exchange data
interactively between DS2 and any supported database. This allows SQL
preprocessing of input tables which effectively combines the power of the two
languages.
DS2 programs are thread-enabled by using the THREAD statement on a
program coded for parallel execution. The NOTHREADS and CPUCOUNT
options have no effect. See the SAS 9.4 DS2 Language Reference for details
about whether your DATA step programs would benefit from being converted to
DS2.
The DS2 procedure, which submits thread-enabled DS2 programs to the SAS
Embedded Process for execution, is also included. A high-performance version of
the DS2 procedure, PROC HPDS2, submits DS2 language statements to the
separately licensed High-Performance Analytics Server for processing. See the
SAS High-Performance Analytics Server Usage Guide for documentation on this
and other high-performance versions of certain SAS procedures.
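The following is a sketch of a thread-enabled DS2 program that runs in a SAS session. The thread name add_z, the table names, and the thread count are assumptions chosen for illustration.

```sas
/* Hypothetical input table */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

proc ds2;
   /* The thread program reads the input table and derives Z */
   thread add_z / overwrite=yes;
      dcl double z;
      method run();
         set work.a;
         z = x + 1;
      end;
   endthread;
   run;

   /* The data program collects rows from four instances of the thread */
   data work.a_z (overwrite=yes);
      dcl thread add_z t;
      method run();
         set from t threads=4;
      end;
   enddata;
   run;
quit;
```

Because the rows are returned by several threads, their order in the output table is not guaranteed to match the input order.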
DS2 can execute outside of a SAS session. For example:
n SAS Federation Server
SAS Logging
The SAS Logging Facility ignores the NOTHREADS and CPUCOUNT options. It
handles all incoming logging events in threads. The client identity that is
associated with the current thread or task is reported in the log. The logging
facility supports many SAS products and components, but it is included with
Base SAS. See the SAS Logging: Configuration and Programming Reference.
SAS Code Analyzer
The SAS Code Analyzer (SCAPROC procedure) runs an existing SAS program
(executing the program as usual) while generating metadata about the SAS job
that is recorded as comments. PROC SCAPROC captures information about the
job step, I/O information such as file dependencies, and macro symbol usage
information from a running SAS job. The output is a SAS program containing
comments with the dependencies described in the comments. An application can
read this text and create SAS metadata or determine a process flow based on
these dependencies. For example, developers for SAS Data Integration Studio
can use the information emitted by the SAS Code Analyzer to reverse engineer
legacy SAS jobs. It can also be used with SAS Grid Manager. When the saved
job is run on the grid, SAS Grid Manager automatically assigns the identified
subtasks to a grid node. For more information, see the SCAPROC procedure
documentation in the Base SAS Procedures Guide.
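As a sketch, recording is started with a RECORD statement before the job and ended with a WRITE statement after it. The file name record.txt and the job steps are hypothetical.

```sas
/* Start recording to a hypothetical output file */
proc scaproc;
   record 'record.txt';
run;

/* The job to be analyzed */
data work.a;
   do i=1 to 100;
      x=i + 1;
      output;
   end;
run;

proc means data=work.a;
   var x;
run;

/* Stop recording and write the collected comments to the file */
proc scaproc;
   write;
run;
```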
SAS/ACCESS Engines
SAS/ACCESS engines are LIBNAME engines that provide Read, Write, and Update
access to more than 60 relational and nonrelational databases, PC files, data
warehouse appliances, and distributed file systems. These engines are not part of
Base SAS but they depend on Base SAS. They are licensed separately or are
included in many product bundles such as SAS BI Server or SAS Activity-Based
Management. Many bundles offer the customer a choice of two out of the many
SAS/ACCESS engines available.
SAS/ACCESS engines enable SAS programs to connect to a DBMS as if it were a
SAS data set. This takes advantage of performance-related DBMS features and
benefits including bulk load support, temporary table support, and native SQL
support with Explicit Pass-Through. If the DBMS is a parallel server, the engine
accesses the DBMS data in parallel by using multiple threads to connect to the
DBMS server. If your SAS program is executing a thread-enabled SAS procedure
with these SAS/ACCESS engines, even greater gains in performance are likely.
In SAS/ACCESS, threaded reads partition the result set across multiple threads.
Unlike threaded processing in Base SAS procedures, threaded reads in
SAS/ACCESS are not dependent on the number of processors on a machine.
Instead, the result set is retrieved on multiple connections between SAS and the
DBMS. SAS causes the DBMS to partition the result set by appending a WHERE
clause to the SQL statement. When this happens, a single SQL statement becomes
multiple SQL statements, one on each thread. The DBMS also reads the partitions,
one per thread.
The amount of scalability that is provided with the SAS/ACCESS engines depends
on the efficiency of parallelization implemented in the DBMS itself. However,
SAS/ACCESS engines have options available in the LIBNAME statement that
enable tuning of the threaded implementation within the SAS/ACCESS engines.
The options that control threaded reads in SAS/ACCESS are DBSLICE,
n SAS Business Intelligence Server and SAS Data Integration Server technologies
Each server is initiated with a pool of active threads. These threads are controlled
by the server and are used by server processes (for example, handling incoming
requests). If the NOTHREADS and CPUCOUNT options are specified, they are
ignored, except during the execution of submitted code that includes a SAS
procedure that honors these options.
For the SAS Metadata Server, thread usage is controlled by default settings for the
object server parameters (THREADSMAX and THREADSMIN) and for the metadata
server configuration option, MAXACTIVETHREADS. Administrators can override
these settings in order to fine-tune performance. See the SAS Intelligence Platform:
System Administrator’s Guide for details and examples. The THREADSMAX and
THREADSMIN object server parameters are rarely used for servers other than the
SAS Metadata Server.
In the intelligence platform middle tier (which is an infrastructure for web applications),
incoming requests are processed on threads. These threads are defined using the
job execution service. The threads are not constrained by the NOTHREADS or
CPUCOUNT options. Both the number of job queue threads and number of job
execution threads can be specified. Refer to the SAS Intelligence Platform: Middle-
Tier Administration Guide.
SAS MP CONNECT is a part of SAS/CONNECT software that is bundled with the
intelligence platform. It supports parallel processing by establishing a connection
between multiple SAS sessions and enabling each of the sessions to
asynchronously execute tasks in parallel. By establishing connections to processes
on the same local computer, the application can use network resources to process
in parallel and coordinate all the results into the client SAS session. Many SAS
processes use multiple processors on an SMP computer, but they can also be
executed on multiple remote single or multiprocessor computers on a network.
Threads are always assumed to be available.
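The following sketch shows the general shape of an MP CONNECT program that starts two sessions on the local computer and runs a task in each one asynchronously. It assumes that SAS/CONNECT is licensed. The session names task1 and task2 and the DATA steps are hypothetical.

```sas
/* Start each session with the same command as the client session */
options autosignon sascmd="!sascmd";

/* WAIT=NO returns control to the client so that the tasks overlap */
rsubmit process=task1 wait=no;
   data work.part1;
      do i=1 to 50;
         x=i + 1;
         output;
      end;
   run;
endrsubmit;

rsubmit process=task2 wait=no;
   data work.part2;
      do i=51 to 100;
         x=i + 1;
         output;
      end;
   run;
endrsubmit;

/* Wait for both tasks to finish, and then end the sessions */
waitfor _all_ task1 task2;
signoff _all_;
```

In this sketch each task writes to the Work library of its own session, so results that must survive the sign-off would need to be written to a shared permanent library instead.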
Some SAS High-Performance Analytics products can execute on the SAS
Intelligence Platform if it is configured as an MPP environment. For example, SAS
Grid Manager (discussed in the next section) handles workload management for
SAS applications that execute in SMP configurations. It can also manage
applications that are coded for parallel execution and distributed across the nodes
of the SAS High-Performance Analytics Server.
SAS Visual Analytics is a web-based suite of high-performance analytics
applications that executes on the SAS Intelligence Platform if it is configured as an
MPP environment. This execution environment for SAS Visual Analytics is
documented in the SAS Intelligence Platform: Middle-Tier Administration Guide.
(SAS Visual Analytics is discussed further in the next section. See “SAS High-
Performance Analytics Portfolio of Products” on page 238.)
executing on the SAS Intelligence Platform. The SAS Grid Manager can be used to
manage the workload of SAS jobs on the SAS High-Performance Analytic Server
running on a DBMS appliance such as EMC Greenplum or Teradata. And SAS
Visual Analytics can be used to explore data that is consumed by SAS Enterprise
Miner executing in a SAS Grid environment.
SAS High-Performance Analytics technologies include the following:
n SAS Grid Manager
n SAS In-Database
n SAS In-Memory Analytics technologies, which include:
o SAS High-Performance Analytics Server
o SAS Visual Analytics
o SAS High-Performance Risk Management
o other SAS high-performance products and solutions
your models, and then use SAS In-Database to push the models into an
appropriate database for scoring.
SAS Visual Analytics requires a dedicated and specialized configuration of blade
hardware such as Teradata or EMC Greenplum appliances or Hadoop HDFS
configured as an MPP cluster. This environment is always threaded. SAS
options CPUCOUNT and NOTHREADS have no effect. Instead, the NTHREADS
option in the PERFORMANCE statement provides a way to throttle thread
usage. See the SAS Visual Analytics: User’s Guide for product information.
SAS Visual Analytics relies on the SAS LASR Analytic Server to provide a highly
scalable analytics infrastructure that is optimized for large volumes of data and
complex computations.
SAS LASR Analytic Server
The SAS LASR Analytic Server is an analytic platform that provides a secure
environment for concurrent access to data. It loads the data into memory across
the computing nodes of a SAS High-Performance Analytics Server. The SAS
LASR Analytic Server executes on the SAS High-Performance Analytics Server
root node with worker nodes across the appliance that read data into memory in
parallel very fast. If the data is not from a co-located data provider, then the data
is read from the DBMS appliance or Hadoop cluster and transferred to the root
node of the SAS High-Performance Analytics Server. Then, it is loaded into the
memory of the worker nodes. The SAS LASR Analytic Server is not influenced
by CPUCOUNT or NOTHREADS. Instead, the NTHREADS option in the
PERFORMANCE statement throttles thread usage. Refer to the SAS LASR
Analytic Server: Administration Guide for details.
For more SAS In-Memory Analytics products, see Products & Solutions/In-Memory
Analytics.
SAS Viya
With SAS 9.4M5, you can license SAS Viya, software that offers a variety of high
performance products and access to SAS Cloud Analytic Services. For more
information, see An Introduction to SAS Viya Programming.
14
The SAS Registry
This configuration data is stored in a hierarchical form. The form works in a manner
similar to how directory-based file structures work under the operating environments
in UNIX and Windows, and under the z/OS UNIX System Services (USS).
Note: Host printers are not referenced in the SAS registry.
This method prints the registry to the SAS log, and it produces a large list that
contains all registry entries, including subkeys. Because of the large size, it
might take a few minutes to display the registry using this method.
For more information about how to view the SAS registry, see “REGISTRY
Procedure” in Base SAS Procedures Guide.
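As a minimal sketch, the REGISTRY procedure can write registry contents to the SAS log as follows. The LISTUSER form is shown first because its output is shorter.

```sas
/* Write only the Sasuser portion of the registry to the log */
proc registry listuser;
run;

/* Write the entire registry to the log; the listing is long */
proc registry list;
run;
```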
value
the names and content associated with a key or subkey. There are two
components to a value, the value name and the value content, also known as a
value datum.
Figure 14.1 Section of the Registry Editor Showing Value Names and Value Data for the
Subkey 'HTML'
.SASXREG file
a text file with the file extension .SASXREG that contains the text representation
of the actual binary SAS Registry file.
1. The Sashelp part of the registry contains settings that are common to all users at your site. Sashelp is Write protected,
and can be updated only by a system administrator.
• modifying printer settings from the default printer settings that your system
  administrator provides for you
• changing localization settings
• If you delete the registry file called regstry.sas7bitm, which is located in the
  Sasuser library, then SAS restores the Sasuser registry to its default settings.
CAUTION! Do not delete the registry file that is located in Sashelp; this
prevents SAS from starting.
1 Start SAS Explorer with the EXPLORER command, or select View > Explorer.
5 Click Unhide.
If there is no icon associated with ITEMSTOR in the Type list, then you are
prompted to select an icon.
8 Select Copy from the pop-up menu and copy the Regstry file. SAS assigns the
name Regstry_copy to the file.
Operating Environment Information: You can also use a copy command from
your operating environment to make a copy of your registry file for backup
purposes. When viewed from outside SAS Explorer, the filename is
regstry.sas7bitm. Under z/OS, you cannot use the environment copy
command to copy your registry file unless your Sasuser library is assigned to an
HFS directory.
2 Select the top-level key in the left pane of the registry window.
4 Enter a name for your registry backup file in the filename field. (SAS applies the
proper file extension name for your operating system.)
5 Click Save.
This saves the registry backup file in Sasuser. You can control the location of your
registry backup file by specifying a different location in the Save As window.
2 Rename your backup file to regstry.sas7bitm, which is the name of your registry
file.
3 Copy your renamed registry file to the Sasuser location where your previous
registry file was located.
4 Click Open.
5 Restart SAS.
1 Open the Program Editor and submit the following program to import the registry
file that you created previously.
proc registry import=<registry file specification>;
run;
2 If the file is not already properly named, then use Explorer to rename the registry
file to regstry.sas7bitm.
3 Restart SAS.
1 Rename the damaged registry file to something other than “registry” (for
example, temp).
5 Start the Registry Editor with the REGEDIT command. Select Solutions >
Accessories > Registry Editor > View All.
7 Close your SAS session and rename the modified registry back to the original
name.
8 Open a new SAS session to see whether the changes fixed the problem.
4 Enter the color name in the Value Name field and the RGB value in the Value
Data field.
5 Click OK.
After you add these colors to the registry, you can use these color names anywhere
that you use the color names supplied by SAS. For example, you could use the
color name in the GOPTIONS statement as shown in the following code:
goptions cback=anaranjado;
proc gtestit;
run;
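Colors can also be added without the Registry Editor by importing a small text file with PROC REGISTRY. The following is a sketch only: the key name colornames\html, the fileref mycolors, and the RGB value for anaranjado are assumptions made for illustration.

```sas
/* Write a registry fragment to a temporary file.          */
/* The key name and the color value are assumed examples.  */
filename mycolors temp;
data _null_;
   file mycolors;
   put '[colornames\html]';
   put '"anaranjado"=hex:ff,a5,00';
run;

/* Import the fragment into the Sasuser registry. */
proc registry import=mycolors;
run;
```

After the import, the new color name can be used in the same places as the color names that SAS supplies.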
Many of the windows in the SAS windowing environment update the registry for you
when you make changes to such items as your printer setting or your color
preferences. Because these windows update the registry using the correct syntax
and semantics, it is often best to use these alternatives when making adjustments to
SAS.
2 Enter all or part of the text string that you want to find, and click Options to
specify whether you want to find a key name, a value name, or data.
3 Click Find.
1 In the left pane of the Registry Editor window, click the key that you want to
change. The values contained in the key appear in the right pane.
The Registry Editor displays several types of windows, depending on the type of
value that you are changing.
Figure 14.3 Example Window for Changing a Value in the SAS Registry
2 From the pop-up menu, select the New menu item with the type that you want to
create.
3 Enter the values for the new key or value in the window that is displayed.
Figure 14.4 Registry Editor with Pop-up Menu for Adding New Keys and Values
2 Select Rename from the pop-up menu and enter the new name.
3 Click OK.
1 Select Tools > Options > Registry Editor. This opens the Select Registry
View group box.
2 Select View All to display the Sasuser and Sashelp items separately in the
Registry Editor's left pane.
• The Sashelp portion of the registry is listed under the
  HKEY_SYSTEM_ROOT folder in the left pane.
• The Sasuser portion of the registry is listed under the HKEY_USER_ROOT
  folder in the left pane.
Note: In order to first create the backup registry file, you can use the REGISTRY
Procedure or the Export Registry File menu choice in the Registry Editor.
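A backup file can be created with the REGISTRY procedure as sketched here. The fileref regbak and the file name are hypothetical. By default, the EXPORT= option writes the Sasuser part of the registry.

```sas
/* Hypothetical backup location; adjust the path for your site. */
filename regbak 'sasuser_registry_backup.sasxreg';

/* Export the Sasuser registry to a text (SASXREG) file. */
proc registry export=regbak;
run;
```

The exported file is plain text, so it can be inspected or edited before it is imported again.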
1 In the left pane of the Registry Editor, select the key that you want to export to a
SASXREG file.
To export the entire registry, select the top key.
4 Click Save.
Table 14.1 Registry Locations for Commonly Used Explorer Configuration Data
4 Click OK.
5 Verify that the file shortcut was created successfully and enter the REGEDIT
command.
8 Edit the exported file and replace all instances of HKEY_USER_ROOT with
HKEY_SYSTEM_ROOT.
run;
5 Click OK.
6 Issue the REGEDIT command after verifying that the library was created
successfully.
12 Right-click the file and select Edit in NOTEPAD to edit the file.
13 Edit the exported file and replace all instances of “HKEY_USER_ROOT” with
“HKEY_SYSTEM_ROOT”.
14 To apply your changes to the site's Sashelp, use PROC REGISTRY. The
following code imports the file:
proc registry import="yourfile.sasxreg" usesashelp;
run;
• Required field values for libref assignment in the SAS Registry are invalid. For
  example, library names are limited to eight characters, and engine values must
  match actual engine names.
• Encrypted password data for a libref has changed in the SAS Registry.
Note: You can also use the New Library window to add librefs. You can open this
window by typing DMLIBASSIGN in the toolbar, or selecting File > New from the
Explorer window.
CAUTION! You can correct many libref assignment errors in the SAS Registry
Editor. If you are unfamiliar with librefs or the SAS Registry Editor, then ask for technical
support. Errors can be made easily in the SAS Registry Editor, and they can prevent
your libraries from being assigned at startup.
To correct a libref assignment error using the SAS Registry Editor:
2 Select one of the following paths, depending on your operating environment, and
then make modifications to keys and key values as needed:
CORE\OPTIONS\LIBNAMES
or
CORE\OPTIONS\LIBNAMES\CONCATENATED
Note: These corrections are possible only for permanent librefs, that is, those that
are created at startup by using the New Library or File Shortcut Assignment window.
For example, if you determine that a key for a permanent, concatenated library has
been renamed to something other than a positive whole number, then you can
rename that key again so that it is in compliance. Select the key, and then select
Rename from the pop-up menu to begin the process.
15
Printing with SAS
Universal Printing
menu. These options can be set only in a SAS configuration file or at start-up. You
cannot enable or disable Universal Printing menus and dialog boxes after SAS
starts.
Include the following system options when you start SAS:
-uprint -uprintmenuswitch
GIF (Graphics Interchange Format)
An image format designed for the online transmission and interchange of
graphic data. The format is widely used to display images on the World Wide
Web because of its smaller size and portability.
PDF (Portable Document Format)
A file format developed by Adobe Systems for viewing and printing a formatted
document. To view a file in PDF format, you need Adobe Reader, a free
application distributed by Adobe Systems.
Note: Adobe Acrobat is not required to produce PDF files with Universal
Printing.
TIFF (Tagged Image File Format)
An Adobe raster image format that supports both image and data in a single
file. The TIFF Universal Printer supports RGBA color printing and
transparency. The TIFFk Universal Printer supports CMYK color printing.
You set the value of the PRINTERPATH= system option to a Universal Printer or
use ODS statements to create output in one of the above formats. When the
PRINTERPATH= system option is set to a printer that prints to a file, the default
filename is sasprt.extension, where extension is the printer format type. Here are
some example filenames: sasprt.pdf, sasprt.emf, sasprt.png, and sasprt.gif. The file
is written to the current directory.
You can use the PRINTERPATH= system option to change the location and the
name of the file. Here is an example:
options printerpath=(svg out);
filename out 'c:\myimages\graph1.svg';
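To show the setting in use, here is a minimal sketch that sends one graph to the SVG Universal Printer through the ODS PRINTER destination. The relative file name graph1.svg and the choice of PROC SGPLOT with Sashelp.Class are assumptions for illustration.

```sas
/* Hypothetical output file in the current directory. */
filename out 'graph1.svg';
options printerpath=(svg out);

/* ODS PRINTER uses the printer named in PRINTERPATH=. */
ods printer;
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;
ods printer close;
```

When the destination is closed, the output is written to the file that the fileref points to instead of to the default sasprt.svg file.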
For more information, see “QDEVICE Procedure” in Base SAS Procedures Guide.
To print a list of printer prototypes to the SAS log, submit this SAS program:
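/* Redirect the SAS log to a temporary file. */
filename registry temp;
proc printto log=registry;
run;
/* List the prototype keys; the listing goes to the redirected log. */
proc registry list startat="core\printing\prototypes";
run;
/* Restore the default log destination. */
proc printto;
run;
/* Extract the prototype names from the bracketed key lines. */
data protypes;
   keep prototype;
   infile registry lrecl=300 pad;
   length line $300;
   input line $300.;
   if substr(line,1,1) = "["
   then do;
      prototype = strip(substr(line,2,length(line)-2));
      if index(prototype,'core\printing\prototypes') ne 0
         then delete;
      else
         output;
   end;
run;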
For more information, see “REGISTRY Procedure” in Base SAS Procedures Guide.
19 proc qdevice;
20 printer gif;
21 run;
Name: GIF
Description: Graphics Interchange Format RGB Color/Alpha Blending
Module: SASPDGIF
Type: Universal Printer
Registry: SASHELP
Prototype: GIF
Default Typeface: Cumberland AMT
Typeface Alias: Courier
Font Style: Regular
Font Weight: Normal
Font Height: 8 points
Font Version: Version 1.03
Maximum Colors: 16777216
Visual Color: True Color
Color Support: RGBA
Destination: sasprt.gif
I/O Type: DISK
Data Format: GIF
Height: 6.25 inches
Width: 8.33 inches
Ypixels: 600
Xpixels: 800
Rows(vpos): 50
Columns(hpos): 114
Left Margin: 0 inches
Minimum Left Margin: 0 inches
Right Margin: 0 inches
Minimum Right Margin: 0 inches
Bottom Margin: 0 inches
Minimum Bottom Margin: 0 inches
Top Margin: 0 inches
Minimum Top Margin: 0 inches
XxY Resolution: 96x96 pixels per inch
Compression Enabled: Always
Compression Method: LZW
Font Embedding: Never
Animation: Enabled
The QDEVICE procedure does not report all printer settings. For a description of the
printer settings that can be reported, see “QDEVICE Procedure” in Base SAS
Procedures Guide.
Document Formats
Graphic Formats
* You must have SAS/GRAPH installed to create drill-down regions in a graph created by the PDF
Universal Printer. For more information, see “Adding Drill-Down Graphs in Your PDF File” in
SAS/GRAPH: Reference.
options orientation=portrait;
footnote 'PROC PRINT in Portrait Orientation';
proc print data=sashelp.class;
run;
options orientation=landscape;
footnote 'PROC SGSCATTER in Landscape Orientation';
proc sgscatter data=sashelp.cars;
matrix mpg_city enginesize horsepower /
diagonal=(histogram kernel);
run;
options orientation=portrait;
footnote 'PROC MEANS in Portrait Orientation';
proc means data=sashelp.cars n mean;
Figure 15.1 Page Three of an SVG Document Showing the Landscape Orientation
Figure 15.2 Page Four of an SVG Document Showing the Portrait Orientation
Universal Printer   Color Support                   Supports Transparency
SASEMF              RGBA only for bitmap images.    Yes, for bitmap images
                    RGB for vector elements.
TIFFk               CMYK                            No
For information about CMYK, RGB, and RGBA colors, see “CMYK Colors” on page
274 and “RGB and RGBA Colors” on page 276.
CMYK Colors
A CMYK color setting specifies eight hexadecimal characters, two for each
component. Each pair has a value of 00–FF (0–255) that specifies the amount of
cyan, magenta, yellow, or black ink. Use your printer’s Pantone Color Lookup
table to find the CMYK values for your printer. If you specify an unsupported
color, such as a CMYK color with an EMF printer, the color is converted to a
color that is supported.
You can specify CMYK colors wherever colors can be set (for example, in the
PROC PRINT statement STYLE option or in the TITLE statement).
Prefix the hexadecimal number with CMYK or K. Here are some examples of
CMYK colors that you can set in SAS:
Hexadecimal Representation   Color
cmykFF000000                 cyan
k00FF0000                    magenta
cmyk0000FF00                 yellow
kFFFF0000                    blue
cmykFF00FF00                 green
k00FFFF00                    red
k000000FF                    black
The first byte of the hexadecimal number represents cyan. The second byte
represents magenta. The third byte represents yellow. The fourth byte represents
black.
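The byte layout can be checked with a short DATA step. This is an illustrative sketch, not part of the original example: the variable names c, m, y, k, and color are assumptions, and the HEX2. format converts each 0–255 value to two hexadecimal characters.

```sas
data _null_;
   /* Component values from 0 to 255: full cyan and full yellow give green. */
   c = 255; m = 0; y = 255; k = 0;
   length color $ 12;
   /* Prefix, then one hexadecimal byte each for cyan, magenta, yellow, black. */
   color = cats('cmyk', put(c, hex2.), put(m, hex2.),
                        put(y, hex2.), put(k, hex2.));
   put color=;   /* writes color=cmykFF00FF00 to the log */
run;
```

The resulting string matches the entry for green in the table of CMYK examples.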
This example uses the STYLE option to set the column header background color to
magenta and the foreground color to white. The TITLE statement sets the
output title to blue.
options obs=5 nodate;
ods html close;
ods pdf;
proc print data=sashelp.demographics label
   style(header)={background=cmyk00ff0000 foreground=k00000000} noobs;
   var name pop;
   label name='Country Name' pop='Population';
   title color=kffff0000 'Demographics 2005';
run;
ods pdf close;
ods html;
Note: When you use the TIFFk Universal Printer, specify the
UPRINTCOMPRESSION system option to avoid very large TIFF files.
run;
title;
ods printer close;
ods html;
• The PRINT procedure uses an RGBA color value for the background of the table
  header and formats the salary variable using the PCT. format.
options nodate;
data _null_ ;
call execute('proc format fmtlib ; value pct');
max=10000;
maxloop=255;
do i=1 to maxloop by 10;
color='RGBA'||put(((maxloop)/(maxloop+i)*200),hex2.)
||put(((maxloop)/(maxloop+i)*235),hex2.)
||put(((maxloop)/(maxloop+i)*255),hex2.)||'95';
from=max;
to=(max+3000);
max=max+3000;
/* Create salary ranges of $3000.00 equal to the calculated RGBA color value.*/
call execute(put(from,best.)||'-'||put(to,best.)||'='||quote(color));
end;
/* Create RGBA values for missing values and values outside the salary ranges. */
call execute('.="RGBAF7F5F0480" other="RGBAFF2A2A88"; run;');
run;
data staff;
infile datalines dlm='#';
input Name $16. IdNumber $ Salary
Site $ HireDate date7.;
format hiredate date7.;
datalines;
Capalleti, Jimmy# 2355# 21163# BR1# 30JAN09
Chen, Len# 5889# 20976# BR1# 18JUN06
Davis, Brad# 3878# 19571# BR2# 20MAR84
Leung, Brenda# 4409# 34321# BR2# 18SEP94
Martinez, Maria# 3985# 49056# US2# 10JAN93
Orfali, Philip# 0740# 50092# US2# 16FEB03
Patel, Mary# 2398# 35182# BR3# 02FEB90
Smith, Robert# 5162# 40100# BR5# 15APR66
Sorrell, Joseph# 4421# 38760# US1# 19JUN11
Zook, Carla# 7385# 22988# BR3# 18DEC10
;
run;
ods html close;
ods pdf file='outpdf.pdf';
proc print data=staff noobs label
style(HEADER)={background=rgbac7eafe95 fontstyle=italic}
style(DATA)={foreground=black};
var name IdNumber ;
var salary /style(DATA)={background=pct.};
label IdNumber='Employee Number' salary='Salary in U.S. Dollars';
format salary dollar7.;
title 'Generated Colors for the Variable Salary';
run;
ods pdf close;
Example Code 15.1 Static and Varying Background Color in a Table Using RGBA Colors
525
526 data staff;
527 infile datalines dlm='#';
528 input Name $16. IdNumber $ Salary
529 Site $ HireDate date7.;
530 format hiredate date7.;
531 datalines;
542 ;
543 run;
544
545 /* Close the HTML destination and open the PDF destination.*/
546 /* Format the header background using an RGBA color. */
547 /* Use the PCT. format to format the salary variable. */
548
549 ods html close;
550 ods pdf file='outpdf.pdf';
NOTE: Writing ODS PDF output to DISK destination "c:\public\mySASPrograms
\outpdf.pdf",
printer "PDF".
551 proc print data=staff noobs label
552 style(HEADER)={background=rgbac7eafe95 fontstyle=italic}
553 style(DATA)={foreground=black};
554 var name IdNumber ;
555 var salary /style(DATA)={background=pct.};
556 label IdNumber='Employee Number' salary='Salary in U.S. Dollars';
557 format salary dollar7.;
558 title 'Generated Colors for the Variable Salary';
559 run;
NOTE: There were 10 observations read from the data set Work.Staff.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
For more information, see “COLOPHON= System Option” in SAS System Options:
Reference.
Action   Command
Print    DMPRINT
Setting Up Printers
2 Select the new default device from the list of printers in the Printer field.
3 Click OK.
2 Select the printer that you want to delete from the list of printers in the Printer
field.
3 Click Remove.
Note: Only your system administrator can remove printers that the administrator
has defined for your site. If you select a printer that was defined by your system
administrator, the Remove button is unavailable.
Alternatively, you can issue the DMPRINTSETUP command.
2 Enter the name and a description for the new printer (127-character maximum,
no backslash characters, not case sensitive).
The printer name is required. The description is optional.
Select a printer model. If your exact printer model is not available, select a
general model that is compatible with your printer. For example, for the HP
LaserJet printer series, select PCL5 for monochrome printers or PCL5c for color
printers.
Note: General models might provide fewer options than specific models.
Figure 15.7 Printer Definition Window to Select Printer Model
5 Select the Device type for your print output.
The device type selections are host-dependent.
If you select Catalog, Disk, Ftp, Socket, or Pipe as the device type, then you
must specify a destination.
If you select a device type of Printer, then a destination might not be required,
because some operating environments use the Host options box to route output.
Note: Examples for your operating system of Device Type, Destination, and
Host options are also provided in “Sample Values for the Device Type,
Destination, and Host Options Fields” on page 307.
The destination is the target location for your device type. For example, if your
device type is disk, then your destination is an operating environment-specific
filename. With some system device types, the destination might be blank and
you can specify the target location using the Host options box.
7 Select or enter any host-specific options for the device that you chose.
This field might be optional for your operating environment. For a list of host
options, see the FILENAME statement information for your operating
environment.
Note: The Destination and Host Options lists can also be populated using the
REGISTRY procedure. Click the Help button in step 3 to see the “Populating
Destination and Host Option Lists” topic, which contains more details.
8 Click Next to proceed to Step 4 of the wizard, in which you select from a list of
installed print previewers.
If no previewers are defined, proceed to the next step of the wizard.
If the previewer selection box appears, select the previewer for this printer. If you
do not need a previewer, choose None or leave the field blank.
Note: You can add a previewer to any printer through the DMPRTCREATE
PREVIEWER command. For more information, see “Define a New Previewer” on
page 294.
Note: It is not required that printers and print previewers share a common
language.
10 Click Previous to change any information. Click Finish when you have
completed your printer definition.
After you have returned to the Print Setup window, you can test your default printer
by clicking Print Test Page.
Note: You can also use the PRTDEF procedure to define a printer
programmatically. For more information, see “Managing Universal Printers Using the
PRTDEF Procedure” on page 302.
• advanced features such as translation tables, printer resolution, and the print
  previewer associated with the printer
To change printer properties for your default printer, follow these steps:
2 From the Printer Properties window, select the tab that contains the information
that you need to modify.
• In the Name tab, you can modify the printer name and the printer description.
Note: The printer name is not case sensitive. If you change only the casing,
the printer name change fails. To change the case of the printer name, you
can delete the printer and re-create it with the new casing. You can also
modify the name of the printer, save the modifications, and then change the
name again to the name and casing that you want.
Figure 15.11 Printer Properties Window Displaying Name Tab
• The Destination tab enables you to designate the device type, destination,
  and host options for the printer. See “Sample Values for the Device Type,
  Destination, and Host Options Fields” on page 307 for examples.
• The Font tab controls the available font options. The selections available in
  the drop-down boxes are printer specific. The font size is in points.
Note: This window enables you to set attributes for the default fonts.
Typically, procedure output is controlled by the fonts specified by the ODS
style or by program statements that specify font attributes.
Figure 15.13 Printer Properties Window Displaying Font Tab
• The Advanced tab lists the Resolution, Protocol, Translate table, Buffer size,
  Previewer, and Preview command options for the printer. The information in
  the drop-down fields is printer specific.
Configuring Universal Printing Using the Windowing Environment 291
Resolution
specifies the resolution for the printed output in dots per inch (dpi).
Protocol
provides the mechanism for converting the output to a format that can be
processed by a protocol converter that connects the EBCDIC host
mainframe to an ASCII device. Protocol is required in the z/OS operating
environment, and if you must use one, select one of the protocol
converters that are listed.
Translate table
manages the transfer of data between an EBCDIC host and an ASCII
device. Normally, the driver selects the correct table for your locale; the
translate table needs to be specified only when you require nonstandard
translation.
Buffer size
controls the size of the output buffer or record length. If the buffer size is
left blank, a default size is used.
Previewer
specifies the Previewer definition to use when Print Preview is requested.
The Previewer box contains the previewer application that you have
defined. See “Define a New Previewer” on page 294.
Preview command
is the command that is used to open an external printer language viewer.
For example, if you want Ghostview as your previewer, type ghostview
%s. When a Preview Command is entered into a Printer definition, the
printer definition becomes a previewer definition. The Preview Command
must be a valid command. When the command is executed as part of the
preview process, each %s is replaced with the name of a temporary file
that contains the input for the preview command.
Note: The Previewer and Preview Command fields are mutually exclusive.
When you enter a command path into the Preview Command field, the
Previewer box is dimmed.
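The %s substitution in a preview command can be pictured with a short DATA step. This sketch only imitates the replacement for illustration; it is not how SAS implements the preview process, and the temporary file name is hypothetical.

```sas
data _null_;
   length command tempfile resolved $ 200;
   command  = 'ghostview %s';            /* preview command as entered     */
   tempfile = '/tmp/sasprt_preview.ps';  /* hypothetical temporary file    */
   /* Replace the %s placeholder with the temporary file name. */
   resolved = tranwrd(command, '%s', strip(tempfile));
   put resolved=;
run;
```

The single quotation marks keep the macro processor from interpreting %s while the example runs.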
Note: If the printer name contains blanks, you must enclose it in quotation marks.
You can get a list of printers that are currently defined in two ways:
• Check the list of printers in the Printer field of the Print Setup window.
• Submit this code:
proc qdevice out=printers;
printer _all_;
run;
You can also override the printer destination by specifying a fileref with the
PRINTERPATH= system option:
options printerpath= (myprinter printout);
filename printout path;
1 Select File > Print Setup and choose Print Test Page to open the Print Setup
window.
2 Select the printer for which you would like a test page from the Printer list view.
A print window appears. Your print window might differ from the window that
follows.
Alternatively, you can issue the DMPRINT command.
Figure 15.15 Print Window
3 If the Use Forms check box is visible, clear it in order to use Universal Printing.
4 From the Printer group box, select the name of the printer definition.
6 If you want to save your print job to a file, follow these steps:
Note: If you print to an existing file, the contents of the file are either overwritten
or appended, depending on whether you choose replace or append from the
open print window. Most viewers for EMF, GIF, PNG, SVG, and TIFF files do not
view appended files. When append is selected with a PDF printer, a merged
PDF file is not produced.
Selected lines of text in a window (Note: not available on z/OS)
Select the text that you want to print, and then open the Print window. In the
Page Options box, check the Print Selected Text box.
A range of pages or other individual pages
Select Range and enter the page numbers in the Pages field. Separate
individual page numbers and page ranges with either a comma (,) or a blank.
You can enter page ranges in any of these formats:
• n–m prints all pages from n to m, inclusive.
• –n prints all pages from page 1 to page n.
• n– prints all pages from page n to the last page.
8 Click OK to print.
2 Enter the name and a description for the new previewer (127-character
maximum, no backslashes, not case sensitive).
The previewer name is required. The description is optional.
4 Select the printer model that you want to associate with your previewer
definition.
The PostScript, PCL, or PDF language generated for the model must be a
language that your external viewer package supports. For best results, select the
generic models such as PostScript Level 1 (Color) or PCL 5.
8 Click Previous to correct any information. Click Finish when you have finished
defining your default previewer.
The newly defined previewer displays a previewer icon in the Print Setup window.
This previewer application can be tested with the Print Test Page button on the
Print Setup window.
2 Select a tab to open windows that control various aspects of your printed output.
Descriptions of the tabbed windows follow.
The Page Setup window consists of four tabs: General, Orientation, Margins, and
Paper.
• The General tab enables you to change the options for Binding, Collate,
  Duplex, and Color Printing.
Figure 15.21 Page Setup Window Displaying the General Tab
Binding
specifies the binding edge (Long Edge or Short Edge) to use with duplexed
output. This sets the Binding option.
Collate
specifies whether the printed output should be collated. This sets the Collate
option.
Duplex
specifies whether the printed output should be single-sided or double-sided.
This sets the Duplex option.
Color Printing
specifies whether output should be printed in color. This sets the
COLORPRINTING option.
• The Orientation tab enables you to change the output's orientation on the page.
  The default is Portrait. This tab sets the ORIENTATION option.
• The Margins tab enables you to change the top, bottom, left, and right margins
  for your pages. The value range depends on the type of printer that you are
  using. The values that are specified on this tab set the TOPMARGIN,
  BOTTOMMARGIN, LEFTMARGIN, and RIGHTMARGIN options.
Figure 15.23 Page Setup Window Displaying the Margins Tab
• The Paper tab specifies the Size, Type, Source, and Destination of the paper
  used for the printed output.
Size
specifies the size of paper to use by setting the PAPERSIZE option. Paper
sizes include Letter, Legal, A4, and so on.
Type
specifies the type of paper to use. Examples of choices include Standard,
Glossy, and Transparency. This sets the PAPERTYPE option.
Source
designates which input paper tray is to be used. This sets the
PAPERSOURCE option.
Destination
specifies the bin or output paper tray that is to be used for the resulting
output. This sets the PAPERDEST option.
Note: Page settings are stored in the SAS registry. Although your page settings
should remain in effect from one SAS session to another, changing default printers
could lose, change, or disable some of the settings. If you change printers during a
SAS session, check the Page Setup window to ensure that all of your settings are
valid for your new default printer.
LEFTMARGIN= Specifies the size of the margin on the left side of the
page.
RIGHTMARGIN= Specifies the size of the margin on the right side of the
page.
TOPMARGIN= Specifies the size of the margin at the top of the page.
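The same page settings can be made in a program with the OPTIONS statement. The following is a minimal sketch; the specific values are assumptions chosen for illustration, and the valid ranges depend on your printer.

```sas
/* Set page options in code instead of in the Page Setup window. */
options orientation=landscape
        papersize=letter
        topmargin=1in bottommargin=1in
        leftmargin=1in rightmargin=1in;

/* Write the current values to the log to confirm the settings. */
%put Orientation: %sysfunc(getoption(orientation));
%put Paper size:  %sysfunc(getoption(papersize));
%put Top margin:  %sysfunc(getoption(topmargin));
```

Checking the values with GETOPTION is useful after you change printers, because some settings might not be valid for the new default printer.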
Introduction
These examples show you how to use the PRTDEF procedure to define new
printers and to manage your installed printers and previewers.
After a program statement containing the PRTDEF procedure runs successfully, the
printers or previewers that have been defined appear in the Print Setup window. A
complete set of all available printers and previewers appears in the Printer name list.
Printer definitions can also be viewed in the Registry Editor window under
CORE\PRINTING\PRINTERS.
After you create the data set containing the variables, you run the PRTDEF
procedure. The PRTDEF procedure creates the printers that are named in the data
set by creating the appropriate entries in the SAS registry.
proc prtdef data=printers usesashelp replace;
run;
The USESASHELP option specifies that the printer definitions are to be placed in
the Sashelp library, where they are available to all users. To use the USESASHELP
option, you must have permission to write to the Sashelp library. If the
USESASHELP option is not specified, then the printer definitions are placed in the
current Sasuser library, where they are available to the local user only.
The REPLACE option specifies that the default operation is to modify existing printer
definitions. Any printer name that already exists is modified by using the information
in the printer attributes data set. Any printer name that does not exist is added.
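As an illustration of these options, here is a minimal sketch that defines one printer for the current user only. The data set name work.myprinter and the printer name MyPostScript are hypothetical; the variable names and the model name follow the other examples in this chapter.

```sas
/* Printer attributes data set with the variables used in this chapter. */
data work.myprinter;
   length name $ 80 model $ 80 device $ 8 dest $ 80;
   name   = 'MyPostScript';                /* hypothetical printer name */
   model  = 'PostScript Level 2 (Color)';  /* printer prototype         */
   device = 'DISK';
   dest   = 'sasprt.ps';
run;

/* Without USESASHELP, the definition is written to Sasuser. */
proc prtdef data=work.myprinter replace list;
run;
```

Because USESASHELP is omitted, no write access to the Sashelp library is needed, and the LIST option writes the resulting definition to the log.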
Note: To preview output for this printer, you must create a Ghostview printer
definition. You can do this either in the Preview Definition Wizard (Figure 15.14 on
page 291), on the Advanced tab of the Printer Properties window (Figure 15.18 on
page 296) or by using the PRTDEF procedure.
Here is a Ghostview printer definition using the PRTDEF procedure:
data gsview;
name = "Ghostview";
desc = "Print Preview with Ghostview";
model= "Tek Phaser 780 Plus";
viewer = 'gv %s';
device = "dummy";
dest = " ";
The LIST option in the PROC PRTDEF statement writes the printer definition to
the log.
Note: You must specify a preview command either in the Preview Definition Wizard
(Figure 15.14 on page 291) or on the Advanced tab of the Printer Properties
window (Figure 15.18 on page 296). An example of a preview command is
ghostview -bg white -fg black -magstep -2 -nolabel %s
For more information about print previewers, see “Creating PostScript Previewer
Definitions” on page 306.
• The OPCODE variable specifies what action (Add, Delete, or Modify) to perform
  on the printer definition.
• The first Add operation creates a new printer definition for Color PostScript in the
  registry and the second Add operation creates a new printer definition for
  ColorPS in the registry.
• The Mod operation modifies the existing printer definition for LaserJet 5 in the
  registry.
• The Del operation deletes the printer definitions for printers named “Gray
  PostScript” and “test” from the registry.
The following example creates a printer definition in the Sashelp library. Because
the definition is in Sashelp, the definition becomes available to all users. Special
system administration privileges are required to write to the Sashelp library. An
individual user can create a personal printer definition by specifying the Sasuser
library instead.
data printers;
   infile datalines dlm='#';
   length name   $ 80
          model  $ 80
          device $ 8
          dest   $ 80
          opcode $ 3;
   input opcode $ name $ model $ device $ dest $;
   datalines;
add# Color PostScript F2# PostScript Level 2 (Color)# DISK# sasprt.ps
mod# LaserJet 5# PCL 5c (DeltaRow)# DISK# sasprt.pcl
del# Gray PostScript# PostScript Level 2 (Gray Scale)# DISK# sasprt.ps
del# test# PostScript Level 2 (Color)# DISK# sasprt.ps
add# ColorPS# PostScript Level 2 (Color)# DISK# sasprt.ps
;
Note: If the end user modifies and saves new attributes for an administrator-defined
printer in the Sashelp library, the printer becomes a user-defined printer in the
Sasuser library. Values that are specified by the user override the values that were
set by the administrator. If the user-defined printer definition is deleted, the
administrator-defined printer reappears.
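The personal-definition case can be sketched as follows. This is a minimal illustration rather than part of the original example: the data set name WORK.MYPRINTER and the printer name MyColorPS are hypothetical. Because the USESASHELP option of PROC PRTDEF is not specified, the definition is written to the Sasuser library.

```sas
/* Hypothetical data set describing one printer to add. */
data work.myprinter;
   length name $ 80 model $ 80 device $ 8 dest $ 80 opcode $ 3;
   opcode = "add";
   name   = "MyColorPS";                   /* hypothetical printer name */
   model  = "PostScript Level 2 (Color)";  /* prototype used in the example above */
   device = "DISK";
   dest   = "sasprt.ps";
run;

/* Without USESASHELP, the definition goes to Sasuser. */
/* LIST writes the resulting definition to the log.    */
proc prtdef data=work.myprinter list;
run;
```

An administrator who wants the definition to be available to all users would add the USESASHELP option, which requires Write access to the Sashelp library.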
• The MODEL variable specifies the printer prototype to use when defining this
printer.
• The VIEWER variable specifies the host system commands for print preview.
The following program creates a print previewer definition for using Adobe Acrobat
Reader:
data adobeR;
name = "myAdobeReader";
desc = "Adobe Reader Print Preview";
model= "PDF Version 1.2";
viewer = "'c:\Program Files\Adobe\Reader 9.0\Reader\AcroRd32.exe' %s.pdf";
device = "dummy";
dest = " ";
run;
proc prtdef data=adobeR list replace;
run;
The following program creates a print previewer definition for using Ghostview:
data gsview;
name = "MyGhostview";
desc = "Print Preview with Ghostview";
model= "PostScript Level 2 (Color)";
viewer = 'ghostview %s';
device = "dummy";
dest = " ";
run;
proc prtdef data=gsview list replace;
run;
The following example shows how to back up four printer definitions (named PDF,
postscript, PCL5, and PCL5c) using the PRTEXP procedure:
proc prtexp out=printers;
select PDF postscript PCL5 PCL5c;
run;
For more information, see “PRTEXP Procedure” in Base SAS Procedures Guide.
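A backup is useful only if it can be restored. The following sketch, under the assumption that the exported data set is passed back to PROC PRTDEF unchanged, shows the round trip. The data set name WORK.PRT_BACKUP is hypothetical; for a lasting backup, write to a permanent library instead of WORK.

```sas
/* Export two printer definitions to a data set. */
proc prtexp out=work.prt_backup;
   select PDF postscript;
run;

/* Later: re-create the definitions from the backup.        */
/* REPLACE overwrites definitions that already exist, and   */
/* LIST writes the restored definitions to the log.         */
proc prtdef data=work.prt_backup replace list;
run;
```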
◦ z/OS
   • Device type: FTP
   • Destination: ftp.out
   • Host options: host='nodename' recfm=vb prompt
   • Device type: Printer
   • Destination: printer name
   • Host options: (leave blank)
◦ Windows
   • Device type: FTP
   • Destination: ftp.out
   • Host options: host='nodename' prompt
◦ UNIX
   • Device type: FTP
   • Destination: filename.ext
   • Host options: host='nodename' prompt
   • Device type: Socket
Forms Printing
You can move between fields on a frame with the TAB key.
After you finish defining or editing your form, issue the END command to save your
changes and exit the FORM window.
Note: Turning on Forms by checking the Use Forms check box in the print window
turns Universal Printing off for printing non-graphic windows.
Operating Environment Information: For more information about printing with
Forms, see the documentation for your operating environment.
Rendering Fonts
Universal Printing uses the following two methods to generate and display fonts in
SAS output:
• the FreeType library
• Type1 fonts
Note: Universal Printing and SAS/GRAPH do not support double-byte Type1 fonts.
The output methods in the following table are recommended because they use the
FreeType library to render fonts. This means that they can render fonts in all of the
operating environments that SAS supports. 1
Table 15.9 Recommended Devices (because they use the FreeType library to render fonts)
JPEG
SASEMF, SASWMF**
1. The FreeType library is used to perform two distinct operations in SAS: measuring the text and rendering the font.
Depending on the output devices specified, the FreeType library can perform one or both of these operations.
Using Fonts with Universal Printers and SAS/GRAPH Devices 311
GIF
PDF, PDFA
PostScript
SVGANIM
* If the NOFONTRENDERING option is set, the device driver uses only the FreeType library for
measuring the text. See “FONTRENDERING= System Option” in SAS System Options: Reference.
** These devices use the FreeType library only for measuring text. The final font rendering is done by
an application such as Microsoft Word, which displays the output using system-installed fonts.
You can use the QDEVICE procedure to see a list of supported fonts. For a more
detailed example, see Example 5: Generate a Font Report.
proc qdevice;
run;
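To restrict the report to fonts and keep the result for later processing, the report type and an output data set can be named explicitly. The following is a sketch only: the data set name WORK.PDF_FONTS is hypothetical, and the PDF printer is used as an example.

```sas
/* Write the font report for the PDF Universal Printer to a data set. */
proc qdevice report=font out=work.pdf_fonts;
   printer pdf;
run;

/* Look at the first few rows of the report. */
proc print data=work.pdf_fonts(obs=10);
run;
```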
BMP
EMF, WMF
ZGIF
Table 15.11 Devices That Use Either FreeType Font-Rendering or Host Font-Rendering
• Not all printers support font embedding. To determine whether the printer that
you are using supports font embedding, use the QDEVICE procedure. If Font
Embedding is listed in the SAS log with a value of Option or Always, then the
printer supports font embedding.
proc qdevice report=general;
printer pdf;
run;
• multilingual
• monolingual Asian
Windows Glyph List (WGL) fonts are also called Pan-European Character Set
Fonts. These fonts are about the same shape and size as the Microsoft fonts and
can be substituted for the Microsoft fonts without changing formatting or paging. The
following table shows the SAS font and the compatible Microsoft font.
Table 15.12 Windows Glyph List (WGL) and Compatibility with Microsoft
Compatibility with
Font Name Font Description Microsoft Font
* SAS Monotype Sorts is an ornamental font consisting of shapes, symbols, and decorative glyphs that
have no one-to-one mapping to Microsoft TrueType or Adobe Type1 fonts. However, the SAS
Monotype Sorts font closely resembles Microsoft "Wingdings" TrueType and Adobe "ITC Zapf
Dingbats" Type1 fonts.
** These fonts have special glyphs for the Latin characters 0, <, =, C, D, L, M, N, P, R, S, U, V, W, X, Z,
and a-z. All other characters are undefined and might be rendered as a rectangle. For example, in
the HTML destination, the rectangle is replaced with the matching Latin1 character when it is
displayed in Internet Explorer.
* In SAS 9.4, the Arial Unicode MS and Times New Roman fonts replace the Monotype Sans WT and
Thorndale Duospace WT fonts.
In SAS 9.4M5, the following new AvenirNextforSAS fonts replace the Avenir Next fonts
that were added in a previous maintenance release.
AvenirNextforSASBold
AvenirNextforSASBoldItalic
AvenirNextforSASLight
AvenirNextforSASLightItalic
In SAS 9.4M5, the following new HelveticaNeueforSAS fonts replace the Helvetica fonts
that were added in a previous maintenance release.
HelveticaNeueforSASBold
HelveticaNeueforSASBoldItalic
HelveticaNeueforSASlightItalic
HelveticaNeueforSASLight
* HeiT, MingLiU, MingLiU_HKSCS, and PMingLiu support HKSCS2004 (Hong Kong Supplemental
Character Set) characters.
The fonts that are supplied by SAS and the fonts that are already installed on
Windows are automatically registered in the SAS registry when you install SAS.
Fonts already installed on UNIX and z/OS must be registered manually in the SAS
registry after you install SAS. To register other TrueType fonts, see “Registering
Fonts” on page 317.
Registering Fonts
* Data from a .pfm file is used to generate output using the SAS/GRAPH SASEMF and SASWMF
devices on Windows. On UNIX and z/OS, data from a .pfm file is used to generate output using the
WMF device and the EMF universal printer. This file is not required to register Type1 fonts using
PROC FONTREG. If you do not register a .pfm file, you might not have the desired results.
For more information about adding fonts to the SAS Registry, see “FONTREG
Procedure” in Base SAS Procedures Guide.
Note: In Microsoft Windows environments, TrueType fonts are usually located in
either the C:\WINNT\Fonts or C:\Windows\Fonts directory. For all other operating
environments, contact your system administrator for the location of the TrueType
font files.
For more information, see “FONTREG Procedure” in Base SAS Procedures Guide.
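As a minimal sketch of registering TrueType fonts on Windows, the following step scans one directory and adds any fonts that are not yet in the SAS registry. The directory is the Windows location mentioned in the note above; substitute the location that applies to your site.

```sas
/* MODE=ADD registers only fonts that are not already in the SAS registry. */
/* MSGLEVEL=VERBOSE writes a message to the log for each font processed.   */
proc fontreg mode=add msglevel=verbose;
   fontpath 'C:\Windows\Fonts';
run;
```

MODE=ADD leaves existing registrations untouched, which makes the step safe to rerun.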
z/OS Specifics: When you add fonts to a z/OS system, the font file must be
allocated as a sequential data set with a fixed block record format and a record
length of 1.
For more information, see “FONTREG Procedure” in Base SAS Procedures Guide.
data;
set fonts;
drop ftype;
length type $16;
if ftype = "System"
then do;
if substr(font,2,3) = "ttf" then type = "TrueType";
else if substr(font,2,3) = "at1" then type = "Adobe Type1";
else if substr(font,2,3) = "cff" then type = "Adobe CFF/Type2";
else if substr(font,2,3) = "pfr" then type = "Bitstream PFR";
else type = "System";
if type ^= "System" then font = substr(font,7,length(font)-6);
else if substr(font,1,1) = "@" then font = substr(font, 2,length(font)-1);
end;
else type = "Printer Resident";
run;
proc sort;
by font;
run;
%mend fontlist;
%fontlist(printer, pdf)
%fontlist(device, pdf)
%fontlist(device, win)
%fontlist(printer, png)
%fontlist(device, pcl5c)
Here is the output for the first 25 fonts in the output data set:
Output 15.3 List of Fonts Supported by the PDF Printer (Partial Output)
For more information, see “QDEVICE Procedure” in Base SAS Procedures Guide.
Using Fonts
This window contains drop-down boxes for Font, Style, Weight, Size (in points),
and Character Set.
5 Click the arrow on the right side of the Font box and scroll through the list of
available fonts.
TrueType fonts are indicated by the letters ttf enclosed in angle brackets (< >),
and Type1 fonts are indicated by the letters at1 enclosed in angle brackets (< >).
For example, the TrueType font Albany AMT is listed as <ttf> Albany AMT and
the Type1 font Times is listed as <at1> Times. The three-character tag
enclosed in angle brackets distinguishes between different types of
fonts with the same name, such as <ttf> Symbol and a Symbol font that resides
on a physical printer. Fonts that do not have a <ttf> tag or an <at1> tag reside
in the printer's memory. To ensure that you are using SAS fonts when you
specify a font that has different types, use only the font syntax with the angle
brackets. For example, you can specify the Symbol font as follows: <ttf>
Symbol.
You can also specify attributes such as style or weight in the TITLE statement by
using the forward slash (/) as a delimiter.
Title1 f="Albany AMT/Italic/Bold" "Text in Bold Italic Albany AMT";
For ODS templates, the attributes are specified after the text size parameter. See
“Specifying a Font with PROC PRINT and a User-Defined ODS Template” on page
324 for a complete example.
Note: You should use the <ttf> tag only when it is necessary (for example, to
distinguish between a TrueType font and another type of font with the same name).
You can also use the SYSPRINTFONT= system option to specify the weight and
size of a font. For example, the following code specifies an Arial font that uses bold
face, is italicized, and has a size of 14 points.
options sysprintfont=("Arial" bold italic 14);
You can override the default font by explicit font specifications or ODS styles.
For more information, see the “SYSPRINTFONT= System Option” in SAS System
Options: Reference.
Table 15.17 Universal Printers That Support Font Slanting and Emboldening
GIF PCL
TIFF PNG
SVG*
*Font slanting and emboldening is not supported on Internet Explorer and Firefox.
However, it is supported on Chrome, Opera, and Safari browsers.
The following universal printers do not support font slanting and emboldening:
• PDF
• EMF
• PostScript
to font slanting. To change the slant factor for all universal printers, follow these
steps:
• Open the Registry Editor by entering regedit in the command bar or by selecting
Tools → Options → Registry Editor from the Application Toolbar.
• In the SAS Registry panel of the Registry Editor window, expand the
CORE/PRINTING/FREETYPE folder.
• Right-click and choose New Double Value from the pop-up menu.
• Enter SlantFactor in the Value Name field of the Edit Double Value window.
• Enter the desired slant factor value in the Value Data field. The default
value is .25.
To change the slant factor for a specific font, perform the following tasks using the
SAS Registry Editor:
• In the left SAS Registry panel of the SAS Registry Editor, expand the
CORE/PRINTING/FREETYPE/FONTS/<ttf>font-name/Attributes folder.
• Right-click and choose New Double Value from the pop-up menu.
• Enter SlantFactor in the Value Name field of the Edit Double Value window.
• Enter the desired slant factor value in the Value Data field. The default
value is .25.
Figure 15.27 PDF Output Using PROC PRINT and a User-Defined ODS Template
proc gslide;
title1 "Printing Unicode code points";
title2 "double exclamation mark" f="Arial Unicode MS/Unicode" h=2 '203C'x;
title3 "French Franc symbol " f="Arial Unicode MS/Unicode" h=3 '20A3'x;
title4 "Lira symbol " f="Arial Unicode MS/Unicode" h=3 '20A4'x;
title4 "Rupee symbol " f="Arial Unicode MS/Unicode" h=3 '20A8'x;
title5 "Euro symbol " f="Arial Unicode MS/Unicode" h=3 '20Ac'x;
title6 "Fraction, one third " f="Arial Unicode MS/Unicode" h=3 '2153'x;
title7 "Fraction, one fifth " f="Arial Unicode MS/Unicode" h=3 '2155'x;
title8 "Fraction one eighth " f="Arial Unicode MS/Unicode" h=3 '215B'x;
title9 "Black Florette " f="Arial Unicode MS/Unicode" h=3 '273F'x;
title10 "Black Star " f="Arial Unicode MS/Unicode" h=3 '2605'x;
run;
quit;
ods printer close;
Example 2
The following example produces an output file, utf8.gif. It must be run with a
UTF-8 server and requires a TrueType font that contains the characters that are
used. The table of character names and the associated codes can be found on the
Unicode website at http://www.unicode.org/charts.
proc template;
define style utf8_style / store = SASUSER.TEMPLAT;
parent = styles.printer;
style fonts /
'docFont' = ("Arial Unicode MS", 12pt)
'headingFont' = ("Arial Unicode MS", 10pt, bold)
'headingEmphasisFont' = ("Arial Unicode MS", 10pt, bold italic)
'TitleFont' = ("Arial Unicode MS", 12pt, italic bold)
'TitleFont2' = ("Arial Unicode MS", 11pt, italic bold)
'FixedFont' = ("Times New Roman Uni", 11pt)
'BatchFixedFont' = ("Times New Roman Uni", 6pt)
'FixedHeadingFont' = ("Times New Roman Uni", 9pt, bold)
'FixedStrongFont' = ("Times New Roman Uni", 9pt, bold)
'FixedEmphasisFont' = ("Times New Roman Uni", 9pt, italic)
'EmphasisFont' = ("Arial Unicode MS", 10pt, italic)
'StrongFont' = ("Arial Unicode MS", 10pt, bold);
end;
run;
%macro utf8chr(ucs2);
kcvt(&ucs2, 'ucs2b', 'utf8');
%mend utf8chr;
%macro namechar(name, char);
name="&name"; code=upcase("&char"); char=%utf8chr("&char"x); output;
%mend namechar;
data uft8char;
length name $40;
%namechar(Registered Sign, 00AE);
%namechar(Cent Sign, 00A2);
%namechar(Pound Sign, 00A3);
%namechar(Currency Sign, 00A4);
%namechar(Yen Sign, 00A5);
%namechar(Rupee Sign, 20A8);
%namechar(Euro Sign, 20Ac);
%namechar(Dong Sign, 20Ab);
%namechar(Euro-currency Sign, 20A0);
%namechar(Colon Sign, 20A1);
%namechar(Cruzeiro Sign, 20A2);
%namechar(French Franc Sign, 20A3);
%namechar(Lira Sign, 20A4);
run;
This statement causes the templates to be written to the WORK library where the
server has Read and Write access.
Universal Printing supports three EMF metafile formats: EMF, EMFPlus, and
EMFDual. The following table shows the EMF Universal Printers and their
corresponding EMF metafile formats:
EMF Universal Printer | Type of EMF Metafile Format | Description
EMFDual and SASEMF Universal Printers support TrueType and Type1 fonts. The
EMF Universal Printer supports only TrueType fonts. Use the FONTREG procedure
to register fonts. If you specify another type of font when the Universal Printer is
EMF, the font is mapped to the TrueType font.
Compression and font embedding are not supported.
For a description of the EMF printers, submit the following QDEVICE procedure and
view the output in the SAS log:
proc qdevice;
printer emf-universal-printer;
run;
See Also
“Color Support for Universal Printers” on page 273
The GIF printer does not support multiple page documents. If a procedure creates
multiple pages or if more than one procedure is used in the code for ODS PRINTER
output, only the first page is viewable.
See Also
• “Color Support for Universal Printers” on page 273
• “Creating Animated GIF Images and SVG Documents” on page 375
For a description of the PCL printers, you can either view the printers in the SAS
registry or submit the following QDEVICE procedure and view the output in the SAS
log:
proc qdevice;
printer pcl-printer-name;
run;
See Also
“Color Support for Universal Printers” on page 273
Using the same sample code, you can create a PCL file by replacing the ODS PCL
statement with the ODS PRINTER statement:
• ods html close;
  ods printer printer=pcl5c;
• options printerpath=pcl5c;
Creating PDF Files Using Universal Printing 335
SAS creates the file sasprt.pcl in the current directory. PCL files can be viewed after
they are created by sending the output to a Hewlett-Packard LaserJet printer or a
Hewlett-Packard Color LaserJet printer. PCL files can also be viewed on a monitor
with some software applications.
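For reference, a complete program that uses the ODS PRINTER approach might look like the following sketch. The output filename class.pcl and the use of the Sashelp.Class data set are assumptions made for illustration; without the FILE= option, SAS writes to sasprt.pcl as described above.

```sas
ods html close;

/* Name the PCL5c Universal Printer directly in the ODS PRINTER statement. */
ods printer printer=pcl5c file='class.pcl';

proc print data=sashelp.class;
run;

/* Closing the destination finishes writing the PCL file. */
ods printer close;
ods html;
```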
Note: If you have SAS/GRAPH installed, your PDF output can contain links and
pop-up text boxes. For more information, see “Enhancing Web Presentations with
Chart Descriptions, Data Tips, and Drill-Down Functionality” in SAS/GRAPH:
Reference.
Universal Printer is specified as the value of the PRINTERPATH= system option and
the ODS PRINTER statement creates the PDF:
• ods html close;
  ods pdf;
• options printerpath=pdf;
  ods html close;
  ods printer;
SAS creates a file sasprt.pdf in the current directory and opens the PDF in the
Results Viewer window.
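The PRINTERPATH= approach can be sketched end to end as follows. The filename class.pdf and the Sashelp.Class data set are assumptions made for illustration; omit FILE= to get the default sasprt.pdf.

```sas
/* Select the PDF Universal Printer for the ODS PRINTER destination. */
options printerpath=pdf;

ods html close;
ods printer file='class.pdf';

proc print data=sashelp.class;
run;

/* The PDF file is complete only after the destination is closed. */
ods printer close;
ods html;
```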
See Also
“Color Support for Universal Printers” on page 273
run;
ods printer close;
ods html;
The following output is the PNG graphic displayed in Windows Picture and Fax
Viewer:
• Netscape Navigator 6
For more information about browsers and viewers that support PNG images, see
the PNG web pages at www.libpng.org.
Creating PostScript Files Using Universal Printing 341
PostScript output supports transparent GIF files. You can use Ghostview to view
PostScript files. If you have Acrobat Distiller installed, you can distill the PostScript
file to create a PDF file that you can view in Adobe Reader.
See Also
“Color Support for Universal Printers” on page 273
ods ps close;
ods html;
• options printerpath=ps;
ods html close;
ods printer;
ods ps close;
ods html;
Here is the distilled PostScript file in PDF output:
Creating SVG (Scalable Vector Graphics) Files Using Universal Printing 343
Most often in SAS, the SVG Universal Printers and device drivers are used to create
graphs. Graphs can be created by using ODS Graphics or SAS/GRAPH. You can
also use the SVG Universal Printers to show tables or reports that you create as
SVG documents.
Several ODS destinations (EPUB, HTML, HTML5, LISTING, and PRINTER
destinations) can be used to create SVG documents. SVG is the default Universal
Printer and device driver for the ODS HTML5 destination.
SVG documents can be stand-alone files or integrated within an HTML5 or EPUB
file. A stand-alone SVG document can be referenced as a link target, referenced as
an embedded file in an HTML document, or referenced as a CSS2 or XSL property.
For information about embedding SVG documents in web pages, see the topic on
using SVG documents in web pages in the SVG 1.1 specification on the W3 SVG
website http://www.w3.org/TR/SVG.
Multi-page SVG documents can be animated in Base SAS and SAS/GRAPH. When
you create animated SVG documents in Base SAS using Universal Printing without
specifying any ODS Graphics procedures, the animated SVG documents appear as
a slide show or an animated PowerPoint presentation. For more information, see
“Creating Animated GIF Images and SVG Documents” on page 375.
If you have SAS/GRAPH installed, your SVG documents can contain links and pop-
up text boxes.
The information provided here is limited to creating SVG documents using Universal
Printers in Base SAS and ODS Graphics. For more information about creating SVG
files in SAS/GRAPH, see “Enhancing Web Presentations with Chart Descriptions,
Data Tips, and Drill-Down Functionality” in SAS/GRAPH: Reference.
For detailed information about the SVG standard, see the W3 documentation at
http://www.w3.org/TR/SVG.
SVG Terminology
SVG canvas
the space upon which the SVG document is rendered.
viewBox
specifies the coordinate system and the area of the SVG document that is visible
in the viewport.
viewport
a finite rectangular space within the SVG canvas where an SVG document is
rendered. In SAS, the viewport is determined by the value of the PAPERSIZE=
system option for a scalable viewport and by the SVGWIDTH= and
SVGHEIGHT= system options for a static viewport.
viewport coordinate system or viewport space
the starting X and Y coordinates and the width and height values of the viewport.
user coordinate system or user space
the starting X and Y coordinates and the width and height values of the area of
the document to display in the viewport.
user units
one unit of measurement as defined in your environment's
coordinate system. In many cases, the coordinate system uses pixels. Check
with your system administrator to determine the unit of measure that is used in
your environment.
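The difference between a scalable viewport and a static viewport can be sketched with the system options named in the viewport definition. This is an illustration under assumptions: the filename static.svg, the sizes, and the Sashelp.Class data set are arbitrary choices.

```sas
/* Static viewport: the size is fixed by SVGWIDTH= and SVGHEIGHT=. */
options printerpath=svg svgwidth="400px" svgheight="300px";

ods html close;
ods printer file='static.svg';

proc print data=sashelp.class(obs=5);
run;

ods printer close;
ods html;

/* Reset both options to null so that later SVG output scales */
/* to the viewport again (size then follows PAPERSIZE=).      */
options svgwidth="" svgheight="";
```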
* When you use this printer in SAS/GRAPH, you can create pop-up data tips. For more information,
see “Data Tips for Web Presentations” in SAS/GRAPH: Reference.
SVG prototypes for creating printers are available in the SAS Registry under
CORE\PRINTING\PROTOTYPES. You can define your own SVG printer using the
PRTDEF procedure. For more information, see “PRTDEF Procedure” in Base SAS
Procedures Guide and “Managing Universal Printers Using the PRTDEF Procedure”
on page 302.
For a description of an SVG printer, you can either view the printer in the SAS
registry or submit the following QDEVICE procedure and view the output in the SAS
log:
proc qdevice;
printer svg-printer-name;
run;
See Also
• “Color Support for Universal Printers” on page 273
• “Creating Animated GIF Images and SVG Documents” on page 375
Alternatively, you can specify the SVG printer in the ODS PRINTER statement and
eliminate the OPTIONS statement, as shown below.
ods html close;
ods printer printer=svg;
To create SVG graphs using SAS/GRAPH, you can use the ODS LISTING
statement:
You can create SVG graphs for ODS Graphics using these statements:
• ods html5 options (svg_mode="inline");
  ods graphics /imagefmt=svg;
• options printerpath=svg;
  ods html;
• ods listing;
  ods graphics /imagefmt=svg;
• Using SAS/GRAPH:
  ods listing;
  goptions dev=SVG;
SAS has several system options that enable you to modify various aspects of your
SVG document. Here are some SVG document traits:
• a specific SVG Universal Printer
By using the NEWFILE option in the ODS PRINTER statement, you can create an
SVG document for the output from each procedure or DATA step.
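As a sketch of the NEWFILE option, the following program asks for a new SVG document at each procedure boundary. The base filename report.svg is hypothetical; SAS derives the names of the additional files from it.

```sas
ods html close;

/* NEWFILE=PROC starts a new SVG document for each procedure. */
ods printer printer=svg file='report.svg' newfile=proc;

proc print data=sashelp.class(obs=5);
run;

proc means data=sashelp.class;
run;

ods printer close;
ods html;
```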
For more information, see the following language elements:
• “PRINTERPATH= System Option” in SAS System Options: Reference
ODS HTML | graphs created by ODS Graphics and SAS/GRAPH | an SVG file for each graph and an HTML file
ODS HTML5 svg_mode='embed' * | graphs created by ODS Graphics and SAS/GRAPH | an SVG file for each graph and an HTML file
ODS LISTING | graphs created by ODS Graphics and SAS/GRAPH | an SVG file for each graph
ODS PRINTER | all output created by the DATA step and SAS procedures | one SVG file for all output created between ODS PRINTER and ODS PRINTER CLOSE
Note: Graphs that are created by ODS Graphics do not use options that are
specified by the GOPTIONS SAS/GRAPH statement. The GOPTIONS statement is
valid only for SAS/GRAPH.
The default filename for an SVG file that was created with an ODS Graphics
procedure is prefixed with the procedure name. For example, the default filename
for PROC SGPLOT output could be sgplot01.svg. The default filename for an SVG
file that was created using ODS PRINTER is sasprt.svg.
eSVG Viewer and IDE eSVG Viewer for PC, PDA, Mobile
TinyLine TinyLine
• If you select View → Page Style → No Style, all graphs appear as a black
rectangle.
• Firefox does not support font embedding. To avoid font mapping problems in
your SVG document, you can set the NOFONTEMBEDDING system option. If
the FONTEMBEDDING option is set when an SVG document is created and the
SVG document is subsequently viewed in Firefox, Firefox uses the default font
setting that is defined on the Contents tab in the Firefox Options dialog box.
SVGZ documents that you create for ODS HTML5 output can be viewed only with
the Google Chrome or Opera web browsers.
• embeds the base64-encoded PNG image into the SVG document
In the SVG document, the <image> element has an xlink attribute that begins as
follows:
xlink:href="data:image/png;base64,
run;
ods printer close;
ods html;
The SVG Universal Printer creates separate PNG files when the SVG printer that
you are using has the Images Embedded registry setting set to 0.
To set this registry setting, do the following:
4 Click OK.
Set the title that appears in the title bar of the SVG document. | SVGTITLE=
By default, the magnify tool is not included in the SVG document. You must explicitly
set the SVGMAGNIFYBUTTON system option. You can use this OPTIONS
statement:
options svgmagnifybutton;
In this example, no specific SVG system option values were set to size the SVG
document. Therefore, the viewBox is the default size specified by the PAPERSIZE=
system option. The SVG document scales to the viewport because no value was
specified for the SVGWIDTH= and SVGHEIGHT= system options. The following is
the <svg> element that SAS creates:
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xml:space="preserve" baseProfile="full" version="1.1"
id="SVGMain" onload='SVGMain_Init("SVGMain")'
viewBox="-1 -1 801 601">
SAS creates a single SVG document named sasprt.svg and stores it in a specific
location, depending on your operating environment. Under Windows, the file is
stored in the current directory. Under UNIX, the file is stored in your home directory.
Under z/OS, the file is stored as a z/OS UNIX System Services Hierarchical File
System (HFS) file, or as a z/OS data set. If the SVG file is written to a z/OS data set,
it is written to PDSE library userid.SASPRT.SVG. You can use the FILE= option in
the ODS PRINTER statement to specify a different filename.
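A minimal sketch of the FILE= option follows. The filename class_plot.svg and the plotted variables are assumptions made for illustration; the Sashelp.Class data set is the one used in the figure described below.

```sas
ods html close;

/* FILE= overrides the default sasprt.svg name and location. */
ods printer printer=svg file='class_plot.svg';

proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;

ods printer close;
ods html;
```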
The following figure is an SVG file that uses the Adobe Acrobat SVG plug-in for
Microsoft Internet Explorer. This file was created by using the SGPLOT procedure to
plot the Sashelp.Class data set.
When you use the SVG, SVGnotip, SVGt, SVGView, and SVGZ Universal Printers,
SAS creates a single SVG document. Depending on the size of the SVG document,
the browser might display the complete SVG document. Check the documentation
for your browser to determine whether your browser has controls for viewing SVG
documents. In the Adobe SVG Viewer plug-in for Internet Explorer, you can press
the Alt key and the left mouse button to pan and move to different pages in a
continuous, multi-page SVG document.
Figure 15.37 First Page of a Multi-page SVG File with Navigation Controls
To display an index of all pages in the SVG file, select the Index button. To go to a
specific page from the index, select the thumbnail image of the page.
You can hide the control buttons by selecting the SVG Controls button. The tooltip
is displayed when the cursor is over the control. To show the navigation controls
again, click in the top area of the output when you see the tooltip Click to toggle
SVG control button bar. This is useful when you want to print a page in the
document without the SVG controls.
Figure 15.39 A Multi-page SVG File That Hides the Navigational Controls
For information about the NEWFILE= option, see “ODS PRINTER Statement” in
SAS Output Delivery System: User’s Guide.
run;
%let name=annomap;
filename odsout '.';
goptions reset=all;
/* Close the HTML and LISTING destinations for map creation. */
ods html close;
ods listing close;
options printerpath=svgt nodate nonumber;
ods printer file='annomap.svg' ;
goptions border;
quit;
/* you must use the default ods style, for transparency to work */
quit;
• Create an SVG document using the ODS PRINTER statement and the
PRINTERPATH=SVG option. Then, embed the SVG document in an HTML file
using the <EMBED> element.
You can integrate an SVG graph in an HTML file by using the ODS HTML5
SVG_MODE='INLINE' statement.
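The inline approach can be sketched as follows. The filename inline_plot.html and the plotted variables are assumptions made for illustration. With SVG_MODE='INLINE', the SVG markup is written into the HTML file itself rather than into a separate SVG file.

```sas
ods html close;

/* Write the graph as inline SVG inside a single HTML5 file. */
ods html5 file='inline_plot.html' options(svg_mode='inline');

proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;

ods html5 close;
ods html;
```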
For information about creating SVG documents in SAS/GRAPH, see “Generating
SVG, PNG, GIF, and TIFF Graphics” in SAS/GRAPH: Reference.
</head>
<body>
<p>Linking to an SVG document:</p>
<a href="sasprt.svg">SGPlot Graph</a>
<p>Embed the SVG document:</p>
<embed src="sasprt.svg" type="image/svg+xml" height="400" width="300">
</body>
</html>
Figure 15.41 An HTML Document Displaying a Link to a Stand-alone SVG Document and
an Embedded SVG Document
The viewport has a height of 400 pixels and a width of 300 pixels. Because the
default SVG system option values were used, the SVG document scales to 100% of
the viewport.
If you click the SGPlot Graph link, the browser displays the following SVG
document:
The viewport is the area in the browser window that can be displayed and the SVG
document scales to 100% of the viewport.
The following example uses the ODS HTML5 destination to embed an SVG graph in
an HTML file:
ods html close;
ods html5 options(svg_mode="embed");
The default svg_mode for the HTML5 destination is INLINE. In order to embed the
SVG graph, you must specify SVG_MODE="EMBED" as an option in the ODS
HTML5 statement. Here is the <EMBED> element in the HTML file:
<embed style="height: 480px; width: 640px" src="SGPLOT.svg" type="image/svg+xml"/>
For a description of the TIFF printer, you can either view the printer in the SAS
registry or submit the following QDEVICE procedure and view the output in the SAS
log:
proc qdevice;
printer tiff;
run;
See Also
“Color Support for Universal Printers” on page 273
• for SVG documents only, whether to immediately start the animation when the
document is loaded in the web page
• for SVG documents only, whether a frame fades in and out of view and if during
the fade-in and fade-out time, the frames are overlaid or played sequentially
SAS/GRAPH is required to create animated files for the ODS HTML5, ODS HTML,
and the ODS LISTING destinations. For more information, see SAS/GRAPH:
Reference.
You set the options using the OPTIONS statement before opening the ODS
PRINTER destination:
options printerpath=gif animation=start animduration=5 animloop=yes noanimoverlay;
ods printer file='myfile.gif';
In this OPTIONS statement, the ANIMATION option starts creating the animation
file, and the ANIMDURATION option specifies that each frame is held for 5 seconds. The
ANIMLOOP option specifies that the animation loop repeats continuously. The
NOANIMOVERLAY option specifies that each frame is played sequentially.
When the PRINTERPATH= option is set to SVG, you can use the SVG animation
options to configure the fade-in and fade-out attributes and the autoplay attribute.
The animation options that begin with SVG do not affect GIF images.
After you set the options and open the PRINTER destination, proceed with your
SAS code to create each frame in your file. The animation frame is created when
you run SAS procedures. You can use animation options in between procedures to
change the duration that a frame is held in view and the fade-in and fade-out times.
For example, you can hold a particular frame in view for a longer period of time. You
would use the OPTIONS ANIMDURATION= statement before a procedure to
increase the time that the frame is held in view. Specify ANIMATION=STOP to end
the creation of the animation file. Use the ODS PRINTER CLOSE statement to
close the file.
376 Chapter 15 / Printing with SAS
TIP Be sure to specify ANIMATION=STOP after you create the frames for your
animation file. If ANIMATION=START remains set, you might create an animation
file unintentionally for subsequent procedure statements.
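The sequence described above can be sketched as follows. This is a minimal illustration, not the manual's own example: the Sashelp.Class data set, the two SGPLOT steps, and the duration values are assumptions chosen only to produce two frames.

```sas
/* Sketch: a two-frame animated GIF. The data set and plots are placeholders. */
options printerpath=gif animation=start animduration=2 animloop=yes noanimoverlay;
ods printer file='myfile.gif';

proc sgplot data=sashelp.class;   /* first frame, held for 2 seconds */
   vbar sex;
run;

options animduration=5;           /* hold the next frame in view longer */

proc sgplot data=sashelp.class;   /* second frame, held for 5 seconds */
   vbar age;
run;

options animation=stop;           /* end the creation of the animation file */
ods printer close;                /* close the file */
```

Setting ANIMATION=STOP before closing the destination follows the TIP above, so that later procedures do not add frames unintentionally.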
To embed the file in a web page or to create a link to the file from a web page, see
“SVG Documents in HTML Files” on page 367.
When an animated file is displayed in a browser, the animation control buttons can
be used to reset, pause, and play the animation. You can toggle SVG Controls to
show or hide the control buttons. Here is one frame of an animated SVG document
with the control buttons:
If you are creating SVG files that will be viewed on an iPad, a best practice is to use
the SVGVIEW Universal Printer for optimal sizing.
You can view animated SVG files using Internet Explorer 9 or later. You can view
animated GIF files in releases prior to Internet Explorer 9.
Description                                    Option Name      Valid Universal Printers
Specifies the amount of time that each frame   ANIMDURATION=    GIF and SVG
in an animation is held in view.
select Samples & SAS Notes. Search for SVG animation. In the search results,
look for the program seasons.sas.
Create a data set for each quarter for the years 1993 and 1994. Each DATA step
uses the WHERE= data set option to create a data set by year and quarter. The KEEP=
data set option in the SET statement specifies the variables that are in each of the
data sets.
data work.q1y93 (where=(year=1993 and quarter=1));
set sashelp.prdsale(keep=Actual Country Product Quarter Year);
run;
proc template;
   define style winter;
      parent = Styles.meadow;
      style body from body;
   end;
quit;
proc template;
   define style spring;
      parent = Styles.meadow;
      style body from body;
   end;
quit;
proc template;
   define style summer;
      parent = Styles.meadow;
      style body from body;
   end;
quit;
proc template;
   define style fall;
      parent = Styles.meadow;
      style body from body;
   end;
quit;
Create an SVG document for each quarter, using the SGPLOT procedure. The %LET
statement creates a macro variable that is used to name the SVG file. The first ODS
PRINTER statement opens the PRINTER destination and creates the SVG file. After
the first ODS PRINTER statement, an ODS PRINTER statement is used before each
procedure to specify the style that sets the seasonal colors for the chart. The TITLE
statement specifies the season and the year that is reported. Each SGPLOT
procedure plots the sales for each country by using identical VBAR and YAXIS
options. The vbar country / response=actual group=product; statement creates a
vertical bar for each country. Each vertical bar contains sales data for each product.
The visual attributes of each product in the vertical bar are determined automatically
by the SGPLOT procedure. The YAXIS statement specifies the values to plot for the
Y axis.
%let name=seasons;
options animation=stop;
ods printer close;
Here are the charts that were created for 1993 by season:
PART 2
Windowing Environment
Chapter 16
Introduction to the SAS Windowing Environment . . . . . . . . . . . . . . . . . . . 385
Chapter 17
Managing Your Data in the SAS Windowing Environment . . . . . . . . . . . . 405
Chapter 16
Introduction to the SAS Windowing Environment
Command:
Enter EXPLORER in the command line and press Enter.
Menu:
Select View → Explorer.
Log Window
Results Window
Output Window
Menu:
Select View → Output.
Several default values are selected in the Results tab. Under HTML, Create HTML
is the default output type, and HTMLBlue is the default output style. Use ODS
Graphics is also selected by default. When the Use ODS Graphics box is checked,
you are able to automatically generate graphs when running procedures that
support ODS graphics. Checking or unchecking this box enables you to turn on or
turn off ODS graphics when you invoke SAS.
To produce LISTING output, check the Create listing box under Listing. If you
deselect Create HTML and leave the Create listing box checked, your program
produces listing output only.
Menus in SAS
Menus contain lists of options that you can select.
The following example shows the menu options that are available when you select
Help from the menu bar:
Menu choices change as you change the windows that you are using. For example,
if you select Explorer from the View menu, and then select View again, the menu
lists the View options that are available when the Explorer window is active.
The following display shows the View menu when the Explorer window is active:
If you select Program Editor from the View menu, and then select View again, the
menu lists the View options that are available when the Program Editor window is
active.
The following display shows the View menu when the Program Editor window is
active:
Navigating in the SAS Windowing Environment 397
Figure 16.10 View Options When the Program Editor Window Is Active
You can also access menus when you right-click an item. For example, when you
select View → Explorer and then right-click Libraries in the Explorer window, the
following menu appears:
The menu remains visible until you make a selection from the menu or until you click
an area outside of the menu area.
Toolbars in SAS
A toolbar displays a block of window buttons or icons. When you click items in the
toolbar, a function or an action is started. For example, clicking a picture of a printer
in a toolbar starts a print process. The toolbar displays icons for many of the actions
that you perform most often in a particular window.
z/OS Specifics: SAS in the z/OS operating environment does not have a toolbar.
See SAS Companion for z/OS for more information.
The toolbar that you see depends on which window is active. For example, when
the Program Editor window is active, the following toolbar is displayed:
Figure 16.12 Example of the SAS Toolbar When the Enhanced Editor Window Is Active
When you position your cursor at one of the items in the toolbar, a text window
appears that identifies the purpose of the icon.
Figure 16.15 Results of Using Help in the Command Line of a SAS Session
Related items are displayed, along with the documents that contain the information.
Click a topic to view Help for that item.
n Send feedback.
About SAS®9
provides version and release information about SAS.
SAS/ACCESS ACCESS
Note: Some additional SAS windows that are specific to your operating
environment might also be available. For more information, see the SAS
documentation for your operating environment.
Chapter 17
Managing Your Data in the SAS Windowing Environment
n To view data sets in a list, select List from the View menu.
The following example uses large icons to show the contents of Sashelp:
If you select the Sashelp library and then select View → Details from the menu bar,
the contents of the Sashelp library are displayed, along with the size and type of the
data sets:
If you double-click a table in this list, the data set opens. The VIEWTABLE window,
which is a SAS table viewer and editor, appears and is populated with the data from
the table.
2 In the File Shortcut Assignment window that appears, enter the name of the
fileref that you want to use in the Name field.
3 Enter the full pathname for the file in the File field.
The following display shows the File Shortcut Assignment window:
Managing Data with SAS Explorer 409
By default, filerefs that you create are temporary and can be used in the current
SAS session only. Selecting Enable at Start-up from the File Shortcut Assignment
window, however, assigns the fileref to the file whenever you start a new SAS
session.
3 Select Rename from the menu, and enter the new name of the data set.
4 Click OK.
3 From the menu that is displayed, choose Copy to copy a data set to another
library or catalog, or choose Duplicate to copy the data set to the same library or
catalog.
a Click the library in the left pane of SAS Explorer to select the library or
catalog into which the data set will be copied.
b In the right pane, right-click the mouse and select Paste from the menu that
appears.
A copy of the data set now resides in the new directory.
5 If you choose Duplicate, then the Duplicate window appears. In the Duplicate
window, SAS appends _copy to the data set name (for example, data-set-
name_copy).
Do one of the following:
n Keep the name and click OK.
n Create another name for your duplicated data set and click OK.
4 In the Description field of the General tab, you can enter a description of the
data set. To save the description, click OK.
5 Select other tabs to display additional information about the data set.
Overview of VIEWTABLE
To manipulate data interactively, you can use the SAS table editor, VIEWTABLE. In
the VIEWTABLE window, you can create a new table, and view or edit an existing
table.
Here are the steps for using the SAS Explorer window to open a SAS data set in a
VIEWTABLE window:
1 Open SAS Explorer and double-click on the icon for the library that contains the
target data set.
3 The VIEWTABLE window should appear, populated with data from the data set.
4 Use the scroll bar on the VIEWTABLE window to view all of the data.
Working with VIEWTABLE 413
1 Specify the VIEWTABLE command in the SAS Display Manager command line
using the following syntax:
VIEWTABLE data-set-name <-options>
2 Here is an example:
viewtable cars
1 Open a data set in VIEWTABLE (to access the VIEWTABLE pop-up menu,
you must have an active VIEWTABLE window open).
3 Select View → Column Names or View → Column Labels from the drop-down
View menu.
4 Once this selection is made, the opened table, and all tables that are
subsequently opened, will display table headers based on this setting in the
VIEWTABLE pop-up menu. When you exit VIEWTABLE, or exit SAS, the
preference for column labels or column names is saved. When you open
VIEWTABLE or invoke SAS again, the preference that you chose is
automatically selected.
This feature is available in SAS 9.4M1 and later releases.
n Using the VIEWTABLE command to change the way table headers are displayed
when a table is opened:
2 Here is an example:
viewtable cars colheading=names
1 With the SAS Explorer window active, select Tools → Options → Explorer to
open the Explorer Options window.
3 Select Table in the list of registered types, and then click Edit to open the
TABLE Options dialog box.
4 Select the &Open Action Command in the list of actions, and then click Edit to
open the Edit Action dialog box.
5 In the Edit Action dialog box, add ‑COLHEADING=<value> to the end of the
VIEWTABLE command:
VIEWTABLE %8b.'%s'.DATA colheading=names
6 When you are finished making changes, click OK three times to exit all of the
open dialog boxes. From this point on, when you use the SAS Explorer Window
to open the VIEWTABLE window, SAS displays the table headers according to
what you specified in this SAS Explorer dialog box.
Note: These steps only affect how tables are displayed when they are opened from
the SAS Explorer Window (either by double-clicking on the icon or by right-clicking
on the icon and selecting "Open"). They do not affect how tables are opened when
you use the VIEWTABLE command to open a table.
1 Select Tools → Options → Keys from the SAS menu. The Keys window
appears.
2 In the Keys window, select the F-Key that you want to assign to the VIEWTABLE
command and place the cursor in the Definition field of the selected F-Key.
3 Type the VIEWTABLE command with the desired option. Here is an example:
VIEWTABLE %8b.'%s'.DATA colheading=names
For more information about using VIEWTABLE, see Doing More with the SAS®
Display Manager: From Editor to ViewTable - Options and Tools You Should Know
(PDF).
1 Right-click the heading for the column that you want to change, and then select
Column Attributes from the menu.
2 In the Label field of the Column Attributes window, enter the new name of the
column heading and then click Apply.
In this example, the Name heading is replaced by the Name of Player label.
When you press Apply, the column heading in VIEWTABLE changes to the new
name.
In this example, the label was changed to Name of Player.
1 Click a column heading for the column that you want to move.
In this example, if you click the heading Name, and then drag and drop Name
onto Team at the End of 1986, the Name column moves to the right of the
Team at the End of 1986 column.
1 Right-click the heading of the column on which you want to sort, and select Sort
from the menu.
3 When the following warning message appears, click Yes to create a sorted copy
of the table.
Note: If you selected Edit Mode after opening the table and clicking a data cell,
this window does not appear. SAS updates the original table.
4 In the Sort window, enter the name of the new sorted table.
In this example, the name of the sorted table is BaseballStatisticsList.
5 Click OK.
The rows in the new table are sorted in ascending order by values of Team at
the End of 1986.
1 With the table open, select Edit → Edit Mode.
2 Click a cell in the table, and the value in the cell is highlighted.
In this example, the third cell in the fifth row is highlighted.
In this example, the cell has been updated with a new value for Times at Bat in
1986.
Subsetting Data By Using the WHERE Expression 421
5 When prompted to save pending changes to the table, click Yes to save your
changes or No to disregard changes.
Note: If you make changes in one row and then edit cells in another row, the
changes in the first row are automatically saved. When you select File → Close, you
are prompted to save the pending changes to the second row.
1 In the Explorer window, open a library and double-click the table that you want to
subset.
In this example, the Cars data table is selected.
2 Right-click any table cell that is not a heading and select Where from the menu.
3 In the Available Columns list, select a column, and then select an operator from
the Operators menu.
In this example, Make is selected from the Available Columns list, and EQ
(equal to) is selected from the Operators menu. Note that the WHERE
expression is being built in the Where box at the bottom of the window.
4 In the Available Columns list, select another value to complete the WHERE
expression.
In this example, scroll to the bottom of the Available Columns window and
select <LOOKUP distinct values>.
Note that the complete WHERE expression appears in the Where box at the
bottom of the window.
In this example, VIEWTABLE displays only rows where the value of Make is
Honda.
The VIEWTABLE window removes any existing subsets of data that were
created with the WHERE expression, and displays all of the rows of the table.
Exporting a Subset of Data 425
Export Data
To export data, follow these steps:
2 Select the SAS data set from which you want to export data.
In this example, Sashelp is selected as the library, and Cars is the member
name.
3 Click Next and the Export Wizard - Select export type window appears.
4 Select the type of data source to which you want to export files.
6 In the Workbook field, enter the name of the workbook that will contain the
exported file and then click OK.
In this example, Myworkbook is entered as the name of the workbook.
7 When the Export Wizard - Select table window appears, enter a name for the
table that you are exporting.
In this example, Mytable is the table name.
8 Click Next.
9 If you want SAS to create a file of PROC EXPORT statements for later use, then
enter the name of the file that will contain the SAS statements.
In this example, PROC EXPORT statements are saved to the file. The Replace
file if it exists box is checked.
2 Select the type of file that you are importing by selecting a data source from the
Select a data source menu.
Note that Standard data source is selected by default. In this example,
Microsoft Excel Workbook is selected.
4 In the Connect to MS Excel window, enter the pathname of the file that you want
to import, and then click OK.
Importing Data into a Table 429
5 In the Import Wizard - Select table window, enter the name of the table that you
want to import.
7 In the Import Wizard - Select library and member window, enter a location in
which to store the imported file.
In this example, Work is selected as the library, and Book1 is selected as the
member name.
9 If you want SAS to create a file of PROC IMPORT statements for later use, then
enter the name of a file that will contain the SAS statements.
PART 3
Chapter 18
Introduction to SAS Cloud Analytic Services . . . . . . . . . . . . . . . . . . . . . . . 433
Chapter 19
SAS Language Support for CAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Chapter 18
Introduction to SAS Cloud Analytic Services
Chapter 19
SAS Language Support for CAS
Figure 19.1 Example Documentation Syntax Page That Shows a “Restriction” to Indicate That the Language
Element Is Not Supported in CAS
SAS language elements that are supported in CAS display “CAS” in the Categories
field of the language elements’ syntax page:
Figure 19.2 Example Documentation Syntax Page That Shows Support for CAS in the Categories Field
Figure 19.3 Example Documentation Category Page Showing CAS-supported Language Elements
Here is a list of category tables for each of the SAS language element types:
n DATA Step Statements By Category in SAS DATA Step Statements: Reference.
n CAS Actions in SAS Viya Actions and Action Sets by Name and Product, SAS
Viya: System Programming Guide and CAS DATA Step Action in SAS Cloud
Analytic Services: DATA Step Programming
Note: A SAS Viya Visual Analytics license is required for access to SAS Cloud
Analytic Services.
PART 4
Chapter 20
DATA Step Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Chapter 21
Reading Raw Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
Chapter 22
BY-Group Processing in the DATA Step . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
Chapter 23
Reading, Combining, and Modifying SAS Data Sets . . . . . . . . . . . . . . . . . 509
Chapter 24
Using DATA Step Component Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Chapter 25
Array Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
Chapter 20
DATA Step Processing
statements that manipulate existing SAS data sets or create SAS data sets from raw
data files.
You can use the DATA step for the following tasks:
n creating SAS data sets (SAS data files or SAS views)
n creating SAS data sets from input files that contain raw data (external files)
n creating new SAS data sets from existing ones by subsetting, merging,
modifying, and updating existing SAS data sets
n analyzing, manipulating, or presenting your data
n computing the values for new variables
n retrieving information
n file management
Note: A DATA step creates a SAS data set. This data set can be a SAS data file or
a SAS view. A SAS data file stores data values while a SAS view stores instructions
for retrieving and processing data. When you can use a SAS view as a SAS data
file, as is true in most cases, this documentation uses the broader term SAS data
set.
Flow of Action
When you submit a DATA step for execution, it is first compiled and then executed.
The following figure shows the flow of action for a typical SAS DATA step.
Overview of DATA Step Processing 443
[Figure: Flow of action in the DATA step. Compile phase: SAS compiles the SAS
statements (includes syntax checking) and creates an input buffer, a program data
vector, and descriptor information. Execution phase: SAS reads an input record,
executes additional executable statements, writes an observation to the SAS data
set, and returns to the beginning of the DATA step.]
1 The DATA step begins with a DATA statement. Each time the DATA statement
executes, a new iteration of the DATA step begins, and the _N_ automatic
variable is incremented by 1.
2 SAS sets the newly created program variables to missing in the program data
vector (PDV).
Processing a DATA Step: A Walk-through 445
3 SAS reads a data record from a raw data file into the input buffer, or it reads an
observation from a SAS data set directly into the program data vector. You can
use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4 SAS executes any subsequent programming statements for the current record.
5 At the end of the statements, an output, return, and reset occur automatically.
SAS writes an observation to the SAS data set, the system automatically returns
to the top of the DATA step, and the values of variables created by INPUT and
assignment statements are reset to missing in the program data vector. Note that
variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are
not reset to missing here.
6 SAS counts another iteration, reads the next record or observation, and
executes the subsequent programming statements for the current observation.
7 The DATA step terminates when SAS encounters the end-of-file in a SAS data
set or a raw data file.
Note: The figure shows the default processing of the DATA step. You can place
data-reading statements (such as INPUT or SET), or data-writing statements (such
as OUTPUT), in any order in your program.
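The default flow can be observed with a small DATA step such as the following. This is a sketch only: the data set name, variable names, and data lines are hypothetical. The PUT statement writes the value of _N_ and the current variable values to the SAS log on each iteration.

```sas
/* Sketch: one record is read, processed, and written per iteration. */
data work.flow_demo;
   input Name $ Score;               /* reads the next record into the PDV */
   Doubled = Score * 2;              /* programming statement for the current record */
   put _n_= Name= Score= Doubled=;   /* shows the iteration count and PDV values in the log */
   datalines;
Sue 6
Jane 9
;
run;
```

No OUTPUT or RETURN statement is coded here, so the output, return, and reset described in step 5 happen automatically at the end of each iteration.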
1 The DROP= data set option prevents the variable TeamName from being written
to the output SAS data set called Total_Points.
2 The INPUT statement describes the data by giving a name to each variable,
identifying its data type (character or numeric), and identifying its relative location
in the data record.
446 Chapter 20 / DATA Step Processing
3 The SUM statement accumulates the scores for three events in the variable
TeamTotal.
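A program consistent with these three notes might look like the following sketch. The variable names and the two data lines are taken from the figures in this section; the use of modified list input with the :$12. informat is an assumption made so that the example runs as written, and the original program might locate the values differently.

```sas
/* Sketch: DROP= option, INPUT statement, and Sum statement as described in notes 1-3. */
data Total_Points (drop=TeamName);                                    /* 1 */
   input TeamName :$12. ParticipantName :$12. Event1 Event2 Event3;   /* 2 */
   TeamTotal + (Event1 + Event2 + Event3);                            /* 3 */
   datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
run;
```

Because TeamTotal is created by a Sum statement, its value is retained from one iteration to the next instead of being reset to missing.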
[Figure: the input buffer, columns 1 through 25, before any data is read]
Variables that are created by the INPUT and the Sum statements (TeamName,
ParticipantName, Event1, Event2, Event3, and TeamTotal) are set to missing
initially. Note that in this representation, numeric variables are initialized with a
period and character variables are initialized with blanks. The automatic variable
_N_ is set to 1; the automatic variable _ERROR_ is set to 0.
The variable TeamName is marked Drop in the PDV because of the DROP= data
set option in the DATA statement. Dropped variables are not written to the SAS data
set. The _N_ and _ERROR_ variables are dropped because automatic variables
created by the DATA step are not written to a SAS data set. See Chapter 4, “SAS
Variables,” on page 37 for details about automatic variables.
Reading a Record
SAS reads the first data line into the input buffer. The input pointer, which SAS uses
to keep its place as it reads data from the input buffer, is positioned at the beginning
of the buffer, ready to read the data record. The following figure shows the position
of the input pointer in the input buffer before SAS reads the data.
Figure 20.3 Position of the Pointer in the Input Buffer Before SAS Reads Data
Input Buffer (columns 1 through 25): Knights Sue 6 8 8
The INPUT statement then reads data values from the record in the input buffer and
writes them to the PDV where they become variable values. The following figure
shows both the position of the pointer in the input buffer, and the values in the PDV
after SAS reads the first record.
Figure 20.4 Values from the First Record Are Read into the Program Data Vector
Input Buffer (columns 1 through 25): Knights Sue 6 8 8
After the INPUT statement reads a value for each variable, SAS executes the Sum
statement. SAS computes a value for the variable TeamTotal and writes it to the
PDV. The following figure shows the PDV with all of its values before SAS writes the
observation to the data set.
Figure 20.5 Program Data Vector with Computed Value of the Sum Statement
SAS then returns to the DATA statement to begin the next iteration. SAS resets the
values in the PDV in the following way:
n The values of variables created by the INPUT statement are set to missing.
n The value of the automatic variable _N_ is incremented by 1, and the value of
_ERROR_ is reset to 0.
The following figure shows the current values in the PDV.
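[Figure: program data vector at the start of the second iteration. TeamTotal is 22,
_N_ is 2, and _ERROR_ is 0. TeamName, _N_, and _ERROR_ are marked Drop.]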
Figure 20.8 Input Buffer, Program Data Vector, and First Two Observations
Input Buffer (columns 1 through 25): Cardinals Jane 9 7 8
As SAS continues to read records, the value in TeamTotal grows larger as more
participant scores are added to the variable. _N_ is incremented at the beginning of
each iteration of the DATA step. This process continues until SAS reaches the end
of the input file.
Data-reading statements: *
Optional SAS programming statements, for example:    further processes the data for the current observation
* The table shows the default processing of the DATA step. You can alter the sequence of statements
in the DATA step. You can code optional programming statements, such as creating or reinitializing a
constant, before you code a data-reading statement.
Note: You can also use functions to read and process data. For information about
how statements and functions process data differently, see “Using Functions to
Manipulate Files” in SAS Functions and CALL Routines: Reference. For specific
information about SAS functions, see the SAS File I/O and External Files categories
in “SAS Functions and CALL Routines by Category” in SAS Functions and CALL
Routines: Reference.
For more information, see the individual statements in SAS DATA Step Statements:
Reference.
LINK and RETURN statements
   alter the flow of control, execute statements following the label specified, and
   return control of the program to the next statement following the LINK statement.
HEADER= option in the FILE statement
   alters the flow of control whenever a PUT statement causes a new page of output
   to begin; statements following the label specified in the HEADER= option are
   executed until a RETURN statement is encountered, at which time control returns
   to the point from which the HEADER= option was activated.
EOF= option in an INFILE statement
   alters the flow of execution when the end of the input file is reached; statements
   following the label that is specified in the EOF= option are executed at that time.
_N_ automatic variable in an IF-THEN construct
   causes parts of the DATA step to execute only for particular iterations.
ABORT statement in an IF-THEN construct
   stops execution of the DATA step and instructs SAS to resume execution with the
   next DATA or PROC step. It can also stop executing a SAS program altogether,
   depending on the options specified in the ABORT statement and on the method of
   operation.
WHERE statement or WHERE= data set option
   causes SAS to read certain observations based on one or more specified criteria.
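Two of these techniques, the _N_ automatic variable in an IF-THEN construct and the LINK and RETURN statements, can be sketched together as follows. The data set name, the label older, and the variable Group are hypothetical, and Sashelp.Class is used only as convenient input.

```sas
/* Sketch: conditional execution with _N_ and a LINK/RETURN subroutine. */
data work.control_demo;
   set sashelp.class;
   if _n_ = 1 then put 'This message is written on the first iteration only.';
   if Age > 14 then link older;   /* jump to the label, then come back here */
   return;                        /* write the observation and start the next iteration */
older:
   Group = 'Older';
   return;                        /* return to the statement that follows LINK */
run;
```

The RETURN statement before the label keeps the main flow from falling through into the statements after the label.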
executes program statements only when SAS crosses a default or a step boundary.
Consider the following DATA steps:
data _null_;                            /* 1 */
   set allscores(drop=score5-score7);
   title 'Student Test Scores';         /* 2 */
data employees;                         /* 3 */
   set employee_list;
run;
data test;
   set alltests;
run;
The OPTIONS statement specifies that the first observation that is read from the
input data set should be the 5th, and the last observation that is read should be the
55th. Inserting a RUN statement immediately before the OPTIONS statement
causes the first DATA step to reach its boundary (run;) before SAS encounters the
OPTIONS statement. The OPTIONS statement settings, therefore, are put into
effect for the second DATA step only.
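The effect of the step boundary can be sketched as follows. The data set names and the FIRSTOBS= and OBS= values here are hypothetical, and Sashelp.Class stands in for the input data.

```sas
/* Sketch: the RUN statement ends the first step before OPTIONS takes effect. */
data work.first_step;
   set sashelp.class;
run;                           /* step boundary: the first DATA step executes here */

options firstobs=5 obs=15;     /* applies only to steps that follow */

data work.second_step;
   set sashelp.class;          /* reads observations 5 through 15 */
run;

options firstobs=1 obs=max;    /* restore the defaults for later steps */
```

Without the first RUN statement, the OPTIONS statement would be encountered before the first step reached its boundary, so the settings would affect that step as well.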
Following the statements in a DATA step with a RUN statement is the simplest way
to make the step begin to execute, but a RUN statement is not always necessary.
SAS recognizes several step boundaries for a SAS step:
n another DATA statement
n a PROC statement
n a RUN statement
n an ENDSAS statement
When you submit a DATA step during interactive processing, it does not begin
running until SAS encounters a step boundary. This fact enables you to submit
statements as you write them while preventing a step from executing until you have
entered all the statements.
raw data                    instream data lines      INPUT statement                               after the last data line is read
observations sequentially   one SAS data set         SET and MODIFY statements                     after the last observation is read
                            multiple SAS data sets   one SET, MERGE, MODIFY, or UPDATE statement   when all input data sets are exhausted
A DATA step that reads observations from a SAS data set with a SET statement that
uses the POINT= option has no way to detect the end of the input SAS data set.
(This method is called direct or random access.) Such a DATA step usually requires
a STOP statement.
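A minimal sketch of direct access with the POINT= option follows. The data set name and the choice of every third observation are assumptions for illustration; the NOBS= option supplies the number of observations so that the DO loop knows where to end.

```sas
/* Sketch: read every third observation directly, then stop explicitly. */
data work.every_third;
   do i = 1 to total by 3;
      set sashelp.class point=i nobs=total;   /* i selects the observation to read */
      output;
   end;
   stop;   /* required: POINT= access never detects end-of-file */
run;
```

Without the STOP statement, the DATA step would start a new iteration and repeat the loop indefinitely.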
A DATA step also stops when it executes a STOP or an ABORT statement. Some
system options and data set options, such as OBS=, can cause a DATA step to stop
earlier than it would otherwise.
If the VARINITCHK= system option is set to ERROR, a DATA step stops processing
and writes an error to the SAS log if a variable is not initialized. For more
information, see “VARINITCHK= System Option” in SAS System Options:
Reference.
n data that you can remotely access through a SAS catalog entry, the clipboard, a
data URL, an email, an FTP protocol, a Hadoop Distributed File System, TCP/IP
socket, a URL, a WebDAV protocol, or through zlib services
n data that is stored in a Database Management System (DBMS) or other vendor's
data files.
Usually, DATA steps read input data records from only one of the first three sources
of input. However, DATA steps can use a combination of some or all of the sources.
About Creating a SAS Data Set with a DATA Step 457
1 Begin the DATA step and create a SAS data set called Weight.
2 Specify the external file that contains your data.
3 Read a record and assign values to three variables.
4 Calculate a value for variable WeightLoss.
5 Execute the DATA step.
6 Print data set Weight using the PRINT procedure.
7 Execute the PRINT procedure.
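A program that matches these seven notes might look like the following sketch. The fileref rawwt, the variable names IDnumber, Week1, and Week16, and the data values are hypothetical. The first step only writes a small temporary file so that the example is self-contained.

```sas
/* Setup for the sketch: write a small temporary external file. */
filename rawwt temp;
data _null_;
   file rawwt;
   put '2477 195 163';
   put '2431 220 198';
run;

data Weight;                        /* 1 */
   infile rawwt;                    /* 2 */
   input IDnumber $ Week1 Week16;   /* 3 */
   WeightLoss = Week1 - Week16;     /* 4 */
run;                                /* 5 */

proc print data=Weight;             /* 6 */
run;                                /* 7 */
```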
1 Begin the DATA step and create SAS data set Weight2.
2 Read a data line and assign values to three variables.
3 Calculate a value for variable WeightLoss2.
4 Begin the data lines.
5 Signal end of data lines with a semicolon and execute the DATA step.
6 Print data set Weight2 using the PRINT procedure.
1 Use the MISSOVER option to assign missing values to variables that do not
contain values in records that do not satisfy the current INPUT statement.
2 Begin data lines.
3 Signal end of data lines and execute the DATA step.
4 Print data set Weight2 using the PRINT procedure.
5 Execute the PRINT procedure.
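These notes can be illustrated with the following sketch. The variable names and data lines are hypothetical; the second data line is deliberately short so that the effect of MISSOVER is visible.

```sas
/* Sketch: MISSOVER assigns missing values when a record runs out of data. */
data Weight2;
   infile datalines missover;       /* 1 */
   input IDnumber $ Week1 Week16;
   WeightLoss2 = Week1 - Week16;
   datalines;                       /* 2 */
2477 195 163
2431
2456 173 155
;                                   /* 3 */

proc print data=Weight2;            /* 4 */
run;                                /* 5 */
```

For the short record, Week1 and Week16 are set to missing, so WeightLoss2 is also missing, and SAS does not move to the next line to look for the remaining values.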
data all_errors;
length filelocation $ 60;
input filelocation; /* reads instream data */
infile daily filevar=filelocation
filename=daily end=done;
do while (not done);
input Station $ Shift $ Employee $ NumberOfFlaws;
output;
end;
put 'Finished reading ' daily=;
datalines;
pathmyfile_A
pathmyfile_B
pathmyfile_C
;
1 Begin the DATA step and create a SAS data set called Average_Loss.
2 Read an observation from SAS data set Weight.
3 Calculate a value for variable Percent.
4 Execute the DATA step.
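As a sketch of these steps, the following code first builds a small Weight data set and then reads it with the SET statement. The variable names, the data values, and the formula for Percent are assumptions for illustration.

```sas
/* Hypothetical input data set. */
data Weight;
   input IDnumber $ Week1 Week16;
   WeightLoss = Week1 - Week16;
   datalines;
2477 195 163
2431 220 198
;

data Average_Loss;                                /* begin the DATA step        */
   set Weight;                                    /* read one observation       */
   Percent = round((WeightLoss * 100) / Week1);   /* assumed percent-loss rule  */
run;                                              /* execute the DATA step      */
```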
1 Begin the DATA step and create a SAS data set called Investment.
2 Calculate a value based on a $2,000 capital investment and 7% interest each
year from 1990 to 2009. Calculate variable values for one observation per
iteration of the DO loop.
3 Write each observation to data set Investment.
4 Write a note to the SAS log proving that the DATA step iterates only once.
5 Execute the DATA step.
6 To see your output, print the Investment data set with the PRINT procedure.
7 Use the FORMAT statement to write numeric values with dollar signs, commas,
and decimal points.
8 Execute the PRINT procedure.
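One way to sketch these steps is shown below. The variable names Year and Capital, the wording of the log message, and the exact interest formula are assumptions; the sketch uses a sum statement so that Capital starts at 0 and is retained across iterations of the DO loop.

```sas
data Investment;
   do Year = 1990 to 2009;
      /* Sum statement: add the yearly deposit plus 7% interest on the new balance. */
      Capital + 2000 + 0.07 * (Capital + 2000);
      output;                          /* write one observation per loop iteration */
   end;
   put 'The number of DATA step iterations is ' _n_;
run;

proc print data=Investment;
   format Capital dollar12.2;          /* dollar signs, commas, and decimal points */
run;
```

Because the step reads no input, it executes once and the DO loop generates all of the observations.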
data _null_; 3
set budget; 4
file print footnote; 5
MidYearTotal=Jan+Feb+Mar+Apr+May+Jun; 6
if _n_=1 then 7
do;
put @5 'Department' @30 'Mid-Year Total';
end;
put @7 Department @35 MidYearTotal; 8
run; 9
1 Define titles.
2 Define the footnote.
3 Begin the DATA step. _NULL_ specifies that no data set is created.
4 Read one observation per iteration from data set Budget.
5 Name the output file for the PUT statements and use the PRINT fileref. By
default, the PRINT fileref specifies that the file contains carriage-control
characters and titles. The FOOTNOTE option specifies that each page of output
contains a footnote.
6 Calculate a value for the variable MidYearTotal on each iteration.
7 Write variable name headings for the report on the first iteration only.
8 Write the current values of variables Department and MidYearTotal for each
iteration.
9 Execute the DATA step.
The example above uses the FILE statement with the PRINT fileref to produce
LISTING output. If you want to print to a file, specify a fileref or a complete filename.
Use the PRINT option if you want the file to contain carriage-control characters and
titles. The following example shows how to use the FILE statement in this way.
file 'external-file' footnote print;
You can also use the data _null_; statement to write to an external file. For more
information about writing to external files, see the FILE statement in SAS DATA Step
Statements: Reference, and the SAS documentation for your operating
environment.
+-----------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| | SUN | MON | TUE | WED | THU | FRI | SAT | | PAID BY PAID BY
| EXPENSE DETAIL | 07/11 | 07/12 | 07/13 | 07/14 | 07/15 | 07/16 | 07/17 | TOTALS | COMPANY EMPLOYEE
|-----------------------------------|--------|--------|--------|--------|--------|--------|--------|--------|
|Lodging, Hotel | 92.96| 92.96| 92.96| 92.96| 92.96| | | 464.80| 464.80
|Telephone | 4.57| 4.73| | | | | | 9.30| 9.30
|Personal Auto 36 miles @.28/mile | 5.04| | | | | 5.04| | 10.08| 10.08
|Car Rental, Taxi, Parking, Tolls | | 35.32| 35.32| 35.32| 35.32| 35.32| | 176.60| 176.60
|Airlines, Bus, Train (Attach Stub) | 485.00| | | | | 485.00| | 970.00| 970.00
|Dues | | | | | | | | |
|Registration Fees | 75.00| | | | | | | 75.00| 75.00
|Other (explain below) | | | | | | 5.00| | 5.00| 5.00
|Tips (excluding meal tips) | 3.00| | | | | 3.00| | 6.00| 6.00
|-----------------------------------|--------|--------|--------|--------|--------|--------|--------|--------|
|Meals | | | | | | | | |
|Breakfast | | | | | | 7.79| | 7.79| 7.79
|Lunch | | | | | | | | |
|Dinner | 36.00| 28.63| 36.00| 36.00| 30.00| | | 166.63| 166.63
|Business Entertainment | | | | | | | | |
|-----------------------------------|--------|--------|--------|--------|--------|--------|--------|--------|
|TOTAL EXPENSES | 641.57| 176.64| 179.28| 179.28| 173.28| 541.15| | 1891.20| 1611.40 279.80
+-----------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+
Charge to Division: ATW Region: TX Dept: MKT Acct: 6003 Date: 27JUL2010
The code shown below generates the report example. You must create your own
input data. It is beyond the scope of this discussion to fully explain the code that
generated the report example. For a complete explanation of this example, see the
SAS Guide to Report Writing: Examples.
Writing a Report with a DATA Step 463
data travel;
proc format;
value category 1='Lodging, Hotel'
2='Telephone'
3='Personal Auto'
4='Car Rental, Taxi, Parking, Tolls'
5='Airlines, Bus, Train (Attach Stub)'
6='Dues'
7='Registration Fees'
8='Other (explain below)'
9='Tips (excluding meal tips)'
10='Meals'
11='Breakfast'
12='Lunch'
13='Dinner'
14='Business Entertainment'
15='TOTAL EXPENSES';
value blanks 0=' '
other=(|8.2|);
value $cuscore ' '='________';
value nuscore . ='________';
run;
data _null_;
file print;
title 'Expense Report';
format rptdate actdate1 actdate2 dptdate rtrndate date9.;
set travel;
tips1-tips10 meals1-meals10
bkfst1-bkfst10 lunch1-lunch10
dinner1-dinner10 busent1-busent10
total1-total10;
array misc{8} $ misc1-misc8;
array mday{7} mday1-mday7;
dptday=weekday(dptdate);
mday{dptday}=dptdate;
if dptday>1 then
do dayofwk=1 to (dptday-1);
mday{dayofwk}=dptdate-(dptday-dayofwk);
end;
if dptday<7 then
do dayofwk=(dptday+1) to 7;
mday{dayofwk}=dptdate+(dayofwk-dptday);
end;
if rptdate=. then rptdate="&sysdate9"d;
tripnum=substr(tripid,4,2)||'-'||substr(scan(tripid,1),6);
proc format;
value $cntry 'BRZ'='Brazil'
'CHN'='China'
'IND'='India'
'INS'='Indonesia'
'USA'='United States';
run;
data _null_;
length Country $ 3 Type $ 5;
input Year country $ type $ Kilotons;
format country $cntry.;
label type='Grain';
file print
ods=(variables=(country type kilotons));
put _ods_;
datalines;
2012 BRZ Wheat 3302
2012 BRZ Rice 10035
2012 BRZ Corn 31975
2012 CHN Wheat 109000
2012 CHN Rice 190100
2012 CHN Corn 119350
2012 IND Wheat 62620
2012 IND Rice 120012
2012 IND Corn 8660
2012 INS Wheat .
2012 INS Rice 51165
2012 INS Corn 8925
2012 USA Wheat 62099
2012 USA Rice 7771
2012 USA Corn 236064
;
run;
The DATA Step and ODS 467
data a;
if _N_ = 1 then do;
endTime = datetime();
DATA Step Processing Time 469
Output 20.5 Log Output for Finding Compilation and Execution Time
Note: Macro statements and macro variables are resolved at compilation time and
have no bearing on the time it takes to execute the DATA step. For information
about how SAS processes statements with macro activity, see “Getting Started with
the Macro Facility” in SAS Macro Language: Reference, and “SAS Programs and
Macro Processing” in SAS Macro Language: Reference.
Chapter 21 / Reading Raw Data
n instream data
n an external file
Note: Raw data does not include Database Management System (DBMS) files. You
must license SAS/ACCESS software to access data stored in DBMS files. For more
information about SAS/ACCESS features, see Chapter 33, “About SAS/ACCESS
Software,” on page 757.
n SAS functions
n Import Wizard
When you read raw data with a DATA step, you can use a combination of the
INPUT, DATALINES, and INFILE statements. SAS automatically reads your data
when you use these statements. For more information about these statements, see
“Reading Raw Data with the INPUT Statement” on page 477.
You can also use SAS functions to manipulate external files and to read records of
raw data. These functions provide more flexibility in handling raw data. For a
description of available functions, see the SAS File I/O and External File categories
in “SAS Functions and CALL Routines by Category” in SAS Functions and CALL
Routines: Reference. For more information about how statements and functions
manipulate files differently, see “Using Functions to Manipulate Files” in SAS
Functions and CALL Routines: Reference.
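As a minimal sketch of the function-based approach, the following code writes two records to a temporary file and then reads them back with the FOPEN, FREAD, FGET, and FCLOSE functions. The fileref rawtxt, the record contents, and the 80-character buffer length are assumptions for illustration.

```sas
/* Hypothetical setup: create a small external file. */
filename rawtxt temp;
data _null_;
   file rawtxt;
   put 'first record';
   put 'second record';
run;

data _null_;
   length line $ 80;
   fid = fopen('rawtxt');              /* open the fileref for input            */
   if fid > 0 then do;
      do while (fread(fid) = 0);       /* load the next record into the buffer  */
         rc = fget(fid, line, 80);     /* copy up to 80 characters into LINE    */
         put line=;
      end;
      rc = fclose(fid);                /* release the file                      */
   end;
run;
```

Unlike the INPUT statement, these functions leave it to the program to decide when a record is read and how it is parsed.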
If your operating environment supports a graphical user interface, you can use the
External File Interface (EFI) or the Import Wizard to read raw data. The EFI is a point-and-click graphical
interface that you can use to read and write data that is not in SAS software's
internal format. By using EFI, you can read data from an external file and write it to a
SAS data set. You can also read data from a SAS data set and write it to an external
file. See SAS/ACCESS Interface to PC Files: Reference for more information about
EFI.
Note: If the data file that you are passing to EFI is password protected, you are
prompted multiple times for your login ID and password.
The Import Wizard guides you through the steps to read data from an external data
source and write it to a SAS data set. As a wizard, it is a series of windows that
present simple choices to guide you through a process. See SAS/ACCESS
Interface to PC Files: Reference for more information about the wizard.
Operating Environment Information: Using external files with your SAS jobs
requires that you specify filenames with syntax that is appropriate to your operating
environment. See the SAS documentation for your operating environment for more
information.
Types of Data
Definitions
data values
are character or numeric values.
numeric value
contains only numbers, and sometimes a decimal point, a minus sign, or both.
When they are read into a SAS data set, numeric values are stored in the
floating-point format native to the operating environment. Nonstandard numeric
values can contain characters other than digits, such as commas or dollar signs;
you can use formatted input to enable SAS to read them.
character value
is a sequence of characters.
standard data
are character or numeric values that can be read with list, column, formatted, or
named input. Examples of standard data include:
n ARKANSAS
n 1166.42
nonstandard data
is data that can be read only with the aid of informats. Examples of nonstandard
data include numeric values that contain commas, dollar signs, or blanks; date
and time values; and hexadecimal and binary values.
Numeric Data
Numeric data can be represented in several ways. SAS can read standard numeric
values without any special instructions. To read nonstandard values, SAS requires
special instructions in the form of informats. Table 21.2 on page 474 shows
standard, nonstandard, and invalid numeric data values and the special tools, if any,
that are required to read them. For complete descriptions of all SAS informats, see
SAS Formats and Informats: Reference.
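The following sketch shows one way to read nonstandard numeric values with informats. The data set name, variable names, and data values are assumptions; the colon modifier is used so that the values do not have to be aligned in columns.

```sas
data sales;
   /* COMMA10. removes dollar signs and commas; DATE9. reads a date value. */
   input Region $ Amount : comma10. SaleDate : date9.;
   format Amount dollar12.2 SaleDate date9.;
   datalines;
East $1,166.42 04JUL2019
West 2,340 15AUG2019
;

proc print data=sales;
run;
```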
Character Data
A value that is read with an INPUT statement is assumed to be a character value if
one of the following is true:
n A dollar sign ($) follows the variable name in the INPUT statement.
n The variable has been previously defined as character. For example, a value is
assumed to be a character value if the variable has been previously defined as
character in a LENGTH statement, in the RETAIN statement, by an assignment
statement, or in an expression.
Input data that you want to store in a character variable can include any character.
Use the guidelines in the following table when your raw data includes leading blanks
and semicolons.
Table 21.4 Reading Instream Data and External Files Containing Leading Blanks and
Semicolons
Characters in the data: leading or trailing blanks that you want to preserve
What to use: formatted input and the $CHARw. informat
Reason: List input trims leading and trailing blanks from a character value before
the value is assigned to a variable.
Characters in the data: delimiters, blank characters, or quoted strings
What to use: the DSD option, with the DLM= or DLMSTR= option, in the INFILE statement
Reason: These options enable SAS to read a character value that contains a
delimiter within a quoted string; these options can also treat two consecutive
delimiters as a missing value and remove quotation marks from character values.
Instream Data
The following example uses the INPUT statement to read in instream data:
data weight;
input PatientID $ Week1 Week8 Week16;
loss=Week1-Week16;
datalines;
2477 195 177 163
2431 220 213 198
2456 173 166 155
2412 135 125 116
;
Note: A semicolon appearing alone on the line immediately following the last data
line is the convention that is used in this example. However, a PROC statement,
DATA statement, or a global statement ending in a semicolon on the line
immediately following the last data line also submits the previous DATA step.
data weight;
input PatientID $ Week1 Week8 Week16;
loss=Week1-Week16;
datalines4;
24;77 195 177 163
24;31 220 213 198
24;56 173 166 155
24;12 135 125 116
;;;;
External Files
The following example shows how to read in raw data from an external file using the
INFILE and INPUT statements:
data weight;
infile file-specification or path-name;
input PatientID $ Week1 Week8 Week16;
loss=Week1-Week16;
run;
Note: See the SAS documentation for your operating environment for information
about how to specify a file with the INFILE statement.
n column input
n formatted input
n named input
You can also combine styles of input in a single INPUT statement. For details about
the styles of input, see the INPUT statement in SAS DATA Step Statements:
Reference.
478 Chapter 21 / Reading Raw Data
List Input
List input uses a scanning method for locating data values. Data values are not
required to be aligned in columns but must be separated by at least one blank (or
other defined delimiter). List input requires only that you specify the variable names
and a dollar sign ($), if defining a character variable. You do not have to specify the
location of the data fields.
An example of list input follows:
data scores;
length name $ 12;
input name $ score1 score2;
datalines;
Riley 1132 1187
Henderson 1015 1102
;
List input has several restrictions on the type of data that it can read:
n Input values must be separated by at least one blank (the default delimiter) or by
the delimiter specified with the DLM= or DLMSTR= option in the INFILE
statement. If you want SAS to read consecutive delimiters as if there is a missing
value between them, specify the DSD option in the INFILE statement.
n Blanks cannot represent missing values. A real value, such as a period, must be
used instead.
n To read and store a character input value longer than 8 bytes, define a variable's
length by using a LENGTH, INFORMAT, or ATTRIB statement before the INPUT
statement, or by using modified list input, which consists of an informat and the
colon modifier in the INPUT statement. See “Modified List Input” on page 478 for
more information.
n Character values cannot contain embedded blanks when the file is delimited by
blanks.
n Fields must be read in order.
Note: Nonstandard numeric values, such as packed decimal data, must use the
formatted style of input. See “Formatted Input” on page 480 for more information.
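Several of the restrictions above can be seen in a short sketch. The data values here are assumptions: the LENGTH statement allows a name longer than 8 bytes, and the DSD option makes SAS treat the two consecutive commas in the second data line as a missing value for Score1.

```sas
data scores;
   infile datalines dsd;               /* comma-delimited; consecutive delimiters mean missing */
   length Name $ 12;                   /* define the length before the INPUT statement          */
   input Name $ Score1 Score2;
   datalines;
Henderson,1015,1102
Riley,,1187
;

proc print data=scores;
run;
```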
n The : (colon) format modifier enables you to use list input but also to specify an
informat after a variable name, whether character or numeric. SAS reads until it
encounters a blank column, the defined length of the variable (character only), or
the end of the data line, whichever comes first.
n The ~ (tilde) format modifier enables you to read and retain single quotation
marks, double quotation marks, and delimiters within character values.
The following is an example of the : and ~ format modifiers. You must use the DSD
option in the INFILE statement. Otherwise, the INPUT statement ignores the ~
format modifier.
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
proc print data=scores;
run;
Column Input
Column input enables you to read standard data values that are aligned in columns
in the data records. Specify the variable name, followed by a dollar sign ($) if it is a
character variable, and specify the columns in which the data values are located in
each record:
data scores;
infile datalines truncover;
input name $ 1-12 score2 17-20 score1 27-30;
datalines;
Riley           1132        987
Henderson       1015      1102
;
Note: Use the TRUNCOVER option in the INFILE statement to ensure that SAS
handles data values of varying lengths appropriately.
n Input values can be read in any order, regardless of their position in the record.
n Both leading and trailing blanks within the field are ignored.
n Values do not need to be separated by blanks or other delimiters.
CAUTION! If you insert tabs while entering data in the DATALINES statement in
column format, you might get unexpected results. This issue exists when you use
the SAS Enhanced Editor or SAS Program Editor. To avoid the issue, do one of the
following:
n Replace all tabs in the data with single spaces using another editor outside of
SAS.
n Use the %INCLUDE statement from the SAS editor to submit your code.
n If you are using the SAS Enhanced Editor, select Tools > Options > Enhanced
Editor to change the tab size from 4 to 1.
Formatted Input
Formatted input combines the flexibility of using informats with many of the features
of column input. By using formatted input, you can read nonstandard data for which
SAS requires additional instructions. Formatted input is typically used with pointer
controls that enable you to control the position of the input pointer in the input buffer
when you read data.
The INPUT statement in the following DATA step uses formatted input and pointer
controls. Note that $12. and COMMA5. are informats; +4 and +6 are column pointer
controls.
data scores;
input name $12. +4 score1 comma5. +6 score2 comma5.;
datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
Note: You can also use informats to read data that is not aligned in columns. See
“Modified List Input” on page 478 for more information.
Important points about formatted input are:
n Character values can contain embedded blanks.
n Placeholders, such as a single period (.), are not required for missing data.
n With the use of pointer controls to position the pointer, input values can be read
in any order, regardless of their positions in the record.
n Values or parts of values can be reread.
n Formatted input enables you to read data stored in nonstandard form, such as
packed decimal or numbers with commas.
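The points above about reading values in any order and rereading values can be sketched as follows. The variable Initial and the data values are assumptions; the @ column pointer controls move the pointer backward so that the score is read before the name and the first column is read twice.

```sas
data scores;
   /* Read the score first, then return to column 1 for the name, */
   /* then reread column 1 to capture the first letter only.      */
   input @17 Score1 comma5. @1 Name $12. @1 Initial $1.;
   datalines;
Riley           1,132
Henderson       1,015
;

proc print data=scores;
run;
```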
Named Input
You can use named input to read records in which data values are preceded by the
name of the variable and an equal sign (=). The following INPUT statement reads
the data lines containing equal signs.
data games;
input name=$ score1= score2=;
datalines;
name=riley score1=1132 score2=1187
;
Note: When an equal sign follows a variable in an INPUT statement, SAS expects
that data remaining on the input line contains only named input values. You cannot
switch to another form of input in the same INPUT statement after using named
input. Also, note that any variable that exists in the input data but is not defined in
the INPUT statement generates a note in the SAS log indicating a missing field.
Data: variable-length data fields and records
Goal: read delimited data
Use: list input with or without a format modifier in the INPUT statement and the
TRUNCOVER, DLM=, DLMSTR=, or DSD options in the INFILE statement.
Data: instream data lines
Goal: control the reading with special options
Use: INFILE statement with DATALINES and appropriate options.
For further information about data-reading features, see the INPUT and INFILE
statements in SAS DATA Step Statements: Reference.
n It does not match the input style used. An example is if it is read as standard
numeric data (no dollar sign or informat) but it does not conform to the rules for
standard SAS numbers.
n It is out of range (too large or too small).
n prints the input line and column number containing the invalid value in the SAS
log. If a line contains unprintable characters, it is printed in hexadecimal form. A
scale is printed above the input line to help determine column numbers.
data test_results;
missing a b c;
input name $8. Answer1 Answer2 Answer3;
datalines;
Smith    2 5 9
Jones    4 b 8
Carter   a 4 7
Reed     3 5 c
;
Note that you must use a period when you specify a special missing numeric value
in an expression or assignment statement, as in the following:
x=.d;
However, you do not need to specify each special missing numeric data value with a
period in your input data. For example, the following DATA step, which uses periods
in the input data for special missing values, produces the same result as the input
data without periods:
data test_results;
missing a b c;
input name $8. Answer1 Answer2 Answer3;
datalines;
Smith    2 5 9
Jones    4 .b 8
Carter   .a 4 7
Reed     3 5 .c
;
proc print;
run;
Note: SAS displays and prints special missing values that use letters in
uppercase.
Definitions
binary data
is numeric data that is stored in binary form. Binary numbers have a base of two
and are represented with the digits 0 and 1.
packed decimal data
are binary decimal numbers that are encoded by using each byte to represent
two decimal digits. Packed decimal representation stores decimal data with
exact precision; the fractional part of the number must be determined by using
an informat or format because there is no separate mantissa and exponent.
Different computer platforms store numeric binary data in different forms. The
ordering of bytes differs by platforms that are referred to as either “big endian” or
“little endian.” For more information, see “Byte Ordering for Integer Binary Data on
Big Endian and Little Endian Platforms” in SAS Formats and Informats: Reference.
SAS provides a number of informats for reading binary data and corresponding
formats for writing binary data. Some of these informats read data in native mode,
that is, by using the byte-ordering system that is standard for the system on which
SAS is running. Other informats force the data to be read by the IBM 370 standard,
regardless of the native mode of the system on which SAS is running. The informats
that read in native or IBM 370 mode are listed in the following table.
If you write a SAS program that reads binary data and that is run on only one type of
system, you can use the native mode informats and formats. However, if you want
to write SAS programs that can be run on multiple systems that use different byte-
storage systems, use the IBM 370 informats. The IBM 370 informats enable you to
write SAS programs that can read data in this format and that can be run in any
SAS environment, regardless of the standard for storing numeric data (see footnote 1). The IBM
370 informats can also be used to read data originally written with the corresponding
native mode formats on an IBM mainframe.
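As a small sketch of the IBM 370 informats, the following DATA step applies two of them with the INPUT function. The hexadecimal character literals are assumptions that stand in for bytes that would normally be read from a binary file with an INFILE statement.

```sas
data _null_;
   /* Four bytes in IBM 370 (big endian) integer binary form. */
   IntValue    = input('0000012C'x, s370fib4.);
   /* Two bytes of IBM 370 packed decimal data; the final nibble C is the sign. */
   PackedValue = input('123C'x, s370fpd2.);
   put IntValue= PackedValue=;
run;
```

Because these informats ignore the native byte order, the step should write IntValue=300 and PackedValue=123 to the log on any platform.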
Note: Anytime a text file originates from anywhere other than the local encoding
environment, it might be necessary to specify the ENCODING= option on either
EBCDIC or ASCII systems. When you read an EBCDIC text file on an ASCII
platform, it is recommended that you specify the ENCODING= option in the
FILENAME or INFILE statement. However, if you use the DSD and the DLM= or
DLMSTR= options on the INFILE statement, the ENCODING= option is a
requirement because these options require certain characters in the session
encoding (such as quotation marks, commas, and blanks). Reserve encoding-
specific informats for use with true binary files that contain both character and non-
character fields.
For complete descriptions of all SAS formats and informats, including how numeric
binary data is written, see SAS Formats and Informats: Reference.
Definition
column-binary data storage
is an older form of data storage that is no longer widely used and is not needed
by most SAS users. Column-binary data storage compresses data so that more
than 80 items of data can be stored on a single “virtual” punched card. The
advantage is that this method enables you to store more data in the same
1. For example, using the IBM 370 informats, you could download data that contain binary integers from a mainframe to a
PC and then use the S370FIB informats to read the data.
n how to set the RECFM= and LRECL= options in the INFILE statement
To read column-binary data, you must set two options in the INFILE statement:
n Set RECFM= to F for fixed.
n Set the LRECL= to 160, because each card column of column-binary data is
expanded to two bytes before the fields are read.
For example, to read column-binary data from a file, use an INFILE statement in the
following form before the INPUT statement that reads the data:
infile file-specification or path-name
recfm=f
lrecl=160;
Note: The expansion of each column of column-binary data into two bytes does not
affect the position of the column pointer. You use the absolute column pointer
control @, as usual, because the informats automatically compute the true location
on the doubled record. If a value is in column 23, use the pointer control @23 to
move the pointer there.
In the Hollerith system, each column on a card had a maximum of two punches, one
punch in the zone portion, and one in the digit portion. These punches
corresponded to a pair of values, and each pair of values corresponded to a specific
alphabetic character or sign and numeric digit.
In the zone portion of the punched card (the first three rows), the zone component of
the pair can have the values 12, 11, 0 (or 10), or not punched. In the digit portion of
the card (the fourth through the twelfth rows), the digit component of the pair can
have the values 1 through 9, or not punched.
The following figure shows the multi-punch combinations corresponding to letters of
the alphabet.
digit punch     1 2 3 4 5 6 7 8 9
zone punch 12   A B C D E F G H I
zone punch 11   J K L M N O P Q R
zone punch 10     S T U V W X Y Z
SAS stores each column of column-binary data (a “virtual” punched card) in two
bytes. Since each column has only 12 positions and since 2 bytes contain 16
positions, the 4 extra positions within the bytes are located at the beginning of each
byte. The following figure shows the correspondence between the rows of “virtual”
punched card data and the positions within 2 bytes that SAS uses to store them.
SAS stores a punched position as a binary 1 bit and an unpunched position as a
binary 0 bit.
[Figure: correspondence between punched-card rows 12, 11, 10 (or 0), and 1 through 9
and bit positions 1 through 8 of byte 1 and byte 2]
Chapter 22 / BY-Group Processing in the DATA Step
The most common use of BY-group processing in the DATA step is to combine two or more
SAS data sets. To do this, you use the BY statement with a SET, MERGE,
MODIFY, or UPDATE statement.
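A minimal sketch of combining two data sets with a BY statement follows. The data set names, variable names, and values are assumptions; both input data sets are already ordered by ID, which the MERGE statement requires when it is used with BY.

```sas
data names;
   input ID Name $;
   datalines;
1 Ann
2 Raj
;

data ages;
   input ID Age;
   datalines;
1 34
2 41
;

data combined;
   merge names ages;                   /* match observations from both data sets */
   by ID;                              /* on the values of the BY variable        */
run;

proc print data=combined;
run;
```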
BY variable
names a variable or variables by which the data set is sorted or indexed. All data
sets must be ordered or indexed on the values of the BY variable if you use the
SET, MERGE, or UPDATE statements. If you use MODIFY, data does not need
to be ordered. However, your program might run more efficiently with ordered
data. All data sets that are being combined must include one or more BY
variables. The position of the BY variable in the observations does not matter.
BY value
is the value or formatted value of the BY variable.
BY group
includes all observations with the same BY value. If you use more than one
variable in a BY statement, a BY group is a group of observations with the same
combination of values for these variables. Each BY group has a unique
combination of values for the variables.
FIRST.variable and LAST.variable
are variables that SAS creates for each BY variable. SAS sets FIRST.variable
when it is processing the first observation in a BY group, and sets LAST.variable
when it is processing the last observation in a BY group. These assignments
enable you to take different actions, based on whether processing is starting for
a new BY group or ending for a BY group. For more information, see “FIRST.
and LAST. DATA Step Variables” on page 498.
For more information about BY-Group processing, see Chapter 23, “Reading,
Combining, and Modifying SAS Data Sets,” on page 509. See also Combining and
Modifying SAS Data Sets: Examples.
Syntax
DATA step BY-groups are created and managed using the BY statement in SAS.
See “BY Statement” in SAS DATA Step Statements: Reference for complete syntax
information.
Understanding BY Groups
data zip;
set zip; by zipcode;
run;
The figure shows three BY groups. The data set is shown with the BY variables
State and City printed on the left for easy reading. The position of the BY variables
in the observations does not affect how the values are grouped and ordered.
The observations are arranged so that the observations for Arizona occur first. The
observations within each value of State are arranged in order of the value of City.
Each BY group has a unique combination of values for the variables State and City.
For example, the BY value of the first BY group is AZ Tucson, and the BY value of
the second BY group is FL Lakeland.
Here is the code for creating the output shown in Figure 22.2 on page 495:
Example Code 22.3 Create the Zip Data Set
/* BY Groups with Multiple BY Variables */
data zip;
input State $ City $ Street $13-22 ZipCode ;
datalines;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;
Example Code 22.4 Sort and Group the Zip Data Set by Multiple BY Variables
proc sort data=zip;
by State City;
run;
data zip;
set zip;
by State City;
run;
proc print data=zip noobs;
title 'BY Groups with Multiple BY Variables: State City';
run;
496 Chapter 22 / BY-Group Processing in the DATA Step
As a general rule, specify the variables in the PROC SORT BY statement in the
same order that you specify them in the DATA step BY statement. For a detailed
description of the default sorting orders for numeric and character variables, see the
SORT procedure in Base SAS Procedures Guide.
Note: The BY statement honors the linguistic collation of sorted data when you use
the SORT procedure with the SORTSEQ=LINGUISTIC option.
For example, if the DATA step specifies the variable state in the BY statement, then
SAS creates the temporary variables FIRST.state and LAST.state.
These temporary variables are available for DATA step programming but are not
added to the output data set. Their values indicate whether an observation is in one of
the following positions:
n the first one in a BY group
n the last one in a BY group
n neither the first nor the last one in a BY group
n both first and last, as is the case when there is only one observation in a BY
group
You can take actions conditionally, based on whether you are processing the first or
the last observation in a BY group. See “Processing BY-Groups Conditionally” on
page 503 for more information.
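As a sketch of conditional processing, the following code counts the observations in each State and City BY group. The data values and the variable Streets are assumptions; the input data lines are already in State and City order, so no sort step is needed here.

```sas
data zip;
   input State $ City $ ZipCode;
   datalines;
AZ Tucson 85730
AZ Tucson 85730
FL Lakeland 33801
FL Miami 33133
;

data city_counts;
   set zip;
   by State City;
   if first.City then Streets = 0;     /* reset the counter at the start of a BY group */
   Streets + 1;                        /* sum statement retains the running count      */
   if last.City then output;           /* write one observation per BY group           */
   keep State City Streets;
run;

proc print data=city_counts;
run;
```

The FIRST.City and LAST.City variables control the logic but do not appear in the output data set.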
n FIRST and LAST variables are referenced in the DATA step but they are not part
of the output data set.
n Two temporary variables are created for each BY variable, six in all: FIRST.State,
LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.
data zip;
input State $ City $ ZipCode Street $20-29;
datalines;
FL Miami    33133  Rice St
FL Miami    33133  Thomas Ave
FL Miami    33133  Surrey Dr
FL Miami    33133  Trade Ave
FL Miami    33146  Nervia St
FL Miami    33146  Corsica St
FL Lakeland 33801  French Ave
FL Lakeland 33809  Egret Dr
AZ Tucson   85730  Domenic Ln
AZ Tucson   85730  Gleeson Pl
;
proc sort data=zip; by State City ZipCode; run;
data zip2;
set zip;
by State City ZipCode;
put _n_= City State ZipCode
first.city= last.city=
first.state= last.state=
first.ZipCode= last.ZipCode= ;
run;
Example Code 22.1 Grouping Observations by State, City, and ZIP Code
Note: This is a chart used to display the contents of the log more clearly. It is not
the output data set.
data zip;
input State $ City $ ZipCode Street $20-29;
datalines;
FL Miami    33133  Rice St
FL Miami    33133  Thomas Ave
FL Miami    33133  Surrey Dr
FL Miami    33133  Trade Ave
FL Miami    33146  Nervia St
FL Miami    33146  Corsica St
FL Lakeland 33801  French Ave
FL Lakeland 33809  Egret Dr
AZ Tucson   85730  Domenic Ln
AZ Tucson   85730  Gleeson Pl
;
proc sort data=zip; by City State ZipCode; run;
data zip2;
set zip;
by City State ZipCode;
put _n_= City State ZipCode
first.city= last.city=
first.state= last.state=
first.ZipCode= last.ZipCode=;
run;
proc print data=zip2; title 'By City, State, Zip'; run;
Example Code 22.2 Grouping Observations by City, State, and ZIP Code
data fruit;
   input x $ y $ 10-18 z $ 21-29;
   datalines;
apple    banana     coconut
apple    banana     coconut
apple    blueberry  citron
apricot  blueberry  citron
;

data _null_;
   set fruit;
   by x y z;
   if _N_=1 then put 'Grouped by X Y Z';
   put _N_= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;

data _null_;
   set fruit;
   by y x z;
   if _N_=1 then put 'Grouped by Y X Z';
   put _N_= first.y= last.y= first.x= last.x= first.z= last.z= ;
run;
Grouped by X Y Z
_N_=1 FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
Grouped by Y X Z
_N_=1 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=0 FIRST.z=1 LAST.z=0
_N_=2 FIRST.y=0 LAST.y=1 FIRST.x=0 LAST.x=1 FIRST.z=0 LAST.z=1
_N_=3 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1
_N_=4 FIRST.y=0 LAST.y=1 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1
Overview
The most common use of BY-group processing is to combine data sets by using the
BY statement with the SET, MERGE, MODIFY, or UPDATE statements. (If you use
a SET, MERGE, or UPDATE statement with the BY statement, your observations
must be grouped or ordered.) When processing these statements, SAS reads one
observation at a time into the program data vector. With BY-group processing, SAS
selects the observations from the data sets according to the values of the BY
variable or variables. After processing all the observations from one BY group, SAS
expects the next observation to be from the next BY group.
The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE
statement by controlling when the values in the program data vector are set to
missing. During BY-group processing, SAS retains the values of variables until it has
copied the last observation that it finds for that BY group in any of the data sets.
Without the BY statement, the SET statement sets variables to missing when it
reads the last observation. The MERGE statement does not set variables to missing
after the DATA step starts reading observations into the program data vector.
data total_sale(drop=sales);
   set region.sales;
   by month notsorted;
   total+sales;
   if last.month;   /* subsetting IF: keep only the last observation in each BY group */
run;
The GROUPFORMAT option is valid only in the DATA step that creates the SAS
data set. It is particularly useful with user-defined formats. The following examples
illustrate the use of the GROUPFORMAT option.
data _null_;
   format height range.;
   set sorted_class;
   by height groupformat;
   if first.height then
      put 'Shortest in ' height 'measures ' height:best12.;
run;
Shortest in Under 55 measures 51.3
Shortest in 55 to 60 measures 56.3
Shortest in 60 to 65 measures 62.5
Shortest in 65 to 70 measures 65.3
Shortest in Over 70 measures 72
Joseph 4
Ian 5
Jan 6
;
/* Create a user-defined format */
proc format;
   value Range 1-2='Low'
               3-4='Medium'
               5-6='High';
run;
Chapter 23
Reading, Combining, and Modifying SAS Data Sets
Definitions for Reading, Combining, and Modifying SAS Data Sets . . . . . . . . . . . . 509
Overview of Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Reading SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Reading a Single SAS Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Reading from Multiple SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
Controlling the Reading and Writing of Variables and Observations . . . . . . . . . . . . . 511
Combining SAS Data Sets: Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
What You Need to Know Before Combining Information Stored in Multiple SAS Data Sets . . . . . . . . . . . 512
The Four Ways That Data Can Be Related . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
Access Methods: Sequential versus Direct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Overview of Methods for Combining SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
Overview of Tools for Combining SAS Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
How to Prepare Your Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Combining SAS Data Sets: Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Concatenating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
One-to-One Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
One-to-One Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
Match-Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Updating with the UPDATE and the MODIFY Statements . . . . . . . . . . . . . . . . . . . . . . 543
Error Checking When Using Indexes to Randomly Access or Update Data . . . . . 555
The Importance of Error Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Error-Checking Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Example 1: Routing Execution When an Unexpected Condition Occurs . . . . . . . . . 556
Example 2: Using Error Checking on All Statements That Use KEY= . . . . . . . . . . . 559
• interleaving
• one-to-one reading
• one-to-one merging
• match-merging
The methods for combining SAS data sets are defined in “Combining SAS Data
Sets: Methods” on page 523.
Modifying SAS data sets
refers to using the MODIFY statement to update information in a SAS data set in
place. The MODIFY statement can save disk space because it modifies data in
place, without creating a copy of the data set. You can modify a SAS data set
with programming statements or with information that is stored in another data
set.
Overview of Tools
The primary tools that are used for reading, combining, and modifying SAS data
sets are four statements: SET, MERGE, MODIFY, and UPDATE. This section
describes these tools and shows examples. For complete information about these
statements, see the SAS DATA Step Statements: Reference.
run;
For details about reading from multiple SAS data sets, see “Combining SAS Data
Sets: Methods” on page 523.
Table 23.1 Statements and Options That Control Reading and Writing
Statements    Data Set Options
KEEP          KEEP=
RENAME        RENAME=
DELETE        OBS=
REMOVE
OUTPUT
Use statements or data set options (such as KEEP= and DROP=) to control the
variables and observations that you want to write to the output data set. The
WHERE statement is an exception: it controls which observations are read into the
program data vector based on the value of a variable. You can use data set options
(including WHERE=) on input or output data sets, depending on their function and
what you want to control. You can also use SAS system options to control your data.
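As a rough sketch of the difference between controlling input and controlling output, the following step applies data set options on both sides. The data set STAFF and its variables are hypothetical and exist only for this illustration.

```sas
/* Hypothetical input data set */
data staff;
   input Name $ Dept $ Salary Age;
   datalines;
Ann Sales 52000 41
Bob Admin 38000 29
Cy Sales 47000 35
;

data sales_staff(drop=Dept);           /* DROP= on output: Dept is usable in the step but not written */
   set staff(keep=Name Dept Salary     /* KEEP= on input: Age is never read into the program data vector */
             where=(Dept='Sales'));    /* WHERE= on input: only matching observations are read */
   Bonus = Salary * 0.05;
run;
```

With this input, SALES_STAFF contains two observations (Ann and Cy) and the variables Name, Salary, and Bonus.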
• one-to-many
• many-to-one
• many-to-many
To obtain the results that you want, you should understand how each of these
methods combines observations, how each method treats duplicate values of
common variables, and how each method treats missing values or nonmatched
values of common variables. Some of the methods also require that you preprocess
your data sets by sorting them or by creating indexes. See the description of each
method in “Combining SAS Data Sets: Methods” on page 523.
One-to-One Relationship
In a one-to-one relationship, typically a single observation in one data set is related
to a single observation from another based on the values of one or more selected
variables. A one-to-one relationship implies that each value of the selected variable
occurs no more than once in each data set. When you work with multiple selected
variables, this relationship implies that each combination of values occurs no more
than once in each data set.
In the following example, observations in data sets Salary and Taxes are related by
common values for EmployeeNumber.
SALARY TAXES
ONE              TWO
A  B  C          A  E  F
1  5  6          1  2  0
3  3  4          1  3  99
                 1  4  88
                 1  5  77
                 2  1  66
                 2  2  55
                 3  4  44
In the following example, observations in data sets One, Two, and Three are related
by common values for variable ID. Values of ID are unique in data sets One and
Three but not in Two. For values 2 and 3 of ID, a one-to-many relationship exists
between observations in data sets One and Two. A many-to-one relationship exists
between observations in data sets Two and Three.
Many-to-Many Relationships
The many-to-many category implies that multiple observations from each input data
set can be related based on values of one or more common variables.
In the following example, observations in data sets BreakDown and Maintenance
are related by common values for variable Vehicle. Values of Vehicle are not unique
in either data set. A many-to-many relationship exists between observations in these
data sets for values AAA and CCC of Vehicle.
BREAKDOWN MAINTENANCE
Overview
Once you have established data relationships, the next step is to determine the best
mode of data access to relate the data. You can access observations sequentially in
the order in which they appear in the physical file. Or you can access them directly.
That is, you can go straight to an observation in a SAS data set without having to
process each observation that precedes it.
Sequential Access
The simplest and perhaps most common way to process data with a DATA step is to
read observations in a data set sequentially. You can read observations sequentially
using the SET, MERGE, UPDATE, or MODIFY statements. You can also use the
SAS File I/O functions, such as OPEN, FETCH, and FETCHOBS.
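Sequential reading with the SAS File I/O functions might look like the following sketch. The data set CLASS and its variables are hypothetical; the functions used (OPEN, VARNUM, FETCH, GETVARC, GETVARN, CLOSE, SYSMSG) are standard SAS functions.

```sas
/* Hypothetical data set to read */
data class;
   input Name $ Age;
   datalines;
Ann 41
Bob 29
;

data _null_;
   dsid = open('work.class');        /* returns 0 if the data set cannot be opened */
   if dsid = 0 then do;
      msg = sysmsg();
      put msg=;
      stop;
   end;
   namepos = varnum(dsid, 'Name');   /* position of each variable in the data set */
   agepos  = varnum(dsid, 'Age');
   do while (fetch(dsid) = 0);       /* FETCH returns 0 until the end of the file is reached */
      Name = getvarc(dsid, namepos);
      Age  = getvarn(dsid, agepos);
      put Name= Age=;
   end;
   rc = close(dsid);
run;
```

For most programs a SET statement is simpler; the function-based form is mainly useful when the data set name is not known until execution time.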
Direct Access
Direct access allows a program to access specific observations based on one of two
methods:
• by an observation number
• by the value of one or more variables through an index
To access observations directly by their observation number, use the POINT= option
with the SET or MODIFY statement. The POINT= option names a variable whose
current value determines which observation a SET or MODIFY statement reads.
To access observations directly based on the values of one or more specified
variables, you must first create an index for the variables and then read the data set
using the KEY= option. The KEY= option can be specified with either the SET
statement or the MODIFY statement. An index is a separate structure that contains
the data values of the key variable or variables, paired with a location identifier for
the observations containing the value.
Note: You can also use the SAS File I/O functions such as CUROBS, NOTE,
POINT, and FETCHOBS to access observations by observation number.
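The two direct-access methods can be sketched as follows. The data sets PARTS and ORDERS and their variables are hypothetical, and the error handling is deliberately minimal: it only tests whether _IORC_ is zero.

```sas
/* Hypothetical master data set with a simple index on PartNo */
data parts(index=(PartNo));
   input PartNo Desc $;
   datalines;
101 Bolt
102 Nut
103 Washer
104 Screw
;

/* Direct access by observation number: read every second observation */
data every_other;
   do obsnum = 1 to total by 2;
      set parts point=obsnum nobs=total;
      output;
   end;
   stop;                      /* required: with POINT=, SAS never detects end of file */
run;

/* Hypothetical transaction data set */
data orders;
   input PartNo Qty;
   datalines;
103 5
999 2
101 7
;

/* Direct access by value: look up each order's part through the index */
data priced;
   set orders;
   set parts key=PartNo / unique;
   if _iorc_ ne 0 then do;    /* no matching value was found in the index */
      _error_ = 0;            /* reset the error flag that a failed lookup sets */
      Desc = ' ';             /* clear the value retained from the previous match */
   end;
run;
```

In this sketch EVERY_OTHER receives observations 1 and 3 of PARTS, and the order for part 999 is written to PRICED with a blank Desc.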
• one-to-one reading
• one-to-one merging
• match-merging
• updating
Concatenating
The following figure shows the results of concatenating two SAS data sets.
Concatenating the data sets appends the observations from one data set to another
data set. The DATA step reads Data1 sequentially until all observations have been
processed, and then reads Data2. Data set Combined contains the results of the
concatenation. Note that the data sets are processed in the order in which they are
listed in the SET statement.
Interleaving
The following figure shows the results of interleaving two SAS data sets.
Interleaving intersperses observations from two or more data sets, based on one or
more common variables. Data set Combined shows the result.
DATA1        DATA2        COMBINED
Year         Year         Year
1991                      1991
1992         1992         1992
1993    +    1993    =    1992
1994         1994         1993
1995         1995         1993
             1996         1994
                          1994
                          1995
                          1995
                          1996

data combined;
   set data1 data2;
   by Year;
run;
One-to-One Reading
data combined;
   set data1;
   set data2;
run;
One-to-One Merging
data combined;
   merge data1 data2;
run;
Match-Merging
The following figure shows the results of match-merging. Match-merging combines
observations from two or more SAS data sets into a single observation in a new
data set based on the values of one or more common variables. Data set Combined
shows the results.
data combined;
   merge data1 data2;
   by Year;
run;
Updating
The following figure shows the results of updating a master data set. Updating uses
information from observations in a transaction data set to delete, add, or alter
information in observations in a master data set. You can update a master data set
by using the UPDATE statement or the MODIFY statement. If you use the UPDATE
statement, your input data sets must be sorted by the values of the variables listed
in the BY statement. (In this example, Master and Transaction are both sorted by
Year.) If you use the MODIFY statement, your input data does not need to be
sorted.
UPDATE replaces an existing file with a new file, enabling you to add, delete, or
rename columns. MODIFY performs an update in place by rewriting only those
records that have changed, or by appending new records to the end of the file.
Note that by default, UPDATE and MODIFY do not replace nonmissing values in a
master data set with missing values from a transaction data set.
MASTER                  TRANSACTION             MASTER
Year  VarX  VarY        Year  VarX  VarY        Year  VarX  VarY
1985  X1    Y1                                  1985  X1    Y1
1986  X1    Y1                                  1986  X1    Y1
1987  X1    Y1                                  1987  X1    Y1
1988  X1    Y1                                  1988  X1    Y1
1989  X1    Y1                                  1989  X1    Y1
1990  X1    Y1                                  1990  X1    Y1
1991  X1    Y1          1991  X2                1991  X2    Y1
1992  X1    Y1     +    1992  X2    Y2     =    1992  X2    Y2
1993  X1    Y1          1993  X2                1993  X2    Y2
1994  X1    Y1          1993        Y2          1994  X1    Y1
                        1995  X2    Y2          1995  X2    Y2

data master;
   update master transaction;
   by Year;
run;
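The default treatment of missing transaction values can be tried with a small runnable sketch. The data below reuse a few rows from the figure; the output data set names MASTER_NEW and MASTER_RAW are hypothetical. The second step uses the UPDATEMODE=NOMISSINGCHECK option of the UPDATE statement, which allows missing transaction values to replace master values.

```sas
data master;
   input Year VarX $ VarY $;
   datalines;
1990 X1 Y1
1991 X1 Y1
1992 X1 Y1
1993 X1 Y1
1994 X1 Y1
;

/* A single period is read as a missing (blank) character value */
data transaction;
   input Year VarX $ VarY $;
   datalines;
1991 X2 .
1992 X2 Y2
1993 X2 .
1993 .  Y2
1995 X2 Y2
;

/* Default behavior: missing transaction values do not overwrite master values */
data master_new;
   update master transaction;
   by Year;
run;

/* Assumption for illustration: let missing transaction values overwrite */
data master_raw;
   update master transaction updatemode=nomissingcheck;
   by Year;
run;
```

In MASTER_NEW, the 1991 observation keeps VarY=Y1, the two 1993 transactions are applied in turn to produce X2 and Y2, and 1995 is added as a new observation.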
Access Method
* PROC SQL is the SAS implementation of Structured Query Language. In addition to expected SQL capabilities, PROC
SQL includes additional capabilities specific to SAS, such as the use of formats and SAS macro language.
• Ensure that observations are in the correct order, or that they can be retrieved in
the correct order (for example, by using an index).
• Test your program.
To print a sample of the observations, use the PRINT procedure or the REPORT
procedure.
You can also use functions such as VTYPE, VLENGTH, and VLENGTHX to show
specific descriptor information. For complete information about these functions, see
SAS Functions and CALL Routines: Reference.
SAS includes only one variable of a given name in the new data set. If two data
sets have variables with the same names but different data, the values from the
last data set that was read are written over the values from the previously read
data sets.
To correct the error, you can rename variables before you combine the data sets
by using the RENAME= data set option in the SET, UPDATE, or MERGE
statement. Or you can use the DATASETS procedure.
• common variables with the same data but different attributes
The way SAS handles these differences depends on which attributes are
different:
◦ type attribute
If the type attribute is different, SAS stops processing the DATA step and
issues an error message stating that the variables are incompatible.
To correct this error, you must use a DATA step to re-create the variables.
The SAS statements that you use depend on the nature of the variable.
◦ length attribute
If the length attribute is different, SAS takes the length from the first data set
that contains the variable. In the following example, all data sets that are
listed in the MERGE statement contain the variable Mileage. In Quarter1, the
length of the variable Mileage is four bytes; in Quarter2, it is eight bytes and
in Quarter3 and Quarter4, it is six bytes. In the output data set Yearly, the
length of the variable Mileage is four bytes, which is the length derived from
Quarter1.
data yearly;
   merge quarter1 quarter2 quarter3 quarter4;
   by Account;
run;
To override the default and set the length yourself, specify the appropriate
length in a LENGTH statement that precedes the SET, MERGE, or UPDATE
statement.
Note: If the length of a variable changes as a result of combining data sets,
SAS prints a warning message to the log and issues a nonzero return code.
For example, on z/OS, the value for SYSRC would be 4. If you do not want
SAS to issue a warning, you can turn it off by setting the VARLENCHK=
system option to NOWARN. For example, if you expect truncation of data
because you are removing insignificant blanks from the end of a character
value, you might not want the warnings. For more information, see
“VARLENCHK= System Option” in SAS System Options: Reference.
◦ label, format, and informat attributes
If any of these attributes are different, SAS takes the attribute from the first
data set that contains the variable with that attribute. However, any label,
format, or informat that you explicitly specify overrides a default. If all data
sets contain explicitly specified attributes, the one specified in the first data
set overrides the others. To ensure that the new output data set has the
attributes that you prefer, use an ATTRIB statement.
You can also use Variable Information functions, such as VLABEL and VLABELX,
to access this information. For complete information about these functions, see
SAS Functions and CALL Routines: Reference.
◦ extended attributes
Like formats and labels, extended attributes are automatically passed from
the input data set to the output data set in a DATA step. If two input data sets
contain extended attributes, then SAS preserves the extended attributes from
the first data set read and applies those attributes to the output data set. To
ensure that the new output data set has the extended attributes that you
prefer, use the DATASETS procedure to add, delete, remove, set, and update
extended attributes. For more information about the DATASETS procedure
see “DATASETS Procedure” in Base SAS Procedures Guide.
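Two of the corrections described in this list, renaming same-named variables and overriding the default length, can be sketched as follows. The contents of QUARTER1 and QUARTER2 are hypothetical, as are the names MileageQ1, MileageQ2, BOTH, and STACKED.

```sas
/* Hypothetical inputs: Mileage is stored with different lengths */
data quarter1;
   length Account $ 4 Mileage 4;
   input Account $ Mileage;
   datalines;
A100 1200
A200 800
;

data quarter2;
   length Account $ 4 Mileage 8;
   input Account $ Mileage;
   datalines;
A100 1500
A200 950
;

/* RENAME= keeps both values instead of letting the last one read win */
data both;
   merge quarter1(rename=(Mileage=MileageQ1))
         quarter2(rename=(Mileage=MileageQ2));
   by Account;
run;

/* A LENGTH statement before SET overrides the length taken from quarter1 */
data stacked;
   length Mileage 8;
   set quarter1 quarter2;
run;
```

Without the LENGTH statement, Mileage in STACKED would take its length of 4 bytes from QUARTER1, the first data set that contains the variable.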
Concatenating
Definition
Concatenating data sets is the combining of two or more data sets, one after the
other, into a single data set. The number of observations in the new data set is the
sum of the number of observations in the original data sets. The order of
observations is sequential. All observations from the first data set are followed by all
observations from the second data set, and so on.
In the simplest case, all input data sets contain the same variables. If the input data
sets contain different variables, observations from one data set have missing values
for variables defined only in other data sets. In either case, the variables in the new
data set are the same as the variables in the old data sets.
Syntax
Use this form of the SET statement to concatenate data sets:
SET data-set(s);
where
data-set
specifies any valid SAS data set name.
For a complete description of valid SAS data set names, see the SET statement in
SAS DATA Step Statements: Reference.
ANIMAL                 PLANT
1  a  Ant    5         1  g  Grape     69
2  b  Bird             2  h  Hazelnut  55
3  c  Cat    17        3  i  Indigo    .
4  d  Dog    9         4  j  Jicama    14
5  e  Eagle            5  k  Kale      5
6  f  Frog   76        6  l  Lentil    77
The following program uses a SET statement to concatenate the data sets and then
prints the results:
data concatenation;
   set animal plant;
run;
proc print data=concatenation;
run;
The resulting data set CONCATENATION has 12 observations, which is the sum of
the observations from the combined data sets. The program data vector contains all
variables from all data sets. The values of variables found in one data set but not in
another are set to missing.
YEAR1 YEAR2
Date1 Date2
2009
2010 2010
2011 2011
2012 2012
2013
2014
The following SQL code creates and prints the table Combined.
proc sql;
   title 'SQL Table Combined';
   create table combined as
      select * from year1
      union all
      select * from year2;
   select * from combined;
quit;
Appending Files
Instead of concatenating data sets or tables, you can append them and produce the
same results as concatenation. SAS concatenates data sets (DATA step) and tables
(SQL) by reading each row of data to create a new file. To avoid reading all the
records, you can append the second file to the first file by using the APPEND
procedure:
proc append base=year1 data=year2;
run;
Efficiency
If no additional processing is necessary, using PROC APPEND or the APPEND
statement in PROC DATASETS is more efficient than using a DATA step to
concatenate data sets.
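The APPEND statement in PROC DATASETS can be sketched as follows. The data sets YEAR_A and YEAR_B are hypothetical and deliberately share the same variables, so that no additional options are needed.

```sas
/* Hypothetical base data set */
data year_a;
   input Year Sales;
   datalines;
2019 100
2020 120
;

/* Hypothetical data set to append */
data year_b;
   input Year Sales;
   datalines;
2021 140
;

/* Adds the observations of year_b to the end of year_a
   without rereading the observations already in year_a */
proc datasets library=work nolist;
   append base=year_a data=year_b;
run;
quit;
```

After the step, YEAR_A contains three observations. Unlike a DATA step concatenation, this changes the base data set in place rather than creating a new one.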
Interleaving
Definition
Interleaving uses a SET statement and a BY statement to combine multiple data
sets into one new data set. The number of observations in the new data set is the
sum of the number of observations from the original data sets. However, the
observations in the new data set are arranged by the values of the BY variable or
variables and, within each BY group, by the order of the data sets in which they
occur. You can interleave data sets either by using a BY variable or by using an
index.
Syntax
Use this form of the SET statement to interleave data sets when you use a BY
variable:
SET data-set(s);
BY variable(s);
where
data-set
specifies a one-level name, a two-level name, or one of the special SAS data set
names.
variable
specifies each variable by which the data set is sorted. These variables are
referred to as BY variables for the current DATA or PROC step.
Use this form of the SET statement to interleave data sets when you use an index:
SET data-set-1 . . . data-set-n KEY= index;
where
data-set
specifies a one-level name, a two-level name, or one of the special SAS data set
names.
index
names an index that provides nonsequential access to observations in a SAS data
set, based on the value of an index variable or key.
For a complete description of the SET statement, including SET with the KEY=
option, see the SET statement in SAS DATA Step Statements: Reference.
Sort Requirements
Before you can interleave data sets, the observations must be sorted or grouped by
the same variable or variables that you use in the BY statement, or you must have
an appropriate index for the data sets.
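The two ways of meeting this requirement can be sketched side by side. The data sets TEAM_A and TEAM_B and the variable Id are hypothetical. One input is sorted with PROC SORT; the other is left unsorted and given a simple index with the INDEX CREATE statement of PROC DATASETS, on the assumption that BY processing then retrieves its observations through the index.

```sas
/* Hypothetical unsorted inputs */
data team_a;
   input Id Name $;
   datalines;
3 Cy
1 Ann
2 Bob
;

data team_b;
   input Id Name $;
   datalines;
2 Dee
4 Eve
1 Flo
;

/* Option 1: sort the data set by the BY variable */
proc sort data=team_a;
   by Id;
run;

/* Option 2: leave the data set unsorted and create an index on the BY variable */
proc datasets library=work nolist;
   modify team_b;
   index create Id;
run;
quit;

data roster;
   set team_a team_b;
   by Id;
run;
```

Sorting is usually the simpler choice for a one-time job; an index can be worthwhile when the same data set is read in BY-variable order repeatedly.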
The following program uses SET and BY statements to interleave the data sets, and
prints the results:
data interleaving;
   set animal plant;
   by Common;
run;
proc print data=interleaving;
run;
The resulting data set INTERLEAVING has 12 observations, which is the sum of the
observations from the combined data sets. The new data set contains all variables
from both data sets. The values of variables found in one data set but not in the other
are set to missing, and the observations are arranged by the values of the BY
variable.
ANIMAL1              PLANT1
1  a  Ant            1  a  Apple
2  a  Ape            2  b  Banana
3  b  Bird           3  c  Coconut
4  c  Cat            4  c  Celery
5  d  Dog            5  d  Dewberry
6  e  Eagle          6  e  Eggplant
The following program uses SET and BY statements to interleave the data sets, and
prints the results:
data interleaving2;
   set animal1 plant1;
   by Common;
run;
proc print data=interleaving2;
run;
Output 23.4 Interleaved Data Sets with Duplicate Values of the BY Variable
The number of observations in the new data set is the sum of the observations in all
the data sets. The observations are written to the new data set in the order in which
they occur in the original data sets.
ANIMAL2              PLANT2
1  a  Ant            1  a  Apple
2  c  Cat            2  b  Banana
3  d  Dog            3  c  Coconut
4  e  Eagle          4  e  Eggplant
                     5  f  Fig
This program uses SET and BY statements to interleave these data sets, and prints
the results:
data interleaving3;
   set animal2 plant2;
   by Common;
run;
proc print data=interleaving3;
run;
The resulting data set has nine observations arranged by the values of the BY
variable.
One-to-One Reading
Definition
One-to-one reading combines observations from two or more data sets into one
observation by using two or more SET statements to read observations
independently from each data set. This process is also called one-to-one matching.
The new data set contains all the variables from all the input data sets. The number
of observations in the new data set is the number of observations in the smallest
original data set. If the data sets contain common variables, the values that are read
in from the last data set replace the values that were read in from earlier data sets.
Syntax
Use this form of the SET statement for one-to-one reading:
SET data-set-1;
SET data-set-2;
where
data-set-1
specifies a one-level name, a two-level name, or one of the special SAS data set
names. data-set-1 is the first file that the DATA step reads.
data-set-2
specifies a one-level name, a two-level name, or one of the special SAS data set
names. data-set-2 is the second file that the DATA step reads.
CAUTION! Use care when you combine data sets with multiple SET statements.
Using multiple SET statements to combine observations can produce undesirable
results. Test your program on representative samples of the data sets before using this
method to combine them.
For a complete description, see SET Statement in SAS DATA Step Statements:
Reference.
data vector to missing, except for those variables that were created or assigned
values during the DATA step.
Execution — Step 2
SAS continues reading from one data set and then the other until it detects an
end-of-file indicator in one of the data sets. SAS stops processing with the last
observation of the shortest data set and does not read the remaining
observations from the longer data set.
ANIMAL               PLANT
1  a  Ant            1  a  Apple
2  b  Bird           2  b  Banana
3  c  Cat            3  c  Coconut
4  d  Dog            4  d  Dewberry
5  e  Eagle          5  e  Eggplant
6  f  Frog           6  g  Fig
The following program uses two SET statements to combine observations from
Animal and Plant, and prints the results:
data twosets;
   set animal;
   set plant;
run;
proc print data=twosets;
run;
Output 23.6 Data Set Created from Two Data Sets That Have Equal Observations
Each observation in the new data set contains all the variables from all the data
sets. Note, however, that the Common variable value in observation 6 contains a
“g.” The value of Common in observation 6 of the Animal data set was overwritten
by the value in Plant, which was the data set that SAS read last.
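The rule that one-to-one reading stops at the end of the shortest data set can be checked with a small sketch. The data sets FIVE and THREE and their variables are hypothetical. A one-to-one merge of the same inputs is included for contrast.

```sas
/* Hypothetical inputs with unequal numbers of observations */
data five;
   input A;
   datalines;
1
2
3
4
5
;

data three;
   input B $;
   datalines;
x
y
z
;

/* One-to-one reading: the step ends when either SET reaches end of file */
data paired;
   set five;
   set three;
run;

/* One-to-one merging: the step continues until all inputs are exhausted */
data merged;
   merge five three;
run;
```

PAIRED contains three observations, because the second SET statement reaches the end of THREE on the fourth iteration and the step stops before writing. MERGED contains five observations, with B missing in observations 4 and 5.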
One-to-One Merging
Definition
One-to-one merging combines observations from two or more SAS data sets into a
single observation in a new data set. To perform a one-to-one merge, use the
MERGE statement without a BY statement. SAS combines the first observation from
all data sets in the MERGE statement into the first observation in the new data set,
the second observation from all data sets into the second observation in the new
data set, and so on. In a one-to-one merge, the number of observations in the new
data set equals the number of observations in the largest data set that was named
in the MERGE statement.
If you use the MERGENOBY= SAS system option, you can control whether SAS
issues a message when MERGE processing occurs without an associated BY
statement.
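Setting the option might look like the following sketch. The data sets A and B are hypothetical. MERGENOBY= accepts NOWARN (the default), WARN, and ERROR.

```sas
/* Hypothetical inputs */
data a;
   input X;
   datalines;
1
2
;

data b;
   input Y;
   datalines;
10
20
;

options mergenoby=warn;     /* ask SAS to warn about a MERGE with no BY statement */

data ab;
   merge a b;               /* no BY statement: a one-to-one merge */
run;

options mergenoby=nowarn;   /* restore the default setting */
```

Setting MERGENOBY=ERROR instead would stop such a step with an error, which can be useful when a missing BY statement is more likely a mistake than a deliberate one-to-one merge.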
Syntax
Use this form of the MERGE statement to merge SAS data sets:
MERGE data-set(s);
where
data-set
names at least two existing SAS data sets.
CAUTION! Avoid using duplicate values or different values of common variables.
One-to-one merging with data sets that contain duplicate values of common variables
can produce undesirable results. If a variable exists in more than one data set, the value
from the last data set that is read is the one that is written to the new data set. The
variables are combined exactly as they are read from each data set. Using a one-to-one
merge to combine data sets with different values of common variables can also produce
undesirable results. If a variable exists in more than one data set, the value from the last
data set read is the one that is written to the new data set even if the value is missing.
Once SAS has processed all observations in a data set, all subsequent observations in
the new data set have missing values for the variables that are unique to that data set.
For a complete description of the MERGE statement, see the MERGE statement in
SAS DATA Step Statements: Reference.
ANIMAL               PLANT
1  a  Ant            1  a  Apple
2  b  Bird           2  b  Banana
3  c  Cat            3  c  Coconut
4  d  Dog            4  d  Dewberry
5  e  Eagle          5  e  Eggplant
6  f  Frog           6  g  Fig
The following program merges these data sets and prints the results:
data combined;
   merge animal plant;
run;
proc print data=combined;
run;
Output 23.7 Merged Data Sets That Have an Equal Number of Observations
Each observation in the new data set contains all variables from all data sets. If two
data sets contain the same variables, the values from the second data set replace
the values from the first data set, as shown in observation 6.
ANIMAL1              PLANT1
1  a  Ant            1  a  Apple
2  b  Bird           2  b  Banana
3  c  Cat            3  c  Coconut
4  d  Dog
5  e  Eagle
6  f  Frog
The following program merges these unequal data sets and prints the results:
data combined1;
   merge animal1 plant1;
run;
proc print data=combined1;
run;
Output 23.8 Merged Data Sets That Have an Unequal Number of Observations
Note that observations 4 through 6 contain missing values for the variable Plant.
ANIMAL1              PLANT1
1  a  Ant            1  a  Apple
2  a  Ape            2  b  Banana
3  b  Bird           3  c  Coconut
4  c  Cat            4  c  Celery
5  d  Dog            5  d  Dewberry
6  e  Eagle          6  e  Eggplant
The following program produces the data set MERGE1 and prints the
results:
/* This program illustrates undesirable results. */
data merge1;
   merge animal1 plant1;
run;
proc print data=merge1;
run;
The number of observations in the new data set is six. Note that observations 2 and
3 contain undesirable values. SAS reads the second observation from data set
Animal1. It then reads the second observation from data set Plant1 and replaces the
values for the variables Common and Plant1. The third observation is created in the
same way.
ANIMAL2              PLANT2
1  a  Ant            1  a  Apple
2  c  Cat            2  b  Banana
3  d  Dog            3  c  Coconut
4  e  Eagle          4  e  Eggplant
                     5  f  Fig
The following program produces the data set MERGE2 and prints the results:
/* This program illustrates undesirable results. */
data merge2;
   merge animal2 plant2;
run;
proc print data=merge2;
run;
Match-Merging
Definition
Match-merging combines observations from two or more SAS data sets into a single
observation in a new data set according to the values of a common variable. The
number of observations in the new data set is the sum of the largest number of
observations in each BY group in all data sets. To perform a match-merge, use the
MERGE statement with a BY statement. Before you can perform a match-merge, all
data sets must be sorted by the variables that you specify in the BY statement or
they must have an index.
Syntax
Use this form of the MERGE statement to match-merge data sets:
MERGE data-set(s);
BY variable(s);
where
data-set
names at least two existing SAS data sets from which observations are read.
variable
names each variable by which the data set is sorted or indexed. These variables
are referred to as BY variables.
For a complete description of the MERGE and the BY statements, see SAS DATA
Step Statements: Reference.
ANIMAL               PLANT
1  a  Ant            1  a  Apple
2  b  Bird           2  b  Banana
3  c  Cat            3  c  Coconut
4  d  Dog            4  d  Dewberry
5  e  Eagle          5  e  Eggplant
6  f  Frog           6  f  Fig
The following program merges the data sets according to the values of the BY
variable Common, and prints the results:
data combined;
merge animal plant;
by Common;
run;
Each observation in the new data set contains all the variables from all the data
sets.
Animal1                      Plant1
Obs  Common  Animal1         Obs  Common  Plant1
1    a       Ant             1    a       Apple
2    a       Ape             2    b       Banana
3    b       Bird            3    c       Coconut
4    c       Cat             4    c       Celery
5    d       Dog             5    d       Dewberry
6    e       Eagle           6    e       Eggplant
The following program produces the merged data set MATCH1, and prints the
results:
data match1;
merge animal1 plant1;
by Common;
run;
In observation 2 of the output, the value of the variable Plant1 is retained until all
observations in the BY group are written to the new data set. Match-merging also
produced duplicate values in Animal1 for observations 4 and 5.
Note: The MERGE statement does not produce a Cartesian product on a many-to-
many match-merge. Instead, it performs a one-to-one merge while there are
observations in the BY group in at least one data set. When all observations in the
BY group have been read from one data set and there are still more observations in
another data set, SAS performs a one-to-many merge until all observations have
been read for the BY group.
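The following sketch contrasts this behavior with a true Cartesian product. The data sets Many1 and Many2 and their values are hypothetical. The DATA step pairs observations within the BY group by position and then carries the last value forward, whereas the PROC SQL join returns every combination.

```sas
/* Two observations for BY value 'a' */
data many1;
   input Common $ L $;
   datalines;
a L1
a L2
;

/* Three observations for BY value 'a' */
data many2;
   input Common $ R $;
   datalines;
a R1
a R2
a R3
;

/* MERGE: three observations (L1-R1, L2-R2, and L2-R3,
   because L2 is retained for the rest of the BY group) */
data merged;
   merge many1 many2;
   by Common;
run;

/* PROC SQL join: 2 x 3 = 6 rows for the same BY value */
proc sql;
   create table joined as
   select m1.Common, m1.L, m2.R
   from many1 as m1, many2 as m2
   where m1.Common = m2.Common;
quit;
```

When more than one data set has repeated BY values, SAS also writes a note to the log about the repeats, which is a useful signal that MERGE might not be the right tool for the task.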
Animal2                      Plant2
Obs  Common  Animal2         Obs  Common  Plant2
1    a       Ant             1    a       Apple
2    c       Cat             2    b       Banana
3    d       Dog             3    c       Coconut
4    e       Eagle           4    e       Eggplant
                             5    f       Fig
The following program produces the merged data set MATCH2, and prints the
results:
data match2;
merge animal2 plant2;
by Common;
run;
As the output shows, all values of the variable Common are represented in the new
data set, including missing values for the variables that are in one data set but not in
the other.
Definitions
Updating a data set refers to the process of applying changes to a master data set.
To update data sets, you work with two input data sets. The data set containing the
original information is the master data set, and the data set containing the new
information is the transaction data set.
You can update data sets by using the UPDATE statement or the MODIFY
statement:
UPDATE
uses observations from the transaction data set to change the values of
corresponding observations from the master data set. You must use a BY
statement with the UPDATE statement because all observations in the
transaction data set are keyed to observations in the master data set according
to the values of the BY variable.
MODIFY
can replace, delete, and append observations in an existing data set. Using the
MODIFY statement can save disk space because it modifies data in place,
without creating a copy of the data set.
The number of observations in the new data set is the sum of the number of
observations in the master data set and the number of unmatched observations in
the transaction data set.
For complete information about the UPDATE and the MODIFY statements, see SAS
DATA Step Statements: Reference.
master-data-set
specifies the SAS data set that you want to modify.
variable-list
names each variable by which the data set is ordered.
Note: The MODIFY statement does not support changing the descriptor portion of a
SAS data set, such as adding a variable.
For complete information, see MODIFY Statement in the SAS DATA Step
Statements: Reference.
Disk space
MODIFY: saves disk space because it updates data in place.
UPDATE: requires more disk space because it produces an updated copy of the
data set.
Sort and index
MODIFY: sorted input data sets are not required, although for good
performance, it is strongly recommended that both data sets be sorted and
that the master data set be indexed.
UPDATE: requires only that both data sets be sorted.
When to use
MODIFY: use only when you expect to process a SMALL portion of the data set.
UPDATE: use if you expect to need to process most of the data set.
Where to specify the modified data set
MODIFY: specify the updated data set in both the DATA and the MODIFY
statements.
UPDATE: specify the updated data set in the DATA and the UPDATE statements.
Duplicate BY-values
MODIFY: allows duplicate BY-values in both the master and the transaction
data sets.
UPDATE: allows duplicate BY-values in the transaction data set only. (If
duplicates exist in the master data set, SAS issues a warning.)
Scope of changes
MODIFY: cannot change the data set descriptor information, so changes such as
adding or deleting variables, variable labels, and so on, are not valid.
UPDATE: can make changes that require a change in the descriptor portion of a
data set, such as adding new variables, and so on.
Error checking
MODIFY: has error-checking capabilities using the _IORC_ automatic variable
and the SYSRC autocall macro.
UPDATE: needs no error checking because transactions without a corresponding
master record are not applied but are added to the data set.
Data set integrity
MODIFY: data might be only partially updated due to an abnormal task
termination.
UPDATE: no data loss occurs because UPDATE works on a copy of the data.
For more information about tools for combining SAS data sets, see Table 23.2 on
page 520.
The following program updates Master with the transactions in the data set
NewPlant, writes the results to UPDATE_FILE, and prints the results:
data update_file;
update master newplant;
by common;
run;
Each observation in the new data set contains a new value for the variable Plant.
The following program applies the transactions in DupPlant to Master1 and prints
the results:
data update1;
update master1 dupplant;
by Common;
run;
When this DATA step executes, SAS generates a warning message stating that
there is more than one observation for a BY group. However, the DATA step
continues to process, and the data set Update1 is created.
The resulting data set has seven observations. Observations 2 and 3 have duplicate
values of the BY variable Common. However, the value of the variable Plant1 was
not updated in the second occurrence of the duplicate BY value.
The following program updates the data set Master2 and prints the results:
data update2_file;
update master2 nonplant;
by Common;
run;
Output 23.16 Results of Updating with New Variables, Nonmatched Observations, and
Missing Values
As shown, all observations now include values for the variable Mineral. The value of
Mineral is set to missing for some observations. Observations 2 and 6 in the
transaction data set did not have corresponding observations in Master2, and they
have become new observations. Observation 3 from the master data set was written
to the new data set without change, and the value for Plant2 in observation 4 was
not changed to missing. Three observations in the new data set contain updated
values for the variable Plant2.
The following program uses the UPDATEMODE= option in the UPDATE
statement, and prints the results:
data update2_file;
update master2 nonplant updatemode=nomissingcheck;
by Common;
run;
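To make the effect of UPDATEMODE= easier to see, here is a small side-by-side sketch. The data sets Master_Demo and Trans_Demo and their values are hypothetical and are not the Master2 and NonPlant data sets used above. With the default (MISSINGCHECK), a missing value in the transaction data set leaves the master value alone; with NOMISSINGCHECK, the missing value replaces it.

```sas
/* Hypothetical master data, already in order by Common */
data master_demo;
   input Common $ Plant $;
   datalines;
a Apple
b Banana
;

/* Hypothetical transactions; the period is read as a missing value */
data trans_demo;
   input Common $ Plant $;
   datalines;
a Apricot
b .
;

/* Default behavior: the missing transaction value does not
   overwrite Banana */
data keep_values;
   update master_demo trans_demo;
   by Common;
run;

/* NOMISSINGCHECK: the missing transaction value replaces Banana */
data allow_missing;
   update master_demo trans_demo updatemode=nomissingcheck;
   by Common;
run;

proc print data=keep_values;
run;

proc print data=allow_missing;
run;
```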
data add_Inventory;
input PartNumber $ PartName $ Add_New_Stock New_Price;
format New_Price comma12.2;
datalines;
K89R seal 6 247.50
AA11 hammer 55 32.26
BB22 wrench 21 17.35
KJ66 cutter 10 24.50
CC33 socket 7 22.19
BV1E timer 30 36.50
;
proc sort data=add_Inventory; by PartNumber; run;
proc print data=add_Inventory;
title "Add_Inventory Data Set Sorted By PartNumber";
run;
Note: The SORT procedure is not required when modifying a data set using the
MODIFY statement. The data sets in this example are sorted to better show the
differences between the two data sets.
data Inventory;
modify Inventory add_Inventory; /* 1 */
by PartNumber;
select (_iorc_); /* 2 */
/*** The observation exists in the master data set */
when (%sysrc(_sok)) do; /* 3 */
Amount_in_Stock = Amount_in_Stock + Add_New_Stock;
Price = New_Price;
replace; /* 4 */
end;
/*** The observation does not exist in the master data set*/
when (%sysrc(_dsenmr)) do; /* 5 */
Amount_in_Stock=Add_New_Stock;
ReceivedDate=today();
Price=New_Price;
output; /* 6 */
_error_=0;
end;
otherwise do; /* 7 */
put "An unexpected I/O error has occurred.";
_error_ = 0;
stop;
end;
end;
run;
proc sort data=Inventory;
by PartNumber;
run;
proc print data=Inventory;
title "Updated Inventory Data Set Sorted by PartNumber";
run;
quit;
1 The MODIFY statement loads the data from the master and transaction data
sets. The BY statement matches observations from each data set based on the
unique values of the variable PartNumber.
2 If matches for PartNumber from the transaction data set are found for
PartNumber in the master data set, then the _IORC_ automatic variable is
automatically set to a code of _SOK.
3 The %SYSRC autocall macro checks to see whether the value of _IORC_ is
_SOK. If the value is _SOK, then the SELECT statement executes the first DO
statement block. Because the observation in the transaction data set matches
the observation in the master data set, the values in the observation can be
updated by being replaced.
4 The REPLACE statement updates the master data set by replacing its
observation with the observation from the transaction data set. The REPLACE
statement updates observations 4, 7, and 8, highlighted in blue in the output,
with new values for Amount_in_Stock and Price. The Amount_in_Stock values
are updated based on the values for Add_New_Stock in the transaction data set.
The Price values are updated based on the values for New_Price in the
transaction data set. The ReceivedDate values for these observations are not
updated, because these are existing items that were received in the past.
5 If no matches for PartNumber in the transaction data set are found for
PartNumber in the master data set, then the _IORC_ automatic variable is
automatically set to a code of _DSENMR, which means that no match was
found. The %SYSRC autocall macro checks to see whether the value of _IORC_
is _DSENMR. If the value is _DSENMR, then the SELECT statement executes
the second DO block. Because the observation in the transaction data set does
not exist in the master data set, the values cannot simply be replaced. An entire
observation is created and added to the master data set.
6 The OUTPUT statement writes the new observation to the master data set. The
OUTPUT statement adds observations 1, 2, and 5 to the master data set (see
the observations highlighted in yellow in the output). The ReceivedDate values
for these observations are updated based on the returned value for the TODAY
function.
7 If neither condition is met, the OTHERWISE statement executes the last DO
block and the PUT statement writes an error message to the log.
In the output below, the transaction data set contains 3 new items: hammer, wrench,
and socket. Because some observations do not exist in the master data set and are
being added from the transaction data set, an explicit OUTPUT statement is
needed. For those observations that do already exist in the master data set, the
REPLACE statement is needed to update the values for these observations.
The program uses the OUTPUT statement to add observations 1, 2, and 5 to the
master data set, and it uses the REPLACE statement to update observations 4, 7,
and 8 with new values for Amount_in_Stock and Price.
Figure 23.10 Results for the Inventory Master Data Set Sorted by PartNumber
Error Checking When Using Indexes to Randomly Access or Update Data 555
Note: Using the OUTPUT or REPLACE statements in a DATA step overrides the
default replacement of observations. If you use these statements in a DATA step,
then you must explicitly program each action that you want to take.
For more information about the statements in this program, see the following:
- “Error Checking When Using Indexes to Randomly Access or Update Data” on
  page 555
- “%SYSRC Autocall Macro” in SAS Macro Language: Reference
Error-Checking Tools
Two tools have been created to make error checking easier when you use the
MODIFY statement or the SET statement with the KEY= option to process SAS data
sets:
- _IORC_ automatic variable
- SYSRC autocall macro
_IORC_ is created automatically when you use the MODIFY statement or the SET
statement with KEY=. The value of _IORC_ is a numeric return code that indicates
the status of the I/O operation from the most recently executed MODIFY or SET
statement with KEY=. Checking the value of this variable enables you to detect
abnormal I/O conditions and to direct execution down specific code paths instead of
having the application terminate abnormally. For example, if the KEY= variable
value does match between two observations, you might want to combine them and
write them to the output data set. If they do not match, however, you might want
SAS to write a note to the log.
Because the values of the _IORC_ automatic variable are internal and subject to
change, the SYSRC macro was created to enable you to test for specific I/O
conditions while protecting your code from future changes in _IORC_ values. When
you use SYSRC, you can check the value of _IORC_ by specifying one of the
mnemonics listed in the following table.
Table 23.4 Most Common Mnemonic Values of _IORC_ for DATA Step Processing
Overview
This example shows how to prevent an unexpected condition from terminating the
DATA step. The goal is to update a master data set with new information from a
transaction data set. This application assumes that there are no duplicate values for
the common variable in either data set.
Note: This program works as expected only if the master and transaction data sets
contain no consecutive observations with the same value for the common variable.
For an explanation of the behavior of MODIFY with KEY= when duplicates exist,
see the MODIFY statement in SAS DATA Step Statements: Reference.
Master                         Transaction
Obs  PartNumber  Quantity      Obs  PartNumber  AddQuantity
1    1           10            1    4           14
2    2           20            2    6           16
3    3           30            3    2           12
4    4           40
5    5           50
Original Program
The objective is to update the Master data set with information from the Transaction
data set. The program reads Transaction sequentially. Master is read directly, not
sequentially, using the MODIFY statement and the KEY= option. Only observations
with matching values for PartNumber, which is the KEY= variable, are read from
Master.
data master;                             /* 1 */
   set transaction;                      /* 2 */
   modify master key=PartNumber;         /* 3 */
   Quantity = Quantity + AddQuantity;    /* 4 */
run;
Resulting Log
This program has correctly updated one observation but it stopped when it could not
find a match for PartNumber value 6. The following lines are written to the SAS log:
ERROR: No matching observation was found in Master data set.
PartNumber=6 AddQuantity=16 Quantity=70 _ERROR_=1
_IORC_=1230015 _N_=2
NOTE: The SAS System stopped processing this step because
of errors.
NOTE: The data set WORK.MASTER has been updated. There were
1 observations rewritten, 0 observations added and 0
observations deleted.
Revised Program
The objective is to apply two updates and one addition to Master without
letting the DATA step stop when it does not find a match in Master for
the PartNumber value 6 in Transaction. With error checking added, this DATA step
completes normally and produces a correctly revised version of Master.
This program uses the _IORC_ automatic variable and the SYSRC autocall macro
in a SELECT group to check the value of the _IORC_ variable. The SELECT group
then executes the code that is appropriate for each return code.
data master; /* 1 */
set transaction; /* 2 */
modify master key=PartNumber; /* 3 */
select(_iorc_); /* 4 */
when(%sysrc(_sok)) do;
Quantity = Quantity + AddQuantity;
replace;
end;
when(%sysrc(_dsenom)) do;
Quantity = AddQuantity;
_error_ = 0;
output;
end;
otherwise do;
put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
put 'Program terminating. DATA step iteration # ' _n_;
put _all_;
stop;
end;
end;
run;
4 When a match occurs (_SOK), update Quantity and replace the original observation in
Master. When there is no match (_DSENOM), set Quantity equal to the
AddQuantity amount from Transaction, and append a new observation.
_ERROR_ is reset to 0 to prevent an error condition that would write the
contents of the program data vector to the SAS log. When an unexpected
condition occurs, write messages and the contents of the program data vector to
the log, and stop the DATA step.
Resulting Log
The DATA step executed without error and observations were appropriately updated
and added. The following lines are written to the SAS log:
NOTE: The data set WORK.MASTER has been updated. There were
2 observations rewritten, 1 observations added and 0
observations deleted.
Overview
This example shows how important it is to use error checking on all statements that
use the KEY= option when reading data.
Master                         Order
Obs  PartNumber  Quantity      Obs  PartNumber
1    1           10            1    2
2    2           20            2    4
3    3           30            3    1
4    4           40            4    3
5    5           50            5    8
                               6    5
                               7    6
Description
Obs  PartNumber  PartDescription
1    4           Nuts
2    3           Bolts
3    2           Screws
4    6           Washers
2 Read an observation from the Order data set. Read an observation from the
Description and the Master data sets based on a matching value for
PartNumber, the key variable. Note that no error checking occurs after an
observation is read from Description.
3 Take the correct course of action, based on whether a matching value for
PartNumber is found in the Master or Description. (This logic is based on the
erroneous assumption that this SELECT group performs error checking for both
of the preceding SET statements that contain the KEY= option. It actually
performs error checking for only the most recent one.) The SELECT group
directs execution to the correct code. When a match occurs (_SOK), the value of
PartNumber in the observation that is being read from Master matches the
current PartNumber value from Order. The result is to write the observation to
the output data set. When there is no match (_DSENOM), no observations in
Master contain the current value of PartNumber, so set the value of
PartDescription appropriately and output an observation. _ERROR_ is reset to 0
to prevent an error condition that would write the contents of the program data
vector to the SAS log. When an unexpected condition occurs, write messages
and the contents of the program data vector to the log, and stop the DATA step.
Resulting Log
This program creates an output data set but executes with one error. The following
lines are written to the SAS log:
PartNumber=1 PartDescription=Nuts Quantity=10 _ERROR_=1
_IORC_=0 _N_=3
PartNumber=5 PartDescription=No description Quantity=50
_ERROR_=1 _IORC_=0 _N_=6
NOTE: The data set WORK.COMBINE has 7 observations and 3 variables.
Revised Program
To create an accurate output data set, this example performs error checking on both
SET statements that use the KEY= option:
data combine(drop=Foundes); /* 1 */
correct code based on the value of _IORC_. When a match occurs (_SOK), the
value of PartNumber in the observation that is being read from Description
matches the current value from Order. Foundes is set to 1 to indicate that
Description contributed to the current observation. When there is no match
(_DSENOM), no observations in Description contain the current value of
PartNumber, so the description is set appropriately. _ERROR_ is reset to 0 to
prevent an error condition that would write the contents of the program data
vector to the SAS log. Any other _IORC_ value indicates that an unexpected
condition has been met, so messages are written to the log and the DATA step is
stopped.
6 Read an observation from the Master data set, using PartNumber as a key
variable.
7 Take the correct course of action based on whether a matching value for
PartNumber is found in Master. When a match is found (_SOK) between the
current PartNumber value from Order and from Master, write an observation.
When a match is not found (_DSENOM) in Master, test the value of Foundes. If
Foundes is not true, then a value was not found in Description either, so write a
message to the log but do not write an observation. If Foundes is true, however,
the value is in Description but not Master. So write an observation but set
Quantity to 0. Again, if an unexpected condition occurs, write a message and
stop the DATA step.
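The steps described above can be sketched as one self-contained program. This is a reconstruction under stated assumptions, not necessarily the exact program from this example: the input values come from the Master, Order, and Description listings in this section, while the index definitions, the length of PartDescription, and the wording of the error messages are assumptions.

```sas
/* Input data; Master and Description are indexed on the key variable */
data master(index=(PartNumber));
   input PartNumber Quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;

data description(index=(PartNumber));
   input PartNumber PartDescription :$15.;
   datalines;
4 Nuts
3 Bolts
2 Screws
6 Washers
;

data order;
   input PartNumber;
   datalines;
2
4
1
3
8
5
6
;

data combine(drop=Foundes);
   length PartDescription $ 15;
   set order;
   Foundes = 0;
   /* First keyed read, checked immediately */
   set description key=PartNumber;
   select (_iorc_);
      when (%sysrc(_sok)) Foundes = 1;
      when (%sysrc(_dsenom)) do;
         PartDescription = 'No description';
         _error_ = 0;
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put _all_;
         stop;
      end;
   end;
   /* Second keyed read, with its own check */
   set master key=PartNumber;
   select (_iorc_);
      when (%sysrc(_sok)) output;
      when (%sysrc(_dsenom)) do;
         _error_ = 0;
         if Foundes then do;
            /* In Description but not in Master */
            Quantity = 0;
            output;
         end;
         else put 'WARNING: PartNumber ' PartNumber
                  'is not in Description or Master.';
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         put _all_;
         stop;
      end;
   end;
run;
```

The key design point is that _IORC_ reflects only the most recent keyed read, so each SET statement with KEY= is followed by its own SELECT group, and Foundes carries the result of the first lookup forward to the second.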
Resulting Log
The DATA step executed without error. Six observations were correctly created and
the following message was written to the log:
WARNING: PartNumber 8 is not in Description or Master.
NOTE: The data set WORK.COMBINE has 6 observations
and 3 variables.
Obs  PartNumber  PartDescription  Quantity
1    2           Screws           20
2    4           Nuts             40
3    1           No description   10
4    3           Bolts            30
5    5           No description   50
6    6           Washers          0
Chapter 24 / Using DATA Step Component Objects
After you declare and instantiate a hash object, you can perform many tasks,
including these:
- Store and retrieve data.
- Output a data set that contains the data in the hash object.
For example, suppose you have a large data set that contains numeric lab results
corresponding to a unique patient number and weight. And suppose you have a
small data set that contains patient numbers (a subset of those in the large data
set). You can load the large data set into a hash object using the unique patient
number as the key and the weight values as the data. A single pass is made over
the small data set using the patient number to look up the current patient in the hash
object whose weight is over a certain value and output that data to a different data
set.
Depending on the number of lookup keys and the size of the data set, the hash
object lookup can be significantly faster than a standard format lookup. If you are
just looking up keys, you have a lot of memory, and you want fast performance, load
the large data set first. If you do not want to use a lot of memory, load the small data
set first.
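The patient lookup described above might look like the following sketch. The data set names Labs and Subset, the variable names PatientID and Weight, the sample values, and the threshold of 200 are all hypothetical.

```sas
/* Hypothetical large data set: one weight per unique patient number */
data labs;
   input PatientID Weight;
   datalines;
101 150
102 210
103 185
104 240
;

/* Hypothetical small data set: the patients of interest */
data subset;
   input PatientID;
   datalines;
102
103
105
;

data heavy(drop=rc);
   length Weight 8;
   if _N_ = 1 then do;
      /* Load the large data set once, keyed by patient number */
      declare hash labhash(dataset: "work.labs");
      labhash.defineKey('PatientID');
      labhash.defineData('Weight');
      labhash.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(Weight);
   end;
   /* Single pass over the small data set */
   set subset;
   /* FIND returns 0 when the key exists and copies Weight into the PDV */
   rc = labhash.find();
   if rc = 0 and Weight > 200 then output;
run;
```

The return code is tested before Weight is used because an unsuccessful FIND leaves the previous value of Weight in the program data vector.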
The DECLARE statement tells the compiler that the object reference MyHash is of
type hash. At this point, you have declared only the object reference MyHash. It has
the potential to hold a component object of type hash. You should declare the hash
object only once. The _NEW_ operator creates an instance of the hash object and
assigns it to the object reference MyHash.
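The two-step form described above can be sketched as follows. The key and data variables K and D, and the Homer and Odyssey values borrowed from a later example in this chapter, are used here only for illustration.

```sas
data _null_;
   length k $20 d $20;
   /* Step 1: declare the object reference only */
   declare hash myhash;
   /* Step 2: create the instance and assign it to the reference */
   myhash = _new_ hash();
   myhash.defineKey('k');
   myhash.defineData('d');
   myhash.defineDone();
   /* Store one key and data pair */
   k = 'Homer';
   d = 'Odyssey';
   myhash.add();
   /* Clear D, then read the value back by key */
   d = ' ';
   rc = myhash.find();
   if rc = 0 then put k= d=;
run;
```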
There is an alternative to the two-step process of using the DECLARE statement
and the _NEW_ operator to declare and instantiate a component object. You can
use the DECLARE statement to declare and instantiate the component object in one
step.
declare hash myhash();
For more information, see “DECLARE Statement: Hash and Hash Iterator Objects”
in SAS Component Objects: Reference and the “Hash and Hash Iterator Operator:
Objects” in SAS Component Objects: Reference.
You can have multiple key and data variables, but the entire key must be unique,
unless you create the hash object with the MULTIDATA:“YES” argument tag. For
more information, see “Non-Unique Key and Data Pairs” on page 569.
You can store more than one data item with a particular key. For example, you could
modify the previous example to store auxiliary numeric values with the character key
and data. In this example, each key and each data item consists of a character
value and a numeric value:
length d1 8;
length d2 $20;
length k1 $20;
length k2 8;
For more information, see the “DEFINEDATA Method” in SAS Component Objects:
Reference, “DEFINEDONE Method” in SAS Component Objects: Reference, and
the “DEFINEKEY Method” in SAS Component Objects: Reference.
Note: The hash object does not assign values to key variables (for example,
h.find(key:'abc')), and the SAS compiler cannot detect the data variable
assignments that are performed by the hash object and the hash iterator. If you
declare a key or data variable in the program, but do not assign that key or data
variable an initial value, SAS issues a note stating that the variable is uninitialized.
To avoid receiving these notes, you can perform one of the following actions:
- Set the NONOTES system option.
- Use the CALL MISSING routine with all the key and data variables as arguments.
If you use a key or data variable without declaring or initializing that key or data
variable outside the hash object, an error occurs.
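A minimal sketch of the CALL MISSING approach, which the examples in this chapter also use, is shown here. The variable names K and D and the key value 'abc' are illustrative.

```sas
data _null_;
   /* Declare the key and data variables so that they exist in the PDV */
   length k $20 d $20;
   if _N_ = 1 then do;
      declare hash h();
      h.defineKey('k');
      h.defineData('d');
      h.defineDone();
      /* Assign missing values so that K and D count as initialized */
      call missing(k, d);
   end;
   /* The key is passed as an argument; K itself is not assigned here */
   rc = h.find(key: 'abc');
   if rc ne 0 then put 'Key abc not found.';
run;
```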
In addition to moving forward through the list for a given key, you can loop backward
through the list by using the HAS_PREV and FIND_PREV methods in a similar
manner.
When you have a hash object that has multiple values for a single key, you can use
the DO_OVER method in an iterative DO loop to traverse through the duplicate
keys. The DO_OVER method reads the key on the first method call and continues
to iterate over the duplicate key list until it reaches the end.
Note: The items in a multiple data item list are maintained in the order in which you
insert them.
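As a sketch of forward traversal over a multiple data item list, the following program uses FIND to position on the first item for a key and FIND_NEXT to step through the rest. The key value 'dup' and the data values are hypothetical; DO_OVER, HAS_PREV, and FIND_PREV offer other ways to walk the same list.

```sas
data _null_;
   length k $10 d 8;
   /* Allow more than one data item per key */
   declare hash h(multidata: 'yes');
   h.defineKey('k');
   h.defineData('d');
   h.defineDone();
   /* Three data items stored under the same key, in this order */
   k = 'dup'; d = 1; h.add();
   k = 'dup'; d = 2; h.add();
   k = 'dup'; d = 3; h.add();
   /* FIND positions on the first item for the key */
   rc = h.find();
   do while (rc = 0);
      put k= d=;
      /* FIND_NEXT returns a nonzero code at the end of the list */
      rc = h.find_next();
   end;
run;
```

Because items are kept in insertion order, the loop is expected to visit the data values in the order in which they were added.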
For more information about these and other methods associated with non-unique
key and data pairs, see “Dictionary of Hash and Hash Iterator Object Language
Elements” in SAS Component Objects: Reference.
For more information, see the “REF Method” in SAS Component Objects:
Reference.
Note: You can also use the hash iterator object to retrieve the hash object data,
one data item at a time, in forward and reverse order. For more information, see
“Using the Hash Iterator Object ” on page 579.
k = 'Homer';
/* Use the FIND method to retrieve the data associated with 'Homer' key */
rc = h.find();
if (rc = 0) then
put d=;
else
put 'Key Homer not found.';
run;
The FIND method assigns the data value Odyssey, which is associated with the key
value Homer, to the variable D.
data match;
length k 8;
length s 8;
if _N_ = 1 then do;
/* load SMALL data set into the hash object */
declare hash h(dataset: "work.small");
/* define SMALL data set variable K as key and S as value */
h.defineKey('k');
h.defineData('s');
h.defineDone();
/* avoid uninitialized variable notes */
call missing(k, s);
end;
/* use the SET statement to iterate over the LARGE data set using */
/* keys in the LARGE data set to match keys in the hash object */
set large;
rc = h.find();
if (rc = 0) then output;
run;
The dataset argument tag specifies the Small data set whose keys and data are
read and loaded by the hash object during the DEFINEDONE method. The FIND
method is then used to retrieve the data.
myhash.defineKey('k');
myhash.defineDone();
k = 99;
count = 1;
myhash.add();
In this example, a summary is maintained for each key value K=99 and K=100:
k = 99;
count = 1;
myhash.add();
/* key=99 summary is now 1 */
k = 100;
myhash.add();
/* key=100 summary is now 1 */
k = 99;
myhash.find();
/* key=99 summary is now 2 */
count = 2;
myhash.find();
/* key=99 summary is now 4 */
k = 100;
myhash.find();
/* key=100 summary is now 3 */
myhash.sum(sum: total);
put 'total for key 100 = 'total;
k = 99;
myhash.sum(sum:total);
put 'total for key 99 = ' total;
And the second PUT statement prints the summary for K=99:
total for key 99 = 4
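For reference, the key summary fragments above can be assembled into one runnable DATA step. This is a sketch: the DECLARE statement with the SUMINC argument tag naming the variable Count is an assumption inferred from the assignments to Count in the fragments.

```sas
data _null_;
   length k count total 8;
   /* COUNT is the SUMINC variable: its current value is added to the
      key summary on each ADD, FIND, or CHECK call */
   declare hash myhash(suminc: 'count');
   myhash.defineKey('k');
   myhash.defineDone();
   call missing(total);

   k = 99;  count = 1;
   myhash.add();     /* key=99 summary is now 1 */
   k = 100;
   myhash.add();     /* key=100 summary is now 1 */
   k = 99;
   myhash.find();    /* key=99 summary is now 2 */
   count = 2;
   myhash.find();    /* key=99 summary is now 4 */
   k = 100;
   myhash.find();    /* key=100 summary is now 3 */

   /* SUM retrieves the summary for the current key value */
   myhash.sum(sum: total);
   put 'total for key 100 = ' total;
   k = 99;
   myhash.sum(sum: total);
   put 'total for key 99 = ' total;
run;
```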
You can use key summaries in conjunction with the dataset argument tag. As the
data set is read into the hash object using the DEFINEDONE method, all key
summaries are set to the SUMINC value. And, all subsequent FIND, CHECK, or
ADD methods change the corresponding key summaries.
declare hash myhash(suminc: "keycount", dataset: "work.mydata");
You can use key summaries for counting the number of occurrences of given keys.
In the following example, the data set MyData is loaded into a hash object and uses
key summaries to keep count of the number of occurrences for each key in the data
set Keys. (The SUMINC variable is not set to a value, so the default initial value of
zero is used.)
data mydata;
input key;
datalines;
1
2
3
4
5
;
run;
data keys;
input key;
datalines;
1
2
1
3
5
2
3
2
4
1
5
1
;
run;
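data count;
   length total key 8;
   keep key total;
   /* Load MyData into the hash object; COUNT is the SUMINC variable */
   declare hash myhash(suminc: "count", dataset: "work.mydata");
   myhash.defineKey('key');
   myhash.defineDone();
   count = 1;
   /* Each FIND adds the current value of COUNT to the summary for the key */
   do while (not done);
      set keys end=done;
      rc = myhash.find();
   end;
   done = 0;
   do while (not done);
      set mydata end=done;
      rc = myhash.sum(sum: total);
      output;
   end;
   stop;
run;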
Here is the output for the resulting data set.
- Use the REMOVEDUP method to remove only the current data item.
- Use the REPLACEDUP method to replace only the current data item.
In the following example, the REPLACE method replaces the data Odyssey with
Iliad, and the REMOVE method deletes the entire data entry associated with the
Joyce key from the hash object.
data _null_;
length d $20;
length k $20;
/* Use the REMOVE method to remove the 'Joyce' key and data */
k = 'Joyce';
rc = h.remove();
if (rc = 0) then
put k 'removed from hash object';
else
put 'Deletion not successful.';
run;
Note: If an associated hash iterator is pointing to the key, the REMOVE method
does not remove the key or data from the hash object. An error message is issued
to the log.
For more information, see the “REMOVE Method” in SAS Component Objects:
Reference, “REMOVEDUP Method” in SAS Component Objects: Reference,
“REPLACE Method” in SAS Component Objects: Reference, and the
“REPLACEDUP Method” in SAS Component Objects: Reference.
data test;
length d1 8;
length d2 $20;
length k1 $20;
length k2 8;
/* Declare the hash object and two key and data variables */
if _N_ = 1 then do;
declare hash h();
rc = h.defineKey('k1', 'k2');
rc = h.defineData('d1', 'd2');
rc = h.defineDone();
end;
/* Use the OUTPUT method to save the hash object data to the OUT data set */
rc = h.output(dataset: "work.out");
run;
The following output shows the report that PROC PRINT generates.
Note that the hash object keys are not automatically stored as part of the output
data set. The keys can be defined as data items by using the DEFINEDATA method
to be included in the output data set. In addition, if no data items are defined by
using the DEFINEDATA method, the keys are written to the data set specified in the
OUTPUT method. In the previous example, the DEFINEDATA method would be
written this way:
rc = h.defineData('k1', 'k2', 'd1', 'd2');
For more information, see the “OUTPUT Method” in SAS Component Objects:
Reference.
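A short sketch of the effect: in the following program the keys are listed as data items, so the output data set should contain all four variables. The key and data values are hypothetical.

```sas
data _null_;
   length k1 $20 k2 8 d1 8 d2 $20;
   declare hash h();
   h.defineKey('k1', 'k2');
   /* List the keys as data items so that they are written by OUTPUT */
   h.defineData('k1', 'k2', 'd1', 'd2');
   h.defineDone();
   /* Hypothetical entries */
   k1 = 'Joyce'; k2 = 1001; d1 = 3; d2 = 'Ulysses';
   h.add();
   k1 = 'Homer'; k2 = 1002; d1 = 5; d2 = 'Odyssey';
   h.add();
   /* Write the contents of the hash object to WORK.OUT */
   rc = h.output(dataset: 'work.out');
run;

proc print data=work.out;
run;
```

With `h.defineData('d1', 'd2')` instead, only D1 and D2 would be written.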
For more information, see the “EQUALS Method” in SAS Component Objects:
Reference.
attribute_value=obj.attribute_name;
There are two attributes available to use with hash objects. NUM_ITEMS returns the
number of items in a hash object and ITEM_SIZE returns the size (in bytes) of an
item. The following example retrieves the number of items in a hash object:
n = myhash.num_items;
You can obtain an idea of how much memory the hash object is using with the
ITEM_SIZE and NUM_ITEMS attributes. The ITEM_SIZE attribute does not reflect
the initial overhead that the hash object requires, nor does it take into account any
necessary internal alignments. Therefore, the use of ITEM_SIZE does not provide
exact memory usage, but it gives a good approximation.
For more information, see the “NUM_ITEMS Attribute” in SAS Component Objects:
Reference and the “ITEM_SIZE Attribute” in SAS Component Objects: Reference.
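The following sketch multiplies the two attributes to get a rough memory estimate. The hash object contents are hypothetical, and the result is only an approximation for the reasons given above.

```sas
data _null_;
   length k 8 d $20;
   declare hash myhash();
   myhash.defineKey('k');
   myhash.defineData('d');
   myhash.defineDone();
   /* Load 1000 hypothetical items */
   do k = 1 to 1000;
      d = cats('item', k);
      myhash.add();
   end;
   n  = myhash.num_items;   /* number of items in the hash object */
   sz = myhash.item_size;   /* size of one item, in bytes */
   /* Approximate usage; excludes initial overhead and alignment */
   approx_bytes = n * sz;
   put n= sz= approx_bytes=;
run;
```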
The DECLARE statement tells the compiler that the object reference MyIter is of
type hash iterator. At this point, you have declared only the object reference MyIter.
It has the potential to hold a component object of type hash iterator. You should
declare the hash iterator object only once. The _NEW_ operator creates an instance
of the hash iterator object and assigns it to the object reference MyIter. The hash
object, H, is passed as a constructor argument. The hash object, not the hash object
variable, is specifically assigned to the hash iterator.
As an alternative to the two-step process of using the DECLARE statement and the
_NEW_ operator to declare and instantiate a component object, you can declare
and instantiate a hash iterator object in one step by using the DECLARE statement
as a constructor method. The syntax is as follows:
declare hiter object_name(hash_object_name);
In the above example, the hash object name must be enclosed in single or double
quotation marks.
For example:
declare hiter myiter('h');
Note: You must declare and instantiate a hash object before you create a hash
iterator object. For more information, see “Declaring and Instantiating a Hash
Object” on page 567.
For example:
if _N_ = 1 then do;
   length key $10;
   declare hash myhash(dataset:"work.x", ordered: 'yes');
   declare hiter myiter('myhash');
   myhash.defineKey('key');
   myhash.defineDone();
end;
This code creates an instance of a hash iterator object with the variable name
MyIter. The hash object, MyHash, is passed as the constructor argument. Because
the hash object was created with the ORDERED argument tag set to 'yes', the
data is returned in ascending key-value order.
For more information about the DECLARE statement and the _NEW_ operator, see
the SAS DATA Step Statements: Reference.
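The fragment above can be extended into a complete step. The following is a minimal sketch, assuming a small hypothetical data set WORK.X with one character variable named key; because no data variables are defined, the key is also used as the data, so the iterator can return it.

```sas
/* Hypothetical input data set for the sketch */
data work.x;
   length key $10;
   input key $;
   datalines;
pear
apple
fig
;

data _null_;
   length key $10;
   if _N_ = 1 then do;
      declare hash myhash(dataset: "work.x", ordered: 'yes');
      declare hiter myiter('myhash');
      myhash.defineKey('key');
      myhash.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(key);
   end;

   /* Walk the hash object from the first item to the last */
   rc = myiter.first();
   do while (rc = 0);
      put key=;
      rc = myiter.next();
   end;
run;
```

Because ORDERED is set to 'yes', the keys should be written to the log in ascending order (apple, fig, pear).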
M3 13 42.2 +28 23
M22 18 36.4 -23 54
M23 17 56.8 -19 01
M49 12 29.8 +08 00
M68 12 39.5 -26 45
M17 18 20.8 -16 11
M14 17 37.6 -03 15
M29 20 23.9 +38 32
M34 02 42.0 +42 47
M82 09 55.8 +69 41
M59 12 42.0 +11 39
M74 01 36.7 +15 47
M25 18 31.6 -19 15
;
run;
data out;
   if _N_ = 1 then do;
      length obj $10;
      length ra $10;
      length dec $10;
      /* Read ASTRO data set and store in asc order in hash obj */
      declare hash h(dataset:"work.astro", ordered: 'yes');
      /* Create the iterator for the hash object */
      declare hiter iter('h');
      /* Define variable RA as key and RA and OBJ as data for hash object */
      h.defineKey('ra');
      h.defineData('ra', 'obj');
      h.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(obj, ra, dec);
   end;
   /* Retrieve RA values in ascending order */
   rc = iter.first();
   do while (rc = 0);
      /* Output items whose RA key is greater than or equal to '12' */
      if ra GE '12' then
         output;
      rc = iter.next();
   end;
run;
Operating System   Method   Example
Windows
UNIX
z/OS
VMS
In this example, the DECLARE statement tells the compiler that the object reference
J is of type Java. That is, the instance of the Java object is stored in the variable J.
At this point, you have declared only the object reference J. It has the potential to
hold a component object of type Java. You should declare the Java object only
once. The _NEW_ operator creates an instance of the Java object and assigns it to
the object reference J. The Java class name, SOMEJAVACLASS, is passed as a
constructor argument, which is the first-and-only argument that is required for the
Java object constructor. All other arguments are constructor arguments to the Java
class itself.
As an alternative to the two-step process of using the DECLARE statement and the
_NEW_ operator to declare and instantiate a Java object, you can declare and
instantiate a Java object in one step by using the DECLARE statement as a
constructor method. The syntax is as follows:
DECLARE JAVAOBJ object-name(“java-class”, <argument-1, … argument-n>);
For more information, see the “DECLARE Statement: Java Object” in SAS
Component Objects: Reference and the “_NEW_ Operator: Java Object” in SAS
Component Objects: Reference.
<return value>);
object.CALLSTATICtypeMETHOD (“method-name”,
<method-argument-1 …, method-argument-n>, <return value>);
Note: The type argument represents a Java data type. For more information about
how Java data types relate to SAS data types, see “Type Issues” on page 586.
For more information and examples, see “Dictionary of Java Object Language
Elements” in SAS Component Objects: Reference.
Type Issues
The Java type set is a superset of the SAS data types. Java has data types such as
BYTE, SHORT, and CHAR in addition to the standard numeric and character
values. SAS has only two data types: numeric and character.
The following table describes how Java data types are mapped to SAS data types
when using the Java object method calls.
Table 24.2 How Java Data Types Map to SAS Data Types
Java Data Type   SAS Data Type
BOOLEAN          numeric
BYTE             numeric
CHAR             numeric
DOUBLE           numeric
FLOAT            numeric
INT              numeric
LONG             numeric
SHORT            numeric
STRING           character*
* Java string data types are mapped to SAS character data types as UTF-8 strings.
Other than STRING, it is not possible to return objects from Java classes to the
DATA step. However, it is possible to pass objects to Java methods. For more
information, see “Passing Java Object Arguments” on page 589.
Some Java methods that return objects can be handled by creating wrapper classes
to convert the object values. In the following example, the Java hash table returns
object values. However, you can still use the hash table from the DATA step by
creating simple Java wrapper classes to handle the type conversions. Then you can
access the dhash and shash classes from the DATA step.
/* Java code */
import java.util.*;
public dhash()
{
table = new Hashtable ();
}
import java.util.*;
public shash()
{
table = new Hashtable ();
}
do i = 1 to 10;
   dh.callvoidmethod('put', i, i * 2);
end;
do i = 1 to 10;
   sh.callvoidmethod('put', i, 'abc' || left(trim(i)));
end;
do i = 1 to 10;
   dh.calldoublemethod('get', i, d);
   sh.callstringmethod('get', i, s);
   put d= s=;
end;
run;
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
abc
def
ghi
/* Java code */
import java.util.*;
public mVector(double d)
{
super((int)d);
}
import java.util.*;
public class mIterator
{
protected mVector m_v;
protected Iterator iter;
public mIterator(mVector v)
{
m_v = v;
iter = v.iterator();
}
These wrapper classes are useful for performing type conversions (for example, the
mVector constructor takes a DOUBLE argument). Overloading the constructor is
necessary because java/util/Vector's constructor takes an integer value, but the
DATA step has no integer type.
The following DATA step program uses these classes. The program creates and fills
a vector, passes the vector to the iterator's constructor, and then lists all the values
in the vector. Note that you must create the iterator after the vector is filled. The
iterator keeps a copy of the vector's modification count at the time of creation, and
this count must stay in synchronization with the vector's current modification count.
The code would throw an exception if the iterator were created before the vector
was filled.
/* DATA step code */
data _null_;
   length b 8;
   length val $200;
   dcl javaobj v("mVector");
   v.callVoidMethod("addElement", "abc");
   v.callVoidMethod("addElement", "def");
   v.callVoidMethod("addElement", "ghi");
   /* Create the iterator only after the vector is filled */
   dcl javaobj iter("mIterator", v);
   iter.callBooleanMethod("hasNext", b);
   do while(b);
      iter.callStringMethod("next", val);
      put val=;
      iter.callBooleanMethod("hasNext", b);
   end;
   /* Release the Java objects that this step created */
   v.delete();
   iter.delete();
run;
One current limitation to passing objects is that the JNI method lookup routine does
not perform a full class lookup based on a given signature. This means that you
could not change the mIterator constructor to take a Vector as shown in the
following code:
/* Java code */
public mIterator(Vector v)
{
m_v = v;
iter = v.iterator();
}
Even though mVector is a subclass of Vector, the method lookup routine cannot
find the constructor. Currently, the only solution is to manage the types in Java by
adding new methods or by creating wrapper classes.
Java Exceptions
Java exceptions are handled through the EXCEPTIONCHECK,
EXCEPTIONCLEAR, and EXCEPTIONDESCRIBE methods.
The EXCEPTIONCHECK method is used to determine whether an exception
occurred during a method call. If you call a method that can throw an exception, it is
strongly recommended that you check for an exception after the call. If an exception
is thrown, you should take appropriate action and then clear the exception by using
the EXCEPTIONCLEAR method.
The EXCEPTIONDESCRIBE method is used to turn exception debug logging on or
off. If exception debug logging is on, exception information is printed to the JVM
standard output. By default, JVM standard output is redirected to the SAS log.
Exception debugging is off by default.
For more information, see the “EXCEPTIONCHECK Method” in SAS Component
Objects: Reference, “EXCEPTIONCLEAR Method” in SAS Component Objects:
Reference, and the “EXCEPTIONDESCRIBE Method” in SAS Component Objects:
Reference.
The Java output that is directed to the SAS log is flushed when the DATA step ends.
This flushing causes the Java output to appear after any output that was generated
while the DATA step was running. Use the FLUSHJAVAOUTPUT method to
synchronize the output so that it appears in the order of execution.
rc = j.callDoubleMethod("compute", 1, 2, 3, r);
public colorsUI()
{
d = false;
list = new Vector();
cbl = new colorsButtonListener();
setBackground(Color.lightGray);
setSize(320,100);
setTitle("New Frame");
setVisible(true);
setLayout(new FlowLayout(FlowLayout.CENTER, 10, 15));
addWindowListener(new colorsUIListener());
this.add(red);
this.add(blue);
this.add(green);
this.add(quit);
show();
}
if (d)
return null;
synchronized(list)
{
/*
* colorsUI.class will display a simple UI and maintain a
* queue to hold color choices.
*/
In the DATA step code, the colorsUI class is instantiated and the user interface is
displayed. You enter a loop that is terminated when you click Quit. This action is
communicated to the DATA step through the Done variable. While looping, the
DATA step retrieves the values from the Java class's queue and writes the values
successively to the output data set.
public class x
{
public void m()
{
System.out.println("method m in y folder");
}
method m in y folder
method m2 in y folder
public class x
{
   public void m()
   {
      System.out.println("method m in z folder");
   }
You can call methods in this class instead of the class in folder y by changing the
classpath, but this requires restarting SAS. The following method allows for more
dynamic control of how classes are loaded.
To create a custom class loader, first you create an interface that contains all the
methods that you will call through the Java object—in this program, m and m2.
/* Java code */
public interface apiInterface
{
public void m();
public void m2();
}
public apiImpl()
{
x = new x();
}
These methods are called by delegating to the Java object instance class. Note that
the code to create the apiClassLoader custom class loader is provided later in this
section.
/* Java code */
public class api
{
/* Load classes from the z folder */
static ClassLoader customLoader = new apiClassLoader("C:\\z");
static String API_IMPL = "apiImpl";
apiInterface cp = null;
public api()
{
cp = load();
}
The following DATA step program calls these methods by delegating through the
api Java object instance class. The Java object instantiates the api class, which
creates a custom class loader to load classes from the z folder. The api class calls
the custom loader and returns an instance of the apiImpl interface implementation
class to the Java object. When methods are called through the Java object, the api
class delegates them to the implementation class.
/* DATA step code */
data _null_;
dcl javaobj j('api');
j.callvoidmethod('m');
j.callvoidmethod('m2');
run;
method m in z folder
method m2 in z folder
In the previous Java code, you could also use .jar files to augment the classpath in
the ClassLoader constructor.
static ClassLoader customLoader = new apiClassLoader("C:\\z;C:\\temp\\some.jar");
In this case, the Java code for the custom class loader is as follows. The code for
this class loader can be added to or modified as needed.
import java.io.*;
import java.util.*;
import java.util.jar.*;
import java.util.zip.*;
/**
* This method will look for the class in the class repository. If
* the method cannot find the class, the method will delegate to its parent
* class loader.
*
* @param name A String specifying the class to be loaded
* @return A Class object loaded by the apiClassLoader
* @throws ClassNotFoundException if the method is unable to load the class
*/
public Class loadClass(String name) throws ClassNotFoundException
{
// Check if the class is already loaded
Class loadedClass = findLoadedClass(name);
return loadedClass;
}
/**
* This method loads binary class file data from the classRepository.
*/
private byte[] loadFromCustomRepository(String classFileName)
   throws ClassNotFoundException
{
   Iterator dirs = classRepository.iterator();
   byte[] classBytes = null;
   while (dirs.hasNext())
   {
      String dir = (String) dirs.next();
      if (dir.endsWith(".jar"))
      {
         // Look for class in jar; jar entry names use forward slashes
         String jclassFileName = classFileName.replace('\\', '/');
         try
         {
            JarFile j = new JarFile(dir);
            for (Enumeration e = j.entries(); e.hasMoreElements() ;)
            {
               Object n = e.nextElement();
               if (jclassFileName.equals(n.toString()))
               {
                  ZipEntry zipEntry = j.getEntry(jclassFileName);
                  if (zipEntry == null)
                  {
                     return null;
                  }
                  else
                  {
                     // read file
                     InputStream is = j.getInputStream(zipEntry);
                     classBytes = new byte[is.available()];
                     is.read(classBytes);
                     break;
                  }
               }
            }
         }
         catch (Exception e)
         {
            System.out.println("jar file exception");
            return null;
         }
      }
      else
      {
         // Look for class in directory
         String fclassFileName = classFileName;
         try
         {
            File file = new File(dir,fclassFileName);
            if(file.exists()) {
               // read file
               InputStream is = new FileInputStream(file);
               classBytes = new byte[is.available()];
               is.read(classBytes);
               break;
            }
         }
         catch(IOException ex)
         {
            System.out.println("IOException raised while reading class file data");
            ex.printStackTrace();
            return null;
         }
      }
   }
   return classBytes;
}
}
}
Chapter 25
Array Processing
current DATA step. The array-name distinguishes it from any other arrays in the
same DATA step; it is not a variable.
Note: Arrays in SAS are different from those in many other programming
languages. In SAS, an array is not a data structure. An array is just a convenient
way of temporarily identifying a group of variables.
array processing
is a method that enables you to perform the same tasks for a series of related
variables.
array reference
is a method to reference the elements of an array.
one-dimensional array
is a simple grouping of variables that, when processed, results in output that can
be represented in simple row format.
multidimensional array
is a more complex grouping of variables that, when processed, results in output
that could have two or more dimensions, such as columns and rows.
Basic array processing involves the following steps:
• grouping variables into arrays
• repeating an action
One-Dimensional Array
The following figure is a conceptual representation of two one-dimensional arrays,
Misc and Mday.
Array   Variables (by element number)
        1      2      3      4      5      6      7      8
MISC    misc1  misc2  misc3  misc4  misc5  misc6  misc7  misc8
MDAY    mday1  mday2  mday3  mday4  mday5  mday6  mday7
Misc contains eight elements, the variables Misc1 through Misc8. To reference the
data in these variables, use the form Misc{n}, where n is the element number in the
array. For example, Misc{6} is the sixth element in the array.
Mday contains seven elements, the variables Mday1 through Mday7. Mday{3} is the
third element in the array.
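The references described above can be tried in a short step. This is a minimal sketch; the initial values are illustrative assumptions added so that the elements have something to return.

```sas
data _null_;
   /* Two one-dimensional arrays with hypothetical initial values */
   array misc{8} misc1-misc8 (10 20 30 40 50 60 70 80);
   array mday{7} mday1-mday7 (1 2 3 4 5 6 7);

   sixth = misc{6};            /* same value as the variable misc6 */
   third = mday{3};            /* same value as the variable mday3 */
   same  = (misc{6} = misc6);  /* 1 when both references agree */
   put sixth= third= same=;
run;
```

The log should show sixth=60 third=3 same=1.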
Two-Dimensional Array
The following figure is a conceptual representation of the two-dimensional array
Expenses.
Expense category (first dimension)   Variables for days of the week 1-7 and total 8 (second dimension)
                                     1        2        3        4        5        6        7        8
3   Pers. Auto                       peraut1  peraut2  peraut3  peraut4  peraut5  peraut6  peraut7  peraut8
5   Airfare                          airlin1  airlin2  airlin3  airlin4  airlin5  airlin6  airlin7  airlin8
7   Registration Fees                regfee1  regfee2  regfee3  regfee4  regfee5  regfee6  regfee7  regfee8
8   Other                            other1   other2   other3   other4   other5   other6   other7   other8
9   Tips (non-meal)                  tips1    tips2    tips3    tips4    tips5    tips6    tips7    tips8
10  Meals                            meals1   meals2   meals3   meals4   meals5   meals6   meals7   meals8
The Expenses array contains ten groups of eight variables each. The ten groups
(expense categories) comprise the first dimension of the array, and the eight
variables (days of the week) comprise the second dimension. To reference the data
in the array variables, use the form Expenses{m,n}, where m is the element number
in the first dimension of the array, and n is the element number in the second
dimension of the array. Expenses{6,4} references the value of dues for the fourth
day (the variable is Dues4).
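The {m,n} form of reference can be sketched as follows. To keep the example short, it assumes a temporary array in place of the eighty named variables and fills it with made-up amounts; column 8 is computed as the total of days 1 through 7, as in the figure.

```sas
data _null_;
   /* Assumption: a temporary array stands in for the 80 named variables */
   array expenses{10,8} _temporary_;

   do m = 1 to 10;
      expenses{m,8} = 0;
      do n = 1 to 7;
         expenses{m,n} = m * 10 + n;                     /* hypothetical daily amount */
         expenses{m,8} = expenses{m,8} + expenses{m,n};  /* running total in column 8 */
      end;
   end;

   dues_day4  = expenses{6,4};   /* sixth category, fourth day */
   dues_total = expenses{6,8};   /* sixth category, total column */
   put dues_day4= dues_total=;
run;
```

With these made-up amounts, the log should show dues_day4=64 dues_total=448.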
number-of-elements
is the number of variables in the group. You must enclose this value in either
parentheses (), braces {}, or brackets [].
$
specifies that the elements in the array are character elements.
length
specifies the length of the elements in the array that have not been previously
assigned a length.
array-elements
is a list of the names of the variables in the group. All variables that are defined
in a given array must be of the same type, either all character or all numeric.
initial-value-list
is a list of the initial values for the corresponding elements in the array.
For complete information, see the “ARRAY Statement” in SAS DATA Step
Statements: Reference.
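As a sketch of how these ARRAY statement elements combine, the following step defines a character array with an explicit length and an initial-value list, and a numeric array with an initial-value list. The array names, variable names, and values are illustrative assumptions.

```sas
data _null_;
   /* $ marks character elements; 12 is the length of each new element */
   array grade{3} $ 12 g1-g3 ('pass' 'fail' 'incomplete');

   /* Numeric array: number-of-elements, array-elements, initial-value-list */
   array limit{3} lim1-lim3 (90 80 70);

   do i = 1 to dim(grade);
      put grade{i}= limit{i}=;
   end;
run;
```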
To reference an array that was previously defined in the same DATA step, use an
Array Reference statement. An array reference has the following form:
array-name {subscript}
where
array-name
is the name of an array that was previously defined with an ARRAY statement in
the same DATA step.
subscript
specifies the subscript, which can be a numeric constant, the name of a variable
whose value is the number, a SAS numeric expression, or an asterisk (*).
Note: Subscripts in SAS are 1-based by default, and not 0-based as they are in
some other programming languages.
For complete information, see the Array Reference statement in the SAS DATA Step
Statements: Reference.
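The subscript forms listed above can be sketched in one step. The array name q and its values are illustrative assumptions.

```sas
data _null_;
   array q{4} q1-q4 (5 10 15 20);
   i = 2;

   a = q{3};               /* numeric constant as the subscript */
   b = q{i};               /* variable whose value is the element number */
   c = q{i + 2};           /* numeric expression as the subscript */
   total = sum(of q{*});   /* asterisk refers to all elements */
   put a= b= c= total=;
run;
```

The log should show a=15 b=10 c=20 total=50.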
When you define an array, SAS assigns each array element an array reference with
the form array-name{subscript}, where subscript is the position of the variable in the
list. The following table lists the array reference assignments for the previous
ARRAY statement:
Variable       Array Reference
Reference      books{1}
Usage          books{2}
Introduction   books{3}
Later in the DATA step, when you want to process the variables in the array, you can
refer to a variable by either its name or its array reference. For example, the names
Reference and Books{1} are equivalent.
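The equivalence of a variable name and its array reference can be checked with a short sketch. The assigned values are illustrative assumptions.

```sas
data _null_;
   array books{3} Reference Usage Introduction;

   Reference = 45;                    /* assign through the variable name */
   books{2}  = 63;                    /* assign through the array reference */

   same = (Reference = books{1});     /* 1: both names refer to one variable */
   put same= Usage=;
run;
```

The log should show same=1 Usage=63.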
The first time that the loop processes, the value of count is 1; the second time, 2;
and the third time, 3. At the beginning of the fourth iteration, the value of count is 4,
which exceeds the specified range and causes SAS to stop processing the loop.
DO Statement Description
When the value of count is 1, SAS reads the array reference as Books{1} and
processes the IF-THEN statement on Books{1}, which is the variable Reference.
When count is 2, SAS processes the statement on Books{2}, which is the variable
Usage. When count is 3, SAS processes the statement on Books{3}, which is the
variable Introduction.
The statements in the example tell SAS to
• perform the actions in the loop three times
• replace the array subscript count with the current value of count for each
iteration of the IF-THEN statement
• locate the variable with that array reference and process the IF-THEN statement
on it
• replace missing values with zero if the condition is true.
The following DATA step defines the array Book and processes it with a DO loop.
options linesize=80 pagesize=60;
data changed(drop=count);
   input Reference Usage Introduction;
   array book{3} Reference Usage Introduction;
   do count=1 to 3;
      if book{count}=. then book{count}=0;
   end;
   datalines;
45 63 113
. 75 150
62 . 98
;
If you specify the number of elements explicitly, you can omit the names of the
variables or array elements in the ARRAY statement. SAS then creates variable
names by concatenating the array name with the numbers 1, 2, 3, and so on. If a
variable name in the series already exists, SAS uses that variable instead of
creating a new one. In the following example, the array c1t references five variables:
c1t1, c1t2, c1t3, c1t4, and c1t5.
array c1t{5};
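The naming behavior described above can be observed with a short sketch. It assumes that c1t2 already exists when the ARRAY statement is compiled, so SAS reuses it and creates only the other four variables; the value 99 is illustrative.

```sas
data _null_;
   c1t2 = 99;        /* this variable exists before the ARRAY statement */
   array c1t{5};     /* creates c1t1, c1t3, c1t4, c1t5 and reuses c1t2 */

   do i = 1 to dim(c1t);
      if c1t{i} = . then c1t{i} = 0;
   end;
   put c1t1= c1t2= c1t3= c1t4= c1t5=;
run;
```

The log should show c1t1=0 c1t2=99 c1t3=0 c1t4=0 c1t5=0.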
• Use an array reference anywhere that you can write a SAS expression.
data one;
array state{*} &list;
… more SAS statements …
run;
data two;
array state{*} &list;
… more SAS statements …
run;
• do i=1 to dim4(days) by 2;
• _NUMERIC_
• _ALL_
You can use these variable list names to reference variables that have been
previously defined in the same DATA step. The _CHARACTER_ variable list includes
character variables only. The _NUMERIC_ variable list includes numeric variables
only. The _ALL_ variable list includes either all character or all numeric variables,
depending on how you previously defined the variables.
For example, the following INPUT statement reads in variables X1 through X3 as
character values using the $8. informat, and variables X4 through X5 as numeric
variables. The following ARRAY statement uses the variable list _CHARACTER_ to
include only the character variables in the array. The asterisk indicates that SAS
determines the subscript by counting the variables in the array.
input (X1-X3) ($8.) X4-X5;
array item {*} _character_;
You can use the _NUMERIC_ variable list in your program when, for example, you
need to convert currency. In this application, you do not need to know the variable
names. You need only to convert all values to the new currency.
For more information about variable lists, see the “ARRAY Statement” in SAS DATA
Step Statements: Reference.
SAS places variables into a multidimensional array by filling all rows in order,
beginning at the upper left corner of the array (known as row-major order). You can
think of the variables as having the following arrangement:
c1t1 c1t2 c1t3 c1t4 c1t5
c2t1 c2t2 c2t3 c2t4 c2t5
To refer to the elements of the array later with an array reference, you can use the
array name and subscripts. The following table lists some of the array references for
the previous example:
Variable   Array Reference
c1t1       temprg{1,1}
c1t2       temprg{1,2}
c2t2       temprg{2,2}
c2t5       temprg{2,5}
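Row-major placement can be confirmed with a short sketch. The initial values are illustrative assumptions chosen so that each value encodes its row and column.

```sas
data _null_;
   /* Values are assigned in row-major order: row 1 first, then row 2 */
   array temprg{2,5} c1t1-c1t5 c2t1-c2t5
                     (11 12 13 14 15 21 22 23 24 25);

   a = temprg{1,2};   /* variable c1t2 */
   b = temprg{2,2};   /* variable c2t2 */
   c = temprg{2,5};   /* variable c2t5 */
   put a= b= c=;
run;
```

The log should show a=12 b=22 c=25.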
DO index-variable-1=1 TO number-of-rows;
   DO index-variable-2=1 TO number-of-columns;
      ... more SAS statements ...
   END;
END;
An array reference can use two or more index variables as the subscript to refer to
two or more dimensions of an array. Use the following form:
array-name {index-variable-1, …,index-variable-n}
The following example creates an array that contains ten variables: five temperature
measures (t1 through t5) from two cities (c1 and c2). The DATA step contains two
DO loops.
• The outer DO loop (DO I=1 TO 2) processes the inner DO loop twice.
• The inner DO loop (DO J=1 TO 5) applies the ROUND function to all the
variables in one row.
For each iteration of the DO loops, SAS substitutes the value of the array element
corresponding to the current values of I and J.
options linesize=80 pagesize=60;
data temps;
   array temprg{2,5} c1t1-c1t5 c2t1-c2t5;
   input c1t1-c1t5 /
         c2t1-c2t5;
   do i=1 to 2;
      do j=1 to 5;
         temprg{i,j}=round(temprg{i,j});
      end;
   end;
   datalines;
89.5 65.4 75.3 77.7 89.3
73.7 87.3 89.9 98.2 35.6
75.8 82.1 98.2 93.5 67.7
101.3 86.5 59.2 35.6 75.7
;
The previous example can also use the DIM function to produce the same result:
do i=1 to dim1(temprg);
   do j=1 to dim2(temprg);
      temprg{i,j}=round(temprg{i,j});
   end;
end;
In the following ARRAY statement, the bounds of the first dimension are 1 and 2 and
those of the second dimension are 1 and 5:
array test{2,5} test1-test10;
For most arrays, 1 is a convenient lower bound, so you do not need to specify the
lower bound. However, specifying both the lower and the upper bounds is useful
when the array dimensions have beginning points other than 1.
In the following example, ten variables are named Year76 through Year85. The
following ARRAY statements place the variables into two arrays named First and
Second:
array first{10} Year76-Year85;
array second{76:85} Year76-Year85;
In the first ARRAY statement, the element first{4} is variable Year79, first{7} is
Year82, and so on. In the second ARRAY statement, element second{79} is Year79
and second{82} is Year82.
To process the array named Second in a DO group, make sure that the range of the
DO loop matches the range of the array as follows:
do i=76 to 85;
if second{i}=9 then second{i}=.;
end;
In this example, the index variable in the iterative DO statement ranges from 76 to
85.
To process the array named YEARS in an iterative DO loop, make sure that the
range of the DO loop matches the range of the array as follows:
do i=lbound(years) to hbound(years);
if years{i}=99 then years{i}=.;
end;
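A complete, runnable variant of this pattern is sketched below. It reuses the array Second from the earlier ARRAY statement; the initial values are illustrative assumptions, with 9 standing in for a code that should become missing.

```sas
data _null_;
   /* Subscripts run from 76 to 85 instead of 1 to 10 */
   array second{76:85} Year76-Year85 (9 1 2 9 3 4 9 5 6 7);

   /* LBOUND and HBOUND keep the loop range in step with the array bounds */
   do i = lbound(second) to hbound(second);
      if second{i} = 9 then second{i} = .;
   end;

   n_elements = dim(second);            /* number of elements */
   n_missing  = nmiss(of second{*});    /* elements set to missing */
   put n_elements= n_missing=;
run;
```

The log should show n_elements=10 n_missing=3.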
The following ARRAY statement arranges the variables in an array by decades. The
rows range from 6 through 9, and the columns range from 0 through 9.
array X{6:9,0:9} X60-X99;
In array X, variable X63 is element X{6,3} and variable X89 is element X{8,9}. To
process array X with iterative DO loops, use one of these methods:
• Method 1:
do i=6 to 9;
   do j=0 to 9;
      if X{i,j}=0 then X{i,j}=.;
   end;
end;
• Method 2:
do i=lbound1(X) to hbound1(X);
   do j=lbound2(X) to hbound2(X);
      if X{i,j}=0 then X{i,j}=.;
   end;
end;
Both examples change all values of 0 in variables X60 through X99 to missing. The
first example sets the range of the DO groups explicitly. The second example uses
the LBOUND and HBOUND functions to return the bounds of each dimension of the
array.
The statement inside the DO loop uses the UPCASE function to change the values
of the variables in array NAMES to uppercase. The statement then stores the
uppercase values in the variables in the CAPITALS array.
options linesize=80 pagesize=60;
data text;
   array names{*} $ n1-n5;
   array capitals{*} $ c1-c5;
   input names{*};
   do i=1 to 5;
      capitals{i}=upcase(names{i});
   end;
   datalines;
smithers michaels gonzalez hurth frank
;
data score1(drop=i);
   array test{3} t1-t3 (90 80 70);
   array score{3} s1-s3;
   input id score{*};
   do i=1 to 3;
      if score{i}>=test{i} then
         do;
            NewScore=score{i};
            output;
         end;
   end;
   datalines;
1234 99 60 82
5678 80 85 75
;
data score2(drop=i);
   array test{3} _temporary_ (90 80 70);
   array score{3} s1-s3;
   input id score{*};
   do i=1 to 3;
      if score{i}>=test{i} then
         do;
            NewScore=score{i};
            output;
         end;
   end;
   datalines;
1234 99 60 82
5678 80 85 75
;
data sales;
   infile datalines;
   input Value1 Value2 Value3 Value4;
   datalines;
11 56 58 61
22 51 57 61
22 49 53 58
;
data convert(drop=i);
   set sales;
   array test{*} _numeric_;
   do i=1 to dim(test);
      test{i} = (test{i}*3);
   end;
run;
PART 5
Chapter 26 SAS Libraries
Chapter 27 SAS Data Sets
Chapter 28 SAS Data Files
Chapter 29 SAS Views
Chapter 30 Stored Compiled DATA Step Programs
Chapter 31 DICTIONARY Tables
Chapter 32 SAS Catalogs
Chapter 33 About SAS/ACCESS Software
Chapter 34 Processing Data Using Cross-Environment Data Access (CEDA)
Chapter 35 Cross-Release Compatibility and Migration
Chapter 36 File Protection
Chapter 37 SAS Engines
Chapter 38 SAS File Management
Chapter 39 External Files
Chapter 26
SAS Libraries
At the operating environment level, however, a SAS library has different physical
implementations. Most SAS libraries implement the storage of files in a manner
similar to how the operating environment stores and accesses files.
For example, in directory-based operating environments, a SAS library is a group of
SAS files that are stored in the same directory and accessed by the same engine.
Other files can be stored in the directory, but only the files with file extensions that
are assigned by SAS are recognized as part of the SAS library. Under z/OS, a SAS
library can be implemented as either a bound library in a traditional OS data set or
as a directory under UNIX System Services.
SAS files can be any of the following file types:
• SAS data set (SAS data file or SAS view)
• SAS catalog
• access descriptors
• multidimensional database files
• utility files
Each SAS file, in turn, stores information in smaller units that are characteristic of
the SAS file type. For example, SAS data sets store information as variables and
observations, while SAS catalogs store information in units called entries. SAS
determines the type of a file from the context of the SAS program in which the file is
created or specified. Therefore, a library can contain files with the same name but
with different member types.
SAS libraries can contain files that you create, or they can be one of several special
libraries that SAS provides for convenience, support, and customizing capability
such as the Work library. SAS does not limit the number of SAS files that you can
store in a SAS library.
Library Engines
Each SAS library is associated with a library engine. SAS library engines are
software components that form the interface between SAS and the SAS library. It is
the SAS library engine that locates files in a SAS library and renders the file
contents to SAS in a form that it can recognize. Library engines perform such tasks
as:
• reading and writing data
SAS has a Multi Engine Architecture in order to read from and write to files in
different formats. Each SAS engine has specific processing characteristics, such as
the ability to
• process a SAS file generated by an older version of SAS
You generally are not aware of the particular type of engine that is processing data
at any given time. If you issue an instruction that is not supported by the engine, an
error message is displayed in the SAS log. When needed, you can select a specific
engine to perform a task. But usually, you do not have to specify an engine,
because SAS automatically selects the appropriate one.
More than one engine might be involved in processing a DATA step; for example,
one engine might be used to input data, and another engine might be used to write
observations to the output data set.
For more information about library engines, including a list of engines available in
Base SAS, see “About Library Engines” on page 808.
Library Names
• a logical name (libref) that you assign using the LIBNAME statement, LIBNAME
function, or the New Library window
The physical location name of the SAS library is a name that identifies your SAS
files to the operating environment. The physical location name must conform to the
naming conventions of your operating environment. The physical location name fully
identifies the directory or operating environment data set that contains the SAS
library.
The logical name, or libref, is the way you identify a group of files to SAS. A libref is
a name that you associate with the physical location of the SAS library.
Assigning Librefs
Librefs can be assigned using the following methods:
n LIBNAME statement
n LIBNAME function
Once the libref is assigned, you can read, create, or update files in a SAS library. A
libref is valid only for the current SAS session, unless it is assigned using the New
Library window with the Enable at start-up box checked.
A libref can have a maximum length of eight characters. You can use the LIBREF
function to verify that a libref has been assigned. Librefs can be referenced
repeatedly within a SAS session. SAS does not limit the number of librefs that you
can assign during a session. However, your operating environment or site might set
limitations. If you are running in batch mode, the library must exist before you can
allocate or assign it. In interactive mode, you might be allowed to create it if it does
not already exist.
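The following sketch shows one way to assign a libref and then verify it. The librefs MyLib and MyLib2 are hypothetical, and the physical location of the Work library is reused only so that the path is certain to exist.

```sas
/* Assign a libref with the LIBNAME statement. */
libname mylib "%sysfunc(pathname(work))";

/* The LIBREF function returns 0 when the libref is assigned. */
%put NOTE: LIBREF return code = %sysfunc(libref(mylib));

/* Assign a second libref with the LIBNAME function.            */
/* A return code of 0 indicates success. A nonzero value        */
/* indicates that an error, warning, or note was generated.     */
data _null_;
   rc = libname('mylib2', pathname('work'));
   put rc=;
run;
```

Both librefs remain assigned until they are cleared or the SAS session ends.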
Operating Environment Information: Here are examples of the LIBNAME
statement for different operating environments. The rules for assigning and using
librefs differ across operating environments. See the SAS documentation for your
operating environment for specific information.
You can also access files without using a libref. See “Accessing Permanent SAS
Files without a Libref” on page 637.
If you use the LIBNAME statement to assign the libref, SAS clears (deassigns) the
libref automatically at the end of each SAS session. If you want to clear the libref
Annual before the end of the session, you can issue the following form of the
LIBNAME statement:
libname annual clear;
SAS also provides a New Library window to assign or clear librefs and SAS Explorer
to view, add, or delete SAS libraries. You can select the New Library or the SAS
Explorer icon from the Toolbar.
Reserved Librefs
SAS reserves a few names for special uses. You should not use Sashelp, Sasuser,
or Work as librefs, except as intended. The purpose and content of these libraries
are discussed in “Permanent and Temporary Libraries” on page 631.
Operating Environment Information: There are other librefs reserved for SAS
under some operating environments. In addition, your operating environment might
have reserved certain words that cannot be used as librefs. See the SAS
documentation for your operating environment for more information.
When you access files on a WebDAV server, SAS pulls the file from the server to
your local disk for processing. The files are temporarily stored in the SAS Work
directory, unless you use the LOCALCACHE= option in the LIBNAME statement,
which specifies a different directory for temporary storage. When you finish updating
the file, SAS pushes the file back to the WebDAV server for storage and removes
the file from the local disk.
For more information, see “WHEREUP= Data Set Option” in SAS Data Set Options:
Reference.
Library Concatenation
Lib3
Oranges and Plums
The LIBNAME statement concatenates Lib1, Lib2, and Lib3:
libname fruit (lib1 lib2 lib3);
n Pears
n Oranges
n Plums
Note: Output always goes to the first library. For example, the following statement
writes to the first library in the concatenation, Lib1:
data fruit.oranges;
Note that in this example, if the file Apples in Lib1 is different from the file Apples
in Lib2, and an update to Apples is specified, the file is updated only in Lib1 because
that is the first occurrence of the member Apples.
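The concatenation behavior described here can be sketched end to end as follows. The quoted locations are placeholders, and the variable Variety is invented for illustration.

```sas
/* Assign the individual libraries. Substitute real locations. */
libname lib1 'SAS-library-1';
libname lib2 'SAS-library-2';
libname lib3 'SAS-library-3';

/* Concatenate the three libraries under one libref. */
libname fruit (lib1 lib2 lib3);

/* Output always goes to the first library, Lib1. */
data fruit.oranges;
   variety = 'Valencia';
run;

/* List the members that are visible through the concatenation. */
proc datasets library=fruit;
quit;
```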
For complete documentation on library concatenation, see the “LIBNAME
Statement” in SAS Global Statements: Reference.
Operating Environment Information: For more information about how specific
operating environments handle concatenation, see the SAS documentation for your
operating environment.
A.DATA and A.INDEX, only A.DATA from library One is listed. A.DATA and
A.INDEX from library Two are not listed.
n If any library in the concatenation is sequential, then the concatenated library is
considered sequential by applications that require random access. For example,
the DATASETS procedure cannot process sequential libraries, and therefore
cannot process a concatenated library that contains one or more sequential
libraries.
n The attributes of the first library that is specified determine the attributes of the
concatenation. For example, if the first SAS library that is listed is “read only,”
then the entire concatenated library is “read only.”
n Once a libref has been assigned in a concatenation, any changes made to the
libref do not affect the concatenation.
n Changing a data set name to an existing name in the concatenation will fail.
n User
n Sashelp
n Sasuser
Work Library
data work.test2;
User Library
n LIBNAME function
In this example, the LIBNAME statement is used with a DATA step, which stores the
data set Region in a permanent SAS library.
When assigning a libref using the USER= system option, you must first assign a
libref to a SAS library, and then use the USER= system option to specify that library
as the default for one-level names. In this example, the DATA step stores the data
set Prochlor in the SAS library Testlib.
libname testlib 'SAS-library';
options user=testlib;
data prochlor;
... more DATA step statements ...
run;
Sashelp Library
Each SAS site receives the Sashelp library, which contains a group of catalogs and
other files containing information that is used to control various aspects of your SAS
session. The defaults stored in this library are for everyone using SAS at your
installation. Your personal settings are stored in the Sasuser library, which is
discussed later in this section.
If SAS products other than Base SAS are installed at your site, the Sashelp library
contains catalogs that are used by those products. In many instances, the defaults
in this library are customized for your site by your on-site SAS support personnel.
You can list the catalogs stored at your site by using one of the file management
utilities discussed later in this section.
Sasuser Library
The Sasuser library contains SAS catalogs that enable you to customize features of
SAS for your needs. If the defaults in the Sashelp library are not suitable for your
applications, you can modify them and store your personalized defaults in your
Sasuser library. For example, in Base SAS, you can store your own defaults for
function key settings or window attributes in a personal Profile catalog named
Sasuser.Profile.
SAS assigns the Sasuser library during system initialization, according to the
information supplied by the SASUSER= system option.
A system option called RSASUSER enables the system administrator to control the
mode of access to the Sasuser library at installations that have one Sasuser library
for all users and that want to prevent users from modifying it.
Operating Environment Information: In most operating environments, the
Sasuser library is created if it does not already exist. However, the Sasuser library is
implemented differently in various operating environments. See the SAS
documentation for your operating environment for more information.
n You can access only one of the SAS files in a sequential library, or only one of
the SAS files on a tape, at any point in a SAS job.
For example, you cannot read two or more SAS data sets in the same library or
on the same tape at the same time in a single DATA step. However, you can
access:
o two or more SAS files in different sequential libraries, or on different tapes at
the same time, if there are enough tape drives available
o a SAS file during one DATA or PROC step, and then access another SAS file
in the same sequential library or on the same tape during a later DATA or
PROC step
Also, when you have more than one SAS data set on a tape or in a sequential
library in the same DATA or PROC step, one SAS data set file might be opened
during the compilation phase. The additional SAS data sets are opened during
the execution phase. For more information, see the “SET Statement” in SAS
DATA Step Statements: Reference.
n For some operating environments, you can read from or write to SAS data sets
only during a DATA or PROC step. However, you can always use the COPY
procedure to transfer all members of a SAS library to tape for storage and
backup purposes.
n Considerations specific to your site can affect your use of tape. For example, it
might be necessary to manually mount a tape before the SAS libraries become
available. Consult your operations staff if you are not familiar with using tape
storage at your location.
For information about sequential engines, see Chapter 37, “SAS Engines,” on page
803.
Operating Environment Information: The details for storing and accessing SAS
files in sequential format vary with the operating environment. See the SAS
documentation for your operating environment for more information.
SAS Utilities
The SAS utilities that are available for SAS file management enable you to work
with more than one SAS file at a time, as long as the files belong to the same library.
The advantage of learning and using SAS Explorer, functions, options, and
procedures is that they automatically copy, rename, or delete any index files or
integrity constraints, audit trails, backups, and generation data sets that are
associated with your SAS data files. Another advantage is that SAS utility
procedures work on any operating environment at any level.
There are several SAS window options, functions, and procedures available for
performing file management tasks. You can use the following features alone or in
combination, depending on what works best for you. See “Choosing the Right
Procedure” in Base SAS Procedures Guide for detailed information about SAS utility
procedures. The SAS windowing environment and how to use it for managing SAS
files is discussed in Chapter 16, “Introduction to the SAS Windowing Environment,”
on page 385 and Chapter 17, “Managing Your Data in the SAS Windowing
Environment,” on page 405 as well as in the online Help.
CATALOG procedure
provides catalog management utilities with the COPY, CONTENTS, and
APPEND procedures.
DATASETS procedure
provides all library management functions for all member types except catalogs.
If your site does not use the SAS Explorer, or if SAS executes in batch or
interactive line mode, using this procedure can save you time and resources.
SAS Explorer
includes windows that enable you to perform most file management tasks
without submitting SAS program statements. Type LIBNAME, CATALOG, or DIR
in the Toolbar window to use SAS Explorer, or select the Explorer icon from the
Toolbar menu.
DETAILS system option
sets the default display for file information when using the CONTENTS or
DATASETS procedure. When enabled, DETAILS provides additional information
about files, depending on which procedure or window you use.
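As a brief illustration of library management with the DATASETS procedure, the following sketch renames one data set and deletes another in the Work library. The data set names OldSales, NewSales, and Scratch are hypothetical.

```sas
/* Create two small data sets to manage. */
data work.oldsales work.scratch;
   x = 1;
run;

/* DETAILS requests additional columns in the directory listing. */
proc datasets library=work details;
   change oldsales=newsales;   /* rename a member */
   delete scratch;             /* delete a member */
run;
quit;
```

Because the procedure manages the member as a whole, any index that is associated with a renamed or deleted data file is handled at the same time.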
Library Directories
SAS Explorer and SAS procedures enable you to obtain a list, or directory, of the
members in a SAS library. Each directory contains the name of each member and
its member type. For the member type DATA, the directory indicates whether an
index, audit trail, backup, or generation data set is associated with the data set. The
directory also describes some attributes of the library, but the amount and nature of
this information vary with the operating environment.
Note: SAS libraries can also contain various SAS utility files. These files are not
listed in the library directory and are for internal processing.
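One way to obtain a library directory from a program is sketched below. The Sashelp library is used only because it is normally available; any assigned libref can be substituted.

```sas
/* _ALL_ requests the directory for the whole library, and NODS */
/* suppresses the detail output for each individual member.     */
proc contents data=sashelp._all_ nods;
run;
```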
SAS creates the data set and remembers its location for the duration of the SAS
session.
If you omit the single quotation marks, SAS creates the data set MyData in the
temporary Work subdirectory, named Work.MyData:
data mydata;
If you want to create a data set named MyData in a library other than the directory in
which you are running SAS, enclose the entire pathname in quotation marks,
following the naming conventions of your operating environment. For example, the
following DATA step creates a data set named Foo in the directory C:\sasrun\mydata:
data 'c:\sasrun\mydata\foo';
This method of accessing files works on all operating environments and in most
contexts where a libref.data-set-name is accepted as a SAS data set. Most data set
options can be specified with a quoted name.
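The following sketch combines a quoted name with data set options. It assumes that the directory C:\sasrun\mydata already exists. The variables Id, Amount, and Comment are invented for illustration.

```sas
/* KEEP= limits the variables that are written to the file. */
data 'c:\sasrun\mydata\foo'(keep=id amount);
   id = 1;
   amount = 25;
   comment = 'not kept';
run;

/* OBS= limits the observations that are read from the file. */
proc print data='c:\sasrun\mydata\foo'(obs=10);
run;
```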
You cannot use quoted names for the following:
n SAS catalogs
n contexts that do not accept a libref, such as the SELECT statement of PROC
COPY and most PROC DATASETS statements
n PROC SQL
The following table shows some examples of DATA statements that access SAS
data files without using a libref.
Table 26.2 Example DATA Statements That Access SAS Files without Using a Libref
Operating Environment   Examples
data '/mystuff/sasstuff/work/myfile';
/* UNIX file system library */
27
SAS Data Sets
SAS view
is a virtual data set that points to data from other sources. SAS views have a
member type of VIEW. For specific information, see Chapter 29, “SAS Views,”
on page 721.
Note: The term SAS data set is used when a SAS view and a SAS data file can be
used in the same manner.
1 A SAS view (member type VIEW) contains descriptor information and uses data
values from one or more data sets.
2 A SAS data file (member type DATA) contains descriptor information and data
values. SAS data sets can be a member type DATA (SAS data file) or VIEW
(SAS view).
3 An index is a separate file that you can create for a SAS data file in order to
provide direct access to specific observations. The index file has the same name
as its data file and a member type of INDEX. Indexes can provide faster access
to specific observations, particularly when you have a large data set.
n an UPDATE statement
n a MODIFY statement
Note: Because you can specify both SAS data files and SAS views in the same
program statements but cannot specify the member type, SAS cannot determine
from the program statement which one you want to process. This is why SAS
prevents you from giving the same name to a SAS view and a SAS data file in the
same library.
When you create a new SAS data set, the libref indicates where it is to be stored.
When you reference an existing data set, the libref tells SAS where to find it. The
following examples show the use of two-level names in SAS statements:
data revenue.sales;
Data sets with one-level names are automatically assigned to one of two SAS
libraries: Work or User. Most commonly, they are assigned to the temporary library
Work and are deleted at the end of a SAS job or session. If you have associated the
libref User with a SAS library or used the USER= system option to set the User
library, data sets with one-level names are stored in that library. See Chapter 26,
“SAS Libraries,” on page 623 for more information about using the User and Work
libraries. The following examples show how one-level names are used in SAS
statements.
/* create perm data set in location of USER= option */
options user='c:\temp';
data test3;
sales1-sales4
If the numeric suffix of the first data set name contains leading zeros, the number of
digits in the last data set name must be greater than or equal to the number of digits
in the first data set name. Otherwise, an error occurs. For example, the data set lists
sales001-sales99 and sales01-sales9 cause an error to occur. The data set list
sales001-sales999 is valid. If the numeric suffix of the first data set name does not
contain leading zeros, then the number of digits in the numeric suffix of the first and
last data set names does not have to be equal. For example, the data set list
sales1-sales999 is valid.
Colon (name prefix) lists
require a series of data sets with the same starting character or characters. For
example, the following two lists refer to the same data sets:
abc:
n COPY EXCLUDE
n DELETE
n REPAIR
n REBUILD
In the DATA step, data set lists can be used in the following SAS statements:
n MERGE statement
n SET statement.
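As a short sketch of both kinds of data set list in a SET statement, the following program creates four data sets and then reads them back with a numbered range list and with a name prefix list. The output names All_Sales and All_Sales2 are hypothetical.

```sas
/* Create Sales1 through Sales4, each with one observation. */
data sales1 sales2 sales3 sales4;
   amount = 100;
run;

/* Numbered range list: reads Sales1, Sales2, Sales3, and Sales4. */
data all_sales;
   set sales1-sales4;
run;

/* Name prefix list: reads every data set in the Work library */
/* whose name begins with "sales".                            */
data all_sales2;
   set sales:;
run;
```

The output names are chosen so that they do not begin with the prefix and are therefore not picked up by the colon list.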
Using _NULL_ causes SAS to execute the DATA step as if it were creating a new
data set, but no observations or variables are written to an output data set. This
process can be a more efficient use of computer resources if you are using the
DATA step for some function, such as report writing, for which the output of the
DATA step does not need to be stored as a SAS data set.
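A minimal report-writing sketch with _NULL_ follows. It assumes that the sample data set Sashelp.Class is available, as it is in most installations.

```sas
/* No output data set is created. The PUT statement writes one */
/* line per observation to the procedure output destination.   */
data _null_;
   set sashelp.class;
   file print;
   put name $10. +1 age 3.;
run;
```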
n whether there is only one observation for any given BY group (use of
NODUPKEY option)
n whether there are no adjacent duplicate observations (use of NODUPREC
option)
n whether the data set is validated
The sort indicator is set when a data set is sorted by a SORT procedure, an SQL
procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement,
or a SORTEDBY= data set option. If the SORT or SQL procedure sorted the data set
(that is, SAS performed the sort), the CONTENTS procedure output indicates that
the Validated sort information is YES. If the SORTEDBY= data set option was used
(that is, the user asserts the sort order), the CONTENTS procedure output indicates
that the Validated sort information is NO, and the Sortedby sort information is
updated with the variable or variables specified by the data set option.
Data sets can also be sorted outside of SAS. In that case, you might use the
SORTEDBY= data set option or the DATASETS procedure MODIFY statement to
add the sort order to the sort indicator. SAS does not validate a sort order that is
added this way. For more information, see “Validating That a Data Set Is Sorted” on page 652.
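The following sketch shows how a user might record a sort order for data that is already ordered. The data set name Presorted and its values are invented for illustration.

```sas
/* The input lines are already in PRIORITY order, so the user  */
/* asserts that order with the SORTEDBY= data set option.      */
data work.presorted(sortedby=priority);
   input priority code $;
   datalines;
1 J8U
1 M91
2 I9U
3 H5T
;

/* An alternative for an existing data set: set the indicator  */
/* with the MODIFY statement of the DATASETS procedure.        */
proc datasets library=work nolist;
   modify presorted(sortedby=priority);
quit;

/* The sort information appears in the CONTENTS output. */
proc contents data=work.presorted;
run;
```

Because SAS did not perform the sort in either case, the order is asserted by the user rather than validated.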
To view the sort indicator information, use the CONTENTS procedure or the
CONTENTS statement in the DATASETS procedure. The following three examples
show the sort indicator information in the CONTENTS procedure output.
data myfiles.sorttest1;
input priority 1. +1 indate date7.
+1 office $ code $;
format indate date7.;
datalines;
1 03may11 CH J8U
1 21mar11 LA M91
1 01dec11 FW L6R
1 27feb10 FW Q2A
2 15jan11 FW I9U
2 09jul11 CH P3Q
3 08apr10 CH H5T
3 31jan10 FW D2W
;
proc contents data=myfiles.sorttest1;
run;
Note that the CONTENTS procedure output indicates there was no sort. SAS did
not sort the data set, and the user did not specify that the data is sorted.
data myfiles.sorttest1;
input priority 1. +1 indate date7.
+1 office $ code $;
format indate date7.;
datalines;
1 03may01 CH J8U
1 21mar01 LA M91
1 01dec00 FW L6R
1 27feb99 FW Q2A
2 15jan00 FW I9U
2 09jul99 CH P3Q
3 08apr99 CH H5T
3 31jan99 FW D2W
;
proc sort data=myfiles.sorttest1;
by priority descending
indate;
run;
n For BY-group processing, if the data set is already sorted by the BY variable,
SAS does not use the index, even if the data set is indexed on the BY variable.
n If the Validated sort information is set to YES, SAS does not need to perform
another sort.
n SAS views
n MDDB files
For more information, see the SAS System Help for the VIEWTABLE window in
Base SAS.
28
SAS Data Files
A SAS data file is not damaged when an operation attempts to exceed the
maximum observation count. However, you must take explicit action to continue
processing the file.
n If the SAS data file does not have an index or an integrity constraint that uses an
index, sequential processing continues and additional observations are
accepted. However, the file cannot store the observation count and does not
maintain the observation numbers. Any operation that requires an observation
number is not available. There are no messages to indicate that the file has
reached or exceeded the maximum observation count.
The following list describes some of the operations and features that are limited
for a SAS data file that exceeds the maximum observation count and does not
have an index or an integrity constraint that uses an index. For a complete list,
contact SAS Technical Support.
o SAS procedures that return an observation count (such as the PRINT
procedure or the CONTENTS procedure) return a missing value, which is
represented by a period (.), for the number of observations.
o SAS procedures that depend on the observation count (for example, the
SORT procedure or the COMPARE procedure) can return unpredictable
results.
o Operations that update the observation count cannot be submitted. You
cannot reset the observation count by deleting observations.
o When you request to compress a file for which the observation count is no
longer maintained, the compression percentage cannot be calculated.
o You cannot create an index or an integrity constraint.
n user variables, which are optional variables that you can define to collect
modification data
The _AT*_ variables are described in the following table.
Code Modification
AL Auditing is resumed
AS Auditing is suspended
The type of entries stored in the audit trail, along with their corresponding
_ATOPCODE_ values, are determined by the options specified in the LOG
statement when the audit trail is initiated. Note that if the LOG statement is omitted
when the audit trail is initiated, the default behavior is to log all images.
n The A operation codes are controlled by the ADMIN_IMAGE option.
The user variable is a variable that associates data values with the data file without
making them part of the data file. That is, the data values are stored in the audit file,
but you update them in the data file like any other variable. You might want to define
a user variable to enable end users to enter a reason for each update.
A data file can have one audit file, and the audit file must reside in the same SAS
library as the data file.
If you define user variables, you must store values in them in order for the variables
to be meaningful. Programmatically, you can enter data values for the user variables
as you would for any data variable. See “Example of a Data File Update” on page
669. The data values are saved to the audit trail as each observation is saved.
User variables cannot be displayed or updated in an interactive window except in
the FSEDIT window of SAS/FSP software. To view the audit variables, use the
TYPE=AUDIT data set option to print the audit file.
However, to rename a user variable or modify its attributes, you modify the data file,
not the audit file. The following example uses PROC DATASETS to rename the user
variable:
proc datasets lib=mylib;
modify sales;
rename reason_code = Reason;
run;
quit;
You must also define attributes such as format and informat in the data file with
PROC DATASETS.
Performance Implications
Because each update to the data file is also written to the audit file, the audit trail
can negatively impact system performance. You might want to consider suspending
the audit trail for large, regularly scheduled batch updates. Note that the audit
variables are unavailable when the audit trail is suspended.
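Suspending and resuming an audit trail around a batch update might look like the following sketch. It assumes that an audit trail has already been initiated for MyLib.Sales.

```sas
/* Suspend logging before the batch update. */
proc datasets library=mylib nolist;
   audit sales;
   suspend;
run;
quit;

/* ... run the large batch update here ... */

/* Resume logging after the update is complete. */
proc datasets library=mylib nolist;
   audit sales;
   resume;
run;
quit;
```

Updates that are made while the audit trail is suspended are not recorded in the audit file.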
Programming Considerations
For data files whose audit file contains user variables, the variable list is different
when browsing and updating the data file. The user variables are selected for
update but not for browsing. You should be aware of this difference when you are
developing your own full-screen applications.
Other Considerations
Data values that are entered for user variables are not stored in the audit trail for
Delete operations.
If the audit file becomes damaged, you cannot process the data file until you
terminate the audit trail. Then you can initiate a new audit trail or process the data
file without one. To terminate the audit trail for a generation data set, use the
GENNUM= data set option in the AUDIT statement. You cannot initiate an audit trail
for a generation data set.
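Terminating an audit trail can be sketched as follows. The generation number 2 in the second step is an arbitrary value chosen for illustration, and MyLib.Sales is assumed to have an audit trail.

```sas
/* Terminate the audit trail. This deletes the audit file. */
proc datasets library=mylib nolist;
   audit sales;
   terminate;
run;
quit;

/* For a historical version in a generation group, name the */
/* version with GENNUM= in the AUDIT statement.             */
proc datasets library=mylib nolist;
   audit sales (gennum=2);
   terminate;
run;
quit;
```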
In indexed data sets, the fast-append feature can cause some observations to be
written to the audit trail twice, first with a DA operation code and then with an EA
operation code. The observations with EA represent the observations rejected by
index restrictions. For more information, see “Appending to an Indexed Data Set —
Fast-Append Method” in Base SAS Procedures Guide.
The CONTENTS procedure output is shown below. Notice that the output contains
all of the variables from the corresponding data file, the _AT*_ variables, and the
user variable.
You can also use your favorite reporting tool, such as PROC REPORT or PROC
TABULATE, on the audit trail.
data mylib.sales;
length product $9;
input product invoice renewal;
datalines;
FSP 1270.00 570
SAS 1650.00 850
STAT 570.00 0
STAT 970.82 600
OR 239.36 0
SAS 7478.71 1100
SAS 800.00 800
;
/*----------------------------------*/
/* Create an audit trail with a */
/* user variable. */
/*----------------------------------*/
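The step that this comment introduces does not appear in this copy of the example. A sketch of what such a step might look like is shown here. The variable name Reason_Code matches the later steps, but its type and length ($ 20) are assumptions.

```sas
/* Initiate an audit trail for MyLib.Sales and define a user */
/* variable. The length of 20 is an assumed value.           */
proc datasets lib=mylib;
   audit sales;
   initiate;
   user_var reason_code $ 20;
run;
```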
/*----------------------------------------*/
/* Print the audit trail. */
/*----------------------------------------*/
proc sql;
select product,
reason_code,
_atopcode_,
_atdatetime_
from mylib.sales(type=audit);
quit;
/*----------------------------------*/
/* Create integrity constraints. */
/*----------------------------------*/
proc datasets lib=mylib;
modify sales;
ic create null_renewal = not null (invoice)
message = "Invoice must have a value.";
ic create invoice_amt = check (where=((invoice > 0) and
(renewal <= invoice)))
message = "Invoice and/or renewal are invalid.";
run;
/*----------------------------------*/
/* Do some updates. */
/*----------------------------------*/
proc sql; /* this update works */
update mylib.sales
set invoice = invoice * .9,
reason_code = "10% price cut"
where renewal > 800;
/*----------------------------------------*/
/* Print the audit trail. */
/*----------------------------------------*/
proc print data=mylib.sales(type=audit);
/*----------------------------------------*/
/* Print the rejected records. */
/*----------------------------------------*/
proc print data=mylib.sales(type=audit);
where _atopcode_ eq "EA";
format _atmessage_ $250.;
var product invoice renewal _atmessage_ ;
title 'Rejected Records';
run;
The following output shows the contents of MyLib.Sales.Audit after several updates
of MyLib.Sales.Data were attempted. Integrity constraints were added to the file,
and then updates were attempted.
This output prints information about the rejected observations on the audit trail.
GENMAX=
is an output data set option that requests generations for a data set and
specifies the maximum number of versions (including the base version and all
historical versions) to keep for a given data set. The default is GENMAX=0,
which means that the generation data sets feature is not in effect.
GENNUM=
is an input data set option that references a specific version from a generation
group. Positive numbers are absolute references to a historical version by its
generation number. Negative numbers are a relative reference to historical
versions. For example, GENNUM=-1 refers to the youngest version.
historical versions
are the older copies of the base version of a data set. Names for historical
versions have a four-character suffix for the generation number, such as #003.
oldest version
is the oldest version in a generation group.
rolling over
specifies the process of the version number moving from 999 to 000. When the
generation number reaches 999, its next value is 000.
youngest version
is the version that is chronologically closest to the base version.
Once the GENMAX= data set option is in effect, the data set member name is
limited to 28 characters (rather than 32). This happens because the last four
characters are reserved for a version number. When the GENMAX= data set option
is not in effect, the member name can be up to 32 characters. See the GENMAX=
data set option in SAS Data Set Options: Reference.
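As a minimal sketch of how a generation group comes into existence, the following program requests generations when the data set is first created and then replaces it twice. The data set name A and the variable X are used only for illustration.

```sas
/* Create the base version and request up to four versions. */
data work.a(genmax=4);
   x = 1;
run;

/* Each replacement turns the previous base version into a */
/* historical version.                                     */
data work.a;
   x = 2;
run;

data work.a;
   x = 3;
run;

/* Print the youngest historical version. */
proc print data=work.a(gennum=-1);
run;
```

GENMAX= needs to be specified only once, because the setting is stored with the data set and applies to later replacements.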
A#003
most recent (youngest) historical version
A#002
second most recent historical version
A#001
oldest historical version
With GENMAX=4, a fourth replacement deletes the oldest version, which is A#001.
As replacements occur, SAS always keeps four copies. For example, after ten
replacements, the result is:
A
base (current) version
A#010
most recent (youngest) historical version
A#009
second most recent historical version
A#008
oldest historical version
The limit for version numbers that SAS can append is #999. After 999 replacements,
the youngest version is #999. After 1,000 replacements, SAS rolls over the
youngest version number to #000. After 1,001 replacements, the youngest version
number is #001. For example, using data set A with GENMAX=4, the results would
be:
999 replacements
n A (current)
n A#997 (oldest)
1,000 replacements
n A (current)
n A#998 (oldest)
1,001 replacements
n A (current)
n A#001 (most recent)
n A#999 (oldest)
The following table shows how names are assigned to a generation group:
Time   SAS Code   Data Set Names   GENNUM= Absolute Reference   GENNUM= Relative Reference   Explanation
To request a specific version from a generation group, use the GENNUM= input
data set option. There are two methods that you can use:
n A positive integer (excluding zero) references a specific historical version
number. For example, the following statement prints the historical version #003:
proc print data=a(gennum=3);
run;
Note: After 1,000 replacements, if you want historical version #000, specify
GENNUM=1000.
n A negative integer is a relative reference to a version in relation to the base
version, from the youngest predecessor to the oldest. For example,
GENNUM=-1 refers to the youngest version. The following statement prints the
data set that is three versions previous to the base version:
proc print data=a(gennum=-3);
run;
proc print data=air (gennum=0);
proc print data=air;
Prints the current (base) version of the Air data set.
proc print data=air (gennum=-2);
Prints the version two generations back from the current version.
proc print data=air (gennum=1000);
After 1,000 replacements, prints the file Air#000, which is the file that is
created after Air#999.
Introduction
The DATASETS procedure provides a variety of statements for managing
generation groups. Note that for the DATASETS procedure, GENNUM= has the
following additional functionality:
n For the PROC DATASETS and DELETE statements, GENNUM= supports the
additional values ALL, HIST, and REVERT.
n For the CHANGE statement, GENNUM= supports the additional value ALL.
n For the CHANGE statement, specifying GENNUM=0 refers to all versions rather
than just the base version.
CAUTION! Do not use operating system tools when managing generation data
sets. This can cause limited access to the generation group files. Instead, use SAS tools
such as the DATASETS or COPY procedure.
For example, the following DATASETS procedure uses the COPY statement to copy
a generation group for data set MyGen1 from library MyLib1 to library MyLib2.
libname mylib1 'SAS-library-1';
libname mylib2 'SAS-library-2';
proc datasets;
copy in=mylib1 out=mylib2;
select mygen1;
run;
CAUTION! If you decrease the number of versions, SAS deletes the oldest
version or versions so as not to exceed the new maximum. For example, the
following MODIFY statement decreases the number of versions for MyLib.Air from 4 to
0. This decrease causes SAS to automatically delete the three historical versions:
proc datasets library=mylib;
modify air (genmax=0);
run;
delete air(gennum=all);
Deletes all data sets in the generation group, including the base version.
delete air(gennum=hist);
Deletes all data sets in the generation group, except the base version.
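For context, a DELETE statement of this kind runs inside a DATASETS procedure step. The following sketch assumes that MyLib.Air is a generation group that already exists.

```sas
/* Delete the historical versions and keep the base version. */
proc datasets library=mylib nolist;
   delete air(gennum=hist);
run;
quit;
```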
n If the foreign key is being added to a data file that already contains data, the
data values in the foreign key data file must either match existing values in the
primary key data file, or the values must be null.
The foreign key data file can exist in the same SAS library as the referenced primary
key data file (intra-libref), or in a different SAS library (inter-libref). However, if the
library that contains the foreign key data file is temporary, the library that contains
the primary key data file must be temporary as well. In addition, referential integrity
constraints cannot be assigned to data files in concatenated libraries.
There is no limit to the number of foreign keys that can reference a primary key.
However, additional foreign keys can adversely impact the performance of Update
and Delete operations.
When a referential constraint exists, a primary key integrity constraint is not deleted
until all foreign keys that reference it are deleted. There are no restrictions on
deleting foreign keys.
restrictions when you define a primary key and a foreign key constraint that use the
same variables:
• The foreign key's update and delete referential actions must both be RESTRICT.
• When the same variables are used in a primary key and foreign key definition,
the variables must be defined in a different order.
For an example, see “Defining Overlapping Primary Key and Foreign Key
Constraints” on page 691.
• PROC APPEND
  ◦ for an existing BASE= data file, integrity constraints in the BASE= file are
    preserved, but integrity constraints in the DATA= file that is being appended
    to the BASE= file are not preserved.
  ◦ for a non-existent BASE= data file, general integrity constraints in the DATA=
    file that is being appended to the new BASE= file are preserved. Referential
    constraints in the DATA= file are not preserved.
• PROC SORT, PROC UPLOAD, and PROC DOWNLOAD, when an OUT= data
file is not specified
• the SAS Explorer window
You can also use the CONSTRAINT= option to control whether integrity constraints
are preserved for the COPY, CPORT, CIMPORT, UPLOAD, and DOWNLOAD
procedures.
General integrity constraints are preserved in an active state. The state in which
referential constraints are preserved depends on whether the procedure causes the
primary key and foreign key data files to be written to the same or different SAS
libraries (intra-libref versus inter-libref integrity constraints). Intra-libref constraints
are preserved in an active state. Inter-libref constraints are preserved in an inactive
state. That is, the primary key portion of the integrity constraint is enforced as a
general integrity constraint but the foreign key portion is inactive. You must use the
DATASETS procedure statement IC REACTIVATE to reactivate the inactive foreign
keys.
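As a sketch of the reactivation step, the following statements reactivate an inactive foreign key after a copy across libraries. The librefs, the data file Orders, and the constraint name fk_cust are hypothetical. Only the IC REACTIVATE statement itself comes from the text above.

```sas
/* Hypothetical names: MYLIB2.ORDERS holds the inactive foreign key FK_CUST; */
/* MYLIB1 is the library that contains the primary key data file.            */
proc datasets library=mylib2 nolist;
   modify orders;
   /* Reactivate the foreign key and name the primary key library. */
   ic reactivate fk_cust references mylib1;
run;
quit;
```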
The following table summarizes the circumstances under which integrity constraints
are preserved.
682 Chapter 28 / SAS Data Files
• The NOMISS index attribute and the not-null integrity constraint have different
effects. The integrity constraint prevents missing values from being written to the
SAS data file and cannot be added to an existing data file that contains missing
values. The index attribute allows missing data values in the data file but
excludes them from the index.
• When any index is created, it is marked as being “owned” by the user, the
integrity constraint, or both. A user cannot delete an index that is also owned by
an integrity constraint and vice versa. If an index is owned by both, the index is
deleted only after both the integrity constraint and the user have requested the
index's deletion. A note in the log indicates when an index cannot be deleted.
When you specify integrity constraints, you must specify a separate statement for
each constraint. In addition, you must specify a separate statement for each variable
to which you want to assign the not-null integrity constraint. When multiple variables
are included in the specification for a primary key, foreign key, or a unique integrity
constraint, a composite index is created and the integrity constraint enforces the
combination of variable values. The relationship between SAS indexes and integrity
constraints is described in “Indexes and Integrity Constraints” on page 715. For
more information, see “Understanding SAS Indexes” on page 692.
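The one-statement-per-constraint rule can be sketched with the IC CREATE statement in the DATASETS procedure. The data file Work.Staff, its variables, and the constraint names are assumptions made for illustration.

```sas
/* Hypothetical data file WORK.STAFF with variables LASTNAME, FIRSTNAME, */
/* JOBTYPE, and STATUS.                                                    */
proc datasets library=work nolist;
   modify staff;
   /* Two variables in one primary key: SAS builds a composite index and */
   /* enforces the combination of values.                                 */
   ic create pk_name = primary key (lastname firstname);
   /* Not-null constraints need one statement per variable. */
   ic create nn_jobtype = not null (jobtype);
   ic create nn_status  = not null (status);
run;
quit;
```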
When you add an integrity constraint in SCL, open the data set in utility mode. See
“Creating Integrity Constraints By Using SCL” on page 687 for an example.
Integrity constraints must be deleted in utility open mode. For detailed syntax
information, see SAS Component Language: Reference.
When generation data sets are used, you must create the integrity constraints in
each data set generation that includes protected variables.
CAUTION! CHECK constraints in SAS 9.2 are not compatible with earlier releases
of SAS. If you add a CHECK constraint to an existing SAS data set or create a SAS
data set that includes a CHECK constraint, the data set cannot be accessed by a
release prior to SAS 9.2.
When the primary key data file is opened for update processing, SAS automatically
tries to open the foreign key data file by using the foreign key data file's physical
name that is stored in the primary key data file, which is C:\Public
\fkey_directory. However, that directory does not exist on machine F2760.
Therefore, opening the foreign key data file fails.
Understanding Integrity Constraints 685
Rejected Observations
You can customize the error message that is associated with an integrity constraint
when you create the constraint by using the MESSAGE= and MSGTYPE= options.
The MESSAGE= option enables you to prepend a user-defined message to the
SAS error message associated with an integrity constraint. The MSGTYPE= option
enables you to suppress the SAS portion of the message. For more information, see
the PROC DATASETS, PROC SQL, and SCL documentation.
Rejected observations can be collected in a special file by using an audit trail.
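A minimal sketch of the MESSAGE= and MSGTYPE= options follows. It assumes a data file Work.People that contains the character variable Gender. The constraint name and the message text are illustrative.

```sas
/* Sketch: assumes WORK.PEOPLE exists and contains the variable GENDER. */
proc datasets library=work nolist;
   modify people;
   ic create val_gender = check (where=(gender in ('male', 'female')))
      message = "GENDER must be 'male' or 'female'."
      msgtype = user;   /* write only the user-defined message */
run;
quit;
```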
Examples
proc sql;
create table people
(
name char(14),
gender char(6),
hired num,
jobtype char(1) not null,
status char(10),
MAIN:
put "Opening WORK.ONE in utility mode.";
dsid = open('work.one', 'V');/* Utility mode.*/
if (dsid = 0) then
do;
_msg_=sysmsg();
put _msg_=;
end;
else do;
if (dsid > 0) then
put "Successfully opened WORK.ONE in"
"UTILITY mode.";
end;
put rc=;
_msg_=sysmsg();
put _msg_=;
end;
else do;
put "Successfully created a check"
"integrity constraint.";
end;
_msg_=sysmsg();
put _msg_=;
end;
TERM:
put "End of test SCL integrity constraint"
"functions.";
return;
The previous code creates the SCL catalog entry. The following code creates two
data files, One and Two, and executes the SCL entry Example.Ic_Cat_Allics.SCL:
/* Submit to create data files. */
Elaine 14
Tina 15
;
rc = icdelete(dsid2, 'fk');
if (rc > 0) then
do;
put rc=;
_msg_=sysmsg();
end;
else
do;
put "Successfully deleted a foreign key integrity constraint.";
end;
rc = close(dsid2);
return;
quit;
1 Defines a primary key constraint for data set Singers1, for variables FirstName
and LastName.
2 Defines a foreign key constraint for data set Singers2 for variables FirstName
and LastName that references the primary key defined in Step 1. Because the
intention is to define a primary key using the same variables, the foreign key
update and delete referential actions must both be RESTRICT.
3 Defines a primary key constraint for data set Singers2 for variables LastName
and FirstName. Because those exact same variables are already defined as a
foreign key, the order must be different.
4 Defines a foreign key constraint for data set Singers1 for variables LastName
and FirstName that references the primary key defined in Step 3. Because those
exact same variables are already defined as a primary key, the order must be
different. Because a primary key is already defined using the same variables, the
foreign key's update and delete referential actions must both be RESTRICT.
Benefits of an Index
In general, SAS can use an index to improve performance in the following
situations:
• For WHERE processing, an index can provide faster and more efficient access
to a subset of data. To process a WHERE expression, SAS by default decides
whether to use an index or to read the data file sequentially.
• For BY processing, an index returns observations in the index order, which is in
ascending value order, without using the SORT procedure even when the data
file is not stored in that order.
Note: If you use the SORT procedure, the index is not used.
• For the SET and MODIFY statements, the KEY= option enables you to specify
an index in a DATA step to retrieve particular observations in a data file.
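The KEY= option can be sketched as a simple lookup. The example assumes a hypothetical data file Work.Wanted that supplies IdNum values, and a data file College.Student that has a simple index named IdNum.

```sas
/* Hypothetical: WORK.WANTED supplies IDNUM values to look up;   */
/* COLLEGE.STUDENT has a simple index named IDNUM.               */
data work.found;
   set work.wanted (keep=idnum);
   set college.student key=idnum;   /* direct retrieval through the index */
   if _iorc_ = 0 then output;       /* a matching observation was found */
   else _error_ = 0;                /* no match: clear the error flag */
run;
```

Checking the automatic variable _IORC_ after each keyed read keeps unmatched values out of the output data set.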
In addition, an index can benefit other areas of SAS. In SCL (SAS Component
Language), an index improves the performance of table lookup operations. For the
SQL procedure, an index enables the software to process certain classes of queries
more efficiently (for example, join queries). For SAS/IML software, you can
explicitly specify that an index be used for read, delete, list, or append operations.
Even though an index can reduce the time required to locate a set of observations,
especially for a large data file, there are costs associated with creating, storing, and
maintaining the index. When deciding whether to create an index, you must
consider increased resource usage, along with the performance improvement.
Note: An index is never used for the subsetting IF statement in a DATA step, or for
the FIND and SEARCH commands in the FSEDIT procedure.
• one or more unique record identifiers (referred to as a RID) that identifies each
observation containing the value. (Think of the RID as an internal observation
number.)
That is, in an index file, each value is followed by one or more RIDs, which identify
the observations in the data file that contain the value. (Multiple RIDs result from
multiple occurrences of the same value.) For example, the following represents
index file entries for the variable LastName:
Avery 10
Brown 6, 22, 43
Craig 5, 50
Dunn 1
Types of Indexes
• a composite index, which consists of the values of more than one variable, with
the values concatenated to form a single value
In addition to deciding whether you want a simple index or a composite index, you
can also limit an index (and its data file) to unique values and exclude missing
values from the index.
Simple Index
The most common index is a simple index, which is an index of values for one key
variable. The variable can be numeric or character. When you create a simple index,
SAS assigns to the index the name of the key variable.
Understanding SAS Indexes 695
The following example shows the DATASETS procedure statements that are used
to create two simple indexes for variables Class and Major in data file
College.Survey:
proc datasets library=college;
modify survey;
index create class;
index create major;
run;
To process a WHERE expression using an index, SAS uses only one index. When
the WHERE expression has multiple conditions using multiple key variables, SAS
determines which condition qualifies the smallest subset. For example, suppose that
College.Survey contains the following data:
• 42,000 observations contain class=12
With simple indexes on Class and Major, SAS would select Major to process the
following WHERE expression.
where class=12 and major='Biology';
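To see which index SAS selects for a WHERE expression like this one, you can request informational messages in the log. The following sketch assumes that College.Survey has the simple indexes Class and Major described above. The exact wording of the log messages is not shown here.

```sas
options msglevel=i;   /* write informational notes about index usage to the log */

proc print data=college.survey;
   where class=12 and major='Biology';
run;
```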
Composite Index
A composite index is an index of two or more key variables with their values
concatenated to form a single value. The variables can be numeric, character, or a
combination. An example is a composite index for the variables LastName and
FirstName. A value for this index consists of the value for LastName immediately
followed by the value for FirstName from the same observation. When you create a
composite index, you must specify a unique index name.
The following example shows the DATASETS procedure statements that are used
to create a composite index for the data file College.MailList, specifying two key
variables: ZipCode and SchoolId.
proc datasets library=college;
modify maillist;
index create zipid=(zipcode schoolid);
run;
Often, only the first variable of a composite index is used. For example, for a
composite index on ZipCode and SchoolId, the following WHERE expression can
use the composite index for the variable ZipCode because it is the first key variable
in the composite index:
where zipcode = 78753;
However, you can take advantage of all key variables in a composite index by how
you construct the WHERE expression, which is referred to as compound
optimization. Compound optimization is the process of optimizing multiple WHERE
expression conditions using a single composite index. If you issue the following
WHERE expression, the composite index is used to find all occurrences where the
ZIP code is 78753 and the school identification number is 55. In this way, all of the
conditions are satisfied with a single search of the index:
where zipcode = 78753 and schoolid = 55;
When you are deciding whether to create a simple index or a composite index,
consider how you will access the data. If you often access data for a single variable,
a simple index will do. But if you frequently access data for multiple variables, a
composite index could be beneficial.
Unique Values
Often it is important to require that values for a variable be unique, like Social
Security number and employee number. You can declare unique values for a
variable by creating an index for the variable and including the UNIQUE option. A
unique index guarantees that values for one variable or the combination of a
composite group of variables remain unique for every observation in the data file. If
an update tries to add a duplicate value to that variable, the update is rejected.
The following example creates a simple index for the variable IdNum and requires
that all values for IdNum be unique:
proc datasets library=college;
modify student;
index create idnum / unique;
run;
Missing Values
If a variable has a large number of missing values, it might be desirable to keep
them from using space in the index. Therefore, when you create an index, you can
include the NOMISS option to specify that missing values are not maintained by the
index.
The following example creates a simple index for the variable Religion and specifies
that the index does not maintain missing values for the variable:
proc datasets library=college;
modify student;
index create religion / nomiss;
run;
In contrast to the UNIQUE option, observations with missing values for the key
variable can be added to the data file, even though the missing values are not
added to the index.
SAS does not use an index that was created with the NOMISS option to process a
BY statement or to process a WHERE expression that qualifies observations that
contain missing values. If no missing values are present, SAS considers using the
index in processing the BY statement or WHERE expression.
In the following example, the index Age was created with the NOMISS option and
observations exist that contain missing values for the variable Age. In this case,
SAS does not use the index:
proc print data=mydata.employee;
where age < 35;
run;
Costs of an Index
An index exists to improve performance. However, an index conserves some
resources at the expense of others. Therefore, you must consider costs associated
with creating, using, and maintaining an index. The following topics provide
information about resource usage and give you some guidelines for creating
indexes.
CPU Cost
Additional CPU time is necessary to create an index as well as to maintain the index
when the data file is modified. That is, for an indexed data file, when a value is
added, deleted, or modified, it must also be added, deleted, or modified in the
appropriate index(es).
When SAS uses an index to read an observation from a data file, there is also
increased CPU usage. The increased usage results from SAS using a more
complicated process than is used when SAS retrieves data sequentially. Although
CPU usage is greater, you benefit from SAS reading only those observations that
meet the conditions. Note that increased CPU usage is why using an index is more
expensive when there is a larger number of observations that meet the conditions.
Note: To compare CPU usage with and without an index, for some operating
environments, you can issue the STIMER or FULLSTIMER system options in order
to write performance statistics to the SAS log.
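One way to compare resource usage is to run the same subset twice, once with a WHERE statement and once with a subsetting IF statement, which never uses an index. This is a sketch only. It assumes that College.Survey has a simple index on Major, and the output data set names are hypothetical.

```sas
options fullstimer;   /* write detailed performance statistics to the log */

/* WHERE processing: SAS can consider the index on MAJOR. */
data work.bio_where;
   set college.survey;
   where major='Biology';
run;

/* Subsetting IF: the data file is always read sequentially. */
data work.bio_if;
   set college.survey;
   if major='Biology';
run;
```

Comparing the two sets of statistics in the log gives a rough measure of what the index costs or saves for this subset.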
I/O Cost
Using an index to read observations from a data file can increase the number of I/O
(input/output) requests compared to reading the data file sequentially. For example,
processing a BY statement with an index might increase I/O count, but you save in
not having to issue the SORT procedure. For WHERE processing, SAS considers
I/O count when deciding whether to use an index.
1 SAS does a binary search on the index file and positions the index to the first
entry that contains a qualified value.
2 SAS uses the value's RID (identifier) to directly access the observation
containing the value. SAS transfers the observation from external storage to
a buffer, which is the memory into which data is read or from which data is
written. The data is transferred in pages; a page is the amount of data (the
number of observations) that can be transferred for one I/O request. Each data
file has a specified page size.
3 SAS then continues the process until the WHERE expression is satisfied. Each
time SAS accesses an observation, the data file page containing the observation
must be read into memory if it is not already there. Therefore, if the observations
are on multiple data file pages, an I/O operation is performed for each
observation.
The result is that the more random the data, the more I/Os are required to use the
index. If the data is ordered more like the index, which is in ascending value order, a
smaller number of I/Os are required to access the data.
The number of buffers determines how many pages of data can simultaneously be
in memory. Frequently, the larger the number of buffers, the smaller the number of
I/Os that are required. For example, if the page size is 4096 bytes and one buffer is
allocated, then one I/O transfers 4096 bytes of data (or one page). To reduce I/Os,
you can increase the page size but you need a larger buffer. To reduce the buffer
size, you can decrease the page size but you use more I/Os.
For information about data file characteristics like the data file page size and the
number of data file pages, issue the CONTENTS procedure (or use the CONTENTS
statement in the DATASETS procedure). With this information, you can determine
the data file page size and experiment with different sizes. Note that the information
that is available from PROC CONTENTS depends on the operating environment.
The BUFSIZE= data set option (or system option) sets the permanent page size for
a data file when it is created. The page size is the amount of data that can be
transferred for an I/O operation to one buffer. The BUFNO= data set option (or
system option) specifies how many buffers to allocate for a data file and for the
overall system for a given execution of SAS. That is, BUFNO= is not stored as a
data set attribute.
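The following sketch shows where each option is typically specified. The page size of 16K, the buffer count of 10, and the data file name Survey_16k are arbitrary values chosen for illustration, not recommendations.

```sas
/* BUFSIZE= is set when the data file is created and stays with the file. */
data college.survey_16k (bufsize=16k);
   set college.survey;
run;

/* BUFNO= applies only to this step; it is not stored with the file. */
proc print data=college.survey_16k (bufno=10);
   where major='Biology';
run;

/* Report the data file page size and the number of data file pages. */
proc contents data=college.survey_16k;
run;
```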
Buffer Requirements
In addition to the resources that are used to create and maintain an index, SAS also
requires additional memory for buffers when an index is actually used. Opening the
data file opens the index file but none of the indexes. The buffers are not required
unless SAS uses the index but they must be allocated in preparation for the index
that is being used.
The number of buffers that are allocated depends on the number of levels in the
index tree and on the data file open mode. If the data file is open for input, the
maximum number of buffers is three; for update, the maximum number is four. (Note
that these buffers are available for other uses; they are not dedicated to indexes.)
The IBUFSIZE= system option specifies the page size on disk for an index file when
it is created. The default setting causes SAS to use the minimum optimal page size
for the operating environment. Typically, you do not need to specify an index page
size. However, there are situations that could require a different page size. For more
information, see the “IBUFSIZE= System Option” in SAS System Options:
Reference.
The IBUFNO= system option specifies an optional number of extra buffers to be
allocated when navigating an index file. SAS automatically allocates a minimal
number of buffers. Typically, you do not need to specify extra buffers. However,
using IBUFNO= to specify extra buffers could improve execution time by limiting the
number of input/output operations that are required for a particular index file. The
improvement in execution time, however, comes at the expense of increased
memory consumption. For more information, see the “IBUFNO= System Option” in
SAS System Options: Reference.
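As a sketch, both options can be set in an OPTIONS statement. The values shown are arbitrary examples. Whether they help depends on the index and the operating environment.

```sas
options ibufno=10;       /* request extra buffers for navigating index files */
options ibufsize=8192;   /* index page size; applies to indexes created afterward */
```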
For information about the index file size, issue the CONTENTS procedure (or the
CONTENTS statement in the DATASETS procedure). Note that the available
information from PROC CONTENTS depends on the operating environment.
expression. Take into consideration the resources that it takes to actually create
the index and to maintain it every time the data file is changed.
• When you create an index to process a WHERE expression, do not try to create
one index that is used to satisfy all queries. If there are several variables that
appear in queries, those queries might be best satisfied with simple indexes on
the most discriminating of those variables.
Note that when you create a composite index, the first key variable should be
the most discriminating.
Creating an Index
1 You request to create an index for one or multiple variables using a method such
as the INDEX CREATE statement in the DATASETS procedure.
2 SAS reads the data file one observation at a time, extracts values and RIDs for
each key variable, and places them in the index file.
SAS ensures that the values that are placed in the index are successively the same
or increasing. SAS determines whether the data is already sorted by the key
variables in ascending order. It determines this by checking the sort indicator in the
data file, which is an attribute of the file that indicates how the data is sorted. The
sort indicator is stored with the SAS data file descriptor information and is set from a
previous SORT procedure or SORTEDBY= data set option.
If the values in the sort indicator are in ascending order, SAS does not sort the
values for the index file and avoids the resource cost of sorting. Note that SAS always validates
that the data is sorted as indicated. If not, the index is not created. For example, if
the sort indicator was set from a SORTEDBY= data set option and the data is not
sorted as indicated, an error occurs. A message is written to the SAS log stating that
the index was not created because values are not sorted in ascending order.
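The role of the sort indicator can be sketched as follows. The data file College.Alumni and the variable GradYear are hypothetical, and the data file is assumed to have no existing indexes.

```sas
/* Hypothetical data file COLLEGE.ALUMNI with no existing indexes. */
proc sort data=college.alumni;
   by gradyear;              /* sets the sort indicator for GRADYEAR */
run;

proc datasets library=college nolist;
   modify alumni;
   index create gradyear;    /* the data is already in ascending key order */
run;
quit;
```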
If the values in the sort indicator are not in ascending order, SAS sorts the data that
is included in the index file in ascending value order. To sort the data, SAS follows
this procedure:
1 SAS first attempts to sort the data using the thread-enabled sort. By dividing the
sorting into separately executable processes, the time to sort the data can be
reduced. To use the thread-enabled sort, the index must be sufficiently large
(which is determined by SAS), the SAS system option CPUCOUNT= must be set
to more than one processor, and the THREADS system option must be enabled.
Adequate memory must be available for the thread-enabled sort. If not enough
memory is available, SAS reduces the number of threads to one and begins the
sort process again, which increases the time to create the index.
2 If the thread-enabled sort cannot be done, SAS uses the unthreaded sort.
Note: To display messages regarding what type of sort is used, memory and
resource information, and the status of the index being created, set the SAS system
option MSGLEVEL=I; that is:
options msglevel=i;
Note: If you delete and create indexes in the same step, place the INDEX DELETE
statement before the INDEX CREATE statement so that space occupied by deleted
indexes can be reused during index creation.
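The recommended statement order can be sketched as follows. The example assumes that College.MailList has the composite index ZipId and does not yet have a simple index on SchoolId.

```sas
proc datasets library=college nolist;
   modify maillist;
   index delete zipid;        /* delete first so that the space can be reused */
   index create schoolid;     /* then create the new index */
run;
quit;
```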
The next example uses the variables SSN, City, and State to create a simple index
named SSN and a composite index named CitySt:
data employee(index=(ssn cityst=(city state)));
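A complete step built around that DATA statement might look like the following sketch. The input data file Work.Employee_Raw is hypothetical and is assumed to contain the variables SSN, City, and State.

```sas
/* Hypothetical input WORK.EMPLOYEE_RAW containing SSN, CITY, and STATE. */
data employee (index=(ssn cityst=(city state)));
   set work.employee_raw;
run;

/* The CONTENTS output lists the indexes that were created. */
proc contents data=employee;
run;
```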
Note: Conditions are not optimized with an index for arithmetic operators, a
variable-to-variable condition, the sounds-like operator, and any function other than
the TRIM and SUBSTR functions, as listed above.
The following examples illustrate optimizing a single condition:
• The following WHERE expressions could use a simple index on the variable
Major:
where major in ('Biology', 'Chemistry', 'Agriculture');
where class=11 and major in ('Biology', 'Agriculture');
• With a composite index on variables ZipCode and SchoolId, SAS could use the
composite index to satisfy the following conditions because ZipCode is the first
key variable in the composite index:
where zipcode = 78753;
However, the following condition cannot use the composite index because the
variable SchoolId is not the first key variable in the composite index:
where schoolid gt 1000;
Compound Optimization
Compound optimization is the process of optimizing multiple WHERE expression
conditions using a single composite index. Using a single index to optimize the
conditions can greatly improve performance.
For example, suppose there is a composite index for LastName and FirstName. If
you execute the following WHERE expression, SAS uses the concatenated values
for the first two variables, then SAS further evaluates each qualified observation for
the EmpId value:
where lastname eq 'Smith' and firstname eq 'John' and empid=3374;
  ◦ When conditions are connected with OR, the conditions must specify the
same variable. For example:
where firstname eq 'John' and
(lastname eq 'Smith' or lastname eq 'Jones');
Note: SAS transforms the OR conditions that specify the same variable into
a single condition that uses the IN operator. For the above WHERE
expression, SAS converts the two OR conditions into lastname IN
('Smith','Jones'), and then uses the composite index for the variables
FirstName and LastName in order to select the observations where
FirstName is John and LastName is Smith or Jones.
For the following examples, assume there is a composite index for variables I, J,
and CH:
• The following WHERE expression conditions are compound optimized because
every condition specifies a variable that is in the composite index, and each
condition uses one of the supported operators. SAS positions the composite
index to the first entry that meets all three conditions and retrieves only
observations that satisfy all three conditions.
where I = 1 and J not in (3,4) and 'abc' < CH;
• For the following WHERE expression, the first two conditions are compound
optimized. After retrieving a subset of observations that satisfy the first two
conditions, SAS examines the subset and eliminates any observations that fail to
match the third condition.
where I in (1,4) and J = 5 and K like '%c';
• This WHERE expression can be compound optimized for variables I and J. After
retrieving observations that satisfy the second and third conditions, SAS
examines the subset and eliminates those observations that do not satisfy the
first condition.
where X < 5 and I = 1 and J = 2;
1 SAS predicts the number of I/Os that it takes to satisfy the WHERE expression
using the index. To do so, SAS positions the index to the first entry that contains
a qualified value. In a buffer management simulation that takes into account the
current number of available buffers, the RIDs (identifiers) on that index page are
processed, indicating how many I/Os it takes to read the observations in the data
file.
If the observations are randomly distributed throughout the data file, the
observations are located on multiple data file pages. This means that an I/O is
needed for each page. Therefore, the more random the data in the data file, the
more I/Os it takes to use the index. If the data in the data file is ordered more like
the index, which is in ascending value order, a smaller number of I/Os are
needed to use the index.
2 SAS calculates the I/O cost of a sequential pass of the entire data file and
compares the two resource costs.
Factors that affect the comparison include the size of the subset relative to the size
of the data file, data file value order, data file page size, the number of allocated
buffers, and the cost to uncompress a compressed data file for a sequential read.
Note: If comparing resource costs results in a tie, SAS chooses the index.
For details, see the IDXWHERE= data set option in SAS Data Set Options:
Reference.
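As a sketch of the IDXWHERE= data set option, the following step asks SAS to use the best available index for the WHERE expression instead of comparing costs. It assumes that MyData.Employee has an index that can optimize the condition on EmpNum. Specifying IDXWHERE=NO instead would direct SAS to ignore all indexes and read the data file sequentially.

```sas
data mydata.empnew;
   set mydata.employee (idxwhere=yes);   /* use the best available index */
   where empnum < 2000;
run;
```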
The IDXNAME= data set option directs SAS to use a specific index in order to
satisfy the conditions of a WHERE expression.
By specifying IDXNAME=index-name, you are specifying the name of a simple or
composite index for the data file.
The following example uses the IDXNAME= data set option to direct SAS to use a
specific index to optimize the WHERE expression. SAS disregards the possibility
that a sequential search of the data file might be more resource efficient. SAS does
not attempt to determine whether the specified index is the best one. (Note that the
EMPNUM index was not created with the NOMISS option.)
data mydata.empnew;
set mydata.employee (idxname=empnum);
where empnum < 2000;
run;
For details, see the IDXNAME= data set option in SAS Data Set Options: Reference.
Note: IDXWHERE= and IDXNAME= are mutually exclusive. Using both options
results in an error.
• If an index is not used but one exists that could optimize at least one condition in
the WHERE expression, messages provide suggestions as to what you can do
to influence SAS to use the index. For example, a message could suggest
sorting the data file into index order or specifying more buffers.
• A message displays the IDXWHERE= or IDXNAME= data set option value if the
setting can affect index processing.
If you issue PROC SQL with an SQL WHERE clause that specifies the key variable
State, then the SQL view can join the two conditions, which enables SAS to use the
index State:
proc sql;
select * from stat where state > 42;
quit;
For example, if an index exists for LastName, the following BY statement would use
the index to order the values by last names:
proc print;
by lastname;
run;
When you specify a BY statement, SAS looks for an appropriate index. If one exists,
the software automatically retrieves the observations from the data file in indexed
order.
A BY statement uses an index in the following situations:
• The BY statement consists of one variable that is the key variable for a simple
index or the first key variable in a composite index.
• The BY statement consists of two or more variables and the first variable is the
key variable for a simple index or the first key variable in a composite index.
For example, if the variable Major has a simple index, the following BY statements
use the index to order the values by Major:
by major;
by major state;
If a composite index named ZipId exists consisting of the variables ZipCode and
SchoolId, the following BY statements use the index:
by zipcode;
by zipcode schoolid;
by zipcode schoolid name;
However, the composite index ZipId is not used for these BY statements:
by schoolid;
by schoolid zipcode;
• The data file is physically stored in sorted order based on the variables specified
in the BY statement.
Note: Using an index to process a BY statement might not always be more efficient
than simply sorting the data file, particularly if the data file has a high blocking factor
of observations per page. Therefore, using an index for a BY statement is generally
for convenience, not performance.
proc print;
by lastname;
where lastname >= 'Smith';
run;
Note: A BY statement is not allowed in the same DATA step with the KEY= option,
and WHERE processing is not allowed for a data file with the KEY= option.
Task                   Result
delete observations    index entries are deleted and space is recovered for reuse
update observations    index entries are deleted and new ones are inserted
Note: Use SAS to perform additions, modifications, and deletions to your data sets.
Using operating environment commands to perform these operations makes your
files unusable.
Note: If you sort an indexed data file with the FORCE option, the index file is
deleted.
Extended Attributes
Definition
You can think of extended attributes as customized metadata for your SAS files.
Whereas common SAS attributes, such as Label for data sets or Length and Label
for variables, are predefined SAS system attributes, extended attributes are
attributes that you define yourself. They are organized into (name, value) pairs and
are associated with either a SAS data set as a whole or a variable within a SAS
data set. For the BASE engine, extended attribute data is stored in a separate SAS
file with the file extension sas7bxat.
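As a minimal sketch of defining extended attributes, the following step uses the XATTR statement of PROC DATASETS. The data set WORK.SALES and the attribute names Source and Unit are hypothetical, chosen only for illustration.

```sas
/* Hypothetical data set used only for illustration */
data work.sales;
   price = 19.99;
   qty   = 3;
run;

proc datasets library=work nolist;
   modify sales;
   /* Data set level extended attribute as a (name, value) pair */
   xattr add ds source="quarterly extract";
   /* Variable level extended attribute on the variable PRICE */
   xattr add var price (unit="USD");
quit;

/* PROC CONTENTS output includes extended attributes when they are defined */
proc contents data=work.sales;
run;
```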
The CONTENTS procedure and the DATASETS procedure produce the following
error when the .sas7bxat (extended attributes) file is absent from the directory and
extended attributes were defined:
Definition of Compression
Compressing a file is a process that reduces the number of bytes required to
represent each observation. In a compressed file, each observation is a variable-
length record, while in an uncompressed file, each observation is a fixed-length
record.
Advantages of compressing a file include the following:
n reduced storage requirements for the file
n fewer I/O operations necessary to read from or write to the data during processing
718 Chapter 28 / SAS Data Files
Requesting Compression
By default, a SAS data file is not compressed. To compress, you can use these
options:
n COMPRESS= system option to compress all data files that are created during a
SAS session
n COMPRESS= option in the LIBNAME statement to compress all data files for a
particular SAS library
n COMPRESS= data set option to compress an individual data file
When you create a compressed data file, SAS writes a note to the log indicating the
percentage of reduction that is obtained by compressing the file. SAS obtains the
compression percentage by comparing the size of the compressed file with the size
of an uncompressed file of the same page size and record count.
After a file is compressed, the setting is a permanent attribute of the file. This means
that you must re-create the file to change the setting. That is, to uncompress a file,
specify COMPRESS=NO for a DATA step that copies the compressed data file.
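The following sketch illustrates requesting compression with the data set option, re-creating the file uncompressed, and setting a session-wide default. The data set names and the generated values are hypothetical; whether SAS actually compresses a given file depends on its contents.

```sas
/* Hypothetical data: a long, mostly blank character variable */
/* is a typical candidate for compression.                    */
data work.notes_c (compress=char);
   length comment $200;
   do id = 1 to 1000;
      comment = 'ok';
      output;
   end;
run;

/* Re-create the file with COMPRESS=NO to store it uncompressed */
data work.notes_u (compress=no);
   set work.notes_c;
run;

/* Session-wide default for data files that are created from here on */
options compress=yes;
```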
For more information about the COMPRESS= data set option, see SAS Data Set
Options: Reference. For more information about the COMPRESS= option in the
LIBNAME statement, see SAS Global Statements: Reference. For more information
about the COMPRESS= system option, see SAS System Options: Reference.
For example, here is a simple data set for which SAS determines that it is not
possible for the compressed file to be smaller than an uncompressed one:
data one (compress=char);
length x y $2;
input x y;
datalines;
ab cd
;
The following output is written to the SAS log:
Example Code 28.1 SAS Log Output When Compression Request Is Disabled
NOTE: Compression was disabled for data set WORK.ONE because compression
overhead would increase the size of the data set.
NOTE: The data set WORK.ONE has 1 observations and 2 variables.
29
SAS Views
interface view
is a SAS view that is created with SAS/ACCESS software. An interface view can
read data from or write data to a database management system (DBMS) such as
DB2 or ORACLE. Interface views are also referred to as SAS/ACCESS views. In
order to use SAS/ACCESS views, you must have a license for SAS/ACCESS
software.
Note: You can create native views that access certain DBMS data by using a
SAS/ACCESS dynamic LIBNAME engine. See “SAS/ACCESS Views” on page
730, or the SAS/ACCESS documentation for your DBMS for more information.
Native Interface
(PROC SQL) (SAS/ACCESS)
n to migrate data to SAS data files or to database management systems that are
supported by SAS
n in combination with other data sources using PROC SQL
n Data file variables can be sorted and indexed before using; SAS views must
process data in its existing form during execution.
724 Chapter 29 / SAS Views
n SAS/ACCESS views
If the SAS view exists in a SAS library and if you use the same member name to
create a new view definition, then the old SAS view is overwritten.
Beginning with SAS 8, DATA step views retain source statements. You can retrieve
these statements using the DESCRIBE statement. The following example uses the
DESCRIBE statement in a DATA step view in order to write a copy of the source
code to the SAS log:
data view=inventory;
describe;
run;
For more information about how to create SAS views and use the DESCRIBE
statement, see the DATA statement in SAS DATA Step Statements: Reference.
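As a minimal sketch, a DATA step view can be created with the VIEW= option and its source retrieved afterward. The view name WORK.TALL is hypothetical; SASHELP.CLASS is a sample table that is shipped with SAS.

```sas
/* Create a DATA step view; no observations are read at this point */
data work.tall / view=work.tall;
   set sashelp.class;
   where height > 60;
run;

/* Write the stored source statements of the view to the SAS log */
data view=work.tall;
   describe;
run;

/* The DATA step code of the view runs here, when the view is read */
proc print data=work.tall;
run;
```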
DATA Step Views 725
Performance Considerations
n DATA step code executes each time you use a DATA step view, which might add
considerable system overhead. In addition, you run the risk of having your data
change between steps. However, this also means that you get the most recent
data available—that is, data when the view is executed compared to data when
the view was compiled.
n Depending on how many reads or passes on the data are required, processing
overhead increases.
o When one sequential pass is requested, no data set is created. Compared to
traditional methods of processing, making one pass improves performance
by decreasing the number of input/output operations and elapsed time.
o When random access or multiple passes are requested, the SAS view must
build a spill file that contains all generated observations so that subsequent
passes can read the same data that was read by previous passes. In some
instances, the view SPILL= data set option can reduce the size of a spill file.
n The VBUFSIZE= system option and the OBSBUF= data set option can be used
to speed up execution time when processing DATA step views. For information
about optimizing performance with SAS views, see “Setting VBUFSIZE= and
OBSBUF= for SAS DATA Step Views” on page 225.
For more information about the VBUFSIZE= system option, see “VBUFSIZE=
System Option” in SAS System Options: Reference. For more information about
the OBSBUF= data set option, see “OBSBUF= Data Set Option” in SAS Data Set
Options: Reference.
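A minimal sketch of the OBSBUF= data set option on a DATA step view follows. The view WORK.SQUARES and the buffer value of 1000 are hypothetical, not tuning recommendations; a suitable value depends on the data and the environment.

```sas
/* Hypothetical DATA step view that generates its observations */
data work.squares / view=work.squares;
   do n = 1 to 100000;
      sq = n * n;
      output;
   end;
run;

/* OBSBUF= sets how many observations are buffered */
/* each time the view is asked for data.           */
proc means data=work.squares (obsbuf=1000) mean;
   var sq;
run;
```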
select;
   when (name=' ' or major=' ' or credits=.)
      do code=01;
         date=datetime();
         output myv9lib.problems;
      end; 4
   when (0<credits<90)
      do code=02;
         date=datetime();
         output myv9lib.problems;
      end; 5
   otherwise
      output myv9lib.class;
end;
run; 6
The following example shows how to print the files created previously. The SAS
view MyV9Lib.Class contains the observations from Student that were processed
without errors. The data file MyV9Lib.Problems contains the observations that
contain errors.
If the data frequently changes in the source data file Student, there would be
different effects on the returned values in the SAS view and the SAS data file:
n New records, if error free, that are added to the source data file Student between
the time you run the DATA step in the previous example and the time you
execute PROC PRINT in the following example, appear in the SAS view
MyV9Lib.Class.
n On the other hand, if any new records, failing the error tests, were added to
Student, the new records would not show up in the SAS data file
MyV9Lib.Problems, until you run the DATA step again.
A SAS view dynamically updates from its source files each time it is used. A SAS
data file, each time it is used, remains the same, unless new data is written directly
to the file.
filename student 'external-file-specification';
libname myv9lib 'SAS-library'; 7
1 Reference a library called MyV9Lib. Tell SAS where a file that is associated with
the fileref Student is stored.
2 Create a data file called Problems and a SAS view called Class and specify the
column names for both data sets.
3 Select the file that is referenced by the fileref Student and select the data in
character format that resides in the specified positions in the file. Assign column
names.
4 When data in the column Name, Major, or Credits is blank or missing, assign a
code of 01 to the observation where the missing value occurred. Also assign a
SAS datetime code to the error and place the information in a file called
Problems.
5 When the number of credits is greater than zero, but less than ninety, list the
observations as code 02 in the file called Problems and assign a SAS datetime
code to the observation.
6 Place all other observations, which have none of the specified errors, in the SAS
view called MyV9Lib.Class.
7 The FILENAME statement assigns the fileref Student to an external file. The
LIBNAME statement assigns the libref MyV9Lib to a SAS library.
8 The first PROC PRINT calls the SAS view MyV9Lib.Class. The SAS view
extracts data on the fly from the file referenced as Student.
9 This PROC PRINT prints the contents of the data file MyV9Lib.Problems.
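The same pattern can be sketched in a self-contained form that uses only the Work library. This is an analog under stated assumptions, not the original program: WORK.STUDENT stands in for the external Student file, its values are invented, and IF/ELSE logic replaces the SELECT group.

```sas
/* Hypothetical stand-in for the external Student file.         */
/* In list input, a single period is read as a missing value.   */
data work.student;
   input name $ major $ credits;
   datalines;
Ann Math 120
Bob . 95
Cy Art 30
;
run;

/* Problems is a data file; Class is a DATA step view */
data work.problems work.class / view=work.class;
   set work.student;
   if missing(name) or missing(major) or missing(credits) then do;
      code = 1;
      output work.problems;
   end;
   else if 0 < credits < 90 then do;
      code = 2;
      output work.problems;
   end;
   else output work.class;
run;

/* Reading the view executes the stored DATA step */
proc print data=work.class;
run;

/* Print the data file that holds the flagged observations */
proc print data=work.problems;
run;
```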
Comparing DATA Step and PROC SQL Views 729
n SAS/ACCESS views
For complete documentation on how to create and use PROC SQL views, see “SQL
Procedure” in SAS SQL Procedure User’s Guide.
For information about using PROC SQL views created in an earlier release, see
Chapter 35, “Cross-Release Compatibility and Migration,” on page 779.
to easily send SQL statements and pass data to a DBMS by using the pass-
through facility.
o You can also use the SQL language to subset your data before processing it.
This capability saves memory when you have a large SAS view, but need to
select only a small portion of the data contained in the view.
o PROC SQL views do not use DATA step programming.
o When a WHERE clause is applied to a PROC SQL view, the WHERE clause
might be evaluated by the PROC SQL view engine, or the WHERE clause
might be evaluated by the underlying library's engine.
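As a minimal sketch, the following step creates a PROC SQL view and then subsets it with a WHERE statement when the view is used. The view name WORK.TEENS is hypothetical; SASHELP.CLASS is a sample table that is shipped with SAS.

```sas
/* Define the view; the query is stored, not executed */
proc sql;
   create view work.teens as
      select name, age, height
         from sashelp.class
         where age >= 13;
quit;

/* The stored query runs here; the WHERE statement subsets the result */
proc print data=work.teens;
   where height > 60;
run;
```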
SAS/ACCESS Views
A SAS/ACCESS view is an interface view, also called a view descriptor, which
accesses DBMS data that is defined in a corresponding access descriptor.
Using SAS/ACCESS software, you can create an access descriptor and one or
more view descriptors in order to define and access some or all of the data
described by one DBMS table or DBMS view. You can also use view descriptors in
order to update DBMS data, with certain restrictions.
In addition, some SAS/ACCESS products provide a dynamic LIBNAME engine
interface. If available, it is recommended that you use the SAS/ACCESS LIBNAME
statement to assign a SAS libref to your DBMS data because it is more efficient and
easier to use than access descriptors and view descriptors. The SAS/ACCESS
dynamic LIBNAME engine enables you to treat DBMS data as if it were SAS data by
assigning a SAS libref to DBMS objects. Using a SAS/ACCESS dynamic LIBNAME
engine means that you can use both native DATA step views and native PROC SQL
views to access DBMS data instead of view descriptors.
See Chapter 33, “About SAS/ACCESS Software,” on page 757 or the
SAS/ACCESS documentation for your database for more information about
SAS/ACCESS features.
For information about using SAS/ACCESS view descriptors created in an earlier
release, see Chapter 35, “Cross-Release Compatibility and Migration,” on page
779.
Note: Starting in SAS 9, PROC SQL views are the preferred way to access
relational DBMS data. You can convert existing SAS/ACCESS view descriptors into
PROC SQL views by using the CV2VIEW procedure. This enables you to use the
LIBNAME statement to access your data. See the CV2VIEW Procedure in
SAS/ACCESS for Relational Databases: Reference.
30
Stored Compiled DATA Step
Programs
the compiled program. However, SAS does not store the global statements, and it
does not display a warning message in the SAS log.
program. The following figure shows the process for creating a stored compiled
DATA step program.
[Figure: DATA Step Source Code -> DATA Step Compiler -> Stored Compiled DATA Step Program]
When SAS executes the stored program, it resolves the intermediate code produced
by the compiler and generates the executable machine code for that operating
environment. The following figure shows the process for executing a stored
compiled DATA step program.
[Figure: Stored Compiled DATA Step Program -> DATA Step Code Generator -> Executable DATA Step Program]
To move, copy, rename, or delete stored programs, use the DATASETS procedure
or the utility windows in your windowing environment.
source-option
enables you to save or encrypt the source code.
For complete information about the DATA statement, see SAS DATA Step
Statements: Reference.
1 Write, test, and debug the DATA step program that you want to store.
If you are reading external raw data files or if you write raw data to an external
file, use a fileref rather than the actual filename in your INFILE and FILE
statements so that you can redirect your input and output when the stored
program executes.
2 When the program runs correctly, submit it using the PGM= option in the DATA
statement.
The PGM= option tells SAS to compile, but not execute, the program and to
store the compiled code in the SAS file named in the option. SAS sends a
message to the log when the program is stored.
Type='ERROR';
Number=0;
end;
run;
Example Code 30.1 Partial SAS Log Identifying the Stored DATA Step Program
.
.
.
NOTE: DATA STEP program saved on file Stored.Sample.
NOTE: A stored DATA STEP program cannot run under a different operating
system.
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.01 seconds
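The compile-and-store step can be sketched in a self-contained form as follows. The data set WORK.READINGS, the stored program name WORK.FLAGPGM, and the threshold of 100 are hypothetical and are not part of the example whose log is shown above.

```sas
/* Hypothetical input data */
data work.readings;
   input temp;
   datalines;
98
105
;
run;

/* PGM= compiles the step and stores it without executing it, */
/* so WORK.FLAGS is not created by this step.                 */
data work.flags / pgm=work.flagpgm;
   set work.readings;
   length type $8;
   if temp > 100 then type = 'ERROR';
   else type = 'OK';
run;
```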
DESCRIBE
is a SAS statement that retrieves source code from a stored compiled DATA step
program or a DATA step view.
Note: To DESCRIBE a password-protected DATA step program, you must
specify its password. If the program has more than one password, you must
specify the most restrictive password (with ALTER being the most restrictive and
READ the least restrictive). For more information, see “DESCRIBE Statement” in
SAS DATA Step Statements: Reference.
INPUT | OUTPUT
specifies whether you are redirecting input or output data sets. When you specify
INPUT, the REDIRECT statement associates the name of the input data set in
the source program with the name of another SAS data set. When you specify
OUTPUT, the REDIRECT statement associates the name of the output data set
with the name of another SAS data set.
old-name
specifies the name of the input or output data set in the source program.
new-name
specifies the name of the input or output data set that you want SAS to process
for the current execution.
EXECUTE
is a SAS statement that executes a stored compiled DATA step program.
For complete information about the DATA statement, see “DATA Statement” in SAS
DATA Step Statements: Reference.
1 Write a DATA step for each execution of the stored program. In this DATA step,
specify the name of the stored program in the PGM= option of the DATA
statement and include an optional password. You can do any of the following
tasks:
n Submit this DATA step as a separate program.
n Include it as part of a larger SAS program that can include other DATA and
procedure (PROC) steps.
n Point to different input and output SAS data sets each time you execute the
stored program by using the REDIRECT statement.
2 Submit the DATA steps. Be sure to end each one with a RUN statement or other
step boundary.
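The execution steps above can be sketched as follows. All data set names and the stored program name WORK.DOUBLER are hypothetical. One-level names are used in the stored source so that the old-name values in the REDIRECT statements match them exactly.

```sas
/* Hypothetical input data sets */
data work.in1; x = 1; output; x = 2; output; run;
data work.in2; x = 10; output; run;

/* Compile and store the program; nothing is executed yet */
data out1 / pgm=work.doubler;
   set in1;
   y = 2 * x;
run;

/* Execute as stored: reads IN1 and writes OUT1 */
data pgm=work.doubler;
run;

/* DESCRIBE writes the source to the log; EXECUTE then runs the program */
data pgm=work.doubler;
   describe;
   execute;
run;

/* Redirect: read WORK.IN2 instead of IN1, write WORK.OUT2 instead of OUT1 */
data pgm=work.doubler;
   redirect input  in1=work.in2;
   redirect output out1=work.out2;
run;
```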
Redirecting Output
You can redirect external files by using filerefs, and you can redirect input and
output SAS data sets to data sets that you specify by using the REDIRECT
statement. Note that the REDIRECT statement is available only for use with stored
compiled DATA step programs.
Note: To redirect input and output stored in external files, include a FILENAME
statement at execution time to associate the fileref in the source program with
different external files.
CAUTION! Use caution when you redirect input data sets. The number and
attributes of variables in the input SAS data sets that you read with the REDIRECT
statement should match those of the input data sets in the SET, MERGE, MODIFY, or
UPDATE statements of the source code. If they do not match, the following occurs:
n If the variable length attributes differ, the length of the variable in the source code
data set determines the length of the variable in the redirected data set.
n If extra variables are present in the redirected data sets, the stored program
stops processing, and an error message is sent to the SAS log.
n If the variable type attributes are different, the stored program stops processing,
and an error message is sent to the SAS log.
Example Code 30.2 Partial SAS Log Showing the Source Code Generated by the
DESCRIBE Statement
For more information about the DESCRIBE statement, see SAS DATA Step
Statements: Reference.
run;
Example Code 30.3 Partial SAS Log Identifying the Redirected Output File
that is used to execute the stored program can include a FILENAME statement to
associate the fileref Daily with a different external file.
The following statements compile and store the program:
The following statements execute the stored compiled program, redirect the output,
and print the results:
data pgm=stored.flaws;
redirect output flaws=testlib.daily;
run;
Note that you can use the TITLE statement when you execute a stored compiled
DATA step program or when you print the results.
31
DICTIONARY Tables
n use any SAS procedure or the DATA step, referring to the PROC SQL view of
the table in the Sashelp library
For more information about DICTIONARY tables, including a list of available
DICTIONARY tables and their associated Sashelp views, see SAS SQL Procedure
User’s Guide.
742 Chapter 31 / DICTIONARY Tables
2 Select the Sashelp library. A list of members in the Sashelp library appears.
3 Select a SAS view with a name that starts with V (for example, VMEMBER).
A VIEWTABLE window appears that contains its contents. (For z/OS, type the
letter 'O' in the command field for the desired member and press Enter. The
FSVIEW window appears with the contents of the view.)
In the VIEWTABLE window the column headings are labels. To see the column
names, select View > Column Names.
The result of the DESCRIBE TABLE statement appears in the SAS log:
NOTE: SQL table DICTIONARY.INDEXES was created like:
n The first word on each line is the column (or variable) name. You need to use
this name when you write a SAS statement that refers to the column (or
variable).
n Following the column name is the specification for the type of variable and the
width of the column.
n The name that follows label= is the column (or variable) label.
After you know how a table is defined, you can use the processing ability of the
PROC SQL WHERE clause in a PROC SQL step to extract a portion of a SAS view.
Note that many character values in the DICTIONARY tables are stored as all-
uppercase characters; you should design your queries accordingly.
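As a minimal sketch, the following step first displays the definition of a DICTIONARY table and then subsets it with a WHERE clause. SASHELP.CLASS is a sample table that is shipped with SAS; the library and member names are written in uppercase to match how they are stored.

```sas
proc sql;
   /* Write the column definitions of DICTIONARY.COLUMNS to the log */
   describe table dictionary.columns;

   /* Extract only the rows that describe SASHELP.CLASS */
   select name, type, length
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;
```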
CAUTION! Do not confuse the GENNUM variable value in CONTENTS OUT= data
set with the GEN variable value from DICTIONARY tables. GENNUM from a
CONTENTS procedure or statement refers to a specific generation of a data set. GEN
from DICTIONARY tables refers to the total number of generations for a data set.
proc sql;
create table mytable as
select * from sashelp.vcolumn
where libname='WORK' and memname='SALES';
quit;
How to View DICTIONARY Tables 745
Note: SAS does not maintain DICTIONARY table information between queries.
Each query of a DICTIONARY table launches a new discovery process.
If you are querying the same DICTIONARY table several times in a row, you can get
even faster performance by creating a temporary SAS data set and running your
query against that data set. You can create the temporary data set by using the
DATA step SET statement or PROC SQL CREATE TABLE AS statement.
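Both approaches can be sketched as follows. The output data set names are hypothetical; Sashelp.Vtable is the Sashelp view of DICTIONARY.TABLES.

```sas
/* PROC SQL: snapshot of DICTIONARY.COLUMNS for one library */
proc sql;
   create table work.sashelp_cols as
      select libname, memname, name, type
         from dictionary.columns
         where libname = 'SASHELP';
quit;

/* DATA step: snapshot through the Sashelp view of DICTIONARY.TABLES */
data work.sashelp_tabs;
   set sashelp.vtable;
   where libname = 'SASHELP';
run;

/* Later queries read the snapshot instead of the DICTIONARY table */
proc sql;
   select count(*) as n_columns
      from work.sashelp_cols
      where memname = 'CLASS';
quit;
```

A snapshot does not reflect files that are created or deleted after it was taken, so it suits repeated queries within a short span.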
32
SAS Catalogs
You commonly specify the two-level name for an entire catalog, as follows:
libref.catalog
libref
is the logical name of the SAS library to which the catalog belongs.
catalog
is a valid SAS name for the file.
The entry name and entry type are required by some SAS procedures. If the entry
type has been specified elsewhere or if it can be determined from context, you can
use the entry name alone. To specify entry names and entry types, use this form:
entry-name.entry-type
entry-name
is a valid SAS name for the catalog entry.
entry-type
is assigned by SAS when the entry is created.
CATALOG procedure
is similar to the DATASETS procedure. Use the CATALOG procedure to copy,
delete, list, and rename entries in catalogs.
CEXIST function
enables you to verify the existence of a SAS catalog or catalog entry. See the
CEXIST function in SAS Functions and CALL Routines: Reference for more
information.
CATALOG window
is a window that you can access at any time in an interactive windowing
environment. It displays the name, type, description, and date of last update for
each entry in the specified catalog. CATALOG window commands enable you to
edit catalog entries. You can also view and edit catalog entries after double-
clicking on a catalog file in SAS Explorer.
catalog directory windows
are available in some procedures in SAS/AF, SAS/FSP, and SAS/GRAPH
software. A catalog directory window lists the same type of information that the
CATALOG window provides: entry name, type, description, and date of last
update. See the description of each interactive windowing procedure for details
about the catalog directory window for that procedure.
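A minimal sketch of two of these tools follows. It assumes that a Sasuser.Profile catalog exists in the current session; if it does not, the PROC CATALOG step reports an error.

```sas
/* CEXIST returns 1 if the catalog exists and 0 otherwise */
data _null_;
   if cexist('sasuser.profile') then put 'Sasuser.Profile exists.';
   else put 'Sasuser.Profile was not found.';
run;

/* List the entries in the catalog */
proc catalog catalog=sasuser.profile;
   contents;
run;
quit;
```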
Profile Catalog
during system initialization in your first SAS session. If you use one of the other
modes of execution, the Profile catalog is created the first time you execute a SAS
procedure that requires it.
At SAS start-up, SAS checks for an existing uncorrupted Sasuser.Profile catalog. If
this catalog is found, then SAS copies the Sasuser.Profile catalog to
Sasuser.Profbak. The backup is used if the Sasuser.Profile catalog becomes
corrupted. For more information, see “How to Recover Locked or Corrupt Profile
Catalogs” on page 750.
Operating Environment Information: The Sasuser library is implemented
differently in various operating environments. See the SAS documentation for your
host system for more information about how the Sasuser library is created.
Default Settings
The default settings for your SAS session are stored in several catalogs in the
Sashelp installation library. If you do not make any changes to key settings or other
options, SAS uses the default settings. If you make changes, the new information is
stored in your Sasuser.Profile catalog. To restore the original default settings, use
the CATALOG procedure or the CATALOG window to delete the appropriate entries
from your Profile catalog. By default, SAS then uses the corresponding entry from
the Sashelp library.
During SAS sessions, you can make customizations, such as window resizing and
positioning, and save them to Sasuser.Profile.
Catalog Concatenation
Definitions
You can logically combine two or more SAS catalogs by concatenating them. This
enables you to access the contents of several catalogs, using one catalog name.
There are two ways to concatenate catalogs: by using the LIBNAME statement or by
using the CATNAME statement.
LIBNAME catalog concatenation
results from a concatenation of libraries through a LIBNAME statement. When
two or more libraries are logically combined through concatenation, any catalogs
with the same name in each library become logically combined as well.
CATNAME catalog concatenation
is a concatenation that is specified by the global CATNAME statement in which
the catalogs to be concatenated are specifically named. During CATNAME
catalog concatenation, a logical catalog is set up in memory.
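LIBNAME catalog concatenation can be sketched as follows. The paths 'SAS-library-1' and 'SAS-library-2' are placeholders that must be replaced with real locations, and the libref Both is chosen only for illustration.

```sas
/* Placeholders: replace with the physical locations of two SAS libraries */
libname both ('SAS-library-1' 'SAS-library-2');

/* Catalogs with the same name in each library are combined logically. */
/* This step lists the entries of the combined catalog Both.MyCat,     */
/* assuming that a catalog named MyCat exists in the libraries.        */
proc catalog catalog=both.mycat;
   contents;
run;
quit;
```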
MyCat.CATALOG      MyCat.CATALOG
Table1.DATA        MyCat2.CATALOG
Table3.DATA        Table1.DATA
                   Table1.INDEX
                   Table2.DATA
                   Table2.INDEX
Notice that Table1.INDEX does not appear in the concatenation but Table2.INDEX
does appear. SAS suppresses listing the index when its associated data file is not
part of the concatenation.
So what happened to the catalogs when the libraries were concatenated? A
resulting catalog now exists logically in memory, with the full name
Both.MyCat.CATALOG. This catalog combines each of the two physical catalogs
residing in SAS-library-1 and SAS-library-2, called MyCat.CATALOG.
To understand the contents of the concatenation Both.MyCat, first look at the
contents of both parts of the concatenation. Assume that the two original
MyCat.CATALOG files contain the following:
Contents of MyCat.CATALOG in Library 1    Contents of MyCat.CATALOG in Library 2
A.FRAME                                   A.GRSEG
C.FRAME                                   B.FRAME
                                          C.FRAME
Both.MyCat
Both.MyCat
In the following example, there must be a libref that is defined and named CatDog.
The libref CatDog establishes the scope for the CATNAME concatenation definition.
Note: If a file in CatDog named Combined.CATALOG already exists, it becomes
inaccessible until the CATNAME concatenation CatDog.Combined is cleared.
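A CATNAME concatenation of this kind can be sketched as follows. The librefs Lib1 and Lib2 and all three paths are placeholders and are assumptions for illustration; ACCESS=READONLY is optional.

```sas
/* Placeholders: replace with real library locations */
libname catdog 'SAS-library';
libname lib1   'SAS-library-1';
libname lib2   'SAS-library-2';

/* Define the logical catalog CatDog.Combined */
catname catdog.combined
   (lib1.mycat (access=readonly)
    lib2.mydog (access=readonly));

/* List the entries that are visible through the concatenation */
proc catalog catalog=catdog.combined;
   contents;
run;
quit;

/* Clear the concatenation when it is no longer needed */
catname catdog.combined clear;
```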
MyCat.CATALOG      MyDog.CATALOG
Table1.DATA        MyCat2.CATALOG
Table3.DATA        Table1.DATA
                   Table1.INDEX
                   Table2.DATA
                   Table2.INDEX
MyCat.CATALOG (Library 1)    MyDog.CATALOG (Library 2)
A.FRAME                      A.GRSEG
C.FRAME                      B.FRAME
                             C.FRAME
Combined.CATALOG contents
n Any time a list of catalog entries is displayed, only one occurrence of the catalog
entry is shown.
Note: Even if a catalog entry occurs multiple times in the concatenation, only
the first occurrence is shown.
33
About SAS/ACCESS Software
proc sql;
select *
from mydb2lib.employees(drop=salary)
where dept='Accounting';
quit;
The LIBNAME statement connects to DB2. You can reference a DBMS object, in
this case, a DB2 table, by specifying a two-level name that consists of the libref and
the DBMS object name. The DROP= data set option causes the SALARY column of
the EMPLOYEES table on DB2 to be excluded from the data that is returned by the
query.
See your SAS/ACCESS documentation for a full listing of the SAS/ACCESS data
set options and the Base SAS data set options that can be used on data sets that
refer to DBMS data.
SQL Procedure Pass-Through Facility 759
proc sql;
create view viewlib.emp_view as
select *
from mydblib.employees
using libname mydblib oracle user=smith password=secret
path='myoraclepath';
quit;
When PROC SQL executes the SAS view, the SELECT statement assigns the libref
and establishes the connection to the DBMS. The scope of the libref is local to the
SAS view and does not conflict with identically named librefs that might exist in the
SAS session. When the query finishes, the connection is terminated and the libref is
deassigned.
Note: You can also embed a Base SAS LIBNAME statement in a PROC SQL view.
proc sql;
   connect to oracle as myconn (user=smith password=secret
      path='myoracleserver');
   select *
      from connection to myconn
         (select empid, lastname, firstname, salary
            from employees
            where salary>75000);
   disconnect from myconn;
quit;
The following example creates an access descriptor and a view descriptor in the
same PROC step to retrieve data from a DB2 table:
libname adlib 'SAS-library';
libname vlib 'SAS-library';
create vlib.custord.view;
select ordernum stocknum shipto;
format ordernum 5.
stocknum 4.;
run;
When you want to use access descriptors and view descriptors, both types of
descriptors must be created before you can retrieve your DBMS data. The first step,
creating the access descriptor, enables SAS to store information about the specific
DBMS table that you want to query.
After you have created the access descriptor, the second step is to create one or
more view descriptors to retrieve some or all of the DBMS data described by the
access descriptor. In the view descriptor, you select variables and apply formats to
manipulate the data for viewing, printing, or storing in SAS. You use only the view
descriptors, and not the access descriptors, in your SAS programs.
The interface view engine enables you to reference your SAS view with a two-level
SAS name in a DATA or PROC step, such as the PROC PRINT step in the
example.
See Chapter 29, “SAS Views,” on page 721 for more information about SAS views.
See the SAS/ACCESS documentation for your DBMS for more detailed information
about creating and using access descriptors and SAS/ACCESS views.
DBLOAD Procedure
The DBLOAD procedure enables you to create and load data into a DBMS table
from a SAS data set, data file, SAS view, or another DBMS table, or to append rows
to an existing table. It also enables you to submit non-query DBMS-specific SQL
statements to the DBMS from your SAS session.
Note: If a dynamic LIBNAME engine is available for your DBMS, it is recommended
that you use the SAS/ACCESS LIBNAME statement to create your DBMS data
instead of the DBLOAD procedure. However, DBLOAD continues to work in SAS
software if it was available for your DBMS in SAS 6. Some new SAS features, such
as long variable names, are not supported when you use the DBLOAD procedure.
The following example appends data from a previously created SAS data set named
INVDATA into a table in an ORACLE database named INVOICE:
762 Chapter 33 / About SAS/ACCESS Software
See the SAS/ACCESS documentation for your DBMS for more detailed information
about the DBLOAD procedure.
In SAS/ACCESS products that provide a DATA step interface, the INFILE statement
has special DBMS-specific options that enable you to specify DBMS variable values
and to format calls to the DBMS appropriately. See the SAS/ACCESS
documentation for your DBMS for a full listing of the DBMS-specific INFILE
statement options and the Base SAS INFILE statement options that can be used
with your DBMS.
34
Processing Data Using Cross-
Environment Data Access (CEDA)
data representation
is the form in which data is stored in a particular operating environment. Different
operating environments use different standards or conventions for storing data in
memory. (See Table 34.2 on page 770.)
n Floating-point numbers can be represented in IEEE floating-point format or
IBM floating-point format.
n Data alignment can be on a 1-byte, 4-byte, or 8-byte boundary, depending on
data type requirements for the operating environment.
n Data type lengths can be 8 bits or more for a character data type, 16 bit, 32
bit, or 64 bit for an integer data type, 32 bit for a single-precision floating-point
data type, and 64 bit for a double-precision floating-point data type.
n The ordering of bytes in memory can be big Endian or little Endian.
encoding
is a set of characters (letters, logograms, digits, punctuation, symbols, control
characters, and so on) that have been mapped to numeric values (called code
points) that can be used by computers. The code points are assigned to the
characters in the character set by applying an encoding method. Some
examples of encodings are Wlatin1 and Danish EBCDIC. (See “Encoding
Combinations That Do Not Need CEDA Processing for Transcoding” in SAS
National Language Support (NLS): Reference Guide.)
incompatible
describes a file that has a different data representation or encoding than the
current SAS session. CEDA enables access to many types of incompatible files.
Advantages of CEDA
CEDA offers these advantages:
n You can transparently process a supported SAS file with no knowledge of the
file's data representation or character encoding.
n No transport files are created. CEDA requires a single translation to the current
session's data representation, rather than multiple translations from the source
representation to transport file to target representation.
n CEDA eliminates the need to perform multiple steps in order to process the file.
n CEDA does not require a sign-on as is needed in SAS/CONNECT or a dedicated
server as is needed in SAS/SHARE.
SAS File Processing with CEDA 767
* For output processing that replaces an existing SAS data file, there are behavioral differences. For
more information, see “Behavioral Differences for Output Processing” on page 767.
** CEDA supports SAS 8 and later MDDB files.
n The TAPE engine uses the current SAS session encoding, except with PROC
COPY.
n For both the BASE and TAPE engines, by default PROC COPY uses the
encoding of the file from the source library. If, instead, you want to use the
encoding of the current SAS session, specify the NOCLONE option. If you
want to use a different encoding, specify the NOCLONE option and the
ENCODING= option. When you use PROC COPY with SAS/SHARE or
SAS/CONNECT, the default behavior is to use the encoding of the current
SAS session.
n The SPD Engine uses the current SAS session encoding. The CLONE option
of PROC COPY is not supported.
data representation
n The BASE and TAPE engines use the data representation of the current SAS
session, except with PROC COPY.
n For both the BASE and TAPE engines, by default PROC COPY uses the
data representation of the file from the source library. If, instead, you want to
use the data representation of the current SAS session, specify the
NOCLONE option. If you want to use a different data representation, specify
the NOCLONE option and the OUTREP= option. When you use PROC
COPY with SAS/SHARE or SAS/CONNECT, the default behavior is to use
the data representation of the current SAS session.
n The SPD Engine uses the data representation of the current SAS session.
The CLONE option of PROC COPY is not supported.
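The PROC COPY behavior described above for the BASE and TAPE engines can be sketched as follows. This is a minimal illustration only: the librefs SRC, TGT, and TGT2, the library paths, and the member name OXYGEN are hypothetical placeholders, and LINUX_X86_64 is just a sample OUTREP= value.

```sas
/* Hypothetical librefs and placeholder paths; replace with your own. */
libname src 'source-library';
libname tgt 'target-library';

/* Default: the copy keeps the encoding and data representation
   of the file in the source library. */
proc copy in=src out=tgt;
   select oxygen;
run;

/* NOCLONE: the copy takes the encoding and data representation
   of the current SAS session instead. */
proc copy in=src out=tgt noclone;
   select oxygen;
run;

/* NOCLONE plus OUTREP= on the output library: the copy takes the
   requested data representation. */
libname tgt2 'another-target-library' outrep=linux_x86_64;

proc copy in=src out=tgt2 noclone;
   select oxygen;
run;
```

Running PROC CONTENTS on each copy is one way to confirm which data representation and encoding were actually assigned.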
n Indexes are not supported. Therefore, WHERE optimization with an index is not
supported.
n Extended attributes cannot be updated, but they can be read.
n Other files that are not supported include DATA step views, SAS/ACCESS views
that are not for SAS/ACCESS for Oracle or SAP, stored compiled DATA step
programs, item stores, DMDB files, FDB files, or any SAS file that was created
prior to SAS 7.
n On z/OS, members of UNIX file system libraries can be created using any SAS
data representation. However, when bound libraries are created, they are
assigned the data representation of the SAS session that creates the library.
SAS does not allow the creation of bound library members with a data
representation that differs (except for the character encoding) from the data
representation of the library. For example, if you create a bound library with 31-
bit SAS on z/OS, the library has a data representation of MVS_32 for the
duration of its existence, and you cannot use the OUTREP= option of the
LIBNAME statement to create a member in the library with a data representation
other than MVS_32. For more information about library implementation types for
BASE and sequential engines on z/OS, see SAS Companion for z/OS.
n Because the BASE engine translates the data as the data is read, multiple
procedures require SAS to read and translate the data multiple times. In this
way, the translation could affect system performance.
n If a data set is damaged, CEDA cannot process the file in order to repair it.
CEDA does not support update processing, which is required in order to repair a
damaged data set. To repair the file, you must move it back to the environment
where it was created or a compatible environment that does not invoke CEDA
processing. For information about how to repair a damaged data set, see the
REPAIR statement in the DATASETS procedure in Base SAS Procedures Guide.
n Transcoding could result in character data loss when encodings are
incompatible. For information about encoding and transcoding, see the SAS
National Language Support (NLS): Reference Guide.
n Loss of precision can occur in numeric variables when you move data between
operating environments. If a numeric variable is defined with a short length, you
can try increasing the length of the variable. Full-size numeric variables are less
likely to encounter a loss of precision with CEDA. For more information, see
“Numerical Accuracy in SAS Software” on page 72.
n Numeric variables have a minimum length of either 2 or 3 bytes, depending on
the operating environment. In an operating environment that supports a
minimum of 3 bytes (such as Windows or UNIX), CEDA cannot process a
numeric variable that was created with a length of 2 bytes (for example, in z/OS).
If you encounter this restriction, then use the XPORT engine or the CPORT and
CIMPORT procedures instead of CEDA.
Note: If you encounter these restrictions because your files were created under a
previous version of SAS, consider using the MIGRATE procedure, which is
documented in the Base SAS Procedures Guide. PROC MIGRATE retains many
features, such as integrity constraints, indexes, and audit trails.
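When one of the restrictions above rules out CEDA, the CPORT and CIMPORT procedures offer an alternative path. The following is a minimal sketch, assuming hypothetical librefs (SRC and TGT), a hypothetical data set named MYDATA, and placeholder paths. The transport file must be moved between the two environments by a method of your choice.

```sas
/* Step 1: in the source environment, write the data set to a
   transport file. Librefs, paths, and names are hypothetical. */
libname src 'source-library';
filename tranfile 'transport-file';

proc cport data=src.mydata file=tranfile;
run;

/* Step 2: in the target environment, after the transport file has
   been transferred, restore the data set in native format. */
libname tgt 'target-library';
filename tranfile 'transport-file';

proc cimport infile=tranfile data=tgt.mydata;
run;
```

Because the restored file is written in the target environment's native format, later steps that read it do not invoke CEDA.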
n when the encoding of character values for the SAS file is incompatible with the
currently executing SAS session encoding.
n when the data representation of the SAS file is incompatible with the data
representation of the currently executing SAS session. For example, an
incompatibility can occur if you move a file from an operating environment like
Windows to an operating environment like UNIX, or if you have upgraded to 64-
bit UNIX from 32-bit UNIX.
In the following table, each row contains a group of operating environments that
are compatible with each other. CEDA is used only when you create a file with a
data representation in one row and process the file under a data representation
of another row. (The current release of SAS does not run on some of these
environments, but they are included here for completeness.)
Data Representation
Value Environment
* Although all of the environments in this group are compatible, catalogs are an exception. Catalogs
are compatible between Tru64 UNIX and Linux for Itanium. Catalogs are compatible between Linux
for x64, Solaris for x64, and Linux on the Power Architecture. Linux on the Power Architecture is
added in SAS Viya 3.5 and is not supported in SAS 9.
** Although these OpenVMS environments have different data representations for some compiler types,
SAS data sets that are created by the BASE engine do not store the data types that are different.
Therefore, if the encoding is compatible, CEDA is not used between these environments. However,
note that SAS 9 does not support SAS 8 catalogs from OpenVMS. You can migrate the catalogs with
the MIGRATE procedure. For more information, see the Base SAS Procedures Guide.
*** Although these Windows environments are compatible, catalogs are an exception. Catalogs are not
compatible between 32-bit and 64-bit SAS for Windows.
ERROR: File HEALTH.OXYGEN cannot be updated because its encoding does not
match the session encoding or the file is in a format native to another host,
such as HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64.
To determine the data representation and the encoding of a file, you can use the
CONTENTS procedure (or the CONTENTS statement in PROC DATASETS). For
example, the data set HEALTH.OXYGEN was created in a UNIX environment in
SAS 9. The file was moved to a SAS 9 Windows environment, in which the following
CONTENTS output was requested:
Using CEDA, SAS automatically recognizes the file's UNIX data representation and
translates it to the data representation for the Windows environment. The log output
displays a message that the file is being processed using CEDA.
libname Health 'c:\MyFiles';
Example Code 34.2 Log Output from Processing a File from a Different Operating
Environment
For more information, see the OUTREP= option for the LIBNAME statement in SAS
DATA Step Statements: Reference or see the OUTREP= data set option in SAS
Data Set Options: Reference.
The user wants to share data with another office. The other office is running SAS on
64-bit Linux. Their locale is EN_US. Normally, that combination would result in a
default session encoding of latin1. However, the user has been told that the other
office has set their session encoding to UTF-8.
The user creates a data set. Because the data set will later be processed in the
other office’s Linux environment, the user assigns the appropriate data
representation. The user wants to avoid CEDA processing in the target
environment, so the encoding and data representation must match the target. The
user assumes that because their own environment and the Linux environment are
both set to UTF-8 session encoding, an encoding specification is not required. The
user sets the OUTREP= value only. This is not the correct syntax for this situation.
libname mylib 'C:\test';
The user checks their assumptions against the attributes of the data set:
proc contents data=mylib.testdata;
run;
The user is surprised. The data set does not have the session encoding of UTF-8.
Instead, the encoding is latin1, which is the default for OUTREP=LINUX_X86_64
together with the LOCALE value of the current session, EN_US.
The correct way to prevent this issue is as follows. If you want a nondefault
encoding value, then when you specify the OUTREP= option, you must also specify
the ENCODING= option:
data mylib.test2 (outrep=linux_x86_64 encoding=utf8);
x=1;
run;
proc contents data=mylib.test2;
run;
Now the encoding and data representation both match the target environment:
35
Cross-Release Compatibility and
Migration
your guide to migrating files from previous versions of SAS. Refer to this focus area
for planning and cost analysis information, known compatibility issues and their
resolutions, and step-by-step instructions. The MIGRATE procedure, which provides
a simple way to migrate a library of SAS files from previous releases of SAS, is
documented in Base SAS Procedures Guide.
Metadata-bound libraries
Metadata-bound data sets cannot be accessed by any release prior to SAS
9.3M2. See “Metadata-Bound Libraries” on page 801.
Data sets created with EXTENDOBSCOUNTER=YES
In SAS 9.4, extended observation count is the default. In SAS 9.3 it is optional. If
a data set has EXTENDOBSCOUNTER=YES, then it is not accessible by any
release prior to SAS 9.3. The behavior depends on your operating environment.
See “Backward Compatibility of the Extended Observation Count Attribute” on
page 658.
Encoding attribute
The encoding attribute is not supported in SAS 6. SAS 7 and 8 data sets must
be updated or output in a SAS 9 session to be stamped with an encoding
attribute. (The encoding attribute was supported prior to SAS 9 in China, Korea,
and Japan.)
TIP If you replace or update a data set that does not have an encoding
attribute, then be aware that the session encoding is stamped on the data set
by default. If that behavior is not desired, you can override the data set's
encoding by using the DATA step option ENCODING= or the LIBNAME
options INENCODING= or OUTENCODING=. If a session encoding is
stamped on a data set incorrectly, and you are certain of the correct encoding,
then you can set it with the CORRECTENCODING= option in the MODIFY
statement of the DATASETS procedure. For the correct use of encoding
language elements, see SAS National Language Support (NLS): Reference
Guide.
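The CORRECTENCODING= usage mentioned in the TIP can be sketched as follows. This is an illustration only: the libref MYLIB, the library path, and the member name MYDATA are hypothetical, and WLATIN1 is a sample encoding value. CORRECTENCODING= changes only the encoding that is recorded in the descriptor; it does not transcode the data, so use it only when you are certain of the actual encoding.

```sas
/* Hypothetical libref, placeholder path, and member name. */
libname mylib 'SAS-library';

/* Restamp the data set with the encoding that its data really uses. */
proc datasets library=mylib nolist;
   modify mydata / correctencoding=wlatin1;
quit;

/* Check the encoding that is now recorded for the data set. */
proc contents data=mylib.mydata;
run;
```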
SAS Library Contents                                        Default Engine Assignment
Both SAS 9 SAS files and SAS files from earlier releases    V9
36
File Protection
Definition of a Password
SAS software enables you to restrict access to members of SAS libraries by
assigning passwords to the members. You can assign passwords to all member
types except catalogs. You can specify three levels of protection: Read, Write, and
Alter. When a password is assigned, it appears as uppercase Xs in the log.
Note: This document uses the terms SAS data file and SAS view to distinguish
between the two types of SAS data sets. Passwords work differently for type VIEW
than they do for type DATA. The term “SAS data set” is used when the distinction is
not necessary.
read
protects against reading the file.
write
protects against changing the data in the file. For SAS data files, write protection
prevents adding, modifying, or deleting observations.
alter
protects against deleting or replacing the entire file. For SAS data files, alter
protection also prevents modifying variable attributes and creating or deleting
indexes.
Alter protection does not require a password for Read or Write access; write
protection does not require a password for Read access. For example, you can read
an alter-protected or write-protected SAS data file without knowing the Alter or Write
password. Conversely, read and write protection do not prevent any operation that
requires alter protection. For example, you can delete a SAS data set that is read-
or write-protected only without knowing the Read or Write password.
To protect a file from being read, written to, deleted, or replaced by anyone who
does not have the proper authority, assign read, write, and alter protection. To allow
others to read the file without knowing the password, but not change its data or
delete it, assign just write and alter protection. To completely protect a file with one
password, use the PW= data set option. For more information, see “Assigning
Complete Protection with the PW= Data Set Option” on page 790.
Note: Because of how SAS opens files, you must specify the Read password to
update a SAS data set that is only read-protected.
Note: The levels of protection differ somewhat for the member type VIEW. See
“Using Passwords with Views” on page 792.
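As an illustration of the second scenario above (others can read the file, but cannot change or delete it), the following sketch assigns only Write and Alter passwords. The libref MYLIB, the library path, the data set names, and the password values are hypothetical.

```sas
/* Hypothetical libref and placeholder path. */
libname mylib 'SAS-library';

/* Assign Write and Alter protection, but no Read protection. */
data mylib.scores(
   write=yellow
   alter=red
   );
   input id score;
   datalines;
1 90
2 85
;

/* Reading needs no password because no Read password was assigned. */
proc print data=mylib.scores;
run;

/* Adding observations changes the data, so the Write password is required. */
data work.newscores;
   input id score;
   datalines;
3 77
;

proc append base=mylib.scores(
   write=yellow
   )
   data=work.newscores;
run;
```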
Assigning Passwords
Syntax
To set a password, first specify a SAS data set in one of the following:
n a DATA statement
n the ToolBox
Then assign one or more password types to the data set. The data set might
already exist, or the data set might be one that you create. The following is an
example of syntax:
(password-type=password <... password-type=password>)
where password is a valid SAS name of up to eight characters and password-type can be
one of the following SAS data set options:
n ALTER=
n PW=
n READ=
n WRITE=
TIP Code each password option on a separate line to ensure that the password is
properly blotted in the SAS log.
CAUTION! Keep a record of any passwords that you assign! If you forget or do not
know the password, you cannot get the password from SAS.
This example prevents reading or deleting a stored program without a password and
also prevents changing the source program.
/* assign a read and an alter password to the SAS view ROSTER */
data mylib.roster(read=green alter=red) / view=mylib.roster;
set mylib.students;
run;
Note: When you replace a SAS data set that is alter-protected, the new data set
inherits the Alter password. To change the Alter password for the new data set, use
the MODIFY statement in the DATASETS procedure.
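A minimal sketch of changing an Alter password with the MODIFY statement follows. The libref MYLIB, the member name MYDATA, and the password values RED (old) and BLUE (new) are hypothetical; the old/new form is the password-modification syntax of the MODIFY statement.

```sas
/* Change the Alter password from RED to BLUE.
   Library, member, and password values are hypothetical. */
proc datasets library=mylib nolist;
   modify mydata(
      alter=red/blue
      );
quit;
```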
Passwords are hierarchical in terms of gaining access. For example, specifying the
ALTER password gives you Read and Write access. The following example creates
the data set States, with three different passwords, and then reads the data set to
produce a plot:
data mylib.states(read=green write=yellow alter=red);
input density crime name $;
datalines;
151.4 6451.3 Colorado
… more data lines …
;
password. If you use the PW= data set option, those who have access need to
remember only one password for total access.
n To access a member whose password is assigned using the PW= data set
option, use the PW= data set option. You can also use the data set option that
equates to the specific level of access that you need:
/* create a data set using PW=, then use READ= to print the data set */
data mylib.states(pw=orange);
input density crime name $;
datalines;
151.4 6451.3 Colorado
… more data lines …
;

proc print data=mylib.states(read=orange);
run;
Encoded Passwords
Encoding a password enables you to write SAS programs without having to specify
a password in plain text. The PWENCODE procedure uses encoding to disguise
passwords. With encoding, one character set is translated to another character set
through some form of table lookup. An encoded password is intended to prevent
casual, non-malicious viewing of passwords. You should not depend on encoded
passwords for all your data security needs; a determined and knowledgeable
attacker can decode the encoded passwords.
When an encoded password is used, the syntax parser decodes the password and
accesses the file. The encoded password is never written in plain text to the SAS
log. SAS does not accept passwords longer than eight characters. If an encoded
password is decoded and is longer than eight characters, SAS reads it as an
incorrect password and sends an error message to the SAS log. For more
information, see “PWENCODE Procedure” in Base SAS Procedures Guide.
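As a small illustration, the following step encodes a password with the PWENCODE procedure. The value GREEN is only a sample password, matching the other examples in this chapter. The encoded string is written to the SAS log, from which it can be copied into a program.

```sas
/* Encode the sample password GREEN. The encoded form appears in the SAS log. */
proc pwencode in='green';
run;
```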
Levels of Protection
The levels of protection for SAS views and stored programs are similar to the levels
of protection for other types of SAS files. However, with SAS views, passwords
affect not only the underlying data, but also the view’s definition (or source
statements).
You can specify three levels of protection for SAS views: Read, Write, and Alter. The
following section describes how these data set options affect the underlying data as
well as the view’s descriptor information. Unless otherwise noted, the term “view”
refers to any type of SAS view and the term “underlying data” refers to the data that
is accessed by the SAS view:
Read
n protects against reading of the SAS view's underlying data
n prevents the display of source statements in the SAS log when using
DESCRIBE
n allows replacement of the SAS view
Write
n protects the underlying data associated with a SAS view by insisting that a
Write password is given
n prevents the display of source statements in the SAS log when using
DESCRIBE
n allows replacement of the SAS view
Alter
n prevents the display of source statements in the SAS log when using
DESCRIBE
n protects against replacement of the SAS view
For example, to DESCRIBE a view that has both Read and Write protection, you
must specify its Write password. Similarly, to DESCRIBE a view that has both Read
and Alter protection, you must specify its Alter password (since Alter is the more
restrictive of the two).
The following program shows how to use the DESCRIBE statement to view the
descriptor information for a Read-protected and Alter-protected view:
/*create a view with read and alter protection*/
data exam / view=exam(read=read alter=alter);
set grades;
run;
/*describe the view by specifying the most restrictive password */
data view=exam(alter=alter);
describe;
run;
For more information, see “DESCRIBE Statement” in SAS DATA Step Statements:
Reference and “DATA Statement” in SAS DATA Step Statements: Reference.
In most DATA and PROC steps, the way you use password-protected views is
consistent with how you use other types of password-protected SAS files. For
example, the following PROC PRINT prints a Read-protected view:
proc print data=mylib.grade(read=green);
run;
Note: You might experience unexpected results when you place protection on a
SAS view if some type of protection is already placed on the underlying data set.
Note: You can create a PROC SQL view from password-protected SAS data sets
without specifying their passwords. When you use the view, you are prompted for the
passwords of the SAS data sets named in the FROM clause. If you are running SAS
in batch or noninteractive mode, you receive an error message.
SAS/ACCESS Views
SAS/ACCESS software enables you to edit view descriptors and, in some
interfaces, the underlying data. To prevent someone from editing or reading
(browsing) the view descriptor, assign Alter protection to the view. To prevent
someone from updating the underlying data, assign Write protection to the view. For
more information, see the SAS/ACCESS documentation for your DBMS.
Note that you can use the SAS view without a password, but access to the
underlying data requires a password. This is one way to protect a particular column
of data. In the above example, proc print data=mylib.emp; executes, but proc
print data=mylib.employee; fails without the password.
Feature               SAS Proprietary   AES              AES2
License required      No                No               No
SAS version support   8 and later       9.4 and later    9.4M5 and later
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
cannot change any password on an encrypted data set without re-creating the data
set.
The following rules apply to data file encryption:
n To copy an encrypted SAS data file, the output engine must support encryption.
Otherwise, the data file is not copied.
n Encrypted files work only in Release 6.11 or in later releases of SAS.
n You cannot encrypt SAS data views, because they contain no data.
n If the data file is encrypted, all associated indexes are also encrypted.
n Encryption requires approximately the same amount of CPU resources as
compression.
n You cannot use PROC CPORT on encrypted SAS data files.
The following example creates a SAS data set with SAS Proprietary Encryption:
data salary(encrypt=yes read=green);
input name $ yrsal bonuspct;
datalines;
Muriel 34567 3.2
Bjorn 74644 2.5
Freda 38755 4.1
Benny 29855 3.5
Agnetha 70998 4.1
;
TIP Code each password option on a separate line to ensure that the password is
properly blotted in the SAS log.
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
AES Encryption
In the SAS 9.4 release, AES encryption of data sets is available. You specify
ENCRYPT=AES when creating a data set. AES produces strong encryption by
using a key value that can be up to 64 characters long. Beginning in the SAS 9.4M5
release, a stronger AES key generation algorithm is available. You use the
ENCRYPT=AES2 data set option. Instead of passwords that are stored in the data
set (SAS Proprietary encryption), AES and AES2 use a key value that is not stored
in the data set. The key value is created using the ENCRYPTKEY= data set option
when the data set is created. You cannot change the ENCRYPTKEY= key value on
an AES encrypted data set without re-creating the data set or using PROC
AUTHLIB MODIFY to change the recorded key of a metadata-bound library. For
more information, see “AUTHLIB Procedure” in Base SAS Procedures Guide.
The following rules apply to AES and AES2 encryption of data sets:
n You use SAS/SECURE software, which is licensed with Base SAS software and
is available in all deployments.
n You must use the ENCRYPTKEY= data set option when creating or accessing
an AES encrypted data set unless the metadata-bound library administrator has
securely recorded the encryption key in metadata to which the data set is bound.
For more information, see “AUTHLIB Procedure” in Base SAS Procedures Guide
and SAS Guide to Metadata-Bound Libraries.
n To copy an AES-encrypted data file, the output engine must support AES
encryption. Otherwise, the data file is not copied.
n Releases before SAS 9.4 cannot use an AES-encrypted data file.
n Releases before SAS 9.4M5 cannot use an AES-encrypted file that uses the AES2
key generation algorithm.
n SAS Viya cannot access data sets created with ENCRYPT=AES2.
n You cannot encrypt SAS views, because they contain no data.
n If two or more data files are referentially related and any of them are AES
encrypted, then all must be AES encrypted. The encryption key for all of the files
must be the same unless the files are bound to metadata with the key securely
recorded in the metadata. For more information about metadata-bound libraries,
see “Metadata-Bound Library” in Base SAS Procedures Guide.
n If the data file has AES encryption, all associated indexes have AES encryption.
The ENCRYPTKEY= data set option does not protect the AES encrypted file from
deletion or replacement. AES encrypted data sets can be deleted by using either of
the following scenarios without having to specify an encrypt key value:
n the KILL option in PROC DATASETS
The encrypt key only prevents access to the contents of the file. To protect the file
from unauthorized deletion or replacement with the SAS system, the file must also
contain an ALTER= password or be bound to metadata.
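A minimal sketch of combining AES encryption with an ALTER= password, so that the file is protected against both reading and replacement or deletion, might look like this. The data set name SALARY2, the variables, the key value, and the password value are hypothetical.

```sas
/* ENCRYPTKEY= protects the contents; ALTER= protects against
   deletion or replacement. All names and values are hypothetical. */
data salary2(
   encrypt=aes
   encryptkey=green
   alter=red
   );
   input name $ yrsal;
   datalines;
Muriel 34567
Bjorn 74644
;
```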
The following example creates an encrypted data set using AES encryption:
data salary(encrypt=aes encryptkey=green);
input name $ yrsal bonuspct;
datalines;
Muriel 34567 3.2
Bjorn 74644 2.5
Freda 38755 4.1
Benny 29855 3.5
Agnetha 70998 4.1
;
TIP Code each password and encryption key option on a separate line to ensure
that the value is properly blotted in the SAS log.
If you omit the ENCRYPTKEY= key value when accessing an AES secured data
set, a dialog box appears and prompts you to add the ENCRYPTKEY= key value. If
the data set is metadata-bound and the key has been stored in the metadata for the
library, the dialog box does not appear.
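To avoid the prompt, supply the key when you access the data set. The following sketch assumes the SALARY data set and the key value GREEN from the example above.

```sas
/* Supply the same key value that was used when the data set was created. */
proc print data=salary(
   encryptkey=green
   );
run;
```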
See Also
“AUTHLIB Procedure” in Base SAS Procedures Guide
In most cases, placing the password=value pair on a separate line blots the value:
data &ds(
read=secret
encrypt=aes
encryptkey=evenmoreso
);
x=1;
run;
data &ds(read=secret);
x=1
;
run;
n Typing errors cause the following passwords to show in the SAS log:
proc print data=lubrary.abc(READ=secret);
run;
or
proc print data=library.abc(ERAD=secret);
run;
n If the code causes an ERROR message, the password is not blotted. For
example, in the following code the libref is misspelled, causing SAS to issue the
message "ERROR: Libref MYLUB is not assigned." The password is not
blotted.
libname mylib 'c:\';
data mylub.abc(
read=secret
);
x=1;
run;
NOTE: The SAS System stopped processing this step because of errors.
Using Macros
When a password is assigned within a macro, the password is not blotted in the
SAS log when the macro executes. To prevent the password from being revealed in
the SAS log, you can redirect the SAS log to a file. For more information, see
“PRINTTO Procedure” in Base SAS Procedures Guide.
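The redirection can be sketched as follows. This is an illustration only: the macro name MAKEABC, the data set MYLIB.ABC, the password value, and the log file path are hypothetical, and the libref MYLIB is assumed to be assigned already. Restricting access to the redirected log file is left to the operating environment.

```sas
/* Hypothetical macro that assigns a password when it runs. */
%macro makeabc;
   data mylib.abc(
      read=secret
      );
      x=1;
   run;
%mend makeabc;

/* Route the SAS log to a file (placeholder path) before the macro runs. */
proc printto log='log-file-path';
run;

%makeabc

/* Restore the default log destination. */
proc printto;
run;
```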
Length of Passwords
In some cases, the length of the displayed password is fixed at eight blotted
characters. In other cases, the number of blotted characters is the length of the
password. Output from the OPTIONS procedure, VERBOSE option, and OPLIST
option have a fixed length of eight.
When a password value is being reported, its length is fixed at eight. But when a
password value is simply being echoed from an input statement, it retains its input
length. This example shows the length of the passwords:
options pdfpassword=(open=a owner=b );
proc options option=pdfpassword;
run;
PDFPASSWORD=XXXXXXXX
Specifies the password to use to open a PDF document and the
password used by a PDF document owner.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
Metadata-Bound Libraries
A metadata-bound library is a physical library that is tied to a corresponding
metadata secured table object. Each physical table within a metadata-bound library
has information in its header that points to a specific metadata object. The pointer
creates a security binding between the physical table and the metadata object. The
binding ensures that SAS universally enforces metadata-layer access requirements
for the physical table—regardless of how a user requests access from SAS. For
more information, see SAS Guide to Metadata-Bound Libraries.
The AUTHLIB procedure is used to create, access, and modify metadata-bound
libraries. This procedure is intended for use by SAS administrators. Users who lack
sufficient privileges in either the metadata layer or the host layer cannot use this
procedure. For more information, see “AUTHLIB Procedure” in Base SAS
Procedures Guide.
37
SAS Engines
Specifying an Engine
Usually, you do not have to specify an engine. If you do not specify an engine, SAS
automatically assigns one based on the contents of the SAS library.
Even though SAS automatically assigns an engine based on the library contents, it
is more efficient for you to specify the engine. In some operating environments, in
order to determine the contents of a library, SAS must perform extra processing
steps by looking at all of the files in the directory until it has enough information to
determine which engine to use.
For example, if you explicitly specify the engine name as in the following LIBNAME
statement, SAS does not need to determine which engine to use:
libname mylib v9 'SAS-library';
In order to use some engines, you must specify the engine name. For example, in
order to use engines like the XML engine or the metadata engine, specify the
engine name and specify specific arguments and options for that engine. For
example, the following LIBNAME statement specifies the XML engine to import or
export an XML document:
libname myxml xml 'C:\MyFiles\XML\MyXmlFile.xml' xmltype=generic;
You can specify an engine name in the LIBNAME statement, the ENGINE= system
option, and in the New Library window.
[Figure: Engines A, B, C, and D connect data in SAS files, other files, and DBMS files (such as Oracle) to the SAS data set]
n Your data is stored in files for which SAS provides an engine. When you specify
a SAS data set name, the engine locates the appropriate file or files.
n The engine opens the file and obtains the descriptive information that is required
by SAS (for example, which variables are available and what attributes they
have, whether the file has special processing characteristics such as indexes or
compressed observations, and whether other engines are required for
processing). The engine uses this information to organize the data in the
standard logical form for SAS processing.
n This standard form is called the SAS data file, which consists of the descriptor
information and the data values organized into columns (variables) and rows
(observations).
n SAS procedures and DATA step statements access and process the data only in
its logical form. During processing, the engine executes whatever instructions
are necessary to open and close physical files and to read and write data in
appropriate formats.
Data that is accessed by an engine is organized into the SAS data set model, and in
the same way, groups of files that are accessed by an engine are organized in the
correct logical form for SAS processing. Once files are accessed as a SAS library,
you can use SAS utility windows and procedures to list their contents and to
manage them. See Chapter 26, “SAS Libraries,” on page 623 for more information
about SAS libraries. The following figure shows the relationship of engines to SAS
libraries.
[Figure: files are accessed through an engine and presented to SAS utility windows and procedures]
Engine Characteristics
Engine
Read/Write Activity
An engine can perform one or more of the following tasks:
n limit read/write activity for a SAS data set to read-only
n fully support updating, deleting, renaming, or redefining the attributes of the data
set and its variables
n support only some of these functions
For example, the engines that process BMDP, OSIRIS, or SPSS files support read-
only processing. Some engines that process SAS views permit SAS procedures to
modify existing observations while others do not.
Access Patterns
SAS procedures and statements can read observations in SAS data sets in one of
four general patterns:
sequential access
processes observations one after the other, starting at the beginning of the file
and continuing in sequence to the end of the file.
random access
processes observations according to the value of some indicator variable without
processing previous observations.
BY-group access
groups and processes observations in order of the values of the variables that
are specified in a BY statement.
multiple-pass
performs two or more passes on data when required by SAS statements or
procedures.
If a SAS statement or procedure tries to access a SAS data set whose engine does
not support the required access pattern, SAS prints an appropriate error message in
the SAS log.
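The first three access patterns can be illustrated with short DATA steps. This is a sketch under stated assumptions: the data set MYLIB.GRADES and its variable SECTION are hypothetical, and the POINT= option is used here as one form of direct access by observation number. An engine that does not support direct access would reject the second step.

```sas
/* Sequential access: read every observation from first to last. */
data work.allobs;
   set mylib.grades;
run;

/* Random (direct) access: read observation 3 without reading the
   observations before it. STOP is needed because POINT= never
   reaches an end-of-file condition. */
data work.pick;
   obsnum=3;
   set mylib.grades point=obsnum;
   output;
   stop;
run;

/* BY-group access: process observations in order of SECTION.
   The data is sorted first so that the BY statement is valid. */
proc sort data=mylib.grades out=work.sorted;
   by section;
run;

data work.firsts;
   set work.sorted;
   by section;
   if first.section;   /* keep the first observation of each group */
run;
```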
Levels of Locking
Some features of SAS require that data sets support different levels at which
Update access is used. When a SAS data set can be opened concurrently by more
than one SAS session or by more than one statement or procedure within a single
session, the level of locking determines how many sessions, procedures, or
statements can read and write to the file at the same time. For example, with the
FSEDIT procedure, you can request two windows on the same SAS data set in one
session. Some engines support this capability; others do not.
The levels that are supported are record level and member (data set) level.
Member-level locking enables Read access to many sessions, statements, or
procedures. This locking restricts all other access to the SAS data set when a
session, statement, or procedure acquires update or output access. Record-level
locking enables concurrent Read access and Update access to the SAS data set by
more than one session, statement, or procedure. This locking prevents concurrent
Update access to the same observation. Not all engines support both levels.
By default, SAS provides the greatest possible level of concurrent access, while
guaranteeing the integrity of the data. In some cases, you might want to guarantee
the integrity of your data by controlling the levels of Update access yourself. Use the
CNTLLEV= data set option to control levels of locking. CNTLLEV= enables locking
at three levels:
• library
• data set
• observation
Here are situations in which you should consider using the CNTLLEV= data set
option:
• Your application controls access to the data, such as in SAS Component
Language (SCL), SAS/IML software, or DATA step programming.
• You access data through an interface engine that does not provide member-level
control of the data.
For more information about the CNTLLEV= data set option, see SAS Data Set
Options: Reference.
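As a minimal sketch, the option is written in parentheses after the data set name. The libref MYLIB, its path, and the data set SALES are hypothetical names used only for illustration.

```sas
libname mylib 'c:\mydata';           /* hypothetical library location */

data work.sales_copy;
   /* Request member-level control for this open of the data set. */
   set mylib.sales(cntllev=mem);
run;
```

CNTLLEV=LIB and CNTLLEV=REC request library-level and observation-level control in the same way.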
You can also acquire an exclusive lock on an existing SAS file by issuing the LOCK
global statement. After an exclusive lock is obtained, no other SAS session can read
or write to the file until the lock is released. For more information about the LOCK
statement, see SAS Global Statements: Reference.
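A sketch of acquiring and releasing an exclusive lock might look like the following. The libref MYLIB, its path, and the member SALES are hypothetical. The automatic macro variable SYSLCKRC holds the return code of the most recent LOCK statement, and a value of 0 indicates success.

```sas
libname mylib 'c:\mydata';            /* hypothetical library location */

lock mylib.sales;                     /* request an exclusive lock */
%put Lock return code: &syslckrc;     /* 0 means the lock was obtained */

/* ... steps that update MYLIB.SALES go here ... */

lock mylib.sales clear;               /* release the lock */
```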
Note: SAS products, such as SAS/ACCESS and SAS/SHARE, contain engines that
support enhanced session management services and file locking capabilities.
Indexing
A major processing feature of SAS is the ability to access observations by the
values of key variables with indexes. See “Understanding SAS Indexes” on page
692 for more information about using indexes for SAS data files. Note that not all
engines support indexing.
• It enforces integrity constraints, creates backup files, and creates audit trails.
Note: SAS files created in SAS 7, 8, and 9 have the same file format.
Remote Engine
The REMOTE engine is a SAS library engine for SAS/SHARE software. Using it
enables a SAS session to access shared data by communicating with a SAS server.
For more information, see the SAS/SHARE User’s Guide.
SASESOCK Engine
The SASESOCK engine processes input to and output from TCP/IP ports instead of
physical disk devices. The SASESOCK engine is required for SAS/CONNECT
applications that implement MP CONNECT processing with the piping mechanisms.
For more information, see the SAS/CONNECT User’s Guide.
Sequential Engines
A sequential engine processes SAS files on storage media that do not provide
random access methods (for example, tape or sequential format on disk). A
sequential engine requires less overhead than the default Base SAS engine
because sequential access is simpler than random access. However, a sequential
engine does not support some Base SAS features such as audit trails, generation
data sets, integrity constraints, and indexing.
The sequential engine supports some file types for backup and restore purposes
only, such as CATALOG, VIEW, and MDDB. ITEMSTOR is the only file type that the
sequential engine does not support. DATA is the only file type that is useful for
purposes other than backup and restore.
The following sequential engines are available:
V9TAPE (TAPE)
processes SAS 7, SAS 8, and SAS 9 files.
V6TAPE
processes SAS 6 files without requiring you to convert the file to the SAS 9
format.
For more information, see “Sequential Data Libraries” on page 635.
Transport Engine
The XPORT engine processes transport files. The engine transforms a SAS file
from its operating environment-specific internal representation to a transport file. A
transport file uses a machine-independent format that can be read on all hosts. To
create a transport file, explicitly specify the XPORT engine in the LIBNAME
statement, and then use the DATA step or COPY procedure.
For information about using the XPORT engine, see Moving and Accessing SAS
Files.
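The following sketch shows one way to create a transport file with PROC COPY. The librefs, paths, and the data set name SALES are hypothetical. The sketch assumes that the member and variable names fit the limits of the transport format.

```sas
libname source 'c:\mydata';                     /* hypothetical input library */
libname xptout xport 'c:\transfer\sales.xpt';   /* hypothetical transport file */

proc copy in=source out=xptout memtype=data;
   select sales;                                /* copy only this data set */
run;
```

On the receiving host, the same LIBNAME statement with the XPORT engine and a PROC COPY in the opposite direction would restore the data set.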
V6 Compatibility Engine
The SAS 6 compatibility engine can automatically support some processing of SAS
6 files in SAS 9 without requiring you to convert the file to the SAS 9 format.
For more information, see Chapter 35, “Cross-Release Compatibility and Migration,”
on page 779, or see the Migration Focus Area at support.sas.com.
Special-Purpose Engines
To use the JMP engine, specify JMP as the engine name, along with the location of
a SAS library in the LIBNAME statement. For example, the following code reads and
prints five observations from the JMP file Baseball.jmp:
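A minimal sketch of such a step follows. The directory path and the libref B are assumptions. The engine treats the directory as a library and Baseball.jmp as the member BASEBALL.

```sas
libname b jmp 'c:\temp\jmp';          /* hypothetical directory that holds Baseball.jmp */

proc print data=b.baseball(obs=5);    /* print the first five observations */
run;
```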
For information about how to use the JMP engine, see “LIBNAME Statement: JMP
Engine” in SAS Global Statements: Reference.
To use the XML engine, specify either the XML or XMLV2 engine nickname, along with
specific arguments and options (for example, in the LIBNAME statement or in the
New Library window).
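As an illustration, the following sketch writes the SASHELP.CLASS sample table to an XML document by using the XMLV2 nickname. The libref and the output path are hypothetical.

```sas
libname xmlout xmlv2 'c:\temp\class.xml';   /* hypothetical output document */

data xmlout.class;                          /* write the table as XML */
   set sashelp.class;
run;

libname xmlout clear;                       /* release the XML document */
```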
For information about how to use the XML engine, see the SAS XMLV2 and XML
LIBNAME Engines: User’s Guide.
38
SAS File Management
• You are responsible for establishing performance guidelines for a data center.
• You do interactive queries on large SAS data sets using SAS/FSP software.
For information about improving performance, see Chapter 12, “Optimizing System
Performance,” on page 217.
• The disk where the data file (including the index file and audit file) or catalog is
stored becomes full before the file is completely written to it.
• An input/output error occurs while writing to the data file, index file, audit file, or
catalog.
When the failure occurs, the observations or records that were not written to the
data file or catalog are lost and some of the information about where values are
stored is inconsistent. The next time SAS reads the file, it recognizes that the file's
contents are damaged and repairs it to the extent possible in accordance with the
setting for the DLDMGACTION= data set option or system option, unless the data
set is truncated. In this case, use the REPAIR statement to restore the data set.
If damage occurs to the storage device where a data file resides, you can restore
the damaged data file, the index, and the audit file from a backup device.
Note: SAS is unable to repair or recover a SAS view (a DATA step view, an SQL
view, or a SAS/ACCESS view) or a stored compiled DATA step program. If a SAS
file of type VIEW or PROGRAM is damaged, you must re-create it.
Note: If the audit file for a SAS data file becomes damaged, you cannot process the
data file until you terminate the audit trail. Then, you can initiate a new audit file or
process the data file without one.
To recover the damaged data file, you can issue the REPAIR statement in PROC
DATASETS, which is documented in Base SAS Procedures Guide.
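A minimal sketch of the REPAIR statement follows. The libref MYLIB is an assumption, and the path and member name LARGE are borrowed from the PROC CONTENTS example in this chapter.

```sas
libname mylib 'c:\temp\testuser';     /* assumed library location */

proc datasets library=mylib nolist;
   repair large;                      /* attempt to restore the damaged data file */
quit;
```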
DLDMGACTION=ABORT
tells SAS to terminate the step, issue an error message to the SAS log indicating
that the requested file is damaged, and end the SAS session.
DLDMGACTION=REPAIR
tells SAS to automatically repair the file and rebuild indexes, integrity constraints,
and the audit file as well. If the repair is successful, a message is issued to the
SAS log indicating that the open and repair steps were successful. If the repair is
unsuccessful, processing stops without a prompt and an error message is issued
to the SAS log indicating the requested file is damaged.
Note: If the data file is large, the time needed to repair it can be long.
DLDMGACTION=NOINDEX
tells SAS to automatically repair the data file, disable the indexes and integrity
constraints, delete the index file, update the data file to reflect the disabled
indexes and integrity constraints, and limit the data file to be opened only in
INPUT mode. A warning is written to the SAS log instructing you to execute the
PROC DATASETS REBUILD statement to correct the disabled indexes and
integrity constraints and rebuild the index file. For more information, see
“Recovering Disabled Indexes and Integrity Constraints” on page 819.
DLDMGACTION=PROMPT
tells SAS to provide the same behavior that exists in Version 6 for both
interactive mode and batch mode. For interactive mode, SAS displays a dialog
box that asks you to select the FAIL, ABORT, or REPAIR action. For batch
mode, the files fail to open.
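The setting can be supplied either as a system option or as a data set option. The following sketch shows both forms. The libref MYLIB, its path, and the member LARGE are assumed names.

```sas
libname mylib 'c:\temp\testuser';     /* assumed library location */

/* System option: applies to every damaged file opened in the session. */
options dldmgaction=repair;

/* Data set option: applies to this open only. */
proc print data=mylib.large(dldmgaction=noindex obs=10);
run;
```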
For a data file, the date and time of the last repair and a count of the total number of
repairs are automatically maintained. To display the damage log, use PROC
CONTENTS as shown below:
proc contents data="c:\temp\testuser\large";
run;
Recovering Indexes
In addition to the failures listed earlier, you can damage the indexes for SAS data
files by using an operating environment command to delete, copy, or rename a SAS
data file, but not its associated index file. SAS repairs a damaged index according to
the DLDMGACTION= setting, as described for SAS data files. Alternatively, you can
use the REPAIR statement in PROC DATASETS to rebuild composite and simple
indexes that were damaged.
You cannot use the REPAIR statement to recover indexes that were deleted by one
of the following actions:
• copying a SAS data file by some means other than PROC COPY or PROC
DATASETS, for example, using a DATA step
• using the FORCE option in the SORT procedure to write over the original data
file
In the above cases, the index must be rebuilt using the PROC DATASETS INDEX
CREATE statement.
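Rebuilding deleted indexes might look like the following sketch. The libref, path, data set name, and the variables CUSTID, LASTNAME, and ZIP are all hypothetical.

```sas
libname mylib 'c:\mydata';                    /* hypothetical library location */

proc datasets library=mylib nolist;
   modify myfile;
      index create custid;                    /* simple index on one variable */
      index create name_zip=(lastname zip);   /* composite index on two variables */
quit;
```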
• updates the data file to reflect the disabled indexes and integrity constraints
WARNING: SAS data file MYLIB.MYFILE.DATA was damaged and has been
partially repaired. To complete the repair, execute the DATASETS
procedure REBUILD statement.
The data file stays in INPUT mode until the PROC DATASETS REBUILD statement
is executed. You use this statement to specify whether you want to restore the
indexes and integrity constraints and rebuild the index file or delete the disabled
integrity constraints and indexes. For more information, see the REBUILD statement
in PROC DATASETS, which is documented in the Base SAS Procedures Guide.
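Both uses of the REBUILD statement are sketched below for the data file named in the warning message. The library path is an assumption.

```sas
libname mylib 'c:\mydata';            /* hypothetical library location */

proc datasets library=mylib nolist;
   rebuild myfile;                    /* restore indexes and integrity constraints */
quit;

/* Alternative: delete the disabled indexes and integrity constraints instead. */
/*
proc datasets library=mylib nolist;
   rebuild myfile / noindex;
quit;
*/
```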
Recovering Catalogs
To determine the type of action that SAS takes when it tries to open a SAS catalog
that is damaged, set the DLDMGACTION= data set option or system option. Then
39
External Files
• SAS programming statements that you want to submit to the system for
execution
External files can also store output from your SAS job as:
• a SAS log (a record of your SAS job).
1. In some operating environments, you can also use the command '&' to assign a fileref.
External File
Task Tool Example
* SAS creates a file that is named with the appropriate extension for your operating environment.
• DATAURL
• FTP
• Hadoop
• SFTP
• TCP/IP SOCKET
• URL
• WebDAV
• ZIP
Examples of how to use each method are shown in the following table:
External File
Task Tool Example
See SAS DATA Step Statements: Reference for detailed information about each of
these statements.
When you use a DATA step to write a customized report, you write it to an external
file. In its simplest form, a DATA step that writes a report looks like this:
data _null_;
set budget;
file 'your-file-name';
put variables-and-text;
run;
For examples of writing reports with a DATA step, see Chapter 20, “DATA Step
Processing,” on page 441.
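As a concrete illustration of the template above, the following sketch writes a small column report from the SASHELP.CLASS sample table. The output file name is hypothetical and resolves relative to the current working directory.

```sas
data _null_;
   set sashelp.class;
   file 'class_report.txt';                             /* hypothetical output file */
   if _n_ = 1 then put 'Name' @12 'Age' @18 'Height';   /* header line, written once */
   put name @12 age 3. @18 height 5.1;                  /* one report line per observation */
run;
```

Because the step uses DATA _NULL_, SAS writes only the external file and does not create an output data set.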
If your operating environment supports a graphical user interface, you can use the
External File Interface (EFI) or the Export Wizard to write to an external file. The EFI
is a point-and-click graphical interface that you can use to read and write data that is
not in SAS internal format. By using the EFI, you can read data from a SAS data set
and write it to an external file, and you can read data from an external file and write
it to a SAS data set. See SAS/ACCESS Interface to PC Files: Reference for more
information about the EFI.
Note: If the data file you are passing to EFI is password protected, you are
prompted multiple times for your login ID and password.
The Export Wizard guides you through the steps to read data from a SAS data set
and write it to an external file. As a wizard, it is a series of windows that present
simple choices to guide you through the process. See SAS/ACCESS Interface to
PC Files: Reference for more information about the wizard.
For examples of using a DATA step to process external files, see Chapter 21,
“Reading Raw Data,” on page 471.
PART 6
Chapter 40: The SMTP E-Mail Interface
Chapter 41: Universal Unique Identifiers
Chapter 42: Internet Protocol Version 6 (IPv6)
40
The SMTP E-Mail Interface
2 If the user ID is not specified by the USERID= option, the SAS SMTP e-mail
interface attempts to authenticate by using the user ID specified by the FROM=
option of the FILENAME statement.
3 If the user ID is not specified in the FROM= option in the FILENAME statement,
the SAS SMTP e-mail interface attempts to authenticate by using the user ID
specified by the EMAILID= system option.
4 If the user ID is not specified by the EMAILID= system option, the SAS SMTP
e-mail interface looks up the user ID from the operating system and attempts to
authenticate that user ID.
For more information about sending e-mail from SAS, see the SAS documentation
for your operating environment.
FILENAME Statement
In the FILENAME statement, the EMAIL (SMTP) access method enables you to
send e-mail programmatically from SAS using the SMTP e-mail interface. For more
information, see the “FILENAME Statement” in SAS Global Statements: Reference.
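A minimal sketch of sending a message with the EMAIL access method follows. It assumes that the session is already configured for SMTP, for example with the EMAILSYS= and EMAILHOST= system options. The addresses, subject, and fileref are placeholders.

```sas
filename outbox email
   to='recipient@example.com'         /* placeholder address */
   from='sender@example.com'          /* placeholder address */
   subject='Nightly job status';

data _null_;
   file outbox;                       /* each PUT writes a line of the message body */
   put 'The nightly job completed.';
run;

filename outbox clear;
```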
41
Universal Unique Identifiers
The UUIDGEND utility is required for non-Windows hosts that are running versions
of SAS prior to SAS 9.4M2.
The UUID Generator Daemon is not required for the following:
• SAS applications that execute on Windows
• SAS applications that execute in UNIX environments that are running SAS
version 9.4M2 (or later)
UUIDGEN Function
The UUIDGEN function returns a UUID for each cell. For more information, see
“UUIDGEN Function” in SAS Functions and CALL Routines: Reference.
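As a short illustration, the following step generates one UUID and writes it to the SAS log. The variable name ID is arbitrary.

```sas
data _null_;
   length id $36;        /* the default character form is 36 characters long */
   id = uuidgen();
   put id=;
run;
```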
42
Internet Protocol Version 6 (IPv6)
Overview of IPv6
SAS 9.2 introduced support for the next generation of Internet Protocol, IPv6, which
is the successor to the current Internet Protocol, IPv4. Rather than replacing IPv4
with IPv6, SAS supports both protocols. There is a lengthy transition period during
which the two protocols coexist.
A primary reason for the new protocol is that the limited supply of 32-bit IPv4
address spaces was being depleted. IPv6 uses a 128-bit address scheme. This
scheme provides 2^128 possible addresses, far more than the 2^32 that IPv4 provides.
IPv6 includes these benefits over IPv4:
• larger address space (128 bits rather than 32 bits)
• automatic configuration
Table 42.1 Comparison of Features of the IPv6 and IPv4 Address Formats
The :: (consecutive colons) notation can be used to represent four successive 16-bit
blocks that contain zeros. When SAS software encounters a collapsed IP address, it
reconstitutes the address to the required 128-bit address in eight 16-bit blocks.
The brackets are necessary only if also specifying a port number. Brackets are used
to separate the address from the port number. If no port number is used, the
brackets can be omitted.
As an alternative, the block that contains the zero can be collapsed. Here is an
example:
[2001:db8::1]:80
The http:// prefix specifies a URL. The brackets are necessary only if also
specifying a port number. Brackets are used to separate the address from the port
number. If no port number is used, the brackets can be omitted.
ACTION=RESUME
OPTIONS=""
NOAUTOPAUSE;
If an IP address had been used and if the IP address that was associated with the
computer node name had changed, the code would be inaccurate.
An FQDN can remain intact in the code while the underlying IP address can change
without causing unpredictable results. The TCP/IP name-resolution system
automatically resolves the FQDN to its associated IP address.
Here is an example of an FQDN that is specified in a SAS GUI application.
The full FQDN, d11076.na.apex.com, is specified in the Remote Host field of the
Connect Server Properties window in SAS Management Console.
Some SAS products impose limits on the length for computer names.
The following code is an example of an FQDN that is assigned to a SAS macro
variable:
%let sashost=hrmach1.dorg.com;
rsubmit sashost.sasport;
Because the FQDN is longer than eight characters, the FQDN must be assigned to
a SAS macro variable, which is used in the RSUBMIT statement.