Tabulate
INTRODUCTION
PROC TABULATE is a procedure used to display descriptive statistics in tabular format. It computes many statistics that are computed by other procedures, such as MEANS, FREQ, and REPORT. PROC TABULATE then displays the results of these statistics in a table format. TABULATE will produce tables in up to three dimensions and allows, within each dimension, multiple variables to be reported one after another hierarchically. PROC TABULATE has some very nice mechanisms that can be used to label and format the variables and the statistics produced.
BASIC SYNTAX
    PROC TABULATE <options>;
        CLASS variables < / options>;
        VAR variables < / options>;
        TABLE <page> , <row> , column < / options> ;
        other statements ;
    RUN;

Let's take a look at the basic syntax of PROC TABULATE. We will start with three of the statements that you can use in PROC TABULATE: CLASS, VAR, and TABLE. As you can see, each of these statements, as well as the PROC TABULATE statement itself, allows options to be added. On the CLASS, VAR, and TABLE statements, the options must be preceded by a /. Note two differences in the syntax from other SAS procedures: (1) the variables in all three statements cannot be separated by commas; and (2) the commas in the TABLE statement are treated in a special way and mean a change in dimension.
NESUG 2006
... table (more on this in a minute).
NoSeps                  Eliminates horizontal separators in the table (only affects the traditional SAS monospace output destination).
Order=                  Unformatted/Data/Formatted/Freq: orders how the CLASS values appear in the table.
Missing                 Tells SAS to treat missing values as valid.
Style=                  Used with ODS specifications.
Classdata=, Exclusive   To specify exact combinations of data to include.
VAR STATEMENT
The VAR statement lists the variables you intend to use to create summary statistics. As such, they must be numeric. Only two options can be used with the VAR statement and, if present, they appear after a /.

Style=    ODS style element definitions. For example, you might change the justification or the font.
Weight=   Specifies another variable that will weight the values of the VAR variable, with the following exceptions: if the weight is 0 or negative, the observation is still counted in the total number of observations; if the weight is missing, the observation is excluded entirely.
TABLE STATEMENT
The TABLE statement consists of up to three dimension expressions and the table options. To separate the dimensions, just use a comma. If there are no commas, SAS assumes you are only defining the column dimension (which is required). If there is one comma, the row dimension comes first, then the column. If there are two commas, the order of the expressions is page, then row, then column. Options appear at the end after a /. You can have multiple TABLE statements in one PROC TABULATE. This will generate one table for each statement. All variables listed in the TABLE statement must also be listed in either the VAR or CLASS statements. In the table expressions, there are many statistics that can be specified. Among them are row and column percents, counts, means, and percentiles. There are about a dozen options that can be specified. Here are a few of them.

Box=             Text and style for the empty box in the upper left corner.
Condense         Print multiple pages to the same physical page.
NoContinued      Suppress the continuation message.
MissText         If a cell is blank, this text will print instead.
PrintMiss        Print CLASS variable values, even if there is no data for them (this only works if somewhere there is at least one observation with that value).
Indent=          Number of spaces to indent nested row headings.
RTSpace=         Number of positions to allow for the row headings.
Style=[options]  Specify ODS style elements for various parts of the table.
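The comma rules described above can be sketched with three TABLE statements in one step. This is only an illustration, assuming the data set one and the variables gender, fulltime, educ, and income used in the other examples in this paper; each TABLE statement produces its own table.

```sas
/* Sketch only: assumes data set ONE with CLASS variables GENDER, */
/* FULLTIME, EDUC and a numeric variable INCOME.                   */
PROC TABULATE data=one;
    CLASS gender fulltime educ;
    VAR income;
    /* No comma: column dimension only */
    TABLE gender;
    /* One comma: row dimension, then column dimension */
    TABLE fulltime, gender;
    /* Two commas: page, then row, then column */
    TABLE educ, fulltime, gender * income * MEAN;
RUN;
```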
Here is our example using the variable as a VAR variable. We should get the total income summed across all observations. The resulting table is shown to the right of the example.
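A minimal sketch of a step of this kind, assuming the data set one and the numeric variable income that appear in the later examples (the exact code of the original example is not reproduced here). With no statistic requested, a VAR variable is summarized with SUM.

```sas
/* Sketch only: ONE and INCOME are taken from the later examples. */
PROC TABULATE data=one;
    VAR income;
    TABLE income;    /* default statistic for a VAR variable is SUM */
RUN;
```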
ADDING STATISTICS
If you want something other than the default N or SUM, use the * operator followed by the name of the statistic you want instead. You can group multiple statistics and variables with parentheses to get the results you want.

Descriptive Statistics:
    COLPCTN, COLPCTSUM, MAX, MEAN, MIN, N, NMISS, PAGEPCTSUM, PCTN, PCTSUM, ROWPCTN, ROWPCTSUM, STDDEV | STD, STDERR, SUM, VAR

Quantile Statistics:
    MEDIAN | P50, P1, P5, P10, Q1 | P25, Q3 | P75, P90, P95, P99, QRANGE
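As a short illustration of the * operator and parentheses, the sketch below requests four statistics for one analysis variable. It assumes the data set one and the variable income from the other examples; the choice of statistics is arbitrary.

```sas
/* Sketch only: one column per statistic requested for INCOME. */
PROC TABULATE data=one;
    VAR income;
    TABLE income * (N MEAN MEDIAN STD);
RUN;
```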
CLASS STATEMENT
Classification variables allow you to get stats by category. You will get one column or row for each value of the CLASS variable. You will need to be careful to use a categorical variable with only a limited number of categories or you may end up producing many, many pages of output.
The syntax for the CLASS statement is similar to the VAR statement. List the variables you want to use to group data, followed by a / and any options you want. The variables here can be either numeric or character (unlike the VAR statement, which requires numeric variables). The only statistics you can get for these variables are counts and percents. The statistics will be produced for each LEVEL of the variable. This is almost like using a BY statement within the table. The options you can use for the CLASS statement are different from those for the VAR statement. Here are a few of them:

Ascending/Descending  Specify the order in which the CLASS variable values are displayed.
Missing               Consider missing values valid, with special missing values treated separately.
MLF                   Enables use of multi-level formatting with overlapping ranges (for example, by state and by region at the same time).
Order=                Groups levels of CLASS variables in the order specified:
                          Internal (default)  use actual values in data
                          Data                same order the data is already sorted in
                          Formatted           use the formatted data values
                          Freq                highest counts first
Style=                Give ODS style element definitions to these variables.
PreloadFmt            This will preload a format and will also (if other options are also specified) display all values in the table even if there are no observations present with some of the values.
Exclusive             Will exclude from the table all combinations of CLASS variables not present in the data (normally used with the PRELOADFMT option).
GroupInternal         Used to group values together by their internal values, not formatted.

Let's talk about how the CLASS variables are handled if they are missing. This applies to any procedure where you can use a CLASS statement. If an observation has a missing value on even one of the CLASS variables, that observation is excluded from ALL calculations, even if it could have been included in some of them. For example, suppose a student has a gender value of F and an education value of blank. She would not be included in the gender totals. To get her included wherever possible, use the MISSING option.

FROM SAS Online Documentation: By default, if an observation contains a missing value for any CLASS variable, then PROC TABULATE excludes that observation from all tables that it creates. CLASS statements apply to all TABLE statements in the PROC TABULATE step. Therefore, if you define a variable as a CLASS variable, then PROC TABULATE omits observations that have missing values for that variable from every table even if the variable does not appear in the TABLE statement for one or more tables. If you specify the MISSING option in the PROC TABULATE statement, then the Procedure considers missing values as valid levels for all CLASS variables. If you specify the MISSING option in a CLASS statement, then PROC TABULATE considers missing values as valid levels for the CLASS variable(s) that are specified in that CLASS statement.

In this example, we are adding more columns to the right of the two columns we already have from the previous example.

    PROC TABULATE data=one;
        CLASS GENDER;
        VAR income;
        TABLE income * (N Mean) INCOME * MEAN * GENDER;
    RUN;
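To show how CLASS options are attached, here is a small sketch that uses two CLASS statements so that the options apply to only one of the variables. It assumes the data set one with the variables ethnic and gender used elsewhere in this paper; the particular options chosen are for illustration only.

```sas
/* Sketch only: options after the / apply to ETHNIC but not GENDER. */
PROC TABULATE data=one;
    CLASS ethnic / ORDER=FREQ MISSING;  /* most frequent level first; keep missing as a level */
    CLASS gender;                       /* default handling */
    TABLE ethnic, gender * N;
RUN;
```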
    PROC TABULATE data=one;
        CLASS ethnic;
        TABLE ethnic=' ' * N=' ' , ALL='N' / ROW=FLOAT ;
    RUN;
To swap rows and columns, you need only switch what you put in front of the comma with what comes after it.

    PROC TABULATE data=one;
        CLASS gender;
        VAR income;
        TABLE gender , income * (N Mean) ;
    RUN;

If you move the statistic specification so that it is attached to the rows, the results look very different.

    PROC TABULATE data=one;
        CLASS gender;
        VAR income;
        TABLE gender * (N Mean Max) , income ;
    RUN;
You can also nest classification variables.

    PROC TABULATE data=one;
        CLASS gender fulltime educ;
        VAR income;
        TABLE fulltime * gender , Income * educ * mean ;
    RUN;
In this example, we are adding a sub-total for total education by gender.

    PROC TABULATE data=one;
        CLASS gender fulltime educ;
        TABLE (fulltime ALL) * gender ALL, educ * N ;
    RUN;

If you want to put a subtotal for gender within each education level, just change the placement of the ALL keyword. Here we are also adding a total column for the Educ group.

    PROC TABULATE data=one;
        CLASS gender fulltime educ;
        TABLE fulltime * (gender ALL) , (educ all) * N ;
    RUN;
ADDING LABELS
There are two ways to add labels for your variables. The first, and the simplest, is to just add =label to the dimension expression after the variable you want to label. This way works for labeling both the variables and the statistics. The second way is to add a LABEL statement to your code: LABEL var=label. The LABEL statement will not work for labeling the statistics. You need to use the KEYLABEL statement to label statistics: KEYLABEL stat=label.

    PROC TABULATE data=one;
        CLASS gender fulltime;
        VAR income;
        TABLE gender = 'Gender' ALL = 'Total',
              Fulltime = 'Employment Status' * income * mean = 'Mean' ;
    RUN;

Alternatively, you can use this code to get the same table as shown above.

    PROC TABULATE data=one;
        CLASS gender fulltime;
        VAR income;
        TABLE gender ALL , Fulltime * income * mean ;
        LABEL gender='Gender' Fulltime='Employment Status';
        KEYLABEL mean='Mean' all='Total';
    RUN;
HIDING LABELS
In order to hide variable or statistic labels, you can add =' ' as a label. Note the statistics MUST be attached to the row dimension and NOT the column dimension for this to work.

    PROC TABULATE data=one;
        CLASS educ gender fulltime;
        VAR income;
        TABLE educ ,
              Fulltime='Employment Status' * gender = ' ' * income * mean = ' ' ;
    RUN;
    PROC TABULATE data=one;
        CLASS gender fulltime educ;
        VAR income;
        TABLE educ='Education', fulltime = 'Employment Status',
              Gender * income * mean / BOX=_PAGE_ ;
    RUN;
The next example shows the use of ROWPCTN. Note that the percents will add up to 100% across each row (allowing for rounding error). COLPCTN works similarly, except that the columns will add up to 100%.

    PROC TABULATE data=one;
        CLASS ethnic educ;
        TABLE ethnic * ROWPCTN, Educ ;
    RUN;
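For comparison, the same table can be sketched with COLPCTN. This assumes the same data set one and variables ethnic and educ as the ROWPCTN example; the ALL row is added here only so that the 100% column totals are visible.

```sas
/* Sketch only: percents are computed within each EDUC column. */
PROC TABULATE data=one;
    CLASS ethnic educ;
    TABLE (ethnic ALL) * COLPCTN, educ;
RUN;
```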
To exclude this observation, you can use the following code. However, this will change the total number of observations included in the table and may change any total columns. Although the table did not change in this example, in some cases this can cause differences, so be careful.

    PROC TABULATE data=one;
        WHERE income ne . ;
        CLASS gender fulltime;
        VAR income;
        TABLE fulltime ALL, Income * (gender ALL) * mean;
    RUN;

Rather than dropping the observations with missing income, a better solution is to label the empty cell. To do this, you can use the MISSTEXT option on the TABLE statement. Any internal cell that is empty will have this text placed in it instead.

    PROC TABULATE data=one;
        CLASS gender fulltime;
        VAR income;
        TABLE fulltime ALL, Income * (gender ALL) * mean / MISSTEXT = 'no data' ;
    RUN;
To add these hidden missing observations back in, you can specify the MISSING keyword either on the PROC TABULATE statement or after a / on the CLASS statement. In the following example, I chose to add it to the CLASS statement. This option can also be used in other procedures (such as PROC MEANS) where you can specify the CLASS statement.

    PROC TABULATE data=one;
        CLASS gender ethnic / MISSING;
        VAR income;
        TABLE ethnic ALL, Income * (gender ALL) * n;
    RUN;
To label missing data in a row or column header (i.e., a CLASS variable), you can create a format using PROC FORMAT and then assign the format to the variable using the FORMAT statement.

    PROC FORMAT;
        VALUE $ethnic 'W'='White'
                      'H'='Hisp.'
                      'I'='Am. Ind.'
                      'A'='Asian'
                      'B'='Af.Amer.'
                      ' '='Missing' ;
    RUN;

    PROC TABULATE data=one;
        CLASS gender ethnic / missing;
        VAR income;
        TABLE ethnic ALL, Income * (gender ALL) * n;
        FORMAT ethnic $ethnic. ;
    RUN;
MEAN*F=Dollar8. );
MEAN*F=Dollar8. )
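The *F= modifier seen in the fragments above attaches a format to an individual statistic in the dimension expression. A minimal sketch of that technique, assuming the data set one and the variables gender and income from the other examples (this is not a reconstruction of the original example):

```sas
/* Sketch only: each statistic gets its own format via *F=. */
PROC TABULATE data=one;
    CLASS gender;
    VAR income;
    TABLE gender, income * (N*F=8. MEAN*F=DOLLAR8.);
RUN;
```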
You can get very different results depending upon where you place the style options. If you place the style options on the PROC TABULATE statement, for example, you will affect all the data cells in the tables. (See the table below for a list of places where you can put style options and what portion of the table they will affect.) Note: for the CLASS, CLASSLEV, VAR, and KEYWORD statements, the style options can also be specified in the dimension expression in the TABLE statement.

Style Place In                    Part of Table Affected
PROC TABULATE S=[ ]               data cells
CLASS varname / S=[ ]             heading for variable varname
CLASSLEV varname / S=[ ]          class values for variable varname
VAR varname / S=[ ]               heading for variable varname
KEYWORD stat / S=[ ]              heading for named stat
TABLE page,row,col / S=[ ]        table borders, rules, cell spacing
BOX={label= S=[ ] }               table Box
    ODS RTF file='c:\myfile.rtf';
    PROC TABULATE data=one f=10.2 S=[FOREGROUND=BLACK CELLWIDTH=200 JUST=C];
        CLASS gender;
        CLASSLEV gender / S=[BACKGROUND=YELLOW];
        VAR income;
        TABLE gender=' ' all={label='Tot' s=[JUST=R]},
              mean={s=[FOREGROUND=WHITE BACKGROUND=PURPLE]} * income
              / box={label='Income' s=[VJUST=B JUST=R]};
    RUN;
    ODS RTF close;
over the column header when you view the table in a Web browser, a little text box will open underneath that shows the text you have in your format.

    PROC FORMAT;
        VALUE $ethnic 'W'='White - Non-Hispanic'
                      'H'='Hispanic American'
                      'I'='American Indian'
                      'A'='Asian'
                      'B'='African American'
                      ' '='Missing';

    ODS HTML file='c:\myfile.HTML';
    PROC TABULATE data=one;
        CLASS ethnic gender;
        CLASSLEV ethnic / s=[flyover=$ethnic.];
        VAR income;
        TABLE gender, ethnic * income;
    RUN;
    ODS HTML CLOSE;
Cellwidth=80];
SAS and SAS/GRAPH are registered trademarks or trademarks of SAS Institute, Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
AUTHOR CONTACT
Your comments and questions are valued and welcome. Contact the author at:

Wendi L. Wright
71 Black Pine Ln.
Levittown, PA 19054-2108
Phone: (215) 547-3372
E-mail: [email protected]