APPLIED STATISTICS-II (1) Updated
Introduction to SPSS
Irshad Ahmad
MS in Psychology
International Islamic University Islamabad
Introduction
SPSS – What Is It?
SPSS stands for “Statistical Package for the Social Sciences” and was first
launched in 1968.
Since SPSS was acquired by IBM in 2009, it's officially known as IBM SPSS
Statistics but most users still just refer to it as “SPSS”.
SPSS - Quick Overview Main Features
Go to Edit/Options.
SPSS Windows
Data Editor
Syntax Editor
The Data Editor has two parts:
1. Data View window
This window displays data from the active file in spreadsheet format.
This sheet, called Data View, always displays our data values.
SPSS toolbars contain some handy tools. Some of their limitations can be circumvented by building your
own toolbars and toolbar tools. Doing so is simple and speeds up a lot of work.
Columns of cells are called variables. Variable names (“gender”) are shown in the column headers.
Rows of cells are called cases. Note that in SPSS, “cases” refers to nothing more than rows of cells, which may or may not
correspond to people or objects.
Data cell contents are called values.
You can drag the three dots in the right margin leftwards in order to split the window horizontally. In a similar vein, split
the window vertically by dragging in the lower margin upwards. Split windows allow for viewing distant cases or variables
simultaneously.
You can toggle between Data View and Variable View by clicking the tabs in the lower left corner. A faster option is
the Ctrl + T keyboard shortcut.
The status bar may provide useful information on the data, such as whether WEIGHT, FILTER, SPLIT FILE or Unicode mode is in
effect.
The Menu bar
Data View
In order to define a variable and set its parameters, you need to get some data into SPSS. The easiest way is just
to type it in. Select the Data View (click on the tab at the bottom of the program window), start in the first cell
of an empty column, and work downwards. Let’s set up a variable for age by typing in five different ages (see
screenshot).
Defining a Variable: Adding a Variable
Variable View
If you click on the Variable View tab, you’ll get a screen that looks like this.
Variable Parameters
The column headings: Name, Label
As the name of the variable is restricted in terms of length and characters it can
contain, it is often useful to have a longer, more meaningful label that will appear in
the tables and graphs that you create.
The label can be up to 256 characters, but some statistical procedures will only
display the first part of a very long label, so it is desirable to keep the first 20 or so
characters unique.
A label is a brief but descriptive definition or display name for the variable. When defined, a
variable's label will appear in the output in place of its name.
For example, if you give the Age variable the label “Age Status”, then “Age Status” will
appear on charts, graphs and tables. To add a label, click inside a cell within the
Label column and type in the label text.
The column headings: Values
Tip: In the Data Editor window, you can switch back and forth between viewing
the raw data and the assigned value labels by selecting/deselecting Value Labels
under the View menu (or clicking on the value labels icon in the toolbar).
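The idea behind the Value Labels toggle can be sketched outside SPSS. The short Python example below is only an illustration: the coding scheme (1 = Male, 2 = Female) and the function name `display_value` are assumptions made for this sketch, not something defined in these slides or in SPSS itself.

```python
# Hypothetical coding scheme for a "gender" variable (assumed for illustration).
VALUE_LABELS = {1: "Male", 2: "Female"}


def display_value(code, show_labels):
    """Return the label for a code when labels are switched on, else the raw code."""
    if show_labels and code in VALUE_LABELS:
        return VALUE_LABELS[code]
    # Codes without a label are shown as stored.
    return code


raw_column = [1, 2, 2, 1]
labelled = [display_value(c, True) for c in raw_column]
unlabelled = [display_value(c, False) for c in raw_column]
```

The stored data never change; only the way each cell is displayed does.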
The column headings: Missing
Missing values
Defines user-missing values
These values are excluded from some analyses
Options:
Up to 3 discrete missing values (e.g. 999)
A range of missing values plus one discrete missing value
In many data collection exercises, some information may be unavailable. This can be for a
variety of reasons (e.g. refusal, not collected, unknown). In the Data Editor, leaving a cell blank
will be interpreted as meaning that the information for that variable is missing for that
particular case. Such values are termed ‘system missing’. Other values that we want to define
as missing are termed ‘user-defined missing’.
For string variables, a blank cell is treated in the same way as any other valid value and is not
regarded as system missing. However, a blank space can be defined as user missing,
provided that it is a ‘short-string’ variable, i.e. one which has been defined as 8 characters or
fewer.
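The effect of missing-value definitions on an analysis can be sketched in a few lines of Python. This is a minimal illustration and not how SPSS works internally: `None` stands in for a blank (system-missing) cell, 999 is an assumed user-defined missing code, and the ages are made-up example data.

```python
from statistics import mean

# Hypothetical user-defined missing code (assumed for illustration).
USER_MISSING = {999}


def valid_values(column, user_missing=USER_MISSING):
    """Drop system-missing (None) and user-missing values from a column."""
    return [v for v in column if v is not None and v not in user_missing]


# Made-up ages: one blank cell and one value coded as user-missing.
ages = [23, 31, None, 999, 27]
valid = valid_values(ages)
mean_age = mean(valid)  # computed from 23, 31 and 27 only
```

If 999 were not declared as missing, it would be treated as a real age and would distort the mean.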
The column headings: Columns, Align
Measure
Levels of measurement:
Nominal
Ordinal
Interval
Ratio
In SPSS, interval and ratio are designated together as Scale
The default for string variables is Nominal
The default for numeric variables is Scale
The column headings: Role
Role can be used to pre-select variables for analysis.
Default is ‘Input’.
Input - a predictor/independent variable
Target - a dependent variable
Both - may be used as independent or dependent
None - no role assigned
Partition - variable used to partition the data into separate samples
Split - included for compatibility with SPSS Modeler; it is not used in SPSS Statistics
Saving the file
Save the file regularly so that the work done to date is not lost:
File/Save
Move to the target directory
Enter a file name
Save
Data Entry into SPSS
General guidelines for data entry
Guideline # 1
Rules & Best Practices for Naming Variables
1. Names can be up to 64 bytes long, so up to 32 characters is safe in any language (Recommended: 8 characters or fewer). Names may
include alphanumeric characters, non-punctuation characters, and a period (.).
2. You can’t have a space in a variable name.
3. Don’t end a variable name with a period.
4. Don’t end a variable name with an underscore.
5. You can use periods and underscores within a variable name.
6. You can use upper and lower case, and a mixture thereof, within a variable name.
7. You can’t use SPSS reserved keywords as a variable name (i.e., you can’t use ALL, AND, BY, EQ,
GE, GT, LE, LT, NE, NOT, OR, TO or WITH). These are used in the SPSS syntax, and if they were permitted,
the software would not be able to distinguish between a command and a variable.
8. Each variable name must be unique; duplication is not allowed. Variable names are not case sensitive.
The names NEWVAR, NewVar, and newvar are all considered identical.
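Several of the naming rules above are mechanical enough to check automatically. The Python sketch below is a hypothetical helper, not SPSS's own validation: the function name `name_problems` is invented here, the 32-character limit is the conservative figure from rule 1, and only the rules on length, spaces, trailing characters, reserved keywords and case-insensitive uniqueness are covered.

```python
# Reserved keywords that cannot be used as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE",
            "LT", "NE", "NOT", "OR", "TO", "WITH"}
MAX_LENGTH = 32  # conservative limit from rule 1


def name_problems(name, existing=()):
    """Return a list of rule violations for a proposed variable name."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if " " in name:
        problems.append("contains a space")
    if name.endswith("."):
        problems.append("ends with a period")
    if name.endswith("_"):
        problems.append("ends with an underscore")
    if name.upper() in RESERVED:
        problems.append("is a reserved keyword")
    # Names are not case sensitive, so compare in lower case.
    if name.lower() in {e.lower() for e in existing}:
        problems.append("duplicates an existing name")
    return problems
```

For example, `name_problems("NewVar", existing=["NEWVAR"])` reports a duplicate, while `name_problems("score.1")` returns an empty list.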
Guideline # 2
Guideline # 3
Give each case a unique, sequential case number (ID). Place this ID number in the first
column on the left.
Guideline # 4
Guideline # 5
All data for a project should be in one spreadsheet. Do not include graphs or
summary statistics in the spreadsheet.
Guideline # 6
Each patient should be entered on a single line or row. Do not copy a patient’s
information to another row to perform subgroup analysis.
Guideline # 7
For yes/no questions, enter “0” for no and “1” for yes.
Do not enter “?”, “*”, or “NA” for missing data because this indicates to the statistical
program that the variable is a string variable.
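If the raw answers were collected as text, the coding in Guideline # 7 can be applied before the data are entered. The Python sketch below is one possible approach under stated assumptions: the function name `code_yes_no` is hypothetical, and `None` stands in for a cell that should be left blank.

```python
# Numeric codes from Guideline # 7.
YES_NO_CODES = {"yes": 1, "no": 0}


def code_yes_no(answer):
    """Return 1 for yes, 0 for no, and None (blank cell) for anything else."""
    if answer is None:
        return None
    # Placeholders such as "?", "*" or "NA" are not in the table, so they become None.
    return YES_NO_CODES.get(answer.strip().lower())


answers = ["Yes", "no", "NA", "?", None]
coded = [code_yes_no(a) for a in answers]
```

Keeping placeholders out of the column means it holds only numbers and blanks, so it can be read as a numeric variable.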
Guideline # 8
Put ordinal variables into one column if the categories are mutually exclusive.
Avoid:                        Preferred:
Pain                          Pain
Mild   Moderate   Severe
1      0          0           1
0      1          0           2
0      0          1           3
(Preferred coding: 1 = Mild, 2 = Moderate, 3 = Severe)
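Data already entered in the “Avoid” layout can be collapsed into a single ordinal column. The Python sketch below shows one way to do this for the pain example; the function name `collapse_pain` and the dictionary-per-row representation are assumptions made for illustration.

```python
# Ordered levels; position in the tuple gives the ordinal code (1, 2, 3).
LEVELS = ("mild", "moderate", "severe")


def collapse_pain(row):
    """Turn three 0/1 indicator columns into one code: 1=mild, 2=moderate, 3=severe."""
    flagged = [code for code, level in enumerate(LEVELS, start=1) if row[level] == 1]
    # Mutually exclusive categories mean exactly one indicator should be set.
    if len(flagged) != 1:
        raise ValueError("expected exactly one pain level per case")
    return flagged[0]


rows = [
    {"mild": 1, "moderate": 0, "severe": 0},
    {"mild": 0, "moderate": 1, "severe": 0},
    {"mild": 0, "moderate": 0, "severe": 1},
]
pain = [collapse_pain(r) for r in rows]
```

A row with no level set, or with more than one, raises an error rather than being coded silently, because such a row breaks the mutual-exclusivity assumption.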
Guideline # 9