APPLIED STATISTICS-II (1) Updated

Uploaded by Abdul Majid
Copyright © All Rights Reserved

APPLIED STATISTICS-II

Introduction to SPSS

Irshad Ahmad
MS in Psychology
International Islamic University Islamabad
Introduction
SPSS – What Is It?

SPSS stands for “Statistical Package for the Social Sciences” and was first
launched in 1968.
Since SPSS was acquired by IBM in 2009, it has officially been known as IBM SPSS
Statistics, but most users still just refer to it as “SPSS”.
SPSS - Quick Overview: Main Features

SPSS is software for editing and analyzing all sorts of data.
These data may come from basically any source: scientific research, a customer
database, Google Analytics or even the server log files of a website.
SPSS can open all file formats that are commonly used for structured data, such as:
- spreadsheets from MS Excel;
- plain text files (.txt or .csv);
- relational (SQL) databases;
- Stata and SAS files.
Starting SPSS

Click on the IBM SPSS shortcut button on your desktop.


The opening screen should appear as shown in the screenshot.
Before you perform any analysis in SPSS, set up the following option:

Go to Edit > Options...
SPSS Windows

SPSS has three main windows:

Data Editor

Viewer or Draft Viewer

Syntax Editor
The Data Editor has two parts:
1. Data View window
This displays the data from the active file in spreadsheet format.
This sheet, called Data View, always displays our data values.
SPSS toolbars contain some handy tools. Some of their limitations can be circumvented by building your
own toolbars and toolbar tools. Doing so is simple and speeds up a lot of work.
Columns of cells are called variables. Variable names (“gender”) are shown in the column headers.
Rows of cells are called cases. Note that in SPSS, “cases” refers to nothing more than rows of cells, which may or may not
correspond to people or objects.
Data cell contents are called values.
You can drag the three dots in the right margin leftwards in order to split the window horizontally. In a similar vein, split
the window vertically by dragging in the lower margin upwards. Split windows allow for viewing distant cases or variables
simultaneously.

You can toggle between Data View and Variable View by clicking the tabs in the lower left corner. A faster option is
the Ctrl + T keyboard shortcut.
The status bar may provide useful information on the data such as whether a WEIGHT, FILTER, SPLIT FILE or Unicode mode is in
effect.
The Menu bar

The Menu bar lists 12 pull-down menus, grouping the available SPSS commands.
Some of these have sub-menus.

2. Variable View window
An SPSS data file always has a second sheet called variable view. It shows the metadata
associated with the data.
Metadata is information about the meaning of variables and data values. This is generally known
as the “codebook” but in SPSS it's called the dictionary.
After selecting Variable View, variables are shown as rows instead of columns.
We're now seeing information about our variables and values instead of the data
values themselves.
Columns now represent variable properties such as label, name and type.
Cells contain property values. For example, the width of the fourth
variable last_name is 8.
Viewer or Draft Viewer
The Viewer displays the output of your analyses; for example, it can hold a table with the statistics for all the
variables we chose. The screenshot above shows what it looks like.
Syntax Editor

The Syntax Editor displays syntax files: files of SPSS commands (saved with the .sps extension) that can be edited and run.


Defining a Variable
Defining a variable includes giving it a name, specifying its type, the values the
variable can take (e.g., 1, 2, 3), etc. Without this information, your data will be much
harder to understand and use. Whenever you are working with data, it is important
to make sure the variables in the data are defined so that you (and anyone else
who works with the data) can tell exactly what was measured, and how.
Variables (including any that you insert into the data file in Data View) are defined in
the Variable View within the Data Editor.
Within Variable View, rows correspond to the variables in the data file, and columns
correspond to their defining characteristics. Selecting a cell allows the
corresponding characteristic to be specified or changed, either by over-typing or via
a scrollable list.
For some attributes, when you click on a cell, a small grey box (…) will appear.
Clicking on the grey box leads to a pop-up window where further changes can be
made.
Defining a Variable: Adding a Variable

Data View
In order to define a variable and set its parameters, you need to get some data into SPSS. The easiest way is just
to type it in. Select the Data View (click on the tab at the bottom of the program window), start in the first cell
of an empty column, and work downwards. Let’s set up a variable for age by typing in five different ages (see
screenshot).
Defining a Variable: Adding a Variable

Variable View
If you click on the Variable View tab, you’ll get a screen that looks like this.
Variable Parameters
The column headings: Name

Name: variable name

SPSS Statistics initially assigns default variable names (VAR00001, etc.).
A name should begin with a letter and be no more than 64 characters long (letters, numbers, periods and
underscores; no spaces).
Early versions of SPSS only allowed 8 characters. The 8-character limit may still apply when
importing from or exporting to files used by some other software. Variable names must be
unique.
Try to give meaningful variable names:
Describing the characteristic: for example, age
Linking to the questionnaire: for example, A1Q3
Keep the names consistent across files
The column headings: Type
Type
Internal formats:
Numeric
String (alphanumeric)
Date
Numeric variables:
Numeric measurements
Codes
Definition of the size of the variable
String variables contain words or characters; strings can include numbers but, because these are treated
as characters, mathematical operations cannot be applied to them.
In early versions of SPSS the maximum size of a string variable was 255 characters; current versions allow
up to 32,767 characters.
The input format for date variables must be defined, such as DD/MM/YYYY, MM/DD/YYYY
or MM/DD/YY
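The input format has to be stated because the same typed entry can mean two different dates. The Python sketch below illustrates this with the standard datetime module standing in for SPSS (it is not how SPSS parses dates internally); the helper name parse_entry is hypothetical.

```python
from datetime import date, datetime

# strptime directives for the input formats named above:
# %d = day, %m = month, %Y = four-digit year, %y = two-digit year.
FORMATS = {
    "DD/MM/YYYY": "%d/%m/%Y",
    "MM/DD/YYYY": "%m/%d/%Y",
    "MM/DD/YY": "%m/%d/%y",
}


def parse_entry(raw: str, fmt_name: str) -> date:
    """Interpret a typed date string under one of the input formats (hypothetical helper)."""
    return datetime.strptime(raw, FORMATS[fmt_name]).date()


raw = "03/04/2020"
print(parse_entry(raw, "DD/MM/YYYY"))       # 2020-04-03, i.e. 3 April 2020
print(parse_entry(raw, "MM/DD/YYYY"))       # 2020-03-04, i.e. 4 March 2020
print(parse_entry("03/04/20", "MM/DD/YY"))  # 2020-03-04
```

Entering data under one format and reading it under another silently swaps day and month whenever both are 12 or less, which is why the format belongs in the variable definition.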
The column headings: Width, Decimals

Width: number of digits or characters

This should be large enough to display the largest number or string.
Short string variables (up to 8 characters in length) can be used in some statistical
procedures such as crosstabs, but very few statistical procedures in SPSS can use
long string variables of more than 8 characters. String variables can contain up to
32,767 characters.
While numbers larger than that specified in “width” can still be entered and will be
recorded correctly, it is not possible to input a string longer than that specified in
“width”.

Decimals: number of decimal places (for numeric variables)


The column headings: Label

As the name of the variable is restricted in terms of length and characters it can
contain, it is often useful to have a longer, more meaningful label that will appear in
the tables and graphs that you create.
The label can be up to 256 characters, but some statistical procedures will only
display the first part of a very long label, so it is desirable to keep the first 20 or so
characters unique.
A brief but descriptive definition or display name for the variable. When defined, a
variable's label will appear in the output in place of its name.
For example, if you give the Age variable a label “Age Status”, then “Age Status” will
appear on charts, graphs and tables. To add a label, click inside a cell within the
Label column, and type in the label.
The column headings: Values

Values: value labels


While it is possible to record information as text (as a string variable), it is often easier and
more efficient to enter information as a numeric code. In addition, in an analysis of string
variables (e.g., a frequency table), the values will be listed in strict alphabetical order, which
may not be particularly intuitive, especially if the data are ordinal rather than nominal in
nature. However, we would still want the information to be displayed with meaningful labels
in any tables or graphs. To do this, we must define the value labels.
Value labels can be up to 120 characters, but some statistical procedures will only display the
first part of any very long label – so try to keep the first 16 or so characters unique.
Click on the appropriate cell, then click on the grey … box.
In the Value: box, enter the relevant value code, and in the Label: box enter the corresponding
label.
Click the Add button.
Repeat the process for other values.
Click the OK button when finished.
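The benefit of numeric codes plus value labels can be sketched in a few lines of Python. Here a dictionary plays the role of the value labels defined in the dialog (the satisfaction item, its codes and its labels are made up for illustration): ordering by code keeps the intended order, whereas the same answers stored as text fall into alphabetical order.

```python
from collections import Counter

# Hypothetical value labels for a satisfaction item coded 1-3.
value_labels = {1: "Low", 2: "Medium", 3: "High"}

# Responses entered as numeric codes, as recommended above.
responses = [3, 1, 2, 2, 3, 1, 3]

# Frequency table ordered by code, displayed with the labels.
counts = Counter(responses)
for code in sorted(counts):
    print(f"{code}  {value_labels[code]:<6}  {counts[code]}")

# The same answers stored as text would be listed in strict alphabetical order.
as_text = [value_labels[code] for code in responses]
print(sorted(set(as_text)))  # ['High', 'Low', 'Medium']
```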
Defining Variable Values

The Values attribute within the Variable View allows you to specify text values that are associated with
particular numerical values, and then to view these text values (value labels) within the Data View rather
than their numerical equivalent.
Defining Variable Values

To set this up, switch to the Variable View, and then click on the cell in the Values column corresponding
to the variable you’re interested in (see above). Click on the ellipsis, which will bring up the Value Labels
dialog (see screenshot below). Enter a numerical value for the variable corresponding to the first of the
numerical values you’re using to code your variable; in our case, that’s a zero. Then enter the value for the
Label, which is the text you want to associate with the numerical value (“Female”, in our example). Click Add,
and then either add another value label (and so on), or click OK to finalize your changes.
Display Value Labels

Now if you switch back to the Data View, you can see a bit of magic.

If you hit the toggle button (as above), you’re able to switch between the original numerical coding and
the new value labels. The value labels are much easier to read, which is the advantage of setting them up,
particularly if you have more than 2 possible values.
Tip: In the Data Editor window, you can switch back and forth between viewing
the raw data and the assigned value labels by selecting/deselecting Value Labels
under the View menu (or clicking on the value labels icon in the toolbar).
The column headings: Missing
Missing: missing values
Defines missing values
The values are excluded from some analyses
Options:
Up to 3 discrete missing values (e.g., 999)
A range of missing values plus one discrete missing value
In many data collection exercises, some information may be unavailable. This can be for a
variety of reasons (e.g. refusal, not collected, unknown). In the Data Editor, leaving a cell blank
will be interpreted as meaning that the information for that variable is missing for that
particular case. These cases are termed ‘system missing’. Other values that we want to define
as missing are termed ‘user-defined missing’.
For string variables, a blank cell is treated in the same way as any other valid value and is not
regarded as system missing. However, a blank space can be defined as user missing,
provided that it is a ‘short-string’ variable, i.e., one which has been defined as 8 characters or
fewer.
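For numeric variables, these rules can be expressed as a small classification function. The Python sketch below is a hypothetical model of the options described above (the names MissingSpec and classify, and the example codes -9 and 997-999, are illustrative assumptions, not part of SPSS); it covers numeric variables only, since a blank string cell is a valid value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MissingSpec:
    """User-defined missing values for one numeric variable (hypothetical helper).

    Mirrors the two options above: up to three discrete values, or one
    range (low, high) plus at most one discrete value.
    """
    discrete: Tuple[float, ...] = ()
    value_range: Optional[Tuple[float, float]] = None

    def __post_init__(self) -> None:
        limit = 1 if self.value_range is not None else 3
        if len(self.discrete) > limit:
            raise ValueError(f"at most {limit} discrete missing value(s) allowed")


def classify(cell: str, spec: MissingSpec) -> str:
    """Label a typed numeric cell as 'system missing', 'user missing' or 'valid'."""
    if cell.strip() == "":
        return "system missing"  # a blank numeric cell
    value = float(cell)
    if value in spec.discrete:
        return "user missing"
    if spec.value_range is not None:
        low, high = spec.value_range
        if low <= value <= high:
            return "user missing"
    return "valid"


# Made-up codes: -9 for "refused", 997-999 for other reasons.
spec = MissingSpec(discrete=(-9,), value_range=(997, 999))
for cell in ["42", "", "-9", "998"]:
    print(repr(cell), classify(cell, spec))
```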
The column headings: Columns, Align

Columns: width of column as displayed in Data View

Align: alignment within column as displayed in Data View
Columns and Align only affect the way variables are displayed in Data View. They do
not affect how they are displayed in the output from any data analysis.
Align sets whether the contents of the variable appear on the left, centre or right of
the cell in Data View.
Numeric variables are right-justified by default and string variables left-justified
by default; the defaults are generally adequate.
The column headings: Measure

Measure
Levels of measurement:
Nominal
Ordinal
Interval
Ratio
In SPSS, interval and ratio are designated together as Scale
The default for string variables is Nominal
The default for numeric variables is Scale
The column headings: Role

Role
Can be used to pre-select variables for analysis.
Default is ‘Input’.
Input - a predictor/independent variable
Target - a dependent variable
Both - may be used as independent or dependent
None - no role assigned
Partition - variable used to partition the data into separate samples
(Split is not used in SPSS Statistics: it is only used in SPSS Modeler).
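Taken together, the column headings above make up one dictionary entry per variable. As a summary, the Python sketch below models a Variable View row as a data class. It is a hypothetical model, not an SPSS API: the field names are illustrative, the width of 8, 2 decimals and 8 display columns are assumed defaults, and only the Align, Measure and Role defaults come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VariableDefinition:
    """One row of Variable View, i.e. one dictionary entry (a hypothetical model)."""
    name: str
    var_type: str = "Numeric"        # "Numeric", "String" or "Date"
    width: int = 8                   # assumed default
    decimals: int = 2                # assumed default; numeric variables only
    label: str = ""
    values: Dict[float, str] = field(default_factory=dict)  # value labels
    missing: Tuple[float, ...] = ()  # user-defined missing values
    columns: int = 8                 # display width in Data View (assumed default)
    align: Optional[str] = None      # "Left", "Center" or "Right"
    measure: Optional[str] = None    # "Nominal", "Ordinal" or "Scale"
    role: str = "Input"              # default role, as described above

    def __post_init__(self) -> None:
        is_string = self.var_type == "String"
        # Defaults described above: strings are left-justified and Nominal,
        # numeric variables are right-justified and Scale.
        if self.align is None:
            self.align = "Left" if is_string else "Right"
        if self.measure is None:
            self.measure = "Nominal" if is_string else "Scale"


gender = VariableDefinition("gender", decimals=0, label="Gender of respondent",
                            values={0: "Female", 1: "Male"}, measure="Nominal")
last_name = VariableDefinition("last_name", var_type="String", decimals=0)
print(gender)
print(last_name.align, last_name.measure)  # Left Nominal
```

Only the properties a user changes need to be passed; everything else falls back to a default, much as a freshly typed-in variable does in Variable View.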
Saving the file

Save the file regularly to keep the work that has been done to date:
File > Save
Move to the target directory
Enter a file name (SPSS data files are saved with the .sav extension)
Save
Data Entry into SPSS
General guidelines for data entry
Guideline # 1
Rules & Best Practices for Naming Variables
1. Names can safely be up to 32 characters long (recommended: 8 characters or fewer). Names may
include alphanumeric characters, non-punctuation characters, and a period (.).
2. You can’t have a space in a variable name.
3. Don’t end a variable name with a period.
4. Don’t end a variable name with an underscore.
5. You can use periods and underscores within a variable name.
6. You can use upper and lower case, and a mixture thereof, within a variable name.
7. You can’t use SPSS reserved keywords as a variable name (i.e., you can’t use ALL, AND, BY, EQ,
GE, GT, LE, LT, NE, NOT, OR, TO or WITH). These are used in SPSS syntax, and if they were permitted,
the software would not be able to distinguish between a keyword and a variable.
8. Each variable name must be unique; duplication is not allowed. Variable names are not case sensitive,
so the names NEWVAR, NewVar, and newvar are all considered identical.
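The rules above are mechanical enough to check automatically before typing names into Variable View. The Python sketch below is a simplified, hypothetical checker (check_name is an illustrative name, not an SPSS function): it applies the 64-character limit from the Name column, the reserved keywords from rule 7 and the case-insensitive uniqueness of rule 8, but it does not test every character restriction SPSS enforces.

```python
# Reserved keywords from rule 7 above.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE",
            "LT", "NE", "NOT", "OR", "TO", "WITH"}


def check_name(name: str, existing=()) -> list:
    """Return the naming rules that `name` breaks; an empty list means it passes."""
    problems = []
    if not name[:1].isalpha():
        problems.append("must begin with a letter")
    if len(name) > 64:
        problems.append("longer than 64 characters")
    if " " in name:
        problems.append("contains a space")
    if name.endswith("."):
        problems.append("ends with a period")
    if name.endswith("_"):
        problems.append("ends with an underscore")
    if name.upper() in RESERVED:
        problems.append("is a reserved keyword")
    # Names are not case sensitive, so compare in upper case.
    if name.upper() in {other.upper() for other in existing}:
        problems.append("duplicates an existing name")
    return problems


print(check_name("age"))                          # []
print(check_name("last name"))                    # ['contains a space']
print(check_name("by"))                           # ['is a reserved keyword']
print(check_name("newvar", existing=["NEWVAR"]))  # ['duplicates an existing name']
```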
General guidelines

Guideline # 2

Encode categorical variables. Convert letters and words to numbers.

Guideline # 3

Give each case a unique, sequential case number (ID). Place this ID number in the first
column on the left.

Guideline # 4

Each variable should be in its own column.
Do not combine variables in one column.
It is recommended to use 0/1 for 2 groups, with 0 as the reference group.
General guidelines

Guideline # 5

All data for a project should be in one spreadsheet. Do not include graphs or
summary statistics in the spreadsheet.

Guideline # 6

Each patient should be entered on a single line or row. Do not copy a patient’s
information to another row to perform subgroup analysis.
General guidelines

Guideline # 7

For yes/no questions, enter “0” for no and “1” for yes.

Do not leave blanks for no.

Do not enter “?”, “*”, or “NA” for missing data, because this causes the statistical
program to treat the variable as a string variable.

String variables cannot be used for any arithmetic computation.
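Why a single “NA” matters can be sketched with a simplified type check in Python. The rule used here (a column counts as numeric only if every non-blank cell is a number) is an assumption that approximates how imported columns are typed, not SPSS's documented algorithm, and infer_type is a hypothetical name.

```python
def infer_type(cells):
    """Return 'numeric' if every non-blank cell parses as a number, else 'string'.

    A simplified, hypothetical model of the guideline above, not SPSS's own
    import logic. Blank cells are treated as missing and are skipped.
    """
    for cell in cells:
        text = cell.strip()
        if text == "":
            continue
        try:
            float(text)
        except ValueError:
            return "string"
    return "numeric"


good = ["1", "0", "1", "", "1"]    # yes = 1, no = 0, blank = answer unknown
bad = ["1", "0", "1", "NA", "1"]   # "NA" typed for missing data

print(infer_type(good))  # numeric
print(infer_type(bad))   # string

# Because the 0/1 column is numeric, arithmetic works: the mean is the share of "yes".
answered = [float(c) for c in good if c.strip()]
print(sum(answered) / len(answered))  # 0.75
```

The 0/1 coding from this guideline pays off here: the mean of a 0/1 variable is simply the proportion of “yes” answers.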


General guidelines

Guideline # 8

Put ordinal variables into one column if they are mutually exclusive

Avoid (one column per level):

Pain: Mild | Moderate | Severe
1 | 0 | 0
0 | 1 | 0
0 | 0 | 1

Preferred (one column, 1 = Mild, 2 = Moderate, 3 = Severe):

Pain
1
2
3
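Converting data already entered in the “Avoid” layout into the preferred single column is mechanical. The Python sketch below (hypothetical names, with rows copied from the example table) shows the idea and refuses any row where the levels are not mutually exclusive.

```python
# Pain levels in their ordinal order; codes follow the "Preferred" layout above.
LEVELS = ["Mild", "Moderate", "Severe"]
CODES = {level: i + 1 for i, level in enumerate(LEVELS)}  # Mild=1, Moderate=2, Severe=3

# Rows in the "Avoid" layout: one 0/1 column per level.
avoid_rows = [
    {"Mild": 1, "Moderate": 0, "Severe": 0},
    {"Mild": 0, "Moderate": 1, "Severe": 0},
    {"Mild": 0, "Moderate": 0, "Severe": 1},
]


def to_single_column(row):
    """Collapse mutually exclusive 0/1 columns into one ordinal code."""
    ticked = [level for level in LEVELS if row[level] == 1]
    if len(ticked) != 1:
        # Not mutually exclusive (or nothing ticked): cannot be coded safely.
        raise ValueError(f"expected exactly one level, got {ticked}")
    return CODES[ticked[0]]


pain = [to_single_column(row) for row in avoid_rows]
print(pain)  # [1, 2, 3]
```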
Guideline # 9

Do not make columns wider than 8 characters, unless absolutely essential.
