
Computing I (1) Melkamu

This document provides information about SPSS and Minitab statistical software. It discusses launching and closing SPSS, the various windows in SPSS such as the Data Editor window, how to enter and work with data, reading data files, customizing outputs, descriptive analysis, graphs, and inferential statistics. It also provides information on Minitab windows, data types, opening and saving projects, entering data, sorting data, and exploratory data analysis.


BAHIR DAR UNIVERSITY

Department of Statistics

Bachelor of Science Degree in Statistics

Statistical computing I (Stat3021)

SPSS and MiniTab Statistical Software

Melkamu Ayana (Assistant Professor of Biostatistics)

Email: [email protected] or [email protected]


Contents

1 SPSS
1.1 Launching and Closing SPSS
1.2 SPSS windows
1.3 Working with the Data and Variable View Window to enter and save data
1.3.1 Data View Window
1.3.2 Variable View Window
1.3.3 Entering Data
1.3.4 Saving Data Files
1.4 Reading Data Files
1.4.1 Reading SPSS Data Files
1.4.2 Reading in a Microsoft Excel File
1.4.3 Reading Text Files
1.5 Customizing SPSS Outputs and Reporting
1.5.1 Editing Data
1.5.2 Sorting Cases/Variables
1.5.3 Merge Files
1.5.4 Split File
1.5.5 Select Cases
1.5.6 Computing Variables
1.5.7 Count Occurrences of Values within Cases
1.5.8 Recoding Variables
1.6 Descriptive Data Analysis
1.6.1 Summary Statistics Using Frequencies
1.7 Creating Graphs
1.7.1 Editing Charts
1.7.2 Exploratory Data Analysis
1.7.3 Analysis of cross-classifications Measure of Associations
1.8 Inferential Statistics
1.8.1 Testing of Hypothesis About One Population Mean
1.8.2 Correlation and Linear Regression

2 MINITAB
2.1 MINITAB Windows
2.2 Data Types
2.3 Basic Computer Technical Words/Jargon
2.4 Opening and exiting MINITAB Window
2.5 Entering Data and Saving a Project
2.5.1 Entering Data into Worksheet Window
2.6 Working on Data
2.6.1 Changing Data Type
2.6.2 Sorting and Ranking Data
2.6.3 Displaying Data in the Session Window
2.7 Saving a Project
2.8 Data Organization
2.9 Exploratory Data Analysis (EDA)

Preface

The field of statistics deals with the collection, presentation, analysis, and use of data to
make decisions, solve problems, and design products and processes. Performing these steps,
especially organizing, presenting, and analyzing data, generally requires the help of a
computer. For this purpose, different statistical software packages have been developed.
SPSS, MINITAB, STATA, R, and SAS are among the best-known and most commonly used
statistical software packages.
A statistical software package is a computer program or set of programs that provides
many different statistical procedures within a unified framework and is able to produce
meaningful information from the data we input. Such packages allow us to run complex
analyses quickly, understand the behavior of the data, extract useful information, and
so on.
In the harmonized curriculum of statistical software for undergraduate students in the
statistics program, Statistical Computing I covers SPSS and MINITAB, whereas Statistical
Computing II covers R and SAS. However, for the sake of completeness, to support a better
understanding of Statistical Computing II, and since SAS is the most widely used of these
packages, an introductory section on SAS is included in the last part of this module. The
details will be discussed in Statistical Computing II.

Statistical computing I

1 SPSS

SPSS stands for Statistical Package for the Social Sciences. It is one of the most popular
statistical packages and provides a powerful statistical analysis and data management
system in a graphical user interface environment. It can be used to analyze data from
surveys, experiments, observations, and so on. It can perform a variety of data analysis
and presentation functions, including:

• Descriptive statistics, such as frequencies, charts, graphs, plots, and summaries; and

• Inferential and multivariate statistical procedures, such as chi-square tests, t-tests,
correlation, regression, analysis of variance (ANOVA), and non-parametric tests.

SPSS is particularly well suited for survey research, though it is by no means limited to
this area.

1.1 Launching and Closing SPSS

One way to launch SPSS is from the Start button at the bottom of the Windows desktop:
click Start, then Programs (All Programs), and finally the SPSS icon (Start =⇒ All
Programs =⇒ SPSS icon). Alternatively, launch it from the SPSS shortcut on the desktop.
When you have finished your analysis, you can close SPSS with the red cross ... at the top
corner of the SPSS window, but remember to save your work first; otherwise you will not be
able to reuse your analysis.

1.2 SPSS windows

The Data Editor window can be displayed in one of two views: Data View or Variable
View. The Data View displays the contents of the data file in the form of a spreadsheet.
The Variable View defines all variables in the data file. To switch from one view to
the other, click the appropriate tab (Data View or Variable View) at the bottom of the
Data Editor window. When SPSS opens, its window looks approximately like one of the two
views shown in the picture below.

Figure 1: The SPSS variable view (A) and data view (B) window

SPSS has four windows:

• Data editor

• Output viewer

• Chart editor

• Syntax editor

Each window is associated with a particular SPSS file type and has its own menus, toolbar,
and status bar. The common menus are File, Edit, View, Analyze, Graphs, Window, and Help.
The toolbar provides quick and easy access to common tasks, and a brief description of
each tool appears when you place the mouse pointer on it. The status bar at the bottom of
each SPSS window tells you what SPSS is currently doing; when SPSS Processor is ready
appears in the status bar, SPSS is ready to receive your instructions.

• Data editor
This window displays the contents of the current (working) data file. You can
create new data files or modify existing ones with the Data Editor. The Data
Editor window opens automatically when you start an SPSS session. Its most
important components are the menu bar, toolbar, and status bar, as shown in the
picture below.

Figure 2: Data editor window

The menu bar provides easy access to most SPSS features. It consists of several
drop-down menus, such as File, Edit, View, and Data, as shown below.

Figure 3: Menu bar in SPSS window and their corresponding function

SPSS displays a toolbar below the menu bar on the Data Editor window and it
provides quick and easy access to many useful features that you may use frequently.
Clicking once on any of these buttons performs an action, such as
opening a data file or selecting a chart for editing. To determine the function of
a tool, place the mouse pointer over the corresponding button, but don’t click the
mouse button. SPSS displays a brief description of the tool in the Status Bar.

Figure 4: Toolbar in the SPSS window and what each icon stands for

The status bar at the bottom of each SPSS window apprises the user of the stage
of operations. In particular, for each procedure you run, a case counter indicates
the number of cases processed so far. There are also messages about the selection
of specified subsets of the data set (filter status). The message Weight On indicates
that a weight variable is being used to weight cases for analysis. When the message
SPSS Processor is ready appears in the status bar, SPSS is ready to receive your
instructions.

• Output Viewer
The results of any statistical procedures you run, along with other text output, are
displayed in this window. Tables, summary statistics, charts, and similar results
appear in the Output Viewer. The Output Viewer opens automatically the first time you
run a procedure that generates output; it is not accessible until output has been
generated. In this window, you can easily navigate to whichever part of the output you
want to see. The Viewer window is divided into two panes: the outline pane (left side)
contains an outline of all the information stored in the Viewer, and the contents pane
(right side) contains statistical tables, charts, and text output.

Figure 5: The output viewer window

• Chart editor
This window is used to edit charts and plots. It is displayed only after SPSS has
produced a chart. You can use it to change colors, select different fonts or font sizes,
rotate axes, change the chart type, and so on.

Figure 6: The chart editor window

• Syntax Editor
Most SPSS commands are accessible from the SPSS menus and dialog boxes. However,
some commands and options are available only through the SPSS command language;
for these, use the Syntax Editor window. You can also use this window if you prefer
to run SPSS commands instead of clicking through the pull-down menus. Each window
in SPSS has its own menu bar with menu selections appropriate for that window type.

1.3 Working with the Data and Variable View Window to enter and save data

1.3.1 Data View Window

The Data View window is a grid whose rows represent subjects (or cases) and whose
columns contain the values of the variables (e.g., gender, salary, age) for each subject. Each
cell of the grid, therefore, will usually contain the score of one particular subject on
one particular variable. For example, the salaries of employees in a company can be
presented in a column, and then each employee is a case.

Figure 7: The data view window

The cell is the intersection of a case and a variable. Cells contain only data values;
unlike spreadsheet programs such as Excel, cells in the Data Editor cannot contain formulas.

The data file is rectangular. The dimensions of the data file are determined by the
number of cases and variables. Initially, every column in the Data Editor has the heading
var, and all the cells are empty.

1.3.2 Variable View Window

The Variable View contains descriptions of the attributes of each variable in the data
file. In the Variable View, rows are variables and columns are variable attributes. In
this table you can add or delete variables and modify variable attributes, including the
variable name, data type, number of digits, and so on. Creating a new SPSS Statistics data
file in the Data Editor consists of two stages: defining the variables and entering the
data. Defining the variables involves several steps and requires careful planning. Once
the variables have been defined, the data can be added. First, assign variable names
based on your research questions. If you do not assign names, SPSS Statistics provides
default names (VAR00001, VAR00002, and so on) that are not descriptive. Now follow the
instructions below to define the variables.

Variables: To define a variable, make the Variable View the active window (click the
Variable View tab at the bottom of the Data Editor window). If necessary, assign labels
to values to help all users of the file better understand the data. Variable names must
follow these rules: the name must begin with a letter and cannot end with a period; in
current versions the name cannot exceed 64 bytes (versions before SPSS 12 limited names
to 8 characters); names that end with an underscore should be avoided; and blanks and
special characters (for example, !, ?, ', and *) cannot be used.
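The naming rules above can be checked mechanically. The following is a minimal Python sketch, not part of SPSS; the function name check_variable_name is hypothetical, and it assumes (as in IBM's SPSS documentation) that the characters after the first letter may be letters, digits, periods, or the symbols @, #, _ and $. It does not check any other restrictions SPSS may impose.

```python
# Characters allowed after the first letter (assumption based on IBM's SPSS naming rules).
ALLOWED_SYMBOLS = set(".@#_$")
MAX_NAME_BYTES = 64  # limit in current SPSS versions; older versions allowed 8 characters


def check_variable_name(name):
    """Return (errors, warnings) for a proposed SPSS variable name."""
    errors, warnings = [], []
    if not name:
        return ["name is empty"], warnings
    if not name[0].isalpha():
        errors.append("must begin with a letter")
    for ch in name[1:]:
        if not (ch.isalnum() or ch in ALLOWED_SYMBOLS):
            errors.append(f"character {ch!r} is not allowed")
    if name.endswith("."):
        errors.append("cannot end with a period")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        errors.append(f"longer than {MAX_NAME_BYTES} bytes")
    if name.endswith("_"):
        warnings.append("names ending with an underscore should be avoided")
    return errors, warnings


for candidate in ["gender", "1age", "income.", "my var", "score_"]:
    print(candidate, check_variable_name(candidate))
```

Errors correspond to the "must" and "cannot" rules; the trailing underscore is reported only as a warning because the text says such names "should be avoided" rather than forbidden.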

Figure 8: Creating the variable in the variable view window

Enter the new variable name in the Name column of any blank row. For example,
enter the name gender in the first row. After you enter the name, default attributes
(Type, Width,...) are assigned automatically. If you then click the Type cell,
the Variable Type dialog box appears.
Variable type:

• Numeric, Comma and Dot: you can enter values with any number of decimal
positions, but the Data Editor displays only the defined number of decimal places.

• String: all values are right-padded to the maximum width.

• Date: you can use slashes, dashes, spaces, commas, or periods as delimiters
between day, month, and year (for example, dd/mm/yy).

• Time: you can use colons, periods, or spaces as delimiters between hours, minutes,
and seconds.

Figure 9: The variable type sub dialog box

Value Label: You can assign a descriptive value label to each value of a variable; each
label can be up to 60 characters long. To define the possible values of a variable (for
instance, for gender, M for male and F for female, or 1 for female and 2 for male), click
the Values cell in the row for the variable, and then click the button in the cell. In the
dialog box that opens, enter the value in the Value text box, enter the label in the Label
text box, and then click Add.

Figure 10: Value label sub dialog box
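Value labels act as a lookup table from stored codes to descriptive text: SPSS keeps the code in the data file and can display the label instead (for example, in output tables). The Python sketch below imitates that lookup for a simple frequency count, using the 1 = female, 2 = male coding from the example; the names GENDER_LABELS and labeled_frequencies are hypothetical, and a code without a label is simply reported as itself.

```python
from collections import Counter

# Hypothetical value labels for a numeric gender variable.
GENDER_LABELS = {1: "Female", 2: "Male"}


def labeled_frequencies(values, labels):
    """Count each stored code and report it under its value label, if one exists."""
    counts = Counter(values)
    return {labels.get(code, code): n for code, n in sorted(counts.items())}


print(labeled_frequencies([1, 2, 2, 1, 2, 3], GENDER_LABELS))
# {'Female': 2, 'Male': 3, 3: 1}
```

The unlabeled code 3 appears as the raw value, which is a quick way to spot codes that were entered but never defined.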

Missing Values: In many situations, data files do not have complete data on all
variables; that is, there are missing values. You need to tell SPSS when you have
missing values so that all computations are performed correctly. In SPSS, there are
two kinds of missing values: system-missing and user-defined missing.

• System-missing values are those that SPSS automatically treats as missing.
The most common case is a blank cell in the data file. For example, a value for a
variable may not be entered in the data file if the information was not provided.
When SPSS reads this variable, it reads a blank and treats the value as missing. Any
further computations involving this variable proceed without the missing information
(for example, the average is computed without the missing value).

• User-defined missing values are those that the user specifically tells SPSS to
treat as missing. Rather than leaving a blank in the data file, a code is often entered
to represent missing data. For instance, if gender is unknown for some subjects in our
data set, we could use the number 9 to represent the cases with missing information on
this variable. You need to tell SPSS that 9 is to be treated as a missing value;
otherwise it will treat it as valid. To do so, select the Variable View and click the
Missing cell for the variable (gender). Suppose you define the missing values as
displayed in the Missing Values dialog box below.

Figure 11: The Missing Values dialog box

With this definition of the missing values for the variable gender, SPSS will treat 9 as a
missing value of the variable and not include it in any computations involving the gender
variable.
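The effect of declaring a user-missing code can be seen in a small calculation. The Python sketch below is an illustration, not SPSS code: None stands in for a system-missing (blank) value, and the hypothetical age data use 99 as a user-defined missing code. If 99 is not declared as missing, it is averaged in as if it were a real age.

```python
def valid_values(values, user_missing=()):
    """Drop system-missing values (None) and any user-defined missing codes."""
    missing_codes = set(user_missing)
    return [v for v in values if v is not None and v not in missing_codes]


def mean_of_valid(values, user_missing=()):
    """Mean over the valid values only, or None if no valid values remain."""
    valid = valid_values(values, user_missing)
    return sum(valid) / len(valid) if valid else None


ages = [23, None, 31, 99, 26]  # hypothetical data: None = blank, 99 = "age unknown"
print(mean_of_valid(ages, user_missing=[99]))  # 99 excluded: (23 + 31 + 26) / 3
print(mean_of_valid(ages))                     # 99 wrongly counted as an age: 44.75
```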
Measures: Recall that, based on the scale of measurement, data can be nominal, ordinal,
interval, or ratio. SPSS, however, does not distinguish between the interval and ratio
levels of measurement; both of these quantitative types are grouped together as
"Scale". As a result, each variable should be assigned one of three levels (Nominal,
Ordinal, or Scale) by clicking the Measure cell in the Variable View window.

Figure 12: Assigning data level

Repeat these steps for all the remaining variables.
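The attributes defined in the Variable View can be summarized as one record per variable. The Python sketch below is a hypothetical model of such a record (the class VariableDefinition and its field names are illustrative, not an SPSS API); its measure property shows how the four classical scales collapse into SPSS's three Measure settings.

```python
from dataclasses import dataclass, field

# Interval and ratio both map to SPSS's "Scale" level.
MEASURE_FOR_SCALE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Scale",
    "ratio": "Scale",
}


@dataclass
class VariableDefinition:
    """One row of a hypothetical Variable View: a variable and its attributes."""
    name: str
    var_type: str = "Numeric"
    value_labels: dict = field(default_factory=dict)
    missing: tuple = ()
    scale: str = "ratio"  # classical level of measurement

    @property
    def measure(self):
        """SPSS Measure setting implied by the classical scale."""
        return MEASURE_FOR_SCALE[self.scale.lower()]


gender = VariableDefinition("gender", value_labels={1: "Female", 2: "Male"},
                            missing=(9,), scale="nominal")
age = VariableDefinition("age", scale="ratio")
print(gender.measure, age.measure)  # Nominal Scale
```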

1.3.3 Entering Data

After defining the variables, you can enter data for each variable. Switch from the
Variable View window to the Data View window, where each column represents a variable
and each row represents a case (observation). You are now ready to enter the values
of the variables from the records. Clicking a cell highlights it (the active cell), and
its contents appear in the cell editor. You can enter the data in any order. Data
values are not recorded until you press Enter or select another cell. Unlike spreadsheet
programs, cells in the Data Editor cannot contain formulas. Note: if a variable is
defined as numeric, SPSS Statistics accepts only numeric values (digits 0-9, with a
decimal point or minus sign where needed), whereas a string variable accepts any
characters.

1.3.4 Saving Data Files

To save a new SPSS data file, make the Data Editor the active window and choose File =⇒
Save As... from the menus. The following Save Data As dialog box is displayed.

Figure 13: The Save Data As dialog box

To save the data as an SPSS data file, type a file name in the File name box and click
Save. By default, the file is saved in the SPSS directory, but you can save it in any
directory (for example, on the desktop).
Activity One
1.1 Below is an example of a questionnaire. Suppose you have information from several
such questionnaires. Prepare a data entry format that will help you enter your data
into SPSS.
1. ID of participant ________
2. Sex: – Male – Female
3. Age in years ________
4. Marital status: – Single – Married – Divorced – Widowed
5. Family income in Birr ________
6. How frequently do you eat out in a week?
– Once – Twice – Three times – More than three times
7. How much do you spend on eating out at one time?
– Below 100 – 100-200 – 200-300 – More than 300
Save your work as trial 1.sav on the desktop.
1.2 The following small data set consists of four variables, namely Agecat, Gender,
Accid, and Pop. Agecat and Gender are categorical variables coded as follows: Agecat:
1 = 'Under 21', 2 = '21-25', 3 = '26-30'; Gender: 0 = 'Female', 1 = 'Male'. Accid and
Pop are numeric.
After defining these variables in the Variable View of the Data Editor window, enter the
following data for the variables Agecat, Gender, Accid, and Pop, respectively. Your data
should appear as given below.
1 1 57997 198522
2 1 57113 203200
3 1 54123 200744
1 0 63936 187791
2 0 64835 195714
3 0 66804 208239
Save your work as trial 2.sav on the desktop.

1.4 Reading Data Files

SPSS for Windows can read different types of data files. To read data files, click on File
in the menu bar, then on Open and then on Data. The Open File dialog box is displayed
as follows.

Figure 14: The open file dialog box

1.4.1 Reading SPSS Data Files

SPSS data files are easy to identify because, by default, each file name ends with the
.sav extension. To read an SPSS data file, choose File =⇒ Open from the SPSS menu. This
opens the Open File dialog box. Select the data file you wish to open by clicking it.
Finally, click OK.
To open a data file, including the files installed with the SPSS software:

• From the menus choose: File =⇒ Open =⇒ Data...

• Browse to and open your file (.sav) =⇒ OK

1.4.2 Reading in a Microsoft Excel File

Before reading an Excel file, prepare it in Excel: use only one row of column headings
(in the first row), make the column headings unique, and put all the rows of data on the
same worksheet. (If you have several sets of data with the same columns that you intend
to analyze together, include a column indicating which data set each row comes from,
rather than keeping them as separate worksheets.) Once the Excel file is ready, you can
read it into SPSS: File =⇒ Open, select Excel under "Files of type," and locate the file.
Note: Since an Excel file can have multiple worksheets, by default the Data Editor reads
the first worksheet. The dialog box that then opens asks whether the Excel file has
variable names in the first row of the data set. If it does, check this option; SPSS
then uses these names for the new variables. You can also select which worksheet to
read if there are multiple worksheets in the file. Then click OK to convert your data
from Excel format to SPSS format. In general, you can read data from applications such
as Microsoft Excel as follows:

• From the menus choose: File =⇒ Open =⇒ Data...

• Select Excel (*.xls) as the file type you want to view.

• Open the file (.xls) =⇒ OK

1.4.3 Reading Text Files

SPSS for Windows can also read raw data files that are in text format. Text data files
are usually identified by the .txt extension. These files contain only the data values,
with no additional information about the file (such as variable definitions). For
example, the data for each subject may be recorded as three values separated by tabs.
To read a text data file, choose File =⇒ Read Text Data. The Open File dialog box opens;
select your file and click OK. The Text Import Wizard dialog box then opens and helps
you transfer your data from the text file into the Data Editor window. The wizard uses
six steps to open any text file.
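Outside SPSS, the same layout (variable names in the first row, one case per line, values separated by tabs) can be read with Python's standard csv module. The sketch below is an illustration only: it embeds the Activity 1.2 data as tab-delimited text rather than reading a file from disk, and it assumes every value is an integer.

```python
import csv
import io

# Activity 1.2 data as tab-delimited text: header row, then one case per line.
TEXT = (
    "Agecat\tGender\tAccid\tPop\n"
    "1\t1\t57997\t198522\n"
    "2\t1\t57113\t203200\n"
    "3\t1\t54123\t200744\n"
    "1\t0\t63936\t187791\n"
    "2\t0\t64835\t195714\n"
    "3\t0\t66804\t208239\n"
)


def read_tab_delimited(stream):
    """Read tab-separated cases, using the first row as variable names."""
    reader = csv.DictReader(stream, delimiter="\t")
    # Assumption: every column is numeric, so each value is converted to int.
    return [{name: int(value) for name, value in row.items()} for row in reader]


cases = read_tab_delimited(io.StringIO(TEXT))
print(len(cases), cases[0])
```

To read a real file instead, pass `open("data.txt", newline="")` in place of the StringIO object (the file name is hypothetical). Choosing the delimiter and declaring that the first row holds variable names are the same decisions the Text Import Wizard asks you to make.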

1.5 Customizing SPSS Outputs and Reporting

The data file as displayed in the Data View spreadsheet is not always organized in the
appropriate format for a particular use. The Data drop-down menu provides procedures
for reorganizing the structure of a data file. The first four command options in the
Data drop-down menu are concerned with editing or moving within the Data View
spreadsheet.

1.5.1 Editing Data

To replace an old value with a new one, click the cell, type the new value, and
press Enter. You can also use the Undo command in the Edit menu to reverse any action
you have just performed; for example, Undo removes a value you have just entered in
the Data Editor window.

• Adding New Cases/Variables: You may want to add new variables or cases
to an existing data set. To insert a new variable, click the variable name to
select the column where the new variable is to be inserted. To insert a case, select
the row where the case is to be added. The selected row or column is
highlighted. Next, use the Insert Variable or Insert Cases option in the Data menu of
the Data Editor, as needed; this produces an empty column or row. Note that the
existing cases and variables are shifted down or to the right, respectively.

Figure 15: Inserting or adding cases

Figure 16: Inserting or adding variables

• Deleting Cases/Variables: To delete a case, click the case number that you
wish to delete, then click Edit in the menu bar and then Clear. The
selected case is deleted and the rows below shift upward. To delete a
variable, click the name of the variable that you wish to delete, then click Edit in
the menu bar and then Clear. The selected variable is deleted.

1.5.2 Sorting Cases/Variables

• Sorting Cases: You can sort the cases (rows) of the active dataset based on
the values of one or more sorting variables, in ascending or descending order. If you
select multiple sort variables, cases are sorted by each variable within categories of
the preceding variable on the Sort list. For example, if you select gender as the first
sorting variable and minority as the second, cases will be sorted by minority
classification within each gender category.

To sort cases, from the menus choose: Data =⇒ Sort Cases... =⇒ select one
or more sorting variables. If you want to save the sorted data directly to a file,
check Save file with sorted data and click File... to specify where you want to save
the file.

Figure 17: Sorting cases dialog boxes

• Sorting Variables: You can sort the variables in the active dataset based on the
values of any of the variable attributes (e.g., variable name, data type, measurement
level), including custom variable attributes. Variables can be sorted in ascending
or descending order.
To sort variables, from the menus in Variable View or Data View choose: Data
=⇒ Sort Variables =⇒ select the attribute you want to use to sort variables
=⇒ select the sort order (ascending or descending).

Figure 18: Sorting variable dialog boxes
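Sorting by several variables, each with its own direction, can be reproduced with repeated stable sorts applied from the last sort variable to the first. The Python sketch below illustrates both Sort Cases and Sort Variables on small hypothetical data; the function names sort_cases and sort_variables are illustrative and are not SPSS commands.

```python
def sort_cases(cases, sort_by):
    """Sort cases (dicts) by (variable, ascending) pairs; the first pair is the primary key.

    Python's sort is stable, so sorting by the last key first and the primary key
    last leaves cases ordered by each variable within categories of the previous one.
    """
    result = list(cases)
    for variable, ascending in reversed(sort_by):
        result.sort(key=lambda case: case[variable], reverse=not ascending)
    return result


def sort_variables(variables, attribute, ascending=True):
    """Sort variable definitions (dicts) by one attribute, e.g. 'name' or 'type'."""
    return sorted(variables, key=lambda var: var[attribute], reverse=not ascending)


cases = [
    {"id": 1, "gender": "M", "minority": 0},
    {"id": 2, "gender": "F", "minority": 1},
    {"id": 3, "gender": "M", "minority": 1},
    {"id": 4, "gender": "F", "minority": 0},
]
by_gender_then_minority = sort_cases(cases, [("gender", True), ("minority", True)])
print([c["id"] for c in by_gender_then_minority])  # [4, 2, 1, 3]

variables = [{"name": "salary", "type": "Numeric"},
             {"name": "gender", "type": "String"},
             {"name": "age", "type": "Numeric"}]
print([v["name"] for v in sort_variables(variables, "name")])  # ['age', 'gender', 'salary']
```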

1.5.3 Merge Files

Merging data files is useful when records are stored in separate files and need to be
combined later. Merge Files lets you either Add Cases... or Add Variables... to an
existing data file. The other file can contain either the same variables but different
cases (to add cases) or different variables for the same cases (to add variables). When
adding variables, both files must contain a common identifier (most often an ID
variable) so that the records for each case can be matched. You can therefore merge data
from two files in two different ways: by adding new observations (cases) or by adding
new variables.

• Adding new observations (cases): Assume your original data file includes data
from a survey of three places. Now survey data from a fourth place (say Y) are
available, and you want to expand the original file by adding the data from the new
survey. However, you do not want to add or change the variables in the existing
data file. Essentially, you are appending new observations (cases) to the existing
variables in a file that does not yet have data on Y.
To merge the active dataset with another open dataset or SPSS Statistics data file
containing the same variables but different cases, open at least one of the data files
that you want to merge. If you have multiple datasets open, make one of the datasets
that you want to merge the active dataset; the cases from this file will appear first
in the new, merged data file. From the menus choose: Data =⇒ Merge Files =⇒ Add
Cases... =⇒ select the dataset or external SPSS Statistics data file to merge with
the active dataset, and then continue to the next dialog box.

Figure 19: Merging cases or Variables Dialog Box

The new dialog box contains information on all the variables in the original data
file [marked with the suffix (*)] and in the file from which you want to add data
[marked with the suffix (+)].

Figure 20: Merging cases Dialog Box

Variables that have the same name in both files are automatically matched and placed
in the box "Variables in New Active Dataset." Other variable names may not
correspond. For example, gender in the working data file (*) corresponds to sex in
the file from which you want to add data (+). You will have to tell SPSS that gender
and sex are a pair. Before doing so, you might want to change the name of the variable
sex to gender: click the variable name sex =⇒ click the Rename button =⇒ rename
the variable =⇒ click Continue. Then make the pair: click gender, then hold down the
Ctrl key and, keeping it pressed, click sex, and click the Pair button. This moves the
pair into the box on the right side, "Variables in New Active Dataset." Next, remove
any variables that you do not want from the Variables in New Active Dataset list.
Finally, click OK.

Figure 21: Making pair dialog box

• Adding new variables: Add Variables merges the active dataset with another
open dataset or external SPSS Statistics data file that contains the same cases
(rows) but different variables (columns). For example, you might want to merge a
data file that contains pre-test results with one that contains post-test results. For
this you need a key variable (most often ID) to match observations across the files,
and the cases must be sorted by the key variable in the same order in both datasets.
To merge files by adding variables, from the menus choose: Data =⇒ Merge Files =⇒
Add Variables... =⇒ select the dataset or external SPSS Statistics data file to merge
with the active dataset. The suffix (*) indicates that a variable is from the original
data file, the file to which we are adding new variables. The suffix (+) indicates that
the variable is from the data file from which we are taking variables.
To select key variables: select the variables from the external file (+) in the
Excluded Variables list =⇒ select Match cases on key variables in sorted files =⇒
add the variables to the Key Variables list. Then click OK.

Figure 22: Selecting key variable dialog box

Note: The key variables must exist in both the active dataset and the other
dataset. These are the variables whose correspondence dictates the merge. If the
key variable has a different name in one file than in the other, first rename it so
that the names match. Move any variables you do not want in your analysis from the
box "New Active Dataset" into the box "Excluded Variables," and then click OK.
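The two kinds of merge can be sketched in a few lines of Python to show what happens to the rows. This is a simplified illustration under stated assumptions, not how SPSS implements Merge Files: add_cases keeps only the variables of the active dataset and renames paired variables, while add_variables matches cases one-to-one on a key through a dictionary lookup (which, unlike SPSS, does not need sorted files) and keeps only the cases of the active dataset. All function names and data are hypothetical.

```python
def add_cases(active, other, rename=None):
    """Append the cases of `other` below those of `active` (like Add Cases).

    `rename` maps a variable name in `other` to its partner in `active`,
    e.g. {"sex": "gender"}. Variables a case lacks become None (missing).
    """
    rename = rename or {}
    variables = list(active[0]) if active else []
    merged = [dict(case) for case in active]
    for case in other:
        renamed = {rename.get(name, name): value for name, value in case.items()}
        merged.append({var: renamed.get(var) for var in variables})
    return merged


def add_variables(active, other, key):
    """Add the variables of `other` to matching cases of `active` (like Add Variables).

    Cases are matched one-to-one on `key`. Variables whose names already exist in
    `active` are excluded; unmatched active cases get None for the new variables.
    """
    existing = set(active[0]) if active else set()
    new_vars = [var for var in (other[0] if other else {}) if var not in existing]
    lookup = {case[key]: case for case in other}
    merged = []
    for case in active:
        match = lookup.get(case[key], {})
        merged.append({**case, **{var: match.get(var) for var in new_vars}})
    return merged


survey = [{"id": 1, "gender": "F"}, {"id": 2, "gender": "M"}]
place_y = [{"id": 3, "sex": "M"}]
print(add_cases(survey, place_y, rename={"sex": "gender"}))

pre = [{"id": 1, "pre": 55}, {"id": 2, "pre": 61}]
post = [{"id": 2, "post": 70}, {"id": 1, "post": 68}]
print(add_variables(pre, post, key="id"))
```

The rename argument plays the role of pairing gender with sex, and the key argument plays the role of the Key Variables list.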

1.5.4 Split File

Split File separates the data file into groups for analysis based on the values of one
or more grouping variables. If you select multiple grouping variables, cases are grouped
by each variable within categories of the preceding variable on the Groups Based on
list. For example, if you select gender as the first grouping variable and minority as
the second grouping variable, cases will be grouped by minority classification within
each gender category. Cases should be sorted by the values of the grouping variables, in
the same order in which the variables are listed in the Groups Based on list.

– Compare groups: Split-file groups are presented together for comparison
purposes. For pivot tables, a single pivot table is created and each split-file
variable can be moved between table dimensions. For charts, a separate chart
is created for each split-file group and the charts are displayed together in the
Viewer.

– Organize output by groups: All results from each procedure are displayed
separately for each split-file group.

To split a data file for analysis, from the menus choose: Data =⇒ Split File
... =⇒ select Compare groups or Organize output by groups =⇒ select one or
more grouping variables.
N.B.: If the data file is not already sorted by the values of the grouping variables,
select Sort the file by grouping variables.

Figure 23: Split file dialog box
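The effect of splitting, and of sorting first, can be sketched in Python with `itertools.groupby`. That function starts a new group every time the grouping values change between adjacent cases. The sketch computes one mean per group, roughly what Organize output by groups would report for a single statistic. It rests on the assumption that groups are formed from runs of adjacent cases with equal grouping values. The variables `gender`, `minority`, and `salary` and their values are hypothetical.

```python
from itertools import groupby
from statistics import mean


def split_file_means(cases, group_vars, analysis_var, sort_first=True):
    """Mean of `analysis_var` for each split-file group, in output order.

    Groups nest in the order of `group_vars` (e.g. minority within gender).
    A new group starts whenever the grouping values change between adjacent
    cases, so sorting first is what keeps each group in one piece.
    """
    def group_key(case):
        return tuple(case[v] for v in group_vars)

    ordered = sorted(cases, key=group_key) if sort_first else list(cases)
    return [(key, mean(c[analysis_var] for c in members))
            for key, members in groupby(ordered, key=group_key)]


cases = [
    {"gender": "f", "minority": 0, "salary": 30},
    {"gender": "m", "minority": 1, "salary": 40},
    {"gender": "f", "minority": 0, "salary": 50},
    {"gender": "f", "minority": 1, "salary": 20},
]
print(split_file_means(cases, ["gender", "minority"], "salary"))
print(split_file_means(cases, ["gender", "minority"], "salary", sort_first=False))
```

With `sort_first=False`, the unsorted example reports the group (f, 0) twice. This is why the file should be sorted by the grouping variables.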

1.5.5 Select Cases

There are occasions on which you will want to select a subset of cases from your data
file for an analysis. You may need to select the subset based on formally defined criteria,
or randomly, for example when the data file is very large. Select Cases provides several
methods for selecting a subgroup of cases based on criteria that include variables and
complex expressions. If you have selected a subset of cases but have not discarded
unselected cases, unselected cases are marked in the Data Editor with a diagonal line
(slash) through the row number.
To select a subset of cases, from the menus choose: Data =⇒ Select Cases... =⇒
select one of the methods for selecting cases =⇒ specify the criteria for selecting cases.

Figure 24: Select cases dialog box

Note that selecting All cases turns case filtering off and uses all cases. The
criteria used to define a subgroup can include the following.

• If condition is satisfied: If you have two or more groups of subjects in your data
and you want to analyze each group separately, you can use the Select Cases
option. For example, the data we are currently analyzing has both male and
female participants. If you wish to analyze only female cases, you set a condition
on Gender that selects female cases only. The “If condition is satisfied” dialog box
therefore allows you to select subsets of cases using conditional expressions. Most
conditional expressions use one or more of the six relational operators (<, >, <=,
>=, =, and ~=) on the calculator pad. Conditional expressions can include variable
names, constants, arithmetic operators, numeric (and other) functions, logical
variables, and relational operators.
Use the following procedure to select cases based on a conditional expression.
From the menus choose: Data =⇒ Select Cases... =⇒ select If condition is satisfied
=⇒ click If =⇒ enter the conditional expression =⇒ click Continue =⇒ click
OK.

Figure 25: If condition dialog box
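An If condition acts as a true/false test applied to every case. The Python sketch below uses hypothetical `gender` and `salary` values to build a 0/1 flag per case. The flag is similar to the filter variable (named filter_$ by default) that SPSS adds when unselected cases are filtered rather than deleted. The comment shows how the SPSS operators map onto Python's. This is an illustration of the idea, not SPSS code.

```python
def select_if(cases, condition):
    """Return a 0/1 flag per case: 1 when `condition` holds, else 0.

    Cases flagged 0 stay in the list but would be skipped by an analysis,
    like unselected cases that are filtered out rather than deleted.
    """
    return [1 if condition(case) else 0 for case in cases]


cases = [
    {"gender": "f", "salary": 28000},
    {"gender": "m", "salary": 40000},
    {"gender": "f", "salary": 57000},
]

# SPSS expression:  gender = 'f' & salary >= 30000
# Operator mapping: =  -> ==,  ~= -> !=,  & -> and,  | -> or
flags = select_if(cases, lambda c: c["gender"] == "f" and c["salary"] >= 30000)
selected = [case for case, keep in zip(cases, flags) if keep == 1]
print(flags)  # [0, 0, 1]
print(selected)
```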

• Random sample of cases: This dialog box allows you to select a random sample
based on an approximate percentage or an exact number of cases. Sampling is
performed without replacement, so the same case cannot be selected more than
once.
Approximately: Generates a random sample of approximately the specified
percentage of cases. Since this routine makes an independent pseudo-random decision
for each case, the percentage of cases selected can only approximate the specified
percentage. The more cases there are in the data file, the closer the percentage of
cases selected is to the specified percentage.
Exactly (a user-specified number of cases): You must also specify the number
of cases from which to generate the sample. This second number should be less
than or equal to the total number of cases in the data file. If the number exceeds
the total number of cases in the data file, the sample will contain proportionally
fewer cases than the requested number.
To select a random sample, from the menus choose: Data =⇒ Select Cases... =⇒
select Random sample of cases =⇒ click Sample =⇒ select the sampling method
and enter the percentage or number of cases.

Figure 26: Select cases by random sample dialog box
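The difference between the Approximately and Exactly options can be illustrated in Python. The first function makes an independent pseudo-random decision for each case, as described above. The second function assumes a selection-sampling scheme (Knuth's Algorithm S). We chose it because it reproduces both documented behaviors: it returns exactly n cases when the file is long enough, and proportionally fewer when it is not. SPSS's internal algorithm may differ, and the function names are hypothetical.

```python
import random


def sample_approximately(n_cases, percent, rng):
    """Keep each case independently with probability percent / 100.

    The sample size varies from run to run; it tends to get closer to the
    requested percentage as the number of cases grows.
    """
    return [i for i in range(n_cases) if rng.random() < percent / 100]


def sample_exactly(n_cases, n_wanted, first_n, rng):
    """Draw n_wanted of the first first_n cases without replacement.

    Selection sampling (Knuth's Algorithm S). If the file has fewer than
    first_n cases, the loop ends early, so the sample comes out
    proportionally smaller than n_wanted.
    """
    chosen = []
    for i in range(min(n_cases, first_n)):
        still_needed = n_wanted - len(chosen)
        still_available = first_n - i
        # Select this case with probability still_needed / still_available.
        if rng.random() * still_available < still_needed:
            chosen.append(i)
    return chosen


rng = random.Random(2024)  # fixed seed so the run can be repeated
print(len(sample_approximately(1000, 10, rng)))   # close to 100, rarely exactly 100
print(len(sample_exactly(1000, 100, 1000, rng)))  # always 100
print(len(sample_exactly(500, 100, 1000, rng)))   # about 50 on average
```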

• Based on time or case range: This dialog box selects cases based on a range
of case numbers or a range of dates or times. Case ranges are based on row number
as displayed in the Data Editor. To select a range of cases, from the menus choose:
Data =⇒ Select Cases... =⇒ select Based on time or case range =⇒ click Range =⇒
enter the starting and ending case numbers.
Note: Date and time ranges are available only for time series data with defined date
variables (Data menu, Define Dates).

Figure 27: select cases by range dialog box

• Use filter variable: Use the selected numeric variable from the data file as the
filter variable. Cases with any value other than 0 or missing for the filter variable
are selected.

Figure 28: Filter variable dialog box
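The filter-variable rule can be written as a one-line test. In this Python sketch, `None` and NaN are our stand-ins for missing values. Outside SPSS there is no single representation of system-missing, so that mapping is an assumption.

```python
import math


def is_selected(value):
    """Filter-variable rule: a case is selected unless its value is 0 or missing.

    None and float('nan') are used here as stand-ins for missing values.
    """
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return value != 0


filter_values = [1, 0, None, 2, -1, 0.5, float("nan")]
print([is_selected(v) for v in filter_values])
```

Note that negative and fractional values also select a case; only 0 and missing values exclude it.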

1.5.6 Computing Variables

Using a wide variety of mathematical functions, you can compute new variables based on
highly complex equations. Use the Compute dialog box to compute values for a variable
based on numeric transformations of other variables.

– You can compute values for numeric or string (alphanumeric) variables.

– You can create new variables or replace the values of existing variables. For
new variables, you can also specify the variable type and label.

– You can compute values selectively for subsets of data based on logical
conditions.

– You can use a large variety of built-in functions, including arithmetic
functions, statistical functions, distribution functions, and string functions.

To compute variables, from the menus choose: Transform =⇒ Compute Variable... =⇒
type the name of a single target variable (it can be an existing variable or a new variable
to be added to the active dataset) =⇒ enter the numeric expression =⇒ click OK.

Figure 29: Compute Variables dialog box

Example:
You may want to modify the values of the variables in your dataset. For example, you may
want to create a new variable from the employee data that contains the difference between
beginning and current salary. To do this, from the menus in the Data Editor window choose:
Transform =⇒ Compute Variable (the Compute Variable dialog box appears) =⇒ type
the new variable name, say “salchng”, in Target Variable =⇒ select “salary” and click
the transfer arrow button =⇒ type the minus sign (-) =⇒ select “salbegin” and click
the transfer arrow button =⇒ click OK. The new variable is displayed in the Data Editor.
Since the variable is added to the end of the file, it is displayed in the far right column in
Data View and in the last row in Variable View. The result is as follows.

Figure 30: Creating new variable
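The Compute example evaluates one expression case by case. The Python sketch below reproduces `salchng = salary - salbegin`, together with a natural-log transformation, on a few illustrative values (not taken from employee data.sav). We assume, as SPSS does for arithmetic, that a missing operand gives a missing result. Returning missing for the log of a non-positive value is likewise our assumption about how to mirror SPSS. The helper names are hypothetical.

```python
import math


def compute_salchng(case):
    """salchng = salary - salbegin; None (system-missing) if either is missing."""
    if case["salary"] is None or case["salbegin"] is None:
        return None
    return case["salary"] - case["salbegin"]


def safe_ln(x):
    """Natural log that returns None for missing or non-positive values."""
    if x is None or x <= 0:
        return None
    return math.log(x)


employees = [
    {"salary": 57000, "salbegin": 27000},
    {"salary": 40200, "salbegin": 18750},
    {"salary": 21450, "salbegin": None},
]
for case in employees:
    # New variables are added to each case, as at the end of the data file.
    case["salchng"] = compute_salchng(case)
    case["lnsalary"] = safe_ln(case["salary"])
print([case["salchng"] for case in employees])  # [30000, 21450, None]
```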

Example:
Many statistical procedures for quantitative data are less reliable when the distribution
of data values is markedly non-normal, as is the case with Amount of Last Sale. Sometimes,
a transformation of the variable can bring the distribution of values closer to normal.
Using contacts.sav, create a new variable named “logsale” equal to LN(sale), the natural
logarithm of sale.
Activity
1. Create a new variable named “sumsal” by adding “salary” and “salbegin” from
employee data.sav.
2. Create a new variable named “sqrtsalbeg” by taking the square root of “salbegin”
from employee data.sav.
3. Create a new variable named “jobtimey” as “jobtime/12” from employee data.sav.
4. Create a new variable named “prevexpd” as “prevexp*30” from employee data.sav.
5. Create a new variable named “lnsalary” by taking the natural log of “salary” from
employee data.sav.

1.5.7 Count Occurrences of Values within Cases

This dialog box creates a variable that counts the occurrences of the same value(s) in a
list of variables for each case. For example, a survey might contain a list of magazines
with yes/no check boxes to indicate which magazines each respondent reads. You could
count the number of yes responses for each respondent to create a new variable that
contains the total number of magazines read.
To count occurrences of values within cases, from the menus choose: Transform =⇒
Count Values within Cases... =⇒ enter a target variable name =⇒ select two or more
variables of the same type (numeric or string) =⇒ click Define Values and specify which
value or values should be counted.
Optionally, you can define a subset of cases for which to count occurrences of values.
(For practice, use survey.sav.)
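The Count procedure is a per-case tally. In the Python sketch below, the magazine variables `mag1` to `mag3` and the 1 = yes coding are hypothetical.

```python
def count_values(case, variables, values_to_count):
    """Number of `variables` whose value in `case` is one of `values_to_count`."""
    return sum(1 for v in variables if case[v] in values_to_count)


# Hypothetical survey: 1 = reads the magazine, 0 = does not.
respondents = [
    {"mag1": 1, "mag2": 0, "mag3": 1},
    {"mag1": 0, "mag2": 0, "mag3": 0},
    {"mag1": 1, "mag2": 1, "mag3": 1},
]
magazines = ["mag1", "mag2", "mag3"]
totals = [count_values(r, magazines, {1}) for r in respondents]
print(totals)  # [2, 0, 3]
```

Passing a set of values, such as `{1, 2}`, counts several codes at once. This corresponds to listing more than one value under Define Values.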

1.5.8 Recoding Variables

Some analyses are easier to perform if you create a new variable that contains the same
information coded differently. You can do this by recoding a variable in your dataset.
You have two options available for recoding variables:
Into Same Variables: This option changes the values of the existing variables.
Into Different Variables: This option creates a new variable and preserves the original
values of the variable.

• Recode into Same Variables: The Recode into Same Variables dialog box
allows you to reassign the values of existing variables or collapse ranges of existing
values into new values. For example, you could collapse salaries into salary range
categories. You can recode numeric and string variables. If you select multiple
variables, they must all be the same type. You cannot recode numeric and string
variables together.
To Recode Values of a Variable
From the menus choose: Transform =⇒ Recode into Same Variables... =⇒ Select
the variables you want to recode [If you select multiple variables, they must be the
same type (numeric or string)] =⇒ Click Old and New Values and specify how to
recode values.

Figure 31: Recoding into same variable dialog box

• Recoding Variables into Different Variables: The Recode into Different Variables
dialog box allows you to reassign the values of existing variables or collapse
ranges of existing values into new values for a new variable. For example, you
could collapse salaries into a new variable containing salary-range categories.
To Recode Values of a Variable into a New Variable
From the menus choose: Transform =⇒ Recode into Different Variables... =⇒
Select the variables you want to recode [If you select multiple variables, they must
be the same type (numeric or string)] =⇒ Enter an output (new) variable name
for each new variable and click Change =⇒ Click Old and New Values and specify
how to recode values.

Figure 32: Recoding Variables into Different Variables dialog box
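Old and New Values specifications behave like an ordered list of ranges. The Python sketch below uses hypothetical salary cutpoints and a hypothetical new variable `salcat`. It applies the first range that matches, which is how we understand SPSS to treat a boundary value listed in two ranges (such as 25000 here). Values that match no range, including missing ones, become missing in the new variable. The original salaries are left unchanged, as with Recode into Different Variables.

```python
LOWEST, HIGHEST = float("-inf"), float("inf")


def recode(value, rules, else_value=None):
    """Recode `value` using (low, high, new) rules with inclusive bounds.

    Rules are checked in order and the first match wins. Unmatched or
    missing (None) values get `else_value`, which defaults to missing.
    """
    if value is None:
        return else_value
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return else_value


# Hypothetical cutpoints: Lowest thru 25000 = 1, 25000 thru 50000 = 2,
# 50000 thru Highest = 3.
salary_rules = [
    (LOWEST, 25000, 1),
    (25000, 50000, 2),
    (50000, HIGHEST, 3),
]
salaries = [21450, 25000, 40200, 57000, None]
salcat = [recode(s, salary_rules) for s in salaries]
print(salcat)  # [1, 1, 2, 3, None]
```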

Automatic Recode
The Automatic Recode dialog box allows you to convert string and numeric values into
consecutive integers. When category codes are not sequential, the resulting empty cells
reduce performance and increase memory requirements for many procedures. Additionally,
some procedures cannot use string variables, and some require consecutive integer
values for factor levels.

Figure 33: Automatic Recode dialog box

• The new variable(s) created by Automatic Recode retain any defined variable and
value labels from the old variable. For any values without a defined value label,
the original value is used as the label for the recoded value. A table displays the
old and new values and value labels.

• String values are recoded in alphabetical order, with uppercase letters preceding
their lowercase counterparts.

• Missing values are recoded into missing values higher than any nonmissing values,
with their order preserved. For example, if the original variable has 10 nonmissing
values, the lowest missing value would be recoded to 11, and the value 11 would
be a missing value for the new variable.

Use the same recoding scheme for all variables: This option allows you to apply
a single autorecoding scheme to all the selected variables, yielding a consistent coding
scheme for all the new variables. If you select this option, the following rules and
limitations apply:

• All variables must be of the same type (numeric or string).

• All observed values for all selected variables are used to create a sorted order of
values to recode into sequential integers.

• User-missing values for the new variables are based on the first variable in the list
with defined user-missing values. All other values from other original variables,
except for system-missing, are treated as valid.
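The Automatic Recode rules listed above can be sketched in Python. The function numbers valid values first and then gives missing values the next higher codes. For strings it sorts case-insensitively and breaks ties by putting the uppercase spelling first. This is our reading of "uppercase letters preceding their lowercase counterparts", and SPSS's exact collation may differ. The function name and data are hypothetical.

```python
def automatic_recode(values, missing=frozenset()):
    """Map distinct values to consecutive integers 1, 2, 3, ...

    Returns the recoded list and the old-to-new mapping. Assumes all values
    are of one type (all strings or all numbers).
    """
    def sort_key(v):
        # Alphabetical, with an uppercase spelling before its lowercase twin.
        return (v.lower(), v) if isinstance(v, str) else v

    distinct = set(values)
    valid = sorted((v for v in distinct if v not in missing), key=sort_key)
    miss = sorted((v for v in distinct if v in missing), key=sort_key)
    # Valid values get 1..k; missing values get k+1, k+2, ... in their own order.
    mapping = {v: code for code, v in enumerate(valid + miss, start=1)}
    return [mapping[v] for v in values], mapping


codes, mapping = automatic_recode(["banana", "Apple", "apple", "cherry", "Apple"])
print(mapping)  # {'Apple': 1, 'apple': 2, 'banana': 3, 'cherry': 4}
print(codes)    # [3, 1, 2, 4, 1]
```

Passing `missing={98, 99}` with numeric codes places 98 and 99 after every valid value. This mirrors the rule that missing values are recoded higher than any nonmissing value.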

Activity
1. Create a new variable named “waittime” for the following data.
2. Recode this variable into a different variable named “classwait” using 1 = 41-90,
2 = 91-140, 3 = 141-190, 4 = 191-240, 5 = 241-290, 6 = 291-340.

1.6 Descriptive Data Analysis

1.6.1 Summary Statistics Using Frequencies

Summaries of individual variables provide an important “first look” at your data. Some
of the tasks that these summaries help you to complete are:
Checking the quality of the data: Are there missing values? Are there values that
should be recoded?
Determining “typical” values of the variables: What values occur most often?
What range of values are you likely to see?
Checking the assumptions for statistical procedures: Do you have enough
observations? For each variable, is the observed distribution of values adequate?
The Frequencies procedure is useful for obtaining summaries of individual variables. The
following examples show how Frequencies can be used to analyze variables measured at
nominal, ordinal, and scale levels.

• Using Frequencies to Study Nominal Data: You manage a team that sells
computer hardware to software development companies. At each company, your
representatives have a primary contact. You have categorized these contacts by
the department of the company in which they work (Development, Computer
Services, Finance, Other, Don’t Know). This information is stored in contacts.sav
in the sample data. Use Frequencies to study the distribution of departments
to see if it meshes with your goals.
To run a frequencies analysis, open contacts.sav from the sample data and, from the
menus, choose: Analyze =⇒ Descriptive Statistics =⇒ Frequencies... =⇒ select
Department as an analysis variable.

Figure 34: Creating frequency table dialog box

Finally, click OK in the Frequencies dialog box. The procedure produces a
frequency table and pie chart for the variable dept (the pie chart is produced when
Pie charts is selected under Charts in the Frequencies dialog box).

Figure 35: Output window

The frequency table shows the precise frequencies for each category. For example,
the Frequency column reports that 30 of your contacts come from the computer
services department. This is equivalent to 42.9% of the total number of contacts.
You can also see that the departmental information is missing for 11.4% of your
contacts.

Table 1: Frequency table for nominal data

• Using Frequencies to Study Ordinal Data:
In addition to the department of each contact, you have recorded their company
ranks. Use Frequencies to study the distribution of company ranks to see if it
meshes with your goals.
To summarize the company ranks of your contacts, from the menus choose:
Analyze =⇒ Descriptive Statistics =⇒ Frequencies... =⇒ click Reset to
restore the default settings =⇒ select Company Rank as an analysis variable =⇒
click OK in the Frequencies dialog box.
The procedure produces a frequency table and bar chart with the categories or-
dered by descending value.

Figure 36: Output window for ordinal data

The frequency table for ordinal data serves much the same purpose as the table
for nominal data. For example, you can see from the table that 15.7% of your
contacts are junior managers. However, when studying ordinal data, the Cumulative
Percent is much more useful. Since the table has been ordered by ascending values,
it shows that 67.8% of your contacts have a rank of senior manager or lower.

Table 2: Frequency table for ordinal data
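The Percent, Valid Percent, and Cumulative Percent columns follow simple arithmetic. Percent divides by all cases, while Valid Percent and Cumulative Percent divide by the non-missing cases only. The Python sketch below rebuilds such a table for hypothetical rank codes (not the real contacts.sav data). It lists categories in ascending order, so each cumulative value reads as "this rank or lower". With descending order, it would read as "this rank or higher".

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percent, valid percent, cumulative percent).

    None marks a missing value: it counts toward Percent (based on all cases)
    but not toward Valid or Cumulative Percent (based on valid cases only).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):  # ascending order of the category codes
        freq = counts[value]
        valid_pct = 100 * freq / len(valid)
        cumulative += valid_pct
        rows.append((value, freq, 100 * freq / total, valid_pct, cumulative))
    missing = total - len(valid)
    if missing:
        rows.append(("Missing", missing, 100 * missing / total, None, None))
    return rows


# Hypothetical ranks coded 1 = most junior ... 4 = most senior.
ranks = [1, 2, 2, 3, 3, 3, 4, 4, None, None]
for row in frequency_table(ranks):
    print(row)
```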

• Using Frequencies to Study Scale Data: For each account, you have also
kept track of the amount of the last sale, in thousands. You can use Frequencies
to study the distribution of purchases.
To summarize the amounts of the last sales, from the menus choose: Analyze =⇒
Descriptive Statistics =⇒ Frequencies... =⇒ Click Reset to restore the default
settings =⇒ Select Amount of Last Sale as an analysis variable =⇒ Click OK
in the Frequencies dialog box.

Table 3: Frequency table for scale data

A frequency table is useful mainly for categorical variables, i.e., where the values represent
categories such as male/female. Scale variables, however, have many distinct values, and
for these continuous variables, statistics such as the mean and standard deviation are
usually more useful. It is therefore a good idea to turn off the display of frequency tables
for scale data and request summary statistics instead, such as quartiles, standard
deviation, minimum, maximum, mean, median, skewness, and kurtosis.
Now deselect Display frequency tables =⇒ click OK in the pop-up dialog box =⇒
click Statistics in the Frequencies dialog box =⇒ check the statistics you need in the
Frequencies: Statistics dialog box (for example, Mean, Median, Mode, SD...).

Figure 37: Creating descriptive Statistics dialog box

Click Continue =⇒ Click OK in the frequencies dialog box. The statistics table
tells you several interesting things about the distribution of sale, starting with the five-
number summary.

Table 4: Descriptive statistics table for last sale
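The quantities in this table can be computed by hand. The Python sketch below uses `statistics.quantiles` with `method="exclusive"`, which interpolates at position (n + 1)p. As far as we know, this matches SPSS's default (weighted average) percentile definition. The sketch also uses the bias-adjusted sample skewness and kurtosis formulas that SPSS reports. It runs on small made-up lists, not on contacts.sav, and the kurtosis formula needs at least four values.

```python
import statistics


def describe(values):
    """Five-number summary, mean, SD, skewness and kurtosis (needs n >= 4)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD with an n - 1 denominator
    # Quartiles interpolated at position (n + 1) * p.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    m3 = sum((x - mean) ** 3 for x in values)
    m4 = sum((x - mean) ** 4 for x in values)
    # Bias-adjusted sample skewness (G1); 0 for a symmetric sample.
    skewness = n / ((n - 1) * (n - 2)) * m3 / sd ** 3
    # Bias-adjusted excess kurtosis (G2); near 0 for normally distributed data.
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / sd ** 4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"min": min(values), "q1": q1, "median": median, "q3": q3,
            "max": max(values), "mean": mean, "sd": sd,
            "skewness": skewness, "kurtosis": kurtosis}


print(describe([1, 2, 3, 4, 5, 6, 7, 8]))
# A long right tail: the mean is pulled above the median and skewness is large.
print(describe([6, 9, 12, 15, 20, 25, 40, 60, 300]))
```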

The center of the distribution can be approximated by the median (or second quartile),
20.25, and half of the data values fall between 12.0 and 52.875, the first and third
quartiles. The most extreme values are 6.0 and 776.5, the minimum and maximum. The
mean is quite different from the median, suggesting that the distribution is asymmetric.
This suspicion is confirmed by the large positive skewness, which shows that sale has a
long right tail. That is, the distribution is asymmetric, with some distant values in a
positive direction from the center of the distribution. Most variables with a finite lower
limit (for example, 0) but no fixed upper limit tend to be positively skewed. The large
positive skewness, in addition to pulling the mean to the right of the median, inflates
the standard deviation to a point where it is no longer useful as a measure of the spread
of data values. The large positive kurtosis tells you that the distribution of sale is more
peaked and has heavier tails than the normal distribution. With continuous variables,
frequency tables are not always the best method of summarizing. A better option is
to use Descriptives... in place of a frequency table. To demonstrate this,
consider the following example. A telecommunications company maintains a customer
database that includes, among other things, information on how much each customer
spent on long distance, toll-free, equipment rental, calling card, and wireless services in
the previous month. This information is stored in telco.sav. Use Descriptives to study
customer spending and determine which services are most profitable. Note: Most
customers don't have every service, so a lot of 0's would be counted. Therefore, treat
0's as missing values by recoding into the same variables: from the menus choose
Transform =⇒ Recode into Same Variables... =⇒ select Long distance last month,
Toll free last month, Equipment last month, Calling card last month, and Wireless last
month as numeric variables =⇒ click Old and New Values =⇒ type 0 as the Old Value
=⇒ select System-missing as the New Value =⇒ click Add =⇒ click Continue =⇒
click OK.
To run a Descriptives analysis, from the menus choose: Analyze =⇒ Descriptive
Statistics =⇒ Descriptives... =⇒ select cardmon, equipmon, longmon, tollmon, and
wiremon as analysis variables.

Figure 38: Creating descriptive Statistics dialog box

Click Continue =⇒ click OK in the Descriptives dialog box. The following
summary statistics are then displayed in the output (Viewer) window.
Table 5: Summary statistics

The statistics table above can be used to compare the amounts spent on each
service. For instance, wireless and equipment rental services bring in far more revenue
per customer than other services. Moreover, long distance has one of the lowest standard
deviations.
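Recoding 0 to system-missing matters because the zeros from customers without a service pull the mean down. The Python sketch below uses made-up wireless spending values to compare the mean before and after the recode. `None` plays the role of system-missing, and the function names are hypothetical.

```python
import statistics


def recode_zero_to_missing(values):
    """Recode 0 to None (system-missing), as with Recode into Same Variables."""
    return [None if v == 0 else v for v in values]


def descriptives(values):
    """N, minimum, maximum, mean and SD over valid (non-missing) values only."""
    valid = [v for v in values if v is not None]
    return {"N": len(valid), "min": min(valid), "max": max(valid),
            "mean": statistics.fmean(valid), "sd": statistics.stdev(valid)}


# Made-up monthly wireless spending; 0 means the customer has no wireless service.
wiremon = [0, 0, 35.5, 0, 42.0, 0, 28.5, 0]
print(descriptives(wiremon)["mean"])                          # averaged over all 8 customers
print(descriptives(recode_zero_to_missing(wiremon))["mean"])  # averaged over the 3 users only
```

After the recode, the mean describes what a customer who actually uses the service spends. That is the figure needed to compare services.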

1.7 Creating Graphs

There are different ways of creating graphs in SPSS. A selection of graphs is available
under Graphs on the main menu, and graphs are also available as options in analytical
procedures; for example, there is a Charts option in the Frequencies procedure and a
profile plot option in the ANOVA procedure. This tutorial focuses on producing graphs
using Chart Builder and Legacy Dialogs, which can be found under Graphs on the
SPSS menu bar.

• Chart Builder: The Chart Builder allows you to build charts from predefined
gallery charts or from the individual parts (for example, axes and bars). The easiest
method for building charts is to use the gallery. You build a chart by dragging and
dropping the gallery charts or basic elements onto the canvas, which is the large
area to the right of the Variables list in the Chart Builder dialog box. The Chart
Builder layout and terms are illustrated below:

Figure 39: Chart builder layout and terms

Canvas: The canvas is the area of the Chart Builder dialog box where you build
the chart.
Variables list: The variables list displays the available variables.
Drop zones: are the areas on the canvas to which you drag and drop a variable
from the Variables list.
Following are general steps for building a chart (Bar Chart) from the gallery.

– From the menus choose: Graphs =⇒ Chart Builder. The Chart Builder
dialog box is an interactive window that allows you to preview how a chart
will look while you build it.

Figure 40: Chart Builder dialog box

– Click the Gallery tab if it is not selected.
The Gallery includes many different predefined charts, which are organized
by chart type. The Basic Elements tab also provides basic elements (such as
axes and graphic elements) for creating charts from scratch, but it’s easier to
use the Gallery.

– Click Bar if it is not selected.

– Drag the icon for the simple bar chart onto the “canvas”, which is the large
area above the Gallery. The Chart Builder displays a preview of the chart on
the canvas. Note that the data used to draw the preview are not your actual
data; they are example data.

Figure 41: Bar chart on chart builder canvas

– Defining variables and statistics.
Although there is a chart on the canvas, it is not complete because there
are no variables or statistics to control how tall the bars are and to specify
which variable category corresponds to each bar. You cannot have a chart
without variables and statistics. You add variables by dragging them from
the Variables list, which is located to the left of the canvas.

– Now drag Job satisfaction from the Variables list to the x axis drop zone.

– Again, drag gender from the Variables list to the grouping drop zone.

– Return to the Chart Builder dialog box and drag Household income in thousands
from the Variables list to the y axis drop zone.
Because the variable on the y axis is a scale variable and the x axis variable is
categorical (ordinal is a type of categorical measurement level), the y axis drop
zone defaults to the Mean statistic. The Statistic drop-down list in the Element
Properties window shows the specific statistics that are available.
The same statistics are usually available for every chart type. Be aware that
some statistics require that the y axis drop zone contains a variable.

Figure 42: chart editing dialog box

Note: If you need to change statistics or modify attributes of the axes or
legends (such as the scale range), click Element Properties, select the item you
want to change in the Edit Properties Of list, and click Apply after making any
changes. If you need to add more variables to the chart (for example, for
clustering or paneling), click the Groups/Point ID tab in the Chart Builder
dialog box and select one or more options. Then drag categorical variables to
the new drop zones that appear on the canvas. If you want to transpose the
chart (for example, to make the bars horizontal), click the Basic Elements tab
and then click Transpose.

– Click OK to create the chart. The chart is displayed in the Viewer.

Figure 43: Bar Chart

The bar chart reveals that respondents who are more satisfied with their jobs
tend to have higher household incomes.

• Legacy Dialogs: The legacy dialogs contain a collection of the most commonly
used charts. These include: bar charts (simple, stacked, clustered); line charts
(simple and grouped, i.e., multi-line); area charts (simple and stacked); pie charts;
scatter-plots and dot plots; histograms; and box-plots (simple and clustered).

A. Bar charts
Bar charts are useful for summarizing categorical variables. The Bar Charts dialog allows
you to make selections that determine the type of chart you obtain. Select the icon for
the chart type you want and select the option under the Data in Chart Are group
that best describes your data. For example, you can use a bar chart to show the
number of men and the number of women who participated in a survey, or you can
use a bar chart to show the mean salary for men and the mean salary for women.

Clustering and stacking add dimensionality within the chart. Clustering splits one
bar into multiple bars, and stacking creates segments in each bar. Be careful that
you choose the right statistic for stacking. When the values are added together
(stacked), the result must make sense.
Obtaining Simple, Clustered, or Stacked Bar Charts: From the menus choose:
Graphs =⇒ Legacy Dialogs =⇒ Bar

– In the Bar Charts dialog box, select the icon for Simple, Clustered, or Stacked.

– Select the option under the Data in Chart Are group that best fits your data.

– Click Define.

– Select variables and options for the chart.

– Click OK in the dialog box

Figure 44: Define Simple bar dialog box and the graph

Figure 45: Define Clustered Bar dialog box and the graph

Figure 46: Define Stacked Bar dialog box and the graph

B. Pie Charts
A pie chart is useful for comparing proportions. For example, you may use a pie
chart to demonstrate that a greater proportion of women are enrolled in a certain
class. The Pie Charts dialog allows you to specify how data are represented in the
chart. Select the option under the Data in Chart Are group that best describes your data.
Obtaining Pie Charts: From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Pie.

– In the Pie Charts dialog box, select an option under the Data in Chart Are
group.

– Click Define.

62
– Select variables and options for the chart.

– Click OK in the dialog box

Figure 47: Define pie chart dialog box and the graph

The result of the statistic determines the size of each slice.

C. Line Charts
The Line Charts dialog allows you to make selections that determine the type of chart
you obtain. Select the icon for the chart type you want and select the option under
the Data in Chart Are group that best describes your data. You can use a line
chart to summarize categorical variables, in which case it is similar to a bar chart.
Line charts are also useful for time-series data.

Obtaining Simple, Multiple, or Drop-Line Charts

– From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Line

– In the Line Charts dialog box, select the icon for Simple, Multiple, or Drop-line.

– Select an option under the Data in Chart Are group.

– Click Define

– Select variables and options for the chart.

– Click OK in the dialog box

D. Scatter-plots and dot plots
There are several broad categories of charts created with the point graphic element.
Scatter-plots: These are useful for plotting multivariate data. They can help
you determine potential relationships among scale variables. A simple scatter-plot
uses a 2-D coordinate system to plot two variables. A 3-D scatter-plot uses a 3-D
coordinate system to plot three variables. When you need to plot more variables,
you can try overlay scatter-plots and scatter-plot matrices (SPLOMs). An overlay
scatter-plot displays overlaid pairs of x-y variables, with each pair distinguished by
color or shape. A SPLOM creates a matrix of 2-D scatter-plots, with each variable
plotted against every other variable in the SPLOM.

Figure 48: The Scatter-plot dialog box

Simple Scatter-plot: Plots two numeric variables against each other. You must
select a variable for the y axis and a variable for the x axis. These variables must
be numeric but should not be in date format.
Overlay Scatter-plots: Plots two or more variable pairs. Select at least two
pairs of variables. Variables should be numeric but should not be in date format.
You can select a numeric or a string variable and move it into the Label Cases By
field. You can label points on the plot with this variable.
Scatter-plot Matrix: Plots all possible combinations of two or more numeric
variables against one another. Select at least two Matrix Variables. These variables
must be numeric but should not be in date format. You may select a variable and
move it into the Set Markers By field. Each value of this variable is marked by a
different symbol on the scatter-plot. This variable may be numeric or string. You
can select a numeric or a string variable and move it into the Label Cases By field.
You can label points on the plot with this variable.
3-D Scatter-plots: Plots three numeric variables in three dimensions. Select one
variable for the y axis, one for the x axis, and one for the z axis. These variables
must be numeric but should not be in date format.
Dot plots: These are useful for showing the distribution of a single scale variable.
The data are binned, but, instead of one value for each bin (like a count), all the
points in each bin are displayed and stacked. These graphs are sometimes called
density plots.
Obtaining Scatter-plots
From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Scatter/Dot

– In the Scatter-plot dialog box, select the icon for Simple, Overlay, Matrix,
3-D, or Simple Dot.

– Click Define.

– Select variables and options for the chart

Figure 49: Scatter plot dialog box

– Click OK in the dialog box

E. Box-plots
In the Explore procedure (Plots options), these alternatives control the display of
box-plots when you have more than one dependent variable. Factor levels together
generates a separate display for each
dependent variable. Within a display, box-plots are shown for each of the groups
defined by a factor variable. Dependents together generate a separate display for
each group defined by a factor variable. Within a display, box-plots are shown
side by side for each dependent variable. This display is particularly useful when
the different variables represent a single characteristic measured at different times.
Box-plots allow you to compare each group using a five-number summary: the
median, the 25th and 75th percentiles, and the minimum and maximum observed
values that are not statistically outlying. Outliers and extreme values are given
special attention. The heavy black line inside each box marks the 50th percentile,
or median, of that distribution. The lower and upper hinges, or box boundaries,
mark the 25th and 75th percentiles of each distribution, respectively. Whiskers
appear above and below the hinges. Whiskers are vertical lines ending in horizontal
lines at the largest and smallest observed values that are not statistical outliers,
while outliers are identified with small circles (o) and extreme values are marked
with an asterisk (*). The box-plot may be simple or clustered.
Simple Box-plot: Creates a box-plot summarizing a single numeric variable
within categories of another variable. Each box shows the median, quartiles, and
extreme values within a category.
Clustered Box-plot: Creates a box-plot summarizing the median, quartiles, and
extreme values for a single numeric variable, within clusters defined by a categorical
variable. Each box within a cluster is defined by a second categorical variable.
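The five-number summary and the outlier/extreme distinction described above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: it assumes the usual SPSS convention that outliers lie between 1.5 and 3 box lengths beyond a hinge and extreme values more than 3 box lengths beyond, and it computes quartiles with `statistics.quantiles(..., method="inclusive")`, whereas SPSS may compute its hinges differently (for example as Tukey's hinges), so small samples can give slightly different boxes. The function name `box_plot_summary` and the data are hypothetical.

```python
from statistics import median, quantiles

def box_plot_summary(values):
    """Five-number summary plus outliers/extremes for one group of a scale variable."""
    data = sorted(values)
    # Quartiles by linear interpolation; SPSS's hinges may differ slightly.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1                       # interquartile range
    inner_low, inner_high = q1 - 1.5 * box_length, q3 + 1.5 * box_length
    outer_low, outer_high = q1 - 3 * box_length, q3 + 3 * box_length
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {
        "lower_whisker": min(inside),          # smallest non-outlying value
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "upper_whisker": max(inside),          # largest non-outlying value
        # Assumed rule: 1.5 to 3 box lengths beyond a hinge = outlier, more than 3 = extreme.
        "outliers": [x for x in data
                     if outer_low <= x < inner_low or inner_high < x <= outer_high],
        "extremes": [x for x in data if x < outer_low or x > outer_high],
    }

print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))   # hypothetical data
```

In this example the value 30 lies more than 3 box lengths above the upper hinge (7.75 + 3 × 4.5 = 21.25), so it is reported as an extreme value and the upper whisker stops at 9.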
Obtaining Simple and Clustered Boxplots
From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Box-plot

– In the Boxplot dialog box, select the icon for Simple or Clustered.

– Select an option under the Data in Chart Are group.

– Click Define.

Figure 50: The Box-plot dialog box

– Select variables and options for the chart.

– Click OK in the dialog box

F. Histograms
A histogram displays the distribution of a quantitative variable by showing the
relative concentration of data points along different intervals or sections of the
scale on which the data are measured.
Histograms are useful for showing the distribution of a single scale variable. Data
are binned and summarized using a count or percentage statistic. A variation of
a histogram is a frequency polygon, which is like a typical histogram except that
the area graphic element is used instead of the bar graphic element.
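The binning step behind a histogram can be sketched as follows: the range of the data is cut into equal-width intervals, the cases falling in each interval are counted, and dividing by the total gives the percentage statistic. SPSS chooses its own bin boundaries, so the function name `histogram_counts`, the number of bins, and the sample ages here are hypothetical choices for illustration.

```python
def histogram_counts(values, bins):
    """Split the range of `values` into `bins` equal-width intervals and count cases."""
    low, high = min(values), max(values)
    if high == low:                      # all values equal: a single bar
        return [low, high], [len(values)]
    width = (high - low) / bins
    counts = [0] * bins
    for x in values:
        # Index of the interval containing x; the maximum is put in the last bin.
        i = min(int((x - low) / width), bins - 1)
        counts[i] += 1
    edges = [low + k * width for k in range(bins + 1)]
    return edges, counts

ages = [21, 23, 25, 31, 34, 35, 38, 42, 44, 58]   # hypothetical data
edges, counts = histogram_counts(ages, bins=4)
total = sum(counts)
for k, c in enumerate(counts):
    # Text "bars" plus the percentage statistic for each interval.
    print(f"{edges[k]:6.2f} - {edges[k + 1]:6.2f} | {'#' * c} {100 * c / total:.0f}%")
```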
How to create a histogram
From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Histogram

– Select a numeric variable for Variable in the Histogram dialog box.

– Optionally, select Display normal curve to display a normal curve on the histogram.

– Click OK in the dialog box

Another variation of the histogram is the population pyramid. A population pyramid shows the distribution of a variable across categories. It consists of two back-to-back histograms (when the distribution variable is scale) or two back-to-back bar charts (when the distribution variable is categorical), with the bars drawn horizontally rather than vertically. When there are more than two categories, creating the chart results in more than one population pyramid, depending on the number of categories. For example, if there were four categories, there would be two population pyramids, one for each category pair.
Population pyramids are often used for demographic data. A common population
pyramid displays the counts for age groups in each gender, with the youngest age
at the bottom. A viewer can easily distinguish differences among the age groups
and between genders because of the different bar lengths and the symmetry be-
tween halves.
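The counts behind a population pyramid are simply a two-way tally of age group by gender. The sketch below tallies hypothetical records and prints the two halves back to back, youngest age group at the bottom; it assumes exactly two gender categories, and the function names `pyramid_counts` and `print_pyramid` are illustrative only.

```python
from collections import Counter

def pyramid_counts(records, groups):
    """records: (age_group, gender) pairs; groups: age groups from youngest to oldest."""
    counts = Counter(records)
    genders = sorted({gender for _, gender in records})
    # Oldest group first, so the youngest group is printed at the bottom.
    rows = [(group, [counts[(group, g)] for g in genders]) for group in reversed(groups)]
    return genders, rows

def print_pyramid(records, groups):
    genders, rows = pyramid_counts(records, groups)
    left, right = genders                   # assumes exactly two categories
    width = max(max(counts) for _, counts in rows)
    print(f"{left:>{width}} {'':^7} {right}")
    for group, (n_left, n_right) in rows:
        # Left bars are right-aligned so both halves grow outward from the centre.
        print(f"{'#' * n_left:>{width}} {group:^7} {'#' * n_right}")

records = [("0-19", "F"), ("0-19", "M"), ("0-19", "M"), ("20-39", "F"), ("40+", "M")]
print_pyramid(records, ["0-19", "20-39", "40+"])
```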
How to create a population pyramid
From the menus choose: Graphs =⇒ Legacy Dialogs =⇒ Population Pyramid.

– In the Define Population Pyramid dialog box, specify whether counts are com-
puted from the data or are taken from a variable that contains pre-aggregated
values. For example, if you have a variable that contains population values
for different age groups, you would select Get Counts from variable and
drag and drop the variable into the Variable field. If none of the variables
contains pre-aggregated data, select Compute counts from data.

Figure 51: Population pyramid dialog box

– Click OK in the dialog box

Figure 52: Population pyramid

1.7.1 Editing Charts

You can create and edit a wide variety of chart types. In this section, we will create and edit bar charts, but you can apply the principles to any chart type. To demonstrate, we will create a bar chart of mean income for different levels of job satisfaction by using the data file demo.sav.
The Chart Editor provides a powerful, easy-to-use environment where you can
customize your charts and explore your data. The Chart Editor features:

– Simple, intuitive user interface: You can quickly select and edit parts of
the chart using menus, context menus, and toolbars. You can also enter text
directly on a chart.

– Wide range of formatting and statistical options: You can choose from
a full range of styles and statistical options.

– Powerful exploratory tools: You can explore your data in various ways,
such as by labeling, reordering, and rotating it. You can change chart types
and the roles of variables in the chart. You can also add distribution curves
and fit, interpolation, and reference lines.

– Flexible templates for consistent look and behavior: You can create
customized templates and use them to easily create charts with the look and
options that you want. For example, if you always want a specific orientation
for axis labels, you can specify the orientation in a template and apply the
template to other charts.

To create this example chart (mean income by level of job satisfaction, from demo.sav), from the menus choose:
Graphs =⇒ Chart Builder... =⇒ click the Gallery tab if it is not selected =⇒ click Bar =⇒ drag the icon for the simple bar chart onto the "canvas". You add variables by dragging them from the Variables list, which is located to the left of the canvas.
Then click Element Properties to display the Element Properties window. The Element Properties window allows you to change the properties of the various chart elements. These elements include the graphic elements (such as the bars in the bar chart) and the axes on the chart. Select one of the elements in the Edit Properties of list to change the properties associated with that element.

Figure 53: Chart editor dialog box

How to View the Chart Editor

– Create a chart in IBM® SPSS® Statistics, or open a Viewer file with charts.

– Double-click a chart in the Viewer.

The chart is displayed in the Chart Editor, as shown below.

Figure 54: chart editor window

The Chart Editor provides various methods for manipulating charts.


Menus: Many actions that you can perform in the Chart Editor are done with
the menus, especially when you are adding an item to the chart. For example, you
use the menus to add a fit line to a scatter-plot. After adding an item to the chart,
you often use the Properties dialog box to specify options for the added item.
Properties Dialog Box: Options for the chart and its chart elements can be found in the Properties dialog box.
To view the Properties dialog box, either double-click a chart element, or select a chart element and then, from the menus, choose: Edit =⇒ Properties.
Additionally, the Properties dialog box appears automatically when you add an item to the chart.

Figure 55: color editor window

The Properties dialog box has tabs that allow you to set the options and make
other changes to a chart. The tabs that you see in the Properties dialog box are
based on your current selection. Some tabs include a preview to give you an idea of
how the changes will affect the selection when you apply them. However, the chart
itself does not reflect your changes until you click Apply. You can make changes
on more than one tab before you click Apply. If you have to change the selection to
modify a different element on the chart, click Apply before changing the selection.
If you do not click Apply before changing the selection, clicking Apply at a later
point will apply changes only to the element or elements currently selected.
Depending on your selection, only certain settings will be available. The help for
the individual tabs specifies what you need to select to view the tabs. If multiple
elements are selected, you can change only those settings that are common to all
the elements.
Toolbars: The toolbars provide a shortcut for some of the functionality in the
Properties dialog box. For example, instead of using the Text tab in the Properties
dialog box, you can use the Edit toolbar to change the font and style of the text.
Saving the Changes: Chart modifications are saved when you close the Chart
Editor. The modified chart is subsequently displayed in the Viewer.

Figure 56: Edited bar chart

1.7.2 Exploratory Data Analysis

Exploring data can help to determine whether the statistical techniques that you are considering for data analysis are appropriate. The Explore procedure provides a variety of visual (graphical) and numerical summaries of the data, either for all cases or separately for groups of cases. The dependent variable must be a scale variable, while the grouping variables may be ordinal or nominal. With the Explore procedure, you can screen data, identify outliers, check assumptions, and characterize differences among groups of cases.
To explore data, from the menus choose: Analyze =⇒ Descriptive Statistics =⇒ Explore... =⇒ select one or more dependent variables.

– Optionally, you can select one or more factor variables, whose values will define groups of cases.

– Click Plots for histograms, normal probability plots and tests, and spread-versus-level plots with Levene's test.

– Click Options for the treatment of missing values.

Figure 57: Exploring data dialog box

Then click OK in the Explore dialog box.

The procedure above produces the following output.
Table 7: Descriptive Statistics Table
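Several of the numbers in an Explore descriptives table can be reproduced with Python's standard `statistics` module, as in the sketch below. It covers only a subset of the SPSS table (no trimmed mean, confidence interval, skewness, or kurtosis), and the function name `explore`, the group names, and the values are hypothetical.

```python
from statistics import mean, median, stdev

def explore(values_by_group):
    """Explore-style descriptives for a scale variable, separately for each group."""
    summary = {}
    for group, values in values_by_group.items():
        summary[group] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "std_dev": stdev(values),          # sample SD, n - 1 denominator
            "minimum": min(values),
            "maximum": max(values),
            "range": max(values) - min(values),
        }
    return summary

incomes = {"male": [28, 35, 41, 52, 60], "female": [25, 31, 39, 44]}  # hypothetical
for group, stats in explore(incomes).items():
    print(group, stats)
```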

1.7.3 Analysis of Cross-Classifications and Measures of Association

The Crosstabs procedure forms two-way and multi-way tables and provides a variety of
tests and measures of association for two-way tables. The Crosstabs procedure offers
tests of independence and measures of association and agreement for nominal and ordinal
data. You can also test for significant differences in column proportions in the cross-
tabulation table. For example, to determine customer satisfaction rates, a retail company
conducted surveys of 582 customers at 4 store locations. From the survey results, you
found that the quality of customer service was the most important factor to a customer’s
overall satisfaction. Given this information, you want to test whether each of the store
locations provides a similar and adequate level of customer service. The results of the
survey are stored in satisf.sav. Use the Crosstabs procedure to test the hypothesis that
the levels of service satisfaction are constant across stores.
To run a Crosstabs analysis, from the menus choose:
Analyze =⇒ Descriptive Statistics =⇒ Crosstabs... =⇒ Select Store as the row
variable and Select Service satisfaction as the column variable.

Figure 58: Creating Crosstabs dialog box

Click OK in the Crosstabs dialog box.
Table 8: Cross-tabulation table

The cross-tabulation shows the frequency of each response at each store location. At each store, most responses fall in the middle categories (from somewhat negative to somewhat positive). Compared with the other stores, store 2 appears to have fewer satisfied customers (27 + 19 = 46), whereas store 4 appears to have fewer dissatisfied customers (15 + 20 = 35).
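A cross-tabulation is just a count of every combination of row and column categories. The sketch below builds such a table, plus row percentages, from two hypothetical lists of responses; it illustrates the idea and does not read satisf.sav. The function names `crosstab` and `row_percentages` are illustrative.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Two-way frequency table: counts of each (row category, column category) pair."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))   # categories must be mutually comparable
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

store = [1, 1, 1, 2, 2, 2, 2]                              # hypothetical responses
rating = ["neg", "pos", "pos", "neg", "neg", "pos", "neg"]
rows, cols, table = crosstab(store, rating)
print(cols)
for r, counts, pcts in zip(rows, table, row_percentages(table)):
    print(r, counts, [round(p, 1) for p in pcts])
```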

Chi-Square test
Although examining the row and column percentages in a cross-tabulation is a useful first step in studying the relationship between two variables, percentages alone do not allow you to quantify or test that relationship. For these purposes, it is useful to consider indexes that measure the extent of association as well as statistical tests of the hypothesis that there is no association. The chi-square statistic is used to test for association between the row and column variables: it measures the discrepancy between the observed cell counts and the counts you would expect if the rows and columns were unrelated.
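The Pearson chi-square statistic compares each observed count with the count expected under independence, (row total × column total) / grand total, and sums (observed - expected)² / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. The minimal sketch below uses a hypothetical 2 × 2 table and returns only the uncorrected Pearson statistic and its degrees of freedom; a p-value needs the chi-square distribution (available, for example, as `scipy.stats.chi2.sf`), and the function name is illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[10, 20],
            [20, 10]]   # hypothetical 2 x 2 counts
chi2, df = pearson_chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Here every expected count is 15, the statistic is about 6.67 with 1 degree of freedom, and this exceeds 3.841 (the 5% critical value for 1 degree of freedom), so independence would be rejected at the 0.05 level.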
For tables with two rows and two columns, select Chi-square to calculate the Pearson chi-square, the likelihood-ratio chi-square, Fisher's exact test, and Yates' corrected chi-square (continuity correction). For 2 × 2 tables, Fisher's exact test is computed when a table that does not result from missing rows or columns in a larger table has a cell with an expected frequency of less than 5. Yates' corrected chi-square is computed for all other 2 × 2 tables. For tables with any number of rows and columns, select Chi-square to calculate the Pearson chi-square and the likelihood-ratio chi-square. When both table variables are quantitative, Chi-square also yields the linear-by-linear association test.
To request a chi-square test, click Statistics... in the Crosstabs dialog box, select the Chi-square check box, and then click Continue.

Figure 59: Selecting chi-square dialog box

Click OK in the crosstabs dialog box.


Table 9: Chi-square test output

The two-sided asymptotic significance of the chi-square statistic is greater than 0.05, so the test gives no evidence of an association between store and service satisfaction: the observed differences are consistent with chance variation, and the data do not indicate that the stores differ in their level of customer service.
However, not all customers surveyed had contact with a service representative. The ratings from these customers do not reflect the actual quality of service at a store, so you further cross-classify by whether they had contact with a service representative. Go back to the Crosstabs dialog box and select Contact with employee as a layer variable.
Click OK in the Crosstabs dialog box.
Table 9: Cross-tabulation of three variables

The cross-tabulation now splits the previous cross-tabulation into two parts.
The chi-square test is performed separately for customers who did and did not have
contact with a store representative.
Table 9: Chi-square test

The significance value of the test for customers who had contact with an employee is 0.012. Since this value is less than 0.05, you can conclude that, for these customers, the relationship observed in the cross-tabulation is statistically significant, that is, unlikely to be due to chance alone.

1.8 Inferential Statistics

Statistical inference is the process of using the characteristics of a sample to make state-
ments about the population from which it is drawn.
There are two main areas of inferential statistics:
Estimating parameters: This means taking a statistic from your sample data and using it to say something about a population parameter. For example, the sample mean estimates the population mean, and the sample standard deviation estimates the population standard deviation.
Hypothesis tests: This is where you use sample data to answer research questions. For example, you might be interested in knowing whether a new cancer drug is effective, or whether breakfast helps children perform better in school.
There are two kinds of estimates: point estimates and interval estimates.
A point estimate is a single value used as an estimate of a population parameter, whereas an interval estimate gives a range of values that is likely to contain the parameter. The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. For example, suppose we want to estimate the mean income of statistics students. For n = 25 students, the sample mean income is calculated to be 400 per week (point estimate). We might then say that the mean income is between 380 and 420 per week (interval estimate). Note: the point estimate always lies within the interval estimate.
Hypothesis Testing
Some common terms
A hypothesis is a statement or assertion about a population parameter (about the true value of an unknown population parameter).
A statistical test is a statistical rule by which a statistical hypothesis is accepted or rejected.
p-value: the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the value calculated from the sample. We claim a significant effect if the p-value is smaller than a conventional significance level (such as 0.05).
There are two types of hypotheses: the null hypothesis and the alternative hypothesis.
H0: The null hypothesis states the assumption to be tested.
H1: The alternative hypothesis is the opposite of the null. It is the hypothesis the researcher believes to be true, and it is supported only if the data lead us to reject H0.
General Steps in Hypothesis Testing

1. State the H0
2. State the H1
3. Choose α
4. Set up critical value(s)
5. Compute test statistic and p-value
6. Make statistical decision
7. Express conclusion
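The seven steps can be traced in a short sketch. Because Python's standard library has no t distribution, this example uses a two-sided z-test for a mean with a known population standard deviation rather than a t-test; the function name `z_test_mean` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 versus H1: mu != mu0 (sigma assumed known)."""
    standard_normal = NormalDist()                      # mean 0, standard deviation 1
    critical = standard_normal.inv_cdf(1 - alpha / 2)   # step 4: critical value
    z = (sample_mean - mu0) / (sigma / sqrt(n))         # step 5: test statistic
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))     # step 5: two-sided p-value
    reject_h0 = abs(z) > critical                       # step 6: decision
    return z, critical, p_value, reject_h0

# Steps 1-3: H0: mu = 50, H1: mu != 50, alpha = 0.05 (hypothetical numbers).
z, critical, p, reject = z_test_mean(sample_mean=52, mu0=50, sigma=5, n=25)
print(f"z = {z:.2f}, critical = {critical:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With these numbers z = 2.0, which exceeds the critical value of about 1.96, and the p-value is about 0.0455; step 7 would conclude that the mean differs significantly from 50 at the 0.05 level.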

1.8.1 Testing of Hypothesis About One Population Mean

The t-test may be:


1. The one sample t-test
2. The independent samples t-test
3. The paired t-test

One-Sample T-Test
The one-sample t test can be used whenever a sample mean must be compared to a known test value. The one-sample t test assumes that the data are approximately normally distributed. The One-Sample T Test procedure tests whether the mean of a single variable differs from a specified constant.
To perform a one-sample t test: Analyze =⇒ Compare Means =⇒ One-Sample T Test.
Example 1: A manufacturer of high-performance automobiles produces disc brakes that must measure 322 millimeters in diameter. Quality control randomly draws 16 discs made by each of eight production machines and measures their diameters. This example uses the file “brakes.sav”. Use One-Sample T Test to determine whether the mean diameter of the brakes in each sample differs significantly from 322 millimeters.
To obtain a separate test for each machine, first split the file by machine (Data =⇒ Split File...). Then select Brakes as the test variable =⇒ type 322 as the test value.
The output of the above procedure is given as follows.

The Descriptive table displays the sample size, mean, standard deviation, and standard error for each of the eight samples. The sample means are scattered around the 322 mm standard with what appears to be a small amount of variation. The test statistic table shows the results of the one-sample t test. The t column displays the observed t statistic for each sample, calculated as the mean difference divided by the standard error of the sample mean. The df column displays the degrees of freedom, which here equal the number of cases in each group minus 1.

Table 10: Parameter estimates using one sample t-test

The column labeled Sig. (2-tailed) displays a probability from the t distribution with
15 degrees of freedom. The value listed is the probability of obtaining an absolute value
greater than or equal to the observed t statistic, if the difference between the sample
mean and the test value is purely random.
The Mean Difference is obtained by subtracting the test value (322 in this example)
from each sample mean.
The 95% Confidence Interval of the Difference provides an estimate of the boundaries between which the true mean difference lies in 95% of all possible random samples of 16 disc brakes produced by each machine. Since their confidence intervals lie entirely above 0.0, you can conclude that machines 2, 5 and 7 are producing discs that are, on average, significantly wider than 322 mm.
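The quantities in the one-sample output (t, df, Mean Difference, and the confidence interval of the difference) can be reproduced from raw data with a short sketch. Python's standard library has no t distribution, so the two-sided critical value is passed in from a t table (for the brakes example, 16 discs per machine give 15 degrees of freedom and a 95% critical value of 2.131); no p-value is computed, and the function name and diameters below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, test_value, t_critical):
    """t statistic, df, mean difference and CI of the difference for a one-sample t test.

    t_critical is the two-sided critical value for the chosen confidence level
    and n - 1 degrees of freedom, looked up in a t table (not computed here).
    """
    n = len(values)
    diff = mean(values) - test_value       # "Mean Difference" column
    se = stdev(values) / sqrt(n)           # standard error of the mean
    t = diff / se
    ci = (diff - t_critical * se, diff + t_critical * se)
    return t, n - 1, diff, ci

diameters = [322.0, 322.4, 321.8, 322.6, 322.2]   # hypothetical sample, n = 5
# 2.776 is the two-sided 95% critical value for 4 degrees of freedom.
t, df, diff, ci = one_sample_t(diameters, test_value=322, t_critical=2.776)
print(f"t = {t:.3f}, df = {df}, mean difference = {diff:.3f}, 95% CI = {ci}")
```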
Exercise 1: Using the demo.sav data, test whether the mean household income (in thousands) differs significantly from 60.

Independent-Samples T Test
The Independent-Samples T Test procedure compares means for two groups of cases.
Ideally, for this test, the subjects should be randomly assigned to two groups, so that
any difference in response is due to the treatment (or lack of treatment) and not to other
factors. This is not the case if you compare average income for males and females. A
person is not randomly assigned to be a male or female. In such situations, you should
ensure that differences in other factors are not masking or enhancing a significant dif-
ference in means. Differences in average income may be influenced by factors such as
education (and not by sex alone).
In short, the Independent-Samples T Test procedure tests the significance of the difference between two sample means. The output also displays:
– Descriptive statistics for each test variable
– A test of equality of variances
– A confidence interval for the difference between the two means (95% or a value you specify)
Example 2: An analyst at a department store wants to evaluate a recent credit card promotion. To this end, 500 cardholders were randomly selected. Half received an ad promoting a reduced interest rate on purchases made over the next three months, and half received a standard seasonal ad.
Output
Table 11: Independent Samples Test Equal variances assumed (check the pivot option
and select pivot for the assumptions of equality of variance)

a. Since the significance value of the test is less than 0.05, you can conclude that the average of 71.11 dollars more spent by cardholders receiving the reduced interest rate is unlikely to be due to chance alone. The store will now consider extending the offer to all credit customers.
b. The 95% Confidence Interval of the Difference provides an estimate of the boundaries
between which the true mean difference lies in 95% of all possible random samples of
500 cardholders.
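The two rows of the Independent Samples Test table correspond to two versions of the t statistic: a pooled-variance version (equal variances assumed, n1 + n2 - 2 degrees of freedom) and Welch's version (equal variances not assumed) with Welch-Satterthwaite degrees of freedom. The sketch below computes both from two hypothetical samples; it omits p-values and Levene's test, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance and Welch t statistics for two independent samples."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)    # sample variances
    diff = mean(group1) - mean(group2)
    # Equal variances assumed: pool the two sample variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch), with Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"diff": diff, "t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch}

promo = [120.5, 98.0, 143.2, 110.7, 131.9]     # hypothetical spending, reduced-rate ad
standard = [85.1, 102.4, 91.3, 79.8, 96.0]     # hypothetical spending, standard ad
print(independent_t(promo, standard))
```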
Activity 2: Using the employee.sav data, test whether the mean current salary differs significantly
a. between minority classifications;
b. between males and females.

Paired t-test
One of the most common experimental designs is the "pre-post" design. A study of this type often consists of two measurements taken on the same subject, one before and one after the introduction of a treatment or a stimulus. The basic idea is simple. If the treatment had no effect, the average difference between the measurements is equal to 0 and the null hypothesis holds. On the other hand, if the treatment did have an effect (intended or unintended!), the average difference is not 0, and the test should lead us to reject the null hypothesis.
The Paired-Samples T Test procedure is used to test the hypothesis of no difference between two variables. The data may consist of two measurements taken on the same subject or one measurement taken on a matched pair of subjects.
Additionally, the procedure produces:
–Descriptive statistics for each test variable
–The Pearson correlation between each pair and its significance
–A confidence interval for the average difference (95% or a value you specify)
Example 3: A physician is evaluating a new diet for her patients with a family history of heart disease. To test the effectiveness of this diet, 16 patients are placed on the diet for 6 months. Their weights are measured before and after the study, and the physician wants to know whether their weights have changed. This example uses the file dietstudy.sav. Use Paired-Samples T Test to determine whether there is a statistically significant difference between the pre- and post-diet weights of these patients.

Table 12: Results paired samples statistics

– The subjects clearly lost weight over the course of the study; on average, about 8
pounds.
Table 13: paired samples Correlation

–The Pearson correlation between the baseline and six-month weight measurements is
0.996, almost a perfect correlation.

Table 14: Paired samples test

–The Mean column in the paired-samples t test table displays the average difference in
weight measurements before the diet and six months into the diet.
–The 95% Confidence Interval of the Difference provides an estimate of the boundaries
between which the true mean difference lies in 95% of all possible random samples of 16
patients like the ones participating in this study.
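–Since the p-value is less than α, we reject the null hypothesis and conclude that the mean weight after six months on the diet differs significantly from the mean weight before it; the 95% confidence interval of the difference, which excludes 0, leads to the same conclusion.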
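A paired t test is a one-sample t test applied to the within-subject differences, with n - 1 degrees of freedom where n is the number of pairs. A minimal sketch with hypothetical weights (not values from dietstudy.sav); the function name `paired_t` is illustrative and no p-value is computed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Mean difference, t statistic and df for a paired t test (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]   # same subjects, same order
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)                      # standard error of the mean difference
    return d_bar, d_bar / se, n - 1

before = [200, 190, 180, 170]   # hypothetical baseline weights (pounds)
after = [195, 186, 173, 166]    # hypothetical six-month weights
d_bar, t, df = paired_t(before, after)
print(f"mean difference = {d_bar}, t = {t:.3f}, df = {df}")
```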
Activity 3: consider the employee.sav data and test whether there is a significant dif-
ference between the beginning and current salary of employees.

ANOVA
The analysis of variance (ANOVA) is one of the most widely used methods for the statistical analysis of quantitative data, particularly data from agricultural experiments. It is closely related to Student's t-test, but whereas the t-test is only suitable for comparing two treatment means, ANOVA can be used both for comparing several means and in more complex situations. ANOVA partitions the total variation into components such as treatment, block, and error variation, depending on the design of the experiment. An analysis of variance is used to test the null hypothesis that several population means are equal. It examines the variability of the observations within each group as well as the variability between the group means. Based on these two estimates of variability, you draw conclusions about the population means.

One-Way ANOVA
The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-independent-samples t test. In addition to determining that differences exist among the means, you may want to know which means differ; post hoc (multiple comparison) tests answer this question. An important first step in the analysis of variance is establishing the validity of its assumptions. One assumption of ANOVA is that the variances of the groups are equal.
To test the equality of variance assumption, from the menus choose:
Analyze =⇒ Compare Means =⇒ One-Way ANOVA... =⇒ select the dependent variable and the factor variable =⇒ Options =⇒ select Homogeneity of variance test.
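The F statistic that One-Way ANOVA reports is the ratio of the between-group mean square to the within-group mean square, with k - 1 and N - k degrees of freedom for k groups and N observations. The sketch below computes it for hypothetical groups; it does not compute the p-value, Levene's test, or post hoc comparisons, and the function name is illustrative.

```python
from statistics import mean

def one_way_anova(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: how far each observation is from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])   # hypothetical groups
print(f"F({df1}, {df2}) = {f:.2f}")
```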

1.8.2 Correlation and Linear Regression

Correlation
Correlation is a statistical technique used to determine the relationship between two or more variables. We use two different techniques to describe such relationships: a graphing technique (the scatter plot) and a mathematical technique called correlation. The values of the correlation coefficient always range from -1 to +1. A correlation coefficient near 0 indicates no linear relationship.
Types of Relationships
The correlation coefficient r indicates:
– the strength of the relationship (strong, weak, or none);
– the direction of the relationship: positive (direct), where the variables move in the same direction, or negative (inverse), where the variables move in opposite directions.
A scatter plot (or scatter diagram) is used to show the relationship between two variables.

Correlation Coefficient
The correlation coefficient ρ (rho) measures the strength of the linear association between two variables in the population; it is estimated by the sample correlation coefficient r.
Properties of Correlation Coefficient:
– Unit free
– Range between -1 and 1
– The closer to -1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker the linear relationship

Hypothesis testing with Correlations
H0: ρ = 0 (no actual correlation)
H1: ρ ≠ 0 (there is some correlation)
Example 4: In order to increase sales, motor vehicle design engineers want to
focus their attention on aspects of the vehicle that are important to customers.
For example, how important is fuel efficiency with respect to sales? One way
to measure this is to compute the correlation between past sales and fuel effi-
ciency. Information concerning various makes of motor vehicles is collected in
car_sales.sav. Use Bivariate Correlations to measure the importance of fuel efficiency to the saleability of a motor vehicle.
Open the data file car_sales.sav, then choose Analyze =⇒ Correlate =⇒ Bivariate.
The Pearson correlation coefficient measures the linear association between two scale variables. However, it works best when the variables are approximately normally distributed and have no outliers. A scatter plot can reveal these possible problems.
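The Pearson coefficient is the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations; the test of H0: ρ = 0 uses t = r√((n - 2)/(1 - r²)) with n - 2 degrees of freedom. A sketch with hypothetical data (not car_sales.sav values); on Python 3.10 or later, `statistics.correlation` gives the same r. The function names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

fuel_efficiency = [21, 24, 27, 30, 33]   # hypothetical values
sales = [60, 52, 55, 40, 38]
r = pearson_r(fuel_efficiency, sales)
print(round(r, 3), round(t_for_r(r, len(sales)), 3))
```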

Regression
Linear regression is used to model the value of a dependent scale variable based on its linear relationship to one or more predictors. To use this model, the response variable should be quantitative; the predictor variables can be either qualitative or quantitative. Linear regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable.
Dependent variable: the variable we wish to explain.
Independent variable: the variable used to explain the dependent variable.
For example, you can try to predict a salesperson's total yearly sales (the dependent variable) from independent variables such as age, educational background, and years of experience. The linear regression model assumes that there is a linear, or "straight line", relationship between the dependent variable and each predictor.
Simple Linear Regression Model
– Only one independent variable, x.
– The relationship between x and y (the dependent variable) is described by a linear function: y = β0 + β1x + e.
– Changes in y are assumed to be caused by changes in x.

Interpretation of the Slope and the Intercept


– β0 is the estimated average value of y when the value of x is zero.
– β1 is the estimated change in the average value of y as a result of a one-unit
change in x.
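For the simple model y = β0 + β1x + e, the least-squares estimates are b1 = Sxy/Sxx and b0 = ȳ - b1x̄, where Sxy and Sxx are the sums of cross-products and of squared deviations about the means. A sketch with hypothetical data; the function name `least_squares` is illustrative, and Python 3.10 also provides `statistics.linear_regression`, which returns the same slope and intercept.

```python
def least_squares(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # estimated change in mean y per one-unit change in x
    b0 = my - b1 * mx     # estimated mean y when x = 0
    return b0, b1

x = [1, 2, 3, 4]          # hypothetical data lying exactly on y = 1 + 2x
y = [3, 5, 7, 9]
b0, b1 = least_squares(x, y)
print(b0, b1)             # 1.0 2.0
```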
Multiple Regression Models
The relationship between one dependent variable and two or more independent variables is a linear function:
Y = β0 + β1x1 + β2x2 + ... + βpxp + e
β0 = y-intercept, a constant value
β1 = slope of Y with respect to x1, holding the effects of x2, x3, ..., xp constant
...
βp = slope of Y with respect to xp, holding the effects of all other variables constant
The model is linear because increasing the value of the jth predictor by 1 unit increases or decreases the expected value of the dependent variable by βj units (holding the other predictors constant). Note that β0, the intercept, is the model-predicted value of the dependent variable when the value of every predictor is equal to 0.
For testing hypotheses about the values of model parameters, the linear regression model also assumes the following:
– The error term has a normal distribution with a mean of 0.
– The variance of the error term is constant across cases and independent of the variables in the model. An error term with non-constant variance is said to be heteroscedastic.
– The value of the error term for a given case is independent of the values of the variables in the model and of the values of the error term for other cases.
To fit the model in SPSS:
1. Analyze =⇒ Regression =⇒ Linear =⇒ select the dependent and the independent variables.
2. Click Statistics and select Estimates, Confidence intervals, and Model fit.
Coefficient of Determination (R²)
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable(s). In simple linear regression, the coefficient of determination is the square of the correlation coefficient (r). For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is (0.90)² = 0.81.
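R² can also be computed directly as 1 - SS(residual)/SS(total). The sketch below does this for a hypothetical simple regression whose least-squares line is ŷ = 1 + 0.5x (the fit for x = 1, 2, 3 and y = 1, 3, 2), and checks that the result equals r² in the simple-regression case. The function name `r_squared` is illustrative.

```python
from math import sqrt

def r_squared(y, fitted):
    """Share of the total variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ss_residual / ss_total

x = [1, 2, 3]
y = [1, 3, 2]
fitted = [1 + 0.5 * xi for xi in x]    # least-squares line for these data
# Pearson r for the same data, to check that R^2 = r^2 in simple regression.
mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(r_squared(y, fitted), r ** 2)    # prints 0.25 0.25
```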

Example: Using the car_sales.sav data, take sales in thousands as the dependent variable and the remaining variables as independent variables. Then
– test each parameter (coefficient);
– find the final fitted regression model.
Steps in SPSS
1. Open the car_sales.sav data.
2. Analyze =⇒ Regression =⇒ Linear =⇒ select Sales in thousands as the dependent variable and the rest as the independent variables.
3. Click Statistics and select Estimates, Confidence intervals, and Model fit.

Predictors: (Constant), Fuel efficiency, Length, Price in thousands, Vehicle type, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower.
Overall, the regression does not do a good job of modeling sales: only 33.5% of the variation in sales is explained by the model.

Table:13 Parameter estimates, coefficients

– Dependent Variable: Sales in thousands

Variables whose Sig. values are shown in bold are the significant variables. Several coefficients are not significant, indicating that these variables do not contribute much to the model.

2 MINITAB

2.1 MINITAB Windows

When you start MINITAB, a new empty project is opened for you. The main MINITAB window contains three main windows: the Session window, the Worksheet, and the Project Manager. MINITAB works like any other window in your operating system; you can also use special MINITAB commands.

Figure 60: MINITAB Window with its three components

I. Worksheet window: It contains many rows and columns, with a specified name for each column and row. It is used to enter data (numeric, text, and date/time) and to store the results of the analysis. It is possible to open more than one worksheet at a time.
II. Session window: It is a blank white window used to display the results of the analysis.
III. Project Manager: It contains different folders, each with its own function:
a. Session folder: It manages the Session window.
b. History folder: It lists the commands you have used in your session.
c. Graph folder: It is for managing, arranging, and naming your graphs.
d. Report Pad folder: It is used for creating, arranging, and editing reports of your project.
e. Related Documents folder: It gives quick access to project-related, non-MINITAB files for easy reference.
f. Worksheet folder: It displays a summary of the columns, stored constants, matrices, and designs used in the current worksheet.
IV. Graph window: It is used to display graphs and charts; it appears only when you create a graph or chart for your data.

2.2 Data Types

This section discusses the types of data you can work with in MINITAB and the forms those data can take. MINITAB works with three types of data, which can be stored in three forms: columns, constants or matrices. The three types are:
1. Numeric: Numeric data consist of the digits 0, 1, ..., 9 and the symbol *, which is reserved for missing values. A number can carry a − or + sign, and very large or very small numbers can be written in exponential notation, e.g. 3.2E12, which equals 3.2 × 10^12. Numbers can be stored in columns, constants or matrices. MINITAB stores and computes numbers in double precision, which means that numbers can have up to 15 or 16 digits (depending on the number) without round-off error.
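Python's `float` is also an IEEE 754 double, so the numeric rules above can be mimicked in a few lines: the * missing-value symbol, signs, exponential notation and double precision. This is an illustration only. `parse_numeric_cell` is a hypothetical helper, not a MINITAB function, and NaN stands in for MINITAB's *.

```python
import math
import sys

MISSING_SYMBOL = "*"   # MINITAB's missing-value code for numeric data

def parse_numeric_cell(text):
    """Convert one numeric worksheet entry to a float; '*' becomes NaN (missing)."""
    text = text.strip()
    if text == MISSING_SYMBOL:
        return math.nan
    # float() accepts a leading + or - sign and exponential notation such as 3.2E12
    return float(text)

print(parse_numeric_cell("3.2E12"))   # 3200000000000.0, i.e. 3.2 x 10^12
print(parse_numeric_cell("-4.75"))
print(parse_numeric_cell("*"))        # nan
# Number of decimal digits a double is guaranteed to preserve
print(sys.float_info.dig)
```

A 15-digit integer such as 123456789012345 is stored exactly. Once a number has more than about 16 significant digits, doubles start to round, which is the limit the text refers to.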

2. Text: Text data can be of two types, characters or strings. A character is a single letter, digit (0 to 9), space or punctuation mark such as >, ?, < or !. A string is a series of characters; examples of string variables are country, name and occupation. At most 80 characters can be entered in a single text value. Text can be stored in columns or constants, but not in matrices.

3. Date/Time: You can enter dates (such as Jan-1-1997 or 03/01/2011), times (such as 24:23) or both (such as 24/11/2002;10:30AM).
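Parsing the example entries shows how a date/time value is built from a date part and a time part. The sketch below uses Python's `datetime.strptime`. The format list is an assumption chosen to fit the examples above: it reads dates day-first, so 03/01/2011 becomes 3 January 2011, and it expects a semicolon before the time. It is not MINITAB's own list, which depends on its settings and regional options.

```python
from datetime import datetime

# Hypothetical formats matching the examples in the text (day-first order assumed)
FORMATS = [
    "%b-%d-%Y",          # Jan-1-1997
    "%d/%m/%Y",          # 03/01/2011 read as 3 January 2011
    "%d/%m/%Y;%I:%M%p",  # 24/11/2002;10:30AM
    "%H:%M",             # clock time such as 10:30
]

def parse_date_time(text):
    """Try each format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue   # this format does not fit; try the next one
    raise ValueError(f"unrecognised date/time: {text!r}")

print(parse_date_time("Jan-1-1997"))          # 1997-01-01 00:00:00
print(parse_date_time("24/11/2002;10:30AM"))  # 2002-11-24 10:30:00
```

`%H` accepts hours 00 to 23 only, so this sketch rejects an entry such as 24:23. Treating such a value as an elapsed time would need separate handling.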

2.3 Basic Computer Terms (Jargon)

Throughout this textbook you will encounter some computer jargon. To make you comfortable with it, the terms are defined and illustrated as follows.
1 Dialog box: A secondary window containing buttons and various options through which you carry out a particular command or task.
2 Text box: A box in a dialog box into which you type information needed to carry out a command. The text box may be blank or may already contain text when the dialog box opens.
3 Button: A small control in a dialog box that you click to make the computer carry out an action.
4 Radio button: A small round button in a group of options, of which only one can be selected at a time.
5 Check box: A small square box for an option; check boxes can be selected alone or together (see, for example, the MINITAB Calculator).
6 Combo box: A special type of text box that displays a list of items when you click its drop-down arrow.
These terms are illustrated in the following figure.

Figure 61: Dialog boxes with their parts labelled

2.4 Opening and Exiting the MINITAB Window

When you try to close a MINITAB window that contains an unsaved project, a message box reminds you to save it. Click OK to save the project; otherwise, click No to close without saving.

2.5 Entering Data and Saving a Project

When you start MINITAB, you begin with a new, empty project. You can add data to your project in several ways. The simplest is to type your data into the worksheet. In addition, MINITAB includes many sample data files for training purposes; to open one, select File ⇒ Open Worksheet and choose a file from the list in the dialog box that appears.

2.5.1 Entering Data into Worksheet Window

In MINITAB, data can be entered into the worksheet either column-wise or row-wise, but column-wise entry is preferred and most common. The first row of the worksheet is reserved for the name of the variable, so you should start entering data in the second row.
Activity 2.1: Enter the following data into the 1st, 2nd and 3rd columns of the worksheet.
Table 2.1: Age, sex and date of birth of 7 individuals

Figure 62: Data entry on Worksheet window

Note that if you enter numeric data into column 1, the column head remains “C1”. If you enter text data into column 2, the head becomes “C2-T”. If you enter date/time data into column 3, the head becomes “C3-D”; if the entries are not recognised as dates, they are stored as text and the head reads “C3-T”.
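The column-head rule can be expressed as a small function. A column whose entries are all numbers (or the missing value *) keeps the plain head. A column of recognisable dates gets -D. Anything else is treated as text and gets -T. `column_header` is a hypothetical helper for illustration, and the two date formats it recognises are an assumption; the function does not reproduce MINITAB's actual type detection.

```python
from datetime import datetime

DATE_FORMATS = ("%d/%m/%Y", "%b-%d-%Y")   # assumed formats, for illustration only

def _is_number(value):
    if value == "*":            # missing value in a numeric column
        return True
    try:
        float(value)
        return True
    except ValueError:
        return False

def _is_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def column_header(index, values):
    """Return a MINITAB-style column head such as 'C1', 'C2-T' or 'C3-D'."""
    header = f"C{index}"
    if all(_is_number(v) for v in values):
        return header
    if all(_is_date(v) for v in values):
        return header + "-D"
    return header + "-T"

print(column_header(2, ["Male", "Female"]))   # C2-T
```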
Activity 2.2: Enter the names of 10 students and their arrival times to class in 2 columns. Name the columns “Name” for the students' names and “Time” for their arrival times.

2.6 Working on Data

This section discusses operations on data that come before statistical analysis: changing the data type, sorting, ranking, and displaying data already entered in the worksheet in the Session window.

2.6.1 Changing Data Type

MINITAB can convert one data type into another, e.g. from text to numeric or vice versa, or from numeric to date/time or vice versa. To do this, select Manip from the menu bar, choose Change Data Type and then choose the combination of current and new data type. In the dialog box that appears, enter the name of the variable to be changed in the first text box and the name of the new variable to be created in the second text box (see the following forms).
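Conceptually, converting text to numeric means trying to read each entry as a number. The sketch below assumes that entries which cannot be read become missing values, with NaN standing in for MINITAB's *. This text does not say whether MINITAB itself handles such entries this way. Both helper names are hypothetical.

```python
import math

def text_to_numeric(column):
    """Convert a text column to floats; entries that are not numbers become NaN (missing)."""
    converted = []
    for value in column:
        try:
            converted.append(float(value))
        except ValueError:
            converted.append(math.nan)   # stands in for MINITAB's '*' missing value
    return converted

def numeric_to_text(column):
    """Convert a numeric column to text, dropping a trailing '.0' on whole numbers."""
    return [format(v, "g") for v in column]

print(text_to_numeric(["12", "3.5", "abc"]))   # [12.0, 3.5, nan]
print(numeric_to_text([23.0, 1.5]))            # ['23', '1.5']
```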

2.6.2 Sorting and Ranking Data

In MINITAB, scale (interval or ratio) and ordinal data can be sorted in ascending or descending order and can also be assigned rank scores.
Sorting Data
You can sort one or more columns of data according to the values in the column(s) you specify. Sorting orders the sort-by column(s) alphabetically or numerically and carries the associated columns along with them. You can sort in ascending or descending order, and the sorted result appears in the worksheet. The steps for sorting a column are as follows.
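The key point of sorting with associated columns carried along is that whole rows move together. The sketch below models a worksheet as a Python dict of equal-length columns. That representation is an assumption for illustration, and `sort_worksheet` is a hypothetical helper, not a MINITAB command.

```python
def sort_worksheet(columns, by, descending=False):
    """Sort all columns by the values in column `by`, keeping each row together.

    `columns` maps a column name to a list; all lists must have the same length.
    """
    keys = columns[by]
    # New row order; Python's sort is stable, so tied rows keep their original order
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=descending)
    return {name: [values[i] for i in order] for name, values in columns.items()}

worksheet = {"Name": ["Abebe", "Sara", "Kebede"], "Age": [25, 19, 31]}   # made-up data
print(sort_worksheet(worksheet, "Age"))
# {'Name': ['Sara', 'Abebe', 'Kebede'], 'Age': [19, 25, 31]}
```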

Ranking Data
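The idea of a rank score can be sketched as follows: the smallest value gets rank 1, the next gets rank 2, and so on. Tied values here receive the average of the ranks they occupy. That tie rule is an assumption (it is the common convention), and `rank_scores` is a hypothetical helper.

```python
def rank_scores(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

print(rank_scores([25, 19, 31, 25]))   # [2.5, 1.0, 4.0, 2.5]
```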

2.6.3 Displaying Data in the Session Window

You can display stored data in the Session window by following the procedure below. Before you begin, enter some data of any type in one or more columns.

Activity 2.3: Using the data from Activity 2.1, sort the variable “Age” in ascending order, rank it, and display the results in the Session window.

Figure 63: Session window output for Activity 2.3

2.7 Saving a Project

To save a project, follow these steps:

–Click File > Save / Save As.
–In the dialog box that appears (Form 3.5), type the file name of the project in the File name text box.
–Then click the Save button.

2.8 Data Organization

Table procedures summarize data in table form for further analysis. Your data must be arranged in the worksheet in a particular way before MINITAB's statistical procedures can be applied. This section discusses how to organize both numerical and categorical data in table format using MINITAB.
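As an illustration of what a table procedure produces, the sketch below builds a one-way frequency table and a two-way cross-tabulation of counts from categorical columns stored as lists. The helper names and the sample data are hypothetical. A full table procedure can also add percentages and totals, which are left out here.

```python
from collections import Counter

def tally(column):
    """One-way table: count of each category, in sorted category order."""
    return dict(sorted(Counter(column).items()))

def cross_tab(rows, cols):
    """Two-way table of counts for two categorical columns of equal length."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

sex = ["M", "F", "F", "M", "F"]            # made-up data
smoker = ["Yes", "No", "Yes", "No", "No"]
print(tally(sex))              # {'F': 3, 'M': 2}
print(cross_tab(sex, smoker))  # {'F': {'No': 2, 'Yes': 1}, 'M': {'No': 1, 'Yes': 1}}
```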

2.9 Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) methods are used mainly to explore data before applying more traditional methods, or to examine the distribution of the data. They are particularly useful for identifying unusual observations and for detecting violations of traditional assumptions, such as non-normality or non-constant variance. Of the various EDA methods, this section covers the procedures for creating box plots and stem-and-leaf plots.
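To make the two plots concrete, the sketch below computes the numbers a box plot draws: quartiles, whisker ends, and outliers lying more than 1.5 × IQR beyond the box. It also prints a simple stem-and-leaf display. Quartiles use the (n + 1)p position with linear interpolation, which is believed to match MINITAB's default but differs from some other packages. The stem-and-leaf function assumes non-negative integers, with the tens digit as the stem, and omits extras (such as depth counts) that MINITAB's own display may add. All names and data are illustrative.

```python
import math

def quantile(sorted_data, p):
    """Value at position (n + 1) * p (1-based) in sorted data, linearly interpolated."""
    n = len(sorted_data)
    pos = (n + 1) * p
    if pos <= 1:
        return sorted_data[0]
    if pos >= n:
        return sorted_data[-1]
    lower = math.floor(pos)
    frac = pos - lower
    return sorted_data[lower - 1] + frac * (sorted_data[lower] - sorted_data[lower - 1])

def box_plot_summary(data):
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    s = sorted(data)
    q1, median, q3 = quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    outliers = [x for x in s if x < low_fence or x > high_fence]
    return {"Q1": q1, "median": median, "Q3": q3,
            "whiskers": (inside[0], inside[-1]), "outliers": outliers}

def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit; assumes non-negative integers."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # empty stems are kept as gaps
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

ages = [12, 15, 21, 23, 23, 27, 30, 34, 38, 95]   # made-up data
print(box_plot_summary(ages))
print("\n".join(stem_and_leaf(ages)))
```

For this data Q1 = 19.5, the median is 25 and Q3 = 35, so the upper fence is 58.25 and the value 95 is flagged as an outlier. The stem-and-leaf display shows the same gap between the 30s and 95.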
