Research Methodology Lab File
Batch: 2020-23
Index
Module 1 (Introduction to SPSS): Define SPSS. How to import data from MS Excel to SPSS? Differentiate between Data View and Variable View. Explain the basic elements of SPSS. Describe the functions, advantages and disadvantages of SPSS.
Module 2 (Descriptive Statistics): What are Descriptive Statistics? Define Mean, Median, Mode, Maximum and Minimum Value. (Practical in SPSS)
Module 3 (Cross-Tabulation): Define cross-tabulation and its purpose. (Practical in SPSS)
Module 4 (Charts and Boxplots): Define charts and their types. What is a box-plot? Explain with an example. (Practical in SPSS)
Module 5 (Correlation): Define correlation and perform correlation analysis. (Practical in SPSS)
Module 6 (Practical Problem): Create a data file, create charts (histograms), calculate frequencies, and create boxplots (simple and clustered). (Practical in SPSS)
Module 7 (Mean of Variables & Normality Distribution Test): Compute the Mean of variables. Also perform Test of Normality distribution (reporting skewness, kurtosis, Shapiro-Wilk test, Q-Q plots, and box plots). (Practical in SPSS)
Module 8 (Reliability Test): Explain the meaning of Reliability Test. Show the steps to compute reliability. (Practical in SPSS)
Module 9 (Chi-square Test): Defining Chi-square, its steps and understanding the significance of the p-value in chi-square. (Practical in SPSS)
Module 10 (T-Tests): Define T-Test. Explain One-Sample T-Test and Paired Sample T-Test. (Practical in SPSS)
Module 11 (Regression): Explain Regression and show steps to compute regression in SPSS. Also, explain the types of Regression. (Practical in SPSS)
Module 12 (ANOVA): Define ANOVA Test. Perform ANOVA Test in SPSS. (Practical in SPSS)
Module - 1 (Introduction to SPSS)
Define SPSS
SPSS stands for Statistical Package for the Social Sciences and is used for complex statistical
data analysis by various researchers. The SPSS software package was created for the
management and statistical analysis of social science data. It is widely used because of its
straightforward command language and thorough user manual.
SPSS is most often used in social science fields such as psychology, where statistical
techniques are applied on a large scale. Techniques such as cross-tabulation, the t-test and the
chi-square test are available in the Analyze menu of the software.
Processing and analysis of data using SPSS is done by market researchers, health researchers,
survey companies, government entities, education researchers, marketing organizations, data
miners, and many more.
For importing data from Excel to SPSS, perform the following steps:
Step 1 - Prepare the data in an MS Excel worksheet and save the file. A sample dataset is shown below.
Location Population HighestInfectionCount PercentPopulationInfected
Faeroe Islands 49053 34658 70.65419037
Denmark 5813302 3017529 51.90731533
Andorra 77354 39234 50.72006619
Gibraltar 33691 16436 48.78454187
Iceland 368792 175329 47.54143257
Slovenia 2078723 942954 45.36217668
Netherlands 17173094 7677637 44.70736025
San Marino 34010 15034 44.20464569
Slovakia 5449270 2366902 43.43521242
Cyprus 896005 387315 43.22687931
Latvia 1866934 777201 41.62980587
Georgia 3979773 1643295 41.29117414
Estonia 1325188 545057 41.13054148
Israel 9291000 3801627 40.91730707
Liechtenstein 38254 15497 40.51079626
Seychelles 98910 39991 40.43170559
Austria 9043072 3532415 39.06211296
Switzerland 8715494 3353754 38.48036612
Lithuania 2689862 997690 37.09075038
Montenegro 628051 232376 36.99954303
France 67422000 24394936 36.18245676
Czechia 10724553 3750636 34.97242263
Portugal 10167923 3458727 34.01606208
Belgium 11632334 3741614 32.16563417
Maldives 543620 174658 32.12869284
Luxembourg 634814 202577 31.91123699
Aruba 107195 33843 31.57143523
Isle of Man 85410 26734 31.30078445
Bahrain 1748295 546896 31.28167729
Step 2 - Open SPSS, then go to File > Import Data and then click on Excel.
Step 3 - Select the File that you want to open and then click on Open Button.
Step 4 - In the Read Excel File Box, click on OK and your data file will now open in SPSS.
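As an alternative to the menus, the same import can be run from a Syntax Editor window (File > New > Syntax). A minimal sketch, assuming a hypothetical file path and an Excel sheet whose first row holds the variable names:

* Import an Excel worksheet into SPSS (file path and sheet name are placeholders).
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\covid_data.xlsx'
  /SHEET=NAME 'Sheet1'
  /CELLRANGE=FULL
  /READNAMES=ON.
EXECUTE.
* Give the imported data a working name for this session.
DATASET NAME ImportedData WINDOW=FRONT.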
Data and Variable View
Let's first take a look at the main parts of the Data View tab.
1. The data editor has tabs for switching between Data View and Variable View.
Make sure you're in Data View.
2. Columns of cells are called variables. Each variable has a unique name
(“gender”) which is shown in the column header.
3. Rows of cells are called cases. Oftentimes, each respondent in a study is
represented as a single case.
4. In SPSS, values refer to cell contents.
5. The status bar may give useful information on the data, for instance, whether
a WEIGHT, FILTER, SPLIT FILE or Unicode mode is in effect.
These are the main elements in Data View. Next, let's take a look at Variable View.
1. In the left bottom corner, we find tabs for switching between Variable View and
Data View. For now, select Variable View.
2. In Variable View, variables are shown as rows of cells.
3. The first column shows the variable name for each variable.
4. The fifth column may or may not contain a variable label. This describes the exact
meaning of each variable.
5. The sixth column shows value labels: descriptions of the meaning of one, many or
all values that a variable may contain.
Elements of SPSS
The Data Editor opens immediately upon starting SPSS and, when empty, looks like a
typical spreadsheet. When data is loaded into the Data Editor, each column will represent a
variable and each row will represent a case. Selecting the tab at the bottom that's labeled
"Variable View" allows the user to view and edit information about each variable. To open a
new Data Editor, select "File"->"New"->"Data." When the contents of the Data Editor are
saved, the resulting file will have a ".sav" extension. If a file has been saved in the SAV
format you can open it by selecting "File"->"Open"->"Data."
Variables in SPSS
Each variable in an SPSS dataset has a set of attributes that can be edited by toggling to the
"Variable View" tab in the Data Editor:
Name is the variable's machine-readable name. This is the name used to refer to the variable
in SPSS's underlying code and, if no "Label" is defined, the name that will appear at the top
of the column in the "Data View."
Type indicates the type of data that can be stored in the variable's column. The most
frequently used types are "String" (for text) and "Numeric." SPSS uses the type to know what
rules can be applied to a specific variable. It won't do arithmetic on a string variable, for
example.
Label sets the name that will be displayed at the top of the column in the Data Editor,
allowing for a human readable representation of the variable name.
Values sets names given to coded values (e.g., if the variable contains survey responses
where a "0" represents "no" and "1" represents a "yes" this field can be used to tell SPSS to
display the text values instead of the numerical raw data).
Role is used by some SPSS dialogues to distinguish between the variable's intended usage in
some predictive applications (e.g., regression, clustering, classification). For most dialogues,
the role won't be significant.
Clicking in the appropriate cell will open a dialogue box or drop-down menu that allows the
attribute value to be altered.
The Statistics Viewer displays the output of actions performed on data. Whether you want to
generate frequency tables, perform one-way ANOVA, or build a regression model, the results
will end up as a table or graph in the statistics viewer. When data is loaded into the Data
Editor or changed in some way (e.g., if it is sorted), the Statistics Viewer will also display
some text describing the operation and any errors that occurred. Usually, the text is in the
form of SPSS syntax (see below). Any action you perform from the Data Editor that
generates output will automatically open the Statistics Viewer, but you can also open a new
Statistics Viewer window by selecting "File"->"New"->"Output." The contents of a Statistics
Viewer window will be saved with a ".spv" extension.
The Syntax Editor allows you to control SPSS using the SPSS command language, usually
referred to as "syntax" in the SPSS documentation. At one time all of the user's interactions
with SPSS would have been performed using syntax commands, but these days, most
processes are automated in the Data Editor's menus. There are still times when writing or
editing syntax commands is useful, or even necessary. To open the Syntax Editor, select
"File"->"New"->"Syntax."
SPSS also has a Script Editor, which allows users to automate processes by writing
programs in Visual Basic or (with an extension) Python. Scripting is outside the scope of
this guide, but it can be a powerful tool when used for processing large numbers of files
consecutively or transforming text.
Functions of SPSS
SPSS offers four programs that assist researchers with their complex data analysis needs.
1. Statistics Program
SPSS’s Statistics program provides a plethora of basic statistical functions, some of which
include frequencies, cross-tabulation, and bivariate statistics.
2. Modeler Program
SPSS’s Modeler program enables researchers to build and validate predictive models using
advanced statistical procedures.
3. Text Analytics for Surveys Program
SPSS’s Text Analytics for Surveys program helps survey administrators uncover powerful
insights from responses to open-ended survey questions.
4. Visualization Designer
SPSS’s Visualization Designer program allows researchers to use their data to create a wide
variety of visuals like density charts and radial boxplots from their survey data with ease.
Advantages of SPSS
Disadvantages of SPSS
A very large dataset cannot be analysed.
SPSS can be expensive to purchase for students.
Usually involves added training to completely exploit all the available features.
The graph features are not as simple as those of Microsoft Excel.
Module - 2 (Descriptive Statistics)
Descriptive statistics are brief descriptive coefficients that summarize a given data set,
which can be either a representation of the entire population or a sample of a population.
Descriptive statistics are broken down into measures of central tendency and measures of
variability (spread). Measures of central tendency include the mean, median, and mode,
while measures of variability include standard deviation, variance, minimum and maximum
values, kurtosis, and skewness.
Definitions
Mean: The mean is the simple mathematical average of a set of two or more numbers. The
mean for a given set of numbers can be computed using either the arithmetic mean or the
geometric mean method.
Median: The median is the middle number in a sorted, ascending or descending, list of
numbers and can be more descriptive of that data set than the average.
Mode: In statistics, the mode is the value that occurs most often in a given data set, i.e. the
value with the highest frequency.
The minimum: The minimum value occurring in the data set. If we were to order all of our
data in ascending order, then the minimum would be the first number on our list.
The maximum: The maximum value occurring in the data set. If we were to order all of our
data in ascending order, then the maximum would be the last number listed.
Practical
Step – 1 Fill in the data of 50 respondents in MS Excel. Then click on File > Save As.
Step – 2 Open SPSS -> File tab -> Import Data -> choose the Excel option, select the file
that was saved, and click OK.
Step – 3 Go to Variable View -> go to the Values column -> click on the three dots for the
respective variables -> assign values to the desired fields.
Step – 4 Go to Data View -> click on the Analyze tab -> select Descriptive Statistics ->
Frequencies -> select the variables you want to analyse -> click on Statistics and
choose Mean, Median, Mode, Maximum, Minimum.
Statistics
                  Gender   Qualification   Experience   Age     Marital Status
N      Valid      50       50              50           50      50
       Missing    0        0               0            0       0
Mean              1.54     1.50            1.50         1.48    1.50
Median            1.50     1.00            1.00         1.00    1.50
Mode              1        1               1            1       1a
Std. Deviation    .579     .580            .580         .544    .505
Minimum           1        1               1            1       1
Maximum           3        3               3            3       2
a. Multiple modes exist. The smallest value is shown
Interpretations: We can see from the above table that for Qualification the mean is 1.50,
the median is 1.00 and the mode is 1. This shows that fewer respondents are qualified at the
PG level than at the UG level.
We can also see that for Gender the mean is 1.54, the median is 1.50 and the mode is 1,
which means that the numbers of male and female respondents are not equal.
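The value labels and the descriptive statistics above can also be produced from the Syntax Editor. A sketch, assuming the variables are named Gender, Qualification, Experience, Age and Marital; the codes shown are illustrative:

* Attach value labels to the coded variables (codes assumed for illustration).
VALUE LABELS Gender 1 'Male' 2 'Female'
  /Qualification 1 'UG' 2 'PG'.
* Frequencies with the statistics chosen in the dialog.
FREQUENCIES VARIABLES=Gender Qualification Experience Age Marital
  /STATISTICS=MEAN MEDIAN MODE STDDEV MINIMUM MAXIMUM
  /ORDER=ANALYSIS.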
Module - 3 (Cross-Tabulation)
A cross-tabulation is a two- (or more) dimensional table that records the number (frequency)
of respondents that have the specific characteristics described in the cells of the table. Cross-
tabulation tables provide a wealth of information about the relationship between the
variables.
They are data tables that display not only the results of the entire group of respondents, but
also the results from specifically defined subgroups. For this reason, crosstabs allow
researchers to closely investigate the relationships within a data set that might otherwise go
unnoticed.
Purpose of cross-tabulation
1. Cross tabulations or crosstabs group variables together and enable researchers to understand
the correlation between the different variables. By showing how correlations change from
one group of variables to another, cross tabulation allows for the identification of patterns,
trends, and probabilities within data sets.
2. Cross tabulation allows researchers to investigate data sets at a more
granular level.
Survey results are typically presented in aggregate data tables that show the total responses to
all questions asked in the survey. Cross tabulations are data tables that display not only the
results of the entire group of respondents, but also the results from specifically defined
subgroups.
For this reason, crosstabs allow researchers to closely investigate the relationships within a
data set that might otherwise go unnoticed.
By creating crosstabs, data sets are simplified by dividing the total set into representative
subgroups, which can then be interpreted at a smaller, more manageable scale.
This reduces the potential for making errors while analysing the data, which means that time
is spent efficiently.
By reducing total data sets into more manageable subgroups, cross tabulation allows
researchers to yield more granular, profound insights.
The entire purpose of performing statistical analysis on a data set is to uncover actionable
insights that will then impact your end goal. Because cross tabulation simplifies complex data
sets, these impactful insights are much easier to expose, record, and consider while
developing overarching strategies.
Practical
Step – 1 – Use the data of the same 50 respondents in MS Excel, import it into SPSS, and
assign values to the variables.
Class Value 1 Value 2
Gender Male Female
Age Below 25 years Above 25 years
Qualification UG PG
Marital status Married Unmarried
Experience Below 5 years Above 5 years
Step – 2 – Go to Data View -> click on Analyze -> go to Descriptive Statistics ->
Crosstabs.
Step – 3 – Select the variables like Gender in the row section and Qualification in the column
section -> Click OK.
Step – 4 – We get the desired results in the Output.
Gender * Qualification Crosstabulation
Count
                      Qualification
                      UG      PG      Total
Gender    Male        14      11      25
          Female      19      6       25
Total                 33      17      50
Interpretation
1) There are more respondents (males and females) in the 20-30 age group than in the
30-40 age group. Therefore, most of the respondents are between 20 and 30 years old.
2) The number of respondents who have completed UG is greater than the number who
have completed PG.
3) The number of educated respondents who are unemployed is greater than the number
who are employed.
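For reference, the crosstab above corresponds to syntax roughly like the following (variable names assumed to match those used in the steps):

* Cross-tabulate Gender (rows) by Qualification (columns) and show cell counts.
CROSSTABS
  /TABLES=Gender BY Qualification
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT
  /COUNT ROUND CELL.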
Module – 4 (Charts and boxplots)
Define Charts and its types. What is box-plot? Explain with example.
Types of Charts
1. Column Charts
Column charts are effective for the comparison of at least one set of data points. The vertical
axis, also known as the Y-axis, usually shows numeric values, while the horizontal X-axis
typically indicates a time period.
A clustered column chart is useful in showing and analysing multiple data sets. For stacked
column charts, you can quickly check a specific percentage of the overall data.
2. Bar Charts
Bar charts are used for comparing concepts and percentages among factors or sets of data.
Users can set distinct choices for their respondents, for example, annual or quarterly sales.
Bar charts are essentially column charts turned on their side.
Usually, compared to other types of charts, bar charts are better for showing and comparing
vast sets of data or numbers.
3. Pie Charts
Pie charts are useful for illustrating and showing the breakdown of a sample along a single
dimension. The pie shape shows the relationship between your data's main categories and
sub-categories. Pie charts are good to use when you are dealing with categorized groups of
data, or if you want to show differences among data based on a single variable.
One can break down any sample data groups into different categories, for example, by gender
or in various age groups. For business projects, one can use pie charts to represent the
importance of one specific factor on the others.
4. Line Charts
This type of chart is normally used for explaining trends over periods. The vertical axis
always displays a numeric amount, while the X-axis indicates some other related factors.
Line charts can be shown with markers in the shape of circles, squares, or other formats.
Line charts make it evident for users to see the trend within a specific period for a single set
of data. Alternatively, one can compare trends for several different data groups. Managers or
financial leaders may use such charts to measure and analyse long-term trends in sales,
financial data, or marketing statistics.
What is a Boxplot?
A boxplot is a standardized way of displaying the distribution of data based on a five-number
summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It
can tell you about your outliers and what their values are.
Boxplots are very useful for comparing distribution and identifying outliers.
There are two main types of boxplots, or box-and-whisker diagrams as they are also called:
1. Simple Boxplots: used to compare the distribution of one variable (e.g., training and
development perception of millennials at Company X) based on one categorical variable
(Age Group).
2. Clustered Boxplots: used to compare the distribution of one variable (TD Perception)
based on two categorical variables (Gender and Age Group).
Note: Before Using Boxplot in SPSS: Box plot will only work if you have a nominal or
ordinal variable on the X-axis and if you have a scale variable on the Y-Axis. Also remember
to code your data correctly.
Ex - The following data indicate the billing amount with respect to data usage for randomly
selected mobile telecom subscribers belonging to the male and female categories.
1) Prepare an SPSS data file by incorporating gender as the nominal variable, and amount as
the scale variable.
Step – 1 – Prepare data of 20 male and 20 female respondents and the billing amounts in MS
excel and import it to SPSS.
Gender   Billing Amount      Gender   Billing Amount
1 500 1 155
2 660 2 650
1 520 1 345
2 140 2 980
1 900 1 1200
2 300 2 450
1 655 1 670
2 450 2 950
1 120 1 345
2 855 2 335
1 850 1 210
2 1400 2 950
1 1450 1 780
2 730 2 450
1 860 1 2400
2 600 2 125
1 400 1 760
2 100 2 650
1 890 1 440
2 740 2 560
Step – 2 – Go to Variable View > click on Values > Assign values to Gender as 1- Male and
2- Female.
Ensure that Gender is a Nominal variable and Billing Amount is a Scale Variable as shown
below.
Define Values:
1. Male
2. Female
1) Go to Top Menu, click on Graphs, then click on Chart Builder and press OK.
2) Go to Gallery, click on Boxplot, then click on the first boxplot and drag that into the
chart preview window.
3) Go to the Variables window, drag the Nominal, Ordinal, or Categorical Variable on X
axis and Scale Variable on Y axis then press OK.
4) In Output window you will see the Simple Boxplot of Billing Amount by Gender.
5) The black line in each box shows the median value of Billing Amount for the Male and
Female groups.
6) Double click on the Boxplot to activate the Chart Editor.
7) Right-click on the black line, then click Show Data Labels. A properties window will
pop up; click on Close, and you will be able to see the median value on the black line.
Close the Chart Editor window. The value will be shown in your Simple Boxplot.
8) Interpretation:
The simple boxplot shows the difference in the median Billing Amount for Males (663)
and Females (625). The results reveal that Males have a higher median amount spent on
mobile bills than Females.
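A simple boxplot of a scale variable split by a categorical variable can also be requested through the Explore procedure in syntax; a sketch, assuming the variables are named Amount and Gender:

* One box of Amount per Gender group; the median appears as the heavy line in each box.
EXAMINE VARIABLES=Amount BY Gender
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL
  /MISSING=LISTWISE.

The clustered variant is easiest to build through the Chart Builder steps that follow, which paste GGRAPH syntax behind the scenes.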
1) Go to the top menu, click on Graphs, then click on Chart Builder and press OK.
2) Go to Gallery, click on Boxplot, then click on Clustered Boxplot (the 2nd one) and drag
it into the Chart Preview window.
3) Go to the Variables window, drag the Nominal, Ordinal, or Categorical variable
(Gender) onto the X axis, drag the clustering categorical variable into the Cluster on X
box (visible on your right side; in this example Gender is used again), and drag the
Scale variable (Billing Amount) onto the Y axis, then press OK.
4) In Output window you will see the Clustered Boxplot of Billing Amount by
Gender by Gender.
5) The black line in each box reflects the median value of Billing Amount for each
group.
6) Double click on the Clustered Boxplot chart to activate it.
7) Select the black line, then right-click and click Show Data Labels. A properties
window will pop up; click on Close, and you can see the median value on the
black line. Close the Chart Editor window. The value will be shown in your
clustered boxplot.
8) Interpretation:
The clustered boxplot shows the difference in the median Billing Amount for Males (663)
and Females (625). The results reveal that Males have a higher median amount
spent on mobile bills than Females.
Module - 5 (Correlation)
According to Croxton and Cowden, correlation is “when the relationship is quantitative, the
appropriate statistical tool for discovering and measuring the relationship and expressing it
in a brief formula is known as correlation”. In short, the tendency of simultaneous variation
between two variables is called correlation or covariation. For example, there may exist a
relationship between the heights and weights of a group of students, and the scores of students
in two different subjects are expected to have an interdependence or relationship between them.
If the change in one variable appears to be accompanied by a change in the other variable, the
two variables are said to be correlated and this interdependence is called correlation or
covariation.
Practical
18 students took the Common Admission Test (CAT) after their graduation, through which
their aptitude was assessed. The following is the information related to their CAT percentile
and graduation percentage. Mr. X, a researcher, wants to see the relationship between the
CAT scores and graduation percentages through correlation analysis.
Student No. Graduation Percentage CAT Percentile
1 70 80
2 60 85
3 65 70
4 68 65
5 70 69
6 75 89
7 80 99
8 89 95
9 90 94
10 95 98
11 65 88
12 68 75
13 72 89
14 78 88
15 87 90
16 91 89
17 82 94
18 84 93
Let us find out the existence of any relationship between graduation percentage and CAT
percentile.
Step 1 – Enter the data in MS Excel and import it into SPSS.
Step 2 – Assign variable names and measures in Variable View.
Step 3 – Click on Analyze -> Correlate -> Bivariate.
Step 4 – Enter both the variables in the Variables box, select Pearson and the Two-Tailed
test of significance, and click OK.
Step 5 – The following output will appear in the output window.
Correlations
                                                Graduation Percentage   CAT Percentile
Graduation Percentage   Pearson Correlation     1                       .687**
                        Sig. (2-tailed)                                 .002
                        N                       18                      18
CAT Percentile          Pearson Correlation     .687**                  1
                        Sig. (2-tailed)         .002
                        N                       18                      18
**. Correlation is significant at the 0.01 level (2-tailed).
Results & Interpretation
As seen from the above output window, the following interpretations can be made-
The correlation value is .687, which is close to +1, and therefore it can be concluded that
there is a strong positive correlation between the graduation percentage and CAT percentile.
Also, since the p-value (.002) is less than .05, the null hypothesis of no relationship is
rejected, which means there exists a significant positive relationship between graduation
percentage and CAT percentile.
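The Pearson correlation above maps onto a CORRELATIONS command along these lines (the variable names GradPercentage and CATPercentile are assumed):

* Pearson correlation with a two-tailed test of significance.
CORRELATIONS
  /VARIABLES=GradPercentage CATPercentile
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.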
Module – 6 (Practical Problem)
Create a data file with data on the following fields and codes and create
charts (histograms), calculate frequencies (standard deviation, mean,
median, mode, maximum, minimum), and create boxplots (simple and
clustered boxplots).
Create a data file in excel of fifty respondents and import it to SPSS and add values to them.
Gender
1) male 2) female
Marital status
1) married 2) unmarried
Education
1) undergraduate 2) postgraduate 3) Ph.D.
Experience
1) less than 2 years 2) 2-5 years 3) 6-10 years 4) above 10 years
Stress Levels
Rated from 1 (lowest) to 5 (highest)
Go to Data View -> click on the Analyze tab -> select Descriptive Statistics -> Frequencies
-> select the variables you want to analyse -> click on Statistics and choose Mean,
Median, Mode, Maximum, Minimum, Standard Deviation.
Statistics
Minimum 1 1 1 1 1
Maximum 2 2 3 4 5
Interpretation – It is easy to understand that the mean stress level is 2.74 while the median
stress level is 3.00 and the mode stress level is 3. The standard deviation is 1.306.
Interpretation – We can see from the above graph that the mean stress level is higher for
females than for males.
Interpretation – We can see from the above graph that stress levels are higher among
unmarried people than among married people.
Interpretation – We can see from the above graph that stress levels are highest among
Ph.D. students, followed by postgraduates and then undergraduates.
Interpretation – We can see from the above graph that stress levels are highest among
people with more than 10 years of experience, followed by those with 6-10 years, then
2-5 years, and finally less than 2 years of experience.
Interpretation – We can see from the above graph that stress levels are higher in females
than in males.
Interpretation – We can see from the above graph that stress levels are higher among
unmarried people than among married people.
Interpretation – We can see from the above graph that stress levels are highest among
people educated to Ph.D. level, followed by undergraduates and then postgraduates.
Interpretation – We can see from the above graph that stress levels are highest among
people with more than 10 years of experience, followed by those with 6-10 years, then
2-5 years, and finally less than 2 years of experience.
Go to Graphs -> click on Chart Builder -> click on Boxplots -> select Simple Boxplots and
assign Gender as x-axis and Stress Levels as y-axis.
Replace the gender with other variables on x-axis and repeat the process to get more Simple
Boxplots.
Interpretation – The simple boxplot shows the difference in the median stress levels of
males and females. It shows that females have more stress than males.
Interpretation – The simple boxplot shows the difference in the median stress levels of
married and unmarried people. We can see that married people have more stress than
unmarried people.
Interpretation – The simple boxplot shows the difference in the median stress levels of
Ph.D. holders, postgraduates and undergraduates. It shows that Ph.D. and postgraduate
students have more stress than undergraduates.
Interpretation – The simple boxplot shows the difference in the median stress levels of
people with different amounts of experience. We can see that people with less than
2 years of experience have more stress than the rest.
To build clustered boxplots, replace the variable placed on Cluster on X with different
variables and repeat the process.
Interpretation – This clustered boxplot shows the difference in the median stress levels
of males and females with varying amounts of experience. Among females, those with less
than 2 years and with 6-10 years of experience have similar stress levels, and those with
2-5 years and with more than 10 years of experience have similar stress levels,
while among males those with less than 2 years of experience have the most stress.
Interpretation – This clustered boxplot shows the difference in the median stress levels
of males and females with different levels of education. Females who are
undergraduates and those with a Ph.D. have a similar amount of stress, while males with a
Ph.D. have the most stress.
4) Perform correlation
Go to Data View -> click on Analyse -> click on Correlate -> click on Bivariate -> in the
pop-up menu enter any variable with stress level to get the following output.
Quick Steps – Skewness and Kurtosis
1. Click on Analyse -> Descriptive Statistics -> Descriptives
2. Drag and drop the variable for which you wish to calculate skewness and kurtosis into
the box on the right
3. Click on Options, and select Skewness and Kurtosis
4. Click on Continue, and then OK
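These quick steps correspond to a DESCRIPTIVES command such as the one below; the variable name StressLevel is a placeholder for whichever variable you selected:

* Mean, standard deviation, skewness and kurtosis (with their standard errors).
DESCRIPTIVES VARIABLES=StressLevel
  /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.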
For Calculation of Skewness and Kurtosis we can also use the following way:
To begin the calculation, click on Analyse -> Descriptive Statistics -> Frequencies.
This will bring up the Frequencies dialog box. You need to get the variable for which you
wish to calculate skewness and kurtosis into the box on the right. You can drag and drop, or
use the arrow button.
Once you’ve got your variable into the right-hand column, click on the Statistics button. This
will bring up the Frequencies: Statistics dialog box, within which it is possible to choose
several measures.
To calculate skewness and kurtosis, just select the options (as above). You’ll notice that
we’ve also instructed SPSS to calculate the mean and standard deviation.
Once you’ve made your selections, click on Continue, and then on OK in the Frequencies
dialog to tell SPSS to do the calculation.
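The Frequencies route maps onto syntax like this (again with a placeholder variable name; /FORMAT=NOTABLE suppresses the frequency table itself):

* Same statistics via the Frequencies procedure.
FREQUENCIES VARIABLES=StressLevel
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV SKEWNESS SESKEW KURTOSIS SEKURT
  /ORDER=ANALYSIS.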
The Result
The result will pop up in the SPSS output viewer. It will look something like this.
This is fairly self-explanatory. The skewness statistic is 0.83 and kurtosis is -2.078 (see
above). You can also see that SPSS has calculated the mean (1.48) and the standard deviation
(0.505). N represents the number of observations.
A Shapiro-Wilk test (p > .05) showed that the stress level was approximately normally
distributed for both males and females, with a standard error of skewness of 0.337 and a
standard error of kurtosis of 0.662. These values are the same for both genders.
Module – 7 (Mean of variables, test of normality distribution)
What is Skewness?
Skewness is a measure of the symmetry, or lack thereof, of a distribution. In mathematics, a
figure is called symmetric if a line can be drawn through it that divides the figure into two
congruent parts, i.e. parts identical in all respects, so that one part can be superimposed on
the other as a mirror image. In statistics, a distribution is called symmetric if the mean,
median and mode coincide; otherwise, the distribution is asymmetric.
What is Kurtosis?
Kurtosis measures the tail-heaviness of the distribution. Like skewness, kurtosis is a
statistical measure that is used to describe a distribution. Whereas skewness differentiates
extreme values in one versus the other tail, kurtosis measures extreme values in either tail.
Distributions with large kurtosis exhibit tail data exceeding the tails of the normal distribution
(e.g., five or more standard deviations from the mean). Distributions with low kurtosis exhibit
tail data that are generally less extreme than the tails of the normal distribution.
We’re going to calculate the skewness and kurtosis of data that represent customer
experience with a food delivery system. The usual reason to do this is to get an idea of
whether the data are normally distributed.
Step 1 – Go to Transform -> Compute Variable -> define Target Variable -> Select All in
function group -> select Mean from Functions and Special variables -> enter mean in Type
and Labels -> drag and drop the variables.
Now go to Data View and observe that a column for Mean has appeared.
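Step 1 is equivalent to a COMPUTE command; a sketch in which the hypothetical item names q1 to q5 stand in for the customer-experience items:

* Row-wise mean of the selected items, stored in a new variable.
COMPUTE mean_ce = MEAN(q1, q2, q3, q4, q5).
VARIABLE LABELS mean_ce 'Mean customer experience score'.
EXECUTE.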
Normality distribution test –
Step – 2 For normality distribution go to Analyze -> Descriptive Statistics -> Explore ->
enter Mean in Dependent List and Gender in Factor List -> click on Plots -> Check
Normality Plots with Tests -> Click OK
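Step 2 corresponds to the Explore procedure; a sketch using the computed mean (here called mean_ce) with Gender as the factor:

* Descriptives, normality tests (including Shapiro-Wilk), Q-Q plots and boxplots by group.
EXAMINE VARIABLES=mean_ce BY Gender
  /PLOT BOXPLOT NPPLOT
  /STATISTICS DESCRIPTIVES
  /NOTOTAL
  /MISSING LISTWISE.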
Interpretation – For males, the skewness statistic is .401 with a standard error of .597, and
the kurtosis statistic is -.426 with a standard error of 1.154. Dividing each statistic by its
standard error gives 0.672 for skewness and -0.369 for kurtosis; since both ratios lie within
±1.96, the data can be considered normally distributed.
Interpretation – For females, the skewness statistic is .136 with a standard error of .564,
and the kurtosis statistic is -1.025 with a standard error of 1.091. The corresponding ratios
are 0.241 for skewness and -0.940 for kurtosis, which again shows that the data are
normally distributed.
Interpretation – We can observe that for males and females the p-values are 0.956 and
0.297 respectively. Since p > 0.05 in both cases, we can accept the null hypothesis and
conclude that the Customer Experience towards the Food Delivery System is normally
distributed for both males and females.
Interpretation – We may conclude that our data is normally distributed because the
variables are concentrated around the normality line.
Interpretation – We may conclude that our data is normally distributed because the
variables are concentrated around the normality line.
Interpretation – We can observe that the mean value of Male and Female is 2.71 and 2.79
respectively. With this, we can conclude that our data is normally distributed.
Module - 8 (Reliability Test)
Reliability test refers to the extent to which a test measures without error. It is highly
related to test validity. Reliability test can be thought of as precision; the extent to which
measurement occurs without error. Reliability is not a constant property of a test and is
better thought of as different types of reliability for different populations at different levels
of the construct being measured.
In this module, we will be checking the reliability and will also discuss how to deal with
the reliability issue if required. The following steps will be performed for checking the
reliability:
Step 1 – Create an Excel file showing 50 responses of employees of ABC Ltd who were given
training and development.
Columns: Gender | Employee's Experience | Planned objectives were met | Issues were dealt in depth | Adequate length of course | Well suited method | Method enabled to take active part in training
1 3 4 5 5 2 1
1 2 5 5 5 1 2
1 3 3 5 1 1 2
2 3 3 1 5 1 3
1 3 1 2 2 1 1
1 1 2 3 5 1 1
1 3 1 2 5 4 1
2 3 4 5 3 3 2
1 1 3 5 1 3 1
1 3 5 1 4 4 3
1 3 4 3 5 5 1
2 1 3 3 5 4 1
2 3 5 2 4 5 1
2 3 5 1 3 2 2
2 2 3 1 1 4 1
2 1 2 2 4 2 2
2 3 4 3 3 3 2
2 1 2 2 1 3 1
1 1 2 4 5 3 2
2 3 1 1 3 5 2
2 2 4 1 1 5 4
2 1 4 2 4 1 3
2 2 5 3 1 1 1
2 3 5 3 4 3 2
1 3 3 4 4 5 4
2 3 2 3 5 2 4
Gender:
1 - Male 2 - Female
Experience:
Step 3- Import data in SPSS and change the required measures to ‘Scale’.
Step 4- Click on Analyse -> Scale -> then click on Reliability Analysis.
Step 5- The Reliability analysis box will open. Import all 12 questions to Items. Then click
on Statistics.
Step 6- Select ‘Scale’ and ‘Scale if item deleted’. Click continue and then ok.
The following output will appear -
The value of Cronbach’s Alpha should be .7 or more to report the reliability of the data. As it
can be seen that the value of Cronbach’s Alpha is .692 which signifies that the given data is
close to being reliable.
The above Item-Total Statistics Table is used to resolve any reliability concerns that may
arise. However, in this case, it is not required.
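The reliability analysis above maps onto a RELIABILITY command along these lines; the item names q1 to q12 are placeholders for the questionnaire items:

* Cronbach's alpha with scale statistics and 'Scale if item deleted' output.
RELIABILITY
  /VARIABLES=q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=SCALE
  /SUMMARY=TOTAL.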
Module - 9 (Chi-square Test)
Defining Chi-square, its steps and the significance of p-value in chi square.
(Practical in SPSS)
Question – Respondents were asked their gender and whether or not they were a
cigarette smoker. There were three answer choices: Non-smoker, Past smoker, and
Current smoker. Suppose we want to test for an association between smoking behaviour
(non-smoker, current smoker, or past smoker) and gender (male or female) using a Chi-
Square Test of Independence.
The Problem:
To identify the association between smoking behaviour (non-smoker, current smoker, or
past smoker) and gender (male or female).
Hypothesis –
● H0 - There exists no significant association between gender and smoking behaviour.
● Ha - There is a significant association between gender and smoking behaviour.
Step – 1 To create a data file showing gender and the smoking behaviour in excel and import
in SPSS –
Gender Smoking Behavior
1 1
1 2
2 3
2 1
1 2
2 3
1 3
2 2
1 1
2 3
1 2
1 3
2 3
1 1
2 2
1 1
1 1
2 3
2 2
2 2
1 2
1 3
2 2
2 3
1 1
1 1
1 2
1 2
2 3
1 3
2 2
2 3
1 3
2 3
2 1
1 3
1 2
1 2
1 3
2 3
1 1
1 1
1 3
1 2
1 1
1 3
1 3
2 1
1 3
1 2
Step – 2 Import Excel file in SPSS –
Non-Smoker-1
Past Smoker-2
Current Smoker-3
Step – 4 Click on Analyse -> Descriptive Statistics -> Crosstabs.
Step – 5 Drag and drop Smoking Behaviour into the Row box and Gender into the Column
box.
Step – 6 Click on Statistics, and select Chi – Square.
Smoking Behaviour * Gender Crosstabulation
Count
                          Gender
                          Male     Female     Total
Non-Smoker                10       3          13
Past Smoker               10       6          16
Current Smoker            11       10         21
Total                     31       19         50

Chi-Square Tests
                                Value     df    Asymptotic Significance (2-sided)
Pearson Chi-Square              2.055a    2     .358
Likelihood Ratio                2.127     2     .345
Linear-by-Linear Association    1.994     1     .158
N of Valid Cases                50
a. 1 cell (16.7%) has an expected count less than 5. The minimum expected count is 4.94.
Interpretation –
Chi-square statistics were used to examine the association between the categorical
variables. There was no significant relationship at the 5% significance level between gender
and the smoking behaviour of respondents (χ2 = 2.055, df = 2, p = 0.358). It can be seen from
the above table that the p-value (0.358) is greater than the alpha value (0.05).
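The chi-square run corresponds to syntax like the following (variable names Smoking and Gender assumed; expected counts are requested as well):

* Crosstab with the Chi-Square Test of Independence.
CROSSTABS
  /TABLES=Smoking BY Gender
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL.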
Module – 10 (T-Test)
Define T-Test. Explain One Sample T-Test and Paired Sample T-Test.
(Practical in SPSS)
Definition
A t-test is an inferential statistic that is used to see if there is a significant difference in the
means of two groups that are related in some way.
The t-test is one of many statistical tests that are used to test hypotheses.
Three key data values are required to calculate a t-test. They include the mean difference (the
difference between the mean values in each data set), the standard deviation of each group,
and the number of data values in each group.
Depending on the data and sort of analysis required, different forms of t-tests can be used.
Hypothesis –
H0 = There exists no significant difference in engine efficiency between the current and
previous trials.
Ha = There exists a significant difference in engine efficiency between the current and
previous trials.
To perform a One-Sample T-Test, click Analyze -> Compare Means -> One-Sample T Test.
This will open the dialogue box -> select the test variable and move it to the Test Variable(s)
list -> enter the Test Value -> click OK.
Interpretation – The value of two-tailed significance is less than .05 (p < .05) at 0.001; as
such, the difference between the means is significant. The output indicates that there exists a
significant difference in engine efficiency between the current and previous trials. The cars
in the current trial have higher engine efficiency than those in the earlier trial, with t(29) =
15.834, p < .05.
At the 95% confidence level, with df = 29 and α = 0.05, our computed value (15.834) is
greater than the table value (1.699). Hence, we can say that the alternate hypothesis can
be accepted.
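In syntax, a one-sample t-test looks roughly like this; the variable name Efficiency and the test value of 20 are placeholders (the test value would normally be the benchmark mean from the previous trial):

* One-sample t-test of Efficiency against a hypothesised value.
T-TEST
  /TESTVAL=20
  /MISSING=ANALYSIS
  /VARIABLES=Efficiency
  /CRITERIA=CI(.95).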
Click Analyze -> Compare Means -> Paired Sample T-Test ->
This will open the Dialogue Box -> select the Test Variables and shift them to the Variable
List
Interpretation – The value of two-tailed significance is more than .05 (p > .05) at 0.479; as
such, the difference between the means is not significant. The output indicates that there does
not exist a significant difference in engine efficiency between the trials with and without
ethanol. We cannot say that the cars with the ethanol additive have higher engine efficiency
than those without ethanol, with t(29) = 0.053, p > .05.
At the 95% confidence level, with df = 29 and α = 0.05, we can see that our computed value
(0.053) is less than the table value (2.045); hence we can say that the null hypothesis is
accepted.
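The paired comparison maps onto syntax such as the following (the variable names WithEthanol and WithoutEthanol are assumed):

* Paired-samples t-test comparing the two trial conditions.
T-TEST PAIRS=WithEthanol WITH WithoutEthanol (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.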
Module – 11 (Regression analysis)
Regression is a statistical method used in finance, investing, and other disciplines that attempt
to determine the strength and character of the relationship between one dependent variable
(usually denoted by Y) and a series of other variables (known as independent variables).
Regression helps investment and financial managers to value assets and understand the
relationships between variables, such as commodity prices and the stocks of businesses
dealing in those commodities.
There are two main types of regression:
• Bivariate Regression
• Multiple Regression
Bivariate regression is similar to bivariate correlation because both are designed for situations
in which there are just two variables. Bivariate analysis refers to the analysis of two variables
to determine the relationships between them. Bivariate analyses are often reported in quality-
of-life research. Essentially, bivariate regression analysis involves analyzing two variables to
establish the strength of the relationship between them. The two variables are frequently
denoted as X and Y, with one being an independent variable (or explanatory variable), while
the other is a dependent variable (or outcome variable).
Multiple regression, however, was created for cases in which there are three or more
variables. Multiple regression is a statistical technique that can be used to analyze the
relationship between a single dependent variable and several independent variables. The
objective of multiple regression analysis is to use the independent variables whose values are
known to predict the value of the single dependent value.
Difference between Correlation and Regression
Regression coefficient
R Values
Regression analysis provides two different values. The simple R value represents the
correlation between the observed values of the dependent variable (DV) and the values
predicted by the regression equation. The other value is referred to as R Square; it is the
square of R and gives the proportion of variance in the dependent variable accounted for by
the set of IVs (independent variables) chosen for the model.
R Square is used to find out how well the IVs can predict the DV. However, the R Square
value tends to be somewhat inflated when the number of IVs is large relative to the number
of cases. The adjusted R Square takes this into account and gives more accurate information
about the fit of the model. For example, an R Square value of 0.70 would mean that the IVs
in the model can predict 70% of the variance in the DV.
Problem:
The HR manager wants to know the impact of training and development on employee
engagement of employees working in different companies.
Hypothesis:
H1 – Training and development have a significant impact on employee engagement.
Training and Development
● My organization helps me develop the skills I need for the successful accomplishment
of my duties (e.g.: training, conferences, etc.)
Employee Engagement
● I concentrate on my work
Step 2: Import Excel file data into SPSS
Step 3: Compute the mean of the Training and Development items (Transform -> Compute
Variable -> mean_td = MEAN(...) -> OK).
Step 4: Compute the mean of the Employee Engagement items in the same way (mean_ee ->
OK).
Step 5: Go to Analyze ->Regression ->Linear
Step 6: Move the mean_ee in Dependent List and the mean_td in Independent List.
Step 7: Press OK. You will then see the following output:
Variables Entered/Removed a

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .056a   .003       -.018               .46753
a. Predictors: (Constant), mean_td

ANOVA a

Coefficients a
Model                B        Std. Error   Beta    t       Sig.
1     (Constant)     3.216    .416                 7.728   <.001
The hypothesis tests whether training and development have a significant impact on
employee engagement. The dependent variable 'employee engagement' was regressed on the
predictor variable 'training and development' to test hypothesis H1. The model summary
shows R = .056 and R2 = .003, which means the model explains only 0.3% of the variance in
EE, and the coefficient for T&D is small and negative (b = -.052, F = .144). These results do
not indicate a significant effect of T&D on EE in this sample. The tables above show the
summary of the findings.
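For reference, the regression dialog above pastes syntax of roughly this form, using the computed variables mean_ee and mean_td named in the steps:

* Simple linear regression of employee engagement on training and development.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT mean_ee
  /METHOD=ENTER mean_td.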
Module – 12 (ANOVA)
Analysis of Variance, i.e., ANOVA in SPSS, is used for examining the differences in the
mean values of the dependent variable associated with the effect of the controlled
independent variables, after taking into account the influence of the uncontrolled independent
variables. Essentially, ANOVA in SPSS is used as the test of means for two or more
populations.
ANOVA in SPSS must have a dependent variable which should be metric (measured using an
interval or ratio scale). ANOVA in SPSS must also have one or more independent variables,
which should be categorical. In ANOVA in SPSS, categorical independent variables are
called factors. A particular combination of factor levels, or categories, is called a treatment.
Problem – Comparing the scores of students from four metro cities of India (Delhi,
Mumbai, Chennai, Kolkata). We obtained 25 respondents for each of the metropolitan
cities.
Hypothesis:
H0 – There is no significant difference in the mean scores of students across the four cities.
Ha – There is a significant difference in the mean scores of students across the four cities.
Step – 1
Create the data file of 100 respondents with scores of 25 respondents from each city and
import it to SPSS.
Cities Scores
3 512
1 444 3 438
1 536 3 559
1 534 3 425
1 459 3 487
1 491 3 563
1 567 3 539
1 575 3 550
1 500 3 539
1 547 3 549
1 412 3 444
1 568 3 532
1 524 3 451
1 411 3 522
1 570 3 437
1 579 3 443
1 430 3 407
1 600 3 583
1 521 3 500
1 488
3 552
1 527
3 590
1 591
3 402
1 572
3 500
1 563
3 569
1 500
3 408
1 565
4 435
2 502
4 460
2 595
4 411
2 421
4 577
2 474
4 425
2 552
4 414
2 494
2 598 4 477
2 472 4 427
2 442 4 489
2 564 4 542
2 576 4 469
2 402 4 421
2 512 4 542
2 506 4 507
2 409 4 424
2 453 4 484
2 474 4 556
2 485 4 586
2 545 4 477
2 447 4 453
2 420 4 523
2 587 4 582
2 431 4 487
2 479 4 592
2 543 4 414
Step – 2 – Assign values to the city in Variable View as 1 - Delhi, 2 - Mumbai, 3 - Chennai,
4 - Kolkata.
Step – 3 – Click on Analyze -> Compare Means -> One-Way ANOVA.
Step – 4 – Move Scores into the Dependent List and Cities into the Factor box.
Step – 5 - Go to Post Hoc -> select Tukey -> Continue.
Step – 6 - Click on Options -> select Descriptive and Homogeneity of Variance Tests ->
click Continue -> Click OK to get your desired results.
Step – 7 - It will show the following output:
Oneway
Descriptives
Scores
                                                        95% Confidence Interval for Mean
          N     Mean     Std. Deviation   Std. Error    Lower Bound   Upper Bound   Minimum   Maximum
Delhi     25    522.96   56.411           11.282        499.67        546.25        411       600
Mumbai    25    495.32   60.825           12.165        470.21        520.43        402       598
Chennai   25    500.04   60.762           12.152        474.96        525.12        402       590
Kolkata   25    486.96   60.480           12.096        462.00        511.92        411       592
Total     100   501.32   60.249           6.025         489.37        513.27        402       600

ANOVA
Scores
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    17803.440        3    5934.480      1.668   .179
Within Groups     341560.320       96   3557.920
Total             359363.760       99
Post Hoc Tests
Multiple Comparisons
Dependent Variable: Scores
Tukey HSD
(I) Cities  (J) Cities  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower Bound  95% CI Upper Bound
Delhi Mumbai 27.640 16.871 .362 -16.47 71.75
Chennai 22.920 16.871 .528 -21.19 67.03
Kolkata 36.000 16.871 .150 -8.11 80.11
Mumbai Delhi -27.640 16.871 .362 -71.75 16.47
Chennai -4.720 16.871 .992 -48.83 39.39
Kolkata 8.360 16.871 .960 -35.75 52.47
Chennai Delhi -22.920 16.871 .528 -67.03 21.19
Mumbai 4.720 16.871 .992 -39.39 48.83
Kolkata 13.080 16.871 .865 -31.03 57.19
Kolkata Delhi -36.000 16.871 .150 -80.11 8.11
Mumbai -8.360 16.871 .960 -52.47 35.75
Chennai -13.080 16.871 .865 -57.19 31.03
Interpretation -
As shown in the ANOVA table, the F value is 1.668 with a significance of 0.179. Since
p > 0.05, we accept the null hypothesis and reject the alternate hypothesis: there is no
significant difference in the mean scores across the four cities.
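The one-way ANOVA with descriptives, the homogeneity-of-variance test and Tukey's post hoc comparisons corresponds to syntax like the following (variable names Scores and Cities assumed):

* One-way ANOVA of Scores across Cities with Tukey HSD comparisons.
ONEWAY Scores BY Cities
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).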