
MAHARAJA SURAJMAL INSTITUTE

Department Of Business Administration

Research Methodology Lab File

Submitted to: Dr. Monika Tushir Roll No.: 02314901720

Submitted by: Ashwin K N Course & Sec: BBA(G)-4A

Date Submitted: 01/05/2022 Shift: 1

Batch: 2020-23

UNDER GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY

Index

Module 1 (Introduction to SPSS): Define SPSS. How to import data from MS Excel to SPSS? Differentiate between Data View and Variable View. Explain the basic elements of SPSS. Describe the functions, advantages and disadvantages of SPSS.
Module 2 (Descriptive Statistics): What are Descriptive Statistics? Define Mean, Median, Mode, Maximum and Minimum Value. (Practical in SPSS)
Module 3 (Cross-Tabulation): Define the meaning and purpose of Cross-tabulation. Explain with example. (Practical in SPSS)
Module 4 (Box-plot): Define Charts and its types. What is box-plot? Explain with example. (Practical in SPSS)
Module 5 (Correlation): Define the meaning of Correlation. Explain with example. (Practical in SPSS)
Module 6 (Practical – Exercise): Practical on all topics covered till Module 5.
Module 7 (Mean of Variables & Normality Distribution Test): Compute the Mean of variables. Also perform Test of Normality distribution (reporting skewness, kurtosis, Shapiro-Wilk test, Q-Q plots, and box plots). (Practical in SPSS)
Module 8 (Reliability Test): Explain the meaning of Reliability Test. Show the steps to compute reliability. (Practical in SPSS)
Module 9 (Chi-square Test): Defining Chi-square, its steps and understanding the significance of the p-value in chi-square. (Practical in SPSS)
Module 10 (T-Tests): Define T-Test. Explain One-Sample T-Test and Paired Sample T-Test. (Practical in SPSS)
Module 11 (Regression): Explain Regression and show steps to compute regression in SPSS. Also, explain the types of Regression. (Practical in SPSS)
Module 12 (Anova): Define Anova Test. Perform Anova Test in SPSS. (Practical in SPSS)
Module - 1 (Introduction to SPSS)

Define SPSS. How to import data from MS Excel to SPSS? Differentiate between Data View and Variable View. Explain the basic elements of SPSS. Describe the functions, advantages, and disadvantages of SPSS.

Define SPSS

SPSS stands for Statistical Package for the Social Sciences and is used for complex statistical data analysis by various researchers. The SPSS software package was created for the management and statistical analysis of social science data. It is widely used because of its straightforward command language and thorough user manual.

SPSS is most often used in social science fields such as psychology, where statistical techniques are applied on a large scale. Techniques such as cross-tabulation, the t-test, and the chi-square test are available in the Analyze menu of the software.

Processing and analysis of data using SPSS is done by market researchers, health researchers,
survey companies, government entities, education researchers, marketing organizations, data
miners, and many more.

Importing Data in SPSS

For Importing data from Excel to SPSS, perform the following steps –

Step 1 - Create a data table using MS Excel and save it.

Location Population HighestInfectionCount PercentPopulationInfected
Faeroe Islands 49053 34658 70.65419037
Denmark 5813302 3017529 51.90731533
Andorra 77354 39234 50.72006619
Gibraltar 33691 16436 48.78454187
Iceland 368792 175329 47.54143257
Slovenia 2078723 942954 45.36217668
Netherlands 17173094 7677637 44.70736025
San Marino 34010 15034 44.20464569
Slovakia 5449270 2366902 43.43521242
Cyprus 896005 387315 43.22687931
Latvia 1866934 777201 41.62980587
Georgia 3979773 1643295 41.29117414
Estonia 1325188 545057 41.13054148
Israel 9291000 3801627 40.91730707
Liechtenstein 38254 15497 40.51079626
Seychelles 98910 39991 40.43170559
Austria 9043072 3532415 39.06211296
Switzerland 8715494 3353754 38.48036612
Lithuania 2689862 997690 37.09075038
Montenegro 628051 232376 36.99954303
France 67422000 24394936 36.18245676
Czechia 10724553 3750636 34.97242263
Portugal 10167923 3458727 34.01606208
Belgium 11632334 3741614 32.16563417
Maldives 543620 174658 32.12869284
Luxembourg 634814 202577 31.91123699
Aruba 107195 33843 31.57143523
Isle of Man 85410 26734 31.30078445
Bahrain 1748295 546896 31.28167729

Step 2 - Open SPSS, then go to File > Import Data and then click on Excel.

Step 3 - Select the File that you want to open and then click on Open Button.

Step 4 - In the Read Excel File Box, click on OK and your data file will now open in SPSS.
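The same import can also be done through the Syntax Editor instead of the menus. The following is a minimal sketch of the GET DATA command, assuming an XLSX file whose first row holds the variable names; the file path and sheet name are placeholders, not the actual file used above.

* Sketch: import an Excel worksheet into SPSS (adjust the path and sheet name).
GET DATA
  /TYPE=XLSX
  /FILE='C:\Data\covid_data.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.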

Data and Variable View

An SPSS data file always has two tabs in the left bottom corner:


 Data View is where we inspect our actual data and
 Variable View is where we see additional information about our data.

You can switch between Data View and Variable View by

 clicking the tabs in the left bottom corner; or
 pressing the Ctrl + T keyboard shortcut.

Let's first take a look at the main parts of the Data View tab.

SPSS Data View

1. The data editor has tabs for switching between Data View and Variable View.
Make sure you're in Data View.
2. Columns of cells are called variables. Each variable has a unique name
(“gender”) which is shown in the column header.
3. Rows of cells are called cases. Oftentimes, each respondent in a study is
represented as a single case.
4. In SPSS, values refer to cell contents.
5. The status bar may give useful information on the data, for instance, whether
a WEIGHT, FILTER, SPLIT FILE or Unicode mode is in effect.

These are the main elements in Data View.

SPSS Variable View

1. In the left bottom corner, we find tabs for switching between Variable View and
Data View. For now, select Variable View.
2. In Variable View, variables are shown as rows of cells.
3. The first column shows the variable name for each variable.
4. The fifth column may or may not contain a variable label. This describes the exact
meaning of each variable.
5. The sixth column shows value labels: descriptions of the meaning of one, many or
all values that a variable may contain.

Elements of SPSS

The elements or components of SPSS are as mentioned below:

1. The Data Editor

The Data Editor opens immediately upon starting SPSS and, when empty, looks like a
typical spreadsheet. When data is loaded into the Data Editor, each column will represent a
variable and each row will represent a case. Selecting the tab at the bottom that's labeled
"Variable View" allows the user to view and edit information about each variable. To open a
new Data Editor, select "File"->"New"->"Data." When the contents of the Data Editor are
saved, the resulting file will have a ".sav" extension. If a file has been saved in the SAV
format you can open it by selecting "File"->"Open"->"Data."

2. Variables in SPSS

Each variable in an SPSS dataset has a set of attributes that can be edited by toggling to the
"Variable View" tab in the Data Editor:

Name is the variable's machine-readable name. This is the name used to refer to the variable
in SPSS's underlying code and, if no "Label" is defined, the name that will appear at the top
of the column in the "Data View."

Type indicates the type of data that can be stored in the variable's column. The most
frequently used types are "String" (for text) and "Numeric." SPSS uses the type to know what
rules can be applied to a specific variable. It won't do arithmetic on a string variable, for
example.

Width indicates the allowed number of characters per instance.

Decimals set the number of decimal places allowed in variable instances.

Label sets the name that will be displayed at the top of the column in the Data Editor,
allowing for a human readable representation of the variable name.

Values sets names given to coded values (e.g., if the variable contains survey responses
where a "0" represents "no" and "1" represents a "yes" this field can be used to tell SPSS to
display the text values instead of the numerical raw data).

Missing sets the values that will be encoded as "Missing."

Columns sets the displayed column length.

Align sets the displayed alignment (right, left, or center).

Measure sets the statistical level of measurement. SPSS distinguishes between "Scale"
(variables that represent a continuous scale, like population or temperature), "Ordinal"
(variables that can be rank ordered but do not represent precisely measured values), and
"Nominal" (variables that cannot be ranked, such as those that represent labels or
classifications).

Role is used by some SPSS dialogues to distinguish between the variable's intended usage in
some predictive applications (e.g., regression, clustering, classification). For most dialogues,
the role won't be significant.

Clicking in the appropriate cell will open a dialogue box or drop-down menu that allows the
attribute value to be altered.

3. SPSS Statistics Viewer

The Statistics Viewer displays the output of actions performed on data. Whether you want to
generate frequency tables, perform one-way ANOVA, or build a regression model, the results
will end up as a table or graph in the statistics viewer. When data is loaded into the Data
Editor or changed in some way (e.g., if it is sorted), the Statistics Viewer will also display
some text describing the operation and any errors that occurred. Usually, the text is in the
form of SPSS syntax (see below). Any Action you perform from the Data Editor that
generates output will automatically open the Statistics Viewer, but you can also open a new
Statistics Viewer window by selecting "File"->"New"->"Output." The contents of a Statistics
Viewer window will be saved with a ".spv" extension.

4. SPSS Syntax Editor

The Syntax Editor allows you to control SPSS using the SPSS command language, usually
referred to as "syntax" in the SPSS documentation. At one time all of the user's interactions
with SPSS would have been performed using syntax commands, but these days, most
processes are automated in the Data Editor's menus. There are still times when writing or
editing syntax commands is useful, or even necessary. To open the Syntax Editor, select
"File"->"New"->"Syntax."

SPSS also has a Script Editor, which allows users to automate processes by writing
programs in Visual Basic or (with an extension) Python. The application is outside of the
scope of this guide, but can be a powerful tool when used for processing large numbers of
files consecutively or transforming text.

Functions of SPSS

SPSS offers four programs that assist researchers with their complex data analysis needs.

1. Statistics Program

SPSS’s Statistics program provides a plethora of basic statistical functions, some of which
include frequencies, cross-tabulation, and bivariate statistics.
2. Modeler Program

SPSS’s Modeler program enables researchers to build and validate predictive models using
advanced statistical procedures.

3. Text Analytics for Surveys Program

SPSS’s Text Analytics for Surveys program helps survey administrators uncover powerful
insights from responses to open-ended survey questions.
4. Visualization Designer

SPSS’s Visualization Designer program allows researchers to use their data to create a wide
variety of visuals like density charts and radial boxplots from their survey data with ease.

Advantages of SPSS

 SPSS is a comprehensive statistical software.


 Many complex statistical tests are available as a built-in feature.
 Interpretation of results is relatively easy.
 Not much effort is needed for the researcher to use this software.
 The time required for analysing the data with the help of SPSS is comparatively less
than any other statistical tool.
 It is beneficial for both types of data, quantitative as well as qualitative.
 The users get the freedom of selecting a preferable graph type that matches the
requirements of their data distribution.
 The possibility of the occurrence of errors is minimum with the use of SPSS.

Disadvantages of SPSS
 A very large dataset cannot be analysed.
 SPSS can be expensive to purchase for students.
 Usually involves added training to completely exploit all the available features.
 The graph features are not as simple as those of Microsoft Excel.

Module - 2 (Descriptive Statistics)

What are Descriptive Statistics? Define Mean, Median, Mode, Maximum and Minimum Value. (Practical in SPSS)

What Are Descriptive Statistics?

Descriptive statistics are brief descriptive coefficients that summarize a given data set,
which can be either a representation of the entire population or a sample of a population.
Descriptive statistics are broken down into measures of central tendency and measures of
variability (spread). Measures of central tendency include the mean, median, and mode,
while measures of variability include standard deviation, variance, minimum and maximum
variables, kurtosis, and skewness.

There are 3 main types of descriptive statistics:

 The distribution concerns the frequency of each value.


 The central tendency concerns the averages of the values.
 The variability or dispersion concerns how spread out the values are.

Definitions

Mean: A mean is the simple mathematical average of a set of two or more numbers. The mean for a given set of numbers can be computed as either an arithmetic mean or a geometric mean.

Median: The median is the middle number in a sorted, ascending or descending, list of
numbers and can be more descriptive of that data set than the average.

Mode: In statistics, the mode is the value that occurs most frequently in a given data set, i.e., the value or number with the highest frequency.

The minimum: The minimum value occurring in the data set. If we were to order all of our
data in ascending order, then the minimum would be the first number on our list.

The maximum: The maximum value occurring in the data set. If we were to order all of our
data in ascending order, then the maximum would be the last number listed.

Practical

Step – 1 Fill the data of 50 respondents in MS excel. Then click on File > Save As.

Class Value 1 Value 2


Gender Male Female
Age Below 25 years Above 25 years
Qualification UG PG
Marital status Married Unmarried
Experience Below 5 years Above 5 years

Step - 2 Open SPSS -> File tab -> Import Data -> choose the Excel option, select the file which was saved, and click OK.

Step - 3 Go to Variable View -> go to Column Value -> click on the three dots in respective
columns -> you can assign values to our desired fields.

Step - 4 Go to Data View -> click on Analyze Tab -> select Descriptive Statistics ->
Frequencies -> select the frequencies you desire the analysis of -> click on Statistics and
choose mean, Median, Mode, Maximum, Minimum.
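The same analysis can also be run from the Syntax Editor. A minimal sketch of the equivalent FREQUENCIES command is shown below, assuming the imported variables were named Gender, Age, Qualification, MaritalStatus and Experience.

* Sketch of the equivalent syntax (variable names are assumed).
FREQUENCIES VARIABLES=Gender Age Qualification MaritalStatus Experience
  /STATISTICS=MEAN MEDIAN MODE MINIMUM MAXIMUM STDDEV
  /ORDER=ANALYSIS.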

Step -5 – We get the following output.

Statistics
                 Gender   Qualification   Experience   Age     Marital Status
N Valid          50       50              50           50      50
N Missing        0        0               0            0       0
Mean             1.54     1.50            1.50         1.48    1.50
Median           1.50     1.00            1.00         1.00    1.50
Mode             1        1               1            1       1a
Std. Deviation   .579     .580            .580         .544    .505
Minimum          1        1               1            1       1
Maximum          3        3               3            3       2
a. Multiple modes exist. The smallest value is shown

Interpretations: We can see from the above table that, for Qualification, the mean value is 1.50, the median is 1, and the mode is also 1. This shows that fewer respondents are qualified at the PG level than at the UG level.

We can also see that, for Gender, the mean value is 1.54, the median is 1.50 and the mode is 1, which means that the numbers of male and female respondents are not equal.

Module - 3 (Cross-Tabulation)

Define the meaning and purpose of Cross-tabulation. Explain with example. (Practical in SPSS)

A cross-tabulation is a two- (or more) dimensional table that records the number (frequency)
of respondents that have the specific characteristics described in the cells of the table. Cross-
tabulation tables provide a wealth of information about the relationship between the
variables.

They are data tables that display not only the results of the entire group of respondents, but
also the results from specifically defined subgroups. For this reason, crosstabs allow
researchers to closely investigate the relationships within a data set that might otherwise go
unnoticed.

Purpose of cross-tabulation

The various purposes of cross-tabulation are as mentioned below:

1. Used to quantitatively analyse the relationship between multiple variables.

Cross tabulations or crosstabs group variables together and enable researchers to understand
the correlation between the different variables. By showing how correlations change from
one group of variables to another, cross tabulation allows for the identification of patterns,
trends, and probabilities within data sets.
2. Cross tabulation allows researchers to investigate data sets at a more
granular level.

Survey results are typically presented in aggregate data tables that show the total responses to
all questions asked in the survey. Cross tabulations are data tables that display not only the
results of the entire group of respondents, but also the results from specifically defined
subgroups. 

For this reason, crosstabs allow researchers to closely investigate the relationships within a
data set that might otherwise go unnoticed. 

3. For reducing confusion while analysing data.

By creating crosstabs, data sets are simplified by dividing the total set into representative
subgroups, which can then be interpreted at a smaller, more manageable scale. 

This reduces the potential for making errors while analysing the data, which means that time
is spent efficiently.

4. For deriving profound and actionable insights

By reducing total data sets into more manageable subgroups, cross tabulation allows
researchers to yield more granular, profound insights. 

The entire purpose of performing statistical analysis on a data set is to uncover actionable
insights that will then impact your end goal. Because cross tabulation simplifies complex data
sets, these impactful insights are much easier to expose, record, and consider while
developing overarching strategies. 

Practical

Step – 1 – Use the data of the same 50 respondents in MS excel and import them into the
SPSS software and assign values to them.

Class Value 1 Value 2
Gender Male Female
Age Below 25 years Above 25 years
Qualification UG PG
Marital status Married Unmarried
Experience Below 5 years Above 5 years

Step – 2 – We go to the Data View -> click on Analyze -> go to Descriptive Statistics ->
Cross Tabulation.

Step – 3 – Select the variables like Gender in the row section and Qualification in the column
section -> Click OK.
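Equivalently, the cross-tabulation can be produced with syntax; the sketch below assumes the variables are named Gender and Qualification.

* Sketch of the equivalent CROSSTABS syntax (variable names are assumed).
CROSSTABS
  /TABLES=Gender BY Qualification
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT
  /COUNT ROUND CELL.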

Step – 4 – We get the desired results in the Output.

Gender * Qualification Crosstabulation
Count
                        Qualification
                        UG      PG      Total
Gender     Male         14      11      25
           Female       19      6       25
Total                   33      17      50

Gender * Age Crosstabulation
Count
                        Age
                        20-30   30-40   Total
Gender     Male         15      10      25
           Female       18      7       25
Total                   33      17      50

Qualification * Experience Crosstabulation
Count
                             Experience
                             Working   Not Working   Total
Qualification   UG           16        17            33
                PG           8         9             17
Total                        24        26            50

Interpretation

From the above cross tables, we can infer that:

1) There are more respondents (males and females) between the age of 20-
30 as compared to 30-40. Therefore, most of the respondents are between
the age group of 20-30.
2) The no. of respondents who have cleared UG is more than those who
have cleared PG.
3) The no. of educated respondents who are unemployed are more than
those who are employed.

Module – 4 (Charts and boxplots)

Define Charts and its types. What is box-plot? Explain with example.

A chart is a graphical representation used for data visualization, in which the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart can represent tabular numeric data, functions or some kinds of qualitative structure, and conveys different kinds of information.

Types of Charts

The different types of charts are as mentioned below:

1. Column Charts

Column charts are effective for the comparison of at least one set of data points. The vertical
axis, also known as the Y-axis, is often shown in numeric values. The X-axis on the
horizontal line indicates a period.

A clustered column chart is useful in showing and analysing multiple data sets. For stacked
column charts, you can quickly check a specific percentage of the overall data.

2. Bar Charts

Bar charts are used for comparing concepts and percentages among factors or sets of data. Users can set distinct choices for their respondents, for example, annual or quarterly sales. Bar charts are essentially column charts laid on their side along the X-axis.

Usually, compared to other types of charts, bar charts are better for showing and comparing vast sets of data or numbers.

3. Pie Charts

Pie charts are useful for illustrating and showing sample break down in an individual
dimension. It is in the shape of a pie to show the relationship between your data's main and
sub-categories. It is good to use when you are dealing with categorized groups of data, or if
you want to show differences among data based on a single variable.

One can break down any sample data groups into different categories, for example, by gender
or in various age groups. For business projects, one can use pie charts to represent the
importance of one specific factor on the others.

4.  Line Charts

This type of chart is normally used for explaining trends over periods. The vertical axis
always displays a numeric amount, while the X-axis indicates some other related factors.
Line charts can be shown with markers in the shape of circles, squares, or other formats.

Line charts make it evident for users to see the trend within a specific period for a single set
of data. Alternatively, one can compare trends for several different data groups. Managers or
financial leaders may use such charts to measure and analyse long-term trends in sales,
financial data, or marketing statistics.

What is a Boxplot?

A boxplot is a standardized way of displaying the distribution of data based on a five number
summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It
can tell you about your outliers and what their values are.

Boxplots are very useful for comparing distribution and identifying outliers.

There are two main types of boxplots, or box and whisker diagrams as they are also known:

1. Simple Boxplots: Used to compare the distribution of one variable (i.e., Training and
development perception of millennial of X Company) based on one categorical variable (Age
Group).

2. Cluster Boxplots: can be used to compare the distribution of one variable (TD Perception)
based on two categorical variables (Gender and Age Group).

Note: Before Using Boxplot in SPSS: Box plot will only work if you have a nominal or
ordinal variable on the X-axis and if you have a scale variable on the Y-Axis. Also remember
to code your data correctly.

Ex - The following data indicates the billing amount with respect to data usage for 40 randomly selected mobile telecom subscribers, 20 male and 20 female.

1) Prepare an SPSS data file by incorporating gender as the nominal variable, and amount as
the scale variable.

Step – 1 – Prepare data of 20 male and 20 female respondents and the billing amounts in MS
excel and import it to SPSS.

Gender   Billing Amount   Gender   Billing Amount
1 500 1 155
2 660 2 650
1 520 1 345
2 140 2 980
1 900 1 1200
2 300 2 450
1 655 1 670
2 450 2 950
1 120 1 345
2 855 2 335
1 850 1 210
2 1400 2 950
1 1450 1 780
2 730 2 450
1 860 1 2400
2 600 2 125
1 400 1 760
2 100 2 650
1 890 1 440
2 740 2 560

Step – 2 – Go to Variable View > click on Values > Assign values to Gender as 1- Male and
2- Female.

Ensure that Gender is a Nominal variable and Billing Amount is a Scale Variable as shown
below.

Define Values:

1. Male
2. Female

Steps for Simple Boxplots in SPSS

1) Go to Top Menu, click on Graphs, then click on Chart Builder and press OK.

2) Go to Gallery, click on Boxplot, then click on the first boxplot and drag that into the
chart preview window.
3) Go to the Variables window, drag the Nominal, Ordinal, or Categorical Variable on X
axis and Scale Variable on Y axis then press OK.

4) In Output window you will see the Simple Boxplot of Billing Amount by Gender.

5) The black line in each box reflects the median value of Billing Amount for the Male and Female groups.

6) Double click on Simple Boxplot chart to activate the same.

7) Double click on the Boxplot, then right-click on the black line, then click Show Data
Labels. A properties window will pop up, click on Close, then you will be able to see
the median value on the black line. Close the Chart Editor window. The value will be
shown in your Simple Boxplot.

8) Interpretation:
The simple boxplot shows the difference in the median billing amounts of males (663)
and females (625). The results reveal that males have a higher median amount spent on
mobile bills than females.
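A simple boxplot of this kind can also be produced through syntax with the EXAMINE command; the sketch below assumes the variables are named BillingAmount and Gender.

* Sketch: simple boxplot of billing amount grouped by gender (variable names are assumed).
EXAMINE VARIABLES=BillingAmount BY Gender
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.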

The following steps can be performed to create Cluster Box-plots in SPSS -

1) Go to Menu, click on Graphs, then click on Chart Builder, press OK

2) Go to Gallery, click on Boxplot, then click on Cluster Boxplot (2nd one) and drag
that into Chart Preview Window.
3) Go to the Variables window, drag one categorical variable (Gender) onto the X axis,
drag the second categorical variable into the Cluster on X: set colour box (visible on
the right side), drag the Scale variable (Billing Amount) onto the Y axis, and then
press OK.
4) In the Output window you will see the Clustered Boxplot of Billing Amount by
Gender, clustered by the second categorical variable.
5) The black line reflects the median value of Billing Amount for each cluster.
6) Double click on the Clustered Boxplot chart to activate it.
7) Select the black line, then right click and choose Show Data Labels. A properties
window will pop up; click on Close, and the median value will appear on the
black line. Close the Chart Editor window. The value will be shown in your
clustered boxplot.
8) Interpretation:

The clustered boxplot shows the difference in the median billing amounts of males (663)
and females (625). The results reveal that males have a higher median amount spent
on mobile bills than females.

Module - 5 (Correlation)

Define the meaning of Correlation. Explain with example.

According to Croxton and Cowden, "when the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation". In short, the tendency of simultaneous variation between two variables is called correlation or covariation. For example, there may exist a relationship between the heights and weights of a group of students, or the scores of students in two different subjects may be expected to show an interdependence or relationship between them.

If the change in one variable appears to be accompanied by a change in the other variable, the
two variables are said to be correlated and this interdependence is called correlation or
covariation.

Practical

18 students took the Common Admission Test (CAT) after their graduation. The following is the information relating their CAT percentiles to their graduation percentages. Mr. X, a researcher, wants to examine the relationship between CAT scores and graduation percentages through correlation analysis.
Student No. Graduation Percentage CAT Percentile
1 70 80
2 60 85
3 65 70
4 68 65
5 70 69
6 75 89
7 80 99
8 89 95
9 90 94
10 95 98
11 65 88
12 68 75
13 72 89
14 78 88
15 87 90
16 91 89
17 82 94
18 84 93

Let us find out the existence of any relationship between graduation percentage and CAT
percentile.

The following hypotheses are being tested -

H0: There exists no relationship between graduation percentage and CAT percentile.
H1: There exists a positive relationship between graduation percentage and CAT percentile.

The following steps are applied –

Step1 – Enter the given data in excel sheet.


Step 2- Import the data in SPSS
Step 3 – Go to Analyse > Correlate > Bivariate.

Step 4 – Enter both the variables in the Variables box, select Pearson Two-Tailed Test and
click OK.
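The same correlation can be requested through syntax; a minimal sketch, assuming the variables are named GradPercentage and CATPercentile, is shown below.

* Sketch of the equivalent CORRELATIONS syntax (variable names are assumed).
CORRELATIONS
  /VARIABLES=GradPercentage CATPercentile
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.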

Step 5 – The following output will appear in the output window.

Correlations
                                              Graduation Percentage   CAT Percentile
Graduation Percentage   Pearson Correlation   1                       .687**
                        Sig. (2-tailed)                               .002
                        N                     18                      18
CAT Percentile          Pearson Correlation   .687**                  1
                        Sig. (2-tailed)       .002
                        N                     18                      18
**. Correlation is significant at the 0.01 level (2-tailed).

Results & Interpretation

As seen from the above output window, the following interpretations can be made-

The correlation value is .687, which is close to +1, and therefore it can be concluded that there is a strong positive correlation between graduation percentage and CAT percentile. Also, since the p-value (.002) is less than .05, the null hypothesis is rejected and H1 is accepted, which means there exists a significant positive relationship between graduation percentage and CAT percentile.

Module – 6 (Practical Problem)

Create a data file with data on the following fields and codes and create
charts (histograms), calculate frequencies (standard deviation, mean,
median, mode, maximum, minimum), and create boxplots (simple and
clustered boxplots).

Create a data file in excel of fifty respondents and import it to SPSS and add values to them.

Gender

1) male 2) female

Marital status

1) married 2) unmarried

Education

1) undergraduate 2) postgraduate 3) PhD

Experience

1) less than 2 years 2) 2 – 5 years 3) 6 – 10 years 4) 10 and above years

Stress Levels

1) strongly disagree 2) disagree 3) neutral 4) agree 5) strongly agree

1) Calculate the frequencies

Go to Data View -> click on Analyse Tab -> select Descriptive Statistics -> Frequencies -
> select the frequencies you desire the analysis of -> click on Statistics and choose Mean,
Median, Mode, Maximum, Minimum, Standard Deviation.

Statistics
                 Gender   Marital Status   Education   Experience   Stress Levels
N Valid          50       50               50          50           50
N Missing        0        0                0           0            0
Mean             1.48     1.56             1.86        2.24         2.74
Median           1.00     2.00             2.00        2.00         3.00
Mode             1        2                2           1            3
Std. Deviation   .505     .501             .756        1.153        1.306
Minimum          1        1                1           1            1
Maximum          2        2                3           4            5

Interpretation – It is easy to understand that the mean stress level is 2.74 while the median
stress level is 3.00 and the mode stress level is 3. The standard deviation is 1.306.

2) Create the charts

Interpretation – We can see from the above graph that the mean stress level is more in the
Females as compared to the Males.

Interpretation - We can see from the above graph that the stress levels are more in the
Unmarried People as compared to the Married People.

Interpretation - We can see from the above graph that the most stress levels are in Ph.D.
students followed by Postgraduates and then Undergraduates.

Interpretation – We can see from the above graph that the stress levels are more in people
that have the experience above 10 years followed by people that have 6 – 10 years followed
by people that have 2 – 5 years followed by people that have less than 2 years.

Interpretation - We can see from the above graph that the stress levels are more in females
as compared to males.

Interpretation – We can see from the above graph that the stress levels are more in the
unmarried people as compared to the married people.

Interpretation - We can see from the above graph that the stress levels are more in Ph.D.
level educated people followed by undergraduates and then postgraduates.

Interpretation - We can see from the above graph that the stress levels are more in people
that have experience above 10 years followed by people that have 6 – 10 years followed by
people that have 2 – 5 years followed by people that have less than 2 years.

3) Create boxplots and clustered boxplots

Go to Graphs -> click on Chart Builder -> click on Boxplots -> select Simple Boxplots and
assign Gender as x-axis and Stress Levels as y-axis.

Replace the gender with other variables on x-axis and repeat the process to get more Simple
Boxplots.

Interpretation – Simple boxplot is showing the difference in the median stress levels of
males and females. This shows that females have more stress than males.

Interpretation – Simple boxplot is showing the difference in the median stress levels of
married and unmarried people. We can see that married people have more stress than
unmarried people.

Interpretation - Simple boxplot is showing the difference in the median stress levels of
Ph.D., postgraduates, and undergraduates. This shows that Ph.D. and postgraduate students
have more stress than undergraduate.

Interpretation – Simple boxplot is showing the difference in the median stress levels of
people that have different number of experiences. We can see that the people that have less
than 2 years’ experience have more stress than the rest.

For Clustered Boxplot -


Go to Graphs -> click on Chart Builder -> click on Boxplots -> select Clustered Boxplots and
assign Gender as x-axis and Stress Levels as y-axis and Experience as the filter.

Replace the Cluster on X with different values and repeat the process to get other clustered
boxplots.

Interpretation – This clustered boxplot is showing the difference in the median stress levels
of males and females who have varying amounts of experience. In females, those who are
having experience of fewer than 2 years and 6-10 years are having similar stress levels and
those who are having 2-5 years and 10 and above years’ experience have similar stress levels,
while in the males those with less than 2 years experience have the most stress.

Interpretation – This clustered boxplot is showing the difference in the median stress levels
of males and females having different levels of education. The females who are
undergraduates and have a Ph.D. have similar amount of stress while the males with Ph.D.
have the most stress.

4) Perform correlation

Go to Data View -> click on Analyse -> click on Correlate -> click on Bivariate -> in the
pop-up menu enter any variable with stress level to get the following output.

Correlation between Gender and Stress Levels


Correlations
                                       Gender   Stress Levels
Gender          Pearson Correlation    1        .069
                Sig. (2-tailed)                 .632
                N                      50       50
Stress Levels   Pearson Correlation    .069     1
                Sig. (2-tailed)        .632
                N                      50       50

Interpretation - As can be seen from the above table –

1) There exists a weak positive correlation of 0.069 between Gender and Stress Levels.
2) The p-value is 0.632, which is greater than 0.05, so the correlation is not statistically significant; the null hypothesis cannot be rejected.

Correlation between Experience and Stress Levels

Correlations
                                       Experience   Stress Levels
Experience      Pearson Correlation    1            -.229
                Sig. (2-tailed)                     .110
                N                      50           50
Stress Levels   Pearson Correlation    -.229        1
                Sig. (2-tailed)        .110
                N                      50           50

Interpretation - As can be seen from the above table –

1) There exists a weak negative correlation of -0.229 between Experience and Stress Levels.
2) The p-value is 0.110, which is greater than 0.05, so the correlation is not statistically significant; the null hypothesis cannot be rejected.

5) Perform normality distribution.

Check Skewness and Kurtosis as a normality distribution test.

Quick Steps

1. Click on Analyse -> Descriptive Statistics -> Descriptives
2. Drag and drop the variable for which you wish to calculate skewness and kurtosis into
the box on the right
3. Click on Options, and select Skewness and Kurtosis
4. Click on Continue, and then OK

5. The result will appear in the SPSS output viewer.
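The same statistics can be requested through syntax; a minimal sketch, assuming the variable analysed here is named Gender (as in the output described below), is:

* Sketch: skewness and kurtosis through the DESCRIPTIVES command (variable name is assumed).
DESCRIPTIVES VARIABLES=Gender
  /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.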

This is fairly self-explanatory. The skewness statistic is 0.83 and kurtosis is -2.078 (see
above). You can also see that SPSS has calculated the mean (1.48) and the standard deviation
(0.505). N represents the number of observations.

A Shapiro-Wilk test (p > .05) showed that the stress level was approximately normally distributed for both males and females, with a standard error of skewness of 0.337 and a standard error of kurtosis of 0.662. These values are the same for both genders.

For Calculation of Skewness and Kurtosis we can also use the following way:

To begin the calculation, click on Analyse -> Descriptive Statistics -> Frequencies.

This will bring up the Frequencies dialog box. You need to get the variable for which you
wish to calculate skewness and kurtosis into the box on the right. You can drag and drop, or
use the arrow button.

Once you’ve got your variable into the right-hand column, click on the Statistics button. This
will bring up the Frequencies: Statistics dialog box, within which it is possible to choose
several measures.

To calculate skewness and kurtosis, just select the options (as above). You’ll notice that
we’ve also instructed SPSS to calculate the mean and standard deviation.

Once you’ve made your selections, click on Continue, and then on OK in the Descriptive
dialog to tell SPSS to do the calculation.

The Result

The result will pop up in the SPSS output viewer. It will look something like this.

This is fairly self-explanatory. The skewness statistic is 0.83 and kurtosis is -2.078 (see
above). You can also see that SPSS has calculated the mean (1.48) and the standard deviation
(0.505). N represents the number of observations.

A Shapiro-Wilk test (p > .05) showed that the stress level was approximately normally distributed for both males and females, with a standard error of skewness of 0.337 and a standard error of kurtosis of 0.662. These values are the same for both genders.

Module – 7 (Mean of variables, test of normality distribution)

Compute the Mean of variables. Also perform a Test of Normality distribution (reporting skewness, kurtosis, the Shapiro-Wilk test, Q-Q plots, and box plots).

Question – A study to investigate the customer experience towards a food delivery system on the basis of the following questions.

What is Skewness?
Skewness is a measure of the symmetry, or lack thereof, of a distribution. In mathematics, a figure is called symmetric if there exists a point through which a perpendicular drawn to the X-axis divides the figure into two congruent parts, i.e., parts that are identical in all respects, or mirror images of each other that can be superimposed. In statistics, a distribution is called symmetric if the mean, median and mode coincide; otherwise, the distribution is asymmetric.

What is Kurtosis?
Kurtosis measures the tail-heaviness of the distribution. Like skewness, kurtosis is a
statistical measure that is used to describe distribution. Whereas skewness differentiates
extreme values in one versus the other tail, kurtosis measures extreme values in either tail.
Distributions with large kurtosis exhibit tail data exceeding the tails of the normal distribution
(e.g., five or more standard deviations from the mean). Distributions with low kurtosis exhibit
tail data that are generally less extreme than the tails of the normal distribution.

We’re going to calculate the skewness and kurtosis of the data that represents the customer
experience in food delivery system. The usual reason to do this is to get an idea of
whether the data is normally distributed.

Steps to compute mean variable –

Step 1 – Go to Transform -> Compute Variable -> define Target Variable -> Select All in
function group -> select Mean from Functions and Special variables -> enter mean in Type
and Labels -> drag and drop the variables.

Now go to Data View and observe that a column for Mean has appeared.
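The same computation can be written as syntax; the sketch below assumes the questionnaire items are named q1 to q5 and stores the row mean in a new variable called mean_ce (both names are placeholders).

* Sketch: compute the mean of the questionnaire items (item and target names are assumed).
COMPUTE mean_ce = MEAN(q1, q2, q3, q4, q5).
EXECUTE.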

Normality distribution test –

Hypothesis to be tested – The null hypothesis, in this case, is that customer experience towards the food delivery system is normally distributed for both males and females.

Step – 2 For normality distribution go to Analyze -> Descriptive Statistics -> Explore ->
enter Mean in Dependent List and Gender in Factor List -> click on Plots -> Check
Normality Plots with Tests -> Click OK
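The Explore procedure can also be run from syntax; a minimal sketch, assuming the computed mean variable is called mean_ce and the grouping variable is Gender, is:

* Sketch of the equivalent EXPLORE syntax (variable names are assumed).
EXAMINE VARIABLES=mean_ce BY Gender
  /PLOT BOXPLOT NPPLOT
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.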

Interpretation – For males, the skewness statistic is .401 with a standard error of .597, and the kurtosis statistic is -.426 with a standard error of 1.154. Dividing each statistic by its standard error gives z-values of 0.672 for skewness and -0.369 for kurtosis; since both lie well within ±1.96, the data can be considered approximately normally distributed.

Interpretation – For females, the skewness statistic is .136 with a standard error of .564, and the kurtosis statistic is -1.025 with a standard error of 1.091. The corresponding z-values are 0.241 for skewness and -0.940 for kurtosis, which again indicates that the data is approximately normally distributed.

Interpretation – We can observe that for males and females, the p-value is equal to 0.956
and 0.297 respectively, which means p > 0.05, which ultimately means that we can accept the
null hypothesis, and we can conclude the case that the Customer’s Experience towards Food
Delivery System is normally distributed for both male and female.

Interpretation – We may conclude that our data is normally distributed because the
variables are concentrated around the normality line.

Interpretation – We may conclude that our data is normally distributed because the
variables are concentrated around the normality line.

Interpretation – We can observe that the mean value of Male and Female is 2.71 and 2.79
respectively. With this, we can conclude that our data is normally distributed.

Module - 8 (Reliability Test)

Explain the meaning of Reliability Test. Show the steps to compute reliability.

Reliability test refers to the extent to which a test measures without error. It is highly
related to test validity. Reliability test can be thought of as precision; the extent to which
measurement occurs without error. Reliability is not a constant property of a test and is
better thought of as different types of reliability for different populations at different levels
of the construct being measured.

Q. A Manager conducted a training and development program for the employees of ABC Ltd. He is interested in determining the effectiveness of the training and development program and in checking whether it leads to an enhancement of employees' performance or not. Responses were collected from employees to check employee satisfaction with the training and development program, and the following questions were asked: -

In this module, we will be checking the reliability and will also discuss how to deal with
the reliability issue if required. The following steps will be performed for checking the
reliability:

Step1 – Create an excel file showing 50 responses of employees of ABC Ltd who were given
training and development.
Columns: Gender, Employee's Experience, Planned objectives were met, Issues were dealt in depth, Adequate length of course, Well suited method, Method enabled to take active part in training
1 3 4 5 5 2 1
1 2 5 5 5 1 2
1 3 3 5 1 1 2
2 3 3 1 5 1 3
1 3 1 2 2 1 1
1 1 2 3 5 1 1
1 3 1 2 5 4 1
2 3 4 5 3 3 2
1 1 3 5 1 3 1
1 3 5 1 4 4 3
1 3 4 3 5 5 1
2 1 3 3 5 4 1
2 3 5 2 4 5 1
2 3 5 1 3 2 2
2 2 3 1 1 4 1
2 1 2 2 4 2 2
2 3 4 3 3 3 2
2 1 2 2 1 3 1
1 1 2 4 5 3 2
2 3 1 1 3 5 2
2 2 4 1 1 5 4
2 1 4 2 4 1 3
2 2 5 3 1 1 1
2 3 5 3 4 3 2
1 3 3 4 4 5 4
2 3 2 3 5 2 4

Step 2 – Assign values-

Gender:

1 - Male 2 - Female

Experience:

1 - 0-5 Years 2- 5-10 Years 3 - More than 10 Years

Responses to all questions on the Likert 5-point scale:

1 - Strongly Disagree 2 – Disagree 3 – Neutral 4 – Agree 5 - Strongly Agree

Step 3- Import data in SPSS and change the required measures to ‘Scale’.

To check the reliability, the following steps will be performed: -

Step 4- Click on Analyse -> Scale -> then click on Reliability Analysis.

Step 5- The Reliability analysis box will open. Import all 12 questions to Items. Then click
on Statistics.

Step 6- Select ‘Scale’ and ‘Scale if item deleted’. Click continue and then ok.
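The reliability analysis can also be run from syntax; the sketch below assumes the twelve items are named q1 to q12.

* Sketch of the equivalent RELIABILITY syntax (item names are assumed).
RELIABILITY
  /VARIABLES=q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=SCALE
  /SUMMARY=TOTAL.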

The following output will appear -

The value of Cronbach’s Alpha should be .7 or more to report the reliability of the data. As it
can be seen that the value of Cronbach’s Alpha is .692 which signifies that the given data is
close to being reliable.

The above Item-Total Statistics Table is used to resolve any reliability concerns that may
arise. However, in this case, it is not required.

Module - 9 (Chi-square Test)

Defining Chi-square, its steps and the significance of p-value in chi square.
(Practical in SPSS)

Chi-Square Test of Association


• The Chi-Square Test for the association is used when you want to check the association
between two categorical variables on the Nominal Scale. However, it is important to note
that in the case of two variables being compared, the test can also be interpreted as
determining if there is a difference between the two variables. The test is also referred to
as The Chi-Square test for independence, also called Pearson's Chi-square test.
• It is used to determine whether there is a statistically significant difference between the
expected frequencies and the observed frequencies. The purpose of the test is to evaluate
how likely the observed frequencies would be assuming that the null hypothesis is true.

Question – Respondents were asked their gender and whether or not they were a
cigarette smoker. There were three answer choices: Non-smoker, Past smoker, and
Current smoker. Suppose we want to test for an association between smoking behaviour
(non-smoker, current smoker, or past smoker) and gender (male or female) using a Chi-
Square Test of Independence.

The Problem:
To identify the association between smoking behaviour (non-smoker, current smoker, or
past smoker) and gender (male or female).

Hypothesis –
● H0 - There exists no significant association between gender and smoking behaviour.
● H1 - There is a significant association between gender and smoking behaviour.

Step – 1 To create a data file showing gender and the smoking behaviour in excel and import
in SPSS –

Gender Smoking Behavior
1 1
1 2
2 3
2 1
1 2
2 3
1 3
2 2
1 1
2 3
1 2
1 3
2 3
1 1
2 2
1 1
1 1
2 3
2 2
2 2
1 2
1 3
2 2
2 3
1 1
1 1
1 2
1 2
2 3
1 3
2 2
2 3
1 3
2 3
2 1
1 3
1 2
1 2
1 3
2 3
1 1
1 1
1 3
1 2
1 1
1 3
1 3
2 1
1 3
1 2

Step – 2 Import Excel file in SPSS –

Step – 3 Assign values to the dataset in Variable View.


Male-1
Female-2

Non-Smoker-1
Past Smoker-2
Current Smoker-3

Step – 4 Click on Analyse -> Descriptive Statistics -> Crosstabs.

Step – 5 Drag and drop Smoking Behaviour into the Row box and Gender into the Column
box.

Step – 6 Click on Statistics, and select Chi – Square.

Step – 7 Press Continue, and then OK to do the Chi-Square test.
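The crosstab with the chi-square statistic can also be produced with syntax; the sketch below assumes the variables are named SmokingBehavior and Gender.

* Sketch of the equivalent syntax (variable names are assumed).
CROSSTABS
  /TABLES=SmokingBehavior BY Gender
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT
  /COUNT ROUND CELL.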


Step – 8 The result will appear in the SPSS output viewer.

Case Processing Summary
                              Cases
                              Valid          Missing        Total
                              N    Percent   N    Percent   N    Percent
Smoking Behavior * Gender     50   100.0%    0    0.0%      50   100.0%

Smoking Behavior * Gender Crosstabulation
Count
                                      Gender
                                      Male   Female   Total
Smoking Behavior   Non Smoker         10     3        13
                   Past Smoker        10     6        16
                   Current Smoker     11     10       21
Total                                 31     19       50

Chi-Square Tests
                                  Value    df   Asymptotic Significance (2-sided)
Pearson Chi-Square                2.055a   2    .358
Likelihood Ratio                  2.127    2    .345
Linear-by-Linear Association      1.994    1    .158
N of Valid Cases                  50
a. 1 cells (16.7%) have expected count less than 5. The minimum expected count is 4.94.

Interpretation –
The Chi-Square statistic was used to examine the association between the categorical variables. There was no significant relationship at the 5% significance level between the gender and smoking behaviour of respondents (χ² = 2.055, df = 2, p = 0.358). It can be seen from the above table that the p-value (0.358) is greater than the alpha value (0.05).

Hence H1 is not supported, and the null hypothesis H0 is retained.

Module – 10 (T-Test)

Define T-Test. Explain One Sample T-Test and Paired Sample T-Test.
(Practical in SPSS)

Definition

A t-test is an inferential statistic that is used to see if there is a significant difference in the
means of two groups that are related in some way.

The t-test is one of many statistical tests that are used to test hypotheses.

Three key data values are required to calculate a t-test. They include the mean difference (the
difference between the mean values in each data set), the standard deviation of each group,
and the number of data values in each group.

Depending on the data and sort of analysis required, different forms of t-tests can be used.

1. One Sample T-Test


The one-sample t-test is a statistical hypothesis test that can be used to see if an
unknown population mean differs from a given value. The test can be used with
continuous data. Your data should be drawn from a normal population at random.

2. Paired Sample T-Test


When the samples are frequently made up of matched pairs of similar units, or when
there are repeated measures, the paired t-test is used. It's possible, for example, that
the same patients will be examined multiple times—both before and after treatment.
Each patient is utilized as a control sample against themself in such circumstances.
This method can also be used in situations when the samples are related or have
similar traits, such as a comparison analysis of children, parents, or siblings.
Correlated or paired t-tests are dependent tests since they contain two sets of samples
that are related.

Hypothesis –

H0 = There exists no significant difference in engine efficiency between the current and previous trials.

Ha = There exists a significant difference in engine efficiency between the current and previous trials.

Create a data file with 30 respondents and import it to SPSS.

Car   With Ethanol   Without Ethanol


1 15.00 17.00
2 23.00 20.00
2 12.00 8.00
1 14.00 23.00
1 19.00 26.00
2 7.00 25.50
1 8.00 19.00
2 26.00 16.00
2 27.00 17.00
1 21.00 22.00
1 16.00 18.00
2 12.00 14.00
1 9.00 15.00
2 13.00 20.00
1 16.00 12.00
1 22.00 16.00
2 18.00 18.00
2 27.00 20.00
1 21.00 17.00
1 20.00 21.00
2 17.00 19.00
1 10.00 14.00
2 25.00 15.00
1 25.00 22.00
1 13.00 15.50
2 21.00 11.00
1 10.00 10.00
2 12.00 14.00
1 14.00 16.00
1 20.00 10.00

To carry out a One-Sample T-Test –

Click on Analyze -> Compare Means -> One Sample T-Test

This will open the Dialogue Box -> select the Test Variables and shift it to the Variable List

Click on OK to see your output.
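The one-sample test can also be run from syntax; the sketch below assumes the variables are named WithEthanol and WithoutEthanol and uses a test value of 0 (adjust TESTVAL if a different reference value is intended).

* Sketch of the equivalent one-sample T-TEST syntax (variable names and test value are assumed).
T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=WithEthanol WithoutEthanol
  /CRITERIA=CI(.95).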

Interpretation – The two-tailed significance value is less than .05 (p < .05) at 0.001, so the difference between the means is significant. The output indicates that there exists a significant difference in engine efficiency between the current and previous trials. The cars in the current trial have higher engine efficiency than those in the earlier trial, with t(29) = 15.834, p < .05.

At the 95% confidence level, with df = 29 and α = 0.05, the computed value (15.834) is greater than the table value (1.699). Hence, the alternate hypothesis can be accepted.

To find out Paired Sample T-Test –

Click Analyze -> Compare Means -> Paired Sample T-Test ->

This will open the Dialogue Box -> select the Test Variables and shift them to the Variable
List

Click on OK to see your output.
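The paired test can likewise be run from syntax; the sketch below assumes the two variables are named WithEthanol and WithoutEthanol.

* Sketch of the equivalent paired-samples T-TEST syntax (variable names are assumed).
T-TEST PAIRS=WithEthanol WITH WithoutEthanol (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.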

Interpretation – The two-tailed significance value is greater than .05 (p > .05) at 0.479, so the difference between the means is not significant. The output indicates that there does not exist a significant difference in engine efficiency between the with-ethanol and without-ethanol trials. We cannot say that the cars with the ethanol additive have higher engine efficiency than those without ethanol, with t(29) = 0.053, p > .05.

At the 95% confidence level, with df = 29 and α = 0.05, the computed value (0.053) is less than the table value (2.045); hence the null hypothesis cannot be rejected.

Module – 11 (Regression analysis)

Explain Regression and show steps to compute regression in SPSS. Also explain the types of Regression.

Regression is a statistical method used in finance, investing, and other disciplines that attempt
to determine the strength and character of the relationship between one dependent variable
(usually denoted by Y) and a series of other variables (known as independent variables).

Regression helps investment and financial managers to value assets and understand the
relationships between variables, such as commodity prices and the stocks of businesses
dealing in those commodities.

Here we are looking at two types of regression -

• Bivariate Regression

• Multiple Regression

Bivariate regression is similar to bivariate correlation because both are designed for situations
in which there are just two variables. Bivariate analysis refers to the analysis of two variables
to determine the relationships between them. Bivariate analyses are often reported in quality-
of-life research. Essentially, bivariate regression analysis involves analyzing two variables to
establish the strength of the relationship between them. The two variables are frequently
denoted as X and Y, with one being an independent variable (or explanatory variable), while
the other is a dependent variable (or outcome variable).

Multiple regression, however, was created for cases in which there are three or more
variables. Multiple regression is a statistical technique that can be used to analyze the
relationship between a single dependent variable and several independent variables. The
objective of multiple regression analysis is to use the independent variables whose values are
known to predict the value of the single dependent value.

Difference between Correlation and Regression

Characteristics        Correlation                            Regression
Purpose of Technique   Association/connection between         Understanding the link, i.e.,
                       two variables                          prediction and explanation
Labels Attached        No clear labels                        Clear distinction between dependent
                                                              and independent variables
Inferential Tests      Correlation coefficient                Regression coefficient, intercept
                                                              statistics, change in regression
                                                              coefficient

Regression - Some Important Terms

Regression coefficient

A regression coefficient is a measure of how strongly each IV (also known as a predictor variable) predicts the DV. There are two types of regression coefficients: unstandardized coefficients and standardized coefficients, also known as beta values.

An unstandardized coefficient is used in the regression equation, together with the constant term, as the coefficient of each IV to predict the value of the DV. The standardized coefficient, however, is measured in standard deviations. If there is just one IV predicting one DV, the beta value obtained would be the same as the correlation coefficient between the DV and the IV.

R Values

Regression analysis provides two different values. The simple R value represents the correlation between the observed values and the predicted values (based on the regression equation) of the DV. The other value is referred to as R Square; it is the square of R and gives the proportion of variance in the dependent variable accounted for by the set of IVs chosen for the model.

R Square is used to find out how well the IVs can predict the DV. However, the R Square value tends to be somewhat inflated when the number of IVs is large or when the number of cases is large. The Adjusted R Square takes these factors into account and gives more accurate information about the fit of the model. For example, an R Square value of 0.70 would mean that the IVs in the model can predict 70% of the variance in the DV.

Problem:

The HR manager wants to know the impact of training and development on employee
engagement of employees working in different companies.

Hypothesis:

H1: There is a significant impact of training and development on the employee engagement of
employees working in different companies.

H0: There is no significant impact of training and development on the employee engagement
of employees working in different companies.

Questionnaire on a 5-point Likert scale: Strongly Disagree = 1, Disagree = 2, Neutral = 3,
Agree = 4, Strongly Agree = 5

Training and Development

● I can use knowledge and behaviors learned in training at work

● My organization helps me develop the skills I need for the successful accomplishment
of my duties (e.g.: training, conferences, etc.)

● My organization invests in my development and education promoting my personal


and professional growth broadly (e.g.: full or partial sponsorship of undergraduate
degrees, postgraduate programs, language courses, etc.)

● In my organization, training is evaluated by participants

● My organization stimulates learning and application of knowledge

● In my organization, training needs are identified periodically

Employee Engagement

● I focus hard on my work

● I concentrate on my work

● I pay a lot of attention to my work

● I share the same work values as my colleagues

● I share the same work goals as my colleagues

● I share the same work attitudes as my colleagues

● I feel positive about my work

● I feel energetic in my work

● I am enthusiastic about my work

Step 1: Create an Excel file

Step 2: Import Excel file data into SPSS

Step 3: Provide values on a 5-point Likert scale as mentioned earlier.

Step 4: Compute the average mean for T&D and EE (Transform -> Compute Variable -> MEAN -> OK). This creates the variables mean_td and mean_ee.

Step 5: Go to Analyze ->Regression ->Linear

Step 6: Move mean_ee into the Dependent box and mean_td into the Independent(s) box.

Step 7: Press OK. You will then see the following output:

Variables Entered/Removed (a)

Model | Variables Entered | Variables Removed | Method
1 | mean_td (b) | . | Enter
a. Dependent Variable: mean_ee
b. All requested variables entered.

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .056 (a) | .003 | -.018 | .46753
a. Predictors: (Constant), mean_td
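As a cross-check on the Model Summary, the Adjusted R Square can be recovered by hand: with n = 50 cases and k = 1 predictor, 1 - (1 - .003) × (50 - 1)/(50 - 1 - 1) = 1 - .997 × 49/48 ≈ -.018, which matches the SPSS value.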

ANOVA (a)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | .033 | 1 | .033 | .149 | .701 (b)
Residual | 10.492 | 48 | .219 | |
Total | 10.524 | 49 | | |
a. Dependent Variable: mean_ee
b. Predictors: (Constant), mean_td

Coefficients (a)

Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 3.216 | .416 | | 7.728 | <.001
mean_td | -.052 | .135 | -.056 | -.386 | .701
a. Dependent Variable: mean_ee
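Reading the unstandardized coefficients off the table above, the fitted regression equation is: predicted mean_ee = 3.216 - 0.052 × mean_td.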

Hypothesis H1 tested whether training and development have a significant impact on employee
engagement. The dependent variable employee engagement (EE) was regressed on the predictor
training and development (T&D). T&D did not significantly predict EE, F(1, 48) = 0.149,
p = .701, and its regression coefficient was small and non-significant (b = -.052, p = .701).
Moreover, R2 = .003 shows that the model explains only 0.3% of the variance in EE, so the data
do not support an effect of T&D on EE and the null hypothesis H0 is retained. The table below
summarises the findings.

Hypothesis | Regression Weights | Beta Coefficient | R2 | F | p-value | Hypothesis supported
H1 | TD -> EE | -.052 | .003 | .149 | .701 | No

Note: significance assessed at p < .05; TD = Training & Development, EE = Employee Engagement.
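For readers who want to verify the SPSS output independently, here is a minimal sketch in Python (not part of the lab procedure). The file name and the item column prefixes ("td"/"ee") are assumptions made for illustration; statsmodels' summary() prints the same R Square, F statistic, coefficients and p-values reported in the tables above.

```python
# Minimal sketch, assuming an Excel file "td_ee_survey.xlsx" whose T&D item
# columns start with "td" and whose employee-engagement columns start with "ee".
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_excel("td_ee_survey.xlsx")          # hypothetical file name

# Scale means, mirroring SPSS Transform -> Compute Variable -> MEAN(...)
td_items = [c for c in df.columns if c.startswith("td")]   # assumed item names
ee_items = [c for c in df.columns if c.startswith("ee")]   # assumed item names
df["mean_td"] = df[td_items].mean(axis=1)
df["mean_ee"] = df[ee_items].mean(axis=1)

# Simple linear regression of employee engagement on training and development
model = smf.ols("mean_ee ~ mean_td", data=df).fit()
print(model.summary())   # R Square, F statistic, coefficients, p-values
```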

Module – 12 (ANOVA)

Define ANOVA Test. Perform ANOVA Test in SPSS.

Analysis of Variance, i.e., ANOVA in SPSS, is used for examining the differences in the
mean values of the dependent variable associated with the effect of the controlled
independent variables, after taking into account the influence of the uncontrolled independent
variables. Essentially, ANOVA in SPSS is used as the test of means for two or more
populations.

ANOVA in SPSS must have a dependent variable which should be metric (measured using an
interval or ratio scale). ANOVA in SPSS must also have one or more independent variables,
which should be categorical. In ANOVA in SPSS, categorical independent variables are
called factors. A particular combination of factor levels, or categories, is called a treatment.
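As a rough illustration of the idea (made-up group data, not the lab data), the one-way ANOVA test of means can also be run with scipy:

```python
# Minimal sketch: one-way ANOVA tests whether several group means differ.
# The three groups below are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(520, 55, size=25)   # e.g. scores from one city
group_b = rng.normal(495, 60, size=25)
group_c = rng.normal(500, 60, size=25)

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")   # p < .05 would suggest unequal means
```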

Problem – Comparing the scores of students from four metro cities of India (Delhi,
Mumbai, Chennai, Kolkata). We obtained 25 respondents from each of the metropolitan
cities.

Hypothesis:

H0 – There is no significant difference in scores from four different metropolitan cities.

H1 - There is a significant difference in scores from four different metropolitan cities.

Step – 1

Create the data file of 100 respondents with scores of 25 respondents from each city and
import it to SPSS.

Scores of the 25 respondents from each city (city codes 1-4, assigned in the next step):

City 1: 444, 536, 534, 459, 491, 567, 575, 500, 547, 412, 568, 524, 411, 570, 579, 430, 600, 521, 488, 527, 591, 572, 563, 500, 565

City 2: 502, 595, 421, 474, 552, 494, 598, 472, 442, 564, 576, 402, 512, 506, 409, 453, 474, 485, 545, 447, 420, 587, 431, 479, 543

City 3: 512, 438, 559, 425, 487, 563, 539, 550, 539, 549, 444, 532, 451, 522, 437, 443, 407, 583, 500, 552, 590, 402, 500, 569, 408

City 4: 435, 460, 411, 577, 425, 414, 477, 427, 489, 542, 469, 421, 542, 507, 424, 484, 556, 586, 477, 453, 523, 582, 487, 592, 414
Step – 2 – Assign values to the city in variable view as

1 = Delhi, 2 = Mumbai, 3 = Chennai, 4 = Kolkata

Step – 3 – Click on Analyze -> Compare Means -> One Way Anova.

Step – 4 - Put City in Factor List and Scores in Dependent List.

Step – 5 - Go to Post Hoc -> select Tukey -> Continue.

Step – 6 - Click on Options -> select Descriptive and Homogeneity of Variance Tests ->
click Continue -> Click OK to get your desired results.

Step – 7 - It will show the following output:

Oneway

Descriptives: Scores

City | N | Mean | Std. Deviation | Std. Error | 95% CI Lower | 95% CI Upper | Minimum | Maximum
Delhi | 25 | 522.96 | 56.411 | 11.282 | 499.67 | 546.25 | 411 | 600
Mumbai | 25 | 495.32 | 60.825 | 12.165 | 470.21 | 520.43 | 402 | 598
Chennai | 25 | 500.04 | 60.762 | 12.152 | 474.96 | 525.12 | 402 | 590
Kolkata | 25 | 486.96 | 60.480 | 12.096 | 462.00 | 511.92 | 411 | 592
Total | 100 | 501.32 | 60.249 | 6.025 | 489.37 | 513.27 | 402 | 600
(95% CI = 95% Confidence Interval for Mean)

ANOVA: Scores

Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 17803.440 | 3 | 5934.480 | 1.668 | .179
Within Groups | 341560.320 | 96 | 3557.920 | |
Total | 359363.760 | 99 | | |

Post Hoc Tests

Multiple Comparisons
Dependent Variable: Scores (Tukey HSD)

(I) City | (J) City | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower | 95% CI Upper
Delhi | Mumbai | 27.640 | 16.871 | .362 | -16.47 | 71.75
Delhi | Chennai | 22.920 | 16.871 | .528 | -21.19 | 67.03
Delhi | Kolkata | 36.000 | 16.871 | .150 | -8.11 | 80.11
Mumbai | Delhi | -27.640 | 16.871 | .362 | -71.75 | 16.47
Mumbai | Chennai | -4.720 | 16.871 | .992 | -48.83 | 39.39
Mumbai | Kolkata | 8.360 | 16.871 | .960 | -35.75 | 52.47
Chennai | Delhi | -22.920 | 16.871 | .528 | -67.03 | 21.19
Chennai | Mumbai | 4.720 | 16.871 | .992 | -39.39 | 48.83
Chennai | Kolkata | 13.080 | 16.871 | .865 | -31.03 | 57.19
Kolkata | Delhi | -36.000 | 16.871 | .150 | -80.11 | 8.11
Kolkata | Mumbai | -8.360 | 16.871 | .960 | -52.47 | 35.75
Kolkata | Chennai | -13.080 | 16.871 | .865 | -57.19 | 31.03

Interpretation -

As shown in the ANOVA table, the F value is 1.668 with a significance of 0.179. Since
p > 0.05, the null hypothesis is accepted and the alternate hypothesis is rejected: there is no
significant difference in scores across the four metropolitan cities. This is consistent with the
post hoc (Tukey HSD) comparisons, none of which is significant.
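For completeness, a minimal sketch of the same analysis outside SPSS is shown below; the file name and the column names (City, Scores) are assumptions. Run on the same data, it should reproduce F ≈ 1.668, p ≈ .179 and the Tukey HSD comparisons reported above.

```python
# Minimal sketch, assuming a file "city_scores.xlsx" with columns City and Scores.
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_excel("city_scores.xlsx")           # hypothetical file name

# One-way ANOVA across the four city groups
groups = [g["Scores"].values for _, g in df.groupby("City")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Tukey HSD post hoc test for all pairwise city comparisons
tukey = pairwise_tukeyhsd(endog=df["Scores"], groups=df["City"], alpha=0.05)
print(tukey.summary())
```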
