
SOCIAL RESEARCH METHODOLOGY

Chapter Summary

Triad 2:
G Nandhan (H010-19)
Mary Shannan (H022-19)
Vivek Singh Rana (H067-19)
Chapter 12 – Analysis of quantitative data
12.1 Quantitative data
Raw quantitative data that have not been processed or analysed convey very little meaning to
most people. In order to turn these data into useful information, they need to be processed.
Quantitative data refer to all numerical primary and secondary data and can help the
researcher to answer research questions and meet objectives.

12.2 Preparing, inputting and checking data

Types of data
Quantitative data can be divided into two groups: categorical data and numerical data.

Categorical data are those whose values cannot be measured numerically but can be classified
into sets/categories according to the characteristics that describe or identify the variable, or
can be placed in rank order. There are two types of categorical data:
• Descriptive/nominal data – with these data one can simply count the number of occurrences in
each category of a variable. When a variable is divided into two categories
(female/male, for example), then the data are known as dichotomous data.
• Ranked/ordinal data – these are a more precise form of categorical data. An example of
ranked data is answers to rating or scale questions.

Alternatively, numerical data are those whose values are measured or counted numerically as
quantities (Berman 2008). Numerical data are therefore more precise than categorical data
because each data value can be assigned a position on a numerical scale. Numerical data can
be subdivided in two ways: as interval or ratio data, or as continuous or discrete data. With
interval data one can state the difference (interval) between any two data values of a
certain variable, whereas with ratio data one can also calculate the relative difference (ratio)
between any two data values. Continuous data are those that can take any value (provided
they are measured accurately enough), while discrete data can only take certain precise
values (often whole numbers/integers).
After determining the types of data to be collected, the researcher can enter the data
into computer data-processing software (e.g. SPSS or Excel). To do this, the data need to be
coded using numerical codes, which enables the researcher to enter the data quickly and with
fewer errors. Once this is done, the data should be checked for errors.
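As an illustration of coding and error-checking, here is a minimal sketch in Python with pandas (the chapter's own tools are SPSS and Excel; the column names, codes and data below are hypothetical):

```python
import pandas as pd

# Hypothetical survey responses; column names and codes are illustrative only.
raw = pd.DataFrame({
    "gender": ["female", "male", "female", "malee"],  # note the typo in row 4
    "satisfaction": [4, 5, 2, 7],                     # valid codes are 1-5
})

# Code the categorical variable numerically (1 = female, 2 = male).
raw["gender_code"] = raw["gender"].map({"female": 1, "male": 2})

# Check for errors: unmapped categories become NaN, and out-of-range
# numerical codes can be flagged with a boolean filter.
print(raw[raw["gender_code"].isna()])           # rows with an illegal category
print(raw[~raw["satisfaction"].between(1, 5)])  # rows with an illegal code
```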

12.3 Exploring and presenting the data


Tukey's (1977) exploratory data analysis (EDA) is a useful approach with which to start the
analysis of quantitative data. This approach focuses on the use of diagrams to explore and
understand the data. Sometimes this approach may also reveal relationships in the data that
your research was not designed to test.
When looking at the collected data it is best to explore specific values, highest and
lowest values, trends over time, proportions and distributions. Once these have been explored,
one can start to compare them and look for (causal) relationships between variables.

Exploring variables
The easiest way of summarising data is by using tables. However, tables do not give
visual prominence to the highest or lowest values, so diagrams may be a better option for
summarising the data. Another way to present data is a bar chart, where the height or
length of each bar represents the frequency of occurrence. Bar charts are similar to
histograms, another type of data presentation, where the area of each bar represents the
frequency of occurrence and where the continuous nature of the data is emphasised by the
absence of gaps between bars. Finally, a pictogram, also similar to a bar chart, shows a series
of pictures chosen to represent the data. Other kinds of data presentation are:

• Line graph – this is a suitable approach when trying to explore a trend.


• Pie chart – this is a diagram that is divided into proportional segments according to
the share each has of the total value.
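As a sketch of how these diagrams can be produced outside SPSS, the following Python/matplotlib snippet draws a bar chart, a histogram and a pie chart; all data values are made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).normal(50, 10, 200)  # illustrative continuous data
categories, counts = ["A", "B", "C"], [12, 30, 18]     # illustrative categorical data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(categories, counts)         # bar chart: bar height = frequency
axes[0].set_title("Bar chart")
axes[1].hist(values, bins=15)           # histogram: no gaps between bars
axes[1].set_title("Histogram")
axes[2].pie(counts, labels=categories)  # pie chart: proportional segments
axes[2].set_title("Pie chart")
plt.show()
```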

Shapes of diagrams
If a diagram shows a bunching to the left and a long tail to the right (figure 12.3 on page 291),
the data are 'positively skewed'. If it is the other way around, the data are
'negatively skewed'. When the data are equally distributed on each side of the highest
frequency, they are symmetrically distributed.
A bell-shaped curve is called a normal distribution. With the indicator 'kurtosis' one
can compare a diagram's pointedness or flatness with that of the normal distribution. When a
distribution is flatter, it is called platykurtic and the kurtosis value is negative. When the
distribution is more peaked, it is leptokurtic and the kurtosis value is positive.
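Skewness and kurtosis can also be computed directly rather than judged from a diagram; a minimal sketch with scipy, using simulated right-skewed data:

```python
import numpy as np
from scipy import stats

data = np.random.default_rng(1).exponential(scale=2.0, size=500)  # right-skewed data

print(stats.skew(data))      # > 0: positively skewed (long tail to the right)
print(stats.kurtosis(data))  # > 0: leptokurtic; < 0: platykurtic; 0 for a normal curve
```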

Comparing variables
Contingency tables or cross-tabulations are approaches one can use to examine the
interdependence between variables. Other approaches are:

• Multiple bar chart – used to compare the highest and lowest values between variables.
• Percentage component bar chart – used to compare proportions between variables.
• Multiple line graph – used to compare trends between variables.
• Stacked bar chart – used to compare totals between variables.
• Comparative proportional pie chart – used to compare the proportions of each
category or value as well as the totals between variables.
• Scatter graph or scatter plot – used to explore possible relationships between ranked
and numerical data variables by plotting one variable against another.

12.4 Describing data with the use of statistics


Statistics to describe a variable focus on two aspects:
• the central tendency;
• the dispersion.

Tukey's exploratory data analysis approach is a good way to understand the data using
diagrams. Descriptive statistics, on the other hand, enable one to describe the variables
numerically, focusing on the central tendency and the dispersion. Measures of central
tendency give a general impression of the values that could be seen as common, middling or
average. They are:

• The mode – the value that occurs most often
• The median – the middle value or mid-point after the data have been ranked
• The mean – also known as the average

The dispersion (how the data are distributed around the central tendency) can be described by:

• Inter-quartile range – the difference within the middle 50 per cent of values
• Standard deviation – the extent to which values differ from the mean
• Range – the difference between the lowest and the highest value
• Coefficient of variation – used to compare the relative spread of data between
distributions of different magnitudes, for example hundreds of tons with billions of
tons (calculated by dividing the standard deviation by the mean and multiplying the
answer by 100)
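A minimal sketch of these descriptive statistics in Python with numpy (the values are illustrative; SPSS or Excel would report the same measures):

```python
import numpy as np
from collections import Counter

data = np.array([12, 15, 15, 18, 22, 25, 30, 41])       # illustrative values

mean = data.mean()                                       # central tendency
median = np.median(data)
mode = Counter(data.tolist()).most_common(1)[0][0]       # most frequent value

iqr = np.percentile(data, 75) - np.percentile(data, 25)  # dispersion
std = data.std(ddof=1)                                   # sample standard deviation
data_range = data.max() - data.min()
cv = std / mean * 100                                    # coefficient of variation (%)
print(mean, median, mode, iqr, std, data_range, cv)
```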
12.5 Exploring relationships, differences and trends using statistics
In research one often wishes to examine the relationship between variables. This is done
through hypothesis testing, where one compares the collected data with what one would
theoretically expect to happen.

There are two general groups of statistical significance tests: non-parametric tests (used
when the data are not normally distributed) and parametric tests (used with numerical data
that are normally distributed).

Testing for normal distribution


A way to test for normality is to use statistics to determine whether the distribution for a
variable differs significantly from a comparable normal distribution. This can be done
using statistical software that provides the Kolmogorov-Smirnov test and the Shapiro-Wilk
test. A probability of 0.05 means that there is only a 5 per cent chance that the observed
difference from a comparable normal distribution would occur by chance alone. Thus, if the
probability is lower than 0.05, the data are taken to be not normally distributed.
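A sketch of both normality tests with scipy on a simulated sample (note that feeding estimated parameters into the plain Kolmogorov-Smirnov test, as below, is only an approximation):

```python
import numpy as np
from scipy import stats

sample = np.random.default_rng(2).normal(loc=100, scale=15, size=80)  # simulated data

stat, p = stats.shapiro(sample)   # Shapiro-Wilk test
print(p)                          # p < 0.05 -> data are not normally distributed

# Kolmogorov-Smirnov test against a normal distribution with estimated parameters
stat, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(p)
```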

Testing for significance


If there is a relationship between variables, then the researcher will reject the null hypothesis
and accept the alternative hypothesis. It is difficult to obtain a significant test statistic with a
small sample; by increasing the sample size, more of the relationships found will be
statistically significant. This is because a larger sample more closely resembles the population
from which it was selected.

Type 1 and 2 errors


A Type 1 error occurs when the null hypothesis has been wrongly rejected and the
alternative hypothesis should not have been accepted. In other words, the researcher states
that two variables are related when they actually are not. Statistical significance is the same
as the probability of making a Type 1 error.

A Type 2 error occurs when a researcher does not reject the null hypothesis when he should.
Thus, he states that two variables are not related when they actually are.

When descriptive or numerical data are summarised as a two-way contingency table, it is
helpful to use a chi-square test. The chi-square test enables you to find out how likely it is
that the two variables are associated. It is based on a comparison of the observed values in
the table with what might be expected if the two distributions were entirely independent.
The test relies on two assumptions (Dancey and Reidy 2008):

• The categories used in the contingency table are mutually exclusive, so that each
observation falls into only one category or class interval.
• No more than 25 per cent of the cells in the table have expected values of less than 5.
For contingency tables of two rows and two columns, no expected values of less than 10
are preferable.

If the latter assumption is not met, the accepted solution is to combine rows and columns
where this produces meaningful data.
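A minimal chi-square sketch with scipy (the 2x2 table is hypothetical); the expected values returned by the test can be used to check the assumptions above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = gender, columns = product preference.
observed = np.array([[30, 10],
                     [20, 25]])

chi2, p, dof, expected = chi2_contingency(observed)
print(p)         # p < 0.05 suggests the two variables are associated
print(expected)  # check that no expected value falls below the recommended minimum
```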

An alternative statistic used to measure the association between two variables is Phi. This
statistic measures the association on a scale between –1 (perfect negative association),
through 0 (no association) to 1 (perfect association).

To test whether two groups are different

Ranked data: Sometimes it is necessary to see whether the distribution of an observed set of
values for each category of a variable differs from a specified distribution.

Numerical data: If a numerical variable can be divided into two distinct groups using a
descriptive variable, you can assess the likelihood of these groups being different using an
independent groups t-test.
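An independent groups t-test sketch with scipy (the two groups are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(50, 8, 40)  # e.g. weekly sales in region A (simulated)
group_b = rng.normal(55, 8, 40)  # e.g. weekly sales in region B (simulated)

t_stat, p = stats.ttest_ind(group_a, group_b)
print(t_stat, p)                 # p < 0.05 -> the group means likely differ
```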

To test whether three or more groups are different


If a numerical variable is divided into three or more distinct groups using a descriptive
variable, you can assess the likelihood that the differences between these groups occurred by
chance alone by using one-way analysis of variance (one-way ANOVA) (Table 12.5). As its
name suggests, ANOVA analyses the variance, that is, the spread of data values, within and
between groups of data by comparing means.

The following assumptions need to be met before using one-way ANOVA. More detailed
discussion is available in Hays (1994) and Dancey and Reidy (2008).
• Each data value is independent and does not relate to any of the other data values. This
means that you should not use one-way ANOVA where data values are related in some way,
such as the same case being tested repeatedly.
• The data for each group are normally distributed. This assumption is not particularly
important provided that the number of cases in each group is large (30 or more).
• The data for each group have the same variance (standard deviation squared). However,
provided that the number of cases in the largest group is not more than 1.5 times that of the
smallest group, this appears to have very little effect on the test results.
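A one-way ANOVA sketch with scipy, using three simulated groups of 30+ cases each with equal variance, in line with the assumptions above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
g1 = rng.normal(50, 8, 35)  # three simulated groups with equal variance
g2 = rng.normal(53, 8, 35)
g3 = rng.normal(60, 8, 35)

f_stat, p = stats.f_oneway(g1, g2, g3)
print(f_stat, p)            # p < 0.05 -> at least one group mean differs
```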

Exploring the strength of a relationship

There are two kinds of relationships:

• Correlation: a change in one variable is accompanied by a change in another
variable, but it is not clear which variable has caused the other to change.
• Cause-and-effect relationship: a change in one or more variables causes a change
in another variable.

The correlation coefficient quantifies the strength of a linear relationship between two ranked
or numerical variables as a number between +1 and -1. A value of +1 means perfect positive
correlation: the two variables are precisely related, and when one increases, the other
increases as well. A value of -1 means perfect negative correlation: the two variables are
precisely related, but when one increases the other decreases.
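Both kinds of coefficient can be computed with scipy; a sketch using simulated data with a built-in positive relationship:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)  # built-in positive relationship

r, p_r = stats.pearsonr(x, y)       # for numerical (interval/ratio) variables
rho, p_rho = stats.spearmanr(x, y)  # for ranked (ordinal) variables
print(r, rho)                       # both close to +1 here
```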

To predict the value of a variable from one or more other variables


Regression analysis can also be used to predict the values of a dependent variable given the
values of one or more independent variables by calculating a regression equation, for example:

AoS_i = a + b1*ME_i + b2*NSS_i

where:
AoS is the Amount of Sales
ME is the Marketing Expenditure
NSS is the Number of Sales Staff
a is the regression constant
b1 and b2 are the beta coefficients

This equation can be translated as stating:
Amount of Sales_i = a + (b1 * Marketing Expenditure_i) + (b2 * Number of Sales Staff_i)
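A sketch of estimating such an equation with statsmodels; the sales, marketing and staff figures are simulated, so the fitted coefficients should come out close to the values built into the data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
marketing = rng.uniform(10, 100, 60)  # ME: marketing expenditure (simulated)
staff = rng.integers(1, 20, 60)       # NSS: number of sales staff (simulated)
sales = 5 + 0.9 * marketing + 2.5 * staff + rng.normal(0, 5, 60)  # AoS

X = sm.add_constant(np.column_stack([marketing, staff]))
model = sm.OLS(sales, X).fit()
print(model.params)     # [a, b1, b2]: regression constant and beta coefficients
print(model.rsquared)   # proportion of variance in sales explained by the model
```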

When calculating a regression equation you need to ensure the following assumptions are
met:

• The relationship between dependent and independent variables is linear. Linearity


refers to the degree to which the change in the dependent variable is related to the
change in the independent variables. Linearity can easily be examined through
residual plots (these are usually drawn by the analysis software).

• The data values for the dependent and independent variables have equal variances
(this term was explained earlier in Section 12.4), also known as homoscedasticity.
Again, analysis software usually contains statistical tests for equal variance.

• Absence of correlation between two or more independent variables (collinearity or


multicollinearity), as this makes it difficult to determine the separate effects of
individual variables.

• The data for the independent variables and dependent variable are normally
distributed.
Examining trends
When examining longitudinal data the first thing we recommend you do is to draw a line
graph to obtain a visual representation of the trend (Figure 12.7). Subsequent to this,
statistical analyses can be undertaken. Three of the more common uses of such analyses are:
• to examine the trend or relative change for a single variable over time;
• to compare trends or the relative change for variables measured in different units or of
different magnitudes;
• to determine the long-term trend and forecast future values for a variable.

To examine the trend


To calculate simple index numbers for each case of a longitudinal variable you use the
following formula:

index number of case = (data value for case/data value for base period)* 100
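For example, with five periods of illustrative data and the first period as the base period:

```python
data = [435, 452, 460, 489, 530]  # illustrative values over five time periods
base = data[0]                    # first period chosen as the base period

index_numbers = [value / base * 100 for value in data]
print(index_numbers)              # the base period is 100.0 by construction
```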

To compare trends
To answer some other research question(s) and to meet the associated objectives you may
need to compare trends between two or more variables measured in different units or at
different magnitudes.

To determine the trend and forecasting


Calculating a moving average involves replacing each value in the time series with the mean
of that value and the values directly preceding and following it (Morris 2003). This
smooths out the variation in the data so that you can see the trend more clearly. The
calculation of a moving average is relatively straightforward using either a spreadsheet or
statistical analysis software.
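A sketch of a three-point moving average with pandas (the series is illustrative):

```python
import pandas as pd

series = pd.Series([12, 15, 11, 18, 16, 21, 19, 24])  # illustrative time series

# Each value is replaced by the mean of itself and its immediate neighbours
# (a centred three-point window), smoothing out short-term variation.
moving_avg = series.rolling(window=3, center=True).mean()
print(moving_avg)
```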

Once the trend has been established, it is possible to forecast future values by continuing the
trend forward for time periods for which data have not been collected. This involves
calculating the long-term trend – that is, the amount by which values are changing
each time period after variations have been smoothed out. Once again, this is relatively
straightforward to calculate using analysis software. Forecasting can also be undertaken using
other statistical methods, including regression analysis. If you are using regression for your
time series analysis, the Durbin-Watson statistic can be used to discover whether the value
of your dependent variable at time t is related to its value at the previous time period. Such
a relationship, commonly referred to as autocorrelation or serial correlation, is important
because it means that the results of your regression analysis are less likely to be reliable.
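The Durbin-Watson statistic is available in statsmodels; a sketch on a simulated time series regression:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
t = np.arange(50, dtype=float)
y = 3 + 0.5 * t + rng.normal(0, 2, 50)       # simulated upward trend with noise

model = sm.OLS(y, sm.add_constant(t)).fit()  # regression of the series on time
print(durbin_watson(model.resid))            # ~2: no autocorrelation; <2: positive; >2: negative
```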
Chapter 13 – Quantitative data analysis
This chapter covers the basic techniques for analyzing quantitative data. Usually this
analysis is done using computer software. The chapter deals with the various aspects and
methods of quantitative data analysis; it does not cover the step-by-step mechanics of running
the software.
Quantitative data analysis does not start after the collection of data: suitable steps and
methods of data analysis must be planned before the data are actually collected. This helps
in designing questionnaires and methods that are feasible to analyze. Lack of proper planning
will hinder the analysis process and may also lead to improper results.
One of the main problems in quantitative data analysis is missing data. When analyzing a
questionnaire, for example, some of the fields may be left blank. This may be because
participants did not want to provide that information, or because the question was designed
in such a manner that it was redundant for them to answer it. Hence suitable planning must
take place before the actual collection of the data.

Types of variables:
There are four types of quantitative variables that will be generated during the course of the
research. They are as follows:
1. Interval/ratio variable
These are variables where the distances between the categories are identical across the
range of categories and the categories can be rank ordered. Interval/ratio variables are
regarded as the highest level of measurement because they permit a wider variety of
statistical analyses to be conducted on them than any other type.
2. Ordinal variable
These are the variables whose categories can be rank ordered similar to the Interval/ratio
variables but the distances between the categories are not equal across the range.
3. Nominal variable
These variables comprise categories that cannot be rank ordered. They are also known as
categorical variables.
4. Dichotomous variable
These variables contain data that have only two categories; gender is an example.

Analyzing the type of variable:

[The original summary includes a flow diagram depicting a process for identifying the type of variable.]

Analysis of the data – types:

1) Univariate analysis
This refers to the analysis of one variable at a time. It can be done with the following
methods.
i) Frequency tables
A frequency table provides the number of people, and the percentage, belonging to each
of the categories of the variable in question. It can be used in relation to all of the
different types of variables. If an interval/ratio variable is to be presented in a
frequency table format, the categories will need to be grouped, and the groupings
should not overlap (a code sketch of such grouping appears after this list).
ii) Diagrams
These are the most frequently used methods of displaying quantitative data. They are
easy to interpret and understand. Some of the common diagrams are bar charts, pie
charts and histograms. Bar charts and pie charts work well with nominal and ordinal
variables; for interval/ratio variables, histograms are used.
iii) Measures of central tendency
(a) Arithmetic mean – the average of all the data being analyzed. It is
employed with interval/ratio variables. The main issue with the mean
is that it is vulnerable to outliers.
(b) Median – the mid-point of the distribution of the variable. The median is
not affected by outliers. The data have to be sorted in ascending order
before finding the median; if there is an even number of data values, the
mean of the two middle values is taken as the median. The median can be
used with both interval/ratio variables and ordinal variables.
(c) Mode – the value that occurs most frequently in a distribution. It can
be employed with all types of variables.
iv) Measures of dispersion
(a) Range – the difference between the maximum and minimum values in a
distribution of values associated with an interval/ratio variable. It is
affected by the presence of outliers.
(b) Standard deviation – the average amount of variation around the
mean. It is also affected by outliers, but their effect is offset by the
division by the number of values in the distribution.
(c) Box-plot – provides an indication of both central tendency and dispersion
in a single diagram. It also indicates the outliers in the distribution.
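As promised under the frequency-table method above, here is a minimal sketch in Python with pandas of grouping an interval/ratio variable into non-overlapping categories and producing counts and percentages (the ages and bin edges are illustrative):

```python
import pandas as pd

ages = pd.Series([23, 31, 36, 44, 45, 52, 58, 61, 67, 70])  # illustrative ages

# Group the interval/ratio variable into non-overlapping categories,
# then count the cases and percentage in each category.
grouped = pd.cut(ages, bins=[20, 35, 50, 65, 80])
counts = grouped.value_counts(sort=False)
print(pd.DataFrame({"n": counts, "%": counts / counts.sum() * 100}))
```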

2) Bivariate analysis
It is concerned with the analysis of two variables at a time in order to uncover whether or not
the two variables are related. Exploring the relationships between variables means searching
for evidence that the variation in one variable coincides with variation in another variable.
All the methods mentioned here analyze relationships between variables, not causality.
Some of the methods of analysis are as follows:
i) Contingency tables – A contingency table is similar to a frequency table, but it allows
two variables to be analyzed simultaneously so that the relationship between them
can be examined.
ii) Pearson's r – This is used for pairs of interval/ratio variables. Some of the key features are:
(a) The coefficient will almost certainly lie between 0 and 1 in absolute value –
this indicates the strength of the relationship
(b) The closer the coefficient is to 1, the stronger the relationship, and vice
versa
(c) The coefficient will be either positive or negative – this indicates the
direction of the relationship
iii) Spearman's rho – This is similar to Pearson's r but is used for pairs of ordinal
variables.
iv) Phi and Cramer's V – Phi is used for the analysis of the relationship between two
dichotomous variables, while Cramer's V is used with nominal variables.
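Both measures can be derived from the chi-square statistic; a sketch with a hypothetical 2x2 table (note this computation yields the magnitude of Phi, while its sign depends on how the categories are arranged):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table for two dichotomous variables.
observed = np.array([[25, 15],
                     [10, 30]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
n = observed.sum()
phi = np.sqrt(chi2 / n)              # Phi for a 2x2 table
k = min(observed.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))  # Cramer's V generalises Phi to larger tables
print(phi, cramers_v)                # identical for a 2x2 table
```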
3) Multivariate analysis
Multivariate analysis entails the simultaneous analysis of three or more variables. It
basically answers three questions: could the relationship be spurious? Could there be an
intervening variable? Could a third variable moderate the relationship? It uses a mix of
the methods mentioned previously.

One main criterion to take care of when analyzing data using the various methods is
statistical significance. Statistical significance allows the analyst to estimate how confident
he or she can be that the results derived from a study based on a randomly selected sample
are generalizable to the population from which the sample was taken. Hence proper sampling
is essential; lack of proper sampling undermines the whole purpose of the analysis. The
significance level is used to decide whether a result is statistically significant, and the
chi-square test is one of the tests used for this purpose.
CHAPTER 14: Using IBM SPSS Statistics
Key Points
o SPSS can be used to implement the techniques learned in quantitative data analysis,
but learning new software requires perseverance, and at times the results obtained may
not seem to be worth the learning process.
o But it is worth it – it would take you far longer to perform the calculations by hand on
a sample of around 100 than to learn the software.
o If you find yourself moving into much more advanced techniques, the time saved is
even more substantial, particularly with large samples.
o It is better to become familiar with SPSS before you begin designing your research
instruments, so that you are aware at an early stage of difficulties you might have in
presenting your data in SPSS.

INTRODUCTION: WHAT IS SPSS?


➢ Originally an acronym for Statistical Package for the Social Sciences, SPSS now
stands for Statistical Product and Service Solutions
➢ It is one of the most popular statistical packages and can perform highly complex data
manipulation and analysis with simple instructions

PRE-REQUISITES
➢ Variables
➢ Data
➢ Measurement scales
➢ Code book
➢ Steps involved in hypothesis testing

I. VARIABLES
• A concept which can take on different quantitative values is called a variable.
• Ex. What variables would you consider when buying a second-hand bike?
Brand
Type
Age
Condition (Excellent, good, poor)
Price
• Dichotomous Variables are variables having two values only.
Yes or No
Male or Female
• Income, age and test scores are examples of Continuous Variables.
• These variables may take on any value within a given range, or in some cases
an infinite set.
• TYPES OF VARIABLES
Independent Variables
Dependent Variables
Moderating Variables
Extraneous Variables

II. MEASUREMENT SCALES

• The process of assigning numbers to objects in such a way that specific


properties of the objects are faithfully represented by specific properties of the
numbers.
• Types of Scales
Nominal
Ordinal
Scale
▪ Interval
▪ Ratio
• NOMINAL SCALE
Nominal or categorical data comprise categories that cannot be rank
ordered – each category is just different
Example: what is your gender? (Please tick)
▪ Male
▪ Female
• ORDINAL SCALE
Ordinal data comprise categories that can be rank ordered.
Example: How satisfied are you with the level of service you have
received? (Please Tick)
▪ Very satisfied
▪ Somewhat satisfied
▪ Neutral
▪ Somewhat dissatisfied
▪ Very dissatisfied
• INTERVAL SCALE
Interval data are measured on a continuous scale and have no true zero point.
Examples:
▪ Time of day – moves along a continuous measure of seconds, minutes
and so on, without a true zero point.
▪ Temperature (in Celsius or Fahrenheit) – moves along a continuous
measure of degrees, without a true zero.
• RATIO SCALE
Ratio data are measured on a continuous scale and do have a true zero
point.
Examples: age, weight, height.

III. CHOICES OF SCALES IN SPSS

• The default is Scale, which refers to an interval or ratio level of measurement.
• Choose Nominal for categorical data.
• Choose Ordinal if the data involve rankings or ordered values.
IV. PREPARING THE CODE BOOK

• Before you can enter the information from your questionnaire, interviews or
experiments into SPSS it is necessary to prepare a “code book”.
• This is a summary of the instructions you will use to convert the information
obtained from each subject or case into a format that SPSS can understand.

V. STARTING WITH SPSS

• Two Windows
Data window and variable window
Output window
• DATA EDITOR
A spreadsheet-like system for defining, entering, editing and displaying data.
The extension of the saved file will be '.sav'.

• VARIABLE VIEW WINDOW


This page contains information about the data set that is stored with the data sheet.
• OUTPUT WINDOW
Displays the results of the analyses you run.
VI. BASIC OPERATIONS IN SPSS

• Variable entry (adding or deleting a variable)


• Data entry (adding or deleting data)
• Saving the data
• Importing data from excel file
• Checking the data entered
• Sorting the data
• Transforming the data

VII. DATA ANALYSIS WITH SPSS

• FREQUENCIES
This analysis produces frequency tables showing frequency counts and
percentages of the values of individual variables.
• DESCRIPTIVES
This analysis shows the maximum, minimum, mean and standard deviation of
the variables.
• CORRELATION ANALYSIS
Correlation analysis is used to describe the strength and direction of the linear
relationship between two variables.
• RELIABILITY
The reliability of a scale indicates how free it is from random error.
Two frequently used indicators of a scale's reliability are test-retest
reliability (also referred to as temporal stability) and internal
consistency.
The test-retest reliability of a scale is assessed by administering it to
the same people on two different occasions and calculating the
correlation between the two scores obtained.
The second aspect of reliability that can be assessed is internal
consistency: the degree to which the items that make up the scale are
all measuring the same underlying attribute (i.e. the extent to which
the items 'hang together').
Internal consistency can be measured in a number of ways. The most
commonly used statistic is Cronbach's coefficient alpha (available in
SPSS). This statistic provides an indication of the average correlation
among all of the items that make up the scale. Values range from 0 to
1, with higher values indicating greater reliability.
While different levels of reliability are required depending on the
nature and purpose of the scale, Nunnally (1978) recommends a
minimum level of .7.
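Cronbach's alpha is computed in SPSS, but it is simple enough to sketch directly from its definition in Python; the rating matrix below is hypothetical:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's coefficient alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point ratings from six respondents on a four-item scale.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
])
print(cronbach_alpha(scores))  # values above .7 are commonly regarded as acceptable
```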
