Session 3 - Bivariate Data Analysis Tutorial Prac
For example:
Is government spending related to GDP growth?
How does an increase in education affect earnings?
If two variables are related, this means that you can use information
about one variable to predict the values of the other variable.
We will investigate these types of predictions further when we
begin the econometrics part of the course next week.
The univariate summary statistics we featured last week can be
used to describe each of the individual variables, but they will not
give any indication of the relationship between variables.
The Total in the end column of the table gives the total number of
people with each employment status, regardless of their gender.
In the table above, there are more females than males in each
employment status category.
However, since there are also more females than males in the
sample, it is difficult to tell whether females are more likely to be
economically inactive than males, for example.
The figures in the top of each row (right-aligned) are the frequencies,
identical to Table 1.
The next row contains row percentages (note that the values add to
100% at the end of the row).
Thus the figure of 38.2 means that 38.2% of all values in that row are
male,
i.e. 38.2% of all economically inactive people are male, or
55% of all employed people are female,
etc.
The next row contains column percentages (note that the values
add to 100% at the bottom of the column).
Thus the figure of 65.6 means that 65.6% of all values in that
column are economically inactive
i.e. 65.6% of all males are economically inactive, or
6.3% of all females are unemployed.
etc.
The final row contains cell percentages (note that the values add to
100% in the bottom right cell of the table).
Thus the figure of 26.3 means that 26.3% of all values in the sample
are economically inactive and male
i.e. 26.3% of the sample are economically inactive males, or
13.8% of the sample are employed females.
Etc.
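The three kinds of percentage described above can be sketched in a few lines of Python. This is only an illustration: the counts below are hypothetical and are not the figures from Table 1, and the names counts, genders and percentages are chosen for this sketch rather than taken from the course software.

```python
# Hypothetical counts (NOT the tutorial's Table 1): rows are employment
# status, columns are gender.
counts = {
    "Employed":   {"Male": 30, "Female": 50},
    "Unemployed": {"Male": 10, "Female": 10},
    "Inactive":   {"Male": 40, "Female": 60},
}
genders = ["Male", "Female"]

# Marginal totals: one per row, one per column, and the grand total.
row_totals = {status: sum(row.values()) for status, row in counts.items()}
col_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
grand_total = sum(row_totals.values())


def percentages(status, gender):
    """Return (row %, column %, cell %) for one cell of the table."""
    n = counts[status][gender]
    row_pct = 100 * n / row_totals[status]   # share of this status that is this gender
    col_pct = 100 * n / col_totals[gender]   # share of this gender with this status
    cell_pct = 100 * n / grand_total         # share of the whole sample
    return row_pct, col_pct, cell_pct


for status in counts:
    for g in genders:
        r, c, cell = percentages(status, g)
        print(f"{status:<10} {g:<6} n={counts[status][g]:>3} "
              f"row={r:5.1f}% col={c:5.1f}% cell={cell:5.1f}%")
```

Comparing the column percentages for the two genders within one employment status answers the "more likely" question, because each column percentage is taken relative to the size of its own gender group.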
Interpret the figure of 70.8 in one sentence.
Compare it to the figure of 65.6; what does it tell you?
Bivariate Analysis of Continuous Data
Introduction
When quantitative variables have numerous different values, we
cannot meaningfully cross-tabulate them. (Why not?)
Instead, we look at summary statistics, or at graphical displays, that
describe the extent of association or relationship between the
variables.
e.g., the univariate descriptive statistics below summarise education
attainment and monthly income for several individuals.
The summary statistics for the two variables are useful but do not
tell us much about the relationship between the two variables.
However, a scatterplot reveals a much clearer relationship between
them.
Scatterplots
A scatterplot, or X-Y graph, is a graphical method of depicting the
relationship between two variables; it is particularly useful for examining
what we believe to be a causal relationship.
The independent or explanatory variable, X, is represented on the
horizontal axis, while the dependent variable, Y, is represented on the
vertical axis.
A scatterplot may reveal three important features:
The direction of the relationship:
An upward-sloping line indicates a positive linear relationship [(a), (c)
and (e) below].
A downward-sloping line indicates a negative linear relationship [(b), (d)
and (f)].
A curved pattern indicates a non-linear relationship [(h)].
No pattern indicates that there is no apparent relationship [(g)].
The strength of the relationship:
The more closely the data points cluster along an imaginary line, the
stronger the relationship [compare (a) and (b) with (c) and (d), and these
again with (e) and (f)].
The presence of outliers:
Data points that are distant from the bulk of the data, or that lie far
away from the imaginary line showing the relationship between the
variables, may be outliers and should be investigated further.
The following table shows completed education (in years) and monthly
income (in thousands of Rands) for 8 individuals:
The positive sign on the covariance statistic indicates that there is a positive
relationship between education and income, i.e. on average, as education
increases, so too does income.
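As a rough illustration of how the sign of the covariance arises, the sketch below computes a sample covariance and a correlation coefficient by hand in Python. The eight data pairs are made up for this sketch and are not the values from the tutorial's table. The function names are hypothetical, and dividing by n - 1 (the sample covariance) is an assumption about the formula the course uses.

```python
from math import sqrt

# Made-up illustrative data for 8 individuals (NOT the tutorial's table).
education = [8, 10, 12, 12, 14, 15, 16, 17]   # completed years of education
income = [3, 4, 6, 5, 8, 9, 12, 13]           # monthly income, thousands of Rands


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sum of the products of deviations from the means, divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Covariance rescaled by both standard deviations; lies between -1 and 1."""
    s_x = sqrt(sample_covariance(x, x))   # the covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


cov_xy = sample_covariance(education, income)
r_xy = correlation(education, income)
print(f"covariance  = {cov_xy:.3f}")
print(f"correlation = {r_xy:.3f}")
```

Each product of deviations is positive when a person is above (or below) average on both variables at once, so a positive total signals that education and income tend to move together. The correlation coefficient removes the units, which makes the strength of the relationship easier to judge than the raw covariance.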
In our example,
Commands
For cross-tabulating two categorical variables, use the command tab
followed by the names of the variables.
For calculating correlation coefficients, use the command correlate
followed by the names of the variables.
For calculating covariances, use the command correlate followed by the
names of the variables, and the option covariance.
Scatterplots
Use the drop-down menu Graphics > Twoway graph: create a scatterplot,
enter the names of the variables, and add titles for the axes if desired; or
use the command scatter followed by the names of the variables.