Stat Reviewer 1
Statistics Review
Christopher Taber
Wisconsin
Outline
1. Random Variables
2. Distribution Functions
3. The Expectation of a Random Variable
4. Variance
5. Continuous Random Variables
6. Covariance and Correlation
7. Normal Random Variables
8. Conditional Expectations
Random variables
Let's forget about the details that arise in dealing with data for a while. Most objects that economists think about are random variables. Informally, a random variable is a numerical outcome or measurement with some element of chance about it. That is, it makes sense to think of it as possibly having had some value other than the one observed. Econometrics is a tool that allows us to learn about these random variables from the data at our disposal.
Examples of random variables:
- Gross Domestic Product
- Stock Prices
- Wages of Workers
- Years of Schooling Attained by Students
- Numeric Grade in a Class
- Number of Job Offers Received
- Demand for a new product at a given price
From one perspective, there are two types of random variables, discrete and continuous. A discrete random variable can only take on a finite number of values. Days worked last week is a nice example: it can only take on the 8 different values 0 to 7. By contrast, a continuous random variable takes on a continuum of values. Literally, this would mean there are an infinite number of values that the variable can take; but often a variable that can take on a very large number of values is treated as continuous because it is convenient. Most random variables we will think about are approximately continuous, but we will start with the characterization of discrete random variables because it is easier to follow.
Distribution Functions
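A discrete random variable X takes on a finite number of values $x_1, \ldots, x_J$. Its probability density function (pdf) gives the probability of each value, $f(x_j) = \Pr(X = x_j)$, and these probabilities sum to one:
$$\sum_{j=1}^{J} f(x_j) = 1$$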
So, for example, if we regard grades as a random variable and assign a 4 to A's, a 3 to B's, etc., we might have
$f(4) = 0.30$, $f(3) = 0.40$, $f(2) = 0.25$, $f(1) = 0.04$, $f(0) = 0.01$.
Summarizing this full distribution is very complicated if the variable takes on many values, so we need some way of characterizing it. The first question we might ask is: "Is this typically a big number or a small number?"
The Expectation of a Random Variable
The expected value of a discrete random variable X is
$$E(X) = \sum_{j=1}^{J} x_j f(x_j)$$
(The expectation is thought of as a typical value or measure of central tendency, though it has shortcomings for each of these purposes.)
If $a$ and $b$ are nonstochastic constants, $E(aX) = aE(X)$ and $E(aX + b) = aE(X) + b$. However, in general $E(g(X)) \neq g(E(X))$; the two coincide when $g$ is linear, as above.
An example: Grades
Using the distribution from the example above, the expected grade is:
$$E(G) = 0.3 \times 4 + 0.4 \times 3 + 0.25 \times 2 + 0.04 \times 1 + 0.01 \times 0 = 2.94$$
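As a quick check of the arithmetic, here is a minimal Python sketch that stores the grade distribution as a dictionary of exact fractions, confirms the probabilities sum to one, and computes E(G). The helper name `expectation` and the constants a = 2, b = 1 are illustrative choices, not part of the original; the last lines show that E(aG + b) = aE(G) + b holds exactly, while E(G²) differs from E(G)².

```python
from fractions import Fraction

# Grade distribution from the example: f(g) = Pr(G = g)
grade_pmf = {
    4: Fraction("0.30"),
    3: Fraction("0.40"),
    2: Fraction("0.25"),
    1: Fraction("0.04"),
    0: Fraction("0.01"),
}


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] = sum_j g(x_j) f(x_j) for a discrete pmf."""
    return sum(g(x) * p for x, p in pmf.items())


# The probabilities must sum to one
assert sum(grade_pmf.values()) == 1

mean_grade = expectation(grade_pmf)
print(float(mean_grade))  # 2.94

# Linearity: E(aG + b) = aE(G) + b (a = 2, b = 1 are arbitrary)
a, b = 2, 1
assert expectation(grade_pmf, lambda x: a * x + b) == a * mean_grade + b

# A nonlinear g breaks this: E(G^2) is not E(G)^2
print(float(expectation(grade_pmf, lambda x: x**2)))  # 9.44
print(float(mean_grade**2))  # 8.6436
```

Exact fractions are used so the equality checks are not disturbed by floating-point rounding.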
One interpretation of an expectation is the value of a bet: on average, I would break even if the expected value of the bet were zero.
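Suppose you meet a guy on the street who charges you $3.50 to play a game. He rolls a die and pays you the number rolled in dollars, i.e., if he rolls a 1 you get $1.00, etc. Is this a good bet? Letting Y be the number rolled, the expected payoff from the bet is
$$E(Y - 3.5) = \frac{1}{6}(1) + \frac{1}{6}(2) + \frac{1}{6}(3) + \frac{1}{6}(4) + \frac{1}{6}(5) + \frac{1}{6}(6) - 3.5 = 0$$
so the bet is fair: on average you break even.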
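The die bet can be checked the same way. This sketch assumes a fair die; the function name `expected_net_payoff` and the alternative price of $4.00 are hypothetical, added only to show how the answer changes with the price.

```python
from fractions import Fraction

# Assumes a fair die: each face 1..6 has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expected_net_payoff(price, pmf=die_pmf):
    """E(Y - price), where Y is the number rolled (paid out in dollars)."""
    return sum((y - price) * p for y, p in pmf.items())


print(expected_net_payoff(Fraction("3.50")))  # 0 (a fair bet)
print(expected_net_payoff(Fraction("4.00")))  # -1/2 (lose 50 cents per play on average)
```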
Example 3: Occupation
Suppose you are choosing between being a doctor or a lawyer. You may choose to go into the profession where the expected earnings are highest.
You should have learned that we estimate the expected value with the sample mean. That is, suppose we want to estimate $E(X)$ from a sample of data $X_1, X_2, \ldots, X_N$. We estimate it using
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
Example
Suppose the data are (as in Wooldridge Example C.1):

City   Unemployment Rate
1      5.1
2      6.4
3      9.2
4      4.1
5      7.5
6      8.3
7      2.6
8      3.5
9      5.8
10     7.5
In this case
$$\bar{X} = \frac{5.1 + 6.4 + 9.2 + 4.1 + 7.5 + 8.3 + 2.6 + 3.5 + 5.8 + 7.5}{10} = 6.0$$
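A short computation reproduces the sample mean. This sketch uses plain Python with the ten rates from the table; because decimal values are stored as binary floats, the sum is only approximately 60, so the result is rounded for display.

```python
# Unemployment rates for the ten cities in the table above
rates = [5.1, 6.4, 9.2, 4.1, 7.5, 8.3, 2.6, 3.5, 5.8, 7.5]


def sample_mean(xs):
    """X-bar = (1/N) * sum of X_i."""
    return sum(xs) / len(xs)


x_bar = sample_mean(rates)
print(round(x_bar, 6))  # 6.0
```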
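A useful property of the sample mean is that deviations from it sum to zero. For any constant $a$:
$$\sum_{i=1}^{N} a(X_i - \bar{X}) = \sum_{i=1}^{N} aX_i - \sum_{i=1}^{N} a\bar{X} = a\sum_{i=1}^{N} X_i - Na\bar{X} = aN\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) - Na\bar{X} = Na\bar{X} - Na\bar{X} = 0$$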
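This implies that the sample variance can be written in two equivalent ways:
$$\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})[X_i - \bar{X}] = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})X_i - \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})\bar{X} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})X_i$$
where the last step holds because deviations from the sample mean sum to zero (take $a = \bar{X}$).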
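Both identities can be checked numerically on the unemployment data. This sketch assumes nothing beyond the ten rates in the table; `statistics.variance` from Python's standard library uses the same N - 1 divisor, so it should agree with both forms.

```python
import statistics

rates = [5.1, 6.4, 9.2, 4.1, 7.5, 8.3, 2.6, 3.5, 5.8, 7.5]
n = len(rates)
x_bar = sum(rates) / n

# Deviations from the sample mean sum to zero (up to floating-point error)
dev_sum = sum(x - x_bar for x in rates)
print(abs(dev_sum) < 1e-9)  # True

# The two equivalent forms of the sample variance
s2_squared = sum((x - x_bar) ** 2 for x in rates) / (n - 1)
s2_cross = sum((x - x_bar) * x for x in rates) / (n - 1)
print(round(s2_squared, 4), round(s2_cross, 4))  # 4.7178 4.7178
print(round(statistics.variance(rates), 4))      # 4.7178
```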
Variance
As a first approximation, just as the mean is a measure of central tendency, the variance is a measure of dispersion. In deciding whether to make a bet, which occupation to pursue, or which stocks to buy, we might care not only about the expected payoff but also about its variability. Variability captures some notion of risk. The variance of a random variable is given by:
$$\text{Var}(X) = E(X - \mu_X)^2 = \sum_{j=1}^{J} (x_j - \mu_X)^2 f(x_j)$$
where $\mu_X = E(X)$.
$$\text{Var}(aX + b) = E\big(aX + b - (a\mu_X + b)\big)^2 = E(aX - a\mu_X)^2 = a^2 E(X - \mu_X)^2 = a^2\,\text{Var}(X)$$
The standard deviation is the square root of the variance.
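This sketch checks Var(aX + b) = a²Var(X) exactly on the grade distribution, using fractions to avoid rounding. The helpers `mean`, `variance`, and `transform` and the constants a = 2, b = 1 are illustrative, not from the original.

```python
from fractions import Fraction

grade_pmf = {4: Fraction("0.30"), 3: Fraction("0.40"), 2: Fraction("0.25"),
             1: Fraction("0.04"), 0: Fraction("0.01")}


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = sum_j (x_j - mu_X)^2 f(x_j)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def transform(pmf, a, b):
    """Distribution of aX + b (requires a != 0 so the values stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}


var_g = variance(grade_pmf)
a, b = 2, 1  # arbitrary constants
assert variance(transform(grade_pmf, a, b)) == a**2 * var_g

print(float(var_g))         # 0.7964
print(float(var_g) ** 0.5)  # standard deviation, about 0.892
```

Note that the shift b drops out entirely: adding a constant moves the distribution but does not spread it out.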
Interpretation of Variance
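The higher the variance, the less confident you are about whether the outcome will be near the mean (or expectation). Suppose you bet on a coin toss.

Case 1, you bet $1.00:
$$X = \begin{cases} \$1.00 & \text{heads} \\ -\$1.00 & \text{tails} \end{cases}$$
so $E(X) = 0$ and $\text{Var}(X) = 1$.

Case 2, you bet $10,000:
$$X = \begin{cases} \$10{,}000 & \text{heads} \\ -\$10{,}000 & \text{tails} \end{cases}$$
$$E(X) = \frac{1}{2}(10{,}000) + \frac{1}{2}(-10{,}000) = 0$$
$$\text{Var}(X) = \frac{1}{2}(10{,}000 - 0)^2 + \frac{1}{2}(-10{,}000 - 0)^2 = 100{,}000{,}000$$
Both bets have the same expected value but very different variances. In considering a bet, an investment in a risky project, or a life choice (education, occupation, marriage, etc.), the variance of the payoff is likely to be relevant.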
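The two coin bets can be summarized with the same mean and variance formulas. This sketch assumes a fair coin (probability 1/2 on each side) and represents each bet as a list of (payoff, probability) pairs; the helper name `mean_var` is illustrative.

```python
def mean_var(outcomes):
    """Mean and variance of a discrete bet given (payoff, probability) pairs."""
    mu = sum(x * p for x, p in outcomes)
    var = sum((x - mu) ** 2 * p for x, p in outcomes)
    return mu, var


case1 = [(1.00, 0.5), (-1.00, 0.5)]      # bet $1.00 on a fair coin
case2 = [(10_000, 0.5), (-10_000, 0.5)]  # bet $10,000 on a fair coin

print(mean_var(case1))  # (0.0, 1.0)
print(mean_var(case2))  # (0.0, 100000000.0)
```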
Continuous Random Variables
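For a continuous random variable, probabilities are areas under the density $f(x)$:
$$\Pr(a \le X \le b) = \int_a^b f(x)\,dx$$
In particular, for a small interval of width $\Delta$, $\Pr(x \le X \le x + \Delta) \approx f(x)\Delta$.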
The particular value of the density is not interesting in its own right. It is only interesting if you integrate over it.
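Analogous to the properties of the pdf in the discrete case, we have:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \text{Var}(X) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx,$$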
and the properties of expectations that we discussed for the discrete case carry over.
Another representation of the distribution of a random variable is the cumulative distribution function (cdf), usually denoted $F(x)$ and defined to be the probability that X falls at or below the value x:
$$F(x) = \Pr(X \le x)$$
For continuous random variables that can take on any value (i.e., $f(x) > 0$ for any $x$ between $-\infty$ and $+\infty$), this can be written as:
$$F(a) = \int_{-\infty}^{a} f(x)\,dx$$
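To see the continuous formulas in action, this sketch uses a hypothetical density, f(x) = 2x on [0, 1] and zero elsewhere, chosen because its answers are easy to verify by hand: the total area is 1, E(X) = 2/3, Var(X) = 1/18, and F(a) = a² on [0, 1]. The integrals are approximated with a simple midpoint rule rather than computed symbolically. Because this density is zero below 0, the cdf integral can start at 0 instead of minus infinity.

```python
def density(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * width) for k in range(n)) * width


total = integrate(density, 0, 1)                                # exact: 1
mean = integrate(lambda x: x * density(x), 0, 1)                # exact: 2/3
var = integrate(lambda x: (x - mean) ** 2 * density(x), 0, 1)   # exact: 1/18


def cdf(a):
    """F(a) = Pr(X <= a); the density is zero outside [0, 1]."""
    return integrate(density, 0, min(max(a, 0), 1))


print(round(total, 6), round(mean, 6), round(var, 6))
print(round(cdf(0.5), 6))  # exact value is 0.25
```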
For two random variables X and Y we can define their joint distribution. For discrete random variables,
$$f(x, y) = \Pr(X = x, Y = y)$$
For continuous random variables:
$$\Pr(a_1 \le X \le a_2,\ b_1 \le Y \le b_2) = \int_{a_1}^{a_2}\int_{b_1}^{b_2} f(x, y)\,dy\,dx$$
Covariance and Correlation
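The covariance between X and Y is $\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$, where $\mu_X = E(X)$ and $\mu_Y = E(Y)$. Properties of covariance:
- $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$
- $\text{Cov}(X, X) = \text{Var}(X)$
- $\text{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2\,\text{Cov}(X, Y)$

This last property means that if we change the units of measurement of X and/or Y, the covariance changes.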
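The covariance properties can be verified exactly on a small joint distribution. The joint pmf below is made up for illustration; any valid pmf would do. Functions of (X, Y) are passed around as Python callables so the same hypothetical `cov` helper handles X, Y, X + Y, and rescaled versions.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y) = Pr(X = x, Y = y); probabilities sum to 1
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(2, 10), (1, 1): F(4, 10)}


def E(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def cov(g, h):
    """Cov(g, h) = E[(g - E g)(h - E h)] for functions g, h of (X, Y)."""
    mg, mh = E(g), E(h)
    return E(lambda x, y: (g(x, y) - mg) * (h(x, y) - mh))


def var(g):
    return cov(g, g)  # Cov(X, X) = Var(X)


def X(x, y):
    return x


def Y(x, y):
    return y


# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
assert var(lambda x, y: x + y) == var(X) + var(Y) + 2 * cov(X, Y)

# Cov(a1 X + b1, a2 Y + b2) = a1 a2 Cov(X, Y); a1 = 100 mimics a change of units
a1, b1, a2, b2 = 100, 5, 3, -2
assert cov(lambda x, y: a1 * x + b1, lambda x, y: a2 * y + b2) == a1 * a2 * cov(X, Y)

print(cov(X, Y))  # 1/10
```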
A measure of association that is unit-free is the correlation coefficient:
$$\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}$$
It turns out that $\rho$ is always between $-1$ and $1$. When X and Y are independent, so there is no relation between them, $\rho$ is zero; if $\rho > 0$, X and Y tend to go up and down together, whereas if $\rho < 0$, when X goes up, Y tends to go down and vice versa.
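This sketch illustrates the unit-free property with a small hypothetical dataset (the schooling and wage numbers are invented): rescaling wages from dollars to cents multiplies the sample covariance by 100 but leaves the sample correlation unchanged. The helpers `sample_cov` and `corr` are illustrative names.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Sample correlation: Cov(X, Y) / (sqrt(Var X) * sqrt(Var Y))."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


# Hypothetical data: years of schooling and hourly wage in dollars
school = [10, 12, 12, 14, 16, 16, 18]
wage = [8.0, 9.5, 11.0, 12.0, 15.5, 14.0, 20.0]
wage_cents = [100 * w for w in wage]  # same wages measured in cents

print(sample_cov(school, wage_cents) / sample_cov(school, wage))  # 100, up to rounding
print(corr(school, wage), corr(school, wage_cents))               # equal
```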
Examples of Correlation
Positively correlated random variables:
- Years of school, earnings
- Husband's wage, wife's wage
- Stock price, profit of firm
- GDP, country size

Negatively correlated random variables:
- GDP growth, unemployment
- Husband's income, wife's hours worked

Random variables with zero or near-zero correlation:
- Number on first die, number on second die
- Michelle Obama's temperature, number of questions asked in this class today
- Stock gain today, stock gain tomorrow
Case 1: X → Y (or Y → X)
- Money supply → Inflation
- Increase in minimum wage → Increase in wages
- Retirement → Decline in income

Case 2: X → Y and Y → X

Case 3: Z → X and Z → Y
Normal Random Variables
Suppose $Y = a_1 X_1 + a_2 X_2 + c$,
where $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and the covariance between $X_1$ and $X_2$ is $\sigma_{12}$.
Because Y is a linear combination of (jointly) normal random variables, it is itself normal, so if we know its mean and variance, then we know its distribution. Using the rules for expectations to calculate the mean:
$$E(Y) = E(a_1 X_1 + a_2 X_2 + c) = E(a_1 X_1) + E(a_2 X_2 + c) = a_1 E(X_1) + a_2 E(X_2) + c = a_1\mu_1 + a_2\mu_2 + c$$
Using the rules for variances:
$$\text{Var}(Y) = \text{Var}(a_1 X_1 + a_2 X_2 + c) = \text{Var}(a_1 X_1 + a_2 X_2) = a_1^2\,\text{Var}(X_1) + a_2^2\,\text{Var}(X_2) + 2a_1 a_2\,\text{Cov}(X_1, X_2)$$
Therefore $Y \sim N(\mu_Y, \sigma_Y^2)$, where
$$\mu_Y = a_1\mu_1 + a_2\mu_2 + c$$
$$\sigma_Y^2 = a_1^2\,\text{Var}(X_1) + a_2^2\,\text{Var}(X_2) + 2a_1 a_2\,\text{Cov}(X_1, X_2) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1 a_2\sigma_{12}$$
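The formulas for the mean and variance of Y can be checked by simulation. The parameter values below are arbitrary, and the sketch constructs jointly normal (X1, X2) with covariance σ12 = ρσ1σ2 from two independent standard normals; Python's `statistics.NormalDist` then represents the implied distribution of Y. The helper `linear_combo_moments` is an illustrative name.

```python
import math
import random
from statistics import NormalDist, fmean


def linear_combo_moments(a1, a2, c, mu1, mu2, var1, var2, cov12):
    """Mean and variance of Y = a1*X1 + a2*X2 + c."""
    mean_y = a1 * mu1 + a2 * mu2 + c
    var_y = a1**2 * var1 + a2**2 * var2 + 2 * a1 * a2 * cov12
    return mean_y, var_y


# Illustrative (hypothetical) parameter values
a1, a2, c = 2.0, -1.0, 3.0
mu1, mu2 = 1.0, 4.0
sd1, sd2, rho = 1.5, 2.0, 0.3
cov12 = rho * sd1 * sd2

mean_y, var_y = linear_combo_moments(a1, a2, c, mu1, mu2, sd1**2, sd2**2, cov12)
y_dist = NormalDist(mean_y, math.sqrt(var_y))  # Y ~ N(mu_Y, sigma_Y^2)

# Monte Carlo check: jointly normal (X1, X2) built from independent N(0, 1) draws
rng = random.Random(0)
draws = []
for _ in range(100_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = mu1 + sd1 * z1
    x2 = mu2 + sd2 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    draws.append(a1 * x1 + a2 * x2 + c)

sim_mean = fmean(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)

print(mean_y, var_y)       # theoretical moments
print(sim_mean, sim_var)   # simulated moments, close to the line above
print(y_dist.cdf(mean_y))  # 0.5: half the mass lies below the mean
```

Note the negative coefficient a2 = -1: with positive covariance between X1 and X2, the cross term 2a1a2σ12 reduces the variance of Y.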
Conditional Expectations
Often the goal of empirical research in economics is to uncover conditional expectations. Formally, I could derive the conditional probability density function and derive the conditional expectation from that; if you are interested, you can find this in Appendix B of Wooldridge. Instead, I want to think of a conditional expectation in a looser and more informal way. The question we often care about is: if I could gather everyone in the world for whom X takes some particular value, what would be the expected value of Y?
We define the conditional expectation $E(Y \mid X)$ to mean: if I condition X to be some value, what is the expected value of Y?
- In almost all interesting cases Y is a random variable, so after choosing X we don't know exactly what Y will be.
- $E(Y \mid X)$ depends on X, so changing X will change the expected value of Y.
- Very often in economics we care about conditional expectations.
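For a discrete joint distribution, the formal definition reduces to a weighted average: E(Y | X = x) = Σ_y y f(x, y) / f_X(x), where f_X(x) = Σ_y f(x, y). This sketch applies that formula to a made-up joint pmf; the function name `conditional_expectation` is illustrative. The output shows the conditional mean of Y changing with X.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y) = Pr(X = x, Y = y)
joint = {(0, 1): F(1, 8), (0, 3): F(3, 8), (1, 2): F(1, 4), (1, 6): F(1, 4)}


def conditional_expectation(joint):
    """Return {x: E(Y | X = x)} using sum_y y f(x, y) / f_X(x)."""
    marginal_x = defaultdict(F)  # f_X(x) = sum_y f(x, y)
    weighted_y = defaultdict(F)  # sum_y y * f(x, y)
    for (x, y), p in joint.items():
        marginal_x[x] += p
        weighted_y[x] += y * p
    return {x: weighted_y[x] / marginal_x[x] for x in marginal_x}


cond_mean = conditional_expectation(joint)
print({x: str(m) for x, m in cond_mean.items()})  # {0: '5/2', 1: '4'}
```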
Examples
Let's consider some examples. Note that all of this fits in the descriptive type of analysis we consider; we are not saying anything about causation.
I do this using the file CPS78_85 from the textbook website, looking only at the year 1985. In this year the values were:

Average wages, 1985
Men     $9.99
Women   $7.88

[Figure: the data]
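In practice, a conditional mean like average wages by sex is estimated by the sample mean within each group. The records below are invented for illustration and are not the CPS78_85 data; with the actual file, one would first keep only the 1985 observations and then apply the same grouping. The helper `group_means` is a hypothetical name.

```python
from collections import defaultdict

# Made-up records (sex, hourly wage in dollars); NOT the CPS78_85 data
records = [("male", 12.0), ("female", 8.5), ("male", 9.0),
           ("female", 7.0), ("male", 11.5), ("female", 9.5)]


def group_means(rows):
    """Estimate E(wage | group) by the sample mean wage within each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, wage in rows:
        totals[group] += wage
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


means = group_means(records)
print({g: round(m, 2) for g, m in means.items()})  # {'male': 10.83, 'female': 8.33}
```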
How do I estimate this? Clearly I can't just condition on every level of expenditure and take the mean; we need a model to help us think about this.