
Stat Reviewer 1

This document provides an outline and overview of key concepts in statistics, including:
1. Random variables and their distribution functions
2. How to calculate the expectation (mean) of a random variable and interpret it as the value of a bet
3. How variance measures the dispersion of a random variable and can be used to assess risk
4. The distinction between discrete and continuous random variables and how their probabilities are defined
Copyright
© Attribution Non-Commercial (BY-NC)

Statistics Review

Christopher Taber
Wisconsin

Spring Semester, 2011


Outline
1. Random Variables
2. Distribution Functions
3. The Expectation of a Random Variable
4. Variance
5. Continuous Random Variables
6. Covariance and Correlation
7. Normal Random Variables
8. Conditional Expectations


Random variables
Let's forget for a while about the details that arise in dealing with data. Most objects that economists think about are random variables. Informally, a random variable is a numerical outcome or measurement with some element of chance about it; that is, it makes sense to think of it as possibly having taken some value other than the one observed. Econometrics is a tool that allows us to learn about these random variables from the data at our disposal.


Examples of random variables:
- Gross Domestic Product
- Stock prices
- Wages of workers
- Years of schooling attained by students
- Numeric grade in a class
- Number of job offers received
- Demand for a new product at a given price


From one perspective, there are two types of random variables: discrete and continuous. A discrete random variable can only take on a finite number of values. Days worked last week is a nice example: it can only take on the 8 different values 0 to 7. By contrast, a continuous random variable takes on a continuum of values. Literally, this would mean there are an infinite number of values that the variable can take; but often a variable that can take on a very large number of values is treated as continuous because it is convenient. Most random variables we will think about are approximately continuous, but we will start by considering the characterization of discrete random variables because it is easier to follow.

Distribution Functions

Probability Density Functions


Suppose that X is a random variable that takes on J possible values \(x_1, x_2, \ldots, x_J\). The probability density function (pdf) \(f(\cdot)\) of X is defined as
\[ f(x_j) = \Pr(X = x_j). \]
Some conventions: capital letters denote the random variable and lower-case letters denote realizations or possible values; a pdf is written with a lower-case letter (often f).
It follows that if X can only take on the values \(x_1, x_2, \ldots, x_J\), then
\[ \sum_{j=1}^{J} f(x_j) = 1. \]


So, for example, if we regard grades as a random variable and assign 4 to an A, 3 to a B, and so on, we might have
\[ f(4) = 0.30,\quad f(3) = 0.40,\quad f(2) = 0.25,\quad f(1) = 0.04,\quad f(0) = 0.01. \]
Summarizing the full distribution is very complicated if the variable takes on many values, so we need some way of characterizing it. The first question we might ask is: "Is this typically a big number or a small number?"
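To make the definition concrete, the grade distribution can be stored as a mapping from each value \(x_j\) to \(f(x_j)\), and the two requirements of a discrete pdf (nonnegative probabilities that sum to one) can be checked directly. This is a minimal sketch; the names `grade_pmf` and `is_valid_pmf` are illustrative and not from the source.

```python
from math import isclose

# Grade distribution from the example: value x_j -> f(x_j) = Pr(G = x_j)
grade_pmf = {4: 0.30, 3: 0.40, 2: 0.25, 1: 0.04, 0: 0.01}


def is_valid_pmf(pmf):
    """True if every probability is nonnegative and they sum to one
    (up to floating-point rounding)."""
    return all(p >= 0 for p in pmf.values()) and isclose(sum(pmf.values()), 1.0)


print(is_valid_pmf(grade_pmf))          # True
print(is_valid_pmf({1: 0.5, 2: 0.6}))   # False: the probabilities sum to 1.1
```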

The Expectation of a Random Variable

The expectation is also called the mean or the average:
\[ E(X) = \sum_{j=1}^{J} x_j f(x_j). \]

(The expectation is thought of as a typical value or measure of central tendency, though it has shortcomings for each of these purposes.)


Some properties of expected values:
1. If b is nonstochastic (not random), \(E(b) = b\).
2. If X and Y are two random variables, \(E(X + Y) = E(X) + E(Y)\).
3. If a is nonstochastic, \(E(aX) = aE(X)\). Also, \(E(aX + b) = aE(X) + b\).
4. However, in general \(E(g(X)) \neq g(E(X))\).
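The last two properties can be checked with exact arithmetic. The sketch below assumes a fair six-sided die (each face has probability 1/6) and takes \(g(x) = x^2\); the helper `expectation` is a hypothetical name that simply applies \(E[g(X)] = \sum_j g(x_j) f(x_j)\).

```python
from fractions import Fraction

# A fair six-sided die: each face 1..6 has probability 1/6 (exact fractions)
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over j of g(x_j) * f(x_j) for a discrete pmf."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(die_pmf)                            # E(X) = 7/2
linear = expectation(die_pmf, lambda x: 2 * x + 3)     # E(2X + 3)
mean_of_square = expectation(die_pmf, lambda x: x**2)  # E(X^2)
square_of_mean = mean**2                               # (E(X))^2

print(linear == 2 * mean + 3)          # True: E(aX + b) = aE(X) + b
print(mean_of_square, square_of_mean)  # 91/6 49/4, so E(g(X)) != g(E(X))
```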


An example: Grades

Using the distribution from the example above, the expected grade is
\[ E(G) = 0.30 \cdot 4 + 0.40 \cdot 3 + 0.25 \cdot 2 + 0.04 \cdot 1 + 0.01 \cdot 0 = 2.94. \]
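The same sum can be computed directly from the grade distribution; this short sketch just evaluates \(\sum_j x_j f(x_j)\) over the five grade values.

```python
# Grade distribution from the example: value -> probability
grade_pmf = {4: 0.30, 3: 0.40, 2: 0.25, 1: 0.04, 0: 0.01}

# E(G) = sum over values of x * f(x)
expected_grade = sum(x * p for x, p in grade_pmf.items())
print(round(expected_grade, 2))  # 2.94
```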


Interpretation of Expectation as Bet

One interpretation of an expectation is as the value of a bet: I would break even, on average, if the expected value of the bet were zero.


Example 1: Coin Flip

I get $1.00 if heads and -$1.00 if tails. The expected payoff is \(0.5 \cdot 1 + 0.5 \cdot (-1) = 0\).


Example 2: Die Roll

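Suppose you meet a guy on the street who charges you $3.50 to play a game. He rolls a die and gives you that amount in dollars; i.e., if he rolls a 1 you get $1.00, and so on. Is this a good bet? With Y the number rolled, the expected payoff from the bet is
\[ E(Y - 3.5) = \tfrac{1}{6}\cdot 1 + \tfrac{1}{6}\cdot 2 + \tfrac{1}{6}\cdot 3 + \tfrac{1}{6}\cdot 4 + \tfrac{1}{6}\cdot 5 + \tfrac{1}{6}\cdot 6 - 3.5 = 0, \]
so on average you neither gain nor lose.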
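Exact fractions make it easy to confirm that the die game is a fair bet. This sketch assumes a fair die, as the example implicitly does.

```python
from fractions import Fraction

price = Fraction(350, 100)  # cost to play: $3.50
# Payoff Y is the face rolled; each face has probability 1/6
expected_winnings = sum(Fraction(1, 6) * face for face in range(1, 7))
expected_net_payoff = expected_winnings - price

print(expected_winnings)    # 7/2, i.e. $3.50
print(expected_net_payoff)  # 0
```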


Example 3: Occupation

Suppose you are choosing between being a doctor or a lawyer. You may choose to go into the profession where the expected earnings are highest.


Outliers and the Expectation


Outliers are very influential in expected values. Suppose you are a high school basketball player with a remote chance of becoming LeBron James. Your distribution of income might look like
\[ Y = \begin{cases} \$20{,}000 & \text{with probability } 1/3 \\ \$30{,}000 & \text{with probability } 199/300 \\ \$50{,}000{,}000 & \text{with probability } 1/300 \end{cases} \]
Then \(E(Y) = \tfrac{1}{3}\cdot \$20{,}000 + \tfrac{199}{300}\cdot \$30{,}000 + \tfrac{1}{300}\cdot \$50{,}000{,}000 \approx \$193{,}233\). One point made by this example is that the mean or expected value is not always a good representation of the typical value or central tendency. The median (the value at or above which half the realizations fall) is $30,000 in this example and is arguably more typical or central.
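Computing the expectation and the median exactly from the listed probabilities is a useful check on the arithmetic. With these probabilities the mean comes to about $193,233 while the median is $30,000. The sketch below uses one common convention for the median (the smallest value m with \(\Pr(Y \le m) \ge 1/2\)); the function names are illustrative.

```python
from fractions import Fraction

# Income distribution from the example: (probability, income in dollars)
income_dist = [
    (Fraction(1, 3), 20_000),
    (Fraction(199, 300), 30_000),
    (Fraction(1, 300), 50_000_000),
]

# The probabilities must sum to one
assert sum(p for p, _ in income_dist) == 1

# E(Y) = sum of probability * income
mean_income = sum(p * y for p, y in income_dist)


def median(dist):
    """Smallest value m with Pr(Y <= m) >= 1/2."""
    cumulative = Fraction(0)
    for p, y in sorted(dist, key=lambda pair: pair[1]):
        cumulative += p
        if cumulative >= Fraction(1, 2):
            return y
    raise ValueError("probabilities do not sum to one")


print(round(float(mean_income), 2))  # 193233.33
print(median(income_dist))           # 30000
```

The single $50,000,000 outcome contributes about $166,667 of the mean on its own, which is the sense in which one outlier dominates the expectation but leaves the median untouched.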


Estimation of Expected Value

You should have learned that we estimate the expected value using the sample mean. That is, suppose we want to estimate E(X) from a sample of data \(X_1, X_2, \ldots, X_N\). We estimate it using
\[ \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i. \]


Example
Suppose the data are as in Wooldridge Example C.1:

City   Unemployment Rate
1      5.1
2      6.4
3      9.2
4      4.1
5      7.5
6      8.3
7      2.6
8      3.5
9      5.8
10     7.5


In this case
\[ \bar{X} = \frac{5.1 + 6.4 + 9.2 + 4.1 + 7.5 + 8.3 + 2.6 + 3.5 + 5.8 + 7.5}{10} = 6.0. \]
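The sample mean is a one-line computation; this sketch applies \(\bar{X} = \frac{1}{N}\sum_i X_i\) to the ten unemployment rates listed above and cross-checks against the standard library's `statistics.fmean`.

```python
import statistics

# Unemployment rates for the ten cities (Wooldridge Example C.1, as listed above)
unemployment = [5.1, 6.4, 9.2, 4.1, 7.5, 8.3, 2.6, 3.5, 5.8, 7.5]


def sample_mean(xs):
    """X-bar = (1/N) * (X_1 + ... + X_N)."""
    return sum(xs) / len(xs)


print(round(sample_mean(unemployment), 2))         # 6.0
print(round(statistics.fmean(unemployment), 2))    # 6.0
```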


A fact about sample means


There is a particular feature of sample means that we will use a lot in this course. Forget about random variables for now and just think about the algebra. Notice that for any a that does not vary with i,
\[ \sum_{i=1}^{N} a(X_i - \bar{X}) = \sum_{i=1}^{N} aX_i - Na\bar{X} = aN\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) - Na\bar{X} = Na\bar{X} - Na\bar{X} = 0. \]
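A quick numerical check of this fact, using the unemployment data above: for several constants a, the scaled deviations from \(\bar{X}\) sum to zero up to floating-point rounding, while deviations from some other center (here 5.0, chosen arbitrarily) do not. The helper name is illustrative.

```python
unemployment = [5.1, 6.4, 9.2, 4.1, 7.5, 8.3, 2.6, 3.5, 5.8, 7.5]
x_bar = sum(unemployment) / len(unemployment)


def scaled_deviation_sum(xs, a, center):
    """Sum over i of a * (X_i - center)."""
    return sum(a * (x - center) for x in xs)


for a in (1.0, -2.5, 10.0):
    # Zero up to floating-point rounding, whatever constant a is used
    print(a, scaled_deviation_sum(unemployment, a, x_bar))

# Centering at anything other than X-bar generally does not give zero
print(scaled_deviation_sum(unemployment, 1.0, 5.0))  # about 10.0
```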


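Here is one important example of this:
\[ \begin{aligned} s^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2 \\ &= \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})[X_i - \bar{X}] \\ &= \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})X_i - \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})\bar{X} \\ &= \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})X_i, \end{aligned} \]
where the last step applies the fact above with \(a = \bar{X}\).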


Variance
As a first approximation, just as the mean is a measure of central tendency, the variance is a measure of dispersion. In deciding to make a bet, which occupation to pursue, or which stocks to buy, we might care not only about the expected payoff but also about its variability, which captures some notion of risk. The variance of a random variable is given by
\[ \operatorname{Var}(X) = E\big[(X - \mu_X)^2\big] = \sum_{j=1}^{J} (x_j - \mu_X)^2 f(x_j), \]
where \(\mu_X = E(X)\).
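As an illustration (not in the original slides), the formula can be applied to the grade distribution from earlier. The sketch computes \(\mu_X\), then \(\sum_j (x_j - \mu_X)^2 f(x_j)\), and takes the square root to get the standard deviation.

```python
import math

grade_pmf = {4: 0.30, 3: 0.40, 2: 0.25, 1: 0.04, 0: 0.01}

mu = sum(x * p for x, p in grade_pmf.items())                    # E(G)
variance = sum((x - mu) ** 2 * p for x, p in grade_pmf.items())  # sum of (x_j - mu)^2 f(x_j)
std_dev = math.sqrt(variance)                                    # standard deviation

print(round(mu, 2), round(variance, 4), round(std_dev, 3))  # 2.94 0.7964 0.892
```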


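If X is nonstochastic, \(x_j = \mu_X\) for all j, so \(\operatorname{Var}(X) = 0\).
\[ \operatorname{Var}(aX + b) = E\big[(aX + b - (a\mu_X + b))^2\big] = E\big[(aX - a\mu_X)^2\big] = a^2 E\big[(X - \mu_X)^2\big] = a^2 \operatorname{Var}(X). \]
The standard deviation is the square root of the variance.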


Interpretation of Variance
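The higher the variance, the less confident you are that the outcome will be near the mean (or expectation). Suppose you bet on a coin toss.
Case 1: you bet $1.00:
\[ X = \begin{cases} \$1.00 & \text{heads} \\ -\$1.00 & \text{tails} \end{cases} \]
\[ E(X) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot(-1) = 0, \qquad V(X) = \tfrac{1}{2}(1 - 0)^2 + \tfrac{1}{2}(-1 - 0)^2 = 1. \]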


Case 2: you bet $10,000:
\[ X = \begin{cases} \$10{,}000 & \text{heads} \\ -\$10{,}000 & \text{tails} \end{cases} \]
\[ E(X) = \tfrac{1}{2}\cdot 10{,}000 + \tfrac{1}{2}\cdot(-10{,}000) = 0, \qquad V(X) = \tfrac{1}{2}(10{,}000 - 0)^2 + \tfrac{1}{2}(-10{,}000 - 0)^2 = 100{,}000{,}000. \]
In considering a bet, an investment in a risky project, or a life choice (education, occupation, marriage, etc.), the variance of the payoff is likely to be relevant.
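Both coin bets can be evaluated with one small helper. The function name `mean_and_variance` is illustrative; it takes a list of (probability, payoff) pairs and applies the discrete formulas with exact fractions.

```python
from fractions import Fraction


def mean_and_variance(outcomes):
    """outcomes: list of (probability, payoff) pairs for a discrete bet."""
    mu = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mu) ** 2 for p, x in outcomes)
    return mu, var


half = Fraction(1, 2)
small_bet = [(half, 1), (half, -1)]            # Case 1: +/- $1
large_bet = [(half, 10_000), (half, -10_000)]  # Case 2: +/- $10,000

print(*mean_and_variance(small_bet))  # 0 1
print(*mean_and_variance(large_bet))  # 0 100000000
```

Case 2 is Case 1 scaled by a = 10,000, so the mean stays at zero while the variance grows by \(a^2 = 10^8\), consistent with \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\).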


Continuous Random Variables


The probability density function shows, heuristically, the relative probability of a value. Since a continuous random variable can take on an infinite number of values, no exact value has nonnegligible probability. Instead, there is a probability of falling between two points a and b, which is given by
\[ \Pr(a \le X \le b) = \text{area under the curve between } a \text{ and } b = \int_a^b f(x)\,dx. \]
Thus we also have
\[ f(x) = \lim_{\varepsilon \to 0} \frac{\Pr(x \le X \le x + \varepsilon)}{\varepsilon}. \]
The particular value of the density is not interesting in its own right; it is only interesting when you integrate over it.

An example of a continuous pdf: the normal density function.
[Figure: the normal density, dnorm(x), plotted against x]
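The "area under the curve" idea can be sketched numerically with a simple midpoint rule. This is an illustration under stated assumptions: the density is the standard normal, the interval [-10, 10] stands in for the whole real line (the tails beyond it are negligible), and `integrate` is a hypothetical helper, not a library function. The last part shows \(\Pr(x \le X \le x + \varepsilon)/\varepsilon\) approaching \(f(x)\) for a small \(\varepsilon\).

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# Pr(-1 <= X <= 1) for a standard normal: the area under the curve between -1 and 1
prob_within_one_sd = integrate(normal_pdf, -1.0, 1.0)

# Total area under the density (approximately 1)
total_area = integrate(normal_pdf, -10.0, 10.0)

# Pr(0 <= X <= eps) / eps is close to f(0) when eps is small
eps = 1e-4
density_ratio = integrate(normal_pdf, 0.0, eps, n=100) / eps

print(round(prob_within_one_sd, 4))                         # 0.6827
print(round(total_area, 6))                                 # 1.0
print(round(density_ratio, 5), round(normal_pdf(0.0), 5))   # 0.39894 0.39894
```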


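Analogous to the properties of the pdf in the discrete case, we have
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx, \]
and the properties of expectations that we discussed for the discrete case carry over.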
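To see the integral formulas at work, the sketch below uses a hypothetical density \(f(x) = 2x\) on [0, 1] (zero elsewhere), chosen only because its moments are easy to check by hand: the area is 1, \(E(X) = 2/3\) and \(\operatorname{Var}(X) = 1/18\). The integrals are approximated with a midpoint rule.

```python
def density(x):
    """Hypothetical pdf f(x) = 2x on [0, 1] and 0 elsewhere (chosen only for illustration)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# The density is zero outside [0, 1], so integrating over [0, 1] covers the whole line
area = integrate(density, 0.0, 1.0)                                      # exact: 1
mean = integrate(lambda x: x * density(x), 0.0, 1.0)                     # exact: 2/3
variance = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 1.0)   # exact: 1/18

print(round(area, 6), round(mean, 6), round(variance, 6))  # about 1.0 0.666667 0.055556
```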


Another representation of the distribution of a random variable is the cumulative distribution function (cdf), usually denoted F(x) and defined to be the probability that X falls at or below the value x:
\[ F(x) = \Pr(X \le x). \]
For continuous random variables that can take on any value (i.e., with positive f(x) for any x between \(-\infty\) and \(+\infty\)), this can be written as
\[ F(a) = \int_{-\infty}^{a} f(x)\,dx. \]
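For the standard normal, the cdf can be computed three ways that should agree: the closed form in terms of the error function `math.erf`, the standard library's `statistics.NormalDist().cdf`, and a numerical version of \(\int_{-\infty}^{a} f(x)\,dx\) (with -10 standing in for \(-\infty\), an assumption that ignores a negligible tail). The helper names are illustrative.

```python
import math
from statistics import NormalDist


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = Pr(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


a = 1.0
by_formula = normal_cdf(a)
by_library = NormalDist(mu=0.0, sigma=1.0).cdf(a)
by_integral = integrate(normal_pdf, -10.0, a)  # F(a) as the area to the left of a

print(round(by_formula, 4), round(by_library, 4), round(by_integral, 4))  # 0.8413 0.8413 0.8413
```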


Joint Distributions of Random Variables

For two random variables X and Y we can define their joint distribution. For discrete random variables,
\[ f(x, y) = \Pr(X = x, Y = y). \]
For continuous random variables,
\[ \Pr(a_1 \le X \le a_2,\; b_1 \le Y \le b_2) = \int_{a_1}^{a_2}\int_{b_1}^{b_2} f(x, y)\,dy\,dx. \]


Covariance and Correlation


When we have two random variables, the first question one may ask is whether they move together: when X is high, is Y high? One measure of whether two variables move together is the covariance:
\[ \operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]. \]
If Y tends to be large relative to \(\mu_Y\) when X is large relative to \(\mu_X\), then the covariance is positive.
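A small example that can be computed exactly: let X be the number on the first of two fair dice and Y the sum of both (an illustrative choice, not from the slides). The sketch builds the joint pmf, computes the covariance from its definition, and checks the rescaling property \(\operatorname{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 \operatorname{Cov}(X, Y)\).

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = first die and Y = sum of two fair dice
joint_pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, d1 + d2)
    joint_pmf[key] = joint_pmf.get(key, 0) + Fraction(1, 36)


def expect(g):
    """E[g(X, Y)] computed from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


mu_x = expect(lambda x, y: x)  # 7/2
mu_y = expect(lambda x, y: y)  # 7
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))
var_x = expect(lambda x, y: (x - mu_x) ** 2)

# Cov(2X + 1, 3Y + 5), using E(2X + 1) = 2 mu_x + 1 and E(3Y + 5) = 3 mu_y + 5
cov_scaled = expect(lambda x, y: ((2 * x + 1) - (2 * mu_x + 1)) * ((3 * y + 5) - (3 * mu_y + 5)))

print(cov_xy)                   # 35/12: positive, a high first die tends to mean a high sum
print(cov_xy == var_x)          # True here, because the second die is independent of the first
print(cov_scaled == 6 * cov_xy) # True: Cov(a1 X + b1, a2 Y + b2) = a1 a2 Cov(X, Y)
```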


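Properties of covariance:
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) \]
\[ \operatorname{Cov}(X, X) = \operatorname{Var}(X) \]
\[ \operatorname{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 \operatorname{Cov}(X, Y) \]
This last property means that if we change the units of measurement of X and/or Y, the covariance changes.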


A measure of association that is unit-free is the correlation coefficient:
\[ \rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}. \]
It turns out that \(\rho\) is between \(-1\) and \(1\). When X and Y are independent, so there is no relation between them, \(\rho\) is zero; if \(\rho > 0\), X and Y tend to go up and down together, whereas if \(\rho < 0\), when X goes up Y tends to go down, and vice versa.
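Continuing the illustrative dice example (X = first die, Y = sum of two fair dice), the sketch computes \(\rho\) from the joint pmf and shows that changing units (here 100X + 7 and 3Y, arbitrary choices) leaves it unchanged, unlike the covariance. For this pair the exact value is \(1/\sqrt{2} \approx 0.7071\).

```python
import math
from fractions import Fraction
from itertools import product

# Joint pmf of X = first die and Y = sum of two fair dice
joint_pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, d1 + d2)
    joint_pmf[key] = joint_pmf.get(key, 0) + Fraction(1, 36)


def expect(g):
    """E[g(X, Y)] computed from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


def correlation(u, v):
    """rho = Cov(U, V) / sqrt(Var(U) Var(V)) for U = u(X, Y) and V = v(X, Y)."""
    mu_u, mu_v = expect(u), expect(v)
    cov = expect(lambda x, y: (u(x, y) - mu_u) * (v(x, y) - mu_v))
    var_u = expect(lambda x, y: (u(x, y) - mu_u) ** 2)
    var_v = expect(lambda x, y: (v(x, y) - mu_v) ** 2)
    return float(cov) / math.sqrt(float(var_u * var_v))


rho = correlation(lambda x, y: x, lambda x, y: y)
# Changing units leaves the correlation unchanged
rho_rescaled = correlation(lambda x, y: 100 * x + 7, lambda x, y: 3 * y)

print(round(rho, 4), round(rho_rescaled, 4))  # 0.7071 0.7071
```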


Examples of Correlation

Positively correlated random variables:
- Years of school, earnings
- Husband's wage, wife's wage
- Stock price, profit of firm
- GDP, country size


Negatively correlated random variables:
- GDP growth, unemployment
- Husband's income, wife's hours worked


Random variables with zero or near-zero correlation:
- Number on first die, number on second die
- Michelle Obama's temperature, number of questions asked in this class today
- Stock gain today, stock gain tomorrow


Correlation versus Causality


The difference between causation and correlation is a key concept in econometrics. We would like to identify causal effects and estimate their magnitude. It is generally agreed that this is very difficult to do. Often a causal interpretation can be given that is consistent with results derived from an appropriate statistical procedure, and having an economic model is often essential in establishing the causal interpretation. These issues are confused all the time by politicians and the popular press. For some first thoughts, suppose X and Y are positively correlated.


Case 1: X → Y (or Y → X), i.e., one variable causes the other
- Money supply, inflation
- Increase in minimum wage, increase in wages
- Retirement, decline in income


Case 2: X → Y and Y → X, i.e., each causes the other
- Prices, quantities
- Earnings, hours worked


Case 3: Z → X and Z → Y, i.e., a third variable Z causes both
- Earnings of wife, earnings of husband
- Education, earnings
- GDP growth, unemployment


Normal Random Variables


Normal random variables play a big role in econometrics, partly because they are tractable and partly because normality is a recurring phenomenon in estimation theory.
- The distribution of a normal random variable depends only on its mean and variance, i.e., its first two moments. To say it again: if it's normal, and you know its mean, and you know its variance, you know everything about it; you know its exact distribution.
- A sum of jointly normal random variables is normal.
- Many distributions we encounter are approximately normal.
The foregoing facts, together with what we already know about means and variances, are important for all that we do.


An Example of Calculations with Normal Random Variables

Suppose \(Y = a_1 X_1 + a_2 X_2 + c\), where \(X_1 \sim N(\mu_1, \sigma_1^2)\) and \(X_2 \sim N(\mu_2, \sigma_2^2)\) are jointly normal and the covariance between \(X_1\) and \(X_2\) is \(\sigma_{12}\).

What is the distribution of Y ?


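Y is a sum of normal random variables, so if we know its mean and variance, then we know its distribution. Using the rules for expectations to calculate the mean:
\[ \begin{aligned} E(Y) &= E(a_1 X_1 + a_2 X_2 + c) \\ &= E(a_1 X_1) + E(a_2 X_2 + c) \\ &= a_1 E(X_1) + a_2 E(X_2) + c \\ &= a_1\mu_1 + a_2\mu_2 + c. \end{aligned} \]
Using the rules for variances:
\[ \begin{aligned} \operatorname{Var}(Y) &= \operatorname{Var}(a_1 X_1 + a_2 X_2 + c) \\ &= \operatorname{Var}(a_1 X_1 + a_2 X_2) \\ &= a_1^2 \operatorname{Var}(X_1) + a_2^2 \operatorname{Var}(X_2) + 2 a_1 a_2 \operatorname{Cov}(X_1, X_2). \end{aligned} \]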


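Therefore we know that
\[ Y \sim N(\mu_Y, \sigma_Y^2), \]
where
\[ \begin{aligned} \mu_Y &= a_1\mu_1 + a_2\mu_2 + c, \\ \sigma_Y^2 &= a_1^2 \operatorname{Var}(X_1) + a_2^2 \operatorname{Var}(X_2) + 2 a_1 a_2 \operatorname{Cov}(X_1, X_2) \\ &= a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2 a_1 a_2 \sigma_{12}. \end{aligned} \]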
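A simulation can serve as a sanity check on these formulas. The sketch below uses hypothetical parameter values chosen only for illustration (\(a_1 = 2\), \(a_2 = -1\), \(c = 3\), \(\mu_1 = 1\), \(\mu_2 = 4\), \(\sigma_1 = 1\), \(\sigma_2 = 2\), \(\sigma_{12} = 0.5\)), so that \(\mu_Y = 1\) and \(\sigma_Y^2 = 4 + 4 - 2 = 6\). It draws jointly normal pairs by building \(X_2\) from the same standard normal used for \(X_1\) plus an independent one, and compares the simulated mean and variance of Y with the formulas; it checks only the first two moments, not normality itself.

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration
a1, a2, c = 2.0, -1.0, 3.0
mu1, mu2 = 1.0, 4.0
sigma1, sigma2, sigma12 = 1.0, 2.0, 0.5  # sd of X1, sd of X2, Cov(X1, X2)

# Formulas from the derivation
mu_y = a1 * mu1 + a2 * mu2 + c
var_y = a1**2 * sigma1**2 + a2**2 * sigma2**2 + 2 * a1 * a2 * sigma12


def draw_pair(rng):
    """Draw (X1, X2) jointly normal with the means, variances and covariance above."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = mu1 + sigma1 * z1
    # Loading on z1 gives Cov(X1, X2) = sigma12; the z2 term tops Var(X2) up to sigma2^2
    x2 = mu2 + (sigma12 / sigma1) * z1 + math.sqrt(sigma2**2 - (sigma12 / sigma1) ** 2) * z2
    return x1, x2


rng = random.Random(2011)  # fixed seed so the run is reproducible
n = 200_000
ys = []
for _ in range(n):
    x1, x2 = draw_pair(rng)
    ys.append(a1 * x1 + a2 * x2 + c)

sim_mean = sum(ys) / n
sim_var = sum((y - sim_mean) ** 2 for y in ys) / (n - 1)

print(mu_y, var_y)                            # 1.0 6.0
print(round(sim_mean, 2), round(sim_var, 2))  # close to the values above
```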


Conditional Expectations
Often the goal of empirical research in economics is to uncover conditional expectations. Formally, I could derive the conditional probability density function and derive the conditional expectation from that; if you are interested, you can find this in Appendix B of Wooldridge. Instead, I want to think of a conditional expectation in a looser, informal way. The question we often care about is: if I could gather everyone in the world for whom X is some particular value, what would be the expected value of Y?


We define the conditional expectation \(E(Y \mid X)\) to mean: if I condition X to be some value, what is the expected value of Y?
- In almost all interesting cases Y is a random variable, so after choosing X we don't know exactly what Y will be.
- \(E(Y \mid X)\) depends on X, so changing X will change the expected value of Y.
- Very often in economics we care about conditional expectations.


Examples

Let's consider some examples. Note that all of this fits in the descriptive type of analysis we consider: we are not saying anything about causation.


Wages and Gender


Let X be the gender of an individual. We may be very interested in how wages vary with gender; that is, we are interested in
\[ E(\text{wage} \mid \text{male}) \quad \text{and} \quad E(\text{wage} \mid \text{female}). \]
How do we estimate this? Here it is pretty clear. To estimate an expectation we use the mean; to estimate a conditional expectation we use the conditional mean. That is, just take the mean value of wages for men and the mean value of wages for women.
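The conditional-mean estimator amounts to grouping by X and averaging Y within each group. The sketch below runs on a small hypothetical toy sample, not the CPS78_85 data, so its numbers mean nothing beyond showing the mechanics; `conditional_means` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical toy sample of (gender, hourly wage) pairs; NOT the CPS78_85 data
sample = [
    ("male", 10.0), ("female", 8.0), ("male", 12.0),
    ("female", 7.0), ("male", 8.0), ("female", 9.0),
]


def conditional_means(pairs):
    """Estimate E(Y | X = x) by the mean of Y within each group x."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, y in pairs:
        totals[group] += y
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


print(conditional_means(sample))  # {'male': 10.0, 'female': 8.0}
```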


I do this using the file CPS78_85 from the textbook website, looking only at the year 1985. In that year the values were:

Average Wages, 1985
Men     Women
$9.99   $7.88

This is simply what the data show.


Smoking and Birthweight


How does the birthweight of a newborn depend on smoking by the mother? Here I use the data set bwght to look at the relationship. I just look at whether the mother smoked at all. Birthweight is measured in ounces.

Birthweights (ounces)
Smoked   Didn't Smoke
111.15   120.06


Baseball Position and Salaries


How do the salaries of baseball players vary with the position that they play?

Baseball Salaries
Position       Salary
First Base     $1,586,781
Second Base    $1,309,641
Third Base     $1,382,647
Shortstop      $1,069,21
Catcher        $892,519
Outfielder     $1,539,324


Voting Outcomes and Campaign Expenditures


How does the fraction of votes you get depend on campaign expenditure? The raw data look like this:


How do I estimate this? Clearly I can't just condition on every level of expenditure and take the mean. We need a model to help us think about this.
