Stat Lecture 2
In this lecture, we will see more datasets and give a brief introduction to some typical
models and setups.
Each sample in a dataset can be a number, a vector, or even a matrix. Our goal is to
draw useful information from the data.
Examples:
2. ChickWeight data
data(ChickWeight)
ChickWeight
Weight: a numeric vector giving the body weight of the chick (gm).
Time: a numeric vector giving the number of days since birth when the measurement
was made.
Chick: an ordered factor with levels 18 < ... < 48 giving a unique identifier for the
chick.
Diet: a factor with levels 1,...,4 indicating which experimental diet the chick received.
3. Longley data.
data(longley)
lm(Employed ~ GNP, data=longley)
4. Air passenger data.
data(AirPassengers)
AirPassengers
Parameters: If we assume the samples follow some particular distribution, the
distribution will have parameters, which are generally unknown.
2. Confidence Interval.
We do not need an actual estimate of the parameter. Instead, we want to find an
interval that covers the true parameter with high probability (for example,
95%).
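As a concrete sketch (using the built-in morley data and assuming the measurements are roughly normal), t.test in R reports a 95% confidence interval for the mean alongside the test:

```r
# 95% confidence interval for the mean measurement
# (morley$Speed records speed in km/s minus 299000)
data(morley)
t.test(morley$Speed)$conf.int
```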
3. Hypothesis Testing.
We want a yes-or-no answer to some question about the parameter. For example,
H0: θ = θ0 versus H1: θ ≠ θ0, or H0: θ ≤ θ0 versus H1: θ > θ0.
For example: in the ChickWeight data, we want to compare the weights of chicks on
different diets.
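For instance, a two-sample t-test can compare two of the diets (a sketch; the choice of diets 1 and 3 is arbitrary, and note the column name is lowercase weight in the actual data frame):

```r
data(ChickWeight)
# Compare body weights under diet 1 and diet 3 (all time points pooled)
with(ChickWeight, t.test(weight[Diet == "1"], weight[Diet == "3"]))
```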
4. Prediction.
Predict the value of the next observation; for example, forecasting future counts in
the air passenger data.
data(trees)
attach(trees)
plot(Volume, Girth)
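One simple way to make predictions from the trees data plotted above (a sketch; the new girth value 15 is arbitrary): fit a linear model and predict the volume of a new tree from its girth.

```r
# Fit Volume on Girth and predict for a hypothetical new tree
fit <- lm(Volume ~ Girth, data = trees)
predict(fit, newdata = data.frame(Girth = 15))
```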
Measurement of Performance
Once we get an answer to a statistical problem, we need to know how good it is. We
need to measure the performance of our decision. Common criteria include:
Unbiased estimation.
Mean squared error.
Efficiency.
…
Unbiased Estimator
We now turn to the estimation problem. Our goal here is to use the
dataset to estimate a quantity of interest. We will focus on the case where the quantity
of interest is a certain function of the parameters of the distribution of the samples.
Examples:
1. data(morley)
We want to estimate the speed of light, under a normality assumption.
2. Exponential distribution (the lifetime of a machine, say).
X<-rexp(100,rate=2)
Let us pretend that we do not know the true parameter (which is 2), and estimate it
based on the samples.
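For example 1, under the normal model the natural estimate of the speed of light is the sample mean (a quick sketch; recall that Speed records km/s minus 299000):

```r
data(morley)
mean(morley$Speed)   # sample mean of the 100 measurements
```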
An estimate is a value that depends only on the dataset X1, ..., Xn; that is, the estimate
is a function T = T(X1, ..., Xn) of the samples.
One can often think of several estimates for the parameter of interest.
In example 1, we could use the sample mean or the sample median.
In example 2, we could use the reciprocal of the sample mean, or log(2) divided by
the sample median (since the median of the Exp(rate) distribution is log(2)/rate).
When viewed as a function of the random samples, an estimate is called an
estimator.
Example:
# Repeat the experiment 50 times and record both estimates
y <- rep(0, 50)
z <- rep(0, 50)
for (i in 1:50) {
  X <- rexp(100, rate = 2)
  y[i] <- 1/mean(X)          # reciprocal of the sample mean
  z[i] <- log(2)/median(X)   # log(2) over the sample median
}
For each set of samples, we get one estimate. So the estimator is a
random variable, and we need to investigate the behavior of the estimators.
hist(y); mean(y); var(y);
hist(z); mean(z); var(z);
The mean squared error of an estimator T of the parameter θ is defined as
MSE(T) = E(T − θ)^2. We can approximate it from the simulated estimates:
mean((y-2)^2)
mean((z-2)^2)
An estimator T is an unbiased estimator of θ if E(T) = θ.
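For instance, the sample mean is unbiased for the expectation E X = 1/rate of the exponential distribution. A quick simulation sketch (the seed and replication count are arbitrary choices):

```r
set.seed(1)
# 2000 replications of the sample mean of 100 Exp(rate = 2) draws;
# the average of the sample means should be close to E X = 1/2
xbar <- replicate(2000, mean(rexp(100, rate = 2)))
mean(xbar)
```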
Method of Moments
From the previous normal example, we can see that if the parameter of interest is the
expectation or variance of the distribution, we can use the sample mean or
sample variance to estimate it. This estimator is reasonable.
This method is called the method of moments. By the Law of Large Numbers, these
estimators converge to the true values as the sample size grows, so they are not bad.
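A sketch of the method of moments for the normal model (simulated data; the true values mean = 3 and sd = 2 are arbitrary): match the first two sample moments to the corresponding population moments.

```r
set.seed(2)
x <- rnorm(500, mean = 3, sd = 2)
mu_hat     <- mean(x)                # first moment estimates the mean
sigma2_hat <- mean(x^2) - mean(x)^2  # second central moment estimates the variance
c(mu_hat, sigma2_hat)
```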