
Stat Lecture 2

This document discusses data, models, parameters, and statistics. It provides examples of different datasets, including Old Faithful eruption data, ChickWeight data, and Longley's Economic Regression Data. It then discusses assumptions made when performing statistical inferences on data, such as samples being independent and identically distributed. Parameters of distributions are discussed as unknown values that need to be estimated. Basic statistical models like estimation, confidence intervals, hypothesis testing, and prediction are introduced. The document concludes by discussing measuring the performance of statistical results and the definition of an unbiased estimator.

Uploaded by

Yuvraj Wale

Data, Models, Parameters, and Statistics

In this lecture, we will see more datasets and give a brief introduction to some typical
models and setups.

In statistics, our starting point is a collection of data X_1, X_2, ..., X_n. Each X_i could be
a number, a vector, or even a matrix. Our goal is to draw useful information from the
data.

Examples:

1. Old Faithful data.


data(faithful)
faithful

eruptions: numeric. Eruption time in minutes.

waiting: numeric. Waiting time to the next eruption (in minutes).

2. ChickWeight data
data(ChickWeight)
ChickWeight

weight: a numeric vector giving the body weight of the chick (gm).
Time: a numeric vector giving the number of days since birth when the measurement
was made.
Chick: an ordered factor with levels 18 < ... < 48 giving a unique identifier for the
chick.
Diet: a factor with levels 1, ..., 4 indicating which experimental diet the chick received.

3. Longley's Economic Regression Data


data(longley)
longley
This is a macroeconomic data set which provides a well-known example for a highly collinear
regression.
GNP.deflator: GNP implicit price deflator (1954=100)
GNP: Gross National Product.
Unemployed: number of unemployed.
Armed.Forces: number of people in the armed forces.
Population: ‘noninstitutionalized’ population >= 14 years of age.
Year: the year (time).
Employed: number of people employed.

lm(Employed ~ GNP, data = longley)
4. Air passenger data.
data(AirPassengers)
AirPassengers

Assumptions: Once we have a dataset, we need proper assumptions to do statistical


inferences (Estimation, Testing, Prediction, Confidence Interval, etc).

1. The samples are independent.


2. The samples are identically distributed.
3. Relationship among the coordinates of each sample (linear, for example).
4. The samples follow a particular distribution (normal, exponential, uniform, etc.).
5. ……..

We should be careful when applying these assumptions to a dataset.

Parameters: If we assume the samples follow some particular distribution, there will
be parameters for the distribution, generally unknown.

Example : Michaelson-Morley Speed of Light Data.


data(morley)
morley
attach(morley)
hist(Speed)
qqnorm(Speed)

The samples of Speed are approximately normal, so it is reasonable to assume Speed
follows a N(μ, σ²) distribution. But the parameters μ and σ² are unknown. We need to
estimate them in some cases.
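As a quick sketch (this snippet is ours, not part of the original code): the sample mean and sample standard deviation give natural estimates of μ and σ.

```r
# Estimate the normal parameters mu and sigma from the Speed measurements.
data(morley)
mean(morley$Speed)  # estimate of mu
sd(morley$Speed)    # estimate of sigma
```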

Basic Models and Goals


1. Estimation.
Observe i.i.d. samples X_1, ..., X_n. They follow some distribution with parameter

θ. Our goal is to estimate θ, or more generally, a function of θ, g(θ).

2. Confidence Interval.
We do not need an actual estimate of the parameter, but we want to find an interval
such that it will cover the true parameter with high probability (for example,
95%).
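For instance, a one-line sketch using the morley data (this example is added for illustration; t.test is base R):

```r
# A 95% confidence interval for the mean of Speed, assuming i.i.d. normal samples.
data(morley)
t.test(morley$Speed)$conf.int
```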

3. Hypothesis Testing.
We want to get a yes-or-no answer to some question: for example, θ = 0 or θ ≠ 0,
θ ≤ 0 or θ > 0.

For example: in the ChickWeight data, we want to compare the weights of chicks on
different diets.
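As an illustrative sketch (the choice of diets and measurement day here is ours, not the lecture's), a two-sample t-test comparing final-day weights under Diet 1 and Diet 4:

```r
# Compare chick weights at day 21 between diets 1 and 4 with a two-sample t-test.
data(ChickWeight)
final <- droplevels(subset(ChickWeight, Time == 21 & Diet %in% c(1, 4)))
t.test(weight ~ Diet, data = final)
```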

4. Prediction.
Predict the value of the next observation. For example, the air passenger data.

5. Linear Regression Model.


We observe paired data (x_1, Y_1), ..., (x_n, Y_n). We assume x_1, ..., x_n are nonrandom

and Y_1, ..., Y_n are realizations of the random variables

Y_i = β_0 + β_1 x_i + ε_i,   i = 1, ..., n,

where ε_1, ..., ε_n are independent random variables with expectation 0 and variance σ².

β_0 and β_1 are unknown parameters. y = β_0 + β_1 x is called the regression line. We want to

estimate it.

data(trees)
attach(trees)
plot(Volume, Girth)
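To actually estimate the regression line for the plotted variables, a minimal sketch (added here; the estimation method itself is covered later):

```r
# Fit the least-squares line for Girth against Volume and overlay it on the plot.
data(trees)
fit <- lm(Girth ~ Volume, data = trees)
coef(fit)           # estimates of the intercept and slope
plot(trees$Volume, trees$Girth)
abline(fit)
```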

Measurement of Performance
Once we get an answer to a statistics problem, we need to know how good it is. We
need to measure the performance of our decision.

Unbiased estimation.
Mean squared error.
Efficiency.
……

Unbiased Estimator
In this lecture, we will study the estimation problem. Our goal here is to use the
dataset to estimate a quantity of interest. We will focus on the case where the quantity
of interest is a certain function of the parameter of the distribution of samples.

Examples:
1. data(morley)
We want to estimate the speed of light, under the normal assumption.
2. Exponential distribution. (life time of a machine)
X<-rexp(100,rate=2)
Let us pretend that we do not know the true parameter (which is 2), and estimate it
based on the samples.

An estimate is a value that depends only on the dataset x_1, ..., x_n, i.e., the estimate

is a function T(x_1, ..., x_n) of the dataset.

One can often think of several estimates for the parameter of interest.
In example 1, we could use the sample mean or the sample median.
In example 2, we could use the reciprocal of the sample mean, 1/X̄, or log(2) divided by
the sample median (since the median of an Exp(λ) distribution is log(2)/λ).

Then we need to answer the following questions:


When is one estimate better than another? Does there exist a best estimate?

Since the dataset x_1, ..., x_n is a realization of the random variables X_1, ..., X_n,

the estimate T(x_1, ..., x_n) is a realization of the random variable T(X_1, ..., X_n).

T(X_1, ..., X_n) is called an estimator.

Example:
y<-rep(0,50);
z<-rep(0,50);
for (i in 1:50) {
X<-rexp(100,rate=2);
y[i]<-1/mean(X);
z[i]<-log(2)/median(X);
}
For each set of samples, we have an estimate. So the estimator T(X_1, ..., X_n) is a
random variable. We need to investigate the behavior of the estimators.
hist(y); mean(y); var(y);
hist(z); mean(z); var(z);
The mean squared error of an estimator θ̂ is defined as MSE(θ̂) = E(θ̂ − θ)². Here the true parameter is 2, so we can approximate the MSE of the two estimators by

mean((y-2)^2)
mean((z-2)^2)

Now we know that an estimator is a random variable. The probability distribution of

the estimator θ̂ is also called the sampling distribution of θ̂.

Definition: An estimator θ̂ is called an unbiased estimator for parameter θ, if E_θ(θ̂) = θ

for all θ. Generally, the difference E_θ(θ̂) − θ is called the bias of θ̂.

Let us consider the normal mean problem. Suppose X_1, ..., X_n follow a N(μ, σ²)

distribution and we want to estimate μ. Since μ is the expectation of the distribution,

an intuitive estimator will be the sample mean X̄ = (X_1 + ... + X_n)/n. This is an unbiased estimator.
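The unbiasedness of the sample mean follows from linearity of expectation:

```latex
E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}\cdot n\mu = \mu .
```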

Unbiased estimator for expectation and variance


Suppose X_1, ..., X_n are i.i.d. random variables with mean μ and variance σ². Now

we have the following unbiased estimators for both of them:

the sample mean X̄ = (1/n) Σ_{i=1}^n X_i is an unbiased estimator of μ, and

the sample variance S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² is an unbiased estimator of σ².
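A quick simulation sketch (ours, with an arbitrarily chosen Exp(2) distribution, whose true variance is 1/4) illustrates why the 1/(n−1) divisor matters: averaging many sample variances recovers the true variance.

```r
# Average many sample variances S^2 (divisor n - 1) of Exp(rate = 2) samples;
# the average should be close to the true variance 1/4 = 0.25.
set.seed(1)
v <- replicate(2000, var(rexp(50, rate = 2)))
mean(v)  # close to 0.25
```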

Remark: Unbiased estimators do not necessarily exist. Unbiasedness does not

always carry over: that θ̂ is an unbiased estimator of θ does not mean g(θ̂) is an
unbiased estimator of g(θ), unless g is a linear function.

Method of Moments

From the previous normal example, we can see that if the parameter of interest is the
expectation or variance of the distribution, we can use the sample expectation or
sample variance to estimate it. This estimator is reasonable.

Suppose we have i.i.d. samples X_1, ..., X_n following some distribution with unknown

parameter θ. Now we want to estimate this parameter θ. We first calculate the

expectation of the distribution, E(X_1). Usually, this is a function of θ (think about the
normal or exponential distribution). Suppose E(X_1) = m(θ); then under suitable
conditions, θ can be written as θ = m⁻¹(E(X_1)). Since we can always use the sample mean

X̄ to estimate the expectation, we have an intuitive estimator for θ: θ̂ = m⁻¹(X̄).

In general, we can calculate the expectation of a function of X_1. Suppose

E[f(X_1)] = m(θ) for some function f. In the previous discussion, f(x) = x.
Actually, f could be any function, for example f(x) = x², f(x) = x³, ..., as long as its
expectation is easy to compute. Then an estimator of θ would be

θ̂ = m⁻¹( (1/n) Σ_{i=1}^n f(X_i) ).

This method is called the method of moments. From the Law of Large Numbers, we know
that these estimators are not bad.
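As a sketch for the exponential example above: since E(X_1) = 1/λ for an Exp(λ) distribution, the method-of-moments estimator of the rate is 1/X̄.

```r
# Method of moments for the exponential rate: E(X) = 1/lambda, so the
# estimator is 1 / sample mean.
set.seed(1)
X <- rexp(100, rate = 2)
lambda_hat <- 1/mean(X)
lambda_hat  # should be close to the true rate 2
```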
