Statistics
A unit of analysis is the entity that carries the information needed to realize the scope of a study.
Examples of units of analysis: an unemployed person (micro level), a voter (micro), an organization (meso), a country (macro).
The data in the figure represent a data matrix, which is a convenient and common way to organize
data, especially if collecting data in a spreadsheet. Each row of a data matrix corresponds to a
unique case (observational unit), and each column corresponds to a variable.
Types of variables
A variable is a characteristic that can take on a set of values for each observation in a sample or
population.
A variable is discrete if the set of possible values only allows separate values (normally
integers) such as 1, 2, 3, etc. An example is ‘age in years’.
A variable is continuous if the set of possible values is an infinite continuum on the real
number scale. An example is ‘CO2 emissions in tons’.
A variable is nominal if the set of categories it can take on have no logical ordering. An
example is ‘hair color’.
A variable is ordinal if the set of categories can be clearly ordered from highest to lowest.
An example is ‘life satisfaction’.
If two variables are not associated, then they are said to be independent. That is, two variables
are independent if there is no evident relationship between the two.
Population
The population is the complete set of units of analysis for which the spatial, temporal, or factual
selection criteria apply.
A population is called finite if the number of units of analysis can be counted. Examples:
Voters in Botswana, Students at a university, etc.
A population is called infinite if there is (potentially) an infinite number of observations.
As one of its major contributions, statistics uses data from a sample to make estimates and test
hypotheses about the characteristics of a population. This generalization from a (comparatively
small) sample to the population is called statistical inference.
A sample represents a subset of the cases and is often a small fraction of the population. Each
unit of analysis from the population for which we actually obtain data is called an observation.
All observations together make up the sample. For all observations in the sample, we register
certain quantities of interest to us in a dataset. These quantities of interest are called variables.
A parameter is a numerical summary of the population, e.g. the mean of male/female heights. A
statistic is the numerical summary of the sample data.
Wouldn’t it be best to include the entire population in the study instead of taking a sample? This
is called a census. Apart from being costly and time-consuming, a census (and sampling in
general) faces a few problems:
Non-response: If too many of the sampled people do not respond, the results may be
biased. More importantly, people sharing certain characteristics may be systematically
harder to reach.
Voluntary response: Occurs when only people who volunteer to participate are sampled.
These might be, for example, people who hold a different (stronger?) view on certain
issues, so the sample will not be representative of the population.
Convenience sample: Includes only individuals who are easily accessible. For instance, if
a political science student samples only her peers for a small study.
1. Simple Random Sample = in simple random samples, every member of a population has
an equal chance of being included in the sample.
2. Stratified Random Sample = take into account different layers within the population.
We ensure that our sample is representative of the population by also keeping the same
ratio of sample sizes as the ratio of the population strata. Stratified sampling is beneficial
to ensure that enough data from each category is collected to make a meaningful
inference, and it also allows us to draw conclusions within each of the groups.
3. Cluster Sample = focus on representative subsets of the population (first divide the
population into groups, then sample randomly from those groups).
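As a small illustration of the first two sampling methods (a Python sketch; the notes use R, and the population of voters here is entirely made up):

```python
import random

random.seed(42)

# Hypothetical population: 1000 voters, 300 students and 700 other citizens
population = [("student", i) for i in range(300)] + [("citizen", i) for i in range(700)]

# 1. Simple random sample: every member has an equal chance of inclusion.
srs = random.sample(population, 100)

# 2. Stratified random sample: sample within each stratum, keeping the same
#    ratio as in the population (30 students : 70 citizens in a sample of 100).
students = [p for p in population if p[0] == "student"]
citizens = [p for p in population if p[0] == "citizen"]
stratified = random.sample(students, 30) + random.sample(citizens, 70)

print(len(srs), len(stratified))                           # 100 100
print(sum(1 for p in stratified if p[0] == "student"))     # exactly 30 students
```

In the simple random sample the number of students varies by chance; the stratified sample fixes it at the population ratio by construction.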
The sampling error of a statistic equals the difference between the predicted value obtained by a
sample, and the population parameter. For instance, if a polling organization predicts a party to
win 33% of the votes on election day, but the party actually wins 36%, then the sampling error is
3%.
Frequency
The absolute frequency of a category i is the number ni of times this category occurs in the
dataset.
The relative frequency of a category is the proportion or percentage of observations falling into
that category. Mathematically, this is simply expressed as follows: f_i = n_i / n
The pie chart provides another graphical device for presenting relative frequency and percent
frequency distributions for categorical data.
In addition, we could also compute cumulative frequencies. Yet this only makes sense for
ordinal and numerical data, not for nominal data. The cumulative frequency F_i is the ‘running
total’ of frequencies.
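A minimal sketch of absolute, relative, and cumulative frequencies (Python; the ordinal variable here is made up):

```python
from collections import Counter

# Hypothetical ordinal variable, e.g. life satisfaction of 10 respondents
data = ["low", "low", "medium", "high", "medium",
        "medium", "high", "low", "medium", "high"]
order = ["low", "medium", "high"]

n = len(data)
abs_freq = Counter(data)                         # absolute frequencies n_i
rel_freq = {c: abs_freq[c] / n for c in order}   # relative frequencies f_i = n_i / n

# Cumulative frequencies F_i: the running total, meaningful because the
# categories are ordered from low to high.
cum, running = {}, 0
for c in order:
    running += abs_freq[c]
    cum[c] = running

print(dict(abs_freq))  # {'low': 3, 'medium': 4, 'high': 3}
print(rel_freq)        # {'low': 0.3, 'medium': 0.4, 'high': 0.3}
print(cum)             # {'low': 3, 'medium': 7, 'high': 10}
```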
Histogram
A graph which displays the classes on the x-axis and the frequencies on the y-axis. The
histograms tell us something about the distribution of the variable in question.
We are, on the one hand, interested in the modality of the distribution, i.e. how many
peaks the distribution has.
On the other hand we can also analyse the skewness of the distribution, i.e. how
asymmetric the distribution is.
MODE
We can say that the centre of the distribution is the most common class. Such a measure is called
the mode. The mode is defined as the most common value in a data set. The modal class of a
distribution is the class with the highest frequency.
MEDIAN
Alternatively, we can say that the centre of the distribution is the value that splits the distribution
in half. This value is called the median. As mentioned, the median is the point
at which 50% of the values lie below it. Therefore, it is also called the 50th percentile.
The median is much more robust to outliers than the mean. This means that single unusual values
do not influence the median greatly, while they do have an impact on the mean. For symmetrical
distributions, the mean and the median are the same. For skewed distributions, the mean lies
further in the direction of the skew than the median.
MEAN
The mean is defined as the sum across all observations divided by the number of observations: x̄ = (x_1 + x_2 + ... + x_n) / n
The mean is only appropriate for numerical data, and it is not sensible to compute the
mean for categorical data, even if they have numbers associated with every category. For
instance, when you ask respondents in a survey whether they fully agree (1), agree (2),
disagree (3), or fully disagree (4), you should not compute the mean at the end for the full
dataset.
The mean is sensitive to outliers in the data, i.e. single unusual observations may have a
strong influence on the mean.
The mean is the only single number for which the residuals (deviations from the estimate)
sum to zero.
The sum of squared residual is minimized at the mean.
This means that the mean is the best choice if you want to present the ‘typical value’ of
the data.
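These properties of the mean are easy to verify numerically (a Python sketch with made-up data):

```python
import statistics

# A small made-up dataset
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)  # 25 / 5 = 5.0

# Property 1: the residuals (deviations from the mean) sum to zero.
residuals = [x - mean for x in data]
print(sum(residuals))  # 0.0

# Property 2: the sum of squared residuals is minimized at the mean.
def ssr(center):
    return sum((x - center) ** 2 for x in data)

print(ssr(mean) < ssr(mean + 0.5), ssr(mean) < ssr(mean - 0.5))  # True True

# Sensitivity to outliers: one unusual value shifts the mean, not the median.
with_outlier = data + [100]
print(statistics.mean(data), statistics.mean(with_outlier))      # mean jumps from 5 to ≈ 20.8
print(statistics.median(data), statistics.median(with_outlier))  # median barely moves: 4 -> 5.0
```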
Interquartile Range
The 25th, 50th, and 75th percentiles are named quartiles. The interquartile range (IQR),
calculated as the difference between the third (Q3) and first (Q1) quartiles (i.e., the 75th and 25th
percentiles), is another measure of the variability of a data set. The interquartile range gives us a
view of how the middle 50% of the data are spread. Higher values of the interquartile range
imply a larger spread of the data. A graphic representation of this interquartile range is called a
box plot.
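A quick computation of the quartiles and the IQR (Python sketch; note that different software packages use slightly different quantile methods, so results can differ at the margins):

```python
import statistics

# A small made-up numeric sample
data = [1, 2, 3, 4, 5, 6, 7, 8]

# statistics.quantiles with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
iqr = q3 - q1

print(q1, q2, q3)  # 2.25 4.5 6.75
print(iqr)         # 4.5 -> spread of the middle 50% of the data
```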
Variance
The variance is roughly the average squared deviation from the mean, or roughly the mean
squared error. For a sample: s² = ∑ (x_i − x̄)² / (n − 1)
Standard deviation
The variance is the average of the squared distances from the mean. Thus, it has a different scale
than the original data. When we talk about the square root of the variance, we are back at the
original scale of the data. This is called the standard deviation.
The standard deviation is the square root of the variance and has the same scale as the data.
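Both definitions can be checked directly (Python sketch; the data are made up):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5
mean = sum(data) / len(data)

# Population variance: the average squared deviation from the mean (divide by n)
pvar = sum((x - mean) ** 2 for x in data) / len(data)

# Sample variance divides by n - 1 instead
svar = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# The standard deviation (square root of the variance) is back on the data's scale
print(pvar, math.sqrt(pvar))  # 4.0 2.0
print(statistics.pvariance(data) == pvar, statistics.variance(data) == svar)  # True True
```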
Random Process = A random process is a situation in which we know the possible outcomes
that could happen, but we don’t know which particular outcome will happen for a given
observation.
PROBABILITY
The probability of an outcome is the proportion of times the outcome would occur if we
observed the random process an infinite number of times.
What the Law of Large Numbers actually tells us is that as we increase the sample size, our
certainty about our estimate increases. ⇒ Larger sample sizes are better!
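A short simulation of this idea (Python): the proportion of heads in repeated fair coin tosses settles near 0.5 as the number of tosses grows.

```python
import random

random.seed(1)

# Simulate tossing a fair coin n times; the proportion of heads approaches
# the true probability 0.5 as n increases (Law of Large Numbers).
for n in [100, 10_000, 1_000_000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)
```

For small n the observed proportion can be noticeably off; for large n it is very close to 0.5.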
Sample Space = collection of all possible outcomes of a trial. A student has sat a pass/fail exam.
What is the sample space for the outcome of the exam? S = {P, F}
Disjoint (mutually exclusive) outcomes: Cannot happen at the same time. The outcome of a
single coin toss cannot be a head and a tail.
Non-disjoint outcomes: Can happen at the same time. A student can get an A in Stats and A in
Econ in the same semester.
Independence = Two processes are independent if knowing the outcome of one provides no
useful information about the outcome of the other.
MARGINAL PROBABILITY
The marginal probability is the probability of A occurring, p(A), and may be thought of as the
unconditional probability (i.e. it does not depend on another event). Marginal probabilities
correspond to the values in the margins of a table.
What is the probability that a (depressive) patient relapsed?
P(relapsed) = 48/72 ≈ 0.67
JOINT PROBABILITY
The joint probability is the probability of A and B occurring in the same event, p(A ∩ B). This is
also called the intersection of A and B. Joint probabilities correspond to the values inside the table.
CONDITIONAL PROBABILITY
P(A|B) = P(A ∩ B) / P(B)
What is the probability of relapsing, given that the patient takes desipramine?
P(relapsed | desipramine) = P(relapsed ∩ desipramine) / P(desipramine) = (10/72) / (24/72) = 10/24 ≈ 0.42
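The three kinds of probability from the relapse table can be computed directly from the counts given above (Python sketch):

```python
# Counts from the depression-relapse example (total n = 72)
n_total = 72
n_relapsed = 48                   # marginal count
n_desipramine = 24                # marginal count
n_relapsed_and_desipramine = 10   # joint count (inside the table)

p_relapsed = n_relapsed / n_total               # marginal probability
p_joint = n_relapsed_and_desipramine / n_total  # joint probability
p_cond = p_joint / (n_desipramine / n_total)    # conditional: P(A|B) = P(A ∩ B) / P(B)

print(round(p_relapsed, 2), round(p_cond, 2))   # 0.67 0.42
```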
RANDOM VARIABLES
A random variable can take on (at least two) different values, and each possible outcome is
associated with a probability that it occurs. Conventionally, we use a capital letter, like X, to
denote a random variable. The values of a random variable are denoted with a lower-case letter,
in this case x.
Thus, we can write out probabilities that x occurs, P (X = x). For example, rolling a die once can
take on the values X ∈ {1, 2, 3, 4, 5, 6} (this is the sample space). We want to know the
probability that an outcome smaller than or equal to x = 2 occurs: P(X ≤ 2) = 2/6 = 1/3.
EXPECTATION
The expected value of a random variable is the long-run average of all possible outcomes. In
other words, this is the mean. The expected values do not tell us anything about the short-run,
e.g. which number will occur in the next roll of the die, but about what we should expect on
average over many trials. Mathematically, we express the expected value as follows: E(X) = ∑ x_i · P(X = x_i)
In a game of cards, you win $1 if you draw a heart, $5 if you draw an ace (including the ace of
hearts), $10 if you draw the king of spades and nothing for any other card you draw. Write the
probability model for your winnings, and calculate your expected winning.
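A sketch of the probability model for this card game (Python; exact fractions make the bookkeeping transparent):

```python
from fractions import Fraction

# Winnings: $1 for a heart (12 non-ace hearts), $5 for any ace (4 aces,
# including the ace of hearts), $10 for the king of spades, $0 otherwise.
model = {
    1:  Fraction(12, 52),   # hearts excluding the ace of hearts
    5:  Fraction(4, 52),    # the four aces
    10: Fraction(1, 52),    # the king of spades
    0:  Fraction(35, 52),   # every other card
}

assert sum(model.values()) == 1  # the probabilities must total 1

# E(X) = sum of each outcome times its probability
expected = sum(x * p for x, p in model.items())
print(expected, float(expected))  # 21/26 ≈ 0.81
```

So in the long run you would win about 81 cents per draw on average.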
VARIABILITY
As in the case of a sample we discussed last time, we can also compute the long-run variance and
standard deviation for a random variable.
Note that σ and σ 2 are for the very long-run or the population, while s and s2 are for a sample.
For this reason, σ 2 is divided by n instead of n – 1
A probability distribution of a discrete random variable lists all possible outcomes and their
respective probabilities. Rules for probability distributions: the outcomes listed must be disjoint,
each probability must be between 0 and 1, and the probabilities must sum to 1.
Continuous variables have an infinite number of possible outcomes. Therefore, the probability of
any single exact value is 0, and we cannot use these values directly. Instead, we need to define
intervals of numbers, and assign probabilities between 0 and 1 to these intervals.
The resulting graph of such a distribution is a smooth, continuous curve. The fraction of the area
under the curve of an interval represents the probability that the variable takes on a value
contained in the interval.
Since height is a continuous numerical variable, it would be better to represent the distribution as
a continuous smooth curve instead of bins. We call this curve the probability density function.
NORMAL DISTRIBUTION
The normal distribution is symmetric, bell shaped, and characterized by its mean µ and the
standard deviation σ. No matter which normal distribution we look at, the probabilities of falling
within 1, 2, 3, x standard deviations of µ remain the same.
Z-Scores
The z-score for a value x of a random variable X is the distance of that value from µ expressed in
standard deviations.
These z-scores are also called standardized scores, since they rescale distances from the mean of
all normal distributions to that of the standard normal distribution N(0, 1). Thus, z-scores make
comparison between normal distributions possible. Z-scores can also be computed for all other
(continuous) distributions, yet we cannot use them to calculate percentiles and to compare
distributions.
SAT and ACT scores (two standardized tests used for college admission in the US) are both
distributed nearly normally: SAT scores follow N(1500, 300) and ACT scores follow N(21, 5).
We want to compare two students: Pam, who earned an 1800 on her SAT, and Jim, who scored a
24 on his ACT.
Let’s standardize both of these scores:
Pam: z = (1800 − 1500) / 300 = 300/300 = 1
Jim: z = (24 − 21) / 5 = 3/5 = 0.6
Since these scores are standardized, we can now compare them and say that Pam’s score is 1
standard deviation above the mean, while Jim’s is only 0.6 standard deviations above the mean.
Percentiles
So far, we looked at z-scores to evaluate what percentage of observations lies above or below a
given value. But what if we want to know the probability of observations between two values?
For instance, we might want to know the percentage of men not further away than 15 cm
of the average height of 178 cm. Thus, we look at all men between 163 and 193 cm.
First, we can look at the probability of men who are smaller than 193 cm, thus excluding all
those who are larger than that. The height of men is normally distributed with N (178, 10.1).
Let’s calculate the z-score and then use R to calculate probabilities:
z = (193 − 178) / 10.1 = 1.485149
Looking at the table, we find 0.9312478. Thus, about 93.12% of men are smaller than 193 cm.
From these 93.12%, we still need to subtract those observations that are smaller than 163 cm.
Again, we can calculate z-scores and percentiles:
z = (163 − 178) / 10.1 = −1.485149
Looking at the table: 0.06875218. Thus, about 6.88% of men are smaller than 163 cm.
Thus, about 86.24% of men are between 163 and 193 cm tall.
Conversely, if we know the distribution, we can also ask what the cut-off point for a certain
percentage is. For instance, we could ask at which height 10% of observations are smaller. To do
this, we need to look up the appropriate level in a z-score table, and then plug the value into the
z-score formula.
Thus, we need to plug the value −1.28 into our z-value formula.
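Instead of a z-table, Python's `statistics.NormalDist` can reproduce both directions of this calculation (the notes do the same in R):

```python
from statistics import NormalDist

heights = NormalDist(mu=178, sigma=10.1)  # men's height example from above

# P(163 < X < 193): cumulative probability up to 193 minus that up to 163
p_below_193 = heights.cdf(193)   # ≈ 0.9312
p_below_163 = heights.cdf(163)   # ≈ 0.0688
print(round(p_below_193 - p_below_163, 4))  # 0.8625

# Inverse question: below which height do 10% of observations fall?
print(round(heights.inv_cdf(0.10), 1))  # ≈ 165.1 (corresponds to z ≈ -1.28)
```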
No matter which normal distribution we look at, the following properties apply to all of them:
about 68% falls within 1 SD of the mean,
about 95% falls within 2 SD of the mean,
about 99.7% falls within 3 SD of the mean.
It is possible for observations to fall 4, 5, or more standard deviations away from the
mean, but these occurrences are very rare if the data are nearly normal.
The normal distribution is very important because many other distributions can be approximated
by it.
PARAMETER ESTIMATION
We are often interested in population parameters. Since complete populations are difficult (or
impossible) to collect data on, we use sample statistics as point estimates for the unknown
population parameters of interest. Sample statistics vary from sample to sample. Quantifying
how sample statistics vary provides a way to estimate the margin of error associated with our
point estimate.
Example: List of random numbers: 59, 121, 88, 46, 58, 72, 82, 81, 5, 10
In our case, the sample mean is: x¯ = (8 + 6 + 10 + 4 + 5 + 3 + 5 + 6 + 6 + 6)/10 = 5.9
This is our population; if we repeatedly sample from it and compute the mean of each sample, we
will find slightly different means each time. What we see here, i.e. the variability of the sample
means, is called a sampling distribution.
A sampling distribution of a statistic is the probability distribution that specifies probabilities for
the possible values the statistic can take on.
In a sample from a numeric variable (such as in our example), the sampling distribution
will be normally distributed with the mean of the population parameter, µ.
The standard error describes the variability of the sampling distribution, and depends on
the population standard deviation, σ, and the sample size, n.
The sampling distribution represents the distribution of the point estimates based on
samples of a fixed size from a certain population. It is useful to think of a particular point
estimate as being drawn from such a distribution.
STANDARD ERROR
The standard deviation of the sampling distribution of x̄ is called the standard error of x̄ and is
denoted by σx̄. The standard error describes the typical error or uncertainty associated with the
estimate. Given n independent observations from a population with standard deviation σ, the
standard error of the sample mean is equal to: SE = σ / √n
Note that the SE decreases as n increases ⇒ we expect more consistent sample means as n
increases, thus the variability decreases. Also note that we usually do not know σ, hence we use
the sample standard deviation s instead.
The Central Limit Theorem for random samples states that the sampling distribution of the
sample mean is approximately a normal distribution with mean µ and standard error σ / √n.
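A simulation makes this claim concrete (Python sketch; the uniform population is made up):

```python
import random
import statistics

random.seed(0)

# Made-up population: uniform integers 0..100, so µ = 50 and σ ≈ 29.15
population = list(range(101))
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Draw many samples of size n and record each sample mean
n = 25
means = [statistics.mean(random.choices(population, k=n)) for _ in range(5000)]

# The sampling distribution is centred on µ, with spread ≈ σ/√n (the SE)
print(round(statistics.mean(means), 1))   # ≈ 50
print(round(statistics.stdev(means), 2))  # ≈ σ/√25 ≈ 5.83
```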
Certain conditions must be met for the CLT to apply: the sampled observations must be
independent (e.g. a random sample of less than 10% of the population), and either the sample
size is large (n ≥ 30 as a rule of thumb) or the population distribution is not strongly skewed.
CONFIDENCE INTERVALS
A confidence interval for a parameter is an interval of numbers within which the parameter is
believed to fall, always of the form point estimate ± margin of error (for a 95% interval of the
mean: x̄ ± 1.96 × SE). The probability that this method produces an interval that contains the
parameter is called the confidence level.
Example: A random sample of 50 college students were asked how many exclusive relationships
they have been in so far. This sample yielded a mean of 3.2 and a standard deviation of 1.74.
Estimate the true average number of exclusive relationships using this sample.
We are 95% confident that college students on average have been in between 2.7 and 3.7
exclusive relationships.
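The interval reported above can be reproduced step by step (Python):

```python
import math

# Exclusive-relationships example: n = 50, sample mean 3.2, sample SD 1.74
n, xbar, s = 50, 3.2, 1.74

se = s / math.sqrt(n)    # standard error of the mean
me = 1.96 * se           # margin of error at the 95% level (z* = 1.96)
ci = (xbar - me, xbar + me)

print(round(se, 3))                      # 0.246
print(round(ci[0], 1), round(ci[1], 1))  # 2.7 3.7
```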
HYPOTHESIS TESTING
We conduct a hypothesis test under the assumption that the null hypothesis is true. If the test
results suggest that the data do not provide convincing evidence for the alternative hypothesis,
we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the
alternative.
Test statistic = first construct the distribution assumed under the Null, and then compute a z-
value of the observed value for this distribution.
P-VALUE
The p-value is the probability that the test statistic equals the observed value or an even more
extreme value in the direction predicted by HA. It is calculated presuming that H0 is true.
If the p-value is low (lower than the significance level, α, which is usually 5%) we say
that it would be very unlikely to observe the data if the null hypothesis were true, and
hence reject H0.
If the p-value is high (higher than α) we say that it is likely to observe the data even if the
null hypothesis were true, and hence do not reject H0.
The most commonly used significance level in the social sciences is the 5% significance level.
⇒ For a one-sided test, the p-value at µ + 1.64 × SE under the Null is exactly 0.05 (z = 1.64 for
the standard normal); we reject H0 for values greater than µ + 1.64 × SE.
⇒ For a two-sided test, the p-value at µ ± 1.96 × SE under the Null is exactly 0.05 (z = ±1.96 for
the standard normal); we do not reject H0 for values between µ − 1.96 × SE and µ + 1.96 × SE.
Cut-Off Point
At which value is the cut-off point where we would reject the H0 (assuming the SE remains the
same)? Differently stated: At which value would the p-value be exactly 0.05?
Hence, we can plug in the values 8 for the mean under Null, 0.5 for the SE, and 1.64 for the z-
value at which we know that 95% of observations are expected to be smaller in a normal
distribution.
Therefore, given the (estimated) SE, we would reject the Null at the 5% significance level for all
values equal to or greater than 8.82. That is the point where the p-value equals 0.05.
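The cut-off arithmetic in a couple of lines (Python; 8, 0.5, and 1.64 are the values from the example):

```python
# One-sided test at the 5% level: reject H0 for sample means at or above
# null mean + 1.64 standard errors.
null_mean, se, z_crit = 8, 0.5, 1.64

cutoff = null_mean + z_crit * se
print(round(cutoff, 2))  # 8.82
```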
Two-Tailed Test
If the research question was “Do the data provide convincing evidence that the mean of our
sample is different from our H0 (instead of larger or smaller)?”, the alternative hypothesis would
be different.
Then, when we observe a value x from a sample, we cannot only look at one side of the
distribution to see how likely the value is. We must also consider the other tail
For the p-value, this means that if we want to test at the 5% significance level, we must consider
both tails.
In practice, this means rejecting the Null if the sample falls in either of the tails, each of
which with 2.5% probability.
This works because the normal distribution is symmetric.
In R, we can therefore simply multiply the p-value from before by two.
T-DISTRIBUTION
When we have a small sample size, it is better to be cautious and to use the more robust t
distribution instead of the normal distribution.
When working with small samples, and the population standard deviation is unknown (almost
always), the uncertainty of the standard error estimate is addressed by using a new distribution:
the t distribution.
This distribution also has a bell shape, but its tails are thicker than the normal model’s.
Therefore, observations are more likely to fall beyond two SDs from the mean than under
the normal distribution.
These extra thick tails are helpful for resolving our problem with a less reliable estimate of
the standard error ⇒ we need larger t-values (not z-values any more) to reject H0.
Degrees of Freedom
The degrees of freedom (df) describe the shape of the t distribution. The larger the degrees of
freedom, the more closely the distribution approximates the normal model. If a sample has n
observations and we examine its mean, then we use the t distribution with df = n − 1. At around a
sample size of 30 (some say 50), the t distribution is almost equal to the normal distribution.
Finding the test statistic is very similar to the standard normal case. Instead of z-values, we now
compute t-values with df = n − 1 for the sample mean.
Once we know our t-value, we can again obtain p-values, either using tables or using software
such as R.
Confidence Interval
For a study, 29 people suffering from anorexia are treated with a new therapy. The study registers
the change in weight for each person receiving the therapy over two months. At the end of the
treatment period, the mean weight change was 3.007 kilos, with a standard deviation of 7.309. Is
the treatment effective?
First, define H0 and HA, then begin by calculating the SE for the study:
SE = 7.309 / √29 = 1.357
Next, get the t-value and the p-value (in R): t = (3.007 − 0) / 1.357 = 2.22
The p-value is 0.017. Thus, there is only an about 1.7% chance to observe the value of 3.007 if
the Null is correct. The value is significant at the 5% significance level.
Since we can’t find exact p-values using tables, for our hypothesis test we need to find the cut-
off value at which we start rejecting the Null. In this case, we are looking at a one-sided
hypothesis test (the new therapy has a positive effect), and we want to test at the 5% significance
level.
Thus, at t-values larger than 1.699, we reject H0. Since the value we found is 2.22, H0 is rejected
(at the 5% significance level).
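The whole test can be retraced numerically (Python sketch; the critical value 1.699 is the one quoted in the text):

```python
import math

# Anorexia-therapy example: n = 29, mean weight change 3.007 kg, SD 7.309
n, xbar, s = 29, 3.007, 7.309

se = s / math.sqrt(n)   # standard error of the mean
t = (xbar - 0) / se     # test statistic against H0: µ = 0
df = n - 1              # 28 degrees of freedom

t_crit = 1.699          # one-sided 5% critical value from the t-table used above

print(round(se, 3), round(t, 2))  # 1.357 2.22
print(t > t_crit)                 # True -> reject H0
```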
Finally, let’s construct a 95% confidence interval. For df = 28, the t* value is 2.048, so the
interval is 3.007 ± 2.048 × 1.357 ≈ (0.23, 5.79).
We are going to test to see if there is a difference between the average prices of 0.99 and 1 carat
diamonds. In order to be able to compare equivalent units, we divide the prices of 0.99 carat
diamonds by 99 and 1 carat diamonds by 100, and compare the average point prices.
Parameter of interest: Average difference between the point prices of all 0.99 carat and 1 carat
diamonds.
Point estimate: Average difference between the point prices of sampled 0.99 carat and 1 carat
diamonds.
Test statistic for inference on the difference of two small sample means is as follows:
T_df = (point estimate − null value) / SE
where the standard error of the difference between two means is SE = √(s1²/n1 + s2²/n2), and
the degrees of freedom are df = min(n1 − 1, n2 − 1). With the test statistic and df, we can
compute a p-value for our one-sided test comparing two means. The p-value is small at 0.01,
hence there is only about a 1% chance of drawing our samples if the Null (no difference between
the two diamond classes) were correct. We therefore reject H0: there is enough evidence to
suggest that 0.99 carat diamonds cost less than 1 carat diamonds.
INFERENCE FOR CATEGORICAL DATA
Approval Rating
Note: So far we used the sample mean x̄ (the estimate) vs. the population mean µ (the parameter).
Almost all other estimates are indicated by a hat over the symbol.
We can answer this research question about the proportion of Arstotzkans approving of their
government by using a confidence interval, which we know is always of the form
point estimate ± margin of error
And we also know that ME = critical value × standard error of the point estimate. So what is the
SE of a sample proportion? It is SE = √(p(1 − p)/n).
Note: If p is unknown (as in most cases), we use p̂ in the calculation of the standard error.
But of course, this is true only under certain conditions: Independent observations (of less than
10% of the population). At least 10 successes and 10 failures.
We are given that n = 670, p̂ = 0.85, and we also just learned the standard error of the sample
proportion: SE = √(0.85 × 0.15 / 670) ≈ 0.0138. Thus, we can construct confidence intervals:
The survey found that 571 out of 670 (85%) respondents have a favorable view of their
government. Do these data provide convincing evidence that more than 80% of Arstotzkans
approve of their government?
Since the p-value is low, we reject H0. The data provide convincing evidence that more than
80% of Arstotzkans value the work of their government. Important: For this hypothesis test we
need at least 10 expected successes and failures in the sample (670 × 0.2 = 134 and 670 × 0.8 =
536 ⇒ OK).
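The hypothesis test for the approval example, retraced in Python (the notes would do this in R):

```python
import math
from statistics import NormalDist

# Approval example: 571 of 670 respondents (p-hat = 0.85), H0: p = 0.80
n, p_hat, p0 = 670, 571 / 670, 0.80

# Success-failure condition under H0: both expected counts are at least 10
assert n * p0 >= 10 and n * (1 - p0) >= 10  # 536 and 134 -> OK

# Under H0 the standard error uses the null proportion p0
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)  # one-sided: is p greater than 0.80?

print(round(z, 2), round(p_value, 4))  # z ≈ 3.38, p ≈ 0.0004
```

Since the p-value is far below 0.05, H0 is rejected, matching the conclusion above.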
DIFFERENCE OF PROPORTIONS
A survey asked people from the general population and then specifically students whether they
were bothered by global warming. Here is the result of the survey:
p_students − p_citizens
Point estimate: Difference between the proportions of sampled students and sampled ‘normal’
citizens who are bothered a great deal by global warming.
p̂_students − p̂_citizens
Construct a 95% confidence interval for the difference between the proportions of students and
citizens who are bothered a great deal by global warming
Do these data suggest that the proportion of all students who are bothered a great deal by global
warming differs from the proportion of all citizens who do?
The question now is: what is p̂ here? Remember that with one proportion we used the expected
proportion from H0 and HA. We don’t have this here; instead we use the pooled proportion of
successes in the sample. This is like saying that the two proportions are essentially the same
under the Null.
Type 2 error is failing to reject H0 when you should have (i.e. when HA is true), and the
probability of doing so is β (a little more complicated to calculate).
Power of a test is the probability of correctly rejecting H0, and the probability of doing so
is 1 − β
In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs. Ways to
increase the power of a test:
1. Increase the sample size, which will decrease the standard error.
2. Decrease the standard deviation of the sample, which essentially has the same effect as
increasing the sample size (it will decrease the standard error). With a smaller s we have
a better chance of distinguishing the null value from the observed point estimate. This is
difficult to ensure but cautious measurement process and limiting the population so that it
is more homogenous may help.
3. Increase α, which will make it more likely to reject H0 (but note that this has the side
effect of increasing the Type I error rate).
4. Consider a larger effect size: larger true differences from the null value are easier to
detect, though this means changing the hypothesis.
Real differences between the point estimate and null value are easier to detect with larger
samples. However, very large samples will result in statistical significance even for tiny
differences between the sample mean and the null value (effect size), even when the difference is
not practically significant. This is especially important to research: if we conduct a study, we
want to focus on finding meaningful results (we want observed differences to be real, but also
large enough to matter). The role of a statistician is not just in the analysis of data, but also in
planning and design of a study.
We say there is an association between two variables, if the distribution of the dependent
variable changes in some way as the value of the independent variable changes.
To identify the explanatory variable in a pair of variables, carefully investigate which of the two
is suspected of affecting the other. In the social sciences, we usually do this by using (or
developing) theory.
Thus, we derive a hypothesis: explanatory variable → might affect → response variable
Labeling variables as explanatory and response does not guarantee the relationship between the
two is actually causal, even if there is an association identified between the two variables.
Observational study: Researchers collect data in a way that does not directly interfere with how
the data arise, i.e. they merely “observe", and can only establish an association between the
explanatory and response variables.
Scatterplot
Scatterplots are useful for visualizing the relationship between two numerical variables. Do life
expectancy and GDP per capita appear to be associated or independent? They appear to be
linearly and positively associated: as GDP increases, life expectancy also increases.
The scatterplots exhibit a tendency of the data that the two variables are associated. For instance,
nations with higher GDP tend to have higher life expectancies. The correlation describes this
trend numerically in a single number.
Correlation, which always takes values between −1 and 1, describes the strength of the linear
relationship between two variables. We denote the correlation by R. The correlation for the
observations (x1, y1), (x2, y2), ..., (xn, yn) has the following formula:
R = (1 / (n − 1)) ∑ [(x_i − x̄) / s_x] × [(y_i − ȳ) / s_y]
For each observation, both the xi and the yi values are standardized and then multiplied. So if
observations tend to increase/decrease together, they will have (on average) standardized values
for x and y with the same sign. As a consequence, the sum of these multiplied standardized
values will increase, and the correlation will be relatively large.
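The formula can be applied directly (Python sketch with made-up GDP/life-expectancy-style data):

```python
import statistics

# Hypothetical paired data, e.g. GDP per capita (x) and life expectancy (y)
x = [10, 20, 30, 40, 50]
y = [60, 65, 72, 75, 83]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)

# R = (1 / (n - 1)) * sum of the products of the standardized values
r = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

print(round(r, 3))  # 0.993 -> strong positive linear association
```

Because x and y rise together, the standardized products are almost all positive, pushing R close to 1.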
If they do not increase/decrease together, the standardized values for x and y are sometimes
positive, sometimes negative (by random chance), and the sums will tend to cancel out ⇒ the
correlation will be close to zero.
The linear regression model is written as: y_i = β0 + β1 × x_i + e_i
where x_i and y_i are the values of the independent and the dependent variable for
observation i = 1, ..., n,
e_i is the error term for observation i = 1, ..., n (also called the residual),
β0 and β1 are the model coefficients we want to estimate.
From this formula we want to estimate the best fit for β0 and β1, given the data. These estimates
are then called ^β 0(beta-zero hat) and ^β 1 (beta-one hat)
Fitted value: Also called predicted value. The expected value for an observation i at its given
value xi.
To obtain the best linear unbiased estimator (BLUE), we need to find the line with the smallest
sum of squared residuals (SSR).
Mathematically, we want to minimize: ∑ e_i² (summing over i = 1, ..., n)
Intercept
The intercept is where the regression line intersects the y-axis. The calculation of the intercept
uses the fact that the regression line always passes through (x̄, ȳ).
Since there are no states in the dataset with no HS graduates, the intercept is of no interest, not
very useful, and also not reliable since the predicted value of the intercept is so far from the bulk
of the data.
Using the linear model to obtain the value of the response variable for a given value of the
explanatory variable is called prediction. We simply plug in the value of interest for x in the
linear model equation. There will be some uncertainty associated with the predicted value.
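A least-squares fit and a prediction from scratch (Python sketch; the data are made up, not the poverty data from the notes):

```python
import statistics

# Hypothetical data: % HS graduates (x) and % in poverty (y) for five states
x = [80, 85, 88, 90, 92]
y = [16, 14, 12, 11, 10]

mx, my = statistics.mean(x), statistics.mean(y)

# Least-squares estimates:
#   b1 = sum((x - mx)(y - my)) / sum((x - mx)^2),   b0 = my - b1 * mx
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx  # the fitted line always passes through (x-bar, y-bar)

# Prediction: plug a value of x into the fitted line
y_hat = b0 + b1 * 89
print(round(b1, 3), round(b0, 2), round(y_hat, 2))  # -0.511 57.09 11.58
```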
Model Assumptions
GOODNESS OF FIT: R2
The strength of the fit of a linear model is most commonly evaluated using R². It tells us what
percent of the variability in the response variable is explained by the model. The remainder of the
variability is explained by variables not included in the model or by inherent randomness in the
data. For the model we’ve been working with, R² = (−0.62)² ≈ 0.38. Interpretation: 38% of the
variability in the % of residents living in poverty among the 51 states is explained by the model.
How well does our model explain the total variation of the dependent variable? After running a
regression, there is a part of the variation that is explained, and the rest (the residuals).
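This decomposition can be sketched as R² = 1 − SSR/SST; the observed and fitted values below are hypothetical:

```python
# Hypothetical observed and fitted values for a simple regression.
y = [2.1, 2.9, 3.2, 4.8, 5.1]
fitted = [2.04, 2.83, 3.62, 4.41, 5.20]

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained (residual) variation
r_squared = 1 - ssr / sst                               # share of variation explained
```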
INFERENCE FOR LINEAR REGRESSION
1. An additional 10 points in the biological twin’s IQ is associated with an additional 9 points in
the foster twin’s IQ, on average.
2. Roughly 78% of the variability in the foster twins’ IQs is explained by the model.
H0: The intelligence of the biological twin does not predict the foster twin’s IQ (nurture).
HA: The intelligence of the biological twin does predict the foster twin’s IQ (nature).
If H0 is rejected, we conclude that a statistically significant relationship exists between the
two variables. However, if H0 cannot be rejected, we have insufficient evidence to conclude
that a significant relationship exists.
We always use a t-test in inference for regression. We use the usual (by now known) formula to
obtain the t-statistic: t = (point estimate − null value) / SE = (β̂1 − 0) / SE(β̂1).
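A sketch of this test statistic for the slope (null value β1 = 0), using the standard formula SE(β̂1) = √(SSR/(n−2)) / √(∑(xi−x̄)²) and hypothetical data:

```python
import math

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the slope from the residuals.
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)

t_stat = (b1 - 0) / se_b1  # compare to a t distribution with n - 2 df
```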
Maybe the most important assumption of our OLS models is that the relationship between y and
x is accurately described by a line. Using this assumption allows us to:
1) Characterize the relationship between x and y with a single number.
2) Easily interpret the marginal effect of x (the effect of a one-unit increase in x).
However, if the strong assumption of linearity is wrong, our results are wrong and meaningless.
Thus, we need to carefully assess the linearity assumption. Check using a scatterplot of the data,
or a residuals plot with the independent variable on the x-axis and the residuals on the y-axis.
The next assumption of the OLS model is that the residuals are (nearly) normally distributed.
Thus, at every point of x the residuals should follow a normal distribution. This condition may
not be satisfied when there are unusual observations that don’t follow the trend of the rest of the
data. Check using a histogram or normal probability plot of residuals.
The variability of points around the least squares line
should be roughly constant. This implies that the
variability of residuals around the 0 line should be
roughly constant as well. This is also called homoscedasticity.
Check using a residuals plot with the independent
variable on the x-axis.
Sometimes we include more than a single regressor (independent variables) in the model. This is
called a multiple linear regression (often simply called multiple regression).
Weights of Books
Slope of volume: All else held constant, books that are 1 more cubic centimeter in
volume tend to weigh about 0.72 grams more.
Slope of cover: All else held constant, the model predicts that paperback books weigh
184 grams less than hardcover books.
Intercept: Hardcover books with no volume are expected on average to weigh 198 grams.
Obviously, the intercept does not make sense in context. It only serves to adjust the
height of the line.
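The interpretations above correspond to plugging values into the fitted equation weight = 198 + 0.72·volume − 184·paperback (coefficients rounded as reported above; the 1000 cm³ volume below is hypothetical):

```python
# Fitted books model from the notes (coefficients rounded as reported):
#   weight = 198 + 0.72 * volume - 184 * paperback
def predicted_weight(volume_cm3, paperback):
    # paperback is a dummy: 1 for paperback, 0 for hardcover
    return 198 + 0.72 * volume_cm3 - 184 * paperback

# A hypothetical 1000 cm^3 book under both cover types:
hardcover_w = predicted_weight(1000, 0)
paperback_w = predicted_weight(1000, 1)  # exactly 184 grams lighter
```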
Dummy Variables
The cover type variable has only two categories (hard cover or paperback). We call such a
variable a dummy variable. Dummy variables are categorical variables in binary form (entries
are either 0 or 1). Thus, in our example, one cover type will get the value 0, the other the value 1.
The benefit of such zero-one variables is that they lead to regression models with an easy
interpretation. Advice: When creating your own dummy variables, name your variable after the
1-category (e.g. “female”). This way it is clear later on which category is which!
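A minimal sketch of creating such a dummy, named after the 1-category as advised:

```python
# Two-category variable coded as a 0/1 dummy, named after the 1-category
# ("paperback"), so the coding stays self-explanatory.
cover = ["hardcover", "paperback", "paperback", "hardcover"]
paperback = [1 if c == "paperback" else 0 for c in cover]
```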
If we have a variable with multiple categories, then we get multiple estimates that change the
intercept. We need to include one dummy variable for every category (save the base category),
i.e. a variable that takes the value 1 if the observation is in that category, and 0 otherwise. The
intercept for the base category is the ‘normal’ intercept. Including an extra dummy for the base
category as well leads to the dummy variable trap (the dummies would be perfectly collinear).
Example: If there are four political parties (here labeled A, B, C, D, with A as the base
category) and we create dummies for a regression for them, they look as follows:
Party   dummyB   dummyC   dummyD
A       0        0        0
B       1        0        0
C       0        1        0
D       0        0        1
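Programmatically, this means one 0/1 column per non-base category; a sketch with hypothetical party labels:

```python
# One dummy per category except the base (avoiding the dummy variable trap).
# Party labels are hypothetical.
parties = ["A", "B", "C", "D", "B", "A"]
categories = ["B", "C", "D"]  # base category "A" gets no dummy of its own

dummies = {c: [1 if p == c else 0 for p in parties] for c in categories}
# An observation from the base party has 0 in every dummy column.
```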
Two predictor variables are said to be collinear when they are correlated, and this collinearity
complicates model estimation. Predictors are also called explanatory or independent variables.
Ideally, they would be independent of each other.
Redundancy: Adding such variables brings nothing to the table. Instead, we prefer the
simplest model, i.e. a parsimonious model.
Biased estimates: If the predictors are correlated, we don’t know what part of the variance in y is
caused by x1 or by x2. While it’s impossible to avoid collinearity arising in observational data,
experiments are usually designed to prevent correlation among predictors: e.g., traffic accidents (y)
= age (x1) + legal drinking (x2).
Maybe compare the frequency of accidents for people just below/above the legal drinking age?
Adjusted R2
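A quick sketch of the adjusted R² statistic this heading names, using the standard formula; the R² = 0.38 and n = 51 come from the model earlier in the notes, while the second model’s R² is hypothetical:

```python
# Standard adjusted R^2 formula: penalizes the number of predictors k.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Model from earlier in the notes: R^2 = 0.38, n = 51 states, k = 1 predictor.
a = adjusted_r2(0.38, 51, 1)
# Hypothetical second predictor nudges R^2 up only slightly, to 0.385 ...
b = adjusted_r2(0.385, 51, 2)
# ... yet adjusted R^2 goes DOWN, flagging the extra variable as not worth it.
```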
So, if adding variables to the model might cause problems, why should we add variables to our
models (and not test them one by one)? There are three reasons why we might want to do this:
1. We want to maximize predictive power
2. We have more than one hypothesis of interest.
3. There exists a variable that influences y and is correlated with x (omitted variable bias).
e.g., the effect of education on income and the role played by motivation in both
Thus, the error term ei captures all the variance of y that is still unexplained by the model. This
unexplained variance has two parts:
Some of the variance can be explained by other (measurable) factors.
The rest is due to inherent randomness in the data.
However, if any of the influential variables (that are captured in the error term) is correlated with
our independent variable of interest, then the regression result will be systematically biased! We
need a theoretically driven plan for which variables to include (assumptions): causal graphs (DAGs).