
Unit V

Sample

A sample refers to that part of the aggregate statistical information (i.e., the population) which is actually selected in the
course of an investigation or enquiry to ascertain the characteristics of the population.

Sample size

Sample size refers to the number of members of the population included in the sample. The sample size is
usually denoted by 'n'.

Statistical Laws

The possibility of reaching valid conclusions about the population on the basis of a sample rests on the
following two important laws:

1. Law of Statistical Regularity
2. Law of Inertia of Large Numbers

Law of Statistical Regularity:- It states that a sample of reasonably large size, when selected at random, is
almost sure to represent the characteristics of the population. The selection is said to be random when every
item in the universe has an equal chance of being selected. The larger the size of the sample, the more reliable
is the result, because the sampling error is inversely proportional to the square root of the number of items in
the sample.

Law of Inertia of Large Numbers:- It states that a sample of large size shows a high degree of stability, i.e., the
results obtained from the sample are expected to be very close to the population characteristics. The greater the
size of the sample, the greater will be the compensation or tendency to neutralize extreme values, and
consequently the more stable will be the result.

Example:- Birth rates, death rates, etc., may vary from place to place, but for the country as a whole they will
be found to be fairly stable over a number of years.
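
To make the first law concrete, here is a minimal Python simulation (illustrative only; the population values are invented) showing that the error of a random sample mean shrinks roughly in proportion to the square root of the sample size:

import random
import statistics

random.seed(42)
population = [random.gauss(50, 10) for _ in range(100_000)]  # hypothetical universe
mu = statistics.mean(population)

for n in [25, 100, 400, 1600]:
    # average absolute error of the sample mean over many random samples
    errors = [abs(statistics.mean(random.sample(population, n)) - mu)
              for _ in range(200)]
    print(f"n = {n:5d}  average error of sample mean = {statistics.mean(errors):.3f}")

Each quadrupling of n roughly halves the average error, as the inverse-square-root relationship predicts.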

Parameter

A parameter is a statistical measure based on each and every item of the universe/population.
For example: population mean (µ, called mu), population standard deviation (σ, called sigma), and the proportion
of defective items in the whole lot/population (P).
A parameter shows the characteristics of the universe/population. Since a parameter remains constant, it has
neither sampling fluctuation nor a sampling distribution nor a standard error. Usually the parameter is unknown,
and statistics are used as estimates of parameters. Parameters are also used in calculating the standard error of a statistic.
Statistic

A statistic is a statistical measure based on the items/observations of a sample.

For example: sample mean (X̄), sample standard deviation (S), and the proportion of defectives observed in the
sample (p).
A statistic shows the characteristics of the sample. Since the value of a statistic varies from sample to
sample, it has sampling fluctuation, a sampling distribution and a standard error.
SAMPLING DISTRIBUTION
Meaning:- The sampling distribution of a given statistic is the probability distribution of that statistic.
Examples:- Two important sampling distributions for large samples are:
1. Sampling distribution of the sample mean
2. Sampling distribution of the sample proportion

Sampling distribution of the sample mean:- If X̄ represents the mean of a random sample of size n, drawn from
a population with mean µ and standard deviation σ, then the sampling distribution of X̄ is approximately a
normal distribution with mean = µ and standard deviation = σ / √n (the standard error of X̄), provided the
sample size n is sufficiently large.
Sampling distribution of the sample proportion:- If p represents the proportion of defectives in a random
sample of size n drawn from a lot with proportion of defectives P, then the sampling distribution of p is
approximately a normal distribution with mean = P and standard deviation = √(PQ / n), where Q = 1 − P (the
standard error of p), provided the sample size n is sufficiently large.

The sampling distribution of the mean has the following properties (a small simulation verifying the first two appears after this list):

1. Its mean (the mean of all possible sample means) is the same as the population mean (µ).

2. Its standard deviation, called the standard error of the mean, equals the population standard deviation divided by the square root of the sample size:

S.E.(X̄) = σ / √n

3. It is normally distributed, provided the sample size is sufficiently large (i.e., n > 30).
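
The following minimal Python sketch (illustrative only; the population values are made up) draws repeated samples from a deliberately skewed population and checks that the mean of the sample means is close to µ and that their standard deviation is close to σ / √n:

import random
import statistics

random.seed(1)
population = [random.expovariate(0.1) for _ in range(50_000)]  # a skewed population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 64
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]

print(f"population mean mu        = {mu:.2f}")
print(f"mean of sample means      = {statistics.mean(sample_means):.2f}")
print(f"sigma / sqrt(n)           = {sigma / n ** 0.5:.3f}")
print(f"std. dev. of sample means = {statistics.stdev(sample_means):.3f}")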

Usefulness of the sampling distribution:-

1. It is used to find confidence limits for population parameters.

2. It is used in testing statistical hypotheses.

Central Limit Theorem

If a universe or population has a mean µ and a finite standard deviation σ, then the distribution of the sample
means approaches a normal distribution with mean µ and standard deviation σ / √n as the sample size n
increases.

It permits us to use sample statistics for the purpose of making inferences about population parameters
without having any knowledge about the shape of the frequency distribution of that population.

The central limit theorem is remarkable because it states that the distribution of the sample mean tends to a
normal distribution regardless of the distribution of the population from which the random sample is drawn.
The theorem allows us to make probability statements about the possible range of values the sample mean
may take, i.e., to compute probabilities of how far away X̄ may be from the population mean it
estimates.
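
As a hedged worked example (the numbers are invented for illustration): suppose µ = 100, σ = 12 and n = 36, so S.E.(X̄) = 12 / √36 = 2. The probability that X̄ lies within 3 units of µ can then be read from the normal distribution:

from statistics import NormalDist

mu, sigma, n = 100, 12, 36          # hypothetical population and sample size
se = sigma / n ** 0.5               # standard error of the mean = 2.0

sampling_dist = NormalDist(mu, se)  # distribution of X-bar by the CLT
prob = sampling_dist.cdf(mu + 3) - sampling_dist.cdf(mu - 3)
print(f"P(97 < X-bar < 103) = {prob:.4f}")   # about 0.8664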
STANDARD ERROR

The standard error of a given statistic is the standard deviation of the sampling distribution of that statistic.

For Example:-

1. The standard error of the mean is the standard deviation of the sampling distribution of the mean.

2. The standard error of the proportion is the standard deviation of the sampling distribution of the proportion,
obtained from all possible samples of the same size drawn from the same population.

The standard error of a given statistic is thus the standard deviation of all possible values of that statistic in repeated
samples of a fixed size from a given population. It is a measure of the divergence between the statistic and the
parameter value. This divergence varies with the sample size (n). Thus:

Sample Size    Standard Error
Increases      Decreases
Decreases      Increases

Factors affecting standard error:-

1. The sample size.
2. The nature of the statistic, i.e., mean, variance, etc.
3. The mathematical form of the sampling distribution.
4. The values of some of the parameters used in the sampling distribution.

Usefulness of standard error:-

1. It is used to find confidence limits within which parameters are expected to lie:
Mean ± 1 S.E. covers 68.27% of the values
Mean ± 2 S.E. covers 95.45% of the values
Mean ± 3 S.E. covers 99.73% of the values

2. It is used in testing a given statistical hypothesis at different levels of significance:

Level of Significance    Difference between observed and expected      Whether considered significant
5%                       If the difference is more than 1.96 S.E.      Significant
5%                       If the difference is less than 1.96 S.E.      Not significant
1%                       If the difference is more than 2.58 S.E.      Significant
1%                       If the difference is less than 2.58 S.E.      Not significant

3. It gives an idea about the unreliability of a sample. A greater standard error implies a greater departure of
actual frequencies from the expected ones and hence greater unreliability of the sample. The
reciprocal of the standard error is a measure of the reliability or precision of the sample:

Measure of reliability or precision of the sample = 1 / S.E.


How to compute the standard error of the mean

(a) When the population is infinite, or samples are drawn from a finite population with replacement (whether the sample size is small or large):

    When the population standard deviation (σ) is known:    S.E.(X̄) = σ / √n
    When σ is not known:                                    S.E.(X̄) = S / √(n − 1)

    where σ = population standard deviation, S = sample standard deviation, n = sample size.

(b) When the population is finite (generally whenever the ratio of sample size (n) to population size (N) is 0.05 or more, i.e., the sampling fraction n/N is 0.05 or more):

    When σ is known:        S.E.(X̄) = (σ / √n) × √((N − n) / (N − 1))
    When σ is not known:    S.E.(X̄) = (S / √(n − 1)) × √((N − n) / (N − 1))

    where N = population size.

 When both the population standard deviation (σ) and the sample standard deviation (S) are given,
always use the population standard deviation (σ) while calculating the standard error.
 When the population standard deviation (σ) is not known, use the sample standard deviation (S) as an
estimate of σ, with n − 1 in the denominator of the standard error formula.

 The finite population multiplier √((N − n) / (N − 1)) is used when the population is finite.
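
A minimal Python sketch of the four cases above (the numeric inputs are invented for illustration):

from math import sqrt

def se_mean(n, sigma=None, s=None, N=None):
    """Standard error of the mean, following the rules above.

    sigma -- population standard deviation (preferred when known)
    s     -- sample standard deviation (used with n - 1 when sigma is unknown)
    N     -- population size (finite population multiplier applied if given)
    """
    se = sigma / sqrt(n) if sigma is not None else s / sqrt(n - 1)
    if N is not None:                      # finite population correction
        se *= sqrt((N - n) / (N - 1))
    return se

print(se_mean(n=100, sigma=15))            # infinite population, sigma known
print(se_mean(n=100, s=14))                # sigma unknown, use S / sqrt(n - 1)
print(se_mean(n=100, sigma=15, N=1000))    # finite population, n/N = 0.10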
How to compute the standard error of the proportion

(a) When the population is infinite, or samples are drawn from a finite population with replacement:

    When the population proportion (P) is known:    S.E.(p) = √(PQ / n), where Q = 1 − P
    When P is not known:                            S.E.(p) = √(pq / n), where q = 1 − p

    where P = population proportion, p = sample proportion, n = sample size.

(b) When the population is finite (generally whenever the ratio of sample size (n) to population size (N) is 0.05 or more, i.e., the sampling fraction n/N is 0.05 or more):

    When P is known:        S.E.(p) = √(PQ / n) × √((N − n) / (N − 1))
    When P is not known:    S.E.(p) = √(pq / n) × √((N − n) / (N − 1))

    where N = population size.

 When both the population proportion (P) and the sample proportion (p) are given, always use the
population proportion (P) while calculating the standard error.
 When the population proportion (P) is not known, use the sample proportion (p) as an unbiased estimate
of the population proportion (P).

 The finite population multiplier √((N − n) / (N − 1)) is used when the population is finite.
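
A corresponding sketch for the standard error of a proportion (again with invented inputs):

from math import sqrt

def se_proportion(n, P=None, p=None, N=None):
    """Standard error of a proportion, following the rules above."""
    prop = P if P is not None else p       # prefer the population proportion
    se = sqrt(prop * (1 - prop) / n)
    if N is not None:                      # finite population correction
        se *= sqrt((N - n) / (N - 1))
    return se

print(se_proportion(n=400, P=0.30))            # population proportion known
print(se_proportion(n=400, p=0.28))            # only the sample proportion known
print(se_proportion(n=400, p=0.28, N=5000))    # finite population, n/N = 0.08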

STATISTICAL INFERENCE

Statistical inference refers to the process of selecting and using a sample statistic to draw conclusions about the
parameters of the population from which the sample is drawn. It deals with the following two types of problems:

1. Problem of estimation
2. Problem of test of hypothesis or test of significance

Problem of estimation:-

This problem arises when no information is available about the parameters of the population from which the
sample is drawn.
Problem of test of hypothesis or test of significance:-

This problem arises when some information is available about the parameters of the population from which
the sample is drawn and it is required to test how far this information about the population parameters is
tenable in the light of the information provided by the sample.

ESTIMATION:-
When data are collected by sampling from a population, the most important objective of statistical analysis is
to draw inferences about the population from the information embodied in the sample. There are two types of
estimates:

(1) Point estimates


(2) Interval estimates

Point estimates:-

A point estimate provides a single value of a statistic that is used to estimate an unknown population parameter.

Interval estimates:-

An interval estimate provides an interval of finite width, centered at the point estimate of a parameter, within
which the unknown parameter is expected to lie with a specified probability. Such an interval is called a
confidence interval for the parameter.
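
A brief sketch of a 95% confidence interval for a population mean (invented numbers; uses the large-sample normal multiplier 1.96 from the standard error section above):

x_bar, s, n = 50.0, 8.0, 100       # hypothetical sample results
se = s / n ** 0.5                  # estimated standard error of the mean
lower, upper = x_bar - 1.96 * se, x_bar + 1.96 * se
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")  # (48.43, 51.57)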

Properties of a good estimator:-

(1) Unbiasedness
(2) Consistency
(3) Efficiency
(4) Sufficiency

Having discussed the above concepts, let us now consider the various situations where we have to apply
different tests of significance. For the sake of convenience and clarity, these situations may be summed up
under the following three heads:

(1) Tests of significance for attributes.
(2) Tests of significance for variables (large samples).
(3) Tests of significance for variables (small samples).
TESTS OF HYPOTHESIS
Hypothesis testing begins with an assumption called a hypothesis that we make about a population parameter.

A hypothesis is an assumption or supposition made as a basis for reasoning. There can be several types of
hypotheses.

Example:- A coin may be tossed 200 times and we may get heads 80 times and tails 120 times; we may then
wish to test the hypothesis that the coin is unbiased.

Procedure of testing hypothesis


1. Set up a hypothesis
The first step in hypothesis testing is to set up a hypothesis about a population parameter. We then collect
sample data, produce sample statistics and use this information to decide how likely it is that our hypothesis
about the population parameter is correct.

The conventional approach to hypothesis testing is not to construct a single hypothesis about the population
parameter, but rather to set up two different hypotheses. These hypotheses must be so constructed that if one
hypothesis is accepted, the other is rejected, and vice versa.

The two hypotheses in a statistical test are normally referred to as:

(1) Null hypothesis


(2) Alternative hypothesis

Null hypothesis
The null hypothesis is a very useful tool in testing the significance of difference. In its simplest form, the
hypothesis asserts that there is no real difference between the sample and the population in the particular matter
under consideration (hence the word "null", which means invalid, void or amounting to nothing). The null
hypothesis is akin to the legal principle that a man is innocent until he is proved guilty.

It asserts that there is no significant difference between the sample statistic (e.g., the mean, standard deviation
or proportion of the sample) and the corresponding population parameter, and that the difference, if any, is due
to chance and has arisen out of sampling fluctuations.

Symbol:- It is denoted by H0.

Acceptance:- The acceptance of the null hypothesis implies that we have no evidence to believe otherwise, and
indicates that the difference is not significant and is due to sampling fluctuations.

Rejection:- The rejection of the null hypothesis implies that it is false, and indicates that the difference is
significant.

Significant difference:- The difference is said to be significant when it is more than what is expected
theoretically.

For example:- If we want to find out whether extra coaching has benefited the students or not, we shall set up
a null hypothesis that extra coaching has not benefited the students.
Similarly, if we want to find out whether a particular drug is effective in curing malaria, we will take the null
hypothesis that the drug is not effective in curing malaria.

ALTERNATIVE HYPOTHESIS
The alternative hypothesis specifies those values that the researcher believes to hold true, and, of course, he
hopes that the sample data lead to acceptance of this hypothesis as true. The alternative hypothesis is the
hypothesis which differs from the null hypothesis. It is not tested directly.

Symbol:- It is denoted by H1.

Acceptance:- Its acceptance depends on the rejection of the null hypothesis.

Rejection:- Its rejection depends on the acceptance of the null hypothesis.

For example:- A psychologist who wishes to test whether or not a certain class of people have a mean IQ
different from 100 might establish the following null and alternative hypotheses:

H0 : µ = 100 (Null Hypothesis)

H1 : µ ≠ 100 (Alternative Hypothesis)

2- SET UP A SUITABLE SIGNIFICANCE LEVEL


Meaning:- Level of significance is the maximum probability of rejecting the null hypothesis when it is true.

Symbol:- It is usually expressed as a percentage and is denoted by the symbol α (alpha).

Usefulness:- It is used as a guide in decision making. It indicates the upper limit of the size of the
critical (rejection) region.

Example:- A 5% level of significance implies that there are about 5 chances in 100 of rejecting the null
hypothesis when it is true; in other words, we are about 95% confident that we will make a correct decision.

Note:- Customarily, a 5% or 1% level of significance is taken. Unless otherwise stated in the question, the
author advises students to use the 5% level of significance.

3- SETTING A TEST CRITERION


Meaning:- A test statistic is a function of the sample observations whose computed value determines the
final decision regarding acceptance or rejection of the null hypothesis.

Usefulness:- It is used as a guide in decision making regarding acceptance or rejection of the null hypothesis. If
the value of the test statistic falls in the critical region, the null hypothesis is rejected. If the value of the test
statistic does not fall in the critical region, the null hypothesis is accepted.
Example:-

Test Statistic    Used For

Z – Test          Tests of hypotheses involving large samples, i.e., n > 30
t – Test          Tests of hypotheses involving small samples, i.e., n ≤ 30, when σ is unknown
χ² – Test         Testing the discrepancy between observed frequencies and expected frequencies,
                  without any reference to population parameters
F – Test          Testing the equality of sample variances

Critical region or rejection region


Meaning:- The critical region is the region which corresponds to a pre-determined level of significance. The set
of values of the test statistic which leads to rejection of the null hypothesis is called the region of rejection or
critical region of the test. Conversely, the set of values of the test statistic which leads to the acceptance of the
null hypothesis is called the region of acceptance.

Critical Value:- The critical value is that value of the statistic which separates the critical region from the
acceptance region. It lies at the boundary of the regions of acceptance and rejection.

Size of critical region:- The probability of rejecting a true null hypothesis is called the size of the critical region.

Area under normal curve:- The critical region may be represented by a portion of the area under the normal
curve in the following two ways:-

(1) One-tailed test
(2) Two-tailed test

ONE TAILED TEST


One-tailed tests are of two types:
(1) One-tail test with rejection region on the left
(2) One-tail test with rejection region on the right

One-tail test with rejection region on the left

A left-tailed test is so called because the rejection region is located only in the left tail of the curve.

[Figure: Left-tail test. The critical region (α) lies in the left tail of the normal curve, below the critical value; the remaining area is the acceptance region (1-α).]

One-tail test with rejection region on the right

A right-tailed test is so called because the rejection region is located only in the right tail of the curve.

[Figure: Right-tail test. The critical region (α) lies in the right tail of the normal curve, above the critical value; the remaining area is the acceptance region (1-α).]
Two-tail test with rejection region on both sides:-

A two-tailed test of hypothesis will reject the null hypothesis if the sample statistic is significantly higher or
lower than the hypothesized population parameter. Thus, in a two-tailed test the rejection region is located in both
the tails.

[Figure: Two-tail test. The critical region is split into two tails of α/2 each, bounded by the lower and upper critical values, with the acceptance region (1-α) between them.]

Table showing the type of test and critical region while testing population mean (µ) = 50

Alternative Hypothesis    Type of Alternative    Type of Test    Critical region is represented by
H1 : µ ≠ 50               Both-sided             Two-tailed      Both tails
H1 : µ > 50               One-sided              One-tailed      Right tail
H1 : µ < 50               One-sided              One-tailed      Left tail

Critical values of the test statistic 'Z' at 1% and 5% levels of significance for one-tailed
and two-tailed tests are given below:

Type of Test               Level of Significance
                           1%         5%
Two-tailed                 ± 2.58     ± 1.96
One-tailed (right tail)    + 2.33     + 1.645
One-tailed (left tail)     − 2.33     − 1.645
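
These critical values can be reproduced from the standard normal distribution; a short illustrative check in Python:

from statistics import NormalDist

z = NormalDist()  # standard normal distribution
for alpha in (0.01, 0.05):
    two_tail = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    one_tail = z.inv_cdf(1 - alpha)       # right-tail critical value
    print(f"alpha = {alpha:.2f}: two-tailed ±{two_tail:.3f}, "
          f"right tail +{one_tail:.3f}, left tail -{one_tail:.3f}")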
TWO TYPES OF ERRORS IN TESTING OF HYPOTHESIS
When a statistical hypothesis is tested, there are four possibilities:

(1) The hypothesis is true but our test rejects it. ( Type I error)
(2) The hypothesis is false but our test accepts it. (Type II error)
(3) The hypothesis is true and our test accepts it. (correct decision)
(4) The hypothesis is false and our test rejects it. (correct decision)

Obviously, the first two possibilities lead to two types of errors:-

1- Type I error
2- Type II error

A Type I error is committed by rejecting the null hypothesis when it is true. The probability of committing a
Type I error is denoted by α (alpha), where

α = Probability (Type I error)

α = Probability (rejecting H0 | H0 is true)

A Type II error is committed by not rejecting the null hypothesis when it is false. The probability of
committing a Type II error is denoted by β (beta), where

β = Probability (Type II error)

β = Probability (not rejecting H0 | H0 is false)

Type I and Type II errors are described by the following table:

Decision     H0: True            H0: False
Accept H0    Correct decision    Type II error
Reject H0    Type I error        Correct decision

While testing a hypothesis, the aim is to reduce both types of errors, Type I and Type II. But for a
fixed sample size, it is not possible to control both errors simultaneously: there is a trade-off between
the two types. The probability of making one type of error can be reduced only if we are willing to
increase the probability of making the other type. In order to get a low β, we will have to put up with a
high α.

Power of Test:-
The power of a test is the probability of rejecting a false null hypothesis. It can be calculated as follows:

Power of test = 1 − probability of Type II error = 1 − β.
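
A hedged numeric sketch (all numbers invented): for H0: µ = 100 against H1: µ = 104, with σ = 12, n = 36 and a right-tailed test at α = 0.05, the power can be computed from the normal sampling distribution:

from statistics import NormalDist

mu0, mu1, sigma, n, alpha = 100, 104, 12, 36, 0.05   # hypothetical values
se = sigma / n ** 0.5                                # standard error = 2.0

# Reject H0 when X-bar exceeds the critical value under H0
critical = NormalDist(mu0, se).inv_cdf(1 - alpha)    # about 103.29

beta = NormalDist(mu1, se).cdf(critical)             # P(accept H0 | H1 true)
print(f"Type II error beta = {beta:.4f}, power = {1 - beta:.4f}")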


Student's t-Distribution Test

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if
the null hypothesis is supported. It can be used to determine if two sets of data are significantly different
from each other, and is most commonly applied when the test statistic would follow a normal distribution if
the value of a scaling term in the test statistic were known. When the scaling term is unknown and is
replaced by an estimate based on the data, the test statistic (under certain conditions) follows a
Student's t distribution.

One-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value µ0, one uses the statistic

t = (X̄ − µ0) / (s / √n)

where X̄ is the sample mean, s is the sample standard deviation and n is the sample size. The
degrees of freedom used in this test are n − 1. Although the parent population does not need to be normally
distributed, the distribution of the population of sample means, X̄, is assumed to be normal. By the central
limit theorem, if the sampling of the parent population is independent then the sample means will be
approximately normal. (The degree of approximation will depend on how close the parent population is to
a normal distribution and on the sample size, n.)
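
A minimal sketch of this computation in Python (the data values are invented):

from math import sqrt
from statistics import mean, stdev

data = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2]  # hypothetical sample
mu0 = 12.0                      # hypothesized population mean

n = len(data)
t = (mean(data) - mu0) / (stdev(data) / sqrt(n))   # stdev uses n - 1
print(f"t = {t:.3f} with {n - 1} degrees of freedom")
# Compare |t| with the critical value of Student's t for n - 1 df,
# e.g. t(0.025, 7) = 2.365 for a two-tailed test at the 5% level.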
Equal or unequal sample sizes, equal variance
This test is used only when it can be assumed that the two distributions have the same variance. The t statistic
to test whether the means are different can be calculated as follows:

t = (X̄1 − X̄2) / (sp × √(1/n1 + 1/n2))

where

sp = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]

Note that these formulae are generalizations of the case where both samples have equal sizes
(substitute n for n1 and n2).
Here sp is an estimator of the common standard deviation of the two samples: it is defined in this way
so that its square is an unbiased estimator of the common variance whether or not the population
means are the same. In these formulae, ni − 1 is the number of degrees of freedom for group i, and the total
sample size minus two (that is, n1 + n2 − 2) is the total number of degrees of freedom, which is used in
significance testing.
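
A sketch of the pooled two-sample computation (invented data):

from math import sqrt
from statistics import mean, variance

group1 = [23.0, 25.1, 24.3, 26.2, 25.5]            # hypothetical samples
group2 = [21.8, 22.9, 23.5, 22.0, 24.1, 23.2]

n1, n2 = len(group1), len(group2)
# Pooled variance: weighted average of the two sample variances (each uses n - 1)
sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"t = {t:.3f} with {n1 + n2 - 2} degrees of freedom")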

Chi-Square Test

Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain
according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20
offspring from a cross to be male and the actual observed number was 8 males, then you might want to know
about the "goodness of fit" between the observed and expected. Were the deviations (differences between
observed and expected) the result of chance, or were they due to other factors? How much deviation can occur
before you, the investigator, must conclude that something other than chance is at work, causing the observed
to differ from the expected? The chi-square test always tests what scientists call the null
hypothesis, which states that there is no significant difference between the expected and observed results.

The formula for calculating chi-square (χ²) is:

χ² = Σ (o − e)² / e

That is, chi-square is the sum of the squared differences between the observed (o) and the expected (e) data (the
deviation, d), divided by the expected data, over all possible categories.

For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green
seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is
that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous
for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1
(based on Mendel's laws), as predicted from the results of the Punnett square.
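
Carrying this example through as a sketch (the 3:1 expectation gives e = 660 green and 220 yellow out of 880):

observed = [639, 241]                  # green, yellow (from the cross above)
expected = [880 * 3 / 4, 880 * 1 / 4]  # 3:1 ratio under the hypothesis: 660, 220

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.3f} with 1 degree of freedom")
# Critical value at the 5% level for 1 df is 3.841; since 2.673 < 3.841,
# the deviation is attributable to chance and the hypothesis is not rejected.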

F-Test

The F-test is designed to test whether two population variances are equal. It does this by comparing the ratio of
the two sample variances:

F = S1² / S2²

where S1 = standard deviation of the first data set and S2 = standard deviation of the second data set. If the
variances are equal, the ratio of the variances will be 1.

If the null hypothesis is true, then the F test statistic given above simplifies dramatically: the ratio of
sample variances is the test statistic used. If the null hypothesis is false, then we reject the null
hypothesis that the ratio is equal to 1, and with it our assumption that the variances were equal.

There are several different F-tables. Each one has a different level of significance. So, find the correct level of
significance first, and then look up the numerator degrees of freedom and the denominator degrees of freedom
to find the critical value.

You will notice that the tables give critical values only for right-tail tests. Because the F
distribution is not symmetric, and there are no negative values, you may not simply take the opposite of the
right critical value to find the left critical value. The way to find a left critical value is to reverse the degrees of
freedom, look up the right critical value, and then take the reciprocal of that value. For example, the critical
value with 0.05 on the left with 12 numerator and 15 denominator degrees of freedom is found by taking the
reciprocal of the critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of
freedom.

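A sketch of the F-test computation (invented data; one common convention, assumed here, places the larger variance in the numerator so that only the right-tail critical value is needed):

from statistics import variance

sample1 = [21.3, 24.1, 22.8, 25.6, 23.9, 22.5]   # hypothetical data sets
sample2 = [20.9, 21.4, 21.1, 21.8, 20.6, 21.5]

# Put the larger variance in the numerator so F >= 1
v1, v2 = variance(sample1), variance(sample2)
F = max(v1, v2) / min(v1, v2)
df = (len(sample1) - 1, len(sample2) - 1)
print(f"F = {F:.3f} with {df} degrees of freedom")
# Compare with the right-tail critical value from an F-table,
# e.g. F(0.05; 5, 5) = 5.05; if F exceeds it, reject equal variances.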


Z – Test:

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be
approximated by a normal distribution. Because of the central limit theorem, many test statistics are
approximately normally distributed for large samples. For each significance level, the Z-test has a single
critical value (for example, 1.96 for a 5% two-tailed test), which makes it more convenient than the Student's
t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be
conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If
the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample
size is not large (n < 30), the Student's t-test may be more appropriate.

The test statistic is

z = (X̄ − Δ) / (σ / √n)

where X̄ is the sample mean, Δ is a specified value to be tested, σ is the population standard deviation, and n is
the size of the sample. Look up the significance level of the z-value in the standard normal table.
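
A final sketch of a one-sample Z-test (numbers invented): suppose a sample of n = 49 has mean X̄ = 52, testing Δ = 50 with σ = 7:

from statistics import NormalDist

x_bar, delta, sigma, n = 52.0, 50.0, 7.0, 49   # hypothetical inputs
z = (x_bar - delta) / (sigma / n ** 0.5)       # = 2.0

# Two-tailed p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p = {p_value:.4f}")   # z = 2.00 > 1.96, significant at 5%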
