Probability and Statistics
Prepared by:
Mr.YANGALADASU DILIP KUMAR
Assoc. Professor
CONTENTS
3. Bloom's Taxonomy
4. Course Syllabus
b. Notes
f. Tutorial Questions
www.mrcet.ac.in
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
VISION
To establish a pedestal for the integral innovation, team spirit, originality and
competence in the students, expose them to face the global challenges and become
technology leaders of Indian vision of modern society.
MISSION
To become a model institution in the fields of Engineering, Technology and
Management.
To impart holistic education to the students to render them as industry ready
engineers.
To ensure synchronization of MRCET ideologies with challenging demands of
International Pioneering Organizations.
QUALITY POLICY
To implement best practices in Teaching and Learning process for both UG and PG
courses meticulously.
To channelize the activities and tune them in heights of commitment and sincerity,
the requisites to claim the never - ending ladder of SUCCESS year after year.
VISION
MISSION
The Department of Mechanical Engineering is dedicated for transforming the students into
highly competent Mechanical engineers to meet the needs of the industry, in a changing
and challenging technical environment, by strongly focusing in the fundamentals of
engineering sciences for achieving excellent results in their professional pursuits.
Quality Policy
PSO1 Ability to analyze, design and develop Mechanical systems to solve the
Engineering problems by integrating thermal, design and manufacturing Domains.
PSO3 Ability to apply the learned Mechanical Engineering knowledge for the
Development of society and self.
PEO1: PREPARATION
To make the students to design, experiment, analyze, interpret in the core field with the help of
other inter disciplinary concepts wherever applicable.
To inculcate the habit of lifelong learning for career development through successful completion
of advanced degrees, professional development courses, industrial training etc.
Department of Mechanical Engineering
PEO5: PROFESSIONALISM
To impart technical knowledge, ethical values for professional development of the student to solve
complex problems and to work in multi-disciplinary ambience, whose solutions lead to significant
societal benefits.
Bloom's Taxonomy
Bloom’s Taxonomy is a classification of the different objectives and skills that educators set for
their students (learning objectives). The terminology has been updated to include the following
six levels of learning. These 6 levels can be used to structure the learning objectives, lessons,
and assessments of a course.
1. Remembering: Retrieving, recognizing, and recalling relevant knowledge from long-term
memory.
2. Understanding: Constructing meaning from oral, written, and graphic messages through
interpreting, exemplifying, classifying, summarizing, inferring, comparing, and explaining.
3. Applying: Carrying out or using a procedure for executing or implementing.
4. Analyzing: Breaking material into constituent parts, determining how the parts relate to
one another and to an overall structure or purpose through differentiating, organizing, and
attributing.
5. Evaluating: Making judgments based on criteria and standards through checking and
critiquing.
6. Creating: Putting elements together to form a coherent or functional whole; reorganizing
elements into a new pattern or structure through generating, planning, or producing.
B. Tech (ME) R-22
COURSE OBJECTIVES:
Single Random Variables -Discrete and Continuous, Probability distribution function, Probability
mass and density functions, mathematical expectation and variance.
Multiple Random variables: Discrete and Continuous, Joint probability distribution, Marginal
probability density functions, conditional probability distribution function and density functions.
Binomial distribution – properties, mean, variance and recurrence formula for Binomial distribution,
Poisson distribution – Poisson distribution as Limiting case of Binomial distribution, properties, mean
variance and recurrence formula for Poisson distribution, Normal distribution – mean, variance,
median, mode and characteristics of Normal distribution.
Sampling: Definitions - Types of sampling - Expected values of sample mean and variance, Standard
error - Sampling distribution of means and variance. Estimation - Point estimation and Interval
estimation.
Testing of hypothesis: Null and Alternative hypothesis - Type I and Type II errors, Critical region -
confidence interval - Level of significance, One tailed and Two tailed test.
Large sample Tests: Test of significance - Large sample test for single mean, difference of means,
single proportion, and difference of proportions.
Small samples: Test for single mean, difference of means, paired t-test, test for ratio of variances
(F-test),Chi- square test for goodness of fit and independence of attributes.
TEXT BOOKS:
REFERENCE BOOKS:
1. Introduction to Probability and Statistics for Engineers and Scientists by Sheldon M.Ross.
2. Probability and Statistics for Engineers by Dr. J. Ravichandran
COURSE OUTCOMES:
1. Evaluate randomness in certain realistic situation which can be either discrete or continuous
type and compute statistical constants of these random variables.
2. Provide very good insight which is essential for industrial applications by learning probability
distributions.
3. Develop higher-order thinking skills to make objective, data-driven decisions by using correlation and
regression.
4. Assess the importance of sampling distribution of a given statistic of a random sample.
5. Analyze and interpret statistical inference using samples of a given size which is taken from
a population.
UNIT 1
RANDOM VARIABLES
OBJECTIVES
continuous type.
variables.
OUTCOME:
RANDOM VARIABLES
Random Variable
A Random Variable X is a real valued function from the sample space S to the set of real numbers R.
(or)
A Random Variable X is a real number which is determined by the outcomes of the random
experiment.
X(S)={0,1,2}.
X(S)={2,3,4,5,6,7,8,9,10,11,12}.
2. Continuous Random Variables: A Random Variable X which takes all possible values
in a given interval of domain.
x_i      0     1     2
p(x_i)   1/4   1/2   1/4
Properties:
1) $E(X) = \mu$
2) $E(kX) = k\,E(X)$
3) $E(X + k) = E(X) + k$
4) $E(aX \pm b) = a\,E(X) \pm b$
$= \sum x_i^2\,p_i - \mu^2$
Properties:
1) $V(c) = 0$ where c is a constant
2) $V(kX) = k^2\,V(X)$
3) $V(X + k) = V(X)$
4) $V(aX \pm b) = a^2\,V(X)$
Problems
1. If 3 cars are selected randomly from 6 cars having 2 defective cars.
x_i      0     1     2
p(x_i)   1/5   3/5   1/5
Expected number of defective cars $= \sum_{i=1}^{n} x_i\,p(x_i) = 0\left(\tfrac{1}{5}\right) + 1\left(\tfrac{3}{5}\right) + 2\left(\tfrac{1}{5}\right) = 1$
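The distribution above can be checked by counting. The following is a minimal sketch, assuming the cars are drawn without replacement so that the number of defectives follows a hypergeometric count; the function name `defective_distribution` is illustrative and not part of the notes.

```python
from fractions import Fraction
from math import comb

def defective_distribution(total, defective, drawn):
    """Return {x: P(X = x)} for the number of defectives in a draw without replacement."""
    good = total - defective
    ways = comb(total, drawn)  # number of equally likely samples
    return {
        x: Fraction(comb(defective, x) * comb(good, drawn - x), ways)
        for x in range(0, min(defective, drawn) + 1)
        if drawn - x <= good
    }

# 6 cars, 2 defective, 3 selected (the data of Problem 1)
pmf = defective_distribution(total=6, defective=2, drawn=3)
expected = sum(x * p for x, p in pmf.items())
print(pmf)
print(expected)
```

Using exact fractions keeps the probabilities in the same form as the table, so they can be compared entry by entry.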
2. Let X be a random variable of sum of two numbers in throwing two fair dice. Find the
probability distribution of X, mean ,variance.
S ={(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
∴ 𝑛(𝑆) = 36.
x_i   Favourable cases                  No. of cases   p(x_i)
8     (6,2),(5,3),(4,4),(3,5),(2,6)     5              5/36
9     (6,3),(5,4),(4,5),(3,6)           4              4/36
10    (6,4),(5,5),(4,6)                 3              3/36
11    (6,5),(5,6)                       2              2/36
12    (6,6)                             1              1/36

x_i      2     3     4     5     6     7     8     9     10    11    12
p(x_i)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
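Mean $= \mu = \sum_{i=1}^{n} x_i\,p(x_i) = 2\left(\tfrac{1}{36}\right) + 3\left(\tfrac{2}{36}\right) + 4\left(\tfrac{3}{36}\right) + 5\left(\tfrac{4}{36}\right) + 6\left(\tfrac{5}{36}\right) + 7\left(\tfrac{6}{36}\right) + 8\left(\tfrac{5}{36}\right) + 9\left(\tfrac{4}{36}\right) + 10\left(\tfrac{3}{36}\right) + 11\left(\tfrac{2}{36}\right) + 12\left(\tfrac{1}{36}\right) = 7.$
Variance $= V(X) = \sum x_i^2\,p_i - \mu^2$
$= 4\left(\tfrac{1}{36}\right) + 9\left(\tfrac{2}{36}\right) + 16\left(\tfrac{3}{36}\right) + 25\left(\tfrac{4}{36}\right) + 36\left(\tfrac{5}{36}\right) + 49\left(\tfrac{6}{36}\right) + 64\left(\tfrac{5}{36}\right) + 81\left(\tfrac{4}{36}\right) + 100\left(\tfrac{3}{36}\right) + 121\left(\tfrac{2}{36}\right) + 144\left(\tfrac{1}{36}\right) - 49 = 5.83$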
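The mean and variance of the sum of two dice can be reproduced by enumerating the 36 outcomes. This is a small sketch; the helper names `distribution` and `mean_and_variance` are illustrative only.

```python
from fractions import Fraction
from itertools import product

def distribution(statistic):
    """pmf of statistic(a, b) over the 36 equally likely outcomes of two fair dice."""
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        x = statistic(a, b)
        pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)
    return pmf

def mean_and_variance(pmf):
    mu = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # E(X^2) - mu^2
    return mu, var

sum_pmf = distribution(lambda a, b: a + b)
mu, var = mean_and_variance(sum_pmf)
print(mu, var, float(var))
```

The exact variance is 35/6, which rounds to the 5.83 quoted in the solution.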
3. Let X be a random variable of maximum of two numbers in throwing two fair dice
simultaneously. Find the
a) probability distribution of X
b)mean
c)variance
d)P(1<x<4)
e)P(2≤ 𝒙 ≤ 𝟒).
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
∴ 𝑛(𝑆) = 36.
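x_i = 6: (1,6),(6,1),(6,2),(2,6),(6,3),(3,6),(4,6),(6,4),(6,5),(5,6),(6,6); 11 cases; p(x_i) = 11/36
Clearly, $p(x_i) > 0$ and $\sum_{i=1}^{n} p(x_i) = 1$
x_i      1     2     3     4     5     6
p(x_i)   1/36  3/36  5/36  7/36  9/36  11/36
Mean $= \mu = \sum_{i=1}^{n} x_i\,p(x_i) = 1\left(\tfrac{1}{36}\right) + 2\left(\tfrac{3}{36}\right) + 3\left(\tfrac{5}{36}\right) + 4\left(\tfrac{7}{36}\right) + 5\left(\tfrac{9}{36}\right) + 6\left(\tfrac{11}{36}\right) = \tfrac{161}{36} = 4.47.$
$\sum x_i^2\,p(x_i) = 1\left(\tfrac{1}{36}\right) + 4\left(\tfrac{3}{36}\right) + 9\left(\tfrac{5}{36}\right) + 16\left(\tfrac{7}{36}\right) + 25\left(\tfrac{9}{36}\right) + 36\left(\tfrac{11}{36}\right) = \tfrac{791}{36}$
∴ Variance $= \tfrac{791}{36} - \left(\tfrac{161}{36}\right)^2 = \tfrac{2555}{1296} \approx 1.97.$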
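As a cross-check of Problem 3, the sketch below enumerates the maximum of two dice with exact fractions and also evaluates parts (d) and (e), which are read here as sums of the pmf over the stated ranges. The variable names are illustrative; note that the exact variance 2555/1296 is about 1.97, so a value of 1.99 only arises when the mean is rounded to 4.47 before squaring.

```python
from fractions import Fraction
from itertools import product

# pmf of X = max of two fair dice, built by counting the 36 equally likely outcomes
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    x = max(a, b)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2
p_d = sum(p for x, p in pmf.items() if 1 < x < 4)    # P(1 < X < 4)
p_e = sum(p for x, p in pmf.items() if 2 <= x <= 4)  # P(2 <= X <= 4)

print(float(mean), float(variance), p_d, p_e)
```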
x_i      -3    -2    -1    0     1     2     3
p(x_i)   k     0.1   k     0.2   2k    0.4   2k
Find k ,mean, variance.
i.e k+0.1+k+0.2+2k+0.4+2k = 1
= 0.8.
∴ Variance = 2.86.
In general, $E(g(x)) = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx.$
Properties:
1) $E(X) = \mu$
2) $E(kX) = k\,E(X)$
3) $E(X + k) = E(X) + k$
4) $E(aX \pm b) = a\,E(X) \pm b$
Properties:
2) $V(kX) = k^2\,V(X)$
3) $V(X + k) = V(X)$
4) $V(aX \pm b) = a^2\,V(X)$
Median: Median is the point which divides the entire distribution into two equal parts. In
case of continuous distribution, median is the point M which divides the total area into two
equal parts, i.e., $\int_a^M f(x)\,dx = \int_M^b f(x)\,dx = \tfrac{1}{2}$, $x \in (a, b)$.
Mode: Mode is the value of x for which f(x) is maximum.
i.e f ′(x) = 0 and f "(x) < 0 for x ∈ (a, b)
Problems
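1. If the probability density function is $f(x) = \dfrac{k}{1+x^2}$, $-\infty < x < \infty$, find the value of 'k'
and the probability distribution function of f(x).
Sol: Since the total area under the probability curve is 1, i.e., $\int_a^b f(x)\,dx = 1$,
$\int_{-\infty}^{\infty} \dfrac{k}{1+x^2}\,dx = 1$
$2k\left[\tan^{-1}x\right]_0^{\infty} = 1$
$2k\left(\tan^{-1}\infty - \tan^{-1}0\right) = 1$
$\therefore k = \dfrac{1}{\pi}$
Cumulative distribution function of f(x) is given by
$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \dfrac{k}{1+t^2}\,dt = \dfrac{1}{\pi}\left[\tan^{-1}t\right]_{-\infty}^{x} = \dfrac{1}{\pi}\left[\dfrac{\pi}{2} + \tan^{-1}x\right].$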
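$\int_{-\infty}^{\infty} c\,e^{-|x|}\,dx = 1$
$2\int_0^{\infty} c\,e^{-x}\,dx = 1$
$2c\left[\dfrac{e^{-x}}{-1}\right]_0^{\infty} = 1$
$\therefore c = \dfrac{1}{2}$
Mean $= \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \dfrac{1}{2}\int_{-\infty}^{\infty} x\,e^{-|x|}\,dx = 0$ since $x\,e^{-|x|}$ is an odd function.
Variance $= V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
$= \dfrac{1}{2}\int_{-\infty}^{\infty} x^2 e^{-|x|}\,dx$
$= \dfrac{1}{2}\int_0^{\infty} 2x^2 e^{-x}\,dx = \left[x^2(-e^{-x}) - 2x(e^{-x}) + 2(-e^{-x})\right]_0^{\infty} = 2.$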
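3. If the probability density function is $f(x) = \begin{cases} \dfrac{\sin x}{2} & \text{if } 0 \le x \le \pi \\ 0 & \text{otherwise} \end{cases}$
find the mean, median, mode and $P\left(0 < x < \dfrac{\pi}{2}\right)$.
Sol: Mean $= \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^{\pi} x\,\dfrac{\sin x}{2}\,dx = \dfrac{1}{2}\left[-x\cos x + \sin x\right]_0^{\pi} = \dfrac{\pi}{2}.$
Median M: $\int_0^{M} \dfrac{\sin x}{2}\,dx = \int_M^{\pi} \dfrac{\sin x}{2}\,dx = \dfrac{1}{2}$
$\therefore M = \dfrac{\pi}{2}$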
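Since $f(x) = \begin{cases} \dfrac{\sin x}{2} & \text{if } 0 \le x \le \pi \\ 0 & \text{otherwise} \end{cases}$
To find the maximum, we have $f'(x) = 0$,
i.e., $\cos x = 0$, which implies that $x = \dfrac{\pi}{2}$,
and $f''(x) = -\dfrac{\sin x}{2}$, which is less than 0 at $x = \dfrac{\pi}{2}$.
$\therefore$ Mode $= \dfrac{\pi}{2}$.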
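$\int_1^3 4k(x-1)^3\,dx = 1$
$\left[k(x-1)^4\right]_1^3 = 1$
$\therefore k = \dfrac{1}{16}$
$\therefore f(x) = \begin{cases} 0 & \text{if } x \le 1 \\ \dfrac{1}{4}(x-1)^3 & \text{if } 1 < x \le 3 \\ 0 & \text{if } x > 3 \end{cases}$
Mean $= \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \dfrac{1}{4}\int_1^3 x(x-1)^3\,dx = \dfrac{1}{4}\cdot\dfrac{52}{5} = 2.6.$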
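Because the density is a polynomial on [1, 3], its normalisation and mean can be checked exactly. The sketch below assumes the density $f(x) = \tfrac{1}{4}(x-1)^3$ on $1 \le x \le 3$ obtained from $k = 1/16$; `integrate_poly` is an illustrative helper, not something defined in the notes.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        return sum(Fraction(c, i + 1) * Fraction(x) ** (i + 1)
                   for i, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)

# (x - 1)^3 / 4 = (-1 + 3x - 3x^2 + x^3) / 4, coefficients from x^0 upward
pdf = [Fraction(-1, 4), Fraction(3, 4), Fraction(-3, 4), Fraction(1, 4)]
x_pdf = [Fraction(0)] + pdf  # multiplying by x shifts every coefficient up one power

total_area = integrate_poly(pdf, 1, 3)
mean = integrate_poly(x_pdf, 1, 3)
print(total_area, mean, float(mean))
```

A mean of 13/5 = 2.6 lies inside the support [1, 3], as any mean of this density must.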
Multiple Random variables
and $f_{XY}(x, y) = \dfrac{\partial^2}{\partial x\,\partial y}\left[F_{XY}(x, y)\right]$
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
Problems
X\Y 1 2 3 4
1 0.1 0 0.2 0.1
2 0.05 0.12 0.08 0.01
3 0.1 0.05 0.1 0.09
Find i) P(X≤ 𝟐, 𝐘 = 𝟐) ii) 𝐅𝐗(𝟐) iii)P(Y=3) iv) P(X< 𝟑, 𝐘 ≤ 𝟒) v) 𝐅𝐲(𝟑).
Sol: Given
X\Y 1 2 3 4
1 0.1 0 0.2 0.1
2 0.05 0.12 0.08 0.01
3 0.1 0.05 0.1 0.09
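ii) $F_X(2) = P(X \le 2)$
=(0.1+0+0.2+0.1)+(0.05+0.12+0.08+0.01)
=0.66
iii) P(Y = 3) = 0.2+0.08+0.1
= 0.38.
iv) P(X < 3, Y ≤ 4) = P(X < 3, Y = 1) + P(X < 3, Y = 2) + P(X < 3, Y = 3)
+ P(X < 3, Y = 4)
= P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 1, Y = 2)
+ P(X = 2, Y = 2) + P(X = 1, Y = 3) + P(X = 2, Y = 3)
+ P(X = 1, Y = 4) + P(X = 2, Y = 4)
=(0.1+0+0.2+0.1)+(0.05+0.12+0.08+0.01)
=0.66
v) $F_Y(3) = P(Y \le 3)$
=(0.1+0.05+0.1)+(0+0.12+0.05)+(0.2+0.08+0.1)
=0.8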
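All five parts of this problem are sums of entries of the joint table, so they can be computed mechanically. The following sketch stores the table with exact decimal fractions; the dictionary layout and the helper `prob` are illustrative choices.

```python
from fractions import Fraction

# Joint pmf P(X = x, Y = y) from the table; rows are x = 1, 2, 3 and columns y = 1..4
rows = {
    1: ["0.1", "0", "0.2", "0.1"],
    2: ["0.05", "0.12", "0.08", "0.01"],
    3: ["0.1", "0.05", "0.1", "0.09"],
}
joint = {(x, y): Fraction(p)
         for x, ps in rows.items()
         for y, p in enumerate(ps, start=1)}

def prob(event):
    """Sum the joint pmf over all cells (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

answers = {
    "i   P(X<=2, Y=2)": prob(lambda x, y: x <= 2 and y == 2),
    "ii  F_X(2)":       prob(lambda x, y: x <= 2),
    "iii P(Y=3)":       prob(lambda x, y: y == 3),
    "iv  P(X<3, Y<=4)": prob(lambda x, y: x < 3 and y <= 4),
    "v   F_Y(3)":       prob(lambda x, y: y <= 3),
}
for label, value in answers.items():
    print(label, float(value))
```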
2. Suppose the random variables X and Y have the joint density function defined by
$f(x, y) = \begin{cases} c(2x + y) & \text{if } 2 < x < 6,\ 0 < y < 5 \\ 0 & \text{otherwise} \end{cases}$
Find i)c ii)P(X>3,Y>2) iii) P(X>3)
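Sol: Since $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
$\int_2^6\int_0^5 c(2x + y)\,dy\,dx = 1$
$\int_2^6 c\left[2xy + \dfrac{y^2}{2}\right]_0^5 dx = 1$
$\int_2^6 c\left(10x + \dfrac{25}{2}\right) dx = 1$
$c\left[10\,\dfrac{x^2}{2} + \dfrac{25x}{2}\right]_2^6 = 1$
$\therefore c = \dfrac{1}{210}$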
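ii) $P(X > 3, Y > 2) = \int_3^6\int_2^5 f(x, y)\,dy\,dx$
$= \int_3^6\int_2^5 \dfrac{1}{210}(2x + y)\,dy\,dx$
$= \dfrac{1}{210}\int_3^6 \left[2xy + \dfrac{y^2}{2}\right]_2^5 dx = \dfrac{15}{28}.$
iii) $P(X > 3) = \int_3^6\int_0^5 f(x, y)\,dy\,dx$
$= \dfrac{1}{210}\int_3^6\int_0^5 (2x + y)\,dy\,dx$
$= \dfrac{1}{210}\int_3^6 \left[2xy + \dfrac{y^2}{2}\right]_0^5 dx$
$= \dfrac{1}{210}\int_3^6 \left(10x + \dfrac{25}{2}\right) dx$
$= \dfrac{1}{210}\left[5x^2 + \dfrac{25x}{2}\right]_3^6 = \dfrac{23}{28}.$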
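The three answers of this problem can be verified numerically. The sketch below uses a plain midpoint rule, which happens to be exact (up to rounding) for an integrand that is linear in x and y; `double_integral` is an illustrative helper and the step count is an arbitrary choice.

```python
def double_integral(f, x_range, y_range, steps=200):
    """Midpoint-rule approximation of the integral of f over a rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / steps, (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * hx
        for j in range(steps):
            y = y0 + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

# c normalises 2x + y over 2 < x < 6, 0 < y < 5
c = 1 / double_integral(lambda x, y: 2 * x + y, (2, 6), (0, 5))

def density(x, y):
    return c * (2 * x + y)

p_ii = double_integral(density, (3, 6), (2, 5))   # P(X > 3, Y > 2)
p_iii = double_integral(density, (3, 6), (0, 5))  # P(X > 3)
print(1 / c, p_ii, p_iii)
```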
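$\int_2^4\int_1^3 c\,xy\,dx\,dy = 1$
$\int_2^4 c\,y\left[\dfrac{x^2}{2}\right]_1^3 dy = 1$
$\dfrac{8c}{2}\left[\dfrac{y^2}{2}\right]_2^4 = 1 \quad \therefore c = \dfrac{1}{24}.$
iii) Clearly $f_{XY}(x, y) = \dfrac{xy}{24} = \dfrac{x}{4}\cdot\dfrac{y}{6} = f_X(x)\,f_Y(y)$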
$f_{X/Y}(x/y) = \dfrac{f_{XY}(x, y)}{f_Y(y)}$
$f_{Y/X}(y/x) = \dfrac{f_{XY}(x, y)}{f_X(x)}$
Y ( + )
3 6
Find (i) c (ii) marginal density functions (iii) conditional density functions
(iv) S.T X ,Y are independent
4a) A sample of 4 items is selected at random from a box containing 12 items of which 5 are
defective . Find the expected number of defective items.
b) If the p.d.f is $f(x) = \dfrac{K}{1+x^2}$, $-\infty < x < \infty$, find (i) K (ii) Probability distribution function
5a) A continuous r.v has the p.d.f $f(x) = \begin{cases} kxe^{-\alpha x}, & x \ge 0,\ \alpha > 0 \\ 0, & \text{elsewhere} \end{cases}$ Determine (i) k, (ii) mean
(iii) variance
b) A random sample with replacement of size 2 is taken from S ={1,2,3}. Let X denote sum
of 2 no. s taken .(i)Write probability distribution (ii) find mean
ASSIGNMENT QUESTIONS
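1. If the p.d.f of a r.v x is given by $f(x) = \begin{cases} k(1 - x^2), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
find i) k
2. If F(x), the distribution function of x, is given by $F(x) = \begin{cases} 0 & \text{if } x \le 1 \\ k(x-1)^4 & \text{if } 1 < x \le 3 \\ 1 & \text{if } x > 3 \end{cases}$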
x 0 1 2 3 4 5 6 7
Find (i) c (ii) marginal density functions (iii) conditional density functions
(iv) S.T X ,Y are independent
UNIT 2
PROBABILITY DISTRIBUTIONS
OBJECTIVE
To learn some of the important probability distributions like
OUTCOME
Examples:
1. Tossing a coin 𝑛 times
2. Throwing a die
3. No. of defective items in the box
=np[p + q]n−1
= np since [p + q = 1]
Mean = np.
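$\sigma^2 = \sum_{r=0}^{n} r^2\,p(r) - \mu^2$
$= \sum_{r=0}^{n} \left[r(r-1) + r\right]P(r) - \mu^2$
Let $\sum_{r=0}^{n} r(r-1)P(r) = \sum_{r=0}^{n} r(r-1)\,{}^{n}C_r\,p^r q^{n-r} = 2\,{}^{n}C_2\,p^2 q^{n-2} + 6\,{}^{n}C_3\,p^3 q^{n-3} + 12\,{}^{n}C_4\,p^4 q^{n-4} + \dots + n(n-1)\,p^n$
$= n(n-1)\,p^2 (p+q)^{n-2} = n(n-1)\,p^2$
Hence $\sigma^2 = n(n-1)\,p^2 + np - n^2p^2$
$= n^2p^2 - np^2 + np - n^2p^2$
$= np(1 - p)$
$= npq.$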
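The results Mean = np and Variance = npq can be confirmed directly from the probability mass function. The sketch below uses n = 10 and p = 1/2 (the coin-tossing setting of the problems that follow) as an illustrative choice; `binomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P(X = r) = nCr p^r q^(n-r) for r = 0..n."""
    q = 1 - p
    return [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

n, p = 10, Fraction(1, 2)
pmf = binomial_pmf(n, p)

mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2
p_at_least_7 = sum(pmf[7:])  # P(X >= 7)

print(mean, variance, float(p_at_least_7))
```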
Problems
Sol: Given 𝑛 = 10
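$= \dfrac{1}{2^{10}}\left[{}^{10}C_7 + {}^{10}C_8 + {}^{10}C_9 + {}^{10}C_{10}\right]$
$= \dfrac{1}{2^{10}}\left[120 + 45 + 10 + 1\right]$
$= \dfrac{176}{1024}$
= 0.1719

$= \dfrac{1}{2^{10}}\left[{}^{10}C_0 + {}^{10}C_1 + {}^{10}C_2 + {}^{10}C_3\right]$
$= \dfrac{1}{2^{10}}\left[1 + 10 + 45 + 120\right]$
$= \dfrac{176}{1024}$
= 0.1719

$= {}^{10}C_6\left(\dfrac{1}{2}\right)^6\left(\dfrac{1}{2}\right)^4$
= 0.205.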
2. In 𝟐𝟓𝟔 sets of 𝟏𝟐 tosses of a coin ,in how many cases one can expect 𝟖 Heads
and 𝟒 Tails.
Sol: The probability of getting a head, $p = \dfrac{1}{2}$
The probability of getting a tail, $q = \dfrac{1}{2}$
Here n = 12
The probability of getting 8 heads and 4 tails in 12 trials $= P(X = 8) = {}^{12}C_8\left(\dfrac{1}{2}\right)^8\left(\dfrac{1}{2}\right)^4$
$= \dfrac{12!}{8!\,4!}\left(\dfrac{1}{2}\right)^{12} = \dfrac{495}{2^{12}}$
The expected number of cases with 8 heads and 4 tails in 12 trials, out of 256 sets,
$= 256 \times P(X = 8) = 2^8 \times \dfrac{495}{2^{12}} = \dfrac{495}{16} = 30.9375 \approx 31$
$= \dfrac{1}{2^{10}}\left[{}^{10}C_3 + {}^{10}C_4 + {}^{10}C_5\right]$
=0.568.
4. Out of 800 families with 4 children each ,how many could you expect to have
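$P(X = 3) = {}^{5}C_3\left(\dfrac{1}{2}\right)^3\left(\dfrac{1}{2}\right)^2 = \dfrac{5}{16}$
$P(X \ge 1) = 1 - P(X = 0) = 1 - {}^{5}C_0\left(\dfrac{1}{2}\right)^0\left(\dfrac{1}{2}\right)^{5-0} = \dfrac{31}{32}$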
x 0 1 2 3 4 5
f 2 14 20 34 22 8
Sol: Given n= 5,∑ 𝑓 = 2 + 14 + 20 + 34 + 22 + 8 = 100
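X   P(x_i)                                   E(x_i)
0   5C0 (0.568)^0 (0.432)^(5-0) = 0.02       N p(0) = 100(0.02) = 2
1   5C1 (0.568)^1 (0.432)^(5-1) = 0.10       10
2   5C2 (0.568)^2 (0.432)^(5-2) = 0.26       26
3   5C3 (0.568)^3 (0.432)^(5-3) = 0.34       34
4   5C4 (0.568)^4 (0.432)^(5-4) = 0.22       22
5   5C5 (0.568)^5 (0.432)^(5-5) = 0.06       6
The fitted (expected) frequencies are therefore
x   0   1    2    3    4    5
f   2   10   26   34   22   6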
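Fitting a binomial distribution amounts to estimating p from the sample mean and then scaling the pmf by the total frequency. The following is a sketch of that procedure for the data of this problem; rounding each expected frequency to the nearest integer is an assumption about how the table was produced.

```python
from math import comb

values = [0, 1, 2, 3, 4, 5]
observed = [2, 14, 20, 34, 22, 8]
n = 5                       # number of trials per observation
N = sum(observed)           # total frequency

# Mean of the observed data, then p = mean / n because the binomial mean is np
mean = sum(x * f for x, f in zip(values, observed)) / N
p = mean / n
q = 1 - p

expected = [N * comb(n, x) * p ** x * q ** (n - x) for x in values]
rounded = [round(e) for e in expected]
print(p, q)
print([f"{e:.2f}" for e in expected])
print(rounded)
```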
Recurrence Relation
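$p(r+1) = {}^{n}C_{r+1}\,p^{r+1} q^{n-r-1}$ .......................... (1)
$\therefore \dfrac{p(r+1)}{p(r)} = \dfrac{{}^{n}C_{r+1}}{{}^{n}C_r}\left(\dfrac{p}{q}\right)$
$p(r+1) = \dfrac{{}^{n}C_{r+1}}{{}^{n}C_r}\left(\dfrac{p}{q}\right)p(r) = \dfrac{n-r}{r+1}\left(\dfrac{p}{q}\right)p(r).$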
Poisson Distribution
A random variable 'X' follows Poisson distribution if it assumes only non-negative values
and its probability mass function is given by
$P(X = r) = P(r, \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^r}{r!} & \text{for } r = 0, 1, 2, \dots\ (\lambda > 0) \\ 0 & \text{otherwise} \end{cases}$
Conditions For Poisson Distribution
1. The number of trials is very large (n → ∞)
2. The probability of occurrence of an event in each trial is very small (p → 0)
3. λ = np is finite
Examples:
1. The number of printing mistakes per page in a large text
2. The number of telephone calls per minute at a switch board
3. The number of defective items manufactured by a company.
Recurrence Relation
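$P(r+1) = \dfrac{e^{-\lambda}\lambda^{r+1}}{(r+1)!}$ ------ (1)
$P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$ ------ (2)
$\dfrac{(1)}{(2)} = \dfrac{P(r+1)}{P(r)} = \dfrac{e^{-\lambda}\lambda^{r}\cdot\lambda}{(r+1)\,r!} \times \dfrac{r!}{e^{-\lambda}\lambda^{r}}$
$P(r+1) = \left(\dfrac{\lambda}{r+1}\right)P(r)$ for r = 0, 1, 2, ...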
Problems
Sol: We have
𝜆
𝑃(𝑟 + 1) = ( ) 𝑃(𝑟) 𝑓𝑜𝑟 𝑟 = 0,1,2 − − − −(1)
𝑟+1
Given 𝜆= 3
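$P(0) = \dfrac{e^{-3}\,3^0}{0!} = e^{-3}$ [by definition of Poisson distribution]
From (1),
For r = 0, $P(1) = \left(\dfrac{3}{0+1}\right)P(0) = 3e^{-3}$
For r = 1, $P(2) = \left(\dfrac{3}{1+1}\right)P(1) = \dfrac{9}{2}e^{-3}$
For r = 2, $P(3) = \left(\dfrac{3}{2+1}\right)P(2) = \dfrac{9}{2}e^{-3}$
For r = 3, $P(4) = \left(\dfrac{3}{3+1}\right)P(3) = \dfrac{27}{8}e^{-3}$
For r = 4, $P(5) = \left(\dfrac{3}{4+1}\right)P(4) = \dfrac{81}{40}e^{-3}.$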
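The recurrence $P(r+1) = \frac{\lambda}{r+1}P(r)$ builds each probability from the previous one, starting at $P(0) = e^{-\lambda}$. Here is a short sketch for λ = 3 that also compares the result against the direct formula; the function name is illustrative.

```python
from math import exp, factorial

def poisson_by_recurrence(lam, upto):
    """Return [P(0), ..., P(upto)] using P(r+1) = lam / (r+1) * P(r)."""
    probs = [exp(-lam)]                       # P(0) = e^(-lam)
    for r in range(upto):
        probs.append(lam / (r + 1) * probs[r])  # each step uses the previous term
    return probs

lam = 3
probs = poisson_by_recurrence(lam, 5)
for r, pr in enumerate(probs):
    direct = exp(-lam) * lam ** r / factorial(r)
    print(r, round(pr, 6), round(direct, 6))
```

Each step multiplies the preceding probability P(r), not P(0), which is why P(2) comes out as (9/2)e^(-3).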
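2. If X is a random variable such that $3P(X = 4) = \dfrac{P(X = 2)}{2} + P(X = 0)$.
$P(X = r) = \dfrac{e^{-\lambda}\lambda^r}{r!}$
$\therefore 3\,\dfrac{e^{-\lambda}\lambda^4}{4!} = \dfrac{e^{-\lambda}\lambda^2}{(2)\,2!} + \dfrac{e^{-\lambda}\lambda^0}{0!}$
i.e., $\lambda^4 - 2\lambda^2 - 8 = 0$
Taking $\lambda^2 = k$, we get $k^2 - 2k - 8 = 0$
∴ k = 4, −2
∴ λ² = 4
i.e., λ = 2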
3. A car hire firm has 2 cars which it hires out day by day.The number of demands for a
car on each day is distributed as poisson with mean 1.5 Calculate the proportion of days
$P(X = 0) = \dfrac{e^{-1.5}(1.5)^0}{0!} = 0.223$
Expected number of days that demand refused for car = N𝑃(𝑥 > 2)
Mean $\lambda = \dfrac{400}{400} = 1.$
X   P(x_i)                       E(x_i)
0   e^(-1)(1)^0 / 0! = 0.368     N p(0) = 400(0.368) = 147.2 ≈ 147
1   e^(-1)(1)^1 / 1! = 0.368     147
2   e^(-1)(1)^2 / 2! = 0.184     74
3   e^(-1)(1)^3 / 3! = 0.061     24
4   e^(-1)(1)^4 / 4! = 0.015     6
5   e^(-1)(1)^5 / 5! = 0.003     1
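Let X be a continuous random variable, then it is said to follow normal distribution if its pdf
is given by
$f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$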
𝜇 − 2𝜎 𝑡𝑜 𝜇 + 2𝜎=95.44%
𝜇 − 3𝜎 𝑡𝑜 𝜇 + 3𝜎=99.73%
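The normal distribution with mean '0' and variance '1' is said to be the standard normal
distribution. Starting from the general probability density function
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
its probability density function is defined by
$f(z) = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty \quad (\mu = 0,\ \sigma = 1)$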
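$\mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_{-\infty}^{\infty} x\,\dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-b)^2}{2\sigma^2}}\,dx$
Let $z = \dfrac{x-b}{\sigma} \Rightarrow dx = \sigma\,dz$
$= \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (\sigma z + b)\,e^{-\frac{z^2}{2}}\,dz$
$= \dfrac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z\,e^{-\frac{z^2}{2}}\,dz + \dfrac{b}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz$
$= 0 + \dfrac{2b}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-\frac{z^2}{2}}\,dz$ [since $z\,e^{-\frac{z^2}{2}}$ is an odd function and $e^{-\frac{z^2}{2}}$ is an even function]
$= \dfrac{2b}{\sqrt{2\pi}}\sqrt{\dfrac{\pi}{2}}$
∴ Mean = b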
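Variance $= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx - \mu^2$
Let $z = \dfrac{x-\mu}{\sigma} \Rightarrow dx = \sigma\,dz$
$= \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (\mu^2 + \sigma^2 z^2 + 2\mu\sigma z)\,e^{-\frac{z^2}{2}}\,\sigma\,dz - \mu^2$
$= \dfrac{\mu^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz + \dfrac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z^2 e^{-\frac{z^2}{2}}\,dz + \dfrac{2\mu\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z\,e^{-\frac{z^2}{2}}\,dz - \mu^2$
$= \dfrac{2\mu^2}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-\frac{z^2}{2}}\,dz + \dfrac{2\sigma^2}{\sqrt{2\pi}}\int_{0}^{\infty} z^2 e^{-\frac{z^2}{2}}\,dz - \mu^2$
$= \mu^2 + \dfrac{2\sigma^2}{\sqrt{2\pi}}\int_{0}^{\infty} z^2 e^{-\frac{z^2}{2}}\,dz - \mu^2$
$= \dfrac{2\sigma^2}{\sqrt{2\pi}}\int_{0}^{\infty} z^2 e^{-\frac{z^2}{2}}\,dz$
Put $\dfrac{z^2}{2} = t \Rightarrow \dfrac{2z\,dz}{2} = dt \Rightarrow dz = \dfrac{dt}{\sqrt{2t}}$
$= \dfrac{2\sigma^2}{\sqrt{2\pi}}\int_{0}^{\infty} 2t\,e^{-t}\,\dfrac{dt}{\sqrt{2t}}$
$= \dfrac{2\sigma^2}{\sqrt{\pi}}\int_{0}^{\infty} e^{-t}\,t^{\frac{3}{2}-1}\,dt$
$= \dfrac{2\sigma^2}{\sqrt{\pi}}\,\Gamma\left(\dfrac{3}{2}\right)$
$= \dfrac{2\sigma^2}{\sqrt{\pi}}\cdot\dfrac{1}{2}\,\Gamma\left(\dfrac{1}{2}\right)$
$= \dfrac{\sigma^2}{\sqrt{\pi}}\sqrt{\pi} = \sigma^2$
$\therefore \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$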
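Let M be the median and let $\mu \in (-\infty, M)$.
Then $\int_{-\infty}^{M} f(x)\,dx = \int_{-\infty}^{\mu} f(x)\,dx + \int_{\mu}^{M} f(x)\,dx = \dfrac{1}{2}$
Consider $\int_{-\infty}^{\mu} f(x)\,dx = \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\mu} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx$
Let $z = \dfrac{x-\mu}{\sigma} \Rightarrow dx = \sigma\,dz$ [∵ limits of z: −∞ → 0]
$= \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{z^2}{2}}\,dz = \dfrac{1}{\sqrt{2\pi}}\sqrt{\dfrac{\pi}{2}} = \dfrac{1}{2}$
$\therefore \int_{\mu}^{M} f(x)\,dx = 0 \Rightarrow \mu = M$
For the mode, $f'(x) = 0 \Rightarrow \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\left(-\dfrac{1}{2}\right)2\left(\dfrac{x-\mu}{\sigma}\right)\left(\dfrac{1}{\sigma}\right) = 0$
⇒ x − μ = 0
⇒ x = μ
$f''(x) = \dfrac{-1}{\sigma^3\sqrt{2\pi}}\left[e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} - \left(\dfrac{x-\mu}{\sigma}\right)^2 e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\right]$
At x = μ, $f''(\mu) = \dfrac{-1}{\sigma^3\sqrt{2\pi}}\left[e^0 + 0\right]$
$= \dfrac{-1}{\sigma^3\sqrt{2\pi}} < 0$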
Problems :
1. If X is a normal variate, find the area A
i) to the left of 𝒛 = 𝟏. 𝟕𝟖
= 0.2881+0.4370=0.7251.
=0.036
2. If the masses of 300 students are normally distributed with mean 68 kgs and standard
deviation 3kgs.How many students have masses
=0.5-0.4082
=0.0918
=0.5-0.4082
=0.0918
Expected number of students less than or equal to 64 = E(X less than or equal to 64)
=300(0.0918)
=27.54~28 students .
$z_2 = \dfrac{x - \mu}{\sigma} = \dfrac{71 - 68}{3} = 1$
=2(0.3413)= 0.6826
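Areas under the normal curve can also be computed without a table by using the error function, since $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below applies this to the student-mass problem (mean 68 kg, standard deviation 3 kg, 300 students); the helper name `normal_cdf` is illustrative, and the interval 65 to 71 kg is inferred from $z_2 = 1$ and the symmetric area 2(0.3413). Because no rounding of z to two decimals is involved, the results can differ slightly from table-based values.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, students = 68, 3, 300

p_le_64 = normal_cdf(64, mu, sigma)                            # P(X <= 64)
p_65_71 = normal_cdf(71, mu, sigma) - normal_cdf(65, mu, sigma)  # P(65 <= X <= 71)

print(round(p_le_64, 4), round(students * p_le_64, 2))
print(round(p_65_71, 4), round(students * p_65_71, 2))
```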
3. In a normal distribution 𝟑𝟏% of the items are under 45 and 𝟖% of the items are
$z = \dfrac{x - \mu}{\sigma}$
Standard normal variate for $X_1 = 45$ is
$z_1 = \dfrac{X_1 - \mu}{\sigma} = \dfrac{45 - \mu}{\sigma}$
$\Rightarrow \mu + \sigma z_1 = 45$ ……… (1)
4. In a normal distribution 𝟕% of the items are under 35 and 𝟖𝟗% of the items are
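$z = \dfrac{x - \mu}{\sigma}$
Standard normal variate for $X_1 = 35$ is
$z_1 = \dfrac{X_1 - \mu}{\sigma} = \dfrac{35 - \mu}{\sigma}$
$\Rightarrow \mu + \sigma z_1 = 35$ ……… (1)
$\Rightarrow \mu + \sigma z_2 = 63$ ……… (2)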
Given 𝑃(𝑋 < 35) = 𝑃(𝑧 < 𝑧1)
0.07 = 0.5- 𝑃(−𝑧1 ≤ 𝑧 ≤ 0)
𝑃(0 ≤ 𝑧 ≤ 𝑧1) = 0.43
From normal curve ,we ℎ𝑎𝑣𝑒
⇒ 𝑧1 = 1.48
x 0 1 2 3 4 5
f 142 156 69 27 5 1
ASSIGNMENT QUESTIONS
1. A sales tax officer has reported that the average sales of the 500 business that he has to
deal with during a year is Rs.36,000 with a standard deviation of Rs.10,000. Assuming
that the sales in these business are normally distributed ,find :
i) The number of business as the sales of which are greater than Rs.40,000
ii) The percentage of business the sales of which are likely to range between Rs.30,000 and
Rs.40,000
3. The mean and SD of a normal variate are 8 and 4 respectively .Find i) P( 5≤x≤10)
ii)P(x≥5)
4. Average number of accidents on any day on a national highway is 1.8 .determine the
probability that the number of accidents are i) atleast one ii) at the most one iii) exactly
one
OUTCOME
Make data-driven decisions by using correlation and regression.
CORRELATION AND REGRESSION
CORRELATION
Introduction
Definition: Correlation is a statistical tool which studies the relationship between two variables, and
correlation analysis involves various methods and techniques used for studying and measuring the
extent of the relationship between them.
Two variables are said to be correlated if the change in one variable results in a corresponding
change in the other.
1) Positive and Negative Correlation: If the values of the two variables deviate in the
same direction, i.e., if an increase in the values of one variable results in a corresponding
increase in the values of the other variable (or) a decrease in the values of one variable results
in a corresponding decrease in the values of the other variable, the correlation is called
Positive Correlation.
e.g. Heights and weights of individuals.
If an increase (decrease) in the values of one variable results in a corresponding decrease
(increase) in the values of the other variable, the correlation is called Negative Correlation.
2) Linear and Non-linear Correlation: The correlation between two variables is said
to be Linear if, corresponding to a unit change in one variable, there is a constant change in
the other variable over the entire range of the values (or) two variables x, y are said to be
linearly related if there exists a relationship of the form y = a + bx.
e.g. when the amount of output in a factory is doubled by doubling the number of workers.
Two variables are said to be Non-linear or curvilinear if, corresponding to a unit change
in one variable, the other variable does not change at a constant rate but at a fluctuating rate,
i.e., correlation is said to be non-linear if the ratio of change is not constant. In other words,
when all the points on the scatter diagram tend to lie near a smooth curve, the correlation is
said to be non-linear (curvilinear).
3) Partial and Total correlation: The study of two variables excluding some other
variables is called Partial correlation .
e.g Price, demand & supply ,all are taken into account.
4) Simple and Multiple correlation: When we study only two variables, the
relationship is described as Simple correlation.
Karl Pearson suggested a mathematical method for measuring the magnitude of linear
relationship between 2 variables. This is known as Pearsonian Coefficient of correlation. It is
denoted by ‘𝑟’. This method is also known as Product-Moment correlation coefficient
$r = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x\sigma_y}$
$= \dfrac{\sum xy}{N\sigma_x\sigma_y}$
$= \dfrac{\sum XY}{\sqrt{\sum X^2\,\sum Y^2}}$
Properties
$r(aX + b,\ cY + d) = \dfrac{ac}{|ac|}\,r(X, Y)$
4. Two independent variables are uncorrelated. That is if X and Y are independent variables
then r(X, Y) = 0
Charles Edward Spearman found out the method of finding the Coefficient of correlation by
ranks. This method is based on rank & is useful in dealing with qualitative characteristics such
as morality, character, intelligence and beauty. Rank correlation is applicable to only to the
individual observations.
formula: $\rho = 1 - \dfrac{6\sum D^2}{N(N^2 - 1)}$
where : ρ - Rank Coefficient of correlation
ΣD² - Sum of the squares of the differences of two ranks
N - Number of paired observations
Properties
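If any 2 or more items have the same value, then in that case common ranks are given to the repeated
items. The common rank is the average of the ranks which these items would have assumed if
they were different from each other, and the next item will get the rank next to the ranks already
assumed.
Formula: $\rho = 1 - 6\left\{\dfrac{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots}{N^3 - N}\right\}$
where m is the number of times an item is repeated; one correction term is added for each repeated value.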
In regression we can estimate value of one variable with the value of the other variable which
is known. The statistical method which helps us to estimate the unknown value of one variable
from the known value of the related variable is called ‘Regression’. The line described in the
average relationship b/w 2 variables is known as Line of Regression.
Regression Equation:
The standard form of the Regression equation is Y = a + b X where a, b are called constants.
‘a’ indicates value of Y when X = 0. It is called Y-intercept. ‘b’ indicates the value of slope of
the regression line & gives a measure of change of y for a unit change in X . it is also called as
regression coefficient of Y on X. The values of a, b are found with the help of following Normal
Equations.
∑ Y= Na + b∑ X
∑ XY = a ∑ X + b∑ X2
∑ X = Na + b∑ Y
∑ XY = a ∑ Y + b∑ Y2
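The normal equations for Y on X can be solved in closed form for a and b. The sketch below does this and, as an illustration, uses the small data set that appears in the tutorial questions of this unit (X = 0, 5, 10, 15, 20); `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Solve  sum(Y) = N a + b sum(X)  and  sum(XY) = a sum(X) + b sum(X^2)  for a, b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # eliminate a from the two equations
    a = (sy - b * sx) / n                          # back-substitute into the first one
    return a, b

xs = [0, 5, 10, 15, 20]
ys = [7, 11, 16, 20, 26]
a, b = fit_line(xs, ys)
print(a, b, a + b * 25)  # intercept, slope, estimate of Y at X = 25
```

Swapping the roles of `xs` and `ys` gives the coefficients of the regression of X on Y from the second pair of normal equations.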
Regression equations when deviations are taken from the arithmetic mean :
Regression equation of Y on X : $Y - \bar{Y} = b_{yx}(X - \bar{X})$ where $b_{yx} = \dfrac{\sum XY}{\sum X^2}$
Regression equation of X on Y : $X - \bar{X} = b_{xy}(Y - \bar{Y})$ where $b_{xy} = \dfrac{\sum XY}{\sum Y^2}$
Angle between Two Regression lines : $\tan\theta = \dfrac{m_1 - m_2}{1 + m_1 m_2}$
Note:
1. If θ is acute then $\tan\theta = \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\left(\dfrac{1 - r^2}{r}\right)$
2. If θ is obtuse then $\tan\theta = \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\left(\dfrac{r^2 - 1}{r}\right)$
3. If r = 0 then tan θ = ∞, so $\theta = \dfrac{\pi}{2}$. Thus if there is no relationship between the 2
variables (i.e., they are independent) then $\theta = \dfrac{\pi}{2}$.
Ht. in inches    57   59   62   63   64   65   55   58   57
Weight in lbs    113  117  126  126  130  129  111  116  112
Solution:
Coefficient of correlation $r = \dfrac{\sum XY}{\sqrt{\sum X^2\,\sum Y^2}} = \dfrac{216}{\sqrt{(102)(472)}} = 0.98$
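The coefficient for the height and weight data can be recomputed from deviations about the means, exactly as in the formula $r = \sum XY / \sqrt{\sum X^2 \sum Y^2}$. This is a minimal sketch; `pearson_r` is an illustrative name. For this data the sum of squared weight deviations evaluates to 472.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient using deviations from the arithmetic means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy, sxx, syy, sxy / sqrt(sxx * syy)

heights = [57, 59, 62, 63, 64, 65, 55, 58, 57]
weights = [113, 117, 126, 126, 130, 129, 111, 116, 112]
sxy, sxx, syy, r = pearson_r(heights, weights)
print(sxy, sxx, syy, round(r, 4))
```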
X 12 9 8 10 11 13 7
Y 14 8 6 9 11 12 3
Solution: In both series items are in small number.
Cov(XY)
Formula used: r =
σxσy
X Y X2 Y2 XY
12 14 144 196 168
9 8 81 64 72
8 6 64 36 48
10 9 100 81 90
11 11 121 121 121
13 12 169 144 156
7 3 49 9 21
$r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}}$
Here N = 7.
3. A sample of 𝟏𝟐 fathers and their elder sons gave the following data about their
heights. Calculate the rank correlation coefficient.
Fathers 65 63 67 64 68 62 70 66 68 67 69 71
Sons 68 66 68 65 69 66 68 65 71 67 68 70
Solution:
Repeated values are given a common rank, which is the mean of the ranks. In X: 68 & 67
appear twice.
$\rho = 1 - 6\left\{\dfrac{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots}{N^3 - N}\right\} = 1 - \dfrac{6(72.5 + 7)}{12(12^2 - 1)} = 0.722$
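The tied-rank calculation for the fathers and sons data can be sketched as follows. Ranks are assigned with 1 for the largest value and tied values share the average of the positions they occupy; both the ranking direction and the helper names are assumptions made for this illustration.

```python
def average_ranks(values):
    """Rank 1 = largest; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    first_pos, counts = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)        # first position taken by this value
        counts[v] = counts.get(v, 0) + 1    # how many times it is repeated (m)
    ranks = [first_pos[v] + (counts[v] - 1) / 2 for v in values]
    return ranks, counts

def spearman_with_ties(xs, ys):
    rx, cx = average_ranks(xs)
    ry, cy = average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # one (m^3 - m)/12 term per distinct value; it is zero when m = 1
    correction = sum((m ** 3 - m) / 12 for m in list(cx.values()) + list(cy.values()))
    rho = 1 - 6 * (d2 + correction) / (n ** 3 - n)
    return d2, correction, rho

fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
d2, correction, rho = spearman_with_ties(fathers, sons)
print(d2, correction, round(rho, 3))
```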
4. Given 𝐧 = 𝟏𝟎 , 𝛔𝐱= 5.4, 𝛔𝐲 = 𝟔. 𝟐 and sum of product of deviation from the mean of
𝐗 & 𝐘 is 𝟔𝟔. Find the correlation coefficient.
$\sigma_x^2 = \dfrac{\sum (x - \bar{x})^2}{n}$
$\sigma_y^2 = \dfrac{\sum (y - \bar{y})^2}{n}$
5. The heights of mothers & daughters are given in the following table. From the 2 tables
of regression estimate the expected average height of daughter when the height of the
mother is 𝟔𝟒. 𝟓 inches.
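Ht. of Mother (inches)         62  63  64  64  65  66  68  70
Ht. of the daughter (inches)   64  65  61  69  67  68  71  65
Solution:
$\bar{X} = \dfrac{\sum X}{N} = \dfrac{522}{8} = 65.25$
$\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{530}{8} = 66.25$
Taking deviations $d_x = X - 65$ and $d_y = Y - 67$,
$b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} = \dfrac{20 - \dfrac{2(-6)}{8}}{50 - \dfrac{2^2}{8}} = 0.434$
Y = 37.93 + 0.434X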
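The regression of daughter's height on mother's height can be recomputed from deviations about the actual means, which gives the same $b_{yx}$ as the assumed-mean formula. The sketch below also evaluates the fitted line at the mother's height of 64.5 inches asked for in the problem; the function name is illustrative.

```python
def regression_y_on_x(xs, ys):
    """Return (x_bar, y_bar, b_yx) for the line  Y - y_bar = b_yx (X - x_bar)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, y_bar, sxy / sxx

mothers = [62, 63, 64, 64, 65, 66, 68, 70]
daughters = [64, 65, 61, 69, 67, 68, 71, 65]

x_bar, y_bar, b_yx = regression_y_on_x(mothers, daughters)
intercept = y_bar - b_yx * x_bar
estimate = intercept + b_yx * 64.5
print(x_bar, y_bar, round(b_yx, 3), round(intercept, 2), round(estimate, 2))
```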
5y − 4x − 3 = 0............................. (2)
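(1) × 4 gives 28x − 64y + 36 = 0
(2) × 7 gives −28x + 35y − 21 = 0
On adding we get −29y + 15 = 0, so y = 0.5172
From (1), 7x = 16y − 9, which gives x = −0.1034
Since the regression lines pass through $(\bar{x}, \bar{y})$, we have $\bar{x} = -0.1034$ and $\bar{y} = 0.5172$
If (1) were taken as the line of x on y and (2) as the line of y on x, we would get
$b_{xy} = \dfrac{16}{7}$ and $b_{yx} = \dfrac{4}{5}$, so that $r^2 = \dfrac{64}{35} > 1$, which is impossible.
Hence (1) is the regression line of y on x and (2) is the regression line of x on y.
From (1), $y = \dfrac{7}{16}x + \dfrac{9}{16}$
From (2), $x = \dfrac{5}{4}y - \dfrac{3}{4}$
$r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{7}{16}$ and $r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{5}{4}$
Multiplying these 2 equations, we get $r^2 = \dfrac{7}{16}\cdot\dfrac{5}{4} = \dfrac{35}{64}$
$r = \dfrac{\sqrt{35}}{8} \approx 0.74.$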
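7. If $\sigma_x = \sigma_y = \sigma$ and the angle between the regression lines is $\tan^{-1}\left(\dfrac{4}{3}\right)$, find r.
Solution: $\tan\theta = \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\left(\dfrac{1 - r^2}{r}\right)$
$= \dfrac{\sigma^2}{2\sigma^2}\left(\dfrac{1 - r^2}{r}\right)$
By data, $\theta = \tan^{-1}\left(\dfrac{4}{3}\right)$.
$\dfrac{1 - r^2}{2r} = \dfrac{4}{3}$
$3 - 3r^2 - 8r = 0$
$(3r - 1)(r + 3) = 0$
$r = \dfrac{1}{3}$ or $-3$
Thus $r = \dfrac{1}{3}$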
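$\bar{X} = 10$, $\bar{Y} = 20$, $\sum (X - 4)^2 = 100$, $\sum (Y - 10)^2 = 160$. Find the regression
Solution: Here $d_x = X - 4$, $d_y = Y - 10$
$\bar{X} = A + \dfrac{\sum d_x}{N} \Rightarrow 10 = 4 + \dfrac{\sum d_x}{5} \Rightarrow \sum d_x = 30$ (here A = 4)
$\bar{Y} = B + \dfrac{\sum d_y}{N} \Rightarrow 20 = 10 + \dfrac{\sum d_y}{5} \Rightarrow \sum d_y = 50$ (here B = 10)
$b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} = \dfrac{-220}{-80} = 2.75$
$b_{xy} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}} = \dfrac{-220}{-340} = 0.65$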
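X respectively. Show that 0 < 4k < 1. If $k = \dfrac{1}{16}$, find the means of the two variables
Since $r^2 = 4K$ and $0 \le r^2 \le 1$, we have $0 \le 4K \le 1$, i.e., $0 \le K \le \dfrac{1}{4}$
If $K = \dfrac{1}{16}$ then we have X = 4Y + 5 ……… (1) and
Y = X/16 + 4 ……… (2)
We get X − 4Y − 5 = 0
$-\dfrac{X}{4} + 4Y - 16 = 0$
Adding we get $\dfrac{3X}{4} - 21 = 0$
X = 28
From (2), we get $Y = \dfrac{23}{4}$
We have $r^2 = 4k = \dfrac{4}{16} = \dfrac{1}{4} \Rightarrow r = \pm\dfrac{1}{2}$
We consider the positive value and take $r = \dfrac{1}{2}$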
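10. The differences between the ranks are 0.5, −6, −4.5, −3, −5, −1, 3, 0, 5, 5.5, 0, −0.5.
For repeated ranks in x and y, $\dfrac{\sum m(m^2 - 1)}{12} = 3.5$ and r = 0.44. Find the number of terms.
Solution: Given differences $d_i$: 0.5, −6, −4.5, −3, −5, −1, 3, 0, 5, 5.5, 0, −0.5
$\sum d_i^2 = 156$
Here $r = 1 - 6\left\{\dfrac{\sum d_i^2 + \dfrac{\sum m(m^2 - 1)}{12}}{N^3 - N}\right\}$
$= 1 - \dfrac{6(159.5)}{N^3 - N}$
$= 1 - \dfrac{957}{N^3 - N}$
$\Rightarrow 0.44 = 1 - \dfrac{957}{N^3 - N}$
$\Rightarrow N^3 - N = 1708.93$
$\Rightarrow N = 12$ (since $12^3 - 12 = 1716$, which also agrees with the 12 differences given)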
TUTORIAL QUESTIONS
1. The heights of mothers & daughters are given in the following table. From the 2 tables
of regression estimate the expected average height of daughter when the height of the
mother is 64.5 inches.
Ht. of Mother(inches) 62 63 64 64 65 66 68 70
Ht. of the daughter(inches) 64 65 61 69 67 68 71 65
3. The marks obtained by 10 students in mathematics and statistics are given below. Find the
coefficient of correlation between the two subjects and the two lines of regression
Marks in mathematics  25 28 30 32 35 36 38 42 45 39
Marks in Statistics   20 26 29 30 25 18 26 35 46 35
4. Fit a straight line Y=𝑎0+𝑎1X for the following data and estimate the value of Y when
X=25
X 0 5 10 15 20
Y 7 11 16 20 26
5. Find the rank correlation for the following indices of supply and price of an article:
PRICE 80 100 102 91 100 111 109 100 99 104 111 102 98 111
INDEX 124 100 105 112 102 93 99 115 123 104 99 113 121 103
ASSIGNMENT QUESTIONS
1. Fit a curve of the form Y= a +bX by the method of least squares for the following data:
X 1 2 3 4 5
Y 5 2 4.5 8 12.5
2. The marks obtained by 10 students in two subjects are given below. Find the correlation
coefficient and lines of regression
Subject 1  48 75 30 60 80 53 35 15 40 38
Subject 2  44 85 45 54 91 58 63 35 43 45
3. The following table are the marks obtained by 12 students in economics and statistics:
Economics(X) 78 56 36 66 25 62 75 82
62Statistics(Y) 84 44 51 58 60 58 68 62
Obtain the regression lines.
random sample.
OUTCOME
Understand the importance of sampling distribution of a given statistic of a
random sample
SAMPLING
Introduction: The totality of observations with which we are concerned, whether this
number be finite or infinite, constitutes the population. In this chapter we focus on sampling from
distributions or populations and such important quantities as the sample mean and sample
variance.
Def: Population is defined as the aggregate or totality of statistical data forming a subject of
investigation.
The number of observations in the population is defined to be the size of the population. It may
be finite or infinite. Size of the population is denoted by N. As the study of the entire population
may not be possible to carry out, a part of the population alone is selected.
Def: A portion of the population which is examined with a view to determining the population
characteristics is called a sample . In other words, sample is a subset of population. Size of the
sample is denoted by n.
The process of selection of a sample is called Sampling. There are different methods of
sampling
Classification of Samples:
Large Samples : If the size of the sample n ≥ 30, then it is said to be a large sample.
Small Samples : If the size of the sample n < 30 ,then it is said to be small sample or
exact sample.
Central Limit Theorem: If $\bar{x}$ be the mean of a random sample of size n drawn from a
population having mean μ and standard deviation σ, then the sampling distribution of the
sample mean $\bar{x}$ is approximately a normal distribution with mean μ and SD = S.E of $\bar{x} = \dfrac{\sigma}{\sqrt{n}}$,
provided the sample size n is large.
Standard Error of a Statistic : The standard error of statistic ‘t’ is the standard deviation
of the sampling distribution of the statistic i.e, S.E of sample mean is the standard deviation of
the sampling distribution of sample mean.
S.E of sample proportion p, i.e., $S.E(p) = \sqrt{\dfrac{PQ}{n}}$ where Q = 1 − P
S.E of the difference of two sample means $\bar{x}_1$ and $\bar{x}_2$, i.e., $S.E(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Ex. Sample proportion is an estimate of population proportion , because with the help of
sample proportion value we can estimate the population proportion value.
Types of Estimation:
Point Estimation: If the estimate of the population parameter is given by a single value
, then the estimate is called a point estimation of the parameter.
Interval Estimation: If the estimate of the population parameter is given by two
different values between which the parameter may be considered to lie, then the
estimate is called an interval estimation of the parameter.
$n = \dfrac{z_\alpha^2 \, P Q}{E^2}$
where $z_\alpha$ = critical value of z at the $\alpha$ level of significance
P = population proportion
Q = 1 − P
E = maximum sampling error = p − P
Testing of Hypothesis:
A hypothesis is an assumption or supposition, and the decision-making procedure for deciding
whether to accept or reject the assumption is called hypothesis testing.
Def: Statistical Hypothesis: To arrive at a decision about the population on the basis of sample
information, we make assumptions about the population parameters involved. Such an assumption
is called a statistical hypothesis.
Null hypothesis: A definite statement about the population parameter. Usually a null
hypothesis is written as no difference , denoted by 𝐻0.
Ex. 𝐻0: 𝜇 = 𝜇0
Alternative hypothesis : A statement which contradicts the null hypothesis is called
alternative hypothesis. Usually an alternative hypothesis is written as some difference
, denoted by 𝐻1.
Setting of alternative hypothesis is very important to decide whether it is two-tailed or
one – tailed alternative , which depends upon the question it is dealing.
Ex.𝐻1: 𝜇 ≠ 𝜇0 (Two – Tailed test)
or
𝐻1: 𝜇 > 𝜇0 (Right one tailed test)
or
𝐻1: 𝜇 < 𝜇0 (Left one tailed test)
Left one tailed test: If 𝐻1 has < sign , the critical region is taken in the left side of the
distribution.
Right one tailed test : If 𝐻1 has > sign , the critical region is taken on right side of the
distribution.
Errors of Sampling :
While drawing conclusions for population parameters on the basis of the sample results , we
have two types of errors.
Type I error : Reject 𝐻0 when it is true i.e, if the null hypothesis 𝐻0 is true but it is
rejected by test procedure .
Type II error : Accept 𝐻0 when it is false i.e, if the null hypothesis 𝐻0is false but it is
accepted by test procedure.
DECISION TABLE
                  $H_0$ is accepted     $H_0$ is rejected
$H_0$ is true     Correct Decision      Type I Error
$H_0$ is false    Type II Error         Correct Decision
Problems:
1. If the population is 3,6,9,15,27
a) List all possible samples of size 3 that can be taken without replacement
from finite population
b) Calculate the mean of each of the sampling distribution of means
c) Find the standard deviation of sampling distribution of means
Population S.D: $\sigma = \sqrt{\dfrac{81+36+9+9+225}{5}} = \sqrt{\dfrac{360}{5}} = 8.4853$
= 13.3
2. A population consist of five numbers 2,3,6,8 and 11. Consider all possible samples of
size two which can be drawn with replacement from this population .Find
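$\sigma^2 = \sum \dfrac{(x_i - \bar{x})^2}{n} = \dfrac{(2-6)^2+(3-6)^2+(6-6)^2+(8-6)^2+(11-6)^2}{5} = \dfrac{54}{5} = 10.8$
For samples of size 2 drawn with replacement, the variance of the sampling distribution of means is $\sigma^2/2 = 5.4$.
∴ $\sigma_{\bar{x}} = \sqrt{5.4} = 2.32$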
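The result of Problem 2 can be verified by brute force. The sketch below enumerates all 25 ordered samples of size two drawn with replacement and computes the mean and standard deviation of the sample means; the variable names are assumptions made for illustration.

```python
from itertools import product
from statistics import mean, pstdev

population = [2, 3, 6, 8, 11]

# All ordered samples of size 2 drawn with replacement (5 * 5 = 25 samples)
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu = mean(population)                # population mean
sigma = pstdev(population)           # population S.D (divisor n)
mean_of_means = mean(sample_means)   # mean of the sampling distribution of means
sd_of_means = pstdev(sample_means)   # S.D of the sampling distribution of means

print(len(samples), mu, round(sigma, 2), mean_of_means, round(sd_of_means, 2))
```

The enumeration shows that the mean of the sample means equals the population mean, and that the S.D of the sample means equals $\sigma/\sqrt{2}$.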
3. When a sample is taken from an infinite population , what happens to the standard
error of the mean if the sample size is decreased from 800 to 200
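Let $n = n_1 = 800$. Then $S.E_1 = \dfrac{\sigma}{\sqrt{800}} = \dfrac{\sigma}{20\sqrt{2}}$
Let $n = n_2 = 200$. Then $S.E_2 = \dfrac{\sigma}{\sqrt{200}} = \dfrac{\sigma}{10\sqrt{2}}$
∴ $S.E_2 = \dfrac{\sigma}{10\sqrt{2}} = 2\left(\dfrac{\sigma}{20\sqrt{2}}\right) = 2\,(S.E_1)$
Hence if the sample size is reduced from 800 to 200, the S.E of the mean will be multiplied by 2.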
4. The variance of a population is 2 . The size of the sample collected from the
population is 169. What is the standard error of mean
Standard Error of mean = $\dfrac{\sigma}{\sqrt{n}} = \dfrac{\sqrt{2}}{\sqrt{169}} = \dfrac{1.414}{13} = 0.109$
5. The mean height of students in a college is 155cms and standard deviation is 15 . What
is the probability that the mean height of 36 students is less than 157 cms.
Thus the probability that the mean height of 36 students is less than 157 = 0.7881
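The probability in Problem 5 can be reproduced with the standard normal distribution. This is a minimal sketch using Python's standard library; the variable names are assumptions.

```python
import math
from statistics import NormalDist

mu, sigma, n = 155, 15, 36      # population mean, population S.D, sample size
x_bar = 157                     # value of the sample mean of interest

se = sigma / math.sqrt(n)       # standard error of the mean
z = (x_bar - mu) / se           # standardized value
prob = NormalDist().cdf(z)      # P(sample mean < 157)

print(round(z, 2), round(prob, 4))
```

Here the standard error is 15/6 = 2.5, so z = 0.8, and the area to the left of z = 0.8 agrees with the value 0.7881 stated above.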
6. A random sample of size 100 is taken from a population with 𝝈 = 5.1 . Given that the
sample mean is ̅𝒙 = 21.6 Construct a 95% confidence limits for the population mean .
Sol: We have maximum error (E) = 10 days , 𝜎 = 60 days and 𝑧𝛼⁄2 = 1.645
∴ $n = \left[\dfrac{z_{\alpha/2}\,\sigma}{E}\right]^2 = \left[\dfrac{1.645 \times 60}{10}\right]^2 = 97$
8. A random sample of size 64 is taken from a normal population with 𝝁 = 𝟓𝟏. 𝟒 and 𝝈 =
6.8.What is the probability that the mean of the sample will a) exceed 52.9 b) fall
between 50.5 and 52.3 c) be less than 50.6
$z_2 = \dfrac{\bar{x}_2 - \mu}{\sigma/\sqrt{n}} = \dfrac{52.3 - 51.4}{0.85} = 1.06$
6𝑥̅ 3
= -
𝜎 4
If Z < 0.75, ̅𝑥 is negative
$P(z < 0.75) = P(-\infty < z < 0.75)$
$= \int_{-\infty}^{0} \phi(z)\,dz + \int_{0}^{0.75} \phi(z)\,dz = 0.50 + 0.2734$
= 0.7734
10. The guaranteed average life of a certain type of electric bulbs is 1500 hrs with a S.D
of 120 hrs. It is decided to sample the output so as to ensure that 95% of bulbs do not fall
short of the guaranteed average by more than 2%. What will be the minimum sample
size?
∴ $|z| = \left|\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| = \left|\dfrac{1470 - 1500}{120/\sqrt{n}}\right| = \dfrac{\sqrt{n}}{4}$
From the given condition, the area under the normal probability curve to the left of
$\dfrac{\sqrt{n}}{4}$ should be 0.95.
We do not want to know about the bulbs which have life above the guaranteed life.
Taking $\dfrac{\sqrt{n}}{4} = 1.645$ gives n = 43.3, which is rounded up.
∴ n = 44
11. A normal population has a mean of 0.1 and standard deviation of 2.1 . Find the
probability that mean of a sample of size 900 will be negative .
12. In a study of an automobile insurance a random sample of 80 body repair costs had a
mean of Rs 472.36 and the S.D of Rs 62.35. If ̅𝒙 is used as a point estimator to the true
average repair costs , with what confidence we can assert that the maximum error doesn’t
exceed Rs 10.
∴ 𝑍𝛼⁄2= 1.43
13. If we can assert with 95% that the maximum error is 0.05 and P = 0.2 find the size of
the sample.
14. The mean and standard deviation of a population are 11,795 and 14,054 respectively
What can one assert with 95 % confidence about the maximum error if ̅𝒙 = 11,795 and n
= 50. And also construct 95% confidence interval for true mean .
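∴ Confidence interval = $\left(\bar{x} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$, where $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 1.96 \times \dfrac{14054}{\sqrt{50}} \approx 3896$
= (11795 − 3896, 11795 + 3896)
= (7899, 15691)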
15. Find 95% confidence limits for the mean of a normally distributed population from
which the following sample was taken 15, 17 , 10 ,18 ,16 ,9, 7, 11, 13 ,14.
Sol: We have $\bar{x} = \dfrac{15+17+10+18+16+9+7+11+13+14}{10} = 13$
$S^2 = \sum \dfrac{(x_i - \bar{x})^2}{n-1}$
$= \dfrac{1}{9}\left[(15-13)^2 + (17-13)^2 + (10-13)^2 + (18-13)^2 + (16-13)^2 + (9-13)^2 + (7-13)^2 + (11-13)^2 + (13-13)^2 + (14-13)^2\right]$
$= \dfrac{120}{9} = \dfrac{40}{3}$
∴ Confidence limits are $\bar{x} \pm Z_{\alpha/2}\dfrac{s}{\sqrt{n}} = 13 \pm 2.26 = (10.74, 15.26)$
16. A random sample of 100 teachers in a large metropolitan area revealed mean weekly
salary of Rs. 487 with a standard deviation Rs.48. With what degree of confidence can
we assert that the average weekly of all teachers in the metropolitan area is between 472
to 502 ?
= 2 ( 0.4991) = 0.9982
a) List all possible samples of size 3 that can be taken without replacement from finite
population
b) Calculate the mean of each of the sampling distribution of means
2. A population consist of five numbers 2,3,6,8 and 11. Consider all possible samples of size
two which can be drawn with replacement from this population .Find
a) The mean of the population
3. A random sample of size 100 is taken from a population with 𝜎 = 5.1 . Given that the
sample mean is 𝑥̅ = 21.6 Construct a 95% confidence limits for the population mean .
4. A normal population has a mean of 0.1 and standard deviation of 2.1 . Find the probability
and 𝜎 = 6.8.What is the probability that the mean of the sample will
a) exceed 52.9
b) fall between 50.5 and 52.3 c) be less than 50.6.
ASSIGNMENT QUESTIONS
1. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory
conformed to specifications. An examination of a sample of 200 pieces of equipment
revealed that 180 were faulty. Test his claim at 5% and 1% LOS.
2. Write about i) critical region ii) one tailed and two tailed test
3. Define sample. Explain the different methods that are involved in selecting the sample.
4. Explain about i) Type I error ii) Type II error
5.a)Explain the five step procedure for testing of hypothesis
b) Explain about i) point estimation ii) interval estimation
UNIT 5
STATISTICAL INFERENCES
OBJECTIVE
To make inferences about a population from sample data
OUTCOME
Draw statistical inference using samples of a given size which is
experimental data.
STATISTICAL INFERENCES
Large Samples: A random sample of size n ≥ 30 is regarded as a large sample.
1. Null hypothesis: $H_0$: There is no significant difference between the population mean
and the given value $\mu_0$,
i.e., $H_0: \mu = \mu_0$
NOTE: Confidence limits for the mean of the population corresponding to the given sample.
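Let $\bar{x}_1$ and $\bar{x}_2$ be the means of two random samples of sizes $n_1$ and $n_2$ drawn from two
populations having means $\mu_1$ and $\mu_2$ and S.D's $\sigma_1$ and $\sigma_2$.
i) Null hypothesis: $H_0: \mu_1 = \mu_2$
$Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{S.E \text{ of } (\bar{x}_1 - \bar{x}_2)} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
If $\sigma_1 = \sigma_2 = \sigma$, then $Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,[S.E \text{ of } (\bar{x}_1 - \bar{x}_2)]$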
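Let $p_1$ and $p_2$ be the sample proportions in two large random samples of sizes $n_1$ and $n_2$
drawn from two populations having proportions $P_1$ and $P_2$.
$Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$, where $p_1 = \dfrac{x_1}{n_1}$ and $p_2 = \dfrac{x_2}{n_2}$
OR
$Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $q = 1 - p$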
CRITICAL VALUES OF Z
LOS α        1%             5%             10%
μ ≠ μ0       |Z| > 2.58     |Z| > 1.96     |Z| > 1.645
μ > μ0       Z > 2.33       Z > 1.645      Z > 1.28
μ < μ0       Z < −2.33      Z < −1.645     Z < −1.28
$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\,(S.E \text{ of } p_1 - p_2)$
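The critical values in the table above can be regenerated from the inverse of the standard normal distribution function. The sketch below assumes Python's standard library; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed):
    """Positive critical value of Z for a given level of significance.

    For a two-tailed test the area alpha is split equally between both tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.01, 0.05, 0.10):
    two = critical_z(alpha, two_tailed=True)
    one = critical_z(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}: two-tailed |Z| > {two:.3f}, one-tailed Z > {one:.3f}")
```

For a left one-tailed test the same value is used with a negative sign.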
Problems:
1. A sample of 64 students has a mean weight of 70 kgs. Can this be regarded as
a sample from a population with mean weight 56 kgs and standard deviation
25 kgs?
Sol : Given $\bar{x}$ = mean of the sample = 70 kgs
𝜇 = Mean of the population = 56 kgs
𝜎 = S.D of population = 25 kgs
and 𝑛 = Sample size = 64
2. In a random sample of 60 workers, the average time taken by them to get to work
is 33.8 minutes with a standard deviation of 6.1 minutes. Can we reject the null
hypothesis μ = 32.6 in favor of the alternative hypothesis μ > 32.6 at the α =
0.05 LOS?
Sol : Given n = 60 , ̅𝑥 = 33.8 , 𝜇 = 32.6 and 𝜎 = 6.1
i) Null Hypothesis 𝐻0 : 𝜇 = 32.6
ii) Alternative Hypothesis 𝐻1 : 𝜇 > 32.6 ( Right one tailed test )
iii) Level of significance : $\alpha = 0.05$ ($Z_\alpha = 1.645$)
iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{33.8 - 32.6}{6.1/\sqrt{60}} = \dfrac{1.2}{0.7875} = 1.5238$
i.e., $\left(40 - \dfrac{1.96(10)}{\sqrt{400}},\ 40 + \dfrac{1.96(10)}{\sqrt{400}}\right)$
$= \left(40 - \dfrac{1.96(10)}{20},\ 40 + \dfrac{1.96(10)}{20}\right)$
= (40 − 0.98, 40 + 0.98)
= (39.02, 40.98)
4. An insurance agent has claimed that the average age of policy holders who issue
through him is less than the average for all agents which is 30.5. A random sample
of 100 policy holders who had issued through him gave the following age
distribution .
Age               16-20    21-25    26-30    31-35    36-40
No. of persons    12       22       20       30       16
Calculate the arithmetic mean and standard deviation of this distribution and
use these values to test his claim at 5% los.
$d_i = \dfrac{x_i - A}{h}$
$\bar{x} = A + \dfrac{h \sum f_i d_i}{N} = 28 + \dfrac{5 \times 16}{100} = 28.8$
S.D : $S = h\sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2} = 5\sqrt{\dfrac{164}{100} - \left(\dfrac{16}{100}\right)^2} = 6.35$
5. An ambulance service claims that it takes on the average less than 10 minutes to
reach its destination in emergency calls . A sample of 36 calls has a mean of 11
minutes and the variance of 16 minutes .Test the claim at 0.05 los?
Sol : Given n = 36 , ̅𝑥 =11 , 𝜇 = 10 and 𝜎 = √16 = 4
i) Null Hypothesis 𝐻0 : 𝜇 = 10
ii) Alternative Hypothesis 𝐻1 : 𝜇 < 10 ( Left one –tailed test )
iii) Level of significance : 𝛼 = 0.05 (𝑍𝛼 = 1.645 )
iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{11 - 10}{4/\sqrt{36}} = \dfrac{6}{4} = 1.5$
6. The means of two large samples of sizes 1000 and 2000 members are 67.5 inches
and 68 inches respectively . Can the samples be regarded as drawn from the same
population of S.D 2.5 inches.
Sol: Let 𝜇1 and 𝜇2 be the means of the two populations
Given 𝑛1 = 1000 , 𝑛2 = 2000 and ̅𝑥 1 = 67.5 inches , ̅𝑥 2 = 68 inches
Population S.D, 𝜎 = 2.5 inches
i) Null Hypothesis 𝐻0 :The samples have been drawn from the same population of
S.D 2.5 inches
i.e., 𝐻0 : 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 ≠ 𝜇2 ( Two – Tailed test)
iii) Level of significance : 𝛼 = 0.05 (𝑍𝛼 = 1.96 )
iv) Test Statistic : $Z_{cal} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{67.5 - 68}{2.5\sqrt{\dfrac{1}{1000} + \dfrac{1}{2000}}} = \dfrac{-0.5}{0.0968} = -5.16$
7. Samples of students were drawn from two universities and from their weights in
kilograms , mean and standard deviations are calculated and shown below. Make
a large sample test to test the significance of the difference between the means.
Mean S .D Size of the sample
University A 55 10 400
University B 57 15 100
8. The average marks scored by 32 boys is 72 with a S.D of 8 . While that for 36 girls
is 70 with a S.D of 6. Does this data indicate that the boys perform better than girls
at 5% los ?
Sol: Let 𝜇1 and 𝜇2 be the means of the two populations
Given 𝑛1 = 32 , 𝑛2 = 36 and ̅𝑥 1 = 72 , ̅𝑥 2 = 70
𝜎1 = 8 and 𝜎2 = 6
i) Null Hypothesis 𝐻0 : 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 > 𝜇2 ( Right One Tailed test)
iii) Level of significance : 𝛼 = 0.05 (𝑍𝛼 = 1.645 )
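iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = \dfrac{2}{\sqrt{2+1}} = 1.1547$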
v) Conclusion: Since |𝑍 𝑐𝑎𝑙| 𝑣𝑎𝑙𝑢𝑒 < 𝑍𝛼 value , we accept 𝐻0
Hence , we conclude that the performance of boys and girls is the same
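The large-sample test for the difference of two means used in Problem 8 can be sketched as a small function. The function name and argument order are assumptions made for illustration.

```python
import math

def two_sample_z(x1, x2, sd1, sd2, n1, n2):
    """Z statistic for the difference of two means (large samples)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # S.E of (x1 - x2)
    return (x1 - x2) / se

# Problem 8: boys (n = 32, mean 72, S.D 8) and girls (n = 36, mean 70, S.D 6)
z_cal = two_sample_z(72, 70, 8, 6, 32, 36)
z_alpha = 1.645                    # right one-tailed critical value at 5% LOS
accept_h0 = z_cal < z_alpha

print(round(z_cal, 4), accept_h0)
```

The same function applies to the other two-sample problems of this unit by changing the inputs and the critical value.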
9. A sample of the height of 6400 Englishmen has a mean of 67.85 inches and a S.D
of 2.56 inches while another sample of heights of 1600 Austrians has a mean of
68.55 inches and S.D of 2.52 inches. Do the data indicate that Austrians are on
the average taller than the Englishmen ? (Use 𝜶 𝒂𝒔 𝟎. 𝟎𝟏)
Sol : Let 𝜇1 and 𝜇2 be the means of the two populations
Given 𝑛1 = 6400 , 𝑛2 = 1600 and ̅𝑥 1 = 67.85 , ̅𝑥 2 = 68.55
𝜎1 = 2.56 and 𝜎2 = 2.52
i) Null Hypothesis 𝐻0 : 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 < 𝜇2 ( Left One Tailed test)
iii) Level of significance : 𝛼 = 0.01 (𝑍𝛼 = - 2.33 )
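iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{67.85 - 68.55}{\sqrt{\dfrac{2.56^2}{6400} + \dfrac{2.52^2}{1600}}}$
$= \dfrac{67.85 - 68.55}{\sqrt{\dfrac{6.5536}{6400} + \dfrac{6.35}{1600}}}$
$= \dfrac{-0.7}{\sqrt{0.001 + 0.004}} = \dfrac{-0.7}{0.0707} = -9.9$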
10. At a certain large university a sociologist speculates that male students spend
considerably more money on junk food than female students. To test her
hypothesis the sociologist randomly selects from records the names of 200 students.
Of these, 125 are men and 75 are women. The mean of the average amount spent
on junk food per week by the men is Rs. 400 and the S.D is 100. For the women the
sample mean is Rs. 450 and the S.D is 150. Test the hypothesis at 5% los.
Sol: Let 𝜇1 and 𝜇2 be the means of the two populations
Given 𝑛1 = 125 , 𝑛2 = 75 and ̅𝑥 1 = Mean of men = 400 , ̅𝑥 2 = Mean of women = 450
𝜎1 = 100 and 𝜎2 = 150
i) Null Hypothesis 𝐻0 : 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 > 𝜇2 ( Right One Tailed test)
iii) Level of significance : 𝛼 = 0.05 (𝑍𝛼 = 1.645 )
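iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{400 - 450}{\sqrt{\dfrac{100^2}{125} + \dfrac{150^2}{75}}}$
$= \dfrac{-50}{\sqrt{80 + 300}} = \dfrac{-50}{\sqrt{380}} = \dfrac{-50}{19.49} = -2.5654$
v) Conclusion: Since the $Z_{cal}$ value < $Z_\alpha$ value, we accept $H_0$.
Hence, we conclude that the data do not support the hypothesis that male students spend
more on junk food than female students.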
11. The research investigator is interested in studying whether there is a significant
difference in the salaries of MBA grads in two cities. A random sample of size 100
from city A yields an average income of Rs. 20,150 . Another random sample
of size 60 from city B yields an average income of Rs. 20,250. If the variance are
given as 𝝈𝟏𝟐 = 40,000 and
𝝈𝟐𝟐 = 32,400 respectively . Test the equality of means and also construct 95%
confidence limits.
Sol: Let 𝜇1 and 𝜇2 be the means of the two populations
Given 𝑛1 = 100 , 𝑛2 = 60 and ̅𝑥 1 = Mean of city A = 20,150 , ̅𝑥 2 = Mean of city B =
20,250
𝜎12 = 40,000 and 𝜎22 = 32,400
i) Null Hypothesis 𝐻0 : 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 ≠ 𝜇2 (Two -Tailed test)
iii) Level of significance : 𝛼 = 0.05 (𝑍𝛼 = 1.96 )
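iv) Test Statistic : $Z_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{20,150 - 20,250}{\sqrt{\dfrac{40000}{100} + \dfrac{32400}{60}}}$
$= \dfrac{-100}{\sqrt{400 + 540}} = \dfrac{-100}{30.66} = -3.26$, so $|Z_{cal}| = 3.26$
95% confidence limits: $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = (20,150 - 20,250) \pm 1.96\sqrt{\dfrac{40000}{100} + \dfrac{32400}{60}} = -100 \pm 60.09 = (-160.09, -39.91)$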
12. A die was thrown 9000 times and of these 3220 yielded a 3 or 4. Is this consistent
with the hypothesis that the die was unbiased?
Sol : Given n = 9000
P = Population proportion of successes
= P(getting a 3 or 4) = $\dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{6} = \dfrac{1}{3} = 0.3333$
Q = 1 − P = 0.6667
14. A manufacturer claimed that at least 95% of the equipment which he supplied to
a factory conformed to specifications. An examination of a sample of 200 pieces of
equipment revealed that 18 were faulty. Test the claim at 5% los.
Sol : Given n = 200
Number of pieces conforming to specifications = 200 − 18 = 182
∴ p = Proportion of pieces conforming to specifications = $\dfrac{182}{200} = 0.91$
P = Population proportion = $\dfrac{95}{100} = 0.95$
v) Conclusion: We reject 𝐻0
Hence , we conclude that the manufacturer’s claim is rejected.
15. Among 900 people in a state 90 are found to be chapatti eaters . Construct 99%
confidence interval for the true proportion and also test the hypothesis for single
proportion ?
Sol: Given x = 90 , n = 900
∴ $p = \dfrac{x}{n} = \dfrac{90}{900} = \dfrac{1}{10} = 0.1$
and q = 1 − p = 0.9
Now $\sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.1)(0.9)}{900}} = 0.01$
Confidence interval is $P = p \pm Z_{\alpha/2}\sqrt{\dfrac{pq}{n}}$
i) Null Hypothesis 𝐻0 : 𝑃1 = 𝑃2
ii) Alternative Hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2 ( Two Tailed test)
iii) Level of Significance : 𝛼 = 0.05 (𝑍𝛼 = 1.96 )
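iv) Test Statistic : $Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
We have $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{200 + 40}{400 + 200} = \dfrac{240}{600} = \dfrac{2}{5}$
$q = 1 - p = \dfrac{3}{5}$
$Z_{cal} = \dfrac{0.5 - 0.2}{\sqrt{(0.4)(0.6)\left(\dfrac{1}{400} + \dfrac{1}{200}\right)}} = 7.07$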
17. A machine puts out 16 imperfect articles in a sample of 500 articles. After the
machine is overhauled it puts out 3 imperfect articles in a sample of 100 articles.
Has the machine improved?
Sol : Let $P_1$ and $P_2$ be the proportions of imperfect articles in the population of
articles manufactured by the machine before and after overhauling, respectively.
Given 𝑛1 = Sample size before the machine overhauling = 500
𝑛2 = Sample size after the machine overhauling = 100
𝑥1 = Number of imperfect articles before overhauling = 16
𝑥2 = Number of imperfect articles after overhauling = 3
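∴ $p_1 = \dfrac{x_1}{n_1} = \dfrac{16}{500} = 0.032$ and $p_2 = \dfrac{x_2}{n_2} = \dfrac{3}{100} = 0.03$
i) Null Hypothesis $H_0 : P_1 = P_2$
ii) Alternative Hypothesis $H_1 : P_1 > P_2$ (Right one-tailed test)
iii) Level of Significance : $\alpha = 0.05$ ($Z_\alpha = 1.645$)
iv) Test Statistic : $Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
We have $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{16 + 3}{500 + 100} = \dfrac{19}{600} = 0.032$
q = 1 − p = 0.968
$Z_{cal} = \dfrac{0.032 - 0.03}{\sqrt{(0.032)(0.968)\left(\dfrac{1}{500} + \dfrac{1}{100}\right)}} = \dfrac{0.002}{0.019} = 0.104$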
Sol: Let 𝑃1 and 𝑃2 be the proportions of defective units in the population of units inspected
in machine 1 and Machine 2 respectively.
i) Null Hypothesis 𝐻0 : 𝑃1 = 𝑃2
ii) Alternative Hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2 ( Two Tailed test)
iii) Level of Significance : 𝛼 = 0.05 (𝑍𝛼 = 1.96 )
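iv) Test Statistic : $Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
q = 1 − p = 1 − 0.047 = 0.953
$Z_{cal} = \dfrac{0.045 - 0.049}{\sqrt{(0.047)(0.953)\left(\dfrac{1}{375} + \dfrac{1}{450}\right)}} \approx -0.27$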
v) Conclusion: Since |Zcal|value < Zα value , we accept 𝐻0
Hence we conclude that there is no significant difference in performance of
machines.
19. A cigarette manufacturing firm claims that its brand A line of cigarettes outsells
its brand B by 8%. It is found that 42 out of 200 smokers prefer brand A and 18
out of another sample of 100 smokers prefer brand B. Test whether the 8% difference
is a valid claim.
Sol: Given 𝑛1 = 200
𝑛2 = 100
𝑥1 = Number of smokers preferring brand A= 42
𝑥2 = Number of smokers preferring brand B = 18
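∴ $p_1 = \dfrac{x_1}{n_1} = \dfrac{42}{200} = 0.21$ and $p_2 = \dfrac{x_2}{n_2} = \dfrac{18}{100} = 0.18$
and $P_1 - P_2 = 8\% = 0.08$
$Z_{cal} = \dfrac{(p_1 - p_2) - (P_1 - P_2)}{S.E \text{ of } (p_1 - p_2)} = \dfrac{0.03 - 0.08}{0.0489} = \dfrac{-0.05}{0.0489} = -1.02$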
v) Conclusion: Since |𝑍𝑐𝑎𝑙|𝑣𝑎𝑙𝑢𝑒 < 𝑍𝛼 value , we accept 𝐻0
Hence we conclude that 8% difference in the sale of two brands of cigarettes is a
valid claim.
20. In a city A , 20% of a random sample of 900 schoolboys has a certain slight physical
defect . In another city B ,18.5% of a random sample of 1600 school boys has the
same defect . Is the difference between the proportions significant at 5% los?
Sol: Given 𝑛1 = 900
𝑛2 = 1600
𝑥1 = 20% of 900 = 180
𝑥2 = 18.5% of 1600 = 296
∴ $p_1 = \dfrac{x_1}{n_1} = \dfrac{180}{900} = 0.2$ and $p_2 = \dfrac{x_2}{n_2} = \dfrac{296}{1600} = 0.185$
i) Null Hypothesis 𝐻0 : 𝑃1 = 𝑃2
ii) Alternative Hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2 ( Two Tailed test)
iii) Level of Significance : 𝛼 = 0.05 (𝑍𝛼 = 1.96 )
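iv) Test Statistic : $Z_{cal} = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
We have $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{180 + 296}{900 + 1600} = \dfrac{476}{2500} = 0.19$
q = 1 − p = 1 − 0.19 = 0.81
$Z_{cal} = \dfrac{0.2 - 0.185}{\sqrt{(0.19)(0.81)\left(\dfrac{1}{900} + \dfrac{1}{1600}\right)}} = \dfrac{0.015}{0.01634} = 0.918$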
v) Conclusion: Since |𝑍𝑐𝑎𝑙|𝑣𝑎𝑙𝑢𝑒 < 𝑍𝛼 value , we accept 𝐻0
Hence we conclude that there is no significant difference between the proportions.
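The pooled test for the difference of two proportions in Problem 20 can be sketched in a few lines. The function name is a hypothetical choice; the inputs are the counts given in the problem.

```python
import math

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled proportion p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)      # pooled estimate of the common proportion
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Problem 20: 180 of 900 boys in city A, 296 of 1600 boys in city B
z_cal = pooled_two_proportion_z(180, 900, 296, 1600)
accept_h0 = abs(z_cal) < 1.96      # two-tailed test at 5% LOS

print(round(z_cal, 2), accept_h0)
```

Without rounding p to 0.19 the statistic comes out near 0.92, which leads to the same decision.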
SMALL SAMPLES
Introduction: When the sample size n < 30, the sample is referred to as a small sample. In this
case the sampling distribution may, in many cases, not be normal, i.e., we will not be justified in
estimating the population parameters as equal to the corresponding sample values.
Degrees of Freedom: The number of independent variates which make up the statistic is
known as the degrees of freedom (d.f) and is denoted by $\vartheta$.
For Example: If $x_1 + x_2 + x_3 = 50$ and we assign any values to two of the variables (say
$x_1, x_2$), then the value of $x_3$ is determined. Thus, the two variables are free and independent
choices for finding the third.
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$ is a random variable having the t-distribution with $\vartheta = n - 1$ degrees of freedom.
Properties of 𝒕 − Distribution
1. The t-distribution is bell shaped, similar to the normal
distribution, and is symmetrical about the mean.
2. The mean of the standard normal distribution as well as of the t-distribution is zero, but
the variance of the t-distribution depends upon the parameter $\vartheta$, which is called the
degrees of freedom.
3. The variance of the t-distribution exceeds 1, but approaches 1 as $n \to \infty$.
Applications of 𝒕 – Distributions
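If $\sigma$ is unknown, then $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, where $S^2 = \sum \dfrac{(X_i - \bar{X})^2}{n-1}$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ ------------ (1)
where $\bar{x}_1 = \dfrac{\sum x_1}{n_1}$, $\bar{x}_2 = \dfrac{\sum x_2}{n_2}$ and
$S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right]$
OR $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[n_1 s_1^2 + n_2 s_2^2\right]$
Confidence limits: $(\bar{x}_1 - \bar{x}_2) \pm t_\alpha \sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$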
For Example: To test the effectiveness of a drug, the blood pressure of some persons is measured
before and after the intake of the drug. Here the individual person is the experimental unit
and the two populations are blood pressure “before” and “after” the drug is given.
The paired t-test is applied for n paired observations by taking the differences $d_1, d_2, \dots, d_n$ of the
paired data, to test whether the differences $d_i$ form a random sample from a population with mean
$\mu$.
$t = \dfrac{\bar{d}}{s/\sqrt{n}}$ where $\bar{d} = \dfrac{1}{n}\sum d_i$ and $s^2 = \dfrac{1}{n-1}\sum (d_i - \bar{d})^2$
Problems:
1. A sample of 26 bulbs gives a mean life of 990 hours with a S.D of 20 hours. The
manufacturer claims that the mean life of bulbs is 1000 hours . Is the sample not upto
the standard?
Sol: Given n = 26
̅𝑥 = 990
𝜇 = 1000 and S.D i.e., s = 20
i) Null Hypothesis : 𝐻0 : 𝜇 = 1000
ii) Alternative Hypothesis: 𝐻1 : 𝜇 < 1000( Left one tailed test )
(Since it is given below standard)
iii) Level of significance : 𝛼 = 0.05
t tabulated value with 25 degrees of freedom for left tailed test is 1.708
iv) Test Statistic : $t_{cal} = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}} = \dfrac{990 - 1000}{20/\sqrt{25}} = -2.5$
i) Null Hypothesis 𝐻0 : 𝜇 = 56
ii) Alternative Hypothesis 𝐻1 : 𝜇 ≠ 56 (Two tailed test )
iii) Level of significance : 𝛼 = 0.05
t tabulated value with 15 degrees of freedom for two tailed test is 2.13
iv) Test Statistic : $t_{cal} = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{53 - 56}{\sqrt{10}/\sqrt{16}} = -3.79$
3. A random sample of 10 boys had the following I.Q’s : 70, 120 ,110, 101,88,
83,95,98,107 and 100.
a) Do these data support the assumption of a population mean I.Q of 100?
b) Find a reasonable range in which most of the mean I.Q values of samples of
10 boys lie
Sol: Since mean and s.d are not given
We have to determine these
x        x − x̄       (x − x̄)²
70       −27.2       739.84
120      22.8        519.84
110      12.8        163.84
101      3.8         14.44
88       −9.2        84.64
83       −14.2       201.64
95       −2.2        4.84
98       0.8         0.64
107      9.8         96.04
100      2.8         7.84
∑x = 972             ∑(x − x̄)² = 1833.60
Mean, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{972}{10} = 97.2$ and
$S^2 = \dfrac{1}{n-1}\sum (x - \bar{x})^2 = \dfrac{1833.6}{9} = 203.73$
∴ S = √203.73 = 14.27
i) Null Hypothesis 𝐻0 : 𝜇 = 100
ii) Alternative Hypothesis 𝐻1 : 𝜇 ≠ 100 (Two tailed test )
iii) Level of significance : 𝛼 = 0.05
t tabulated value with 9 degrees of freedom for two tailed test is 2.26
iv) Test Statistic : $t_{cal} = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{97.2 - 100}{14.27/\sqrt{10}} = -0.62$
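The one-sample t statistic of Problem 3 can be recomputed directly from the raw I.Q. scores. This is a minimal sketch; the variable names are assumptions, and the tabulated value 2.26 for 9 degrees of freedom is taken from the solution above.

```python
import math
from statistics import mean, stdev

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu0 = 100                        # hypothesized population mean

n = len(iq)
x_bar = mean(iq)                 # sample mean
s = stdev(iq)                    # sample S.D with divisor n - 1
t_cal = (x_bar - mu0) / (s / math.sqrt(n))

t_tab = 2.26                     # two-tailed value for 9 d.f at 5% LOS
reject_h0 = abs(t_cal) > t_tab

print(x_bar, round(s, 2), round(t_cal, 2), reject_h0)
```

Since |t_cal| is well below the tabulated value, the data are consistent with a population mean I.Q. of 100.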
4. Samples of two types of electric bulbs were tested for length of life and following
data were obtained
                   Type 1                Type 2
Sample number      $n_1$ = 8             $n_2$ = 7
Sample mean        $\bar{x}_1$ = 1234    $\bar{x}_2$ = 1036
Sample S.D         $s_1$ = 36            $s_2$ = 40
Is the difference in the mean sufficient to warrant that type 1 is superior to type
2 regarding length of life .
Sol: i) Null Hypothesis 𝐻0 : The two types of electric bulbs are identical
i.e., 𝐻0: 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 ≠ 𝜇2
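iii) Test Statistic : $t_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{1}{8+7-2}\left(8(36)^2 + 7(40)^2\right) = 1659.08$
∴ $t = \dfrac{1234 - 1036}{\sqrt{1659.08\left(\dfrac{1}{8} + \dfrac{1}{7}\right)}} = 9.39$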
iv) Degrees of freedom = 8+7-2 =13 ,tabulated value of t for 13 d.f at 5% los is 2.16
v)Conclusion: Since |𝑡 𝑐𝑎𝑙| 𝑣𝑎𝑙𝑢𝑒 > 𝑡𝛼 value , we reject 𝐻0
Hence we conclude that the two types 1 and 2 of electric bulbs are not identical .
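The pooled two-sample t statistic of Problem 4 can be sketched as follows. The function name is hypothetical; the pooled variance uses the same formula as the solution above, $S^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)$.

```python
import math

def pooled_t(x1, x2, s1, s2, n1, n2):
    """t statistic for two small samples, given sample S.D's s1 and s2."""
    s_sq = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)   # pooled variance
    return (x1 - x2) / math.sqrt(s_sq * (1 / n1 + 1 / n2))

# Problem 4: two types of electric bulbs
t_cal = pooled_t(1234, 1036, 36, 40, 8, 7)
t_tab = 2.16                     # tabulated value for 13 d.f at 5% LOS
reject_h0 = abs(t_cal) > t_tab

print(round(t_cal, 2), reject_h0)
```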
5. Two horses A and B were tested according to the time to run a particular track with
the following results .
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same running capacity
Sol: Given 𝑛1 = 7 , 𝑛2 = 6
We first compute the sample means and standard deviations.
$\bar{x}$ = Mean of the first sample = $\dfrac{1}{7}(28 + 30 + 32 + 33 + 33 + 29 + 34) = \dfrac{1}{7}(219) = 31.286$
$\bar{y}$ = Mean of the second sample = $\dfrac{1}{6}(29 + 30 + 30 + 24 + 27 + 29) = \dfrac{1}{6}(169) = 28.16$
x       x − x̄      (x − x̄)²     y       y − ȳ      (y − ȳ)²
28      −3.286     10.8         29      0.84       0.7056
30      −1.286     1.6538       30      1.84       3.3856
32      0.714      0.51         30      1.84       3.3856
33      1.714      2.94         24      −4.16      17.3056
33      1.714      2.94         27      −1.16      1.3456
29      −2.286     5.226        29      0.84       0.7056
34      2.714      7.366
∑x = 219           ∑(x − x̄)² = 31.4358   ∑y = 169   ∑(y − ȳ)² = 26.8336
Now $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right]$
$= \dfrac{1}{11}[31.4358 + 26.8336] = \dfrac{1}{11}(58.2694) = 5.297$
∴ S = √5.297 = 2.3
$t_{cal} = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{31.286 - 28.16}{2.3\sqrt{\dfrac{1}{7} + \dfrac{1}{6}}} = 2.443$
∴ $t_{cal}$ = 2.443
iv) Degrees of freedom = 7+6-2 =11
Hence we conclude that both horses do not have the same running capacity.
6. Ten soldiers participated in a shooting competition in the first week. After intensive
training they participated in the competition in the second week . Their scores before
and after training are given below :
Scores 67 24 57 55 63 54 56 68 33 43
before
Scores 70 38 58 58 56 67 68 75 42 38
after
Do the data indicate that the soldiers have been benefited by the training.
Sol: Given 𝑛1 = 10 , 𝑛2 = 10
We first compute the sample means and standard deviations.
$\bar{x}$ = Mean of the first sample = $\dfrac{1}{10}(67 + 24 + 57 + 55 + 63 + 54 + 56 + 68 + 33 + 43) = \dfrac{1}{10}(520) = 52$
$\bar{y}$ = Mean of the second sample = $\dfrac{1}{10}(70 + 38 + 58 + 58 + 56 + 67 + 68 + 75 + 42 + 38) = \dfrac{1}{10}(570) = 57$
x       x − x̄     (x − x̄)²     y       y − ȳ     (y − ȳ)²
67      15        225          70      13        169
24      −28       784          38      −19       361
57      5         25           58      1         1
55      3         9            58      1         1
63      11        121          56      −1        1
54      2         4            67      10        100
56      4         16           68      11        121
68      16        256          75      18        324
33      −19       361          42      −15       225
43      −9        81           38      −19       361
∑x = 520          ∑(x − x̄)² = 1882      ∑y = 570    ∑(y − ȳ)² = 1664
Now $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right]$
$= \dfrac{1}{18}[1882 + 1664] = \dfrac{1}{18}(3546) = 197$
∴ S = √197 = 14.0357
i) Null Hypothesis 𝐻0: 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 < 𝜇2 (Left one tailed test)
iii) Test Statistic : $t_{cal} = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{52 - 57}{14.0357\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -0.796$
∴ $t_{cal}$ = −0.796
Hence we conclude that the soldiers are not benefited by the training.
7. The blood pressure of 5 women before and after intake of a certain drug are given
below:
Before 110 120 125 132 125
After 120 118 125 136 121
Test whether there is significant change in blood pressure at 1% los?
Sol: Given n = 5
i) Null Hypothesis 𝐻0: 𝜇1 = 𝜇2
ii) Alternative Hypothesis 𝐻1 : 𝜇1 < 𝜇2 (Left one tailed test)
iii) Test Statistic : $t_{cal} = \dfrac{\bar{d}}{s/\sqrt{n}}$
where $\bar{d} = \dfrac{\sum d}{n}$ and $S^2 = \dfrac{1}{n-1}\sum (d - \bar{d})^2$
∴ S = 5.477
𝑑̅
𝑡𝑐𝑎𝑙 = 𝑠 = 5.4772 = 0.862
⁄√𝑛 ⁄
√5
$\bar{d} = \dfrac{-12}{10} = -1.2$
$S^2 = \dfrac{84 - (-1.2)^2 \times 10}{9} = 7.73$
∴ S = 2.78
$t_{cal} = \dfrac{\bar{d}}{s/\sqrt{n}} = \dfrac{-1.2}{2.78/\sqrt{10}} = -1.365$ and d.f = n − 1 = 9
Hence we conclude that there is no significant difference in memory capacity after the
training program.
Chi-Square (𝝌𝟐) Distribution
The chi-square distribution is a type of probability distribution. Probability
distributions provide the probability of every possible value that may occur. Cumulative
distributions give the probability of a random variable being less than or equal to a particular
value. Since the sum of the probabilities of every possible value must equal one, the total area
under the curve is equal to one. Chi-square distributions vary depending on the degrees of
freedom. The degrees of freedom are found by subtracting one from the number of categories in
the data.
If the calculated value of $\chi^2$ is greater than the table value, the fit is considered to be poor.
ii) Alternative hypothesis: $H_1$ : There is some difference between the given values and the calculated
values
iii) Test Statistic : $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
iv) At the specified level of significance for n−1 d.f if the given problem is a binomial
distribution
At the specified level of significance for n−2 d.f if the given problem is a Poisson distribution
v) Conclusion : If the $\chi^2_{cal}$ value < $\chi^2_{tab}$ value, then we accept $H_0$; otherwise reject $H_0$.
2. Chi – Square test for independence of attributes :
Definition : An attribute means a quality or characteristic
Eg: Drinking, Smoking, blindness, Honesty, beauty etc.,
Suppose attribute A is divided into two classes and attribute B is divided into two classes. The
various cell frequencies can be expressed in the following table, known as a 2x2 contingency table.
a          b          a + b
c          d          c + d
a + c      b + d      N = a + b + c + d
The expected frequencies are given by
$E(a) = \dfrac{(a + c)(a + b)}{N}$
$E(b) = \dfrac{(b + d)(a + b)}{N}$
$E(c) = \dfrac{(a + c)(c + d)}{N}$
$E(d) = \dfrac{(b + d)(c + d)}{N}$
$\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
The $\chi^2_{cal}$ value is to be compared with the $\chi^2_{tab}$ value at the 1% (5% or 10%) level of significance for
(r−1)(c−1) d.f, where r is the number of rows and c is the number of columns.
Note: In 𝝌𝟐 distribution for independence of attributes, we test if two attributes A and B are
independent or not.
iii) Test Statistic : $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
iv) At the specified level of significance for (m−1)(n−1) d.f, where m is the number of rows and n is the
number of columns
v) Conclusion : If the $\chi^2_{cal}$ value < $\chi^2_{tab}$ value, then we accept $H_0$; otherwise reject $H_0$.
Problems :
1. Fit a Poisson distribution to the following data and test for its goodness of fit at 5%
los
x 0 1 2 3 4
f 419 352 154 56 19
Sol:
x        f        fx
0        419      0
1        352      352
2        154      308
3        56       168
4        19       76
         N = 1000   ∑fx = 904
x                                             0        1       2        3       4       Total
f = 1000 × $\dfrac{e^{-\lambda}\lambda^x}{x!}$      406.2    366     165.4    49.8    12.6    1000
iii) $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
O        E         (O − E)²             (O − E)²/E
419      406.2     (419 − 406.2)²       (419 − 406.2)²/406.2
352      366       (352 − 366)²         (352 − 366)²/366
154      165.4     (154 − 165.4)²       (154 − 165.4)²/165.4
3. A die is thrown 264 times with following results. Show that the die is biased [ Given
𝝌𝟐𝟎.𝟎𝟓 = 11.07 for 5 d.f]
No. appeared on the die    1     2     3     4     5     6
Frequency                  40    32    28    58    54    52
iii) $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
Calculation of $\chi^2$:
O      E      (O − E)²    (O − E)²/E
40     44     16          0.3636
32     44     144         3.2727
28     44     256         5.8181
58     44     196         4.4545
54     44     100         2.2727
52     44     64          1.4545
$\sum \dfrac{(O - E)^2}{E} = 17.6362$
$\chi^2_{cal}$ = 17.6362
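The chi-square calculation for the die can be reproduced in a few lines. This is a minimal sketch with assumed variable names; the tabulated value 11.07 for 5 d.f is the one given in the problem statement.

```python
observed = [40, 32, 28, 58, 54, 52]

# Under H0 (unbiased die) each face is expected equally often: 264 / 6 = 44
expected = [sum(observed) / len(observed)] * len(observed)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi_tab = 11.07                  # tabulated value for 5 d.f at 5% LOS
is_biased = chi_sq > chi_tab

print(round(chi_sq, 3), is_biased)
```

Because the calculated value exceeds the tabulated value, the hypothesis of an unbiased die is rejected.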
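iii) $\chi^2_{cal} = \sum \dfrac{(O - E)^2}{E}$
$\dfrac{90 \times 100}{200} = 45$      $\dfrac{90 \times 100}{200} = 45$      90
$\dfrac{100 \times 110}{200} = 55$     $\dfrac{100 \times 110}{200} = 55$     110
100                                    100                                    200
where E = $\dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
Calculation of $\chi^2$:
O      E      (O − E)²    (O − E)²/E
60     45     225         5
30     45     225         5
40     55     225         4.09
70     55     225         4.09
$\sum \dfrac{(O - E)^2}{E} = 18.18$
$\chi^2_{cal}$ = 18.18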
Hence we conclude that new and conventional treatment are not independent.
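The test of independence for a contingency table follows the rule E = (row total × column total) / grand total. The sketch below applies it to the observed frequencies 60, 30, 40, 70 of this problem; the function name is a hypothetical choice, and the code works for any table given as a list of rows.

```python
def chi_square_independence(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq

observed_table = [[60, 30],
                  [40, 70]]
chi_sq = chi_square_independence(observed_table)
d_f = (len(observed_table) - 1) * (len(observed_table[0]) - 1)

print(round(chi_sq, 2), d_f)
```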
Snedecor’s F- Test of Significance
The F-distribution is also called the Variance Ratio Distribution, as it usually describes the ratio
of the variances of two normally distributed populations. The F-distribution is named
after R.A. Fisher, who first studied this test in 1924.
$F_{cal} = \dfrac{\text{Greater Variance}}{\text{Smaller Variance}}$
$F_{cal} = \dfrac{S_1^2}{S_2^2}$ or $\dfrac{S_2^2}{S_1^2}$
where
$S_1^2$ is the unbiased estimator of $\sigma_1^2$ and is calculated as: $S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = \dfrac{1}{n_1 - 1}\sum (x_1 - \bar{x}_1)^2$
$S_2^2$ is the unbiased estimator of $\sigma_2^2$ and is calculated as: $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = \dfrac{1}{n_2 - 1}\sum (x_2 - \bar{x}_2)^2$
To test the hypothesis that the two population variances $\sigma_1^2$ and $\sigma_2^2$ are
equal
i) $H_0 : \sigma_1^2 = \sigma_2^2$
iii) $F_{cal} = \dfrac{\text{Greater Variance}}{\text{Smaller Variance}}$
v) If the $F_{cal}$ value < $F_{tab}$ value, then we accept $H_0$; otherwise reject $H_0$.
$F_\alpha(\vartheta_1, \vartheta_2)$ is the value of F with $\vartheta_1$ and $\vartheta_2$ degrees of freedom such that the area under the
F-distribution to the right of $F_\alpha$ is $\alpha$.
Problems:
1. In one sample of 8 observations from a normal population, the sum of the squares of
deviations of the sample values from the sample mean is 84.4, and in another sample
of 10 observations it is 102.6. Test at the 5% level whether the populations have the
same variance.
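Sol: Let $\sigma_1^2$ and $\sigma_2^2$ be the variances of the two normal populations from which the
samples are drawn, and let the null hypothesis be $H_0: \sigma_1^2 = \sigma_2^2$.
Here $n_1 = 8$, $n_2 = 10$
Also $\sum (x_i - \bar{x})^2 = 84.4$, $\sum (y_i - \bar{y})^2 = 102.6$
$S_1^2 = \dfrac{1}{n_1 - 1} \sum (x_i - \bar{x})^2 = \dfrac{84.4}{7} = 12.057$
and
$S_2^2 = \dfrac{1}{n_2 - 1} \sum (y_i - \bar{y})^2 = \dfrac{102.6}{9} = 11.4$
Since $S_1^2 > S_2^2$, $F_{cal} = \dfrac{S_1^2}{S_2^2} = \dfrac{12.057}{11.4} = 1.058$
The degrees of freedom are $v_1 = n_1 - 1 = 8 - 1 = 7$ and $v_2 = n_2 - 1 = 10 - 1 = 9$
The tabulated value of F at the 5% level is $F_{0.05}(7, 9) = 3.29$
Since calculated F < tabulated F, we accept $H_0$, i.e., the populations may be taken to have the same variance.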
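The F-test for this problem can be sketched in code directly from the given sums of squared deviations. The following Python snippet is an illustrative sketch; the function name `f_test_from_sums_of_squares` is hypothetical, and 3.29 is the tabulated value quoted in the solution.

```python
def f_test_from_sums_of_squares(ss1, n1, ss2, n2):
    """Return (F, (numerator d.f., denominator d.f.), S1^2, S2^2).

    ss1, ss2 are sums of squared deviations from the sample means.
    The greater variance estimate is always placed in the numerator.
    """
    s1_sq = ss1 / (n1 - 1)   # unbiased estimate of the first population variance
    s2_sq = ss2 / (n2 - 1)   # unbiased estimate of the second population variance
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (n1 - 1, n2 - 1), s1_sq, s2_sq
    return s2_sq / s1_sq, (n2 - 1, n1 - 1), s1_sq, s2_sq

f_cal, dof, s1_sq, s2_sq = f_test_from_sums_of_squares(84.4, 8, 102.6, 10)
f_tab = 3.29                     # F_0.05(7, 9), as quoted in the text
accept_h0 = f_cal < f_tab        # accept H0 when calculated F < tabulated F

print(round(s1_sq, 3), round(s2_sq, 1), round(f_cal, 3), dof, accept_h0)
# 12.057 11.4 1.058 (7, 9) True
```

The calculated F is well below the tabulated value, so the hypothesis of equal population variances is accepted at the 5% level.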
Method II 27 33 42 35 32 34 38
Do the data show that the variances of the time distributions of the populations from which
these samples are drawn do not differ significantly?
Sol: Let the null hypothesis be $H_0: \sigma_1^2 = \sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the
two populations from which the samples are drawn.
$x$     $x - \bar{x}$   $(x - \bar{x})^2$   $y$     $y - \bar{y}$   $(y - \bar{y})^2$
20      -2.3            5.29                27      -7.4            54.76
16      -6.3            39.69               33      -1.4            1.96
26      3.7             13.69               42      7.6             57.76
27      4.7             22.09               35      0.6             0.36
23      0.7             0.49                32      -2.4            5.76
22      -0.3            0.09                34      -0.4            0.16
                                            38      3.6             12.96
Total:
134                     81.34               241                     133.72
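$n_1 = 6$, $n_2 = 7$
$\therefore \bar{x} = \dfrac{\sum x}{n_1} = \dfrac{134}{6} = 22.3$, $\bar{y} = \dfrac{\sum y}{n_2} = \dfrac{241}{7} = 34.4$
$\sum (x_i - \bar{x})^2 = 81.34$, $\sum (y_i - \bar{y})^2 = 133.72$
$S_1^2 = \dfrac{1}{n_1 - 1} \sum (x_i - \bar{x})^2 = \dfrac{81.34}{5} = 16.27$
and
$S_2^2 = \dfrac{1}{n_2 - 1} \sum (y_i - \bar{y})^2 = \dfrac{133.72}{6} = 22.29$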
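Let $H_0$ be true. Since $S_2^2 > S_1^2$,
$F_{cal} = \dfrac{S_2^2}{S_1^2} = 1.37$
with $(n_2 - 1, n_1 - 1) = (6, 5)$ degrees of freedom, because the greater variance is in the numerator.
The tabulated value is $F_{0.05}(6, 5) = 4.95$.
Since calculated F < tabulated F, we accept the null hypothesis $H_0$ at the 5% LOS, i.e., there is no
significant difference between the variances of the time distributions by the workers.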
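When the raw observations are available, the variance ratio can be computed without building the deviation table by hand. The sketch below uses `statistics.variance` from the Python standard library, which divides by n − 1 and so matches the unbiased estimators used above; the function name `variance_ratio` is illustrative. Because the code uses unrounded means, its results can differ from the hand calculation in the later decimal places.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Return (F, (numerator d.f., denominator d.f.)) with the greater variance on top."""
    var_a = variance(sample_a)   # sample variance with divisor n - 1
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

# x and y columns of the table in the solution
x = [20, 16, 26, 27, 23, 22]
y = [27, 33, 42, 35, 32, 34, 38]

f_cal, dof = variance_ratio(x, y)
print(round(f_cal, 2), dof)   # 1.37 (6, 5)
```

The degrees of freedom are reported in (numerator, denominator) order, which is the order needed when reading the tabulated F value.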
TUTORIAL QUESTIONS
1. A random sample of 500 apples was taken from a large consignment and 60 were found
to be bad. Obtain the 95% confidence interval for the percentage of bad apples in
the consignment.
2. The average income of 100 people of a city is Rs 210 with a standard deviation of
Rs 10. For another sample of 150 people the average income is Rs 220 with a standard
deviation of Rs 12. Test the significance of the difference between the two means at 5% LOS.
3. A coin is tossed 960 times. Head turned up 184 times. Find whether the coin is unbiased.
4. Random samples of 600 men and 900 women in a locality were asked whether they would like to
have a bus stop near their residence. 350 men and 475 women were in favor of the
proposal. Test the significance of the difference between the two proportions at 5% LOS.
5. A pair of dice is thrown 360 times and the frequency of each sum is indicated below:
Sum 2 3 4 5 6 7 8 9 10 11 12
Frequency 8 24 35 37 44 65 51 42 26 14 14
Would you say that the dice are fair on the basis of the chi-square test at 5% LOS?
6. The following are the average weekly losses of worker hours due to accidents in 10
industrial plants before and after a certain safety programme was put into operation:
Before 45 73 46 124 33 57 83 34 26 17
After 36 60 44 119 35 51 77 29 24 11
Test whether the safety programme is effective in reducing the number of accidents at the 5% level.
ASSIGNMENT QUESTIONS
1. A random sample of 500 items has mean 20 and another sample of size 400 has mean 15.
Can you conclude that the two samples are taken from the same population with S.D. 4?
2. A sample of 500 products from a factory is examined and 5% are found to be defective.
Another sample of 400 similar products is examined and 3% are found to be defective.
Test the significance of the difference between the two proportions at 5% LOS.
3. 20 people were attacked by a disease and only 18 survived. Will you reject the hypothesis
that the survival rate of the attack by this disease is 85% in favor of the hypothesis that it is
more, at 5% LOS?
4. Ten specimens of copper wire drawn from a large lot have the following breaking
strengths (in kg): 518, 572, 570, 568, 572, 578, 572, 569, 548. Test whether the mean breaking
strength of the lot may be taken to be 518 kg weight.
5. A survey of 320 families with 5 children each revealed the following distribution:
No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is this result consistent with the hypothesis that male and female births are equally
probable?
R15
Code No: R15A0024
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
II B.Tech I Semester Regular/Supplementary Examinations, November 2017
Probability and Statistics
(CSE, IT)
4. Calculate the correlation coefficient for the following heights (in inches) of fathers (X)
and their sons (Y):
X : 65 66 67 67 68 69 70 72
Y : 67 68 65 68 72 72 69 71
(OR)
5. Fit a linear regression equation of Y on X to the following data:
X: 5 8 7 6 4
Y: 3 4 5 2 1
SECTION – III
6. A sample of 900 members has a mean of 3.4 cms and s.d. 2.61 cms. Is the sample drawn from
a large population of mean 3.25 cms and s.d. 2.61 cms? Test at 5% level.
(OR)
7. In a sample of 1000 people in a state, 540 are rice eaters and the rest are wheat eaters. Can
we assume that both rice and wheat eaters are equally popular in this state at 1% level of
significance?
SECTION – IV
8. A sample of 10 boys has the I.Q.s 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100. Test whether the
mean I.Q. of the students is 100 at 0.05 level of significance.
(OR)
9. A survey of 320 families with 5 children each, revealed the following distribution. Is the
result consistent with the hypothesis that male and female births are equally probable at
0.01 significance level?
No. of Boys : 5 4 3 2 1 0
No. of Girls : 0 1 2 3 4 5
No. of families: 14 56 110 88 40 12
SECTION – V
10. A television repairman finds that the time spent on his jobs has an exponential distribution with
mean 30 minutes. If he repairs sets in the order in which they came in, and if the arrival of
sets follows a Poisson distribution approximately with an average rate of 10 per 8-hour
day, what is the repairman's expected idle time each day? How many jobs are ahead of
the average set just brought in?
(OR)
11. Explain Markov chain by an example.
******
Code No: R17A0024 R17
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
II B. Tech I Semester Regular Examinations, November 2018
Probability and Statistics
(CSE& IT)
SECTION-I
1 a) The average number of accidents on any day on a national highway is 1.8. Determine the [7M]
probability that the number of accidents is
(I) At least one
(II) At the most one
b) Four coins are tossed 160 times. The number of times X heads occurs is given below [7M]
X 0 1 2 3 4
No of times 8 34 69 43 6
Fit the binomial distribution for the above data
OR
SECTION-III
5 a) Explain in brief one tailed and two tailed tests [4M]
b) A random sample of 400 students is found to have a mean height of 171.38 cms. Can it be
reasonably regarded as a sample from a large population with mean height 171.17 cms and
standard deviation 3.30 cms? (Test at 5% level of significance.) [5M]
c) A random sample of 500 apples was taken from a large consignment and 60 were found
bad. Obtain the 98% confidence limits for the percentage of bad apples in the consignment
(given z = 2.33). [5M]
OR
6 Random samples of 400 men and 600 women were asked whether they would like to have a flyover [14M]
near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that
proportions of men and women in favour of the proposal are same at 5% level.
SECTION-IV
7 A sample of 10 boys has the I.Q.s 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100. Test whether the mean I.Q. [14M]
of the students is 100 at 0.05 level of significance.
OR
8 Fit a Poisson distribution to the following data and test the goodness of fit: [14M]
No. of accidents: 0 1 2 3 4 5 6
No. of days : 150 65 45 34 10 6 2
SECTION-V
9 a) What are the measures of the queuing model (M/M/1) : (N/FCFS)? [14M]
b) A self-service canteen employs one cashier at its counter. 8 customers arrive every 10 minutes
on average. The cashier can serve on average one customer per minute. Assuming that arrivals are Poisson
and the service time distribution is exponential, determine
i) The average number of customers in the system
ii) The average queue length
iii) The average time a customer spends in the system
iv) Average waiting time of each customer.
OR
10 a) Define Markov chain. Give examples. [14M]
b) Explain the limiting distribution of a Markov chain.
**********
R15
Code No: R15A0024
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
II B.Tech I Semester supplementary Examinations, November 2018
Probability and Statistics
(CSE &IT)
SECTION-IV
8 200 digits were chosen at random from a set of tables. The frequencies of the [10M]
digits were
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 18 19 23 21 16 25 22 20 21 15
Use χ2 test to assess the correctness of the hypothesis that the digits were
distributed in equal numbers in the table at 0.05 level of significance.
OR
9 Two horses A and B were tested according to the time (in seconds) to run a [10M]
particular track with the following results.
Horse A: 28 30 32 33 33 29 34
Horse B: 29 30 30 24 27 29
Test whether the two horses have the same running capacity.
SECTION-V
10 Assume the goods trains are coming in a yard at the rate of 30 trains per day and [10M]
suppose that inter-arrival time follows an exponential distribution. The service
time for each train is assumed to be exponential with an average of 36 minutes.
If the yard can admit 9 trains at a time, calculate the probability that the yard is
empty and find the average queue length.
OR
11 A raining process is considered as a two-state Markov chain. If it rains, it is [10M]
considered to be in state 0, and if it does not rain, the chain is in state 1. The
transition probability matrix of the Markov chain is defined by $P = \begin{bmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix}$. Find the
probability that it will rain for three days from today, assuming that it is raining
today. Assume the initial probabilities of state 0 and state 1 as 0.4 and 0.6
respectively.
******
R17
Code No: R17A0024
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
II B.Tech I Semester Supplementary Examinations, May 2019
Probability and Statistics
(CSE & IT)
SECTION-II
3 Find Karl Pearson's coefficient of correlation for the paired data: [14M]
Cost of living 98 99 99 95 92 95 94 90 91 97
OR
4 a) In a record of an analysis of correlation data, the following results are readable: [7M]
Variance of X = 9; Regression equations: 8X-10Y+66 = 0 and
40X-18Y = 214. Find (i) the mean values of X and Y, (ii) the correlation coefficient
between X and Y and (iii) the standard deviation of Y.
b) Find the Spearman rank correlation coefficient for the following data: [7M]
X: 11 12 43 84 15
Y: 8 15 30 60 12
SECTION-III
5 a) Explain in brief: (1) Type-I error (2) Type-II error [3M]
b) Discuss the test procedure for testing a single mean of the population when
the sample size is large. [4M]
c) Nine determinations of the specific heat of iron had a SD of 0.0086. [7M]
Assuming that the determinations constitute a random sample from a normal
population, test the claim that the SD σ is less than 0.01 at 95% confidence.
OR
6 a) Random samples of 400 men and 600 women were asked whether they would [7M]
like to have a flyover near their residence. 200 men and 325 women were in
favour of the proposal. Test the hypothesis that proportions of men and
women in favour of the proposal are same at 5% level.
b) In a large consignment of oranges, a random sample of 64 oranges revealed [7M]
that 14 oranges were bad. Is it reasonable to assume that 20% are bad?
SECTION-IV
7 Some investigators have proposed that students have elevated blood pressure during finals [14M]
week. To test this hypothesis 8 students volunteered to have their blood pressure taken at
the beginning of the semester and then again during finals week. The blood pressure data
(diastolic) is listed below. Test the hypothesis to determine if there is a significant
difference in blood pressure levels between the first week and finals week.
Student: A B C D E F G H
1st Week: 89 76 84 89 74 66 56 74
Final: 92 78 92 94 93 74 65 73
(table value t(0.05, 7) = 2.365)
OR
8 Fit a Poisson distribution to the following data and test the goodness of fit: [14M]
No. of accidents: 0 1 2 3 4 5 6
No. of days : 150 65 45 34 10 6 2
SECTION-V
9 a) What are the measures of queuing model M / M /1: / FCFS [4M]
b) In a railway marshalling yard, goods trains arrive at a rate of 30 trains per day. Assuming [10M]
that the inter-arrival time follows an exponential distribution and the service time distribution is
also exponential with an average of 36 minutes, calculate:
(1) The mean queue size
(2) The probability that the queue length exceeds 10.
OR
10 A gambler has Rs.2. He bets Rs.1 at a time and wins Rs.1 with probability 0.5. [14M]
He stops playing if he loses Rs.2 or wins Rs.4.
a) What is the transition probability matrix of the related Markov chain?
b) What is the probability that he has lost his money at the end of 5 plays?
**********
R15
Code No: R15A0024
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)
II B. Tech I Semester Supplementary Examinations, May 2018
Probability and Statistics
(CSE & IT)
(b) A problem in statistics is given to three students A, B and C, whose chances of solving
it are 1/2, 1/3 and 1/4 respectively. What is the probability that the problem will be
solved? (3M)
(c) What are the normal equations to fit a straight line equation? (2M)
(d) From the following data, compute the coefficient of correlation between X and Y. (3M)
X Series Y Series
No. of Items 15 15
Arithmetic Mean 25 18
SECTION – I
4. Find the Spearman rank correlation coefficient for the following data: (10M)
X: 11 12 43 84 15
Y: 8 15 30 60 12
(OR)
5. Estimate the production for the year 2008, by fitting regression line to the following data:
(10M)
6. The means of two large samples of 1000 and 2000 items are 67.5 cms and 68.0 cms
respectively. Can the samples be regarded as drawn from the same population with standard
deviation 2.5 cms? Test at 5% level of significance. (10M)
(OR)
7. A random sample of 500 apples was taken from a large consignment and 60 were found
bad. Obtain the 98% confidence limits for the percentage of bad apples in the
consignment. (given z = 2.33) (10M)
SECTION – IV
8. Fit a Poisson distribution to the following data and test the goodness of fit: (10M)
No. of accidents : 0 1 2 3 4 5 6
No. of days : 150 65 45 34 10 6 2
(OR)
9. A sample of 10 boys has the I.Q.s 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100. Test whether the
mean I.Q. of the students is 100 at 0.05 level of significance. (10M)
SECTION – V
10. What are the characteristics of a queuing system? Explain them in detail. (10M)
(OR)
11. Explain stochastic processes in detail. (10M)
*******