DSSM
Unit – I
Data Representation
Let X and Y be two random variables with means E[X] and E[Y] respectively. Then the covariance of X and Y is
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
Expanding the product,
Cov(X, Y) = E[XY − X E[Y] − Y E[X] + E[X] E[Y]]
          = E[XY] − E[X] E[Y] − E[Y] E[X] + E[X] E[Y]
Cov(X, Y) = E[XY] − E[X] E[Y]
Note:
• Covariance of two independent variables is zero. But converse need not be true.
• If a and b are constants
Cov(X + a,Y + b) = Cov(X,Y)
Cov(aX,bY) = abCov(X,Y)
• Population covariance of X and Y is given by Cov(X, Y) = (1/N) Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
• Sample covariance of X and Y is given by Cov(X, Y) = (1/(N − 1)) Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
• Cov(X, X) = Var(X) and Cov(Y, Y) = Var(Y)
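The properties listed above can be checked numerically. The sketch below (an illustration, not part of the original notes) verifies the shift, scaling, and Cov(X, X) = Var(X) properties on arbitrary random data using the sample (N − 1) divisor defined above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

def cov(u, v):
    # sample covariance with the 1/(N-1) divisor used in these notes
    return float(np.sum((u - u.mean()) * (v - v.mean())) / (len(u) - 1))

# Cov(X + a, Y + b) = Cov(X, Y): shifting by constants leaves covariance unchanged
assert abs(cov(x + 3, y - 7) - cov(x, y)) < 1e-9
# Cov(aX, bY) = ab Cov(X, Y)
assert abs(cov(2 * x, 5 * y) - 10 * cov(x, y)) < 1e-9
# Cov(X, X) = Var(X)
assert abs(cov(x, x) - x.var(ddof=1)) < 1e-9
```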
04.
x: 65.21  64.75  65.56  66.45  65.34
y: 67.15  66.29  66.2   64.7   66.54
05.
x: 5  6  8  11  4  6
y: 1  4  3  7   9  2
06.
x: 13  15  17  18  19
y: 10  11  12  14  16
Number of samples N = 4
Computation table, with mean values x̄ = (1/N) Σ xᵢ and ȳ = (1/N) Σ yᵢ:
 x     y    (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 2.1   8     -0.95     0.903      -3         9          2.85
 2.5   12    -0.55     0.303       1         1         -0.55
 4     14     0.95     0.903       3         9          2.85
 3.6   10     0.55     0.303      -1         1         -0.55
x̄ = 3.05, ȳ = 11; Σ(xᵢ − x̄)² = 2.41, Σ(yᵢ − ȳ)² = 20, Σ(xᵢ − x̄)(yᵢ − ȳ) = 4.6

Cov(X, X) = (1/(N − 1)) Σ(xᵢ − x̄)² = (1/3)(2.41) = 0.803
Cov(X, Y) = Cov(Y, X) = (1/(N − 1)) Σ(xᵢ − x̄)(yᵢ − ȳ) = (1/3)(4.6) = 1.533 and
Cov(Y, Y) = (1/(N − 1)) Σ(yᵢ − ȳ)² = (1/3)(20) = 6.667
Number of samples N = 6
Computation table, with mean values x̄ = (1/N) Σ xᵢ and ȳ = (1/N) Σ yᵢ:
 x    y    (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 98   15    9.667     93.444      3.333     11.111     32.222
 87   12   -1.333      1.778      0.333      0.111     -0.444
 90   10    1.667      2.778     -1.667      2.778     -2.778
 85   10   -3.333     11.111     -1.667      2.778      5.556
 95   16    6.667     44.444      4.333     18.778     28.889
 75    7  -13.333    177.778     -4.667     21.778     62.222
x̄ = 88.333, ȳ = 11.667; Σ(xᵢ − x̄)² = 331.333, Σ(yᵢ − ȳ)² = 57.333, Σ(xᵢ − x̄)(yᵢ − ȳ) = 125.666

Cov(X, X) = (1/(N − 1)) Σ(xᵢ − x̄)² = (1/5)(331.333) = 66.267
Cov(X, Y) = Cov(Y, X) = (1/(N − 1)) Σ(xᵢ − x̄)(yᵢ − ȳ) = (1/5)(125.666) = 25.133
and Cov(Y, Y) = (1/(N − 1)) Σ(yᵢ − ȳ)² = (1/5)(57.333) = 11.467
Number of samples N = 5
Computation table, with mean values x̄ = (1/N) Σ xᵢ and ȳ = (1/N) Σ yᵢ:
 x    y    (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 13   10   -3.400    11.560     -2.600     6.760      8.840
 15   11   -1.400     1.960     -1.600     2.560      2.240
 17   12    0.600     0.360     -0.600     0.360     -0.360
 18   14    1.600     2.560      1.400     1.960      2.240
 19   16    2.600     6.760      3.400    11.560      8.840
x̄ = 16.4, ȳ = 12.6; Σ(xᵢ − x̄)² = 23.2, Σ(yᵢ − ȳ)² = 23.2, Σ(xᵢ − x̄)(yᵢ − ȳ) = 21.8

Cov(X, X) = (1/(N − 1)) Σ(xᵢ − x̄)² = (1/4)(23.2) = 5.8
Cov(X, Y) = Cov(Y, X) = (1/(N − 1)) Σ(xᵢ − x̄)(yᵢ − ȳ) = (1/4)(21.8) = 5.45 and
Cov(Y, Y) = (1/(N − 1)) Σ(yᵢ − ȳ)² = (1/4)(23.2) = 5.8
Number of samples N = 5
Computation table, with mean values x̄ = (1/N) Σ xᵢ and ȳ = (1/N) Σ yᵢ:
 x       y       (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 65.21   67.15   -0.252    0.064      0.974     0.949     -0.245
 64.75   66.29   -0.712    0.507      0.114     0.013     -0.081
 65.56   66.2     0.098    0.010      0.024     0.001      0.002
 66.45   64.7     0.988    0.976     -1.476     2.179     -1.458
 65.34   66.54   -0.122    0.015      0.364     0.132     -0.044
x̄ = 65.462, ȳ = 66.176; Σ(xᵢ − x̄)² = 1.571, Σ(yᵢ − ȳ)² = 3.273, Σ(xᵢ − x̄)(yᵢ − ȳ) = −1.827

Cov(X, X) = (1/(N − 1)) Σ(xᵢ − x̄)² = (1/4)(1.571) = 0.393
Cov(X, Y) = Cov(Y, X) = (1/(N − 1)) Σ(xᵢ − x̄)(yᵢ − ȳ) = (1/4)(−1.827) = −0.457
and Cov(Y, Y) = (1/(N − 1)) Σ(yᵢ − ȳ)² = (1/4)(3.273) = 0.818
Step-02: Compute the mean vector (µ), using the formulae x̄ = (1/N) Σ xᵢ and ȳ = (1/N) Σ yᵢ.
Calculate the covariance matrix using
Cov(X, X) = (1/(N − 1)) Σ(xᵢ − x̄)², Cov(X, Y) = (1/(N − 1)) Σ(xᵢ − x̄)(yᵢ − ȳ),
Cov(Y, X) = (1/(N − 1)) Σ(yᵢ − ȳ)(xᵢ − x̄) and Cov(Y, Y) = (1/(N − 1)) Σ(yᵢ − ȳ)².
Step-05: Calculate the eigenvectors and eigenvalues of the covariance matrix.
Step-07: Derive the new data set, Pᵢⱼ = eᵢᵀ (xⱼ − x̄; yⱼ − ȳ).
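The steps above can be sketched with NumPy. This is a minimal illustration (not part of the original notes) run on the 2-D data of the worked example that follows; `eigh` returns eigenvalues in ascending order, so the last column is the principal component.

```python
import numpy as np

# Data of the worked example below
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.0, 5.0, 3.0, 6.0, 7.0, 8.0])

# Step: subtract the means from the data
X = np.vstack([x - x.mean(), y - y.mean()])
# Step: sample covariance matrix with the (N - 1) divisor
C = X @ X.T / (len(x) - 1)
# Step: eigenvalues/eigenvectors of the covariance matrix
vals, vecs = np.linalg.eigh(C)   # ascending eigenvalues
e1 = vecs[:, -1]                 # normalized eigenvector of the largest eigenvalue
# Step: project the mean-centred data onto the principal component
P = e1 @ X
```

The sign of `e1` is arbitrary (an eigenvector scaled by −1 is still an eigenvector), so the projections P may come out with all signs flipped relative to a hand computation.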
x: 2  3  4  5  6  7
y: 1  5  3  6  7  8
Solution: Given data
Number of samples N = 6
Computation table, with mean of X, x̄ = (1/N) Σ xᵢ, and mean of Y, ȳ = (1/N) Σ yᵢ:
 x   y   (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 2   1    -2.5      6.25       -4        16         10
 3   5    -1.5      2.25        0         0          0
 4   3    -0.5      0.25       -2         4          1
 5   6     0.5      0.25        1         1          0.5
 6   7     1.5      2.25        2         4          3
 7   8     2.5      6.25        3         9          7.5
x̄ = 4.5, ȳ = 5; Σ(xᵢ − x̄)² = 17.5, Σ(yᵢ − ȳ)² = 34, Σ(xᵢ − x̄)(yᵢ − ȳ) = 22

Cov(X, X) = (1/5)(17.5) = 3.5, Cov(X, Y) = Cov(Y, X) = (1/5)(22) = 4.4 and Cov(Y, Y) = (1/5)(34) = 6.8

The covariance matrix is A = [3.5 4.4; 4.4 6.8].
|A − λI| = |3.5−λ 4.4; 4.4 6.8−λ| = 0
λ² − 10.3λ + 4.44 = 0, giving λ₁ = 0.451 and λ₂ = 9.849.
Clearly, the first eigenvalue is very small compared to the second eigenvalue, so we take λ = 9.849.
(A − λI)X = [−6.35 4.4; 4.4 −3.05](x; y) = 0
x/3.05 = y/4.4, so x = 3.05, y = 4.4.
The eigenvector is X = (3.05; 4.4), the principal component.
Now the normalized eigenvector is e₁ = X/‖X‖ = (1/5.35)(3.05; 4.4) = (0.57; 0.822).
The new data set is Pᵢⱼ = e₁ᵀ (xᵢ − x̄; yᵢ − ȳ):
P₁₁ = [0.57 0.822](2 − 4.5; 1 − 5) = [0.57 0.822](−2.5; −4) = −4.71
P₁₂ = [0.57 0.822](3 − 4.5; 5 − 5) = [0.57 0.822](−1.5; 0) = −0.86
P₁₃ = [0.57 0.822](4 − 4.5; 3 − 5) = [0.57 0.822](−0.5; −2) = −1.93
P₁₄ = [0.57 0.822](5 − 4.5; 6 − 5) = [0.57 0.822](0.5; 1) = 1.11
P₁₅ = [0.57 0.822](6 − 4.5; 7 − 5) = [0.57 0.822](1.5; 2) = 2.50
P₁₆ = [0.57 0.822](7 − 4.5; 8 − 5) = [0.57 0.822](2.5; 3) = 3.89
X: 2.5  0.5  2.2  1.9  3.1  2.3  2.0  1.0  1.5  1.1
Y: 2.4  0.7  2.9  2.2  3.0  2.7  1.6  1.1  1.6  0.9
Solution: Given data
Number of samples N = 10
Computation table, with mean of X, x̄ = (1/N) Σ xᵢ, and mean of Y, ȳ = (1/N) Σ yᵢ:
 x     y     (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 2.5   2.4    0.69      0.48       0.49      0.24       0.34
 0.5   0.7   -1.31      1.72      -1.21      1.46       1.59
 2.2   2.9    0.39      0.15       0.99      0.98       0.39
 1.9   2.2    0.09      0.01       0.29      0.08       0.03
 3.1   3.0    1.29      1.66       1.09      1.19       1.41
 2.3   2.7    0.49      0.24       0.79      0.62       0.39
 2.0   1.6    0.19      0.04      -0.31      0.10      -0.06
 1.0   1.1   -0.81      0.66      -0.81      0.66       0.66
 1.5   1.6   -0.31      0.10      -0.31      0.10       0.10
 1.1   0.9   -0.71      0.50      -1.01      1.02       0.72
x̄ = 1.81, ȳ = 1.91; Σ(xᵢ − x̄)² = 5.55, Σ(yᵢ − ȳ)² = 6.45, Σ(xᵢ − x̄)(yᵢ − ȳ) = 5.54

Cov(X, X) = (1/9)(5.55) = 0.616, Cov(X, Y) = Cov(Y, X) = (1/9)(5.54) = 0.615 and Cov(Y, Y) = (1/9)(6.45) = 0.716

The covariance matrix is A = [0.616 0.615; 0.615 0.716].
|A − λI| = |0.616−λ 0.615; 0.615 0.716−λ| = 0
λ² − 1.332λ + 0.063 = 0, giving λ₁ = 0.049 and λ₂ = 1.283.
Clearly, the first eigenvalue is very small compared to the second eigenvalue, so we take λ = 1.283.
(A − λI)X = [−0.667 0.615; 0.615 −0.567](x; y) = 0
x/0.567 = y/0.615
The eigenvector is X = (0.567; 0.615), the principal component.
Now the normalized eigenvector is e₁ = X/‖X‖ = (1/0.836)(0.567; 0.615) = (0.678; 0.735).
The new data set is Pᵢⱼ = e₁ᵀ (xᵢ − x̄; yᵢ − ȳ); for example,
P₁₇ = [0.678 0.735](2 − 1.81; 1.6 − 1.91) = [0.678 0.735](0.19; −0.31) = −0.099
P₁₈ = [0.678 0.735](1 − 1.81; 1.1 − 1.91) = [0.678 0.735](−0.81; −0.81) = −1.145
PCA:
       P₁₁    P₁₂     P₁₃    P₁₄    P₁₅    P₁₆    P₁₇     P₁₈     P₁₉     P₁,₁₀
      0.828  -1.778  0.992  0.274  1.676  0.913  -0.099  -1.145  -0.438  -1.224
Number of samples N = 5
Computation table, with mean of X, x̄ = (1/N) Σ xᵢ = 3, and mean of Y, ȳ = (1/N) Σ yᵢ = 6:
 x   y   (xᵢ − x̄)  (xᵢ − x̄)²  (yᵢ − ȳ)  (yᵢ − ȳ)²  (xᵢ − x̄)(yᵢ − ȳ)
 1   4    -2        4          -2        4          4
 2   8    -1        1           2        4         -2
 4   6     1        1           0        0          0
 6   5     3        9          -1        1         -3
 2   7    -1        1           1        1         -1

Cov(X, X) = (1/4)(16) = 4
Cov(X, Y) = Cov(Y, X) = (1/4)(−2) = −0.5 and
Cov(Y, Y) = (1/4)(10) = 2.5

The covariance matrix is A = [4 −0.5; −0.5 2.5].
|A − λI| = |4−λ −0.5; −0.5 2.5−λ| = 0
λ² − 6.5λ + 9.75 = 0, giving λ₁ = 2.3486 and λ₂ = 4.1514.
Clearly, the first eigenvalue is small compared to the second eigenvalue, so we take λ = 4.1514.
Consider (A − λI)X = [−0.1514 −0.5; −0.5 −1.6514](x; y) = 0
x/−1.6514 = y/0.5
The eigenvector is X = (−1.6514; 0.5), the principal component.
Now the normalized eigenvector is e₁ = X/‖X‖ = (1/1.7254)(−1.6514; 0.5) = (−0.9571; 0.2898).
(Here ‖X‖ = √(1.6514² + 0.5²) = √2.977 = 1.7254.)
The new data set is Pᵢⱼ = e₁ᵀ (xᵢ − x̄; yᵢ − ȳ):
P₁₁ = [−0.9571 0.2898](−2; −2) = 1.335
P₁₂ = [−0.9571 0.2898](−1; 2) = 1.537
P₁₃ = [−0.9571 0.2898](1; 0) = −0.957
P₁₄ = [−0.9571 0.2898](3; −1) = −3.161
P₁₅ = [−0.9571 0.2898](−1; 1) = 1.247
01.
x: 4   8  13  7
y: 11  4  5   14
02.
x: 1  2  3  4
y: 5  7  9  13
Where Σ is an m×n diagonal matrix whose non-negative real entries σ₁, σ₂, ..., σₙ are called singular values,
i.e., Σ = [σ₁ 0 ... 0; 0 σ₂ ... 0; 0 0 ... σₙ]
Where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂, ..., uₙ = (1/σₙ)Avₙ; here σ₁ = √λ₁, σ₂ = √λ₂, ..., σₙ = √λₙ.
Solution: The given matrix is A = [0 −2; 1 5] of order 2×2.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [0 1; −2 5][0 −2; 1 5] = [1 5; 5 29]
|B − λI| = |1−λ 5; 5 29−λ| = 0
λ² − 30λ + 4 = 0, giving λ₁ = 29.866068 and λ₂ = 0.133932.
For λ₁ = 29.866068, consider (B − λI)X = 0:
[−28.866068 5; 5 −0.866068](x; y) = 0
x/0.866068 = y/5, which implies x = 0.866068, y = 5.
By normalizing we get v₁ = X₁/‖X₁‖ = (1/5.074453)(0.866068; 5) = (0.170672; 0.985327).
For λ₂ = 0.133932, consider (B − λI)X = 0:
[0.866069 5; 5 28.866069](x; y) = 0
x/28.866069 = −y/5, which implies x = 28.866069, y = −5.
The eigenvector is X₂ = (28.866069; −5).
By normalizing we get v₂ = X₂/‖X₂‖ = (1/29.295903)(28.866069; −5) = (0.985327; −0.170672).
The matrix V = [v₁ v₂] = [0.170672 0.985327; 0.985327 −0.170672] and
Vᵀ = [0.170672 0.985327; 0.985327 −0.170672]
Now u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √29.866068 = 5.464985:
u₁ = (1/5.464985)[0 −2; 1 5](0.170672; 0.985327)
   = (1/5.464985)(−1.970654; 5.097307)
u₁ = (−0.360596; 0.932721)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √0.133932 = 0.365965:
u₂ = (1/0.365965)[0 −2; 1 5](0.985327; −0.170672)
   = (1/0.365965)(0.341344; 0.131967)
u₂ = (0.932723; 0.360600)
The matrix U = [u₁ u₂] = [−0.360596 0.932723; 0.932721 0.360600]
The diagonal matrix D = [σ₁ 0; 0 σ₂] = [5.464985 0; 0 0.365965]
i.e., A = U D Vᵀ.
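As a quick cross-check (an illustration, not part of the original notes), NumPy's `svd` returns the same singular values for this matrix, and multiplying the factors back together reconstructs A:

```python
import numpy as np

A = np.array([[0.0, -2.0], [1.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

# Singular values are the square roots of the eigenvalues of A^T A
assert np.allclose(s, [5.464985, 0.365965], atol=1e-4)
# U diag(s) V^T reconstructs A
assert np.allclose(U @ np.diag(s) @ Vt, A)
```

Note that the columns of U and V may differ from the hand computation by a sign, since each (uᵢ, vᵢ) pair can be flipped together without changing the product.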
Solution: The given matrix is A = [1 2; 2 1] of order 2×2.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [1 2; 2 1][1 2; 2 1] = [5 4; 4 5]
|B − λI| = |5−λ 4; 4 5−λ| = 0
λ² − 10λ + 9 = 0, giving λ₁ = 9 and λ₂ = 1.
For λ₁ = 9: (B − λI)X = [−4 4; 4 −4](x; y) = 0
x/1 = y/1
The eigenvector is X₁ = (1; 1).
By normalizing, v₁ = X₁/‖X₁‖ = (1/√2; 1/√2).
For λ₂ = 1: (B − λI)X = [4 4; 4 4](x; y) = 0
x/1 = −y/1, which implies x/1 = y/−1.
The eigenvector is X₂ = (1; −1).
By normalizing, v₂ = X₂/‖X₂‖ = (1/√2; −1/√2).
V = [1/√2 1/√2; 1/√2 −1/√2] implies Vᵀ = [1/√2 1/√2; 1/√2 −1/√2]
u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √9 = 3:
u₁ = (1/3)[1 2; 2 1](1/√2; 1/√2) = (1/3)(3/√2; 3/√2) = (1/√2; 1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √1 = 1:
u₂ = [1 2; 2 1](1/√2; −1/√2) = (−1/√2; 1/√2)
The matrix U = [u₁ u₂] = [1/√2 −1/√2; 1/√2 1/√2] and
the diagonal matrix D = [σ₁ 0; 0 σ₂] = [3 0; 0 1]
i.e., A = U D Vᵀ = [1/√2 −1/√2; 1/√2 1/√2][3 0; 0 1][1/√2 1/√2; 1/√2 −1/√2]
Solution: The given matrix is A = [1 0 1; −1 1 0] of order 2×3.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [1 −1; 0 1; 1 0][1 0 1; −1 1 0] = [2 −1 1; −1 1 0; 1 0 1]
The characteristic equation is |B − λI| = |2−λ −1 1; −1 1−λ 0; 1 0 1−λ| = 0
or λ³ − 4λ² + 3λ = 0, giving λ₁ = 3, λ₂ = 1, λ₃ = 0.
For λ₁ = 3: (B − λI)X = [−1 −1 1; −1 −2 0; 1 0 −2](x; y; z) = 0
x/2 = y/−1 = z/1
The eigenvector is X₁ = (2; −1; 1), which implies ‖X₁‖ = √(2² + (−1)² + 1²) = √6.
By normalizing, v₁ = X₁/‖X₁‖ = (2/√6; −1/√6; 1/√6).
For λ₂ = 1: (B − λI)X = [1 −1 1; −1 0 0; 1 0 0](x; y; z) = 0
Expansion by the second row gives x/0 = y/1 = z/1.
The eigenvector is X₂ = (0; 1; 1); by normalizing, v₂ = X₂/‖X₂‖ = (0; 1/√2; 1/√2).
For λ₃ = 0: (B − λI)X = [2 −1 1; −1 1 0; 1 0 1](x; y; z) = 0
x/1 = y/1 = z/−1
The eigenvector is X₃ = (1; 1; −1), which implies ‖X₃‖ = √(1² + 1² + (−1)²) = √3.
By normalizing, v₃ = X₃/‖X₃‖ = (1/√3; 1/√3; −1/√3).
Now u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √3:
u₁ = (1/√3)[1 0 1; −1 1 0](2/√6; −1/√6; 1/√6) = (1/√3)(3/√6; −3/√6) = (1/√2; −1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √1 = 1:
u₂ = [1 0 1; −1 1 0](0; 1/√2; 1/√2) = (1/√2; 1/√2)
The matrix U = [u₁ u₂] = [1/√2 1/√2; −1/√2 1/√2] and
the diagonal matrix D = [σ₁ 0 0; 0 σ₂ 0] = [√3 0 0; 0 1 0]
i.e., A = U D Vᵀ with Vᵀ = [2/√6 −1/√6 1/√6; 0 1/√2 1/√2; 1/√3 1/√3 −1/√3]
Solution: The given matrix is A = [1 1 0; 0 1 1] of order 2×3.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [1 0; 1 1; 0 1][1 1 0; 0 1 1] = [1 1 0; 1 2 1; 0 1 1]
The characteristic equation is λ³ − 4λ² + 3λ = 0, giving λ₁ = 3, λ₂ = 1, λ₃ = 0.
For λ₁ = 3: (B − λI)X = [−2 1 0; 1 −1 1; 0 1 −2](x; y; z) = 0
x/1 = y/2 = z/1
The eigenvector is X₁ = (1; 2; 1); by normalizing, v₁ = X₁/‖X₁‖ = (1/√6; 2/√6; 1/√6).
For λ₂ = 1: (B − λI)X = [0 1 0; 1 1 1; 0 1 0](x; y; z) = 0
x/−1 = y/0 = z/1
The eigenvector is X₂ = (−1; 0; 1); by normalizing, v₂ = X₂/‖X₂‖ = (−1/√2; 0; 1/√2).
For λ₃ = 0: (B − λI)X = [1 1 0; 1 2 1; 0 1 1](x; y; z) = 0
x/1 = y/−1 = z/1
The eigenvector is X₃ = (1; −1; 1); by normalizing, v₃ = X₃/‖X₃‖ = (1/√3; −1/√3; 1/√3).
V = [1/√6 −1/√2 1/√3; 2/√6 0 −1/√3; 1/√6 1/√2 1/√3] implies
Vᵀ = [1/√6 2/√6 1/√6; −1/√2 0 1/√2; 1/√3 −1/√3 1/√3]
u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √3:
u₁ = (1/√3)[1 1 0; 0 1 1](1/√6; 2/√6; 1/√6) = (1/√3)(3/√6; 3/√6) = (1/√2; 1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √1 = 1:
u₂ = [1 1 0; 0 1 1](−1/√2; 0; 1/√2) = (−1/√2; 1/√2)
The matrix U = [u₁ u₂] = [1/√2 −1/√2; 1/√2 1/√2] and
the diagonal matrix D = [√3 0 0; 0 1 0]
i.e., A = U D Vᵀ = [1/√2 −1/√2; 1/√2 1/√2][√3 0 0; 0 1 0][1/√6 2/√6 1/√6; −1/√2 0 1/√2; 1/√3 −1/√3 1/√3]
Solution: The given matrix is A = [3 1 1; −1 3 1] of order 2×3.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [3 −1; 1 3; 1 1][3 1 1; −1 3 1] = [10 0 2; 0 10 4; 2 4 2]
The characteristic equation is λ³ − 22λ² + 120λ = 0, giving λ₁ = 12, λ₂ = 10, λ₃ = 0.
For λ₁ = 12: (B − λI)X = [−2 0 2; 0 −2 4; 2 4 −10](x; y; z) = 0
x/1 = y/2 = z/1
The eigenvector is X₁ = (1; 2; 1); by normalizing, v₁ = X₁/‖X₁‖ = (1/√6; 2/√6; 1/√6).
For λ₂ = 10: (B − λI)X = [0 0 2; 0 0 4; 2 4 −8](x; y; z) = 0
x/−2 = y/1 = z/0
The eigenvector is X₂ = (−2; 1; 0); by normalizing, v₂ = X₂/‖X₂‖ = (−2/√5; 1/√5; 0).
For λ₃ = 0: (B − λI)X = [10 0 2; 0 10 4; 2 4 2](x; y; z) = 0
x/1 = y/2 = z/−5
The eigenvector is X₃ = (1; 2; −5); by normalizing, v₃ = X₃/‖X₃‖ = (1/√30; 2/√30; −5/√30).
V = [1/√6 −2/√5 1/√30; 2/√6 1/√5 2/√30; 1/√6 0 −5/√30] implies
Vᵀ = [1/√6 2/√6 1/√6; −2/√5 1/√5 0; 1/√30 2/√30 −5/√30]
u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √12 = 2√3:
u₁ = (1/(2√3))[3 1 1; −1 3 1](1/√6; 2/√6; 1/√6) = (1/(2√3))(6/√6; 6/√6) = (1/√2; 1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √10:
u₂ = (1/√10)[3 1 1; −1 3 1](−2/√5; 1/√5; 0) = (1/√10)(−5/√5; 5/√5) = (−1/√2; 1/√2)
The matrix U = [u₁ u₂] = [1/√2 −1/√2; 1/√2 1/√2] and
the diagonal matrix D = [2√3 0 0; 0 √10 0]
i.e., A = U D Vᵀ = [1/√2 −1/√2; 1/√2 1/√2][2√3 0 0; 0 √10 0][1/√6 2/√6 1/√6; −2/√5 1/√5 0; 1/√30 2/√30 −5/√30]
Solution: The given matrix is A = [1 −1 3; 3 1 1] of order 2×3.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [1 3; −1 1; 3 1][1 −1 3; 3 1 1] = [10 2 6; 2 2 −2; 6 −2 10]
The characteristic equation is |B − λI| = |10−λ 2 6; 2 2−λ −2; 6 −2 10−λ| = 0, i.e.,
λ³ − (trace)λ² + (sum of minors of principal diagonal)λ − det(B) = 0
λ³ − 22λ² + 96λ − 0 = 0
λ³ − 22λ² + 96λ = 0, giving λ₁ = 16, λ₂ = 6, λ₃ = 0.
For λ₁ = 16: (B − λI)X = [−6 2 6; 2 −14 −2; 6 −2 −6](x; y; z) = 0
x/1 = y/0 = z/1
The eigenvector X₁ = (1; 0; 1); by normalizing, v₁ = X₁/‖X₁‖ = (1/√2; 0; 1/√2).
For λ₂ = 6: (B − λI)X = [4 2 6; 2 −4 −2; 6 −2 4](x; y; z) = 0
x/−1 = y/−1 = z/1
The eigenvector X₂ = (−1; −1; 1); by normalizing, v₂ = X₂/‖X₂‖ = (−1/√3; −1/√3; 1/√3).
For λ₃ = 0: x/−1 = y/2 = z/1
The eigenvector X₃ = (−1; 2; 1); by normalizing, v₃ = X₃/‖X₃‖ = (−1/√6; 2/√6; 1/√6).
V = [1/√2 −1/√3 −1/√6; 0 −1/√3 2/√6; 1/√2 1/√3 1/√6] implies
Vᵀ = [1/√2 0 1/√2; −1/√3 −1/√3 1/√3; −1/√6 2/√6 1/√6]
Now u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √16 = 4:
u₁ = (1/4)[1 −1 3; 3 1 1](1/√2; 0; 1/√2) = (1/4)(4/√2; 4/√2) = (1/√2; 1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √6:
u₂ = (1/√6)[1 −1 3; 3 1 1](−1/√3; −1/√3; 1/√3) = (1/√6)(3/√3; −3/√3) = (1/√2; −1/√2)
The matrix U = [u₁ u₂] = [1/√2 1/√2; 1/√2 −1/√2] and
the diagonal matrix D = [σ₁ 0 0; 0 σ₂ 0] = [4 0 0; 0 √6 0]
i.e., A = U D Vᵀ = [1/√2 1/√2; 1/√2 −1/√2][4 0 0; 0 √6 0][1/√2 0 1/√2; −1/√3 −1/√3 1/√3; −1/√6 2/√6 1/√6]
where A = [3 2 2; 2 3 −2]
Solution: The given matrix is A = [3 2 2; 2 3 −2] of order 2×3.
We know A = U Σ Vᵀ, where u₁ = (1/σ₁)Av₁, u₂ = (1/σ₂)Av₂ with σ₁ = √λ₁, σ₂ = √λ₂.
B = AᵀA = [3 2; 2 3; 2 −2][3 2 2; 2 3 −2] = [13 12 2; 12 13 −2; 2 −2 8]
The characteristic equation is λ³ − 34λ² + 225λ = 0, giving λ₁ = 25, λ₂ = 9, λ₃ = 0.
For λ₁ = 25: (B − λI)X = [−12 12 2; 12 −12 −2; 2 −2 −17](x; y; z) = 0
x/1 = y/1 = z/0
The eigenvector is X₁ = (1; 1; 0); by normalizing, v₁ = X₁/‖X₁‖ = (1/√2; 1/√2; 0).
For λ₂ = 9: (B − λI)X = [4 12 2; 12 4 −2; 2 −2 −1](x; y; z) = 0
x/1 = y/−1 = z/4
The eigenvector is X₂ = (1; −1; 4); by normalizing, v₂ = X₂/‖X₂‖ = (1/(3√2); −1/(3√2); 4/(3√2)).
For λ₃ = 0: (B − λI)X = [13 12 2; 12 13 −2; 2 −2 8](x; y; z) = 0
x/−2 = y/2 = z/1
The eigenvector is X₃ = (−2; 2; 1); by normalizing, v₃ = X₃/‖X₃‖ = (−2/3; 2/3; 1/3).
V = [1/√2 1/(3√2) −2/3; 1/√2 −1/(3√2) 2/3; 0 4/(3√2) 1/3] implies
Vᵀ = [1/√2 1/√2 0; 1/(3√2) −1/(3√2) 4/(3√2); −2/3 2/3 1/3]
u₁ = (1/σ₁)Av₁, here σ₁ = √λ₁ = √25 = 5:
u₁ = (1/5)[3 2 2; 2 3 −2](1/√2; 1/√2; 0) = (1/5)(5/√2; 5/√2) = (1/√2; 1/√2)
u₂ = (1/σ₂)Av₂, here σ₂ = √λ₂ = √9 = 3:
u₂ = (1/3)[3 2 2; 2 3 −2](1/(3√2); −1/(3√2); 4/(3√2)) = (1/3)(9/(3√2); −9/(3√2)) = (1/√2; −1/√2)
The matrix U = [u₁ u₂] = [1/√2 1/√2; 1/√2 −1/√2] and
the diagonal matrix D = [σ₁ 0 0; 0 σ₂ 0] = [5 0 0; 0 3 0]
i.e., A = U D Vᵀ = [1/√2 1/√2; 1/√2 −1/√2][5 0 0; 0 3 0][1/√2 1/√2 0; 1/(3√2) −1/(3√2) 4/(3√2); −2/3 2/3 1/3]
Let u = (a₁, a₂, a₃, ..., aₙ) and v = (b₁, b₂, b₃, ..., bₙ) be two vectors in Rⁿ. Then the inner product of u and v is denoted and defined by ⟨u, v⟩ = a₁b₁ + a₂b₂ + a₃b₃ + ... + aₙbₙ.
Orthogonal Vectors
A set of vectors u₁, u₂, u₃, ..., uₙ is mutually orthogonal if every pair of vectors is orthogonal.
Orthonormal Vectors
A set of vectors u₁, u₂, u₃, ..., uₙ is orthonormal if every vector has magnitude 1 and the set of vectors is mutually orthogonal.
If u₁, u₂, u₃, ..., uₙ is a basis for a vector space, we can construct an orthogonal basis v₁, v₂, v₃, ..., vₙ as follows:
v₁ = u₁
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂, and so on…
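The projection formulas above translate directly into a short routine. This is an illustrative sketch (not part of the original notes), checked here against the worked example that follows, with u₁ = (1, 2, 1), u₂ = (1, 1, 3), u₃ = (2, 1, 1):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of vectors using the projections above."""
    basis = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in basis:
            # subtract the component of u along each earlier basis vector
            v = v - (v @ w) / (w @ w) * w
        basis.append(v)
    return basis

v1, v2, v3 = gram_schmidt([(1, 2, 1), (1, 1, 3), (2, 1, 1)])
```

Dividing each vᵢ by its norm then yields the orthonormal basis w₁, w₂, w₃.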
v₁ = u₁ = (1, 2, 1)
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₂ = (1, 1, 3) − (6/6)(1, 2, 1)
v₂ = (0, −1, 2)
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
v₃ = (2, 1, 1) − (5/6)(1, 2, 1) − (1/5)(0, −1, 2)
v₃ = (7/6, −7/15, −7/30)
So v₁ = (1, 2, 1), v₂ = (0, −1, 2) and v₃ = (7/6, −7/15, −7/30).
The orthogonal basis is {v₁, v₂, v₃}.
Normalizing, w₁ = v₁/‖v₁‖ = (1/√6)(1, 2, 1) = (1/√6, 2/√6, 1/√6),
w₂ = v₂/‖v₂‖ = (1/√5)(0, −1, 2) = (0, −1/√5, 2/√5) and
w₃ = v₃/‖v₃‖ = (√30/7)(7/6, −7/15, −7/30) = (√30/6, −√30/15, −√30/30)
The orthonormal basis is {w₁, w₂, w₃}.
v₁ = u₁ = (1, 2, 0)
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₂ = (8, 1, −6) − (10/5)(1, 2, 0)
v₂ = (6, −3, −6)
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
v₃ = (0, 0, 1) − (0/5)(1, 2, 0) − (−6/81)(6, −3, −6)
v₃ = (4/9, −2/9, 5/9)
So v₁ = (1, 2, 0), v₂ = (6, −3, −6) and v₃ = (4/9, −2/9, 5/9).
The orthogonal basis is {v₁, v₂, v₃}.
Normalizing, w₁ = v₁/‖v₁‖ = (1/√5)(1, 2, 0) = (1/√5, 2/√5, 0),
w₂ = v₂/‖v₂‖ = (1/9)(6, −3, −6) = (2/3, −1/3, −2/3) and
w₃ = v₃/‖v₃‖ = (3/√5)(4/9, −2/9, 5/9) = (4/(3√5), −2/(3√5), √5/3)
(here ‖v₃‖ = √(16 + 4 + 25)/9 = √45/9 = √5/3).
The orthonormal basis is {w₁, w₂, w₃}.
Solution: The three vectors are u₁ = (1, −1, 1), u₂ = (1, 0, 1) and u₃ = (1, 1, 2).
v₁ = u₁ = (1, −1, 1)
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₂ = (1, 0, 1) − (2/3)(1, −1, 1)
v₂ = (1/3, 2/3, 1/3)
This implies ‖v₂‖² = (1/3)² + (2/3)² + (1/3)² = 2/3.
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
v₃ = (1, 1, 2) − (2/3)(1, −1, 1) − ((5/3)/(2/3))(1/3, 2/3, 1/3)
v₃ = (−1/2, 0, 1/2)
So v₁ = (1, −1, 1), v₂ = (1/3, 2/3, 1/3) and v₃ = (−1/2, 0, 1/2).
Normalizing, w₁ = v₁/‖v₁‖ = (1/√3, −1/√3, 1/√3),
w₂ = v₂/‖v₂‖ = (√3/√2)(1/3, 2/3, 1/3) = (1/√6, 2/√6, 1/√6) and
w₃ = v₃/‖v₃‖ = √2(−1/2, 0, 1/2) = (−1/√2, 0, 1/√2)
The orthonormal basis is {w₁, w₂, w₃}.
Solution: Given three vectors are u₁ = (1, 1, 1, 1), u₂ = (1, 1, −1, −1) and u₃ = (0, −1, 2, 1).
v₁ = u₁ = (1, 1, 1, 1)
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₂ = (1, 1, −1, −1) − (0/4)(1, 1, 1, 1) = (1, 1, −1, −1)
⟨u₃, v₂⟩ = (0, −1, 2, 1)·(1, 1, −1, −1) = (0)(1) + (−1)(1) + (2)(−1) + (1)(−1) = −4
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
v₃ = (0, −1, 2, 1) − (2/4)(1, 1, 1, 1) − (−4/4)(1, 1, −1, −1)
v₃ = (0, −1, 2, 1) − (1/2)(1, 1, 1, 1) + (1, 1, −1, −1)
v₃ = (1/2, −1/2, 1/2, −1/2)
So v₁ = (1, 1, 1, 1), v₂ = (1, 1, −1, −1) and v₃ = (1/2, −1/2, 1/2, −1/2).
Normalizing, w₁ = v₁/‖v₁‖ = (1/2, 1/2, 1/2, 1/2),
w₂ = v₂/‖v₂‖ = (1/2)(1, 1, −1, −1) = (1/2, 1/2, −1/2, −1/2) and
w₃ = v₃/‖v₃‖ = (1/1)(1/2, −1/2, 1/2, −1/2) = (1/2, −1/2, 1/2, −1/2)
The orthonormal basis is {w₁, w₂, w₃}.
Solution: Given three vectors are u₁ = (1, 1, 1, 1), u₂ = (1, 1, 2, 4) and u₃ = (1, 2, −4, −3).
v₁ = u₁ = (1, 1, 1, 1)
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁
v₂ = (1, 1, 2, 4) − (8/4)(1, 1, 1, 1)
v₂ = (−1, −1, 0, 2)
v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
v₃ = (1, 2, −4, −3) − (−4/4)(1, 1, 1, 1) − (−9/6)(−1, −1, 0, 2)
v₃ = (1, 2, −4, −3) + (1, 1, 1, 1) + (3/2)(−1, −1, 0, 2)
v₃ = (1/2, 3/2, −3, 1)
So v₁ = (1, 1, 1, 1), v₂ = (−1, −1, 0, 2) and v₃ = (1/2, 3/2, −3, 1).
Normalizing, w₁ = v₁/‖v₁‖ = (1/2, 1/2, 1/2, 1/2),
w₂ = v₂/‖v₂‖ = (1/√6)(−1, −1, 0, 2) = (−1/√6, −1/√6, 0, 2/√6) and
w₃ = v₃/‖v₃‖ = (√2/5)(1/2, 3/2, −3, 1) = (√2/10, 3√2/10, −6√2/10, 2√2/10)
The orthonormal basis is {w₁, w₂, w₃}.
The least-squares method is a crucial statistical method used to find a regression line, or best-fit line, for a given pattern of data. The method is described by an equation with specific parameters and is widely used in evaluation and regression. In regression analysis, it is the standard approach for approximating sets of equations having more equations than unknowns.
For a given set of n data points (x₁, y₁), (x₂, y₂), (x₃, y₃), ..., (xₙ, yₙ), assume that a straight line y = f(x) = a + bx → (1) fits the given data in the least-squares sense.
Let the observed value at x = xᵢ be yᵢ, and the corresponding value on the curve be f(xᵢ).
The error of approximation is dᵢ = yᵢ − f(xᵢ), and the sum of squared errors is
S = Σᵢ dᵢ² = Σᵢ (yᵢ − (a + bxᵢ))² → (2)
For S to be minimum, ∂S/∂a = 0 and ∂S/∂b = 0.
∂S/∂a = Σᵢ 2(yᵢ − (a + bxᵢ))(−1) = 0 gives
Σ yᵢ = na + b Σ xᵢ → (3)
∂S/∂b = Σᵢ 2(yᵢ − (a + bxᵢ))(−xᵢ) = 0, i.e., Σ (yᵢ − a − bxᵢ)(xᵢ) = 0, gives
Σ xᵢyᵢ = a Σ xᵢ + b Σ xᵢ² → (4)
Thus, the two unknown parameters a and b of equation (1) can be determined by solving the two normal equations (3) and (4).
Here n = 10.
The normal equations are ΣY = na + bΣX and ΣXY = aΣX + bΣX².
 x    y    x²    xy
 4    31   16    124
 9    58   81    522
 10   65   100   650
 14   73   196   1022
 4    37   16    148
 7    44   49    308
 12   60   144   720
 22   91   484   2002
 1    21   1     21
 17   64   289   1088
Σx = 100, Σy = 544, Σx² = 1376, Σxy = 6605
544 = 10a + 100b and 6605 = 100a + 1376b; solving, a = 23.416 and b = 3.098.
The line of best fit is Y = 23.416 + 3.098x.
When x = 14 hours, Y = 23.416 + 3.098(14) = 66.793613, so the score is Y ≈ 67.
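The fit above can be reproduced with NumPy's polynomial least-squares routine (an illustrative check, not part of the original notes); `polyfit` with degree 1 solves exactly the normal equations (3) and (4):

```python
import numpy as np

x = np.array([4, 9, 10, 14, 4, 7, 12, 22, 1, 17], dtype=float)
y = np.array([31, 58, 65, 73, 37, 44, 60, 91, 21, 64], dtype=float)

# coefficients returned lowest degree first: y ≈ a + b x
a, b = np.polynomial.polynomial.polyfit(x, y, 1)
prediction_at_14 = a + b * 14
```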
Here n = 6.
The normal equations are ΣY = na + bΣX and ΣXY = aΣX + bΣX².
 X    Y    X²     XY
 43   99   1849   4257
 21   65   441    1365
 25   79   625    1975
 42   75   1764   3150
 57   87   3249   4959
 59   81   3481   4779
ΣX = 247, ΣY = 486, ΣX² = 11409, ΣXY = 20485
486 = 6a + 247b and 20485 = 247a + 11409b; solving, a ≈ 65.14 and b ≈ 0.385, so Y = 65.14 + 0.385X.
X: 10  12  13  16  17  20  25
Y: 10  22  24  27  29  33  37
Also find the value of X when Y = 25.
Here n = 7.
The normal equations are ΣY = na + bΣX and ΣXY = aΣX + bΣX².
 X    Y    X²    XY
 10   10   100   100
 12   22   144   264
 13   24   169   312
 16   27   256   432
 17   29   289   493
 20   33   400   660
 25   37   625   925
ΣX = 113, ΣY = 182, ΣX² = 1983, ΣXY = 3186
Substituting in ΣY = na + bΣX and ΣXY = aΣX + bΣX², we get
182 = 7a + 113b and 3186 = 113a + 1983b, giving a ≈ 0.80 and b ≈ 1.56.
When Y = 25:
25 = 0.80 + 1.56X
X = (25 − 0.80)/1.56 ≈ 15.5
When Y = 25, X ≈ 15.5.
Unit – II
Therefore X = 0, 1, 2 and
P(X = 0) = 1/4; P(X = 1) = 2/4 = 1/2; P(X = 2) = 1/4
The distribution is
X:        0    1    2
P(X=xᵢ): 1/4  1/2  1/4
The distribution is
X:    1    2    3    4    5    6
P(x): 1/6  1/6  1/6  1/6  1/6  1/6
Mean μ = E(x) = Σ x P(x) = (1 + 2 + 3 + 4 + 5 + 6)(1/6) = 7/2 = 3.5
Since the total probability is 1, 60k = 1, so
k = 1/60
P(x < 3) = 2k + 4k = 6k = 6/60 = 1/10
P(X ≥ 5) = P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8)
P(X ≥ 5) = 10k + 12k + 14k + 4k = 40k = 40/60 = 2/3
Value of X:  1     2     3     4     5    6    7     8
P(X = x):   1/30  1/15  1/10  2/15  1/6  1/5  7/30  1/15
F(x) = P(X ≤ x): 1/30  1/10  1/5  1/3  1/2  7/10  14/15  1
X:        0  1  2   3   4   5   6    7
P(x=xᵢ):  0  k  2k  2k  3k  k²  2k²  7k²+k
i) Find k  ii) Evaluate P(x < 6), P(x ≥ 6), P(0 < x < 5) and P(0 ≤ x ≤ 4)
iii) If P(x ≤ k) > 1/2, find the minimum value of k  iv) Mean, variance and standard deviation.
i) Since the total probability is 1: 0 + k + 2k + 2k + 3k + k² + 2k² + 7k² + k = 1
10k² + 9k − 1 = 0
k = 0.1 (rejecting the negative root k = −1)
ii) P(x < 6) = 0 + k + 2k + 2k + 3k + k² = 8k + k² = 0.81
P(x ≥ 6) = 2k² + 7k² + k = 9k² + k = 0.19
P(0 < x < 5) = k + 2k + 2k + 3k = 8k = 0.8
P(0 ≤ x ≤ 4) = 0 + k + 2k + 2k + 3k = 8k = 0.8
iii) Given P(x ≤ k) > 1/2:
k = 0: P(x ≤ 0) = 0 < 1/2, so it is not true
k = 2: P(x ≤ 2) = 0 + k + 2k = 0.3 < 1/2, so it is not true
k = 3: P(x ≤ 3) = 0 + k + 2k + 2k = 0.5, not greater than 1/2, so it is not true
k = 4: P(x ≤ 4) = 0 + k + 2k + 2k + 3k = 0.8 > 1/2, which is true
Therefore k = 4.
iv) We know that μ = E(x) = Σᵢ xᵢ P(xᵢ)
μ = 0 + 1(0.1) + 2(0.2) + 3(0.2) + 4(0.3) + 5(0.01) + 6(0.02) + 7(0.17) = 3.66
Variance σ² = Σᵢ xᵢ² P(xᵢ) − μ² = 0²(0) + 1²(k) + 2²(2k) + 3²(2k) + 4²(3k) + 5²(k²) + 6²(2k²) + 7²(7k² + k) − (3.66)² = 16.8 − 13.3956 = 3.4044
Standard deviation σ = √3.4044 ≈ 1.845
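The mean and variance above can be checked with a few lines of plain Python (an illustrative check, not part of the original notes):

```python
# Distribution of the problem above, with k = 0.1
k = 0.1
xs = range(8)  # x = 0, 1, ..., 7
ps = [0, k, 2 * k, 2 * k, 3 * k, k**2, 2 * k**2, 7 * k**2 + k]

assert abs(sum(ps) - 1) < 1e-12   # probabilities sum to 1
mean = sum(x * p for x, p in zip(xs, ps))
var = sum(x * x * p for x, p in zip(xs, ps)) - mean**2
```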
Solution: Given that f(x) = k/(2x) is a probability distribution of X, where x = 1, 2, 3, 4 (and f(0) = 0).
(i) We know that Σ f(x) = 1:
0 + k/2 + k/4 + k/6 + k/8 = 1
(k/2)(1 + 1/2 + 1/3 + 1/4) = 1
k(25/24) = 1
k = 24/25
The distribution is
x:     1      2     3     4
f(x): 12/25  6/25  4/25  3/25
(ii) Mean μ = E(x) = Σ x f(x)
μ = 1(12/25) + 2(6/25) + 3(4/25) + 4(3/25) = 48/25
Variance σ² = Σ x² f(x) − μ² = 1²(12/25) + 2²(6/25) + 3²(4/25) + 4²(3/25) − (48/25)² = 120/25 − 2304/625 = 696/625 ≈ 1.11
Solution: Given that X assigns to each point (a, b) in S the maximum of its numbers, i.e., X(a, b) = max(a, b); find the probability distribution. X is a random variable with sample space
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
     (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
     (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
     (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
     (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
     (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Now the number 1 will appear as maximum only once, so P(X = 1) = P{(1,1)} = 1/36.
The number 2 will appear three times as maximum, so P(X = 2) = P{(1,2), (2,1), (2,2)} = 3/36.
The number 3 will appear five times as maximum, so P(X = 3) = 5/36.
The number 4 will appear seven times as maximum, so P(X = 4) = 7/36.
The number 5 will appear nine times as maximum, so P(X = 5) = 9/36.
The number 6 will appear eleven times as maximum, so P(X = 6) = 11/36.
x:    1     2     3     4     5     6
P(x): 1/36  3/36  5/36  7/36  9/36  11/36
Mean μ = E(x) = Σ x P(x) = 1(1/36) + 2(3/36) + 3(5/36) + 4(7/36) + 5(9/36) + 6(11/36) = 161/36 ≈ 4.47
Variance σ² = Σ x² P(x) − μ² = 1²(1/36) + 2²(3/36) + 3²(5/36) + 4²(7/36) + 5²(9/36) + 6²(11/36) − (161/36)²
Variance σ² = 791/36 − 20.00 ≈ 1.97
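Working with exact fractions avoids the rounding in the last step; the sketch below (an illustration, not part of the original notes) uses the pattern P(X = x) = (2x − 1)/36 visible in the table:

```python
from fractions import Fraction

# Distribution of the maximum of two dice, as tabulated above
ps = {x: Fraction(2 * x - 1, 36) for x in range(1, 7)}
mean = sum(x * p for x, p in ps.items())
var = sum(x * x * p for x, p in ps.items()) - mean**2
# mean = 161/36 ≈ 4.47 and var = 2555/1296 ≈ 1.97
```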
x:    -3   6    9
P(x): 1/6  1/2  1/3
Find (i) E(X) (ii) E(X²) (iii) E((2X + 1)²) (iv) Mean and variance.
Solution: Given
x:    -3   6    9
P(x): 1/6  1/2  1/3
(i) E(X) = Σ x P(x) = (−3)(1/6) + 6(1/2) + 9(1/3) = 11/2
(ii) E(X²) = Σ x² P(x) = (−3)²(1/6) + 6²(1/2) + 9²(1/3) = 93/2
(iii) E((2X + 1)²) = E(4X² + 4X + 1) = 4E(X²) + 4E(X) + 1
= 4(93/2) + 4(11/2) + 1 = 209
(iv) Mean μ = E(x) = 11/2
Variance σ² = E(X²) − (E(X))²
σ² = 93/2 − (11/2)² = (186 − 121)/4 = 65/4
P(X = 2) = P({(1, 1)}) = 1/36
P(X = 3) = P({(1, 2), (2, 1)}) = 2/36
P(X = 4) = P({(1, 3), (3, 1), (2, 2)}) = 3/36
P(X = 5) = P({(1, 4), (4, 1), (2, 3), (3, 2)}) = 4/36
P(X = 6) = P({(1, 5), (5, 1), (4, 2), (2, 4), (3, 3)}) = 5/36
P(X = 7) = P({(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}) = 6/36
P(X = 8) = P({(2, 6), (6, 2), (3, 5), (5, 3), (4, 4)}) = 5/36
P(X = 9) = P({(3, 6), (6, 3), (4, 5), (5, 4)}) = 4/36
P(X = 10) = P({(4, 6), (6, 4), (5, 5)}) = 3/36
P(X = 11) = P({(5, 6), (6, 5)}) = 2/36
P(X = 12) = P({(6, 6)}) = 1/36
Mean μ = E(x) = Σ x P(x)
μ = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + ... + 12(1/36) = 252/36 = 7
i.e., x = 0, 1, 2, 3, 4.
Evaluating P(X = xᵢ):
P(X = 0) = ⁵C₀ ⁷C₄ / ¹²C₄ = 35/495 ≈ 0.07
P(X = 1) = ⁵C₁ ⁷C₃ / ¹²C₄ = 175/495 ≈ 0.35
P(X = 2) = ⁵C₂ ⁷C₂ / ¹²C₄ = 210/495 ≈ 0.42
P(X = 3) = ⁵C₃ ⁷C₁ / ¹²C₄ = 70/495 ≈ 0.14
P(X = 4) = ⁵C₄ ⁷C₀ / ¹²C₄ = 5/495 ≈ 0.01
X:        0     1     2     3     4
P(X=xᵢ): 0.07  0.35  0.42  0.14  0.01
Mean μ = E(x) = Σ x P(x) = 0(0.07) + 1(0.35) + 2(0.42) + 3(0.14) + 4(0.01) ≈ 1.65
Here n(S) = 30 and
Probability of getting −4 is P(−4) = n(E)/n(S) = 5/30 = 1/6
Probability of getting 4 is P(4) = n(E)/n(S) = 5/30 = 1/6
Probability of getting −3 is P(−3) = n(E)/n(S) = 4/30 = 2/15
Probability of getting 3 is P(3) = n(E)/n(S) = 4/30 = 2/15
Probability of getting −2 is P(−2) = n(E)/n(S) = 3/30 = 1/10
Probability of getting 2 is P(2) = n(E)/n(S) = 3/30 = 1/10
Probability of getting −1 is P(−1) = n(E)/n(S) = 2/30 = 1/15
Probability of getting 0 is P(0) = n(E)/n(S) = 2/30 = 1/15
Probability of getting 1 is P(1) = n(E)/n(S) = 2/30 = 1/15
X:       -4   4    -3    3     -2    2     -1    0     1
P(X=xᵢ): 1/6  1/6  2/15  2/15  1/10  1/10  1/15  1/15  1/15
Mean μ = Σ x P(x)
μ = (−4)(1/6) + 4(1/6) + (−3)(2/15) + 3(2/15) + (−2)(1/10) + 2(1/10) + (−1)(1/15) + 0(1/15) + 1(1/15)
Mean μ = 0
Variance σ² = Σ x² f(x) − μ²
= (−4)²(1/6) + 4²(1/6) + (−3)²(2/15) + 3²(2/15) + (−2)²(1/10) + 2²(1/10) + (−1)²(1/15) + 0²(1/15) + 1²(1/15) − 0²
= 16/6 + 16/6 + 18/15 + 18/15 + 4/10 + 4/10 + 1/15 + 0 + 1/15
= 16/3 + 12/5 + 4/5 + 2/15
Variance σ² = 26/3
X:        1    2    3   ...  n
P(X=xᵢ): 1/n  1/n  1/n ...  1/n
Mean: We know that mean μ = E(x) = Σ x P(x)
μ = 1(1/n) + 2(1/n) + 3(1/n) + ... + n(1/n)
= (1/n)(1 + 2 + 3 + ... + n)
= (1/n)·n(n + 1)/2
μ = E(x) = (n + 1)/2
Variance: Variance σ² = Σ x² P(x) − μ²
σ² = 1²(1/n) + 2²(1/n) + 3²(1/n) + ... + n²(1/n) − ((n + 1)/2)²
= (1/n)(1² + 2² + 3² + ... + n²) − ((n + 1)/2)²
= (1/n)·n(n + 1)(2n + 1)/6 − ((n + 1)/2)²
= ((n + 1)/2)((2n + 1)/3 − (n + 1)/2)
= ((n + 1)/2)·(n − 1)/6
σ² = (n² − 1)/12
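The closed forms derived above can be confirmed with exact arithmetic (an illustrative check, not part of the original notes), e.g. for a fair die (n = 6):

```python
from fractions import Fraction

n = 6
p = Fraction(1, n)   # uniform probability 1/n

mean = sum(x * p for x in range(1, n + 1))
var = sum(x * x * p for x in range(1, n + 1)) - mean**2

assert mean == Fraction(n + 1, 2)            # (n + 1)/2
assert var == Fraction(n * n - 1, 12)        # (n² - 1)/12
```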
A random variable which takes values continuously, i.e., all possible values in a given interval, is called a continuous type random variable.
Eg: 01. The height of a student in a particular class may be between 5 and 8,
i.e., X = {x : 5 < x < 8}
Let X be a continuous type of random variable. Let a function f(x) be such that
01. f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.
02. For X a random variable whose distribution function F(x) has a derivative, the function f(t) satisfying F(x) = ∫_{−∞}^{x} f(t) dt is the probability density function; thus f(x) = (d/dx)F(x).
Mean (Expectation): Let X be a continuous type random variable with probability density f(x), for −∞ < x < ∞; then the expectation or mean is defined as ∫_{−∞}^{∞} x f(x) dx and is denoted by μ = E(X).
In the case of a continuous distribution, the median is the point which divides the total area into two equal parts. Thus, if X is defined from a to b and M is the median, then ∫ₐᵇ f(x) dx = 1 and
∫ₐᴹ f(x) dx + ∫ᴹᵇ f(x) dx = 1, with
∫ₐᴹ f(x) dx = 1/2 → (1) and
∫ᴹᵇ f(x) dx = 1/2 → (2)
Mode: Mode is the value of x for which f(x) is maximum. Thus, the mode is given by f′(x) = 0 and f″(x) < 0, a < x < b.
Solution: Given f(x) = k/(1 + x²), −∞ < x < ∞.
Since the total probability is 1:
k ∫_{−∞}^{∞} dx/(1 + x²) = 1
k [tan⁻¹x]_{−∞}^{∞} = 1
k(tan⁻¹(∞) − tan⁻¹(−∞)) = 1
k(π/2 + π/2) = 1
k = 1/π
Solution: Given f(x) = 1/x², a < x < 1.
i.e., ∫ₐ¹ (1/x²) dx = 1
[−1/x]ₐ¹ = 1
−1 + 1/a = 1
1/a = 2
a = 1/2
Solution: Given F(x) = 0 if x ≤ 1; k(x − 1)⁴ if 1 < x < 3; 1 if x ≥ 3.
(i) We know that f(x) = (d/dx)F(x) = 4k(x − 1)³ if 1 < x < 3, and 0 otherwise.
∫_{−∞}^{∞} f(x) dx = 1:
∫₁³ 4k(x − 1)³ dx = 1
4k [(x − 1)⁴/4]₁³ = 1
k(2⁴ − 0) = 1
k = 1/16
(ii) Mean μ = ∫ x f(x) dx = 4k ∫₁³ x(x − 1)³ dx
= 4k ∫₁³ (x⁴ − 3x³ + 3x² − x) dx
= (1/4)[x⁵/5 − 3x⁴/4 + x³ − x²/2]₁³
= (1/4)[(3⁵ − 1)/5 − 3(3⁴ − 1)/4 + (3³ − 1) − (3² − 1)/2]
μ = E(x) = 2.6
(iii) Variance σ² = ∫ x² f(x) dx − μ² = 4k ∫₁³ x²(x − 1)³ dx − (2.6)²
= 4k ∫₁³ (x⁵ − 3x⁴ + 3x³ − x²) dx − (2.6)²
= (1/4)[(3⁶ − 1)/6 − 3(3⁵ − 1)/5 + 3(3⁴ − 1)/4 − (3³ − 1)/3] − 6.76 = 0.1066
Variance σ² ≈ 0.1066
Solution: Given f(x) = c e^(−|x|), −∞ < x < ∞.
∫ c e^(−|x|) dx = 1 ⟹ 2c ∫₀^∞ e^(−x) dx = 1 ⟹ 2c = 1 ⟹ c = 1/2
Mean μ = (1/2) ∫_{−∞}^{∞} x e^(−|x|) dx = 0, since the integrand is an odd function.
Variance σ² = ∫ x² f(x) dx − μ²
σ² = (1/2) ∫_{−∞}^{∞} x² e^(−|x|) dx − μ²
= (1/2)·2 ∫₀^∞ x² e^(−x) dx
= ∫₀^∞ x^(3−1) e^(−x) dx
= Γ(3), by the gamma function ∫₀^∞ x^(n−1) e^(−x) dx = Γ(n)
= 2!
P(0 ≤ x ≤ 4) = (1/2) ∫₀⁴ e^(−x) dx
= (1/2)[−e^(−x)]₀⁴
= (1/2)(1 − e^(−4))
Solution: Given Y = aX + b.
E(Y) = aE(X) + b
We know that V(Y) = E(Y²) − (E(Y))²
V(Y) = E((aX + b)²) − (E(aX + b))²
V(Y) = E(a²X² + 2abX + b²) − (aE(X) + b)²
V(Y) = a²E(X²) + 2abE(X) + b² − a²(E(X))² − 2abE(X) − b²
V(Y) = a²E(X²) − a²(E(X))²
V(Y) = a²(E(X²) − (E(X))²) = a²V(X)
V(Y) = a²V(X)
μ = (1/2) ∫₀^π x sin x dx
= (1/2)[−x cos x + sin x]₀^π
= (1/2)(−π cos π + sin π − 0)
= (1/2)(π(1) + 0)
Mean E(x) = μ = π/2
ii) Let M be the median of the distribution; then ∫₀^M f(x) dx = 1/2 and ∫_M^π f(x) dx = 1/2.
(1/2) ∫₀^M sin x dx = 1/2
[−cos x]₀^M = 1
1 − cos M = 1, so cos M = 0 and M = π/2.
iii) Mode: f(x) = (1/2) sin x
f′(x) = (1/2) cos x
i.e., f′(x) = 0 gives
(1/2) cos x = 0
x = π/2
Now f″(x) = −(1/2) sin x
f″(π/2) = −(1/2) sin(π/2) = −1/2 < 0
So x = π/2 is the point of maximum.
The mode of f(x) is π/2.
F(x) = 1 − e^(−2x) for x > 0; 0 otherwise. Find (i) f(x) (ii) Mean (iii) Variance and standard deviation.
Solution: Given the cumulative distribution function F(x) = 1 − e^(−2x) for x > 0; 0 otherwise.
(i) We know that f(x) = (d/dx)F(x), so
f(x) = 2e^(−2x) for x > 0; 0 otherwise.
(ii) E(x) = μ = ∫₀^∞ x·2e^(−2x) dx
μ = 2 ∫₀^∞ x e^(−2x) dx
μ = 2[−(x/2)e^(−2x) − (1/4)e^(−2x)]₀^∞
μ = 2(0 + 1/4)
μ = 1/2
(iii) Variance σ² = ∫ x² f(x) dx − μ²
σ² = 2 ∫₀^∞ x² e^(−2x) dx − 1/4
= 2[−(x²/2)e^(−2x) − (x/2)e^(−2x) − (1/4)e^(−2x)]₀^∞ − 1/4
= 2(1/4) − 1/4
σ² = 1/2 − 1/4
Variance σ² = 1/4, so the standard deviation is σ = 1/2.
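A numerical integration (an illustrative check, not part of the original notes) confirms the mean and variance found above for f(x) = 2e^(−2x):

```python
import numpy as np

def integrate(g, x):
    # simple trapezoidal rule, accurate enough for this smooth integrand
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(x)))

x = np.linspace(0, 20, 200001)   # the tail beyond x = 20 is negligible
f = 2 * np.exp(-2 * x)

total = integrate(f, x)                      # ≈ 1
mean = integrate(x * f, x)                   # ≈ 1/2
var = integrate(x**2 * f, x) - mean**2       # ≈ 1/4
```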
Solution: Given probability distribution function f(x) = k x e^(−λx) for x > 0; 0 otherwise.
(i) ∫_{−∞}^{∞} f(x) dx = 1:
∫₀^∞ k x e^(−λx) dx = 1
k[−(x/λ)e^(−λx) − (1/λ²)e^(−λx)]₀^∞ = 1
k(0 + 1/λ²) = 1
k/λ² = 1
k = λ²
(ii) E(x) = μ = ∫₀^∞ x·k x e^(−λx) dx
μ = λ² ∫₀^∞ x² e^(−λx) dx
μ = λ²[−(x²/λ)e^(−λx) − (2x/λ²)e^(−λx) − (2/λ³)e^(−λx)]₀^∞ = λ²(2/λ³)
Mean E(x) = μ = 2/λ
(iii) Variance σ² = ∫ x² f(x) dx − μ²
σ² = ∫₀^∞ x²·k x e^(−λx) dx − (2/λ)²
σ² = λ² ∫₀^∞ x³ e^(−λx) dx − 4/λ²
σ² = λ²[−(x³/λ)e^(−λx) − (3x²/λ²)e^(−λx) − (6x/λ³)e^(−λx) − (6/λ⁴)e^(−λx)]₀^∞ − 4/λ²
σ² = λ²(6/λ⁴) − 4/λ²
σ² = 6/λ² − 4/λ²
Variance σ² = 2/λ²
Problem: If f(x) = k x e^(−2x) for x > 0; 0 otherwise, find (i) k (ii) Mean (iii) Variance.
Problem: The daily consumption of electric power (in millions of kW-hours) of a city is a random variable with density f(x) = (1/9) x e^(−x/3) for x > 0; 0 otherwise. If the daily capacity is 12 million kW-hours, determine the probability that there is a power cut (shortage) on any given day.
Solution: Given f(x) = (1/9) x e^(−x/3) for x > 0; 0 otherwise.
First, P(0 < x < 12) = ∫₀¹² f(x) dx
P(0 < x < 12) = (1/9) ∫₀¹² x e^(−x/3) dx
= (1/9)[−3x e^(−x/3) − 9 e^(−x/3)]₀¹²
= (1/9)[(−3(12)e^(−4) − 9e^(−4)) − (0 − 9)]
= (1/9)(−36e^(−4) − 9e^(−4) + 9)
= (1/9)(9 − 45e^(−4))
= (9/9)(1 − 5e^(−4))
P(0 < x < 12) = 1 − 5e^(−4)
A shortage occurs when the demand exceeds the capacity, i.e., P(x > 12):
P(x > 12) = 1 − P(0 < x < 12)
P(x > 12) = 1 − (1 − 5e^(−4))
P(x > 12) = 5e^(−4) ≈ 0.0916
(i) Given P(x ≤ a) = P(x > a).
Since the total probability is 1, P(x ≤ a) = 1/2:
∫₀ᵃ f(x) dx = 1/2
∫₀ᵃ 3x² dx = 1/2
[x³]₀ᵃ = 1/2
a³ − 0 = 1/2
a = (1/2)^(1/3)
(ii) Given P(x > b) = 0.05:
∫ᵇ¹ f(x) dx = 1/20
∫ᵇ¹ 3x² dx = 1/20
[x³]ᵇ¹ = 1/20
1 − b³ = 1/20
b³ = 19/20
b = (19/20)^(1/3)
Problem: Is the function defined by f(x) = 0 if x < 2; (1/18)(2x + 3) if 2 ≤ x ≤ 4; 0 if x > 4 a probability density function? Find the probability that a variate having f(x) as density function will fall in the interval 2 ≤ x ≤ 3.
Problem: For the continuous probability function f(x) = kx² e^(−x) when x ≥ 0, find (i) k (ii) Mean (iii) Variance.
Problem: The diameter of an electric cable, say X, is assumed to be a continuous random variable; find P(0 < x < 1/2) and P(x > 1/4).
Binomial distribution:
Let f(x) = nCx p^x q^(n−x), x = 0, 1, 2, ..., n, and zero elsewhere, where q = 1 − p (p + q = 1).
Clearly f(x) ≥ 0 and Σₓ f(x) = 1.
Definition: Any random variable X whose p.m.f is the above f(x) is said to have a Binomial distribution and the p.m.f. f(x) is called the binomial p.m.f.
Mean of the Binomial distribution:
μ = Σₓ x P(x)
= Σₓ x nCx p^x q^(n−x)
= 0 + (1) nC1 p q^(n−1) + (2) nC2 p² q^(n−2) + (3) nC3 p³ q^(n−3) + ... + (n) nCn p^n
= np q^(n−1) + n(n−1) p² q^(n−2) + (n(n−1)(n−2)/2) p³ q^(n−3) + ... + n p^n
= np (q^(n−1) + (n−1) p q^(n−2) + ((n−1)(n−2)/2) p² q^(n−3) + ... + p^(n−1))
= np ((n−1)C0 q^(n−1) + (n−1)C1 p q^(n−2) + (n−1)C2 p² q^(n−3) + ... + (n−1)C(n−1) p^(n−1))
= np (q + p)^(n−1)
= np (1)^(n−1)
= np
Mean μ = E(x) = np
Variance of the Binomial distribution: Variance σ² = Σₓ x² P(x) − μ²
σ² = Σₓ (x² − x + x) P(x) − μ²
= Σₓ x(x−1) P(x) + Σₓ x P(x) − μ²
= Σₓ x(x−1) P(x) + μ − μ², since μ = Σₓ x P(x)
= Σ_{x=2}^{n} x(x−1) nCx p^x q^(n−x) + μ − μ²
= n(n−1) p² ((n−2)C0 q^(n−2) + (n−2)C1 p q^(n−3) + (n−2)C2 p² q^(n−4) + ... + (n−2)C(n−2) p^(n−2)) + μ − μ²
= n(n−1) p² (q + p)^(n−2) + μ − μ²
= n(n−1) p² (1)^(n−2) + μ − μ²
= (n² − n) p² + np − n²p²
= n²p² − np² + np − n²p²
σ² = np(1 − p) = npq
Mode of the Binomial distribution is the value of x that maximizes the probability function.
Recurrence relation: We know that P(x) = nCx p^x q^(n−x)
and P(x+1) = nC(x+1) p^(x+1) q^(n−x−1)
P(x+1)/P(x) = (nC(x+1) p^(x+1) q^(n−x−1)) / (nCx p^x q^(n−x)) = ((n − x)/(x + 1)) (p/q)
P(x+1) = ((n − x)/(x + 1)) (p/q) P(x) is known as the recurrence relation for the Binomial distribution.
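The recurrence relation can be verified directly against the p.m.f. (an illustrative check, not part of the original notes):

```python
from math import comb

# Check P(x+1) = ((n - x)/(x + 1)) (p/q) P(x) against the binomial pmf
n, p = 6, 1 / 3
q = 1 - p
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

for x in range(n):
    assert abs(pmf[x + 1] - (n - x) / (x + 1) * (p / q) * pmf[x]) < 1e-12
```

The recurrence is useful in hand computation: once P(0) = qⁿ is known, all later probabilities follow by successive multiplication.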
Solution: Given that the probability of success is p = 1/3,
q = 1 − p = 1 − 1/3 = 2/3, and n = 6.
We know that the Binomial distribution function is P(x) = nCx p^x q^(n−x).
(i) P(X ≤ 5) = 1 − P(X = 6) = 1 − 6C6 (1/3)^6 (2/3)^0 = 0.9986
(ii) Exactly 1 time: P(X = 1) = 6C1 (1/3)^1 (2/3)^5 = 0.2633
(iii) At least twice (total probability is 1):
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1)
= 1 − 6C0 (1/3)^0 (2/3)^6 − 6C1 (1/3)^1 (2/3)^5 = 0.6488
Problem: The probability that John hits a target is 1/2. He fires 6 times; find the probability that he hits the target (i) Exactly 2 times (ii) More than 4 times (iii) At least once.
Problem: The probability that a man hits a target is 1/4. If he fires 7 times, find the probability of hitting the target at least twice.
Solution: Let p be the probability of getting a head, i.e., p = 1/2,
q = 1 − p = 1/2. Here n = 10.
We know that the Binomial distribution function is P(x) = nCx p^x q^(n−x).
(i) P(x ≥ 7) = Σ_{x=7}^{10} 10Cx (1/2)^x (1/2)^(10−x)
= (1/2)^10 Σ_{x=7}^{10} 10Cx
= (1/2)^10 (10C7 + 10C8 + 10C9 + 10C10)
P(x ≥ 7) = 176/1024 = 11/64.
(ii) P(x ≥ 6) = Σ_{x=6}^{10} 10Cx (1/2)^x (1/2)^(10−x) = (1/2)^10 (10C6 + 10C7 + 10C8 + 10C9 + 10C10)
P(x ≥ 6) = 386/1024 = 193/512.
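The two tail probabilities reduce to exact fractions, which can be confirmed with the standard library (an illustrative check, not part of the original notes):

```python
from fractions import Fraction
from math import comb

# 10 fair coins: P(x >= 7) and P(x >= 6) as exact fractions
p7 = sum(Fraction(comb(10, x), 2**10) for x in range(7, 11))
p6 = sum(Fraction(comb(10, x), 2**10) for x in range(6, 11))

assert p7 == Fraction(11, 64)
assert p6 == Fraction(193, 512)
```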
Problem: The mean and variance of a binomial variable X with parameters 'n' and 'p' are 16 and 8.
Find P X 1 and P X 2 .
Solution: Here, n = 5.
A sum of 7 with two dice occurs for E = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)},
i.e., p = n(E)/n(S) = 6/36 = 1/6
q = 1 − p = 1 − 1/6 = 5/6
The Binomial distribution function is P(x) = nCx p^x q^(n−x).
(i) At least once: P(x ≥ 1) = 1 − P(x = 0) = 1 − 5C0 (1/6)^0 (5/6)^5 = 1 − (5/6)^5 ≈ 0.5981
(ii) 2 times:
i.e., P(x = 2) = 5C2 (1/6)^2 (5/6)^3 ≈ 0.1608
(iii) P(1 < x < 5)
= 5C2 (1/6)^2 (5/6)^3 + 5C3 (1/6)^3 (5/6)^2 + 5C4 (1/6)^4 (5/6)^1
P(1 < x < 5) = 0.19611
Solution: Let p be the probability of a boy, p = 1/2; the probability of a girl is then q = 1 − 1/2 = 1/2, and n = 5.
The probability of x successes is given by P(x) = nCx p^x q^(n−x)
(i) 3 boys: P(X = 3) = 5C3 (1/2)^3 (1/2)^2 = 5/16
For 800 families the expected number of families having 3 boys is 800 × 5/16 = 250
(ii) 2 or 3 boys: P(X = 2) + P(X = 3) = 5C2 (1/2)^2 (1/2)^3 + 5C3 (1/2)^3 (1/2)^2 = 5/8
For 800 families the expected number of families having 2 or 3 boys is 800 × 5/8 = 500
(iii) No boys: P(X = 0) = 5C0 (1/2)^0 (1/2)^5 = 1/32
For 800 families the expected number of families having no boys is 800 × 1/32 = 25
(iv) At least one boy: P(X ≥ 1) = 1 − P(X = 0) = 1 − 5C0 (1/2)^0 (1/2)^5 = 1 − 1/32 = 31/32
For 800 families the expected number of families having at least one boy is 800 × 31/32 = 775
Problem: In a family of 5 children find the probability that there are (i) 2 boys (ii) At least one boy (iii) All are boys (iv) No boys.
Solution: Given that the probability that in a solar heat installation the utility bill is reduced by at least one-third is p = 60% = 0.6
q = 1 − p = 0.4; here n = 5
The Binomial distribution function is P(x) = nCx p^x (1 − p)^(n−x)
(i) P(x = 4) = 5C4 (0.6)^4 (0.4)^(5−4)
P(x = 4) = 0.2592
(ii) P(X ≥ 4) = 5C4 (0.6)^4 (0.4)^1 + 5C5 (0.6)^5 (0.4)^0
P(X ≥ 4) = 0.337
(iii) Find P(1 < x < 5)
Solution: Given p = 20% = 20/100 = 0.2, q = 0.8, and n = 5
The Binomial distribution function is P(x) = nCx p^x (1 − p)^(n−x)
(i) P(X = 0) = 5C0 (0.2)^0 (0.8)^5
P(X = 0) = 0.32768
(ii) P(X = 1) = 5C1 (0.2)^1 (0.8)^4
P(X = 1) = 0.4096
(iii) P(1 < X < 5) = Σ_{x=2}^{4} 5Cx (0.2)^x (0.8)^(5−x)
= 5C2 (0.2)^2 (0.8)^3 + 5C3 (0.2)^3 (0.8)^2 + 5C4 (0.2)^4 (0.8)^1
P(1 < X < 5) = 0.2624
(i) Exactly 10 (ii) At least 10 (iii) At most 8 (iv) At least 2 and at most 9 are good in Mathematics
Solution: Given n = 18; p = 50% = 1/2
q = 1 − p = 1 − 1/2 = 1/2
The probability of x successes is given by P(X = x) = nCx p^x q^(n−x)
(i) Exactly 10:
P(x = 10) = 18C10 (1/2)^10 (1/2)^8
P(x = 10) = 0.1670
(ii) At least 10:
P(x ≥ 10) = Σ_{x=10}^{18} 18Cx (1/2)^x (1/2)^(18−x) = Σ_{x=10}^{18} 18Cx (1/2)^18
P(x ≥ 10) = (1/2)^18 [18C10 + 18C11 + ... + 18C18]
(iii) At most 8:
P(x ≤ 8) = Σ_{x=0}^{8} 18Cx (1/2)^x (1/2)^(18−x) = Σ_{x=0}^{8} 18Cx (1/2)^18
P(x ≤ 8) = (1/2)^18 [18C0 + 18C1 + 18C2 + 18C3 + 18C4 + 18C5 + 18C6 + 18C7 + 18C8]
Solution: Given n = 4
Let p be the probability of success, p = 1/9
q = 1 − p = 1 − 1/9 = 8/9
The probability of x successes is given by P(X = x) = nCx p^x q^(n−x)
P(X = 0) = 4C0 (1/9)^0 (8/9)^4
P(X = 0) = 0.624
Solution: Given n = 8.
Let p be the probability of getting 2 or 4: p = 1/6 + 1/6 = 1/3, so q = 1 − p = 1 − 1/3 = 2/3
The probability of x successes is given by P(X = x) = nCx p^x q^(n−x)
(i) 4 successes: P(X = 4) = 8C4 (1/3)^4 (2/3)^4 = 0.17
P(X = 4) = 0.17
(ii) P(X ≤ 3) = 8C0 (1/3)^0 (2/3)^8 + 8C1 (1/3)^1 (2/3)^7 + 8C2 (1/3)^2 (2/3)^6 + 8C3 (1/3)^3 (2/3)^5
P(X ≤ 3) = 0.74
(iii) P(X ≥ 2) = 1 − P(X < 2) = 1 − [P(x = 0) + P(x = 1)]
= 1 − [8C0 (1/3)^0 (2/3)^8 + 8C1 (1/3)^1 (2/3)^7]
P(X ≥ 2) = 0.8049
(i) The mean (ii) The variance of the distribution of defective bolts out of 640
Solution: Given n = 640; p = 1/8, so q = 1 − p = 1 − 1/8 = 7/8
(i) We know that mean μ = np = 640 × (1/8) = 80
(ii) Variance σ² = npq = 640 × (1/8) × (7/8) = 80 × (7/8) = 70
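A quick numeric restatement of the two formulas, with the values from this problem:

```python
# mean np and variance npq for n = 640 trials with p = 1/8
n, p = 640, 1 / 8
q = 1 - p
mean = n * p          # np  = 80
variance = n * p * q  # npq = 70
```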
Solution: Given that mean np = 4 and variance npq = 4/3.
npq/np = q, so 4q = 4/3 → q = 1/3, p = 1 − 1/3 = 2/3, and
np = 4 → n(2/3) = 4
n = 6
P(X ≥ 1) = 1 − P(X = 0) = 1 − 6C0 (2/3)^0 (1/3)^6
P(X ≥ 1) = 0.9986
Solution: Given n = 5;
P(X = 1) = 5C1 p^1 q^(5−1) = 0.4096   (1)
P(X = 2) = 5C2 p^2 q^(5−2) = 0.2048   (2)
Dividing (1) by (2): 5pq^4 / 10p²q³ = q/(2p) = 0.4096/0.2048 = 2
q = 4p
(1 − p) = 4p → 1 = 5p
p = 1/5
Binomial frequency distribution: The possible numbers of successes together with their expected frequencies constitute a Binomial frequency distribution. In N sets of n trials the theoretical frequencies of 0, 1, 2, ..., n successes are given by the successive terms of the expansion of N(q + p)^n,
where P(x+1) = ((n−x)/(x+1)) (p/q) P(x).
x    0    1    2    3    4    5
f    2   14   20   34   22    8
Solution: Here n = 5 and N = Σfᵢ = 100
Evaluation of the mean:
x     f     xf     Probability P(x)      f(x) = N·P(x)
0     2      0     P(x = 0) = 0.015        1.5
1    14     14     P(x = 1) = 0.098        9.8
2    20     40     P(x = 2) = 0.257       25.7
3    34    102     P(x = 3) = 0.337       33.7
4    22     88     P(x = 4) = 0.223       22.3
5     8     40     P(x = 5) = 0.058        5.8
Mean μ = Σfᵢxᵢ/Σfᵢ = 284/100 = 2.84
i.e., mean np = 2.84 → p = 2.84/5 = 0.568
q = 1 − p = 1 − 0.568 = 0.432
P(x = 0) = nC0 p^0 q^n = 5C0 (0.568)^0 (0.432)^5
P(x = 0) = (0.432)^5 = 0.015
We have the recurrence relation P(x+1) = ((n−x)/(x+1)) (p/q) P(x)
P(1) = ((5−0)/1)(0.568/0.432)(0.015) = 0.098, P(2) = ((5−1)/2)(0.568/0.432)(0.098) = 0.257,
P(3) = ((5−2)/3)(0.568/0.432)(0.257) = 0.337, P(4) = ((5−3)/4)(0.568/0.432)(0.337) = 0.223,
P(5) = ((5−4)/5)(0.568/0.432)(0.223) = 0.058
The theoretical frequencies are 1.5, 9.8, 25.7, 33.7, 22.3, 5.8
As frequencies are always integers, converting them to the nearest integers we get 2, 10, 26, 34, 22, 6.
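The fitting procedure (estimate p from the sample mean, then roll the recurrence forward) can be sketched as follows; note that carrying full precision gives values slightly different from the rounded hand computation in the notes:

```python
# observed data from the worked example (n = 5 trials per set, N = 100 sets)
xs = [0, 1, 2, 3, 4, 5]
fs = [2, 14, 20, 34, 22, 8]
n, N = 5, sum(fs)

mean = sum(x * f for x, f in zip(xs, fs)) / N   # 2.84
p = mean / n                                    # 0.568
q = 1 - p

# theoretical probabilities via P(x+1) = ((n-x)/(x+1)) (p/q) P(x)
probs = [q**n]
for x in range(n):
    probs.append((n - x) / (x + 1) * (p / q) * probs[x])

theoretical = [N * pr for pr in probs]   # expected frequencies
```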
x    0    1    2    3    4    5    6    7    8    9   10
f    6   20   28   12    8    6    0    0    0    0    0
Solution: Here the random variable X denotes the number of seeds germinating out of a set of 10 seeds.
Evaluation of the mean:
x     f     xf     Probability P(x)        f(x) = N·P(x)
0     6      0     P(x = 0)  = 0.0860        6.88
1    20     20     P(x = 1)  = 0.2390       19.12
2    28     56     P(x = 2)  = 0.2990       23.92
3    12     36     P(x = 3)  = 0.2216       17.73
4     8     32     P(x = 4)  = 0.1078        8.62
5     6     30     P(x = 5)  = 0.0360        2.88
6     0      0     P(x = 6)  = 0.0083        0.67
7     0      0     P(x = 7)  = 0.0013        0.11
8     0      0     P(x = 8)  = 0.0001        0.01
9     0      0     P(x = 9)  = 0.0000        0.00
10    0      0     P(x = 10) = 0.0000        0.00
N = Σfᵢ = 80, Σfᵢxᵢ = 174
Mean μ = Σfᵢxᵢ/Σfᵢ = 174/80 = 2.175
i.e., np = 2.175 → p = 2.175/10 = 0.2175
q = 1 − p = 1 − 0.2175 = 0.7825
Therefore, the Binomial distribution to be fitted is N(q + p)^n
P(x = 0) = nC0 p^0 q^n = 10C0 (0.2175)^0 (0.7825)^10
P(x = 0) = (0.7825)^10 = 0.086
We have the recurrence relation P(x+1) = ((n−x)/(x+1)) (p/q) P(x)
P(1) = ((10−0)/1)(0.2175/0.7825)(0.086) = 0.2390,
P(2) = ((10−1)/2)(0.2175/0.7825)(0.2390) = 0.2990,
P(3) = ((10−2)/3)(0.2175/0.7825)(0.2990) = 0.2216,
P(4) = ((10−3)/4)(0.2175/0.7825)(0.2216) = 0.1078,
P(5) = ((10−4)/5)(0.2175/0.7825)(0.1078) = 0.0360,
P(6) = ((10−5)/6)(0.2175/0.7825)(0.0360) = 0.0083,
P(7) = ((10−6)/7)(0.2175/0.7825)(0.0083) = 0.0013,
P(8) = ((10−7)/8)(0.2175/0.7825)(0.0013) = 0.0001,
P(9) = ((10−8)/9)(0.2175/0.7825)(0.0001) = 0.0000,
P(10) = ((10−9)/10)(0.2175/0.7825)(0.0000) = 0.0000.
The theoretical frequencies are 6.88, 19.12, 23.92, 17.73, 8.62, 2.88, 0.67, 0.11, 0.01, 0, 0
As frequencies are always integers, converting them to the nearest integers we get 7, 19, 24, 18, 9, 3, 1, 0, 0, 0, 0.
Poisson distribution: Let f(x) = λ^x e^(−λ)/x!, x = 0, 1, 2, ...
Clearly f(x) ≥ 0, and
Σ_{x=0}^{∞} f(x) = Σ_{x=0}^{∞} λ^x e^(−λ)/x!
= e^(−λ) Σ_{x=0}^{∞} λ^x/x! = e^(−λ) e^λ = 1
Definition: Any random variable X whose pdf is the above f(x) is said to have a Poisson distribution, and the pdf is called a Poisson pdf.
Note: The Poisson distribution can be derived from the Binomial distribution by applying the following conditions: the number of trials n → ∞, the probability of success p → 0, with np = λ held constant.
With p = λ/n, the Binomial pmf is
f(x) = nCx (λ/n)^x (1 − λ/n)^(n−x)
= [n!/(x!(n−x)!)] (λ^x/n^x) (1 − λ/n)^n / (1 − λ/n)^x
= [n(n−1)(n−2)...(n−(x−1)) / n^x] (λ^x/x!) (1 − λ/n)^n / (1 − λ/n)^x
= (1 − 1/n)(1 − 2/n)...(1 − (x−1)/n) (λ^x/x!) (1 − λ/n)^n / (1 − λ/n)^x
Taking the limit as n → ∞, each factor (1 − k/n) → 1 and (1 − λ/n)^x → 1, so
lim_{n→∞} f(x) = (λ^x/x!) lim_{n→∞} (1 − λ/n)^n
= (λ^x/x!) e^(−λ)    [ using lim_{x→∞} (1 + a/x)^x = e^a ]
Therefore, the Poisson distribution is f(x) = λ^x e^(−λ)/x!, x = 0, 1, 2, ...
Mean of the Poisson distribution: Mean μ = Σ_{x=0}^{∞} x P(x) = Σ_{x=0}^{∞} x λ^x e^(−λ)/x!
= Σ_{x=1}^{∞} λ^x e^(−λ)/(x−1)!
Put x − 1 = y, i.e., x = y + 1;
the limits become x = 1 → y = 0 and x = ∞ → y = ∞
μ = Σ_{y=0}^{∞} λ^(y+1) e^(−λ)/y!
= λ e^(−λ) Σ_{y=0}^{∞} λ^y/y!
= λ e^(−λ) e^λ
Mean μ = λ
Variance of the Poisson distribution: Variance σ² = Σ_{x=0}^{∞} x² P(x) − μ²
= Σ_{x=0}^{∞} (x² − x + x) P(x) − μ²
= Σ_{x=2}^{∞} x(x−1) λ^x e^(−λ)/x! + Σ_{x=0}^{∞} x P(x) − μ²
= Σ_{x=2}^{∞} λ^x e^(−λ)/(x−2)! + μ − μ²
Put x − 2 = y; the limits become x = 2 → y = 0 and x = ∞ → y = ∞
σ² = Σ_{y=0}^{∞} λ^(y+2) e^(−λ)/y! + μ − μ²
= λ² e^(−λ) Σ_{y=0}^{∞} λ^y/y! + μ − μ²
= λ² e^(−λ) e^λ + μ − μ²
= λ² + λ − λ²
Therefore variance σ² = λ
Standard deviation σ = √λ
Therefore, the Poisson distribution function can also be written as f(x) = μ^x e^(−μ)/x!, x = 0, 1, 2, 3, ...
Mode of the Poisson distribution: For a mode at x we need p(x) ≥ p(x+1) and p(x) ≥ p(x−1).
p(x) ≥ p(x+1): λ^x e^(−λ)/x! ≥ λ^(x+1) e^(−λ)/(x+1)!
→ 1/x! ≥ λ/((x+1) x!)
→ 1 ≥ λ/(x+1)
→ x + 1 ≥ λ, i.e., x ≥ λ − 1   (1)
p(x) ≥ p(x−1): λ^x e^(−λ)/x! ≥ λ^(x−1) e^(−λ)/(x−1)!
→ λ/(x (x−1)!) ≥ 1/(x−1)!
→ x ≤ λ   (2)
From (1) and (2): λ − 1 ≤ x ≤ λ.
Note:
01. If λ is an integer, then λ − 1 is also an integer. So, we have two maximum values, at x = λ − 1 and x = λ.
Solution: Given λ = 4
(i) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= λ^0 e^(−λ)/0! + λ^1 e^(−λ)/1! + λ^2 e^(−λ)/2!
= e^(−4) [1 + 4/1! + 4²/2!]
P(X ≤ 2) = 13 e^(−4)
(ii) P(X = 3) = λ^3 e^(−λ)/3!
= (4³/3!) e^(−4)
P(X = 3) = (32/3) e^(−4)
Solution: Given n = 5 and the probability that a man will be alive 30 years is p = 2/3
Therefore mean λ = np = 5 × (2/3) = 10/3
The Poisson distribution is P(x) = μ^x e^(−μ)/x!, x = 0, 1, 2, 3, ...
(i) P(X = 5) = (10/3)^5 e^(−10/3)/5!
P(X = 5) = 14.680/120
P(X = 5) = 0.12
(ii) P(X ≥ 1) = 1 − P(X < 1)
P(X ≥ 1) = 1 − P(X = 0)
= 1 − (10/3)^0 e^(−10/3)/0!
P(X ≥ 1) = 1 − e^(−10/3)
P(X ≥ 1) = 1 − 0.0357
P(X ≥ 1) = 0.9643
(iii) P(X ≤ 3) = 1 − P(X > 3)
P(X ≤ 3) = 1 − [P(x = 4) + P(x = 5)]
= 1 − [(10/3)^4 e^(−10/3)/4! + (10/3)^5 e^(−10/3)/5!]
P(X ≤ 3) = 0.6965
Problem: The average number of accidents on any day on a national highway is 1.8 (λ = 1.8). Find the probability that on a given day there is (i) At least one accident (ii) At most one accident.
Problem: At a checkout counter customers arrive at an average of 1.5 per minute. Find the probabilities that in any given minute of time (i) At most 4 will arrive (ii) Exactly 4 will arrive (iii) At least 4 will arrive.
Poisson approximation to the Binomial distribution: When the value of n in a Binomial distribution is large and the value of p is very small, the Binomial distribution can be approximated by a Poisson distribution with λ = np. The approximation is good if n ≥ 20 and p ≤ 0.05.
Problem: It has been found that 2% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 400 such tools (i) 3% or more (ii) 2% or less will be defective.
Solution: μ = λ = np = 400 × 0.02 = 8
Poisson distribution function f(x) = λ^x e^(−λ)/x!, x = 0, 1, 2, 3, ...
(i) 3% of 400 is 12, so we need P(X ≥ 12) = 1 − P(X ≤ 11)
= 1 − e^(−8) [8^0/0! + 8^1/1! + 8^2/2! + 8^3/3! + 8^4/4! + 8^5/5! + 8^6/6! + 8^7/7! + 8^8/8! + 8^9/9! + 8^10/10! + 8^11/11!]
= 1 − e^(−8)(2647.29)
(ii) 2% of 400 is 8, so we need P(X ≤ 8)
= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8)
= e^(−8) [8^0/0! + 8^1/1! + 8^2/2! + 8^3/3! + 8^4/4! + 8^5/5! + 8^6/6! + 8^7/7! + 8^8/8!]
= e^(−8)(1766.3587)
= 0.5925
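A cumulative-sum helper makes both parts of this calculation direct (a sketch; `poisson_cdf` is my own name):

```python
from math import exp, factorial

def poisson_cdf(lam, k):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))

lam = 400 * 0.02                            # np = 8
p_3pct_or_more = 1 - poisson_cdf(lam, 11)   # P(X >= 12), about 0.1119
p_2pct_or_less = poisson_cdf(lam, 8)        # P(X <= 8),  about 0.5925
```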
Solution: λ = np = 80 × 0.06 = 4.8
Here n = 80 ≥ 20 and np = 4.8 < 5, so the Poisson approximation applies.
Poisson distribution function f(x) = λ^x e^(−λ)/x!, x = 0, 1, 2, 3, ...
(i) Exactly 4 will get at least one parking ticket in any given year:
P(X = 4) = (4.8)^4 e^(−4.8)/4! = 0.182029
(ii) At least 3 will get at least one parking ticket in any given year:
P(X ≥ 3) = 1 − P(X < 3)
= 1 − [P(x = 0) + P(x = 1) + P(x = 2)]
P(X ≥ 3) = 1 − 0.142539
P(X ≥ 3) = 0.857461
(iii) Anywhere from 3 to 6, inclusive, will get at least one parking ticket in any given year:
P(3 ≤ X ≤ 6) = e^(−4.8) [(4.8)³/3! + (4.8)⁴/4! + (4.8)⁵/5! + (4.8)⁶/6!]
P(3 ≤ X ≤ 6) = 0.648265
Problem: If 0.8% of the fuses delivered to an arsenal are defective, use the Poisson approximation to determine the probability that 4 fuses will be defective in a random sample of 400.
Here n = 200 > 20, so the Poisson approximation applies, with λ = np = 10.
A packet will violate the guarantee if it contains more than 10% non-germinating seeds, i.e., more than 20 seeds:
P(X > 20) = 1 − P(X ≤ 20)
= 1 − Σ_{x=0}^{20} λ^x e^(−λ)/x!
= 1 − e^(−10) [10^0/0! + 10^1/1! + 10^2/2! + ... + 10^20/20!]
= 1 − 0.9984 = 0.0016
Solution: Given n = 51
p = 13C2/52C2 = 78/1326 = 3/51
Mean μ = λ = np = 51 × (3/51) = 3
The Poisson distribution is P(x) = μ^x e^(−μ)/x!, x = 0, 1, 2, 3, ...
At least 3 times:
P(X ≥ 3) = 1 − [3^0 e^(−3)/0! + 3^1 e^(−3)/1! + 3^2 e^(−3)/2!] = 1 − e^(−3) [1 + 3 + 9/2]
P(X ≥ 3) = 0.5768
Solution: Given X is a random variable having a Poisson distribution such that P(X = 1) = (2/3) P(X = 3)
λ^1 e^(−λ)/1! = (2/3) λ^3 e^(−λ)/3!
λ = (2/3)(λ³/6)
9λ = λ³
λ³ − 9λ = 0
λ(λ² − 9) = 0
λ = 0 or λ = 3; taking λ ≠ 0,
Mean μ = λ = 3
(i) P(X ≥ 1) = 1 − P(X = 0)
= 1 − λ^0 e^(−λ)/0!
P(X ≥ 1) = 1 − e^(−3)
(ii) P(X ≤ 3) = e^(−3) [1 + 3/1! + 3²/2! + 3³/3!]
P(X ≤ 3) = 13 e^(−3)
(iii) P(2 ≤ X ≤ 5) = e^(−3) [3²/2! + 3³/3! + 3⁴/4! + 3⁵/5!]
P(2 ≤ X ≤ 5) = (72/5) e^(−3)
Solution: Given 6P(X = 4) = P(X = 2) + 2P(X = 0),
6 λ^4 e^(−λ)/4! = λ^2 e^(−λ)/2! + 2 λ^0 e^(−λ)/0!
6λ⁴/24 = λ²/2 + 2
λ⁴/4 = λ²/2 + 2
λ⁴ − 2λ² − 8 = 0
(λ² − 4)(λ² + 2) = 0 → λ² = 4
λ = 2
(i) Mean μ = λ = 2
Problem: If X is a random variable having a Poisson distribution such that P(X = 1) = P(X = 2), find
Problem: For a Poisson variate with 2P(X = 0) = P(X = 2), find (i) P(X ≥ 3) (ii) P(X ≤ 3) (iii) P(2 ≤ X ≤ 5)
We know that this Poisson distribution is bimodal, and the two modes are at the points x = λ − 1 and x = λ.
This implies λ = 2.
The Poisson distribution is P(x) = μ^x e^(−μ)/x!, x = 0, 1, 2, 3, ...
(i) P(X = 1) + P(X = 2) = 2^1 e^(−2)/1! + 2^2 e^(−2)/2!
= 2e^(−2) + 2e^(−2)
= 4e^(−2)
(ii) P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)]
= 1 − [2^0 e^(−2)/0! + 2^1 e^(−2)/1!]
= 1 − 3e^(−2)
P(X ≥ 2) = 0.5939
Now P(x = 0) = λ^0 e^(−λ)/0! = e^(−3)
By the recurrence relation we have P(x + 1) = (λ/(x + 1)) P(x)
Put x = 0: P(1) = (3/1) P(0) = 3e^(−3)
Put x = 1: P(2) = (3/2) P(1) = (9/2) e^(−3)
Put x = 2: P(3) = (3/3) P(2) = (9/2) e^(−3)
Put x = 3: P(4) = (3/4) P(3) = (27/8) e^(−3)
Put x = 4: P(5) = (3/5) P(4) = (81/40) e^(−3)
x      0     1    2    3   4   5   Total
f    142   156   69   27   5   1     400
Solution:
Evaluation of the mean:
x      f      xf
0    142       0
1    156     156
2     69     138
3     27      81
4      5      20
5      1       5
Σfᵢ = 400, Σfᵢxᵢ = 400
Mean μ = λ = Σfᵢxᵢ/Σfᵢ = 400/400 = 1
The theoretical frequencies are N·P(x): 400P(x=0), 400P(x=1), 400P(x=2), 400P(x=3), 400P(x=4), 400P(x=5)
= 400·(1^0 e^(−1)/0!), 400·(1^1 e^(−1)/1!), 400·(1^2 e^(−1)/2!), 400·(1^3 e^(−1)/3!), 400·(1^4 e^(−1)/4!), 400·(1^5 e^(−1)/5!)
x      0     1     2    3    4   5   6   7
f    305   365   210   80   28   9   2   1
The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, irrespective of the population's distribution.
Or
If X̄ is the mean of a random sample of size n taken from a population having mean μ and finite variance σ², then z = (X̄ − μ)/(σ/√n) is a random variable whose distribution function approaches that of the standard normal distribution as n → ∞.
The Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetrical around its mean, resulting in a bell-shaped curve. This distribution is widely used in innumerable fields due to its mathematical properties and prevalence in natural phenomena.
Stock prices: Changes in stock prices often follow a normal distribution pattern.
Returns on investments: The distribution of returns on investments is often modeled using a normal
distribution.
Quality Control
Manufacturing processes: Variations in product measurements, such as length, width, and weight,
often follow a normal distribution. Quality control charts are based on normal distribution
assumptions.
Biological Sciences
Physical characteristics: Traits like height, weight, and blood pressure in a population tend to follow
a normal distribution.
Biological variability: In studies of biological phenomena, the normal distribution is commonly used
to model genetic variations and other natural processes.
IQ scores: Intelligence quotient (IQ) scores are often assumed to follow a normal distribution in the
population.
Personality traits: Many personality traits, when measured, tend to follow a normal distribution.
Education
Standardized testing: Test scores on standardized exams, such as the SAT or GRE, are often
assumed to follow a normal distribution.
Grade distribution: In large populations, the distribution of grades in a course can approximate a
normal distribution.
Physics
Measurement errors: Errors in measurements and experimental data often follow a normal
distribution.
Quantum mechanics: Certain physical properties of particles, like their position and momentum, can
be modelled using normal distribution functions.
Social Sciences
Income distribution: In large populations, the distribution of income is often modeled using a log-
normal distribution, which is related to the normal distribution.
Survey responses: When collecting data through surveys, responses to questions can often be
analysed assuming a normal distribution.
Health Sciences
Blood pressure: Blood pressure measurements in a population tend to follow a normal distribution.
Body temperature: Normal body temperature variations in a healthy population can be modeled
using a normal distribution.
Environmental Studies
Weather patterns: Certain meteorological parameters, such as temperature and wind speed, can be
modelled using a normal distribution.
Pollution levels: Concentrations of pollutants in the air or water may follow a normal distribution.
Tolerance limits: Specifications for manufacturing parts often involve tolerance limits, and the
distribution of dimensions is assumed to be normal.
Normal Distribution: Define f(x) = (1/(b√(2π))) e^(−(1/2)((x−a)/b)²), −∞ < x < ∞
Therefore f(x) satisfies the conditions of being a continuous pdf. This type of distribution is called the Normal distribution and the pdf f(x) is called the normal pdf.
Mean: μ = E(X) = ∫_{−∞}^{∞} x (1/(b√(2π))) e^(−(1/2)((x−a)/b)²) dx
= (1/(b√(2π))) ∫_{−∞}^{∞} x e^(−(1/2)((x−a)/b)²) dx
Put z = (x − a)/b, so x = zb + a and dx = b dz;
if x → −∞ then z → −∞, and if x → ∞ then z → ∞
μ = (1/(b√(2π))) ∫_{−∞}^{∞} (zb + a) e^(−z²/2) (b dz)
= (1/√(2π)) ∫_{−∞}^{∞} (zb + a) e^(−z²/2) dz
= (b/√(2π)) ∫_{−∞}^{∞} z e^(−z²/2) dz + (a/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz
= 0 + (2a/√(2π)) ∫_{0}^{∞} e^(−z²/2) dz = (2a/√(2π)) √(π/2)
Mean μ = a
Variance: σ² = ∫_{−∞}^{∞} x² (1/(b√(2π))) e^(−(1/2)((x−a)/b)²) dx − μ²
Put z = (x − a)/b, so x = zb + a and dx = b dz; if x → −∞ then z → −∞, and if x → ∞ then z → ∞
σ² = (1/(b√(2π))) ∫_{−∞}^{∞} (zb + a)² e^(−z²/2) (b dz) − μ² = (1/√(2π)) ∫_{−∞}^{∞} (zb + a)² e^(−z²/2) dz − μ²
= (b²/√(2π)) ∫_{−∞}^{∞} z² e^(−z²/2) dz + (2ab/√(2π)) ∫_{−∞}^{∞} z e^(−z²/2) dz + (a²/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz − a²
= (2b²/√(2π)) ∫_{0}^{∞} z² e^(−z²/2) dz + (2ab/√(2π))(0) + a² (2/√(2π)) ∫_{0}^{∞} e^(−z²/2) dz − a²
= (2b²/√(2π)) ∫_{0}^{∞} z² e^(−z²/2) dz + a² − a²
σ² = (2b²/√(2π)) ∫_{0}^{∞} z² e^(−z²/2) dz
Put z²/2 = t, so z = √(2t) and dz = dt/√(2t); then z² dz = √2 t^(1/2) dt:
σ² = (2b²/√π) ∫_{0}^{∞} t^(1/2) e^(−t) dt
= (2b²/√π) Γ(3/2)
= (2b²/√π) (1/2) Γ(1/2)
= (b²/√π) √π
Variance σ² = b²
Standard deviation σ = b
Now the Normal distribution pdf becomes f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²), −∞ < x < ∞
Mode: We know that f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²)
f′(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²) · [−(x − μ)/σ²]
f′(x) = −f(x) (x − μ)/σ²    (1)
Setting f′(x) = 0:
(1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²) (x − μ)/σ² = 0
→ (x − μ)/σ² = 0
→ x = μ
From (1), f″(x) = −f′(x)(x − μ)/σ² − f(x)/σ²
When x = μ, f″(x) = −f(μ)/σ² < 0, so the mode is at x = μ.
Median: If M is the median of the Normal distribution, we have ∫_{−∞}^{M} f(x) dx = ∫_{M}^{∞} f(x) dx = 1/2
Now ∫_{−∞}^{M} f(x) dx = 1/2
(1/(σ√(2π))) ∫_{−∞}^{M} e^(−(1/2)((x−μ)/σ)²) dx = 1/2
(1/(σ√(2π))) ∫_{−∞}^{μ} e^(−(1/2)((x−μ)/σ)²) dx + (1/(σ√(2π))) ∫_{μ}^{M} e^(−(1/2)((x−μ)/σ)²) dx = 1/2    (1)
Consider (1/(σ√(2π))) ∫_{−∞}^{μ} e^(−(1/2)((x−μ)/σ)²) dx
Put z = (x − μ)/σ, then dx = σ dz;
if x → −∞ then z → −∞; x = μ gives z = 0
(1/(σ√(2π))) ∫_{−∞}^{μ} e^(−(1/2)((x−μ)/σ)²) dx
= (1/(σ√(2π))) ∫_{−∞}^{0} e^(−z²/2) σ dz
= (1/√(2π)) ∫_{−∞}^{0} e^(−z²/2) dz = (1/√(2π)) √(π/2)
= 1/2
(1) becomes: 1/2 + (1/(σ√(2π))) ∫_{μ}^{M} e^(−(1/2)((x−μ)/σ)²) dx = 1/2
→ ∫_{μ}^{M} e^(−(1/2)((x−μ)/σ)²) dx = 0
Therefore μ = M, since ∫_a^b f(x) dx = 0 with f(x) > 0 forces a = b.
01. The graph of the Normal distribution y = f(x) in the xy-plane is known as the normal curve.
02. The curve is bell-shaped and symmetrical with respect to the mean, i.e., the line x = μ, and the two tails on the right and left sides of the mean extend to infinity.
03. Mean, median and mode of the distribution coincide at x = μ as the distribution is symmetrical.
04. The probability that the normal variate X with mean μ and standard deviation σ lies between x₁ and x₂ is given by
P(x₁ < X < x₂) = (1/(σ√(2π))) ∫_{x₁}^{x₂} e^(−(1/2)((x−μ)/σ)²) dx
Standard Normal distribution: The Normal distribution with mean μ = 0 and standard deviation σ = 1 is known as the standard Normal distribution.
Note: The definite integral (1/√(2π)) ∫_{0}^{z₁} e^(−z²/2) dz is known as the normal probability integral and gives the area under the standard normal curve from 0 to z₁.
The probability that a normal variate X with mean μ and standard deviation σ lies between two specific values x₁ and x₂ (x₁ < x₂) can be obtained using the area under the standard normal curve as follows:
01. Use the transformation z = (x − μ)/σ and find z₁ and z₂ corresponding to the values x₁ and x₂ respectively.
Case (i) If z₁ and z₂ are both positive (or both negative), then P(x₁ < X < x₂) = A(z₂) − A(z₁)
= (Area under the normal curve from 0 to z₂) − (Area under the normal curve from 0 to z₁)
Case (ii) If z₁ < 0 < z₂, then P(x₁ < X < x₂) = A(z₂) + A(z₁)
= (Area under the normal curve from 0 to z₂) + (Area under the normal curve from 0 to z₁)
Since P(z < 0) = P(z > 0) = 1/2:
if z₁ > 0, then P(z > z₁) = 0.5 − A(z₁) and P(z < z₁) = 0.5 + A(z₁);
if z₁ < 0, then P(z > z₁) = 0.5 + A(|z₁|) and P(z < z₁) = 0.5 − A(|z₁|).
Note: A(−z) = A(z).
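The tabulated area A(z) used throughout the problems below can be computed from the error function; a small sketch (the helper name `A` mirrors the notes' notation):

```python
from math import erf, sqrt

def A(z):
    """Area under the standard normal curve from 0 to |z| (the table value)."""
    return 0.5 * erf(abs(z) / sqrt(2))

# e.g. A(0.8) ~ 0.2881, A(1.6) ~ 0.4452, A(2) ~ 0.4772, matching the tables
```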
Solution: Given that X is a normal variate with mean μ = 30 and standard deviation σ = 5.
We know that z = (x − μ)/σ
(i) Let z₁ = (x₁ − μ)/σ and z₂ = (x₂ − μ)/σ where x₁ = 26 and x₂ = 40
z₁ = (26 − 30)/5 = −0.8 < 0 and z₂ = (40 − 30)/5 = 2 > 0
P(26 < X < 40) = A(−0.8) + A(2)
= A(0.8) + A(2)
= 0.2881 + 0.4772 = 0.7653
(ii) When x₁ = 45: z₁ = (x₁ − μ)/σ = (45 − 30)/5 = 3 > 0
P(X > 45) = 0.5 − 0.4987 = 0.0013
We have z = (x − μ)/σ
Now z₁ = (x₁ − μ)/σ and z₂ = (x₂ − μ)/σ where x₁ = 778 and x₂ = 834,
giving z₁ = −0.55 < 0 and z₂ = 0.85 > 0
P(778 < X < 834) = A(−0.55) + A(0.85)
= A(0.55) + A(0.85)
= 0.2088 + 0.3023 = 0.5111
Here x₁ = 12 and x₂ = 15. We know that z = (x − μ)/σ
z₁ = (x₁ − μ)/σ = (12 − 14)/2.5 = −0.8 < 0 and z₂ = (x₂ − μ)/σ = (15 − 14)/2.5 = 0.4 > 0
P(12 < X < 15) = A(−0.8) + A(0.4)
= A(0.8) + A(0.4)
= 0.2881 + 0.1554
= 0.4435.
(ii) When x₁ = 18: z₁ = (x₁ − μ)/σ = (18 − 14)/2.5 = 1.6 > 0
P(X > 18) = 0.5 − A(1.6)
= 0.5 − 0.4452 = 0.0548
(iii) P(X < 18): z₁ = (18 − 14)/2.5 = 1.6 > 0
P(X < 18) = 0.5 + A(1.6)
= 0.5 + 0.4452
= 0.9452
Let x₁ = 30 and x₂ = 60
We have z₁ = (x₁ − μ)/σ = (30 − 34.5)/16.5 = −0.27 < 0 and
z₂ = (x₂ − μ)/σ = (60 − 34.5)/16.5 = 1.54 > 0
P(30 < X < 60) = A(−0.27) + A(1.54)
= A(0.27) + A(1.54)
= 0.1064 + 0.4382
= 0.5446
We have z₁ = (x₁ − μ)/σ = (138 − 140)/10 = −0.2 < 0 and
z₂ = (x₂ − μ)/σ = (148 − 140)/10 = 0.8 > 0
P(138 < X < 148) = A(−0.2) + A(0.8)
= A(0.2) + A(0.8)
= 0.0793 + 0.2881
= 0.3674
Therefore, the number of students whose weights are between 138 and 148 pounds is
(ii) z₁ = (x₁ − μ)/σ = (152 − 140)/10 = 1.2 > 0
P(X > 152) = 0.5 − A(1.2)
= 0.5 − 0.3849
= 0.1151
cells are expected to have life (i) More than 15 hours (ii) Less than 6 hours (iii) Between 10 and 14 hours?
(i) P(X > 15): z₁ = (x₁ − x̄)/σ = (15 − 12)/3 = 1 > 0
P(X > 15) = 0.5 − A(1)
= 0.5 − 0.3413
= 0.1587
The percentage of battery cells expected to have life more than 15 hours is 15.87%.
(ii) P(X < 6): z₁ = (x₁ − x̄)/σ = (6 − 12)/3 = −2 < 0
P(X < 6) = 0.5 − A(−2)
= 0.5 − A(2)
= 0.5 − 0.4772
= 0.0228
The percentage of battery cells expected to have life less than 6 hours is 2.28%.
(iii) Here x₁ = 10, x₂ = 14; we know that z = (x − x̄)/σ
z₁ = (10 − 12)/3 = −0.67 < 0 and z₂ = (14 − 12)/3 = 0.67 > 0
P(10 < X < 14) = A(−0.67) + A(0.67)
= A(0.67) + A(0.67)
= 2A(0.67)
= 2 × 0.2485
= 0.4970
The percentage of battery cells expected to have life between 10 and 14 hours is 49.70%.
Therefore, the number of pairs of shoes that need to be replaced after 12 months is 5000 − 114 = 4886.
Problem: The mean yield per plot of a crop is 17 kg and the standard deviation is 3 kg. If the distribution of yield per plot is normal, find the percentage of plots giving a yield (i) Between 15.5 kg and 20 kg (ii) More than 20 kg.
Ans: (i) 53.28% (ii) 15.87%
Problem: The masses of 300 students are normally distributed with mean 68 kg and standard deviation 3 kg. How many students have masses,
Solution: Given mean μ = 78% = 0.78 and standard deviation σ = 11% = 0.11
(i) z₁ = (x₁ − μ)/σ = (0.9 − 0.78)/0.11 = 1.09 > 0
P(X > 0.9) = 0.5 − A(1.09)
= 0.5 − 0.3621
= 0.1379
Hence the number of students with marks more than 90% is 0.1379 × 1000 = 137.9 ≈ 138
(ii) The 0.1 area to the left of z₁ corresponds to the lowest 10% of the students,
i.e., A(z₁) = 0.4
This implies z₁ = −1.28
z₁ = (x₁ − μ)/σ: −1.28 = (x − 0.78)/0.11 → x = 0.6392
Hence the highest marks obtained by the lowest 10% of students are about 0.6392, i.e., 64%
(iii) The middle 90% corresponds to 0.9 area, leaving 0.05 area on each side; then from the diagram,
we know that z = (x − μ)/σ
z₁ = −1.64: (x₁ − 0.78)/0.11 = −1.64 → x₁ = 0.5996, or 59.96%, and
z₂ = 1.64: (x₂ − 0.78)/0.11 = 1.64 → x₂ = 0.9604, or 96.04%
(iii) Corresponding to 0.8 ≤ Z ≤ 1.53 (iv) To the left of z = −2.52 and to the right of z = 1.83
(i) P(Z > 1.78) = 0.5 − A(1.78)
= 0.5 − 0.4625
P(Z > 1.78) = 0.0375
(ii) P(Z > −1.45) = 0.5 + A(1.45)
= 0.5 + 0.4265
P(Z > −1.45) = 0.9265
(iii) P(0.8 ≤ Z ≤ 1.53) = A(1.53) − A(0.8)
= 0.4370 − 0.2881 = 0.1489
(iv) P(Z < −2.52) = 0.5 − A(2.52) = 0.5 − 0.4941 = 0.0059, and
P(Z > 1.83) = 0.5 − A(1.83) = 0.5 − 0.4664 = 0.0336
Solution: Given that P(X < 35) = 10% = 0.1 and P(X > 90) = 5% = 0.05
For x₁ = 35: A(z₁) = 0.5 − 0.1 = 0.4, which gives z₁ = −1.29 (negative, since 35 lies below the mean)
For x₂ = 90: A(z₂) = 0.5 − 0.05 = 0.45, which gives z₂ = 1.65
We know that z₁ = (x₁ − μ)/σ
−1.29 = (35 − μ)/σ
−1.29σ = 35 − μ
1.29σ = μ − 35    (1)
z₂ = (x₂ − μ)/σ
1.65 = (90 − μ)/σ
1.65σ = 90 − μ
1.65σ + μ = 90    (2)
Given P(X < 63) = 0.89, so P(X > 63) = 1 − 0.89 = 0.11
A(z₁) = 0.43, which gives z₁ = −1.48
A(z₂) = 0.39, which gives z₂ = 1.23
We know that z₁ = (x₁ − μ)/σ
−1.48 = (35 − μ)/σ
−1.48σ = 35 − μ, i.e., 1.48σ = μ − 35    (1)
z₂ = (x₂ − μ)/σ
1.23 = (63 − μ)/σ
1.23σ = 63 − μ
1.23σ + μ = 63    (2)
Solution: Given that P(X > 30) = 40% = 0.4 and P(X > 60) = 15% = 0.15
For x₁ = 30: A(z₁) = 0.5 − 0.4 = 0.1, which gives z₁ = 0.26
For x₂ = 60: A(z₂) = 0.5 − 0.15 = 0.35, which gives z₂ = 1.04
We know that z₁ = (x₁ − μ)/σ
0.26 = (30 − μ)/σ
0.26σ = 30 − μ, i.e., 0.26σ + μ = 30    (1)
z₂ = (x₂ − μ)/σ
1.04 = (60 − μ)/σ
1.04σ = 60 − μ, i.e., 1.04σ + μ = 60    (2)
Normal approximation to the Binomial distribution: To calculate probabilities with large values of n, you had to use the binomial formula, which could be very complicated. Using the normal approximation to the binomial distribution simplifies the process: if n is large (with np > 5 and nq > 5), the Binomial distribution can be approximated by a Normal distribution with mean μ = np and variance σ² = npq.
Normal approximation to the Poisson distribution: If we have a Poisson distribution with parameter λ, and λ > 20, then the Poisson distribution can be approximated by a Normal distribution with μ = λ and σ² = λ.
Note: The Poisson distribution is a discrete distribution, whereas the Normal distribution is continuous. Changing a discrete distribution to a continuous one requires a continuity correction.
Here n = 12 , p is the probability of getting head, p = q=
2 2
1 1
Mean = np = 12 = 6 5 and nq = 12 = 6 5
2 2
1 1
Variance = npq = 12 = 3
2
2 2
1
Now X B 12, is approximated to X N ( = 6, 2 = 3)
2
P (1 X 8) = P ( 0.5 X 8.5)
x x
we know that z
σ
x1 μ 0.5 6 x 2 μ 8.5 6
z1 3.18 0 and z 2 1.44 0
σ 1.732 σ 1.732
0.4993 0.4251
P 1 X 8 0.9244
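The coin-toss approximation above can be sketched end to end, with the continuity correction and the `erf`-based table area (helper names are my own):

```python
from math import erf, sqrt

def A(z):
    """Area under the standard normal curve from 0 to |z|."""
    return 0.5 * erf(abs(z) / sqrt(2))

# X ~ B(12, 1/2) approximated by N(mu = 6, sigma^2 = 3)
mu, sigma = 6.0, sqrt(3.0)

# continuity correction: P(1 <= X <= 8) -> P(0.5 < X < 8.5)
z1 = (0.5 - mu) / sigma    # about -3.18
z2 = (8.5 - mu) / sigma    # about  1.44
prob = A(z1) + A(z2)       # z1 < 0 < z2, so the two areas add; about 0.9244
```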
The probability that at least 250 favour Morgan for mayor is P(X ≥ 250)
With the continuity correction, P(X ≥ 250) = P(X > 249.5)
Here x₁ = 249.5; we know that z = (x − μ)/σ
z₁ = (249.5 − 230)/11.1445 = 1.75 > 0
P(X > 249.5) = 0.5 − A(1.75)
= 0.5 − 0.4599
P(X ≥ 250) = 0.0401
Here n = 80, p is the probability of a correct choice, p = 1/4, q = 3/4
Mean = np = 80 × (1/4) = 20 > 5 and nq = 80 × (3/4) = 60 > 5
Variance = npq = 15
Now X ~ B(80, 1/4) is approximated by X ~ N(μ = 20, σ² = 15)
With the continuity correction, P(25 ≤ X ≤ 30) = P(24.5 < X < 30.5)
We know that z = (x − μ)/σ
z₁ = (24.5 − 20)/3.872 = 1.16 > 0 and z₂ = (30.5 − 20)/3.872 = 2.71 > 0
P(24.5 < X < 30.5) = A(2.71) − A(1.16)
= 0.4966 − 0.3770
P(25 ≤ X ≤ 30) = 0.1196
Given the average number of calls received by the hotel receptionist is 6 per hour.
For the period considered the mean is μ = λ = 24, i.e., X ~ Poisson(24), approximated by N(μ = 24, σ² = 24), σ = 4.899
With the continuity correction, P(X < 17) = P(X < 16.5)
We have z = (x − μ)/σ
z = (16.5 − 24)/4.899 = −1.53 < 0
Now P(X < x₁) = P(Z < z₁) = 0.5 − A(|z₁|)
P(X < 17) = P(X < 16.5) = 0.0630
(ii) Here μ = λ = 36, σ = 6
With the continuity correction, P(X > 31) = P(X > 31.5)
z = (31.5 − 36)/6 = −0.75 < 0
Now P(X > x₁) = P(Z > z₁) = 0.5 + A(|z₁|)
P(X > 31) = 0.7734
Solution: Let X be the number of buses that pass the central station.
Given the average number of buses that pass the central station is 25:
Mean μ = λ = 25, so σ = √25 = 5
With the continuity correction, P(X = 25) = P(24.5 < X < 25.5)
We know that z = (x − μ)/σ
z₁ = (24.5 − 25)/5 = −0.1 < 0 and z₂ = (25.5 − 25)/5 = 0.1 > 0
P(24.5 < X < 25.5) = A(−0.1) + A(0.1)
= A(0.1) + A(0.1)
= 2A(0.1)
= 2 × 0.0398
P(X = 25) = 0.0796
Mean μ = λ = 30, so σ = √30 = 5.477
The probability of the website receiving between 25 and 32 visitors inclusive in a given day:
i.e., P(25 ≤ X ≤ 32)
With the continuity correction, P(25 ≤ X ≤ 32) = P(24.5 < X < 32.5)
We know that z = (x − μ)/σ
z₁ = (24.5 − 30)/5.477 = −1.01 < 0 and z₂ = (32.5 − 30)/5.477 = 0.46 > 0
P(24.5 < X < 32.5) = A(−1.01) + A(0.46)
= A(1.01) + A(0.46)
= 0.3438 + 0.1772
P(25 ≤ X ≤ 32) = 0.521
Uniform distribution: A continuous random variable X is said to have a Uniform distribution over the interval (a, b) if its probability density function is given by
f(x) = k for a < x < b, and 0 otherwise.
We have ∫_a^b f(x) dx = 1
∫_a^b k dx = 1
k ∫_a^b 1 dx = 1
k [x]_a^b = 1
k(b − a) = 1
k = 1/(b − a)
Then the probability density function is f(x) = 1/(b − a) for a < x < b, and 0 otherwise.
The cumulative distribution function is F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
When x < a: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt = 0
When a ≤ x < b: F(x) = P(X ≤ x) = ∫_{−∞}^{a} f(t) dt + ∫_{a}^{x} f(t) dt
= 0 + (1/(b − a)) ∫_a^x 1 dt
= (1/(b − a)) [t]_a^x
F(x) = (x − a)/(b − a)
When x ≥ b: F(x) = ∫_{−∞}^{a} f(t) dt + ∫_{a}^{b} f(t) dt + ∫_{b}^{x} f(t) dt
= 0 + (1/(b − a)) ∫_a^b 1 dt + 0
= (1/(b − a)) [t]_a^b
F(x) = (b − a)/(b − a) = 1
Therefore, the cumulative distribution function is
F(x) = P(X ≤ x) = 0 if x < a; (x − a)/(b − a) if a ≤ x < b; 1 if x ≥ b
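The piecewise CDF just derived translates directly into code; a minimal sketch (the function name is my own):

```python
def uniform_cdf(x, a, b):
    """Piecewise CDF of the Uniform(a, b) distribution derived above."""
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

# example: X ~ Uniform(-2, 2), F(1) = (1 + 2)/4 = 0.75
p = uniform_cdf(1, -2, 2)
```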
Mean: μ = E(X) = ∫_a^b x (1/(b − a)) dx
= (1/(b − a)) ∫_a^b x dx
= (1/(b − a)) [x²/2]_a^b
= (1/(b − a)) (b² − a²)/2
= (1/(b − a)) (b − a)(b + a)/2
Mean μ = E(X) = (a + b)/2
Variance: σ² = ∫_a^b x² f(x) dx − μ²
= ∫_a^b x² (1/(b − a)) dx − μ²
= (1/(b − a)) ∫_a^b x² dx − μ²
= (1/(b − a)) [x³/3]_a^b − μ²
= (1/(b − a)) (b³ − a³)/3 − ((a + b)/2)²
= (1/(b − a)) (b − a)(b² + ab + a²)/3 − ((a + b)/2)²
= (b² + ab + a²)/3 − (a² + 2ab + b²)/4
= (a² − 2ab + b²)/12
Variance σ² = (b − a)²/12
(ii) P(|X| ≥ 1) (iii) P(2X + 3 ≥ 5)
Solution: Given X has a uniform distribution with f(x) = 1/4 for −2 < x < 2, and 0 otherwise.
The cumulative distribution function is F(x) = P(X ≤ x) = 0 if x < a; (x − a)/(b − a) if a ≤ x < b; 1 if x ≥ b
i.e., F(x) = 0 if x < −2; (x + 2)/4 if −2 ≤ x < 2; 1 if x ≥ 2
(i) P(X < 1) = ∫_{−∞}^{1} f(x) dx
= ∫_{−∞}^{−2} f(x) dx + ∫_{−2}^{1} f(x) dx
= 0 + (1/4) ∫_{−2}^{1} 1 dx
= (1/4) [x]_{−2}^{1}
P(X < 1) = (1/4)(1 + 2) = 3/4
(ii) P(|X| ≥ 1) = 1 − P(|X| < 1)
= 1 − P(−1 < X < 1)
= 1 − ∫_{−1}^{1} f(x) dx
= 1 − (1/4) ∫_{−1}^{1} 1 dx
= 1 − (1/4) [x]_{−1}^{1}
= 1 − (1/4)(1 + 1)
P(|X| ≥ 1) = 1 − 1/2 = 1/2
(iii) P(2X + 3 ≥ 5) = P(2X ≥ 2) = P(X ≥ 1)
= ∫_{1}^{∞} f(x) dx
= ∫_{1}^{2} f(x) dx + ∫_{2}^{∞} f(x) dx
= (1/4) ∫_{1}^{2} 1 dx + 0
= (1/4) [x]_{1}^{2}
= (1/4)(2 − 1)
P(2X + 3 ≥ 5) = 1/4
The probability density function is f(x) = 1/4 for −2 < x < 2, and 0 otherwise.
The cumulative distribution function is F(x) = P(X ≤ x) = 0 if x < a; (x − a)/(b − a) if a ≤ x < b; 1 if x ≥ b
i.e., F(x) = 0 if x < −2; (x + 2)/4 if −2 ≤ x < 2; 1 if x ≥ 2
(i) P(X < 0) = ∫_{−∞}^{0} f(x) dx
= ∫_{−∞}^{−2} f(x) dx + ∫_{−2}^{0} f(x) dx
= 0 + (1/4) ∫_{−2}^{0} 1 dx = 1/2
(ii) P(|X − 1| ≥ 1/2) = 1 − P(|X − 1| < 1/2)
= 1 − P(−1/2 < X − 1 < 1/2)
= 1 − P(1/2 < X < 3/2)
= 1 − ∫_{1/2}^{3/2} f(x) dx
= 1 − (1/4) ∫_{1/2}^{3/2} 1 dx
= 1 − (1/4) [x]_{1/2}^{3/2}
= 1 − (1/4)(3/2 − 1/2)
= 1 − (1/4)(1)
P(|X − 1| ≥ 1/2) = 3/4
(ii) P(−θ/2 < X < θ/2)
Solution: Given the cumulative distribution function
F(x) = 0 if x < −θ; (x + θ)/(2θ) if −θ ≤ x < θ; 1 if x ≥ θ
(i) We know that f(x) = (d/dx) F(x)
f(x) = 0 if x < −θ; 1/(2θ) if −θ ≤ x < θ; 0 if x ≥ θ
i.e., f(x) = 1/(2θ) for −θ < x < θ, and 0 elsewhere.
(ii) P(−θ/2 < X < θ/2) = F(θ/2) − F(−θ/2)
= (θ/2 + θ)/(2θ) − (−θ/2 + θ)/(2θ)
= (3θ/2)/(2θ) − (θ/2)/(2θ)
= 3/4 − 1/4
P(−θ/2 < X < θ/2) = 1/2
Problem: If X is a uniform variable on (−1, 1), find the probability density function, mean and variance.
(i) P(1 < X < 3) (ii) P(2 < X < 5.5) (iii) P(X > 2.9) (iv) P(X < 6.8) (v) P(6.2 < X < 8.6)
The probability density function is f(x) = 1/(b − a) for a < x < b, and 0 otherwise.
Here given T = b − a = 10, so f(x) = 1/10 for 0 < x < 10.
(i) P(1 < X < 3) = ∫_1^3 f(x) dx
= (1/10) ∫_1^3 1 dx
= (1/10) [x]_1^3
= (1/10)(3 − 1) = 1/5
P(1 < X < 3) = 0.2
(ii) P(2 < X < 5.5) = ∫_2^{5.5} (1/10) dx
= (1/10) [x]_2^{5.5}
= (1/10)(5.5 − 2) = 3.5/10
P(2 < X < 5.5) = 0.35
(iii) P(X > 2.9) = ∫_{2.9}^{10} f(x) dx
= (1/10) ∫_{2.9}^{10} 1 dx
= (1/10) [x]_{2.9}^{10}
= (1/10)(10 − 2.9) = 7.1/10
P(X > 2.9) = 0.71
(iv) P(X < 6.8) = ∫_0^{6.8} (1/10) dx
= (1/10) [x]_0^{6.8}
P(X < 6.8) = 0.68
(v) P(6.2 < X < 8.6) = ∫_{6.2}^{8.6} (1/10) dx
= (1/10) ∫_{6.2}^{8.6} 1 dx
= (1/10) [x]_{6.2}^{8.6}
= (1/10)(8.6 − 6.2) = 2.4/10 = 0.24
cumulative distribution function of the random variable Y. Also calculate the mean and variance of the random variable X.
The probability density function is f(x) = 1/(b − a) for a < x < b, and 0 otherwise.
i.e., f(x) = 1/6 for −3 < x < 3, and 0 otherwise.
The cumulative distribution function is F(x) = P(X ≤ x) = 0 if x < a; (x − a)/(b − a) if a ≤ x < b; 1 if x ≥ b
i.e., F(x) = 0 if x < −3; (x + 3)/6 if −3 ≤ x < 3; 1 if x ≥ 3
Given Y = X², i.e., X = ±√Y
−3 < X < 3 → 0 ≤ X² < 9, i.e., 0 ≤ Y < 9
F(y) = P(X² ≤ y)
F(y) = P(−√y ≤ X ≤ √y)
F(y) = ∫_{−√y}^{√y} f(x) dx
= (1/6) ∫_{−√y}^{√y} 1 dx
= (1/6) [x]_{−√y}^{√y}
= (1/6)(√y + √y)
F(y) = √y/3
The cumulative distribution function of Y is F(y) = P(Y ≤ y) = 0 if y < 0; √y/3 if 0 ≤ y < 9; 1 if y ≥ 9
Mean of X: μ = E(X) = ∫_{−∞}^{∞} x f(x) dx
= ∫_{−3}^{3} x (1/6) dx
= (1/6) ∫_{−3}^{3} x dx
[ Recall: ∫_{−a}^{a} f(x) dx = 2 ∫_0^a f(x) dx if f(x) is an even function, and 0 if f(x) is an odd function ]
Since x is odd over the symmetric interval, μ = (1/6)(0)
Mean of X: μ = 0
Variance of X: σ² = ∫_{−∞}^{∞} x² f(x) dx − μ²
= ∫_{−3}^{3} x² (1/6) dx − (0)²
= (1/6) ∫_{−3}^{3} x² dx
Since x² is even, σ² = (2/6) ∫_0^3 x² dx
= (1/3) [x³/3]_0^3
= (1/9)(27 − 0)
Variance of X: σ² = 3
Or, directly:
Variance σ² = (b − a)²/12 = (3 + 3)²/12 = 36/12 = 3
cumulative distribution function of the random variable Y. Also calculate the mean and variance of the random variable X.
The probability density function is f(x) = 1/(b − a) for a < x < b, and 0 otherwise.
i.e., f(x) = 1 for 0 < x < 1, and 0 otherwise.
The cumulative distribution function is F(x) = P(X ≤ x) = 0 if x < a; (x − a)/(b − a) if a ≤ x < b; 1 if x ≥ b
i.e., F(x) = 0 if x < 0; x if 0 ≤ x < 1; 1 if x ≥ 1
Given Y = −2 log X, 0 < X < 1
0 < X < 1 → log X < 0 → −2 log X > 0, i.e., 0 < Y < ∞
F(y) = P(−2 log X ≤ y)
F(y) = P(log X ≥ −y/2)
F(y) = P(X ≥ e^(−y/2))
F(y) = 1 − ∫_{−∞}^{e^(−y/2)} f(x) dx
= 1 − ∫_{0}^{e^(−y/2)} 1 dx
= 1 − [x]_0^{e^(−y/2)}
F(y) = 1 − e^(−y/2)
The cumulative distribution function is F(y) = 0 if y < 0; 1 − e^(−y/2) if 0 ≤ y < ∞
Mean of X: μ = E(X) = ∫_{−∞}^{∞} x f(x) dx
= ∫_0^1 x · 1 dx
= [x²/2]_0^1
μ = E(X) = 1/2 − 0 = 1/2
Variance of X: σ² = ∫_{−∞}^{∞} x² f(x) dx − μ²
= ∫_0^1 x² · 1 dx − (1/2)²
= [x³/3]_0^1 − 1/4
= (1/3 − 0) − 1/4
= 1/3 − 1/4
Variance of X: σ² = 1/12
Mean:
The mean of the exponential distribution is μ = E(X) = ∫_{−∞}^{∞} x f(x) dx
= ∫_0^∞ x λ e^(−λx) dx
= λ ∫_0^∞ x e^(−λx) dx
= λ [ x (e^(−λx)/(−λ)) − e^(−λx)/λ² ]_0^∞
= λ [ (0 − 0) − (0 − 1/λ²) ]
Mean μ = E(X) = 1/λ
Variance:
The variance of the exponential distribution is σ² = ∫_{−∞}^{∞} x² f(x) dx − μ²
= ∫_0^∞ x² λ e^(−λx) dx − (1/λ)²
= λ [ x² (e^(−λx)/(−λ)) − 2x (e^(−λx)/λ²) + 2 (e^(−λx)/(−λ³)) ]_0^∞ − 1/λ²
= λ [ (0 − 0 + 0) − (0 − 0 − 2/λ³) ] − 1/λ²
= 2/λ² − 1/λ²
σ² = 1/λ²
Cumulative distribution function:
When x < 0: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt = 0
When x ≥ 0: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
= ∫_{−∞}^{0} f(t) dt + ∫_{0}^{x} f(t) dt
= 0 + ∫_0^x λ e^(−λt) dt
= λ [e^(−λt)/(−λ)]_0^x
= −(e^(−λx) − e^0)
F(x) = 1 − e^(−λx)
The cumulative distribution function is F(x) = 0 for x < 0; 1 − e^(−λx) for x ≥ 0
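The exponential CDF and its tail are one-liners; a minimal sketch (the function name is my own):

```python
from math import exp

def exp_cdf(x, lam):
    """CDF of the exponential distribution: 1 - e^(-lam*x) for x >= 0, else 0."""
    return 0.0 if x < 0 else 1.0 - exp(-lam * x)

# with lam = 1/3, the tail probability P(X >= 3) = 1 - F(3) = e^-1 = 1/e
lam = 1 / 3
p_tail = 1 - exp_cdf(3, lam)
```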
Solution: Given the probability density function f(x) = a e^(−x/3) for x > 0, and 0 elsewhere.
(i) We have ∫_{−∞}^{∞} f(x) dx = 1
∫_{−∞}^{0} f(x) dx + ∫_{0}^{∞} f(x) dx = 1
0 + ∫_0^∞ a e^(−x/3) dx = 1
a [e^(−x/3)/(−1/3)]_0^∞ = 1
−3a (e^(−∞) − e^0) = 1
−3a (0 − 1) = 1
3a = 1
a = 1/3
The probability density function is f(x) = (1/3) e^(−x/3) for x > 0, and 0 elsewhere.
(ii) P(X ≥ 3) = ∫_3^∞ f(x) dx
= (1/3) ∫_3^∞ e^(−x/3) dx
= (1/3) [e^(−x/3)/(−1/3)]_3^∞
= −(e^(−∞) − e^(−1))
= −(0 − 1/e)
P(X ≥ 3) = 1/e
(iii) P(1 ≤ X ≤ 5) = ∫_1^5 f(x) dx
= (1/3) ∫_1^5 e^(−x/3) dx
= (1/3) [e^(−x/3)/(−1/3)]_1^5
= −(e^(−5/3) − e^(−1/3))
P(1 ≤ X ≤ 5) = e^(−1/3) − e^(−5/3)
Solution: Given parameter λ = 1/3.
The probability density function is f(x) = λ e^(−λx) for x ≥ 0, and 0 elsewhere,
i.e., f(x) = (1/3) e^(−x/3) for x ≥ 0, and 0 elsewhere.
P(X > 3) = ∫_3^∞ f(x) dx
P(X > 3) = (1/3) ∫_3^∞ e^(−x/3) dx
P(X > 3) = (1/3) [ e^(−x/3)/(−1/3) ]_3^∞
P(X > 3) = −(e^(−∞) − e^(−1))
P(X > 3) = −(0 − 1/e)
P(X > 3) = 1/e
Solution: Given mean = 1/λ = 6, so λ = 1/6.
The probability density function is f(x) = λ e^(−λx) for x ≥ 0, and 0 elsewhere,
i.e., f(x) = (1/6) e^(−x/6) for x ≥ 0, and 0 elsewhere.
(i) P(X > 8) = ∫_8^∞ f(x) dx
P(X > 8) = (1/6) ∫_8^∞ e^(−x/6) dx
P(X > 8) = (1/6) [ e^(−x/6)/(−1/6) ]_8^∞
P(X > 8) = −(e^(−∞) − e^(−8/6))
P(X > 8) = −(0 − e^(−4/3))
P(X > 8) = e^(−4/3)
(ii) P(4 < X < 8) = (1/6) ∫_4^8 e^(−x/6) dx
P(4 < X < 8) = (1/6) [ e^(−x/6)/(−1/6) ]_4^8
P(4 < X < 8) = −(e^(−8/6) − e^(−4/6))
P(4 < X < 8) = e^(−2/3) − e^(−4/3)
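All four exponential probabilities above follow from the CDF F(x) = 1 − e^(−λx); a small plain-Python check (ours, using nothing beyond the formulas already derived):

```python
import math

def exp_cdf(x, lam):
    """CDF of the exponential distribution: F(x) = 1 - e^(-lam*x) for x >= 0."""
    return 1 - math.exp(-lam * x)

# lambda = 1/3 example: P(X > 3) and P(1 < X < 5)
p_gt_3 = 1 - exp_cdf(3, 1 / 3)                      # e^(-1)
p_1_to_5 = exp_cdf(5, 1 / 3) - exp_cdf(1, 1 / 3)    # e^(-1/3) - e^(-5/3)

# mean = 6 example: P(X > 8) and P(4 < X < 8)
p_gt_8 = 1 - exp_cdf(8, 1 / 6)                      # e^(-4/3)
p_4_to_8 = exp_cdf(8, 1 / 6) - exp_cdf(4, 1 / 6)    # e^(-2/3) - e^(-4/3)
```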
Problem: Assume that the time between arrivals of customers at a particular bank is exponentially
distributed with a mean of 4 minutes.
(i) Find the probability that the time between arrivals is greater than 5 minutes.
(ii) Find the probability that the time between arrivals is between 1 and 4 minutes.
Unit – III
(State-transition diagram for the three states F, H and N, with the probabilities tabulated in the
matrix below.)
Solution: Here we have three states namely full credit (F), half-credit (H) and no credit (N).
So, the transition probability matrix is of order 3 3.
From / To Full Credit (F) Half Credit (H) No Credit (N)
Full Credit (F) 0.30 0.25 0.45
Half Credit (H) 0.45 0.40 0.15
No Credit (N) 0.65 0.25 0.10
Solution: Given
From/To Brand A Brand B
Brand A 0.8 0.2
Brand B 0.6 0.4
The Markov process diagram: state A has a self-loop with probability 0.8 and an arrow 0.2 to B;
state B has a self-loop 0.4 and an arrow 0.6 to A.
Path: Let ‘i’ and ‘j’ be two states in the chain, a path from i to j is a sequence of transitions
that begin in i and end in j, such that each transition in the sequence has a positive probability
of occurring.
Communicate: Two states i and j are said to communicate if j is reachable from i and i is
reachable from j. It is denoted by i ↔ j.
Eg: In the given Markov chain every state communicates with the others, and every state
communicates with itself, i.e., i ↔ i.
Symmetric: If state i communicates with state j, then state j communicates with state i.
i.e., i j and j i
Transitive: If state i communicates with state j, and state j communicates with state k, then
state i communicates with state k
Communicating class: Two states that communicate with each other are in the same class. A
state that communicates with no other state forms a class by itself.
i.e., C ( i ) = j S / i j
Two states are said to be in the same class if the two states communicate with each other, that
is i ↔ j, then i and j are in same class.
Thus, all states in a Markov chain can be partitioned into disjoint classes
Closed Set: A set of states S in a Markov chain is a closed set if no state outside of S is
reachable from any state in S.
Absorbing State: A state i is an absorbing state if p_ii = 1; an absorbing state is also a closed
set containing only one state.
Once we enter an absorbing state, we never leave it.
Eg: Suppose a person has 1 coin in a game; on each play he gains a coin or loses a coin, until
he either loses everything or reaches 4 coins, at which point he quits the game.
If he reaches 0 coins he quits the game, and if he reaches 4 coins he quits the game, so the
states 0 and 4 are absorbing.
Transient: A state in a Markov chain is transient if, once the chain leaves it, there is a nonzero
probability of never returning to it.
i.e., a state i is transient if there exists a state j that is reachable from i, but the state i is
not reachable from j.
Eg: Consider the chain on states 1, 2, 3, 4, 5 in which states 1 and 5 are absorbing and each of
states 2, 3, 4 moves to its left or right neighbour with probability 0.5.
In this Markov chain, we can go from state 2 to state 1 or 5, but not from 1 to 2 or 5 to 2.
Similarly,
we can go from state 3 to 1 or 5, but not from 1 to 3 or 5 to 3, and
we can go from state 4 to 1 or 5, but not from 1 to 4 or 5 to 4.
Hence the states 2, 3 and 4 are transient.
Periodic: A state i is periodic with period k > 1 if k is the smallest number such that all paths
leading from state i back to state i have lengths that are multiples of k.
A state which is not periodic is called aperiodic.
In diagram A the length of the shortest path that takes us out of state 1 and brings us back is
LS = k = 2, i.e., 1 to 2 and 2 to 1.
The length of the other path that can take us out of 1 and let us go back to 1
is LO = 4, i.e., 1 to 2, 2 to 3, 3 to 2, 2 to 1.
Clearly the lengths of the other paths are multiples of the shortest path, so state 1 is periodic
with period 2.
In diagram B the length of the shortest path that can take us out of state 1 and bring
us back is again LS = k = 2, i.e., 1 to 2 and 2 to 1.
The length of the other path that can take us out of 1 and let us go back to 1
is LO = 3, i.e., 1 to 2, 2 to 3, 3 to 1.
Since 3 is not a multiple of 2, state 1 in diagram B is aperiodic.
Irreducible: A Markov chain is irreducible if it is possible to go from any state in the chain to
any other state, with positive probability, in a finite number of steps.
Ergodicity: A Markov chain is ergodic if it is irreducible and every state is aperiodic, so that
all states communicate with each other.
Regular chain: A Markov chain is called a regular chain if some power of the transition matrix
has only positive elements. i.e., strictly greater than zero.
Note: There is a theorem that says that if an n × n transition matrix represents n states, then we
need only examine powers T^m up to m = (n − 1)² + 1.
Solution: Let the given matrix be

        0.6  0    0.4
A  =    0.2  0.4  0.4
        0    0    1

        0.36  0     0.64
A² =    0.20  0.16  0.64 , …
        0     0     1

        0.07776  0        0.92224
A⁵ =    0.06752  0.01024  0.92224
        0        0        1

Here the entries of A⁵ are not all positive (the second column still contains zeros), so the
matrix is not regular.
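The power computation can be automated; a minimal plain-Python sketch (the helper `mat_mul` is our own, not from the notes):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[0.6, 0.0, 0.4],
     [0.2, 0.4, 0.4],
     [0.0, 0.0, 1.0]]

P = A
for _ in range(4):          # P runs through A², A³, A⁴, A⁵
    P = mat_mul(P, A)

is_regular_by_A5 = all(entry > 0 for row in P for entry in row)
# is_regular_by_A5 is False: the (1,2) and (3,2) entries stay 0 in every power,
# so no power of A is strictly positive and A is not regular.
```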
Examine whether the following transition probability matrices are regular:

        0.4  0.6             0.5  0.5   0     0                0.25  0.5  0.25
(i)                   (ii)   0.5  0.5   0     0         (iii)  0.6   0.4  0
        0.6  0.4             0    0.25  0.75  0                0     0.6  0.4
                             0    0     0.25  0.75

        0    0.8  0.2
(iv)    0.3  0.7  0
        0.4  0.5  0.1
Let X0, X1, X2, … be a Markov chain with n × n transition probability matrix P.
Transition probability
One step time period
The probability of moving from one state of a system into another state is the transition
probability. In a Markov chain, p_ij is the transition probability of going from the ith state to
the jth state in one time step,
i.e., p_ij = P(X_{n+1} = j | X_n = i)
For example, P(X2 = 2 | X1 = 1) = p12 and P(X2 = 3 | X1 = 2) = p23 are one step transition
probabilities.
The probability of moving from state i to state j in a 2-step time period, denoted by p_ij^(2), is
defined as p_ij^(2) = P(X_{n+2} = j | X_n = i)
For example, P(X2 = 2 | X0 = 3) = p32^(2) and P(X3 = 3 | X1 = 2) = p23^(2) are two step
transition probabilities.
The probability of moving from state i to state j in an n-step time period, denoted by p_ij^(n), is
defined as p_ij^(n) = P(X_{m+n} = j | X_m = i)
Note: In the Markov process, two terms States and Transition Probability play a key role.
…
S_n is the probability distribution over the states after the nth time period.
Note: P(X_n = a) = S_n(a).
The probability that he changes from going by train to driving in exactly four days is given by
P(X4 = C | X0 = T) = p_TC^(4)    [using P(X_{n+t} = j | X_n = i) = p_ij^(t)]

        0.375   0.625
P⁴ =
        0.3125  0.6875

P(X4 = C | X0 = T) = p_TC^(4) = 0.625
(i) Find the probability that the subcontract is switched from A to B after a two year period.
(ii) Find the probability that the subcontract is switched from B to C after a two year period.
(iii) Find the probability that subcontractor C will be able to retain its customers after a two
year period.
(i) The probability that the subcontract from A will be shifted to subcontractor B after two
years is given by P(X2 = B | X0 = A) = p_AB^(2) = 0.33.
There is a 33% chance that the subcontract from A will be shifted to subcontractor B after the
two year period.
(ii) The probability that the subcontract from B will be shifted to subcontractor C after two
years is given by P(X2 = C | X0 = B) = p_BC^(2) = 0.16.
(iii) The probability that subcontractor C will be able to retain its customers after two years is
given by P(X2 = C | X0 = C) = p_CC^(2) = 0.15.
There is a 15% chance that subcontractor C will be able to retain its customers after the two
year period.
X n , ( n 1) denotes the child who had the ball after n throws. Determine the transition
and the probability that the child who had originally the ball will have it after 2 throws.
Solution: Given that at each stage the child having the ball is equally likely to throw it into
any one of the other two children.
The transition probability matrix P is
                 To
             1    2    3
        1    0    0.5  0.5
From    2    0.5  0    0.5
        3    0.5  0.5  0

             1     2     3
        1    0.5   0.25  0.25
P² =    2    0.25  0.5   0.25
        3    0.25  0.25  0.5

The probability that the child who originally had the ball will have it after 2 throws is
P(X2 = 1 | X0 = 1) = p11^(2) = 0.5
P(X2 = 2 | X0 = 3) = p32^(2) = 0.25
P(X2 = 3 | X0 = 2) = p23^(2) = 0.25
If child 1 has the ball initially, then the initial probability vector is S0 = (1  0  0).
If child 2 has the ball initially, then the initial probability vector is S0 = (0  1  0).
If child 3 has the ball initially, then the initial probability vector is S0 = (0  0  1).
(ii) Calculate the number of students who do mathematics work and Physics work for the next
subsequent two study periods.
Solution: The transition probability matrix is

        0.8  0.2
P  =
        0.7  0.3

        0.78  0.22
P² =
        0.77  0.23

We have S2 = S0 P²    [S_n = S0 P^n]

                  0.78  0.22
S2 = (60  40)                  = (77.6  22.4)
                  0.77  0.23
After two study periods there will be approximately 78 students do Mathematics and 22
students do Physics.
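The computation S2 = S0 P² can be reproduced directly (plain-Python sketch; `mat_mul` is our hypothetical helper, not from the notes):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P = [[0.8, 0.2],
     [0.7, 0.3]]
P2 = mat_mul(P, P)                # [[0.78, 0.22], [0.77, 0.23]]
S2 = mat_mul([[60, 40]], P2)[0]   # S2 = S0 · P² = [77.6, 22.4]
```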
(ii) If the prices of the share decreased today, what are the chances that it will increase
tomorrow?
(iii) If the price of the share remained unchanged today, what are the chances that it will
increase, decrease or remain unchanged day after tomorrow?
(i) If the price of the share increased today, then the initial probability vector is S0 = (1  0  0).
Now S1 = S0 P    [S_n = S0 P^n]
Thus, the chances that the price will increase, decrease or remain unchanged tomorrow are 50%,
20% and 30%.
(ii) If the price of the share decreased today, then the initial probability vector is S0 = (0  1  0).
Now S1 = S0 P    [S_n = S0 P^n]
Thus, the chance that the price will increase tomorrow is 70%.
(iii) If the price of the share remained unchanged today, then the initial probability vector is
S0 = (0  0  1).
The distribution for the day after tomorrow is S2 = S0 P²    [S_n = S0 P^n]
Thus, the chances that the price will increase, decrease or remain unchanged the day after
tomorrow are 59%, 18% and 23%.
Solution: In the given matrix each element value between 0 and 1 and the sum of elements in
each row is exactly one. Thus, the given matrix is a transition matrix.
(a) Each row of the matrix can be used to interpret the monthly retention and loss by each
manufacturer.
Row 1: According to the data of row 1, in the coming month manufacturer X will retain 70%
of its customers but lose 10% to Y and 20% to Z.
i.e., given that a customer purchased manufacturer X's product in the past month, the
probability of a repeat purchase of that product from manufacturer X is 0.7, the probability
of a switch to manufacturer Y's product is 0.1, and to manufacturer Z's product is 0.2.
Row 2: According to the data of row 2, in the coming month manufacturer Y will lose 10% of
its previous month's customers to manufacturer X, retain 80% of them itself, and lose
10% to Z.
Row 3: According to the data of row 3, in the coming month manufacturer Z will retain 70%
of its customers but lose 20% to X and 10% to Y.
(b) Similarly, the columns can be used to study the retention and gains of each manufacturer.
Column 1: The data of column 1 reveal that manufacturer X will retain 70% of last month's
customers and also gain 10% of Y's customers and 20% of Z's customers.
Column 2: The data of column 2 reveal that manufacturer Y will gain 10% of X's customers,
10% of Z's customers and retain 80% of its own.
Column 3: The data of column 3 reveal that manufacturer Z will gain 20% of X's customers,
10% of Y's customers and retain 70% of its own.
(i) Construct and interpret the state transition matrix in terms of (a) retention and loss.
(ii) Calculate the probability of Mr. Arjun purchasing brand A car at the end of second period,
third period. Draw the transition probability diagrams and the transition trees.
(iii) Calculate the probability of Mr. Arjun purchasing brand B car at the end of second period,
third period.
Solution:
(i) Let state 1 be the condition of owning a brand A car and state 2 that of owning a brand B
car. The transition probabilities are then p11 = 0.8, p12 = 0.2, p21 = 0.6 and p22 = 0.4.
The state-transition matrix P is

        0.8  0.2
P  =
        0.6  0.4
p11 = 0.8 means that Mr. Arjun, now using a brand A car, will again purchase a brand A car at
the next purchase; p12 = 0.2 means that he will switch over to a brand B car at the next purchase.
p21 = 0.6 shows that Mr. Arjun, now using a brand B car, will switch over to a brand A car at
the next purchase; p22 = 0.4 means that he will again purchase a brand B car.
(Transition probability diagram: A has a self-loop with probability 0.8 and an arrow 0.2 to B;
B has a self-loop 0.4 and an arrow 0.6 to A. The transition trees for the second and third
purchases branch from each current purchase with these same probabilities.)
                0.8  0.2
S1 = (1  0)               = (0.8  0.2)
                0.6  0.4

                0.76  0.24
S2 = (1  0)                = (0.76  0.24)
                0.72  0.28

                0.752  0.248
S3 = (1  0)                  = (0.752  0.248)
                0.744  0.256

(iii) Starting with brand B:

                0.8  0.2
S1 = (0  1)               = (0.6  0.4)
                0.6  0.4

                0.76  0.24
S2 = (0  1)                = (0.72  0.28)
                0.72  0.28

                0.752  0.248
S3 = (0  1)                  = (0.744  0.256)
                0.744  0.256
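The three state vectors can be generated by repeatedly multiplying by P (a sketch; it starts from brand A, i.e., S0 = (1, 0)):

```python
P = [[0.8, 0.2],
     [0.6, 0.4]]
S = [1.0, 0.0]           # S0: Mr. Arjun currently owns a brand A car
history = []
for _ in range(3):       # compute S1, S2, S3 in turn
    S = [S[0] * P[0][0] + S[1] * P[1][0],
         S[0] * P[0][1] + S[1] * P[1][1]]
    history.append(S)
# history → [[0.8, 0.2], [0.76, 0.24], [0.752, 0.248]]
```

Starting instead from S0 = (0, 1) reproduces the brand-B sequence.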
(i) Express this information as a transition probability matrix and determine the probabilities of
bike being at a particular station after two days.
(ii) Suppose when we start observing the bike share program, 30% of the bikes are at station A,
45% of the bikes are at station B and 25% are at station C, determine the distribution of bikes at
the end of the next day and after two days.
Solution:
(i) The given Markov chain has states A, B and C with the stated transition probabilities.
We have S1 = S0 P and S2 = S0 P²    [S_n = S0 P^n]
This means that 70% chance that a person uses car to go to work at the first day.
20% chance that a person uses bus to go to work at the first day.
10% chance that a person uses train to go to work at the first day.
Now S2 = S0 P²    [S_n = S0 P^n]
i.e., the probability that on the second day a man uses the car to go to work is
P(X2 = 1) = S2(1) = 0.385 or 38.5%    [P(X_n = a) = S_n(a)]
and the probability that on the second day a man uses the train to go to work is
P(X2 = 3) = S2(3) = 0.279 or 27.9%
We have S2 = S0 P²    [S_n = S0 P^n]
P(X2 = A) = S2(A) = 0.420
P(X2 = B) = S2(B) = 0.312
P(X2 = C) = S2(C) = 0.268
The expected distribution of consumers two time periods later is (0.420  0.312  0.268).
(i) P(X2 = 1) and P(X2 = 3)
We have S2 = S0 P²    [S_n = S0 P^n]
P(X2 = 1) = S2(1) = 0.354
P(X2 = 3) = S2(3) = 0.286
(ii) P(X3 = 2, X2 = 3, X1 = 3, X0 = 2)
(ii) P(X3 = 2, X1 = 0, X0 = 2)  (iii) P(X2 = 2)

                              0    1    2
                         0    0.2  0.3  0.5
Solution: Given TPM is P =
                         1    0.1  0.6  0.3
                         2    0.4  0.3  0.3

(i) P(X3 = 2, X2 = 1, X1 = 0, X0 = 2) = P(X0 = 2) p20 p01 p12 = 0.0072
(ii) P(X3 = 2, X1 = 0, X0 = 2)
(iii) P(X2 = 2)
We have S2 = S0 P²    [S_n = S0 P^n]
P(X2 = 2) = S2(2) = 0.342    [P(X_n = a) = S_n(a)]
and the initial probability vector is (0.6  0.3  0.1); then evaluate (i) P(X3 = 3, X1 = 1)
(ii) P(X2 = 3)

                              1    2    3
                         1    0.1  0.4  0.5
Solution: Given TPM is P =
                         2    0.2  0.4  0.4
                         3    0.3  0.4  0.3

(i) P(X3 = 3, X1 = 1)
(ii) P(X2 = 3)
We have S2 = S0 P²    [S_n = S0 P^n]
P(X2 = 3) = S2(3) = 0.37
Solution: Given the transition probability matrix

        3/4  1/4  0
P  =    1/4  1/2  1/4
        0    3/4  1/4

We have S2 = S0 P²    [S_n = S0 P^n]

                              5/8   5/16  1/16
S2 = (1/3  1/3  1/3)          5/16  1/2   3/16   = (3/8  11/24  1/6)
                              3/16  9/16  1/4

(i) P(X2 = 2)
P(X2 = 2) = S2(2) = 11/24
(ii) P(X2 = 2, X1 = 1, X0 = 2)
P(X2 = 2, X1 = 1, X0 = 2) = P(X0 = 2) p21 p12 = (1/3)(1/4)(1/4) = 1/48
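Exact arithmetic reproduces these fractions; a sketch using Python's `fractions` module:

```python
from fractions import Fraction as F

P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(3, 4), F(1, 4)]]
S0 = [F(1, 3), F(1, 3), F(1, 3)]

def step(S, P):
    """One time period: S · P."""
    return [sum(S[i] * P[i][j] for i in range(3)) for j in range(3)]

S2 = step(step(S0, P), P)              # (3/8, 11/24, 1/6)
p_path = S0[1] * P[1][0] * P[0][1]     # P(X0=2) · p21 · p12 = 1/48
```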
Problem: Given that a person's last cola purchase was Coke, there is a 90% chance that his next
cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80%
chance that his next cola purchase will also be Pepsi. The present market shares of Coke and
Pepsi are 55% and 45% respectively. Construct the TPM. In the long run, what is the market
share of each cola?
Solution: The transition probability matrix is

        0.90  0.10
P  =
        0.20  0.80

In the steady state (p1  p2) P = (p1  p2), i.e.,

                 0.90  0.10
(p1  p2)                      = (p1  p2)
                 0.20  0.80

0.90 p1 + 0.20 p2 = p1  →  p1 = 2 p2 → (1)
0.10 p1 + 0.80 p2 = p2 → (2)
p1 + p2 = 1 → (3)
By solving (1) and (3), we get p1 = 2/3, p2 = 1/3.
Hence in the long run, the market shares of Coke and Pepsi will be 66.67% and 33.33%
respectively.
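The steady-state vector can also be found by iterating the chain until it converges (a sketch; the starting vector and iteration count are arbitrary choices):

```python
P = [[0.90, 0.10],
     [0.20, 0.80]]
pi = [0.5, 0.5]                  # any starting distribution works
for _ in range(200):             # the second eigenvalue is 0.7, so this converges fast
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]
# pi → [2/3, 1/3]
```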
Solution: The transition probabilities for this Markov chain with three states are as follows
The transition probability matrix is

        0.6  0.4  0
P  =    0.1  0.6  0.3
        0    0.2  0.8

In the steady state,

                          0.6  0.4  0
(p1  p2  p3)              0.1  0.6  0.3    = (p1  p2  p3)
                          0    0.2  0.8

0.6 p1 + 0.1 p2 = p1  →  p2 = 4 p1 → (1)
0.4 p1 + 0.6 p2 + 0.2 p3 = p2 → (2)
0.3 p2 + 0.8 p3 = p3  →  p3 = (3/2) p2 = 6 p1 → (3)
p1 + p2 + p3 = 1 → (4)
By solving (1), (3) and (4), we get p1 = 1/11, p2 = 4/11, p3 = 6/11.
Thus, the limiting fraction of drivers in the POOR category is 1/11, in the SATISFACTORY
category is 4/11, and in the PREFERRED category is 6/11.
The present market share of the three brands is S0 = 0.60 0.30 0.10
p1 + p2 + p3 = 1 → (4)
Problem: A hospital operates on a charity basis. All expenses are paid by the government. Of
late, the board of governors of the hospital has been complaining about the size of the budget
and insisting that the hospital cut expenses. The major area of concern has been cost of keeping
patients in the intensive care unit (ICU). The cost has averaged Rs. 1000 per week per person
compared to only Rs. 500 per week per person for keeping patients in the WARDS.
Past history shows that of the patients in the ICU at the beginning of the week, 50% will remain
there at the end of the week, 50% will be moved to a WARD. Of the patients in the WARDS
at the beginning of the week, 50% will remain there at the end of the week, 10% will get worse
and be transferred to the ICU, and 40% will become OUTPATIENTS. Of the persons who are
OUTPATIENTS at the beginning of the week, 85% will remain OUTPATIENTS at the end
of the week, 10% will be admitted to the WARDS, and 5% will be admitted to the ICU. Find the
expected cost in steady state.
p1 + p2 + p3 = 1 → (4)
                 Tomorrow
             1    2    3    4
        1    0    3/4  1/4  0
Today   2    0    1/2  1/2  0
        3    0    0    1/2  1/2
        4    1    0    0    0
If it costs Rs. 1,250 to overhaul a machine (including lost time), on the average, and Rs. 750 in
production is lost if a machine is found inoperative. Using steady state probabilities, compute
the expected per day cost of maintenance.
Solution: The given matrix P represents an ergodic, regular Markov process, so it will certainly
reach a steady-state equilibrium.
Let the steady state probabilities p1, p2, p3 and p4 represent the proportion of times that the
machine is in each of the four states. Then (p1 p2 p3 p4) P = (p1 p2 p3 p4), i.e.,

                          0    3/4  1/4  0
                          0    1/2  1/2  0
(p1  p2  p3  p4)          0    0    1/2  1/2     = (p1  p2  p3  p4)
                          1    0    0    0

This implies p1 = p4 → (1)
(3/4) p1 + (1/2) p2 = p2
(3/4) p1 = (1/2) p2
p2 = (3/2) p1 = (3/2) p4 → (2)
(1/4) p1 + (1/2) p2 + (1/2) p3 = p3, i.e., (1/4) p1 + (1/2) p2 − (1/2) p3 = 0 → (3)
so p3 = (1/2) p1 + p2 = (1/2) p4 + (3/2) p4, i.e.,
p3 = 2 p4 → (4) and
p1 + p2 + p3 + p4 = 1 → (5)
Substituting (1), (2) and (4) in (5):
p4 + (3/2) p4 + 2 p4 + p4 = 1
(11/2) p4 = 1
p4 = 2/11
p1 = 2/11;  p2 = 3/11;  p3 = 4/11
Thus, on an average, 2 out of every 11 days the machine will be overhauled; 3 out of every
11 days it will be in good condition; 4 out of every 11 days it will be in fair condition; and 2
out of every 11 days it will be found inoperative at the end of the days.
Hence the average cost per day of maintenance will be p1 × 1250 + p4 × 750
i.e., (2/11) × 1250 + (2/11) × 750 = Rs. 363.6
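The steady-state equations and the cost can be verified exactly (a sketch using Python's `fractions` module):

```python
from fractions import Fraction as F

P = [[F(0), F(3, 4), F(1, 4), F(0)],
     [F(0), F(1, 2), F(1, 2), F(0)],
     [F(0), F(0),    F(1, 2), F(1, 2)],
     [F(1), F(0),    F(0),    F(0)]]
p = [F(2, 11), F(3, 11), F(4, 11), F(2, 11)]

pP = [sum(p[i] * P[i][j] for i in range(4)) for j in range(4)]
assert pP == p                          # p is indeed the steady state
cost = 1250 * p[0] + 750 * p[3]         # 4000/11 ≈ Rs. 363.6 per day
```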
Unit – IV
The probability density function of the univariate normal distribution (p = 1 variable) is given
by f(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²) for −∞ < x < ∞.
The probability density function of the bivariate normal distribution (p = 2 variables) is given
by
f_XY(x, y) = (1/(2π σx σy √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ ((x − μx)/σx)² − 2ρ ((x − μx)/σx)((y − μy)/σy) + ((y − μy)/σy)² ] }
where −∞ < x, y < ∞; σx, σy > 0 and ρ ∈ (−1, 1).
Two random variables X and Y are said to have the standard bivariate normal distribution with
correlation coefficient ρ ∈ (−1, 1) if their joint probability density function is given by
f_XY(x, y) = (1/(2π √(1 − ρ²))) exp{ −(x² − 2ρxy + y²)/(2(1 − ρ²)) }
where μx = μy = 0 and σx = σy = 1.
The exponent term, the squared statistical distance between x and μ in standard deviation
units, is
((x − μ)/σ)² = (x − μ)(σ²)⁻¹(x − μ)
Generalizing to p > 1 variables, with X of order p×1 and parameters μ (p×1) and Σ (p×p), the
exponent becomes (X − μ)ᵀ Σ⁻¹ (X − μ).
The integral ∫…∫ exp{ −(1/2)(X − μ)ᵀ Σ⁻¹ (X − μ) } dx1 … dxp = (2π)^(p/2) |Σ|^(1/2)
This implies that the multivariate normal probability density function is given by
f(X) = (1/((2π)^(p/2) |Σ|^(1/2))) e^(−(1/2)(X − μ)ᵀ Σ⁻¹ (X − μ))
where −∞ < xi < ∞ and i = 1, 2, …, p.
It is denoted by X ~ N_p(μ, Σ).

          x1            μ1              σ11  σ12  …  σ1p
          x2            μ2              σ21  σ22  …  σ2p
Here X =  .  ,   μ =    .  ,   Σ =      .    .        .
          .             .               .    .        .
          xp            μp              σp1  σp2  …  σpp
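The density above can be transcribed directly (a sketch using NumPy; the function name `mvn_pdf` is ours, not from the notes; for p = 1 it reduces to the univariate formula, which the check uses):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_p(mu, Sigma): exp(-0.5 d' Sigma^-1 d) / ((2π)^(p/2) |Sigma|^(1/2))."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    Sigma = np.asarray(Sigma, float)
    p = len(d)
    expo = -0.5 * d @ np.linalg.inv(Sigma) @ d
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(expo) / norm)
```

For p = 1, μ = 0, σ² = 1 this returns the standard normal density 1/√(2π) ≈ 0.3989 at x = 0.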
01. Joint Density: The multivariate normal distribution MN(μ, Σ) has joint density
f_Y(y | μ, Σ) = (1/((2π)^(n/2) |Σ|^(1/2))) e^(−(1/2)(y − μ)ᵀ Σ⁻¹ (y − μ))
04. Moment Generating Function: The MN(μ, Σ) has moment generating function given by
M(t) = e^(tᵀμ + (1/2) tᵀ Σ t)
where t is an n×1 real vector    [M_X(t) = E(e^(tᵀX))]
Linear Transformation: If X ~ MN(μ, Σ), then any linear transformation Y = AX + b is also
normally distributed.
Marginal distribution
Let X and Y be two discrete random variables with joint probability mass function
P_XY(x, y). Then
the marginal distribution of X is P_X(x) = Σ_y P_XY(x, y), and
the marginal distribution of Y is P_Y(y) = Σ_x P_XY(x, y).
Here P_X(x) ≥ 0, P_Y(y) ≥ 0 and Σ_x P_X(x) = 1, Σ_y P_Y(y) = 1.
The conditional probability mass function of X given Y = y is
P_X/Y(x/y) = P_XY(x, y)/P_Y(y), or P_X/Y(x/y) = P(X = x, Y = y)/P(Y = y).
The conditional probability mass function of Y given X = x is
P_Y/X(y/x) = P_XY(x, y)/P_X(x), or P_Y/X(y/x) = P(X = x, Y = y)/P(X = x).
Here P_X/Y(x/y) ≥ 0, P_Y/X(y/x) ≥ 0 and
Σ_x P_X/Y(x/y) = Σ_x P_XY(x, y)/P_Y(y) = P_Y(y)/P_Y(y) = 1;
Σ_y P_Y/X(y/x) = Σ_y P_XY(x, y)/P_X(x) = P_X(x)/P_X(x) = 1.
If X and Y are independent, P_XY(x, y) = P_X(x) P_Y(y), and then
P_X/Y(x/y) = P_XY(x, y)/P_Y(y) = P_X(x) P_Y(y)/P_Y(y) = P_X(x) and
P_Y/X(y/x) = P_XY(x, y)/P_X(x) = P_X(x) P_Y(y)/P_X(x) = P_Y(y).
Also, E(aX + bY) = aE(X) + bE(Y) always, and for independent X and Y,
V(aX + bY) = a²V(X) + b²V(Y).
  X \ Y      1       2      3      4     Total
    1       4/36    3/36   2/36   1/36   10/36
    2       1/36    3/36   3/36   2/36    9/36
    3       5/36    1/36   1/36   1/36    8/36
    4       1/36    2/36   1/36   5/36    9/36
  Total    11/36    9/36   7/36   9/36     1
and the conditional distribution of Y given X = 2.
Marginal distribution of X is P(X = x) = Σ_y P(X = x, Y = y)
i.e., P ( X = 1) = P ( X = 1, Y = 1) + P ( X = 1, Y = 2 ) + P ( X = 1, Y = 3) + P ( X = 1, Y = 4 )
P(X = 1) = 4/36 + 3/36 + 2/36 + 1/36 = 10/36
P(X = 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) + P(X = 2, Y = 3) + P(X = 2, Y = 4)
P(X = 2) = 1/36 + 3/36 + 3/36 + 2/36 = 9/36
P(X = 3) = P(X = 3, Y = 1) + P(X = 3, Y = 2) + P(X = 3, Y = 3) + P(X = 3, Y = 4)
P(X = 3) = 5/36 + 1/36 + 1/36 + 1/36 = 8/36
P(X = 4) = 1/36 + 2/36 + 1/36 + 5/36 = 9/36

   X      1       2      3      4
 P(X)   10/36   9/36   8/36   9/36
Marginal distribution of Y is P(Y = y) = Σ_x P(X = x, Y = y)
i.e., P(Y = 1) = P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 3, Y = 1) + P(X = 4, Y = 1)
P(Y = 1) = 4/36 + 1/36 + 5/36 + 1/36 = 11/36
P(Y = 2) = P(X = 1, Y = 2) + P(X = 2, Y = 2) + P(X = 3, Y = 2) + P(X = 4, Y = 2)
P(Y = 2) = 3/36 + 3/36 + 1/36 + 2/36 = 9/36
P(Y = 3) = P(X = 1, Y = 3) + P(X = 2, Y = 3) + P(X = 3, Y = 3) + P(X = 4, Y = 3)
P(Y = 3) = 2/36 + 3/36 + 1/36 + 1/36 = 7/36
P(Y = 4) = P(X = 1, Y = 4) + P(X = 2, Y = 4) + P(X = 3, Y = 4) + P(X = 4, Y = 4)
P(Y = 4) = 1/36 + 2/36 + 1/36 + 5/36 = 9/36

   Y      1       2      3      4
 P(Y)   11/36   9/36   7/36   9/36
Conditional distribution of X given Y = 1 is P(X = x / Y = 1) = P(X = x, Y = 1)/P(Y = 1)
i.e., P(X = 1 / Y = 1) = (4/36)/(11/36) = 4/11
P(X = 2 / Y = 1) = (1/36)/(11/36) = 1/11
P(X = 3 / Y = 1) = (5/36)/(11/36) = 5/11
P(X = 4 / Y = 1) = (1/36)/(11/36) = 1/11

       X          1      2      3      4
 P(X / Y = 1)   4/11   1/11   5/11   1/11
Conditional distribution of Y given X = x is P(Y = y / X = x) = P(X = x, Y = y)/P(X = x)
i.e., P(Y = 1 / X = 2) = (1/36)/(9/36) = 1/9
P(Y = 2 / X = 2) = (3/36)/(9/36) = 3/9
P(Y = 3 / X = 2) = (3/36)/(9/36) = 3/9
P(Y = 4 / X = 2) = (2/36)/(9/36) = 2/9

       Y          1     2     3     4
 P(Y / X = 2)   1/9   3/9   3/9   2/9
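The whole table manipulation can be reproduced exactly (a sketch using `fractions`; the joint table is entered row by row):

```python
from fractions import Fraction as F

J = [[F(4, 36), F(3, 36), F(2, 36), F(1, 36)],    # X = 1
     [F(1, 36), F(3, 36), F(3, 36), F(2, 36)],    # X = 2
     [F(5, 36), F(1, 36), F(1, 36), F(1, 36)],    # X = 3
     [F(1, 36), F(2, 36), F(1, 36), F(5, 36)]]    # X = 4

px = [sum(row) for row in J]                      # marginal of X
py = [sum(col) for col in zip(*J)]                # marginal of Y
x_given_y1 = [J[i][0] / py[0] for i in range(4)]  # conditional of X given Y = 1
y_given_x2 = [J[1][j] / px[1] for j in range(4)]  # conditional of Y given X = 2
```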
X 0 1 2 3 4 Y 0 1 2 3 4 5
P(X) 0.1 0.2 0.3 0.2 0.2 P(Y) 0.1 0.1 0.2 0.3 0.2 0.1
Assume that X and Y are independent. Find the joint PMF of X and Y. Also find the PMF of
Z = X + Y.
Solution: Given X and Y are independent variables and the PMF of X and Y are,
X 0 1 2 3 4 Y 0 1 2 3 4 5
P(X) 0.1 0.2 0.3 0.2 0.2 P(Y) 0.1 0.1 0.2 0.3 0.2 0.1
Since X and Y are independent, the joint PMF is P(X = x, Y = y) = P(X = x) P(Y = y) for
x = 0, 1, …, 4 and y = 0, 1, …, 5.
Let Z = X + Y
P(Z = 0) = P(X = 0, Y = 0) = 0.01
P(Z = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 0) = 0.01 + 0.02 = 0.03
P(Z = 2) = P(X = 0, Y = 2) + P(X = 2, Y = 0) + P(X = 1, Y = 1) = 0.02 + 0.03 + 0.02 = 0.07
P(Z = 3) = P(X = 0, Y = 3) + P(X = 3, Y = 0) + P(X = 1, Y = 2) + P(X = 2, Y = 1)
         = 0.03 + 0.02 + 0.04 + 0.03 = 0.12
P(Z = 4) = P(X = 0, Y = 4) + P(X = 4, Y = 0) + P(X = 1, Y = 3) + P(X = 3, Y = 1) + P(X = 2, Y = 2)
         = 0.02 + 0.02 + 0.06 + 0.02 + 0.06 = 0.18
Similarly for Z = 5, …, 9, giving
Z 0 1 2 3 4 5 6 7 8 9
P(Z) 0.01 0.03 0.07 0.12 0.18 0.20 0.18 0.13 0.06 0.02
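The PMF of Z = X + Y is the discrete convolution of the two PMFs; a plain-Python sketch:

```python
pX = [0.1, 0.2, 0.3, 0.2, 0.2]               # X = 0, 1, ..., 4
pY = [0.1, 0.1, 0.2, 0.3, 0.2, 0.1]          # Y = 0, 1, ..., 5
pZ = [0.0] * (len(pX) + len(pY) - 1)         # Z = 0, 1, ..., 9
for i, a in enumerate(pX):
    for j, b in enumerate(pY):
        pZ[i + j] += a * b                   # P(Z = i + j) accumulates P(X=i)P(Y=j)
# pZ ≈ [0.01, 0.03, 0.07, 0.12, 0.18, 0.20, 0.18, 0.13, 0.06, 0.02]
```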
X P(X) Y P(Y)
1 0.3 0 0.4
2 0.5 1 0.6
3 0.2
(a) Find the probability mass function (PMF) of the random variable Z = X + Y .
Solution: Given X and Y be two independent random variables with the following
probability distributions
X P(X) Y P(Y)
1 0.3 0 0.4
2 0.5 1 0.6
3 0.2
Since X and Y are independent, the joint PMF is P(X = x, Y = y) = P(X = x) P(Y = y).
a) Let Z = X + Y
P(Z = 1) = P(X = 1, Y = 0) = 0.3 × 0.4 = 0.12
P(Z = 2) = P(X = 1, Y = 1) + P(X = 2, Y = 0) = 0.18 + 0.20 = 0.38
P(Z = 3) = P(X = 2, Y = 1) + P(X = 3, Y = 0) = 0.30 + 0.08 = 0.38
P(Z = 4) = P(X = 3, Y = 1) = 0.12

   Z      1      2      3      4
 P(Z)   0.12   0.38   0.38   0.12

E(Z) = Σ z P(z) = 1(0.12) + 2(0.38) + 3(0.38) + 4(0.12) = 2.5
E(Z²) = Σ z² P(z) = 1(0.12) + 4(0.38) + 9(0.38) + 16(0.12)
E(Z²) = 6.98
Variance of Z is V(Z) = E(Z²) − (E(Z))² = 6.98 − 6.25
Variance of Z is V(Z) = 0.73
Marginal distribution
Let X and Y be two continuous random variables with joint density function f_XY(x, y). Then
the marginal distribution of X is f_X(x) = ∫_(−∞)^∞ f_XY(x, y) dy, and
the marginal distribution of Y is f_Y(y) = ∫_(−∞)^∞ f_XY(x, y) dx.
Here f_X(x) ≥ 0, f_Y(y) ≥ 0 and ∫_(−∞)^∞ f_X(x) dx = 1, ∫_(−∞)^∞ f_Y(y) dy = 1.
The conditional probability density function of X given Y = y is f_X/Y(x/y) = f_XY(x, y)/f_Y(y).
The conditional probability density function of Y given X = x is f_Y/X(y/x) = f_XY(x, y)/f_X(x).
Here f_X/Y(x/y) ≥ 0, f_Y/X(y/x) ≥ 0 and ∫_(−∞)^∞ f_X/Y(x/y) dx = 1, ∫_(−∞)^∞ f_Y/X(y/x) dy = 1.
If X and Y are independent, f_XY(x, y) = f_X(x) f_Y(y), and then
f_X/Y(x/y) = f_X(x) f_Y(y)/f_Y(y) = f_X(x) and
f_Y/X(y/x) = f_X(x) f_Y(y)/f_X(x) = f_Y(y).
Solution: Given joint probability density function f_XY(x, y) = kxy for 0 < x < y < 1, and 0
otherwise.
We have ∫∫ f_XY(x, y) dx dy = 1
i.e., ∫_(x=0)^1 ∫_(y=x)^1 kxy dy dx = 1
∫_(x=0)^1 kx [y²/2]_x^1 dx = 1
∫_(x=0)^1 kx (1/2 − x²/2) dx = 1
(k/2) ∫_(x=0)^1 (x − x³) dx = 1
(k/2) [x²/2 − x⁴/4]_0^1 = 1
(k/2)(1/2 − 1/4) = 1
(k/2)(1/4) = 1
k = 8
f_XY(x, y) = 8xy for 0 < x < y < 1, and 0 otherwise.
The marginal distribution of X is f_X(x) = ∫_(y=x)^1 8xy dy
f_X(x) = 8x [y²/2]_x^1
f_X(x) = 8x (1/2 − x²/2)
The marginal distribution of X is f_X(x) = 4x(1 − x²) for 0 < x < 1, and 0 otherwise.
The marginal distribution of Y is f_Y(y) = ∫_(−∞)^∞ f_XY(x, y) dx
f_Y(y) = ∫_(x=0)^y 8xy dx
f_Y(y) = 8y [x²/2]_0^y
f_Y(y) = 8y (y²/2 − 0)
f_Y(y) = 4y³
The marginal distribution of Y is f_Y(y) = 4y³ for 0 < y < 1, and 0 otherwise.
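A crude numeric check that 8xy integrates to 1 over the triangle 0 < x < y < 1 (midpoint grid; the resolution is our arbitrary choice):

```python
N = 400
h = 1.0 / N
total = 0.0
for i in range(N):
    x = (i + 0.5) * h
    for j in range(N):
        y = (j + 0.5) * h
        if x < y:                      # the support is the triangle 0 < x < y < 1
            total += 8 * x * y * h * h
# total ≈ 1 (small error from the cells straddling the diagonal boundary)
```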
Problem: Let X and Y be two random variables with joint probability density function
f_XY(x, y) = k(x²y + xy³) for 0 < x < 1, 0 < y < 1, and 0 otherwise. Find k and the marginal
functions.
Problem: Let X and Y be two random variables with joint probability density function
f_XY(x, y) = k(x²y + xy³) for x + y ≤ 1, x ≥ 0, y ≥ 0, and 0 otherwise. Find k and the
marginal functions.
Solution: Given joint density function f(x, y, z) = K(x² + yz) for 0 < x < 1, 0 < y < 1,
0 < z < 1, and 0 otherwise.
(i) We have ∫∫∫ f(x, y, z) dx dy dz = 1
i.e., K ∫_(x=0)^1 ∫_(y=0)^1 ∫_(z=0)^1 (x² + yz) dz dy dx = 1
K ∫_(x=0)^1 ∫_(y=0)^1 [x² z + y z²/2]_0^1 dy dx = 1
K ∫_(x=0)^1 ∫_(y=0)^1 (x² + y/2) dy dx = 1
K ∫_(x=0)^1 [x² y + y²/4]_0^1 dx = 1
K ∫_(x=0)^1 (x² + 1/4) dx = 1
K [x³/3 + x/4]_0^1 = 1
K (1/3 + 1/4) = 1
K (7/12) = 1
K = 12/7
So f(x, y, z) = (12/7)(x² + yz) for 0 < x < 1, 0 < y < 1, 0 < z < 1, and 0 otherwise.
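The normalization can be confirmed with exact arithmetic, since the triple integral splits as ∫x²dx + (∫y dy)(∫z dz):

```python
from fractions import Fraction as F

integral = F(1, 3) + F(1, 2) * F(1, 2)   # ∫∫∫ (x² + yz) over the unit cube = 7/12
K = 1 / integral                         # 12/7
```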
(ii) Marginal distribution of X is f_X(x) = ∫∫ f(x, y, z) dy dz
f_X(x) = ∫_(y=0)^1 ∫_(z=0)^1 (12/7)(x² + yz) dz dy
f_X(x) = (12/7) ∫_(y=0)^1 [x² z + y z²/2]_0^1 dy
f_X(x) = (12/7) ∫_(y=0)^1 (x² + y/2) dy
f_X(x) = (12/7) [x² y + y²/4]_0^1
f_X(x) = (12/7)(x² + 1/4) for 0 < x < 1
Marginal distribution of Y is f_Y(y) = ∫∫ f(x, y, z) dx dz
f_Y(y) = ∫_(x=0)^1 ∫_(z=0)^1 (12/7)(x² + yz) dz dx
f_Y(y) = (12/7) ∫_(x=0)^1 [x² z + y z²/2]_0^1 dx
f_Y(y) = (12/7) ∫_(x=0)^1 (x² + y/2) dx
f_Y(y) = (12/7) [x³/3 + yx/2]_0^1
f_Y(y) = (12/7)(1/3 + y/2) for 0 < y < 1
Marginal distribution of Z is f_Z(z) = ∫∫ f(x, y, z) dx dy
f_Z(z) = ∫_(x=0)^1 ∫_(y=0)^1 (12/7)(x² + yz) dy dx
f_Z(z) = (12/7) ∫_(x=0)^1 [x² y + z y²/2]_0^1 dx
f_Z(z) = (12/7) ∫_(x=0)^1 (x² + z/2) dx
f_Z(z) = (12/7)(1/3 + z/2) for 0 < z < 1
(iii) The joint marginal distribution of X, Y is f_XY(x, y) = ∫ f(x, y, z) dz
f_XY(x, y) = ∫_(z=0)^1 (12/7)(x² + yz) dz
f_XY(x, y) = (12/7) [x² z + y z²/2]_0^1
f_XY(x, y) = (12/7)(x² + y/2) for 0 < x < 1; 0 < y < 1
The joint marginal distribution of X, Z is f_XZ(x, z) = ∫ f(x, y, z) dy
f_XZ(x, z) = ∫_(y=0)^1 (12/7)(x² + yz) dy
f_XZ(x, z) = (12/7) [x² y + z y²/2]_0^1
f_XZ(x, z) = (12/7)(x² + z/2) for 0 < x < 1; 0 < z < 1
The joint marginal distribution of Y, Z is f_YZ(y, z) = ∫ f(x, y, z) dx
f_YZ(y, z) = ∫_(x=0)^1 (12/7)(x² + yz) dx
f_YZ(y, z) = (12/7) [x³/3 + yzx]_0^1
f_YZ(y, z) = (12/7)(1/3 + yz) for 0 < y < 1; 0 < z < 1
(iv) The conditional density of Z given X = x, Y = y is
f(x, y, z)/f_XY(x, y) = (12/7)(x² + yz) / ((12/7)(x² + y/2))
f(x, y, z)/f_XY(x, y) = (x² + yz)/(x² + y/2) for 0 < z < 1
(v) The conditional expectation of U = X² + Y + Z given X = x, Y = y is
E(U | x, y) = ∫ U · f(x, y, z)/f_XY(x, y) dz
E(U | x, y) = ∫_(z=0)^1 (x² + y + z)(x² + yz)/(x² + y/2) dz
= (1/(x² + y/2)) ∫_0^1 (x² + y + z)(x² + yz) dz
= (1/(x² + y/2)) ∫_0^1 [ (x² + y)x² + (x² + y)yz + x²z + yz² ] dz
= (1/(x² + y/2)) [ (x² + y)x² + (x² + y)(y/2) + x²/2 + y/3 ]
The conditional expectation of U given X = x, Y = y is
E(U | x, y) = (1/(x² + y/2)) [ y/3 + x²/2 + (x² + y)(x² + y/2) ]
(vi) E(XYZ) = ∫∫∫ xyz f(x, y, z) dx dy dz
E(XYZ) = (12/7) ∫_(x=0)^1 ∫_(y=0)^1 ∫_(z=0)^1 (x³yz + xy²z²) dz dy dx
E(XYZ) = (12/7) ∫_(x=0)^1 ∫_(y=0)^1 [x³y z²/2 + xy² z³/3]_0^1 dy dx
E(XYZ) = (12/7) ∫_(x=0)^1 ∫_(y=0)^1 (x³y/2 + xy²/3) dy dx
E(XYZ) = (12/7) ∫_(x=0)^1 [x³ y²/4 + x y³/9]_0^1 dx
E(XYZ) = (12/7) ∫_(x=0)^1 (x³/4 + x/9) dx
E(XYZ) = (12/7) [x⁴/16 + x²/18]_0^1
E(XYZ) = (12/7)(1/16 + 1/18)
E(XYZ) = 17/84
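Since the integrand separates into x³yz + xy²z², the triple integral factors into products of one-dimensional integrals; an exact check:

```python
from fractions import Fraction as F

term1 = F(1, 4) * F(1, 2) * F(1, 2)   # ∫x³dx · ∫y dy · ∫z dz = 1/16
term2 = F(1, 2) * F(1, 3) * F(1, 3)   # ∫x dx · ∫y²dy · ∫z²dz = 1/18
E_xyz = F(12, 7) * (term1 + term2)    # (12/7)(17/144) = 17/84
```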
(i) Find the mean and variance of the area of the rectangle A = XY.
The length X is selected from an exponential distribution with mean 1/λ = 5,
so the marginal distribution of X is f_X(x) = (1/5) e^(−x/5) for x > 0    [f(x) = λ e^(−λx), x > 0]
Once the length has been chosen, its width Y is selected from a uniform distribution from 0 to
half its length.
So the conditional distribution of Y given X = x is f_Y/X(y/x) = 1/(x/2) = 2/x for 0 < y < x/2.
The joint density function is f_XY(x, y) = f_X(x) f_Y/X(y/x)    [f_Y/X(y/x) = f_XY(x, y)/f_X(x)]
f_XY(x, y) = (2/x)(1/5) e^(−x/5)
f_XY(x, y) = (2/(5x)) e^(−x/5) for 0 < y < x/2; x > 0
The area is A = XY.
E(A) = E(XY) = ∫∫ xy f_XY(x, y) dx dy
E(A) = ∫_(x=0)^∞ ∫_(y=0)^(x/2) xy (2/(5x)) e^(−x/5) dy dx
E(A) = (2/5) ∫_(x=0)^∞ e^(−x/5) [y²/2]_0^(x/2) dx
E(A) = (2/5) ∫_(x=0)^∞ e^(−x/5) (x²/8) dx
E(A) = (1/20) ∫_(x=0)^∞ e^(−x/5) x^(3−1) dx
E(A) = (1/20) Γ(3)/(1/5)³    [∫_0^∞ e^(−kx) x^(n−1) dx = Γ(n)/kⁿ]
E(A) = 2! · 5³/20 = 25/2    [Γ(n + 1) = n!]
Mean of the area E(A) = 12.5
E(A²) = E(X²Y²) = ∫∫ x²y² f_XY(x, y) dy dx

E(A²) = (2/5) ∫₀^∞ ∫₀^(x/2) x y² e^(−x/5) dy dx

E(A²) = (2/5) ∫₀^∞ x e^(−x/5) [y³/3]₀^(x/2) dx

E(A²) = (2/5) ∫₀^∞ x e^(−x/5) (x³/24) dx

E(A²) = (1/60) ∫₀^∞ e^(−x/5) x^(5−1) dx

Using ∫₀^∞ e^(−kx) x^(n−1) dx = Γ(n)/kⁿ with Γ(n + 1) = n!,

E(A²) = (1/60) · Γ(5)/(1/5)⁵ = (4! · 5⁵)/60 = 1250

Hence Var(A) = E(A²) − [E(A)]² = 1250 − (25/2)² = 1250 − 156.25 = 1093.75
(ii) The probability that the area A = XY is less than 4. From the diagram, the curves y = x/2
and y = 4/x intersect where x² = 8, i.e. x = 2√2, so

P(A < 4) = P(XY < 4) = ∫₀^(2√2) ∫₀^(x/2) f_XY(x, y) dy dx + ∫_(2√2)^∞ ∫₀^(4/x) f_XY(x, y) dy dx

= (2/5) ∫₀^(2√2) (1/x) e^(−x/5) [y]₀^(x/2) dx + (2/5) ∫_(2√2)^∞ (1/x) e^(−x/5) [y]₀^(4/x) dx

= (2/5) ∫₀^(2√2) (1/x) e^(−x/5) (x/2) dx + (2/5) ∫_(2√2)^∞ (1/x) e^(−x/5) (4/x) dx

= (1/5) ∫₀^(2√2) e^(−x/5) dx + (8/5) ∫_(2√2)^∞ x^(−2) e^(−x/5) dx

Evaluating the first integral directly and the second by numerical methods,

P(A < 4) = P(XY < 4) = 0.2184
Multiple Correlation
The multiple correlation coefficient R1.23 measures the correlation between X1 and the joint
effect of X2 and X3. It can also be defined as the simple correlation coefficient between a
variable and its estimate, and is given by

R1.23 = √[(r12² + r13² − 2 r12 r13 r23) / (1 − r23²)];
R2.13 = √[(r12² + r23² − 2 r12 r13 r23) / (1 − r13²)] and
R3.12 = √[(r13² + r23² − 2 r12 r13 r23) / (1 − r12²)]

Here r12 = [N Σ X1X2 − (Σ X1)(Σ X2)] / [√(N Σ X1² − (Σ X1)²) √(N Σ X2² − (Σ X2)²)], and
similarly for r13 and r23.

Partial Correlation
Let us consider the case of three variables X1, X2 and X3. Sometimes the correlation between
two variables X1 and X2 may be partly due to the correlation of a third variable X3 with both
X1 and X2. In such a situation, one may wish to study the correlation between X1 and X2 after
eliminating the effect of X3; this is known as partial correlation. The partial correlation
coefficient between X1 and X2, after the effect of X3 on each of X1 and X2 is eliminated, is

R12.3 = (r12 − r13 r23) / (√(1 − r13²) √(1 − r23²))

The partial correlation coefficient between X1 and X3, when the effect of X2 on each of X1
and X3 is eliminated, is R13.2 = (r13 − r12 r23) / (√(1 − r12²) √(1 − r23²)), and the partial
correlation coefficient between X2 and X3, when the effect of X1 on each of X2 and X3 is
eliminated, is R23.1 = (r23 − r12 r13) / (√(1 − r12²) √(1 − r13²)).

Problem: From the following data, compute the total and partial correlation coefficients.
X1   X2   X3   X1X2   X1X3   X2X3   X1²   X2²   X3²
 2    8    0    16      0      0     4    64     0
 5    8    1    40      5      8    25    64     1
 7    6    1    42      7      6    49    36     1
 8    5    3    40     24     15    64    25     9
 5    3    4    15     20     12    25     9    16
Σ = 27  30    9   153     56     41   167   198    27
r12 = [N Σ X1X2 − (Σ X1)(Σ X2)] / [√(N Σ X1² − (Σ X1)²) √(N Σ X2² − (Σ X2)²)]

r12 = [(5)(153) − (27)(30)] / [√(5(167) − 27²) √(5(198) − 30²)] = −0.4607

r13 = [N Σ X1X3 − (Σ X1)(Σ X3)] / [√(N Σ X1² − (Σ X1)²) √(N Σ X3² − (Σ X3)²)]

r13 = [(5)(56) − (27)(9)] / [√(5(167) − 27²) √(5(27) − 9²)] = 0.4890

r23 = [N Σ X2X3 − (Σ X2)(Σ X3)] / [√(N Σ X2² − (Σ X2)²) √(N Σ X3² − (Σ X3)²)]

r23 = [(5)(41) − (30)(9)] / [√(5(198) − 30²) √(5(27) − 9²)] = −0.9324
The partial correlation coefficient between X1 and X2, when the effect of X3 on each of X1
and X2 is eliminated, is

R12.3 = (r12 − r13 r23) / (√(1 − r13²) √(1 − r23²))
      = (−0.4607 − (0.4890)(−0.9324)) / (√(1 − 0.4890²) √(1 − 0.9324²))
      = (−0.4607 + 0.4559) / ((0.8723)(0.3614))
      ≈ −0.015

The partial correlation coefficients between X1 and X3 and between X2 and X3 follow in the
same way from R13.2 = (r13 − r12 r23) / (√(1 − r12²) √(1 − r23²)) and
R23.1 = (r23 − r12 r13) / (√(1 − r12²) √(1 − r13²)).
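The total and partial correlation coefficients above can be checked numerically. A minimal
sketch in Python using the same N·Σ formulas; the helper name r is an illustrative choice:

```python
import math

X1 = [2, 5, 7, 8, 5]
X2 = [8, 8, 6, 5, 3]
X3 = [0, 1, 1, 3, 4]

def r(a, b):
    # r = [N Σab − (Σa)(Σb)] / [√(N Σa² − (Σa)²) √(N Σb² − (Σb)²)]
    n = len(a)
    num = n * sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b)
    den = math.sqrt(n * sum(x * x for x in a) - sum(a) ** 2) \
        * math.sqrt(n * sum(y * y for y in b) - sum(b) ** 2)
    return num / den

r12, r13, r23 = r(X1, X2), r(X1, X3), r(X2, X3)
# partial correlation of X1 and X2, eliminating X3
r12_3 = (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))
print(r12, r13, r23, r12_3)
```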
Bit is a unit of information. 1 bit refers to the amount of information that one is uncertain
about in a binary random variable that takes the value of either 0 or 1 with equal probability.
Surprisal
Surprisal quantifies the uncertainty in a random variable X taking a certain value x based on
its probability of occurrence P ( X = x ) or P ( x ) . Surprisal is measured in bits when the base
of the logarithm is 2.
i.e., s(x) = log₂ (1/P(x)) = −log₂ P(x)

Depending on the base of the logarithm, the unit is: base 2 — bit; base e — nat;
base 10 — hartley.
Entropy
Entropy quantifies the average uncertainty in a random variable X based on its probability
distribution. Entropy is measured in bits when the base of the logarithm is 2.

H(X) = Σₓ P(x) s(x), i.e., H(X) = −Σₓ P(x) log₂ P(x)
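The surprisal and entropy definitions translate directly into code. A minimal sketch, assuming
Python; the function names are illustrative:

```python
import math

def surprisal(p, base=2):
    # s(x) = -log_base P(x)
    return -math.log(p, base)

def entropy(dist, base=2):
    # H(X) = sum P(x) s(x) = -sum P(x) log P(x); terms with P(x) = 0 contribute 0
    return sum(p * surprisal(p, base) for p in dist if p > 0)

print(surprisal(0.5))        # a fair-coin outcome carries 1 bit
print(entropy([0.5, 0.5]))   # H = 1 bit
print(entropy([0.25] * 4))   # H = 2 bits
```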
Joint Entropy
expresses how much extra information you still need to supply on average to communicate Y
given that the other party knows X.
Coding: The procedure of associating words formed from the alphabet of one language with the
given words of another language; i.e., we represent the words of one language by words of
another language.
The objective of coding is to increase efficiency and to reduce transmission errors.
Let the source or the transmitter be represented by a set S = {s1, s2, ..., sk} containing 'k'
symbols.
Let us consider another set X containing 'r' symbols; X = {x1, x2, x3, ..., xr} is called the
code alphabet.
For example:
s1 = x1 x2
s2 = x2 x1 x3
s3 = x3 x2 x4 x5
...
A source which emits data at successive intervals, with each value independent of the previous
values, is characterized as a discrete memoryless source (DMS).

Let a DMS output a symbol every 't' seconds, each symbol selected from a finite set of symbols
xi, i = 1, 2, ..., L, occurring with probabilities P(xi), i = 1, 2, ..., L. Then the entropy of
this DMS in bits per source symbol is

H(X) = −Σᵢ₌₁ᴸ P(xi) log₂ P(xi)

For a finite set of symbols xi, i = 1, 2, ..., L, the length of a fixed-length code is given by

R = ⌊log₂ L⌋ + 1, if L is not a power of 2
R = log₂ L,       if L is a power of 2
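The fixed-length code-length rule can be sketched as follows (the function name code_length is
an illustrative choice):

```python
import math

def code_length(L):
    # R = log2(L) if L is a power of two, else floor(log2(L)) + 1
    exact = math.log2(L)
    if L & (L - 1) == 0:        # bit trick: true exactly when L is a power of two
        return int(exact)
    return math.floor(exact) + 1

print(code_length(8))    # 3 bits suffice for 8 symbols
print(code_length(5))    # 3 bits needed for 5 symbols
print(code_length(26))   # 5 bits needed for 26 symbols
```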
Prefix Codes
A prefix code is one in which no codeword forms the prefix of any other codeword; such codes
are called prefix codes or instantaneous codes.
The rate of information about the outcomes per second is given by Rate = L · H(X).
The percentage of maximum possible information (the code efficiency) is given by H(X)/R.
For discrete random variables:
The relative entropy from Q to P is D_KL(P(X) || Q(X)) = Σ_{x∈X} P(x) log (P(x)/Q(x)) and
the relative entropy from P to Q is D_KL(Q(X) || P(X)) = Σ_{x∈X} Q(x) log (Q(x)/P(x)).

For continuous random variables:
The relative entropy from Q to P is D_KL(P(X) || Q(X)) = ∫_{−∞}^{∞} P(x) log (P(x)/Q(x)) dx and
the relative entropy from P to Q is D_KL(Q(X) || P(X)) = ∫_{−∞}^{∞} Q(x) log (Q(x)/P(x)) dx.
X 0 1 2
P ( x)
Solution: The relative entropy from Q to P is

D_KL(P(X) || Q(X)) = Σₓ₌₀² P(x) log (P(x)/Q(x))

and the relative entropy from P to Q is

D_KL(Q(X) || P(X)) = Σₓ₌₀² Q(x) log (Q(x)/P(x))
X 0 1 2
Unit – V
Optimization
Hessian Matrix: The Hessian matrix is a square matrix of second-order partial derivatives of a
scalar-valued function; it is denoted by H.

If f(X) is a scalar-valued function, where X = (x1, x2, ..., xn), then the Hessian matrix is
defined as

H = [ ∂²f/∂x1²     ∂²f/∂x1∂x2   ...   ∂²f/∂x1∂xn
      ∂²f/∂x2∂x1   ∂²f/∂x2²     ...   ∂²f/∂x2∂xn
      ...
      ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   ...   ∂²f/∂xn²   ]
The Hessian matrix helps determine the local behaviour of a function around critical points
such as maxima, minima, or saddle points. Specifically, it's used in optimization algorithms to
test for optimality conditions.
Leading Principal Minor: The k-th order leading principal minor (LPM) of H is the determinant
of the matrix formed by deleting the last (n − k) rows and columns of H.

The matrix H is positive definite if all its leading principal minors are positive at x*,
i.e., H1 > 0; H2 > 0; H3 > 0; ... at x*. Then the function f(X) is called strictly convex, and
the point x* is a strict local minimum of the function f(X).

The matrix H is negative definite if all its leading principal minors are nonzero and alternate
in sign, with the first leading principal minor negative,
i.e., H1 < 0; H2 > 0; H3 < 0; ... at x*. Then the function f(X) is called strictly concave, and
the point x* is a strict local maximum of the function f(X).

The matrix H is positive semi-definite if all its leading principal minors are non-negative at
x* and at least one Hi = 0 for i = 2, 3, ..., n.

The matrix H is negative semi-definite if its leading principal minors alternate in sign with
the first leading principal minor non-positive, and at least one Hi = 0 for i = 2, 3, ..., n.
Note:
A function which is both convex and concave must be a linear function.
If f and g are two convex functions over a convex set, then f + g is also convex over that set.
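The leading-principal-minor test can be sketched in code. A minimal illustration, assuming
Python; the recursive determinant is adequate only for small matrices:

```python
def det(m):
    # determinant by Laplace expansion along the first row (fine for small matrices)
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def leading_principal_minors(H):
    return [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]

def definiteness(H):
    lpm = leading_principal_minors(H)
    if all(d > 0 for d in lpm):
        return "positive definite"
    # alternating signs starting negative: H1 < 0, H2 > 0, H3 < 0, ...
    if all(d != 0 for d in lpm) and all(
            (d < 0) if k % 2 == 0 else (d > 0) for k, d in enumerate(lpm)):
        return "negative definite"
    return "indefinite or semi-definite"

print(definiteness([[12, -6], [-6, 12]]))   # positive definite
print(definiteness([[-2, 0], [0, -2]]))     # negative definite
```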
So the stationary point x = 1 is a point of maximum of the function f(x), and since
f''(3) = 6(3) − 12 = 6 > 0,
the stationary point x = 3 is a point of minimum of the function f(x).

Problem: Find the maximum and minimum values of f(x, y) = x³ + y³ − 6xy.
Now ∂f/∂x = 3x² − 6y and ∂f/∂y = 3y² − 6x

To get the stationary points we set these partial derivatives to zero:

∂f/∂x = 0 ⇒ 3x² − 6y = 0 ⇒ 6y = 3x² ⇒ y = x²/2 → (1) and

∂f/∂y = 0 ⇒ 3y² − 6x = 0 ⇒ 3y² = 6x ⇒ y² = 2x → (2)

From (1) and (2):

(x²/2)² = 2x ⇒ x⁴/4 = 2x ⇒ x⁴ − 8x = 0 ⇒ x(x³ − 8) = 0 ⇒ x = 0; x = 2

Implies y = 0; y = 2, so the stationary points are (0, 0) and (2, 2).
∂²f/∂x² = 6x; ∂²f/∂y² = 6y; ∂²f/∂x∂y = −6 and ∂²f/∂y∂x = −6

At the point (2, 2):

H(X) = [ ∂²f/∂x²    ∂²f/∂x∂y ] = [ 6x  −6 ] = [ 12  −6 ]
       [ ∂²f/∂y∂x   ∂²f/∂y²  ]   [ −6  6y ]   [ −6  12 ]
Here H1 = 12 > 0 and H2 = 144 − 36 = 108 > 0. The matrix H is positive definite, as all its
leading principal minors are positive, so the point (2, 2) is a point of minimum. (At (0, 0),
H1 = 0 and H2 = −36 < 0, so H is indefinite there and (0, 0) is a saddle point.)
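The stationary points and leading principal minors of f(x, y) = x³ + y³ − 6xy can be verified
numerically, as a minimal sketch:

```python
def grad(x, y):
    # gradient of f(x, y) = x^3 + y^3 - 6xy
    return (3 * x ** 2 - 6 * y, 3 * y ** 2 - 6 * x)

def hessian(x, y):
    return [[6 * x, -6], [-6, 6 * y]]

for pt in [(0, 0), (2, 2)]:
    H = hessian(*pt)
    H1 = H[0][0]                                  # first leading principal minor
    H2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]    # second leading principal minor
    print(pt, grad(*pt), H1, H2)
```

At (2, 2) the gradient vanishes and both minors are positive, confirming the minimum.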
Problem: Construct the Hessian matrix, check its definiteness, and extremum values for the
function f ( X ) = 2 x12 + x22 + 3x32 + 10 x1 + 8 x2 + 6 x3 − 100
Step 01. Find the search direction Si = −∇f(Xi), i.e., the negative of the gradient.
Step 02. Determine the optimal step length αi in the direction of Si and set Xi+1 = Xi + αi Si,

where αi = (Siᵀ Si) / (Siᵀ H Si)

If we have reached optimality, stop; otherwise repeat Step 01 from the new point Xi+1.
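For a quadratic objective written as f(X) = cᵀX + ½ XᵀHX, the two steps above can be
implemented directly, since the optimal step length has the closed form given. A minimal
sketch, assuming Python; the tolerance and iteration cap are arbitrary choices, applied here
to the example f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2² worked by hand below:

```python
def steepest_descent_quadratic(c, H, x, tol=1e-6, max_iter=100):
    # f(X) = c.X + 0.5 * X.H.X with H symmetric; gradient = c + H.X
    n = len(x)
    for _ in range(max_iter):
        g = [c[i] + sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(gi) for gi in g) < tol:
            break
        s = [-gi for gi in g]                              # search direction
        Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
        alpha = sum(si * si for si in s) / sum(si * hi for si, hi in zip(s, Hs))
        x = [xi + alpha * si for xi, si in zip(x, s)]      # X_{i+1} = X_i + alpha * S_i
    return x

# f(x1, x2) = x1 - x2 + 2x1^2 + 2x1x2 + x2^2  =>  c = (1, -1), H = [[4, 2], [2, 2]]
x_star = steepest_descent_quadratic([1, -1], [[4, 2], [2, 2]], [0.0, 0.0])
print(x_star)   # converges to the exact minimizer (-1, 1.5)
```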
Problem: Minimize f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2² by the steepest descent method,
starting from the point X1 = (0, 0)ᵀ.

Solution: The given objective function is f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2², and the
starting point is X1 = (0, 0)ᵀ.
Now ∂f/∂x1 = 1 + 4x1 + 2x2 and ∂f/∂x2 = −1 + 2x1 + 2x2

The gradient of f is ∇f = (1 + 4x1 + 2x2, −1 + 2x1 + 2x2)ᵀ

Implies −∇f = −(1 + 4x1 + 2x2, −1 + 2x1 + 2x2)ᵀ

Now the Hessian matrix is H(X) = [ 4  2 ]
                                 [ 2  2 ]
Iteration 01. At X1 = (0, 0)ᵀ, S1 = −∇f(X1) = (−1, 1)ᵀ and

α1 = (S1ᵀ S1) / (S1ᵀ H S1) = 2/2 = 1

Now Xi+1 = Xi + αi Si, i.e., X2 = X1 + α1 S1

X2 = (0, 0)ᵀ + (1)(−1, 1)ᵀ = (−1, 1)ᵀ

We check the optimum: ∇f(X2) = (−1, −1)ᵀ ≠ (0, 0)ᵀ
So X2 is not optimum; move to the next iteration.

Iteration 02. At X2 = (−1, 1)ᵀ, S2 = −∇f(X2) = (1, 1)ᵀ and

α2 = (S2ᵀ S2) / (S2ᵀ H S2) = 2/10 = 1/5

i.e., X3 = X2 + α2 S2 = (−1, 1)ᵀ + (1/5)(1, 1)ᵀ = (−0.8, 1.2)ᵀ
Iteration 03. At X3 = (−0.8, 1.2)ᵀ, S3 = −∇f(X3) = (−0.2, 0.2)ᵀ and

α3 = (S3ᵀ S3) / (S3ᵀ H S3) = 0.08/0.08 = 1

Now Xi+1 = Xi + αi Si, i.e., X4 = X3 + α3 S3 = (−0.8, 1.2)ᵀ + (1)(−0.2, 0.2)ᵀ = (−1.0, 1.4)ᵀ
Iteration 04. At X4 = (−1.0, 1.4)ᵀ, S4 = −∇f(X4) = (0.2, 0.2)ᵀ and

α4 = (S4ᵀ S4) / (S4ᵀ H S4) = 0.08/0.4 = 1/5

Now Xi+1 = Xi + αi Si, i.e., X5 = X4 + α4 S4 = (−1.0, 1.4)ᵀ + (1/5)(0.2, 0.2)ᵀ = (−0.96, 1.44)ᵀ
Iteration 05. At X5 = (−0.96, 1.44)ᵀ, S5 = −∇f(X5) = (−0.04, 0.04)ᵀ and

α5 = (S5ᵀ S5) / (S5ᵀ H S5) = 0.0032/0.0032 = 1

i.e., X6 = X5 + α5 S5 = (−0.96, 1.44)ᵀ + (1)(−0.04, 0.04)ᵀ = (−1, 1.48)ᵀ

We check the optimum: ∇f(X6) = (−0.04, −0.04)ᵀ ≈ (0, 0)ᵀ

So X6 = (−1.0, 1.48)ᵀ is taken as the optimum (the exact minimizer, from ∇f = 0, is (−1, 1.5)ᵀ).
Problem: Minimize f(x1, x2) = 4x1² + 6x2² − 8x1x2 by the steepest descent method, starting from
the point X1 = (1, 1)ᵀ.

Problem: Minimize the given function by the steepest descent method, starting from the point
X0 = (0.5, 0.5)ᵀ.
Let the loss function measure the error between the polynomial P(x) and sin(x) within the
range −3 ≤ x ≤ 3:

Loss = Σ (sin(x) − P(x))²

The gradients are the values that give you the slope of the loss function with respect to each
specific coefficient. They indicate whether you should increase or decrease a coefficient to
reduce the loss, and also by how much it should be safe to do so.

Given coefficients a0, a1, a2, a3, a4, a5, calculated gradients ga0, ga1, ga2, ga3, ga4, ga5,
and learning rate η, one would typically update the coefficients so that their new, updated
values are defined as below:

a0 = a0 − η·ga0; a1 = a1 − η·ga1; a2 = a2 − η·ga2; a3 = a3 − η·ga3; a4 = a4 − η·ga4;
a5 = a5 − η·ga5

Repeat the above process and update the coefficients until, once you have applied the new
model to the data, the loss has decreased sufficiently. Use these coefficients to obtain the
polynomial of degree 5 within the given range.
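The update loop described above can be sketched as follows, assuming Python; the sample grid,
learning rate, and iteration count are arbitrary illustrative choices:

```python
import math

# Fit P(x) = a0 + a1 x + ... + a5 x^5 to sin(x) on [-3, 3] by gradient descent on
# Loss = sum_i (sin(x_i) - P(x_i))^2 over a grid of sample points.
xs = [-3 + 0.1 * i for i in range(61)]
a = [0.0] * 6
lr = 1e-7          # learning rate eta: a deliberately small, stable choice

def P(x, a):
    return sum(ak * x ** k for k, ak in enumerate(a))

def loss(a):
    return sum((math.sin(x) - P(x, a)) ** 2 for x in xs)

loss0 = loss(a)
for _ in range(500):
    # gradient w.r.t. each coefficient ak: -2 * sum (sin(x) - P(x)) * x^k
    g = [sum(-2 * (math.sin(x) - P(x, a)) * x ** k for x in xs) for k in range(6)]
    a = [ak - lr * gk for ak, gk in zip(a, g)]

print(loss(a) < loss0)   # the loss has decreased
```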
Karush-Kuhn-Tucker Conditions
The optimality conditions for a constrained local optimum are called the Karush-Kuhn-Tucker
(KKT) conditions, and they play an important role in constrained optimization theory and
algorithm development. The KKT conditions are necessary conditions for a solution to be
optimal in a mathematical optimization problem; under suitable convexity assumptions they are
also sufficient.

For Maximize f(x) subject to gi(x) ≤ bi, the conditions for an absolute maximum of f(x) at x*
are:

∂f/∂xj − Σi λi ∂gi/∂xj = 0;   λi (gi(x) − bi) = 0;   gi(x) ≤ bi;   λi ≥ 0

For Minimize f(x) subject to gi(x) ≥ bi, the conditions for an absolute minimum of f(x) at x*
are:

∂f/∂xj − Σi λi ∂gi/∂xj = 0;   λi (gi(x) − bi) = 0;   gi(x) ≥ bi;   λi ≥ 0

A problem in one form can be converted into the other: Minimize f(x) subject to gi(x) ≥ bi is
equivalent to Maximize −f(x) subject to −gi(x) ≤ −bi, and Maximize f(x) subject to gi(x) ≤ bi
is equivalent to Minimize −f(x) subject to −gi(x) ≥ −bi.
Problem: Maximize Z = 8x1 + 10x2 − x1² − x2²
Subject to 3x1 + 2x2 ≤ 6
x1, x2 ≥ 0

Solution: The problem is to maximize f(x) = 8x1 + 10x2 − x1² − x2² subject to 3x1 + 2x2 ≤ 6
and x1, x2 ≥ 0.

Initially, we check the sufficient condition of the Karush-Kuhn-Tucker conditions.
For the KKT conditions to be sufficient for Z to be a maximum,
f(x) should be concave and g(x) should be convex.

∂f/∂x1 = 8 − 2x1; ∂f/∂x2 = 10 − 2x2

The Hessian matrix is H(X) = [ −2   0 ]
                             [  0  −2 ]

Here H1 = −2 < 0 and H2 = 4 > 0.
The matrix H is negative definite, as all its leading principal minors are nonzero and
alternate in sign with the first one negative. So H is negative definite and the function f(X)
is concave; the constraint g(x) = 3x1 + 2x2 is linear and hence convex.
The Lagrangian is L = 8x1 + 10x2 − x1² − x2² − λ(3x1 + 2x2 − 6)

The KKT conditions are:
∂L/∂x1 = 8 − 2x1 − 3λ = 0 → (1)
∂L/∂x2 = 10 − 2x2 − 2λ = 0 → (2)
λ(3x1 + 2x2 − 6) = 0 → (3)
λ ≥ 0 → (4)
3x1 + 2x2 ≤ 6 → (5)
x1, x2 ≥ 0 → (6)

For λ > 0, condition (3) gives 3x1 + 2x2 − 6 = 0 → (7)

And from equation (1): 2x1 = 8 − 3λ, so x1 = (8 − 3λ)/2
From equation (2): 2x2 = 10 − 2λ, so x2 = 5 − λ

Substituting in (7): 3(8 − 3λ)/2 + 2(5 − λ) − 6 = 0

λ = 32/13 > 0

Implies x1 = (8 − 3λ)/2 = 4/13 and x2 = 5 − λ = 33/13

Hence the stationary point (x1, x2; λ) = (4/13, 33/13; 32/13) is the optimal point,

and the optimal value is Z = 8(4/13) + 10(33/13) − (4/13)² − (33/13)² = 277/13.
Subject to 2x1 + x2 ≤ 10
x1, x2 ≥ 0

Solution: For λ > 0,
x1 = (3.6 − 2λ)/0.8 and x2 = (1.6 − λ)/0.4, with λ = 0.4
Implies x1 = 3.5; x2 = 3
Subject to 2x1 + x2 ≤ 5
x1, x2 ≥ 0

Solution: For λ > 0,
x1 = 11/6; x2 = 4/3 and λ = 4/3

Optimal value is Z = 91/6
Subject to x1 + x2 ≥ 2
x1, x2 ≥ 0

Solution: For λ ≠ 0,
x1 = −1/λ; x2 = −1/λ and λ = −1
Implies x1 = 1; x2 = 1
Optimal value is Z = 0
Problem: Maximize Z = 12x1 + 21x2 + 2x1x2 − 2x1² − 2x2²
Subject to x1 + x2 ≤ 10; x2 ≤ 8
x1, x2 ≥ 0

Solution: Initially, we check the sufficient condition of the Karush-Kuhn-Tucker conditions.
For the KKT conditions to be sufficient for Z to be a maximum,
f(x) should be concave and g(x) should be convex.

∂f/∂x1 = 12 + 2x2 − 4x1; ∂f/∂x2 = 21 + 2x1 − 4x2

The Hessian matrix is H(X) = [ −4   2 ]
                             [  2  −4 ]

Here H1 = −4 < 0 and H2 = 12 > 0.
The matrix H is negative definite, as all its leading principal minors are nonzero and
alternate in sign with the first one negative. So H is negative definite and the function f(X)
is concave; the constraints are linear and hence convex.
The Lagrangian is L = 12x1 + 21x2 + 2x1x2 − 2x1² − 2x2² − λ1(x1 + x2 − 10) − λ2(x2 − 8)

The KKT conditions are:
∂L/∂x1 = 12 + 2x2 − 4x1 − λ1 = 0 → (1)
∂L/∂x2 = 21 + 2x1 − 4x2 − λ1 − λ2 = 0 → (2)
λ1(x1 + x2 − 10) = 0 → (3)
λ2(x2 − 8) = 0 → (4)
λ1, λ2 ≥ 0 → (5)
x1 + x2 ≤ 10 → (6)
x2 ≤ 8 → (7)
x1, x2 ≥ 0 → (8)
Case 01. λ1 = 0; λ2 = 0
Then (1) and (2) give 12 + 2x2 − 4x1 = 0 and 21 + 2x1 − 4x2 = 0.
By solving these two equations we get x1 = 15/2; x2 = 9.
These values do not satisfy equations (6) and (7), so this case is discarded.

Case 02. λ1 = 0; λ2 > 0
From (4), x2 = 8, and then (1) gives x1 = 7.
But equation (6) is violated with these values.
Thus, this case is also discarded.

Case 03. λ1 > 0; λ2 > 0
From (3) and (4), x1 + x2 − 10 = 0 and x2 = 8, so x1 = 2.
Then (1) gives λ1 = 12 + 16 − 8 = 20, and (2) gives λ2 = 21 + 4 − 32 − 20 = −27 < 0, which
violates (5); this case is discarded as well.
Case 04. λ1 > 0; λ2 = 0
From (3), x1 + x2 = 10 → (A), and (1), (2) give
12 + 2x2 − 4x1 = λ1 and 21 + 2x1 − 4x2 = λ1

From the above equations
12 + 2x2 − 4x1 = 21 + 2x1 − 4x2
Implies 6x1 − 6x2 + 9 = 0 → (B)

Solving (A) and (B): x1 = 4.25, x2 = 5.75, and λ1 = 12 + 2(5.75) − 4(4.25) = 13/2 > 0

The optimal solution is (x1, x2; λ1, λ2) = (4.25, 5.75; 13/2, 0)

The optimal value is Z = 12(4.25) + 21(5.75) + 2(4.25)(5.75) − 2(4.25)² − 2(5.75)²

Z = 947/8
Subject to x1 + x2 ≤ 2; 2x1 + 3x2 ≤ 12
x1, x2 ≥ 0; x3 is unrestricted.
Solution:
H is negative definite and the function f(X) is concave.
Problem: Minimize f(X) = (x1 − 2)² + (x2 − 1)²
Subject to x1² − x2 ≤ 0; x1 + x2 ≤ 2
x1, x2 ≥ 0

Solution: The constraints are x1² − x2 ≤ 0 and x1 + x2 ≤ 2 with x1, x2 ≥ 0.
Here f(X) = (x1 − 2)² + (x2 − 1)²

∂f/∂x1 = 2(x1 − 2); ∂f/∂x2 = 2(x2 − 1)

The Hessian matrix is H(X) = [ 2  0 ]
                             [ 0  2 ]

Here H1 = 2 > 0 and H2 = 4 > 0.
The matrix H is positive definite, as all its leading principal minors are positive, so the
function f(X) is convex.
The Lagrangian is L = (x1 − 2)² + (x2 − 1)² − λ1(x1² − x2) − λ2(x1 + x2 − 2)

The KKT conditions are:
∂L/∂x1 = 2(x1 − 2) − 2λ1x1 − λ2 = 0 → (1)
∂L/∂x2 = 2(x2 − 1) + λ1 − λ2 = 0 → (2)
λ1(x1² − x2) = 0 → (3)
λ2(x1 + x2 − 2) = 0 → (4)
λ1, λ2 ≤ 0 → (5) (for a minimization with ≤ constraints and L = f − Σ λi gi, the multipliers
are non-positive)
x1² − x2 ≤ 0 → (6)
x1 + x2 ≤ 2 → (7)
x1, x2 ≥ 0 → (8)
Case 01. λ1 = 0; λ2 = 0
Then (1) gives x1 = 2 and (2) gives 2(x2 − 1) = 0, so x2 = 1.
But x1² − x2 = 4 − 1 = 3 > 0 violates (6), so this case is discarded.

Case 02. λ1 = 0; λ2 ≠ 0
From (1) and (2): 2(x1 − 2) = λ2 and 2(x2 − 1) = λ2
Implies 2(x1 − 2) = 2(x2 − 1)
x1 − x2 = 1 → (A)
From (4): x1 + x2 = 2 → (B)
From (A) and (B), x1 = 3/2 and x2 = 1/2.
But then x1² − x2 = 9/4 − 1/2 = 7/4 > 0, violating (6).
Thus, this case is also discarded.

Case 03. λ1 ≠ 0; λ2 ≠ 0
From (3) and (4): x1² = x2 and x1 + x2 = 2, so
x1 + x1² − 2 = 0
x1 = 1, −2
As x1 ≥ 0, x1 = 1 and x2 = 1.
Now from equations (1) and (2):
2(1 − 2) − 2λ1 − λ2 = 0 ⇒ 2λ1 + λ2 + 2 = 0
2(1 − 1) + λ1 − λ2 = 0 ⇒ λ1 = λ2
Hence λ1 = λ2 = −2/3 ≤ 0, which is consistent with (5) for this minimization.

The stationary point is (x1, x2; λ1, λ2) = (1, 1; −2/3, −2/3), and the minimum value is
f(1, 1) = (1 − 2)² + (1 − 1)² = 1.
Problem: Maximize f(x, y) = xy subject to x² + y² ≤ 2; x, y ≥ 0

The Lagrangian is L = xy − λ(x² + y² − 2)

The KKT conditions are:
∂L/∂x = y − 2λx = 0 → (1)
∂L/∂y = x − 2λy = 0 → (2)
λ(x² + y² − 2) = 0 → (3)
λ ≥ 0 → (4)
x² + y² ≤ 2 → (5)
x, y ≥ 0 → (6)

For λ > 0, substituting x = 2λy from (2) into (1):
y = 2λ(2λy) = 4λ²y
y − 4λ²y = 0
y(1 − 4λ²) = 0

For y ≠ 0, 4λ² = 1, so λ² = 1/4 and λ = 1/2 (taking the non-negative root).

Now x = 2λy = y, and from x² + y² = 2:
x² + x² = 2
x² = 1 ⇒ x = 1 and y = 1

The stationary point (x, y; λ) = (1, 1; 1/2) is the optimal point, and the maximum value is
f(1, 1) = 1.
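As a crude cross-check, a brute-force grid search over the feasible region should find the
maximum of xy near (1, 1). A minimal sketch; the grid resolution is an arbitrary choice:

```python
import math

# brute-force check that (1, 1) maximizes xy over x^2 + y^2 <= 2 with x, y >= 0
best = (0.0, 0.0, 0.0)          # (product, x, y)
steps = 400
for i in range(steps + 1):
    for j in range(steps + 1):
        x = math.sqrt(2) * i / steps
        y = math.sqrt(2) * j / steps
        if x * x + y * y <= 2 and x * y > best[0]:
            best = (x * y, x, y)

print(best)   # product close to 1 at a point close to (1, 1)
```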
After formulating a linear programming problem, our aim is to determine the values of the
decision variables that give the optimum (maximum or minimum) value of the objective function.
Linear programming problems that involve only two variables can be solved by the graphical
method. If the problem has three or more variables, the graphical method is impractical.
Problem: Maximize Z = 2x1 + 5x2
Subject to x1 + 4x2 ≤ 24
3x1 + x2 ≤ 21
x1 + x2 ≤ 9 and
x1, x2 ≥ 0

Solution: Maximize Z = 2x1 + 5x2
Subject to x1 + 4x2 ≤ 24
3x1 + x2 ≤ 21
x1 + x2 ≤ 9 and
x1, x2 ≥ 0
First, we have to find the feasible region using the given conditions.
Since both the decision variables x1 and x2 are non-negative, the solution lies in the first
quadrant.
Writing the inequalities as equalities: x1 + 4x2 = 24; 3x1 + x2 = 21; x1 + x2 = 9

Clearly,
The line x1 + 4x2 = 24, i.e., x1/24 + x2/6 = 1, passes through (24, 0) and (0, 6).
The line 3x1 + x2 = 21, i.e., x1/7 + x2/21 = 1, passes through (7, 0) and (0, 21).
The line x1 + x2 = 9, i.e., x1/9 + x2/9 = 1, passes through (9, 0) and (0, 9).
From the graph the shaded region is the feasible region, and the corner points of the feasible
region are O ( 0, 0 ) ; A ( 7, 0 ) ; B ( 6,3) ; C ( 4,5 ) ; D ( 0, 6 ) .
Corner point    Value of Z = 2x1 + 5x2
O(0, 0)         0
A(7, 0)         14
B(6, 3)         27
C(4, 5)         33
D(0, 6)         30
Here the optimum point is C ( 4,5 ) and the optimum value is 33.
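The corner-point method can be automated by intersecting constraint boundaries pairwise and
keeping the feasible intersections. A minimal sketch for this problem, assuming Python; the
representation of each constraint as an (a, b, c) triple meaning a·x1 + b·x2 ≤ c is an
illustrative choice:

```python
from itertools import combinations

# constraints a*x1 + b*x2 <= c, including the axes x1 >= 0 and x2 >= 0
cons = [(1, 4, 24), (3, 1, 21), (1, 1, 9), (-1, 0, 0), (0, -1, 0)]

def intersect(c1, c2):
    # solve the two boundary equations by Cramer's rule
    (a1, b1, d1), (a2, b2, d2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None     # parallel boundaries
    return ((d1 * b2 - d2 * b1) / det, (a1 * d2 - a2 * d1) / det)

feasible = []
for c1, c2 in combinations(cons, 2):
    p = intersect(c1, c2)
    if p and all(a * p[0] + b * p[1] <= c + 1e-9 for a, b, c in cons):
        feasible.append(p)

best = max(feasible, key=lambda p: 2 * p[0] + 5 * p[1])
print(best, 2 * best[0] + 5 * best[1])   # (4.0, 5.0) with Z = 33.0
```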
Subject to x1 + x2 30
x2 12
x1 20 and
x1 , x2 0
Subject to 4 x1 + x2 40
2 x1 + 3 x2 90 and
x1 , x2 0
Solution:
The number of units of product X and the number of units of product Y are our decision
variables.
The constraints are the available processing times on the machines. The available processing
time on machine A is forecast to be 40 hours. Since this is given in hours whereas the
processing times for X and Y are given in minutes, we convert hours into minutes:
40 hours = 2400 minutes. Therefore, 50x + 24y ≤ 2400.
Similarly, the available time on the second machine gives 30x + 33y ≤ 2100.
The total demand for products X and Y in the current week is forecast as 75 units and 95 units
respectively, whereas initially there are 30 units of X and 90 units of Y in stock. To meet
the demand, x ≥ 75 − 30, i.e., x ≥ 45, and y ≥ 95 − 90, i.e., y ≥ 5.
To maximise the number of units of X and the number of units of Y left in stock at the end of
the week, the objective is Z = (x + 30 − 75) + (y + 90 − 95) = x + y − 50.
Thus the constraints are
50x + 24y ≤ 2400
30x + 33y ≤ 2100
x ≥ 45
y ≥ 5
We write these inequalities as equalities and plot the equalities on a graph as shown below.
Here the line 50x + 24y = 2400 passes through the points (48, 0) and (0, 100).
Here, we obtain the corner points A(45, 6.25), B(45, 5) and C(45.6, 5).

Corner point    Value of Z = x + y − 50
A(45, 6.25)     1.25
B(45, 5)        0
C(45.6, 5)      0.6

The maximum is at A(45, 6.25): the number of units of X should be 45 and the number of units
of Y should be 6.25 to obtain the maximum sum of products left in stock in the current week.
Solution: Let x and y denote the number of bulbs of type A and type B respectively.
Objective function:
Profit on x bulbs in type A is 15x
Profit on y bulbs in type B is 10 y
Total profit is 15 x + 10 y
Profit is to be maximized, therefore, the objective function is
Maximize Z = 15 x + 10 y
Constraints:
The raw material required for each bulb of type A is twice that of a bulb of type B;
i.e., for x bulbs of type A the raw material required is 2x, and for y bulbs of type B it is y.
The raw material is sufficient only for 1000 bulbs per day,
i.e., 2x + y ≤ 1000.
Only 400 clips are available per day for bulb A,
i.e., x ≤ 400.
Only 700 clips are available per day for bulb B,
i.e., y ≤ 700.
Non-negative restrictions:
Since the number of bulbs is non-negative, we have x ≥ 0 and y ≥ 0.
Thus, the mathematical formulation of the LPP is
Maximize Z = 15x + 10y
Subject to 2x + y ≤ 1000
x ≤ 400
y ≤ 700 and
x ≥ 0; y ≥ 0
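The formulated LPP can be solved by the same corner-point enumeration used in the graphical
method. A minimal sketch, assuming Python; the (a, b, c) triples encode a·x + b·y ≤ c:

```python
from itertools import combinations

# 2x + y <= 1000, x <= 400, y <= 700, x >= 0, y >= 0
cons = [(2, 1, 1000), (1, 0, 400), (0, 1, 700), (-1, 0, 0), (0, -1, 0)]

def vertex(c1, c2):
    # intersection of the two boundary lines by Cramer's rule
    (a1, b1, d1), (a2, b2, d2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((d1 * b2 - d2 * b1) / det, (a1 * d2 - a2 * d1) / det)

pts = []
for c1, c2 in combinations(cons, 2):
    p = vertex(c1, c2)
    if p is not None and all(a * p[0] + b * p[1] <= c + 1e-9 for a, b, c in cons):
        pts.append(p)

best = max(pts, key=lambda p: 15 * p[0] + 10 * p[1])
print(best, 15 * best[0] + 10 * best[1])   # (150.0, 700.0) with Z = 9250.0
```

Under this formulation, 150 bulbs of type A and 700 of type B give the maximum profit.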
Solution: Let x1 and x2 denote the number of electric kettles of the two types, ordinary and
auto-cut, respectively.
Objective function:
Profit on x1 ordinary kettles is 100x1.
Constraints:
The assembling and testing time required for x1 units of ordinary kettles is 0.8x1 hours and
for x2 units of auto-cut kettles is 1.2x2 hours, so 0.8x1 + 1.2x2 ≤ 720.
Non-negative restrictions:
Since the number of kettles of both types is non-negative, we have x1 ≥ 0 and x2 ≥ 0.
x1 ≤ 600
x2 ≤ 400 and
x1 ≥ 0; x2 ≥ 0
Writing the inequalities as equalities, we get:
0.8x1 + 1.2x2 = 720; this line passes through (900, 0) and (0, 600).
40 Compiled by: Magisetty Harikrushna DSSM
MAPTSMS; MIAENG; MAMS; MIAEPT M.Sc., B.Ed., M.Phil., (Ph.D.) Reference Material
Problem: Determine the slope and intercept and graph the linear inequalities:
i) 5 x + 2 y −6
ii) 8 x − 5 y 5
Solution: The given linear inequalities 5 x + 2 y −6; 8 x − 5 y 5
Writing in to equalities 5 x + 2 y = −6 and 8 x − 5 y = 5
5x + 2y = −6 ⇒ 2y = −5x − 6
y = −(5/2)x − 3, which is in the form y = mx + c.
Here m = −5/2 is the slope, and the y-intercept is c = −3.

8x − 5y = 5 ⇒ 5y = 8x − 5
y = (8/5)x − 1, which is in the form y = mx + c.
Here m = 8/5 is the slope, and the y-intercept is c = −1.

The line 5x + 2y = −6 passes through the points (−6/5, 0) and (0, −3).
The line 8x − 5y = 5 passes through the points (5/8, 0) and (0, −1).
Optimization in ML
In fact, behind every machine learning (and deep learning) algorithm, some optimization is
involved. Let us take the simplest possible example. Consider the following 3 data points:
X Y
1 4.6
3 12.8
6 15.5
Everybody conversant with machine learning will immediately recognize that we are referring
to X as the independent variable (also called “Features” or “Attributes”), and the Y is the
dependent variable (also referred to as the “Target” or “Outcome”). Hence, the overall task of
any machine is to find the relationship between X & Y. This relationship is
actually “Learned” by the machine from the DATA, and hence we call the term Machine
Learning. We, humans, learn from our experiences, similarly, the same experience is fed into
the machine in the form of data.
Now, let us say that we want to find the best fit line through the above 3 data points. The
following plot shows these 3 data points in blue circles. Also shown is the green line, which
we are claiming as the “Best-Fit Line” through these 3 data points. Also, have shown a “Poor-
Fitting” line (the red line) for comparison.
The main objective is to find the equation of the best-fitting straight line through these
three data points.

Ŷ = a0 + a1X → (1) is the equation of the best-fit line (the green line in the above plot),
where a1 is the slope of the line and a0 is its intercept. In machine learning, this best fit
is also called the Linear Regression (LR) model, and a0 and a1 are also called model weights
or model coefficients.

The predicted values from the Linear Regression model (Ŷ) are represented by orange squares in
the above plot. Obviously, the predicted values are not exactly the same as the actual values
of Y (blue circles); the vertical difference represents the error in the prediction, given by

Errorᵢ = Yᵢ − Ŷᵢ → (2) for any i-th data point.

Now we claim that this best-fit line will have the minimum error for prediction among all
possible infinite random "poor-fit" lines. The total error across all the data points is
expressed as the Mean Squared Error (MSE) function, which will be minimum for the best-fit
line:

MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² → (3), where 'n' is the total number of data points in the dataset.
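For simple linear regression the minimizing coefficients have a closed form, so the best-fit
line and its MSE for the three data points can be computed directly. A minimal sketch,
assuming Python:

```python
# closed-form least-squares line through the three data points above
X = [1, 3, 6]
Y = [4.6, 12.8, 15.5]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
# a1 = S_xy / S_xx, a0 = mean(Y) - a1 * mean(X)
a1 = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a0 = my - a1 * mx

pred = [a0 + a1 * x for x in X]
mse = sum((y - p) ** 2 for y, p in zip(Y, pred)) / n
print(a0, a1, mse)
```

Any other line through these points yields a larger MSE than this least-squares fit.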
There are two types of Supervised learning tasks, Regression & Classification.
Regression
In Regression problems, the objective is to find the "best-fit" line that passes through the
majority of the data points and has the minimum value of an error function, also called the
cost function or loss function, which is commonly the Mean Squared Error (MSE). Why MSE, and
not a simple sum of error terms or the mean absolute error, will be discussed in another
article. Hence, as discussed in the above section, the main objective is to find the line
having the minimum value of this error function.
The following table lists the Error functions being minimized in the most commonly used
regressor algorithms.