Computer Numerical and Statistical Method Unit 2 Calicut University Note
Correlation
2 variables are said to be correlated if a change in 1 variable results in a corresponding change in
the other variable, i.e., when 2 variables move together, we say that they are correlated.
For example, when the price of a commodity rises, the supply of that commodity also rises.
When the values of 2 variables move in the same direction, correlation is said to be positive,
i.e., an increase in the value of 1 variable results in an increase in the value of the other variable, or
a decrease in the value of 1 variable results in a decrease in the value of the other variable.
X Y
1 2
2 4
3 6
4 8
When the values of 2 variables move in opposite directions, so that an increase in the value of 1 variable
results in a decrease in the value of the other variable, the correlation is said to be negative.
X Y
1 4
2 3
3 2
4 1
Linear and nonlinear correlation
When the amount of change in 1 variable leads to a constant ratio of change in the other variable,
correlation is said to be linear.
When the amount of change in 1 variable is not in constant ratio to the change in the other variable,
the correlation is said to be nonlinear.
X Y
1 1
2 2
3 3
4 4
Simple, multiple and partial correlation
In the study of the relationship between variables, if there are only 2 variables, the correlation is said to
be simple.
When 3 or more variables are studied together, the correlation is said to be multiple.
Thus, the relationship between yield and both rain and temperature together is multiple correlation.
In partial correlation, we study the relationship of 1 variable with 1 of the other variables, assuming
that the remaining variables stay constant.
e.g., there are 3 variables: yield, rain and temperature. The relationship between yield and rain
(assuming temperature is constant) is the partial correlation.
When the coefficient of correlation is 0, it indicates that there is no correlation between the
variables.
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
x    2   3   4   5   6   7   8
y    4   5   6  12   9   5   4
xy   8  15  24  60  54  35  32
x²   4   9  16  25  36  49  64
y²  16  25  36 144  81  25  16
Σx = 35
Σy = 45
Σxy = 228
Σx² = 203
Σy² = 343
(Σx)² = 1225
(Σy)² = 2025
n = 7
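$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
$$= \frac{(7 \times 228) - (35 \times 45)}{\sqrt{(7 \times 203) - 1225}\,\sqrt{(7 \times 343) - 2025}}$$
$$= \frac{21}{\sqrt{196}\,\sqrt{376}}$$
$$= \frac{21}{14 \times 19.391}$$
$$\approx 0.077$$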
Price (x), 7, 8, 9, 6, 5
Demand (y), 8, 6, 7, 9, 10
Answer = -0.9
x    7   8   9   6    5
y    8   6   7   9   10
xy  56  48  63  54   50
x²  49  64  81  36   25
y²  64  36  49  81  100
n = 5
Σx = 35
Σy = 40
Σxy = 271
Σx² = 255
Σy² = 330
(Σx)² = 1225
(Σy)² = 1600
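$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
$$= \frac{(5 \times 271) - (35 \times 40)}{\sqrt{(5 \times 255) - 1225}\,\sqrt{(5 \times 330) - 1600}}$$
$$= \frac{-45}{\sqrt{50}\,\sqrt{50}}$$
$$= \frac{-45}{7.071 \times 7.071}$$
$$= \frac{-45}{50}$$
$$= -0.9$$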
Properties of correlation coefficient
Correlation coefficient has a well-defined formula.
Coefficient of correlation doesn’t change with reference to change of origin or change of scale.
$$\text{Probable error} = \frac{0.6745\,(1 - r^2)}{\sqrt{n}}$$
r = coefficient of correlation
$$\text{Standard error (SE)} = \frac{1 - r^2}{\sqrt{n}}$$
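Q: If r = 0.6 and n = 64, find the probable error and SE
$$\text{Probable error} = \frac{0.6745\,(1 - r^2)}{\sqrt{n}}$$
$$= \frac{0.6745\,(1 - 0.6^2)}{\sqrt{64}}$$
$$= \frac{0.6745 \times 0.64}{8}$$
$$\approx 0.054$$
$$\text{SE} = \frac{1 - 0.6^2}{\sqrt{64}} = \frac{0.64}{8} = 0.08$$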
Q: if r = 0.89 probable error = 0.023 find n
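The two formulas above, and the rearrangement needed when n is the unknown, can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names are made up for the example, and `sample_size` simply solves the probable error formula for n.

```python
from math import sqrt

PE_FACTOR = 0.6745  # constant from the probable error formula

def standard_error(r, n):
    """SE = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

def probable_error(r, n):
    """Probable error = 0.6745 * SE."""
    return PE_FACTOR * standard_error(r, n)

def sample_size(r, pe):
    """Solve PE = 0.6745 (1 - r^2) / sqrt(n) for n (hypothetical helper)."""
    return (PE_FACTOR * (1 - r ** 2) / pe) ** 2

print(round(standard_error(0.6, 64), 4))   # 0.08
print(round(probable_error(0.6, 64), 4))   # 0.054
print(round(sample_size(0.89, 0.023), 1))  # 37.2
```

Since n must be a whole number of observations, the last result suggests n is about 37.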
Q: Calculate Karl Pearson's coefficient of correlation between x and y from the following data
n = 10
Σx = 35
Σy = 28
Σx² = 203
Σy² = 140
Σxy = 168
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
$$= \frac{(10 \times 168) - (35 \times 28)}{\sqrt{(10 \times 203) - 35^2}\,\sqrt{(10 \times 140) - 28^2}}$$
$$= \frac{700}{\sqrt{805}\,\sqrt{616}}$$
$$= \frac{700}{704.19}$$
$$\approx 0.99$$
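The worked examples above all apply the same formula, so they can be checked with a short Python sketch. The function names are illustrative assumptions, not part of the notes; one function works from the summary totals and the other computes those totals from raw pairs.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the totals n, Σx, Σy, Σxy, Σx², Σy²."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

def pearson_r(xs, ys):
    """Pearson's r from paired raw observations."""
    return pearson_from_sums(
        len(xs), sum(xs), sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
    )

# Price and demand example
print(round(pearson_r([7, 8, 9, 6, 5], [8, 6, 7, 9, 10]), 3))   # -0.9
# Example given only as totals (n = 10)
print(round(pearson_from_sums(10, 35, 28, 168, 203, 140), 3))  # 0.994
```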
Spearman’s method (rank correlation)
According to Spearman's method, the formula for the rank correlation coefficient is
$$r = 1 - \frac{6 \sum d^2}{n\,(n^2 - 1)}$$
where d is the difference between ranks and n is the number of observations.
Q: The rankings of 10 individuals at the start and at the finish of a training course are as follows:
Individuals: a, b, c, d, e, f, g, h, i, j
D,5,2,0,7,2,82,1,4,3
2
d ,5,2,0,7,2,82,1,4,3
n = 10
Σd² = 106
$$r = 1 - \frac{6 \sum d^2}{n\,(n^2 - 1)}$$
$$r = 1 - \frac{6 \times 106}{10\,(99)}$$
$$r = 1 - \frac{636}{990}$$
$$r \approx 0.358$$
Q: 10 competitors in a beauty contest are ranked by 3 judges in the following order:
Judge 1: 1, 6, 5, 10, 3, 2, 4, 9, 7, 8
Judge 2: 3, 5, 8, 4, 7, 10, 2, 1, 6, 9
Judge 3: 6, 4, 9, 8, 1, 2, 3, 10, 5, 7
Use the rank correlation coefficient to discuss which pair of judges has the nearest approach to common
taste in beauty.
Pair 1:
Judge 1: 1, 6, 5, 10, 3, 2, 4, 9, 7, 8
Judge 2: 3, 5, 8, 4, 7, 10, 2, 1, 6, 9
d: 2, 1, 3, 6, 4, 8, 2, 8, 1, 1
d²: 4, 1, 9, 36, 16, 64, 4, 64, 1, 1
Σd² = 200
n = 10
$$\text{Rank correlation} = 1 - \frac{6\,(200)}{10\,(100 - 1)}$$
$$= 1 - \frac{1200}{990}$$
$$= -0.212$$
Pair 2:
Judge 2: 3, 5, 8, 4, 7, 10, 2, 1, 6, 9
Judge 3: 6, 4, 9, 8, 1, 2, 3, 10, 5, 7
d: 3, 1, 1, 4, 6, 8, 1, 9, 1, 2
d²: 9, 1, 1, 16, 36, 64, 1, 81, 1, 4
Σd² = 214
n = 10
$$\text{Rank correlation} = 1 - \frac{6\,(214)}{10\,(100 - 1)}$$
$$= 1 - \frac{1284}{990}$$
$$= -0.297$$
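Pair 3:
Judge 1: 1, 6, 5, 10, 3, 2, 4, 9, 7, 8
Judge 3: 6, 4, 9, 8, 1, 2, 3, 10, 5, 7
d: 5, 2, 4, 2, 2, 0, 1, 1, 2, 1
d²: 25, 4, 16, 4, 4, 0, 1, 1, 4, 1
Σd² = 60
n = 10
$$\text{Rank correlation} = 1 - \frac{6\,(60)}{10\,(100 - 1)}$$
$$= 1 - \frac{360}{990}$$
$$= 0.636$$
The rank correlation is highest for judges 1 and 3, so this pair has the nearest approach to common taste in beauty.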
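The three pairwise comparisons can be reproduced with a small Python sketch of Spearman's formula. The function and variable names are illustrative assumptions; the ranks are the ones given in the question.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Rank correlation 1 - 6Σd² / (n(n² - 1)) for two lists of ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge_2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge_3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

pairs = {
    "judges 1 and 2": (judge_1, judge_2),
    "judges 2 and 3": (judge_2, judge_3),
    "judges 1 and 3": (judge_1, judge_3),
}
for name, (a, b) in pairs.items():
    print(name, round(spearman_from_ranks(a, b), 3))

# The pair with the largest coefficient agrees most closely.
closest = max(pairs, key=lambda name: spearman_from_ranks(*pairs[name]))
print(closest)  # judges 1 and 3
```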
Q:
X,6,8,12,15,18,20,24,28,31
Y,10,12,15,15,18,25,22,26,28
Q:
X,50,60,55,65,75,70,75,80,90,80
Y, 10,14,15,11,12,15,16,20,18,19
Q: Find the rank correlation coefficient between poverty and overcrowding from the following table
X: 17, 13, 15, 16, 6, 11, 14, 9, 7, 12
Rank (x): 1, 5, 3, 2, 10, 7, 4, 8, 9, 6
Y: 36, 46, 35, 24, 12, 18, 27, 22, 2, 8
Rank (y): 2, 1, 3, 5, 8, 7, 4, 6, 10, 9
d: 1, 4, 0, 3, 2, 0, 0, 2, 1, 3
d²: 1, 16, 0, 9, 4, 0, 0, 4, 1, 9
Σd² = 44
n = 10
$$\text{Rank correlation} = 1 - \frac{6\,(44)}{10\,(100 - 1)}$$
$$= 1 - \frac{264}{990}$$
$$\approx 0.733$$
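When only the raw values are given, the ranks have to be worked out first. The sketch below does this in Python under two assumptions that are not stated in the notes: rank 1 goes to the largest value (as in the Rank (x) row above), and there are no tied values. Tied values would need averaged ranks, which this sketch does not handle. The function names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 = largest value. Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Rank the raw values, then apply 1 - 6Σd² / (n(n² - 1))."""
    rank_x, rank_y = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

x_values = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
y_values = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]

print(ranks_descending(x_values))  # [1, 5, 3, 2, 10, 7, 4, 8, 9, 6]
print(ranks_descending(y_values))  # [2, 1, 3, 5, 8, 7, 4, 6, 10, 9]
print(round(spearman(x_values, y_values), 3))  # 0.733
```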
X,68,64,75,50,64,80,75,40,55,64
Y,62,58,68,45,81,16,68,48,50,70
Rank (x): in x, 64 occurs 3 times, so m = 3
Test 1 ,80,45,55,58,55,60,45,68,70,45,85
Test 2,82,56,50,43,56,62,64,65,70,64,90
Scatter diagram
This is a graphical method of studying correlation between 2 variables.
A scatter diagram is a visual aid to show the presence or absence of correlation between 2 variables.
Regression analysis
It is the estimation or prediction of the unknown value of 1 variable from the known value of
the other variable.
It is a statistical device used to study the relationship between 2 or more related variables.
The variable which influences the values or is used for prediction is called the independent variable.
The variable whose value is predicted is called the dependent variable.
In multiple regression analysis there are more than 2 variables, and we try to find out the effect of 2
or more independent variables on 1 dependent variable.
If the regression curve is not a straight line, we say that the regression between the variables
under study is nonlinear.
$$b_{yx} \times b_{xy} = r^2$$
byx and bxy will have the same sign as r.
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
Both regression coefficients cannot be greater than 1 at the same time, i.e., one of them can be greater than 1, or
both can be less than 1.
Properties of regression lines
The 2 regression lines intersect at $(\bar{x}, \bar{y})$, the means of x and y.
Regression equation of y on x
$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
$$b_{yx} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$
Regression equation of x on y
$$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
$$b_{xy} = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2}$$
where byx and bxy are known as regression coefficients.
x,2,3,4,5,6
y,3,5,4,8,9
x 5 6 7 3 2 23
y 4 5 8 2 1 23
$$b_{xy} = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2}$$
Q: From the following data on the ages of husbands and wives, form the 2 regression equations and calculate the
husband's age when the wife's age is 16. Also find the age of the wife when the husband's age is 40.
X,36,23,27,28,28,29,30,31,33,35
Y,29,18,20,22,27,21,29,27,29,28
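A minimal Python sketch of this problem follows, using the regression coefficient formulas from the notes. It assumes that X holds the husbands' ages and Y the wives' ages, which the question does not state explicitly; the function names are illustrative.

```python
def regression_coefficients(xs, ys):
    """Return (mean of x, mean of y, byx, bxy) from paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    shared = n * sum_xy - sum_x * sum_y          # common numerator
    b_yx = shared / (n * sum_x2 - sum_x ** 2)    # y on x
    b_xy = shared / (n * sum_y2 - sum_y ** 2)    # x on y
    return sum_x / n, sum_y / n, b_yx, b_xy

# Assumption: X = husband's age, Y = wife's age
husband = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
wife = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(husband, wife)

def wife_age_for(husband_age):
    """Regression of y on x: y - y_bar = byx (x - x_bar)."""
    return y_bar + b_yx * (husband_age - x_bar)

def husband_age_for(wife_age):
    """Regression of x on y: x - x_bar = bxy (y - y_bar)."""
    return x_bar + b_xy * (wife_age - y_bar)

print(x_bar, y_bar)                    # 30.0 25.0
print(round(b_yx, 4), round(b_xy, 4))  # 0.8913 0.75
print(round(husband_age_for(16), 2))   # 23.25
print(round(wife_age_for(40), 2))      # 33.91
```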
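Relation between correlation coefficient (r) and regression coefficients (byx and bxy)
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
$$r^2 = b_{yx}\,b_{xy}$$
$$r = \pm\sqrt{b_{yx}\,b_{xy}}$$
r takes the same sign as the regression coefficients.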
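Q: Find r if byx = −0.2 and bxy = −0.7
$$r = \pm\sqrt{b_{yx}\,b_{xy}} = \pm\sqrt{(-0.2)(-0.7)} = \pm\sqrt{0.14}$$
Both regression coefficients are negative, so r is negative:
r = −0.374
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
Q: Find byx if 2x + 4y − 5 = 0 is the regression equation of y on x
Given 2x + 4y − 5 = 0,
4y = −2x + 5
y = −0.5x + 1.25
Therefore, byx = −0.5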
The principle of least squares says that the sum of the squares of the errors between the observed
values and the corresponding estimated values should be the least.
For the straight line y = a + bx, the normal equations are:
Σy = na + bΣx
Σxy = aΣx + bΣx²
Q: Fit a straight line to the following data. Estimate the value of y when x = 3.5
x: 1, 2, 3, 4, 5
y: 14, 13, 4, 5, 2
xy: 14, 26, 12, 20, 10
x²: 1, 4, 9, 16, 25
Σx = 15
Σy = 38
Σxy = 82
Σx² = 55
Normal equations:
38 = 5a + 15b
82 = 15a + 55b
Equation 1: 5a + 15b = 38
Equation 2: 15a + 55b = 82
Equation 3 (equation 1 × 3): 15a + 45b = 114
Equation 3 − equation 2:
−10b = 32
b = 32 / (−10)
b = −3.2
Substitute b = −3.2 in equation 1:
5a + 15(−3.2) = 38
5a = 38 + 48
5a = 86
a = 86 / 5
a = 17.2
Apply a and b to y = a + bx:
y = 17.2 − 3.2x
Applying x = 3.5 to the equation:
y = 17.2 − 3.2(3.5) = 17.2 − 11.2 = 6
x: 1, 2, 3, 4, 6, 8
y: 2.4, 3, 3.6, 4, 5, 6
x²: 1, 4, 9, 16, 36, 64
xy: 2.4, 6, 10.8, 16, 30, 48
Σx = 24
Σy = 24
Σxy = 113.2
Σx² = 130
Normal equations are:
24 = 6a + 24b
113.2 = 24a + 130b
Equation 1: 6a + 24b = 24
Equation 2: 24a + 130b = 113.2
Equation 3 (equation 1 × 4): 24a + 96b = 96
Equation 2 − equation 3:
34b = 17.2
b = 17.2 / 34
b ≈ 0.506
Substitute b = 0.506 in equation 1:
6a + 24(0.506) = 24
6a + 12.144 = 24
6a = 24 − 12.144
6a = 11.856
a = 11.856 / 6
a ≈ 1.98
Substituting a and b into the equation y = a + bx:
y = 1.98 + 0.506x
Notes
When the equation of the straight line is y = ax + b, the normal equations are:
Σy = nb + aΣx
Σxy = aΣx² + bΣx
Q: Fit a straight line by the method of least squares for the following data
x: 0, 1, 2, 3, 4
y: 1, 1.8, 3.3, 4.5, 6.3
x²: 0, 1, 4, 9, 16
xy: 0, 1.8, 6.6, 13.5, 25.2
Σx = 10
Σy = 16.9
Σx² = 30
Σxy = 47.1
Normal equations:
16.9 = 5b + 10a
47.1 = 30a + 10b
Equation 1: 10a + 5b = 16.9
Equation 2: 30a + 10b = 47.1
Equation 3 (equation 1 × 3): 30a + 15b = 50.7
Equation 3 − equation 2:
5b = 3.6
b = 3.6 / 5 = 0.72
Substitute b = 0.72 in equation 1:
10a + 3.6 = 16.9
10a = 16.9 − 3.6
10a = 13.3
a = 13.3 / 10
a = 1.33
y = ax + b
y = 1.33x + 0.72
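The three line-fitting examples above can be checked by solving the normal equations directly. The Python sketch below is only an illustration: `fit_line` is a made-up name, and it always returns the pair (intercept, slope), so for the last example, written as y = ax + b, the slope corresponds to a and the intercept to b.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminating the intercept from the two normal equations gives the slope.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

intercept, slope = fit_line([1, 2, 3, 4, 5], [14, 13, 4, 5, 2])
print(round(intercept, 2), round(slope, 2))   # 17.2 -3.2
print(round(intercept + slope * 3.5, 2))      # 6.0

intercept, slope = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
print(round(intercept, 3), round(slope, 3))   # 1.976 0.506

intercept, slope = fit_line([0, 1, 2, 3, 4], [1, 1.8, 3.3, 4.5, 6.3])
print(round(intercept, 2), round(slope, 2))   # 0.72 1.33
```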
Probability
The probability of a given event may be defined as the numerical value given to the likelihood of the
occurrence of that event.
It is a number between 0 and 1: 0 is for an event which can't occur and 1 is for an event which is certain to occur (a sure event).
e.g., when we toss a coin, the event of getting a head is uncertain, so the probability is neither 0 nor 1
but between the 2.
Therefore, the probability of getting a head = 1/2
Random experiment
An experiment that has 2 or more outcomes, which vary in an unpredictable manner from trial to trial
when conducted under uniform conditions, is called a random experiment.
Each possible outcome of a random experiment is a sample point, e.g., when a coin is tossed, getting a head is a sample point.
Sample space
The sample space of a random experiment is the set containing all the sample points of that random
experiment.
When 2 coins are tossed, the sample space is {(head, tail), (head, head), (tail, head), (tail, tail)}
Q: A box contains 10 tickets, numbered from 1 to 10. A ticket is drawn. What is the sample
space?
Sample space = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Sample space = {(g, g, g), (g, g, b), (g, b, g), (g, b, b), (b, g, g), (b, g, b), (b, b, g), (b, b, b)}
Sample space =
{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), …, (6, 6)}
Total number of cases = 36
P(both dice show the same number) = 6/36
P(exactly 1 die shows 5) = 10/36
P(first die shows 5) = 6/36
P(the total of the numbers on the dice is 8) = 5/36
P(the total of the numbers on the dice is greater than 8) = 10/36
P(a sum of 10) = 3/36
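The favourable cases behind each of these probabilities can be counted by listing the whole sample space. The Python sketch below is an illustration only; `count_favourable` is a made-up helper, and each event is written as a condition on the pair (first die, second die).

```python
from itertools import product

# All 36 ordered outcomes (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

def count_favourable(event):
    """Number of outcomes in the sample space for which the event holds."""
    return sum(1 for outcome in sample_space if event(outcome))

events = {
    "both dice show the same number": lambda o: o[0] == o[1],
    "exactly 1 die shows 5": lambda o: (o[0] == 5) != (o[1] == 5),
    "first die shows 5": lambda o: o[0] == 5,
    "total is 8": lambda o: sum(o) == 8,
    "total is greater than 8": lambda o: sum(o) > 8,
    "total is 10": lambda o: sum(o) == 10,
}
for name, event in events.items():
    print(f"P({name}) = {count_favourable(event)}/{len(sample_space)}")
```

Counting cases this way keeps the denominators at 36, matching the unreduced fractions used in the notes.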
Q: A ball is drawn from a bag containing 4 white, 6 black and 5 green balls. Find the
probability that the ball drawn is:
1. White
2. Green
3. Black
4. Not green
5. Green or white
Total number of balls = 4 + 6 + 5 = 15
P(drawing a white ball) = 4/15
P(drawing a green ball) = 5/15
P(drawing a black ball) = 6/15
P(not green) = 10/15
P(green or white) = 9/15
Q: A card is drawn from a pack of 52 cards. What is the probability that it is:
1. A black card
2. A king
3. A queen
4. A spade
5. A spade king
6. A king or a queen
Sample space
Sample space = {Ace of Spades (♠), 2 of Spades, 3 of Spades, 4 of Spades, 5 of Spades, 6 of Spades, 7 of Spades, …}
P(drawing a black card) = 26/52
P(drawing a king) = 4/52
P(drawing a queen) = 4/52
P(drawing a spade) = 13/52
P(drawing a spade king) = 1/52
P(drawing a king or a queen) = 8/52
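Q: What is the probability that a leap year selected at random will contain 53 Sundays?
A leap year has 366 days, i.e., 52 weeks and 2 extra days. The 52 weeks have 52 Sundays, so a leap year will have 53 Sundays only if 1 of the 2 extra days is a
Sunday.
The 2 extra days can be (Sunday, Monday), (Monday, Tuesday), (Tuesday, Wednesday), (Wednesday, Thursday), (Thursday, Friday), (Friday, Saturday) or (Saturday, Sunday). 2 of these 7 cases contain a Sunday.
Therefore, P(getting 53 Sundays) = 2/7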
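The 2/7 result can be verified by brute force: try each of the 7 possible weekdays for 1 January and count the Sundays in 366 days. This Python sketch assumes, as the argument above implicitly does, that every starting weekday is equally likely; the names are illustrative.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def sundays_in_year(first_day_index, days_in_year=366):
    """Count Sundays when day 0 of the year falls on DAYS[first_day_index]."""
    return sum(1 for day in range(days_in_year)
               if (first_day_index + day) % 7 == 0)

# Starting weekdays for which the leap year contains 53 Sundays
starts_with_53 = [DAYS[start] for start in range(7)
                  if sundays_in_year(start) == 53]

print(starts_with_53)                        # ['Sunday', 'Saturday']
print(f"{len(starts_with_53)}/{len(DAYS)}")  # 2/7
```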
Q: What is the probability of getting 3 white balls in a draw of 3 balls from a box containing 5 white
and 4 black balls?
Therefore, P(getting 3 white balls) = 5C3 / 9C3 = 10/84
Q: A bag contains 7 white and 9 black balls. 3 balls are drawn together. Find the following probabilities:
P(all are black) = 9C3 / 16C3
P(all are white) = 7C3 / 16C3
P(1 white and 2 black) = (7C1 × 9C2) / 16C3
P(2 white and 1 black) = (7C2 × 9C1) / 16C3
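The combinations in the last two questions can be evaluated with Python's `math.comb`, which computes nCr. This is a minimal sketch to put numbers on the expressions above; the dictionary labels are just descriptive names.

```python
from math import comb

# 3 white balls from a box of 5 white and 4 black
print(f"{comb(5, 3)}/{comb(9, 3)}")  # 10/84

# 3 balls drawn from a bag of 7 white and 9 black
total = comb(16, 3)
cases = {
    "all are black": comb(9, 3),
    "all are white": comb(7, 3),
    "1 white and 2 black": comb(7, 1) * comb(9, 2),
    "2 white and 1 black": comb(7, 2) * comb(9, 1),
}
for name, favourable in cases.items():
    print(f"P({name}) = {favourable}/{total}")

# The four cases cover every possible draw, so their counts add up to 16C3.
print(sum(cases.values()) == total)  # True
```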