Regression
A measure of central tendency (an average) is a single central value that describes the entire data set. Methods: 1. Arithmetic mean (mean) 2. Median 3. Mode 4. Geometric mean 5. Harmonic mean
Mean: The mean of n observations is the ratio of the sum of the observations to the total number of observations. Ungrouped data / raw data:
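$\text{Mean} = \dfrac{\text{Sum of observations}}{\text{No. of observations}}$
Grouped data:
$\text{Mean} = \dfrac{\sum f x}{\sum f}$
Deviation Method:
$\text{Mean} = A + \dfrac{\sum f d}{\sum f} \times h$, where $d = \dfrac{x - A}{h}$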
1. Child-Care Community Nursery is eligible for a county social services grant as long as the average age of its children stays below 9. If these data represent the ages of all the children currently attending Child-Care, does it qualify for the grant? 8, 5, 9, 10, 9, 12, 7, 12, 13, 7, 8
2. The following data represent the ages of patients admitted to a small hospital: 85, 75, 66, 43, 40, 88, 80, 56, 56, 67, 89, 83, 65, 53, 75, 87, 83, 52, 44, 48
a) Construct a frequency distribution with classes 40-49, 50-59, etc.
b) Compute the arithmetic mean (A.M.).
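As a rough illustration of the grouped-data formulas, the sketch below builds the frequency distribution for the hospital ages and computes the mean by the direct method and by the deviation method. The assumed mean A = 64.5 (the midpoint of the 60-69 class) and the class width h = 10 are choices made here for the example, not values given in the exercise.

```python
# Ages of the 20 patients from exercise 2.
ages = [85, 75, 66, 43, 40, 88, 80, 56, 56, 67,
        89, 83, 65, 53, 75, 87, 83, 52, 44, 48]

# Classes 40-49, 50-59, ..., 80-89 as (lower, upper) pairs.
classes = [(lo, lo + 9) for lo in range(40, 90, 10)]

# Frequency f of each class and its midpoint x.
freqs = [sum(lo <= a <= hi for a in ages) for lo, hi in classes]
mids = [(lo + hi) / 2 for lo, hi in classes]
N = sum(freqs)

# Direct method: mean = sum(f * x) / sum(f).
mean_direct = sum(f * x for f, x in zip(freqs, mids)) / N

# Deviation method: mean = A + (sum(f * d) / sum(f)) * h, with d = (x - A) / h.
A = 64.5  # assumed mean: an arbitrary choice for this sketch
h = 10    # class width
devs = [(x - A) / h for x in mids]
mean_dev = A + sum(f * d for f, d in zip(freqs, devs)) / N * h

# Mean of the raw (ungrouped) ages, for comparison.
mean_raw = sum(ages) / len(ages)

print(freqs)                            # frequencies per class
print(mean_direct, mean_dev, mean_raw)  # grouped means agree; raw mean differs slightly
```

Both grouped methods give the same value, while the raw mean differs a little because grouping replaces each age by its class midpoint.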
3. National Tire Company holds reserve funds in short-term marketable securities. The ending daily balance (in $ million) of the marketable securities account for two weeks is shown below:
Week 1: 1.973 1.970 1.972 1.975 1.976
Week 2: 1.969 1.892 1.893 1.887 1.895
What was the average amount invested in marketable securities during a) the first week b) the second week c) the two-week period? d) An average balance over the two weeks of more than $1.970 million would qualify National Tire Company for higher interest rates. Does it qualify? e) If the answer to part (c) is less than $1.970 million, by how much would the last day's invested amount have to rise to qualify the company for the higher interest rates?
Merits: 1. It is based on every observation in the data. 2. It is useful for performing statistical procedures such as comparing the means of several data sets. 3. It is unique.
Demerits: 1. It is highly affected by extreme values. 2. It is not suitable for open-ended class intervals.
Ex: Compute the average of 2, 8, 9, 11. Mean = (2 + 8 + 9 + 11)/4 = 30/4 = 7.5. Mean is not suitable
Median: The median is the value that separates the ordered data into two equal parts. Raw data: 1. Arrange the data in ascending or descending order of magnitude. 2. Compute the [(n+1)/2]th term, which gives the position of the median.
$\text{Median} = L + \dfrac{\frac{N}{2} - m}{f} \times h$
where
L is the lower limit of the median class,
f is the frequency of the median class,
h is the width of the median class interval,
N is the total frequency,
m is the sum of all the class frequencies up to but not including the median class.
The median class is the class whose cumulative frequency is just greater than N/2.
1. Find the median of 5, 7, 4, 9, 5, 6, 2.
Ans: Ordered data: 2, 4, 5, 5, 6, 7, 9. n = 7 (odd), so the median is the (7+1)/2 = 4th term = 5.
2. Find the median of 8, 7, 9, 4, 8, 10, 9, 9, 3, 5.
Ans: Ordered data: 3, 4, 5, 7, 8, 8, 9, 9, 9, 10. n = 10 (even), so the median is the (10+1)/2 = 5.5th term, i.e. the average of the 5th and 6th terms = (8+8)/2 = 8.
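The positional rule used in these two answers can be sketched in a few lines of Python. The function name `median_raw` is made up for this illustration.

```python
def median_raw(values):
    """Median of raw data via the (n + 1) / 2 position rule."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) / 2           # 1-based position of the median
    lower = data[int(pos) - 1]  # term at the whole-number part of the position
    if pos == int(pos):         # odd n: the position is a whole number
        return lower
    upper = data[int(pos)]      # even n: average the two middle terms
    return (lower + upper) / 2

print(median_raw([5, 7, 4, 9, 5, 6, 2]))            # 4th term of the ordered data
print(median_raw([8, 7, 9, 4, 8, 10, 9, 9, 3, 5]))  # average of 5th and 6th terms
```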
Frequency   C.f
2           2
5           7 (m)
6 (f)       13   Median class
4           17
3           20
N/2 = 20/2 = 10, L = 4000
The following data relate to the distribution of total loans/credit among various borrowers according to rate of interest.
Rate of interest   % of Total   C.F
<6                 10.2         10.2
6-8                15.5         25.7
8-10               26           51.7
10-12              32.6         84.3
>12                15.7         100
N = 100
$\text{Median} = 8 + \dfrac{50 - 25.7}{26} \times 2 = 8 + \dfrac{24.3}{13} = 8 + 1.8692 = 9.8692 \approx 9.87$
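The same calculation can be checked with a short Python sketch that locates the median class from the cumulative percentages and then applies Median = L + ((N/2 - m)/f) × h. The variable names are illustrative, and the class width h = 2 is read off the 6-8, 8-10 and 10-12 classes.

```python
from itertools import accumulate

lower_limits = [None, 6, 8, 10, 12]    # the "<6" class has no stated lower limit
shares = [10.2, 15.5, 26, 32.6, 15.7]  # % of total loans in each class
N = 100                                # total frequency, as given
h = 2                                  # class width

cum = list(accumulate(shares))         # cumulative frequencies

# Median class: first class whose cumulative frequency exceeds N / 2.
k = next(i for i, c in enumerate(cum) if c > N / 2)
L = lower_limits[k]   # lower limit of the median class
f = shares[k]         # frequency of the median class
m = cum[k - 1]        # cumulative frequency before the median class

median = L + (N / 2 - m) / f * h
print(round(median, 2))
```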
Merits: 1. Extreme values do not affect its value. 2. It is especially useful when the data are skewed.
Demerits: 1. It may not be representative of the data. Ex: 1, 2, 10 2. It does not consider all the observations.
Mode
The mode of a data set is the value that occurs with greatest frequency. The greatest frequency can occur at two or more different values. If the data have exactly two modes, the data are bimodal. If the data have more than two modes, the data are multimodal.
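$\text{Mode} = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$
where L is the lower limit of the modal class, $f_1$ is the frequency of the modal class, $f_0$ and $f_2$ are the frequencies of the classes just before and just after it, and h is the class width.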
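A small Python sketch of both cases follows. For raw data it reuses the nursery ages from exercise 1. For grouped data it assumes the usual reading of the formula (L the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the neighbouring classes, h the class width) and plugs in the 20-30 class of the profits table that closes this section. `grouped_mode` is a name invented for the example.

```python
from statistics import multimode

# Raw data: ages from exercise 1.
ages = [8, 5, 9, 10, 9, 12, 7, 12, 13, 7, 8]
modes = multimode(ages)   # every value that shares the highest frequency
print(sorted(modes))      # more than two modes, so these data are multimodal

def grouped_mode(L, f1, f0, f2, h):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for grouped data."""
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Modal class 20-30 of the profits table: f1 = 20, neighbours f0 = 12, f2 = 16.
mode_profit = grouped_mode(L=20, f1=20, f0=12, f2=16, h=10)
print(round(mode_profit, 2))
```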
Compute the average growth rate (savings A/c) for the following:
Year   Interest rate (%)   Growth factor   Savings at the end of year
1      7                   1.07            107
2      8                   1.08            115.56
3      10                  1.10            127.12
4      12                  1.12            142.37
5      18                  1.18            168
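Growth factor = 1 + (Interest rate/100)
Suppose we initially deposit Rs 100.00.
Average growth factor = (1.07 + 1.08 + 1.10 + 1.12 + 1.18)/5 = 1.11, i.e. an average interest rate of 11% per year.
On that basis a Rs 100.00 deposit would grow in five years to (100) x (1.11) x (1.11) x (1.11) x (1.11) x (1.11) = Rs 168.51. This exceeds the actual closing balance of Rs 168, so the arithmetic mean overstates the average growth.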
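The geometric mean listed among the averages is the natural candidate here. A minimal Python sketch, assuming the usual definition (the n-th root of the product of the n growth factors), compares it with the arithmetic mean:

```python
import math

factors = [1.07, 1.08, 1.10, 1.12, 1.18]  # yearly growth factors
n = len(factors)

arith = sum(factors) / n              # arithmetic mean of the factors
geo = math.prod(factors) ** (1 / n)   # geometric mean: n-th root of the product

# Value of Rs 100 after five years at each "average" factor.
grown_arith = 100 * arith ** n
grown_geo = 100 * geo ** n

print(round(arith, 4), round(grown_arith, 2))
print(round(geo, 4), round(grown_geo, 2))
```

Compounding at the geometric mean reproduces the closing balance in the table, which compounding at the arithmetic mean does not.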
Measures of Variability
Methods: 1. Range 2. Interquartile Range 3. Quartile Deviation 4. Mean Deviation 5. Variance 6. Standard Deviation 7. Coefficient of Variation
Range
The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.
Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile. It overcomes the sensitivity to extreme data values.
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
The middle 50% of the apartments have rents between 445 and 525, a spread of 80.
Quartile Deviation is $Q.D. = \dfrac{Q_3 - Q_1}{2}$
Coefficient of Q.D. is $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
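For the apartment-rent quartiles quoted above (Q1 = 445, Q3 = 525), these measures can be worked out as follows. The sketch assumes the standard definitions Q.D. = (Q3 - Q1)/2 and coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1).

```python
q1, q3 = 445, 525                  # quartiles from the rent example

iqr = q3 - q1                      # interquartile range
qd = iqr / 2                       # quartile deviation (semi-interquartile range)
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation (unit-free)

print(iqr, qd, round(coeff_qd, 4))
```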
Mean Deviation
M.D. is the arithmetic mean of the absolute deviations of all items from the average, and is given by
$M.D. = \dfrac{1}{n} \sum \left| x - \text{Mean} \right|$
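As a quick check of the formula, the sketch below reuses the values 2, 8, 9, 11 from the earlier mean example and takes deviations about the arithmetic mean. The function name is illustrative.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [2, 8, 9, 11]   # mean is 7.5
md = mean_deviation(data)
print(md)              # (5.5 + 0.5 + 1.5 + 3.5) / 4
```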
Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation (xi) and the mean.
The variance is the average of the squared differences between each data value and the mean, and is usually denoted by $\sigma^2$.
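$\sigma^2 = \dfrac{1}{n} \sum (x - \bar{x})^2$
$\sigma^2 = \dfrac{1}{N} \sum f (x - \bar{x})^2$
$\sigma^2 = \left[ \dfrac{1}{N} \sum f d^2 - \left( \dfrac{1}{N} \sum f d \right)^2 \right] \times h^2$, where $d = \dfrac{x - a}{h}$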
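The standard deviation of a data set is the positive square root of the variance, denoted by $\sigma$.
$\sigma = \sqrt{\dfrac{1}{n} \sum (x - \bar{x})^2}$
$\sigma = \sqrt{\dfrac{1}{N} \sum f (x - \bar{x})^2}$
$\sigma = \sqrt{\dfrac{1}{N} \sum f d^2 - \left( \dfrac{1}{N} \sum f d \right)^2} \times h$, where $d = \dfrac{x - a}{h}$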
Roll No Marks
1 25
2 55
3 5
4 45
5 15
6 35
$Q_1$ = size of the $\left(\dfrac{n+1}{4}\right)$th item = size of the $\left(\dfrac{7}{4}\right)$th item = size of the 1.75th item
Size of the 1.75th item = size of the 1st item + 0.75 (size of the 2nd item - size of the 1st item)
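Applied to the six marks in the table, the interpolation rule can be sketched as below. The helper `quartile_item` is hypothetical, and extending the same rule to k(n+1)/4 for the other quartiles is an assumption made for this example.

```python
marks = [25, 55, 5, 45, 15, 35]   # marks of roll numbers 1 to 6

def quartile_item(values, k=1):
    """k-th quartile as the size of the k(n + 1)/4-th item, with interpolation."""
    data = sorted(values)
    pos = k * (len(data) + 1) / 4   # 1-based position, possibly fractional
    whole = int(pos)                # whole-number part of the position
    frac = pos - whole              # fractional part used for interpolation
    if whole < 1:                   # position before the first item
        return data[0]
    if whole >= len(data):          # position at or beyond the last item
        return data[-1]
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])

q1 = quartile_item(marks, 1)   # 1st item + 0.75 * (2nd item - 1st item)
print(q1)
```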
A Quality control laboratory received samples of electric bulbs for testing their lives, from two suppliers. The results were as follows.
Length of life (in hours)   Company A   Company B
1500-2000                   16          18
2000-2500                   26          22
2500-3000                   8           8
Which company's bulbs have the greater length of life? Which company's bulbs are more uniform with respect to their lives?
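One way to approach both questions is to compare the means (for length of life) and the coefficients of variation (for uniformity). The sketch below uses the deviation-method formulas for the mean and standard deviation with an assumed mean a = 2250 and class width h = 500, both chosen here for convenience. The coefficient of variation is taken as (standard deviation / mean) × 100, the usual definition, which these notes list but do not spell out.

```python
import math

mids = [1750, 2250, 2750]   # midpoints of 1500-2000, 2000-2500, 2500-3000
company_a = [16, 26, 8]     # frequencies for Company A
company_b = [18, 22, 8]     # frequencies for Company B

def summary(freqs, mids, a=2250, h=500):
    """Mean, standard deviation and coefficient of variation (deviation method)."""
    N = sum(freqs)
    d = [(x - a) / h for x in mids]                          # d = (x - a) / h
    mean_d = sum(f * di for f, di in zip(freqs, d)) / N      # (1/N) * sum(f d)
    mean_d2 = sum(f * di ** 2 for f, di in zip(freqs, d)) / N  # (1/N) * sum(f d^2)
    mean = a + mean_d * h
    sd = math.sqrt(mean_d2 - mean_d ** 2) * h
    cv = sd / mean * 100                                     # in percent
    return mean, sd, cv

mean_a, sd_a, cv_a = summary(company_a, mids)
mean_b, sd_b, cv_b = summary(company_b, mids)
print(round(mean_a, 2), round(sd_a, 2), round(cv_a, 2))
print(round(mean_b, 2), round(sd_b, 2), round(cv_b, 2))
```

Under these assumptions, the company with the larger mean has the greater average life, and the one with the smaller coefficient of variation is the more uniform.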
The Shareholders' Research Centre of India has recently conducted a research study on the price behaviour of three leading industrial shares, A, B and C, for the period 1979-1985, the results of which are published as follows in its Quarterly Journal:
Share   Average price (Rs)   Standard deviation   Current selling price
A       18.2                 5.4                  36.00
B       22.5                 4.5                  34.75
C       24.0                 6.0                  39.00
i) Which share, in your opinion, appears to be more stable?
ii) If you are the holder of all three shares, which one would you like to dispose of at present, and why?
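For part (i), a common yardstick is the coefficient of variation, again assumed here to be (standard deviation / average price) × 100. A brief sketch:

```python
# (average price, standard deviation) for each share, from the table.
shares = {"A": (18.2, 5.4), "B": (22.5, 4.5), "C": (24.0, 6.0)}

# Coefficient of variation in percent: lower means relatively more stable.
cv = {name: sd / mean * 100 for name, (mean, sd) in shares.items()}
most_stable = min(cv, key=cv.get)

for name, value in cv.items():
    print(name, round(value, 2))
print(most_stable)
```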
X 0 1 2 3 4 5 6 7 8
f 1 8 28 56 70 56 28 8 1
[Figure: bell-shaped curve, frequency f plotted against x]
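Relative Measures:
Karl Pearson's Coefficient of Skewness is given by
$S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$ or $S_k = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
Bowley's Coefficient of Skewness is given by
$S_{kB} = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)}$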
Ex: The data on the profits (in Rs lakh) earned by 60 companies is as follows:
Profits     No. of companies
Below 10    5
10-20       12
20-30       20
30-40       16
40-50       5
Above 50    2
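Because the first and last classes are open-ended, a quartile-based measure such as Bowley's coefficient is a reasonable choice for these data. The sketch below assumes the quartiles of grouped data are found like the grouped median, with N/2 replaced by N/4 and 3N/4. The lower limit 0 for "Below 10" is a placeholder that is never used, since no quartile falls in that class.

```python
from itertools import accumulate

lowers = [0, 10, 20, 30, 40, 50]   # lower class limits; 0 is a placeholder
freqs = [5, 12, 20, 16, 5, 2]      # number of companies per class
h = 10                             # class width of the closed classes
N = sum(freqs)
cum = list(accumulate(freqs))      # cumulative frequencies

def grouped_quantile(p):
    """Value below which a fraction p of the N observations fall."""
    target = p * N
    # Class whose cumulative frequency is just greater than the target.
    k = next(i for i, c in enumerate(cum) if c > target)
    m = cum[k - 1] if k > 0 else 0   # cumulative frequency before that class
    return lowers[k] + (target - m) / freqs[k] * h

q1, q2, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
sk_b = ((q3 - q2) - (q2 - q1)) / ((q3 - q2) + (q2 - q1))
print(round(q1, 2), q2, q3, round(sk_b, 4))
```

A value close to zero suggests the distribution is nearly symmetric about its median.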