
F.Y.B.Sc.

(Computer Science)
SEMESTER - I (CBCS)

DESCRIPTIVE STATISTICS
AND INTRODUCTION TO
PROBABILITY

SUBJECT CODE : USCS106


© UNIVERSITY OF MUMBAI
Prof. Suhas Pednekar
Vice-Chancellor,
University of Mumbai,

Prof. Ravindra D. Kulkarni
Pro Vice-Chancellor,
University of Mumbai,

Prof. Prakash Mahanwar
Director,
IDOL, University of Mumbai,

Programme Co-ordinator : Shri Mandar Bhanushe


Head, Faculty of Science and Technology,
IDOL, University of Mumbai, Mumbai
Course Co-ordinator : Mr. Sumedh Shejole
Asst. Professor,
IDOL, University of Mumbai, Mumbai

Course Writers : Mr. Sujal Shah


Asst. Professor,
Patkar Varde College
Goregaon (w) Mumbai

October 2021, Print - I


Published by : Director
Institute of Distance and Open Learning ,
University of Mumbai,
Vidyanagari, Mumbai - 400 098.
DTP Composed and Printed by : Mumbai University Press,
Vidyanagari, Santacruz (E), Mumbai - 400098

ipin Enterprises
Tantia Jogani Industrial Estate, Unit No. 2,
Ground Floor, Sitaram Mill Compound,
J.R. Boricha Marg, Mumbai - 400 011
CONTENTS
Unit No.    Title

Unit - I
1. Data Presentation
2. Measures of Central Tendency
3. Measures of Dispersion

Unit - II
4. Moments, Skewness and Kurtosis
5. Correlation and Regression Analysis

Unit - III
6. Probability


Syllabus
F.Y.B.Sc. (CS)
Semester I (CBCS)
Descriptive Statistics and Introduction to Probability

Objectives:
The purpose of this course is to familiarize students with the basics of
Statistics, which are essential for prospective researchers and
professionals.

Expected Learning Outcomes:
1) Enable learners to know descriptive statistical concepts
2) Enable study of the probability concepts required for Computer Science learners

Unit I :
Data Presentation
Data types : attribute, variable, discrete and continuous variable
Data presentation : frequency distribution, histogram, ogive curves, stem
and leaf display

Data Aggregation
Measures of Central Tendency: Mean, Median, Mode for raw data, discrete and
grouped frequency distribution.
Measures of Dispersion: Variance, standard deviation, coefficient of variation
for raw data, discrete and grouped frequency distribution, quartiles,
quantiles. Real life examples.

Unit II
Moments: raw moments, central moments, relation between raw and
central moments.
Measures of Skewness and Kurtosis: based on moments,
quartiles, relation between mean, median, mode for symmetric and
asymmetric frequency curves.

Correlation and Regression: bivariate data, scatter plot, correlation,
nonsense correlation, Karl Pearson's coefficient of correlation,
independence.

Linear regression: fitting of linear regression using least squares
regression, coefficient of determination, properties of regression
coefficients (only statement)

Unit III
Probability : Random experiment, sample space, types of events and
operations on events

Probability definition : classical, axiomatic, Elementary Theorems of
probability (without proof)

Text Book:
1. Trivedi, K.S. (2001): Probability, Statistics, Design of Experiments and
Queuing Theory, with Applications of Computer Science, Prentice Hall
of India, New Delhi.

Additional References:
1. Ross, S.M. (2006): A First Course in Probability, 6th Edition, Pearson.
2. Kulkarni, M.B., Ghatpande, S.B. and Gore, S.D. (1999): Common
Statistical Tests, Satyajeet Prakashan, Pune.
3. Gupta, S.C. and Kapoor, V.K. (1987): Fundamentals of Mathematical
Statistics, S. Chand and Sons, New Delhi.
4. Gupta, S.C. and Kapoor, V.K. (1999): Applied Statistics, S. Chand and
Sons, New Delhi.
5. Montgomery, D.C. (2001): Planning and Analysis of Experiments,
Wiley.



1
DATA PRESENTATION
Unit Structure
1.0 Objective
1.1 Introduction
1.2 Data Presentation
1.2.1 Data Types
1.2.1.1 Ungrouped Data
1.2.1.2 Grouped Data
1.2.2 Frequency Distribution
1.2.2.1 Types of class Intervals
1.2.3 Graphs and displays
1.2.3.1 Frequency curve
1.2.3.2 Histogram
1.2.3.3 Ogive curves
1.2.3.4 Stem and Leaf display
1.3 Summary
1.4 Exercise
1.5 List of References

1.0 OBJECTIVE

The learner will be able to understand various data types and
frequency distributions, and to plot simple graphs such as
histograms and ogive curves to display data. The stem and leaf
display is also covered in this chapter.

1.1 INTRODUCTION

Any statistical study involves collecting, processing and analysing
data, and then reporting information drawn from this data.

Statistics is defined as "a science that includes the
methods of collecting, organising, presenting, analysing and interpreting
numerical facts, and of taking decisions on that basis".

1.2 DATA PRESENTATION

1.2.1 DATA TYPES


Data (or a distribution) can be classified as Ungrouped Data or Grouped
Data.
Grouped data can be further classified as Discrete or Continuous.
1.2.1.1 Ungrouped Data
In this type, no grouping is done on data and data is available in
the raw form.
Ex 1 : Age of students in a group of five people can be 35, 38, 37, 30 and
35 years
Ex 2 : Scores of six students in a Statistics test can be 4, 6, 8, 3, 2 and 9
marks
1.2.1.2 Grouped Data
In this type data is grouped for some purpose. Grouped data can be
Discrete or Continuous.
Grouped Discrete Data
The number of occurrences of each discrete value is recorded as
the frequency of that value.
Ex 3 : The scores of 100 students in a 10 Marks Physics class test can be
grouped as :
Marks 0 1 2 3 4 5 6 7 8 9 10
Number of students 2 3 6 12 18 15 13 16 8 6 1

Ex 4: The number of students in a degree college in various courses :


Course BCom BMS BScCS BScIT BAF
Number of students 145 98 62 48 80

Grouped Continuous Data


Some suitable class intervals are created and data is placed in the
appropriate class.
Ex 5 : The scores of students in a 100 Marks Calculus class test can be
grouped as :

Marks 0-40 40-60 60-75 75-100


Number of students 12 32 28 12

Ex 6: Expenses per month of families in a society are :

Expenses in Rupees    10,000-20,000  20,000-30,000  30,000-40,000  >40,000
Number of families    5              12             18             3

Ex 7 : Time to manufacture an auto assembly is given in hours

Time (in hrs) 1-3 3-5 5-7 7-9 9-11


Number of assemblies 1 13 15 12 3

1.2.2 FREQUENCY DISTRIBUTION

After collecting data, it can be organised in some meaningful form.
The data is thus condensed in a systematic manner; for example, collected
data can be organised in a tabular form.

Ex 8 : Following data gives marks scored by students in a test of 10


marks. Prepare frequency distribution table.
2, 4, 8, 6, 3, 4, 5, 4, 8, 6, 5, 3, 2, 0, 3, 5, 8, 9, 8, 3.

Solution:
Marks Tally Marks Frequency
0 | 1
1 0
2 || 2
3 |||| 4
4 ||| 3
5 ||| 3
6 || 2
7 0
8 |||| 4
9 | 1
10 0
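
The tally in Ex 8 can be checked mechanically. The following is a minimal Python sketch, assuming the marks are held in a plain list; the helper name `frequency_table` is hypothetical and used only for illustration.

```python
from collections import Counter

# Marks from Ex 8 (test out of 10 marks)
marks = [2, 4, 8, 6, 3, 4, 5, 4, 8, 6, 5, 3, 2, 0, 3, 5, 8, 9, 8, 3]

def frequency_table(values, lowest, highest):
    """Return (value, frequency) pairs for every value from lowest to highest."""
    counts = Counter(values)
    # Values that never occur still get a row, with frequency 0
    return [(v, counts.get(v, 0)) for v in range(lowest, highest + 1)]

table = frequency_table(marks, 0, 10)
for value, freq in table:
    # One '|' per observation serves as the tally column
    print(f"{value:>5}  {'|' * freq:<5} {freq}")
```

Listing every mark from 0 to 10, including marks nobody scored, keeps the table in the same shape as the hand-made one.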

Data can also be grouped into suitable class intervals in a
frequency table.

1.2.2.1 Types of Class Intervals

Three methods of making class intervals are :
a) Exclusive method, b) Inclusive method and c) Open end classes.

a) Exclusive method
The upper limit of a class becomes the lower limit of the next class
in this method.
For example, classes can be (10-20), (20-30), (30-40) and so on.
b) Inclusive method
In this type the lower limit of a class is kept one more than the
upper limit of the previous class.
For example, classes can be (10-19), (20-29), (30-39) and so on.

c) Open end classes
In this type, the lower class limit of the first class is not given. Also
the upper limit of the last class may not be given.
For example, classes can be (<100), (100-200), (200-300), (>300)

1.2.3 GRAPHS

A frequency distribution can be represented by graphs. Graphs
represent the data pictorially.
Types of Graphs :
a) Frequency curve
b) Histogram
c) Ogive curve
d) Stem and Leaf display

1.2.3.1 Frequency curve


Ex 9 : Plot Frequency curve
Month            Jan  Feb  Mar  April  May  June  July  Aug  Sept  Oct  Nov  Dec
Sales (in Lakh)  120  135  148  190    212  250   283   312  287   252  313  314

1.2.3.2 Histogram

In this type, each class is represented by a vertical bar. The bars are
adjacent to each other in Histogram. The areas of the bars are proportional
to the frequencies.

Ex 10 : Plot Histogram
Class Interval    Number of employees
10000-20000 25
20000-30000 15
30000-40000 30
40000-50000 10

Solution :

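Ex 11 : Plot Histogram and hence find Mode

CI 0-5 5-10 10-15 15-20 20-25 25-30
f 20 30 40 50 30 20

[Figure: histogram of f against CI, horizontal axis marked 0 to 30 in steps of 5]

The modal class is 15-20, the tallest bar. Joining its top corners to the
adjacent top corners of the neighbouring bars, the two lines intersect above
$15 + \frac{50 - 40}{(50 - 40) + (50 - 30)} \times 5 \approx 16.67$.

Mode = 16.67 (Ans)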


1.2.3.3 Ogive curves
An ogive curve plots the cumulative frequencies of the classes.
Ex 12 : Prepare Less than and More than cumulative frequency table.
Salary Range No. of workers
10000-20000 125
20000-30000 134
30000-40000 150
40000-50000 85
50000-60000 16

Solution :
Salary Range No. of workers Less than cf More than cf
10000-20000 125 125 510
20000-30000 134 259 385
30000-40000 150 409 251
40000-50000 85 494 101
50000-60000 16 510 16
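
The two cumulative columns are running totals taken from opposite ends of the table. A minimal Python sketch of that idea follows, using the frequencies as they appear in the solution table; the function names are hypothetical.

```python
from itertools import accumulate

def less_than_cf(freqs):
    """Running total starting from the first class."""
    return list(accumulate(freqs))

def more_than_cf(freqs):
    """Running total starting from the last class, reported in class order."""
    return list(accumulate(reversed(freqs)))[::-1]

workers = [125, 134, 150, 85, 16]  # frequencies used in the solution table
less_cf = less_than_cf(workers)
more_cf = more_than_cf(workers)
print(less_cf)
print(more_cf)
```

The last "less than" value and the first "more than" value both equal the total number of workers, which is a quick check on the arithmetic.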

Ogive curves are of two types :
a) Less than Ogive curve and b) More than Ogive curve
a) Less than Ogive curve

Ex 13 : Plot Less than Ogive curve


Class Frequency
10-20 12
20-30 24
30-40 43
40-50 38
50-60 22
60-70 11

Solution :
Class Frequency Cumulative frequency
0-10 0 0
10-20 12 12
20-30 24 36
30-40 43 79
40-50 38 117
50-60 22 139
60-70 11 150

b) More than Ogive curve

Ex 14: Plot More than Ogive curve

Class Frequency
5-10 25
10-15 30
15-20 35
20-25 38
25-30 22
35-40 11
40-45 5
45-50 4

Solution :

Class Frequency More than cumulative frequency
5-10 25 170
10-15 30 145
15-20 35 115
20-25 38 80
25-30 22 42
35-40 11 20
40-45 5 9
45-50 4 4
50-55 0 0

Ex 15: Plot Less than Ogive curve and hence find Median.

CI 0-10 10-20 20-30 30-40 40-50 50-60
f 15 32 43 45 28 15

Solution :

CI 0-10 10-20 20-30 30-40 40-50 50-60
f 15 32 43 45 28 15
Cf 15 47 90 135 163 178

Rank = N/2 = 178/2 = 89
Median ≈ 29.8, the point where the Rank line meets the less than cf curve (Ans)

Ex 16 : Plot Less than and More than Ogive curves


Range f
10-20 5
20-30 15
30-40 20
40-50 10
50-60 10

Solution :
Range f Less than cf More than cf
10-20 5 5 60
20-30 15 20 55
30-40 20 40 40
40-50 10 50 20
50-60 10 60 10

1.2.3.4 Stem and Leaf display
A Stem and Leaf plot shows the exact value of each individual observation. It
uses ungrouped data.

Steps to draw a Stem and Leaf plot :
1) Divide each observation into two parts : one or more leading digits
form the stem and the remaining digits form the leaf.
2) List the stem values to the left of a vertical line and write each leaf
corresponding to a stem on the same horizontal line, to the right
of the stem, in increasing order.
3) The stem and leaf display gives us the ordered data and the shape of
the distribution.
Ex 17 : Display the given data as stem and leaf
42, 53, 65, 63, 61, 77, 47, 56, 74, 60, 64, 68, 45, 55, 57, 82, 42, 35, 39, 51,
65, 55, 33, 76, 70, 50, 52, 54, 45, 46, 25, 36, 59, 63, 83.
Solution :
Stem Leaf
2 5
3 3, 5, 6, 9
4 2, 2, 5, 5, 6, 7
5 0, 1, 2, 3, 4, 5, 5, 6, 7, 9
6 0, 1, 3, 3, 4, 5, 5, 8
7 0, 4, 6, 7
8 2, 3
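
The steps for building a stem and leaf display can be sketched in a few lines of Python. This is an illustrative sketch only, assuming two-digit observations so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Observations from Ex 17
data = [42, 53, 65, 63, 61, 77, 47, 56, 74, 60, 64, 68, 45, 55, 57, 82, 42,
        35, 39, 51, 65, 55, 33, 76, 70, 50, 52, 54, 45, 46, 25, 36, 59, 63, 83]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves in increasing order
        stem, leaf = divmod(v, 10)    # e.g. 47 -> stem 4, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", ", ".join(str(leaf) for leaf in plot[stem]))
```

Counting the leaves across all stems should give back the number of observations, which is a useful check when the display is drawn by hand.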
Comparison of Histogram and Stem and Leaf plot :
1) A Stem and Leaf display is simple to plot.
2) The shape of the data can be easily seen in both the Stem and Leaf display and the Histogram.
3) A Histogram is more suitable for large data sets.

1.3 SUMMARY

1) Data can be of ungrouped or grouped (discrete or continuous) type
2) A frequency table gives the count of observations for each value or each
class
3) A frequency curve shows the trend of data over a period of time
4) A histogram gives a pictorial representation of the data in each class
5) An ogive curve plots cumulative frequencies of successive classes
6) A stem and leaf plot gives a clearer picture of individual data values

1.4 EXERCISE

1) Explain various types of distributions with suitable examples for each.


2) Plot frequency curve

Quarter Expenses
(in K)
I 25
II 32
III 35
IV 25

3) Plot Histogram
Class Frequency
0-4 15
4-8 22
8-12 32
12-16 25
16-20 22
4) Plot Less than Ogive curve
Class Frequency
10-20 20
20-30 36
30-40 45
40-50 62
50-60 27
60-70 20

5) Plot More than Ogive curve


Class Frequency
0-20 15
20-40 16
40-60 32
60-80 24
80-100 22
100-120 20

6) Draw stem and leaf plot
22, 25, 28, 32, 35, 21, 42, 42, 53, 52, 33, 35, 46, 51, 44, 34, 42, 53

7) Draw stem and leaf plot


15, 22, 26, 35, 24, 21, 25, 30, 35, 38, 24, 26, 26, 29, 32, 38, 27, 33, 35,
24, 25

1.5 LIST OF REFERENCES

1) Probability, Statistics, Design of Experiments and Queuing Theory with
Applications of Computer Science, K. S. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand



2
MEASURES OF CENTRAL TENDENCY
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Measures of Central tendency
2.2.1 Mean
2.2.1.1 Mean of Ungrouped data
2.2.1.2 Mean of Grouped Discrete data
2.2.1.3 Mean of Grouped Continuous data
2.2.1.4 Merits and Demerits of AM
2.2.2 Median
2.2.2.1 Median of Ungrouped data
2.2.2.2 Median of Grouped Discrete data
2.2.2.3 Median of Grouped Continuous data
2.2.2.4 Merits and Demerits of Median
2.2.3 Mode
2.2.3.1 Mode of Ungrouped data
2.2.3.2 Mode of Grouped Discrete data
2.2.3.3 Mode of Grouped Continuous data
2.2.3.4 Merits and Demerits of Mode
2.2.4 Relationship between Mean, Median and Mode
2.3 Summary
2.4 Exercise
2.5 List of References

2.0 OBJECTIVE

The learner will be able to understand the concept of Averages and
to select the appropriate central value
for a given distribution.

2.1 INTRODUCTION

A given set of data needs to be reduced to some compact form that
can represent it. Such a form should make the distribution easy
to interpret and should allow further algebraic
treatment. Averages are such a compact form: they represent the
central tendency of the distribution.

Objectives of a good measure of central tendency :
1) To condense the data into a single value
2) To enable comparison among various data sets

Requisites of a good measure of central tendency :
1) It should be rigidly defined.
2) It should be simple to understand and interpret.
3) It should cover all observations in the data set.
4) It should be capable of further algebraic treatment.
5) It should have good sampling stability.
6) It should not be unduly affected by extreme values.
7) It should be easy to calculate.

2.2 MEASURES OF CENTRAL TENDENCY

Types of Averages :
There are three types of Averages : Mean, Median and Mode. Also
there are some more types like Geometric Mean, Harmonic Mean and
Quantiles.

2.2.1 MEAN

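2.2.1.1 Mean of Ungrouped Data ($\bar{X}$)

For Ungrouped Data :

$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$

This can also be written as :

$\bar{X} = \frac{\sum X}{n}$

Ex 1 : Find Arithmetic Mean of 4, 5, 2, 5, 7

Solution :

$\bar{X} = \frac{4 + 5 + 2 + 5 + 7}{5} = \frac{23}{5} = 4.6$ (Ans)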
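2.2.1.2 Mean of Grouped (Discrete) Data ($\bar{X}$)
For Grouped (discrete) Data :

$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n}$

This can also be written as :

$\bar{X} = \frac{\sum fX}{\sum f}$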

Ex 2 : Find Arithmetic Mean (AM) of


X 1 2 3 4 5
f 20 12 25 23 30

Solution :
X f fX
1 20 20
2 12 24
3 25 75
4 23 92
5 30 150
Total 110 361

Mean, $\bar{X} = \frac{\sum fX}{\sum f} = \frac{361}{110} = 3.28$ (Ans)

Ex 3 : Marks obtained by students of Discrete mathematics class are as


given below. Find AM.
Marks 1 2 3 4 5 6 7 8 9 10
No of students 12 25 23 30 23 24 12 26 13 3

Solution :
Marks, X             1   2   3   4    5    6    7   8    9    10  Total
No of students, f    12  25  23  30   23   24   12  26   13   3   191
fX                   12  50  69  120  115  144  84  208  117  30  949

Mean, $\bar{X} = \frac{\sum fX}{\sum f} = \frac{949}{191} = 4.97$ (Ans)

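2.2.1.3 Mean of Grouped (Continuous) Data ($\bar{X}$)

For Grouped (continuous) Data, take the class mark (mid value) of each class as $X$ :

$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n}$

This can also be written as :

$\bar{X} = \frac{\sum fX}{\sum f}$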

Ex 4 : Find Arithmetic Mean (AM) of


Class Interval  15-20  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
f               4      5      11     6      5      8      9      6      4

Solution :
Class Interval  15-20  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60  Total
f               4      5      11     6      5      8      9      6      4      58
Class Mark, X   17.5   22.5   27.5   32.5   37.5   42.5   47.5   52.5   57.5
fX              70     112.5  302.5  195    187.5  340    427.5  315    230    2180

Mean, $\bar{X} = \frac{\sum fX}{\sum f} = \frac{2180}{58} = 37.59$ (Ans)

Ex 5 : Find Arithmetic Mean (AM) of


Class Interval 10-20 20-30 30-40 40-50 50-60
f 15 12 18 19 21

Solution :
Class Interval 10-20 20-30 30-40 40-50 50-60 Total
f 15 12 18 19 21 85
Class Mark, X 15 25 35 45 55
fX 225 300 630 855 1155 3165

Mean, $\bar{X} = \frac{\sum fX}{\sum f} = \frac{3165}{85} = 37.24$ (Ans)
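
The grouped mean is the same weighted average in both the discrete and the continuous case; only the choice of X differs. A minimal Python sketch follows, with hypothetical helper names, applied to the data of Ex 2 and Ex 5.

```python
def class_marks(intervals):
    """Mid value of each (lower, upper) class interval."""
    return [(low + high) / 2 for low, high in intervals]

def grouped_mean(xs, freqs):
    """Weighted mean: sum of f*X divided by sum of f."""
    return sum(f * x for x, f in zip(xs, freqs)) / sum(freqs)

# Ex 2: grouped discrete data, X is used directly
mean_ex2 = grouped_mean([1, 2, 3, 4, 5], [20, 12, 25, 23, 30])

# Ex 5: grouped continuous data, X is the class mark
ex5_intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
mean_ex5 = grouped_mean(class_marks(ex5_intervals), [15, 12, 18, 19, 21])

print(round(mean_ex2, 2), round(mean_ex5, 2))
```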

2.2.1.4 Merits and Demerits of AM

Merits of AM
(i) It is rigidly defined
(ii) It is easy to calculate and easy to understand
(iii) It is based on all observations
(iv) It is capable of further algebraic treatment

Demerits of AM
(i) It is unduly affected by extreme values
(ii) It is not possible to calculate AM for open end class intervals
(iii) It may be a number which itself is not present in the data

2.2.2 MEDIAN

2.2.2.1 Median of Ungrouped Data

Median is the positional average of the data set.
The data must be arranged in ascending order to find the Median.
Median is the middle value when there is an odd number of observations.
Median is the average of the middle two values when there is an even number of
observations.

Ex 6 : Find Median of 5, 4, 3, 6, 8, 2, 5
Solution : Arrange the data in ascending order.
2, 3, 4, 5, 5, 6, 8
Median = 5 (Ans)

Ex 7 : Find Median of 2, 4, 3, 6, 8, 2, 5, 6
Solution : Arrange the data in ascending order.
2, 2, 3, 4, 5, 6, 6, 8

Median = $\frac{4 + 5}{2} = 4.5$ (Ans)

2.2.2.2 Median of Grouped(discrete) Data

Use cumulative frequency to find Median of Grouped(discrete) data.

Ex 8 : Find Median
X 1 2 3 4 5
f 20 12 25 23 30

Solution :
X 1 2 3 4 5
f 20 12 25 23 30
Cf 20 32 57 80 110

N = 110
Rank = (N+1)/2 = 111/2 = 55.5
The Cf value first exceeds the Rank at 57, so the corresponding X value is the Median.
Median = 3 (Ans)
2.2.2.3 Median of Grouped(continuous) Data

Use cumulative frequency to find Median of Grouped(continuous) data.

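Steps :
1) Arrange the classes in ascending order
2) Obtain the cumulative frequency against each class
3) Find the sum of all frequencies (N)
4) Find Rank, R = N/2
5) Locate the class whose cumulative frequency first exceeds the Rank (the median class)
6) Use the following formula to find Median

$\text{Median} = L + \frac{R - cf}{f} \times h$

Where, $L$ is the lower limit of the median class, $cf$ is the cumulative
frequency of the class preceding the median class, $f$ is the frequency of
the median class and $h$ is the width of the median class.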

Ex 9 : Find Median
Class Interval 0-10 10-20 20-30 30-40 40-50
F 2 12 25 23 3

Solution :
Class Interval 0-10 10-20 20-30 30-40 40-50
f 2 12 25 23 3
Cf 2 14 39 62 65

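N = 65, Rank = N/2 = 32.5, so the median class is 20-30.

$\text{Median} = 20 + \frac{32.5 - 14}{25} \times 10 = 27.4$ (Ans)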

Ex 10 : Find Median
Class Interval  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
f               16     21     20     28     10     3      1      1

Solution :
Class Interval  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  Total
f               16     21     20     28     10     3      1      1      100
Cf              16     37     57     85     95     98     99     100

N = 100, Rank = N/2 = 50, so the median class is 30-40.

$\text{Median} = 30 + \frac{50 - 37}{20} \times 10 = 36.5$ (Ans)
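
The steps for the median of grouped continuous data can be sketched in Python. This is an illustrative sketch, assuming the usual interpolation rule Median = L + (N/2 - cf)/f × h within the median class; the function name `grouped_median` is hypothetical.

```python
def grouped_median(intervals, freqs):
    """Median of grouped continuous data by interpolation inside the median class."""
    rank = sum(freqs) / 2                 # Rank R = N/2
    cf_before = 0                         # cumulative frequency of preceding classes
    for (low, high), f in zip(intervals, freqs):
        if f > 0 and cf_before + f >= rank:
            # This is the median class: interpolate between its limits
            return low + (rank - cf_before) / f * (high - low)
        cf_before += f
    raise ValueError("no observations")

# Ex 9
ex9 = grouped_median([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)],
                     [2, 12, 25, 23, 3])

# Ex 10
ex10 = grouped_median([(10, 20), (20, 30), (30, 40), (40, 50),
                       (50, 60), (60, 70), (70, 80), (80, 90)],
                      [16, 21, 20, 28, 10, 3, 1, 1])

print(round(ex9, 2), round(ex10, 2))
```

When a cumulative frequency equals the Rank exactly, this sketch returns the upper limit of that class, which is the same point as the lower limit of the next class.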

2.2.2.4 Merits and Demerits of MEDIAN

Merits of Median
(i) It is not affected by extreme values
(ii) It is easy to understand and easy to calculate. Sometimes, Median can be found simply by
observation
(iii) It can be located graphically

Demerits of Median
(i) It does not include all data in the data set
(ii) For larger data sets, arranging numbers in ascending order is tedious
(iii) It is not capable of further algebraic treatment
(iv) It does not capture small changes in data set

2.2.3 MODE
Mode is the most frequently occurring number in the distribution, that is, the
number with the highest frequency.

2.2.3.1 Mode of Ungrouped Data

Mode of ungrouped data can be simply obtained by observation.
Arrange all the numbers in ascending (or descending) order and count
the occurrences of each number. The number with the highest
occurrence is the Mode. There can be more than one Mode in a distribution.

Ex 11 : Find Mode of 7, 5, 8, 7, 6, 8, 2, 7
Solution : Arranging in ascending order : 2, 5, 6, 7, 7, 7, 8, 8
Since the number 7 occurs the highest number of times, i.e. three times,
Mode = 7 (Ans)

Ex 12 : Find Mode of 7, 5, 8, 7, 6, 8, 2, 7, 8
Solution : Arranging in ascending order : 2, 5, 6, 7, 7, 7, 8, 8, 8
The two numbers 7 and 8 both occur three times,
Mode = 7 and Mode = 8 (Ans)
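
For ungrouped data, Python's standard `statistics` module already provides the three averages, so the hand calculations of Ex 1, Ex 7 and Ex 12 can be cross-checked directly. The variable names below are illustrative only.

```python
import statistics

ex1 = [4, 5, 2, 5, 7]                 # data of Ex 1
ex7 = [2, 4, 3, 6, 8, 2, 5, 6]        # data of Ex 7
ex12 = [7, 5, 8, 7, 6, 8, 2, 7, 8]    # data of Ex 12

mean_ex1 = statistics.mean(ex1)           # arithmetic mean
median_ex7 = statistics.median(ex7)       # average of the two middle values (even count)
modes_ex12 = statistics.multimode(ex12)   # every value with the highest frequency

print(mean_ex1, median_ex7, modes_ex12)
```

`statistics.multimode` (Python 3.8 and later) returns all modes, which suits data such as Ex 12 where two values share the highest frequency.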

2.2.3.2 Mode of Grouped (discrete) Data


Ex 13 : Find Mode
X 2 3 4 5 6 7 8
F 12 25 28 63 54 53 17

Since highest frequency is 63, corresponding X value is Mode.


Mode = 5 (Ans)

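2.2.3.3 Mode of Grouped (continuous) Data

The following formula is used to find the Mode of grouped
(continuous) data.

$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$

Where, $L$ is the lower limit of the modal class, $f_1$ is the frequency of
the modal class, $f_0$ and $f_2$ are the frequencies of the classes just
before and just after the modal class, and $h$ is the width of the modal class.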

Ex 14 : Find Mode
Range 0-4 4-8 8-12 12-16 16-20
F 12 25 28 63 54

Since the highest frequency is 63, class interval [12-16] is the Modal class.

$\text{Mode} = 12 + \frac{63 - 28}{2(63) - 28 - 54} \times 4 = 15.18$ (Ans)

Ex 15 : Find Mode
Range 0-10 10-20 20-30 30-40
F 12 25 28 63

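Since the highest frequency is 63, class interval [30-40] is the Modal class. As it
is the last class, the frequency of the class after it is taken as 0.

$\text{Mode} = 30 + \frac{63 - 28}{2(63) - 28 - 0} \times 10 = 33.57$ (Ans)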
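
The answers of Ex 14 and Ex 15 are consistent with the usual interpolation rule Mode = L + (f1 - f0)/(2 f1 - f0 - f2) × h. A minimal Python sketch under that assumption follows; the function name `grouped_mode` is hypothetical, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(intervals, freqs):
    """Mode of grouped continuous data by interpolation inside the modal class."""
    i = freqs.index(max(freqs))                      # first class with highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # class after the modal class
    low, high = intervals[i]
    return low + (f1 - f0) / (2 * f1 - f0 - f2) * (high - low)

# Ex 14
mode_ex14 = grouped_mode([(0, 4), (4, 8), (8, 12), (12, 16), (16, 20)],
                         [12, 25, 28, 63, 54])

# Ex 15: the modal class is the last class, so f2 is taken as 0
mode_ex15 = grouped_mode([(0, 10), (10, 20), (20, 30), (30, 40)],
                         [12, 25, 28, 63])

print(round(mode_ex14, 2), round(mode_ex15, 2))
```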

2.2.3.4 Merits and Demerits of MODE

Merits of Mode
(i) It is not affected by extreme values
(ii) It is easy to understand and easy to calculate. Sometimes, Mode can be found simply by
observation
(iii) It can be located graphically

Demerits of Mode
(i) It does not include all data in the data set
(ii) Mode is not unique, hence not suitable for further algebraic treatment.
(iii) It does not capture small changes in data set
Ex 16 : The following are the weights of 30 wooden logs :
132, 166, 134, 119, 151, 114, 138, 124, 130, 132,
142, 121, 144, 147, 126, 104, 143, 129, 108, 111,
155, 131, 157, 137, 145, 122, 148, 139, 135, 136.

Arrange the data in a frequency table with class interval of 10 kg. each.
The first interval being 100-110. Find Arithmetic Mean (AM), Median and
Mode.

Solution :
Class Interval  Mid value (X)  Tally mark   Frequency (f)  fX    Cumulative Frequency (cf)
100-110         105            ||           2              210   2
110-120         115            |||          3              345   5
120-130         125            |||||        5              625   10
130-140         135            ||||| |||||  10             1350  20
140-150         145            ||||| |      6              870   26
150-160         155            |||          3              465   29
160-170         165            |            1              165   30
Total                                       30             4030

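Mean :

Mean, $\bar{X} = \frac{\sum fX}{\sum f} = \frac{4030}{30} = 134.33$ (Ans 1)

Median :

N = 30, Rank = N/2 = 15, so the median class is 130-140.

$\text{Median} = 130 + \frac{15 - 10}{10} \times 10 = 135$ (Ans 2)

Mode :

The highest frequency is 10, so the modal class is 130-140.

$\text{Mode} = 130 + \frac{10 - 5}{2(10) - 5 - 6} \times 10 = 135.56$ (Ans 3)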
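
The first step of Ex 16, sorting raw observations into class intervals, can also be sketched in Python. The sketch assumes exclusive classes of equal width, so a value equal to a class limit (such as 130) falls in the class that starts there; the function name `bin_counts` is hypothetical.

```python
# Weights of the 30 wooden logs from Ex 16
weights = [132, 166, 134, 119, 151, 114, 138, 124, 130, 132,
           142, 121, 144, 147, 126, 104, 143, 129, 108, 111,
           155, 131, 157, 137, 145, 122, 148, 139, 135, 136]

def bin_counts(values, start, width, n_classes):
    """Frequency of each exclusive class [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts

freqs = bin_counts(weights, start=100, width=10, n_classes=7)
mid_values = [100 + 10 * k + 5 for k in range(7)]    # 105, 115, ..., 165
mean = sum(f * x for f, x in zip(freqs, mid_values)) / sum(freqs)

print(freqs, round(mean, 2))
```

The mean obtained from the grouped table is an approximation to the mean of the raw values, because every observation in a class is replaced by the mid value of that class.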

2.2.4 RELATIONSHIP BETWEEN MEAN, MEDIAN AND MODE

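For moderately asymmetrical distributions, the empirical formula
relating Mean, Median and Mode is :

Mean - Mode = 3 (Mean - Median), that is, Mode = 3 Median - 2 Mean

Ex 17 : Find Mode if Mean is 12 and Median is 15

Solution :

Mode = 3 Median - 2 Mean = 3(15) - 2(12) = 21 (Ans)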

2.3 SUMMARY

Averages (Mean, Median and Mode) represent the central value in


the distribution. The formula for central value depends upon the type of
data. Different data sets can be compared using averages of each data set.

2.4 EXERCISE

1) Find AM of 5, 3, 2, 12, 5, 6, 9
2) Find AM of
Class Interval 0-10 10-20 20-30 30-40 40-50
f 125 123 234 220 101

3) Find Median class interval from the following distribution


X 200-202 202-204 204-206 206-208 208-210
f 145 320 445 469 342

4) Find Median
X 10 12 14 16 18
f 210 223 245 268 213

5) Find Median
X 0-4 4-8 8-12 12-16 16-20
F 65 56 43 69 34

6) Find Mode
X 6 7 8 9 10 11
F 21 23 25 37 21 15

7) Find Mode
Range 0-100 100-200 200-300 300-400 400-500
F 123 145 180 162 121

8) Find Mode if Median is 54 and Mean is 62

2.5 LIST OF REFERENCES

1) Probability, Statistics, Design of Experiments and Queuing Theory with
Applications of Computer Science, K. S. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand



3
MEASURES OF DISPERSION
Unit Structure
3.0 Objective
3.1 Introduction
3.2 Measures of Dispersion
3.2.1 Variance
3.2.1.1 Variance of Ungrouped data
3.2.1.2 Variance of Grouped Discrete data
3.2.1.3 Variance of Grouped Continuous data
3.2.2 Standard Deviation
3.2.2.1 Standard Deviation of Ungrouped data
3.2.2.2 Standard Deviation of Grouped Discrete data
3.2.2.3 Standard Deviation of Grouped Continuous data
3.2.2.4 Combined Mean and combined standard Deviation
3.2.3 Coefficient of Variation (CV)
3.2.4 Quartile Deviation (QD)
3.3 Summary
3.4 Exercise
3.5 List of References

3.0 OBJECTIVE

An understanding of Dispersion (or deviation) is essential to
completely understand and analyse a distribution along with its Central
Tendency. Variance, Standard Deviation and Quantiles are useful in
data analysis. This unit helps the learner to analyse a distribution using
measures of dispersion.

3.1 INTRODUCTION

The central value of the data can be represented by Averages; the
spread of the data can be explained with the help of Measures of Dispersion.

3.2 MEASURES OF DISPERSION

Measures of Dispersion serve the objectives of determining the
reliability of an average and comparing the variability of different
distributions.

Requisites of a Good Measure of Dispersion :
1) It should be rigidly defined
2) It should cover all observations in the distribution
3) It should have sampling stability
4) It should be capable of further mathematical treatment
5) It should not be unduly affected by extreme values

Some important Measures of Dispersion are :


1) Variance (v)
2) Standard Deviation (SD)
3) Quartile Deviation (QD)
4) Range

3.2.1 Variance

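The Arithmetic Mean of the squares of deviations taken from the
Arithmetic Mean is called Variance.

3.2.1.1 Variance of Ungrouped data

$\sigma^2 = \frac{\sum (X - \bar{X})^2}{n}$

Alternate and more convenient formula for Variance is,

$\sigma^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$

Ex 1 : Find Variance of 3, 6, 8, 1, 3

Solution :

$\sum X = 21$, $\sum X^2 = 9 + 36 + 64 + 1 + 9 = 119$, $n = 5$

$\sigma^2 = \frac{119}{5} - \left(\frac{21}{5}\right)^2 = 23.8 - 17.64 = 6.16$ (Ans)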

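3.2.1.2 Variance of Grouped (discrete) data

$\sigma^2 = \frac{\sum f (X - \bar{X})^2}{N}$

Alternate and more convenient formula for Variance is,

$\sigma^2 = \frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2$, where, $N = \sum f$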

Ex 2 : Find Variance of
X 4 5 6 7
F 12 24 23 18

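Solution :
X      4    5    6    7    Total
f      12   24   23   18   77
X^2    16   25   36   49   -
fX     48   120  138  126  432
fX^2   192  600  828  882  2502

$\sigma^2 = \frac{2502}{77} - \left(\frac{432}{77}\right)^2 = 32.4935 - 31.4765 = 1.02$ (Ans)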

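3.2.1.3 Variance of Grouped (continuous) data

$\sigma^2 = \frac{\sum f (X - \bar{X})^2}{N}$, with $X$ taken as the class mark

Alternate and more convenient formula for Variance is,

$\sigma^2 = \frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2$, where, $N = \sum f$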

Ex 3 : Find Variance of
X 0-4 4-8 8-12 12-16
F 12 24 23 18

Solution :
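Class          0-4  4-8  8-12  12-16  Total
f              12   24   23    18     77
Class Mark, X  2    6    10    14     -
X^2            4    36   100   196    -
fX             24   144  230   252    650
fX^2           48   864  2300  3528   6740

$\sigma^2 = \frac{6740}{77} - \left(\frac{650}{77}\right)^2 = 87.5325 - 71.2599 = 16.27$ (Ans)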
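
The "alternate" variance formula, mean of squares minus square of the mean, translates directly into code. The following Python sketch uses hypothetical function names and is applied to the data of Ex 2 and Ex 3 (with class marks as X in the continuous case).

```python
import math

def grouped_variance(xs, freqs):
    """Variance as (sum of f*X^2)/N minus ((sum of f*X)/N)^2, with N = sum of f."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(xs, freqs)) / n
    return mean_of_squares - mean ** 2

def grouped_sd(xs, freqs):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(grouped_variance(xs, freqs))

freqs = [12, 24, 23, 18]
var_ex2 = grouped_variance([4, 5, 6, 7], freqs)      # Ex 2, discrete values
var_ex3 = grouped_variance([2, 6, 10, 14], freqs)    # Ex 3, class marks of 0-4 ... 12-16

print(round(var_ex2, 2), round(var_ex3, 2), round(grouped_sd([2, 6, 10, 14], freqs), 2))
```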

3.2.2 Standard Deviation

Standard Deviation is the square root of the variance. One can find the
variance and then take its square root, which gives the standard
deviation.

3.2.2.1 Standard Deviation of Ungrouped data

Ex 4 : Find standard deviation of 3, 6, 8, 1, 3


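Solution :

$\sum X = 21$, $\sum X^2 = 119$, $n = 5$, so $\sigma^2 = \frac{119}{5} - \left(\frac{21}{5}\right)^2 = 6.16$

$\sigma = \sqrt{6.16} = 2.48$ (Ans)
Ex 5 : Find standard deviation of 49, 63, 46, 59, 65, 52, 60, 54

$\bar{X} = \frac{448}{8} = 56$, $\sum (X - \bar{X})^2 = 49 + 49 + 100 + 9 + 81 + 16 + 16 + 4 = 324$

$\sigma = \sqrt{\frac{324}{8}} = \sqrt{40.5} = 6.36$ (Ans)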
3.2.2.2 Standard Deviation of Grouped (discrete) data
Standard deviation of Grouped (discrete) data can be found out by
taking square root of variance

Ex 6 : Find Standard Deviation


X 2 3 4 5 6 7 8 9
f 2 3 4 2 5 3 2 1

Solution :
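X      2  3   4   5   6    7    8    9   Total
f      2  3   4   2   5    3    2    1   22
fX     4  9   16  10  30   21   16   9   115
fX^2   8  27  64  50  180  147  128  81  685

$\sigma = \sqrt{\frac{685}{22} - \left(\frac{115}{22}\right)^2} = \sqrt{31.1364 - 27.3244} = \sqrt{3.8120} = 1.95$ (Ans)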

3.2.2.3 Standard Deviation of Grouped (continuous) data

Standard deviation of Grouped (continuous) data can be found out


by taking square root of variance

Ex 7 : Find standard deviation


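Class          0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  Total
f              2     5      3      6      4      2      1      1      24
Class Mark, X  5     15     25     35     45     55     65     75     -
fX             10    75     75     210    180    110    65     75     800
fX^2           50    1125   1875   7350   8100   6050   4225   5625   34400

$\sigma = \sqrt{\frac{34400}{24} - \left(\frac{800}{24}\right)^2} = \sqrt{1433.33 - 1111.11} = \sqrt{322.22} = 17.95$ (Ans)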

3.2.2.4 Combined Mean and combined Standard Deviation

Combined Mean :

Combined Mean of two data sets can be found out using the following
formula.

$\bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

Ex 8 : Find combined Mean of following data sets.


Set 1 Set 2
Number of observations 25 45
Mean 8 9

Solution :

$\bar{X}_{12} = \frac{25(8) + 45(9)}{25 + 45} = \frac{605}{70} = 8.64$ (Ans)

Ex 9 : Find Combined Mean


Set 1 Set 2 Set 3
Number of observations 120 135 145
Mean 51 48 46

Solution :

$\bar{X}_{123} = \frac{120(51) + 135(48) + 145(46)}{120 + 135 + 145} = \frac{19270}{400} = 48.175$ (Ans)

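Combined Standard Deviation :

$\sigma_{12} = \sqrt{\frac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

Where, $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$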

Ex 10 : Find Combined Mean and Combined Standard Deviation :


Group 1 Group 2
No. of observations 32 25
Mean 12 14
SD 3 4

Solution :
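                     Group 1            Group 2
No. of observations  $n_1 = 32$         $n_2 = 25$
Mean                 $\bar{X}_1 = 12$   $\bar{X}_2 = 14$
SD                   $\sigma_1 = 3$     $\sigma_2 = 4$

$\bar{X}_{12} = \frac{32(12) + 25(14)}{32 + 25} = \frac{734}{57} = 12.88$

$d_1 = 12 - 12.88 = -0.88$, $d_2 = 14 - 12.88 = 1.12$

$\sigma_{12} = \sqrt{\frac{32(9 + 0.7695) + 25(16 + 1.2607)}{57}} = \sqrt{13.055} = 3.61$ (Ans)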
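
Combined mean and combined standard deviation extend naturally to any number of groups. The Python sketch below assumes the standard pooling rule, in which each group contributes n(σ² + d²) with d the difference between the group mean and the combined mean; the function names are hypothetical.

```python
import math

def combined_mean(ns, means):
    """Mean of all observations pooled together."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)

def combined_sd(ns, means, sds):
    """Pooled SD: each group contributes n * (sd^2 + d^2), d = group mean - combined mean."""
    cm = combined_mean(ns, means)
    total = sum(n * (s ** 2 + (m - cm) ** 2) for n, m, s in zip(ns, means, sds))
    return math.sqrt(total / sum(ns))

mean_ex8 = combined_mean([25, 45], [8, 9])                    # Ex 8
mean_ex9 = combined_mean([120, 135, 145], [51, 48, 46])       # Ex 9
mean_ex10 = combined_mean([32, 25], [12, 14])                 # Ex 10
sd_ex10 = combined_sd([32, 25], [12, 14], [3, 4])             # Ex 10

print(round(mean_ex8, 2), mean_ex9, round(mean_ex10, 2), round(sd_ex10, 2))
```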

3.2.3 Coefficient of Variation (CV)

The Coefficient of Variation is the ratio of the standard deviation to
the arithmetic mean, expressed as a percentage.

$CV = \frac{\sigma}{\bar{X}} \times 100$

CV can be used to judge the consistency of the data. A distribution
with a smaller CV is more consistent than one with a larger CV. CV is also useful
for comparing two or more sets of data that are measured in different units
of measurement.

Ex 11 : Find coefficient of variation of 2, 5, 4, 1 and 3

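Solution :

$\bar{X} = \frac{15}{5} = 3$, $\sigma = \sqrt{\frac{1 + 4 + 1 + 4 + 0}{5}} = \sqrt{2} = 1.41$

$CV = \frac{1.41}{3} \times 100 = 47\%$ (Ans)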
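
The coefficient of variation can be cross-checked with Python's standard `statistics` module. The sketch below uses `pstdev`, the population standard deviation (divisor n), which matches the variance definition used in this chapter; the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

cv_ex11 = coefficient_of_variation([2, 5, 4, 1, 3])   # data of Ex 11
print(round(cv_ex11, 1))
```

Using `statistics.stdev` instead would divide by n - 1 and give a slightly larger value, so the choice of function matters when comparing with hand calculations.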

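3.2.4 Quartile Deviation (QD)

Quartile Deviation is defined as,

$QD = \frac{Q_3 - Q_1}{2}$

Where, $Q_3$ is the upper (third) quartile and $Q_1$ is the lower (first) quartile.

$Q_i$ is defined as,

$Q_i = L + \frac{\frac{iN}{4} - cf}{f} \times h$, where for $i = 1, 2, 3$, $L$ is the lower limit of the
quartile class, $cf$ is the cumulative frequency of the preceding class, $f$ is
the frequency of the quartile class and $h$ is its width.

Coefficient of QD is defined as,

$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$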

Ex 12 : Find QD
Class Interval 0-10 10-20 20-30 30-40 40-50
f 2 12 25 23 3

Solution :
Class Interval 0-10 10-20 20-30 30-40 40-50
f 2 12 25 23 3
Cf 2 14 39 62 65

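To find Q3 :

Rank = 3N/4 = 3(65)/4 = 48.75

Select cumulative frequency value higher or equal to Rank, which is 62, so the class is 30-40.

$Q_3 = 30 + \frac{48.75 - 39}{23} \times 10 = 34.24$

To find Q1 :

Rank = N/4 = 65/4 = 16.25

Select cumulative frequency value higher or equal to Rank, which is 39, so the class is 20-30.

$Q_1 = 20 + \frac{16.25 - 14}{25} \times 10 = 20.9$

$QD = \frac{34.24 - 20.9}{2} = 6.67$ (Ans)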

Ex 13 : Find Co-efficient of QD
Class Interval 0-2 2-4 4-6 6-8 8-10
f 14 18 21 20 12

Solution :
Class Interval 0-2 2-4 4-6 6-8 8-10
f 14 18 21 20 12
Cf 14 32 53 73 85

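To find Q3 :

Rank = 3N/4 = 3(85)/4 = 63.75

Select cumulative frequency value higher or equal to Rank, which is 73, so the class is 6-8.

$Q_3 = 6 + \frac{63.75 - 53}{20} \times 2 = 7.075$

To find Q1 :

Rank = N/4 = 85/4 = 21.25

Select cumulative frequency value higher or equal to Rank, which is 32, so the class is 2-4.

$Q_1 = 2 + \frac{21.25 - 14}{18} \times 2 = 2.806$

$\text{Coefficient of QD} = \frac{7.075 - 2.806}{7.075 + 2.806} = 0.43$ (Ans)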
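
The quartile calculations of Ex 12 and Ex 13 follow one pattern: find the rank, locate the class, interpolate. The Python sketch below assumes the interpolation rule Q = L + (rank - cf)/f × h with rank = iN/4; the function name `grouped_quartile` is hypothetical.

```python
def grouped_quartile(intervals, freqs, i):
    """i-th quartile (i = 1, 2, 3) of grouped continuous data by interpolation."""
    rank = i * sum(freqs) / 4
    cf_before = 0                          # cumulative frequency of preceding classes
    for (low, high), f in zip(intervals, freqs):
        if f > 0 and cf_before + f >= rank:
            return low + (rank - cf_before) / f * (high - low)
        cf_before += f
    raise ValueError("no observations")

# Ex 12: Quartile Deviation
iv12 = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
f12 = [2, 12, 25, 23, 3]
qd_ex12 = (grouped_quartile(iv12, f12, 3) - grouped_quartile(iv12, f12, 1)) / 2

# Ex 13: Coefficient of Quartile Deviation
iv13 = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
f13 = [14, 18, 21, 20, 12]
q1, q3 = grouped_quartile(iv13, f13, 1), grouped_quartile(iv13, f13, 3)
coeff_qd_ex13 = (q3 - q1) / (q3 + q1)

print(round(qd_ex12, 2), round(coeff_qd_ex13, 2))
```

With i = 2 the same function gives the median, since the second quartile and the median coincide.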

Merits and Demerits of QD

Merits of QD :

1) It is rigidly defined
2) It is not affected by extreme values
3) It can be calculated with open end class intervals

Demerits of QD :

1) It is not based on all observations
2) It is much affected by sampling fluctuations

3.3 SUMMARY

1) Standard Deviation and Variance are two important measures of
Dispersion.
2) Coefficient of Variation is the ratio of standard deviation to mean
expressed as percentage.

3.4 EXERCISE

1) Find SD of 4, 6, 2, 8, 2

2) Find Variance of

X 2 3 4 5 6
F 65 78 110 88 86

3) Find Standard Deviation of

Range 10-20 20-30 30-40 40-50 50-60 60-70 70-80


F 5 4 8 9 4 5 3

4) Find QD and Coefficient of QD of

Range 0-4 4-8 8-12 12-16 16-20 20-24 24-28


F 5 12 24 18 16 12 1

5) Find Combined Mean and Combined Standard Deviation

Group 1 Group 2 Group 3


No. of observations 120 135 130
Mean 13 16 15
SD 3 5 4

3.5 LIST OF REFERENCES

1) Probability, Statistics, Design of Experiments and Queuing Theory with
Applications of Computer Science, K. S. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand



4
MOMENTS, SKEWNESS AND KURTOSIS
Unit Structure
4.0 Objective
4.1 Introduction
4.2 Moments
4.3 Relation between Central moments and Raw moments
4.4 Skewness
4.5 Kurtosis
4.6 Summary
4.7 Exercise
4.8 List of References

4.0 OBJECTIVE

Moments are used to describe characteristics of a distribution such
as central tendency and dispersion. Skewness refers to the lack of symmetry
of the curve about its centre, whereas Kurtosis refers to the peakedness of the
distribution curve relative to the normal distribution curve.

4.1 INTRODUCTION

Moments are a family of quantities, each describing a different
characteristic of a distribution.

Skewness refers to lack of symmetry in the distribution, whereas
Kurtosis refers to the peakedness of the distribution curve relative to the normal curve.

Skewness is measured by either Karl Pearson's measure or Bowley's
measure of Skewness.

4.2 MOMENTS

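Moments can be defined as the arithmetic mean of different powers of the
deviations of observations from a particular value. When that particular
value is zero, the moment is called a raw moment, and when that value is the
mean, the moment is called a central moment.

For ungrouped data :

Raw Moment for ungrouped data is given as :

$\mu'_r = \frac{\sum X^r}{n}$

Central Moment for ungrouped data is given as :

$\mu_r = \frac{\sum (X - \bar{X})^r}{n}$

In general Moment around a point a is given as

$\mu'_r(a) = \frac{\sum (X - a)^r}{n}$

For Grouped data :

Raw Moment for grouped data is given as :

$\mu'_r = \frac{\sum f X^r}{N}$, where $N = \sum f$

Central Moment for grouped data is given as :

$\mu_r = \frac{\sum f (X - \bar{X})^r}{N}$

In general Moment around a point a is given as

$\mu'_r(a) = \frac{\sum f (X - a)^r}{N}$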

Ex 1: Find first four raw moments of following data :


X 2 3 4 5
f 12 15 18 15

Solution :
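X      f   fX   fX^2  fX^3  fX^4
2      12  24   48    96    192
3      15  45   135   405   1215
4      18  72   288   1152  4608
5      15  75   375   1875  9375
Total  60  216  846   3528  15390

First Raw Moment : $\mu'_1 = \frac{\sum fX}{N} = \frac{216}{60} = 3.6$

Second Raw Moment : $\mu'_2 = \frac{\sum fX^2}{N} = \frac{846}{60} = 14.1$

Third Raw Moment : $\mu'_3 = \frac{\sum fX^3}{N} = \frac{3528}{60} = 58.8$

Fourth Raw Moment : $\mu'_4 = \frac{\sum fX^4}{N} = \frac{15390}{60} = 256.5$ (Ans)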

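4.3 RELATION BETWEEN CENTRAL MOMENTS AND RAW MOMENTS

$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3 \mu'_2 \mu'_1 + 2 (\mu'_1)^3$
$\mu_4 = \mu'_4 - 4 \mu'_3 \mu'_1 + 6 \mu'_2 (\mu'_1)^2 - 3 (\mu'_1)^4$

For grouped data, these results can be proved by replacing $\frac{\sum X^r}{n}$ with $\frac{\sum f X^r}{N}$.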
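
The relations between raw and central moments can be checked numerically. The Python sketch below uses the data of Ex 1 in section 4.2 and assumes the standard identities (for example, second central moment = second raw moment minus the square of the first raw moment); the function and variable names are hypothetical.

```python
def raw_moment(xs, freqs, r):
    """r-th raw moment: mean of X^r weighted by frequency."""
    return sum(f * x ** r for x, f in zip(xs, freqs)) / sum(freqs)

def central_moment(xs, freqs, r):
    """r-th central moment: mean of (X - mean)^r weighted by frequency."""
    mean = raw_moment(xs, freqs, 1)
    return sum(f * (x - mean) ** r for x, f in zip(xs, freqs)) / sum(freqs)

xs, freqs = [2, 3, 4, 5], [12, 15, 18, 15]      # data of Ex 1
m1, m2, m3, m4 = (raw_moment(xs, freqs, r) for r in range(1, 5))

# Central moments obtained from raw moments (assumed standard identities)
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

print(m1, m2, m3, m4)
print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
```

Computing the central moments both ways, from the identities and directly from deviations about the mean, should give the same values up to rounding.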

4.4 SKEWNESS

Skewness refers to deviation from (or lack of) symmetry. A curve
which is not symmetric about its central value is called a
skewed curve. When data is perfectly symmetrical,
mean, median and mode coincide at the central point. In case of
skewness, they change their position relative to each other.
Skewness can be positive or negative.
Skewness measurement can be Absolute or Relative.

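Absolute measures of Skewness :

There are two absolute measures.

1) Karl Pearson's measure of Skewness = Mean - Mode

2) Bowley's measure of Skewness = $Q_3 + Q_1 - 2 Q_2$,

Where, $Q_1$, $Q_2$ and $Q_3$ are the first, second (Median) and third quartiles.

Relative measures of Skewness :

There are three relative measures of Skewness.

1) Karl Pearson's coefficient of Skewness $= \frac{\text{Mean} - \text{Mode}}{\sigma}$

2) Bowley's coefficient of Skewness $= \frac{Q_3 + Q_1 - 2 Q_2}{Q_3 - Q_1}$

Bowley's coefficient of Skewness lies between -1 to +1

3) Coefficient of Skewness based on moments, $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$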

Ex 2: Find Karl Pearson’s coefficient of Skewness for 4, 5, 3, 5, 5

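Solution : Mean = $\frac{4 + 5 + 3 + 5 + 5}{5} = 4.4$
Mode = 5
$\sigma = \sqrt{\frac{0.16 + 0.36 + 1.96 + 0.36 + 0.36}{5}} = \sqrt{0.64} = 0.8$

Karl Pearson's coefficient of Skewness $= \frac{4.4 - 5}{0.8} = -0.75$ (negative skewness)
(Ans)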

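Ex 3 : Find Bowley’s coefficient of Skewness for the following data.

Score                0-20   20-40   40-60   60-80   80-100
Number of students   15     25      32      35      16

Solution :

Score                0-20   20-40   40-60   60-80   80-100
Number of students   15     25      32      35      16
cf                   15     40      72      107     123

Here N = 123. Each quartile is found from its class as
$Q = l + \frac{\text{Rank} - cf_{prev}}{f} \times h$
where l is the lower limit of the quartile class, $cf_{prev}$ is the cumulative frequency of the preceding class, f is the frequency of the quartile class and h is the class width.

To find Q1 : Rank = $\frac{N}{4} = 30.75$

Select cumulative frequency value higher or equal to Rank, i.e. 40, so the class is 20-40
$Q_1 = 20 + \frac{30.75 - 15}{25} \times 20 = 32.6$

To find Q2 : Rank = $\frac{N}{2} = 61.5$

Select cumulative frequency value higher or equal to Rank, i.e. 72, so the class is 40-60
$Q_2 = 40 + \frac{61.5 - 40}{32} \times 20 = 53.44$

To find Q3 : Rank = $\frac{3N}{4} = 92.25$

Select cumulative frequency value higher or equal to Rank, i.e. 107, so the class is 60-80
$Q_3 = 60 + \frac{92.25 - 72}{35} \times 20 = 71.57$

$S_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{71.57 + 32.6 - 2(53.44)}{71.57 - 32.6} = \frac{-2.71}{38.97} = -0.07$, slight negative Skewness (Ans)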
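The quartile steps of Ex 3 can be sketched in Python as follows. This is a minimal illustration that assumes equal class widths and takes the rank of the k-th quartile as kN/4; the function names `grouped_quartile` and `bowley_coefficient` are not from the text.

```python
def grouped_quartile(lower_limits, width, freqs, k):
    """k-th quartile (k = 1, 2, 3) of a grouped table by linear interpolation."""
    rank = k * sum(freqs) / 4
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= rank:          # first class whose cf reaches the rank
            return lower + (rank - cumulative) / f * width
        cumulative += f
    raise ValueError("rank exceeds total frequency")

def bowley_coefficient(lower_limits, width, freqs):
    """Bowley's coefficient (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = (grouped_quartile(lower_limits, width, freqs, k) for k in (1, 2, 3))
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Data of Ex 3: classes 0-20, 20-40, ..., 80-100
print(round(bowley_coefficient([0, 20, 40, 60, 80], 20, [15, 25, 32, 35, 16]), 4))
```

Working with unrounded quartiles gives a value close to -0.07, in line with the slight negative Skewness noted in the example.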

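Ex 4 : Find Karl Pearson’s coefficient of Skewness

Range      f
20-40      15
40-60      20
60-80      35
80-100     12
100-120    5

Solution :

Range      f     X      fX     X²      fX²
20-40      15    30     450    900     13500
40-60      20    50     1000   2500    50000
60-80      35    70     2450   4900    171500
80-100     12    90     1080   8100    97200
100-120    5     110    550    12100   60500
Total      87           5530   28500   392700

Mean = $\frac{\sum fX}{\sum f} = \frac{5530}{87} = 63.56$

$\sigma = \sqrt{\frac{\sum fX^2}{\sum f} - (\text{Mean})^2} = \sqrt{4513.79 - 4040.28} = \sqrt{473.51} = 21.76$

The modal class is 60-80 (highest frequency, 35), so
Mode = $l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 60 + \frac{35 - 20}{70 - 20 - 12} \times 20 = 67.89$

$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{63.56 - 67.89}{21.76} = -0.20$

The curve is slightly negatively skewed (Ans)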
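A hedged Python sketch of the Ex 4 calculation is given below. It assumes equal class widths, class mid-points as X, the population form of the standard deviation, and the usual interpolation formula for the mode of grouped data; the function name is illustrative only.

```python
from math import sqrt

def pearson_skewness_grouped(lower_limits, width, freqs):
    """Karl Pearson's coefficient (Mean - Mode) / SD for a grouped table."""
    mids = [low + width / 2 for low in lower_limits]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean**2
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # frequency before modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # frequency after modal class
    mode = lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return (mean - mode) / sqrt(variance)

# Data of Ex 4: classes 20-40, 40-60, ..., 100-120
print(round(pearson_skewness_grouped([20, 40, 60, 80, 100], 20, [15, 20, 35, 12, 5]), 3))
```

The result is a small negative number, which agrees with the conclusion that the curve is slightly negatively skewed.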

4.5 KURTOSIS

The Normal distribution curve is bell shaped in nature. But two distributions may both be symmetric and still differ in peakedness. One may have more height than the other. This characteristic is known as Kurtosis. The main reason for this variation in peak is the concentration of data around the mean value. The curve will have a higher peak for a smaller standard deviation.

A distribution that is peaked in the same way as a normal distribution is termed Mesokurtic.

A Leptokurtic distribution is one with a higher peak compared to a Mesokurtic distribution. The curve has a higher peak and is thin.

In contrast to a Leptokurtic distribution, a Platykurtic distribution is flattened at the top and has a broad appearance compared to Mesokurtic curves.

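Measure of Kurtosis :

Kurtosis is measured by $\beta_2 = \frac{\mu_4}{\mu_2^2}$ and $\gamma_2 = \beta_2 - 3$

For Mesokurtic distribution, $\beta_2 = 3$, and $\gamma_2 = 0$
For Leptokurtic distribution, $\beta_2 > 3$, and $\gamma_2 > 0$
For Platykurtic distribution, $\beta_2 < 3$, and $\gamma_2 < 0$

Both $\beta_2$ and $\gamma_2$ are unit free parameters and are independent of change of scale and change of origin.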
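The classification can be illustrated with a short Python sketch that uses the usual moment-based measure, the fourth central moment divided by the square of the second central moment, compared against 3. The function names and the two small data sets are made up for illustration.

```python
def beta2(data):
    """Moment measure of Kurtosis: mu4 / mu2**2 for ungrouped data."""
    n = len(data)
    mean = sum(data) / n
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu4 = sum((x - mean) ** 4 for x in data) / n
    return mu4 / mu2**2

def kurtosis_type(b2, tol=1e-9):
    """Name the type of curve from the value of the measure."""
    if abs(b2 - 3) <= tol:
        return "Mesokurtic"
    return "Leptokurtic" if b2 > 3 else "Platykurtic"

flat = [1, 2, 3, 4, 5]                   # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 10, -10]     # most values at the mean, two far away
print(kurtosis_type(beta2(flat)), kurtosis_type(beta2(peaked)))
```

Adding a constant to every value or multiplying every value by a constant leaves the measure unchanged, which is what independence of origin and scale means in practice.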

4.6 SUMMARY

1) Moments describe various parameters of a distribution
2) Raw moments and Central moments can be related through formulas
3) Skewness represents the extent of lack of symmetry in a distribution
4) Karl Pearson’s coefficient of Skewness and Bowley’s coefficient of Skewness are measures of Skewness
5) Kurtosis represents how thin or flattened a symmetrical distribution curve is compared with the normal curve
6) A distribution can be Mesokurtic, Leptokurtic or Platykurtic

4.7 EXERCISE

1) Explain Karl Pearson’s coefficient of Skewness.
2) Find Karl Pearson’s coefficient of Skewness for 12, 14, 13, 16, 18
3) Find Bowley’s coefficient of Skewness for the following data.

Score                0-10   10-20   20-30   30-40   40-50
Number of students   23     42      45      40      12

4) Given find

4.8 LIST OF REFERENCES

1) Probability, Statistics, design of experiments and queuing theory with applications of Computer Science, S. K. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand



5
CORRELATION AND REGRESSION ANALYSIS
Unit Structure
5.0 Objective
5.1 Introduction
5.2 Correlation
5.2.1 Scatter plot
5.2.2 Karl Pearson’s coefficient of Correlation
5.2.3 Properties of Correlation coefficient
5.2.4 Merits and Demerits of Correlation coefficient
5.2.5 Rank Correlation
5.3 Regression
5.3.1 Linear Regression using method of least squares
5.3.2 Regression coefficient
5.3.3 Coefficient of determination
5.3.4 Properties of Regression coefficients
5.4 Summary
5.5 Exercise
5.6 List of References

5.0 OBJECTIVE

Correlation, as the name suggests, relates two parameters. Statistically, the Correlation coefficient gives an estimate of the extent of correlation between these two parameters (or quantities). For example, one can correlate the score in a final exam with the number of hours of study during the term.

Regression is an estimation technique. It uses historical data to estimate the possible value of a parameter in future. Regression analysis helps to allocate resources based on estimation of the parameter, like estimation of future sales or estimation of future climatic conditions.

5.1 INTRODUCTION

Correlation can be measured statistically by the Coefficient of Correlation, or a Scatter graph can be used.

The Regression equation can be obtained either by the method of least squares or by using the Regression coefficient.

5.2 CORRELATION

Correlation analysis provides information about changes in one parameter with reference to changes in the other parameter. When one variable increases and the other also increases (maybe to a different extent), the correlation is positive. In contrast, when one variable increases and the other decreases, the correlation is termed negative. There can be instances when there is no correlation between two parameters.

Correlation can be represented by :

1) Scatter Graph (Graphical representation) or
2) Karl Pearson’s coefficient of correlation (r), which is a statistical measure of correlation

5.2.1 SCATTER GRAPH

Scatter Graph, also called X-Y plot, gives the following information about two parameters :
1) Shape (linear or non linear)
2) Extent of correlation
3) Nature of correlation like positive, negative or no correlation

Ex 1 : Plot Scatter Graph and comment.


X Y
3 12
5 15
8 32
9 35
12 45

Solution :

Comment : There seems to be a high positive and linear relationship between X and Y
(Ans)
Ex 2 : Plot Scatter Graph and comment.

X Y
56 12
45 15
32 32
22 35
12 45

Comment : There seems to be a high negative and linear relationship between X and Y
(Ans)

Ex 3 : Plot Scatter Graph and comment.

X Y
5 12
16 15
3 32
22 35
1 45

Solution :

Comment : There seems to be slight negative or no correlation between X and Y
(Ans)

Merits and Demerits of Scatter Graph

Merits :
1) Scatter Graph is easy to plot
2) It is also easy to understand and interpret general trend
3) Non linear relation can be easily detected
4) Scatter graph can very easily spot some abnormal values which are not consistent with the rest of the values
Demerits :
1) Scatter graph does not give a mathematical (or numerical) value of the correlation, hence it cannot be used in further calculations, only for visual observation
2) This method is useful only for a relatively small number of observations
3) It cannot be applied to qualitative data, such as emotions or sentiments, for which numerical values are not available

5.2.2 KARL PEARSON’S COEFFICIENT OF CORRELATION

Karl Pearson’s coefficient of correlation (r) is used to find the type of correlation, i.e. positive, negative or no correlation, and also the extent of correlation, like strong, medium or weak correlation.
It is a numerical measure of correlation and is very useful in statistical analysis.

Basic definition of r is
$r = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
But, working formula for r is,
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$

Ex 4 : Find Karl Pearson’s coefficient of correlation

X     Y
3     12
5     15
8     32
9     35
12    45

Solution :

         X     Y     XY     X²    Y²
         3     12    36     9     144
         5     15    75     25    225
         8     32    256    64    1024
         9     35    315    81    1225
         12    45    540    144   2025
Total    37    139   1222   323   4643

n = 5, number of ordered pairs

$r = \frac{5(1222) - (37)(139)}{\sqrt{5(323) - 37^2}\,\sqrt{5(4643) - 139^2}} = \frac{967}{\sqrt{246 \times 3894}} = 0.988$

There is very strong positive correlation between X and Y (Ans)

Ex 5 : Find Karl Pearson’s coefficient of correlation

X     Y
56    12
45    15
32    32
22    35
12    45

Solution :

         X     Y     XY     X²     Y²
         56    12    672    3136   144
         45    15    675    2025   225
         32    32    1024   1024   1024
         22    35    770    484    1225
         12    45    540    144    2025
Total    167   139   3681   6813   4643

n = 5, number of ordered pairs

$r = \frac{5(3681) - (167)(139)}{\sqrt{5(6813) - 167^2}\,\sqrt{5(4643) - 139^2}} = \frac{-4808}{\sqrt{6176 \times 3894}} = -0.980$

There is very strong negative correlation between X and Y (Ans)

Ex 6 : Find Karl Pearson’s coefficient of correlation

X     Y
5     12
16    15
3     32
22    35
1     45

Solution :

         X     Y     XY     X²    Y²
         5     12    60     25    144
         16    15    240    256   225
         3     32    96     9     1024
         22    35    770    484   1225
         1     45    45     1     2025
Total    47    139   1211   775   4643

n = 5, number of ordered pairs

$r = \frac{5(1211) - (47)(139)}{\sqrt{5(775) - 47^2}\,\sqrt{5(4643) - 139^2}} = \frac{-478}{\sqrt{1666 \times 3894}} = -0.188$

There is slight negative correlation between X and Y (Ans)
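The three examples above can be checked with a short Python sketch of the sums-based working formula for r. The function name `pearson_r` is an illustrative choice, not something defined in the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from the column totals."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

ys = [12, 15, 32, 35, 45]                      # Y column shared by Ex 4, 5 and 6
examples = {
    "Ex 4": [3, 5, 8, 9, 12],
    "Ex 5": [56, 45, 32, 22, 12],
    "Ex 6": [5, 16, 3, 22, 1],
}
for label, xs in examples.items():
    print(label, round(pearson_r(xs, ys), 3))
```

The sign of each result gives the type of correlation and the magnitude gives its extent, matching the comments made on the corresponding Scatter Graphs.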

5.2.3 PROPERTIES OF KARL PEARSON’S COEFFICIENT OF CORRELATION

1) Correlation coefficient lies between -1 and +1
2) Correlation coefficient is independent of change of origin and scale
3) If variables are independent then they are uncorrelated (r near zero), but the converse is not true
4) Sometimes, the correlation value may mislead, as some correlation may appear by chance even though there is no actual relationship between the variables

5.2.4 MERITS AND DEMERITS OF COEFFICIENT OF CORRELATION

Merits :
1) It is easy to understand and easy to calculate
2) It indicates the type of correlation, i.e. negative, positive or no correlation
3) It also gives clear information about the extent of correlation, +1 for perfect positive and -1 for perfect negative correlation
Demerits :
1) It can mislead, as a higher correlation does not always mean a close relationship. Two variables can have a high value of correlation but may not actually have any relationship
2) It is affected by extreme values of the data set
3) A non linear relation is not very clearly indicated by the correlation coefficient, whereas it is clearly seen in a Scatter plot

5.2.5 RANK CORRELATION

Rank correlation coefficient measures the degree of similarity between two rankings.

For example, in a singing competition, two judges may give their independent opinion about the participants through ranking, say 1, 2, 3 etc. With the Rank correlation coefficient, one can find the extent to which these two judges agree on the performance of the participants.

Spearman’s Rank Correlation

$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$

Where d is the difference in Ranks and n is the number of pairs of ranks

Ex 7 : Find Spearman’s Rank Correlation

R1 R2
1 2
2 3
3 1
4 5
5 4

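Solution :

R1      R2    d = R1 – R2    d²
1       2     -1             1
2       3     -1             1
3       1     2              4
4       5     -1             1
5       4     1              1
Total                        8

$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.6$

(Ans)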
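A minimal Python sketch of Spearman's formula for ranks without ties is shown below, using the ranks of Ex 7. The function name is an assumption made for illustration.

```python
def spearman_rank_correlation(rank1, rank2):
    """Spearman's rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)), no tied ranks."""
    n = len(rank1)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Ranks given by the two judges in Ex 7
print(spearman_rank_correlation([1, 2, 3, 4, 5], [2, 3, 1, 5, 4]))
```

Identical rankings give +1 and exactly reversed rankings give -1, so the value found here indicates moderate agreement between the two rankings.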

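Spearman’s Rank Correlation when Ranks are repeated

$R = 1 - \frac{6\left[\sum d^2 + \sum \frac{m^3 - m}{12}\right]}{n(n^2 - 1)}$

Where d is the difference in Ranks and m is the number of times a rank is repeated, with one correction term for each group of repeated ranks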

5.3 REGRESSION

Regression is an estimation technique. It uses historical data/information to estimate/predict a near future value of a parameter. For example, the score of a student in a Mathematics exam can be predicted based on the student’s performance in the previous few years.

Regression line :

If X is the independent variable, and Y is the dependent variable, then the Regression line can be given as :

$Y = a + bX$

The above Regression equation represents a straight line. In practice, there can be a non linear relationship between X and Y; in such a case, the Regression equation can also include square, cube or higher degree terms.

The Regression Equation approximates the scattered points by a straight line, i.e. the Regression line, at the cost of some error at individual points.

5.3.1 LINEAR REGRESSION USING METHOD OF LEAST SQUARES

Method of Least Squares is one of the methods to derive the Regression Equation.

The two parameters a and b of the Regression Equation $Y = a + bX$ can be found out using two normal equations.

$\sum Y = na + b\sum X$ ……….. Normal Equation I

$\sum XY = a\sum X + b\sum X^2$ ……….. Normal Equation II

Solving these equations gives the values of a and b required to form the Regression Equation

Ex 8 : Form Regression Equation for the following data set.

X     Y
5     12
12    15
15    32
22    35
25    45

Solution :

The two Normal equations are :

$\sum Y = na + b\sum X$ ……….. Normal Equation I

$\sum XY = a\sum X + b\sum X^2$ ……….. Normal Equation II

         X     Y     XY     X²
         5     12    60     25
         12    15    180    144
         15    32    480    225
         22    35    770    484
         25    45    1125   625
Total    79    139   2615   1503

Substituting these values in the two normal equations :

$139 = 5a + 79b$
$2615 = 79a + 1503b$

Solving simultaneously, or by method of substitution,

$b = \frac{2094}{1274} = 1.6436$ and $a = 1.8305$

Substituting these values in the Regression Equation :

$Y = 1.8305 + 1.6436X$ is the Regression Equation (Ans)
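The method of least squares can be sketched in Python by solving the two normal equations for the line Y = a + bX with Cramer's rule. The function name `least_squares_line` is hypothetical; the data is that of Ex 8.

```python
def least_squares_line(xs, ys):
    """Solve the two normal equations of Y = a + bX for (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Normal equations:  sy  = n*a  + b*sx
    #                    sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = least_squares_line([5, 12, 15, 22, 25], [12, 15, 32, 35, 45])
print(round(a, 4), round(b, 4))
```

Once a and b are known, an estimate of Y for any chosen X is simply a + b times that X.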

Ex 9 :Form Regression Equation for the following data set, and hence
estimate

X Y
1 25
3 18
4 12
6 5
9 1

Solution :

The two Normal equations are :

$\sum Y = na + b\sum X$ ……….. Normal Equation I

$\sum XY = a\sum X + b\sum X^2$ ……….. Normal Equation II

         X     Y     XY    X²
         1     25    25    1
         3     18    54    9
         4     12    48    16
         6     5     30    36
         9     1     9     81
Total    23    61    166   143

Substituting these values in the two normal equations :

$61 = 5a + 23b$
$166 = 23a + 143b$

Solving simultaneously, or by method of substitution,

$b = \frac{-573}{186} = -3.0806$ and $a = 26.3710$

Substituting these values in the Regression Equation :

$Y = 26.3710 - 3.0806X$ is the Regression Equation
For
,
(Ans)

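5.3.2 REGRESSION COEFFICIENT

Regression Coefficient b of Y on X

Regression Coefficient b of Y on X is given as :

$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

Regression Equation can now be obtained as :

$Y - \bar{Y} = b_{yx}(X - \bar{X})$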

Ex 10 : Find Regression Equation using Regression coefficient
X Y
2 13
3 24
4 54
6 65
9 72

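Solution :

         X     Y     XY     X²
         2     13    26     4
         3     24    72     9
         4     54    216    16
         6     65    390    36
         9     72    648    81
Total    24    228   1352   146

Regression Coefficient b of Y on X is given as :

$b_{yx} = \frac{5(1352) - (24)(228)}{5(146) - 24^2} = \frac{1288}{154} = 8.3636$

$\bar{X} = \frac{24}{5} = 4.8$ and $\bar{Y} = \frac{228}{5} = 45.6$
Regression Equation can now be obtained as :

$Y - 45.6 = 8.3636(X - 4.8)$

$Y = 5.4545 + 8.3636X$ is the Regression Equation (Ans)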

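Regression Coefficient b of X on Y
Regression Coefficient b of X on Y is given as :

$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$

Regression Equation can now be obtained as :

$X - \bar{X} = b_{xy}(Y - \bar{Y})$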

5.3.3 COEFFICIENT OF DETERMINATION

The Coefficient of determination, $r^2$, is a parameter used to judge how well the estimated Regression line fits all the data, where $r$ is Karl Pearson’s coefficient of Correlation.

If the Regression line passes through all or most points, then the coefficient of determination will be close to 1.
Since $-1 \le r \le 1$, $0 \le r^2 \le 1$

Significance of coefficient of determination

1) It gives the strength of the linear relationship between two variables
2) It gives confidence in predicting the dependent variable from the independent variable
3) The coefficient of determination is the ratio of explained variation to total variation
4) It represents the proportion of data that is closest to the line of best fit
5) It is a measure of how well the Regression line represents the data

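5.3.4 PROPERTIES OF REGRESSION COEFFICIENT

1) The point $(\bar{X}, \bar{Y})$ lies on both the Regression lines
2) In case of perfect correlation between two variables, $r = +1$ or $r = -1$
3) Slope of Regression equation Y on X is given as $b_{yx}$, whereas slope of Regression equation X on Y is given as $\frac{1}{b_{xy}}$
4) The angle $\theta$ between two Regression lines is given as,
$\tan\theta = \left|\frac{1 - r^2}{r} \cdot \frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right|$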
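The following Python sketch computes both regression coefficients from the column totals and illustrates a standard identity: the product of the two coefficients equals the square of Karl Pearson's r, i.e. the coefficient of determination. The function name and the data set (X = 3, 5, 8, 9, 12 and Y = 12, 15, 32, 35, 45) are chosen for illustration.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): slope of Y on X and coefficient of X on Y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy           # common numerator of both coefficients
    b_yx = numerator / (n * sxx - sx**2)
    b_xy = numerator / (n * syy - sy**2)
    return b_yx, b_xy

xs, ys = [3, 5, 8, 9, 12], [12, 15, 32, 35, 45]
b_yx, b_xy = regression_coefficients(xs, ys)
r_squared = b_yx * b_xy                     # coefficient of determination
print(round(b_yx, 4), round(b_xy, 4), round(r_squared, 4))
```

Because both coefficients share the same numerator, they always have the same sign, which is also the sign of r.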

5.4 SUMMARY

1) Correlation between two parameters can be represented either by a Scatter Graph or by Karl Pearson’s coefficient of Correlation (r)
2) Karl Pearson’s coefficient of correlation ranges between -1 and +1. Negative correlation has a negative value of r and positive correlation has a positive value of r
3) Regression line helps to estimate or predict a near future value of the dependent parameter using historical values of the independent variable
4) Regression line can be found out using the method of least squares or using the Regression coefficient method
5) Coefficient of determination helps to understand how well the regression line fits or covers all or most data points

5.5 EXERCISE

1) Plot Scatter Graph and comment

X Y
201 34
226 45
230 56
312 53
340 62
357 64

2) Find Karl Pearson’s coefficient of correlation

X Y
55 12
43 10
32 7
24 4
18 3
11 1

3) Find Spearman’s Rank Correlation

R1 R2
1 4
2 3
3 2
4 1
5 5

4) Find Regression Equation for the following data set, using method of
least squares

X Y
12 12
18 34
26 67
34 87
53 106
66 134

5) Find Regression Equation using Regression coefficient

X Y
1 4
6 22
8 45
10 77
11 87

5.6 LIST OF REFERENCES

1) Probability, Statistics, design of experiments and queuing theory with applications of Computer Science, S. K. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand



6
PROBABILITY
Unit Structure
6.0 Objective
6.1 Introduction
6.2 Some basic definitions of Probability
6.3 Permutations and Combinations
6.4 Classical and axiomatic definitions of Probability
6.5 Addition Theorem
6.6 Conditional Probability
6.7 Bayes’ Theorem
6.8 Summary
6.9 Exercise
6.10 List of References

6.0 OBJECTIVE

The study of Probability helps the learner to find solutions to various types of problems which have some uncertainty in their occurrence. This chapter explains various definitions, concepts and terms used in the study of probability in detail.

The learner should be able to understand and find solutions to various problems for which probability theory gives a reasonably good solution.

6.1 INTRODUCTION

Study of Probability is the study of chance. Probability theory is widely applied to understand economic, social as well as business problems. Refer to the statements used by us in our daily life :
1) The train may get delayed
2) There is a chance of Mahesh getting a distinction in Mathematics
3) Asha may come on time today

Such statements are commonly used by all of us. One can systematically study such probable events using the principles of Probability discussed in this chapter.

6.2 SOME BASIC DEFINITIONS OF PROBABILITY

Experiment : An experiment is an action that has more than one possible outcome

For Example :
1) Tossing a coin gives either a Head or a Tail
2) Throwing a die gives any one number from 1 to 6 on the top face of the die
3) A student appearing for an exam may pass or may fail the exam

An experiment may be random or deterministic.

The outcome of a random experiment cannot be predicted with certainty; it occurs randomly, without any bias. In the random experiments considered in this chapter, all outcomes are usually taken to be equally likely. For example, tossing a coin

The outcome of a deterministic experiment does not change when it is performed many times. For example, counting the number of windows of a particular room

Outcome : The result of an experiment is called an outcome. For example, the count obtained on counting the number of students in a class
Trial : Performing an experiment is called taking a trial
Sample Space : The collection of all possible outcomes is called the sample space of that experiment. For example, drawing a ball from a box having three balls of Red, Blue and Green colours has a sample space of balls of Red, Blue and Green colours. Sample space is denoted by the letter S
Sample point : Each outcome of the sample space is called a sample point. The total number of sample points is denoted as n(S)
Finite sample space : When the number of outcomes is finite, the sample space is a finite sample space. For example, the number of students in the Statistics class of a college
Countably infinite sample space : When the number of elements in a sample space is infinite but the elements can be counted one by one, the sample space is said to be a countably infinite sample space. For example, the set of all natural numbers
Exhaustive outcomes : Outcomes are exhaustive if they combine to be the entire sample space. For example, outcomes Head and Tail are exhaustive outcomes when a coin is tossed
Event : Any subset of the sample space associated with a random experiment is called an Event. For example, for a sample space S = {1, 2, 3, 4, 5}, an event A can be “getting an odd number” and can be written as A = {1, 3, 5}

Types of Event : Events can be described as given below :
1) Simple event : An event having only one outcome is called a simple event. For example, the event of getting a head when a coin is tossed
2) Impossible event : The event corresponding to the null set is called an impossible event. For example, the event of getting a number more than 6 when a die is thrown
3) Sure event : The event corresponding to the sample space is called a sure event. For example, the event of getting either a head or a tail when a fair coin is tossed
4) Mutually exclusive events : Two or more events are said to be mutually exclusive events if they do not have a sample point in common. For example, the event of getting an even number and the event of getting an odd number when a die is rolled
5) Exhaustive events : The events are said to be exhaustive events if at least one of them is sure to occur. For example, the events of getting a red card and getting a black card when a card is drawn from a pack of cards
6) Equally likely events : When all events have the same chance of occurrence, the events are equally likely. For example, getting a Head and getting a Tail when an unbiased coin is tossed are equally likely events
7) Independent events : Two or more events are said to be independent events if the occurrence of any one of them is not affected by the occurrence of the others, i.e. P(A/B) = P(A)

6.3 PERMUTATIONS AND COMBINATIONS

Factorial: Factorial of a non-negative integer n is written as $n!$ such that
$n! = n \times (n - 1) \times \cdots \times 2 \times 1$, with $0! = 1$

Ex 1 : Find
Solution : (Ans)
Permutation : Permutation means arrangement of objects in different ways. For example, three objects A, B and C taken two at a time can be arranged as AB, BA, BC, CB, CA, AC. We can arrange them in six different ways, as the order or sequence of objects is important in Permutations. So, the number of arrangements of n objects taken r at a time can be written as,
$^nP_r = \frac{n!}{(n - r)!}$

Ex 2 : Find
Solution : take

(Ans)

Ex 3 : How many ways are there for eight men and five women to stand in a row so that no two women stand next to each other.

Solution :
Eight men can be arranged in $8!$ ways.
Five women can be placed in any of the 9 places shown below :
* M * M * M * M * M * M * M * M *
Here * represents a place for a woman, and M represents a place for a man.
Five women can be arranged in 9 places in $^9P_5 = \frac{9!}{4!} = 15120$ ways.
So, together eight men and five women can be arranged such that no two women stand together as :
Total number of ways = $8! \times {}^9P_5 = 40320 \times 15120 = 609638400$ ways
(Ans)

Ex 4 : In how many ways can the letters of the word ‘MOUSE’ be arranged, where meaning/spelling does not matter.

Solution :
The five distinct letters can be arranged in $5! = 120$ ways. (Ans)

Combination : Combination is a selection of objects without considering the order of arrangement. For example, for three objects A, B and C, when two objects are taken at a time, the selections can be AB, BC and AC. The order or sequence of arrangement is not important in case of Combination. So, the number of Combinations of n objects taken r at a time can be written as,
$^nC_r = \frac{n!}{r!\,(n - r)!}$

Ex 5 : Find
Solution : take

(Ans)

Ex 6 : Find

Solution :

Also,

(Ans)

Ex 7 : In how many ways can a committee of 2 officers and 3 clerks be made from 4 officers and 10 clerks.
Solution : This can be done in $^4C_2 \times {}^{10}C_3$ ways

$= 6 \times 120 = 720$ ways (Ans)
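The counts in Ex 3, Ex 4 and Ex 7 can be verified with Python's standard `math` module, which provides `factorial`, `perm` and `comb` (Python 3.8 or later). The variable names below are illustrative.

```python
from math import comb, factorial, perm

# Ex 3: arrange 8 men, then place 5 women in 5 of the 9 gaps (order matters)
men_and_women = factorial(8) * perm(9, 5)

# Ex 4: arrangements of the 5 distinct letters of MOUSE
mouse_arrangements = factorial(5)

# Ex 7: choose 2 of 4 officers and 3 of 10 clerks (order does not matter)
committees = comb(4, 2) * comb(10, 3)

print(men_and_women, mouse_arrangements, committees)
```

`perm(n, r)` counts ordered arrangements and `comb(n, r)` counts unordered selections, so `perm(n, r)` always equals `comb(n, r)` multiplied by `factorial(r)`.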

6.4 CLASSICAL AND AXIOMATIC DEFINITIONS OF PROBABILITY

Classical definition of Probability

When a random experiment is conducted having sample space S with n(S) equally likely outcomes, and the event A has n(A) favourable outcomes, the probability of occurrence of event A is given as P(A) such that :

$P(A) = \frac{n(A)}{n(S)}$

Some important points regarding the Probability definition are :

1) The sum of the probabilities of all outcomes in the sample space is 1 (one)
2) The probability of an impossible event is 0 (zero)
3) The probability of a sure event is 1 (one)
4) The probability of an event not occurring is 1 – the probability of the event occurring, i.e. $P(A') = 1 - P(A)$

Ex 8 : Write down the sample space for each of the following cases
1) A coin is tossed three times
2) A coin is tossed three times and the number of heads is noted
3) A tetrahedron (a solid with four triangular surfaces) whose sides are painted red, red, blue and green is thrown. The colour of the side touching the ground is noted
4) Blood groups of a husband and wife are tested and noted

Solution
1) S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
2) S = {0, 1, 2, 3}
3) S = {Red, Blue, Green}
4) S = {(x, y) : x, y ∈ {A, B, AB, O}}, where x is the blood group of the husband and y that of the wife

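Ex 9 : Three unbiased coins are tossed. What is the probability of getting at least one Head.
Solution :
Sample Space, S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, so n(S) = 8

Let A be the event of getting at least one Head
A contains every outcome except TTT, so n(A) = 7

$P(A) = \frac{n(A)}{n(S)} = \frac{7}{8}$ (Ans)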

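Ex 10 : Nine tickets are marked with numbers 1 to 9. One ticket is drawn at random. What is the probability that the number is an odd number.
Solution :
S = {1, 2, 3, 4, 5, 6, 7, 8, 9}, so n(S) = 9
Let A be the event of getting an odd number, A = {1, 3, 5, 7, 9}, so n(A) = 5

$P(A) = \frac{n(A)}{n(S)} = \frac{5}{9}$ (Ans)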

Ex 11 : An urn contains 8 blue balls, 7 green balls and 5 red balls. One ball is drawn at random. What is the probability that it is (a) a red ball, (b) a blue ball.
Solution :
n(S) = 8 + 7 + 5 = 20

(a) Let A be the event of getting a red ball
n(A) = 5, so $P(A) = \frac{5}{20} = \frac{1}{4}$

(b) Let B be the event of getting a blue ball
n(B) = 8, so $P(B) = \frac{8}{20} = \frac{2}{5}$

Ex 12 : From a well shuffled pack of cards, a card is drawn at random. What is the probability that the drawn card is a red card
Solution :
n(S) = 52

Let A be the event of getting a red card
n(A) = 26

$P(A) = \frac{26}{52} = \frac{1}{2}$ (Ans)

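Ex 13 : What is the probability of getting a sum of nine (9) when two dice are thrown
Solution :
n(S) = 6 × 6 = 36

Let A be the event of getting a sum of nine (9)
A = {(3, 6), (4, 5), (5, 4), (6, 3)}, so n(A) = 4

$P(A) = \frac{4}{36} = \frac{1}{9}$ (Ans)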

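Ex 14 : The Board of Directors of a company wants to form a quality management committee to monitor the quality of their products. The company has 5 scientists, 4 engineers and 6 accountants. Find the probability that the committee will have 2 scientists, 1 engineer and 2 accountants.
Solution :
The committee has 2 + 1 + 2 = 5 members, chosen from 5 + 4 + 6 = 15 persons
$n(S) = {}^{15}C_5 = 3003$

Let A be the event of having 2 scientists, 1 engineer and 2 accountants
$n(A) = {}^5C_2 \times {}^4C_1 \times {}^6C_2 = 10 \times 4 \times 15 = 600$

$P(A) = \frac{600}{3003} = \frac{200}{1001} \approx 0.1998$ (Ans)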
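The classical definition, favourable outcomes divided by total outcomes, can be illustrated in Python by listing small sample spaces explicitly. The helper `probability` and the variable names are assumptions made for this sketch; exact fractions are used so that results can be compared with the worked answers.

```python
from fractions import Fraction
from itertools import product
from math import comb

def probability(sample_space, event):
    """Classical probability n(A) / n(S) for equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Three coins: at least one Head
coins = list(product("HT", repeat=3))
p_at_least_one_head = probability(coins, lambda o: "H" in o)

# Two dice: sum equal to nine
dice = list(product(range(1, 7), repeat=2))
p_sum_nine = probability(dice, lambda o: sum(o) == 9)

# Committee of 5 from 5 scientists, 4 engineers, 6 accountants (counted, not listed)
p_committee = Fraction(comb(5, 2) * comb(4, 1) * comb(6, 2), comb(15, 5))

print(p_at_least_one_head, p_sum_nine, p_committee)
```

Listing outcomes works for small sample spaces; for larger ones, such as the committee problem, counting with combinations is the practical route.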

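Axiomatic definition of Probability

Suppose, for an experiment, S is the sample space containing outcomes $x_1, x_2, \ldots, x_n$, then assigning a real number $p(x_i)$ to each $x_i \in S$ such that
1) $0 \le p(x_i) \le 1$

2) $\sum_{i=1}^{n} p(x_i) = 1$

defines a probability on S.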

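6.5 ADDITION THEOREM

If A and B are two events defined on sample space S, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

a) Addition theorem can also be explained by Venn diagram

[Venn diagram showing events A and B and their overlap]

b) If two events are mutually exclusive, then
$P(A \cup B) = P(A) + P(B)$

c) For three events,
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$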

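Ex 15 : An integer is chosen at random from 1 to 100. Find the probability that it is a multiple of 5 or a perfect square
Solution :
n(S) = 100

Let A be the event of getting a multiple of 5
A = {5, 10, 15, ..., 100}, n(A) = 20, $P(A) = \frac{20}{100}$

Let B be the event of getting a perfect square
B = {1, 4, 9, 16, 25, 36, 49, 64, 81, 100}, n(B) = 10, $P(B) = \frac{10}{100}$

$A \cap B$ = {25, 100}, $P(A \cap B) = \frac{2}{100}$

By addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{20}{100} + \frac{10}{100} - \frac{2}{100} = \frac{28}{100}$

Required probability of getting a multiple of 5 or a perfect square is 0.28

(Ans)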

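Ex 16 : A card is drawn at random from a pack of cards. Find the probability that the drawn card is a diamond or a face card.
Solution :
n(S) = 52

Let A be the event of getting a diamond card
n(A) = 13, $P(A) = \frac{13}{52}$

Let B be the event of getting a face card
n(B) = 12, $P(B) = \frac{12}{52}$

There are 3 face cards of diamonds, so $P(A \cap B) = \frac{3}{52}$

By addition theorem,
$P(A \cup B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$

Required probability of getting a diamond or a face card is $\frac{11}{26}$

(Ans)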
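The addition theorem can be checked with Python sets, where union and intersection of events correspond to the set operators `|` and `&`. The sketch below uses the integers 1 to 100 with the events "multiple of 5" and "perfect square"; all names are illustrative.

```python
from fractions import Fraction

numbers = set(range(1, 101))                        # sample space: integers 1..100
multiples_of_5 = {n for n in numbers if n % 5 == 0}
perfect_squares = {k * k for k in range(1, 11)}     # 1, 4, 9, ..., 100

def prob(event):
    """Classical probability of an event that is a subset of `numbers`."""
    return Fraction(len(event), len(numbers))

direct = prob(multiples_of_5 | perfect_squares)     # P(A or B) counted directly
by_theorem = (prob(multiples_of_5) + prob(perfect_squares)
              - prob(multiples_of_5 & perfect_squares))
print(direct, by_theorem)
```

Subtracting the probability of the intersection corrects for 25 and 100, which would otherwise be counted twice.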

6.6 CONDITIONAL PROBABILITY

Let there be two events A and B. The probability of event A given that event B has occurred is known as the conditional probability of A given that B has occurred and is given as :

$P(A/B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

Ex 17: Given . Find

Solution :

(Ans)

Ex 18 : Find the probability that a single toss of a die will result in a number less than 4 if it is given that the toss resulted in an odd number.
Solution : Let event A be the toss resulting in an odd number, A = {1, 3, 5}
And let event B be getting a number less than 4, B = {1, 2, 3}
$A \cap B$ = {1, 3}

$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{2/6}{3/6} = \frac{2}{3}$ (Ans)
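Conditional probability can be sketched in Python by restricting the sample space to the outcomes where the given event has occurred and then counting within that reduced space. The function name is hypothetical; the data is the single die toss of Ex 18.

```python
from fractions import Fraction

def conditional_probability(sample_space, event, given):
    """P(event / given) for equally likely outcomes, by counting."""
    reduced = [o for o in sample_space if given(o)]    # outcomes where `given` holds
    both = [o for o in reduced if event(o)]            # of those, where `event` holds
    return Fraction(len(both), len(reduced))

die = range(1, 7)
p_less_than_4_given_odd = conditional_probability(
    die,
    event=lambda n: n < 4,
    given=lambda n: n % 2 == 1,
)
print(p_less_than_4_given_odd)
```

Counting inside the reduced sample space is equivalent to dividing the probability of both events occurring together by the probability of the given event.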
6.7 BAYES’ THEOREM

Let $A_1, A_2, \ldots, A_n$ be a set of mutually exclusive events that together form the sample space S. Let $B$ be any event from the same sample space. Then Bayes’ theorem states that

$P(A_i/B) = \frac{P(A_i)\,P(B/A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B/A_j)}$

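Ex 19 : In a toy factory, machines $A_1$, $A_2$ and $A_3$ manufacture respectively 25%, 35% and 40% of total toys. Of these 5%, 4% and 2% are defective toys. A toy is selected at random and is found to be defective. What is the probability that it was manufactured by machine $A_2$
Solution :
$P(A_1) = 0.25$, $P(A_2) = 0.35$, $P(A_3) = 0.40$

Let $B$ be the event that the drawn toy is defective.
$P(B/A_1) = 0.05$, $P(B/A_2) = 0.04$, $P(B/A_3) = 0.02$

We have to find $P(A_2/B)$

$P(A_2/B) = \frac{(0.35)(0.04)}{(0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02)} = \frac{0.014}{0.0345} = 0.4058$

Required probability is approximately 0.41 (Ans)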
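As a rough sketch, Bayes' theorem for a set of mutually exclusive causes can be written as a small Python function that turns prior probabilities and conditional probabilities into posterior probabilities. The function name `bayes_posteriors` is an assumption; the numbers are the production shares (25%, 35%, 40%) and defect rates (5%, 4%, 2%) of the three machines in Ex 19, in the order listed.

```python
def bayes_posteriors(priors, likelihoods):
    """Posterior probability of each cause given that the event has occurred."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(cause) * P(event / cause)
    total = sum(joint)                                     # total probability of the event
    return [j / total for j in joint]

posteriors = bayes_posteriors([0.25, 0.35, 0.40], [0.05, 0.04, 0.02])
for machine, p in enumerate(posteriors, start=1):
    print(f"machine {machine}: {p:.4f}")
```

The posteriors sum to 1, and the machine with the largest posterior is the most likely source of a defective toy even though it is not the machine with the largest share of production.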

6.8 SUMMARY

1)

2)

3)
4)

5)

6)

6.9 EXERCISE

1) One card is drawn at random from a pack of cards. What is the probability that it is a King or a Queen.
2) Find
3) Given an equiprobable sample space , and an
event Find )
4) Given, Find

5) A class has 40 boys and 20 girls. In how many ways can a class representative (CR) be selected such that the CR is either a boy or a girl
6) From a set of 16 tickets numbered from 1 to 16, one ticket is drawn at random. Find the probability that the number is divisible by 2 or 5
7) A car manufacturing company has two plants. Plant A manufactures 70% of the cars and plant B manufactures 30% of the cars. 80% and 90% of the cars are of standard quality at plant A and plant B respectively. A car is selected at random and is found to be of standard quality. What is the probability that it was manufactured in plant A

6.10 LIST OF REFERENCES

1) Probability, Statistics, design of experiments and queuing theory with applications of Computer Science, S. K. Trivedi, PHI
2) Applied Statistics, S C Gupta, S Chand

