
Statistical Methods Overview and Applications

Uploaded by

TANISHA SINHA
Copyright
© All Rights Reserved

Statistical Methods and Applications (MSC 503)

Text Books:
• Introduction to Probability Models, 12th Edition, 2019, Sheldon M. Ross, Academic
Press, Elsevier.
• Applied Statistics and Probability for Engineers, 7th edition, 2018, D. C. Montgomery
and G. C. Runger, John Wiley & Sons.
Reference Books:
• Statistics for Management, 8th edition, 2017, Richard I. Levin, Masood Husain
Siddiqui, Sanjay Rastogi and David S. Rubin, Pearson Education Publication.
• An Introduction to Probability Theory and Its Applications, Volume 1, 3rd Edition,
1991, William Feller, John Wiley & Sons.
Case Studies: to be provided
Marks Distribution
• Quizzes: 20% marks
• Mid-semester Examination: 32% marks
• End Semester Examination: 48% marks
Introduction
• Variables
• Uncertainty/Variations
• Random Variables
• Data: Categorical / Continuous
• Causes of Variations
  • Chance causes/Natural causes
  • Assignable causes
• Data filtering: outliers, leverage points
• Statistics: the science of modeling random variables
Example:
The following sample data set lists the number of minutes that
50 Internet subscribers spent on the Internet during their most
recent session.
50 40 41 17 11 7 22 44 28 21
19 23 37 51 54 42 86 41 78 56
72 56 17 7 69 30 80 56 29 33
46 31 39 20 18 29 34 59 73 77
36 39 30 62 54 67 39 31 53 44
Frequency Distribution
• Grouping of data into mutually exclusive categories (classes or
intervals) showing the number of observations in each class
(frequency, f).
• Constructing a frequency distribution:
  • Choose the number of classes (k): usually at least 6–7.
  • Find the class width (i):
    • it should be the same for all classes, and
    • with L the lowest and H the highest observation value, i ≥ (H − L)/k.
  • Use tally marks to count the frequency of each class.
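Solution:
Here L = 7, H = 86 and k = 7, so i ≥ (86 − 7)/7 ≈ 11.3; a class width of i = 12 is used (19 − 7 = 12).

Class | Tally | Frequency, f
7 – 18 | IIII I | 6
19 – 30 | IIII IIII | 10
31 – 42 | IIII IIII III | 13
43 – 54 | IIII III | 8
55 – 66 | IIII | 5
67 – 78 | IIII I | 6
79 – 90 | II | 2
Total | | Σf = 50

In each class, the first number is the lower class limit and the second is the upper class limit.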
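The construction steps above can be sketched in Python. This is a minimal sketch. It assumes integer-valued data and inclusive class limits, and the function name `frequency_distribution` is hypothetical. To make sure the highest value H lands inside the last class, it picks the smallest whole width with k · i ≥ H − L + 1. That width also satisfies i ≥ (H − L)/k.

```python
import math

# The 50 session lengths (minutes) from the Internet-usage example
minutes = [
    50, 40, 41, 17, 11, 7, 22, 44, 28, 21,
    19, 23, 37, 51, 54, 42, 86, 41, 78, 56,
    72, 56, 17, 7, 69, 30, 80, 56, 29, 33,
    46, 31, 39, 20, 18, 29, 34, 59, 73, 77,
    36, 39, 30, 62, 54, 67, 39, 31, 53, 44,
]


def frequency_distribution(data, k):
    """Return (lower_limit, upper_limit, frequency) for k equal-width classes.

    Assumes integer data, so consecutive classes are L..L+i-1, L+i..L+2i-1, ...
    """
    low, high = min(data), max(data)
    # Smallest whole width with k * i >= H - L + 1, so H falls in the last class
    width = math.ceil((high - low + 1) / k)
    table = []
    for c in range(k):
        lower = low + c * width
        upper = lower + width - 1
        # "Tally" the observations that fall inside the class limits
        f = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, f))
    return table


for lower, upper, f in frequency_distribution(minutes, k=7):
    print(f"{lower:>2} - {upper:>2}: {f}")
```

With k = 7, this reproduces the classes 7 – 18 through 79 – 90 and the frequencies in the solution table.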

Class | Midpoint | Frequency, f
7 – 18 | 12.5 | 6
19 – 30 | 24.5 | 10
31 – 42 | 36.5 | 13

$$\text{Midpoint of a class} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}, \quad \text{e.g. } \frac{7 + 18}{2} = 12.5$$

Class width = 19 − 7 = 12
Relative Frequency of a class
• Portion or percentage of the data that falls in a particular class.

$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sample size}} = \frac{f}{n}$$

Class | Frequency, f | Relative Frequency
7 – 18 | 6 | 6/50 = 0.12
19 – 30 | 10 | 10/50 = 0.20
31 – 42 | 13 | 13/50 = 0.26

Over all classes, $\sum f/n = 1$.
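Here is a short Python sketch of the relative-frequency calculation for all seven classes of the example. The frequencies are copied from the frequency table, and the variable names are illustrative.

```python
# Class frequencies from the frequency table of the example
classes = ["7-18", "19-30", "31-42", "43-54", "55-66", "67-78", "79-90"]
freqs = [6, 10, 13, 8, 5, 6, 2]

n = sum(freqs)  # sample size
relative = [f / n for f in freqs]

for label, rf in zip(classes, relative):
    print(f"{label}: {rf:.2f}")

# The relative frequencies of all classes add up to 1 (up to rounding error)
print(f"total: {sum(relative):.2f}")
```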

Cumulative frequency of a class
• The sum of the frequency for that class and all previous classes.

Class | Frequency, f | Cumulative frequency
7 – 18 | 6 | 6
19 – 30 | 10 | 6 + 10 = 16
31 – 42 | 13 | 16 + 13 = 29
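A running sum is exactly what `itertools.accumulate` computes, so the cumulative column can be sketched in a few lines. Dividing by the final total n also gives the cumulative relative frequency. The frequencies come from the example, and the variable names are illustrative.

```python
from itertools import accumulate

freqs = [6, 10, 13, 8, 5, 6, 2]

# Running total: each class's frequency plus those of all previous classes
cumulative = list(accumulate(freqs))
# Dividing by the last running total (= n) gives the cumulative relative frequency
cumulative_relative = [c / cumulative[-1] for c in cumulative]

print(cumulative)
print([round(c, 2) for c in cumulative_relative])
```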
Class boundaries
• The numbers that separate classes without leaving gaps between them. They are mostly used for continuous
variables.
• The distance from the upper limit of the first class to the lower limit of the second class is 19 – 18 = 1.
• Half this distance is 0.5.
• First class lower boundary = 7 – 0.5 = 6.5
• First class upper boundary = 18 + 0.5 = 18.5

Class | Class boundaries | Midpoint | Frequency, f
7 – 18 | 6.5 – 18.5 | 12.5 | 6
19 – 30 | 18.5 – 30.5 | 24.5 | 10
31 – 42 | 30.5 – 42.5 | 36.5 | 13
43 – 54 | 42.5 – 54.5 | 48.5 | 8
55 – 66 | 54.5 – 66.5 | 60.5 | 5
67 – 78 | 66.5 – 78.5 | 72.5 | 6
79 – 90 | 78.5 – 90.5 | 84.5 | 2
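The boundary and midpoint rules can be sketched in Python as follows. This is a minimal sketch, assuming the same gap of 1 between every pair of successive classes, as in the example. Each class is widened by half the gap on both sides, and the midpoint is the average of the two limits.

```python
# Class limits of the example (integer minutes)
limits = [(7, 18), (19, 30), (31, 42), (43, 54), (55, 66), (67, 78), (79, 90)]

# Gap between one class's upper limit and the next class's lower limit: 19 - 18 = 1
half_gap = (limits[1][0] - limits[0][1]) / 2

boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
midpoints = [(lo + hi) / 2 for lo, hi in limits]
width = limits[1][0] - limits[0][0]  # distance between successive lower limits

for (lo, hi), (b_lo, b_hi), m in zip(limits, boundaries, midpoints):
    print(f"{lo} - {hi}: boundaries {b_lo} - {b_hi}, midpoint {m}")
```

The upper boundary of each class equals the lower boundary of the next one, so the classes leave no gaps.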
Frequency Distribution

Class | Frequency, f | Midpoint, x | Relative frequency | Cumulative frequency | Cumulative relative frequency
7 – 18 | 6 | 12.5 | 0.12 | 6 | 0.12
19 – 30 | 10 | 24.5 | 0.20 | 16 | 0.32
31 – 42 | 13 | 36.5 | 0.26 | 29 | 0.58
43 – 54 | 8 | 48.5 | 0.16 | 37 | 0.74
55 – 66 | 5 | 60.5 | 0.10 | 42 | 0.84
67 – 78 | 6 | 72.5 | 0.12 | 48 | 0.96
79 – 90 | 2 | 84.5 | 0.04 | 50 | 1.00

[Figure: Histogram of Internet usage time. Frequency per class (bins ending at 18, 30, 42, 54, 66, 78 and 90 minutes) on the left axis; cumulative percentage (0% to 100%) on the right axis.]

Typical Histogram Shapes and What They Mean
• Normal. A common pattern is the bell-shaped curve known as the “normal distribution.” In a normal distribution, points are as likely to occur on one side of the average as on the other.
• Skewed. The distribution is asymmetrical because a natural limit prevents outcomes on one side.
  • The distribution’s peak is off center toward the limit, and a tail stretches away from it.
  • For example, the distribution of a product’s purity would be skewed, because the product cannot be more than 100 percent pure. Other examples of natural limits are holes that cannot be smaller than the diameter of the drill bit, or call-handling times that cannot be less than zero. These distributions are called right- or left-skewed according to the direction of the tail.
• Double-peaked or bimodal. There are two peaks in the distribution.
  • The outcomes of two processes with different distributions are combined in one set of data.
  • For example, a distribution of production data from a two-shift operation might be bimodal if each shift produces a different distribution of results. Stratification often reveals this problem.
• Plateau. The plateau might be called a “multimodal distribution.”
  • Several processes with normal distributions are combined. Because there are many peaks close together, the top of the distribution resembles a plateau.
• Truncated. The truncated distribution looks like a normal distribution with the tails cut off.
  • The supplier might be producing a normal distribution of material and then relying on inspection to separate what is within specification limits from what is out of spec. The resulting shipments to the customer, taken from inside the specifications, are truncated.
• Dog food. The dog food distribution is missing something: results near the average.
  • If a customer receives this kind of distribution, someone else is receiving a heart cut, and the customer is left with the “dog food,” the odds and ends left over after the master’s meal. Even though what the customer receives is within specifications, the product falls into two clusters: one near the upper specification limit and one near the lower specification limit. This variation often causes problems in the customer’s process.
Characterizing a distribution
• Data sets tend to cluster around a particular value
  • Measures of central tendency
• How widely is the data set spread?
  • Measures of dispersion
• Symmetry of the distribution: skewness
• Peakedness of the distribution: kurtosis
Measures of Central Tendency
• A measure of central tendency is a single value that summarizes a group of data.
• There are two main objectives for studying measures of central tendency:
  • To get one single value that represents the entire data set
  • To facilitate comparison
Different Measures of Central Tendency
• Mean:
  • Arithmetic Mean
  • Weighted Mean
  • Geometric Mean
  • Harmonic Mean
• Median
• Mode
Arithmetic Mean
• The arithmetic mean is the sum of a set of observations (positive, negative or zero) divided by the number of observations. If we have n real numbers $x_1, x_2, x_3, \ldots, x_n$, their arithmetic mean is

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Arithmetic Mean of Grouped Data
• If $x_1, x_2, x_3, \ldots, x_k$ are the class midpoints and $f_1, f_2, f_3, \ldots, f_k$ are the corresponding frequencies, where the subscript k stands for the number of classes, then the mean is

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Example: Find the Mean of a Frequency Distribution
Class | Midpoint, x | Frequency, f | x∙f
7 – 18 | 12.5 | 6 | 12.5 × 6 = 75.0
19 – 30 | 24.5 | 10 | 24.5 × 10 = 245.0
31 – 42 | 36.5 | 13 | 36.5 × 13 = 474.5
43 – 54 | 48.5 | 8 | 48.5 × 8 = 388.0
55 – 66 | 60.5 | 5 | 60.5 × 5 = 302.5
67 – 78 | 72.5 | 6 | 72.5 × 6 = 435.0
79 – 90 | 84.5 | 2 | 84.5 × 2 = 169.0
Total | | n = 50 | Σ(x∙f) = 2089.0

$$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{2089}{50} = 41.78 \approx 41.8 \text{ minutes}$$
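The grouped-data formula can be sketched in Python, with the class midpoints standing in for the individual observations. The function name `grouped_mean` is illustrative.

```python
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
freqs = [6, 10, 13, 8, 5, 6, 2]


def grouped_mean(x, f):
    """Arithmetic mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    return sum(xi * fi for xi, fi in zip(x, f)) / sum(f)


mean = grouped_mean(midpoints, freqs)
print(f"{mean:.2f} minutes")  # 41.78 minutes
```

Because every observation in a class is replaced by the class midpoint, the grouped mean is an approximation of the mean of the raw data.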
Weighted Mean
The weighted mean of the positive real numbers $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ is defined to be

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Example:
Source | Score, x | Weight, w | x∙w
Test Mean | 86 | 0.50 | 86(0.50) = 43.0
Midterm | 96 | 0.15 | 96(0.15) = 14.4
Final Exam | 82 | 0.20 | 82(0.20) = 16.4
Computer Lab | 98 | 0.10 | 98(0.10) = 9.8
Homework | 100 | 0.05 | 100(0.05) = 5.0
Total | | Σw = 1 | Σ(x∙w) = 88.6

$$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{88.6}{1} = 88.6$$

Your weighted mean performance for the course is 88.6.
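A minimal Python sketch of the weighted mean applied to the course example. The helper `weighted_mean` is hypothetical. It divides by the sum of the weights, so it also works when the weights do not add up to 1.

```python
# (score, weight) for each source in the example
components = {
    "Test Mean": (86, 0.50),
    "Midterm": (96, 0.15),
    "Final Exam": (82, 0.20),
    "Computer Lab": (98, 0.10),
    "Homework": (100, 0.05),
}


def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i); the weights need not add up to 1."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


scores = [x for x, _ in components.values()]
weights = [w for _, w in components.values()]
print(f"{weighted_mean(scores, weights):.1f}")  # 88.6
```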


Median
• The median is the middle value of the observations when they are arranged in order, such that the number of observations above it is equal to the number of observations below it.
• If n is odd: $M_e = X_{\left(\frac{n+1}{2}\right)}$
• If n is even: $M_e = \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$

Median of Grouped Data

$$\text{Median} = l + \frac{N/2 - F}{f} \times i$$

where
• l = lower limit (boundary) of the median class
• N = total frequency (total number of observations)
• F = cumulative frequency of the class just preceding the median class
• f = frequency of the median class
• i = size of the class interval

Class boundaries | Midpoint, x | Frequency, f | Cumulative frequency
6.5 – 18.5 | 12.5 | 6 | 6
18.5 – 30.5 | 24.5 | 10 | 16
30.5 – 42.5 | 36.5 | 13 | 29
42.5 – 54.5 | 48.5 | 8 | 37
54.5 – 66.5 | 60.5 | 5 | 42
66.5 – 78.5 | 72.5 | 6 | 48
78.5 – 90.5 | 84.5 | 2 | 50

N/2 = 50/2 = 25
Median class: 30.5 – 42.5 (the first class whose cumulative frequency reaches 25)
Md = 30.5 + (25 − 16) × 12/13 ≈ 38.81
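Both median rules can be sketched in Python. This is a minimal sketch, and the function names are hypothetical. The grouped version walks the cumulative frequencies until it reaches N/2, then interpolates inside that class using l + (N/2 − F)/f × i.

```python
def median_ungrouped(data):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # X_((n+1)/2) with 1-based positions
    return (xs[mid - 1] + xs[mid]) / 2    # (X_(n/2) + X_(n/2+1)) / 2


def median_grouped(boundaries, freqs):
    """Median of grouped data: l + (N/2 - F) / f * i."""
    target = sum(freqs) / 2      # N/2
    F = 0                        # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and F + f >= target:   # first class reaching N/2 is the median class
            return lower + (target - F) / f * (upper - lower)
        F += f
    raise ValueError("no observations")


boundaries = [(6.5, 18.5), (18.5, 30.5), (30.5, 42.5), (42.5, 54.5),
              (54.5, 66.5), (66.5, 78.5), (78.5, 90.5)]
freqs = [6, 10, 13, 8, 5, 6, 2]

print(round(median_grouped(boundaries, freqs), 2))  # 38.81
print(median_ungrouped([1, 2, 2, 3, 3, 3, 4]))      # 3
```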
QUARTILES
• The values that divide the data into four equal parts when the observations are arranged in order.
• There are therefore three quartiles, Q1, Q2 and Q3:
  • Q1 (1st quartile): 25% of the observations lie below it and 75% above.
  • Q2 (2nd quartile): same as the median; 50% lie below and 50% above.
  • Q3 (3rd quartile): 75% lie below and 25% above.
• To calculate the nth quartile (l, N, F, f and i as for the median, using the quartile class):

$$Q_n = l + \frac{nN/4 - F}{f} \times i$$

Class boundaries | Midpoint, x | Frequency, f | Cumulative frequency
6.5 – 18.5 | 12.5 | 6 | 6
18.5 – 30.5 | 24.5 | 10 | 16
30.5 – 42.5 | 36.5 | 13 | 29
42.5 – 54.5 | 48.5 | 8 | 37
54.5 – 66.5 | 60.5 | 5 | 42
66.5 – 78.5 | 72.5 | 6 | 48
78.5 – 90.5 | 84.5 | 2 | 50

N/4 = 50/4 = 12.5
Q1 class: 18.5 – 30.5 (the first class whose cumulative frequency reaches 12.5)
Q1 = 18.5 + (12.5 − 6) × 12/10 = 26.3
QUINTILES, DECILES & PERCENTILES
• Quintiles: four points that divide the data into five equal parts.
• Deciles: nine points that divide the data into ten equal parts.
  • To calculate the nth decile:

$$D_n = l + \frac{nN/10 - F}{f} \times i$$

• Percentiles: 99 points that divide the data set into 100 equal parts of the total frequency.
  • To calculate the nth percentile:

$$P_n = l + \frac{nN/100 - F}{f} \times i$$
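Quartiles, deciles and percentiles all follow the same pattern, l + (nN/parts − F)/f × i, so a single function can cover them. This is a minimal sketch: the name `partition_value` and its `parts` parameter are hypothetical. Setting parts = 4 gives quartiles, 5 quintiles, 10 deciles and 100 percentiles.

```python
def partition_value(boundaries, freqs, n, parts):
    """nth partition value of grouped data: l + (n*N/parts - F) / f * i."""
    target = n * sum(freqs) / parts   # position n*N/parts in the ordered data
    F = 0                             # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and F + f >= target:
            return lower + (target - F) / f * (upper - lower)
        F += f
    raise ValueError("target position lies beyond the data")


boundaries = [(6.5, 18.5), (18.5, 30.5), (30.5, 42.5), (42.5, 54.5),
              (54.5, 66.5), (66.5, 78.5), (78.5, 90.5)]
freqs = [6, 10, 13, 8, 5, 6, 2]

print(round(partition_value(boundaries, freqs, 1, 4), 2))     # Q1 -> 26.3
print(round(partition_value(boundaries, freqs, 3, 4), 2))     # Q3 -> 55.7
print(round(partition_value(boundaries, freqs, 1, 10), 2))    # D1 -> 16.5
print(round(partition_value(boundaries, freqs, 90, 100), 2))  # P90 -> 72.5
```

With n = 2 and parts = 4 (or n = 5 and parts = 10), the function returns the grouped median.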
Mode
• The mode is the value of a distribution for which the frequency is maximum. In other words, the mode is the value of the variable that occurs with the highest frequency.
• So the mode of the list (1, 2, 2, 3, 3, 3, 4) is 3.
• The mode is not necessarily well defined: several values may share the highest frequency, or no value may repeat at all.

Mode of Grouped Data

$$M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

• L1 = lower boundary of the modal class
• Δ1 = difference of frequency between the modal class and the previous class
• Δ2 = difference of frequency between the modal class and the following class
• i = class interval

Class boundaries | Frequency, f
6.5 – 18.5 | 6
18.5 – 30.5 | 10
30.5 – 42.5 | 13
42.5 – 54.5 | 8
54.5 – 66.5 | 5
66.5 – 78.5 | 6
78.5 – 90.5 | 2

• Modal class: 30.5 – 42.5 (highest frequency, 13)
• L1 = 30.5
• Δ1 = 13 − 10 = 3
• Δ2 = 13 − 8 = 5
• i = 12

$$M_0 = 30.5 + \frac{3}{3 + 5} \times 12 = 35$$
Geometric Mean
• The geometric mean is defined as the positive nth root of the product of the n observations. Symbolically,

$$G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$$

• It is often used for a set of numbers whose values are meant to be multiplied together or are exponential in nature, such as data on the growth of the human population, or ratios.
• It cannot be used with numbers of value 0 or negative.

Geometric Mean of Grouped Data
If the n non-zero, positive values $x_1, x_2, \ldots, x_n$ occur $f_1, f_2, \ldots, f_n$ times, respectively, then the geometric mean of the set of observations is defined by

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N} = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N}, \qquad N = \sum_{i=1}^{n} f_i$$
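A minimal Python sketch of both geometric-mean formulas. The helper `geometric_mean` is hypothetical. It evaluates the product through logarithms, exp(Σ fᵢ ln xᵢ / N), which gives the same G but avoids overflow for long lists. The growth factors in the usage lines are made-up illustration values, not data from the course.

```python
import math


def geometric_mean(values, freqs=None):
    """(x1^f1 * x2^f2 * ... * xn^fn) ** (1/N), with N = sum(f).

    Computed as exp(sum(f * ln x) / N), which avoids overflow of the product.
    """
    if freqs is None:
        freqs = [1] * len(values)        # ungrouped data: every value occurs once
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    N = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / N)


# Hypothetical yearly growth factors: +10%, +20%, -5%
growth = [1.10, 1.20, 0.95]
avg = geometric_mean(growth)
print(f"average growth factor: {avg:.4f}")
```

For ungrouped data, Python 3.8+ also provides `statistics.geometric_mean`.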
Harmonic Mean
• The harmonic mean (formerly sometimes called the subcontrary mean) is one of several kinds of average.
• Typically, it is appropriate for situations when the average of rates is desired. The harmonic mean is the number of observations divided by the sum of their reciprocals. It is useful for rates such as speed (= distance/time).
• The harmonic mean H of the positive real numbers $x_1, x_2, \ldots, x_n$ is defined to be:

Ungrouped Data

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Grouped Data (values $x_i$ occurring $f_i$ times)

$$H = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}, \qquad N = \sum_{i=1}^{n} f_i$$
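A minimal Python sketch of both harmonic-mean formulas. The helper `harmonic_mean` is hypothetical, and the speeds in the example are illustrative. When equal distances are driven at 60 km/h and at 40 km/h, the average speed is total distance over total time, 2d / (d/60 + d/40). That is the harmonic mean, 48 km/h, not the arithmetic mean of 50 km/h.

```python
def harmonic_mean(values, freqs=None):
    """N / sum(f_i / x_i), with N = sum(f); ungrouped data uses f_i = 1."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Equal distances at 60 km/h and 40 km/h (illustrative speeds)
print(round(harmonic_mean([60, 40]), 6))  # 48.0
```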
