Chap 2 Introduction To Statistics
0 1 3 0 1 0 1 0
1 5 4 1 2 1 2 0
1 0 2 0 0 2 0 1
2 1 1 1 2 1 1
0 4 1 3 1 1 1
1 3 4 0 0 0 0
1 3 0 1 2 2 3
Data in this form is:
Meaningless
Not effective
Difficult to use
Need to summarize data in the form of:
Graphical – Freq. Dist., Histogram, Graphs,
Charts, Diagrams
Analytical – Measures of central tendency,
Measures of dispersion
Frequency Distribution (FD)
Summary of how often data (observations) occur
within each subdivision or group of observed
values
Help visualize distribution of data
Can see how total frequency is distributed
Two types :
Ungrouped data – listing of observed values
Grouped data – lump together observed values
e.g. cells 20.5 to 21.5, 21.5 to 22.5, 22.5 to 23.5
FD - Ungrouped Data
1. Establish the array: arrange the observed values in ascending or descending order (as in column 1)
2. Tabulate the frequency: place tally marks in column 2
3. Present in graphical form: histogram, relative frequency distribution

No. of      Tally mark      Frequency
0           ///////////     13
1           ////
2           /////
3           ////
4
5
FD – Ungrouped data
4 graphical representations:
1. Frequency histogram
2. Relative frequency histogram
3. Cumulative frequency histogram
4. Relative cumulative frequency histogram
[Figure: frequency histogram; y-axis: Frequency (0 to 14), x-axis: 0 to 5]
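As a rough illustration of the tabulation steps and the four representations, the sketch below counts a small sample and derives the frequency, relative frequency, cumulative frequency and relative cumulative frequency of each value. The sample is only the first two rows of the data listing at the start of the chapter, so its counts are not meant to match the tally table; the variable names are illustrative assumptions.

```python
from collections import Counter

# First two rows of the chapter's data listing (a subset, for illustration only).
data = [0, 1, 3, 0, 1, 0, 1, 0,
        1, 5, 4, 1, 2, 1, 2, 0]

counts = Counter(data)
n = len(data)

# One row per value in the array: value, frequency, relative frequency,
# cumulative frequency, relative cumulative frequency.
table = []
cumulative = 0
for value in range(min(data), max(data) + 1):
    freq = counts.get(value, 0)
    cumulative += freq
    table.append((value, freq, freq / n, cumulative, cumulative / n))

for value, freq, rel, cum, rel_cum in table:
    tally = "/" * freq
    print(f"{value:>3}  {tally:<8} {freq:>3}  {rel:.3f}  {cum:>3}  {rel_cum:.3f}")
```

Each of the four numeric columns corresponds to the heights of the bars in one of the four histograms.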
CELL NOMENCLATURE
[Figure: a cell, showing its midpoint and upper boundary]
2. Determine the range
R = XH - XL
R = range
XH = highest value of data
XL = lowest value of data
Example :
If highest number is 2.575 and lowest number
is 2.531, then
R = XH - XL
= 2.575 – 2.531
= 0.044
3. Determine the cell interval
Cell interval = distance between adjacent cell midpoints.
If possible, use odd interval values, e.g. 0.001, 0.07,
0.5, 3, so that the midpoint values have the same number of
decimal places as the data values.
Use Sturges' rule (n = number of observations):
i = R/(1 + 3.322 log n)
or trial and error:
h = R/i ; h = number of cells or classes
Assume i = 0.003; h = 0.044/0.003 = 15 cells
Assume i = 0.005; h = 0.044/0.005 = 9 cells
Assume i = 0.007; h = 0.044/0.007 = 6 cells
A cell interval of 0.005 with 9 cells gives the best presentation
of the data. Use the guidelines in step 1.
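The range and cell-interval steps can be sketched as follows. The high and low values and the three trial intervals come from the example; the sample size `n_obs` is not given in the example, so the value used here is purely hypothetical, as are the function and variable names.

```python
import math

x_high = 2.575        # highest value of data
x_low = 2.531         # lowest value of data
R = x_high - x_low    # range

def sturges_interval(data_range, n):
    """Cell interval suggested by i = R / (1 + 3.322 log10 n)."""
    return data_range / (1 + 3.322 * math.log10(n))

# The example does not state n; 100 is a hypothetical sample size.
n_obs = 100
i_sturges = sturges_interval(R, n_obs)

# Trial and error: number of cells h = R / i, rounded to a whole number.
cells = {i: round(R / i) for i in (0.003, 0.005, 0.007)}

print(f"R = {R:.3f}")
print(f"Sturges interval for n = {n_obs}: {i_sturges:.4f}")
for i, h in cells.items():
    print(f"i = {i}: h = {h} cells")
```

The rule only suggests a starting value; the interval actually chosen is normally a convenient odd value near it.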
4. Determine cell midpoints
MPL = XL + i/2 (do not round)
= 2.531 + 0.005/2 = 2.5335, taken as 2.533
The 1st cell contains 5 different values (as do the other
cells):
2.531 2.532 2.533 2.534 2.535
Its midpoint is 2.533; the midpoint of the next cell is 2.538.
5. Determine cell boundaries
Limit values of a cell:
lower
upper
Used to avoid ambiguity when placing data into cells
Boundary values have an extra decimal
place or significant figure of accuracy compared with the
observed values:
Upper boundary = highest value in cell + 0.0005
Lower boundary = lowest value in cell - 0.0005
6. Tabulate cell frequency
Post the number of observations that fall in each cell
Frequency distribution table
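Steps 4 to 6 can be sketched together as below, using the example's lowest value (2.531), cell interval (0.005) and 9 cells. The arithmetic is done in whole thousandths to avoid floating-point drift, which is a choice made for this sketch. The list `observations` is hypothetical data invented only to show the tabulation.

```python
UNIT = 0.001      # data are recorded to 3 decimal places
x_low = 2531      # lowest value, in thousandths
interval = 5      # cell interval i = 0.005, in thousandths
n_cells = 9

cells = []
for k in range(n_cells):
    lowest = x_low + k * interval        # lowest value in the cell
    highest = lowest + interval - 1      # highest value in the cell
    midpoint = lowest + interval // 2    # i/2 truncated, not rounded up
    cells.append({
        "midpoint": midpoint * UNIT,
        "lower_boundary": (lowest - 0.5) * UNIT,   # lowest value - 0.0005
        "upper_boundary": (highest + 0.5) * UNIT,  # highest value + 0.0005
        "frequency": 0,
    })

# Hypothetical observations, for illustration only.
observations = [2.531, 2.533, 2.536, 2.540, 2.575]
for x in observations:
    k = (round(x / UNIT) - x_low) // interval   # index of the cell holding x
    cells[k]["frequency"] += 1

for c in cells:
    print(f"{c['lower_boundary']:.4f} to {c['upper_boundary']:.4f}  "
          f"midpoint {c['midpoint']:.3f}  f = {c['frequency']}")
```

Because the boundaries carry one more decimal place than the data, no observation can fall exactly on a boundary.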
[Figure: distribution shapes: symmetrical; skewed right; skewed left; platykurtic (flatter); leptokurtic (very peaked)]
Characteristics of Frequency
Distribution
F.D. can give sufficient info to provide basis for
decision making.
Distributions are compared regarding:-
Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
Example
A QA engineer inspects the tread depth (mm) of 5 tyres.
What is the mean tread depth?
x1 = 12.3 x2 = 12.5 x3 = 12.0
x4 = 13.0 x5 = 12.8
$$\bar{x} = \frac{\sum x_i}{5} = \frac{62.6}{5} = 12.52 \approx 12.5 \text{ mm}$$
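A quick way to check the arithmetic of this example is to compute the sum and the mean directly. The variable names below are illustrative.

```python
# Tread depth readings (mm) from the example.
depths_mm = [12.3, 12.5, 12.0, 13.0, 12.8]

total = sum(depths_mm)
mean_depth = total / len(depths_mm)

print(f"sum = {total:.1f} mm, mean = {mean_depth:.2f} mm")
```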
Mean - Grouped Data
Used when the data are already grouped in a frequency
distribution
$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{\sum_{i=1}^{h} f_i}$$
Σfi (= n) = sum of frequencies
fi = frequency in the ith cell
h = number of cells/classes
xi = midpoint of the ith cell
Mean - Grouped Data
Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi    Cumulative Σfi
1          1 - 20           10              2           20      2
2          21 - 40          30              10          300     12
3          41 - 60          50              20          1000    32
4          61 - 80          70              12          840     44
5          81 - 100         90              6           540     50
Totals                                      50          2700
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2700}{50} = 54$$
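The grouped mean can be reproduced from the midpoints and frequencies in the table. This is a minimal sketch with illustrative names.

```python
# Midpoints (xi) and frequencies (fi) taken from the table.
midpoints = [10, 30, 50, 70, 90]
frequencies = [2, 10, 20, 12, 6]

sum_f = sum(frequencies)                                      # n = sum of fi
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of fi * xi
grouped_mean = sum_fx / sum_f

print(f"sum fi = {sum_f}, sum fi*xi = {sum_fx}, mean = {grouped_mean}")
```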
Weighted average
Tensile tests on an aluminium alloy were conducted with a different
number of samples each time. Results are as follows:
1st test : x1 = 207 MPa n = 5
2nd test : x2 = 203 MPa n = 6
3rd test : x3 = 206 MPa n = 3
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
xw = weighted avg.
wi = weight of ith average
$$\bar{x}_w = \frac{(5)(207) + (6)(203) + (3)(206)}{5 + 6 + 3} = 205 \text{ MPa}$$
or use weights whose sum equals 1.00:
W1 = 5/(5+6+3) = 0.36
W2 = 6/(5+6+3) = 0.43
W3 = 3/(5+6+3) = 0.21 Total = 1.00
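Both forms of the weighted average (raw sample sizes as weights, or weights scaled to sum to 1.00) give the same result, as this short sketch shows. The names are illustrative.

```python
# Test means (MPa) and the number of samples in each test.
means_mpa = [207, 203, 206]
sizes = [5, 6, 3]

# Form 1: sample sizes used directly as weights.
weighted_mean = sum(w * x for w, x in zip(sizes, means_mpa)) / sum(sizes)

# Form 2: weights normalised so that they sum to 1.00.
weights = [w / sum(sizes) for w in sizes]
weighted_mean_normalised = sum(w * x for w, x in zip(weights, means_mpa))

print([round(w, 2) for w in weights])
print(f"{weighted_mean:.2f} MPa, {weighted_mean_normalised:.2f} MPa")
```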
Median – Ungrouped Data
Median – value of data which divides total
observation into 2 equal parts
Ungrouped data – 2 possibilities
When total number of data (N) is a) odd or b)
even
If N is odd: the ((N+1)/2)th value is the median
e.g. 3 4 5 6 8: (N+1)/2 = 6/2 = 3, so the median is the 3rd number, 5
If N is even: the median is the average of the two middle values
e.g. 3 5 7 9: ½ of (5+7) = 6
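The two cases can be sketched as one small function. The function name is an illustrative choice; Python's standard `statistics.median` behaves the same way.

```python
def median(values):
    """Median of ungrouped data, handling odd and even N."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N+1)/2)th value, which is index N//2 when counting from 0.
        return ordered[mid]
    # Even N: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 4, 5, 6, 8]))
print(median([3, 5, 7, 9]))
```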
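Sample standard deviation (s)
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
xi = observed value
x̄ = average
n = no. of observed values
or use
$$s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}}$$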
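The two forms of the sample standard deviation are algebraically equivalent. The sketch below checks this numerically, reusing the five tread-depth readings from the earlier mean example as sample data; applying them here is an illustrative choice, not part of the original example.

```python
import math
import statistics

# Tread depth readings (mm) from the earlier mean example.
x = [12.3, 12.5, 12.0, 13.0, 12.8]
n = len(x)
mean = sum(x) / n

# Definition form: squared deviations from the mean, divided by n - 1.
s_definition = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# Computational form: uses only the sum of x and the sum of x squared.
sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(f"{s_definition:.4f} {s_shortcut:.4f} {statistics.stdev(x):.4f}")
```

The computational form avoids calculating the mean first, which is convenient by hand, although it is more sensitive to rounding when the values are large relative to their spread.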
Standard deviation – grouped data
Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi    Cumulative Σfi
1          1 - 20           10              2           20      2
2          21 - 40          30              10          300     12
3          41 - 60          50              20          1000    32
4          61 - 80          70              12          840     44
5          81 - 100         90              6           540     50
Totals                                      50          2700
$$s = \sqrt{\frac{n \sum_{i=1}^{h} f_i x_i^2 - \left(\sum_{i=1}^{h} f_i x_i\right)^2}{n(n - 1)}} = \sqrt{\frac{50(166{,}600) - (2700)^2}{50(49)}} = \sqrt{424.49} = 20.6$$
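The grouped standard deviation can be checked from the same midpoints and frequencies. This is a minimal sketch with illustrative names.

```python
import math

# Midpoints (xi) and frequencies (fi) taken from the table.
midpoints = [10, 30, 50, 70, 90]
frequencies = [2, 10, 20, 12, 6]

n = sum(frequencies)                                              # sum of fi
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))       # sum of fi * xi
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints)) # sum of fi * xi^2

variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(variance)

print(f"sum fi*xi^2 = {sum_fx2}, variance = {variance:.2f}, s = {s:.1f}")
```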
             SAMPLE (statistic)    POPN. (parameter)
Mean         x̄                     μ
Std. dev.    s                     σ
Normal Distribution
Also called Gaussian distribution
Symmetrical, unimodal, bell-shaped distribution
with mean, median and mode having the same value
Population curve: as the sample size increases and the cell
interval decreases, the frequency polygon becomes a smooth curve
Normal Distribution
Much of the variation in nature and industry
follows the N.D.
Variation in height of humans, weight of
elephants, casting weights, size of piston
rings
Electrical properties, material properties (tensile
strength, etc.)
Example - ND
Characteristics of ND
Can have different mean but same
standard deviation
Different standard deviation but same
mean
Relationship between std
deviation and area under curve
Normal Distribution Example
Need estimates of mean and standard
deviation and the Normal Table
Example :
From past experience a manufacturer
concludes that the burnout time of a
particular light bulb follows a normal
distribution. A sample has been tested and
the average (x̄) found to be 60 days with
a standard deviation (σ) of 20 days. What
proportion of bulbs can be expected to be still
working after 100 days?
Solution
The problem is to find the area under the curve beyond 100 days
Sketch the normal distribution and shade the area needed
Calculate the z value corresponding to the x value using the formula
Z = (xi - μ)/σ = (100 - 60)/20 = +2.00
Look in the Normal Table for z = +2.00: gives area under curve as
0.9773
But we want x > 100, i.e. z > 2.00. Therefore Area = 1.000 - 0.9773
= 0.0227, i.e. a 2.27% probability that the life of a light bulb is > 100
days
[Figure: normal curve with μ = 60 and σ = 20; x-axis marked at 0, 60 and 100]
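The table lookup can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of the printed Normal Table; the variable names are illustrative, and the computed areas may differ from the table in the fourth decimal place because of rounding.

```python
from statistics import NormalDist

mean_days = 60.0   # estimate of the mean from the example
sd_days = 20.0     # estimate of the standard deviation from the example
x = 100.0          # life of interest, in days

z = (x - mean_days) / sd_days          # standardised value
area_below = NormalDist().cdf(z)       # area under the curve up to z
area_beyond = 1.0 - area_below         # area beyond x = 100 days

print(f"z = {z:+.2f}")
print(f"area below = {area_below:.4f}, area beyond = {area_beyond:.4f}")
```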
Test For Normality
To determine whether data is normal
Probability Plot - plot data on normal probability
paper
Steps
1. Order the data
2. Rank the observations
3. Calculate the plotting position
i = rank, n = sample size, PP = plotting position in %
$$PP = \frac{100(i - 0.5)}{n}$$
4. Label data scale
5. Plot the points on normal probability paper
6. Attempt to fit a 'best line' by eye
7. Determine normality
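Steps 1 to 3 can be sketched as below. The sample values are hypothetical, invented only to show the calculation. The extra z column is an addition that is not part of the steps above: it is the standard normal quantile of each plotting position, which plays the role of the probability-paper scale when the plot is drawn on ordinary axes.

```python
from statistics import NormalDist

# Hypothetical sample, for illustration only.
sample = [44, 32, 46, 34, 39]

ordered = sorted(sample)      # step 1: order the data
n = len(ordered)

rows = []
for rank, value in enumerate(ordered, start=1):   # step 2: rank the observations
    pp = 100 * (rank - 0.5) / n                   # step 3: plotting position in %
    z = NormalDist().inv_cdf(pp / 100)            # normal quantile (an addition)
    rows.append((rank, value, pp, z))

for rank, value, pp, z in rows:
    print(f"{rank:>2}  {value:>4}  {pp:5.1f}%  {z:+.3f}")
```

If the data are approximately normal, plotting the ordered values against the z column gives points that lie close to a straight line.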
Example