Chap 2 Introduction To Statistics

The mean thread depth is 12.5 mm.

Uploaded by

Ananthanarayanan
Copyright
© © All Rights Reserved
Chap 2 Introduction to Statistics

This chapter gives an overview of statistics,
including histogram construction,
measures of central tendency, and
measures of dispersion
INTRODUCTION TO STATISTICS
 Statistics – deriving relevant information
from data
 The term has two meanings:
 Collection of data – census, GDP, football,
accidents, no. of employees (male, female,
department, etc.)
 Collection, tabulation, analysis,
interpretation, and presentation of quantitative
data – lets us draw conclusions about the sample
or population studied and make decisions on
quality
INTRODUCTION TO STATISTICS
 Use of statistics in quality deals with the
second meaning – inductive statistics
 Examples :
 What can we learn from the data?
 What conclusions can be drawn?
 What does the data tell about our process
and product performance? etc.
INTRODUCTION TO STATISTICS
 Understanding the use of statistics is vital
in business
 to make decisions based on facts
 in conducting business improvements
 in controlling and monitoring process,
products or service performance
 Application of statistics to real life
problems such as for quality
problems will result in improved
organizational performance
Collection of data
 Collect Data – direct observation or indirect
through written or verbal questions (market
research, opinion polls)
 Direct observation – measured or visually checked;
data classified as variables or attributes
 Variables data – measurable quality
characteristics
 Attributes – characteristics not measured but
classified as conforming or non-conforming
Collection of data
 Data collected with purpose
 Find out process conditions
 For improvement
 Variables – quality characteristics that are
measurable and countable
 CONTINUOUS - Dimensions, weight, height,
etc. (meter, gallon, p.s.i., etc.)
 DISCRETE - numbers that exhibit gaps,
countable, (no. of defective parts, no. of
defects/car, Whole numbers, 1, 2, 3….100)
Collection of data
 Attributes - quality characteristics that are
non-measurable, or ‘those we do not want to
measure’
 Example : surface appearance, color,
acceptable/non-acceptable, conforming/non-conforming
 Data collected in form of discrete values
 Variables (e.g. weight of sugar) CAN be classified as
attributes:
 weight within limits – no. of conforming
 weight outside limits – no. of non-conforming
Summarizing Data
 Consider this data set on number of Daily Billing errors

0 1 3 0 1 0 1 0
1 5 4 1 2 1 2 0
1 0 2 0 0 2 0 1
2 1 1 1 2 1 1
0 4 1 3 1 1 1
1 3 4 0 0 0 0
1 3 0 1 2 2 3
 Data in this form is
 Meaningless
 Not effective
 Difficult to use
 Need to summarize data in the form of:
 Graphical – Freq. Dist., Histogram, Graphs,
Charts, Diagrams
 Analytical – Measures of central tendency,
Measure of dispersion
Frequency Distribution (FD)
 Summary of how data (observations) occur
within each subdivision or groups of observed
values
 Help visualize distribution of data
 Can see how total frequency is distributed
 Two types :
 Ungrouped data – listing of observed values
 Grouped data – lump together observed values
20.5  21.5 
 
 21.5  22.5 
22.5  23.5 
FD - Ungrouped Data
1. Establish array – arrange in ascending
or descending order (as in column 1)
2. Tabulate the frequency – place
tally marking in column 2
3. Present in graphical
form – Histogram,
Relative freq. distr.

No. of errors   Tally mark    Frequency
0               ///////////   13
1               ////
2               /////
3               ////
4
5
FD – Ungrouped data
4 graphical representations
1. Frequency histogram
2. Relative freq histogram
3. Cumulative frequency histogram
4. Relative cum frequency histogram

[Figure: frequency histogram; x-axis: no. of errors (0 to 5), y-axis: frequency]

No. of errors   Freq   Relative freq   Cumulative freq   Rel cum freq
0               15     0.29            15                0.29
1               20     0.38            35                0.67
2               8      0.15            43                0.83
3               5      0.10            48                0.92
4               3      0.06            51                0.98
5               1      0.02            52                1.00
Total           52
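
As a minimal sketch (Python is our choice here, not something the slides use), the frequency table can be rebuilt directly from the 52 billing-error values listed under Summarizing Data. The variable names are illustrative only.

```python
from collections import Counter
from itertools import accumulate

# Daily billing errors, copied row by row from the data set (52 values).
errors = [
    0, 1, 3, 0, 1, 0, 1, 0,
    1, 5, 4, 1, 2, 1, 2, 0,
    1, 0, 2, 0, 0, 2, 0, 1,
    2, 1, 1, 1, 2, 1, 1,
    0, 4, 1, 3, 1, 1, 1,
    1, 3, 4, 0, 0, 0, 0,
    1, 3, 0, 1, 2, 2, 3,
]

counts = Counter(errors)                  # tally of each observed value
values = sorted(counts)                   # array in ascending order
freq = [counts[v] for v in values]        # frequency
n = sum(freq)                             # total number of observations
cum_freq = list(accumulate(freq))         # cumulative frequency
rel_freq = [f / n for f in freq]          # relative frequency
rel_cum_freq = [c / n for c in cum_freq]  # relative cumulative frequency

print("Errors  Freq  Rel    Cum  RelCum")
for v, f, rf, cf, rcf in zip(values, freq, rel_freq, cum_freq, rel_cum_freq):
    print(f"{v:>6} {f:>5} {rf:>5.2f} {cf:>5} {rcf:>6.2f}")
print(f"Total {n}")
```

Plotting any of the four columns against the number of errors gives the corresponding histogram.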
Frequency Distribution For
Grouped Data
 Data for continuous variables need grouping
Steps
1. Collect data and construct tally sheet
 Make tally - coded if necessary
 Too many data – group into cells
 Simplify presentation of distribution
 Too many cells – distort true picture
 Too few cells – too concentrated
 No of cells – judgment by analyst – trial and error
 Generally 5-20 cells
 Less than 100 data – use 5 –9 cells
 100 – 500 data – use 8 to 17 cells
 More than 500 – use 15 to 20 cells
[Figure: cell nomenclature, showing the cell interval (i), the midpoint, and the upper boundary of a cell]
2. Determine the range
 R = XH - XL
 R = range
 XH = highest value of data
 XL = lowest value of data
 Example :
 If highest number is 2.575 and lowest number
is 2.531, then
 R = XH - XL
 = 2.575 – 2.531
 = 0.044
3. Determine the cell interval
 Cell interval = distance between adjacent cell midpoints.
If possible, use odd interval values, e.g. 0.001, 0.07,
0.5, 3, so that midpoint values will have the same no. of
decimal places as the data values.
 Use Sturges' rule.
 i = R/(1 + 3.322 log10 n), where n = no. of observations
 Trial and error
 h = R/i; h = number of cells or classes
 Assume i = 0.003; h = 0.044/0.003 ≈ 15 cells
 Assume i = 0.005; h = 0.044/0.005 ≈ 9 cells
 Assume i = 0.007; h = 0.044/0.007 ≈ 6 cells
 Cell interval 0.005 with 9 cells will give best presentation
of data. Use guidelines in step 1.
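
A short sketch comparing Sturges' rule with the trial intervals. It assumes n = 110 observations, which is the total frequency in the table of step 6; the slides do not state n at this step.

```python
import math

R = 0.044  # range from step 2
n = 110    # ASSUMPTION: number of observations, taken from the step 6 table total

# Sturges' rule: suggested number of cells, then the matching interval
k = 1 + 3.322 * math.log10(n)
i_sturges = R / k
print(f"Sturges: about {k:.1f} cells, interval about {i_sturges:.4f}")

# Trial and error with odd interval values
for i in (0.003, 0.005, 0.007):
    print(f"i = {i}: h = R/i = {R / i:.1f} cells")
```

Sturges' rule suggests an interval between 0.005 and 0.007, so choosing the odd value 0.005 is consistent with it.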
4. Determine cell midpoints
 MPL = XL + i/2 (truncate; do not round up)
 = 2.531 + 0.005/2 = 2.5335, taken as 2.533
 The 1st cell holds 5 different values (as do the
other cells):
2.531 2.532 2.533 2.534 2.535
 Midpoint of the 1st cell = 2.533; midpoint of the 2nd cell = 2.538
5. Determine cell boundaries
 Limit values of cell
 lower
 upper
 To avoid ambiguity in putting data
 Boundary values have an extra decimal
place or sig. figure in accuracy than the
observed values
 Add 0.0005 to the highest value in the cell
 Subtract 0.0005 from the lowest value in the cell
6. Tabulate cell frequency
 Post amount of numbers in each cell
 Frequency distribution table

Cell boundary     Cell MP   Freq.
2.531 – 2.535     2.533     6
2.536 – 2.540     2.538     8
2.541 – 2.545     2.543     12
2.546 – 2.550     2.548     13
2.551 – 2.555     2.553     20
2.556 – 2.560     2.558     19
2.561 – 2.565     2.563     13
2.566 – 2.570     2.568     11
2.571 – 2.575     2.573     8
Total                       110
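
The cells in steps 4 to 6 can be generated rather than typed by hand. This is a sketch under two assumptions: the midpoint is taken as the centre of the lowest and highest values in each cell, and exact fractions are used to avoid floating-point drift in the third decimal place.

```python
from fractions import Fraction

x_low = Fraction("2.531")      # lowest observed value
i = Fraction("0.005")          # chosen cell interval
unit = Fraction("0.001")       # measurement resolution of the data
half_unit = Fraction("0.0005") # extra decimal place used for boundaries
h = 9                          # number of cells

cells = []
for k in range(h):
    low = x_low + k * i        # lowest value in the cell
    high = low + i - unit      # highest value in the cell
    midpoint = (low + high) / 2
    cells.append((low, high, midpoint, low - half_unit, high + half_unit))

for low, high, mp, lb, ub in cells:
    print(f"{float(low):.3f} - {float(high):.3f}  "
          f"MP {float(mp):.3f}  "
          f"boundaries {float(lb):.4f} - {float(ub):.4f}")
```

Because each boundary carries one more decimal place than the data, no observed value can fall exactly on a boundary.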
 Freq dist gives a better view of the central value and
of how the data are dispersed than the unorganized data
sheet
 Histogram – describes variation in process
 Used to
 solve problems
 determine process capability
 compare with specifications
 suggest shape of distribution
 indicate data discrepancies, e.g. gaps
Characteristics Of Frequency
Distribution
 Symmetry, Number of modes (one, two
or multiple), Peakedness of data
[Figure: distribution shapes: symmetrical, skewed right, skewed left, bi-modal, platykurtic (flatter), leptokurtic (very peaked)]
Characteristics of Frequency
Distribution
 F.D. can give sufficient info to provide basis for
decision making.
 Distributions are compared regarding:-

Location, spread, and shape


Descriptive Statistics
 Analytical methods allow comparison between
data sets
 2 main analytical methods for describing data
 Measures of central tendency
 Measures of dispersion
 Measures of central tendency of a distribution -
a numerical value that describes the central
position of data
 3 common measures
 mean
 median
 mode
Measure of Central Tendency
 Mean - most common measure used
 What is middle value? What is average
number of rejects, errors, dimension of
product?
 Mean for Ungrouped Data - unarranged
 x̄ (x bar)
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Mean
Example
A QA engineer inspects 5 pieces for tyre
tread depth (mm). What is the mean tread
depth?
x1 = 12.3  x2 = 12.5  x3 = 12.0
x4 = 13.0  x5 = 12.8
\[ \bar{x} = \frac{\sum x_i}{5} = \frac{62.6}{5} = 12.52 \approx 12.5 \text{ mm} \]
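
The same calculation as a quick check in Python (a sketch; the list name is ours):

```python
# Five depth readings in mm, from the example
depths = [12.3, 12.5, 12.0, 13.0, 12.8]

n = len(depths)
total = sum(depths)   # sum of the observed values
x_bar = total / n     # mean for ungrouped data

print(f"sum = {total:.1f}, mean = {x_bar:.2f} mm")
```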
Mean - Grouped Data
 When data already grouped in frequency
distribution
\[ \bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{\sum_{i=1}^{h} f_i} = \frac{\sum_{i=1}^{h} f_i x_i}{n} \]
n = Σfi = sum of freq.
fi = freq. in the ith cell
h = no. of cells/classes
xi = midpoint of the ith cell
Mean - Grouped Data
Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi   Cum. freq (Σfi)
1          1 – 20           10              2           20     2
2          21 – 40          30              10          300    12
3          41 – 60          50              20          1000   32
4          61 – 80          70              12          840    44
5          81 – 100         90              6           540    50
Totals                                      50          2700

\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2700}{50} = 54 \]
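
A sketch of the grouped-data mean, using the midpoints and frequencies from the table (names are illustrative):

```python
midpoints = [10, 30, 50, 70, 90]  # xi, cell midpoints from the table
freqs = [2, 10, 20, 12, 6]        # fi, cell frequencies from the table

n = sum(freqs)                                        # sum of frequencies
sum_fx = sum(f * x for f, x in zip(freqs, midpoints)) # sum of fi * xi
x_bar = sum_fx / n

print(f"n = {n}, sum(fi*xi) = {sum_fx}, mean = {x_bar}")
```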
Weighted average
Tensile tests of an aluminium alloy were conducted with a different
number of samples each time. Results are as follows:
1st test : x1 = 207 MPa  n = 5
2nd test : x2 = 203 MPa  n = 6
3rd test : x3 = 206 MPa  n = 3
\[ \bar{x}_w = \frac{\sum_{i=1}^{n} w_i \bar{x}_i}{\sum_{i=1}^{n} w_i} \]
xw = weighted avg.
wi = weight of ith average
\[ \bar{x}_w = \frac{(5)(207) + (6)(203) + (3)(206)}{5 + 6 + 3} \approx 205 \text{ MPa} \]
or use weights whose sum equals 1.00
W1 = 5/(5+6+3) = 0.36
W2 = 6/(5+6+3) = 0.43
W3 = 3/(5+6+3) = 0.21  Total = 1.00
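
Both forms of the weighted average can be checked with a few lines. This is a sketch; it takes the sample sizes as the weights, as the example does.

```python
test_means = [207, 203, 206]  # MPa, average of each test
sample_sizes = [5, 6, 3]      # number of samples in each test, used as weights

total_weight = sum(sample_sizes)

# Form 1: raw weights, divided by their sum
x_w = sum(w * x for w, x in zip(sample_sizes, test_means)) / total_weight

# Form 2: weights normalised so that they sum to 1.00
weights = [w / total_weight for w in sample_sizes]
x_w_normalised = sum(w * x for w, x in zip(weights, test_means))

print([round(w, 2) for w in weights])
print(f"weighted average = {x_w:.2f} MPa")
```

The two forms are algebraically identical, so they should agree to rounding error.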
Median – Ungrouped Data
 Median – value of data which divides total
observation into 2 equal parts
 Ungrouped data – 2 possibilities
 When total number of data (N) is a) odd or b)
even
 If N is odd, the ((N+1)/2)th value is the median
 e.g. 3 4 5 6 8: (N+1)/2 = 6/2 = 3, so the
3rd no. (5) is the median
 If N is even, the median is the average of the two middle values
 e.g. 3 5 7 9: ½ of (5+7) = 6

 NOTE: ORDER THE NUMBERS FIRST!
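
A minimal sketch of the two cases. The function name is ours, and it sorts first, as the note requires.

```python
def median_ungrouped(values):
    """Median of ungrouped data: order first, then pick the middle."""
    ordered = sorted(values)  # ORDER THE NUMBERS FIRST
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # N odd: the ((N+1)/2)th value, which is index mid (0-based)
        return ordered[mid]
    # N even: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([3, 4, 5, 6, 8]))  # odd N
print(median_ungrouped([9, 3, 7, 5]))     # even N, deliberately unordered
```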


Median – Grouped Data
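 Need to find the cell/class containing the middle value and
interpolate within that cell using
\[ Md = L_m + \left( \frac{\frac{n}{2} - cf_m}{f_m} \right) i \]
Lm = lower boundary of cell with the median
cfm = cum. freq. of all cells below Lm
fm = class/cell freq. where median occurs
i = cell interval
n = total no. of observations
Example
For the grouped data used in the mean example (n = 50), the median
falls in cell 3 (41 – 60): Lm = 40.5, cfm = 12, fm = 20, i = 20
\[ Md = 40.5 + \left( \frac{\frac{50}{2} - 12}{20} \right) 20 = 40.5 + 13 = 53.5 \]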
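
A sketch of the interpolation, applied to the grouped data from the mean example. The lower boundaries are an assumption: each is taken as the lower class limit minus 0.5, which matches the 40.5 used in the example.

```python
# ASSUMPTION: lower boundaries = lower class limits (1, 21, 41, 61, 81) minus 0.5
lower_boundaries = [0.5, 20.5, 40.5, 60.5, 80.5]
freqs = [2, 10, 20, 12, 6]  # cell frequencies from the table
i = 20                      # cell interval

n = sum(freqs)
half = n / 2
cum_below = 0  # cumulative frequency of all cells below the current one
median = None
for L_m, f_m in zip(lower_boundaries, freqs):
    if cum_below + f_m >= half:
        # The middle value lies in this cell: interpolate inside it
        median = L_m + (half - cum_below) * i / f_m
        break
    cum_below += f_m

print(f"median cell starts at {L_m}, cf below = {cum_below}, median = {median}")
```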
Measures of dispersion
 describes how the data are spread out or
scattered on each side of central value
 Both measures of central tendency and dispersion
are needed to describe data
 Example: exam results with the same average but different spread
 Class 1 – avg. : 60.0 marks
 highest : 95
 lowest : 25
 Class 2 – avg. : 60.0 marks
 highest : 100
 lowest : 15 marks
Measures of dispersion
 Main types – range, standard deviation,
and variance
 Range – difference bet. highest & lowest
value
 R = XH - XL
 Standard deviation
 Variance – standard deviation squared
 Large value shows greater variability or
spread
Standard deviation
 For Ungrouped Data
 s = sample std. dev.
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
xi = observed value
x̄ = average
n = no. of observed values
or use
\[ s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}} \]
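
As a sketch, both formulas can be applied to the five depth readings from the mean example to confirm that they give the same sample standard deviation. The data reuse is our choice; the slides give no worked example here.

```python
import math

x = [12.3, 12.5, 12.0, 13.0, 12.8]  # depth readings (mm) from the mean example
n = len(x)
x_bar = sum(x) / n

# Definition form: squared deviations from the mean, divided by n - 1
s_definition = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

# Computational form: needs only the sum of x and the sum of x squared
sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(f"s (definition) = {s_definition:.4f} mm")
print(f"s (shortcut)   = {s_shortcut:.4f} mm")
```

The shortcut form avoids computing the mean first, which is convenient for hand calculation.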
Standard deviation – grouped
data
Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi   Cum. freq (Σfi)   fixi²
1          1 – 20           10              2           20     2                 200
2          21 – 40          30              10          300    12                9,000
3          41 – 60          50              20          1000   32                50,000
4          61 – 80          70              12          840    44                58,800
5          81 – 100         90              6           540    50                48,600
Totals                                      50          2700                     166,600

\[ s = \sqrt{\frac{n \sum_{i=1}^{h} f_i x_i^2 - \left( \sum_{i=1}^{h} f_i x_i \right)^2}{n(n - 1)}} = \sqrt{\frac{50(166{,}600) - (2700)^2}{50(49)}} = \sqrt{424.49} = 20.6 \]

NOTE: DO NOT ROUND OFF fixi AND fixi². ACCURACY IS AFFECTED.
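
A sketch of the grouped-data calculation, keeping the sums unrounded until the final step as the note advises:

```python
import math

midpoints = [10, 30, 50, 70, 90]  # xi, cell midpoints from the table
freqs = [2, 10, 20, 12, 6]        # fi, cell frequencies from the table

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of fi * xi
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of fi * xi^2

variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(variance)

print(f"sum(fi*xi) = {sum_fx}, sum(fi*xi^2) = {sum_fx2}")
print(f"variance = {variance:.2f}, s = {s:.1f}")
```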
Concept Of Population and Sample
 Population – e.g. total daily prod. of steel shafts,
a year's prod. volume of calculators
 Sample – a subset taken from the population
 Compute x̄ and s – sample statistics
 True population parameters – μ and σ
 Why sample?
 not always possible to measure the whole population
 costs involved
 100% manual inspection –
accuracy/error
Concept Of Population and Sample

SAMPLE              POPN.
Statistics          Parameters
x̄ - mean            μ - mean
s - std. dev.       σ - std. dev.
Normal Distribution
 Also called Gaussian distribution
 Symmetrical, unimodal, bell-shaped dist
with mean, median, mode same value
 Popn. curve – as sample size increases and cell
interval decreases, get a smooth polygon

[Figure: normal distribution (ND) curve]
Normal Distribution
 Much of variation in nature & industry
follow N.D.
 Variation in height of humans, weight of
elephants, casting weights, size of piston
rings
 Electrical properties, material – tensile
strength, etc.
Example - ND
Characteristics of ND
 Can have different mean but same
standard deviation
 Different standard deviation but same
mean
Relationship between std
deviation and area under curve
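
The figure for this slide is not reproduced in the text. As a sketch of the relationship it names, the area under a normal curve within k standard deviations of the mean can be computed with the error function. The helper name is ours; the result is the same for any mean and standard deviation.

```python
import math


def area_within(k):
    """Area under a normal curve between mean - k*sigma and mean + k*sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} sigma: {100 * area_within(k):.2f}% of the area")
```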
Normal Distribution Example
 Need estimates of mean and standard
deviation and the Normal Table
 Example :
 From past experience a manufacturer
concludes that the burnout time of a
particular light bulb follows a normal
distribution. A sample has been tested and
the average (x̄) found to be 60 days with
a standard deviation (σ) of 20 days. What
proportion of bulbs can be expected to be still
working after 100 days?
Solution
 The problem is to find the area under the curve beyond 100 days
 Sketch the Normal distribution and shade the area needed
 Calculate the z value corresponding to the x value using the formula
 Z = (xi - μ)/σ = (100 - 60)/20 = +2.00
 Look in the Normal Table for z = +2.00 – gives area under curve as
0.9773
 But we want x > 100 or z > 2.00. Therefore Area = 1.000 – 0.9773
= 0.0227, i.e. 2.27% probability that the life of a light bulb is > 100
days

[Figure: normal curve with μ = 60 and σ = 20, area beyond x = 100 shaded]
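
The table lookup can be cross-checked numerically. This sketch computes the normal cumulative area from the error function instead of a printed Normal Table, so the last digit may differ slightly from the table value.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu = 60     # mean burnout time, days
sigma = 20  # standard deviation, days
x = 100     # time of interest, days

z = (x - mu) / sigma
area_left = normal_cdf(z)  # the value a Normal Table gives for z
tail = 1 - area_left       # area beyond x

print(f"z = {z:+.2f}, area to the left = {area_left:.4f}, area beyond = {tail:.4f}")
```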
Test For Normality
 To determine whether data is normal
 Probability Plot - plot data on normal probability
paper
 Steps
1. Order the data
2. Rank the observations
3. Calculate the plotting position
\[ PP = \frac{100(i - 0.5)}{n} \]
i = rank, n = sample size,
PP = plotting position in %
4. Label data scale
5. Plot the points on normal probability paper
6. Attempt to fit by eye ‘best line’
7. Determine normality
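
Steps 1 to 3 can be sketched as follows. The sample data are illustrative only (the five depth readings from the mean example); the slides do not say which data their own example used.

```python
def plotting_positions(data):
    """Return (value, rank, PP in %) for each observation."""
    ordered = sorted(data)  # step 1: order the data
    n = len(ordered)
    rows = []
    for rank, value in enumerate(ordered, start=1):  # step 2: rank
        pp = 100 * (rank - 0.5) / n                  # step 3: plotting position
        rows.append((value, rank, pp))
    return rows


# ASSUMPTION: illustrative data, not from the slides' own example
sample = [12.3, 12.5, 12.0, 13.0, 12.8]
for value, rank, pp in plotting_positions(sample):
    print(f"x = {value:5.1f}  rank = {rank}  PP = {pp:.0f}%")
```

Each value is then plotted against its PP on normal probability paper. Points that lie close to a straight line suggest the data are approximately normal.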
Example
