Module 2 Reminders of Statistics
PPT/L-TCIO-1
Chapter II – Reminder of statistics & Introduction to Statistical Process Control
Operational quality and Lean management - S7 EENG 4
Title of Lesson 1
ID document: e.g. PPT/L-TCIO-1
Chapter summary
1 - Basic tools of statistics (Calculation of mean value, standard deviation, quartile,
median, mode)
- Notion of population and sample
- Positional characteristics
2 - Graphical representation (Histogram, box plot, Pareto chart)
3 - Normal law (normal distribution)
- Introduction to the normal law
- The parameters of the normal law
- Construction and analysis of the histogram
4 - Use and comparison of these three indicators: Mean value, mode and median
● In theory, the measurement can be repeated indefinitely in order to obtain the true value of the mean. We then speak of the population.
● Notations :
- n : size of the sample, composed of n data
- xi : value of the i-th data of the sample
- N : size of the population
[Figure: Nuts sampling, measured dimensions: Size and Thickness]
● Mean properties:
- Easy to calculate
- Sensitive to extreme values: it is sometimes necessary to remove aberrant values.
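The sensitivity of the mean to extreme values can be illustrated with a short sketch. The nut sizes below are hypothetical values chosen for the illustration, not the course data set.

```python
from statistics import mean

# Hypothetical nut sizes in mm (assumption, not the course data)
sizes = [12.85, 12.86, 12.86, 12.87, 12.86]

# Same sample with one aberrant reading added
with_outlier = sizes + [14.90]

print(round(mean(sizes), 3))         # mean of the clean sample
print(round(mean(with_outlier), 3))  # mean pulled upward by a single aberrant value
```

A single aberrant value out of six is enough to move the mean well outside the range of the other measurements.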
The median divides the data set into 2 equal parts: 50% of the values lie on each side of the median. It is usually represented by the symbol Me.
If xi is the i-th value of a sample of size n sorted in increasing order, then the median is defined by :
- n odd : $Me = x_{(n+1)/2}$
- n even : $Me = \frac{x_{n/2} + x_{n/2+1}}{2}$
For the nuts size example (30 values, 15 values on each side of the median):
Me = (12,86 + 12,86)/2 = 12,86 mm
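As a minimal sketch, the median rule (middle value for an odd sample size, half-sum of the two middle values for an even one) could be written as follows. The function name `median` and the six sizes are assumptions made for the illustration.

```python
def median(values):
    """Middle value if n is odd, half-sum of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical nut sizes in mm (assumption, not the course data)
sizes = [12.84, 12.86, 12.80, 12.90, 12.86, 12.88]
print(median(sizes))
```

Python's standard library offers `statistics.median`, which applies the same rule.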
In descriptive statistics, the quartiles are the three values that divide the sorted data into four equal parts, so that each part represents 1/4 of the sample.
Following this logic, the second quartile is the median of the sample.
● The third quartile (Q3) is the smallest value in the series such that at least 75% of the
values are less than or equal to Q3.
For the nuts size example: 75% of 30 = 22,5, so the 3rd quartile is the 23rd value: Q3 = 12,87 mm.
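The rank rule used here (smallest value such that at least 75% of the values are less than or equal to it) can be sketched as below. The helper name `quartile` and the sample of integers 1 to 30 are assumptions for the illustration. Statistical software often interpolates between ranks instead, so results may differ slightly.

```python
import math

def quartile(values, fraction):
    """Smallest value such that at least `fraction` of the values are <= it."""
    ordered = sorted(values)
    # 1-based rank: for 30 values and fraction 0.75, ceil(22.5) = 23
    rank = math.ceil(fraction * len(ordered))
    return ordered[rank - 1]

# Hypothetical sample of 30 values (the integers 1..30), not the course data
sample = list(range(1, 31))
print(quartile(sample, 0.25), quartile(sample, 0.75))
```

With this sample the value at rank 23 is simply 23, which makes the position rule easy to check by eye.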
The mode also characterizes the position of the distribution. The mode is the value with the highest frequency. It therefore corresponds to the tallest bar of the histogram.
● Mode properties:
- The mode does not always exist, and when it does, it is not always unique.
- We will come back to the notion of mode after the definition of graphical representations and of classes.
The standard deviation of a population is estimated from a sample. The real standard deviation (σ) is calculated using the values of the complete population. S (or σn-1) is therefore the best estimate of σ from a sample of this population:
$S = \sigma_{n-1} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - X)^2}{n-1}}$
where X is the mean of the sample.
[Figure: nut size values Xi spread around the mean X, illustrating a big standard deviation]
The true value of the standard deviation, calculated on a whole population, is expressed by :
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where μ is the mean of the population.
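The difference between the estimate S (σn-1) computed on a sample and the standard deviation σ of a whole population can be checked with Python's `statistics` module. This is only a sketch: the five nut sizes are hypothetical values.

```python
from statistics import pstdev, stdev

# Hypothetical nut sizes in mm (assumption, not the course data)
sample = [12.84, 12.86, 12.85, 12.88, 12.87]

s = stdev(sample)       # divides by n - 1: estimate S (sigma n-1) from a sample
sigma = pstdev(sample)  # divides by N: only valid if the list is the whole population

print(round(s, 4), round(sigma, 4))
```

On the same data S is always slightly larger than the population formula, and the gap shrinks as the sample size grows.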
- When the distribution follows a normal distribution, the standard deviation allows us to find the percentage of the population belonging to an interval centered on the mathematical expectation.
- The standard deviation takes all the data into account: it is the best dispersion characteristic.
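For a normal distribution, the share of the population within k standard deviations of the expectation can be computed from the error function. The sketch below uses only the standard library; the function name `share_within` is an assumption for the illustration.

```python
import math

def share_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

This gives roughly 68%, 95% and 99.7% of the population for intervals of ±1, ±2 and ±3 standard deviations.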
2 – Graphical representation
A - Histogram
- Definition
- Construction
A histogram is a graph of contiguous bars (rectangles) whose areas are proportional to the frequencies.
The number of classes (rounded up to the next integer) is given by the following relationship:
Number of classes = √N
- N : number of values
The number of classes is generally limited to 20. For the nuts size example : √30 ≈ 5,5, rounded up to 6.
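A minimal sketch of this rule, assuming the square root of the number of values rounded up and capped at 20 classes. The function name `number_of_classes` is hypothetical.

```python
import math

def number_of_classes(n_values, limit=20):
    """Square root of the number of values, rounded up, capped at `limit` classes."""
    return min(math.ceil(math.sqrt(n_values)), limit)

print(number_of_classes(30))  # sqrt(30) is about 5.5, rounded up to 6
```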
The first step is to calculate the measurement range (Wt) of the sample:
Wt = largest value - smallest value
For the nuts size example :
Wt = 12,92 - 12,77 = 0,15
We can then calculate the class interval (Ht):
Ht = Wt / number of classes = 0,15 / 6 = 0,025
Caliper resolution = 0,01
Ht rounded to : 0,03
- The theoretical class interval should be rounded to a multiple of the measurement resolution of the instrument. The measurement resolution depends on the instrument used.
- If the class interval is not a multiple of the instrument's resolution, the classes will not all cover the same number of measurable values.
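The calculation of the class interval can be sketched as below. Rounding up to the next multiple of the resolution is an assumption (it matches 0,025 becoming 0,03 in the example), and `class_interval` is a hypothetical helper. `Decimal` is used to avoid floating-point surprises with values such as 0.01.

```python
from decimal import Decimal, ROUND_CEILING

def class_interval(x_max, x_min, n_classes, resolution):
    """Range divided by the number of classes, rounded up to a multiple of the resolution."""
    wt = Decimal(x_max) - Decimal(x_min)  # measurement range Wt
    ht = wt / n_classes                   # theoretical class interval Ht
    steps = (ht / Decimal(resolution)).to_integral_value(rounding=ROUND_CEILING)
    return steps * Decimal(resolution)

# Nuts size example: Wt = 0.15, 6 classes, caliper resolution 0.01
print(class_interval("12.92", "12.77", 6, "0.01"))
```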
The lower limit value is equal to the smallest value minus half the resolution.
● Example :
We have :
- Lower value = 9,3
- Measurement resolution = 0,1
Lower limit value = 9,3 - 0,05 = 9,25
For the nuts size example :
Lower value = 12,77 mm
Caliper resolution = 0,01
Lower limit value = 12,77 - 0,005 = 12,765
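Starting from the lower limit value, the class boundaries can be generated by adding the class interval repeatedly. The sketch below assumes the nuts size figures (smallest value 12,77, resolution 0,01, interval 0,03, 6 classes). The helper `class_limits` is hypothetical.

```python
from decimal import Decimal

def class_limits(lowest, resolution, interval, n_classes):
    """Class boundaries, starting half a resolution step below the smallest value."""
    start = Decimal(lowest) - Decimal(resolution) / 2
    return [start + i * Decimal(interval) for i in range(n_classes + 1)]

limits = class_limits("12.77", "0.01", "0.03", 6)
print(limits[0], limits[-1])
```

Because each boundary falls between two measurable values, no measurement can land exactly on a class limit.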
B - Box plot
● The box:
- The bottom of the box corresponds to the first quartile (Q1: 25% of the population)
and the top to the third quartile (Q3: 75% of the population).
The height of the box corresponds to the interquartile distance (Q3 - Q1 = 50% of the
population).
● The whiskers:
These are the lines that extend on either side of the box. They represent the range of the data if there are no outliers (aberrant values).
Outliers are the points outside the high and low boundary values defined by Tukey's rule. They are represented by asterisks (*).
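The elements of a box plot can be computed as in the sketch below. It assumes the usual form of Tukey's rule, with boundaries at 1.5 times the interquartile distance beyond Q1 and Q3, since the exact boundaries are not stated here. The quartiles come from `statistics.quantiles`, which interpolates and may therefore differ slightly from the rank rule used earlier. The data are hypothetical.

```python
from statistics import quantiles

def tukey_summary(values, k=1.5):
    """Quartiles, Tukey boundaries and outliers; k = 1.5 is the usual convention (assumed)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return q1, median, q3, low, high, outliers

# Hypothetical data with one aberrant value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(tukey_summary(data))
```

Here only the value 30 falls outside the boundaries, so it would be drawn as an asterisk and the upper whisker would stop at 9.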
[Figure: box plot of the nuts size example showing the 1st quartile, the median and the 3rd quartile. A big part of the measurements are at 12,86 mm.]
C – Pareto analysis
Context
Very likely, in your professional life you will have more topics to work on than your availability or your resources will allow.
The question will then be: which topics should I select to work on in order to have the best overall impact?
The purpose of Pareto analysis is to rank the different elements that are sources of trouble or problems by order of importance, and so to suggest a choice objectively.
The first step is to reproduce a first table (next slide) and to fill it with the different
elements to be analyzed (the different families of components). It will allow us to
classify these families and to include them in the Pareto recovery table.
We can now check if the distribution of the components follows the Pareto
law or not.
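Checking whether the families follow the Pareto law amounts to sorting them by decreasing importance and looking at the cumulative percentage. The sketch below uses hypothetical failure counts per family of components. The function name `pareto_table` is also an assumption.

```python
def pareto_table(counts):
    """Sort categories by decreasing count and add the cumulative percentage."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((name, count, round(100 * cumulative / total, 1)))
    return rows

# Hypothetical failure counts per family of components (not the course data)
failures = {"connectors": 5, "capacitors": 60, "resistors": 10, "relays": 20, "fuses": 5}
for row in pareto_table(failures):
    print(row)
```

In this invented data set, two families out of five account for 80% of the failures, which is the kind of pattern the Pareto law describes.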
● Class A: items in this class generally represent 80% of the total stock value and 20% of the total number of items. It is on this point that the ABC classification method is the heir of the Pareto principle;
● Class B: items generally represent 15% of the total stock value and 30% of the total number of items;
● Class C: items generally represent 5% of the total stock value and 50% of the total number of items.
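One possible way to assign the classes is to sort the items by decreasing stock value and cut the cumulative share at 80% (class A) and 95% (A + B). These cut-offs follow from the 80 / 15 / 5 split above. The function `abc_classes` and the stock values are assumptions for the illustration.

```python
def abc_classes(stock_values, a_limit=80, b_limit=95):
    """Assign class A, B or C from the cumulative share of the total stock value."""
    total = sum(stock_values.values())
    classes, cumulative = {}, 0
    for item, value in sorted(stock_values.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += value
        share = 100 * cumulative / total
        classes[item] = "A" if share <= a_limit else "B" if share <= b_limit else "C"
    return classes

# Hypothetical stock values per item (not the course data)
stock = {"P1": 500, "P2": 300, "P3": 80, "P4": 70, "P5": 20, "P6": 15, "P7": 10, "P8": 5}
print(abc_classes(stock))
```

In this invented example, 2 items out of 8 carry 80% of the value (class A), 2 carry the next 15% (class B) and the remaining 4 share the last 5% (class C).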
3 – Normal law
A – Introduction to normal law
- Distribution
https://www.facebook.com/TrustMyScience/videos/planche-de-galton/971920232964990/
Why do the marbles accumulate in the channels following the red curve?
[Figure: Galton board. The marbles form a bell-shaped curve, also called the Gauss curve.]
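The Galton board can be simulated to see why the marbles pile up in a bell shape: each marble makes a series of independent left or right bounces, and middle channels can be reached by many more paths than the outer ones. The sketch below is a simplified model with assumed parameters (10 000 marbles, 12 rows of pegs, fixed random seed), not a description of the board in the video.

```python
import random
from collections import Counter

def galton_board(n_marbles, n_rows, seed=0):
    """Each marble bounces left (0) or right (1) at every row; its channel is the number of rights."""
    rng = random.Random(seed)
    bins = Counter(sum(rng.randint(0, 1) for _ in range(n_rows)) for _ in range(n_marbles))
    return [bins.get(k, 0) for k in range(n_rows + 1)]

counts = galton_board(n_marbles=10_000, n_rows=12)
for k, c in enumerate(counts):
    print(f"{k:2d} {'#' * (c // 50)}")  # one '#' per 50 marbles
```

The printed bars give a rough text histogram in which the central channels are the fullest.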
Example:
If we draw the histogram of the heights of a group of 50 men, the distribution will not be uniform. Many men are around 1.75 m tall, and few are 1.95 m tall. Similarly, men who are 1.55 m tall are quite rare. The height distribution will therefore follow a normal distribution.
The average position of the pieces gives a good indication of the machine's
setting position.
The width of the curve (dispersion) will give a good indication of the machine's
ability to produce pieces within a given tolerance interval.
B – Normal law parameters
- Range
- Standard deviation
- The range is easy to calculate, but very imprecise when the number of
values is high (>10). Indeed, the range only takes into account the 2 extreme
values. An outlier therefore has a lot of influence on the range.
It is important to distinguish between the range and the dispersion.
C – Histogram construction
- Example of packages
Histogram construction
This part will be based on an example: parcels.
In order to establish new scales by delivered weight, a transport company has recorded the weights in kg of 92 shipments checked with a digital scale.
The table of the different weights (the document for the example) is on Moodle.
- Step 1: Calculation of the range
In each column the maximum and minimum values have been marked. The
sample size is n = 92 parcels.
- Step 2: Number of classes
Number of classes :
On the range, all the values are distributed in intervals of constant width called classes. The distribution by classes is more meaningful than the one that would consist in marking each value with a cross.
- Step 3: Class width
The class width is given by the range divided by the number of classes.
The center of each class should be a simple number that makes the calculations easier. The exact ends of the classes matter little, since they are hardly used in the calculation.
To simplify: Ht = 5 (from -2.5 to +2.5 around the class center)
The class centers are: 35, 40, 45, 50 ...
The calculation is left to you.
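With Ht = 5 and class centers at multiples of 5, each weight can be assigned to its nearest class center and the class sizes counted. The sketch below uses a few hypothetical weights, since the real table of 92 weights is not reproduced here. The helper `count_by_class` is an assumption and expects positive values.

```python
from collections import Counter

def count_by_class(values, width=5):
    """Assign each positive value to the nearest class center (a multiple of `width`) and count."""
    centers = [width * int(v / width + 0.5) for v in values]
    return dict(sorted(Counter(centers).items()))

# Hypothetical parcel weights in kg (not the course data)
weights = [33.0, 36.4, 38.9, 41.2, 42.4, 44.0, 47.6, 52.3]
print(count_by_class(weights))
```

The resulting class sizes are the heights of the histogram bars in Step 4.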
- Step 4: Construction of the histogram
3 – Normal law
-Comparison parameters
- The arithmetic mean: it is the ratio of the sum of the values to the number of individuals considered: $X = \frac{1}{n} \sum_{i=1}^{n} x_i$
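- The weighted mean: it is calculated in the same way as before, but is used when the values are grouped by class, each value being taken as equal to the center of its class: $X = \frac{1}{N} \sum_i n_i x_i$, where ni is the class size, xi the class center value and N the total.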
● Median :
This is the value that divides the sample into two equal parts. If the sample size is even, there are two values in the middle, and the median is the half-sum of the two values.
● Variance :
$V = \frac{1}{N} \sum_i n_i (x_i - X)^2$
With the following parameters :
- ni : class size
- xi : class center value
- X : arithmetic mean
- N : total
The variance is an index that characterizes the distribution (in area) of the population, i.e. how this population is spread in relation to the mean.
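With the parameters listed above (class size ni, class center xi, arithmetic mean X, total N), the mean and variance of grouped data can be sketched as follows. The division by N and the classes used are assumptions for the illustration. The standard deviation is the square root of the variance.

```python
def grouped_mean_variance(classes):
    """Mean and variance of grouped data given as (class center xi, class size ni) pairs."""
    total = sum(n for _, n in classes)                  # N
    mean = sum(n * x for x, n in classes) / total       # X
    variance = sum(n * (x - mean) ** 2 for x, n in classes) / total
    return mean, variance

# Hypothetical classes: (center in kg, number of parcels), not the course data
classes = [(35, 2), (40, 3), (45, 1), (50, 2)]
mean, variance = grouped_mean_variance(classes)
print(mean, variance, variance ** 0.5)
```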
A - Mean
- Advantages
- Disadvantages
B - Mode
- Advantages
- Disadvantages
C - Median
- Advantages
- Disadvantages
• Advantages:
- Concrete and easy to understand meaning
- Simple to define
• Disadvantages:
- Does not take individual values into account at all
D – Normal law
The distribution is symmetrical with respect to the central value, hence : Mean = Median = Mode
Documents Management
Title: Quality 1 – Chapter 2 – Reminders and introduction to statistical process control
ID No.:
Root:
Description:
1.0 30/08/2021 Sébastien Lasserre Initial document
2.0 29/01/23 Laurent Sestier Added several examples and Galton board experiment