Distribution Analyzer User Guide
[Cover figure: two example analyses, each showing a histogram with summary statistics, capability indexes and confidence statements. The first is Break Force (pounds), fit with the Pearson Family. The second is Torque, fit with No Transformation (Normal Distribution).]
Version 1.2
Dr. Taylor retired from his position as Director of Quality Technologies at Baxter Healthcare
Corporation where he was responsible for implementing Baxter's Six Sigma program. He
had been with Baxter for 22 years.
Dr. Taylor is author of the books Optimization and Variation Reduction in Quality and Guide
to Acceptance Sampling. His two courses Successful Acceptance Sampling and Robust
Tolerance Analysis have been attended by thousands of engineers and scientists.
Dr. Taylor is a leading expert on acceptance sampling and process validation in the
pharmaceutical, medical device and diagnostics industries. His articles on selecting
statistically valid sampling plans have become standards in the industry. He is one of the
authors of the Global Harmonization Task Force guideline on Process Validation.
The VarTran software and Dr. Taylor's course Robust Tolerance Analysis have rapidly
become a key component of many companies' DFSS (Design for Six Sigma) programs.
Honeywell, the Six Sigma Academy and numerous other companies have adopted VarTran
as an essential tool for designing high quality products.
Dr. Taylor received his Ph.D. in Statistics from Purdue University. He is a fellow of the
American Society for Quality.
Table of Contents
Chapter 1 Program Installation
1.1 System Requirements
1.2 Installing Distribution Analyzer
1.3 Registering the Software
1.4 Uninstalling Distribution Analyzer
Chapter 2 Getting Started
2.1 Uses and Capabilities
2.2 Testing for Normality
2.3 Transforming Data
Chapter 3 Program Details
3.1 Main Window
3.2 Data Window
3.3 Test Distribution Window
3.4 Transforming Data
3.5 Skewness Kurtosis Plot Window
3.6 Generating Random Values
Chapter 4 Menus and Toolbar
4.1 File Menu
4.2 Edit Menu
4.3 Analysis Menu
4.4 Window Menu
4.5 Help Menu
4.6 Toolbar
Chapter 5 Distributions
5.1 What is a Distribution?
5.2 Beta Distribution
5.3 Exponential Distribution
5.4 Extreme Value, Largest Family (Fréchet)
5.5 Extreme Value, Smallest Family (Weibull)
5.6 Gamma Distribution
5.7 Johnson Family
5.8 Loglogistic Family
5.9 Lognormal Family
5.10 Normal Distribution
5.11 Pearson Family
5.12 Uniform Distribution
Glossary
References
Index
1 Program Installation
This section describes the system resources required by Distribution Analyzer and how to
install Distribution Analyzer on your computer.
A shortcut for starting the program is added to the desktop. Double clicking this shortcut
starts the program.
In addition, four menu items are added to the All Programs menu. Click the Start button
in the lower left corner of the screen and then click the All Programs menu item. Find the
Distribution Analyzer folder and click it to display the four menu items.
The first menu item starts the program. The second menu item displays program
documentation in pdf format. You must have a copy of the free Adobe Acrobat Reader® to
view this file. The third menu item displays the Distribution Analyzer home page. The final
menu item uninstalls the program.
1.3 Registering the Software
Distribution Analyzer can be used free of charge for 30 days. After the 30-day trial period,
you must register the software to continue to use it. To register the software, go to
www.variation.com/da and click the Register button. Up-to-date pricing information will be
provided along with further instructions. The registration process can be completed online.
The Help menu of Distribution Analyzer also contains further details about registering the
software.
Once the software is registered, you will receive a user name and registration code by email.
These must be entered into the software. The user name and registration code can be
manually entered into Distribution Analyzer by selecting the Enter Registration Code menu
item on the Help menu. This displays the Registration dialog box shown below. The user
name and registration code must both be entered exactly as provided (case-sensitive). Once
done, click the OK button.
1.4 Uninstalling Distribution Analyzer
Uninstalling removes the program, user manual and example files. It will not affect any files
you created using the software. One way to uninstall the program is the menu item described
in Section 1.2.
2 Getting Started
Distribution Analyzer is used to test whether a set of data fits the normal distribution and, if
not, to determine which distribution best fits the data. Associated with each distribution is a
transformation that, when applied to the data, will convert data from that distribution to the
normal distribution. Once the data is transformed to the normal distribution, Distribution
Analyzer constructs confidence statements like the following:
• With 95% confidence more than 99% of the values are between 17.8 and 23.2 pounds
(normal tolerance interval)
• With 95% confidence more than 99.34% of the values are within the specification
limits (variables sampling plan)
2. Test and Fit Distributions: Distribution Analyzer has the ability to fit a wide range
of distributions using both the method of moments and maximum likelihood methods.
Both methods have been modified to ensure the distribution covers a specified range.
Distribution Analyzer has simplified the whole process of comparing and fitting
distributions by characterizing all distributions in terms of their moments (average,
standard deviation, skewness and kurtosis) rather than using a different set of Greek
letters for each distribution.
3. Learning Tool: Distribution Analyzer can be used to learn about the many
distributions and their relationships. Included is a skewness-kurtosis plot for
understanding the range of shapes each distribution can fit and the relationships
between the different distributions. For each distribution you can view density plots,
calculate probabilities and explore the effect of changing the parameters. Finally, you
can generate random values for any of the distributions as well as for dice
experiments to experience handling different types of data.
1. Robust and Specific Tests for Normality: While Distribution Analyzer contains the
traditional Anderson-Darling and Shapiro-Wilks tests, it also contains two equally
powerful tests designed to overcome shortcomings of these two tests. The first is the
Skewness-Kurtosis All test that, like the previous two tests, tests for all departures
from normality. However, this test is not adversely affected by ties in the data, as are
the other two tests. The second is the Skewness-Kurtosis Specific test. This test is
designed to detect those departures from normality that invalidate the use of a normal
tolerance interval or variables sampling plan. Specifically, it is designed to detect
tails that are heavier than those of the normal distribution. It does not reject when the
tails are equal to or lighter than those of the normal distribution, so the confidence
statements remain valid, although potentially conservative.
• Beta Distribution
• Exponential Distribution
• Extreme Value Distributions (Smallest Extreme Value, Largest Extreme
Value, Weibull, Fréchet)
• Gamma Distribution
• Johnson Family of Distributions
• Logistic and Loglogistic Distributions
• Lognormal Distribution
• Normal Distribution
• Pearson Family of Distributions (Includes Inverse Beta and Inverse Gamma)
• Uniform
Not only are the above distributions covered, but the negative of these distributions
are included to facilitate the fitting of data with negative skewness. Further, if there
are physical bounds (like zero), these bounds can be used to pre-transform the data to
the unbounded case. This effectively expands the above list of distributions to
include the Log-Beta, Log-Pearson and much more. Distribution Analyzer automates
the whole process by letting you click a button "Fit Best Distribution". However, you
also have the ability to completely control the selection process including the
distribution fit, the method of fitting and whether to pre-transform.
3. Integrated Supporting Analysis: Data can fail a normality test for a variety of
other reasons including a shift over time, a mixture of different groups and the
presence of outliers. Whenever testing that a distribution fits the data, the data is
automatically checked for outliers, shifts over time and differences between groups,
and the user is notified if anything of importance is detected.
4. Detailed Information about Each Distribution: For each distribution you can view
density plots, calculate probabilities and explore the effect of changing the
parameters. You can view the distribution on a skewness-kurtosis plot for
understanding the range of shapes each distribution can fit and its relationships with
other distributions.
5. Generating Data: Random numbers from any of the distributions can be generated.
You can also use Distribution Analyzer to perform simulated dice experiments to
illustrate the types of physical phenomena that create different distributions.
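The dice experiments mentioned above can be imitated outside the program. The following is only a sketch in Python, not the program's own routine: the function names, the choice of five dice and the use of sums and products are all assumptions made for illustration. Adding dice tends to give a symmetric, bell-shaped result, while multiplying them tends to give a right-skewed result.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def dice_experiment(num_trials, num_dice, combine, seed=0):
    """Roll num_dice six-sided dice num_trials times and combine each roll.

    combine is any function of a list of die faces, e.g. sum or math.prod.
    The seed only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_trials):
        faces = [rng.randint(1, 6) for _ in range(num_dice)]
        results.append(combine(faces))
    return results


if __name__ == "__main__":
    sums = dice_experiment(5000, 5, sum)
    products = dice_experiment(5000, 5, math.prod)
    print("skewness of sums:    ", round(skewness(sums), 2))
    print("skewness of products:", round(skewness(products), 2))
```

Sums model effects that add together, while products model effects that multiply, which is one common explanation for right-skewed data.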
The remainder of Chapter 2 highlights the process of testing for normality and transforming
data to allow one to quickly get started. The remaining chapters then provide more complete
details.
2.2 Testing for Normality
Take as an example the torque values shown below. There is a lower spec limit of 17 pounds
and upper spec limit of 37 pounds:
Start by entering the data in the first column of the Data window. The column labeled
"Trans. Value" automatically displays the original values because no transformation has been
selected.
The Characteristic, Units, I.D. and Date fields are optional. In this case “Torque” was
entered in the Characteristic edit box and “lbs.” was entered in the Units edit box. Also
enter the upper and lower spec limits.
Finally enter any physical bounds. In this case negative values are impossible, so there is a
physical lower bound of zero. For yield data, reported as a percent, there is a lower bound of
zero percent and upper bound of 100 percent.
Click the Test Distribution button to test whether the data fits the
normal distribution and, if so, to perform additional analysis. The Test Distribution window
appears as shown below:
A histogram of the data is shown along with estimates of the average, standard deviation,
skewness and excess kurtosis. As no transformation was selected, the best fit normal curve is
shown in blue.
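For readers who want to check the four summary statistics by hand, the sketch below computes them in Python. This guide does not state which estimators the program uses, so the formulas here are an assumption: skewness and excess kurtosis are taken from the plain central moments, and the standard deviation uses the usual n - 1 divisor. Results may differ slightly from the program's if it applies small-sample corrections.

```python
import math


def sample_moments(values):
    """Return average, standard deviation, skewness and excess kurtosis.

    Assumed estimators: central moments divided by n for the shape
    statistics, and the n - 1 divisor for the standard deviation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "average": mean,
        "std_dev": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis is kurtosis minus 3, so the normal distribution is 0.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

A skewness near zero indicates symmetry. A positive excess kurtosis indicates heavier tails than the normal distribution and a negative value indicates lighter tails.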
Also shown are the results of two tests for normality. The first is a general test for all
departures from normality called the Skewness-Kurtosis All test (SK All). This test has a
p-value of 0.0577. The p-value is the probability of obtaining data at least as extreme as
that observed if the data came from the normal distribution. The smaller the p-value, the
more evidence there is that the data does not come from the normal distribution.
The rule is that one rejects the normal distribution if the p-value is 0.05 or below. This
corresponds to data with a 1 in 20 chance or below of being generated by the normal
distribution. If the normality test fails, one can state: “With 95% confidence the data is not
from the normal distribution.” If one passes the normality test, one can state: “No significant
departure from normality was detected.” The p-value of 0.0577 passes, although barely.
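The decision rule just described is simple enough to write down as code. The function below is a hypothetical helper written for this illustration, not part of Distribution Analyzer. It only encodes the rule as stated: reject at a p-value of 0.05 or below, pass otherwise.

```python
def normality_decision(p_value, alpha=0.05):
    """Apply the rule: reject (Fail) when p_value <= alpha, otherwise Pass."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "Pass" if p_value > alpha else "Fail"


if __name__ == "__main__":
    # The torque example has a p-value of 0.0577, which passes, although barely.
    print(normality_decision(0.0577))
```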
The second test for normality is the Skewness-Kurtosis Specific test (SK Spec). This test is
designed to only reject for those departures from the normality that invalidate the confidence
statements associated with normal tolerance intervals and variables sampling plans. Passing
this test indicates it is OK to use these two procedures even if the other normality test fails.
There is no associated p-value for this test. Just a pass/fail result is displayed. However, the
interpretation of this decision is the same as before and the same confidence statements can
be made. This test also passes. As a result, it is OK to proceed with normal tolerance
intervals and variables sampling plans.
Distribution Analyzer also can perform the Anderson-Darling and Shapiro-Wilks tests. To
see the results of these tests, click the Menu button (or right mouse click) and a popup
menu will appear. Then select the desired test.
The default general test for all departures from normality is the Skewness-Kurtosis All test.
This can be changed to either of the other two tests using the Options menu item on the
Analysis menu. The Skewness-Kurtosis Specific test is always performed.
At the bottom are the normal tolerance interval, "With 95% confidence more than 99% of the
values are between 14.89 and 37.04.", and the variables sampling plan, "With 95%
confidence more than 97.9349% of the values are in spec." These statements also depend on
the data fitting the normal distribution. The confidence level and percent in the interval can
be adjusted using the Tolerance Interval Options menu item on the above popup menu. The
normal tolerance interval and variables sampling plan are closely related. The normal
tolerance interval gives an interval containing a specified percentage of values. The variables
sampling plan gives the percentage of values in a specified interval (namely the
specifications). The variables sampling plan generally is more useful for
validation/verification/qualification studies.
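To make the normal tolerance interval more concrete, the sketch below computes an approximate two sided interval of the form average ± k × standard deviation. It is not the program's algorithm, which this guide does not describe. It assumes Howe's approximation for the factor k and the Wilson-Hilferty approximation for the chi-square quantile, chosen so that only the Python standard library is needed. Using the rounded torque statistics (sample size 30, average 26.0, standard deviation 3.3), it should land close to, but not exactly on, the interval of 14.89 to 37.04 reported above.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def two_sided_k(n, confidence=0.95, coverage=0.99):
    """Approximate two sided tolerance factor k (Howe's approximation)."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile(1.0 - confidence, df)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2_low)


def tolerance_interval(average, std_dev, n, confidence=0.95, coverage=0.99):
    """Return (lower, upper) limits of the approximate tolerance interval."""
    k = two_sided_k(n, confidence, coverage)
    return average - k * std_dev, average + k * std_dev


if __name__ == "__main__":
    low, high = tolerance_interval(26.0, 3.3, 30)
    print(f"approximate interval: {low:.2f} to {high:.2f}")
```

The factor k is larger than the normal quantile of about 2.58 because it must also allow for uncertainty in the estimated average and standard deviation. That allowance shrinks as the sample size grows.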
By default, the estimated percentage in spec and the two confidence statements are only
displayed if one of the two normality tests passes. This can be changed using the Options
menu item on the Analysis menu.
2.3 Transforming Data
Take as a second example the removal force values shown below. There is a lower spec limit
of 4 pounds and upper spec limit of 15 pounds.
Start by entering the data in the first column of the Data window. The column labeled
"Trans. Value" automatically displays the original values because no transformation has been
selected.
“Removal Force” was entered in the Characteristic edit box and “lbs.” was entered in the
Units edit box. The upper and lower specification limits were also entered. Since negative
values are impossible, a lower bound of zero was entered.
Click the Test Distribution button to test whether the data fits the
normal distribution and, if so, to perform additional analysis. The Test Distribution window
appears as shown below:
What is different this time is that both normality tests fail. As a result neither the estimated
percentage in spec nor the two confidence statements are displayed. This requires that the
data be transformed. Transforming the data means to apply a function like the log to the
values so that the transformed values fit the normal distribution. The same transformation is
applied to the specifications limits. The percentage in spec and the two confidence
statements can then be correctly calculated using the transformed values.
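As a rough illustration of this idea, the sketch below applies a log transformation to both the values and the spec limits, fits a normal distribution to the transformed values, and estimates the percentage in spec. The log is used only because the text names it as an example; the program selects the transformation from the fitted distribution. The data values are made up for the illustration and are not the removal force data. Only the spec limits of 4 and 15 come from this example.

```python
import math
from statistics import NormalDist, fmean, stdev


def percent_in_spec_after_log(values, lsl, usl):
    """Estimate percent in spec using log-transformed values and limits."""
    t_values = [math.log(v) for v in values]
    # The same transformation is applied to the spec limits.
    t_lsl, t_usl = math.log(lsl), math.log(usl)
    fitted = NormalDist(fmean(t_values), stdev(t_values))
    return 100.0 * (fitted.cdf(t_usl) - fitted.cdf(t_lsl))


if __name__ == "__main__":
    # Made-up values for illustration only.
    example_values = [6.1, 7.4, 5.8, 9.2, 8.0, 6.7, 11.5, 7.1, 6.3, 8.8]
    print(round(percent_in_spec_after_log(example_values, 4.0, 15.0), 2))
```

This gives only the point estimate of the percentage in spec. The confidence statements additionally account for the sample size.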
Identifying the best transformation is done by identifying the distribution that best fits the
data. For every distribution there is a transformation that will make data from that
distribution fit the normal distribution.
deviation, skewness and kurtosis. The transformed values are displayed in the “Trans.
Values” column.
Transforming data has the potential for being abused. There may be several distributions that
fit the data. The decision of which one to use should be based solely on which one results in
transformed values best fitting the normal distribution. Distribution Analyzer returns the
distribution maximizing the p-value of the Skewness-Kurtosis All test for normality.
Next click the Test Distribution button again to see if the Johnson distribution fits the data.
The results are shown below. This time both tests pass. The Johnson distribution fits the
data and can be used to analyze and transform the data. The Johnson distribution fit to the
data is shown in blue. As a result, further analysis is performed and shown at the bottom.
The resulting normal tolerance interval is: "With 95% confidence more than 99% of the values
are between 5.143 and 15.610." The resulting variables sampling plan gives: "With 95%
confidence more than 99.3411% of the values are in spec."
To further understand the transformation and how it works, the second tab of the Test
Distribution window shows the transformation:
A histogram of the transformed values is shown along with the best fit normal curve. The
transformed values themselves are shown in the last column of the Data window. The same
transformation is applied to the spec limits as well. Distribution Analyzer uses special
routines for fitting the different distributions to ensure that the spec limits are safely within
the range of the distribution and can be transformed. This allows Distribution Analyzer to
find transformations for nearly every set of data.
The Ppk, Pp, percent in spec, normal tolerance interval and confidence statement relative to
the spec limits are all calculated using the transformed values. These same results were
displayed on the previous page except the tolerance interval (-3.528, 3.528) is transformed
back into the original units of measure (5.143, 15.610) by applying the inverse of the
transformation equation. The above analysis predicts that more values will fall above the
upper spec limit than below the lower spec limit.
• Test for shifts over time if the order the data points were collected is indicated in the
Order column of the Data window.
• Test for differences between groups like cavities, nozzles and operators if the groups
are indicated in the Group column of the Data window.
• Identify potential outliers.
• Generate random values.
• Understand relationships between distributions using a skewness-kurtosis plot.
3 Program Details
This section describes more completely how to enter information into Distribution Analyzer
and how to use the software to perform the different analyses.
Caption Bar: The caption bar contains the program name and the name of any associated
file. When the program is not maximized to cover the entire screen, dragging the caption bar
moves the window. Double clicking on the caption bar maximizes/restores the window. On
the left of the caption bar is the system menu button. Clicking this button displays the
system menu containing items to move, size, and close the window. The program can also
be closed by double clicking on this button. On the right of the caption bar are the minimize
button, the maximize/restore button, and the close button.
Menu Bar: The menu bar provides a list of drop-down menus containing menu items. These
menu items serve as the primary means of telling the program what to do. Menu items exist
for adding/editing data, performing analysis, printing the results and much more. Chapter 4
gives a complete description of all the menus.
Toolbar: The toolbar contains buttons serving as shortcuts for the most commonly used
menu items. To see what a particular button does, hold the mouse cursor over the top of the
button. A description of the button will appear.
Interior: The interior initially contains the Data child window. Later we will encounter
other child windows that can also be displayed in the interior. If a child window extends
outside the interior of the main window, scroll bars are provided for shifting the child
windows up/down and left/right.
A sizing border surrounds the window except when the main window is maximized.
Dragging the border causes the window to be resized.
Order Column: Optionally, enter the time order the values were collected in. This may
be either numbers or labels. The data does not have to appear in time order as the values
entered in the Order column are sorted into numeric or alphanumeric order. If several
values are taken at the same time, use the same number or label for all values. The Order
column can be automatically filled by clicking the Fill Order/Group Column button to
display the Fill Order/Group Column dialog box. If the order is specified, additional
analysis is performed to see if the values shifted over time and the results are displayed in
the Order - Analysis tab of the Test Distribution window.
Group Column: Optionally, enter the group (cavity, operator, line, lot, etc.) associated
with each value. This may be either numbers or labels. The Group column can be
automatically filled by clicking the Fill Order/Group Column button to display the Fill
Order/Group Column dialog box. If the groups are specified, additional analysis is
performed to see if the groups are different and the results are displayed in the Group -
Analysis tab of the Test Distribution window.
Transformed Values Column: You cannot type into this column. Instead it is used to
display the transformed values. If no distribution/transformation is selected, the values
are identical to the first column. Distributions/transformations are selected by clicking
the Find Best Distribution or Select Distribution buttons.
Either type the data in directly or paste it from the clipboard using the Paste menu item. The
Edit menu has further menu items for selecting, copying, cutting and clearing cells as well as
for adding, moving, deleting and renaming tab sheets.
Click the Fill Order/Group Column button to display the Fill Order/Group Column dialog
box. This dialog box is a time saving feature designed to save having to type values into the
Order or Group columns of the currently selected tab in the Data window. It can only be
used when the data is in a patterned order.
Column to Fill Radio Group Box: Select whether to fill Order column or Group
column.
Use Numbers 1 to Combo Box: Enter the number of subgroups when filling the Order
column or the number of groups when filling the Group column. It must be an integer of
2 or more.
List Each Value ? Times Combo Box: Repeat each value the specified number of times
before going to the next number. It must be an integer of 1 or more.
Repeat Whole Sequence ? Times Combo Box: Repeat the whole sequence specified by
the previous 2 controls the specified number of times. It must be an integer of 1 or more.
Click the Fill Column button to generate the values for the selected column in the currently
selected tab of the Data window. If the tab already contains data, you are prompted to make
sure it is OK to overwrite the data. To exit without generating any values, click the Cancel
button or press the Esc key.
In the example above, the following values are generated in the Order column:
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3
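The pattern produced by the three controls can be sketched in a few lines of Python. The function name and parameter names below are hypothetical and simply mirror the dialog box controls. The settings used (numbers 1 to 3, each value listed 5 times, whole sequence repeated 2 times) are inferred from the values listed above.

```python
def fill_pattern(max_number, times_each, repeat_sequence):
    """Generate the values for a patterned Order or Group column.

    max_number:      "Use Numbers 1 to ?" (integer of 2 or more)
    times_each:      "List Each Value ? Times" (integer of 1 or more)
    repeat_sequence: "Repeat Whole Sequence ? Times" (integer of 1 or more)
    """
    if max_number < 2 or times_each < 1 or repeat_sequence < 1:
        raise ValueError("settings are outside the allowed ranges")
    one_pass = [
        number
        for number in range(1, max_number + 1)
        for _ in range(times_each)
    ]
    return one_pass * repeat_sequence


if __name__ == "__main__":
    print(fill_pattern(3, 5, 2))
```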
Each tab sheet also contains the following controls for entering additional information:
Characteristic Edit Box: Optionally, use to enter the name of the characteristic. If no
characteristic is entered, the name of the sheet is used.
Units Edit Box: Optionally, use to enter the units for the characteristic.
I.D. Edit Box: Optionally, use to further differentiate data sets with product name, lot
number, lane, cavity etc.
Date Edit Box: Optionally, use to enter the date or any other further information.
Lower Spec Limit Edit Box: Used to enter the lower spec limit, if one exists.
Upper Spec Limit Edit Box: Used to enter the upper spec limit, if one exists.
Lower Bound Edit Box: Used to enter the lower physical bound, if one exists. A
physical bound represents a barrier below which it is impossible for values to occur.
Frequently zero is the lower physical bound when negative values are impossible. This
bound can be used to pre-transform the data.
Upper Bound Edit Box: Used to enter the upper physical bound, if one exists. This
bound can be used to pre-transform the data.
If there are more than 3 sets of data, the Edit menu can be used to add additional sheets. It
also contains menu items for deleting, renaming and reordering the sheets.
Once the data is entered, it is time to start the analysis using the following buttons:
Test Distribution Button: Displays the Test Distribution window where the data is
analyzed to see if the selected distribution fits the data (Section 3.3). The selected
distribution is displayed in the Selected Distribution group box. It is initially set to "No
Transformation (Normal Distribution)", in which case the data is tested to see if it fits the
normal distribution. Additional analyses are also performed including: estimates of
moments, capability indexes, tolerance interval, confidence statement relative to spec
limits, test for an order effect, test for differences between groups and check for outliers.
Find Best Distribution Button: Used to determine which of the available distributions
best fits the data with or without pre-transforming the data (Section 3.4). The result is
displayed in the Selected Distribution group box. The distribution producing the highest
p-value for the Skewness-Kurtosis All Test for normality when applied to the
transformed data is selected, excluding those not including the spec limits within the
range of the distribution.
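The selection rule described for the Find Best Distribution button can be sketched as follows. This is an illustration of the rule only, not the program's code. The Candidate class, its field names and the strict comparison used to decide whether a spec limit lies within a distribution's range are all assumptions. Each p-value is assumed to have been computed already by applying the Skewness-Kurtosis All test to the transformed data.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A fitted distribution (hypothetical structure for this sketch)."""
    name: str
    p_value: float   # SK All p-value for the transformed data
    lower: float     # lower end of the fitted distribution's range
    upper: float     # upper end of the fitted distribution's range


def covers_specs(candidate, lsl=None, usl=None):
    """True if every spec limit provided lies inside the candidate's range."""
    lower_ok = lsl is None or candidate.lower < lsl
    upper_ok = usl is None or usl < candidate.upper
    return lower_ok and upper_ok


def find_best(candidates, lsl=None, usl=None):
    """Return the eligible candidate with the highest p-value, or None."""
    eligible = [c for c in candidates if covers_specs(c, lsl, usl)]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.p_value)


if __name__ == "__main__":
    # Made-up candidates for illustration only.
    fits = [
        Candidate("A", 0.80, 5.0, math.inf),
        Candidate("B", 0.60, 0.0, math.inf),
        Candidate("C", 0.30, -math.inf, math.inf),
    ]
    # Candidate A has the highest p-value but cannot represent a lower
    # spec limit of 4, so B is chosen.
    print(find_best(fits, lsl=4.0, usl=15.0).name)
```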
Finally, Section 3.6 describes how to use Distribution Analyzer to generate random sets of
data from the different distributions.
3.3 Test Distribution Window
Clicking the Test Distribution button on the Data window analyzes the data on the current
sheet and displays the results in a Test Distribution window. If a distribution is selected in
the Selected Distribution group box, the associated transformation is first applied to the data.
The results appear on the 7 tabs shown below:
Tab 1: Histogram
Tab 2: Transformed Data
Tab 3: Order - Plot
Tab 4: Order - Analysis
Tab 5: Group - Plot
Tab 6: Group - Analysis
Tab 7: Outliers
Each tab has options that can be set. By performing a right mouse click over the graphic or
clicking the Menu button, a popup menu will appear to set these options. Click the Print
button to print the current tab and click the Copy button to copy the current tab to the
clipboard.
Tab 1: Histogram
Tab 1 of the Test Distribution window displays a histogram of the original data along with
the density of the selected distribution. In the graphic above, the Pearson distribution was
selected. The parameters in parentheses are the parameters of the distribution (average,
standard deviation, skewness and kurtosis). The normal distribution is shown if no
distribution was selected or fit to the data. Also shown are:
Moments: The sample size, average, standard deviation, skewness and kurtosis (or
excess kurtosis) of the data.
Test of Fit: Two normality tests are performed. When no transformation is selected,
these tests are applied directly to data values and test for normality. When a
transformation/distribution is selected, these tests are applied to the transformed values
and test for whether the selected distribution fits. The first test is a general test for all
departures from normality. The Skewness-Kurtosis All test (SK All) is the default test
but the Anderson-Darling test (AD) and Shapiro-Wilks test (SW) are also available.
They can be selected using the Analysis Options dialog box or the popup menu described
below. The p-value and decision are given. The test passes if the p-value > 0.05. The
second test is the Skewness-Kurtosis Specific test (SK Spec). This test is designed to
only reject for those departures from normality that invalidate the tolerance interval and
confidence statement relative to the spec limits. Passing this test indicates the statements
are valid. Only the pass/fail decision is given. If the normality test fails, one can state:
“With 95% confidence the data is not from the normal distribution.” If it passes, one can
state: “No significant departure from normality was detected.”
Capability Indexes: Pp, Ppk and the estimated defect rate are shown. The estimated
defect rate may be in terms of percent in spec, percent out of spec or reliability,
depending on the option selected in the Analysis Options dialog box and/or the Tolerance
Interval Options dialog box. The estimated defect rate assumes the selected distribution
fits the data. By default, it is only shown if the selected distribution fits the data. This
can be altered using the Analysis Options dialog box and/or the Tolerance Interval
Options dialog box. When a transformation/distribution is selected, the capability
indexes and defect rate are calculated using the transformed values and spec limits.
Tolerance Interval and Confidence Statement Relative To Spec Limits: These two
statements assume the selected distribution fits the data. By default, they are only shown
if the selected distribution fits the data. This can be altered along with the confidence
level, etc. using the Analysis Options dialog box and/or the Tolerance Interval Options
dialog box. If the general test for normality passes, the statements are accurate. If the
general test fails but the Skewness-Kurtosis Specific test passes, the statements provide
conservative bounds. When a transformation/distribution is selected, these statements are
calculated using the transformed values and spec limits. The normal tolerance interval is
then transformed back to the original units.
only if the distribution fits. This dialog box is described in more detail later in this
section.
Display Excess Kurtosis: If checked, the excess kurtosis is displayed. Otherwise, the
kurtosis is displayed.
Increase Number Cells in Histogram: Selecting this menu item increases the number
of cells in the histogram meaning fewer values fall in each cell. This menu item can be
selected multiple times. This menu item is grayed out (not available) when the histogram
is such that every unique value is contained in its own cell.
Decrease Number Cells in Histogram: Selecting this menu item decreases the number
of cells in the histogram meaning more values fall in each cell. This menu item can be
selected multiple times.
Use Anderson-Darling Test: Selects Anderson-Darling test (AD) as general test for
normality.
Use Shapiro-Wilks Test: Selects Shapiro-Wilks test (SW) as general test for normality.
Use Skewness-Kurtosis All Test: Selects Skewness-Kurtosis All test (SK All) as
general test for normality.
Size To Fit: Sizes plot to fit window. Plot will shrink and expand to fit window when
window is resized.
Fixed Size - Normal: Sizes the plot so it is easy to read. If the plot is too large to fit the
window, scroll bars are added.
Fixed Size - Custom: Can specify the size of the plot. If the plot is too large to fit the
window, scroll bars are added.
Print: Prints the plot.
Copy to Clipboard: Copies the plot to the clipboard in Windows Meta file (Picture)
format.
Copy Transformation Equation to Clipboard: Copies the equation for the
transformation to the clipboard in Excel format.
Setting Tolerance Intervals Options
Default options for the Test Distribution and Skewness-Kurtosis Plot windows are set using
the Analysis Options dialog box displayed using the Analysis menu. Changing these options will
not affect existing windows but will affect all new windows and future sessions.
Confidence Level Combo Box: Confidence level as a percentage to use for constructing
tolerance intervals and confidence statements. It must be a real number greater than or equal
to 50.0 but below 100.0. The recommended value is 95.0.
Percentage in Interval Combo Box: For tolerance intervals, percentage of values in the
interval. It must be a real number greater than 0.0 but below 100.0. Initially set to 99.0.
Type of Tolerance Interval Radio Group: For tolerance intervals, can select whether to
display an upper, lower or two sided tolerance interval. A fourth option is provided to
use the type of tolerance interval matching the type of specifications. A two sided
tolerance interval is used for a two sided specification and so on. If no specs are
provided, a two sided tolerance interval is displayed.
Units to Use Radio Group: Can select whether to report estimated defect rates and
confidence statements relative to the spec limits in terms of the percent in spec, percent
out of spec or the reliability.
Display Tolerance Intervals Only If Distribution Fits Check Box: The estimated defect
rate, tolerance interval and confidence statement relative to the spec limits are only valid
if the distribution used adequately fits the data. Checking this box will only display them
when one of the normality tests passes when applied to the transformed values. It is
recommended this box be checked.
Normality Test to Use Radio Group: It is recommended that the Skewness-Kurtosis All
Test (SK All) be used to test whether the distribution fits. However, two alternative tests
are also provided: Anderson-Darling and Shapiro-Wilks.
Initial Format to Display S-K Plot Radio Group: In Skewness-Kurtosis Plot window,
select whether to initially display either the positive and negative skewness regions or
just the positive skewness region.
When done, click the OK button. If any errors are found, an error message will be displayed
and the errors must be corrected before the dialog box can be closed. To set everything back
to the original defaults, click the Defaults button. You can also click the Advanced button to
display and change additional advanced options. This displays the Advanced Options dialog
box. To exit without updating the options, click the Cancel button or press the Esc key.
The Advanced Options dialog box is used to set advanced options that should not generally
be changed by the user.
Minimum Sample Size Combo Box: It is recommended that at least 15 samples be used
when fitting and testing distributions but the program allows the user to adjust this policy.
The program will not perform an analysis if the sample size is less than the specified
minimum sample size. The minimum sample size must be an integer of at least 8.
Once the options have been specified, click the OK button or press the Enter key to save the
results and close the dialog box. If any errors are found, an error message will be displayed
and the errors must be corrected before the dialog box can be closed. Clicking the Cancel
button or pressing the Esc key instead restores the options to their former values and closes
the dialog box.
To modify the options used by the current Test Distribution window, use the Tolerance
Interval Options dialog box shown below and the menu items on the popup menu. Changing
these options will only affect the current Test Distribution window.
Tab 2: Transformed Data
Tab 2 of the Test Distribution window displays a histogram of the transformed values along
with the density of the normal distribution. The equation used to transform the data is shown at
the top. Also shown are:
Moments: The sample size, average, standard deviation, skewness and kurtosis (or
excess kurtosis) of the transformed values.
Test of Fit: Identical to Tab 1.
Capability Indexes: Identical to Tab 1.
Tolerance Interval: For transformed values. The inverse of the transformation equation
is applied to this interval to obtain the tolerance interval in Tab 1.
Confidence Statement Relative To Spec Limits: Identical to Tab 1.
By performing a right mouse click over the graphic or clicking the Menu button, the same
popup menu as on tab 1 will appear.
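The relationship between the tolerance interval on this tab and the one on Tab 1 can be sketched in a few lines of Python. This is only an illustration, not the program's own calculation: the function names are hypothetical, the log transformation is an assumed example, and the tolerance factor uses Howe's approximation together with the Wilson-Hilferty approximation to the chi-square quantile. The interval is built from the transformed values and then mapped back with the inverse transformation, which must be increasing.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sided_k(n, confidence=0.95, coverage=0.99):
    """Approximate two-sided normal tolerance factor (Howe's method).

    The chi-square quantile is itself approximated (Wilson-Hilferty),
    so the result is a sketch, not an exact table value.
    """
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    dof = n - 1
    zq = NormalDist().inv_cdf(1 - confidence)
    c = 2 / (9 * dof)
    chi2_lower = dof * (1 - c + zq * math.sqrt(c)) ** 3
    return z * math.sqrt(dof * (1 + 1 / n) / chi2_lower)


def tolerance_interval(values, forward, inverse, confidence=0.95, coverage=0.99):
    """Normal tolerance interval on transformed values, mapped back to original units."""
    y = [forward(v) for v in values]
    k = two_sided_k(len(y), confidence, coverage)
    m, s = mean(y), stdev(y)
    # 'inverse' is assumed to be an increasing function, so the end points keep their order.
    return inverse(m - k * s), inverse(m + k * s)


# Illustrative (made-up) positive data with an assumed log transformation.
data = [1.2, 1.9, 2.4, 2.8, 3.1, 3.5, 4.2, 5.0, 6.3, 8.1]
low, high = tolerance_interval(data, math.log, math.exp)
print(f"Tolerance interval in original units: {low:.3f} to {high:.3f}")
```

Because the interval is computed in the transformed units, it is symmetric there but generally not symmetric around the average once it is transformed back.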
Tab 3: Order - Plot
[Figure: plot of the values in time order]
Tab 3 of the Test Distribution window displays a plot of the values in time order like the one
shown above. In order for this plot to be displayed, the Order column must be filled out in
the Data window. An analysis for whether shifts occurred over time is displayed on Tab 4:
Order - Analysis.
Size To Fit: Sizes plot to fit window. Plot will shrink and
expand to fit window when window is resized.
Fixed Size - Normal: Sizes the plot so it is easy to read. If the plot is too large to fit the
window, scroll bars are added.
Fixed Size - Custom: Can specify the size of the plot. If the plot is too large to fit the
window, scroll bars are added.
Copy to Clipboard: Copies the plot to the clipboard in Windows Meta file (Picture)
format.
Tab 4: Order - Analysis
Tab 4 of the Test Distribution window displays the results of a change-point analysis to
determine if there are shifts in the data over time. In order for this analysis to be performed,
the Order column must be filled out in the Data window. A plot of the data in time order is
displayed on Tab 3: Order - Plot. If there are significant shifts, this tab will be bolded in the
Test Distribution window.
A change-point analysis can detect multiple shifts. If it detects shifts, the shifts are listed in a
table. For each shift, the estimated first point or subgroup following the change is listed
along with the confidence level representing the confidence that the shift occurred. Only
shifts detected with 95% confidence or better are listed.
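The guide does not describe how the change-point analysis is computed. As a rough illustration only, the sketch below uses one common approach, a cumulative sum (CUSUM) of deviations from the average combined with a bootstrap to attach a confidence level to a single shift. The function names, the number of bootstrap samples and the data are all assumptions, and the program's own method may differ.

```python
import random


def cusum(values):
    """Cumulative sums of deviations from the average, starting at zero."""
    avg = sum(values) / len(values)
    total, sums = 0.0, [0.0]
    for v in values:
        total += v - avg
        sums.append(total)
    return sums


def cusum_range(values):
    sums = cusum(values)
    return max(sums) - min(sums)


def detect_shift(values, n_boot=1000, seed=1):
    """Return (index of first point after the estimated change, confidence level)."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        # Shuffling removes any time order; a real shift gives a much larger range.
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    sums = cusum(values)
    # sums[i] is the CUSUM after the first i values, so the index of the
    # largest |sums[i]| is the 0-based index of the first point after the change.
    first_after = max(range(len(sums)), key=lambda i: abs(sums[i]))
    return first_after, smaller / n_boot


# Made-up data in time order with a shift after the tenth value.
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.2, 9.8,
          12.0, 12.1, 11.9, 12.2, 11.8, 12.0, 12.1, 11.9, 12.0, 12.1]
first_after, confidence = detect_shift(values)
if confidence >= 0.95:
    print(f"Shift detected; first point after change: {first_after + 1}, "
          f"confidence {confidence:.1%}")
```

Detecting multiple shifts, as the program does, would require repeating such an analysis on the segments before and after each detected change, which this sketch does not attempt.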
By performing a right mouse click over the graphic or clicking the Menu button, the same
popup menu as on Tab 3 will appear.
Tab 5: Group - Plot
[Figure: Torque, Plot of Data by Group]
Tab 5 of the Test Distribution window displays a plot of the values by group like the one
shown above. In order for this plot to be displayed, the Group column must be filled out in
the Data window. An analysis for whether differences exist between the groups is displayed
on Tab 6: Group - Analysis.
By performing a right mouse click over the graphic or clicking the Menu button, the same
popup menu as on Tab 3 will appear.
Tab 6: Group - Analysis
[Figure: Torque, Analysis of Group Differences. Differences found; consider analyzing each group separately. No significant differences were found between the standard deviations (p-value = 0.361).]
Tab 6 of the Test Distribution window displays the results of several analyses to detect
differences between groups. In order for these analyses to be performed, the Group column
must be filled out in the Data window. A plot of the data by group is displayed on Tab 5:
Group - Plot. If significant differences between the groups are detected, this tab will be
bolded in the Test Distribution window. Any significant differences are highlighted in red.
The following analyses are performed:
Group Medians Different: The second analysis performed is the Kruskal-Wallis test to
see if the medians of the groups are different. This is a nonparametric test that makes no
assumptions about the distributions of the groups. It is an alternative to the above
ANOVA. The p-value and confidence level are reported. A significant difference is
reported if the p-value is less than or equal to 0.05 and the results are highlighted in red.
Group Standard Deviations Different: The third analysis performed is Levene's
test to see if the standard deviations of the groups are different. This analysis assumes
the data within the groups fits the normal distribution. The p-value and confidence level
are reported. A significant difference is reported if the p-value is less than or equal to
0.05 and the results are highlighted in red.
Comparison of Averages: The average of each group is displayed along with 95%
confidence intervals. This analysis assumes the data within the groups fits the normal
distribution and that the standard deviations are equal. If the ANOVA or Kruskal-Wallis
test is significant, this section can be used to further compare the group averages. A
Tukey-Kramer Multiple Comparison is also performed to determine which groups are
significantly different. For each group, there is a corresponding column with a "+" sign
in it. Any other groups with "x" signs in the same column are significantly different.
In the example above, group 1 is significantly different from groups 3 and 4 but is not
significantly different from group 2.
By performing a right mouse click over the graphic or clicking the Menu button, the same
popup menu as on Tab 3 will appear.
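As an illustration of the Group Medians Different analysis, the sketch below computes the Kruskal-Wallis statistic using only the Python standard library. It is a textbook version of the test (average ranks for ties, tie correction, chi-square approximation for the p-value) and is not taken from the program; the function names and data are hypothetical. The 0.05 cutoff matches the rule stated above.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    root = math.sqrt(x)
    term, total = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total


def kruskal_wallis(groups):
    """Return (H statistic, approximate p-value) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}, p-value = {p:.4f}, significant = {p <= 0.05}")
```

The chi-square approximation is rough for very small groups such as these; the made-up data are kept small only so the ranks are easy to check by hand.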
Tab 7: Outliers
Value Z-Score
6.2 5.47
Tab 7 of the Test Distribution window displays any potential outliers in the data. These
points should be reviewed to see if they are in error and replaced if this is the case. They
might also represent extreme values for a long tailed distribution so transforming the data is
another option. If they cannot be eliminated and a distribution cannot be fit to the data, the
use of a nonparametric tolerance interval or attribute sampling plan may be required.
Z-Score = (Value − Average) / Standard Deviation
A z-score of 10 means the value is 10 standard deviations above the average. A point is
marked as likely being an outlier if it is more than 10 standard deviations from the average
(z-score greater than 10 or less than -10). Robust estimates of the average and standard
deviation, called 20% trimmed estimates, are used for this calculation. This is so that an
outlier does not disguise itself by inflating the estimate of the standard deviation. Points that
are from 4.5 to 10 standard deviations from the average are marked as either outliers relative
to the normal distribution or extreme values from a long tailed distribution.
The outlier shown above is from a set of data generated using the largest extreme value
distribution. Even though the one point is flagged as a potential outlier, it is in reality just
part of a long tail.
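The flagging rules above can be sketched as follows. The thresholds of 4.5 and 10 come from the text, but everything else is an assumption made for illustration: the guide does not give the formulas for the 20% trimmed estimates, so this sketch simply drops 20% of the values from each tail and uses the average and standard deviation of what remains. That stand-in understates the standard deviation, so the z-scores it produces will be larger than the program's.

```python
from statistics import mean, stdev


def trimmed(values, fraction=0.20):
    """Drop the given fraction of values from each tail (assumed trimming rule)."""
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)
    return ordered[cut:len(ordered) - cut]


def z_score(value, center, spread):
    return (value - center) / spread


def classify(z):
    """Apply the thresholds described in the text to a z-score."""
    if abs(z) > 10:
        return "likely outlier"
    if abs(z) >= 4.5:
        return "outlier or long-tail extreme"
    return "not flagged"


# Made-up data with one suspicious value.
data = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 9.0]
core = trimmed(data)
center, spread = mean(core), stdev(core)  # stand-ins for the robust estimates
for value in data:
    label = classify(z_score(value, center, spread))
    if label != "not flagged":
        print(value, label)
```

Using the trimmed values keeps the suspicious point out of the estimates, which is what allows it to stand out.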
By performing a right mouse click over the graphic or clicking the Menu button, the same
popup menu as on Tab 3 will appear.
3.4 Transforming Data
If the data does not fit the normal distribution, transforming it is one option to consider.
Before deciding to transform the data, consider the following items:
1. Did a shift occur in the middle of the data? This would indicate the process is
unstable. Generally the cause of the shift should be identified and eliminated.
2. Are there multiple sources that are different? For example, there might be
consistent differences between the different cavities of an injection molding process.
If a significant difference is detected between the cavities, consider testing each
cavity separately.
3. Is the data truncated? For example, the supplier might 100% inspect the components
before shipping them. If product is 100% inspected and the 100% inspection removes
significant numbers of out of spec units, then the data must be handled as attribute
data rather than variables data. An attribute sampling plan can be used instead to
demonstrate any claims about units being in spec.
4. Is there poor measurement resolution? This will be evidenced by frequent ties in
the data. The Anderson-Darling and Shapiro-Wilks tests should not be used in this
case. Make sure the Skewness-Kurtosis based tests are used.
5. Are there outliers? If it can be demonstrated that outlier values are measurement related,
they can be eliminated or replaced. One way of doing this when testing is
nondestructive is to retest the unit multiple times and demonstrate the result is consistently
different than the first test result, in which case the result can be replaced by the
average of the retests. Outliers are easily confused with long tails of nonnormal
distributions, so don’t be afraid to try transforming the data in this case, after
reviewing the potential outliers.
6. Is there too much data? Nothing is truly normal. With enough data, very small
departures from normality can be detected that are of no practical concern. If there
are more than several hundred data points, it is better to use the estimates of the skewness
and kurtosis to judge if the data is sufficiently normal.
If none of these issues are identified and it appears the underlying reason the data failed the
normality tests is that it comes from some distribution other than the normal distribution,
then a transformation is appropriate.
The recommended method of determining the transformation is to click the Find Best
Distribution button in the Data window. This determines which of the available distributions
best fits the data with or without pre-transforming the data. It also uses all available methods
of fitting the distribution (method of moments and maximum likelihood). The distribution
producing the highest p-value for the Skewness-Kurtosis All test for normality when applied
to the transformed data is selected, excluding those not including the spec limits within the
range of the distribution. The result is displayed in the Selected Distribution group box of
the Data window.
Pre-transforming the data is often helpful when the data has physical bounds.
Pre-transforming data takes data that is bounded and converts it to unbounded data. Bounds, if
they exist, can be specified along with the data in the Data window. Physical bounds and
pre-transforming data are defined in more detail in the Glossary. A simple example is to take
positive data, bounded below by zero, and apply the log pre-transformation to the data.
Including pre-transformations essentially doubles the number of distributions fit by
Distribution Analyzer. It means distributions like the log-gamma and log-Johnson can be fit
to the data.
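A minimal sketch of the idea follows. Only the log pre-transformation for data bounded below by zero is taken from the text; the forms used here for an upper bound alone and for both bounds are common choices assumed for illustration, and the function name is hypothetical. The Glossary gives the definitions the program actually uses.

```python
import math


def pre_transform(x, lower=None, upper=None):
    """Map a bounded value to an unbounded scale (assumed forms, see lead-in)."""
    if lower is not None and upper is not None:
        return math.log((x - lower) / (upper - x))  # assumed: log of the ratio of distances to the bounds
    if lower is not None:
        return math.log(x - lower)                  # log pre-transformation from the text when lower = 0
    if upper is not None:
        return -math.log(upper - x)                 # assumed mirror image of the lower-bound case
    return x                                        # no bounds: data is left as is


# Positive data bounded below by zero.
print([round(pre_transform(v, lower=0.0), 3) for v in [0.5, 1.0, 2.0, 4.0]])
```

Each form is increasing in x and stretches values near a bound toward minus or plus infinity, which is what makes the pre-transformed data unbounded.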
Distribution Analyzer fits a distribution to the data using both of the commonly used
approaches: method of moments and maximum likelihood. However, it modifies both
approaches to ensure that the distribution fit includes the spec limits within the range of the
distribution. Distributions like the normal distribution are unbounded and never cause a
problem. Distributions like the lognormal and gamma are bounded below and it is possible
that the spec limit falls below the lower bound of the distribution. When this is the case, the
transformation associated with the distribution cannot be used to transform the spec limit so
no further analysis can be performed.
The method of moments approach does not even guarantee all the data will be within the
range of the distribution. Distribution Analyzer reduces the number of moments it fits to
ensure the data and spec limits are within the range of the distribution. For example, fitting
the lognormal distribution generally involves matching the average, standard deviation and
skewness of the distribution to that of the data. If this results in a lower bound that does not
include the spec limits and data, then a lognormal is fit to the data matching the lower bound,
average and standard deviation. A lower bound of one standard deviation below the lower
spec limit is used.
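The fallback fit described above, matching a fixed lower bound, the average and the standard deviation, can be worked through for a three-parameter (shifted) lognormal distribution. The sketch below uses the standard lognormal moment formulas; the function names and the numbers are illustrative assumptions and are not taken from the program.

```python
import math


def fit_shifted_lognormal(average, std_dev, lower_bound):
    """Return (m, s) so that lower_bound + exp(Normal(m, s)) has the given
    average and standard deviation."""
    gap = average - lower_bound
    if gap <= 0:
        raise ValueError("average must be above the lower bound")
    s2 = math.log(1 + (std_dev / gap) ** 2)
    m = math.log(gap) - s2 / 2
    return m, math.sqrt(s2)


def to_normal(x, lower_bound, m, s):
    """Transformation to standard normal units implied by the fit."""
    return (math.log(x - lower_bound) - m) / s


# Illustrative values: average 2.92, standard deviation 0.88, lower spec limit 1.
average, std_dev, lsl = 2.92, 0.88, 1.0
lower_bound = lsl - std_dev  # one standard deviation below the lower spec limit
m, s = fit_shifted_lognormal(average, std_dev, lower_bound)
print(f"m = {m:.4f}, s = {s:.4f}, transformed LSL = {to_normal(lsl, lower_bound, m, s):.3f}")
```

Because the lower bound sits below the lower spec limit, the logarithm of (LSL - lower bound) is defined and the spec limit can be transformed along with the data.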
The maximum likelihood method guarantees all the data is within the range of the
distribution but does not guarantee the spec limits are. This method involves finding the
values for the parameters that maximize the log-likelihood function. This optimization is
constrained to ensure the selected values include the spec limits.
If one knows the distribution most likely to fit the data up front,
then it is best to start with the known distribution, rather than
starting with the normal distribution. To fit a specific
distribution to the data, click the Select Distribution button in the
Data window. This displays the popup menu shown to the right
for selecting a specific distribution. When a distribution is
selected, it is fit to the data using both method of moments and
maximum likelihood. The best fit is returned. The difference
between selecting the No Transformation menu item and the
Normal menu item is that the No Transformation menu item
does not transform the data (Y=X) and the Normal menu item
transforms the data using the transformation Y = (X-Average) /
Standard Deviation.
The last Custom menu item displays the Select Distribution to Fit Data dialog box shown
below. This dialog gives you complete control over the process including the distribution to
fit, the method used to fit the data and whether to pre-transform the data or not. The user is
encouraged to use the Find Best Distribution button in the Data window instead. Selecting
transformations has the potential of being abused. The distribution that best fits the data is
generally the one that should be used.
Average Edit Box: First moment required for all distributions. It must be a real number.
Standard Deviation Edit Box: Second moment required for all distributions. It must be
a real number greater than zero.
Skewness Edit Box: Third moment. It is only displayed for certain distributions. It can
be any real number, with these exceptions: largest extreme value (must be > -5.6051382),
smallest extreme value (must be < 5.6051382) and loglogistic (must be > -4.284783 and < 4.284783).
Kurtosis/Excess Kurtosis Edit Box: Fourth moment. It is only displayed for certain
distributions. Enter either the kurtosis or excess kurtosis depending on which is selected
in the Analysis Options dialog box. The kurtosis must be a real number greater than 1.0
and satisfying Kurtosis > Skewness * Skewness + 1. The excess kurtosis must be a real
number greater than -2.0 and satisfying Excess Kurtosis > Skewness * Skewness - 2.
First Transform to Unbounded Distribution Check Box: Checking this box pre-
transforms the data using the Lower and/or Upper bounds of the currently selected tab in
the Data window. If no bounds are provided, this box is grayed out.
Minimum Value Edit Box: Specifying a minimum value forces the distribution to
include this value in the range of the fitted distribution. It may be left blank. The value
has no effect on unbounded distributions but does affect bounded distributions. This
value is by default set to one standard deviation below the lower spec limit, if one exists,
to ensure the lower spec limit is within the range of the distribution fit to the data. This
ensures the lower spec limit can be transformed along with the data.
Maximum Value Edit Box: Specifying a maximum value forces the distribution to
include this value in the range of the fitted distribution. It may be left blank. The value
has no effect on unbounded distributions but does affect bounded distributions. This
value is by default set to one standard deviation above the upper spec limit, if one exists,
to ensure the upper spec limit is within the range of the distribution fit to the data. This
ensures the upper spec limit can be transformed along with the data.
A picture of the distribution fit to the data is displayed if the selected approach can fit the
data. Otherwise, an error message is displayed. The parameters of the distribution are given
on the second tab. The corresponding transformation in EXCEL format is displayed on the
third tab. Right clicking the mouse while the cursor is over one of these tabs displays a
popup menu for printing the graphic or copying it to the clipboard.
Click the Find Best Distribution button to determine which of the available distributions best
fits the data with or without pre-transforming the data. The distribution producing the
highest p-value for the Skewness-Kurtosis All Test for normality when applied to the
transformed data is selected. This button attempts to return a distribution that covers the
specified range but, in certain cases, the method of moments approach may fit a distribution
not covering the range.
When done, click the OK button. If any errors are found, an error message will be displayed
and the errors must be corrected before the dialog box can be closed. To exit without
changing the distribution, click the Cancel button or press the Esc key.
3.5 Skewness-Kurtosis Plot Window
The Skewness-Kurtosis Plot window is a child window that displays a skewness-kurtosis
plot. This plot is useful for exploring the shapes and relationships of the different
distributions. This window is displayed by selecting the View Skewness-Kurtosis Plot menu
item on the Analysis menu, by clicking on the View Skewness-Kurtosis Plot button on the
toolbar, or by clicking the Skewness-Kurtosis Plot button in the Data window.
Use the Distributions panel at the right of the window to select which distributions and
families of distributions to display. The skewness and kurtosis of any data entered in the Data
window can also be displayed on the plot.
A skewness-kurtosis plot indicates the range of skewness and kurtosis values a distribution
can fit. Two-parameter distributions like the normal distribution are represented by a single
point. Three-parameter distributions like the lognormal distribution are represented by a
curve. Four-parameter distributions like the beta distribution are represented by a shaded
region. At the bottom of the plot is a gray shaded region called the impossible region. No
distributions can fall into this region.
• Locate the point on the plot that corresponds to a set of data and see which
distributions are nearby and might fit the data.
• See which distributions are close to each other. For example, the exponential
distribution is at the point where the gamma and Weibull distributions intersect and is
a special case of both distributions. Another example is that the normal distribution is
on the curve of the lognormal distribution. The lognormal distribution converges to the
normal distribution as the skewness goes to zero.
• See the relationships between distributions. For example, the lognormal distribution's
curve is above the gamma distribution's curve. This means that for the same
skewness, the lognormal distribution has a higher kurtosis (heavier tails) than the
gamma distribution.
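To locate a set of data on the plot, its skewness and kurtosis are needed. The sketch below computes them with the simple moment formulas and checks the result against the moment constraint quoted in this guide (Excess Kurtosis > Skewness * Skewness - 2). The guide does not state which estimators the program uses, for example whether small-sample bias corrections are applied, so the formulas and function names here are assumptions.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    avg = sum(values) / n
    m2 = sum((v - avg) ** 2 for v in values) / n
    m3 = sum((v - avg) ** 3 for v in values) / n
    m4 = sum((v - avg) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def satisfies_moment_constraint(skewness, excess_kurtosis):
    """True if the point lies above the impossible region of the plot."""
    return excess_kurtosis > skewness * skewness - 2


skew, excess = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(f"skewness = {skew:.3f}, excess kurtosis = {excess:.3f}")
print(satisfies_moment_constraint(skew, excess))  # a valid point on the plot
print(satisfies_moment_constraint(2.0, 1.0))      # falls in the impossible region
```

The normal distribution sits at skewness 0 and excess kurtosis 0, so the distance of a data point from that location gives a quick visual sense of how far the data departs from normality.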
There are a number of buttons at the bottom of the window to alter, print and copy the
skewness-kurtosis plot:
View Distributions Button: Displays the Select/View Distribution dialog box to find
further information about the distributions including viewing the density function,
calculating probabilities and determining bounds. This dialog box is described more
fully later in the section.
Print Button: Prints the skewness-kurtosis plot. Same as clicking the Print button on
the toolbar or selecting the Print menu item on the File menu.
Uncheck All Button: Removes all checks from distributions in the Distributions panel
in order to start over selecting distributions.
+/- View Button: Displays both the positive and negative skewness regions of the plot
by setting the scale for the X-axis from -3 to 3 in the Plot Options dialog box.
+ View Button: Displays only the positive skewness region of the plot by setting the
scale for the X-axis from 0 to 3 in the Plot Options dialog box.
Show Data Check Box: When checked, points are displayed on the skewness-kurtosis
plot representing data found on the tabs in the Data window. If lower or upper physical
bounds are also specified, then points are also displayed representing the pre-transformed
data.
To close the Skewness-Kurtosis Plot window, click the X in the upper right corner.
Select/View Distribution Dialog Box
The Select/View Distribution dialog box is used to specify a distribution and its parameters.
It is used by the Skewness-Kurtosis Plot window to view different distributions to better
understand their shapes. It is also used by the Generate Random Values dialog box to select
a distribution to use when generating random numbers. This dialog box is displayed by
clicking the View Distributions button on the Skewness-Kurtosis Plot window and by
clicking the Select Distribution button on the Generate Random Values dialog box.
Moments Group: Used to enter up to four moments depending on the distribution. All
distributions require the first two moments (average and standard deviation). Many also
require the skewness. A few require both the skewness and kurtosis.
Average Edit Box: First moment required for all distributions. It must be a real number.
Standard Deviation Edit Box: Second moment required for all distributions. It must be
a real number greater than zero.
Skewness Edit Box: Third moment. It is only displayed for certain distributions. It can
be any real number, with these exceptions: largest extreme value (must be > -5.6051382),
smallest extreme value (must be < 5.6051382) and loglogistic (must be > -4.284783 and < 4.284783).
Kurtosis/Excess Kurtosis Edit Box: Fourth moment. It is only displayed for certain
distributions. Enter either the kurtosis or excess kurtosis depending on which is selected
in the Analysis Options dialog box. The kurtosis must be a real number greater than 1.0
and satisfying Kurtosis > Skewness * Skewness + 1. The excess kurtosis must be a real
number greater than -2.0 and satisfying Excess Kurtosis > Skewness * Skewness - 2.
Otherwise, the specified value falls into the impossible region.
X Edit Box: Used to explore the distribution. Enter a value for X and the percent of
values less than X will be displayed (distribution function). For the distribution shown
above, 63.2% of values fall below 0.
P Edit Box: Used to explore the distribution. Enter a percentage for P and the value of
X that has P% of values below it will be displayed (inverse of distribution function). For
the above example, 50% of the values fall below -0.31.
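The X and P edit boxes correspond to the distribution function and its inverse. As a small illustration, the sketch below does the same two lookups for a normal distribution using NormalDist from the Python standard library. The helper names are hypothetical, and the normal distribution is used only because it is available without extra packages; the dialog box supports many other distributions.

```python
from statistics import NormalDist


def percent_below(dist, x):
    """X edit box: percent of values less than x (distribution function)."""
    return 100 * dist.cdf(x)


def value_at_percent(dist, p):
    """P edit box: value of X with p percent of values below it (inverse)."""
    return dist.inv_cdf(p / 100)


dist = NormalDist(mu=0.0, sigma=1.0)  # average 0, standard deviation 1
print(f"{percent_below(dist, 0.0):.1f}% of values fall below 0")
print(f"97.5% of values fall below {value_at_percent(dist, 97.5):.3f}")
```

For a symmetric distribution like this one, 50% of values fall below the average. For the skewed distribution in the example above, the two differ, which is why 63.2% of values fall below 0 there.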
A picture of the selected distribution is displayed if the moments are valid. Otherwise, an
error message is displayed. The parameters of the distribution are also given on a second tab.
Right clicking the mouse while the cursor is over one of these tabs displays a popup menu
used for printing the graphic or copying it to the clipboard.
When done, click the OK button. If any errors are found, an error message will be displayed
and the errors must be corrected before the dialog box can be closed. To exit without
changing the distribution, click the Cancel button or press the Esc key.
Lines: Changes the labels, values and styles associated with the lines.
Other: Changes the resolution of the plot and whether to automatically
update axis scales.
When you are done, click the OK button to close the dialog box and update the plot. An
alternative is to press the Enter key. Clicking the Cancel button instead closes the dialog box
without saving any changes. An alternative is to press the Esc key. The Help button
provides help concerning the Plot Options dialog box.
The Main Title edit box is used to change the text for the main title. It may be left blank if
no title is desired. The Main Title Size combo box is used to select the size for displaying
the main title. Normal size is the same size as the system font. The default is twice the
normal size. You can select one of the values in the list or type your own value. Entering a
value of 1.3 results in the title being 1.3 times normal size. The Main Title Color list box is
used to change the color of the main title.
The Sub Title edit box, Sub Title Size combo box and Sub Title Color list box are used to
enter an optional subtitle. By default it is blank. They work the same as the main title
controls.
Plot Options Dialog Box – Axis Style Tab
The Axis Style tab is used to change the axis style, tic mark style, tic mark size and the plot’s
aspect ratio.
The Axis Style radio buttons are used to select one of four different axis styles. The Tic
Style radio buttons are used to select from three different tic styles. Tic marks can appear
next to the values on an axis as well as in between values. The Tic Size Number or Label
combo box is used to select the size for displaying tic marks appearing next to values. The
Tic Size Between combo box is used to select the size for displaying tic marks appearing
between values. The default is to make the tic marks appearing next to values 0.4 times
normal size and the tic marks appearing between values smaller at 0.2 times normal size.
The Ratio X-Axis to Y-Axis edit box is used to change the plot’s aspect ratio. A value of 1
results in a square plotting region. A value of 2 makes the x-axis twice as long as the y-axis.
A value of 0.5 makes the x-axis half as long as the y-axis. The y-axis size remains fixed.
Plot Options Dialog Box – Y-Axis Tab
The Y-Axis tab is used to change the y-axis label, scale, tic marks and orientation. Double
clicking on the y-axis displays this tab.
The Text edit box is used to change the axis label. By default this is the phrase “Kurtosis” or
“Excess Kurtosis”. The Label Size combo box is used to select the size for displaying the
axis label. The default is 1.5 times normal size. The Display Label Vertically check box
allows the label to be displayed either vertically or horizontally.
The Minimum Value and Maximum Value edit boxes are used to change the scale of the y-
axis. The minimum value must be less than the maximum value. The Value Size combo
box is used to select the size for displaying axis values. The default is normal size.
The Number of Values on Axis edit box is used to change the number of values displayed
along the y-axis. The minimum is two. The default is 3. The Number of Tics Between
Values edit box is used to change the number of tic marks displayed between each value on
the y-axis. A value of zero can be entered. The default is 4.
Plot Options Dialog Box – X-Axis Tab
The X-Axis tab is used to change the x-axis label, scale and tic marks. Double clicking on the
x-axis displays this tab. It functions identically to the Y-Axis tab except that it affects the
bottom axis, or x-axis.
The Key Title edit box is used to change the text for the key title. It may be left blank if no
title is desired. The Key Title Size combo box is used to select the size for displaying the key
title. Normal size is the same size as the system font and is the default size. You can select
one of the values in the list or type your own value. Entering a value of 1.3 results in the title
being 1.3 times normal size. The Key Title Color list box is used to change the color of the
key title.
The Display Key To Line check box is used to specify whether to display the key to the
lines. Generally the key is desired but can be removed for purposes of printing and copying
to the clipboard. The Size of Line Labels in Key combo box is used to select the size for
displaying the line labels in the key. The default is normal size.
Plot Options Dialog Box – Lines Tab
The Lines tab is used to change the label and style of a point, line or region representing a
distribution. A short cut is to double click on the key for the lines. Double clicking on a line
in either the plot or key displays this tab.
The Line tabs are used to select which line to change. The values displayed in the other
controls are for the line whose tab is selected.
The Label edit box is used to change the label shown next to the line in the key for lines.
The Color list box is used to change the color of the line. The Style list box is used to select
the line style. If a line style other than a solid line is selected, the line thickness is set to
hairline. The Thickness combo box is used to change the line thickness of solid lines. Only
solid lines can have a thickness other than hairline. If a thickness other than hairline is
selected, the line style will be set to solid. The Interior Color list box is used to change the
color of the region. The Interior Style list box is used to select the region style.
The Automatic Scaling check box is used to select whether automatic scaling is active.
Changing the x-axis scale can cause the curves to shift so that they are no longer visible on
the plot. Automatic scaling automatically changes the plot's scales whenever such a change
occurs so that the curves remain visible. Uncheck this box if you want to override the default
scaling. It is recommended auto-scaling not be used.
The Resolution list box is used to select the plot’s resolution. The resolution is the number
of points used to construct the curve. The default is low resolution which speeds up the
drawing of the plots. Higher resolutions can be used to produce smoother plots.
3.6 Generating Random Values
Random values from any of the distributions can be generated using the Generate Random
Values dialog box. This dialog box is displayed by selecting the Generate Random Values
menu item on the Analysis menu or by clicking on the Generate Random Values button on
the toolbar.
This dialog box is used to create a set of data by generating random values that are saved in
the currently selected tab in the Data window. You can select the number of values, the
precision and the distribution.
Number of Random Values to Generate Edit Box: Enter the number of values to
generate. It must be an integer from 1 to 9,999. At least 15 are required to perform an
analysis using the generated data (may be changed in the Advanced Options dialog box).
Precision Edit Box: Specifies the precision that values are rounded to. For example, if a
precision of 0.2 is specified, every value will be a multiple of 0.2, resulting in
values like -0.2, 0.0, 0.2, 0.4 and 0.6. It must be a positive number or blank.
Select Distribution Button: Click this button to select a different distribution or to
change the parameters of the distribution. It displays the Select/View Distribution dialog
box described in the previous section.
The currently selected distribution is displayed in the graphic box. Right clicking the mouse
while the cursor is over the graphic displays a popup menu for printing the graphic or
copying it to the clipboard.
Click the Generate Random Values button to generate the data and save it in the currently
selected tab of the Data window. If the tab already contains data, you are prompted to make
sure it is OK to overwrite the data. To exit without generating any data, click the Cancel
button or press the Esc key.
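The effect of the Number of Random Values and Precision settings can be illustrated outside the program. The following is only a minimal sketch in Python, not the program's own algorithm: the function name `generate_random_values`, the choice of a normal distribution and the `seed` argument are all assumptions made for illustration.

```python
import random

def generate_random_values(count, precision=None, mean=0.0, sd=1.0, seed=None):
    """Draw `count` normal values and round each to a multiple of `precision`.

    Hypothetical helper: it mimics the dialog box limits described above
    (1 to 9,999 values, precision positive or blank) for illustration only.
    """
    if not 1 <= count <= 9999:
        raise ValueError("count must be an integer from 1 to 9,999")
    if precision is not None and precision <= 0:
        raise ValueError("precision must be a positive number or None (blank)")
    rng = random.Random(seed)
    values = [rng.gauss(mean, sd) for _ in range(count)]
    if precision is None:
        return values  # blank precision: no rounding
    # Round each value to the nearest multiple of the precision.
    return [round(v / precision) * precision for v in values]

values = generate_random_values(50, precision=0.2, seed=1)
print(sorted(set(round(v, 1) for v in values)))
```

With a precision of 0.2 every printed value is a multiple of 0.2, up to floating-point representation.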
A second way to generate random values is to perform dice experiments. The Dice
Experiments dialog box is displayed by selecting the Dice Experiments menu item on the
Analysis menu.
This dialog box is used to create a set of data points mimicking the rolling of dice that are
saved in the currently selected tab in the Data window. This is intended to aid in
demonstrating how adding dice tends to the normal distribution and multiplying dice tends to
the lognormal distribution.
Number Dice Edit Box: Specifies the number of dice to roll and either add or multiply
together in order to create a single data point. It must be an integer from 1 to 20.
Number Data Points Edit Combo Box: Enter the number of data points to generate. It
must be an integer from 1 to 10000.
Click the Roll the Dice button to generate the data and save it in the currently selected tab of
the Data window. If the tab already contains data, you are prompted to make sure it is OK to
overwrite the data. To exit without generating any data, click the Cancel button or press the
Esc key.
Dice experiments can be used to demonstrate the central limit theorem. The central limit
theorem states that as items are added and subtracted together, under certain restrictions, the
result will tend to the normal distribution. The normal distribution is the distribution of
addition and subtraction. To see the central limit theorem in practice, go to the Dice
Experiments dialog box and specify 4 dice be added together as shown above. The normal
distribution fits the resulting data as shown below.
[Figure: Add 4 Dice, fitted with No Transformation (Normal Distribution)]
Similarly, the lognormal distribution is the distribution of multiplication and division.
Applying the central limit theorem to the logarithms of the values shows that as positive items
are multiplied and divided, under certain restrictions, the result will tend to the lognormal
distribution. To see this in practice, go to the Dice Experiments dialog box and specify 4 dice
be multiplied together. The lognormal distribution fits the resulting data as shown below.
[Figure: Multiply 4 Dice, fitted with Lognormal Family (151.719, 227.061, 7.596354)]
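The two dice experiments can also be imitated in a few lines of code. The sketch below is an illustration under stated assumptions, not the program's implementation: the helper names `roll_dice` and `skewness` and the fixed seed are hypothetical. It adds and multiplies 4 dice and compares the skewness of the sums, the products and the logarithms of the products.

```python
import math
import random

def roll_dice(num_dice, num_points, multiply=False, seed=None):
    """Create data points by adding (or multiplying) the faces of `num_dice` dice."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_points):
        rolls = [rng.randint(1, 6) for _ in range(num_dice)]
        data.append(math.prod(rolls) if multiply else sum(rolls))
    return data

def skewness(data):
    """Skewness computed from the second and third central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

sums = roll_dice(4, 10000, seed=1)
products = roll_dice(4, 10000, multiply=True, seed=1)
log_products = [math.log(p) for p in products]

print("skewness of sums:         ", round(skewness(sums), 3))
print("skewness of products:     ", round(skewness(products), 3))
print("skewness of log(products):", round(skewness(log_products), 3))
```

The sums are nearly symmetric, the products are strongly skewed to the right, and taking logarithms of the products removes most of that skewness, which is the behavior expected of approximately lognormal data.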
4 Menus and Toolbar
This chapter covers the menus and toolbar. Many of the menu items have already been
covered elsewhere. However, a few have not yet been described. This chapter explains
those remaining menu items.
File menu: Used to start a new session, save a session, open a previously saved
session, set up the printer, print and exit the program.
Edit menu: Used to help edit the spreadsheet in the Data window. It includes
menu items for copying, cutting, pasting and clearing cells and for
adding, deleting, moving and renaming sheets.
Analysis menu: Used to perform different analyses including testing whether the
selected distribution fits the data, selecting specific distributions and
selecting the best distribution. Also contains items to learn more about
the different distributions including generating example sets of data
and skewness-kurtosis plots.
Window menu: Used to rearrange and select child windows displayed in the interior of
the main window.
Help menu: Used to obtain help, register the software, obtain technical support and
to link to our web site.
New: Begins a new session. Shortcuts are Ctrl+N and the corresponding button on the
toolbar. You will first be asked whether to save the current session.
Open...: Restores a previously saved session. An Open File dialog box will be opened for
selecting the file to open. Distribution Analyzer files have the extension "da". Shortcuts
are Ctrl+O and the corresponding button on the toolbar. You will first be asked whether to
save the current session.
Save: Saves changes to a session back into the file which was originally opened to start the
session. If no file is associated with the session, the Save As menu item is automatically
invoked instead. The file associated with a session is displayed on the caption bar of the
main window. Shortcuts are Ctrl+S and the corresponding button on the toolbar.
Save As...: Saves the session in a new file. A Save File dialog box will be opened for
selecting the file to save the session in. An extension of "da" will automatically be added
to indicate it is a Distribution Analyzer file. Shortcut is the corresponding button on the
toolbar.
Print: Prints the contents of the active child window. Shortcuts are Ctrl+P and the
corresponding button on the toolbar.
Print Setup…: Displays the Print Setup dialog box for selecting the printer to use and for
changing printer options such as portrait versus landscape, paper size and paper source.
The Name list box is used to select which printer to print to. Clicking on the Properties
button displays a dialog box for modifying additional printer options.
In addition, the most recently saved sessions are displayed between the Print Setup and Exit
menu items. Selecting one of them will reopen that session.
The Edit menu is the second menu on the menu bar. It is used to help edit the spreadsheet in
the Data window. It includes menu items for copying, cutting, pasting and clearing cells and
for adding, deleting, moving and renaming sheets. The available menu items are:
Cut: Cuts currently selected cells from the spreadsheet. They are placed in the clipboard so
that they can be pasted into this or some other program. Selected cells are cleared (made
blank). Shortcut is Ctrl+X.
Paste: Pastes cells from the clipboard into the spreadsheet. All cells in the clipboard will
be pasted with the selected cell being the top-left pasted cell. Shortcut is Ctrl+V.
Select All: All cells containing data are selected. This is typically used in conjunction
with the Copy menu item to copy the entire data sheet into the clipboard. Shortcut is Ctrl+A.
Clear Cells: Clears (makes blank) the currently selected cells of the spreadsheet. Prompts
first to confirm.
Add Sheet: Inserts a new data sheet behind the current sheets.
Delete Sheet: Deletes the current data sheet. Prompts first to confirm.
Rename Sheet: Displays the Rename Sheet dialog box for entering a new name for the current
sheet. Shortcut is to double click on the current name on the tab.
Move Sheets: Displays the Move Sheets dialog box for reordering the sheets. Drag and drop the
sheets into a new order or select a sheet and use the up and down buttons to move the sheet
in the list.
The Analysis menu is the third menu on the menu bar. It is used to perform different
analyses including testing whether the selected distribution fits the data, selecting specific
distributions and selecting the best distribution. Also contains items to learn more about the
different distributions including generating example sets of data and skewness-kurtosis plots.
The second group of buttons on the toolbar are shortcuts for the menu items from the
Analysis menu. The available menu items are:
Find Best Distribution: Determines the best distribution to fit to the data. It fits all the
available distributions using both the method of moments and maximum likelihood methods, with
and without pre-transforming the data. It returns the distribution giving the highest p-value
per the Skewness-Kurtosis All test. The result is displayed in the Selected Distribution box
of the Data window. Shortcuts are to click the Find Best Distribution button on the toolbar
or in the Data window.
View Skewness-Kurtosis Plot: Displays a skewness-kurtosis plot in a Skewness-Kurtosis Plot
window summarizing the shapes each distribution can take and illustrating relationships
between the different distributions. This is primarily a learning tool. However, if data
exists, its location is shown on the plot. Shortcuts are to click the View Skewness-Kurtosis
Plot button on the toolbar or in the Data window.
Fill Order/Group Column: Displays the Fill Order/Group Column dialog box for generating
patterned data for the Group and Order columns in the Data window. This saves time relative
to typing the information into these columns.
Generate Random Values: Displays the Generate Random Values dialog box for generating example
sets of data from any of the available distributions. This is a learning tool. A shortcut is
to click the Generate Random Values button on the toolbar.
Display Tabs at Bottom: By default, the tabs in the Data and Test Distribution windows are
displayed at the bottom. This menu item can be used to move the tabs to the top of the window.
The Window menu is the fourth menu on the menu bar. It is used to rearrange the child
windows or to select a particular child window. A list of opened child windows is appended
to the bottom of this menu. These can be used to select a child window. Selecting a child
window brings it to the top. Clicking on a child window has the same effect but the menu
items can be used when the child window is buried under other windows.
Tile: Displays all the child windows in a tile pattern. The windows are reduced in size so
that none of the windows overlap. A shortcut is to press Shift+F4.
Cascade: Stacks the child windows slightly offset from each other so that the caption of each
window is visible. A shortcut is to press Shift+F5.
Arrange Icons: Places the icons of all child windows that are minimized along the bottom of
the parent window.
At the bottom of the Window menu, a list of currently opened child windows is displayed.
Selecting one of these menu items will activate that child window (display it on top of the
others).
4.5 Help Menu
The Help menu is the last menu on the menu bar. It is used to obtain help, to register the
software, to obtain technical support and to link to our web site. The last group of buttons on
the toolbar provides short cuts for some of the more commonly used menu items on the Help
menu. The available menu items are:
Distribution Analyzer Home Page: If you have an internet connection and a web browser,
displays the home page for Distribution Analyzer containing downloads of the most recent
version and up-to-date information.
4.6 Toolbar
The toolbar buttons provide shortcuts for the more frequently used menu items. Holding the
mouse cursor over a toolbar button displays a popup description of the button. The menu
item associated with each button is given in the table below.
Menu Item    Menu
Open…        File
Save         File
Print        File
Options      Analysis
This chapter provides descriptions of the distributions and family of distributions included in
Distribution Analyzer.
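[Figure: normal density functions f(x) for µ=0 with σ=1, σ=2 and σ=3, plotted for x from -10 to 10]

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$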
Probabilities are obtained from the density function as areas under the curve. The probability
that a data point is between 1 and 2 is the area under the density function between 1 and 2.
For all density functions, the total area under the curve is one.
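The area interpretation can be checked numerically. The sketch below is an illustration only, assuming a standard normal distribution (µ=0, σ=1) and a simple trapezoidal rule; the function names are hypothetical. It computes the probability that a value falls between 1 and 2 and confirms that the total area is one.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Density function of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_density(density, lower, upper, steps=10000):
    """Approximate the area under `density` between two bounds (trapezoidal rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (density(lower) + density(upper))
    for i in range(1, steps):
        total += density(lower + i * width)
    return total * width

# Probability that a standard normal value is between 1 and 2.
p_1_to_2 = area_under_density(normal_density, 1.0, 2.0)
# Total area; the tails beyond +/-10 are negligible for sigma = 1.
total_area = area_under_density(normal_density, -10.0, 10.0)
print(round(p_1_to_2, 4), round(total_area, 4))
```

For the standard normal distribution the area between 1 and 2 is about 0.1359.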
The parameters of a distribution are variables included in the density function so that the
distribution can be adapted to a variety of situations. Of greatest importance is the number of
parameters as shown below:
2 Parameters: The two parameters determine the average and standard deviation of
the distribution. Such distributions are represented as a point on a
skewness-kurtosis plot as they have fixed values of the skewness and
kurtosis. Examples are the exponential, normal and uniform
distributions.
3 Parameters: The three parameters determine the average, standard deviation and
skewness of the distribution. Such distributions are represented as a
curve on a skewness-kurtosis plot as the kurtosis depends on the
skewness. Examples are the gamma and lognormal distributions.
Different books and articles will sometimes parameterize the same distribution differently.
One set of parameters can always be calculated from the other. Further, sometimes different
numbers of parameters are used so there are 2 and 3 parameter versions of the lognormal
distribution. This greatly complicates comparing and using distributions. For this reason,
Distribution Analyzer represents all the distributions in terms of their moments (average,
standard deviation, skewness and kurtosis). Further, the distributions are broadened to
always include at least two parameters so that they cover all possible averages and standard
deviations.
Certain distributions like the normal distribution and logistic distribution are unbounded.
Values generated from these distributions range from -infinity to infinity. Distributions like
the lognormal distribution and gamma distribution have lower bounds. They range from this
lower bound to infinity. Distributions like the negative of the lognormal distribution and
negative of the gamma distribution have upper bounds. They range from -infinity to this
upper bound. Finally distributions like the beta distribution have both upper and lower
bounds. They range from the lower bound to the upper bound.
A family of distributions is several distributions combined so that they cover a well-defined
region in a skewness-kurtosis plot. For example, the lognormal family of distributions
includes the lognormal, negative lognormal and normal distributions. This allows the family
to fit all possible average, standard deviation and skewness values. It appears as a curve in
the skewness-kurtosis plot. The lognormal, negative lognormal and normal distributions are
distinct distributions because they have different density and distribution functions.
The negative of a distribution is the distribution of -X (negative of the values) when X varies
according to the distribution. For example, if the value X follows the lognormal distribution,
then -X follows the negative lognormal distribution.
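The effect of taking the negative can be seen on sample moments. The following sketch is an illustration under assumptions: the helper `sample_moments` is hypothetical and uses simple 1/n moment estimates, and the data are lognormal values generated with Python's standard library. Negating the data changes the sign of the average and skewness, leaves the standard deviation and kurtosis unchanged, and turns the lower bound into an upper bound.

```python
import random

def sample_moments(data):
    """Return (average, standard deviation, skewness, kurtosis) using 1/n moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(7)
x = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]  # lognormal, bounded below by 0
neg_x = [-v for v in x]                                   # negative lognormal, bounded above by 0

pos = sample_moments(x)
neg = sample_moments(neg_x)
print("X :", [round(v, 3) for v in pos])
print("-X:", [round(v, 3) for v in neg])
```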
The remainder of this chapter provides specific details concerning the distributions used by
Distribution Analyzer.
5.2 Beta Distribution
Shape: The beta distribution is a 4-parameter distribution that is represented by a region
between the gamma curve and the impossible region (gray area) on a skewness-kurtosis plot
as shown below. For a specified skewness, it covers the following range of kurtosis values:
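$$\text{Skewness}^2 + 1 < \text{Kurtosis} < 3\left(1 + \frac{\text{Skewness}^2}{2}\right)$$

[Figure: skewness-kurtosis plot (Skewness on the horizontal axis, Kurtosis on the vertical axis) showing the Gamma curve and the Beta region]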
Density Function: A plot of the density function of the beta distribution is shown below:
[Figure: beta density functions for η=0.5, η=1 and η=2, plotted for x from 0 to 1]
The equation, parameters and bounds of the density function are:
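$$f(x \mid m, r, \gamma, \eta) = \begin{cases} 0 & x < m - \frac{r}{2} \text{ or } x > m + \frac{r}{2} \\[6pt] \dfrac{\Gamma(\gamma+\eta)}{r\,\Gamma(\gamma)\,\Gamma(\eta)} \left(\dfrac{x - m + \frac{r}{2}}{r}\right)^{\gamma-1} \left(\dfrac{m + \frac{r}{2} - x}{r}\right)^{\eta-1} & m - \frac{r}{2} \le x \le m + \frac{r}{2} \end{cases}$$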
Moments: The moments of the beta distribution can be calculated from the parameters as
shown below:
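Mean: $m + r\left(\dfrac{\gamma}{\gamma+\eta} - \dfrac{1}{2}\right)$
Standard Deviation: $r\sqrt{\dfrac{\eta\gamma}{(\eta+\gamma)^2(\eta+\gamma+1)}}$
Skewness: $\dfrac{2(\eta-\gamma)\sqrt{\gamma+\eta+1}}{\sqrt{\eta\gamma}\,(\eta+\gamma+2)}$ which can range from -∞ to ∞.
Kurtosis: $\dfrac{3(\eta+\gamma+1)\left[2(\eta+\gamma)^2 + \gamma\eta(\eta+\gamma-6)\right]}{\eta\gamma(\eta+\gamma+2)(\eta+\gamma+3)}$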
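As a check on the moment formulas for the beta distribution, they can be evaluated directly. This is a minimal sketch with a hypothetical function name `beta_moments`; it is not taken from the program. With γ = η = 1 the beta distribution reduces to a uniform distribution of width r centered at m, which has a standard deviation of r/√12, a skewness of 0 and a kurtosis of 1.8.

```python
import math

def beta_moments(m, r, gamma, eta):
    """Mean, standard deviation, skewness and kurtosis of the beta distribution
    with center m, range r and shape parameters gamma and eta."""
    s = gamma + eta
    mean = m + r * (gamma / s - 0.5)
    sd = r * math.sqrt(gamma * eta / (s ** 2 * (s + 1)))
    skew = 2 * (eta - gamma) * math.sqrt(s + 1) / (math.sqrt(gamma * eta) * (s + 2))
    kurt = (3 * (s + 1) * (2 * s ** 2 + gamma * eta * (s - 6))
            / (gamma * eta * (s + 2) * (s + 3)))
    return mean, sd, skew, kurt

# Uniform special case: gamma = eta = 1.
print(beta_moments(5.0, 2.0, 1.0, 1.0))

# A skewed case; its kurtosis lies between Skewness^2 + 1 and 3(1 + Skewness^2 / 2).
mean, sd, skew, kurt = beta_moments(0.0, 1.0, 2.0, 5.0)
print(skew ** 2 + 1 < kurt < 3 * (1 + skew ** 2 / 2))
```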
Properties:
5.3 Exponential Distribution
Shape: The exponential distribution is a 2-parameter distribution and covers any specified
average and standard deviation. It is represented by a single point with a skewness of 2 and
kurtosis of 9 (excess kurtosis of 6) on a skewness-kurtosis plot as shown below. It is at the
intersection of the gamma and Weibull distributions.
[Figure: skewness-kurtosis plot showing the Exponential and Negative Exponential points together with the Gamma, Negative Gamma, Weibull and Negative Weibull curves]
Density Function: The density function of the exponential distribution is shown below:
[Figure: exponential density functions for ε=0 with λ=1, λ=2, λ=3 and λ=4, plotted for x from 0 to 10]
The equation, parameters and bounds of the density function are:
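$$f(x \mid \varepsilon, \lambda) = \begin{cases} 0 & x \le \varepsilon \\[4pt] \dfrac{1}{\lambda}\, e^{-\frac{x-\varepsilon}{\lambda}} & x > \varepsilon \end{cases}$$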
Moments: The moments of the exponential distribution can be calculated from the
parameters as shown below:
Mean: ε+λ
Standard Deviation: λ
Skewness: 2
Kurtosis: 9
Properties:
• The exponential distribution is a special case of both the gamma and Weibull
distributions falling at the intersection of these two curves on the skewness-kurtosis
plot.
5.4 Extreme Value, Largest Family (Fréchet)
Shape: The largest extreme value family of distributions is made up of three distributions:
Fréchet, negative Weibull and largest extreme value. It covers any specified average,
standard deviation and any skewness below 5.6051382. Together they form a 3-parameter
family of distributions that is represented by a curve on a skewness-kurtosis plot as shown
below. The Fréchet distribution covers the portion of the curve with skewness above
1.139547. The negative Weibull distribution covers the portion of the curve with skewness
below 1.139547. The largest extreme value distribution handles the remaining case of
skewness equal to 1.139547.
[Figure: skewness-kurtosis plot showing the Fréchet and Negative Weibull curves and the Largest Extreme Value point]
Density Function - Largest Extreme Value Distribution: The density function of the
largest extreme value distribution is shown below:
[Figure: largest extreme value density function for µ=0, σ=1, plotted for x from -6 to 6]
The equation, parameters and bounds of the density function are:
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma} \exp\left[-\frac{1}{\sigma}(x-\mu) - e^{-\frac{1}{\sigma}(x-\mu)}\right] \quad \text{for } -\infty < x < \infty$$
Bounds: Unbounded
Moments - Largest Extreme Value Distribution: The moments of the largest extreme
value distribution can be calculated from the parameters as shown below:
Standard Deviation: $\dfrac{\pi}{\sqrt{6}}\,\sigma = 1.2825498301618640955\,\sigma$
Skewness: $\dfrac{12\sqrt{6}\,\zeta(3)}{\pi^3} = 1.1395470994046486575$
Kurtosis: 5.4
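The constants quoted for the largest extreme value distribution can be reproduced numerically. The sketch below is illustrative only; the helper `zeta3` is a hypothetical name, and it approximates ζ(3) by a partial sum plus a simple tail correction rather than by any method used in the program.

```python
import math

def zeta3(terms=100000):
    """Approximate zeta(3) = sum of 1/n^3 with a tail correction of 1/(2 N^2)."""
    partial = sum(1.0 / n ** 3 for n in range(1, terms + 1))
    return partial + 1.0 / (2.0 * terms ** 2)

sd_factor = math.pi / math.sqrt(6.0)                         # standard deviation / sigma
skewness = 12.0 * math.sqrt(6.0) * zeta3() / math.pi ** 3    # skewness of the distribution

print(round(sd_factor, 7), round(skewness, 7))
```

The same skewness with the opposite sign applies to the smallest extreme value distribution, since it is the negative of the largest extreme value distribution.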
Density Function - Fréchet: The density function of the Fréchet distribution is shown
below:
[Figure: Fréchet density functions for ε=0, σ=1 with η=0.3, η=1 and η=5, plotted for x from 0 to 3]
The equation, parameters and bounds of the density function are:
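$$f(x \mid \varepsilon, \sigma, \eta) = \begin{cases} \dfrac{\eta}{\sigma}\left(\dfrac{x-\varepsilon}{\sigma}\right)^{-\eta-1} e^{-\left(\frac{x-\varepsilon}{\sigma}\right)^{-\eta}} & x > \varepsilon \\[6pt] 0 & x \le \varepsilon \end{cases}$$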
Moments - Fréchet: The moments of the Fréchet distribution can be calculated from the
parameters as shown below:
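Mean: $\varepsilon + \sigma\,\Gamma\!\left(1 - \frac{1}{\eta}\right)$

Standard Deviation: $\sigma\sqrt{\Gamma\!\left(1 - \frac{2}{\eta}\right) - \Gamma\!\left(1 - \frac{1}{\eta}\right)^2}$

Skewness: $\dfrac{\Gamma\!\left(1 - \frac{3}{\eta}\right) - 3\,\Gamma\!\left(1 - \frac{2}{\eta}\right)\Gamma\!\left(1 - \frac{1}{\eta}\right) + 2\,\Gamma\!\left(1 - \frac{1}{\eta}\right)^3}{\left(\Gamma\!\left(1 - \frac{2}{\eta}\right) - \Gamma\!\left(1 - \frac{1}{\eta}\right)^2\right)^{3/2}}$ for η>3

Kurtosis: $\dfrac{\Gamma\!\left(1 - \frac{4}{\eta}\right) - 4\,\Gamma\!\left(1 - \frac{3}{\eta}\right)\Gamma\!\left(1 - \frac{1}{\eta}\right) + 6\,\Gamma\!\left(1 - \frac{2}{\eta}\right)\Gamma\!\left(1 - \frac{1}{\eta}\right)^2 - 3\,\Gamma\!\left(1 - \frac{1}{\eta}\right)^4}{\left(\Gamma\!\left(1 - \frac{2}{\eta}\right) - \Gamma\!\left(1 - \frac{1}{\eta}\right)^2\right)^{2}}$ for η>4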
Properties:
• The distributions in the largest extreme value family are the negatives of those in the
smallest extreme value family. In particular, the largest extreme value distribution is
the negative of the smallest extreme value distribution.
• The negative Weibull and largest extreme value distributions are the distributions of
maximums. Under certain restrictions, the maximum of distributions without upper
bounds tends to the largest extreme value distribution and the maximum of
distributions bounded above tends to the negative Weibull.
5.5 Extreme Value, Smallest Family (Weibull)
Shape: The smallest extreme value family of distributions is made up of three distributions:
Weibull, negative Fréchet and smallest extreme value. It covers any specified average,
standard deviation and any skewness above -5.6051382. Together they form a 3-parameter
family of distributions that is represented by a curve on a skewness-kurtosis plot as shown
below. The Weibull distribution covers the portion of the curve with skewness above
-1.139547. The negative Fréchet distribution covers the portion of the curve with skewness
below -1.139547. The smallest extreme value distribution handles the remaining case of
skewness equal to -1.139547.
[Figure: skewness-kurtosis plot showing the Negative Fréchet and Weibull curves and the Smallest Extreme Value point]
Density Function - Smallest Extreme Value Distribution: The density function of the
smallest extreme value distribution is shown below:
[Figure: smallest extreme value density function, plotted for x from -6 to 6]
The equation, parameters and bounds of the density function are:
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma} \exp\left[\frac{1}{\sigma}(x-\mu) - e^{\frac{1}{\sigma}(x-\mu)}\right] \quad \text{for } -\infty < x < \infty$$
Bounds: Unbounded
Moments - Smallest Extreme Value Distribution: The moments of the smallest extreme
value distribution can be calculated from the parameters as shown below:
Standard Deviation: $\dfrac{\pi}{\sqrt{6}}\,\sigma = 1.2825498301618640955\,\sigma$
Skewness: $-\dfrac{12\sqrt{6}\,\zeta(3)}{\pi^3} = -1.1395470994046486575$
Kurtosis: 5.4
Density Function - Weibull: The density function of the Weibull distribution is shown
below:
[Figure: Weibull density functions for ε=0, σ=1 with η=0.5, η=1 and η=2, plotted for x from 0 to 8]
The equation, parameters and bounds of the density function are:
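$$f(x \mid \varepsilon, \sigma, \eta) = \begin{cases} 0 & x \le \varepsilon \\[4pt] \dfrac{\eta}{\sigma}\left(\dfrac{x-\varepsilon}{\sigma}\right)^{\eta-1} e^{-\left(\frac{x-\varepsilon}{\sigma}\right)^{\eta}} & x > \varepsilon \end{cases}$$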
Moments – Weibull: The moments of the Weibull distribution can be calculated from the
parameters as shown below:
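Mean: $\varepsilon + \sigma\,\Gamma\!\left(1 + \frac{1}{\eta}\right)$

Standard Deviation: $\sigma\sqrt{\Gamma\!\left(1 + \frac{2}{\eta}\right) - \Gamma\!\left(1 + \frac{1}{\eta}\right)^2}$

Skewness: $\dfrac{\Gamma\!\left(1 + \frac{3}{\eta}\right) - 3\,\Gamma\!\left(1 + \frac{2}{\eta}\right)\Gamma\!\left(1 + \frac{1}{\eta}\right) + 2\,\Gamma\!\left(1 + \frac{1}{\eta}\right)^3}{\left(\Gamma\!\left(1 + \frac{2}{\eta}\right) - \Gamma\!\left(1 + \frac{1}{\eta}\right)^2\right)^{3/2}}$

Kurtosis: $\dfrac{\Gamma\!\left(1 + \frac{4}{\eta}\right) - 4\,\Gamma\!\left(1 + \frac{3}{\eta}\right)\Gamma\!\left(1 + \frac{1}{\eta}\right) + 6\,\Gamma\!\left(1 + \frac{2}{\eta}\right)\Gamma\!\left(1 + \frac{1}{\eta}\right)^2 - 3\,\Gamma\!\left(1 + \frac{1}{\eta}\right)^4}{\left(\Gamma\!\left(1 + \frac{2}{\eta}\right) - \Gamma\!\left(1 + \frac{1}{\eta}\right)^2\right)^{2}}$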
Properties:
• The distributions in the smallest extreme value family are the negatives of those in the
largest extreme value family. In particular, the smallest extreme value distribution
is the negative of the largest extreme value distribution.
• The exponential distribution is a special case of the Weibull distribution and thus falls
on the Weibull curve in the skewness-kurtosis plot.
• The Weibull and smallest extreme value distributions are the distributions of
minimums. Under certain restrictions, the minimum of distributions without lower
bounds tends to the smallest extreme value distribution and the minimum of
distributions bounded below tends to the Weibull.
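The Weibull moment formulas of this section can be evaluated with the gamma function from a standard library. The sketch below is an illustration with a hypothetical function name `weibull_moments`; it is not the program's code. Setting η=1 reproduces the exponential distribution's skewness of 2 and kurtosis of 9, which illustrates why the exponential point falls on the Weibull curve.

```python
import math

def weibull_moments(eps, sigma, eta):
    """Mean, standard deviation, skewness and kurtosis of the Weibull distribution
    with lower bound eps, scale sigma and shape eta."""
    g1 = math.gamma(1 + 1 / eta)
    g2 = math.gamma(1 + 2 / eta)
    g3 = math.gamma(1 + 3 / eta)
    g4 = math.gamma(1 + 4 / eta)
    var_unit = g2 - g1 ** 2                      # variance when sigma = 1
    mean = eps + sigma * g1
    sd = sigma * math.sqrt(var_unit)
    skew = (g3 - 3 * g2 * g1 + 2 * g1 ** 3) / var_unit ** 1.5
    kurt = (g4 - 4 * g3 * g1 + 6 * g2 * g1 ** 2 - 3 * g1 ** 4) / var_unit ** 2
    return mean, sd, skew, kurt

# eta = 1 is the exponential special case.
print([round(v, 6) for v in weibull_moments(0.0, 1.0, 1.0)])
# The skewness decreases as eta increases.
print([round(weibull_moments(0.0, 1.0, eta)[2], 3) for eta in (0.5, 1, 2, 5, 20)])
```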
5.6 Gamma Family of Distributions
Shape: The gamma family of distributions is made up of three distributions: gamma,
negative gamma and normal. It covers any specified average, standard deviation and
skewness. Together they form a 3-parameter family of distributions that is represented by a
curve on a skewness-kurtosis plot as shown below. The gamma distribution covers the
positive skewness portion of the curve. The negative gamma distribution covers the negative
skewness portion of the curve. The normal distribution handles the remaining case of zero
skewness. The gamma curve falls below the lognormal curve.
[Figure: skewness-kurtosis plot showing the Negative Gamma and Gamma curves meeting at the Normal point, with the Negative Lognormal and Lognormal curves for comparison]
Density Function: The density function of the gamma distribution is shown below:
[Figure: gamma density functions for ε=0, λ=1 with η=1, η=2, η=3 and η=4, plotted for x from 0 to 10]
The equation, parameters and bounds of the density function are:
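$$f(x \mid \varepsilon, \lambda, \eta) = \begin{cases} 0 & x \le \varepsilon \\[4pt] \dfrac{1}{\lambda^{\eta}\,\Gamma(\eta)}\,(x-\varepsilon)^{\eta-1}\, e^{-\frac{x-\varepsilon}{\lambda}} & x > \varepsilon \end{cases}$$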
Moments: The moments of the gamma distribution can be calculated from the parameters as
shown below:
Mean: $\varepsilon + \lambda\eta$
Standard Deviation: $\lambda\sqrt{\eta}$
Skewness: $\dfrac{2}{\sqrt{\eta}}$ which is always positive
Kurtosis: $3 + \dfrac{6}{\eta}$
Properties:
• As the skewness goes to zero, both the gamma and negative gamma distributions
limit to the normal distribution. This means that in some cases the gamma and
normal distributions can be difficult to distinguish between. As a result, some sets of
data may fit both the gamma and normal distributions.
• The exponential distribution is a special case of the gamma distribution and thus falls
on the gamma curve in the skewness-kurtosis plot.
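Because the gamma moments have simple expressions, they can be inverted to recover the parameters from a specified average, standard deviation and positive skewness. The sketch below is a derivation from the moment formulas above, not a description of how the program fits the distribution; the function names are hypothetical. From Skewness = 2/√η it follows that η = 4/Skewness², then λ = Standard Deviation/√η and ε = Mean − λη.

```python
import math

def gamma_from_moments(mean, sd, skew):
    """Return (eps, lam, eta) of the gamma distribution with the given moments."""
    if skew <= 0:
        raise ValueError("the gamma distribution requires positive skewness")
    eta = 4.0 / skew ** 2            # from skewness = 2 / sqrt(eta)
    lam = sd / math.sqrt(eta)        # from standard deviation = lam * sqrt(eta)
    eps = mean - lam * eta           # from mean = eps + lam * eta
    return eps, lam, eta

def gamma_kurtosis(eta):
    """Kurtosis implied by the shape parameter eta."""
    return 3.0 + 6.0 / eta

# A skewness of 2 gives eta = 1, the exponential special case with kurtosis 9.
eps, lam, eta = gamma_from_moments(mean=2.0, sd=1.0, skew=2.0)
print(eps, lam, eta, gamma_kurtosis(eta))
```

Substituting η = 4/Skewness² into the kurtosis formula gives Kurtosis = 3(1 + Skewness²/2), which is the gamma curve on the skewness-kurtosis plot.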
5.7 Johnson Family
Shape: The Johnson family of distributions is made up of three distributions: Johnson SU,
Johnson SB and lognormal. It covers any specified average, standard deviation, skewness
and kurtosis. Together they form a 4-parameter family of distributions that covers the entire
skewness-kurtosis region other than the impossible region. The Johnson SU distribution
covers the area above the lognormal curve and the Johnson SB covers the area below the
lognormal curve.
[Figure: skewness-kurtosis plot showing the Johnson SU and Johnson SB regions separated by the Lognormal curve]
Density Function - SB: The density function of the Johnson SB distribution is shown below:
[Figure: Johnson SB density functions labeled ε=0, λ=1, γ=1 with η=0.5, η=1 and η=2, plotted for x from -1.5 to 1.5]
The equation, parameters and bounds of the density function are:
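$$f(x \mid \eta, \gamma, r, m) = \begin{cases} 0 & x \le m - \frac{r}{2},\; x \ge m + \frac{r}{2} \\[6pt] \dfrac{\eta\, r}{\sqrt{2\pi}\,\left(x - m + \frac{r}{2}\right)\left(m + \frac{r}{2} - x\right)}\; e^{-\frac{1}{2}\left(\gamma + \eta \ln\left(\frac{x - m + \frac{r}{2}}{m + \frac{r}{2} - x}\right)\right)^2} & m - \frac{r}{2} < x < m + \frac{r}{2} \end{cases}$$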
Moments - SB: The moments of the Johnson SB distribution do not have a simple
expression.
Density Function - SU: The density function of the Johnson SU distribution is shown below:
[Figure: Johnson SU density functions for ε=0, λ=1, γ=1 with η=0.5, η=1 and η=2, plotted for x from -3 to 3]
The equation, parameters and bounds of the density function are:
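$$f(x \mid \eta, \gamma, \lambda, \varepsilon) = \frac{\eta}{\sqrt{(x-\varepsilon)^2 + \lambda^2}\,\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\gamma + \eta\, \operatorname{asinh}\left(\frac{x-\varepsilon}{\lambda}\right)\right)^2} \quad \text{for } -\infty < x < \infty$$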
Bounds: Unbounded
Moments - SU: The moments of the Johnson SU distribution can be calculated from the
parameters as shown below:
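Mean: $\lambda\left(-e^{\frac{1}{2\eta^2}} \sinh\!\left(\frac{\gamma}{\eta}\right)\right) + \varepsilon$ where $\sinh(x) = \dfrac{e^{x} - e^{-x}}{2}$

Standard Deviation: $\lambda\sqrt{\dfrac{e^{\frac{1}{\eta^2}} - 1}{2}\left(e^{\frac{1}{\eta^2}} \cosh\!\left(\frac{2\gamma}{\eta}\right) + 1\right)}$

Skewness: $-\dfrac{e^{\frac{1}{2\eta^2}} \left(e^{\frac{1}{\eta^2}} - 1\right)^{\frac{1}{2}} \left(3\sinh\!\left(\frac{\gamma}{\eta}\right) + e^{\frac{1}{\eta^2}}\left(e^{\frac{1}{\eta^2}} + 2\right)\sinh\!\left(\frac{3\gamma}{\eta}\right)\right)}{\sqrt{2}\,\left(1 + e^{\frac{1}{\eta^2}} \cosh\!\left(\frac{2\gamma}{\eta}\right)\right)^{\frac{3}{2}}}$

Kurtosis: $\dfrac{3 + 6e^{\frac{1}{\eta^2}} + 4e^{\frac{2}{\eta^2}}\left(e^{\frac{1}{\eta^2}} + 2\right)\cosh\!\left(\frac{2\gamma}{\eta}\right) + e^{\frac{2}{\eta^2}}\left(-3 + 3e^{\frac{2}{\eta^2}} + 2e^{\frac{3}{\eta^2}} + e^{\frac{4}{\eta^2}}\right)\cosh\!\left(\frac{4\gamma}{\eta}\right)}{2\left(1 + e^{\frac{1}{\eta^2}} \cosh\!\left(\frac{2\gamma}{\eta}\right)\right)^{2}}$

where $\cosh(x) = \dfrac{e^{x} + e^{-x}}{2}$.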
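The Johnson SU moment formulas are long enough that a numerical cross-check is useful. The sketch below is illustrative only and uses hypothetical function names. It evaluates the closed-form moments and compares them with moments obtained by numerical integration, using the fact implied by the density function that X = ε + λ sinh((Z − γ)/η) follows the Johnson SU distribution when Z follows the standard normal distribution.

```python
import math

def johnson_su_moments(eta, gamma, lam, eps):
    """Closed-form mean, standard deviation, skewness and kurtosis."""
    w = math.exp(1.0 / eta ** 2)          # e^(1/eta^2)
    om = gamma / eta
    c2 = 1.0 + w * math.cosh(2.0 * om)    # factor shared by several formulas
    mean = eps - lam * math.sqrt(w) * math.sinh(om)
    sd = lam * math.sqrt(0.5 * (w - 1.0) * c2)
    skew = -(math.sqrt(w * (w - 1.0))
             * (3.0 * math.sinh(om) + w * (w + 2.0) * math.sinh(3.0 * om))
             / (math.sqrt(2.0) * c2 ** 1.5))
    kurt = ((3.0 + 6.0 * w
             + 4.0 * w ** 2 * (w + 2.0) * math.cosh(2.0 * om)
             + w ** 2 * (-3.0 + 3.0 * w ** 2 + 2.0 * w ** 3 + w ** 4)
             * math.cosh(4.0 * om))
            / (2.0 * c2 ** 2))
    return mean, sd, skew, kurt

def numeric_su_moments(eta, gamma, lam, eps, z_max=10.0, steps=20000):
    """The same moments by summing over a fine grid of standard normal Z values."""
    h = 2.0 * z_max / steps
    points = []
    for i in range(steps + 1):
        z = -z_max + i * h
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * h
        points.append((weight, eps + lam * math.sinh((z - gamma) / eta)))
    total = sum(wt for wt, _ in points)
    mean = sum(wt * x for wt, x in points) / total
    m2, m3, m4 = (sum(wt * (x - mean) ** k for wt, x in points) / total
                  for k in (2, 3, 4))
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

print(johnson_su_moments(2.0, 1.0, 1.0, 0.0))
print(numeric_su_moments(2.0, 1.0, 1.0, 0.0))
```

The two sets of values agree closely, and a positive γ produces a negative skewness, consistent with the leading minus sign in the skewness formula.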
5.8 Loglogistic Family
Shape: The loglogistic family of distributions is made up of three distributions: loglogistic,
negative loglogistic and logistic. It covers any specified average, standard deviation and any
skewness in the range of -4.2847830295411833030 to 4.2847830295411833030. Together
they form a 3-parameter family of distributions that is represented by a curve on a skewness-
kurtosis plot as shown below. The loglogistic distribution covers the positive skewness
portion of the curve. The negative loglogistic distribution covers the negative skewness
portion of the curve. The logistic distribution handles the remaining case of zero skewness.
[Figure: skewness-kurtosis plot showing the Negative Loglogistic and Loglogistic curves meeting at the Logistic point]
Density Function - Logistic: The density function of the logistic distribution is shown
below:
[Figure: logistic density functions for a=0 with σ=1, σ=2 and σ=3, plotted for x from -15 to 15]
The equation, parameters and bounds of the density function are:
$$f(x \mid a, b) = \frac{e^{\frac{x-a}{b}}}{b\left(1 + e^{\frac{x-a}{b}}\right)^2} \quad \text{for } -\infty < x < \infty$$
Bounds: Unbounded
Moments - Logistic: The moments of the logistic distribution can be calculated from the
parameters as shown below:
Mean: a
Standard Deviation: $\dfrac{\pi b}{\sqrt{3}}$
Skewness: 0
Kurtosis: 4.2
[Figure: density function plot with curves for c=4 and c=5, plotted for x from 0 to 3]
The equation, parameters and bounds of the density function are:
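$$f(x \mid a, b, c) = \begin{cases} 0 & x \le a \\[4pt] \dfrac{\frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}}{\left(1 + \left(\frac{x-a}{b}\right)^{c}\right)^2} & x > a \end{cases}$$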
Moments - Loglogistic: The moments of the loglogistic distribution can be calculated from
the parameters as shown below:
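Mean: $a + b\,\dfrac{\pi}{c}\csc\!\left(\dfrac{\pi}{c}\right)$ where csc(x) = 1/sin(x)

Standard Deviation: $b\sqrt{\dfrac{2\pi}{c}\csc\!\left(\dfrac{2\pi}{c}\right) - \left(\dfrac{\pi}{c}\csc\!\left(\dfrac{\pi}{c}\right)\right)^2}$ for c>2.

Skewness: $\dfrac{2\pi^2\csc^3\!\left(\frac{\pi}{c}\right) - 6c\pi\csc\!\left(\frac{\pi}{c}\right)\csc\!\left(\frac{2\pi}{c}\right) + 3c^2\csc\!\left(\frac{3\pi}{c}\right)}{\sqrt{\pi}\left(-\pi\csc^2\!\left(\frac{\pi}{c}\right) + 2c\csc\!\left(\frac{2\pi}{c}\right)\right)^{3/2}}$

Kurtosis: $\dfrac{-3\pi^3\csc^4\!\left(\frac{\pi}{c}\right) - 12\pi c^2\csc\!\left(\frac{\pi}{c}\right)\csc\!\left(\frac{3\pi}{c}\right) + 4c^3\csc\!\left(\frac{4\pi}{c}\right) + 6c\pi^2\csc^3\!\left(\frac{\pi}{c}\right)\sec\!\left(\frac{\pi}{c}\right)}{\pi\left(-\pi\csc^2\!\left(\frac{\pi}{c}\right) + 2c\csc\!\left(\frac{2\pi}{c}\right)\right)^{2}}$ where sec(x) = 1/cos(x)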
Properties:
5.9 Lognormal Family
Shape: The lognormal family of distributions is made up of three distributions: lognormal,
negative lognormal and normal. It covers any specified average, standard deviation and
skewness. Together they form a 3-parameter family of distributions that is represented by a
curve on a skewness-kurtosis plot as shown below. The lognormal distribution covers the
positive skewness portion of the curve. The negative lognormal distribution covers the
negative skewness portion of the curve. The normal distribution handles the remaining case
of zero skewness.
[Figure: skewness-kurtosis plot showing the Negative Lognormal and Lognormal curves meeting at the Normal point]
Density Function: The density function of the lognormal distribution is shown below:
[Figure: lognormal density functions for ε=0, µ=1 with σ=0.3, σ=0.5, σ=1 and σ=2, plotted for x from 0 to 10]
The equation, parameters and bounds of the density function are:
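$$f(x \mid \varepsilon, \mu, \sigma) = \begin{cases} 0 & x \le \varepsilon \\[4pt] \dfrac{1}{(x-\varepsilon)\,\sigma\sqrt{2\pi}}\; e^{-\frac{(\ln(x-\varepsilon) - \mu)^2}{2\sigma^2}} & x > \varepsilon \end{cases}$$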
Moments: The moments of the lognormal distribution can be calculated from the
parameters as shown below:
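Mean: $\varepsilon + e^{\left(\mu + \frac{\sigma^2}{2}\right)}$
Standard Deviation: $\sqrt{e^{(2\mu + \sigma^2)}\left(e^{\sigma^2} - 1\right)}$
Skewness: $\sqrt{e^{\sigma^2} - 1}\,\left(e^{\sigma^2} + 2\right)$
Kurtosis: $3 + \left(e^{\sigma^2} - 1\right)\left(e^{3\sigma^2} + 3e^{2\sigma^2} + 6e^{\sigma^2} + 6\right)$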
Properties:
• As the skewness goes to zero, both the lognormal and negative lognormal
distributions limit to the normal distribution. This means that in some cases the
lognormal and normal distributions can be difficult to distinguish between. As a
result, some sets of data may fit both the lognormal and normal distributions.
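The limiting behavior described in this property can be seen by evaluating the lognormal moment formulas for decreasing σ. The sketch below is an illustration with a hypothetical function name `lognormal_moments`; it is not the program's code. As σ shrinks, the skewness approaches 0 and the kurtosis approaches 3, the values of the normal distribution.

```python
import math

def lognormal_moments(eps, mu, sigma):
    """Mean, standard deviation, skewness and kurtosis of the lognormal distribution
    with lower bound eps and log-scale parameters mu and sigma."""
    w = math.exp(sigma ** 2)              # e^(sigma^2)
    mean = eps + math.exp(mu + sigma ** 2 / 2)
    sd = math.sqrt(math.exp(2 * mu + sigma ** 2) * (w - 1))
    skew = math.sqrt(w - 1) * (w + 2)
    kurt = 3 + (w - 1) * (w ** 3 + 3 * w ** 2 + 6 * w + 6)
    return mean, sd, skew, kurt

for sigma in (1.0, 0.5, 0.1, 0.01):
    mean, sd, skew, kurt = lognormal_moments(0.0, 1.0, sigma)
    print(f"sigma={sigma}: skewness={skew:.4f}, kurtosis={kurt:.4f}")
```

For any fixed skewness the resulting kurtosis is larger than the gamma value 3(1 + Skewness²/2), which is why the lognormal curve lies above the gamma curve on the skewness-kurtosis plot.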
5.10 Normal Distribution
Shape: The normal distribution is a 2-parameter distribution and covers any specified
average and standard deviation. It is represented by a single point with a skewness of zero
and kurtosis of three (excess kurtosis of zero) on a skewness-kurtosis plot as shown below:
[Figure: skewness-kurtosis plot (Skewness from -3 to 3, Kurtosis from 1 to 10) showing the Normal point.]
Density Function: The density function of the normal distribution is shown below:
[Figure: normal density f(x) for µ=0 with σ = 1, 2 and 3; x from -10 to 10.]
The equation, parameters and bounds of the density function are:
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \quad \text{for } -\infty < x < \infty$$
Bounds: Unbounded
Moments: The moments of the normal distribution can be calculated from the parameters as
shown below:
Mean: µ
Standard Deviation: σ
Skewness: 0
Kurtosis: 3
Properties:
• The normal distribution is the distribution of addition and subtraction. The central
limit theorem states that as items are added and subtracted together, under certain
restrictions, the result will tend to the normal distribution.
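The central limit theorem can be illustrated with dice. The sketch below is only an illustration under stated assumptions (fair six-sided dice, hypothetical function names); it computes the exact distribution of the total of several dice and shows the kurtosis moving toward 3 as more dice are added.

```python
def sum_of_dice_pmf(n_dice, sides=6):
    """Exact probability of each total when n_dice fair dice are added."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        new_pmf = {}
        for total, prob in pmf.items():
            for face in range(1, sides + 1):
                new_pmf[total + face] = new_pmf.get(total + face, 0.0) + prob / sides
        pmf = new_pmf
    return pmf

def shape(pmf):
    """Skewness and kurtosis of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    mu3 = sum((x - mean) ** 3 * p for x, p in pmf.items())
    mu4 = sum((x - mean) ** 4 * p for x, p in pmf.items())
    return mu3 / var ** 1.5, mu4 / var ** 2

for n in (1, 2, 5, 10, 30):
    skew, kurt = shape(sum_of_dice_pmf(n))
    print(f"{n:>2} dice: skewness={skew:.4f} kurtosis={kurt:.4f}")
```

A single die is a discrete uniform distribution with a kurtosis well below 3. Because the dice are independent, the departure of the kurtosis from 3 shrinks in proportion to the number of dice added.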
5.11 Pearson Family
Shape: The Pearson family of distributions is made up of seven distributions: Type I-VII. It
covers any specified average, standard deviation, skewness and kurtosis. Together they form
a 4-parameter family of distributions that covers the entire skewness-kurtosis region other
than the impossible region. The seven types are described below.
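[Figure: skewness-kurtosis plot showing the regions covered by the Pearson types, including I, II and IV (horizontal axis: Skewness, -3 to 3).]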
Density Function – Type IV: The density function of the Type IV Pearson distribution is
shown below:
[Figure: Pearson Type IV density f(x) for ε = 0, s = 1, v = 2 with m = 3, 5 and 7; x from -2 to 2.]
The equation, parameters and bounds of the density function are:
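$$f(x \mid \varepsilon, s, v, m) = C\left(1 + \left(\frac{x-\varepsilon}{s}\right)^{2}\right)^{-m} e^{-v\,\tan^{-1}\left(\frac{x-\varepsilon}{s}\right)} \quad \text{for } -\infty < x < \infty$$
where $C = \dfrac{1}{s\,F[2m-2,\,v]}$ where $F(r, v) = e^{-\frac{1}{2}v\pi}\displaystyle\int_{0}^{\pi} e^{v\varphi}\,\sin(\varphi)^{r}\,d\varphi$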
Bounds: Unbounded
Moments – Type IV: The moments of the Type IV Pearson distribution can be calculated
from the parameters as shown below:
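Skewness: $-\dfrac{2v}{m-2}\sqrt{\dfrac{2m-3}{v^{2} + (2m-2)^{2}}}$
Kurtosis: $\dfrac{(3m-6)\,\text{Skewness}^{2} + 6m - 9}{2m-5}$
Standard Deviation: $\dfrac{4s}{\sqrt{16(r-1) - \text{Skewness}^{2}\,(r-2)^{2}}}$ where $r = \dfrac{6\,(\text{Kurtosis} - \text{Skewness}^{2} - 1)}{2\,\text{Kurtosis} - 3\,\text{Skewness}^{2} - 6}$
Mean: $\varepsilon - \dfrac{vs}{r} = \varepsilon + \text{Standard Deviation} \times \dfrac{\text{Skewness}\,(r-2)}{4}$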
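The Type IV moment formulas can be checked against a direct numerical integration of the density. This is a rough sketch under stated assumptions: s > 0, m > 2.5 so that the kurtosis exists, and r = 2m − 2. The function names are hypothetical. The integration uses the substitution θ = tan⁻¹((x − ε)/s), under which the unnormalized density becomes cos(θ)^r · e^(−vθ) on (−π/2, π/2).

```python
import math

def pearson4_moments_closed_form(eps, s, v, m):
    """Mean, std, skewness, kurtosis from the closed-form Type IV expressions."""
    r = 2 * m - 2
    skewness = -(2 * v / (m - 2)) * math.sqrt((2 * m - 3) / (v ** 2 + r ** 2))
    kurtosis = ((3 * m - 6) * skewness ** 2 + 6 * m - 9) / (2 * m - 5)
    std = 4 * s / math.sqrt(16 * (r - 1) - skewness ** 2 * (r - 2) ** 2)
    mean = eps - v * s / r
    return mean, std, skewness, kurtosis

def pearson4_moments_numeric(eps, s, v, m, steps=20000):
    """Same quantities by midpoint-rule integration over theta."""
    r = 2 * m - 2
    h = math.pi / steps
    raw = [0.0] * 5  # integrals of t**k times the unnormalized density, k = 0..4
    for i in range(steps):
        theta = -math.pi / 2 + (i + 0.5) * h
        decay = math.exp(-v * theta)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for k in range(5):
            # t**k * cos**r with t = tan(theta), written as sin**k * cos**(r - k)
            raw[k] += sin_t ** k * cos_t ** (r - k) * decay * h
    m1, m2, m3, m4 = (raw[k] / raw[0] for k in range(1, 5))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return eps + s * m1, s * math.sqrt(var), mu3 / var ** 1.5, mu4 / var ** 2

print(pearson4_moments_closed_form(0.0, 1.0, 2.0, 5.0))
print(pearson4_moments_numeric(0.0, 1.0, 2.0, 5.0))
```

The two printed tuples should agree to several decimal places. With a positive v the density leans toward negative x, so both the mean shift and the skewness come out negative.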
Density Function – Type V: The density function of the Type V Pearson distribution is
shown below:
[Figure: Pearson Type V density f(x) for ε = 0, γ = 1 (curve labeled p = 3); x from 0 to 2.]
The equation, parameters and bounds of the density function are:
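$$f(x \mid \varepsilon, \gamma, p) = \begin{cases} 0 & x \le \varepsilon \\ \dfrac{1}{\gamma^{1-p}\,\Gamma(p-1)}\,(x-\varepsilon)^{-p}\, e^{-\frac{\gamma}{x-\varepsilon}} & x > \varepsilon \end{cases}$$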
Moments – Type V: The moments of the Type V Pearson distribution can be calculated
from the parameters as shown below:
Mean: $\varepsilon + \dfrac{\gamma}{p-2}$ assuming p > 2
Standard Deviation: $\dfrac{\gamma}{(p-2)\sqrt{p-3}}$ assuming p > 3
Skewness: $\dfrac{4\sqrt{p-3}}{p-4}$ assuming p > 4, which is always > 0
Kurtosis: $\dfrac{3(p+4)(p-3)}{(p-4)(p-5)}$ assuming p > 5
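Each Type V moment exists only above its own threshold for p. A small sketch of how the formulas and their conditions fit together follows; the function name is hypothetical, and returning `None` for a moment that does not exist is an illustrative choice, not the behavior of Distribution Analyzer.

```python
import math

def pearson5_moments(eps, gamma, p):
    """Type V moments; a moment is None when p is too small for it to exist."""
    mean = eps + gamma / (p - 2) if p > 2 else None
    std = gamma / ((p - 2) * math.sqrt(p - 3)) if p > 3 else None
    skewness = 4 * math.sqrt(p - 3) / (p - 4) if p > 4 else None
    kurtosis = 3 * (p + 4) * (p - 3) / ((p - 4) * (p - 5)) if p > 5 else None
    return mean, std, skewness, kurtosis

print(pearson5_moments(0.0, 1.0, 7.0))   # all four moments exist
print(pearson5_moments(0.0, 1.0, 3.5))   # only the mean and standard deviation exist
```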
Density Function – Type VI: The density function of the Type VI Pearson distribution is
shown below:
[Figure: Pearson Type VI density f(x) for ε = 0, s = 1, q2 = 1 with q1 = 3, 5 and 7; x up to 4.]
The equation, parameters and bounds of the density function are:
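$$f(x \mid \varepsilon, s, q_2, q_1) = \text{Constant}\,\left(\frac{x-\varepsilon}{s} - 1\right)^{q_2} \left(\frac{x-\varepsilon}{s}\right)^{-q_1} \quad \text{for } \frac{x-\varepsilon}{s} > 1$$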
Moments – Type VI: The moments of the Type VI Pearson distribution can be calculated
from the parameters as shown below:
Mean: $\dfrac{s\,(q_1 - 1)}{q_1 - q_2 - 2} + \varepsilon$
Standard Deviation: $\sqrt{\dfrac{s^{2}\,(q_1 - 1)(q_2 + 1)}{(q_1 - q_2 - 3)(q_1 - q_2 - 2)^{2}}}$
Skewness: $\mathrm{Sign}[s]\,\dfrac{2\,(q_1 + q_2)}{(q_1 - q_2 - 4)\sqrt{\dfrac{(q_1 - 1)(q_2 + 1)}{q_1 - q_2 - 3}}}$
Kurtosis: $\dfrac{3\,(q_1 - q_2 - 3)\left(4 - 5q_1 + 3q_1^{2} + 5q_2 - 2q_1 q_2 + q_1^{2} q_2 + 3q_2^{2} - q_1 q_2^{2}\right)}{(q_1 - 1)(q_2 + 1)(q_1 - q_2 - 4)(q_1 - q_2 - 5)}$
5.12 Uniform Distribution
Shape: The uniform distribution is a 2-parameter distribution and covers any specified
average and standard deviation. It is represented by a single point with a skewness of zero
and kurtosis of 1.8 (excess kurtosis of -1.2) on a skewness-kurtosis plot as shown below:
[Figure: skewness-kurtosis plot (Skewness from -3 to 3, Kurtosis from 1 to 10) showing the Logistic, Normal and Uniform points.]
Density Function: The density function of the uniform distribution is shown below:
[Figure: uniform density f(x); x from -0.5 to 1.5.]
The equation, parameters and bounds of the density function are:
$$f(x \mid m, r) = \begin{cases} \dfrac{1}{r} & m - \dfrac{r}{2} \le x \le m + \dfrac{r}{2} \\ 0 & \text{otherwise} \end{cases}$$
Moments: The moments of the uniform distribution can be calculated from the parameters
as shown below:
Mean: $m$
Standard Deviation: $\dfrac{r}{\sqrt{12}}$
Skewness: 0
Kurtosis: 1.8
Properties:
Glossary
Alpha Level See p-value.
Both the Shapiro-Wilks test (p-value = 0.1311) and the
Skewness-Kurtosis All test (p-value = 0.9930) pass this set of data.
The Shapiro-Wilks test is also affected by ties, but not nearly as
badly as the Anderson-Darling test. The Skewness-Kurtosis All test
is not affected by ties and is thus the default test.
$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where n is the sample size and xi represents the data points.
Bi-modal Data Histograms can appear to have multiple peaks (modes). Such
data is called bi-modal or multi-modal. As a result, it is likely to
fail a normality test.
Finally, distributions like the beta distribution have both upper and
lower bounds. They range from the lower to the upper bound.
Bounds, Physical Physical bounds for the data occur when the data is restricted to
certain values. For example, the radius is bounded below by zero
as only positive values can result. Yield results are bounded
below by 0% and above by 100%. Speed, if the object can move
in two directions, is unbounded. Most real data is bounded.
Bounded data can be more difficult to fit a distribution to. To aid
in fitting a distribution, it is often helpful to pre-transform the
data. Physical bounds, if they exist, can be specified along with
the data in the Data window.
Capability Index Capability indexes, like Pp and Ppk, are measures of how well the
data fits within the specification limit(s) associated with
Statistical Process Control (SPC). The higher the values, the
better the fit. They are used by variables sampling plans to make
pass/fail decisions.
Child Window A sizable window displayed in the interior of the main or parent
window. Distribution Analyzer has a permanent Data window
and allows multiple Test Distribution and Skewness-Kurtosis Plot
windows, which are all child windows.
Plots and tables from the Test Distribution window are placed into
the clipboard in Windows Metafile (picture) format. They can
be pasted into Word. In order to edit them in Word, they must
first be converted. To do this, right-click the graphic
and select the Edit Picture menu item.
Confidence Level Confidence levels are associated with normal tolerance intervals,
variables sampling plans, tests of fit, tests of group differences
and tests if there was a shift over time (order). The confidence
statement can be thought of as representing the probability the
statement or conclusion is correct. By default, confidence levels
of 95% are used. You can adjust the confidence level used for
normal tolerance intervals and variables sampling plans using the
Analysis Options dialog box or Tolerance Interval Options dialog
box.
For tests of fit (normality tests), the confidence level is based on
the significance level, alpha-level or p-value. The p-value is the
probability that data as extreme as the observed data, or more
extreme, would have been generated had the data come from the
selected distribution. A p-value of 0.05 would indicate that the
chance of observing such data due to variation alone is low, 1 in 20.
This is good evidence that the data is not from the selected
distribution. A p-value of 0.5 would indicate that there is a 50-50
chance of something as extreme as the observed data, assuming the
selected distribution. This is consistent with the selected
distribution. The smaller the p-value, the greater the evidence that
the data did not come from the selected distribution.
For tests of fit and other tests, the confidence level is calculated
from the p-value as 100*(1 - p-value). Therefore:
[Figure: normal density f(x) for µ=0 with σ = 1, 2 and 3; x from -10 to 10.]
are the smallest extreme value family of distributions (including
the Weibull distribution) and largest extreme value family of
distributions. However, there are many other distributions that
might fit your data. The distributions and families of distributions
included in Distribution Analyzer are:
Beta Distribution
Exponential Distribution
Negative Exponential Distribution
Gamma Family of Distributions
Johnson Family of Distributions
Largest Extreme Value Family of Distributions
Loglogistic Family of Distributions
Lognormal Family of Distributions
Normal Distribution
Pearson Family of Distributions
Smallest Extreme Value Family of Distributions
Uniform Distribution
For every distribution there is a transformation that makes data
from that distribution fit the normal distribution. Identifying the
distribution that fits the data is identical to identifying the
transformation that makes the data fit the normal distribution.
Excess Kurtosis The excess kurtosis = kurtosis - 3. This results in the normal
distribution having an excess kurtosis of zero. An excess kurtosis
above 0 indicates the tails are heavier than the normal
distribution. An excess kurtosis below 0 indicates the tails are
lighter than the normal distribution. An excess kurtosis value of
1 and above or -1 and below represents a sizable departure from
normality. The formula used for estimating the excess kurtosis
from a set of data is:
$$\frac{n(n+1)}{(n-1)(n-2)(n-3)}\;\frac{\sum_{i=1}^{n}(x_i - X)^{4}}{S^{4}} \;-\; \frac{3(n-1)^{2}}{(n-2)(n-3)}$$
where n is the sample size, xi represents the data points, X is the
average and S is the standard deviation.
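The estimator above translates directly into code. A minimal sketch, assuming the data is a plain list of numbers and using the hypothetical function name `sample_excess_kurtosis`:

```python
def sample_excess_kurtosis(data):
    """Excess kurtosis estimate from the glossary formula (needs at least 4 points)."""
    n = len(data)
    if n < 4:
        raise ValueError("at least 4 data points are required")
    avg = sum(data) / n
    s_squared = sum((x - avg) ** 2 for x in data) / (n - 1)   # S^2, so S^4 = s_squared**2
    fourth = sum((x - avg) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth / s_squared ** 2
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Evenly spaced values behave like a uniform distribution (light tails).
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))
```

For the five evenly spaced values the estimate works out to -1.2, the excess kurtosis of the uniform distribution. Adding 3 to the result gives the corresponding kurtosis estimate. Data in which all values are identical has S = 0 and cannot be evaluated.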
Exponential Distribution See Section 5.3.
There are two families that allow one to fit all possible average,
standard deviation, skewness and kurtosis values (excluding the
impossible region): Johnson, combining 3 distributions, and
Pearson, combining 7 distributions.
Groups Groups are categories that the data can be divided into. Examples
are: operator producing or testing the unit, line the unit was
manufactured on and the cavity that the unit was made in. When
the values fall into such groups, there is the possibility that the
groups are different. Even if each group fits the normal distribution,
if some groups are shifted left or right relative to other
groups, the resulting histogram can appear to have multiple peaks
(modes). Such data is called bi-modal or multi-modal. As a
result, it is likely to fail a normality test.
there should be statistical evidence the groups are different. The
group each value is from should be entered in the Group column
of the Data window. Then a comparison of the groups is
automatically performed and displayed in Tab 6 of the Test
Distribution window.
The difference between groups and ordered data is that groups are
categories that are unordered whereas ordered data consists of
subgroups that are ordered relative to time. Both can produce
multi-modal data.
[Figure: density function plot; x from -10 to 10.]
A kurtosis greater than 3 means the tails are heavier than the
normal distribution. Having more units in the extreme tails also
means there must be more units near the middle of the
distribution. Such distributions appear to have a very high
peak in the middle with wide plateaus for tails. An example is the
logistic distribution with a kurtosis of 4.2 shown below:
[Figure: logistic distribution density f(x); x from -15 to 15.]
A kurtosis less than 3 means the tails are lighter than the normal
distribution like the Uniform distribution with a kurtosis of 1.8
shown below:
[Figure: uniform distribution density f(x); x from -0.5 to 1.5.]
$$\frac{n(n+1)}{(n-1)(n-2)(n-3)}\;\frac{\sum_{i=1}^{n}(x_i - X)^{4}}{S^{4}} \;-\; \frac{3(3n-5)}{(n-2)(n-3)}$$
where n is the sample size, xi represents the data points, X is the
average and S is the standard deviation.
Levene’s Test Levene's test checks to see if the standard deviations of the
groups are different. This analysis assumes the data within the
groups fits the normal distribution.
Lognormal Family of Distributions See Section 5.9.
This approach has been further adapted so that the user can
specify a range of values that must be in the range of the selected
distribution. This range can be specified using the Select
Distribution to Fit Data dialog box. When the Find Best
Distribution button is clicked in the Data window, the required
range is automatically specified as at least 1 standard deviation
beyond any spec limits. This assures the spec limits are also
within the range of the distribution and can be transformed.
Analyzer reduces the number of moments that are matched and
instead matches the bounds.
This approach has been further adapted so that the user can
specify a range of values that must be in the range of the selected
distribution. This range can be specified using the Select
Distribution to Fit Data dialog box. When the Find Best
Distribution button is clicked in the Data window, the required
range is automatically specified as at least 1 standard deviation
beyond any spec limits. This assures the spec limits are also
within the range of the distribution and can be transformed.
Multi-Modal Data Data that has multiple peaks in the histogram. Data with 2 peaks
is often referred to as bimodal (See Bi-Modal).
Negative Exponential Distribution See Section 5.3.
It is commonly desired to make a confidence statement like:
"With 95% confidence, 99% of the values are in spec." One way
of accomplishing this goal is to construct a normal tolerance
interval. If this interval falls inside the specs, then the same
confidence statement can be made relative to the spec limits.
This is a valid approach. A similar, but slightly more powerful
approach is to use a variables sampling plan.
Ordered Data Ordered data is when data is collected over time. It may be that 3
samples are collected at 10 points in time. The groups of 3
samples are referred to as subgroups. It may be that the
underlying data fits the normal distribution, but if a shift occurs in
the middle of collecting the data, the resulting histogram can
appear to have multiple peaks (modes). Such data is called bi-
modal or multi-modal. As a result, it is likely to fail a normality
test.
The difference between groups and ordered data is that groups are
categories that are unordered whereas ordered data consists of
subgroups that are ordered relative to time. Both can produce
multi-modal data.
Outlier A true outlier is a point that is not from the same distribution as
the other values, but instead something happened to it (typically
an error) to make it different than the other values. For example,
consider a filling operation where bags are filled with a solution.
Bags are consistently in the 50 to 55 mL range. However,
occasionally a half-filled bag is found (15-35 mL). An
investigation into the cause identified these bags were being
removed from the filling nozzle before the cycle was completed.
As a result, some of the solution missed going into the bag. An
outlier can be the result of either a manufacturing error or a
measurement error.
relative to the normal distribution, it must generally be at least 4.5
standard deviations from the average. In Tab 7: Outliers of the
Test Distribution window values are flagged as definite outliers if
they are 10 or more standard deviations from the average.
Between 4.5 and 10 standard deviations they are flagged as either
outliers from the normal distribution or extreme values from a
long-tailed distribution.
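The two thresholds can be expressed as a simple classification on the number of standard deviations from the average. The sketch below is an illustration only: the function name and the returned labels are hypothetical, and the average and standard deviation are supplied by the caller because the way Distribution Analyzer estimates them for this purpose is not described here.

```python
def classify_by_z_score(value, average, std_dev):
    """Flag a value by how many standard deviations it lies from the average."""
    z = abs(value - average) / std_dev
    if z >= 10:
        return "definite outlier"
    if z >= 4.5:
        return "outlier or extreme value from a long-tailed distribution"
    return "not flagged"

# Hypothetical filling process: average 52.5 mL, standard deviation 1.0 mL.
for volume in (53.0, 58.0, 25.0):
    print(volume, classify_by_z_score(volume, 52.5, 1.0))
```

With these assumed process values, a half-filled bag of 25 mL lies far more than 10 standard deviations below the average and is flagged as a definite outlier.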
The smaller the p-value, the greater the evidence that the data did
not come from the selected distribution. For tests of fit and other
tests, the confidence level is calculated from the p-value as
100*(1 - p-value). Therefore:
standard deviation and skewness of the distribution. Such
distributions are represented as a curve on a skewness-kurtosis
plot as the kurtosis depends on the skewness. Examples are the
gamma and lognormal distributions.
$$Pp = \frac{\text{Upper Spec Limit} - \text{Lower Spec Limit}}{6\ \text{Standard Deviations}}$$
The numerator is the width of the spec limits. The denominator is
6 standard deviations, which can be thought of as the width of the
process. For the normal distribution 99.7% of values fall within
±3 standard deviations of the average or into an interval 6
standard deviations wide. A Pp value of 1 means the process
variation fills the spec limits. A Pp of 2 means the specs are twice
as wide as the process. The larger Pp is, the better the capability.
over time and uses the within subgroup standard deviation,
ignoring the effects of shifts between subgroups. Pp estimates
actual performance, while Cp estimates the capability the process
could achieve if made stable over time.
Ppk Capability index that measures the relative distance to the nearest
spec limit:
$$Ppk = \text{Minimum}\left(\frac{\text{Upper Spec Limit} - \text{Average}}{3\ \text{Standard Deviations}},\ \frac{\text{Average} - \text{Lower Spec Limit}}{3\ \text{Standard Deviations}}\right)$$
The numerator is the distance to the nearest spec limit. For a one-
sided spec limit (lower spec only or upper spec only) only use the
portion of the formula for that spec limit. The denominator is 3
standard deviations, which can be thought of as half the width
of the process. For the normal distribution 99.7% of values fall
within ±3 standard deviations of the average or into an interval 6
standard deviations wide. A Ppk value of 1 means the distance
between the average and the nearest spec limit is 3 standard
deviations and thus the process fills this interval and touches the
spec limit. A Ppk value of 2 means the distance between the
average and the nearest spec limit is 6 standard deviations and
thus the process fills only half this interval. This leaves a safety
margin. The larger Ppk is, the better the capability.
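The Pp and Ppk formulas, including the one-sided case, can be sketched as follows. The function names are hypothetical, and the inputs are the rounded Torque summary values shown at the front of this guide (LSL = 17, USL = 37, average 26.0, standard deviation 3.3), used here only as an illustration.

```python
def pp(std_dev, lsl, usl):
    """Pp: width of the spec limits divided by 6 standard deviations."""
    return (usl - lsl) / (6 * std_dev)

def ppk(average, std_dev, lsl=None, usl=None):
    """Ppk: distance to the nearest spec limit divided by 3 standard deviations.

    For a one-sided spec, pass only the limit that exists.
    """
    candidates = []
    if usl is not None:
        candidates.append((usl - average) / (3 * std_dev))
    if lsl is not None:
        candidates.append((average - lsl) / (3 * std_dev))
    if not candidates:
        raise ValueError("at least one spec limit is required")
    return min(candidates)

print(round(pp(3.3, 17, 37), 2))
print(round(ppk(26.0, 3.3, lsl=17, usl=37), 2))
print(round(ppk(26.0, 3.3, lsl=17), 2))   # lower spec only
```

The Pp from these inputs rounds to 1.01. The Ppk computed from the rounded inputs is about 0.91, slightly different from the 0.90 reported for the Torque example, presumably because the displayed average and standard deviation are themselves rounded.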
limits.
When both a lower bound (LB) and upper bound (UB) are
specified, the data can be pre-transformed using the
transformation:
$$\mathrm{Ln}\left(\frac{X - LB}{UB - X}\right)$$
This transforms the range from LB to UB to -infinity to infinity.
A value equal to LB cannot be transformed by this equation.
Such a value is replaced with LB + Precision/4 before performing
the pre-transform. Likewise, a value equal to UB cannot be
transformed by this equation. Such a value is replaced with UB -
Precision/4 before performing the pre-transform.
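The two-bound pre-transform and its handling of values that sit exactly on a bound can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, `precision` stands for the Precision value referred to above, and rejecting values outside the bounds is a choice made for this sketch.

```python
import math

def pre_transform(x, lb, ub, precision):
    """Ln((X - LB) / (UB - X)), nudging values on a bound inward by Precision/4."""
    if x < lb or x > ub:
        raise ValueError("value lies outside the physical bounds")
    if x == lb:
        x = lb + precision / 4
    elif x == ub:
        x = ub - precision / 4
    return math.log((x - lb) / (ub - x))

# Yield data bounded by 0% and 100%, assumed to be recorded to the nearest 0.1%.
for yield_pct in (0.0, 25.0, 50.0, 99.9, 100.0):
    print(yield_pct, pre_transform(yield_pct, 0.0, 100.0, 0.1))
```

The midpoint of the range maps to 0, values near LB map to large negative numbers and values near UB map to large positive numbers, so the bounded range is stretched over the whole real line.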
Skewness The third moment of a distribution and the first shape parameter.
The skewness is a measure of the symmetry of the distribution. A
skewness of zero means the distribution is symmetrical like the
normal distribution shown below:
[Figure: normal distribution density f(x); x from -10 to 10.]
A positive skewness means the upper tail is longer than the lower
tail like the Largest Extreme Value distribution with a skewness
of 1.14 shown below:
[Figure: Largest Extreme Value distribution density f(x); x from -6 to 6.]
A negative skewness means the lower tail is longer than the upper
tail like the Smallest Extreme Value distribution with a skewness
of -1.14 shown below:
[Figure: Smallest Extreme Value distribution density f(x); x from -6 to 6.]
$$\frac{n}{(n-1)(n-2)}\;\frac{\sum_{i=1}^{n}(x_i - X)^{3}}{S^{3}}$$
where n is the sample size, xi represents the data points, X is the
average and S is the standard deviation.
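The skewness estimator can be sketched in the same way as the other sample statistics. The function name `sample_skewness` is hypothetical and the data is assumed to be a plain list of numbers.

```python
import math

def sample_skewness(data):
    """Skewness estimate from the glossary formula (needs at least 3 points)."""
    n = len(data)
    if n < 3:
        raise ValueError("at least 3 data points are required")
    avg = sum(data) / n
    s = math.sqrt(sum((x - avg) ** 2 for x in data) / (n - 1))   # standard deviation S
    third = sum((x - avg) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * third / s ** 3

print(sample_skewness([1, 2, 3, 4, 5]))    # symmetric data
print(sample_skewness([1, 2, 3, 4, 10]))   # one large value stretches the upper tail
```

Symmetric data gives an estimate of essentially zero, while a single large value in the upper tail produces a clearly positive skewness.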
It is the default test because it is not affected by ties, unlike the
Anderson-Darling and Shapiro-Wilks tests, which both are.
[Figure: skewness-kurtosis plot; horizontal axis: Skewness, -3 to 3.]
positive kurtosis.
Two-Sided Spec Limit Only: Rejects if either positive or
negative skewness or a positive kurtosis.
The Skewness-Kurtosis test does not give a p-value but instead
just indicates pass/fail. If you fail, you can state with 95%
confidence the data is not from the normal distribution as before.
$$\sqrt{\frac{\sum_{i=1}^{n}(x_i - X)^{2}}{n-1}}$$
where n is the sample size, xi represents the data points and X is
the average.
normality. They answer the question: "Does the data fit the
normal distribution?"
One has two options for using variables sampling plans. First,
you can use tables of variables sampling plans to determine the
acceptance criteria for Pp and Ppk. Distribution Analyzer then
calculates and displays these capability indexes, allowing a
pass/fail decision to be made. Second, you can state an
acceptance criterion like "With 95% confidence more than 99% of
values must be in spec." Then use Distribution Analyzer to
construct the confidence statement relative to the spec limits to
see if the study passes.
References
D’Agostino, Ralph B. and Stephens, Michael A. (1986). Goodness of Fit Techniques.
Marcel Dekker, Inc., New York, NY.
Elderton, William Palin (1953). Frequency Curves and Correlation. Harren Press,
Washington, D.C.
Johnson, Norman L. et al. (1994). Continuous Univariate Distributions, Volumes 1 & 2,
Second Edition. John Wiley & Sons, New York, NY.
Rose, Colin and Smith, Murray D. (2001). Mathematical Statistics with Mathematica.
Springer-Verlag, New York, NY.
Shapiro, Samuel S. (1990). How to Test Normality and Other Distributional Assumptions.
American Society for Quality, Milwaukee, WI.
Thode Jr., Henry C. (2002). Testing for Normality. Marcel Dekker, Inc., New York, NY.