Design of Experiments Primer
Understanding the terms and concepts that are part of a DOE can help practitioners be better prepared to
use the statistical tool.
By K. Sundararajan
Design of experiments (DOE) is a systematic method to determine the relationship between factors
affecting a process and the output of that process. In other words, it is used to find cause-and-effect
relationships. This information is needed to manage process inputs in order to optimize the output.
An understanding of DOE first requires knowledge of some statistical tools and experimentation concepts.
Although a DOE can be analyzed in many software programs, it is important for practitioners to understand
basic DOE concepts for proper application.
Controllable input factors, or x factors, are those input parameters that can be modified in an
experiment or process. For example, in cooking rice, these factors include the quantity and quality of the
rice and the quantity of water used for boiling.
Uncontrollable input factors are those parameters that cannot be changed. In the rice-cooking
example, this may be the temperature in the kitchen. These factors need to be recognized to understand
how they may affect the response.
Responses, or output measures, are the elements of the process outcome that gauge the desired
effect. In the cooking example, the taste and texture of the rice are the responses.
The controllable input factors can be modified to optimize the output. The relationship between the factors
and responses is shown in Figure 1.
Hypothesis testing helps determine the significant factors using statistical methods. There are two
possibilities in a hypothesis statement: the null and the alternative. The null hypothesis states that the
status quo holds (the factor has no effect); the alternative hypothesis holds when the status quo is not
valid (the factor does have an effect). Testing is done at a level of significance, which is based on a
probability.
Blocking and replication: Blocking is an experimental technique to avoid any unwanted variations
in the input or experimental process. For example, an experiment may be conducted with the same
equipment to avoid any equipment variations. Practitioners also replicate experiments, performing the
same combination run more than once, in order to get an estimate for the amount of random error that
could be part of the process.
Interaction: When an experiment has two or more factors, an interaction is a situation in which the
effect of one factor on the response depends on the level of another factor; that is, their combined
influence is not additive.
A Simple One-factor Experiment
The comparison of two or more levels in a factor can be done using an F-test. This compares the variance
of the means of the different factor levels with the average of the variances within the levels, using this
equation:

F = n·s²(Ȳ) / s²(pooled)

where:
n = the sample size at each level
s²(Ȳ) = the variance of the level means, calculated by dividing the sum of squared deviations of the level
means from the grand mean by the degrees of freedom
s²(pooled) = the pooled variance, or the average of the within-level variances

This is similar to the signal-to-noise ratio used in electronics. If the value of F (the test statistic) is greater
than the F-critical value, there is a significant difference between the levels; at least one level is giving a
response different from the others. Caution is also needed to ensure that s²(pooled) is kept to a
minimum, as it is the noise or error term. If the F value is high, the probability (p-value) will fall below 0.05,
indicating that there is a significant difference between levels. The value of 0.05 is a typical accepted risk
value.
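To make the formula concrete, here is a minimal sketch in Python using hypothetical data (three levels of
a single factor, five observations per level; the numbers are illustrative only):

    from statistics import mean, variance

    # Hypothetical data: three levels of one factor, n = 5 observations per level.
    levels = [
        [10, 12, 11, 13, 12],
        [14, 15, 16, 14, 15],
        [11, 10, 12, 11, 13],
    ]
    n = len(levels[0])

    level_means = [mean(lvl) for lvl in levels]
    s2_ybar = variance(level_means)                    # variance of the level means
    s2_pooled = mean(variance(lvl) for lvl in levels)  # average within-level variance

    F = n * s2_ybar / s2_pooled
    print(f"F = {F:.2f}")  # compare against the F-critical value for (2, 12) df

If F exceeds the critical value, at least one level mean differs from the others.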
As an example of a one-factor experiment, data from an incoming shipment of a product is given in Table
1.
Table 1: Lot Data
Lot A: 61, 61, 57, 56, 60, 52, 62, 59, 62, 67, 55, 56, 52, 60, 59, 59, 60, 59, 49, 42, 55, 67, 53, 66, 60
Lot B: 56, 56, 61, 67, 58, 63, 56, 60, 55, 46, 62, 65, 63, 59, 60, 60, 59, 60, 65, 65, 62, 51, 62, 52, 58
Lot C: 62, 62, 72, 63, 51, 65, 62, 59, 62, 63, 68, 64, 67, 60, 59, 59, 61, 58, 65, 64, 70, 63, 68, 62, 61
Lot D: 70, 70, 50, 68, 71, 65, 70, 73, 70, 69, 64, 68, 65, 72, 73, 75, 72, 75, 64, 69, 60, 68, 66, 69, 72
When a practitioner completes an analysis of variance (ANOVA), the results shown in Table 2 are
obtained (the between-groups row follows by subtraction from the totals):

Table 2: ANOVA Results
Source            SS         df    MS         F
Between groups    1,601.16    3    533.72     21.17
Within groups     2,419.84   96     25.2067
Total             4,021.00   99
Statistical software can provide hypothesis testing and give the actual value of F. If the value is below the
critical F value, a value based on the accepted risk, then the null hypothesis is not rejected. Otherwise, the
null hypothesis is rejected to confirm that there is a relationship between the factor and the response.
Table 2 shows that F is large, well above the critical value, so there is significant variation in the data. The
practitioner can conclude that there is a difference among the lot means.
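As a check on Table 2, here is a minimal sketch using Python's SciPy library and the lot data exactly as
listed in Table 1:

    from scipy import stats

    # Lot data from Table 1
    a = [61, 61, 57, 56, 60, 52, 62, 59, 62, 67, 55, 56, 52, 60, 59, 59, 60, 59, 49, 42, 55, 67, 53, 66, 60]
    b = [56, 56, 61, 67, 58, 63, 56, 60, 55, 46, 62, 65, 63, 59, 60, 60, 59, 60, 65, 65, 62, 51, 62, 52, 58]
    c = [62, 62, 72, 63, 51, 65, 62, 59, 62, 63, 68, 64, 67, 60, 59, 59, 61, 58, 65, 64, 70, 63, 68, 62, 61]
    d = [70, 70, 50, 68, 71, 65, 70, 73, 70, 69, 64, 68, 65, 72, 73, 75, 72, 75, 64, 69, 60, 68, 66, 69, 72]

    # One-way ANOVA: F should match the ratio of between- to within-group MS in Table 2
    f_value, p_value = stats.f_oneway(a, b, c, d)
    print(f"F = {f_value:.2f}, p = {p_value:.3g}")

A p-value far below 0.05 confirms the conclusion that the lot means differ.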
A Two-Level Factorial Experiment
For an example of a two-level factorial design, consider the cake-baking process. Three factors are
studied: the brand of flour, the baking temperature and the baking time. The associated lows and highs
of these factors are listed in Table 3.
The output responses considered are “taste” and “crust formation.” Taste was determined by a panel of
experts, who rated the cake on a scale of 1 (worst) to 10 (best); the ratings were averaged and multiplied
by 10. Crust formation was measured by the weight of the crust; the lower, the better.
Analysis of the results is shown in Table 5. Figures 2 through 4 show the average taste scores for each
factor as it changes from low to high levels. Figures 5 through 7 are interaction plots; they show the effect
of the combined manipulation of the factors.
Table 5: ANOVA With Effects and Contrasts

Factor                df    SS        MS       F        Effect    Contrast    p      F-crit at 1%
Brand                 1     2.0       2.0      0.0816   -1.0      -4.00       0.82   16.47
Time                  1     840.5     840.5    34.306   -20.5     -82.00      0.11
Brand x time          1     0.5       0.5      0.0204   0.5       2.00        0.91
Temp                  1     578.0     578.0    23.592   -17.0     -68.00      0.13
Brand x temp          1     72.0      72.0     2.9388   -6.0      -24.00      0.34
Time x temp           1     924.5     924.5    37.735   -21.5     -86.00      0.10
Brand x time x temp   1     24.5      24.5     1.0000   -3.5      -14.00      0.50
Error                 1     24.5      24.5
Total                 7     2,442.0
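The Effect, Contrast and sum-of-squares (SS) columns in Table 5 are linked by two identities for an
eight-run two-level factorial: effect = contrast / (N/2) and SS = contrast² / N, with N = 8. The short Python
check below reproduces the Effect and SS columns from the published contrasts:

    # Contrasts as published in Table 5 (N = 8 runs)
    N = 8
    contrasts = {
        "Brand": -4.0,
        "Time": -82.0,
        "Brand x time": 2.0,
        "Temp": -68.0,
        "Brand x temp": -24.0,
        "Time x temp": -86.0,
        "Brand x time x temp": -14.0,
    }
    for name, c in contrasts.items():
        effect = c / (N / 2)   # e.g., Time: -82 / 4 = -20.5
        ss = c ** 2 / N        # e.g., Time: (-82)**2 / 8 = 840.5
        print(f"{name:20s} effect = {effect:6.1f}   SS = {ss:7.1f}")

These reproduce the Effect and SS columns of Table 5 exactly.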
Figure 2: Average Taste Scores for Low and High Flour Brand Levels
Figure 3: Average Taste Scores for Low and High Bake Time (Minutes) Levels
Figure 4: Average Taste Scores for Low and High Baking Temperature (C) Levels
In this case the actual F values for the three factors (brand, time and temperature) are below the critical F
value at 1 percent (16.47). This shows that these are not significant factors for crust formation in the
cake. If further optimization of the crust formation is needed, other factors, such as the quantity of
ingredients in the cake (eggs, sugar and so on), should be checked.
Reference
1. Mark J. Anderson and Patrick J. Whitcomb, DOE Simplified: Practical Tools for Effective Experimentation (Productivity Inc., 2000).
About the Author: K. Sundararajan is the regional quality assurance director, greater Asia, for
International Flavours and Fragrances Inc. He is a Black Belt and has been a trainer and practicing quality
professional for more than 20 years. He can be reached at [email protected].
Most Practical DOE Explained (Free Template!)
For purposes of learning, using, or teaching Design of Experiments (DOE), one can argue that an eight run
array is the most practical and universally applicable array that can be chosen.
By Kim Niles
There are several forms of and names given to the various types of these eight-run arrays (e.g., 2^3 Full
Factorial, Taguchi L8, 2^4-1 Half Fraction, Plackett-Burman 8-run, etc.), but they are all very similar.
A free Microsoft Excel spreadsheet with a 2^3 Full Factorial array showing the mathematical calculations
accompanies this article (click below to download it). Generic steps for using the spreadsheet, precautions,
and additional advice are included below.
1. Determine the acceptance criteria you need (i.e., the acceptable alpha error or confidence level for
determining what you will accept as passing criteria). This is typically alpha = .05, or 95% confidence.
2. Pick 2-3 factors to be tested and assign them to columns A, B, and C as applicable.
3. Pick 2 different test levels for each of the factors you picked (i.e., low/high, on/off, etc.).
4. Determine the number of samples per run (the spreadsheet has room for 1-8 only; sample size affects
normality and the sensitivity of the effect estimates).
5. Randomize the run order to the extent possible.
6. Run the experiment and collect data. Keep track of everything you think could be important (i.e.,
people, material lot numbers, etc.), and keep all other possible control factors as constant as possible
while you run.
7. Analyze the data by entering it into the yellow boxes of the spreadsheet and reading the results. A
review of the ANOVA table will show you those effects that meet the acceptance criteria established in
step one. If the alpha (p) value shown in the table is greater than your acceptance criterion, the effect is
not statistically significant; if it is less, it is significant. The higher the confidence, the higher the
probability that the factor is statistically different from the others. Signal-to-noise measurements are also
provided. (A sketch of the underlying effect calculations appears after this list.)
8. Confirm your results by performing a separate test, another DOE, or in some other way before fully
accepting any results. You may want to more closely define results that are close to your acceptance
criteria by retesting the factor using larger differences between the levels.
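As referenced in step 7, the sketch below (in Python, with hypothetical response values standing in for
real data) shows the effect calculations that the spreadsheet performs for a 2^3 full factorial:

    import itertools

    # Sign table for a 2^3 full factorial: 8 runs of (A, B, C) at coded levels -1/+1
    runs = list(itertools.product([-1, 1], repeat=3))

    # Hypothetical responses, one per run, in the same order as `runs`
    y = [54, 62, 50, 58, 66, 74, 60, 70]

    # Build columns for main effects and all interactions
    labels = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
    columns = [(a, b, c, a*b, a*c, b*c, a*b*c) for a, b, c in runs]

    for i, name in enumerate(labels):
        signs = [row[i] for row in columns]
        contrast = sum(s * yi for s, yi in zip(signs, y))
        print(f"{name:3s} effect = {contrast / 4:6.2f}")  # effect = contrast / (N/2)

Effects that stand well above the noise (as judged by the spreadsheet's ANOVA) are the ones worth
acting on.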
Additional Advice
How DOEs Work
Note that in the eight-run array, each factor is tested at its high level in four runs and at its low level in the
other four. All eight data points therefore contribute to the comparison of high versus low (4 high + 4 low
= 8 relative to each level), and the same holds for every factor and for the interactions among the three
factors. Therefore, using this balanced multifactor DOE array, our eight-run test becomes the statistical
equivalent of a 96-run, one-factor-at-a-time (OFAT) test [(8 A-high) + (8 A-low) + (8 B-high) + (8 B-low) +
(8 C-high) + (8 C-low) + (8 AB-high) + (8 AB-low) + (8 AC-high) + (8 AC-low) + (8 BC-high) + (8 BC-low)].
Other advantages of using DOE include the ability to use statistical software to make predictions about
any combination of the factors in between and slightly beyond the tested levels, and to generate various
types of informative two- and three-dimensional plots. As a result, DOEs produce orders of magnitude
more information than OFAT tests at the same or lower test cost.
DOEs don't directly compare results against a control or standard. They evaluate all effects and
interactions and determine whether there are statistically significant differences among them. They also
calculate statistical confidence levels for each measurement. A large measured effect should not be
believed if the statistical confidence for that measurement is low. On the other hand, a small measured
effect with high confidence tells us, with assurance, that the effect really isn't important.
Precautions
Since this array has less power than larger ones, remember that when optimizing a process that isn't
critical to human safety, using test results with a low confidence level can often be much better than not
knowing which way to go with machine settings. Assuming the error in the experiment is evenly
distributed (randomly distributed error), a confidence level of 60% (as measured via the DOE), for
example, might seem terrible, but it is arguably equivalent to 80%, since the 40% that we are unsure of
could go either way [60% + (40% / 2) = 80%].
The accompanying spreadsheet cannot easily be changed. It is best used while training others (it shows
the math), or when you want to perform a quick experiment and are away from statistical software. In its
current form it does not support replication or center points (center points increase statistical confidence
by improving the estimate of error in the experiment).
About the Author: Kim Niles has more than 17 years process control and improvement experience
working with San Diego manufacturing companies in a wide range of industries and disciplines. Currently
an officer in three professional societies, Mr. Niles has a master's degree in quality science with an
emphasis in Six Sigma from California State University Dominguez Hills. He has a bachelor's degree from
San Diego State University through the industrial technology department. He can be reached at
[email protected].
Six Sigma is the buzzword of today. Companies big and small talk about it and numerous success stories
further advertise its relevance in today's challenging business environment. Debates on its emergence as
a strategic initiative have created critics who consider it old wine in a new bottle and loyal followers willing
to swear by it.
The latter of these are practitioners who view Six Sigma as an effective way to implement statistical
thinking, a philosophy of learning and action based on the following fundamental principles:
- All work occurs in a system of interconnected processes.
- Variation exists in all processes.
- Understanding and reducing variation are keys to success.
Statistical thinking provides practitioners with the means to view processes holistically. Table 1 reflects
the Six Sigma tools that map to these principles; the mapping is not exhaustive, and some tools overlap
across principles.
Many Black Belts feel compelled to apply Six Sigma tools in the order learned during training: for
example, using control charts, taught in the Measure phase, before design of experiments tools, which
are taught in the Improve phase. Black Belts can often complete projects faster and obtain better results
using statistical thinking practices rather than DMAIC as their only guideline. To illustrate this point, I
would like to share two case studies where different approaches were used, both employing statistical
thinking principles.
Case Study 1
Shafts were ground on machines to a close tolerance of ±5 microns, representing a process capability
(Cpk) of less than 1.0. The objective of the project was to increase the Cpk to 1.5. Using conventional
DMAIC as a
guideline, the team created a detailed process map. The map was created after a comprehensive study of
the machine manual, which also allowed identification of the critical Xs. A Failure Mode Effects Analysis
(FMEA) was then completed to identify risks associated with each critical X and a Design of Experiment
(DOE) was conducted to optimize speed, feed, coolant and other controllable parameters. As a result of
the team's effort, the grinding process was optimized and a Cpk of 1.68 was achieved. Control charts
were used to document the effectiveness of shaft monitoring, and a Control Plan was established to
control the critical Xs.
In this Case Study the first principle of statistical thinking, all work occurs in a system of interconnected
processes, was employed during the initial phase of the project. Process Capability Analysis and Process
Mapping were key elements of this phase. During the second phase of the project, use of Control Charts
and a FMEA emphasized the use of the second principle, that variation exists in all processes. The
success of the project, illustrated through the use of DOE and Control Charts, was achieved by
understanding and reducing variation, the third statistical thinking principle.
Had the Black Belt implemented a DOE first, without studying the machine manual and employing control
charts, the noise experienced during experimentation would have led to inconclusive results. The Black
Belt would then have concluded that control charts should be analyzed prior to attempting a DOE, to
verify that the only variation in the process is due to common causes.
Case Study 2
In a credit card processing organization, the response time between swiping a credit card and receiving
an approval or decline is critical. The objective of the project was to improve the process so that no
response would take more than 14 seconds from the moment of dialing. The team first mapped the
network to gain an understanding of the process flow, then used the conventional DMAIC methodology.
Data was collected and analyzed through multi-vari charts. At this stage of the project, the team felt that
they did not understand much of the variation and, instead of continuing with the conventional DMAIC
methodology, elected to implement a DOE. At the end of the experiment, control factors explaining 92%
of the variation in response time had been identified. This finding greatly assisted the team in
transitioning from the Measure phase to the Control phase in just two days!
The following review of advantages and disadvantages can help determine whether to apply DOE or
Control Charts first.

Applying DOE First

Advantages:
- Gives conceptual understanding of the process early in the project, leading to quick completion of the
project.
- Helps understand the Cause and Effect relationship early in the project.
- Helps identify the components of the Y = f(X) equation earlier in the project.
- Encourages a sequential approach to DOE.
Disadvantages:
- Does not typically distinguish between Special Causes and Common Causes. If there is a Special
Cause during experimentation, it will appear in a residual analysis of the experiment.
- Too much noise during experimentation may hide significant Xs.
- A lack of understanding of how Special Causes impact the process may create a difficult environment
for sustaining results after experimentation.
- Process understanding of Special Causes may remain poor.
Applying Control Charts First

Advantages:
- Helps understand the variation of the process and eliminate Special Causes that can affect the
process, thus eliminating undue noise during future applications of DOE.
- Enables an understanding of whether the variation is time dependent and helps identify appropriate
factors for Blocking in experimentation.
- Helps analyze the process dynamically and provides a means for obtaining quick feedback to rectify
process drift.
Disadvantages:
- Creates time constraints for solving problems. Effective charts contain a minimum of 25 subgroups,
which can lead to long data collection periods.
- Hinders identification of direct Cause and Effect relationships and creates an environment where
Special Causes are difficult to distinguish.
In summary, I would like to emphasize that the best results are achieved when statistical thinking
principles are used as a guideline. The above case studies illustrate that there is no set rule to
understanding variation based on the order of the tools used. Black Belts should review each project
independently to determine the most effective use of these tools and look to utilize them judiciously rather
than with a standard cookbook approach.
References
ASQ Statistical Division, Glossary of Statistical Terms (2003). Available online: http://www.asq.org.
The Therabath paraffin therapy bath (pictured below) is a durable medical device that holds one gallon of
molten paraffin wax. Sufferers of osteoarthritis use it for physical therapy. They dip their hands repeatedly
in the heated bath, which helps loosen stiff joints. The wax then slowly solidifies as a glove, producing
further therapeutic benefits via the heat of fusion. Oils reduce the overall melt point to a comfortable level,
facilitate removal of the glove and provide moisturizing for skin.
The tank holding the molten wax is made of stamped steel, which is then powder-coated electrostatically
with an epoxy-based paint. The coating must withstand temperatures approaching 130 degrees
Fahrenheit and exposure to the salt water that collects from the sweat of Therabath users.
The units are sold with a lifetime guarantee. Very few units are returned, but in those that are, two types
of off-grade predominate:
It's believed that if adhesion can be maintained at a level above 200, with hardness set at 140, the
problems noted above will recede to a level of quality consistent with Six Sigma. (For reasons of propriety,
we do not show units of measure of these responses.)
The mixture variables (A, B and C) summed to a constant 90 weight percent of the powder. The other 10
percent was made up of bisphenol A, aluminum oxide and silica, all held in constant proportion. The paint
chemists expected the three main powder-coating components to interact in complex ways. To properly
model such behavior, we wanted to formulate a sufficient number of unique blends to fit a "special cubic"
mixture model:
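The special cubic model for three components takes the standard seven-term form below (written here
with the article's component labels A, B and C as the mixture proportions):

    ŷ = β₁A + β₂B + β₃C + β₁₂AB + β₁₃AC + β₂₃BC + β₁₂₃ABC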
The process factors of bake time and temperature (D and E) also were expected to interact with each
other and possibly create non-linear responses. For this reason, we decided to collect sufficient data for
fitting a "quadratic" process model:
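For two process factors, the quadratic model takes the standard six-term form (written here with the
article's factor labels D and E):

    ŷ = α₀ + α₁D + α₂E + α₁₁D² + α₂₂E² + α₁₂DE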
Models like this are typically used to develop response surface graphs for process optimization.
These two models were crossed to account for possible interactions between mixture and process. For
example, the ideal coating formulation may differ depending on the choice of bake time and temperature,
which could be a function of the particular type of oven available for processing the tanks. The crossed
model contains 42 terms (7 from the special cubic mixture model times 6 from the quadratic process
model). With the aid of computer software [6], a d-optimal design [2] was generated to provide the ideal
combinations of
variables for fitting response data to the combined mixture-process model. We added 5 extra unique points
for estimating lack-of-fit for the model, plus 5 replicates of existing points to estimate pure error. We also
incorporated one center point (combination #3 on the list in the appendix), representing the standard
operating conditions, for a grand total of 53 combinations.
The results (simulated) of this experiment can be seen in the Appendix. The combinations are organized
for easier viewing, with replicate points lumped together. However, experiments like this should always be
performed as randomly as possible to offset any lurking variables, such as ambient humidity.
Figures 6a, 6b: Effect Of Powder-Coating Components On Adhesion (Left) And Hardness (Right) With
Bake Time And Temperature At Mid-Levels (Centerpoint)
Figures 7a, 7b: Effect Of Process Factors On Adhesion (Left) And Hardness (Right) With The Powder-
Coating Formulated At Mid-Levels (Centroid)
Notice on Figure 7b how the hardness response drops precipitously at the low ends of bake time and
temperature. Even assuming you'd accept such a low hardness, this would be a very bad place to set up
your process, because results would be very sensitive to variations in the input factors. This can be seen
more clearly in the computer-generated graph of POE shown in Figure 8.
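The article does not spell out the POE calculation; in its usual formulation (an assumption here), the
propagation of error for a fitted response ŷ combines the transmitted input variation with residual error:

    POE = √[ Σᵢ (∂ŷ/∂xᵢ)² σ²(xᵢ) + σ²(residual) ]

Flat regions of the response surface (small partial derivatives) transmit less input variation, which is why
the steep region at low bake time and temperature makes a poor operating point.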
The input variables aren't listed above, but they are constrained within their previously specified ranges.
Never extrapolate outside of your experimental region, because the predictive models probably won't
work out there. Stay in bounds or possibly face some dangerous consequences!
Based on the goals and ranges, the software set up desirability scales on all responses and then searched
for a solution that maximized overall desirability. (For an enlightening discussion on how desirability gets
calculated and applied, see reference 7, reprints of which can be obtained by contacting the authors.) The
optimal combination found by the software can be seen in Table 3.
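The desirability math isn't reproduced in the article; the sketch below shows the commonly used
Derringer-Suich style of calculation, with hypothetical goals and predicted responses (the numbers are
illustrative only, not the article's):

    import math

    def d_maximize(y, low, high, weight=1.0):
        # Desirability for a "maximize" goal: 0 at or below low, 1 at or above high.
        if y <= low:
            return 0.0
        if y >= high:
            return 1.0
        return ((y - low) / (high - low)) ** weight

    def d_target(y, low, target, high, weight=1.0):
        # Desirability for a "target" goal: peaks at the target, 0 outside [low, high].
        if y <= low or y >= high:
            return 0.0
        if y <= target:
            return ((y - low) / (target - low)) ** weight
        return ((high - y) / (high - target)) ** weight

    # Hypothetical goals and predictions: maximize adhesion (200 -> 250),
    # hold hardness near a target of 140 (acceptable between 120 and 160).
    adhesion, hardness = 215.0, 142.0
    D = math.sqrt(d_maximize(adhesion, 200, 250) * d_target(hardness, 120, 140, 160))
    print(f"Overall desirability D = {D:.3f}")  # geometric mean of the two

The search then varies the inputs within their allowed ranges to maximize D.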
Figure 9: Sweet Spots For Processing Powder Coat Made By Optimal Formulation
In this view of the multivariable experimental space, adhesion appears to be more of a limiting response
than hardness (it forms most of the boundary around the operating windows at top and bottom). It's
tempting to consider reducing the bake time from the recommended level of approximately 26.9 minutes
(flag set at the starred point in the upper window) to its minimum tested value of 15 minutes (see flag in
lower window). However, as you can see in Figure 10, adhesion is at its lowest in the mid-range of bake
time (the trough on the response surface).
Figure 10: Response Surface For Adhesion As A Function Of Bake Time And Temperature With Optimal
Formulation Of Powder-Coat
At a bake time of 15 minutes and a temperature of 364 degrees (approximately what's shown in the lower
flag in Figure 9), the POE of adhesion exceeds 23 units of standard deviation, a substantial increase from
the 18.9 reported in Table 3 for the optimum setup. In companies that place an emphasis on Six Sigma
goals this may be an unacceptable trade-off. However, if time is of the essence, it's easy enough to
incorporate this in the optimization criteria - simply add a goal to minimize it. That's what makes the whole
DOE procedure outlined in this article so incredibly powerful: It produces statistically-valid predictive
models that can be easily manipulated for "what-if" analysis. Then when goals shift, or get weighted
differently, it's just a matter of plug-and-chug and the answer magically appears. To ensure that it's correct,
confirmation runs must be performed. Hopefully, the results will agree well with what's predicted to
happen.
Conclusion
As a result of systematic experimentation using sound statistical principles, the quality of the powder
coating of Therabath tanks was improved and made more robust to variations in the levels of
components and processing factors. With the proper knowledge and software tools, chemists and
engineers in any of the process industries (pharmaceutical, food, chemical, etc.) can apply these same
methods to their systems and accomplish similar breakthrough improvements.
Mark J. Anderson, PE, CQE, is a principal and General Manager of Stat-Ease, Inc. Prior to joining Stat-
Ease, he spearheaded an award-winning quality improvement program for an international manufacturer,
generating millions of dollars in profit. Mark offers a diverse array of experience in process development,
quality assurance, marketing, purchasing, and general management. In addition to his duties at Stat-Ease,
Mark is a partner and technical consultant for WR Medical Electronics, a medical device manufacturer.
Mark is also co-author of the popular book, DOE Simplified: Practical Tools for Effective Experimentation,
and has published numerous articles on design of experiments (DOE). Mark can be reached via email at
[email protected].
Patrick J. Whitcomb, PE, is the founding principal and President of Stat-Ease, Inc. Before starting Stat-
Ease, he worked as a chemical engineer, quality assurance manager and plant manager. Pat co-authored
Design-Ease® software, an easy-to-use program for design of two-level factorial experiments and Design-
Expert® software, an advanced user's program for response surface and mixture designs. He's provided
consulting on application of design of experiments (DOE) and other statistical methods for over 20 years.
In addition, Pat is co-author of the popular book, DOE Simplified: Practical Tools for Effective
Experimentation.