
Experiments are done to quantify cause-and-effect relationships between adjustable parameters (inputs, factors, x) and the results (outputs, responses, y).
They help to answer questions like: How do the ingredients and proportions of a
mixture (inputs) influence its properties (outputs)? How do process parameters
like temperature and stir-rate (inputs) influence the yield of a reaction and the
purity of the end product (outputs)? Which settings offer the best compromise
between conflicting aims, such as yield and purity? What range of inputs is
permissible without causing out-of-specification outputs?

Design of experiments (DoE) looks at a complete experimental program, that is, a whole set of factor level combinations (individual experiments, runs) and
addresses questions like: How should I distribute these runs in design space (that
is, the plausible range for all factors) to gain as much information as possible?
How many runs do I need to achieve the desired accuracy? In what order should I
do these runs to avoid spurious results? And afterwards, DoE helps to draw the
right conclusions from the results.

DoE can be used on its own. But DoE is also the crucial learning method within a
Six-Sigma-Strategy (the Improve), or the more chemically oriented Quality by
Design (QbD) methodology. Experiments are done with many different aims.
Correspondingly, DoE is not an individual tool, but rather a collection of ideas, or
a toolbox. It is an adaptation of statistical methods to industrial experimentation.

The aim of this article is to give an introduction and an overview to help you
decide whether DoE is useful for you.

Generally speaking, DoE is not a cure-all in itself. However, it is a valuable set of tools that helps you, the process expert, to learn as much as possible from as few individual experiments as possible, in order to save time and money and to get better, more reliable results.

Introductory example

To show what DoE is about, let me start with an example [1]. Important factors
influencing a particular chemical reaction are as follows:

Reaction time: best results are expected in the range 40 to 50 min

Reaction temperature: best results are expected in the range 80 to 90°C

Catalyst amount: best results are expected in the range 2 to 3%

Important results or responses are the conversion (yield, in percent), which of course should be as high as possible, and the thermal activity, a dimensionless quality characteristic, which should be in the range 55 to 60, ideally near the middle of this range. This sort of starting information is initial process knowledge and is crucial for the success of a DoE.

The design space, then, is the cube defined by these three factor ranges (or
slightly beyond). The experimental design is a list of points in this design space.
Figure 1 shows the design used in this case: a so-called central composite design, which is a frequently useful standard design. The points are at the corners and center of the cube defined by the above ranges, and at axial or star points slightly outside the face-centers of the cube. It is graphically obvious that the design points are evenly distributed in the region of interest and therefore well suited to learn as much as possible about the dependence of the responses on the factors in these ranges; and this is what DoE is all about.

Table 1 shows the experimental design and the results used for the example.
Every line is an individual experiment, called a run. All runs together are the
experimental design, the experimental program as a whole.
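To make this concrete, the runs of such a design can be generated programmatically. The sketch below is my own illustration, not code from the article; the factor ranges are those of the example, while the helper name, the axial distance `alpha` and the number of center replicates are assumed choices.

```python
from itertools import product

def central_composite(lows, highs, alpha=1.2, n_center=4):
    """Corner, axial ('star') and center points, in real factor units."""
    k = len(lows)
    centers = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    halves = [(hi - lo) / 2 for lo, hi in zip(lows, highs)]
    # 2^k factorial corners: every combination of low/high levels
    corners = [tuple(c + s * h for c, s, h in zip(centers, signs, halves))
               for signs in product([-1, 1], repeat=k)]
    # 2k axial points: +/- alpha on one factor, center values on the rest
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            point = list(centers)
            point[i] += s * halves[i]
            axial.append(tuple(point))
    return corners + axial + [tuple(centers)] * n_center

# Ranges from the example: time 40-50 min, temperature 80-90 deg C, catalyst 2-3%
design = central_composite(lows=[40, 80, 2], highs=[50, 90, 3])
print(len(design))  # 8 corners + 6 axial + 4 center replicates = 18 runs
```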

Using suitable software, models are fit to the data; usually, low-order polynomials (mainly quadratic) are used. They smooth out random variation and, in effect, interpolate between the observed results. These models can then be presented graphically. From them we see that the conversion is highest at high temperatures and the thermal activity is independent of temperature. So the highest temperature is obviously the best. Figure 2 shows the fitted model for conversion as a function of time and catalyst as a surface in space (the z-axis is the response conversion), the so-called response surface. Figure 2 also shows the same dependence as a contour diagram, to be interpreted like a map. The surface has the shape of a sloping ridge, the highest conversion being achieved for a time of 50 min and 3% catalyst. Always try to understand the model, bearing in mind what you already know about your process. This is an excellent way of learning more about the cause-and-effect relationships in your process.
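The fitting step itself is ordinary least-squares regression on polynomial terms. The following sketch uses synthetic data, not the article's Table 1 results, to show the idea for a full quadratic model in two factors; all coefficients are invented for illustration.

```python
import numpy as np

def quadratic_design_matrix(x1, x2):
    # columns: 1, x1, x2, x1*x2, x1^2, x2^2  (a full quadratic in two factors)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(0)
x1 = rng.uniform(40, 50, 30)                  # time, min
x2 = rng.uniform(2, 3, 30)                    # catalyst, %
# invented "true" surface plus a little random variation
y = 5 + 0.8 * x1 + 2.0 * x2 + 0.3 * x1 * x2 + rng.normal(0, 0.1, 30)

X = quadratic_design_matrix(x1, x2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
predicted = quadratic_design_matrix(np.array([45.0]), np.array([2.5])) @ coef
print(round(float(predicted[0]), 1))
```

The fitted polynomial smooths out the added noise: the prediction at an interior point (time 45 min, 2.4–2.6% catalyst) comes out very close to the invented true surface.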

Figure 3 shows the contour diagram for the thermal activity. It is lowest for a time of 40 min and 2% catalyst and highest at 50 min and 3%, increasing linearly from one corner to the other. Comparing Figures 2 and 3, one recognizes that the highest conversion (about 87%) combined with a thermal activity in the middle of the range 55 to 60 is achieved at 42.3 min with 2.4% catalyst at a temperature of 90°C. Better conversion (about 96%) would be achieved at 50 min with 3% catalyst, but the thermal activity (about 67) is then far outside the permitted range. Also, for temperatures greater than 90°C, higher conversion is expected.

FEATURE REPORT

In this comparatively simple situation it was possible to find an optimal compromise by looking at the contour diagrams. Often there are more than three factors or more than two responses, and graphical optimization is tedious or even impossible. Therefore, good software helps to find compromises. Frequently the so-called desirability is used as a standardized way of telling the software what the user wants. Alternatively, intersection plots or profiles, as shown in Figure 4, can be used for graphical optimization. Intersection plots show the dependence of all responses on all factors, when the other factors are held constant at the values given. These curves are cuts (intersections) through the response surface, such as the surface in Figure 2. Changing the fixed value for any of the factors changes the shape of the curves for the other factors, because the cut is made in a different position.
The red lines in Figure 4 show the model; the dashed green lines above and below show the 95% confidence interval (an estimate of the uncertainty of the model). The solid green lines at the bottom of all diagrams indicate the ranges covered in the experiment.

The aim of this introductory example is to show what an experimental design can
look like, what sort of results can be obtained, how the resulting models can be
visualized and how they can be used to better understand the cause-and-effect
relationships and find compromises between conflicting aims.

However, it is important to realize that experiments can also be done for completely different reasons. For example, you might be interested in reducing random variation of the results or in becoming independent of noise or nuisance variables (robust processes).

Finally, let us have another look at Figure 2. Please note that for a catalyst amount of 2%, an increase in time leads to a decrease in conversion, while for a catalyst amount of 3%, it leads to an increase in conversion: the effect of time on conversion depends on catalyst amount. Similarly, the effect of catalyst amount depends on time. In DoE, this behavior is called an interaction between the factors time and catalyst for the response conversion: the effect of one factor on the response depends on the value of the other factor.

Experimentally, interactions can only be detected if both factors involved are changed simultaneously. This is an important aspect of DoE and runs counter to the frequently heard advice to only vary one factor at a time.

The experimental cycle

Learning from experiments is done in cycles, as shown in Figure 5. Each cycle starts with a problem to be solved or with questions to be answered: the aim, which should be as quantitative as possible. After careful preparation, the factors and their ranges are chosen. These steps are crucial for the success of designed experiments. Then the actual design is set up, the experiments are done, the models fitted and the conclusions drawn. Several such cycles may be required.

Define the aim. What improvement is wanted or what problem is to be solved? A good aim has the following properties:

Specific and precise

Measurable

Accepted by everybody involved

Realistic

A good starting point is the customer (internal or external). Leading questions are: What is his or her problem with the present product or process? What does he or she need? How valuable is the expected improvement to him or her?

Determine the boundary conditions. When choosing the boundary conditions, the following questions should be considered:

How much time and money are available for the experiment?

Is the necessary equipment (both for the experiments and for the
measurement of the results) available?

How many components are involved, or how much material is available? Is it homogeneous?

Often these boundary conditions impose serious limitations on what is possible. They may require adjustment of the aim.

Analyze available data (if there are any). The following questions are asked
regarding available data:

What are the present settings for the process parameters? What
results are achieved with these settings?

How much do the results vary? (The standard deviation is a good measure for variation.)

If process parameters, material properties or environmental conditions vary in the available data, it is useful to look for correlations between all of these parameters and all important outputs (responses). Strong correlations may indicate that a parameter is important. But beware of the following:

If a process parameter is tightly controlled (for example, because it is known to be important) and therefore does not vary much, no correlation with outputs will be found and it looks unimportant.

Even strong correlations may be spurious. For example, just imagine that occasionally your starting material is contaminated, and this contamination causes foaming and a low yield. To suppress the foaming, you then increase the pressure. If you now plot yield versus pressure, you will find low yield at high pressure, but this is not a cause-and-effect relationship. Only if you decide on the pressure before the experiment (as you would do in a designed experiment) can you be sure that the observed dependence is cause-and-effect.
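This confounding story is easy to reproduce in a toy simulation (all numbers below are invented for illustration): a hidden contamination drives both the operator's pressure setting and the yield, producing a strong correlation with no causal link.

```python
import numpy as np

rng = np.random.default_rng(1)
contaminated = rng.random(200) < 0.3                       # hidden common cause
# operator raises pressure when foaming (contamination) occurs
pressure = np.where(contaminated, 5.0, 2.0) + rng.normal(0, 0.2, 200)
# contamination, not pressure, lowers the yield
yield_pct = np.where(contaminated, 60.0, 90.0) + rng.normal(0, 2.0, 200)

r = np.corrcoef(pressure, yield_pct)[0, 1]
print(r < -0.9)  # strong negative correlation, yet pressure is not the cause
```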

Define the output variables. The output variables should represent the aim
(that is, from their value you can see whether the problem has been solved and
the intended improvement reached or not). Furthermore, all quantities that are
important to the customer should be included as outputs. They may be directly
measured or calculated from one or several measured quantities.

Outputs should be quantitative (as conversion or thermal activity are in the introductory example), rather than just a yes or no, because numbers contain much more information. If no measurement is possible (for instance, taste), a grade is better than just yes or no, or good or bad.
Collect input variables. Collect all variables that may influence the outputs,
such as process variables, environmental conditions, material properties and so
on. Use your experience and also brainstorm in a small team, to make sure that
nothing is forgotten. For a particular process, this list only has to be compiled
once (possibly extended later with new experience). It may be very long.

The aim of this list is not to forget important influences and to ensure comparability of the results with later experiments.

Select factors. Only after the complete list of input variables is compiled are
the factors for the present design selected. The selection criterion is the
expected importance of the factors for achieving the aim or for the process
generally. Good selection is crucial for the success of the whole cycle, because
only the effect of factors selected can be determined. So take care and use all
your experience. Also, make sure that the levels of all factors can be set
independently of each other.

For a response-surface design as used in the introductory example, two to six factors are usually selected. If you cannot decide on such a small number of important factors, a two-step procedure is used: first, use a screening design, whose only aim is to separate the unimportant factors (the trivial many) from the vital few, and only afterwards use a response-surface design. Details are given later.

The input variables that were not selected are held fixed at their presumed best
value (to the extent possible) and this value is documented for future
comparability. In this way, the random variation of the outputs is minimized.
Inputs that cannot be held fixed (for example, uncontrollable environmental
conditions) are measured and documented. If they turn out to be important, they
can later be included in the model.

Determine factor ranges. The next step is to choose a suitable range for the
factors to be varied in the design. Here too, your experience is crucial.

For response-surface designs as in the introductory example, a promising range around the expected optimum should be used. No phase transitions or other
dramatic changes should occur in the chosen range. If the chosen range is too
broad, the simple smooth models used for the fit do not work. If the range is too
narrow, the optimum may not be in the chosen range and the effects of the
factors may be so small that they are difficult to detect.

For screening designs, the ranges can be broader, but they should be realistic.
The idea is that, if a factor still has no significant effect, it is definitely not
important.

Set up the experimental design. There are many good designs. The most
suitable one depends on your problem. An overview of the most common
designs is given in the next section. One of them is the central composite design
in Figure 1. And all of these designs are statistically efficient, which means the
model fitting will be as accurate as possible (for a given number of runs).
Generally, setting up the design is done using software, and is normally not a
problem. However, there are two questions that I would like to discuss here, as
follows:

How often should each point in the design be done (that is,
replicated)?

In what order should the individual experiments be run?

How often, that is, how many runs altogether? This is a trade-off.

Obviously, the more runs there are, the more expensive the whole experiment
becomes. And frequently the boundary conditions dictate the maximum number
of runs possible.

On the other hand, more runs give more accurate models. Averages vary less than individual results, and the width of confidence intervals decreases inversely as the square root of n, the number of runs (that is, as n^(−1/2)). So, for example, you need four times as many runs to double the accuracy (at least as a good approximation). A term in the model fit to the data is called statistically significant if it is bigger than the width of its confidence interval, that is, if it stands outside of random variation. This happens earlier if there are more data.
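A quick arithmetic check of the square-root rule, with an assumed single-run standard deviation of 2:

```python
import math

s = 2.0                                  # assumed standard deviation of one run
def std_error(n):                        # standard error of the mean of n runs
    return s / math.sqrt(n)

print(std_error(8), std_error(32))       # four times the runs, half the width
```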

In what order? Use randomization whenever possible and blocking when appropriate.

Randomization means doing the runs in randomized order, that is, mixed up, not
in the systematic order shown in Table 1. If randomization is used, then a trend in
the data (for instance, due to aging of a chemical or degradation of an
apparatus) or an unexpected change cannot lead to a systematic error in the
model.
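In practice, randomization is as simple as shuffling the run list; the design below is a hypothetical stand-in for the rows of Table 1, not the article's actual table.

```python
import random

# hypothetical stand-in for Table 1's rows: (time, temperature, catalyst)
design = [(40, 80, 2), (50, 80, 2), (40, 90, 2), (50, 90, 2),
          (40, 80, 3), (50, 80, 3), (40, 90, 3), (50, 90, 3)]

run_order = design.copy()
random.seed(42)              # fixed seed only to make this sketch reproducible
random.shuffle(run_order)    # execute the runs in this mixed-up order
print(run_order[0] in design)
```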

Blocking means keeping expected differences (for example, different lots of the
raw materials, different operators) out of the random variation (and thus
increasing the accuracy and sensitivity of the experiment). This is achieved by a
careful balancing of the levels of all factors between the blocks.

Prepare and do the actual experiment. The basic principles here are care and attention to detail. The designs are very efficient: all data are used for fitting all model terms. Therefore, every single data point is important. Frequently useful steps are as follows:

Plan all necessary resources (material, equipment, personnel) and make sure all materials are available, homogeneous and so on; all equipment is in good working condition; all personnel are well trained, and so on

Write down the experimental and measuring procedure in detail and practice it beforehand

Test whether possibly problematic extreme combinations in your design still work
Calibrate your measuring equipment and do some repeat-measurements beforehand, to know the random variation and pitfalls of your equipment, and take appropriate precautions

Make sure all runs are done in the planned order (and not reordered for convenience): the messed-up (randomized) order is intentional

Stay with the experiments, particularly if they are run in a production environment or if it is the first designed experiment. Many surprising discoveries have been made in that way

Document any departures from the procedures laid down, even if they seem unimportant at first. Document environmental and other relevant conditions run-by-run

If there are any departures from the planned design, always use the
actual factor values in modeling (rather than the design values)

Make sure that no settings or results are confused this could have
a dramatic effect on the modeling. Check all results for plausibility

Fit and evaluate models. This is done using suitable software. Usually simple
polynomial models are fitted using standard linear or quasi-linear regression, a
common statistical procedure. Model terms that are significant (that is, those
that are larger than random variation) are retained in the model. The usual
criterion is the p-value, which is the probability with which the observed term, or
an even bigger one, occurs by chance, given the observed random variation. If
this p-value is small (often <5%, but this is only a guideline), the corresponding
term is called significant.
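A minimal sketch of this significance idea, using the common rule of thumb that a term stands out of random variation when its estimate exceeds roughly two standard errors (illustrative data and a normal approximation; real DoE software reports exact t-based p-values):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.repeat([40.0, 45.0, 50.0], 6)          # six runs at each of three settings
y = 0.5 * x + rng.normal(0, 1.0, x.size)      # invented effect plus noise

X = np.column_stack([np.ones_like(x), x])     # model: intercept + slope * x
coef, res_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res_ss[0] / (x.size - 2)             # residual variance estimate
se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# crude rule: the slope is significant if it exceeds ~2 standard errors
print(abs(coef[1]) > 2 * se_slope)
```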

There are many diagnostic plots of the residuals (that is, deviations from the
fitted model) to help in checking the results. The most important ones are as
follows:

Normal plot of residuals: Deviations from a normal distribution can be recognized, in particular outliers, which are results that are unusual. The reason may be a mistake in the experiment, but it may also indicate that something new happens at the factor-level combination (setting) involved

Residuals in the run order: Trends or sudden changes may be recognized

Residuals versus factors: Systematic deviations from the model can be detected

Box-Cox plot: It shows whether a transformation of the response (for instance, taking the logarithm) leads to a model that fits the data better

A wide variety of further diagnostic information is given, such as the following:


Lack-of-fit: It checks whether the systematic deviations from the
model are comparable to the random variation found from replications at
identical settings of the factors

Predicted r2 or cross-validation: It checks the capability of the model to predict responses at settings not used for fitting the model, to avoid overfitting

Deduce improvements. If the model is good, it can be plotted for a response-surface design as shown in Figures 2, 3 and 4. As shown in the introductory example, these plots can be used for a better understanding of the cause-and-effect relationships in the process or product studied and for finding optimal compromises between conflicting aims. A confirmation experiment at this setting rounds off the program.

For screening designs, the results can be used to distinguish between important and not-so-important factors, to then investigate the important factors in a response-surface design. Or the results can be used to identify the direction of steepest improvement, if one is far from the optimum, and then proceed in that direction.

In other words, either the improvement is complete after the current cycle or a
new cycle starts. In any case, important new information has been gained.

Typical experimental designs

There are many different experimental designs. They all have in common that
they are efficient for a particular purpose. What follows are some typical designs.

Two-level factorial designs. Figure 6 shows a two-level factorial design for three factors. The full factorial design consists of all 2^3 = 8 corners of the cube (the red and the blue points in Figure 6). With it, only linear effects of the factors and all their interactions can be determined. For k factors, the design consists of the 2^k corners of a k-dimensional cube.

If all factors are numerical (as in the introductory example, where any value
between, for example, time 40 and 50 min is possible), a center point (green) is
strongly recommended. For categorical factors (for example, Supplier 1 and
Supplier 2, or solvents water and alcohol), no center is possible. Even if the other
points are not replicated, at least the center point should be replicated to get an
independent measure for the random variation of the results.

The first eight runs in Table 1 are a full factorial 2^3; the next four runs are the replicated center (shown in systematic order; they should of course be randomized).

With a center point, a departure from linearity can be recognized (although it cannot be attributed to a particular factor). However, as Table 1 shows, a full factorial with center can later be extended to a central composite design, simply by adding extra runs.
Factorial designs are very efficient, particularly if the random variation is
noticeable. The effect of each factor is the difference between the average of all
four points on one face of the cube and the opposite face, that is, each result is
involved in the calculation of each effect.

Only the four red points or only the four blue points (if possible combined with
the center) are a fractional factorial design. Fractional factorial designs contain
only a fraction (here half) of the points of a full factorial design. Their advantage
is that they contain fewer runs and therefore need less effort. The disadvantage
is that not all effects can be calculated separately (this is called confounding).
This may involve a risk, but if a suitable design is selected carefully, they are
very useful for screening. More details are given in the literature below.
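As an illustration (not the article's specific fraction), one common half-fraction of the 2^3 design keeps the corners satisfying the defining relation C = AB, that is, those whose coded coordinates multiply to +1:

```python
from itertools import product

# keep the corners of the 2^3 cube whose coded coordinates multiply to +1,
# i.e. the half-fraction defined by C = A*B
half = [pt for pt in product([-1, 1], repeat=3) if pt[0] * pt[1] * pt[2] == 1]
print(len(half))  # 4 of the 8 corners
```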

Central composite designs. A central composite design was used in the introductory example and is shown in Figure 1. Central composite designs are very useful for fitting response-surface models and optimization. They are very efficient if they fit the situation, that is, when only numerical factors are present and all points shown in Figure 1 are possible. In more complicated situations, optimal designs are used.

Optimal designs. Optimal designs are not better than central composite
designs, but they are more flexible, and they are used in most non-standard
situations. They are particularly useful when the following conditions exist:

If there are some numerical and some categorical factors (and mixture factors, see below)

If specific (more complicated) models are known to be required

If specific boundary conditions are to be taken into account

If factors restrict each other, that is, if certain combinations are not
possible (for instance, high temperature combined with a long reaction time, or
low temperature with short reaction times)

Optimal designs require software for setting them up. Figure 7 illustrates the basic idea for the case of two factors A and B that restrict each other; in this case it is assumed that high values of A (marked +) cannot be combined with high values of B (+). The starting point is a list of settings far greater than necessary (Figure 7A; here 5 × 5). Taking away the settings that are not possible still leaves more than enough (Figure 7B). The task then is to select the subset that is most suitable. This requires specifying the model to be fitted (frequently quadratic), the number of settings to be included (which must be larger than the number of terms in the model) and any further boundary conditions. Using an exchange algorithm, the software then iteratively chooses a best subset that satisfies all conditions. For "best" there are different optimality criteria, which in practice do not differ very much. Figure 7C shows such a selection (marked blue) from the 19 candidates in Figure 7B for a quadratic model (which contains six terms), if the ten (>6) best points were wanted.
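The selection step can be sketched with a naive greedy version of such an exchange idea, maximizing the determinant of X'X (the D-optimality criterion). This is only to show the principle: real software uses more refined exchange algorithms, and the candidate grid below is my own variant of Figure 7, not its exact 19 candidates.

```python
import numpy as np
from itertools import product

def model_row(a, b):
    # terms of a quadratic model in two factors: 1, A, B, AB, A^2, B^2
    return np.array([1.0, a, b, a * b, a * a, b * b])

def d_score(points):
    X = np.array([model_row(a, b) for a, b in points])
    M = X.T @ X + 1e-6 * np.eye(6)   # tiny ridge keeps early determinants comparable
    return np.linalg.det(M)

# candidate grid (coded -1..+1), dropping forbidden high-A/high-B combinations
candidates = [(a, b) for a, b in product(np.linspace(-1, 1, 5), repeat=2)
              if not (a > 0.5 and b > 0.5)]

chosen = []
for _ in range(10):                  # quadratic model has 6 terms; we want 10 runs
    best = max((c for c in candidates if c not in chosen),
               key=lambda c: d_score(chosen + [c]))
    chosen.append(best)
print(len(chosen))
```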
Mixture designs. The components of a mixture cannot be varied independently of each other; the sum of all components is fixed, usually to 1 or 100%. For example, for three components, only two can be chosen independently; the third is then fixed. Figure 8A shows the principle.

The settings of the three components cannot vary freely within the cube, but are restricted to a triangular part of a plane. Mixture designs take that into account, as exemplified in Figure 8B.
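One classic standard mixture design of the kind sketched in Figure 8B is the simplex lattice (an assumption on my part; the article does not name a specific design). The degree-2 lattice for three components can be enumerated as follows:

```python
from fractions import Fraction
from itertools import product

m = 2   # lattice degree: each proportion is a multiple of 1/m
levels = [Fraction(i, m) for i in range(m + 1)]           # 0, 1/2, 1
# keep only compositions whose proportions sum to exactly 1
lattice = [pt for pt in product(levels, repeat=3) if sum(pt) == 1]
print(len(lattice))  # 3 pure components + 3 binary 50/50 blends = 6 runs
```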

If, within the same experiment, components of a mixture and process variables (like temperature) are varied, an optimal design has to be used, as there are no suitable standard designs. An optimal design also has to be used if there are restrictions on the ranges of the components.

Advantages & limitations of DoE

DoE as a systematic procedure offers many advantages, including the following:

Systematic preparation of experiments, involving everybody concerned, and clear aims

All results are used for the calculation of all effects, resulting in efficient use of the data and reduced numbers of runs

The number of runs required, and hence the resulting costs, can be estimated in advance

Interactions between factors can be detected, resulting in an improved and quantitative description of the cause-and-effect relationships, allowing good compromises even for conflicting aims

Consequences are drawn only from statistically significant results; this avoids chasing after spurious results

The results of an experiment are presented systematically and graphically; this improves the transfer of experiences to new but similar contexts.

DoE is particularly advantageous when the following situations occur:

If there is appreciable random variation, which might lead to spurious results without the use of statistics

If several factors influence several responses, all in a complicated and not obvious way

Typical savings reported using DoE are in the range 40 to 75%, both in time and
money. Savings up to some millions of dollars have been reported (of course only
in high-volume production).

However, if your process is still a very long way from the optimum and you know obvious improvements (based on your process knowledge), you should harvest these improvements and get closer to the optimum first. DoE is efficient if the direction of further improvement is no longer obvious and you need a systematic search for further improvement.

DoE is efficient if the response surface is smooth, as in the introductory example and in most common applications. However, if the response surface contains sudden jumps (discontinuities, such as phase transitions) or narrow spikes (for instance, resonances), difficulties arise. For sudden jumps, you should first decide which side of the jump is better and then stay on that side; never fit a model across the discontinuity. Narrow spikes in several factors would require huge amounts of data to localize; therefore you should keep the factor causing the resonance separate, optimize with respect to the other (smooth) factors first and then search for the resonance in one factor only, or always measure only the peak of the resonance.

DoE is a powerful technique to learn from experimental results. To get the best
out of it requires some experience to select the most suitable design, to handle
the initially unfamiliar software, to help interpret the results and learn from them.
For your first project, you should either get somebody with experience to help or
start with a simple project with at most three factors and progress to more
complicated applications gradually.

Brief guide to the literature

The aim of this article was to give an introduction to what DoE can do to help optimize processes and products, describe the procedure and give an overview of designs. More details can be found in the literature. A small selection of suitable books is provided below in the References section. The level of the books ranges from a short introduction [2] through those with higher levels of detail and difficulty. A more detailed introductory book is available in German [3]. Software-based introductions are available for factorial designs [4] and response-surface designs [5]. Ref. 6 is a very readable book showing the flexibility of optimal designs, based on examples. Ref. 7 is a very good and detailed classic, mainly on factorial designs. A classic on response surfaces (including the introductory example) is Ref. 1. Finally, Ref. 8 is a classic on mixtures (a shorter version has been published more recently).
