Chapter 9
Input Modeling
Discrete-Event System Simulation
Purpose & Overview
The quality of the output is no better than the quality of inputs.
In this chapter, we will discuss the 4 steps of input model
development:
Collect data from the real system
Identify a probability distribution to represent the input
process
Choose parameters for the distribution
Evaluate the chosen distribution and parameters for
goodness of fit.
Identifying the Distribution
Histograms
Selecting families of distribution
Parameter estimation
Goodness-of-fit tests
Histograms [Identifying the distribution]
A frequency distribution or histogram is useful in
determining the shape of a distribution
The number of class intervals depends on:
The number of observations
Suggested: the square root of the sample size
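A minimal sketch of the square-root rule, assuming equal-width class intervals spanning the data range. The function name `histogram_counts` and the sample data are hypothetical and used only for illustration.

```python
import math

def histogram_counts(data, k=None):
    """Count observations in k equal-width class intervals.

    If k is omitted, apply the square-root rule: k is about sqrt(n).
    Returns (edges, counts).
    """
    n = len(data)
    if k is None:
        k = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * k
    for x in data:
        # The largest value sits on the right edge; place it in the last interval.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

# Hypothetical data: 16 observations, so the rule suggests k = 4 intervals.
edges, counts = histogram_counts(list(range(16)))
```

Passing a different `k` to the same data is a quick way to see how the apparent shape changes with the interval size.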
Histograms [Identifying the distribution]
What is the correct number of class intervals?
[Figure: histograms of the same data with different interval sizes]
Selecting the Family of Distributions
[Identifying the distribution]
Remember the physical characteristics of the process
Is the process naturally discrete or continuous valued?
Is it bounded?
A family of distributions is selected based on:
The context of the input variable
Shape of the histogram
Frequently encountered distributions:
Easier to analyze: exponential, normal and Poisson
More difficult to analyze: beta, gamma and Weibull
No “exact” distribution for any stochastic process
Goal: obtain a good approximation
Selecting the Family of Distributions
[Identifying the distribution]
Use the physical basis of the distribution as a guide, for
example:
Binomial: # of successes in n trials
Poisson: # of independent events that occur in a fixed amount of
time or space
Exponential: time between independent events, or a process time
that is memoryless
Weibull: time to failure for components
Discrete or continuous uniform: models complete uncertainty
Triangular: a process for which only the minimum, most likely,
and maximum values are known
Empirical: resamples from the actual data collected
Parameter Estimation [Identifying the distribution]
Next step after selecting a family of distributions
If observations in a sample of size n are X1, X2, …, Xn (discrete
or continuous), the sample mean and variance are:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$
If the data are discrete and have been grouped in a frequency
distribution:
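$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$
where k is the number of distinct values of X and fj is the observed frequency of value Xj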
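The two pairs of formulas can be checked against each other with a short sketch. The function names and the sample values are hypothetical. The frequency-table version should return the same mean and variance as the raw-data version, because it only regroups the same sums.

```python
from collections import Counter

def mean_var_raw(xs):
    """Sample mean and variance from raw observations X_1..X_n."""
    n = len(xs)
    mean = sum(xs) / n
    var = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)
    return mean, var

def mean_var_freq(freq):
    """Same statistics from a frequency table {X_j: f_j} of distinct values."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    var = (sum(f * x * x for x, f in freq.items()) - n * mean ** 2) / (n - 1)
    return mean, var

sample = [0, 1, 1, 2, 2, 2, 3, 5]        # hypothetical discrete data
raw_stats = mean_var_raw(sample)
freq_stats = mean_var_freq(Counter(sample))
```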
Parameter Estimation [Identifying the distribution]
When raw data are unavailable (data are grouped into class
intervals), the approximate sample mean and variance are:
$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$
where fj is the observed frequency in the jth class interval,
mj is the midpoint of the jth interval, and c is the number of class intervals
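As a rough illustration of the midpoint approximation, the sketch below treats every observation in a class interval as if it sat at the interval midpoint. The function name, the interval bounds, and the frequencies are hypothetical.

```python
def grouped_mean_var(intervals, freqs):
    """Approximate sample mean and variance from class-interval data.

    intervals: list of (lower, upper) bounds, one per class interval
    freqs:     observed frequency f_j for each interval
    """
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in intervals]   # midpoints m_j
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    var = (sum(f * m * m for f, m in zip(freqs, mids)) - n * mean ** 2) / (n - 1)
    return mean, var

# Hypothetical grouped data: three intervals holding 10 observations in total.
approx_mean, approx_var = grouped_mean_var([(0, 2), (2, 4), (4, 6)], [2, 5, 3])
```

The result is only an approximation: the narrower the intervals, the closer it gets to the values computed from the raw data.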
Goodness-of-Fit Tests [Identifying the distribution]
Conduct hypothesis testing on input data distribution using:
Kolmogorov-Smirnov test
Chi-square test
H0: the random variable follows the suggested distribution
H1: the random variable does not follow the suggested distribution
Chi-Square test [Goodness-of-Fit Tests]
Intuition: comparing the histogram of the data to the shape of
the candidate density or mass function
Valid for large sample sizes (>20 data values)
By arranging the n observations into a set of k class intervals or
cells, the test statistic is:
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where Oi is the observed frequency in the ith class interval and
Ei = n*pi is the expected frequency, with pi the theoretical
probability of the ith interval. Suggested minimum for Ei: 5.
The statistic approximately follows the chi-square distribution with k-s-1 degrees
of freedom, where s = # of parameters of the hypothesized distribution
estimated by the sample statistics.
Chi-Square test [Goodness-of-Fit Tests]
Critical values of the test statistic are given in Table A.6
k = # of classes
Ei = expected number of data in class i, Ei = n*pi
Oi = observed number of data in class i
n = total number of observations
pi = expected probability of the ith class
Degrees of freedom = k-s-1,
where s = # of parameters of the distribution estimated from the data
Each Ei should be at least 5
# of class intervals:
Obvious for the discrete case
Use Table 9.5 for the continuous case
For continuous distributions:
Take classes with equal probability, pi = 1/k
Ei = n*pi >= 5 implies k <= n/5
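For a continuous candidate distribution, equal-probability classes can be built from the inverse CDF. The sketch below assumes an exponential distribution purely for illustration; the function names and the rate value are hypothetical.

```python
import math

def max_classes(n, min_expected=5):
    """Largest k for which E_i = n/k stays at or above min_expected."""
    return n // min_expected

def exponential_equal_prob_edges(rate, k):
    """Endpoints a_0 < a_1 < ... < a_k such that each class has probability 1/k
    under an exponential distribution with the given rate.

    Uses the inverse CDF: F(a) = 1 - exp(-rate * a) = i/k.
    """
    inner = [-math.log(1 - i / k) / rate for i in range(1, k)]
    return [0.0] + inner + [math.inf]

k_max = max_classes(50)                        # n = 50 observations
edges = exponential_equal_prob_edges(0.5, 4)   # hypothetical rate, k = 4 classes
```

With equal-probability classes every Ei equals n/k, so the minimum-frequency check reduces to the single condition k <= n/5.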
Chi-Square test [Goodness-of-Fit Tests]
Recommended number of class intervals (k):
Sample Size, n    Number of Class Intervals, k
20                Do not use the chi-square test
50                5 to 10
100               10 to 20
> 100             n^(1/2) to n/5
Chi-Square test [Goodness-of-Fit Tests]
Vehicle Arrival Example (continued):
H0: the random variable is Poisson distributed.
H1: the random variable is not Poisson distributed.
Expected frequency: $E_i = n\,p(x_i) = n\,\dfrac{e^{-\alpha}\alpha^{x_i}}{x_i!}$

xi      Observed Frequency, Oi   Expected Frequency, Ei   (Oi - Ei)^2/Ei
0       12                       2.6                      7.87  (x = 0 and 1 combined because of min Ei)
1       10                       9.6
2       19                       17.4                     0.15
3       17                       21.1                     0.80
4       10                       19.2                     4.41
5       8                        14.0                     2.57
6       7                        8.5                      0.26
7       5                        4.4                      11.62 (x = 7 and above combined because of min Ei)
8       5                        2.0
9       3                        0.8
10      3                        0.3
≥ 11    1                        0.1
Total   100                      100.0                    27.68

Degrees of freedom: k-s-1 = 7-1-1 = 5. Since
$\chi_0^2 = 27.68 > \chi^2_{0.05,5} = 11.1$, the hypothesis is
rejected at the 0.05 level of significance.
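The calculation can be reproduced with a short sketch under a few stated assumptions. The observed counts are taken as summing to n = 100 and as matching the tabulated contributions 4.41 and 2.57, which gives 10 observations at x = 4 and 8 at x = 5. The Poisson mean is estimated by the sample mean, and the last cell is treated as the open tail "7 and above". The critical value 11.1 is the one quoted in the text. Because the table rounds each Ei to one decimal, the statistic computed here is close to, but not exactly, 27.68.

```python
import math

# Observed arrivals per period for x = 0..11 (see assumptions in the lead-in).
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
n = sum(observed)

# Estimate the Poisson mean from the data (s = 1 estimated parameter).
alpha = sum(x * o for x, o in enumerate(observed)) / n

def poisson_pmf(x, mean):
    return math.exp(-mean) * mean ** x / math.factorial(x)

# Cells as grouped in the table: {0, 1}, 2, 3, 4, 5, 6, {7 and above}.
cells = [[0, 1], [2], [3], [4], [5], [6], list(range(7, 12))]

chi2 = 0.0
prob_so_far = 0.0
for idx, cell in enumerate(cells):
    o_i = sum(observed[x] for x in cell)
    if idx < len(cells) - 1:
        p_i = sum(poisson_pmf(x, alpha) for x in cell)
        prob_so_far += p_i
    else:
        p_i = 1.0 - prob_so_far   # open-ended last cell takes the remaining tail
    e_i = n * p_i                 # expected frequency E_i = n * p_i
    chi2 += (o_i - e_i) ** 2 / e_i

dof = len(cells) - 1 - 1          # k - s - 1
critical = 11.1                   # chi-square value at 0.05 with 5 d.o.f., from the text
reject = chi2 > critical
```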