SEQUENTIAL GAUSSIAN SIMULATION
SEMINAR REPORT
Submitted by
Parag Jyoti Dutta
Roll No: 09406802
in partial fulfillment for the award of the degree
of
MASTER OF TECHNOLOGY
IN
GEOEXPLORATION
At
DEPARTMENT OF EARTH SCIENCES
INDIAN INSTITUTE OF TECHNOLOGY BOMBAY
MUMBAI - 400076
NOVEMBER 2009
Seminar
Contents

Chapter 1  Introduction
Chapter 2  Estimation versus Simulation
           Reproducing model statistics by simulation
           Using the spatial uncertainty model
Chapter 3  Monte-Carlo Simulation
           Modeling spatial uncertainty
Chapter 4  The MultiGaussian RF Model
           Normal Score Transform
Chapter 5  The Sequential Simulation Genre
           Remarks
           Implementation
Chapter 6  Sequential Gaussian Simulation
           Limitations
Bibliography
Chapter 1
Introduction
Spatial interpolation concerns how to estimate the variable under study at an unsampled
location, given sample observations at nearby locations. This process of estimation (kriging)
aims at computing the minimum error variance (optimal) estimate of the unknown value, together
with the associated error variance, at the unsampled location.
In many applications, however, we are more interested in modeling the uncertainty about the
unknown than in deriving a single estimate. Uncertainty is modeled through conditional
probability distributions. The distribution function F(x; z | (n)) = Prob{Z(x) ≤ z | (n)}, made
conditional to the information available (n), fully models that uncertainty in the sense that
probability intervals can be derived, such as

   Prob{Z(x) ∈ (a, b] | (n)} = F(x; b | (n)) − F(x; a | (n))

It is worth noting that these probability intervals are independent of any particular estimate z*(x) of
the unknown value z(x). Indeed, uncertainty depends on the information available (n), not on the
particular optimality criterion retained to define an estimate. Such a model of local uncertainty
allows one to evaluate the risk involved in any decision-making process, such as delineating
rich zones of mineralization where a drill-core sampling programme needs to be planned. From
the model of uncertainty, one can also derive estimates that are optimal for different criteria, customized
to the specific problem at hand, instead of retaining the least-squares error (kriged) estimate.
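As a toy illustration of such a probability interval, the following sketch assumes a Gaussian ccdf at an unsampled location; the conditional mean and standard deviation are illustrative values, not taken from any data set in this report.

```python
from scipy.stats import norm

# Hypothetical ccdf at an unsampled location x: Gaussian with
# conditional mean 2.5 and conditional standard deviation 0.8
# (illustrative values only).
mean, sd = 2.5, 0.8

def F(z):
    """Conditional cdf F(x; z | (n)) under an assumed Gaussian model."""
    return norm.cdf(z, loc=mean, scale=sd)

# Probability that Z(x) falls in the interval (a, b]:
a, b = 2.0, 4.0
prob = F(b) - F(a)
print(prob)
```

Note that the interval probability follows from the ccdf alone; no particular estimate z*(x) is involved.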
Each conditional cumulative distribution function (ccdf) F(x; z | (n)) provides a measure of local
uncertainty at a specific location x. However, a series of single-point ccdfs does not provide
any measure of multiple-point or spatial uncertainty, such as the probability that a string of
locations jointly exceeds a given threshold value. Most applications require a measure of the joint
uncertainty about attribute values at several locations taken together. Such spatial uncertainty is
modeled by generating a set of multiple equiprobable realizations {z(l)(x), x ∈ A}, l = 1, …, L,
of the joint distribution of attribute values in space, a process known as stochastic simulation.
The set of alternative realizations provides a visual and quantitative measure (a model) of
spatial uncertainty. All of these realizations reasonably match the same sample statistics and
exactly match the conditioning data. Each realization reproduces the variability of the input data
in the multivariate sense, and is hence said to represent the geological texture, or true spatial
variability, of the phenomenon.
Chapter 2
Estimation versus Simulation
The objective of estimation is to provide, at each point x, an estimator z*(x) that is as close as
possible to the true unknown value of the attribute z0(x). The criteria for measuring the quality of
estimation are unbiasedness and minimal estimation variance E{[Z(x) − Z*(x)]²}. There is no
reason, however, for these estimators to reproduce the spatial variability of the true values {z0(x)}.
In the case of kriging, for instance, the minimization of the estimation variance involves a
smoothing of the true dispersions. Typically, small values are overestimated, whereas large
values are underestimated. Another drawback of estimation is that the smoothing is not uniform.
Rather, it depends on the local data configuration: smoothing is minimal close to the data
locations and increases as the location being estimated gets farther away from them.
A map of kriging estimates therefore appears more variable in densely sampled areas than in
sparsely sampled areas.
On the other hand, a simulation {z(l)(x)}, with l denoting the lth realization, has the same first
two experimentally found moments (mean and covariance/variogram), as well as the same histogram,
as the true values {z0(x)}; i.e., it identifies the main dispersion characteristics of these true values.
However, at each point x, the simulated value z(l)(x) is not the best possible estimator of z0(x). In
the case of conditional simulation, in particular, the estimation variance of z0(x) by the conditionally
simulated value zc(l)(x) is exactly twice the kriging variance.
In general, the objectives of simulation and estimation are not compatible. Conditional
simulation is preferred for a better reproduction of the variability of the attribute where too much
information would otherwise be lost through the smoothing effect of kriging. Therefore, we do not
simulate if our purpose is estimation. Estimation is preferable for locating and estimating reserves,
while conditional simulation is preferred for studying the dispersion of the characteristics of those
reserves, remembering that in practice the real values are known only at the experimental
points x_α. A suite of conditional simulations also provides a measure of uncertainty about the
spatial distribution of the attributes of interest.
Smooth interpolated maps should never be used for applications sensitive to the presence of
extreme values and their patterns of continuity. Consider, for example, the problem of
assessing groundwater travel times from a nuclear repository to the surface. A smooth map of
estimated transmissivities would fail to reproduce critical features, such as strings of large or
small values that form flow paths or barriers. Processing a kriged transmissivity map
through a flow simulator may therefore yield inaccurate travel times. Similarly, the risk of soil pollution by
heavy metals would be underestimated by a kriged map of metal concentrations that fails to
reproduce clusters of large concentrations above the tolerable maximum.
Reproducing model statistics by simulation
Instead of a map of local best estimates, stochastic simulation generates a map, or realization,
of z-values over the study area A, say {z(l)(x), x ∈ A} with l denoting the lth realization, which
reproduces the statistics deemed most consequential for the problem at hand. Typical requisites for
such simulated maps are as follows:
1. Data values are honoured at their locations:

   z(l)(x_α) = z(x_α),  α = 1, …, n

   The realization is then said to be conditional (to the data values).
2. The histogram of simulated values closely reproduces the declustered sample
histogram.
3. The covariance model C(h) or, better, the set of indicator covariance models C_I(h; z_k) for
various thresholds z_k is reproduced.
4. Spatial correlation with a secondary attribute or multiple-point statistics may also be
reproduced.
Figure 1: (a, left) Locations of the 29 sample data taken from a true field; (b, right) sample variogram (broken line) of the 29
data versus the variogram of the true field (continuous line).
Figure 2 shows (a) the true field along with the corresponding histogram; (b) kriged estimates
based on the 29 data of Figure 1, which are smoother than the true field, with a variance lower
than the actual variance; (c, d) two sequential Gaussian simulations conditioned to the
29 data, whose histograms are similar to that of the true field.
Figure 2: (a) True field and histogram; (b) kriging estimates (smoother than the true field; note that the variance of the
kriging estimates is less than the actual variance); (c, d) two sequential Gaussian simulations conditioned to the data;
the histograms of the Gaussian simulations are similar to that of the true field.
Using the spatial uncertainty model
Generating alternative realizations of the spatial distribution of an attribute is rarely a goal per se.
Rather, these realizations serve as inputs to complex transfer functions such as flow simulators
in reservoir engineering. Flow simulators consider all locations simultaneously rather then one
at a time. The processing of input realizations yields a unique value for each response, for
example, a unique value for the groundwater travel time from one location to another or
remediation cost. The histogram of the L response values, corresponding to those L input
realizations provides a measure of the response uncertainty resulting from our imperfect
knowledge of the distribution of the phenomena ( z) in space. That measure can be used in
subsequent risk analysis and decision-making. In the mining industry, simulations of the spatial
distribution of an attribute can be used for studying the technical and economic effects of
complex mining operations; for instance, complex geometries in underground mining or testing
various mining schedules on several different simulations. Thus simulations provide an
appropriate platform to study any problem relating to variability, for example risk analysis, in a
way that estimates cannot.
Chapter 3
Monte-Carlo Simulation
Let F(x; z | (n)) be the conditional cumulative distribution function (ccdf) modeling the uncertainty
about the unknown z0(x) at the point x. Rather than deriving a single estimated value z*(x) from
that ccdf, one may draw from it a series of L simulated values z(l)(x), l = 1, …, L. Each value z(l)(x)
represents a possible realization of the RV Z(x) modelling the uncertainty at the location x.
Monte-Carlo simulation proceeds in two steps:
1. A series of L independent random numbers p(l), l = 1, …, L, uniformly distributed in [0, 1], is
drawn.
2. The lth simulated value z(l)(x) is identified with the p(l)-quantile of the ccdf (Figure 3):

   z(l)(x) = F⁻¹(x; p(l) | (n)),  l = 1, …, L

The L simulated values z(l)(x) are distributed according to the conditional cdf. Indeed,

   Prob{Z(l)(x) ≤ z} = Prob{F⁻¹(x; p(l) | (n)) ≤ z}    from the previous definition
                     = Prob{p(l) ≤ F(x; z | (n))}      since F(x; z | (n)) is monotonic increasing
                     = F(x; z | (n))                   since the p(l) are uniformly distributed in [0, 1]

This property of ccdf reproduction allows one to approximate any moment or quantile of the
conditional distribution by the corresponding moment or quantile of the histogram of many
realizations z(l)(x).
Figure 3: Monte-Carlo simulation from a conditional cdf F(x; z | (n)).
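The two-step procedure above can be sketched in a few lines, here assuming a Gaussian ccdf as a stand-in for F(x; z | (n)); the conditional mean and standard deviation are illustrative values.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical ccdf at x: Gaussian with conditional mean 1.2 and
# conditional standard deviation 0.5 (illustrative stand-ins).
mean, sd = 1.2, 0.5

# Step 1: draw L independent uniform random numbers p(l) in [0, 1].
L = 100_000
p = rng.uniform(0.0, 1.0, size=L)

# Step 2: map each p(l) through the inverse ccdf (quantile function)
# to obtain the simulated values z(l)(x).
z = norm.ppf(p, loc=mean, scale=sd)

# The simulated values reproduce the conditional distribution:
print(z.mean(), z.std())   # close to 1.2 and 0.5
```

This is exactly the ccdf-reproduction property derived above: the histogram of the L draws approximates the conditional distribution.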
Modeling spatial uncertainty
The basic idea is to generate a set of equiprobable realizations of the joint (spatial) distribution
of attribute values at several locations and to use differences among simulated maps as a
measure of uncertainty. Rather than modeling the uncertainty at one location, a set of simulated
maps {z(l)(x_j), j = 1, …, N}, l = 1, …, L, can be generated by sampling the N-variate or N-point
ccdf that models the joint uncertainty at the N locations x_j:

   F(x1, x2, …, xN; z1, z2, …, zN | (n)) = Prob{Z(x1) ≤ z1, Z(x2) ≤ z2, …, Z(xN) ≤ zN | (n)}
Inference of the above N-point conditional cdf requires knowledge of, or stringent hypotheses about,
the spatial law (multivariate distribution) of the RF Z(x). Ccdfs can be modeled using either a
parametric approach (a model is assumed for the multivariate distribution) or a non-parametric (indicator)
approach. In the parametric approach, the multiGaussian RF model is commonly adopted
because it is one model whose spatial law is fully determined by the z-covariance function; it
underlies several simulation algorithms, such as the LU decomposition algorithm, sequential
Gaussian simulation and turning bands simulation. Other Gaussian-related techniques include the
truncated Gaussian and pluriGaussian simulation algorithms.
Two shortcomings of the parametric approach are:
1. The spatial uncertainty assessment becomes very complex as the number of grid nodes
increases.
2. It is cumbersome to check in practice the validity of the Gaussian assumption, and data
sparsity prevents us from performing such checks for more than two locations at a time.
Chapter 4
The MultiGaussian RF Model
The spatial law of the RF Z(x), as derived from the assumed model, must be congenial enough
that all ccdfs F(x; z | (n)), x ∈ A, have the same analytical expression and are fully specified
through a few parameters. The problem of determining the ccdf at location x thus reduces to
that of estimating a few parameters, say the mean and variance. The multivariate Gaussian RF
model is the most widely used because its extremely congenial properties render the inference of
the ccdf parameters straightforward. The approach typically requires a prior normal score
transform of the data to ensure that at least the univariate distribution (histogram) is normal. The
normal score ccdf then undergoes a back-transform to yield the ccdf of the original variable.
If {Y(x), x ∈ A} is a standard multivariate Gaussian RF with covariance function C_Y(h), then the
following are true (Goovaerts, 1997):
1. All subsets of that RF, e.g. {Y(x), x ∈ D ⊂ A}, are also multivariate normal.
2. The univariate cdf of any linear combination of the RV components is normal:

   U = Σ_{α=1}^{n} λ_α Y(x_α)

   is normally distributed for any choice of the n locations x_α ∈ A and any set of weights λ_α.
3. The bivariate distribution of any pair of RVs Y(x) and Y(x + h) is normal and fully
determined by the covariance function C_Y(h).
4. If two RVs Y(x) and Y(x′) are uncorrelated, i.e. if Cov{Y(x), Y(x′)} = 0, they are also
independent.
5. All conditional distributions of any subset of the RF Y(x), given realizations of any other
subset of it, are (multivariate) normal. In particular, the conditional distribution of the
single variable Y(x) given the n data y(x_α) is normal and fully characterized by its two
parameters, the conditional mean and the conditional variance of the RV Y(x) given the
information (n):

   G(x; y | (n)) = G( (y − E{Y(x) | (n)}) / √Var{Y(x) | (n)} )

where G(·) is the standard normal cdf.
Under the multiGaussian model, the mean and variance of the ccdf at any location x are
identical to the simple kriging (SK) estimate y*_SK(x) and the SK variance σ²_SK(x) obtained from
the n data y(x_α) (Journel and Huijbregts, 1978). The ccdf is then modelled as

   G(x; y | (n)) = G( (y − y*_SK(x)) / σ_SK(x) )

with

   y*_SK(x) = m(x) + Σ_{α=1}^{n} λ_α [y(x_α) − m(x_α)]

   σ²_SK(x) = C(0) − Σ_{α=1}^{n} λ_α C(x − x_α)
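The SK estimate and variance above can be sketched as follows. The exponential covariance model, its parameters, and the data values are illustrative assumptions, not taken from the report.

```python
import numpy as np

def sk_mean_variance(coords, values, x0, mean, cov):
    """Simple kriging estimate and variance at x0 from n data.

    cov(h) is an isotropic covariance function of separation distance;
    the data have known stationary mean `mean`.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # SK system: C_ab * lambda = c_a0, with C_ab = C(x_a - x_b)
    d_ab = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_a0 = np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=-1)
    lam = np.linalg.solve(cov(d_ab), cov(d_a0))
    est = mean + lam @ (values - mean)            # y*_SK(x)
    var = cov(0.0) - lam @ cov(d_a0)              # sigma^2_SK(x)
    return est, var

# Exponential covariance with sill 1 and practical range 30 (assumed).
cov = lambda h: np.exp(-3.0 * np.asarray(h) / 30.0)

est, var = sk_mean_variance(
    coords=[(10.0, 10.0), (20.0, 30.0), (40.0, 15.0)],
    values=[0.5, -0.3, 1.1],
    x0=(25.0, 20.0),
    mean=0.0,
    cov=cov,
)
print(est, var)
```

At a datum location the SK weights reduce to an indicator of that datum, so the estimate equals the datum value and the variance is zero, consistent with the exactness property used later in sequential simulation.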
Normal Score Transform
The multiGaussian approach is very convenient: the inference of the ccdf reduces to solving a
simple kriging system at any location x. The trade-off is the assumption that the data follow a
multiGaussian distribution, which implies first that the one-point distribution of the data (histogram)
is normal. However, many variables in the earth sciences show an asymmetric distribution with a
few very large values (positive skewness). Thus the multiGaussian approach starts with an
identification of the standard normal distribution and involves the following steps:
1. The original z-data are first transformed into y-values with a standard normal histogram.
Such a transform is referred to as a normal score transform, and the y-values
y(x_α) = φ(z(x_α)) are called normal scores.
2. Provided the biGaussian assumption is not invalidated, the multiGaussian model is
applied to the normal scores, allowing the derivation of the Gaussian ccdf at any
unsampled location x:

   G(x; y | (n)) = Prob{Y(x) ≤ y | (n)}

3. The ccdf of the original variable is then retrieved as

   F(x; z | (n)) = Prob{Z(x) ≤ z | (n)}
                 = Prob{Y(x) ≤ φ(z) | (n)}
                 = G(x; φ(z) | (n))

under the condition that the transform function φ(·) is monotonic increasing.
The normal score transform function φ(·) can be derived through a graphical correspondence
between the univariate cdfs of the original and standard normal variables (Figure 4).
Let F(z) and G(y) be the stationary univariate cumulative distribution functions (cdfs) of the original
RF Z(x) and the standard normal RF Y(x):

   F(z) = Prob{Z(x) ≤ z}
   G(y) = Prob{Y(x) ≤ y}

The transform that allows one to go from a RF Z(x) with cdf F(z) to a RF Y(x) with standard
Gaussian cdf G(y) is depicted by arrows in Figure 4 and is written as

   Y(x) = φ(Z(x)) = G⁻¹[F(Z(x))]

where G⁻¹(·) is the inverse Gaussian cdf, or quantile function, of the RF Y(x).
Figure 4: Graphical procedure for transforming the cumulative distribution of the original z-values into the
standard normal distribution of the y-values, called normal scores.
In practice, the normal score transform proceeds in three steps:
1. The original data {z(x_α), α = 1, …, n} are ranked in ascending order. Since the normal
score transform must be monotonic, ties in z-values must be broken.
2. The sample cumulative distribution function F*(z) of the original variable is calculated.
3. The normal score transform of the z-datum with rank k is matched to the p*_k-quantile of
the standard normal cdf:

   y(x_α) = G⁻¹[F*(z(x_α))] = G⁻¹(p*_k)
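The three steps can be sketched as follows, using the common plotting-position estimate p_k = (k − 0.5)/n of the sample cdf (an assumed convention; others exist), and ignoring declustering weights.

```python
import numpy as np
from scipy.stats import norm

def normal_score_transform(z):
    """Normal score transform of a sample (no declustering weights).

    Ranks the data, assigns each datum the cumulative probability
    p_k = (k - 0.5) / n, and maps it through the standard normal
    quantile function G^{-1}. Ties would get arbitrary distinct
    ranks, consistent with the requirement that ties be broken.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    # argsort of argsort yields the 0-based rank of each datum
    ranks = np.argsort(np.argsort(z))
    p = (ranks + 0.5) / n          # plotting-position estimate of F*(z)
    return norm.ppf(p)

z = np.array([0.4, 3.1, 1.2, 0.9, 7.5, 2.2, 0.1, 1.8])
y = normal_score_transform(z)
print(y)   # normal scores, symmetric about 0
```

Because the transform is a rank mapping, it is monotonic: the ordering of the y-values matches the ordering of the z-values exactly.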
Chapter 5
The Sequential Simulation Genre
The wide class of simulation algorithms known under the generic name sequential simulation is
essentially based on the same underlying theory: instead of modeling the N-variate ccdf, a
univariate ccdf is modeled and sampled at each of the N nodes visited along a random
sequence. To ensure reproduction of the z-covariance model, each univariate ccdf is made
conditional not only to the original n data but also to all values simulated at previously visited
locations.
Let {Z(x_j), j = 1, …, N} be a set of random variables defined at N locations x_j within the study
area A. These locations need not be gridded. The objective is to generate several joint
realizations of these N RVs,

   {z(l)(x_j), j = 1, …, N},  l = 1, …, L,

conditional to the data set {z(x_α), α = 1, …, n}.
Let us consider the joint simulation of z-values at two locations only, say x1 and x2. A set of
realizations {z(l)(x1), z(l)(x2)}, l = 1, …, L, can be generated by sampling the bivariate ccdf:

   F(x1, x2; z1, z2 | (n)) = Prob{Z(x1) ≤ z1, Z(x2) ≤ z2 | (n)}

An alternative approach is provided by the Bayes axiom, whereby any bivariate ccdf can be
expressed as the product of two univariate ccdfs:

   F(x1, x2; z1, z2 | (n)) = F(x2; z2 | (n+1)) · F(x1; z1 | (n))
where (n+1) denotes conditioning to the n data z(x_α) and to the realization Z(x1) = z(l)(x1).
The above decomposition allows one to generate the pair {z(l)(x1), z(l)(x2)} in two steps: the
value z(l)(x1) is first drawn from the ccdf F(x1; z1 | (n)); then the ccdf at location x2 is
conditioned to the realization z(l)(x1) in addition to the original data (n), and its sampling yields
the correlated value z(l)(x2). The idea is to trade the sampling, hence the modeling, of the bivariate
ccdf for the sequential sampling of two univariate ccdfs that are easier to infer; hence the generic name
sequential simulation algorithm.
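The two-step decomposition can be illustrated for a pair of standard Gaussian RVs with an assumed correlation ρ: the conditional ccdf of Z(x2) given Z(x1) = z1 is then Gaussian with mean ρ·z1 and variance 1 − ρ². All values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed correlation between Z(x1) and Z(x2), as would be given by
# a covariance model C(x1 - x2); rho = 0.7 is illustrative.
rho = 0.7
L = 200_000

# Step 1: draw z(l)(x1) from its ccdf F(x1; z1 | (n)),
# here a standard normal for illustration.
z1 = rng.standard_normal(L)

# Step 2: draw z(l)(x2) from F(x2; z2 | (n+1)), i.e. conditioned on
# the realization z(l)(x1): Gaussian with mean rho*z1, variance 1 - rho^2.
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(L)

# The sequential draws reproduce the joint correlation:
print(np.corrcoef(z1, z2)[0, 1])   # close to 0.7
```

Sampling the two univariate ccdfs in sequence thus reproduces the bivariate distribution without ever constructing it explicitly.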
The sequential principle can be generalized to more than two locations. By recursive application
of the Bayes axiom, the N-variate ccdf can be written as the product of N univariate ccdfs:

   F(x1, …, xN; z1, …, zN | (n)) = F(xN; zN | (n+N−1)) · F(x_{N−1}; z_{N−1} | (n+N−2)) · … · F(x2; z2 | (n+1)) · F(x1; z1 | (n))

where, for example, F(xN; zN | (n+N−1)) is the ccdf of Z(xN) given the set of n original data
values and the (N−1) realizations Z(x_j) = z(l)(x_j), j = 1, …, N−1.
The above decomposition allows one to generate a realization of the random vector
{Z(x_j), j = 1, …, N} in N successive steps:

- Model the cdf at the first location x1, conditional to the n original data z(x_α):

   F(x1; z | (n)) = Prob{Z(x1) ≤ z | (n)}

- Draw from that cdf a realization z(l)(x1), which becomes a conditioning datum for all
subsequent drawings.
- At the ith node x_i visited, model the conditional cdf of Z(x_i) given the n original data and
all the (i−1) values z(l)(x_j) simulated at the previously visited locations x_j, j = 1, …, i−1:

   F(x_i; z | (n+i−1)) = Prob{Z(x_i) ≤ z | (n+i−1)}

- Draw from that ccdf a realization z(l)(x_i), which becomes a conditioning datum for all
subsequent drawings.
- Repeat the two previous steps until all N nodes are visited and each has been given
a simulated value.

The resulting set of simulated values {z(l)(x_j), j = 1, …, N} represents just one realization of
the RF {Z(x), x ∈ A} over the N nodes x_j. Any number L of such realizations
{z(l)(x_j), j = 1, …, N}, l = 1, …, L, can be obtained by repeating the entire sequential
process L times, possibly with different paths to visit the N nodes.
Remarks:
1. The sequential simulation algorithm requires the determination of a conditional cdf at
each location being simulated. Two major classes of sequential simulation algorithms
can be distinguished, depending on whether the series of conditional cdfs is
determined using the multiGaussian or the indicator formalism.
2. Sequential simulation ensures that data are honored at their locations (conditioning).
Indeed, at any datum location x_α, the simulated value is drawn from a zero-variance,
unit-step ccdf with mean equal to the z-datum z(x_α) itself. If large measurement errors
render the exact matching of data values questionable, one should allow the simulated
values to deviate somewhat from the data at their locations. If the errors are normally
distributed, the simulated value can be drawn from a Gaussian ccdf centered on the
datum value, with variance equal to the error variance.
3. The sequential principle can be extended to simulate several continuous or categorical
attributes.
Implementation
Search strategies
The sequential simulation algorithm requires the determination of N successive conditional cdfs
F(x1; z | (n)), …, F(xN; z | (n+N−1)), with an increasing level of conditioning information.
Correspondingly, the size of the kriging system(s) to be solved to determine these ccdfs
increases and quickly becomes prohibitive as the simulation progresses. The data closest to the
location being estimated tend to screen the influence of more distant data. Thus, in the practice
of sequential simulation, only those original data and previously simulated values closest to the
location x being simulated are retained. Good practice consists of using the semivariogram
distance γ(x − x_α) so that the conditioning data are preferentially selected along the direction of
maximum continuity.
As the simulation progresses, the original data tend to be overwhelmed by the large number of
previously simulated values, particularly when the simulation grid is dense. A balance between
the two types of conditioning information can be preserved by searching the original
data and the previously simulated values separately (two-part search): at each location x, a fixed number
n(x) of the closest original data is retained, no matter how many previously simulated values are
in the neighborhood of x.
Visiting sequence
In theory, the N nodes can be simulated in any sequence. However, because only neighboring
data are retained, artificial continuity may be generated along a deterministic path visiting the N
nodes. Hence, a random sequence or path is recommended.
When generating several realizations, the computational time can be reduced considerably by
keeping the same random path for all realizations. Indeed, the N kriging systems, one for each
node x_j, then need be solved only once, since the N conditioning data configurations remain the
same from one realization to another. The trade-off is the risk of generating realizations that
are too similar. It is therefore better to use a different random path for each realization.
Multiple grid simulation
The use of a search neighborhood limits reproduction of the input covariance model to the
radius of that neighborhood. Another obstacle to reproduction of long-range structure is the
screening of distant data by too many data closer to the location being simulated.
The multiple-grid concept (attribute values are first simulated on a coarse grid, and the simulation
then continues on progressively finer grids) allows one to reproduce long-range correlation structures without
having to consider large search neighborhoods with too many conditioning data. The values previously
simulated on the coarse grid are used as data for the simulation on the finer grid. A random
path is followed within each grid. The procedure can be generalized to any number of
intermediate grids; their number depends on the number of structures with different ranges to be
reproduced and on the final grid spacing.
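A minimal sketch of such a multiple-grid visiting order follows, assuming three nested grids with spacings 4, 2 and 1 on a small regular lattice (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def multigrid_path(nx, ny, n_grids=3):
    """Visiting order for multiple-grid simulation: coarse-grid nodes
    first, then progressively finer grids, with a random path within
    each grid; nodes already visited on a coarser grid are skipped."""
    visited = set()
    path = []
    for level in reversed(range(n_grids)):     # coarsest grid first
        step = 2 ** level
        nodes = [(i, j)
                 for i in range(0, nx, step)
                 for j in range(0, ny, step)
                 if (i, j) not in visited]
        rng.shuffle(nodes)                     # random path within grid
        path.extend(nodes)
        visited.update(nodes)
    return path

path = multigrid_path(8, 8, n_grids=3)
print(len(path), path[:4])   # all 64 nodes; coarse nodes come first
```

The nodes simulated early (on the coarse grid) then act as conditioning data for the finer grids, which is how long-range structure is imposed without a large search neighborhood.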
Chapter 6
Sequential Gaussian Simulation
Implementation of the sequential principle under the multiGaussian RF model is referred to as
sequential Gaussian simulation (sGs). Several algorithms exist: algorithms for simulating a
single attribute using only values of that attribute, with modifications to account for secondary
information, as well as algorithms for the joint simulation of several correlated attributes. Here, only the first
case, i.e. a single attribute, is considered.
Let us consider the simulation of the continuous attribute z at the N nodes x_j of a grid (not
necessarily regular), conditional to the data set {z(x_α), α = 1, …, n}.
Sequential Gaussian simulation proceeds as follows:
1. First, check the appropriateness of the multiGaussian RF model, which calls for a
prior transform of the z-data into y-data with a standard normal cdf using the normal score
transform. Normality of the bivariate distribution of the resulting normal score variable
Y(x) = φ(Z(x)) is then checked. In practice, if indicator semivariograms or ancillary
information do not invalidate the biGaussian assumption, the multiGaussian formalism is
adopted.
2. If the multiGaussian RF model is retained for the y-variable, sequential Gaussian
simulation is performed on the y-data:
   - Define a random path visiting each node of the grid only once.
   - At each node x, determine the parameters (mean and variance) of the Gaussian
     ccdf G(x; y | (n)) using SK with the normal score variogram model γ_Y(h). The
     conditioning information (n) consists of a specified number n(x) of both normal score
     data y(x_α) and values y(l)(x_j) simulated at previously visited grid nodes.
   - Draw a simulated value y(l)(x) from that ccdf, and add it to the data set.
   - Proceed to the next node along the random path, and repeat the two previous
     steps.
   - Loop until all N nodes are simulated.
3. The final step consists of back-transforming the simulated normal scores
{y(l)(x_j), j = 1, …, N} into simulated values of the original variable, which amounts to
applying the inverse of the normal score transform to the simulated y-values:

   z(l)(x_j) = φ⁻¹(y(l)(x_j)),  j = 1, …, N

with φ⁻¹(·) = F⁻¹(G(·)), where F⁻¹(·) is the inverse cdf, or quantile function, of the variable Z,
and G(·) is the standard Gaussian cdf. This back-transform allows one to identify the original
z-histogram F(z). Indeed,

   Prob{Z(l)(x) ≤ z} = Prob{φ⁻¹(Y(l)(x)) ≤ z}
                     = Prob{Y(l)(x) ≤ φ(z)}    since φ(·) is monotonic increasing
                     = G[φ(z)] = F(z)           from the definition of the normal score transform

Other realizations {z(l′)(x_j), j = 1, …, N}, l′ ≠ l, are obtained by repeating steps 2 and 3 with a
different random path.
The basic steps of the sGs algorithm are illustrated in the flow chart below. [Flow chart]
Non-stationary behaviors can be accounted for by using algorithms other than simple kriging to
estimate the mean of the Gaussian ccdf: ordinary kriging, or universal kriging of order k. However,
Gaussian theory requires that the simple kriging variance of the normal scores be used as the
variance of the Gaussian ccdf (Journel, 1980).
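The steps above can be combined into a minimal one-dimensional sketch of the sGs loop, assuming an exponential normal-score covariance and a table look-up back-transform; all data values and parameters are illustrative, and the grid is offset from the datum locations to keep the sketch simple.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def exp_cov(h, range_=10.0):
    """Exponential covariance of the normal scores (assumed model)."""
    return np.exp(-3.0 * np.abs(h) / range_)

def sgs_1d(data_x, data_z, grid_x, n_close=8):
    """Minimal 1-D sequential Gaussian simulation sketch."""
    data_x = np.asarray(data_x, float)
    data_z = np.asarray(data_z, float)

    # Step 1: normal score transform of the data (ties assumed broken).
    p = (np.argsort(np.argsort(data_z)) + 0.5) / len(data_z)
    data_y = norm.ppf(p)
    # back-transform table (y -> z), linear interpolation between quantiles
    y_tab, z_tab = np.sort(data_y), np.sort(data_z)

    cond_x = list(data_x)                  # conditioning locations
    cond_y = list(data_y)                  # conditioning normal scores
    sim_y = np.full(len(grid_x), np.nan)

    # Step 2: random path; SK of normal scores at each node, then draw.
    for j in rng.permutation(len(grid_x)):
        x0 = grid_x[j]
        cx, cy = np.asarray(cond_x), np.asarray(cond_y)
        near = np.argsort(np.abs(cx - x0))[:n_close]   # nearest data
        cx, cy = cx[near], cy[near]
        C = exp_cov(cx[:, None] - cx[None, :])
        c0 = exp_cov(cx - x0)
        lam = np.linalg.solve(C, c0)
        mu = lam @ cy                       # SK mean (stationary mean 0)
        var = max(1.0 - lam @ c0, 0.0)      # SK variance
        y_sim = mu + np.sqrt(var) * rng.standard_normal()
        sim_y[j] = y_sim
        cond_x.append(x0)                   # simulated value becomes a datum
        cond_y.append(y_sim)

    # Step 3: back-transform simulated normal scores to z-values.
    return np.interp(sim_y, y_tab, z_tab)

data_x = [5.0, 20.0, 35.0]
data_z = [1.0, 4.0, 2.5]
grid_x = np.arange(0.5, 40.5, 1.0)          # offset from datum locations
sim_z = sgs_1d(data_x, data_z, grid_x)
```

This sketch omits refinements discussed above (declustering weights, two-part search, multiple grids, tail extrapolation in the back-transform), which a production implementation would need.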
Limitations:
Various limitations and shortcomings can be attributed to sequential Gaussian simulation:
1. sGs relies on the assumption of multivariate Gaussianity, an assumption that can never be
fully checked in practice, yet always seems to be taken for granted. MultiGaussianity leads
to simulated realizations whose extremes are maximally disconnected (maximum entropy), a
property that often conflicts with geological reality.
2. sGs requires a transformation into Gaussian space before simulation and a corresponding
back-transformation after the simulation is finished. However, the primary variable to be
simulated often has to be conditioned to a secondary variable that is a linear or non-linear volume
average of the primary variable. Normal score transforms are non-linear; hence
they destroy any linear relation that exists between the primary and secondary variables,
or change the non-linearity if that relation is non-linear.
3. By theory, sGs reproduces only the normal score variogram, not the original variogram
model. Reproduction of the normal score variogram usually entails reproduction of the
original data variogram if the data histogram is not too skewed. In the case of high
skewness, however, reproduction of the variogram model after back-transformation is not
guaranteed at all.
Actually, reproduction of the covariance model C_Y(h) does not require the successive ccdf
models to be Gaussian; they can be of any type as long as their means and variances are
determined by simple kriging (Journel, 1994). This result leads to an important extension of the
sequential simulation paradigm whereby the original z-attribute values are simulated directly,
without any normal score transform. The resulting algorithm is called direct sequential simulation (dssim).
In the absence of a normal score transform and back-transform there is, however, no control on
the histogram of the simulated values. Reproduction of a target histogram can be achieved by
post-processing the dssim realizations.
Bibliography
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New
York, 512 pp.
Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics. Academic Press, New York, 600 pp.
Journel, A.G., 1980. The lognormal approach to predicting local distributions of selective mining
unit grades. Mathematical Geology, 12(4), 285–303.
Journel, A.G., 1994. Modeling uncertainty: some conceptual thoughts. In: Geostatistics for the
Next Century, pages 30–43. Kluwer, Dordrecht.