Sequential Gaussian Simulation
SEMINAR REPORT
Submitted by
MASTER OF TECHNOLOGY
IN
GEOEXPLORATION
At
Seminar
Contents
Chapter 1  Introduction
Chapter 2
Chapter 3  Monte-Carlo Simulation
Chapter 4
Chapter 5
Chapter 6
Bibliography
Chapter 1
Introduction
Spatial interpolation concerns how to estimate the variable under study at an unsampled
location given sample observations at nearby locations. This estimation process (kriging)
aims at computing the minimum-error-variance (optimal) estimate of the unknown value and the
associated error variance at the unsampled location.
In many applications, however, we are more interested in modeling the uncertainty about the
unknown than in deriving a single estimate. Uncertainty is modeled through conditional
probability distributions. The distribution function
$F(\mathbf{x}; z \mid (n)) = \operatorname{Prob}\{Z(\mathbf{x}) \le z \mid (n)\}$,
conditional to the information available $(n)$, fully models that uncertainty in the sense that
probability intervals can be derived, such as
$\operatorname{Prob}\{Z(\mathbf{x}) \in (a, b] \mid (n)\} = F(\mathbf{x}; b \mid (n)) - F(\mathbf{x}; a \mid (n))$.
It is worth noting that these probability intervals are independent of any particular estimate $z^*(\mathbf{x})$ of
the unknown value $z(\mathbf{x})$. Indeed, uncertainty depends on the information available $(n)$, not on the
particular optimality criterion retained to define an estimate. Such a model of local uncertainty
allows one to evaluate the risk involved in any decision-making process, such as the delineation of
rich zones of mineralization where a drill-core sampling programme needs to be planned. From
the model of uncertainty, one can also derive estimates that are optimal for different criteria, customized
to the specific problem at hand, instead of retaining the least-squares error (kriged) estimate.
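As an illustration, assume (hypothetically) that the ccdf at x is lognormal, i.e. ln Z(x) given (n) is Gaussian with mean mu and standard deviation s; the values of mu and s below are illustrative only. The sketch derives a probability interval and two estimates that are optimal for different criteria: the conditional mean (least-squares loss) and the conditional median (absolute-error loss).

```python
from math import exp, log
from statistics import NormalDist

# Hypothetical lognormal ccdf: ln Z(x) | (n) ~ N(mu, s^2)
mu, s = 0.5, 0.8
std_normal = NormalDist()

def ccdf(z: float) -> float:
    """F(x; z | (n)) = Prob{Z(x) <= z | (n)} under the assumed lognormal model."""
    return std_normal.cdf((log(z) - mu) / s)

def prob_interval(a: float, b: float) -> float:
    """Prob{Z(x) in (a, b] | (n)} = F(x; b | (n)) - F(x; a | (n))."""
    return ccdf(b) - ccdf(a)

# Two estimates derived from the same model of uncertainty
e_type = exp(mu + 0.5 * s ** 2)   # conditional mean: minimizes expected squared error
median = exp(mu)                  # conditional median: minimizes expected absolute error

print(f"Prob{{1 < Z <= 3}} = {prob_interval(1.0, 3.0):.3f}")
print(f"mean estimate = {e_type:.3f}, median estimate = {median:.3f}")
```

Because this ccdf is positively skewed, the two estimates differ (the mean exceeds the median) even though both are read from the same distribution.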
Each conditional cumulative distribution function (ccdf) $F(\mathbf{x}; z \mid (n))$ provides a measure of local
uncertainty relating to a specific location $\mathbf{x}$. However, a series of single-point ccdfs does not provide
any measure of multiple-point or spatial uncertainty, such as the probability that a string of
locations jointly exceeds a given threshold value. Most applications require a measure of the joint
uncertainty about attribute values at several locations taken together. Such spatial uncertainty is
modeled by generating a set of multiple equiprobable realizations $\{z^{(l)}(\mathbf{x}), \mathbf{x} \in A\}$, $l = 1, \dots, L$,
of the joint distribution of attribute values in space, a process known as stochastic simulation.
The set of alternative realizations provides a visual and quantitative measure (a model) of
spatial uncertainty. All of these realizations reasonably match the same sample statistics and
exactly match the conditioning data. Each realization reproduces the variability of the input data
in the multivariate sense; hence it is said to represent the geological texture or true spatial variability
of the phenomenon.
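Given L realizations, a multiple-point probability such as "a string of locations jointly exceeds a threshold" can be approximated by the proportion of realizations in which every location of the string exceeds it. A minimal sketch; the realization array below is a hypothetical stand-in (independent random draws) used only to exercise the function.

```python
import random

def joint_exceedance_prob(realizations, locations, threshold):
    """Proportion of realizations in which every listed location exceeds the threshold.

    realizations: list of L realizations, each a list of simulated values indexed by node.
    locations:    indices of the nodes forming the string of locations.
    """
    hits = sum(
        1 for real in realizations if all(real[i] > threshold for i in locations)
    )
    return hits / len(realizations)

# Hypothetical stand-in for L = 1000 realizations over 10 nodes; real realizations
# would come from a stochastic simulation algorithm and be spatially correlated.
rng = random.Random(42)
reals = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(1000)]
print(joint_exceedance_prob(reals, locations=[2, 3, 4], threshold=0.5))
```

With spatially correlated realizations this proportion generally differs from the product of the single-point exceedance probabilities, which is why single-point ccdfs are not sufficient.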
Chapter 2
reason, however, for these estimators to reproduce the spatial variability of the true values { z0(x)}.
In the case of kriging, for instance, the minimization of the estimation variance involves a
smoothing of the true dispersions. Typically, small values are overestimated, whereas large
values are underestimated. Another drawback of estimation is that the smoothing is not uniform.
Rather, it depends on the local data configuration: smoothing is minimal close to the data
locations and increases as the location being estimated gets farther away from the data
locations. A map of kriging estimates appears more variable in densely sampled areas than in
sparsely sampled areas.
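The distance dependence of the smoothing can be seen in the simplest case: simple kriging (SK) from a single datum at distance h. Assuming, for illustration, an exponential covariance C(h) = C(0) exp(-3h/a) with sill C(0) = 1 and practical range a = 10, the SK weight is C(h)/C(0), so the variance of the estimator is C(h)^2/C(0), which equals C(0) minus the SK variance and shrinks as h grows.

```python
from math import exp

SILL = 1.0    # C(0), assumed
RANGE = 10.0  # practical range a, assumed

def cov(h: float) -> float:
    """Assumed exponential covariance model C(h) = C(0) * exp(-3h / a)."""
    return SILL * exp(-3.0 * h / RANGE)

def sk_one_datum(h: float):
    """SK weight, SK variance and estimator variance for one datum at distance h."""
    lam = cov(h) / SILL
    sk_var = SILL - lam * cov(h)
    est_var = lam ** 2 * SILL        # Var{Y*(x)} = C(0) - sk_var
    return lam, sk_var, est_var

for h in (0.0, 2.0, 5.0, 10.0, 20.0):
    lam, sk_var, est_var = sk_one_datum(h)
    print(f"h={h:5.1f}  weight={lam:.3f}  SK var={sk_var:.3f}  Var(estimate)={est_var:.3f}")
```

At the datum location (h = 0) the estimate keeps the full variance C(0); far from the datum it collapses toward the mean with almost no variance, which is the non-uniform smoothing described above.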
On the other hand, the simulation $\{z^{(l)}(\mathbf{x})\}$, with $l$ denoting the $l$th realization, has the same first
two experimental moments (mean and covariance or variogram), as well as the same histogram,
as the true values $\{z_0(\mathbf{x})\}$; i.e., it reproduces the main dispersion characteristics of these true values.
However, at each point $\mathbf{x}$, the simulated value $z^{(l)}(\mathbf{x})$ is not the best possible estimator of $z_0(\mathbf{x})$. In the
case of conditional simulation, in particular, the estimation variance of $z_0(\mathbf{x})$ by the conditionally
simulated value $z_c^{(l)}(\mathbf{x})$ is exactly twice the kriging variance.
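The factor of two follows from the classical construction of a conditional simulation by kriging, assumed here because the text does not spell it out: $z_c^{(l)}(\mathbf{x}) = z^*(\mathbf{x}) + [s^{(l)}(\mathbf{x}) - s^{*(l)}(\mathbf{x})]$, where $s^{(l)}$ is a non-conditional simulation with the same covariance and $s^{*(l)}$ is its kriged value from the same data locations. The simulated kriging error $S^{(l)} - S^{*(l)}$ has zero mean, the same variance $\sigma_K^2(\mathbf{x})$ as the true kriging error, and is independent of it, so the cross term vanishes:

```latex
\begin{aligned}
E\{[Z_0(\mathbf{x}) - Z_c^{(l)}(\mathbf{x})]^2\}
  &= E\{[Z_0(\mathbf{x}) - Z^*(\mathbf{x})]^2\}
   + E\{[S^{(l)}(\mathbf{x}) - S^{*(l)}(\mathbf{x})]^2\} \\
  &= \sigma_K^2(\mathbf{x}) + \sigma_K^2(\mathbf{x})
   = 2\,\sigma_K^2(\mathbf{x})
\end{aligned}
```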
In general, the objectives of simulation and estimation are not compatible. Conditional
simulation is preferred for a better reproduction of the variability of the attribute where too much
information would otherwise be lost by the smoothing effect of kriging. Therefore, we do not
simulate if our purpose is estimation. Estimation is preferable to locate and estimate reserves,
while conditional simulation is preferred to study the dispersion of the characteristics of these
reserves, remembering that in practice the real values are known only at the experimental
points $\mathbf{x}_\alpha$. A suite of conditional simulations also provides a measure of uncertainty about the
spatial distribution of the attributes of interest.
Smooth interpolated maps should never be used for applications sensitive to the presence of
extreme values and their patterns of continuity. Consider, for example, the problem of
assessing groundwater travel times from a nuclear repository to the surface. A smooth map of
estimated transmissivities would fail to reproduce critical features, such as strings of large or
small values that form flow paths or barriers. Processing the kriged transmissivity map
through a flow simulator may therefore yield inaccurate travel times. Similarly, the risk of soil pollution by
heavy metals would be underestimated by a kriged map of metal concentrations that fails to
reproduce clusters of large concentrations above the tolerable maximum.
Figure 1: Locations of the 29 data $\mathbf{x}_\alpha$, $\alpha = 1, \dots, n$, and the variogram plotted as a function of distance.
Figure 2 shows (a) the true field along with its histogram; (b) the kriged estimates based on the
29 data of Figure 1, which are smoother than the true field (the variance of the kriged estimates
is less than the actual variance); and (c, d) two sequential Gaussian simulations conditioned to the
29 data, whose histograms are similar to that of the true field.
Figure 2: (a) True field and histogram, (b) kriging estimates (smoother than true field); notice that the variance of the
kriging estimate is less than the actual variance, (c,d) two sequential Gaussian simulations conditioned to the data;
the histograms of the Gaussian simulations are similar to the true field.
[Figure 2 panels: maps in East/North coordinates, each with its histogram (frequency versus value), for (a) the true field, (b) the kriging map, and (c), (d) simulated realizations.]
Chapter 3
Monte-Carlo Simulation
Let $F(\mathbf{x}; z \mid (n))$ be the conditional cumulative distribution function (ccdf) modeling the uncertainty
about the unknown $z_0(\mathbf{x})$ at the point $\mathbf{x}$. Rather than deriving a single estimated value $z^*(\mathbf{x})$ from
that ccdf, one may draw from it a series of $L$ simulated values $z^{(l)}(\mathbf{x})$, $l = 1, \dots, L$. Each value $z^{(l)}(\mathbf{x})$
represents a possible realization of the RV $Z(\mathbf{x})$ modeling the uncertainty at the location $\mathbf{x}$.
The Monte-Carlo simulation proceeds in two steps:
1. A series of $L$ independent random numbers $p^{(l)}$, $l = 1, \dots, L$, uniformly distributed in $[0, 1]$, is
drawn.
2. The $l$th simulated value $z^{(l)}(\mathbf{x})$ is identified with the $p^{(l)}$-quantile of the ccdf (Figure 3):
$z^{(l)}(\mathbf{x}) = F^{-1}(\mathbf{x}; p^{(l)} \mid (n)), \quad l = 1, \dots, L$
The $L$ simulated values $z^{(l)}(\mathbf{x})$ are distributed according to the ccdf. Indeed,
$\operatorname{Prob}\{Z^{(l)}(\mathbf{x}) \le z\} = \operatorname{Prob}\{F^{-1}(\mathbf{x}; p^{(l)} \mid (n)) \le z\} = \operatorname{Prob}\{p^{(l)} \le F(\mathbf{x}; z \mid (n))\} = F(\mathbf{x}; z \mid (n))$
This property of ccdf reproduction allows one to approximate any moment or quantile of the
conditional distribution by the corresponding moment or quantile of the histogram of many
realizations $z^{(l)}(\mathbf{x})$.
Figure 3: Monte-Carlo drawing of a simulated value $z^{(l)}(\mathbf{x})$ from the ccdf $F(\mathbf{x}; z \mid (n))$, plotted against the z-value.
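The two Monte-Carlo steps can be sketched with the Python standard library, assuming for illustration that the ccdf at x is Gaussian with mean 2.0 and standard deviation 0.5; `NormalDist.inv_cdf` plays the role of the quantile function $F^{-1}(\mathbf{x}; p \mid (n))$. The last lines check ccdf reproduction at one threshold.

```python
import random
from statistics import NormalDist

# Assumed Gaussian ccdf at x (mean and standard deviation are illustrative only)
ccdf = NormalDist(mu=2.0, sigma=0.5)

def monte_carlo_draws(dist: NormalDist, L: int, seed: int = 0) -> list[float]:
    """Draw L simulated values as p^(l)-quantiles of the ccdf, with p^(l) ~ U(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(L):
        p = rng.random()          # uniform in [0, 1)
        while p == 0.0:           # inv_cdf requires 0 < p < 1
            p = rng.random()
        draws.append(dist.inv_cdf(p))
    return draws

sims = monte_carlo_draws(ccdf, L=10_000)
# ccdf reproduction: the proportion of draws <= z approximates F(x; z | (n))
z = 2.5
empirical = sum(v <= z for v in sims) / len(sims)
print(f"empirical {empirical:.3f} vs model {ccdf.cdf(z):.3f}")
```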
Chapter 4
3. The bivariate distribution of any pair of RVs $Y(\mathbf{x})$ and $Y(\mathbf{x} + \mathbf{h})$ is normal and fully
determined by the covariance function $C_Y(\mathbf{h})$.
4. If two RVs $Y(\mathbf{x})$ and $Y(\mathbf{x}')$ are uncorrelated, i.e., if $\operatorname{Cov}\{Y(\mathbf{x}), Y(\mathbf{x}')\} = 0$, they are also
independent.
5. All conditional distributions of any subset of the RF $Y(\mathbf{x})$, given realizations of any other
subsets of it, are (multivariate) normal. In particular, the conditional distribution of the
single variable $Y(\mathbf{x})$ given the $n$ data $y(\mathbf{x}_\alpha)$ is normal and fully characterized by its two
parameters, mean and variance, which are the conditional mean and the conditional
variance of the RV $Y(\mathbf{x})$ given the information $(n)$:
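$E\{Y(\mathbf{x}) \mid (n)\} = y^*_{SK}(\mathbf{x})$ and $\operatorname{Var}\{Y(\mathbf{x}) \mid (n)\} = \sigma^2_{SK}(\mathbf{x})$,
i.e., the simple kriging (SK) estimate $y^*_{SK}(\mathbf{x})$ and SK variance $\sigma^2_{SK}(\mathbf{x})$ obtained from the $n$
data $y(\mathbf{x}_\alpha)$, so that
$[G(\mathbf{x}; y \mid (n))]^* = G\left(\frac{y - y^*_{SK}(\mathbf{x})}{\sigma_{SK}(\mathbf{x})}\right)$
with
$y^*_{SK}(\mathbf{x}) = m(\mathbf{x}) + \sum_{\alpha=1}^{n} \lambda_\alpha^{SK}(\mathbf{x})\,[y(\mathbf{x}_\alpha) - m(\mathbf{x}_\alpha)]$
$\sigma^2_{SK}(\mathbf{x}) = C(0) - \sum_{\alpha=1}^{n} \lambda_\alpha^{SK}(\mathbf{x})\,C(\mathbf{x}_\alpha - \mathbf{x})$
where $C(\cdot) = C_Y(\cdot)$ is the covariance of the normal scores and the SK weights solve
$\sum_{\beta=1}^{n} \lambda_\beta^{SK}(\mathbf{x})\,C(\mathbf{x}_\alpha - \mathbf{x}_\beta) = C(\mathbf{x}_\alpha - \mathbf{x})$, $\alpha = 1, \dots, n$.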
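The two parameters of the Gaussian ccdf can be computed with a small simple kriging solver. This sketch is illustrative only: it assumes 1-D coordinates, a stationary mean m = 0 for the normal scores, an exponential covariance with unit sill and practical range 10, and hypothetical data; the helper names `cov`, `solve` and `sk_ccdf` are not from any library.

```python
from math import exp
from statistics import NormalDist

def cov(h: float, sill: float = 1.0, a: float = 10.0) -> float:
    """Assumed exponential covariance of the normal scores (practical range a)."""
    return sill * exp(-3.0 * abs(h) / a)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def sk_ccdf(x, xs, ys, m=0.0):
    """Mean y*_SK(x) and variance sigma^2_SK(x) of the Gaussian ccdf at x."""
    A = [[cov(xa - xb) for xb in xs] for xa in xs]   # C(x_alpha - x_beta)
    b = [cov(xa - x) for xa in xs]                   # C(x_alpha - x)
    lam = solve(A, b)                                # SK weights
    mean = m + sum(w * (y - m) for w, y in zip(lam, ys))
    var = cov(0.0) - sum(w * c for w, c in zip(lam, b))
    return mean, var

xs, ys = [1.0, 4.0, 9.0], [0.8, -0.3, 1.2]   # hypothetical normal-score data
mean, var = sk_ccdf(5.0, xs, ys)
g = NormalDist(mean, var ** 0.5)              # G(x; y | (n))
print(f"y*_SK = {mean:.3f}, sigma2_SK = {var:.3f}, Prob{{Y(x) <= 0}} = {g.cdf(0.0):.3f}")
```

At a datum location the ccdf collapses onto the datum (zero variance), and far from all data it reverts to the marginal N(0, 1).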
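The original z-data are first transformed into y-values with a standard normal histogram.
Such a transform is referred to as a normal score transform, and the y-values
$y(\mathbf{x}_\alpha) = \varphi(z(\mathbf{x}_\alpha))$ are called normal scores. The ccdf of the original variable is then retrieved as
$F(\mathbf{x}; z \mid (n)) = \operatorname{Prob}\{Y(\mathbf{x}) \le \varphi(z) \mid (n)\} = G(\mathbf{x}; \varphi(z) \mid (n))$
under the condition that the transform function $\varphi(\cdot)$ is monotonic increasing.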
The normal score transform function $\varphi(\cdot)$ can be derived through a graphical correspondence
between the univariate cdfs of the original and standard normal variables (Figure 4).
Let $F(z)$ and $G(y)$ be the stationary univariate cumulative distribution functions (cdfs) of the original
RF $Z(\mathbf{x})$ and the standard normal RF $Y(\mathbf{x})$:
$F(z) = \operatorname{Prob}\{Z(\mathbf{x}) \le z\}$
$G(y) = \operatorname{Prob}\{Y(\mathbf{x}) \le y\}$
The transform that allows one to go from a RF $Z(\mathbf{x})$ with cdf $F(z)$ to a RF $Y(\mathbf{x})$ with standard
Gaussian cdf $G(y)$ is depicted by arrows in Figure 4 and is written as
$Y(\mathbf{x}) = \varphi(Z(\mathbf{x})) = G^{-1}[F(Z(\mathbf{x}))]$
where $G^{-1}(\cdot)$ is the inverse Gaussian cdf or quantile function of the RF $Y(\mathbf{x})$.
Figure 4: Graphical procedure for transforming the cumulative distribution $F(z)$ of original z-values into the
standard normal cumulative distribution $G(y)$ of y-values, called normal scores: $y = \varphi(z)$.
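In practice, the normal score transform of the $n$ data proceeds in three steps:
1. The original data $\{z(\mathbf{x}_\alpha), \alpha = 1, \dots, n\}$ are ranked in ascending order. Since the normal
score transform is monotonic, ties in z-values must be broken.
2. The sample cumulative distribution function $F^*(z)$ of the original data variable $z(\mathbf{x}_\alpha)$ is
calculated.
3. The normal score transform of the z-datum with rank $k$ is matched to the $p_k^*$-quantile
of the standard normal cdf:
$y(\mathbf{x}_\alpha) = \varphi(z(\mathbf{x}_\alpha)) = G^{-1}[F^*(z(\mathbf{x}_\alpha))] = G^{-1}(p_k^*)$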
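A minimal sketch of the three steps for a small, hypothetical data set. Two choices are assumptions rather than prescriptions from the text: ties are broken at random, and the sample cdf value of the datum with rank k is taken as p_k* = (k - 0.5)/n, a common convention that keeps every probability strictly inside (0, 1) so that G^{-1} is defined.

```python
import random
from statistics import NormalDist

def normal_score_transform(z, seed=0):
    """Return normal scores y_alpha = G^-1(p_k) for the data z_alpha.

    Assumptions: random tie-breaking and p_k = (k - 0.5) / n for rank k (1-based).
    """
    n = len(z)
    rng = random.Random(seed)
    # Step 1: rank in ascending order; the random secondary key breaks ties.
    order = sorted(range(n), key=lambda i: (z[i], rng.random()))
    g = NormalDist()
    y = [0.0] * n
    for k, i in enumerate(order, start=1):
        p_k = (k - 0.5) / n          # Step 2: sample cdf value of rank k
        y[i] = g.inv_cdf(p_k)        # Step 3: p_k-quantile of the standard normal cdf
    return y

data = [3.2, 0.5, 1.7, 1.7, 8.9, 0.1]   # hypothetical z-data with one tie
print([round(v, 3) for v in normal_score_transform(data)])
```

The two tied values 1.7 receive different normal scores, so the transform stays strictly monotonic.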
Chapter 5
Let $\{Z(\mathbf{x}_j), j = 1, \dots, N\}$ be a set of random variables defined at $N$ locations $\mathbf{x}_j$ within the study
area $A$. These locations need not be gridded. The objective is to generate several joint
realizations of these $N$ RVs, $\{z^{(l)}(\mathbf{x}_j), j = 1, \dots, N\}$, $l = 1, \dots, L$,
conditional to the data set $\{z(\mathbf{x}_\alpha), \alpha = 1, \dots, n\}$.
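Let us consider the joint simulation of z-values at two locations only, say, $\mathbf{x}_1$ and $\mathbf{x}_2$. A set of
realizations $\{z^{(l)}(\mathbf{x}_1), z^{(l)}(\mathbf{x}_2)\}$, $l = 1, \dots, L$, can be generated by sampling the bivariate ccdf,
which, by Bayes' axiom, can be decomposed as
$F(\mathbf{x}_1, \mathbf{x}_2; z_1, z_2 \mid (n)) = F(\mathbf{x}_2; z_2 \mid (n+1)) \cdot F(\mathbf{x}_1; z_1 \mid (n))$
where $(n+1)$ denotes the $n$ original data plus the value $Z(\mathbf{x}_1) = z_1$.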
The above decomposition allows one to generate the pair $\{z^{(l)}(\mathbf{x}_1), z^{(l)}(\mathbf{x}_2)\}$ in two steps: the
value $z^{(l)}(\mathbf{x}_1)$ is first drawn from the ccdf $F(\mathbf{x}_1; z_1 \mid (n))$; then the ccdf at location $\mathbf{x}_2$ is
conditioned to the realization $z^{(l)}(\mathbf{x}_1)$ in addition to the original data $(n)$, and its sampling yields
the correlated value $z^{(l)}(\mathbf{x}_2)$. The idea is to trade the sampling (and hence the modeling) of the bivariate
ccdf for the sequential sampling of two univariate ccdfs, which are easier to infer; hence the generic name
sequential simulation algorithm.
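The two-step sampling can be checked numerically in the simplest case, assuming for illustration only that there are no original data and that Y(x1) and Y(x2) are standard normal with correlation rho. Then y1 is drawn from N(0, 1) and, by the standard conditional-normal result, y2 given y1 is drawn from N(rho * y1, 1 - rho^2).

```python
import random

def sequential_pair(rho: float, L: int, seed: int = 1):
    """Draw L pairs (y1, y2): y1 from its marginal ccdf, then y2 from the ccdf
    conditioned to y1 (standard bivariate Gaussian with correlation rho assumed)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(L):
        y1 = rng.gauss(0.0, 1.0)                           # first univariate ccdf
        y2 = rng.gauss(rho * y1, (1.0 - rho ** 2) ** 0.5)  # ccdf at x2 given y1
        pairs.append((y1, y2))
    return pairs

pairs = sequential_pair(rho=0.7, L=20_000)
# The sample correlation of the pairs should be close to rho.
n_pairs = len(pairs)
m1 = sum(a for a, _ in pairs) / n_pairs
m2 = sum(b for _, b in pairs) / n_pairs
c12 = sum((a - m1) * (b - m2) for a, b in pairs) / n_pairs
v1 = sum((a - m1) ** 2 for a, _ in pairs) / n_pairs
v2 = sum((b - m2) ** 2 for _, b in pairs) / n_pairs
print(f"sample correlation = {c12 / (v1 * v2) ** 0.5:.3f}")
```

The pairs reproduce the target correlation even though the bivariate distribution is never sampled directly.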
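The sequential principle can be generalized to more than two locations. By recursive application
of Bayes' axiom, the N-variate ccdf can be written as the product of N univariate ccdfs:
$F(\mathbf{x}_1, \dots, \mathbf{x}_N; z_1, \dots, z_N \mid (n)) = F(\mathbf{x}_N; z_N \mid (n+N-1)) \cdots F(\mathbf{x}_2; z_2 \mid (n+1)) \cdot F(\mathbf{x}_1; z_1 \mid (n))$
where $(n+i-1)$ denotes the $n$ original data plus the $i-1$ values $Z(\mathbf{x}_j) = z_j$, $j = 1, \dots, i-1$.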
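The above decomposition allows one to generate a realization of the random vector
$\{Z(\mathbf{x}_j), j = 1, \dots, N\}$ in $N$ successive steps:
1. Model the cdf at the first location $\mathbf{x}_1$, conditional to the $n$ original data $z(\mathbf{x}_\alpha)$: $F(\mathbf{x}_1; z \mid (n))$.
2. Draw from that cdf a realization $z^{(l)}(\mathbf{x}_1)$, which becomes a conditioning datum for all
subsequent drawings.
3. At the $i$th node $\mathbf{x}_i$ visited, model the conditional cdf of $Z(\mathbf{x}_i)$ given the $n$ original data and all
$i-1$ values $z^{(l)}(\mathbf{x}_j)$ simulated at the previously visited locations $\mathbf{x}_j$, $j = 1, \dots, i-1$: $F(\mathbf{x}_i; z \mid (n+i-1))$.
4. Draw from that ccdf a realization $z^{(l)}(\mathbf{x}_i)$, which becomes a conditioning datum for all
subsequent drawings.
5. Repeat the two previous steps until all the $N$ nodes are visited and each has been given
a simulated value.
The resulting set of simulated values $\{z^{(l)}(\mathbf{x}_j), j = 1, \dots, N\}$ represents one realization of the RF
$\{Z(\mathbf{x}), \mathbf{x} \in A\}$ over the $N$ nodes $\mathbf{x}_j$. Any number $L$ of such realizations can be obtained by
repeating the entire sequential procedure $L$ times.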
Remarks:
1. The sequential simulation algorithm requires the determination of a conditional cdf at
each location being simulated. Two major classes of sequential simulation algorithms
can be distinguished, depending on whether the series of conditional cdfs is
determined using the multi-Gaussian or the indicator formalism.
2. Sequential simulation ensures that data are honored at their locations (the simulation is conditional).
Indeed, at any datum location $\mathbf{x}_\alpha$, the datum $z(\mathbf{x}_\alpha)$ belongs to the conditioning information, so the
simulated value is the datum value itself. When measurement errors render questionable the exact matching
of data values, one should allow the simulated values to deviate somewhat from the data at their locations.
If the errors are normally distributed, the simulated value could be drawn from a Gaussian ccdf centered on the
datum value and with a variance equal to the error variance.
3. The sequential principle can be extended to simulate several continuous or categorical
attributes.
Implementation
Search strategies
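The sequential simulation algorithm requires the determination of $N$ successive conditional cdfs,
each conditioned to an increasing amount of information: the $n$ original data plus all previously
simulated values. In practice, only the original data and previously simulated values closest to the
location $\mathbf{x}$ being simulated are retained. Good practice consists of ranking the candidates by the
semivariogram distance $\gamma(\mathbf{x}_\alpha - \mathbf{x})$ rather than the Euclidean distance, so that the conditioning data
are preferentially selected along the direction of maximum continuity.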
As the simulation progresses, the original data tend to be overwhelmed by the large number of
previously simulated values, particularly when the simulation grid is dense. A balance between
the two types of conditioning information can be preserved by searching the original data and the
previously simulated values separately (two-part search): at each location $\mathbf{x}$, a fixed number
$n(\mathbf{x})$ of closest original data are retained, no matter how many previously simulated values are
in the neighborhood of $\mathbf{x}$.
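A two-part search can be sketched as two independent nearest-neighbour selections. Assumptions: data are (coordinates, value) tuples, Euclidean distance replaces the variogram distance for simplicity, and the counts `n_orig` (playing the role of n(x)) and `n_sim` are illustrative; `two_part_search` is a hypothetical helper.

```python
from math import dist

def two_part_search(x, original, simulated, n_orig=4, n_sim=8):
    """Select conditioning data at location x with a two-part search.

    The n_orig closest original data are always kept, however many previously
    simulated values lie nearby; the n_sim closest simulated values are added.
    Euclidean distance is an assumption; a variogram-based distance would favor
    the direction of maximum continuity.
    """
    def distance(item):
        return dist(item[0], x)

    kept_orig = sorted(original, key=distance)[:n_orig]   # separate search 1
    kept_sim = sorted(simulated, key=distance)[:n_sim]    # separate search 2
    return kept_orig + kept_sim

# Hypothetical configuration: 20 simulated values crowd the origin,
# while the original data lie farther away.
original = [((10.0, 0.0), 1.2), ((0.0, 12.0), -0.4), ((30.0, 30.0), 0.3)]
simulated = [((0.5 * i, 0.0), 0.1 * i) for i in range(1, 21)]
selected = two_part_search((0.0, 0.0), original, simulated, n_orig=2, n_sim=5)
```

Even though twenty simulated values lie closer to the origin than any original datum, the two closest original data are still retained.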
Visiting sequence
In theory, the N nodes can be simulated in any sequence. However, because only neighboring
data are retained, artificial continuity may be generated along a deterministic path visiting the N
nodes. Hence, a random sequence or path is recommended.
When generating several realizations, the computational time can be reduced considerably by
keeping the same random path for all realizations. Indeed, the $N$ kriging systems, one for each
node $\mathbf{x}_j$, need be solved only once, since the $N$ conditioning data configurations remain the
same from one realization to another. The trade-off is the risk of generating realizations that
are too similar. Therefore, it is better to use a different random path for each realization.
Chapter 6
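Consider the simulation of the continuous attribute $z$ at $N$ nodes $\mathbf{x}_j$ of a grid (not necessarily regular),
conditional to the data set $\{z(\mathbf{x}_\alpha), \alpha = 1, \dots, n\}$.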
1. The first step is to check the appropriateness of the multi-Gaussian RF model, which calls for a
prior transform of z-data into y-data with a standard normal cdf using the normal score
transform. Normality of the bivariate distribution of the resulting normal score variable should
then be checked.
2. Define a random path visiting each node of the grid only once, then:
   At each node $\mathbf{x}$, determine the parameters (mean and variance) of the Gaussian
   ccdf $G(\mathbf{x}; y \mid (n))$ using SK with the normal score variogram model $\gamma_Y(\mathbf{h})$. The
   conditioning information $(n)$ consists of a specified number $n(\mathbf{x})$ of both normal score
   data $y(\mathbf{x}_\alpha)$ and values $y^{(l)}(\mathbf{x}_j)$ simulated at previously visited grid nodes.
   Draw a simulated value $y^{(l)}(\mathbf{x})$ from that ccdf, and add it to the data set.
   Proceed to the next node along the random path, and repeat the two previous steps.
   Loop until all $N$ nodes are simulated.
3. The final step consists of back-transforming the simulated normal scores
$\{y^{(l)}(\mathbf{x}_j), j = 1, \dots, N\}$ into simulated values of the original variable, by applying the inverse of
the normal score transform to the simulated y-values:
$z^{(l)}(\mathbf{x}_j) = \varphi^{-1}(y^{(l)}(\mathbf{x}_j)), \quad j = 1, \dots, N$
with $\varphi^{-1}(\cdot) = F^{-1}(G(\cdot))$, where $F^{-1}(\cdot)$ is the inverse cdf or quantile function of the variable $Z$,
and $G(\cdot)$ is the standard Gaussian cdf. This back-transform allows one to reproduce the original
z-histogram $F(z)$. Indeed,
$\operatorname{Prob}\{Z^{(l)}(\mathbf{x}) \le z\} = \operatorname{Prob}\{\varphi^{-1}(Y^{(l)}(\mathbf{x})) \le z\} = \operatorname{Prob}\{Y^{(l)}(\mathbf{x}) \le \varphi(z)\} = G[\varphi(z)] = F(z)$
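One possible implementation of the back-transform $\varphi^{-1} = F^{-1}(G(\cdot))$ interpolates between data quantiles. The sketch assumes the convention p_k = (k - 0.5)/n for the sample cdf of the k-th smallest datum, linear interpolation between these points, and clamping beyond the extreme data; practical implementations often offer options for extrapolating the tails instead. `make_back_transform` is a hypothetical helper.

```python
from bisect import bisect_left
from statistics import NormalDist

def make_back_transform(z_data):
    """Return phi^-1(y) = F^-1(G(y)) built from the sorted data (assumptions above)."""
    zs = sorted(z_data)
    n = len(zs)
    ps = [(k - 0.5) / n for k in range(1, n + 1)]   # sample cdf values of the data
    G = NormalDist()

    def back(y: float) -> float:
        p = G.cdf(y)                    # G(y)
        if p <= ps[0]:
            return zs[0]                # clamp the lower tail
        if p >= ps[-1]:
            return zs[-1]               # clamp the upper tail
        j = bisect_left(ps, p)          # ps[j - 1] < p <= ps[j]
        t = (p - ps[j - 1]) / (ps[j] - ps[j - 1])
        return zs[j - 1] + t * (zs[j] - zs[j - 1])   # F^-1(p) by linear interpolation

    return back

back = make_back_transform([3.2, 0.5, 1.7, 8.9, 0.1])   # hypothetical z-data
print([round(back(y), 3) for y in (-1.5, 0.0, 0.8)])
```

Under the same p_k convention, the normal score of each datum is mapped back exactly onto that datum.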
Other realizations $\{z^{(l')}(\mathbf{x}_j), j = 1, \dots, N\}$, $l' \ne l$, are obtained by repeating steps 2 and 3 with a
different random path.
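Step 2 of the procedure can be sketched compactly on a 1-D grid. This is a minimal illustration under several assumptions not stated in the text: the data are already normal scores, the covariance is exponential with unit sill, the neighborhood is simply the `n_max` closest known values (no two-part search), and each realization gets its own random path through its own seed. The back-transform of step 3 is not included, and `sgs_1d` is a hypothetical helper.

```python
import random
from math import exp

def cov(h: float, a: float = 10.0) -> float:
    """Assumed exponential covariance of the normal scores (unit sill, practical range a)."""
    return exp(-3.0 * abs(h) / a)

def solve(A, b):
    """Solve the SK system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def sgs_1d(nodes, data, n_max=8, seed=0):
    """One realization of normal scores at the grid nodes, conditioned to data {x: y}.

    Nodes coinciding with a datum keep the datum value (data are honored); the other
    nodes are visited along a random path, each value being drawn from the Gaussian
    ccdf N(y*_SK, sigma^2_SK) with stationary mean m = 0.
    """
    rng = random.Random(seed)
    known = dict(data)                       # conditioning set, grows as nodes are simulated
    path = [x for x in nodes if x not in known]
    rng.shuffle(path)                        # random visiting sequence
    for x in path:
        near = sorted(known, key=lambda u: abs(u - x))[:n_max]   # n(x) closest known values
        A = [[cov(u - v) for v in near] for u in near]
        b = [cov(u - x) for u in near]
        lam = solve(A, b)
        mean = sum(w * known[u] for w, u in zip(lam, near))
        var = max(cov(0.0) - sum(w * c for w, c in zip(lam, b)), 0.0)
        known[x] = rng.gauss(mean, var ** 0.5)   # draw, then add to the data set
    return [known[x] for x in nodes]

nodes = [float(i) for i in range(50)]
data = {3.0: 1.5, 20.0: -0.8, 41.0: 0.4}      # hypothetical normal-score data
realizations = [sgs_1d(nodes, data, seed=l) for l in range(5)]   # one path per seed
```

Reusing one seed for every realization would reuse the path (and the kriging systems), which is faster but, as noted above, risks realizations that are too similar.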
Non-stationary behaviors could be accounted for by using algorithms other than simple kriging to estimate
the mean of the Gaussian ccdf, such as ordinary kriging or universal kriging of order $k$. However, Gaussian
theory requires that the simple kriging variance of the normal scores be used as the variance of the Gaussian
ccdf (Journel, 1980).
Limitations:
Various limitations and shortcomings can be attributed to sequential Gaussian simulation (sGs):
1. sGs relies on the assumption of multivariate Gaussianity, an assumption that can never be
fully checked in practice, yet always seems to be taken for granted. Multi-Gaussianity leads
Bibliography
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York, 512 pp.
Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics. Academic Press, New York, 600 pp.
Journel, A.G., 1980. The lognormal approach to predicting local distributions of selective mining unit grades. Mathematical Geology, 12(4), 285-303.
Journel, A.G., 1994. Modeling uncertainty: Some conceptual thoughts. In: Geostatistics for the Next Century, pp. 30-43. Kluwer, Dordrecht.