Book Review
Computers & Geosciences 27 (2001) 371–373
PII: S0098-3004(00)00143-6

Statistical Analysis in Climate Research
Hans von Storch and Francis W. Zwiers; Cambridge University Press, Cambridge, 1999, x+484pp., US$ 110, ISBN 0-521-45071-3 (hardback)

Climate is a complex system: it has many variables, and they act nonlinearly, in general. Therefore, no exact answers to questions should be expected, and many climatic processes are, and will remain, poorly understood. That means that statistical analysis is indispensable in climate research. This single book provides climatologists (whether they work in modeling or in measuring, whether they are professionals seeking references or students starting to learn) with what they need to know about statistical analysis.

The introduction (Chapter 1) offers a broader and more colorful motivation than mine for statistical analysis in climatology. Chapter 2 supplies the fundamentals of probability theory such as random variables or statistical distributions. The authors adopt a frequentist viewpoint which should be easily accessible for natural scientists. Examples of distributions of climate variables (Chapter 3) over broad spatial and temporal scales illustrate the fundamentals and introduce some key meteorological quantities: 500 hPa height, sea surface temperature, streamflow, etc. The backbone of the authors’ approach is developed in Chapters 4–6: statistical inference. We humans have only a limited sample of data but wish to know the truth about the climate system. That means that, on the one hand, we can only estimate the parameters of an assumed relationship, a statistical model, and give a confidence interval which is hoped to cover the true value with a certain probability. On the other hand, the truth of hypotheses about the climate system is hidden, and we can only test them, having a certain risk of falsely rejecting a true hypothesis. These two types of inference, estimation and hypothesis testing, are explained in detail and with rigor. You learn about parametric/nonparametric estimation and robustness, maximum likelihood estimation, important sampling distributions such as Student’s t, chi-squared or F, about the tradeoff between bias and variance, bootstrap methods, etc. Chapter 7 offers Hasselmann and coworkers’ spectacular example of testing the hypothesis “man changes climate” using temperature observations and a General Circulation Model (GCM). Unlike natural climate, GCMs offer the ability to perform and repeat experiments. Linear regression (Chapter 8) is a well-elaborated statistical field of estimation and the authors render a concise description. Rightly they stress the importance of diagnosing model suitability with residual analysis. Multiple linear regression is presented conveniently in matrix notation. Robust regression, stepwise regression, and weighted regression are listed. Closely related to regression is analysis of variance (Chapter 9), involving the partitioning of variability into different regression and error components. This concept can be used in the design of GCM experiments aimed at assessing the influence of certain model parameters (Section 9.4 gives a good example).

Climate evolves in time, and stochastic processes (i.e., time-dependent random variables) and time series (i.e., the observed or sampled process) are central to statistical analysis in climate research. Chapter 10 defines stochastic processes as a composition of a deterministic part (“signal”) and a random part (“noise”). Basic models such as completely random processes and autoregressive (AR) processes are introduced. Of particular importance are AR processes of first order (AR1) which are a good model for persistence (memory) of climatic processes. Hasselmann’s (1976) famous theoretical demonstration of the AR1 climate process is explained. Further discussed are moving average processes, multivariate processes, and, of special interest, seasonal AR processes. Time series analysis aims to estimate the signal from a sampled process (Chapters 11 and 12). Estimation can be carried out in the time domain using covariance functions or in the frequency domain using spectra. Time series and spectral analysis is certainly a very wide area and the authors rightly do not attempt to be exhaustive but rather present basic concepts, such as estimation bias and variance, and testing the suitability of a fitted process.

Although Chapters 1–12 cover essential material, the remaining chapters on eigentechniques could be considered optional. These techniques are useful for analyzing GCM output or observed datasets of high dimensionality. The book covers Empirical Orthogonal Functions (Chapter 13), Canonical Correlation Analysis (Chapter 14), Principal Oscillation Patterns (Chapter 15) and complex techniques (Chapter 16). Estimation and model selection are also heavily emphasized in these
chapters, as in earlier chapters. Additional useful statistical concepts such as decorrelation time, predictability, teleconnections and forecast skill of climate predictions are presented in Chapters 17 and 18. The appendices explain notation, give a very short course on linear algebra and Fourier analysis, list statistical tables and give proofs of some theorems in the main text. The list of references has 454 entries, the index approximately 1300.

The book has two major strengths, the first being the clarity and rigor (at an “applied statistics” level) of the mathematical presentation. Rigor should not be confused with complexity: you only need a good basic knowledge of mathematics; the rest you can learn here, with pencil, paper, and concentration. The second is the very close connection between statistical theory and climatological application in the examples which you find throughout the book. You understand the need for statistics and also learn about topics such as El Niño/Southern Oscillation, Madden-Julian Oscillation, hydrology, sunspots, North Atlantic Oscillation, etc. You do not learn to blindly use some nice statistical computer program, resulting in colorful plots eventually. You rather gain a basic knowledge and learn to be aware of many pitfalls such as the dependence of statistical tests on serial correlation, the dependence of estimates on the assumptions made (e.g., the Gaussian assumption), the influence of outliers and multicollinearity in regression, the multiplicity of statistical tests, etc.

The major weakness is that nonlinear methods in statistics are neglected, although the authors stress climate’s nonlinearity in the introduction. The section on nonlinear regression mentions only transformations to linearity (which are not always possible or useful) and function minimization techniques with no details. Nonlinear time-series models are only touched on in one short subsection. Standard texts such as Seber and Wild (1989) or Tong (1990) are not referred to. The theory of nonlinear dynamic systems (“chaos”) and its relevance for statistical description of natural phenomena seems not to exist for the authors. I certainly agree that too many low-dimensional climatic attractors have been “found” using insufficient data sizes and without any model identification. However, there exist also serious nonlinear measures such as generalized redundancies (Prichard and Theiler, 1995), a kind of nonlinear correlation coefficient, with potential for further insights in climatology (Diks and Mudelsee, 2000). GCMs can produce data sizes allowing reliable estimations.

A few further points: The Law of Large Numbers is not the same as the Central Limit Theorem! The inventor of the normal distribution was de Moivre, not Gauss. Mandelbrot’s interpretation of the Hurst phenomenon might be alluded to as an alternative approach to extreme value distributions (Section 2.9). Least-squares estimation should have its own section in Chapter 5. The section on bootstrapping (Section 5.5) is not up-to-date. The stationary bootstrap (Politis and Romano, 1994) is the method of choice for dependent data. The partitioning of histograms ought not to be made subjectively (Section 5.2); rather, as in density estimation, there is a tradeoff between bias and variance, and selection rules exist. The conditions presented in Section 10.3 are for only asymptotically stationary AR processes. This might explain the residual plot in Fig. 12.5. Studentized residuals are wrongly defined (p. 156). Instead of the “decorrelation time” (Section 17.1), the decay period of the autocorrelation function of an AR1 process advantageously corresponds directly to the relevant physical timescale and might therefore be a better estimator of persistence.

The style is highly original: computers and software are hardly mentioned, the internet does not exist. I like that because it allows you to concentrate on the content. The authors’ attempt to coin meaningful phrases that prevent confusion is appreciated. However, in the case of “SSA”, the original name (Singular Spectrum Analysis) should have been given.

The book is certainly excellent value for your money and a must if you are analyzing GCM output (as the authors do) or meteorological data. If you are using paleoclimatic data, be aware that these are usually not available as equidistant time series and many methods explained in the book are not applicable (interpolation is obsolete).

It would be unfair to conclude without giving the misprinted/erroneous formulas that I discovered. They should be corrected in the next edition which the book surely will enjoy. These formulas include those for the variance of the discrete uniform distribution (p. 23), the variance of the lognormal distribution (p. 36), the multinormal density (p. 41), the binormal density (p. 43), the density of Gumbel’s distribution (p. 49), the relation between the fourth central moment and kurtosis (p. 86), the confidence interval for the intercept of a regression line (p. 153), Eq. (11.14) (p. 221/222), Eq. (11.62) (p. 234), the approximate bias of the estimated autocorrelation function (p. 252), and the decorrelation time for an AR2 process (p. 373/374).

References

Diks, C., Mudelsee, M., 2000. Redundancies in the Earth’s climatological time series. Physics Letters A 275 (5–6), 407–414.
Hasselmann, K., 1976. Stochastic climate models: Part I. Theory. Tellus 28 (6), 473–485.
Politis, D.N., Romano, J.P., 1994. The stationary bootstrap. Journal of the American Statistical Association 89 (428), 1303–1313.
Prichard, D., Theiler, J., 1995. Generalized redundancies for time series analysis. Physica D 84 (3–4), 476–493.
Seber, G.A.F., Wild, C.J., 1989. Nonlinear Regression. Wiley, New York, 768pp.
Tong, H., 1990. Non-linear Time Series. Clarendon Press, Oxford, 564pp.

Manfred Mudelsee
Institute of Meteorology, University of Leipzig,
Stephanstr. 3, D-04103 Leipzig, Germany
Tel.: +49-341-97-32-866; fax: +49-341-97-32899
E-mail address: [email protected]