
Dr. J.K. Mwangi Lecture Notes ECE 2410: HYDROLOGY II

Statistical methods in Hydrology

Statistical methods are very important in hydrology. Many hydrologic processes, e.g. rainfall, are
not amenable to purely deterministic analysis because of inherent uncertainties arising from the
randomness of natural processes and from data of insufficient quantity and quality. Statistical
methods account for these uncertainties and so enable useful predictions to be made; however, every
such prediction is accompanied by an associated probability of occurrence rather than a single certain outcome.

In applying statistical methods, we assume that natural processes are governed by some
mathematical rules rather than by the physical laws underlying them and it is on the basis of this
assumption that the methods are used to analyze hydrologic processes. The most common statistical
method used in hydrology is frequency analysis whose purpose is to extract information from
observed hydrologic data in order to make predictions concerning future events.

For example, suppose a data series contains 20 years of observed discharges at a stream section on
which a highway bridge is proposed. The bridge must be designed to pass the design discharge
without being flooded. What discharge should be used in the design? What are the chances that the
bridge will be flooded during its life span if a particular design discharge is used? Frequency
analysis of the available data is used to answer these questions.

Hydrologic data used in frequency analysis must represent the situation being studied; i.e. the data
set must be homogeneous. For example, future surface runoff from a developed area cannot be
determined using historical runoff data observed under undeveloped conditions. Other changes that
compromise homogeneity include relocated gauges, stream flow diversions, and construction of dams
and reservoirs during the period of observation.

Available hydrologic data may contain more information than is required for frequency analysis, in
which case the data is reduced to a useful form. For example, suppose we have a daily stream flow
record at a gauge site for the past N years (called a complete duration series). For flood studies,
however, we are usually interested only in extreme stream flow values. We can form an annual
exceedance series by considering only the highest N values on record. Alternatively, we can obtain
an annual maximum series by taking the largest value occurring in each of the N years.

Either the annual exceedance or the annual maximum series can then be used in a frequency analysis.
Results of the two approaches are very similar when extreme events of rare occurrence are being
investigated. However, it is normal to use the annual maximum series because the values included in
this series are more likely to be statistically independent, as is commonly assumed in frequency
analysis methods.

The Probability Concept


Understanding the probability concept requires definition of some terms: random variable, sample,
population, and probability distribution. A random variable is a numerical variable whose value cannot be
precisely predicted. In probabilistic methods, we treat all hydrologic variables as random variables.
These include rates of rainfall, stream flow, evaporation, wind velocity, and reservoir storage.


A sample is a set of observations of a random variable. For example, the annual maximum stream
flow observed at a gauge site during the past N years forms a sample. Likewise, the annual
maximum stream flow that will occur over some specific period in future forms another sample. We
assume that samples are drawn from an infinite hypothetical population, which is defined as the
complete assemblage of all the values representing the random variable being investigated.

A probability distribution is a mathematical expression that describes the probabilistic
characteristics of a population. It is useful in calculating the chances that a random variable drawn
from this population will fall in a specified range of numerical values. For example, the probability
distribution of annual maximum stream flow enables estimation of the chances that the maximum
stream flow will exceed a specified value in any one year in the future (useful in the design of
hydraulic structures).

Statistical Parameters
Most theoretical probability distributions are expressed in terms of statistical parameters that
characterize the population, such as the mean, standard deviation and skewness. These parameters
cannot be determined exactly, because not all values in the population are known, but they can be
estimated from a sample of the population.

Mean
Simply stated, the mean $m$ is the average of all the observed values included in a sample:

$$m = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Variance is a measure of the variability of the data. The square root of the variance is called the
standard deviation. A sample estimate of the standard deviation $s$ is given by:

$$s = \left[\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - m\right)^2\right]^{1/2}$$

Skewness (skew) is a measure of the symmetry of a probability distribution about the mean. The
skew coefficient can be estimated from the data as:

$$G = \frac{N\sum_{i=1}^{N}\left(x_i - m\right)^3}{(N-1)(N-2)\,s^3}$$

In all cases $N$ is the number of observed values of the random variable $x_i$, with $i = 1, 2, \ldots, N$.

Logarithms of observed values for many hydrologic variables tend to follow certain probability
distributions. The aforementioned statistical parameters can hence be calculated as:
$$m_l = \frac{1}{N}\sum_{i=1}^{N} \log x_i$$

$$s_l = \left[\frac{1}{N-1}\sum_{i=1}^{N}\left(\log x_i - m_l\right)^2\right]^{1/2}$$

$$G_l = \frac{N\sum_{i=1}^{N}\left(\log x_i - m_l\right)^3}{(N-1)(N-2)\,s_l^3}$$

Where ml, sl, and Gl are the mean, standard deviation and skew coefficient of the logarithms of the
observed data (all in base 10).
Example:
Annual peak discharges (Q) of the Ndarugu River at Juja are tabulated in Table 1 for the years 1952
to 1990. Determine the mean, standard deviation and skew coefficient of the data.

Table 1: Ndarugu River discharges (1952-1990)

Year       1952  1953  1954  1955  1956  1957  1958  1959  1960  1961  1962  1963  1964
Q (m³/s)   9410 11200  5860 12600  7520  7580 12100  9400  8710  6700 12900  8450  4210

Year       1965  1966  1967  1968  1969  1970  1971  1972  1973  1974  1976  1977  1978
Q (m³/s)   7030  7470  5200  6200  5800  5400  7800 19400 21100 10000 16200  8100  5640

Year       1979  1980  1981  1982  1983  1984  1985  1986  1987  1988  1989  1990
Q (m³/s)  19400 16600 11100  4940  9360 13800  8570 17500 16600  3800  7390  7060

Answers: mean = 9820 m³/s, standard deviation = 4660 m³/s and skew coefficient = 0.946.
Check the results using the mean approach.
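
The statistics above can be checked numerically. The following is a minimal sketch (Python, using only numpy) that applies the formulas defined earlier to the Table 1 values, for both the raw discharges and their base-10 logarithms; the variable names are illustrative.

```python
import numpy as np

# Annual peak discharges of Ndarugu River at Juja (m^3/s), from Table 1
q = np.array([9410, 11200, 5860, 12600, 7520, 7580, 12100, 9400, 8710,
              6700, 12900, 8450, 4210, 7030, 7470, 5200, 6200, 5800,
              5400, 7800, 19400, 21100, 10000, 16200, 8100, 5640,
              19400, 16600, 11100, 4940, 9360, 13800, 8570, 17500,
              16600, 3800, 7390, 7060], dtype=float)

def sample_stats(x):
    """Mean, standard deviation and skew coefficient, per the formulas above."""
    n = len(x)
    m = x.mean()
    s = np.sqrt(np.sum((x - m) ** 2) / (n - 1))
    g = n * np.sum((x - m) ** 3) / ((n - 1) * (n - 2) * s ** 3)
    return m, s, g

m, s, g = sample_stats(q)                # statistics of the raw data
ml, sl, gl = sample_stats(np.log10(q))   # statistics of the base-10 logarithms

print(f"m  = {m:.0f} m^3/s, s  = {s:.0f} m^3/s, G  = {g:.3f}")
print(f"ml = {ml:.4f}, sl = {sl:.4f}, Gl = {gl:.4f}")
```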

Probability Distributions
Among the many theoretical probability distributions available, normal, log-normal, Gumbel, and
log-Pearson type III distributions are the most commonly used in hydrology.

Normal Distribution
The normal distribution is the most common probability model, but it is rarely used in hydrology
because it allows random variables to assume values from -∞ to ∞, whereas most hydrologic variables,
such as stream discharge, are non-negative.

Log Normal Distribution


The logarithms of hydrologic random variables are more likely to follow the normal distribution
than the original values; in such cases, the random variable is said to be log-normally distributed.
The probability density function for the log-normal distribution is given by:

$$f_X(x) = \frac{1}{x\, s_l \sqrt{2\pi}} \exp\!\left[-\frac{\left(\log x - m_l\right)^2}{2\, s_l^{\,2}}\right]$$

where exp denotes the base of the Napierian logarithm (e ≈ 2.718) raised to the power of the
bracketed quantity, and log x is the logarithm of x as defined previously.

Gumbel Distribution
Gumbel distribution (extreme value type I distribution) is commonly used for frequency analysis
of maximum rainfall and floods. The probability density function for this distribution is given by:

$$f_X(x) = y \exp\!\left[-y\,(x-u) - e^{-y\,(x-u)}\right]$$

in which y and u are intermediate parameters, defined as

$$u = m - 0.45\,s \qquad \text{and} \qquad y = \frac{\pi}{s\sqrt{6}}$$

where m and s represent the mean and standard deviation, respectively, of the sample, as defined previously.

Log-Pearson Type III Distribution


The probability density function for the log-Pearson type III distribution is given by:

$$f_X(x) = \frac{v^{\,b}\,\left(\log x - r\right)^{\,b-1}\, e^{-v\,(\log x - r)}}{x\,\Gamma(b)}$$

where $\Gamma(b)$ is the gamma function; values of the gamma function can be found in standard tables.
Parameters b, v, and r are related to the sample statistical parameters through the expressions:

$$b = \frac{4}{G_l^{\,2}}, \qquad v = \frac{s_l}{\sqrt{b}}, \qquad r = m_l - s_l\sqrt{b}$$

where the parameters $m_l$, $s_l$, and $G_l$ are as previously obtained.

For sample sizes greater than 100, the skew coefficient to be used in the log-Pearson type III
distribution is simply $G_l$, computed as previously.
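
As a quick illustration of these expressions, the sketch below (Python) computes b, v, and r from assumed log statistics; the numerical values of ml, sl, and Gl are hypothetical placeholders, not results from the Ndarugu data.

```python
import math

# Hypothetical base-10 log statistics, used only to illustrate the expressions
ml, sl, Gl = 3.95, 0.20, 0.50

b = 4.0 / Gl ** 2             # b = 4 / Gl^2
v = sl / math.sqrt(b)         # v = sl / sqrt(b)
r = ml - sl * math.sqrt(b)    # r = ml - sl * sqrt(b)

print(f"b = {b:.2f}, v = {v:.4f}, r = {r:.4f}")
```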

For practical purposes, cumulative distribution functions are more useful than probability density
functions. Given a probability density function $f_X(x)$, the cumulative distribution function for any
distribution is expressed as:

$$F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$$

Where u is a dummy variable of integration. The lower limit of integration should be adjusted to
zero if the distribution allows only positive values. The numerical value of FX(x) represents the
probability that the random variable being modelled will take a value smaller than x.
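
As an illustration of this definition, the sketch below (Python with scipy) evaluates $F_X(x)$ by numerical integration of the Gumbel density given earlier; the sample mean and standard deviation are taken from the Ndarugu example purely as illustrative inputs.

```python
import math
from scipy.integrate import quad

# Sample statistics of the annual maxima (from the example above, m^3/s)
m, s = 9820.0, 4660.0

# Gumbel density with intermediate parameters u and y as defined earlier
y = math.pi / (s * math.sqrt(6.0))
u = m - 0.45 * s

def f_x(x):
    z = y * (x - u)
    if z < -30.0:              # density is negligible far in the lower tail
        return 0.0
    return y * math.exp(-z - math.exp(-z))

# F_X(x): integrate the density up to x (the Gumbel variable is unbounded below)
x = 15000.0
F, _ = quad(f_x, -math.inf, x)
p = 1.0 - F                    # exceedance probability
print(f"F_X({x:.0f}) = {F:.3f}, exceedance probability p = {p:.3f}")
```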

Suppose we hypothesize that one of the probability distributions discussed earlier can be used to
describe the annual maximum discharge of Ndarugu River. Using the sample mean, standard
deviation, skew coefficient, and the chosen probability density function, the above equation is
evaluated with an upper limit of integration of x = 3,000 m³/s.

Suppose further that the resultant cumulative distribution value computes to be 0.80. Then we can say
that the annual maximum discharge of Ndarugu River will be smaller than 3,000 m³/s with a probability
of 0.80 (80%) in any single year in the future. It should be noted that the numerical value of $F_X(x)$
always lies between zero and unity. Occasionally, $F_X(x)$ is referred to as the non-exceedance
probability.

Hydrologists and engineers dealing with flood studies and the design of hydraulic structures are usually
interested in the exceedance probability (p), which is expressed as $p = 1 - F_X(x)$, with p values
ranging between zero and unity. In the above case, p = 1 − 0.80 = 0.20.

This means that in any given year in the future, the maximum discharge of Ndarugu River will exceed
3,000 m³/s with a probability of 20% (0.20), which is termed the 20% exceedance probability.

Return period (recurrence interval) is defined as the average number of years between occurrences
of a hydrologic event of a certain magnitude or greater. The return period is denoted by T, where

$$T = \frac{1}{p}$$


For example, in the previous example, the return period for 3,000 m³/s will be T = 1/0.20 = 5 years,
meaning that the annual maximum discharge of Ndarugu River will exceed 3,000 m³/s once every 5
years on average.

Hydraulic structures are designed to accommodate, at full capacity, a design discharge with a
specified return period; the structure will fail to function as intended if the design discharge is
exceeded. The hydrologic risk is the probability that the design discharge will be exceeded one or
more times during the service life of the structure.

Denoting the risk by R, the exceedance probability by p, and the service life of the project in years by n:

$$R = 1 - (1 - p)^n$$
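
For instance (an illustrative combination of values, not taken from the example above), a structure designed for the 50-year discharge (p = 1/50 = 0.02) with a service life of n = 25 years carries a hydrologic risk of

$$R = 1 - (1 - 0.02)^{25} \approx 0.40$$

i.e. there is roughly a 40% chance that the design discharge will be exceeded at least once during the service life.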

Frequency Analysis
The purpose of a frequency analysis of a series of observed data of a hydrologic variable is to
determine future values of this variable corresponding to different return periods of interest. To
achieve this, we need to determine the probability distribution that best fits the available hydrologic
data using statistical means. Only after identifying a probability distribution that adequately
represents the data series can we interpolate and extrapolate from the observed data values
intelligently. Frequency factors and special probability graph papers are used for this purpose.

Frequency factors
For most theoretical distributions used in hydrology, closed-form analytical expressions are not
available for the cumulative distribution functions. However, Chow showed that the equation
$p = 1 - F_X(x)$ can be written in a more convenient form as

$$x_T = m + K_T\, s$$

where m and s are the sample mean and standard deviation, respectively, $x_T$ is the magnitude of
the hydrologic variable corresponding to a specified return period T, and $K_T$ is the frequency
factor for that return period. When log-transformed variables are used, as is the case for the
log-normal and log-Pearson type III distributions,

$$\log x_T = m_l + K_T\, s_l$$

The frequency factor to be used depends on the probability distribution in use; frequency factors for
various return periods for the Gumbel, log-normal and log-Pearson type III distributions are obtained
from standard tables. These tabulated frequency factors are applicable for estimating the magnitudes
of future events only if the probability distribution is specified; methods for testing the goodness
of fit of data to a probability distribution are available and are discussed next.
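
Although frequency factors are normally read from the standard tables mentioned above, a closed-form expression attributed to Chow is widely quoted for the Gumbel distribution, $K_T = -\frac{\sqrt{6}}{\pi}\left[0.5772 + \ln\ln\left(\frac{T}{T-1}\right)\right]$. The sketch below (Python) uses that expression to estimate $x_T = m + K_T s$ for a few return periods; the sample statistics are those of the Ndarugu example, used purely as illustrative inputs.

```python
import math

def gumbel_kt(T):
    """Gumbel frequency factor for return period T (years), closed-form expression."""
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1.0))))

# Sample statistics of the annual maxima (from the earlier example, m^3/s)
m, s = 9820.0, 4660.0

for T in (10, 50, 100):
    kt = gumbel_kt(T)
    xt = m + kt * s            # x_T = m + K_T * s
    print(f"T = {T:>3d} yr: K_T = {kt:.3f}, x_T = {xt:.0f} m^3/s")
```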

Testing Goodness of Fit


The chi-square test is a statistical procedure used to determine the goodness of fit of data to a
probability distribution. To perform a chi-square test, it is necessary to first choose a significance
level α; commonly, α = 0.10 is used in hydrology. This means that if we use α = 0.10 and, as a
result of the chi-square test, we reject the probability distribution being considered, then there is a
10% chance that we have rejected a satisfactory distribution.
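
As a hedged sketch of how such a test might be set up in practice (none of the specifics below are prescribed by these notes): the observations are grouped into class intervals, the expected count in each interval is computed from the fitted distribution, and the chi-square statistic is compared with the critical value at the chosen significance level. The example assumes a log-normal hypothesis, a handful of illustrative discharge values, four equal-probability intervals, and degrees of freedom equal to the number of intervals minus one minus the number of fitted parameters.

```python
import numpy as np
from scipy import stats

# Illustrative discharges (m^3/s); their base-10 logs are tested against a
# fitted normal distribution, i.e. the log-normal hypothesis
logq = np.log10(np.array([9410.0, 11200, 5860, 12600, 7520, 7580, 12100,
                          9400, 8710, 6700, 12900, 8450, 4210]))
n = len(logq)

ml, sl = logq.mean(), logq.std(ddof=1)     # fitted normal parameters

# k class intervals of equal probability under the fitted distribution
k = 4
edges = stats.norm.ppf(np.linspace(0.0, 1.0, k + 1), loc=ml, scale=sl)
observed = np.array([np.sum((logq >= lo) & (logq < hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])
expected = np.full(k, n / k)               # equal expected counts by construction

chi2_stat = np.sum((observed - expected) ** 2 / expected)
alpha = 0.10
dof = k - 1 - 2                            # intervals - 1 - fitted parameters
chi2_crit = stats.chi2.ppf(1.0 - alpha, dof)

print(f"chi-square = {chi2_stat:.2f}, critical value = {chi2_crit:.2f}")
print("fit acceptable" if chi2_stat <= chi2_crit else "fit rejected")
```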

Confidence Limits
Normally there are uncertainties associated with estimates made using frequency factors. Usually,
these estimates are presented within a range called a confidence interval, the upper and lower limits
of which are called the confidence limits. The width of the confidence interval depends on sample


size and confidence level. An interval is said to have a confidence level of 90% if the true value of
the estimated hydrologic variable is expected to fall in this range with a probability of 90% (0.90).

Frequency Analysis Using Probability Graphs

Probability Graphs
Graphical representation of hydrologic data is an important tool in statistical analysis. Usually, the
data is plotted on specially designed probability paper. The ordinate represents the hydrologic
variable while the abscissa represents the return period (T) or exceedance probability (p). The
ordinate scale can be linear or logarithmic, depending on the probability distribution in use. The
abscissa scale is designed such that the equation $x_T = m + K_T s$ (or $\log x_T = m_l + K_T s_l$)
plots as a theoretical straight line. When plotted, the data points should fall on or near this
straight line if the probability distribution used represents the data series adequately. With this
linear relationship, the plotted data can easily be interpolated and extrapolated.

Log-normal and Gumbel distribution graph papers are available commercially (see sample), but for the
log-Pearson type III distribution there would have to be a different graph paper for each value of the
skew coefficient, making commercial log-Pearson type III graph papers impracticable. However,
log-normal probability paper can be used for the log-Pearson type III distribution; in that case the
equation $\log x_T = m_l + K_T s_l$ plots as a smooth theoretical curve rather than a straight line,
making extrapolation of the plotted data more difficult.

Plotting Positions
The plotting position is the return period T (or exceedance probability) assigned to each data value
plotted on probability paper. Among the many methods available in the literature, the Weibull method
is the most common and is adopted here. In this method, the data values are listed in decreasing order
of magnitude and a rank (m) is assigned to each data value; i.e., if there are N data values in the
series, m = 1 for the largest value and m = N for the smallest. The return period assigned to each
data value for plotting purposes is then found as

$$T = \frac{N+1}{m}$$

or, equivalently, the exceedance probability as

$$p = \frac{m}{N+1}$$

Data Plotting and Theoretical Distributions


As discussed previously, a graphical representation of a hydrologic data series can be obtained by
plotting the data points on the specially designed probability paper corresponding to the probability
distribution appropriate to the data or to the distribution being tested.

A theoretical straight line representing the probability distribution may be plotted using the
frequency factors discussed earlier. Although two points are adequate to draw a straight line, it is
good practice to use at least three points in order to detect any computational errors. For a perfect
fit, all data points must fall on the straight line but rarely do we find a perfect fit in actual
applications. If the data points are close enough to the theoretical straight line, then the probability
distribution being tested is acceptable.


Sample probability graph paper (Normal distribution)
