Research Assignment
Yields in wells penetrating rock units without fractures were measured by Wright
(1985), and are given below.
Unit well yields (in gal/min/ft) in Virginia (Wright, 1985)
0.001  0.030  0.10
0.003  0.040  0.454
0.007  0.041  0.49
0.020  0.077  1.02
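Calculate the following, using sum = 2.283 and n = 12:
a) Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2.283}{12} = 0.19$
b) Geometric mean
$GM = \sqrt[12]{0.001 \times 0.030 \times 0.10 \times 0.003 \times 0.040 \times 0.454 \times 0.007 \times 0.041 \times 0.49 \times 0.020 \times 0.077 \times 1.02}$
$= 0.043$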
c) Median
To compute the median, first rank the observations from smallest to largest, so that $x_1$ is the
smallest observation, up to $x_n$, the largest observation:
0.001, 0.003, 0.007, 0.02, 0.03, 0.04, 0.041, 0.077, 0.1, 0.454, 0.49, 1.02
Then, for even n,
$\text{Median} = \frac{x_{n/2} + x_{n/2+1}}{2}$
Or, treating the median as the 0.5 percentile, its position is
$(n+1) \times 0.5 = 6.5$
which lies between $x_6$ and $x_7$. Therefore
$P_{0.5} = x_6 + 0.5(x_7 - x_6) = 0.04 + 0.5(0.041 - 0.04) = 0.0405 \approx 0.04$
The three estimates differ because the data are skewed. The more robust estimates, the geometric
mean (0.043) and the median (0.04), are similar, while the mean (0.19) is much larger.
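The three measures of central tendency above can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the assignment.

```python
import math
import statistics

# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

n = len(yields)
mean = sum(yields) / n

# Geometric mean computed as exp(mean of ln x), which is equivalent to the
# n-th root of the product but avoids multiplying many small numbers.
geo_mean = math.exp(sum(math.log(x) for x in yields) / n)

# For even n, statistics.median averages the two middle order statistics.
median = statistics.median(yields)

print(f"mean = {mean:.2f}, geometric mean = {geo_mean:.3f}, median = {median:.4f}")
```

The mean comes out several times larger than the other two values, which is the pattern expected for right-skewed data.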
a) Standard deviation, s
$s = \sqrt{s^2}$, where $s^2$ is the sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$= [(0.001-0.19)^2 + (0.030-0.19)^2 + (0.10-0.19)^2 + (0.003-0.19)^2 + (0.040-0.19)^2 + (0.454-0.19)^2$
$+ (0.007-0.19)^2 + (0.041-0.19)^2 + (0.49-0.19)^2 + (0.020-0.19)^2 + (0.077-0.19)^2 + (1.02-0.19)^2] / (12-1)$
$s^2 = 0.0975$
$s = 0.31$
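As a check on the arithmetic, the sample variance and standard deviation can be computed directly from the definition (divisor n - 1). The sketch below uses only the standard library; the names are illustrative.

```python
import math

# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

n = len(yields)
mean = sum(yields) / n

# Sum of squared deviations about the mean, divided by n - 1 (sample variance).
sum_sq_dev = sum((x - mean) ** 2 for x in yields)
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(f"s^2 = {variance:.4f}, s = {std_dev:.2f}")
```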
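b) Interquartile range
Position of the upper quartile: (12+1)(0.75) = 9.75
$P_{0.75} = X_9 + 0.75(X_{10} - X_9) = 0.1 + 0.75(0.454 - 0.1) = 0.3655$
Position of the lower quartile: (12+1)(0.25) = 3.25
$P_{0.25} = X_3 + 0.25(X_4 - X_3) = 0.007 + 0.25(0.020 - 0.007) = 0.01025$
IQR = 0.3655 - 0.01025 = 0.35525 ≈ 0.36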
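The quartile calculation interpolates between order statistics at position (n + 1)p. A minimal sketch of that rule follows; the helper name `percentile` is hypothetical. With X_3 = 0.007 and X_4 = 0.020, interpolating at position 3.25 gives a lower quartile of 0.01025.

```python
# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

def percentile(data, p):
    """Percentile by linear interpolation at the 1-based position (n + 1) * p."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p
    k = int(pos)        # index (1-based) of the lower order statistic
    frac = pos - k      # fractional distance towards the next order statistic
    if k < 1:           # position falls before the smallest observation
        return xs[0]
    if k >= n:          # position falls at or beyond the largest observation
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

q1 = percentile(yields, 0.25)
q3 = percentile(yields, 0.75)
iqr = q3 - q1

print(f"Q1 = {q1:.5f}, Q3 = {q3:.4f}, IQR = {iqr:.2f}")
```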
c) MAD (computed here as the mean absolute deviation about the mean, 0.19)
= (|0.001-0.19| + |0.003-0.19| + |0.007-0.19| + |0.02-0.19| + |0.03-0.19| + |0.04-0.19| + |0.041-0.19|
+ |0.077-0.19| + |0.1-0.19| + |0.454-0.19| + |0.49-0.19| + |1.02-0.19|)/12
=2.785/12
=0.232
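"MAD" can mean either the mean absolute deviation about the mean (the quantity computed above) or the median absolute deviation about the median, a more resistant measure. The sketch below computes both so they can be compared; which one the exercise intends is not stated in the text, so treat the choice as an assumption.

```python
import statistics

# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

mean = sum(yields) / len(yields)
median = statistics.median(yields)

# Mean absolute deviation about the mean (sensitive to the largest value).
mean_abs_dev = sum(abs(x - mean) for x in yields) / len(yields)

# Median absolute deviation about the median (resistant to the largest value).
median_abs_dev = statistics.median(abs(x - median) for x in yields)

print(f"mean absolute deviation   = {mean_abs_dev:.3f}")
print(f"median absolute deviation = {median_abs_dev:.3f}")
```

The median-based version is several times smaller because the single large yield of 1.02 barely affects it.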
d) Skew
Skew=2.07
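The text reports the skew without showing a formula. The sketch below assumes the adjusted (bias-corrected) coefficient of skewness, g = n / ((n-1)(n-2)) × Σ((x_i - mean)/s)^3, which reproduces the reported value for these data; the function name is illustrative.

```python
import math

# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

def adjusted_skew(data):
    """Adjusted coefficient of skewness (assumed formula, see lead-in)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation with divisor n - 1.
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed

skew = adjusted_skew(yields)
print(f"skew = {skew:.2f}")
```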
e) Quartile skew
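$qs = \frac{(P_{0.75} - P_{0.50}) - (P_{0.50} - P_{0.25})}{P_{0.75} - P_{0.25}} = \frac{(0.3655 - 0.0405) - (0.0405 - 0.01025)}{0.3655 - 0.01025} = 0.83$
where $P_{0.25} = X_3 + 0.25(X_4 - X_3) = 0.007 + 0.25(0.020 - 0.007) = 0.01025$.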
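The quartile skew can be reproduced with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its default `method="exclusive"` interpolates at position (n + 1)p, the same rule used for the percentiles in this answer. This is a sketch, not part of the original solution.

```python
import statistics

# Unit well yields (gal/min/ft) from the table above (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

# Three cut points dividing the data into four groups: Q1, median, Q3.
q1, q2, q3 = statistics.quantiles(yields, n=4, method="exclusive")

# Quartile skew: difference between the upper and lower half-widths of the
# central 50% of the data, scaled by the interquartile range.
quartile_skew = ((q3 - q2) - (q2 - q1)) / (q3 - q1)

print(f"Q1 = {q1:.5f}, median = {q2:.4f}, Q3 = {q3:.4f}")
print(f"quartile skew = {quartile_skew:.2f}")
```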
The largest observation is an outlier. Though the skew appears to be strongly positive, and
the standard deviation large, this is due only to the effect of that one point. The majority of
the data are not skewed, as shown by the more resistant quartile skew coefficient.
b) The median would be a better "typical" concentration, and the IQR a better "typical"
variability, than the mean and standard deviation. This is due to the strong effect of the one
unusual point on these traditional measures.
3. The following chemical and biological data were reported by Frenzel (1988) above and
below a waste treatment plant (WTP). Graph and compare the two sets of multivariate data.
What effects has the WTP appeared to have?
The bar chart shows no apparent effect of the waste treatment plant.
Correlation measures observed co-variation. It does not provide evidence for a causal
relationship between the two variables. One may cause the other, as precipitation causes
runoff. They may also be correlated because both share the same cause, such as two solutes
measured at a variety of times or a variety of locations (both are caused by variations in the
source of the water).
Evidence for causation must come from outside the statistical analysis, from knowledge of
the processes involved.
Measures of correlation (here designated in general as ρ) have the characteristic of
being dimensionless and scaled to lie in the range −1 ≤ ρ ≤ 1.
The significance of a correlation is tested with the hypotheses
H0: ρ = 0 versus H1: ρ ≠ 0. When one variable is a measure of time or location, correlation
becomes a test for temporal or spatial trend.
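To illustrate the test of H0: ρ = 0, the sketch below computes Pearson's r (one of several correlation measures) and the statistic t = r√(n−2)/√(1−r²), which is compared with a Student's t distribution on n − 2 degrees of freedom. The data are hypothetical values invented for illustration only; they do not come from the assignment.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient; always lies in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)   # perfect correlation
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# HYPOTHETICAL data: x is time (years), y is some measured quantity.
time = [1, 2, 3, 4, 5]
value = [2.1, 2.9, 4.2, 4.8, 6.1]

r = pearson_r(time, value)
t = t_statistic(r, len(time))
print(f"r = {r:.3f}, t = {t:.1f} on {len(time) - 2} degrees of freedom")
```

Because x here is time, rejecting H0 would indicate a temporal trend. A large |t| relative to the tabulated critical value leads to rejection.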
5. Are uranium concentrations correlated with total dissolved solids in the following
groundwater samples? If so, describe the strength of the relationship.
7. Define outlier. What are the causes of outliers, and how do you identify outliers in a
data set?
Outliers, observations whose values are quite different than others in the data set,
often cause concern or alarm. Outliers may be the most important points in the data
set, and should be investigated further.
Outliers can have one of three causes:
1. A measurement or recording error.
2. An observation from a population not similar to that of most of the data, such as a
flood caused by a dam break rather than by precipitation.
3. A rare event from a single population that is quite skewed.
Graphical methods are very helpful in identifying outliers. Whenever outliers
occur, first verify that no copying, decimal point, or other obvious error has been
made. If no such error is found, it may not be possible to determine whether the point is valid.
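One common graphical screen is the boxplot, which flags points lying more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the well-yield data from question 1. The 1.5 multiplier is a widely used convention and an assumption here, not something the text specifies.

```python
import statistics

# Unit well yields (gal/min/ft) from question 1 (Wright, 1985).
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

# Quartiles by interpolation at position (n + 1) * p.
q1, _, q3 = statistics.quantiles(yields, n=4, method="exclusive")
iqr = q3 - q1

# Boxplot fences (assumed 1.5 x IQR convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in yields if x < lower_fence or x > upper_fence]
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f}), flagged = {outliers}")
```

A flagged point is only a candidate for investigation. It should be checked for recording errors before any decision is made about it.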
8. Write a short review of the different techniques used for quality assurance of a
hydrometric network.
Answer: The quality assurance process comprises the methods followed to achieve a specified
quality and the methods used to check the quality of an existing data set.
Measurement (measured-parameter checks): These tests represent the heart of the data validation
process and normally consist of range tests, relational tests, and trend tests.
Data acquisition: A method to acquire and store all types of data from a wide range of parameters.
Data integration: Data integration procedures are the methods for combining various data sets into
a unified, geographically harmonious data set.
Data validation is defined as the inspection of all the collected data for completeness and
reasonableness, and the elimination of erroneous values. This step transforms raw data into
validated data.
Correction: Proper correction of data is crucial for data processing. To obtain good-quality data,
it is necessary to go through each record.
Double mass analysis is a technique commonly employed to determine corrections to
hydrological data to account for changes in data collection procedures or other local
conditions.
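A double mass analysis can be sketched as follows: cumulative totals at the station under test are compared with cumulative totals at a reference (for example, the mean of neighbouring stations), and a change in slope marks a change in conditions. All numbers below are hypothetical, the break point is assumed to be known, and adjusting the later period to match the earlier one is just one possible choice.

```python
from itertools import accumulate

# HYPOTHETICAL annual totals: reference record and the station under test.
reference = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
station = [90.0, 90.0, 90.0, 108.0, 108.0, 108.0]
break_index = 3  # assumed: first year after the change in procedure

# Points of the double mass curve (cumulative station vs. cumulative reference).
cum_reference = list(accumulate(reference))
cum_station = list(accumulate(station))

def segment_slope(ref, sta, start, stop):
    """Slope of the double mass curve over years start..stop-1."""
    return sum(sta[start:stop]) / sum(ref[start:stop])

slope_before = segment_slope(reference, station, 0, break_index)
slope_after = segment_slope(reference, station, break_index, len(station))

# Factor that rescales the later period to the slope of the earlier period.
correction = slope_before / slope_after
adjusted = station[:break_index] + [v * correction for v in station[break_index:]]

print(f"slopes: {slope_before:.2f} -> {slope_after:.2f}, factor = {correction:.3f}")
print("adjusted station record:", [round(v, 1) for v in adjusted])
```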
Dissemination: The data are made accessible through special views so that users
can execute queries.
Use: The handling of observational data until they are in a form ready to be used for a specific
purpose. To make proper use of data, they should be stored in such a way that all possible errors
are avoided and the data are easily accessible.
Variability check: To detect obvious errors in water level, discharge, groundwater, sediment,
rainfall and evaporation data, three main characteristics of time series data are used as
criteria. These are:
a. Punching error: When this type of data is plotted against time, it follows a more or less
systematic pattern, so a marked deviation within a single time interval is highly improbable
and points to a punching (data-entry) error.
b. Change in trend: Water level is measured by data-collecting agencies with automatic or
manual gauges.
c. Missing data: Time series data are collected at regular time intervals. When data are not
collected or available for some period, the gap should be filled for analysis
purposes.
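Two of the checks above, spotting a punching error and filling a gap, can be sketched in a few lines. The water levels, the threshold `max_step`, and the use of linear interpolation for gap filling are all assumptions made for illustration; the text does not prescribe a particular method.

```python
def flag_spikes(values, max_step):
    """Indices whose value differs by more than max_step from BOTH neighbours."""
    flagged = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue  # cannot judge a point next to a gap
        if abs(cur - prev) > max_step and abs(cur - nxt) > max_step:
            flagged.append(i)
    return flagged

def fill_gaps(values):
    """Fill interior runs of None by linear interpolation between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        if b - a > 1:  # there is a gap between positions a and b
            step = (values[b] - values[a]) / (b - a)
            for i in range(a + 1, b):
                filled[i] = values[a] + step * (i - a)
    return filled

# HYPOTHETICAL daily water levels (m): 21.5 looks like a misplaced decimal
# point, and two readings are missing (None).
levels = [2.10, 2.12, 21.5, 2.15, None, None, 2.24, 2.26]

spikes = flag_spikes(levels, max_step=1.0)
filled = fill_gaps(levels)
print("suspect indices:", spikes)
print("gap-filled:", [round(v, 2) for v in filled])
```

A flagged value should be verified against the original record before it is corrected; the sketch only locates candidates.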
Quality check: If too many errors crop up, or if the surveyed area has changed greatly, the work is
updated and corrected. Preliminary data checking can be done in different ways, such as:
a. cross checking
b. general observation
c. graphical presentation, and
d. simple statistical analysis