Research Assignment

The document provides well yield data from 12 wells in Virginia as measured by Wright (1985). It calculates the mean, geometric mean, and median of the well yield data and finds they differ because the data is skewed. It then calculates various statistical measures of the well yield data, including standard deviation, interquartile range, median absolute deviation, skew, and quartile skew. It discusses how the outlier in the data affects some of the measures and influences the choice of central tendency.

Uploaded by

Abdusalam Idiris
Copyright
© All Rights Reserved

1. Yields in wells penetrating rock units without fractures were measured by Wright
(1985), and are given below.

Unit well yields (in gal/min/ft) in Virginia (Wright, 1985)
0.001 0.030 0.10 0.003 0.040 0.454
0.007 0.041 0.49 0.020 0.077 1.02

Calculate the mean, the geometric mean, and the median.

Ranked data x_i (gal/min/ft):
0.001 0.003 0.007 0.020 0.030 0.040
0.041 0.077 0.10 0.454 0.49 1.02

n = 12, sum = 2.283

Mean      Geometric mean, G      Median
0.19      0.043                  0.0405

a) Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\bar{x} = \frac{0.001+0.030+0.10+0.003+0.040+0.454+0.007+0.041+0.49+0.020+0.077+1.02}{12} = \frac{2.283}{12} = 0.19$$

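b) Geometric mean

$$G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

$$G = \sqrt[12]{0.001 \cdot 0.030 \cdot 0.10 \cdot 0.003 \cdot 0.040 \cdot 0.454 \cdot 0.007 \cdot 0.041 \cdot 0.49 \cdot 0.020 \cdot 0.077 \cdot 1.02} = 0.043$$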

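c) Median
To compute the median, first rank the observations from smallest to largest, so that x_1 is the
smallest observation, up to x_n, the largest observation.

Ranked data: 0.001, 0.003, 0.007, 0.02, 0.03, 0.04, 0.041, 0.077, 0.1, 0.454, 0.49, 1.02

Then, for even n,

$$\text{Median} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} = \frac{x_6 + x_7}{2} = \frac{0.040 + 0.041}{2} = 0.0405$$

Or, equivalently, the median is the 0.5 quantile P_0.5. Its position is (n + 1)(0.5) = 6.5, which lies between x_6 and x_7, and therefore

P_0.5 = x_6 + 0.5(x_7 − x_6) = 0.040 + 0.5(0.041 − 0.040) = 0.0405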

d) Compare these estimates of location. Why do they differ?

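• They differ because the data are skewed. The more robust estimates (the median, 0.0405, and the
geometric mean, 0.043) are similar, while the mean (0.19) is larger because it is pulled upward by the few large yields.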
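The three estimates of location in exercise 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original assignment.

```python
import math
import statistics

# Unit well yields in gal/min/ft (Wright, 1985), as listed in exercise 1
yields = [0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
          0.007, 0.041, 0.49, 0.020, 0.077, 1.02]

n = len(yields)

# Arithmetic mean: sum of the observations divided by n
mean = sum(yields) / n

# Geometric mean: n-th root of the product, computed through logs
# to avoid underflow when many small values are multiplied
geo_mean = math.exp(sum(math.log(x) for x in yields) / n)

# Median: average of the 6th and 7th ranked values for n = 12
median = statistics.median(yields)

print(f"mean           = {mean:.2f}")
print(f"geometric mean = {geo_mean:.3f}")
print(f"median         = {median:.4f}")
```

Computing the geometric mean through logarithms gives the same result as the 12th root of the product, and it makes the link to the mean of the log-transformed data explicit.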

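2. For the well yield data of exercise 1, calculate the standard deviation, interquartile range (IQR), MAD, skew, and quartile skew.

Ranked data x_i (gal/min/ft):
0.001 0.003 0.007 0.020 0.030 0.040
0.041 0.077 0.10 0.454 0.49 1.02

Standard deviation, s      Skew
0.31                       2.07

(The IQR, MAD, and quartile skew are worked out in parts b, c, and e.)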

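a) Standard deviation, s

$$s = \sqrt{s^2}$$

where s^2 is the sample variance,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

$$\begin{aligned}
s^2 = \big[&(0.001-0.19)^2 + (0.030-0.19)^2 + (0.10-0.19)^2 + (0.003-0.19)^2 + (0.040-0.19)^2 \\
&+ (0.454-0.19)^2 + (0.007-0.19)^2 + (0.041-0.19)^2 + (0.49-0.19)^2 + (0.020-0.19)^2 \\
&+ (0.077-0.19)^2 + (1.02-0.19)^2\big] / (12-1)
\end{aligned}$$

$$s^2 = \frac{1.0728}{11} = 0.0975$$

$$s = \sqrt{0.0975} = 0.31$$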

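b) Interquartile range

IQR = P_0.75 − P_0.25

Position of P_0.75: (12 + 1)(0.75) = 9.75, which lies between x_9 and x_10.

P_0.75 = x_9 + 0.75(x_10 − x_9) = 0.10 + 0.75(0.454 − 0.10) = 0.3655

Position of P_0.25: (12 + 1)(0.25) = 3.25, which lies between x_3 and x_4.

P_0.25 = x_3 + 0.25(x_4 − x_3) = 0.007 + 0.25(0.020 − 0.007) = 0.01025

IQR = 0.3655 − 0.01025 = 0.355 ≈ 0.36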

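c) MAD

The calculation below is the mean absolute deviation about the mean (x̄ = 0.19):

= (|0.001-0.19|+|0.003-0.19|+|0.007-0.19|+|0.02-0.19|+|0.03-0.19|+|0.04-0.19|+|0.041-0.19|+|0.077-0.19|+|0.1-0.19|+|0.454-0.19|+|0.49-0.19|+|1.02-0.19|)/12

= 2.785/12

= 0.232

The resistant measure usually meant by MAD is the median absolute deviation, the median of |x_i − median|. With median = 0.0405, the ranked absolute deviations are
0.0005, 0.0005, 0.0105, 0.0205, 0.0335, 0.0365, 0.0375, 0.0395, 0.0595, 0.4135, 0.4495, 0.9795, so

MAD = (0.0365 + 0.0375)/2 = 0.037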

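d) Skew

Using the adjusted coefficient of skewness, with x̄ = 0.19 and s = 0.31:

$$g = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3 = 2.07$$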

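e) Quartile skew

With P_0.75 = 0.3655, P_0.50 = 0.0405, and P_0.25 = x_3 + 0.25(x_4 − x_3) = 0.01025:

$$qs = \frac{(P_{0.75}-P_{0.50})-(P_{0.50}-P_{0.25})}{P_{0.75}-P_{0.25}} = \frac{(0.3655-0.0405)-(0.0405-0.01025)}{0.3655-0.01025} = 0.83$$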

Discuss the differences between a) through c).

The largest observation is an outlier. Though the skew appears to be strongly positive, and
the standard deviation large, this is due only to the effect of that one point. The majority of
the data are not skewed, as shown by the more resistant quartile skew coefficient.
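The measures of spread and skewness in exercise 2 can be reproduced as follows. This is a sketch under stated assumptions: the `quantile` helper is a hypothetical name that implements the (n + 1)p plotting position with linear interpolation used in the hand calculation above, and both the mean absolute deviation and the median absolute deviation are computed so the two can be compared.

```python
import math
import statistics

# Ranked unit well yields in gal/min/ft (Wright, 1985)
x = sorted([0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
            0.007, 0.041, 0.49, 0.020, 0.077, 1.02])
n = len(x)


def quantile(sorted_x, p):
    """Quantile at position p*(n+1), interpolating between ranked values.

    Sketch only: valid when 1 <= p*(n+1) < n, which holds for the
    quartiles and the median of this data set.
    """
    pos = p * (len(sorted_x) + 1)
    lower = math.floor(pos)          # 1-based rank just below the position
    frac = pos - lower               # fractional distance to the next rank
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])


mean = sum(x) / n
s = statistics.stdev(x)              # sample standard deviation (n - 1)

q1, med, q3 = quantile(x, 0.25), quantile(x, 0.50), quantile(x, 0.75)
iqr = q3 - q1

# Mean absolute deviation about the mean (what the hand calculation in c) sums)
mean_abs_dev = sum(abs(v - mean) for v in x) / n

# Median absolute deviation about the median (the resistant MAD)
median_abs_dev = statistics.median(abs(v - med) for v in x)

# Adjusted coefficient of skewness
skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)

# Quartile skew coefficient
quartile_skew = ((q3 - med) - (med - q1)) / (q3 - q1)

print(f"s = {s:.2f}, IQR = {iqr:.3f}")
print(f"mean abs. dev. = {mean_abs_dev:.3f}, median abs. dev. = {median_abs_dev:.3f}")
print(f"skew = {skew:.2f}, quartile skew = {quartile_skew:.2f}")
```

The classical measures (s, skew) react strongly to the single value 1.02, while the quantile-based measures (IQR, median absolute deviation, quartile skew) depend only on the ranks near the middle of the data.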

a) It would probably be computed by weighting concentrations by the surface area
represented by each environment. The median would under-represent this mass loading.

b) The median would be a better "typical" concentration, and the IQR a better "typical"
variability, than the mean and standard deviation. This is due to the strong effect of the one
unusual point on these traditional measures.

3. The following chemical and biological data were reported by Frenzel (1988) above and
below a waste treatment plant (WTP). Graph and compare the two sets of multivariate data.
What effects has the WTP appeared to have?
Therefore, the bar chart shows that the waste treatment plant appears to have had no effect.

4. Discuss the characteristics of Correlation Coefficients

Correlation coefficients measure the strength of association between two continuous variables.
Of interest is whether one variable generally increases as the
second increases, whether it decreases as the second increases, or whether their patterns of
variation are totally unrelated.

Correlation measures observed co-variation. It does not provide evidence for a causal
relationship between the two variables. One may cause the other, as precipitation causes
runoff. They may also be correlated because both share the same cause, such as two solutes
measured at a variety of times or a variety of locations. (Both are caused by variations in the
source of the water.)

Evidence for causation must come from outside the statistical analysis, from knowledge of
the processes involved.

• Measures of correlation (here designated in general as ρ) are dimensionless and scaled to lie in the
range −1 ≤ ρ ≤ 1:

ρ = 0 when there is no correlation between the two variables.

ρ is positive when one variable increases as the second increases.

ρ is negative when they vary in opposite directions.

• The significance of the correlation is evaluated using a hypothesis test:

H0: ρ = 0 versus H1: ρ ≠ 0. When one variable is a measure of time or location, correlation
becomes a test for temporal or spatial trend.
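As one concrete instance of the general measure ρ, the linear (Pearson) correlation coefficient r and its test statistic are written out below. The text above does not single out a particular coefficient, so the choice of Pearson's r here is an assumption; the symbols x_i, y_i, s_x, s_y, and t_r are introduced only for this illustration.

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

$$t_r = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

Under H0: ρ = 0, and assuming the data are bivariate normal, t_r follows a t-distribution with n − 2 degrees of freedom; H0 is rejected when |t_r| exceeds the critical value for the chosen significance level. Rank-based measures such as Spearman's rho or Kendall's tau are alternatives when that assumption is doubtful.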

5. Are uranium concentrations correlated with total dissolved solids in the following
groundwater samples? If so, describe the strength of the relationship.

TDS, in mg/L   Uranium conc., in ppb   TDS, in mg/L   Uranium conc., in ppb
682.65         0.9315                  1240.81        6.8559
819.12         1.9380                  538.35         0.4806
303.76         0.2919                  607.75         1.1452
1151.40        11.9042                 705.89         6.0876
582.42         1.5674                  1290.57        10.8823
1043.39        2.0623                  526.09         0.1473
634.84         3.8858                  784.68         2.6741
1087.25        0.9772                  953.14         3.0918
1123.51        1.9354                  1149.31        0.7592
688.09         0.4367                  1074.22        3.7101
1174.54        10.1142                 1116.59        7.2446
599.50         0.7551
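The assignment gives no worked answer for exercise 5. The sketch below shows one way the question could be approached: it computes Pearson's r and Spearman's rank correlation for the 23 samples. It assumes the larger value in each pair is TDS in mg/L and the smaller is uranium in ppb; the helper names `pearson` and `ranks` are illustrative.

```python
import math

# 23 groundwater samples: TDS in mg/L paired with uranium in ppb
tds = [682.65, 819.12, 303.76, 1151.40, 582.42, 1043.39, 634.84, 1087.25,
       1123.51, 688.09, 1174.54, 599.50, 1240.81, 538.35, 607.75, 705.89,
       1290.57, 526.09, 784.68, 953.14, 1149.31, 1074.22, 1116.59]
uranium = [0.9315, 1.9380, 0.2919, 11.9042, 1.5674, 2.0623, 3.8858, 0.9772,
           1.9354, 0.4367, 10.1142, 0.7551, 6.8559, 0.4806, 1.1452, 6.0876,
           10.8823, 0.1473, 2.6741, 3.0918, 0.7592, 3.7101, 7.2446]


def pearson(x, y):
    """Linear correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values; assumes no ties (true for this data set)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


n = len(tds)
r = pearson(tds, uranium)
# Spearman's rho is Pearson's r computed on the ranks
rho = pearson(ranks(tds), ranks(uranium))
# t statistic for H0: no linear correlation, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"n = {n}, Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, t = {t_stat:.2f}")
```

Both coefficients come out positive, so uranium tends to increase with TDS. Judging the strength and significance of the relationship still requires comparing the test statistic with a critical value, and a scatterplot is advisable because the relationship may not be linear.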
6. Discuss simple linear regression and multiple linear regression.

• Simple linear regression (SLR) describes the relationship between two continuous variables:
a response variable and a single explanatory variable.
• The name "simple linear regression" is applied because a model with one explanatory variable
is the simplest case of regression models.
• Multiple linear regression (MLR) is the extension of simple linear regression (SLR) to
the case of multiple explanatory variables.
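To make the simple linear regression case concrete, the sketch below fits y = b0 + b1·x by ordinary least squares. The function name `fit_line` and the five data points are made up for illustration (the points lie exactly on y = 1 + 2x); they do not come from the assignment.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-products divided by sum of squares of x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    # The fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = fit_line(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Multiple linear regression follows the same least-squares idea with several explanatory variables, y = b0 + b1·x1 + ... + bk·xk, and is normally solved with matrix methods rather than the closed-form expressions used here.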

7. Define outlier. What are the causes of outliers, and how do you identify outliers in a
data set?

• Outliers, observations whose values are quite different from others in the data set,
often cause concern or alarm. Outliers may be the most important points in the data
set, and should be investigated further.
• Outliers can have one of three causes:
1. A measurement or recording error.
2. An observation from a population not similar to that of most of the data, such as a
flood caused by a dam break rather than by precipitation.
3. A rare event from a single population that is quite skewed.
• Graphical methods are very helpful in identifying outliers. Whenever outliers
occur, first verify that no copying, decimal point, or other obvious error has been
made. If no such error is found, it may not be possible to determine whether the point is a valid one.
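One graphical method, the boxplot, has a simple numerical counterpart that can be sketched in code. The example below flags values lying more than 1.5 × IQR beyond the quartiles. That rule is a common convention assumed here, not one stated in the text above; the `quantile` helper is a hypothetical name, and the data are the well yields from exercise 1.

```python
import math

# Ranked unit well yields in gal/min/ft from exercise 1
x = sorted([0.001, 0.030, 0.10, 0.003, 0.040, 0.454,
            0.007, 0.041, 0.49, 0.020, 0.077, 1.02])


def quantile(sorted_x, p):
    """Quantile at position p*(n+1) with linear interpolation (sketch)."""
    pos = p * (len(sorted_x) + 1)
    lower = math.floor(pos)
    frac = pos - lower
    return sorted_x[lower - 1] + frac * (sorted_x[lower] - sorted_x[lower - 1])


q1 = quantile(x, 0.25)
q3 = quantile(x, 0.75)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are "outside"
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [v for v in x if v < lower_fence or v > upper_fence]
print(f"fences: [{lower_fence:.3f}, {upper_fence:.3f}], flagged: {flagged}")
```

For this data set only the largest yield, 1.02, falls outside the fences. A flagged point is a candidate for checking, not a value to delete automatically: it may be an error, a different population, or a rare but valid event.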

8. Write a short review of the different techniques used for quality assurance of a
hydrometric network.

Answer: A quality assurance process comprises the methods followed to achieve a specified quality and
the methods used to check the quality of an existing data set.

The process covers the following stages: measurement, data acquisition, data integration, validation, correction, dissemination, and use.

Measurement: Measured-parameter checks represent the heart of the data validation
process and normally consist of range tests, relational tests, and trend tests.
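Two of the checks named above, the range test and the trend test, can be sketched as follows. The function name `validate`, the limits, and the stage readings are all hypothetical values chosen for illustration; real limits depend on the station and the parameter.

```python
def validate(values, low, high, max_step):
    """Return, for each value, a list of the checks it fails.

    range: value lies outside the interval [low, high]
    trend: change from the previous value exceeds max_step
    """
    flags = []
    previous = None
    for value in values:
        reasons = []
        if not (low <= value <= high):
            reasons.append("range")
        if previous is not None and abs(value - previous) > max_step:
            reasons.append("trend")
        flags.append(reasons)
        previous = value
    return flags


# Hypothetical water-level readings in metres; 12.5 looks like a
# misplaced decimal point for 1.25
stage = [1.20, 1.22, 1.25, 12.5, 1.27, 1.30]
flags = validate(stage, low=0.0, high=5.0, max_step=0.5)

for value, reasons in zip(stage, flags):
    print(value, reasons or "ok")
```

Note that the reading after the suspect value also fails the trend test, because the jump back down is just as large. Flags therefore mark values to inspect rather than values that are certainly wrong.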

Data acquisition: A method to acquire and store all types of data from a wide range of parameters.

Data integration: Data integration procedures are the methods for combining various data sets into
a unified, geographically harmonious data set.

Data validation: The inspection of all the collected data for completeness and
reasonableness, and the elimination of erroneous values. This step transforms raw data into
validated data.

Correction: Proper correction of data is crucial for data processing. To get quality data, it is
necessary to go through each record.
• Double mass analysis is a technique commonly employed to determine corrections to
hydrological data to account for changes in data collection procedures or other local
conditions.
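The idea behind double mass analysis can be sketched numerically. The station's cumulative totals are compared with those of a reference record, and a change in slope marks a change in conditions. All numbers below are hypothetical, the break point is assumed to have been read from the plot beforehand, and adjusting the later period to match the earlier one is an assumption about which period is trusted.

```python
from itertools import accumulate

# Hypothetical annual rainfall totals in mm; the station reads 20% low
# from year 6 onward, for example after a gauge was relocated
reference = [900, 1100, 1000, 950, 1050, 1000, 900, 1100, 1050, 950]
station = [900, 1100, 1000, 950, 1050, 800, 720, 880, 840, 760]

cum_ref = list(accumulate(reference))
cum_sta = list(accumulate(station))

break_year = 5  # number of years before the change (assumed known)

# Slope of the double mass curve before and after the break
slope_before = cum_sta[break_year - 1] / cum_ref[break_year - 1]
slope_after = ((cum_sta[-1] - cum_sta[break_year - 1])
               / (cum_ref[-1] - cum_ref[break_year - 1]))

# Correction factor that brings the later period back to the earlier slope
factor = slope_before / slope_after
corrected = station[:break_year] + [v * factor for v in station[break_year:]]

print(f"slope before = {slope_before:.2f}, slope after = {slope_after:.2f}")
print(f"correction factor = {factor:.2f}")
```

In practice the reference is usually the mean of several nearby stations, and the break is accepted only if it is supported by station history.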

Dissemination: The data can be accessed through special views so that users
can execute queries.

Use: The handling of observational data until they are in a form ready to be used for a specific purpose.
To make proper use of data, they should be stored in such a way that all possible errors are
avoided and the data are easily accessible.

Variability check: To detect obvious errors in water level, discharge, groundwater, sediment,
rainfall, and evaporation data, three main characteristics of time series data are used as
criteria. These are:

a. Punching error: When this type of data is plotted against time, it follows a more or less
systematic pattern of variability, within which a marked deviation over a single time interval is
practically impossible. Such a deviation therefore points to a punching (data entry) error.
b. Change in trend: Water level is measured by data collecting agencies with automatic or
manual gauges.
c. Missing data: Time series data are collected at regular time intervals. When data are not
collected or not available for some period, the gap should be filled for analysis
purposes.
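Item c does not say how a gap should be filled. As one simple possibility, the sketch below fills interior gaps in a regularly spaced series by linear interpolation between the nearest observed neighbours. The method, the function name `fill_gaps`, and the sample values are assumptions for illustration; longer gaps usually call for other approaches, such as regression against a neighbouring station.

```python
def fill_gaps(series):
    """Fill runs of None by linear interpolation between observed neighbours.

    Gaps at the very start or end have no neighbour on one side and
    are left as None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1              # last observed index before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1               # first observed index after the gap
            if start >= 0 and end < n:
                step = (filled[end] - filled[start]) / (end - start)
                for k in range(i, end):
                    filled[k] = filled[start] + step * (k - start)
            i = end
        else:
            i += 1
    return filled


# Hypothetical daily water levels in metres, with missing readings as None
levels = [2.0, 2.5, None, None, 4.0, None, 5.0, None]
print(fill_gaps(levels))
```

Filled values should be flagged as estimates in the stored record so that later users can tell them apart from measurements.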

Quality check: If too many errors crop up, or if the surveyed area has changed greatly, the work is
updated and corrected. Preliminary data checking can be done in different ways, such as:

a. cross checking
b. general observation
c. graphical presentation, and
d. simple statistical analysis
