Assignments 2017 PDF
• Three questions of your choice are compulsory; the remaining ones are supplementary.
• Solving the supplementary questions gives additional marks, up to a total of 30 marks.
$A = \{Y_1 = 1, Y_2 = 0, Y_3 = 1, Y_4 = 1\}, \quad B = \{\sum_{i=1}^{10} Y_i = 7\}$
a) Suppose that a request was sent to a server and was rejected. Then it was sent again to
the same server and was rejected again. If the request is now sent to another server
(chosen at random from the remaining four), what is the probability of a successful response?
b) Suppose that a request was sent to a server and was rejected. Then it was sent to another
server (randomly chosen from the remaining four) and was also rejected. Then the request is
sent to another server (randomly chosen from the remaining three). Calculate the probability
of a successful response.
a) Calculate the probability that the third red card occurs at trial i = 3, 4, . . ., assuming
that each drawn card is returned to a random location (put the card back into the deck and
shuffle).
b) Calculate the probability that the third red card occurs at trial i = 3, 4, . . ., assuming
that drawn cards are not returned to the deck but put aside.
c) Determine the probability of the event $\bigcup_{i=1}^{3} \{X_i = 0\}$ that at least one of the queues is
empty.
b) Assume that $P(A \cup B) = 5/9$. What is the maximal possible value of $P(A \cap B)$?
QUESTION 6 (10 marks) Consider independent random variables U and V and assume that U
follows a normal distribution with mean µ = 0 and variance σ² = 1 and V follows an exponential
distribution with scale parameter β = 1. Define the variables
$X = U, \quad Y = U + V.$
Assignment Part II
Statistics and financial econometrics
Autumn 2017
Problem 1 (Big Data: 15 marks) Working with big data requires specific skills and software.
The statistical language R provides diverse packages for big data analysis. One such
package is data.table, which provides an alternative to the traditional data.frame objects and their
methods. The superior computational efficiency of data.table is examined in the following
exercise.
a) Study the traditional data.frame objects and methods, including reading and writing data
(read the documentation ?data.frame, ?read.table, ?write.table). Install the package
data.table (by typing install.packages("data.table")), load it with
require("data.table") and study its documentation (?data.table, ?fread, ?fwrite;
run the examples with example(data.table)).
b) Download the file Poloniex.csv.zip (38.7 MB) from our UTSOnline site and unpack it to
Poloniex.csv (971.9 MB). These data contain market snapshots (taken at regular times) of trading
digital currencies on the Poloniex exchange https://poloniex.com/exchange#btc_eth
d) Compare this time to the time spent on market1 <- fread("Poloniex.csv", skip=1, header=T).
Also compare the performance of the data-writing operations write.csv(x=market, file="test.csv")
and fwrite(x=market1, file="test.csv") with their binary counterparts
saveRDS(object=market, file="test.rds") and saveRDS(object=market1, file="test.rds").
e) Determine the column names of the object market1. Create a data field prices by
extracting those columns whose names end with the string _last (use the command
grep(pattern="_last", ....) to extract these names). Subindex market1 on these names.
The resulting field contains the last paid prices of digital currencies at the snapshot times.
f) Extract the column representing the time of the snapshot from the data field market1 (use
sub-indexing market1[["TIME"]] or market1[,TIME]) and convert these machine times to
calendar time objects using
time_stamps <- as.POSIXct(..., origin = '1970-01-01', tz = 'GMT')
Which time range is encompassed? What is the average time interval between the snapshots?
g) Create and store (as an RDS object, using saveRDS) a data field named five_min_prices which
contains all last prices, sampled at five-minute intervals, with the first column representing
the machine time of the snapshot.
Problem 2 (Risk Analysis: 15 marks) Standard work on financial data encompasses calculating
moments, determining distributions, detecting outliers and estimating covariances. We will use
the prices of digital currencies, sampled from the Poloniex exchange at five-minute intervals, to practice
these steps.
a) Load the data generated in g) of Problem 1. (Alternatively, use readRDS to load the
data file five_min_prices.rds obtained from UTSOnline.)
b) Convert the machine times of the snapshots to calendar times (use as.POSIXct as in f) of
Problem 1) and plot Ethereum prices, expressed in Bitcoins (this is the column with the
name BTC_ETH_last), against calendar times.
e) Determine the 0.001 and 0.999 empirical quantiles of diff_price_BTC_ETH_last and plot
its empirical density in this range using plot(density(diff_price_BTC_ETH_last), xlim=...).
On the same graph, plot the normal density whose mean and standard deviation equal those of
diff_price_BTC_ETH_last.
is to extract the so-called principal axes. To this end, for a given number $k \in \mathbb{N}$ one determines the
largest eigenvalues $\lambda_1 \ge \dots \ge \lambda_k$ of $X^\top X$, and the principal axes $b_1, \dots, b_k \in \mathbb{R}^p$ are obtained as
orthonormal eigenvectors of $X^\top X$ corresponding to these eigenvalues. Given an element
$x \in \mathbb{R}^p$, its principal component approximation is then calculated as $\hat{x} = \sum_{i=1}^{k} (x^\top b_i)\, b_i$.
b) Determine the first three principal axes $b_1, b_2, b_3$ and plot them (against arguments).
c) Determine the projections $\hat{x}^{(k)} = \sum_{i=1}^{k} (x^\top b_i)\, b_i$ for the vector $x = (X_{1,j})_{j=1}^{p}$ obtained as the
first row of the data matrix $X$. Plot $\hat{x}^{(k)}$ for $k = 1, 2, 3$ and $x$ on the same graph (against
arguments).
Solutions Assignment Part II
Statistics and Financial Econometrics
Autumn 2017
#################################################################################
#
# Problem 1
#
#################################################################################
#
# a)
#
#######################################
rm(list = ls())
?data.frame
?read.table
?write.table
#install.packages("data.table")
require("data.table")
?fread
?fwrite
#######################################
#
# b)
#
#######################################
working_directory<-"/home/juri/data/Assignment2-2017/"
setwd(working_directory)
#######################################
#
# c)
#
#######################################
tic<-proc.time()
market <- read.csv("Poloniex.csv", skip=1, header=T);
toc<-proc.time()
print(toc-tic)
#######################################
#
# d)
#
#######################################
tic<-proc.time()
market1 <- fread("Poloniex.csv", skip=1, header=T);
toc<-proc.time()
print(toc-tic)
tic<-proc.time()
write.csv(x=market, file="test.csv");
toc<-proc.time()
print(toc-tic)
tic<-proc.time()
fwrite(x=market1, file="test.csv");
toc<-proc.time()
print(toc-tic)
tic<-proc.time()
saveRDS(object=market1, file="test.rds");
toc<-proc.time()
print(toc-tic)
tic<-proc.time()
saveRDS(object=market, file="test.rds");
toc<-proc.time()
print(toc-tic)
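The tic/toc pattern above can equivalently be written with base R's system.time(). Below is a hedged, self-contained sketch on a synthetic table (toy_market and its columns are made up for illustration, not the Poloniex data); it times CSV against RDS in a temporary directory and adds fread/fwrite only when data.table is installed.

```r
# Synthetic stand-in for the snapshot table (hypothetical data, not Poloniex.csv).
set.seed(1)
n_rows <- 100000
toy_market <- data.frame(TIME = 1500000000 + 20 * (0:(n_rows - 1)),
                         BTC_ETH_last = runif(n_rows),
                         BTC_XRP_last = runif(n_rows))
csv_file <- file.path(tempdir(), "toy.csv")
rds_file <- file.path(tempdir(), "toy.rds")

# system.time() reports the same user/system/elapsed times as toc - tic.
timings <- list(
  write_csv = system.time(write.csv(toy_market, file = csv_file, row.names = FALSE)),
  read_csv  = system.time(back_csv <- read.csv(csv_file)),
  write_rds = system.time(saveRDS(toy_market, file = rds_file)),
  read_rds  = system.time(back_rds <- readRDS(rds_file))
)

# fread/fwrite are timed only if the data.table package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  timings$fwrite <- system.time(data.table::fwrite(toy_market, file = csv_file))
  timings$fread  <- system.time(data.table::fread(csv_file))
}
print(sapply(timings, function(t) t[["elapsed"]]))  # elapsed seconds per operation
```

The binary RDS round trip returns an identical object, whereas the CSV route has to re-parse text, which is usually where most of the time goes.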
#######################################
#
# e)
#
#######################################
column_names<-colnames(market1);column_names
last<-column_names[grep(pattern="_last", x=column_names)]
prices<-market1[,..last]
names(prices)
#######################################
#
# f)
#
#######################################
time_stamps <- as.POSIXct(market1[["TIME"]], origin = '1970-01-01', tz = 'GMT');
max(time_stamps); min(time_stamps)
mean(diff(time_stamps))
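Here mean(diff(time_stamps)) returns a difftime object whose unit R chooses automatically (seconds, minutes, ...). A small sketch with made-up machine times (machine_times is hypothetical) shows how to report the range and fix the unit explicitly:

```r
# Hypothetical machine times (seconds since 1970-01-01), 20 seconds apart.
machine_times <- 1500000000 + c(0, 20, 40, 60, 80)
time_stamps <- as.POSIXct(machine_times, origin = '1970-01-01', tz = 'GMT')

range(time_stamps)                                     # first and last snapshot
spacing <- mean(diff(time_stamps))                     # difftime, unit chosen by R
spacing_secs <- as.numeric(spacing, units = "secs")    # the same interval in seconds
spacing_secs
```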
#######################################
#
# g)
#
#######################################
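No code is given for g). One possible sketch, assuming (this is not stated in the assignment) that five-minute sampling means keeping the first snapshot in each 300-second bucket of machine time; the helper sample_five_min and the toy table are hypothetical:

```r
# Hypothetical helper: keep the first snapshot in every five-minute (300 s)
# bucket, and only the TIME column plus the columns ending in "_last".
sample_five_min <- function(market, time_col = "TIME", interval = 300) {
  market <- as.data.frame(market)                  # accepts a data.frame or data.table
  price_cols <- grep(pattern = "_last$", x = colnames(market), value = TRUE)
  bucket <- market[[time_col]] %/% interval        # index of the five-minute bucket
  keep <- !duplicated(bucket)                      # first snapshot per bucket
  market[keep, c(time_col, price_cols), drop = FALSE]
}

# Toy table standing in for market1: one snapshot every 20 s for one hour.
set.seed(1)
toy <- data.frame(TIME = 1500000000 + seq(0, 3599, by = 20),
                  BTC_ETH_last = 0.08 + cumsum(rnorm(180, sd = 1e-4)),
                  BTC_XRP_last = 2e-5,
                  BTC_ETH_high = 0.09)
five_min_prices <- sample_five_min(toy)
saveRDS(object = five_min_prices, file = file.path(tempdir(), "five_min_prices.rds"))
```

The anchored pattern "_last$" matches only names that end in _last, which is slightly stricter than "_last".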
################################################################################
#
# Problem 2
#
#################################################################################
#
# a)
#
#######################################
five_min_prices<-readRDS(file="five_min_prices.rds")
#######################################
#
# b)
#
#######################################
time_stamps <- as.POSIXct(five_min_prices[["TIME"]], origin = '1970-01-01', tz = 'GMT');
plot(time_stamps , five_min_prices[,BTC_ETH_last] , type="l");
#######################################
#
# c)
#
#######################################
diff_price_BTC_ETH_last<-diff(five_min_prices[,BTC_ETH_last])
#install.packages("e1071")
require("e1071")
mean(diff_price_BTC_ETH_last)
sd(diff_price_BTC_ETH_last)
skewness(diff_price_BTC_ETH_last)
kurtosis(diff_price_BTC_ETH_last)
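If e1071 is not available, the plain moment estimators take only a few lines of base R. To our reading of the e1071 documentation they correspond to type = 1 of its skewness() and kurtosis(), whose default type = 3 applies a small-sample rescaling; the helper names below are hypothetical:

```r
# Plain moment estimators (hypothetical helper names).
central_moment <- function(x, k) mean((x - mean(x))^k)
sample_skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
excess_kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2 - 3

sample_skewness(c(1, 2, 3, 4, 5))   # symmetric data: 0
excess_kurtosis(c(1, 2, 3, 4, 5))   # m4 / m2^2 - 3 = 6.8 / 4 - 3 = -1.3
```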
#######################################
#
# d)
#
#######################################
qqnorm(diff_price_BTC_ETH_last)
#######################################
#
# e)
#
#######################################
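No code is given for e). A hedged sketch: the hypothetical helper plot_diff_density restricts the kernel density estimate to the empirical 0.001 and 0.999 quantiles and overlays a normal density with matched mean and standard deviation. It is demonstrated on simulated heavy-tailed differences; in the solution one would call it on diff_price_BTC_ETH_last.

```r
plot_diff_density <- function(d) {
  lims <- quantile(d, probs = c(0.001, 0.999))          # empirical quantile range
  plot(density(d), xlim = lims, main = "Density of price differences")
  grid_x <- seq(from = lims[[1]], to = lims[[2]], length.out = 200)
  # normal density with the same mean and standard deviation as the data
  lines(grid_x, dnorm(grid_x, mean = mean(d), sd = sd(d)), col = "red")
  invisible(lims)
}

# Simulated heavy-tailed differences (hypothetical data).
set.seed(1)
toy_diff <- rt(5000, df = 3) * 1e-4
lims <- plot_diff_density(toy_diff)
```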
#######################################
#
# f)
#
#######################################
mean(diff_price_BTC_ETH_last[diff_price_BTC_ETH_last>quantile(diff_price_BTC_ETH_last, probs=0.9
minute_history<-history[index, ]
time_stamps <- as.POSIXct(minute_history[["TIME"]], origin = '1970-01-01', tz = 'GMT');
################################################################################
#
# Problem 3
#
#################################################################################
#######################################
#
# a)
#
#######################################
set.seed(10) #initialize pseudo random generator for reproducibility
n<-3000; p=100
basis_functions<-list(sin, cos, dnorm, pnorm)
arguments<-seq(from=0, to=1, length=p)
function_values<-matrix(data=0, nrow=length(basis_functions), ncol=length(arguments))
for (i in 1:length(basis_functions)) function_values[i,]<-basis_functions[[i]](arguments)
random_coefficients<-matrix(data=0, nrow=n, ncol=length(basis_functions))
random_coefficients[,]<-rnorm(nrow(random_coefficients)*ncol(random_coefficients), mean=3,
X<-random_coefficients%*%function_values
#
plot(x=arguments, y=X[1,], type="l", ylim=c(min(X[1:20, ]), max(X[1:20, ]))) # plot some functio
for (i in (2:50)) points(x=arguments, y=X[i,], type="l")
#######################################
#
# b)
#
#######################################
EVD<-eigen(t(X)%*%X)
plot(x=arguments,y=EVD$vectors[,1], type="l", col="black", ylim=c(min(EVD$vectors[,1:3]), max(EV
points(x=arguments,y=EVD$vectors[,2], type="l")
points(x=arguments,y=EVD$vectors[,3], type="l")
#######################################
#
# c)
#
#######################################
x<-X[1,]
b<-EVD$vectors[,1:1]
hatx<-b%*%(t(b)%*%x)
plot(x=arguments,y=x, type="l", col="black", ylim=c(min(X[1:5, ]), max(X[1:5, ])))
points(x=arguments,y=hatx, type="l", col="red")
#
b<-EVD$vectors[,1:2]
hatx<-b%*%(t(b)%*%x)
points(x=arguments,y=hatx, type="l", col="red")
#
b<-EVD$vectors[,1:3]
hatx<-b%*%(t(b)%*%x)
points(x=arguments,y=hatx, type="l", col="red")
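Because X is built from only four basis functions, it has rank at most four, so the approximation error should shrink with k and vanish (up to rounding) at k = 4. The following self-contained sketch checks this together with the orthonormality of the axes; it uses a smaller n and assumes sd = 1 for the coefficients, since the original rnorm call is truncated:

```r
set.seed(10)
n <- 300; p <- 100
arguments <- seq(from = 0, to = 1, length = p)
# 4 x p matrix of basis function values (sin, cos, dnorm, pnorm)
basis <- rbind(sin(arguments), cos(arguments), dnorm(arguments), pnorm(arguments))
X <- matrix(rnorm(n * 4, mean = 3, sd = 1), nrow = n) %*% basis   # n x p, rank <= 4

axes <- eigen(crossprod(X), symmetric = TRUE)$vectors   # columns b_1, ..., b_p
x <- X[1, ]
approx_error <- sapply(1:5, function(k) {
  b <- axes[, 1:k, drop = FALSE]
  hat_x <- b %*% crossprod(b, x)          # sum over i <= k of (x' b_i) b_i
  sqrt(sum((x - hat_x)^2))                # Euclidean approximation error
})
approx_error
```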
Solutions Assignment Part I
Statistics and Financial Econometrics
Autumn 2017
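$$P((Y_6, Y_7) = (i, j)) = P((Y_6, Y_7) = (i, j) \mid X = 1)\,P(X = 1) + P((Y_6, Y_7) = (i, j) \mid X = 2)\,P(X = 2)$$
That is, with rows indexed by i = 1, 0 and columns by j = 1, 0,
$$\big(P((Y_6, Y_7) = (i, j))\big)_{i,j=1}^{0} = \begin{pmatrix} \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac14 \end{pmatrix} \cdot \frac12 + \begin{pmatrix} \tfrac{9}{16} & \tfrac{3}{16} \\ \tfrac{3}{16} & \tfrac{1}{16} \end{pmatrix} \cdot \frac12 = \begin{pmatrix} \tfrac{13}{32} & \tfrac{7}{32} \\ \tfrac{7}{32} & \tfrac{5}{32} \end{pmatrix}$$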
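The row and column sums belonging to the upper-left entry are both 13/32 + 7/32 = 5/8, but (5/8)² = 25/64 ≠ 26/64 = 13/32. Hence this distribution does not factorize into the product of its marginals, and Y6 and Y7 are not independent.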
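The table can be checked numerically. The sketch assumes the model implied by the solution: X equals 1 or 2 with probability 1/2 each and, given X, the variables Y6 and Y7 are independent Bernoulli with success probability 1/2 (X = 1) or 3/4 (X = 2); rows and columns are ordered as outcome 1, then 0.

```r
theta <- c(1/2, 3/4)                     # P(Y = 1 | X = 1), P(Y = 1 | X = 2)
# conditional joint law of (Y6, Y7); rows i = 1, 0 and columns j = 1, 0
cond_joint <- function(q) outer(c(q, 1 - q), c(q, 1 - q))
joint <- 0.5 * cond_joint(theta[1]) + 0.5 * cond_joint(theta[2])
joint * 32                               # rows (13, 7) and (7, 5)
marg <- rowSums(joint)                   # (5/8, 3/8); the same for Y7 by symmetry
joint[1, 1] - marg[1] * marg[1]          # 13/32 - 25/64 = 1/64, so not independent
```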
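c) It holds that
$$P(X = 1) = \frac12 = P(X = 2), \quad P(A \mid X = 2) = \binom{4}{1}\left(\frac34\right)^3\left(\frac14\right), \quad P(A \mid X = 1) = \binom{4}{1}\left(\frac12\right)^4.$$
Thus Bayes' rule gives (the common factors $\binom{4}{1}$ and $\frac12$ cancel)
$$P(X = 2 \mid A) = \frac{(\frac34)^3(\frac14)}{(\frac34)^3(\frac14) + (\frac12)^4} = 27/43 = 0.627907,$$
$$P(X = 1 \mid A) = \frac{(\frac12)^4}{(\frac34)^3(\frac14) + (\frac12)^4} = 16/43 = 0.372093.$$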
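d) Now
$$\begin{aligned}
P(Y_6 = 1, X = 1 \mid A) &= \frac{P(\{Y_6 = 1\} \cap \{X = 1\} \cap A)}{P(A)} \\
&= \frac{P(\{Y_6 = 1\} \cap \{X = 1\} \cap A)}{P(\{X = 1\} \cap A)} \cdot \frac{P(\{X = 1\} \cap A)}{P(A)} \\
&= \frac{P(\{Y_6 = 1\} \cap \{X = 1\})}{P(\{X = 1\})} \cdot \frac{P(\{X = 1\} \cap A)}{P(A)} \\
&= P(Y_6 = 1 \mid X = 1)\, P(X = 1 \mid A),
\end{aligned}$$
where the third equality uses that, given X, the variable $Y_6$ is independent of the event A.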
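Similarly, we obtain
$$\begin{aligned}
P(Y_6 = 1 \mid A) &= P(Y_6 = 1 \mid X = 2)\,P(X = 2 \mid A) + P(Y_6 = 1 \mid X = 1)\,P(X = 1 \mid A) \\
&= \frac34 \cdot \frac{(\frac34)^3(\frac14)}{(\frac34)^3(\frac14) + (\frac12)^4} + \frac12 \cdot \frac{(\frac12)^4}{(\frac34)^3(\frac14) + (\frac12)^4} \\
&= \frac34 \cdot \frac{27}{43} + \frac12 \cdot \frac{16}{43} = 113/172
\end{aligned}$$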
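The posterior probabilities and P(Y6 = 1 | A) can be reproduced in a few lines of R under the same model as in the solution (X uniform on {1, 2}; given X, the Y's are independent Bernoulli with success probability 1/2 or 3/4). The likelihood below is the probability of the specific pattern in A; any combinatorial factor common to both values of X would cancel anyway.

```r
theta <- c(1/2, 3/4)        # P(Y = 1 | X = 1), P(Y = 1 | X = 2), as in the solution
prior <- c(1/2, 1/2)        # P(X = 1), P(X = 2)
y_A <- c(1, 0, 1, 1)        # the observed pattern in A
# P(A | X = x): product of Bernoulli probabilities of the observed pattern
lik <- sapply(theta, function(q) prod(q^y_A * (1 - q)^(1 - y_A)))
post <- prior * lik / sum(prior * lik)     # P(X = 1 | A), P(X = 2 | A)
p_y6 <- sum(theta * post)                  # P(Y6 = 1 | A)
post * 43                                  # 16 and 27
p_y6 * 172                                 # 113
```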
Similarly,
SOLUTION 2
Consider the random variable S with values in {1, . . . , 5} which represents the choice of the
server. Let R be the event that the server rejected the request. We have
$$P(R \mid S = i) = \frac{i}{5}, \quad P(S = i) = \frac15, \quad i = 1, \dots, 5.$$
Hence Bayes' formula gives
$$P(S = i \mid R) = \frac{i/5}{\sum_{j=1}^{5} j/5}, \quad i = 1, \dots, 5.$$
a) Similarly, for the event $R_2$ that the chosen server rejected twice, we have
$$P(R_2 \mid S = i) = \left(\frac{i}{5}\right)^2, \quad P(S = i) = \frac15, \quad i = 1, \dots, 5,$$
so Bayes' formula gives
$$P(S = i \mid R_2) = \frac{(i/5)^2}{\sum_{j=1}^{5} (j/5)^2}, \quad i = 1, \dots, 5.$$
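Consider the event T standing for a successful response after the request was sent to another server.
We have
$$\begin{aligned}
P(T \mid R_2) &= \sum_{i=1}^{5} P(T \cap \{S = i\} \mid R_2) = \sum_{i=1}^{5} \frac{P(T \cap \{S = i\} \cap R_2)}{P(R_2)} \\
&= \sum_{i=1}^{5} \frac{P(T \cap \{S = i\} \cap R_2)}{P(\{S = i\} \cap R_2)} \cdot \frac{P(\{S = i\} \cap R_2)}{P(R_2)} \\
&= \sum_{i=1}^{5} \frac{P(T \cap \{S = i\})}{P(\{S = i\})} \cdot \frac{P(\{S = i\} \cap R_2)}{P(R_2)} \\
&= \sum_{i=1}^{5} P(T \mid S = i)\, P(S = i \mid R_2),
\end{aligned}$$
where the third equality uses that, given S, the outcome T of the new request does not depend on $R_2$. Since the new server k is chosen uniformly from the four servers other than i,
$$P(T \mid S = i) = \frac14 \sum_{k \ne i} \left(1 - \frac{k}{5}\right).$$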
> u<-(1:5)/5
> p<-u^2/sum(u^2)
> t<- (sum(1-u) -(1-u))/4
> sum(t*p)
[1] 0.4545455
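b) As in the argument above, let $S_1$ and $S_2$ denote the first and the second server, and let $R_1$, $R_2$ be the events that the first and the second request were rejected. Given $S_1 = i$, the second request is rejected with probability
$$P(R_2 \mid S_1 = i) = \frac14 \sum_{j \ne i} \frac{j}{5} = \frac{15 - i}{20}.$$
Hence the probability that server i was chosen first, given that both requests were rejected, is
$$p_i = P(S_1 = i \mid R_1 \cap R_2) = \frac{\frac{i}{5} \cdot \frac{15 - i}{20}}{\sum_{m=1}^{5} \frac{m}{5} \cdot \frac{15 - m}{20}} = \frac{i(15 - i)}{170}, \quad i = 1, \dots, 5.$$
After server i is removed, the request is sent to one of the remaining four servers. Given that the first server was i ∈ {1, . . . , 5} and the second request was rejected, the probability that the second server is j ∈ {1, . . . , 5} \ {i} equals
$$q_{i,j} = \frac{j/5}{\sum_{k \in \{1,\dots,5\} \setminus \{i\}} k/5}, \quad j \in \{1, \dots, 5\} \setminus \{i\}.$$
The probability that the third request is processed successfully, given that servers i and j were chosen in the previous steps, is
$$l_{i,j} = \frac13 \sum_{k \in \{1,\dots,5\} \setminus \{i,j\}} \left(1 - \frac{k}{5}\right),$$
so the probability of a successful response is $\sum_{i=1}^{5} p_i \sum_{j \ne i} q_{i,j}\, l_{i,j} = 8/17 \approx 0.4706$. Numerically:
> u<-(1:5)/5
> p<-u*(sum(u)-u)/sum(u*(sum(u)-u))
> q<-matrix(data=u, byrow = TRUE, ncol=5, nrow=5)
> d<-matrix(data=sum(u)-u, byrow=FALSE, ncol=5, nrow=5)
> q<-q/d
> diag(q)<-0
> l<-matrix(data=sum(1-u), byrow=TRUE, nrow=5, ncol=5)
> s1<-matrix(data=1-u, byrow=TRUE, nrow=5, ncol=5)
> s2<-matrix(data=1-u, byrow=FALSE, nrow=5, ncol=5)
> l<-(l-s1-s2)/3
> p%*%( (q*l)%*%rep(1, 5) )
          [,1]
[1,] 0.4705882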
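As a plausibility check, both parts can be simulated. This is a sketch under the model used in the solution: each server is equally likely, server i rejects independently with probability i/5, and later servers are drawn uniformly from those not yet tried. Conditioning is done by discarding runs that do not match the observed rejections; the estimate for a) should come out near 5/11 ≈ 0.4545 and the one for b) near 8/17 ≈ 0.4706.

```r
set.seed(2017)
n <- 100000
u <- (1:5) / 5                                   # server i rejects with probability i/5
# One random permutation per row: columns give the 1st, 2nd and 3rd server contacted.
perm <- t(apply(matrix(runif(n * 5), nrow = n), 1, order))
rejects <- function(s) runif(length(s)) < u[s]  # one independent rejection draw per request

# a) the first server rejects twice, then a different server is tried
cond_a <- rejects(perm[, 1]) & rejects(perm[, 1])
est_a <- mean(!rejects(perm[, 2])[cond_a])

# b) the first and the second server each reject once, then a third server is tried
cond_b <- rejects(perm[, 1]) & rejects(perm[, 2])
est_b <- mean(!rejects(perm[, 3])[cond_b])
c(a = est_a, b = est_b)
```

Ordering i.i.d. uniforms gives a uniformly random permutation, so the first three columns are exactly "one server, then one of the remaining four, then one of the remaining three".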
SOLUTION 3
Let X be the trial at which the r-th red card occurs, with r = 3. Then:
a) This is a negative binomial distribution with success probability p = 1/2 and number of successes r = 3:
$$P(X = i) = \binom{i-1}{r-1}\left(\frac12\right)^i, \quad i = 3, 4, \dots$$
b) This must be modeled using the hypergeometric distribution, with N cards in the deck of which R are red. Since the first i − 1 draws must contain exactly r − 1 red cards and the i-th draw must be a red card, the probability is
$$P(X = i) = \frac{\binom{R}{r-1}\binom{N-R}{i-1-(r-1)}}{\binom{N}{i-1}} \cdot \frac{R - (r-1)}{N - (i-1)}, \quad i = 3, 4, \dots$$
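Both distributions can be checked numerically. The sketch assumes, for illustration only, a standard deck with N = 52 cards of which R = 26 are red (consistent with p = 1/2 in a)); it compares a) with R's dnbinom and verifies that the pmf in b) sums to one.

```r
# Assumption for illustration: a standard deck with N = 52 cards, of which R = 26 are red.
N <- 52; R <- 26; r <- 3

# a) with replacement: negative binomial with p = 1/2 (tail beyond i = 200 is negligible)
i_a <- r:200
pmf_a <- choose(i_a - 1, r - 1) * (1 / 2)^i_a
# dnbinom counts failures before the r-th success, hence the shift i - r
pmf_a_check <- dnbinom(i_a - r, size = r, prob = 1 / 2)

# b) without replacement: r - 1 red among the first i - 1 cards, then a red card
i_b <- r:(N - R + r)              # the r-th red card appears at trial N - R + r at the latest
pmf_b <- dhyper(r - 1, R, N - R, i_b - 1) * (R - (r - 1)) / (N - (i_b - 1))

c(total_a = sum(pmf_a), total_b = sum(pmf_b))    # both sum to 1
# means: r / p = 6 and r (N + 1) / (R + 1) = 159/27 (about 5.89)
c(mean_a = sum(i_a * pmf_a), mean_b = sum(i_b * pmf_b))
```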
$$= \frac{(\lambda p)^k}{k!}\, e^{-\lambda p}$$
SOLUTION 5
a) Let D = B ∪ C with probability P(D) = 1/2 + 1/2 − 1/4 = 3/4 = d. Denote a = P(A \ D) = 0.1 and introduce s = P(A ∩ D). Since A and D are independent, we have
$$P(A \cap D) = P(A)P(D) \;\Rightarrow\; s = (a + s)d \;\Rightarrow\; s = \frac{ad}{1 - d} = 3/10.$$
That is, P(A) = a + s = 4/10, giving
$$P(A \cap B \cap C) = P(A)P(B)P(C) = \frac{4}{10} \cdot \frac12 \cdot \frac12 = \frac{1}{10}.$$
b) Define r = P(A ∪ B) = 5/9 and p = P(A). Then P(B \ A) = r − p, and since A and B are independent, s = P(A ∩ B) satisfies
$$s = P(A)P(B) = ((r - p) + s)p \;\Rightarrow\; s = \frac{(r - p)p}{1 - p}.$$
We need to determine the p ∈ [0, r] which maximizes $p \mapsto s(p) = \frac{(r - p)p}{1 - p}$ on the interval [0, r]. Setting the derivative to zero gives the first-order condition
$$\frac{(r - 2p)(1 - p) + (r - p)p}{(1 - p)^2} = 0,$$
which leads to the quadratic equation
$$r - 2p + p^2 = 0,$$
whose unique solution in the interval [0, r] is
$$p = 1 - \sqrt{1 - r}.$$
In our case r = 5/9, so p = 1/3 and s = 1/9, and thus P(B) = r − p + s = 1/3. That is, the maximal possible value of P(A ∩ B) is 1/9, which is achieved by independent sets A and B with equal probabilities P(A) = P(B) = 1/3.
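The maximization can be cross-checked numerically with R's one-dimensional optimizer; this sketch simply maximizes the function s(p) derived above on [0, r].

```r
r <- 5 / 9                                   # P(A union B)
s <- function(p) (r - p) * p / (1 - p)       # P(A intersect B) as a function of p = P(A)
opt <- optimize(s, interval = c(0, r), maximum = TRUE)
opt$maximum     # close to 1 - sqrt(1 - r) = 1/3
opt$objective   # close to 1/9
```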
$$h_1^{-1}(x, y) = x, \quad h_2^{-1}(x, y) = y - x$$
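and the marginal density of Y is obtained by integrating out x:
$$\begin{aligned}
f_Y(y) &= \int_{\mathbb{R}} \frac{e^{-\frac{x^2 - 2x}{2}}}{\sqrt{2\pi}}\, e^{-y}\, 1_{[x,\infty[}(y)\, dx = e^{-y} \int_{-\infty}^{y} \frac{e^{-\frac{x^2 - 2x}{2}}}{\sqrt{2\pi}}\, dx \\
&= e^{-y + \frac12} \int_{-\infty}^{y} \frac{e^{-\frac{x^2 - 2x + 1}{2}}}{\sqrt{2\pi}}\, dx = e^{-y + \frac12}\, N(1, 1)(]-\infty, y]) = e^{-y + \frac12}\, \Phi(y - 1)
\end{aligned}$$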
f<-function(y) # density of Y = U + V: exp(-y + 1/2) * Phi(y - 1)
{
exp(-y+1/2)*pnorm(mean=1, sd=1, q=y)
}
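The closed form can be sanity-checked: it should integrate to one, have mean E[U + V] = 0 + 1 = 1, and agree with a simulation of U + V. The sketch rewrites f on the log scale (the name f_log is ours) so that exp() cannot overflow for very negative y, where pnorm underflows to zero; algebraically it is the same density.

```r
# Same density as f above, computed as exp(log of each factor) for numerical safety.
f_log <- function(y) exp(-y + 1/2 + pnorm(q = y, mean = 1, sd = 1, log.p = TRUE))

total_mass <- integrate(f_log, lower = -Inf, upper = Inf)$value                 # should be 1
mean_y <- integrate(function(y) y * f_log(y), lower = -Inf, upper = Inf)$value  # E[U + V] = 1

# Monte Carlo comparison of P(Y <= 1)
set.seed(1)
y_sim <- rnorm(1e5, mean = 0, sd = 1) + rexp(1e5, rate = 1)
c(from_density = integrate(f_log, lower = -Inf, upper = 1)$value,
  simulated = mean(y_sim <= 1))
```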