Mlsec Exercise 5 Solutions
Exercise Sheet 5
Summer term 2024
[Figure: plot of the training data; axes x and y with ticks from -2 to 2]
1. What is the anomaly score for ϕ(z1) = (0, 1) and ϕ(z2) = (0, 3)? Assume a center
of mass classifier with µ = (−0.28, 0.07).
2. Draw the hypersphere that would be learned by a hard-margin SVDD. What are
the support vectors of the hypersphere?
3. What is the anomaly score of the SVDD for z1 and z2? Assume that the center is
fully defined by the support vectors and that all support vectors are weighted equally
by $\alpha_i$. Hint: Recall that $\sum_{i=1}^{n} \alpha_i = 1$.
4. Assume we add a point x3 = (0, 5) to the training set and retrain the classifier. How
does this affect the anomaly score? How can we lower the impact of this point?
Solution 1
[Figure: plot for the solution; axes x and y with ticks from -2 to 2]
2. The hard-margin SVDD aims to find the smallest possible hypersphere that encloses
all data points in the training data. The resulting support vectors are s1 = (0, 2)
and s2 = (0, −2).
4. Adding x3 to the training set enlarges the hypersphere, because the hard-margin
SVDD must enclose every training point. To lower its impact, we can use
regularization, leading to a soft-margin SVDD that tolerates outliers.
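The quantities asked for in parts 1 and 3 can be checked numerically. The following is a minimal sketch, not the official solution: it assumes the anomaly score is the plain Euclidean distance to the center (some formulations use the squared distance, or subtract the radius), and the helper name `distance` is introduced here for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two 2-D points (assumed score convention)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

z1, z2 = (0.0, 1.0), (0.0, 3.0)

# Part 1: center-of-mass detector with the mean given in the exercise.
mu = (-0.28, 0.07)
com_scores = [distance(z, mu) for z in (z1, z2)]

# Part 3: SVDD center as the equally weighted sum of the support vectors,
# with the weights alpha_i summing to 1.
support_vectors = [(0.0, 2.0), (0.0, -2.0)]
alpha = 1.0 / len(support_vectors)
center = tuple(alpha * sum(s[d] for s in support_vectors) for d in range(2))
radius = distance(support_vectors[0], center)
svdd_scores = [distance(z, center) for z in (z1, z2)]

print("center of mass scores:", com_scores)
print("SVDD center:", center, "radius:", radius)
print("SVDD scores:", svdd_scores)
```

Under this convention z1 lies inside the learned hypersphere and z2 outside it, since the distances to the center are 1 and 3 while the radius is 2.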
Exercise 2 Anagram
Anagram uses n-grams and Bloom filters to find anomalies in network traffic. A Bloom
filter is a probabilistic data structure used to implement efficient membership queries of
elements in a set. It uses an array of m bits and a set of k hash functions that map to
an array index between 0 and m − 1. In the following, we consider Bloom filters of size
m = 32 with k = 2 hash functions:
$$h_0(w) = |w|^2 \bmod m$$
$$h_1(w) = \sum_i \phi(w_i) \bmod m,$$
where w ∈ Σ∗ is a string, | · | its length, and ϕ(x) a letter’s position in the alphabet:
 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
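The two hash functions defined above can be sketched in a few lines of Python. The function names `letter_position`, `h0`, and `h1` are chosen here for illustration, and treating upper- and lower-case letters alike is an assumption (the exercise words are upper case while the table lists lower-case letters).

```python
M = 32  # Bloom filter size m from the exercise

def letter_position(ch):
    """Position of a letter in the alphabet (a = 1, ..., z = 26), case-insensitive."""
    return ord(ch.lower()) - ord("a") + 1

def h0(word, m=M):
    """Squared word length modulo m."""
    return len(word) ** 2 % m

def h1(word, m=M):
    """Sum of the letter positions modulo m."""
    return sum(letter_position(ch) for ch in word) % m

print(h0("abc"), h1("abc"))  # 9 6
```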
1. To add an element to the set, it is hashed by all hash functions and the respective
array positions in the Bloom filter are set to 1. Fill in the following empty Bloom
filter for the words “GET”, “POST”, “PUT”:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
2. To query an element, it gets hashed by all hash functions and the respective array
positions are examined. If any array position is 0, the element is definitively not
part of the set. If all array positions are 1, the element might be part of the set.
Query the following Bloom filter with the words “HEAD”, “DELETE”, “PATCH”:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Solution 2
1.
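h0(“GET”) = 3² mod 32 = 9, h1(“GET”) = (7 + 5 + 20) mod 32 = 0
h0(“POST”) = 4² mod 32 = 16, h1(“POST”) = (16 + 15 + 19 + 20) mod 32 = 6
h0(“PUT”) = 3² mod 32 = 9, h1(“PUT”) = (16 + 21 + 20) mod 32 = 25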
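Index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Bit    1  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0
Index 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Bit    1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
Positions 0, 6, 9, 16, and 25 are set; position 9 is set by both “GET” and “PUT”.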
2.
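h0(“HEAD”) = 4² mod 32 = 16, h1(“HEAD”) = (8 + 5 + 1 + 4) mod 32 = 18
h0(“DELETE”) = 6² mod 32 = 4, h1(“DELETE”) = (4 + 5 + 12 + 5 + 20 + 5) mod 32 = 19
h0(“PATCH”) = 5² mod 32 = 25, h1(“PATCH”) = (16 + 1 + 20 + 3 + 8) mod 32 = 16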
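“HEAD” is probably in the set: both of its positions are 1, but this can also be a
false positive. “DELETE” and “PATCH” are definitely not in the set, since each has
at least one position that is 0.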
3.
$$f(x_0) = \frac{1 + 1 + 0}{3} = \frac{2}{3} > 0.5 \Rightarrow \text{normal}$$
$$f(x_1) = \frac{1 + 0 + 0}{3} = \frac{1}{3} < 0.5 \Rightarrow \text{anomalous}$$
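The insert and query steps from items 1 and 2 can be reproduced with a small Bloom filter sketch. This is an illustration under assumptions, not the Anagram implementation: the class `BloomFilter` and the helper `match_fraction` are hypothetical names, and the score is assumed to be the fraction of tokens found in the filter, mirroring the arithmetic in item 3. The filter built here contains only the words from item 1, so query results need not match item 2, which uses a filter given on the sheet.

```python
class BloomFilter:
    """Bloom filter with the exercise's two hash functions (k = 2)."""

    def __init__(self, m=32):
        self.m = m
        self.bits = [0] * m

    def _positions(self, word):
        # h0: squared length mod m; h1: sum of alphabet positions mod m.
        letters = [ord(c.lower()) - ord("a") + 1 for c in word]
        return (len(word) ** 2 % self.m, sum(letters) % self.m)

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = 1

    def might_contain(self, word):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(word))


def match_fraction(tokens, bloom):
    """Assumed score: share of tokens whose bits are all set in the filter."""
    hits = [1 if bloom.might_contain(t) else 0 for t in tokens]
    return sum(hits) / len(hits)


bf = BloomFilter()
for word in ("GET", "POST", "PUT"):
    bf.add(word)

set_bits = [i for i, bit in enumerate(bf.bits) if bit]
print(set_bits)                                     # [0, 6, 9, 16, 25]
print(bf.might_contain("GET"))                      # True
print(bf.might_contain("DELETE"))                   # False
print(match_fraction(["GET", "PUT", "DELETE"], bf))
```

Only five bits are set although three words with two hashes each were inserted, because “GET” and “PUT” collide on position 9. Collisions of this kind are what make false positives possible in queries.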