Mlsec Exercise 5 Solutions

Machine Learning for Computer Security

Exercise Sheet 5
Summer term 2024

Exercise 1 Anomaly detection


Consider the following dataset:

[Figure: scatter plot of the training data in the x-y plane; both axes are ticked from -2 to 2.]

1. What is the anomaly score for $\phi(z_1) = (0, 1)$ and $\phi(z_2) = (0, 3)$? Assume a
center-of-mass classifier with $\mu = (-0.28, 0.07)$.

2. Draw the hypersphere that would be learned by a hard-margin SVDD. What are
the support vectors of the hypersphere?

3. What is the anomaly score of the SVDD for $z_1$ and $z_2$? Assume that the center is
fully defined by the support vectors and that all support vectors are weighted equally
by $\alpha_i$. Hint: Recall that $\sum_{i=1}^{n} \alpha_i = 1$.

4. Assume we add a point x3 = (0, 5) to the training set and retrain the classifier. How
does this affect the anomaly score? How can we lower the impact of this point?

Solution 1
[Figure: scatter plot of the training data in the x-y plane; both axes are ticked from -2 to 2.]

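1. The center-of-mass classifier scores a point by its squared Euclidean distance to
the mean, $f(z) = \|\phi(z) - \mu\|^2$. For $z_1$ and $z_2$ we get

$f(z_1) = 0.28^2 + 0.93^2 \approx 0.94$, $f(z_2) = 0.28^2 + 2.93^2 \approx 8.66$.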

2. The hard-margin SVDD aims to find the smallest possible hypersphere that encloses
all data points in the training data. The resulting support vectors are s1 = (0, 2)
and s2 = (0, −2).

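3. With two equally weighted support vectors, $\alpha_1 = \alpha_2 = 1/2$, so the center is
$c = \tfrac{1}{2}(0, 2) + \tfrac{1}{2}(0, -2) = (0, 0)$. Using $f(z) = \|\phi(z) - c\|^2$, we get

$f(z_1) = 1.0$, $f(z_2) = 9.0$.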

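4. Adding $x_3$ to the training set increases the volume of the hypersphere, because a
hard-margin SVDD must enclose every training point. The center also shifts toward
$x_3$, so points near $x_3$, such as $z_2$, receive a lower anomaly score. To lower the
impact of $x_3$, we can use regularization, leading to a soft-margin SVDD that allows
outliers to lie outside the hypersphere.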
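
The scores in this solution can be checked with a few lines of Python. The following is a minimal sketch, not part of the original sheet: the helper names `anomaly_score` and `weighted_center` are illustrative, and the score is assumed to be the squared Euclidean distance to the center, which agrees with the values stated above. For part 4 it further assumes that all original training points lie inside the sphere of radius 2 around the origin, so that after adding x3 = (0, 5) the smallest enclosing sphere touches x3 and s2 = (0, -2).

```python
def anomaly_score(z, center):
    """Squared Euclidean distance between a point z and the model center."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, center))


def weighted_center(support_vectors, alphas):
    """Center as the alpha-weighted sum of the support vectors."""
    dims = len(support_vectors[0])
    return tuple(
        sum(a * sv[d] for a, sv in zip(alphas, support_vectors))
        for d in range(dims)
    )


z1, z2 = (0.0, 1.0), (0.0, 3.0)

# Part 1: center-of-mass classifier with the mean given in the exercise.
mu = (-0.28, 0.07)
com_scores = [anomaly_score(z, mu) for z in (z1, z2)]

# Part 3: two support vectors with equal weights, so alpha_i = 1/2 each.
svdd_center = weighted_center([(0.0, 2.0), (0.0, -2.0)], [0.5, 0.5])
svdd_scores = [anomaly_score(z, svdd_center) for z in (z1, z2)]

# Part 4 (assumption): x3 = (0, 5) and s2 = (0, -2) become the support vectors.
new_center = weighted_center([(0.0, 5.0), (0.0, -2.0)], [0.5, 0.5])
new_scores = [anomaly_score(z, new_center) for z in (z1, z2)]

print([round(s, 2) for s in com_scores])  # [0.94, 8.66]
print(svdd_center, svdd_scores)           # (0.0, 0.0) [1.0, 9.0]
print(new_center, new_scores)             # (0.0, 1.5) [0.25, 2.25]
```

Under these assumptions the score of z2 drops from 9.0 to 2.25 once x3 is part of the training set, which illustrates how a single outlier can pull a hard-margin model toward itself.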

Exercise 2 Anagram
Anagram uses n-grams and Bloom filters to find anomalies in network traffic. A Bloom
filter is a probabilistic data structure used to implement efficient membership queries of
elements in a set. It uses an array of m bits and a set of k hash functions that map to
an array index between 0 and m − 1. In the following, we consider Bloom filters of size
m = 32 with k = 2 hash functions:
$$h_0(w) = |w|^2 \bmod m$$

$$h_1(w) = \sum_i \phi(w_i) \bmod m,$$

where $w \in \Sigma^*$ is a string, $|\cdot|$ its length, and $\phi(x)$ a letter's position in the alphabet:

a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
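
The two hash functions can be written down directly in Python. This is a minimal sketch: the names `M`, `phi`, `h0`, and `h1` are chosen here for illustration, and letter case is assumed to be ignored, since the table lists lowercase letters while the exercise words are uppercase.

```python
M = 32  # number of bits in the Bloom filter


def phi(letter):
    """Position of a letter in the alphabet (a=1, ..., z=26), ignoring case."""
    return ord(letter.lower()) - ord("a") + 1


def h0(word, m=M):
    """Squared word length modulo m."""
    return len(word) ** 2 % m


def h1(word, m=M):
    """Sum of the letter positions modulo m."""
    return sum(phi(c) for c in word) % m


print(h0("abc"), h1("abc"))  # 9 6
```

For the string "abc", the length is 3, so h0 yields 9, and the letter positions 1 + 2 + 3 sum to 6.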

1. To add an element to the set, it is hashed by all hash functions and the respective
array positions in the Bloom filter are set to 1. Fill in the following empty Bloom
filter for the words “GET”, “POST”, “PUT”:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

2. To query an element, it is hashed by all hash functions and the respective array
positions are examined. If any array position is 0, the element is definitely not
part of the set. If all array positions are 1, the element might be part of the set.
Query the following Bloom filter with the words “HEAD”, “DELETE”, “PATCH”:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

3. We can use a Bloom filter B to build a detector with an anomaly score

$$f(x) = \frac{n_{\text{seen}}}{|x|} = \frac{\sum_{w \in x} \mathbb{1}_B(w)}{|x|},$$

where the message $x$ is a tuple of strings and $\mathbb{1}_B(w)$ is the indicator function.
Assume that we trained the Bloom filter on normal traffic. Calculate the scores for
x0 = (“GET”, “HEAD”, “PUT”) and x1 = (“GET”, “PATCH”, “DELETE”) and
decide whether the messages are anomalous. Messages with a score below 0.5 are
classified as anomalous.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Solution 2
1.

h0 (“GET”) = 9 h1 (“GET”) = 0
h0 (“POST”) = 16 h1 (“POST”) = 6
h0 (“PUT”) = 9 h1 (“PUT”) = 25

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

The bits at positions 0, 6, 9, 16, and 25 are set to 1; all other bits remain 0.

2.

h0 (“HEAD”) = 16 h1 (“HEAD”) = 18
h0 (“DELETE”) = 4 h1 (“DELETE”) = 19
h0 (“PATCH”) = 25 h1 (“PATCH”) = 16

“HEAD” is (probably) in the set, “DELETE” is not in the set, “PATCH” is not in
the set.

3.
$$f(x_0) = \frac{1 + 1 + 0}{3} = \frac{2}{3} > 0.5 \Rightarrow \text{normal}$$

$$f(x_1) = \frac{1 + 0 + 0}{3} = \frac{1}{3} < 0.5 \Rightarrow \text{anomalous}$$
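
The three parts can be retraced with a small Bloom filter sketch in Python. The class `BloomFilter` and the function `score` are illustrative names, not part of the sheet, and letter case is assumed to be ignored. The shaded cells of the filters in parts 2 and 3 are not available here, so the sketch uses a hypothetical filter trained on "GET" and "HEAD"; this choice is only an assumption that happens to agree with the answers stated above.

```python
class BloomFilter:
    """Bloom filter with m bits and the two hash functions of this exercise."""

    def __init__(self, m=32):
        self.m = m
        self.bits = [0] * m

    def _indices(self, word):
        # h0: squared length mod m; h1: sum of alphabet positions mod m.
        phi_sum = sum(ord(c.lower()) - ord("a") + 1 for c in word)
        return (len(word) ** 2 % self.m, phi_sum % self.m)

    def add(self, word):
        for i in self._indices(word):
            self.bits[i] = 1

    def __contains__(self, word):
        # True means "possibly in the set", False means "definitely not".
        return all(self.bits[i] for i in self._indices(word))

    def set_bits(self):
        return [i for i, b in enumerate(self.bits) if b]


def score(message, bloom):
    """Fraction of strings in the message that the filter reports as seen."""
    return sum(w in bloom for w in message) / len(message)


# Part 1: insert the three words into an empty filter.
part1 = BloomFilter()
for word in ("GET", "POST", "PUT"):
    part1.add(word)
print(part1.set_bits())  # [0, 6, 9, 16, 25]

# Parts 2 and 3: hypothetical filter (assumption, see the text above).
trained = BloomFilter()
for word in ("GET", "HEAD"):
    trained.add(word)
print([w in trained for w in ("HEAD", "DELETE", "PATCH")])  # [True, False, False]

x0 = ("GET", "HEAD", "PUT")
x1 = ("GET", "PATCH", "DELETE")
labels = ["normal" if score(x, trained) >= 0.5 else "anomalous" for x in (x0, x1)]
print(labels)  # ['normal', 'anomalous']
```

Note that "PUT" and "GET" collide under h0 (both map to index 9), so "PUT" is rejected by the hypothetical filter only because its second index, 25, is unset.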
