Self Test Master Data Science SoSe 2021 2

1) The self assessment is required for applicants to the Master's program in Data Science at TU Dortmund University. 2) It provides an indication of how much an applicant needs to refresh their knowledge before beginning their studies. The exact result is only for orientation and not evaluated. 3) The assessment contains multiple choice questions in mathematics, computer science, and statistics. Applicants submit a signed certificate of completion with their application.

Uploaded by

Shayekh Mohiuddin Ahmed Navid


Online self assessment for applicants of the

master programme Data Science

TU Dortmund University
Departments of Mathematics, Computer Science and Statistics
Version 2020

• Participation in this assessment is required as part of the application process for the master programme Data Science.
• The self assessment is also intended to give you an indication of the extent to which you should refresh your knowledge before beginning your studies. The exact result is solely for your orientation and is not evaluated by us.
• To provide your answers, please use the input mask available on the ‘EvaSys’ website:
https://evaluation.tu-dortmund.de/evasys/online.php?p=V1HSK
There, the questions are numbered as below and are also identified by a keyword.
• We recommend that you work on the assessment offline and then enter your prepared answers in the input mask. You should not interrupt the process of filling it in.
• After answering all the questions, you obtain a certificate of attendance, which you have to fill in, sign, and submit during the application process.
• The questions are multiple-choice, each with a set of possible answers, each of which is either true or false. Each answer can be marked as ‘true’ or ‘false’, or you can click on ‘no idea’. You receive 1 point per answer for correctly assigning it as ‘true’ or ‘false’, −1 point for incorrectly assigning it, and 0 points for ‘no idea’. (Hence, you can receive as many points for a question as there are answers.)
• In the statistical programming Section 3.4, you can choose between R and Python representations of the same questions. If you know neither of these programming languages, select the answer option ‘no idea’ for all these questions.

We wish you good luck!

1 Mathematics
• The symbol ln denotes the natural logarithm, that is, with base e.
• The symbols Z, Q, R, C denote the sets of integers, rational numbers, real numbers, and complex numbers,
respectively.

1.1 Calculus
Question 1. For a ∈ R, which statements do hold for the one-sided limit $\lim_{x \to a^+} \frac{\ln(x - a)}{\ln(e^x - e^a)}$?
a) The limit exists.
b) Its value is 0.
c) Its value is 1.
d) Its value is $e^a$.

Question 2. Which statements do hold for the definite integral $\int_0^{\pi} e^{\cos x} \sin x \, dx$?
a) The antiderivative is not explicitly calculable.
b) The definite integral has a finite value.
c) Its value is $e - \frac{1}{e}$.
d) Its value is 0.
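One way to check your answer numerically is a composite Simpson's rule. The following is a minimal sketch using only the standard library; the helper name `simpson` and the choice of 1000 subintervals are illustrative assumptions. The substitution u = cos x shows that −e^{cos x} is an antiderivative of the integrand, which gives the closed form used for comparison.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd interior nodes get weight 4, even interior nodes weight 2
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return math.exp(math.cos(x)) * math.sin(x)

approx = simpson(integrand, 0.0, math.pi)
closed_form = math.e - 1 / math.e  # -exp(cos(pi)) + exp(cos(0))
print(approx, closed_form)
```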
Question 3. The following figure shows the graph of the derivative $f'$ of a function f, where f is continuous on the interval [0, 4] and differentiable on the interval (0, 4). Which of the following statements give the correct ordering of the function values f(0), f(2), and f(4)?

a) f(0) < f(4)
b) f(4) < f(2)
c) f(2) ≤ f(0)
d) f(4) = f(2)
Question 4. Let f be the function defined by the series $f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n}$ for all x such that −1 < x < 1. Which statements do hold?
a) The series converges absolutely.
b) The derivative is $f'(x) = \sum_{n=1}^{\infty} x^n$.
c) The derivative is $f'(x) = \sum_{n=0}^{\infty} x^n$.
d) The derivative equals $f'(x) = \frac{1}{1-x}$.

1.2 Linear Algebra

Question 5. Consider the following system of linear equations
3x + 2y + z = 0
x + y + z = 0
x − z = 0
with solutions of the form (x, y, z) where x, y, z are real numbers. Which of the following statements are correct?
a) The system is consistent.
b) The sum of any two solutions is a solution.
c) The system has a unique solution.
d) The system has infinitely many solutions.
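Because all right-hand sides are 0, the system is homogeneous: the zero vector always solves it, and it has a unique solution exactly when the coefficient matrix has a nonzero determinant. A small sketch of that check (the helper `det3` is our own name) computes the determinant by cofactor expansion and tests one candidate vector.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix of 3x + 2y + z = 0, x + y + z = 0, x - z = 0
A = [[3, 2, 1],
     [1, 1, 1],
     [1, 0, -1]]

print(det3(A))  # 0

# Plug a candidate vector into the left-hand sides of all three equations
x, y, z = 1, -2, 1
residuals = [3 * x + 2 * y + z, x + y + z, x - z]
print(residuals)  # [0, 0, 0]
```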
Question 6. Which are eigenvalues of the matrix
$$\begin{pmatrix} 3 & 2 & 5 \\ 0 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix}?$$

a) 2
b) 3
c) 5
d) 0
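A number λ is an eigenvalue of A exactly when det(A − λI) = 0. The sketch below evaluates this determinant for each offered value with exact integer arithmetic; the helper names `det3` and `char_poly_at` are our own.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, 2, 5],
     [0, 2, 3],
     [0, 1, 4]]

def char_poly_at(lam):
    """Evaluate det(A - lam * I)."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)

for candidate in (2, 3, 5, 0):
    print(candidate, char_poly_at(candidate) == 0)
```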

1.3 Analytic Geometry

Question 7. Consider the solid in xyz-space which contains all points (x, y, z) whose z-coordinate satisfies
$$0 \le z \le 4 - x^2 - y^2.$$
Which statements do hold?

a) The solid is a sphere.
b) The solid is a pyramid.
c) Its volume is 8π.
d) Its volume is $\frac{16\pi}{3}$.
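The upper boundary z = 4 − x² − y² is a paraboloid that meets the plane z = 0 on the circle of radius 2, so in polar coordinates the volume is V = 2π ∫₀² (4 − r²) r dr. A minimal numerical sketch with the midpoint rule (the function name and grid size are our choices) lets you compare the result with the two offered values.

```python
import math

def paraboloid_volume(n=100_000):
    """Midpoint rule for V = 2*pi * integral_0^2 (4 - r^2) * r dr."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += (4 - r * r) * r
    return 2 * math.pi * total * h

volume = paraboloid_volume()
print(volume, 8 * math.pi, 16 * math.pi / 3)
```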
Question 8. Consider the function g defined by g(x, y) = ey (y − x2 ) for all real x, y. Which of the following terms
are needed to represent the length of the gradient ∇g(1, −1) ?
a) $\sqrt{10}$
b) $\sqrt{5}$
c) e
d) π
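The partial derivatives are g_x = −2x e^y and g_y = e^y (y − x² + 1). The sketch below compares a central-difference approximation of the gradient at (1, −1) with these formulas; `numerical_gradient` and the step size h are our own choices.

```python
import math

def g(x, y):
    return math.exp(y) * (y - x ** 2)

def numerical_gradient(f, x, y, h=1e-6):
    """Central-difference approximation of (df/dx, df/dy)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

gx, gy = numerical_gradient(g, 1.0, -1.0)
length = math.hypot(gx, gy)

# Analytic partial derivatives evaluated at (1, -1)
exact = math.hypot(-2 * 1 * math.exp(-1), math.exp(-1) * (-1 - 1 + 1))
print(length, exact, math.sqrt(5) / math.e)
```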
Question 9. A circular helix in xyz-space has the following parametric equations, where θ ∈ R:

x(θ) = 4 cos θ
y(θ) = 4 sin θ
z(θ) = 3θ

Let L(θ) be the arc length of the helix from the point P(θ) = (x(θ), y(θ), z(θ)) to the point P(0) = (4, 0, 0), and let D(θ) be the distance between P(θ) and the origin (0, 0, 0). Let L(θ) = 10. Which statements do hold?
a) θ = 4
b) θ = 2
c) To calculate the value of D for a given θ, x(θ) and y(θ) have to be evaluated explicitly.
d) $D(\theta) = \sqrt{52}$
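Since |P′(θ)| = √(16 sin²θ + 16 cos²θ + 9) = 5, the arc length grows linearly, L(θ) = 5|θ|, and D(θ)² = 16 + 9θ² because x² + y² = 16. A minimal sketch (function names and the number of chords are our choices) approximates the arc length by summing short chords.

```python
import math

def helix(theta):
    return (4 * math.cos(theta), 4 * math.sin(theta), 3 * theta)

def arc_length(theta, n=10_000):
    """Approximate the arc length from P(0) to P(theta) by summing chord lengths."""
    total = 0.0
    prev = helix(0.0)
    for k in range(1, n + 1):
        cur = helix(theta * k / n)
        total += math.dist(prev, cur)
        prev = cur
    return total

theta = 2.0
print(arc_length(theta))  # should be close to 5 * theta
print(math.dist(helix(theta), (0, 0, 0)), math.sqrt(16 + 9 * theta ** 2))
```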

1.4 Differential Equations
Question 10. Let y : R → R be the real-valued function defined on the real line, which is the solution of the initial value problem
$$y' = -xy + x, \quad y(0) = 2.$$
Which statements are correct?
a) The problem is not uniquely solvable.
b) The solution y(x) contains an exponential function.
c) $\lim_{x \to \infty} y(x) = 1$
d) $\lim_{x \to \infty} y(x) = 0$
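Writing the equation as y′ + xy = x and multiplying by the integrating factor e^{x²/2} gives y(x) = 1 + C e^{−x²/2}; the initial value y(0) = 2 fixes C = 1. The sketch below cross-checks this closed form with a classical fourth-order Runge-Kutta integrator; the step count is an arbitrary choice.

```python
import math

def rhs(x, y):
    return -x * y + x

def rk4(f, x0, y0, x_end, steps=1000):
    """Classical fourth-order Runge-Kutta integration from x0 to x_end."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y

def exact(x):
    return 1 + math.exp(-x * x / 2)

for x_end in (1.0, 3.0, 6.0):
    print(x_end, rk4(rhs, 0.0, 2.0, x_end), exact(x_end))
```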

2 Computer Science
2.1 Data Structures
Question 11. The number of steps taken for searching the value x in a binary tree with n nodes . . .
a) depends on x.
b) depends on n.
c) is $O(\log_2 n)$.
d) is $O(\log_x n)$.
Question 12. The average-case performance when looking up a single search key . . .
a) is better with a Linked List than with a Hash Table.
b) is better with a Hash Table than with an Array.
c) is better with a Binary Search Tree than with a Hash Table.
d) is the same with a Linked List, an Array, and a Hash Table.
Question 13. Given 100 000 numbers, the minimum height of a binary search tree that can store all these numbers . . .
a) depends on the numbers.
b) is larger than 20 levels.
c) is smaller than 19 levels.
d) can be calculated as $\log_{10}(100\,000)$.
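A binary tree with h levels holds at most 2^h − 1 nodes, so the minimum height is the smallest h with 2^h − 1 ≥ n, independent of the stored values. A small sketch of that count (the function name is ours):

```python
def min_levels(n):
    """Smallest number of levels h such that a perfect binary tree (2**h - 1 nodes) can hold n keys."""
    h = 0
    while 2 ** h - 1 < n:
        h += 1
    return h

print(min_levels(100_000))  # 17, since 2**16 - 1 = 65535 < 100000 <= 131071 = 2**17 - 1
```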
Question 14. Which of the following statements are correct for a max-heap?
a) The root always contains the largest key.
b) All keys in the left subtree are always smaller than any key in the corresponding right subtree.
c) All leaves are located on the same level.
d) Each subtree is also a max-heap.
Question 15. Which of the following statements are correct for a binary search tree?
a) The root always contains the largest key.
b) All keys in the left subtree are always smaller than any key in the corresponding right subtree.
c) All leaves are located on the same level.
d) Each subtree is also a binary search tree.
Question 16. The following operations are applied to an empty stack s:
s.push(1)
s.push(2)
s.push(3)
s.pop()
s.push(4)
s.pop()

The result of a further s.pop() is . . .

a) a number
b) undefined
c) 4

d) 2
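The operation sequence can be traced with a Python list used as a stack, where `append` plays the role of push and `pop` removes the most recently added element (last in, first out):

```python
s = []          # a Python list used as a stack: append = push, pop = pop
s.append(1)
s.append(2)
s.append(3)
s.pop()         # removes 3
s.append(4)
s.pop()         # removes 4
result = s.pop()
print(result)   # 2
```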

2.2 Algorithms and Programming

Question 17. Sorting a data set is an important sub-problem in data science. Given the size n of a data set, which statements are correct?
a) Bubble Sort has worst-case run-time complexity $O(n)$.
b) Bubble Sort has worst-case run-time complexity $O(n \log n)$.
c) Bubble Sort has worst-case run-time complexity $O(n^2)$.
d) Merge Sort has worst-case run-time complexity $O(n)$.
e) Merge Sort has worst-case run-time complexity $O(n \log n)$.
f) Merge Sort has worst-case run-time complexity $O(n^2)$.
g) Quick Sort has worst-case run-time complexity $O(n)$.
h) Quick Sort has worst-case run-time complexity $O(n \log n)$.
i) Quick Sort has worst-case run-time complexity $O(n^2)$.
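To see how the work of Bubble Sort grows with n, the sketch below counts comparisons on reverse-sorted input. This version has no early-exit optimisation, which is an assumption on our part; it always performs n(n − 1)/2 comparisons.

```python
def bubble_sort(values):
    """Return a sorted copy and the number of comparisons made (no early exit)."""
    a = list(values)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

for n in (10, 100, 1000):
    _, c = bubble_sort(range(n, 0, -1))  # reverse-sorted input
    print(n, c, n * (n - 1) // 2)
```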
Question 18. C1 and C2 are classes written in an object-oriented programming language (such as Java, C#, or
C++). Which of the following statements are correct if C1 is a superclass of C2?
a) C1 is always an abstract class.
b) C2 contains all public features defined by C1.
c) Each C2 object may be replaced by a C1 object.

d) C2 is a subclass of C1.
Question 19. The following function f uses recursion:
def f(n):
    if n <= 1
        return n
    else
        return f(n-1) + f(n-2)

Let n be a valid input, i.e., a natural number. Which of the following functions returns the same result but without recursion?
a) def f(n):
       a <- 0
       b <- 1
       if n = 0
           return a
       elsif n = 1
           return b
       else
           for i in 1..n
               c <- a + b
               a <- b
               b <- c
           return b
b) def f(n):
       a <- 0
       i <- n
       while i > 0
           a <- a + i + (i-1)
       return a

c) def f(n):
       arr[0] <- 0
       arr[1] <- 1
       if n <= 1
           return arr[n]
       else
           for i in 2..n
               arr[i] <- arr[i-1] + arr[i-2]
           return arr[n]
d) def f(n):
       arr[0..n] <- [0, ..., n]
       if n <= 1
           return arr[n]
       else
           a <- 0
           for i in 0..n
               a <- a + arr[i]
           return a
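To compare candidates experimentally, you can translate the pseudocode into Python and check outputs for small n. The sketch below translates the recursive f and the array-based pattern of option c); the names `f_recursive` and `f_table` are our own, and the other options can be translated the same way.

```python
def f_recursive(n):
    """The recursive definition from the question (Fibonacci numbers)."""
    if n <= 1:
        return n
    return f_recursive(n - 1) + f_recursive(n - 2)

def f_table(n):
    """Bottom-up version that fills arr[0..n], following the structure of option c)."""
    if n <= 1:
        return n
    arr = [0] * (n + 1)
    arr[1] = 1
    for i in range(2, n + 1):
        arr[i] = arr[i - 1] + arr[i - 2]
    return arr[n]

print([f_recursive(n) for n in range(10)])
print(all(f_recursive(n) == f_table(n) for n in range(20)))
```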

2.3 Logic and Databases

Question 20. If A, B, and C are Boolean variables, which of the following statements are correct?

a) A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C)
b) A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)
c) (A ∧ B) ∨ C = C ∨ (B ∧ A)
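Identities over three Boolean variables can be verified exhaustively, since there are only 2³ = 8 assignments. A minimal truth-table sketch (the dictionary keys simply mirror the answer labels):

```python
from itertools import product

laws = {
    "a": lambda A, B, C: (A and (B or C)) == ((A and B) or (A and C)),
    "b": lambda A, B, C: (A or (B and C)) == ((A or B) and (A or C)),
    "c": lambda A, B, C: ((A and B) or C) == (C or (B and A)),
}

for name, law in laws.items():
    holds = all(law(A, B, C) for A, B, C in product([False, True], repeat=3))
    print(name, holds)
```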

Question 21. A large retail company keeps sales data local to the individual branches where sales transactions
were performed. To compute overall sales statistics, the company wants to avoid sending the full sales data set to
a central server. Instead, only aggregated sales information (sum, average, minimum, variance, median, maximum)
is sent from each branch to the central site. Which of the following statements are correct?
a) The overall sum can be derived from the sums per branch.
b) The overall average can be derived from the averages per branch.
c) The overall minimum can be derived from the minimums per branch.
d) The overall variance can be derived from the variances per branch.
e) The overall median can be derived from the medians per branch.
f) The overall maximum can be derived from the maximums per branch.
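Some aggregates can be merged from per-branch summaries and others cannot. The sketch below assumes each branch additionally reports its record count n and its sum of squares (hypothetical extra fields, not in the question's list), which act as weights when pooling means and variances. It also shows two splits of data whose branch medians agree (1 and 4) while the overall medians differ. All data values are made up.

```python
import statistics

def summarize(values):
    """Per-branch summary; n and sumsq are assumed extra fields."""
    return {"n": len(values), "sum": sum(values), "sumsq": sum(v * v for v in values),
            "min": min(values), "max": max(values)}

def combine(summaries):
    """Combine per-branch summaries into overall sum, mean, sample variance, min, max."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    sumsq = sum(s["sumsq"] for s in summaries)
    mean = total / n
    variance = (sumsq - n * mean * mean) / (n - 1)  # sample variance
    return {"sum": total, "mean": mean, "var": variance,
            "min": min(s["min"] for s in summaries),
            "max": max(s["max"] for s in summaries)}

branches = [[12.0, 7.5, 3.0], [8.0, 10.0], [1.0, 4.0, 6.5, 9.0]]
overall = combine([summarize(b) for b in branches])
pooled = [v for b in branches for v in b]
print(overall["mean"], statistics.mean(pooled))
print(overall["var"], statistics.variance(pooled))

# Same branch medians (1 and 4) in both splits, but different overall medians
split_1 = [[0, 1, 2], [3, 4, 100]]
split_2 = [[1, 1, 1], [0, 4, 4]]
print(statistics.median(split_1[0] + split_1[1]), statistics.median(split_2[0] + split_2[1]))
```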
Question 22. Consider the following table in a relational database.

Last Name   Rank        Room   Shift
Smith       Manager     234    Morning
Jones       Custodian   33     Afternoon
Smith       Custodian   33     Evening
Doe         Clerical    222    Morning

According to the data shown in the table, which of the following could be candidate keys of the table?
a) {Last Name}
b) {Room}
c) {Shift}
d) {Rank, Room}
e) {Room, Shift}
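A candidate key must identify every row uniquely and be minimal (no proper subset is unique as well). Data alone can only rule a key out, not prove it holds for all future rows. The sketch below applies both checks to the four rows shown; the helper names are our own.

```python
from itertools import combinations

columns = ["Last Name", "Rank", "Room", "Shift"]
rows = [
    ("Smith", "Manager", 234, "Morning"),
    ("Jones", "Custodian", 33, "Afternoon"),
    ("Smith", "Custodian", 33, "Evening"),
    ("Doe", "Clerical", 222, "Morning"),
]

def is_unique(attrs):
    """True if projecting the rows onto attrs yields no duplicate tuples."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def is_candidate_key(attrs):
    """Unique on the shown data and minimal: no proper subset is unique too."""
    if not is_unique(attrs):
        return False
    return not any(is_unique(sub) for k in range(1, len(attrs))
                   for sub in combinations(attrs, k))

for attrs in (["Last Name"], ["Room"], ["Shift"], ["Rank", "Room"], ["Room", "Shift"]):
    print(attrs, is_candidate_key(attrs))
```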
Question 23. The database interface of a library allows searching for only a single attribute (such as Title or Author) in each query. Your friend decided to extend its functionality and wrote an algorithm that allows searching for books that satisfy a conjunction of multiple predicates over single attributes. He tells you the algorithm reuses the already implemented query functionality and works by intersecting the results (book IDs) of queries over single attributes.
Which of the following assumptions on your friend’s algorithm are plausible?
a) Its worst-case run-time necessarily increases exponentially with respect to the number of attributes in the query.
b) Its worst-case run-time depends on the length of the longest result of the single-attribute queries.
c) It might be implemented using a join.
d) It might be implemented using sorting.
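The friend's actual algorithm is not given, so here is one plausible design as a sketch: sort each single-attribute result, then intersect the lists pairwise with a merge step, similar to a sort-merge join. In this sketch the cost is dominated by sorting the result lists. The function names and book IDs are hypothetical, and "Year" is an illustrative attribute.

```python
from functools import reduce

def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of unique IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def conjunctive_query(result_lists):
    """Sort each single-attribute result, then intersect them pairwise."""
    return reduce(intersect_sorted, (sorted(set(r)) for r in result_lists))

by_author = [17, 3, 42, 8, 99]   # hypothetical IDs from an Author query
by_title = [8, 42, 5, 17]        # hypothetical IDs from a Title query
by_year = [42, 17, 1]            # hypothetical IDs from a Year query
print(conjunctive_query([by_author, by_title, by_year]))  # [17, 42]
```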

2.4 Fundamentals of theoretical computer science

Question 24. Given an implementation of an algorithm, you want to formally check its run-time performance before you apply the algorithm to big data sets, in order to prevent endless runs of algorithms on your computer. Checking whether your algorithm runs endlessly on this data depends on . . .
a) the length of the source code; it is a coding problem.
b) function calls in the algorithm; it is a call-graph problem.
c) recursion in the algorithm; it is a software design problem.
d) the size of your data; it is a big data problem.
Question 25. Which of the following languages are regular?
a) Words that consist of only vowels (‘a’, ‘e’, ‘i’, ‘o’, ‘u’).
b) Words where the 6th-last character is a vowel.
c) Words that contain as many vowels as consonants (non-vowels).
d) Palindromes (reading the word backwards yields the same word).
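A language is regular exactly when some regular expression (equivalently, a finite automaton) describes it. The sketch below writes regular expressions for the first two languages with Python's `re` module, assuming words are non-empty or empty strings over the lowercase letters a to z; the example words are arbitrary.

```python
import re

# Assumption: words consist of lowercase letters a-z only.
only_vowels = re.compile(r"[aeiou]*")
sixth_last_vowel = re.compile(r"[a-z]*[aeiou][a-z]{5}")  # a vowel followed by exactly 5 more letters

for word in ["aeiou", "xylophone", "strength", "education"]:
    print(word,
          bool(only_vowels.fullmatch(word)),
          bool(sixth_last_vowel.fullmatch(word)))
```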

2.5 Computer Architecture
Question 26. In computer architecture, SIMD may refer to the situation where...
a) multiple CPU cores can access the same memory concurrently.
b) the same operation can be applied to multiple operands with only a single instruction.
c) multiple independent instructions can be executed at the same time in the same CPU core.
d) multiple independent memory banks show up as a single address space.

3 Statistics
3.1 Descriptive Statistics
Question 27. Which of the following sets have an arithmetic mean of 100, but a median smaller than 100?
a) {80, 100, 120}
b) {80, 80, 140}
c) {0, 50, 150}
d) {60, 120, 120}
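Mean and median can be compared directly with the standard `statistics` module; the labels below simply mirror the answer options.

```python
from statistics import mean, median

candidates = {
    "a": [80, 100, 120],
    "b": [80, 80, 140],
    "c": [0, 50, 150],
    "d": [60, 120, 120],
}

for label, data in candidates.items():
    print(label, mean(data), median(data), mean(data) == 100 and median(data) < 100)
```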
Question 28. Can there be a set of data that fits both of the following histograms? Which of these answers are correct?

[Figure: two density histograms. Histogram 1 (left): horizontal axis x from −4 to 8, density axis from 0 to 0.15. Histogram 2 (right): horizontal axis y from 0 to 50, density axis from 0 to 0.06.]

a) No, because the right one is calculated from positive data only.
b) Yes, the right one includes all possible data from which the left one may be calculated.
c) No, the right one must be calculated with at least one value greater than 8.
d) No, the left one cannot have been calculated with a value of 10 or more.

Question 29. Calculate estimates of the standard deviations $s_x$, $s_y$ of the samples x = (5, 9, 7) and y = (−1, 2, 5) as well as the Pearson coefficient of correlation $r_{xy}$ of x and y. Which of the following answers are correct?
a) $s_x = 4$, $s_y = 9$
b) $r_{xy} = 0$
c) $s_x = 2$, $s_y = 3$
d) $r_{xy} = \frac{1}{2}$
e) $r_{xy} = \frac{1}{4}$
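The sketch below computes the estimates by hand, assuming the usual sample estimators with denominator n − 1 (the question only says "estimates"). The helper names are our own.

```python
import math

def sample_sd(v):
    """Sample standard deviation with denominator n - 1."""
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def pearson(x, y):
    """Pearson correlation: sample covariance divided by the product of sample SDs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (sample_sd(x) * sample_sd(y))

x = (5, 9, 7)
y = (-1, 2, 5)
print(sample_sd(x), sample_sd(y), pearson(x, y))
```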

Question 30. Consider the following scatter plot.

●
●

● ●

●
● ●●
●
●
1

● ●●

● ●
●
● ● ●
●

●●
●
● ● ●
0

●
●
● ●
x2

● ●

● ●
● ● ●
●

●
● ● ●
● ●
● ●
−1

●
●
−2

−2 −1 0 1

The coefficient of correlation of the two variables . . .

a) is negative.
b) is positive.

c) should have an absolute value greater than 0.4.

d) should be close to zero.
Question 31. Let the coefficient of correlation of two variables X and Y be larger than zero. What will be the effect on it if the data of X are multiplied by a factor of 2?
a) The effect depends on the data of X.
b) It depends on Y.
c) The coefficient will be doubled.
d) It will be increased fourfold.
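Multiplying X by a positive constant c multiplies both the covariance and the standard deviation of X by c, so the ratio that defines r stays the same. A quick numerical sketch on made-up data:

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 4.0, 5.0, 7.0]   # illustrative data
y = [2.0, 2.5, 5.0, 4.5, 8.0]
r_original = pearson(x, y)
r_scaled = pearson([2 * a for a in x], y)
print(r_original, r_scaled)
```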

3.2 Probability
Question 32. There are 8 socks in your drawer: 4 black and 4 red. You take 3 of them with you in the dark. Which statements are correct?
a) It is certain that you get at least two socks (a pair) of the same colour.
b) It is certain that you get a pair of reds.
c) The probability of getting 3 of the same colour is $\frac{1}{8}$.
d) The probability of getting 3 of the same colour is $\frac{1}{7}$.
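Treating the socks as 8 distinct objects, all C(8, 3) = 56 three-sock draws are equally likely, so the probabilities can be obtained by enumeration:

```python
from itertools import combinations
from fractions import Fraction
from math import comb

socks = ["black"] * 4 + ["red"] * 4
draws = list(combinations(range(8), 3))   # all equally likely 3-sock draws

same_colour = sum(1 for d in draws if len({socks[i] for i in d}) == 1)
has_pair = sum(1 for d in draws if len({socks[i] for i in d}) < 3)

print(len(draws), comb(8, 3))             # 56 56
print(Fraction(same_colour, len(draws)))  # 1/7
print(has_pair == len(draws))             # True: 3 socks, only 2 colours
```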
Question 33. In the sports injuries unit of a hospital, 40% of the patients are rugby players, 20% are swimmers
and the remaining 40% play soccer. For a rugby player, the probability to be released on the first day is 10%; for
a swimmer, it is 20%; for a soccer player, it is 80%. Which of the following statements are correct?
a) 40% of all patients are released on the first day.

b) Given a patient is released on the first day, the probability of her/him being a soccer player is 80%.
c) 80% of the non-swimmers have to stay for more than one day.
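The three statements follow from the law of total probability and Bayes' theorem. The sketch below uses exact fractions; the dictionary keys are our own labels for the three groups.

```python
from fractions import Fraction as F

share = {"rugby": F(40, 100), "swimming": F(20, 100), "soccer": F(40, 100)}
p_release = {"rugby": F(10, 100), "swimming": F(20, 100), "soccer": F(80, 100)}

# Law of total probability: P(released on day 1)
p_released = sum(share[s] * p_release[s] for s in share)

# Bayes' theorem: P(soccer | released on day 1)
p_soccer_given_released = share["soccer"] * p_release["soccer"] / p_released

# P(stays longer than one day | not a swimmer)
non_swimmers = ["rugby", "soccer"]
p_non_swimmer = sum(share[s] for s in non_swimmers)
p_stay_given_non_swimmer = sum(share[s] * (1 - p_release[s]) for s in non_swimmers) / p_non_swimmer

print(p_released, p_soccer_given_released, p_stay_given_non_swimmer)
```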
Question 34. Let X be a random variable with probability density function
$$f(x) = \begin{cases} \frac{1}{9} x^2, & x \in [0, 3], \\ 0, & \text{else}. \end{cases}$$
Which of the following statements are correct?
a) The expected value of X is 49 .
b) The probability of X < 1 is $\frac{1}{27}$.
c) The probability of X ∈ [0, 0.5] is $\frac{1}{54}$.
d) The probability of X = 1 is zero.
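Integrating the density gives the distribution function F(x) = x³/27 on [0, 3] and the expected value E[X] = ∫₀³ x · x²/9 dx = 3⁴/36. For a continuous distribution, a single point such as X = 1 has probability zero. A sketch with exact fractions (the helper name `cdf` is ours):

```python
from fractions import Fraction as F

def cdf(x):
    """F(x) = integral_0^x t^2 / 9 dt = x^3 / 27, clipped to the support [0, 3]."""
    x = min(max(F(x), F(0)), F(3))
    return x ** 3 / 27

expected_value = F(3) ** 4 / 36  # integral_0^3 x * x^2 / 9 dx

print(cdf(3))                  # 1: the density integrates to one
print(expected_value)          # 9/4
print(cdf(1) - cdf(0))         # P(X < 1)
print(cdf(F(1, 2)) - cdf(0))   # P(0 <= X <= 0.5)
```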

3.3 Inference and Linear Models

Question 35. Let X be a random variable defined by the density function
$$f(x) = \begin{cases} \dfrac{\alpha \beta^{\alpha}}{x^{\alpha+1}}, & \text{if } x \ge \beta, \\ 0, & \text{else}, \end{cases}$$
with parameters α > 0 and β > 0. We observe a sample {3, 4, 8}. Which of the following statements are correct?
a) The expected value of X exists for all combinations of α and β.
b) The expected value depends only on α, but not on β.
c) Given the sample, β cannot be larger than 3.
d) If we assume β = 2, estimating the expected value by the sample mean leads to an estimate of $\frac{5}{3}$ for α.
e) $\frac{5}{3}$ is also the maximum likelihood estimate of α in this case.
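This is a Pareto density. For α > 1 its expected value is αβ/(α − 1); otherwise the expectation is infinite. Setting αβ/(α − 1) equal to the sample mean x̄ and solving gives the moment estimator α = x̄/(x̄ − β). Maximising the log-likelihood n ln α + nα ln β − (α + 1) Σ ln xᵢ for known β gives α̂ = n / Σ ln(xᵢ/β). The sketch below evaluates both estimators with β = 2 assumed known, as in statement d).

```python
import math

sample = [3, 4, 8]
beta = 2.0  # assumed known, as in statement d)

x_bar = sum(sample) / len(sample)

# Method of moments: solve alpha * beta / (alpha - 1) = x_bar for alpha
alpha_mom = x_bar / (x_bar - beta)

# Maximum likelihood for known beta
alpha_mle = len(sample) / sum(math.log(x / beta) for x in sample)

print(x_bar, alpha_mom, alpha_mle)
```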
Question 36. We are interested in significant differences (level α = 0.05) between the expected values µ1 and µ2
of two populations. Which of the following statements on statistical tests are correct?
a) We will formulate the null hypothesis as µ1 = µ2 .
b) A t-test can always be applied in this situation.

c) A p-value is the probability that the null hypothesis is correct, given the observed data.
d) If we obtain a p-value of 0.04, we will reject (level α = 0.05) the null hypothesis.
Question 37. One of the lines in the following scatter plot is the regression line fitted to the data. Which of the
statements are correct?

[Figure: scatter plot with several candidate lines; vertical axis y from 10 to 80, horizontal axis from 0 to 200]

a) The red and green lines have the right direction and, hence, one of them could be the regression line.
b) The blue line seems to represent the mean value of the data with respect to y and thus could be the regression
line.
c) The point in the top right corner has a strong influence on the regression line.
d) Leaving aside the point in the corner, the red line seems to fit better to the rest of the data.

Question 38. You have performed a linear regression analysis to explore sunflowers’ growth (in meters per month)
depending on the watering (in litres per day). You have estimated the regression coefficient to be β̂ = 1.6. What
can you conclude?
a) There is a significant correlation between watering and growth.
b) An average sunflower grows 1.6 meters per month.
c) If you give it an additional litre of water per day, there will be an additional average growth of 1.6 meters per
month.
d) According to the model assumptions, an additional litre of water per day will result in additional 19.2 meters
of growth after one year.
e) You should consider further influencing quantities.

3.4 (Alternative 1) R Programming

In this section, you can opt between R (here) and Python (below) representations of the same questions.
Question 39. Which of the following R commands evaluates to TRUE?

a) 5 >= 5
b) TRUE & FALSE | FALSE & TRUE
c) FALSE & FALSE & FALSE | TRUE
d) !(((TRUE > FALSE) > TRUE) & !TRUE)

Question 40. Consider the following code chunk:

x <- 0
while(x < 4) {
  x <- sample(1:3, 1)
  print(x)
}
It is not a good idea to run these lines because...
a) x is an invalid argument to print().
b) the condition x < 4 is never violated.
c) the function sample() does not exist.
d) x is initialised with the wrong type.
Question 41. Which of the following code lines return TRUE?
a) max(c(2, 3, 4, NA, 1, 5)) == NA
b) max(c(2, 3, 4, NA, 1, 5), na.rm = TRUE) == 5
c) typeof(sum(c(1, 2, 3, 4, NA))) == "double"
d) typeof(sum(1:4)) == "integer"
e) typeof(sum(c(1L, 2L, 3L, 4L, NA_real_), na.rm = TRUE)) == "integer"
Question 42. Which functions may have been used to generate the following plot and its underlying data?

a) lm()
b) points()
c) abline()
d) integrate()

Question 43. Consider the following code chunk and output and note that NA appears in the output of lm().

X1 <- rnorm(1e2)
X2 <- X1 + 3
Y <- X1 + X2 + rnorm(1e2)
lm(Y ~ X1 + X2)

Call:
lm(formula = Y ~ X1 + X2)

Coefficients:
(Intercept) X1 X2
2.979 2.019 NA

Which of the following statements are correct?

a) Perfectly correlated regressors X1 and X2 are used.

b) lm() excludes X2 from the regression so that there is a least squares solution.
c) NA indicates that the model fit to the data is perfect.

3.4 (Alternative 2) Python Programming

In this section, you can opt between R (above) and Python (here) representations of the same questions.
Question 39. Which of the following Python commands evaluates to True?
a) 5 >= 5

b) True & False | False & True

c) False & False & False | True
d) not(((True > False) > True) & (not(True)))

Question 40. Consider the following code chunk:

import random
x = 0
while(x < 4):
    x = random.choice([1, 2, 3])
    print(x)

It is not a good idea to run these lines because...

a) x is an invalid argument to print().
b) the condition x < 4 is never violated.

c) the function random.choice() does not exist.

d) x is initialised with the wrong type.
Question 41. Which of the following code lines return True?
a) numpy.argmax(numpy.array([2,3,4,numpy.nan,1,5])) == 4
b) numpy.nanargmax(numpy.array([2,3,4,numpy.nan,1,5])) == 5
c) type(numpy.array([1,2,3,4,numpy.nan]).sum()) is numpy.float64
d) type(numpy.array([1,2,3,4],dtype=object).sum()) is int
e) type(numpy.array([1,2,3,4]).sum()) is int
Question 42. Which packages may have been used to generate the following plot and its underlying data?

[Figure: plot of Y against X]
a) numpy
b) mathplotlib
c) statsmodels
d) math
Question 43. Consider the following code chunk and output and note that there are two warnings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt

X1 = np.random.normal(0, 1, 100)
X2 = X1 + 3
Y = X1 + X2 + np.random.normal(0, 1, 100)

df = pd.DataFrame({"Y": Y, "X1": X1, "X2": X2})

linmodel = sm.ols(formula = "Y ~ X1 + X2", data = df).fit()

linmodel.summary()

Output:
Intercept -0.0272
X1 1.1074
X2 1.0257

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.27e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
Which of the following statements are correct?
a) Perfectly correlated regressors X1 and X2 are used.
b) Either X1 or X2 should be excluded, as the second regressor does not add any information to the model.
c) The second warning indicates that the model fit to the data is perfect.

4 Data Science
Question 44. Consider a data set data containing all German inhabitants, which is subsetted in the following
process:

data = subset(data, Gender == "female")

data = subset(data, Status == "married")
data = subset(data, Haircolor == "brown")

Select which statements are true after all the subsets have been applied.
a) The data set contains all brown haired and married females worldwide.
b) The data set contains all brown haired German inhabitants.
c) The data set contains all brown haired and married female German inhabitants.
d) The data set contains all brown haired, married, female German inhabitants with at least 2 children.
Question 45. For what ultimate purposes may algorithms like Nelder-Mead, Newton-Raphson or gradient descent be used?
a) To find the minimum of a function.
b) To find all zeros of a function.
c) To evaluate the derivative of a function.
d) To solve a generalised regression problem.
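As an illustration of two of these algorithms, the sketch below minimises a simple quadratic with fixed-step gradient descent and runs Newton-Raphson on a function with two zeros. The step size, iteration counts and test functions are arbitrary choices; note that a single Newton-Raphson run converges to at most one zero, depending on the starting point.

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Minimise f(x) = (x - 3)^2 + 1, whose gradient is 2 (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)

def newton_raphson(f, df, x0, iters=50):
    """Newton-Raphson iteration x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(iters):
        x -= f(x) / df(x)
    return x

# g(x) = x^2 - 2 has two zeros; starting at x0 = 1.0 finds only +sqrt(2)
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(x_min, root)
```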
Question 46. The Titanic data set contains information on whether passengers of the Titanic survived the shipwreck, together with their gender, age and passenger class. The following decision tree has been learned on this data. Which of the statements are true?

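died 0.38 100%   [split: sex = male]
├── yes → died 0.19 64%   [split: age >= 9.5]
│   ├── yes → died 0.17 61%
│   └── no → survived 0.58 3%   [split: pclass = 3rd]
│       ├── yes → died 0.38 2%
│       └── no → survived 1.00 1%
└── no → survived 0.73 36%   [split: pclass = 3rd]
    ├── yes → died 0.49 17%   [split: age >= 1.5]
    │   ├── yes → died 0.48 16%
    │   └── no → survived 0.86 1%
    └── no → survived 0.93 19%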

a) Overall, 62% of the passengers in the data set died.

b) A ‘new’ passenger (female, 3rd class, 30 years old) is predicted to die in the shipwreck.
c) 62% of the passengers in the data set are female.
d) All male 3rd class passengers in the data set died.
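To trace a prediction, the tree can be transcribed into nested conditions. The sketch below copies the splits and leaf shares from the tree. The function name `predict`, the encoding of sex as "male"/"female", and the encoding of passenger class as "1st"/"2nd"/"3rd" are our assumptions.

```python
def predict(sex, age, pclass):
    """Predicted class from the decision tree above (hypothetical encoding of inputs)."""
    if sex == "male":
        if age >= 9.5:
            return "died"
        return "died" if pclass == "3rd" else "survived"
    # sex = male: no
    if pclass == "3rd":
        return "died" if age >= 1.5 else "survived"
    return "survived"

# Leaf shares in % as printed in the tree, left to right
leaf_shares = [61, 2, 1, 16, 1, 19]
print(sum(leaf_shares))               # 100
print(predict("female", 30, "3rd"))   # died
print(predict("male", 5, "1st"))      # survived
```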

Question 47. Random forests are one of the most famous machine learning methods. They are easy to understand, easy to implement, and achieve good predictive performance even without hyper-parameter tuning. Which of the following statements on random forests are correct?
a) The prediction of a classification forest is made by a majority vote of the trees’ predictions.
b) The prediction of a regression forest is the median of the tree predictions.
c) Each single tree in the forest uses only a part of the available data.
d) The training time of a random forest scales linearly with the number of trees used.
Question 48. Let us return to the Titanic data set. We have now learned several models and want to choose the best one. We used three different methods to validate these models: the training error rate (apparent error rate), the error rate on an external test set, and the error rate estimated by a 10-fold cross-validation.

Learner               Training Error   Error on the Test Set   Cross-Validation Error
Decision Tree         0.18             0.22                    0.21
Random Forest         0.01             0.10                    0.12
1-Nearest-Neighbour   0                0.18                    0.19

Which of the following statements are correct?

a) 1-Nearest-Neighbour has a perfect training error and hence it should be used here.
b) Random Forest outperforms both 1-Nearest-Neighbour and the Decision Tree in terms of prediction error.
c) Not just in this case, but in general, Cross Validation is the better validation strategy and should always be
preferred over the error on a single test set.

d) Not just in this case, but in general, Decision Trees always perform worse than Random Forests.
Question 49. We try one last model class to find the perfect model for the Titanic data set: an SVM. The SVM is a model class that is very sensitive to hyper-parameter tuning. In particular, the cost parameter C and the bandwidth λ of the RBF kernel must be optimally adjusted in order to obtain a sensible model.
We use a nested resampling strategy to perform this hyper-parameter tuning: first, 33% of the data are set aside as an external test set to validate the result of the hyper-parameter tuning itself (the outer resampling strategy). We use a random search as the tuning algorithm with a budget of 100 iterations. As parameter spaces, we use all positive real numbers for both C and λ. The performance of a single hyper-parameter setting is evaluated using a 10-fold cross-validation (the inner resampling strategy). Moreover, in order to speed up the entire tuning process, we utilise parallel computing.
Which of the following statements are correct?

a) Using a nested resampling is necessary in order to detect underfitting.

b) As both C and λ are numeric parameters, any other optimization algorithm could be used instead of random
search.
c) The choice of cross-validation as the inner resampling strategy is arbitrary, and a bootstrapping would lead
to similar results.
d) The parallelization should take place at the innermost loop, hence, the execution of the inner cross-validation
loop should be parallelized.
Question 50. Take a look at the following scatter plot of the so-called XOR data-set:

[Figure: scatter plot of the XOR data set; vertical axis x2, both axes from −1.0 to 1.0]

It is a classification data set with the goal of separating the red and the black observations. Assume that the number of red and black observations is approximately equal. Which of the following statements are correct?
a) A Decision Tree can reach a prediction error of (nearly) zero on this data set.
b) When performing a variable selection using the stepwise forward selection algorithm, neither of the variables x1, x2 will be added to the model.
c) A Linear Discriminant Analysis (LDA) can reach a prediction error of (nearly) zero on this data set.
d) Every model using only one of the two variables x1, x2 will have a misclassification error of approximately 50%.
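The XOR structure can be simulated to see why a single variable carries no information on its own while two axis-parallel splits separate the classes. The sketch uses hypothetical synthetic data (uniform points on [−1, 1]², class 1 when x1 · x2 > 0), not the data in the figure. The threshold grid and the hand-written depth-2 rule are our assumptions; the rule is not a learned tree.

```python
import random

random.seed(1)
# Hypothetical XOR-style sample: class 1 if x1 and x2 have the same sign
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2000)]
labels = [int(x1 * x2 > 0) for x1, x2 in points]

def error(predictions):
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

def best_stump_error(feature):
    """Lowest error of any rule 'class = (x_feature > t)' or its negation, over a grid of thresholds t."""
    best = 1.0
    for t in [i / 20 for i in range(-20, 21)]:
        e = error([int(p[feature] > t) for p in points])
        best = min(best, e, 1 - e)
    return best

# Depth-2 rule: split on x1 at 0, then on x2 at 0 in each branch
tree_preds = [int((x1 > 0) == (x2 > 0)) for x1, x2 in points]

print(best_stump_error(0), best_stump_error(1), error(tree_preds))
```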
ML PDF
17 pages
qasim-et-al-2024-lasso-type-instrumental-variable-selection-methods-with-an-application-to-mendelian-randomization (1)
No ratings yet
qasim-et-al-2024-lasso-type-instrumental-variable-selection-methods-with-an-application-to-mendelian-randomization (1)
24 pages
Artificial Neural Networks Model For Predicting Excavator (Ok)
No ratings yet
Artificial Neural Networks Model For Predicting Excavator (Ok)
7 pages
Modern Machine Learning in Python
No ratings yet
Modern Machine Learning in Python
50 pages
Lab Manual
No ratings yet
Lab Manual
100 pages
Finance Project Proposal
No ratings yet
Finance Project Proposal
7 pages
Unit-2
No ratings yet
Unit-2
125 pages
Titanic Classification Project
No ratings yet
Titanic Classification Project
17 pages
IS4242 W6 Model Evaluation and Selection
No ratings yet
IS4242 W6 Model Evaluation and Selection
86 pages
Linear Discriminant Analysis and Its Variations: Abu Minhajuddin CSE 8331
No ratings yet
Linear Discriminant Analysis and Its Variations: Abu Minhajuddin CSE 8331
20 pages
Finhack 2018 - ATM Cash Optimization (Dilan) v2
No ratings yet
Finhack 2018 - ATM Cash Optimization (Dilan) v2
23 pages
Bouchard, Stenetorp, Riedel - Unknown - Learning To Generate Textual Data
No ratings yet
Bouchard, Stenetorp, Riedel - Unknown - Learning To Generate Textual Data
9 pages
Lec-18_Model Evaluation Metrics & Performance Measures
No ratings yet
Lec-18_Model Evaluation Metrics & Performance Measures
13 pages
Rpart
No ratings yet
Rpart
34 pages
WEEK 01 Merged
No ratings yet
WEEK 01 Merged
606 pages
Accuracy Measures
No ratings yet
Accuracy Measures
61 pages
Report - 5G-Energy Consumption Modelling by ITU
No ratings yet
Report - 5G-Energy Consumption Modelling by ITU
13 pages
ADC 2018 Paper 49
No ratings yet
ADC 2018 Paper 49
14 pages
Unstyle: A Tool For Evading Authorship Attribution
No ratings yet
Unstyle: A Tool For Evading Authorship Attribution
80 pages
Machine Learning Notes
100% (1)
Machine Learning Notes
115 pages
Ijeditor1,+nov Dec+2021 26 32 PDF
No ratings yet
Ijeditor1,+nov Dec+2021 26 32 PDF
7 pages
To Improve The Performance of Models Predicting Ba
No ratings yet
To Improve The Performance of Models Predicting Ba
6 pages
Comparison Between ICH vs ANVISA AMV 1742817436
No ratings yet
Comparison Between ICH vs ANVISA AMV 1742817436
15 pages
Neural Networks: A New Technique For Development of Decision Support Systems in Dentistry
No ratings yet
Neural Networks: A New Technique For Development of Decision Support Systems in Dentistry
5 pages

Self Test Master Data Science SoSe 2021 2

Uploaded by

Self Test Master Data Science SoSe 2021 2

Uploaded by


We wish you good luck!

a) The antiderivative is not explicitly calculable.

a) f(0) < f(4)

1.2 Linear Algebra

1.3 Analytic Geometry

Which statements hold?

The result of a further s.pop() is . . .

2.2 Algorithms and Programming

d) Merge Sort has worst-case run-time complexity O(n).

g) Quick Sort has worst-case run-time complexity O(n).

2.3 Logic and Databases

2.4 Fundamentals of Theoretical Computer Science

Question 30. Consider the following scatter plot.

The coefficient of correlation of the two variables . . .

c) should have an absolute value greater than 0.4.
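Question 30 concerns the coefficient of correlation. The sketch below assumes the sample (Pearson) coefficient is the one meant. It divides the sum of cross-deviations by the square root of the product of the sums of squared deviations, which keeps the result in [-1, 1]. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (proportional to the sample covariance)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the sample variances)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])` evaluates to 0.8. A perfectly linear increasing relationship gives 1.0, and a perfectly linear decreasing one gives -1.0.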

a) The effect depends on the data of X.

d) It will be increased fourfold.

Which of the following statements are correct?

d) The probability of X = 1 is zero.

3.3 Inference and Linear Models

c) Given the sample, β cannot be larger than 3.


3.4 (Alternative 1) R Programming

Question 40. Consider the following code chunk:

Which of the following statements are correct?

3.4 (Alternative 2) Python Programming

b) True & False | False & True

Question 40. Consider the following code chunk:

It is not a good idea to run these lines because...

c) the function random.choice() does not exist.

df = pd.DataFrame({"Y": Y, "X1": X1, "X2": X2})

linmodel = sm.ols(formula = "Y ~ X1 + X2", data = df).fit()
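The formula `"Y ~ X1 + X2"` passed to `sm.ols` above requests an ordinary least squares fit of Y on an intercept plus X1 and X2. The following minimal NumPy sketch solves the same least-squares problem. It is not how statsmodels is implemented internally, and both the helper `fit_ols` and the simulated data are hypothetical.

```python
import numpy as np


def fit_ols(y, x1, x2):
    """Least-squares coefficients (intercept, b1, b2) for the model Y ~ X1 + X2."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the two regressors
    X = np.column_stack([np.ones_like(y), x1, x2])
    # lstsq returns (solution, residuals, rank, singular values); keep the solution
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical simulated data without noise, so the true coefficients are recoverable
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 2.0 * x1 - 0.5 * x2
print(fit_ols(y, x1, x2))
```

Because the simulated Y contains no noise, the printed coefficients should match 1.0, 2.0 and -0.5 up to floating-point rounding. With noisy data they would only be estimates of the true values.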

data = subset(data, Gender == "female")

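[Figure: classification tree for passenger survival. The root split is sex = male (yes/no). Further splits use age (age >= 9.5, age >= 1.5) and passenger class (pclass = 3rd), and the leaves are labelled died or survived.]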

a) Overall, 62% of the passengers in the data set died.

Which of the following statements are correct?

a) Using a nested resampling is necessary in order to detect underfitting.

