DLMBDSA01-01 Practice Exam (2)
DLMBDSA01-01 1433
EXAMID: 1221789
Each multiple-choice question has exactly one correct answer. Please mark the correct answer. You are permitted to mark only one answer. If you mark
more than one answer, you will receive 0 points. If you mark the correct answer, you will receive full points. Note: the cross must be set correctly.
QUESTION 1 OF 18
Marked out of 3.00
In which dimensions are data science activities typically conducted?
Select one:
data prediction and business analytics
descriptive dimension, prescriptive dimension and diagnostic dimension
feature engineering, model validation and hyperparameter tuning
data flow, data curation, and data analytics
QUESTION 2 OF 18
Marked out of 3.00
Which of the following defines a false positive output for a classification model?
Select one:
QUESTION 3 OF 18
Select one:
LDA correlation
Bayesian correlation
Pearson correlation
PCA correlation
QUESTION 4 OF 18
Marked out of 3.00
E-Mail spam detection is a ...
Select one:
reinforcement problem.
categorization problem.
classification problem.
regression problem.
QUESTION 5 OF 18
Marked out of 3.00
Select one:
high-dimensionality clustering
dimensionality reduction
de-biasing
outlier detection
QUESTION 6 OF 18
In which mathematical technique does the elbow method play an important role?
Select one:
QUESTION 7 OF 18
Marked out of 3.00
In which sub-activity of data flow do accessibility, transparency and security play an important role?
Select one:
data collection
data preservation
data storage
data access
QUESTION 8 OF 18
Marked out of 3.00
The correlation between the amount of exercise per week and body weight is -0.45. What can be deduced about the
Select one:
QUESTION 9 OF 18
Marked out of 3.00
Which type of data format is the following?
{
  "student": {
    "name": "name_of_student",
    "grade": 85,
    "enrolled": true
  }
}
Select one:
Apache Parquet
Protobuf
JSON
XML
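As a study aid, the snippet in this question can be inspected with a short Python sketch. It assumes the snippet has been pasted into a string (the variable name `snippet` is hypothetical) and uses the standard-library `json` module to check whether the text parses and which value types it contains.

```python
import json

# Hypothetical variable holding the snippet from the question text.
snippet = """
{
  "student": {
    "name": "name_of_student",
    "grade": 85,
    "enrolled": true
  }
}
"""

# json.loads raises json.JSONDecodeError if the text is not valid JSON.
record = json.loads(snippet)
student = record["student"]

# Report the Python type each value was mapped to.
for key, value in student.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

The nested key-value layout and the lowercase literal `true` are the features the parser relies on here.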
QUESTION 10 OF 18
Select one:
QUESTION 11 OF 18
Marked out of 3.00
In a normal distribution, approximately 68% of the values lie within …
Select one:
four standard deviations.
two standard deviations.
one standard deviation.
three standard deviations.
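For self-checking, the share of a normal distribution that falls within a given number of standard deviations of the mean can be computed from the cumulative distribution function. The sketch below is a study aid, not part of the exam; it assumes Python 3.8 or later for `statistics.NormalDist`, and the helper name `fraction_within` is hypothetical.

```python
from statistics import NormalDist


def fraction_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    # The result does not depend on the mean or scale, so the standard normal is used.
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Compare the four distances offered in the answer options.
for k in (1, 2, 3, 4):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```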
QUESTION 12 OF 18
Marked out of 3.00
Select one:
only TN and FN
only TN and FP
only TP and FN
only TP and FP
QUESTION 13 OF 18
Marked out of 3.00
Which of the following statements is correct for a histogram?
Answer option 1: A smaller dataset is best visualized with a larger number of bins for increased accuracy.
Answer option 2: The number of bins has a direct relationship with the size of each bin.
Select one:
None of the answer options are correct.
Both answer options are correct.
Only answer option 1 is correct.
Only answer option 2 is correct.
QUESTION 14 OF 18
In the context of support vector machines (SVM), what role do support vectors and the margin play?
Select one:
Support vectors are the outlier data points, and the margin measures the regularization strength in SVM.
Support vectors are the data records closest to the classification line, and the margin is the
Support vectors are the features used for classification, and the margin represents the error in the model.
Support vectors are the parameters defining the decision boundary, and the margin indicates the lowest and highest value of the dataset.
QUESTION 15 OF 18
Marked out of 6.00
Briefly explain the three main aspects used to identify a data science use case (DSUC) in a business context.
QUESTION 16 OF 18
Marked out of 6.00
Suppose you have an attribute with values ranging from -1290 to 4870, and you want to normalize these values using decimal scaling. Explain the required steps.
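A minimal sketch for checking a hand-worked answer, assuming the common definition of decimal scaling in which every value is divided by 10^j and j is the smallest integer that brings the largest absolute value below 1. The function names `decimal_scaling_exponent` and `decimal_scale` are hypothetical and chosen only for this illustration.

```python
def decimal_scaling_exponent(values):
    """Return the smallest integer j >= 0 with max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Keep shifting the decimal point until the largest magnitude drops below 1.
    while max_abs / 10**j >= 1:
        j += 1
    return j


def decimal_scale(values):
    """Divide every value by 10**j and return the scaled values together with j."""
    j = decimal_scaling_exponent(values)
    return [v / 10**j for v in values], j


# The two range endpoints given in the question.
scaled, j = decimal_scale([-1290, 4870])
print(f"j = {j}, scaled range = {scaled}")
```

The sketch only needs the endpoints of the range, because the largest absolute value alone determines j; the same divisor is then applied to every value of the attribute.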
QUESTION 17 OF 18
Marked out of 18.00
Sophia and Daniel are working on a data science project for an e-commerce company aiming to improve customer satisfaction and sales. The project involves analyzing customer data to identify patterns, preferences, and potential areas for enhancement in the online shopping experience. The dataset includes various customer-related variables such as purchase history, browsing behavior, demographic information, and customer feedback.
However, Sophia and Daniel are facing challenges related to data quality in this setting. One of the customer variables in the dataset is "time spent on website", which represents the amount of time a customer spends browsing the e-commerce platform during a session. Sophia and Daniel notice that there are missing values and outliers in this variable, potentially affecting their analysis of customer engagement.
a) Briefly explain what is meant by duplicates and outliers. Illustrate your answer by referring to the dataset described above.
b) Name four methods that are commonly used to resolve missing values.
c) Evaluate these methods in terms of their advantages and disadvantages for handling missing values related to the
QUESTION 18 OF 18
Marked out of 18.00
Emily and David are environmental researchers working on a project to analyze various types of environmental data for a conservation organization. They are tasked with categorizing different environmental datasets into structured, unstructured, and semi-structured data. However, they have different interpretations of these categories. Help them gain a clear understanding.
Categorize the following environmental data into the three categories: structured, unstructured, and semi-structured data. Briefly explain why you assigned each item to its category.
(1) Satellite imagery capturing deforestation patterns
(2) Research papers discussing climate change trends
(3) Excel spreadsheet containing bird species observations
(4) Field notes detailing observations of animal behavior
(5) Monthly rainfall log
(6) Drones capturing real-time footage of wildlife habitats