
DLMBDSA01-01 Practice Exam (2)

The document is a practice exam consisting of 18 questions on data science concepts, including classification models, correlation analysis, and data formats. Questions 1 to 14 are multiple-choice and require the selection of one correct answer, with specific instructions on marking and scoring. Questions 15 to 18 are open-ended and ask for explanations and evaluations related to data science use cases, data normalization, data quality, and data categorization.

Uploaded by

Manish Kumar

PRACTICE EXAM

DLMBDSA01-01 1433
EXAMID: 1221789

Each multiple-choice question has exactly one correct answer. Please mark the correct answer. You are permitted to mark only one answer. If you mark more than one answer, you will get 0 points. If you mark the correct answer, you will get the maximum points. Exam note: the cross must be set correctly.

QUESTION 1 OF 18
Marked out of 3.00

In which dimensions are data science activities typically conducted?

Select one:

data prediction and business analytics
descriptive dimension, prescriptive dimension and diagnostic dimension
feature engineering, model validation and hyperparameter tuning
data flow, data curation, and data analytics

QUESTION 2 OF 18
Marked out of 3.00

Which of the following defines a false positive output for a classification model?

Select one:

The classifier labels a “No” data record as “Yes”.
The classifier labels a “No” data record as “No”.
The classifier labels a “Yes” data record as “Yes”.
The classifier labels a “Yes” data record as “No”.

QUESTION 3 OF 18
Marked out of 3.00

Which of the following is a type of correlation analysis?

Select one:

LDA correlation
Bayesian correlation
Pearson correlation
PCA correlation


QUESTION 4 OF 18
Marked out of 3.00

E-Mail spam detection is a ...

Select one:

reinforcement problem.
categorization problem.
classification problem.
regression problem.

QUESTION 5 OF 18
Marked out of 3.00

What is one main goal of principal component analysis (PCA)?


Select one:

high-dimensionality clustering
dimensionality reduction
de-biasing
outlier detection

QUESTION 6 OF 18
Marked out of 3.00

In which mathematical technique does the elbow method play an important role?

Select one:

principal component analysis (PCA)
K-means clustering
time-series forecasting
linear regression


QUESTION 7 OF 18
Marked out of 3.00

In which sub-activity of data flow do accessibility, transparency and security play an important role?

Select one:

data collection
data preservation
data storage
data access

QUESTION 8 OF 18
Marked out of 3.00

The correlation between the amount of exercise per week and body weight is -0.45. What can be deduced about the relationship between weekly exercise and body weight?

Select one:

The majority of individuals do only a few hours of exercise per week.
There is no relationship between the amount of exercise and body weight.
The majority of individuals have a low body weight.
Individuals who engage in more weekly exercise tend to have lower body weights.


QUESTION 9 OF 18
Marked out of 3.00

Which type of data format is the following?

{
  "student": {
    "name": "name_of_student",
    "grade": 85,
    "enrolled": true
  }
}

Select one:

Apache Parquet
Protobuf
JSON
XML
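
Study note: the snippet in this question can be explored with Python's standard-library `json` module. The sketch below is only an illustration and is not part of the exam; the variable names `raw`, `record`, and `student` are assumptions made for this example.

```python
import json

# The snippet from the question, copied as a string.
raw = """
{
  "student": {
    "name": "name_of_student",
    "grade": 85,
    "enrolled": true
  }
}
"""

record = json.loads(raw)      # parse the text into nested Python dicts
student = record["student"]   # the inner object keyed by "student"

# Numbers and booleans come back as native Python types.
print(type(student["grade"]).__name__)     # int
print(type(student["enrolled"]).__name__)  # bool
```

The nested braces map to nested dictionaries, so individual fields are reached by chaining keys, for example `record["student"]["name"]`.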

QUESTION 10 OF 18
Marked out of 3.00

What do the parameters {S, A, R, P, and V} represent in a Markov decision process?

Select one:

states, actions, rewards, policy, values
strategy, agent, reinforcement, prediction, verification
simulation, alternatives, returns, planning, visualization
scenarios, alternatives, restrictions, patterns, validation


QUESTION 11 OF 18
Marked out of 3.00

In a normal distribution, 68% of the values are within …

Select one:

four standard deviations.
two standard deviations.
one standard deviation.
three standard deviations.

QUESTION 12 OF 18
Marked out of 3.00

Which of the following do you need to calculate the model's recall?


Select one:

only TN and FN
only TN and FP
only TP and FN
only TP and FP
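
Study note: recall is conventionally defined as the share of actual positive records that the model labels as positive, recall = TP / (TP + FN). The following minimal sketch is not part of the exam; the function name `recall` and the counts used are hypothetical and chosen only for illustration.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): share of actual positives that were found."""
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts, for illustration only.
example = recall(tp=40, fn=10)
print(example)  # 0.8
```

Here 40 of the 50 actual positives were found, so the recall is 0.8.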

QUESTION 13 OF 18
Marked out of 3.00

Which of the following statements is correct for a histogram?

Answer option 1: A smaller dataset is best visualized with a larger number of bins for increased accuracy.
Answer option 2: The number of bins has a direct relationship with the size of each bin.

Select one:

None of the answer options are correct.
Both answer options are correct.
Only answer option 1 is correct.
Only answer option 2 is correct.

QUESTION 14 OF 18
Marked out of 3.00

In the context of support vector machines (SVM), what role do support vectors and the margin play?

Select one:

Support vectors are the outlier data points, and the margin measures the regularization strength in SVM.
Support vectors are the data records closest to the classification line, and the margin is the distance between the two sides.
Support vectors are the features used for classification, and the margin represents the error in the model.
Support vectors are the parameters defining the decision boundary, and the margin indicates the lowest and highest value of the dataset.


QUESTION 15 OF 18
Marked out of 6.00

Briefly explain the three main aspects used to identify a data science use case (DSUC) in a business context.


QUESTION 16 OF 18
Marked out of 6.00

Suppose you have an attribute with values ranging from -1290 to 4870, and you want to normalize these values using decimal scaling. Explain the required steps.
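
Study note: decimal scaling is commonly described as dividing every value by 10^j, where j is the smallest integer for which the largest absolute scaled value is below 1. The sketch below applies that definition to the range given in the question. It is an illustration under that assumption, not the official model answer, and the function names are hypothetical.

```python
def decimal_scaling_exponent(values):
    """Smallest integer j >= 0 such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Keep increasing j while the largest magnitude would still scale to >= 1.
    while max_abs >= 10 ** j:
        j += 1
    return j


def decimal_scale(values):
    """Return the scaled values together with the exponent j that was used."""
    j = decimal_scaling_exponent(values)
    return [v / 10 ** j for v in values], j


# Endpoints of the attribute range from the question.
scaled, j = decimal_scale([-1290, 4870])
print(j)       # 4
print(scaled)  # [-0.129, 0.487]
```

The largest absolute value is 4870, so j = 4 and every value is divided by 10,000, which maps the range to -0.129 through 0.487.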


QUESTION 17 OF 18
Marked out of 18.00

Sophia and Daniel are working on a data science project for an e-commerce company aiming to improve customer satisfaction and sales. The project involves analyzing customer data to identify patterns, preferences, and potential areas for enhancement in the online shopping experience. The dataset includes various customer-related variables such as purchase history, browsing behavior, demographic information, and customer feedback.

However, Sophia and Daniel are facing challenges related to data quality in this setting. One of the customer variables in the dataset is "time spent on website", which represents the amount of time a customer spends browsing the e-commerce platform during a session. Sophia and Daniel notice that there are missing values and outliers in this variable, potentially affecting their analysis of customer engagement.

a) Briefly explain what is meant by duplicates and outliers. Illustrate your answer by referring to the dataset described above.
b) Name four methods that are commonly utilized to resolve missing values.
c) Evaluate these methods in terms of their advantages and disadvantages for handling missing values related to the variable "time spent on website."



QUESTION 18 OF 18
Marked out of 18.00

Emily and David are environmental researchers working on a project to analyze various types of environmental data for a conservation organization. They are tasked with categorizing different environmental datasets into structured, unstructured, and semi-structured data. However, they have different interpretations of these categories. Help them gain a clear understanding.

Categorize the following environmental data into the three categories: structured, unstructured, and semi-structured data. Briefly explain why you assigned each item to its category.

(1) Satellite imagery capturing deforestation patterns
(2) Research papers discussing climate change trends
(3) Excel spreadsheet containing bird species observations
(4) Field notes detailing observations of animal behavior
(5) Monthly rainfall log
(6) Drones capturing real-time footage of wildlife habitats