
R08 Big Data Projects - Answers

This document contains seven multiple-choice questions on big data concepts and machine learning. Question 1 asks what data exploration is least likely to encompass; per the explanation, it covers exploratory data analysis, feature selection, and feature engineering, but not feature design. Question 2 asks which characteristic big data is most likely to lack; the answer is veracity, because big data can contain a high percentage of meaningless data. Question 3 asks how to characterize the use of distributed ledger technology to record vehicle ownership titles; the explanation identifies this as tokenization, which may, but need not, use a blockchain.

Uploaded by

Shashwat Desai
Copyright
© © All Rights Reserved

Question #1 of 7 Question ID: 1208623

In big data projects, data exploration is least likely to encompass:

A) feature selection.

B) feature engineering.

C) feature design.

Explanation

Data exploration encompasses exploratory data analysis, feature selection, and feature engineering.

(Study Session 3, Module 8.2, LOS 8.c)

Question #2 of 7 Question ID: 1208622

Big data is most likely to suffer from low:

A) variety.

B) velocity.

C) veracity.

Explanation

Big data is defined as data with high volume, velocity, and variety. Big data often suffers from low
veracity, because it can contain a high percentage of meaningless data.

(Study Session 3, Module 8.1, LOS 8.a)

Question #3 of 7 Question ID: 1208621

A government decides it will privatize vehicle registrations if the province's auto insurance companies can
record and maintain ownership titles using distributed ledger technology. This application of distributed
ledger technology is best characterized as:

A) tokenization.

B) blockchain.

C) smart contracts.

Explanation

Tokenization refers to maintaining ownership records for physical assets on a distributed ledger. This
might, but would not necessarily, use a blockchain, which is a subcategory of distributed ledgers. Smart
contracts are computerized agreements designed to automatically carry out certain actions if defined
conditions are met.

(Study Session 3, Module 8.1, LOS 8.e)
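
The distinction drawn in the explanation above can be made concrete with a small sketch. The code below records vehicle ownership titles (the tokenization part) in an append-only list in which each record stores a hash of the previous record (a blockchain-style structure). All names here (`append_title`, `is_consistent`, `current_owner`, the `VIN-...` identifiers, and the owners) are hypothetical, and the sketch keeps a single local copy, so it does not model the distribution of the ledger across participants or any consensus mechanism.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record deterministically (keys sorted before serializing)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_title(ledger, vehicle_id, owner):
    """Append an ownership record that points at the previous record's hash."""
    prev_hash = record_hash(ledger[-1]) if ledger else "0" * 64
    ledger.append({"vehicle_id": vehicle_id, "owner": owner, "prev_hash": prev_hash})


def is_consistent(ledger):
    """Check that every record still matches the hash stored by its successor."""
    return all(
        ledger[i]["prev_hash"] == record_hash(ledger[i - 1])
        for i in range(1, len(ledger))
    )


def current_owner(ledger, vehicle_id):
    """Return the owner named in the most recent record for this vehicle."""
    for record in reversed(ledger):
        if record["vehicle_id"] == vehicle_id:
            return record["owner"]
    return None


# Hypothetical titles: VIN-001 is registered to Alice, then transferred to Carol.
ledger = []
append_title(ledger, "VIN-001", "Alice")
append_title(ledger, "VIN-002", "Bob")
append_title(ledger, "VIN-001", "Carol")
```

Because each record embeds the hash of its predecessor, editing an earlier record breaks the link stored in the record that follows it, which `is_consistent` detects. The ownership records themselves would still count as tokenization if they were kept on a distributed ledger that is not chained this way.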


Question #4 of 7 Question ID: 1208619

An executive describes her company's "low latency, multiple terabyte" requirements for managing Big
Data. To which characteristics of Big Data is the executive referring?

A) Volume and velocity.

B) Volume and variety.

C) Velocity and variety.

Explanation

Big Data may be characterized by its volume (the amount of data available), velocity (the speed at which
data are communicated), and variety (degrees of structure in which data exist). "Terabyte" is a measure
of volume. "Latency" refers to velocity.

(Study Session 3, Module 8.1, LOS 8.a)

Question #5 of 7 Question ID: 1208624

Under which of these conditions is a machine learning model said to be underfit?

A) The model identifies spurious relationships.

B) The input data are not labelled.

C) The model treats true parameters as noise.

Explanation

Underfitting describes a machine learning model that is not complex enough to describe the data it is
meant to analyze. An underfit model treats true parameters as noise and fails to identify the actual
patterns and relationships. A model that is overfit (too complex) will tend to identify spurious
relationships in the data. Labelling of input data is related to the use of supervised or unsupervised
machine learning techniques.

(Study Session 3, Module 8.3, LOS 8.d)
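
As a rough illustration of the contrast described above, the sketch below fits three models to synthetic data generated from a straight line plus noise. The data, the seed, and the three model choices are assumptions made for this example and are not part of the source material: a constant-mean model stands in for an underfit model (it ignores the true slope), an ordinary least squares line matches the true structure, and a nearest-neighbour "memorizer" stands in for an overfit model (it reproduces the noise in the training data).

```python
import random


def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Too simple: always predicts the training mean, ignoring x entirely."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares line, matching the true data-generating form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Too complex: returns the y of the nearest training x, noise included."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Synthetic data (assumption): y = 2x + Gaussian noise with unit variance.
rng = random.Random(0)
xs = [i / 40 for i in range(400)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

# Alternate points go to the training set and the test set.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

models = {"mean": fit_mean, "line": fit_line, "memorizer": fit_memorizer}
train_mse, test_mse = {}, {}
for name, fit in models.items():
    model = fit(train_x, train_y)
    train_mse[name] = mse(model, train_x, train_y)
    test_mse[name] = mse(model, test_x, test_y)
    print(f"{name:10s} train MSE {train_mse[name]:.2f}  test MSE {test_mse[name]:.2f}")
```

The underfit model should show a large error on both sets, while the memorizer shows zero training error but a worse test error than the line, which is the usual signature of overfitting.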

Question #6 of 7 Question ID: 1208620

Which of the following uses of data is most accurately described as curation?

A) A data technician accesses an offsite archive to retrieve data that has been stored there.

B) An investor creates a word cloud from financial analysts’ recent research reports about a
company.

C) An analyst adjusts daily stock index data from two countries for their different market
holidays.

Explanation

Curation is ensuring the quality of data, for example by adjusting for bad or missing data. Word clouds
are a visualization technique. Moving data from a storage medium to where they are needed is referred
to as transfer.

(Study Session 3, Module 8.1, LOS 8.a)
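
The holiday adjustment in choice C can be sketched as follows. This is one possible approach, assumed for illustration: keep only the dates on which both markets traded, so that the two series line up day for day. The index levels, dates, and the function name `align_on_common_days` are hypothetical; an analyst might instead carry the last available value forward across a holiday.

```python
from datetime import date

# Hypothetical closing levels. Market A is closed on 4 July,
# market B is closed on 8 July.
index_a = {
    date(2024, 7, 3): 100.0,
    date(2024, 7, 5): 101.5,
    date(2024, 7, 8): 102.0,
}
index_b = {
    date(2024, 7, 3): 50.0,
    date(2024, 7, 4): 50.5,
    date(2024, 7, 5): 50.2,
}


def align_on_common_days(series_a, series_b):
    """Return (date, level_a, level_b) rows for dates present in both series."""
    common_days = sorted(series_a.keys() & series_b.keys())
    return [(day, series_a[day], series_b[day]) for day in common_days]


aligned = align_on_common_days(index_a, index_b)
for day, level_a, level_b in aligned:
    print(day.isoformat(), level_a, level_b)
```

Only 3 July and 5 July survive the alignment, because each of the other dates is missing from one of the two series.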

Question #7 of 7 Question ID: 1208625

When evaluating the fit of a machine learning algorithm, it is most accurate to state that:

A) accuracy is the ratio of correctly predicted positive classes to all predicted positive classes.

B) recall is the ratio of correctly predicted positive classes to all actual positive classes.

C) precision is the percentage of correctly predicted classes out of total predictions.

Explanation

Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual positive
classes. Precision is the ratio of correctly predicted positive classes to all predicted positive classes.
Accuracy is the percentage of correctly predicted classes out of total predictions.

(Study Session 3, Module 8.3, LOS 8.g)
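
The three definitions above can be checked with a short calculation. The sketch below counts true positives, false positives, and false negatives for a binary classifier and applies the definitions directly; the function name `classification_metrics` and the label lists are hypothetical and chosen only so that the three measures come out different.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return (accuracy, precision, recall) for two equal-length label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Accuracy: correctly predicted classes out of total predictions.
    accuracy = (tp + tn) / len(pairs)
    # Precision: correctly predicted positives out of all predicted positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correctly predicted positives out of all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 4 actual positives, 5 predicted positives, 3 of them correct.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(actual, predicted)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.70 precision=0.60 recall=0.75
```

With these labels, 7 of 10 predictions are correct (accuracy), 3 of the 5 predicted positives are correct (precision), and 3 of the 4 actual positives are found (recall).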
