R08 Big Data Projects - Answers
A) feature selection.
B) feature engineering.
C) feature design.
Explanation
Data exploration encompasses exploratory data analysis, feature selection, and feature engineering.
A) variety.
B) velocity.
C) veracity.
Explanation
Big data is defined as data with high volume, velocity, and variety. Big data often suffers from low
veracity, because it can contain a high percentage of meaningless data.
A government decides it will privatize vehicle registrations if the province's auto insurance companies can
record and maintain ownership titles using distributed ledger technology. This application of distributed
ledger technology is best characterized as:
A) tokenization.
B) blockchain.
C) smart contracts.
Explanation
Tokenization refers to maintaining ownership records for physical assets on a distributed ledger. This
might, but would not necessarily, use a blockchain, which is a subcategory of distributed ledgers. Smart
contracts are computerized agreements designed to automatically carry out certain actions if defined
conditions are met.
An executive describes her company's "low latency, multiple terabyte" requirements for managing Big
Data. To which characteristics of Big Data is the executive referring?
Explanation
Big Data may be characterized by its volume (the amount of data available), velocity (the speed at which
data are communicated), and variety (degrees of structure in which data exist). "Terabyte" is a measure
of volume. "Latency" refers to velocity.
Explanation
Underfitting describes a machine learning model that is not complex enough to describe the data it is
meant to analyze. An underfit model treats true parameters as noise and fails to identify the actual
patterns and relationships. A model that is overfit (too complex) will tend to identify spurious
relationships in the data. Labelling of input data is related to the use of supervised or unsupervised
machine learning techniques.
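The contrast between underfit and overfit models can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are synthetic (a line y = 2x with alternating +1/-1 noise), and the three models (a constant, a least-squares line, and a nearest-point lookup that memorizes the training set) are stand-ins chosen for simplicity rather than methods named in the reading.

```python
def mse(model, xs, ys):
    """Mean squared error of model predictions against observed values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic training data: true relationship y = 2x plus alternating +1/-1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Out-of-sample points drawn from the noise-free relationship.
test_x = [i + 0.5 for i in range(9)]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    # Too simple: ignores x entirely, so it misses the real trend (underfit).
    return mean_y

# Ordinary least-squares line fitted to the training data.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def linear_model(x):
    return intercept + slope * x

def memorizing_model(x):
    # Too complex: reproduces the nearest training point, noise included (overfit).
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("constant", constant_model),
                    ("linear", linear_model),
                    ("memorizing", memorizing_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

In this setup the constant model has a large error both in and out of sample, while the memorizing model has zero training error but a worse out-of-sample error than the fitted line, because it has treated the noise as signal.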
A) A data technician accesses an offsite archive to retrieve data that has been stored there.
B) An investor creates a word cloud from financial analysts’ recent research reports about a
company.
C) An analyst adjusts daily stock index data from two countries for their different market
holidays.
Explanation
Curation is ensuring the quality of data, for example by adjusting for bad or missing data. Word clouds
are a visualization technique. Moving data from a storage medium to where they are needed is referred
to as transfer.
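As a minimal sketch of the curation step in choice C, one possible adjustment is to keep only the dates on which both markets traded. The index names, dates, and closing levels below are hypothetical values made up for illustration, and dropping non-common dates is only one of several reasonable treatments (carrying the last close forward is another).

```python
# Hypothetical closing levels keyed by ISO date; values are invented for illustration.
index_a = {
    "2024-01-02": 100.0,
    "2024-01-03": 101.5,
    "2024-01-04": 102.0,
    "2024-01-05": 101.0,
}
# Market B is assumed closed on 2024-01-03 (a hypothetical local holiday).
index_b = {
    "2024-01-02": 50.0,
    "2024-01-04": 50.5,
    "2024-01-05": 51.0,
}

def align_on_common_dates(a, b):
    """Return (date, level_a, level_b) rows for dates present in both series."""
    common = sorted(set(a) & set(b))
    return [(d, a[d], b[d]) for d in common]

aligned = align_on_common_dates(index_a, index_b)
for row in aligned:
    print(row)
```

After alignment, the two series have the same length and the same dates, so day-over-day comparisons are not distorted by a day on which only one market was open.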
When evaluating the fit of a machine learning algorithm, it is most accurate to state that:
A) accuracy is the ratio of correctly predicted positive classes to all predicted positive classes.
B) recall is the ratio of correctly predicted positive classes to all actual positive classes.
Explanation
Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual positive
classes. Precision is the ratio of correctly predicted positive classes to all predicted positive classes.
Accuracy is the percentage of correctly predicted classes out of total predictions.
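The three definitions above can be checked with a short sketch that computes each ratio from confusion-matrix counts. The function name and the counts (40 true positives, 10 false positives, 45 true negatives, 5 false negatives) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # correct positives / all predicted positives
    recall = tp / (tp + fn)                     # correct positives / all actual positives
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct predictions / all predictions
    return precision, recall, accuracy

# Hypothetical counts for illustration only.
precision, recall, accuracy = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(precision, recall, accuracy)
```

With these counts, precision is 40/50, recall is 40/45, and accuracy is 85/100, which shows that the three measures can differ on the same set of predictions.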