Big Data Modeling and Management Systems Final

Copyright
© All Rights Reserved

Question 1 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following is a key feature of CSV data?

*A: Columns are separated by commas

Feedback: Correct! CSV stands for Comma-Separated Values, indicating that columns are separated by
commas.

B: Data is stored in a binary format

Feedback: Incorrect. CSV data is stored in a plain text format, not in a binary format.

C: Rows are separated by semicolons

Feedback: Incorrect. In CSV files, rows are typically separated by newline characters.

D: Columns can contain nested tables

Feedback: Incorrect. CSV columns contain simple text values and cannot contain nested tables.
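The CSV properties named in this question can be checked with a minimal sketch using Python's standard csv module. The sample text and its column names (station, temp_c) are made up for illustration and are not part of the course data.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
sample = "station,temp_c\nA1,21.5\nB2,19.0\n"

# csv.reader splits each newline-terminated row on commas.
rows = list(csv.reader(io.StringIO(sample)))

header, records = rows[0], rows[1:]
print(header)
print(records)
```

Every parsed value comes back as a plain string, which matches the point that CSV columns hold simple text values; converting "21.5" to a number is left to the reader of the file.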

Question 2 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which tool can be used to import CSV data into a spreadsheet and plot values?

*A: Microsoft Excel

Feedback: Correct! Microsoft Excel is commonly used to import CSV data and plot values.

B: Adobe Photoshop

Feedback: Incorrect. Adobe Photoshop is an image editing tool, not a spreadsheet application.

C: VLC Media Player

Feedback: Incorrect. VLC Media Player is a media player, not a spreadsheet application.

D: Microsoft Word

Feedback: Incorrect. Microsoft Word is a word processing application, not a spreadsheet tool.

Question 3 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

What distinguishes structured data from unstructured data?

*A: Structured data follows a predefined model, while unstructured data does not

Feedback: Correct! Structured data follows a predefined schema, whereas unstructured data does not.

B: Structured data includes multimedia files, while unstructured data includes only text

Feedback: Incorrect. Structured data typically includes text and numbers, while unstructured data may
include multimedia files.

C: Structured data cannot be easily searched, while unstructured data can

Feedback: Incorrect. Structured data can be easily searched due to its organized format.

D: Structured data lacks any organization, while unstructured data is highly organized

Feedback: Incorrect. Structured data is highly organized, whereas unstructured data lacks a defined
structure.

Question 4 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following is a characteristic of structured data?

*A: Organized into rows and columns

Feedback: Correct! Structured data is organized into rows and columns, making it easy to search and
analyze.

B: Cannot be easily searched

Feedback: Incorrect. Structured data can be easily searched because it is organized in a systematic way.

C: Lacks a predefined model

Feedback: Incorrect. Structured data follows a predefined model or schema.

D: Includes images and videos

Feedback: Incorrect. Structured data typically includes text and numbers, not multimedia files.

Question 5 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

What is the main purpose of a foreign key in a relational database?

*A: To establish a link between tables

Feedback: Correct! A foreign key is used to establish a link between two tables in a relational database.

B: To store large binary data

Feedback: Incorrect. Large binary data is not stored using foreign keys.

C: To enforce data integrity within a table

Feedback: Incorrect. While foreign keys help in maintaining data integrity, their main purpose is to link
tables.

D: To define the structure of a table

Feedback: Incorrect. Defining the structure of a table is not the purpose of a foreign key.
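A small sketch of how a foreign key links two tables, using Python's built-in sqlite3 module with an in-memory database. The station and reading tables and their columns are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE reading ("
    " id INTEGER PRIMARY KEY,"
    " station_id INTEGER NOT NULL REFERENCES station(id),"
    " temp_c REAL)"
)
conn.execute("INSERT INTO station (id, name) VALUES (1, 'Harbor')")
conn.execute("INSERT INTO reading (station_id, temp_c) VALUES (1, 21.5)")

# The foreign key lets a join follow the link from reading to station.
linked = conn.execute(
    "SELECT station.name, reading.temp_c FROM reading"
    " JOIN station ON station.id = reading.station_id"
).fetchall()

# A reading that points at a station that does not exist is rejected.
try:
    conn.execute("INSERT INTO reading (station_id, temp_c) VALUES (99, 18.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The rejected insert shows the integrity side effect mentioned in option C, while the join shows the linking role that the question treats as the main purpose.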

Question 6 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Which of the following are characteristics of semi-structured data?

*A: Schema-less

Feedback: Correct! Semi-structured data can exist without a fixed schema.

*B: Highly flexible

Feedback: Correct! Semi-structured data is known for its flexibility in storing various types of data.

C: Rigid structure

Feedback: Incorrect. Semi-structured data does not have a rigid structure.

D: Uses tables exclusively

Feedback: Incorrect. Semi-structured data does not exclusively use tables.

Question 7 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following is a structural component of a relational data model?

*A: Tables

Feedback: Correct! Tables are indeed a structural component of a relational data model.

B: Nodes

Feedback: Incorrect. Nodes are not a structural component of a relational data model; they are more
associated with graph databases.

C: Edges

Feedback: Incorrect. Edges are more related to graph databases, not relational data models.

D: Documents

Feedback: Incorrect. Documents are typically associated with document-oriented databases, not
relational models.

Question 8 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which Python command is used to display an image using a local system image viewer?

*A: os.system('display image.jpg')

Feedback: That's right! With the os module imported, os.system('display image.jpg') runs the external
display command (part of ImageMagick on many Linux systems), which opens the image in a viewer window.

B: plt.show('image.jpg')

Feedback: Incorrect. plt.show() is used to display plots created with Matplotlib, not images directly from
the system.

C: cv2.imshow('image.jpg')

Feedback: Not quite. cv2.imshow() is used to display images in a window using OpenCV, not the local
system image viewer.

D: image.show('image.jpg')

Feedback: Incorrect. show() is a method of an already opened image object in PIL; calling it with a file
name does not load image.jpg from disk.

Question 9 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following best describes the concept of a data model in the context of big data?

*A: A representation of the structure and relationships in a dataset

Feedback: Correct! A data model represents the structure and relationships within a dataset, which is
fundamental to understanding and manipulating the data effectively.

B: A set of commands for querying a database

Feedback: Not quite. While querying commands interact with data models, they do not represent the
structure and relationships within a dataset.

C: A graphical interface for managing databases

Feedback: Incorrect. A graphical interface might help manage databases, but it does not describe the
structure and relationships within a dataset.

D: A method for encrypting data

Feedback: No, encryption methods are used to secure data, not to describe its structure and relationships.

Question 10 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

When importing CSV data into a spreadsheet, what is the first step you should take?

*A: Select the 'Import' option in the spreadsheet software

Feedback: Correct! Selecting the 'Import' option is the first step in importing CSV data into a
spreadsheet.

B: Manually enter the data into the spreadsheet

Feedback: Incorrect. Manually entering data is time-consuming and prone to errors.

C: Save the CSV file with a .txt extension

Feedback: Incorrect. Changing the file extension does not help in importing the data into a spreadsheet.

D: Open the CSV file using a text editor

Feedback: Incorrect. Opening the CSV file with a text editor does not import the data into a spreadsheet.

Question 11 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Which of the following are true about the structure of semi-structured data?

*A: It often uses a tree structure.

Feedback: Correct! Semi-structured data often uses a tree structure to represent data.

*B: It lacks a fixed schema.

Feedback: Correct! Semi-structured data does not have a fixed schema, allowing for more flexibility.

C: It is always stored in relational databases.

Feedback: Incorrect. Semi-structured data is typically stored in formats like XML or JSON, not in
traditional relational databases.

D: It cannot contain nested data.

Feedback: Incorrect. Semi-structured data can contain nested data, which is one of its advantages over
structured data.
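The tree structure and nesting described above can be sketched with a small JSON document. The record below is hypothetical; note that the second reading carries a nested field the first one lacks, which a fixed schema would not allow.

```python
import json

# Hypothetical record; the two readings do not share the same fields.
doc = json.loads("""
{
  "station": "A1",
  "readings": [
    {"temp_c": 21.5},
    {"temp_c": 19.0, "wind": {"speed_kmh": 12}}
  ]
}
""")

def depth(node):
    """Number of levels in the tree rooted at node (a scalar leaf counts as 1)."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return 1
    return 1 + max((depth(child) for child in children), default=0)

tree_depth = depth(doc)
print(tree_depth)
```

Here objects and arrays act as inner nodes and scalar values as leaves; this is one convenient way to count levels, not the only one.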

Question 12 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which component of a relational data model enforces entity integrity?

*A: Primary key

Feedback: Correct! The primary key ensures that each record is unique and enforces entity integrity.

B: Foreign key

Feedback: Incorrect. A foreign key is used to establish a relationship between tables, but it does not
enforce entity integrity.

C: Index

Feedback: Incorrect. An index is used to speed up data retrieval, but it does not enforce entity integrity.

D: Constraint

Feedback: Incorrect. Constraints enforce rules on data in general, but only the primary key enforces
entity integrity.
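Entity integrity can be observed directly with Python's sqlite3 module. This is a sketch with a hypothetical station table; the explicit NOT NULL is included because SQLite, unlike most relational systems, otherwise tolerates NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (code TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO station VALUES ('A1', 'Harbor')")

def try_insert(code, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO station VALUES (?, ?)", (code, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert("A1", "Airport")  # same key as an existing row
missing_ok = try_insert(None, "Nowhere")    # no key at all
fresh_ok = try_insert("B2", "Airport")      # new, non-null key
```

Only the row with a new, non-null key is accepted, so every stored record stays uniquely identifiable.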

Question 13 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which command in Python is used to install necessary dependencies within a virtual environment?

*A: pip install -r requirements.txt

Feedback: Correct! The command 'pip install -r requirements.txt' is used to install necessary
dependencies within a virtual environment.

B: python install dependencies

Feedback: Incorrect. 'python install dependencies' is not a valid command to install dependencies.

C: venv install packages

Feedback: Incorrect. 'venv install packages' is not a valid command in Python.

D: pip install packages

Feedback: Incorrect. 'pip install packages' would try to install a package literally named 'packages'; it does not read the list of dependencies from a requirements file.

Question 14 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Which of the following steps are necessary to extract data from a JSON file?

*A: Import the json module

Feedback: Correct! You need to import the json module to work with JSON data in Python.

*B: Open the JSON file using Python's open() function

Feedback: Correct! You need to open the JSON file using Python's open() function.

*C: Use the json.load() method to parse the JSON data

Feedback: Correct! The json.load() method is used to parse the JSON data.

D: Copy and paste the JSON data into a text editor

Feedback: Incorrect. Copying and pasting the data into a text editor will not allow you to
programmatically access it.
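The three correct steps can be put together in a short sketch. The file name station.json and its contents are invented here so the example runs on its own; a real exercise would open an existing file instead.

```python
import json
import os
import tempfile

# Write a small hypothetical JSON file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "station.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"station": "A1", "readings": [21.5, 19.0]}')

# Steps from the question: import json, open() the file, parse with json.load().
with open(path, encoding="utf-8") as f:
    data = json.load(f)

# The parsed result is ordinary Python data (dicts, lists, numbers, strings).
first_reading = data["readings"][0]
print(data["station"], first_reading)
```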

Question 15 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Select all the key features of CSV data.

*A: Columns are separated by commas

Feedback: Correct! Columns in CSV files are separated by commas.

*B: Rows are separated by newline characters

Feedback: Correct! Rows in CSV files are separated by newline characters.

*C: Data is stored in a plain text format

Feedback: Correct! CSV files store data in a plain text format.

D: Supports complex data types

Feedback: Incorrect. CSV files do not support complex data types; they store simple text values.

E: Allows for data encryption

Feedback: Incorrect. CSV files do not support data encryption.

Question 16 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Which of the following are common operations in data models?

*A: Subsetting

Feedback: Correct! Subsetting is a common operation in data models, allowing for the selection of
specific data subsets.

*B: Projection

Feedback: Correct! Projection is used to retrieve specific columns from a dataset.

*C: Union

Feedback: Correct! Union operation combines the results of two or more queries.

D: Formatting

Feedback: Incorrect. Formatting is not a common operation in data models.

E: Transcoding

Feedback: Incorrect. Transcoding is related to converting data formats, not a common data model
operation.
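The three operations marked correct above can be sketched on plain Python lists of dictionaries. The rows and field names are hypothetical, and a database would express the same ideas in its own query language.

```python
# Two hypothetical collections with the same columns.
rows_a = [
    {"station": "A1", "temp_c": 21.5, "humidity": 40},
    {"station": "B2", "temp_c": 19.0, "humidity": 55},
]
rows_b = [
    {"station": "A1", "temp_c": 21.5, "humidity": 40},  # duplicate of a row in rows_a
    {"station": "C3", "temp_c": 25.0, "humidity": 30},
]

# Subsetting: keep only the rows that satisfy a condition.
warm = [r for r in rows_a if r["temp_c"] > 20]

# Projection: keep only some columns of every row.
temps = [{"station": r["station"], "temp_c": r["temp_c"]} for r in rows_a]

# Union: combine both collections, dropping exact duplicate rows.
seen, union = set(), []
for r in rows_a + rows_b:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        union.append(r)
```

Subsetting changes which rows remain, projection changes which columns remain, and union changes neither the columns nor the individual rows, only the set of rows.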

Question 17 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Select all the operations that can be performed on data models.

*A: Subsetting

Feedback: Correct! Subsetting is a common operation that involves selecting a specific portion of the
data.

*B: Projection

Feedback: Correct! Projection involves creating a new dataset with only certain attributes from the
original data.

C: Formatting

Feedback: Incorrect. Formatting is not considered a common operation on data models.

*D: Union

Feedback: Correct! Union is an operation that combines two datasets into one.

*E: Join

Feedback: Correct! Joining is an operation that merges datasets based on a common attribute.

F: Filtering

Feedback: Incorrect. While similar to subsetting, filtering is not a term typically used to describe
operations on data models.

Question 18 - numeric, easy difficulty

Question category: Module: Big Data Modeling

How many key principles are there in data models, according to the lesson?

*A: 3.0

Feedback: Correct! There are three key principles in data models discussed in this lesson.

Default Feedback: Incorrect. Review the key principles of data models discussed in this lesson.

Question 19 - text match, easy difficulty

Question category: Module: Big Data Modeling

What character is commonly used to separate columns in a CSV file? Please answer in all lowercase.

*A: comma

Feedback: Correct! A comma is used to separate columns in a CSV file.

Default Feedback: Incorrect. Refer to the course materials on CSV file formatting.

Question 20 - numeric, easy difficulty

Question category: Module: Big Data Modeling

If a tree structure in a semi-structured data model has 5 levels, what is the minimum number of nodes in
this tree?

*A: 6.0

Feedback: Correct! Counting the root as level 0 with 5 levels below it, the smallest such tree is a single chain of 6 nodes, including the root node.

Default Feedback: Incorrect. Please review the properties of tree structures in semi-structured data
models.

Question 21 - text match, easy difficulty

Question category: Module: Big Data Modeling

What type of data includes text and numbers in a tabular format? Please answer in all lowercase.

*A: structured

Feedback: Correct! Structured data includes text and numbers organized in a tabular format.

Default Feedback: Incorrect. Think about the data that is organized in rows and columns.

Question 22 - text match, easy difficulty

Question category: Module: Big Data Modeling

What Python library is commonly used to create plots of weather station data? Please answer in all
lowercase.

*A: matplotlib

Feedback: Correct! Matplotlib is widely used for creating plots and visualizations in Python.

*B: seaborn

Feedback: Correct! Seaborn, which is based on Matplotlib, is commonly used for creating advanced
visualizations.

Default Feedback: Incorrect. Review the Python libraries used for data visualization in the course
material.

Question 23 - text match, easy difficulty

Question category: Module: Big Data Modeling

What file extension is commonly used for CSV files? Please answer in all lowercase.

*A: csv

Feedback: Correct! The .csv extension is commonly used for CSV files.

Default Feedback: Incorrect. Review the common file extensions for CSV files.

Question 24 - text match, easy difficulty

Question category: Module: Big Data Modeling

What is the term used for a unique identifier for a record in a relational database? Please answer in all
lowercase.

*A: primarykey

Feedback: Correct! A primary key uniquely identifies a record in a relational database.

*B: primary

Feedback: Correct! A primary key uniquely identifies a record in a relational database.

*C: key

Feedback: Correct! A primary key uniquely identifies a record in a relational database.

Default Feedback: Incorrect. Please refer to the course material on relational databases and unique
identifiers.

Question 25 - text match, easy difficulty

Question category: Module: Big Data Modeling

What is the term for the rule that specifies that certain values must be unique within a dataset? Please
answer in all lowercase.

*A: uniqueconstraint

Feedback: Correct! A unique constraint ensures that all values in a column are unique, preventing
duplicates.

*B: uniquerule

Feedback: Correct! Unique rule is another term for unique constraint, ensuring no duplicate values.

Default Feedback: Incorrect. Review the types of constraints in data models and try again.

Question 26 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following is a key feature of CSV data?

*A: Comma-separated values

Feedback: Correct! CSV stands for Comma-Separated Values, which is a key feature of this data format.

B: Special characters in data

Feedback: Incorrect. CSV files typically use plain text and avoid special characters to maintain
simplicity.

C: Binary data encoding

Feedback: Incorrect. CSV files use text-based encoding, not binary.

D: Hierarchical data storage

Feedback: Incorrect. CSV files store data in a flat, tabular format, not in a hierarchical structure.

Question 27 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which Python library is commonly used to create plots of weather station data?

*A: Matplotlib

Feedback: Correct! Matplotlib is widely used for creating static, interactive, and animated visualizations
in Python.

B: NumPy

Feedback: Not quite. NumPy is primarily used for numerical operations, not for creating plots.

C: Pandas

Feedback: Incorrect. While Pandas is great for data manipulation and analysis, it is not primarily used
for creating plots.

D: SciPy

Feedback: Incorrect. SciPy is used for scientific and technical computing, but not typically for creating
plots.

Question 28 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following best describes the purpose of primary keys in a relational data model?

*A: To uniquely identify each record in a table

Feedback: Correct! Primary keys are used to uniquely identify each record in a table.

B: To link two tables together

Feedback: Incorrect. Linking two tables together is the purpose of foreign keys, not primary keys.

C: To store data in a hierarchical structure

Feedback: Incorrect. A hierarchical structure is not a feature of relational data models.

D: To ensure all data is in numerical format

Feedback: Incorrect. Primary keys do not ensure data is in numerical format; they simply provide a
unique identifier for each record.

Question 29 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

What is a key characteristic of structured data?

*A: It is organized in a predefined manner

Feedback: Correct! Structured data is highly organized and follows a predefined model or schema.

B: It lacks a fixed schema or structure

Feedback: Incorrect. Structured data follows a specific schema or structure.

C: It cannot be organized into rows and columns

Feedback: Incorrect. Structured data is often organized into rows and columns, such as in databases.

D: It consists mostly of multimedia content

Feedback: Incorrect. Structured data typically includes text and numbers, not multimedia content.

Question 30 - text match, easy difficulty

Question category: Module: Big Data Modeling

What type of data includes text and numbers and follows a predefined model or schema? Please answer
in all lowercase.

*A: structured

Feedback: Correct! Structured data follows a predefined schema and includes text and numbers.

Default Feedback: Incorrect. Make sure you are thinking about data that is highly organized and follows
a specific structure.

Question 31 - text match, easy difficulty

Question category: Module: Big Data Modeling

What type of chart is often used to plot time series data? Please answer in all lowercase.

*A: line

Feedback: Correct! Line charts are commonly used to plot time series data due to their ability to show
trends over time.

B: bar

Feedback: Incorrect. Bar charts are typically used for comparing quantities, not plotting time series.

C: scatter

Feedback: Incorrect. Scatter plots are used to show the relationship between two variables, not
specifically for time series.

Default Feedback: Incorrect. Consider the type of chart that best represents changes over time.

Question 32 - text match, easy difficulty

Question category: Module: Big Data Modeling

What Python library would you use to create plots of weather station data? Please answer in all
lowercase.

*A: matplotlib

Feedback: Correct! Matplotlib is widely used for creating static, interactive, and animated visualizations
in Python.

*B: seaborn

Feedback: Seaborn is a powerful visualization library built on top of Matplotlib. Though it can be used
to create plots, the question asks for a more general library.

*C: pyplot

Feedback: Pyplot is a module of Matplotlib that provides MATLAB-like plotting framework. It's
commonly used for creating plots.

Default Feedback: Consider libraries that are specifically designed for data visualization in Python.

Question 33 - text match, easy difficulty

Question category: Module: Big Data Modeling

What is the term for data that does not have a predefined format or structure? Please answer in all
lowercase.

*A: unstructured

Feedback: Correct! Unstructured data lacks a predefined format or organization.

Default Feedback: Try again. Consider the type of data that doesn't adhere to a specific schema.

Question 34 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling

Which of the following characteristics are true for semi-structured data?

*A: It has a flexible schema.

Feedback: Correct! Semi-structured data often has a flexible schema, unlike structured data.

B: It is organized into tables and rows.

Feedback: This is characteristic of structured data, not semi-structured data.

*C: It often uses tags or markers to separate data elements.

Feedback: Correct! Semi-structured data often uses tags or markers, like in XML or JSON formats.

D: It cannot be stored in a relational database.

Feedback: This statement is misleading. While semi-structured data is not inherently suited to relational
databases, it can often be stored within them using techniques like JSON storage.

E: It requires a fixed schema before any data can be added.

Feedback: This is a characteristic of structured data, not semi-structured data.

Question 35 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

Which of the following is a key feature of a CSV file?

*A: Columns are separated by commas

Feedback: Correct! CSV stands for Comma-Separated Values, where each column is separated by a
comma.

B: Columns are separated by semicolons

Feedback: Incorrect. CSV files use commas, not semicolons, to separate columns.

C: Data is stored in a binary format

Feedback: Incorrect. CSV files store data in plain text, not binary format.

D: Each row must have a unique identifier

Feedback: Incorrect. Rows in a CSV file do not require unique identifiers.

Question 36 - multiple choice, shuffle, medium difficulty

Question category: Module: Big Data Modeling

What is the primary purpose of using a foreign key in a relational database?

A: To uniquely identify each record within its own table.

Feedback: This describes the purpose of a primary key, not a foreign key. Review the roles of primary
and foreign keys.

*B: To establish a link between two tables.

Feedback: Correct! A foreign key is used to establish a relationship between two tables.

C: To ensure that data is only entered once in the database.

Feedback: This is related to data normalization, not specifically the function of a foreign key.

D: To encrypt sensitive data in the database.

Feedback: Encryption is a separate concern and not directly related to the function of foreign keys.

Question 37 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling

What characteristic distinguishes structured data from unstructured data?

*A: Structured data is organized into predefined models or schemas.

Feedback: Correct! Structured data is organized and follows a specific model, making it easier to
analyze.

B: Structured data lacks any fixed schema or structure.

Feedback: Incorrect. Structured data is known for its predefined structure, unlike unstructured data
which lacks it.

C: Structured data cannot be stored in databases.

Feedback: Incorrect. Structured data is typically stored in databases due to its organized schema.

D: Structured data is always textual and narrative in form.

Feedback: Incorrect. Structured data is not always textual but is characterized by its organization into
rows and columns.

Question 38 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What was a significant impact of the emergence of MapReduce-style systems on data management?

*A: It allowed for the processing of data in parallel across distributed systems.

Feedback: Correct! MapReduce-style systems enabled parallel processing across distributed systems,
greatly enhancing data management capabilities.

B: It eliminated the need for data warehousing solutions.

Feedback: Incorrect. MapReduce-style systems did not eliminate the need for data warehousing.

C: It made relational databases obsolete.

Feedback: Incorrect. Relational databases are still very much in use and have not become obsolete due
to MapReduce-style systems.

D: It removed the need for data indexing.

Feedback: Incorrect. MapReduce-style systems did not remove the need for data indexing.
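The parallel processing idea behind the correct answer can be sketched as a word count in the map and reduce style. This is only an illustration under loud assumptions: the input partitions are invented, and a local thread pool stands in for the separate machines of a real distributed system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input, already split into partitions as a distributed store would hold it.
partitions = [
    ["big data", "data model"],
    ["data management", "big model"],
]

def map_partition(lines):
    """Map step: emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for line in lines for word in line.split()]

def reduce_counts(pairs):
    """Shuffle and reduce step: group the pairs by word and sum the counts."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Each partition is mapped independently, so the map calls can run in parallel.
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_partition, partitions))

word_counts = reduce_counts(pair for part in mapped for pair in part)
print(word_counts)
```

Because no map call depends on another partition, adding more workers (or machines) lets more partitions be processed at the same time; only the reduce step needs to bring results together.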

Question 39 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following best describes the integration of MR-style operations in DBMSs?

*A: MR-style operations are integrated into DBMSs to allow for parallel processing of large data sets.

Feedback: Correct! This integration allows for efficient parallel processing of large data sets within
DBMSs.

B: MR-style operations in DBMSs are used to replace traditional SQL queries.

Feedback: Incorrect. MR-style operations complement traditional SQL queries; they do not replace
them.

C: MR-style operations are used exclusively for real-time data processing in DBMSs.

Feedback: Incorrect. MR-style operations are not limited to real-time data processing.

D: MR-style operations in DBMSs are designed for handling small-scale data analytics.

Feedback: Incorrect. MR-style operations are designed for large-scale data analytics.

Question 40 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one of the key differences between ACID and BASE properties in database management
systems?

*A: ACID prioritizes consistency, while BASE prioritizes availability.

Feedback: Correct! ACID properties focus on consistency, ensuring that database transactions are
processed reliably. BASE properties, on the other hand, prioritize availability, allowing for eventual
consistency.

B: ACID properties are only applicable to NoSQL databases, while BASE properties are for SQL
databases.

Feedback: Incorrect. ACID properties are typically associated with SQL databases, whereas BASE
properties are more common in NoSQL databases.

C: BASE properties ensure data durability, while ACID properties focus on high throughput.

Feedback: Incorrect. ACID properties ensure data durability among other things, while BASE properties
allow for high availability and partition tolerance.

D: ACID properties are designed for distributed systems, while BASE properties are designed for
centralized systems.

Feedback: Incorrect. Both ACID and BASE properties can be applied in various types of systems,
including distributed systems.

Question 41 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one advantage of using column stores in databases?

*A: They offer faster data retrieval for specific columns.

Feedback: Correct! Column stores allow for faster data retrieval when querying specific columns,
making them efficient for analytical queries.

B: They provide better support for transaction processing.

Feedback: Incorrect. Column stores are typically not optimized for transaction processing but for
read-heavy queries.

C: They simplify the database schema design.

Feedback: Incorrect. Column stores do not inherently simplify schema design; their main advantage lies
in efficient data retrieval.

D: They reduce the need for indexing.

Feedback: Incorrect. Column stores may still require indexing, but their primary benefit is in how they
store and retrieve data.
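Why a column layout helps when a query needs only one column can be sketched with plain Python structures. The table below is hypothetical, and real column stores add compression and on-disk layout details that this toy example leaves out.

```python
# The same hypothetical table in two layouts.
row_store = [
    ("A1", 21.5, 40),
    ("B2", 19.0, 55),
    ("C3", 25.0, 30),
]
column_store = {
    "station": ["A1", "B2", "C3"],
    "temp_c": [21.5, 19.0, 25.0],
    "humidity": [40, 55, 30],
}

# Row layout: every full record is visited just to read one field.
avg_from_rows = sum(row[1] for row in row_store) / len(row_store)

# Column layout: only the values of the requested column are touched.
temps = column_store["temp_c"]
avg_from_columns = sum(temps) / len(temps)

print(avg_from_rows, avg_from_columns)
```

Both layouts give the same average; the difference is how much unrelated data has to be read on the way, which is what makes column stores attractive for analytical queries over a few columns.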

Question 42 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following best explains the difference between parallel and distributed DBMS?

*A: Parallel DBMS uses multiple processors within a single machine, while distributed DBMS uses
multiple machines.

Feedback: Correct! This is the key difference between parallel and distributed DBMS.

B: Parallel DBMS can only process small datasets, while distributed DBMS can process large datasets.

Feedback: Incorrect. The size of the dataset is not a defining difference between parallel and distributed
DBMS.

C: Parallel DBMS is less efficient than distributed DBMS in terms of processing speed.

Feedback: Incorrect. Both parallel and distributed DBMS can be efficient depending on the use case.

D: Parallel DBMS does not support MR-style operations, while distributed DBMS does.

Feedback: Incorrect. Both parallel and distributed DBMS can support MR-style operations.

Question 43 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following describes a key advantage of using a DBMS over a file system for large scale
data processing?

*A: Enhanced data integrity and security

Feedback: Correct! Using a DBMS provides enhanced data integrity and security.

B: Simplified file storage

Feedback: Incorrect. While a DBMS can manage data efficiently, simplified file storage is not its key
advantage over a file system.

C: Reduced need for data backups

Feedback: Incorrect. A DBMS does not reduce the need for data backups.

D: Lower data processing speed

Feedback: Incorrect. In fact, a DBMS generally increases data processing speed due to optimized
queries and indexing.

Question 44 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following is a unique feature of Aerospike?

*A: Geospatial data support

Feedback: Correct! Aerospike supports geospatial data, which is one of its unique features.

B: Batch processing

Feedback: Incorrect. While Aerospike is optimized for performance, batch processing is not one of its
unique features.

C: Schema evolution

Feedback: Incorrect. Schema evolution is not a unique feature of Aerospike.

D: Virtualization

Feedback: Incorrect. Virtualization is not a unique feature of Aerospike.

Question 45 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following capabilities is provided by Vertica's integration with R?

*A: Statistical analysis

Feedback: Correct! Vertica's integration with R allows for advanced statistical analysis.

B: Database replication

Feedback: Incorrect. Vertica's integration with R focuses on statistical analysis, not database replication.

C: Data visualization

Feedback: Incorrect. While R can be used for data visualization, Vertica's integration with R is
specifically aimed at statistical analysis.

D: Machine learning model deployment

Feedback: Incorrect. The integration primarily facilitates statistical analysis, not machine learning model
deployment.

Question 46 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is a significant difference between parallel and distributed DBMS?

*A: Parallel DBMS uses multiple processors within a single system, while distributed DBMS uses
multiple systems.

Feedback: Correct! Parallel DBMS involves multiple processors within a single system, whereas
distributed DBMS involves multiple systems.

B: Parallel DBMS is always faster than distributed DBMS.

Feedback: Incorrect. Speed depends on various factors and is not a definitive difference.

C: Distributed DBMS requires more memory than parallel DBMS.

Feedback: Incorrect. Memory requirements depend on the specific implementation and use case.

D: Parallel DBMS can only run on specialized hardware.

Feedback: Incorrect. Parallel DBMS can run on a variety of hardware, not just specialized ones.

Question 47 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following is a key advantage of using a DBMS over a file system for large-scale data
processing?

*A: Improved security features

Feedback: Correct! DBMS systems offer enhanced security features compared to file systems.

B: Simpler file structures

Feedback: Incorrect. File systems might have simpler file structures but lack the advanced features of
DBMS.

C: Less disk space usage

Feedback: Incorrect. DBMS might actually use more disk space due to indexing and other features.

D: Faster read/write operations

Feedback: Incorrect. This is not necessarily true; DBMS might be slower due to complex operations.

Question 48 - multiple choice, shuffle, medium

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following describes a key difference between parallel and distributed DBMS and its
implications for data processing?

*A: Parallel DBMS focuses on dividing tasks within a single machine, while distributed DBMS uses
multiple machines.

Feedback: Correct! Parallel DBMS divides tasks within a single machine, while distributed DBMS uses
multiple machines for data processing.

B: Parallel DBMS uses multiple machines for processing, whereas distributed DBMS processes tasks
within a single machine.

Feedback: Incorrect. Parallel DBMS divides tasks within a single machine, while distributed DBMS
uses multiple machines.

C: Both parallel and distributed DBMS use a single machine for processing tasks.

Feedback: Incorrect. Distributed DBMS uses multiple machines for data processing, unlike parallel
DBMS.

D: There is no significant difference between parallel and distributed DBMS.

Feedback: Incorrect. There is a significant difference between parallel and distributed DBMS in terms of
task distribution and machine usage.

Question 49 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following best describes the analytical capabilities of Vertica integrated with R?

*A: Enabling complex statistical analysis on large datasets

Feedback: Correct! Vertica integrated with R enables complex statistical analysis on large datasets.

B: Providing a basic spreadsheet interface for data manipulation

Feedback: Incorrect. Vertica integrated with R goes beyond basic spreadsheet manipulation.

C: Offering a graphical user interface for data visualization

Feedback: Incorrect. The integration focuses on statistical analysis, not just visualization.

D: Supporting only SQL-based queries

Feedback: Incorrect. Vertica integrated with R supports more than just SQL-based queries.

Question 50 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following is a unique feature of Aerospike's architecture?

*A: Hybrid memory architecture

Feedback: Correct! Aerospike's hybrid memory architecture is a unique feature.

B: Row-oriented data storage

Feedback: Incorrect. Aerospike uses a different approach for data storage.

C: Only supports SQL queries

Feedback: Incorrect. Aerospike supports more than just SQL queries.

D: Built-in machine learning algorithms

Feedback: Incorrect. Aerospike does not have built-in machine learning algorithms.

Question 51 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following describes the query language capabilities of AsterixDB?

*A: Support for SQL++

Feedback: Correct! AsterixDB supports SQL++, which extends SQL for querying semi-structured data.

B: Support for plain SQL

Feedback: Incorrect. AsterixDB uses SQL++, not plain SQL.

C: Support for NoSQL queries

Feedback: Incorrect. AsterixDB uses SQL++ for querying, which is more powerful than typical NoSQL
queries.

D: Support for GraphQL

Feedback: Incorrect. AsterixDB does not support GraphQL; it supports SQL++.

Question 52 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following are features and functionalities of Solr?

*A: Full-text search

Feedback: Correct! Solr provides full-text search capabilities.

*B: Indexing structured documents

Feedback: Correct! Solr can index structured documents.

*C: Faceted search

Feedback: Correct! Solr supports faceted search.

D: Transactional data processing

Feedback: Incorrect. Solr is not designed for transactional data processing.

E: Distributed file storage

Feedback: Incorrect. Solr does not provide distributed file storage capabilities.

Question 53 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Select the features and functionalities of Solr.

*A: Full-text search

Feedback: Correct! Solr supports full-text search.

*B: Indexing structured documents

Feedback: Correct! Solr can index structured documents.

*C: Faceted search

Feedback: Correct! Solr supports faceted search.

D: Real-time data ingestion

Feedback: Incorrect. Real-time data ingestion is not a primary feature of Solr.

E: Handling spatial data

Feedback: Incorrect. Handling spatial data is not a primary feature of Solr.

F: ACID transactions

Feedback: Incorrect. Solr does not focus on ACID transactions.

Question 54 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

How many desirable characteristics of a Big Data Management System should be explained according to
the lesson objectives?

*A: 5.0

Feedback: Correct! The lesson objectives specify that at least five desirable characteristics should be
explained.

Default Feedback: Incorrect. Refer back to the lesson objectives for the correct number of desirable
characteristics.

Question 55 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Approximately, how many gigabytes of data can TheMinDBMS manage efficiently in a single-node
setup?

*A: 500.0

Feedback: Correct! TheMinDBMS can efficiently manage up to 500 gigabytes of data in a single-node
setup.

Default Feedback: Incorrect. TheMinDBMS has a certain capacity for data management. Please refer to
the course material for details.

Question 56 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

How many secondary indices are supported in Aerospike?

*A: 256.0

Feedback: Correct! Aerospike supports up to 256 secondary indices.

Default Feedback: Incorrect. Please review the material on Aerospike's secondary indices.

Question 57 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

How many fundamental challenges of storing, indexing, and matching text data are discussed in the
lesson?

*A: 3.0

Feedback: Correct! There are three fundamental challenges discussed in the lesson.

Default Feedback: Incorrect. Please review the lesson on the fundamental challenges of storing,
indexing, and matching text data.

Question 58 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What type of system architecture is typically used by TheMinDBMS? Please answer in all lowercase.

*A: distributed

Feedback: Correct! TheMinDBMS typically uses a distributed system architecture.

*B: distributed system

Feedback: Correct! TheMinDBMS uses a distributed system architecture.

Default Feedback: Incorrect. Please review the system architecture used by TheMinDBMS.

Question 59 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the term used to describe the simultaneous processing of the same task across multiple
processors? Please answer in all lowercase.

*A: parallelism

Feedback: Correct! Parallelism refers to the simultaneous processing of the same task across multiple
processors.

B: concurrency

Feedback: Incorrect. Concurrency involves managing multiple tasks at the same time, but not
necessarily the same task across multiple processors.

C: multitasking

Feedback: Incorrect. Multitasking refers to handling multiple tasks within a single processor.

D: distributed

Feedback: Incorrect. Distributed refers to processing across multiple systems, not just processors.

Default Feedback: Incorrect. Please review the concepts of parallelism in big data management.

Question 60 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Name the in-memory data structure store that is known for fast data retrieval and optimizing memory
usage. Please answer in all lowercase.

*A: redis

Feedback: Correct! Redis is an in-memory data structure store used for fast data retrieval and optimizing
memory usage.

Default Feedback: Incorrect. Please review the lesson on in-memory data structure stores and their
characteristics.

Question 61 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the term used to describe databases that support both ACID and non-ACID transactions for
different operations? Please answer in all lowercase.

*A: hybrid

Feedback: Correct! Hybrid databases support both ACID and non-ACID transactions.

*B: htap

Feedback: Correct! HTAP (Hybrid Transactional/Analytical Processing) databases support both ACID
and non-ACID transactions.

Default Feedback: Incorrect. Please review the concept of databases that support both ACID and non-
ACID transactions.

Question 62 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What type of DBMS is TheMinDBMS? Please answer in all lowercase.

*A: distributed

Feedback: Correct! TheMinDBMS is a distributed DBMS.

*B: distributeddbms

Feedback: Correct! TheMinDBMS is a distributed DBMS.

*C: distributed_dbms

Feedback: Correct! TheMinDBMS is a distributed DBMS.

Default Feedback: Incorrect. Please review the features of TheMinDBMS.

Question 63 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which feature is NOT associated with TheMinDBMS?

*A: Distributed transaction processing

Feedback: Correct! TheMinDBMS does not focus on distributed transaction processing.

B: Simplified data management

Feedback: Incorrect. Simplified data management is a key feature of TheMinDBMS.

C: Efficient analytics processing

Feedback: Incorrect. Efficient analytics processing is a key feature of TheMinDBMS.

D: Support for large-scale data

Feedback: Incorrect. Support for large-scale data is a key feature of TheMinDBMS.

Question 64 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following describes the integration of MR-style operations in DBMSs?

*A: MR-style operations allow for efficient, parallel processing of large datasets within a DBMS.

Feedback: Correct! The integration of MR-style operations in DBMSs allows for efficient, parallel
processing of large datasets.

B: MR-style operations eliminate the need for SQL in a DBMS.

Feedback: Incorrect. MR-style operations do not eliminate the need for SQL in a DBMS.

C: MR-style operations are designed to replace traditional DBMS operations entirely.

Feedback: Incorrect. MR-style operations are not designed to replace traditional DBMS operations
entirely.

D: MR-style operations are primarily used for real-time transaction processing.

Feedback: Incorrect. MR-style operations are not primarily used for real-time transaction processing.

Question 65 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Which of the following is a key advantage of column stores in the context of big data management?

*A: Improved compression techniques

Feedback: Correct! Column stores allow for more efficient compression techniques, which can
significantly reduce storage space and improve query performance.

B: Better transaction support

Feedback: Incorrect. While column stores have their advantages, transaction support is typically stronger
in row-oriented databases.

C: Simpler indexing methods

Feedback: Incorrect. Column stores may have complex indexing methods to support efficient data
retrieval.

D: Enhanced data security

Feedback: Incorrect. Column stores do not inherently provide enhanced data security compared to other
database designs.
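
To illustrate why values stored column by column tend to compress well, here is a minimal sketch of run-length encoding, one common compression technique. The `country` column and the function names are assumptions made for illustration; they do not come from the course material or from any specific column store.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Hypothetical 'country' column, sorted so that equal values are adjacent.
country = ["DE", "DE", "DE", "US", "US", "VN"]
encoded = rle_encode(country)
```

Six stored values shrink to three pairs here. The same idea works poorly on a row layout, because adjacent values in a row belong to different attributes and rarely repeat.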

Question 66 - checkbox, shuffle, partial credit, medium

Question category: Module: Big Data Management: The "M" in DBMS

Select the correct statements about the emergence of MapReduce-style systems and their impact on data
management.

*A: MapReduce-style systems enable large-scale data processing on distributed systems.

Feedback: Correct! MapReduce-style systems enable large-scale data processing on distributed systems.

B: MapReduce-style systems are less efficient than traditional DBMS for large-scale data processing.

Feedback: Incorrect. MapReduce-style systems are generally more efficient for large-scale data
processing.

*C: MapReduce-style systems have simplified the development of distributed algorithms.

Feedback: Correct! MapReduce-style systems have simplified the development of distributed
algorithms.

D: MapReduce-style systems are only applicable to structured data.

Feedback: Incorrect. MapReduce-style systems can be applied to both structured and unstructured data.

Question 67 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What type of data structure store is Redis primarily considered? Please answer in all lowercase.

*A: in-memory

Feedback: Correct! Redis is primarily considered an in-memory data structure store.

*B: inmemory

Feedback: Correct! Redis is primarily considered an in-memory data structure store.

Default Feedback: Incorrect. Redis is primarily classified as an in-memory data structure store.

Question 68 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

Within what range does Aerospike's typical write latency for a single record fall (in milliseconds)?

*A: [1, 5)

Feedback: Correct! Aerospike's typical write latency for a single record falls within this range.

Default Feedback: Incorrect. Please review the performance characteristics of Aerospike for web-scale
applications.

Question 69 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the lower bound of the range of efficiency improvement (in percentage) typically seen when
using a DBMS over a file system for large-scale data processing?

*A: [20, 25)

Feedback: Correct! Using a DBMS for large-scale data processing typically shows significant efficiency
improvements over a file system.

Default Feedback: Incorrect. Please review the efficiency improvements discussed in the lesson.

Question 70 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the range of nodes typically used in a distributed DBMS?

*A: [50, 1000]

Feedback: Correct! Distributed DBMSs typically operate within this range of nodes.

Default Feedback: That's not quite right. Consider the scale typically associated with distributed
systems.

Question 71 - numeric, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the minimum number of desirable characteristics a Big Data Management System should have?

*A: 5.0

Feedback: The minimum is five, as mentioned in the course material.

Default Feedback: Think about the essential characteristics needed for managing big data efficiently.

Question 72 - text match, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is the key-value data structure store that optimizes memory for fast data retrieval? Please answer
in all lowercase.

*A: redis

Feedback: Correct! Redis is known for its efficient in-memory data structure store capabilities.

Default Feedback: Remember to think about in-memory data structure stores known for speed.

Question 73 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is an advantage of using column stores and compression techniques?

*A: Column stores allow for faster query performance on specific columns by sequentially scanning
those columns only.

Feedback: Correct! Column stores optimize queries by accessing only the necessary columns, enhancing
performance.

B: Column stores can store entire rows of data contiguously on disk.

Feedback: This is incorrect. Column stores focus on storing data column-wise, not row-wise.

C: Column stores use compression techniques to expand storage needs and slow down processing.

Feedback: Actually, compression in column stores is used to reduce storage footprint and improve
performance, not the opposite.

D: Column stores inherently support ACID transactions better than row stores.

Feedback: No, column stores are not inherently better at supporting ACID transactions compared to row
stores.
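
The column-scan advantage described in this question can be sketched in a few lines. The sample rows, the column names, and the `to_columns` helper are hypothetical and only illustrate the layout difference; a real column store keeps each column in its own on-disk segment rather than in Python lists.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 2.5},
    {"id": 3, "city": "Oslo", "amount": 7.5},
]


def to_columns(records):
    """Pivot row records into one list per column (a column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one attribute scans a single list sequentially and
# never touches the 'id' or 'city' values.
total_amount = sum(columns["amount"])
```

With the row layout, the same aggregate would have to read every record in full just to pick out one field.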

Question 74 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one of the analytical capabilities of Vertica?

A: Vertica can handle real-time data ingestion.

Feedback: Good try, but Vertica is not primarily focused on real-time data ingestion.

*B: Vertica provides integration with R for statistical analysis.

Feedback: Correct! Vertica's integration with R enhances its analytical capabilities for statistical
computations.

C: Vertica is an open-source database management system.

Feedback: That's incorrect. Vertica is a commercial product offering high-performance analytics.

D: Vertica is specifically designed for geospatial data handling.

Feedback: This is not correct. While Vertica supports various data types, geospatial handling is not its
primary focus.

Question 75 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is a key advantage of integrating MapReduce-style operations in DBMSs?

*A: Enhances parallel processing capabilities

Feedback: Correct! Integrating MapReduce-style operations in DBMSs enhances parallel processing
capabilities.

B: Reduces data redundancy

Feedback: Not quite. While data redundancy is a concern, the primary advantage here is related to
processing capabilities.

C: Increases storage capacity

Feedback: This is incorrect. Storage capacity is not directly affected by the integration of MapReduce-
style operations.

D: Improves user interface design

Feedback: No, user interface design is not directly related to the integration of MapReduce-style
operations.

Question 76 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one significant difference between parallel and distributed DBMS?

*A: Parallel DBMS focuses on shared memory while distributed DBMS utilizes shared-nothing
architecture

Feedback: Correct! This is a significant difference between the two.

B: Parallel DBMS is used for real-time analytics, while distributed DBMS is not

Feedback: Not quite. Both systems can be used for analytics, but they differ in architecture.

C: Distributed DBMSs are always faster than parallel DBMSs

Feedback: This is incorrect. Speed depends on various factors, not just the system type.

D: Parallel DBMS requires less network communication compared to distributed DBMS

Feedback: This is incorrect. The amount of network communication depends on the architecture rather
than on the DBMS type.

Question 77 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one of the key advantages of MapReduce-style systems in data management?

*A: MapReduce enables distributed processing by breaking tasks into smaller sub-tasks.

Feedback: Correct! MapReduce is designed to handle large-scale data processing through distributed
systems.

B: MapReduce relies on a single central database for all operations.

Feedback: Not quite. MapReduce is distributed, not centralized.

C: MapReduce is only used for structured data processing.

Feedback: Incorrect. MapReduce can handle both structured and unstructured data.

D: MapReduce simplifies data processing by using a linear, non-parallel model.

Feedback: That's not correct. MapReduce actually benefits from parallel processing.
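
The idea of breaking a job into smaller sub-tasks can be sketched as a word count, the usual introductory MapReduce example. This is a single-process illustration under stated assumptions: the function names and the sample chunks are hypothetical, and in a real MapReduce system each chunk would be mapped on a different machine before the framework shuffles the intermediate pairs to the reducers.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is an independent sub-task; here they run one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


chunks = ["big data big", "data model"]
counts = word_count(chunks)
```

Because `map_phase` looks at only one chunk and `reduce_phase` at only one key, both steps can run in parallel without coordination, which is what makes the model suit distributed systems.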

Question 78 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Management: The "M" in DBMS

What is one of the main advantages of using column stores in big data management systems?

*A: Faster retrieval of data for analytical queries.

Feedback: Correct! Column stores allow for faster retrieval of data because they store data by columns
rather than rows, which is particularly efficient for analytical queries.

B: Requires less memory for storage.

Feedback: Not quite. While column stores can use compression to reduce storage space, their main
advantage is in speeding up data retrieval for analytical queries.

C: Simplifies the database schema design.

Feedback: This is incorrect. Column stores do not necessarily simplify schema design, but they optimize
data retrieval speeds.

D: Enhances transactional processing capabilities.

Feedback: No, column stores are primarily optimized for read-heavy analytical operations rather than
transactional processing.

Question 79 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What characteristic of big data is best managed by a distributed file system?

*A: Large volume

Feedback: Correct! Distributed file systems are well-suited to handle large volumes of data.

B: High velocity

Feedback: Incorrect. High velocity refers to the speed at which data is generated and processed.
Distributed file systems primarily focus on handling large volumes of data.

C: Variety

Feedback: Incorrect. Variety refers to the different types of data. While distributed file systems can store
different types of data, their primary strength is managing large volumes.

D: Veracity

Feedback: Incorrect. Veracity refers to the uncertainty of data. Distributed file systems are not
specifically designed to manage data veracity.

Question 80 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which big data management system is best suited for real-time data processing?

*A: Apache Storm

Feedback: Correct! Apache Storm is designed for real-time data processing.

B: Hadoop

Feedback: Incorrect. Hadoop is primarily designed for batch processing, not real-time data processing.

C: Cassandra

Feedback: Incorrect. Cassandra is a distributed database system, but it is not specifically designed for
real-time data processing.

D: HBase

Feedback: Incorrect. HBase is a distributed database that provides real-time read/write access but is not
specifically a real-time data processing system.

Question 81 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which aspect is crucial for gaining trust as a data provider?

*A: Data quality

Feedback: Correct! Data quality is crucial for gaining trust as a data provider.

B: Data quantity

Feedback: Incorrect. While the amount of data can be important, its quality is more crucial for trust.

C: Data format

Feedback: Incorrect. The format of the data can be important, but it is not the most crucial aspect for
gaining trust.

D: Data source

Feedback: Incorrect. The source of the data can be important, but it is not the most crucial aspect for
gaining trust.

Question 82 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following is a key feature of the Hadoop ecosystem?

*A: Scalability

Feedback: Correct! Scalability is a key feature of the Hadoop ecosystem.

B: Single-point failure

Feedback: Incorrect. The Hadoop ecosystem is designed to avoid single-point failures.

C: Limited data capacity

Feedback: Incorrect. The Hadoop ecosystem is designed to handle large data capacities.

D: Manual fault recovery

Feedback: Incorrect. The Hadoop ecosystem includes automated fault recovery mechanisms.

Question 83 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which command is used to change the current directory in a terminal?

*A: cd

Feedback: Correct! The 'cd' command is used to change the current directory in both macOS Terminal
and Windows PowerShell.

B: ls

Feedback: Incorrect. The 'ls' command is used to list files and directories, not to change directories.

C: pwd

Feedback: Incorrect. The 'pwd' command shows the current directory path, but doesn't change it.

D: mkdir

Feedback: Incorrect. The 'mkdir' command is used to create a new directory, not to change the current
directory.

Question 84 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which file format is required to be downloaded from the specialization repository as per the course
instructions?

*A: .csv

Feedback: Correct! The course specifies that the required file format to be downloaded is .csv.

B: .txt

Feedback: Incorrect. While .txt files are commonly used, the course specifies the .csv format.

C: .docx

Feedback: Incorrect. .docx is a document format and not specified for this course.

D: .pdf

Feedback: Incorrect. The course explicitly mentions the .csv format for the required file.

Question 85 - multiple choice, shuffle, medium

Question category: Module: Introduction to Big Data Modeling and Management

What key insight can be derived from the 'FlightStats Data.pdf' regarding flight delays?

*A: Flight delays are more frequent in the winter season.

Feedback: Correct! Winter season typically experiences more frequent flight delays due to adverse
weather conditions.

B: Flight delays are less frequent on weekends.

Feedback: Incorrect. Review the data in 'FlightStats Data.pdf' to understand the trends related to flight
delays.

C: Flight delays are evenly distributed throughout the year.

Feedback: Incorrect. The data indicates a seasonal variation in flight delays.

D: Flight delays are primarily caused by mechanical failures.

Feedback: Incorrect. While mechanical failures are a factor, they are not the primary cause of flight
delays as per the data.

Question 86 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is one challenge associated with managing large-scale data from smart meters?

*A: Data privacy concerns

Feedback: Correct! Data privacy is a significant challenge when managing large-scale data from smart
meters.

B: Limited data storage capacity

Feedback: Incorrect. While data storage is a concern, it is not the primary challenge in this context.

C: Lack of data sources

Feedback: Incorrect. There is no lack of data sources in the context of smart meters.

D: Inadequate data frequency

Feedback: Incorrect. The frequency of data collection is typically adequate for analysis.

Question 87 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following best describes a characteristic of big data that makes traditional relational
databases less suitable?

A: High velocity

Feedback: High velocity refers to the speed at which data is generated and processed. While it is a big
data characteristic, it does not make relational databases less suitable.

B: Structured schema

Feedback: Structured schema is a characteristic of traditional relational databases, not big data.

*C: Large volume

Feedback: Correct! Large volume refers to the huge amount of data generated, which makes traditional
relational databases less suitable.

D: Consistency

Feedback: Consistency is a principle of traditional relational databases, not a characteristic of big data.

Question 88 - checkbox, shuffle, partial credit, medium

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are important considerations when choosing a big data management system?

*A: Scalability

Feedback: Correct! Scalability is crucial for handling increasing amounts of data efficiently in a big data
management system.

*B: Data schema

Feedback: Correct! Understanding the data schema is important for efficient data storage and retrieval.

C: Color of the user interface

Feedback: Incorrect. The color of the user interface is not a significant factor in choosing a big data
management system.

*D: Support for real-time processing

Feedback: Correct! Support for real-time processing is important for applications that require immediate
data processing and analysis.

Question 89 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following programming models is part of the Hadoop ecosystem for data modeling and
processing?

*A: MapReduce

Feedback: Correct! MapReduce is a programming model used in the Hadoop ecosystem for processing
large data sets.

B: SQL

Feedback: Incorrect. SQL is not a programming model specific to the Hadoop ecosystem.

C: REST

Feedback: Incorrect. REST is an architectural style used for web services, not a programming model in
Hadoop.

D: OOP

Feedback: Incorrect. Object-Oriented Programming (OOP) is a programming paradigm, but it is not
specific to Hadoop.

Question 90 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following technologies is a core component of the FlightStats real-time flight status data
technology stack?

*A: Apache Kafka

Feedback: Correct! Apache Kafka is used for real-time data streaming in FlightStats.

B: Hadoop MapReduce

Feedback: Incorrect. Hadoop MapReduce is generally used for batch processing, not real-time data
streaming.

C: Microsoft Excel

Feedback: Incorrect. Microsoft Excel is not suitable for handling real-time flight status data.

D: MySQL

Feedback: Incorrect. MySQL is a relational database and not typically used for real-time data streaming.

Question 91 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is a key consideration when processing big data from various sources in the game industry?

*A: Data consistency

Feedback: Correct! Ensuring data consistency is crucial when processing big data from various sources
in the game industry.

B: Data animation

Feedback: Incorrect. Data animation is not a key consideration when processing big data from various
sources in the game industry.

C: Data encryption

Feedback: Incorrect. While data encryption is important, it is not the primary consideration when
processing big data from various sources in the game industry.

D: Data visualization

Feedback: Incorrect. Data visualization is important, but it is not the key consideration when processing
big data from various sources in the game industry.

Question 92 - checkbox, shuffle, partial credit, medium

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are key considerations in big data modeling and management?

*A: Data exploration

Feedback: Correct! Data exploration is a crucial step in understanding and analyzing big data.

*B: Data storage requirements

Feedback: Correct! Understanding data storage requirements is essential for effective big data
management.

C: User interface design

Feedback: Incorrect. While important, user interface design is not a primary consideration in big data
modeling and management.

*D: Data processing requirements

Feedback: Correct! Data processing requirements are vital in managing how data is handled and
analyzed.

E: Graphic design

Feedback: Incorrect. Graphic design is not directly related to big data modeling and management.

Question 93 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Select the key aspects of big data analytics.

*A: Volume

Feedback: Correct! Volume is one of the key aspects of big data analytics.

*B: Variety

Feedback: Correct! Variety is one of the key aspects of big data analytics.

*C: Velocity

Feedback: Correct! Velocity is one of the key aspects of big data analytics.

*D: Value

Feedback: Correct! Value is one of the key aspects of big data analytics.

*E: Veracity

Feedback: Correct! Veracity is one of the key aspects of big data analytics.

F: Validity

Feedback: Incorrect. Validity is important but not considered one of the key aspects of big data
analytics.

G: Visibility

Feedback: Incorrect. Visibility is not considered one of the key aspects of big data analytics.

Question 94 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following technologies are part of the technology stack used by FlightStats for real-time
flight status data and data access?

*A: Apache Kafka

Feedback: Correct! Apache Kafka is used for real-time data streaming.

B: MySQL

Feedback: Incorrect. MySQL is not part of the core technology stack for real-time flight status data.

*C: Amazon S3

Feedback: Correct! Amazon S3 is used for data storage.

D: PostgreSQL

Feedback: Incorrect. PostgreSQL is not mentioned as part of the technology stack used by FlightStats.

*E: Redis

Feedback: Correct! Redis is used for caching real-time data.

F: MongoDB

Feedback: Incorrect. MongoDB is not part of the technology stack used by FlightStats.

Question 95 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Identify the different storage options available in an information system.

*A: Direct-Attached Storage (DAS)

Feedback: Correct! DAS is a storage option where the storage device is directly connected to the
computer.

*B: Network-Attached Storage (NAS)

Feedback: Correct! NAS is a storage option that connects storage devices to a network, allowing
multiple users to access data.

*C: Storage Area Network (SAN)

Feedback: Correct! SAN is a dedicated network that provides access to consolidated block-level storage.

*D: Cloud Storage

Feedback: Correct! Cloud storage is a service model in which data is maintained, managed, and backed
up remotely and made available to users over a network (typically the Internet).

E: Virtual Storage

Feedback: Incorrect. Virtual storage is not a specific storage option; it is a technique used in various
storage solutions.

F: Hybrid Storage

Feedback: Incorrect. Hybrid storage refers to a combination of different storage types, not a standalone
storage option.

Question 96 - checkbox, shuffle, partial credit, medium

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are computational tasks involved in managing large-scale data from smart
meters?

*A: Data aggregation

Feedback: Correct! Data aggregation is required to summarize the large volume of data generated by
smart meters.

*B: Real-time data monitoring

Feedback: Correct! Real-time data monitoring is crucial for immediate insights and decision-making.

C: Graphic design

Feedback: Incorrect. Graphic design is not a computational task related to smart meter data
management.

*D: Data anonymization

Feedback: Correct! Data anonymization is important for ensuring privacy and security when handling
smart meter data.

E: Video streaming

Feedback: Incorrect. Video streaming is not related to managing smart meter data.
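
As an illustration of two of the tasks named above, the following Python sketch aggregates consumption per meter and replaces meter IDs with salted hashes. The readings, the salt, and the function names are all made up for the example, and salted hashing is strictly a form of pseudonymization; a production system would need a properly reviewed privacy scheme.

```python
import hashlib
from collections import defaultdict

# Hypothetical readings: (meter_id, hour_of_day, kWh). Values are invented.
readings = [
    ("meter-001", 0, 0.5),
    ("meter-001", 1, 0.25),
    ("meter-002", 0, 1.0),
    ("meter-002", 1, 0.75),
]


def anonymize(meter_id, salt="example-salt"):
    """Replace the meter ID with a short salted hash (simple pseudonymization)."""
    digest = hashlib.sha256((salt + meter_id).encode("utf-8")).hexdigest()
    return digest[:12]


def aggregate(rows):
    """Sum the consumption of each meter, keyed by its anonymized ID."""
    totals = defaultdict(float)
    for meter_id, _hour, kwh in rows:
        totals[anonymize(meter_id)] += kwh
    return dict(totals)


totals = aggregate(readings)
print(totals)
```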

Question 97 - numeric, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the minimum number of nodes typically required for a Hadoop Distributed File System (HDFS)
to ensure reliability and fault tolerance?

*A: 3.0

Feedback: Correct! HDFS replicates each block three times by default, so at least 3 DataNodes are
needed to keep the replicas on separate nodes.

Default Feedback: Incorrect. Please review the requirements for reliability and fault tolerance in HDFS.

Question 98 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Describe the general concept of data management in one word. Please answer in all lowercase.

*A: organization

Feedback: Correct! Data management is about the organization of data.

*B: governance

Feedback: Correct! Data management is about governance of data.

*C: administration

Feedback: Correct! Data management is about administration of data.

Default Feedback: Incorrect. Please review the general concept of data management.

Question 99 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the term for devices that record energy consumption in real-time and provide data to both
consumers and utility companies? Please answer in all lowercase.

*A: smartmeters

Feedback: Correct! Smart meters record energy consumption in real-time.

*B: smartmeter

Feedback: Correct! Smart meters record energy consumption in real-time.

*C: smart-meter

Feedback: Correct! Smart meters record energy consumption in real-time.

*D: smart-meters

Feedback: Correct! Smart meters record energy consumption in real-time.

Default Feedback: Incorrect. Review the concepts related to energy consumption devices.

Question 100 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the term used to describe the variety of data types in big data? Please answer in all lowercase.

*A: variety

Feedback: Correct! Variety refers to the different types of data (structured, semi-structured,
unstructured) in big data.

Default Feedback: Incorrect. Please review the types of data in big data and try again.

Question 101 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Describe a key challenge in managing big data in one word. Please answer in all lowercase.

*A: scalability

Feedback: Correct! Scalability is a major challenge in managing big data as it involves handling
increasing amounts of data efficiently.

*B: complexity

Feedback: Correct! Complexity is a significant challenge in managing big data due to the intricate
processes involved.

Default Feedback: Incorrect. Please review the challenges involved in managing big data.

Question 102 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What term describes the speed at which big data is generated and processed? Please answer in all
lowercase.

*A: velocity

Feedback: Correct! Velocity describes the speed at which big data is generated and processed.

Default Feedback: Incorrect. Please review the characteristics of big data and try again.

Question 103 - numeric, medium

Question category: Module: Introduction to Big Data Modeling and Management

A memory hierarchy arranges storage levels so that faster levels are smaller and cost more per byte.
How many main levels does a typical memory hierarchy have?

*A: 5.0

Feedback: Correct! A typical memory hierarchy has 5 main levels.

Default Feedback: Incorrect. Please review the concept of memory hierarchy and its typical levels.

Question 104 - numeric, medium

Question category: Module: Introduction to Big Data Modeling and Management

Based on the data from 'FlightStats Data.pdf', what is the average flight delay time in minutes?

*A: 45.0

Feedback: Correct! The average flight delay time is 45 minutes as indicated by the data.

Default Feedback: Incorrect. Refer to the 'FlightStats Data.pdf' for the correct average flight delay time.

Question 105 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following technologies is primarily used by FlightStats for real-time flight status data and
data access?

*A: Apache Kafka

Feedback: Correct! Apache Kafka is indeed used by FlightStats for real-time data streaming and access.

B: Hadoop

Feedback: Incorrect. While Hadoop is used for big data processing, it is not the primary technology for
real-time flight status data at FlightStats.

C: Spark

Feedback: Incorrect. Spark is often used for big data analytics, but not specifically for real-time flight
status data at FlightStats.

D: MongoDB

Feedback: Incorrect. MongoDB is used for database storage, but it is not the primary technology for
real-time flight status data at FlightStats.

Question 106 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following big data management systems is most suitable for handling real-time data
streams?

*A: Apache Kafka

Feedback: Correct! Apache Kafka is designed for building real-time data pipelines and streaming apps.

B: Hadoop HDFS

Feedback: Incorrect. Hadoop HDFS is more suited for batch processing rather than real-time data
streams.

C: MongoDB

Feedback: Not quite. MongoDB is a NoSQL database that handles large volumes of data efficiently but
is not specifically designed for real-time data streams.

D: MySQL

Feedback: Incorrect. MySQL is a relational database that is not optimized for handling real-time data
streams.

Question 107 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the difference between SAN (Storage Area Network) and NAS (Network Attached Storage)?

*A: SAN uses block storage while NAS uses file storage.

Feedback: Correct! SAN utilizes block storage, which provides raw storage capacity, whereas NAS uses
file storage, managing data in a hierarchical structure.

B: SAN is suitable for small businesses while NAS is for large enterprises.

Feedback: Incorrect. Both SAN and NAS can be used by both small and large enterprises depending on
their storage needs.

C: SAN is directly connected to computers while NAS is connected through the network.

Feedback: Incorrect. Actually, SAN is connected through a dedicated network, while NAS is connected
through a standard network.

D: SAN is typically slower than NAS in terms of data retrieval.

Feedback: Incorrect. SANs are generally faster than NAS in terms of data retrieval due to their
architecture.

Question 108 - checkbox, shuffle, partial credit, medium

Question category: Module: Introduction to Big Data Modeling and Management

Select the computational tasks involved in managing large-scale data from smart meters.

*A: Data aggregation

Feedback: Correct! Data aggregation is a crucial task for managing large-scale data from smart meters to
derive meaningful insights.

*B: Real-time processing

Feedback: Correct! Real-time processing is essential for immediate analysis and response to energy
usage data.

*C: Batch processing

Feedback: Correct! Batch processing helps in handling large volumes of data at intervals.

D: Manual data entry

Feedback: Incorrect. Manual data entry is not typically involved in managing large-scale data from
smart meters due to the automated nature of data collection.

E: Visual data representation

Feedback: Incorrect. While visual data representation is useful, it is not a computational task but rather a
method of presenting analyzed data.

Question 109 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are key aspects of big data analytics?

*A: Volume

Feedback: Correct! Volume is a key aspect of big data analytics as it refers to the vast amount of data
generated every second.

*B: Variety

Feedback: Correct! Variety is another important aspect as it denotes the different types of data
(structured, unstructured, and semi-structured) that are analyzed.

*C: Velocity

Feedback: Correct! Velocity refers to the speed at which data is generated and processed, making it a
crucial aspect of big data analytics.

*D: Veracity

Feedback: Correct! Veracity deals with the trustworthiness and quality of the data, which is essential in
big data analytics.

E: Vulnerability

Feedback: Incorrect. Vulnerability is not considered one of the key aspects of big data analytics.

Question 110 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are characteristics of big data?

*A: Volume

Feedback: Correct! Volume refers to the vast amounts of data generated every second.

*B: Velocity

Feedback: Correct! Velocity refers to the speed at which new data is generated and moves around.

*C: Veracity

Feedback: Correct! Veracity refers to the trustworthiness and quality of the data.

*D: Variety

Feedback: Correct! Variety refers to the different types of data (structured, unstructured, etc.).

E: Volatility

Feedback: Incorrect. Volatility is not one of the primary characteristics of big data.

F: Versatility

Feedback: Incorrect. While versatility is important, it is not considered a primary characteristic of big
data.

Question 111 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the name of the command-line interface tool used to interact with Docker containers? Please
answer in all lowercase.

*A: docker

Feedback: Correct! 'docker' is the CLI tool used to manage Docker containers.

Default Feedback: Incorrect. Check the Docker documentation to find the correct CLI tool name.

Question 112 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the term for attaching storage devices directly to a computer without using a network? Please
answer in all lowercase.

*A: das

Feedback: Correct! DAS stands for Direct-Attached Storage, which is connected directly to a computer
without using a network.

Default Feedback: Not quite. Try revisiting the concepts of different storage configurations.

Question 113 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the term used to describe devices like smart meters that measure and report energy usage?
Please answer in all lowercase.

*A: iot

Feedback: Correct! IoT stands for Internet of Things, which includes devices like smart meters.

*B: internetofthings

Feedback: Correct! Internet of Things is the full form of IoT.

Default Feedback: Consider the role of interconnected devices that measure and report energy usage.

Question 114 - text match, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is the term used to describe the process of examining and analyzing large data sets to uncover
hidden patterns, correlations, and insights? Please answer in all lowercase.

*A: datamining

Feedback: Correct! Data mining involves analyzing large datasets to find patterns and insights.

*B: data-mining

Feedback: Correct! Data mining involves analyzing large datasets to find patterns and insights.

*C: mining

Feedback: Correct! Data mining involves analyzing large datasets to find patterns and insights.

Default Feedback: Consider reviewing the terminology used in big data analytics for analyzing data sets.

Question 115 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are key aspects of big data analytics? Select all that apply.

*A: Volume, variety, and velocity of data

Feedback: Correct! These are known as the three V's of big data analytics, representing the scale,
diversity, and speed of data processing.

B: Inflexible data structures

Feedback: Incorrect. Big data analytics often requires flexible data structures to handle heterogeneous
and rapidly changing data.

*C: Real-time processing capabilities

Feedback: Correct! Real-time processing is crucial for gaining timely insights from big data.

D: Limited scalability

Feedback: Incorrect. Big data analytics requires systems that can scale up or out to handle growing data
demands.

*E: Advanced analytical techniques

Feedback: Correct! Big data analytics leverages advanced analytical techniques to extract meaningful
insights from complex datasets.

Question 116 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following are challenges associated with managing large-scale data from smart meters?

*A: Data privacy concerns

Feedback: Correct. Ensuring data privacy is a significant challenge in managing smart meter data.

B: Lack of data storage options

Feedback: Not quite. There are many storage solutions available; the challenge is often choosing the
right one.

*C: Real-time data processing requirements

Feedback: Correct. Processing data in real-time is a complex challenge with smart meter data.

D: Uniform data formats across devices

Feedback: Incorrect. Data formats can vary greatly between different smart meter manufacturers.

Question 117 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

What is one key technology used by FlightStats for real-time flight status data?

*A: Apache Kafka

Feedback: Correct! Apache Kafka is commonly used for real-time data streaming.

B: Hadoop Distributed File System (HDFS)

Feedback: Not quite. While HDFS is great for storage, it's not typically used for real-time data.

C: Cassandra

Feedback: Cassandra is a database solution, but not necessarily for real-time flight data.

D: PostgreSQL

Feedback: PostgreSQL is a robust database management system, but not specifically for real-time data
streaming.

Question 118 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following best describes the iterative nature of the data science process?

*A: The process goes through cycles of analysis, modeling, and evaluation, refining the solution with
each iteration.

Feedback: Correct! The data science process is iterative, involving repeated cycles of analysis,
modeling, and evaluation to refine solutions.

B: The process is completed in one linear sequence from data collection to analysis.

Feedback: Not quite. The data science process is not linear; it is iterative and involves refinement
through cycles.

C: The process skips documentation until the final solution is reached.

Feedback: Incorrect. Documentation is an ongoing part of the data science process, not something to be
skipped.

D: The process requires no reevaluation once a model is built.

Feedback: Incorrect. Reevaluation and refinement are key components of the iterative nature of the data
science process.

Question 119 - multiple choice, shuffle, easy difficulty

Question category: Module: Introduction to Big Data Modeling and Management

Which of the following big data management systems is most suitable for handling large volumes of
unstructured data?

*A: Hadoop

Feedback: Correct! Hadoop is designed to handle large volumes of unstructured data efficiently.

B: Relational Database Management System (RDBMS)

Feedback: RDBMS is more suitable for structured data, not for large volumes of unstructured data.

C: Spreadsheet Software

Feedback: Spreadsheet software cannot handle large volumes of unstructured data efficiently.

D: Document Management System

Feedback: Document Management Systems are for managing documents but not suitable for handling
big data volumes.

Question 120 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Why can images be modeled as vector arrays?

*A: Because they consist of pixel values that can be represented as numerical vectors

Feedback: Correct! Images consist of pixel values that can be represented as numerical vectors, allowing
for manipulation and analysis.

B: Because they are always stored in vector file formats

Feedback: Incorrect. Not all images are stored in vector file formats.

C: Because they are composed of scalar values exclusively

Feedback: No. Images are not composed exclusively of scalar values.

D: Because they can only be processed by vector processors

Feedback: Incorrect. Images can be processed by various types of processors, not just vector processors.
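
The idea that an image is just an array of pixel values can be sketched in a few lines of Python. The tiny 2x3 grayscale image below is invented for illustration, and the helper names are hypothetical; real images would normally be handled with an array library, but plain lists are enough to show the representation.

```python
# A hypothetical 2x3 grayscale image; each entry is a pixel intensity (0-255).
image = [
    [0, 128, 255],
    [64, 32, 16],
]


def flatten(img):
    """Row-major flattening: concatenate the rows into one numerical vector."""
    return [pixel for row in img for pixel in row]


def brighten(vec, amount):
    """Add a constant to every component, clamped to the 0-255 range."""
    return [min(255, max(0, pixel + amount)) for pixel in vec]


vector = flatten(image)
brighter = brighten(vector, 10)
print(vector)
print(brighter)
```

Once the image is a vector, operations such as brightening become ordinary element-wise arithmetic.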

Question 121 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the main purpose of the Vector Space Model in information retrieval?

*A: To represent text documents as vectors of identifiers

Feedback: Correct! The Vector Space Model represents text documents as vectors of identifiers, which
helps in measuring the similarity between documents.

B: To store large-scale graph data efficiently

Feedback: Not quite. Storing large-scale graph data efficiently is not the main purpose of the Vector
Space Model.

C: To optimize the performance of relational databases

Feedback: Incorrect. Optimizing performance of relational databases is not the aim of the Vector Space
Model.

D: To facilitate the compression of multimedia files

Feedback: No. The Vector Space Model is not designed for compressing multimedia files.

Question 122 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which statistical measure can be used to determine the importance of a term in a document relative to a
collection of documents?

*A: TF-IDF

Feedback: Correct! TF-IDF is used to determine the importance of a term in a document relative to a
collection of documents.

B: PageRank

Feedback: Incorrect. PageRank is used to rank web pages, not to measure term importance in
documents.

C: Centrality

Feedback: Incorrect. Centrality is a measure in graph theory, not for term importance in text documents.

D: Clustering Coefficient

Feedback: Incorrect. Clustering Coefficient is used in network analysis, not for measuring term
importance in documents.
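
A small worked example may help here. The Python sketch below computes TF-IDF using one common variant, tf = count / document length and idf = ln(N / document frequency); other weighting schemes exist, and the three toy documents and function names are assumptions made for the illustration.

```python
import math

# Hypothetical toy corpus, already tokenized into lists of words.
docs = [
    "big data needs big storage".split(),
    "graph data models".split(),
    "text search with lucene".split(),
]


def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Inverse document frequency: ln(N / number of documents containing `term`)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    """Weight of `term` in `doc` relative to the whole corpus."""
    return tf(term, doc) * idf(term, corpus)


score_big = tf_idf("big", docs[0], docs)
score_data = tf_idf("data", docs[0], docs)
print(score_big, score_data)
```

In the first document, "big" scores higher than "data" because it occurs twice there and in no other document, while "data" also appears in the second document.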

Question 123 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which file format is required to import data into Gephi?

*A: CSV

Feedback: Correct! CSV is the required format to import data into Gephi.

B: JSON

Feedback: Incorrect. JSON is not the required format for importing data into Gephi.

C: XML

Feedback: Incorrect. XML is not the required format for importing data into Gephi.

D: XLSX

Feedback: Incorrect. XLSX is not the required format for importing data into Gephi.
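
As a rough sketch of preparing such a file, the Python snippet below writes a small edge list as CSV text. The node names and weights are invented, and the Source/Target/Weight header is an assumption based on the column names commonly used for edge tables in Gephi's spreadsheet import; check it against the Gephi version in use.

```python
import csv
import io

# Hypothetical edges: (source node, target node, weight).
edges = [("A", "B", 1), ("A", "C", 2), ("B", "C", 1)]


def edges_to_csv(edge_list):
    """Serialize an edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])  # assumed column names
    writer.writerows(edge_list)
    return buffer.getvalue()


csv_text = edges_to_csv(edges)
print(csv_text)
```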

Question 124 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which operation is used to find the shortest path between two nodes in a graph?

*A: Dijkstra's algorithm

Feedback: Correct! Dijkstra's algorithm is commonly used to find the shortest path between two nodes
in a graph.

B: Depth-first search

Feedback: Incorrect. Depth-first search is used for traversing or searching tree or graph data structures
but not specifically for finding the shortest path.

C: Breadth-first search

Feedback: Incorrect. Breadth-first search finds shortest paths only in unweighted graphs; it ignores
edge weights, so it is not a general shortest-path algorithm.

D: Prim's algorithm

Feedback: Incorrect. Prim's algorithm is used for finding the minimum spanning tree of a graph, not the
shortest path.
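
For reference, a compact Python sketch of Dijkstra's algorithm using a binary heap follows. It assumes non-negative edge weights and a graph stored as a dictionary of dictionaries; the example graph and its weights are made up.

```python
import heapq

# Hypothetical directed graph: graph[u][v] is the weight of the edge u -> v.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}


def dijkstra(g, source, target):
    """Return (distance, path) for the cheapest route, or (inf, []) if none exists."""
    queue = [(0, source, [source])]  # (distance so far, node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue  # a cheaper route to this node was already processed
        visited.add(node)
        for neighbor, weight in g[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []


distance, path = dijkstra(graph, "A", "D")
print(distance, path)
```

Here the direct edge A -> B costs 4, but the route through C reaches B for 3, so the cheapest path to D runs A, C, B, D with total weight 4.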

Question 125 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following best describes a neighborhood operation in graph data models?

*A: Identifying all nodes that are directly connected to a given node

Feedback: Correct! A neighborhood operation involves identifying all nodes that are directly connected
to a given node.

B: Finding the shortest path between two nodes

Feedback: Incorrect. Finding the shortest path is a path operation, not a neighborhood operation.

C: Determining the degree of a node

Feedback: Incorrect. Determining the degree of a node is not specifically a neighborhood operation.

D: Calculating the centrality of a node

Feedback: Incorrect. Calculating the centrality of a node is not specifically a neighborhood operation.
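
A neighborhood operation is easy to sketch on an adjacency list. In the Python example below, the graph is a made-up undirected network and the function names are hypothetical; the second function extends the idea from direct neighbors to everything within k hops.

```python
# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B"},
}


def neighbors(g, node):
    """Nodes directly connected to `node` (its 1-hop neighborhood)."""
    return set(g.get(node, ()))


def k_hop_neighborhood(g, node, k):
    """All nodes reachable from `node` in at most k hops, excluding `node` itself."""
    frontier, seen = {node}, {node}
    for _ in range(k):
        # Expand one hop outward, keeping only nodes not seen before.
        frontier = {n for f in frontier for n in g.get(f, ())} - seen
        seen |= frontier
    return seen - {node}


print(neighbors(graph, "A"))
print(k_hop_neighborhood(graph, "A", 2))
```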

Question 126 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the first step to import a CSV file into Gephi?

*A: Open Gephi and create a new project

Feedback: Correct! Opening Gephi and creating a new project is the first step to importing a CSV file.

B: Select 'Import' from the File menu

Feedback: Incorrect. While this seems plausible, it is necessary to first open Gephi and create a new
project.

C: Download the CSV file to your computer

Feedback: Incorrect. Downloading the file is a prerequisite, but not the first step in Gephi.

D: Use the 'Data Laboratory' tab

Feedback: Incorrect. The 'Data Laboratory' tab is used after importing the file.

Question 127 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

In Gephi, which layout algorithm is commonly used to visualize large-scale graphs?

*A: ForceAtlas2

Feedback: Correct! ForceAtlas2 is commonly used to visualize large-scale graphs in Gephi.

B: Circular Layout

Feedback: Incorrect. Circular Layout is not typically used for large-scale graphs in Gephi.

C: Radial Axis Layout

Feedback: Incorrect. Radial Axis Layout is more suited for hierarchical visualizations, not large-scale
graphs.

D: Grid Layout

Feedback: Incorrect. Grid Layout is not typically used for visualizing large-scale graphs.

Question 128 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the first step to import a CSV file into Gephi?

A: Open the Data Laboratory

Feedback: Opening the Data Laboratory is not the first step. You need to start by creating a new project.

*B: Create a new project

Feedback: Correct! Creating a new project is the first step to import a CSV file into Gephi.

C: Run a layout algorithm

Feedback: Running a layout algorithm is not part of the import process. You need to import the data
first.

D: Perform statistical operations

Feedback: Performing statistical operations is done after importing the data, not before.

Question 129 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following is a statistical operation that can be performed on graph data in Gephi?

*A: Degree distribution

Feedback: Correct! Degree distribution is a common statistical operation performed on graph data in
Gephi.

B: Data aggregation

Feedback: Data aggregation is not typically performed on graph data in Gephi. Look for statistical
operations.

C: Anomaly detection

Feedback: Anomaly detection is not a standard statistical operation in Gephi. Focus on graph-specific
statistics.

D: Feature scaling

Feedback: Feature scaling is not relevant to graph data analysis in Gephi. Consider graph-specific
operations.
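
To show what a degree distribution is, independent of any particular tool, the Python sketch below counts how many nodes have each degree in a small undirected edge list. The edges are invented for the example, and this is not how Gephi itself computes the statistic.

```python
from collections import Counter

# Hypothetical undirected edge list.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]


def degree_distribution(edge_list):
    """Map each degree value to the number of nodes that have that degree."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


distribution = degree_distribution(edges)
print(distribution)
```

In this graph, A has degree 3, B and C have degree 2, and D has degree 1, so the distribution has one node each at degrees 1 and 3 and two nodes at degree 2.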

Question 130 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is a common application of the document vector model?

*A: Similarity search

Feedback: Correct! The document vector model is commonly used in similarity search to find
documents similar to a given query.

B: Creating relational databases

Feedback: Incorrect. Relational databases are structured differently and do not use the document vector
model.

C: Sorting numerical data

Feedback: Incorrect. Sorting numerical data does not typically involve the document vector model.

D: Performing image recognition

Feedback: Incorrect. Image recognition involves different models and techniques, not the document
vector model.
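
A minimal sketch of similarity search under the document vector model: each text becomes a vector of raw term counts, and documents are ranked by cosine similarity to the query. The documents, the query, and the whitespace tokenization are simplifying assumptions; real systems usually apply weighting such as TF-IDF.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents and query.
documents = ["big data storage", "graph data models", "image pixel arrays"]
query = to_vector("data storage")

ranked = sorted(
    documents,
    key=lambda doc: cosine_similarity(query, to_vector(doc)),
    reverse=True,
)
print(ranked)
```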

Question 131 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which operation is used to find the shortest path between two nodes in a graph?

*A: Dijkstra's algorithm

Feedback: Excellent! Dijkstra's algorithm is commonly used to find the shortest path in a graph.

B: Depth-first search

Feedback: Not quite. Depth-first search is used for exploring nodes and edges in graphs, not necessarily
for finding the shortest path.

C: Breadth-first search

Feedback: Incorrect. Breadth-first search can be used to find the shortest path in an unweighted graph,
but it ignores edge weights, so it is not the general method.

D: Prim's algorithm

Feedback: Incorrect. Prim's algorithm is used for finding the minimum spanning tree, not the shortest
path.

Question 132 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following layout algorithms in Gephi helps in visualizing the clusters within a graph by
grouping highly connected nodes together?

*A: Force Atlas

Feedback: Correct! The Force Atlas algorithm helps in visualizing clusters by grouping highly connected
nodes together.

B: Fruchterman-Reingold

Feedback: Not quite. Fruchterman-Reingold is a force-directed algorithm but it does not specifically
focus on clustering.

C: Random Layout

Feedback: Incorrect. Random Layout does not group nodes based on their connections.

D: Circular Layout

Feedback: No. Circular Layout arranges nodes in a circle but does not emphasize clustering.

Question 133 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

In the context of graph data models, what does the term 'connectivity' refer to?

*A: The degree to which nodes in a graph are connected with each other.

Feedback: Correct! Connectivity refers to the degree to which nodes in a graph are connected with each
other. This is a fundamental concept in graph theory and is crucial for understanding the structure and
function of networks.

B: The number of edges in a graph.

Feedback: Not quite. The number of edges in a graph is related to its density but does not specifically
refer to connectivity.

C: The shortest path between any two nodes in a graph.

Feedback: Incorrect. The shortest path between any two nodes is a specific measure within a graph, but
it is not the definition of connectivity.

D: The total number of nodes in a graph.

Feedback: No, the total number of nodes in a graph is its size, not its connectivity.

Question 134 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following operations are commonly associated with graph data models?

*A: Pathfinding

Feedback: Correct! Pathfinding is a fundamental operation in graph data models.

B: Sorting

Feedback: Incorrect. Sorting is not typically associated with graph data models.

*C: Neighborhood search


Feedback: Yes! Neighborhood search is a common operation in graph data models.

D: Data encryption

Feedback: No. Data encryption is not an operation associated with graph data models.

*E: Connectivity checks

Feedback: Correct! Connectivity checks are essential operations in graph data models.

Question 135 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following operations can be performed on graph data in Gephi?

*A: Statistical operations

Feedback: Correct! Gephi allows performing statistical operations on graph data.

*B: Layout algorithms

Feedback: Correct! Gephi enables users to apply layout algorithms on graph data.

C: Image processing

Feedback: Incorrect. Image processing is not an operation performed on graph data in Gephi.

D: Video editing

Feedback: Incorrect. Video editing is not an operation you can perform on graph data in Gephi.

E: Term frequency analysis

Feedback: Incorrect. Term frequency analysis is a text-retrieval operation (for example, in Lucene), not an operation performed on graph data in Gephi.

Question 136 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following operations can be performed using Lucene?

*A: Text searching

Feedback: Correct! Lucene is commonly used for text searching.


B: Image recognition

Feedback: Image recognition is not a feature of Lucene. It is primarily used for text queries.

*C: Term frequency analysis

Feedback: Correct! Lucene can be used for term frequency analysis.

D: Graph visualization

Feedback: Graph visualization is not a feature of Lucene. It is used for text document querying.

*E: Weighted queries

Feedback: Correct! Lucene supports weighted queries to rank results.

Question 137 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following characteristics are associated with the Vector Space Model?

*A: Uses tf-idf weighting

Feedback: Correct! The Vector Space Model often uses tf-idf weighting to measure the importance of
terms.

*B: Represents documents as vectors

Feedback: Correct! The model represents documents as vectors in a multi-dimensional space.

C: Emphasizes graph connectivity

Feedback: Incorrect. The Vector Space Model does not emphasize graph connectivity; this is related to
graph data models.

*D: Applies to text retrieval

Feedback: Correct! The Vector Space Model is widely applied in text retrieval and information retrieval
systems.

E: Requires hierarchical data structure

Feedback: Incorrect. The Vector Space Model does not require a hierarchical data structure.

Question 138 - numeric, easy difficulty


Question category: Module: Big Data Modeling (Part 2)

What is the minimum number of dimensions required to represent a text document in the Vector Space
Model?

*A: 1.0

Feedback: Correct! At least one dimension is required to represent a text document in the Vector Space
Model.

Default Feedback: Incorrect. At least one dimension is required to represent a text document in the
Vector Space Model.

Question 139 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the default damping factor value for the PageRank algorithm in Gephi?

*A: 0.85

Feedback: Correct! The default damping factor for the PageRank algorithm in Gephi is 0.85.

Default Feedback: Review the default settings for the PageRank algorithm in Gephi and try again.
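
To show where the damping factor enters the computation, here is a rough power-iteration sketch of PageRank in Python. It is a textbook-style formulation written for illustration, not Gephi's implementation. The function name, the fixed iteration count, and the toy link structure are assumptions, and every link target is assumed to appear as a key in the input.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict of node -> list of link targets."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share (1 - damping) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without out-links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page link structure.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_web)
print(scores)
```

With a damping factor of 0.85, 85% of each node's score flows along its out-links and the remaining 15% is redistributed evenly across all nodes.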

Question 140 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

How many dimensions are required to represent a document containing 500 unique words in the Vector
Space Model?

*A: 500.0

Feedback: Correct! Each unique word represents a dimension in the Vector Space Model.

Default Feedback: Incorrect. The number of dimensions corresponds to the number of unique words in
the document.

Question 141 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

If a document vector has the term frequencies [2, 3, 5], what is the Euclidean length (L2 norm) of the
vector?

*A: 6.164

Feedback: Correct! The Euclidean length (L2 norm) of the vector [2, 3, 5] is approximately 6.164.

Default Feedback: Incorrect. Please review the method to calculate the Euclidean length (L2 norm) of a
vector.
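
The calculation behind this answer is the square root of the sum of squared components, sqrt(2^2 + 3^2 + 5^2) = sqrt(38). A small Python sketch, with `euclidean_length` as an illustrative helper name:

```python
import math


def euclidean_length(vector):
    """Return the L2 norm: the square root of the sum of squared components."""
    return math.sqrt(sum(component ** 2 for component in vector))


length = euclidean_length([2, 3, 5])
print(round(length, 3))  # 6.164
```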

Question 142 - text match, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What term is used to describe the model that represents text documents as vectors for similarity search?
Please answer in all lowercase.

*A: vectorspacemodel

Feedback: Correct! The term is Vector Space Model.

*B: vsm

Feedback: Correct! VSM is the abbreviation for Vector Space Model.

Default Feedback: Incorrect. The term refers to a model that represents text documents as vectors for
similarity search.

Question 143 - text match, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which layout algorithm in Gephi is often used to visualize large-scale networks? Please answer in all
lowercase.

*A: forceatlas2

Feedback: Correct! ForceAtlas2 is commonly used to visualize large-scale networks in Gephi.

*B: force atlas 2

Feedback: Correct! Force Atlas 2 is commonly used to visualize large-scale networks in Gephi.

*C: forceatlas

Feedback: Correct! Force Atlas is commonly used to visualize large-scale networks in Gephi.

Default Feedback: Review the layout algorithms in Gephi and try again. Focus on those suited for
large-scale networks.

Question 144 - text match, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which tool is commonly used to query text documents in this course? Please answer in all lowercase.

*A: lucene

Feedback: Correct! Lucene is the tool commonly used to query text documents in this course.

B: solr

Feedback: Incorrect. Solr is built on top of Lucene, but the tool used in this course for querying text
documents is Lucene.

Default Feedback: Incorrect. The tool used to query text documents in this course is Lucene.

Question 145 - text match, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which tool can be used to query text documents? Please answer in all lowercase.

*A: lucene

Feedback: Correct! Lucene is used for querying text documents.

Default Feedback: Incorrect. You should revisit the section on tools used for querying text documents.

Question 146 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the purpose of performing statistical operations on graph data in Gephi?

*A: To identify patterns and trends within the data

Feedback: Correct! Statistical operations help in identifying patterns and trends within the graph data.

B: To delete irrelevant nodes

Feedback: Incorrect. Statistical operations are not used for deleting nodes.

C: To import new data into the graph

Feedback: Incorrect. Importing new data is a separate process from performing statistical operations.

D: To export the graph data to CSV

Feedback: Incorrect. Exporting data is a separate operation in Gephi.

Question 147 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the primary advantage of using arrays as a data model?

*A: They allow for efficient data retrieval and storage

Feedback: Correct! Arrays provide efficient ways to store and retrieve data due to their indexed nature.

B: They can only store integer data types

Feedback: Incorrect. Arrays can store various data types, not just integers.

C: They eliminate the need for data validation

Feedback: Incorrect. Data validation is still necessary when using arrays.

D: They are immutable and cannot be changed

Feedback: Incorrect. Arrays can be mutable depending on the programming language.

Question 148 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Why can images be modeled as vector arrays?

*A: Because images are composed of pixel values that can be represented in multi-dimensional arrays

Feedback: Correct! Images consist of pixel values that can be efficiently represented using multi-
dimensional arrays.

B: Because images do not require any storage space

Feedback: Incorrect. Images do require storage space, often large amounts.

C: Because images are immutable data

Feedback: Incorrect. Images can be modified and are not necessarily immutable.

D: Because images consist only of binary data


Feedback: Incorrect. Images can consist of more than just binary data, including color values and other
information.
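
One way to picture the correct answer is a small image stored as a rows x columns x channels array. The sketch below uses plain nested Python lists; the pixel values are made up, and the grayscale conversion is a simple channel average chosen for brevity rather than a perceptual weighting.

```python
# A hypothetical 2 x 3 RGB image: image[row][column] is a [red, green, blue] pixel.
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])


def to_grayscale(rgb_image):
    """Collapse each pixel vector to one value by averaging its channels."""
    return [[sum(pixel) // len(pixel) for pixel in row] for row in rgb_image]


gray = to_grayscale(image)
print(height, width, channels)
print(gray)
```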

Question 149 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which operations are commonly associated with graph data models?

*A: Path operations

Feedback: Correct! Path operations are fundamental in graph data models for finding routes between
nodes.

B: Sorting operations

Feedback: Incorrect. Sorting operations are not specifically related to graph data models.

*C: Neighborhood operations

Feedback: Correct! Neighborhood operations are used to find the adjacent nodes in a graph.

D: Compilation operations

Feedback: Incorrect. Compilation operations are not related to graph data models.

*E: Connectivity operations

Feedback: Correct! Connectivity operations determine how nodes are connected within the graph.
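
As a rough illustration of neighborhood and connectivity operations, the Python sketch below works on an undirected graph stored as an adjacency dictionary. The helper names and sample graphs are assumptions, and every node is assumed to appear as a key.

```python
def neighbors(graph, node):
    """Neighborhood operation: the set of nodes adjacent to the given node."""
    return set(graph.get(node, ()))


def is_connected(graph):
    """Connectivity operation: True if every node is reachable from any other."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for adjacent in graph.get(node, ()):
            if adjacent not in seen:
                seen.add(adjacent)
                stack.append(adjacent)
    return len(seen) == len(graph)


# Hypothetical graphs: the second one has an isolated node "D".
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
with_island = {"A": ["B"], "B": ["A"], "D": []}

print(neighbors(chain, "B"))
print(is_connected(chain), is_connected(with_island))
```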

Question 150 - text match, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the concept used in information retrieval to represent documents in a continuous space? Please
answer in all lowercase.

*A: vectorspacemodel

Feedback: Correct! The Vector Space Model is used to represent documents in a continuous space.

Default Feedback: Incorrect. The concept is related to representing documents in a multi-dimensional
space.

Question 151 - text match, easy difficulty


Question category: Module: Big Data Modeling (Part 2)

Which tool is used for querying text documents in the lesson? Please answer in all lowercase.

*A: lucene

Feedback: Correct! Lucene is the tool used for querying text documents.

*B: apachelucene

Feedback: Correct! Apache Lucene is the full name of the tool used.

Default Feedback: Incorrect. Please review the lesson materials on the tools used for querying text
documents.

Question 152 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

How many neighbors does a node have in a complete graph with 5 nodes?

*A: 4.0

Feedback: Correct! In a complete graph with 5 nodes, each node has 4 neighbors.

Default Feedback: Incorrect. Remember that in a complete graph, every node is connected to every other
node.
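
The answer follows from the definition: in a complete graph with n nodes, each node is adjacent to the other n - 1 nodes. A quick Python check, where `complete_graph` is an illustrative helper:

```python
def complete_graph(n):
    """Build a complete graph on nodes 0..n-1 as an adjacency dictionary."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


k5 = complete_graph(5)
neighbor_counts = [len(adjacent) for adjacent in k5.values()]
edge_count = sum(neighbor_counts) // 2  # each undirected edge is counted twice

print(neighbor_counts)  # [4, 4, 4, 4, 4]
print(edge_count)       # 10
```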

Question 153 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

If you import a graph with 200 nodes and apply a community detection algorithm, you find 5
communities. What is the average number of nodes per community?

*A: 40.0

Feedback: Correct! Dividing the number of nodes by the number of communities gives the average
number of nodes per community.

Default Feedback: Incorrect. Recalculate the average by dividing the total number of nodes by the
number of communities.

Question 154 - numeric, easy difficulty

Question category: Module: Big Data Modeling (Part 2)


In a vector space model, if a document vector is represented as \[ \mathbf{d} = (3, 4) \], what is the
Euclidean norm of this vector?

*A: 5.0

Feedback: Great job! You used the Euclidean formula correctly to calculate the norm.

Default Feedback: Remember to use the Euclidean formula: \[ ||\mathbf{d}|| = \sqrt{x^2 + y^2} \].
Revisit the concept of vector norms in the course material.

Question 155 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which operations can you perform on graph data using Gephi?

*A: Statistical operations

Feedback: Correct! Gephi supports statistical operations on graph data.

*B: Layout algorithms

Feedback: Correct! Gephi can perform layout algorithms on graph data.

C: Image editing

Feedback: Incorrect. Gephi is not used for image editing.

D: Weighted queries

Feedback: Incorrect. Weighted queries are a feature of text search tools such as Lucene, not an operation performed on graph data in Gephi.

E: Document processing

Feedback: Incorrect. Gephi does not process documents.

Question 156 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What does the term 'term frequency inverse document frequency' (TF-IDF) typically measure in a set of
documents?

*A: The importance of a term in relation to a collection of documents


Feedback: Correct! TF-IDF measures the importance of a term in a document relative to a collection of
documents.

B: The number of documents containing a specific term

Feedback: Incorrect. This describes document frequency, not TF-IDF.

C: The frequency of terms in a single document

Feedback: Incorrect. This is just term frequency, not considering inverse document frequency.

D: The total number of terms in a document

Feedback: Incorrect. TF-IDF is not concerned with the total number of terms in a document.
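
To make the distinction between term frequency, document frequency, and TF-IDF concrete, here is a minimal Python sketch. It uses one common variant, raw term count multiplied by log(N / document frequency). Other variants exist, and search libraries such as Lucene use their own scoring formulas. The tiny corpus and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # term frequency within this document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = ["graph data model", "vector space model", "graph layout in gephi"]
weights = tf_idf(corpus)
print(weights[0])
```

In the first document, "data" outweighs "model" because "model" also appears in a second document. A term that occurs in every document receives a weight of zero.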

Question 157 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What is the primary purpose of using Gephi when importing a CSV file?

*A: To visualize and analyze the structure of graph data

Feedback: Correct! Gephi is primarily used for visualizing and analyzing graph data.

B: To edit the CSV file content before analysis

Feedback: Incorrect. Gephi is not used for editing CSV file content.

C: To convert CSV data into a SQL database

Feedback: Incorrect. Gephi does not convert CSV files into SQL databases.

D: To perform text mining on CSV data

Feedback: Incorrect. Gephi is not used for text mining purposes.

Question 158 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

What does the Graph Data Model primarily represent?

*A: The Graph Data Model represents data as a collection of nodes and edges.

Feedback: Correct! The Graph Data Model uses nodes and edges to represent entities and their
relationships.

B: The Graph Data Model represents data as hierarchical trees.

Feedback: Not quite. Hierarchical trees are typically associated with tree data models.

C: The Graph Data Model is mainly used for tabular data representation.

Feedback: Incorrect. Tabular data representation is a characteristic of relational data models.

D: The Graph Data Model uses key-value pairs exclusively.

Feedback: This describes key-value data models, not graph data models.

Question 159 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following statements describe characteristics of arrays as a data model?

*A: Arrays consist of elements identified by index.

Feedback: Correct! Arrays use indices to uniquely identify their elements.

B: Arrays can dynamically change size.

Feedback: Not exactly. Arrays have a fixed size when declared, unlike other data structures like lists.

*C: Arrays store elements of the same data type.

Feedback: Correct! All elements in an array are of the same data type.

D: Arrays allow storage of mixed data types.

Feedback: Incorrect. Arrays do not support storing elements of different data types.

Question 160 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

How are images typically modeled in data systems?

*A: Images can be represented as matrix arrays of pixel vectors.

Feedback: Correct! Images are modeled as matrices where each element is a pixel vector.

B: Images can only be represented as scalar numbers.

Feedback: Not quite. Images are complex data models that involve more than just scalar values.

C: Images are exclusively represented as text data.

Feedback: Incorrect. Images are visual data and are not represented as text.

D: Images utilize hierarchical structures for representation.

Feedback: This is related to tree data models, not image representation.

Question 161 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

In the context of the document vector model, how is a document typically represented for similarity
search?

*A: A document is represented as a vector where each element corresponds to a term's frequency.

Feedback: Correct! The document vector model uses term frequencies to represent documents as vectors
for similarity search.

B: A document is represented by the number of pages it contains.

Feedback: Incorrect. The document vector model does not use the number of pages as a representation.

C: A document is represented by its file size in megabytes.

Feedback: Incorrect. File size is not used in the document vector model for similarity search.

D: A document is represented as an array of its paragraphs.

Feedback: Incorrect. Although paragraphs could be part of a document, the document vector model
focuses on term frequencies instead.
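
A small Python sketch of the representation described in the correct answer: each document becomes a vector with one dimension per vocabulary term, and similarity is measured here with the cosine of the angle between vectors. The sample documents, the whitespace tokenizer, and the choice of cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Return the term-frequency vector of text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["big data big models", "big data systems", "graph layout"]
# One dimension per unique term across the collection.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [term_vector(doc, vocabulary) for doc in docs]

print(vocabulary)
print(vectors)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Documents that share terms score above zero, while documents with no terms in common score exactly zero.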

Question 162 - multiple choice, shuffle, easy difficulty

Question category: Module: Big Data Modeling (Part 2)

Which of the following steps is necessary to import a CSV file into Gephi and perform statistical
operations on the graph data?

*A: Load the CSV file and configure data laboratory settings

Feedback: Correct! Configuring the data laboratory settings is crucial for managing your data efficiently.

B: Use Gephi's built-in text editor to modify the CSV file

Feedback: Gephi does not have a built-in text editor for modifying CSV files. Consider other tools for
editing.

C: Export the CSV file in XML format before importing

Feedback: Exporting to XML is unnecessary for CSV import in Gephi. Focus on the CSV import
process.

D: Visualize the graph data before importing the CSV file

Feedback: Visualization occurs after data import, not before. Ensure you follow the correct order.

Question 163 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Which component of an Information System is responsible for transforming data into meaningful
information?

*A: Processing

Feedback: Correct! Processing is the component responsible for transforming data into meaningful
information.

B: Storage

Feedback: Incorrect. Storage is responsible for storing data, not transforming it.

C: Input

Feedback: No, input is responsible for collecting data, not transforming it.

D: Output

Feedback: Incorrect. Output is responsible for presenting the processed information, not transforming
data.

Question 164 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Which of the following best describes an Information System?


*A: A set of components that collect, store, and process data

Feedback: Correct! An Information System is indeed a set of components that collect, store, and process
data.

B: A type of software used for graphic design

Feedback: Incorrect. Graphic design software is not an Information System.

C: A hardware device like a computer or printer

Feedback: Not quite. While hardware devices can be part of an Information System, they are not
Information Systems by themselves.

D: A network of interconnected devices

Feedback: Incorrect. Though networks can be part of an Information System, they are not Information
Systems themselves.

Question 165 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Which of the following best explains the role of feedback in an Information System?

*A: Feedback helps in evaluating and refining the system's performance.

Feedback: Correct! Feedback is essential for evaluating and improving the performance of an
Information System.

B: Feedback determines the hardware components required for the system.

Feedback: Incorrect. Feedback is not related to determining hardware components.

C: Feedback is used to create new data for the system.

Feedback: Incorrect. Feedback does not create new data but helps in evaluating existing data.

D: Feedback is responsible for securing the data within the system.

Feedback: Incorrect. Feedback is not responsible for data security.

Question 166 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Which of the following are considered key activities in the development of an Information System?

*A: System Design

Feedback: Correct! System Design is a crucial activity in the development of an Information System.

*B: Data Collection

Feedback: Correct! Data Collection is essential for providing the necessary information to the system.

C: Marketing Strategy

Feedback: Incorrect. Marketing Strategy is not a key activity in the development of an Information
System.

*D: System Implementation

Feedback: Correct! System Implementation is a major phase in the development of an Information
System.

E: Customer Feedback Analysis

Feedback: Incorrect. Customer Feedback Analysis is not a direct activity in the development of an
Information System.

Question 167 - text match, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

What is the term used to describe the process of converting raw data into meaningful information in an
Information System? Please answer in all lowercase.

*A: processing

Feedback: Correct! Processing is the term used to describe the conversion of raw data into meaningful
information.

*B: data processing

Feedback: Correct! Data processing is used to describe this conversion process.

Default Feedback: Incorrect. Please review the lesson on information processing.

Question 168 - text match, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

What is the term for the data processing cycle component that involves transforming raw data into
meaningful output? Please answer in all lowercase.

*A: processing

Feedback: Correct! Processing is the stage where data is transformed into meaningful information.

*B: transformation

Feedback: Correct! Transformation is another term for processing data into meaningful output.

Default Feedback: Review the stages of the data processing cycle to find the correct term.

Question 169 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Select all the components that are essential to an Information System.

*A: Hardware

Feedback: Yes, hardware is a fundamental component of an Information System.

*B: Software

Feedback: Correct! Software is crucial for the operation of an Information System.

C: Culture

Feedback: Culture, while important in organizations, is not a component of an Information System.

*D: Data

Feedback: Correct! Data is a key component of an Information System.

*E: Processes

Feedback: Indeed, processes are essential to Information Systems.

Question 170 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

Which of the following best describes an Information System?

*A: A set of components that collect, store, and process data


Feedback: Correct! An Information System is designed to collect, store, and process data efficiently.

B: A framework for hardware and software only

Feedback: Close, but an Information System encompasses more than just hardware and software.

C: A network of enterprises and resources

Feedback: Not quite. While networks are involved, an Information System specifically pertains to data
handling.

D: A collection of software applications

Feedback: This is part of it, but an Information System is more comprehensive, including people and
processes.

Question 171 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

What is an Information System?

A: An automated method to collect and process data

Feedback: Consider how data collection and processing are typically handled.

*B: A coordinated system of people, processes, and technology

Feedback: Correct! This is a comprehensive definition of an Information System.

C: A network of software applications

Feedback: Think about the broader scope of Information Systems beyond just software.

D: A database management tool

Feedback: While databases are part of Information Systems, they are not the entirety of it.

Question 172 - multiple choice, shuffle, easy difficulty

Question category: Module: Designing a Big Data Management System for an Online Game

What is the primary purpose of an Information System?

A: Data storage

Feedback: Consider whether data storage alone is the primary purpose.

*B: Enhancing decision making

Feedback: Correct! One of the main purposes of Information Systems is to aid in decision making.

C: Networking computers

Feedback: Think about the broader goals of Information Systems beyond networking.

D: Automating tasks

Feedback: While automation can be a function, it is not the primary purpose of Information Systems.

Question 173 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following use cases is most appropriate for a data model?

*A: Designing the structure of a new database.

Feedback: Correct! Designing the structure of a new database is an appropriate use case for a data
model.

B: Transferring data between two different systems.

Feedback: Incorrect. Transferring data between systems is more related to data formats which define
how data is encoded for transfer.

C: Compressing large datasets for storage efficiency.

Feedback: Incorrect. Compressing datasets is related to data formats which can include compression
algorithms.

D: Visualizing data for analysis.

Feedback: Incorrect. Visualizing data is not directly related to using data models; it involves using
visualization tools and techniques.

Question 174 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best explains the difference between a data format and a data model?

*A: A data format specifies how data is encoded, while a data model defines the structure and
relationships of data.

Feedback: Correct! A data format indeed specifies how data is encoded and stored, such as CSV, JSON,
or XML, while a data model defines how data is structured and related.

B: A data format defines the structure of data, while a data model specifies how data is encoded.

Feedback: Incorrect. This statement reverses the definitions. A data model defines the structure of data,
whereas a data format specifies how data is encoded.

C: A data format is used for data storage, while a data model is used for data retrieval.

Feedback: Not quite. Both data formats and data models can be used for storage and retrieval, but the
key difference lies in encoding versus structure.

D: A data format is language-specific, while a data model is language-agnostic.

Feedback: Incorrect. Data formats can be language-agnostic (like CSV), and data models are generally
language-neutral as well.
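
The distinction can be seen by encoding the same records in two formats. In the Python sketch below, the data model (records with a `station` string and a `temp_c` number) stays fixed while the encoding changes between CSV and JSON. The field names and values are hypothetical.

```python
import csv
import io
import json

# The data model: each record has a station identifier and a temperature.
records = [
    {"station": "A1", "temp_c": 21.5},
    {"station": "B2", "temp_c": 19.0},
]


def to_csv(rows):
    """Encode the records in CSV format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["station", "temp_c"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Decode CSV text back into records, restoring the numeric type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"station": row["station"], "temp_c": float(row["temp_c"])}
        for row in reader
    ]


csv_text = to_csv(records)
json_text = json.dumps(records)

print(csv_text)
print(json_text)
```

Decoding either encoding recovers the same records. The format determines how the bytes are laid out, while the model determines what the records mean.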

Question 175 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which type of plot is best suited for visualizing the distribution of a single variable in streaming weather
data?

*A: Histogram

Feedback: Correct! Histograms are ideal for visualizing the distribution of a single variable's values.

B: Box plot

Feedback: Incorrect. Box plots summarize the spread of the data and highlight outliers, but they do not
show the full shape of the distribution.

C: Heat map

Feedback: Incorrect. Heat maps are better suited for visualizing data density or relationships between
variables in a matrix.

D: Line plot

Feedback: Incorrect. Line plots are better for showing trends over time, not the distribution of a single
variable.

Question 176 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which data visualization technique is most effective for identifying trends in real-time streaming
weather data?

*A: Line plot

Feedback: Correct! Line plots are ideal for visualizing trends over time in real-time streaming data.

B: Scatter plot

Feedback: Incorrect. Scatter plots are better for showing relationships between two variables, not for
identifying trends over time.

C: Pie chart

Feedback: Incorrect. Pie charts are better for showing proportions of a whole, not for visualizing trends
over time.

D: Bar chart

Feedback: Incorrect. Bar charts are better for comparing discrete categories, not for visualizing trends in
streaming data.

Question 177 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best describes the concept of schema on read?

A: The schema is applied to the data as it is written into storage.

Feedback: Incorrect. Applying the schema as data is written into storage describes schema on write, not
schema on read.

*B: The schema is applied to the data only when it is read, allowing for more flexibility in data storage.

Feedback: Correct! Schema on read applies the schema to the data only when it is read, which allows for
more flexibility in data storage.

C: The schema is used to transform the data before writing it into storage.

Feedback: Incorrect. Using the schema to transform the data before writing it into storage is not
characteristic of schema on read.

D: The schema is predefined and must be adhered to strictly when writing and reading data.

Feedback: Incorrect. A predefined schema that must be adhered to strictly when writing and reading data
aligns with the concept of schema on write, not schema on read.
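
A minimal Python sketch of schema on read, under assumed names and data: raw lines are stored exactly as they arrive, and a schema (field names plus type casts) is applied only when the data is read. The store contents, the `schema` dictionary, and `read_with_schema` are all hypothetical.

```python
import json

# Raw storage: lines are kept as received, with no schema enforced on write.
raw_store = [
    '{"station": "A1", "temp_c": "21.5", "note": "ok"}',
    '{"station": "B2", "temp_c": "19"}',
]

# The schema is chosen by the reader, not the writer.
schema = {"station": str, "temp_c": float}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and keep only the schema's fields, cast to its types."""
    for line in raw_lines:
        record = json.loads(line)
        yield {field: cast(record[field]) for field, cast in schema.items()}


rows = list(read_with_schema(raw_store, schema))
print(rows)
```

A different reader could apply a different schema to the same stored lines, which is the flexibility the correct answer refers to.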

Question 178 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

What is a data stream and what are its characteristics?

*A: A sequence of data elements made available over time, typically unbounded and continuous.

Feedback: Correct! A data stream is indeed a sequence of data elements made available over time, and it
is typically unbounded and continuous.

B: A fixed set of data stored in a database, typically bounded and static.

Feedback: Incorrect. A fixed set of data stored in a database is not a data stream; such data is typically
bounded and static.

C: A collection of data packets sent over a network, typically bounded and discrete.

Feedback: Incorrect. While data packets can be part of a data stream, a data stream itself is not just a
collection of data packets and is typically unbounded and continuous.

D: A series of data transformations applied to a dataset, typically bounded and finite.

Feedback: Incorrect. Data transformations applied to a dataset do not define a data stream, as a data
stream is unbounded and continuous.
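
The unbounded, continuous nature of a stream can be sketched with a Python generator that never terminates on its own. The consumer processes one element at a time and decides how many to take. The readings are simulated with a seeded random generator, and the function names are illustrative assumptions.

```python
import itertools
import random


def weather_stream(seed=0):
    """Yield simulated temperature readings forever (an unbounded stream)."""
    rng = random.Random(seed)
    for tick in itertools.count():
        yield {"tick": tick, "temp_c": round(20 + rng.uniform(-2, 2), 1)}


def running_mean(stream, field):
    """Consume a stream element by element, yielding the mean seen so far."""
    total = 0.0
    for count, reading in enumerate(stream, start=1):
        total += reading[field]
        yield total / count


# The stream has no end, so the consumer takes only the first five results.
first_five = list(itertools.islice(running_mean(weather_stream(), "temp_c"), 5))
print(first_five)
```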

Question 179 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which programming language is commonly used for creating plots of streaming weather station data?

*A: Python

Feedback: Correct! Python is widely used for data analysis and visualization, including creating plots of
streaming weather data.

B: Java

Feedback: Incorrect. Java is not commonly used for creating plots of streaming weather data.

C: C++

Feedback: Incorrect. C++ is not typically used for data visualization tasks like plotting streaming
weather data.

D: Ruby

Feedback: Incorrect. Ruby is not a common choice for data visualization of streaming weather data.

Question 180 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

When analyzing real-time streaming data from a weather station, which of the following techniques can
be used to handle missing data?

*A: Interpolation

Feedback: Correct! Interpolation estimates missing values within the range of available data points.

B: Replication

Feedback: Incorrect. Replication duplicates existing values, which may not provide accurate estimates
for missing data.

C: Extrapolation

Feedback: Incorrect. Extrapolation estimates values outside the range of available data, which is not
suitable for filling missing data within the range.

D: Random Sampling

Feedback: Incorrect. Random sampling does not address the problem of missing data in a systematic
way.
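
Note: To illustrate the interpolation idea in Question 180, here is a minimal sketch in Python. It assumes missing readings arrive as None and that a straight line between the nearest known neighbors is an acceptable estimate; the function name and the sample readings are hypothetical, not taken from the course material.

```python
def interpolate_missing(readings):
    """Fill None gaps by linear interpolation between the nearest known values.

    Leading or trailing gaps stay None: filling them would require
    extrapolation, which goes outside the range of the known points.
    """
    filled = list(readings)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / span
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])
    return filled


# Hypothetical temperature readings in degrees Celsius with two gaps.
temps = [20.0, None, 22.0, 21.0, None]
print(interpolate_missing(temps))  # [20.0, 21.0, 22.0, 21.0, None]
```

The final gap is left unfilled on purpose, since no later reading bounds it.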

Question 181 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

What is the primary use case for a data model in database design?

*A: To define the structure, storage, and retrieval of data

Feedback: Correct! Data models define how data is structured, stored, and retrieved in a database.

B: To visualize data in charts and graphs


Feedback: Incorrect. Data models are used for defining data structures, not for visualizing data.

C: To convert data between different formats

Feedback: Incorrect. Data models are not primarily used for data conversion.

D: To compress and decompress data

Feedback: Incorrect. Data models do not deal with data compression.

Question 182 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best describes a CSV's data format?

*A: A plain text format for tabular data

Feedback: Correct! CSV is a plain text format used to represent tabular data.

B: A binary format for structured data

Feedback: Incorrect. CSV is a plain text format, not a binary format.

C: A format for unstructured data

Feedback: Incorrect. CSV organizes data in a structured, tabular format.

D: A format for hierarchical data

Feedback: Incorrect. CSV is used for tabular data, not hierarchical data.

Question 183 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following is a key characteristic of a data stream?

*A: Continuous flow of data

Feedback: Correct! A key characteristic of a data stream is the continuous flow of data.

B: Fixed data format

Feedback: Incorrect. Data streams often have variable data formats, not fixed.

C: High latency

Feedback: Incorrect. Data streams are typically characterized by low latency.

D: Batch processing

Feedback: Incorrect. Data streams are processed in real-time, not in batches.

Question 184 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best describes the significance of data lakes in data management?

*A: Data lakes provide a centralized repository for storing large volumes of raw data.

Feedback: Correct! Data lakes indeed serve as a centralized repository for storing large volumes of raw
data, enabling efficient data management.

B: Data lakes primarily focus on the real-time processing of streaming data.

Feedback: Incorrect. While data lakes can store streaming data, their primary focus is on providing a
centralized repository for large volumes of raw data.

C: Data lakes are designed to replace traditional data warehouses entirely.

Feedback: Incorrect. Data lakes complement traditional data warehouses but do not necessarily replace
them.

D: Data lakes are only useful for structured data.

Feedback: Incorrect. Data lakes are designed to handle structured, semi-structured, and unstructured
data.

Question 185 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best describes the significance of data lakes in data management?

*A: Data lakes enable the storage of both structured and unstructured data, allowing for greater
flexibility in data processing.

Feedback: Correct! Data lakes provide a flexible storage solution that can handle various data formats,
making it easier to perform diverse types of data processing.

B: Data lakes are designed to store only structured data, which streamlines data processing activities.

Feedback: Incorrect. Data lakes are capable of storing both structured and unstructured data, offering
more flexibility than systems designed for structured data only.

C: Data lakes enhance data security by restricting access to specific users and applications.

Feedback: Incorrect. While data lakes can include security measures, their primary significance is in
their ability to store and manage diverse data types.

D: Data lakes require real-time processing of data streams, making them ideal for time-sensitive
applications.

Feedback: Incorrect. Data lakes are typically used for batch processing and can handle large volumes of
data, but they are not inherently designed for real-time processing.

Question 186 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Which of the following are examples of data formats?

*A: CSV

Feedback: Correct! CSV is a common data format used for encoding tabular data.

*B: JSON

Feedback: Correct! JSON is a widely-used data format for encoding structured data.

C: Relational schema

Feedback: Incorrect. A relational schema is an example of a data model, not a data format.

*D: XML

Feedback: Correct! XML is a versatile data format used for encoding structured data.

E: ER diagram

Feedback: Incorrect. An ER diagram is an example of a data model, not a data format.

Question 187 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models


Which of the following are characteristics of semi-structured data?

*A: It has a flexible schema.

Feedback: Correct! Semi-structured data does have a flexible schema, which allows for easier
integration of different data sources.

B: It is stored in a relational database.

Feedback: Incorrect. Semi-structured data is typically not stored in relational databases but in formats
like JSON or XML.

*C: It includes metadata tags.

Feedback: Correct! Semi-structured data includes metadata tags that help define the data structure.

D: It lacks any organizational structure.

Feedback: Incorrect. Semi-structured data does have some organizational structure, provided by the
metadata tags.

E: It is always unformatted.

Feedback: Incorrect. Semi-structured data can have some level of formatting, unlike completely
unstructured data.

Question 188 - checkbox, shuffle, partial credit, medium

Question category: Module: Working With Data Models

Select the characteristics that differentiate streaming data from traditional data processing.

*A: Low-latency processing

Feedback: Correct! Low-latency processing is a key characteristic of streaming data.

B: Batch processing

Feedback: Incorrect. Batch processing is characteristic of traditional data processing, not streaming data.

*C: Real-time analytics

Feedback: Correct! Real-time analytics is a significant feature of streaming data.

D: Finite data sets


Feedback: Incorrect. Finite data sets are typical in traditional data processing, not in streaming data.

*E: Continuous data input

Feedback: Correct! Continuous data input is a hallmark of streaming data.

F: Delayed processing

Feedback: Incorrect. Delayed processing is not a characteristic of streaming data; it is more typical of
traditional data processing.

Question 189 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Select all correct examples of data models.

*A: Relational model

Feedback: Correct! The relational model is a type of data model.

*B: Entity-relationship model

Feedback: Correct! The entity-relationship model is a type of data model.

C: JSON

Feedback: Incorrect. JSON is a data format, not a data model.

*D: Hierarchical model

Feedback: Correct! The hierarchical model is a type of data model.

E: XML

Feedback: Incorrect. XML is a data format, not a data model.

Question 190 - checkbox, shuffle, partial credit, medium

Question category: Module: Working With Data Models

Identify the requirements needed for an effective streaming data system.

*A: Low latency processing

Feedback: Correct. Low latency processing is a crucial requirement for effective streaming data systems.

*B: High scalability

Feedback: Correct. High scalability is essential for handling large volumes of streaming data.

C: Batch processing capabilities

Feedback: Incorrect. Batch processing is not a primary requirement for streaming data systems, which
focus on real-time processing.

*D: Fault-tolerant architecture

Feedback: Correct. Fault-tolerant architecture ensures the reliability of streaming data systems.

E: Fixed schema

Feedback: Incorrect. Streaming data systems often require flexible schemas to handle varying data
formats.

Question 191 - numeric, easy difficulty

Question category: Module: Working With Data Models

What is the ideal latency (in milliseconds) for a high-performance streaming data system?

*A: 100.0

Feedback: Correct! An ideal latency for a high-performance streaming data system is around 100
milliseconds.

Default Feedback: Incorrect. Consider the requirements for real-time data processing in high-
performance streaming systems.

Question 192 - numeric, easy difficulty

Question category: Module: Working With Data Models

What is the typical target latency (in milliseconds) for streaming data systems to process data?

*A: 100.0

Feedback: Correct! Streaming data systems typically aim for low-latency processing, often around 100
milliseconds.

Default Feedback: Incorrect. Streaming data systems generally aim for low-latency processing. Review
the course materials on the latency requirements for streaming data systems.

Question 193 - numeric, easy difficulty

Question category: Module: Working With Data Models

How many fields are there in a typical CSV file header if the file contains columns: Name, Age, and
Email?

*A: 3.0

Feedback: Correct! The CSV file header contains three fields: Name, Age, and Email.

Default Feedback: Incorrect. Remember that the number of fields in the header corresponds to the
number of columns in the CSV file.
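
Note: The header count in Question 193 can be checked with a short Python sketch using the standard csv module. The function name and the sample row are made up for illustration.

```python
import csv
import io


def header_field_count(csv_text):
    """Return the number of fields in the first (header) line of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return len(header)


# Hypothetical file contents with the three columns from the question.
sample = "Name,Age,Email\nAda,36,ada@example.com\n"
print(header_field_count(sample))  # 3
```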

Question 194 - text match, easy difficulty

Question category: Module: Working With Data Models

What does CSV stand for? Please answer in all lowercase.

*A: csv

Feedback: Correct! CSV stands for comma-separated values, which is a common format for tabular data.

*B: comma-separated values

Feedback: Correct! CSV is an abbreviation for comma-separated values.

Default Feedback: Incorrect. Review the concept of CSV and its full form.

Question 195 - text match, easy difficulty

Question category: Module: Working With Data Models

What is the term used for the continuous flow of data from a weather station? Please answer in all
lowercase.

*A: streaming

Feedback: Correct! 'Streaming' refers to the continuous flow of data from sources like weather stations.

*B: stream

Feedback: Correct! 'Stream' is another term used to describe the continuous flow of data.

Default Feedback: Incorrect. The term refers to the continuous flow of data from sources like weather
stations.

Question 196 - text match, easy difficulty

Question category: Module: Working With Data Models

Provide an example of a data model used in database design. Please answer in all lowercase.

*A: relational

Feedback: Correct! The relational model is a widely used data model in database design.

*B: entityrelationship

Feedback: Correct! The entity-relationship model is another common data model.

Default Feedback: Incorrect. Please review the data models commonly used in database design.

Question 197 - text match, easy difficulty

Question category: Module: Working With Data Models

Provide a common data format used for encoding tabular data. Please answer in all lowercase.

*A: csv

Feedback: Correct! CSV is a common data format used for encoding tabular data.

Default Feedback: Incorrect. The correct answer is a common data format used for encoding tabular
data.

Question 198 - text match, easy difficulty

Question category: Module: Working With Data Models

What term is used to describe data that is stored and not currently being processed or moved? Please
answer in all lowercase.

*A: dataatrest

Feedback: Correct! Data that is stored and not currently being processed or moved is referred to as data
at rest.

Default Feedback: Incorrect. Try again. Remember, the term describes data that is stored and not
currently being processed or moved.

Question 199 - text match, easy difficulty

Question category: Module: Working With Data Models

What is the process of estimating missing values within the range of a set of known data points called?
Please answer in all lowercase.

*A: interpolation

Feedback: Correct! Interpolation is the process of estimating missing values within the range of known
data points.

*B: interpolating

Feedback: Correct! Interpolating is the process of estimating missing values within the range of known
data points.

Default Feedback: Incorrect. Revisit the methods of handling missing data in real-time weather data
analysis.

Question 200 - text match, easy difficulty

Question category: Module: Working With Data Models

What term describes the processing of data in real-time as it arrives? Please answer in all lowercase.

*A: streaming

Feedback: Correct! Streaming describes the real-time processing of data as it arrives.

*B: stream

Feedback: Correct! Stream is also an acceptable term for real-time data processing.

*C: streamingdata

Feedback: Correct! Streamingdata is another acceptable term.

Default Feedback: Incorrect. Review the concept of real-time data processing and try again.

Question 201 - multiple choice, shuffle, easy difficulty


Question category: Module: Working With Data Models

When analyzing real-time streaming data from a weather station, which of the following metrics would
be most useful for determining if a sudden temperature drop has occurred?

*A: Temperature anomaly

Feedback: Correct! Temperature anomaly is a measure of deviation from a long-term average
temperature, which is useful for identifying sudden changes.

B: Wind speed variability

Feedback: Not quite. While wind speed variability can be important, it does not directly indicate a
sudden temperature drop.

C: Humidity index

Feedback: Incorrect. The humidity index measures moisture in the air, not temperature changes.

D: Precipitation rate

Feedback: Precipitation rate is related to rainfall, not temperature changes. Try again.
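
Note: The anomaly idea in Question 201 can be sketched in Python as follows. This is only an illustration: it uses a short rolling average as a stand-in for the long-term average mentioned in the feedback, and the window size, threshold, function name, and readings are all assumptions.

```python
from collections import deque


def detect_drops(readings, window=5, threshold=-3.0):
    """Yield (index, anomaly) for readings well below the recent average.

    The anomaly is the reading minus the mean of the previous `window`
    readings; values at or below `threshold` are reported as drops.
    """
    recent = deque(maxlen=window)
    for index, temp in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            anomaly = temp - baseline
            if anomaly <= threshold:
                yield index, anomaly
        recent.append(temp)


# Hypothetical stream: steady 20 degrees, then a sudden fall to 15.
temps = [20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 15.0, 15.0]
print(list(detect_drops(temps)))  # [(6, -5.0), (7, -4.0)]
```

Because the function is a generator, it handles one reading at a time and never needs the whole stream in memory.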

Question 202 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

Which of the following best describes a data stream?

*A: A continuous flow of data that is generated over time.

Feedback: Correct! A data stream is indeed a continuous flow of data generated over time.

B: A static dataset that is stored in a database.

Feedback: Incorrect. A data stream is not a static dataset stored in a database.

C: A batch of data processed at regular intervals.

Feedback: Not quite. A data stream is not processed in batches at regular intervals.

D: A temporary storage location for data.

Feedback: Incorrect. A data stream is not a temporary storage location for data.

Question 203 - multiple choice, shuffle, easy difficulty


Question category: Module: Working With Data Models

Which of the following best explains the difference between a data format and a data model?

*A: Data format defines the structure of data, while data model defines the relationships among the data.

Feedback: Correct! Data format pertains to the structure of data, whereas data model concerns the
relationships within the data.

B: Data format defines the relationships among the data, while data model defines the structure of data.

Feedback: Incorrect. Data format actually defines the structure of data, not the relationships.

C: Data format and data model are synonyms and can be used interchangeably.

Feedback: Incorrect. Data format and data model are distinct concepts and cannot be used
interchangeably.

D: Data format defines how data is stored, while data model defines how data is transmitted.

Feedback: Incorrect. Data format defines the structure, not specifically how it is stored or transmitted.

Question 204 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Which of the following are characteristics of a data stream?

*A: Continuous generation of data.

Feedback: Correct! Data streams are continuously generated.

B: Data is processed in batches.

Feedback: Incorrect. Data streams are not processed in batches.

*C: Requires real-time processing.

Feedback: Correct! Real-time processing is often required for data streams.

D: Data is static and unchanging.

Feedback: Incorrect. Data streams are dynamic and continuously changing.

*E: Data arrives in a sequence over time.


Feedback: Correct! Data in streams arrives in sequence over time.

Question 205 - checkbox, shuffle, partial credit, medium

Question category: Module: Working With Data Models

Which of the following are appropriate use cases for data models and data formats?

*A: Using JSON to store configuration settings.

Feedback: Correct! JSON is commonly used as a data format for storing configuration settings.

*B: Using an entity-relationship diagram to design a database.

Feedback: Correct! An entity-relationship diagram is a data model used for designing databases.

C: Using CSV to visualize complex data relationships.

Feedback: Incorrect. CSV is a data format that is not typically used for visualizing complex data
relationships.

*D: Using XML to represent hierarchical data.

Feedback: Correct! XML is a data format that is well-suited for representing hierarchical data.

E: Using UML diagrams to store large datasets.

Feedback: Incorrect. UML diagrams are data models used for system design, not for storing large
datasets.

Question 206 - checkbox, shuffle, partial credit, medium

Question category: Module: Working With Data Models

Which of the following visualizations would be most appropriate for interpreting real-time weather data
from a weather station?

*A: Line chart

Feedback: Correct! Line charts are excellent for showing trends over time, such as temperature or wind
speed.

B: Pie chart

Feedback: Incorrect. Pie charts are used for showing proportions and are not suitable for time-series
data.

C: Scatter plot

Feedback: Not quite. Scatter plots are useful for showing relationships between two variables but are not
ideal for time-series data.

*D: Bar chart

Feedback: Correct! Bar charts can be used for comparing different categories, such as daily rainfall
amounts.

E: Histogram

Feedback: Incorrect. Histograms are used for showing frequency distributions, not for real-time data
visualization.

Question 207 - text match, easy difficulty

Question category: Module: Working With Data Models

What is the term for applying a schema to data only when it is read, rather than when it is written into
storage? Please answer in all lowercase.

*A: schemaonread

Feedback: Correct! Schema on read applies the schema when the data is read, not when it is ingested.

*B: schema-on-read

Feedback: Correct! Schema on read applies the schema when the data is read, not when it is ingested.

Default Feedback: Incorrect. Please review the concept of schema on read and schema on write.
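
Note: A minimal Python sketch of schema on read, assuming raw records are stored as JSON text exactly as received and that a schema is a simple mapping from field name to a conversion function. The store, the function names, and the sample records are hypothetical.

```python
import json

raw_store = []  # stands in for raw storage: records are kept as received


def ingest(line):
    """Store the raw text untouched; no schema is checked on write."""
    raw_store.append(line)


def read_with_schema(schema):
    """Apply the schema only now, while reading: pick fields and convert types."""
    for line in raw_store:
        record = json.loads(line)
        yield {
            field: convert(record[field])
            for field, convert in schema.items()
            if field in record
        }


ingest('{"station": "A1", "temp": "21.5", "note": "ok"}')
ingest('{"station": "B2", "temp": "19"}')

rows = list(read_with_schema({"station": str, "temp": float}))
print(rows)
# [{'station': 'A1', 'temp': 21.5}, {'station': 'B2', 'temp': 19.0}]
```

A different reader could apply another schema to the same stored records, which is the flexibility that schema on read is meant to provide.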

Question 208 - text match, easy difficulty

Question category: Module: Working With Data Models

What type of plot is typically used to display temperature changes over time in a real-time data stream?
Please answer in all lowercase.

*A: line

Feedback: Correct! A line plot is typically used to display temperature changes over time.

*B: lineplot

Feedback: Correct! A line plot is typically used to display temperature changes over time.

Default Feedback: Incorrect. Review the types of plots used for time-series data visualization.

Question 209 - numeric, easy difficulty

Question category: Module: Working With Data Models

Consider a streaming data system that processes data with a latency of less than 5 seconds. What is the
maximum latency in seconds for this system to be considered real-time?

*A: 5.0

Feedback: Correct! Real-time systems typically have a latency of a few seconds or less.

Default Feedback: Consider how quickly data needs to be processed to be considered real-time.

Question 210 - numeric, easy difficulty

Question category: Module: Working With Data Models

If a CSV file contains 5 rows of data, how many lines will the file contain including the header?

*A: 6.0

Feedback: Correct! The file will have one header line and five lines of data, totaling six lines.

Default Feedback: Recall how many lines a CSV file contains with both data and header information.
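
Note: The line count in Question 210 can be confirmed with a small Python sketch. It assumes no field contains an embedded newline, since a quoted multi-line field would make the physical line count exceed the row count. The function name, column names, and rows are hypothetical.

```python
import csv
import io


def csv_line_count(header, rows):
    """Write a header plus data rows as CSV and count the resulting lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return len(buffer.getvalue().splitlines())


# Five hypothetical data rows plus one header line.
rows = [[f"station-{i}", 20 + i] for i in range(5)]
print(csv_line_count(["station", "temp_c"], rows))  # 6
```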

Question 211 - text match, easy difficulty

Question category: Module: Working With Data Models

What is a commonly used file extension for comma-separated values files? Please answer in all
lowercase.

*A: csv

Feedback: Correct! CSV is a widely recognized file extension for comma-separated values files.

Default Feedback: Consider revisiting the material on common file extensions for data formats.

Question 212 - text match, easy difficulty

Question category: Module: Working With Data Models

What is the term for the process of analyzing data as it flows through a system? Please answer in all
lowercase.

*A: streaming

Feedback: Correct! Streaming refers to the real-time analysis of data as it flows through a system.

*B: realtime

Feedback: Correct! However, the more appropriate term is 'streaming'.

Default Feedback: Think about how data is processed in real-time as it moves continuously.

Question 213 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Which of the following are characteristics of a data lake?

*A: Stores data in its raw format.

Feedback: Correct! Data lakes are known for storing data in its original, raw format.

B: Requires schema on write.

Feedback: Incorrect. Data lakes generally use schema on read, not schema on write.

*C: Supports batch processing of streaming data.

Feedback: Correct! Data lakes can support the batch processing of large volumes of data, including
streaming data.

D: Only supports structured data.

Feedback: Incorrect. Data lakes can store and process both structured and unstructured data.

E: Offers real-time data processing without delay.

Feedback: Not quite. While data lakes can process large volumes of data, they typically do not provide
real-time processing.

Question 214 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Which of the following elements are crucial when creating plots of streaming weather station data?
Select all that apply.

*A: Time-stamping data accurately.


Feedback: Time-stamping is crucial for understanding when each data point was recorded.

B: Implementing data encryption.

Feedback: While security is important, encryption is not directly related to plotting data.

*C: Correctly labeling axes.

Feedback: Labels help make plots understandable and are essential for viewers to interpret the data
correctly.

D: Ensuring data latency is minimized.

Feedback: Minimizing latency can be important but is more related to data transmission rather than
plotting.

*E: Choosing appropriate scales for data representation.

Feedback: Selecting suitable scales is vital to ensure data is represented accurately and trends are visible.

Question 215 - checkbox, shuffle, partial credit, easy difficulty

Question category: Module: Working With Data Models

Which of the following are considered data formats?

*A: CSV

Feedback: Correct! CSV is a common data format for storing tabular data.

*B: XML

Feedback: Correct! XML is a format used for data representation.

C: Relational Schema

Feedback: Incorrect. A relational schema is a data model, not a data format.

*D: JSON

Feedback: Correct! JSON is widely used as a data format for data interchange.

E: Entity-Relationship Diagram

Feedback: Not quite. An entity-relationship diagram is a type of data model.


Question 216 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

What is a key characteristic of a data stream?

*A: It processes data in real-time as it arrives.

Feedback: Correct! Data streams process data in real-time, allowing for immediate analysis and action.

B: It stores data for long-term analysis.

Feedback: Not quite. While storage is important, a key characteristic of a data stream is real-time
processing.

C: It only processes structured data.

Feedback: Incorrect. Data streams can process both structured and unstructured data.

D: It requires batch processing for analysis.

Feedback: Try again. Data streams are typically analyzed in real-time, rather than through batch
processing.

Question 217 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

What is the primary benefit of analyzing real-time streaming data from a weather station?

A: To predict future weather patterns accurately.

Feedback: Predicting future weather patterns requires more than just analyzing real-time data; it
involves using historical data and complex models.

B: To enhance the accuracy of weather sensors immediately.

Feedback: The accuracy of weather sensors is determined by their design and calibration, not the
analysis of streaming data.

*C: To react promptly to sudden weather changes.

Feedback: Correct! Analyzing streaming data helps in promptly reacting to changes such as sudden
storms or temperature drops.

D: To permanently store all weather data for long-term analysis.


Feedback: While storing data is important, streaming data analysis primarily focuses on immediate
insights rather than long-term storage.

Question 218 - multiple choice, shuffle, easy difficulty

Question category: Module: Working With Data Models

What is the primary distinction between a data format and a data model?

*A: A data format is concerned with storage, while a data model describes the structure.

Feedback: Correct! Data formats focus on how data is stored, whereas data models define the structure.

B: A data format describes data structure, while a data model is about storage.

Feedback: Not quite. Remember, data formats are about storage and data models define the structure.

C: Data formats and data models are essentially the same.

Feedback: This is incorrect. Data formats and data models serve different purposes.

D: Data models determine file size, while data formats determine data usage.

Feedback: Incorrect. File size and data usage are not the primary concerns of data models and formats.
