Question 1: Your Answer

This document contains 60 multiple choice questions about big data and Apache Hadoop. For each question, the user provides an answer, and the response indicates whether the answer is correct or not. The questions cover topics like Apache Spark, MapReduce, Hadoop ecosystem tools, Zookeeper, and more.

Uploaded by

Marwa Benayed
Copyright
© All Rights Reserved

Question 1
Which type of cell can be used to document and
comment on a process in a Jupyter notebook?
Your answer

  A.  Kernel

  B.  Markdown

  C.  Code

  D.  Output

That's right!

Question 2
Where does the unstructured data of a project reside
in Watson Studio?
Your answer

  A.  Object Storage

  B.  Database

  C.  Wrapper

  D.  Tables

That's right!
Big Data Engineer v2 Explorer Award for Students (2018)
Time Taken: 42:01

Question 3
Which Watson Studio offering used to be available
through something known as IBM Bluemix?
Your answer

  A.  Watson Studio Cloud

  B.  Watson Studio Desktop

  C.  Watson Studio Local

  D.  Watson Studio Business


That's right!

Question 4
What is the architecture of Watson Studio centered
on?
Your answer

  A.  Projects

  B.  Collaborators

  C.  Analytic Assets

  D.  Data Assets

That's right!

Question 5
Before you create a Jupyter notebook in Watson
Studio, which two items are necessary?
Your answer

  A.  URL

  B.  Project

  C.  File

  D.  Scala

  E.  Spark Instance

That's right!

Question 6
Which Spark Core function provides the main
element of Spark API?
Your answer

  A.  Mesos

  B.  MLlib

  C.  YARN

  D.  RDD

That's right!

Question 7
Which statement about Apache Spark is true?
Your answer

  A.  It runs on Hadoop clusters with RAM drives configured on each DataNode.

  B.  It features APIs for C++ and .NET.

  C.  It is much faster than MapReduce for complex applications on disk.

  D.  It supports HDFS, MS-SQL, and Oracle.

That's right!

Question 8
Under the YARN/MRv2 framework, which daemon is
tasked with negotiating with the NodeManager(s) to
execute and monitor tasks?
Your answer

  A.  JobMaster

  B.  ApplicationMaster

  C.  ResourceManager

  D.  TaskManager

That's right!

Question 9
What is the preferred replacement for Flume?
Your answer

  A.  NiFi

  B.  Druid

  C.  Hortonworks Data Flow

  D.  Storm

That's right!

Question 10
What is an example of a key-value type of NoSQL
datastore?
Your answer

  A.  Neo4j

  B.  REDIS

  C.  Sesame

  D.  MongoDB

That's right!

Question 11
Under the YARN/MRv2 framework, the Scheduler
and ApplicationsManager are components of which
daemon?
Your answer

  A.  ApplicationMaster

  B.  TaskManager

  C.  ScheduleManager

  D.  ResourceManager

That's right!

Question 12
What are two security features Apache Ranger
provides?
Your answer

  A.  Auditing

  B.  Authorization

  C.  Authentication

  D.  Availability

That's right!

Question 13
Which is the Java class prefix for the MapReduce v1
APIs?
Your answer

  A.  org.apache.mr

  B.  org.apache.mapreduce

  C.  org.apache.hadoop.mapred

  D.  org.apache.hadoop.mr

That's right!

Question 14
How can a Sqoop invocation be constrained to only
run one mapper?
Your answer

  A.  Use the -m 1 parameter.

  B.  Use the --single parameter.

  C.  Use the -mapper 1 parameter.

  D.  Use the --limit mapper=1 parameter.

That's right!

Question 15
Which data encoding format supports exact storage
of all data in binary representations such as
VARBINARY columns?
Your answer

  A.  RCFile

  B.  Parquet

  C.  SequenceFiles

  D.  Flat

That's right!

Question 16
Under the MapReduce v1 programming model,
which optional phase is executed simultaneously
with the Shuffle phase?
Your answer

  A.  Split

  B.  Reduce

  C.  Combiner

  D.  Map

That's right!
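
The MapReduce questions refer to the Map, Combiner, Shuffle, and Reduce phases. The following is a minimal single-process Python sketch of those phases for a word count. It is not Hadoop's API; the function names and the sample input are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(split):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in split.split()]


def combine(pairs):
    # Combiner (optional): pre-aggregate one mapper's output locally,
    # so fewer pairs have to be moved during the shuffle.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(all_pairs):
    # Shuffle: group every value by its key across all mappers.
    grouped = defaultdict(list)
    for key, value in all_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    combined = []
    for split in splits:
        combined.extend(combine(map_phase(split)))
    return reduce_phase(shuffle(combined))


counts = word_count(["big data big", "data hadoop"])
print(counts)
```

In this sketch the combiner shrinks the first split's output from three pairs to two before the shuffle, which is the reason a combiner is worth adding when the reduce function is a simple sum.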

Question 17
Which two are valid watches for ZNodes in
ZooKeeper?
Your answer

  A.  NodeChildrenChanged

  B.  NodeExpired

  C.  NodeDeleted

  D.  NodeRefreshed

That's right!

Question 18
Which NoSQL datastore type began as an
implementation of Google's BigTable that can store
any type of data and scale to many petabytes?
Your answer

  A.  HBase

  B.  Riak

  C.  Memcached

  D.  CouchDB

That's right!

Question 19
Which two factors in a Hadoop cluster increase
performance most significantly?
Your answer

  A.  solid state disks

  B.  parallel reading of large data files

  C.  data redundancy on management nodes

  D.  immediate failover of failed disks

  E.  large number of small data files


  F.  high-speed networking between nodes

That's right!

Question 20
Which statement is true about MapReduce v1 APIs?
Your answer

  A.  MapReduce v1 APIs are implemented by applications which are largely independent of the execution environment.

  B.  MapReduce v1 APIs cannot be used with YARN.

  C.  MapReduce v1 APIs provide a flexible execution environment to run MapReduce.

  D.  MapReduce v1 APIs define how MapReduce jobs are executed.

That's right!

Question 21
Apache Spark provides a single, unifying platform
for which three of the following types of operations?
Your answer

  A.  batch processing

  B.  ACID transactions

  C.  machine learning

  D.  transaction processing

  E.  record locking

  F.  graph operations

That's right!

Question 22
Which Hadoop ecosystem tool can import data into
a Hadoop cluster from a DB2, MySQL, or other
databases?
Your answer

  A.  Sqoop

  B.  Accumulo

  C.  HBase

  D.  Oozie

That's right!

Question 23
Under the YARN/MRv2 framework, the JobTracker
functions are split into which two daemons?
Your answer

  A.  ApplicationMaster

  B.  TaskManager

  C.  JobMaster

  D.  ScheduleManager

  E.  ResourceManager

That's right!

Question 24
Which three programming languages are directly
supported by Apache Spark?
Your answer

  A.  Python

  B.  Java

  C.  .NET

  D.  Scala

  E.  C++

  F.  C#

That's right!

Question 25
Which component of the Apache Ambari
architecture integrates with an organization's LDAP
or Active Directory service?
Your answer

  A.  Ambari Alert Framework

  B.  REST API

  C.  Authorization Provider

  D.  Postgres RDBMS

That's right!

Question 26
Which statement accurately describes how
ZooKeeper works?
Your answer

  A.  There can be more than one leader server at a time.

  B.  Clients connect to multiple servers at the same time.

  C.  All servers keep a copy of the shared data in memory.

  D.  Writes to a leader server will always succeed.

That's right!

Question 27
Which description characterizes a function provided
by Apache Ambari?
Your answer

  A.  A wizard for installing Hadoop services on host servers.

  B.  Moves information to/from structured databases.

  C.  Moves large amounts of streaming event data.

  D.  A messaging system for real-time data pipelines.

That's right!

Question 28
What are two services provided by ZooKeeper?
Your answer

  A.  Loading bulk data into a Hadoop cluster.

  B.  Providing distributed synchronization.

  C.  Authenticating and auditing user access.

  D.  Maintaining configuration information.


That's right!

Question 29
Which three are a part of the Five Pillars of Security?
Your answer

  A.  Resiliency

  B.  Audit

  C.  Administration

  D.  Data Protection

  E.  Speed

That's right!

Question 30
If a Hadoop node goes down, which Ambari
component will notify the Administrator?
Your answer

  A.  Ambari Metrics System

  B.  Ambari Wizard

  C.  REST API

  D.  Ambari Alert Framework

That's right!

Question 31
Under the MapReduce v1 programming model, what
happens in a "Reduce" step?
Your answer

  A.  Worker nodes process pieces in parallel.


  B.  Data is aggregated by worker nodes.

  C.  Worker nodes store results on their own local file systems.

  D.  Input is split into pieces.

That's right!

Question 32
Which component of the Spark Unified Stack allows
developers to intermix structured database queries
with Spark's programming language?
Your answer

  A.  Java

  B.  Spark SQL

  C.  MLlib

  D.  Mesos

That's right!

Question 33
What are three IBM value-add components to the
Hortonworks Data Platform (HDP)?
Your answer

  A.  Big Index

  B.  Big Data

  C.  Big Match

  D.  Big Replicate

  E.  Big SQL

  F.  Big YARN

That's right!

Question 34
Which component of a Hadoop system is the
primary cause of poor performance?
Your answer

  A.  RAM

  B.  disk latency

  C.  network

  D.  CPU

That's right!

Question 35
What are two ways the command-line parameters for
a Sqoop invocation can be simplified?
Your answer

  A.  Include the --options-file command line argument.

  B.  Run Sqoop using the vi editor.

  C.  Use the --import-command line argument.

  D.  Place the commands in a file.

That's right!
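
The Sqoop questions mention the `-m 1` parameter and the `--options-file` argument. The sketch below shows, under stated assumptions, how shared arguments might be written to an options file (one token per line) and combined with a single-mapper import. The JDBC URL, user name, table name, file name, and both helper functions are hypothetical, and the command is only assembled, never executed.

```python
import tempfile
from pathlib import Path


def write_options_file(path, options):
    # Sqoop options files hold one option or value per line.
    Path(path).write_text("\n".join(options) + "\n")


def build_sqoop_command(options_file, table, mappers=1):
    # Arguments kept in the options file stay off the command line;
    # "-m" sets the number of map tasks used for the import.
    return [
        "sqoop",
        "--options-file", str(options_file),
        "--table", table,
        "-m", str(mappers),
    ]


# Hypothetical connection details, for illustration only.
shared_options = [
    "import",
    "--connect",
    "jdbc:mysql://db.example.com/sales",
    "--username",
    "etl_user",
]

options_path = Path(tempfile.gettempdir()) / "sqoop_import_options.txt"
write_options_file(options_path, shared_options)

command = build_sqoop_command(options_path, "orders")
print(" ".join(command))
```

Keeping the connection details in a file means the same options can be reused across invocations while only the table name and mapper count change per run.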

Question 36
Which component of the Hortonworks Data Platform
(HDP) is the architectural center of Hadoop and
provides resource management and a central
platform for Hadoop applications?
Your answer

  A.  HDFS

  B.  MapReduce

  C.  HBase

  D.  YARN

That's right!

Question 37
Which hardware feature on a Hadoop datanode is
recommended for cost efficient performance?
Your answer

  A.  RAID

  B.  LVM

  C.  SSD

  D.  JBOD

That's right!

Question 38
Hadoop uses which two Google technologies as its
foundation?
Your answer

  A.  HBase

  B.  Ambari

  C.  MapReduce

  D.  Google File System

  E.  YARN

That's right!

Question 39
What are two primary limitations of MapReduce v1?
Your answer

  A.  TaskTrackers can be a bottleneck to MapReduce jobs

  B.  Number of TaskTrackers limited to 1,000

  C.  Scalability

  D.  Resource utilization

  E.  Workloads limited to MapReduce

That's right!

Question 40
Which computing technology provides Hadoop's
high performance?
Your answer

  A.  Online Analytical Processing

  B.  RAID-0

  C.  Parallel Processing

  D.  Online Transactional Processing

That's right!

Question 41
What command is used to list the "magic"
commands in Jupyter?
Your answer

  A.  %list-all-magic

  B.  %lsmagic

  C.  %dirmagic

  D.  %list-magic

That's right!

Question 42
What is the first step in a data science pipeline?
Your answer

  A.  Exploration

  B.  Manipulation

  C.  Acquisition

  D.  Analytics

That's right!

Question 43
Why might a data scientist need a particular kind of
GPU (graphics processing unit)?
Your answer

  A.  To perform certain data transformation quickly.

  B.  To display a simple bar chart of data on the screen.

  C.  To collect video for use in streaming data applications.

  D.  To input commands to a data science notebook.

That's right!

Question 44
Which is an advantage that Zeppelin holds over
Jupyter?
Your answer

  A.  Notebooks can be used by multiple people at the same time.

  B.  Notebooks can be connected to big data engines such as Spark.

  C.  Zeppelin is able to use the R language.

  D.  Users must authenticate before using a notebook.

That's right!

Question 45
What is a "magic" command used for in Jupyter?
Your answer

  A.  Extending the core language with shortcuts.

  B.  Parsing and loading data into a notebook.

  C.  Running common statistical analyses.

  D.  Autoconfiguring data connections using a registry.


That's right!

Question 46
Which directory permissions need to be set to allow
all users to create their own schema?
Your answer

  A.  700

  B.  755

  C.  666

  D.  777

That's right!

Question 47
You are creating a new table and need to format it
with parquet. Which partial SQL statement would
create the table in parquet format?
Your answer

  A.  STORED AS parquetfile

  B.  CREATE AS parquetfile

  C.  STORED AS parquet

  D.  CREATE AS parquet

That's right!

Question 48
What is an advantage of the ORC file format?
Your answer

  A.  Big SQL can exploit advanced features


  B.  Efficient compression

  C.  Data interchange outside Hadoop

  D.  Supported by multiple I/O engines

That's right!

Question 49
You need to enable impersonation. Which two
properties in the bigsql-conf.xml file need to be
marked true?
Your answer

  A.  bigsql.alltables.io.doAs

  B.  $BIGSQL_HOME/conf

  C.  DB2COMPOPT

  D.  bigsql.impersonation.create.table.grant.public

  E.  DB2_ATS_ENABLE

That's right!

Question 50
Using the Java SQL Shell, which command will
connect to a database called mybigdata?
Your answer

  A.  ./jsqsh mybigdata

  B.  ./java tables

  C.  ./jsqsh go mybigdata

  D.  ./java mybigdata

That's right!

Question 51
Which two commands would you use to give or
remove certain privileges to/from a user?
Your answer

  A.  GRANT

  B.  SELECT

  C.  LOAD

  D.  REVOKE

  E.  INSERT

That's right!

Question 52
You need to determine the permission setting for a
new schema directory. Which tool would you use?
Your answer

  A.  umask

  B.  Kerberos

  C.  HDFS

  D.  GRANT

That's right!
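
A umask is a set of permission bits that are cleared from the mode requested when a file or directory is created. The following short Python sketch shows that arithmetic; the umask values `0o022` and `0o077` and the helper name `apply_umask` are assumptions chosen for illustration, not settings taken from the exam.

```python
import stat


def apply_umask(requested_mode, umask):
    # Every bit set in the umask is removed from the requested mode.
    return requested_mode & ~umask


for umask in (0o022, 0o077):
    dir_mode = apply_umask(0o777, umask)
    # S_IFDIR marks the mode as a directory so filemode() prints a leading "d".
    print(oct(umask), "->", oct(dir_mode), stat.filemode(stat.S_IFDIR | dir_mode))
```

With these sample values, a umask of `0o022` leaves `0o755` and a umask of `0o077` leaves `0o700` for a new directory.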

Question 53
What is the default directory in HDFS where tables
are stored?
Your answer

  A.  /apps/hive/warehouse/

  B.  /apps/hive/warehouse/data

  C.  /apps/hive/warehouse/schema

  D.  /apps/hive/warehouse/bigsql

That's right!

Question 54
Which statement best describes a Big SQL database
table?
Your answer

  A.  A directory with zero or more data files.

  B.  A data type of a column describing its value.

  C.  A container for any record format.

  D.  The defined format and rules around a delimited file.

That's right!

Question 55
Which command creates a user-defined schema
function?
Your answer

  A.  CREATE FUNCTION

  B.  ALTER MODULE ADD FUNCTION

  C.  TRANSLATE FUNCTION

  D.  ALTER MODULE PUBLISH FUNCTION

That's right!

Question 56
Which definition best describes RCAC?
Your answer

  A.  It grants or revokes certain directory privileges.

  B.  It limits access by using views and stored procedures.

  C.  It grants or revokes certain user privileges.

  D.  It limits the rows or columns returned based on certain criteria.

That's right!

Question 57
What are Big SQL database tables organized into?
Your answer

  A.  Hives

  B.  Files

  C.  Directories

  D.  Schemas

That's right!

Question 58
You have a distributed file system (DFS) and need to
set permissions on the /hive/warehouse
directory to allow access to ONLY the bigsql user.
Which command would you run?
Your answer

  A.  hdfs dfs -chmod 770 /hive/warehouse

  B.  hdfs dfs -chmod 700 /hive/warehouse

  C.  hdfs dfs -chmod 755 /hive/warehouse

  D.  hdfs dfs -chmod 666 /hive/warehouse

That's right!
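
The octal modes that appear in the permission questions can be decoded with Python's standard `stat` module. This is a small sketch for reading such modes, assuming plain POSIX-style permission bits with no sticky bit or ACLs; `describe_mode` is a hypothetical helper name.

```python
import stat


def describe_mode(octal_text):
    # Parse the three octal digits (owner, group, other) and render
    # them in the familiar "drwxr-xr-x" form for a directory.
    mode = int(octal_text, 8)
    return stat.filemode(stat.S_IFDIR | mode)


for octal_text in ("700", "755", "770", "777", "666"):
    print(octal_text, describe_mode(octal_text))
```

Each digit covers one class of user: the first is the owner, the second the group, and the third everyone else, with read worth 4, write worth 2, and execute worth 1.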

Question 59
When connecting to an external database in a
federation, you need to use the correct database
driver and protocol. What is this federation
component called in Big SQL?
Your answer

  A.  User mapping

  B.  Data source

  C.  Wrapper

  D.  Nickname

That's right!

Question 60
Which Big SQL feature allows users to join a
Hadoop data set to data in external databases?
Your answer

  A.  Impersonation

  B.  Grant/Revoke privileges

  C.  Fluid query

  D.  Integration

That's right!
