
Bigdata MCQ QA Part2

Uploaded by

VIDHYA HK

1) What does the following code print?

val numbersRdd = sc.parallelize(List(4,99,2,348,99,1))
val numbers = numbersRdd.takeOrdered(2)
println(numbers.toList)
i) List(2)
ii) List(99,1)
iii) List(4,99)
iv) List(1,2)
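
For intuition: takeOrdered(n) returns the n smallest elements in ascending order. The sketch below mimics that on a plain Python list with the standard-library heapq.nsmallest. It is an analogue, not Spark itself, and the helper name take_ordered is made up for illustration.

```python
import heapq

def take_ordered(values, n):
    # Return the n smallest values in ascending order,
    # which is what RDD.takeOrdered(n) does with the default ordering.
    return heapq.nsmallest(n, values)

numbers = [4, 99, 2, 348, 99, 1]
print(take_ordered(numbers, 2))  # [1, 2]
```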

2) When does Apache Spark evaluate an RDD?


a) Upon Action
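
Lazy evaluation can be pictured with Python's built-in map, which also defers work until the result is consumed. This is only an analogy for Spark's transformation/action split, not Spark code; the names double and log are illustrative.

```python
log = []

def double(x):
    log.append(x)  # record each time the function actually runs
    return x * 2

numbers = [1, 2, 3]
lazy = map(double, numbers)   # like a transformation: nothing runs yet
ran_before_action = list(log)

result = list(lazy)           # like an action: forces the computation

print(ran_before_action)  # []
print(result)             # [2, 4, 6]
```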

3) Which component uses Spark Core's fast scheduling capability to perform streaming analytics?

Spark Streaming

4) Metadata is defined as:


Data about data

5) YARN stands for

Yet Another Resource Negotiator

6) Combiner increases the amount of work to be done by the reducer by reducing the network
traffic.
a) True
b) False
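
A combiner pre-aggregates map output on the mapper side, so fewer records cross the network and reach the reducer. The sketch below counts the records that would be shuffled with and without a combiner for a toy word count. All function names and the sample input are hypothetical.

```python
from collections import Counter

def map_words(line):
    # Mapper: emit one (word, 1) pair per word.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: sum the counts produced by a single mapper.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

splits = ["a b a a", "b b a"]            # one input split per mapper
mapped = [map_words(s) for s in splits]
combined = [combine(p) for p in mapped]

records_without_combiner = sum(len(p) for p in mapped)
records_with_combiner = sum(len(p) for p in combined)
print(records_without_combiner, records_with_combiner)  # 7 4
```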

7) Default partitioner of MapReduce?

a) HashPartitioner
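
Hash partitioning assigns a key to a reducer by taking the key's hash, masking it to a non-negative value, and taking the remainder modulo the number of reducers. The sketch below shows the idea in Python. Using zlib.crc32 as a stand-in for Java's hashCode() is an assumption made so the result is deterministic, and hash_partition is a hypothetical name.

```python
import zlib

def hash_partition(key, num_reducers):
    # Stand-in for key.hashCode(): CRC32 of the key bytes (assumption).
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative int, then take the remainder.
    return (key_hash & 0x7FFFFFFF) % num_reducers

for key in ["apple", "banana", "apple"]:
    # The same key always lands in the same partition.
    print(key, hash_partition(key, 3))
```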

8) Not a scheduling option available in YARN


a) Balanced Scheduler

9) Which format is used by the Kafka properties file?


a) Key-value pairs

10) Kafka can serve as a ____________ for distributed systems.


a) External commit-log
11) Select the appropriate option with respect to Python
a) Runtime Program construction
b) Automatic memory management
c) Wide Portability
d) All the above

12) What happens if a local variable exists with the same name as the global variable you want
to access?
a) Global variable is shadowed
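
A minimal Python illustration of shadowing: assigning to a name inside a function creates a local variable that hides the global of the same name, and the global itself is left untouched. The variable and function names are illustrative.

```python
counter = 10  # global variable

def shadowing():
    counter = 99   # new local name; hides the global inside this function
    return counter

print(shadowing())  # 99
print(counter)      # 10
```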

13) What is the return type of the snippet below?

val ds = Seq(1,2,3).toDS()
i) Dataset

14) What is ZooKeeper in Kafka?


a) Co-ordination service

15) How does Hadoop make the system more resilient?

a) It uses an effective firewall and antivirus
b) It keeps each computing resource isolated
c) It keeps multiple copies of data
d) It uploads data to a cloud for backup

16) Which of the following genres does Hadoop produce?


a) Relational Database Management system
b) JAX-RS
c) Java Message service
d) Distributed file system

17) In order to change settings within a Hive session for dynamic partitioning, we should set the
parameter: hive.exec.dynamic.partition.mode=strict
a) True
b) False

18) Default port of Hive Thrift is


a) 10000
19) How can big data help combat and prevent fraud?
a) All the above
b) Analyse all the data
c) Use predictive analytics
d) Detect fraud in real time

20) Is it possible to delete a Kafka topic when the broker is down? (multiple options)
a) Yes, we can delete it
b) Deletion will be recorded by ZooKeeper
c) No, topics are default category feed
d) No, topics cannot be deleted when the broker is unavailable

21) Why is Apache Spark considered an integrated solution for processing on all layers of the
Lambda architecture?
a) It contains Spark SQL for SQL and structured data processing
b) It contains Spark Streaming that enables scalable, high-throughput, fault-tolerant
stream processing of live data streams
c) All the above
d) It contains Spark Core that includes high-level API and an optimized engine that
supports general execution graphs

22) Each version of data within a cell adds versioning information through a
a) Version Value
b) Keyvalue
c) KeyNo
d) VersionNo

23) Which command fetches the contents of a row or a cell


a) Get

24) The underlying data is not deleted from HDFS when a Hive external table is dropped
a) True
25) What does the following code print?
val mammals = List("Lion", "Dolphin", "Whale")
val mammalsRdd = sc.parallelize(mammals)
val mammalsLengthRdd = mammalsRdd.map { (m: String) =>
  m.length
}
mammalsLengthRdd.collect().foreach(println)
a) 16
b) 4,7,5
c) Lion,Dolphin,Whale
d) None
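
The map step above applies a function to each element and yields one result per element. A plain-Python analogue (not Spark) using the built-in map and len:

```python
mammals = ["Lion", "Dolphin", "Whale"]

# One output per input element: the length of each name.
lengths = list(map(len, mammals))

for n in lengths:
    print(n)
# 4
# 7
# 5
```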

26) What is true about a broadcast variable?

a) Read-only variables

27) Which of the following is an action?


a) countByValue()
b) distinct()
c) intersection(other-dataset)
d) union(dataset)

28) '{0:f},{1:2f},{2:05.2f}'.format(1.23456,1.23456,1.23456)
Output?
a) '1.234560,1.22345,1.23'
b) Error
c) '1.234560,1.234560,01.23'
d) No output
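
The format specifiers can be checked directly in Python. In the sketch below, f uses the default six decimal places; 2f sets a minimum width of 2 (already exceeded, so it has no visible effect) with the same default precision; 05.2f zero-pads to width 5 with two decimal places.

```python
value = 1.23456

# The format string must be quoted for the expression to be valid Python.
formatted = "{0:f},{1:2f},{2:05.2f}".format(value, value, value)

print(formatted)  # 1.234560,1.234560,01.23
```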

29) Build tool used in Scala?

a) Ant
b) sbt
c) Gradle
d) All the above

30) How would the data received from a GPS satellite and the Web be classified?
a) Structured
b) Unstructured
c) Both structured and unstructured
d) Semi structured

31) Correct syntax for parameter substitution using cmd?


a) {%declare | %default} param_name param_value
b) {%declare | %default} param_name param_value cmd
c) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script
d) All the above
32) In the MapReduce framework, map and reduce functions can be run in any order. Do you
agree?
a) Yes, because in functional programming, the order of execution is not important
b) Yes, because the functions use key-value pairs as input and output, so order is not important
c) No, because the output of the reduce function is the input of the map function
d) No, because the output of the map function is the input of the reduce function
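
The data flow between the two phases can be sketched as a toy, single-process word count: the map output is grouped by key and only then handed to reduce. This is an illustration in plain Python, not the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word.
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs):
    # Shuffle: group the map output values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: consume the grouped map output and sum per key.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```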

33) Mapper and Reducer implementations can use the _____ to report progress or just indicate
that they are alive.
a) Reporter
b) Partitioner
c) OutputCollector
d) All the above

34) ______ is the utility which allows users to create and run jobs with any executable as the
mapper and/or the Reducer.
a) Hadoop Streaming
b) Hadoop Strdata
c) None
d) Hadoop Stream
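
With this utility, a mapper is any program that reads input lines on standard input and writes key/value records to standard output, by default separated by a tab. The sketch below is a word-count mapper written as a function so it can be tried on a sample list; in a real streaming job it would be fed sys.stdin. The name mapper and the sample input are illustrative.

```python
def mapper(lines):
    # Emit one "word<TAB>1" record per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

sample = ["big data", "big cluster"]
records = list(mapper(sample))

for record in records:
    print(record)
```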

35) Hive supports random reads and writes


a) True
b) False

36) The partitioning of a table in Hive creates more:

a) Files under the database name
b) Files under the table name
c) Subdirectories under the database name
d) Subdirectories under the table name
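
Hive stores each partition as a column=value subdirectory beneath the table's directory, one level per partition column. The sketch below builds such a path. The helper partition_path, the table location, and the partition columns are made up for illustration.

```python
def partition_path(table_dir, partition_spec):
    # Each partition column adds one "column=value" subdirectory level.
    parts = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([table_dir.rstrip("/")] + parts)

path = partition_path(
    "/user/hive/warehouse/sales",            # hypothetical table directory
    [("year", "2020"), ("month", "01")],     # hypothetical partition columns
)
print(path)  # /user/hive/warehouse/sales/year=2020/month=01
```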

37) MapReduce code can be written in various languages other than Java
a) True
b) False

38) What does the code print?


var y: Option[String] = None
y.get
a) Null
b) Code throws exception
c) None
d) None of the options

39) In the Hadoop 2.x release, HDFS federation means

Allows a cluster to scale by adding more NameNodes

40) What is the default storage level of cache()?

MEMORY_ONLY

41) Result of the snippet below?

List(1,2,3).zip(List("a","b","c"))
List((1,a),(2,b),(3,c))
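
Python's built-in zip pairs elements positionally in the same way, which makes the behaviour easy to try without a Scala REPL. When the inputs differ in length, both Scala's zip and Python's zip stop at the shorter one.

```python
pairs = list(zip([1, 2, 3], ["a", "b", "c"]))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Extra elements of the longer input are dropped.
truncated = list(zip([1, 2, 3], ["a", "b"]))
print(truncated)  # [(1, 'a'), (2, 'b')]
```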

42) Hive supports compact index


a) True
b) False

43) To list tables with prefix 'page' in Hive, we use the syntax:
a) SHOW TABLES 'page.*'
44) Can multiple clients write into an HDFS file concurrently?
a) True
b) False
45) Node manager runs services on the node to check its health and report the same to
resource manager
a) True
b) False
46) The Thrift server in Hive doesn't allow external clients to interact with Hive over a network
a) True
b) False
47) Hive also supports custom extensions written in
a) Python
b) Scala
c) Ruby
d) Java
48) Once compilation and optimization complete, the executor executes the task
a) True
b) False
49) The client node loads the data onto the Hadoop cluster
a) True
b) False
50) Which of the following is a platform for constructing data flows for extract, transform and
load(ETL) processing and analysis of large datasets
a) Oozie
b) Pig Latin
c) Hive
d) Sqoop
51) Map operator trees are executed on mappers
a) True
b) False
52) The _____ Manager service monitoring feature tracks dozens of service health and performance
metrics about the services and role instances running on your cluster
a) Google
b) Amazon
c) None
d) Cloudera
53) Name node is monitored and upgraded in a ____ transition
a) Secure mode
b) Service mode
c) Safe mode
d) Boot mode
54) Checkpointing is a feature for any non-stateful transformation
a) True
b) False
55) Which are the three major parallel computing platforms?
a) Network, cloud, multitenancy
b) IaaS, PaaS, SaaS
c) Clusters or grids, MPP, HPC
d) Database, sql, network
56) Hive is designed mainly for
a) None
b) OLAP,OLTP
c) OLTP
d) OLAP
57) Hive shell can run in both non-interactive mode and interactive mode
a) True
b) False
58) Which of the following jobs are optimized for scalability but not latency
a) Hive
b) Hadoop
c) Oozie
d) Pig
59) Which of the following is not an input format in Hadoop
a) KeyValueInputFormat
b) SequenceFileInputFormat
c) ByteInputFormat
d) TextInputFormat
60) Choose the correct statement:
a) Action operation evaluates and returns a new value
b) Transformations return a single value
c) When an action function is called on an RDD object, all the data processing queries are
computed at that time and the result value is returned in a new RDD
61) Hadoop cluster establishes the connection to the client using HTTP protocol
a) True
b) False
62) Hadoop is a framework that works with a variety of related tools. Common cohorts include
a) MapReduce, Hummer, Iguana
b) MapReduce, Heron and Trumpet
c) MapReduce, MySQL and Google Apps
d) MapReduce, Hive, and HBase
63) Data locality feature in Hadoop means ______
a) Relocate the data from one node to another
b) Distribute the data across multiple nodes
c) Store the same data across multiple nodes
d) Co-locate the data with the computing nodes
