BDA R22 Question Bank
Subject Code(s):
DESCRIPTIVE QUESTION BANK
UNIT-1
Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | State the responsibilities of a Data Scientist. | 1 | 1 | 1 | 2
1 b | Discuss the evolution of Big Data. | 1 | 1 | 1 | 2
1 c | Illustrate the various terminologies used in Big Data environments. | 10 | 1 | 1 | 3
2 a | Distinguish between traditional Business Intelligence and Big Data. | 1 | 1 | 2 | 4
2 b | What is a distributed system? | 1 | 1 | 1 | 2
2 c | Demonstrate the classification of analytics. | 10 | 1 | 2 | 4
3 a | Why is there sudden hype around Big Data Analytics? | 1 | 1 | 7 | 2
3 b | Discuss the advantages of a Shared Nothing Architecture. | 1 | 1 | 3 | 2
3 c | Demonstrate the classification of digital data. | 10 | 1 | 1 | 4
4 a | Define Big Data Analytics. | 1 | 1 | 1 | 2
4 b | State the CAP theorem. | 1 | 1 | 1 | 2
4 c | What isn’t Big Data Analytics? | 5 | 1 | 2 | 2
4 d | Explain the top challenges facing Big Data. | 5 | 1 | 2 | 4
5 a | What is a Data Warehouse? | 1 | 1 | 1 | 2
5 b | Discuss the characteristics of Big Data. | 1 | 1 | 1 | 2
5 c | List the traits that need to be honed to play the role of a data scientist. | 5 | 1 | 2 | 4
5 d | Explain the challenges of Big Data. | 5 | 1 | 2 | 4
6 a | What is Data Science? | 1 | 1 | 1 | 2
6 b | Write the differences between parallel and distributed systems. | 1 | 1 | 1 | 1
6 c | Describe the coexistence of Big Data and Data Warehouse. | 5 | 1 | 3 | 4
6 d | Explain the greatest challenges that prevent businesses from capitalizing on Big Data. | 5 | 1 | 3 | 4
UNIT-2
Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
a | What are ordered factors? | 1 | 5 | 3 | 2
1 2 3 4 5 6 7 8 9 10
D B C D C B D D C A
11. rows, columns
12. traditional
13. velocity
14. real-time
15. structured
16. patterns
17. insights
18. integration
19. professionals
20. informed
UNIT-II
Multiple Choice Questions
1. Which of the following is a key feature of Hadoop?
A) Real-time processing B) Batch processing C) Limited scalability D) Single-node architecture
2. What is one of the key advantages of using Hadoop?
A) High-cost B) Scalability C) Limited data storage D) Single point of failure
3. Which version of Hadoop introduced YARN (Yet Another Resource Negotiator)?
A) Hadoop 1.x B) Hadoop 2.x C) Hadoop 3.x D) Hadoop 4.x
4. Which of the following components is NOT part of the Hadoop ecosystem?
A) HDFS B) MapReduce C) SQL Server D) Hive
5. What is the primary need for Hadoop in modern data processing?
A) To handle small datasets B) To manage large volumes of unstructured data
C) To replace traditional databases D) To perform real-time analytics
6. In comparison to RDBMS, Hadoop is better suited for:
A) Structured data B) Real-time transactions
C) Unstructured and semi-structured data D) Small data sets
7. Which component of Hadoop is responsible for storing large datasets in a distributed manner?
A) HDFS B) YARN C) MapReduce D) HBase
8. Which of the following companies is known for its Hadoop distribution called Cloudera?
A) Microsoft B) IBM C) Oracle D) Cloudera Inc.
9. The history of Hadoop can be traced back to which project?
A) Apache Storm B) Apache Spark C) Apache Nutch D) Apache Flink
10. Which of the following best describes HDFS (Hadoop Distributed File System)?
A) Centralized file storage system B) Distributed file storage system
C) Relational database management system D) Cloud-based storage system
Fill-in-the-Blank Questions
11. Hadoop is designed to handle ________ volumes of data.
12. The main advantage of Hadoop is its ability to __________ across many nodes.
13. Hadoop 2.x introduced a new resource management layer called ________.
14. The Hadoop ecosystem includes various tools such as Pig, Hive, and ________.
15. A key need for Hadoop arises from the increasing amount of ________ data generated by modern
applications.
16. Unlike RDBMS, Hadoop is capable of processing both structured and ________ data.
17. The primary storage system used in Hadoop is called ________.
18. One of the distribution challenges in computing is the efficient management of ________ data across
multiple nodes.
19. The development of Hadoop was inspired by the distributed computing research papers published by
________.
20. HDFS stores data by breaking it down into smaller blocks and distributing them across multiple
________.
Answers:
1 2 3 4 5 6 7 8 9 10
B B B C B C A D C B
11. large
12. scale
13. YARN
14. HBase
15. unstructured
16. unstructured
17. HDFS
18. large
19. Google
20. nodes
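Question 20 above describes HDFS breaking data into blocks and distributing them across nodes. The following is a minimal Python sketch of that idea only, not HDFS code: the function names, the tiny block size, and the round-robin placement are illustrative assumptions. Real HDFS defaults to 128 MB blocks and a replication factor of 3, and its placement policy is rack-aware rather than round-robin.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks; the last block may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin assumption)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# A 10-byte "file" with a 4-byte block size gives blocks of 4, 4 and 2 bytes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"])
for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)
```

Because every block lives on several nodes, losing one node does not lose the data, which is why the answer to MCQ 10 is a distributed rather than a centralized file storage system.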
UNIT-III
Multiple Choice Questions
1. In Hadoop, which component is responsible for splitting input data into smaller chunks for
processing?
A) Reducer B) Combiner C) Partitioner D) Mapper
2. What is the primary function of a Reducer in Hadoop MapReduce?
A) Splitting input data B) Sorting and shuffling data
C) Processing intermediate data and producing the final output
D) Combining intermediate data
3. Which of the following is a key advantage of NoSQL databases?
A) Fixed schema B) Scalability C) Limited data types D) SQL compatibility
4. Which type of NoSQL database is designed to store data in a document-oriented format?
A) Key-Value Store B) Document Store C) Column Store D) Graph Database
5. What is one of the main use cases of NoSQL databases in the industry?
A) Storing relational data B) Managing large-scale, unstructured data
C) Performing complex joins D) Ensuring data integrity
6. Which term refers to a modern database management system that combines the benefits of
traditional SQL and NoSQL?
A) MySQL B) NewSQL C) PostgreSQL D) SQL++
7. In Hadoop MapReduce, which component is responsible for distributing the intermediate data
output from the Mappers?
A) Reducer B) Partitioner C) Combiner D) Mapper
8. Which of the following best describes a Combiner in Hadoop MapReduce?
A) A mini-reducer that performs local aggregation B) A component that partitions data
C) A component that splits input data D) A tool for data storage
9. What is a key difference between SQL and NoSQL databases?
A) SQL databases are schema-less B) NoSQL databases use fixed schemas
C) SQL databases support ACID transactions D) NoSQL databases do not scale well
10. NewSQL databases are designed to provide the scalability of NoSQL databases while
maintaining the ________ of SQL databases.
A) flexibility B) consistency C) simplicity D) security
Fill-in-the-Blank Questions
11. Hadoop MapReduce is a programming model used for processing ________ data.
12. The Mapper in Hadoop processes input data and generates intermediate ________ pairs.
13. A Reducer in Hadoop takes intermediate key-value pairs and produces the ________ output.
14. Combiners in Hadoop MapReduce are used to perform local ________ of intermediate data to
optimize performance.
15. Partitioners in Hadoop ensure that the intermediate data from the Mappers is evenly distributed to
the ________.
16. NoSQL databases are designed to handle ________ and semi-structured data.
17. A key advantage of NoSQL databases is their ability to scale ________.
18. In industry, NoSQL databases are often used for applications that require high ________ and
performance.
19. SQL databases are known for their support of ACID transactions, while NoSQL databases are
known for their ________ consistency model.
20. NewSQL databases aim to combine the scalability of NoSQL with the transactional ________ of
SQL.
Answers:
1 2 3 4 5 6 7 8 9 10
D C B B B B B A C B
11. large
12. key-value
13. final
14. aggregation
15. Reducers
16. unstructured
17. horizontally
18. scalability
19. eventual
20. consistency
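The roles of the Mapper, Combiner, Partitioner and Reducer asked about in this unit can be sketched as a word count in plain Python. This is a single-process illustration under stated assumptions, not the Hadoop API: the function names, the character-sum partitioning rule, and the sample lines are made up for the example.

```python
from collections import defaultdict


def mapper(line):
    """Emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combiner(pairs):
    """Mini-reducer: aggregate counts locally on one mapper's output."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Choose the reducer for a key; a character sum stands in for a hash function."""
    return sum(ord(ch) for ch in key) % num_reducers


def reducer(key, values):
    """Combine all values for one key into the final output pair."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # One dict per reducer, grouping values by key (the shuffle step).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in combiner(mapper(line)):
            partitions[partitioner(key, num_reducers)][key].append(value)
    result = {}
    for partition in partitions:
        for key in sorted(partition):  # each reducer sees its keys in sorted order
            out_key, total = reducer(key, partition[key])
            result[out_key] = total
    return result


word_counts = run_job(["big data big insights", "data nodes store big data"])
print(word_counts)
```

The combiner shrinks each mapper's output before it is shuffled, and the partitioner guarantees that every occurrence of a key reaches the same reducer, so the final totals are correct.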
UNIT-IV
Multiple Choice Questions:
1. What is MongoDB?
A) A relational database management system B) A NoSQL database
C) An in-memory database D) A traditional SQL database
2. Why is MongoDB necessary in modern applications?
A) To handle small amounts of structured data B) To manage large volumes of unstructured data
C) To replace all relational databases D) To perform real-time analytics only
3. Which term is used in MongoDB to refer to a single entry in a collection?
A) Row B) Column C) Document D) Table
4. In MongoDB, which data type is used to store binary data?
A) String B) Binary data C) ObjectID D) Array
5. What is the equivalent of a table in RDBMS in MongoDB?
A) Row B) Collection C) Field D) Database
6. Which of the following is NOT a data type supported by MongoDB?
A) String B) Integer C) Blob D) Boolean
7. In MongoDB, which command is used to retrieve data from a collection?
A) SELECT B) FIND C) GET D) FETCH
8. Which term describes the unique identifier for a document in MongoDB?
A) Primary Key B) RowID C) ObjectID D) DocumentID
9. Which MongoDB query language feature is used to update existing documents?
A) UPDATE B) SET C) MODIFY D) UPDATE_ONE
10. What is a key advantage of using MongoDB over traditional RDBMS?
A) Fixed schema B) Horizontal scalability C) Limited data storage
D) High transaction overhead
Fill-in-the-Blank Questions
11. MongoDB is a NoSQL database that stores data in ________ format.
12. A collection in MongoDB is analogous to a ________ in a relational database.
13. The data type used to store dates in MongoDB is called ________.
14. In MongoDB, the equivalent of a row in RDBMS is referred to as a ________.
15. MongoDB uses a ________ model to store and manage data.
16. The command to insert a new document into a collection in MongoDB is ________.
17. In MongoDB, fields within a document are similar to ________ in RDBMS.
18. A key advantage of using MongoDB is its ability to handle ________ data.
19. The query language used by MongoDB is called the MongoDB ________ Language.
20. The ObjectID in MongoDB is a unique identifier for each ________ in a collection.
Answers:
1 2 3 4 5 6 7 8 9 10
B B C B B C B C D B
11. JSON
12. table
13. Date
14. document
15. document
16. insertOne
17. columns
18. unstructured
19. Query
20. document
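The document, collection and field terms used in this unit can be illustrated without a database server. The sketch below models a collection as a Python list of dictionaries; the `find` helper is a hypothetical stand-in for an equality query and is not the MongoDB driver API, and the sample records are invented for the example.

```python
import json

# A "collection" modelled as a list; each dict plays the role of a document.
students = [
    {"_id": 1, "name": "Asha", "branch": "CSE", "skills": ["R", "Hadoop"]},
    {"_id": 2, "name": "Ravi", "branch": "ECE"},  # no "skills" field: the schema is flexible
]


def find(collection, query):
    """Return documents whose fields equal every key/value in the query (hypothetical helper)."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


matches = find(students, {"branch": "CSE"})
print(json.dumps(matches, indent=2))
```

Unlike rows in an RDBMS table, the two documents do not share the same set of fields, which is the flexibility that the fixed-schema option in MCQ 10 contrasts with.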
UNIT-V
Multiple Choice Questions:
1 2 3 4 5 6 7 8 9 10
B D B B A C A A A A
11. c()
12. if-else
13. matrix()
14. table
15. types
16. hist()
17. categorical
18. text
19. ggplot2
20. apply