BDA R22 Question Bank

The document is a descriptive question bank for a B.Tech course in Big Data Analytics, covering various units with questions related to data science, Hadoop, NoSQL, MongoDB, and R programming. Each unit contains multiple types of questions, including descriptive, multiple choice, and fill-in-the-blank, along with their respective marks and learning outcomes. The document serves as a comprehensive guide for students to prepare for examinations in the subject.

Uploaded by: subscriptionvkb
Copyright: © All Rights Reserved
Course: B.Tech
Year / Semester: III / II
Subject Name: Big Data Analytics
Branch Name(s): CSE (DS) and AI&DS
Subject Code(s):

DESCRIPTIVE QUESTION BANK

UNIT-1

Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | State the responsibilities of a Data Scientist. | 1 | 1 | 1 | 2
1 b | Discuss the evolution of Big Data. | 1 | 1 | 1 | 2
1 c | Illustrate the various terminologies used in Big Data environments. | 10 | 1 | 1 | 3
2 a | Distinguish between Traditional Business Intelligence and Big Data. | 1 | 1 | 2 | 4
2 b | What is a Distributed System? | 1 | 1 | 1 | 2
2 c | Demonstrate the classification of Analytics. | 10 | 1 | 2 | 4
3 a | Why is there sudden hype around Big Data Analytics? | 1 | 1 | 7 | 2
3 b | Discuss the advantages of a Shared Nothing Architecture. | 1 | 1 | 3 | 2
3 c | Demonstrate the classification of Digital Data. | 10 | 1 | 1 | 4
4 a | Define Big Data Analytics. | 1 | 1 | 1 | 2
4 b | State the CAP theorem. | 1 | 1 | 1 | 2
4 c | Explain what Big Data Analytics isn't. | 5 | 1 | 2 | 2
4 d | Explain the top challenges facing Big Data. | 5 | 1 | 2 | 4
5 a | What is a Data Warehouse? | 1 | 1 | 1 | 2
5 b | Discuss the characteristics of Big Data. | 1 | 1 | 1 | 2
5 c | List the traits that need to be honed to play the role of a data scientist. | 5 | 1 | 2 | 4
5 d | Explain the challenges of Big Data. | 5 | 1 | 2 | 4
6 a | What is Data Science? | 1 | 1 | 1 | 2
6 b | Write the differences between parallel and distributed systems. | 1 | 1 | 1 | 1
6 c | Describe the coexistence of Big Data and Data Warehouse. | 5 | 1 | 3 | 4
6 d | Explain the greatest challenges that prevent businesses from capitalizing on Big Data. | 5 | 1 | 3 | 4
UNIT-2

Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | What is Data Replication? | 1 | 2 | 1 | 2
1 b | What are the main advantages of using Hadoop? | 1 | 2 | 2 | 2
1 c | Explain HDFS in detail. | 10 | 2 | 1 | 4
2 a | What is a Data Pipeline? | 1 | 2 | 1 | 2
2 b | What are some real-world use cases for Hadoop? | 1 | 2 | 7 | 2
2 c | Discuss the Hadoop Ecosystem. | 10 | 2 | 1 | 4
3 a | State the replica placement strategy. | 1 | 2 | 3 | 1
3 b | What are the key features of Hadoop? | 1 | 2 | 2 | 2
3 c | Demonstrate the various versions of Hadoop. | 10 | 2 | 1 | 4
4 a | Can you provide specific case studies of companies using Hadoop? | 1 | 2 | 5 | 1
4 b | Can you list the versions of Hadoop that have been released so far? | 1 | 2 | 1 | 1
4 c | Explain five HDFS commands. | 5 | 2 | 1 | 4
4 d | Summarize Hadoop Distributions. | 5 | 2 | 1 | 4
5 a | What are the core components of Hadoop's architecture? | 1 | 2 | 2 | 2
5 b | What are some of the popular Hadoop distributions available? | 1 | 2 | 5 | 2
5 c | Describe the anatomy of a File Read in HDFS. | 5 | 2 | 3 | 4
5 d | Explore Distributed Computing Challenges. | 5 | 2 | 6 | 4
6 a | What is the Hadoop ecosystem, and what components does it include? | 1 | 2 | 1 | 2
6 b | Can you provide a brief history of Hadoop's development? | 1 | 2 | 1 | 1
6 c | Elaborate the anatomy of a File Write in HDFS. | 5 | 2 | 3 | 4
6 d | Compare RDBMS and Hadoop. | 5 | 2 | 2 | 3
UNIT-3

Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | Sketch the MapReduce programming phases and daemons. | 1 | 3 | 1 | 2
1 b | List the key features of NoSQL. | 1 | 3 | 2 | 2
1 c | State the use of NoSQL in industry and compare SQL, NoSQL and NewSQL. | 10 | 3 | 2 | 4
2 a | Why is NoSQL preferred for handling big data applications? | 1 | 3 | 2 | 2
2 b | Where is NoSQL used? | 1 | 3 | 3 | 2
2 c | Classify NoSQL databases. | 10 | 3 | 1 | 4
3 a | What does the term "NewSQL" refer to? | 1 | 3 | 1 | 2
3 b | What is MapReduce in Hadoop? | 1 | 3 | 1 | 2
3 c | Describe the working model of MapReduce programming with an example. | 10 | 3 | 3 | 4
4 a | What is NoSQL? | 1 | 3 | 1 | 2
4 b | Give the characteristics of NoSQL. | 1 | 3 | 2 | 2
4 c | Compare SQL and NoSQL. | 5 | 3 | 2 | 3
4 d | Explain the Combiner and Partitioner of a Mapper task. | 5 | 3 | 3 | 4
5 a | Name two industries where NoSQL databases are widely used. | 1 | 3 | 7 | 1
5 b | List the key phases of a MapReduce job. | 1 | 3 | 1 | 2
5 c | Enumerate the advantages of NoSQL. | 5 | 3 | 2 | 3
5 d | Demonstrate the MapReduce daemons and their interaction diagram. | 5 | 3 | 3 | 4
6 a | Mention popular NoSQL vendors. | 1 | 3 | 5 | 1
6 b | What is the role of the Mapper in the MapReduce framework? | 1 | 3 | 1 | 2
6 c | Why NoSQL? Explain. | 5 | 3 | 2 | 2
6 d | With a neat sketch, explain the Mapper and Reducer. | 5 | 3 | 3 | 4
UNIT-4

Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | What is a document in MongoDB? | 1 | 4 | 1 | 2
1 b | Write the syntax for the limit function. | 1 | 4 | 1 | 1
1 c | Explain the concept of Arrays in MongoDB. | 10 | 4 | 2 | 2
2 a | Write the syntax to drop a database in MongoDB. | 1 | 4 | 1 | 1
2 b | State the process of replication in MongoDB. | 1 | 4 | 2 | 4
2 c | Demonstrate the various data types with suitable examples. | 10 | 4 | 3 | 3
3 a | Write the syntax for the skip function. | 1 | 4 | 1 | 1
3 b | State the process of sharding in MongoDB. | 1 | 4 | 2 | 4
3 c | Why MongoDB? Explain. | 10 | 4 | 2 | 4
4 a | What is CRUD? | 1 | 4 | 1 | 2
4 b | What is the difference between BSON and JSON in MongoDB? | 1 | 4 | 2 | 4
4 c | Write a program to find the factorial of a number. | 5 | 4 | 3 | 3
4 d | Discuss the mongoimport and mongoexport methods. | 5 | 4 | 1 | 2
5 a | Give the statement for creating a collection. | 1 | 4 | 1 | 1
5 b | Write the syntax for creating a database in MongoDB. | 1 | 4 | 1 | 1
5 c | Discuss the syntax of the insert and save methods with examples. | 5 | 4 | 1 | 3
5 d | Demonstrate the syntax of the update and remove methods with examples. | 5 | 4 | 3 | 3
6 a | What is MongoDB? | 1 | 4 | 1 | 2
6 b | Write the syntax for the count function. | 1 | 4 | 1 | 1
6 c | Explain the Aggregate and MapReduce functions in MongoDB with an example. | 5 | 4 | 3 | 3
6 d | Explain Cursors in MongoDB with an example. | 5 | 4 | 3 | 3
UNIT-5

Q.NO | DESCRIPTION OF QUESTION | MARKS | CO | PO | BTL
1 a | What are ordered factors? | 1 | 5 | 3 | 2
1 b | What are relational operators in R? | 1 | 5 | 4 | 2
1 c | Illustrate the R apply family of functions with suitable examples. | 10 | 5 | 2 | 3
2 a | State the structure of the switch statement in R. | 1 | 5 | 3 | 1
2 b | How do you extend a data frame in R? | 1 | 5 | 1 | 1
2 c | Explain how to create scatter plot, bar chart, pie chart, histogram, box plot, and line chart visualizations, and write the significance of each visualization chart. | 10 | 5 | 3 | 4
3 a | What is a chart legend? | 1 | 5 | 3 | 2
3 b | List the basic data types available in R. | 1 | 5 | 3 | 1
3 c | Explain the concepts of recursive functions and nested functions with examples, and demonstrate function scoping with a suitable example for each. | 10 | 5 | 4 | 3
4 a | What is a scatter plot? | 1 | 5 | 2 | 4
4 b | What is function scoping in R? | 1 | 5 | 3 | 2
4 c | Explain how sorting and merging are handled in data frames with examples. | 5 | 5 | 4 | 2
4 d | Explain the concept of matrix subsetting in R. Provide an example of each subsetting operation on a matrix. | 5 | 5 | 4 | 4
5 a | What is a named list in R? How do you convert a list to a vector? | 1 | 5 | 3 | 2
5 b | What kind of data is best represented by a line graph? | 1 | 5 | 3 | 1
5 c | How do factors and data frames complement each other in R? Explain with an example. | 5 | 5 | 3 | 3
5 d | Explain control statements and operators in R with clear examples. | 5 | 5 | 4 | 4
6 a | How do you load a package in R? Name two popular packages used for data manipulation. | 1 | 5 | 3 | 2
6 b | List the R apply family functions. | 1 | 5 | 2 | 2
6 c | Describe the key operations that can be performed on vectors in R. | 5 | 5 | 1 | 4
6 d | Explain how to merge two lists with an example, and demonstrate how to loop over a list in R using an example. | 5 | 5 | 5 | 3
UNIT-I
Multiple Choice Questions:

1. Which of the following is NOT a type of digital data?
A) Structured data B) Semi-structured data C) Unstructured data D) Analog data
2. What is Big Data?
A) A small dataset B) A large volume of structured and unstructured data
C) Only structured data D) A traditional database
3. Which of the following best describes the evolution of Big Data?
A) Transition from traditional databases to cloud computing B) Increasing use of small datasets
C) Transition from manual data entry to automated data collection D) All of the above
4. Which of the following is a key difference between Traditional Business Intelligence and Big Data?
A) Data volume B) Data variety C) Data velocity D) All of the above
5. How do Big Data and Data Warehouse coexist?
A) They cannot coexist
B) Big Data is used for historical analysis, while Data Warehouses are used for real-time analysis
C) Data Warehouses store historical data, while Big Data platforms handle large volumes of real-time data
D) Both are used interchangeably
6. What is Big Data Analytics?
A) The process of analyzing small datasets
B) The process of examining large and varied data sets to uncover hidden patterns, correlations, and other insights
C) A type of traditional business intelligence tool
D) None of the above
7. Why has there been a sudden hype around Big Data Analytics?
A) Increased data generation from various sources
B) Advances in technology
C) The potential to gain valuable insights and competitive advantage
D) All of the above
8. What is NOT a classification of analytics?
A) Descriptive analytics B) Predictive analytics C) Prescriptive analytics
D) Hypothetical analytics
9. What is one of the greatest challenges that prevent businesses from capitalizing on Big Data?
A) Lack of data B) High cost of data storage
C) Difficulty in data integration and management
D) Absence of skilled professionals
10. Why is Big Data Analytics important?
A) It helps in making informed decisions B) It only benefits large corporations
C) It is a trend with no real value D) It reduces the amount of data generated
Fill in the Blanks:
11. Structured data is organized into a predefined format, often using ________ and ________.
12. Big Data refers to the large volume of data that cannot be processed effectively using ________ methods.
13. The definition of Big Data typically includes three key characteristics: volume, variety, and ________.
14. Traditional Business Intelligence focuses on historical data analysis, whereas Big Data can handle both historical and ________ data.
15. In the context of data storage, Data Warehouses are optimized for ________ data, while Big Data platforms are designed for handling large-scale, diverse data.
16. Big Data Analytics involves examining large datasets to uncover hidden ________, correlations, and other insights.
17. The sudden hype around Big Data Analytics is due to the potential for gaining valuable ________ and competitive advantage.
18. One of the greatest challenges that prevent businesses from capitalizing on Big Data is the difficulty in data ________ and management.
19. A significant challenge facing Big Data is the absence of skilled ________ to analyze and interpret the data.
20. Big Data Analytics is important because it helps businesses make ________ decisions.
Answers:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
D | B | C | D | C | B | D | D | C | A

11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
rows, columns | traditional | velocity | real-time | structured | patterns | insights | integration | professionals | informed
UNIT-II
Multiple Choice Questions
1. Which of the following is a key feature of Hadoop?
A) Real-time processing B) Batch processing C) Limited scalability D) Single-node architecture
2. What is one of the key advantages of using Hadoop?
A) High-cost B) Scalability C) Limited data storage D) Single point of failure
3. Which version of Hadoop introduced YARN (Yet Another Resource Negotiator)?
A) Hadoop 1.x B) Hadoop 2.x C) Hadoop 3.x D) Hadoop 4.x
4. Which of the following components is NOT part of the Hadoop ecosystem?
A) HDFS B) MapReduce C) SQL Server D) Hive
5. What is the primary need for Hadoop in modern data processing?
A) To handle small datasets B) To manage large volumes of unstructured data
C) To replace traditional databases D) To perform real-time analytics
6. In comparison to RDBMS, Hadoop is better suited for:
A) Structured data B) Real-time transactions
C) Unstructured and semi-structured data D) Small data sets
7. Which component of Hadoop is responsible for storing large datasets in a distributed manner?
A) HDFS B) YARN C) MapReduce D) HBase
8. Which of the following companies is known for its Hadoop distribution called Cloudera?
A) Microsoft B) IBM C) Oracle D) Cloudera Inc.
9. The history of Hadoop can be traced back to which project?
A) Apache Storm B) Apache Spark C) Apache Nutch D) Apache Flink
10. Which of the following best describes HDFS (Hadoop Distributed File System)?
A) Centralized file storage system B) Distributed file storage system
C) Relational database management system D) Cloud-based storage system
Fill-in-the-Blank Questions
11. Hadoop is designed to handle ________ volumes of data.
12. The main advantage of Hadoop is its ability to ________ across many nodes.
13. Hadoop 2.x introduced a new resource management layer called ________.
14. The Hadoop ecosystem includes various tools such as Pig, Hive, and ________.
15. A key need for Hadoop arises from the increasing amount of ________ data generated by modern applications.
16. Unlike RDBMS, Hadoop is capable of processing both structured and ________ data.
17. The primary storage system used in Hadoop is called ________.
18. One of the challenges in distributed computing is the efficient management of ________ data across multiple nodes.
19. The development of Hadoop was inspired by the distributed computing research papers published by ________.
20. HDFS stores data by breaking it down into smaller blocks and distributing them across multiple ________.
Answers:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
B | B | B | C | B | C | A | D | C | B

11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
large | scale | YARN | HBase | unstructured | unstructured | HDFS | large | Google | nodes
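
Study note: the idea behind question 20 (a file is broken into blocks and the blocks are spread over nodes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HDFS code: the function names, the node names, and the simple round-robin placement are hypothetical, and real HDFS uses a rack-aware replica placement policy rather than round-robin. The values 128 MB and 3 are used because they are common defaults in recent Hadoop versions.

```python
BLOCK_SIZE_MB = 128   # assumed block size for this sketch
REPLICATION = 3       # assumed replication factor for this sketch


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign every block to `replication` distinct nodes.

    Round-robin placement is an assumption made for illustration only;
    it is NOT the rack-aware strategy that HDFS actually applies.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)            # a hypothetical 300 MB file
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
    for block_id, holders in layout.items():
        print(block_id, blocks[block_id], holders)
```

Because each block lives on several nodes, losing one node does not lose the data, which is the point of the Data Replication question in the descriptive section.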

UNIT-III
Multiple Choice Questions
1. In Hadoop, which component is responsible for splitting input data into smaller chunks for processing?
A) Reducer B) Combiner C) Partitioner D) Mapper
2. What is the primary function of a Reducer in Hadoop MapReduce?
A) Splitting input data B) Sorting and shuffling data
C) Processing intermediate data and producing the final output
D) Combining intermediate data
3. Which of the following is a key advantage of NoSQL databases?
A) Fixed schema B) Scalability C) Limited data types D) SQL compatibility
4. Which type of NoSQL database is designed to store data in a document-oriented format?
A) Key-Value Store B) Document Store C) Column Store D) Graph Database
5. What is one of the main use cases of NoSQL databases in the industry?
A) Storing relational data B) Managing large-scale, unstructured data
C) Performing complex joins D) Ensuring data integrity
6. Which term refers to a modern database management system that combines the benefits of traditional SQL and NoSQL?
A) MySQL B) NewSQL C) PostgreSQL D) SQL++
7. In Hadoop MapReduce, which component is responsible for distributing the intermediate data output from the Mappers?
A) Reducer B) Partitioner C) Combiner D) Mapper
8. Which of the following best describes a Combiner in Hadoop MapReduce?
A) A mini-reducer that performs local aggregation B) A component that partitions data
C) A component that splits input data D) A tool for data storage
9. What is a key difference between SQL and NoSQL databases?
A) SQL databases are schema-less B) NoSQL databases use fixed schemas
C) SQL databases support ACID transactions D) NoSQL databases do not scale well
10. NewSQL databases are designed to provide the scalability of NoSQL databases while maintaining the ________ of SQL databases.
A) flexibility B) consistency C) simplicity D) security
Fill-in-the-Blank Questions
11. Hadoop MapReduce is a programming model used for processing ________ data.
12. The Mapper in Hadoop processes input data and generates intermediate ________ pairs.
13. A Reducer in Hadoop takes intermediate key-value pairs and produces the ________ output.
14. Combiners in Hadoop MapReduce are used to perform local ________ of intermediate data to optimize performance.
15. Partitioners in Hadoop ensure that the intermediate data from the Mappers is evenly distributed to the ________.
16. NoSQL databases are designed to handle ________ and semi-structured data.
17. A key advantage of NoSQL databases is their ability to scale ________.
18. In industry, NoSQL databases are often used for applications that require high ________ and performance.
19. SQL databases are known for their support of ACID transactions, while NoSQL databases are known for their ________ consistency model.
20. NewSQL databases aim to combine the scalability of NoSQL with the transactional ________ of SQL.
Answers:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
D | C | B | B | B | B | B | A | C | B

11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
large | key-value | final | aggregation | Reducers | unstructured | horizontally | scalability | eventual | consistency
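
Study note: the Mapper, Combiner, Partitioner and Reducer roles asked about in this unit can be sketched as an in-memory word count in plain Python. This is a minimal illustration, not Hadoop code: the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`), the two-reducer setup and the sample input are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit an intermediate (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: local aggregation of one mapper's output (a 'mini-reducer')."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: merge all values for one key into the final output."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # One bucket per reducer; each bucket groups values by key (the shuffle).
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one input split
        for key, value in combine(map_phase(line)):
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # keys reach each reducer in sorted order
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    sample = ["big data big analytics", "data data hadoop"]
    print(run_job(sample))
```

The combiner shrinks each mapper's output before the shuffle, and the partitioner guarantees that every occurrence of a key reaches the same reducer, which is why the per-key sums come out correct.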

UNIT-IV
Multiple Choice Questions:

1. What is MongoDB?
A) A relational database management system B) A NoSQL database
C) An in-memory database D) A traditional SQL database
2. Why is MongoDB necessary in modern applications?
A) To handle small amounts of structured data B) To manage large volumes of unstructured data
C) To replace all relational databases D) To perform real-time analytics only
3. Which term is used in MongoDB to refer to a single entry in a collection?
A) Row B) Column C) Document D) Table
4. In MongoDB, which data type is used to store binary data?
A) String B) Binary data C) ObjectID D) Array
5. What is the equivalent of a table in RDBMS in MongoDB?
A) Row B) Collection C) Field D) Database
6. Which of the following is NOT a data type supported by MongoDB?
A) String B) Integer C) Blob D) Boolean
7. In MongoDB, which command is used to retrieve data from a collection?
A) SELECT B) FIND C) GET D) FETCH
8. Which term describes the unique identifier for a document in MongoDB?
A) Primary Key B) RowID C) ObjectID D) DocumentID
9. Which MongoDB query language feature is used to update existing documents?
A) UPDATE B) SET C) MODIFY D) UPDATE_ONE
10. What is a key advantage of using MongoDB over traditional RDBMS?
A) Fixed schema B) Horizontal scalability C) Limited data storage
D) High transaction overhead
Fill-in-the-Blank Questions
11. MongoDB is a NoSQL database that stores data in ________ format.
12. A collection in MongoDB is analogous to a ________ in a relational database.
13. The data type used to store dates in MongoDB is called ________.
14. In MongoDB, the equivalent of a row in RDBMS is referred to as a ________.
15. MongoDB uses a ________ model to store and manage data.
16. The command to insert a new document into a collection in MongoDB is ________.
17. In MongoDB, fields within a document are similar to ________ in RDBMS.
18. A key advantage of using MongoDB is its ability to handle ________ data.
19. The query language used by MongoDB is called the MongoDB ________ Language.
20. The ObjectID in MongoDB is a unique identifier for each ________ in a collection.
Answers:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
B | B | C | B | B | C | B | C | D | B

11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
JSON | table | Date | document | document | insertOne | columns | unstructured | Query | document

UNIT-V
Multiple Choice Questions:

1. What is the primary use of R programming?
A) Web development B) Statistical computing and graphics
C) Mobile app development D) Game development
2. Which operator is used for assignment in R?
A) = B) == C) -> D) <-
3. Which control statement in R is used to iterate over a sequence of elements?
A) if-else B) for loop C) switch D) while loop
4. How do you create a vector in R?
A) vector() B) c() C) list() D) matrix()
5. Which function is used to create a data frame in R?
A) data.frame() B) dataframe() C) frame.data() D) df()
6. What is a factor in R?
A) A numeric vector B) A character vector
C) A categorical data type used for fields that take on a limited number of unique values
D) A matrix
7. Which function in R is used to create a bar plot?
A) barplot() B) plot() C) hist() D) boxplot()
8. What does the apply function in R do?
A) It applies a function over the margins of an array or matrix B) It creates a list
C) It sorts a vector D) It merges data frames
9. Which package in R is commonly used for data visualization?
A) ggplot2 B) dplyr C) tidyr D) lubridate
10. How do you read a CSV file into R?
A) read.csv() B) csv.read() C) read.table() D) read.file()
Fill-in-the-Blank Questions
11. In R, the ________ function is used to concatenate elements into a vector.
12. Control statements such as ________ are used to control the flow of execution in R.
13. The function used to create a matrix in R is ________.
14. A data frame in R is similar to a ________ in other programming languages.
15. Lists in R can contain elements of different ________.
16. The function ________ is used to create histograms in R.
17. Factors in R are used to handle ________ data types.
18. The read.table() function in R is used to read ________ files.
19. Graphs in R can be created using the ________ package for enhanced data visualization.
20. The sapply function in R is a member of the ________ family of functions.
Answers:

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
B | D | B | B | A | C | A | A | A | A

11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
c() | if-else | matrix() | table | types | hist() | categorical | text | ggplot2 | apply
