Model Question Paper: Big Data, 2024-25 (KCA022)
UNIT-1
1. What is Big Data? Why do we need to analyze big data?
2. What are the benefits of big data?
3. Discuss the challenges of big data.
4. How will big data analytics be useful in the development of smart cities?
5. Discuss big data in terms of volume and velocity.
6. What are the different types of big data technologies?
7. List and discuss the various dimensions of big data. Explain industry
examples of big data in detail.
8. Explain big data and Hadoop open-source technology.
9. Discuss and differentiate structured, unstructured and semi-structured data.
Give proper examples.
10. Explain the four 'V's of big data with suitable examples. Discuss how big data
analytics can be useful in the development of smart transport.
11. Discuss how cloud computing and big data are related to each other.
12. How does a Hadoop system analyze data? Explain your answer with an example.
UNIT-2
13. How does Hadoop work? What are the advantages of Hadoop? What are the
different modes in which Hadoop can be installed, and what is the use of each
mode from an application and developer point of view?
14. List the tools related to Hadoop.
15. What is MapReduce? Explain, with a neat sketch, the processing of a job
in Hadoop.
16. Explain the stages of MapReduce program execution.
17. Explain the anatomy of a MapReduce job run.
18. Define the role of the combiner and partitioner in a MapReduce application.
19. Specify the role of the JobTracker and TaskTracker in Hadoop MapReduce.
20. Explain the shuffle and sort phase and the reducer phase in MapReduce.
21. Explain the role of driver code, mapper code and reducer code within the
MapReduce programming model, using a suitable example.
22. Briefly explain the input and output formats in MapReduce.
23. What are the various operational modes of Hadoop cluster configuration?
Explain in detail how to configure and install Hadoop in fully distributed
mode.
24. Explain the implementation of the MapReduce concept with a small
example.
25. Discuss the role of the JobTracker and TaskTracker in processing data with Hadoop.
26. What is MapReduce? Explain the working of the various phases of MapReduce with
a word count example.
27. Explain the Hadoop architecture and its components with a proper diagram.
28. Discuss the different types and formats of MapReduce with examples.
29.
UNIT-3
30. Draw and explain the HDFS architecture. Explain the functions of the NameNode and
DataNode. What is a Secondary NameNode? Is it a substitute for the NameNode?
31. How does HDFS ensure data integrity in a Hadoop cluster?
32. State the purpose of Hadoop Pipes.
33. Show how a client reads and writes data in HDFS. Give example code.
34. Discuss the design and concepts of the Hadoop Distributed File System (HDFS) in
detail.
35. Explain Avro file-based data structures in detail.
36. Describe the working procedure of HDFS and explain its features.
37. Give commands with appropriate arguments to perform data transfer between
the local file system and HDFS.
38. With a suitable block diagram, explain the architecture of HDFS.
39. Discuss the role of the DataNode and NameNode in HDFS.
UNIT-4
1. Write a short note on NoSQL databases. Compare and contrast NoSQL and
relational databases.
2. Describe graph databases and schemaless databases.
3. Explain aggregate data models. List four advantages and disadvantages of
aggregate-oriented databases.
4. Explain master-slave and peer-to-peer replication in detail.
5. List the entities of YARN. What are the limitations of classic
MapReduce? Compare classic MapReduce with YARN. Discuss Hadoop YARN in
detail.
6. Distinguish between the old and new versions of the Hadoop API for the
MapReduce framework.
7. What is a NoSQL database? Discuss the key characteristics and advantages of
NoSQL databases.
8. Write a short note on the Hadoop ecosystem.
9. What are transformations and actions in Spark? Explain with examples.
10. Discuss the limitations of Hadoop and how they are overcome in Apache Spark.
11. Write a short note on the Spark stack. Give a brief explanation of each component.
12. What is an RDD? Explain the role of RDDs in Spark.
13. Differentiate between SQL and NoSQL databases. What are the applications of
NoSQL databases?
14. Discuss Spark Streaming with a suitable example, such as analyzing tweets from
Twitter.
15. What is MongoDB? Discuss the important features of MongoDB.
16. Discuss the different types of NoSQL databases with proper examples.
17. Explain basic CRUD operations in MongoDB with examples.
18. Explain database, collection, document and field with respect to MongoDB.
Also give the equivalent term for each in an RDBMS.
19. Explain the use of aggregate functions in MongoDB with a suitable example.
20. List the entities of YARN.
UNIT-5
21. Write a note on the use of ZooKeeper.
21. Write in detail about the HBase data model and the Pig data model.
22. What is the necessity of Pig Latin?
23. What are the components of the Pig execution environment?
24. Explain the various data types supported by Pig in its data model, with
an example.
25. Explain the storage mechanism in HBase. Write a query to create a table in
HBase.
26. Explain the metastore in Hive.
27. Explain the architecture of Hive with a neat sketch.
28. What are views in Hive?
29. What is the difference between internal and external tables in Hive?
30. Explain the various data types supported by HiveQL with an example.
31. Define the various file formats supported by Hive. Discuss the queries
involved in Hive data definition.
32. Explain the Cassandra data model with examples. How is Cassandra integrated
with Hadoop?
33. Explain the operators supported by Pig w.r.t. data access, transformations and
debugging operations.
34. How are Pig programs packaged? Explain the modes of running a Pig
script with a neat sketch.
35. Write example Hive queries for natural join and outer join.
36. What is HBase? Differentiate between HBase and an RDBMS. Explain HBase, its
data model, and its implementation.
37. Explain in detail Hive data manipulation, queries, data definition,
and data types.
38. Write a short note on Apache Pig.
39. What is HiveQL? Explain the various statements in HiveQL with examples.