Assignment 2

The document outlines an assignment for Big Data at Sunderdeep Engineering College, detailing guidelines for submission, including the requirement to answer 10 specific questions related to HDFS, YARN, NoSQL databases, Spark, Scala, and Hadoop ecosystem frameworks. It emphasizes originality, citation of references, and includes a due date of May 23, 2025. Each question requires in-depth explanations and comparisons of various concepts and technologies in the field of Big Data.

Sunderdeep Engineering College, Ghaziabad

Assignment: Big Data (BCS-061)

Faculty Name: Ms. Vandana Sharma


Guidelines for Assignment Submission:

- Answer all 10 questions. Include diagrams and tables where necessary.
- Work must be original. Cite any references used.
- Due Date: 23rd May 2025.

1. Explain the design of HDFS. What are the key concepts behind HDFS, such as block
sizes, data replication, and block abstraction? How does HDFS ensure fault tolerance and
scalability?
2. Describe how HDFS stores, reads, and writes files. How does HDFS achieve high
throughput when handling large datasets? Explain the data flow in HDFS from the
client’s perspective.
3. What are the key differences between the Hadoop Distributed File System (HDFS)
command-line interface and its Java interface? How can you interact with HDFS using
both the command line and Java?
4. Describe the steps involved in setting up a Hadoop cluster. What are the main
configurations that need to be considered during Hadoop installation? How do you ensure
security in a Hadoop environment?
5. Explain the role of YARN in the Hadoop ecosystem. How does YARN improve
resource management in Hadoop 2.0? What are the main differences between MRv1 and
MRv2?
6. What are the key characteristics of NoSQL databases? Explain how MongoDB fits into
the NoSQL landscape. How do you create, update, delete, and query documents in
MongoDB?
7. Describe the concept of Resilient Distributed Datasets (RDDs) in Spark. How do Spark
applications, jobs, stages, and tasks work in the context of distributed data processing?
8. Provide an overview of the basic syntax and concepts in Scala. How does Scala support
object-oriented and functional programming? Describe the use of functions, closures, and
inheritance in Scala.
9. Compare and contrast the three Hadoop ecosystem frameworks: Pig, Hive, and HBase.
How do they differ in terms of data processing, querying, and storage? Provide examples
of their use cases.
10. What is ZooKeeper, and how does it help in monitoring a Hadoop cluster? Explain its role
in coordination and configuration management for distributed applications in a cluster
environment.
