Big Data With Hadoop and Spark_2023-25
Course Objectives:
1. Understand the concepts of big data and its impact on businesses.
2. Learn about the Hadoop ecosystem and its components, such as HDFS, MapReduce, Hive,
Pig, and HBase.
3. Gain hands-on experience with Hadoop and Spark, including writing applications and running
them on a cluster.
4. Learn about the different types of big data analytics and how to use Hadoop and Spark to
perform them.
5. Be able to apply Hadoop and Spark to real-world big data problems.
Course Outcomes:
At the end of the course, students will be able to:
CO1: Apply advanced data processing skills.
CO2: Use scalable data storage and management techniques.
CO3: Explain distributed computing concepts.
CO4: Perform real-time data processing.
UNIT-I
UNIT-II
(b) HDFS Architecture
(c) Components of HDFS - NameNode, DataNode, Secondary NameNode
(e) HDFS Features - Fault Tolerance, Horizontal Scaling, Data Replication, Rack Awareness
(f) Anatomy of a file write on HDFS
(g) Anatomy of a file read on HDFS
(h) Hands-on with Hadoop HDFS, Web UI and Linux Terminal Commands
(i) HDFS File System Operations
(j) NameNode Metadata, File System Namespace, NameNode Operation
(k) Data Block Split
(l) Benefits of Data Block Approach
(m) Topology, Data Replication Representation
(n) HDFS Programming Basics – Java API
(o) Hadoop Configuration API
(p) HDFS API Overview
(q) When Hadoop is not suitable
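As a small illustration of the data block split and data replication topics listed in this unit, the following Python sketch works through the arithmetic. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable per cluster; the function names are hypothetical and are not part of any Hadoop API.

```python
# Assumed values: common HDFS defaults, configurable on a real cluster.
BLOCK_SIZE_MB = 128
REPLICATION_FACTOR = 3


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in whole MB) of the blocks a file is split into.

    Every block is full-sized except possibly the last one, which only
    holds the remaining data.
    """
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Total cluster storage used once every block is replicated."""
    return file_size_mb * replication


blocks = split_into_blocks(300)
print(blocks)               # [128, 128, 44]
print(raw_storage_mb(300))  # 900
```

Under these assumptions a 300 MB file occupies three blocks, and the last block uses only 44 MB rather than a full 128 MB.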
UNIT-III
(k) CREATE, ALTER, DROP, TRUNCATE, JOINS
(l) SerDe (Serialization / Deserialization)
(m) Partitions and Buckets
(n) Limitations of Hive
(o) SQL vs. Hive
(p) Different Formats like Avro, Parquet and ORC
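The partitions and buckets topic listed above can be sketched in plain Python. This is a simplified model, not Hive code: it assumes rows are grouped by the value of a partition column and then assigned to a bucket by taking an integer column modulo the number of buckets. The column names, the bucket count, and the `assign_rows` helper are all hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for illustration


def assign_rows(rows, partition_key, bucket_key, num_buckets=NUM_BUCKETS):
    """Group rows by (partition value, bucket number).

    Assumption: the bucket column holds non-negative integers and the
    bucket number is simply value % num_buckets.
    """
    layout = defaultdict(list)
    for row in rows:
        partition = row[partition_key]
        bucket = row[bucket_key] % num_buckets
        layout[(partition, bucket)].append(row)
    return dict(layout)


rows = [
    {"user_id": 1, "country": "IN"},
    {"user_id": 5, "country": "IN"},
    {"user_id": 2, "country": "US"},
]

layout = assign_rows(rows, "country", "user_id")
for (partition, bucket), members in sorted(layout.items()):
    print(partition, bucket, [r["user_id"] for r in members])
```

In this model a query that filters on `country` only needs to read one partition, while bucketing spreads the rows inside each partition across a fixed number of groups.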
UNIT-V
Reference Books:
1. Hadoop For Dummies, Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce
Brown, Rafael Coss.
2. Hadoop MapReduce Cookbook, Srinath Perera, Thilina Gunarathne.