Big Data and Hadoop - Syllabus
Course Objectives
This course introduces the fundamental concepts of cloud computing and lays a strong foundation in Apache Hadoop (a Big
Data framework).
1. The HDFS file system and the MapReduce framework are studied in detail.
2. Hadoop tools such as Hive and HBase, which provide database-style access to data stored in Hadoop, are also
covered as part of this course work.
3. Analyzing data with Unix tools.
4. Sorting; map-side and reduce-side joins.
5. Implementation; Java and MapReduce clients.
UNIT I
Introduction to Big Data. What is Big Data? Why Big Data is important. Meet Hadoop. Data.
Data storage and analysis. Comparison with other systems. Grid computing. A brief history of
Hadoop. Apache Hadoop and the Hadoop ecosystem. Linux refresher; VMware installation of
Hadoop.
UNIT II
The design of HDFS. HDFS concepts. Command-line interface to Hadoop Distributed File
System (HDFS). Hadoop file systems. Interfaces. Java interface to Hadoop. Anatomy of a file
read. Anatomy of a file write. Replica placement and coherency model. Parallel copying with
distcp. Keeping an HDFS cluster balanced.
UNIT III
Introduction. Analyzing data with Unix tools. Analyzing data with Hadoop. Java
MapReduce classes (new API). Data flow, combiner functions. Running a distributed
MapReduce job. Configuration API. Setting up the development environment. Managing
configuration. Writing a unit test with MRUnit. Running a job in a local job runner. Running on a
cluster. Launching a job. The MapReduce web UI.
UNIT IV
Classic MapReduce. Job submission. Job initialization. Task assignment. Task execution.
Progress and status updates. Job completion. Shuffle and sort on the map and reduce sides.
Configuration tuning. MapReduce types. Input formats. Output formats. Sorting. Map-side and
reduce-side joins.
UNIT V
The Hive shell. Hive services. Hive clients. The metastore. Comparison with traditional
databases. HiveQL. HBasics. Concepts. Implementation. Java and MapReduce clients. Loading
data, web queries.
Textbooks:
1. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly Publications, 2012
2. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, “Understanding Big Data:
Analytics for Enterprise Class Hadoop and Streaming Data”, McGraw-Hill Osborne Media, 1st Edition, 2011
REFERENCES:
1. http://www.cloudera.com/content/cloudera-content/clouderadocs/HadoopTutorial/CDH4/Hadoop-Tutorial.html
2. https://www.ibm.com/developerworks/community/blogs/SusanVisser/entry/flashbook_understanding_big_data_analytics_for_enterprise_class_hadoop_and_streaming_data?lang=en
Course Outcomes (after completing the course, students will be able to):
1. Understand the fundamentals of cloud and Big Data architectures.
2. Understand the HDFS file structure and the MapReduce framework, and use them to solve complex problems that
require massive computation power.
3. Use relational data in a Hadoop environment, using the Hive and HBase tools of the Hadoop ecosystem.
4. Understand the Hive shell.
5. Understand how Hive compares with traditional databases.