Big Data (KCS-061)
Course Outcome (CO) | Bloom's Knowledge Level (KL)
At the end of the course, the student will be able to:
CO1 | Demonstrate knowledge of Big Data Analytics concepts and their applications in business. | K1, K2
CO2 | Demonstrate functions and components of Map Reduce Framework and HDFS. | K1, K2
CO3 | Discuss Data Management concepts in NoSQL environment. | K6
CO4 | Explain process of developing Map Reduce based distributed processing applications. | K2, K5
CO5 | Explain process of developing applications using HBase, Hive, Pig etc. | K2, K5
DETAILED SYLLABUS
3-0-0
Unit | Topic | Proposed Lecture
I | Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction to Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance and applications, Big Data features - security, compliance, auditing and protection, Big Data privacy and ethics, Big Data Analytics, challenges of conventional systems, intelligent data analysis, nature of data, analytic processes and tools, analysis vs reporting, modern data analytic tools. | 06
II | Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System, components of Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop streaming, Hadoop pipes, Hadoop Ecosystem. | 08
 | Map Reduce: Map Reduce framework and basics, how Map Reduce works, developing a Map Reduce application, unit tests with MRUnit, test data and local tests, anatomy of a Map Reduce job run, failures, job scheduling, shuffle and sort, task execution, Map Reduce types, input formats, output formats, Map Reduce features, real-world Map Reduce. |
III | HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and challenges, file sizes, block sizes and block abstraction in HDFS, data replication, how HDFS stores, reads, and writes files, Java interfaces to HDFS, command line interface, Hadoop file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives, Hadoop I/O: compression, serialization, Avro and file-based data structures. |
 | Hadoop Environment: Setting up a Hadoop cluster, cluster specification, cluster setup and installation, Hadoop configuration, security in Hadoop, administering Hadoop, HDFS monitoring & maintenance, Hadoop benchmarks, Hadoop in the cloud. |
IV | Hadoop Eco System and YARN: Hadoop ecosystem components, schedulers, fair and capacity, Hadoop 2.0 New Features - NameNode high availability, HDFS federation, MRv2, YARN, running MRv1 in YARN. |
 | NoSQL Databases: Introduction to NoSQL |
 | MongoDB: Introduction, data types, creating, updating and deleting documents, querying, introduction to indexing, capped collections |
 | Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed Datasets, anatomy of a Spark job run, Spark on YARN |
 | SCALA: Introduction, classes and objects, basic types and operators, built-in control structures, functions and closures, inheritance. |
V | Hadoop Eco System Frameworks: Applications on Big Data using Pig, Hive and HBase | 09
 | Pig - Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing operators. |
 | Hive - Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore, comparison with traditional databases, HiveQL, tables, querying data and user-defined functions, sorting and aggregating, Map Reduce scripts, joins & subqueries. |
 | HBase - HBase concepts, clients, example, HBase vs RDBMS, advanced usage, schema design, advanced indexing. |
 | Zookeeper - how it helps in monitoring a cluster, how to build applications with Zookeeper. |
 | IBM Big Data strategy, introduction to Infosphere, BigInsights and BigSheets, introduction to Big SQL. |
Text books and References:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley
2. Big-Data Black Book, DT Editorial Services, Wiley
3. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, "Understanding Big Data Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill
4. Thomas Erl, Wajid Khattak, Paul Buhler, "Big Data Fundamentals: Concepts, Drivers and Techniques", Prentice Hall
5. Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and its Applications (WILEY Big Data Series)", John Wiley & Sons
6. Arshdeep Bahga, Vijay Madisetti, "Big Data Science & Analytics: A Hands-On Approach", VPT
7. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets", CUP
8. Tom White, "Hadoop: The Definitive Guide", O'Reilly
9. Eric Sammer, "Hadoop Operations", O'Reilly
10. Chuck Lam, "Hadoop in Action", Manning Publishers
11. Deepak Vohra, "Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools", Apress
12. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly
13. Lars George, "HBase: The Definitive Guide", O'Reilly
14. Alan Gates, "Programming Pig", O'Reilly
15. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer
16. Bill Franks, "Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics", John Wiley & Sons
17. Glenn J. Myatt, "Making Sense of Data", John Wiley & Sons
18. Pete Warden, "Big Data Glossary", O'Reilly

ABES ENGINEERING COLLEGE, GHAZIABAD
Department of Computer Science
Lecture Plan
Program | Sem | Course Name | Course Code | L | T | P | Sessional CT | Sessional TA | Sessional Total | ESE | Total | Credit
 | VI | Big Data | KCS-061 | 3 | 0 | 0 | 30 | 20 | 50 | 100 | 150 | 3

L | T | P | Name of Faculty | Total Lectures Planned
3 | 0 | 0 | Dr Pankaj Kumar Sharma; Mr. Ashwin Pert | 40
Schedule | Name of the Topic as given in the Syllabus | KL
UNIT-I | Introduction to Big Data | K2
L1 | Types of digital data, history of Big Data innovation, introduction to Big Data platform, drivers for Big Data |
L2 | Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components |
L3 | Big Data importance and applications, Big Data features - Security, Compliance, auditing and protection, Big Data privacy and ethics. | K2 | Assign, Quiz
L4 | Big Data Analytics, Challenges of conventional systems, intelligent data analysis |
L5 | Nature of data, analytic processes and tools | K1
L6 | Analysis vs reporting, modern data analytic tools. | K2
UNIT-II | Basic Structural Modeling and Behavioral Modeling | K2, K3
L7 | Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System |
L8 | Components of Hadoop, data format, analyzing data with Hadoop. | K2
L9 | Scaling out, Hadoop streaming | K2
L10 | Hadoop pipes, Hadoop Ecosystem. | K3
L11 | Map Reduce: Map Reduce framework and basics, how Map Reduce works, developing a Map Reduce application |
L12 | Unit tests with MRUnit, test data and local tests | K2
L13 | Anatomy of a Map Reduce job run, failures, job scheduling, shuffle and sort, task execution |
L14 | Map Reduce types, input formats, output formats, Map Reduce features, Real-world Map Reduce. |
Sessional Test
UNIT-III | Object Oriented Analysis; Structured analysis and structured design (SA/SD) |
L15 | HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and challenges, file sizes, block sizes and block abstraction in HDFS, data replication | K2
L16 | How does HDFS store, read, and write files | K2
L17 | Java interfaces to HDFS, command line interface, Hadoop file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives | K3
L18 | Hadoop I/O: compression, serialization, Avro and file-based data structures. | K2
L19 | Hadoop Environment: Setting up a Hadoop cluster, cluster specification |
L20 | Cluster setup and installation, Hadoop configuration, security in Hadoop |
L21 | Administering Hadoop, HDFS | K2
L22 | monitoring & maintenance, Hadoop benchmarks, Hadoop in the cloud | K2
UNIT-IV | Hadoop Eco System |
L23 | Hadoop Eco System and YARN: Hadoop ecosystem components, schedulers, fair and capacity | K2
L24 | Hadoop 2.0 New Features - NameNode high availability | K2
L25 | HDFS federation, MRv2 | K2, K3
L26 | YARN, Running MRv1 in YARN. | K2, K3
L27 | NoSQL Databases: Introduction to NoSQL | K2, K3
L28 | MongoDB: Introduction, data types, creating, updating and deleting documents, querying, introduction to indexing, capped collections | K2, K3 | Quiz 4
L29 | Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed |
L30 | Datasets, anatomy of a Spark job run, Spark on YARN | K2, K3
L31 | SCALA: Introduction, classes and objects, basic types and operators, built-in control structures, functions and closures, inheritance. |
Test
UNIT-V | Hadoop Eco System Framework |
L32 | Hadoop Eco System Frameworks: Applications on Big Data using Pig, Hive and HBase |
L33 | Pig - Introduction to Pig, Execution Modes of Pig | K2, K3
L34 | Comparison of Pig with Databases, Grunt, Pig Latin | K2, K3
L35 | User Defined Functions, Data Processing operators |
L36 | Hive - Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore, comparison with traditional databases. | K2, K3
L37 | HiveQL, tables, querying data and user-defined functions, sorting and aggregating, Map Reduce scripts, joins & subqueries. | K2, K3
L38 | HBase - HBase concepts, clients, example, HBase vs RDBMS, advanced usage, schema design, advanced indexing | K2, K3
L39 | Zookeeper - how it helps in monitoring a cluster, how to build applications with Zookeeper. | K2, K3
L40 | IBM Big Data strategy, introduction to Infosphere, BigInsights and BigSheets, introduction to Big SQL. | K2, K3
PRE-UNIVERSITY EXAMINATION.
KL - Bloom's Knowledge Level (K1, K2, K3, K4, K5, K6)
K1 - Remember, K2 - Understand, K3 - Apply, K4 - Analyze, K5 - Evaluate, K6 - Create
Text Books:
T1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley
T2. Big-Data Black Book, DT Editorial Services, Wiley
T3. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, "Understanding Big Data Analytics for Enterprise Class Hadoop and Streaming Data", McGraw-Hill
T4. Thomas Erl, Wajid Khattak, Paul Buhler, "Big Data Fundamentals: Concepts, Drivers and Techniques", Prentice Hall
T5. Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and its Applications (WILEY Big Data Series)", John Wiley & Sons
T6. Arshdeep Bahga, Vijay Madisetti, "Big Data Science & Analytics: A Hands-On Approach", VPT
T7. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets", CUP
T8. Tom White, "Hadoop: The Definitive Guide", O'Reilly
T9. Eric Sammer, "Hadoop Operations", O'Reilly
Reference Books:
R1. Chuck Lam, "Hadoop in Action", Manning Publishers
R2. Deepak Vohra, "Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools", Apress
R3. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly
R4. Lars George, "HBase: The Definitive Guide", O'Reilly
R5. Alan Gates, "Programming Pig", O'Reilly
R6. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer
R7. Bill Franks, "Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics", John Wiley & Sons
R8. Glenn J. Myatt, "Making Sense of Data", John Wiley & Sons
R9. Pete Warden, "Big Data Glossary", O'Reilly
Web references:
Cloud-Scale Analytics | Microsoft Azure
What is ... | Amazon Web Services (AWS)