PPT 2.2.1

Computer Science & Engineering
CHANDIGARH UNIVERSITY, MOHALI

Big Data Analytics
21CSH-471

By: Urvashi
Assistant Professor (Chandigarh University)
Contents to be covered in UNIT 2
UNIT-2: Big Data Technologies (Contact Hours: 15)

Chapter 1 - Big Data Frameworks: Hadoop, Apache Spark, and their comparison; NoSQL databases: MongoDB, Cassandra, and HBase; Big Data Visualization Tools: Tableau, Power BI, and Zeppelin; Real-Time Big Data Processing: Apache Storm and Flink; Emerging trends in Big Data Technologies.

Chapter 2 - Big SQL and NoSQL Databases: Overview of SQL vs. NoSQL: differences and use cases; Introduction to Big SQL: Big SQL features (scalability, support for structured and unstructured data), query optimization techniques in Big SQL; NoSQL database types: key-value stores (Redis, DynamoDB), document stores (MongoDB, CouchDB), column-family stores (Cassandra, HBase), graph databases (Neo4j); Advantages and limitations of Big SQL and NoSQL.

Chapter 3 - AI in Big Data: Introduction to IBM Watson: overview and capabilities of Watson AI, Watson's role in Big Data and decision-making; Key Watson services: Watson Discovery, Watson Studio, and Watson Assistant, integration of Watson with Big Data tools; AI and Machine Learning applications in Big Data: Natural Language Processing (NLP), sentiment analysis and predictive analytics.
Course Outcomes

CO1 Understand the Fundamentals of Big Data.

CO2 Master Big Data Architecture and Tools

CO3 Explore the Hadoop Ecosystem and Data Processing Models

CO4 Develop Data Science Skills and Tools

CO5 Implement Real-Time Data Analytics and Visualization

NoSQL databases and Big Data Features
Introduction
• NOSQL: "Not Only SQL"
• Most NOSQL systems are distributed databases or distributed storage systems
• Focus on semi-structured data storage, high performance, availability, data replication, and scalability
Introduction (cont'd.)
NOSQL systems focus on storage of “big data”
• Typical applications that use NOSQL
• Social media
• Web links
• User profiles
• Marketing and sales
• Posts and tweets
• Road maps and spatial data
• Email
Introduction to NOSQL Systems
• BigTable
• Google's proprietary NOSQL system
• Column-based or wide column store
• DynamoDB (Amazon)
• Key-value data store
• Cassandra (Facebook)
• Uses concepts from both key-value
store and column-based systems
Introduction to NOSQL Systems
Categories of NOSQL systems
• Document-based NOSQL systems
• NOSQL key-value stores
• Column-based or wide column NOSQL systems
• Graph-based NOSQL systems
• Hybrid NOSQL systems
• Object databases
• XML databases
The CAP Theorem
• Various levels of consistency exist among replicated data items
• Enforcing serializability is the strongest form of consistency
  • High overhead; can reduce read/write operation performance
• CAP theorem
  • Consistency, availability, and partition tolerance
  • Not possible to guarantee all three simultaneously in a distributed system with data replication
The CAP Theorem (cont'd.)
• Designer can choose two of three to guarantee
• Weaker consistency level is often
acceptable in NOSQL distributed data store
• Guaranteeing availability and partition
tolerance more important
• Eventual consistency often adopted
Document-Based NOSQL Systems and MongoDB
• Document stores
• Collections of similar documents
• Individual documents resemble complex objects
or XML documents
• Documents are self-describing
• Can have different data elements
• Documents can be specified in various formats
• XML
• JSON
MongoDB Data Model
• Documents stored in binary JSON (BSON) format
• Individual documents stored in a collection
• Example command: db.createCollection(<collection_name>, <options>)
  • First parameter specifies the name of the collection
  • Collection options include limits on the size and number of documents
• Each document in a collection has a unique ObjectId field called _id
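Below is a minimal pymongo sketch of creating a collection with size and document-count limits, as described above. The connection URI, the "company" database, and the "project" collection name are illustrative assumptions, not part of the slides.

from pymongo import MongoClient

# Assumed: a local MongoDB instance; adjust the URI for your deployment.
client = MongoClient("mongodb://localhost:27017")
db = client["company"]

# Capped collection: bounded both by total size in bytes and by document count.
projects = db.create_collection("project", capped=True, size=1_048_576, max=500)

# Every inserted document gets a unique ObjectId in its _id field if none is supplied.
result = projects.insert_one({"Pname": "ProductX", "Plocation": "Bellaire"})
print(result.inserted_id)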
MongoDB Data Model (cont'd.)
• A collection does not have a schema
• Structure of the data fields in documents is chosen based on how documents will be accessed
• User can choose a normalized or denormalized design
• Document creation using the insert operation:
  db.<collection_name>.insert(<document(s)>)
• Document deletion using the remove operation:
  db.<collection_name>.remove(<condition>)
Figure 24.1 Example of simple documents in MongoDB

(a) Denormalized project document with an array of embedded workers:
{
  _id: "P1",
  Pname: "ProductX",
  Plocation: "Bellaire",
  Workers: [
    { Ename: "John Smith", Hours: 32.5 },
    { Ename: "Joyce English", Hours: 20.0 }
  ]
}

(b) Project document with an embedded array of worker ids (document references):
{
  _id: "P1",
  Pname: "ProductX",
  Plocation: "Bellaire",
  WorkerIds: [ "W1", "W2" ]
}
{ _id: "W1", Ename: "John Smith", Hours: 32.5 }
{ _id: "W2", Ename: "Joyce English", Hours: 20.0 }

(c) Normalized project and worker documents (not a fully normalized design for M:N relationships):
{ _id: "P1", Pname: "ProductX", Plocation: "Bellaire" }
{ _id: "W1", Ename: "John Smith", ProjectId: "P1", Hours: 32.5 }
{ _id: "W2", Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 }

(d) Inserting the documents in Figure 24.1(c) into their collections "project" and "worker":
db.project.insert( { _id: "P1", Pname: "ProductX", Plocation: "Bellaire" } )
db.worker.insert( [
  { _id: "W1", Ename: "John Smith", ProjectId: "P1", Hours: 32.5 },
  { _id: "W2", Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 }
] )
MongoDB Distributed Systems Characteristics
• Two-phase commit method
  • Used to ensure atomicity and consistency of multidocument transactions
• Replication in MongoDB
  • Concept of a replica set to create multiple copies on different nodes
  • Variation of the master-slave approach
  • Primary copy, secondary copies, and arbiter
    • Arbiter participates in elections to select a new primary if needed
MongoDB Characteristics (cont'd.)
• Sharding in MongoDB
  • Partitioning field (shard key) must exist in every document in the collection and must have an index
  • Range partitioning
    • Creates chunks by specifying a range of key values
    • Works best with range queries
  • Hash partitioning
    • Partitioning based on the hash value of the shard key
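Below is a hedged sketch of declaring shard keys from Python. It assumes an already-deployed sharded cluster reached through a mongos router at the URI shown, and it reuses the illustrative company.worker collection; none of these names come from the slides.

from pymongo import MongoClient

# Assumed: a mongos router of an existing sharded cluster.
client = MongoClient("mongodb://localhost:27017")

# Enable sharding for the (illustrative) "company" database.
client.admin.command("enableSharding", "company")

# Range partitioning: chunks are created over ranges of the shard key ProjectId.
client.admin.command("shardCollection", "company.worker", key={"ProjectId": 1})

# Hash partitioning would instead declare a hashed shard key, e.g.:
# client.admin.command("shardCollection", "company.worker", key={"ProjectId": "hashed"})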
NOSQL Key-Value Stores
• Key-value stores focus on high performance, availability, and scalability
• Can store structured, unstructured, or semi-structured data
• Key: unique identifier associated with a data item
  • Used for fast retrieval
• Value: the data item itself
  • Can be a string or array of bytes
  • Application interprets the structure
• No query language
DynamoDB Overview
• DynamoDB part of Amazon's Web Services/SDK
platforms
• Proprietary
• Table holds a collection of self-describing items
• Item consists of attribute-value pairs
• Attribute values can be single or multi-valued
• Primary key used to locate items within a table
• Can be single attribute or pair of attributes
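Below is a minimal boto3 sketch of the item and primary-key ideas above. The Worker table, its single-attribute partition key Ename, and the region are assumptions made for illustration (the table must already exist).

import boto3

# Assumed: an existing DynamoDB table named "Worker" with partition key "Ename".
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Worker")

# An item is a set of attribute-value pairs; attributes can be single- or multi-valued.
table.put_item(Item={
    "Ename": "John Smith",
    "ProjectIds": ["P1", "P2"],   # multi-valued attribute
    "Hours": 32,
})

# The primary key (here the single attribute Ename) locates an item within the table.
response = table.get_item(Key={"Ename": "John Smith"})
print(response.get("Item"))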
Voldemort Key-Value Distributed Data Store
• Voldemort open source key-value system similar
to DynamoDB
• Voldemort features
• Simple basic operations (get, put, and delete)
• High-level formatted data values
• Consistent hashing for distributing (key,
value) pairs
• Consistency and versioning
• Concurrent writes allowed
• Each write associated with a vector clock
Figure 24.2 Example of consistent hashing
(a) Ring with three nodes A, B, and C, with C having greater capacity. The h(K) values that map to circle points in range 1 have their (k, v) items stored in node A, range 2 in node B, and range 3 in node C.
(b) Adding a node D to the ring. Items in range 4 are moved to node D from node B (range 3 is reduced) and from node C (range 2 is reduced).
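Below is a small, self-contained Python sketch of consistent hashing in the spirit of Figure 24.2; the hash function, node names, and the number of points per node are illustrative choices, not Voldemort's actual implementation.

import bisect
import hashlib

def h(key):
    # Map a string key to a point on the ring (first 8 hex digits of its MD5 hash).
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

class ConsistentHashRing:
    def __init__(self, nodes, points_per_node=3):
        # Each node is placed at several points on the ring; a higher-capacity node
        # (like node C in Figure 24.2) could simply be given more points.
        self.ring = sorted((h(f"{node}#{i}"), node)
                           for node in nodes for i in range(points_per_node))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # A (k, v) item is stored on the first node clockwise from the key's hash point.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["A", "B", "C"])
for k in ["W1", "W2", "P1"]:
    print(k, "is stored on node", ring.node_for(k))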
Examples of Other Key-Value Stores
• Oracle key-value store
  • Oracle NOSQL Database
• Redis key-value cache and store (see the sketch after this list)
  • Caches data in main memory to improve performance
  • Offers master-slave replication and high availability
  • Offers persistence by backing up the cache to disk
• Apache Cassandra
  • Offers features from several NOSQL categories
  • Used by Facebook and others
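Below is a tiny redis-py sketch of the Redis cache-and-store usage described above; the local server address and the key names are assumptions for illustration.

import redis

# Assumed: a Redis server running locally on the default port.
r = redis.Redis(host="localhost", port=6379)

# Keys map to opaque values; the application interprets the structure.
r.set("session:ansh", "logged-in", ex=3600)   # cached value, expires after one hour
r.set("worker:W1", '{"Ename": "John Smith", "Hours": 32.5}')

print(r.get("worker:W1"))   # b'{"Ename": "John Smith", "Hours": 32.5}'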
NOSQL Systems: Column-Based or Wide Column Stores
• BigTable: Google's distributed storage system for big data
  • Used in Gmail
  • Uses the Google File System for data storage and distribution
• Apache HBase is a similar, open-source system
  • Uses the Hadoop Distributed File System (HDFS) for data storage
  • Can also use Amazon's Simple Storage Service (S3)
Reference Books

TEXT BOOKS
1. Mohammed Guller, "Big Data Analytics with Spark", Apress, 2015.
2. Tom Mitchell, "Machine Learning", McGraw Hill, 3rd Edition, 1997.
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Business", 1st Edition, Wiley CIO Series, 2013.
4. Arvind Sathi, "Big Data Analytics: Disruptive Technologies for Changing the Game", 1st Edition, IBM Corporation, 2012.

REFERENCE BOOKS
5. Chris Eaton, Dirk deRoos et al., "Understanding Big Data", McGraw Hill, 2012.
6. Vignesh Prajapati, "Big Data Analytics with R and Hadoop", Packt Publishing, 2013.
7. Jay Liebowitz, "Big Data and Business Analytics", CRC Press, 2013.
For more insight
Web sources:
1. https://www.alliant.edu/blog/4-top-online-resources-data-analytics
2. https://www.coursera.org/articles/big-data-technologies
3. https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/
THANK YOU

For queries
Email: [email protected]
