PPT 2.2.1
By: Urvashi
Chapter 1: Big Data Frameworks
Big Data Frameworks: Hadoop, Apache Spark, and their comparison; NoSQL databases: MongoDB, Cassandra, and HBase; Big Data Visualization Tools: Tableau, Power BI, and Zeppelin; Real-Time Big Data Processing: Apache Storm and Flink; Emerging trends in Big Data Technologies.
Chapter 2: Big SQL and NoSQL Databases
Overview of SQL vs. NoSQL: Differences and Use Cases; Introduction to Big SQL: Big SQL Features (scalability, support for structured and unstructured data), Query optimization techniques in Big SQL; NoSQL Database Types: Key-Value stores (Redis, DynamoDB), Document stores (MongoDB, CouchDB), Column-family stores (Cassandra, HBase), Graph Databases (Neo4j); Advantages and limitations of Big SQL and NoSQL.
Chapter 3: AI in Big Data
Introduction to IBM Watson: Overview and capabilities of Watson AI, Watson's role in Big Data and decision-making; Key Watson Services: Watson Discovery, Watson Studio, and Watson Assistant; Integration of Watson with Big Data tools; AI and Machine Learning Applications in Big Data: Natural Language Processing (NLP), Sentiment Analysis, and Predictive Analytics.
Course Outcomes
NoSQL databases and Big Data Features
Introduction
• NOSQL
• Not only SQL
• Most NOSQL systems are distributed databases
or distributed storage systems
• Focus on semi-structured data storage,
high performance, availability, data replication,
and scalability
Introduction (cont'd.)
NOSQL systems focus on storage of “big data”
• Typical applications that use NOSQL
• Social media
• Web links
• User profiles
• Marketing and sales
• Posts and tweets
• Road maps and spatial data
• Email
Introduction to NOSQL Systems
• BigTable
• Google's proprietary NOSQL system
• Column-based or wide column store
• DynamoDB (Amazon)
• Key-value data store
• Cassandra (Facebook)
• Uses concepts from both key-value
store and column-based systems
Introduction to NOSQL Systems
Categories of NOSQL systems
• Document-based NOSQL systems
• NOSQL key-value stores
• Column-based or wide column NOSQL systems
• Graph-based NOSQL systems
• Hybrid NOSQL systems
• Object databases
• XML databases
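To make the four main categories more concrete, here is a sketch of one hypothetical worker record stored the way each category would typically hold it. Plain Python dictionaries stand in for real stores, and all names (`kv_store`, `hours_kv`, and so on) are illustrative assumptions; hybrid, object, and XML databases are left out.

```python
import json

# One hypothetical worker record, shaped the way each NOSQL category would hold it.

# Key-value store: the value is opaque to the store and is found only by its key.
kv_store = {
    "worker:W1": json.dumps({"Ename": "John Smith", "ProjectId": "P1", "Hours": 32.5}),
}

# Document store: self-describing (JSON-like) documents that can nest other documents.
document_store = {
    "project": [
        {"_id": "P1", "Pname": "ProductX",
         "Workers": [{"Ename": "John Smith", "Hours": 32.5}]},
    ],
}

# Column-based (wide column) store: row key -> column family -> column -> value.
column_store = {
    "W1": {"info": {"Ename": "John Smith"},
           "work": {"ProjectId": "P1", "Hours": 32.5}},
}

# Graph store: nodes, plus labeled edges that can carry their own properties.
graph_store = {
    "nodes": {"W1": {"label": "Worker", "Ename": "John Smith"},
              "P1": {"label": "Project", "Pname": "ProductX"}},
    "edges": [("W1", "WORKS_ON", "P1", {"Hours": 32.5})],
}


def hours_kv(key):
    # The store only matches keys; the application decodes the value itself.
    return json.loads(kv_store[key])["Hours"]


def hours_document(ename):
    # Fields inside (nested) documents can be searched directly.
    for proj in document_store["project"]:
        for w in proj["Workers"]:
            if w["Ename"] == ename:
                return w["Hours"]
    return None


def hours_column(row_key):
    # Read one column from one column family of a row.
    return column_store[row_key]["work"]["Hours"]


def hours_graph(worker_id, project_id):
    # Follow a WORKS_ON relationship and read a property stored on the edge.
    for src, rel, dst, props in graph_store["edges"]:
        if (src, rel, dst) == (worker_id, "WORKS_ON", project_id):
            return props["Hours"]
    return None
```

Each lookup returns the same 32.5 hours; what differs is how much of the record's structure the store itself understands.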
The CAP Theorem:
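The CAP theorem says that when a network partition cuts replicas off from each other, a distributed system must give up either consistency (every read sees the latest write) or availability (every request gets a non-error answer). The toy two-replica store below is a minimal sketch of those two choices; `TwoReplicaStore` and its "CP"/"AP" modes are hypothetical and do not describe how any particular NOSQL product replicates data.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoReplicaStore:
    """Toy store with two replicas; mode is "CP" or "AP" (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("A")
        self.b = Replica("B")
        self.partitioned = False  # True while the replicas cannot talk to each other

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Normal case: the write reaches both replicas.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # The other replica is unreachable: refuse the write to stay consistent.
            return False
        # AP: accept the write locally, so the replicas may now disagree.
        replica.data[key] = value
        return True

    def consistent(self):
        return self.a.data == self.b.data
```

During a partition the "CP" store rejects the write (losing availability), while the "AP" store accepts it and its replicas diverge until they are reconciled.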
Figure 24.1 (cont'd.) Example of simple documents in MongoDB
(c) Normalized documents
{
  _id: "P1",
  Pname: "ProductX",
  Plocation: "Bellaire"
}
{
  _id: "W1",
  Ename: "John Smith",
  ProjectId: "P1",
  Hours: 32.5
}
{
  _id: "W2",
  Ename: "Joyce English",
  ProjectId: "P1",
  Hours: 20.0
}
(d) Inserting the documents in Figure 24.1(c) into their collections "project" and "worker"
// insertOne() adds a single document; insertMany() adds an array of documents
db.project.insertOne( { _id: "P1", Pname: "ProductX", Plocation: "Bellaire" } )
db.worker.insertMany( [
  { _id: "W1", Ename: "John Smith", ProjectId: "P1", Hours: 32.5 },
  { _id: "W2", Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 }
] )
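Because the documents in Figure 24.1(c) are normalized, each worker refers to its project through `ProjectId` instead of embedding the project. The sketch below uses plain Python lists as hypothetical stand-ins for the two collections to show what following that reference involves; inside MongoDB the equivalent join is usually written with the aggregation `$lookup` stage. The helper names are assumptions for illustration.

```python
# In-memory stand-ins for the "project" and "worker" collections.
project = [
    {"_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire"},
]
worker = [
    {"_id": "W1", "Ename": "John Smith", "ProjectId": "P1", "Hours": 32.5},
    {"_id": "W2", "Ename": "Joyce English", "ProjectId": "P1", "Hours": 20.0},
]


def workers_of(project_id):
    """Names of workers whose ProjectId reference points at project_id."""
    return [w["Ename"] for w in worker if w["ProjectId"] == project_id]


def project_with_workers(project_id):
    """Build an embedded (denormalized) view of one project, like a client-side join."""
    proj = next(p for p in project if p["_id"] == project_id)
    view = dict(proj)  # copy, so the stored project document is not modified
    view["Workers"] = [
        {"Ename": w["Ename"], "Hours": w["Hours"]}
        for w in worker
        if w["ProjectId"] == project_id
    ]
    return view
```

The normalized form avoids duplicating project data in every worker, at the cost of an extra lookup whenever a project and its workers are needed together.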
MongoDB Distributed Systems Characteristics
• Two-phase commit method
• Used to ensure atomicity and
consistency of multidocument transactions
• Replication in MongoDB
• Concept of replica set to create multiple
copies on different nodes
• Variation of master-slave approach
• Primary copy, secondary copy, and arbiter
- Arbiter participates in elections to select new
primary if needed
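A minimal sketch of the election idea, assuming a simplified model: one vote per member, a strict majority of all members must be reachable, and the live secondary with the most recent write wins. `Member` and `elect_primary` are hypothetical names; real MongoDB elections also take member priorities and election terms into account, so this only illustrates why an arbiter (which votes but holds no data) lets a small replica set still reach a majority.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    role: str          # "primary", "secondary", or "arbiter"
    alive: bool = True
    last_op: int = 0   # position of the last applied write (arbiters hold no data)


def elect_primary(members):
    """Called after the primary fails; returns the new primary's name, or None.

    Simplified model: a strict majority of all members must be alive, and the
    most up-to-date live secondary is chosen. Arbiters vote but never win.
    """
    alive = [m for m in members if m.alive]
    if len(alive) * 2 <= len(members):
        return None  # no majority reachable: the set cannot elect a primary
    candidates = [m for m in alive if m.role == "secondary"]
    if not candidates:
        return None
    winner = max(candidates, key=lambda m: m.last_op)
    for m in members:
        if m.role == "primary":
            m.role = "secondary"  # the failed primary rejoins later as a secondary
    winner.role = "primary"
    return winner.name
```

With a primary, one secondary, and an arbiter, losing the primary still leaves two of three votes, so the secondary takes over; without the arbiter, the lone secondary holds only half the votes and no primary can be elected.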
MongoDB Characteristics (cont'd.)
• Sharding in MongoDB (cont'd.)
• Partitioning field (shard key) must exist in
every document in the collection
• The shard key must be indexed
• Range partitioning
- Creates chunks by specifying a range of key values
• Works best with range queries
• Hash partitioning
• Partitioning based on the hash value of each
shard key
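The difference between the two partitioning schemes can be sketched as follows, assuming a numeric shard key and three chunks with made-up boundaries. `hashlib.md5` is only a deterministic stand-in for MongoDB's own hash function (Python's built-in `hash()` of strings changes between runs), and the helper names are hypothetical.

```python
import bisect
import hashlib

# Hypothetical chunk boundaries for range partitioning:
# chunk 0: key < 100, chunk 1: 100 <= key < 200, chunk 2: key >= 200.
RANGE_BOUNDS = [100, 200]
NUM_CHUNKS = 3


def range_chunk(key):
    """Range partitioning: neighbouring key values land in the same chunk."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_chunk(key):
    """Hash partitioning: the chunk is derived from a hash of the key value."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


def chunks_for_range_query(lo, hi, chunk_fn):
    """Chunks that must be visited to answer lo <= key <= hi."""
    return {chunk_fn(k) for k in range(lo, hi + 1)}
```

With range partitioning, a query for keys 120 to 150 touches only chunk 1; with hash partitioning the same keys are scattered, so such a query usually has to visit every chunk, while inserts of steadily increasing keys are spread evenly instead of piling onto the last chunk.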
NOSQL Key Value Stores
[Figure: partitioning diagram labeled Range 1, Range 2, Range 3]
Examples of other Key Value Stores
• Oracle key-value store
• Oracle NOSQL Database
• Redis key-value cache and store
• Caches data in main memory to improve
performance
• Offers master-slave replication and high availability
- Offers persistence by backing up cache to disk
• Apache Cassandra
• Offers features from several NOSQL categories
- Used by Facebook and others
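A toy version of the "cache in main memory, back up to disk" idea from the Redis bullets, assuming a JSON snapshot file. `SnapshotCache` is a hypothetical class and its file format has nothing to do with Redis's own RDB snapshots or append-only file (AOF); the point is only that reads and writes are served from memory, and anything written after the last save is lost on restart.

```python
import json
import os
import tempfile


class SnapshotCache:
    """Toy in-memory key-value cache with snapshot persistence (not Redis's format)."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}
        if os.path.exists(snapshot_path):
            # Recover the last saved state after a restart.
            with open(snapshot_path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # served from main memory only

    def get(self, key, default=None):
        return self.data.get(key, default)

    def save(self):
        # Write to a temporary file first, then atomically replace the old snapshot.
        dir_name = os.path.dirname(self.snapshot_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)
```

Writing the snapshot to a temporary file and renaming it means a crash during `save()` leaves the previous snapshot intact.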
NOSQL Systems - Column-Based or Wide Column
• BigTable: Google's distributed storage system for
big data
• Used in Gmail
• Uses Google File System for data storage
and distribution
• Apache HBase: a similar, open-source system
• Uses Hadoop Distributed File System
(HDFS) for data storage
• Can also use Amazon's Simple Storage Service (S3)
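BigTable and HBase address each value by row key, column (written `family:qualifier`), and timestamp; column families are declared in the table schema, while qualifiers can be added freely per row. The sketch below models that with nested Python dictionaries; `WideColumnTable` and its methods are hypothetical names, not the HBase client API.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy BigTable/HBase-style table: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self, column_families):
        self.families = set(column_families)  # declared up front, like a table schema
        # row key -> column -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value  # new versions are kept alongside old ones

    def get(self, row, column, ts=None):
        """Latest version, or the latest version at or before ts."""
        versions = self.rows.get(row, {}).get(column, {})
        stamps = [t for t in versions if ts is None or t <= ts]
        if not stamps:
            return None
        return versions[max(stamps)]

    def scan_family(self, row, family):
        """All columns of one family in a row, each at its latest version."""
        prefix = family + ":"
        return {
            col: vs[max(vs)]
            for col, vs in self.rows.get(row, {}).items()
            if col.startswith(prefix)
        }
```

For example, after `put("W1", "work:Hours", 30.0, ts=1)` and `put("W1", "work:Hours", 32.5, ts=2)`, a plain `get` returns the newer value, while `get(..., ts=1)` reads the older version.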