G7 - P3 - Big Data Concepts and Application - NoSQL Vs Relational DB - Key-Value Model
NoSQL vs Relational DB
Key-Value model
● Log data applications
● Ad/media applications
● Digital marketing applications
Big Data Technologies
NOSQL Databases - Distributed DB - Big Data Storage Systems
Relational: These databases have a fixed, static, predefined schema.
NoSQL: These databases have a dynamic schema.
Relational: These databases are best suited for complex queries.
NoSQL: These databases are not so good for complex queries.
NoSQL: You have high-volume workloads that require predictable latency at large scale (e.g. latency measured in milliseconds while performing millions of transactions per second).
Relational: Your workload volume generally fits within thousands of transactions per second.
NoSQL: Your data is dynamic and frequently changes.
Relational: Your data is highly structured and requires referential integrity.
NoSQL: Relationships can be expressed through denormalized data models.
Relational: Relationships are expressed through table joins on normalized data models.
NoSQL: Data retrieval is simple and expressed without table joins.
Relational: You work with complex queries and reports.
NoSQL: Data is typically replicated across geographies and requires finer control over consistency, availability, and performance.
Relational: Data is typically centralized, or can be replicated across regions asynchronously.
NoSQL: Your application will be deployed to commodity
Relational: Your application will be deployed to large, high-end
Consistency Models
How NoSQL systems approach the issue of consistency among replicas
https://www.alexdebrie.com/posts/dynamodb-one-to-many/
Denormalization by duplicating data
https://www.alexdebrie.com/posts/dynamodb-one-to-many/
DynamoDB Indexing
When you create a table, you must specify a table name
and a primary key.
In most cases, all items with the same partition key are
stored together in a collection, which we define as a group
of items with the same partition key but different sort keys.
For tables with composite primary keys, the sort key may
be used as a partition boundary. DynamoDB splits
partitions by sort key if the collection size grows bigger than
10 GB.
Primary key & Partition key
https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/
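The idea of an item collection (items that share a partition key and differ by sort key) can be sketched in plain Python. This is only an in-memory illustration, not the DynamoDB API: the class `ToyTable`, its methods, and the `customer#`/`order#` key values are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition key -> {sort key -> item attributes}
        self._collections = defaultdict(dict)

    def put_item(self, partition_key, sort_key, attributes):
        # The (partition key, sort key) pair identifies exactly one item,
        # so writing the same pair again overwrites the earlier item.
        self._collections[partition_key][sort_key] = dict(attributes)

    def collection(self, partition_key):
        # All items sharing a partition key, ordered by sort key.
        items = self._collections.get(partition_key, {})
        return [(sk, items[sk]) for sk in sorted(items)]


table = ToyTable()
table.put_item("customer#1", "order#2024-01-05", {"total": 30})
table.put_item("customer#1", "order#2023-11-20", {"total": 12})
table.put_item("customer#2", "order#2024-02-01", {"total": 99})

for sort_key, attributes in table.collection("customer#1"):
    print(sort_key, attributes)
```

Reading one collection touches only the items under that partition key, which is why the choice of partition key and sort key shapes which lookups are cheap.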
Recommendations for Partition Keys
Use high-cardinality attributes (distinct values for
each item, like emailid, employee_no, customerid,
sessionid, orderid)
https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/
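Why high cardinality matters can be illustrated with a small simulation. The sketch below hashes each item's partition key into a fixed number of partitions and counts the load per partition. The partition count, the MD5-based hash, and the sample `orders` data are assumptions made for illustration; DynamoDB's internal hash function and partition layout are not modeled here.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_for(key: str) -> int:
    # MD5 is a stand-in hash; it is not DynamoDB's internal function.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def load_per_partition(keys):
    # Number of items that land on each partition.
    return Counter(partition_for(k) for k in keys)


statuses = ("NEW", "PAID", "SHIPPED")
orders = [{"orderid": f"order-{i}", "status": statuses[i % 3]} for i in range(1000)]

high = load_per_partition(o["orderid"] for o in orders)  # one distinct value per item
low = load_per_partition(o["status"] for o in orders)    # only three distinct values

print("orderid as partition key:", sorted(high.items()))
print("status as partition key: ", sorted(low.items()))
```

With `status` as the key, at most three partitions ever receive data, so the load concentrates on a few hot partitions; with `orderid` the items spread across many more.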
DynamoDB Secondary Index
● Global secondary index — an index with a hash and range key that can be
different from those on the table. A global secondary index is considered
“global” because queries on the index can span all of the data in a table,
across all partitions.
● Local secondary index — an index that has the same hash key as the table,
but a different range key. A local secondary index is “local” in the sense that
every partition of a local secondary index is scoped to a table partition that
has the same hash key.
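The difference between the two index types can be sketched with plain dictionaries. This is a conceptual illustration only, not how DynamoDB stores indexes: the helper `build_index` and the sample attributes (`customerid`, `orderid`, `status`, `total`) are hypothetical.

```python
from collections import defaultdict

items = [
    {"customerid": "c1", "orderid": "o1", "status": "PAID", "total": 30},
    {"customerid": "c1", "orderid": "o2", "status": "NEW", "total": 12},
    {"customerid": "c2", "orderid": "o3", "status": "PAID", "total": 99},
]


def build_index(items, hash_key, range_key):
    # Group items by the hash key, then order each group by the range key.
    index = defaultdict(list)
    for item in items:
        index[item[hash_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_key])
    return index


base = build_index(items, "customerid", "orderid")  # the table's own keys
lsi = build_index(items, "customerid", "total")     # same hash key, different range key
gsi = build_index(items, "status", "total")         # different hash key and range key

print([it["orderid"] for it in lsi["c1"]])    # one customer's orders, by total
print([it["orderid"] for it in gsi["PAID"]])  # paid orders across all customers
```

The local index keeps the table's grouping and only changes the ordering inside each group, while the global index regroups every item under a different attribute.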
DynamoDB Query & Scan
To look items up by an attribute that is not part of the primary key, add a GSI (Global Secondary Index) on that attribute so that Query can be used. Use Scan only as a last resort.
https://dynobase.dev/dynamodb-scan-vs-query/
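The cost difference between the two operations can be sketched by counting how many items each one has to examine. The functions `query` and `scan` below are hypothetical in-memory stand-ins, not the DynamoDB API, and the sample data is invented for the example.

```python
from collections import defaultdict

# partition key -> items stored under that key
table = defaultdict(list)
for i in range(1000):
    userid = f"user-{i % 100}"
    table[userid].append({"userid": userid, "eventid": i})


def query(partition_key):
    # A query goes straight to one partition key and reads only its items.
    items = table.get(partition_key, [])
    return list(items), len(items)


def scan(predicate):
    # A scan reads every item in the table and filters afterwards.
    examined = 0
    matches = []
    for items in table.values():
        for item in items:
            examined += 1
            if predicate(item):
                matches.append(item)
    return matches, examined


q_items, q_examined = query("user-7")
s_items, s_examined = scan(lambda item: item["userid"] == "user-7")
print(f"query examined {q_examined} items, scan examined {s_examined}")
```

Both calls return the same items, but the scan pays for reading the whole table, which is the reason to prefer a key or index lookup whenever one is available.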
DynamoDB Use Cases
Snapchat
Voldemort Key-Value Distributed Data Store
● Voldemort is an open source system (Apache 2.0 license) based on Amazon’s Dynamo, the design that also influenced DynamoDB.
● Focus on high performance and horizontal scalability, as well as on providing replication for high
availability and sharding for improving latency (response time) of read and write requests.
● Consistent hashing is the technique used to distribute the key-value pairs among the nodes of the
distributed cluster.
● Features:
○ Simple basic operations: A collection of (key, value) pairs is kept in a Voldemort store (s).
○ High-level formatted data values: JSON
○ Consistent hashing for distributing (key,value) pairs.
○ Consistency and versioning: Voldemort handles consistency in the presence of replicas
in a way similar to DynamoDB. Concurrent write operations by different processes are allowed, so two or more
different values can be associated with the same key at different nodes when items are replicated.
Consistency is achieved using versioning and read repair. Each write is associated
with a vector clock value. When a read occurs, the system can reconcile the different versions
of the value for the same key into a single final value, based on application semantics.
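Versioning with vector clocks and reconciliation at read time can be sketched as follows. This is a minimal illustration under stated assumptions, not Voldemort's implementation: the functions `increment`, `descends`, and `reconcile`, the node names "A" and "B", and the shopping-cart style values are all hypothetical.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions, resolve):
    """Drop versions superseded by another one, then merge what is left.

    versions: non-empty list of (clock, value) pairs read from replicas.
    resolve:  application-supplied function that merges concurrent values.
    """
    survivors = [
        (clock, value)
        for clock, value in versions
        if not any(
            descends(other, clock) and not descends(clock, other)
            for other, _ in versions
        )
    ]
    merged_clock = {}
    for clock, _ in survivors:
        for node, count in clock.items():
            merged_clock[node] = max(merged_clock.get(node, 0), count)
    values = [value for _, value in survivors]
    merged_value = values[0] if len(values) == 1 else resolve(values)
    return merged_clock, merged_value


base = increment({}, "A")      # first write, handled by node A
from_a = increment(base, "A")  # later write on node A
from_b = increment(base, "B")  # concurrent write on node B
versions = [(base, {"book"}), (from_a, {"book", "pen"}), (from_b, {"book", "cup"})]

clock, value = reconcile(versions, lambda vals: set().union(*vals))
print(clock, sorted(value))
```

The oldest version is discarded because both later clocks descend from it. The two remaining versions are concurrent, so the application-level `resolve` function (here a set union) decides the final value.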
Voldemort Consistent Hashing
Consistent hashing distributes the (k, v) pairs among the nodes
of the distributed cluster. A hash function h(k) is
applied to the key k, and its value determines where the item will
be stored.
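A common way to realize this is a hash ring: nodes and keys are hashed onto the same circular range, and each key is stored on the first node at or after its position. The sketch below assumes that scheme with one ring position per node and no replication. The class `HashRing`, the MD5-based `h`, and the node names are hypothetical and are not taken from Voldemort's code.

```python
import bisect
import hashlib


def h(key: str) -> int:
    # MD5 is only a stand-in hash chosen for this sketch.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted (position, node) pairs on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        bisect.insort(self._points, (h(node), node))

    def remove_node(self, node):
        self._points.remove((h(node), node))

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring has no nodes")
        positions = [position for position, _ in self._points]
        # First node at or after h(key); wrap around past the last node.
        index = bisect.bisect_left(positions, h(key)) % len(self._points)
        return self._points[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed node after adding node-d")
```

In this scheme, adding a node only takes keys away from its successor on the ring; every key that changes owner moves to the new node and all other assignments stay put.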