Aggregate data model

Single Server
- No distribution at all
- A single server handles all read and write requests
- Graph databases work better on a single-server architecture

Sharding
- Different users work with different parts of the data
- Related data needs to be clumped together
- Another factor is load balancing
- Auto-sharding
- Node failures

Master-Slave Replication
- A single master and multiple slaves
- The master handles the writes and the slaves handle the reads
- Read resilience
- Master node failure

Peer-to-Peer Replication
- No master at all
- All nodes can accept reads and writes
- Provides write resilience as well
- Consistency problems
- Write-write conflicts

Combining Sharding & Replication
- Multiple masters, but each data item has only a single master
- A single node can be the master for one item and a slave for another

Key Points
- There are two styles of distributing data:
  - Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.
  - Replication copies data across multiple servers, so each bit of data can be found in multiple places.
  A system may use either or both techniques.
- Replication comes in two forms:
  - Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
  - Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.
  Master-slave replication reduces the chance of update conflicts, but peer-to-peer replication avoids loading all writes onto a single point of failure.

CAP Theorem

Key-Value Store
A key-value store, or key-value database, is a type of data storage software program that stores data as a set of unique identifiers, each of which has an associated value.
This data pairing is known as a “key-value pair.” The unique identifier is the “key” for an item of data, and the value is either the data being identified or the location of that data.

Operations: get, put, delete

Key-value databases: Riak, Redis, MemcacheDB, Berkeley DB, HamsterDB, Amazon DynamoDB, Project Voldemort

Riak

Key-Value store features
Consistency
● Consistency applies only to operations on a single key
  Ex: put, get, delete
● Optimistic writes are expensive to implement
● In distributed data stores like Riak, the eventual consistency model is implemented
● Newest write wins
● Older writes lose

Key-Value store features
Transactions
● In NoSQL, transactions are implemented differently for different types of data stores
● Riak uses quorum
● A write is said to be successful if it is executed on at least w nodes
● Suppose the replication factor is 10 and the w value is 7; then up to 3 nodes can be down and write operations still succeed

Key-Value store features
Query Features
● All key-value databases can be queried by using the key
● It is not possible to read part of the value directly; we need to get the whole value and parse it
● What if we do not know the key?
● Riak Search can be used to search the values

Suitable use cases (when to use)
● Storing session information
● User profiles, preferences
● Shopping cart data
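As a rough illustration of the session use case and of the "get the value and parse it" point above, here is a minimal Python sketch. The `KeyValueStore` class, the `read_field` helper, and the `session:42` key are hypothetical names made up for this example; the class is an in-memory stand-in, not the API of Riak or any other real product.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store (not a real client API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats the value as an opaque blob; it never looks inside it.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is unknown.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def read_field(store, key, field):
    """Fetch the whole value, parse it, then pick one field on the client side."""
    raw = store.get(key)
    if raw is None:
        return None
    return json.loads(raw).get(field)


store = KeyValueStore()

# Session data is serialized by the application before it is stored.
session = {"user": "alice", "cart": ["book", "pen"], "theme": "dark"}
store.put("session:42", json.dumps(session).encode("utf-8"))

# Reading a single attribute still requires fetching and parsing the whole value.
theme = read_field(store, "session:42", "theme")

# When the session ends, the entry is removed by key.
store.delete("session:42")
```

Every operation here goes through the key, which is why session data, user profiles, and shopping carts fit this model well: the application always knows which key it needs.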
When not to use
● Relationships among the data
● Multi-operation transactions
● Query by data
● Operations by sets
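
Returning to the quorum rule from the Transactions slide (a write succeeds once at least w nodes execute it), the arithmetic can be sketched in Python as follows. This is only a toy model under stated assumptions: replicas are plain dictionaries, a node that is down is represented by `None`, and the function names are invented for the example rather than taken from Riak.

```python
def tolerated_failures(n, w):
    """Number of the n replicas that may be down while writes still succeed."""
    if not 1 <= w <= n:
        raise ValueError("w must be between 1 and n")
    return n - w


def quorum_put(replicas, key, value, w):
    """Apply the write to every reachable replica; succeed if at least w accept it.

    Assumption: a replica that is down is modeled as None. A real store would
    also have to deal with the partial write left behind when the quorum fails.
    """
    acks = 0
    for replica in replicas:
        if replica is None:
            continue  # node is down, no acknowledgement
        replica[key] = value
        acks += 1
    return acks >= w


# Replication factor 10 with 3 nodes down: 7 acknowledgements, quorum w=7 is met.
replicas = [{} for _ in range(7)] + [None] * 3
ok_with_three_down = quorum_put(replicas, "cart:1", "book", w=7)

# A fourth node fails: only 6 acknowledgements remain, so the write is rejected.
replicas[0] = None
ok_with_four_down = quorum_put(replicas, "cart:1", "pen", w=7)
```

With a replication factor of 10 and w = 7, `tolerated_failures(10, 7)` gives the 3 nodes mentioned on the slide; lowering w tolerates more failures at the cost of weaker guarantees about how many copies hold the newest write.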