Databases and Storage
What is a Database?
What is a DBMS?
Components
Schema
The role of a schema is to define the shape of a data structure, and specify
what kinds of data can go where. Schemas can be strictly enforced across the entire
database, loosely enforced on part of the database, or they might not exist at all.
Table
Each table contains various columns, just like in a spreadsheet. A table can have as few as two columns and upwards of a hundred or more, depending on the kind of information being stored.
Column
A column contains a set of data values of a particular type, one value for each
row of the database. A column may contain text values, numbers, enums,
timestamps, etc.
Row
Types
1. SQL
2. NoSQL
Document
Key-value
Graph
Time series
Wide column
Multi-model
Challenges
Some common challenges faced while running databases at scale:
SQL Databases
Tables are used to hold information about the objects to be represented in
the database.
Each column in a table holds a certain kind of data and a field stores the actual
value of an attribute.
The rows in the table represent a collection of related values of one object or
entity.
Each row in a table can be marked with a unique identifier called a primary key, and rows across multiple tables can be related using foreign keys.
This data can be accessed in many different ways without re-organizing the
database tables themselves.
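To make primary and foreign keys concrete, here is a minimal sketch using SQLite from Python; the users and orders tables, their columns, and the sample values are made up purely for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database for illustration
conn.execute("PRAGMA foreign_keys = ON")

# "id" is the primary key that uniquely identifies each row in users.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# "user_id" is a foreign key relating each order to a row in users.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id), total REAL NOT NULL)")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 42.5)")

# The same data can be accessed in different ways (here, a join) without
# reorganizing the tables themselves.
print(conn.execute("SELECT users.name, orders.total FROM orders JOIN users ON orders.user_id = users.id").fetchall())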
A. Database Replication
Replication is a process that involves sharing information to ensure
consistency between redundant resources such as multiple databases, to improve
reliability, fault-tolerance, or accessibility.
1. Master-Slave Replication
The master serves reads and writes, replicating writes to one or more slaves,
which serve only reads. Slaves can also replicate to additional slaves in a tree-like
fashion. If the master goes offline, the system can continue to operate in read-only
mode until a slave is promoted to a master or a new master is provisioned.
Advantages
Disadvantages
2. Master-Master Replication
Both masters serve reads/writes and coordinate with each other. If either
master goes down, the system can continue to operate with both reads and writes.
Advantages
Disadvantages
Disadvantages: replication
1. There is a potential for loss of data if the master fails before any newly written
data can be replicated to other nodes.
2. Writes are replayed to the read replicas. If there are a lot of writes, the read
replicas can get bogged down with replaying writes and can't do as many
reads.
3. The more read slaves, the more you have to replicate, which leads to greater
replication lag.
4. On some systems, writing to the master can spawn multiple threads to write
in parallel, whereas read replicas only support writing sequentially with a
single thread.
5. Replication adds more hardware and additional complexity.
B. Federation
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag.
Smaller databases result in more data that can fit in memory, which in turn results in
more cache hits due to improved cache locality. With no single central master
serializing writes you can write in parallel, increasing throughput.
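As an illustration, here is a minimal Python sketch of how an application might pick a database under federation; the database names and connection URLs are hypothetical:

# Hypothetical connection URLs, one per functional database.
FEDERATED_DATABASES = {
    "forums":   "postgresql://db-forums/forums",
    "users":    "postgresql://db-users/users",
    "products": "postgresql://db-products/products",
}

def connection_url_for(functional_area):
    # The application decides which database to talk to based on the
    # functional area the data belongs to.
    if functional_area not in FEDERATED_DATABASES:
        raise ValueError(f"no federated database for {functional_area!r}")
    return FEDERATED_DATABASES[functional_area]

print(connection_url_for("users"))  # user-related queries go to the users database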
Disadvantages: federation
3. Joining data from two databases is more complex with a server link.
C. Sharding
Sharding distributes data across different databases such that each database
can only manage a subset of the data. Taking a users database as an example, as the
number of users increases, more shards are added to the cluster.
Similar to the advantages of federation, sharding results in less read and write
traffic, less replication, and more cache hits. Index size is also reduced, which
generally improves performance with faster queries. If one shard goes down, the
other shards are still operational, although you'll want to add some form of
replication to avoid data loss. Like federation, there is no single central master
serializing writes, allowing you to write in parallel with increased throughput.
Common ways to shard a table of users are by the initial of the user's last name or by the user's geographic location.
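A minimal sketch of hash-based shard routing in Python; the shard count and keys are hypothetical:

import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(user_id):
    # Hash the key so users are spread roughly evenly across shards.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# All reads and writes for a given user are routed to the same shard.
print(shard_for(12345))
print(shard_for("alice"))

# Note: with a plain modulo scheme, changing NUM_SHARDS remaps most keys;
# consistent hashing reduces how much data has to move during rebalancing.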
Disadvantages: sharding
1. You'll need to update your application logic to work with shards, which could
result in complex SQL queries.
2. Data distribution can become lopsided in a shard. For example, a set of power
users on a shard could result in increased load to that shard compared to
others.
3. Rebalancing adds additional complexity. A sharding function based on
consistent hashing can reduce the amount of transferred data.
4. Joining data from multiple shards is more complex.
5. Sharding adds more hardware and additional complexity.
D. SQL tuning
SQL tuning is a broad topic and many books have been written as reference.
3. CHAR effectively allows for fast, random access, whereas with VARCHAR, you
must find the end of a string before moving onto the next one.
4. Use TEXT for large blocks of text such as blog posts. TEXT also allows for
boolean searches. Using a TEXT field results in storing a pointer on disk that is
used to locate the text block.
7. Avoid storing large BLOBS, store the location of where to get the object
instead.
Use good indices
1. Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices (see the sketch after this list).
2. Indices are usually represented as a self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
3. Placing an index can keep the data in memory, requiring more space.
4. Writes could also be slower since the index also needs to be updated.
5. When loading large amounts of data, it might be faster to disable indices, load
the data, then rebuild the indices.
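As a rough illustration of the first two points, here is a minimal sketch using SQLite from Python; the table, column, and index names are made up:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, country TEXT)")

# Without an index this query scans every row; with the index the database
# can locate matching rows in logarithmic time.
conn.execute("CREATE INDEX idx_users_last_name ON users (last_name)")

plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE last_name = ?", ("Smith",)).fetchall()
print(plan)  # the plan mentions idx_users_last_name instead of a full table scan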
Partition tables
Break up a table by putting hot spots in a separate table to help keep it in memory.
Synchronous vs. Asynchronous replication
In synchronous replication, data is written to the primary storage and the replica simultaneously, so the two copies always stay in sync. In contrast, asynchronous replication copies the data to the replica after the data is already written to the primary storage. Although the replication process may occur in near-real-time, it is more common for replication to occur on a scheduled basis, which is more cost-effective.
Indexes
Indexes are well known when it comes to databases; they are used to improve the speed of data retrieval operations on the data store. An index trades increased storage overhead and slower writes (since we not only have to write the data but also have to update the index) for the benefit of faster reads.
Indexes are used to quickly locate data without having to examine every row in a
database table. Indexes can be created using one or more columns of a database
table, providing the basis for both rapid random lookups and efficient access to
ordered records.
An index is a data structure that can be perceived as a table of contents that
points us to the location where actual data lives. So when we create an index on a
column of a table, we store that column and a pointer to the whole row in the index.
Indexes are also used to create different views of the same data. For large data sets,
this is an excellent way to specify different filters or sorting schemes without
resorting to creating multiple additional copies of the data.
One quality that database indexes can have is that they can be dense or
sparse. Each of these index qualities comes with its own trade-offs. Let's look at how
each index type would work:
Dense Index
In a dense index, an index record is created for every row of the table. Records
can be located directly as each record of the index holds the search key value and the
pointer to the actual record.
Sparse Index
In a sparse index, index records are created only for some of the records. To find a record, we locate the index entry with the largest key that is less than or equal to the key we are looking for, and then scan forward from the record it points to.
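A purely didactic Python sketch contrasting the two: the dense index keeps one entry per row, while the sparse index keeps one entry per tenth row and scans forward from the nearest entry:

# Rows sorted by key, standing in for records in a data file.
rows = [(key, f"record-{key}") for key in range(0, 100, 2)]

# Dense index: one entry per row (key -> position of the row).
dense_index = {key: pos for pos, (key, _) in enumerate(rows)}

# Sparse index: an entry for only every 10th row.
sparse_index = {key: pos for pos, (key, _) in enumerate(rows) if pos % 10 == 0}

def lookup_sparse(search_key):
    # Find the nearest indexed key at or before the search key,
    # then scan forward through the rows from that position.
    anchors = [k for k in sparse_index if k <= search_key]
    if not anchors:
        return None
    for key, record in rows[sparse_index[max(anchors)]:]:
        if key == search_key:
            return record
        if key > search_key:
            return None
    return None

print(rows[dense_index[42]][1])  # direct hit via the dense index
print(lookup_sparse(42))         # short scan via the sparse index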
Normalization and Denormalization
Terms
Before we go any further, let's look at some commonly used terms in normalization
and denormalization.
Keys
Dependencies
Partial dependency: Occurs when a non-key attribute depends on only part of a composite primary key rather than on the whole key.
Functional dependency: A relationship between two attributes, typically between the primary key and a non-key attribute, where the key determines the value of the other attribute.
Transitive functional dependency: Occurs when a non-key attribute determines some other non-key attribute, so that the second attribute depends on the primary key only indirectly.
Anomalies
A database anomaly happens when there is a flaw in the database due to incorrect planning or to storing everything in a single flat table. Anomalies are generally addressed by the process of normalization.
1. Insertion anomaly: Occurs when we are not able to insert certain attributes in
the database without the presence of other attributes.
2. Update anomaly: Occurs in case of data redundancy and partial update. In
other words, a correct update of the database needs other actions such as
addition, deletion, or both.
3. Deletion anomaly: Occurs where deletion of some data requires deletion of
other data.
Example
Let's imagine we hired a new person, "John", but they might not be assigned a team immediately. This will cause an insertion anomaly, as the team attribute is not yet present.
Next, let's say Hailey from Team C got promoted. To reflect that change in the database, we will need to update 2 rows to maintain consistency, which can cause an update anomaly.
Finally, we would like to remove Team B, but to do that we will also need to remove additional information such as name and role; this is an example of a deletion anomaly.
Normalization
Normal forms
1NF
For a table to be in the first normal form (1NF), it should follow the following rules:
2NF
For a table to be in the second normal form (2NF), it should follow the following
rules:
3NF
For a table to be in the third normal form (3NF), it should follow the following rules:
BCNF
Boyce-Codd normal form (or BCNF) is a slightly stronger version of the third
normal form (3NF) used to address certain types of anomalies not dealt with by 3NF
as originally defined. Sometimes it is also known as the 3.5 normal form (3.5NF).
For a table to be in the Boyce-Codd normal form (BCNF), it should follow the
following rules:
There are more normal forms such as 4NF, 5NF, and 6NF, but we won't discuss them here.
Advantages
Disadvantages
Denormalization
Denormalization is a database optimization technique in which we add
redundant data to one or more tables. This can help us avoid costly joins in a
relational database.
Advantages
Disadvantages
ACID and BASE consistency models
Let's discuss the ACID and BASE consistency models.
ACID
The term ACID stands for Atomicity, Consistency, Isolation, and Durability.
ACID properties are used for maintaining data integrity during transaction
processing.
Atomic
All operations in a transaction either succeed together or the entire transaction is rolled back; there are no partial transactions.
Consistent
On the completion of a transaction, the database is structurally sound, and all defined rules and constraints still hold.
Isolated
Concurrent transactions do not interfere with one another; they appear to execute one after another.
Durable
Once the transaction has been completed and the writes and updates have
been written to the disk, it will remain in the system even if a system failure
occurs.
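A minimal sketch of these properties using SQLite transactions from Python; the accounts table and the transfer scenario are made up for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

def transfer(amount):
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the whole transaction was rolled back:
        # neither row is left half-updated (atomicity), and the balance rule
        # still holds (consistency).
        print("transfer aborted")

transfer(150)  # aborted, balances unchanged
transfer(50)   # committed; with a file-backed database the change would also
               # survive a crash once the commit returns (durability)
print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 50), (2, 50)]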
BASE
With the increasing amount of data and high availability requirements, the
approach to database design has also changed dramatically. To increase the ability to
scale and at the same time be highly available, we move the logic from the database
to separate servers.
In this way, the database becomes more independent and focused on the
actual process of storing data.
In the NoSQL database world, ACID transactions are less common as some
databases have loosened the requirements for immediate consistency, data
freshness, and accuracy in order to gain other benefits, like scale and resilience.
BASE properties are much looser than ACID guarantees, but there isn't a direct
one-for-one mapping between the two consistency models. Let us understand these
terms:
Basic Availability
The database appears to work most of the time, even in the presence of partial failures.
Soft-state
Stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency
The data might not be consistent immediately after a write, but it becomes consistent at some later point; stale reads are possible in the meantime.
ACID vs BASE Trade-offs
There's no right answer to whether our application needs an ACID or a BASE
consistency model. Both models have been designed to satisfy different requirements. While choosing a database, we need to keep the properties of both models and the requirements of our application in mind.
NoSQL
NoSQL is a collection of data items represented in a key-value store,
document store, wide column store, or a graph database. Data is denormalized, and
joins are generally done in the application code. Most NoSQL stores lack true ACID
transactions and favor eventual consistency.
Key-value store
A key-value store generally allows for O(1) reads and writes and is often
backed by memory or SSD. Data stores can maintain keys in lexicographic order,
allowing efficient retrieval of key ranges. Key-value stores can allow for storing of
metadata with a value.
Key-value stores provide high performance and are often used for simple data
models or for rapidly-changing data, such as an in-memory cache layer. Since they
offer only a limited set of operations, complexity is shifted to the application layer if
additional operations are needed.
A key-value store is the basis for more complex systems such as a document
store, and in some cases, a graph database.
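As an illustration, here is a minimal sketch using the redis-py client against a hypothetical local Redis server; the key names are made up:

import redis  # assumes the redis-py client and a Redis server on localhost

r = redis.Redis(host="localhost", port=6379)

r.set("user:12345:name", "Alice")        # O(1) write by key
print(r.get("user:12345:name"))          # O(1) read -> b'Alice'

r.set("session:abc", "token", ex=60)     # value expires after 60 seconds

# Only simple operations are available; anything richer (joins, secondary
# indexes) has to be handled in application code.
for key in r.scan_iter("user:12345:*"):
    print(key)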
Sources and further reading: key-value store
1. Key-value database
2. Disadvantages of key-value stores
3. Redis architecture
4. Memcached architecture
Document store
A document store is centered around documents (e.g. XML, JSON, or binary formats), where a document stores all the information for a given object. Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-value and document models.
Document stores provide high flexibility and are often used for working with
occasionally changing data.
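A minimal sketch using the pymongo client against a hypothetical local MongoDB instance; the collection and documents are made up:

from pymongo import MongoClient  # assumes a MongoDB instance on localhost

products = MongoClient("mongodb://localhost:27017")["shop"]["products"]

# Each document carries its own structure; fields can differ per document.
products.insert_one({"name": "Laptop", "price": 999, "tags": ["electronics"]})
products.insert_one({"name": "Mug", "price": 9, "color": "blue"})

# Query on any field without a predefined schema.
for doc in products.find({"price": {"$lt": 100}}):
    print(doc["name"])  # Mug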
Sources and further reading: document store
1. Document-oriented database
2. MongoDB architecture
3. CouchDB architecture
4. Elasticsearch architecture
Wide column store
Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often
used for very large data sets.
Graph database
Abstraction: graph
Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graph databases can only be accessed with REST APIs.
Sources and further reading: graph
1. Graph database
2. Neo4j
3. FlockDB
Sources and further reading: NoSQL
SQL or NoSQL
1. Semi-structured data
3. Non-relational data
6. Very data intensive workload
5. Metadata/lookup tables
Caching
"There are only two hard things in Computer Science: cache invalidation and
naming things." - Phil Karlton
Caching and Memory
No matter whether the cache is read or written, it's done one block at a time.
Each block also has a tag that includes the location where the data was stored in the
cache. When data is requested from the cache, a search occurs through the tags to
find the specific content that's needed in level one (L1) of the memory. If the correct
data isn't found, more searches are conducted in L2.
If the data isn't found there, searches are continued in L3, then L4, and so on
until it has been found, then, it's read and loaded. If the data isn't found in the cache
at all, then it's written into it for quick retrieval the next time.
Cache hit
A cache hit describes the situation where content is successfully served from
the cache. The tags are searched in the memory rapidly, and when the data is found
and read, it's considered a cache hit.
A cache hit can also be described as cold, warm, or hot; these terms describe the speed at which the data is read.
A hot cache is an instance where data was read from the memory at the
fastest possible rate. This happens when the data is retrieved from L1.
A cold cache is the slowest possible rate for data to be read, though, it's still
successful so it's still considered a cache hit. The data is just found lower in the
memory hierarchy such as in L3, or lower.
A warm cache is used to describe data that's found in L2 or L3. It's not as fast
as a hot cache, but it's still faster than a cold cache. Generally, calling a cache warm is
used to express that it's slower and closer to a cold cache than a hot one.
Cache miss
A cache miss refers to the instance when the memory is searched, and the
data isn't found. When this happens, the content is transferred and written into the
cache.
Caching improves page load times and can reduce the load on your servers
and databases. In this model, the dispatcher will first look up whether the request has been made before and try to find the previous result to return, in order to save the actual execution.
Databases often benefit from a uniform distribution of reads and writes across
their partitions. Popular items can skew the distribution, causing bottlenecks. Putting a
cache in front of a database can help absorb uneven loads and spikes in traffic.
Client caching
CDN caching
Web server caching
Reverse proxies and caches such as Varnish can serve static and dynamic
content directly. Web servers can also cache requests, returning responses without
having to contact application servers.
Database caching
Application caching
Redis offers additional features such as:
Persistence option
Built-in data structures such as sorted sets and lists
There are multiple levels you can cache that fall into two general categories:
database queries and objects:
Row level
Query-level
Fully-formed serializable objects
Fully-rendered HTML
Generally, you should try to avoid file-based caching, as it makes cloning and
auto-scaling more difficult.
Caching at the database query level
Whenever you query the database, hash the query as a key and store the result in the cache. This approach suffers from expiration issues: when a piece of the underlying data changes, every cached query result that might contain that data has to be found and invalidated.
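A minimal sketch of query-level caching in Python; the in-process dictionary stands in for Memcached or Redis, and the db handle is assumed to behave like a sqlite3 connection:

import hashlib
import json

cache = {}  # stands in for Memcached or Redis in this sketch

def cached_query(db, sql, params=()):
    # Hash the query text and parameters to build the cache key.
    key = hashlib.sha256(json.dumps([sql, list(params)]).encode()).hexdigest()
    if key in cache:
        return cache[key]                        # cache hit
    result = db.execute(sql, params).fetchall()  # cache miss: run the query
    cache[key] = result                          # any later change to the underlying
    return result                                # rows must invalidate this entry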
Caching at the object level
See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure:
Remove the object from cache if its underlying data has changed
Allows for asynchronous processing: workers assemble objects by consuming
the latest cached object
Examples of things to cache:
User sessions
Fully rendered web pages
Activity streams
User graph data
Since you can only store a limited amount of data in cache, you'll need to
determine which cache update strategy works best for your use case.
Cache-aside
The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
1. Look for the entry in the cache, resulting in a cache miss.
2. Load the entry from the database.
3. Add the entry to the cache.
4. Return the entry.
Subsequent reads of data added to cache are fast. Cache-aside is also referred
to as lazy loading. Only requested data is cached, which avoids filling up the cache
with data that isn't requested.
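A minimal cache-aside sketch in Python; the dictionaries standing in for the cache and the database, and the key format, are made up for illustration:

cache = {}                              # stand-in for Memcached or Redis
database = {12345: {"name": "Alice"}}   # stand-in for the users table

def get_user(user_id):
    key = "user.{0}".format(user_id)
    user = cache.get(key)               # 1. look for the entry in the cache
    if user is None:                    # 2. cache miss
        user = database.get(user_id)    # 3. read the entry from the database
        if user is not None:
            cache[key] = user           # 4. add the entry to the cache
    return user                         # 5. return the entry

get_user(12345)  # first call goes to the database and populates the cache
get_user(12345)  # subsequent reads are served from the cache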
Disadvantages: cache-aside
Each cache miss results in three trips, which can cause a noticeable delay.
Data can become stale if it is updated in the database. This issue is mitigated
by setting a time-to-live (TTL) which forces an update of the cache entry, or by
using write-through.
When a node fails, it is replaced by a new, empty node, increasing latency.
Write-through cache
The application uses the cache as the main data store, reading and writing
data to it, while the cache is responsible for reading and writing to the database:
Application code:
set_user(12345, {"foo":"bar"})
Cache code:
cache.set(user_id, user)
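A minimal sketch of what that cache-side set_user might look like; the dictionaries standing in for the cache and the database are hypothetical:

cache = {}      # the application reads and writes through the cache
database = {}   # stand-in for the underlying data store

def set_user(user_id, values):
    # The cache writes the entry to the database synchronously and only then
    # updates itself, so cache and storage never diverge.
    database[user_id] = values
    cache[user_id] = values
    return values

set_user(12345, {"foo": "bar"})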
Pro: Fast retrieval, complete data consistency between cache and storage.
Disadvantages: write-through
When a new node is created due to failure or scaling, the new node will not
cache entries until the entry is updated in the database. Cache-aside in
conjunction with write through can mitigate this issue.
Most data written might never be read, which can be minimized with a TTL.
Higher latency for write operations.
Write-behind (write-back)
In write-behind, the application adds or updates the entry in the cache, and the cache asynchronously writes the entry back to the data store, which improves write performance.
Disadvantages: write-behind
There could be data loss if the cache goes down prior to its contents hitting
the data store.
It is more complex to implement write-behind than it is to implement
cache-aside or write-through.
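A minimal write-behind sketch in Python; the queue, worker thread, and stand-in stores are made up, and the comments point out where data loss can occur:

import queue
import threading

cache = {}                     # stand-in for the in-memory cache
database = {}                  # stand-in for the durable data store
pending_writes = queue.Queue()

def set_user(user_id, values):
    cache[user_id] = values                # 1. update the cache and return immediately
    pending_writes.put((user_id, values))  # 2. queue the database write

def flush_worker():
    # 3. a background worker drains the queue and writes to the data store;
    #    anything still queued when the cache node dies is lost.
    while True:
        user_id, values = pending_writes.get()
        database[user_id] = values
        pending_writes.task_done()

threading.Thread(target=flush_worker, daemon=True).start()

set_user(12345, {"foo": "bar"})  # fast: the database write happens later
pending_writes.join()            # only for this demo: wait for the flush
print(database[12345])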
Refresh-ahead
You can configure the cache to automatically refresh any recently accessed
cache entry prior to its expiration.
Disadvantages: refresh-ahead
Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.
Eviction policies
First In First Out (FIFO): The cache evicts the first block accessed first without
any regard to how often or how many times it was accessed before.
Last In First Out (LIFO): The cache evicts the block added most recently first without any regard to how often or how many times it was accessed before.
Least Recently Used (LRU): Discards the least recently used items first (see the sketch after this list).
Most Recently Used (MRU): Discards, in contrast to LRU, the most recently
used items first.
Least Frequently Used (LFU): Counts how often an item is needed. Those that
are used least often are discarded first.
Random Replacement (RR): Randomly selects a candidate item and discards it
to make space when necessary.
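A minimal LRU cache sketch in Python using an ordered dictionary; the capacity and keys are arbitrary:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item

lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")          # "a" becomes the most recently used entry
lru.set("c", 3)       # evicts "b"
print(lru.get("b"))   # None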
Distributed Cache
Global Cache
As the name suggests, we will have a single shared cache that all the
application nodes will use. When the requested data is not found in the global cache,
it's the responsibility of the cache to find out the missing piece of data from the
underlying data store.
Use cases
Database Caching
Content Delivery Network (CDN)
Domain Name System (DNS) Caching
API Caching
Let's also look at some scenarios where we should not use cache:
Caching isn't helpful when it takes just as long to access the cache as it does to
access the primary data store.
Caching doesn't work as well when requests have low repetition (higher
randomness), because caching performance comes from repeated memory
access patterns.
Caching isn't helpful when the data changes frequently, as the cached version
gets out of sync, and the primary data store must be accessed every time.
It's important to note that a cache should not be used as permanent data storage. Caches are almost always implemented in volatile memory because it is faster, so cached data should be considered transient.
Advantages
Improves performance
Reduce latency
Reduce load on the database
Reduce network cost
Increase Read Throughput
Disadvantages
Need to maintain consistency between caches and the source of truth such as the database through cache invalidation.
Cache invalidation is a difficult problem; there is additional complexity associated with deciding when to update the cache.
Need to make application changes such as adding Redis or Memcached.
Examples
Redis
Memcached
Amazon ElastiCache
Aerospike
AWS ElastiCache strategies
Wikipedia
Storage Types
Storage is a mechanism that enables a system to retain data, either
temporarily or permanently. This topic is mostly skipped over in the context of
system design, however, it is important to have a basic understanding of some
common types of storage techniques that can help us fine-tune our storage
components. Let's discuss some important storage concepts:
RAID
RAID (Redundant Array of Independent Disks) is a way of storing the same data on multiple hard disks or solid-state drives to protect against a drive failure. There are different RAID levels, however, and not all have the goal of providing redundancy. Let's discuss some commonly used RAID levels:
RAID 0: Also known as striping, data is split evenly across all the drives in the
array.
RAID 1: Also known as mirroring, at least two drives contain an exact copy of a set of data. If a drive fails, the others will still work.
RAID 5: Striping with parity. Requires the use of at least 3 drives, striping the data across multiple drives like RAID 0, but also distributing parity across the drives.
RAID 6: Striping with double parity. RAID 6 is like RAID 5, but the parity data
are written to two drives.
RAID 10: Combines striping plus mirroring from RAID 0 and RAID 1. It provides
security by mirroring all data on secondary drives while using striping across
each set of drives to speed up data transfers.
Comparison
The minimum number of disks required is 2 for RAID 0, 2 for RAID 1, 3 for RAID 5, 4 for RAID 6, and 4 for RAID 10.
Volumes
File storage
File storage is a solution to store data as files and present it to its final users as a hierarchical directory structure. The main advantage is to provide a user-friendly solution for storing and retrieving files. To locate a file in file storage, the complete path of the file is required. It is economical and easily structured, and is usually found on hard drives, which means that the files appear exactly the same to the user as they do on the hard drive.
Block storage
Block storage divides data into blocks (chunks) and stores them as separate
pieces. Each block of data is given a unique identifier, which allows a storage system
to place the smaller pieces of data wherever it is most convenient.
Block storage also decouples data from user environments, allowing that data
to be spread across multiple environments. This creates multiple paths to the data
and allows the user to retrieve it quickly. When a user or application requests data
from a block storage system, the underlying storage system reassembles the data
blocks and presents the data to the user or application.
Object Storage
Object storage, which is also known as object-based storage, breaks data files
up into pieces called objects. It then stores those objects in a single repository, which
can be spread out across multiple networked systems.
Example: Amazon S3, Azure Blob Storage, Google Cloud Storage, etc.
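A minimal sketch using the boto3 client for Amazon S3; it assumes AWS credentials are configured and that the (hypothetical) bucket already exists:

import boto3  # assumes AWS credentials are configured and the bucket exists

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name

# Each object is addressed by a flat key rather than a directory path.
s3.put_object(Bucket=bucket, Key="avatars/user-12345.png", Body=b"...image bytes...")

obj = s3.get_object(Bucket=bucket, Key="avatars/user-12345.png")
print(obj["Body"].read()[:10])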
NAS
HDFS
HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. It has many similarities with existing distributed file systems.
HDFS is designed to reliably store very large files across machines in a large
cluster. It stores each file as a sequence of blocks, all blocks in a file except the last
block are the same size. The blocks of a file are replicated for fault tolerance.