Azure Data Fundamentals: Explore Non-Relational Data Offerings in Azure
The structure of the data might be too varied to easily model as a set of relational tables
Learning objectives
- Explore use-case and management benefits of using Azure Table storage
- Explore use-case and management benefits of using Azure Blob storage
- Explore use-case and management benefits of using Azure File Storage
- Explore use-case and management benefits of using Azure Cosmos DB
Data will usually be denormalized, with each row holding the entire data for a logical entity
- To help ensure fast access, Azure Table Storage splits a table into partitions
- Partitioning is a mechanism for grouping related rows, based on a common property known as the
partition key. Rows that share the same partition key are stored together. Partitioning can improve
scalability and performance:
○ Partitions are independent of each other, and can grow or shrink as rows are added to, or
removed from, a partition. A table can contain any number of partitions
○ When you search for data, you can include the partition key in the search criteria.
This helps to narrow down the volume of data to be examined, and improves performance
by reducing the amount of I/O (reads and writes) needed to locate the data.
- The key in an Azure Table Storage table comprises two elements:
○ Partition key - identifies the partition containing the row
○ Row key - unique to each row in the same partition
▪ Items in the same partition are stored in row key order
- This scheme enables an application to quickly perform Point queries that identify a single row and
Range queries that fetch a contiguous block of rows in a partition
Point query
○ When an application retrieves a single row, the partition key enables Azure to quickly home
in on the correct partition, and the row key lets Azure identify the row in that partition
○ The partition key and row key effectively define a clustered index over the data
Range query
○ The application searches for a set of rows in a partition, specifying the start and end points of
the set as row keys.
The columns in a table can hold numeric, string, or binary data up to 64 KB in size
A table can have up to 252 columns, apart from the partition and row keys.
The maximum row size is 1 MB
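The point and range query patterns above can be sketched with a small in-memory model. This is an illustrative Python simulation of how a (PartitionKey, RowKey) scheme behaves, not the Azure Table Storage SDK: the `Table` class and its methods are hypothetical names for the sketch.

```python
from bisect import bisect_left, bisect_right

# Illustrative in-memory model of Azure Table Storage partitioning.
# Each partition holds its rows sorted by row key, mirroring how the
# service stores items in row key order within a partition.
class Table:
    def __init__(self):
        self.partitions = {}  # partition key -> sorted list of (row_key, entity)

    def insert(self, partition_key, row_key, entity):
        rows = self.partitions.setdefault(partition_key, [])
        keys = [r[0] for r in rows]
        rows.insert(bisect_left(keys, row_key), (row_key, entity))

    def point_query(self, partition_key, row_key):
        # The partition key narrows the search to one partition;
        # the row key identifies the single row within it.
        rows = self.partitions.get(partition_key, [])
        keys = [r[0] for r in rows]
        i = bisect_left(keys, row_key)
        if i < len(keys) and keys[i] == row_key:
            return rows[i][1]
        return None

    def range_query(self, partition_key, start_key, end_key):
        # Fetch a contiguous block of rows between two row keys.
        rows = self.partitions.get(partition_key, [])
        keys = [r[0] for r in rows]
        return [e for _, e in rows[bisect_left(keys, start_key):bisect_right(keys, end_key)]]

table = Table()
table.insert("electronics", "0002", {"name": "mouse"})
table.insert("electronics", "0001", {"name": "keyboard"})
table.insert("furniture", "0001", {"name": "desk"})

print(table.point_query("electronics", "0001"))          # {'name': 'keyboard'}
print(table.range_query("electronics", "0001", "0002"))  # both electronics rows
```

Because each partition is searched independently, neither query ever touches the `furniture` partition, which is the I/O saving the notes describe.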
○ Storing TBs of structured data capable of serving web-scale applications
Ex: product catalog for eCommerce applications, and customer information, where the data
can be quickly identified and ordered by a composite key
○ Storing datasets that don't require complex joins, foreign keys or stored procedures, and
that can be denormalized for fast access.
In an IoT system, you might use Azure Table Storage to capture device sensor data
○ Capturing event logging and performance monitoring data.
Event log and performance information typically contain data that is structured according to
the type of event or performance measure being recorded.
Ex: partitioned by event type and ordered by date and time
Alternatively, if you need to analyze an ordered series of events and performance measures
chronologically, partition the data by date. To support both access patterns, consider storing
the data twice: first by type, and again by date.
Writing data is fast, and the data is static once it has been recorded
- Azure Table Storage is intended to support very large volumes of data, up to several hundred TBs
in size.
- As you add rows to a table, Azure Table Storage automatically manages the partitions in a table and
allocates storage as necessary
- Provides high-availability guarantees in a single region.
The data for each table is replicated three times within an Azure region.
At additional cost, you can create tables in geo-redundant storage
- Azure Table Storage helps to protect your data. You can configure security and role-based access
control to ensure that only the people or applications that need to see your data can actually
retrieve it
- On the New page, select Storage account - blob, file, table, queue
- On the Create storage account page, enter the following details, and then select Review + create
- On the validation page, click Create, and wait while the new storage account is configured
- When the Your deployment is complete page appears, select Go to resource
- On the Overview page, for the new storage account, select Tables
- On the Tables page, select + Table
- In the Add table dialog box, enter testtable for the name of the table, and then select OK
- When the new table has been created, select Storage Explorer
- On the Storage Explorer page, expand Tables, and then select testtable. Select Add to insert a new
entity into the table
Note: in Storage Explorer, rows are also called entities
- In the Add Entity dialog box, enter your own values for the PartitionKey and RowKey properties,
and then select Add Property.
Add a String property called Name and set the value to your name.
Select Add Property again, and add a Double property (this is numeric) named Age, and set the
value to your age
Select Insert to save the entity
- If time allows, experiment with creating additional entities. Not all entities must have the same
properties. You can use the Edit function to modify the values in an entity, and add or remove
properties. The Query function enables you to find entities that have properties with a specified
set of values
- Blob storage provides three access tiers, which help to balance access latency and storage cost
○ The Hot tier (default)
▪ Use this tier for blobs that are accessed frequently
▪ The blob data is stored on high-performance media
○ The Cool tier
▪ Has lower performance and incurs reduced storage charges compared to the Hot tier.
▪ Use this tier for data that is accessed infrequently
▪ You can migrate a blob from the Hot tier to the Cool tier, and vice versa
○ The Archive tier
▪ Provides the lowest storage cost, but with increased latency
▪ Intended for historical data that mustn't be lost, but is required only rarely
▪ Blobs in the Archive tier are effectively stored in an offline state.
▪ Typical reading latency for hot and cool tiers is a few milliseconds, but for the Archive
tier, it can take hours for the data to become available.
▪ To retrieve a blob from the Archive tier, you must change the access tier to Hot or
Cool. The blob will then be rehydrated. You can read the blob only when the
rehydration process is complete.
You can create lifecycle management policies for blobs in a storage account. A lifecycle
management policy can automatically move a blob from Hot to Cool, and then to the Archive tier,
as it ages and is used less frequently (policy is based on the number of days since modification). A
lifecycle management policy can also arrange to delete outdated blobs.
Note:
Azure Blob Storage is also used as the basis for Azure Data Lake Storage
Azure Data Lake Storage is used for performing big data analytics
- On the home page, select Storage accounts
- On the Storage accounts page, select the storage account you created
- On the Overview page for your storage account, select Storage Explorer
- On the Storage Explorer page, right-click BLOB CONTAINERS, and then select Create blob container
- In the New Container dialog box, give your container a name, accept the default public access
level and then select Create
- In the Storage Explorer window, expand BLOB CONTAINERS, and then select your new blob
container
- In the Upload blob dialog box, use the files button to pick a file of your choice on your computer,
and then select Upload
- When the upload has completed, close the Upload blob dialog box. Verify that the block blob
appears in your container
- If you have time, you can experiment uploading other files as block blobs. You can also download
blobs back to your computer using the Download button
- You create Azure File Storage in a storage account
- Azure File Storage enables you to share up to 100 TB of data in a single storage account. The
maximum size of a single file is 1TB, but you can set quotas to limit the size of each share below
this figure.
- Supports up to 2000 concurrent connections per shared file
- Once you've created a storage account, you can upload files to Azure File Storage using the Azure
portal, or tools such as the AzCopy utility. You can also use the Azure File Sync service to
synchronize locally cached copies of shared files with the data in Azure File Storage.
- Azure File Storage offers two performance tiers
○ Standard tier - uses hard disk-based hardware in a datacenter
○ Premium tier - uses solid-state disks, offers greater throughput, but is charged at a higher
rate
Note:
Don't use Azure File Storage for files that can be written by multiple concurrent processes
simultaneously. Multiple writers require careful synchronization, otherwise the changes made by
one process can be overwritten by another.
The alternative solution is to lock the file as it is written, and then release the lock when the write
operation is complete. However, this approach can severely impact concurrency and limit
performance.
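The lock-then-write approach in the note can be sketched with an in-process lock. This is a conceptual Python sketch only: a real shared file on Azure File Storage would need an SMB byte-range lock or similar, and the names here (`file_lock`, `shared_lines`, `append_line`) are hypothetical.

```python
import threading

# Illustrative sketch of "lock the file as it is written, then release".
# An in-process lock stands in for a file lock; a list stands in for
# the shared file's contents.
file_lock = threading.Lock()
shared_lines = []

def append_line(writer_id, text):
    # Without the lock, two writers could interleave or overwrite each
    # other's changes; holding the lock serializes the whole write.
    with file_lock:
        shared_lines.append(f"{writer_id}: {text}")

threads = [
    threading.Thread(target=append_line, args=(i, "status update"))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared_lines))  # 4 - every writer's line survives
```

The serialization is also exactly why the note warns about performance: only one writer makes progress at a time while the lock is held.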
Azure File Storage is a fully managed service. Your shared data is replicated locally within a region, but
can also be geo-replicated to a second region.
Azure aims to provide up to 300 MB/second of throughput for a single Standard file share, but you can
increase throughput capacity by creating a Premium file share, for additional cost.
All data is encrypted at rest, and you can enable encryption for data in-transit between Azure File
Storage and your applications.
- In the New file share dialog box, enter a name for your file share, leave Quota empty, and then
select Create
- In the Storage Explorer window, expand FILE SHARES and select your new file share, and then
select Upload
Tip
If your new file share doesn't appear, right-click FILE SHARES, and then select Refresh.
- In the Upload files dialog box, use the files button to pick a file of your choice on your computer, and
then select Upload
- When the upload has completed, close the Upload files dialog box. Verify that the file appears in the
file share
Tip
If the file doesn't appear, right-click FILE SHARES, and then select Refresh.
- A document can hold up to 2 MB of data, including small binary objects
- Cosmos DB provides APIs that enable you to access these documents using a set of well-known
interfaces
Note:
An API (Application Programming Interface) is used to write programs that need to access data.
The APIs are often different for different database management systems
○ Table API
▪ Enables you to use the Azure Table Storage API to store and retrieve documents.
▪ Enables you to switch from Table Storage to Cosmos DB without requiring you to
modify your existing applications
○ MongoDB API
▪ MongoDB is a document database with its own programmatic interface. Many
organizations run MongoDB on-premises.
▪ You can use the MongoDB API for Cosmos DB to enable a MongoDB application to run
unchanged against a Cosmos DB database.
▪ You can migrate the data in the MongoDB database to Cosmos DB running in the
cloud, but continue to run your existing applications to access this data.
○ Cassandra API
▪ A column family database management system.
▪ Many organizations run it on-premises
▪ The Cassandra API for Cosmos DB provides a Cassandra-like programmatic interface
for Cosmos DB.
▪ Enables you to quickly migrate Cassandra databases and applications to Cosmos DB.
○ Gremlin API
▪ Implements a graph database interface to Cosmos DB.
▪ A graph is a collection of data objects and directed relationships
▪ Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you
to perform graph queries over data.
▪ Using the Gremlin API, you can walk through the objects and relationships in the graph to
discover all manner of complex relationships
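Walking a graph of objects and directed relationships, as described above, can be sketched in plain Python. This models the traversal concept only, not the Gremlin query language; the vertex names and the `knows` label are invented for the example.

```python
# Illustrative model of a graph: vertices plus directed, labeled edges.
vertices = {"alice", "bob", "carol", "dave"}
edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
]

def out_neighbors(vertex, label):
    """Follow outgoing edges with the given label from one vertex."""
    return {dst for src, lbl, dst in edges if src == vertex and lbl == label}

def walk(start, label, hops):
    """Walk `hops` steps through the graph, returning the reachable vertices."""
    frontier = {start}
    for _ in range(hops):
        frontier = {n for v in frontier for n in out_neighbors(v, label)}
    return frontier

print(walk("alice", "knows", 2))  # {'carol'} - a friend-of-a-friend query
```

Each hop follows the directed relationships one step further, which is how graph queries discover indirect (friend-of-a-friend style) relationships.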
Note:
The primary purpose of the Table, MongoDB, Cassandra, and Gremlin APIs is to support
existing applications. If you are building a new application and database, you should use the
SQL API.
- Documents in a Cosmos DB partition aren't sorted by ID. Instead, Cosmos DB maintains a separate
index.
This index contains not only the document IDs, but also tracks the value of every other field in
each document. This index is created and maintained automatically.
This index enables you to perform queries that specify criteria referencing any fields in a
container, without incurring the need to scan the entire partition to find that data.
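The effect of that automatic index can be sketched with a small in-memory model: every field of every document is indexed as the document is written, so a query on any field becomes an index lookup rather than a scan of the whole partition. This is an illustrative Python sketch, not the Cosmos DB engine; `insert_doc` and `query` are hypothetical names.

```python
from collections import defaultdict

# Illustrative sketch of an automatically maintained field index:
# every (field, value) pair of every stored document maps back to
# the IDs of the documents that contain it.
documents = {}
index = defaultdict(set)  # (field, value) -> set of document IDs

def insert_doc(doc_id, doc):
    documents[doc_id] = doc
    for field, value in doc.items():
        index[(field, value)].add(doc_id)  # index maintained on write

def query(field, value):
    # Index lookup: no scan over every document in the partition.
    return [documents[i] for i in sorted(index[(field, value)])]

insert_doc("1", {"city": "Seattle", "category": "bike"})
insert_doc("2", {"city": "London", "category": "bike"})
insert_doc("3", {"city": "Seattle", "category": "car"})

print(query("city", "Seattle"))  # documents 1 and 3
```

Because every field is indexed, the same data answers queries on `city` or `category` equally cheaply, which mirrors the "criteria referencing any fields" point above.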
- Cosmos DB guarantees less than 10-ms latencies for both reads (indexed) and writes at 99th
percentile, all around the world. This capability enables sustained ingestion of data and fast
queries for highly responsive apps
- Cosmos DB is certified for a wide array of compliance standards. Additionally, all data in Cosmos
DB is encrypted at rest and in motion
- Cosmos DB is a foundational service in Azure.
- Cosmos DB is highly suitable for these scenarios
○ IoT and telematics
▪ Ingest large amounts of data in frequent bursts of activity
▪ The data can then be used by analytics services, such as Azure Machine Learning,
Azure HDInsight, and Power BI. Additionally, you can process the data in real time
using Azure Functions
○ Retail and marketing
▪ Ex: Windows Store and Xbox Live
▪ Used in the retail industry for storing catalog data and for event sourcing in order
processing pipelines
○ Gaming
▪ Modern games perform graphical processing on mobile/console clients, but rely on
the cloud to deliver customized content like in-game stats, social media integration
and high-score leaderboards.
▪ A game database needs to be fast and be able to handle massive spikes in request
rates during new game launches and feature updates
○ Web and mobile applications
▪ Well suited for modeling social interactions, integrating with third-party services and
for building rich personalized experiences.
▪ The Cosmos DB SDKs can be used to build rich iOS and Android applications using the
popular Xamarin framework