Cosmos DB Study
Integration with Azure Synapse Analytics (Synapse Link) must be enabled at the database and container level (it's the analytical storage check box).
Autoscale throughput is helpful if your team cannot predict your throughput needs accurately, or if you otherwise use the max throughput amount for less than 66% of hours per month.
RUs
- One RU for a point read and roughly six RUs for a write of a 1-KB document under optimal conditions.
The TTL value for an item is configured by setting the ttl path of the item. The TTL value for an item will only work if the DefaultTimeToLive property is configured for the parent container.
Examples
Container.DefaultTimeToLive Item.ttl Expiration in seconds
1000 null 1000
1000 -1 This item will never expire
1000 2000 2000
=====
400 Bad Request Something was wrong with the item in the body of the request
403 Forbidden Container was likely full
409 Conflict Item in container likely already had a matching id
413 RequestEntityTooLarge Item exceeds max entity size
429 TooManyRequests Current request exceeds the maximum RU/s provisioned for
the container
PartitionKey partitionKey = new ("accessories-used");
TransactionalBatch batch = container.CreateTransactionalBatch(partitionKey)
    .CreateItem<Product>(saddle)
    .CreateItem<Product>(handlebar);
// Execute all operations in the batch as a single atomic transaction
using TransactionalBatchResponse response = await batch.ExecuteAsync();
====
TransactionalBatchOperationResult<Product> firstResult =
    response.GetOperationResultAtIndex<Product>(0);
Product firstProductResult = firstResult.Resource;
TransactionalBatchOperationResult<Product> secondResult =
    response.GetOperationResultAtIndex<Product>(1);
Product secondProductResult = secondResult.Resource;
====
product.price = 50d;
====
await Task.WhenAll(concurrentTasks);
When we invoke Task.WhenAll, the SDK will kick in to create batches that group our operations by physical partition, then distribute the requests to run concurrently. Grouping operations greatly improves efficiency by reducing the number of back-end requests and allowing batches to be dispatched to different physical partitions in parallel. It also reduces thread count on the client, making it easier to consume more throughput than you could if the operations ran individually on individual threads.
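As a hedged sketch of how this pattern is typically wired up (connectionString, Product, and productsToInsert are assumed to exist already; the database and container names are illustrative):
CosmosClient client = new(connectionString, new CosmosClientOptions
{
    AllowBulkExecution = true // lets the SDK group operations into per-partition batches
});
Container container = client.GetContainer("cosmicworks", "products");
List<Task> concurrentTasks = new();
foreach (Product product in productsToInsert)
{
    // Each task is an individual operation; the SDK batches them behind the scenes
    concurrentTasks.Add(container.CreateItemAsync(
        product,
        new PartitionKey(product.categoryId)));
}
await Task.WhenAll(concurrentTasks);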
Throughput consumption
The request units per second (RU/s) consumed will be higher than if the same operations were executed individually. This increase should be considered as you evaluate total throughput requirements, measured against other operations that will happen concurrently.
Latency impact
When the SDK is attempting to fill a batch and doesn't quite have enough items, it will wait 100 milliseconds for more items. This wait can affect overall latency.
Document size
The SDK automatically creates batches for optimization with a maximum of 2 MB (or 100 operations). Smaller items can take advantage of this optimization, with oversized items having an inverse effect.
====
One interesting caveat is that it doesn't matter what name is used for the source, as this name will reference the source moving forward. You can think of it as a variable. It's not uncommon to use a single letter from the container name:
SELECT
p.name,
p.price
FROM
p
====
{
"name": "LL Bottom Bracket",
"category": "Components, Bottom Brackets",
"scannerData": {
"price": 53.99
}
}
SELECT
p.name,
p.categoryName AS category,
{ "price": p.price } AS scannerData
FROM
products p
WHERE
p.price >= 50 AND
p.price <= 100
====
SELECT DISTINCT
p.categoryName
FROM
products p
====
[
"Components, Road Frames",
"Components, Touring Frames",
"Bikes, Touring Bikes",
"Clothing, Vests",
"Accessories, Locks",
"Components, Pedals",
...
====
First, we can use the IS_DEFINED built-in function to check if the tags property
exists at all in this item:
SELECT
IS_DEFINED(p.tags) AS tags_exist
FROM
products p
[
{
"tags_exist": false
}
]
====
We can use the IS_ARRAY built-in function to check if the tags property is an
array:
SELECT
IS_ARRAY(p.tags) AS tags_is_array
FROM
products p
We can also check if the tags property is null or not using the IS_NULL built-in
function:
SELECT
IS_NULL(p.tags) AS tags_is_null
FROM
products p
=====
SELECT
p.id,
p.price,
(p.price * 1.25) AS priceWithTax
FROM
products p
WHERE
IS_NUMBER(p.price)
SELECT
p.id,
p.price
FROM
products p
WHERE
IS_STRING(p.price)
SELECT VALUE
LOWER(p.sku)
FROM
products p
SELECT
*
FROM
products p
WHERE
p.retirementDate >= GetCurrentDateTime()
=====
"tags": [
{
"id": "2CE9DADE-DCAC-436C-9D69-B7C886A01B77",
"name": "apparel",
"class": "group"
},
{
"id": "CA170AAD-A5F6-42FF-B115-146FADD87298",
"name": "worn",
"class": "trade-in"
},
{
"id": "CA170AAD-A5F6-42FF-B115-146FADD87298",
"name": "no-damaged",
"class": "trade-in"
}
]
SELECT
p.id,
p.name,
t.name AS tag
FROM
products p
JOIN
t IN p.tags
[
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "apparel"
},
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "worn"
},
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "no-damaged"
}
]
=====
string sql = "SELECT p.name, t.name AS tag FROM products p JOIN t IN p.tags WHERE p.price >= @lower AND p.price <= @upper";
QueryDefinition query = new (sql)
.WithParameter("@lower", 500)
.WithParameter("@upper", 1000);
====
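The pagination loop below assumes a feed iterator was already created from the query definition above, for example:
FeedIterator<Product> iterator = container.GetItemQueryIterator<Product>(query);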
while(iterator.HasMoreResults)
{
foreach(Product product in await iterator.ReadNextAsync())
{
// Handle individual items
}
}
====
That you will want to connect to the first writable (primary) region of your
account
That you will use the default consistency level for the account with your read
requests
That you will connect directly to data nodes for requests
Gateway All requests are routed through the Azure Cosmos DB gateway as a
proxy
Direct The gateway is only used in initialization and to cache addresses for direct connectivity to data nodes
======
Bounded Staleness
Consistent Prefix
Eventual
Session
Strong
For a single-region account, the minimum value of K and T is 10 write operations or 5 seconds. For multi-region accounts, the minimum value of K and T is 100,000 write operations or 300 seconds.
The ConsistencyLevel setting can only be used to weaken the consistency level for reads. It cannot be strengthened or applied to writes.
If you use a MaxItemCount of -1, you should ensure the total response doesn't exceed the service limit for response size: the max response size is 4 MB.
=====
Console.WriteLine($"[{Convert.ToInt32(response.StatusCode)}]\t{response.StatusCode}");
return response;
}
}
====
All data in Azure Cosmos DB SQL API containers is indexed by default. This occurs
because the container includes a default indexing policy that’s applied to all
newly created containers. The default indexing policy consists of the following
settings:
The inverted index is updated for all create, update, or delete operations on an
item
All properties for every item are automatically indexed
Range indexes are used for all strings or numbers
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
]
}
=====
The consistent indexing mode updates the index synchronously as you perform individual operations that modify an item (create, update, or delete). This indexing mode will be the standard choice for most containers to ensure the index is updated as items change.
The none indexing mode completely disables indexing on a container. This indexing
mode is a scenario-specific mode where the indexing operation is either unnecessary
or could impact the solution's overall performance. Two examples include:
A bulk operation to create, update, or delete multiple documents may benefit from
disabling indexing during the bulk execution period. Once the bulk operations are
complete, the indexing mode can be switched back to consistent.
Solutions that use containers as a pure key-value store only perform point-read
operations. These containers do not benefit from the secondary indexes created by
running the indexer.
====
Three primary operators are used when defining a property path:
The ? operator indicates that a path terminates with a string or number (scalar)
value
The [] operator indicates that this path includes an array and avoids having to
specify an array index value
The * operator is a wildcard and matches any element beyond the current path
Using these operators, you can create a few example property path expressions for
the example JSON item:
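For instance, for a hypothetical item with a name scalar and a tags array of objects (paths below are illustrative):
/name/? matches the scalar value of the name property
/tags/[]/name/? matches the name value of any entry in the tags array, without specifying an index
/tags/* matches the tags property and anything beyond it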
=====
The excluded path excludes all possible properties within the category path; however, the included path is more precise and specifically includes the category.name property. The result is that all properties within category are not indexed, with the sole exception of the category.name property.
Indexing policies must include the root path and all possible values (/*) as either
an included or excluded path. More customizations exist as a level of precision
beyond that base. This leads to two fundamental indexing strategies you will see in
many examples.
====
policy.ExcludedPaths.Add(
new ExcludedPath{ Path = "/*" }
);
policy.IncludedPaths.Add(
new IncludedPath{ Path = "/name/?" }
);
policy.IncludedPaths.Add(
new IncludedPath{ Path = "/categoryName/?" }
);
====
You can also use composite indexes with queries that have different permutations of
filters and order by clauses.
=====
For example, to create a composite index of (name ASC, price DESC), you can define
a JSON object with this structure:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/_etag/?"
}
],
"compositeIndexes": [
[
{
"path": "/name",
"order": "ascending"
},
{
"path": "/price",
"order": "descending"
}
]
]
}
=====
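The StartAsync call below assumes a change feed processor built roughly like this (container, leaseContainer, and the handler body are illustrative assumptions, not from these notes):
ChangeFeedProcessor processor = container
    .GetChangeFeedProcessorBuilder<Product>(
        processorName: "productItemProcessor",
        onChangesDelegate: async (IReadOnlyCollection<Product> changes, CancellationToken cancellationToken) =>
        {
            foreach (Product product in changes)
            {
                // Handle each changed item
            }
        })
    .WithInstanceName("consoleApp")
    .WithLeaseContainer(leaseContainer)
    .Build();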
await processor.StartAsync();
======
The first step is to create a data source. The data source points to somewhere
where data is stored. With Azure Cosmos DB SQL API, the data source is a reference
to an existing account with the following parameters configured:
Parameter Value
Connection string Connection string for Azure Cosmos DB account
Database Name of target database
Collection Name of target container
Query A SQL query to select items to be indexed
=====
Once the data source is configured, an index should be created that would be the
target of the indexing operation. The index contains, at a minimum, a name and a
key. The key refers to a unique identifier field for each JSON document in the
index.
Each field in the index should be configured to enable or disable features when
searching. These optional features allow extra search functionality on specific
fields when it makes sense. For each field, you must configure whether the field
is:
Feature Description
Retrievable Configures the field to be projected in search result sets
Filterable Accepts OData-style filtering on the field
Sortable Enables sorting using the field
Facetable Allows field to be dynamically aggregated and grouped
Searchable Allows search queries to match terms in the field
=====
The final step is to configure the indexer’s name and schedule. The schedule
determines how often the indexer will run to pull data from the data source and
populate the index with JSON documents.
=====
The default query for the Azure Cosmos DB SQL API data source is the following SQL
query.
SELECT
*
FROM
c
WHERE
c._ts >= @HighWaterMark
ORDER BY
c._ts
This query only finds items whose timestamp (_ts property) is greater than or equal
to a built-in high watermark field. The high watermark field comes from a built-in
change detection policy that attempts to identify whether an item has been changed
or not.
To accomplish change detection, the indexer will index all items returned by the
query. It will then store a timestamp as the high watermark. The next indexer run
will then index items with a timestamp greater than or equal to the stored high
watermark. This strategy effectively indexes all items that have been created or
changed since the last run.
If your SQL query sorts the items to index using the timestamp, then Azure
Cognitive Search can implement incremental progress during indexing. If the indexer
fails for a transient reason, sorting the timestamps will allow the indexer to
resume indexing from the failure point instead of reindexing the entire container
again. This setting must be enabled when configuring the data source.
======
If an item is deleted from a container in Azure Cosmos DB SQL API, that item may
not be deleted from the index in Azure Cognitive Search. To enable tracking of
deleted items, you must configure a policy to track when an item is deleted.
{
"id": "E08E4507-9666-411B-AAC4-519C00596B0A",
"categoryId": "86F3CBAB-97A7-4D01-BABB-ADEFFFAED6B4",
"sku": "TI-R092",
"name": "LL Road Tire",
"_isDeleted": true
}
Using the soft-delete policy, the softDeleteColumnName for the data source (Azure
Cosmos DB SQL API) would be configured as _isDeleted. The softDeleteMarkerValue
would then be set to true. Using this strategy, Azure Cognitive Search will remove
items that have been soft-deleted from the container.
======
Read or updated together: Data that's read or updated together is nearly always
modeled as a single document. This is especially true because our objective for our
NoSQL model is to reduce the number of requests to our database. In our scenario,
all of the customer entities are read or written together.
1:1 relationship: For example, Customer and CustomerPassword have a 1:1
relationship.
1:Few relationship: In a NoSQL database, it's necessary to distinguish 1:Many
relationships as bounded or unbounded. Customer and CustomerAddress have a bounded
1:Many relationship because customers in an e-commerce application normally have
only a handful of addresses to ship to. When the relationship is bounded, this is a
1:Few relationship.
When should you reference data?
Reference data as separate documents when the following criteria apply to your
data:
=====
In Azure Cosmos DB, you increase storage and throughput by adding more physical
partitions to access and store data. The maximum storage size of a physical
partition is 50 GB, and the maximum throughput is 10,000 RU/s.
The maximum size for a logical partition is 20 GB. Using a partition key with high cardinality allows you to avoid this 20-GB limit by spreading your data across a larger number of logical partitions.
=====
It's important to understand the access patterns for your application to ensure
that requests are spread as evenly as possible across partition key values. When
throughput is provisioned for a container in Azure Cosmos DB, it's allocated evenly
across all the physical partitions within a container.
As an example, if you have a container with 30,000 RU/s, this workload is spread across the three physical partitions for the same six tenants mentioned earlier, so each physical partition gets 10,000 RU/s. If tenant D consumes all of its 10,000 RU/s, it will be rate limited because it can't consume the throughput allocated to the other partitions. This results in poor performance for tenants C and D and leaves unused compute capacity in the other physical partitions and remaining tenants. Ultimately, this partition key results in a database design where the application workload can't scale.
=====
When you're choosing a partition key, you also need to consider whether the data is
read heavy or write heavy. You should seek to distribute write-heavy requests with
a partition key that has high cardinality.
For read-heavy workloads, you should ensure that queries are processed by one or a limited number of physical partitions by including a WHERE clause with an equality filter on the partition key, or an IN operator on a subset of partition key values, in your queries.
You might worry here that making the id the partition key means that we'll have as
many logical partitions as there are customers, with each logical partition
containing only a single document. Millions of customers would result in millions
of logical partitions.
But this is perfectly fine! Logical partitions are a virtual concept, and there's no limit to how many logical partitions you can have. Azure Cosmos DB will collocate multiple logical partitions on the same physical partition. As logical partitions grow in number or in size, Cosmos DB will move them to new physical partitions when needed.
====
One thing you might have noticed with the salesOrder container is that it shares the same partition key as the customer container. The customer container has a partition key of id and salesOrder has a partition key of customerId. When data shares a partition key and has similar access patterns, it's a candidate for being stored in the same container. As a NoSQL database, Azure Cosmos DB is schema agnostic, so mixing entities with different schemas is not only possible but, under these conditions, also another best practice. But to combine the data from these two containers, you'll need to make more changes to your schema.
First, you need to add a customerId property to each customer document. Customers will now have the same value for id and customerId. Next, you need a way to distinguish a sales order from a customer in the container, so you'll add a discriminator property called type that has a value of customer or salesOrder for each entity.
With these changes, you can now store both the customer data and sales order data
in your new customer container. Each customer is in its own logical partition and
will have one customer item with all its sales orders. For your second operation
here, you now have a query you can run to list all sales orders for a customer.
====
Before your new model is complete, one last operation to look at is to query your
top 10 customers by the number of sales orders. In your current model, you first do
a group by on each customer and sum for sales orders in your customer container.
You then sort in descending order and take the top 10 results. Even though
customers and sales orders sit in the same container, this type of query is not
something you can currently do.
Now, every time a customer creates a new sales order and a new sales order is
inserted into your customer container, you need a way to update the customer
document and increment the salesOrderCount property by one. To do this, you need a
transaction. Azure Cosmos DB supports transactions when the data sits within the
same logical partition.
Because customers and sales orders reside in the same logical partition, you can
insert the new sales order and update the customer document within a transaction.
There are two ways to implement transactions in Azure Cosmos DB: by using stored
procedures or by using a feature called transactional batch, which is available in
both .NET and Java SDKs.
=======
Session consistency is a great option for applications where the end users may be confused if they cannot immediately see any transaction they just made.
Consistent prefix consistency is ideal for applications where the order of read operations matters more than the latency.
Eventual consistency is a good option for applications that don't require any linearizability or ordering guarantees.
Each new Azure Cosmos DB account has a default consistency level of Session. In the
Azure portal, the Default consistency pane is used to configure a new default
consistency level for the entire account.
=====
The consistency level can only be relaxed on a per-request basis, not strengthened.
=====
Session tokens can be manually pulled out of a client and used on another client to
preserve a session between multiple clients.
=====
Enabling the ability to write to any region is a turnkey operation that doesn’t
interrupt the application’s availability.
=======
Type Description
Insert This conflict occurs when more than one item is inserted simultaneously with the same unique identifier in multiple regions
Replace Replace conflicts occur when multiple client applications update the same item concurrently in separate regions
Delete Delete conflicts occur when a client attempts to update an item that has been deleted in another region at the same time
The default conflict resolution policy in Azure Cosmos DB is Last Write Wins. This policy uses the timestamp (_ts) to determine which item was written last. In simple terms, if multiple items are in conflict, the item with the largest value for the _ts property wins. In the case of a delete conflict, the operation to delete an item will always win out over other operations.
You can only set a conflict resolution policy on newly created containers.
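A hedged sketch of setting the default Last Write Wins policy at container creation time (database, container, and partition key names are illustrative):
ContainerProperties properties = new("products", "/categoryId")
{
    ConflictResolutionPolicy = new ConflictResolutionPolicy
    {
        Mode = ConflictResolutionMode.LastWriterWins,
        ResolutionPath = "/_ts" // the property compared to pick the winning write
    }
};
Container container = await database.CreateContainerIfNotExistsAsync(properties);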
======
A custom resolution policy will use a stored procedure to resolve conflicts between
items in different regions. All custom stored procedures must be implemented with
the following JavaScript function signature.
Parameter Description
existingItem The item that is already committed
incomingItem The item that's being inserted or updated that generated the
conflict
isTombstone Boolean indicating if the incoming item was previously deleted
conflictingItems Array of all committed items in the container that conflict with incomingItem
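The signature itself looks like this (the function name is yours to choose):
function resolver(incomingItem, existingItem, isTombstone, conflictingItems) {
    // custom conflict resolution logic goes here
}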
await container.Scripts.CreateStoredProcedureAsync(properties);
====
Index seek The query engine will seek an exact match on a field’s value by
traversing directly to that value and looking up how many items match. Once the
matched items are determined, the query engine will return the items as the query
result. The RU charge is constant for the lookup. The RU charge for loading and
returning items is linear based on the number of items.
Index scan The query engine will find all possible values for a field and then
perform various comparisons only on the values. Once matches are found, the query
engine will load and return the items as the query result. The RU charge is still
constant for the lookup, with a slight increase over the index seek based on the
cardinality of the indexed properties. The RU charge for loading and returning
items is still linear based on the number of items returned.
Full scan The query engine will load the items, in their entirety, from the transactional store to evaluate the filters. This type of scan does not use the index, and the RU charge for loading items is based on the number of items in the entire container.
An index scan can range in complexity from an efficient and precise index scan, to
a more involved expanded index scan, and finally the most complex full index scan.
=====
Suppose your application is write-heavy and only ever does point reads using the id
and partition key values. In that case, you can choose to disable indexing entirely
using a customized indexing policy.
=====
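The IndexMetrics output below assumes the query was executed with index metrics enabled, for example:
QueryRequestOptions options = new() { PopulateIndexMetrics = true };
FeedIterator<Product> iterator = container.GetItemQueryIterator<Product>(query, requestOptions: options);
FeedResponse<Product> response = await iterator.ReadNextAsync();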
Console.WriteLine(response.IndexMetrics);
====
double totalRUs = 0;
while (iterator.HasMoreResults)
{
    FeedResponse<Product> response = await iterator.ReadNextAsync();
    foreach (Product product in response)
    {
        // Do something with each product
    }
    Console.WriteLine($"RUs:\t\t{response.RequestCharge:0.00}");
    totalRUs += response.RequestCharge;
}
====
Console.WriteLine($"RUs:\t{response.RequestCharge:0.00}");
=====
For some workloads in Azure Cosmos DB, an integrated cache comes at a great
benefit. These workloads include, but are not limited to:
Workloads with far more read operations and queries than write operations
Workloads that read large individual items multiple times
Workloads that execute queries multiple times with a large amount of RU/s
Workloads that have hot partition keys for read operations and queries
======
For the .NET SDK client to use the integrated cache, you must make sure that three
things are true:
The client uses the dedicated gateway connection string instead of the typical
connection string
The client is configured to use Gateway mode instead of the default Direct
connectivity mode
The client’s consistency level must be set to session or eventual
To configure a point read operation to use the integrated cache, you must create an
object of type ItemRequestOptions. In this object, you can manually set the
ConsistencyLevel property to either ConsistencyLevel.Session or
ConsistencyLevel.Eventual. You can then use the options variable in the
ReadItemAsync method invocation.
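A minimal sketch of such a point read (the id and partition key values are illustrative):
ItemRequestOptions operationOptions = new()
{
    // Session or Eventual is required for the integrated cache
    ConsistencyLevel = ConsistencyLevel.Eventual
};
ItemResponse<Product> item = await container.ReadItemAsync<Product>(
    "68719518371",
    new PartitionKey("accessories-used"),
    operationOptions);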
=====
By default, the cache will keep data for five minutes. This staleness window can be
configured using the MaxIntegratedCacheStaleness property in the SDK.
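For example, a hedged sketch that caps staleness at one minute (the window is illustrative):
ItemRequestOptions operationOptions = new()
{
    ConsistencyLevel = ConsistencyLevel.Eventual,
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        // Serve cached results that are at most one minute old
        MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(1)
    }
};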
=====
By default, Azure Monitor will display the overall throughput of all Azure Cosmos DB operations performed by the selected account. To better analyze the throughput, more granular filtering is needed to find aggregate usage of the individual operation types, or to compare the usage of multiple operation types at the same time. The Add filter and Apply splitting options will help with those analyses.
====
DataPlaneRequests - This table logs back-end requests for operations that execute
create, update, delete, or retrieve data.
QueryRuntimeStatistics - This table logs query operations against the SQL API
account.
PartitionKeyStatistics - This table logs logical partition key statistics in estimated KB. It's helpful when troubleshooting storage skew.
PartitionKeyRUConsumption - This table logs every second aggregated RU/s
consumption of partition keys. It's helpful when troubleshooting hot partitions.
ControlPlaneRequests - This table logs Azure Cosmos DB account control data, for
example adding or removing regions in the replication settings.
=====
PartitionKeyRUConsumption.
That's correct. This table logs every second aggregated RU/s consumption of
partition keys. It's helpful when troubleshooting hot partitions.
=====
200 OK (List, Get, Replace, Patch, Query) The operation was successful.
Create a Document
400 Bad Request The override set in x-ms-consistency-level is stronger than the one set during account creation. For example, if the consistency level is Session, the override can't be Strong or Bounded.
Get a Document
304 Not Modified The requested document wasn't modified since the eTag value specified in the If-Match header. The service returns an empty response body.
400 Bad Request The override set in the x-ms-consistency-level header is stronger than the one set during account creation. For example, if the consistency level is Session, the override can't be Strong or Bounded.
404 Not Found The document is no longer a resource, that is, the document was deleted.
Replace a Document
400 Bad Request The JSON body is invalid. Check for missing curly brackets or quotes.
404 Not Found The document no longer exists, that is, the document was deleted.
409 Conflict The id provided for the new document has been taken by an existing document.
413 Entity Too Large The document size in the request exceeded the allowable document size in a request.
Patch a Document
Delete Document
Query Documents
400 Bad Request The request has incorrect SQL syntax or is missing required headers.
408 Request Timeout The operation did not complete within the allotted amount of time. This code is returned when a stored procedure, trigger, or UDF (within a query) does not complete execution within the maximum execution time.
429 Too Many Requests The collection has exceeded the provisioned throughput limit. Retry the request after the server-specified retry-after duration. For more information, see request units.
500 Internal Server Error The operation failed because of an unexpected service error. Contact support.
503 Service Unavailable The operation couldn't be completed because the service was unavailable. This situation could happen because of network connectivity or service availability issues. It's safe to retry the operation. If the issue persists, contact support.
=====
A full backup is taken every 4 hours. Only the last two backups are stored by
default. Both the backup interval and the retention period can be configured in the
Azure portal. This configuration can be set during or after the Azure Cosmos DB
account has been created.
If Azure Cosmos DB's containers or database are deleted, the existing container and
database snapshots will be retained for 30 days.
Azure Cosmos DB backups are stored in Azure Blob storage.
Backups are stored in the current write region (or, if using multi-region writes, in one of the write regions) to guarantee low latency.
Snapshots of the backup are replicated to another region through geo-redundant storage (GRS). This replication provides resiliency against regional disasters.
Backups can't be accessed directly. To restore the backup, a support ticket needs
to be opened with the Azure Cosmos DB team.
Backups won't affect performance or availability. Furthermore, no RUs are consumed
during the backup process.
====
Geo-redundant backup storage: The default value. Copies the backup asynchronously
across the paired region.
Zone-redundant backup storage: Copies the backup synchronously across three Azure
availability zones in the primary region.
Locally redundant backup storage: Copies the backup synchronously three times
within a single physical location in the primary region.
=====
When creating a new Azure Cosmos DB account, select Periodic under the Backup
policy tab.
Backup Interval - This setting defines how often backups are taken. It can be set in minutes or hours. The interval period can be between 1 and 24 hours. The default is 240 minutes.
Backup Retention - This setting defines how long backups are kept. It can be set in hours or days. The retention period must be at least two times the backup interval and at most 720 hours (30 days). The default is 8 hours.
Backup storage redundancy - One of the three redundancy options discussed in the
previous section. The default is Geo-redundant backup storage.
=====
Always Encrypted requires that you create data encryption keys (DEK) ahead of time. The DEKs are created client-side using the Azure Cosmos DB SDK and are stored in the Azure Cosmos DB service. The DEKs are defined at the database level, so they can be shared across multiple containers. Each DEK you create can be used to encrypt only one property, or can be used to encrypt many properties. You can have multiple DEKs per database.
Customer-managed keys
For each property that you want to encrypt, the encryption policy defines:
The path of the property in the form of /property. Only top-level paths are
currently supported, nested paths such as /path/to/property aren't supported.
The ID of the DEK to use when encrypting and decrypting the property.
An encryption type. It can be either randomized or deterministic.
The encryption algorithm to use when encrypting the property. The specified
algorithm can override the algorithm defined when creating the key if they're
compatible.
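A hedged sketch of declaring such a policy when creating a container (the property path, key id, and container names are illustrative):
ClientEncryptionIncludedPath encryptedPath = new()
{
    Path = "/taxId",
    ClientEncryptionKeyId = "my-key",
    EncryptionType = "Deterministic", // or "Randomized"
    EncryptionAlgorithm = "AEAD_AES_256_CBC_HMAC_SHA256"
};
ContainerProperties properties = new("customers", "/regionId")
{
    ClientEncryptionPolicy = new ClientEncryptionPolicy(new[] { encryptedPath })
};
await database.CreateContainerAsync(properties);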
Because the Azure Cosmos DB service might need to support some querying
capabilities over the encrypted data, and it can't evaluate the data in plain text,
Always Encrypted has more than one encryption type. The encryption types supported
by Always Encrypted are:
Deterministic encryption: It always generates the same encrypted value for any
given plain text value and encryption configuration. Using deterministic encryption
allows queries to do equality filters on encrypted properties. However, it may
allow attackers to guess information about encrypted values by examining patterns
in the encrypted property. This is especially true if there's a small set of
possible encrypted values, such as True/False, or North/South/East/West region.
=====
Once we've created the CMK in Azure Key Vault, it's time to create our DEK in the parent database. To create this DEK, we'll use the CreateClientEncryptionKeyAsync method and pass the following information:
A string identifier that will uniquely identify the key in the database.
The encryption algorithm intended to be used with the key. Only one algorithm is
currently supported.
The key identifier of the CMK stored in Azure Key Vault. This parameter is passed
in a generic EncryptionKeyWrapMetadata object where the name can be any friendly
name you want, and the value must be the key identifier.
The following snippets show how we create this DEK in .NET.
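A sketch under the assumptions above (the exact algorithm identifier and overloads can vary across SDK versions; key names are illustrative):
await database.CreateClientEncryptionKeyAsync(
    "my-key",
    "AEAD_AES_256_CBC_HMAC_SHA256", // the only currently supported algorithm
    new EncryptionKeyWrapMetadata(
        keyStoreProvider.ProviderName,
        "akvKey",
        "https://<my-key-vault>.vault.azure.net/keys/<my-key>/<version>"));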
=====
It's a good security practice to rotate your CMKs regularly. You should also rotate your CMK if you suspect that the current CMK has been compromised. Once the CMK is rotated, you provide the new CMK identifier for the DEK rewrap so the DEK starts using it. This operation doesn't affect the encryption of your data, but rather the protection of the DEK. Review the following script that rewraps the DEK with the new CMK:
await database.RewrapClientEncryptionKeyAsync(
"my-key",
new EncryptionKeyWrapMetadata(
keyStoreProvider.ProviderName,
"akvKey",
" https://<my-key-vault>.vault.azure.net/keys/<new-key>/<version>"));
=====
When the property to encrypt is a JSON array, every entry of the array is
encrypted.
When the property to encrypt is a JSON object, only the leaf values of the object
get encrypted. The intermediate subproperty names remain in plain text form.
=====
The AddParameterAsync method passes the value of the query parameter used in
queries that filter on encrypted properties. This method takes the following
arguments:
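A hedged usage sketch (the /price property is illustrative; CreateQueryDefinition and AddParameterAsync come from the Microsoft.Azure.Cosmos.Encryption package):
QueryDefinition query = container.CreateQueryDefinition(
    "SELECT * FROM c WHERE c.price = @price");
// Arguments: parameter name, parameter value, path of the encrypted property
await query.AddParameterAsync("@price", 100, "/price");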
=====
Reading documents when only a subset of properties can be decrypted
Different document properties in the same container can use different encryption policies. Each policy can use different CMKs to encrypt the properties. If your client has access to some of the CMKs used to decrypt some of the properties, but not to the CMKs protecting other properties, you can still partially query the documents with the properties you can decrypt. Simply remove from your queries any properties whose CMKs you don't have access to. For example, if property1 was encrypted with key1 and property2 was encrypted with key2, and your app only has access to key1, your query should ignore property2 and could look like: SELECT c.property1, c.property3 FROM c.
=====
What is the proper set of steps to set up Always Encrypted for Cosmos DB?
Create the CMKs in Azure Key Vault, use the SDK to create the DEKs (with the CMKs) on a database, then create a container with the DEK and encryption policies.
That's correct. You can only set up Always Encrypted at the container's creation
time.
======
Data Explorer doesn't support RBAC. (This is the Data Explorer tab in the portal, not the Azure Cosmos DB Explorer service.)
That's correct. Data Explorer doesn't support RBAC; to query your data, use Azure Cosmos DB Explorer instead.
=====
az cosmosdb: This group contains the commands required to create and manage a new
Azure Cosmos DB account.
az cosmosdb sql: This subgroup of the az cosmosdb group contains the commands to
manage SQL API-specific resources such as databases and containers.
By default, this command will create a new account using the SQL API.
az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>'
az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>' \
--default-consistency-level 'eventual' \
--enable-free-tier 'true'
az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus'
======
az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus' failoverPriority=0 isZoneRedundant=False \
--locations regionName='westus2' failoverPriority=1 isZoneRedundant=False \
--locations regionName='centralus' failoverPriority=2 isZoneRedundant=False
az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--enable-automatic-failover 'true'
az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--enable-multiple-write-locations 'true'
To remove a region from an Azure Cosmos DB account, use the az cosmosdb update
command to specify the locations that you want to remain using the --locations
argument one or more times. Any location that is not included in the list will be
removed from the account.
az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus' failoverPriority=0 isZoneRedundant=False \
--locations regionName='westus2' failoverPriority=1 isZoneRedundant=False
======
az cosmosdb failover-priority-change \
--name '<account-name>' \
--resource-group '<resource-group>' \
--failover-policies 'eastus=0' 'centralus=1' 'westus2=2'
Even if you are not changing the priorities of every region, you must include all
regions in the failover-policies argument.
======
Changing the region with priority = 0 will trigger a manual failover for an Azure
Cosmos account.
The az cosmosdb failover-priority-change command is used to update the failover policies for an account. If you use this command and change the failover priority for the region that is currently set to 0, the command will trigger a manual failover.
az cosmosdb failover-priority-change \
--name '<account-name>' \
--resource-group '<resource-group>' \
--failover-policies 'westus2=0' 'eastus=1'
======
Which Azure CLI command will change the maximum amount of throughput for a
container using autoscale throughput?
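For reference, the az cosmosdb sql container throughput update command with the --max-throughput argument changes the autoscale maximum (all names below are placeholders):
az cosmosdb sql container throughput update \
    --account-name '<account-name>' \
    --resource-group '<resource-group>' \
    --database-name '<database-name>' \
    --name '<container-name>' \
    --max-throughput 2000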
======
{
"$schema":
"https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"resources": [
]
}
All resources we place in this template will be JSON objects within the resources
array.
======
Final
{
"$schema":
"https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.DocumentDB/databaseAccounts",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id))]",
"location": "[resourceGroup().location]",
"properties": {
"databaseAccountOfferType": "Standard",
"locations": [
{
"locationName": "westus"
}
]
}
},
{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]"
],
"properties": {
"resource": {
"id": "cosmicworks"
}
}
},
{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks/products')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]",
"[resourceId('Microsoft.DocumentDB/databaseAccounts/sqlDatabases',
concat('csmsarm', uniqueString(resourceGroup().id)), 'cosmicworks')]"
],
"properties": {
"options": {
"throughput": 400
},
"resource": {
"id": "products",
"partitionKey": {
"paths": [
"/categoryId"
]
}
}
}
}
]
}
======
}
resource: {
id: 'cosmicworks'
}
}
}
resource Container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2021-05-15' = {
parent: Database
name: 'customers'
properties: {
resource: {
id: 'customers'
partitionKey: {
paths: [
'/regionId'
]
}
}
}
}
=======
Argument Description
--resource-group The name of the resource group that is the target of the
deployment
--template-file The name of the file with the resources defined to deploy
======
The indexingPolicy object can be lifted with no changes and set to the properties.resource.indexingPolicy property of the Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers resource.
{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks/products')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]",
"[resourceId('Microsoft.DocumentDB/databaseAccounts/sqlDatabases',
concat('csmsarm', uniqueString(resourceGroup().id)), 'cosmicworks')]"
],
"properties": {
"options": {
"throughput": 400
},
"resource": {
"id": "products",
"partitionKey": {
"paths": [
"/categoryId"
]
},
"indexingPolicy": {
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/price/*"
}
],
"excludedPaths": [
{
"path": "/*"
}
]
}
}
}
}
resource Container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2021-05-15' = {
parent: Database
name: 'customers'
properties: {
resource: {
id: 'customers'
partitionKey: {
paths: [
'/regionId'
]
}
indexingPolicy: {
indexingMode: 'consistent'
automatic: true
includedPaths: [
{
path: '/address/*'
}
]
excludedPaths: [
{
path: '/*'
}
]
}
}
}
}
==========
Atomicity guarantees that all the work done inside a transaction is treated as a single unit where either all of it is committed or none of it is.
Consistency makes sure that the data is always in a healthy internal state across
transactions.
Isolation guarantees that no two transactions interfere with each other –
generally, most commercial systems provide multiple isolation levels that can be
used based on the application's needs.
Durability ensures that any change that's committed in the database will always be
present.
========
function name() {
}
Within the function, the getContext() method retrieves a context object, which can
be used to perform multiple actions, including:
Using the context object, you can invoke the getResponse() method to access the
HTTP response object to perform actions such as returning a HTTP OK (200) and
setting the response's body to a static string.
function greet() {
var context = getContext();
var response = context.getResponse();
response.setBody("Hello, Learn!");
}
======
Again, using the context object, you can invoke the getCollection() method to access the container using the JavaScript query API.
function createProduct(item) {
var context = getContext();
var container = context.getCollection();
container.createDocument(
container.getSelfLink(),
item
);
}
======
This stored procedure is almost complete. While this code will run fine, it does stand the risk of swallowing errors and potentially not returning if the stored procedure has exceeded the timeout. We should update the code by implementing two more changes:
Implement a callback that throws the error if one occurred and sets the response body to the newly created item.
Store the boolean return value of container.createDocument, and then use it to determine if we should return from the function due to an impending server timeout.
function createProduct(item) {
var context = getContext();
var container = context.getCollection();
var accepted = container.createDocument(
container.getSelfLink(),
item,
(error, newItem) => {
if (error) throw error;
context.getResponse().setBody(newItem)
}
);
if (!accepted) return;
}
======
Transactions are deeply and natively integrated into Azure Cosmos DB SQL API’s
JavaScript programming model. Inside a JavaScript function, all operations are
automatically wrapped under a single transaction. If the function completes without
any exception, all data changes are committed. Azure Cosmos DB’s SQL API will roll
back the entire transaction if a single exception is thrown from the script.
======
Alternatively, you can use file APIs such as System.IO.File to read a function from
a *.js file.
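Here, the properties variable is assumed to be built along these lines (the id and file name are illustrative):
StoredProcedureProperties properties = new()
{
    Id = "createProduct",
    Body = File.ReadAllText("createProduct.js")
};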
await container.Scripts.CreateStoredProcedureAsync(properties);
======
User-defined functions (UDFs) are used to extend the Azure Cosmos DB SQL API's query language grammar and implement custom business logic. UDFs can only be called from inside queries, as they enhance and extend the SQL query language.
UDFs do not have access to the context object and are meant to be used as compute-only code.
function name(input) {
return output;
}
function addTax(preTax) {
return preTax * 1.15;
}
SELECT
p.name,
p.price,
udf.addTax(p.price) AS priceWithTax
FROM
products p
=======
Triggers are defined as JavaScript functions. The function is then executed when
the trigger is invoked.
function name() {
}
Within the function, the getContext() method retrieves a context object, which can
be used to perform multiple actions, including:
Using the context object, you can invoke the getRequest() or getResponse() methods
to access the HTTP request and response objects. You can also invoke the
getCollection() method to access the container using the JavaScript query API.
Pre-trigger
Pre-triggers run before an operation and cannot have any input parameters. They can perform actions such as validating the properties of an item, or injecting missing properties.
function addLabel() {
    var context = getContext();
    var request = context.getRequest();
    var pendingItem = request.getBody();
    if (!('label' in pendingItem))
        pendingItem['label'] = 'new';
    request.setBody(pendingItem);
}
Post-trigger
Post-triggers run after an operation has completed and can have input parameters, even though they are not required. They have access to the HTTP response message right before it is sent to the client. They can perform actions such as updating or creating secondary items based on changes to your original item.
Let's walk through a slightly different example with the same JSON file. Now, a post-trigger will be used to create a second item with a different materialized view of our data. Our goal is to create a second item with three JSON properties: sourceId, categoryId, and displayName.
{
"sourceId": "caab0e5e-c037-48a4-a760-140497d19452",
"categoryId": "e89a34d2-47ee-4da8-bcf6-10f552604b79",
"displayName": "Handlebar [Accessories]",
}
We are including the categoryId property because all items created within a post-
trigger must have the same logical partition key as the original item that was the
source of the trigger.
We can start our function by getting both the container and HTTP response using the
getCollection() and getResponse() methods. We will also get the newly created item
using the getBody() method of the HTTP response object.
function createView() {
    var context = getContext();
    var container = context.getCollection();
    var response = context.getResponse();
    var createdItem = response.getBody();
    var viewItem = {
        sourceId: createdItem.id,
        categoryId: createdItem.categoryId,
        displayName: `${createdItem.name} [${createdItem.categoryName}]`
    };
======
Create a pre-trigger
if (!('label' in pendingItem))
pendingItem['label'] = 'new';
request.setBody(pendingItem);
}";
await container.Scripts.CreateTriggerAsync(properties);
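The properties variable here is assumed; a sketch of how it might be defined (the id and body source are illustrative, and a post-trigger would use TriggerType.Post instead):
TriggerProperties properties = new()
{
    Id = "addLabel",
    Body = File.ReadAllText("addLabel.js"), // or the inline string above
    TriggerOperation = TriggerOperation.Create,
    TriggerType = TriggerType.Pre
};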
======
Create a post-trigger
var viewItem = {
sourceId: createdItem.id,
categoryId: createdItem.categoryId,
displayName: `${createdItem.name} [${createdItem.categoryName}]`
};
await container.Scripts.CreateTriggerAsync(properties);
=========
TriggerOperation
All (0) Specifies all operations.
Create (1) Specifies create operations only.
Update (2) Specifies update operations only.
Delete (3) Specifies delete operations only.
Replace (4) Specifies replace operations only.
========
Remember, triggers are not automatically executed; they must be specified for each
database operation where you want them to execute.
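For example, a hedged sketch of opting in to the addLabel pre-trigger for a single create operation:
ItemRequestOptions options = new()
{
    PreTriggers = new List<string> { "addLabel" }
};
await container.CreateItemAsync(product, new PartitionKey(product.categoryId), options);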
======
You have authored a user-defined function named addTax. You are writing a SQL
query to return a flat array of scalar price values with the calculated tax value.
Which valid SQL query should you use for this task?
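The flat array of scalars comes from a VALUE projection over the UDF:
SELECT VALUE udf.addTax(p.price)
FROM products p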
======
--enable-analytical-storage true (az cosmosdb create flag that enables the analytical storage required for the Synapse Link integration noted at the top)
=====