Cosmosdb Study

SQL API Introduction

Integration with Synapse Analytics (Synapse Link) must be enabled at the database
and container level (this is the analytical storage check box).

Autoscale throughput is helpful if your team cannot predict your throughput needs
accurately or otherwise use the max throughput amount for < 66% of hours per month.

Azure Cosmos DB SQL API is optimized for write-heavy workloads.

- Guaranteed speed at any scale—even through bursts—with instant, limitless
elasticity, fast reads, and multi-master writes, anywhere in the world
- Fast, flexible app development with SDKs for popular languages, a native SQL API
along with APIs for MongoDB, Cassandra, and Gremlin, and no-ETL (extract,
transform, load) analytics
- Ready for mission-critical applications with guaranteed business continuity,
99.999-percent availability, and enterprise-grade security
- Fully managed and cost-effective serverless database with instant, automatic
scaling that responds to application needs

These capabilities suit applications that:

- Experience unpredictable spikes and dips in traffic
- Generate lots of data
- Need to deliver real-time user experiences
- Are depended upon for business continuity

RUs

- One RU for a read and six RUs for a write of a 1-KB document under optimal
conditions.

A document's "time-to-live" (TTL) is measured in seconds from the last modification
and can be set at the container level, with the ability to override it on a
per-item basis.

The TTL value for an item is configured by setting the ttl path of the item. The
TTL value for an item will only work if the DefaultTimeToLive property is
configured for the parent container.

Examples

Container.DefaultTimeToLive   Item.ttl   Expiration in seconds
1000                          null       1000
1000                          -1         This item will never expire
1000                          2000       2000
null                          null       This item will never expire
null                          -1         This item will never expire
null                          2000       This item will never expire
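To make the table concrete, here is a minimal sketch of configuring TTL with the .NET SDK. It assumes an existing Database instance named database, a hypothetical "products" container partitioned on /categoryId, and a Product class whose optional ttl property is serialized as "ttl" (Newtonsoft.Json shown).

using Newtonsoft.Json;

// Container-level default: items expire 1000 seconds after their last modification
// unless an item-level ttl overrides it.
ContainerProperties properties = new("products", "/categoryId")
{
    DefaultTimeToLive = 1000
};
Container container = await database.CreateContainerIfNotExistsAsync(properties);

public class Product
{
    public string id { get; set; }
    public string categoryId { get; set; }

    // null = inherit the container default, -1 = never expire, 2000 = expire after 2000 seconds
    [JsonProperty(PropertyName = "ttl", NullValueHandling = NullValueHandling.Ignore)]
    public int? ttl { get; set; }
}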

=====

400 Bad Request Something was wrong with the item in the body of the request
403 Forbidden Container was likely full
409 Conflict Item in container likely already had a matching id
413 RequestEntityTooLarge Item exceeds max entity size
429 TooManyRequests Current request exceeds the maximum RU/s provisioned for
the container
PartitionKey partitionKey = new ("accessories-used");
TransactionalBatch batch = container.CreateTransactionalBatch(partitionKey)
    .CreateItem<Product>(saddle)
    .CreateItem<Product>(handlebar);

using TransactionalBatchResponse response = await batch.ExecuteAsync();

CreateItemStream() Create item from existing stream
DeleteItem() Delete an item
ReadItem() Read an item
ReplaceItem() & ReplaceItemStream() Update an existing item or stream
UpsertItem() & UpsertItemStream() Create or update an existing item or stream
based on the item's unique identifier

====

TransactionalBatchOperationResult<Product> result =
response.GetOperationResultAtIndex<Product>(0);
Product firstProductResult = result.Resource;

TransactionalBatchOperationResult<Product> result =
response.GetOperationResultAtIndex<Product>(1);
Product secondProductResult = result.Resource;

====

string categoryId = "9603ca6c-9e28-4a02-9194-51cdb7fea816";
PartitionKey partitionKey = new (categoryId);

ItemResponse<Product> response = await container.ReadItemAsync<Product>("01AC0",
    partitionKey);
Product product = response.Resource;
string eTag = response.ETag;

product.price = 50d;

ItemRequestOptions options = new ItemRequestOptions { IfMatchEtag = eTag };


await container.UpsertItemAsync<Product>(product, partitionKey, requestOptions:
options);

====

CosmosClientOptions options = new ()


{
AllowBulkExecution = true
};

List<Product> productsToInsert = GetOurProductsFromSomeWhere();

List<Task> concurrentTasks = new List<Task>();

foreach(Product product in productsToInsert)
{
    concurrentTasks.Add(
        container.CreateItemAsync<Product>(
            product,
            new PartitionKey(product.partitionKeyValue))
    );
}

await Task.WhenAll(concurrentTasks);

When we invoke Task.WhenAll, the SDK will kick in to create batches to group our
operations by physical partition, then distribute the requests to run concurrently.
Grouping operations greatly improves efficiency by reducing the number of back-end
requests, and allowing batches to be dispatched to different physical partitions in
parallel. It also reduces thread count on the client, making it easier to consume
more throughput than you could if the operations were done individually using
individual threads.

Review bulk operation caveats

Throughput consumption
The provisioned throughput in request units per second (RU/s) is higher than if the
operations were executed individually. This increase should be considered as you
evaluate total throughput requirements when measured against other operations that
will happen concurrently.

Latency impact
When the SDK is attempting to fill a batch and doesn't quite have enough items, it
will wait 100 milliseconds for more items. This wait can affect overall latency.

Document size
The SDK automatically creates batches for optimization with a maximum of 2 MB (or
100 operations). Smaller items can take advantage of this optimization, with
oversized items having an inverse effect.

====

One interesting caveat is that it doesn't matter what name is used for the source,
as this alias will reference the source moving forward. You can think of it as a
variable. It's not uncommon to use a single letter from the container name:

SELECT
p.name,
p.price
FROM
p

====

{
"name": "LL Bottom Bracket",
"category": "Components, Bottom Brackets",
"scannerData": {
"price": 53.99
}
}

SELECT
p.name,
p.categoryName AS category,
{ "price": p.price } AS scannerData
FROM
products p
WHERE
p.price >= 50 AND
p.price <= 100

====

SELECT DISTINCT
p.categoryName
FROM
products p

====

SELECT DISTINCT VALUE
p.categoryName
FROM
products p

[
"Components, Road Frames",
"Components, Touring Frames",
"Bikes, Touring Bikes",
"Clothing, Vests",
"Accessories, Locks",
"Components, Pedals",
...

// Developers read this as List<string>

====

First, we can use the IS_DEFINED built-in function to check if the tags property
exists at all in this item:

SELECT
IS_DEFINED(p.tags) AS tags_exist
FROM
products p

[
{
"tags_exist": false
}
]

====

We can use the IS_ARRAY built-in function to check if the tags property is an
array:

SELECT
IS_ARRAY(p.tags) AS tags_is_array
FROM
products p

We can also check if the tags property is null or not using the IS_NULL built-in
function:

SELECT
IS_NULL(p.tags) AS tags_is_null
FROM
products p

=====

SELECT
p.id,
p.price,
(p.price * 1.25) AS priceWithTax
FROM
products p
WHERE
IS_NUMBER(p.price)

SELECT
p.id,
p.price
FROM
products p
WHERE
IS_STRING(p.price)

SELECT VALUE
LOWER(p.sku)
FROM
products p

SELECT
*
FROM
products p
WHERE
p.retirementDate >= GetCurrentDateTime()

=====

"tags": [
{
"id": "2CE9DADE-DCAC-436C-9D69-B7C886A01B77",
"name": "apparel",
"class": "group"
},
{
"id": "CA170AAD-A5F6-42FF-B115-146FADD87298",
"name": "worn",
"class": "trade-in"
},
{
"id": "CA170AAD-A5F6-42FF-B115-146FADD87298",
"name": "no-damaged",
"class": "trade-in"
}
]

SELECT
p.id,
p.name,
t.name AS tag
FROM
products p
JOIN
t IN p.tags

[
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "apparel"
},
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "worn"
},
{
"id": "80D3630F-B661-4FD6-A296-CD03BB7A4A0C",
"name": "Classic Vest, L",
"tag": "no-damaged"
}
]

=====

string sql = "SELECT p.name, t.name AS tag FROM products p JOIN t IN p.tags WHERE p.price >= @lower AND p.price <= @upper";
QueryDefinition query = new (sql)
    .WithParameter("@lower", 500)
    .WithParameter("@upper", 1000);

====

QueryRequestOptions options = new()


{
MaxItemCount = 100
};

FeedIterator<Product> iterator = container.GetItemQueryIterator<Product>(query,
    requestOptions: options);

while(iterator.HasMoreResults)
{
foreach(Product product in await iterator.ReadNextAsync())
{
// Handle individual items
}
}

====

That you will want to connect to the first writable (primary) region of your
account
That you will use the default consistency level for the account with your read
requests
That you will connect directly to data nodes for requests

Gateway All requests are routed through the Azure Cosmos DB gateway as a proxy
Direct  The gateway is only used in initialization and to cache addresses for
direct connectivity to data nodes
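As a sketch of how the connection mode is chosen in code (assuming endpoint and key variables are already defined), the .NET SDK exposes it through CosmosClientOptions:

// Direct is the SDK default; Gateway routes every request through the gateway as a proxy.
CosmosClientOptions options = new()
{
    ConnectionMode = ConnectionMode.Direct // or ConnectionMode.Gateway
};

CosmosClient client = new (endpoint, key, options);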

======

Bounded Staleness
ConsistentPrefix
Eventual
Session
Strong

For a single region account, the minimum value of K and T is 10 write operations or
5 seconds. For multi-region accounts the minimum value of K and T is 100,000 write
operations or 300 seconds.

The ConsistencyLevel setting can only be used to weaken the consistency level for
reads. It cannot be strengthened or applied to writes.

CosmosClientOptions options = new CosmosClientOptions()


{
ApplicationPreferredRegions = new List<string> { "westus", "eastus" }
};

CosmosClientOptions options = new ()


{
ApplicationRegion = "westus"
};

First, the emulator's endpoint is https://localhost:<port>/ using SSL, with the
default port set to 8081.

If you use a MaxItemCount of -1, you should ensure the total response doesn't
exceed the service limit for response size. For instance, the max response size is
4 MB.

WithApplicationRegion or WithApplicationPreferredRegions Configures preferred region[s]
WithConnectionModeDirect and WithConnectionModeGateway Sets connection mode
WithConsistencyLevel Overrides consistency level
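A minimal sketch combining these fluent builder methods (assuming endpoint and key variables from configuration; the region and consistency level are illustrative):

CosmosClient client = new CosmosClientBuilder(endpoint, key)
    .WithApplicationRegion(Regions.WestUS)
    .WithConnectionModeDirect()
    .WithConsistencyLevel(ConsistencyLevel.Eventual)
    .Build();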

=====

public class LogHandler : RequestHandler
{
    public override async Task<ResponseMessage> SendAsync(RequestMessage request,
        CancellationToken cancellationToken)
    {
        Console.WriteLine($"[{request.Method.Method}]\t{request.RequestUri}");

        ResponseMessage response = await base.SendAsync(request, cancellationToken);

        Console.WriteLine($"[{Convert.ToInt32(response.StatusCode)}]\t{response.StatusCode}");

        return response;
    }
}

CosmosClientBuilder builder = new (endpoint, key);


builder.AddCustomHandlers(new LogHandler());

CosmosClient client = builder.Build();

====

All data in Azure Cosmos DB SQL API containers is indexed by default. This occurs
because the container includes a default indexing policy that’s applied to all
newly created containers. The default indexing policy consists of the following
settings:

The inverted index is updated for all create, update, or delete operations on an
item
All properties for every item are automatically indexed
Range indexes are used for all strings or numbers

Indexing mode   Configures whether indexing is enabled (Consistent) or not (None) for the container   Default: Consistent
Automatic       Configures whether items are automatically indexed as they are written                Default: Enabled
Included paths  Set of paths to include in the index                                                   Default: All (*)
Excluded paths  Set of paths to exclude from the index                                                 Default: _etag property path

{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
]
}

=====

The consistent indexing mode updates the index synchronously as you perform
individual operations that modify an item (create, update, or delete). This
indexing mode will be the standard choice for most containers to ensure the index
is updated as items change.

The none indexing mode completely disables indexing on a container. This indexing
mode is a scenario-specific mode where the indexing operation is either unnecessary
or could impact the solution's overall performance. Two examples include:

A bulk operation to create, update, or delete multiple documents may benefit from
disabling indexing during the bulk execution period. Once the bulk operations are
complete, the indexing mode can be switched back to consistent.
Solutions that use containers as a pure key-value store only perform point-read
operations. These containers do not benefit from the secondary indexes created by
running the indexer.
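As a minimal sketch of the second scenario (a pure key-value container), indexing can be disabled when the container is created. The container name and partition key path here are placeholders, and an existing Database instance named database is assumed; note that Automatic must be false when the mode is None.

IndexingPolicy policy = new ()
{
    IndexingMode = IndexingMode.None,
    Automatic = false
};

ContainerProperties properties = new ("keyvaluestore", "/pk")
{
    IndexingPolicy = policy
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);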

====
Three primary operators are used when defining a property path:

The ? operator indicates that a path terminates with a string or number (scalar)
value
The [] operator indicates that this path includes an array and avoids having to
specify an array index value
The * operator is a wildcard and matches any element beyond the current path
Using these operators, you can create a few example property path expressions for
the example JSON item:

Path expression     Description
/*                  All properties
/name/?             The scalar value of the name property
/category/*         All properties under the category property
/metadata/sku/?     The scalar value of the metadata.sku property
/tags/[]/name/?     Within the tags array, the scalar values of all possible name properties

=====

Included path: /category/name/?

Excluded path: /category/*

The exclude path excludes all possible properties within the category path, however
the include path is more precise and specifically includes the category.name
property. The result is that all properties within category are not indexed, with
the sole exception being the category.name property.

Indexing policies must include the root path and all possible values (/*) as either
an included or excluded path. More customizations exist as a level of precision
beyond that base. This leads to two fundamental indexing strategies you will see in
many examples.

====

IndexingPolicy policy = new ()


{
IndexingMode = IndexingMode.Consistent,
Automatic = true
};

policy.ExcludedPaths.Add(
new ExcludedPath{ Path = "/*" }
);

policy.IncludedPaths.Add(
new IncludedPath{ Path = "/name/?" }
);
policy.IncludedPaths.Add(
new IncludedPath{ Path = "/categoryName/?" }
);

ContainerProperties options = new ()


{
Id = "products",
PartitionKeyPath = "/categoryId",
IndexingPolicy = policy
};
Container container = await database.CreateContainerIfNotExistsAsync(options,
throughput: 400);

====

SELECT * FROM products p ORDER BY p.price ASC, p.name ASC


The composite index that will support this query must exactly match the sequence of
the properties in the ORDER BY clause. A composite index of (name ASC, price ASC)
would not work in this example. Composite indexes such as these will work in this
scenario:

(price ASC, name ASC)

(price DESC, name DESC)

You can also use composite indexes with queries that have different permutations of
filters and order by clauses.
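A short sketch of defining the supporting composite index through the .NET SDK (the IndexingPolicy would then be assigned to the ContainerProperties used to create the container):

using System.Collections.ObjectModel;

IndexingPolicy policy = new ();
policy.CompositeIndexes.Add(new Collection<CompositePath>
{
    new CompositePath { Path = "/price", Order = CompositePathSortOrder.Ascending },
    new CompositePath { Path = "/name", Order = CompositePathSortOrder.Ascending }
});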

=====

For example, to create a composite index of (name ASC, price DESC), you can define
a JSON object with this structure:

{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/_etag/?"
}
],
"compositeIndexes": [
[
{
"path": "/name",
"order": "ascending"
},
{
"path": "/price",
"order": "descending"
}
]
]
}

=====

ChangesHandler<Product> changeHandlerDelegate = async (
    IReadOnlyCollection<Product> changes,
    CancellationToken cancellationToken
) => {
    foreach(Product product in changes)
    {
        await Console.Out.WriteLineAsync($"Detected Operation:\t[{product.id}]\t{product.name}");
        // Do something with each change
    }
};

Container sourceContainer = client.GetContainer("cosmicworks", "products");

Container leaseContainer = client.GetContainer("cosmicworks", "productslease");

var builder = sourceContainer.GetChangeFeedProcessorBuilder<Product>(


processorName: "productItemProcessor",
onChangesDelegate: changeHandlerDelegate
);

ChangeFeedProcessor processor = builder


.WithInstanceName("desktopApplication")
.WithLeaseContainer(leaseContainer)
.Build();

WithInstanceName Name of host instance
WithStartTime Set the pointer (in time) to start looking for changes after
WithLeaseContainer Configures the lease container
WithErrorNotification Assigns a delegate to handle errors during execution
WithMaxItems Quantifies the max number of items in each batch
WithPollInterval Sets the delay when the processor will poll the change feed for
new changes

await processor.StartAsync();

======

ChangesEstimationHandler changeEstimationDelegate = async (


long estimation,
CancellationToken cancellationToken
) => {
// Do something with the estimation
};

ChangeFeedProcessor estimator = sourceContainer.GetChangeFeedEstimatorBuilder(


processorName: "productItemEstimator",
estimationDelegate: changeEstimationDelegate)
.WithLeaseContainer(leaseContainer)
.Build();

====

An Azure Cognitive Search instance is comprised of a few core components:

Indexes that contain JSON documents that are searchable
Indexers to crawl data from various data sources and insert them into indexes
Data Sources that connect Azure Cognitive Search to various data platforms

The first step is to create a data source. The data source points to somewhere
where data is stored. With Azure Cosmos DB SQL API, the data source is a reference
to an existing account with the following parameters configured:

Parameter Value
Connection string Connection string for Azure Cosmos DB account
Database Name of target database
Collection Name of target container
Query A SQL query to select items to be indexed

=====

Once the data source is configured, an index should be created that would be the
target of the indexing operation. The index contains, at a minimum, a name and a
key. The key refers to a unique identifier field for each JSON document in the
index.

Each field in the index should be configured to enable or disable features when
searching. These optional features allow extra search functionality on specific
fields when it makes sense. For each field, you must configure whether the field
is:

Feature Description
Retrievable Configures the field to be projected in search result sets
Filterable Accepts OData-style filtering on the field
Sortable Enables sorting using the field
Facetable Allows field to be dynamically aggregated and grouped
Searchable Allows search queries to match terms in the field

=====

The final step is to configure the indexer’s name and schedule. The schedule
determines how often the indexer will run to pull data from the data source and
populate the index with JSON documents.

=====

The default query for the Azure Cosmos DB SQL API data source is the following SQL
query.

SELECT
*
FROM
c
WHERE
c._ts >= @HighWaterMark
ORDER BY
c._ts
This query only finds items whose timestamp (_ts property) is greater than or equal
to a built-in high watermark field. The high watermark field comes from a built-in
change detection policy that attempts to identify whether an item has been changed
or not.

To accomplish change detection, the indexer will index all items returned by the
query. It will then store a timestamp as the high watermark. The next indexer run
will then index items with a timestamp greater than or equal to the stored high
watermark. This strategy effectively indexes all items that have been created or
changed since the last run.

If your SQL query sorts the items to index using the timestamp, then Azure
Cognitive Search can implement incremental progress during indexing. If the indexer
fails for a transient reason, sorting the timestamps will allow the indexer to
resume indexing from the failure point instead of reindexing the entire container
again. This setting must be enabled when configuring the data source.
======

If an item is deleted from a container in Azure Cosmos DB SQL API, that item may
not be deleted from the index in Azure Cognitive Search. To enable tracking of
deleted items, you must configure a policy to track when an item is deleted.

Azure Cognitive Search exclusively supports tracking deleted items through a
combination of a field and value in a soft-delete scenario. For example, consider
this JSON document that has a property named _isDeleted with a value of true.

{
"id": "E08E4507-9666-411B-AAC4-519C00596B0A",
"categoryId": "86F3CBAB-97A7-4D01-BABB-ADEFFFAED6B4",
"sku": "TI-R092",
"name": "LL Road Tire",
"_isDeleted": true
}
Using the soft-delete policy, the softDeleteColumnName for the data source (Azure
Cosmos DB SQL API) would be configured as _isDeleted. The softDeleteMarkerValue
would then be set to true. Using this strategy, Azure Cognitive Search will remove
items that have been soft-deleted from the container.

======

When should you embed data?


Embed data in a document when the following criteria apply to your data:

Read or updated together: Data that's read or updated together is nearly always
modeled as a single document. This is especially true because our objective for our
NoSQL model is to reduce the number of requests to our database. In our scenario,
all of the customer entities are read or written together.
1:1 relationship: For example, Customer and CustomerPassword have a 1:1
relationship.
1:Few relationship: In a NoSQL database, it's necessary to distinguish 1:Many
relationships as bounded or unbounded. Customer and CustomerAddress have a bounded
1:Many relationship because customers in an e-commerce application normally have
only a handful of addresses to ship to. When the relationship is bounded, this is a
1:Few relationship.
When should you reference data?
Reference data as separate documents when the following criteria apply to your
data:

Read or updated independently: This is especially true where combining entities
would result in large documents. Updates in Azure Cosmos DB require the entire
item to be replaced. If a document has a few properties that are frequently updated
alongside a large number of mostly static properties, it's much more efficient to
split the document into two. One document then contains the smaller set of
properties that are updated frequently. The other document contains the static,
unchanging values.

1:Many relationship: This is especially true if the relationship is unbounded.
Azure Cosmos DB has a maximum document size of 2 MB. So in situations where the
1:Many relationship is unbounded or can grow extremely large, data should be
referenced, not embedded.
Many:Many relationship: We'll explore an example of this relationship in a later
unit with product tags.

Separating these properties reduces throughput consumption for more efficiency. It
also reduces latency for better performance.
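As an illustrative sketch of the two patterns (hypothetical class names), the bounded 1:Few address data is embedded in the customer document, while the unbounded 1:Many sales-order data is referenced by id and stored as separate documents:

public class Customer
{
    public string id { get; set; }
    public string name { get; set; }
    // Embedded: bounded 1:Few relationship, read and written together with the customer
    public List<CustomerAddress> addresses { get; set; }
}

public class CustomerAddress
{
    public string street { get; set; }
    public string city { get; set; }
}

public class SalesOrder
{
    public string id { get; set; }
    // Referenced: unbounded 1:Many relationship, each order is its own document
    public string customerId { get; set; }
    public List<SalesOrderItem> items { get; set; }
}

public class SalesOrderItem
{
    public string sku { get; set; }
    public int quantity { get; set; }
}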

=====

In Azure Cosmos DB, you increase storage and throughput by adding more physical
partitions to access and store data. The maximum storage size of a physical
partition is 50 GB, and the maximum throughput is 10,000 RU/s.

The maximum size for a logical partition is 20 GB. Using a partition key with high
cardinality allows you to avoid this 20-GB limit by spreading your data across a
larger number of logical partitions.

=====

It's important to understand the access patterns for your application to ensure
that requests are spread as evenly as possible across partition key values. When
throughput is provisioned for a container in Azure Cosmos DB, it's allocated evenly
across all the physical partitions within a container.

As an example, if you have a container with 30,000 RU/s, this workload is spread
across the three physical partitions for the same six tenants mentioned earlier. So
each physical partition gets 10,000 RU/s. If tenant D consumes all of its 10,000
RU/s, it will be rate limited because it can't consume the throughput allocated to
the other partitions. This results in poor performance for tenants C and D, and
leaves unused compute capacity in the other physical partitions and remaining
tenants. Ultimately, this partition key results in a database design where the
application workload can't scale.

=====

When you're choosing a partition key, you also need to consider whether the data is
read heavy or write heavy. You should seek to distribute write-heavy requests with
a partition key that has high cardinality.

For read-heavy workloads, you should ensure that queries are processed by one or a
limited number of physical partitions by including a WHERE clause with an equality
filter on the partition key, or an IN operator on a subset of partition key values
in your queries.
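A minimal sketch of such a single-partition query (assuming, as in the surrounding example, that the customer's id is the partition key; the value is a placeholder):

QueryDefinition query = new QueryDefinition(
    "SELECT * FROM c WHERE c.id = @id")
    .WithParameter("@id", "80D3630F-B661-4FD6-A296-CD03BB7A4A0C");

FeedIterator<Customer> iterator = container.GetItemQueryIterator<Customer>(query);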

A query that filters on a different property, such as favoriteColor, would "fan
out" to all partitions in the container. This is also known as a cross-partition
query. Such a query will perform as expected when the container is small and
occupies only a single partition. However, as the container grows and there are an
increasing number of physical partitions, this query will become slower and more
expensive because it will need to check every partition to get the results, whether
the physical partition contains data related to the query or not.

You might worry here that making the id the partition key means that we'll have as
many logical partitions as there are customers, with each logical partition
containing only a single document. Millions of customers would result in millions
of logical partitions.

But this is perfectly fine! Logical partitions are a virtual concept, and there's
no limit to how many logical partitions you can have. Azure Cosmos DB will
collocate multiple logical partitions on the same physical partition. As logical
partitions grow in number or in size, Cosmos DB will move them to new physical
partitions when needed.

====

One thing you might have noticed with the salesOrder container is that it shares
the same partition key as the customer container. The customer container has a
partition key of ID and salesOrder has a partition key of customerId. When data
share a partition key and have similar access patterns, they're candidates for
being stored in the same container. As a NoSQL database, Azure Cosmos DB is schema
agnostic, so mixing entities with different schema is not only possible but, under
these conditions, it's also another best practice. But to combine the data from
these two containers, you'll need to make more changes to your schema.

First, you need to add a customerId property to each customer document. Customers
will now have the same value for ID and customerId. Next, you need a way to
distinguish a sales order from a customer in the container. So you'll add a
discriminator property you'll call type that has a value of customer and salesOrder
for each entity.

With these changes, you can now store both the customer data and sales order data
in your new customer container. Each customer is in its own logical partition and
will have one customer item with all its sales orders. For your second operation
here, you now have a query you can run to list all sales orders for a customer.
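A minimal sketch of that query, using the type discriminator and the shared customerId partition key (names as assumed in this section; customerId is a variable holding the customer's identifier):

QueryDefinition query = new QueryDefinition(
    "SELECT * FROM c WHERE c.customerId = @customerId AND c.type = 'salesOrder'")
    .WithParameter("@customerId", customerId);

FeedIterator<SalesOrder> iterator = container.GetItemQueryIterator<SalesOrder>(query);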

====

Before your new model is complete, one last operation to look at is to query your
top 10 customers by the number of sales orders. In your current model, you first do
a group by on each customer and sum for sales orders in your customer container.
You then sort in descending order and take the top 10 results. Even though
customers and sales orders sit in the same container, this type of query is not
something you can currently do.

The solution here is to denormalize the aggregate value in a new property,
salesOrderCount, in the customer document. You can get the data you want by using
this property in a query such as the one shown in the following diagram:

Now, every time a customer creates a new sales order and a new sales order is
inserted into your customer container, you need a way to update the customer
document and increment the salesOrderCount property by one. To do this, you need a
transaction. Azure Cosmos DB supports transactions when the data sits within the
same logical partition.

Because customers and sales orders reside in the same logical partition, you can
insert the new sales order and update the customer document within a transaction.
There are two ways to implement transactions in Azure Cosmos DB: by using stored
procedures or by using a feature called transactional batch, which is available in
both .NET and Java SDKs.
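A minimal sketch of that transaction using the .NET SDK's transactional batch (assuming an existing customer object and a newSalesOrder to insert, that the customer's id doubles as the customerId partition key value, and that the Customer class carries the salesOrderCount property introduced above):

customer.salesOrderCount++;

PartitionKey partitionKey = new (customer.id);

TransactionalBatchResponse response = await container
    .CreateTransactionalBatch(partitionKey)
    .CreateItem<SalesOrder>(newSalesOrder)
    .ReplaceItem<Customer>(customer.id, customer)
    .ExecuteAsync();

if (!response.IsSuccessStatusCode)
{
    // Neither operation was committed; handle or retry here.
}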

=======

CosmosClient client = new CosmosClientBuilder(connectionString)


.WithApplicationRegion(Regions.UKSouth)
.Build();
====

CosmosClientOptions options = new ()


{
ApplicationRegion = Regions.UKSouth
};
CosmosClient client = new (connectionString, options);

====

CosmosClientOptions options = new ()


{
ApplicationRegion = "UK South"
};
CosmosClient client = new (connectionString, options);

====

List<string> regions = new()


{
"East Asia",
"South Africa North",
"West US"
};

CosmosClientOptions options = new ()


{
ApplicationPreferredRegions = regions
};
CosmosClient client = new (connectionString, options);

=====

CosmosClient client = new CosmosClientBuilder(connectionString)
    .WithApplicationPreferredRegions(
        new List<string>
        {
            Regions.EastAsia,
            Regions.SouthAfricaNorth,
            Regions.WestUS
        }
    )
    .Build();

====

You must enable automatic failover before a failover can occur.

Consistency Level   Description
Strong              Linear consistency. Data is replicated and committed in all configured regions before being acknowledged as committed and visible to all clients.
Bounded Staleness   Reads lag behind writes by a configured threshold in time or items.
Session             Within a specific session (SDK instance), users can read their own writes.
Consistent Prefix   Reads may lag behind writes, but reads will never appear out of order.
Eventual            Reads will eventually be consistent with writes.

Bounded staleness provides strong consistency guarantees within the region in which
data is written.

Session consistency is a great option for applications where the end users may be
confused if they cannot immediately see any transaction they just made.

Consistent Prefix consistency is ideal for applications where the order of read
operations matters more than the latency.

Eventual consistency is a good option for applications that don't require any
linear or consistency guarantees.

Each new Azure Cosmos DB account has a default consistency level of Session. In the
Azure portal, the Default consistency pane is used to configure a new default
consistency level for the entire account.

=====

The consistency level can only be relaxed on a per-request basis, not strengthened.

ItemRequestOptions options = new()


{
ConsistencyLevel = ConsistencyLevel.Eventual
};
Product item = await container.ReadItemAsync<Product>(id, partitionKey,
requestOptions: options);

CosmosClientOptions options = new()


{
ConsistencyLevel = ConsistencyLevel.Eventual
};
CosmosClient client = new (endpoint, key, options);

=====

ItemResponse<Product> response = await container.CreateItemAsync<Product>(item);
string token = response.Headers.Session;

ItemRequestOptions options = new()
{
    SessionToken = token
};
ItemResponse<Product> readResponse = await container.ReadItemAsync<Product>(id,
    partitionKey, requestOptions: options);

Session tokens can be manually pulled out of a client and used on another client to
preserve a session between multiple clients.

=====

Enabling the ability to write to any region is a turnkey operation that doesn’t
interrupt the application’s availability.

Strong consistency is not supported in a multi-region write scenario.

=======

Type Description
Insert This conflict occurs when more than one item is inserted simultaneously
with the same unique identifier in multiple regions
Replace Replace conflicts occur when multiple client applications update the
same item concurrently in separate regions
Delete Delete conflicts occur when a client is attempting to update an item
that has been deleted in another region at the same time

The default conflict resolution policy in Azure Cosmos DB is Last Write Wins. This
policy uses the timestamp (_ts) to determine which item wrote last. In simple
terms, if multiple items are in conflict, the item with the largest value for the
_ts property will win. In the case of a delete conflict, the operation to delete an
item will always win out over other operations.

Database database = client.GetDatabase("cosmicworks");

ContainerProperties properties = new("products", "/categoryId")
{
    ConflictResolutionPolicy = new ConflictResolutionPolicy()
    {
        Mode = ConflictResolutionMode.LastWriterWins,
        ResolutionPath = "/metadata/sortableTimestamp",
    }
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);

You can only set a conflict resolution policy on newly created containers.

======

A custom resolution policy will use a stored procedure to resolve conflicts between
items in different regions. All custom stored procedures must be implemented with
the following JavaScript function signature.

function <function-name>(incomingItem, existingItem, isTombstone, conflictingItems)


Each of these four parameters is required in the function:

Parameter Description
existingItem The item that is already committed
incomingItem The item that's being inserted or updated that generated the
conflict
isTombstone Boolean indicating if the incoming item was previously deleted
conflictingItems Array of all committed items in the container that conflicts with
incomingItem

Database database = client.GetDatabase(databaseName);

ContainerProperties properties = new(containerName, partitionKey)
{
    ConflictResolutionPolicy = new ConflictResolutionPolicy()
    {
        Mode = ConflictResolutionMode.Custom,
        ResolutionProcedure =
            $"dbs/{databaseName}/colls/{containerName}/sprocs/{sprocName}",
    }
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);

StoredProcedureProperties sprocProperties = new (sprocName,
    File.ReadAllText(@"code.js"));

await container.Scripts.CreateStoredProcedureAsync(sprocProperties);

====

Alternatively, a custom conflict resolution policy can be configured without a
stored procedure. In this scenario, conflicts are written to a conflicts feed. Your
application code can then manually resolve conflicts in the feed.

Database database = client.GetDatabase("cosmicworks");

ContainerProperties properties = new("products", "/categoryId")


{
ConflictResolutionPolicy = new ConflictResolutionPolicy()
{
Mode = ConflictResolutionMode.Custom
}
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);
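A hedged sketch of how the application might then drain that conflicts feed with the .NET SDK's Container.Conflicts APIs (the resolution logic itself is entirely application-defined, and the partition key property is assumed to be categoryId as in this section's examples):

FeedIterator<ConflictProperties> conflictIterator =
    container.Conflicts.GetConflictQueryIterator<ConflictProperties>();

while (conflictIterator.HasMoreResults)
{
    foreach (ConflictProperties conflict in await conflictIterator.ReadNextAsync())
    {
        // Inspect the conflicting version, decide which version should win,
        // write the winner back if needed, then remove the conflict entry.
        Product conflictingItem = container.Conflicts.ReadConflictContent<Product>(conflict);

        await container.Conflicts.DeleteAsync(conflict,
            new PartitionKey(conflictingItem.categoryId));
    }
}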

=====

Index seek The query engine will seek an exact match on a field’s value by
traversing directly to that value and looking up how many items match. Once the
matched items are determined, the query engine will return the items as the query
result. The RU charge is constant for the lookup. The RU charge for loading and
returning items is linear based on the number of items.
Index scan The query engine will find all possible values for a field and then
perform various comparisons only on the values. Once matches are found, the query
engine will load and return the items as the query result. The RU charge is still
constant for the lookup, with a slight increase over the index seek based on the
cardinality of the indexed properties. The RU charge for loading and returning
items is still linear based on the number of items returned.
Full scan The query engine will load the items, in their entirety, from the
transactional store to evaluate the filters. This type of scan does not use the
index; however, the RU charge for loading items is based on the number of items in
the entire container.

An index scan can range in complexity from an efficient and precise index scan, to
a more involved expanded index scan, and finally the most complex full index scan.

=====

Suppose your application is write-heavy and only ever does point reads using the id
and partition key values. In that case, you can choose to disable indexing entirely
using a customized indexing policy.

=====

By default, PopulateIndexMetrics is disabled. You should only enable this if you
are troubleshooting query performance or are unsure how to modify your indexing
policy.

QueryRequestOptions options = new()


{
PopulateIndexMetrics = true
};
FeedIterator<Product> iterator = container.GetItemQueryIterator<Product>(query,
requestOptions: options);

FeedResponse<Product> response = await iterator.ReadNextAsync();

Console.WriteLine(response.IndexMetrics);

====

while(iterator.HasMoreResults)
{
FeedResponse<Product> response = await iterator.ReadNextAsync();
foreach(Product product in response)
{
// Do something with each product
}

Console.WriteLine($"RUs:\t\t{response.RequestCharge:0.00}");

totalRUs += response.RequestCharge;
}

====

Product createdItem = response.Resource;

Console.WriteLine($"RUs:\t{response.RequestCharge:0.00}");

=====

For some workloads in Azure Cosmos DB, an integrated cache provides a great
benefit. These workloads include, but are not limited to:

Workloads with far more read operations and queries than write operations
Workloads that read large individual items multiple times
Workloads that execute queries multiple times with a large amount of RU/s
Workloads that have hot partition key[s] for read operations and queries

======

For the .NET SDK client to use the integrated cache, you must make sure that three
things are true:

The client uses the dedicated gateway connection string instead of the typical
connection string
The client is configured to use Gateway mode instead of the default Direct
connectivity mode
The client’s consistency level must be set to session or eventual

CosmosClientOptions options = new()


{
ConnectionMode = ConnectionMode.Gateway
};

CosmosClient client = new (connectionString, options);

To configure a point read operation to use the integrated cache, you must create an
object of type ItemRequestOptions. In this object, you can manually set the
ConsistencyLevel property to either ConsistencyLevel.Session or
ConsistencyLevel.Eventual. You can then use the options variable in the
ReadItemAsync method invocation.
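A minimal sketch of that point-read configuration (the item id and partition key value are placeholders reused from earlier examples):

ItemRequestOptions operationOptions = new()
{
    ConsistencyLevel = ConsistencyLevel.Eventual
};

ItemResponse<Product> response = await container.ReadItemAsync<Product>(
    "01AC0",
    new PartitionKey("9603ca6c-9e28-4a02-9194-51cdb7fea816"),
    requestOptions: operationOptions);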

QueryRequestOptions queryOptions = new()


{
ConsistencyLevel = ConsistencyLevel.Eventual
};

FeedIterator<Product> iterator = container.GetItemQueryIterator<Product>(query,
    requestOptions: queryOptions);

=====

By default, the cache will keep data for five minutes. This staleness window can be
configured using the MaxIntegratedCacheStaleness property in the SDK.

ItemRequestOptions operationOptions = new()


{
ConsistencyLevel = ConsistencyLevel.Eventual,
DedicatedGatewayRequestOptions = new()
{
MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(15)
}
};

QueryRequestOptions queryOptions = new()


{
ConsistencyLevel = ConsistencyLevel.Eventual,
DedicatedGatewayRequestOptions = new()
{
MaxIntegratedCacheStaleness = TimeSpan.FromSeconds(120)
}
};

=====

By default, Azure Monitor will display the overall throughput of all Azure Cosmos
DB operations the selected account does. To better analyze the throughput, more
granular filtering will be needed to find aggregate usage of the individual
operation types or to further compare the usage of multiple operation types at the
same time. Using the Add filter and Apply splitting options will help us with those
analyses.

Azure Monitor allows us to filter further by specific CollectionName, DatabaseName,
OperationType, Region, Status, and StatusCode. For example, we could add a filter
by operation type to see the usage of our different Azure Cosmos DB operations.

====

The SQL API log tables are:

DataPlaneRequests - This table logs back-end requests for operations that execute
create, update, delete, or retrieve data.
QueryRuntimeStatistics - This table logs query operations against the SQL API
account.
PartitionKeyStatistics - This table logs logical partition key statistics in
estimated KB. It's helpful when troubleshooting skewed storage.
PartitionKeyRUConsumption - This table logs the per-second aggregated RU/s
consumption of partition keys. It's helpful when troubleshooting hot partitions.
ControlPlaneRequests - This table logs Azure Cosmos DB account control data, for
example adding or removing regions in the replication settings.

=====

Request -> Total Requests by Status Code

That's correct. This report will allow us to compare the number of successful
requests (200) against the number of rate-limited ones (429). If roughly 1-5% of
requests hit a 429 status alongside a high number of successful requests, that
might be normal for your application; but if much more than 5% of requests are
hitting 429 conditions, there might be a problem.

PartitionKeyRUConsumption.
That's correct. This table logs the per-second aggregated RU/s consumption of
partition keys. It's helpful when troubleshooting hot partitions.

=====

Common Status codes for all types of operations

200 OK List, Get, Replace, Patch, Query The operation was successful.

Create a Document

201 Created The operation was successful.
400 Bad Request The JSON body is invalid.
403 Forbidden The operation couldn't be completed because the storage limit of
the partition has been reached.
409 Conflict The id provided for the new document has been taken by an
existing document.
413 Entity Too Large The document size in the request exceeded the allowable
document size.

List documents under the collection using ReadFeed

400 Bad Request The override set in x-ms-consistency-level is stronger than the
one set during account creation. For example, if the consistency level is Session,
the override can't be Strong or Bounded.

Get a Document

304 Not Modified The document requested wasn't modified since the specified
eTag value in the If-Match header. The service returns an empty response body.
400 Bad Request The override set in the x-ms-consistency-level header is stronger
than the one set during account creation. For example, if the consistency level is
Session, the override can't be Strong or Bounded.
404 Not Found The document is no longer a resource, that is, the document was
deleted.

Replace a Document

400 Bad Request The JSON body is invalid. Check for missing curly brackets or
quotes.
404 Not Found The document no longer exists, that is, the document was deleted.
409 Conflict The id provided for the new document has been taken by an
existing document.
413 Entity Too Large The document size in the request exceeded the allowable
document size in a request.
Patch a Document

400 Bad Request The JSON body is invalid.
412 Precondition Failed The specified pre-condition isn't met.

Delete Document

204 No Content The delete operation was successful.
404 Not Found The document is not found.

Query Documents

400 Bad Request The request has incorrect SQL syntax or is missing required
headers.

Other important status codes Azure Cosmos DB request could return

408 Request timeout The operation did not complete within the allotted amount
of time. This code is returned when a stored procedure, trigger, or UDF (within a
query) does not complete execution within the maximum execution time.
429 Too many requests The collection has exceeded the provisioned throughput
limit. Retry the request after the server specified retry after duration. For more
information, see request units.
500 Internal Server Error The operation failed because of an unexpected service
error. Contact support.
503 Service Unavailable The operation couldn't be completed because the
service was unavailable. This situation could happen because of network
connectivity or service availability issues. It's safe to retry the operation. If
the issue persists, contact support.

=====

Required ports are blocked

Verify that the following ports are enabled for the SQL API.

Connection mode   Supported protocol   Supported SDKs       API/Service port
Gateway           HTTPS                All SDKs             SQL (443)
Direct            TCP                  .NET SDK, Java SDK   When using public/service endpoints:
ports in the 10000 through 20000 range. When using private endpoints: ports in the
0 through 65535 range

======

A full backup is taken every 4 hours. Only the last two backups are stored by
default. Both the backup interval and the retention period can be configured in the
Azure portal. This configuration can be set during or after the Azure Cosmos DB
account has been created.
If Azure Cosmos DB's containers or database are deleted, the existing container and
database snapshots will be retained for 30 days.
Azure Cosmos DB backups are stored in Azure Blob storage.
Backups are stored in the current write region or if using multi-region writes to
one of the write regions to guarantee low latency.
Snapshots of the backup are replicated to another region through geo-redundant
storage (GRS). This replication provides resiliency against regional disasters.
Backups can't be accessed directly. To restore the backup, a support ticket needs
to be opened with the Azure Cosmos DB team.
Backups won't affect performance or availability. Furthermore, no RUs are consumed
during the backup process.
====

Azure Cosmos DB backups use, by default, geo-redundant blob storage that is
replicated to a paired region. This backup storage redundancy can be modified
either during or after the creation of the account. The redundancy options
available to periodic backup mode are:

Geo-redundant backup storage: The default value. Copies the backup asynchronously
across the paired region.
Zone-redundant backup storage: Copies the backup synchronously across three Azure
availability zones in the primary region.
Locally redundant backup storage: Copies the backup synchronously three times
within a single physical location in the primary region.

=====

When creating a new Azure Cosmos DB account, select Periodic under the Backup
policy tab.

Change the backup options as needed.

Backup Interval - This setting defines how often the backup is taken. It can be set
in minutes or hours. The interval period can be between 1 and 24 hours. The default
is 240 minutes.
Backup Retention - This setting defines how long the backups should be kept. It can
be set in hours or days. The retention period will be at least two times the backup
interval and 720 hours (or 30 days) at the most. The default is 8 hours.
Backup storage redundancy - One of the three redundancy options discussed in the
previous section. The default is Geo-redundant backup storage.

=====

DocumentDB Account Contributor Can manage Azure Cosmos DB accounts.
Cosmos DB Account Reader Can read Azure Cosmos DB account data.
Cosmos Backup Operator Can submit a restore request for Azure portal for a
periodic backup enabled database or a container. Can modify the backup interval and
retention on the Azure portal. Can't access any data or use Data Explorer.
CosmosRestoreOperator Can do restore action for Azure Cosmos DB account with
continuous backup mode.
Cosmos DB Operator Can provision Azure Cosmos accounts, databases, and
containers. Can't access any data or use Data Explorer.

00000000-0000-0000-0000-000000000001 Cosmos DB Built-in Data Reader
- Microsoft.DocumentDB/databaseAccounts/readMetadata
- Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read
- Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery
- Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/readChangeFeed

00000000-0000-0000-0000-000000000002 Cosmos DB Built-in Data Contributor
- Microsoft.DocumentDB/databaseAccounts/readMetadata
- Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/*
- Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/*

=====

Initialize the SDK with Azure AD

TokenCredential servicePrincipal = new ClientSecretCredential(
    "<azure-ad-tenant-id>",
    "<client-application-id>",
    "<client-application-secret>");
CosmosClient client = new CosmosClient("<account-endpoint>", servicePrincipal);

=====

Enforcing RBAC as the only authentication method

When using RBAC, you can disable the Azure Cosmos DB account primary and secondary
keys if you wish to use RBAC exclusively. Disabling the account keys can be done by
setting the disableLocalAuth property to true when creating or updating your Azure
Cosmos DB account using Azure Resource Manager templates.

======

Data Encryption keys

Always Encrypted requires that you create data encryption keys (DEK) ahead of time.
The DEKs are created client-side using the Azure Cosmos DB SDK. These DEKs are
stored in the Azure Cosmos DB service. The DEKs are defined at the database level,
so they can be shared across multiple containers. Each DEK you create can be used
to encrypt only one property, or can be used to encrypt many properties. You can
have multiple DEKs per database.

Customer-managed keys

A DEK must be wrapped by a customer-managed key (CMK) before it is stored in Azure
Cosmos DB. Since CMKs control the wrapping and unwrapping of the DEKs, they control
the access to the data that is encrypted with those DEKs. CMK storage is designed
as an extensible/plug-in model, with a default implementation that expects them to
be stored in Azure Key Vault. The relationship between these components is
displayed in the following diagram.

Encryption policies are container-level specifications describing how the JSON
properties should be encrypted. These policies are similar to indexing policies in
structure. In the current release, you must create these policies at the container
creation time and can't be updated once they're created.

For each property that you want to encrypt, the encryption policy defines:

The path of the property in the form of /property. Only top-level paths are
currently supported, nested paths such as /path/to/property aren't supported.
The ID of the DEK to use when encrypting and decrypting the property.
An encryption type. It can be either randomized or deterministic.
The encryption algorithm to use when encrypting the property. The specified
algorithm can override the algorithm defined when creating the key if they're
compatible.

You can't encrypt the ID or the container's partition key.

Randomized vs. deterministic encryption

Because the Azure Cosmos DB service might need to support some querying
capabilities over the encrypted data, and it can't evaluate the data in plain text,
Always Encrypted has more than one encryption type. The encryption types supported
by Always Encrypted are:

Deterministic encryption: It always generates the same encrypted value for any
given plain text value and encryption configuration. Using deterministic encryption
allows queries to do equality filters on encrypted properties. However, it may
allow attackers to guess information about encrypted values by examining patterns
in the encrypted property. This is especially true if there's a small set of
possible encrypted values, such as True/False, or North/South/East/West region.

Randomized encryption: It uses a method that encrypts data in a less predictable
manner. Randomized encryption is more secure, but prevents queries from filtering
on encrypted properties.

=====

To use Always Encrypted, an instance of an EncryptionKeyStoreProvider must be
attached to your Azure Cosmos DB SDK instance. This object is used to interact with
the key store hosting your CMKs. The default key store provider for Azure Key Vault
is named AzureKeyVaultKeyStoreProvider. To use the AzureKeyVaultKeyStoreProvider,
you'll need to add the Microsoft.Data.Encryption.AzureKeyVaultProvider package.

var tokenCredential = new ClientSecretCredential(
    "<aad-app-tenant-id>", "<aad-app-client-id>", "<aad-app-secret>");
var keyStoreProvider = new AzureKeyVaultKeyStoreProvider(tokenCredential);
var client = new CosmosClient("<connection-string>")
.WithEncryption(keyStoreProvider);

=====

Once we have created the CMK in the Azure Key Vault, it's time to create our DEK in
the parent database. To create this DEK, we'll use the CreateClientEncryptionKeyAsync
method and pass the following information:

A string identifier that will uniquely identify the key in the database.
The encryption algorithm intended to be used with the key. Only one algorithm is
currently supported.
The key identifier of the CMK stored in Azure Key Vault. This parameter is passed
in a generic EncryptionKeyWrapMetadata object where the name can be any friendly
name you want, and the value must be the key identifier.
The following snippet shows how we create this DEK in .NET.

var database = client.GetDatabase("my-database");


await database.CreateClientEncryptionKeyAsync(
"my-key",
DataEncryptionKeyAlgorithm.AEAD_AES_256_CBC_HMAC_SHA256,
new EncryptionKeyWrapMetadata(
keyStoreProvider.ProviderName,
"akvKey",
"https://<my-key-vault>.vault.azure.net/keys/<key>/<version>"));

=====

It's a good security practice to rotate your CMKs regularly. You should also rotate
your CMK if you suspect that the current CMK has been compromised. Once the CMK is
rotated, you provide that new CMK identifier for the DEK rewrapper to start using
it. This operation doesn't affect the encryption of your data, but the protection
of the DEK. Review the following script that rewraps the new CMK to the DEK:

await database.RewrapClientEncryptionKeyAsync(
"my-key",
new EncryptionKeyWrapMetadata(
keyStoreProvider.ProviderName,
"akvKey",
" https://<my-key-vault>.vault.azure.net/keys/<new-key>/<version>"));
=====

var path1 = new ClientEncryptionIncludedPath


{
Path = "/property1",
ClientEncryptionKeyId = "my-key",
EncryptionType = EncryptionType.Deterministic.ToString(),
EncryptionAlgorithm =
DataEncryptionKeyAlgorithm.AEAD_AES_256_CBC_HMAC_SHA256.ToString()
};
var path2 = new ClientEncryptionIncludedPath
{
Path = "/property2",
ClientEncryptionKeyId = "my-key",
EncryptionType = EncryptionType.Randomized.ToString(),
EncryptionAlgorithm =
DataEncryptionKeyAlgorithm.AEAD_AES_256_CBC_HMAC_SHA256.ToString()
};
await database.DefineContainer("my-container", "/partition-key")
.WithClientEncryptionPolicy()
.WithIncludedPath(path1)
.WithIncludedPath(path2)
.Attach()
.CreateAsync();

======

Write encrypted data


When an Azure Cosmos DB document is written, the SDK evaluates the encryption
policies to determine if any properties need to be encrypted and how to encrypt
them. If a property needs to be encrypted, it creates a base 64 string in place of
the original text.

Encryption of complex types

When the property to encrypt is a JSON array, every entry of the array is
encrypted.
When the property to encrypt is a JSON object, only the leaf values of the object
get encrypted. The intermediate subproperty names remain in plain text form.

=====

The AddParameterAsync method passes the value of the query parameter used in
queries that filter on encrypted properties. This method takes the following
arguments:

The name of the query parameter.
The value to use in the query.
The path of the encrypted property (as defined in the encryption policy).
We can see an example that uses the AddParameterAsync below:

var queryDefinition = container.CreateQueryDefinition(
    "SELECT * FROM c where c.property1 = @Property1");
await queryDefinition.AddParameterAsync(
"@Property1",
1234,
"/property1");

=====
Reading documents when only a subset of properties can be decrypted
Different document properties in the same container can use different encryption
policies. Each policy can use different CMKs to encrypt the properties. If your
client has access to some of the CMKs used to decrypt some of the properties, but
not access to other CMKs to decrypt other properties, you can still partially query
the documents with the properties you can decrypt. You should just remove those
properties from your queries, which you don't have access to their CMKs. For
example, if property1 was encrypted with key1 and property2 was encrypted with
key2, and your app only has access to key1, your query should ignore property2.
This query could look like, SELECT c.property1, c.property3 FROM c.

=====

What is the proper set of steps to set up Always Encrypted for Cosmos DB?

Create a CMK in Azure Key Vault, use the SDK to create the DEKs (with the CMK) on
a database, then create a container with the DEK and encryption policies.
That's correct. You can only set up Always Encrypted at the container's creation
time.

======

What do you need to do to use the Data Explorer with RBAC?

Data Explorer doesn't support RBAC. (This refers to the Data Explorer tab in the
portal, not the standalone Azure Cosmos DB Explorer.)
That's correct. Data Explorer doesn't support RBAC; to query your data, use Azure
Cosmos DB Explorer instead.

=====

az cosmosdb: This group contains the commands required to create and manage a new
Azure Cosmos DB account.
az cosmosdb sql: This subgroup of the az cosmosdb group contains the commands to
manage SQL API-specific resources such as databases and containers.

By default, this command will create a new account using the SQL API.

az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>'

az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>' \
--default-consistency-level 'eventual' \
--enable-free-tier 'true'

az cosmosdb create \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus'

======

az cosmosdb sql database create \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--name '<database-name>'

az cosmosdb sql container create \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--throughput '400' \
--partition-key-path '<partition-key-path-string>'

az cosmosdb sql container create \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--partition-key-path '<partition-key-path-string>' \
--idx '@.\policy.json' \
--throughput '400'

az cosmosdb sql container create \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--partition-key-path '<partition-key-path-string>' \
--idx '{\"indexingMode\":\"consistent\",\"automatic\":true,\"includedPaths\":
[{\"path\":\"/*\"}],\"excludedPaths\":[{\"path\":\"/headquarters/*\"},
{\"path\":\"/\\\"_etag\\\"/?\"}]}' \
--throughput '400'

======

az cosmosdb sql container throughput update \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--throughput '1000'

az cosmosdb sql database throughput update \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--name '<database-name>' \
--throughput '4000'

======

az cosmosdb sql container throughput migrate \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--throughput-type 'autoscale'

az cosmosdb sql container throughput update \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--max-throughput '5000'

az cosmosdb sql container throughput show \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--query 'resource.minimumThroughput' \
--output 'tsv'

az cosmosdb sql container throughput migrate \
--account-name '<account-name>' \
--resource-group '<resource-group>' \
--database-name '<database-name>' \
--name '<container-name>' \
--throughput-type 'manual'

======

az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus' failoverPriority=0 isZoneRedundant=False \
--locations regionName='westus2' failoverPriority=1 isZoneRedundant=False \
--locations regionName='centralus' failoverPriority=2 isZoneRedundant=False

az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--enable-automatic-failover 'true'

az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--enable-multiple-write-locations 'true'

To remove a region from an Azure Cosmos DB account, use the az cosmosdb update
command and specify, with the --locations argument (used one or more times), only
the locations that you want to keep. Any location that is not included in the list
will be removed from the account.

az cosmosdb update \
--name '<account-name>' \
--resource-group '<resource-group>' \
--locations regionName='eastus' failoverPriority=0 isZoneRedundant=False \
--locations regionName='westus2' failoverPriority=1 isZoneRedundant=False

======

az cosmosdb failover-priority-change \
--name '<account-name>' \
--resource-group '<resource-group>' \
--failover-policies 'eastus=0' 'centralus=1' 'westus2=2'

Even if you are not changing the priorities of every region, you must include all
regions in the failover-policies argument.

======
Changing the region with priority = 0 will trigger a manual failover for an Azure
Cosmos account.

The az cosmosdb update command is used to update the failover policies for an
account. If you use this command and change the failover priority for the region
that is already set to 0, the command will trigger a manual failover.

Region     Failover Priority
East US    0
West US 2  1

az cosmosdb failover-priority-change \
--name '<account-name>' \
--resource-group '<resource-group>' \
--failover-policies 'westus2=0' 'eastus=1'

======

Which Azure CLI command will change the maximum amount of throughput for a
container using autoscale throughput?

az cosmosdb sql container throughput update --max-throughput


That's correct. The update command with the --max-throughput argument changes the
maximum amount of throughput for a container.

======

Minimal required template

{
"$schema":
"https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"resources": [
]
}

All resources we place in this template will be JSON objects within the resources
array.

======

Final

{
"$schema":
"https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.DocumentDB/databaseAccounts",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id))]",
"location": "[resourceGroup().location]",
"properties": {
"databaseAccountOfferType": "Standard",
"locations": [
{
"locationName": "westus"
}
]
}
},
{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]"
],
"properties": {
"resource": {
"id": "cosmicworks"
}
}
},
{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks/products')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]",
"[resourceId('Microsoft.DocumentDB/databaseAccounts/sqlDatabases',
concat('csmsarm', uniqueString(resourceGroup().id)), 'cosmicworks')]"
],
"properties": {
"options": {
"throughput": 400
},
"resource": {
"id": "products",
"partitionKey": {
"paths": [
"/categoryId"
]
}
}
}
}
]
}

======

Final template in Bicep

resource Account 'Microsoft.DocumentDB/databaseAccounts@2021-05-15' = {
name: 'csmsbicep${uniqueString(resourceGroup().id)}'
location: resourceGroup().location
properties: {
databaseAccountOfferType: 'Standard'
locations: [
{
locationName: 'westus'
}
]
}
}

resource Database 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases@2021-05-15' = {
parent: Account
name: 'cosmicworks'
properties: {
options: {

}
resource: {
id: 'cosmicworks'
}
}
}

resource Container
'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2021-05-15' = {
parent: Database
name: 'customers'
properties: {
resource: {
id: 'customers'
partitionKey: {
paths: [
'/regionId'
]
}
}
}
}

=======

Argument Description
--resource-group The name of the resource group that is the target of the
deployment
--template-file The name of the file with the resources defined to deploy

az deployment group create \
--resource-group '<resource-group>' \
--template-file '.\template.json'

az deployment group create \
--resource-group '<resource-group>' \
--name '<deployment-name>' \
--template-file '.\template.json'

az deployment group create \
--resource-group '<resource-group>' \
--template-file '.\template.json' \
--parameters name='<value>'

az deployment group create \
--resource-group '<resource-group>' \
--template-file '.\template.json' \
--parameters '@.\template.json'

az deployment group create \
--resource-group '<resource-group>' \
--template-file '.\template.bicep'

======

The indexingPolicy object can be lifted with no changes and set to the
properties.resource.indexingPolicy property of the
Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers.

{
"type": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
"apiVersion": "2021-05-15",
"name": "[concat('csmsarm', uniqueString(resourceGroup().id),
'/cosmicworks/products')]",
"dependsOn": [
"[resourceId('Microsoft.DocumentDB/databaseAccounts', concat('csmsarm',
uniqueString(resourceGroup().id)))]",
"[resourceId('Microsoft.DocumentDB/databaseAccounts/sqlDatabases',
concat('csmsarm', uniqueString(resourceGroup().id)), 'cosmicworks')]"
],
"properties": {
"options": {
"throughput": 400
},
"resource": {
"id": "products",
"partitionKey": {
"paths": [
"/categoryId"
]
},
"indexingPolicy": {
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/price/*"
}
],
"excludedPaths": [
{
"path": "/*"
}
]
}
}
}
}

resource Container
'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2021-05-15' = {
parent: Database
name: 'customers'
properties: {
resource: {
id: 'customers'
partitionKey: {
paths: [
'/regionId'
]
}
indexingPolicy: {
indexingMode: 'consistent'
automatic: true
includedPaths: [
{
path: '/address/*'
}
]
excludedPaths: [
{
path: '/*'
}
]
}
}
}
}

==========

In a database, a transaction is typically defined as a sequence of point operations
grouped together into a single unit of work. It's expected that a transaction
provides ACID guarantees:

Atomicity guarantees that all the work done inside a transaction is treated as a
single unit where either all of it is committed or none of it is.
Consistency makes sure that the data is always in a healthy internal state across
transactions.
Isolation guarantees that no two transactions interfere with each other –
generally, most commercial systems provide multiple isolation levels that can be
used based on the application's needs.
Durability ensures that any change that's committed in the database will always be
present.

========

Author Stored procedures

function name() {
}

Within the function, the getContext() method retrieves a context object, which can
be used to perform multiple actions, including:

Access the HTTP response object

Access the corresponding Azure Cosmos DB SQL API container

Using the context object, you can invoke the getResponse() method to access the
HTTP response object to perform actions such as returning an HTTP OK (200) and
setting the response's body to a static string.

function greet() {
var context = getContext();
var response = context.getResponse();
response.setBody("Hello, Learn!");
}

======

Again, using the context object, you can invoke the getCollection() method to access
the container using the JavaScript query API.

function createProduct(item) {
var context = getContext();
var container = context.getCollection();
container.createDocument(
container.getSelfLink(),
item
);
}

======

This stored procedure is almost complete. While this code will run fine, it does
stand the risk of swallowing errors and potentially not returning if the stored
procedure has exceeded the timeout. We should update the code by implementing two
more changes:

Store the boolean return value of container.createDocument, and then use that to
determine if we should return from the function due to an impending server timeout.

Add a third parameter to container.createDocument to handle potential errors and
set the response of this stored procedure to the newly created item returned from
the operation.

function createProduct(item) {
var context = getContext();
var container = context.getCollection();
var accepted = container.createDocument(
container.getSelfLink(),
item,
(error, newItem) => {
if (error) throw error;
context.getResponse().setBody(newItem)
}
);
if (!accepted) return;
}

Alternatively, you can use the __ (double underscore) shortcut as an equivalent to
getContext().getCollection().

======

Transactions are deeply and natively integrated into Azure Cosmos DB SQL API’s
JavaScript programming model. Inside a JavaScript function, all operations are
automatically wrapped under a single transaction. If the function completes without
any exception, all data changes are committed. Azure Cosmos DB’s SQL API will roll
back the entire transaction if a single exception is thrown from the script.

Effectively, the start of the JavaScript function is similar to a BEGIN TRANSACTION
statement in a database system, with the end of the function scope being the
functional equivalent of COMMIT TRANSACTION. If any error is thrown, that’s the
functional equivalent of ROLLBACK TRANSACTION.

In code, this is surfaced simply by throwing any error in JavaScript:

throw new Error('Something');

======

string sproc = @"function greet() {
var context = getContext();
var response = context.getResponse();
response.setBody('Hello, Learn!');
}";

Alternatively, you can use file APIs such as System.IO.File to read a function from
a *.js file.

StoredProcedureProperties properties = new()
{
Id = "greet",
Body = sproc
};

StoredProcedureProperties properties = new("greet", sproc);

await container.Scripts.CreateStoredProcedureAsync(properties);

If you'd like to parse the results, the CreateStoredProcedureAsync method returns
an object of type Microsoft.Azure.Cosmos.Scripts.StoredProcedureResponse that
contains metadata about the newly created stored procedure within the container.
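
As a brief sketch of that, and of invoking the stored procedure afterwards (the
partition key value below is a placeholder assumption), you could write:

Microsoft.Azure.Cosmos.Scripts.StoredProcedureResponse response =
    await container.Scripts.CreateStoredProcedureAsync(properties);

// Metadata about the newly created stored procedure
string createdId = response.Resource.Id;        // "greet"
double requestCharge = response.RequestCharge;  // RUs consumed by the create

// Execute the stored procedure; the type argument matches the body set via setBody
Microsoft.Azure.Cosmos.Scripts.StoredProcedureExecuteResponse<string> result =
    await container.Scripts.ExecuteStoredProcedureAsync<string>(
        "greet",
        new PartitionKey("placeholder-partition-key"),
        null);

Console.WriteLine(result.Resource); // Hello, Learn!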

======

User-defined functions (UDFs) are used to extend the Azure Cosmos DB SQL API’s
query language grammar and implement custom business logic. UDFs can only be called
from inside queries as they enhance and extend the SQL query language

UDFs do not have access to the context object and are meant to be used as compute-
only code.

A user-defined function is defined as a JavaScript function that takes in one or
more scalar inputs and then returns a scalar value as the output.

function name(input) {
return output;
}

function addTax(preTax) {
return preTax * 1.15;
}

SELECT
p.name,
p.price,
udf.addTax(p.price) AS priceWithTax
FROM
products p
=======

string udf = @"function addTax(preTax) {
return preTax * 1.15;
}";

UserDefinedFunctionProperties properties = new()
{
Id = "addTax",
Body = udf
};
await container.Scripts.CreateUserDefinedFunctionAsync(properties);
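
As a usage sketch (the container and query names, and the dynamic handling of
results, are assumptions), the UDF can then be referenced from any query issued
through the SDK:

QueryDefinition query = new QueryDefinition(
    "SELECT p.name, p.price, udf.addTax(p.price) AS priceWithTax FROM products p");

FeedIterator<dynamic> feed = container.GetItemQueryIterator<dynamic>(query);
while (feed.HasMoreResults)
{
    foreach (var product in await feed.ReadNextAsync())
    {
        // Each result carries the original price plus the computed priceWithTax
        Console.WriteLine(product);
    }
}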

========

Triggers are defined as JavaScript functions. The function is then executed when
the trigger is invoked.

function name() {
}

Within the function, the getContext() method retrieves a context object, which can
be used to perform multiple actions, including:

Access the HTTP request object (the source of a pre-trigger)

Access the HTTP response object (the source of a post-trigger)

Access the corresponding Azure Cosmos DB SQL API container

Using the context object, you can invoke the getRequest() or getResponse() methods
to access the HTTP request and response objects. You can also invoke the
getCollection() method to access the container using the JavaScript query API.

Pre-trigger

Pre-triggers run before an operation and cannot have any input parameters. They
can perform actions such as validating the properties of an item or injecting
missing properties.

function addLabel() {
var context = getContext();
var request = context.getRequest();

var pendingItem = request.getBody();

if (!('label' in pendingItem))
pendingItem['label'] = 'new';

request.setBody(pendingItem);
}

Post-trigger

Post-triggers run after an operation has completed and can have input parameters,
although they are not required. They have access to the HTTP response message right
before it is sent to the client. They can perform actions such as updating or
creating secondary items based on changes to your original item.
Let's walk through a slightly different example with the same JSON file. This time,
a post-trigger will be used to create a second item containing a different
materialized view of our data. Our goal is to create a second item with three JSON
properties: sourceId, categoryId, and displayName.

{
"sourceId": "caab0e5e-c037-48a4-a760-140497d19452",
"categoryId": "e89a34d2-47ee-4da8-bcf6-10f552604b79",
"displayName": "Handlebar [Accessories]",
}

We are including the categoryId property because all items created within a post-
trigger must have the same logical partition key as the original item that was the
source of the trigger.

We can start our function by getting both the container and HTTP response using the
getCollection() and getResponse() methods. We will also get the newly created item
using the getBody() method of the HTTP response object.

function createView() {
var context = getContext();
var container = context.getCollection();
var response = context.getResponse();

var createdItem = response.getBody();

var viewItem = {
sourceId: createdItem.id,
categoryId: createdItem.categoryId,
displayName: `${createdItem.name} [${createdItem.categoryName}]`
};

var accepted = container.createDocument(
container.getSelfLink(),
viewItem,
(error, newItem) => {
if (error) throw error;
}
);
if (!accepted) return;
}

======

Create a pre-trigger

string preTrigger = @"function addLabel() {
var context = getContext();
var request = context.getRequest();

var pendingItem = request.getBody();

if (!('label' in pendingItem))
pendingItem['label'] = 'new';

request.setBody(pendingItem);
}";

TriggerProperties properties = new()
{
Id = "addLabel",
Body = preTrigger,
TriggerOperation = TriggerOperation.Create,
TriggerType = TriggerType.Pre
};

await container.Scripts.CreateTriggerAsync(properties);

======

Create a post-trigger

string postTrigger = @"function createView() {
var context = getContext();
var container = context.getCollection();
var response = context.getResponse();

var createdItem = response.getBody();

var viewItem = {
sourceId: createdItem.id,
categoryId: createdItem.categoryId,
displayName: `${createdItem.name} [${createdItem.categoryName}]`
};

var accepted = container.createDocument(
container.getSelfLink(),
viewItem,
(error, newItem) => {
if (error) throw error;
}
);
if (!accepted) return;
}";

TriggerProperties properties = new()
{
Id = "createView",
Body = postTrigger,
TriggerOperation = TriggerOperation.Create,
TriggerType = TriggerType.Post
};

await container.Scripts.CreateTriggerAsync(properties);

=========

TriggerOperation

Name     Value  Description
All      0      Specifies all operations.
Create   1      Specifies create operations only.
Update   2      Specifies update operations only.
Delete   3      Specifies delete operations only.
Replace  4      Specifies replace operations only.

========

Prior to invoking the operation, create an object of type
Microsoft.Azure.Cosmos.ItemRequestOptions. Within that options object, configure
the PreTriggers and PostTriggers property lists to include the triggers you would
like enabled for this operation.

ItemRequestOptions options = new()
{
PreTriggers = new List<string> { "addLabel" },
PostTriggers = new List<string> { "createView" }
};

Remember, triggers are not automatically executed; they must be specified for each
database operation where you want them to execute.

await container.CreateItemAsync(newItem, requestOptions: options);

======

You have authored a user-defined function named addTax. You are writing a SQL
query to return a flat array of scalar price values with the calculated tax value.
Which valid SQL query should you use for this task?

SELECT VALUE udf.addTax(p.price) FROM products p

======

To enable the analytical store (required for Azure Synapse Link) when creating an
account, add the following argument to az cosmosdb create:

--enable-analytical-storage true

=====

Is the auto-sync replication from the transactional store to the analytical store
asynchronous or synchronous, and what are the latencies?
Auto-sync latency is usually within 2 minutes. In the case of a shared throughput
database with a large number of containers, the auto-sync latency of individual
containers can be higher and take up to 5 minutes.
