Steve Flowers
- Apr 22
- 4 min read

Quantify Azure Cosmos DB Per Partition Autoscale Savings

Azure Cosmos DB has a feature in preview that can translate to significant cost savings for your workload. Per-region, per-partition autoscale (aka Autoscale v2) scales each physical partition (partition key range) independently, so you do not pay for throughput that a partition never uses.

Today, Cosmos DB distributes data and compute across physical partitions, and provisioned request units (RUs) are shared evenly across them. For example, if you have 50,000 RUs provisioned and 5 physical partitions, each physical partition is allocated 10,000 RUs. If one of those partitions regularly uses all 10,000 and the other 4 regularly use less, the unused RUs are wasted, yet you are still billed for them.
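To make the arithmetic concrete, here is a minimal Python sketch of the even split described above. The per-partition peak usage figures are hypothetical and chosen only for illustration; the function name is not part of any Azure SDK.

```python
def even_split_waste(provisioned_rus, peak_usage_per_partition):
    """Return the RUs allocated to each partition and the unused RUs per partition."""
    partitions = len(peak_usage_per_partition)
    allocation = provisioned_rus / partitions  # RUs are shared evenly
    unused = [max(allocation - used, 0.0) for used in peak_usage_per_partition]
    return allocation, unused


# Hypothetical peak usage for 5 physical partitions (illustrative numbers only)
peak_usage = [10_000, 6_000, 4_000, 3_000, 2_000]
allocation, unused = even_split_waste(50_000, peak_usage)

print(allocation)   # 10000.0
print(unused)       # [0.0, 4000.0, 6000.0, 7000.0, 8000.0]
print(sum(unused))  # 25000.0
```

With these made-up numbers, half of the provisioned throughput is never used, even though only one partition is actually busy.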

In theory, your data model should distribute data evenly across physical partitions so that they all consume RUs evenly and there is little waste. In reality, however, workloads are rarely perfectly distributed. In the worst case, you have a hot partition, which likely requires a data model and partition key change. But most often, there is simply an uneven access pattern across your data.

Autoscale v2 solves these problems by scaling physical partitions independently. If one partition requires 10k RUs, instead of scaling and allocating 10k RUs to all physical partitions, only the partition which requires 10k RUs is scaled. The other partitions remain at their current level based on the needs of the application.

Autoscale v2 also helps save cost from the perspective of HA/DR. If you have additional regions configured for failover, they are no longer scaled up to meet the needs of the active region. Instead, they remain scaled lower, which saves on cost.
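The difference between the two scaling models can be sketched in a few lines of Python. This is a simplified illustration, not a billing calculator: the region names and RU figures are hypothetical, and it ignores details such as autoscale's minimum scale floor and hourly billing granularity.

```python
def uniform_scale_total(needs_by_region):
    """Current autoscale: every partition in every region follows the single highest need."""
    highest = max(ru for needs in needs_by_region.values() for ru in needs)
    partition_count = sum(len(needs) for needs in needs_by_region.values())
    return highest * partition_count


def independent_scale_total(needs_by_region):
    """Autoscale v2: each partition in each region is scaled to its own need."""
    return sum(ru for needs in needs_by_region.values() for ru in needs)


# Hypothetical RU needs per physical partition, per region
needs = {
    "primary (active)": [10_000, 6_000, 4_000],
    "secondary (failover)": [1_000, 1_000, 1_000],
}

uniform = uniform_scale_total(needs)
independent = independent_scale_total(needs)
savings = 1 - independent / uniform

print(uniform)            # 60000
print(independent)        # 23000
print(f"{savings:.1%}")   # 61.7%
```

The quieter the failover region and the more skewed the access pattern, the larger the gap between the two totals.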

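You can quantify your cost savings by using the Azure Monitor logs collected from Azure Cosmos DB. These logs are only available if you've configured the diagnostic settings for your Cosmos DB account. The queries below read from the CDBPartitionKeyRUConsumption table, so that log category needs to be included in the diagnostic setting.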

Here is a KQL query you can run in your Log Analytics workspace to quantify your saving potential:

// Table of data
let database = 'eclipse';  // Set this to your database name
let collection = 'cars_scalelogs';  // Set this to your container name
let timeSpan = 5d;  // Window of log data to analyze
// Highest RU charge recorded against any partition in the window
let maxRequestPartition = toscalar(
CDBPartitionKeyRUConsumption
| where TimeGenerated >= ago(timeSpan)
| where DatabaseName == database
| where CollectionName == collection
| summarize round(max(toreal(RequestCharge)), 2));
// Peak RU charge per physical partition, compared against that highest charge
CDBPartitionKeyRUConsumption
| where TimeGenerated >= ago(timeSpan)
| where DatabaseName == database
| where CollectionName == collection
| summarize max_RequestCharge = max(toreal(RequestCharge)) by PartitionKeyRangeId
| order by max_RequestCharge desc
| extend maxPartitionCharge = round(max_RequestCharge, 2)
| extend highestPartitionCharge = maxRequestPartition
| extend diffCharge = round(highestPartitionCharge - max_RequestCharge, 2)
| extend realDiff = round(diffCharge / highestPartitionCharge, 2)
| extend percentSavings = round(realDiff * 100, 2)
| project PartitionKeyRangeId, maxPartitionCharge, highestPartitionCharge, diffCharge, percentSavings

The first two variables in the query, database and collection, must be set for your environment. Substitute your own database name and container name. The third variable, timeSpan, is the window of log data you would like to analyze. You may only need an hour of data for a busy application, or 24 hours to a couple of days for quieter applications.

The output of this query is a table of data which shows your physical partitions:

"maxPartitionCharge" is the maximum number of RUs consumed by your application against a specific physical partition. This number is compared to "highestPartitionCharge", the highest RU charge across all physical partitions. In the current version of autoscale, all physical partitions are scaled to meet the highest RU consumption of any partition. Autoscale v2 scales these partitions independently instead.

This table tells us that physical partition 4 is the highest consumer of RUs. The "percentSavings" column shows us the percent of savings for each physical partition when using Autoscale v2.
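For readers who want to check the column math outside of Log Analytics, the following Python sketch mirrors the query's calculations on a small set of made-up rows. The partition IDs and charges are hypothetical, and the simple average at the end is an added convenience that is not part of the KQL query.

```python
def savings_table(max_charge_by_partition):
    """Recompute the query's columns from the peak RU charge of each partition."""
    highest = round(max(max_charge_by_partition.values()), 2)
    ordered = sorted(max_charge_by_partition.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    for range_id, charge in ordered:
        diff = round(highest - charge, 2)     # diffCharge
        real_diff = round(diff / highest, 2)  # realDiff
        rows.append({
            "PartitionKeyRangeId": range_id,
            "maxPartitionCharge": round(charge, 2),
            "highestPartitionCharge": highest,
            "diffCharge": diff,
            "percentSavings": round(real_diff * 100, 2),
        })
    return rows


# Hypothetical peak charges; partition 4 is the busiest, as in the article's table
sample = {"0": 300.0, "1": 450.0, "2": 150.0, "3": 540.0, "4": 600.0}
rows = savings_table(sample)

for row in rows:
    print(row["PartitionKeyRangeId"], row["percentSavings"])
# 4 0.0
# 3 10.0
# 1 25.0
# 0 50.0
# 2 75.0

# Simple average across partitions, assuming each would otherwise be scaled to the same level
overall = sum(row["percentSavings"] for row in rows) / len(rows)
print(overall)  # 32.0
```

The busiest partition always shows 0% because it is the baseline every other partition is compared against.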

You can then see the potential cost savings on a graph by applying the "render" operator to the query. In the next KQL query, we run the same query but project only the columns to plot and append the render operator to draw an area chart:

// Area chart
let database = 'eclipse';  // Set this to your database name
let collection = 'cars_scalelogs';  // Set this to your container name
let timeSpan = 5d;  // Window of log data to analyze
// Highest RU charge recorded against any partition in the window
let maxRequestPartition = toscalar(
CDBPartitionKeyRUConsumption
| where TimeGenerated >= ago(timeSpan)
| where DatabaseName == database
| where CollectionName == collection
| summarize round(max(toreal(RequestCharge)), 2));
// Peak RU charge per physical partition, plotted against that highest charge
CDBPartitionKeyRUConsumption
| where TimeGenerated >= ago(timeSpan)
| where DatabaseName == database
| where CollectionName == collection
| summarize max_RequestCharge = max(toreal(RequestCharge)) by PartitionKeyRangeId
| order by max_RequestCharge desc
| extend maxPartitionCharge = round(max_RequestCharge, 2)
| extend highestPartitionCharge = maxRequestPartition
| project PartitionKeyRangeId, maxPartitionCharge, highestPartitionCharge
| render areachart with (kind=unstacked)

The output of this query looks like this:

The blue portion of the area chart shows how autoscale provisions RUs today, based on the max RU consumption of the highest partition key range. The red area of the chart shows the actual consumption of your physical partitions. Therefore, the blue area not covered by red shows your cost savings when using Autoscale v2.

We can also display this data using a column chart, for example by replacing areachart with columnchart in the render operator.

I hope this blog post and the provided KQL queries help you quantify the savings impact for your org. The per-region, per-partition autoscale feature is in public preview as of this writing, but now is the time to make the case to the business for enabling it. You can start testing this feature right away so that you are ready to implement it when it becomes generally available.

thoughtreplica.com

Azure Data and AI insights by Steve Flowers
