  • Steve Flowers

Reporting on Cosmos DB resiliency across Azure subscriptions




Understanding and implementing proper resiliency in Azure Cosmos DB can ensure your application remains available during an outage. There are several considerations for resiliency, some of which depend on the use case. In this post I will focus on use cases leveraging the global availability of Cosmos DB.


So, what are the configurations that make Cosmos DB resilient?


To understand the implications of these configurations, please read the docs in the links provided. Once we understand what each configuration does for us regarding availability, the next step is to track whether we are meeting the requirements for that availability in our organization.


We can use Azure Resource Graph to determine the state of the Cosmos DB accounts across the subscriptions we have access to.


Azure Resource Graph is an Azure service designed to extend Azure Resource Management by providing efficient and performant resource exploration with the ability to query at scale across a given set of subscriptions so that you can effectively govern your environment.

Below is an Azure Resource Graph query to help us understand availability across our Cosmos DB accounts. This query can be used ad hoc or programmatically via .NET, Java, JavaScript, Python, PowerShell, and so on.



resources
| where type =~ "microsoft.documentdb/databaseaccounts"
| extend ZoneRedundant = properties.locations[0].isZoneRedundant
| extend MultiWrite = properties.enableMultipleWriteLocations
| extend WriteLocations = array_length(properties.writeLocations)
| extend ReadLocations = array_length(properties.readLocations)
| extend AutomaticFailover = properties.enableAutomaticFailover
| extend ConsistencyLevel = properties.consistencyPolicy.defaultConsistencyLevel
| extend BackupPolicy = properties.backupPolicy.type
| extend CosmosAccount = name
| project id, tenantId, subscriptionId, resourceGroup, 
    CosmosAccount, kind, location, ZoneRedundant, MultiWrite, 
    WriteLocations, ReadLocations, AutomaticFailover, ConsistencyLevel, BackupPolicy
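
For programmatic use, here is a minimal Python sketch of how a query like the one above might be run. It assumes the azure-identity and azure-mgmt-resourcegraph packages are installed and that DefaultAzureCredential can find a login. The function name, the shortened query, and the placeholder subscription ID are illustrative and not part of the original post.

```python
# Shortened version of the availability query, for illustration only.
QUERY = """
resources
| where type =~ "microsoft.documentdb/databaseaccounts"
| extend ZoneRedundant = properties.locations[0].isZoneRedundant
| extend WriteLocations = array_length(properties.writeLocations)
| extend ReadLocations = array_length(properties.readLocations)
| extend AutomaticFailover = properties.enableAutomaticFailover
| project id, subscriptionId, name, ZoneRedundant, WriteLocations, ReadLocations, AutomaticFailover
"""


def run_resource_graph_query(query, subscription_ids):
    """Run a Resource Graph query and return all rows as a list of dicts."""
    # Imported here so this module still loads where the Azure SDK is missing.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resourcegraph import ResourceGraphClient
    from azure.mgmt.resourcegraph.models import QueryRequest, QueryRequestOptions

    client = ResourceGraphClient(DefaultAzureCredential())
    rows = []
    skip_token = None
    while True:
        request = QueryRequest(
            subscriptions=subscription_ids,
            query=query,
            options=QueryRequestOptions(
                result_format="objectArray",  # each row comes back as a dict
                skip_token=skip_token,        # None on the first page
            ),
        )
        response = client.resources(request)
        rows.extend(response.data)
        skip_token = response.skip_token
        if not skip_token:
            return rows


if __name__ == "__main__":
    # Placeholder subscription ID: replace with your own.
    for row in run_resource_graph_query(QUERY, ["<subscription-id>"]):
        print(row["name"], row["ReadLocations"], row["AutomaticFailover"])
```

The loop follows the skip token so that results spanning more than one page are not silently truncated.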

Two items of note in the query: we count read locations because we should typically have two or more, and we count write locations because certain use cases call for two or more. It is important to understand that when using Strong consistency, we should have 3 write regions, since Strong consistency requires a global quorum. During an outage, you cannot reach a quorum if only two regions are configured and one of them has failed.
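
The quorum argument can be checked with a little arithmetic. The Python sketch below assumes a simple majority quorum (more than half of the configured regions). That is a simplified model for illustration, not a description of Cosmos DB's actual replication protocol, and the function names are hypothetical.

```python
def majority_quorum(total_regions):
    """Smallest number of regions that forms a strict majority."""
    return total_regions // 2 + 1


def has_quorum_after_failures(total_regions, failed_regions):
    """True if the surviving regions still form a majority of all configured regions."""
    surviving = total_regions - failed_regions
    return surviving >= majority_quorum(total_regions)


if __name__ == "__main__":
    for total in (2, 3):
        print(total, "regions, 1 failed:", has_quorum_after_failures(total, 1))
```

With two regions the majority is 2, so losing one region leaves 1 and the quorum is lost. With three regions the majority is still 2, so losing one region leaves exactly enough.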


Here is an example of extending the query to include logic for a customer who wants to simply apply the same rules across all of their Cosmos DB accounts and report whether accounts are compliant (true|false):



resources
| where type =~ "microsoft.documentdb/databaseaccounts"
| extend ZoneRedundant = properties.locations[0].isZoneRedundant
| extend MultiWrite = properties.enableMultipleWriteLocations
| extend WriteLocations = array_length(properties.writeLocations)
| extend ReadLocations = array_length(properties.readLocations)
| extend AutomaticFailover = properties.enableAutomaticFailover
| extend ConsistencyLevel = properties.consistencyPolicy.defaultConsistencyLevel
| extend BackupPolicy = properties.backupPolicy.type
| extend CosmosAccount = name
| extend IsStrongAndMultiWrite = 
    iff(ConsistencyLevel == 'Strong', 
        iff(WriteLocations > 2, 
            'true' , 
            'false'), 
        'NA')
| extend Compliant = 
    iff(ZoneRedundant == true, 
        iff(ReadLocations >= 2, 
            iff(AutomaticFailover == true, 
                iff(IsStrongAndMultiWrite == 'true', 
                    'true', 
                    iff(IsStrongAndMultiWrite == 'false', 
                        'false',
                        iff(IsStrongAndMultiWrite == 'NA', 
                            'true', 
                            'err'))), 
                'false'), 
            'false'), 
        'false')
| project id, tenantId, subscriptionId, resourceGroup, 
CosmosAccount, kind, location, Compliant, ZoneRedundant, MultiWrite, 
WriteLocations, ReadLocations, AutomaticFailover, ConsistencyLevel, BackupPolicy, IsStrongAndMultiWrite

In this case, we identify accounts that use Strong consistency and check whether they have 3 or more write regions. A nested iff expression then evaluates each configuration in turn and returns a simple true or false for compliance.
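
Because nested iff expressions are hard to read, it can help to mirror the rule in ordinary code and test it against a few sample rows. The Python sketch below is one reading of the query's logic. It assumes each row is a dict keyed by the projected column names, and the function names are hypothetical.

```python
def strong_and_multi_write(consistency_level, write_locations):
    """Mirror of IsStrongAndMultiWrite: 'NA' unless the account uses Strong consistency."""
    if consistency_level != "Strong":
        return "NA"
    return "true" if write_locations > 2 else "false"


def compliant(row):
    """Mirror of the Compliant column for one result row."""
    basics_ok = (
        row.get("ZoneRedundant") is True
        and (row.get("ReadLocations") or 0) >= 2
        and row.get("AutomaticFailover") is True
    )
    if not basics_ok:
        return "false"
    strong_check = strong_and_multi_write(
        row.get("ConsistencyLevel"), row.get("WriteLocations") or 0
    )
    # 'true' and 'NA' both pass; only a failed Strong check makes the account non-compliant.
    return "false" if strong_check == "false" else "true"


if __name__ == "__main__":
    sample = {
        "ZoneRedundant": True,
        "ReadLocations": 2,
        "WriteLocations": 1,
        "AutomaticFailover": True,
        "ConsistencyLevel": "Session",
    }
    print(compliant(sample))
```

The 'err' branch of the query has no counterpart here because IsStrongAndMultiWrite can only take the values 'true', 'false', or 'NA'.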


This will help you track availability across the accounts in your tenant. By calling this query programmatically and storing the results in a database, your data lake, or a Power BI dataset, you can report on compliance over time.
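
As a rough illustration of reporting over time, the Python sketch below stores each run of the query as a timestamped snapshot in SQLite and summarizes compliance per snapshot. It assumes the rows have already been fetched as dicts containing the id, CosmosAccount, and Compliant columns. The table name, schema, and function names are made up for this example, and SQLite stands in for whatever store you actually use.

```python
import sqlite3
from datetime import datetime, timezone


def store_snapshot(conn, rows, captured_at=None):
    """Insert one snapshot of query results and return the number of rows stored."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cosmos_compliance (
               captured_at    TEXT NOT NULL,
               account_id     TEXT NOT NULL,
               cosmos_account TEXT,
               compliant      TEXT,
               PRIMARY KEY (captured_at, account_id))"""
    )
    conn.executemany(
        "INSERT INTO cosmos_compliance VALUES (?, ?, ?, ?)",
        [(captured_at, r["id"], r.get("CosmosAccount"), r.get("Compliant")) for r in rows],
    )
    conn.commit()
    return len(rows)


def compliance_over_time(conn):
    """Return (captured_at, compliant_count, total_count) for each snapshot, oldest first."""
    cursor = conn.execute(
        """SELECT captured_at, SUM(compliant = 'true'), COUNT(*)
           FROM cosmos_compliance
           GROUP BY captured_at
           ORDER BY captured_at"""
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect("cosmos_compliance.db")
    store_snapshot(connection, [{"id": "example-id", "CosmosAccount": "example", "Compliant": "true"}])
    for snapshot in compliance_over_time(connection):
        print(snapshot)
```

Keeping one row per account per snapshot, rather than overwriting the latest state, is what makes trend reporting possible later.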


Thanks for reading and I hope this helps you gain confidence in your Cosmos DB configurations for availability.
