Partitioning in Azure Cosmos DB

Service Definition

Azure Cosmos DB is a globally distributed, massively scalable, multi-model database service.

  • Multi-model data with your favorite API
  • Elastically scale storage and throughput
  • Multiple, well-defined consistency levels
  • <10ms latency guarantees at the 99th percentile
  • Industry-leading SLAs covering performance, latency, availability, and throughput

System Topology

The Azure Cosmos DB service is deployed worldwide, including in the sovereign clouds (Azure Germany, Microsoft Azure China operated by 21Vianet Group) and the government cloud (Azure U.S. government). Microsoft deploys and manages the service, which is available in all Azure regions, on stamps of machines, each with dedicated local SSDs.

The Cosmos DB service is layered on top of Azure Service Fabric, Azure's foundational distributed systems infrastructure. It uses Service Fabric for naming, routing, cluster and container management, coordination of rolling upgrades, failure detection, leader election (within a resource partition), and load balancing.

Azure Cosmos DB is deployed across one or more Service Fabric clusters, each potentially spanning multiple hardware generations and a varying number of machines (currently between 60 and 800). Machines within a cluster are typically spread across 10 to 20 fault domains. Resource partitions are a logical concept. Physically, a resource partition is implemented as a group of replicas, called a replica set.

Each machine hosts replicas belonging to different resource partitions within a fixed set of processes. The replicas of the resource partitions are placed and load balanced across these machines. Each replica hosts an instance of Azure Cosmos DB's schema-agnostic database engine, which manages both the resources and the associated indexes.
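To make the placement idea concrete, here is a toy sketch of spreading the replicas of a replica set across machines. It is not how Service Fabric actually places replicas. The function name, the machine and fault domain names, and the rule "one replica per fault domain, least-loaded machine first" are all assumptions made for illustration.

```python
def place_replica_set(machines, load, replica_count):
    """Toy placement: choose the least-loaded machines, one per fault domain.

    machines: dict mapping machine name -> fault domain (hypothetical names)
    load:     dict mapping machine name -> number of replicas already hosted
    """
    chosen = []
    used_domains = set()
    # Visit machines from least to most loaded (name breaks ties).
    for machine in sorted(machines, key=lambda m: (load[m], m)):
        domain = machines[machine]
        if domain in used_domains:
            continue  # assumption: never two replicas in one fault domain
        chosen.append(machine)
        used_domains.add(domain)
        if len(chosen) == replica_count:
            break
    if len(chosen) < replica_count:
        raise ValueError("not enough fault domains for the replica set")
    for machine in chosen:
        load[machine] += 1  # record the newly hosted replica
    return chosen


machines = {"m1": "fd1", "m2": "fd1", "m3": "fd2", "m4": "fd3"}
load = {name: 0 for name in machines}

first = place_replica_set(machines, load, replica_count=3)
second = place_replica_set(machines, load, replica_count=3)
print(first, second)
```

The second replica set lands on m2 instead of m1 because m1 already hosts a replica, which is the load balancing effect described above in miniature.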

The Cosmos DB database engine, in turn, consists of several components, including implementations of multiple coordination primitives, a JavaScript language runtime, the query processor, and the storage and indexing subsystems, which are responsible for transactional storage and indexing of data, respectively. To provide durability and high availability, the database engine persists its index on SSDs and replicates it among the database engine instances within the replica set.

While the index is always persisted on local SSDs, the log is persisted either locally, on another machine within the cluster, or remotely across clusters or data centers within a region. The ability to dynamically configure the proximity between the database engine (compute) and the log (storage) at the granularity of a resource partition's replicas is crucial for letting tenants dynamically select different service tiers.

Partitioning

To meet the performance needs of your application, Azure Cosmos DB uses partitioning to scale individual containers in a database. With partitioning, the items in a container are divided into distinct subsets called logical partitions. Logical partitions are formed based on the value of the partition key associated with each item in the container. All items in a logical partition have the same partition key value.
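As a rough illustration of that grouping, the sketch below splits a handful of in-memory items into logical partitions by their partition key value. It does not call Azure Cosmos DB. The item shape, the `city` property, and the function name are made up for the example.

```python
from collections import defaultdict


def group_into_logical_partitions(items, partition_key):
    """Group items by the value of their partition key property."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    return dict(partitions)


# Hypothetical items; "city" plays the role of the partition key.
items = [
    {"id": "1", "city": "Paris", "name": "Alice"},
    {"id": "2", "city": "Tunis", "name": "Bob"},
    {"id": "3", "city": "Paris", "name": "Chloe"},
]

logical_partitions = group_into_logical_partitions(items, "city")
for key_value, members in logical_partitions.items():
    print(key_value, [item["id"] for item in members])
```

Every item with `city` equal to "Paris" ends up in the same logical partition, which is exactly the property described above.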

Partition ranges can be subdivided dynamically, so the database grows seamlessly with your application while remaining highly available. Azure Cosmos DB takes care of partition management entirely, so as a developer you don't need to write code for it or manage your partitions yourself.
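One way to picture a range split is the simplified sketch below. It assumes, purely for illustration, that partition key values are hashed into a numeric space and that each range owns a contiguous slice of it. The MD5-based hash, the 32-bit hash space, and the `RangeMap` class are stand-ins, not the service's real internals.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32  # size of the toy hash space (an assumption)


def hash_key(partition_key_value):
    """Map a partition key value to a stable position in the hash space."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class RangeMap:
    """Tracks contiguous hash ranges by their exclusive upper bounds."""

    def __init__(self):
        # Start with a single range that covers the whole hash space.
        self.upper_bounds = [HASH_SPACE]

    def range_for(self, partition_key_value):
        """Return the index of the range that owns this key value."""
        position = hash_key(partition_key_value)
        return bisect.bisect_right(self.upper_bounds, position)

    def split(self, index):
        """Split the range at `index` into two halves at its midpoint."""
        lower = self.upper_bounds[index - 1] if index > 0 else 0
        upper = self.upper_bounds[index]
        self.upper_bounds.insert(index, (lower + upper) // 2)


ranges = RangeMap()
keys = ["Paris", "Tunis", "Cairo", "Lagos"]
before = {key: ranges.range_for(key) for key in keys}
ranges.split(0)  # the container has grown, so divide the only range
after = {key: ranges.range_for(key) for key in keys}
print(before, after)
```

After the split, every key still maps to exactly one range and the application code that looks items up by partition key does not change, which is the point of the service handling this for you.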

Selecting a partition key is an important decision that will affect the performance of your application. When selecting a partition key, consider the following details:

  • A single logical partition has a 10 GB upper storage limit.
  • Partitioned containers have a minimum throughput of 400 request units per second (RU/s). Requests to the same partition key cannot exceed the throughput allocated to a partition. If they do, the requests are rate limited. So it is important to pick a partition key that doesn't create "hot spots" in your application.
  • Choose a partition key that spreads the workload evenly across all partitions and evenly over time. Your choice of partition key should balance the need for efficient partition queries and transactions against the goal of distributing items across multiple partitions to achieve scalability.
  • Choose a partition key that has a wide range of values and access patterns that are spread evenly across logical partitions. This helps spread the data and the activity in your container across the set of logical partitions, so that resources for data storage and throughput can be distributed across them.
  • Good partition key candidates often include properties that appear frequently as a filter in your queries. Including the partition key in the filter predicate lets queries be routed efficiently.
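To see why hot spots matter, here is a small simulation of one second of traffic against two candidate partition keys. It is a deliberate simplification: it gives every partition key value its own fixed RU budget, whereas the real service allocates throughput per partition. The `tenantId` and `userId` properties, the 5 RU request cost, and the 100 RU/s budget are all made-up numbers.

```python
from collections import Counter


def simulate_one_second(requests, partition_key, ru_per_partition, ru_per_request=5):
    """Return (served, rate_limited) counts for one second of requests."""
    consumed = Counter()  # RUs used so far, per partition key value
    served = 0
    rate_limited = 0
    for request in requests:
        key_value = request[partition_key]
        if consumed[key_value] + ru_per_request <= ru_per_partition:
            consumed[key_value] += ru_per_request
            served += 1
        else:
            rate_limited += 1  # budget for this partition is exhausted
    return served, rate_limited


# 100 hypothetical requests: one tenant, but a different user each time.
requests = [{"tenantId": "contoso", "userId": f"user-{i}"} for i in range(100)]

hot = simulate_one_second(requests, "tenantId", ru_per_partition=100)
even = simulate_one_second(requests, "userId", ru_per_partition=100)
print("tenantId:", hot)   # (20, 80)
print("userId:", even)    # (100, 0)
```

With `tenantId` as the key, all traffic hits a single partition and most requests are rate limited. With `userId`, the same traffic spreads out and nothing is throttled. Running a similar count over a sample of your own data is a cheap way to compare candidate keys before committing to one.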

Wrap up

Partitioning is one of the key concepts in Azure Cosmos DB. It is used to scale individual containers in a database to meet your application's performance needs.

Hi, I'm Maher, Development Technologies MVP. I'm blogging about ASP.NET Core and Microsoft Azure.