INTRODUCTION
The default Durable Functions configuration supports massive scale-out across multiple nodes in all supported hosting environments, including the "serverless" Azure Functions Consumption plan. By default, all Durable Functions deployments in Azure use the Azure Storage provider, which is a storage provider extension for the Durable Task Framework (DTFx). This provider implements the scale-out mechanisms used by Durable Functions.
TERMINOLOGY
- Durable Task Framework (DTFx): The underlying framework which is used to implement the Durable Functions triggers.
- DTFx Client: The output binding used to create, query, and manage orchestration instances. Note that in some cases the client can be invoked directly via an HTTP request to the binding extension.
- Storage Provider: A DTFx extension for implementing durable storage and messaging on top of Azure Storage.
- Task Hub: A logical grouping of storage resources for providing durability to one or more durable function applications. Orchestrator and activity functions must be configured to run in the same task hub in order for them to communicate with each other.
- Control Queue: An Azure Storage queue which is used to drive orchestrator function execution. Note that orchestrator function execution is generally non-blocking, single-threaded, and light on CPU usage. Orchestration instances have an affinity with a single control queue based on a hashing algorithm. Each control queue may process messages from any and all orchestrator functions.
- Work-Item Queue: An Azure Storage queue which drives activity function execution. Note that unlike orchestrator functions, activity function executions vary widely in the type and amount of compute resources they require. The max throughput of this queue is expected to be significantly higher than that of control queues (which are bounded by single-threaded execution).
- Worker: A compute instance. This term may be used interchangeably with VM, host, node, compute instance, or Function App instance.
- History Table: An Azure Storage Table which stores orchestration history partitioned by orchestration instance IDs.
- Scale Controller: The Azure Functions infrastructure component which monitors trigger inputs (queues, etc.) and assigns function apps to compute instances based on trigger throughput. In this context, the scale controller monitors the queues in a given task hub, which is expected to be shared across multiple triggers.
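To illustrate the control-queue affinity described above, the mapping from an orchestration instance ID to its control queue might be sketched as follows. The hash function and queue-naming scheme here are illustrative assumptions, not the exact DTFx implementation:

```python
import hashlib

def control_queue_for_instance(task_hub: str, instance_id: str,
                               partition_count: int = 4) -> str:
    """Map an orchestration instance ID to one of the task hub's control queues."""
    # A stable (non-randomized) hash keeps the affinity consistent across
    # workers and process restarts; md5 is used here purely as an example.
    digest = hashlib.md5(instance_id.encode("utf-8")).digest()
    partition = int.from_bytes(digest[:4], "little") % partition_count
    return f"{task_hub.lower()}-control-{partition:02d}"
```

Because the hash is deterministic, every message for a given instance lands on the same control queue, which is what gives an instance its single-partition affinity.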
GOALS
The following are the high-level scale goals for scale-out when using Durable Functions with the Azure Storage provider.
- Scale-out to hundreds of VMs/compute instances
- Scale-out to millions of orchestration instances
- Hundreds of combined activity and orchestrator function executions per second (when scaled out)
- Make full, efficient use of Azure Storage account throughput (up to account resource limits) under full load
SCENARIOS
The following are high-level scenarios that describe when and how scale-out applies.
Azure Consumption Plan
Customer creates an Azure Functions application that uses the Durable Functions extension to implement durable orchestrations. The default configuration uses four partitions for scaling out orchestrator function load. The Azure Functions Scale Controller detects that there are four partitions and automatically starts monitoring the four control queues and the one work-item queue for unprocessed messages.
Once the function app is woken up by the Scale Controller, the Durable Functions extension starts up on a single worker and processes queue messages from all four control queues as well as the global work-item queue. As load on these queues increases, the Scale Controller assigns additional compute instances (workers) to the function app, and the extension automatically load balances control queue processing across these nodes until there is a 1:1 ratio of compute instances to control queues. At this point, no further compute instances will be added to accommodate additional control queue load. However, if the work-item queue continues to see an increase in throughput, the Scale Controller continues adding compute instances beyond the four existing instances until the application is able to keep up with the work-item queue load.
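The scale-out behavior in this scenario can be approximated as a target-worker calculation. The function and threshold below are illustrative assumptions, not the Scale Controller's actual algorithm:

```python
def target_worker_count(control_queue_lengths, work_item_queue_length,
                        messages_per_worker=100):
    """Estimate the number of workers needed (illustrative sketch only)."""
    # Control-queue load benefits from at most one worker per partition,
    # so the count of non-empty control queues is a natural cap.
    busy_partitions = sum(1 for n in control_queue_lengths if n > 0)

    # Work-item load has no such cap: keep adding workers as backlog grows.
    # (Negative floor division is a ceiling-division idiom.)
    work_item_workers = -(-work_item_queue_length // messages_per_worker)

    return max(1, busy_partitions, work_item_workers)
```

Note how control-queue load alone can never push the worker count past the partition count, while work-item backlog can grow it without bound, matching the two phases described above.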
Azure App Service Plan
Customer creates an Azure Functions application that uses the Durable Functions extension in the Azure App Service plan with a fixed number of VMs (e.g. 10 Large/A3 VMs). The default configuration uses four partitions for scaling out orchestrator function load. The customer uses the AlwaysOn feature of App Service to ensure all VMs are always warmed up, regardless of the load. The customer may or may not be using AutoScale rules to add or remove VMs based on load - for the sake of simplicity, we assume AutoScale is not being used.
On startup, the Durable Functions extension automatically load balances the partitions across four of the VMs, even before any messages arrive. As control queue messages arrive in bursts across the four control queues, they are immediately picked up and processed concurrently by the four VMs monitoring those queues. The remaining six VMs actively wait to help process stateless work-item messages.
Adding Partitions
The customer decides that their workload requires more than four partitions for processing orchestrator functions. By updating host.json, the number of partitions is increased from 4 to 16. This doesn't involve any downtime. Durable Functions clients and workers detect the new partitions and new control queues are created automatically. Also, new orchestration instances are load balanced to the new partitions. Existing instances continue to run in their previous partitions.
Removing Partitions
The customer notices performance problems because they've added too many partitions yet are only processing orchestrator load on a single VM. To remedy this, they update host.json to drop the number of partitions down to 4. Clients detect this and start creating new instances only on the first four partitions. The remaining partitions go into "deprovisioning" mode, and instances on those partitions are automatically migrated to the first four partitions. No further processing of messages occurs on the deprovisioning partitions.
REQUIREMENTS
These requirements formally describe the key assumptions behind Durable Functions scale-out and are prioritized in support of the goals stated above.
HIGH-LEVEL REQUIREMENTS
| Category | Priority | Requirement | Notes |
|---|---|---|---|
| History | 0 | Orchestration history is equally distributed across table storage partitions | |
| Trigger | 0 | Function processing is automatically load-balanced across all available workers | |
| Trigger | 0 | Workers can be added or removed at any time and control-queue partitions are load-balanced automatically | |
| Trigger | 0 | No two workers will ever process the same control queue | |
| Trigger | 1 | The number of concurrent orchestrator function executions is configurable per node | |
| Trigger | 1 | The number of concurrent activity function executions is configurable per node | |
| Trigger | 1 | Customers can add additional partitions to the task hub with minimal downtime | |
| Trigger | 2 | Customers can remove partitions from the task hub with minimal downtime | |
| Trigger | 2 | The lease lock timeout and renewal intervals are configurable | |
| Consumption Plan | 1 | Scale controller activates a function app when one or more messages appear in either a control queue or a work item queue | |
| Consumption Plan | 1 | Scale controller adds compute instances up to the partition count as control queue load increases | |
| Consumption Plan | 1 | Scale controller adds compute instances indefinitely as work-item queue load increases | |
PARTITION MANAGEMENT
The design of partition management is influenced by the partition management approach of the Azure Event Hubs event processor host (and in fact borrows much of its implementation). Here we go over the details of partition management as they relate to Durable Functions.
- The number of partitions defaults to four (this is not a hard requirement, but represents a good starting point).
- If more partitions are needed later, the customer can update the task hub configuration in host.json with a new partition count. The host will detect this change after it has been restarted.
- The host ensures that there is one control queue for every partition within the task hub in Azure Storage and will create them if necessary.
- An Instances table exists (separate from the History table) that contains instance ID-to-partition mapping information. This table is periodically used by DTFx clients to know which queue to send control messages to. It may also be used for purposes unrelated to scale-out, like instance query.
- When DTFx clients create new instances, they randomly choose an available partition, update the Instances table, and then enqueue the start message in the appropriate control queue. In the future, we could consider a smarter load balancing algorithm than random.
- DTFx clients periodically scan the list of partitions (e.g. every 30 seconds) to learn about any changes.
- Workers take ownership of partitions using a blob lease. The blob lease contains the name of the associated queue as well as the status of the partition - e.g. "active" or "deprovisioning".
- Workers can steal partitions from other workers if the partition distribution is not balanced - e.g. if Worker A has two partitions and Worker B has none, Worker B can steal one partition from Worker A.
- All workers will try to acquire all partition leases in order to handle cases where workers lose their lease for any reason.
- If a worker acquires a "deprovisioning" lease, it will copy all messages in the deprovisioning queue into an active queue. When all messages have been removed from the deprovisioning queue for more than 60 seconds, the queue will be deleted. Waiting for 60 seconds ensures that all clients have time to learn that the deprovisioning queue should no longer be used for new orchestration instances.
- The work-item queue is stateless and global across the task hub. All work-item messages flow through this queue regardless of which orchestration instance they belong to.
SCALE CONTROLLER INTERACTION
- The scale controller will activate the function app if any visible messages exist in the work-item or control queues.
- The scale controller will add new VMs if the work-item queue length increases over time.
- The scale controller will add new VMs if both a partition's control queue length increases and the partition is on a VM that is also processing other partitions (otherwise adding VMs won't help).
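These three rules can be condensed into two predicates. The data shapes below are assumptions for illustration; the real scale controller inspects queue metrics directly:

```python
def should_activate(work_item_visible: int, control_visible: int) -> bool:
    """Rule 1: wake the function app if any visible messages exist."""
    return work_item_visible > 0 or control_visible > 0

def should_add_worker(work_item_growing: bool, partitions) -> bool:
    """Rules 2 and 3. `partitions` is a list of
    (queue_growing, owner_partition_count) pairs: whether each control
    queue's length is growing, and how many partitions its owner holds."""
    if work_item_growing:  # Rule 2: work-item backlog keeps increasing.
        return True
    # Rule 3: a growing control queue only benefits from a new VM if its
    # current owner also processes other partitions and could shed one;
    # a 1:1 owner gains nothing from extra VMs.
    return any(growing and owner_count > 1
               for growing, owner_count in partitions)
```

The `owner_count > 1` check captures why control-queue load alone never scales past the partition count: once every partition has a dedicated worker, adding VMs cannot help.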
CONCURRENCY MANAGEMENT
The owner of the function app can make the following changes in host.json:
- Number of concurrent activity functions per node
- Number of concurrent orchestrator functions per node
- Lease expiration time for partitions
- Lease renewal interval for partitions
- Lease acquire interval for partitions
- Number of partitions
All changes to host.json require all workers to be restarted.
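For illustration, a host.json fragment covering the concurrency and partition-count settings might look like the following. The setting names follow the Durable Functions host.json schema; the lease timeout, renewal, and acquire intervals are omitted here because their exact setting names depend on the extension version and should be treated as unconfirmed:

```json
{
  "durableTask": {
    "maxConcurrentActivityFunctions": 10,
    "maxConcurrentOrchestratorFunctions": 10,
    "partitionCount": 4
  }
}
```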