Configuring Automatic Scaling

When deployed to a cloud cluster, Connect is able to dynamically scale its available resources. This page provides a detailed breakdown of the configuration options available through the autoscaler.toml configuration file.

See Running On A Cluster for details on how to use the autoscaler.toml configuration file.

An example configuration is included in the Connect package at muxer-sdk/autoscaler/data and is shown below:

autoscaler.toml
[program]
path                    = "data/muxer/muxer"                                                  # program path
mappings                = [["config/muxer_config.toml", "data/muxer/muxer_config.toml"]]      # Target/Source pair.
environment_variables   = [["RUST_LOG","info"]]                                               # Env variable pair

[program.uptime]
metric_name      = "muxer_uptime_sec"  # Metric to detect program failures.
# Time in seconds the reported uptime may stay unchanged before the program is killed.
# Please note that metrics are updated every 10 seconds.
threshold        = 30

[cluster]
location                = "uksouth"         # Location of the cluster (Azure: Region)
machine_class           = "Standard_A8_v2"  # The machine size (Azure)
disk_class              = "Standard_LRS"    # The storage type (Azure)
# location              = "eu-west-2a"      # Location of the cluster (AWS: Availability Zone)
# machine_class         = "c6i.2xlarge"     # The instance type (AWS)
# disk_class            = "gp2"             # The volume type (AWS)
# disk_size             = 80                # Disk size to provision with machine on AWS for each Muxer instance

[metrics]
timeout            = 400      # Timeout in milliseconds to wait for a metrics response. Too long a timeout will skew the timings.
allowed_timeouts   = 10       # Number of timeouts tolerated. When a timeout occurs, the Autoscaler won't evaluate the scaling rule.

[scalinglimit]
default = 10   # The starting number of program instances when the Autoscaler is initialised
max     = 30   # The maximum number of program instances at any time
min     = 1    # The minimum number of program instances at any time

# Try to keep the total headroom across all Muxers between 100 and 110
[scalingrule]
instance_capacity            = 1000          # The number of CCU each Muxer can tolerate
headroom_per_instance        = 10            # The amount of reserved capacity per Muxer instance
headroom_offset              = 50            # An additional constant reserved capacity
headroom_hysteresis          = 100           # A minimum capacity to achieve before scaling down 
despawn_threshold            = 1000          # Do not despawn if instance has more than 1000 clients.

sleep                        = 30            # The timeout before this rule triggers again
sample.period                = 1             # Period (seconds) to check for the aggregated ccu value
sample.window                = 120           # Take 120 samples
sample.aggregation           = "mean"        # The reported value is the mean over the sample window

Program

  • path - Path to the program. This should take the uploaded folder structure into consideration.

  • mappings - Maps files uploaded to the cluster into the muxer's current working directory, as target/source pairs.

  • environment_variables - Sets environment variables for the muxers.

  • uptime - Configures how failed muxers are detected via metrics. The Autoscaler tracks changes to muxer_uptime_sec, which is updated by Connect; if the value does not change for threshold seconds, the Autoscaler considers the muxer crashed and despawns it. The detection logic is sketched below.
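
In effect this is a staleness check on the metric. A minimal Python sketch of the idea, with hypothetical names (the Autoscaler's internals are not exposed; remember that metrics are only updated every 10 seconds, so threshold should comfortably exceed that):

import time

class UptimeWatcher:
    """Illustrative staleness check for a muxer's uptime metric."""

    def __init__(self, threshold_secs: float):
        self.threshold_secs = threshold_secs   # [program.uptime] threshold
        self.last_value = None
        self.last_change = time.monotonic()

    def observe(self, uptime_sec: float) -> bool:
        """Record the latest muxer_uptime_sec sample.

        Returns True when the muxer should be considered crashed,
        i.e. the metric has not changed for threshold_secs.
        """
        now = time.monotonic()
        if uptime_sec != self.last_value:
            self.last_value = uptime_sec
            self.last_change = now
        return (now - self.last_change) > self.threshold_secs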

Cluster

  • location - The availability zone that the Connect instances reside in. This must match the string passed to the hadean cluster create --availability-zone argument; if it does not match, dynamic scaling will fail. Please check the Platform SDK docs for further information.

  • machine_class - The machine type for all Connect instances (machine size on Azure, instance type on AWS). Please check the Platform SDK docs for information about supported values.

  • disk_class - The storage type for all Connect instances (storage type on Azure, volume type on AWS). Please check the Platform SDK docs for information about supported values.

  • disk_size - The size of the volume created with each instance of Connect (AWS only). Must meet the minimum size for the associated volume type.

Metrics

  • timeout - Timeout in milliseconds for metric server queries.

  • allowed_timeouts - A machine will be deallocated from the cluster once it exceeds this number of timeouts, as sketched below.
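
A minimal sketch combining the two behaviours documented here (a timeout skips rule evaluation; too many timeouts deallocate the machine). The function shape is illustrative:

def on_metrics_query(timeout_count: int, timed_out: bool, allowed_timeouts: int):
    """Returns (new_timeout_count, evaluate_rule, deallocate_machine)."""
    if timed_out:
        # Whether the count resets after a successful query is not specified
        # in this document; this sketch keeps a cumulative count.
        timeout_count += 1
        return timeout_count, False, timeout_count > allowed_timeouts
    return timeout_count, True, False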

Scaling Limit

A [scalinglimit] section must be provided. It defines the hard limits on the number of program instances managed by the Autoscaler.

It contains 3 values:

  • default is simply the initial number of program instances that will be spawned when the Autoscaler starts. Keep in mind that the Autoscaler begins applying scaling rules immediately after the first client connection - setting a large default makes sense if there are a lot of clients "waiting", but if not, the Autoscaler will begin despawning instances until capacity no longer exceeds demand.

  • max is the maximum number of instances that can exist at the same time. The Autoscaler does not explicitly support unbounded scaling, and max must always be provided (though in practice it accepts values up to 2^32 - 1).

  • min is the minimum number of instances that the Autoscaler will maintain; it will not despawn below this limit. Setting this to 0 may lead to a situation where all Connect nodes are despawned and never respawned, because clients will no longer be able to connect.

Setting these values such that default == max == min effectively disables scaling - default instances will be spawned, and the Autoscaler will never go above or below that value. In this case, the [scalingrule] section may be omitted.
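
The limits act as a clamp on whatever instance count the scaling rule asks for. A minimal sketch under that assumption (the function is illustrative, not the Autoscaler's actual code):

def clamp_instance_count(desired: int, minimum: int, maximum: int) -> int:
    """Clamp the scaling rule's desired instance count to [minimum, maximum]."""
    return max(minimum, min(maximum, desired))

assert clamp_instance_count(42, 1, 30) == 30   # capped at max
assert clamp_instance_count(0, 1, 30) == 1     # floored at min
assert clamp_instance_count(7, 10, 10) == 10   # min == max pins the count, disabling scaling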

Scaling Rule

The [scalingrule] section describes how the Autoscaler should spawn and despawn Connect instances. The Autoscaler uses concurrently connected users (CCU) to determine when it should scale up or down. Scaling rule options are described below; you may not need to include all of them in your particular case.

instance_capacity

  • instance_capacity is the practical CCU limit of a single Connect instance.

This value is application-specific. An application that is bandwidth-intensive may need to specify a smaller capacity than an application that communicates very little with its clients.

This value may be different from the limits defined in the Connect configuration. Here, the value is used as a target, rather than a hard limit. It's important to remember that the Autoscaler does not control clients connecting to Connect nodes; there may be situations where an individual Connect node needs to hold more connections than specified here (and this may happen transiently as clients connect or reconnect).

headroom_offset, headroom_per_instance, and headroom_hysteresis

Autoscaler handles scaling by trying to maintain a certain reserve capacity, or headroom, at all times (unless otherwise constrained by options specified under [scalinglimit]).

  • headroom_offset - Autoscaler will ensure that at least this capacity is available regardless of the current number of Connect nodes.

  • headroom_per_instance - Autoscaler will additionally reserve this amount of headroom for every Connect instance that is running.

  • headroom_hysteresis is used to provide an additional "buffer" when downscaling (to prevent the Autoscaler immediately scaling up again) - it is an additional amount of headroom that must be available in order to despawn a Connect instance. It gives the same sharp hysteresis curve as a Schmitt trigger (https://en.wikipedia.org/wiki/Schmitt_trigger).

These three parameters can be treated as the gradient and intercept of a line equation, with a hysteresis term applied when scaling down:

$$H_t = H_m M + H_c + H_w'$$

$$H_w' = \begin{cases} H_w & \text{on the way down} \\ 0 & \text{otherwise} \end{cases}$$

where H_t is the minimum amount of headroom - or 'free seats' - that triggers a Connect node to spawn or despawn when crossed; H_m is headroom_per_instance; M is the number of Connect nodes that are 'ready' or 'provisioning' (for a despawn decision, the count after the proposed operation); H_c is headroom_offset; and H_w is headroom_hysteresis.

For a simple worked example, using the values below with one starting Connect instance:

[scalingrule]
instance_capacity       = 1000
headroom_per_instance   = 50
headroom_offset         = 100
headroom_hysteresis     = 10
despawn_threshold       = 20

  • With one Connect instance, the desired headroom is 150 (headroom_per_instance * 1 + headroom_offset). This means 850 clients can connect without a scale up (instance_capacity - 150 = 850).

  • Once the 851st client connects, the capacity of the cluster is less than the desired headroom, so a new instance is spawned.

  • With two instances, the desired headroom is (headroom_per_instance * 2 instances) + headroom_offset = 200. Since there is now a second instance adding 1000 capacity, the next scale-up would be when the 1801st client joins.

  • With three instances, the desired headroom is 250 and hence 2750 clients can be connected simultaneously.

To scale down again:

  • Autoscaler considers what the capacity at the end of the scale down operation will be to make its decision.

  • To scale down from 3 to 2 instances, the desired headroom at the end should be (headroom_per_instance * 2 instances) + headroom_offset + headroom_hysteresis = 210. This equates to there being less than 1790 clients connected.

Note that we don't scale down when below 2740 (as we might expect from substituting 3 instances into the formula above): if the capacity of each instance is only 1000, then deprovisioning from 3 instances to 2 at 2739 clients would leave 739 clients orphaned (and would also trigger an immediate scale up, as there would be no capacity remaining!).
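
The arithmetic above can be reproduced with a short Python sketch of the rule (function names and exact comparison operators are illustrative; the per-node despawn_threshold guard is covered below):

# Values from the worked example above.
CAPACITY = 1000   # instance_capacity

def required_headroom(nodes: int, scaling_down: bool = False) -> int:
    """H_t = H_m * M + H_c + H_w', with H_w' = H_w only when scaling down."""
    per_instance, offset, hysteresis = 50, 100, 10
    return per_instance * nodes + offset + (hysteresis if scaling_down else 0)

def should_spawn(nodes: int, ccu: int) -> bool:
    # Spawn when the current headroom drops below the threshold.
    return nodes * CAPACITY - ccu < required_headroom(nodes)

def should_despawn(nodes: int, ccu: int) -> bool:
    # Despawn is evaluated against the state *after* removing one node.
    remaining = nodes - 1
    return remaining * CAPACITY - ccu > required_headroom(remaining, scaling_down=True)

assert not should_spawn(1, 850) and should_spawn(1, 851)     # 851st client triggers a spawn
assert not should_spawn(2, 1800) and should_spawn(2, 1801)   # 1801st client triggers a spawn
assert should_despawn(3, 1789) and not should_despawn(3, 1790)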

Autoscaler doesn't control which Connect node a client connects to. Although the Autoscaler selects the node with the fewest connected clients for despawning, despawning may still disconnect a number of clients from the Connect network; they will need to handle reconnection to the surviving instances themselves. This behaviour can be modified with the despawn_threshold parameter: if it is set to 0, the Autoscaler will not despawn any node that has connected clients.
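
As a sketch of that guard (the function name is hypothetical), a node is only a candidate for despawning when its own client count is at or below the threshold:

def node_eligible_for_despawn(node_ccu: int, despawn_threshold: int) -> bool:
    # With despawn_threshold = 0, any node with connected clients is protected.
    return node_ccu <= despawn_threshold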

Sleep

The sleep value causes the rule to be ignored if less than sleep seconds have elapsed since the last time the rule caused a scale up or scale down to occur.

The two main reasons for setting this are:

  • On scale up, spawning a new Virtual Machine in the cluster takes time. We don't want to keep triggering the rule whilst waiting for the VM to become ready.

  • On scale down, removing an instance may disconnect a large number of clients that will need to reconnect. We need to wait for the reconnections to settle before making further scaling decisions.

If a scale up or down is refused because a scaling limit has been hit, the sleep timer is not triggered.
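
A minimal sketch of this cooldown, assuming the Autoscaler simply records the time of the last successful scaling action (names are illustrative):

import time

class RuleCooldown:
    """Skip rule evaluation within sleep_secs of the last scaling action."""

    def __init__(self, sleep_secs: float):
        self.sleep_secs = sleep_secs           # [scalingrule] sleep
        self.last_action = float("-inf")

    def ready(self) -> bool:
        return time.monotonic() - self.last_action >= self.sleep_secs

    def record_action(self) -> None:
        # Called only when a spawn/despawn actually happens; an attempt
        # refused by [scalinglimit] does not reset the timer.
        self.last_action = time.monotonic()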

Sample Window

The final section defines a sampling window to take metrics over a longer span of time, and how the values should be aggregated.

  • sample.period (seconds) The sampling period for the scaling logic

  • sample.window defines how many of these values should be kept in a sliding window

  • sample.aggregation defines how the sliding window as a whole is mathematically reduced to a single value - this value is the one used for controlling the Autoscaler. This is described in a little more detail below.

If you don't anticipate particularly spiky demand, setting sample.window to 1 effectively disables this feature; with a single-sample window, max, min, mean, and median all return the latest value:

# Query metrics every 10 seconds, perform no aggregation
sample.period = 10
sample.window = 1
sample.aggregation = 'max'

The available aggregation methods are:

  • max - [Recommended] take the highest value in the window (e.g. base the decisions of the Autoscaler on the highest CCU over the last 2 minutes)

  • min - take the minimum value over the timeframe

  • mean - use the mean average

  • median - use the median average

As an example, the below configuration will check for metrics once a minute, and use the maximum sampled value for CCU over the last 5 minutes to decide on whether to scale up or down.

sample.period = 60
sample.window = 5
sample.aggregation = 'max'

For completeness, range (the difference between min and max) and sum (which adds all values) are also supported aggregation methods, but these have limited utility with CCU and the current rule definition format.
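
For a concrete picture of the mechanism, here is a minimal sketch of the sliding window, assuming one sample arrives per sample.period (class and method names are illustrative):

import statistics
from collections import deque

AGGREGATIONS = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "range": lambda xs: max(xs) - min(xs),
    "sum": sum,
}

class SampleWindow:
    """Sliding window of CCU samples, reduced to a single value per evaluation."""

    def __init__(self, window: int, aggregation: str):
        self.samples = deque(maxlen=window)          # sample.window
        self.aggregate = AGGREGATIONS[aggregation]   # sample.aggregation

    def push(self, ccu: int) -> float:
        # Called once per sample.period; the returned value is the one
        # the scaling rule acts on.
        self.samples.append(ccu)
        return self.aggregate(self.samples)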

The Autoscaler does not move connected clients when scaling up or down.
