Configuring Automatic Scaling
When deployed to a cloud cluster, Connect is able to dynamically scale its available resources. This page provides a detailed breakdown of the configuration options available through the autoscaler.toml configuration file.
An example configuration is included in the Connect package at muxer-sdk/autoscaler/data and is shown below:
path = "data/muxer/muxer" # program path
mappings = [["config/muxer_config.toml", "data/muxer/muxer_config.toml"]] # Target/Source pair.
environment_variables = [["RUST_LOG","info"]] # Env variable pair
metric_name = "muxer_uptime_sec" # Metric to detect program failures.
# Time in seconds the uptime metric may remain unchanged before killing the program.
# Please note that metrics are updated every 10 seconds.
threshold = 30
location = "uksouth" # Location of the cluster (Azure: Region)
machine_class = "Standard_A8_v2" # The machine size (Azure)
disk_class = "Standard_LRS" # The storage type (Azure)
# location = "eu-west-2a" # Location of the cluster (AWS: Availability Zone)
# machine_class = "c6i.2xlarge" # The instance type (AWS)
# disk_class = "gp2" # The volume type (AWS)
# disk_size = 80 # Disk size to provision with machine on AWS for each Muxer instance
timeout = 400 # Timeout in milliseconds to wait for a metrics response. Too long will skew the timings.
allowed_timeouts = 10 # How many timeouts are allowed. When a timeout occurs the Autoscaler won't evaluate the scaling rule.
default = 10 # The starting number of program instances when the Autoscaler is initialised
max = 30 # The maximum number of program instances the Autoscaler will manage
min = 1 # The minimum number of program instances the Autoscaler will maintain
# Try to keep the total headroom across all Muxers between 100 and 110
instance_capacity = 1000 # The number of CCU each Muxer can tolerate
headroom_per_instance = 10 # The amount of reserved capacity per Muxer instance
headroom_offset = 50 # An additional constant reserved capacity
headroom_hysteresis = 100 # A minimum capacity to achieve before scaling down
despawn_threshold = 1000 # Do not despawn if instance has more than 1000 clients.
sleep = 30 # The timeout before this rule triggers again
sample.period = 1 # Period (seconds) to check for the aggregated ccu value
sample.window = 120 # Take 120 samples
sample.aggregation = "mean" # The reported value is the mean over the sample window
path - Path to the program; this should take into consideration the folder that is uploaded.
mappings - This maps the files uploaded to the cluster into the muxer's current working directory.
environment_variables - This sets environment variables of the muxer process.
uptime - Configures the parameters for detecting failed muxers via metrics. This is achieved by tracking changes to muxer_uptime_sec, which is updated by Connect; if it is not updated for threshold seconds, the Autoscaler considers the muxer crashed and de-spawns it.
On AWS:
location - The Availability Zone that the Connect instances reside in. This must match the string passed to the hadean cluster create --availability-zone argument; if it doesn't, dynamic scaling will fail. Please check the Platform SDK docs for further information.
machine_class - The instance type on AWS for all Connect instances. Please check the Platform SDK docs for information about supported values.
disk_class - The volume type on AWS for all Connect instances. Please check the Platform SDK docs for information about supported values.
disk_size - The size of the volume created with each instance of Connect. Must meet the minimum value for the associated volume type.
On Azure:
location - The Region that the Connect instances reside in. This must match the string passed to the hadean cluster create --location argument; if it doesn't, dynamic scaling will fail. Please check the Platform SDK docs for further information.
machine_class - The machine size for all Connect instances. Please check the Platform SDK docs for information about supported values.
disk_class - The storage type for all Connect instances. Please check the Platform SDK docs for information about supported values.
disk_size - Not applicable for Azure.
timeout - Timeout in milliseconds for metric server queries.
allowed_timeouts - A machine will be deallocated from the cluster if this threshold of timeouts is exceeded.
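The interaction of these two settings can be sketched as follows. This is a hypothetical helper, not the Autoscaler's actual code, and it assumes the timeout counter resets on a successful query (the documentation does not state this):

```python
ALLOWED_TIMEOUTS = 10  # allowed_timeouts from autoscaler.toml


def handle_metric_query(timed_out, timeouts_so_far):
    """Return (new_timeout_count, action) for one metrics query.

    On a timeout the scaling rule is skipped; once the allowed
    threshold is exceeded the machine is deallocated.
    """
    if not timed_out:
        # Assumption: a successful response clears the timeout count.
        return 0, "evaluate_rule"
    timeouts = timeouts_so_far + 1
    if timeouts > ALLOWED_TIMEOUTS:
        return timeouts, "deallocate_machine"
    return timeouts, "skip_rule"


print(handle_metric_query(False, 3))   # (0, 'evaluate_rule')
print(handle_metric_query(True, 10))   # (11, 'deallocate_machine')
```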
The [scalinglimit] section must be provided. It defines the hard limits on the number of program instances managed by the Autoscaler.
It contains 3 values:
default is simply the initial number of program instances that will be spawned when the Autoscaler starts. It is worth keeping in mind that the Autoscaler will begin applying scaling rules immediately after the first client connection - setting a large default makes sense if there are a lot of clients "waiting", but if not, the Autoscaler will begin despawning instances until the capacity does not exceed demand.
max is the maximum number of instances that can exist at the same time. The Autoscaler does not explicitly support unbounded scaling, and max must always be provided (but in practice it accepts values up to 2^32 - 1).
min is the minimum number of instances that the Autoscaler will maintain; it will not despawn below this limit. Setting this to 0 may cause a situation where all Connect nodes are despawned (and never respawned, because clients won't be able to connect).
Setting these values such that default == max == min effectively disables scaling - default instances will be spawned, and the Autoscaler will never go above or below that value. In this case, the [scalingrule] section may be omitted.
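For example, to pin a deployment at a fixed size (five is an illustrative value, not a recommendation), all three limits can be set equal:

```toml
[scalinglimit]
default = 5  # spawn five instances at start-up
max = 5      # never scale above five
min = 5      # never scale below five
# With default == max == min, the [scalingrule] section may be omitted.
```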
The [scalingrule] section describes how the Autoscaler should spawn and despawn Connect instances. The Autoscaler uses concurrently connected users (CCU) to determine when it should scale up or down. Scaling rule options are described below; you may not need to include all of them in your particular case.
instance_capacity is the practical CCU limit of Connect.
This value is application-specific. An application that is bandwidth-intensive may need to specify a smaller capacity than an application that communicates very little with its clients.
This value may be different from the limits defined in the Connect configuration. Here, the value is used as a target rather than a hard limit. It's important to remember that the Autoscaler does not control clients connecting to Connect nodes; there may be situations where an individual Connect node needs to hold more connections than specified here (and this may happen transiently as clients connect or reconnect).
headroom_offset, headroom_per_instance, and headroom_hysteresis
The Autoscaler handles scaling by trying to maintain a certain reserve capacity, or headroom, at all times (unless otherwise constrained by the limits specified under [scalinglimit]).
headroom_offset - The Autoscaler will ensure that at least this capacity is available regardless of the current number of Connect nodes.
headroom_per_instance - The Autoscaler will additionally reserve this amount of headroom for every Connect instance that is running.
headroom_hysteresis - Provides an additional "buffer" when downscaling (to prevent the Autoscaler immediately scaling up again): it is an additional amount of headroom that must be available in order to despawn a Connect instance. It has the same sharp hysteresis curve as a Schmitt trigger (https://en.wikipedia.org/wiki/Schmitt_trigger).
These three parameters can be treated as the gradient and intercept of a line equation,

H_t = H_m * M + H_c (+ H_w when evaluating a scale down)

where H_t is the minimum amount of headroom - or 'free seats' - which is the threshold triggering a Connect node to spawn or despawn when crossed. H_m is headroom_per_instance, M is the number of Connect nodes currently 'ready' or 'provisioning', H_c is headroom_offset, and H_w is headroom_hysteresis.
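Expressed as code (a sketch; the function name is hypothetical and the parameter names mirror the config keys), the headroom target for M instances is:

```python
def headroom_target(instances, headroom_per_instance, headroom_offset,
                    headroom_hysteresis, scaling_down=False):
    """H_t = H_m * M + H_c, with H_w added when evaluating a scale down."""
    target = headroom_per_instance * instances + headroom_offset
    if scaling_down:
        target += headroom_hysteresis
    return target


# With headroom_per_instance = 50 and headroom_offset = 100, one
# instance gives a target of 50 * 1 + 100 = 150; evaluating a scale
# down to two instances adds the hysteresis: 50 * 2 + 100 + 10 = 210.
print(headroom_target(1, 50, 100, 10))                     # 150
print(headroom_target(2, 50, 100, 10, scaling_down=True))  # 210
```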
For a simple worked example, using the values below with one starting Connect:
instance_capacity = 1000
headroom_per_instance = 50
headroom_offset = 100
headroom_hysteresis = 10
despawn_threshold = 20
- With one Connect instance, the desired headroom is (headroom_per_instance × 1 instance) + headroom_offset = 150. This means 850 clients can connect without a scale up.
- Once the 851st client connects, the free capacity of the cluster is less than the desired headroom, so a new instance is spawned.
- With two instances, the desired headroom is (headroom_per_instance × 2 instances) + headroom_offset = 200. Since there is now a second instance adding 1000 capacity, the next scale-up would be when the 1801st client joins.
- With three instances, the desired headroom is 250 and hence 2750 clients can be connected simultaneously.
To scale down again:
- Autoscaler considers what the capacity at the end of the scale down operation will be to make its decision.
- To scale down from 3 to 2 instances, the desired headroom at the end should be (headroom_per_instance × 2 instances) + headroom_offset + headroom_hysteresis = 210. This equates to there being fewer than 1790 clients connected.
Note that we don't scale down at fewer than 2740 clients (as we might expect from substituting 3 instances into the above formula): if the capacity of each instance is only 1000, then deprovisioning from 3 instances to 2 at 2739 clients would leave 739 clients orphaned (and would also result in an immediate scale up, as we'd have 0 capacity remaining!).
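The worked example above can be checked with a short script (a sketch with hypothetical helper names, not the Autoscaler's actual code). It derives the client counts at which scale-up and scale-down trigger for the example values:

```python
INSTANCE_CAPACITY = 1000
HEADROOM_PER_INSTANCE = 50
HEADROOM_OFFSET = 100
HEADROOM_HYSTERESIS = 10


def scale_up_at(instances):
    """First CCU at which free capacity drops below the headroom target."""
    target = HEADROOM_PER_INSTANCE * instances + HEADROOM_OFFSET
    return INSTANCE_CAPACITY * instances - target + 1


def scale_down_below(instances):
    """CCU must be below this value to despawn from `instances` to one fewer."""
    remaining = instances - 1
    target = (HEADROOM_PER_INSTANCE * remaining + HEADROOM_OFFSET
              + HEADROOM_HYSTERESIS)
    return INSTANCE_CAPACITY * remaining - target


print(scale_up_at(1))       # 851  -> the 851st client triggers a second instance
print(scale_up_at(2))       # 1801 -> the 1801st client triggers a third
print(scale_down_below(3))  # 1790 -> scale down 3 -> 2 only below 1790 clients
```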
The Autoscaler doesn't control which Connect node a client connects to. Although the Autoscaler will select the node with the fewest connected clients, despawning may result in a number of clients being disconnected from the Connect network; they will need to handle reconnection to the surviving instances themselves. This behaviour can be modified via the despawn_threshold parameter: if it is set to 0, the Autoscaler will not despawn any node that has connected clients.
The sleep value causes the rule to be ignored if fewer than sleep seconds have elapsed since the last time the rule caused a scale up or scale down to occur.
The main two reasons for setting this are:
- On scale up, spawning a new Virtual Machine in the cluster takes time. We don't want to keep triggering the rule whilst waiting for the VM to become ready.
- On scale down, removing an instance may disconnect a large number of clients that will need to reconnect. We need to wait for the reconnections to settle before making further scaling decisions.
If scaling up or down fails because a scaling limit has been hit, this does not trigger the sleep period.
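The gating behaviour can be sketched as a simple time comparison (a hypothetical helper; the Autoscaler's own implementation is not exposed):

```python
import time


def rule_is_asleep(last_scale_time, sleep_seconds, now=None):
    """True while fewer than `sleep_seconds` have passed since the last scale."""
    now = time.monotonic() if now is None else now
    return (now - last_scale_time) < sleep_seconds


# A scale event at t=100 with sleep = 30 suppresses the rule until t=130.
print(rule_is_asleep(100, 30, now=120))  # True  -> rule is skipped
print(rule_is_asleep(100, 30, now=131))  # False -> rule is evaluated again
```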
The final section defines a sampling window to take metrics over a longer span of time, and how the values should be aggregated.
sample.period - (seconds) The sampling period for the scaling logic.
sample.window - Defines how many of these values should be kept in a sliding window.
sample.aggregation - Defines how the sliding window as a whole is mathematically reduced to a single value; this value is the one used for controlling the Autoscaler. This is described in a little more detail below.
If you don't anticipate particularly spiky demand, setting sample.window to 1 effectively disables this feature (with a single sample, the choice of aggregation makes no difference):
# Query metrics every 10 seconds, perform no aggregation
sample.period = 10
sample.window = 1
sample.aggregation = 'max'
The available aggregation methods are:
max - [Recommended] Take the highest value in the window (e.g. base the decisions of the Autoscaler on the highest CCU over the last 2 minutes).
min - Take the minimum value over the timeframe.
mean - Use the mean average.
median - Use the median average.
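A minimal sketch of how such a sliding window could be aggregated, for illustration only (the Autoscaler's internal implementation is not exposed):

```python
from collections import deque
from statistics import mean, median

# Each aggregation reduces the whole window to a single control value.
AGGREGATIONS = {
    "max": max,
    "min": min,
    "mean": mean,
    "median": median,
    "range": lambda xs: max(xs) - min(xs),
    "sum": sum,
}

window = deque(maxlen=5)                     # sample.window = 5
for ccu in [120, 340, 290, 150, 200, 310]:   # one sample per sample.period
    window.append(ccu)

# The oldest sample (120) has dropped out of the 5-sample window.
print(AGGREGATIONS["max"](window))     # 340
print(AGGREGATIONS["median"](window))  # 290
```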
As an example, the below configuration will check for metrics once a minute, and use the maximum sampled value for CCU over the last 5 minutes to decide on whether to scale up or down.
sample.period = 60
sample.window = 5
sample.aggregation = 'max'
range (the difference between the maximum and minimum values in the window) and sum (adds all values) are also supported aggregation methods, but these have limited utility with CCU and the current rule definition format.
The Autoscaler does not move the connected clients before or after scaling up and down.