Metrics
Prerequisites
To gather metrics, make sure you have a simulation currently running. If you do not, refer to the section on starting your simulation. You will also need your cluster key to access the metrics of remote simulations.
Other dependencies: Prometheus and Grafana, if you want to follow the scraping and dashboard sections below.

Accessing metrics through aether CLI

While your simulation is running, our infrastructure collects metrics from all the different machines and makes them available through our HTTP Gateway. The easiest way to talk to the Gateway is through the aether CLI.
We collect metrics on a per-machine basis, and since your simulation runs on multiple machines, the Gateway exposes multiple endpoints from which to scrape metrics.
To list all the endpoints, we will use the aether metric --list command.
$ aether metric --list
//169.2.2.1/exporter/
//169.2.2.1/node_exporter/
To get the actual metrics, we will use the aether metric --get <endpoint> command, which will print all the metrics of a particular machine (in Prometheus format) to stdout.
$ aether metric --get //169.2.2.1/exporter/
# HELP metrics_server_dev_requests_served Number of requests served by this metrics server
# TYPE metrics_server_dev_requests_served counter
metrics_server_dev_requests_served 2
# HELP metrics_server_dev_uptime_ms Uptime of the metrics server
# TYPE metrics_server_dev_uptime_ms gauge
metrics_server_dev_uptime_ms 36360
# TYPE total_collisions counter
total_collisions{worker_id="5",hadean_pid="127.0.0.1.18021.0"} 11.000000
total_collisions{worker_id="6",hadean_pid="127.0.0.1.18027.0"} 20.000000
total_collisions{worker_id="7",hadean_pid="127.0.0.1.18015.0"} 11.000000
total_collisions{worker_id="4",hadean_pid="127.0.0.1.18019.0"} 15.000000
# TYPE dispatcher_manager_messages counter
dispatcher_manager_messages{hadean_pid="127.0.0.1.18011.0"} 70.000000
# HELP muxer_data_recv_from_workers_kib The ingress bandwidth received by a muxer from workers (in KiBs).
# TYPE muxer_data_recv_from_workers_kib counter
muxer_data_recv_from_workers_kib{hadean_pid="127.0.0.1.18007.0"} 49895.578125
muxer_data_recv_from_workers_kib{hadean_pid="127.0.0.1.18005.0"} 49558.5
# TYPE dispatcher_muxer_messages counter
dispatcher_muxer_messages{hadean_pid="127.0.0.1.18011.0"} 2.000000
# HELP aether_worker_received_messages_total The number of messages received by this worker.
# TYPE aether_worker_received_messages_total counter
aether_worker_received_messages_total{workerid="140720588402528",hadean_pid="127.0.0.1.18021.0"} 0.000000
aether_worker_received_messages_total{workerid="140736352035504",hadean_pid="127.0.0.1.18027.0"} 0.000000
aether_worker_received_messages_total{workerid="140736223445632",hadean_pid="127.0.0.1.18015.0"} 0.000000
aether_worker_received_messages_total{workerid="140730104848272",hadean_pid="127.0.0.1.18019.0"} 0.000000
# TYPE aether_time_elapsed_sec gauge
Note: for remote runs, you can simply add the --cluster <key> argument to your aether metric invocations.
Behind the scenes, the CLI's --list and --get commands query our Hadean Gateway REST API. For example, --list queries the /rest/v0/managers/metrics/ endpoint of our API and prints its response to stdout. The --get command, on the other hand, simply makes a request to /rest/v0/<ip>/exporter/metrics/ to scrape the metrics of one of our services and prints the output to stdout. The Gateway can only be accessed from within your cluster, but the CLI creates a tunnel for you and makes these requests on your behalf.

Setting up a metric server proxy

Another way to access the Gateway is through the aether metric --serve command. This makes the Gateway queryable through curl, your browser and Prometheus on a local address.

WSL container IP

The WSL container IP is the WSL address on the NAT bridge; it is used to connect to services running inside the distribution. You can get the WSL container IP by running wsl hostname -I, or read it from the run process output before a simulation runs:
[Image: run process output showing the WSL container IP]
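For example, assuming WSL is installed, you can print the address from a Windows PowerShell prompt (the exact address varies per machine):
$ wsl hostname -I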

Serving metrics

This will start a proxy to our Gateway on the default port:
$ aether metric --serve --ip "127.0.0.1"
Serving metrics at http://127.0.0.1:12336
The --ip argument only takes effect for remote runs. For local runs, the WSL container IP overrides whatever address you pass and is always used to start the metrics server.
The benefit of using --serve over --get/--list is that you can build more complex tooling on top of our Gateway API.
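For example, the following PowerShell sketch lists every endpoint through the proxy and scrapes each one in turn. It assumes the proxy started by aether metric --serve is listening on the default address shown above (127.0.0.1:12336); adjust the base address if yours differs.
# Minimal sketch: scrape every metric endpoint through the `aether metric --serve` proxy.
# Assumption: the proxy is listening on http://127.0.0.1:12336 (the default port).
$base = "http://127.0.0.1:12336"
# The managers endpoint returns JSON such as { "endpoints": ["/127.0.0.1/exporter/", ...] }.
$endpoints = (Invoke-RestMethod "$base/rest/v0/managers/metrics").endpoints
foreach ($endpoint in $endpoints)
{
    # Each entry starts and ends with "/", so appending "metrics" gives e.g.
    # /rest/v0/127.0.0.1/exporter/metrics.
    Invoke-RestMethod "$base/rest/v0$($endpoint)metrics"
}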

Using a browser to access metrics

First we will navigate to http://127.0.0.1:12336/rest/v0/managers/metrics, which should return a JSON object like:
{ "endpoints": ["/127.0.0.1/exporter/", "/127.0.0.1/node_exporter/"] }
We then navigate to http://127.0.0.1:12336/rest/v0/127.0.0.1/exporter/metrics in order to get the metrics of a particular machine.

Using Invoke-WebRequest or curl to access metrics

The process is very similar to the one above. The output format for both requests will be the same:
$ Invoke-WebRequest http://127.0.0.1:12336/rest/v0/managers/metrics
{"endpoints":["/127.0.0.1/exporter/","/127.0.0.1/node_exporter/"]}
$ Invoke-WebRequest http://127.0.0.1:12336/rest/v0/127.0.0.1/exporter/metrics
StatusCode : 200
StatusDescription : OK
Content : # HELP metrics_server_dev_requests_served Number of requests served by this metrics server
# TYPE metrics_server_dev_requests_served counter
metrics_server_dev_requests_served 3
# HELP metrics_server_...
RawContent : HTTP/1.1 200 OK
Content-Length: 5805
Content-Type: text/plain;charset=utf-8;version=0.0.4
Date: Wed, 08 Sep 2021 12:48:48 GMT
# HELP metrics_server_dev_requests_served Number of requests served ...
Forms : {}
Headers : {[Content-Length, 5805], [Content-Type, text/plain;charset=utf-8;version=0.0.4], [Date, Wed, 08 Sep 2021 12:48:48 GMT]}
Images : {}
InputFields : {}
Links : {}
ParsedHtml : mshtml.HTMLDocumentClass
RawContentLength : 5805

Using Prometheus to access metrics

An even more user-friendly way to scrape metrics is to let Prometheus do it for you! For this we expose the aether metric --prometheus-template command, which generates a Prometheus configuration defining all the jobs that will scrape the endpoints for metrics.
$ aether metric --prometheus-template
The --serve-port and --serve-host options can be used to point at an aether metric --serve command running on a different machine.
The above command creates a file .\prometheus.yml, which can be supplied to Prometheus via prometheus --config.file=.\prometheus.yml. To see your metrics, navigate to http://localhost:9090.
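The exact contents of the generated file depend on your running simulation, but a scrape config pointing at the proxy from aether metric --serve might look roughly like the sketch below. The job name, target port and metrics path here are assumptions based on the examples above, not the literal output of the command.
global:
  scrape_interval: 15s

scrape_configs:
  # Illustrative job only; `aether metric --prometheus-template` decides the real job names.
  - job_name: "aether_exporter_127_0_0_1"
    metrics_path: /rest/v0/127.0.0.1/exporter/metrics
    static_configs:
      - targets: ["127.0.0.1:12336"]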

Dynamic scaling

Your simulation can scale up or down depending on your current load, which means that new machines might be created or existing machines brought down. As a result, the endpoints returned by --list (and by the REST endpoint /rest/v0/managers/metrics) can become outdated, so you need to refresh the list of endpoints regularly to keep your view of the system up to date. For example, suppose you are running a simulation and want to see the current set of metrics. You execute aether metric --list and aether metric --get <endpoint> to get them, but then your simulation scales up (or down) due to a change in load. To see the new endpoints you must run --list again, since you only know about the endpoints from before the load changed.
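For illustration, a fresh --list after a scale-up might show an additional machine (the second IP below is hypothetical):
$ aether metric --list
//169.2.2.1/exporter/
//169.2.2.1/node_exporter/
//169.2.2.2/exporter/
//169.2.2.2/node_exporter/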
Note: Even if you know where Hadean Platform will spawn your next machine, you cannot assume that /rest/v0/<new-ip-address>/exporter/metrics/ is queryable, unless /rest/v0/managers/metrics was queried beforehand.

A more dynamic Prometheus config

So far we have shown how to create a static template file, but what if your simulation dynamically scales while Prometheus is scraping? For this we need a script that periodically regenerates the template file and asks Prometheus to reload it.
# generate the config
aether metric --prometheus-template
# load it with Prometheus, while also enabling the `lifecycle` option
Start-Process prometheus -ArgumentList '--config.file=.\prometheus.yml','--web.enable-lifecycle'
for ( ; ; )
{
    # sleep for a few minutes
    Start-Sleep -Seconds 300
    # regenerate the config
    aether metric --prometheus-template
    # tell Prometheus to reload the config file
    Invoke-WebRequest http://localhost:9090/-/reload -Method Post
}

Viewing metrics in Grafana

Download and install Grafana.
Download our basic Aether Engine dashboard config JSON file below: basic-dashboard.json (13 KB).
Run Grafana, open http://localhost:3000 in your browser, and log in (the default user/pass is admin/admin).
Import the downloaded dashboard JSON into Grafana and point it at a Prometheus data source that scrapes your simulation (see the Prometheus sections above).
You should now be able to see multiple panels showing various Aether metrics, including, for example, the tick rate.

Remote runs

For remote runs, you can simply add the --cluster <key> argument to your aether metric invocations.
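For example, to list the metric endpoints of a remote simulation:
$ aether metric --list --cluster <key>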

Common errors

If you can't get your metrics, it is most likely because the simulation is not currently running.
If you get a 404 Not Found while executing aether metric --get //endpoint/, make sure that:
  • --list is listing the endpoint.
  • The endpoint starts with two forward slashes (//).
  • The endpoint ends with a forward slash (/).
If you are still getting 404 errors, get in contact with us, since there might be a problem with the cluster.
The Hadean Gateway API can also return JSON objects with an error key set to the error that occurred. This usually means something has gone wrong on our side. For example, querying /rest/v0/managers/metrics can return { "error": "Failed to do X." }. We respect HTTP status codes: if something has gone wrong on our side we return a 50x status code, while a successful request returns 200.