Autostopping and long-running background jobs

By default, autostopping detects idleness by listening for HTTP/HTTPS traffic. In many cases, however, a VM runs long-running background jobs, such as machine learning jobs, that may produce little or no such traffic. In these cases, network traffic alone is not a reliable way to detect whether a VM is idle.

For long-running jobs, you can use the Lightwing heartbeat agent (ecg).

Setup ECG

Linux

> wget https://lightwing-downloads.s3-ap-southeast-1.amazonaws.com/ecg_1.1.0_linux_amd64.zip
> unzip ecg_1.1.0_linux_amd64.zip
> sudo ./install.sh

Run ECG

ecg should be run on the server. It comes with many predefined checks.

Waiting for a long running job to finish

Let us assume your long-running job is a simple Python script such as:

> python trainmodel.py

You can configure ecg by editing /etc/lightwing/ecg.toml as follows:

# Configuration file for ecg agent

apiURL = "https://api.lightwing.io"
authToken = ""
gatewayName = ""

# Uncomment and edit the following based on your need.

# For metrics based heartbeats configure the below section.
# A heart beat will be sent when the metrics is greater than or equal to the configured threshold

#[metrics]
#cpu = "40"
#memory = "5Gb"

# For process based heartbeats configure the below section.
# A heart beat will be sent when a process with matching condition is found

#[process]
#condition = "python*"

You will have to restart the ECG process after making configuration changes:

> sudo systemctl restart ecg
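
After a restart it can be useful to confirm that the agent actually came back up. The following is a minimal sketch, assuming ECG is installed as a systemd unit named ecg (as the restart command above implies). The function name check_ecg_service is hypothetical and not part of ECG.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a systemd unit (default: ecg) is active.
check_ecg_service() {
  local unit="${1:-ecg}"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is running"
  else
    # Point at the unit's journal for troubleshooting.
    echo "$unit is not running; inspect logs with: journalctl -u $unit"
    return 1
  fi
}

# Example usage:
#   check_ecg_service        # checks the "ecg" unit
```

If the check fails right after a configuration change, a syntax error in /etc/lightwing/ecg.toml is a likely first thing to rule out.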

ECG comes with built-in watchers. In the sample configuration above, uncomment the [metrics] or [process] section to enable the corresponding watcher.

With the metrics watcher, ECG sends a heartbeat whenever a metric is greater than or equal to its configured threshold. The process watcher sends a heartbeat whenever a process matching the configured condition is found.
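
For the trainmodel.py example, enabling the process watcher amounts to uncommenting the [process] section. The sketch below does this with sed. It assumes GNU sed and that the lines appear exactly as in the sample file (`#[process]` followed by `#condition = ...`). The function name enable_process_watcher is hypothetical, and the sample's "python*" value is left unchanged because the exact pattern syntax accepted by condition is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the [process] watcher in an ecg.toml file.
# Assumes the section is commented exactly as in the sample configuration.
enable_process_watcher() {
  local cfg="$1"
  # Keep a backup next to the original before editing in place.
  cp "$cfg" "$cfg.bak"
  sed -i \
    -e 's/^#\[process\]/[process]/' \
    -e 's/^#condition = /condition = /' \
    "$cfg"
}

# Example usage (run as root for the real file):
#   enable_process_watcher /etc/lightwing/ecg.toml
```

Afterwards, restart the agent with `sudo systemctl restart ecg` so the change takes effect.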