By default, Autostopping listens for HTTP/HTTPS traffic. In many cases, however, a VM runs long-running background jobs, such as machine learning jobs, that may generate little or no network traffic. In these cases, network traffic alone is not a reliable way to detect whether a VM is idle.
For long-running jobs, you can use the Lightwing heartbeat agent (ecg).
Setup ECG
Linux
> wget https://lightwing-downloads.s3-ap-southeast-1.amazonaws.com/ecg_1.1.0_linux_amd64.zip
> unzip ecg_1.1.0_linux_amd64.zip
> sudo ./install.sh
Run ECG
ecg should be run on the server. It comes with many predefined checks.
Waiting for a long running job to finish
Assume your long-running job is a simple Python script that you start like this:
> python trainmodel.py
You can configure ecg (/etc/lightwing/ecg.toml) as shown below:
# Configuration file for ecg agent
apiURL = "https://api.lightwing.io"
authToken = ""
gatewayName = ""
# Uncomment and edit the following based on your need.
# For metrics based heartbeats configure the below section.
# A heartbeat will be sent when a metric is greater than or equal to its configured threshold
#[metrics]
#cpu = "40"
#memory = "5Gb"
# For process based heartbeats configure the below section.
# A heartbeat will be sent when a process matching the condition is found
#[process]
#condition = "python*"
Restart the ECG process after making any configuration changes:
> sudo systemctl restart ecg
ECG comes with pre-installed watchers. In the example above, you can uncomment the metrics watcher, the process watcher, or both.
The metrics watcher sends heartbeat signals while a specified metric is greater than or equal to its configured threshold. The process watcher sends heartbeat signals while a process matching the given condition exists.
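For the trainmodel.py example, a minimal sketch of the edited /etc/lightwing/ecg.toml with only the process watcher enabled might look like the following. It reuses the keys and values from the template above. authToken and gatewayName are left empty here and need your own values. The assumption that the "python*" condition matches the process started by "python trainmodel.py" is not confirmed by this page, so check it against your own setup.

```toml
# /etc/lightwing/ecg.toml
# Sketch: process watcher enabled for the trainmodel.py example.
apiURL = "https://api.lightwing.io"
authToken = ""      # empty as in the template; supply your own token
gatewayName = ""    # empty as in the template; supply your own gateway name

# Metrics watcher left disabled in this sketch.
#[metrics]
#cpu = "40"
#memory = "5Gb"

# Assumption: "python*" matches the process started by `python trainmodel.py`.
# While such a process is found, a heartbeat is sent.
[process]
condition = "python*"
```

A broad condition such as "python*" would presumably also match unrelated Python processes on the same VM, so a narrower condition may be preferable if other Python programs run there.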