By default, Autostopping listens for HTTP/HTTPS traffic to decide whether a VM is in use. In many cases, however, a VM runs long-running background jobs, such as machine learning training, that produce little or no such traffic. For these workloads, network traffic alone is not a reliable way to detect whether the VM is idle.
For long-running jobs, you can use the Lightwing heartbeat agent, ecg. To install it:
> wget https://lightwing-downloads.s3-ap-southeast-1.amazonaws.com/ecg_1.1.0_linux_amd64.zip
> unzip ecg_1.1.0_linux_amd64.zip
> sudo ./install.sh
Install and run ecg on the VM whose idleness Autostopping should track. It comes with several predefined checks, called watchers.
Waiting for a long-running job to finish
Suppose your long-running job is a simple Python script started with:
> python trainmodel.py
You can then configure ecg in /etc/lightwing/ecg.toml as follows:
# Configuration file for ecg agent
apiURL = "https://api.lightwing.io"
authToken = ""
gatewayName = ""

# Uncomment and edit the following based on your needs.

# For metrics-based heartbeats, configure the section below.
# A heartbeat is sent when a metric is greater than or equal to its configured threshold.
#[metrics]
#cpu = "40"
#memory = "5Gb"

# For process-based heartbeats, configure the section below.
# A heartbeat is sent when a process matching the condition is found.
#[process]
#condition = "python*"
Restart the ecg service after changing the configuration:
> sudo systemctl restart ecg
ecg comes with pre-installed watchers. In the configuration above, uncomment the section for the watcher you want to use: [metrics] for the metrics watcher or [process] for the process watcher.
The metrics watcher sends a heartbeat when a specified metric (for example, CPU or memory usage) reaches or exceeds its configured threshold. The process watcher sends a heartbeat when it finds a running process that matches the given condition.
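Applied to the trainmodel.py example, a minimal sketch of a process-based setup keeps apiURL, authToken and gatewayName from the file above and uncomments only the [process] section. It assumes that python* is a wildcard matched against running process names, in which case any Python process on the VM, not only trainmodel.py, would trigger heartbeats.

```toml
# /etc/lightwing/ecg.toml: process-based heartbeats only (sketch)
apiURL = "https://api.lightwing.io"
authToken = ""      # fill in your own token
gatewayName = ""    # fill in your own gateway name

# Send a heartbeat while a process matching this condition is running.
# Assumption: "python*" is a wildcard on the process name.
[process]
condition = "python*"
```

With this configuration, once trainmodel.py exits and no matching process remains, ecg presumably stops sending heartbeats, so Autostopping can again treat the VM as idle. A process condition ties the heartbeat to the job itself, whereas a CPU or memory threshold can also be triggered by unrelated load.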