Use HashiCorp Nomad Cluster Autoscaling in OpenStack
Overview
HashiCorp's workload orchestrator, Nomad, offers several autoscaling functions, which are implemented by the Nomad Autoscaler. This tutorial shows how to set up “Horizontal Cluster Autoscaling” (dynamically adding nodes to, or removing them from, the Nomad cluster) in an OpenStack environment.
Download the Nomad Autoscaler binary
To use autoscaling in a Nomad cluster, you need to configure and start the Nomad Autoscaler, which ships as a separate binary. It makes sense to run the Nomad Autoscaler as a Nomad job. To do so, download the required binary from releases.hashicorp.com and move it to /usr/local/bin on one of your Nomad clients.
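The download step can be scripted. The following is a minimal sketch, assuming the usual releases.hashicorp.com URL layout, a Linux amd64 client, and that curl and unzip are installed. The version number is only a placeholder and the function names are made up for this example.

```shell
#!/bin/sh
# Placeholder version; check releases.hashicorp.com for the current one.
AUTOSCALER_VERSION="${AUTOSCALER_VERSION:-0.4.5}"

# Build the download URL for a given version, OS and architecture.
autoscaler_url() {
    printf 'https://releases.hashicorp.com/nomad-autoscaler/%s/nomad-autoscaler_%s_%s_%s.zip\n' \
        "$1" "$1" "$2" "$3"
}

# Download, unpack and install the binary (needs curl, unzip and root rights).
install_autoscaler() {
    url=$(autoscaler_url "$AUTOSCALER_VERSION" linux amd64)
    tmpdir=$(mktemp -d) || return 1
    curl -fsSL -o "$tmpdir/nomad-autoscaler.zip" "$url" &&
        unzip -o -q "$tmpdir/nomad-autoscaler.zip" nomad-autoscaler -d "$tmpdir" &&
        install -m 0755 "$tmpdir/nomad-autoscaler" /usr/local/bin/nomad-autoscaler
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Nothing runs when the file is sourced; call install_autoscaler as root on the client that should host the job.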
Create a Nomad job file for the Nomad Autoscaler
As we want to run the Autoscaler as a Nomad job, we have to create a job file for it. That file contains all the required configuration parameters. We will go through an example file here:
As with all Nomad job files, we first configure the region and the datacenter(s) in which the job may run. As this job is close to the control plane, we choose to run it in its own namespace. This makes it necessary to create a policy with the required rights for this namespace and job, and to create a token from this policy. This token is used in the next steps.
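Creating the namespace, the policy and the token could look roughly like the sketch below. The policy name and the exact rule set are assumptions made for illustration, not taken from the original setup; check the Nomad Autoscaler documentation for the rights your version needs. Only the namespace name comes from the example job.

```shell
#!/bin/sh
NAMESPACE="autoscalerprod4"
POLICY_NAME="autoscaler-cluster"   # hypothetical policy name

# Write an ACL policy file to the path given as $1.
write_autoscaler_policy() {
    cat > "$1" <<'EOF'
# Assumed rule set: read jobs and scaling policies in the namespace,
# read cluster state, and drain or purge nodes during scale-in.
namespace "autoscalerprod4" {
  policy       = "scale"
  capabilities = ["read-job"]
}

operator {
  policy = "read"
}

node {
  policy = "write"
}
EOF
}

# Create the namespace, upload the policy and create a token for it.
create_autoscaler_token() {
    policy_file=$(mktemp) || return 1
    write_autoscaler_policy "$policy_file" &&
        nomad namespace apply -description "Nomad Autoscaler" "$NAMESPACE" &&
        nomad acl policy apply -description "Cluster autoscaler" "$POLICY_NAME" "$policy_file" &&
        nomad acl token create -name="autoscaler" -type=client -policy="$POLICY_NAME"
    status=$?
    rm -f "$policy_file"
    return "$status"
}
```

Running create_autoscaler_token needs a management token in NOMAD_TOKEN. The secret ID it prints is the value to store for the Autoscaler.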
The HTTP endpoint of the Autoscaler is bound to a port that we let Nomad assign dynamically.
Next come the configuration parameters of the Autoscaler. We configure the directory for plugins as well as the directories for the general configuration and the scaling policies.
To let the Autoscaler access the local Nomad client, it needs the necessary SSL certificates and a Nomad token that has the rights to create cluster nodes (see above).
The beginning of the job file for deploying an instance of the Autoscaler could look like this:
job "autoscaler-prod4" {
  region = "de-west"
  datacenters = ["prod4"]
  namespace = "autoscalerprod4"
  group "autoscaler" {
    network {
      port "http" {}
    }
    task "autoscaler_agent" {
      driver = "exec"
      config {
        command = "/usr/local/bin/nomad-autoscaler"
        args = [
          "agent",
          "-plugin-dir=local/nomad-autoscaler/plugins",
          "-config=local/nomad-autoscaler/etc",
          "-policy-dir=local/nomad-autoscaler/etc/policies",
          "-nomad-address=https://127.0.0.1:4646",
          "-http-bind-address=${NOMAD_IP_http}",
          "-http-bind-port=${NOMAD_PORT_http}",
          "-nomad-ca-cert=local/nomad-autoscaler/etc/certificates/ca.pem",
          "-nomad-client-cert=local/nomad-autoscaler/etc/certificates/cert.pem",
          "-nomad-client-key=local/nomad-autoscaler/etc/certificates/private_key.pem",
          "-nomad-region=de-west"
        ]
      }
[...]
The Nomad Nova Autoscaler Plugin
As briefly mentioned above, the Autoscaler's functions can be extended via plugins (e.g. in order to work with different cloud providers). The Nova plugin for the Nomad Autoscaler is linked, among others, from the Nomad Autoscaler documentation. It is convenient to download the plugin from a central distribution point when the Nomad job is started:
[...]
      artifact {
        source = "https://github.com/jorgemarey/nomad-nova-autoscaler/releases/download/v0.6.0/nomad-nova-autoscaler-v0.6.0-linux-amd64.tar.gz"
        destination = "local/nomad-autoscaler/plugins"
        options {
          checksum = "md5:fec29af8625842b154d30be8b8db305f"
        }
      }
[...]
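The checksum option also accepts stronger hash types than MD5, in the form "type:hex". As a small sketch, assuming sha256sum from GNU coreutils is available, the helper below prints such a string for a locally downloaded archive; the function name is made up for this example.

```shell
#!/bin/sh
# Print a checksum string in the "sha256:<hex>" form accepted by the
# artifact block's checksum option.
artifact_checksum() {
    sum=$(sha256sum "$1") || return 1
    # sha256sum prints "<hex>  <file>"; keep only the hex digest.
    printf 'sha256:%s\n' "${sum%% *}"
}
```

Run it against the plugin archive after downloading it once by hand, and paste the result into the job file in place of the md5 value.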
Use of Nomad Variables
To keep sensitive information such as SSL certificates or tokens out of the job file, it is useful to store them in Nomad Variables or in HashiCorp Vault. In this part of the job file, the Nomad token, the SSL certificates and the credentials for the OpenStack environment, which the Nova Autoscaler Plugin needs, are read from Nomad Variables. The Nomad APM Plugin, which delivers CPU and memory data, is used for application performance management (APM). It is available automatically, because every Nomad client already gathers CPU and memory data. For more sophisticated metrics and scaling parameters (e.g. connections per second) we would use the Prometheus APM Plugin, which requires a running Prometheus instance and matching exporters.
[...]
      template {
        destination = "${NOMAD_SECRETS_DIR}/env.txt"
        env = true
        data = <<EOT
NOMAD_TOKEN={{ with nomadVar "nomad/jobs/autoscaler-prod4" }}{{ .token }}{{ end }}
EOT
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-prod4" }}{{ .cacert }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/ca.pem"
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-prod4" }}{{ .clientcert }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/cert.pem"
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-prod4" }}{{ .clientkey }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/private_key.pem"
      }
      template {
        data = <<EOH
apm "nomad-apm" {
  driver = "nomad-apm"
}
target "os-nova" {
  driver = "os-nova"
  config = {
    auth_url = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .osauthurl }}" {{- end }}
    username = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .osusername }}" {{- end }}
    password = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .ospassword }}" {{- end }}
    domain_name = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .osdomainname }}" {{- end }}
    project_id = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .osprojectid }}" {{- end }}
    region_name = {{- with nomadVar "nomad/jobs/autoscaler-prod4" }} "{{ .osregion }}" {{- end }}
  }
}
[...]
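The templates above expect a Nomad Variable at nomad/jobs/autoscaler-prod4 in the job's namespace, a path that a job may read through its workload identity without an extra policy. A possible way to fill it is sketched below. The item keys match the template lookups; the PEM file names, AUTOSCALER_TOKEN and the OS_* environment variables are hypothetical inputs.

```shell
#!/bin/sh
# Store the Autoscaler's secrets in the Nomad Variable read by the job
# templates. Expects ca.pem, cert.pem and private_key.pem in the current
# directory and the credentials in environment variables (all assumed).
store_autoscaler_vars() {
    nomad var put -namespace=autoscalerprod4 nomad/jobs/autoscaler-prod4 \
        token="$AUTOSCALER_TOKEN" \
        cacert="$(cat ca.pem)" \
        clientcert="$(cat cert.pem)" \
        clientkey="$(cat private_key.pem)" \
        osauthurl="$OS_AUTH_URL" \
        osusername="$OS_USERNAME" \
        ospassword="$OS_PASSWORD" \
        osdomainname="$OS_USER_DOMAIN_NAME" \
        osprojectid="$OS_PROJECT_ID" \
        osregion="$OS_REGION_NAME"
}
```

Passing secrets as command-line arguments exposes them to the local process list for a moment, so run this from a trusted admin host.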
Configure the scaling strategy
Finally, we have to configure the scaling strategy. It consists of a check, the strategy itself and a definition of the target to which the strategy is applied. The check uses the data of the APM plugin to implement the strategy (“keep the percentage of allocated CPU at 70” in this case) for the target (using the “os-nova” target plugin).
As the strategy we chose “target-value”, in order to keep the share of allocated CPU on the Nomad clients at about 70%.
In the policy we additionally configure the length of the “cooldown” period, during which the autoscaler pauses after a scaling event. Furthermore, we set the “evaluation_interval” at which the autoscaler evaluates whether the number of Nomad clients needs to change. We also set the min and max values, which define the limits within which the autoscaler will create or destroy Nomad clients.
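To get a feeling for how “target-value” interacts with min and max, the arithmetic can be sketched as follows. This assumes the strategy scales the current node count by the ratio of observed metric to target and rounds up; the real strategy also applies a small tolerance threshold, which this sketch leaves out. The function name is made up for this example.

```shell
#!/bin/sh
# Estimate the node count a target-value check would ask for.
# Arguments: current count, observed metric, target, min, max.
# Assumption: desired = ceil(current * metric / target), then clamped.
target_value_count() {
    awk -v n="$1" -v m="$2" -v t="$3" -v lo="$4" -v hi="$5" 'BEGIN {
        x = n * m / t
        c = int(x)
        if (c < x) c++          # round up to whole nodes
        if (c < lo) c = lo      # never go below "min"
        if (c > hi) c = hi      # never exceed "max"
        print c
    }'
}
```

With the policy values from this tutorial, target_value_count 1 90 70 1 2 prints 2 (one node at 90% allocated CPU triggers a scale-out), and target_value_count 2 30 70 1 2 prints 1 (two nodes at 30% trigger a scale-in).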
The documentation of the OpenStack Nova Autoscaler Plugin has detailed information on all the parameters. Most of them are self-explanatory. To implement the above strategy, the autoscaler needs a defined node pool and a “node_class” that is set on all created Nomad clients, so that the plugin can identify the nodes it should watch by a common attribute.
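For the plugin to find its nodes, the image used for the dynamically created clients has to carry a matching client configuration. A minimal sketch of such a fragment, written from a shell helper during the image build, might look like this. The values mirror the scaling policy in this tutorial; the helper name and the target path are assumptions.

```shell
#!/bin/sh
# Write a Nomad client configuration fragment to the path given as $1.
# node_pool and node_class must match pool and class used in the policy.
write_client_config() {
    cat > "$1" <<'EOF'
client {
  enabled    = true
  node_pool  = "nom-pool"
  node_class = "dynamic"
}
EOF
}
```

During the image build this could be called as write_client_config /etc/nomad.d/client.hcl, where the path is only an example.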
As this example shows, server groups and security groups can be applied to the newly created Nomad clients.
Finally, we assign CPU and memory resources to the job itself.
[...]
strategy "target-value" {
  driver = "target-value"
}
EOH
        destination = "local/nomad-autoscaler/etc/nom>
      }
      template {
        data = <<EOH
scaling "worker_pool_policy" {
  enabled = true
  min = 1
  max = 2
  policy {
    cooldown = "2m"
    evaluation_interval = "1m"
    check "cpu_allocated_percentage" {
      source = "nomad-apm"
      query = "percentage-allocated_cpu"
      strategy "target-value" {
        target = 70
      }
    }
    target "os-nova" {
      dry-run = false
      stop_first = true
      image_id = "0c453c2c-cdc2-416a-95f7-c1>
      flavor_name = "SCS-2V-2-20"
      pool_name = "nom-pool"
      name_prefix = "nom-"
      network_id = "275b130d-c650-4f20-a25c-1f>
      security_groups = "default"
      availability_zones = "az1"
      tags = "nom-pool,ubuntu-minimal"
      server_group_id = "373265a7-5856-4e5c-a371-43>
      node_class = "dynamic"
      node_drain_deadline = "1h"
      node_drain_ignore_system_jobs = false
      node_purge = true
      node_selector_strategy = "least_busy"
    }
  }
}
EOH
        destination = "local/nomad-autoscaler/etc/pol>
      }
      resources {
        cpu = 50
        memory = 128
      }
    }
  }
}
Result
Using the OpenStack Nova Autoscaler Plugin, Nomad can dynamically create and remove Nomad clients in a Nomad cluster. Once you have created an image (e.g. using Packer or Terraform) for your dynamically started Nomad clients and have started the Autoscaler with the job file, you should see automatically created nodes similar to these:
root@nomad1:~# nomad node status -allocs -os |grep -i dynamic
d838cb47 nom-pool prod4 nom-1a607061-1f5c dynamic debian false eligible ready 1
7f10377b nom-pool prod4 nom-d1d8e976-bc4f dynamic debian false eligible ready 1
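Submitting the job and checking on it can be wrapped in a few commands. The sketch below assumes the job specification is saved as autoscaler.nomad.hcl (a hypothetical file name) and uses the namespace from this tutorial. Set JOB_NAME to the job name used in your file.

```shell
#!/bin/sh
JOB_FILE="${JOB_FILE:-autoscaler.nomad.hcl}"   # hypothetical file name
JOB_NAME="${JOB_NAME:-autoscaler-prod4}"
NAMESPACE="autoscalerprod4"

# Validate and submit the Autoscaler job, then show its status.
deploy_autoscaler() {
    nomad job validate "$JOB_FILE" &&
        nomad job run "$JOB_FILE" &&
        nomad job status -namespace="$NAMESPACE" "$JOB_NAME"
}

# List the nodes created by the Autoscaler (node class "dynamic").
list_dynamic_nodes() {
    nomad node status -allocs -os | grep -i dynamic
}
```

After deploy_autoscaler has succeeded, calling list_dynamic_nodes after one or two evaluation intervals should show the nodes the Autoscaler has created.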
Complete Nomad job file
job "autoscaler-ha-prod4" {
  region = "de-west"
  datacenters = ["prod4"]
  namespace = "autoscalerprod4"
  group "autoscaler" {
    network {
      port "http" {}
    }
    task "autoscaler_agent" {
      driver = "exec"
      config {
        command = "/usr/local/bin/nomad-autoscaler"
        args = [
          "agent",
          "-plugin-dir=local/nomad-autoscaler/plugins",
          "-config=local/nomad-autoscaler/etc",
          "-policy-dir=local/nomad-autoscaler/etc/policies",
          "-nomad-address=https://127.0.0.1:4646",
          "-http-bind-address=${NOMAD_IP_http}",
          "-http-bind-port=${NOMAD_PORT_http}",
          "-nomad-ca-cert=local/nomad-autoscaler/etc/certificates/ca.pem",
          "-nomad-client-cert=local/nomad-autoscaler/etc/certificates/cert.pem",
          "-nomad-client-key=local/nomad-autoscaler/etc/certificates/private_key.pem",
          "-nomad-region=de-west"
        ]
      }
      template {
        destination = "${NOMAD_SECRETS_DIR}/env.txt"
        env = true
        data = <<EOT
NOMAD_TOKEN={{ with nomadVar "nomad/jobs/autoscaler-ha-prod4" }}{{ .token }}{{ end }}
EOT
      }
      artifact {
        source = "https://github.com/jorgemarey/nomad-nova-autoscaler/releases/download/v0.6.0/nomad-nova-autoscaler-v0.6.0-linux-amd64.tar.gz"
        destination = "local/nomad-autoscaler/plugins"
        options {
          checksum = "md5:fec29af8625842b154d30be8b8db305f"
        }
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-ha-prod4" }}{{ .cacert }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/ca.pem"
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-ha-prod4" }}{{ .clientcert }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/cert.pem"
      }
      template {
        data = <<EOH
{{ with nomadVar "nomad/jobs/autoscaler-ha-prod4" }}{{ .clientkey }}{{ end }}
EOH
        destination = "local/nomad-autoscaler/etc/certificates/private_key.pem"
      }
      template {
        data = <<EOH
apm "nomad-apm" {
  driver = "nomad-apm"
}
target "os-nova" {
  driver = "os-nova"
  config = {
    auth_url = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .osauthurl }}" {{- end }}
    username = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .osusername }}" {{- end }}
    password = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .ospassword }}" {{- end }}
    domain_name = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .osdomainname }}" {{- end }}
    project_id = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .osprojectid }}" {{- end }}
    region_name = {{- with nomadVar "nomad/jobs/autoscaler-ha-prod4" }} "{{ .osregion }}" {{- end }}
  }
}
strategy "target-value" {
  driver = "target-value"
}
EOH
        destination = "local/nomad-autoscaler/etc/nomad-autoscaler.hcl"
      }
      template {
        data = <<EOH
scaling "worker_pool_policy" {
  enabled = true
  min = 1
  max = 2
  policy {
    cooldown = "2m"
    evaluation_interval = "1m"
    check "cpu_allocated_percentage" {
      source = "nomad-apm"
      query = "percentage-allocated_cpu"
      strategy "target-value" {
        target = 70
      }
    }
    target "os-nova" {
      dry-run = false
      stop_first = true
      image_id = "0c453c2c-cdc2-416a-95f7-c1779ed2fc54"
      flavor_name = "SCS-2V-2-20"
      pool_name = "nom-pool"
      name_prefix = "nom-"
      network_id = "275b130d-c650-4f20-a25c-1f6568f520dc"
      security_groups = "default"
      availability_zones = "az1"
      tags = "nom-pool,ubuntu-minimal"
      server_group_id = "373265a7-5856-4e5c-a371-43b923c4a3d0"
      node_class = "dynamic"
      node_drain_deadline = "1h"
      node_drain_ignore_system_jobs = false
      node_purge = true
      node_selector_strategy = "least_busy"
    }
  }
}
EOH
        destination = "local/nomad-autoscaler/etc/policies/scaling-policy.hcl"
      }
      resources {
        cpu = 50
        memory = 128
      }
    }
  }
}