A Ruby gem that monitors Docker Swarm mode services and auto-scales them based on user configuration. It can monitor both web services and worker services. Web services are scaled on metrics such as response time, retrieved from New Relic. Worker services are scaled on the queue size of each worker. The gem is inspired by HireFire and was motivated by a migration from Heroku to Docker Swarm mode.
Add this line to your application's Gemfile:
gem 'scaltainer'
And then execute:
$ bundle
Or install it yourself as:
$ gem install scaltainer
scaltainer
This does a one-time check on the running service replicas and sends scale-out/in commands to the swarm cluster as appropriate.
Configuration is read from scaltainer.yml by default. To read from another file, add -f yourconfig.yml:
scaltainer -f yourconfig.yml
Note that after each run a new file is created (yourconfig.yml.state) which stores the state of the previous run.
This is needed because some configuration parameters (like sensitivity) depend on what happened in previous runs.
If you want to specify a different location for the state file, add the --state-file parameter.
Example:
scaltainer -f /path/to/configuration/file.yml --state-file /path/to/different/state/file.yml
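To illustrate why a sensitivity setting needs state between runs, here is a minimal sketch of a breach counter persisted to a YAML file. The class name BreachTracker, its methods, and the state layout are hypothetical and are not taken from the gem; the sketch also assumes that breaches must be consecutive, which the text above does not specify.

```ruby
require 'yaml'

# Hypothetical helper (not the gem's implementation): counts consecutive
# threshold breaches per service and keeps the counters in a YAML state file.
class BreachTracker
  def initialize(state_path)
    @state_path = state_path
    @state = File.exist?(state_path) ? (YAML.load_file(state_path) || {}) : {}
  end

  # Records whether this run breached the threshold for `service`.
  # Returns true once `sensitivity` breaches in a row have been seen,
  # then resets the counter so the next scaling action waits again.
  def breach?(service, breached, sensitivity)
    count = breached ? @state.fetch(service, 0) + 1 : 0
    fire = count >= sensitivity
    @state[service] = fire ? 0 : count
    fire
  end

  # Writes the counters back so the next run can pick them up.
  def save
    File.write(@state_path, @state.to_yaml)
  end
end
```

With a sensitivity of 2, the first breaching run only records the breach; a second breaching run that loads the same state file is the one that triggers scaling.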
Typically one would want to repeatedly call scaltainer every minute or so. To do this, specify the wait time between repetitions in seconds using the -w parameter:
scaltainer -w 60
This will repeatedly call scaltainer every 60 seconds, sleeping in between.
- DOCKER_URL: Should point to the docker engine URL. If not set, it defaults to the local unix socket.
- HIREFIRE_TOKEN: If your application is configured the HireFire way, you need to set the HIREFIRE_TOKEN environment variable before invoking scaltainer. This is used when probing your application endpoint (see below) to get the number of jobs per queue for each worker.
- NEW_RELIC_API_KEY: New Relic API key. Currently New Relic is used to retrieve the average response time metric for web services. More monitoring services can be added in the future.
- RESPONSE_TIME_WINDOW: Time window in minutes, ending at the current moment, over which the average response time is measured. For example, 3 means measure the average response time over the past 3 minutes. Default value is 5.
- LOG_LEVEL: Accepted values are: DEBUG, INFO (default), WARN, ERROR, FATAL. Log output goes to stdout.
- DOCKER_SECRETS_PATH_GLOB: Path glob containing environment files to load. This is useful if running from a docker swarm mode environment where one or more of the above environment variables are set using docker config or docker secret. These files should be in the form VARIABLE=value. A typical value of this variable would be: {/run/secrets/*,/config1,/config2}
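As a rough illustration of what loading environment files from a path glob can look like, the sketch below reads VARIABLE=value lines from every matching file. The function name load_env_files is hypothetical and this is not the gem's own loader; skipping blank lines and # comments is an extra assumption beyond the VARIABLE=value form described above. It relies on Ruby's Dir.glob, which understands {a,b} alternatives.

```ruby
# Hypothetical loader (not the gem's implementation): reads VARIABLE=value
# lines from every file matching the glob and stores them in `env`.
def load_env_files(glob, env = ENV)
  Dir.glob(glob).sort.each do |path|
    next unless File.file?(path)

    File.foreach(path) do |line|
      line = line.strip
      # Assumption: blank lines and # comments are ignored.
      next if line.empty? || line.start_with?('#')

      # Split on the first "=" only, so values may contain "=".
      key, value = line.split('=', 2)
      next if value.nil?

      env[key.strip] = value.strip
    end
  end
  env
end
```

Calling load_env_files('{/run/secrets/*,/config1,/config2}') would populate ENV from those files; passing a plain hash as the second argument keeps the process environment untouched, which is convenient for experimenting.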
The configuration file (determined by the -f FILE command line parameter) should be in the following form:
# to get worker metrics
endpoint: https://your-app.com/hirefire/$HIREFIRE_TOKEN/info
# optional docker swarm stack name or kubernetes namespace
namespace: mynamespace
# list of web services to monitor
web_services:
  # each service name should match docker service name
  web:
    # New Relic application id (required)
    newrelic_app_id: <app_id>
    # minimum replicas to maintain (default: 0)
    min: 1
    # maximum replicas to maintain (default: unlimited)
    max: 5
    # maximum response time above which to scale up (required)
    max_response_time: 300
    # minimum response time below which to scale down (required)
    min_response_time: 100
    # replica quantity to scale up at a time (default: 1)
    upscale_quantity: 2
    # replica quantity to scale down at a time (default: 1)
    downscale_quantity: 1
    # number of breaches to wait for before scaling up (default: 1)
    upscale_sensitivity: 1
    # number of breaches to wait for before scaling down (default: 1)
    downscale_sensitivity: 1
  webapi:
    ...
worker_services:
  worker1:
    min: 1
    max: 10
    # number of jobs each worker replica should process (required)
    # the bigger the ratio, the fewer workers are scaled out
    ratio: 3
    upscale_sensitivity: 1
    downscale_sensitivity: 1
  worker2:
    ...
More details about configuration parameters can be found in HireFire docs.
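To make the parameters above concrete, here is a minimal sketch of how a target replica count could be derived from them. Both function names are hypothetical, and the exact rules (rounding up the jobs-to-ratio quotient, clamping to min and max) are assumptions based on the comments in the sample configuration, not a description of the gem's internals. Sensitivity is left out to keep the sketch short.

```ruby
# Worker services (assumed rule): one replica per `ratio` queued jobs,
# rounded up, then clamped to [min, max]. A nil max means unlimited.
def desired_worker_replicas(queue_size, ratio:, min: 0, max: nil)
  wanted = (queue_size.to_f / ratio).ceil
  wanted = [wanted, min].max
  max ? [wanted, max].min : wanted
end

# Web services (assumed rule): step up or down by the configured quantity
# when the response time leaves the [min_response_time, max_response_time]
# band, then clamp to [min, max].
def desired_web_replicas(current, response_time, config)
  min = config.fetch('min', 0)
  max = config['max']
  wanted =
    if response_time > config.fetch('max_response_time')
      current + config.fetch('upscale_quantity', 1)
    elsif response_time < config.fetch('min_response_time')
      current - config.fetch('downscale_quantity', 1)
    else
      current
    end
  wanted = [wanted, min].max
  max ? [wanted, max].min : wanted
end
```

Under these assumptions, the worker1 settings above (ratio 3, min 1, max 10) with 7 queued jobs give 3 replicas, and the web settings with 2 current replicas and a response time of 350 give 4 replicas.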
Scaltainer is available on Docker Hub, so you can docker run it:
docker run -it --rm rayyanqcri/scaltainer
This prints the usage. To add arguments, just append them:
docker run -it --rm rayyanqcri/scaltainer -f scaltainer.yml
Scaltainer should typically be run as a minutely cron service. If you are using rayyanqcri/swarm-scheduler, a service definition for scaltainer is typically something like this:
version: '3.3'
services:
  scaltainer:
    image: rayyanqcri/scaltainer:latest
    command: -f /scaltainer.yml --state-file /tmp/scaltainer-state.yml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DOCKER_URL=unix:///var/run/docker.sock
      - DOCKER_SECRETS_PATH_GLOB={/run/secrets/*}
      - RESPONSE_TIME_WINDOW=3
    configs:
      - source: scaltainer
        target: /scaltainer.yml
    secrets:
      - scaltainer
    deploy:
      replicas: 0
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.role == manager
configs:
  scaltainer:
    file: scaltainer.yml
secrets:
  scaltainer:
    file: scaltainer.env
Where scaltainer.env is a file containing HireFire and New Relic secrets:
HIREFIRE_TOKEN=
NEW_RELIC_API_KEY=
And scaltainer.yml is the scaltainer configuration file.
After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/hammady/scaltainer.
rake
The gem is available as open source under the terms of the MIT License.