This image includes Duplicity ready to make backups of whatever you need, cron-based.
Because you need to back things up regularly, and Duplicity is one of the best tools available for such a purpose.
It installs every Duplicity dependency needed to support all of its backends on top of a very lightweight Alpine system, and adds a small Python job runner script that converts some environment variables into flexible cron jobs and sends an email report automatically.
Each of the built-in flavors is separated into a specific docker image:
docker-duplicity
docker-duplicity-s3
docker-duplicity-docker
docker-duplicity-docker-s3
docker-duplicity-postgres
docker-duplicity-postgres-s3
Check the section below to get more info.
Each of the images mentioned above is tagged with :latest, referring to the latest tagged version in git, and :edge, referring to the latest version in the master branch. Each individual git release is also tagged (e.g. :0.1.0).
Apart from the environment variables that Duplicity uses by default, you have others specific for this image.
Define the cron schedule used to run the jobs of each periodicity (the CRONTAB_* variables listed below).
Possibly non-obvious defaults:
- Daily: 2 AM, from Monday to Saturday
- Weekly: 1 AM, on Sundays
- Monthly: 5 AM, 1st day of month
Hours are expressed in UTC.
If you define any of these variables wrongly, your cron might not work!
You can use online tools such as https://crontab.guru to make it easy.
If you set these values in a .env file, don't use quotes:
CRONTAB_15MIN=*/15 * * * *
CRONTAB_HOURLY=0 * * * *
CRONTAB_DAILY=0 2 * * MON-SAT
CRONTAB_WEEKLY=0 1 * * SUN
CRONTAB_MONTHLY=0 5 1 * *
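Since a malformed schedule can silently break cron, a rough sanity check before deploying may help. The sketch below is not part of the image; `looks_like_cron` is a hypothetical helper that only checks for five whitespace-separated fields and the absence of quotes, and it does not validate the fields themselves.

```python
# Values copied from the .env example above (unquoted, as recommended).
CRONTABS = {
    "CRONTAB_15MIN": "*/15 * * * *",
    "CRONTAB_HOURLY": "0 * * * *",
    "CRONTAB_DAILY": "0 2 * * MON-SAT",
    "CRONTAB_WEEKLY": "0 1 * * SUN",
    "CRONTAB_MONTHLY": "0 5 1 * *",
}


def looks_like_cron(expression):
    """Hypothetical helper: five whitespace-separated fields and no quotes."""
    if '"' in expression or "'" in expression:
        return False
    return len(expression.split()) == 5


# Names of the variables that fail the rough check.
invalid = [name for name, expr in CRONTABS.items() if not looks_like_cron(expr)]
print(invalid)
```

A tool such as https://crontab.guru remains the better way to confirm what a given expression actually means.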
Only available in the PostgreSQL flavors.
Define a Regular Expression to filter databases that shouldn't be included in the DB dump.
You can use this to avoid getting permission errors when running a backup against a server where you don't control all the databases.
For example, if you don't want to include the databases named DB1 and DB2, you can use:
DBS_TO_EXCLUDE="^(DB1|DB2)$"
Or, if you only want to include those databases that start with prod:
DBS_TO_INCLUDE="^prod"
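To preview which databases a pattern would select, you can try it against a list of names. This is only an illustration using Python's `re` module; `filter_databases` and the database names are hypothetical, and the image may evaluate the expression with a different regex engine, so edge cases could differ.

```python
import re


def filter_databases(names, include=None, exclude=None):
    """Keep names matching `include` (if set) and not matching `exclude` (if set)."""
    kept = []
    for name in names:
        if include and not re.search(include, name):
            continue
        if exclude and re.search(exclude, name):
            continue
        kept.append(name)
    return kept


# Hypothetical database names, for illustration only.
databases = ["DB1", "DB2", "DB10", "prod_sales", "staging"]

print(filter_databases(databases, exclude=r"^(DB1|DB2)$"))
# → ['DB10', 'prod_sales', 'staging']
print(filter_databases(databases, include=r"^prod"))
# → ['prod_sales']
```

Note how the `^` and `$` anchors keep DB10 from being excluded along with DB1.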
Where to store the backup.
Example: ftps://[email protected]/some_dir
Email report sender.
Subject of the email report. You can use these placeholders:
- {periodicity} will be one of these: 15min, hourly, daily, weekly, monthly.
- {result} will be OK if all worked fine, or ERROR if any job failed.
- {hostname} will be the container's host name, including the domain name (a.k.a. FQDN).
This variable is optional; the default is:
Backup report: {hostname} - {periodicity} - {result}
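As an illustration of how the placeholders expand, the sketch below renders the default subject with Python's `str.format`. It assumes the placeholders follow that syntax; `render_subject` and the host name are hypothetical and not taken from the image's code.

```python
DEFAULT_SUBJECT = "Backup report: {hostname} - {periodicity} - {result}"


def render_subject(template, hostname, periodicity, failed):
    """Fill the three documented placeholders (hypothetical helper)."""
    result = "ERROR" if failed else "OK"
    return template.format(hostname=hostname, periodicity=periodicity, result=result)


print(render_subject(DEFAULT_SUBJECT, "backup.example.com", "daily", failed=False))
# → Backup report: backup.example.com - daily - OK
```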
Email report recipient. Multiple recipients can be defined as a comma-separated list of emails.
Define a command that needs to be executed.
Check the Dockerfile to see built-in jobs.
Define when to execute the command you defined in the previous section. If you need several values, you can separate them with spaces (example: daily monthly).
Prebuilt flavors provide built-in jobs. You can disable those jobs by setting the corresponding JOB_*_WHEN to never.
String to let you define options for duplicity.
String that some prebuilt flavors use to add custom options required for that flavor. You should never need to use this variable.
Host used to send the email report.
Port used to send the email report.
If your mail server requires authentication, specify the user account to log in.
If your mail server requires authentication, specify the password for the SMTP_USER.
Force the email client to connect to the server using SSL/TLS. Note that the client will use STARTTLS regardless of this variable if the server offers it.
What to back up.
Example: file:///mnt/my_files
By default, SRC is set to /mnt/backup/src/ inside the container. Simply mount any external directory as a volume to /mnt/backup/src/. If you wish to include multiple directories, mount them as subdirectories of /mnt/backup/src/, for example:
volumes:
  - /path/to/data/to/backup1:/mnt/backup/src/foldername1:ro
  - /path/to/data/to/backup2:/mnt/backup/src/foldername2:ro
Define a valid timezone (e.g. Europe/Madrid or America/New_York) to make the hours in the logs match your local time.
This is achieved by bundling the tzdata package. Refer to its docs for more info.
Duplicity checks the host name that it backs up and, by default, aborts the process if it detects a mismatch.
Docker uses volatile host names, so you should add --hostname (and maybe also --domainname) when running this container to take advantage of this feature, or add --allow-source-mismatch to the OPTIONS environment variable. Otherwise, you will get errors like:
Fatal Error: Backup source host has changed.
Current hostname: 414e54ed20fb
Previous hostname: 6529bba0969c
Aborting because you may have accidentally tried to backup two different
data sets to the same remote location, or using the same archive directory.
If this is not a mistake, use the --allow-source-mismatch switch to avoid
seeing this message
Add jobs through environment variable pairs. The order will be followed.
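The pairing of JOB_*_WHAT and JOB_*_WHEN variables can be pictured with the following sketch. It is not the image's actual job runner: `jobs_for` is a hypothetical function, the commands are made up, and ordering by the numeric part of the variable name is an assumption.

```python
import re

JOB_PATTERN = re.compile(r"^JOB_(\d+)_WHEN$")


def jobs_for(periodicity, env):
    """Return the commands scheduled for `periodicity`, ordered by job number."""
    selected = []
    for name, when in env.items():
        match = JOB_PATTERN.match(name)
        # Skip unrelated variables and jobs not scheduled for this periodicity.
        if not match or periodicity not in when.split():
            continue
        number = match.group(1)
        what = env.get(f"JOB_{number}_WHAT")
        if what:
            selected.append((int(number), what))
    return [what for _, what in sorted(selected)]


# Hypothetical environment, for illustration only.
example_env = {
    "JOB_300_WHAT": "backup",
    "JOB_300_WHEN": "daily",
    "JOB_200_WHAT": "echo prepare",
    "JOB_200_WHEN": "daily weekly",
    "JOB_400_WHAT": "echo disabled",
    "JOB_400_WHEN": "never",
}

print(jobs_for("daily", example_env))   # → ['echo prepare', 'backup']
print(jobs_for("weekly", example_env))  # → ['echo prepare']
```

Leaving gaps between job numbers (200, 300, ...) makes it easy to slot a custom job before or after a built-in one.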
Refer to Duplicity man page, or execute:
docker run -it --rm ghcr.io/tecnativa/docker-duplicity duplicity --help
You can use these bundled binaries to work faster:
- dup: Executes duplicity prefixed with the options defined in $OPTIONS and $OPTIONS_EXTRA (see above).
- backup: Executes an immediate backup with default options.
- restore: Restores immediately with default options. Most likely, you will need to use it with --force.
- /etc/periodic/daily/jobrunner: Executes immediately all jobs scheduled for daily backups. Change daily for another periodicity if you want to run those instead.
If you want to test how your daily jobs work, just run:
docker exec -it your_backup_container /etc/periodic/daily/jobrunner
Replace daily with any other periodicity to test it too.
Sometimes you need more than just copying a file here and pasting it there. That's why we supply some special flavors of this image.
This includes just the most basic packages to boot cron and use Duplicity with any backend. All other images are built on top of this one, so downloading several flavors won't repeat the base layers (disk-friendly!).
It's preconfigured to back up daily:
# Incremental backup of all files
JOB_300_WHEN=daily
If you want to back up a PostgreSQL server, make sure you run this image in a fashion similar to this docker-compose.yaml definition:
services:
  db:
    image: postgres:9.6-alpine
    environment:
      POSTGRES_PASSWORD: mypass
      POSTGRES_USER: myuser
      POSTGRES_DB: mydb
  backup:
    image: ghcr.io/tecnativa/docker-duplicity-postgres
    hostname: my.postgres.backup
    environment:
      # Postgres connection
      PGHOST: db # This is the default
      PGPASSWORD: mypass
      PGUSER: myuser
      # Additional configurations for Duplicity
      AWS_ACCESS_KEY_ID: example amazon s3 access key
      AWS_SECRET_ACCESS_KEY: example amazon s3 secret key
      DST: boto3+s3://mybucket/myfolder
      EMAIL_FROM: [email protected]
      EMAIL_TO: [email protected]
      OPTIONS: --s3-european-buckets --s3-use-new-style
      PASSPHRASE: example backup encryption secret
It will make dumps automatically:
# Makes postgres dumps for all databases except templates and "postgres".
# They are uploaded by JOB_300_WHEN
JOB_200_WHEN=daily weekly
Imagine you need to run some command in another container to generate a backup file before actually backing it up in a remote place.
If this is your case, you can use this version, which includes a prepackaged Docker client.
See this docker-compose.yaml example, where we back up a Gitlab server using its crappy official image:
services:
  gitlab:
    image: gitlab/gitlab-ce
    hostname: gitlab
    domainname: example.com
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        # Your Gitlab configuration here
    ports:
      - "22:22"
      - "80:80"
      - "443:443"
    volumes:
      - config:/etc/gitlab:z
      - data:/var/opt/gitlab:z
      - logs:/var/log/gitlab:z
  backup:
    image: ghcr.io/tecnativa/docker-duplicity-docker
    hostname: backup
    domainname: gitlab.example.com
    privileged: true # To speak with host's docker socket
    volumes:
      - config:/mnt/backup/src/config
      - data:/mnt/backup/src/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      # Generate Gitlab backup before uploading it
      JOB_200_WHAT: docker exec projectname_gitlab_1 gitlab-rake gitlab:backup:create
      JOB_200_WHEN: daily weekly
      # Additional configurations for Duplicity
      AWS_ACCESS_KEY_ID: example amazon s3 access key
      AWS_SECRET_ACCESS_KEY: example amazon s3 secret key
      DST: boto3+s3://mybucket/myfolder
      EMAIL_FROM: [email protected]
      EMAIL_TO: [email protected]
      OPTIONS: --s3-european-buckets --s3-use-new-style
      PASSPHRASE: example backup encryption secret
Each of the other flavors has a special variant suffixed with -s3. It provides some opinionated defaults to make good use of the different S3 storage types and their lifecycle rules and filters, assuming you want to have weekly full backups. You should combine it with lifecycle and expiration rules as you see fit.
# Full backup of all files
JOB_500_WHEN=weekly
Note that for the DST variable you should use the boto3+s3://bucket_name[/prefix] style.
At the moment only "postgres" has this flavor. It extends 'postgres-s3', provides some defaults to make good use of the "DST_{N}" environment variables, and applies the extra options that correspond to each destination.
In this mode $DST is set to multi. This enables the use of $DST_{N} and $DST_{N}_{ENV_VAR_NAME}.
$DST_{N}_{ENV_VAR_NAME} will be processed as ${ENV_VAR_NAME} for that destination.
For example:
backup:
  ...
  environment:
    ...
    DST_1: scp://[email protected]//usr/backup
    DST_2: boto3+s3://mybucket/myfolder
    DST_2_AWS_ACCESS_KEY_ID: example amazon s3 access key
    DST_2_AWS_SECRET_ACCESS_KEY: example amazon s3 secret key
    DST_3: rsync://[email protected]:8022//volume1/folder/
    DST_3_RSYNC_PASSWORD: the password to use with rsync
The restore process uses the first destination defined.
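The mapping from DST_{N}_{ENV_VAR_NAME} to a per-destination ${ENV_VAR_NAME} can be sketched as follows. This is an illustration, not the image's implementation: `destinations` is a hypothetical function, the URLs and values are made up, and treating the lowest N as the "first" destination is an assumption.

```python
import re

DST_PATTERN = re.compile(r"^DST_(\d+)(?:_(.+))?$")


def destinations(env):
    """Group DST_{N} and DST_{N}_{ENV_VAR_NAME} entries by destination number."""
    found = {}
    for name, value in env.items():
        match = DST_PATTERN.match(name)
        if not match:
            continue
        number, var_name = int(match.group(1)), match.group(2)
        entry = found.setdefault(number, {"url": None, "env": {}})
        if var_name is None:
            entry["url"] = value  # DST_{N}: the destination URL itself
        else:
            entry["env"][var_name] = value  # exposed as ${ENV_VAR_NAME}
    return [found[n] for n in sorted(found)]


# Hypothetical environment, for illustration only.
example_env = {
    "DST": "multi",
    "DST_2": "boto3+s3://mybucket/myfolder",
    "DST_2_AWS_ACCESS_KEY_ID": "example access key",
    "DST_2_AWS_SECRET_ACCESS_KEY": "example secret key",
    "DST_1": "file:///mnt/backup/dst",
}

result = destinations(example_env)
print(result[0]["url"])  # → file:///mnt/backup/dst
print(result[1]["env"])
```

Under these assumptions, a restore would read from `result[0]`.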
All the dependencies you need to develop this project (apart from Docker itself) are managed with poetry.
To set up your development environment, run:
pip install pipx # If you don't have pipx installed
pipx install poetry # Install poetry itself
poetry install # Install the Python dependencies and set up the development environment
To run the tests locally, add --prebuild to autobuild the image before testing:
poetry run pytest --prebuild
By default, the image that the tests use (and optionally prebuild) is named test:docker-duplicity. If you prefer, you can build it separately before testing and remove the --prebuild flag to run the tests with the image you built:
docker image build -t test:docker-duplicity .
poetry run pytest
If you want to use a different image, pass the --image command line argument with the name you want:
# To build it automatically
poetry run pytest --prebuild --image my_custom_image
# To prebuild it separately
docker image build -t my_custom_image .
poetry run pytest --image my_custom_image
The poetry project configuration (in the pyproject.toml file) includes a section which contains the duplicity dependencies themselves. This allows us to manage those more easily and avoid future conflicts. Those are then exported into a requirements.txt file in the docker image build phase.
So, if you need to add a new duplicity dependency to be used inside the container, the correct process would be:
- Add the dependency to the poetry project with:
  poetry add --optional MY_NEW_PACKAGE
  Note that it should be marked as an optional dependency, as it will not be used in general development outside the container.
- The new optional dependency should then be added to the duplicity list in the [tool.poetry.extras] section of pyproject.toml:
  [tool.poetry.extras]
  duplicity = ["b2", "b2sdk", "boto", "boto3", "gdata", "jottalib", "paramiko", "pexpect", "PyDrive", "pyrax", "python", "requests", "duplicity", "dropbox", "python", "mediafire", "MY_NEW_PACKAGE"]