Do not attempt to renew certificate no longer used #3376

Open
yajo opened this issue May 23, 2018 · 24 comments · May be fixed by #10782
Labels
area/acme contributor/wanted Participation from an external contributor is highly requested kind/enhancement a new or improved feature. priority/P3 maybe

Comments

@yajo

yajo commented May 23, 2018

Do you want to request a feature or report a bug?

Bug

What did you do?

I booted one service in one server, behind Traefik, configured with docker backend and correctly labeled.

Everything worked fine:

  • Traefik contacted LE
  • Traefik obtained cert
  • Traefik renewed cert after a while, when needed
  • Traefik 🎸 ❤️

But, after some time, I moved the service to another, identically configured server.

What did you expect to see?

Traefik on the old server should stop renewing the cert if no container is actively using it.

What did you see instead?

These logs:

INFO[2018-05-23T16:30:05Z] Renewing certificate from LE : {Main:example.com SANs:[]} 
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Trying renewal with -1447 hours remaining
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Obtaining bundled SAN certificate
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] AuthURL: https://acme-v01.api.letsencrypt.org/acme/authz/******
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Could not find solver for: dns-01
legolog: 2018/05/23 16:30:05 [INFO][odoo.turbointernacional.com] acme: Trying to solve HTTP-01
ERRO[2018-05-23T16:31:35Z] Error renewing certificate from LE: acme: Error 403 - urn:acme:error:unauthorized - Invalid response from http://example.com/.well-known/acme-challenge/****** [83.48.28.165]: 404
Error Detail:
        Validation for example.com:80
        Resolved to:
                *.*.*.*
        Used: *.*.*.*
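
To see which domains keep failing renewal, the error lines can be tallied per host. This is only a sketch: the log content is made-up sample data in the same v1.5 format as the lines above, and the sed pattern assumes HTTP-01 failures that contain "Invalid response from http://...".

```shell
#!/bin/sh
# Sketch: count failed renewals per domain from Traefik log lines.
# The log below is hypothetical sample data, not real output.
log=$(mktemp)
cat > "$log" <<'EOF'
ERRO[2018-05-23T16:31:35Z] Error renewing certificate from LE: acme: Error 403 - urn:acme:error:unauthorized - Invalid response from http://example.com/.well-known/acme-challenge/abc [203.0.113.7]: 404
INFO[2018-05-23T16:35:00Z] Renewing certificate from LE : {Main:other.example.org SANs:[]}
ERRO[2018-05-24T16:31:35Z] Error renewing certificate from LE: acme: Error 403 - urn:acme:error:unauthorized - Invalid response from http://example.com/.well-known/acme-challenge/def [203.0.113.7]: 404
ERRO[2018-05-24T16:40:00Z] Error renewing certificate from LE: acme: Error 403 - urn:acme:error:unauthorized - Invalid response from http://other.example.org/.well-known/acme-challenge/ghi [203.0.113.9]: 404
EOF

# Keep only renewal errors, extract the host from the challenge URL,
# then count occurrences and print "host count", most failures first.
failures=$(grep 'Error renewing certificate from LE' "$log" \
  | sed -n 's|.*Invalid response from http://\([^/]*\)/.*|\1|p' \
  | sort | uniq -c | sort -rn | awk '{print $2, $1}')
echo "$failures"
```

Hosts that show up here repeatedly but no longer have a running service are the candidates for removal from acme.json.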

Output of traefik version: (What version of Traefik are you using?)

Version:      v1.5.3
Codename:     cancoillotte
Go version:   go1.9.4
Built:        2018-02-27_02:47:04PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

Docker backend, CLI flags configuration... but I don't think it's relevant for this issue, since everything else is working as expected.

@ldez ldez added area/acme kind/bug/confirmed a confirmed bug (reproducible). and removed status/0-needs-triage labels May 23, 2018
@ldez ldez assigned ldez and unassigned ldez May 24, 2018
@ldez ldez added kind/enhancement a new or improved feature. priority/P2 need to be fixed in the future and removed kind/bug/confirmed a confirmed bug (reproducible). labels May 24, 2018
@minhdanh

minhdanh commented Aug 9, 2018

I'm seeing this behavior on Traefik 1.7-rc2 as well

@kumy
Contributor

kumy commented Sep 4, 2018

Is there any known workaround for this?

Edit: Removing the affected certificate from acme.json and restarting Traefik did the trick.
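
A manual edit like this is safer with a backup and a swap via a temporary file. The following is a minimal sketch under stated assumptions: every path and the file content are placeholders, and the actual edit step is left as a plain copy that you would replace with your own change.

```shell
#!/bin/sh
# Sketch: replace acme.json with an edited copy without losing the original.
# All paths and contents here are placeholders for illustration.
dir=$(mktemp -d)
acme="$dir/acme.json"
printf '%s\n' '{"Certificates": []}' > "$acme"   # stand-in for the real file

# 1. Keep a timestamped backup next to the original.
backup="$acme.$(date +%Y-%m-%d_%H-%M-%S).bak"
cp -p "$acme" "$backup"

# 2. Write the edited content to a temp file in the same directory,
#    so the final rename stays on one filesystem.
tmp=$(mktemp "$dir/acme.json.XXXXXX")
cp "$acme" "$tmp"          # replace this line with the actual edit step

# 3. Restrict permissions before swapping in (the scripts in this thread use 600).
chmod 600 "$tmp"

# 4. Rename the edited copy over the original.
mv "$tmp" "$acme"
echo "replaced $acme, backup at $backup"
```

Traefik would still need the restart described above to pick up the changed file.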

@Southclaws

I'm seeing this too. Given the ephemerality of the services that Traefik targets, it would make sense not to attempt to renew certificates that are not used by any service.

It would be even better if those certificates were removed, maybe after some time if a service is just momentarily offline.

@MaxWinterstein

MaxWinterstein commented Feb 12, 2019

Any news on this? I also ran into an issue where my Traefik had requested a huge number of certs for non-existent frontends.

Being a Bash one-liner guy, I hacked some commands together to clean those old certs out of acme.json, using the API endpoint of the dashboard. I'm leaving this here for anyone who needs it, but please think before you run anything. I only use the Kubernetes backend.

How I removed unused certs

I take no warranty for your copy-paste job! What worked for me may fail in your environment.

Note: This could be achieved in many ways. I neither chose the shortest route nor played code golf; this is meant to be at least a little bit human-readable.

I simply jumped directly into my Traefik container and did the following:

  • Install dependencies
apk update && apk add jq curl
  • Fetch existing frontends
curl -s "https://<USERNAME>:<PASSWORD>@<TRAEFIK_DASHBOARD_URL>/api" | jq ".kubernetes.frontends" | jq "keys" | jq -r ".[]" | sed "s/\/.*//" | uniq > existing_frontends
  • Fetch existing certs
cat /acme/acme.json | jq ".Certificates" | jq ".[]" | jq ".Domain"  | jq -r ".Main" | sort | uniq > existing_certs
  • now get the certs that need to be removed
diff existing_certs existing_frontends | tail -n +4 | grep "^-" | sed "s/^-//" > certs_to_remove
  • and let jq remove them
cp /acme/acme.json /acme/acme.json.new
cat certs_to_remove | xargs -i sh -c "jq 'del(.Certificates[]| select(.Domain.Main == \"{}\"))' /acme/acme.json.new > /acme/acme.json.new2; mv /acme/acme.json.new2 /acme/acme.json.new"
  • verify the new file
cat /acme/acme.json.new | jq ".Certificates[].Domain.Main"
  • change permission
chmod 600 /acme/acme.json.new
  • backup and overwrite
cp /acme/acme.json /acme/acme.json.bak
echo "remove next # - think before you do something"
#cp /acme/acme.json.new /acme/acme.json
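
The diff step above depends on the output format of whichever diff is installed in the container. A format-independent way to get the same list is comm on sorted files. This is a sketch with made-up domain lists standing in for existing_certs and existing_frontends:

```shell
#!/bin/sh
# Sketch: list domains that have a stored cert but no matching frontend.
# The two input files hold hypothetical sample data, one domain per line.
workdir=$(mktemp -d)
printf '%s\n' old.example.com app.example.com api.example.com > "$workdir/existing_certs"
printf '%s\n' api.example.com app.example.com > "$workdir/existing_frontends"

# comm requires both inputs to be sorted; -u also drops duplicates.
sort -u "$workdir/existing_certs" -o "$workdir/existing_certs"
sort -u "$workdir/existing_frontends" -o "$workdir/existing_frontends"

# -23 suppresses lines only in the second file and lines common to both,
# leaving the domains that appear in existing_certs alone.
comm -23 "$workdir/existing_certs" "$workdir/existing_frontends" > "$workdir/certs_to_remove"
cat "$workdir/certs_to_remove"
```

The resulting certs_to_remove file has the same one-domain-per-line shape the jq removal step expects.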

After all that work I deleted my Traefik pod to clean everything up and enjoyed a cup of coffee as a reward.

I take no warranty for your copy paste job!

@puco

puco commented Feb 25, 2019

This is especially annoying when the certificates are stored in a KV store (Consul in our case), which limits the size of the acme.json object. We spin up instances on demand and tear them down after a couple of days, but the certificates stay in the file and eventually prevent new certificates from being created.

The only workaround for me here is to stop traefik, semi-manually remove the obsolete certificates, push the new file to the KV store and start traefik again.

For a URL that no longer exists, renewing the certificate makes no sense, and the certificate can be removed. If the URL is used again later, Traefik can simply request the certificate again.
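
Until that exists, the stored object's size can at least be watched. A rough sketch, assuming the object is stored gzip-compressed (as the Consul script later in this thread suggests) and assuming a 512 KiB value limit; both the limit and the sample payload are assumptions to adjust for your own store.

```shell
#!/bin/sh
# Sketch: warn when the compressed ACME payload nears a KV size limit.
# LIMIT_BYTES is an assumption (512 KiB); check the limit of your own store.
LIMIT_BYTES=$((512 * 1024))
WARN_PERCENT=80

payload=$(mktemp)
# Stand-in content; point this at a dump of the real object instead.
printf '{"Certificates": [{"Domain": {"Main": "example.com"}}]}\n' > "$payload"

# Count the bytes of the gzip stream; tr strips padding some wc variants add.
size=$(gzip -c "$payload" | wc -c | tr -d ' ')
threshold=$((LIMIT_BYTES * WARN_PERCENT / 100))

if [ "$size" -ge "$threshold" ]; then
  status="WARN"
else
  status="OK"
fi
echo "$status: $size of $LIMIT_BYTES bytes used"
```

Running such a check periodically would flag the problem before new certificates start failing to be stored.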

@strajansebastian

I can confirm this as well.
It would be nice to have an API call that can automatically delete the certs from the store (file/consul/other).

These are the logs from the traefik:v1.7.9 docker container:

{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"Renewing certificate from LE : {Main:My.Removed.Domain.TLD SANs:[]}\"\n","stream":"stdout","time":"2019-04-07T17:50:16.346915604Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Trying renewal with 688 hours remaining\"\n","stream":"stdout","time":"2019-04-07T17:50:16.347232814Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Obtaining bundled SAN certificate\"\n","stream":"stdout","time":"2019-04-07T17:50:16.34740912Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/AUTHZ_TOKEN_NOT_SHARED\"\n","stream":"stdout","time":"2019-04-07T17:50:16.659316163Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Could not find solver for: tls-alpn-01\"\n","stream":"stdout","time":"2019-04-07T17:50:16.686342867Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: use http-01 solver\"\n","stream":"stdout","time":"2019-04-07T17:50:16.686349967Z"}
{"log":"time=\"2019-04-07T17:50:16Z\" level=info msg=\"legolog: [INFO] [My.Removed.Domain.TLD] acme: Trying to solve HTTP-01\"\n","stream":"stdout","time":"2019-04-07T17:50:16.686353967Z"}
{"log":"time=\"2019-04-07T17:50:21Z\" level=error msg=\"Error renewing certificate from LE: acme: Error -\u003e One or more domains had a problem:\\n[My.Removed.Domain.TLD] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://My.Removed.Domain.TLD/.well-known/acme-challenge/ACME_CHALLANGE_TOKEN_NOT_SHARED [1.3.3.7]: \\\"\u003chtml\u003e\\\\r\\\\n\u003chead\u003e\u003ctitle\u003e403 Forbidden\u003c/title\u003e\u003c/head\u003e\\\\r\\\\n\u003cbody bgcolor=\\\\\\\"white\\\\\\\"\u003e\\\\r\\\\n\u003ccenter\u003e\u003ch1\u003e403 Forbidden\u003c/h1\u003e\u003c/center\u003e\\\\r\\\\n\u003chr\u003e\u003ccenter\u003e\\\", url: \\n\"\n","stream":"stdout","time":"2019-04-07T17:50:21.45825789Z"}

This is a script I wrote a while back to remove old/bad/unused/migrated certs from the acme.json stored in Consul. The edit itself is manual: I removed every reference to the targeted domains by hand.

After editing, the modified data has to be pushed back manually. Make sure that you don't corrupt your acme.json while editing (you'll still have the original backup at FILE_ORIGINAL_BASE64).

You need to run this on one of your Consul servers (script tested on Ubuntu 16.04). In some cases I had to reboot the entire setup to make the changes visible; I still haven't found a stable/easy/logical way of applying the update.
Take the script below as a guideline. For the first run I recommend going step by step so you understand what happens.
Also look for variables that start with CHANGE_ME and make the appropriate changes.

#!/bin/bash

# 
# IF YOU RUN THIS SCRIPT ENSURE THAT NO CERTIFICATE UPDATES ARE MADE DURING THIS PERIOD
# YOU RISK TO DELETE/OVERRIDE NEWLY GENERATED CERTIFICATES
#

BASE_CERT_PATH=traefik
CERT_PATH=${BASE_CERT_PATH}/acme/account/object

BASE_FILE_NAME=/root/consul_dump.`date +%Y-%m-%d_%H-%M-%S`
FILE_ORIGINAL_BASE64=${BASE_FILE_NAME}.1.base64
FILE_ORIGINAL_JSON=${BASE_FILE_NAME}.2.json
FILE_MODIFIED_JSON=${BASE_FILE_NAME}.3.json.modified
FILE_MODIFIED_BASE64=${BASE_FILE_NAME}.4.base64.modified

# get data out of consul - in base64 gzip encoding

# use this for HTTPS authentication
CONSUL_AUTH_PARAMS="-ca-path=/consul/tls/ca.pem -client-cert=/consul/tls/consul.pem -client-key=/consul/tls/consul-key.pem -http-addr=https://127.0.0.1:8443 -tls-server-name=CHANGE_ME_CONSUL-HA-HOSTNAME.TLD"
echo `docker exec CHANGE_ME_CONSUL-DOCKER-CONTAINER-NAME consul kv get $CONSUL_AUTH_PARAMS -base64 $CERT_PATH` > $FILE_ORIGINAL_BASE64
cat $FILE_ORIGINAL_BASE64 | base64 --decode | gzip -dc | jq . > $FILE_ORIGINAL_JSON
cp $FILE_ORIGINAL_JSON $FILE_MODIFIED_JSON

# make manual changes
vim $FILE_MODIFIED_JSON

# convert back to base64 gzip and store for push upstream
cat $FILE_MODIFIED_JSON | gzip -c | base64 -w 0 > $FILE_MODIFIED_BASE64 

# put data back to consul store
echo 'THIS IS DONE'
echo 'MANUALLY RUN: '

echo "cat $FILE_MODIFIED_BASE64 | /var/lib/docker/overlay2/CHANGE_ME_CONSUL-LAYER-THAT-CONTAINS-THE-CONSUL-BINARY/diff/bin/consul kv put $CONSUL_AUTH_PARAMS -base64 $CERT_PATH -"
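
Before running the printed push command, it may be worth confirming that the edited file survives the gzip and base64 round trip unchanged. A minimal sketch, with placeholder file names and content rather than the real Consul object:

```shell
#!/bin/sh
# Sketch: confirm an edited dump survives the gzip + base64 round trip.
# File names and content are placeholders, not the real Consul object.
dir=$(mktemp -d)
modified_json="$dir/dump.3.json.modified"
modified_b64="$dir/dump.4.base64.modified"
printf '{"Certificates": []}\n' > "$modified_json"

# Encode as in the script above, without relying on the GNU-only -w 0 flag.
gzip -c "$modified_json" | base64 | tr -d '\n' > "$modified_b64"

# Decode again and compare byte for byte with the edited file.
if base64 --decode < "$modified_b64" | gzip -dc | cmp -s - "$modified_json"; then
  result="round trip OK"
else
  result="round trip FAILED"
fi
echo "$result"
```

This only checks the encoding; running the edited file through `jq .` first, as the script does for the original dump, would also catch JSON syntax errors introduced while editing.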


@Jensderond

Jensderond commented Jul 15, 2020

For what it's worth, I created a Makefile that is tested with Traefik 2.2.x.
This is mostly based on the comment of @MaxWinterstein

acmefile = acme.json
traefik_dashboard = <TRAEFIK_DASHBOARD_URL>
auth_user = <USERNAME>
auth_password = <PASSWORD>

.SILENT: clean
.PHONY: clean
clean:
    curl -s "https://$(auth_user):$(auth_password)@$(traefik_dashboard)/api/http/routers" | jq -r ".[]" | jq ".rule" | sed "s/\"Host(\`//g;s/\`)\"//g" | uniq > existing_frontends;
    cat $(acmefile) | jq ".default.Certificates[].domain.main" | sort | uniq | sed "s/\"//g" > existing_certs;
    awk 'NR==FNR{a[$$0];next}!($$0 in a)' existing_frontends existing_certs > certs_to_remove;
    cp $(acmefile) $(acmefile).new;
    cat certs_to_remove | xargs -I'{}' -i sh -c "jq 'del(.default.Certificates[]| select(.domain.main == \"{}\"))' $(acmefile).new > $(acmefile).new2; mv $(acmefile).new2 $(acmefile).new";
    chmod 600 $(acmefile).new;
    chown traefik:docker $(acmefile).new;
    mv $(acmefile) $(acmefile).bak;
    mv $(acmefile).new $(acmefile);
    rm certs_to_remove existing_certs existing_frontends;
    docker-compose restart;
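
The sed expression in the recipe above only strips a rule that consists of a single Host matcher. For rules that combine matchers, the host names can be pulled out individually. This is a sketch on hypothetical sample rules; it handles only Host(...) matchers with a single argument.

```shell
#!/bin/sh
# Sketch: pull every hostname out of Traefik v2 router rules, one per line.
# The rules below are hypothetical samples, not output from a real instance.
rules=$(mktemp)
cat > "$rules" <<'EOF'
Host(`app.example.com`)
Host(`api.example.com`) && PathPrefix(`/v1`)
Host(`app.example.com`) || Host(`www.example.com`)
EOF

# grep -o prints each Host(`...`) match on its own line; sed strips the
# wrapper; sort -u yields a sorted, de-duplicated list.
hosts=$(grep -o 'Host(`[^`]*`)' "$rules" \
  | sed 's/^Host(`//; s/`)$//' \
  | sort -u)
echo "$hosts"
```

The sorted output can be written to existing_frontends in place of the curl/sed pipeline result.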

Figured I might as well just create a Gist

@derjohn

derjohn commented Oct 19, 2020

Wow,
that issue is still open after 2 years? With Traefik you can trigger the Let's Encrypt rate limit pretty fast: just have 5 certs in acme.json that can't be renewed because the domain's DNS now points somewhere else. I believe a non-existent domain (NXDOMAIN) does not trigger it, but domains pointing to another HTTP endpoint will trigger the rate limit.
Removing entries from acme.json requires a Traefik restart, or at least a kill -HUP, and so causes downtime.
Is this the "upsell reason" for buying traefik EE ?

@derjohn

derjohn commented Oct 21, 2020

Just a side-note: There was a related ticket, where there is an API call mentioned that should be implemented: #7082

@MarkErik

MarkErik commented Mar 14, 2021

I am also worried about hitting the rate limit for URLs that may have been pointed to other addresses.

It has been ~5 months since the last comment, have any of you found a better solution? (or is the solution still to run one of the above scripts?)

@derjohn

derjohn commented Mar 14, 2021

@MarkErik Basically it's still one of those scripts. I ran Traefik in k8s and started migrating to ingress-nginx and cert-manager: that scales better (multi-pod with certs) and has better cert management.

@Aaron-Ritter

Aaron-Ritter commented Mar 29, 2021

Just a side-note: There was a related ticket, where there is an API call mentioned that should be implemented: #7082

@derjohn it was just a suggestion but there is no concrete issue related to it, did you find anything related to that?

@205g0

205g0 commented May 12, 2021

@ldez is anyone on this issue?

I ran traefik in k8s and started migrating to ingress-nginx and cert-manager: That scales better (multi-pod with Certs) and has better cert-management.

@derjohn better cert-management? would you mind to elaborate? and how was the general setup of both nginx and cert-manager compared to Traefik 2.x?

Edit: Just found that e.g. https://voyagermesh.com/docs/v2021.04.24-rc.0/guides/certificate/delete/ has a cert delete feature, this would be also handy for Traefik.

@AndrewSav
Contributor

AndrewSav commented May 13, 2021

@derjohn better cert-management? would you mind to elaborate?

@205g0 well you just need to read up on / try the cert-manager. Certs are not buried in a json file that can be in any of the supported PVs somewhere, and instead are exposed as first class Kubernetes Secret objects, there are traceable Order / Challenge objects for acme (with lifecycle events) and crs/cert for all certs. Also there is support for Vault, self-signed and external CAs that's absent in traefik. All that amounts to better cert-management.

@205g0

205g0 commented May 13, 2021

@AndrewSav thanks! Yesterday, I had checked cert-manager but somehow I was afraid or just too lazy to set it up. Which is the best ingress to pair it with? The Kubernetes nginx or the one from Nginx Inc. or an entirely different ingress? Or just with Traefik?

@AndrewSav
Contributor

AndrewSav commented May 13, 2021

@205g0 cert-manager works well with all of them. I'm using Traefik as ingress in my Kubernetes clusters, and it's working well for me.

@MaxWinterstein

MaxWinterstein commented May 13, 2021

Can definitely recommend the cert-manager and acme-dns way. Using this to get wildcard Let's Encrypt certs refreshed works nicely for me.

ditched traefik for ambassador a while ago and never looked back.

@205g0

205g0 commented May 17, 2021

@MaxWinterstein cert-manager is def the way to go, even if it is paired with Traefik; it's better than Traefik's built-in resolver. Ambassador is on my list. I checked Contour and Gloo first because people compare them to Ambassador all the time. I got Contour running: nice docs, but it doesn't seem to be very feature-rich, though it has somewhat of a community. Gloo looks best on paper but I couldn't get it to run, so yeah... you're happy with Ambassador? Any drawbacks?

@rossnick

rossnick commented Jul 5, 2022

I filed bug #9162. It was closed as a duplicate of this one. While the issue is similar, I do not feel it's a duplicate: this issue is about a whole certificate that is no longer used, whereas mine concerned a SAN that was removed from a certificate that is still in use.

@kevinpollet kevinpollet added priority/P3 maybe and removed priority/P2 need to be fixed in the future labels Jul 6, 2022
@kevinpollet kevinpollet linked a pull request Aug 8, 2022 that will close this issue
@ldez ldez linked a pull request Jun 4, 2024 that will close this issue
2 tasks
@nmengin nmengin added the contributor/wanted Participation from an external contributor is highly requested label Jul 1, 2024
Projects
Status: Review