Proposal: Native Kubernetes LetsEncrypt Implementation #2542

Open
dtomcej opened this issue Dec 7, 2017 · 66 comments
Assignees
Labels
area/acme area/provider/k8s/ingress contributor/wanted Participation from an external contributor is highly requested kind/proposal a proposal that needs to be discussed.

Comments

@dtomcej
Contributor

dtomcej commented Dec 7, 2017

Proposal

This proposal is to discuss the possibility of implementing native Kubernetes LetsEncrypt/ACME support into Traefik.

Background

At its simplest, Traefik stores LetsEncrypt (LE) data in a JSON file. Currently, to provide High Availability (HA) support for LE, Traefik requires a Key-Value (KV) store to hold the challenge certificates, private keys, and completed certificates. When running in a managed cluster (such as Kubernetes), running an additional KV store incurs a lot of overhead and doesn't leverage the Kubernetes cluster features. Having ACME data stored in a KV store does not promote re-use or ease of use.

Suggestion

I propose that Traefik support native storage of LE certs in the Kubernetes API. This should reduce overhead, make it easier for users to properly leverage HA, and allow re-use of the requested certificates. It will also decouple Traefik from the filesystem, which is ideal in a Kubernetes environment.

Implementation

Ideally, we would use native Kubernetes objects to store LE state, and LE data.
I propose that we store the following:

Configmap:

  • The FQDN/SAN information for the certificates
  • The certificate renewal date?
  • The public portion of the completed certificate (pubkey)
  • The public portion of the challenge certificate
  • Links to the secret containing the corresponding private keys
  • Any other pertinent public data tied to the certificate

Secret:

  • The private key for the challenge certificate
  • The private key for the LE certificate

This would allow HA/dynamic introduction of custom (Non-LE) certificates if they follow that template as well.
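
A minimal sketch of the ConfigMap/Secret split listed above, written as plain Python dictionaries rather than against a real client. The function name `split_certificate_record`, the `-key` suffix, and the data keys (`domains`, `certificate`, `secretName`, `key`) are hypothetical and only illustrate the idea; nothing here is an existing Traefik format.

```python
import base64


def split_certificate_record(name, namespace, domains, cert_pem, key_pem):
    """Split one certificate into a public ConfigMap and a private Secret."""
    secret_name = f"{name}-key"  # hypothetical naming scheme
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "domains": ",".join(domains),  # FQDN/SAN information
            "certificate": cert_pem,       # public part only
            "secretName": secret_name,     # link to the private key
        },
    }
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "type": "Opaque",
        # Values under a Secret's `data` field are base64-encoded strings.
        "data": {"key": base64.b64encode(key_pem.encode()).decode()},
    }
    return configmap, secret


configmap, secret = split_certificate_record(
    "example-com",
    "traefik",
    ["example.com", "www.example.com"],
    "-----BEGIN CERTIFICATE-----\n...",
    "-----BEGIN PRIVATE KEY-----\n...",
)
```

The private key only ever appears in the Secret, so RBAC rules on Secrets can be stricter than those on the ConfigMap.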

Required changes

ACME

The ACME client would need a flag (perhaps --acme.storage.kubernetes) to let the client know that it needs to store the cert data using the Kubernetes API.

Kubernetes Provider

The Kubernetes provider needs to be updated to watch API updates for ConfigMaps, Secrets, and other objects, so that it picks up cluster changes and certs added from other nodes. We may need to force a cluster update after generating a challenge cert, so that it can be served immediately from another node. The Kubernetes provider would also need to expose the information the ACME client needs to work properly.

Note that we might want to allow for an alternate Kubernetes account to access the API, since we will have Traefik creating and destroying data through the API. Some users may want to have more RBAC control over traefik in this case.

@dtomcej dtomcej added the kind/proposal a proposal that needs to be discussed. label Dec 7, 2017
@dtomcej
Contributor Author

dtomcej commented Dec 7, 2017

Ping @containous/traefik

@errm
Contributor

errm commented Dec 8, 2017

Sounds good to me 👍

@timoreimann
Contributor

SGTM too.

One detail: we may want to implement the linkage between ConfigMaps and Secrets through an annotation rather than some extra field embedded in the ConfigMap. Apart from being simpler, it'd also allow users to update relationships more easily.
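
A minimal sketch of what an annotation-based link could look like. The annotation key `traefik.example/acme-secrets` and the helper `linked_secrets` are made up for illustration, and the comma-separated value is just one assumed way to let a single annotation reference more than one Secret.

```python
SECRET_LINK_ANNOTATION = "traefik.example/acme-secrets"  # hypothetical key


def linked_secrets(configmap):
    """Return the Secret names a ConfigMap points at via its annotation."""
    annotations = configmap.get("metadata", {}).get("annotations") or {}
    raw = annotations.get(SECRET_LINK_ANNOTATION, "")
    # Split on commas and drop empty or whitespace-only entries.
    return [name.strip() for name in raw.split(",") if name.strip()]


configmap = {
    "metadata": {
        "name": "example-com",
        "annotations": {
            SECRET_LINK_ANNOTATION: "example-com-key, example-com-challenge-key",
        },
    }
}
print(linked_secrets(configmap))
```

Because the link lives in metadata, a user could repoint it with `kubectl annotate` without touching the ConfigMap's data.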

@errm
Contributor

errm commented Dec 8, 2017

+1 for the link being in an annotation

We should also think about the namespace that secrets etc. are created in...

If Traefik has its own isolated namespace then it's fine to just use the current one... but if people are deploying a Traefik per namespace for team use or some such, it might be good to be able to override this and save into another namespace... If my impression is correct, it is a relatively common pattern to allow application pods to access secrets in their own namespace...

@dtomcej
Contributor Author

dtomcej commented Dec 8, 2017

Not sure if the link should be an annotation for the following reasons:

  1. Do we want to have multiple links in annotations? Multiple links within a ConfigMap are easily updated and managed, but we may run into issues with multiple links if we are using a single annotation.
  2. We could use metadata on the ConfigMap instead of ConfigMap contents to link, but I'm unsure which option is better.
  3. By using the ConfigMap option, you can use Kubernetes to store ACME data without using the Kubernetes provider to configure backends. This would allow Kubernetes users to use the API to store the ACME data, and the TOML file to configure static service proxying. In this case, there is no Ingress.
  4. When an Ingress triggers an update, we take the update, then update the Ingress, which triggers another update; this kind of circular update can happen 😢

Namespacing seems like a good idea. I know ConfigMaps are namespaced. I agree with @errm that the default should be the current namespace.

@timoreimann
Contributor

timoreimann commented Dec 8, 2017

@dtomcej just to be clear: I was talking about annotations on the ConfigMap, not the Ingress object. I'm trying hard to think about how that could conflict with any of the points you mentioned, but I don't see how it could.

Please let me know if you had ConfigMap annotations in mind in the first place and see issues with that.

@igoratencompass

+1 for this. It is important to link Traefik with native k8s elements like ConfigMaps and Secrets, especially bearing in mind that the etcdv2 backend is/was broken for a long time (I can see 1.5.0-rc2 solves this issue).

@igoratencompass

@dtomcej I would also like to mention the benefit of this in the case of federation. Replicating the k8s resources across federated clusters solves the LE problem, for which I can't see an easy solution atm, i.e. having the same app running in multiple clusters and still being able to use LE for managing its certs. Of course you could instead cluster at the etcd level, let's say, and have the certs stored there, BUT many people, especially in AWS, run their clusters in private subnets, which makes the networking difficult.

@dtomcej
Contributor Author

dtomcej commented Dec 9, 2017

@timoreimann Ah. Configmap annos/meta would be fine :) Sorry, I thought you meant ingress annotations.

@errm
Contributor

errm commented Dec 10, 2017

There is some prior art on this: e.g. https://github.com/jetstack/cert-manager where they are using some CustomResourceDefinitions to store the acme config stuff...

Also, there is a standard format for the secret with the keypair itself to be stored in that kubernetes provides:
https://github.com/kubernetes/kubernetes/blob/6caf34389b865be8b41811dc7c130c1006995d8a/pkg/apis/core/types.go#L4319-L4332

@emilevauge
Member

@dtomcej First of all, I love the idea :)
My two cents:

  • Traefik can persist ACME certs using 2 implementations; acme/localStore and cluster/datastore. Both implement cluster/Store and cluster/Transaction. We could definitely add a new Kubernetes specific implementation.

  • Traefik persists the whole struct acme/Account which contains the ACME registration info, the email, the private key, the current challenge certificates, the certificates. We should definitely reuse this object here :)

Again, great proposal !

@geekgonecrazy

Would love this! Being able to run in Kubernetes in HA without having to maintain an additional KV store would be great.

What would the read/write speed implications be here? If we had two instances setting certs is it going to be able to handle this?
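
On the two-instances question, one relevant property is that the Kubernetes API rejects an update carrying a stale `resourceVersion` with a 409 Conflict, so concurrent writers cannot silently overwrite each other. The sketch below simulates that compare-and-swap behaviour with an in-memory stand-in; `FakeSecretStore` and `Conflict` are hypothetical names, not part of any client library, and this says nothing about throughput or about both instances having already requested a certificate from LE.

```python
class Conflict(Exception):
    """Raised when the stored version no longer matches the caller's copy."""


class FakeSecretStore:
    """In-memory stand-in for an API that versions every object."""

    def __init__(self):
        self._objects = {}  # name -> (version, data)

    def get(self, name):
        # Objects that do not exist yet are reported as version 0.
        return self._objects.get(name, (0, None))

    def update(self, name, data, expected_version):
        current_version, _ = self.get(name)
        if current_version != expected_version:
            raise Conflict(
                f"{name}: expected v{expected_version}, found v{current_version}"
            )
        self._objects[name] = (current_version + 1, data)
        return current_version + 1


store = FakeSecretStore()

# Both instances read the same (empty) state before renewing.
version_a, _ = store.get("example-com")
version_b, _ = store.get("example-com")

store.update("example-com", "cert issued by instance A", version_a)
try:
    store.update("example-com", "cert issued by instance B", version_b)
    loser_was_rejected = False
except Conflict:
    loser_was_rejected = True  # instance B must re-read and use A's cert
```

The losing instance would then re-read the object and serve the certificate that won, instead of writing its own.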

@semekh
Contributor

semekh commented Feb 12, 2018

We've been using cert-manager and it's pretty good at what it does. It's also not LE-specific.

Suggestion: Traefik should pick up the ingress secret and reload its configurations dynamically. This way, cert-manager and/or any other similar software, can generate the secret and Traefik will happily work with them.
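
As a rough illustration of "picking up the ingress secret": an Ingress lists its certificates under `spec.tls`, where each entry carries `hosts` and a `secretName`. The helper below (a hypothetical name, operating on a manifest already parsed into a dict) maps each host to the Secret that cert-manager or any other tool would fill in. The `apiVersion` shown is the current one; at the time of this comment Ingress lived under `extensions/v1beta1`.

```python
def tls_secrets_by_host(ingress):
    """Map each TLS host of an Ingress to the Secret that holds its cert."""
    mapping = {}
    for entry in ingress.get("spec", {}).get("tls", []):
        secret_name = entry.get("secretName")
        if not secret_name:
            continue  # no Secret referenced for these hosts
        for host in entry.get("hosts", []):
            mapping[host] = secret_name
    return mapping


ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {
        "tls": [
            {
                "hosts": ["example.com", "www.example.com"],
                "secretName": "example-com-tls",
            }
        ]
    },
}
print(tls_secrets_by_host(ingress))
```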

@The-Loeki

#2439 implements that.

I do have to say though cert-manager's ACME is not as good as Traefik's; we've had some weird issues with it and the list of supported DNS providers is less than stellar.

@cdrage

cdrage commented Mar 21, 2018

So #2439 was merged, is there any progress on this issue? I'd like to dive into it and see what I can contribute / help out with.

@timoreimann
Contributor

AFAIU, a few implementation options are on the table. We should agree on one particular approach; after that, the task is free to be picked up by anyone who feels committed.

The suggestion made in #2542 (comment) sounds fair and nicely decoupled to me. @dtomcej any thoughts?

@dtomcej
Contributor Author

dtomcej commented Mar 22, 2018

You bet. We were just waiting for the ACME rewrite. Yes, we should monitor for secrets created with a prefix, just like we do for ingress classes.

But yes, just monitor, and dynamically add.

The question, @timoreimann: do we want to decouple the cert/secret loading from the Ingress objects altogether?

Right now we load the secret based on the ingress, but if we were doing native support, we could separate the two.

Thoughts?

@kodmaskinen

Any news on this?

@ekarlso

ekarlso commented Jun 12, 2018

Any news on this? +1 @andreassundstrom

@scher200

scher200 commented Aug 27, 2018

From what I read in the docs (the note):
[screenshot: not-supported-traefik-certs-store-as-secrets]
this is planned. Does anyone know when we can expect this awesome implementation? ^

@MadWombat

This would be absolutely lovely. I am currently running a consul cluster with its only purpose being acme certificate storage. Would love to get rid of it. Anything I can do to help move this along?

@rudolphjacksonm

We're looking into implementing Traefik in our Kubernetes cluster and the prospect of creating a dedicated k/v store is a bit heartbreaking.

I'm sure this isn't a simple change to implement and will take some time--in the meantime has anyone seen a write-up or any documentation on how to get Traefik to use secrets generated by cert-manager?

@ReillyTevera

@benjamin-bergia When running in KV mode Traefik uses the KV store itself to perform leader election. This ensures that only one instance at a given time is able to perform LE renewals and also ensures that all of the other instances are properly notified when those certificates are updated/modified.

When configured to persist LE certificates to a file system neither of those safeties are in place. Multiple Traefik instances could attempt to do certificate renewals at the same time. The filesystem handler isn't written for concurrent writing so this could lead to corruption. Additionally Traefik doesn't support inotify for the LE file. This means that even if only one LE node does the renewal none of the other ones will detect that the file has been modified and reload the certs.

The only scenario where it would be okay to use a persistent volume to store the LE file is if you are running only a single Traefik node. If that is not the case, you will need to either use a KV store for proper cluster behavior or load LE certificates from another source (i.e. use cert-manager).

Given that getting a HA KV cluster up and running (and properly monitored/backed up etc) is fairly non-trivial I'd recommend using cert-manager instead. While the docs for setting it up can be a little lacking the end result is fairly reliable and lightweight.

@benjamin-bergia

@ReillyProcentive Thank you for the explanation. Is this proposal still being considered?

@ReillyTevera

@benjamin-bergia Maybe. I'm not a Traefik dev so I can't say for sure.

I will say though that in my personal opinion it would be better to focus on creating some high quality documentation on using cert-manager and Traefik together rather than implementing the proposal.

@borland667

> @benjamin-bergia Maybe. I'm not a Traefik dev so I can't say for sure.
>
> I will say though that in my personal opinion it would be better to focus on creating some high quality documentation on using cert-manager and Traefik together rather than implementing the proposal.

Has such a doc been written somewhere by now? I've been trying to find documentation on how to set up a Traefik Ingress with cert-manager and Let's Encrypt, but most of it is for nginx.

@nsteinmetz

@borland667 I wrote this in French based on what has been said before - hopefully yaml files would be sufficient for you.

https://www.cerenit.fr/blog/kubernetes-ovh-traefik-cert-manager-secrets/

@borland667

> @borland667 I wrote this in French based on what has been said before - hopefully yaml files would be sufficient for you.
>
> https://www.cerenit.fr/blog/kubernetes-ovh-traefik-cert-manager-secrets/

Thanks Nicolas!!

@tlvenn

tlvenn commented Sep 18, 2019

In light of the V2 release, I am wondering what the plan is for this feature. Anyone using a k8s dynamic provider would expect Traefik to be k8s native end to end; therefore, when shared storage is needed, the k8s API should be used to store it using ConfigMaps & Secrets, which is what this proposal is all about.

A pull request (#3088) was even created to address this but closed without much further insight as to what the plan is.

One value of Traefik is that it takes care of the certificates. Adding a moving part such as cert-manager kind of eclipses some of Traefik's value, and not all DNS providers supported by Traefik are supported by cert-manager; OVH comes to mind, for example.

@emilevauge @dtomcej any update on this please ?

@geekgonecrazy

I think, based on their enterprise edition which includes something like this... this won’t make it into the priorities any time soon, if ever. I can’t blame them at all for that.

Just my observation :)

@donnyv12

Just ran into this particular issue with traefik in my kubernetes cluster. It's kind of a bummer to not be able to generate a kubernetes secret with the cert that can be consumed by Traefik for SSL certs. This is a similar pattern to what Cert Manager is doing. I agree with others above who suggested making the documentation more clear on how to use Traefik with Cert-man, but part of why I chose traefik was to NOT have to maintain cert-man as well.

@ichord

ichord commented Nov 2, 2019

This would make LetsEncrypt a real out-of-the-box feature, since maintaining cert files is the biggest trouble when configuring for k8s.

We could use cert-manager, but then the shiny LetsEncrypt feature is meaningless (at least for me).

@jayjun

jayjun commented Nov 27, 2019

I’m wary about mentioning competitors, but maybe it can guide Traefik on how to improve Kubernetes support.

Voyager has built-in Let’s Encrypt support and runs on multiple replicas without a separate store, because it ships a Kubernetes operator that can renew certificates. Voyager only runs on Kubernetes though so it’s expected to be better integrated.

@nsteinmetz

For traefik2, I updated the tutorial => https://www.cerenit.fr/blog/kubernetes-ovh-traefik2-cert-manager-secrets/

As said in #5792, I can provide the translation if needed.

@jayjun if you don't mind delegating certificates to cert-manager, you have a scalable solution with Traefik and multiple replicas.

It was more transparent with Traefik V1 and the use of the ingress-shim from cert-manager but it's still very easy with Traefik V2. You just need to create the certificate object (till annotations are again supported in Traefik V2 for the Ingress(CRD) provider)
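
A minimal sketch of the "create the certificate object" step, assuming cert-manager's current `cert-manager.io/v1` API (older releases used a different API group), Traefik v2's `traefik.containo.us/v1alpha1` IngressRoute CRD, a `ClusterIssuer`, and an entry point called `websecure`. The helper name and the `-tls` suffix are hypothetical. The only point is that both objects reference the same Secret name.

```python
import json


def certificate_and_route(name, namespace, host, issuer):
    """Build a cert-manager Certificate and a Traefik IngressRoute that
    share one TLS Secret."""
    secret_name = f"{name}-tls"  # hypothetical naming scheme
    certificate = {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": secret_name,  # cert-manager writes the cert here
            "dnsNames": [host],
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        },
    }
    ingress_route = {
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": ["websecure"],
            "routes": [
                {
                    "match": f"Host(`{host}`)",
                    "kind": "Rule",
                    "services": [{"name": name, "port": 80}],
                }
            ],
            "tls": {"secretName": secret_name},  # Traefik reads the cert here
        },
    }
    return certificate, ingress_route


certificate, ingress_route = certificate_and_route(
    "web", "default", "example.com", "letsencrypt-prod"
)
print(json.dumps(certificate, indent=2))
```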

@JumpMaster

JumpMaster commented Mar 7, 2021

This has been implemented by PR 7351. I'm using this successfully with Traefik 2.4.5 and 2.4.6.

I also fear that this has been left unimplemented in order to keep HA a TraefikEE-exclusive feature. But hopefully, with this PR public, it can no longer be ignored.

@SantoDE
Collaborator

SantoDE commented Mar 8, 2021

@JumpMaster just to quickly jump in:

no, that's not the reason :) We see your PR and we have already actively spent some work on it. However, as you can see, it's marked as needs-design-review, which only indicates that this is a huge change that needs careful consideration first. We took a similar approach with other recent PRs, e.g. the integration of Consul Connect.

So, tldr: no worries. It's not about ignoring something but rather about taking the time it deserves.

@tfny tfny added the contributor/wanted Participation from an external contributor is highly requested label Aug 29, 2022
@tfny
Contributor

tfny commented Sep 9, 2022

Because we are asking for help, we figure it is only fair to share where we are on this subject. This comment is a summary of the discussions, both internal and on this thread to help out anyone interested in contributing to the design or implementation.

We, the current maintainers of Traefik Proxy, believe there is a slight discrepancy between the original expectation for this proposal and the implementation in the linked PR. We believe the solution could be a compromise between the two. So let's get to it piece by piece:

The original proposal
The original proposal was based on a use case from Traefik v1.x, because it provided an ACME HA feature. The proposal made sense back then: store the certificates in secrets without requiring a file, which is a better fit for a Kubernetes environment.

When we moved to v2.x, the ACME HA feature was dropped because it caused more harm than good, both in code complexity and configuration, and it had a low user base. But removing ACME HA from v2 meant that we couldn’t deliver a simple certificate store that uses secrets, unless we are okay with things breaking when running more than one instance of Traefik Proxy in production. Consensus might be that we don't care if things break when running more than one instance of Traefik Proxy; but more on that later.

The PR 7351
At this point you already know the main issue with PR 7351: it proposed to store all certificates belonging to a resolver in the same secret.
Kubernetes secrets are not a storage per se, and they don't try to be one, so it's expected that they will fail in many scenarios when you attempt to use them like that. For example, the amount of data you can store is very limited.
That is not to diminish or dismiss the work of the original author; on the contrary, we really appreciate the effort and time he dedicated to this, but in the end it could not be used as is.
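
To make the size constraint concrete: the Kubernetes documentation limits an individual Secret to 1 MiB. The sketch below approximates how quickly a single shared Secret fills up. The helper names are hypothetical, and the 8 KiB per entry is an assumed, illustrative figure rather than a measurement of real ACME data.

```python
import base64

SECRET_SIZE_LIMIT = 1024 * 1024  # 1 MiB per Secret


def encoded_size(entries):
    """Approximate stored size: key names plus base64-encoded values."""
    return sum(
        len(key) + len(base64.b64encode(value)) for key, value in entries.items()
    )


def fits_in_one_secret(entries, limit=SECRET_SIZE_LIMIT):
    return encoded_size(entries) <= limit


def fake_entries(count, entry_bytes=8 * 1024):
    """Illustrative payload only; entry_bytes is an assumption."""
    return {f"cert-{i}": b"x" * entry_bytes for i in range(count)}


print(fits_in_one_secret(fake_entries(10)))   # True
print(fits_in_one_secret(fake_entries(200)))  # False
```

With one Secret per certificate the same check applies to each entry on its own, so the limit stops depending on how many domains a resolver serves.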

Alternatives
Finally, we decided to explore what our alternatives would be, and realized this:
Rework PR 7351
Even if we circumvent the major problem by storing one secret per certificate, and rework the certificate storage and Ingress/IngressRoute mechanisms to accommodate that, we still have the scalability issue.
We would have to document it thoroughly, but even then we thought users would run it with multiple instances anyway.

Call for the King
Another solution would be to improve our integration with Cert Manager while also providing more guides and documentation around that. Potential improvements include adapting our own CRDs and making use of their CRDs.

Your turn
We hope this provides a good summary for anyone jumping in with a fresh or renewed approach to the use case and proposal. And if you already have ideas to solve the issues presented in any of the "Alternatives" studies or even a new one to propose, don't hesitate to share with us!

Side note on ACME & HA
Even though this is not exactly on point, it’s a linked topic: handling an HA feature for ACME was also on the table, but this was definitely not the best option for us. First, it would introduce a lot of complexity in the code again (specifically distributing the ACME challenges, which was never stable back then). Second, it would only be available in Kubernetes, and even then, quoting the API reference: "Note: Replacing a resource object may not result immediately in changes being propagated to downstream objects. For instance, replacing a ConfigMap or Secret resource will not result in all Pods seeing the changes unless the Pods are restarted out of band." That would leave every other supported platform out :(
For the first reason alone, we believe this is not a good idea; plus, we would be solving / replicating a lot of the problems that Cert Manager, Traefik Enterprise & Hub already handle very well.

@tfny
Contributor

tfny commented Sep 9, 2022

Tagging @koenbollen as you might be interested in this.

@PatrickLaabs

Hi everyone,
I am a little bit late to the show... there was a lot to do in the last couple of months 😄

I guess the 'Call for the King' might be the best option, since - as far as I know - many companies and users are now using cert-manager for managing their certs with lets encrypt.

A good guide on how to get started with cert-manager along with Traefik might be helpful.

If there's no one currently working on it, I'd like to tackle this one.
Maybe I'll find a way for a good implementation inside Traefik which will only need some good configuration on the deployment side.

Any thoughts on this? 😄

@nmengin
Contributor

nmengin commented Nov 22, 2022

Hello @PatrickLaabs,

I agree with you, a good guide might be helpful for a lot of users.
We'll gladly review your PR. 👍

@mloiseleur
Contributor

@PatrickLaabs

If it can help, there was an attempt in PR #9220. See here.

Nowadays, there is a good getting started guide on Kubernetes, so you can probably skip / remove the installation part and focus on TLS with Traefik native LE and with Cert Manager.

@PrivatePuffin

> Another solution would be to improve our integration with Cert Manager while also providing more guides and documentation around that. Potential improvements include adapting our own CRDs and making use of their CRDs.

This would be my personal preferred option.
Cert-Manager has slowly but surely been moving towards being, basically, one of the industry standards for certificates on Kubernetes. Spending scarce development effort reinventing part of that wheel seems completely silly.

@PatrickLaabs

PatrickLaabs commented Mar 9, 2023

Hey,
sorry for the late response.

Yes, I totally agree that cert-manager is more or less the standard way to handle certificates.
I am currently working on the implementation on our internal clusters.

I will take this issue now, and will update the documentation about the implementation steps / todos that need to be done for generating certificates with cert-manager and handing them over to Traefik.

Edit:
Well, I cannot use /assign. Can anyone assign this issue to me, please? 😄

@PrivatePuffin

> Hey, sorry for the late response.
>
> Yes, I totally agree that cert-manager is more or less the standard way to handle certificates. I am currently working on the implementation on our internal clusters.
>
> I will take this issue now, and will update the documentation about the implementation steps / todos that need to be done for generating certificates with cert-manager and handing them over to Traefik.
>
> Edit: Well, I cannot use /assign. Can anyone assign this issue to me, please? 😄

Great to hear.

For native Ingress-wide support, it just needs an annotation, so I think that doesn't need any work from Traefik's side.

But for IngressRoute and/or an implementation directly in the Traefik config and such, you might indeed want to work on an implementation as well :)

@pcellix

pcellix commented Mar 20, 2023

Hi,

I have a question. Do you think it would be possible to make Traefik compatible with the previous API? That way the annotation could be reused.
For example:
traefik.ingress.kubernetes.io/router.tls.certresolver: default
After the cert-manager integration, Traefik could be configured in the following way:

  - "--certificatesresolvers.default.acme.certmanager=true"
  - "--certificatesresolvers.default.acme.httpchallenge.entrypoint=web"
  - "--certificatesresolvers.default.acme.secretprefix=devcluster"

That way, a lot of us who already have the infrastructure running could reuse the existing setup without breaking anything (i.e. without making changes to our Ingresses). I think it would be the best solution, as it would require the least amount of effort to modify and would still use cert-manager.
Please let me know your thoughts.

Thank you,

Projects
Status: On-Hold