Support multiple Egress IPs per Egress #6591
Labels
area/transit/egress
Issues or PRs related to Egress (SNAT for traffic egressing the cluster).
kind/feature
Categorizes issue or PR as related to a new feature.
reported-by/end-user
Issues reported by end users.
Problem statement
Currently an Egress resource only supports a single Egress IP. Workloads selected by the Egress will always use this Egress IP for all outgoing connections to the external network.
Supporting multiple Egress IPs per Egress can enable the following use cases:
While it is possible to create multiple Egresses (each with its own Egress IP) selecting the same workload, only one Egress IP will "win" for each Node in the cluster: all connections originating from selected Pods on the Node will use the same Egress IP. Moreover, it is likely that the same Egress IP will "win" on all Nodes in the cluster (it is determined by the order in which the Egress resources are processed by the Antrea Agent, and the order should be the same across all Nodes / Agents in the cluster). So in practice, a single Egress IP will be used, and creating multiple Egresses doesn't really achieve anything.
Changes to the Egress API
At the very least, we would need to allow multiple `externalIPPool` values and multiple `egressIP` values. Making a singular field plural in a K8s API is fairly "common", and well-documented: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#making-a-singular-field-plural. We should follow these directions when making changes to the Egress CRD.
In the first use case described above, we want to use a single ExternalIPPool and allocate more than one Egress IP from the pool.
In the second use case described above, we want to use multiple ExternalIPPool(s) and allocate at least one Egress IP from each pool.
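As a rough illustration of the singular-to-plural directions linked above, the sketch below keeps a singular field and a plural field in sync. The `egressSpec` type, the `EgressIPs` field name and the `normalize` helper are all hypothetical and only stand in for whatever the final CRD would define; the rule shown (the singular value must equal the first plural value) is a simplified reading of the K8s guidance, not the full update-handling logic.

```go
package main

import (
	"errors"
	"fmt"
)

// egressSpec is a hypothetical, trimmed-down stand-in for the Egress spec.
// EgressIPs is an assumed name for the proposed plural field.
type egressSpec struct {
	EgressIP  string   // singular field, kept for clients unaware of the plural one
	EgressIPs []string // hypothetical plural field
}

// normalize keeps both fields consistent: whenever either one is set,
// EgressIP must end up equal to EgressIPs[0].
func normalize(s *egressSpec) error {
	switch {
	case s.EgressIP == "" && len(s.EgressIPs) == 0:
		// Nothing requested explicitly; allocation is left to the pool(s).
		return nil
	case len(s.EgressIPs) == 0:
		// Only the singular field is set (older client).
		s.EgressIPs = []string{s.EgressIP}
	case s.EgressIP == "":
		// Only the plural field is set (newer client).
		s.EgressIP = s.EgressIPs[0]
	case s.EgressIP != s.EgressIPs[0]:
		return errors.New("egressIP must match egressIPs[0]")
	}
	return nil
}

func main() {
	old := egressSpec{EgressIP: "10.10.0.8"}
	if err := normalize(&old); err != nil {
		panic(err)
	}
	fmt.Println(old.EgressIPs)
}
```

The same pattern would apply to a plural form of `externalIPPool`.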
Taken together, these use cases mean that we may have `len(egressIPs) > len(externalIPPools)`. We may need a new CRD field to tell Antrea how many IPs to allocate from each pool (same count would apply across all pools).
Then there is the "topology-aware" part of the API, which would rely on the `topology.kubernetes.io/zone` Node label.
The "zone" of each Egress IP could be determined by the Node to which it is assigned.
The Egress resource should let the user specify that Egress IPs in the same zone should be preferred. Additionally, a user may want to be able to express that if an Egress IP is local to the Node on which a selected workload Pod is running, the local Egress IP should always be used. If multiple Egress IPs have the same preference level, one will be selected at random.
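The preference order described above could be sketched as follows. This is only an illustration under assumptions: the `assignedIP` type and the `preference`, `candidates` and `pick` helpers are hypothetical names, and the sketch assumes that both the "prefer local Node" and "prefer same zone" options are enabled for the Egress.

```go
package main

import (
	"fmt"
	"math/rand"
)

// assignedIP is a hypothetical record of an Egress IP and the Node it is
// assigned to; Zone is taken from that Node's topology.kubernetes.io/zone label.
type assignedIP struct {
	IP   string
	Node string
	Zone string
}

// preference ranks an Egress IP for a Pod: 2 if the IP is local to the
// Pod's Node, 1 if it is in the same zone, 0 otherwise.
func preference(ip assignedIP, podNode, podZone string) int {
	switch {
	case ip.Node == podNode:
		return 2
	case ip.Zone != "" && ip.Zone == podZone:
		return 1
	default:
		return 0
	}
}

// candidates returns every Egress IP sharing the highest preference level.
func candidates(ips []assignedIP, podNode, podZone string) []assignedIP {
	best := -1
	var out []assignedIP
	for _, ip := range ips {
		p := preference(ip, podNode, podZone)
		if p > best {
			best = p
			out = out[:0] // a better level was found: drop lower-ranked IPs
		}
		if p == best {
			out = append(out, ip)
		}
	}
	return out
}

// pick selects one of the best candidates at random; ok is false when
// no Egress IP is available at all.
func pick(ips []assignedIP, podNode, podZone string) (ip assignedIP, ok bool) {
	c := candidates(ips, podNode, podZone)
	if len(c) == 0 {
		return assignedIP{}, false
	}
	return c[rand.Intn(len(c))], true
}

func main() {
	ips := []assignedIP{
		{IP: "10.10.0.8", Node: "node-b", Zone: "zone-1"},
		{IP: "10.10.0.9", Node: "node-c", Zone: "zone-2"},
	}
	if ip, ok := pick(ips, "node-a", "zone-1"); ok {
		fmt.Println(ip.IP)
	}
}
```

In this example the Pod runs on `node-a` in `zone-1`, so only the IP assigned to `node-b` (same zone) is a candidate; with several IPs in that zone, one of them would be chosen at random.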
Additionally, it would be nice to have the concept of "anti-affinity" for Egress IPs allocated from the same ExternalIPPool, so that they can be assigned to different Nodes whenever possible. Note that today, assignment of Egress IPs to specific Nodes already requires iterating over all Egress resources for every change (in particular, this is necessary to implement `maxEgressIPsPerNode` correctly, see #4627), so I believe that this should be straightforward to implement, without impacting performance.
Alternative considered
Rather than support multiple IPs per Egress, we could also change how the implementation handles multiple Egresses applied to the same workload (see description above). However, @tnqn has made the point that it would be harder to implement (and possibly less user-friendly as well), as the Egress controller would no longer be able to process Egress resources independently from each other.
Open questions