
Support EndpointSlice addressType "FQDN" #10080

Open
ChristianAnke opened this issue Jun 13, 2023 · 33 comments
Labels
kind/bug, lifecycle/frozen, needs-priority, needs-triage, triage/needs-information

Comments

@ChristianAnke

When configuring an EndpointSlice with addressType "FQDN", the object is accepted and set up correctly by Kubernetes, as documented here:
https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/

Currently the following configuration is accepted, but it does not work when accessing the Ingress endpoint:

apiVersion: v1
kind: Service
metadata:
  name: reverse-proxy
spec:
  ports:
    - name: https
      port: 443
      targetPort: 443
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: reverse-proxy-1
  labels:
    kubernetes.io/service-name: reverse-proxy
addressType: FQDN
ports:
  - name: https
    appProtocol: https
    protocol: TCP
    port: 443
endpoints:
  - addresses:
      - "others.org"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  name: reverse-proxy
spec:
  rules:
    - host: myself.org
      http:
        paths:
          - backend:
              service:
                name: reverse-proxy
                port:
                  number: 443
            pathType: Prefix
            path: /foo

Error when accessing URL:

[lua] balancer.lua:348: balance(): error while setting current upstream peer [others.org]:443: invalid IPv6 address while connecting to upstream, client: xxx.xxx.xxx.xxx, server: myself.org, request: "GET /foo HTTP/2.0", host: "myself.org"

Requires Kubernetes Version: v1.21
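For anyone reproducing this: the EndpointSlice above is accepted by the API server, and its creation and association with the Service can be confirmed with something like the following (a sketch; the names match the manifest above):

kubectl get endpointslice reverse-proxy-1 -o yaml
kubectl get endpointslices -l kubernetes.io/service-name=reverse-proxy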

@ChristianAnke ChristianAnke added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 13, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 13, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan
Contributor

longwuyuan commented Jun 13, 2023

@ChristianAnke, I did these steps:

  • kubectl create deployment test0 --image nginx:alpine
  • kubectl expose deployment test0 --port 80
  • kubectl create ingress test0 --class nginx --rule test0.mydomain.com/"*"=test0:80
  • curl --resolve test0.mydomain.com:80: http://test0.mydomain.com

I was able to get a response code of 200.

So can you write your own instructions based on the above commands and the manifests they produce, using the flag --dry-run=client? Edit the manifests as required, and also provide the curl command that does not get a 200 response. Then add the output of commands like kubectl logs $controllerpod.

Then copy/paste the complete commands, instructions, and manifests for all related objects, so someone can reproduce the problem you are reporting.

Next, the new-issue template asks questions so that there is data available to analyse the reported problem. You have not answered any of them; there is no info even on the controller version etc. So please edit your issue description and kindly answer the questions asked in the new-issue template. Please format the information in markdown and code snippets.
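For example, the manifests for the objects created in the steps above can be captured with --dry-run=client and then edited (a sketch; the names and hostname are the same placeholders as above):

kubectl create deployment test0 --image nginx:alpine --dry-run=client -o yaml > deployment.yaml
kubectl expose deployment test0 --port 80 --dry-run=client -o yaml > service.yaml
kubectl create ingress test0 --class nginx --rule test0.mydomain.com/"*"=test0:80 --dry-run=client -o yaml > ingress.yaml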

@ChristianAnke
Author

@longwuyuan , thanks for the answer.

I provided a Kubernetes manifest on purpose, reflecting what is required to reproduce the issue. I do not understand why you come up with a completely different setup than the one I provided.

Furthermore, I did fill out the template with the questions asked. I just removed the commented-out parts because I had no idea how they were meant to be used, since nothing of the template was visible in preview mode. The template is this:

<!-- What do you want to happen? -->

<!-- Is there currently another issue associated with this? -->

<!-- Does it require a particular kubernetes version? -->

<!-- If this is actually about documentation, uncomment the following block -->

<!-- 
/kind documentation
/remove-kind feature
-->

@longwuyuan
Contributor

Understood.

  • the goal is for a reader to be able to reproduce the problem
  • the template for a new issue asks questions that are relevant for a reader completely unaware of environment or context
  • your manifest helpfully covers 3 objects
  • your manifest assumes multiple other objects, ranging from the controller version to the app deployment, etc.
  • the fewer the assumptions, the better the chances of reproducing the problem
  • any chance you can write a manifest that someone can just copy/paste into a cluster created using kind or minikube? Providing the curl request along with the manifests for creating the deployment, service, and ingress will help

@tombokombo
Contributor

Hi @ChristianAnke, why would you play with the low-level EndpointSlices API? Have you tried a Service of type ExternalName? https://kubernetes.io/docs/concepts/services-networking/service/#externalname
It should work, see https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/endpointslices.go#L55
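A minimal ExternalName Service along the lines of that suggestion might look like this (a sketch, not taken from the issue; it reuses the Service name and hostname from the report above):

apiVersion: v1
kind: Service
metadata:
  name: reverse-proxy
spec:
  type: ExternalName
  externalName: others.org

With ExternalName, cluster DNS returns a CNAME for the Service name, so no EndpointSlice has to be created by hand.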

@longwuyuan
Contributor

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jun 17, 2023
@github-actions

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any question or request to prioritize this, please reach out on #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 18, 2023
@BulatSaif

I am verifying that the issue is still present (kubernetes-version=1.27.4). I ran the configuration from the description; here is the full log output:

curl myself.org/foo
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>

ingress logs:

2023/09/06 19:49:55 [error] 1159#1159: *924559 [lua] balancer.lua:348: balance(): error while setting current upstream peer [others.org]:443: invalid IPv6 address while connecting to upstream, client: 10.244.0.1, server: myself.org, request: "GET /foo HTTP/1.1", host: "myself.org"

2023/09/06 19:50:00 [error] 1159#1159: *924559 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.0.1, server: myself.org, request: "GET /foo HTTP/1.1", upstream: "https://0.0.0.1:80/foo", host: "myself.org"
10.244.0.1 - - [06/Sep/2023:19:50:00 +0000] "GET /foo HTTP/1.1" 504 160 "-" "curl/7.68.0" 77 5.000 [namespace-reverse-proxy-443] [] 0.0.0.1:80 0 5.000 504 9703016b67d175428a4f90615eee684f

The line upstream: "https://0.0.0.1:80/foo" looks wrong; it should be https://[IP of others.org]:443/foo.
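One way to confirm what the controller actually sees for that Service (a sketch; the ingress-nginx namespace and the controller deployment name are assumptions that may differ per install):

kubectl get endpointslices -l kubernetes.io/service-name=reverse-proxy -o yaml
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller | grep reverse-proxy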

@Ghilteras

Ghilteras commented Mar 6, 2024

@tombokombo you can't use an ExternalName Service when exposing TCP traffic; you must use an EndpointSlice, and for some reason the FQDN is treated as an IPv6 address.

@longwuyuan is there any chance this can be fixed?

[error] 38#38: *2744 stream [lua] tcp_udp_balancer.lua:196: balance(): error while setting current upstream peer [my.foo.fqdn.com]:6379: invalid IPv6 address while connecting to upstream, client: 100.109.183.192, server: 0.0.0.0:6379, bytes from/to client:0/0, bytes from/to upstream:0/0

@longwuyuan
Contributor

longwuyuan commented Mar 7, 2024

I don't fully understand the fine details that the question implies. But if you are asking whether an EndpointSlice can be created manually so that the controller picks it up, in lieu of its own function to get EndpointSlices, then as a feature that is not likely in the near future.

It will also help to know, in layman's terms, what the bigger-picture problem is that is blocking use of the ingress-nginx controller's functions and that would get fixed if you create an EndpointSlice and make the controller use it for routing. Hoping for some elaboration on the reference to a "TCP service" and address type "FQDN" etc. Kindly elaborate on the end goal.

@rteng1

rteng1 commented Mar 7, 2024

@longwuyuan

I'm working with @Ghilteras on this

"TCP service" is as per https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/, in this case is example-go service. In our case, it's my-proxy service as code refs below.

Because we want to use the service as a proxy to a FQDN, we created a k8s service that has no selectors, and an endpointSlice of type FQDN that maps to the service (which hopefully creates the endpoints for the "tcp" service). But we are getting invalid IPv6 address while connecting to upstream error which suggests the endpoint slice is not creating the endpoints correctly because of the FQDN address type

Code refs:

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  6379: "default/my-proxy:6379"
---
apiVersion: v1
kind: Service
metadata:
  name: my-proxy
spec:
  ports:
    - name: tcp
      port: 6379
      targetPort: 6379
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-proxy
  labels:
    kubernetes.io/service-name: my-proxy
    kubernetes.io/managed-by: manual
addressType: FQDN
ports:
  - name: redis
    appProtocol: tcp
    protocol: TCP
    port: 6379
endpoints:
  - addresses:
      - clustercfg.xxxxx.cache.amazonaws.com

@longwuyuan
Contributor

Looks like you want to host a proxy inside the cluster, listening on port 6379, and you are expecting that a connection to this LB:6379 should in turn connect to an AWS Redis instance.

  • What is this proxy software (haproxy/custom software)?

  • Are you already aware that the tcp/udp feature is not an upstream Kubernetes spec but a feature that this project implements, just so that users can send TCP/UDP traffic instead of just HTTP/HTTPS/GRPC? So it is not Ingress-API routing.

  • The controller switched to using EndpointSlices for routing related to Ingress objects. In my opinion that same codepath is not traversed for tcp/udp traffic routing; I could be wrong. But did you base this design of an EndpointSlice for a K8S Service object on that? In any case, I think a new feature of that nature will not get worked on, because the implementation of tcp/udp traffic routing is expected to change design.

  • I think people look for a dedicated redis proxy, just like people write postgres/mysql proxies. Such a mysql/postgres-targeted proxy has the destination configured inside it and hence establishes a connection of its own, using its own lookup. It seems you are attempting to provide a destination for TCP traffic outbound from this pod called "my-proxy" by creating a K8S object like an EndpointSlice, and thus helping the pod avoid some name resolution tasks. There are many implications to this. Why would you choose to expose a TCP socket on the LB of a K8S cluster only to reach an AWS Redis instance on the internet? The use case is unclear.

@rteng1

rteng1 commented Mar 8, 2024

@longwuyuan I think there might be a misunderstanding so let me address your questions in a different order:

I think people look for a dedicated redis proxy, just like people write postgres/mysql proxies. Such a mysql/postgres-targeted proxy has the destination configured inside it and hence establishes a connection of its own, using its own lookup. It seems you are attempting to provide a destination for TCP traffic outbound from this pod called "my-proxy" by creating a K8S object like an EndpointSlice, and thus helping the pod avoid some name resolution tasks. There are many implications to this. Why would you choose to expose a TCP socket on the LB of a K8S cluster only to reach an AWS Redis instance on the internet? The use case is unclear.

What is this proxy software (haproxy/custom software)?

The Redis (ElastiCache) is only reachable inside the k8s cluster VPC (our ElastiCache shares the same VPC as the k8s cluster); we're trying to EXPOSE the ElastiCache for access outside the VPC.

As mentioned, the proxy is just a normal k8s Service (my-proxy); it is exposed through another LoadBalancer Service, as in https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
    - name: redis
      port: 6379
      targetPort: 6379
      protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

then proxying through the tcp-services configmap via the args used to run the nginx controller:

 args:
    - /nginx-ingress-controller
    - --tcp-services-configmap=ingress-nginx/tcp-services

and we make the NLB public ("external: false") so that Redis is reachable from outside the VPC.

Are you already aware that the tcp/udp feature is not an upstream Kubernetes spec but a feature that this project implements, just so that users can send TCP/UDP traffic instead of just HTTP/HTTPS/GRPC? So it is not Ingress-API routing.

The controller switched to using EndpointSlices for routing related to Ingress objects. In my opinion that same codepath is not traversed for tcp/udp traffic routing; I could be wrong. But did you base this design of an EndpointSlice for a K8S Service object on that? In any case, I think a new feature of that nature will not get worked on, because the implementation of tcp/udp traffic routing is expected to change design.

That's interesting - we feared that might be the case, but are you saying that basically "ingress for TCP traffic to the k8s cluster" will never be supported because the k8s spec says so?

@longwuyuan
Contributor

longwuyuan commented Mar 8, 2024

  • Ah, thanks. So you have a Redis ElastiCache and you want public internet clients to make calls to it. So instead of a native AWS conduit, you are trying to set up a TCP proxy in K8s. That would explain the effort.
  • About ingress support for TCP traffic, it's actually not so black & white.
  • You actually want to create an EndpointSlice (not even just an Endpoint) for your goal. Stating the obvious here: a slice is a collection of many, in this case many endpoints. The next relevant details here are:
    • The Service configured as the target for the opened TCP port already has at least one Endpoint (NOT an EndpointSlice)
    • So it is likely that one EndpointSlice already exists. If yes, then it has a field like endpointslices.endpoints.addresses
    • You are expecting that the controller drops all this info and somehow picks up a new EndpointSlice (not even an Endpoint) that a user created, which also has a field like endpointslice.endpoints.addresses. This kind of functionality is not likely to get implemented in the near future.
    • I think you have the option to explore a redis proxy: https://duckduckgo.com/t=ffab&q=redis proxy&atb=v390-1&ia=web
      • You can leverage the K8S object of kind Service, --type ExternalName, in case it fits the design of using a redis proxy (see the sketch below)
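For reference, the ExternalName variant mentioned in the last bullet would look roughly like this (a sketch; the hostname is the redacted ElastiCache endpoint from the earlier comment, and whether this works for the tcp-services path is exactly what is being discussed here):

apiVersion: v1
kind: Service
metadata:
  name: my-proxy
spec:
  type: ExternalName
  externalName: clustercfg.xxxxx.cache.amazonaws.com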

@longwuyuan
Contributor

My comments may not reflect the opinions of others, so please wait and see if others comment on this.

@Ghilteras

Ghilteras commented Mar 12, 2024

@longwuyuan please see inline below the comments

You actually want to create an EndpointSlice (not even just an Endpoint) for your goal. Stating the obvious here: a slice is a collection of many, in this case many endpoints. The next relevant details here are:

The issue is that the ingress controller does not recognize EndpointSlices of type FQDN.

The Service configured as the target for the opened TCP port already has at least one Endpoint (NOT an EndpointSlice)

Since Endpoints are deprecated, we have just created an EndpointSlice.

You are expecting that the controller drops all this info and somehow picks up a new EndpointSlice (not even an Endpoint) that a user created, which also has a field like endpointslice.endpoints.addresses. This kind of functionality is not likely to get implemented in the near future.

Not really. We would expect the Service to pick up the EndpointSlice as per the k8s documentation, which works fine for an EndpointSlice of type IPv4; but for an EndpointSlice of type FQDN the controller thinks it is an IPv6 address. This looks like a bug, not a feature request. Shouldn't we change the kind label to reflect that?

I think you have the option to explore a redis proxy: https://duckduckgo.com/t=ffab&q=redis proxy&atb=v390-1&ia=web

That's what we are doing with haproxy to circumvent the FQDN/IPv6 EndpointSlice bug.

You can leverage the K8S object of kind Service, --type ExternalName, in case it fits the design of using a redis proxy

I don't think you can tie a Service to another Service though. This could work if we could use an Ingress, but we can't. That's why we are hooking the Service up with the EndpointSlice.

@longwuyuan
Contributor

@Ghilteras thanks for the update, it helped.

  • This discussion now requires a perusal of the code, and I am not a developer, so I won't make comments on the code
  • But I am aware that we switched to using the EndpointSlice API for the Ingress routing
  • I am also aware that the TCP/UDP ingress feature is not really using the Ingress API; an Ingress is not for layer-4 TCP/UDP
  • I am given to understand that the project implemented TCP/UDP ingestion, and it likely uses both Go & Lua code to implement a proxy
  • So the indication is that dev work is required around the tcp/udp proxy to use EndpointSlices of type FQDN
  • AFAIK, this is not likely to happen in the near future because there are plans to change the implementation of the tcp/udp proxy. But don't take my word for it; let's see if there are comments around this
  • If there is the possibility of a PR from you, I think it will get reviewed for impact

On the redis-proxy part, my thought was that I found some hits on searching, like https://artifacthub.io/packages/search?ts_query_web=redis proxy&sort=relevance&page=1

  • If these redis proxies ingest your AWS Redis ElastiCache hostname as a destination, then you can deploy them and expose them using the tcp/udp port feature

On a complete tangent, if I were to implement this, I would have the frontend consume a configurable ENV-VAR for the AWS Redis ElastiCache FQDN, instead of Redis queries first coming to a K8S cluster and then getting bounced off to AWS. The efficiency & security of K8S as the target of Redis queries ultimately destined for AWS ElastiCache would only be worth compromising if some really unpleasant design aspect forced you to do this.
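As an illustration of that ENV-VAR approach (a sketch; the variable names and the Deployment fragment are made up for this example, not taken from the issue):

# Fragment of the frontend Deployment's pod spec: the client reads the ElastiCache FQDN from env vars
containers:
  - name: frontend
    image: my-frontend:latest        # placeholder image
    env:
      - name: REDIS_HOST             # hypothetical variable name
        value: clustercfg.xxxxx.cache.amazonaws.com
      - name: REDIS_PORT
        value: "6379"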

But these are my opinions. It is clear that a developer needs to comment here. There is a really acute shortage of developer time, so the choices are to join the community meeting https://github.com/kubernetes/community/tree/master/sig-network (and of course wait here for comments from community experts and developers).

@Ghilteras

We already use a TCP proxy (haproxy) in the meantime, while we wait for EndpointSlices of type FQDN to no longer be mistaken for IPv6 addresses, so we do have a workaround. But I still do not understand why this issue is tagged as a feature, because the fact that NGINX interprets the FQDN as IPv6 seems to be a bug. I might be missing context here obviously, but why are we talking about changing the implementation of the tcp/udp proxy? How does fixing this bug require changing the implementation? I am just genuinely curious here.

@longwuyuan
Contributor

longwuyuan commented Mar 13, 2024

@Ghilteras sorry for not being clear enough.

  • The tcp/udp port-exposing feature is not an upstream K8S spec AFAIK. It's a feature that this project implemented.

  • In that context, there are plans to change how the tcp/udp port-expose feature works

  • So I assume it is less likely that there are resources like developer time available to make EndpointSlices of type FQDN work with this feature in the short term

  • The critical high-priority problems being worked on are just too many and too complicated to make this issue a higher priority

  • I could be wrong, so we need to wait for others to comment

  • Please do join the next community meeting and bring this up, as developers and maintainers are there live

  • You have the option to set the bug label on the issue if you want to

  • It's just that there is not enough smoking-gun data, like the following (see the command sketch after this list):

  • k describe of controller resources (pod, svc, configMap)

  • k describe of app resources (pod, svc) exposing the TCP port

  • k describe of an EndpointSlice in use for the expected behaviour

  • Complete and real curl command as used

  • Complete and real k logs output of the controller pod, including logs of the curl command if any

  • Log messages in the controller pod logs, or other info, that show your EndpointSlice got used and that the error was the FQDN getting evaluated as IPv6

  • Complete output of k get events

  • When there is data that shows a bug, it becomes easy to apply labels like "triage accepted" and "bug" to an issue, as it reduces the effort needed from a developer

  • The e2e tests include tests to reach a tcp/udp port of a pod inside the cluster. There are no tests that send traffic to an FQDN obtained from a user-created EndpointSlice, so it's going to be a NON-TRIVIAL effort for a developer

  • In case you want to submit a PR, I am certain that there will be review comments coming on it

Hope you have more info now.
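The data listed above could be gathered with commands along these lines (a sketch; the namespace, controller names, and the app Service/EndpointSlice names are assumptions that need to be adapted):

kubectl -n ingress-nginx describe pod -l app.kubernetes.io/name=ingress-nginx
kubectl -n ingress-nginx describe svc ingress-nginx-controller
kubectl -n ingress-nginx describe configmap tcp-services
kubectl describe svc my-proxy
kubectl describe endpointslice my-proxy
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller
kubectl get events -A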

@Ghilteras

Ghilteras commented Mar 18, 2024

The previous posts already contain enough data, supplied by @BulatSaif and @ChristianAnke. If you require additional information, please let us know.

The tcp/udp port-exposing feature is not an upstream K8S spec AFAIK. It's a feature that this project implemented.

I think we are all aware of that; that's why we filed this issue against the NGINX Ingress repo and not against k8s.

In that context, there are plans to change how the tcp/udp port-expose feature works

Again, we are not asking to change how the tcp/udp port expose works; we are asking to fix a bug.

So I assume it is less likely that there are resources like developer time available to make EndpointSlices of type FQDN work with this feature in the short term
The critical high-priority problems being worked on are just too many and too complicated to make this issue a higher priority

IMHO bugs that are easy to fix (and this one looks like it should not require a lot of effort) can be prioritized without dramatically altering the roadmap of the project.

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Mar 18, 2024
@Ghilteras

Ghilteras commented Mar 18, 2024

/remove-kind feature
/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Mar 18, 2024
@longwuyuan
Contributor

@Ghilteras thanks for your comments.
I guess we have to wait for comments from others.

@Ghilteras

Ghilteras commented Apr 24, 2024

Just circling back to this to check whether someone can accept the triage and remove the needs-more-information tag.

@k8s-ci-robot
Contributor

@Ghilteras: The label triage/accepted cannot be applied. Only GitHub organization members can add the label.

In response to this:

/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@antoniolago

antoniolago commented May 12, 2024

I'm using k8s v1.29.1 and was using helm chart v4.8.3.
While trying to redirect to an external FQDN I encountered the same issue as the OP, but bumping the ingress-nginx helm chart's version to v4.10.1 apparently solved it.

kind: Service
apiVersion: v1
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  ports:
    - name: nextcloud
      protocol: TCP
      port: 80
      targetPort: 9855
  type: ExternalName
  sessionAffinity: None
  externalName: mydomain.org
---
kind: EndpointSlice
apiVersion: discovery.k8s.io/v1
metadata:
  name: nextcloud
  namespace: nextcloud
  labels:
    kubernetes.io/service-name: "nextcloud"
addressType: FQDN
ports:
  - name: nextcloud
    port: 9855
    protocol: TCP
endpoints:
  - addresses:
      - "mydomain.org"
    conditions:
      ready: true
---
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: ingress-nextcloud
  namespace: nextcloud
  annotations:
    kubernetes.io/ingress.allow-http: "true"
    acme.cert-manager.io/http01-edit-in-place: "true"
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  tls:
    - hosts:
        - myotherdomain.org
      secretName: mydomain-certificate
  rules:
    - host: myotherdomain.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: 
                name: nextcloud
                port: 
                  number: 80
  ingressClassName: nginx
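For anyone comparing versions: the running controller build can be checked directly, independent of the chart version (a sketch; the namespace and deployment name are assumptions):

kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- /nginx-ingress-controller --version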

@sig-piskule

sig-piskule commented Sep 10, 2024

I am running into this bug.

I need to deploy a Helm application multiple times and load-balance it for reliability / no-downtime upgrades.

Deploying the application multiple times in the same namespace will not work due to automation we have, so as a workaround I attempted to deploy the application in two separate namespaces. The idea was to load-balance them using EndpointSlices.

I was hoping to use a combination of ExternalName Services and EndpointSlices to get a 'simple' version of this working without needing to deploy additional infrastructure. I want to avoid HAProxy-like solutions, since that would require a long-term commitment to patch and learn HAProxy.

The idea on paper:
ingress-nginx -> load-balancer-k8s-service -> load-balancer-k8s-endpoint-slice -> load-balancer-k8s-service-external-name -> the-actual-service

The overall setup actually works. When tested without ingress-nginx, I can target my created services app1.lb.svc.cluster.local and webserver.app.svc.cluster.local directly from a bastion pod. nslookup returns an appropriate IPv4 address from the bastion node.

Only when attempting to connect through ingress-nginx do I receive this error.

2024/09/10 20:01:35 [error] 8204#8204: *794721 [lua] balancer.lua:348: balance(): error while setting current upstream peer [app1.lb.svc.cluster.local]:443: invalid IPv6 address while connecting to upstream, client: 198.182.55.132, server: _, request: "GET /test HTTP/2.0", host: "test.my.example.com"

We are currently running ingress-nginx on version 4.11.1. GKE is on version 1.29.7-gke.1104000.

The current workaround I have is to specify IPv4 addresses instead of using the FQDN. This is less than ideal in case the IP address changes, but hypothetically the IP should not be changing.

This is my currently invalid setup.

---
# Create our Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: lb
  labels:
    environment: nginx-ingress
---
# This will route traffic from our LB namespace to our Application Instance 1
apiVersion: v1
kind: Service
metadata:
  name: app1
  namespace: lb
spec:
  type: ExternalName
  externalName: webserver.app1.svc.cluster.local
---
# This will route traffic from our LB namespace to our Application Instance 2
apiVersion: v1
kind: Service
metadata:
  name: app2
  namespace: lb
spec:
  type: ExternalName
  externalName: webserver.app2.svc.cluster.local
---
# This is our load balancer service, which will be exposed as an Ingress, publicly available
apiVersion: v1
kind: Service
metadata:
  name: load-balancer-service
  namespace: lb
spec:
  ports:
    - name: https
      port: 443
      protocol: TCP
---
# This is the endpoint slice. Any traffic to this endpoint slice should be directed to the ExternalName services above, which point to our actual services
# I am unsure if we can just point directly to the service here rather than using ExternalName service hops, but that should be tested when this bug
# is resolved
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: load-balancer-service
  namespace: lb
  labels:
    kubernetes.io/service-name: load-balancer-service
addressType: FQDN
endpoints:
  - addresses:
     - app1.lb.svc.cluster.local
     - app2.lb.svc.cluster.local
ports:
  - name: https
    protocol: TCP
    port: 443
---
# This is the ingress that exposes my main load balancer
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    ...
  name: load-balancer-ingress
  namespace: lb
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: load-balancer-service
      port:
        number: 443

For anyone following along and wanting my workaround, you would change the EndpointSlice to something like this (a lookup sketch for the addresses follows the manifest):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: load-balancer-service
  namespace: lb
  labels:
    kubernetes.io/service-name: load-balancer-service
addressType: IPv4
endpoints:
  - addresses:
     - 10.0.0.1 # actual service 1 IP
     - 10.0.0.2 # actual service 2 IP
ports:
  - name: https
    protocol: TCP
    port: 443
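The IPv4 addresses to plug in above would be the ClusterIPs of the backing Services; they can be looked up with something like this (a sketch; the namespaces and Service name follow the manifests above):

kubectl -n app1 get svc webserver -o jsonpath='{.spec.clusterIP}'
kubectl -n app2 get svc webserver -o jsonpath='{.spec.clusterIP}'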

@Ghilteras

Ghilteras commented Sep 10, 2024

@antoniolago which component did you update to v4.10.1?

@sig-piskule the whole point of this bug is that you cannot use addressType: FQDN

@antoniolago

@antoniolago which component did you update to v4.10.1?

Hello, that would be the ingress-nginx helm chart.

@Ghilteras

Ghilteras commented Sep 10, 2024

I don't see the EndpointSlice handling having been updated in the last few years, so I don't see how bumping the chart would do anything about this bug.

@longwuyuan
Contributor

Hi,

If the expectation is that the project will support & maintain manual creation of EndpointSlices, then please note that there are no resources to work on that kind of support or maintenance.

If the expectation is that the project will support & maintain manual creation of EndpointSlices for the ultimate goal of routing TCP/UDP traffic from outside the cluster to pods inside the cluster or to an FQDN, then this is not going to be worked on anytime in the foreseeable future. This is because the project cannot support features and use-cases that are not closely implied by the Ingress-API specs & functionality.

There is just not enough developer time available to maintain all the features that are far away from the Ingress-API implications. And the requirement of securing the controller by default while working on the Gateway-API is a higher priority.

Also, it seems a fair expectation that a user should be able to create an EndpointSlice and configure it with an FQDN destination. But this project is primarily an ingress controller, and there was never a promise made to support/maintain manual creation of EndpointSlices. There are many other features the project provides that are not part of the Ingress-API, but those were added when conditions such as expectations and resources were favorable. Thus this issue is not really a bug as such. Supporting manually created EndpointSlices would be a fringe feature compared to the routing of HTTP/HTTPS traffic from outside the cluster to pods inside the cluster.

@antoniolago

I don't see Endpoint Slices having been updated in the last few years so I don't see how bumping the chart would do anything to this bug..

Agreed, but it did for me.

@Ghilteras

Ghilteras commented Sep 11, 2024

@antoniolago I'm saying that I don't think it actually fixed it, but rather that you don't see the error anymore because you are not leveraging the slice.

@longwuyuan this project has always been stretched thin; this is not new. What is not clear is the priority of bugs vs features. I frankly don't understand why a bug like this is allowed to persist vs prioritizing other things, especially since fixing it should not require a lot of effort for a feature like EndpointSlices, which is not end-of-life. Also, you mentioned manual creation? I don't understand what you mean by that; it's definitely automated IaC manifests that we lay down in k8s with the EndpointSlice.

@longwuyuan
Contributor

There is no EndpointSlice-creation procedure in the docs AFAIK: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/
