
[NLB] Wrong source ranges used for backend security group on old (pre 2023) NLB without own SG if preserveClientIP=true #4632

@frittentheke

Description

Bug Description

This bug only appears when using an old AWS Network Load Balancer (created prior to ~2023).
Back then NLBs did not have security groups (https://aws.amazon.com/blogs/containers/network-load-balancers-now-support-security-groups/), and security groups CANNOT be added to such an NLB without replacing it:


AWS LBC manages backend security groups for these old NLBs differently; see where it branches off to nlbNoSecurityGroups vs. the standardBuilder:

```go
if len(builder.sgOutput.securityGroupTokens) == 0 {
	return builder.nlbNoSecurityGroups(targetPort, targetGroupSpec)
}
```
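The dispatch described above can be sketched as follows. This is a minimal illustration, not the controller's actual types; `backendRulePath` and its return values are hypothetical names standing in for the two builder paths:

```go
package main

import "fmt"

// backendRulePath is a hypothetical stand-in for the branching described
// above: an old (pre-2023) NLB carries no security group tokens, so backend
// rule building takes the separate no-security-groups path.
func backendRulePath(securityGroupTokens []string) string {
	if len(securityGroupTokens) == 0 {
		return "nlbNoSecurityGroups" // pre-2023 NLB without its own SG
	}
	return "standardBuilder" // NLB with attached security groups
}

func main() {
	fmt.Println(backendRulePath(nil))              // nlbNoSecurityGroups
	fmt.Println(backendRulePath([]string{"sg-1"})) // standardBuilder
}
```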

Steps to Reproduce

  • Step-by-step guide to reproduce the bug:

    • Ensure you are using an NLB which DOES NOT have security groups.
    • Create an internal NLB with preserveClientIP set to true.
    • Observe that the backend security group does NOT contain 0.0.0.0/0 or ::/0 as allowed sources.
  • Manifests applied while reproducing the issue:

  • Controller logs/error messages while reproducing the issue:

Expected Behavior
When using preserveClientIP e.g. via the

service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true

annotation, traffic reaching the Pods does not originate from the internal NLB interfaces but from the actual client IP. This requires (by default) allowing 0.0.0.0/0 for IPv4 or ::/0 for IPv6. Without client IP preservation, traffic originates from the NLB's internal interfaces, and the corresponding subnets are all that needs to be allowed in the backend security group.
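For reference, the flag is carried inside the comma-separated target-group-attributes annotation value shown above. A small sketch of checking that value (this is an illustrative parser, not the controller's actual one):

```go
package main

import (
	"fmt"
	"strings"
)

// preserveClientIPEnabled checks an annotation value such as
// "preserve_client_ip.enabled=true, proxy_protocol_v2.enabled=true"
// for the preserve_client_ip flag. Illustrative only; the controller
// has its own parsing.
func preserveClientIPEnabled(attrs string) bool {
	for _, kv := range strings.Split(attrs, ",") {
		parts := strings.SplitN(strings.TrimSpace(kv), "=", 2)
		if len(parts) == 2 && parts[0] == "preserve_client_ip.enabled" {
			return parts[1] == "true"
		}
	}
	return false
}

func main() {
	fmt.Println(preserveClientIPEnabled("preserve_client_ip.enabled=true, proxy_protocol_v2.enabled=true")) // true
	fmt.Println(preserveClientIPEnabled("proxy_protocol_v2.enabled=true"))                                  // false
}
```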

Actual Behavior
It seems that the isPreserveClientIP condition is handled the wrong way around:

  1. In

    ```go
    if isPreserveClientIP {
        if builder.lbSourceRanges != nil {
            trafficSource = *builder.lbSourceRanges
        } else {
            trafficSource = []string{}
        }
    }
    ```

    the lb source ranges (subnets) are added, which should only happen with isPreserveClientIP=false.

  2. Then

    ```go
    if (preserveClientIP) && builder.lbScheme == elbv2model.LoadBalancerSchemeInternal {
        vpcInfo, err := builder.vpcInfoProvider.FetchVPCInfo(context.Background(), builder.vpcID, networking.FetchVPCInfoWithoutCache())
        if err != nil {
            return nil, err
        }
        if tgSpec.IPAddressType == elbv2model.TargetGroupIPAddressTypeIPv4 {
            defaultSourceRanges = vpcInfo.AssociatedIPv4CIDRs()
        } else {
            defaultSourceRanges = vpcInfo.AssociatedIPv6CIDRs()
        }
    }
    return defaultSourceRanges, nil
    }
    ```

    applies all VPC CIDRs as source for internal NLBs in case of preserveClientIP=true. While it's a sensible approach, it's contrary to what

documents, with "0.0.0.0/0" being the default also for internal NLBs, and with the recommendation:

For enhanced security with internal network load balancers, we recommend limiting access by specifying allowed source IP ranges. This can be done using either the service.beta.kubernetes.io/load-balancer-source-ranges annotation or the spec.loadBalancerSourceRanges field.

While it makes some sense at first glance to allow "VPC CIDRs only" for internal NLBs, this does not take into account any VPC peerings or other traffic from outside the VPC reaching the NLB. Most importantly, this behavior is NOT aligned between old and new NLBs and is therefore surprising.
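Putting the two points together, the behavior this report expects can be sketched as below. All names are illustrative placeholders, not the controller's actual API; the point is that the subnet fallback applies only when client IP preservation is OFF, while the documented wide-open defaults apply when it is ON:

```go
package main

import "fmt"

// trafficSources sketches the expected source-range selection for backend
// security group rules, per the report above. Illustrative names only.
func trafficSources(preserveClientIP bool, userRanges []string, subnetCIDRs []string, ipv6 bool) []string {
	if !preserveClientIP {
		// Traffic originates from the NLB's own interfaces, so allowing
		// the NLB subnet CIDRs is sufficient.
		return subnetCIDRs
	}
	if len(userRanges) > 0 {
		// Operator-specified source ranges (e.g. via the
		// load-balancer-source-ranges annotation) take effect.
		return userRanges
	}
	// Documented default when client IPs are preserved: allow any source.
	if ipv6 {
		return []string{"::/0"}
	}
	return []string{"0.0.0.0/0"}
}

func main() {
	fmt.Println(trafficSources(true, nil, []string{"10.0.1.0/24"}, false))  // [0.0.0.0/0]
	fmt.Println(trafficSources(false, nil, []string{"10.0.1.0/24"}, false)) // [10.0.1.0/24]
}
```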

Regression
I believe yes, but I need to dig a little to find the change that introduced it.

Current Workarounds
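A possible workaround, following the recommendation quoted above, would be to pin the allowed sources explicitly rather than relying on the controller's default. The CIDR values in this fragment are illustrative placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Explicitly limit allowed sources; illustrative CIDRs.
    service.beta.kubernetes.io/load-balancer-source-ranges: "10.0.0.0/8,192.168.0.0/16"
spec:
  type: LoadBalancer
  # Equivalent spec field (use one or the other):
  loadBalancerSourceRanges:
  - 10.0.0.0/8
  - 192.168.0.0/16
```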

Environment

  • AWS Load Balancer controller version: 3.1.0
  • Kubernetes version: 1.34
  • Using EKS (yes/no), if so version?: yes, v1.34.4-eks-3a10415
  • Using Service or Ingress: Service
  • AWS region: eu-central-1
  • How was the aws-load-balancer-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i aws-load-balancer-controller
    • If helm was used then please show output of helm -n <controllernamespace> get values <helmreleasename>

mostly default:

USER-SUPPLIED VALUES:
aws-load-balancer-controller:
  clusterName: mycluster
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: REDACTED
  serviceMonitor:
    enabled: true
  • kubectl -n <appnamespace> describe svc <servicename>
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: istio-ingressgateway
    meta.helm.sh/release-namespace: istio-ingress
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true,
      dns_record.client_routing_policy=availability_zone_affinity
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true,
      proxy_protocol_v2.enabled=true, deregistration_delay.timeout_seconds=120, deregistration_delay.connection_termination.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-type: external
  creationTimestamp: "2022-05-19T11:17:40Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  - service.k8s.aws/resources
  labels:
    app: istio-ingressgateway
    app.kubernetes.io/instance: istio-ingressgateway
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: istio-ingressgateway
    app.kubernetes.io/part-of: istio
    app.kubernetes.io/version: 1.29.1
    helm.sh/chart: gateway-1.29.1
    istio: ingressgateway
    istio.io/dataplane-mode: none
  name: istio-ingressgateway
  namespace: istio-ingress
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: REDACTED
  clusterIPs:
  - REDACTED
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http2
    nodePort: 30649
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    nodePort: 32766
    port: 443
    protocol: TCP
    targetPort: 443
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
  sessionAffinity: None
  type: LoadBalancer

Possible Solution (Optional)

Contribution Intention (Optional)

  • Yes, I'm willing to submit a PR to fix this issue
  • No, I cannot work on a PR at this time

Additional Context

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.