Skip to content

Improve Err returned by runtime.RetryImmediateOnError #4680

@viveksb007

Description

@viveksb007

Bug Description

Currently the fn runtime.RetryImmediateOnError doesn't include actual underlying err in its final err and just returns the timeout. This fn should wrap the actual err from fn in its timeout err.

func RetryImmediateOnError(interval time.Duration, timeout time.Duration, retryable func(error) bool, fn func() error) error {

Issue because of this - err logs coming from places like

return errors.Wrap(err, "failed to delete securityGroup")

show failed to delete securityGroup: timed out waiting for the condition which could be enhanced as failed to delete securityGroup due to DependencyViolation: timed out waiting for the condition. Currently caller isn't aware of what was the last err seen based on that caller might want to have certain behavior like categorize it as a ServiceError or UserInducedError

Steps to Reproduce

  • Step-by-step guide to reproduce the bug:
  • Manifests applied while reproducing the issue:
  • Controller logs/error messages while reproducing the issue:
  1. Create a Service and backend pods
  2. Now add a SG rule to managed security group out of band (SG created by controller)
  3. Then try to delete the service and see logs from LBC

Expected Behavior

caller should know what is the last err it got from fn

Actual Behavior

err from fn is masked by timeout err.

Regression
Was the functionality working correctly in a previous version ? No
If yes, specify the last version where it worked as expected

Current Workarounds

Environment

  • AWS Load Balancer controller version:
  • Kubernetes version:
  • Using EKS (yes/no), if so version?:
  • Using Service or Ingress:
  • AWS region:
  • How was the aws-load-balancer-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i aws-load-balancer-controller
    • If helm was used then please show output of helm -n <controllernamespace> get values <helmreleasename>
    • If helm was not used, then copy/paste the exact command used to install the controller, including flags and options.
  • Current state of the Controller configuration:
    • kubectl -n <controllernamespace> describe deployment aws-load-balancer-controller
  • Current state of the Ingress/Service configuration:
    • kubectl describe ingressclasses
    • kubectl -n <appnamespace> describe ingress <ingressname>
    • kubectl -n <appnamespace> describe svc <servicename>

Possible Solution (Optional)

RetryImmediateOnError can track the err its getting from fn and wrap the err which it got from wait.PollImmediate

Contribution Intention (Optional)

  • Yes, I'm willing to submit a PR to fix this issue
  • No, I cannot work on a PR at this time

Additional Context

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions