Apache YuniKorn / YUNIKORN-328 High efficient scheduling events framework phase 2 / YUNIKORN-333

Reduce the number of events published to the K8s event system when a predicate fails



    Description

      The problem today is that we publish too many events to K8s.
      The code at https://github.com/apache/incubator-yunikorn-k8shim/blob/86cc199c00d44c1dde71c9f2faf5bc17ff28bbb7/pkg/plugin/predicates/predictor.go#L303-L304 is called in the core scheduling logic on every allocation attempt, which can happen thousands of times per second. For example, if a pod cannot be allocated to any node because of node taints, then after it has been pending for a while, "kubectl describe pod" shows a huge number of duplicate events. The current behavior therefore gives us:

      • Good: we do not lose any events; all of them are pushed to K8s.
      • Bad: it adds overhead to the K8s event system (fortunately, K8s aggregates duplicate events).

      I think there are a few options we can evaluate:

      • Shall we cache such events in the event cache system and push them at a 1s interval, as we already do for the headRoom check?
      • Add a rate-limiting mechanism to reduce the number of duplicate events.

      Could you please take a look and let me know your thoughts? Thanks!


            People

              TingYao Huang (tingyao)
              Adam Antal (adam.antal)
              Votes: 0
              Watchers: 3
