Apache YuniKorn / YUNIKORN-328 High efficient scheduling events framework phase 2 / YUNIKORN-333

Reduce the number of events published to the K8s event system when a predicate fails


Details

    Description

      The problem today is that we publish too many events to K8s.
      The code at https://github.com/apache/incubator-yunikorn-k8shim/blob/86cc199c00d44c1dde71c9f2faf5bc17ff28bbb7/pkg/plugin/predicates/predictor.go#L303-L304 is called in the core scheduling logic on each allocation, which can happen thousands of times per second. For example, if a pod cannot be allocated onto any of the nodes because of node taints, then after it has been pending for a while we see a huge number of duplicate events in the output of "kubectl describe pod". This gives us:

      • good: we do not lose any events; all of them are pushed to K8s
      • bad: overhead on the K8s event system (fortunately, it aggregates the duplicate events)

      I think there are a few options we can evaluate:

      • Should we cache such events via the event cache system and then push them at a 1s interval, as we already do for the headRoom check?
      • Add a rate-limiting mechanism to reduce the number of duplicate events

      Could you please take a look and let me know your thoughts? Thanks!

          People

            tingyao TingYao Huang
            adam.antal Adam Antal