[YUNIKORN-333] Reduce the number events published to K8s event system when predicate fails - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.10
Component/s: core - scheduler
Labels:
- pull-request-available

Description

The problem today is that we publish too many events to K8s.
If you look at the code: https://github.com/apache/incubator-yunikorn-k8shim/blob/86cc199c00d44c1dde71c9f2faf5bc17ff28bbb7/pkg/plugin/predicates/predictor.go#L303-L304, this is called in the core scheduling logic on each allocation, which can happen thousands of times per second. For example, if a pod cannot be allocated to any node because of node taints, then after the scheduler has run for a while we see a huge number of duplicate events in "kubectl describe pod". So this gives us:

- good: we do not lose any events; all of them are pushed to K8s
- bad: overhead on the K8s event system (though fortunately it aggregates the duplicate events)

I think there are a few options we can evaluate:

- Cache such events via the event cache system and push them at a 1s interval, just like what we have done for the headRoom check
- Add a rate-limit mechanism to reduce the number of duplicate events

Could you please take a look and let me know your thoughts? Thanks!

Issue Links

is a clone of: YUNIKORN-331 Reduce the verbosity of the logs when predicate fails (Closed)
links to: GitHub Pull Request #240

People

Assignee: TingYao Huang
Reporter: Adam Antal
Votes: 0
Watchers: 3

Dates

Created: 29/Jul/20 17:34
Updated: 05/Sep/23 16:34
Resolved: 16/Mar/21 08:22