
Details

    • Sub-task
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • scheduler
    • None

    Description

      We have more than a thousand queues and several hundred tenants in a busy cluster. We get a lot of complaints and questions from queue owners and operators, such as "Why can't my queue/app get resources for such a long time?"

      It's really hard to answer such questions.

      So we added a diagnostic REST endpoint, "/ws/v1/cluster/schedule/dryrun/{parentQueueName}", which returns the list of the parent queue's children, sorted according to its SchedulingPolicy.getComparator(). All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, and priority.
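      To illustrate the kind of ordering such a dry run exposes, here is a minimal sketch in Python. It is not the patch itself and not the actual SchedulingPolicy comparator: it models a simplified fair-share-style rule on a single resource dimension, where queues below their minShare come first and the rest are ordered by usage-to-weight ratio. The ChildQueue class, the function names, and the tie-break by name are assumptions made for this example, and priority is left out.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class ChildQueue:
    """Hypothetical, single-resource view of a child queue's scheduling parameters."""
    name: str
    min_share: float
    usage: float
    demand: float
    weight: float  # assumed to be > 0 in this sketch


def _is_needy(q: ChildQueue) -> bool:
    # A queue is "needy" while its usage is below min(minShare, demand).
    return q.usage < min(q.min_share, q.demand)


def _min_share_ratio(q: ChildQueue) -> float:
    # Usage relative to the effective min share; the floor of 1.0 avoids division by zero.
    return q.usage / max(min(q.min_share, q.demand), 1.0)


def compare(a: ChildQueue, b: ChildQueue) -> int:
    """Simplified fair-share-style comparator (an assumption, not the real one)."""
    a_needy, b_needy = _is_needy(a), _is_needy(b)
    if a_needy and not b_needy:
        return -1
    if b_needy and not a_needy:
        return 1
    if a_needy and b_needy:
        key_a, key_b = _min_share_ratio(a), _min_share_ratio(b)
    else:
        key_a, key_b = a.usage / a.weight, b.usage / b.weight
    if key_a != key_b:
        return -1 if key_a < key_b else 1
    # Tie-break by name so the order is deterministic.
    return (a.name > b.name) - (a.name < b.name)


def dry_run(children: list[ChildQueue]) -> list[ChildQueue]:
    """Return the children in the order the sketch comparator would serve them."""
    return sorted(children, key=cmp_to_key(compare))
```

      With three children where only one is below its minShare, that child sorts first, and the other two follow by usage-to-weight ratio. Seeing a queue near the end of such a list, next to its parameters, is what makes the "why is my queue starved" question answerable.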
      Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result is self-explanatory enough to answer these questions.
      I feel it's really useful for multi-tenant clusters, and I hope it can be merged into the mainline.
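
      The call to the root queue mentioned above could be scripted roughly as follows. This is a sketch under assumptions: the endpoint exists only with the proposed patch applied, the ResourceManager base URL is a placeholder, and the response is assumed to be JSON (the issue does not describe the payload format). The helper names build_dryrun_url and fetch_dryrun are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Path taken from the issue description; available only with the proposed patch.
DRYRUN_PATH = "/ws/v1/cluster/schedule/dryrun/"


def build_dryrun_url(rm_base_url: str, parent_queue: str = "root") -> str:
    """Build the dry-run URL for a parent queue (hypothetical helper)."""
    return rm_base_url.rstrip("/") + DRYRUN_PATH + quote(parent_queue, safe="")


def fetch_dryrun(rm_base_url: str, parent_queue: str = "root", timeout: float = 10.0):
    """Fetch the sorted children of a parent queue.

    Assumes the endpoint returns JSON; adjust the parsing if it returns XML.
    """
    request = urllib.request.Request(
        build_dryrun_url(rm_base_url, parent_queue),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

      For example, build_dryrun_url("http://rm.example.com:8088") yields the root-queue URL, where rm.example.com:8088 stands in for the ResourceManager web address of the cluster.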



          People

            zhiguohong Hong Zhiguo
            zhiguohong Hong Zhiguo

