Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: scheduler
    • Labels: None

    Description

      We have more than a thousand queues and several hundred tenants in a busy cluster. We get a lot of complaints and questions from queue owners and operators, such as "Why can't my queue/app get resources for such a long time?"

      It's really hard to answer such questions.

      So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}", which returns the children of the given parent queue, sorted according to its SchedulingPolicy.getComparator(). All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, and priority.
      Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result answers such questions by itself.
      I find it really useful for multi-tenant clusters and hope it can be merged into the mainline.
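
      To illustrate what such a dry run reports, here is a minimal Python sketch of sorting child queues with a fair-share-style comparator. It is not the patch itself and not YARN code: the QueueInfo record, the single-number resource model, and the simplified comparison rules are all assumptions made for illustration. The real ordering is whatever the parent's SchedulingPolicy.getComparator() returns.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class QueueInfo:
    # Hypothetical record; fields mirror the parameters named in the description.
    name: str
    min_share: float  # guaranteed minimum, in arbitrary resource units
    usage: float      # resources currently in use
    demand: float     # resources the queue is asking for
    weight: float     # assumed to be > 0


def fair_share_compare(a: QueueInfo, b: QueueInfo) -> int:
    """Simplified ordering: queues below their min share come first."""
    a_min = min(a.min_share, a.demand)
    b_min = min(b.min_share, b.demand)
    a_needy = a.usage < a_min
    b_needy = b.usage < b_min
    if a_needy != b_needy:
        return -1 if a_needy else 1
    if a_needy:
        # Both are below min share: the one further below it goes first.
        a_key = a.usage / max(a_min, 1.0)
        b_key = b.usage / max(b_min, 1.0)
    else:
        # Neither is needy: the lower usage-to-weight ratio goes first.
        a_key = a.usage / a.weight
        b_key = b.usage / b.weight
    if a_key != b_key:
        return -1 if a_key < b_key else 1
    # Tie-break on name so the order is deterministic.
    return (a.name > b.name) - (a.name < b.name)


def dry_run(children):
    """Return the children in the order the scheduler would consider them."""
    return sorted(children, key=cmp_to_key(fair_share_compare))


children = [
    QueueInfo("a", min_share=10, usage=20, demand=30, weight=1.0),
    QueueInfo("b", min_share=10, usage=2, demand=30, weight=1.0),
    QueueInfo("c", min_share=0, usage=5, demand=30, weight=2.0),
]
for q in dry_run(children):
    print(q.name, q.min_share, q.usage, q.demand, q.weight)
```

      In this made-up example, queue "b" sorts first because it is below its min share, and "a" sorts last because it has the highest usage relative to its weight. Seeing a queue near the end of such a list, next to its parameters, is the kind of answer the endpoint is meant to give.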

          People

            zhiguohong Hong Zhiguo