Hadoop YARN / YARN-2285

Capacity scheduler root queue usage can show above 100% due to reserved container.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: capacityscheduler
    • Labels: None

      Description

      Configure queues A and B with 1% and 99% capacity respectively. No max capacity is set on either queue. Set a high user limit factor.
      Submit app 1 to queue A; its AM container takes 50% of cluster memory and its task containers take the other 50%. Submit app 2 to queue B, which preempts app 1's task containers. The used capacity of queue B increases to 99%, but queue A shows 5000% used.
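      For reference, a minimal capacity-scheduler.xml sketch of the setup described above. The queue names and the 1%/99% split come from this description; the user-limit-factor value of 100 is only an illustrative "high" setting, not a value taken from the reporter's cluster.

      {code:xml}
      <!-- Sketch of the reproduction setup described above; values are illustrative. -->
      <configuration>
        <property>
          <name>yarn.scheduler.capacity.root.queues</name>
          <value>A,B</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.A.capacity</name>
          <value>1</value>
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.B.capacity</name>
          <value>99</value>
        </property>
        <!-- No maximum-capacity is set, so both queues default to being able to
             grow up to the whole cluster. -->
        <property>
          <name>yarn.scheduler.capacity.root.A.user-limit-factor</name>
          <value>100</value> <!-- "high user limit factor"; 100 is just an example -->
        </property>
        <property>
          <name>yarn.scheduler.capacity.root.B.user-limit-factor</name>
          <value>100</value>
        </property>
      </configuration>
      {code}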

      Attachments

      1. preemption_5000_percent.png (323 kB) - Tassapol Athiapinya

        Issue Links

          duplicates YARN-3243

          Activity

          tassapola Tassapol Athiapinya added a comment -

          After a closer look, 5000% is a valid number. It means 5000% of the "guaranteed capacity" of queue A (about 50% absolute used capacity). I am changing the JIRA title accordingly, and I will also make this an improvement JIRA instead of a bug.

          The question here becomes whether it would be nicer to re-label the text in the web UI to better reflect its meaning: the "% used" shown next to a queue is a percentage of the guaranteed queue capacity, not the absolute used capacity.
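          To make that arithmetic concrete, here is a minimal sketch (illustrative numbers only, not CapacityScheduler code) of how a queue guaranteed 1% of the cluster but actually holding about 50% of it ends up displayed as about 5000% used:

          {code:java}
          // Illustrative arithmetic only, not CapacityScheduler code.
          public class QueueUsageArithmetic {
              public static void main(String[] args) {
                  double clusterMemoryMB    = 100 * 1024;   // hypothetical 100 GB cluster
                  double guaranteedFraction = 0.01;         // queue A capacity = 1%
                  double usedMB             = 50 * 1024;    // app 1 occupies ~50% of the cluster

                  double absoluteUsed     = usedMB / clusterMemoryMB;          // 0.50 -> 50% absolute used
                  double usedVsGuaranteed = absoluteUsed / guaranteedFraction; // 50.0 -> 5000% used

                  System.out.printf("absolute used capacity: %.1f%%%n", absoluteUsed * 100);
                  System.out.printf("used relative to guaranteed capacity: %.1f%%%n", usedVsGuaranteed * 100);
              }
          }
          {code}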

          tassapola Tassapol Athiapinya added a comment -

          Also, it is not major, but the percentage shown for the root queue is not right either: in the attached screenshot, root queue used is 146.5%.

          leftnoteasy Wangda Tan added a comment -

          Assigned it to me, working on this ...

          leftnoteasy Wangda Tan added a comment -

          Tassapol Athiapinya, one question: is the root queue usage over 100% reproducible? If yes, could you provide info on how to reproduce this issue?

          Thanks,

          vinodkv Vinod Kumar Vavilapalli added a comment -

          From the look of it, it sounds like this isn't tied to preemption. It looks like this is a bug that exists even when preemption is not enabled. Can we validate that?

          sunilg Sunil G added a comment -

          I also got 106% in the root queue with user-limit-factor configured as 2. I observed that reservation was happening in my scenario (2 nodes with 8 GB each). Some preemption also happened in my test case; I will try to collect the logs and share them. I will also try without preemption, as Vinod mentioned.

          leftnoteasy Wangda Tan added a comment -

          Thanks for the comments from Vinod and Sunil.

          From the look of it, it sounds like this isn't tied to preemption. It looks like this was a bug that exists even when preemption is not enabled. Can we validate that?

          I'll validate this tomorrow

          The root queue usage above 100% is caused by reserved containers: currently the UI shows allocated + reserved for the queue, and we may need to change that so it is easier for users to understand what happened.
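          As an illustration of that accounting (hypothetical figures, not taken from the attached screenshot): once the cluster is fully allocated, any reserved container counted on top of the allocation pushes the displayed root queue usage past 100%.

          {code:java}
          // Illustrative only: how "allocated + reserved" accounting can push the
          // root queue above 100%. The 16 GB / 4 GB figures are hypothetical.
          public class RootQueueOverHundred {
              public static void main(String[] args) {
                  double clusterMB   = 16 * 1024; // hypothetical cluster size
                  double allocatedMB = 16 * 1024; // cluster fully allocated
                  double reservedMB  =  4 * 1024; // a reserved container waiting for space

                  double shownUsed = (allocatedMB + reservedMB) / clusterMB; // 1.25
                  System.out.printf("root queue shown as used: %.1f%%%n", shownUsed * 100); // 125.0%
              }
          }
          {code}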

          leftnoteasy Wangda Tan added a comment -

          I've verified this still happens even when preemption is not enabled, for both the 5000% queue usage and the above-100% root queue usage.

          leftnoteasy Wangda Tan added a comment -

          This problem should already be resolved by YARN-3243. Closing this as a duplicate.


            People

            • Assignee: leftnoteasy Wangda Tan
            • Reporter: tassapola Tassapol Athiapinya
            • Votes: 0
            • Watchers: 10
