Hadoop YARN / YARN-4698

Negative value in RM UI counters due to double container release


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.5.1
    • Fix Version/s: None
    • Labels: None

      Description

      We noticed negative values in the RM UI counters on our cluster:

      • Containers Running: -19
      • Memory Used: -38GB
      • Vcores Used: -19

      After checking the RM logs, we found the following event counts:

      • Assigned container: 67019 times
      • Released container: 67019 times
      • Invalid container released: 19 times
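
      These figures are consistent with each invalid release decrementing the counters one extra time. The sketch below only checks that arithmetic. The class and method names are hypothetical, and the per-container size of 2 GB and 1 vcore is an assumption inferred from -38 GB over -19 containers, not a value taken from the logs.

```java
// Hypothetical consistency check; illustrative names, not ResourceManager code.
final class CounterArithmetic {
    // Net running containers if every release event, valid or invalid,
    // decrements the counter once.
    static long netRunning(long assigned, long released, long invalidReleased) {
        return assigned - released - invalidReleased;
    }

    public static void main(String[] args) {
        long running = netRunning(67019, 67019, 19);
        // Assumed allocation per container (inferred, not logged): 2 GB, 1 vcore.
        long memoryGb = running * 2;
        long vcores = running * 1;
        // prints "-19 containers, -38 GB, -19 vcores"
        System.out.println(running + " containers, " + memoryGb + " GB, " + vcores + " vcores");
    }
}
```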

      Related log records can be found in the "Example.log-cut" attachment.

      After some investigation, we concluded that there is a race condition involving a container that was scheduled to be killed but completed successfully before the kill took effect, so that it was released twice.
      A patch that possibly mitigates the effects of the issue, but does not fix the underlying problem, is attached (see mitigating2.5.1.diff).
      Unfortunately, the cluster and all other logs are lost, because the report was prepared about a year ago but was not submitted properly. We also do not know whether the issue exists in other versions.
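
      The suspected double release can be modelled in a few lines. This is a minimal sketch under stated assumptions: the class QueueCounters and its methods are hypothetical, it is not the ResourceManager scheduler code, and it is not the attached patch. It only shows why an unguarded decrement goes negative when the completion path and the kill path both release the same container, and how a liveness check makes the second release a no-op.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the suspected double release; not actual YARN code.
final class QueueCounters {
    private final Set<String> live = new HashSet<>();
    private int running;          // stands in for "Containers Running"
    private int invalidReleases;  // releases of containers that were no longer live

    synchronized void assign(String containerId) {
        live.add(containerId);
        running++;
    }

    // Unguarded: decrements on every call, so a second release goes negative.
    synchronized void releaseUnguarded(String containerId) {
        live.remove(containerId);
        running--;
    }

    // Guarded: only the first release of a live container touches the counter.
    synchronized void releaseGuarded(String containerId) {
        if (live.remove(containerId)) {
            running--;
        } else {
            invalidReleases++;
        }
    }

    synchronized int running() { return running; }
    synchronized int invalidReleases() { return invalidReleases; }

    public static void main(String[] args) {
        QueueCounters unguarded = new QueueCounters();
        unguarded.assign("container_1");
        unguarded.releaseUnguarded("container_1"); // container completes normally
        unguarded.releaseUnguarded("container_1"); // pending kill releases it again
        System.out.println("unguarded running = " + unguarded.running()); // -1

        QueueCounters guarded = new QueueCounters();
        guarded.assign("container_1");
        guarded.releaseGuarded("container_1");
        guarded.releaseGuarded("container_1");     // counted as invalid, no decrement
        System.out.println("guarded running = " + guarded.running());     // 0
    }
}
```

      A guard of this kind would only hide the symptom in the counters; the ordering between completion and kill would still need to be resolved at its source.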

        Attachments

        1. Example.log-cut
          9 kB
          Dmytro Kabakchei
        2. mitigating2.5.1.diff
          2 kB
          Dmytro Kabakchei


              People

              • Assignee: Wilfred Spiegelenburg (wilfreds)
              • Reporter: Dmytro Kabakchei (beard)
              • Votes: 0
              • Watchers: 6
