Hadoop YARN / YARN-4698

Negative value in RM UI counters due to double container release


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.5.1
    • None
    • None

    Description

      We noticed negative values in the RM UI counters on our cluster:

      • Containers Running: -19
      • Memory Used: -38GB
      • Vcores Used: -19

      After checking the RM logs, we found the following event counts:

      • Assigned container: 67019 times
      • Released container: 67019 times
      • Invalid container released: 19 times

      Some related log records can be found in the "Example.log-cut" attachment.

      After some investigation we concluded that there is a race condition affecting a container that was scheduled to be killed but completed successfully before the kill took effect. Such a container appears to be released twice, which is consistent with the 19 invalid releases matching the -19 shown in the counters.
      There is also a patch that possibly mitigates the effects of the issue but doesn't solve the underlying problem (see mitigating2.5.1.diff).
      Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but wasn't submitted properly. We also don't know whether the issue exists in other versions.
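
      To make the arithmetic behind the negative counters concrete, here is a minimal, single-threaded sketch in Java. It is a toy model, not YARN code, and it does not reproduce the contents of mitigating2.5.1.diff. The class DoubleReleaseSketch, its Counters type, and the guarded flag are all hypothetical names. The per-container size of 2 GB and 1 vcore is an assumption inferred from -38GB and -19 vcores over 19 containers. The sketch replays the counts from the logs: every container is allocated and released once, then 19 of them are released a second time, as a late kill might do.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of scheduler usage counters; hypothetical, not actual YARN code. */
public class DoubleReleaseSketch {

    // Assumption: container size inferred from -38 GB / -19 containers in the report.
    static final int GB_PER_CONTAINER = 2;
    static final int VCORES_PER_CONTAINER = 1;

    static class Counters {
        int containersRunning;
        int memoryUsedGb;
        int vcoresUsed;
        private final Set<Integer> live = new HashSet<>();
        private final boolean guarded;

        Counters(boolean guarded) {
            this.guarded = guarded;
        }

        void allocate(int id) {
            live.add(id);
            containersRunning++;
            memoryUsedGb += GB_PER_CONTAINER;
            vcoresUsed += VCORES_PER_CONTAINER;
        }

        void release(int id) {
            // remove() returns false when the container was already released.
            boolean wasLive = live.remove(id);
            if (guarded && !wasLive) {
                return; // second release of the same container is ignored
            }
            containersRunning--;
            memoryUsedGb -= GB_PER_CONTAINER;
            vcoresUsed -= VCORES_PER_CONTAINER;
        }
    }

    /** Replays the event counts from the report against the toy counters. */
    static Counters simulate(boolean guarded) {
        Counters c = new Counters(guarded);
        int total = 67019;        // "Assigned container" and "Released container"
        int doubleReleased = 19;  // "Invalid container released"
        for (int id = 0; id < total; id++) {
            c.allocate(id);
        }
        for (int id = 0; id < total; id++) {
            c.release(id);        // normal completion
        }
        for (int id = 0; id < doubleReleased; id++) {
            c.release(id);        // late kill of an already completed container
        }
        return c;
    }

    public static void main(String[] args) {
        for (boolean guarded : new boolean[] {false, true}) {
            Counters c = simulate(guarded);
            System.out.printf("guarded=%b containers=%d memoryGb=%d vcores=%d%n",
                    guarded, c.containersRunning, c.memoryUsedGb, c.vcoresUsed);
        }
    }
}
```

      Without the guard the toy counters end at -19 containers, -38 GB and -19 vcores, matching the values seen in the RM UI; with the guard they return to zero. Like the attached patch, a guard of this kind would only hide the symptom, since the container is still released twice.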

      Attachments

        1. mitigating2.5.1.diff
          2 kB
          Dmytro Kabakchei
        2. Example.log-cut
          9 kB
          Dmytro Kabakchei


            People

              Assignee: Wilfred Spiegelenburg (wilfreds)
              Reporter: Dmytro Kabakchei (beard)
              Votes: 0
              Watchers: 6
