Hadoop YARN / YARN-4698

Negative value in RM UI counters due to double container release


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.5.1
    • None
    • None

    Description

      We noticed that on our cluster there are negative values in RM UI counters:

      • Containers Running: -19
      • Memory Used: -38GB
      • Vcores Used: -19

      After checking the RM logs, we found that the following events had occurred:

      • Assigned container: 67019 times
      • Released container: 67019 times
      • Invalid container released: 19 times

      Some related log records can be found in the "Example.log-cut" attachment.

      After some investigation, we concluded that there is a race condition affecting a container that was scheduled to be killed but completed successfully before the kill took effect. The container is then released twice, and the second release decrements the counters again.
      There is also a patch that possibly mitigates the effects of the issue but does not solve the underlying problem (see the "mitigating2.5.1.diff" attachment).
      Unfortunately, the cluster and all other logs are lost, because this report was written about a year ago but was not submitted properly at the time. We also do not know whether the issue exists in other versions.
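
      The following is a minimal sketch of how a double release can drive a counter negative, and of one way to guard against it. It is not ResourceManager code and does not reproduce the attached patch. The class names, the container id, and the single-threaded replay of events are all assumptions made for illustration; the replay only models the order of events described above (assign, successful completion, then the pending kill).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model for illustration only; not the actual YARN scheduler code.
public class DoubleReleaseSketch {

    /** Naive accounting: every release event decrements the counter. */
    static class NaiveCounters {
        int containersRunning = 0;

        void assign(String containerId) {
            containersRunning++;
        }

        void release(String containerId) {
            containersRunning--;
        }
    }

    /** Guarded accounting: a release counts only if the container is still tracked as live. */
    static class GuardedCounters {
        int containersRunning = 0;
        int invalidReleases = 0;
        private final Set<String> live = new HashSet<>();

        void assign(String containerId) {
            if (live.add(containerId)) {
                containersRunning++;
            }
        }

        void release(String containerId) {
            if (live.remove(containerId)) {
                containersRunning--;   // first release: valid
            } else {
                invalidReleases++;     // repeated release: recorded, counter untouched
            }
        }
    }

    public static void main(String[] args) {
        NaiveCounters naive = new NaiveCounters();
        GuardedCounters guarded = new GuardedCounters();
        String id = "container_example_0001"; // made-up id

        naive.assign(id);
        guarded.assign(id);

        // Release 1: the container completes successfully.
        naive.release(id);
        guarded.release(id);

        // Release 2: the kill that was scheduled earlier is processed afterwards.
        naive.release(id);
        guarded.release(id);

        System.out.println("naive containersRunning = " + naive.containersRunning);
        System.out.println("guarded containersRunning = " + guarded.containersRunning
                + ", invalidReleases = " + guarded.invalidReleases);
    }
}
```

      With one double-released container, the naive counter ends at -1 while the guarded counter stays at 0 and records one invalid release. Like the patch mentioned above, a guard of this kind only hides the symptom; the ordering problem between completion and kill remains.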

      Attachments

        1. mitigating2.5.1.diff (2 kB, Dmytro Kabakchei)
        2. Example.log-cut (9 kB, Dmytro Kabakchei)


          People

            Assignee: Wilfred Spiegelenburg (wilfreds)
            Reporter: Dmytro Kabakchei (beard)
            Votes: 0
            Watchers: 6
