Hadoop YARN / YARN-9071

NM and service AM don't have updated status for reinitialized containers

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 3.1.1
    • Fix Version/s: 3.1.2, 3.3.0, 3.2.1
    • None
    • None
    • Release Note: While an upgrade is in progress, the reported status may show the READY state before the upgrade operations have actually completed. External callers of the upgrade API are advised to wait at least 30 seconds before querying yarn app -status.

    Description

      Container resource monitoring is not stopped during reinitialization, which prevents the NM from obtaining updated process tree information when the container starts running again. I observed a reinitialized container go from RUNNING to REINITIALIZING to REINITIALIZING_AWAITING_KILL to SCHEDULED to RUNNING. Container monitoring was then started a second time, but because the trackingContainers entry had already been initialized for the container, ContainersMonitor skipped finding the new PID and IP for the container. When the NM later stopped the same container, it did not kill the container, and the service AM received an unexpected event (stop at reinitializing).

      A possible solution would be to stop container monitoring during reinitialization so that the process tree information is initialized properly when monitoring is restarted.
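
The stale-entry behavior described above can be illustrated with a minimal sketch. This is not the actual ContainersMonitor code: the class MonitorSketch, its Entry type, and the methods startMonitoring, stopMonitoring, and trackedPid are hypothetical names chosen for illustration. The sketch assumes only what the description states, namely that an already-initialized tracking entry is reused when monitoring starts a second time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a monitor that tracks containers by ID. */
public class MonitorSketch {

    /** Per-container tracking entry; pid stays null until it is resolved. */
    static final class Entry {
        String pid;
    }

    // Stands in for the trackingContainers map mentioned in the description.
    private final Map<String, Entry> tracking = new HashMap<>();

    /** Called each time monitoring is started for a container. */
    void startMonitoring(String containerId, String currentPid) {
        // An existing entry is reused, so its PID is resolved only once.
        Entry entry = tracking.computeIfAbsent(containerId, id -> new Entry());
        if (entry.pid == null) {
            entry.pid = currentPid;
        }
    }

    /** Proposed step: drop the entry when the container is reinitialized. */
    void stopMonitoring(String containerId) {
        tracking.remove(containerId);
    }

    String trackedPid(String containerId) {
        Entry entry = tracking.get(containerId);
        return entry == null ? null : entry.pid;
    }

    public static void main(String[] args) {
        // Without stopping: the second start keeps the PID of the old process.
        MonitorSketch withoutStop = new MonitorSketch();
        withoutStop.startMonitoring("container_1", "100");
        withoutStop.startMonitoring("container_1", "200"); // after reinit
        System.out.println(withoutStop.trackedPid("container_1")); // prints 100

        // With stopping during reinit: the new PID is picked up.
        MonitorSketch withStop = new MonitorSketch();
        withStop.startMonitoring("container_1", "100");
        withStop.stopMonitoring("container_1");
        withStop.startMonitoring("container_1", "200");
        System.out.println(withStop.trackedPid("container_1")); // prints 200
    }
}
```

In this model, the monitor that never stops tracking keeps pointing at the PID of the pre-reinitialization process, which matches the symptom that a later stop request does not kill the running container.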

      Attachments

        1. YARN-9071.001.patch
          5 kB
          Chandni Singh
        2. YARN-9071.002.patch
          9 kB
          Chandni Singh
        3. YARN-9071.003.patch
          6 kB
          Chandni Singh
        4. YARN-9071.004.patch
          12 kB
          Chandni Singh
        5. q.log
          166 kB
          Eric Yang
        6. YARN-9071.005.patch
          17 kB
          Chandni Singh
        7. YARN-9071.006.patch
          17 kB
          Chandni Singh

          People

            Assignee: Chandni Singh (csingh)
            Reporter: Billie Rinaldi (billie)
            Votes: 0
            Watchers: 6
