Hadoop YARN / YARN-3464

Race condition in LocalizerRunner kills localizer before localizing all resources


Details

    • Reviewed

    Description

      A race condition in LocalizerRunner causes container localization to time out.
      Currently, LocalizerRunner kills the ContainerLocalizer when the pending list of LocalizerResourceRequestEvents is empty:

            } else if (pending.isEmpty()) {
              // No pending resource requests: tell the ContainerLocalizer to exit.
              action = LocalizerAction.DIE;
            }
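A simplified, hypothetical model (not Hadoop code) of the heartbeat decision that the quoted branch belongs to is sketched below. `LocalizerHeartbeatModel`, `request` and `heartbeat` are illustrative names. The sketch assumes that a resource leaves the pending list once the localizer reports it as downloaded, so an empty pending list is treated as "all work done".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of how LocalizerRunner answers a
// ContainerLocalizer heartbeat. Names are illustrative, not Hadoop's API.
public class LocalizerHeartbeatModel {
    enum LocalizerAction { LIVE, DIE }

    // Requested resources that have not finished downloading.
    private final Set<String> pending = new LinkedHashSet<>();
    // Subset of pending that has already been handed to the localizer.
    private final Set<String> scheduled = new LinkedHashSet<>();

    // Stand-in for a LocalizerResourceRequestEvent arriving.
    void request(String resource) {
        pending.add(resource);
    }

    // completed: resources the localizer reports as downloaded.
    // toFetch: filled with resources the localizer should download next.
    LocalizerAction heartbeat(List<String> completed, List<String> toFetch) {
        for (String r : completed) {
            pending.remove(r);
            scheduled.remove(r);
        }
        for (String r : pending) {
            if (scheduled.add(r)) { // true only if r was not yet handed out
                toFetch.add(r);
            }
        }
        if (!toFetch.isEmpty()) {
            return LocalizerAction.LIVE;
        } else if (pending.isEmpty()) {
            // Same condition as the quoted branch: nothing pending, so exit.
            return LocalizerAction.DIE;
        }
        // Downloads are still in flight: keep the localizer alive.
        return LocalizerAction.LIVE;
    }

    public static void main(String[] args) {
        LocalizerHeartbeatModel runner = new LocalizerHeartbeatModel();
        runner.request("job.jar");

        List<String> toFetch = new ArrayList<>();
        System.out.println(runner.heartbeat(List.of(), toFetch) + " " + toFetch);
        // prints: LIVE [job.jar]
        toFetch.clear();
        System.out.println(runner.heartbeat(List.of("job.jar"), toFetch) + " " + toFetch);
        // prints: DIE []
    }
}
```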
      

      If a LocalizerResourceRequestEvent is added after LocalizerRunner has killed the ContainerLocalizer because the pending list was empty, that LocalizerResourceRequestEvent is never handled.
      Without a running ContainerLocalizer, LocalizerRunner#update is never called, since update is only invoked in response to ContainerLocalizer heartbeats.
      The container stays in the LOCALIZING state until the AM kills it due to TASK_TIMEOUT.
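The lost-request interleaving described above can be replayed deterministically with a simplified model. Everything here is hypothetical: `RaceReplay`, `Runner`, and especially the `allRequestsSubmitted` flag, which stands in for an assumed signal that the container has finished sending its requests. The `guarded` mode is one possible mitigation shown for illustration only; it is not necessarily what the attached patches implement.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, deterministic replay of the lost-request race.
// None of these names are Hadoop code.
public class RaceReplay {
    enum Action { LIVE, DIE }

    static class Runner {
        final Set<String> pending = new LinkedHashSet<>();
        final Set<String> scheduled = new LinkedHashSet<>();
        final boolean guarded;        // hypothetical mitigation switch
        boolean allRequestsSubmitted; // assumed "no more requests" signal

        Runner(boolean guarded) { this.guarded = guarded; }

        void request(String r) { pending.add(r); }

        Action heartbeat(List<String> completed, List<String> toFetch) {
            for (String r : completed) {
                pending.remove(r);
                scheduled.remove(r);
            }
            for (String r : pending) {
                if (scheduled.add(r)) {
                    toFetch.add(r);
                }
            }
            if (!toFetch.isEmpty()) {
                return Action.LIVE;
            }
            // Unguarded: an empty pending list alone triggers DIE (the bug).
            // Guarded: DIE is withheld until every request has been submitted.
            boolean mayExit = pending.isEmpty() && (!guarded || allRequestsSubmitted);
            return mayExit ? Action.DIE : Action.LIVE;
        }
    }

    // Request job.jar, fetch it, report it done; dict.txt's request arrives late.
    // Returns true if dict.txt is eventually handed to a live localizer.
    static boolean replay(boolean guarded) {
        Runner runner = new Runner(guarded);
        runner.request("job.jar");
        List<String> toFetch = new ArrayList<>();
        runner.heartbeat(List.of(), toFetch); // LIVE, fetch job.jar
        toFetch.clear();
        Action a = runner.heartbeat(List.of("job.jar"), toFetch);
        // The second request event is only delivered now.
        runner.request("dict.txt");
        runner.allRequestsSubmitted = true;
        if (a == Action.DIE) {
            // The ContainerLocalizer has exited: no further heartbeats,
            // so nothing ever picks up dict.txt.
            return false;
        }
        toFetch.clear();
        runner.heartbeat(List.of(), toFetch);
        return toFetch.contains("dict.txt");
    }

    public static void main(String[] args) {
        System.out.println("unguarded: dict.txt localized = " + replay(false));
        // prints: unguarded: dict.txt localized = false
        System.out.println("guarded:   dict.txt localized = " + replay(true));
        // prints: guarded:   dict.txt localized = true
    }
}
```

In the unguarded replay the runner answers DIE as soon as job.jar is reported done, so the late dict.txt request is stranded, matching the LOCALIZING hang described above. In the guarded replay the localizer keeps heartbeating and picks up the late request.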

      Attachments

        1. YARN-3464-branch-2.6.1.txt (15 kB, Vinod Kumar Vavilapalli)
        2. YARN-3464.001.patch (13 kB, Zhihai Xu)
        3. YARN-3464.000.patch (11 kB, Zhihai Xu)


          People

            Assignee: Zhihai Xu (zxu)
            Reporter: Zhihai Xu (zxu)
            Votes: 0
            Watchers: 12
