Hadoop YARN / YARN-8990

Fix fair scheduler race condition in app submit and queue cleanup


Details

    • Reviewed

    Description

      The dynamic queue deletion added in YARN-8191 also introduced a race condition that can cause a queue to be removed while an application submission is in progress.

      The issue occurs in FairScheduler.addApplication() when an application is submitted to a dynamic queue that is empty or does not exist yet. The application submit first creates the queue and gets a reference back to it. Other checks are then performed, and as the last action before getting ready to generate an AppAttempt, the queue is updated to show the submitted application ID.
      If the AllocationFileLoaderService kicks off an update while the submit is being processed, the queue cleanup can run before the submit has finished.
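      The submit sequence and the cleanup can be modelled with a small, single-threaded Java sketch. All names here are hypothetical and heavily simplified: QueueRaceSketch, DynamicQueue, getOrCreateQueue, removeEmptyDynamicQueues, and the queue and application IDs. They are not the FairScheduler, QueueManager or AllocationFileLoaderService API. The cleanup is injected by hand between the two submit steps so that the window reproduces deterministically. Each step takes the lock, but the lock is released between creating the queue and recording the application.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the submit/cleanup race.
 * None of these names are the real FairScheduler or QueueManager API.
 */
public class QueueRaceSketch {

    /** A dynamic leaf queue that only tracks the IDs of its applications. */
    static final class DynamicQueue {
        final String name;
        final Set<String> apps = new HashSet<>();
        DynamicQueue(String name) { this.name = name; }
        boolean isEmpty() { return apps.isEmpty(); }
    }

    private final Map<String, DynamicQueue> queues = new HashMap<>();

    /** Submit step 1: create the queue if needed and hand back a reference. */
    synchronized DynamicQueue getOrCreateQueue(String name) {
        return queues.computeIfAbsent(name, DynamicQueue::new);
    }

    /** Cleanup triggered by an allocation reload: drop every empty dynamic queue. */
    synchronized void removeEmptyDynamicQueues() {
        queues.values().removeIf(DynamicQueue::isEmpty);
    }

    /** True if the manager still knows the queue and it lists the application. */
    synchronized boolean isTracked(String queueName, String appId) {
        DynamicQueue q = queues.get(queueName);
        return q != null && q.apps.contains(appId);
    }

    /** Simulates one submit, optionally running the cleanup inside the window. */
    public static boolean submit(boolean reloadDuringSubmit) {
        QueueRaceSketch mgr = new QueueRaceSketch();
        DynamicQueue queue = mgr.getOrCreateQueue("root.users.alice");
        // ... other submit checks would run here, without holding the lock ...
        if (reloadDuringSubmit) {
            mgr.removeEmptyDynamicQueues(); // queue is still empty, so it is removed
        }
        // Last step: record the app on the reference, which may now be orphaned.
        synchronized (mgr) {
            queue.apps.add("application_1_0001");
        }
        return mgr.isTracked("root.users.alice", "application_1_0001");
    }

    public static void main(String[] args) {
        System.out.println("no reload:     tracked=" + submit(false));
        System.out.println("reload in gap: tracked=" + submit(true));
    }
}
```

      Running main reports the application as tracked when no reload happens, and as not tracked when the cleanup lands between the two steps. In the second case, the application was recorded on a queue object that the manager no longer references.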

      The window between creating the queue and updating it to show the submitted application is long enough for the cleanup to remove the queue. When that happens, the application is lost and will never be assigned any resources.
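      One way to close this window is sketched below. Treat it as an assumption, not as the approach taken in the attached patches: the queue is created and the application is recorded in a single critical section, so the cleanup can never observe the new queue while it is still empty. The class and method names (AtomicSubmitSketch, assignToQueue, removeEmptyDynamicQueues) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: create the queue and register the application
 * atomically with respect to the cleanup. Not the real FairScheduler API.
 */
public class AtomicSubmitSketch {

    // Queue name -> IDs of applications assigned to that dynamic queue.
    private final Map<String, Set<String>> queues = new HashMap<>();

    /** Create the queue if needed and record the app before releasing the lock. */
    public synchronized void assignToQueue(String queueName, String appId) {
        queues.computeIfAbsent(queueName, k -> new HashSet<>()).add(appId);
    }

    /** Reload cleanup: remove dynamic queues that have no applications. */
    public synchronized void removeEmptyDynamicQueues() {
        queues.values().removeIf(Set::isEmpty);
    }

    /** True if the queue still exists and lists the application. */
    public synchronized boolean isTracked(String queueName, String appId) {
        Set<String> apps = queues.get(queueName);
        return apps != null && apps.contains(appId);
    }

    /** Simulates a submit with a reload cleanup right after queue assignment. */
    public static boolean submitWithReload() {
        AtomicSubmitSketch mgr = new AtomicSubmitSketch();
        mgr.assignToQueue("root.users.alice", "application_1_0001");
        // ... remaining submit checks; a reload may run its cleanup here ...
        mgr.removeEmptyDynamicQueues(); // queue is not empty, so it survives
        return mgr.isTracked("root.users.alice", "application_1_0001");
    }

    public static void main(String[] args) {
        System.out.println("reload after assignment: tracked=" + submitWithReload());
    }
}
```

      A trade-off of this design is that a submit rejected by a later check must remove its application ID from the queue again. Otherwise, the queue is never seen as empty and is never cleaned up.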

      Attachments

        1. YARN-8990.001.patch
          13 kB
          wilfreds#1
        2. YARN-8990.002.patch
          28 kB
          Haibo Chen


            People

              Assignee:
              wilfreds Wilfred Spiegelenburg
              Reporter:
              wilfreds Wilfred Spiegelenburg
              Votes:
              0
              Watchers:
              9

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  50m