  1. Hadoop YARN
  2. YARN-5543

ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread


Details

    • Reviewed

    Description

      When the SchedulingMonitor service starts (SchedulingMonitor.java), it launches a checker thread that performs the Capacity Scheduler's preemption. However, the implementation of this checker thread has the following issue:

      while (!stopped && !Thread.currentThread().isInterrupted()) {
          ....
          try {
            Thread.sleep(monitorInterval);
          } catch (InterruptedException e) {
            ....
            break;
          }
      }
      

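      The above loop exits, and the checker thread terminates, as soon as the thread is interrupted: either the sleep throws InterruptedException and hits the break, or the loop condition observes the interrupt flag.
      We noticed in our cluster that this can leave CapacityScheduler's preemption unexpectedly disabled, because the checker thread has terminated.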

      We propose to use ScheduledExecutorService to improve the robustness of this part of the code to ensure the liveness of CapacityScheduler's preemption functionality.
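
      As a rough illustration of the proposal above, the sketch below runs a periodic check on a ScheduledExecutorService instead of a hand-written sleep loop. It is not the attached patch: the class name PeriodicChecker, the runOnce helper, and the thread name are hypothetical, and the check is passed in as a plain Runnable. The wait between runs is handled by the executor, and each run catches its own failures, since an exception escaping a task scheduled with scheduleWithFixedDelay suppresses all later runs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the monitor service; not the real SchedulingMonitor. */
class PeriodicChecker {
  private final Runnable check;      // the periodic work, e.g. one preemption pass
  private final long intervalMs;     // delay between the end of one run and the next
  private ScheduledExecutorService executor;
  private ScheduledFuture<?> handle;

  PeriodicChecker(Runnable check, long intervalMs) {
    this.check = check;
    this.intervalMs = intervalMs;
  }

  synchronized void start() {
    // Single daemon worker thread; the name is illustrative only.
    executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "periodic-checker");
      t.setDaemon(true);
      return t;
    });
    handle = executor.scheduleWithFixedDelay(
        this::runOnce, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  private void runOnce() {
    try {
      check.run();
    } catch (Throwable t) {
      // Swallow and report: letting this escape would cancel every future run.
      System.err.println("Periodic check failed, will run again: " + t);
    }
  }

  synchronized void stop() {
    if (executor == null) {
      return;
    }
    if (handle != null) {
      handle.cancel(true);
    }
    executor.shutdownNow();
    try {
      executor.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status instead of dropping it.
      Thread.currentThread().interrupt();
    }
    executor = null;
  }
}
```

      A caller would construct it with the check and the interval, for example new PeriodicChecker(task, 3000), call start() when the service starts, and call stop() when the service stops. Only an explicit stop() ends the schedule in this sketch; a failing run is reported and the next run still happens.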

      Attachments

        1. YARN-5543.001.patch
          4 kB
          Min Shen
        2. YARN-5543.002.patch
          5 kB
          Min Shen
        3. YARN-5543.003.patch
          5 kB
          Min Shen
        4. YARN-5543-branch-2.7.001.patch
          5 kB
          Jonathan Hung
        5. YARN-5543.004.patch
          5 kB
          Jonathan Hung
        6. YARN-5543-branch-2.7.002.patch
          5 kB
          Jonathan Hung

            People

              mshen Min Shen
              mshen Min Shen
              Votes: 0
              Watchers: 12
