Hadoop YARN / YARN-5543

ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread

    Details

    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      When the service in SchedulingMonitor.java starts, it launches a checker thread that performs the CapacityScheduler's preemption. However, the implementation of this checker thread has the following issue:

      while (!stopped && !Thread.currentThread().isInterrupted()) {
          ....
          try {
            Thread.sleep(monitorInterval);
          } catch (InterruptedException e) {
            ....
            break;
          }
      }
      

      The above code snippet terminates the checker thread whenever it is interrupted.
      We noticed in our cluster that this can leave CapacityScheduler's preemption unexpectedly disabled, because the checker thread has terminated.
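
To make the failure mode concrete, here is a minimal standalone sketch of the same loop shape. It is not the SchedulingMonitor code: the class name InterruptedCheckerDemo, the helper startChecker, and the thread name are hypothetical, the `stopped` flag is left out, and the preemption work is replaced by a comment.

```java
class InterruptedCheckerDemo {

    // Starts a thread whose loop has the same shape as the snippet above.
    static Thread startChecker(long intervalMs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // (the periodic preemption check would run here)
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    break; // same exit path as in the snippet: the loop ends for good
                }
            }
        }, "checker-demo");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread checker = startChecker(50);
        Thread.sleep(120);          // let the loop run a couple of iterations
        checker.interrupt();        // a single interrupt, from any source
        checker.join(1000);
        System.out.println("alive after interrupt: " + checker.isAlive());
    }
}
```

After the single interrupt the thread exits and nothing in the loop starts it again, which matches the behaviour described above.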

      We propose to use ScheduledExecutorService to improve the robustness of this part of the code to ensure the liveness of CapacityScheduler's preemption functionality.
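
As a rough illustration of the proposal, the sketch below schedules a periodic check on a ScheduledExecutorService. It is an assumption-laden sketch, not the attached patch: PeriodicCheckerSketch, its `check` Runnable, and the `attempts` counter are hypothetical names. One detail worth noting is that a periodic task scheduled this way is cancelled if a run throws, so the sketch catches Throwable inside the task.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicCheckerSketch {
    // Single daemon worker thread; the executor owns the thread's lifecycle.
    private final ScheduledExecutorService ses =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "checker-sketch");
            t.setDaemon(true);
            return t;
        });
    private final AtomicInteger attempts = new AtomicInteger();
    private final Runnable check;   // hypothetical stand-in for the preemption check
    private final long intervalMs;

    PeriodicCheckerSketch(Runnable check, long intervalMs) {
        this.check = check;
        this.intervalMs = intervalMs;
    }

    void start() {
        ses.scheduleWithFixedDelay(() -> {
            attempts.incrementAndGet();
            try {
                check.run();
            } catch (Throwable t) {
                // An exception escaping a periodic task suppresses all later
                // runs, so failures are reported and the schedule continues.
                System.err.println("check failed, will retry: " + t);
            }
        }, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    int attempts() {
        return attempts.get();
    }

    void stop() {
        ses.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicCheckerSketch c = new PeriodicCheckerSketch(
            () -> { throw new IllegalStateException("simulated failure"); }, 20);
        c.start();
        Thread.sleep(200);
        c.stop();
        System.out.println("attempts despite failures: " + c.attempts());
    }
}
```

With this structure, a failed or interrupted run should affect only that run; the executor keeps scheduling the next one until stop() is called.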

        Attachments

        1. YARN-5543.001.patch
          4 kB
          Min Shen
        2. YARN-5543.002.patch
          5 kB
          Min Shen
        3. YARN-5543.003.patch
          5 kB
          Min Shen
        4. YARN-5543.004.patch
          5 kB
          Jonathan Hung
        5. YARN-5543-branch-2.7.001.patch
          5 kB
          Jonathan Hung
        6. YARN-5543-branch-2.7.002.patch
          5 kB
          Jonathan Hung

              People

              • Assignee: Min Shen (mshen)
              • Reporter: Min Shen (mshen)
              • Votes: 0
              • Watchers: 12
