Hadoop YARN / YARN-6678

Handle IllegalStateException in Async Scheduling mode of CapacityScheduler

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha3
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: capacityscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Error log:

      java.lang.IllegalStateException: Trying to reserve container container_e10_1495599791406_7129_01_001453 for application appattempt_1495599791406_7129_000001 when currently reserved container container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 #containers=40 available=... used=...
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
      

      Steps to reproduce:
      1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1.
      2. nm2 had enough resources for app-1, so app-1/container-X1 was un-reserved and app-1/container-X2 was allocated on nm2.
      3. nm1 reserved app-2/container-Y.
      4. proposal-1 was accepted, but applying it threw an IllegalStateException.
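
      The four steps can be replayed with a small stand-alone model. This is only a sketch: SimpleNode and StaleProposalDemo are hypothetical names, containers are plain strings, and the guard in reserve() is a simplified assumption about what FiCaSchedulerNode#reserveResource enforces, not a copy of the real code.

```java
// Hypothetical, simplified stand-in for a scheduler node (not the real FiCaSchedulerNode).
final class SimpleNode {
    private final String name;
    private String reservedContainer; // null when nothing is reserved on this node

    SimpleNode(String name) {
        this.name = name;
    }

    String getReservedContainer() {
        return reservedContainer;
    }

    // Assumed guard: re-reserving the same container is allowed,
    // reserving a different one while the node is reserved is not.
    void reserve(String containerId) {
        if (reservedContainer != null && !reservedContainer.equals(containerId)) {
            throw new IllegalStateException("Trying to reserve container " + containerId
                + " when currently reserved container " + reservedContainer
                + " on node " + name);
        }
        reservedContainer = containerId;
    }

    void unreserve() {
        reservedContainer = null;
    }
}

final class StaleProposalDemo {
    // Replays the reproduction steps; returns the exception message, or null if nothing was thrown.
    static String replay() {
        SimpleNode nm1 = new SimpleNode("nm1");
        nm1.reserve("app-1/container-X1");        // earlier reservation on nm1

        String proposal1 = "app-1/container-X1";  // step 1: re-reserve proposal generated, not yet applied
        nm1.unreserve();                          // step 2: app-1 got container-X2 on nm2, X1 is un-reserved
        nm1.reserve("app-2/container-Y");         // step 3: nm1 is now reserved for app-2

        try {
            nm1.reserve(proposal1);               // step 4: the stale proposal is applied
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

      In this model the failure appears only because proposal-1 is applied after the node state it was computed against has changed, which is the window that async scheduling opens between generating and committing a proposal.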

      Currently, the check for reserve proposals in FiCaSchedulerApp#accept is as follows:

                // Container reserved first time will be NEW, after the container
                // accepted & confirmed, it will become RESERVED state
                if (schedulerContainer.getRmContainer().getState()
                    == RMContainerState.RESERVED) {
                  // Set reReservation == true
                  reReservation = true;
                } else {
                  // When reserve a resource (state == NEW is for new container,
                  // state == RUNNING is for increase container).
                  // Just check if the node is not already reserved by someone
                  if (schedulerContainer.getSchedulerNode().getReservedContainer()
                      != null) {
                    if (LOG.isDebugEnabled()) {
                      LOG.debug("Try to reserve a container, but the node is "
                          + "already reserved by another container="
                          + schedulerContainer.getSchedulerNode()
                          .getReservedContainer().getContainerId());
                    }
                    return false;
                  }
                }
      

      The node's currently reserved container is checked only when a container is reserved for the first time.
      For a re-reservation, we should also confirm that the container currently reserved on the node is the same container that is being re-reserved.
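
      A minimal sketch of the proposed check, under stated assumptions: ReserveProposalCheck, its ContainerState enum, and the string container IDs are hypothetical simplifications for illustration, and this is not the code from the attached patches.

```java
// Hypothetical, simplified model of the accept-time check for reserve proposals.
final class ReserveProposalCheck {
    enum ContainerState { NEW, RESERVED, RUNNING }

    /**
     * Returns true when a reserve proposal may be accepted.
     *
     * @param state             state of the proposed container
     * @param proposedContainer ID of the container the proposal wants to reserve
     * @param reservedOnNode    ID of the container currently reserved on the node, or null
     */
    static boolean accept(ContainerState state, String proposedContainer, String reservedOnNode) {
        if (state == ContainerState.RESERVED) {
            // Re-reservation: the node must still hold this very container.
            // A different (or missing) reservation means the proposal is outdated.
            return proposedContainer.equals(reservedOnNode);
        }
        // First reservation (NEW) or increase (RUNNING): the node must not be reserved by anyone.
        return reservedOnNode == null;
    }
}
```

      With a check of this shape, the stale proposal-1 from the reproduction steps would be rejected at accept time, because nm1 holds app-2/container-Y rather than app-1/container-X1, instead of failing later when it is applied.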

      Attachments

        1. YARN-6678.001.patch
          14 kB
          Tao Yang
        2. YARN-6678.002.patch
          12 kB
          Tao Yang
        3. YARN-6678.003.patch
          12 kB
          Tao Yang
        4. YARN-6678.004.patch
          12 kB
          Tao Yang
        5. YARN-6678.005.patch
          11 kB
          Tao Yang
        6. YARN-6678.branch-2.005.patch
          11 kB
          Tao Yang

          People

            Assignee: Tao Yang
            Reporter: Tao Yang
            Votes: 0
            Watchers: 5
