Hadoop YARN / YARN-7591

NPE in async-scheduling mode of CapacityScheduler

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha4, 2.9.1
    • Fix Version/s: 3.0.0, 2.9.1
    • Component/s: capacityscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, in the async-scheduling mode of CapacityScheduler, an NPE may be raised in the following scenarios.
      (1) A user is removed after its last application finishes, so an NPE may be raised if an async-scheduling thread reads from the user object without a null check.
      (2) An NPE may be raised when trying to fulfill a reservation for a finished application in CapacityScheduler#allocateContainerOnSingleNode.

          RMContainer reservedContainer = node.getReservedContainer();
          if (reservedContainer != null) {
            FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
                reservedContainer.getContainerId());
      
            // NPE here: reservedApplication could be null after this application finished
            // Try to fulfill the reservation
            LOG.info(
                "Trying to fulfill reservation for application " + reservedApplication
                    .getApplicationId() + " on node: " + node.getNodeID());
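
      A minimal sketch of one way to guard scenario (2), assuming it is acceptable to skip the reservation when the application has already finished. The class ReservationGuardSketch, its nested App type, and the map-based lookup are hypothetical stand-ins for illustration; they are not the real CapacityScheduler or FiCaSchedulerApp APIs, and this is not necessarily what the attached patches do.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scheduler; not the real CapacityScheduler.
class ReservationGuardSketch {
  // Hypothetical stand-in for an application attempt.
  static final class App {
    final String applicationId;

    App(String applicationId) {
      this.applicationId = applicationId;
    }
  }

  // Maps a reserved container id to the attempt that owns it.
  private final Map<String, App> attemptsByContainerId = new ConcurrentHashMap<>();

  void register(String containerId, App app) {
    attemptsByContainerId.put(containerId, app);
  }

  // Simulates the application finishing while a reservation is still visible.
  void finishApplication(String containerId) {
    attemptsByContainerId.remove(containerId);
  }

  // Builds the log message only when the owning application still exists.
  Optional<String> describeFulfillment(String reservedContainerId, String nodeId) {
    if (reservedContainerId == null) {
      return Optional.empty(); // nothing is reserved on this node
    }
    App reservedApplication = attemptsByContainerId.get(reservedContainerId);
    if (reservedApplication == null) {
      // Assumed handling: the application finished, so skip instead of dereferencing.
      return Optional.empty();
    }
    return Optional.of("Trying to fulfill reservation for application "
        + reservedApplication.applicationId + " on node: " + nodeId);
  }
}
```

      The lookup result is held in a local variable and checked once, so a concurrent removal between the lookup and the dereference cannot cause an NPE.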
      

      (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on node1) are generated by different async-scheduling threads at around the same time, and proposal2 is submitted before proposal1, an NPE is raised when trying to submit proposal2 in FiCaSchedulerApp#commonCheckContainerAllocation.

          if (reservedContainerOnNode != null) {
            // NPE here: allocation.getAllocateFromReservedContainer() should be null for proposal2 in this case
            RMContainer fromReservedContainer =
                allocation.getAllocateFromReservedContainer().getRmContainer();
      
            if (fromReservedContainer != reservedContainerOnNode) {
              if (LOG.isDebugEnabled()) {
                LOG.debug(
                    "Try to allocate from a non-existed reserved container");
              }
              return false;
            }
          }
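
      A minimal sketch of a null-safe version of the check in scenario (3), assuming a proposal should simply be rejected when the node already holds a reservation that the proposal does not allocate from. ProposalCheckSketch and its Container type are hypothetical stand-ins, not the real FiCaSchedulerApp or RMContainer classes.

```java
// Hypothetical stand-in for the commit-time check; not the real FiCaSchedulerApp.
class ProposalCheckSketch {
  // Hypothetical stand-in for a reserved container.
  static final class Container {
    final String id;

    Container(String id) {
      this.id = id;
    }
  }

  // Returns true when the proposal may proceed, false when it must be rejected.
  // allocateFromReserved is null when the proposal does not fulfill a reservation.
  static boolean commonCheck(Container reservedContainerOnNode,
      Container allocateFromReserved) {
    if (reservedContainerOnNode != null) {
      if (allocateFromReserved == null) {
        // Assumed handling: the node was reserved after this proposal was generated.
        return false;
      }
      // Identity comparison mirrors the original snippet's != check.
      if (allocateFromReserved != reservedContainerOnNode) {
        return false;
      }
    }
    return true;
  }
}
```

      Rejecting the proposal leaves the existing reservation untouched; the scheduler can generate a fresh proposal against the updated node state on a later pass.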
      

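      For scenario (1), the guard can be sketched as follows, assuming a removed user can be treated as consuming no resources. UserLookupSketch, its User type, and the usedMemoryOrZero method are hypothetical names for illustration; the real queue and user classes in CapacityScheduler differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-queue user registry; not a real YARN class.
class UserLookupSketch {
  // Hypothetical stand-in for the scheduler's per-user bookkeeping object.
  static final class User {
    final long usedMemoryMb;

    User(long usedMemoryMb) {
      this.usedMemoryMb = usedMemoryMb;
    }
  }

  private final Map<String, User> users = new ConcurrentHashMap<>();

  void addUser(String name, long usedMemoryMb) {
    users.put(name, new User(usedMemoryMb));
  }

  // Simulates removal of the user once its last application has finished.
  void removeUser(String name) {
    users.remove(name);
  }

  // Reads the user once into a local and null-checks it before any field access.
  long usedMemoryOrZero(String name) {
    User user = users.get(name);
    if (user == null) {
      return 0L; // assumed fallback: a removed user is treated as using nothing
    }
    return user.usedMemoryMb;
  }
}
```

      An async-scheduling thread calling usedMemoryOrZero after the user has been removed gets the fallback value rather than an NPE.
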
      Attachments

        1. YARN-7591.001.patch
          5 kB
          Tao Yang
        2. YARN-7591.002.patch
          5 kB
          Tao Yang

          People

            Assignee: Tao Yang
            Reporter: Tao Yang
            Votes: 0
            Watchers: 4
