Hadoop YARN / YARN-8372

Distributed shell app master should not release containers when shutdown if keep-container is true


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: distributed-shell
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      try {
        response = client.allocate(progress);
      } catch (ApplicationAttemptNotFoundException e) {
        // The RM does not recognize this attempt: notify the handler and stop this callback.
        handler.onShutdownRequest();
        LOG.info("Shutdown requested. Stopping callback.");
        return;
      }

      The snippet above is from AMRMClientAsyncImpl. The corresponding onShutdownRequest implementation in the Distributed Shell application master is:

      @Override
      public void onShutdownRequest() {
        done = true;
      }

      Due to the above change, whenever an application attempt fails because of an NM restart (the NM where the DS AM is running), an ApplicationAttemptNotFoundException is thrown, and all containers for that attempt, including those running on other NMs, are killed by the AM and marked as COMPLETE. The subsequent attempt then spawns new containers, just as a fresh attempt would. This differs from a MapReduce application, where the containers are not killed.
      cc Rohith Sharma K S
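
      The behavior requested in the title can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use the Hadoop API: the class KeepContainersSketch, its keepContainers flag, and the finish() cleanup step are hypothetical names introduced only to show the idea that a shutdown request should stop the AM without releasing containers when keep-container is true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Distributed Shell AM; not the real Hadoop class.
final class KeepContainersSketch {
  // Assumed to mirror the keep-container setting named in the issue title.
  private final boolean keepContainers;
  // IDs of containers launched by the current attempt (illustrative strings).
  private final List<String> running;
  private final List<String> released = new ArrayList<>();
  private volatile boolean done;

  KeepContainersSketch(boolean keepContainers, List<String> running) {
    this.keepContainers = keepContainers;
    this.running = new ArrayList<>(running);
  }

  // Same shape as the callback quoted in the description: it only sets the flag.
  void onShutdownRequest() {
    done = true;
  }

  // Hypothetical cleanup step that runs once the AM has been marked done.
  void finish() {
    if (!done) {
      return;
    }
    if (keepContainers) {
      // Leave the containers running so a later attempt could pick them up.
      return;
    }
    released.addAll(running);
    running.clear();
  }

  boolean isDone() {
    return done;
  }

  List<String> running() {
    return running;
  }

  List<String> released() {
    return released;
  }

  public static void main(String[] args) {
    KeepContainersSketch am = new KeepContainersSketch(true, List.of("c1", "c2"));
    am.onShutdownRequest();
    am.finish();
    System.out.println("still running: " + am.running());
    System.out.println("released: " + am.released());
  }
}
```

      With keepContainers set to false, the same call sequence releases every container, which corresponds to the behavior reported above.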

      Attachments

        1. YARN-8372.3.patch
          5 kB
          Suma Shivaprasad
        2. YARN-8372.2.patch
          5 kB
          Suma Shivaprasad
        3. YARN-8372.1.patch
          2 kB
          Suma Shivaprasad

          People

            Assignee: Suma Shivaprasad (suma.shivaprasad)
            Reporter: Charan Hebri (charanh)
            Votes: 0
            Watchers: 6
