  1. Apache Tez
  2. TEZ-3426

Second AM attempt launched for session mode and recovery disabled for certain cases

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.2, 0.9.0, 0.8.5
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      ApplicationSubmissionContext#setMaxAppAttempts does not fully guarantee that there will be at most that many attempts. A few exceptional cases, listed in the YARN code below, are not counted toward the limit. Tez should protect itself from accidentally starting a second attempt in session mode and when recovery is disabled, since the second attempt will always succeed with no work to do.

        @Override
        public boolean shouldCountTowardsMaxAttemptRetry() {
          try {
            this.readLock.lock();
            int exitStatus = getAMContainerExitStatus();
            return !(exitStatus == ContainerExitStatus.PREEMPTED
                || exitStatus == ContainerExitStatus.ABORTED
                || exitStatus == ContainerExitStatus.DISKS_FAILED
                || exitStatus == ContainerExitStatus.KILLED_BY_RESOURCEMANAGER);
          } finally {
            this.readLock.unlock();
          }
        }
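
      The interaction described above can be sketched in a self-contained way. This is a simplified model, not the attached patch and not YARN's or Tez's actual code: the class SecondAttemptGuard, the ExitStatus enum (a stand-in for ContainerExitStatus), and every method name are hypothetical. It also assumes the guard should fire only when session mode and disabled recovery hold together.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and structure are illustrative assumptions. */
public final class SecondAttemptGuard {

    /** Stand-in for YARN's ContainerExitStatus constants (assumption). */
    enum ExitStatus { FAILED, PREEMPTED, ABORTED, DISKS_FAILED, KILLED_BY_RESOURCEMANAGER }

    /** Exit statuses that the quoted method excludes from the retry count. */
    private static final Set<ExitStatus> NOT_COUNTED = EnumSet.of(
            ExitStatus.PREEMPTED,
            ExitStatus.ABORTED,
            ExitStatus.DISKS_FAILED,
            ExitStatus.KILLED_BY_RESOURCEMANAGER);

    private SecondAttemptGuard() {}

    /** Number of past attempts that count toward the max-attempts limit. */
    static int countedAttempts(List<ExitStatus> pastExitStatuses) {
        int counted = 0;
        for (ExitStatus status : pastExitStatuses) {
            if (!NOT_COUNTED.contains(status)) {
                counted++;
            }
        }
        return counted;
    }

    /** Simplified model: another attempt starts while counted attempts stay below the limit. */
    static boolean wouldLaunchAnotherAttempt(List<ExitStatus> pastExitStatuses, int maxAppAttempts) {
        return countedAttempts(pastExitStatuses) < maxAppAttempts;
    }

    /** AM-side guard: refuse any attempt after the first in session mode without recovery. */
    static boolean shouldRejectAttempt(int attemptNumber, boolean sessionMode, boolean recoveryEnabled) {
        return attemptNumber > 1 && sessionMode && !recoveryEnabled;
    }

    public static void main(String[] args) {
        // A preempted first attempt is not counted, so a limit of 1 still allows attempt 2.
        List<ExitStatus> past = Arrays.asList(ExitStatus.PREEMPTED);
        System.out.println(wouldLaunchAnotherAttempt(past, 1)); // true

        // The guard rejects that second attempt, but never the first one.
        System.out.println(shouldRejectAttempt(2, true, false)); // true
        System.out.println(shouldRejectAttempt(1, true, false)); // false
    }
}
```

      Checking on the AM side means the application does not depend on the resource manager's attempt accounting, which is where the uncounted cases arise.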
      

      Attachments

        1. TEZ-3426.001.patch
          0.9 kB
          Jason Darrell Lowe
        2. TEZ-3426.002.patch
          1 kB
          Jason Darrell Lowe
        3. TEZ-3426.003.patch
          8 kB
          Jason Darrell Lowe
        4. TEZ-3426.004.patch
          8 kB
          Jason Darrell Lowe
        5. TEZ-3426-branch-0.7.004.patch
          8 kB
          Jason Darrell Lowe

            People

              Assignee: Jason Darrell Lowe (jlowe)
              Reporter: Jonathan Turner Eagles (jeagles)
              Votes: 0
              Watchers: 5