SPARK-23888

speculative task should not run on a given host where another attempt is already running



    Description

       

      There's a bug in:

      /** Check whether a task is currently running an attempt on a given host */
       private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
         taskAttempts(taskIndex).exists(_.host == host)
       }
      

      This check matches any attempt that has ever run on the host, including finished ones, so hosts whose attempts have already finished are still ruled out for a speculative copy. To match the comment, we should check whether an attempt is currently running on the given host.
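
      A minimal sketch of what that running-only check could look like (this assumes the TaskInfo objects kept in taskAttempts expose a running flag, as Spark's TaskInfo does; hasRunningAttemptOnHost is only an illustrative name, not the change that was finally merged):

      /** Check whether a task currently has a running attempt on the given host */
       private def hasRunningAttemptOnHost(taskIndex: Int, host: String): Boolean = {
         taskAttempts(taskIndex).exists(info => info.running && info.host == host)
       }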

      With a running-only check, it also becomes possible for a speculative task to run on a host where another attempt failed before. For example:

      Assume we have only two machines, host1 and host2. We first run task0.0 on host1. Then, after a long wait for task0.0, we launch a speculative task0.1 on host2. Eventually task0.0 fails on host1, but it cannot be re-run right away since there is already a copy running on host2. After another long wait, we launch a new speculative task0.2, and now task0.2 can run on host1 again, since there is no longer a running attempt on host1.
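
      To make the timeline concrete, here is a small self-contained sketch; Attempt and the two helper checks below are simplified stand-ins for the real TaskInfo and scheduler state, written only to show how the two checks differ once task0.0 has failed on host1:

      object SpeculativeHostCheckDemo {
        // Simplified stand-in for org.apache.spark.scheduler.TaskInfo.
        final case class Attempt(host: String, var finished: Boolean = false) {
          def running: Boolean = !finished
        }

        def main(args: Array[String]): Unit = {
          // All attempts ever launched for task 0, newest first (like taskAttempts(0)).
          var attempts: List[Attempt] = Nil

          // Existing check: any attempt that ever ran on the host counts.
          def hasAttemptOnHost(host: String): Boolean =
            attempts.exists(_.host == host)

          // Running-only variant suggested by the method's comment.
          def hasRunningAttemptOnHost(host: String): Boolean =
            attempts.exists(a => a.running && a.host == host)

          val task00 = Attempt("host1")            // task0.0 launched on host1
          attempts = task00 :: attempts
          attempts = Attempt("host2") :: attempts  // speculative task0.1 launched on host2
          task00.finished = true                   // task0.0 fails on host1

          // Where could speculative task0.2 go?
          println(s"existing check blocks host1:     ${hasAttemptOnHost("host1")}")        // true
          println(s"running-only check blocks host1: ${hasRunningAttemptOnHost("host1")}") // false
        }
      }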

      ******

      After discussion in the PR, we simply made the comment consistent with the method's behavior. See details in PR #20998.

       


          People

            Assignee: Ngone51 wuyi
            Reporter: Ngone51 wuyi
