SPARK-533

Killing tasks in Spark - request for comment


Details

    • Type: New Feature
    • Status: Resolved
    • Resolution: Fixed
    • Fix Version/s: 0.8.1

    Description

      This patch is the first step towards introducing support for killing tasks in Spark. The higher-level goal is to support killing tasks that are speculatively executed or unnecessary (e.g. limit queries in Shark).

      The plan is to support task killing by interrupting the threads that are executing the tasks: the interrupt either raises an InterruptedException (if the task is blocked, e.g. waiting on a lock) or sets the thread's interrupted status. Because tasks in Spark can be CPU-bound while working on in-memory data, we also need to check the thread's interrupt status periodically.
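      A minimal sketch of that periodic check, assuming a hypothetical per-element task loop (`runInterruptibly` is an illustrative name, not code from this patch):

```scala
object InterruptCheckSketch {
  // Hypothetical per-element task loop: poll the thread's interrupt
  // status between elements so even a CPU-bound task can be killed.
  def runInterruptibly[T](elements: Iterator[T])(process: T => Unit): Unit = {
    while (elements.hasNext) {
      // isInterrupted reads the flag without clearing it, unlike the
      // static Thread.interrupted(), which clears it as a side effect.
      if (Thread.currentThread().isInterrupted)
        throw new InterruptedException("task was killed")
      process(elements.next())
    }
  }
}
```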

      This is simple for ShuffleMap tasks, where the user-defined function is invoked once per element. However, because ResultTasks currently hand an RDD iterator to the user-defined function, the interface has to change before such tasks can be interrupted.

      This patch shows an example of a new interface, `OutputFunction`, which has ResultTasks invoke a function once for each element of the RDD, so the task runner regains control (and can check the interrupt status) between elements.
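      Since the patch itself is not quoted in this description, the following is only a guess at the shape of such an interface; `processElement`, `result`, `CollectFunction`, and `FoldFunction` are all illustrative names, not the patch's actual API:

```scala
import scala.collection.mutable.ArrayBuffer

// The task runner would call processElement once per RDD element,
// checking the interrupt flag between calls, and return result()
// as the task's output when the input is exhausted.
trait OutputFunction[T, R] {
  def processElement(element: T): Unit
  def result(): R
}

// A collect-style function: buffer every element and return them all.
class CollectFunction[T] extends OutputFunction[T, Seq[T]] {
  private val buf = new ArrayBuffer[T]
  def processElement(element: T): Unit = buf += element
  def result(): Seq[T] = buf.toSeq
}

// A fold-style function: combine elements into one accumulated value.
class FoldFunction[T](zero: T, op: (T, T) => T) extends OutputFunction[T, T] {
  private var acc = zero
  def processElement(element: T): Unit = acc = op(acc, element)
  def result(): T = acc
}
```

      Pushing elements into the function, instead of handing it the whole iterator, is what lets the task runner regain control between elements.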

      Note: this is not a complete patch and only shows how `OutputFunction` would look for `collect` and `fold`. It would be great to get comments on the general approach and on the specific semantics used to create OutputFunctions.

      This work was done jointly with @apanda.


    People

      Assignee: Reynold Xin (rxin)
      Reporter: Shivaram Venkataraman (shivaram)
      Votes: 1
      Watchers: 5
