Details
- Type: New Feature
- Status: Resolved
- Resolution: Fixed
Description
This patch is the first step towards introducing support for killing tasks in Spark. The higher-level goal is to support killing tasks that are speculatively executed or unnecessary (e.g. limit queries in Shark).
The plan is to support task killing by interrupting the threads that are executing the tasks, which either raises an InterruptedException (if the task is blocked, e.g. waiting on a lock) or sets the thread's interrupted status. Since tasks in Spark can be CPU-bound while working on in-memory data, we also need to periodically check the thread's interrupt status.
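As a concrete illustration of that periodic check, a wrapper around the element iterator could consult the thread's interrupt status once per element. This is only a sketch of the idea, not code from the patch; the wrapper class and the choice of exception are illustrative:

```scala
// Illustrative sketch: wrap an element iterator so a CPU-bound task
// observes the interrupt flag once per element, instead of only at
// blocking points.
class InterruptCheckingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = {
    // isInterrupted reads the flag without clearing it, unlike
    // Thread.interrupted().
    if (Thread.currentThread().isInterrupted) {
      throw new InterruptedException("task was killed")
    }
    underlying.hasNext
  }
  override def next(): T = underlying.next()
}
```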
This is simple for ShuffleMap tasks, where the user-defined function is invoked once per element. However, as ResultTasks currently hand an RDD iterator to the user-defined function, we have to change the interface to allow such tasks to be interrupted.
This patch shows an example of a new interface `OutputFunction` that makes ResultTasks use functions that are called once for each element of the RDD.
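Roughly, the interface might look like the sketch below. The trait name comes from this issue, but the method names, signatures, and the `CollectFunction` and `ResultTaskRunner` helpers are assumptions for illustration, not the actual patch:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Per-element output interface for ResultTasks (signatures assumed).
trait OutputFunction[T, U] {
  def process(element: T): Unit  // invoked once per RDD element
  def result(): U                // invoked after the last element
}

// How `collect` might be expressed against this interface.
class CollectFunction[T: ClassTag] extends OutputFunction[T, Array[T]] {
  private val buffer = new ArrayBuffer[T]
  def process(element: T): Unit = buffer += element
  def result(): Array[T] = buffer.toArray
}

// The task runner, not the user function, now owns the loop, so it can
// check the interrupt flag between process() calls.
object ResultTaskRunner {
  def run[T, U](iter: Iterator[T], fn: OutputFunction[T, U]): U = {
    while (iter.hasNext) {
      if (Thread.currentThread().isInterrupted) {
        throw new InterruptedException("task was killed")
      }
      fn.process(iter.next())
    }
    fn.result()
  }
}
```

Because the runner drives the iteration itself, interruption can be observed without any cooperation from user code, which is the motivation for moving ResultTasks away from the iterator-based interface.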
Note: This is not a complete patch and only shows how OutputFunction will look for `collect` and `fold`. It would be great to get comments on the general approach and on the specific semantics used to create OutputFunctions.
This work was done jointly with @apanda.
Attachments
Issue Links
- duplicates SPARK-736 "Killing tasks in spark" (Resolved)