Details
- Type: New Feature
- Status: Resolved
- Resolution: Fixed
Description
This patch is the first step towards introducing support for killing tasks in Spark. The higher-level goal is to support killing tasks that are speculatively executed or unnecessary (e.g. limit queries in Shark).
The plan is to support task killing by interrupting the threads that are executing the tasks, which either raises an InterruptedException (if the task is blocked, e.g. waiting on a lock) or sets the thread's interrupted status. Since tasks in Spark can be CPU-bound while working on in-memory data, we also need to periodically check the thread's interrupt status.
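As a concrete illustration of that periodic check, a wrapper around the element iterator could consult the thread's interrupt status once per element. This is only a sketch of the idea, not code from the patch; the wrapper class and the choice of exception are illustrative:

```scala
// Illustrative sketch: wrap an element iterator so a CPU-bound task
// observes the interrupt flag once per element, instead of only at
// blocking points.
class InterruptCheckingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = {
    // isInterrupted reads the flag without clearing it, unlike
    // Thread.interrupted().
    if (Thread.currentThread().isInterrupted) {
      throw new InterruptedException("task was killed")
    }
    underlying.hasNext
  }
  override def next(): T = underlying.next()
}
```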
This is simple for ShuffleMap tasks, where the user-defined function is invoked once per element. However, as ResultTasks currently hand an RDD iterator to the user-defined function, we have to change the interface to allow such tasks to be interrupted.
This patch shows an example of a new interface `OutputFunction` that makes ResultTasks use functions that are called once for each element of the RDD.
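Roughly, the interface might look like the sketch below. The trait name comes from this issue, but the method names, signatures, and the `CollectFunction` and `ResultTaskRunner` helpers are assumptions for illustration, not the actual patch:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Per-element output interface for ResultTasks (signatures assumed).
trait OutputFunction[T, U] {
  def process(element: T): Unit  // invoked once per RDD element
  def result(): U                // invoked after the last element
}

// How `collect` might be expressed against this interface.
class CollectFunction[T: ClassTag] extends OutputFunction[T, Array[T]] {
  private val buffer = new ArrayBuffer[T]
  def process(element: T): Unit = buffer += element
  def result(): Array[T] = buffer.toArray
}

// The task runner, not the user function, now owns the loop, so it can
// check the interrupt flag between process() calls.
object ResultTaskRunner {
  def run[T, U](iter: Iterator[T], fn: OutputFunction[T, U]): U = {
    while (iter.hasNext) {
      if (Thread.currentThread().isInterrupted) {
        throw new InterruptedException("task was killed")
      }
      fn.process(iter.next())
    }
    fn.result()
  }
}
```

Because the runner drives the iteration itself, interruption can be observed without any cooperation from user code, which is the motivation for moving ResultTasks away from the iterator-based interface.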
Note: This is not a complete patch and only shows how OutputFunction will look for `collect` and `fold`. It would be great to get comments on the general approach and on the specific semantics used to create OutputFunctions.
This work was done jointly with @apanda.
Attachments
Issue Links
- duplicates SPARK-736 "Killing tasks in spark" (Resolved)