Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 3.5.0
Labels: None
Description
Currently, the only way to cancel running Spark jobs is by using SparkContext.cancelJobGroup with a job group name that was previously set using SparkContext.setJobGroup. This is problematic when multiple different parts of the system want to perform cancellation and each sets its own id, because a thread can only have one job group at a time.
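For context, a minimal sketch of the existing group-based API (the group id, app name and threading here are illustrative only):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object JobGroupCancellation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("job-group-cancellation").setMaster("local[*]"))

    val worker = new Thread(() => {
      // setJobGroup applies to all jobs submitted from this thread; a thread
      // has exactly one job group, so a later setJobGroup call replaces it.
      sc.setJobGroup("my-group", "user-initiated work", interruptOnCancel = true)
      try sc.parallelize(1 to 100000).map { i => Thread.sleep(1); i }.count()
      catch { case e: Exception => println(s"Job ended: ${e.getMessage}") }
    })
    worker.start()

    Thread.sleep(2000)
    // Cancels every active job in "my-group" -- but only if nothing in between
    // has overwritten the thread's job group with its own id.
    sc.cancelJobGroup("my-group")
    worker.join()
    sc.stop()
  }
}
{code}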
For example, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133 sets its own job group, which may override the job group set by the user. As a result, if the user cancels the job group they set, it will not cancel these broadcast jobs launched from within their jobs.
As a solution, consider adding SparkContext.addJobTag / SparkContext.removeJobTag, which would allow multiple "tags" to be attached to jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible cancellation of jobs.
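A minimal sketch of how the proposed tag-based API could be used, assuming addJobTag / removeJobTag / cancelJobsByTag each take a single string tag (the tag names, app name and threading here are illustrative only):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object JobTagCancellation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("job-tag-cancellation").setMaster("local[*]"))

    val worker = new Thread(() => {
      // Tags are additive: jobs submitted from this thread carry both tags,
      // so internal code attaching its own tag or job group no longer
      // clobbers the tag the user set.
      sc.addJobTag("user-query-42")
      sc.addJobTag("dashboard-session")
      try sc.parallelize(1 to 100000).map { i => Thread.sleep(1); i }.count()
      catch { case e: Exception => println(s"Job ended: ${e.getMessage}") }
      finally sc.removeJobTag("user-query-42")
    })
    worker.start()

    Thread.sleep(2000)
    // Cancels all active jobs carrying this tag, regardless of which other
    // tags or job group they also have.
    sc.cancelJobsByTag("user-query-42")
    worker.join()
    sc.stop()
  }
}
{code}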