
SPARK-31549: PySpark SparkContext.cancelJobGroup does not work correctly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.5, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      PySpark SparkContext.cancelJobGroup does not work correctly. This issue has existed for a long time. It occurs because PySpark threads are not pinned to JVM threads when invoking Java-side methods, so every PySpark API that depends on Java thread-local variables can misbehave (including `sc.setLocalProperty`, `sc.cancelJobGroup`, `sc.setJobDescription`, and so on). A sketch of the failure mode follows.
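
      For illustration, here is a minimal sketch of the failure mode (the methods used are real PySpark APIs; the thread interleaving described in the comments is the behavior this report claims):

```python
import threading
import time
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

def run_job():
    # setJobGroup stores the group id in a thread-local property on the
    # JVM thread that happens to service this py4j call.
    sc.setJobGroup("demo-group", "long-running job", interruptOnCancel=True)
    # Without thread pinning, this collect() may be served by a *different*
    # JVM thread from the py4j pool, so the job is never tagged with the group.
    sc.parallelize(range(4), 4).foreach(lambda x: time.sleep(60))

t = threading.Thread(target=run_job)
t.start()
time.sleep(5)

# This call itself succeeds, but since the running job was never actually
# associated with "demo-group", nothing gets cancelled.
sc.cancelJobGroup("demo-group")
t.join()
```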

      This is a serious issue. Spark 3.0 added an experimental PySpark 'PIN_THREAD' mode that addresses it, but that mode has two problems (see the sketch after this list):

      • It is disabled by default; an additional environment variable (PYSPARK_PIN_THREAD) must be set to enable it.
      • It has a memory-leak issue that has not yet been addressed.
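
      As a sketch, assuming the experimental mode is the one gated behind the PYSPARK_PIN_THREAD environment variable in Spark 3.0, enabling it looks like this:

```python
import os

# PYSPARK_PIN_THREAD must be set before the JVM is launched; setting it
# after a SparkContext already exists has no effect.
os.environ["PYSPARK_PIN_THREAD"] = "true"

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
# With pinning enabled, each Python thread is mapped to its own dedicated
# JVM thread, so thread-local state set via setJobGroup / setLocalProperty
# stays attached to jobs submitted from the same Python thread.
```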

      A number of projects, such as hyperopt-spark and spark-joblib, rely on the `sc.cancelJobGroup` API to stop running jobs from their code, so it is critical to address this issue, ideally so that it works under the default PySpark mode. An optional approach is to implement methods like `rdd.setGroupAndCollect`; a hypothetical sketch follows.
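
      For illustration only, here is a hypothetical sketch of what such a method might look like. The name `setGroupAndCollect` comes from this report; no such method exists in PySpark, and the wrapper below only shows the intended call shape:

```python
# Hypothetical sketch: the proposal is to set the job group and submit the
# job through a single entry point, so the group assignment and the job
# submission cannot be split across different JVM threads. A real fix would
# perform both steps in one JVM-side call; this Python wrapper only
# illustrates the intended interface.
def set_group_and_collect(rdd, group_id, description="", interrupt_on_cancel=True):
    sc = rdd.context
    sc.setJobGroup(group_id, description, interrupt_on_cancel)
    return rdd.collect()

# Intended usage:
#   result = set_group_and_collect(sc.parallelize(range(10)), "demo-group")
#   ...and later, from any thread: sc.cancelJobGroup("demo-group")
```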


People

    • Assignee: Weichen Xu (weichenxu123)
    • Reporter: Weichen Xu (weichenxu123)
    • Votes: 0
    • Watchers: 3
