[SPARK-22340] pyspark setJobGroup doesn't match java threads - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2
Fix Version/s: 3.0.0
Component/s: PySpark
Labels:
None

Description

With pyspark, the documentation for sc.setJobGroup says:

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared.

However, this doesn't appear to be associated with Python threads, only with Java threads. As such, a Python thread which calls this and then submits multiple jobs doesn't necessarily get its jobs associated with any particular spark job group. For example:

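import concurrent.futures

# Assumes `sc` is an existing SparkContext (e.g. from the pyspark shell).
def run_jobs():
    # Intended to tag both jobs below with the group 'hello'.
    sc.setJobGroup('hello', 'hello jobs')
    x = sc.range(100).sum()
    y = sc.range(1000).sum()
    return x, y

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(run_jobs)
    # Would cancel the jobs from run_jobs, if they actually carried the group.
    sc.cancelJobGroup('hello')
    future.result()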

In this example, depending on how the action calls on the Python side are allocated to Java threads, the jobs for x and y won't necessarily be assigned the job group hello.
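To make the failure mode concrete, here is a toy model in plain Python, with no Spark involved. It assumes the job group lives in a thread-local on the JVM side and that consecutive calls from Python may be served by different JVM threads. The names jvm_local, jvm_set_job_group, jvm_run_job and gateway_call are hypothetical, and the round-robin dispatch is a contrived stand-in, not how py4j actually schedules calls.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

# Simulated JVM-side thread-local properties (hypothetical stand-in).
jvm_local = threading.local()

def jvm_set_job_group(group_id):
    # Stores the group on whichever simulated JVM thread runs this call.
    jvm_local.group = group_id

def jvm_run_job(name):
    # A job picks up the group of the simulated JVM thread it runs on.
    return name, getattr(jvm_local, "group", None)

# Two single-thread executors stand in for two JVM threads.
jvm_threads = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
next_thread = itertools.cycle(jvm_threads)

def gateway_call(fn, *args):
    # Contrived dispatch: alternate between the two simulated JVM threads.
    return next(next_thread).submit(fn, *args).result()

gateway_call(jvm_set_job_group, "hello")  # served by simulated thread 0
job_x = gateway_call(jvm_run_job, "x")    # served by simulated thread 1
job_y = gateway_call(jvm_run_job, "y")    # served by simulated thread 0

for executor in jvm_threads:
    executor.shutdown()

print(job_x, job_y)
```

In this model the job for x runs without a group while the job for y gets hello, even though a single Python caller issued all three calls in order.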

First, we should clarify the docs if this truly is the case.

Second, it would be really helpful if we could make the job group assignment reliable for a Python thread, though I'm not sure of the best way to do this. As it stands, job groups are pretty useless from the pyspark side if we can't rely on this behavior.

My only idea so far is to mimic the thread-local storage (TLS) behavior on the Python side and then patch every point where job submission may take place to pass that value in, but this feels pretty brittle. In my experience with py4j, controlling threading there is a challenge.
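A minimal sketch of that idea, using only the Python standard library: the group is kept in a Python threading.local, and each submission point reads it and passes it along explicitly. FakeBackend, set_job_group, current_job_group and submit_job are hypothetical names for illustration; none of them is a pyspark API, and a real patch would have to cover every place pyspark submits a job.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Python-side thread-local storage for the job group (hypothetical helper).
_local = threading.local()

def set_job_group(group_id, description):
    _local.job_group = (group_id, description)

def current_job_group():
    return getattr(_local, "job_group", None)

class FakeBackend:
    """Hypothetical stand-in for the JVM side; records the group of each job."""

    def __init__(self):
        self._lock = threading.Lock()
        self.jobs = []

    def submit(self, name, job_group):
        with self._lock:
            self.jobs.append((name, job_group))

def submit_job(backend, name):
    # The "patched" submission point: read the calling Python thread's
    # group and pass it explicitly, instead of relying on JVM thread state.
    backend.submit(name, current_job_group())

def run_jobs(backend, group_id):
    set_job_group(group_id, group_id + " jobs")
    submit_job(backend, group_id + "-x")
    submit_job(backend, group_id + "-y")

backend = FakeBackend()
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(run_jobs, backend, g) for g in ("hello", "world")]
    for future in futures:
        future.result()

# Map each job name to the group ID it was submitted with.
groups = {name: group[0] for name, group in backend.jobs}
print(groups)
```

The brittleness mentioned above shows up in submit_job: any submission path that skips this wrapper silently loses the group.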

Issue Links

is duplicated by

SPARK-29017 JobGroup and LocalProperty not respected by PySpark (Resolved)

links to

GitHub Pull Request #24705

GitHub Pull Request #24898

GitHub Pull Request #26588

People

Assignee: Hyukjin Kwon

Reporter: Leif Mortenson

Votes: 0

Watchers: 6

Dates

Created: 24/Oct/17 00:23

Updated: 12/Dec/22 18:11

Resolved: 07/Nov/19 21:47