A particular data distribution in a DataFrame with a groupBy clause can crash the JVM when df.rdd.isEmpty is called.
The crash reproduces 100% of the time on this dataset.
This ticket is a follow-up to https://issues.apache.org/jira/browse/SPARK-33277. One more place needs to be patched so that the Python iterator stays in sync with the Java iterator and is terminated whenever the task is marked as completed.
Note that other operations, such as count and collect, appear to work fine.
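The idea of terminating an iterator as soon as the task is marked as completed can be illustrated with a minimal, self-contained sketch. This is not Spark's code: TaskContext, mark_completed, and context_aware below are hypothetical names chosen for illustration, and the sketch only models the general pattern of checking the task state before pulling each element.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class TaskContext:
    """Hypothetical stand-in for a task context; not Spark's class."""

    def __init__(self) -> None:
        self.completed = False

    def mark_completed(self) -> None:
        self.completed = True


def context_aware(context: TaskContext, delegate: Iterable[T]) -> Iterator[T]:
    """Yield from delegate, but stop once the task is marked completed."""
    it = iter(delegate)
    # Check the task state before every pull, so a consumer that is still
    # reading after completion never touches the underlying source again.
    while not context.completed:
        try:
            item = next(it)
        except StopIteration:
            return
        yield item


ctx = TaskContext()
rows = context_aware(ctx, range(1_000_000))

first = next(rows)       # an isEmpty-style check only needs one element
ctx.mark_completed()     # the task finishes while most rows are unread
remaining = list(rows)   # the wrapper ends immediately instead of reading on
```

An operation that reads only the first element, as isEmpty does, leaves most of the iterator unconsumed when the task finishes. That is the situation in which a consumer still reading after completion becomes a problem, and it may explain why count and collect, which drain the iterator, are not affected.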