Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
Description
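RDD.repartition() also distributes data in a round-robin fashion, so it can produce incorrect answers on RDD workloads in the same way as described in https://issues.apache.org/jira/browse/SPARK-23207. Round-robin placement depends on the order of records within each input partition. If a task is re-executed (for example, after a fetch failure) and its input records arrive in a different order, they are sent to different target partitions, and rerunning only some downstream tasks can then duplicate some rows and drop others.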
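The failure mode can be illustrated with a toy Python model. This is not Spark code: `round_robin` is a hypothetical helper that assigns each record to a target partition purely by its position (start offset 0 for simplicity), and the "retry" simply feeds the same records in a different order. Only downstream partition 1 is recomputed from the retried output, mimicking a partial stage rerun.

```python
from collections import Counter


def round_robin(records, num_partitions, start=0):
    """Toy round-robin partitioner (hypothetical helper): the target of each
    record depends only on its position in the input, never on its value."""
    out = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        out[(start + i) % num_partitions].append(rec)
    return out


records = ["a", "b", "c", "d"]

# First attempt: the input partition is read in one order.
first = round_robin(records, 2)               # [["a", "c"], ["b", "d"]]

# Retry after a failure: same records, different arrival order
# (e.g. upstream shuffle blocks fetched in another order).
retry = round_robin(["b", "a", "c", "d"], 2)  # [["b", "c"], ["a", "d"]]

# Downstream partition 0 already succeeded using the first attempt's output;
# only downstream partition 1 is recomputed, from the retried output.
final = first[0] + retry[1]                   # ["a", "c", "a", "d"]

counts = Counter(final)
duplicated = sorted(r for r, n in counts.items() if n > 1)
missing = sorted(set(records) - set(final))
print("duplicated:", duplicated, "missing:", missing)
```

Running it prints `duplicated: ['a'] missing: ['b']`: every individual task computed a correct result, yet the combined output is wrong because the two attempts disagreed about where each record belongs.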
The approach that fixes DataFrame.repartition() does not apply to RDD.repartition(), as discussed in https://github.com/apache/spark/pull/20393#issuecomment-360912451
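For context, the DataFrame fix sorts the rows of each input partition (by their binary row representation) before assigning them round-robin, so a record's position, and therefore its target, no longer depends on arrival order. The toy Python sketch below assumes Python's natural ordering as a stand-in for that row sort, and uses dicts as a stand-in for RDD element types that define no ordering at all; both helpers are hypothetical and not part of any Spark API.

```python
def round_robin(records, num_partitions, start=0):
    """Toy round-robin partitioner: target depends only on input position."""
    out = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        out[(start + i) % num_partitions].append(rec)
    return out


def sort_then_round_robin(records, num_partitions):
    """Sort the partition first, so each record's position (and thus its
    target partition) is the same regardless of arrival order."""
    return round_robin(sorted(records), num_partitions)


first = sort_then_round_robin(["a", "b", "c", "d"], 2)
retry = sort_then_round_robin(["b", "a", "c", "d"], 2)
print(first == retry)  # both attempts yield [["a", "c"], ["b", "d"]]


def is_sortable(records):
    """Return False when the element type defines no ordering."""
    try:
        sorted(records)
        return True
    except TypeError:
        return False


print(is_sortable(["b", "a"]))            # strings are ordered
print(is_sortable([{"k": 1}, {"k": 2}]))  # dicts do not support '<'
```

The sort restores determinism only when every element can be compared; an RDD can hold arbitrary objects, which is why a different approach is needed for the RDD case.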
This task tracks alternative solutions for the RDD case.
Issue Links
- is duplicated by
  - SPARK-25156 Same query returns different result (Closed)
- is related to
  - SPARK-28699 Cache an indeterminate RDD could lead to incorrect result while stage rerun (Resolved)
  - SPARK-25342 Support rolling back a result stage (In Progress)
  - SPARK-25341 Support rolling back a shuffle map stage and re-generate the shuffle files (Resolved)
- relates to
  - SPARK-23207 Shuffle+Repartition on an DataFrame could lead to incorrect answers (Resolved)
  - SPARK-29042 Sampling-based RDD with unordered input should be INDETERMINATE (Resolved)
- links to