Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
2.3.3, 2.4.3, 3.0.0
Description
It's another case for the indeterminate stage/RDD rerun while stage rerun happened.
We can reproduce this by the following code, thanks to Tyson for reporting this!
import scala.sys.process._ import org.apache.spark.TaskContext val res = spark.range(0, 10000 * 10000, 1).map{ x => (x % 1000, x)} // kill an executor in the stage that performs repartition(239) val df = res.repartition(113).cache.repartition(239).map { x => if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && TaskContext.get.stageAttemptNumber == 0) { throw new Exception("pkill -f -n java".!!) } x } val r2 = df.distinct.count()
Attachments
Issue Links
- causes
-
SPARK-28845 Enable spark.sql.execution.sortBeforeRepartition only for retried stages
- Resolved
- relates to
-
SPARK-23207 Shuffle+Repartition on an DataFrame could lead to incorrect answers
- Resolved
-
SPARK-23243 Shuffle+Repartition on an RDD could lead to incorrect answers
- Resolved
- links to
(3 links to)