Description
Running Spark 2.3 on YARN with task failures and blacklisting, the aggregated metrics by executor are not correct. In my example there should be 2 failed tasks, but only one is shown. Note that I also tested with the master branch to verify it is not fixed there.
I will attach a screenshot.
To reproduce:
$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=1 --conf "spark.blacklist.enabled=true" --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" --conf "spark.blacklist.killBlacklistedExecutors=true"
import org.apache.spark.SparkEnv
sc.parallelize(1 to 10000, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4)
    throw new RuntimeException("Bad executor")
  else
    (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
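As a cross-check, the per-executor aggregates can also be read from the status REST API instead of the stage page. The sketch below is not part of the original report: it assumes the driver UI is on the default port 4040, that the status REST API is reachable, and that the failing stage has id 1 in this run (adjust as needed); the executorSummary entries in the response should show the failed-task counts that the UI is aggregating.
import scala.io.Source
// sc.applicationId is available in the same spark-shell session used for the repro above.
val appId = sc.applicationId
// Stage id 1 is assumed to be the failing map stage in this run.
val url = s"http://localhost:4040/api/v1/applications/$appId/stages/1"
// Dump the stage attempts as JSON; inspect executorSummary.<executorId>.failedTasks.
println(Source.fromURL(url).mkString)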
Attachments
Issue Links
- contains: SPARK-25284 Spark UI: make sure skipped stages are updated onJobEnd (Resolved)
- is duplicated by: SPARK-24539 HistoryServer does not display metrics from tasks that complete after stage failure (Resolved)
- is duplicated by: SPARK-25910 accumulator updates from previous stage attempt should not fail (Resolved)