  1. Spark
  2. SPARK-24415

Stage page aggregated executor metrics wrong when failures

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.2, 2.4.0
    • Component/s: Web UI
    • Labels: None

    Description

      When running Spark 2.3 on YARN with task failures and blacklisting, the metrics aggregated by executor on the stage page are not correct. In my example it should show 2 failed tasks, but it only shows one. Note that I also tested with the master branch to verify it's not fixed there.

      I will attach a screenshot.

      To reproduce:

      $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client \
        --executor-memory=2G --num-executors=1 \
        --conf "spark.blacklist.enabled=true" \
        --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
        --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
        --conf "spark.blacklist.application.maxFailedTasksPerExecutor=2" \
        --conf "spark.blacklist.killBlacklistedExecutors=true"

      import org.apache.spark.SparkEnv 

      sc.parallelize(1 to 10000, 10).map { x =>
        if (SparkEnv.get.executorId.toInt >= 1 && SparkEnv.get.executorId.toInt <= 4) throw new RuntimeException("Bad executor")
        else (x % 3, x)
      }.reduceByKey((a, b) => a + b).collect()

            People

              Assignee: Ankur Gupta (ankur.gupta)
              Reporter: Thomas Graves (tgraves)
              Votes: 0
              Watchers: 7