Spark / SPARK-37831

Add task partition id in metrics


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: Spark Core
    • Labels: None

    Description

      There is no partition id in the current metrics, which makes it difficult to trace stage metrics, such as stage shuffle read, especially when there are stage retries. It also makes it impossible to compare task metrics between different applications.

      class TaskData private[spark](
          val taskId: Long,
          val index: Int,
          val attempt: Int,
          val launchTime: Date,
          val resultFetchStart: Option[Date],
          @JsonDeserialize(contentAs = classOf[JLong])
          val duration: Option[Long],
          val executorId: String,
          val host: String,
          val status: String,
          val taskLocality: String,
          val speculative: Boolean,
          val accumulatorUpdates: Seq[AccumulableInfo],
          val errorMessage: Option[String] = None,
          val taskMetrics: Option[TaskMetrics] = None,
          val executorLogs: Map[String, String],
          val schedulerDelay: Long,
          val gettingResultTime: Long) 

      Adding partitionId to TaskData not only makes it easy to trace task metrics, but also makes it possible to collect metrics for the actual stage output, especially when a stage is retried.
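      A minimal sketch of why the field helps, using a simplified stand-in class (SimpleTaskData and its fields are hypothetical, reduced from the TaskData snippet above): with a partitionId on each task, metrics from the attempts that actually succeeded can be grouped per partition even when a stage retry re-runs some partitions.

      ```scala
      // Simplified stand-in for Spark's TaskData; only the fields needed for
      // this sketch are kept, plus the proposed partitionId field.
      case class SimpleTaskData(
          taskId: Long,
          partitionId: Int, // proposed addition: ties a task to its partition
          attempt: Int,
          shuffleReadBytes: Long,
          status: String)

      object PartitionMetrics {
        // Per partition, keep the shuffle-read bytes of the successful attempt.
        // With stage retries a partition may be attempted several times; without
        // partitionId the successful attempts cannot be matched to partitions.
        // (max is a simplification: normally one attempt succeeds per partition.)
        def successfulByPartition(tasks: Seq[SimpleTaskData]): Map[Int, Long] =
          tasks.filter(_.status == "SUCCESS")
            .groupBy(_.partitionId)
            .map { case (pid, ts) => pid -> ts.map(_.shuffleReadBytes).max }

        def main(args: Array[String]): Unit = {
          val tasks = Seq(
            SimpleTaskData(0L, 0, 0, 100L, "FAILED"),  // first attempt failed
            SimpleTaskData(1L, 0, 1, 120L, "SUCCESS"), // retry of partition 0
            SimpleTaskData(2L, 1, 0, 200L, "SUCCESS"))
          println(successfulByPartition(tasks))
        }
      }
      ```

      The same grouping is what the description means by "collect metrics for the actual stage output": the failed attempt of partition 0 is excluded, and each partition contributes exactly one row.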


    People

        Assignee: Jackey Lee
        Reporter: Jackey Lee
        Votes: 0
        Watchers: 3
