Spark / SPARK-30306

Instrument Python UDF execution time and metrics using Spark Metrics system

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: PySpark, Spark Core
    • Labels: None

    Description

      This proposes extending Spark instrumentation with metrics aimed at understanding the performance of Python code called by Spark via UDF, Pandas UDF, or mapPartitions. The relevant performance counters are exposed through the Spark Metrics System (based on the Dropwizard library), which makes it easy to consume the metrics produced by executors, for example in a performance dashboard. See also the attached screenshot.
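
      To illustrate the kind of counters this is about, here is a minimal sketch in plain Python of timing a UDF-style function and counting its calls. It is not the proposed Spark implementation: the `counters` registry, the `timed_udf` decorator and the metric names are all hypothetical, and in Spark the values would be reported through the Dropwizard-based Metrics System rather than a local dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process registry used only for this sketch.
# Spark would publish such values through its Metrics System instead.
counters = defaultdict(float)


def timed_udf(name):
    """Wrap a Python function and accumulate its call count and elapsed time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the time even if the wrapped function raises.
                counters[name + ".timeSeconds"] += time.perf_counter() - start
                counters[name + ".calls"] += 1
        return wrapper
    return decorator


@timed_udf("slow_square")
def slow_square(x):
    time.sleep(0.001)  # stand-in for real Python work done per row
    return x * x


results = [slow_square(i) for i in range(5)]
print(dict(counters))
```

      Counters like these separate time spent inside user Python code from time spent elsewhere in a task, which is the question a dashboard built on the proposed metrics would help answer.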

            People

              Assignee: Unassigned
              Reporter: Luca Canali (lucacanali)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated: