Spark / SPARK-34265

Instrument Python UDF execution using SQL Metrics

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      This proposes adding SQLMetrics instrumentation for Python UDF execution.
      The aim is to improve monitoring and performance troubleshooting of Python UDFs and Pandas UDFs, including the use of MapPartition and MapInArrow.
      The new metrics are exposed to end users through the metrics system and are visible in the Web UI, in the SQL/DataFrame tab, for the execution steps related to Python UDF execution. See also the attached screenshots.

      This instrumentation is lightweight and can be used in production and for monitoring. It is complementary to the Python/Pandas UDF Profiler introduced in Spark 3.3: https://spark.apache.org/docs/latest/api/python/development/debugging.html#python-pandas-udf
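
For illustration, a minimal sketch of a query whose plan contains a Python UDF execution step, so that there is something to look at in the SQL/DataFrame tab. It assumes a local PySpark installation at the fix version (3.4.0) or later. The names `slow_square` and `run_demo` are hypothetical and chosen for this example only, and the exact metric names are not listed in this issue, so none are asserted here.

```python
import time


def slow_square(x):
    """Plain Python function used as the UDF body (hypothetical example).

    The short sleep only makes the time spent on the Python side easier to notice.
    """
    time.sleep(0.001)
    return x * x


def run_demo(num_rows=1000):
    """Run a small query that evaluates slow_square as a Python UDF."""
    # pyspark is imported here so slow_square stays importable without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("python-udf-metrics-demo")
        .getOrCreate()
    )

    # Wrap the plain function as a (non-Pandas) Python UDF returning a long.
    square_udf = udf(slow_square, LongType())
    df = spark.range(num_rows).withColumn("squared", square_udf(col("id")))

    # An action is needed to trigger execution; only then does the query
    # appear in the SQL/DataFrame tab of the Web UI.
    total = df.agg({"squared": "sum"}).collect()[0][0]
    return spark, total
```

After calling `spark, total = run_demo()`, the Web UI of a local session is normally served at http://localhost:4040 while the session is alive; the query should be listed in the SQL/DataFrame tab, with the Python UDF step shown as a node in the plan graph. Call `spark.stop()` when done.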

      Attachments

        1. PandasUDF_ArrowEvalPython_Metrics.png (12 kB, Luca Canali)
        2. proposed_Python_SQLmetrics_v20210128.png (64 kB, Luca Canali)
        3. PythonSQLMetrics_Jira_Picture.png (25 kB, Luca Canali)

          People

            Assignee: Luca Canali (lucacanali)
            Reporter: Luca Canali (lucacanali)
            Votes: 0
            Watchers: 7
