Description
This proposes adding SQLMetrics instrumentation for Python UDFs.
The aim is to improve monitoring and performance troubleshooting of Python UDFs and Pandas UDFs, including the use of MapPartition and MapInArrow.
The new metrics are exposed to end users via the metrics system and are visible in the Web UI, in the SQL/DataFrame tab, on the execution steps that run Python UDFs. See also the attached screenshots.
This instrumentation is lightweight and can be used in production and for monitoring. It is complementary to the Python/Pandas UDF Profiler introduced in Spark 3.3: https://spark.apache.org/docs/latest/api/python/development/debugging.html#python-pandas-udf