Description
In 1.6.1, DataFrame writes started through DataFrameWriter actions like insertInto showed up in the SQL tab. In 2.0.0 and later, they no longer do. The problem is that 2.0.0 and later no longer wrap execution with SQLExecution.withNewExecutionId, which emits SparkListenerSQLExecutionStart.
Here are the relevant parts of the stack traces.
1.6.1 (withNewExecutionId is on the stack):
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56) => holding Monitor(org.apache.spark.sql.hive.HiveContext$QueryExecution@424773807})
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:196)
2.0.0 and later (no withNewExecutionId frame):
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) => holding Monitor(org.apache.spark.sql.execution.QueryExecution@490977924})
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:301)
I think this was introduced by 54d23599. The fix should be to add withNewExecutionId to https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L610
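To illustrate why the missing wrapper hides the write from the SQL tab, here is a minimal, Spark-free sketch of the pattern. It is a toy model, not Spark's code: ExecutionIdSketch, ExecutionStart, ExecutionEnd, events, runPlan, insertIntoUnwrapped and insertIntoWrapped are all hypothetical names, and this withNewExecutionId only mimics the general shape of SQLExecution.withNewExecutionId (post a start event, run the body, post an end event). The real method takes different parameters, and they vary between Spark versions.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins for Spark internals. All names are hypothetical; only the
// idea (start/end events posted around the body) comes from the issue.
object ExecutionIdSketch {
  sealed trait Event
  final case class ExecutionStart(id: Long, description: String) extends Event
  final case class ExecutionEnd(id: Long) extends Event

  // Stands in for the listener bus that the SQL tab is built from.
  val events: ArrayBuffer[Event] = ArrayBuffer.empty[Event]
  private var nextId = 0L

  // Mimics the shape of SQLExecution.withNewExecutionId: post a start event,
  // run the body, and post the end event even if the body throws.
  def withNewExecutionId[T](description: String)(body: => T): T = {
    val id = nextId
    nextId += 1
    events += ExecutionStart(id, description)
    try {
      body
    } finally {
      events += ExecutionEnd(id)
    }
  }

  // Stands in for QueryExecution.toRdd, which posts no events by itself.
  def runPlan(): Int = 42

  // 2.0.0-style write path: the plan runs, but no event is posted.
  def insertIntoUnwrapped(): Int = runPlan()

  // Shape of the proposed fix: the same call, wrapped.
  def insertIntoWrapped(): Int = withNewExecutionId("insertInto")(runPlan())

  def main(args: Array[String]): Unit = {
    insertIntoUnwrapped()
    println(s"events after unwrapped write: ${events.size}")
    insertIntoWrapped()
    println(s"events after wrapped write: ${events.size}")
  }
}
```

In this sketch the unwrapped write leaves the event buffer empty, which corresponds to nothing appearing in the SQL tab, while the wrapped write records a start and an end event. The actual change would presumably apply the same wrapping at the DataFrameWriter call site linked above.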
Attachments
Issue Links
- causes: SPARK-26682 Task attempt ID collision causes lost data (Resolved)
- contains: SPARK-20635 No SQL tab in Spark UI (Resolved)
- is duplicated by: SPARK-20635 No SQL tab in Spark UI (Resolved)
- relates to: SPARK-22977 DataFrameWriter operations do not show details in SQL tab (Resolved)
- links to