Spark / SPARK-20213

DataFrameWriter operations do not show up in SQL tab

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL, Web UI
    • Labels: None

      Description

      In 1.6.1, DataFrame writes started through DataFrameWriter actions such as insertInto showed up in the SQL tab. In 2.0.0 and later, they no longer do. The cause is that 2.0.0 and later no longer wrap execution in SQLExecution.withNewExecutionId, which is what emits SparkListenerSQLExecutionStart.
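      A small self-contained model of this mechanism may help. It is a sketch under stated assumptions, not Spark's code: `ListenerBus`, `SqlTab`, `Executions` and `toRdd` are hypothetical stand-ins, the bus is synchronous (Spark's listener bus is asynchronous), and only the start/end-event side of SQLExecution.withNewExecutionId is modeled. The point it illustrates is that the tab lists an execution only when it receives a start event, so a write whose toRdd call is not wrapped never appears.

```scala
import scala.collection.mutable.ArrayBuffer

object Spark20213Model {
  // Hypothetical stand-ins for SparkListenerSQLExecutionStart / ...End.
  sealed trait Event
  final case class ExecutionStart(id: Long, description: String) extends Event
  final case class ExecutionEnd(id: Long) extends Event

  // Minimal synchronous listener bus (an assumption; Spark's is asynchronous).
  final class ListenerBus {
    private val listeners = ArrayBuffer.empty[Event => Unit]
    def addListener(l: Event => Unit): Unit = listeners += l
    def post(e: Event): Unit = listeners.foreach(l => l(e))
  }

  // Plays the role of the SQL tab: it lists only executions whose start event it saw.
  final class SqlTab {
    val executions = ArrayBuffer.empty[String]
    def onEvent(e: Event): Unit = e match {
      case ExecutionStart(_, desc) => executions += desc
      case _                       => ()
    }
  }

  // Simplified model of withNewExecutionId: post a start event, run the body,
  // and post the matching end event even if the body throws.
  final class Executions(bus: ListenerBus) {
    private var nextId = 0L
    def withNewExecutionId[T](description: String)(body: => T): T = {
      val id = nextId
      nextId += 1
      bus.post(ExecutionStart(id, description))
      try body
      finally bus.post(ExecutionEnd(id))
    }
  }

  // Stand-in for QueryExecution.toRdd: does the work but posts no events itself.
  def toRdd(plan: String): Int = plan.length

  // Returns what the "SQL tab" lists after one 1.6.1-style and one 2.0.0-style write.
  def demo(): List[String] = {
    val bus = new ListenerBus
    val tab = new SqlTab
    bus.addListener(tab.onEvent)
    val exec = new Executions(bus)

    // 1.6.1-style insertInto: toRdd runs inside withNewExecutionId.
    exec.withNewExecutionId("insertInto t16")(toRdd("InsertIntoTable t16"))
    // 2.0.0-style insertInto: toRdd is called directly, so no start event is posted.
    toRdd("InsertIntoTable t20")

    tab.executions.toList
  }
}
```

      Calling Spark20213Model.demo() returns only the 1.6.1-style write: the 2.0.0-style write does the same work but never reaches the tab.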

      Here are the relevant parts of the stack traces:

      Spark 1.6.1
      org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
      org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
      org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
      org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
      org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56) => holding Monitor(org.apache.spark.sql.hive.HiveContext$QueryExecution@424773807})
      org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
      org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:196)
      
      Spark 2.0.0
      org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
      org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
      org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) => holding Monitor(org.apache.spark.sql.execution.QueryExecution@490977924})
      org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
      org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:301)
      

      I think this was introduced by commit 54d23599. The fix should be to wrap execution in withNewExecutionId at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L610
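      The proposed fix can be sketched with a small self-contained model. It assumes, as the linked line suggests, that the DataFrameWriter actions share one helper that triggers execution; if so, wrapping that single helper covers every action. `Tracker`, `Writer` and `runCommand` are hypothetical stand-ins, not Spark's classes, and the try/finally is a choice of this sketch so that a failed write still records an end and the tab does not show a stuck execution.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

object WriterFixModel {
  // Hypothetical stand-in for withNewExecutionId plus the SQL tab's listener:
  // records a start for every wrapped execution and counts the matching ends.
  final class Tracker {
    val starts = ArrayBuffer.empty[String]
    var ends = 0
    def withNewExecutionId[T](name: String)(body: => T): T = {
      starts += name     // analogous to posting a start event
      try body
      finally ends += 1  // analogous to posting an end event, even on failure
    }
  }

  // Hypothetical writer: every action funnels through runCommand,
  // so wrapping that one method is enough to make all of them visible.
  final class Writer(tracker: Tracker) {
    private def runCommand(name: String)(command: () => Unit): Unit =
      tracker.withNewExecutionId(name)(command())

    def insertInto(table: String): Unit = runCommand(s"insertInto($table)")(() => ())
    def saveAsTable(table: String): Unit = runCommand(s"saveAsTable($table)")(() => ())
    def save(path: String): Unit =
      runCommand(s"save($path)") { () =>
        if (path.isEmpty) throw new IllegalArgumentException("empty path")
      }
  }

  // Runs three writes, one of which fails, and returns (starts, end count).
  def demo(): (List[String], Int) = {
    val tracker = new Tracker
    val writer = new Writer(tracker)
    writer.insertInto("t")
    writer.saveAsTable("t2")
    Try(writer.save("")) // fails, but its execution is still started and ended
    (tracker.starts.toList, tracker.ends)
  }
}
```

      Here WriterFixModel.demo() records all three actions, including the failed save, and each one has a matching end.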


              People

              • Assignee: Wenchen Fan (cloud_fan)
              • Reporter: Ryan Blue (rdblue)
              • Votes: 0
              • Watchers: 6
