Spark / SPARK-20213

DataFrameWriter operations do not show up in SQL tab


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL, Web UI
    • Labels: None

    Description

      In 1.6.1, DataFrame writes started through DataFrameWriter actions such as insertInto showed up in the SQL tab. In 2.0.0 and later, they no longer do. The problem is that 2.0.0 and later no longer wrap the execution in SQLExecution.withNewExecutionId, which is what emits SparkListenerSQLExecutionStart.

      Here are the relevant parts of the stack traces:

      Spark 1.6.1
      org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
      org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
      org.apache.spark.sql.execution.QueryExecution$$anonfun$toRdd$1.apply(QueryExecution.scala:56)
      org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
      org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56) => holding Monitor(org.apache.spark.sql.hive.HiveContext$QueryExecution@424773807})
      org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
      org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:196)
      
      Spark 2.0.0
      org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
      org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
      org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) => holding Monitor(org.apache.spark.sql.execution.QueryExecution@490977924})
      org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
      org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:301)
      

      I think this was introduced by 54d23599. The fix should be to add withNewExecutionId to https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L610
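
To illustrate the wrapping pattern described above, here is a minimal plain-Scala sketch. It is a toy model under stated assumptions, not Spark code: ToyListenerBus, ToySQLExecution, ToyWriter and the event classes are hypothetical stand-ins, and the real SQLExecution.withNewExecutionId has a different signature. It only shows why a write that skips the wrapper posts no start event, while the same write run inside the wrapper does.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins for Spark's listener events (hypothetical, not Spark classes).
sealed trait Event
final case class ExecutionStart(id: Long, description: String) extends Event
final case class ExecutionEnd(id: Long) extends Event

// Hypothetical model of the listener bus that a UI tab would read from.
object ToyListenerBus {
  val events: ArrayBuffer[Event] = ArrayBuffer.empty[Event]
  def post(e: Event): Unit = events += e
}

// Hypothetical model of the wrapper: post a start event, run the body,
// and always post an end event, even if the body throws.
object ToySQLExecution {
  private val nextId = new AtomicLong(0)

  def withNewExecutionId[T](description: String)(body: => T): T = {
    val id = nextId.getAndIncrement()
    ToyListenerBus.post(ExecutionStart(id, description))
    try body
    finally ToyListenerBus.post(ExecutionEnd(id))
  }
}

object ToyWriter {
  // Stand-in for executing the physical plan.
  private def runPlan(table: String): String = s"wrote to $table"

  // Models the 2.0.0 trace: the plan runs, but no event is posted.
  def insertIntoUnwrapped(table: String): String = runPlan(table)

  // Models the proposed fix: the same work, run inside the wrapper.
  def insertIntoWrapped(table: String): String =
    ToySQLExecution.withNewExecutionId(s"insertInto $table")(runPlan(table))
}

object Demo {
  def main(args: Array[String]): Unit = {
    ToyWriter.insertIntoUnwrapped("t")
    println(s"events after unwrapped write: ${ToyListenerBus.events.size}")
    ToyWriter.insertIntoWrapped("t")
    println(s"events after wrapped write: ${ToyListenerBus.events.size}")
  }
}
```

In this model the unwrapped write leaves the bus empty, which corresponds to the missing SQL tab entry; the wrapped write posts a start and an end event around the same work.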

            People

              Assignee: Wenchen Fan (cloud_fan)
              Reporter: Ryan Blue (rdblue)
              Votes: 0
              Watchers: 6
