Spark / SPARK-29092

EXPLAIN FORMATTED does not work well with DPP

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

       

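      // Runs inside a Spark SQL test suite: withSQLConf, withTable, sql and spark come from
      // the suite's test helpers, and tableFormat is defined by the suite (the plans below
      // were produced with parquet).
      // Enable DPP and turn off broadcast reuse, so the pruning filter is planned as a
      // separate subquery (as in the EXPLAIN EXTENDED plan below).
      withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
        SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
        withTable("df1", "df2") {
          // df1: 1000 rows, partitioned by k.
          spark.range(1000)
            .select(col("id"), col("id").as("k"))
            .write
            .partitionBy("k")
            .format(tableFormat)
            .mode("overwrite")
            .saveAsTable("df1")
      
          // df2: 100 rows, partitioned by k.
          spark.range(100)
            .select(col("id"), col("id").as("k"))
            .write
            .partitionBy("k")
            .format(tableFormat)
            .mode("overwrite")
            .saveAsTable("df2")
      
          // Same join, explained in FORMATTED mode.
          sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
            .show(false)
      
          // Same join, explained in EXTENDED mode for comparison.
          sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
            .show(false)
        }
      }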
      

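      The output of EXPLAIN EXTENDED is as expected: the FileScan of df1 lists the DPP filter, dynamicpruningexpression(k#2722L IN subquery2741), under PartitionFilters, and the pruning subquery is printed beneath the scan.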

      == Physical Plan ==
      *(2) Project [id#2721L, k#2724L]
      +- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
         :- *(2) ColumnarToRow
         :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
         :        +- Subquery subquery2741, [id=#358]
         :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
         :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
         :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
         :                    +- *(1) Project [k#2724L]
         :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
         :                          +- *(1) ColumnarToRow
         :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
         +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
            +- *(1) Project [k#2724L]
               +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
                  +- *(1) ColumnarToRow
                     +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
      
      

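      However, in the output of EXPLAIN FORMATTED the FileScan node does not show the effect of DPP: the detail entry for the df1 scan lists only its output, with no PartitionFilters and no pruning subquery.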

      * Project (9)
      +- * BroadcastHashJoin Inner BuildRight (8)
         :- * ColumnarToRow (2)
         :  +- Scan parquet default.df1 (1)
         +- BroadcastExchange (7)
            +- * Project (6)
               +- * Filter (5)
                  +- * ColumnarToRow (4)
                     +- Scan parquet default.df2 (3)
      
      (1) Scan parquet default.df1 
      Output: [id#2716L, k#2717L]
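
      The difference between the two outputs can be checked mechanically by searching the plan text for the DPP marker. The following is a minimal sketch, not part of Spark: DppCheck and mentionsDpp are hypothetical names, and the two sample strings are excerpts copied from the outputs in this issue, with the scan line abbreviated.

```scala
// Sketch: detect whether an EXPLAIN text mentions a dynamic partition pruning filter.
// DppCheck and mentionsDpp are hypothetical helper names, not Spark APIs.
object DppCheck {
  // Marker that the EXPLAIN EXTENDED output in this issue prints for a DPP filter.
  private val DppMarker = "dynamicpruningexpression("

  /** True if some line has "PartitionFilters:" followed later on that line by the DPP marker. */
  def mentionsDpp(planText: String): Boolean =
    planText.split("\n").exists { line =>
      val idx = line.indexOf("PartitionFilters:")
      idx >= 0 && line.indexOf(DppMarker, idx) >= 0
    }

  def main(args: Array[String]): Unit = {
    // Abbreviated excerpt of the df1 scan line from the EXPLAIN EXTENDED output.
    val extendedExcerpt =
      "FileScan parquet default.df1[id#2721L,k#2722L] PartitionFilters: " +
        "[isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)]"
    // Detail entry for the df1 scan from the EXPLAIN FORMATTED output.
    val formattedExcerpt =
      "(1) Scan parquet default.df1\nOutput: [id#2716L, k#2717L]"

    println(mentionsDpp(extendedExcerpt))   // prints "true"
    println(mentionsDpp(formattedExcerpt))  // prints "false"
  }
}
```

      Inside the reproduction above, the plan text could be obtained with sql("EXPLAIN FORMATTED ...").head().getString(0), since EXPLAIN returns a single row holding the plan string. A plain substring check is used here only to illustrate the gap; it does not inspect the physical plan itself.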
      

       

              People

              • Assignee: Dilip Biswal (dkbiswal)
              • Reporter: Xiao Li (smilegator)