  Spark / SPARK-32659

Fix the data issue of inserted DPP on non-atomic type

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL
    • Labels:
    • Target Version/s:

      Description

      Dynamic partition pruning (DPP) returns incorrect results when the pruning key is a non-atomic type. For example:

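      import org.apache.spark.sql.functions.col

      // Larger table: 1000 rows, partitioned by k.
      spark.range(1000)
        .select(col("id"), col("id").as("k"))
        .write
        .partitionBy("k")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("df1")

      // Smaller table: 100 rows, partitioned by k.
      spark.range(100)
        .select(col("id"), col("id").as("k"))
        .write
        .partitionBy("k")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("df2")

      // Settings used in this report so that DPP is inserted for the join below.
      spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio=2")
      spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false")
      // Join on a struct (non-atomic) key, with a selective filter on df2.
      spark.sql("SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2").show()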
      

      The query should return two records, but it returns an empty result.
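
      One way to confirm that the empty result comes from DPP rather than from the join itself is to rerun the same query with DPP switched off and compare the rows. The following is a minimal sketch, assuming a spark-shell session in which `spark` is the active SparkSession and the tables df1 and df2 were created as in the example above. The names `query`, `withoutDpp` and `withDpp` are illustrative and not part of the original report.

```scala
import org.apache.spark.sql.Row

// The query from the report, kept in one place so both runs use the same text.
val query =
  "SELECT df1.id, df2.k FROM df1 JOIN df2 " +
    "ON struct(df1.k) = struct(df2.k) AND df2.id < 2"

// Baseline: run with dynamic partition pruning disabled.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
val withoutDpp: Array[Row] = spark.sql(query).orderBy("id").collect()

// Same query with dynamic partition pruning enabled again.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
val withDpp: Array[Row] = spark.sql(query).orderBy("id").collect()

// If the two runs disagree, the difference is attributable to DPP.
println(s"without DPP: ${withoutDpp.mkString(", ")}")
println(s"with DPP:    ${withDpp.mkString(", ")}")
println(s"results match: ${withoutDpp.sameElements(withDpp)}")
```

      On an affected version the two runs would be expected to disagree, since the report describes an empty result where two records are expected. Disabling DPP for the session in this way may also serve as a temporary workaround, at the cost of losing the pruning optimization.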

            People

            • Assignee:
              yumwang Yuming Wang
            • Reporter:
              yumwang Yuming Wang
            • Votes:
              0
            • Watchers:
              4
