  1. Spark
  2. SPARK-32659

Fix the data issue of inserted DPP on non-atomic type

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL

    Description

      Dynamic partition pruning (DPP) returns incorrect results when the pruning key is a non-atomic type, such as a struct. For example:

       import org.apache.spark.sql.functions.col

       // Table df1: 1000 rows, partitioned by k.
       spark.range(1000)
         .select(col("id"), col("id").as("k"))
         .write
         .partitionBy("k")
         .format("parquet")
         .mode("overwrite")
         .saveAsTable("df1")

       // Table df2: 100 rows, partitioned by k.
       spark.range(100)
         .select(col("id"), col("id").as("k"))
         .write
         .partitionBy("k")
         .format("parquet")
         .mode("overwrite")
         .saveAsTable("df2")

       // Settings under which the optimizer inserts a DPP filter for this join.
       spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio=2")
       spark.sql("set spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false")
       // Join on a struct (non-atomic) key.
       spark.sql("SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2").show
      

      The query should return two rows (k = 0 and k = 1), but it returns an empty result.
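
      One way to confirm that the empty result comes from the inserted DPP filter rather than from the join itself is to run the same query with DPP switched off and compare row counts. The following is a minimal sketch, assuming a spark-shell session (`spark` in scope), that the df1 and df2 tables and the two settings from the reproduction above are already in place, and that the standard `spark.sql.optimizer.dynamicPartitionPruning.enabled` switch is available. The names `query`, `withoutDpp` and `withDpp` are illustrative only.

```scala
// Assumption: run in spark-shell after the reproduction steps, so `spark`,
// df1, df2 and the two DPP settings already exist.
val query =
  "SELECT df1.id, df2.k FROM df1 JOIN df2 ON struct(df1.k) = struct(df2.k) AND df2.id < 2"

// Disable DPP for this session so the join runs without the inserted pruning filter.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
val withoutDpp = spark.sql(query).count()

// Re-enable DPP (its default) and rerun the identical query.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
val withDpp = spark.sql(query).count()

println(s"rows without DPP: $withoutDpp, rows with DPP: $withDpp")
```

      If the two counts differ, the discrepancy points at the pruning filter. On a build that contains the fix, both counts would be expected to match.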

          People

            Assignee: Yuming Wang (yumwang)
            Reporter: Yuming Wang (yumwang)