Spark / SPARK-32352

Support partial pushdown of data filters when they are mixed with partition filters


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Partial pushdown of partition filters has been supported since SPARK-28169. We can also partially push down data filters when a predicate mixes partition filters and data filters. For example:

      spark.sql(
        s"""
           |CREATE TABLE t(i INT, p STRING)
           |USING parquet
           |PARTITIONED BY (p)""".stripMargin)
      
      spark.range(0, 1000).selectExpr("id as col").createOrReplaceTempView("temp")
      for (part <- Seq(1, 2, 3, 4)) {
        spark.sql(s"""
               |INSERT OVERWRITE TABLE t PARTITION (p='$part')
               |SELECT col FROM temp""".stripMargin)
      }
      
      spark.sql("SELECT * FROM t WHERE (p = '1' AND i = 1) OR (p = '2' AND i = 2)").explain()
      

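      In this query the predicate mixes the partition column p with the data column i. In addition to the partition filter, we can also push down the data filter {{i = 1 OR i = 2}}, because every row that matches the full predicate also matches it.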
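
      The reasoning behind that weaker filter can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the Pred, Eq, And and Or types and the extract function are hypothetical stand-ins, not Spark's Catalyst classes or the code from the actual fix. The rule shown is that a conjunction may drop a side that references disallowed columns (this only weakens the predicate), while a disjunction is usable only if both sides yield something.

```scala
// Toy predicate model. These types are illustrative only and are NOT Spark's Catalyst expressions.
sealed trait Pred
case class Eq(column: String, value: String) extends Pred
case class And(left: Pred, right: Pred) extends Pred
case class Or(left: Pred, right: Pred) extends Pred

object PartialPushdown {
  // Returns a predicate implied by `pred` that references only `allowed` columns, if one exists.
  def extract(pred: Pred, allowed: Set[String]): Option[Pred] = pred match {
    case Eq(column, _) =>
      if (allowed(column)) Some(pred) else None
    case And(l, r) =>
      (extract(l, allowed), extract(r, allowed)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        // Dropping one conjunct only weakens the predicate, so the other side is still safe.
        case (a, b) => a.orElse(b)
      }
    case Or(l, r) =>
      // Both branches must contribute; otherwise rows from the missing branch would be lost.
      for {
        a <- extract(l, allowed)
        b <- extract(r, allowed)
      } yield Or(a, b)
  }

  def main(args: Array[String]): Unit = {
    // (p = '1' AND i = 1) OR (p = '2' AND i = 2)
    val filter = Or(And(Eq("p", "1"), Eq("i", "1")), And(Eq("p", "2"), Eq("i", "2")))
    // Keep only the data column i.
    println(extract(filter, Set("i"))) // prints Some(Or(Eq(i,1),Eq(i,2)))
    // Keep only the partition column p.
    println(extract(filter, Set("p"))) // prints Some(Or(Eq(p,1),Eq(p,2)))
  }
}
```

      The extracted filter is weaker than the original predicate, so in this sketch it could only be used to skip data early; the full predicate would still have to be evaluated on the rows that remain.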

            People

            • Assignee:
              angerszhuuu angerszhu
              Reporter:
              yumwang Yuming Wang