Spark / SPARK-32352

Partially push down data filters when they are mixed with partition filters


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      We have supported partially pushing down partition filters since SPARK-28169. We can also partially push down data filters when a predicate mixes partition filters and data filters. For example:

      // Create a Parquet table partitioned by p.
      spark.sql(
        s"""
           |CREATE TABLE t(i INT, p STRING)
           |USING parquet
           |PARTITIONED BY (p)""".stripMargin)
      
      // Write the same 1000 rows into each of the partitions p=1..4.
      spark.range(0, 1000).selectExpr("id as col").createOrReplaceTempView("temp")
      for (part <- Seq(1, 2, 3, 4)) {
        spark.sql(s"""
               |INSERT OVERWRITE TABLE t PARTITION (p='$part')
               |SELECT col FROM temp""".stripMargin)
      }
      
      // The WHERE clause mixes partition filters (on p) with data filters (on i).
      spark.sql("SELECT * FROM t WHERE (p = '1' AND i = 1) OR (p = '2' AND i = 2)").explain()
      

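      Each disjunct of this predicate implies {{i = 1 OR i = 2}}, so besides the partition filter we can also push down the data filter {{i = 1 OR i = 2}}.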
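The derivation of a pushable filter from the mixed predicate can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a hypothetical, simplified predicate type ({{Pred}}, {{EqLit}}, {{And}}, {{Or}}, with literal values stored as strings) and a hypothetical helper {{extractWithin}}. It is not Spark's Catalyst API and not necessarily how the fix was implemented. Under AND, conjuncts on other columns are simply dropped (this only weakens the filter); under OR, both sides must yield something, otherwise the derived filter could wrongly discard rows.

```scala
// Hypothetical, simplified predicate ADT (NOT Spark's Catalyst Expression classes).
sealed trait Pred
final case class EqLit(column: String, value: String) extends Pred
final case class And(left: Pred, right: Pred) extends Pred
final case class Or(left: Pred, right: Pred) extends Pred

object PartialPushdown {
  /** Columns referenced by a predicate. */
  def references(p: Pred): Set[String] = p match {
    case EqLit(c, _) => Set(c)
    case And(l, r)   => references(l) ++ references(r)
    case Or(l, r)    => references(l) ++ references(r)
  }

  /**
   * Hypothetical helper: returns a predicate that references only `cols` and is
   * implied by `p`, or None if this simple rule cannot derive one.
   */
  def extractWithin(p: Pred, cols: Set[String]): Option[Pred] = p match {
    case And(l, r) =>
      // Keep whichever conjuncts survive; dropping a conjunct only weakens the filter.
      (extractWithin(l, cols), extractWithin(r, cols)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        case (Some(a), None)    => Some(a)
        case (None, Some(b))    => Some(b)
        case (None, None)       => None
      }
    case Or(l, r) =>
      // Both disjuncts must be convertible, otherwise the result would be too strict.
      for {
        a <- extractWithin(l, cols)
        b <- extractWithin(r, cols)
      } yield Or(a, b)
    case leaf if references(leaf).subsetOf(cols) => Some(leaf)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // (p = '1' AND i = 1) OR (p = '2' AND i = 2)
    val where = Or(And(EqLit("p", "1"), EqLit("i", "1")), And(EqLit("p", "2"), EqLit("i", "2")))
    println(extractWithin(where, Set("p"))) // Some(Or(EqLit(p,1),EqLit(p,2)))
    println(extractWithin(where, Set("i"))) // Some(Or(EqLit(i,1),EqLit(i,2)))
  }
}
```

With {{Set("p")}} the sketch yields the partition filter {{p = '1' OR p = '2'}}, and with {{Set("i")}} the data filter {{i = 1 OR i = 2}}. For a predicate such as {{(p = '1' AND i = 1) OR p = '2'}} it returns None for {{i}}, because the second disjunct says nothing about {{i}}.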


          People

            Assignee: angerszhu (angerszhuuu)
            Reporter: Yuming Wang (yumwang)
            Votes: 0
            Watchers: 6
