  1. Spark
  2. SPARK-32985

Decouple bucket filter pruning and bucket table scan

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      This is a follow-up to the discussion in https://github.com/apache/spark/pull/29804#discussion_r493100510 . Currently, in the data source v1 file scan `FileSourceScanExec`, bucket filter pruning only takes effect when bucketed table scan is used (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L542 ). This coupling is unnecessary, because bucket filter pruning can also be applied when bucketed table scan is disabled. Decoupling the two lets queries benefit from bucket filter pruning, saving CPU and I/O by not reading unnecessary bucket files, without being bound to bucketed table scan when task parallelism is a concern.
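
      The proposed decoupling can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Spark's actual code: `BucketFile`, `prune_by_bucket`, `plan_bucketed_scan`, `plan_non_bucketed_scan`, and `plan_scan` are hypothetical names, and the size-based packing is a simplified stand-in for how a non-bucketed file scan might group files into partitions. The point it shows is that pruning by bucket id runs first, independently of how the surviving files are then grouped.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class BucketFile:
    """Hypothetical stand-in for one file of a bucketed table."""
    path: str
    bucket_id: int
    size: int  # bytes


def prune_by_bucket(files: List[BucketFile],
                    matching_buckets: Optional[Set[int]]) -> List[BucketFile]:
    """Keep only files whose bucket can satisfy the filter.

    `None` means the filter gives no information about buckets, so nothing is pruned.
    """
    if matching_buckets is None:
        return list(files)
    return [f for f in files if f.bucket_id in matching_buckets]


def plan_bucketed_scan(files: List[BucketFile],
                       num_buckets: int) -> List[List[BucketFile]]:
    """Bucketed scan: exactly one partition per bucket (possibly empty)."""
    partitions: List[List[BucketFile]] = [[] for _ in range(num_buckets)]
    for f in files:
        partitions[f.bucket_id].append(f)
    return partitions


def plan_non_bucketed_scan(files: List[BucketFile],
                           max_partition_bytes: int) -> List[List[BucketFile]]:
    """Non-bucketed scan (simplified): pack files by size, ignoring bucket ids."""
    partitions: List[List[BucketFile]] = []
    current: List[BucketFile] = []
    current_size = 0
    for f in sorted(files, key=lambda x: x.size, reverse=True):
        # Start a new partition once adding this file would exceed the budget.
        if current and current_size + f.size > max_partition_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size
    if current:
        partitions.append(current)
    return partitions


def plan_scan(files: List[BucketFile],
              num_buckets: int,
              matching_buckets: Optional[Set[int]],
              bucketed_scan: bool,
              max_partition_bytes: int = 128) -> List[List[BucketFile]]:
    """Prune first, then choose the partitioning strategy independently."""
    selected = prune_by_bucket(files, matching_buckets)
    if bucketed_scan:
        return plan_bucketed_scan(selected, num_buckets)
    return plan_non_bucketed_scan(selected, max_partition_bytes)


if __name__ == "__main__":
    table = [BucketFile(f"part-{i}", i, 100) for i in range(4)]
    # A filter that can only match rows in bucket 2.
    print(plan_scan(table, 4, {2}, bucketed_scan=True))
    print(plan_scan(table, 4, {2}, bucketed_scan=False))
```

      In this model both plans read the same single file; only the number and shape of the partitions differ, which is the part that matters when task parallelism is a concern.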

            People

              Assignee: Cheng Su (chengsu)
              Reporter: Cheng Su (chengsu)
              Votes: 0
              Watchers: 6