Spark / SPARK-30425

FileScan of Data Source V2 doesn't implement Partition Pruning

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I was trying to understand how Data Source V2 handles partition pruning, but I couldn't find any code that filters out unnecessary files in the current Data Source V2 implementation. For a file data source, the Data Source V2 base class FileScan should presumably handle this in its "partitions" method. However, the current implementation looks like this:

      protected def partitions: Seq[FilePartition] = {
        val selectedPartitions = fileIndex.listFiles(Seq.empty, Seq.empty)
        // ...
      }

      listFiles is called with empty sequences for both its partitionFilters and dataFilters arguments, so no files are ever filtered out by partition filters.
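To make the missing step concrete, here is a minimal, self-contained Scala sketch of what partition pruning means for a file scan: directories whose partition values fail the partition filters are skipped before their files are listed. It does not use Spark. `PartitionDir`, `prune`, and the plain Scala predicates standing in for Catalyst filter expressions are hypothetical names chosen for illustration, and a Hive-style `column=value` directory layout is assumed. With an empty filter list every file is kept, which mirrors the `listFiles(Seq.empty, Seq.empty)` call in the description.

```scala
// Hypothetical model of one partition directory (e.g. .../dt=2020-01-01/),
// assuming a Hive-style layout. Not a Spark API.
final case class PartitionDir(values: Map[String, String], files: Seq[String])

object PartitionPruningSketch {
  // Keep only directories whose partition values satisfy every filter.
  // Plain predicates stand in for Spark's partition filter expressions.
  // An empty filter list keeps everything (forall on an empty Seq is true),
  // which is the behaviour of listFiles(Seq.empty, Seq.empty).
  def prune(
      dirs: Seq[PartitionDir],
      partitionFilters: Seq[Map[String, String] => Boolean]): Seq[PartitionDir] =
    dirs.filter(d => partitionFilters.forall(f => f(d.values)))

  def main(args: Array[String]): Unit = {
    val dirs = Seq(
      PartitionDir(Map("dt" -> "2020-01-01"), Seq("a.parquet", "b.parquet")),
      PartitionDir(Map("dt" -> "2020-01-02"), Seq("c.parquet"))
    )

    // No filters: every file in every partition is scanned.
    val noPruning = prune(dirs, Seq.empty).flatMap(_.files)

    // Filter on the partition column: the dt=2020-01-01 directory is skipped.
    val onlyJan2 = (v: Map[String, String]) => v.get("dt").contains("2020-01-02")
    val pruned = prune(dirs, Seq(onlyJan2)).flatMap(_.files)

    println(noPruning) // List(a.parquet, b.parquet, c.parquet)
    println(pruned)    // List(c.parquet)
  }
}
```

This sketch only models the partitionFilters argument; the dataFilters argument of listFiles is left out.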


            People

              Assignee: Unassigned
              Reporter: Haifeng Chen (jerrychenhf)
              Wenchen Fan
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified