  Spark / SPARK-30475

File source V2: Push data filters for file listing

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Follow-up to SPARK-30428, which added support for partition pruning in File source V2.
      We should also pass the dataFilters, not just the partitionFilters, to the listFiles method.

      Data sources such as CSV and JSON do not implement the SupportsPushDownFilters trait. To support data skipping uniformly across all file-based data sources, one can override the listFiles method in a FileIndex implementation and use the dataFilters and partitionFilters to consult external metadata and prune the list of files.
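
      The pruning idea above can be illustrated without Spark as a small, self-contained Scala model. Everything in it is hypothetical: SimpleFilter, Eq, Gt, FileStats and DataSkippingIndex are illustrative stand-ins, not Spark classes. A real implementation would override listFiles on a FileIndex and receive Catalyst expressions rather than these simplified filters; the per-file min/max statistics are assumed to come from some external metadata store.

```scala
// Hypothetical model of data skipping behind a listFiles(partitionFilters, dataFilters) call.
// SimpleFilter, Eq, Gt, FileStats and DataSkippingIndex are illustrative, not Spark classes.

// A single-column predicate standing in for a Catalyst Expression.
sealed trait SimpleFilter
final case class Eq(column: String, value: Long) extends SimpleFilter
final case class Gt(column: String, value: Long) extends SimpleFilter

// Per-file metadata as an external store might record it (assumed min/max stats).
final case class FileStats(
    path: String,
    partition: Map[String, Long],
    min: Map[String, Long],
    max: Map[String, Long])

object DataSkippingIndex {
  // Partition filters are checked against the file's partition values.
  // A column missing from the partition map is treated as "may match".
  private def partitionMatches(f: FileStats, filter: SimpleFilter): Boolean = filter match {
    case Eq(c, v) => f.partition.get(c).forall(_ == v)
    case Gt(c, v) => f.partition.get(c).forall(_ > v)
  }

  // Data filters are checked against min/max stats. Returns false only when
  // the stats prove no row can match; missing stats keep the file.
  private def mayMatch(f: FileStats, filter: SimpleFilter): Boolean = filter match {
    case Eq(c, v) => f.min.get(c).forall(_ <= v) && f.max.get(c).forall(_ >= v)
    case Gt(c, v) => f.max.get(c).forall(_ > v)
  }

  // Keep only files that survive every partition filter and every data filter.
  def listFiles(
      files: Seq[FileStats],
      partitionFilters: Seq[SimpleFilter],
      dataFilters: Seq[SimpleFilter]): Seq[String] =
    files
      .filter(f => partitionFilters.forall(partitionMatches(f, _)))
      .filter(f => dataFilters.forall(mayMatch(f, _)))
      .map(_.path)
}

object DataSkippingExample {
  val files: Seq[FileStats] = Seq(
    FileStats("year=2019/part-0.csv", Map("year" -> 2019L), Map("id" -> 0L), Map("id" -> 99L)),
    FileStats("year=2020/part-0.csv", Map("year" -> 2020L), Map("id" -> 100L), Map("id" -> 199L)),
    FileStats("year=2020/part-1.csv", Map("year" -> 2020L), Map("id" -> 200L), Map("id" -> 299L)))

  def main(args: Array[String]): Unit = {
    // Partition pruning alone keeps both 2020 files; the data filter id > 250
    // additionally lets the min/max stats skip year=2020/part-0.csv.
    val kept = DataSkippingIndex.listFiles(files, Seq(Eq("year", 2020L)), Seq(Gt("id", 250L)))
    println(kept)
  }
}
```

      The pruning is deliberately conservative: a file is dropped only when its statistics prove that no row can satisfy a filter, so skipping never changes query results. This is why passing dataFilters to listFiles matters; with partitionFilters alone, both 2020 files in the example would still be read.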

            People

              Assignee: Guy Khazma
              Reporter: Guy Khazma
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: