Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27589 Spark file source V2
  3. SPARK-27291

File source V2: Ignore empty files in load

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • SQL
    • None

    Description

      In https://github.com/apache/spark/pull/23130, all empty files are excluded from target file splits in `FileSourceScanExec`.
      In File source V2, we should keep the same behavior.

      This PR suggests to filter out empty files on listing files in `PartitioningAwareFileIndex` so that the upper level doesn't need to handle them.

      Attachments

        Issue Links

          Activity

            People

              Gengliang.Wang Gengliang Wang
              Gengliang.Wang Gengliang Wang
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: