SPARK-17613

PartitioningAwareFileCatalog.allFiles doesn't handle URI specified path at parent

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Suppose you have a bucket

      s3a://some-bucket
      

      and under it you have files:

      s3a://some-bucket/file1.parquet
      s3a://some-bucket/file2.parquet
      

      Getting the parent path of

      s3a://some-bucket/file1.parquet

      yields

      s3a://some-bucket/

      (with a trailing slash), and the ListingFileCatalog uses this as the key in its hash map.
      When catalog.allFiles is called, the lookup uses

      s3a://some-bucket

      (no trailing slash) to get the list of files. The two keys do not match, so we're left with an empty list!
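
The mismatch can be illustrated with a minimal sketch in plain Python. This is not Spark's code: `parent_of`, `leaf_dir_to_files`, `all_files_naive`, `normalize`, and `all_files_normalized` are hypothetical names chosen for illustration, and the string handling only mimics the behaviour described above (a parent path that keeps its trailing slash at the bucket root). The normalization shown is one possible way to make the lookup succeed, not necessarily what the actual fix does.

```python
from collections import defaultdict


def parent_of(path: str) -> str:
    """Hypothetical parent lookup: keeps the trailing slash, as in the report."""
    return path[: path.rfind("/") + 1]


files = [
    "s3a://some-bucket/file1.parquet",
    "s3a://some-bucket/file2.parquet",
]

# Map from parent directory to the files it contains (the "hash map" above).
leaf_dir_to_files = defaultdict(list)
for f in files:
    leaf_dir_to_files[parent_of(f)].append(f)


def all_files_naive(root: str) -> list:
    """Exact-match lookup: misses when root has no trailing slash."""
    return leaf_dir_to_files.get(root, [])


def normalize(path: str) -> str:
    """Assumed normalization: drop trailing slashes before comparing."""
    return path.rstrip("/")


def all_files_normalized(root: str) -> list:
    """Lookup that compares normalized keys on both sides."""
    wanted = normalize(root)
    return [
        f
        for directory, listed in leaf_dir_to_files.items()
        if normalize(directory) == wanted
        for f in listed
    ]


print(list(leaf_dir_to_files))                    # keys stored in the map
print(all_files_naive("s3a://some-bucket"))       # exact-match lookup
print(all_files_normalized("s3a://some-bucket"))  # normalized lookup
```

In this sketch the exact-match lookup finds nothing because the stored key ends in a slash while the requested root does not; comparing normalized forms on both sides makes the two agree.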

          People

            Assignee: Burak Yavuz (brkyvz)
            Reporter: Burak Yavuz (brkyvz)
            Votes: 0
            Watchers: 3
