IMPALA-11752

Handle s3:// paths in Iceberg tables

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend, Frontend

    Description

      Components using S3FileIO might write out file paths starting with 's3://' instead of 's3a://'. The latter scheme is the one used by HadoopFileIO, which is what Impala uses.

      By default, HadoopFileIO doesn't interpret paths starting with 's3://'. (This could probably be resolved by setting "fs.s3.impl" to "org.apache.hadoop.fs.s3a.S3AFileSystem", so that an S3A filesystem instance is created for such paths.)
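
      For illustration, the setting mentioned above could be expressed as a Hadoop core-site.xml property. This is only a sketch of the suggested workaround: the property name and value come from this description, and whether it is sufficient has not been verified here.

```xml
<!-- core-site.xml fragment: maps the 's3' scheme to the S3A filesystem.
     Untested workaround, as suggested in the description. -->
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```

      Note that this would only concern whether 's3://' paths can be opened; it does not by itself make 's3://' and 's3a://' path strings compare as equal.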

      FeIcebergTable.Utils.FeIcebergTable() depends on the file paths returned by the recursive file listing matching the file paths in the Iceberg metadata files. But the recursive listing returns s3a:// paths, while the metadata contains s3:// paths. As a result, we won't find the files in the hash map 'hdfsFileDescMap' and will fall back to loading them one by one.
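
      The lookup mismatch can be reproduced with a plain map keyed by path strings. The sketch below is not Impala code: the class name, the toS3a() helper and the sample paths are hypothetical, and the map only stands in for 'hdfsFileDescMap'. It shows one possible approach, namely normalizing the scheme before the lookup.

```java
import java.util.HashMap;
import java.util.Map;

class S3SchemeLookupSketch {
  private static final String S3_PREFIX = "s3://";
  private static final String S3A_PREFIX = "s3a://";

  // Hypothetical helper: rewrites a leading "s3://" to "s3a://" and
  // returns every other path unchanged.
  static String toS3a(String path) {
    if (path.startsWith(S3_PREFIX)) {
      return S3A_PREFIX + path.substring(S3_PREFIX.length());
    }
    return path;
  }

  public static void main(String[] args) {
    // Stand-in for the map built from the recursive listing (s3a:// keys).
    Map<String, String> listedFiles = new HashMap<>();
    listedFiles.put("s3a://bucket/tbl/data/f1.parquet", "descriptor-for-f1");

    // Path as it might appear in Iceberg metadata written via S3FileIO.
    String metadataPath = "s3://bucket/tbl/data/f1.parquet";

    // Exact string lookup misses because the schemes differ.
    System.out.println(listedFiles.containsKey(metadataPath));         // false
    // Lookup after normalizing the scheme finds the entry.
    System.out.println(listedFiles.containsKey(toS3a(metadataPath)));  // true
  }
}
```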

      Moreover, position delete file processing is also based on exact matches of the file URIs. Therefore, delete entries that reference s3:// paths won't have the desired effect.
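
      The effect on position deletes can be sketched the same way. The code below is a simplified model, not Impala's implementation: DeleteEntry mirrors the file_path and pos columns of an Iceberg position delete file, while the class name, the toS3a() helper, the normalize flag and the sample paths are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PositionDeleteMatchSketch {
  // Simplified stand-in for one row of a position delete file.
  static final class DeleteEntry {
    final String filePath;
    final long pos;

    DeleteEntry(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  // Hypothetical helper: rewrites a leading "s3://" to "s3a://".
  static String toS3a(String path) {
    return path.startsWith("s3://") ? "s3a://" + path.substring("s3://".length()) : path;
  }

  // Collects the deleted row positions that apply to 'dataFile'.
  // With normalize == false the comparison is an exact string match.
  static List<Long> deletedPositions(
      String dataFile, List<DeleteEntry> entries, boolean normalize) {
    String wanted = normalize ? toS3a(dataFile) : dataFile;
    List<Long> positions = new ArrayList<>();
    for (DeleteEntry entry : entries) {
      String candidate = normalize ? toS3a(entry.filePath) : entry.filePath;
      if (candidate.equals(wanted)) {
        positions.add(entry.pos);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    List<DeleteEntry> entries = new ArrayList<>();
    entries.add(new DeleteEntry("s3://bucket/tbl/data/f1.parquet", 0L));
    entries.add(new DeleteEntry("s3://bucket/tbl/data/f1.parquet", 7L));

    String dataFile = "s3a://bucket/tbl/data/f1.parquet";

    // Exact matching: no delete applies, so deleted rows would still be returned.
    System.out.println(deletedPositions(dataFile, entries, false));  // []
    // Scheme-normalized matching: both deletes apply.
    System.out.println(deletedPositions(dataFile, entries, true));   // [0, 7]
  }
}
```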

          People

            Assignee: Unassigned
            Reporter: Zoltán Borók-Nagy (boroknagyz)