IMPALA-11662

Improve "refresh iceberg_tbl_on_oss;" performance

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: Impala 4.3.0
    • Component/s: None

    Description

      Iceberg already provides rich metadata, and directory listing on an object storage service (OSS), e.g. through S3A, costs more than on HDFS. We could therefore create the file descriptors from Iceberg metadata instead of calling org.apache.hadoop.fs.FileSystem#listFiles (see https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189).
      The only thing missing there is the last_modification_time of the files. But since Iceberg files are immutable, maybe we could just come up with a special timestamp for these files.
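
      The proposal above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Impala's actual code: DataFileEntry, FileDesc, fromIcebergMetadata and the sentinel IMMUTABLE_FILE_MTIME are hypothetical names introduced for illustration. In a real implementation the path and size would presumably come from Iceberg's data file metadata (path() and fileSizeInBytes()), so no call to FileSystem#listFiles is needed.

```java
import java.util.ArrayList;
import java.util.List;

public class IcebergDescriptorSketch {
  // Hypothetical sentinel value. Iceberg data files are immutable, so a single
  // fixed timestamp can stand in for the missing last modification time.
  static final long IMMUTABLE_FILE_MTIME = 1L;

  /** Hypothetical stand-in for one data file entry read from Iceberg metadata. */
  static final class DataFileEntry {
    final String path;
    final long sizeBytes;

    DataFileEntry(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Hypothetical, simplified file descriptor (not Impala's real class). */
  static final class FileDesc {
    final String path;
    final long length;
    final long modificationTime;

    FileDesc(String path, long length, long modificationTime) {
      this.path = path;
      this.length = length;
      this.modificationTime = modificationTime;
    }
  }

  /** Builds descriptors purely from metadata entries, with no directory listing. */
  static List<FileDesc> fromIcebergMetadata(List<DataFileEntry> entries) {
    List<FileDesc> result = new ArrayList<>(entries.size());
    for (DataFileEntry e : entries) {
      // The modification time is not in the metadata, so use the sentinel.
      result.add(new FileDesc(e.path, e.sizeBytes, IMMUTABLE_FILE_MTIME));
    }
    return result;
  }

  public static void main(String[] args) {
    List<DataFileEntry> entries = new ArrayList<>();
    entries.add(new DataFileEntry("s3a://bucket/tbl/data/a.parquet", 1024L));
    entries.add(new DataFileEntry("s3a://bucket/tbl/data/b.parquet", 2048L));
    for (FileDesc fd : fromIcebergMetadata(entries)) {
      System.out.println(fd.path + " " + fd.length + " " + fd.modificationTime);
    }
  }
}
```

      With a fixed sentinel, any logic that compares modification times to detect changed files would see these files as unchanged, which matches the immutability assumption stated above.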

          People

            Assignee: Li Penglin (lipenglin)
            Reporter: Li Penglin (lipenglin)