IMPALA-11784

Don't call Iceberg's planFiles redundantly during table load


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 4.3.0
    • Catalog

    Description

      Iceberg's planFiles() API is very expensive because it involves reading the Avro manifest files. It's especially expensive on object stores, though manifest caching can help here.

      Currently we invoke this API twice during table loading, both times via IcebergUtil.getIcebergFiles(): once in loadAllPartition() and once in loadPartitionStats().

      We should just invoke IcebergUtil.getIcebergFiles() once, then pass the result object to loadAllPartition() and loadPartitionStats().
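
A minimal sketch of the proposed change, not the actual Impala patch. Only the names loadAllPartition() and loadPartitionStats() come from the description; the FileListing type, planFilesOnce(), loadTable(), the call counter, and all signatures are hypothetical stand-ins. planFilesOnce() plays the role of IcebergUtil.getIcebergFiles(), and the counter just makes the number of expensive calls observable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LoadOnceSketch {
    /** Hypothetical stand-in for the object returned by IcebergUtil.getIcebergFiles(). */
    static final class FileListing {
        final List<String> dataFiles;
        FileListing(List<String> dataFiles) { this.dataFiles = dataFiles; }
    }

    /** Counts how often the expensive planning step runs (for illustration only). */
    static final AtomicInteger planCalls = new AtomicInteger();

    /** Hypothetical stand-in for the expensive planFiles()-backed call. */
    static FileListing planFilesOnce() {
        planCalls.incrementAndGet();
        return new FileListing(Arrays.asList("a.parquet", "b.parquet"));
    }

    /** Takes the precomputed listing instead of planning files itself. */
    static int loadAllPartition(FileListing files) {
        return files.dataFiles.size();
    }

    /** Reuses the same listing, so no second planning pass is needed. */
    static int loadPartitionStats(FileListing files) {
        return files.dataFiles.size();
    }

    /** Table load: plan once, then hand the result to both consumers. */
    static int[] loadTable() {
        FileListing files = planFilesOnce();
        return new int[] { loadAllPartition(files), loadPartitionStats(files) };
    }
}
```

Passing the result object as a parameter keeps both loaders free of any hidden call to the planning step, so the cost of reading the manifests is paid once per table load.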


            People

              boroknagyz Zoltán Borók-Nagy
