[IMPALA-10254] Load data files via Iceberg for Iceberg Tables - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Catalog
Labels:
- impala-iceberg

Epic Link:
Iceberg support in Impala

Description

Currently we still load the file descriptors of an Iceberg table via recursive file listing.

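Recursive listing picks up files that do not belong to the current table snapshot, e.g. metadata files, files that are still being written (which can later cause checksum errors), files left behind by aborted INSERTs, files already removed from the table, etc.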

We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. Iceberg DataFiles might also already contain the split offsets.
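
To make the difference concrete, here is a minimal Python sketch. It is not Impala's code and it does not call the Iceberg library: `build_demo_table`, `list_recursively`, `list_from_snapshot`, and the dictionary standing in for a snapshot are all hypothetical names invented for illustration. It only shows why a snapshot-driven listing returns fewer (and safer) files than a recursive directory walk, and how per-file split offsets could come along with the listing.

```python
import os
import tempfile


def build_demo_table(table_dir):
    """Create a toy table directory and return a toy 'snapshot'.

    The snapshot is a plain dict (an assumption of this sketch), listing
    only the data files that belong to the current table state.
    """
    layout = [
        "metadata/v1.metadata.json",   # table metadata, not a data file
        "metadata/snap-1.avro",        # manifest list, not a data file
        "data/a.parquet",
        "data/b.parquet",
        "data/removed.parquet",        # removed from the table, still on disk
        "data/.tmp-insert.parquet",    # in-flight or aborted write
    ]
    for rel in layout:
        path = os.path.join(table_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(b"x")
    return {
        "data_files": [
            {"path": "data/a.parquet", "split_offsets": [4]},
            {"path": "data/b.parquet", "split_offsets": [4, 1024]},
        ]
    }


def list_recursively(table_dir):
    """Old approach: walk the directory tree and return every file found."""
    found = []
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), table_dir)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def list_from_snapshot(snapshot):
    """Proposed approach: return only the files the snapshot references."""
    return sorted(entry["path"] for entry in snapshot["data_files"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        snapshot = build_demo_table(table_dir)
        print("recursive listing:", list_recursively(table_dir))
        print("snapshot listing: ", list_from_snapshot(snapshot))
```

In this toy setup the recursive walk returns all six files, while the snapshot listing returns only the two live data files, each already carrying its split offsets.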

Sub-Tasks

- Re-use FileDescriptors loaded by HdfsTable during IcebergTable load (Status: Resolved; Assignee: Tamas Mate)

People

Assignee: Tamas Mate
Reporter: Zoltán Borók-Nagy
Votes: 0
Watchers: 4

Dates

Created: 19/Oct/20 08:56
Updated: 29/Mar/23 09:53
Resolved: 29/Mar/23 09:53