Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Currently we still load the file descriptors of an Iceberg table via recursive file listing.
This lists too many files, e.g. metadata files, files that are still being written (which can later cause checksum errors), files from aborted INSERTs, removed files, etc.
We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. Iceberg DataFiles might also already contain the split offsets.
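
To illustrate the difference, here is a minimal, self-contained Java sketch. It does not use Iceberg itself: `DataFileEntry`, `createToyTable`, `toySnapshot` and the file names are hypothetical stand-ins invented for this example. It only models the idea that a snapshot records exactly which data files are live (along with size and split offsets), whereas a recursive listing picks up everything under the table directory. In Iceberg's Java API the snapshot-driven variant would presumably iterate `table.newScan().planFiles()` and read each task's `DataFile` (`fileSizeInBytes()`, `splitOffsets()`), but the actual change may be structured differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SnapshotListingSketch {

    /** Hypothetical stand-in for the per-file metadata recorded in a snapshot. */
    static final class DataFileEntry {
        final String path;              // path relative to the table root
        final long sizeBytes;           // size recorded at commit time
        final List<Long> splitOffsets;  // recorded split points, may be empty

        DataFileEntry(String path, long sizeBytes, List<Long> splitOffsets) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.splitOffsets = splitOffsets;
        }
    }

    /** Recursive listing: returns every regular file under the table root. */
    static List<String> listRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Snapshot-driven listing: returns only the files the snapshot references. */
    static List<String> listFromSnapshot(List<DataFileEntry> snapshot) {
        return snapshot.stream()
                .map(entry -> entry.path)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Creates a toy table directory with live, stale, and metadata files. */
    static Path createToyTable() {
        try {
            Path root = Files.createTempDirectory("toy_table");
            Files.createDirectories(root.resolve("data"));
            Files.createDirectories(root.resolve("metadata"));
            for (String name : Arrays.asList(
                    "data/a.parquet",             // committed
                    "data/b.parquet",             // committed
                    "data/c.parquet",             // left over from an aborted INSERT
                    "data/d.parquet.inprogress",  // still being written
                    "metadata/v1.metadata.json")) {
                Files.write(root.resolve(name), "toy".getBytes(StandardCharsets.UTF_8));
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The toy snapshot references only the two committed files. */
    static List<DataFileEntry> toySnapshot() {
        return Arrays.asList(
                new DataFileEntry("data/a.parquet", 3L, Arrays.asList(0L)),
                new DataFileEntry("data/b.parquet", 3L, Arrays.asList(0L)));
    }

    public static void main(String[] args) {
        Path root = createToyTable();
        System.out.println("recursive listing: " + listRecursively(root));
        System.out.println("snapshot listing:  " + listFromSnapshot(toySnapshot()));
    }
}
```

In this toy setup the recursive listing returns all five files, including the metadata file and the two stale data files, while the snapshot-driven listing returns only the two committed ones. Because size and split offsets come with each entry, a loader built this way would also not need a separate file-system call to obtain them.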