Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Resolved
Description
Iceberg provides rich metadata, and directory listing on object storage services (e.g. S3 via S3A) costs more than on HDFS, so we could create the file descriptors from Iceberg metadata instead of using org.apache.hadoop.fs.FileSystem#listFiles. See https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189
The only thing missing from the Iceberg metadata is the last_modification_time of the files. But since Iceberg files are immutable, maybe we could just use a special placeholder timestamp for these files.
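
A rough sketch of the idea, not the actual Impala implementation: the file descriptors are built from per-file path and size information that Iceberg metadata already records, with no directory listing at all, and the missing modification time is filled with a fixed placeholder. All names here (IcebergFileDescriptors, DataFileInfo, FileDesc, PLACEHOLDER_MTIME, fromMetadata) are hypothetical stand-ins, and the placeholder value 0 is an assumption; real code would read the entries through the Iceberg API and produce Impala's own descriptor type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IcebergFileDescriptors {
    // Assumption: a fixed sentinel is acceptable because Iceberg files are
    // immutable, so their modification time never needs to be compared.
    static final long PLACEHOLDER_MTIME = 0L;

    /** Hypothetical stand-in for a data file entry read from Iceberg metadata. */
    static final class DataFileInfo {
        final String path;
        final long sizeBytes;

        DataFileInfo(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical simplified descriptor; not Impala's real file descriptor class. */
    static final class FileDesc {
        final String path;
        final long length;
        final long modificationTime;

        FileDesc(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    /** Builds descriptors purely from metadata entries, without touching the file system. */
    static List<FileDesc> fromMetadata(List<DataFileInfo> dataFiles) {
        List<FileDesc> result = new ArrayList<>(dataFiles.size());
        for (DataFileInfo f : dataFiles) {
            result.add(new FileDesc(f.path, f.sizeBytes, PLACEHOLDER_MTIME));
        }
        return result;
    }

    public static void main(String[] args) {
        // Example paths and sizes are made up for illustration.
        List<DataFileInfo> files = Arrays.asList(
            new DataFileInfo("s3a://bucket/tbl/data/a.parquet", 1024L),
            new DataFileInfo("s3a://bucket/tbl/data/b.parquet", 2048L));
        for (FileDesc fd : fromMetadata(files)) {
            System.out.println(fd.path + " " + fd.length + " " + fd.modificationTime);
        }
    }
}
```

One thing to watch with a placeholder timestamp: any code path that uses the modification time to detect changed files would need to treat the sentinel as "unknown" rather than as a real time.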