Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
If an Iceberg table is frequently updated or written to in small batches, many small files are created, and this fragmentation degrades read performance. Frequent row-level deletes add to the problem by creating delete files that have to be merged on read.
Currently, INSERT OVERWRITE is used as a workaround to rewrite and compact Iceberg tables.
The OPTIMIZE statement offers a new syntax and an Iceberg-specific solution to this problem.
This first sub-task introduces the new syntax, temporarily as an alias for INSERT OVERWRITE.
Syntax: OPTIMIZE TABLE <table_name>;
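As a rough illustration of the alias relationship described above, the two statements below would be expected to have the same effect in this first sub-task. The table name ice_tbl is hypothetical, and the exact INSERT OVERWRITE form shown is an assumption about how the workaround is typically written, not something taken from the patch.

```sql
-- Hypothetical Iceberg table name: ice_tbl.

-- Existing workaround: rewrite the table onto itself so that many small
-- data files (and any delete files) are replaced by freshly written files.
INSERT OVERWRITE TABLE ice_tbl SELECT * FROM ice_tbl;

-- New syntax from this sub-task, assumed here to behave as an alias
-- for the statement above.
OPTIMIZE TABLE ice_tbl;
```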
Limitations: OPTIMIZE TABLE cannot be executed on the following tables:
- Tables with partition evolution
- Tables with complex-type columns
- Non-Iceberg tables