IMPALA / IMPALA-12293 OPTIMIZE statement to Compact Iceberg Tables / IMPALA-12406

OPTIMIZE statement as an alias for INSERT OVERWRITE

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component: Frontend

    Description

      If an Iceberg table is frequently updated or written to in small batches, many small data files accumulate. This fragmentation degrades read performance. Frequent row-level deletes add to the problem by creating delete files that have to be merged with the data files on read.

      Currently INSERT OVERWRITE is used as a workaround to rewrite and compact Iceberg tables.
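
      The workaround can be sketched as follows, assuming a hypothetical Iceberg table named ice_tbl (the name is illustrative and does not come from this issue). Overwriting the table with its own contents rewrites the data, which typically leaves fewer, larger data files and no pending delete files:

      ```sql
      -- ice_tbl is a hypothetical table name used only for illustration.
      -- Rewrite the table with its own rows to compact its files.
      INSERT OVERWRITE TABLE ice_tbl SELECT * FROM ice_tbl;
      ```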

      The OPTIMIZE statement offers a new syntax and an Iceberg-specific solution to this problem.

      This first subtask introduces the new syntax, temporarily as an alias for INSERT OVERWRITE.

      Syntax: OPTIMIZE TABLE <table_name>;
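
      A minimal usage sketch, assuming a hypothetical Iceberg table named ice_tbl. Because this subtask makes OPTIMIZE an alias for INSERT OVERWRITE, the statement is presumed to behave like the self-overwrite in the trailing comment; that exact rewritten form is an assumption and is not taken from the implementation:

      ```sql
      -- ice_tbl is a hypothetical table name used only for illustration.
      OPTIMIZE TABLE ice_tbl;

      -- Presumed equivalent under the alias semantics (an assumption):
      -- INSERT OVERWRITE TABLE ice_tbl SELECT * FROM ice_tbl;
      ```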

      Limitations - OPTIMIZE TABLE cannot be executed on the following tables:

      • Tables with partition evolution
      • Tables with complex-type columns
      • Non-Iceberg tables
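
      The limitations above might be exercised as sketched below. All table and column names are hypothetical, the DDL follows Impala's Iceberg syntax as commonly documented, and the sketch only assumes that each OPTIMIZE statement is rejected; it makes no claim about the error text:

      ```sql
      -- Hypothetical non-Iceberg table: OPTIMIZE is expected to be rejected.
      CREATE TABLE parquet_tbl (id INT, name STRING) STORED AS PARQUET;
      OPTIMIZE TABLE parquet_tbl;

      -- Hypothetical Iceberg table whose partition spec changes after creation.
      CREATE TABLE ice_evolved (id INT, ts TIMESTAMP)
      PARTITIONED BY SPEC (DAY(ts))
      STORED AS ICEBERG;
      ALTER TABLE ice_evolved SET PARTITION SPEC (MONTH(ts));
      -- The table now has partition evolution: OPTIMIZE is expected to be rejected.
      OPTIMIZE TABLE ice_evolved;

      -- Hypothetical Iceberg table with a complex-type column:
      -- OPTIMIZE is expected to be rejected.
      CREATE TABLE ice_complex (id INT, tags ARRAY<STRING>) STORED AS ICEBERG;
      OPTIMIZE TABLE ice_complex;
      ```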

          People

            noemi Noemi Pap-Takacs
