  1. IMPALA
  2. IMPALA-13499

REFRESH on Iceberg tables can lead to data loss

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.4.1
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels: None

    Description

      When running a REFRESH statement on an Iceberg table, the catalog loads the table from the Hive metastore and later performs an alter_table call. It does so without taking a Hive lock, so if an external process commits to the table between the load and the alter, the newly committed "metadata_location" property is overwritten with the previous value, which effectively results in data loss.
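The race can be illustrated with a minimal sketch. It uses a toy in-memory stand-in for the metastore, not the real HMS client or Impala's catalog code. `ToyMetastore`, the `s3://bucket/...` paths and the `some_property` key are hypothetical names made up for illustration.

```python
# Toy in-memory stand-in for the Hive metastore (hypothetical, not the HMS API).
class ToyMetastore:
    def __init__(self, metadata_location):
        self.params = {"metadata_location": metadata_location}

    def get_table(self):
        # Return a snapshot copy, like loading a table object from the metastore.
        return dict(self.params)

    def alter_table(self, new_params):
        # Unconditional overwrite: no lock and no check of the current value.
        self.params = dict(new_params)


V1 = "s3://bucket/tbl/metadata/v1.metadata.json"  # hypothetical paths
V2 = "s3://bucket/tbl/metadata/v2.metadata.json"

hms = ToyMetastore(V1)

# 1. The catalog loads the table at the start of REFRESH.
loaded = hms.get_table()

# 2. An external writer commits a new snapshot between load and alter.
external = hms.get_table()
external["metadata_location"] = V2
hms.alter_table(external)

# 3. The catalog writes back its stale copy with some unrelated change.
loaded["some_property"] = "updated-by-refresh"  # hypothetical property
hms.alter_table(loaded)

# The pointer is back at V1, so the snapshot committed in step 2 is orphaned.
final_location = hms.params["metadata_location"]
print(final_location)
```

After step 3 the table points at the old metadata file again, so readers no longer see the data committed in step 2.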

      The catalog should either take a Hive lock when doing this or, if "iceberg.engine.hive.lock-enabled" is false, use "alter_table_with_environmentContext" and set expected_parameter_key / expected_parameter_value to metadata_location / <previous version>.
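The second option amounts to a compare-and-swap on "metadata_location". A minimal sketch of the idea follows, again against a toy in-memory model rather than the real metastore client. `ToyMetastore`, `alter_table_checked`, `ConflictError` and the paths are hypothetical, and the sketch assumes the real call performs the check and the write atomically on the metastore side.

```python
class ConflictError(Exception):
    """Raised when the stored value no longer matches the expected one."""


# Toy in-memory stand-in for the Hive metastore (hypothetical, not the HMS API).
class ToyMetastore:
    def __init__(self, metadata_location):
        self.params = {"metadata_location": metadata_location}

    def get_table(self):
        return dict(self.params)

    def alter_table_checked(self, new_params, expected_key, expected_value):
        # Models an alter with expected_parameter_key / expected_parameter_value:
        # the write is applied only if the stored value is still the expected one.
        current = self.params.get(expected_key)
        if current != expected_value:
            raise ConflictError(f"{expected_key} changed to {current!r}")
        self.params = dict(new_params)


V1 = "s3://bucket/tbl/metadata/v1.metadata.json"  # hypothetical paths
V2 = "s3://bucket/tbl/metadata/v2.metadata.json"

hms = ToyMetastore(V1)
loaded = hms.get_table()  # the catalog loads the table during REFRESH

# An external writer commits V2; its expectation (V1) still holds, so it succeeds.
external = hms.get_table()
external["metadata_location"] = V2
hms.alter_table_checked(external, "metadata_location", V1)

# The catalog's stale write-back expects V1 and is rejected instead of applied.
rejected = False
try:
    hms.alter_table_checked(loaded, "metadata_location", loaded["metadata_location"])
except ConflictError:
    rejected = True  # the caller would reload the table and retry

final_location = hms.params["metadata_location"]
print(rejected, final_location)
```

With the check in place the stale write fails and the externally committed location survives. The caller can then reload the table and retry.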

            People

              Assignee: Unassigned
              Reporter: Saulius Valatka (saulius.vl)