IMPALA-12667

Mention interoperability considerations for Iceberg table conversion

Details

    • Type: Documentation
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When Impala writes STRING columns in legacy tables, it does not add the UTF8 annotation to the Parquet files, because users might store binary data in STRING columns (Impala added support for BINARY columns only recently).

      When a legacy table is converted to Iceberg, the data files are not rewritten; Impala only creates the Iceberg metadata files on top of the existing data files.

      The Iceberg spec requires STRING columns to be stored with the UTF8 annotation in Parquet files. Non-Impala readers might throw exceptions when they encounter STRING columns without it.
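
To make the compatibility risk concrete, here is a minimal sketch of how one might find the affected columns in a data file. The `ParquetColumn` class and both function names are hypothetical, introduced only for illustration. The optional `read_columns` helper assumes the `pyarrow` package is installed and relies on its `ParquetFile(...).schema` column metadata; it has not been checked against files written by Impala.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ParquetColumn:
    """Hypothetical, minimal view of one Parquet leaf column."""
    name: str
    physical_type: str  # e.g. "BYTE_ARRAY", "INT64"
    annotation: str     # "UTF8" when annotated, "NONE" otherwise


def unannotated_string_columns(columns: Iterable[ParquetColumn],
                               string_columns: Set[str]) -> List[str]:
    """Return names of columns that the table declares as STRING but whose
    Parquet BYTE_ARRAY column carries no UTF8 annotation."""
    return [
        c.name
        for c in columns
        if c.name in string_columns
        and c.physical_type == "BYTE_ARRAY"
        and c.annotation != "UTF8"
    ]


def read_columns(path: str) -> List[ParquetColumn]:
    """Read column metadata from a Parquet file (assumes pyarrow is installed)."""
    import pyarrow.parquet as pq  # imported lazily so the check above has no dependency

    schema = pq.ParquetFile(path).schema
    result = []
    for i in range(len(schema)):
        col = schema.column(i)
        annotated = (col.converted_type == "UTF8"
                     or col.logical_type.type == "STRING")
        result.append(ParquetColumn(col.name, col.physical_type,
                                    "UTF8" if annotated else "NONE"))
    return result
```

The set of STRING column names has to come from the table schema, because an unannotated BYTE_ARRAY column may also be a legitimate BINARY column.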

      Add a section about the above to the docs, and mention CTAS (CREATE TABLE AS SELECT) statements as a possible workaround.
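
The difference between the two approaches can be sketched as the statements each one issues. The helper names and the table names are hypothetical; the statement shapes follow Impala's `ALTER TABLE ... CONVERT TO ICEBERG` and `CREATE TABLE ... STORED AS ICEBERG AS SELECT` syntax. The sketch assumes CTAS helps because it writes new data files instead of reusing the existing ones, which is why it may serve as a workaround.

```python
import re

# Accepts "tbl" or "db.tbl"; a deliberately strict check for this sketch.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _checked(name: str) -> str:
    """Reject anything that is not a plain, optionally db-qualified, table name."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unexpected table name: {name!r}")
    return name


def convert_in_place_sql(table: str) -> str:
    """In-place conversion: only Iceberg metadata is created and the existing
    data files (without the UTF8 annotation) are kept as they are."""
    return f"ALTER TABLE {_checked(table)} CONVERT TO ICEBERG"


def ctas_to_iceberg_sql(source: str, target: str) -> str:
    """Possible workaround: copy the rows into a new Iceberg table so that
    new data files are written."""
    return (f"CREATE TABLE {_checked(target)} STORED AS ICEBERG "
            f"AS SELECT * FROM {_checked(source)}")


if __name__ == "__main__":
    print(convert_in_place_sql("db.legacy_tbl"))
    print(ctas_to_iceberg_sql("db.legacy_tbl", "db.ice_tbl"))
```

The CTAS route costs a full copy of the data, so it is a trade-off against the metadata-only conversion, not a drop-in replacement.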

          People

            Assignee: Unassigned
            Reporter: Zoltán Borók-Nagy (boroknagyz)