IMPALA / IMPALA-12987

Errors with \0 character in partition values

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      Inserting strings that contain "\0" into partition columns leads to errors in both Iceberg and Hive tables.

      The issue is more severe for Iceberg tables, because after the insert the table can no longer be read in Impala or Hive:

      create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg;
      insert into iceberg_unicode select "a", "a\0a";
      ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode
      CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details.
      

      The partition directory created by the insert above appears to be truncated at the \0 character:
      hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
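
The truncated directory name is what one would expect if the path passed through code that treats strings as NUL-terminated at some point. Whether that is what happens in Impala or HDFS here is an assumption, not something confirmed in this report. A minimal Python sketch of the effect, using `ctypes` only to mimic C-string semantics:

```python
import ctypes

# Partition value from the reproduction above; "\0" is a single NUL character.
partition_value = "a\0a"
path_component = ("p=" + partition_value).encode("utf-8")

# c_char_p models a NUL-terminated C string, so reading .value stops
# at the first zero byte. This only illustrates C-string semantics;
# it is not Impala or HDFS code.
as_c_string = ctypes.c_char_p(path_component).value

print(path_component)  # full bytes, including the embedded zero byte
print(as_c_string)     # bytes up to, but not including, the zero byte
```

The second value matches the `p=a` directory name seen above, while the first is what the partition value actually contains.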

      In partitioned Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to the one in IMPALA-11499.

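      Note that Java handles \0 characters specially in the modified UTF-8 encoding used by JNI, where \0 is encoded as the two bytes 0xC0 0x80 instead of a single zero byte. This may be related: https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542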
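
To make the encoding difference concrete, here is a small Python sketch that compares standard UTF-8 with a hand-rolled approximation of Java's modified UTF-8 for the value used in the reproduction. The helper `to_modified_utf8` is hypothetical and written only for this illustration; it covers Basic Multilingual Plane characters and rejects supplementary characters, which modified UTF-8 encodes differently as well.

```python
def to_modified_utf8(text: str) -> bytes:
    """Approximate Java's modified UTF-8 for BMP-only text (illustrative helper)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        # Supplementary characters use surrogate pairs in modified UTF-8;
        # that case is out of scope for this sketch.
        raise ValueError("supplementary characters are not handled in this sketch")
    # In standard UTF-8 the byte 0x00 only ever encodes U+0000, so a plain
    # byte replacement is enough to switch to the two-byte form 0xC0 0x80.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


value = "a\0a"
standard = value.encode("utf-8")
modified = to_modified_utf8(value)

print(standard.hex(" "))  # standard UTF-8 bytes
print(modified.hex(" "))  # modified UTF-8 bytes, with no zero byte
```

If one layer produces the standard form and another expects the modified form (or the reverse), the two can disagree on the bytes of the partition value. Whether this mismatch plays a role in this bug is not established.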

            People

              Assignee: Unassigned
              Reporter: Csaba Ringhofer (csringhofer)