IMPALA-12987

Errors with \0 character in partition values


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      Inserting strings that contain "\0" into partition columns leads to errors in both Iceberg and Hive tables.

      The issue is more severe in Iceberg tables, because after the insert the table can no longer be read in Impala or Hive:

      create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg;
      insert into iceberg_unicode select "a", "a\0a";
      ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode
      CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details.
      

      The partition directory created by the insert above appears to be truncated at the \0 character:
      hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
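
      One possible explanation for the truncated name, offered here only as an assumption and not confirmed by this report, is that the path passes through code that treats strings as NUL-terminated C strings. The Python sketch below uses ctypes to show how such handling would turn "p=a\0a" into "p=a"; the variable names are illustrative and do not come from Impala.

```python
import ctypes

# Partition directory name as it would be built from the inserted value "a\0a".
partition_dir = "p=a\0a".encode("utf-8")

# A ctypes buffer keeps every byte in .raw, but .value reads the buffer as a
# NUL-terminated C string and stops at the first \0 byte.
buf = ctypes.create_string_buffer(partition_dir)
full_bytes = buf.raw[:len(partition_dir)]
c_string_view = buf.value

print(full_bytes)     # all 5 bytes of the name, including the \0
print(c_string_view)  # only the bytes before the first \0
```

      If something like this happens on the write path, the data file would land in a directory that does not match the partition value recorded in the table metadata, which would be consistent with the loading failure above.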

      In partitioned Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to the one in IMPALA-11499.

      Note that Java handles \0 characters in Unicode strings in a special way, which may be related: https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542
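
      The special handling in question is JNI's modified UTF-8, in which the \0 character is written as the two bytes 0xC0 0x80 instead of a single 0x00 byte. The Python sketch below mimics only that one rule (the helper to_modified_utf8_nul_only is hypothetical and ignores the other differences, such as supplementary characters). It shows that the two encodings produce different bytes for the value used in this report and that a strict UTF-8 decoder rejects the modified form. Whether this is actually connected to the errors here is not established.

```python
def to_modified_utf8_nul_only(text: str) -> bytes:
    # Hypothetical helper: standard UTF-8, except that \0 is written as the
    # two-byte sequence C0 80. In UTF-8 the byte 0x00 only ever encodes
    # U+0000, so a plain byte replacement is safe for this illustration.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


value = "a\0a"  # the partition value from the insert statement
standard = value.encode("utf-8")
modified = to_modified_utf8_nul_only(value)

# A strict UTF-8 decoder does not accept C0 80 as an encoding of \0.
try:
    modified.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(standard.hex(), modified.hex(), strict_decode_ok)
```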


          People

            Assignee: Unassigned
            Reporter: Csaba Ringhofer (csringhofer)
