Hive
  1. Hive
  2. HIVE-4022

Structs and struct fields cannot be NULL in INSERT statements

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Originally thought to be Avro-specific, and first noted with respect to HIVE-3528 "Avro SerDe doesn't handle serializing Nullable types that require access to a Schema", it turns out even native Hive tables cannot store NULL in a STRUCT field or for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement.

      Again, this affects both Avro-backed tables and native Hive tables.

      ***For native Hive tables:

      The following:

      echo 1,2 >twovalues.csv
      hive
      CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
      CREATE TABLE oc (z STRUCT<a: int, b: int>);
      INSERT INTO TABLE oc SELECT null FROM tc;

      produces the error

      FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.

      The following:

      INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

      produces the error:

      FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.

      ***For Avro:

      In HIVE-3528, there is in fact a null-struct test case in line 14 of
      https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

      The test script at
      https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

      does indeed work. But in that test, the query gets all of its data from a test table verbatim:

      INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

      If instead we stick in a hard-coded null for the struct directly into the query, it fails:

      INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

      with the following error:

      FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

      Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:

      INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

        Activity

        Hide
        Michael Malak added a comment -

        Note that there is a workaround for the case of setting STRUCT fields to NULL, but not for setting the whole STRUCT to a NULL.

        The following workaround does work:

        INSERT INT TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc;

        But there is no equivalent workaround to casting the whole STRUCT to NULL, as noted in the first comment of https://issues.apache.org/jira/browse/HIVE-1287

        Show
        Michael Malak added a comment - Note that there is a workaround for the case of setting STRUCT fields to NULL, but not for setting the whole STRUCT to a NULL. The following workaround does work: INSERT INT TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc; But there is no equivalent workaround to casting the whole STRUCT to NULL, as noted in the first comment of https://issues.apache.org/jira/browse/HIVE-1287
        Hide
        Adrian Hains added a comment -

        I found a workaround to get me past this restriction. I had a need to add some struct columns to a table t1 by way of copying the data to a new table t2 with the correct updated schema. Trying to insert directly to t2 by selecting from t1 with null literals failed for me as described in this jira ticket. To work around this I created an additional table t2copy that has the same schema as t2. Then I did an insert to t1 selecting from t2 left outer join t2copy, and referencing the t2copy.newStructColumn instance to have a table-sourced null value pass to t1. This worked. It may be that t2copy having the same struct definition is unnecessary, and a simple empty table with a bogus struct column definition would have worked just as well.

        Show
        Adrian Hains added a comment - I found a workaround to get me past this restriction. I had a need to add some struct columns to a table t1 by way of copying the data to a new table t2 with the correct updated schema. Trying to insert directly to t2 by selecting from t1 with null literals failed for me as described in this jira ticket. To work around this I created an additional table t2copy that has the same schema as t2. Then I did an insert to t1 selecting from t2 left outer join t2copy, and referencing the t2copy.newStructColumn instance to have a table-sourced null value pass to t1. This worked. It may be that t2copy having the same struct definition is unnecessary, and a simple empty table with a bogus struct column definition would have worked just as well.

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael Malak
          • Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development