[HIVE-4022] Structs and struct fields cannot be NULL in INSERT statements

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      Originally thought to be Avro-specific, and first noted with respect to HIVE-3528 "Avro SerDe doesn't handle serializing Nullable types that require access to a Schema", it turns out even native Hive tables cannot store NULL in a STRUCT field or for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement.

      Again, this affects both Avro-backed tables and native Hive tables.

      ***For native Hive tables:

      The following:

      echo 1,2 >twovalues.csv
      hive
      CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
      CREATE TABLE oc (z STRUCT<a: int, b: int>);
      INSERT INTO TABLE oc SELECT null FROM tc;

      produces the error

      FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.
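A commonly suggested workaround (a sketch, not verified against this Hive build) is to give the NULL an explicit struct type via a conditional whose never-taken branch fixes the expression's type:

```sql
-- Hypothetical workaround: the IF() branch that is never taken
-- (named_struct over the real INT columns x and y) pins the
-- expression's type to struct<a:int,b:int>, so the NULL branch
-- inherits that type instead of void.
INSERT INTO TABLE oc SELECT IF(false, named_struct('a', x, 'b', y), null) FROM tc;
```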

      The following:

      INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

      produces the error:

      FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.
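For NULLs in individual struct fields, a similar hedged workaround is to CAST each NULL to the field's declared type, so the constructed struct's type matches the target column:

```sql
-- Hypothetical: cast(null as int) yields a typed NULL, so the
-- constructed value is struct<a:int,b:int> rather than
-- struct<a:void,b:void>, matching column z's declared type.
INSERT INTO TABLE oc
SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc;
```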

      ***For Avro:

      In HIVE-3528, there is in fact a null-struct test case in line 14 of
      https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

      The test script at
      https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

      does indeed work. But in that test, the query gets all of its data from a test table verbatim:

      INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

      If instead we put a hard-coded null for the struct directly into the query, it fails:

      INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

      with the following error:

      FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

      Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:

      INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;
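A hedged workaround for the Avro case, assuming IF() type inference behaves as for native tables, is to type the hard-coded NULL against the real struct1 column:

```sql
-- Hypothetical: IF(false, struct1, null) produces a NULL whose type
-- is taken from struct1 (struct<sint:int,sboolean:boolean,sstring:string>),
-- avoiding the void-to-struct conversion error for column 10.
INSERT OVERWRITE TABLE as_avro
SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1,
       list1, map1,
       IF(false, struct1, null),
       enum1, nullableint, bytes1, fixed1
FROM test_serializer;
```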

        Activity

        Michael Malak made changes -
        Summary: Avro SerDe queries don't handle hard-coded nulls for optional/nullable structs → Structs and struct fields cannot be NULL in INSERT statements
        Description: updated
        Michael Malak created issue -

          People

          • Assignee: Unassigned
          • Reporter: Michael Malak
          • Votes: 3
          • Watchers: 7