Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14205

Hive doesn't support union type with AVRO file format

    XMLWordPrintableJSON

Details

    Description

      Reproduce steps:

      hive> CREATE TABLE avro_union_test
          > PARTITIONED BY (p int)
          > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
          > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
          > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
          > TBLPROPERTIES ('avro.schema.literal'='{
          >    "type":"record",
          >    "name":"nullUnionTest",
          >    "fields":[
          >       {
          >          "name":"value",
          >          "type":[
          >             "null",
          >             "int",
          >             "long"
          >          ],
          >          "default":null
          >       }
          >    ]
          > }');
      OK
      Time taken: 0.105 seconds
      hive> alter table avro_union_test add partition (p=1);
      OK
      Time taken: 0.093 seconds
      hive> select * from avro_union_test;
      FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.java.lang.RuntimeException: Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
      	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
      	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
      	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
      	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
      	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
      

      Another test case to show this problem is:

      hive> create table avro_union_test2 (value uniontype<int,bigint>) stored as avro;
      OK
      Time taken: 0.053 seconds
      hive> show create table avro_union_test2;
      OK
      CREATE TABLE `avro_union_test2`(
        `value` uniontype<void,int,bigint> COMMENT '')
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION
        'hdfs://localhost/user/hive/warehouse/avro_union_test2'
      TBLPROPERTIES (
        'transient_lastDdlTime'='1468173589')
      Time taken: 0.051 seconds, Fetched: 12 row(s)
      

      Although column value is defined as uniontype<int,bigint> in create table command, its type becomes uniontype<void,int,bigint> after table is defined. Hive accidentally make the nullable definition in avro schema (["null", "int", "long"]) into union definition.

      Attachments

        1. HIVE-14205.1.patch
          8 kB
          Yibing Shi
        2. HIVE-14205.2.patch
          20 kB
          Yibing Shi
        3. HIVE-14205.3.patch
          21 kB
          Yibing Shi
        4. HIVE-14205.4.patch
          21 kB
          Yibing Shi
        5. HIVE-14205.5.patch
          21 kB
          Yibing Shi
        6. HIVE-14205.6.patch
          23 kB
          Yibing Shi
        7. HIVE-14205.7.patch
          23 kB
          Yibing Shi

        Issue Links

          Activity

            People

              Yibing Yibing Shi
              Yibing Yibing Shi
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: