Hive / HIVE-14205

Hive doesn't support union type with AVRO file format


Details

    Description

      Steps to reproduce:

      hive> CREATE TABLE avro_union_test
          > PARTITIONED BY (p int)
          > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
          > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
          > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
          > TBLPROPERTIES ('avro.schema.literal'='{
          >    "type":"record",
          >    "name":"nullUnionTest",
          >    "fields":[
          >       {
          >          "name":"value",
          >          "type":[
          >             "null",
          >             "int",
          >             "long"
          >          ],
          >          "default":null
          >       }
          >    ]
          > }');
      OK
      Time taken: 0.105 seconds
      hive> alter table avro_union_test add partition (p=1);
      OK
      Time taken: 0.093 seconds
      hive> select * from avro_union_test;
      FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.java.lang.RuntimeException: Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
      	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
      	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
      	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
      	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
      	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
      	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
      

      A second test case shows the same problem:

      hive> create table avro_union_test2 (value uniontype<int,bigint>) stored as avro;
      OK
      Time taken: 0.053 seconds
      hive> show create table avro_union_test2;
      OK
      CREATE TABLE `avro_union_test2`(
        `value` uniontype<void,int,bigint> COMMENT '')
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION
        'hdfs://localhost/user/hive/warehouse/avro_union_test2'
      TBLPROPERTIES (
        'transient_lastDdlTime'='1468173589')
      Time taken: 0.051 seconds, Fetched: 12 row(s)
      

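      Although the column value is defined as uniontype<int,bigint> in the CREATE TABLE command, its type becomes uniontype<void,int,bigint> once the table is created. Hive mistakenly treats the "null" branch, which only marks the field as nullable in the Avro schema (["null", "int", "long"]), as a member of the union. The resulting void member is what triggers the "void not supported yet" error in the first test case.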
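
      The mapping this report expects can be illustrated with a small sketch. This is not Hive's AvroSerDe code: the function names, the type table, and the restriction to primitive types are assumptions made for illustration only. The first function reproduces the reported behavior by keeping "null" as a void member; the second treats "null" purely as a nullability marker and drops it from the union.

```python
# Hypothetical illustration only; not taken from Hive's source.
# Covers a small subset of Avro primitive types.
AVRO_TO_HIVE_PRIMITIVES = {
    "null": "void",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "string",
    "bytes": "binary",
}


def naive_hive_type(avro_type):
    """Reported behavior: every union branch, including "null", becomes a member."""
    if isinstance(avro_type, str):
        return AVRO_TO_HIVE_PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        members = ",".join(naive_hive_type(t) for t in avro_type)
        return "uniontype<" + members + ">"
    raise ValueError("unsupported Avro type in this sketch: %r" % (avro_type,))


def expected_hive_type(avro_type):
    """Expected behavior: "null" only marks the field as nullable."""
    if isinstance(avro_type, str):
        return AVRO_TO_HIVE_PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        if not branches:
            return "void"  # a union containing only "null"
        if len(branches) == 1:
            # ["null", T] collapses to a nullable T
            return expected_hive_type(branches[0])
        members = ",".join(expected_hive_type(t) for t in branches)
        return "uniontype<" + members + ">"
    raise ValueError("unsupported Avro type in this sketch: %r" % (avro_type,))


schema_type = ["null", "int", "long"]
print(naive_hive_type(schema_type))
print(expected_hive_type(schema_type))
```

      For the schema used in the test cases, the first call yields the type seen in SHOW CREATE TABLE and the second yields the type originally declared for the column.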

      Attachments

        1. HIVE-14205.1.patch
          8 kB
          Yibing Shi
        2. HIVE-14205.2.patch
          20 kB
          Yibing Shi
        3. HIVE-14205.3.patch
          21 kB
          Yibing Shi
        4. HIVE-14205.4.patch
          21 kB
          Yibing Shi
        5. HIVE-14205.5.patch
          21 kB
          Yibing Shi
        6. HIVE-14205.6.patch
          23 kB
          Yibing Shi
        7. HIVE-14205.7.patch
          23 kB
          Yibing Shi

          People

            Assignee: Yibing Shi
            Reporter: Yibing Shi
            Votes: 0
            Watchers: 6
