Hive / HIVE-16370

Avro data type null not supported on partitioned tables

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0, 2.1.1
    • None
    • None
    • None

    Description

      I was attempting to create Hive tables over some partitioned Avro files. It seems the void data type (Avro null) is not supported on partitioned tables (I could not replicate the bug on an unpartitioned table).
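
Since the failure is tied to fields whose Avro type is a bare "null", it may help to list such fields in a schema file before creating the table. The following is a rough sketch only: `list_null_fields` is a hypothetical helper, not part of Hive or avro-tools, and it relies on a line-oriented grep rather than a real JSON parser, so it assumes each field definition sits on a single line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print (with line numbers) every line of an Avro schema
# file that declares a bare "null" type, i.e. "type":"null".
# Unions such as ["null","string"] are intentionally not matched.
list_null_fields() {
  grep -n '"type"[[:space:]]*:[[:space:]]*"null"' "$1"
}
```

For the schema used in this report, this could be invoked as `list_null_fields /tmp/avro/null/schema.avsc`. Note that grep exits with a non-zero status when nothing matches.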

      ---------------

      I managed to replicate the bug on two different Hive versions:

      Hive 1.1.0-cdh5.10.0
      Hive 2.1.1-amzn-0
      ----------------

      How to replicate (avro-tools is required to create the Avro files):

      $ wget http://mirror.serversupportforum.de/apache/avro/avro-1.8.1/java/avro-tools-1.8.1.jar

      $ mkdir /tmp/avro
      $ mkdir /tmp/avro/null
      $ echo "{ \
      \"type\" : \"record\", \
      \"name\" : \"null_failure\", \
      \"namespace\" : \"org.apache.avro.null_failure\", \
      \"doc\" : \"the purpose of this schema is to replicate the hive avro null failure\", \
      \"fields\" : [ \
      {\"name\":\"one\", \"type\":\"null\", \"default\":null} \
      ] \
      } " > /tmp/avro/null/schema.avsc
      $ echo "{\"one\":null}" > /tmp/avro/null/data.json
      $ java -jar avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro/null/schema.avsc /tmp/avro/null/data.json > /tmp/avro/null/data.avro

      $ hdfs dfs -mkdir /tmp/avro
      $ hdfs dfs -mkdir /tmp/avro/null
      $ hdfs dfs -mkdir /tmp/avro/null/schema
      $ hdfs dfs -mkdir /tmp/avro/null/data
      $ hdfs dfs -mkdir /tmp/avro/null/data/foo=bar
      $ hdfs dfs -copyFromLocal /tmp/avro/null/schema.avsc /tmp/avro/null/schema/schema.avsc
      $ hdfs dfs -copyFromLocal /tmp/avro/null/data.avro /tmp/avro/null/data/foo=bar/data.avro
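
The schema and sample record from the steps above can also be written with quoted heredocs, which avoids escaping every double quote. This is a minimal sketch, not a change to the reproduction itself: `WORKDIR` is a hypothetical variable introduced here for illustration and defaults to the `/tmp/avro/null` directory used in this report. The avro-tools and hdfs commands stay as listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# WORKDIR is a hypothetical override; the report itself uses /tmp/avro/null.
WORKDIR="${WORKDIR:-/tmp/avro/null}"
mkdir -p "$WORKDIR"

# A quoted delimiter ('EOF') disables shell expansion, so the JSON is written verbatim.
cat > "$WORKDIR/schema.avsc" <<'EOF'
{
  "type" : "record",
  "name" : "null_failure",
  "namespace" : "org.apache.avro.null_failure",
  "doc" : "the purpose of this schema is to replicate the hive avro null failure",
  "fields" : [
    {"name":"one", "type":"null", "default":null}
  ]
}
EOF

# Single sample record whose only field is null.
cat > "$WORKDIR/data.json" <<'EOF'
{"one":null}
EOF
```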

      $ hive

      hive> CREATE EXTERNAL TABLE avro_null
      PARTITIONED BY (foo string)
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED as INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION
      '/tmp/avro/null/data/'
      TBLPROPERTIES (
      'avro.schema.url'='/tmp/avro/null/schema/schema.avsc')
      ;

      OK
      Time taken: 3.127 seconds

      hive> msck repair table avro_null;
      OK
      Partitions not in metastore: avro_null:foo=bar
      Repair: Added partition to metastore avro_null:foo=bar
      Time taken: 0.712 seconds, Fetched: 2 row(s)

      hive> select * from avro_null;
      FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.java.lang.RuntimeException: Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.

      hive> select foo, count(1) from avro_null group by foo;

      OK
      bar 1
      Time taken: 29.806 seconds, Fetched: 1 row(s)

      Attachments

        1. HIVE-16370.2.patch
          2 kB
          Alice Fan
        2. HIVE-16370.01-branch-1.patch
          2 kB
          Alice Fan

          People

            Assignee: Unassigned
            Reporter: rui miranda (iamwrong)
            Votes: 0
            Watchers: 7
