IMPALA-2229

Inconsistent behavior between Impala and Hive when creating an Avro table with an Avro schema in SERDEPROPERTIES and TBLPROPERTIES.


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 1.3, Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels:

      Description

      Impala and Hive appear to search the possible locations for an Avro schema in different orders. The following CREATE TABLE statement, which sets a different avro.schema.literal in SERDEPROPERTIES and in TBLPROPERTIES, shows the difference:

      CREATE TABLE t
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      WITH SERDEPROPERTIES
      ('avro.schema.literal'='{"name": "my_record", "type": "record",
       "fields": [{"name": "serde_string", "type": "string"}]}')
      TBLPROPERTIES
      ('avro.schema.literal'='{"name": "my_record", "type": "record",
       "fields": [{"name": "tblprop_string", "type": "string"}]}');
      

      Run the CREATE TABLE and DESC in Hive. The resulting column comes from the schema in TBLPROPERTIES:

      hive> CREATE TABLE t
          > ROW FORMAT SERDE
          > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
          > WITH SERDEPROPERTIES
          > ('avro.schema.literal'='{"name": "my_record", "type": "record",
          >  "fields": [{"name": "serde_string", "type": "string"}]}')
          > TBLPROPERTIES
          > ('avro.schema.literal'='{"name": "my_record", "type": "record",
          >  "fields": [{"name": "tblprop_string", "type": "string"}]}');
      OK
      Time taken: 0.689 seconds
      hive> desc t;
      OK
      tblprop_string      	string              	from deserializer   
      Time taken: 0.224 seconds, Fetched: 1 row(s)
      hive> 
      

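      Run the equivalent CREATE TABLE and DESC in Impala. Impala's syntax differs slightly (STORED AS AVRO instead of ROW FORMAT SERDE). Here the resulting column comes from the schema in SERDEPROPERTIES: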

      [localhost:21000] > CREATE TABLE t
                        > WITH SERDEPROPERTIES
                        > ('avro.schema.literal'='{"name": "my_record", "type": "record",
                        > "fields": [{"name": "serde_string", "type": "string"}]}')
                        > STORED AS AVRO
                        > TBLPROPERTIES
                        > ('avro.schema.literal'='{"name": "my_record", "type": "record",
                        > "fields": [{"name": "tblprop_string", "type": "string"}]}');
      Query: create TABLE t
      WITH SERDEPROPERTIES
      ('avro.schema.literal'='{"name": "my_record", "type": "record",
      "fields": [{"name": "serde_string", "type": "string"}]}')
      STORED AS AVRO
      TBLPROPERTIES
      ('avro.schema.literal'='{"name": "my_record", "type": "record",
      "fields": [{"name": "tblprop_string", "type": "string"}]}')
      
      WARNINGS: Ignoring column definitions in favor of Avro schema.
      The Avro schema has 1 column(s) but 0 column definition(s) were given.
      Fetched 0 row(s) in 0.32s
      [localhost:21000] > desc t;
      Query: describe t
      +--------------+--------+-------------------+
      | name         | type   | comment           |
      +--------------+--------+-------------------+
      | serde_string | string | from deserializer |
      +--------------+--------+-------------------+
      Fetched 1 row(s) in 4.83s
      

      The relevant code snippets from Impala can be found in CreateTableStmt.java and HdfsTable.java:

      // Look for the schema in TBLPROPERTIES and in SERDEPROPERTIES, with the latter
      // taking precedence.
      List<Map<String, String>> schemaSearchLocations = Lists.newArrayList();
      schemaSearchLocations.add(
          getMetaStoreTable().getSd().getSerdeInfo().getParameters());
      schemaSearchLocations.add(getMetaStoreTable().getParameters());
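
      To illustrate why the order of the search locations matters, here is a minimal, self-contained sketch. It is not Impala or Hive code: the class AvroSchemaLookup, the method findSchemaLiteral, and the placeholder schema strings are hypothetical, and the sketch assumes a simple "first location that has the key wins" lookup. Only the property key avro.schema.literal and the observed DESC results come from this report.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Impala or Hive code: returns the first
// 'avro.schema.literal' found when scanning property maps in order.
final class AvroSchemaLookup {
    static final String SCHEMA_LITERAL_KEY = "avro.schema.literal";

    static String findSchemaLiteral(List<Map<String, String>> searchLocations) {
        for (Map<String, String> props : searchLocations) {
            String literal = props.get(SCHEMA_LITERAL_KEY);
            if (literal != null && !literal.isEmpty()) {
                return literal;  // the earliest location that has the key wins
            }
        }
        return null;  // no schema literal in any location
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the two JSON schemas in the report.
        Map<String, String> serdeProps = Map.of(SCHEMA_LITERAL_KEY, "serde_schema");
        Map<String, String> tblProps = Map.of(SCHEMA_LITERAL_KEY, "tblprop_schema");

        // SERDEPROPERTIES searched first, as in the Impala snippet.
        System.out.println(findSchemaLiteral(List.of(serdeProps, tblProps)));
        // TBLPROPERTIES searched first, matching the DESC result seen in Hive.
        System.out.println(findSchemaLiteral(List.of(tblProps, serdeProps)));
    }
}
```

      Under these assumptions, swapping the order of the two entries in the search list is enough to change which schema is picked, which is consistent with the difference between the two DESC outputs in this report.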
      

      We should make Impala behave consistently with Hive, which in the example above means giving the schema in TBLPROPERTIES precedence. However, this is a backward-incompatible change, so the fix needs to be scheduled accordingly.

            People

            • Assignee: Unassigned
            • Reporter: Alexander Behm (alex.behm)