Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-3442

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem
    • 0.10.0
    • 0.10.0
    • None
    • None

    Description

      After creating a table and load data into it, I could check that the table is created successfully, and data is inside:

      DROP TABLE IF EXISTS ml_items;

      CREATE TABLE ml_items(id INT,
      title STRING,
      release_date STRING,
      video_release_date STRING,
      imdb_url STRING,
      unknown_genre TINYINT,
      action TINYINT,
      adventure TINYINT,
      animation TINYINT,
      children TINYINT,
      comedy TINYINT,
      crime TINYINT,
      documentary TINYINT,
      drama TINYINT,
      fantasy TINYINT,
      film_noir TINYINT,
      horror TINYINT,
      musical TINYINT,
      mystery TINYINT,
      romance TINYINT,
      sci_fi TINYINT,
      thriller TINYINT,
      war TINYINT,
      western TINYINT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
      STORED AS TEXTFILE;
      LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;

      select * from ml_items ORDER BY id ASC;

      While, the following create external table with AvroSerDe is not working:

      DROP TABLE IF EXISTS ml_items_as_avro;
      CREATE EXTERNAL TABLE ml_items_as_avro
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      WITH SERDEPROPERTIES (
      'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
      STORED as INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';

      describe ml_items_as_avro;

      INSERT OVERWRITE TABLE ml_items_as_avro
      SELECT id, title,
      imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
      documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
      sci_fi, thriller, war, western
      FROM ml_items;

      ml_items_as_avro is not created with expected schema, as shown in the "describe ml_items_as_avro" output. The output is below:

      PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
      PREHOOK: type: DROPTABLE
      POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
      POSTHOOK: type: DROPTABLE
      PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      WITH SERDEPROPERTIES (
      'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
      STORED as INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
      PREHOOK: type: CREATETABLE
      POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      WITH SERDEPROPERTIES (
      'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
      STORED as INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
      POSTHOOK: type: CREATETABLE
      POSTHOOK: Output: default@ml_items_as_avro
      PREHOOK: query: describe ml_items_as_avro
      PREHOOK: type: DESCTABLE
      POSTHOOK: query: describe ml_items_as_avro
      POSTHOOK: type: DESCTABLE
      error_error_error_error_error_error_error string from deserializer
      cannot_determine_schema string from deserializer
      check string from deserializer
      schema string from deserializer
      url string from deserializer
      and string from deserializer
      literal string from deserializer
      FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

      Attachments

        Activity

          People

            zhenxiao Zhenxiao Luo
            zhenxiao Zhenxiao Luo
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: