Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Not A Problem
-
0.10.0
-
None
-
None
Description
After creating a table and load data into it, I could check that the table is created successfully, and data is inside:
DROP TABLE IF EXISTS ml_items;
CREATE TABLE ml_items(id INT,
title STRING,
release_date STRING,
video_release_date STRING,
imdb_url STRING,
unknown_genre TINYINT,
action TINYINT,
adventure TINYINT,
animation TINYINT,
children TINYINT,
comedy TINYINT,
crime TINYINT,
documentary TINYINT,
drama TINYINT,
fantasy TINYINT,
film_noir TINYINT,
horror TINYINT,
musical TINYINT,
mystery TINYINT,
romance TINYINT,
sci_fi TINYINT,
thriller TINYINT,
war TINYINT,
western TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
select * from ml_items ORDER BY id ASC;
While, the following create external table with AvroSerDe is not working:
DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro
SELECT id, title,
imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
sci_fi, thriller, war, western
FROM ml_items;
ml_items_as_avro is not created with expected schema, as shown in the "describe ml_items_as_avro" output. The output is below:
PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from deserializer
literal string from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.