Details
Type: New Feature
Status: Open
Priority: Critical
Resolution: Unresolved
Description
Today we cannot create a Parquet-based Hive table without explicitly specifying the column names and types. When we try to define the table the following way, we get this error:
14/08/20 17:27:46 ERROR ql.Driver: FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified
CREATE TABLE parquet_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '/user/pratik/campaigns';
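As the error message says, the statement only goes through once a list of columns is supplied. A minimal sketch of today's workaround follows; the column names and types are illustrative, not taken from the actual dataset:
CREATE TABLE parquet_test (
  id INT,       -- illustrative column, not from the actual data
  name STRING   -- illustrative column, not from the actual data
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '/user/pratik/campaigns';
This is exactly the duplication the feature request is about: the schema has to be maintained by hand even though it is already recorded in the data files.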
By contrast, when we create a Hive table on top of Avro-based files, we do not need to specify the column names at all; Hive figures out the schema automatically through the SerDe:
CREATE EXTERNAL TABLE campaigns
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/pratik/campaigns'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/pratik/campaigns.avsc');
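Parquet files are similarly self-describing: the schema is stored in each file's footer, so the information Hive would need is already present in the data. As a sketch, the embedded schema can be inspected with the parquet-tools utility (assuming it is installed; the file path below is illustrative):
$ parquet-tools schema /user/pratik/campaigns/part-00000.parquet
The request is for ParquetHiveSerDe to derive the table schema from this footer metadata, just as the AvroSerDe derives it from the Avro schema.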