[PARQUET-76] Hive cannot determine the list of columns automatically based on Parquet serde - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Today we cannot create a Parquet-based Hive table without specifying the column names and types. When we try to define the table as shown below, we get this error:
"14/08/20 17:27:46 ERROR ql.Driver: FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified"

CREATE TABLE parquet_test
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/user/pratik/campaigns';

By contrast, when we create a Hive table on top of Avro files, we do not need to specify the column names; Hive determines the schema automatically through the SerDe (here, from the schema file referenced by avro.schema.url).

CREATE EXTERNAL TABLE campaigns
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/pratik/campaigns'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/pratik/campaigns.avsc');
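
Until the Parquet SerDe can supply the schema itself, the workaround the error message points to is to list the columns explicitly. The statement below is a minimal sketch of that workaround, not a confirmed fix: the column names and types (campaign_id, name) are hypothetical placeholders and would have to match the schema actually stored in the Parquet files under the table location.

```sql
-- Workaround sketch: declare the columns explicitly.
-- campaign_id and name are hypothetical placeholders; replace them with
-- the columns actually stored in the Parquet files under LOCATION.
CREATE TABLE parquet_test (
  campaign_id BIGINT,
  name STRING
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/user/pratik/campaigns';
```

This keeps the SerDe and input/output format classes from the failing statement unchanged, so the only difference is the explicit column list.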

People

Assignee: Ashish Singh

Reporter: Pratik Khadloya

Votes: 4

Watchers: 10

Dates

Created: 20/Aug/14 21:35

Updated: 23/Jun/24 03:26