Parquet / PARQUET-76

Hive cannot determine the list of columns automatically based on Parquet serde


Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      Today we cannot create a Parquet-backed Hive table without explicitly specifying the column names and types. When we try to define the table as shown below, without a column list, we get the error
      "14/08/20 17:27:46 ERROR ql.Driver: FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified"

      CREATE TABLE parquet_test
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
        '/user/pratik/campaigns';
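
Until the SerDe can infer the schema, the only way around the error is to spell out the column list. The sketch below is one possible way to generate such a statement. The helper `parquet_table_ddl` and the column names and types in `COLUMNS` are hypothetical placeholders, since the actual schema of the files under /user/pratik/campaigns is not given in this issue.

```python
# Hypothetical column list: the real schema of /user/pratik/campaigns is not
# stated in this issue, so these names and types are placeholders.
COLUMNS = [("campaign_id", "bigint"), ("name", "string")]


def parquet_table_ddl(table, location, columns):
    """Build a CREATE TABLE statement that lists every column explicitly."""
    column_list = ",\n".join(
        f"  `{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE TABLE {table} (\n{column_list}\n)\n"
        "ROW FORMAT SERDE\n"
        "  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT\n"
        "  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'\n"
        "OUTPUTFORMAT\n"
        "  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
        f"LOCATION\n  '{location}';"
    )


ddl = parquet_table_ddl("parquet_test", "/user/pratik/campaigns", COLUMNS)
print(ddl)
```

The generated statement differs from the failing one only in the parenthesized column list, which is the part this issue asks Hive to derive on its own.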
      

      By contrast, when we create a Hive table on top of Avro files, we do not need to specify the column names; Hive derives the schema automatically through the SerDe.

      CREATE EXTERNAL TABLE campaigns
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION '/user/pratik/campaigns'
      TBLPROPERTIES ('avro.schema.url'='hdfs:///user/pratik/campaigns.avsc');
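
To illustrate the kind of schema derivation the Avro table relies on, here is a rough sketch that turns an Avro record schema into Hive column definitions. It is not the AvroSerDe implementation: the helper `hive_columns` and the sample schema are hypothetical, the type mapping reflects my understanding of the documented Avro-to-Hive primitive mapping, and only flat records with primitive or nullable-primitive fields are handled.

```python
import json

# Avro primitive type -> Hive type (assumed mapping, primitives only).
AVRO_TO_HIVE = {
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def hive_columns(avsc_text):
    """Return (name, hive_type) pairs for a flat Avro record schema."""
    schema = json.loads(avsc_text)
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # A nullable field is written as a union such as ["null", "string"];
        # take the first non-null branch.
        if isinstance(avro_type, list):
            avro_type = [t for t in avro_type if t != "null"][0]
        columns.append((field["name"], AVRO_TO_HIVE[avro_type]))
    return columns


# Hypothetical stand-in for the contents of campaigns.avsc.
EXAMPLE_AVSC = """
{"type": "record", "name": "Campaign",
 "fields": [{"name": "campaign_id", "type": "long"},
            {"name": "name", "type": ["null", "string"]}]}
"""

columns = hive_columns(EXAMPLE_AVSC)
print(columns)
```

A Parquet file carries comparable schema information in its footer, so the feature requested here would amount to a similar mapping driven by the Parquet schema instead of an .avsc file.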
      

          People

            Assignee: Ashish Singh (singhashish)
            Reporter: Pratik Khadloya (tispratik)
            Votes: 4
            Watchers: 9
