PARQUET-76: Hive cannot determine the list of columns automatically based on Parquet SerDe

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Today we are not able to create a Parquet-based Hive table without specifying the column names and types. When we try to define the table the following way, we get the error:
      "14/08/20 17:27:46 ERROR ql.Driver: FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified"

      CREATE TABLE parquet_test
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
        '/user/pratik/campaigns';
      
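      The only way this works today is to spell out the schema by hand. A minimal sketch of that workaround is below; the column names and types are hypothetical placeholders, since the real campaign schema lives only in the Parquet files themselves:

      -- Current workaround: declare every column explicitly in the DDL
      -- (the columns below are illustrative examples, not the actual schema)
      CREATE EXTERNAL TABLE parquet_test (
        campaign_id BIGINT,
        campaign_name STRING,
        start_date STRING
      )
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
        '/user/pratik/campaigns';
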

      In contrast, if we create a Hive table on top of Avro-based files, we do not need to specify the column names; Hive automatically figures out the schema through the SerDe.

      CREATE EXTERNAL TABLE campaigns
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION '/user/pratik/campaigns'
      TBLPROPERTIES ('avro.schema.url'='hdfs:///user/pratik/campaigns.avsc');
      

            People

            • Assignee: Ashish Singh (singhashish)
            • Reporter: Pratik Khadloya (tispratik)
            • Votes: 5
            • Watchers: 10
