Uploaded image for project: 'Parquet'
  1. Parquet
  2. PARQUET-47

SERDE backed schema for parquet storage in Hive

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • parquet-mr
    • None

    Description

      As of now, for a hive table stored as parquet, the schema can only be specified in Hive MetaStore. For our use-case, it is desired that the schema be provided by Thrift SerDe rather than MetaStore. Using thrift IDL as a schema provider, allows us to maintain a consistent schema across executions engines other than Hive such as Pig and Native MR.

      Additionally, for a large sparse schema, it is much easier to build thrift objects, and use parquet-thrift/elephant-bird to convert them into columns/tuples rather than constructing the whole big tuple itself.

      Attachments

        Activity

          People

            singhashish Ashish Singh
            abhishek.agarwal Abhishek Agarwal
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: