Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
As of now, for a Hive table stored as Parquet, the schema can only be specified in the Hive MetaStore. For our use case, we would like the schema to be provided by the Thrift SerDe rather than the MetaStore. Using Thrift IDL as the schema provider allows us to maintain a consistent schema across execution engines other than Hive, such as Pig and native MapReduce.
Additionally, for a large, sparse schema, it is much easier to build Thrift objects and use parquet-thrift/elephant-bird to convert them into columns/tuples than to construct the whole large tuple by hand.
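To illustrate the sparse-schema point, here is a minimal sketch in plain Java. It does not use the Thrift, parquet-thrift, or elephant-bird APIs. `SparseRecord`, `SCHEMA`, and `toTuple` are hypothetical names that stand in for a Thrift-generated object, the field order of its struct, and the conversion step those libraries would perform. The caller sets only the fields it has, and the full positional tuple is derived from the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparseRecordSketch {
    // Hypothetical column order, standing in for the field order of a Thrift struct.
    static final List<String> SCHEMA =
            Arrays.asList("user_id", "country", "device", "referrer", "score");

    // Stand-in for a Thrift-generated object: only the fields that are set are stored.
    static final class SparseRecord {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        // Sets one named field and returns this record so calls can be chained.
        SparseRecord set(String name, Object value) {
            if (!SCHEMA.contains(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            fields.put(name, value);
            return this;
        }
    }

    // Expands a sparse record into a full positional tuple, with null for unset fields.
    static List<Object> toTuple(SparseRecord record) {
        List<Object> tuple = new ArrayList<>(SCHEMA.size());
        for (String column : SCHEMA) {
            tuple.add(record.fields.get(column));
        }
        return tuple;
    }

    public static void main(String[] args) {
        // Only two of the five fields are populated by the caller.
        SparseRecord record = new SparseRecord().set("user_id", 42L).set("score", 0.9);
        System.out.println(toTuple(record));
    }
}
```

With a real schema of hundreds of mostly empty columns, the caller-side code stays the same size as above, whereas building the tuple directly would require a placeholder for every unset position.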