Details
Type: New Feature
Status: Reopened
Priority: Not a Priority
Resolution: Unresolved
Description
The ORC file format is currently one of the most efficient storage formats on HDFS in terms of both storage footprint and query speed, and it is a well-supported standard.
This feature would receive an input stream, map its columns to the columns of a Hive table, and write the data to HDFS in ORC format. It would need to support Hive bucketing and dynamic Hive partitioning, and to generate the appropriate metadata in the Hive database.
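To make the requested flow more concrete, the following is a minimal sketch of the routing step only: renaming input fields to Hive column names, deriving a dynamic partition directory, and assigning a bucket. Everything named here is an assumption made for illustration (the table layout, the field mapping, the bucket count, and the helper functions). The bucket hash is a CRC32 stand-in and not Hive's own hash function, and the sketch does not write ORC files or touch Hive metadata; a real implementation would hand each group to an ORC writer.

```python
import zlib
from collections import defaultdict

# Hypothetical table layout; all names are illustrative only.
TABLE_COLUMNS = ["user_id", "event"]   # data columns, in Hive column order
PARTITION_COLUMNS = ["dt"]             # dynamic partition columns
BUCKET_COLUMN = "user_id"
NUM_BUCKETS = 4

# Hypothetical mapping from input field names to Hive column names.
FIELD_TO_COLUMN = {"uid": "user_id", "type": "event", "date": "dt"}


def map_record(record):
    """Rename input fields to Hive column names, dropping unmapped fields."""
    return {FIELD_TO_COLUMN[k]: v for k, v in record.items() if k in FIELD_TO_COLUMN}


def partition_path(row):
    """Build a Hive-style partition directory such as 'dt=2016-01-01'."""
    return "/".join(f"{col}={row[col]}" for col in PARTITION_COLUMNS)


def bucket_id(row):
    """Assign a bucket with a stand-in hash (CRC32), not Hive's hash function."""
    value = str(row[BUCKET_COLUMN]).encode("utf-8")
    return zlib.crc32(value) % NUM_BUCKETS


def route(records):
    """Group rows by (partition directory, bucket); each group maps to one output file."""
    groups = defaultdict(list)
    for record in records:
        row = map_record(record)
        key = (partition_path(row), bucket_id(row))
        groups[key].append([row[col] for col in TABLE_COLUMNS])
    return dict(groups)
```

For example, calling `route` on a list of dictionaries with `uid`, `type`, and `date` fields returns one list of rows per partition directory and bucket, which is the unit a writer would turn into a single ORC file.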