Details
New Feature
Status: Open
Major
Resolution: Unresolved
0.4.0
None
None
Description
Hive uses a hash partitioner to distribute keys to reducers, and so creates hash-bucketed tables/partitions. In some cases, range partitioning would help further query processing, such as joins and filters.
Terasort (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html) appears to implement a sampling-based range partitioner; Hive could reuse it or implement something similar.
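
For illustration, a minimal sketch of what a sampling-based range partitioner might look like. This is not Terasort's or Hive's actual code: the class name `SampledRangePartitioner`, its constructor, and `getPartition` are hypothetical, keys are assumed to be plain strings, and the sample is assumed to be already collected and non-empty. The idea is to sort the sample, pick evenly spaced split points, and route each key to the reducer whose range contains it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a sampling-based range partitioner (not a Hive or Hadoop API). */
final class SampledRangePartitioner {
    // Sorted boundaries between reducers; length is numReducers - 1.
    private final String[] splitPoints;

    SampledRangePartitioner(List<String> sample, int numReducers) {
        if (numReducers < 1 || sample.isEmpty()) {
            throw new IllegalArgumentException("need at least one reducer and a non-empty sample");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        splitPoints = new String[numReducers - 1];
        // Pick evenly spaced quantiles of the sorted sample as range boundaries.
        for (int i = 1; i < numReducers; i++) {
            int idx = (int) ((long) i * sorted.size() / numReducers);
            splitPoints[i - 1] = sorted.get(idx);
        }
    }

    /** Returns a reducer index in [0, numReducers); smaller keys never get a larger index. */
    int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: a key equal to a boundary goes to the upper range.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

Unlike a hash partitioner, this keeps the reducers' outputs globally ordered by key range, which is the property that could help range filters and joins. How well the load balances depends on how representative the sample is.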