Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
This is a follow-up to HIVE-3433.
Had an offline discussion with Sambavi. She pointed out a scenario where the
implementation in HIVE-3433 will not scale. Assume that the user is performing
a cube on many columns, say 8 columns. Each input row would then generate
2^8 = 256 rows for the hash table, which may kill the current group by implementation.
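To make the 256 figure concrete, here is a minimal sketch of the per-row expansion, assuming each grouping column is either kept or replaced by a placeholder (None here) meaning "all values". The function name `cube_expand` is hypothetical and is not taken from the Hive code base.

```python
from itertools import product

def cube_expand(keys):
    """Return every CUBE variant of one row's grouping keys.

    Each key is either kept or replaced by None ("all values"),
    so n keys yield 2**n variants.
    """
    choices = [(k, None) for k in keys]
    return list(product(*choices))

print(len(cube_expand(("a", "b", "c"))))      # 8 variants for 3 columns
print(len(cube_expand(tuple("abcdefgh"))))    # 256 variants for 8 columns
```

Every input row pays this expansion cost before any aggregation happens, which is why the hash table comes under pressure as the column count grows.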
A better implementation would be to add an additional MR job. In the first
MR job, perform the group by as if there were no cube. In the second MR job,
perform the cube. The assumption is that the group by would have
decreased the output data significantly, and that the rows would arrive in the order of
the grouping keys, which gives a higher probability of hitting the hash table.
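The two-job idea can be sketched roughly as follows. This is an in-memory illustration only: it assumes a count(*) aggregate, uses `Counter` in place of real MR jobs, and all function names are hypothetical rather than Hive's.

```python
from collections import Counter
from itertools import product

def cube_variants(keys):
    """Every CUBE variant of a key tuple: each key kept or replaced by None."""
    return product(*[(k, None) for k in keys])

def single_stage_cube(rows):
    """Expand every input row before aggregating (the approach that does not scale)."""
    counts = Counter()
    for keys in rows:
        for variant in cube_variants(keys):
            counts[variant] += 1
    return counts

def two_stage_cube(rows):
    """Stage 1: plain group by. Stage 2: expand only the pre-aggregated rows."""
    grouped = Counter(rows)                  # stage 1: no cube expansion
    counts = Counter()
    for keys, n in sorted(grouped.items()):  # stage 2: rows arrive in key order
        for variant in cube_variants(keys):
            counts[variant] += n
    return counts, len(grouped)

rows = [("us", "web")] * 1000 + [("us", "app")] * 500 + [("de", "web")] * 250
result, stage2_inputs = two_stage_cube(rows)
assert result == single_stage_cube(rows)
print(stage2_inputs)         # 3 distinct key tuples reach the cube stage
print(result[(None, None)])  # 1750
```

In this toy input the cube stage expands 3 rows instead of 1750, while producing the same counts. The saving depends on how much the first group by actually reduces the data.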
Attachments
Issue Links
- depends upon: HIVE-3433 Implement CUBE and ROLLUP operators in Hive (Closed)