HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Query Processor
    • Labels: None

      Description

      This is a follow up for HIVE-3433.

      Had an offline discussion with Sambavi, who pointed out a scenario where the
      implementation in HIVE-3433 will not scale. Assume that the user is performing
      a cube on many columns, say 8 columns. Each input row would then generate
      2^8 = 256 rows for the hash table, which may kill the current group by implementation.
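
      To make the 256-row figure concrete, here is a minimal Python sketch of the
      per-row expansion a cube implies. It is an illustration only, not Hive code:
      the function names and the use of None for an excluded key are assumptions
      made for this example.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of `columns`, i.e. the grouping sets a cube implies."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def expand_row(row, columns):
    """Emit one copy of `row` per grouping set; excluded keys become None."""
    return [
        tuple(row[c] if c in subset else None for c in columns)
        for subset in cube_grouping_sets(columns)
    ]


columns = [f"c{i}" for i in range(8)]
row = {c: i for i, c in enumerate(columns)}
expanded = expand_row(row, columns)
print(len(expanded))  # 256, one row per subset of the 8 columns
```

      Every input row fans out this way before aggregation, so the hash table sees
      up to 256 times as many rows as the input contains.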

      A better implementation would be to add an additional MR job. The first
      MR job performs the group by as if there were no cube; the second MR job
      performs the cube on its output. The assumption is that the group by will have
      reduced the data significantly, and that the rows will arrive in the order of the
      grouping keys, which gives a higher probability of hitting the hash table.
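
      The two-job idea can be sketched in Python as follows. This is a toy model
      under stated assumptions, not the patch itself: it uses COUNT(*) as the
      aggregate (one that can be merged from partial results), models each job as
      an in-memory function, and uses None for a key excluded from a grouping set.
      All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def grouping_sets(n):
    """All subsets of key positions 0..n-1 (the grouping sets of a cube)."""
    return [s for k in range(n + 1) for s in combinations(range(n), k)]


def cube_counts(keyed_counts, n):
    """Apply the cube to (key, count) pairs; return (result, rows emitted)."""
    result, emitted = Counter(), 0
    for key, count in keyed_counts:
        for subset in grouping_sets(n):
            masked = tuple(v if i in subset else None for i, v in enumerate(key))
            result[masked] += count
            emitted += 1
    return result, emitted


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("a", "x"), ("b", "y"), ("b", "y")]

# Single job: every raw row is expanded over all grouping sets.
direct, direct_emitted = cube_counts(((r, 1) for r in rows), 2)

# Two jobs: a plain group by first, then the cube over its smaller output.
stage1 = Counter(rows)
two_stage, two_stage_emitted = cube_counts(stage1.items(), 2)

assert direct == two_stage
print(direct_emitted, two_stage_emitted)  # 24 12
```

      Both paths produce the same counts, but the second expands only the distinct
      key combinations, so the saving grows with the amount of duplication in the
      grouping keys.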

        Attachments

        1. hive.3552.9.patch
          226 kB
          Namit Jain
        2. hive.3552.8.patch
          226 kB
          Namit Jain
        3. hive.3552.7.patch
          221 kB
          Namit Jain
        4. hive.3552.6.patch
          221 kB
          Namit Jain
        5. hive.3552.5.patch
          221 kB
          Namit Jain
        6. hive.3552.4.patch
          219 kB
          Namit Jain
        7. hive.3552.3.patch
          219 kB
          Namit Jain
        8. hive.3552.2.patch
          179 kB
          Namit Jain
        9. hive.3552.12.patch
          226 kB
          Namit Jain
        10. hive.3552.11.patch
          226 kB
          Namit Jain
        11. hive.3552.10.patch
          226 kB
          Namit Jain
        12. hive.3552.1.patch
          180 kB
          Namit Jain

              People

              • Assignee: Namit Jain (namit)
              • Reporter: Namit Jain (namit)
              • Votes: 0
              • Watchers: 9
