Hive
  1. Hive
  2. HIVE-2397

Support with rollup option for group by

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None

      Description

      We should support the ROLLUP operator similar to the way MySQL is implemented.

      Exerted from MySQL documents:

      mysql> SELECT year, country, product, SUM(profit)
      -> FROM sales
      -> GROUP BY year, country, product WITH ROLLUP;
      ----------------------------------+

      year country product SUM(profit)

      ----------------------------------+

      2000 Finland Computer 1500
      2000 Finland Phone 100
      2000 Finland NULL 1600
      2000 India Calculator 150
      2000 India Computer 1200
      2000 India NULL 1350
      2000 USA Calculator 75
      2000 USA Computer 1500
      2000 USA NULL 1575
      2000 NULL NULL 4525
      2001 Finland Phone 10
      2001 Finland NULL 10
      2001 USA Calculator 50
      2001 USA Computer 2700
      2001 USA TV 250
      2001 USA NULL 3000
      2001 NULL NULL 3010
      NULL NULL NULL 7535

      ----------------------------------+

      http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html

      1. HIVE-2397.2.patch.txt
        705 kB
        Kevin Wilfong
      2. HIVE-2397.3.patch.txt
        706 kB
        Kevin Wilfong
      3. HIVE-2397.4.patch.txt
        723 kB
        Kevin Wilfong
      4. HIVE-2397.5.patch.txt
        723 kB
        Kevin Wilfong

        Issue Links

          Activity

          Kevin Wilfong created issue -
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          Review request for hive, Ning Zhang and Siying Dong.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- Review request for hive, Ning Zhang and Siying Dong. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          (Updated 2011-08-25 19:23:58.241135)

          Review request for hive, Ning Zhang and Siying Dong.

          Changes
          -------

          Ning pointed out to me that the hash based aggregation was being done in the mapper when map aggregation was turned off.

          I fixed this, so now it is being done in the reducer.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs (updated)


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- (Updated 2011-08-25 19:23:58.241135) Review request for hive, Ning Zhang and Siying Dong. Changes ------- Ning pointed out to me that the hash based aggregation was being done in the mapper when map aggregation was turned off. I fixed this, so now it is being done in the reducer. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs (updated) trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Kevin Wilfong made changes -
          Field Original Value New Value
          Attachment HIVE-2397.2.patch.txt [ 12491681 ]
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          (Updated 2011-08-25 22:30:34.960572)

          Review request for hive, Ning Zhang and Siying Dong.

          Changes
          -------

          Fixes a bug.

          Previously, if hash aggregation wasn't reducing the number of entries, the aggregation was switched to sort based. However, if rollup is being used, hash aggregation needs to be kept on regardless because only hash aggregation can handle distincts.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs (updated)


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- (Updated 2011-08-25 22:30:34.960572) Review request for hive, Ning Zhang and Siying Dong. Changes ------- Fixes a bug. Previously, if hash aggregation wasn't reducing the number of entries, the aggregation was switched to sort based. However, if rollup is being used, hash aggregation needs to be kept on regardless because only hash aggregation can handle distincts. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs (updated) trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Kevin Wilfong made changes -
          Attachment HIVE-2397.3.patch.txt [ 12491699 ]
          Hide
          Kevin Wilfong added a comment -

          I create a Design Doc in the Hive Wiki describing the changes to the query plan produced by this change. https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup

          Show
          Kevin Wilfong added a comment - I create a Design Doc in the Hive Wiki describing the changes to the query plan produced by this change. https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          (Updated 2011-09-07 01:34:25.870854)

          Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- (Updated 2011-09-07 01:34:25.870854) Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1160895 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1160895 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1160895 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Kevin Wilfong made changes -
          Attachment HIVE-2397.4.patch.txt [ 12493516 ]
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          (Updated 2011-09-07 21:47:48.620408)

          Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong.

          Changes
          -------

          Two changes:

          1) Added groupby_withrollup.q.out, but forgot to add groupby_withrollup.q

          2) I modified the method modifyAggregations in GroupbyOperator to take a list of AggregationBuffer[]'s. This makes it so that we only need to evaluate each row of input once, and then we can update all the rollup aggregations at once, instead of reevaluating the same row of input for each rollup row.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs (updated)


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1166376
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1166376
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1166376
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1166376
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1166376
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1166376
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1166376

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- (Updated 2011-09-07 21:47:48.620408) Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong. Changes ------- Two changes: 1) Added groupby_withrollup.q.out, but forgot to add groupby_withrollup.q 2) I modified the method modifyAggregations in GroupbyOperator to take a list of AggregationBuffer[]'s. This makes it so that we only need to evaluate each row of input once, and then we can update all the rollup aggregations at once, instead of reevaluating the same row of input for each rollup row. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs (updated) trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1166376 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1166376 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1166376 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1166376 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1166376 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1166376 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1166376 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1637/
          -----------------------------------------------------------

          (Updated 2011-10-14 21:14:02.700645)

          Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong.

          Changes
          -------

          It's been a while, so I ran svn up.

          Summary
          -------

          If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task.

          I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed.

          If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation.

          If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation.

          If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set.

          I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was.

          Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this.

          This addresses bug HIVE-2397.
          https://issues.apache.org/jira/browse/HIVE-2397

          Diffs (updated)


          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1183502
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1183502
          trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1183502
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1183502
          trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1183502
          trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1183502
          trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION
          trunk/ql/src/test/queries/clientpositive/groupby_withrollup.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION
          trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1183502

          Diff: https://reviews.apache.org/r/1637/diff

          Testing
          -------

          For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant)

          I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set.

          I also ran the original group by tests to verify I did not break anything.

          Thanks,

          Kevin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1637/ ----------------------------------------------------------- (Updated 2011-10-14 21:14:02.700645) Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong. Changes ------- It's been a while, so I ran svn up. Summary ------- If a user runs a query that includes group by ... with rollup, the behavior is like that of MySQL, see the task. I had to implement 4 different ways of providing this behavior to fit in with the 4 different ways of implementing group by depending on whether map aggregation is allowed, and whether the data is known to be skewed. If map aggregation is allowed, it is a simple matter of adding new keys with an increasing number of NULLs to the hash map to collect the data for the new rows as part of the map side hash aggregation. If map aggregation is not allowed and the data is not skewed, I perform a reduce job which performs a hash aggregation very similar to the way it is performed on the map side. I then perform a mergepartial reduce job to perform a final aggregation on the hash aggregation. If map aggregation is not allowed and the data is skewed and there are no distinct aggregations for the group by, I aggregate data for the new rows with NULLs as part of the non-hash aggregation. This was as simple as adding the new functionality to the map side hash aggregation. If map aggregation is not allowed and the data is skewed and there are distinct aggregations fro the group by, I perform a reduce job which performs a hash aggregation, and then use the same implementation that is used when there is no rollup option set. I have done my best not to detract from any optimizations that were made for each of the four different implementations of group by, but, r for the ones where I add a new reduce job, I am not sure how successful I was. Currently, the optimizations for multiple group bys is not supported for queries with the rollup option set, but I am continuing to look into this. This addresses bug HIVE-2397 . https://issues.apache.org/jira/browse/HIVE-2397 Diffs (updated) trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1183502 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1183502 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1183502 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1183502 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1183502 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1183502 trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/groupby_withrollup.q PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out PRE-CREATION trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1183502 Diff: https://reviews.apache.org/r/1637/diff Testing ------- For each existing group by test that looked like it might be relevant, I added a corresponding test with the rollup option set. (I had found a number of bugs as I was implementing this with these tests, so I am satisfied that they are indeed relevant) I also added a new test which provides good coverage of having multiple group by keys with both distinct and non-distinct aggregations in each of the four different implementations with the rollup operator set. I also ran the original group by tests to verify I did not break anything. Thanks, Kevin
          Kevin Wilfong made changes -
          Attachment HIVE-2397.5.patch.txt [ 12499099 ]
          Namit Jain made changes -
          Assignee Kevin Wilfong [ kevinwilfong ] Namit Jain [ namit ]
          Namit Jain made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Namit Jain added a comment -

          duplicate of HIVE-3433

          Show
          Namit Jain added a comment - duplicate of HIVE-3433
          Namit Jain made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Namit Jain made changes -
          Link This issue is cloned as HIVE-3433 [ HIVE-3433 ]
          Ashutosh Chauhan made changes -
          Fix Version/s 0.10.0 [ 12320745 ]

            People

            • Assignee:
              Namit Jain
              Reporter:
              Kevin Wilfong
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development