Hive / HIVE-2329

Not using map aggregation, fails to execute group-by after cluster-by with same key

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      With hive.map.aggr=false, the query

      select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1

      fails with:

      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

      The Hadoop logs show:

      Caused by: java.lang.RuntimeException: cannot find field key from []
      at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
      at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
      at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
      at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
      ........

      I think the problem is caused by ReduceSinkDeDuplication removing the RS (ReduceSink) that was providing rs.key for the GBY (GroupBy) operation. If the child of the child RS is a GBY, we should bypass the optimization.
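
The guard proposed above can be sketched on a toy operator tree. This is only an illustration under stated assumptions: `DedupGuardSketch`, `Op`, `Kind`, `feedsGroupBy` and `mayRemove` are hypothetical names, not Hive classes or the committed patch, the plan shapes are simplified guesses, and "child RS" is read here as the downstream of the two ReduceSinks.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy operator tree; these are NOT Hive's real operator classes. */
final class DedupGuardSketch {

    /** Hypothetical operator kinds, named after the abbreviations in the report. */
    enum Kind { TABLE_SCAN, REDUCE_SINK, EXTRACT, SELECT, GROUP_BY, FILE_SINK }

    static final class Op {
        final Kind kind;
        final List<Op> children = new ArrayList<>();

        Op(Kind kind) {
            this.kind = kind;
        }

        /** Adds a child of the given kind and returns it, so a plan reads as a chain. */
        Op then(Kind childKind) {
            Op child = new Op(childKind);
            children.add(child);
            return child;
        }
    }

    /** True when the only operator directly below this ReduceSink is a GroupBy. */
    static boolean feedsGroupBy(Op reduceSink) {
        return reduceSink.children.size() == 1
                && reduceSink.children.get(0).kind == Kind.GROUP_BY;
    }

    /** The proposed guard: never drop a ReduceSink whose keys a GroupBy reads. */
    static boolean mayRemove(Op childReduceSink) {
        return !feedsGroupBy(childReduceSink);
    }

    public static void main(String[] args) {
        // Assumed shape of the failing plan: CLUSTER BY sink, then GROUP BY sink.
        Op scan = new Op(Kind.TABLE_SCAN);
        Op groupBySink = scan.then(Kind.REDUCE_SINK).then(Kind.EXTRACT)
                .then(Kind.SELECT).then(Kind.REDUCE_SINK);
        groupBySink.then(Kind.GROUP_BY).then(Kind.FILE_SINK);

        // Contrast case: the second sink feeds an Extract, not a GroupBy.
        Op scan2 = new Op(Kind.TABLE_SCAN);
        Op plainSink = scan2.then(Kind.REDUCE_SINK).then(Kind.EXTRACT)
                .then(Kind.REDUCE_SINK);
        plainSink.then(Kind.EXTRACT).then(Kind.FILE_SINK);

        System.out.println("group-by sink removable: " + mayRemove(groupBySink)); // false
        System.out.println("plain sink removable: " + mayRemove(plainSink));      // true
    }
}
```

In this model the deduplication pass would consult `mayRemove` before dropping the second ReduceSink. When that sink feeds a GroupBy, the sink stays in place and the GroupBy can still resolve its key columns.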

          Activity

          Navis added a comment -

          simple workaround

          John Sichi added a comment -

          Good catch. Can you submit an updated patch that includes a testcase (.q and .q.out files), and then create a Review Board entry?

          https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1313/
          -----------------------------------------------------------

          Review request for hive.

          Summary
          -------

          If map aggregation is set to false, DISTRIBUTE BY followed by GROUP BY with the same key fails at runtime. The ReduceSinkDeDuplication optimization should be avoided if the child of the child RS is a GBY.

          This addresses bug HIVE-2329.
          https://issues.apache.org/jira/browse/HIVE-2329

          Diffs


          ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION
          ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5

          Diff: https://reviews.apache.org/r/1313/diff

          Testing
          -------

          Thanks,

          Navis

          Carl Steinbach added a comment -

          Please change the status to Patch Available if this is ready for review. Thanks.

          Navis added a comment -

          rebased to trunk

          jiraposter@reviews.apache.org added a comment -

          (Updated 2011-12-07 08:15:06.858543)

          Review request for hive, John Sichi and Carl Steinbach.

          Changes
          -------

          rebased to trunk

          Summary
          -------

          If map aggregation is set to false, DISTRIBUTE BY followed by GROUP BY with the same key fails at runtime. The ReduceSinkDeDuplication optimization should be avoided if the child of the child RS is a GBY.

          This addresses bug HIVE-2329.
          https://issues.apache.org/jira/browse/HIVE-2329

          Diffs (updated)


          ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5
          ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION
          ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1313/diff

          Testing (updated)
          -------

          test added : reduce_deduplicate_exclude_gby.q

          Thanks,

          Navis

          Phabricator added a comment -

          njain requested code review of "HIVE-2329 [jira] Not using map aggregation, fails to execute group-by after cluster-by with same key".
          Reviewers: JIRA

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D657

          AFFECTED FILES
          ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out
          ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java

          Namit Jain added a comment -

          +1

          Namit Jain added a comment -

          Committed. Thanks Navis

          Hudson added a comment -

          Integrated in Hive-trunk-h0.23.0 #12 (See https://builds.apache.org/job/Hive-trunk-h0.23.0/12/)
          HIVE-2329 Not using map aggregation, fails to execute group-by after
          cluster-by with same key (Navis via namit)

          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212551
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          • /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
          • /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1137 (See https://builds.apache.org/job/Hive-trunk-h0.21/1137/)
          HIVE-2329 Not using map aggregation, fails to execute group-by after
          cluster-by with same key (Navis via namit)

          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212551
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          • /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
          • /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out
          Phabricator added a comment -

          test123 has commented on the revision "HIVE-2329 [jira] Not using map aggregation, fails to execute group-by after cluster-by with same key".

          REVISION DETAIL
          https://reviews.facebook.net/D657

          Ashutosh Chauhan added a comment -

          This issue is closed now. The fix was released in 0.9.0. If there is a problem, please open a new JIRA issue and link it to this one.


            People

            • Assignee: Navis
            • Reporter: Navis