Hive
  1. Hive
  2. HIVE-2732

Reduce Sink deduplication fails if the child reduce sink is followed by a join

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      set hive.optimize.reducededuplication=true;
      set hive.auto.convert.join=true;
      explain select * from (select * from src distribute by key sort by key) a join src b on a.key = b.key;

      fails with the following exception

      java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.SelectOperator cannot be cast to org.apache.hadoop.hive.ql.exec.ReduceSinkOperator
      at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:313)
      at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:226)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:174)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:287)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:68)
      at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:72)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7019)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7312)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
      at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

      If hive.auto.convert.join is set to false, it produces an incorrect plan where the two halves of the join are processed in two separate map reduce tasks, and the reducers of these two tasks both contain the join operator resulting in an exception.

        Issue Links

          Activity

          Hide
          Navis added a comment -

          Join/MapJoins also should not be deduplicated.

          Show
          Navis added a comment - Join/MapJoins also should not be deduplicated.
          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-2732 [jira] Reduce Sink deduplication fails if the child reduce sink is followed by a join".
          Reviewers: JIRA

          DPAL-854 Reduce Sink deduplication fails if the child reduce sink is followed by a join

          set hive.optimize.reducededuplication=true;
          set hive.auto.convert.join=true;
          explain select * from (select * from src distribute by key sort by key) a join src b on a.key = b.key;

          fails with the following exception

          java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.SelectOperator cannot be cast to org.apache.hadoop.hive.ql.exec.ReduceSinkOperator
          at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:313)
          at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:226)
          at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:174)
          at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:287)
          at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
          at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
          at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
          at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:68)
          at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:72)
          at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7019)
          at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7312)
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
          at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
          at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
          at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
          at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
          at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
          at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
          at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

          If hive.auto.convert.join is set to false, it produces an incorrect plan where the two halves of the join are processed in two separate map reduce tasks, and the reducers of these two tasks both contain the join operator resulting in an exception.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D1809

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
          ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/3855/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          Phabricator added a comment - navis requested code review of " HIVE-2732 [jira] Reduce Sink deduplication fails if the child reduce sink is followed by a join". Reviewers: JIRA DPAL-854 Reduce Sink deduplication fails if the child reduce sink is followed by a join set hive.optimize.reducededuplication=true; set hive.auto.convert.join=true; explain select * from (select * from src distribute by key sort by key) a join src b on a.key = b.key; fails with the following exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.SelectOperator cannot be cast to org.apache.hadoop.hive.ql.exec.ReduceSinkOperator at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:313) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:226) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:174) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:287) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:68) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:72) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7019) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7312) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) If hive.auto.convert.join is set to false, it produces an incorrect plan where the two halves of the join are processed in two separate map reduce tasks, and the reducers of these two tasks both contain the join operator resulting in an exception. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D1809 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3855/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
          Hide
          Phabricator added a comment -

          navis updated the revision "HIVE-2732 [jira] Reduce Sink deduplication fails if the child reduce sink is followed by a join".
          Reviewers: JIRA

          1. Rebased & Fixed test case

          REVISION DETAIL
          https://reviews.facebook.net/D1809

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q
          ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out

          Show
          Phabricator added a comment - navis updated the revision " HIVE-2732 [jira] Reduce Sink deduplication fails if the child reduce sink is followed by a join". Reviewers: JIRA 1. Rebased & Fixed test case REVISION DETAIL https://reviews.facebook.net/D1809 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out
          Hide
          Namit Jain added a comment -

          +1

          Running tests

          Show
          Namit Jain added a comment - +1 Running tests
          Hide
          Namit Jain added a comment -

          Committed. Thanks Navis

          Show
          Namit Jain added a comment - Committed. Thanks Navis
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1433 (See https://builds.apache.org/job/Hive-trunk-h0.21/1433/)
          HIVE-2732 Reduce Sink deduplication fails if the child reduce sink is followed by a join
          (Navis via namit) (Revision 1338871)

          Result = FAILURE
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1338871
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          • /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q
          • /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-h0.21 #1433 (See https://builds.apache.org/job/Hive-trunk-h0.21/1433/ ) HIVE-2732 Reduce Sink deduplication fails if the child reduce sink is followed by a join (Navis via namit) (Revision 1338871) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1338871 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-2732 Reduce Sink deduplication fails if the child reduce sink is followed by a join
          (Navis via namit) (Revision 1338871)

          Result = ABORTED
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1338871
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
          • /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q
          • /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/ ) HIVE-2732 Reduce Sink deduplication fails if the child reduce sink is followed by a join (Navis via namit) (Revision 1338871) Result = ABORTED namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1338871 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_join.q /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_join.q.out
          Hide
          Ashutosh Chauhan added a comment -

          This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

          Show
          Ashutosh Chauhan added a comment - This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

            People

            • Assignee:
              Navis
              Reporter:
              Kevin Wilfong
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development