Hive
  1. Hive
  2. HIVE-2732

Reduce Sink deduplication fails if the child reduce sink is followed by a join

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      set hive.optimize.reducededuplication=true;
      set hive.auto.convert.join=true;
      explain select * from (select * from src distribute by key sort by key) a join src b on a.key = b.key;

      fails with the following exception

      java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.SelectOperator cannot be cast to org.apache.hadoop.hive.ql.exec.ReduceSinkOperator
      at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:313)
      at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:226)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:174)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:287)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
      at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
      at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:68)
      at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:72)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:7019)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7312)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
      at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

      If hive.auto.convert.join is set to false, it produces an incorrect plan where the two halves of the join are processed in two separate map reduce tasks, and the reducers of these two tasks both contain the join operator resulting in an exception.

        Issue Links

          Activity

          No work has yet been logged on this issue.

            People

            • Assignee:
              Navis
              Reporter:
              Kevin Wilfong
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development