Hive / HIVE-18270

count(distinct) with join and group by produces incorrect output when hive.auto.convert.join=false and hive.auto.convert.join.noconditionaltask=false

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1, 2.1.1, 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      When I run the following query with hive.auto.convert.join=false and hive.auto.convert.join.noconditionaltask=false:
      EXPLAIN
      SELECT foo.id, COUNT(DISTINCT foo.line_id) AS factor
      FROM foo JOIN bar ON (foo.id = bar.id)
      WHERE foo.orders != 'blah'
      GROUP BY foo.id;

      it fails with the following exception:
      java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
      at java.util.ArrayList.rangeCheck(ArrayList.java:635)
      at java.util.ArrayList.get(ArrayList.java:411)
      at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.merge(ReduceSinkDeDuplication.java:216)
      at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$JoinReducerProc.process(ReduceSinkDeDuplication.java:557)
      at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.process(ReduceSinkDeDuplication.java:166)
      at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
      at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
      at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication.transform(ReduceSinkDeDuplication.java:108)
      at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10201)
      at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
      at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
      at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
      at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

      This looks like a bug in the ReduceSinkDeDuplication optimizer.

      Because the count(distinct) columns have to be added to the reduce key for sorting, the ReduceSink of the group by cannot be replaced with the ReduceSink of the join.

      For a count(distinct) query, the ReduceSink of the group by should therefore not be merged.
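
      The reasoning above can be illustrated with a toy model. This is only a sketch and not Hive's actual code: the class ReduceSinkMergeSketch, the method canMerge, and the use of plain string lists for reduce keys are all hypothetical. It assumes the join's ReduceSink carries one key (id) and the group by's ReduceSink carries two (id plus the distinct column line_id), which is consistent with "Index: 1, Size: 1" in the trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: names and types here are hypothetical, not Hive's API.
class ReduceSinkMergeSketch {

    /**
     * Decides whether the child (group by) ReduceSink could reuse the
     * parent (join) ReduceSink. Keys are compared positionally.
     */
    static boolean canMerge(List<String> parentKeys,
                            List<String> childKeys,
                            boolean childHasDistinct) {
        // Rule suggested in the description: never merge when the
        // group by carries count(distinct) columns in its reduce key.
        if (childHasDistinct) {
            return false;
        }
        // Guard against reading past the end of the parent's key list.
        if (childKeys.size() > parentKeys.size()) {
            return false;
        }
        for (int i = 0; i < childKeys.size(); i++) {
            if (!childKeys.get(i).equals(parentKeys.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> joinKeys = new ArrayList<>(Arrays.asList("id"));
        List<String> groupKeys = new ArrayList<>(Arrays.asList("id", "line_id"));

        // Unguarded positional walk over the longer list: the second
        // lookup reads index 1 of a one-element list and throws.
        try {
            for (int i = 0; i < groupKeys.size(); i++) {
                joinKeys.get(i);
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded walk failed: " + e.getMessage());
        }

        // Guarded check refuses the merge instead of throwing.
        System.out.println("canMerge = " + canMerge(joinKeys, groupKeys, true));
    }
}
```

      Either guard alone is enough to avoid the exception in this toy model; the distinct check mirrors the rule proposed in the description, while the size check is an extra assumption added for illustration.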

      Attachments

        1. HIVE-18270.1.patch (2 kB, Zac Zhou)
        2. HIVE-18270.2.patch (8 kB, Zac Zhou)
        3. HIVE-18270.3.patch (7 kB, Zac Zhou)

          People

            Assignee: Zac Zhou (yuan_zac)
            Reporter: Zac Zhou (yuan_zac)
            Votes: 0
            Watchers: 1