Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-13282

GroupBy and select operator encounter ArrayIndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1, 2.0.0, 2.1.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      The group by and select operators run into the ArrayIndexOutOfBoundsException when they incorrectly initialize themselves with tag 0 but the incoming tag id is different.

      select count(*) from
      (select rt1.id from
      (select t1.key as id, t1.value as od from tab t1 group by key, value) rt1) vt1
      join
      (select rt2.id from
      (select t2.key as id, t2.value as od from tab_part t2 group by key, value) rt2) vt2
      where vt1.id=vt2.id;
      
      1. HIVE-13282.02.patch
        4 kB
        Sergey Shelukhin
      2. HIVE-13282.01.patch
        5 kB
        Matt McCline
      3. smb_fail_issue.patch
        0.6 kB
        Vikram Dixit K
      4. smb_groupby.q.out
        12 kB
        Matt McCline
      5. smb_groupby.q
        2 kB
        Matt McCline

        Activity

        Hide
        sershe Sergey Shelukhin added a comment -

        Attempting to rebase...

        Show
        sershe Sergey Shelukhin added a comment - Attempting to rebase...
        Hide
        mmccline Matt McCline added a comment -

        Probably not.

        Show
        mmccline Matt McCline added a comment - Probably not.
        Hide
        pxiong Pengcheng Xiong added a comment -

        Matt McCline, will this go into 2.2.0? Thanks

        Show
        pxiong Pengcheng Xiong added a comment - Matt McCline , will this go into 2.2.0? Thanks
        Hide
        vgumashta Vaibhav Gumashta added a comment -

        Removing target 1.2.2. Matt McCline feel free to add it back if you feel this is a must have in 1.2.2

        Show
        vgumashta Vaibhav Gumashta added a comment - Removing target 1.2.2. Matt McCline feel free to add it back if you feel this is a must have in 1.2.2
        Hide
        vgumashta Vaibhav Gumashta added a comment -

        Matt McCline Is it ok to target for 1.3 and remove target 1.2.2 (minor release)?

        Show
        vgumashta Vaibhav Gumashta added a comment - Matt McCline Is it ok to target for 1.3 and remove target 1.2.2 (minor release)?
        Hide
        mmccline Matt McCline added a comment -

        Vikram had worked on it. I took over. There were 2 issues I recall. One was the index out of bounds but when that was fixed another issue wrong results appeared. It had to do with handling of operator close when there are multiple parents. I wasn't able to work that one very well.

        Show
        mmccline Matt McCline added a comment - Vikram had worked on it. I took over. There were 2 issues I recall. One was the index out of bounds but when that was fixed another issue wrong results appeared. It had to do with handling of operator close when there are multiple parents. I wasn't able to work that one very well.
        Hide
        wzheng Wei Zheng added a comment - - edited

        Matt McCline I'm wondering about the status of this issue. I saw a similar backtrace

        Status: Failed
        Vertex failed, vertexName=Reducer 2, vertexId=vertex_1484780661793_0061_1_02, diagnostics=[Task failed, taskId=task_1484780661793_0061_1_02_000002, diagnostics=[TaskAttempt 0 failed, info=[Error:
                at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
                at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
                at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
                at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
                at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1668)
                at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
                at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
                at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
                at java.util.concurrent.FutureTask.run(FutureTask.java:262)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:745)
        Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":-2147185208},"value":null}
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
                at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
                ... 14 more
        Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":-2147185208},"value":null}
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
                ... 16 more
        Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"
                at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:419)
                at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:387)
                at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:212)
                at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
                at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1016)
                at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:821)
                at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
                at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
                ... 17 more
        Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"_col0":-2147270511},"value":null}
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
                at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:411)
                ... 25 more
        Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"_col0":-2147270511},"value":null}
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
                ... 26 more
        Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
                at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:708)
                at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
                ... 27 more
        
        Show
        wzheng Wei Zheng added a comment - - edited Matt McCline I'm wondering about the status of this issue. I saw a similar backtrace Status: Failed Vertex failed, vertexName=Reducer 2, vertexId=vertex_1484780661793_0061_1_02, diagnostics=[Task failed, taskId=task_1484780661793_0061_1_02_000002, diagnostics=[TaskAttempt 0 failed, info=[Error: at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1668) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang. Thread .run( Thread .java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) { "key" :{ "_col0" :-2147185208}, "value" : null } at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) { "key" :{ "_col0" :-2147185208}, "value" : null } at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) { "key" :{" at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:419) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:387) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:212) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1016) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:821) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695) at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 17 more Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) { "key" :{ "_col0" :-2147270511}, "value" : null } at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:411) ... 25 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) { "key" :{ "_col0" :-2147270511}, "value" : null } at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 26 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:708) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 27 more
        Hide
        mmccline Matt McCline added a comment -

        No, this patch is not needed for 2.1.1

        Show
        mmccline Matt McCline added a comment - No, this patch is not needed for 2.1.1
        Hide
        spena Sergio Peña added a comment -

        Vikram Dixit K Is this patch still needed for 2.1.1? We're looking to release an RC this or next week.

        Show
        spena Sergio Peña added a comment - Vikram Dixit K Is this patch still needed for 2.1.1? We're looking to release an RC this or next week.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Removing 2.1.0 target as issue is not tagged as Critical/Blocker and the RC will be created tomorrow. Please feel free to commit to branch-2.1 anyway and fix for 2.1.0 if this happens before the release.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Removing 2.1.0 target as issue is not tagged as Critical/Blocker and the RC will be created tomorrow. Please feel free to commit to branch-2.1 anyway and fix for 2.1.0 if this happens before the release.
        Hide
        mmccline Matt McCline added a comment -

        Work-In-Progress.

        Changed GroupByOperator.process to ignore the tag and just use 0. This is what was causing the ArrayIndexOutOfBoundsException. Is this the correct solution here?

        Also, experimented with deferring close on the SMBMapJoinOperator and CommonMergeJoinOperator (Tez) until all the parents have closed. This produces wrong results for the ORDER BY query (0 instead of 480) and the new GROUP BY query (0 instead of ???) in tez_smb_1.q

        Show
        mmccline Matt McCline added a comment - Work-In-Progress. Changed GroupByOperator.process to ignore the tag and just use 0. This is what was causing the ArrayIndexOutOfBoundsException. Is this the correct solution here? Also, experimented with deferring close on the SMBMapJoinOperator and CommonMergeJoinOperator (Tez) until all the parents have closed. This produces wrong results for the ORDER BY query (0 instead of 480) and the new GROUP BY query (0 instead of ???) in tez_smb_1.q
        Hide
        vikram.dixit Vikram Dixit K added a comment -

        Matt McCline Can you try the patch I have attached above? I think that is how I was able to repro the issue. I think with the patch attached here, it produces a result sometimes but that is wrong (if you switch around the tables, it may even throw an exception).

        Show
        vikram.dixit Vikram Dixit K added a comment - Matt McCline Can you try the patch I have attached above? I think that is how I was able to repro the issue. I think with the patch attached here, it produces a result sometimes but that is wrong (if you switch around the tables, it may even throw an exception).
        Hide
        mmccline Matt McCline added a comment -

        Vikram Dixit K I have attached a Q where I attempt to repro the problem but I can't seem to coax the Optimizer to do an Sorted Merge Bucket Map Join Operator....

        Show
        mmccline Matt McCline added a comment - Vikram Dixit K I have attached a Q where I attempt to repro the problem but I can't seem to coax the Optimizer to do an Sorted Merge Bucket Map Join Operator....
        Hide
        vikram.dixit Vikram Dixit K added a comment - - edited

        Yes. We can move this out to 2.1.0. This only happens in case of reduce side SMB in tez. We have a simple workaround right now that will address this (disable smb join in this case). The real fix would take a lot of refactoring the code which is more suited for master than a maintenance release.

        Show
        vikram.dixit Vikram Dixit K added a comment - - edited Yes. We can move this out to 2.1.0. This only happens in case of reduce side SMB in tez. We have a simple workaround right now that will address this (disable smb join in this case). The real fix would take a lot of refactoring the code which is more suited for master than a maintenance release.
        Hide
        sershe Sergey Shelukhin added a comment -

        This is targeting 2.0.1 but has no patch. Should it be moved out to 2.1.0?

        Show
        sershe Sergey Shelukhin added a comment - This is targeting 2.0.1 but has no patch. Should it be moved out to 2.1.0?

          People

          • Assignee:
            mmccline Matt McCline
            Reporter:
            vikram.dixit Vikram Dixit K
          • Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:

              Development