Hive / HIVE-23010

IllegalStateException in tez.ReduceRecordProcessor when containers are being reused

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • None
    • None
    • None

    Description

      When Hive executes a query that runs a file sink, a merge join, and two group-by operators in a single reduce vertex (Reducer 2 in simplified-explain.txt), the following exception occurs non-deterministically:

      java.lang.RuntimeException: java.lang.IllegalStateException: Was expecting dummy store operator but found: FS[17]
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
              at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
              at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
              at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
              at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
              at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
              at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
              at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.IllegalStateException: Was expecting dummy store operator but found: FS[17]
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:421)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.getJoinParentOp(ReduceRecordProcessor.java:425)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:148)
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
              ... 16 more
      

      According to the YARN logs, the IllegalStateException occurs in a container if and only if:

      • the container has previously run a task attempt of "Reducer 2" successfully, and
      • the container is then reused for another task attempt of the same "Reducer 2" vertex.
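
      The two conditions above can be checked mechanically once the task attempts that ran in each container have been extracted from the logs. The following is a minimal sketch of that check, not tooling used for this report. It assumes that Tez task attempt IDs have the form attempt_<timestamp>_<app>_<dag>_<vertex>_<task>_<attempt>; the class name, the method names, and the sample container and attempt IDs are hypothetical, and how the IDs are pulled out of the YARN logs is left open.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class ContainerReuseCheck {

    // Extracts the vertex field from a Tez task attempt ID. ASSUMPTION: the ID
    // has the form attempt_<timestamp>_<app>_<dag>_<vertex>_<task>_<attempt>.
    static String vertexOf(String attemptId) {
        String[] parts = attemptId.split("_");
        if (parts.length != 7 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt ID: " + attemptId);
        }
        return parts[4];
    }

    // Returns, in sorted order, the containers that ran two or more task
    // attempts belonging to the same vertex.
    static List<String> reusedForSameVertex(Map<String, List<String>> attemptsByContainer) {
        TreeSet<String> reused = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : attemptsByContainer.entrySet()) {
            Map<String, Integer> attemptsPerVertex = new HashMap<>();
            for (String attemptId : entry.getValue()) {
                int count = attemptsPerVertex.merge(vertexOf(attemptId), 1, Integer::sum);
                if (count > 1) {
                    reused.add(entry.getKey());
                }
            }
        }
        return List.copyOf(reused);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: container IDs and attempt IDs are made up.
        Map<String, List<String>> attemptsByContainer = new LinkedHashMap<>();
        attemptsByContainer.put("container_1", List.of(
                "attempt_1_0001_1_02_000003_0",   // vertex 02
                "attempt_1_0001_1_02_000007_0")); // vertex 02 again: same-vertex reuse
        attemptsByContainer.put("container_2", List.of(
                "attempt_1_0001_1_01_000000_0",   // vertex 01
                "attempt_1_0001_1_02_000001_0")); // vertex 02: reused, but for another vertex
        System.out.println(reusedForSameVertex(attemptsByContainer));
    }
}
```

      Containers returned by reusedForSameVertex are the candidates to compare against the containers whose logs contain the IllegalStateException. Which vertex index corresponds to "Reducer 2" depends on the DAG and is not assumed here.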

      The same query runs fine with tez.am.container.reuse.enabled=false.

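      The error therefore appears to occur deterministically inside any container that is reused for multiple task attempts of the same reduce vertex. It only looks non-deterministic at the query level because such reuse is not guaranteed to happen.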

      We have not been able to reproduce the error reliably, or with a smaller execution plan, because the probability that a container is reused for the same vertex is low.

      Attachments

        1. simplified-explain.txt
          2 kB
          Sebastian Klemke

          People

            Assignee: Unassigned
            Reporter: Sebastian Klemke (packet)
            Votes: 1
            Watchers: 5
