Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5199

Impala may hang on empty row batch exchange

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Distributed Exec
    • Labels:

      Description

      Hang reproduction:

      Start impala with the following stress timeouts (the same as those in test_exchange_delays):

      bin/start-impala-cluster.py --impalad_args=--stress_datastream_recvr_delay_ms=10000 --impalad_args=--datastream_sender_timeout_ms=5000
      

      Do a join where 0 rows will be exchanged:

      select count(*) from tpch.lineitem inner join tpch.orders on l_orderkey=o_orderkey where o_orderkey < 0
      

      When the data-stream-sender sends 0 rows, it sends it with params.row_batch.num_rows=0 and params.eos=true.

      The receiver hits the following case:
      https://github.com/apache/incubator-impala/blob/master/be/src/service/impala-server.cc#L1133
      It ultimately times out from FindRecvrOrWait() after datastream_sender_timeout_ms.
      https://github.com/apache/incubator-impala/blob/0d0c93ec8c4949940ec113192731f2adb66a0c5e/be/src/runtime/data-stream-mgr.cc#L123

      Due to stress_datastream_recvr_delay_ms, the exchange node is Prepare()'d even later.
      https://github.com/apache/incubator-impala/blob/a50c344077f6c9bbea3d3cbaa2e9146ba20ac9a9/be/src/exec/exchange-node.cc#L74

      Although this is a testing flag, this behavior is reasonable to expect in a real world setting under heavy stress, since the stress_datastream_recvr_delay_ms flag was added into testing after some users experienced nodes receiving data from DataStreamSenders and timing out after datastream_sender_timeout_ms, and before the ExchangeNodes were created. Thus, the CloseSender() function is called before the exchange node is created.

      After stress_datastream_recvr_delay_ms, the ExchangeNode is created and it calls CreateRecvr().
      https://github.com/apache/incubator-impala/blob/a50c344077f6c9bbea3d3cbaa2e9146ba20ac9a9/be/src/exec/exchange-node.cc#L80

      Then ExchangeNode::Open() calls DataStreamRecvr::CreateMerger() which makes a few threads wait on DataStreamRecvr::SenderQueue::GetBatch().
      https://github.com/apache/incubator-impala/blob/0d0c93ec8c4949940ec113192731f2adb66a0c5e/be/src/runtime/data-stream-recvr.cc#L124

      However, since the DataStreamSender has already received the timeout error from earlier, the receiver threads in GetBatch() will wait indefinitely until the query is cancelled by the user.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tarmstrong Tim Armstrong
                Reporter:
                sailesh Sailesh Mukil
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: