IMPALA-1915

query hung in buffered block mgr and cannot be cancelled


    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.2
    • Component/s: None
    • Labels:


      I hit this while running test_mem_usage_scaling in a loop. Two threads (the main fragment thread and a build-side thread) may be waiting for the same block, but only one of them will get it. Ippo had moved the exit check inside the wait loop in FindBuffer(), but the write callback wakes only one of the waiting threads. The other thread never wakes, so it never notices that the unpinned block list is exhausted or that it needs to initiate more writes (not all writes are initiated at once; there is a threshold).
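
      To make the missed wakeup concrete, here is a minimal standalone sketch. It is not the Impala code: ToyBlockMgr, StuckWaiters, and the wake_all flag are hypothetical names, std::condition_variable stands in for boost::condition_variable, and the real FindBuffer() also initiates further writes, which this model leaves out. Waking every waiter is shown only as one way to let each thread re-run the exit check; it is not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for the block manager; every name here is hypothetical.
struct ToyBlockMgr {
  std::mutex lock;
  std::condition_variable buffer_available;
  int free_buffers = 0;
  int outstanding_writes = 1;  // exactly one write is in flight
  int waiters = 0;             // threads currently inside FindBuffer()

  // Returns true if a buffer was obtained, false if the caller gave up
  // because no buffer is free and no write is left that could free one.
  bool FindBuffer() {
    std::unique_lock<std::mutex> l(lock);
    ++waiters;
    while (free_buffers == 0) {
      // Exit check inside the wait loop: it only runs when this thread wakes.
      if (outstanding_writes == 0) {
        --waiters;
        return false;
      }
      buffer_available.wait(l);
    }
    --free_buffers;
    --waiters;
    return true;
  }

  // Write-completion callback: frees one buffer, then wakes one or all waiters.
  void WriteComplete(bool wake_all) {
    {
      std::lock_guard<std::mutex> l(lock);
      --outstanding_writes;
      ++free_buffers;
    }
    if (wake_all) {
      buffer_available.notify_all();
    } else {
      buffer_available.notify_one();
    }
  }

  int Waiters() {
    std::lock_guard<std::mutex> l(lock);
    return waiters;
  }
};

// Runs two waiters against one completing write and returns how many are
// still blocked up to 200 ms after the callback fired.
int StuckWaiters(bool wake_all) {
  ToyBlockMgr mgr;
  std::thread t1([&mgr] { mgr.FindBuffer(); });
  std::thread t2([&mgr] { mgr.FindBuffer(); });
  // waiters only changes under the lock, so observing 2 here means both
  // threads have released the lock inside wait().
  while (mgr.Waiters() != 2) std::this_thread::yield();

  mgr.WriteComplete(wake_all);

  auto deadline =
      std::chrono::steady_clock::now() + std::chrono::milliseconds(200);
  while (mgr.Waiters() != 0 && std::chrono::steady_clock::now() < deadline) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  int stuck = mgr.Waiters();
  assert(stuck >= 0 && stuck <= 2);

  // Release any stuck waiter so both threads can be joined.
  mgr.buffer_available.notify_all();
  t1.join();
  t2.join();
  return stuck;
}
```

      With wake_all set to false, one waiter takes the only buffer and the other normally stays parked even though no write is outstanding, which matches the state seen in the hung fragment. With wake_all set to true, the second waiter wakes, sees that nothing can free a buffer, and returns. A spurious wakeup could in principle hide the hang in the first case, so treat the sketch as illustrative rather than as a regression test.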

      Since the query cannot be cancelled, this amounts to a leak of whatever resources the hung query is holding on to.

      Here's the stack for the hung fragment. I also checked that non_local_outstanding_writes_ is 0 while the thread is in this wait().

      Thread 2 (Thread 0x7fe9d2f4e700 (LWP 9125)):
      #0  0x0000003a69c0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000f086ae in boost::condition_variable::wait (this=0x9005c88, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:53
      #2  0x0000000000fa4241 in impala::BufferedBlockMgr::FindBuffer (this=0x9005c00, lock=..., buffer_desc=0x7fe9d2f4ca38) at /home/dhecht/src/Impala/be/src/runtime/buffered-block-mgr.cc:969
      #3  0x0000000000fa337a in impala::BufferedBlockMgr::FindBufferForBlock (this=0x9005c00, block=0x15ab4ea0, in_mem=0x7fe9d2f4ccbf) at /home/dhecht/src/Impala/be/src/runtime/buffered-block-mgr.cc:887
      #4  0x0000000000f9d5df in impala::BufferedBlockMgr::GetNewBlock (this=0x9005c00, client=0x1b5658f0, unpin_block=0x0, block=0x7fe9d2f4cec0, len=8388608) at /home/dhecht/src/Impala/be/src/runtime/buffered-\
      #5  0x00000000014ba2ad in impala::BufferedTupleStream::NewBlockForWrite (this=0x7ecef00, min_size=24, got_block=0x7fe9d2f4d0cf) at /home/dhecht/src/Impala/be/src/runtime/buffered-tuple-stream.cc:211
      #6  0x000000000159a004 in impala::BufferedTupleStream::AddRow (this=0x7ecef00, row=0x1cfc2ee0, dst=0x0) at /home/dhecht/src/Impala/be/src/runtime/buffered-tuple-stream.inline.h:29
      #7  0x00000000015a5393 in impala::PartitionedHashJoinNode::AppendRowStreamFull (this=0x1ee85800, stream=0x7ecef00, row=0x1cfc2ee0) at /home/dhecht/src/Impala/be/src/exec/partitioned-hash-join-node.cc:435
      #8  0x00007fe9e8544a61 in ?? ()
      #9  0x00007fe9d2f4d1c0 in ?? ()
      #10 0x0000000000ebf2ee in impala::ScopedTimer<impala::MonotonicStopWatch>::~ScopedTimer (this=0x7fe9d2f4d7d0, __in_chrg=<value optimized out>) at /home/dhecht/src/Impala/be/src/util/runtime-profile.h:732
      #11 0x00000000015a682a in impala::PartitionedHashJoinNode::ProcessBuildInput (this=0x1ee85800, state=0x808d800, level=0) at /home/dhecht/src/Impala/be/src/exec/partitioned-hash-join-node.cc:549
      #12 0x00000000015a5bc8 in impala::PartitionedHashJoinNode::ConstructBuildSide (this=0x1ee85800, state=0x808d800) at /home/dhecht/src/Impala/be/src/exec/partitioned-hash-join-node.cc:486
      #13 0x00000000015e3105 in impala::BlockingJoinNode::BuildSideThread (this=0x1ee85800, state=0x808d800, status=0x7fe9e15636d0) at /home/dhecht/src/Impala/be/src/exec/blocking-join-node.cc:133
      #14 0x00000000015e4d84 in boost::_mfi::mf2<void, impala::BlockingJoinNode, impala::RuntimeState*, impala::Promise<impala::Status>*>::operator() (this=0x16d6d710, p=0x1ee85800, a1=0x808d800, a2=0x7fe9e156\




            • Assignee:
              dhecht Dan Hecht


              • Created: