  1. IMPALA
  2. IMPALA-6362

Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.12.0
    • Impala 2.12.0
    • Backend

    Description

      Queries stopped making progress. Many of the fragment threads are trying to increase or decrease their memory reservation, and none of those threads is making progress.

      A quick analysis of the thread stacks did not turn up any thread making progress, so this might be a deadlock.
      cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b01006 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
      1312 impala::SpinLock::lock()
      1312 impala::ReservationTracker::IncreaseReservationInternalLocked(long,
      1312 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
      1312 base::SpinLock::SlowLock()
      1312 base::SpinLock::Lock()
      1311
      cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b017c6 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
      688 impala::ReservationTracker::DecreaseReservation(long,
      688 impala::ReservationTracker::DecreaseReservationLocked(long,
      400 impala::SpinLock::lock()
      400 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
      400 base::SpinLock::Lock()
      399
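
The two pipelines above count which functions appear in the four frames immediately above each occurrence of a given return address. For anyone who wants to repeat that tally on another stack dump, here is a minimal Python sketch of the same idea. The function name `count_context_functions` and the regular expression are illustrative assumptions, not part of Impala or of the original analysis. Unlike the `awk` version, it skips lines that are not stack frames instead of counting them as blanks.

```python
import re
from collections import Counter

# Matches gdb-style frames such as "#4  0x00000000015f7672 in impala::SpinLock::lock() ()".
# The captured group is the first whitespace-delimited token after "in", which is
# what awk's $4 prints for these lines.
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)")


def count_context_functions(lines, address, before=4):
    """Tally the function token of each frame that contains `address`,
    plus the `before` lines preceding it (similar to `grep -B`)."""
    selected = set()
    for i, line in enumerate(lines):
        if address in line:
            # Overlapping context windows are merged, so each line counts once.
            selected.update(range(max(0, i - before), i + 1))

    counts = Counter()
    for i in sorted(selected):
        match = FRAME_RE.match(lines[i])
        if match:  # non-frame lines (blank lines, thread headers) are ignored
            counts[match.group(1)] += 1
    return counts


def report(path, address, before=4):
    """Print the tally for a stack-dump file, most frequent function first."""
    with open(path, encoding="utf-8", errors="replace") as dump:
        lines = dump.read().splitlines()
    for name, n in count_context_functions(lines, address, before).most_common():
        print(f"{n:7d} {name}")
```

Calling `report("stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt", "0x0000000001b01006")` should give a tally comparable to the first listing, minus the blank-line count.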

      #0  0x0000000003bd6944 in sys_futex ()
      #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
      #3  0x00000000015f75fd in base::SpinLock::Lock() ()
      #4  0x00000000015f7672 in impala::SpinLock::lock() ()
      #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
      #6  0x0000000001b015bf in impala::ReservationTracker::DecreaseReservation(long, bool) ()
      #7  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
      #8  0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
      #9  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
      #10 0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
      #11 0x00000000018aabf0 in impala::ReservationTracker::DecreaseReservation(long) ()
      #12 0x00000000018aaaee in impala::InitialReservations::Return(impala::BufferPool::ClientHandle*, long) ()
      #13 0x0000000001b5e8e9 in impala::ExecNode::Close(impala::RuntimeState*) ()
      #14 0x000000000293ef2c in impala::BlockingJoinNode::Close(impala::RuntimeState*) ()
      #15 0x00000000028d639f in impala::PartitionedHashJoinNode::Close(impala::RuntimeState*) ()
      #16 0x00000000018a51aa in impala::FragmentInstanceState::Close() ()
      #17 0x00000000018a24b8 in impala::FragmentInstanceState::Exec() ()
      #18 0x000000000188afe6 in impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
      #19 0x0000000001889886 in impala::QueryState::StartFInstances()::{lambda()#1}::operator()() const ()
      #20 0x000000000188bc25 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::{lambda()#1}, void>::invoke(boost::detail::function::function_buffer&) ()
      
      #0  0x0000000003bd6944 in sys_futex ()
      #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
      #3  0x00000000015f75fd in base::SpinLock::Lock() ()
      #4  0x00000000015f7672 in impala::SpinLock::lock() ()
      #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
      #6  0x0000000001b01006 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
      #7  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
      #8  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
      #9  0x0000000001b006f5 in impala::ReservationTracker::IncreaseReservationToFit(long, impala::Status*) ()
      #10 0x0000000001af738e in impala::BufferPool::ClientHandle::IncreaseReservationToFit(long) ()
      #11 0x0000000002c66574 in impala::BufferedTupleStream::AdvanceWritePage(long, bool*) ()
      #12 0x0000000002c692d9 in impala::BufferedTupleStream::AddRowCustomBeginSlow(long, impala::Status*) ()
      #13 0x0000000002c69111 in impala::BufferedTupleStream::AddRowSlow(impala::TupleRow*, impala::Status*) ()
      #14 0x0000000002c69b5e in impala::BufferedTupleStream::AddRow(impala::TupleRow*, impala::Status*) ()
      #15 0x00007f059f628148 in impala::PhjBuilder::ProcessBuildBatch ()
      #16 0x000000000295e10c in impala::PhjBuilder::Send(impala::RuntimeState*, impala::RowBatch*) ()
      #17 0x0000000002941fda in impala::Status impala::BlockingJoinNode::SendBuildInputToSink<false>(impala::RuntimeState*, impala::DataSink*) ()
      #18 0x000000000293fe59 in impala::BlockingJoinNode::ProcessBuildInputAndOpenProbe(impala::RuntimeState*, impala::DataSink*) ()
      #19 0x00000000028d58cf in impala::PartitionedHashJoinNode::Open(impala::RuntimeState*) ()
      

            People

              tarmstrong Tim Armstrong
              mmokhtar Mostafa Mokhtar