IMPALA-5143

Crash while running/cancelling concurrent queries QueryExecState::ExecQueryOrDmlRequest query-exec-state.cc:469

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      While running concurrent queries and cancelling some of them, Impala crashed. Stack trace below.

      Mini dump attached.

      #0  0x00000036ed232625 in raise () from /lib64/libc.so.6
      #1  0x00000036ed233e05 in abort () from /lib64/libc.so.6
      #2  0x00007f5110f08a55 in os::abort(bool) () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #3  0x00007f5111088f87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #4  0x00007f5110f0d96f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x0000000000dea2c8 in NoBarrier_CompareAndSwap (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/gutil/atomicops-internals-x86.h:85
      #7  Acquire_CompareAndSwap (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/gutil/atomicops-internals-x86.h:138
      #8  Lock (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/gutil/spinlock.h:74
      #9  lock (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/spinlock.h:34
      #10 lock_guard (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/thread/lock_guard.hpp:38
      #11 impala::QueryState::GetFInstanceState (this=0x0, instance_id=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/query-state.cc:172
      #12 0x0000000000dcc14e in impala::Coordinator::Exec (this=0x7f489a730400) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/coordinator.cc:483
      #13 0x0000000000af97de in impala::ImpalaServer::QueryExecState::ExecQueryOrDmlRequest (this=0x7f447c350800, query_exec_request=Unhandled dwarf expression opcode 0xf3
      )
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/query-exec-state.cc:469
      #14 0x0000000000b00fd4 in impala::ImpalaServer::QueryExecState::Exec (this=0x7f447c350800, exec_request=0x7f507518db80)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/query-exec-state.cc:158
      #15 0x0000000000ab082d in impala::ImpalaServer::ExecuteInternal (this=0x9a86a00, query_ctx=..., session_state=Unhandled dwarf expression opcode 0xf3
      )
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/impala-server.cc:829
      #16 0x0000000000ab6688 in impala::ImpalaServer::Execute (this=0x9a86a00, query_ctx=0x7f507518f220, session_state=..., exec_state=0x7f507518f1e0)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/impala-server.cc:776
      #17 0x0000000000af25a6 in impala::ImpalaServer::query (this=0x9a86a00, query_handle=..., query=Unhandled dwarf expression opcode 0xf3
      )
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/impala-beeswax-server.cc:68
      
      #18 0x0000000000d50085 in beeswax::BeeswaxServiceProcessor::process_query (this=0x9018440, seqid=0, iprot=Unhandled dwarf expression opcode 0xf3
      )
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:2979
      #19 0x0000000000d53384 in beeswax::BeeswaxServiceProcessor::dispatchCall (this=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:2952
      
      #20 0x0000000000807b6c in apache::thrift::TDispatchProcessor::process (this=0x9018440, in=..., out=..., connectionContext=0x7f47121c1280)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/thrift-0.9.0-p8/include/thrift/TDispatchProcessor.h:121
      #21 0x0000000001b3ce8b in apache::thrift::server::TThreadPoolServer::Task::run() ()
      #22 0x0000000001b24a49 in apache::thrift::concurrency::ThreadManager::Worker::run() ()
      #23 0x00000000009f66f9 in impala::ThriftThread::RunRunnable (this=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/rpc/thrift-thread.cc:64
      #24 0x00000000009f7152 in operator() (function_obj_ptr=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:280
      
      1. 60c767d4-d275-2a23-24273dd1-2ce46f90.dmp
        15.91 MB
        Mostafa Mokhtar
      2. hs_err_pid52633.log
        637 kB
        Mostafa Mokhtar

        Activity

        dhecht Dan Hecht added a comment -

        This is similar to IMPALA-4890, except here the race is between Coordinator::Exec() and Coordinator::TearDown(), which aren't properly synchronized. TearDown() cleared query_state_ out from under Exec(). Note that this can only happen when cancelling from the web UI, since the query handle hasn't yet been returned to the client (so the client can't initiate a cancel).

        (gdb) f 12
        #12 0x0000000000dcc14e in impala::Coordinator::Exec (this=0x7f489a730400) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/coordinator.cc:483
        483     /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/coordinator.cc: No such file or directory.
                in /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/coordinator.cc
        (gdb) p query_state_
        $1 = (impala::QueryState *) 0x0
        (gdb) p torn_down_
        $2 = true
        

        Marcel Kornacker, I think you'll be fixing this with IMPALA-4890, right?
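
The interleaving described above can be replayed on a single thread so that the outcome is deterministic. This is only a minimal sketch: `RacyCoordinator`, `ExecBegin()`, `ExecWouldDereferenceNull()` and `ReplayInterleaving()` are hypothetical names invented for illustration, and the `QueryState` here is a stand-in, not the real Impala class. It models nothing beyond the two facts visible in the gdb session (query_state_ is null and torn_down_ is true while Exec() is still running).

```cpp
#include <cassert>

// Hypothetical stand-in for impala::QueryState; not the real class.
struct QueryState {
  int num_instances = 3;
};

// Hypothetical coordinator whose TearDown() frees the control structure
// without any synchronization against Exec().
class RacyCoordinator {
 public:
  RacyCoordinator() : query_state_(new QueryState) {}
  ~RacyCoordinator() { delete query_state_; }

  // First half of Exec(): runs before the cancel arrives.
  void ExecBegin() { started_ = true; }

  // Cancel path: destroys and clears the QueryState pointer.
  void TearDown() {
    torn_down_ = true;
    delete query_state_;
    query_state_ = nullptr;
  }

  // Second half of Exec(): the crashing code would dereference
  // query_state_ here. The sketch reports the hazard instead of crashing.
  bool ExecWouldDereferenceNull() const {
    return started_ && query_state_ == nullptr;
  }

  bool torn_down() const { return torn_down_; }

 private:
  QueryState* query_state_;
  bool started_ = false;
  bool torn_down_ = false;
};

// Replays Exec-begin, TearDown, Exec-continue in that order.
bool ReplayInterleaving() {
  RacyCoordinator coord;
  coord.ExecBegin();
  assert(!coord.ExecWouldDereferenceNull());  // safe before the cancel
  coord.TearDown();                           // cancel from another actor
  return coord.torn_down() && coord.ExecWouldDereferenceNull();
}
```

With real threads the same ordering depends on timing, which would be consistent with the crash only showing up under concurrent load.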

        marcelk Marcel Kornacker added a comment -

        commit fcc3b9eded82953c04fd7510d724a2c9b7ff59a3
        Author: Marcel Kornacker <marcel@cloudera.com>
        Date: Sun May 14 21:32:41 2017 -0700

        IMPALA-4890/5143: Coordinator race involving TearDown()

        TearDown() releases resources and destroys control
        structures (the QueryState reference), and it can be called
        while a concurrent thread executes Exec() or might call
        GetNext() in the future. The solution is not to destroy
        the control structures.

        This also releases resources automatically at the end
        of query execution.

        Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
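
The approach in the commit message (release resources, but keep the control structures alive) might look roughly like the sketch below. It is not the actual patch: the class layout, `row_buffer_`, `kNumInstances` and `RunExecWithConcurrentTearDown()` are assumptions made for illustration, and `QueryState` is a simplified stand-in. The point is that TearDown() never clears query_state_, so a concurrent Exec() can keep using it.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kNumInstances = 1000;  // hypothetical workload size

// Simplified stand-in for impala::QueryState; not the real class.
class QueryState {
 public:
  int GetFInstanceState() {
    std::lock_guard<std::mutex> l(lock_);
    return ++lookups_;
  }
  int lookups() {
    std::lock_guard<std::mutex> l(lock_);
    return lookups_;
  }

 private:
  std::mutex lock_;
  int lookups_ = 0;
};

class Coordinator {
 public:
  Coordinator() : query_state_(new QueryState), row_buffer_(1024, 0) {}

  // Uses query_state_ throughout; safe because the pointer is never reset.
  void Exec() {
    for (int i = 0; i < kNumInstances; ++i) query_state_->GetFInstanceState();
  }

  // Idempotent: frees resources once, leaves the control structure intact.
  void TearDown() {
    std::lock_guard<std::mutex> l(lock_);
    if (torn_down_) return;
    torn_down_ = true;
    std::vector<char>().swap(row_buffer_);  // release the buffer memory
  }

  bool torn_down() {
    std::lock_guard<std::mutex> l(lock_);
    return torn_down_;
  }
  int lookups() { return query_state_->lookups(); }

 private:
  // Lives as long as the coordinator; destroyed only in the destructor.
  const std::unique_ptr<QueryState> query_state_;
  std::mutex lock_;
  bool torn_down_ = false;
  std::vector<char> row_buffer_;  // stands in for releasable resources
};

// Runs Exec() and TearDown() on separate threads, then checks both finished.
bool RunExecWithConcurrentTearDown() {
  Coordinator coord;
  std::thread exec_thread([&coord] { coord.Exec(); });
  std::thread cancel_thread([&coord] { coord.TearDown(); });
  exec_thread.join();
  cancel_thread.join();
  return coord.torn_down() && coord.lookups() == kNumInstances;
}
```

Because query_state_ is written only during construction, Exec() needs no lock to read the pointer; only the releasable resources are guarded.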


          People

          • Assignee:
            marcelk Marcel Kornacker
            Reporter:
            mmokhtar Mostafa Mokhtar
          • Votes:
            0
            Watchers:
            4
