Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4274

Jenkins builds can hang at the BE test "buffered-block-mgr-test"

    Details

      Description

      The umbrella-bulid-and-test job has gotten stuck more than once lately on the BE test buffered-block-mgr-test (as has the ASF "repeated runs" Jenkins job.) A quick JIRA search shows that this test seems to have a history of flakiness. In this latest incarnation, the hang seems to be here:

      [root@impala-boost-static-burst-slave-17c6 ~]# strace -p 21506
      Process 21506 attached - interrupt to quit
      futex(0x4e10244, FUTEX_WAIT_PRIVATE, 1, NULL
      

      There are msgs in the tests WARNING and ERROR output logs, but these same messages show up in the logs of previous tests that have passed, so I don't know how relevant they are. (E.g., maybe there are negative tests that produces these msgs.) Nonetheless, I'll include them:

      WARNINGS:

      Log file created at: 2016/10/03 06:37:11
      Running on machine: impala-boost-static-burst-slave-17c6.vpc.cloudera.com
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      W1003 06:37:11.796332  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:12.303864  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:12.558909  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:12.796193  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:13.049844  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:13.269994  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:13.496836  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:13.850524  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:14.041249  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:14.301388  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:14.459806  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:14.617405  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:15.018292  6468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183': fopen(/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.018648  6466 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183': fopen(/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183, "rb+") failed with errno=2 description=Error(2): No such file or directory
      W1003 06:37:15.102265  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:15.253690  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:15.334065  6636 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128': fopen(/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.334220  6635 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128': fopen(/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128, "rb+") failed with errno=2 description=Error(2): No such file or directory
      W1003 06:37:15.410234  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:15.495048  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2': Truncate file /tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2 to length 2048 failed with errno 2 (Error(2): No such file or directory)
      W1003 06:37:15.495100  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2': Truncate file /tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2 to length 2048 failed with errno 2 (Error(2): No such file or directory). Will try another scratch file.
      W1003 06:37:15.558523  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:15.630766  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.630825  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.631402  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.644335  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.649292  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.649369  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.650197  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.650451  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      W1003 06:37:15.717438  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:15.787549  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.787639  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.787919  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.788485  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.788746  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.788976  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      E1003 06:37:15.789222  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied
      W1003 06:37:15.789587  5468 buffered-block-mgr.cc:826] Error while allocating range in scratch file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied. Will try another scratch file.
      W1003 06:37:15.844924  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:15.910274  5468 tmp-file-mgr.cc:74] Running without spill to disk: no scratch directories provided.
      W1003 06:37:15.965059  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:16.083060  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:16.205215  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:16.322567  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      W1003 06:37:16.439827  5468 HiveConf.java:2762] HiveConf of name hive.access.conf.url does not exist
      E1003 06:37:16.502789  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      E1003 06:37:16.817687  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      E1003 06:37:17.006023  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      [etc...]
      

      ERRORS:

      Log file created at: 2016/10/03 06:37:15
      Running on machine: impala-boost-static-burst-slave-17c6.vpc.cloudera.com
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      E1003 06:37:15.018292  6468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183': fopen(/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.018648  6466 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183': fopen(/tmp/impala-scratch/0:0_d67923d0-7996-4667-a081-203e39103183, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.334065  6636 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128': fopen(/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.334220  6635 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128': fopen(/tmp/impala-scratch/0:0_a454c713-c6ce-498a-85d0-0e868ea19128, "rb+") failed with errno=2 description=Error(2): No such file or directory
      E1003 06:37:15.495048  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2': Truncate file /tmp/impala-scratch/0:0_ac4ccd81-6498-4149-b325-261c3efe4af2 to length 2048 failed with errno 2 (Error(2): No such file or directory)
      E1003 06:37:15.630766  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.631402  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_30d7b836-21de-4d58-bf84-2dabc46945a1 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.649292  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.650197  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:1_065537ec-1680-4316-86af-ec5d82c09ac4 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.787549  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.787919  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.788746  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16': Create file /tmp/buffered-block-mgr-test.1/impala-scratch/0:0_50000b9f-89dd-437f-9502-18916cc8da16 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:15.789222  5468 tmp-file-mgr.cc:258] Error for temporary file '/tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148': Create file /tmp/buffered-block-mgr-test.0/impala-scratch/0:0_cc223ca3-3641-45e4-afa7-149df7649148 failed with errno=13 description=Error(13): Permission denied
      E1003 06:37:16.502789  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      E1003 06:37:16.817687  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      E1003 06:37:17.006023  5468 buffered-block-mgr-test.cc:410]  Iteration 0
      [etc...]
      

        Activity

        Hide
        dknupp David Knupp added a comment -

        Slaves are still active if anyone has time to triage:

        http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/4952/consoleFull
        09:20:00 Started by upstream project "impala-asf-master-core" build number 554
        09:20:00 Building remotely on impala-boost-static-burst-slave-17c6.vpc.cloudera.com in workspace /data/jenkins/workspace/impala-umbrella-build-and-test

        http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/5016/consoleFull
        02:42:19 Started by upstream project "impala-asf-master-exhaustive" build number 90
        02:42:19 Building remotely on impala-boost-static-burst-slave-05c6.vpc.cloudera.com in workspace /data/jenkins/workspace/impala-umbrella-build-and-test

        Show
        dknupp David Knupp added a comment - Slaves are still active if anyone has time to triage: http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/4952/consoleFull 09:20:00 Started by upstream project "impala-asf-master-core" build number 554 09:20:00 Building remotely on impala-boost-static-burst-slave-17c6.vpc.cloudera.com in workspace /data/jenkins/workspace/impala-umbrella-build-and-test http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/5016/consoleFull 02:42:19 Started by upstream project "impala-asf-master-exhaustive" build number 90 02:42:19 Building remotely on impala-boost-static-burst-slave-05c6.vpc.cloudera.com in workspace /data/jenkins/workspace/impala-umbrella-build-and-test
        Hide
        tarmstrong Tim Armstrong added a comment -

        Was able to grab a core dump from one of the running processes. Copying it to impala-desktop now.

        Backtraces are in gdb.txt

        Looks like there's a recursive acquisition of static_block_mgrs_lock_, which would explain the hang. I don't think that code has been changed recently, so maybe there was a change in timing with the test code.

        Thread 105 (Thread 0x7f4b85c44700 (LWP 18704)):
        #0  0x00000000028164ca in sys_futex (a=0x4166250, o=128, v=46, t=0x7f4b85c435b0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/linux_syscall_support.h:2713
        #1  0x000000000281660b in base::internal::SpinLockDelay (w=0x4166250, value=46, loop=2202068) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock_linux-inl.h:88
        #2  0x0000000002816397 in base::SpinLock::SlowLock (this=0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock.cc:135
        #3  0x000000000118bd41 in base::SpinLock::Lock (this=0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock.h:75
        #4  0x000000000118bdb6 in impala::SpinLock::lock (this=0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/spinlock.h:34
        #5  0x0000000001196c28 in boost::lock_guard<impala::SpinLock>::lock_guard (this=0x7f4b85c436e0, m_=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/thread/lock_guard.hpp:38
        #6  0x000000000142c874 in impala::BufferedBlockMgr::~BufferedBlockMgr (this=0xa5eb000, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr.cc:528
        #7  0x0000000001441532 in std::_Sp_counted_ptr<impala::BufferedBlockMgr*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x958aa60) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:373
        #8  0x000000000119b0c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x958aa60) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:149
        #9  0x0000000001196d51 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f4b85c43968, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:666
        #10 0x0000000001194790 in std::__shared_ptr<impala::BufferedBlockMgr, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f4b85c43960, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:914
        #11 0x00000000011947aa in std::shared_ptr<impala::BufferedBlockMgr>::~shared_ptr (this=0x7f4b85c43960, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr.h:93
        #12 0x000000000142ca0e in impala::BufferedBlockMgr::~BufferedBlockMgr (this=0xa5e9c00, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr.cc:546
        #13 0x0000000001441532 in std::_Sp_counted_ptr<impala::BufferedBlockMgr*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x958b740) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:373
        #14 0x000000000119b0c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x958b740) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:149
        #15 0x0000000001196d51 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f4b85c43be8, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:666
        #16 0x0000000001194790 in std::__shared_ptr<impala::BufferedBlockMgr, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f4b85c43be0, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:914
        #17 0x00000000011947aa in std::shared_ptr<impala::BufferedBlockMgr>::~shared_ptr (this=0x7f4b85c43be0, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr.h:93
        #18 0x0000000001194879 in impala::BufferedBlockMgrTest::CreateDestroyThread (this=0xa0c9430, state=0x9411b00) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr-test.cc:551
        #19 0x00000000011ab27e in boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>::operator() (this=0x6335fb8, p=0xa0c9430, a1=0x9411b00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
        #20 0x00000000011ab0e1 in boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> >::operator()<boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list0> (this=0x6335fc8, f=..., a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
        #21 0x00000000011aac45 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> > >::operator() (this=0x6335fb8) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
        #22 0x00000000011a9e38 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> > > >::run (this=0x6335e00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
        #23 0x0000000001a5bc1a in thread_proxy ()
        #24 0x000000325d607851 in start_thread () from /lib64/libpthread.so.0
        #25 0x000000325d2e894d in clone () from /lib64/libc.so.6
        
        Show
        tarmstrong Tim Armstrong added a comment - Was able to grab a core dump from one of the running processes. Copying it to impala-desktop now. Backtraces are in gdb.txt Looks like there's a recursive acquisition of static_block_mgrs_lock_, which would explain the hang. I don't think that code has been changed recently, so maybe there was a change in timing with the test code. Thread 105 ( Thread 0x7f4b85c44700 (LWP 18704)): #0 0x00000000028164ca in sys_futex (a=0x4166250, o=128, v=46, t=0x7f4b85c435b0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/linux_syscall_support.h:2713 #1 0x000000000281660b in base::internal::SpinLockDelay (w=0x4166250, value=46, loop=2202068) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock_linux-inl.h:88 #2 0x0000000002816397 in base::SpinLock::SlowLock ( this =0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock.cc:135 #3 0x000000000118bd41 in base::SpinLock::Lock ( this =0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/gutil/spinlock.h:75 #4 0x000000000118bdb6 in impala::SpinLock::lock ( this =0x4166250) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/spinlock.h:34 #5 0x0000000001196c28 in boost::lock_guard<impala::SpinLock>::lock_guard ( this =0x7f4b85c436e0, m_=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/thread/lock_guard.hpp:38 #6 0x000000000142c874 in impala::BufferedBlockMgr::~BufferedBlockMgr ( this =0xa5eb000, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr.cc:528 #7 0x0000000001441532 in std::_Sp_counted_ptr<impala::BufferedBlockMgr*, (__gnu_cxx::_Lock_policy)2>::_M_dispose ( this =0x958aa60) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:373 #8 0x000000000119b0c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release ( this =0x958aa60) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:149 #9 0x0000000001196d51 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count ( this =0x7f4b85c43968, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:666 #10 0x0000000001194790 in std::__shared_ptr<impala::BufferedBlockMgr, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr ( this =0x7f4b85c43960, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:914 #11 0x00000000011947aa in std::shared_ptr<impala::BufferedBlockMgr>::~shared_ptr ( this =0x7f4b85c43960, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr.h:93 #12 0x000000000142ca0e in impala::BufferedBlockMgr::~BufferedBlockMgr ( this =0xa5e9c00, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr.cc:546 #13 0x0000000001441532 in std::_Sp_counted_ptr<impala::BufferedBlockMgr*, (__gnu_cxx::_Lock_policy)2>::_M_dispose ( this =0x958b740) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:373 #14 0x000000000119b0c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release ( this =0x958b740) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:149 #15 0x0000000001196d51 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count ( this =0x7f4b85c43be8, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:666 #16 0x0000000001194790 in std::__shared_ptr<impala::BufferedBlockMgr, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr ( this =0x7f4b85c43be0, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr_base.h:914 #17 0x00000000011947aa in std::shared_ptr<impala::BufferedBlockMgr>::~shared_ptr ( this =0x7f4b85c43be0, __in_chrg=<value optimized out>) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/shared_ptr.h:93 #18 0x0000000001194879 in impala::BufferedBlockMgrTest::CreateDestroyThread ( this =0xa0c9430, state=0x9411b00) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/buffered-block-mgr-test.cc:551 #19 0x00000000011ab27e in boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>:: operator () ( this =0x6335fb8, p=0xa0c9430, a1=0x9411b00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165 #20 0x00000000011ab0e1 in boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> >:: operator ()<boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list0> ( this =0x6335fc8, f=..., a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313 #21 0x00000000011aac45 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> > >:: operator () ( this =0x6335fb8) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20 #22 0x00000000011a9e38 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::BufferedBlockMgrTest, impala::RuntimeState*>, boost::_bi::list2<boost::_bi::value<impala::BufferedBlockMgrTest*>, boost::_bi::value<impala::RuntimeState*> > > >::run ( this =0x6335e00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116 #23 0x0000000001a5bc1a in thread_proxy () #24 0x000000325d607851 in start_thread () from /lib64/libpthread.so.0 #25 0x000000325d2e894d in clone () from /lib64/libc.so.6
        Hide
        tarmstrong Tim Armstrong added a comment -

        I think the problem is that in the below code the 'mgr' shared_ptr may end up being the last reference to a BufferedBlockMgr, so when it's destroyed it ends up calling the destructor recursively.

          {
            lock_guard<SpinLock> lock(static_block_mgrs_lock_);
            BlockMgrsMap::iterator it = query_to_block_mgrs_.find(query_id_);
            // IMPALA-2286: Another fragment may have called Create() for this query_id_ and
            // saw that this BufferedBlockMgr is being destructed.  That fragement will
            // overwrite the map entry for query_id_, pointing it to a different
            // BufferedBlockMgr object.  We should let that object's destructor remove the
            // entry.  On the other hand, if the second BufferedBlockMgr is destructed before
            // this thread acquires the lock, then we'll remove the entry (because we can't
            // distinguish between the two expired pointers), and when the other
            // ~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_.
            if (it != query_to_block_mgrs_.end()) {
              shared_ptr<BufferedBlockMgr> mgr = it->second.lock();
              if (mgr.get() == NULL) {
                // The BufferBlockMgr object referenced by this entry is being deconstructed.
                query_to_block_mgrs_.erase(it);
              } else {
                // The map references another (still valid) BufferedBlockMgr.
                DCHECK_NE(mgr.get(), this);
              }
            }
          }
        
        Show
        tarmstrong Tim Armstrong added a comment - I think the problem is that in the below code the 'mgr' shared_ptr may end up being the last reference to a BufferedBlockMgr, so when it's destroyed it ends up calling the destructor recursively. { lock_guard<SpinLock> lock(static_block_mgrs_lock_); BlockMgrsMap::iterator it = query_to_block_mgrs_.find(query_id_); // IMPALA-2286: Another fragment may have called Create() for this query_id_ and // saw that this BufferedBlockMgr is being destructed. That fragement will // overwrite the map entry for query_id_, pointing it to a different // BufferedBlockMgr object. We should let that object's destructor remove the // entry. On the other hand, if the second BufferedBlockMgr is destructed before // this thread acquires the lock, then we'll remove the entry (because we can't // distinguish between the two expired pointers), and when the other // ~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_. if (it != query_to_block_mgrs_.end()) { shared_ptr<BufferedBlockMgr> mgr = it->second.lock(); if (mgr.get() == NULL) { // The BufferBlockMgr object referenced by this entry is being deconstructed. query_to_block_mgrs_.erase(it); } else { // The map references another (still valid) BufferedBlockMgr. DCHECK_NE(mgr.get(), this ); } } }
        Hide
        dknupp David Knupp added a comment -

        Tim Armstrong Can I abort these jobs and release the Jenkins slave?

        Show
        dknupp David Knupp added a comment - Tim Armstrong Can I abort these jobs and release the Jenkins slave?
        Hide
        tarmstrong Tim Armstrong added a comment -

        Go for it, I got all the info I need I think

        Show
        tarmstrong Tim Armstrong added a comment - Go for it, I got all the info I need I think
        Hide
        tarmstrong Tim Armstrong added a comment -

        IMPALA-4274: hang in buffered-block-mgr-test

        We started seeing hangs in CreateDestroyMulti() where a
        thread was recursively acquiring static_block_mgrs_lock_.
        This is only possible because a shared_ptr is destroyed
        while holding the lock.

        The fix is to reset the shared_ptr only after releasing
        the lock.

        Testing:
        I was unable to reproduce the hang locally, but the
        callstack in the JIRA was a strong enough smoking gun
        to feel confident that this should fix the hang.

        Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604
        Reviewed-on: http://gerrit.cloudera.org:8080/4690
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
        Tested-by: Internal Jenkins

        Show
        tarmstrong Tim Armstrong added a comment - IMPALA-4274 : hang in buffered-block-mgr-test We started seeing hangs in CreateDestroyMulti() where a thread was recursively acquiring static_block_mgrs_lock_. This is only possible because a shared_ptr is destroyed while holding the lock. The fix is to reset the shared_ptr only after releasing the lock. Testing: I was unable to reproduce the hang locally, but the callstack in the JIRA was a strong enough smoking gun to feel confident that this should fix the hang. Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604 Reviewed-on: http://gerrit.cloudera.org:8080/4690 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins

          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            dknupp David Knupp
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development