
IMPALA-4517: Stress test crash: impala::LlvmCodeGen::FinalizeModule

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      The stress test was running against a Kudu cluster on EC2.

      Stack Trace:

      #0  0x00000032a5a32625 in raise () from /lib64/libc.so.6
      #1  0x00000032a5a33e05 in abort () from /lib64/libc.so.6
      #2  0x00007f6bb13bba55 in os::abort(bool) ()
         from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #3  0x00007f6bb153bf87 in VMError::report_and_die() ()
         from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #4  0x00007f6bb13c096f in JVM_handle_linux_signal ()
         from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x00000000025aec00 in llvm::Comdat::getName() const ()
      #7  0x00000000027187b1 in (anonymous namespace)::Verifier::verify(llvm::Module const&) ()
      #8  0x0000000002718a4d in (anonymous namespace)::VerifierLegacyPass::doFinalization(llvm::Module&) ()
      #9  0x00000000026d2b3c in llvm::FPPassManager::doFinalization(llvm::Module&) ()
      #10 0x00000000026dd26f in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
      #11 0x000000000235ed39 in llvm::MCJIT::emitObject(llvm::Module*) ()
      #12 0x000000000235f4db in llvm::MCJIT::generateCodeForModule(llvm::Module*) ()
      #13 0x000000000235bb90 in llvm::MCJIT::finalizeObject() ()
      #14 0x000000000162415b in impala::LlvmCodeGen::FinalizeModule (this=0x144796d80)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/codegen/llvm-codegen.cc:949
      #15 0x00000000019d4f2e in impala::PlanFragmentExecutor::OptimizeLlvmModule (
          this=0xf733aa90)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:264
      #16 0x00000000019d5a30 in impala::PlanFragmentExecutor::OpenInternal (this=0xf733aa90)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:317
      #17 0x00000000019d56ad in impala::PlanFragmentExecutor::Open (this=0xf733aa90)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:294
      #18 0x000000000152dc7c in impala::FragmentMgr::FragmentExecState::Exec (this=0xf733a700)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/fragment-exec-state.cc:58
      #19 0x000000000152541a in impala::FragmentMgr::FragmentThread (this=0x8e59300, 
          fragment_instance_id=...)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/fragment-mgr.cc:86
      #20 0x000000000152919c in boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>::operator() (this=0x4091ab40, p=0x8e59300, a1=...)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
      #21 0x0000000001528f59 in boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> >::operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> (this=0x4091ab50, f=..., a=...)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
      #22 0x0000000001528883 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >::operator() (this=0x4091ab40)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #23 0x0000000001528216 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke (function_obj_ptr=...)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
      #24 0x000000000133ce98 in boost::function0<void>::operator() (this=0x7f68bb6a8c40)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
      #25 0x00000000015e8921 in impala::Thread::SuperviseThread (name=..., category=..., 
          functor=..., thread_started=0x7f6b055698e0)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/thread.cc:317
      #26 0x00000000015ef8fa in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x2aff29c0, f=@0x2aff29b8, a=...)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
      #27 0x00000000015ef83d in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x2aff29b8)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #28 0x00000000015ef798 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x2aff2800)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
      #29 0x0000000001a3c34a in thread_proxy ()
      #30 0x00000032a5e079d1 in start_thread () from /lib64/libpthread.so.0
      #31 0x00000032a5ae88fd in clone () from /lib64/libc.so.6
      

      The core file can be found here:

      kudu-stress-5.vpc.cloudera.com:/data1/impalad/core.32601
      
      Attachments:

      1. hs_err_pid32601.log (294 kB, Taras Bobrovytsky)

        Activity

        Tim Armstrong added a comment -

        What workload was this running?

        Taras Bobrovytsky added a comment -

        It was running Select, Delete and Upsert queries. There was a pretty heavy load.

        Michael Ho added a comment -

        Is it reproducible? If so, please post the query here. Thanks.

        Taras Bobrovytsky added a comment -

        I don't think it's easily reproducible. Many queries were running at the same time (none of which individually cause a crash).

        Tim Armstrong added a comment -

        Were the queries TPC-H/TPC-DS or randomly generated?

        I'm wondering if it's a failed memory allocation or some kind of use-after-free bug.

        Michael Ho added a comment -

        Please upload log files to impala-desktop if they are available too.

        Taras Bobrovytsky added a comment -

        The select queries that were running were not randomly generated; they were standard TPC-H queries. I uploaded the log files to impala-desktop.ca.cloudera.com:/home/dev/IMPALA-4517.

        Taras Bobrovytsky added a comment -

        The impalad binary can be found here:

        kudu-stress-5.vpc.cloudera.com:/data1/impalad/impalad
        

        The binaries were produced by http://sandbox.jenkins.cloudera.com/view/Impala/view/Private-Utility/job/impala-private-build-binaries/1399/

        Michael Ho added a comment -

        Verifier::verify() calls Comdat::getName() on a null Comdat*: %rdi, which carries the implicit this pointer (the first argument), is null:

        StringRef Comdat::getName() const { return Name->first(); }
        
        Registers:
        RAX=0xfffffffffffffff7, RBX=0x0000000045be4000, RCX=0x0000000045be4000, RDX=0x0000000000000019
        RSP=0x00007f68bb6a62c8, RBP=0x0000000045bec000, RSI=0x0000000000001000, RDI=0x0000000000000000
        R8 =0x000000003ec3fea8, R9 =0x0000000000000000, R10=0x0000000000000008, R11=0x00000032a5b5a590
        R12=0x000000003ec3fc20, R13=0x000000003ec3fc28, R14=0x00007f68bb6a6300, R15=0x00000001814f2ee8
        RIP=0x00000000025aec00, EFLAGS=0x0000000000010257, CSGSFS=0x000000000000e033, ERR=0x0000000000000004
          TRAPNO=0x000000000000000e
        
        (gdb) x/i 0x00000000025aec00
           0x25aec00 <_ZNK4llvm6Comdat7getNameEv>:	mov    (%rdi),%rax
        
        (gdb) x/20i _ZNK4llvm6Comdat7getNameEv
           0x25aec00 <_ZNK4llvm6Comdat7getNameEv>:	mov    (%rdi),%rax
           0x25aec03 <_ZNK4llvm6Comdat7getNameEv+3>:	mov    (%rax),%edx
           0x25aec05 <_ZNK4llvm6Comdat7getNameEv+5>:	add    $0x18,%rax
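
        To make the failure mode concrete, here is an editorial sketch (not Impala or LLVM source; FakeComdat and FakeEntry are illustrative stand-ins), assuming the System V AMD64 calling convention where the implicit this pointer is passed in %rdi. A null Comdat* then faults on the function's very first load, matching the mov (%rdi),%rax at the crash PC.

        #include <string>

        struct FakeEntry {
          std::string Key;
          const std::string &first() const { return Key; }
        };

        struct FakeComdat {
          const FakeEntry *Name = nullptr;
          // Mirrors StringRef Comdat::getName() const { return Name->first(); }:
          // the first instruction reads this->Name (at offset 0), i.e. mov (%rdi),%rax.
          const std::string &getName() const { return Name->first(); }
        };

        int main() {
          const FakeComdat *C = nullptr;  // what a corrupted bucket effectively yields
          // C->getName();                // undefined behaviour: would fault exactly as in the crash above
          (void)C;
          return 0;
        }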
        
        Michael Ho added a comment -

        This looks more like memory corruption.

        The core doesn't show the entire backtrace, but it's likely we crashed at the following code snippet in LLVM:

        Verifier::verify() {
            for (const StringMapEntry<Comdat> &SMEC : M.getComdatSymbolTable())
              visitComdat(SMEC.getValue());
        }
        
        class Module{
        ...
          /// Get the Module's symbol table for COMDATs.
          ComdatSymTabType &getComdatSymbolTable() { return ComdatSymTab; }
        ...
        }
        

        ComdatSymTab appears to be a linearly probed hash table that maps a symbol name to its Comdat.

        (gdb) p this->module_->ComdatSymTab
        $23 = {<llvm::StringMapImpl> = {TheTable = 0x45be4000, NumBuckets = 4096, NumItems = 1763, NumTombstones = 0, ItemSize = 24}, Allocator = {<llvm::AllocatorBase<llvm::MallocAllocator>> = {<No data fields>}, <No data fields>}}
        

        The layout of ComdatSymTab is an array of NumBuckets StringMapEntryBase* bucket pointers followed by a hash table.
        The bucket table is littered with 0xfffffffffffffff8, which is not a meaningful value; empty buckets are 0 and the
        tombstone value in the hash table is -1.
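
        Since 0xfffffffffffffff8 is neither the empty marker (0) nor the tombstone (-1), the StringMap iteration presumably treats such buckets as live entries and hands them to visitComdat(). A minimal editorial sketch of that skip logic (kEmpty, kTombstone and the bucket values are hypothetical; the real logic lives in llvm::StringMapImpl):

        #include <cstdint>
        #include <cstdio>
        #include <vector>

        int main() {
          const std::uint64_t kEmpty = 0;
          const std::uint64_t kTombstone = ~std::uint64_t{0};  // -1
          std::vector<std::uint64_t> buckets = {
              kEmpty, 0xfffffffffffffff8ULL, kTombstone, 0x0d64def0ULL};
          for (std::uint64_t b : buckets) {
            if (b == kEmpty || b == kTombstone) continue;      // skipped by the iterator
            std::printf("would visit bucket 0x%llx\n", (unsigned long long)b);
          }
          return 0;
        }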

        Scanning the bucket table and ignoring 0x0 and 0xfffffffffffffff8, the number of valid-looking entries matches the
        NumItems field of ComdatSymTab (1763).

        (gdb) set $i=0
        (gdb) set $count=0
        (gdb) while ($i < 4096)
         >set $val=((unsigned long long*)0x45be4000)[$i]
         >if ($val != 0 && $val !=0xfffffffffffffff8)
          >set $count=$count+1
          >end
         >set $i=$i+1
         >end
        (gdb) p/x $i
        $25 = 0x1000
        (gdb) p/x $count
        $26 = 0x6e3
        (gdb) p $count
        $27 = 1763
        

        It appears that some code somewhere writes a bunch of 0xfffffffffffffff8 values into ComdatSymTab. It's not clear whether that has anything to do with how we use LLVM or whether it is just random memory corruption.

        The entries in the bucket table appear to be valid:

        (gdb) p (char*)(0x000000000d64def0 + 24)
        $28 = 0xd64df08 "_ZSt22__final_insertion_sortIPN6impala15ReservoirSampleIN10impala_udf10TinyIntValEEEN9__gnu_cxx5__ops15_Iter_comp_iterIPFbRKS4_SA_EEEEvT_SE_T0_"
        
        (gdb) x/4096gx 0x45be4000
        0x45be4000:	0xfffffffffffffff8	0x000000000d64def0
        0x45be4010:	0x00000000bfa2f260	0xfffffffffffffff8
        0x45be4020:	0x0000000000000000	0x0000000000000000
        0x45be4030:	0xfffffffffffffff8	0x0000000000000000
        0x45be4040:	0xfffffffffffffff8	0x0000000000000000
        0x45be4050:	0xfffffffffffffff8	0x0000000000000000
        0x45be4060:	0x0000000000000000	0x0000000000000000
        0x45be4070:	0x0000000000000000	0xfffffffffffffff8
        0x45be4080:	0x000000002ceca490	0x000000002ceca0d0
        0x45be4090:	0x000000002f74ef40	0x0000000102985930
        0x45be40a0:	0x0000000000000000	0xfffffffffffffff8
        0x45be40b0:	0x0000000030471000	0xfffffffffffffff8
        0x45be40c0:	0x0000000000000000	0x00000000abd3f280
        0x45be40d0:	0x0000000000000000	0x0000000000000000
        0x45be40e0:	0xfffffffffffffff8	0x0000000000000000
        0x45be40f0:	0x0000000000000000	0x0000000000000000
        0x45be4100:	0x000000017b12cac0	0x0000000000000000
        0x45be4110:	0x0000000000000000	0x0000000000000000
        0x45be4120:	0xfffffffffffffff8	0x0000000000000000
        0x45be4130:	0x0000000000000000	0x000000010c5cf410
        0x45be4140:	0x00000000abd3ed00	0x0000000000000000
        0x45be4150:	0x0000000000000000	0x0000000000000000
        0x45be4160:	0x0000000000000000	0xfffffffffffffff8
        0x45be4170:	0x0000000000000000	0x0000000000000000
        0x45be4180:	0x0000000000000000	0x0000000102982000
        0x45be4190:	0x0000000000000000	0x0000000000000000
        0x45be41a0:	0x0000000000000000	0xfffffffffffffff8
        0x45be41b0:	0x000000002cec8640	0xfffffffffffffff8
        0x45be41c0:	0x0000000000000000	0x000000009c457180
        0x45be41d0:	0x000000000daee540	0x00000000e4b68370
        0x45be41e0:	0x0000000000000000	0x0000000000000000
        0x45be41f0:	0x0000000030470ca0	0x0000000000000000
        0x45be4200:	0xfffffffffffffff8	0x00000000b794a0e0
        0x45be4210:	0xfffffffffffffff8	0x0000000000000000
        0x45be4220:	0x0000000000000000	0x0000000000000000
        0x45be4230:	0x0000000000000000	0x0000000000000000
        0x45be4240:	0x0000000000000000	0x0000000034d97570
        0x45be4250:	0x000000016fb891d0	0x0000000000000000
        0x45be4260:	0x0000000042e210e0	0x0000000000000000
        0x45be4270:	0xfffffffffffffff8	0x00000000bfa2eb60
        0x45be4280:	0x0000000000000000	0xfffffffffffffff8
        0x45be4290:	0x0000000000000000	0x0000000000000000
        0x45be42a0:	0x0000000000000000	0x00000000e4b67200
        0x45be42b0:	0x0000000019679700	0x000000014085ec80
        0x45be42c0:	0x0000000062cac310	0x0000000022f15650
        0x45be42d0:	0xfffffffffffffff8	0xfffffffffffffff8
        
        ....
        
        Michael Ho added a comment -

        In case it isn't clear: the reason 0xfffffffffffffff8 leads to the NULL is that the actual payload (the Comdat) is at offset 0x8 within the bucket entry:

        (gdb) p *(llvm::Comdat*)(0x000000000d64def0 + 8)
        $33 = {Name = 0xd64def0, SK = llvm::Comdat::Any}
        
        (gdb) p/x 0xfffffffffffffff8 + 0x8
        $35 = 0x0
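
        To make the +0x8 arithmetic concrete, here is an editorial sketch assuming a simplified entry layout (FakeEntryBase, FakeComdat and FakeEntry are stand-ins; the real layout is llvm::StringMapEntry<llvm::Comdat>): the header plus alignment padding occupy the first 8 bytes, so a bucket pointer of 0xfffffffffffffff8 yields a Comdat* of exactly 0.

        #include <cstddef>
        #include <cstdint>
        #include <cstdio>

        struct FakeEntryBase { unsigned StrLen; };        // entry header
        struct FakeComdat { const void *Name; int SK; };  // 8-byte-aligned payload
        struct FakeEntry { FakeEntryBase Base; FakeComdat Value; };

        int main() {
          std::printf("payload offset = %zu\n", offsetof(FakeEntry, Value));   // 8
          const std::uintptr_t Bucket = 0xfffffffffffffff8ULL;
          std::printf("Comdat* = 0x%llx\n",
                      (unsigned long long)(Bucket + offsetof(FakeEntry, Value)));  // 0x0
          return 0;
        }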
        
        Michael Ho added a comment -

        Hi Taras Bobrovytsky, I have already copied all the files to impala-desktop. Can you please try reproducing it with an ASAN build?

        Michael Ho added a comment -

        An attempt to deploy ASAN builds to the stress cluster hit IMPALA-4544. Taras Bobrovytsky ran the same stress test again for longer (with a non-ASAN build) and hasn't hit the problem again.

        Michael Ho added a comment -

        Unable to reproduce this problem with either ASAN or non-ASAN builds. The Kudu client was in flux, so the problem may have been fixed already.

        Sailesh Mukil added a comment -

        Saw this issue again in a test run.

        Jim Apple added a comment -

        Talked out of band; Michael Ho thinks what Sailesh saw was not this issue. Still triaging.


          People

          • Assignee: Michael Ho
          • Reporter: Taras Bobrovytsky
          • Votes: 0
          • Watchers: 6
