IMPALA-3519

Crash in stress test: impala::Tuple::DeepCopyVarlenData

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.6.0
    • Component/s: Backend
    • Labels:

      Description

      This might be related to IMPALA-3485.

      (gdb) bt
      #0  0x00000030f8832625 in raise () from /lib64/libc.so.6
      #1  0x00000030f8833e05 in abort () from /lib64/libc.so.6
      #2  0x00007f77f8268c55 in os::abort(bool) () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #3  0x00007f77f83eacd7 in VMError::report_and_die() () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #4  0x00007f77f826db6f in JVM_handle_linux_signal () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x00000030f8889710 in memcpy () from /lib64/libc.so.6
      #7  0x00000000012c7c8a in impala::Tuple::DeepCopyVarlenData (this=0x7f67000c7aa0, desc=..., pool=0x7f6a157913e8) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/tuple.cc:89
      #8  0x00000000012c7ae1 in impala::Tuple::DeepCopy (this=0x7f6ec1fc8020, dst=0x7f67000c7aa0, desc=..., pool=0x7f6a157913e8) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/tuple.cc:78
      #9  0x00000000012c7a7a in impala::Tuple::DeepCopy (this=0x7f6ec1fc8020, desc=..., pool=0x7f6a157913e8) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/tuple.cc:69
      #10 0x000000000188d774 in impala::DataStreamSender::Channel::AddRow (this=0x7e248700, row=0x7f7525e67dc8) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/data-stream-sender.cc:241
      #11 0x00000000018907dd in impala::DataStreamSender::Send (this=0x7f6855d5d000, state=0x7f689df22000, batch=0x1f061af40, eos=false) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/data-stream-sender.cc:446
      #12 0x000000000182372f in impala::PlanFragmentExecutor::OpenInternal (this=0x7f6e568c1970) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/plan-fragment-executor.cc:374
      #13 0x0000000001822c2f in impala::PlanFragmentExecutor::Open (this=0x7f6e568c1970) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/plan-fragment-executor.cc:327
      #14 0x0000000001401a22 in impala::FragmentMgr::FragmentExecState::Exec (this=0x7f6e568c1700) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/service/fragment-exec-state.cc:54
      #15 0x00000000013f9227 in impala::FragmentMgr::FragmentThread (this=0xd5c1cc0, fragment_instance_id=...) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/service/fragment-mgr.cc:86
      #16 0x00000000013fcd84 in boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>::operator() (this=0xcdf0150, p=0xd5c1cc0, a1=...) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
      #17 0x00000000013fcb41 in boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> >::operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> (this=0xcdf0160, f=..., a=...)
          at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
      #18 0x00000000013fc46b in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >::operator() (this=0xcdf0150)
          at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #19 0x00000000013fbdd4 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke
          (function_obj_ptr=...) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
      #20 0x000000000121d91c in boost::function0<void>::operator() (this=0x7f713e6a0c60) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
      #21 0x00000000014b9bc1 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7f7352c8f960) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/util/thread.cc:315
      #22 0x00000000014c0724 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x7f737b53bbc0, f=@0x7f737b53bbb8, a=...)
          at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
      #23 0x00000000014c0667 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x7f737b53bbb8) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #24 0x00000000014c062a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x7f737b53ba00) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
      #25 0x00000000018d925a in thread_proxy ()
      #26 0x00000030f8c079d1 in start_thread () from /lib64/libpthread.so.0
      #27 0x00000030f88e88fd in clone () from /lib64/libc.so.6
      

      I've been digging around in the core dump and found some more info.

      The rows appear to come from a Parquet scan of the customer table.

      The crash occurs because row 953 in the batch is corrupt. The first string slot in the tuple being copied contains completely bogus data:

      $54 = (impala::StringValue *) 0x7f67000c7ab0
      (gdb) p *string_v
      $55 = {static MAX_LENGTH = 1073741824, ptr = 0x726f727245205d39 <Address 0x726f727245205d39 out of bounds>, len = 1869768224, static LLVM_CLASS_NAME = 0x2dce1c3 "struct.impala::StringValue"}
      

      I traced the code a little further and realised that this doesn't make much sense: Tuple::DeepCopy calls memcpy() to copy the tuple from src to dst before it copies the string slots, so at this point the string slot in dst should still match the one in src. However, the data in the source and destination differs:

      (gdb) p *((StringValue*)(this + 16))
      $113 = {static MAX_LENGTH = 1073741824, ptr = 0x0, len = 0, static LLVM_CLASS_NAME = 0x2dce1c3 "struct.impala::StringValue"}
      (gdb) p *((StringValue*)(dst + 16))
      $114 = {static MAX_LENGTH = 1073741824, ptr = 0x726f727245205d39 <Address 0x726f727245205d39 out of bounds>, len = 1869768224, static LLVM_CLASS_NAME = 0x2dce1c3 "struct.impala::StringValue"}
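
      The expectation behind this comparison can be modelled in a few lines. The following is only a sketch under stated assumptions, not Impala's code: it assumes a StringValue is laid out as an 8-byte pointer, a 4-byte length and 4 bytes of padding, uses the slot offset 16 from the gdb session, and invents a 32-byte tuple size and a stray write (`GARBAGE!`) purely for illustration.

```python
import struct

# Assumed layout: 8-byte pointer, 4-byte length, 4 bytes of padding.
STRING_VALUE = struct.Struct("<QI4x")
SLOT_OFFSET = 16   # slot offset used in the gdb session
TUPLE_SIZE = 32    # hypothetical fixed-length tuple size


def string_slot(tuple_bytes, offset=SLOT_OFFSET):
    """Return (ptr, len) of the string slot at the given byte offset."""
    return STRING_VALUE.unpack_from(tuple_bytes, offset)


def copy_fixed_part(src):
    """Model of the memcpy() step: a byte-for-byte copy of the tuple."""
    return bytearray(src)


src = bytearray(TUPLE_SIZE)      # all zero, so the slot is ptr=0, len=0
dst = copy_fixed_part(src)
slot_after_memcpy = string_slot(dst)

# Hypothetical stray write from elsewhere that lands on dst afterwards.
dst[SLOT_OFFSET:SLOT_OFFSET + 8] = b"GARBAGE!"
slot_after_stray_write = string_slot(dst)

print(slot_after_memcpy == string_slot(src))       # True
print(slot_after_stray_write == string_slot(src))  # False
```

      In this model the slots can only differ if something writes to dst after the copy, which is why the mismatch in the core dump points towards memory being overwritten rather than towards bad source data.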
      

      I tried printing dst as a string, and it has somehow been overwritten with garbage that resembles log text:

      (gdb) p ((char*)dst)
      $111 = 0x7f67000c7aa0 "time-state.cc:209] Error from quBUILDING"
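
      One way to cross-check this is to reinterpret the bogus ptr and len values from the StringValue as raw little-endian bytes and compare them with the text at dst + 16. This is a small sketch that assumes a little-endian x86-64 layout with an 8-byte ptr followed by a 4-byte len; the variable names are illustrative.

```python
import struct

# Values copied from the gdb output for the corrupt StringValue.
bogus_ptr = 0x726f727245205d39
bogus_len = 1869768224

# Text gdb printed when dst was interpreted as a C string.
dst_text = b"time-state.cc:209] Error from quBUILDING"

# Assumed layout: 8-byte pointer followed by a 4-byte length, little-endian.
slot_bytes = struct.pack("<QI", bogus_ptr, bogus_len)

print(slot_bytes)
print(dst_text[16:28])
print(slot_bytes == dst_text[16:28])
```

      If the bytes match, the "pointer" and "length" are simply ASCII characters from the middle of the text, which is consistent with a string having been written over the tuple.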
      

      This is strange, because dst appears to point into a valid MemPool chunk:

      (gdb) p *(pool->chunks_._M_impl._M_start)
      $125 = {data = 0x7f67000c7000 "", size = 4096, allocated_bytes = 2752}
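
      The address arithmetic behind that statement can be checked directly. The sketch below assumes that data is the chunk's base address, size its capacity in bytes and allocated_bytes the number of bytes already handed out, and it only looks at the first chunk in pool->chunks_, which is all the gdb output shows.

```python
# Addresses and sizes copied from the gdb session.
dst = 0x7f67000c7aa0
chunk_data = 0x7f67000c7000
chunk_size = 4096
allocated_bytes = 2752

# Offset of dst from the start of the chunk.
offset = dst - chunk_data

# dst is "valid" in this sense if it falls inside the allocated region.
inside_allocated = 0 <= offset < allocated_bytes <= chunk_size

# Bytes between dst and the end of the allocated region.
bytes_to_end = allocated_bytes - offset

print(offset, inside_allocated, bytes_to_end)
```

      Under these assumptions dst sits at the very end of the allocated part of the chunk, which would fit it being the most recent allocation from this pool, though the core dump alone does not prove that.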
      

      I'm going to gather the cores on impala-desktop and think about this some more.

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong