IMPALA-6330

impalad crash with --memory_maintenance_sleep_time_ms=1

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: Backend

    Description

      In a typical development environment (Ubuntu 16.04), I'm seeing the following:

      $ bin/start-impala-cluster.py --impalad_args="--memory_maintenance_sleep_time_ms=1"
      $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;'
      ...
      Error communicating with impalad: TSocket read 0 bytes
      ...
      # # CRASH!
      

      I originally saw this in an atypical environment (Docker), and the reproduction above is adapted from tests/custom_cluster/test_mem_reservations.py, which was failing in that environment. I was able to reproduce it on the development machine by tuning the timing.

      The stack trace I see is:

      (gdb) bt
      #0  0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
      #1  0x00007fe230df402a in __GI_abort () at abort.c:89
      #2  0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
      #3  0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
      #4  0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
      #5  0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>)
          at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
      #6  0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63
      #7  0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...)
          at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137
      #8  0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) ()
      #9  0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132
      #10 0x00007fe2396d74ce in impala::AggregateMemoryMetrics::Refresh() () at /home/philip/src/Impala/be/src/util/memory-metrics.cc:141
      #11 0x00007fe239cea7c8 in MemoryMaintenanceThread() () at /home/philip/src/Impala/be/src/common/init.cc:154
      #12 0x00007fe239cefd25 in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_ptr=...) at /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:112
      #13 0x00007fe2399c8122 in boost::function0<void>::operator()() const (this=0x7fe1dc74cce0) at /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
      #14 0x00007fe2397555c1 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name="memory-maintenance-thread", category="common", functor=..., thread_started=0x7fffca2710a0)
          at /home/philip/src/Impala/be/src/util/thread.cc:352
      #15 0x00007fe23975ed38 in boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) (this=0x7987dc0, f=@0x7987db8: 0x7fe2397552a2 <impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*)>, a=...) at /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:457
      #16 0x00007fe23975ec7b in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0x7987db8) at /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20
      #17 0x00007fe23975ec3e in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0x7987c00) at /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/thread/detail/thread.hpp:116
      #18 0x00000000008d059a in thread_proxy ()
      #19 0x00007fe23118e6ba in start_thread (arg=0x7fe1dc74d700) at pthread_create.c:333
      #20 0x00007fe230ec43dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

      After spending some time with this in gdb, I don't think the problem is in boost::trim() or in locale handling itself. If it were that simple, it would fail far more regularly. I've added a unit test for ParseSmaps, which passes without trouble.

      I'm going fishing for it; wish me luck! My best guess is an interaction between BufferPool::Maintenance() and the usage of those buffers. I'm going to see if TSAN or ASAN builds help me out.

      tarmstrong, I assume you'll be curious about this one.

            People

              Assignee: Unassigned
              Reporter: Philip Martin