IMPALA-312

Impala daemons die if the statestore goes down


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 1.0
    • Fix Version/s: Impala 1.0
    • Component/s: None
    • Labels: None

    Description

      Repro:

      ./bin/start-impala-cluster.py -s3 --wait
      killall statestored
      
      #0  0x00007f250419a425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007f250419db8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00007f2504aec69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #3  0x00007f2504aea846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #4  0x00007f2504aea873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #5  0x00007f2504aea96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #6  0x0000000000b07c64 in apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=..., buf=0x7f24ef275940 "", len=4)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:42
      #7  0x0000000000b06c1f in apache::thrift::transport::TBufferBase::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TBufferTransports.h:82
      #8  0x0000000000b32ec5 in apache::thrift::transport::TBufferedTransport::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TBufferTransports.h:279
      #9  0x0000000000b8839b in apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferedTransport, apache::thrift::transport::TBufferBase>::readAll_virt (
          this=0x391bd60, buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TVirtualTransport.h:99
      #10 0x0000000000b25c95 in apache::thrift::transport::TTransport::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:126
      #11 0x0000000000b89881 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readI32 (this=0x48b1f40, i32=@0x7f24ef2759a0: -282633740)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TBinaryProtocol.tcc:375
      #12 0x0000000000b8906b in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin (this=0x48b1f40, name=..., 
          messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TBinaryProtocol.tcc:206
      #13 0x0000000000b8872c in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>, apache::thrift::protocol::TProtocolDefaults>::readMessageBegin_virt (this=0x48b1f40, name=..., messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TVirtualProtocol.h:432
      #14 0x0000000000b25fdc in apache::thrift::protocol::TProtocol::readMessageBegin (this=0x48b1f40, name=..., messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0)
          at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TProtocol.h:529
      #15 0x0000000000e19a90 in impala::StateStoreServiceClient::recv_RegisterSubscriber (this=0x48b1f00, _return=...)
          at /home/lskuff/dev/Impala/be/generated-sources/gen-cpp/StateStoreService.cpp:204
      #16 0x0000000000e1989e in impala::StateStoreServiceClient::RegisterSubscriber (this=0x48b1f00, _return=..., params=...)
          at /home/lskuff/dev/Impala/be/generated-sources/gen-cpp/StateStoreService.cpp:180
      #17 0x0000000000d49123 in impala::StateStoreSubscriber::Register (this=0x2b4cf20) at /home/lskuff/dev/Impala/be/src/statestore/state-store-subscriber.cc:118
      #18 0x0000000000d49735 in impala::StateStoreSubscriber::RecoveryModeChecker (this=0x2b4cf20) at /home/lskuff/dev/Impala/be/src/statestore/state-store-subscriber.cc:160
      #19 0x0000000000d53808 in boost::_mfi::mf0<void, impala::StateStoreSubscriber>::operator() (this=0x3fa11c8, p=0x2b4cf20) at /usr/include/boost/bind/mem_fn_template.hpp:49
      #20 0x0000000000d53778 in boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> >::operator()<boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list0>
          (this=0x3fa11d8, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
      #21 0x0000000000d536fd in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> > >::operator() (this=0x3fa11c8) at /usr/include/boost/bind/bind_template.hpp:20
      #22 0x0000000000d5351c in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> > > >::run (this=0x3fa1040) at /usr/include/boost/thread/detail/thread.hpp:61
      #23 0x00007f25060cace9 in thread_proxy () from /usr/lib/libboost_thread.so.1.46.1
      #24 0x00007f2505ea8e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #25 0x00007f2504257cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
      #26 0x0000000000000000 in ?? ()
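
Frames #15 to #18 suggest that the Thrift read inside RegisterSubscriber threw while RecoveryModeChecker was trying to re-register, and frames #0 to #5 suggest nothing on that thread caught the exception, so std::terminate aborted the process. The sketch below illustrates, in Python rather than the daemon's C++, a recovery loop that treats a transport failure as retryable. `TransportError`, `recovery_loop`, and `make_flaky_register` are hypothetical names used only for illustration; this is not the actual fix.

```python
import time


class TransportError(Exception):
    """Hypothetical stand-in for a Thrift transport exception."""


def recovery_loop(register, max_attempts, sleep_s=0.0, sleep=time.sleep):
    """Call register() until it succeeds or max_attempts is exhausted.

    A TransportError is read as "statestore still down" and retried,
    instead of being allowed to escape the thread.
    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            register()
            return attempt
        except TransportError:
            # Statestore is unreachable: back off and try again.
            sleep(sleep_s)
    return None


def make_flaky_register(failures):
    """Build a register() that raises `failures` times, then succeeds."""
    state = {"left": failures}

    def register():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransportError("statestore unreachable")

    return register
```

With a register call that fails twice, `recovery_loop(make_flaky_register(2), max_attempts=5)` succeeds on the third attempt; the loop keeps the subscriber alive in the meantime rather than letting the exception propagate.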
      
      

      Hit this while running:

      experiments/test_process_failures.py:95: TestProcessFailures.test_restart_statestore[exec_option: {'disable_codegen': False, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] FAILED
      mini-impala-cluster: no process found
      impalad: no process found
      statestored: no process found
      
      
      ===================================================================================== FAILURES =====================================================================================
      _________________ TestProcessFailures.test_restart_statestore[exec_option: {'disable_codegen': False, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] __________________
      experiments/test_process_failures.py:102: in test_restart_statestore
      >     impalad.service.wait_for_metric_value('statestore-subscriber.connected', 0, timeout=30)
      common/impala_service.py:60: in wait_for_metric_value
      >       value = self.get_metric_value(metric_name)
      common/impala_service.py:53: in get_metric_value
      >     return json.loads(self._read_debug_webpage('jsonmetrics'))[metric_name]
      common/impala_service.py:50: in _read_debug_webpage
      >     assert 0, 'Debug webpage did not become available in expected time.'
      E     AssertionError: Debug webpage did not become available in expected time.
      ----------------------------------------------------------------------------------- Captured log -----------------------------------------------------------------------------------
      impala_cluster.py           43 INFO     Found 3 impalad processes and 1 statestored processes
      impala_service.py           59 INFO     Getting metric: statestore.live-backends from lskuff-T420s:25010
      impala_service.py           62 INFO     Metric 'statestore.live-backends' has reach desired value: 3
      impala_cluster.py           96 INFO     Attempting to find PID for /home/lskuff/dev/Impala/be/build/debug/statestore/statestored
      impala_cluster.py          117 INFO     Killing: /home/lskuff/dev/Impala/be/build/debug/statestore/statestored (PID: 32186)
      shell_util.py               27 DEBUG    Executing: kill -9 32186
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           66 INFO     Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      impala_service.py           67 INFO     Sleeping 1s before next retry.
      impala_service.py           59 INFO     Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      impala_service.py           48 INFO     Debug webpage not yet available.
      --------------------------------------------------------------------------------- Captured stdout ----------------------------------------------------------------------------------
      Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
      Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
      Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
      Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
      Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
      Connected to localhost:21000
      Server version: impalad version 1.0 DEBUG (build 7ab6c4c5b41b3636b735991e64b358656a4c0d65)
      Query: select 1
      Query finished, fetching results ...
      +---+
      | 1 |
      +---+
      | 1 |
      +---+
      Returned 1 row(s) in 0.11s
      Starting State Store with logging to /tmp/statestored.out
      Starting ImpalaD 0 logging to /tmp/impalad.node0.out
      Starting ImpalaD 1 logging to /tmp/impalad.node1.out
      Starting ImpalaD 2 logging to /tmp/impalad.node2.out
      Cluster not yet available. Sleeping...
      Cluster not yet available. Sleeping...
      Cluster not yet available. Sleeping...
      Cluster not yet available. Sleeping...
      Cluster not yet available. Sleeping...
      ImpalaD Cluster Running with 3 nodes.
      --------------------------------------------------------------------------------- Captured stderr ----------------------------------------------------------------------------------
      mini-impala-cluster: no process found
      impalad: no process found
      statestored: no process found
      mini-impala-cluster: no process found
      impalad: no process found
      statestored: no process found
      MainThread: Found 3 impalad processes and 1 statestored processes
      MainThread: Getting metric: statestore.live-backends from lskuff-T420s:25010
      MainThread: Metric 'statestore.live-backends' has reach desired value: 3
      MainThread: Attempting to find PID for /home/lskuff/dev/Impala/be/build/debug/statestore/statestored
      MainThread: Killing: /home/lskuff/dev/Impala/be/build/debug/statestore/statestored (PID: 32186)
      MainThread: Executing: kill -9 32186
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
      MainThread: Sleeping 1s before next retry.
      MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
      MainThread: Debug webpage not yet available.
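
In the log above, statestore-subscriber.connected stays at 1 for several polls and then the debug webpage stops responding, which appears to be the point where the impalad died. The test reports this only as a webpage timeout. A minimal sketch of a polling helper that separates "metric never reached the expected value" from "the process is no longer reachable" follows. `wait_for_metric`, `ServiceUnreachable`, and the injected `fetch_metrics` callable are hypothetical and are not part of the test framework shown in the traceback.

```python
import time


class ServiceUnreachable(Exception):
    """Hypothetical error: the debug webpage could not be read at all."""


def wait_for_metric(fetch_metrics, name, expected, timeout_s=30,
                    interval_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_metrics() until metrics[name] == expected.

    Returns True when the value is seen and False on timeout.
    Raises ServiceUnreachable if the metrics source refuses the
    connection, so a dead process is not mistaken for a slow metric.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            metrics = fetch_metrics()
        except ConnectionError as exc:
            raise ServiceUnreachable(
                "metrics endpoint unreachable while waiting for %r" % name
            ) from exc
        if metrics.get(name) == expected:
            return True
        sleep(interval_s)
    return False
```

Injecting `fetch_metrics` keeps the helper independent of how the webpage is actually read, and lets a test substitute a canned sequence of metric snapshots.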
      
      

          People

            Assignee: Alan Choi (alan@cloudera.com)
            Reporter: Lenni Kuff (lskuff)