Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 1.0
Description
Repro:
./bin/start-impala-cluster.py -s3 --wait
killall statestored
#0  0x00007f250419a425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f250419db8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f2504aec69d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f2504aea846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f2504aea873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f2504aea96e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x0000000000b07c64 in apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=..., buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:42
#7  0x0000000000b06c1f in apache::thrift::transport::TBufferBase::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TBufferTransports.h:82
#8  0x0000000000b32ec5 in apache::thrift::transport::TBufferedTransport::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TBufferTransports.h:279
#9  0x0000000000b8839b in apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferedTransport, apache::thrift::transport::TBufferBase>::readAll_virt (this=0x391bd60, buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TVirtualTransport.h:99
#10 0x0000000000b25c95 in apache::thrift::transport::TTransport::readAll (this=0x391bd60, buf=0x7f24ef275940 "", len=4) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:126
#11 0x0000000000b89881 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readI32 (this=0x48b1f40, i32=@0x7f24ef2759a0: -282633740) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TBinaryProtocol.tcc:375
#12 0x0000000000b8906b in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin (this=0x48b1f40, name=..., messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TBinaryProtocol.tcc:206
#13 0x0000000000b8872c in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>, apache::thrift::protocol::TProtocolDefaults>::readMessageBegin_virt (this=0x48b1f40, name=..., messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TVirtualProtocol.h:432
#14 0x0000000000b25fdc in apache::thrift::protocol::TProtocol::readMessageBegin (this=0x48b1f40, name=..., messageType=@0x7f24ef275ae8: 0, seqid=@0x7f24ef275ae4: 0) at /home/lskuff/dev/Impala/thirdparty/thrift-0.9.0/build/include/thrift/protocol/TProtocol.h:529
#15 0x0000000000e19a90 in impala::StateStoreServiceClient::recv_RegisterSubscriber (this=0x48b1f00, _return=...) at /home/lskuff/dev/Impala/be/generated-sources/gen-cpp/StateStoreService.cpp:204
#16 0x0000000000e1989e in impala::StateStoreServiceClient::RegisterSubscriber (this=0x48b1f00, _return=..., params=...) at /home/lskuff/dev/Impala/be/generated-sources/gen-cpp/StateStoreService.cpp:180
#17 0x0000000000d49123 in impala::StateStoreSubscriber::Register (this=0x2b4cf20) at /home/lskuff/dev/Impala/be/src/statestore/state-store-subscriber.cc:118
#18 0x0000000000d49735 in impala::StateStoreSubscriber::RecoveryModeChecker (this=0x2b4cf20) at /home/lskuff/dev/Impala/be/src/statestore/state-store-subscriber.cc:160
#19 0x0000000000d53808 in boost::_mfi::mf0<void, impala::StateStoreSubscriber>::operator() (this=0x3fa11c8, p=0x2b4cf20) at /usr/include/boost/bind/mem_fn_template.hpp:49
#20 0x0000000000d53778 in boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> >::operator()<boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list0> (this=0x3fa11d8, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
#21 0x0000000000d536fd in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> > >::operator() (this=0x3fa11c8) at /usr/include/boost/bind/bind_template.hpp:20
#22 0x0000000000d5351c in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::StateStoreSubscriber>, boost::_bi::list1<boost::_bi::value<impala::StateStoreSubscriber*> > > >::run (this=0x3fa1040) at /usr/include/boost/thread/detail/thread.hpp:61
#23 0x00007f25060cace9 in thread_proxy () from /usr/lib/libboost_thread.so.1.46.1
#24 0x00007f2505ea8e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#25 0x00007f2504257cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#26 0x0000000000000000 in ?? ()
Hit this while running:
experiments/test_process_failures.py:95: TestProcessFailures.test_restart_statestore[exec_option: {'disable_codegen': False, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] FAILED
mini-impala-cluster: no process found
impalad: no process found
statestored: no process found

==================== FAILURES ====================
____ TestProcessFailures.test_restart_statestore[exec_option: {'disable_codegen': False, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] ____
experiments/test_process_failures.py:102: in test_restart_statestore
>   impalad.service.wait_for_metric_value('statestore-subscriber.connected', 0, timeout=30)
common/impala_service.py:60: in wait_for_metric_value
>   value = self.get_metric_value(metric_name)
common/impala_service.py:53: in get_metric_value
>   return json.loads(self._read_debug_webpage('jsonmetrics'))[metric_name]
common/impala_service.py:50: in _read_debug_webpage
>   assert 0, 'Debug webpage did not become available in expected time.'
E   AssertionError: Debug webpage did not become available in expected time.
-------------------- Captured log --------------------
impala_cluster.py 43 INFO Found 3 impalad processes and 1 statestored processes
impala_service.py 59 INFO Getting metric: statestore.live-backends from lskuff-T420s:25010
impala_service.py 62 INFO Metric 'statestore.live-backends' has reach desired value: 3
impala_cluster.py 96 INFO Attempting to find PID for /home/lskuff/dev/Impala/be/build/debug/statestore/statestored
impala_cluster.py 117 INFO Killing: /home/lskuff/dev/Impala/be/build/debug/statestore/statestored (PID: 32186)
shell_util.py 27 DEBUG Executing: kill -9 32186
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 66 INFO Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
impala_service.py 67 INFO Sleeping 1s before next retry.
impala_service.py 59 INFO Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.
impala_service.py 48 INFO Debug webpage not yet available.

-------------------- Captured stdout --------------------
Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, Could not connect to localhost:21000
Connected to localhost:21000
Server version: impalad version 1.0 DEBUG (build 7ab6c4c5b41b3636b735991e64b358656a4c0d65)
Query: select 1
Query finished, fetching results ...
+---+
| 1 |
+---+
| 1 |
+---+
Returned 1 row(s) in 0.11s
Starting State Store with logging to /tmp/statestored.out
Starting ImpalaD 0 logging to /tmp/impalad.node0.out
Starting ImpalaD 1 logging to /tmp/impalad.node1.out
Starting ImpalaD 2 logging to /tmp/impalad.node2.out
Cluster not yet available. Sleeping...
Cluster not yet available. Sleeping...
Cluster not yet available. Sleeping...
Cluster not yet available. Sleeping...
Cluster not yet available. Sleeping...
ImpalaD Cluster Running with 3 nodes.

-------------------- Captured stderr --------------------
mini-impala-cluster: no process found
impalad: no process found
statestored: no process found
mini-impala-cluster: no process found
impalad: no process found
statestored: no process found
MainThread: Found 3 impalad processes and 1 statestored processes
MainThread: Getting metric: statestore.live-backends from lskuff-T420s:25010
MainThread: Metric 'statestore.live-backends' has reach desired value: 3
MainThread: Attempting to find PID for /home/lskuff/dev/Impala/be/build/debug/statestore/statestored
MainThread: Killing: /home/lskuff/dev/Impala/be/build/debug/statestore/statestored (PID: 32186)
MainThread: Executing: kill -9 32186
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Waiting for metric value 'statestore-subscriber.connected'=0. Current value: 1
MainThread: Sleeping 1s before next retry.
MainThread: Getting metric: statestore-subscriber.connected from lskuff-T420s:25000
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
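
Note on the backtrace: frames #5 and #4 show __cxa_throw leading directly into std::terminate(), which is what the C++ runtime does when a thrown exception finds no handler. Here the throw originates in the RegisterSubscriber RPC (frames #15 to #17), called from the RecoveryModeChecker thread. The following is only a minimal sketch of guarding such a call so that a failed RPC becomes a retryable error instead of a process abort; it is not the fix that was actually applied. TransportError, RegisterResult, TryRegister and RegisterWithRetry are hypothetical names introduced for illustration, and the real Thrift client is replaced by a plain callable.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Thrift transport exception thrown in frame #6.
struct TransportError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Outcome of a single registration attempt.
struct RegisterResult {
  bool ok;
  std::string error;  // empty when ok is true
};

// Invokes `rpc` and converts any std::exception into a failed result, so the
// calling thread never lets the exception escape and reach std::terminate().
RegisterResult TryRegister(const std::function<void()>& rpc) {
  try {
    rpc();
    return {true, ""};
  } catch (const std::exception& e) {
    return {false, e.what()};
  }
}

// Retries the RPC up to max_attempts times. Returns the 1-based attempt that
// succeeded, or -1 if every attempt failed.
int RegisterWithRetry(const std::function<void()>& rpc, int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (TryRegister(rpc).ok) return attempt;
  }
  return -1;
}
```

A recovery loop built this way would typically sleep between attempts; that is left out to keep the sketch short.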