Description
Sometimes, during a LEAKCHECK build, the TS gets a SIGSEGV from within tcmalloc's GetStackTrace() call. For example, here it is in a Java unit test:
2014-09-23 15:42:05,296 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] *** Aborted at 1411512125 (unix time) try "date -d @1411512125" if you are using GNU date *** 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] PC: @ 0x6ed17f GetStackTrace() 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] *** SIGSEGV (@0x7faf9d010008) received by PID 28759 (TID 0x7faf9d00f700) from PID 18446744072048672776; stack trace: *** 2014-09-23 15:42:05,299 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32aae0f500 (unknown) 2014-09-23 15:42:05,300 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x6ed17f GetStackTrace() 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x6e70c9 MallocHook_GetCallerStackTrace 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x6f01c6 NewHook() 2014-09-23 15:42:05,304 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x6e6bf6 MallocHook::InvokeNewHookSlow() 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x119eca3 tc_new 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32ad29c3c9 (unknown) 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32ad29cde5 (unknown) 2014-09-23 15:42:05,306 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32ad29cf33 (unknown) 2014-09-23 15:42:05,308 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x822c32 kudu::log::Log::CreatePlaceholderSegment() 2014-09-23 15:42:05,311 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x823042 kudu::log::Log::PreAllocateNewSegment() 2014-09-23 15:42:05,315 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x82ae43 kudu::log::Log::SegmentAllocationTask::Run() 2014-09-23 15:42:05,318 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x10f125c kudu::FutureTask::Run() 2014-09-23 15:42:05,321 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x10f879f kudu::ThreadPool::DispatchThread() 2014-09-23 15:42:05,323 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x10f571f kudu::Thread::SuperviseThread() 2014-09-23 15:42:05,324 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32aae07851 (unknown) 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x32aaae894d (unknown) 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] @ 0x0 (unknown)
And here's another example from linked_list-test:
*** Aborted at 1405495265 (unix time) try "date -d @1405495265" if you are using GNU date *** PC: @ 0x72032f GetStackTrace() *** SIGSEGV (@0x7fffe9d08d08) received by PID 9817 (TID 0x7fc3a5bce800) from PID 18446744073337343240; stack trace: *** @ 0x3918c0f4a0 (unknown) at ??:0 @ 0x72032f GetStackTrace() at /home/adar/kudu/thirdparty/gperftools-2.1/src/stacktrace_x86-inl.h:325 @ 0x71a279 MallocHook_GetCallerStackTrace at /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:666 @ 0x723376 NewHook() at /home/adar/kudu/thirdparty/gperftools-2.1/src/heap-checker.cc:575 @ 0x719da6 MallocHook::InvokeNewHookSlow() at /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:525 @ 0xa4ed43 tc_new at /home/adar/kudu/thirdparty/gperftools-2.1/src/tcmalloc.cc:1607 @ 0x30d0c9c3c9 (unknown) at ??:0 @ 0x30d0c9cde5 (unknown) at ??:0 @ 0x30d0c9cf33 (unknown) at ??:0 @ 0x77afb4 kudu::master::MasterServiceProxy::MasterServiceProxy() at /home/adar/kudu/src/master/master.proxy.cc:16 @ 0x6e0eff kudu::ExternalMiniCluster::master_proxy() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:231 @ 0x6df2f7 kudu::ExternalMiniCluster::WaitForTabletServerCount() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:204 @ 0x6de6f9 kudu::ExternalMiniCluster::Start() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:117 @ 0x6cd5d5 kudu::LinkedListTest::RestartCluster() at /home/adar/kudu/src/integration-tests/linked_list-test.cc:120 @ 0x6c98e8 kudu::LinkedListTest_TestLoadAndVerify_Test::TestBody() at /home/adar/kudu/src/integration-tests/linked_list-test.cc:483
Both Todd and I tried to fix this by switching gperftools over to using libunwind for stack traces. Both times a bunch of unit tests slowed down considerably, leading us to abandon our efforts. You can see the gerrits here:
http://gerrit.sjc.cloudera.com:8080/#/c/3477/
http://gerrit.sjc.cloudera.com:8080/#/c/3520/
Todd said that Impala has a hack to make the gperftools unwinder less dangerous. That might be worth exploring. Or perhaps it's a known issue fixed in newer versions of gperftools. https://code.google.com/p/gperftools/issues/detail?id=66 and https://code.google.com/p/gperftools/issues/detail?id=547 seem related, and were fixed as part of gperftools-2.2 (we're still on 2.1).
Attachments
Issue Links
- duplicates
-
KUDU-539 linked_list-test dies trying to print a pb
- Resolved