Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-517

SIGSEGV in gperftools GetStackTrace

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Cannot Reproduce
    • M4.5
    • n/a
    • tserver
    • None

    Description

      Sometimes, during a LEAKCHECK build, the TS gets a SIGSEGV from within tcmalloc's GetStackTrace() call. For example, here it is in a Java unit test:

      2014-09-23 15:42:05,296 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] *** Aborted at 1411512125 (unix time) try "date -d @1411512125" if you are using GNU date ***
      2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] PC: @           0x6ed17f GetStackTrace()
      2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)] *** SIGSEGV (@0x7faf9d010008) received by PID 28759 (TID 0x7faf9d00f700) from PID 18446744072048672776; stack trace: ***
      2014-09-23 15:42:05,299 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32aae0f500 (unknown)
      2014-09-23 15:42:05,300 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x6ed17f GetStackTrace()
      2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x6e70c9 MallocHook_GetCallerStackTrace
      2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x6f01c6 NewHook()
      2014-09-23 15:42:05,304 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x6e6bf6 MallocHook::InvokeNewHookSlow()
      2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @          0x119eca3 tc_new
      2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32ad29c3c9 (unknown)
      2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32ad29cde5 (unknown)
      2014-09-23 15:42:05,306 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32ad29cf33 (unknown)
      2014-09-23 15:42:05,308 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x822c32 kudu::log::Log::CreatePlaceholderSegment()
      2014-09-23 15:42:05,311 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x823042 kudu::log::Log::PreAllocateNewSegment()
      2014-09-23 15:42:05,315 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @           0x82ae43 kudu::log::Log::SegmentAllocationTask::Run()
      2014-09-23 15:42:05,318 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @          0x10f125c kudu::FutureTask::Run()
      2014-09-23 15:42:05,321 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @          0x10f879f kudu::ThreadPool::DispatchThread()
      2014-09-23 15:42:05,323 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @          0x10f571f kudu::Thread::SuperviseThread()
      2014-09-23 15:42:05,324 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32aae07851 (unknown)
      2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @       0x32aaae894d (unknown)
      2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]     @                0x0 (unknown)
      

      And here's another example from linked_list-test:

          *** Aborted at 1405495265 (unix time) try "date -d @1405495265" if you are using GNU date ***
          PC: @           0x72032f GetStackTrace()
          *** SIGSEGV (@0x7fffe9d08d08) received by PID 9817 (TID 0x7fc3a5bce800) from PID 18446744073337343240; stack trace: ***
              @       0x3918c0f4a0 (unknown) at ??:0
              @           0x72032f GetStackTrace() at /home/adar/kudu/thirdparty/gperftools-2.1/src/stacktrace_x86-inl.h:325
              @           0x71a279 MallocHook_GetCallerStackTrace at /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:666
              @           0x723376 NewHook() at /home/adar/kudu/thirdparty/gperftools-2.1/src/heap-checker.cc:575
              @           0x719da6 MallocHook::InvokeNewHookSlow() at /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:525
              @           0xa4ed43 tc_new at /home/adar/kudu/thirdparty/gperftools-2.1/src/tcmalloc.cc:1607
              @       0x30d0c9c3c9 (unknown) at ??:0
              @       0x30d0c9cde5 (unknown) at ??:0
              @       0x30d0c9cf33 (unknown) at ??:0
              @           0x77afb4 kudu::master::MasterServiceProxy::MasterServiceProxy() at /home/adar/kudu/src/master/master.proxy.cc:16
              @           0x6e0eff kudu::ExternalMiniCluster::master_proxy() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:231
              @           0x6df2f7 kudu::ExternalMiniCluster::WaitForTabletServerCount() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:204
              @           0x6de6f9 kudu::ExternalMiniCluster::Start() at /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:117
              @           0x6cd5d5 kudu::LinkedListTest::RestartCluster() at /home/adar/kudu/src/integration-tests/linked_list-test.cc:120
              @           0x6c98e8 kudu::LinkedListTest_TestLoadAndVerify_Test::TestBody() at /home/adar/kudu/src/integration-tests/linked_list-test.cc:483
      

      Both Todd and I tried to fix this by switching gperftools over to using libunwind for stack traces. Both times a bunch of unit tests slowed down considerably, leading us to abandon our efforts. You can see the gerrits here:
      http://gerrit.sjc.cloudera.com:8080/#/c/3477/
      http://gerrit.sjc.cloudera.com:8080/#/c/3520/

      Todd said that Impala has a hack to make the gperftools unwinder less dangerous. That might be worth exploring. Or perhaps it's a known issue fixed in newer versions of gperftools. https://code.google.com/p/gperftools/issues/detail?id=66 and https://code.google.com/p/gperftools/issues/detail?id=547 seem related, and were fixed as part of gperftools-2.2 (we're still on 2.1).

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              adar Adar Dembo
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: