Kudu / KUDU-2275

SIGSEGV due to bug in libunwind

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None

    Description

      Rarely, the kernel stack watchdog can cause a segfault due to a bug in libunwind.

      *** Aborted at 1516180006 (unix time) try "date -d @1516180006" if you are using GNU date ***
      PC: @ 0x8c94b4 (unknown)
      *** SIGSEGV (@0x7f27173e0000) received by PID 22279 (TID 0x7f270f87f700) from PID 389939200; stack trace: ***

      From a core file (produced from the minidump), the backtrace is:

      #0  access_mem (as=<optimized out>, addr=139805870391296, val=0x7f270f87bcc0, write=<optimized out>, arg=<optimized out>)
         at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Ginit.c:173
      #1  0x00000000008c8e02 in is_plt_entry (c=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:43
      #2  _ULx86_64_step (cursor=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:125
      #3  0x00000000008c412d in google::GetStackTrace (result=result@entry=0x292c0c8, max_depth=max_depth@entry=16, skip_count=0, skip_count@entry=2)
         at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/glog-0.3.5/src/stacktrace_libunwind-inl.h:78
      #4  0x0000000001a9be8c in Collect (skip_frames=2, this=0x292c0c0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:350
      #5  kudu::(anonymous namespace)::HandleStackTraceSignal (signum=<optimized out>) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:176
      #6  0x00007f2716854670 in _quicksort () from ./lib64/libc.so.6
      #7  0x0000000000000000 in ?? ()

      Note that addr = 139805870391296 = 0x7f27173e0000.
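The address comparison can be double-checked with a few lines of Python. This is only a sanity check on the numbers already quoted in the log and the backtrace; the variable names are made up for illustration. As a side observation, the "from PID" value in the SIGSEGV line equals the low 32 bits of the fault address, so it is probably not a real sender PID. That reading is an assumption and not something the log states.

```python
# Values copied from the crash log and the gdb backtrace in this report.
addr_from_gdb = 139805870391296        # addr= argument of access_mem (frame #0)
fault_addr_from_log = 0x7F27173E0000   # address in the "*** SIGSEGV (@...)" line
pid_field_from_log = 389939200         # the "from PID ..." value on the same line


def low_32_bits(value: int) -> int:
    """Return the low 32 bits of an integer."""
    return value & 0xFFFFFFFF


# The address access_mem dereferenced is exactly the faulting address.
print(hex(addr_from_gdb))
print(addr_from_gdb == fault_addr_from_log)

# Assumption: the "from PID" field is just the truncated fault address.
print(hex(pid_field_from_log))
print(low_32_bits(fault_addr_from_log) == pid_field_from_log)
```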

      The segfault happens because libunwind is accessing invalid memory it's supposed to have validated:

      /* validate address */
      const struct cursor *c = (const struct cursor *)arg;
      if (likely (c != NULL) && unlikely (c->validate)
          && unlikely (validate_mem (addr)))
          return -1;
      *val = *(unw_word_t *) addr;

      Others have seen this same problem before.

      A fix for this issue exists in libunwind commit 836c91c43d7a996028aa7e8d1f53630a6b8e7cbe. It is not in any libunwind release yet, so we could do one of the following:

      1. upgrade libunwind to 1.2 (most recent release) and patch in the fix
      2. upgrade to a snapshot containing the fix

      As a workaround, one can set --hung_task_check_interval_ms to a large value like 2^30, so the stack watchdog runs very rarely (the flag is a 32-bit signed integer, so the value cannot be much larger than that). The tradeoff is the effective loss of the stack watchdog, which can make debugging certain performance problems more difficult.
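A quick check of the suggested value, as a sketch: it confirms that 2^30 fits in a 32-bit signed integer and converts the interval to days (roughly 12.4). The helper name and constants are illustrative only; the flag name is the one quoted in this report, and the `--flag=value` spelling is assumed to be the usual gflags form.

```python
INT32_MAX = 2**31 - 1          # largest value a 32-bit signed flag can hold
MS_PER_DAY = 1000 * 60 * 60 * 24

interval_ms = 2**30            # value suggested in this report


def interval_in_days(ms: int) -> float:
    """Convert a millisecond interval to days."""
    return ms / MS_PER_DAY


# The suggested value fits; doubling it would overflow the flag's type.
print(interval_ms <= INT32_MAX)
print(2 * interval_ms <= INT32_MAX)

# How rarely the watchdog would run with this setting.
print(f"{interval_in_days(interval_ms):.2f} days between checks")

# The flag as it would be passed on the command line (assumed gflags syntax).
print(f"--hung_task_check_interval_ms={interval_ms}")
```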

            People

              Assignee: tlipcon Todd Lipcon
              Reporter: wdberkeley William Berkeley