Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.6.0
- None
- None
Description
Rarely, the kernel stack watchdog can cause a segfault due to a bug in libunwind.
*** Aborted at 1516180006 (unix time) try "date -d @1516180006" if you are using GNU date ***
PC: @ 0x8c94b4 (unknown)
*** SIGSEGV (@0x7f27173e0000) received by PID 22279 (TID 0x7f270f87f700) from PID 389939200; stack trace: ***
From a core file (produced from the minidump), the backtrace is:
#0 access_mem (as=<optimized out>, addr=139805870391296, val=0x7f270f87bcc0, write=<optimized out>, arg=<optimized out>) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Ginit.c:173
#1 0x00000000008c8e02 in is_plt_entry (c=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:43
#2 _ULx86_64_step (cursor=0x7f270f87c0e0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/libunwind-1.1a/src/x86_64/Gstep.c:125
#3 0x00000000008c412d in google::GetStackTrace (result=result@entry=0x292c0c8, max_depth=max_depth@entry=16, skip_count=0, skip_count@entry=2) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/thirdparty/src/glog-0.3.5/src/stacktrace_libunwind-inl.h:78
#4 0x0000000001a9be8c in Collect (skip_frames=2, this=0x292c0c0) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:350
#5 kudu::(anonymous namespace)::HandleStackTraceSignal (signum=<optimized out>) at /usr/src/debug/kudu-1.5.0-cdh5.13.1/src/kudu/util/debug-util.cc:176
#6 0x00007f2716854670 in _quicksort () from ./lib64/libc.so.6
#7 0x0000000000000000 in ?? ()
Note that addr = 139805870391296 = 0x7f27173e0000.
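The equality is easy to check mechanically. The sketch below does that, and also illustrates an observation that is not stated in the report: the "from PID 389939200" value in the log line equals the low 32 bits of the faulting address. That is what one would likely see if the logger printed si_pid for a SIGSEGV, since on Linux si_pid and si_addr share storage inside siginfo_t. The helper names low32 and print_fault_address_check are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit address. */
static uint32_t low32(uint64_t addr) {
    return (uint32_t)(addr & UINT64_C(0xffffffff));
}

/* Compares the two spellings of the address that appear in the report. */
static void print_fault_address_check(void) {
    const uint64_t addr_from_backtrace = UINT64_C(139805870391296); /* frame #0 */
    const uint64_t addr_from_sigsegv = UINT64_C(0x7f27173e0000);    /* log line */

    /* Same value, written in decimal and in hex. */
    assert(addr_from_backtrace == addr_from_sigsegv);

    printf("backtrace addr = 0x%" PRIx64 "\n", addr_from_backtrace);
    printf("SIGSEGV addr   = 0x%" PRIx64 "\n", addr_from_sigsegv);
    printf("low 32 bits    = %" PRIu32 "\n", low32(addr_from_sigsegv));
}
```

Calling print_fault_address_check() from a small main() is enough to confirm that the address in frame #0 is the one that triggered the SIGSEGV.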
The segfault happens because libunwind is accessing invalid memory it's supposed to have validated:
/* validate address */
const struct cursor *c = (const struct cursor *)arg;
if (likely (c != NULL) && unlikely (c->validate)
    && unlikely (validate_mem (addr)))
  return -1;
*val = *(unw_word_t *) addr;
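Note that in the snippet above the check only runs when c->validate is set; otherwise addr is dereferenced directly. For readers unfamiliar with the idea of validating an address before reading it, here is a minimal sketch of one common way to probe whether a page is mapped on Linux, using mincore(). It is not libunwind's code, and the helper names addr_is_mapped and read_word_checked are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes mincore() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not libunwind's implementation: returns 1 if the
 * page containing addr is mapped in this process, 0 if it is not, and
 * -1 on an unexpected error.
 */
static int addr_is_mapped(const void *addr) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    assert((page_size & (page_size - 1)) == 0); /* power of two */

    /* mincore() requires a page-aligned start address. */
    uintptr_t page = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    unsigned char vec;
    if (mincore((void *)page, 1, &vec) == 0) return 1;
    return errno == ENOMEM ? 0 : -1; /* ENOMEM: range is not mapped */
}

/*
 * Hypothetical guarded read: copies one word out only if its page is
 * mapped. Assumes addr is word-aligned, so the word cannot straddle
 * two pages.
 */
static int read_word_checked(const void *addr, uintptr_t *val) {
    if (addr_is_mapped(addr) != 1) return -1;
    *val = *(const uintptr_t *)addr;
    return 0;
}
```

A check like this is inherently best-effort: a mapped page may still be unreadable (for example PROT_NONE), and the mapping can change between the probe and the read.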
Others have seen this same problem before.
There's also a fix for this issue in libunwind commit 836c91c43d7a996028aa7e8d1f53630a6b8e7cbe. It's not in any libunwind release yet, so we could do one of the following:
- upgrade libunwind to 1.2 (most recent release) and patch in the fix
- upgrade to a snapshot containing the fix
To work around the problem, one can set --hung_task_check_interval_ms to a large value such as 2^30, so that the stack watchdog runs very rarely. The flag is a 32-bit signed integer, so the value cannot exceed 2^31 - 1. The tradeoff is the effective loss of the stack watchdog, which can make certain performance problems more difficult to debug.
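As a quick sanity check on the suggested value, the sketch below confirms that 2^30 fits in a 32-bit signed integer and converts it to days, treating the flag value as milliseconds as its name suggests. The function names are hypothetical, and the printed line uses the usual --flag=value command-line form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Interval suggested in the report: 2^30 milliseconds. */
static int32_t workaround_interval_ms(void) {
    const int64_t interval = INT64_C(1) << 30;
    assert(interval <= INT32_MAX); /* must fit the flag's 32-bit signed type */
    return (int32_t)interval;
}

/* Converts a millisecond interval to days. */
static double ms_to_days(int32_t ms) {
    return (double)ms / (1000.0 * 60.0 * 60.0 * 24.0);
}

/* Prints the flag setting together with the resulting check interval. */
static void print_workaround_flag(void) {
    int32_t ms = workaround_interval_ms();
    printf("--hung_task_check_interval_ms=%d  (about %.1f days)\n",
           (int)ms, ms_to_days(ms));
}
```

With 2^30 ms the watchdog would check roughly once every 12.4 days, which is why the workaround amounts to disabling it. The next power of two, 2^31, would overflow the flag.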
Issue Links
- is related to KUDU-2291 Implement a /stacks web page (Resolved)