Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When running unifiedbetests and impalad on the aarch64 platform, a deadlock occurs while tcmalloc is initializing.
The stack trace is as follows:
(gdb) bt
#0 0x0000ffff83099544 in __GI___nanosleep (requested_time=0xffffffc71698, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00000000054cf144 in base::internal::SpinLockDelay (w=0x77385b0 <tcmalloc::Static::pageheap_lock_>, value=2, loop=727956) at /home/impala/impala/be/src/gutil/spinlock_linux-inl.h:86
#2 0x0000000005529800 in SpinLock::SlowLock() ()
#3 0x00000000055fb5c4 in tcmalloc::ThreadCache::InitModule() ()
#4 0x0000000005743374 in tc_calloc ()
#5 0x0000ffff81c737f4 in _dlerror_run (operate=operate@entry=0xffff81c73158 <dlsym_doit>, args=0xffffffc717d8, args@entry=0xffffffc717f8) at dlerror.c:140
#6 0x0000ffff81c731f0 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70
#7 0x000000000310ee04 in (anonymous namespace)::dlsym_or_die (sym=0x606b260 "dlopen") at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:74
#8 0x000000000310ef1c in (anonymous namespace)::InitIfNecessary () at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:100
#9 0x000000000310f0b4 in dl_iterate_phdr (callback=0xffff81620d18 <_Unwind_IteratePhdrCallback>, data=0xffffffc71900) at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:158
#10 0x0000ffff816215b4 in _Unwind_Find_FDE (pc=0xffff8161f98f <_Unwind_Backtrace+79>, bases=bases@entry=0xffffffc72438) at ../../../gcc-7.5.0/libgcc/unwind-dw2-fde-dip.c:469
#11 0x0000ffff8161dfdc in uw_frame_state_for (context=context@entry=0xffffffc72110, fs=fs@entry=0xffffffc719f0) at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1249
#12 0x0000ffff8161ef3c in uw_init_context_1 (context=context@entry=0xffffffc72110, outer_cfa=0xffffffc72b50, outer_cfa@entry=0xffffffc72be0, outer_ra=0x55298d8 <GetStackTrace_libgcc(void**, int, int)+40>) at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1578
#13 0x0000ffff8161f990 in _Unwind_Backtrace (trace=0x5529a48 <libgcc_backtrace_helper(_Unwind_Context*, void*)>, trace_argument=0xffffffc72b68) at ../../../gcc-7.5.0/libgcc/unwind.inc:283
#14 0x00000000055298d8 in GetStackTrace_libgcc(void**, int, int) ()
#15 0x0000000005529db4 in GetStackTrace(void**, int, int) ()
#16 0x00000000055f891c in tcmalloc::PageHeap::GrowHeap(unsigned long) ()
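Read from frame #16 up to frame #0, the trace looks like a re-entrant acquisition of a non-recursive lock. GrowHeap appears to run with pageheap_lock_ held and asks for a stack trace; the libgcc unwinder calls the hooked dl_iterate_phdr; the hook calls dlsym; dlsym calls calloc; and tc_calloc then waits on the same pageheap_lock_. The following is only a minimal sketch of that pattern, not tcmalloc or Kudu code. ToySpinLock, ToyAllocator, stack_trace_hook and SimulateGrowHeap are hypothetical names, and the lock reports the conflict instead of spinning so that the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for a non-recursive spinlock such as pageheap_lock_.
class ToySpinLock {
 public:
  // Returns false if the lock is already held. A real spinlock would spin
  // here forever when the holder is the calling thread itself.
  bool TryLock() { return !held_.test_and_set(std::memory_order_acquire); }
  void Unlock() { held_.clear(std::memory_order_release); }

 private:
  std::atomic_flag held_ = ATOMIC_FLAG_INIT;
};

// Hypothetical toy allocator; only the locking pattern is modelled.
struct ToyAllocator {
  ToySpinLock lock;
  bool would_deadlock = false;
  // Stands in for the chain unwinder -> dl_iterate_phdr hook -> dlsym.
  std::function<void(ToyAllocator&)> stack_trace_hook;

  // Stands in for tc_calloc -> InitModule, which needs the lock.
  void Allocate() {
    if (!lock.TryLock()) {
      would_deadlock = true;  // same thread already holds the lock
      return;
    }
    lock.Unlock();
  }

  // Stands in for GrowHeap: takes the lock, then records a stack trace.
  void GrowHeap() {
    const bool locked = lock.TryLock();
    assert(locked);
    (void)locked;
    if (stack_trace_hook) stack_trace_hook(*this);
    lock.Unlock();
  }
};

// Returns true if the simulated call chain would have deadlocked.
bool SimulateGrowHeap(bool hook_allocates) {
  ToyAllocator alloc;
  alloc.stack_trace_hook = [hook_allocates](ToyAllocator& a) {
    if (hook_allocates) a.Allocate();  // models dlsym calling calloc
  };
  alloc.GrowHeap();
  return alloc.would_deadlock;
}
```

In this model SimulateGrowHeap(true) reports the conflict and SimulateGrowHeap(false) does not, which matches the observation that the hang only appears when the stack-trace path ends up allocating.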
I think this is the same issue as https://github.com/gperftools/gperftools/issues/1184 ,
because it happens whether I build gperftools with libunwind or without libunwind.
Kudu also has the same issue:
https://issues.apache.org/jira/browse/KUDU-3072
I think the solution in the following link is not correct:
https://gerrit.cloudera.org/#/c/15420/
On aarch64, the method of getting a stack trace is not the same as on arm.
I think the correct way to get a stack trace should be like this:
https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/stacktrace_aarch64-inl.inc
or just use libunwind or gcc.
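For context, a frame-pointer-based unwinder on aarch64 follows the chain of frame records that AAPCS64 defines: x29 points at a pair holding the caller's frame pointer and the saved return address. The sketch below only illustrates that walk on a synthetic chain. It is not the abseil implementation, it does not read real stack memory, and FrameRecord, WalkFrameRecords and WalkSyntheticChain are hypothetical names. A walk of this kind needs neither dl_iterate_phdr nor memory allocation while it follows the chain (the vector here is only for collecting results in the example).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// AAPCS64 frame record: the caller's frame pointer followed by the saved
// link register (return address).
struct FrameRecord {
  const FrameRecord* caller_fp;
  std::uintptr_t return_address;
};

// Follows the frame-record chain and collects up to max_depth return
// addresses, innermost frame first.
std::vector<std::uintptr_t> WalkFrameRecords(const FrameRecord* fp,
                                             std::size_t max_depth) {
  std::vector<std::uintptr_t> pcs;
  while (fp != nullptr && pcs.size() < max_depth) {
    pcs.push_back(fp->return_address);
    const FrameRecord* next = fp->caller_fp;
    // The stack grows down, so a caller's record must sit at a higher
    // address. Stop on anything else to avoid looping on a corrupt chain.
    if (next != nullptr && next <= fp) break;
    fp = next;
  }
  return pcs;
}

// Builds a synthetic three-frame chain and walks it. The return addresses
// are made-up values for illustration.
std::vector<std::uintptr_t> WalkSyntheticChain(std::size_t max_depth) {
  FrameRecord frames[3];
  frames[2] = {nullptr, 0x3000};     // outermost frame
  frames[1] = {&frames[2], 0x2000};
  frames[0] = {&frames[1], 0x1000};  // innermost frame
  return WalkFrameRecords(&frames[0], max_depth);
}
```

A real implementation would also have to validate each pointer before dereferencing it and would only work for code built with frame pointers, which is an assumption this sketch does not check.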
But I think gperftools may not be the root cause of this issue, because both gperftools and libunwind now support aarch64 well (with libunwind or gcc).
Maybe this Kudu commit has a bug?
https://github.com/apache/kudu/commit/b621f9c1a3949dc31ca4836b0767b2840fa73f29
On x86, gperftools does not use libunwind or libgcc to get the stack trace, so the issue does not happen there.
I tried changing:
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__)
#define HOOK_DL_ITERATE_PHDR 1
#endif
to:
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
#define HOOK_DL_ITERATE_PHDR 1
#endif
With this change, the deadlock no longer happens.
tarmstrong@cloudera.com tlipcon adar
What do you think about this issue? How should it be fixed? Any suggestions?