Description
Kudu masters and tablet servers built from the source code released with Kudu 1.17.0 crash with SIGSEGV when running on Graviton3 (aarch64) instances in EC2.
Closer examination showed that the problem occurs when StackCollector tries to symbolize a thread's stack. An example stack trace is shown below. It was collected under GDB while running a smoke test with the kudu CLI tool:
kudu perf loadgen <master_rpc_addr> --table_num_replicas=3 --num_rows_per_thread=1000000
#0  access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, val=0xfffff325ca18, write=0, arg=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
#1  0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
#2  0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
#3  0x00000000025050c8 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:612
#4  0x0000000002507f64 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:579
#5  0x000000000259c390 in kudu::(anonymous namespace)::SubmitSpinLockProfileData (contendedlock=0x4ed8a220, wait_cycles=2966400)
    at /root/Projects/kudu/src/kudu/util/spinlock_profiling.cc:229
The SIGSEGV is raised inside the libunwind code (frames #0 to #2 above), which looks very similar to what is reported in this GitHub issue.
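One detail of frame #0 is easier to judge in hex: the addr argument passed to access_mem is printed in decimal. The following is a minimal sketch, not code from Kudu or libunwind, that converts the value and checks whether its upper 16 bits are clear. It assumes a 48-bit user-space virtual address layout, which is common for Linux on aarch64 but is an assumption here and not something stated in this report. The helper names ToHex and FitsIn48BitUserSpace are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit address as zero-padded hex.
std::string ToHex(uint64_t addr) {
  char buf[19];  // "0x" + 16 hex digits + terminating NUL
  std::snprintf(buf, sizeof(buf), "0x%016llx",
                static_cast<unsigned long long>(addr));
  return std::string(buf);
}

// Hypothetical helper: returns true if the upper 16 bits of the address
// are zero. ASSUMPTION: user-space addresses fit in 48 bits; this is a
// rough plausibility check, not a test of whether the page is mapped.
bool FitsIn48BitUserSpace(uint64_t addr) {
  return (addr >> 48) == 0;
}
```

For the addr value in frame #0, ToHex(7745970402396146688ULL) yields 0x6b7f370a83830c00 and FitsIn48BitUserSpace returns false, whereas the val pointer 0xfffff325ca18 from the same frame passes the check. Under the stated assumption, this suggests access_mem was asked to read from an address that is unlikely to be mapped, which would be consistent with the SIGSEGV. It does not by itself establish the root cause.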