Kudu servers crash on Graviton3 (aarch64) instances in EC2

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.18.0
    • Component/s: CLI, client, master, tserver
    • Environment: Graviton3 instances in EC2

    Description

      Kudu masters and tablet servers built from the source code released with Kudu 1.17.0 crash with SIGSEGV when running on Graviton3 (aarch64) instances in EC2.

      Closer examination showed that the crash happens when StackCollector tries to symbolize a thread's stack. The example stack trace below was collected under GDB while running a smoke test with the kudu CLI tool: kudu perf loadgen <master_rpc_addr> --table_num_replicas=3 --num_rows_per_thread=1000000:

      #0  access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, 
          val=0xfffff325ca18, write=0, arg=0xfffff325ce70)
          at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
      #1  0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70)
          at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
      #2  0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70)
          at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
      #3  0x00000000025050c8 in kudu::StackTrace::Collect (
          this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
          at /root/Projects/kudu/src/kudu/util/debug-util.cc:612
      #4  0x0000000002507f64 in kudu::StackTrace::Collect (
          this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
          at /root/Projects/kudu/src/kudu/util/debug-util.cc:579
      #5  0x000000000259c390 in kudu::(anonymous namespace)::SubmitSpinLockProfileData (contendedlock=0x4ed8a220, wait_cycles=2966400)
          at /root/Projects/kudu/src/kudu/util/spinlock_profiling.cc:229
      

      The SIGSEGV is raised inside the libunwind code (access_mem() called from is_plt_entry(), frames #0 and #1 of the trace), which looks very similar to what's reported in this GitHub issue.
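
      One way to read frame #0 of the trace is to check whether the addr value passed to access_mem() could be a valid user-space pointer at all. The sketch below is only an illustration and is not part of Kudu or libunwind. It assumes a 48-bit user-space virtual address layout, which is common on Linux/aarch64 but not universal. The names kFaultAddr, kAssumedVaBits, UpperBits, LooksLikeUserSpaceAddress, and DescribeAddress are hypothetical helpers introduced here.

```cpp
#include <cstdint>
#include <cstdio>

// The addr argument of access_mem() in frame #0 of the trace above.
constexpr std::uint64_t kFaultAddr = 7745970402396146688ULL;

// Assumption (not stated in the report): user-space virtual addresses are
// 48 bits wide, so a plain user-space pointer has bits 63..48 set to zero.
constexpr int kAssumedVaBits = 48;

// Returns the bits above the assumed virtual address width.
constexpr std::uint64_t UpperBits(std::uint64_t addr) {
  return addr >> kAssumedVaBits;
}

// Returns true if the address fits into the assumed user-space range.
constexpr bool LooksLikeUserSpaceAddress(std::uint64_t addr) {
  return UpperBits(addr) == 0;
}

// Prints the address in hex along with the result of the range check.
inline void DescribeAddress(std::uint64_t addr) {
  std::printf("addr=0x%016llx upper_bits=0x%04llx plausible=%s\n",
              static_cast<unsigned long long>(addr),
              static_cast<unsigned long long>(UpperBits(addr)),
              LooksLikeUserSpaceAddress(addr) ? "yes" : "no");
}
```

      Calling DescribeAddress(kFaultAddr) and DescribeAddress(0xfffff325ca18ULL) (the val pointer from the same frame) contrasts the two. Under the stated assumption, the first has non-zero upper bits and the second does not, which is consistent with access_mem() dereferencing a bogus address. The check does not explain where the bad value came from.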

          People

            Assignee: Alexey Serbin (aserbin)
            Reporter: Alexey Serbin (aserbin)
            Votes: 0
            Watchers: 4