Description
I'm seeing a small percentage of test timeouts caused by a hang while starting a subprocess. I managed to catch one in the act and attached to the subprocess with gdb. It's hung before 'exec', here:
* 1 Thread 0x7f1feebdb440 (LWP 9453) "disk_failure-it" __sanitizer::internal_sched_yield () at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:414
(gdb) bt
#0  __sanitizer::internal_sched_yield () at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:414
#1  0x00000000004c39db in Do (this=<synthetic pointer>) at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_mutex.cc:195
#2  __tsan::Mutex::Lock (this=this@entry=0x600001000000) at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_mutex.cc:235
#3  0x00000000004c72ed in GenericScopedLock (mu=0x600001000000, this=<synthetic pointer>) at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_mutex.h:189
#4  TraceSwitch (thr=<optimized out>) at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:552
#5  __tsan::__tsan_trace_switch () at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:581
#6  0x00000000004da3df in __tsan_trace_switch_thunk () at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_rtl_amd64.S:53
#7  0x00000000004cfabc in TraceAddEvent (thr=<optimized out>, addr=0, typ=__tsan::EventTypeFuncExit, fs=...)
    at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.h:845
#8  FuncExit (thr=<optimized out>) at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:997
#9  __tsan_func_exit () at /data/1/todd/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_inl.h:108
#10 0x00007f1fe06a3ee5 in safe_strtou32_base (str=<optimized out>, value=<optimized out>, base=<optimized out>) at ../../src/kudu/gutil/strings/numbers.cc:717
#11 0x00007f1fe06a4426 in safe_strtou32 (str=0xfffffffffffc04c0 <error: Cannot access memory at address 0xfffffffffffc04c0>, value=0x1fffff) at ../../src/kudu/gutil/strings/numbers.cc:773
#12 0x00007f1fe288e42f in kudu::(anonymous namespace)::CloseNonStandardFDs (fd_dir=0x7bb400000000) at ../../src/kudu/util/subprocess.cc:144
#13 0x00007f1fe288d3b4 in kudu::Subprocess::Start (this=<optimized out>) at ../../src/kudu/util/subprocess.cc:418
#14 0x00007f1fee0ddcf0 in kudu::cluster::ExternalDaemon::StartProcess (this=<optimized out>, user_flags=...) at ../../src/kudu/mini-cluster/external_mini_cluster.cc:821