Currently, glog's default behavior is to call posix_fadvise(POSIX_FADV_DONTNEED) on the log file after every log entry it writes. On many Linux kernel versions, each such call schedules work on every other CPU, which can cost up to one context switch per CPU for every log line.
We saw this cause an extremely long GC pause in catalogd. The native side of the catalog was logging a large number of messages about publishing metadata updates while the Java side was running a GC. The high context-switch rate caused many TLB flushes and misses, so the GC spent almost all of its time in the kernel. A pause that should have stalled the JVM for a couple of seconds took several minutes instead.
The per-entry fadvise problem was identified and fixed upstream in glog in this commit: https://github.com/google/glog/commit/dacd29679633c9b845708e7015bd2c79367a6ea2
We should backport this fix into the glog version bundled with the toolchain.
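For illustration only, the sketch below models the per-entry pattern described above and counts how many posix_fadvise(POSIX_FADV_DONTNEED) calls it issues. It compares this against a throttled variant that only issues the hint once a minimum number of new bytes has accumulated. The LogFile class, the throttle_bytes parameter, the count_calls helper and the 256 KiB threshold are all hypothetical. This is not glog's code and is not meant to describe what the upstream commit does. It only counts calls and does not reproduce the cross-CPU cost itself. os.posix_fadvise is available on Linux but not on every platform.

```python
import os
import tempfile


class LogFile:
    """Hypothetical toy log writer (not glog's implementation).

    throttle_bytes == 0 models the per-entry behavior: one
    posix_fadvise(POSIX_FADV_DONTNEED) call after every written line.
    throttle_bytes > 0 only advises once at least that many bytes have been
    written since the last hint.
    """

    def __init__(self, path: str, throttle_bytes: int = 0) -> None:
        self._f = open(path, "wb")
        self._throttle = throttle_bytes
        self._written = 0   # total bytes written to the file so far
        self._advised = 0   # prefix of the file already covered by a hint
        self.fadvise_calls = 0

    def write(self, line: bytes) -> None:
        self._f.write(line)
        self._f.flush()  # hand the bytes to the kernel before advising on them
        self._written += len(line)
        pending = self._written - self._advised
        if pending > 0 and pending >= self._throttle:
            # Tell the kernel it may drop cached pages for the not-yet-advised range.
            os.posix_fadvise(self._f.fileno(), self._advised, pending,
                             os.POSIX_FADV_DONTNEED)
            self._advised = self._written
            self.fadvise_calls += 1

    def close(self) -> None:
        self._f.close()


def count_calls(num_lines: int, throttle_bytes: int) -> int:
    """Write num_lines 100-byte lines and return how many hints were issued."""
    with tempfile.TemporaryDirectory() as tmp:
        log = LogFile(os.path.join(tmp, "example.log"), throttle_bytes)
        for _ in range(num_lines):
            log.write(b"x" * 99 + b"\n")
        log.close()
        return log.fadvise_calls


print(count_calls(10_000, 0))           # per entry: 10000
print(count_calls(10_000, 256 * 1024))  # throttled: 3
```

Throttling trades a small amount of extra page-cache residency for far fewer system calls. At most throttle_bytes of recently written log data can be left unadvised.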