IMPALA-7252

Backport rate limiting of fadvise calls into toolchain glog

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend
    • Labels:
      None

      Description

      Currently, glog's default behavior is to call posix_fadvise(POSIX_FADV_DONTNEED) on the log file after every log entry it writes. In many versions of the Linux kernel, each such call schedules work on all other CPUs, causing up to one context switch per CPU for every log line. We saw this cause an extremely long GC pause in catalogd: the native side of the catalog was logging many messages about publishing metadata updates while the Java side was running a GC. Because the high context-switch rate caused many TLB flushes and misses, the GC spent almost all of its time in the kernel, and instead of pausing the JVM for a couple of seconds it paused it for several minutes.

      This was identified and fixed upstream in glog here: https://github.com/google/glog/commit/dacd29679633c9b845708e7015bd2c79367a6ea2

      We should backport this fix into the version in the toolchain.
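
      For illustration only, the idea of rate limiting can be sketched as below. This is not the upstream glog patch: the class FadviseRateLimiter, the function MaybeDropLogMemory, and the min_drop_bytes threshold are hypothetical names chosen for this sketch. It assumes a Linux/POSIX system where posix_fadvise is declared in <fcntl.h>. The point it shows is that the syscall is issued once per batch of written bytes rather than once per log line.

```cpp
#include <fcntl.h>   // posix_fadvise, POSIX_FADV_DONTNEED (Linux/POSIX)

#include <cassert>
#include <cstdint>

// Hypothetical helper, not glog's actual code: tracks how much of the log
// file has already been advised away and decides when the next call is due.
class FadviseRateLimiter {
 public:
  // min_drop_bytes is an illustrative threshold: advise only once at least
  // this many bytes have accumulated past the last advised offset.
  explicit FadviseRateLimiter(uint64_t min_drop_bytes)
      : min_drop_bytes_(min_drop_bytes) {
    assert(min_drop_bytes > 0);
  }

  // Returns the number of bytes that should be dropped starting at
  // dropped_offset(), or 0 if the call should be skipped for now.
  uint64_t PendingDropLength(uint64_t file_length) const {
    const uint64_t pending =
        file_length > dropped_offset_ ? file_length - dropped_offset_ : 0;
    return pending >= min_drop_bytes_ ? pending : 0;
  }

  // Records that `length` more bytes have been advised away.
  void MarkDropped(uint64_t length) { dropped_offset_ += length; }

  uint64_t dropped_offset() const { return dropped_offset_; }

 private:
  const uint64_t min_drop_bytes_;
  uint64_t dropped_offset_ = 0;
};

// Intended to be called after each log write. Issues the syscall only when
// the limiter allows it; returns true if posix_fadvise was actually called.
inline bool MaybeDropLogMemory(int fd, uint64_t file_length,
                               FadviseRateLimiter* limiter) {
  const uint64_t length = limiter->PendingDropLength(file_length);
  if (length == 0) return false;  // Not enough new data yet: skip the syscall.
  (void)posix_fadvise(fd, static_cast<off_t>(limiter->dropped_offset()),
                      static_cast<off_t>(length), POSIX_FADV_DONTNEED);
  limiter->MarkDropped(length);
  return true;
}
```

      With a threshold of a few MiB, a burst of thousands of short log lines would trigger only a handful of posix_fadvise calls instead of one per line. The threshold value is a tuning assumption, not something taken from this issue.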

              People

              • Assignee: Tianyi Wang (tianyiwang)
              • Reporter: Todd Lipcon (tlipcon)
              • Votes: 0
              • Watchers: 4
