Kudu / KUDU-2144

Add metric for reactor load

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: metrics, ops-tooling
    • Labels: None

    Description

      Recently I was debugging a cluster that appeared to have network issues. Only after a lot of investigation did I realize that the reactor threads were not keeping up with network traffic because they were hitting KUDU-1964 (this cluster was running 1.3.0). At first glance the reactors did not seem busy, since each was using only ~25% of a CPU. However, the other 75% of the time was spent blocked on OpenSSL locks rather than in epoll_wait, where an idle reactor would normally be waiting.

      This would be easier to diagnose if we had a metric showing how much time the reactors spend idle (i.e. in epoll_wait) versus doing work (i.e. executing callbacks, performing IO, etc.). If any reactor spends a high percentage of its time outside epoll_wait, that suggests the reactors may be a bottleneck that is increasing latency or degrading throughput.
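
      The idle-versus-busy accounting proposed above can be sketched roughly as follows. This is an illustration only, not Kudu's implementation: the class name ReactorLoadTracker, its methods, and the run_once helper are all hypothetical, and Python's standard selectors module stands in for epoll_wait. Time is measured on a wall clock, so time spent blocked on a lock between polls is counted as busy, which is what CPU utilization failed to show in the case described here.

```python
import selectors
import time


class ReactorLoadTracker:
    """Hypothetical helper: splits a loop's wall-clock time into idle and busy."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.idle_seconds = 0.0
        self.busy_seconds = 0.0
        self._last = clock()

    def before_wait(self):
        # Everything since the last mark was spent outside the poll call.
        now = self._clock()
        self.busy_seconds += now - self._last
        self._last = now

    def after_wait(self):
        # Everything since the last mark was spent blocked in the poll call.
        now = self._clock()
        self.idle_seconds += now - self._last
        self._last = now

    def load_fraction(self):
        # Fraction of wall-clock time spent not waiting for events.
        total = self.idle_seconds + self.busy_seconds
        return self.busy_seconds / total if total > 0 else 0.0


def run_once(selector, tracker, timeout=None):
    """One loop iteration; assumes each key's data is a callable callback."""
    tracker.before_wait()
    events = selector.select(timeout)  # stands in for epoll_wait
    tracker.after_wait()
    for key, mask in events:
        key.data(key.fileobj, mask)
```

      Under these assumptions, a loop that calls run_once repeatedly could export load_fraction() per reactor thread. A value near 1.0 would mean the thread almost never waits for events, regardless of how much CPU it appears to use.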

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 1