  1. Kudu
  2. KUDU-3048

Add time/clock synchronization metrics

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: clock, master, tserver

    Description

      For better visibility, it would be great to add metrics reflecting time/clock synchronization parameters:

      • the stats on the max_error sampled while reading the underlying clock
      • the stats on time intervals when the clock reading was extrapolated instead of being taken from the underlying clock: the number of such intervals and the stats on their duration
      • whether hybrid clock timestamps are currently being generated from extrapolated clock readings instead of real ones
      • if using the built-in time source:
        • difference between tracked true time and local wallclock
        • most recently computed true time
        • the stats on the maximum error of the computed true time
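
      To make the first three items concrete, here is a minimal sketch of the bookkeeping such metrics imply. It is not Kudu's implementation: the class name ClockSyncStats, its fields, and the use of microseconds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClockSyncStats:
    """Hypothetical bookkeeping for clock readings (not Kudu's actual code)."""

    # Every max_error value sampled while reading the clock, in microseconds.
    max_error_samples_us: List[int] = field(default_factory=list)
    # Duration of each completed extrapolation interval, in microseconds.
    extrapolation_intervals_us: List[int] = field(default_factory=list)
    # Start of the extrapolation interval in progress, if any.
    _extrapolating_since_us: Optional[int] = None

    def record_reading(self, now_us: int, max_error_us: int,
                       extrapolated: bool) -> None:
        """Record one clock reading and update the interval bookkeeping."""
        self.max_error_samples_us.append(max_error_us)
        if extrapolated and self._extrapolating_since_us is None:
            # A new extrapolation interval starts with this reading.
            self._extrapolating_since_us = now_us
        elif not extrapolated and self._extrapolating_since_us is not None:
            # A real reading closes the interval that was in progress.
            self.extrapolation_intervals_us.append(
                now_us - self._extrapolating_since_us)
            self._extrapolating_since_us = None

    @property
    def extrapolating(self) -> bool:
        """True while timestamps are based on extrapolated readings."""
        return self._extrapolating_since_us is not None
```

      Feeding the tracker a real reading, two extrapolated readings, and another real reading yields one completed interval, while the extrapolating property is true only between the first extrapolated reading and the next real one.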

      As for the rationale behind the new metrics:

      • max_error shows how far the clock might be from the true time; if it is consistently high, it may be time to use another set of NTP servers or to increase the --max_clock_sync_error_usec flag value
      • the presence of extrapolation intervals for the hybrid clock signals periods when the NTP servers were unavailable; a possible action is to revisit the set of NTP servers
      • if hybrid timestamps are extrapolated for some time, Kudu masters and tablet servers might crash once the clock error eventually goes beyond the configured threshold: it's time to start troubleshooting the issue to avoid possible unavailability of the cluster
      • the delta between the true time tracked by the built-in NTP client and the local system clock helps to understand how log timestamps relate to HybridClock timestamps (when the built-in NTP client is used, the two might diverge)
      • the stats on the true time computed by the built-in NTP client give insight into the quality of the reference NTP servers

      The new metrics can be used for monitoring and alerting, allowing for proactive maintenance of a Kudu cluster.
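
      As an illustration of the alerting use case, the sketch below turns two of the proposed values into alerts. It assumes the monitoring system has already collected the latest max_error and the extrapolation flag; the function name clock_alerts, the warn_ratio parameter, and the message wording are hypothetical, and the threshold argument stands for whatever --max_clock_sync_error_usec is set to on the server.

```python
from typing import List


def clock_alerts(max_error_us: int,
                 max_clock_sync_error_us: int,
                 extrapolating: bool,
                 warn_ratio: float = 0.5) -> List[str]:
    """Return alert messages for one server (hypothetical alerting rule).

    max_clock_sync_error_us mirrors the --max_clock_sync_error_usec flag;
    warn_ratio is an arbitrary fraction of it chosen for this example.
    """
    alerts: List[str] = []
    if max_error_us >= max_clock_sync_error_us:
        alerts.append("critical: clock max_error reached the configured threshold")
    elif max_error_us >= warn_ratio * max_clock_sync_error_us:
        alerts.append("warning: clock max_error is approaching the threshold")
    if extrapolating:
        alerts.append("warning: hybrid clock timestamps are being extrapolated")
    return alerts


if __name__ == "__main__":
    # Example values only; the threshold here is not a statement about defaults.
    for message in clock_alerts(6_000, 10_000, extrapolating=True):
        print(message)
```

      Warning well before the threshold is reached matches the rationale above: the goal is to start troubleshooting while the servers are still running.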

          People

            aserbin Alexey Serbin