Kudu / KUDU-3521

Kudu servers sometimes crash when host clock is synchronized by PTPd



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.1, 1.18.0, 1.17.1
    • Component/s: None
    • Labels: None


      This issue was reported on the #kudu-general Slack channel. A Kudu server running version 1.16.0 (it's unclear whether it was kudu-master or kudu-tserver, but that doesn't matter here) crashed with the following error:

      F1024 22:32:06.866636 3323203 hybrid_clock.cc:452] Check failed: _s.ok() unable to get current timestamp with error bound: Service unavailable: clock error estimate (18446744073709551615us) too high (clock considered synchronized by the kernel)

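      From an analysis of the code in hybrid_clock.cc, the only way this could happen is if t.maxerror turned out to be negative (e.g., -1) there: interpreted as an unsigned 64-bit integer, -1 becomes 18446744073709551615, which is exactly the value in the error message.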

      Negative values of the timex::maxerror field have never been seen when using ntpd or chronyd for clock synchronization, but the code needs to be updated to handle them: PTPd might set the maxerror field of the timex structure to a negative value and then call adjtimex(), as is evident from PTPd's code. The essence of the issue is that the Kudu code uses unsigned integers for the clock error, whereas timex.maxerror is a signed number, and at least PTPd sets it to a negative value when calling adjtimex(). Also, nowhere in the documentation for adjtimex() is it stated that the value of the maxerror field must be non-negative.
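
      To illustrate the signed/unsigned mismatch described above, here is a minimal sketch. It is not Kudu's actual code: NaiveErrorUs, CheckedErrorUs, and SelfCheck are hypothetical names introduced only for this example, and the guard is just one possible way to handle a negative value, not necessarily what the fix does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from hybrid_clock.cc): stores a signed 'maxerror'
// value (timex::maxerror is a long) in an unsigned 64-bit integer with no
// range check, so a negative input wraps around.
uint64_t NaiveErrorUs(long maxerror_us) {
  return static_cast<uint64_t>(maxerror_us);  // -1 becomes 2^64 - 1
}

// Hypothetical guard: reports failure for a negative value instead of
// letting it wrap into a huge "error estimate".
bool CheckedErrorUs(long maxerror_us, uint64_t* out) {
  if (maxerror_us < 0) {
    return false;  // caller decides how to treat an invalid reading
  }
  *out = static_cast<uint64_t>(maxerror_us);
  return true;
}

// Small self-check showing both behaviors.
void SelfCheck() {
  // Same value as in the crash message: 18446744073709551615.
  assert(NaiveErrorUs(-1) == UINT64_MAX);

  uint64_t err = 0;
  assert(!CheckedErrorUs(-1, &err));               // negative input rejected
  assert(CheckedErrorUs(250, &err) && err == 250); // normal input passes through
}
```

      With the unchecked conversion, a maxerror of -1 becomes 18446744073709551615, so any comparison against a maximum allowed clock error fails even though the kernel reports the clock as synchronized.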

      As a side note, there was a prior attempt to address this issue, but not enough evidence was presented at the time for a root cause analysis (RCA).




            Assignee: Alexey Serbin (aserbin)
            Reporter: Alexey Serbin (aserbin)