Jackrabbit Oak / OAK-3238

fine tune clock-sync check vs lease-check settings

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.4
    • Fix Version/s: 1.3.5, 1.4
    • Component/s: core
    • Labels: None

    Description

      There are now two components that try to ensure that 'discovery-lite' (OAK-2844) reports a coherent cluster view to the upper layers:

      • OAK-2682: time difference detection. By default, startup fails if the local clock is off by more than 2 seconds. Since each instance may deviate by up to 2 seconds in either direction, two instances in a document cluster can still differ by up to 4 seconds.
      • OAK-2739: lease checking. On every document access, each instance checks whether its local lease is still valid. The check is made against the actual 'leaseEndTime', which by default is renewed every 30 seconds to remain valid for another 60 seconds.
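A minimal Java sketch of the two checks above, assuming plain millisecond timestamps. The class and method names (`ClusterChecks`, `clockOffsetAcceptable`, `isLeaseValid`) are hypothetical and are not Oak's actual API; only the defaults (2 s maximum offset, 60 s lease validity, 30 s renewal) come from the description. Treating the lease as expired exactly at 'leaseEndTime' is also an assumption.

```java
// Hypothetical sketch of the two safeguards described above; class and method
// names are illustrative and are NOT Oak's actual API.
final class ClusterChecks {

    // Defaults quoted in the issue description, in milliseconds.
    static final long MAX_CLOCK_OFFSET_MS = 2_000; // OAK-2682 startup check
    static final long LEASE_DURATION_MS = 60_000;  // OAK-2739: validity after each renewal
                                                   // (renewals happen every 30 s by default)

    private ClusterChecks() {}

    /** Startup check: accept the local clock only if it is within the allowed offset of a reference clock. */
    static boolean clockOffsetAcceptable(long localTimeMs, long referenceTimeMs) {
        return Math.abs(localTimeMs - referenceTimeMs) <= MAX_CLOCK_OFFSET_MS;
    }

    /** Two instances that both pass the startup check may deviate in opposite directions. */
    static long maxPairwiseDifferenceMs() {
        return 2 * MAX_CLOCK_OFFSET_MS;
    }

    /** A successful renewal at renewalTimeMs moves 'leaseEndTime' one lease duration ahead. */
    static long leaseEndTime(long renewalTimeMs) {
        return renewalTimeMs + LEASE_DURATION_MS;
    }

    /** Lease check run on each document access, evaluated against the local clock (assumed strict). */
    static boolean isLeaseValid(long nowMs, long leaseEndTimeMs) {
        return nowMs < leaseEndTimeMs;
    }

    public static void main(String[] args) {
        System.out.println(clockOffsetAcceptable(10_000, 11_500)); // 1.5 s off: accepted
        System.out.println(clockOffsetAcceptable(10_000, 12_500)); // 2.5 s off: rejected
        System.out.println(maxPairwiseDifferenceMs());
        long end = leaseEndTime(0);
        System.out.println(isLeaseValid(59_999, end));
        System.out.println(isLeaseValid(60_000, end));
    }
}
```

Note that both checks read local clocks only, which is why the 4-second pairwise difference from the first check carries straight into the lease check.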

      Combining these two factors, in the worst case there is still a 4-second window in which the local instance has failed to renew its lease (e.g. because the lease thread died) yet still considers its lease valid, while a remote instance whose clock is up to 4 seconds ahead already considers that lease timed out.
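The worst case can be made concrete with a small timeline calculation. This is a hypothetical sketch (the `WorstCaseWindow` class is illustrative, not Oak code): all times are expressed on the local instance's clock, the last successful renewal happens at time 0, and the remote clock is assumed to run the full 4 seconds ahead.

```java
// Hypothetical timeline for the worst case described above; all times are on
// the local instance's clock, in milliseconds. Not Oak code.
final class WorstCaseWindow {

    /**
     * Length of the window in which the local instance still trusts its lease
     * while a remote instance, whose clock runs remoteAheadMs ahead, already
     * treats that lease as expired. Returns 0 if no such window exists.
     */
    static long conflictWindowMs(long lastRenewalMs, long leaseDurationMs,
                                 long localFailureLimitMs, long remoteAheadMs) {
        // The local instance stops trusting its lease this long after the last renewal.
        long localGivesUpAt = lastRenewalMs + localFailureLimitMs;
        // The remote compares 'leaseEndTime' with its faster clock, so in local
        // time it declares the lease expired remoteAheadMs early.
        long remoteExpiresAt = lastRenewalMs + leaseDurationMs - remoteAheadMs;
        return Math.max(0, localGivesUpAt - remoteExpiresAt);
    }

    public static void main(String[] args) {
        // Current defaults: the local failure limit equals the 60 s lease duration.
        System.out.println(conflictWindowMs(0, 60_000, 60_000, 4_000));
    }
}
```

With the current defaults the local instance gives up at 60 s, but the remote already sees the lease as expired at 56 s, which reproduces the 4-second window.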

      So overall, the three factors 'lease duration', 'lease update frequency' and 'maximum allowed clock difference' must be tuned against each other to obtain a stable mechanism.

      Suggestion:

      • increase the 'lease duration' to 3 x 'lease update frequency', i.e. a 90-second lease duration
      • reduce the lease-check failure limit from the 'lease duration' to 2 x 'lease update frequency' (i.e. 60 seconds), assuming that one 'lease update interval' is much larger than the 'maximum allowed clock difference'
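The suggestion can be read as deriving both limits from the lease update interval. The helper below is a hypothetical sketch, not an Oak configuration API: it derives the 90 s lease duration and the 60 s failure limit from a 30 s interval and reports the safety margin, i.e. how long before any remote instance (clock up to 4 s ahead) could consider the lease expired the local instance has already stopped trusting it.

```java
// Hypothetical derivation of the suggested lease settings, in milliseconds.
// Not an Oak API; field and method names are illustrative only.
final class SuggestedLeaseSettings {

    final long updateIntervalMs;
    final long leaseDurationMs; // suggested: 3 x update interval
    final long failureLimitMs;  // suggested: 2 x update interval
    final long maxClockDiffMs;  // pairwise clock difference tolerated in the cluster

    SuggestedLeaseSettings(long updateIntervalMs, long maxClockDiffMs) {
        this.updateIntervalMs = updateIntervalMs;
        this.leaseDurationMs = 3 * updateIntervalMs;
        this.failureLimitMs = 2 * updateIntervalMs;
        this.maxClockDiffMs = maxClockDiffMs;
    }

    /**
     * Time between the local instance giving up on its lease and the earliest
     * moment a remote instance could treat it as expired. With the 3x/2x ratios
     * this equals one update interval minus the clock difference, so it is
     * positive exactly when the interval exceeds the clock difference.
     */
    long safetyMarginMs() {
        return leaseDurationMs - failureLimitMs - maxClockDiffMs;
    }

    public static void main(String[] args) {
        SuggestedLeaseSettings s = new SuggestedLeaseSettings(30_000, 4_000);
        System.out.println(s.leaseDurationMs + " " + s.failureLimitMs + " " + s.safetyMarginMs());
    }
}
```

With a 30 s interval and a 4 s clock difference the margin is 26 s, which is why the suggestion only holds if one update interval is much larger than the maximum allowed clock difference.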

People

    • Assignee: Stefan Egli (stefanegli)
    • Reporter: Stefan Egli (stefanegli)
    • Votes: 0
    • Watchers: 1