  1. Jackrabbit Oak
  2. OAK-3398

make lease update more robust

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.6
    • Fix Version/s: 1.3.7, 1.4
    • Component/s: core
    • Labels: None

    Description

      With the lease check introduced in OAK-2739 (and refined to perform an oak-core stop in OAK-3397), it becomes more critical that the lease is always updated properly, to avoid an unnecessary oak-core stop. The following issues exist at the moment:

      • Currently the lease is valid for 60 sec by default and is updated every 20 sec, and the lease check fails with a margin of 20 sec before the lease times out. This means that if the lease update thread does not operate for 20 sec (60 - 20 - 20), it causes a stop. That is probably too low a figure.
        • The suggestion is to increase the lease timeout from 60 sec to 120 sec, update the lease as soon as 10 sec of it has elapsed, and keep the 20 sec safety margin at the end. The update thread would then have to be idle for 90 sec (120 - 10 - 20) before 'idle equals faulty' applies.
      • On a machine under heavy load, it seems likely that the lease update thread is not scheduled in time, as it competes for CPU with all the other busy threads.
        • The suggestion is to increase the thread priority of the lease update thread. If the VM supports thread priorities, this would help reduce lease failures that occur 'just because the CPU is too busy'.
      • When renewing the lease, ClusterNodeInfo does not check whether the lease has been marked as timed out or recovering by another instance. It simply overwrites whatever is there.
        • It should, however, only update the lease when it has not yet been marked as timed out.
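
      The timing arithmetic and the thread-priority suggestion above can be sketched as follows. This is only an illustration: the class and method names are hypothetical and not taken from oak-core, and the numbers are the ones quoted in this description.

```java
// Hypothetical helper for illustration only; not part of oak-core.
final class LeaseTimingSketch {

    /**
     * Returns how long the lease update thread may stall before the
     * lease check fails: timeout minus renew delay minus failure margin.
     */
    static long toleratedIdleMillis(long leaseTimeoutMillis,
                                    long renewAfterMillis,
                                    long failureMarginMillis) {
        return leaseTimeoutMillis - renewAfterMillis - failureMarginMillis;
    }

    /**
     * Creates (but does not start) a daemon thread that asks for the
     * highest priority. Priority is only a hint; some VMs ignore it.
     */
    static Thread newLeaseUpdateThread(Runnable renewLoop) {
        Thread t = new Thread(renewLoop, "lease-update-sketch");
        t.setDaemon(true);
        t.setPriority(Thread.MAX_PRIORITY);
        return t;
    }
}
```

      With the current values, toleratedIdleMillis(60_000, 20_000, 20_000) gives 20 sec; with the suggested values, toleratedIdleMillis(120_000, 10_000, 20_000) gives 90 sec.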

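      The conditional renewal described in the last bullet might look roughly like the sketch below. It uses an in-memory AtomicReference as a stand-in for the persisted cluster node entry; the State enum, the Lease class and all method names are assumptions made for this example and do not reflect the actual ClusterNodeInfo code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of a conditional lease renewal.
final class ConditionalLeaseSketch {

    // Assumed states; the real entry may carry different markers.
    enum State { ACTIVE, RECOVERING }

    static final class Lease {
        final State state;
        final long leaseEndMillis;

        Lease(State state, long leaseEndMillis) {
            this.state = state;
            this.leaseEndMillis = leaseEndMillis;
        }
    }

    private final AtomicReference<Lease> stored;

    ConditionalLeaseSketch(Lease initial) {
        this.stored = new AtomicReference<>(initial);
    }

    /**
     * Renews the lease only if it is still ACTIVE. Returns false, and
     * leaves the stored entry untouched, if another instance has
     * already marked it.
     */
    boolean renew(long nowMillis, long leaseTimeoutMillis) {
        while (true) {
            Lease current = stored.get();
            if (current.state != State.ACTIVE) {
                return false; // do not overwrite a timed-out/recovering entry
            }
            Lease renewed = new Lease(State.ACTIVE, nowMillis + leaseTimeoutMillis);
            if (stored.compareAndSet(current, renewed)) {
                return true;
            }
            // The entry changed concurrently; re-read and re-check.
        }
    }

    /** Simulates another instance marking this lease as recovering. */
    void markRecoveringByOtherInstance() {
        stored.updateAndGet(l -> new Lease(State.RECOVERING, l.leaseEndMillis));
    }

    Lease current() {
        return stored.get();
    }
}
```

      The point of the compare-and-set is that the check and the write happen as one step, so a marker set by another instance between the read and the write is never silently overwritten.
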
            People

              Assignee: Stefan Egli (stefanegli)
              Reporter: Stefan Egli (stefanegli)