A single temporary network issue can shut down the DocumentStore. We observed the following situation:
- org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease was called (this happens regularly and is completely normal)
- the network had a temporary issue (the cause does not matter here)
- the database call terminated only after a long time (the default db maxWaitTime is 120 seconds)
- org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease decided that the current lease was too old (>120 seconds, which is the default for the oak.documentMK.leaseDurationSeconds property), set a leaseCheckFailed variable and threw an Exception
- once leaseCheckFailed is set, every subsequent attempt (if any) immediately throws an Exception as well
I'd recommend making the ClusterNodeInfo code more robust, so that at least one retry is made before the lease check is marked as failed.
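
To illustrate the kind of retry I have in mind, here is a minimal, self-contained sketch. It is not the actual ClusterNodeInfo code: the class LeaseRenewalSketch, the LeaseStore interface, the maxAttempts parameter and isLeaseCheckFailed() are hypothetical names made up for this example. It also assumes that the slow database call surfaces as an IOException, and it ignores the question of whether another cluster node may already have started recovery once the lease has really expired.

```java
import java.io.IOException;

/** Hypothetical, simplified model of lease renewal; NOT the real ClusterNodeInfo. */
class LeaseRenewalSketch {

    /** Hypothetical stand-in for the database update that extends the lease. */
    interface LeaseStore {
        void writeLeaseEnd(long newLeaseEndMillis) throws IOException;
    }

    private final LeaseStore store;
    private final long leaseDurationMillis;
    private final int maxAttempts;
    private long leaseEndMillis;
    private boolean leaseCheckFailed;

    LeaseRenewalSketch(LeaseStore store, long leaseDurationMillis, int maxAttempts) {
        this.store = store;
        this.leaseDurationMillis = leaseDurationMillis;
        this.maxAttempts = maxAttempts;
    }

    public boolean isLeaseCheckFailed() {
        return leaseCheckFailed;
    }

    public long getLeaseEndMillis() {
        return leaseEndMillis;
    }

    /** Tries to renew the lease; the sticky flag is set only after all attempts failed. */
    public void renewLease() {
        if (leaseCheckFailed) {
            // Same behaviour as described above: once set, every call fails immediately.
            throw new IllegalStateException("lease check failed earlier");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long newEnd = System.currentTimeMillis() + leaseDurationMillis;
            try {
                store.writeLeaseEnd(newEnd);
                leaseEndMillis = newEnd;
                return; // renewed, nothing else to do
            } catch (IOException e) {
                last = e; // remember the failure and try again, if attempts remain
            }
        }
        leaseCheckFailed = true;
        throw new IllegalStateException(
                "lease renewal failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated store: the first call fails (temporary network issue), later calls succeed.
        LeaseStore flaky = end -> {
            if (calls[0]++ == 0) {
                throw new IOException("simulated network timeout");
            }
        };
        LeaseRenewalSketch lease = new LeaseRenewalSketch(flaky, 120_000L, 2);
        lease.renewLease();
        System.out.println("attempts=" + calls[0]
                + ", leaseCheckFailed=" + lease.isLeaseCheckFailed());
    }
}
```

With maxAttempts set to 2, a single temporary failure no longer sets leaseCheckFailed; only when the retry fails as well does the code fall back to the current behaviour.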